Authors: David Selby, Yuichiro Iwashita, Kai Spriestersbach, Mohammad Saad, Dennis Bappert, Archana Warrier, Sumantrak Mukherjee, Koichi Kise, Sebastian Vollmer. Had Enough of Experts? Quantitative Knowledge Retrieval From Large Language Models. Stat (2025).
Published in: Stat (2025) https://doi.org/10.1002/sta4.70054
Large language models (LLMs) have been extensively studied for their ability to generate convincing natural language sequences; however, their utility for quantitative information retrieval is less well understood. Here, we explore the feasibility of LLMs as a mechanism for quantitative knowledge retrieval to aid two data analysis tasks: elicitation of prior distributions for Bayesian models and imputation of missing data. We introduce a framework that leverages LLMs to enhance Bayesian workflows by eliciting expert‐like prior knowledge and imputing missing data. Tested on diverse datasets, this approach can improve predictive accuracy and reduce data requirements, offering significant potential in healthcare, environmental science and engineering applications. We discuss the implications and challenges of treating LLMs as ‘experts’.
@article{selby2025,
  title     = {Had Enough of Experts? Quantitative Knowledge Retrieval From Large Language Models},
  author    = {David Selby and Yuichiro Iwashita and Kai Spriestersbach and Mohammad Saad and Dennis Bappert and Archana Warrier and Sumantrak Mukherjee and Koichi Kise and Sebastian Vollmer},
  journal   = {Stat},
  year      = {2025},
  doi       = {10.1002/sta4.70054},
  url       = {https://doi.org/10.1002/sta4.70054},
  abstract  = {Large language models (LLMs) have been extensively studied for their ability to generate convincing natural language sequences; however, their utility for quantitative information retrieval is less well understood. Here, we explore the feasibility of LLMs as a mechanism for quantitative knowledge retrieval to aid two data analysis tasks: elicitation of prior distributions for Bayesian models and imputation of missing data. We introduce a framework that leverages LLMs to enhance Bayesian workflows by eliciting expert‐like prior knowledge and imputing missing data. Tested on diverse datasets, this approach can improve predictive accuracy and reduce data requirements, offering significant potential in healthcare, environmental science and engineering applications. We discuss the implications and challenges of treating LLMs as ‘experts’.},
}