Unpublished Preprints
These papers have been, or are about to be, submitted for peer review.
Doublethink: identifying risk factors in UK Biobank with simultaneous Bayesian-frequentist model-averaged hypothesis testing.
Arning, N., Fryer, H. and D. J. Wilson (2023)
Abstract: Big data analysis can uncover risk factors for diverse diseases in large cohorts. Yet the evidence that an outcome is associated with one variable often changes depending on which other variables are included in the model, because of correlation. Accounting for correlation is critical, and Bayesian model-averaging offers a systematic approach. However, epidemiology typically employs classical approaches, perhaps because of concerns that arbitrary choices of prior unduly influence results. Here we show that simultaneous Bayesian and frequentist discovery is possible via model-averaged hypothesis testing in large samples, for a family of priors. We produce interchangeable posterior odds and p-values that control the strong-sense familywise error rate. This arises because the model-averaged deviance follows a chi-squared distribution when large, under the null hypothesis. We implement this ‘Doublethink’ approach in R and apply it to discover risk factors for COVID-19 hospitalization in 2020 among 1,912 variables in UK Biobank. We find that risk factors are highly numerous, encompassing many, but not all, of those reported in the literature. Certain risk factors reported for COVID-19 hospitalization in UK Biobank, like diabetes and hypertension, lacked strong evidence for association after accounting for other variables, while other risk factors that have received less prominence, like mental health, were important. We discuss the potential for impact and limitations of joint Bayesian-frequentist inference, and the mutual insights afforded into the long-standing disagreements on statistical approaches to scientific discovery.