Stability selection for mixed effect models with large numbers of predictor variables: A simulation study

Hyde, Robert; O'Grady, Luke; Green, Martin

doi:10.1016/j.prevetmed.2022.105714

Authors

Mr Robert Hyde Robert.Hyde4@nottingham.ac.uk
Assistant Professor in Computational Biology

Luke O'Grady

Martin Green

Abstract

Covariate selection when the number of available variables is large relative to the number of observations is problematic in epidemiology and remains the focus of continued research. Whilst a variety of statistical methods have been developed to attempt to overcome this issue, at present very few methods are available for wide data that include a clustered outcome. The purpose of this research was to make an empirical evaluation of a new method for covariate selection in wide data settings when the dependent variable is clustered. We used 3300 simulated datasets with a variety of defined structures and known sets of true predictor variables to conduct an empirical evaluation of a mixed model stability selection procedure. Comparison was made with an alternative method based on regularisation using the least absolute shrinkage and selection operator (Lasso) penalty. Model performance was assessed using several metrics including the true positive rate (proportion of true covariates selected in a final model) and false discovery rate (proportion of variables selected in a final model that were non-true (false) variables). For stability selection, the false discovery rate was consistently low, generally remaining ≤ 0.02, indicating that on average fewer than 1 in 50 of the variables selected in a final model were false variables. This was in contrast to the Lasso-based method, in which the false discovery rate was between 0.59 and 0.72, indicating that generally more than 60% of variables selected in a final model were false variables. However, the Lasso method attained higher true positive rates than stability selection, although both methods achieved good results. For the Lasso method, true positive rates remained ≥ 0.93 whereas for stability selection the true positive rate was 0.73–0.97. Our results suggest both methods may be of value for covariate selection with high dimensional data with a clustered outcome.
When high specificity is needed for identification of true covariates, stability selection appeared to offer the better solution, although with a slight loss of sensitivity. Conversely, when high sensitivity is needed, the Lasso approach may be useful, even if accompanied by a substantial loss of specificity. Overall, the results indicated the loss of sensitivity when employing stability selection is relatively small compared to the loss of specificity when using the Lasso and therefore stability selection may provide the better option for the analyst when evaluating data of this type.
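
The two headline metrics defined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function name selection_metrics, the variable names and the toy inputs are hypothetical, and the handling of empty sets is an assumption made only so that the example runs without error.

```python
def selection_metrics(selected, true_vars):
    """Return (true positive rate, false discovery rate) for one final model.

    selected  -- names of the variables chosen in the final model
    true_vars -- names of the known true predictors in the simulated dataset
    """
    selected = set(selected)
    true_vars = set(true_vars)

    # True positive rate: proportion of true covariates that were selected.
    # Assumption: returns NaN when there are no true covariates.
    if true_vars:
        tpr = len(selected & true_vars) / len(true_vars)
    else:
        tpr = float("nan")

    # False discovery rate: proportion of selected variables that are false.
    # Assumption: defined as 0.0 when nothing was selected.
    if selected:
        fdr = len(selected - true_vars) / len(selected)
    else:
        fdr = 0.0

    return tpr, fdr


# Toy example (made-up variable names, not data from the study):
# three of four true predictors are selected, plus one false variable.
tpr, fdr = selection_metrics(
    selected=["x1", "x2", "x3", "x7"],
    true_vars=["x1", "x2", "x3", "x4"],
)
print(tpr, fdr)
```

Under these definitions, a false discovery rate of 0.02 corresponds to one false variable in every 50 selected, which is the reading given in the abstract.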

Citation

Hyde, R., O'Grady, L., & Green, M. (2022). Stability selection for mixed effect models with large numbers of predictor variables: A simulation study. Preventive Veterinary Medicine, 206, Article 105714. https://doi.org/10.1016/j.prevetmed.2022.105714

Journal Article Type	Article
Acceptance Date	Jul 10, 2022
Online Publication Date	Jul 12, 2022
Publication Date	Sep 1, 2022
Deposit Date	Jul 14, 2022
Publicly Available Date	Jul 14, 2022
Journal	Preventive Veterinary Medicine
Print ISSN	0167-5877
Electronic ISSN	1873-1716
Publisher	Elsevier
Peer Reviewed	Peer Reviewed
Volume	206
Article Number	105714
DOI	https://doi.org/10.1016/j.prevetmed.2022.105714
Keywords	Animal Science and Zoology; Food Animals
Public URL	https://nottingham-repository.worktribe.com/output/8952940
Publisher URL	https://www.sciencedirect.com/science/article/pii/S0167587722001477?via%3Dihub
Additional Information	This article is maintained by: Elsevier; Article Title: Stability selection for mixed effect models with large numbers of predictor variables: A simulation study; Journal Title: Preventive Veterinary Medicine; CrossRef DOI link to publisher maintained version: https://doi.org/10.1016/j.prevetmed.2022.105714; Content Type: article; Copyright: © 2022 The Author(s). Published by Elsevier B.V.