Stability selection for mixed effect models with large numbers of predictor variables: A simulation study

Hyde, Robert; O'Grady, Luke; Green, Martin


Assistant Professor in Computational Biology

Professor of Cattle Health & Epidemiology


Covariate selection when the number of available variables is large relative to the number of observations is problematic in epidemiology and remains the focus of continued research. Whilst a variety of statistical methods have been developed to attempt to overcome this issue, at present very few methods are available for wide data that include a clustered outcome. The purpose of this research was to make an empirical evaluation of a new method for covariate selection in wide data settings when the dependent variable is clustered. We used 3300 simulated datasets with a variety of defined structures and known sets of true predictor variables to conduct an empirical evaluation of a mixed model stability selection procedure. Comparison was made with an alternative method based on regularisation using the least absolute shrinkage and selection operator (Lasso) penalty. Model performance was assessed using several metrics including the true positive rate (proportion of true covariates selected in a final model) and false discovery rate (proportion of variables selected in a final model that were non-true (false) variables). For stability selection, the false discovery rate was consistently low, generally remaining ≤ 0.02, indicating that on average fewer than 1 in 50 of the variables selected in a final model were false variables. This was in contrast to the Lasso-based method, in which the false discovery rate was between 0.59 and 0.72, indicating that generally more than 60% of variables selected in a final model were false variables. However, the Lasso method attained higher true positive rates than stability selection, although both methods achieved good results. For the Lasso method, true positive rates remained ≥ 0.93, whereas for stability selection the true positive rate was 0.73–0.97. Our results suggest both methods may be of value for covariate selection with high-dimensional data with a clustered outcome.
When high specificity is needed for identification of true covariates, stability selection appeared to offer the better solution, although with a slight loss of sensitivity. Conversely, when high sensitivity is needed, the Lasso approach may be useful, even if accompanied by a substantial loss of specificity. Overall, the results indicated that the loss of sensitivity when employing stability selection is relatively small compared to the loss of specificity when using the Lasso, and therefore stability selection may provide the better option for the analyst when evaluating data of this type.
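
As an illustration of the two headline metrics defined in the abstract, the following minimal Python sketch computes the true positive rate and false discovery rate for a single fitted model. It is not the authors' code: the function name `selection_metrics`, the variable names and the example covariate labels are hypothetical, and returning a false discovery rate of 0 when no variables are selected is an assumed convention.

```python
def selection_metrics(selected, true_covariates):
    """Return (true positive rate, false discovery rate) for one final model.

    selected        -- names of variables retained in the final model
    true_covariates -- names of the known true predictors (simulation truth)
    """
    selected = set(selected)
    true_covariates = set(true_covariates)

    true_positives = selected & true_covariates   # true covariates that were selected
    false_positives = selected - true_covariates  # selected variables that are not true

    # TPR: proportion of true covariates selected in the final model.
    tpr = len(true_positives) / len(true_covariates) if true_covariates else float("nan")

    # FDR: proportion of selected variables that are false.
    # Assumed convention: an empty selection has FDR 0.
    fdr = len(false_positives) / len(selected) if selected else 0.0

    return tpr, fdr


# Hypothetical example: four true covariates, four selected, one of them false.
true_covariates = {"x1", "x2", "x3", "x4"}
selected = {"x1", "x2", "x3", "x9"}

tpr, fdr = selection_metrics(selected, true_covariates)
print(f"TPR = {tpr:.2f}, FDR = {fdr:.2f}")  # prints "TPR = 0.75, FDR = 0.25"
```

In a simulation study of this kind, such a function would typically be applied to each simulated dataset and the results averaged per scenario. On this scale, a false discovery rate of 0.02 corresponds to about 1 false variable per 50 selected.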


Hyde, R., O'Grady, L., & Green, M. (2022). Stability selection for mixed effect models with large numbers of predictor variables: A simulation study. Preventive Veterinary Medicine, 206, Article 105714.

Journal Article Type Article
Acceptance Date Jul 10, 2022
Online Publication Date Jul 12, 2022
Publication Date Sep 1, 2022
Deposit Date Jul 14, 2022
Publicly Available Date Jul 14, 2022
Journal Preventive Veterinary Medicine
Print ISSN 0167-5877
Publisher Elsevier
Peer Reviewed Peer Reviewed
Volume 206
Article Number 105714
Keywords Animal Science and Zoology; Food Animals
Additional Information This article is maintained by: Elsevier; Article Title: Stability selection for mixed effect models with large numbers of predictor variables: A simulation study; Journal Title: Preventive Veterinary Medicine; CrossRef DOI link to publisher maintained version:; Content Type: article; Copyright: © 2022 The Author(s). Published by Elsevier B.V.

