Model Class Reliance for Random Forests

Smith, Gavin; Mansilla, Roberto; Goulding, James

Authors

Gavin Smith (gavin.smith@nottingham.ac.uk)
Associate Professor

Roberto Mansilla



Abstract

Variable Importance (VI) has traditionally been cast as the process of estimating each variable's contribution to a predictive model's overall performance. Analysis of a single model instance, however, guarantees no insight into a variable's relevance to the underlying generative processes. Recent research has sought to address this concern via analysis of Rashomon sets: sets of alternative model instances that exhibit equivalent predictive performance to some reference model, but which take different functional forms. Measures such as Model Class Reliance (MCR), computed against Rashomon sets, have been proposed in order to ascertain how much a variable must be relied on to make robust predictions, or whether alternatives exist. If the MCR range is tight, we have no choice but to rely on the variable; if the range is wide, then there exist competing, perhaps fairer, models that provide alternative explanations of the phenomena being examined. Applications are wide-ranging, from enabling the construction of 'fairer' models in areas such as recidivism prediction, to health analytics and ethical marketing. Tractable estimation of MCR for non-linear models is currently restricted to Kernel Regression under squared loss [7]. In this paper we introduce a new technique that extends computation of Model Class Reliance (MCR) to Random Forest classifiers and regressors. The proposed approach addresses a number of open research questions and, in contrast to prior Kernel SVM MCR estimation, runs in linearithmic rather than polynomial time. Taking a fundamentally different approach to previous work, we provide a solution for this important model class, identifying situations where irrelevant covariates do not improve predictions.
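To make the MCR idea concrete, the following is a minimal, illustrative brute-force sketch in Python with scikit-learn. It is not the paper's linearithmic algorithm: it simply samples several random forests, keeps those whose test loss is within a tolerance of the best (an approximate Rashomon set), and reports the minimum and maximum permutation importance of one variable across that set. The dataset, hyperparameter grid, and the tolerance named epsilon are illustrative assumptions, not values from the paper.

# Illustrative brute-force sketch of the MCR idea (NOT the paper's algorithm):
# enumerate candidate random forests, keep an approximate Rashomon set, and
# report the min/max reliance on a single variable across that set.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Candidate forests obtained by varying hyperparameters and random seeds
# (an assumed, toy way of generating near-equivalent models).
candidates = [
    RandomForestRegressor(n_estimators=100, max_depth=d, random_state=s).fit(X_tr, y_tr)
    for d in (3, 5, None) for s in range(5)
]
losses = [mean_squared_error(y_te, m.predict(X_te)) for m in candidates]

# Approximate Rashomon set: models within epsilon of the best observed loss.
epsilon = 0.05 * min(losses)
rashomon = [m for m, l in zip(candidates, losses) if l <= min(losses) + epsilon]

# Reliance on feature 0 for each Rashomon-set model, measured as the mean
# drop in the model's score when that feature is permuted.
reliance = [
    permutation_importance(m, X_te, y_te, n_repeats=10, random_state=0).importances_mean[0]
    for m in rashomon
]
print(f"MCR- approx {min(reliance):.3f}, MCR+ approx {max(reliance):.3f}")

A tight interval between the printed minimum and maximum suggests every well-performing forest in the sampled set relies on the variable to a similar degree, whereas a wide interval indicates that comparably accurate forests exist which barely use it.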

Conference Name 34th Conference on Neural Information Processing Systems (NeurIPS 2020)
Conference Location Vancouver, Canada
Start Date Dec 7, 2020
End Date Dec 12, 2020
Acceptance Date Oct 1, 2020
Online Publication Date Dec 12, 2020
Publication Date Dec 12, 2020
Deposit Date Dec 8, 2020
Publicly Available Date Jan 7, 2021
Series Title Advances in Neural Information Processing Systems
Book Title Advances in Neural Information Processing Systems 33 pre-proceedings (NeurIPS 2020)
Public URL https://nottingham-repository.worktribe.com/output/5127765
Publisher URL https://papers.nips.cc/paper/2020/hash/fd512441a1a791770a6fa573d688bff5-Abstract.html
Related Public URLs https://nips.cc/
