Imputing missing data in plant traits: A guide to improve gap‐filling
Joswig, Julia S.; Kattge, Jens; Kraemer, Guido; Mahecha, Miguel D.; Rüger, Nadja; Schaepman, Michael E.; Schrodt, Franziska; Schuman, Meredith C.
Authors
Jens Kattge
Guido Kraemer
Miguel D. Mahecha
Nadja Rüger
Michael E. Schaepman
Dr Franziska Schrodt (franziska.schrodt1@nottingham.ac.uk), Professor of Earth System Science
Meredith C. Schuman
Abstract
Aim: Globally distributed plant trait data are increasingly used to understand relationships between biodiversity and ecosystem processes. However, global trait databases are sparse because they are compiled from many, mostly small databases. This sparsity in both trait space completeness and geographical distribution limits the potential for both multivariate and global analyses. Thus, ‘gap‐filling’ approaches are often used to impute missing trait data. Recent methods, like Bayesian hierarchical probabilistic matrix factorization (BHPMF), can impute large and sparse data sets using side information. We investigate whether BHPMF imputation leads to biases in trait space and identify aspects influencing bias to provide guidance for its usage.

Innovation: We use a fully observed trait data set from which entries are randomly removed, along with extensive but sparse additional data. We use BHPMF for imputation and evaluate bias by: (1) accuracy (residuals, RMSE, trait means), (2) correlations (bi‐ and multivariate) and (3) taxonomic and functional clustering (valuewise, uni‐ and multivariate). BHPMF preserves general patterns of trait distributions but induces taxonomic clustering. Data set–external trait data had little effect on induced taxonomic clustering and stabilized trait–trait correlations.

Main Conclusions: Our study extends the criteria for the evaluation of gap‐filling beyond RMSE, providing insight into statistical data structure and allowing better informed use of imputed trait data, with improved practice for imputation. We expect our findings to be valuable beyond applications in plant ecology, for any study using hierarchical side information for imputation.
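The evaluation design described in the abstract (remove entries at random from a fully observed data set, impute them, then compare against the known values) can be illustrated with a minimal sketch. This is not the authors' code and it does not use BHPMF: a simple per-trait mean imputer stands in for the imputation step, the data are synthetic, and all function names (`mask_entries`, `impute_column_means`, `rmse_on_masked`) are hypothetical. It covers only the RMSE part of the accuracy criterion, not the correlation or clustering checks.

```python
import math
import random


def mask_entries(matrix, fraction, rng):
    """Copy `matrix`, set a random `fraction` of cells to None, and return the masked positions."""
    cells = [(i, j) for i in range(len(matrix)) for j in range(len(matrix[0]))]
    n_masked = int(round(fraction * len(cells)))
    masked = rng.sample(cells, n_masked)
    sparse = [row[:] for row in matrix]
    for i, j in masked:
        sparse[i][j] = None
    return sparse, masked


def impute_column_means(sparse):
    """Stand-in imputer (NOT BHPMF): fill each gap with the mean of the observed values of that trait."""
    n_cols = len(sparse[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in sparse if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in sparse]


def rmse_on_masked(truth, imputed, masked):
    """Root mean squared error computed only over the entries that were removed."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in masked]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)
# Hypothetical fully observed data: 50 individuals x 3 traits (synthetic values).
truth = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
sparse, masked = mask_entries(truth, 0.2, rng)
imputed = impute_column_means(sparse)
error = rmse_on_masked(truth, imputed, masked)
```

Because the true values of the removed entries are known, `error` measures imputation accuracy directly. The same masked positions could in principle be reused to compare trait means, trait-trait correlations, or clustering before and after imputation, which is the broader set of criteria the abstract argues for.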
Citation
Joswig, J. S., Kattge, J., Kraemer, G., Mahecha, M. D., Rüger, N., Schaepman, M. E., …Schuman, M. C. (2023). Imputing missing data in plant traits: A guide to improve gap‐filling. Global Ecology and Biogeography, 32(8), 1395-1408. https://doi.org/10.1111/geb.13695
Journal Article Type | Article |
---|---|
Acceptance Date | Apr 14, 2023 |
Online Publication Date | May 16, 2023 |
Publication Date | Aug 1, 2023 |
Deposit Date | Jul 5, 2023 |
Publicly Available Date | Jul 6, 2023 |
Journal | Global Ecology and Biogeography |
Print ISSN | 1466-822X |
Electronic ISSN | 1466-8238 |
Publisher | Wiley |
Peer Reviewed | Peer Reviewed |
Volume | 32 |
Issue | 8 |
Pages | 1395-1408 |
DOI | https://doi.org/10.1111/geb.13695 |
Keywords | gap‐filling, sparse matrix, Bayesian hierarchical model, matrix factorization, TRY, induced pattern, imputation, machine learning, plant functional trait, sensitivity analysis |
Public URL | https://nottingham-repository.worktribe.com/output/21098420 |
Publisher URL | https://onlinelibrary.wiley.com/doi/10.1111/geb.13695 |
Additional Information | Received: 2022-03-08; Accepted: 2023-04-14; Published: 2023-05-16 |
Files
Imputing missing data in plant traits: A guide to improve gap‐filling (PDF, 1.8 MB)
Publisher Licence URL
https://creativecommons.org/licenses/by-nc/4.0/
Copyright Statement
This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.