Using volunteered geographic information (VGI) in design-based statistical inference for area estimation and accuracy assessment of land cover


Stephen V. Stehman


Professor of Geographical Information

Linda See


Volunteered Geographic Information (VGI) offers a potentially inexpensive source of reference data for estimating area and assessing map accuracy in the context of remote sensing-based land-cover monitoring. The quality of observations from VGI and the typical lack of an underlying probability sampling design raise concerns regarding use of VGI in widely applied design-based statistical inference. This article focuses on the fundamental issue of sampling design used to acquire VGI. Design-based inference requires the sample data to be obtained via a probability sampling design. Options for incorporating VGI within design-based inference include: 1) directing volunteers to obtain data for locations selected by a probability sampling design; 2) treating VGI data as a “certainty stratum” and augmenting the VGI with data obtained from a probability sample; and 3) using VGI to create an auxiliary variable that is then used in a model-assisted estimator to reduce the standard error of an estimate produced from a probability sample. The latter two options can be implemented using VGI data that were obtained from a non-probability sampling design, but require additional sample data to be acquired via a probability sampling design. If the only data available are VGI obtained from a non-probability sample, properties of design-based inference that are ensured by probability sampling must be replaced by assumptions that may be difficult to verify. For example, pseudo-estimation weights can be constructed that mimic weights used in stratified sampling estimators. However, accuracy and area estimates produced using these pseudo-weights still require the VGI data to be representative of the full population, a property known as “external validity”. Because design-based inference requires a probability sampling design, directing volunteers to locations specified by a probability sampling design is the most straightforward option for use of VGI in design-based inference. The certainty stratum approach and the model-assisted approach, each of which combines VGI from a non-probability sample with data from a probability sample, are viable alternatives that meet the conditions required for design-based inference and use the VGI data to reduce standard errors.
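
As an illustration of the certainty stratum option described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that every VGI unit is treated as a census stratum (inclusion probability 1), that the remaining units are covered by a simple random sample drawn without replacement, and that the VGI labels are taken as correct. The function name, argument names, and 0/1 label coding are hypothetical.

```python
import math
from statistics import mean, variance


def certainty_stratum_estimate(vgi_labels, sample_labels, population_size):
    """Estimate the proportion of area in a target class.

    vgi_labels: 0/1 labels for every VGI unit (assumed census stratum).
    sample_labels: 0/1 labels from a simple random sample of non-VGI units.
    population_size: total number of units N in the region (hypothetical).
    Returns (estimated proportion, standard error).
    """
    n_vgi = len(vgi_labels)
    n_rest = population_size - n_vgi  # size of the non-VGI stratum
    n = len(sample_labels)
    if n < 2 or n > n_rest:
        raise ValueError("need 2 <= sample size <= number of non-VGI units")

    # The VGI stratum contributes its observed total directly; the non-VGI
    # stratum contributes its sample mean expanded to the stratum size.
    total = sum(vgi_labels) + n_rest * mean(sample_labels)
    p_hat = total / population_size

    # Only the sampled stratum has sampling variance (assumption: SRS
    # without replacement, hence the finite population correction).
    fpc = 1 - n / n_rest
    var_rest_mean = fpc * variance(sample_labels) / n
    se = (n_rest / population_size) * math.sqrt(var_rest_mean)
    return p_hat, se


# Hypothetical data: 100 VGI units, 100 sampled units, 10,100 units in total.
p_hat, se = certainty_stratum_estimate([1] * 30 + [0] * 70,
                                       [1] * 20 + [0] * 80,
                                       10100)
```

Under these assumptions the VGI units add no sampling variance, so the standard error depends only on the probability sample from the non-VGI units. Any labeling errors in the VGI carry straight into the estimate, which is the data quality concern the abstract raises.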


Stehman, S. V., Fonte, C. C., Foody, G. M., & See, L. (2018). Using volunteered geographic information (VGI) in design-based statistical inference for area estimation and accuracy assessment of land cover. Remote Sensing of Environment, 212,

Journal Article Type Article
Acceptance Date Apr 8, 2018
Online Publication Date Apr 26, 2018
Publication Date Jun 30, 2018
Deposit Date Apr 27, 2018
Publicly Available Date Apr 27, 2019
Journal Remote Sensing of Environment
Print ISSN 0034-4257
Electronic ISSN 0034-4257
Publisher Elsevier
Peer Reviewed Peer Reviewed
Volume 212
Keywords Probability sampling; External validity; Pseudo-weights; Data quality; Model-based inference; Volunteered geographic information (VGI); Crowdsourcing

