Provenance Network Analytics: An approach to data analytics using data provenance

Huynh, Trung Dong; Ebden, Mark; Fischer, Joel; Roberts, Stephen; Moreau, Luc

Authors

Trung Dong Huynh

Mark Ebden

Joel Fischer (Joel.Fischer@nottingham.ac.uk)
Professor of Human-Computer Interaction

Stephen Roberts

Luc Moreau



Abstract

Provenance network analytics is a novel data analytics approach that helps infer properties of data, such as quality or importance, from their provenance. Instead of analysing application data, which are typically domain-dependent, it analyses the data's provenance as represented using the World Wide Web Consortium's domain-agnostic PROV data model. Specifically, the approach proposes a number of network metrics for provenance data and applies established machine learning techniques over such metrics to build predictive models for some key properties of data. Applying this method to the provenance of real-world data from three different applications, we show that it can successfully identify the owners of provenance documents, assess the quality of crowdsourced data, and identify instructions from chat messages in an alternate-reality game with high levels of accuracy. By so doing, we demonstrate the different ways the proposed provenance network metrics can be used in analysing data, providing the foundation for provenance-based data analytics.
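As a rough illustration of the idea in the abstract, the sketch below turns a small provenance graph into a numeric feature vector that an off-the-shelf classifier could consume. The toy graph, the function names, and the particular metrics (node counts per PROV type, edge count, and the longest finite shortest-path distance) are assumptions chosen for illustration; they are not taken from the paper and may differ from the metrics it defines. Only the node types (entity, activity, agent) and relation names come from the PROV data model.

```python
from collections import deque

# Hypothetical toy provenance graph: node name -> PROV node type.
NODE_TYPES = {
    "report": "entity",
    "draft": "entity",
    "editing": "activity",
    "alice": "agent",
}

# Directed edges as (source, PROV relation, target).
EDGES = [
    ("report", "wasGeneratedBy", "editing"),
    ("editing", "used", "draft"),
    ("editing", "wasAssociatedWith", "alice"),
    ("report", "wasDerivedFrom", "draft"),
]


def max_finite_distance(nodes, edges):
    """Longest shortest-path length between any two connected nodes,
    following edge direction (breadth-first search from every node)."""
    adjacency = {node: [] for node in nodes}
    for source, _relation, target in edges:
        adjacency[source].append(target)

    best = 0
    for start in nodes:
        distance = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if neighbour not in distance:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
        best = max(best, max(distance.values()))
    return best


def feature_vector(node_types, edges):
    """Summarise a provenance graph as a dict of illustrative metrics."""
    counts = {"entity": 0, "activity": 0, "agent": 0}
    for node_type in node_types.values():
        counts[node_type] += 1
    return {
        "entities": counts["entity"],
        "activities": counts["activity"],
        "agents": counts["agent"],
        "edges": len(edges),
        "max_finite_distance": max_finite_distance(node_types, edges),
    }


if __name__ == "__main__":
    print(feature_vector(NODE_TYPES, EDGES))
```

Because the features depend only on graph structure and PROV types, not on application data, the same function could in principle be applied to provenance from unrelated domains; one vector per provenance graph, paired with a label such as a quality rating, would form a training set for a standard classifier.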

Citation

Huynh, T. D., Ebden, M., Fischer, J., Roberts, S., & Moreau, L. (2018). Provenance Network Analytics: An approach to data analytics using data provenance. Data Mining and Knowledge Discovery, 32(3), 708-735. https://doi.org/10.1007/s10618-017-0549-3

Journal Article Type: Article
Acceptance Date: Dec 26, 2017
Online Publication Date: Feb 15, 2018
Publication Date: 2018-05
Deposit Date: Jan 4, 2018
Publicly Available Date: Feb 15, 2018
Journal: Data Mining and Knowledge Discovery
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
Publisher: Springer Verlag
Peer Reviewed: Peer Reviewed
Volume: 32
Issue: 3
Pages: 708-735
DOI: https://doi.org/10.1007/s10618-017-0549-3
Keywords: data provenance; data analytics; network metrics; graph classification
Public URL: https://nottingham-repository.worktribe.com/output/911918
Publisher URL: https://link.springer.com/article/10.1007/s10618-017-0549-3
