Provenance Network Analytics: An approach to data analytics using data provenance
Huynh, Trung Dong; Ebden, Mark; Fischer, Joel; Roberts, Stephen; Moreau, Luc
Authors
Trung Dong Huynh
Mark Ebden
Joel Fischer (Joel.Fischer@nottingham.ac.uk), Professor of Human-Computer Interaction
Stephen Roberts
Luc Moreau
Abstract
Provenance network analytics is a novel data analytics approach that helps infer properties of data, such as quality or importance, from their provenance. Instead of analysing application data, which are typically domain-dependent, it analyses the data's provenance as represented using the World Wide Web Consortium's domain-agnostic PROV data model. Specifically, the approach proposes a number of network metrics for provenance data and applies established machine learning techniques over such metrics to build predictive models for some key properties of data. Applying this method to the provenance of real-world data from three different applications, we show that it can successfully identify the owners of provenance documents, assess the quality of crowdsourced data, and identify instructions from chat messages in an alternate-reality game with high levels of accuracy. By so doing, we demonstrate the different ways the proposed provenance network metrics can be used in analysing data, providing the foundation for provenance-based data analytics.
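To make the idea of "network metrics for provenance data" concrete, here is a minimal sketch in Python. It is not the authors' implementation: the toy graph, the helper names (`undirected_adjacency`, `eccentricity`, `provenance_features`) and the choice of metrics (counts of nodes per PROV type, number of edges, and graph diameter) are assumptions made for illustration only. The node types and relation names follow the PROV data model; the metrics actually used in the paper may differ.

```python
from collections import Counter, deque

# Hypothetical toy provenance graph: each node has one of the three core
# PROV types, and each edge is a (source, relation, target) triple.
NODES = {
    "report": "entity",
    "draft": "entity",
    "editing": "activity",
    "alice": "agent",
}
EDGES = [
    ("report", "wasGeneratedBy", "editing"),
    ("editing", "used", "draft"),
    ("report", "wasDerivedFrom", "draft"),
    ("editing", "wasAssociatedWith", "alice"),
    ("report", "wasAttributedTo", "alice"),
]


def undirected_adjacency(nodes, edges):
    """Build an undirected adjacency map, ignoring edge direction and type."""
    adj = {n: set() for n in nodes}
    for src, _relation, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    return adj


def eccentricity(adj, start):
    """Longest breadth-first distance from `start` to any node it can reach."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return max(dist.values())


def provenance_features(nodes, edges):
    """Summarise a provenance graph as a small, domain-agnostic feature dict."""
    type_counts = Counter(nodes.values())
    adj = undirected_adjacency(nodes, edges)
    return {
        "entities": type_counts["entity"],
        "activities": type_counts["activity"],
        "agents": type_counts["agent"],
        "edges": len(edges),
        # Diameter over reachable pairs only (assumes a connected graph).
        "diameter": max(eccentricity(adj, n) for n in nodes),
    }


features = provenance_features(NODES, EDGES)
print(features)
```

A feature dictionary of this kind, computed once per provenance graph and paired with a known label (for example, whether a crowdsourced item was judged trustworthy), could then be passed to any off-the-shelf classifier. Because the features depend only on the graph's shape and PROV types, the same code applies regardless of the application domain.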
Citation
Huynh, T. D., Ebden, M., Fischer, J., Roberts, S., & Moreau, L. (2018). Provenance Network Analytics: An approach to data analytics using data provenance. Data Mining and Knowledge Discovery, 32(3), 708-735. https://doi.org/10.1007/s10618-017-0549-3
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Dec 26, 2017 |
Online Publication Date | Feb 15, 2018 |
Publication Date | May 2018 |
Deposit Date | Jan 4, 2018 |
Publicly Available Date | Feb 15, 2018 |
Journal | Data Mining and Knowledge Discovery |
Print ISSN | 1384-5810 |
Electronic ISSN | 1573-756X |
Publisher | Springer Verlag |
Peer Reviewed | Peer Reviewed |
Volume | 32 |
Issue | 3 |
Pages | 708-735 |
DOI | https://doi.org/10.1007/s10618-017-0549-3 |
Keywords | data provenance; data analytics; network metrics; graph classification |
Public URL | https://nottingham-repository.worktribe.com/output/911918 |
Publisher URL | https://link.springer.com/article/10.1007/s10618-017-0549-3 |
Contract Date | Jan 4, 2018 |
Files
10.1007_s10618-017-0549-3.pdf (PDF, 1.2 MB)
Publisher Licence URL
https://creativecommons.org/licenses/by/4.0/
Copyright Statement
Copyright information regarding this work can be found at the following address: http://creativecommons.org/licenses/by/4.0