Automatic detection of protected health information from clinic narratives

Yang, Hui; Garibaldi, Jonathan M.

doi:10.1016/j.jbi.2015.06.015

Authors

Hui Yang

Prof. Jonathan Garibaldi (jon.garibaldi@nottingham.ac.uk)
Provost and PVC UNNC

Abstract

This paper presents a natural language processing (NLP) system designed for the 2014 i2b2 de-identification challenge. The challenge task is to identify and classify seven main Protected Health Information (PHI) categories and 25 associated subcategories. A hybrid model is proposed that combines machine learning techniques with keyword-based and rule-based approaches to deal with the complexity inherent in PHI categories. The proposed approaches exploit a rich set of linguistic features, both syntactic and word surface-oriented, which are further enriched by task-specific features and regular expression template patterns to characterize the semantics of various PHI categories. The system achieved promising accuracy on the challenge test data, with an overall micro-averaged F-measure of 93.6%, and was the winning entry in this de-identification challenge.
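As an illustration of what regular expression template patterns for PHI can look like, the following is a minimal sketch and not the authors' implementation. The category labels, the patterns, and the function name `find_phi` are all assumptions made for this example; the rules in the paper are not reproduced on this page, and the machine learning component of the hybrid model is omitted entirely.

```python
import re

# Hypothetical template patterns, keyed by an illustrative PHI label.
# These are assumptions for demonstration, not the rules used in the paper.
PHI_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_phi(text):
    """Return (start, end, label, matched_text) tuples sorted by position."""
    spans = []
    for label, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label, match.group()))
    return sorted(spans)


if __name__ == "__main__":
    note = "Seen on 03/14/2014; call 555-123-4567 or mail jdoe@example.org."
    for start, end, label, value in find_phi(note):
        print(f"{label:<6} [{start}:{end}] {value}")
```

In a hybrid setup of the kind the abstract describes, spans found by rules such as these would typically be combined with the output of a trained sequence labeller, since categories like names and locations rarely follow a fixed surface pattern.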

Citation

Yang, H., & Garibaldi, J. M. (2015). Automatic detection of protected health information from clinic narratives. Journal of Biomedical Informatics, 58(Suppl.), S30-S38. https://doi.org/10.1016/j.jbi.2015.06.015

Journal Article Type	Article
Acceptance Date	Jun 23, 2015
Online Publication Date	Jul 29, 2015
Publication Date	2015-12
Deposit Date	Oct 14, 2016
Publicly Available Date	Oct 14, 2016
Journal	Journal of Biomedical Informatics
Print ISSN	1532-0464
Electronic ISSN	1532-0480
Publisher	Elsevier
Peer Reviewed	Peer Reviewed
Volume	58
Issue	Suppl.
Pages	S30-S38
DOI	https://doi.org/10.1016/j.jbi.2015.06.015
Keywords	Protected Health Information (PHI); De-identification; Hybrid model; Natural language processing; Clinical text mining
Public URL	https://nottingham-repository.worktribe.com/output/756185
Publisher URL	http://www.sciencedirect.com/science/article/pii/S1532046415001252
Additional Information	This article is maintained by: Elsevier; Article Title: Automatic detection of protected health information from clinic narratives; Journal Title: Journal of Biomedical Informatics; CrossRef DOI link to publisher maintained version: https://doi.org/10.1016/j.jbi.2015.06.015; Content Type: article; Copyright: © 2015 Elsevier Inc.

Files

1-s2.0-S1532046415001252-main.pdf (PDF, 686 KB)

Copyright Statement
Copyright information regarding this work can be found at the following address: http://creativecommons.org/licenses/by-nc-nd/4.0