Aggregator: A machine learning approach to identifying MEDLINE articles that derive from the same underlying clinical trial

Shao, Weixiang; Adams, Clive E.; Cohen, Aaron M.; Davis, John M.; McDonagh, Marian S.; Thakurta, Sujata; Yu, Philip S.; Smalheiser, Neil R.

Abstract

Objective

It is important to identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence.

Methods

We created positive and negative training sets (comprised of pairs of articles reporting on the same condition and intervention) that were, or were not, linked to the same clinicaltrials.gov trial registry number. Features were extracted from MEDLINE and PubMed metadata; pairwise similarity scores were modeled using logistic regression.
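
To make the pairwise modeling step concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not list the actual features, so the three similarity features (author overlap, title word overlap, publication-year proximity), the record field names, and the toy article records are all assumptions made for illustration. The small hand-written gradient-descent fit stands in for whatever logistic regression software was actually used.

```python
import math

def jaccard(a, b):
    """Jaccard similarity between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def pair_features(x, y):
    """Similarity features for one article pair (hypothetical feature choices)."""
    return [
        jaccard(x["authors"], y["authors"]),
        jaccard(x["title"].lower().split(), y["title"].lower().split()),
        1.0 / (1.0 + abs(x["year"] - y["year"])),
    ]

def predict_proba(w, x):
    """Probability that a pair comes from the same trial; w[0] is the bias."""
    z = w[0] + sum(wj * v for wj, v in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=1.0, epochs=3000):
    """Fit logistic regression weights by batch gradient descent on log loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

# Toy records (invented for illustration): same condition and intervention,
# but a1/a2 belong to one trial and a3/a4 to another.
a1 = {"authors": ["Smith J", "Lee K", "Patel R"], "year": 2010,
      "title": "Drug A versus placebo in adults with asthma: a randomized trial"}
a2 = {"authors": ["Smith J", "Lee K", "Gomez M"], "year": 2012,
      "title": "Long-term outcomes of drug A versus placebo in adults with asthma"}
a3 = {"authors": ["Chen L", "Brown T"], "year": 2009,
      "title": "Drug A in adults with asthma: a multicenter study"}
a4 = {"authors": ["Chen L", "Brown T", "Novak P"], "year": 2010,
      "title": "Quality of life with drug A in adults with asthma: a multicenter study"}

# Label 1 = same registry number, 0 = different registry numbers.
labeled_pairs = [(a1, a2, 1), (a3, a4, 1), (a1, a3, 0),
                 (a1, a4, 0), (a2, a3, 0), (a2, a4, 0)]
X = [pair_features(p, q) for p, q, _ in labeled_pairs]
y = [label for _, _, label in labeled_pairs]

w = train_logistic(X, y)
for features, label in zip(X, y):
    print(label, round(predict_proba(w, features), 3))
```

In this toy data the negative pairs share the topic but no authors, which mirrors the design described above: negatives are drawn from articles on the same condition and intervention, so the model has to rely on signals other than topic.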

Results

Article pairs from the same trial were identified with high accuracy (F1 score = 0.843). We also created a clustering tool, Aggregator, that takes as input a PubMed user query for RCTs on a given topic, and returns article clusters predicted to arise from the same clinical trial.
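
As a rough illustration of the two ideas in this paragraph, the sketch below computes an F1 score from pairwise predictions and groups articles into clusters from pairwise scores. The clustering rule used here (connected components over pairs scoring at or above a threshold, via union-find) is an assumption chosen for simplicity; the abstract does not state how Aggregator forms its clusters. The article identifiers, scores, and threshold are likewise invented.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = same trial)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def cluster_articles(article_ids, pair_scores, threshold=0.5):
    """Group articles whose pairwise score meets the threshold (union-find)."""
    parent = {a: a for a in article_ids}

    def find(a):
        # Follow parent links to the root, shortening the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for a in article_ids:
        clusters.setdefault(find(a), []).append(a)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical pairwise predictions against hypothetical gold labels.
gold = [1, 1, 1, 0, 0]
pred = [1, 1, 0, 1, 0]
f1 = f1_score(gold, pred)

# Hypothetical articles retrieved by one query, with same-trial probabilities.
ids = ["art1", "art2", "art3", "art4", "art5"]
scores = {("art1", "art2"): 0.9, ("art2", "art3"): 0.8,
          ("art4", "art5"): 0.2, ("art1", "art4"): 0.1}
clusters = cluster_articles(ids, scores)
print(f1, clusters)
```

Connected components link articles transitively: art1 and art3 end up together through art2 even though that pair was never scored directly. A stricter rule (for example, requiring every pair in a cluster to exceed the threshold) would be a different design choice with fewer spurious merges but more fragmented clusters.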

Discussion

Although painstaking examination of full-text may be needed to be conclusive, metadata are surprisingly accurate in predicting when two articles derive from the same underlying clinical trial.

Journal Article Type: Article
Publication Date: 2015-03
Journal: Methods
Print ISSN: 1046-2023
Electronic ISSN: 1095-9130
Publisher: Elsevier
Peer Reviewed: Peer Reviewed
Volume: 74
Pages: 65-70
APA6 Citation: Shao, W., Adams, C. E., Cohen, A. M., Davis, J. M., McDonagh, M. S., Thakurta, S., …Smalheiser, N. R. (2015). Aggregator: A machine learning approach to identifying MEDLINE articles that derive from the same underlying clinical trial. Methods, 74, 65-70. https://doi.org/10.1016/j.ymeth.2014.11.006
DOI: https://doi.org/10.1016/j.ymeth.2014.11.006
Keywords: Evidence-based medicine; Clinical trials; Systematic reviews; Bias; Information retrieval; Informatics
Publisher URL: http://www.sciencedirect.com/science/article/pii/S1046202314003661
Copyright Statement: Copyright information regarding this work can be found at the following address: http://eprints.nottingh.../end_user_agreement.pdf
Additional Information: This article is maintained by: Elsevier; Article Title: Aggregator: A machine learning approach to identifying MEDLINE articles that derive from the same underlying clinical trial; Journal Title: Methods; CrossRef DOI link to publisher maintained version: https://doi.org/10.1016/j.ymeth.2014.11.006; Content Type: article; Copyright: Copyright © 2014 Elsevier Inc. All rights reserved.