Attributes for causal inference in electronic healthcare databases

Side effects of prescription drugs present a serious issue. Existing algorithms that detect side effects generally require further analysis to confirm causality. In this paper we investigate attributes based on the Bradford Hill causality criteria that could be used by a classifying algorithm to identify side effects directly. We found that it would be advantageous to use attributes based on the association strength, temporality and specificity criteria.


Introduction
The aim of medication is to improve patients' standard of living, but medication can lead to side effects, also known as adverse drug reactions (ADRs). Existing ADR signalling algorithms have a high false positive rate. This reduces their efficiency as the signals they generate need to be confirmed with more rigorous analysis.
A novel approach for signalling ADRs is to develop a causality classifier with suitable input attributes. Such an algorithm would be more efficient at signalling ADRs as it would not require additional analysis. The Bradford Hill causality criteria (BHCC) [1] are an excellent starting point for developing suitable attributes as they are often considered when determining causal relationships. In this paper we investigate attributes based on the BHCC to aid future ADR classifying algorithms. The next section summarises the existing algorithms, the BHCC and the feature selection applied; it is followed by the results and the conclusion.

Background & Methodology
Spontaneous Reporting System (SRS) databases and Electronic Healthcare Databases (EHDs) are the databases generally used for post-marketing drug surveillance. The SRS databases rely on voluntary reports of suspected ADRs, whereas the EHDs are often extracted directly from medical practitioners' records. Existing algorithms measure association rather than determining causality directly. The BHCC were developed to distinguish between association and causation. The nine factors of interest (in the context of ADR signalling) are:
• Association Strength - how strong the association is.
• Temporality - the direction of the association.
• Specificity - how specific the relationship is.
• Experimentation - does the medical event stop and start in sync with the drug?
• Dosage - is there a correlation between dosage and medical event occurrence?
• Analogy - do similar drugs have similar side effects?
• Coherence - does the association make sense?
• Plausibility - is the association possible?
• Consistency - is the association found in different databases?
The SRS and EHD algorithms calculate a measure of association strength and also cover temporality: the EHD algorithms apply filters to remove medical events that cause the drug to be prescribed, and people submitting reports to SRS databases will only report medical events that occur after the drug. Furthermore, people will only report a suspected ADR if it is plausible, so the SRS algorithms indirectly cover plausibility.
The attributes detailed in Table 1 were derived using The Health Improvement Network database (www.thinuk.com). Feature selection was applied using a multivariate filter, the Correlation-based Feature Selection (CFS) algorithm [2], as this algorithm is not dependent on a specific classifier.
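To make the filter step concrete, the following is a minimal sketch of correlation-based feature selection, not the implementation used in this paper. It assumes Pearson correlation as the correlation measure and a simple greedy forward search in place of the search strategy of the original CFS algorithm; the function names `cfs_merit` and `cfs_forward` and the synthetic data are hypothetical. A subset of k features is scored by the merit k*r_cf / sqrt(k + k(k-1)*r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation, so a feature that is highly correlated with one already chosen adds little.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    # Mean absolute correlation between each chosen feature and the class.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    # Mean absolute correlation over all distinct pairs of chosen features.
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(X, y):
    """Greedy forward search (a simplification): add the feature that most
    improves the merit, and stop when no addition improves it."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        score, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected

# Synthetic illustration only: feature 1 is a near-duplicate of feature 0.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x2 = rng.normal(size=500)
x1 = x0 + 0.01 * rng.normal(size=500)
X = np.column_stack([x0, x1, x2])
y = x0 + x2
selected = cfs_forward(X, y)
print(selected)
```

In this toy example the redundant pair (features 0 and 1) should not both be kept, because the second copy raises the feature-feature correlation without adding feature-class correlation.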

Results & Discussion
The attributes chosen by the CFS algorithm were LEOPARD, RD_13BNF, ABratio Level 3, Gender Ratio and Read Code Level. The majority of attributes were not selected by the CFS algorithm because they had a high correlation with either LEOPARD or RD_13BNF. The results show that the specificity attributes Gender Ratio and Read Code Level can complement the temporal and strength attributes for ADR signalling. The experiment and dosage attributes investigated in this paper did not offer sufficient additional information beyond what could be gained from the RD_13BNF or the LEOPARD attributes.

Table 1 (excerpt): attribute (criterion) - description.
• ABratio Level 2 (Temporality) - how often the level 2 version of the medical event is recorded after the prescription compared to before.
• ABratio Level 3 (Temporality) - how often the level 3 version of the medical event is recorded after the prescription compared to before.
• LEOPARD [4] (Temporality) - 1 if the drug is prescribed significantly more after the medical event than before, 0 otherwise.
• OE_filt1 [3] (Temporality) - 1 if the IC_Δ is greater the month before the drug than the month after, 0 otherwise.
• OE_filt2 [3] (Temporality) - 1 if the IC_Δ is greater on the day of prescription compared to the month after, 0 otherwise.
• Experiment - number of patients that have the medical event in two distinct hazard periods and not in their non-hazard periods, divided by the occurrence in the non-hazard periods.
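As an illustration of the after/before temporality attributes, the sketch below counts how often a medical event is recorded after a prescription compared to before, at a chosen Read code level. It is only a sketch under stated assumptions: the 30-day window, the exclusion of same-day events, the handling of a zero denominator, and the names `truncate_read_code` and `ab_ratio` are all hypothetical rather than the definitions used in this paper, and Read codes are assumed to be 5-character codes padded with '.', so that the level n version keeps the first n characters.

```python
from datetime import date

def truncate_read_code(code, level):
    """Assumed convention: keep the first `level` characters of a
    5-character Read code and pad the rest with '.'."""
    return code[:level].ljust(5, ".")

def ab_ratio(records, target_code, level, window_days=30):
    """records: iterable of (prescription_date, event_date, read_code).
    Returns (#events after) / (#events before) within the window."""
    target = truncate_read_code(target_code, level)
    after = before = 0
    for rx, ev, code in records:
        if truncate_read_code(code, level) != target:
            continue  # a different medical event at this level
        delta = (ev - rx).days
        if 0 < delta <= window_days:
            after += 1
        elif -window_days <= delta < 0:
            before += 1
        # Same-day events (delta == 0) are ignored in this sketch.
    return after / before if before else float("inf")

# Made-up records for illustration only.
rx = date(2010, 1, 15)
records = [
    (rx, date(2010, 1, 20), "G30.."),
    (rx, date(2010, 2, 1), "G301."),
    (rx, date(2010, 1, 5), "G30X."),
    (rx, date(2010, 1, 25), "H33.."),
    (rx, date(2010, 2, 10), "G302."),
]
ratio = ab_ratio(records, "G30..", level=3)
print(ratio)
```

A ratio well above 1 would indicate that the event tends to follow the prescription rather than precede it, which is the direction expected of a side effect.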