Identification of incident poisoning, fracture and burn events using linked primary care, secondary care and mortality data from England: implications for research and surveillance

Background English national injury data collection systems are restricted to hospitalisations and deaths. With recent linkage of a large primary care database, the Clinical Practice Research Datalink (CPRD), with secondary care and mortality data, we aimed to assess the utility of linked data for injury research and surveillance by examining recording patterns and comparing incidence of common injuries across data sources. Methods The incidence of poisonings, fractures and burns was estimated for a cohort of 2 147 853 0–24 year olds using CPRD linked to Hospital Episode Statistics (HES) and Office for National Statistics (ONS) mortality data between 1997 and 2012. Time-based algorithms were developed to identify incident events, distinguishing between repeat follow-up records for the same injury and those for a new event. Results We identified 42 985 poisoning, 185 517 fracture and 36 719 burn events in linked CPRD-HES-ONS data; incidence rates were 41.9 per 10 000 person-years (95% CI 41.4 to 42.4), 180.8 (179.8–181.7) and 35.8 (35.4–36.1), respectively. Of the injuries, 22 628 (53%) poisonings, 139 662 (75%) fractures and 33 462 (91%) burns were only recorded within CPRD. Only 16% of deaths from poisoning (n=106) or fracture (n=58) recorded in ONS were recorded within CPRD and/or HES records. None of the 10 deaths from burns were recorded in CPRD or HES records. Conclusions It is essential to use linked primary care, hospitalisation and deaths data to estimate injury burden, as many injury events are only captured within a single data source. Linked routinely collected data offer an immediate and affordable mechanism for injury surveillance and analyses of population-based injury epidemiology in England.


BACKGROUND
Injuries remain an important preventable cause of morbidity, hospitalisation and health inequality among children and young people in England. [1][2][3][4] Understanding the burden of injuries is important for health service planning and for targeting preventative interventions at those at greatest risk. Despite this, estimating injury burden in England remains a challenge because data collection systems are fragmented and there is no national surveillance system. Most existing injury studies have relied on single data sources, [5][6][7] such as emergency department (ED) or hospitalisation data, and so underestimate injury burden, as injuries seen in primary care or minor injury units are not captured. With recent linkage of a primary care research database to hospitalisation and mortality data, there is new potential to build a more complete picture of the epidemiology of injuries in England. We aimed to estimate population incidence figures for three common childhood injuries (poisonings, fractures, burns) by developing methods to define incident injury events across linked data. We focused on poisonings, fractures and burns as they are three of the commonest injuries of childhood and adolescence, 8 9 and have been highlighted as priorities for prevention among under 5s in England. 1 We also describe the recording of injury mechanisms and intent according to data source in order to assess the utility of these data for injury surveillance and future studies of injury burden and prevention.

Data sources
We used three routinely collected data sources from England: the Clinical Practice Research Datalink (CPRD), Hospital Episode Statistics (HES) and Office for National Statistics (ONS) mortality data. The CPRD is a longitudinal primary care research database containing the anonymised demographic, medical, prescription and lifestyle data of >15 million patients from the UK. 10 Within the UK, healthcare is available free at the point of access via the National Health Service (NHS), with about 98% of the population registered with a general practitioner (GP). 11 Diagnostic and lifestyle information are recorded in the electronic primary care record using Read codes, 12 with information received from secondary and tertiary care (eg, ED attendances, hospitalisations and specialist unit admissions) also coded in the medical record. Previous studies have shown high levels of transcription of diagnostic information from hospital discharge records and outpatient clinic letters into the electronic record, 13 14 although the completeness of injury recording is unknown. CPRD undergoes quality checks to ensure only high-quality data are used for research.
The HES dataset contains details of all emergency and elective inpatient admissions (of any duration) to NHS hospitals in England, including care paid for by the NHS but delivered by independent or private treatment centres. Diagnoses and procedures are coded using the International Classification of Diseases 10th revision (ICD-10) and the Office of Population Census and Surveys V.4 (OPCS-4), respectively. The ONS mortality dataset contains the date and cause of death (coded using ICD-10) for all deaths registered in England.

Linkage of data sources
Linkage of CPRD, HES and ONS mortality data was carried out by a trusted third party prior to anonymisation using the patient's NHS number, gender and date of birth. Linkage is currently available for 375 (55%) of the general practices submitting data to CPRD, representing about 5% of general practices in England. While general practices participate in CPRD on a voluntary basis, a previous comparison of the CPRD-HES linked practices to demographic data for the UK has shown broadly similar age and sex structures. 15 Infants, young adults and practices from the North East, East Midlands and Yorkshire are slightly underrepresented within CPRD-HES linked data; 15 this is likely to relate to delayed GP registration of infants after birth, changes in life circumstances among young adults (eg, moving home, going to university) and regional variation in the uptake of the Vision clinical software system required for participation in the CPRD.

Study population
Using these data sources, we carried out an open cohort study of children and young people aged 0-24 years old who were registered at general practices participating in the CPRD between 1 April 1997 and 31 March 2012, and who also had linked HES and ONS mortality data. For each subject, the entry date to the study was the latest of their date of birth, their practice registration date, the date the practice met CPRD data quality standards and the date from which linked HES data were available (1 April 1997). Patients left the cohort at the earliest of 31 March 2012, the date of death, the date they reached the age of 25, the date they changed general practice and the date the practice stopped participating in CPRD. Follow-up ceased when a patient changed general practice, as patients do not retain their unique identifiers within the database.
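The entry and exit rules above amount to taking the latest of several start dates and the earliest of several end dates. The following is a minimal sketch only; the function and argument names are hypothetical, and treating the 25th birthday as the exit date for "reached the age of 25" is an assumption.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_START = date(1997, 4, 1)   # date from which linked HES data were available
STUDY_END = date(2012, 3, 31)    # end of the study period


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def follow_up(
    birth: date,
    registered: date,
    practice_up_to_standard: date,
    death: Optional[date] = None,
    transferred_out: Optional[date] = None,
    practice_left_cprd: Optional[date] = None,
) -> Optional[Tuple[date, date]]:
    """Return (entry, exit) dates, or None if there is no eligible follow-up."""
    # Entry: the latest of the candidate start dates.
    entry = max(birth, registered, practice_up_to_standard, STUDY_START)
    # Exit: the earliest of the candidate end dates that apply to this patient.
    candidates = [STUDY_END, add_years(birth, 25)]
    candidates += [d for d in (death, transferred_out, practice_left_cprd) if d]
    exit_date = min(candidates)
    return (entry, exit_date) if entry < exit_date else None


# A child born in June 2000 who registers in August 2000 and changes practice in 2005.
print(follow_up(date(2000, 6, 15), date(2000, 8, 1), date(1999, 1, 1),
                transferred_out=date(2005, 3, 10)))
```

Person-years for an individual would then be the difference between the two dates, summed over the cohort to give the denominator for incidence rates.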

Defining injury outcomes
We extracted all injury records for poisonings, fractures and burns using a comprehensive Read code list for CPRD, and ICD-10 code lists for HES and ONS mortality data. In addition, treatment procedures (eg, fixation of fracture) were extracted from HES using an OPCS-4 code list. As the primary cause of injury death is recorded in England using external cause codes (ICD-10 V01-Y98), we examined all causes of death recorded per child to identify fracture, poisoning or burn events. For example, the primary cause of death could be a transport accident, but the death would be classified as a fracture case if a child had sustained multiple fractures.
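As an illustration of scanning every cause of death rather than only the primary (external) cause, the sketch below flags a death record as a poisoning, fracture or burn case if any recorded ICD-10 code falls in the relevant range. The code prefixes are an illustrative assumption based on the standard ICD-10 blocks (T36-T65 for poisoning and toxic effects, T20-T32 for burns and corrosions, and a handful of S-chapter fracture categories); they are not the study's actual code lists, which are not reproduced here.

```python
from typing import Iterable, Set

# Illustrative ICD-10 prefixes only; NOT the authors' code lists.
INJURY_PREFIXES = {
    "poisoning": tuple(f"T{n}" for n in range(36, 66)),   # T36-T65
    "burn": tuple(f"T{n}" for n in range(20, 33)),        # T20-T32
    "fracture": ("S02", "S12", "S22", "S32", "S42",
                 "S52", "S62", "S72", "S82", "S92"),
}


def injury_types_on_death_record(cause_codes: Iterable[str]) -> Set[str]:
    """Return the injury types mentioned anywhere on a death record."""
    found = set()
    for raw in cause_codes:
        code = raw.upper().replace(".", "")
        for injury, prefixes in INJURY_PREFIXES.items():
            if code.startswith(prefixes):  # str.startswith accepts a tuple
                found.add(injury)
    return found


# Primary cause is a transport accident (V43), but fractures are also recorded.
print(injury_types_on_death_record(["V43.5", "S72.0", "S22.4"]))  # {'fracture'}
```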

Defining incident events
We aimed to identify all incident poisoning, fracture and burn events per child, distinguishing between records for follow-up care and those indicating new events. We counted injury events, as opposed to individually injured sites, such that if a child sustained multiple injuries of the same type (eg, multiple fractures at different sites), this was only counted once.
We first excluded codes referring to complications and past injuries. Second, we used an algorithm, consisting of a series of time-windows, to distinguish between repeat codes for the same event and those for a new event (table 1). Subsequent hospitalisations and primary care codes occurring within the relevant time-window after the first code were considered part of the same injury event. We used a longer time-window for injury events where the first record was a hospitalisation, as injuries requiring admission may be more severe and require longer follow-up. A third time-window determined whether hospitalisations occurring after the event start date referred to the same (eg, readmission) or a new event. For burns, an additional time-window of 2 years was used to account for the small number of children who sustain severe burns requiring multiple grafts. The algorithm thus accounted for simple injury management such as one visit to a GP and complex management involving GP and hospital follow-up. For example, for a child initially admitted to hospital with a fracture, any CPRD records occurring within 26 weeks of this admission were considered the same event. A CPRD record occurring after 26 weeks of this event date was considered a new event. Further detail is given in online supplementary file 1.

Table 1
Burns (weeks)
Time-window 1: from first to subsequent code in CPRD. Time from the start date of injury event, ie, when the first code for the injury event was recorded in primary care (CPRD). Codes recorded in primary care within this time-window were considered the same injury event.
Rationale: Time-window used to exclude codes likely to be follow-up care recorded in primary care.
2 6 3
Time-window 2: from first code in HES to subsequent code in CPRD. Time from the hospital discharge date. Used in cases where the first code of the injury event was a hospital admission recorded in HES. Codes recorded in primary care within this time-window were considered the same injury event.
Rationale: An injury leading to hospital admission may be more severe, and require longer follow-up after discharge. Time from discharge used to account for injuries requiring prolonged hospital admission.
Time-window 3: Time-window used to distinguish whether a subsequent hospital admission could indicate the same (eg, hospital transfer, readmission) or a new injury event.
2 6
Time-window 4 (burns only): from first CPRD or HES record to procedural codes for skin grafts. Time from the start date of injury event (whether recorded in CPRD or HES) to codes for skin grafts recorded in CPRD or HES.
Rationale: Time-window used for burns to account for a small number of children with prolonged follow-up and multiple graft procedures following a severe burn.
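The time-window logic can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: it anchors every comparison on the first record of the current event, omits the burns-only skin graft window, and uses hypothetical names throughout. Of the window lengths in the example, only the 26 weeks for primary care records after a fracture admission is stated in the text; the other two values are placeholders.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Record:
    source: str                       # "CPRD" or "HES"
    start: date                       # consultation or admission date
    discharge: Optional[date] = None  # HES admissions only


@dataclass
class Windows:
    cprd_after_cprd: int   # weeks; first code in CPRD, later code in CPRD
    cprd_after_hes: int    # weeks from discharge; first code in HES, later code in CPRD
    hes_after_start: int   # weeks from event start to a later admission


def same_event(first: Record, rec: Record, w: Windows) -> bool:
    """Decide whether `rec` is follow-up for the event that began with `first`."""
    if rec.source == "HES":
        return rec.start <= first.start + timedelta(weeks=w.hes_after_start)
    if first.source == "HES":
        anchor = first.discharge or first.start
        return rec.start <= anchor + timedelta(weeks=w.cprd_after_hes)
    return rec.start <= first.start + timedelta(weeks=w.cprd_after_cprd)


def group_events(records: List[Record], w: Windows) -> List[List[Record]]:
    """Group one child's records for one injury type into incident events."""
    events: List[List[Record]] = []
    for rec in sorted(records, key=lambda r: r.start):
        if events and same_event(events[-1][0], rec, w):
            events[-1].append(rec)   # follow-up care for the current event
        else:
            events.append([rec])     # first record of a new event
    return events


fracture = Windows(cprd_after_cprd=6, cprd_after_hes=26, hes_after_start=26)
records = [
    Record("HES", date(2005, 1, 10), discharge=date(2005, 1, 12)),
    Record("CPRD", date(2005, 3, 1)),   # within 26 weeks of discharge
    Record("CPRD", date(2005, 9, 1)),   # more than 26 weeks after discharge
]
print([len(e) for e in group_events(records, fracture)])  # [2, 1]
```

Doubling every window, as a sensitivity check would, merges the third record into the first event in this toy example, which shows how the choice of window length moves the event count.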
We defined time-windows for each injury type by plotting the rates of relevant injury codes entered in CPRD or HES after the first injury code (see online supplementary file 1). The point at which the rate plateaued was used to define the end of the time-window during which all injury-related codes related to the first code. Clinical plausibility was also taken into account; for example, relatively short time-windows were chosen for poisonings, as repeat self-poisonings commonly occur within 2-3 months of the initial event, 16 and poisoning hospitalisations are most likely to be incident events. 17

Defining mechanism and intent of injury
Understanding how an injury occurred (the mechanism, eg, fall) and whether an injury was intentional or unintentional (the intent) is important when identifying and implementing prevention strategies. 18 For each hospitalisation and death, ICD-10 codes V01-Y36, Y90-Y98 were extracted to assess the proportion of events with a documented mechanism and intent, classifying intent as unintentional, intentional or undetermined. For events recorded in CPRD, we extracted relevant Read codes (mapped to ICD-10 V01-Y36, Y90-Y98) for those who had sustained a poisoning, fracture or burn. We assessed the proportion of injury events where a code referring to a mechanism or intent had been recorded on the same day as a code for that injury type.
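One way to derive the three intent categories from ICD-10 external cause codes is sketched below. The block boundaries follow the standard ICD-10 chapter structure (V01-X59 accidents, X60-X84 intentional self-harm, X85-Y09 assault, Y10-Y34 undetermined intent). Grouping self-harm and assault together as "intentional", and leaving Y35-Y36 and Y90-Y98 unclassified, are assumptions made for illustration; the paper does not spell out its mapping, and the function name is hypothetical.

```python
from typing import Optional


def intent_from_external_cause(code: str) -> Optional[str]:
    """Map an ICD-10 external cause code to an intent category, if it has one."""
    category = code.upper().replace(".", "")[:3]   # eg "X61" from "X61.0"
    if len(category) < 3 or not category[1:].isdigit():
        return None
    key = (category[0], int(category[1:]))         # eg ("X", 61)
    if ("V", 1) <= key <= ("X", 59):
        return "unintentional"    # accidents
    if ("X", 60) <= key <= ("Y", 9):
        return "intentional"      # self-harm and assault (assumed grouping)
    if ("Y", 10) <= key <= ("Y", 34):
        return "undetermined"
    return None                   # not an intent-bearing external cause code


for example in ("W01", "X61", "Y04", "Y14", "T42.4"):
    print(example, intent_from_external_cause(example))
```

Applying the same function to Read codes would first require the Read-to-ICD-10 mapping mentioned above, which is not shown here.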

Statistical analyses
For each data source separately and in linked CPRD-HES-ONS data, we counted the number of incident injury events, as indicated by the first primary care, hospitalisation or death record within the relevant time-window(s) (table 1). We identified the proportion of events recorded in both CPRD and HES, and those captured by all three data sources, by identifying those events with records from more than one data source within the relevant time-window.
Incidence rates of poisoning, fracture and burn events overall and by age were estimated per 10 000 person-years (PY), with 95% CIs, using CPRD, HES, ONS mortality data and the three data sources together (CPRD-HES-ONS). We assessed the proportion of injury events with a mechanism and intent recorded, and conducted a sensitivity analysis to assess the impact of doubling the time-windows used to define incident injury events (eg, a time-window of 3 weeks was extended to 6 weeks in the sensitivity analysis). All data management was conducted using Stata V.13.0.
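For readers unfamiliar with the calculation, an incidence rate per 10 000 PY and an approximate 95% CI can be computed as below. This is a sketch using a normal approximation to the Poisson distribution; the method the authors used for their CIs is not stated in the text, and the event count and person-years in the example are made-up values, not study data.

```python
import math
from typing import Tuple


def incidence_per_10000(events: int, person_years: float,
                        z: float = 1.96) -> Tuple[float, float, float]:
    """Return (rate, lower, upper) per 10 000 person-years.

    The CI treats the event count as Poisson, so its standard error is
    sqrt(events); a normal approximation is applied, which is reasonable
    for large counts only.
    """
    scale = 10_000 / person_years
    rate = events * scale
    half_width = z * math.sqrt(events) * scale
    return rate, rate - half_width, rate + half_width


# Hypothetical inputs: 100 events observed over 10 000 person-years.
rate, lower, upper = incidence_per_10000(100, 10_000)
print(f"{rate:.1f} per 10 000 PY (95% CI {lower:.1f} to {upper:.1f})")
```

For small counts, such as the injury deaths, an exact Poisson interval would be more appropriate than this approximation.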

RESULTS
For the study period 1997-2012, there were 2 147 853 children and young people aged 0-24 within the CPRD database who had linked HES and ONS mortality data and were eligible for inclusion in the study. Of the study cohort, 1 049 150 (49%) were male and 1 098 703 (51%) were female (table 2). Similar proportions of the population were from each socioeconomic quintile, with some underrepresentation of quintiles 3 (18.6%) and 5 (19.1%). A high proportion (35.0%) of the study cohort had missing ethnicity data. Those from white (55.9%) and Asian (3.6%) ethnic groups were underrepresented compared with the 2011 Census (86% and 7.5%, respectively), 19 with the missing group likely to largely represent those of white ethnicity. The regions of England contributing the highest proportions of subjects were the North West (15.8%) and London (15.5%). Median follow-up was 3.2 years (IQR 1.3-7.5).

Injury events according to data source
Over the study period, we identified 42 985 poisoning, 185 517 fracture and 36 719 burn events for the population in linked CPRD-HES-ONS data (table 3). This compared to 34 091, 169 491 and 35 049 events, respectively, when using CPRD alone. A total of 106 children were identified in ONS mortality data with a recorded cause of death from poisoning, 58 from fracture and 10 from burns. Among those who died, most were aged 15-24 (97.3% of poisonings, 88% of fractures and 70% of burns). When using linked CPRD-HES-ONS data, the proportions of events identified in each data source varied by injury type (figure 1). For poisonings, 52.6% of events were only identified using CPRD, as were 75.3% of fracture and 91.1% of burn events. Compared to using CPRD alone, the addition of HES data increased the ascertainment of injury events for each injury type, with the greatest relative impact for poisonings. Of the children who died with one of these injury types, 17 (16.0%) poisonings and 9 (15.5%) fractures were identified in CPRD and/or HES. None of the 10 children who died from burns were identified in CPRD and/or HES when using our code lists.
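The percentages in this paragraph can be reproduced from the event counts reported in the abstract and above. The short check below uses only those published counts; the helper name is hypothetical.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `whole` made up by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


# Events recorded only in CPRD, out of all events in linked CPRD-HES-ONS data.
cprd_only = {
    "poisoning": pct(22_628, 42_985),
    "fracture": pct(139_662, 185_517),
    "burn": pct(33_462, 36_719),
}
print(cprd_only)  # {'poisoning': 52.6, 'fracture': 75.3, 'burn': 91.1}

# Deaths in ONS that were also identified in CPRD and/or HES.
print(pct(17, 106), pct(9, 58))  # 16.0 15.5
```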

Incidence according to data source
Overall incidence rates for the study period were 41.9/10 000 PY (95% CI 41.4 to 42.4) for poisonings, 180.8/10 000 PY (95% CI 179.8 to 181.7) for fractures and 35.8/10 000 PY (95% CI 35.4 to 36.1) for burns (table 3). For each injury type, estimated incidence was higher in linked CPRD-HES-ONS data than when using CPRD alone (non-overlapping 95% CI), with rates 26%, 10% and 5% higher for poisonings, fractures and burns, respectively. Figure 2 shows incidence rates for the three injury types by data source and child age. Poisoning incidence peaked at age 2 and again at 18 years old, compared with single peaks in incidence for fractures and burns at 13 and 1 years old, respectively.

Sensitivity analysis: extending time-windows to define events
We identified 42 214 poisoning, 180 202 fracture and 36 425 burn events in the sensitivity analysis; 98%, 97% and 99%, respectively, of the events identified in the primary analysis. CIs overlapped between incidence rates by injury type and child age in the primary and sensitivity analyses (see online supplementary file 2). Proportions of events identified in CPRD alone, HES alone and in both data sources did not vary from the primary analyses.

DISCUSSION
We report the first UK study to estimate injury incidence using linked primary care, hospitalisation and mortality data, with methods developed to define incident events across these linked data. We have demonstrated that it is essential to use multiple data sources to provide more complete estimates of injury incidence, as many injury events are only captured within a single data source. For example, only one in six deaths from poisonings and fractures were captured by primary and/or secondary care data.

Strengths and limitations
The main strengths of our study are the large study size and use of linked prospectively recorded data to identify events. By defining incident events within linked data, we were able to include multiple events per child over time; an issue of importance in estimating injury burden and for surveillance. In addition, with universal healthcare coverage, and emergency care almost exclusively delivered by the NHS, 20 we are unlikely to have substantially underestimated injury incidence as a result of injuries being seen within the private medical sector. Linked CPRD-HES-ONS data are broadly representative of the UK population 15 and remain the most complete and accurate method currently available in England to estimate injury incidence. While there is some underrepresentation of certain groups within linked CPRD-HES data, 15 it is reassuring that our study cohort is evenly distributed by socioeconomic status (between 18.9% and 20.9% in each quintile), an important factor affecting injury incidence. Underrepresentation of young adults and those from the North East, East Midlands and Yorkshire could lead to some underestimation of injury incidence, as rates tend to be higher in these groups. 9 21 Ongoing recruitment of practices to CPRD and plans for future widespread access to primary care data across the UK 22 should increase population and geographical coverage.
Figure 1 Numbers and percentages of poisoning, fracture and burn events identified in primary care (Clinical Practice Research Datalink (CPRD)), secondary care (Hospital Episode Statistics (HES)) and deaths (Office for National Statistics mortality) data. *Numbers of children who died from a poisoning, fracture or burn, where this information was also recorded within CPRD, HES or both CPRD and HES, have not been shown in these diagrams due to the ethical constraint of reporting the small numbers involved.
By using a time-based algorithm, we may have erroneously treated an injury code as a continuation of the same event or, conversely, treated a follow-up code as a new event. However, our sensitivity analysis showed that even when these time-windows were doubled, incidence by child age was similar to the primary analysis for all injury types.
Within the UK, standardised national ED data are still in development with provisional datasets incomplete and of varying quality. 23 The lack of ED data linked to CPRD-HES-ONS means it is likely we have not identified all injury events, including those recorded in CPRD using non-specific codes (eg, seen in ED) or within the free text of the medical record. Quantifying the number of ED attendances not captured within CPRD is a challenge. Until 2002, the Home and Leisure Accident Surveillance System (HASS/LASS) captured injury occurrences from a sample of 16-18 UK EDs. For fractures, an injury where a high proportion of the burden is captured in ED data, HASS/LASS estimated incidence as 221/10 000 for the time-period 1997-2002, which compares to our estimate of 155/10 000 for this period, indicating we may be underestimating fracture incidence by about 40%. Comparisons for poisonings and burns are more complex due to differences in definitions, and the proportion of the injury burden captured in ED data (see online supplementary file 3). While it is likely we are underestimating incidence, there is little evidence to suggest that GPs differentially record injury occurrences (eg, by age, sex, region) and our results still provide vital information on the trends and patterns of different injury types.
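The size of the possible underestimate depends on which rate is used as the reference. The short calculation below uses only the two fracture rates quoted above; the "about 40%" figure appears to correspond to the HASS/LASS rate being roughly 43% higher than the linked-data rate, whereas the shortfall expressed relative to HASS/LASS is about 30%. This reading is an interpretation, not something stated in the source.

```python
linked_rate = 155.0   # per 10 000, linked CPRD-HES-ONS data, 1997-2002
hass_rate = 221.0     # per 10 000, HASS/LASS, 1997-2002

difference = hass_rate - linked_rate

# HASS/LASS rate relative to the linked-data estimate.
excess_over_linked = round(100 * difference / linked_rate, 1)

# Linked-data shortfall relative to the HASS/LASS estimate.
shortfall_vs_hass = round(100 * difference / hass_rate, 1)

print(excess_over_linked)  # 42.6
print(shortfall_vs_hass)   # 29.9
```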
Quantifying the severity of injuries, and the number of children sustaining multiple injuries within the same event (eg, multiple fractures) is complex in CPRD, due to, for example, the use of non-specific codes (eg, 43% of the fracture Read codes used did not specify an anatomical site) and potential selective recording of the most severe injuries by GPs. As injury severity and the number of injury types and body regions injured affect functional and health status outcomes, 24 quantifying this injury burden is important. Future studies using linked data will need to develop methods to systematically identify children sustaining multiple injuries and assess the feasibility of defining injury severity within these data.
Read codes specifying a mechanism and/or intent were infrequently used within CPRD for the injury types we assessed. Further information may be recorded within the free text of the record or coded in alternative ways (eg, in suspected intentional injury a code such as 'referral to social services' may be used, 25 without an injury code). For these reasons, we may have underestimated the recording of mechanisms and intent in primary care data, although our finding does reflect data that can be routinely extracted from CPRD and that corresponds to the ICD-10 external cause codes.

Comparison with previous studies
Our estimate of fracture incidence (180.8/10 000 PY) and observed patterns according to age are consistent with other UK studies, 9 26 27 although lower than HASS/LASS. Rennie et al 26 used a hospital-based database, estimating fracture incidence as 202/10 000 for 0-16 year olds living in Edinburgh. In this study, 1.2% of children sustained multiple fractures, with each fracture included separately, which in part explains our lower incidence of 192/10 000 for this age group. Similarly, Lyons et al 27 used a Welsh database of ED attendances, giving an incidence of 361/10 000 for 0-14 year olds. This compares to our estimate of 187/10 000 for children of this age, which may in part be explained by regional variation in fracture incidence between Wales and England. 9 For poisonings, the majority of UK studies come from individual hospital sites, focusing on hospitalisation or ED rates. 28 29 We found poisoning incidence peaked at ages 2 and 18 years, likely to reflect the different aetiologies of poisoning in preschool children compared with young people (unintentional vs intentional). 30 A study using ED data from the USA estimated poisoning incidence for children aged 0-4 as 42.9/10 000, consistent with our estimate of 46.3/10 000 for this age group. 31 In addition, rates from a US surveillance system for 10-19 year olds were 61.8/10 000, 32 which compares to 46.8/10 000 for this age group in our study.
Most UK studies of burns incidence focus on those requiring admission or specialist burns care, giving rates considerably lower than our estimates. [33][34][35][36] A study using a surveillance system in Massachusetts gave an estimated burns incidence of 50/10 000 for 0-19 year olds, 37 broadly consistent with our estimate when taking account of changes in burns incidence over time 33 and that we may not have captured all burns ED attendances.

Implications for research and practice
We have demonstrated that it is essential to use linked data sources to build a more complete picture of injury burden, indicating that future injury studies using primary care research databases should use linked hospitalisation and mortality data. While there have been plans to establish injury surveillance and reporting systems across Europe using a standardised ED dataset, these systems and the funding to implement them are yet to be in place in the UK. 38 Current reliance on hospitalisation and mortality data means that much of the injury burden within England is not accounted for within health service or injury prevention planning. While injuries not requiring admission may be less severe, many still have a significant impact in terms of time off work or school, costs of follow-up care and psychological impact. 39 With it now being feasible to link routinely collected primary care, hospitalisation and mortality data, these data offer an immediate and affordable mechanism for injury surveillance in England. In addition, these data offer an inexpensive and efficient mechanism of obtaining outcome data for evaluations of injury prevention interventions. As linked ED data become available in the future, this will increase both the completeness of injury events captured and information about injury mechanism and intent. Recording of injury mechanisms in primary care could be improved by providing guidance on injury recording and simplified mechanism and intent categories (eg, similar to ED minimum datasets). 40 Future research should include the linking of ED data to CPRD-HES-ONS data, extending this work to cover all injury types, and the development of methods to define those sustaining multiple injuries.
*Intent and mechanism defined in CPRD using Read codes corresponding to ICD-10 codes V01-Y36, Y90-Y98. †Intent and mechanism defined in HES and ONS mortality data using ICD-10 codes V01-Y36, Y90-Y98. ‡As poisonings are both a mechanism and injury type, mechanism recording has not been reported here.
What is already known on the subject
▸ Current injury data sources based on hospital admissions and deaths underestimate injury burden, though the extent of this underestimation is unknown.
▸ Large primary care databases have been used for injury epidemiology, taking advantage of the longitudinal medical record maintained over a patient's life.
▸ The extent that injuries seen in secondary care are accurately recorded within primary care data is unknown.
What this study adds
▸ Using primary care, secondary care or mortality data in isolation misses a substantial proportion of injury events.
▸ Future injury studies using large primary care databases should use linked hospital admission and mortality data.
▸ Linked data holds potential for injury surveillance, particularly with plans for widespread access to primary care data in England.
Contributors All authors were involved in the conception and design of the study. RB undertook the analysis and drafted the manuscript. All authors contributed to interpreting the findings and revising the manuscript.
Funding RB is funded by the National Institute for Health Research School for Primary Care Research (NIHR SPCR) and The University of Nottingham.
Disclaimer The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.