External Validation of e‐ASPECTS Software for Interpreting Brain CT in Stroke

Objective The purpose of this study was to test e‐ASPECTS software in patients with stroke. Marketed as a decision‐support tool, e‐ASPECTS may detect features of ischemia or hemorrhage on computed tomography (CT) imaging and quantify ischemic extent using Alberta Stroke Program Early CT Score (ASPECTS). Methods Using CT from 9 stroke studies, we compared software with masked experts. As per indications for software use, we assessed e‐ASPECTS results for patients with/without middle cerebral artery (MCA) ischemia but no other cause of stroke. In an analysis outside the intended use of the software, we enriched our dataset with non‐MCA ischemia, hemorrhage, and mimics to simulate a representative “front door” hospital population. With final diagnosis as the reference standard, we tested the diagnostic accuracy of e‐ASPECTS for identifying stroke features (ischemia, hyperattenuated arteries, and hemorrhage) in the representative population. Results We included 4,100 patients (51% women, median age = 78 years, National Institutes of Health Stroke Scale [NIHSS] = 10, onset to scan = 2.5 hours). Final diagnosis was ischemia (78%), hemorrhage (14%), or mimic (8%). From 3,035 CTs with expert‐rated ASPECTS, most (2084/3035, 69%) e‐ASPECTS results were within one point of experts. In the representative population, the diagnostic accuracy of e‐ASPECTS was 71% (95% confidence interval [CI] = 70–72%) for detecting ischemic features, 85% (83–86%) for hemorrhage. Software identified more false positive ischemia (12% vs 2%) and hemorrhage (14% vs <1%) than experts. Interpretation On independent testing, e‐ASPECTS provided moderate agreement with experts and overcalled stroke features. Therefore, future prospective trials testing impacts of artificial intelligence (AI) software on patient care and outcome are required before widespread implementation of stroke decision‐support software. ANN NEUROL 2022;92:943–957

accuracy, whereas only 14 compared AI with health care professionals in the same sample (none were in stroke). 1 In another review of software to evaluate brain computed tomography (CT) in ischemic stroke with 68 studies, 38 reported insufficient data on stroke, patient demographics, or clinical testing. 3 Within hours of stroke onset, when treatment is most effective, signs of ischemia on CT imaging are often subtle, yet lesion extent may guide treatment decisions. The Alberta Stroke Program Early CT Score (ASPECTS) aids visual assessment by quantifying the extent of middle cerebral artery (MCA) territory ischemic injury (CT hypoattenuation and/or swelling) in 10 regions (a score of 10 is normal, and 0 means the entire territory is affected). 5 Brainomix Ltd. (Oxford, UK) developed AI software (e-ASPECTS) to automatically identify CT features of stroke, including (1) ASPECTS, (2) hyperattenuated MCA (indicating arterial thrombus), and (3) intracranial hemorrhage (ICH).
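The ASPECTS arithmetic described above can be sketched in a few lines. This is a hypothetical illustration only, not the e-ASPECTS implementation: the region abbreviations follow the published ASPECTS template (caudate, lentiform, internal capsule, insula, M1-M6), and the affected-region inputs are invented.

```python
# Hypothetical sketch of ASPECTS scoring: 10 MCA-territory regions,
# one point deducted per region showing early ischemic change.
# Region flags below are illustrative, not e-ASPECTS output.
ASPECTS_REGIONS = ["C", "L", "IC", "I", "M1", "M2", "M3", "M4", "M5", "M6"]

def aspects(affected: set) -> int:
    """Return ASPECTS: 10 (normal) minus one point per affected region."""
    unknown = affected - set(ASPECTS_REGIONS)
    if unknown:
        raise ValueError(f"unrecognized regions: {unknown}")
    return 10 - len(affected)

print(aspects(set()))               # normal scan -> 10
print(aspects({"M1", "M2", "I"}))   # three regions affected -> 7
```

A score of 0 (all 10 regions affected) corresponds to involvement of the entire MCA territory.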
Following a PubMed search (to August 6, 2021) using the company and software names, and a review of evidence published on the company website, we identified 24 studies in English (excluding abstracts) evaluating e-ASPECTS (Supplementary Table S1). The median number of patients in these studies was 125, and over half (14/24) declared financial conflicts of interest with Brainomix. There was no prospective randomized testing. Twenty studies included patients with proven ischemic stroke only, thus precluding the assessment of true negative cases. Most studies excluded poor-quality CT (14 excluded, and 7 did not specify), 17 of 24 did not report software failures, and only 4 of 24 tested the impact of patient or imaging factors on software performance.
We established the "Real-World Independent Testing of e-ASPECTS Software" (RITeS) study to provide a large scale, clinically representative, and objective assessment of e-ASPECTS for identifying relevant features on CT brain imaging in patients with stroke.

Study Design
We used data from 9 completed clinical trials or observational studies of patients with stroke in which CT had been assessed by panels of masked experts and a final diagnosis of stroke type determined. [6][7][8][9][10][11][12][13][14] In a secondary analysis of these prospectively collected data, we processed the CTs using e-ASPECTS to compare the expert scan assessments and final diagnoses with e-ASPECTS results for the detection of acute ischemic features or ICH.
Following development of our research plan, we signed a software licensing agreement with Brainomix for use of e-ASPECTS and paid for the software using academic funds. We agreed to separate testing into 2 types: (1) where software is used on the intended population, and (2) other clinical scenarios where software might be used. We thus used 2 overlapping populations:
1. "Target population": Patients with possible ischemic stroke but no alternative pathology on CT (ie, potential candidates for thrombolysis). Here, we included patients with a final diagnosis of ischemic stroke or stroke mimic without a CT-identifiable cause, and compared ASPECTS provided by experts versus software.
2. "Representative population": To simulate hospital-presenting patients with suspected stroke, we enriched our dataset to include realistic proportions of patients with a final diagnosis of ischemic stroke, ICH, and stroke mimics, and tested the diagnostic accuracy of software versus experts for identifying CT features that might account for stroke symptoms.
We report our results according to Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD), 15 but because our research methods overlap several study designs, we also consider other reporting standards (see Appendices S1-S3).

Patient Population
We analyzed CT brain scans performed soon after stroke onset from 7 national or international multicenter randomized controlled trials (RCTs) and 2 single-center prospective observational studies. These 9 studies recruited patients with acute stroke since May 2000, and one is ongoing. 16 Six included ischemic stroke, 6-11 2 included ICH, 12,14 and one included all stroke or stroke mimics. 13 Of the RCTs, 2 tested thrombolytics, 6,8 one tested imaging strategies, 11 one tested thrombectomy, 9 one tested hypothermia, 7 one tested blood pressure lowering, 13 and one tested antithrombotic drugs after ICH 12 ; of the observational studies, one studied hemorrhagic stroke, 14 and the other studied ischemic stroke. 10 We were unable to secure approval in time to include a tenth study as initially proposed. 16,17 All 9 included studies had research ethical approval and obtained informed consent from all participants.

Clinical Data Assessment
All 9 studies centrally recorded patient demographics, stroke severity, time elapsed from stroke onset to CT, allocated treatment in the RCTs, and functional outcome.
Final diagnosis (ischemic stroke, ICH, and stroke mimic) was determined similarly in each study by central expert event adjudication, which included the local principal investigator's diagnosis, central masked expert panel review of baseline and follow-up imaging, and all other study data.

Sample for RITeS
We estimated that 725 patients were needed to determine whether e-ASPECTS is noninferior to experts (5% noninferiority limit). 16 To improve the precision of diagnostic accuracy estimates and power for subgroup analyses, we increased our sample size by including all baseline CTs available to RITeS. We did not otherwise select patients for inclusion; we did not exclude patients with low-quality imaging, with ischemic lesions outside the MCA territory, or if final diagnosis was stroke mimic.
To assess whether our sample was clinically representative of patients admitted to the hospital with stroke, we prespecified that age, sex, stroke severity, and time since symptom onset in RITeS would be similar to the UK Sentinel Stroke National Audit Programme (SSNAP; April 2018-March 2019, www.strokeaudit.org), pooled RCT, and registry data. 16 For "target population" testing, we excluded patients with hemorrhage or stroke mimic caused by a structural lesion. For "representative population" testing, we included all patients in RITeS.

Expert Image Assessment
Prior to RITeS, the CTs in the original 9 studies had been rated by central expert panels (24 experts in total, with crossover among studies; one expert report per scan), masked to follow-up imaging and most other clinical data. Two studies provided experts with the side affected by stroke, 10,13 and one provided stroke onset time. 13 In 2 studies, experts reviewed CT and concurrent angiography together. 9,11 In the study with the largest contribution to RITeS, experts reviewing CT were masked to all other data. 8 For 6 studies, 7-9,11-13 20 experts performed imaging assessment using the same validated online viewing platform (SIRS 1/2, https://sirs2.ccbs.ed.ac.uk/sirs2). 18 CT was scored for: ASPECTS 5 ; ischemic injuries in all arterial territories (based on visible hypoattenuation and/or swelling of brain); presence of hyperattenuated arteries; ICH location and size; structural mimics; and pre-stroke brain changes (atrophy, leukoaraiosis, and old stroke lesions). 18 CT image quality was recorded as good, moderate, or poor. We have previously tested interrater agreement for 7 experts using SIRS: Krippendorff's alpha (k-alpha) was 0.66 for identifying ischemia, and 0.56 for ASPECTS. 19 Two other ischemic stroke studies in RITeS assessed CT for ischemic brain lesions, ASPECTS, and hyperattenuated arteries only. 6,10 Two RITeS studies evaluating hemorrhagic stroke included assessment of hemorrhage location and size but not ASPECTS. 12,14

Image Software Processing
We processed batches of 10 CT scans using the Digital Imaging and Communications in Medicine (DICOM) format on the cloud-based Brainomix platform (https://brainomix.com, versions 9-10). We selected the earliest scan after stroke for each patient and, to be as close as possible to software specifications, used the thinnest-slice axial plane CT reconstructed for soft tissue viewing.
We recorded all upload and processing outcomes. Where a scan did not process, we made further attempts (with alternative DICOM image sets where available). Processing was considered "successful" when software provided an ASPECTS result or when arterial hyperattenuation or hemorrhage was detected. e-ASPECTS allows users to input the side affected by stroke; we manually entered this information for a subset of the target population (35%, 1,052 of 3,035) where side information was available and compared results before and after. We exported e-ASPECTS results to spreadsheets for analysis. We did not review the e-ASPECTS imaging overlays for every case but inspected batch outputs during processing. We also reviewed imaging overlays when uploading affected-side data, and in cases that did not process normally.
Once CT processing was complete, we randomly selected a subsample of 100 scans that had been successfully processed by e-ASPECTS for repeatability testing, stratified by study. To ensure e-ASPECTS did not recognize recurrent DICOM metadata at repeat testing, we created new unique scan identifiers for this subsample with the modiCAS DICOM anonymizer (Erlangen, Germany).
Primary Outcomes
1. ASPECTS score agreement between experts and e-ASPECTS (including the side affected) in the target population.
2. Diagnostic accuracy of experts and e-ASPECTS for identifying CT features that might account for stroke symptoms (ie, signs of ischemia or hemorrhage) in the representative population, which is outside the intended use of the software.
Secondary Outcomes
1. Proportion of scans successfully processed by e-ASPECTS; factors associated with processing success and accuracy.
2. Repeatability of e-ASPECTS results on the subset of scans presented twice.

Testing and Statistics
We have published the RITeS Statistical Analysis Plan, summarized here. 16 We followed an "intention-to-process" methodology regardless of whether the scan processing was successful.
We principally used diagnostic accuracy statistics to compare e-ASPECTS and expert results. Reference standards varied by test. To assess e-ASPECTS for identifying acute MCA territory ischemic injury at clinically relevant thresholds (ASPECTS 10 vs 0-9; 8-10 vs 0-7; and 6-10 vs 0-5), we used masked expert ASPECTS at baseline as the reference. To compare e-ASPECTS versus masked experts at baseline for identifying features of ischemia (ischemic brain injury or hyperattenuating arteries or both) or ICH as the cause of stroke, we used the final diagnosis as the reference. To account for individual study result clustering and to assess variation within/between contributing studies, we included random-effects bivariate meta-analysis modeling estimates of sensitivity and specificity. We used the Prediction model Risk Of Bias ASsessment Tool (PROBAST) to assess the risk of bias and applicability of our testing. 20 To aid understanding, we summarized results as proportions with or without expert agreement per 100 patients.
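The diagnostic accuracy statistics referred to above reduce to simple ratios from a 2×2 table of index-test results against the reference standard. This is a minimal sketch, with illustrative counts (not RITeS data):

```python
# Minimal sketch of diagnostic accuracy statistics from a 2x2 table
# (index test vs reference standard). Counts are illustrative only.
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / total,   # overall agreement with reference
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Illustrative counts per 200 patients (100 with, 100 without the condition)
stats = diagnostic_accuracy(tp=68, fp=26, fn=32, tn=74)
print({k: round(v, 2) for k, v in stats.items()})
```

With these example counts, sensitivity is 0.68, specificity 0.74, and accuracy 0.71.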
We compared expert and software ASPECTS using Bland-Altman plots and prespecified that scores would be considered "equivalent" if within 2 points and for the same cerebral hemisphere. 16 We assessed expert-software agreement with k-alpha. Both methods assess agreement while controlling for inherent result correlation. To compare with previous work, we used the Matthews Correlation Coefficient (MCC) and assessed noninferiority. 21 We prespecified that e-ASPECTS would be noninferior if the 90% confidence interval (CI) lower limit for the difference (e-ASPECTS minus expert results) was greater than −5%. 16 For assessing factors associated with expert-software agreement of ASPECTS and the diagnostic accuracy of software to detect MCA ischemia (compared to experts), we prespecified test variables and their subgroups. 16 We checked for collinearity in multivariable testing (variance inflation factors >5). We did not impute missing data but report it.
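Two of the comparisons just described can be sketched directly: the MCC from a 2×2 table, and the prespecified noninferiority rule (lower 90% CI limit of the software-minus-expert difference must exceed −5 percentage points). The counts and CI values below are illustrative, not RITeS results:

```python
import math

# Sketch of Matthews Correlation Coefficient and the prespecified
# noninferiority rule. All numeric inputs below are illustrative.
def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """MCC: +1 perfect, 0 chance-level, -1 total disagreement."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

def noninferior(ci_lower_diff: float, margin: float = -0.05) -> bool:
    """ci_lower_diff: lower 90% CI limit of (software - expert) accuracy."""
    return ci_lower_diff > margin

print(round(mcc(tp=68, fp=26, fn=32, tn=74), 2))  # ~0.42
print(noninferior(-0.03))  # True: lower CI limit above -5%
print(noninferior(-0.06))  # False: margin crossed
```

Unlike raw accuracy, MCC is robust to imbalance between true positives and true negatives, which is why several prior e-ASPECTS studies reported it.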
We conducted sensitivity analyses of our primary outcomes for randomly selected subgroups:
1. Of the target population, with
   a. Balanced representation from all RITeS studies, by excluding excess cases from studies with more than double the median case number per trial, and
   b. Hyperattenuating internal carotid or middle cerebral arteries as a surrogate for large vessel occlusion (ie, not randomly selected). 22
2. Of the representative population, where stroke mimics without structural lesions represent 26% of the total (as identified in RIGHT-2 13 ).
For repeatability testing, we assessed the number of matched initial versus repeat e-ASPECTS results.
In the sample where e-ASPECTS was given information on the side affected by stroke (n = 1,052), software was less likely to score the opposite hemisphere from experts with this knowledge (<1%, 3/1,052) than without it (4%, 38/1,052), p < 0.0001.
On sensitivity testing in a subset where 26% (221/849) had a final diagnosis of stroke mimic without a corresponding structural abnormality on CT (thus within software scope), we included 63% (538/849) with a final diagnosis of ischemia and 11% (90/849) with hemorrhage. In this subset, diagnostic accuracy results for e-ASPECTS were almost unchanged: for detection of ischemic signs, software sensitivity was 61% and specificity was 75%; for detection of hemorrhage, software sensitivity was 97% and specificity was 83%.

Figure 4 shows the potential clinical impact per 100 patients assessed using e-ASPECTS:
• With ischemic stroke, ischemia will be correctly detected in 68 but missed in 32.
• Without ischemic stroke, ischemia will be incorrectly detected in 26.
• With ICH, hemorrhage will be correctly detected in 94 but missed in 6.
• Without ICH, hemorrhage will be incorrectly detected in 17.
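Per-100-patient counts like those above follow directly from sensitivity and specificity. A small sketch, using the sensitivity/specificity values implied by the bullet points (68%/74% for ischemia, 94%/83% for hemorrhage):

```python
# Sketch converting sensitivity/specificity into per-100-patient counts
# of the kind shown in Figure 4. Inputs reflect the bullet points above.
def per_100(sensitivity: float, specificity: float) -> dict:
    return {
        "detected_per_100_with_condition": round(100 * sensitivity),
        "missed_per_100_with_condition": round(100 * (1 - sensitivity)),
        "false_alarms_per_100_without_condition": round(100 * (1 - specificity)),
    }

print(per_100(sensitivity=0.68, specificity=0.74))  # ischemia detection
print(per_100(sensitivity=0.94, specificity=0.83))  # hemorrhage detection
```

Presenting accuracy this way makes the trade-off explicit: a false-alarm rate of 26 per 100 patients without ischemia sits alongside the 68 per 100 correctly detected.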
We found low variance in diagnostic accuracy for ischemia and hemorrhage detection within and between contributing RITeS studies (Supplementary Figs S1 and S2).

Secondary Outcomes

Factors Influencing CT Processing Success and Accuracy
Successful processing was most likely with slice thickness of 1 to 5 mm and when experts scored the scan quality as "good." CTs that did not process were more often from older patients who presented to the hospital earlier and were less likely to include ICH (see Table 3). All these variables except age remained significantly associated with processing success on multivariable binary logistic regression including 3,465 patients with complete demographic and CT slice thickness data (data not shown).
Most prespecified variables were associated with differences between experts' ASPECTS and e-ASPECTS on multivariable regression, with increasing patient age, NIHSS, or slice thickness associated with larger score differences. Expert and e-ASPECTS scores were more similar when scans were performed later after stroke, when e-ASPECTS knew the affected side, and when MCA lesions were smaller or ischemic lesions were outside the MCA territory (see Table 4).

Repeatability Testing
There were no differences in e-ASPECTS results on repeat processing for 99 CTs (100% match). One scan was excluded because of operator error (a nonidentical image set was incorrectly uploaded for repeat testing).

Discussion
RITeS is a large independent assessment of e-ASPECTS software for acute stroke CT and includes almost as many patients as all other prior studies combined. We used clinically relevant patients from 9 prospective studies and expert opinion as the reference standard. RITeS and the contributing studies were rigorously conducted to minimize bias. We tested e-ASPECTS according to the manufacturer's guidance: restricted to patients with symptoms of stroke, with or without CT features of ischemia but with other structural abnormalities excluded. We also enriched the dataset to include representative proportions of patients with non-MCA ischemia, hemorrhage, or a final diagnosis of stroke mimic, because this more closely resembles the population hospitalized with suspected stroke. This latter analysis is outside the manufacturer's indications for software use. We found software performance was modified by patient and imaging variables.
Detection of Acute Ischemic Injury
ASPECTS provided by software and experts were reasonably well matched; results were identical for ~50% and within ±1 ASPECTS point for up to 75%. As previously shown, we found e-ASPECTS noninferior to experts in this context. 21 Software was more likely to find abnormalities, but conversely underestimated the size of larger lesions. Differences between experts and e-ASPECTS are most relevant if thresholds are used to exclude patients from thrombectomy (ASPECTS <6). Compared with other thresholds we tested, the diagnostic accuracy of e-ASPECTS was greatest at ASPECTS <6, driven by a high specificity (95%). However, the specificity was slightly reduced (90%) in the subgroup of patients with large vessel occlusion. Our findings suggest that, for patients assessed using e-ASPECTS compared to expert interpretation alone, 4% (134 false positive results from 3,035) might be miscategorized as ASPECTS <6 and potentially denied highly effective therapy. Two previous studies showed similar results for e-ASPECTS <6, with misclassifications of 1.6 to 3.4% in smaller (n ~60) cohorts. 24,25 We did not use concurrent CT perfusion or diffusion-weighted magnetic resonance imaging (MRI) to define a "ground truth." 26-28 Therefore, software may identify subtle ischemic injury not appreciated by experts. Indeed, software may be more sensitive than experts (68% vs 58%) for correctly detecting ischemic stroke features. However, any improvement in sensitivity is tempered by increased software false positive results compared with experts (12% vs 2%) and, consequently, lesser software specificity (74% vs 95%). The diagnostic accuracy of e-ASPECTS for detecting MCA ischemia was lower in older patients and in those with more severe strokes and larger infarcts, all non-modifiable features encountered in patients eligible for thrombectomy.
However, the diagnostic accuracy of software can be improved if CT image slices are thin (≤1 mm) and when e-ASPECTS is provided with the side affected by stroke; e-ASPECTS was more likely to score the "wrong" hemisphere when the affected side was unknown. These are simple modifiable factors that users can optimize, assuming a degree of vigilance to avoid side errors.
Six studies (median n = 98) include diagnostic accuracy results for e-ASPECTS with expert reference standards comparable to RITeS: sensitivity 14 to 83%, specificity 57 to 99%, and accuracy 67 to 87%. 21,28-32 One study used an ASPECTS threshold as we did; 32 the others considered ischemic detection per ASPECTS region for a summed score (ie, 10 × n). However, 2 studies using summed scores did not control for interdependency between different ASPECTS regions in the same patient. 29,30 As an alternative to accuracy, 3 of 6 studies assessed MCC, citing benefits for testing datasets with true positive/negative imbalance. 21,31,32 Our MCC results are similar (0.34-0.48). For testing agreement between software and experts, our k-alpha results are comparable with other validated reader-reliability scoring methods used in 6 studies (median n = 153): kappa (0.25-0.84) 25,33 ; intraclass correlation coefficient (0.47-0.87). 33-37 For all these comparisons, our results tend toward the mid-lower end of published ranges; we hypothesize that the broad representation of our dataset (even in our target population) explains this. Four of the 12 studies discussed here omitted details of the time elapsed since stroke onset. Elapsed time is critical because ischemic brain lesion visibility on CT (and therefore ease of detection) increases with time. For wider context, in a previous systematic review we explored agreement between human readers and similar AI software from different manufacturers that also automates ASPECTS. In that analysis, we identified comparable results from three studies (total n = 609) assessing only one other software; the range of results was similar (intraclass correlation coefficient = 0.45-0.53). 3 Additionally, at least one small analysis (n = 52) directly compared e-ASPECTS with another similar software and found no significant difference between them. 29
Detection of Hemorrhage
For acute hemorrhage detection, e-ASPECTS tends to over- rather than under-call. Where the volume of apparent hemorrhage was small, e-ASPECTS commonly identified both ischemic and hemorrhagic features on the same scan (ischemic results are suppressed if ≥4 ml of hemorrhage is detected). Greater software sensitivity to "hyperdense volumes which may indicate bleeding" compared with experts (14% by software here) might trigger additional expert radiology review and thus delay or deny (if expert opinion is not available) appropriate thrombolysis delivery, potentially limiting treatment-related improvements in patient outcomes. False negative hemorrhage detection (1%) could cause significant clinical worsening if patients with hemorrhage are inappropriately thrombolyzed. Most mimic patients in RITeS had no alternative CT lesion, but some did. Under these conditions (which are beyond software indications for use), there was greater false positive detection. Note, however, that these results did not differ on sensitivity testing with structural mimics excluded.
The potential clinical impact of our diagnostic accuracy results is summarized in Figure 4. Although e-ASPECTS correctly classifies many CTs with and without ischemia (~70-75%) or hemorrhage (~85-95%), a substantial proportion of scans is misclassified compared to the final diagnosis. In general, software was better at excluding than at identifying stroke imaging features correctly (greater negative predictive values), driven by higher false positive rates compared with experts. In most analyses, experts performed better, except for true ischemic feature detection, where software correctly identified more. Thus, whereas experts may find e-ASPECTS useful for detecting subtle ischemia, they should be aware of false positive feature detection in particular, and should always independently assess the CT for hemorrhage. Therefore, we recommend that if e-ASPECTS is used, it is used strictly as approved by US and European authorities, that is, to support users who are already competent at interpreting stroke imaging, 38 although it remains to be proven whether and how this support is helpful in real-time clinical practice. We have not tested the accuracy of combined software-expert opinion or whether it is better than expert opinion alone. The performance of experienced but non-expert clinicians with and without software is also relevant. A previous analysis of 16 readers who reviewed 60 CT scans with and without e-ASPECTS found that ASPECTS from both expert and non-expert reader groups were more similar to 24-hour gold-standard scores when using the software. 24 This observation, and particularly its impact on care, requires prospective testing.

Impact of Patient and Imaging Factors
We found that image quality and CT slice thickness are important for successful software processing. It is unclear why scans with acute hemorrhage were more likely to process than those without; differences in image acquisition methods between hemorrhagic and ischemic stroke studies in RITeS may contribute. Nearly all clinical and imaging variables we tested modified the agreement between expert and software ASPECTS, as shown previously for slice thickness and the presence of background brain changes. 30,39 We were surprised that pre-stroke brain changes visible on CT (atrophy, leukoaraiosis, and old stroke lesions) were not similarly negatively associated with expert-software agreement in RITeS, although they may help explain the effect of age. However, the previous analysis did not compare software and experts directly as we did, but compared both groups against a gold standard in a much smaller sample (n = 119). 30 Given the high prevalence of these pre-stroke features in elderly stroke populations (50-80% had at least one of these findings in RITeS), future assessments of AI software for stroke should also investigate their impact. Image quality did not affect human-software agreement, but fewer poor-quality scans were successfully processed, so these were unavailable for comparison.

Strengths and Limitations of RITeS
According to our prespecified standard, RITeS scans represent typical populations in whom e-ASPECTS may be used. 16 Background radiological features in RITeS are comparable with published elderly population data. 40 We controlled for between-study differences. Using PROBAST, we found RITeS data to be at low risk of bias and appropriate for validation testing of e-ASPECTS. We had more scans successfully processed by e-ASPECTS than other studies using existing data (90% vs 69%). 41 We used expert interpretation of imaging for comparison with software, which does not represent routine care and was not undertaken in real time. Interpretation of non-enhanced CT in acute stroke is challenging, and for features such as the presence of ischemia, even experts disagree, particularly when clinical information is not available, as for the majority of expert imaging assessments in RITeS. 5,19 We feel it is appropriate to compare AI software against a gold standard given the expectation that AI performs similarly to, or enhances, best practice. There is a risk of incorporation bias in RITeS because the index test (baseline CT) was used to derive the reference standard (final diagnosis). However, this risk is likely small because, for most of our patients, follow-up information was more influential than baseline imaging in determining the reference standard. We reported our results using TRIPOD because e-ASPECTS is a prediction model for diagnosis. However, TRIPOD is not ideal for RITeS; given our focus on diagnostic accuracy testing and inclusion of meta-analysis modeling, we have also reviewed the STARD and PRISMA guidelines, respectively (Appendices S1-S3). An expert consensus statement aiming to improve legislation for radiology AI suggests testing AI software beyond accuracy of the defined task and also testing other performance (eg, reliability, behavior when software is applied outside its designated clinical use, and how software copes with unexpected data), as we have done. 42
On sensitivity testing with balanced representation of the 9 RITeS studies, expert versus software ASPECTS were better matched after a large proportion of IST-3 scans (the major contributor to RITeS) were removed. This likely reflects differences in the scan and patient parameters that were associated with lesser human-software ASPECTS agreement and were more common in IST-3, especially increased age, worse stroke severity, and thicker CT slices.
We include an up-to-date summary of all published evidence for e-ASPECTS. As with most published studies (22/24), RITeS is a secondary analysis, albeit one using prospectively collected data. However, we strove to minimize patient (or CT) selection in RITeS in a population designed to replicate routine care, and we reported all outcomes. We did not exclude scans based on image quality.
Contemporaneous CT acquisition and processing may increase the proportion successfully handled by software.
In addition, ongoing advances in CT technology that reduce scan time, improve tissue resolution, and reduce artifacts are likely to improve software processing success. However, using year of CT acquisition as a surrogate for the modernity of CT hardware, we did not find that it differed between groups where software processing was or was not successful. Data shared by Brainomix from one UK hospital indicate that >99% of 1,800 CTs were successfully processed by e-ASPECTS, but rates of processing routine, nonselected data are not publicly available. The design of RITeS cannot capture all potential benefits or risks of decision-support software used in real time; for example, whether additional information provided by software modifies clinician detection of true stroke features on CT, care pathways, or outcomes. Instead, we have maximized the use of available data. Robust evidence of benefit and absence of harm is needed to justify the enthusiasm for decision-support AI. Our clinically relevant results should inform routine practice and guide future research.

Conclusions
When software processing is successful, e-ASPECTS has moderate diagnostic accuracy for stroke feature detection on CT. When used as indicated to detect acute MCA territory ischemia, e-ASPECTS may be more sensitive but less specific than experts, with more false positive results. Increased false positive results were also apparent for hemorrhage detection and among patients with a stroke mimic (even when we excluded those with visible abnormalities). We found a 10% failure rate for software processing. Our findings emphasize that e-ASPECTS should only be used as indicated, to assist experienced readers in identifying possible findings, and should not be used as a standalone diagnostic tool. Users should interpret software results with caution and, according to the clinical context, be capable of independently recognizing true ischemic, hemorrhagic, and mimic features on CT, both to counter software misclassification and in case results are not provided. Users can improve software detection by inputting the side affected by stroke and by increasing image quality. Results may be less accurate in older patients and those with severe stroke. Given the rapid growth of AI software for medical imaging, it is important that early adoptions of these methods, such as for acute stroke imaging, are rigorously and independently assessed and that appropriate precedents for quality and clinical effectiveness are set. Further testing of AI software for stroke is required, especially prospective trials of clinical impact, as we would expect for any new health care intervention. Ideally, these studies would include patients with suspected stroke admitted to a range of centers reflecting different levels of stroke expertise, randomized to clinical decisions with versus without software assistance, and should be completed before widespread software rollout.
Brainomix staff and affiliates were not involved in creation of the RITeS research plan, setting aims, research conduct including image processing, statistical analysis, interpretation of data, or the writing of the paper.