Validation of a Model for Identification of Patients With Compensated Cirrhosis at High Risk of Decompensation

Background & Aims: It is important to rapidly identify patients with advanced liver disease. Routine tests to assess liver function and fibrosis provide data that can be used to determine patients’ prognoses. We tested the ability of combined data from the ALBI and FIB-4 scoring systems to identify patients with compensated cirrhosis at highest risk for decompensation. Methods: We collected data from 145 patients with compensated cirrhosis (91% Child A cirrhosis and median MELD scores below 8) from a cohort in Nottingham, United Kingdom, followed for a median 4.59 years (development cohort). We collected baseline clinical features and recorded decompensation events. We used these data to develop a model based on liver function (assessed by the ALBI score) and extent of fibrosis (assessed by the FIB-4 index) to determine risk of decompensation. We validated the model in 2 independent external cohorts (1 in Dublin, Ireland and 1 in Menoufia, Egypt) comprising 234 patients. Results: In the development cohort, 19.3% of the patients developed decompensated cirrhosis. Using a combination of ALBI and FIB-4 scores, we developed a model that identified patients at low vs high risk of decompensation (hazard ratio [HR] for decompensation in patients with a high-risk score was 7.10). When we tested the scoring system in the validation cohorts, the HR for decompensation in patients with a high-risk score was 12.54 in the Ireland cohort and 5.10 in the Egypt cohort. Conclusion: We developed a scoring system, based on a combination of ALBI and FIB-4 scores, that identifies patients at risk for liver decompensation. We validated the scoring system in 2 independent international cohorts (Europe and the Middle East), so it appears to apply to diverse populations.


INTRODUCTION
Chronic liver disease (CLD) progresses from fibrosis to clinical outcomes, and clinical sequelae including liver decompensation (ascites, variceal bleeding and encephalopathy) manifest relatively late in the natural history and are often the index presentation of liver disease (1). Identification of patients who need intensive monitoring and timely intervention is challenging. Robust prognostic tools using simple laboratory variables, with potential for implementation in non-specialist settings and across different health care systems, have significant appeal.
The Child Pugh and MELD scores are extensively validated and easily applicable tools that have been used for decades in clinical practice; the caveats that limit their performance have been previously documented (2)(3)(4). Importantly, these scoring systems provide value only after synthetic liver function has become significantly deranged, and provide only short-term prognostic value. Presently, there are no scores performed in routine clinical practice that provide robust prognostic stratification within early, compensated cirrhosis over the medium to long term.
Serum albumin and bilirubin have recently been combined in a statistical model as the 'ALBI score', a measure of liver function in patients with HCC (5)(6)(7)(8). This model has been extensively validated and subsequently extended to patients with chronic liver disease (without HCC) in a variety of clinical settings (9)(10)(11). We hypothesised that the ALBI score, being a measure of hepatic reserve, would be related to the risk of subsequent liver decompensation and tested this hypothesis in the setting of a prospective, longitudinal cohort of patients with compensated cirrhosis in Nottingham, UK. We further explored the combination of ALBI with markers of liver fibrosis (tracking progression of liver disease) as measured by the FIB-4 index (12)(13)(14). Finally, we sought to validate this combined assessment of fibrosis and function in two independent, external cohorts.

PATIENTS AND METHODS
Our primary analysis and model building was focussed on the prospective, longitudinal Nottingham (UK) cohort. We then tested the generalisability of our findings on two independent cohorts from Dublin (Ireland) and Menoufia (Egypt).

Prospective Nottingham (UK) cohort
Patients were consecutively recruited from the Nottingham compensated cirrhosis cohort study (3CN). The 3CN study is a prospective, longitudinal study initiated in 2010 focussed on the study of early compensated liver cirrhosis. The study was approved by an NHS ethics committee and standard regulatory requirements obtained (10/H0403/10). Inclusion criteria were age between 18 and 75 years and an established diagnosis of cirrhosis obtained by at least one of the following criteria: histology, radiological or endoscopic evidence of portal hypertension, clinical evidence of cirrhosis with thrombocytopenia and validated noninvasive liver fibrosis test (transient elastography >15 kPa). Exclusion criteria included the presence of hepatocellular carcinoma at baseline, portal or splenic vein thrombosis, clinical or radiological ascites at baseline visit, history of variceal haemorrhage and any previous episode of clinical encephalopathy. At baseline, patients had blood drawn for routine laboratory measures which included albumin, bilirubin and platelet count. Patients were followed up at six-monthly visits until the end of the study (10/08/2016). At each clinic visit, routine bloods were drawn for assessment of liver function, full blood count, clotting and renal function, and an assessment was made for the appearance of a liver related clinical event in the intervening period. At the end of the study all patients were assessed for clinical outcomes using digital hospital records and, for those failing to attend secondary care, by contacting primary care physicians directly.

Prospective Dublin (Ireland) cohort
Patients were screened at general hepatology outpatients for inclusion criteria and following consent, were invited for a study visit. Enrolment was consecutive and occurred between August 2011 and April 2015.
Cirrhosis at this site was confirmed by one of the following criteria: histology proving cirrhosis, endoscopic or radiological evidence of varices and thrombocytopenia (platelets <150x10^9) or radiological evidence of splenomegaly (spleen >11 cm) or liver stiffness measurement (LSM) >12 kPa. Patients were excluded if they had any of the following: diagnosis of hepatocellular carcinoma, splenic or portal vein thrombosis, brain injury or unconsciousness, dementia, active substance use that would preclude clinic visits, and decompensated liver disease (clinical or radiological ascites at baseline visit, history of variceal bleeding and overt clinical encephalopathy). The date and nature of hepatic decompensation, occurring after enrolment in the study, were collected by a liver specialist and supported with either endoscopy reports, abdominal imaging or hospitalization report.

Retrospective Menoufia (Egypt) cohort
Patients were recruited from the outpatient clinic at the National Liver Institute, Menoufia University, Egypt.
This was a retrospective cohort starting in 2006 and only patients with complete follow up data were included. Patients diagnosed with liver cirrhosis were included after fulfilling at least one of the following criteria: histopathological diagnosis by liver biopsy, radiological or endoscopic evidence of portal hypertension or transient elastography >14 kPa. Patients with missing data, hepatocellular carcinoma at baseline, history of any malignancies, evidence of portal or splenic vein thrombosis and any patient with history of hepatic decompensation at time of diagnosis (clinical or radiological ascites, variceal bleeding and any previous episode of clinical encephalopathy) were excluded from the study. Routine laboratory diagnostic tests were performed including serum albumin, bilirubin and platelet count. Hepatic decompensation events following enrolment in the study were ascertained by the clinicians in charge of the patients' care and supported with either endoscopy reports, abdominal imaging or hospitalization report.

Statistical methods
Stata/SE 14.2 (StataCorp, Texas, USA) and R software (R 3.4.4)(15) were used to undertake the analysis.
Continuous variables were presented as median (with interquartile range) and categorical variables were presented as percentages. Highly skewed variables were log10 transformed. The primary outcome was hepatic decompensation. This was defined using the clinical parameters of first episode of ascites (as defined by confirmation with ultrasonography and requiring treatment with diuretics or paracentesis) or the first variceal bleed (defined by requiring endoscopic intervention) or the first episode of encephalopathy, assessed by an experienced clinician and defined by Grade 3/4 West Haven classification; whichever event occurred first. Time to decompensation (TTD) was calculated from date of entry to study until date of first recorded decompensation. Patients were censored if they underwent liver transplantation or died. Deaths, occurring in hospital and outside the hospital, were collected by using a combination of hospital patient records (Nottingham, Dublin and Egypt), family practitioner records (Nottingham and Dublin), death certificates (Nottingham, Dublin and Egypt) and telephone interviews with relatives (Egypt). When the prospective study in Nottingham was initiated we planned for a 50% decompensation rate at 4 years based on a systematic review by D'Amico and colleagues (16). Therefore, we estimated that a cohort of n=150, followed for at least 4 years, would yield 75 events. Using the 10:1 rule of thumb for prediction models, we anticipated this would allow us to explore at least 7 prognostic variables. We did not use imputation for missing variables.
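The sample-size reasoning above is simple arithmetic and can be checked with a minimal sketch. The function names are illustrative only and are not part of the study's analysis code.

```python
def expected_events(cohort_size: int, event_rate: float) -> float:
    """Expected number of events given a cohort size and an anticipated event rate."""
    return cohort_size * event_rate


def max_candidate_predictors(events: float, events_per_variable: int = 10) -> int:
    """Number of candidate predictors supported under an events-per-variable rule of thumb."""
    return int(events // events_per_variable)


# Planning figures quoted in the text: n=150 and a 50% decompensation rate at 4 years.
planned_events = expected_events(150, 0.50)
print(planned_events)                            # 75.0
print(max_candidate_predictors(planned_events))  # 7
```

The same arithmetic applied to the observed decompensation rate would give far fewer events than planned, which is consistent with the authors' later comments about limited statistical power.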

ALBI vs MELD
Liver function was initially measured by ALBI or MELD score. Univariable Cox regression was undertaken for both variables. Decompensation risk predictions by MELD and ALBI were compared using Harrell's C statistic (17), Gönen & Heller's K (18) and Royston & Sauerbrei's R^2_D (19). Higher values of these measures translate to better survival prediction. Confidence intervals (C.I.) and p-values of the comparisons were estimated using the bootstrap method (1000 samples). Patients who had both ALBI and MELD recorded were included in this part of the analysis.
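To illustrate what Harrell's C measures, the following is a minimal sketch for right-censored data. It is not the Stata or R routine used in the study; the function name, the handling of tied times (they are skipped) and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair is usable when the subject with the shorter follow-up had an event.
    The pair is concordant when that subject also has the higher risk score;
    tied risk scores count as half. Pairs with tied times are skipped here,
    which is a simplification.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # the earlier subject was censored, so the order is unknown
        usable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data: follow-up in years, event indicator (1 = decompensation), risk score.
toy_times = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_events = [1, 1, 0, 1, 0]
toy_scores = [2.0, 1.5, 1.7, 0.5, 0.2]
print(harrell_c(toy_times, toy_events, toy_scores))  # 0.875
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking. A bootstrap confidence interval, as described above, would repeat this calculation on resampled patients.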

Association between ALBI, FIB-4 and TTD
TTD according to baseline ALBI grade was examined via Kaplan-Meier (KM) graphs and compared using the log-rank test. The associations of ALBI score and FIB-4 (appendix 1) with TTD were tested using univariable Cox regression analysis.
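Appendix 1 is not reproduced in this text, so the sketch below uses the commonly published definitions of the ALBI score, the ALBI grade thresholds and the FIB-4 index. These formulas, the units and the example values are assumptions on our part and should be checked against the appendix before any use.

```python
import math


def albi_score(bilirubin_umol_l: float, albumin_g_l: float) -> float:
    """ALBI score (assumed published form): 0.66*log10(bilirubin) - 0.085*albumin."""
    return 0.66 * math.log10(bilirubin_umol_l) - 0.085 * albumin_g_l


def albi_grade(score: float) -> int:
    """ALBI grade using the commonly quoted thresholds of -2.60 and -1.39."""
    if score <= -2.60:
        return 1
    if score <= -1.39:
        return 2
    return 3


def fib4_index(age_years: float, ast_u_l: float, alt_u_l: float, platelets_10e9_l: float) -> float:
    """FIB-4 index (assumed published form): age*AST / (platelets*sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


# Hypothetical patient values, chosen only to make the arithmetic easy to follow.
score = albi_score(bilirubin_umol_l=10.0, albumin_g_l=40.0)
print(round(score, 2), albi_grade(score))  # -2.74 1
print(fib4_index(age_years=60, ast_u_l=40, alt_u_l=25, platelets_10e9_l=120))  # 4.0
```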

Building of the ALBI-FIB4 Score
A multivariable Cox model combining ALBI score and FIB-4 was generated. Model fit was compared to the univariable models using the log-likelihood ratio (LR) test. Any violation in the proportional hazards assumption was tested by examining the Schoenfeld residuals. The formula for the new score (linear predictor) was produced using the coefficients of the model. A high risk group of patients was identified by applying a cut-off at the 85th centile of the score (patients within the top 15% risk). Our usual approach has been to define four classes based on cut-offs at the 15th, 50th and 85th centile of the linear predictor as proposed by Cox and Royston (20,21). However, in this study, the number of decompensation events was too small to justify a four class model and we therefore used the 85th centile figure as the cut-off.
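The likelihood ratio test for adding a single variable to a nested Cox model reduces to comparing twice the gain in log-likelihood with a chi-square distribution on 1 degree of freedom. The sketch below shows that calculation only. The log-likelihood values are hypothetical, since the source does not report them, and the function name is illustrative.

```python
import math


def lr_test_p_value_1df(loglik_reduced: float, loglik_full: float) -> float:
    """P-value of a likelihood ratio test for one added parameter.

    The statistic 2*(loglik_full - loglik_reduced) is referred to a chi-square
    distribution with 1 degree of freedom, whose upper tail is erfc(sqrt(x/2)).
    """
    statistic = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical partial log-likelihoods for a one-variable and a two-variable model.
p_value = lr_test_p_value_1df(loglik_reduced=-120.0, loglik_full=-116.5)
print(f"{p_value:.4f}")
```

A small p-value indicates that the larger model fits better than the nested one, which is the sense in which the combined model is compared with each univariable model.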

Model performance and validation
The new model was validated in two independent external cohorts from Dublin (Ireland) and Menoufia (Egypt). Median survival, hazard ratios and percentage decompensation at 3 and 5 years were calculated for each risk group in both the derivation and validation sets. KM graphs according to the two risk groups were also plotted for the derivation and validation sets.
In order to make the formula applicable to a clinical setting we developed a simple online calculator (https://jscalc.io/calc/gdEJj89Wz5PirkSL).

Model comparisons
The discriminatory performance of the ALBI-FIB4 score was compared to the other scores, namely ALBI, FIB-4, MELD, and Child-Pugh score, using Harrell's C statistic, Gönen & Heller's K and Royston & Sauerbrei's R^2_D. Models were also compared using integrated discrimination improvement (IDI) and category-free net reclassification improvement (NRI>0). IDI measures the difference in the discrimination slopes between the new model ALBI-FIB4 and the older models (ALBI, FIB4, MELD and Child-Pugh) (22). Discrimination slope is a measure of the separation in predicted probabilities for events and non-events (23). NRI measures the amount of correct reclassifications by the new model (ALBI-FIB4) compared to the older models (ALBI, FIB4, MELD and Child-Pugh) based on calculated predicted risk probabilities (22)(23)(24). In this analysis, an extension of IDI and NRI that incorporates censored survival data was used (24,25). Confidence intervals (C.I.) and p-values for IDI and NRI were estimated using perturbation-resampling (1000 iterations). Model comparisons were undertaken in both the derivation and validation sets. Due to the low number of decompensation events and in order to improve the statistical power, both validation sets were combined.
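The definitions of discrimination slope, IDI and category-free NRI can be made concrete with a short sketch. Note the assumption: this is the simple version for a fully observed binary outcome, not the censored-data extension used in the analysis, and the predicted probabilities are invented for illustration.

```python
from statistics import mean


def discrimination_slope(probs, outcomes):
    """Mean predicted probability in events minus that in non-events."""
    in_events = [p for p, y in zip(probs, outcomes) if y == 1]
    in_nonevents = [p for p, y in zip(probs, outcomes) if y == 0]
    return mean(in_events) - mean(in_nonevents)


def idi(p_new, p_old, outcomes):
    """Integrated discrimination improvement: difference in discrimination slopes."""
    return discrimination_slope(p_new, outcomes) - discrimination_slope(p_old, outcomes)


def category_free_nri(p_new, p_old, outcomes):
    """NRI>0: net upward movement in events minus net upward movement in non-events."""
    def net_up(indices):
        up = sum(1 for i in indices if p_new[i] > p_old[i])
        down = sum(1 for i in indices if p_new[i] < p_old[i])
        return (up - down) / len(indices)

    event_idx = [i for i, y in enumerate(outcomes) if y == 1]
    nonevent_idx = [i for i, y in enumerate(outcomes) if y == 0]
    return net_up(event_idx) - net_up(nonevent_idx)


# Hypothetical predictions: the old model is uninformative, the new one separates the groups.
outcomes = [1, 1, 0, 0]
p_old = [0.5, 0.5, 0.5, 0.5]
p_new = [0.7, 0.6, 0.4, 0.3]
print(round(idi(p_new, p_old, outcomes), 2))    # 0.3
print(category_free_nri(p_new, p_old, outcomes))  # 2.0
```

NRI>0 ranges from -2 to 2, so the toy example shows the best possible reclassification; real comparisons between clinical scores give far smaller values.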

RESULTS
The Nottingham cohort comprised 145 patients with cirrhosis (91% Child Pugh A grade). The most common aetiology was alcohol (44.8%) followed by non-alcoholic fatty liver disease (29.7%) and other aetiologies as shown in Table 1. The Dublin cohort comprised 141 patients with cirrhosis (90% Child Pugh Grade A). The most common aetiology was alcohol (39.7%) followed by HCV (29.8%). The overall decompensation rate was 12.1% over a maximum time of 6.4 years. The most common decompensation event was variceal haemorrhage (47.1%), followed by ascites (29.4%) and encephalopathy (23.5%). The median ALBI score at baseline was -2.48 (-2.71 to -2.13).

ALBI vs MELD
Both baseline ALBI score and MELD significantly influenced decompensation. On the three measures of predictive performance (Harrell's C, Gönen & Heller's K and Royston & Sauerbrei's R^2_D), ALBI showed higher values compared to MELD, although some of these differences were not significant (p=0.480, p=0.001 and p=0.122 respectively). Furthermore, adding MELD to FIB-4 did not improve the fit of the model (LR test, p=0.4058), whereas adding ALBI score to FIB-4 did (p=0.0078). ALBI was therefore considered a better measure of liver function than MELD for the purpose of this analysis.

Building of the ALBI-FIB4 Score
A multivariable Cox regression model that combined ALBI and FIB-4 scores had a better fit than either variable taken alone (likelihood ratio test p=0.0167 upon adding FIB-4 to ALBI score, and p=0.0069 vice versa). In the multivariable model, the hazard ratio, 95% C.I. and p-value for the ALBI score and FIB-4 were 3.79 (1.53, 9.36), p=0.004 and 1.18 (1.05, 1.33), p=0.008 respectively.
The formula for this new ALBI-FIB4 score was as follows: (ALBI score*1.331) + (FIB-4*0.165). Patients with a score greater than -1.822 were considered high risk whereas those equal to or below -1.822 as low risk.
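The formula and cut-off above translate directly into a few lines of code. The sketch below uses the coefficients and threshold as stated in the text; the function names and the example inputs are illustrative, and this is not the code behind the online calculator.

```python
import math

# Coefficients and cut-off as reported in the text.
ALBI_COEF = 1.331
FIB4_COEF = 0.165
HIGH_RISK_CUTOFF = -1.822


def albi_fib4_score(albi: float, fib4: float) -> float:
    """Linear predictor: (ALBI score * 1.331) + (FIB-4 * 0.165)."""
    return albi * ALBI_COEF + fib4 * FIB4_COEF


def risk_group(score: float) -> str:
    """'high' when the score is greater than -1.822, otherwise 'low'."""
    return "high" if score > HIGH_RISK_CUTOFF else "low"


# Hypothetical patients (ALBI score, FIB-4 index).
for albi, fib4 in [(-2.48, 3.0), (-1.80, 6.0)]:
    score = albi_fib4_score(albi, fib4)
    print(f"ALBI={albi}, FIB-4={fib4}: score={score:.3f}, risk={risk_group(score)}")

# The coefficients agree, up to rounding, with the natural log of the reported hazard ratios.
print(round(math.log(3.79), 3), round(math.log(1.18), 3))
```

Because the coefficients are log hazard ratios from the multivariable model, a one-unit rise in ALBI score moves the linear predictor much further than a one-unit rise in FIB-4.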

Model performance and validation
Nottingham, UK (derivation cohort)
Using the ALBI-FIB4 score, 85% of the cohort were stratified in the low risk group and 15% in the high risk group. The groups showed clear separation (p<0.0001) in those reaching a clinical end point of decompensation (Figure 2A). In the low risk group at baseline, 9.3% reached a decompensation event within 3 years (Table 2). The hazard ratio of reaching a decompensation event was 7.10 (95% C.I. 3.07 to 16.42) in the high risk group compared to the low risk group.

Dublin, Ireland and Menoufia, Egypt (external validation cohorts)
In the external validation cohorts the ALBI-FIB4 index stratified 81% of the Egyptian cohort and 82% of the Irish cohort into the low risk group. The high and low risk groups again showed clear separation (p<0.001) in both validation cohorts (Figure 2B and 2C). The hazard ratio of reaching decompensation in the high risk group compared to the low risk group was 12.54 (95% C.I. 4.25 to 36.93) in the Irish cohort and 5.10 (95% C.I. 2.07 to 12.59) in the Egyptian cohort. The percentages of patients reaching a decompensation event within the first 3 or 5 years in the two validation cohorts are shown in Table 2.
Table 3 displays survival prediction comparisons between ALBI-FIB4 and other scores, namely ALBI, FIB-4, MELD and Child-Pugh score, using Harrell's C statistic, Gönen & Heller's K, Royston & Sauerbrei's R^2_D, IDI and NRI in both the derivation and validation sets. The table shows that overall ALBI-FIB4, and in some instances the ALBI score, gave higher values of the five measures compared to the rest, although there may not be enough statistical power to detect significant differences in many of the comparisons due to the low number of decompensation events.

DISCUSSION
We have validated the ALBI score in a prospectively accrued data set as a measure of liver dysfunction in patients with chronic liver disease. In addition, we created a new score, 'ALBI-FIB4' which can effectively stratify patients for the risk of future liver decompensation. The ALBI-FIB4 score identified a high risk group more effectively than the MELD score and maintained performance in two external cohorts with distinct differences in aetiology of cirrhosis.
The Nottingham cohort represents early compensated cirrhosis, as evidenced by the fact that 91% of the cohort was Child Pugh stage A (9% Child Pugh stage B) with a median MELD score of less than 8 at baseline. This explains why the cohort had a predominance of ALBI grade 1 (35%) or ALBI grade 2 (63%) at baseline, with only a small fraction of ALBI grade 3 (3%). When followed prospectively over 5 years the ALBI grade showed impressive prognostic separation: a decompensation rate of 11.5% versus 27.3% in grade 1 versus grade 2 respectively. The validation of ALBI in a compensated cirrhosis cohort of mixed aetiology supports previous findings in compensated cirrhosis and chronic hepatitis B (11). Prior to this, the body of evidence for ALBI was focussed on patients with hepatocellular carcinoma (5)(6)(7)(8).
FIB-4 has been extensively validated as a non-invasive marker of fibrosis and shown to have a prognostic ability to predict clinical outcomes across chronic liver disease including HCV, HBV and NAFLD (12)(13)(14). The ALBI score is postulated to show early changes in synthetic function and the use of the ratio results in more sensitivity than simply observing for the individual parameters to cross the upper limit of normal. The combination has biological plausibility, by looking at markers of fibrosis and function simultaneously. The combined score showed superiority within the multivariable Cox regression model but this did not translate into statistical differences in Harrell's C statistic or IDI/NRI.
An important strength of our study is that we utilised the robust clinical end points of ascites, encephalopathy and variceal bleeding as originally described in the seminal natural history studies of cirrhosis (26,27). We did not use jaundice or change in Child Pugh classification as these are directly influenced by the prognostic variables we chose to study. The Nottingham cohort is a mixed aetiology cohort with compensated cirrhosis followed prospectively for clinical events. This cohort represents a formal, prospective, protocol driven study designed a priori to document the clinical history of patients with compensated cirrhosis. The collection of hard clinical outcomes, using a variety of source documents, was a central aspect of this study but we accept that within each cohort intrinsic limitations exist; e.g. a patient migrating to another health region during the period of study. The prevalence of the different aetiologies, with alcohol the dominant aetiology followed by NAFLD and viral hepatitis, is very similar to that in a large population study of UK patients with cirrhosis (28). This implies the data have direct application to a population in which simple markers such as ALBI and FIB-4 have the greatest relevance. The validation cohort from Ireland, driven by the aetiologies of alcohol and HCV, was in contrast to the cohort from Egypt with the predominant aetiologies of HCV and NAFLD. This provides reassurance that the model has generalisability for stratifying liver disease at an international level. A limitation of stratification based on the ALBI-FIB-4 score is that only 15% are classified in the high risk group. We deliberately focussed on the previously validated scores of ALBI and FIB-4 in this study as our aim was to create a tool that could be used within a community setting or low resource countries.
A frequently levelled criticism of algorithms such as ALBI-FIB-4 is that they are too complicated to be applied routinely in the clinical setting. To overcome this problem we developed a simple online calculator which can be accessed using the following link: https://jscalc.io/calc/gdEJj89Wz5PirkSL. With respect to comparable models which are routinely performed in clinical practice, the ALBI-FIB-4 score was numerically superior to the other scores including the MELD score and Child Pugh score. However, a limitation of the low number of events was that we were not able to show consistent statistical superiority or look at a wide number of variables in the analysis.
We have shown that routinely available laboratory variables, combined in a novel algorithm ALBI-FIB-4, can stratify patients with cirrhosis for future risk of liver decompensation. The ability to do this in the context of early, compensated cirrhosis with preserved liver synthetic function whilst also predicting long term clinical outcomes has clinical utility for international health care systems.