Bimodality of intratumor Ki67 expression is an independent prognostic factor of overall survival in patients with invasive breast carcinoma

Proliferative activity, assessed by Ki67 immunohistochemistry (IHC), is an established prognostic and predictive biomarker of breast cancer (BC). However, it remains under-utilized due to lack of standardized robust measurement methodologies and significant intratumor heterogeneity of expression. A recently proposed methodology for IHC biomarker assessment in whole slide images (WSI), based on systematic subsampling of tissue information extracted by digital image analysis (DIA) into hexagonal tiling arrays, enables computation of a comprehensive set of Ki67 indicators, including intratumor variability. In this study, the tiling methodology was applied to assess Ki67 expression in WSI of 152 surgically removed Ki67-stained (on full-face sections) BC specimens and to test which, if any, Ki67 indicators can predict overall survival (OS). Visual Ki67 IHC estimates and conventional clinico-pathologic parameters were also included in the study. Analysis revealed linearly independent intrinsic factors of the Ki67 IHC variance: proliferation (level of expression), disordered texture (entropy), tumor size and Nottingham Prognostic Index, bimodality, and correlation. All visual and DIA-generated indicators of the level of Ki67 expression provided significant cutoff values as single predictors of OS. However, only bimodality indicators (Ashman’s D, in particular) were independent predictors of OS in the context of hormone receptor and HER2 status. From this, we conclude that spatial heterogeneity of proliferative tumor activity, measured by DIA of Ki67 IHC expression and analyzed by the hexagonal tiling approach, can serve as an independent prognostic indicator of OS in BC patients that outperforms the prognostic power of the level of proliferative activity.


Introduction
Proliferative activity of tumor tissue, commonly measured by Ki67 immunohistochemistry (IHC), remains in the spot light of breast cancer pathology: depending on the tumor type and the clinical setting, Ki67 is a well-established prognostic and a predictive marker [1]. However, its clinical application is hindered by the lack of standardized measurement methodologies and clearly defined cut-offs. After initial recommendation of a cut-off based on gene-expression definition of Luminal A breast cancer from a single reference laboratory [2], it became evident that cut-offs between "high" and "low" values for Ki67 vary between laboratories and should be adjusted to the local practices [3]. Petrelli et al [4] recently published a systematic review on prognostic value of different cut-off IHC levels of Ki67 in breast cancer, based on meta-analysis of 64,196 patients. It was concluded that Ki67 is an independent prognostic marker for overall survival (OS) in breast cancer patients; the threshold with the greatest prognostic significance remains ill-defined, although a cut-off >25% was associated with a greater risk of death. However, an evidence-based "optimal" cut-off cannot be achieved without robust measurement techniques, therefore, Ki67 potentially should be used as a continuous biomarker.
Besides the need for accurate measurement of the proportion of the Ki67 labelling index (Ki67 LI), which is simply the proportion of Ki67-positive tumor cell profiles within a defined malignant cell population, the assay is further complicated by its' intratumor heterogeneity. This involves an additional step of standardized choice of the tissue for evaluation. Detection and evaluation of hotspots of Ki67 expression in the tumor tissue can be performed by conventional microscopy, by review of whole slide images (WSI), or with the assistance of digital image analysis (DIA) tools [5,6]. However, standardized definitions of hotspots, in terms of their size, shape, and contrast to the surrounding tissue, are needed and do present another challenge for both human and machine-based measurements [7,8].
A recently proposed a methodology for comprehensive Ki67 IHC evaluation in WSI of breast cancer tissue [8], is based on the systematic subsampling of DIA-generated data into a hexagonal tiling (HexT) arrays (honeycomb). It enables computation of a comprehensive set of texture and distribution indicators for Ki67 intratumor variability and has the ability to reveal intrinsic factors behind the Ki67 IHC variance, interpreted as proliferation, entropy, bimodality, and cellularity. It also enables automated detection, quantitative evaluation, and augmented visualization of Ki67 hotspots, based on the upper quintile of the HexT data, conceptualized as "Pareto hotspot". The methodology was tested on 297 breast cancer WSI; however, the patient follow-up data were not available to test clinical utility of the approach.
This study, performed on a different patient cohort (Nottingham, UK), with the HexT methodology applied on another DIA tool, provides further support for the principle of HexT methodology in Ki67 assessment and demonstrates that intratumor heterogeneity, rather than the level of Ki67 expression in the tumor tissue, is an independent predictor of OS in breast cancer patients. After visual assessment of Ki67 IHC stained slides, five cases were excluded from the study due to the IHC staining quality issues and/or tissue artefacts. In addition, based on DIA-generated and HexT-processed data, minimum sampling requirements for spatial heterogeneity testing were applied (see below), with 152 cases remaining in this study. Clinico-pathological characteristics of the 152 cases are summarized in the Table 1.

Image acquisition and analysis
Digital images were recorded for the study, using a ScanScope XT Slide Scanner (Leica Aperio

Computation of heterogeneity parameters from hexagon tiling of the DIA-generated data
DIA results represented by Ki67-negative and Ki67-positive tumor cell nuclei with their X and Y coordinates in the WSI were partitioned into HexT, from which intratumor variance indicators were computed. The process is here described briefly, an in-depth description is available in [8]. Briefly, Hexagons of 825 pixel size corresponding to 0.75 mm circular diameter and 0.4421 mm 2 area were used in this study. HexT was generated to fit the area of the ROI, and the individual nuclei extracted by DIA were assigned to an appropriate hexagon based on their coordinates. Hexagons containing no nuclear profiles by DIA were regarded as missing data; hexagons containing fewer than 100 nuclear profiles were regarded as insufficiently sampled. A minimum requirement of 20 informative hexagons per tumor was applied in further analyses. Local Ki67% was calculated for each hexagon which was then ranked according to six Ki67 LI intervals: level 0 (0% -10%), level 1 (>10% -20%), level 2 (>20% -30%), level 3 (>30% -50%), level 4 (>50% -80%) and level 5 (>80% -100%). The ranks then formed the basis for the co-occurrence matrix used to compute Haralick

Statistical analysis
Statistical analysis was performed with SAS 9.4 software. Summary statistics and distribution analyses were performed with significance tests based on the paired sample t-test, one-way ANOVA with Bonferoni test for pairwise comparisons. Chi-squared and Fisher's exact test were used to estimate significant associations in non-parametric statistics. Inter-observer agreement was tested by kappa statistics. Factor analysis was performed using the factoring method of principal component analysis; 5 factors were retained based on a minimum eigenvalue threshold of 1.3, and a general orthomax rotation of the initial factors was performed. Product-limit estimates were used to summarize overall survival data and the log rank test was used for comparing OS distributions. OS was defined as the time from the breast surgery to the patient's survival at the end of the follow up period. Cox proportional hazards analysis was used to develop a multiple variable model to predict time to death. A combination of forward, backward, and stepwise procedures was used to arrive at the final model. Continuous variables were dichotomised to predict OS using the web-based tool "Cutoff Finder" [17]. Statistical significance was set at p<0.05 (two-sided).

Criteria and results of sampling the DIA-generated data into HexT
The DIA-generated data from 152 WSI were subsampled into HexT, with the minimum requirement of 20 hexagons, each containing at least 100 nuclear profiles, per WSI. The summary statistics of the visual evaluation data, DIA, HexT, and computed indicators is presented in the Online Resource 2.
The tumor area analyzed per WSI ranged from 4 to 164 mm², with a median of 29 mm² (overall, over 13 million cells in the tumor area of 6,000 mm 2 were evaluated). Paired t-test revealed significant underestimation bias (-7.4±16.8, p<0.0001) between the Ki67 Obs Mean score and Ki67% obtained from the WSI DIA; the latter "underestimated" the Ki67 LI (-11.6±30.1, p<0.0001) established in the previous study [12]; the ICC between the 3 variables was 0.47 (moderate).

Hotspot detection by visual review of the WSI
While reviewing 152 WSI, each of the 4 observers identified respectively 37, 67, 32, and 27 tumors with at least one hotspot. The area of the hotspot annotations provided by the observers varied from 1.5 to 5.7 mm 2 (Online Resource 2); accordingly, the relative (to the whole tumor) area of the hotspots varied from 3.3 to 14.6%. The agreement between the observers (taken pairwise) in detecting at least one hotspot was estimated by kappa coefficients ranging from 0.20 to 0.50.
Consequently, hotspots were identified in 33, 23, 12, or 11 tumors by any one, two, three, or all four observers, respectively. Analysis of the actual areas and hotspot overlaps outlined by 2 or more observers in the 46 tumors (as above) revealed that, on average, 24.4%, 13.9%, and 4.4% of the hotspot areas coincided between the 2, 3, and all 4 observers, respectively.

Factor analysis of the comprehensive Ki67 indicators
Factor analysis was performed on 152 patients with a complete set of DIA HexT data along with selected pathology data. The rotated factor pattern of the 5 factors, extracted with eigenvalues of 8.8, 4.2, 2.8, 1.8, and 1.3, respectively, is presented in Fig. 1. Factor 1 was characterized by positive and very similar loadings of the various Ki67% indicators, accompanied by low skewness of the Ki67% distribution in the HexT. Factor 1 therefore was best interpreted as the "Proliferation" factor. however, lower loadings of the visual scoring data on the factor 1 values could be noted.

Associations between the Ki67 indicators and pathology characteristics of the tumors
Associations of the tumor Ki67 indicators and the factor scores with relevant tumor characteristics were explored by ANOVA. In particular, the histological grade (G) was associated with higher factor 1 (p<0.0001) and factor 3 (p<0.0001) scores as well the corresponding primary variables.
Triple negative tumors revealed higher factor 1 scores compared to the HR positive tumors (p<0.05). Triple negative tumors and HER2 positive tumors revealed higher NPI compared to the HR positive tumors (p<0.05). Factor 2, 4, and 5 scores did not reveal significant associations.

Predictors of the overall survival of the patients
Mean duration of follow-up after the surgery was 143.4±71.4 months (range 5 to 248 months, median 156). Seventy-nine patients died during the follow-up period. The G, tumor stage, axillary nodal stage (N), NPI category, and patient's age group did not predict the OS by product-limit analysis.
Several multivariable models were developed to account simultaneously for the comprehensive Ki67 indicators and other characteristics of the tumors to predict OS ( Table 2) served as significant single predictor as well. Importantly, other indicators of Ki67% distribution (standard deviation, interquartile range, skewness) and spatial heterogeneity (factor 5 scores, correlation, entropy) also provided significant cut-off values. Factor 2 (disordered texture) did not reach statistical significance (p=0.14), while factor 3 (tumor size and NPI) provided highly significant cutoff (p<0.0001, not shown).

Discussion
Our study reveals that spatial heterogeneity of proliferative tumor activity, measured by DIA of Ki67 IHC expression and analyzed by the HexT approach, can serve as an independent prognostic indicator of OS in breast cancer patients that outperforms the prognostic power of the level of proliferative activity.
A broad set of Ki67 IHC parameters, representing the level of proliferation, pattern of distribution in the tissue, bimodality and texture indicators were tested in prognostic models along with conventional clinico-pathologic characteristics of the breast cancer patients. Remarkably, although all visual and machine-generated indicators of the level of Ki67 expression in this study provided significant cut-off values as single predictors of OS, only bimodality indicators (Ashman's D, in particular) served as the independent OS predictors in the context of HR and HER2 status. Other indicators of Ki67 spatial heterogeneity -entropy, energy, and correlation -were also significant as single predictors; however, they were "out-powered" in the multivariate Cox regression model. It is likely that the Ashman's D bimodality is a more sensitive indicator of intratumor heterogeneity than the parameters computed from the covariance matrix based on Ki67% ranks (Ashman's D is computed from histograms, searching for two Gaussian functions, taking into account the distance between the spikes and the thickness of each peak). In addition, bimodality indicators do not account for and are independent of spatial peculiarities of the individual tumor samples (size, shape, continuity of growth, etc.), which may cause "noise" in the indicators computed from the covariance matrix. Therefore, although our study presents an independent prognostic value of bimodality indicators, our findings can be indicative of the role of intratumor heterogeneity of the proliferative activity in general.
Clinical utility of Ki67 IHC as prognostic and predictive factor in breast and other cancers is greatly obscured by the lack of standardized measurement methodologies. Our data are in line with this notion: we found significant bias between Ki67% measured as microscope-based Ki67 LI in the previous study [12], DIA, and visual evaluation of the WSI on computer monitor. Furthermore, our study revealed that four observers showed rather low agreement in visual detection of at least one hotspot (kappa ranging from 0.2 to 0.5). Moreover, the size and shape of the hotspots and their spatial overlap varied greatly between the cases and observers. Without proper definitions and standardization of the hotspot detection, the efforts to find hotspots with subsequent evaluation of Ki67 expression may be another source of variation for Ki67 LI measurement. It is also remarkable that the tumors, with at least two observers detecting at least one hotspot, revealed significantly higher entropy, higher correlation and lower energy; however, they were not associated with the bimodality indicators. Employment of DIA technologies may be useful in automated detection and standardization of the size and other characteristics of the hotspots. In particular, the HexT approach enables automated evaluation of hotspots of desired absolute or relative size; for example, the Pareto hotspot would always reflect Ki67% in 20% of "the "hottest tumor" area with the 90 th percentile representing the median value in this subsampled tumor tissue [7].
Our finding, that bimodality of Ki67% distribution in the tumor tissue may be a more important prognostic factor of OS than the level of Ki67% itself, is somewhat unexpected but may have great practical impact. While efforts to standardize Ki67 LI measurements and to define clinically valid cut-offs can lead to consensus and recommendations [4], DIA tools can be calibrated [19], it is challenging to ensure the analytic accuracy of the IHC test where high precision of quantification is required. Biological variability of the tumors and tissue processing variation may interfere with the DIA-based approaches. In that regard, the bimodality and other tissue heterogeneity indicators may prove to be more robust and less sensitive to these variations.
Our data suggest that variability of intratumor proliferative activity may be a fundamental feature of tumor aggressiveness affecting final outcome of the disease, even more important than the average level of proliferation in tumor tissue. At least, it is an independent factor of the disease behavior. It is intriguing that in our study the level of proliferation (factor 1) and tumor size/NPI (factor 3) were associated with the histological grade (G), while the heterogeneity indicators (factor 2, 4, and 5 scores) did not reveal significant clinico-pathologic associations, except the impact on OS. It should also be taken into account that the factors extracted from the comprehensive Ki67 IHC dataset are linearly independent by definition. Moreover, we noted non-linear relationship between the factor 1 and 2, comparable to that reported previously in another patient cohort [18]: higher scores of the factor 2 (disordered texture) were noted in the tumors with moderate scores of the factor 1 (proliferative activity). It can be interpreted that intratumor spatial heterogeneity of the proliferative activity is a potential feature of tumors in the mid-scale of proliferative activity and represents an independent factor of unfavorable disease outcome.
Last but not least, our current study provides additional proof of principle for the HexT approach [3]: while the first study was based on different DIA tool (Aperio/Leica) and different patient cohort, the HexT approach generated essentially the same factor pattern and clinico-pathologic associations of the comprehensive Ki67 indicators (the patient follow-up data was not available in the previous study). This demonstrates that the HexT approach, which has been successfully applied in many aspects in image processing and machine vision related fields [20], has its potential in the field of microscopy images and can serve as a tool for comprehensive IHC test. Likely, this approach can be applied to any IHC or other tissue-based biomarkers where intra-tissue heterogeneity needs to be assessed [18,[21][22][23].
In summary, intratumor heterogeneity of proliferative activity, represented by bimodality indicators, is an independent prognostic factor of worse OS in breast cancer patients, while the level of proliferative activity was significant only in univariate prognostic models. Our study supports the concept that assessment of IHC staining, based on the "honeycomb subsampling" of DIA data, enables comprehensive and efficient methodology for tissue-based biomarker testing. Furthermore, our findings indicate that tissue-based biomarker assessment should take into account intra-tissue heterogeneity aspects to reveal invisible aspects of disease, benefiting from methodologies enabled by digital pathology.