A quantifier-based fuzzy classification system for breast cancer patients

Objectives: Recent studies of breast cancer data have identified seven distinct clinical phenotypes (groups) using immunohistochemical analysis and a range of different clustering techniques. Consensus between unsupervised classification algorithms has been successfully used to categorise patients into these specific groups, but often at the expense of not classifying the whole set. It is known that fuzzy methodologies can provide linguistic-based classification rules. The objective of this study was to investigate the use of fuzzy methodologies to create an easy-to-interpret set of classification rules, capable of placing the large majority of patients into one of the specified groups.
Methods and materials: In this paper, we extend a data-driven fuzzy rule-based system for classification purposes (called ‘fuzzy quantification subsethood-based algorithm’) and combine it with a novel class assignment procedure. The whole approach is then applied to a well characterised breast cancer dataset consisting of ten protein markers for over 1,000 patients to refine previously identified groups and to present clinicians with a linguistic ruleset. A range of statistical approaches were used to compare the obtained classes to previously obtained groupings and to assess the proportion of unclassified patients.
Results: A rule set was obtained from the algorithm which features one classification rule per class, using labels of High, Low or Omit for each biomarker, to determine the most appropriate class for each patient. When applied to the whole set of patients, the distribution of the obtained classes had an agreement of 0.9 when assessed using Kendall’s Tau with the original reference class distribution. In doing so, only 38 patients out of 1,073 remain unclassified, representing a more clinically usable class assignment algorithm.
Conclusion: The fuzzy algorithm provides a simple to interpret, linguistic rule set which classifies over 95% of breast cancer patients into one of seven clinical groups.


Introduction
Breast cancer is the most common cancer and cause of cancer death in women in the UK [1].
It also leads in terms of numbers and complexity of available treatment options resulting in decision making difficulties regarding the most appropriate treatment choice [2]. Methods have been developed to assist in predicting outcome and to support clinical decision making in breast cancer management. One of the best known is the Nottingham prognostic index (NPI) [3], which is based on a combination of histopathological examination of tumour size, lymph node stage and tumour grading combined in a prognostic index formula [4]. The NPI is now used for the management of individual patients with breast cancer across Europe and elsewhere internationally. Recent data imply that breast cancer is a heterogeneous group of diseases with complex and distinctive underlying molecular pathogenesis [5]. However, the NPI does not contain sufficient information to represent and distinguish this heterogeneity. Further support for this hypothesis is provided by gene expression profiling which has identified distinct tumour groups that have direct clinical relevance in showing prognostic differences [6][7][8]. One of the major challenges in the computational analysis of such data is the curse of dimensionality because of the overwhelming number of variables measured (genes) versus the small number of samples [9]. In addition, due to experimental and technical reasons, there are large quantities of noise and redundancy in gene expression data, which may lead to building a prognosis predictor with poor performance [10].
To address the breast cancer disease heterogeneity, clustering approaches have become more and more popular, especially for discovering profiles in cancer with respect to high-throughput genomic data [11,12]. Moreover, an alternative approach to gene expression profiling is to use established robust laboratory technology, such as immunocytochemistry on formalin fixed paraffin embedded patient tumour samples. We and others have applied protein biomarker panels, with known relevance to breast cancer, to large numbers of cases using tissue microarrays, exploring the existence and clinical significance of distinct breast cancer classes through clustering approaches [13][14][15][16].
However, since different clustering algorithms result in different clusters, particularly when large multi-dimensional data sets are considered, consensus clustering methodologies have been used in recent studies [17][18][19][20]. In previous work [21], we applied different clustering algorithms and, through a consensus clustering approach, we identified novel cancer subtypes. This was done at the expense of not classifying a large proportion (38%) of patients.
Alternative approaches, such as rough sets or fuzzy classification methodologies, may be used to 'relax' the rules of consensus clustering. Rough set theory, introduced by Pawlak in 1982, is a mathematical tool to deal with vagueness and uncertainty of information. This approach seems to be of fundamental importance to artificial intelligence, especially in the areas of machine learning and decision support systems [22]. Rough set theory makes use of lower and upper approximations to set boundaries, and one of its main advantages is that it does not need any preliminary or additional information about data, such as grade of membership or the value of possibility in fuzzy set theory [23]. Parthaláin et al. have successfully used rough and fuzzy-rough set methods for the analysis of mammographic data [24]. However, Li and Wang [25] stated that the rules generated by rough sets are often unstable and have low classification accuracy. For this reason, and because we were not interested in eliminating redundant data (work on reducing the number of biomarkers had been previously undertaken [26]), we focused on fuzzy rule-based systems in our study.
Fuzzy rule-based modelling has become an active research field in recent years because of its unique merits in solving complex non-linear system identification and control problems. Primary advantages of this approach include the facility for the explicit knowledge representation in the form of if-then rules, a mechanism of reasoning in human understandable terms, the capacity of taking linguistic information from human experts and combining it with numerical information, and the ability of approximating complicated non-linear functions with simpler models. Unlike conventional modelling, where a single model is used to describe the global behaviour of a system, fuzzy rule-based modelling is essentially a multi-model approach in which individual rules (where each rule acts like a 'local model') are combined to describe the global behaviour of the system [27].
Fuzzy rule-based systems (FRBS) have often been applied to classification problems in which non-fuzzy input vectors are to be assigned to one of a given set of classes to produce high classification accuracy. Many approaches have been proposed for generating and learning fuzzy if-then rules from numerical data for classification problems [28,29]. FRBS are used by Chang and Liu [30] for stock price prediction, while Ishibuchi and Yamamoto [31] show how the rule weight of each fuzzy rule can be specified in FRBS in the case of multiclass pattern classification problems. Of interest to this paper are data-driven FRBS for handling classification tasks.
There are many non-fuzzy classification algorithms currently available [32]. Many of these algorithms generalise well, and so are very useful for classifying new instances, but the models they generate lack comprehensibility. In fact, most of the models generated by non-fuzzy classification algorithms contain numerical values and may not be linguistically interpretable. This makes it harder for the user to utilise the models for decision making purposes. Note that an automated system, also known as a computer assisted system, is normally considered as a tool to assist experts or non-experts in decision making. Hence, interpretability of such a system should be regarded as highly important [33].
The purpose of this paper is to use a data-driven subsethood-based fuzzy rule induction algorithm, named 'fuzzy quantification subsethood-based algorithm' (fuzzyQSBA) [34], to refine previously identified breast cancer treatment groups [35]. In addition, using a rule simplification technique, a linguistic ruleset can be extracted from the algorithm. The main intention of the proposed technique is to build a model that can be easily interpreted by a non-expert in classification systems. The seven breast cancer classes presented by Green et al. were derived using clinical expert knowledge, considering patient outcomes and response to treatments. The underlying classification was first proposed by Soria et al. [21], where different clustering techniques were combined using a consensus approach and six clinically relevant groups were identified. The limitation of the six-class approach reported by Soria et al. was the high number of patients who presented mixed class characteristics and therefore remained unclassified. From a clinical perspective, reducing the number of unclassified patients represents an important challenge in order to be able to advise them on the most accurate and effective treatment. Consequently, reducing the number of unclassified patients to a minimum was also a major objective.
The structure of the paper is as follows: in section 2, the background theory of fuzzy subsethood values, fuzzy quantifiers and subsethood-based fuzzy rule induction algorithms is reported and summarised. At the end of this section, the fuzzyQSBA algorithm is described. Section 3 describes the methodology used and the algorithm specifications. Results of the application of the algorithm to the breast cancer dataset are presented in section 4. Section 5 concludes the paper with a discussion of the results, and suggestions for future work.

Fuzzy subsethood measures
Let A and B be two fuzzy sets defined on the universe U. The fuzzy subsethood value of A with regard to B, S(B, A), represents the degree to which A is a subset of B and is given by equation (1), where S(B, A) ∈ [0, 1] and ∇ is a t-norm, such as the Lukasiewicz operator [36].
The above definition of fuzzy subsethood value can be extended to calculate the degree of subsethood for linguistic terms in an attribute value V to a decision class D. If {A_1, A_2, ..., A_n} ∈ V, it is possible to replace A with A_i and B with D in equation (1).
Many more subsethood measures have been developed and reported in literature [37]. However, in the rest of the paper we will use the definition reported in equation (1) as our goal is to extend the fuzzyQSBA algorithm [34].
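As a rough illustration of the idea behind a subsethood value, the following Python sketch computes the ratio between the cardinality of the intersection of two fuzzy sets and the cardinality of one of them. The function names, the use of the minimum t-norm, the choice of which set normalises the ratio and the example memberships are all assumptions made for illustration; they are not taken from the paper, whose own definition may differ.

```python
def t_min(a, b):
    """Minimum t-norm (one common choice; an assumption here)."""
    return min(a, b)


def subsethood(mu_a, mu_b, t_norm=t_min):
    """Degree to which fuzzy set A is contained in fuzzy set B.

    mu_a, mu_b: membership values of the same objects in A and B.
    Assumed form: sum_x t(mu_a(x), mu_b(x)) / sum_x mu_a(x).
    """
    if len(mu_a) != len(mu_b):
        raise ValueError("membership vectors must have the same length")
    total = sum(mu_a)
    if total == 0:
        return 0.0  # convention for an empty set A (assumption)
    return sum(t_norm(a, b) for a, b in zip(mu_a, mu_b)) / total


# Hypothetical memberships of four objects in A and B.
mu_a = [0.2, 0.5, 1.0, 0.0]
mu_b = [0.4, 0.5, 0.6, 0.9]
print(subsethood(mu_a, mu_b))
```

A value of 1 means that every object belongs to B at least as strongly as it belongs to A; a value of 0 means the two sets do not overlap under the chosen t-norm.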

Rule induction approaches
Fuzzy subsethood values have been used to promote certain linguistic terms as part of the antecedent of an emerging fuzzy rule. This approach involves three main steps [38]: a) classifying training data into subgroups according to the underlying classification results, b) calculating fuzzy subsethood values for every linguistic term, and c) creating rules based on fuzzy subsethood values.
The generation of fuzzy rules is therefore dependent on the fuzzy subsethood values between the decision to be made and the possible linguistic terms of the conditional attributes. In the approach proposed by Chen et al. [38], fuzzy rules are created subject to a pre-specified threshold value α ∈ [0, 1]. Any linguistic term that has a subsethood value that is greater than or equal to α will automatically be chosen as an antecedent for the resulting fuzzy rules. However, this methodology, termed the subsethood-based algorithm (SBA), assumes that all pieces of information gathered from the training data are equally important. This may not be the case in modelling many real problems.
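The threshold step of SBA can be sketched in a few lines of Python. The function name `select_antecedents`, the dictionary layout and the numerical values are hypothetical; only the rule "keep a term if its subsethood value is greater than or equal to α" comes from the description above.

```python
def select_antecedents(subsethood_values, alpha):
    """Keep the linguistic terms whose subsethood value is >= alpha.

    subsethood_values: {attribute: {term: subsethood value for one class}}
    alpha: pre-specified threshold in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        attribute: [term for term, s in terms.items() if s >= alpha]
        for attribute, terms in subsethood_values.items()
    }


# Hypothetical subsethood values for a single decision class.
values = {
    "A": {"A1": 0.9, "A2": 0.1},
    "B": {"B1": 0.0, "B2": 0.8, "B3": 0.7},
}
print(select_antecedents(values, alpha=0.7))
```

Every selected term enters the rule with equal standing, which is the limitation discussed above: a term just above α counts as much as a term with subsethood close to one.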
For this reason, a weighted subsethood-based algorithm (WSBA) has been proposed [39], in which a weighting strategy is used to represent the degree of 'importance'. In particular, weights are created from the subsethood values to provide multiplication factors for each linguistic variable. They are calculated in an intermediate step (between steps b and c, mentioned above) for each A_i ∈ {A_1, ..., A_l}, where A_i is the i-th linguistic term of the linguistic variable A and D is the classification. The advantage of this method compared to the previous one is that it does not require any threshold value α. The crisp weights for each linguistic term can be considered as quantifiers.
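One plausible way to turn subsethood values into crisp weights is to scale them by the largest value within the same linguistic variable, so that the dominant term receives weight one. The Python sketch below assumes this relative form; the formula, the function name `relative_weights` and the numbers are illustrative assumptions rather than a reproduction of the paper's weighting formula.

```python
def relative_weights(subsethood_by_term):
    """Scale the subsethood values of one linguistic variable by their maximum.

    subsethood_by_term: {term: subsethood value of the term for one class}.
    Assumed form: w_i = S_i / max_j S_j, so the strongest term gets weight 1.
    """
    top = max(subsethood_by_term.values())
    if top == 0:
        # No term of this variable overlaps the class at all.
        return {term: 0.0 for term in subsethood_by_term}
    return {term: s / top for term, s in subsethood_by_term.items()}


# Hypothetical subsethood values of the three terms of a variable B.
print(relative_weights({"B1": 0.0, "B2": 0.8, "B3": 0.2}))
```

Because the weights are continuous, no term has to be switched on or off by a threshold; a weak term simply contributes less to the rule.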
A general case application of rule induction approaches is the well known 'Saturday morning problem' [40,41], in which the weather on a Saturday morning (consisting of four attributes, each of which can take two or three linguistic values) is analysed to decide which sport is to be taken (classification result). Chen et al. [38], with their SBA method, achieved a better classification accuracy than the original subsethood-based algorithm [41]. When testing the WSBA on the same problem, Rasmani and Shen obtained even better results [39].

Fuzzy quantifiers
In general, a quantifier in logic can be expressed as Q(x)A(x) where Q(x) is a quantifier and A(x) is a predicate for variable x. In classical logic, both the quantifier and the predicate can be represented by crisp sets. In fuzzy logic the quantifier may be applied to crisp or fuzzy sets. A quantifier based on fuzzy sets seems to be more suitable for quantifier based fuzzy models which are described in natural language.
Although different types of quantifier exist, the fuzzy relative quantifier Q will be considered here, in which µ_Q(q) ∈ [0, 1], with q defined on the real interval [0, 1]. In particular, Q possesses non-decreasing behaviour: ∀ q_1, q_2 ∈ Q, q_1 < q_2 → µ_Q(q_1) ≤ µ_Q(q_2). In general, the membership function µ_Q(q) of a quantifier Q has no direct meaning. Thus, in evaluating a fuzzy quantified proposition, a quantification mechanism is needed to map the membership value µ_Q(q) to a truth value. An example of a quantified statement is "most students who get a high score are young", where 'most' is the quantifier, and 'high' and 'young' are the fuzzy values A and B of equation (1) respectively. The result of evaluating the fuzzy relative quantifier is referred to as the truth-value of the quantifier, and is presented using the notation T_Q [29].
The fuzzy quantification mechanism involves the definition of the existential quantifier, ∃, and of the universal quantifier, ∀. In addition to these, several different quantifiers can be defined, such as 'almost all', 'almost half', 'a few', etc. However, as small changes in the dataset might cause a change of the entire ruleset, a continuous fuzzy quantification method appears more appropriate.
Vila et al. [42] proposed a continuous fuzzy quantifier which uses linear interpolation between the two extreme cases of the existential quantifier ∃ and the universal quantifier ∀. In particular, the quantifier is defined by equation (2), where Q is the quantifier for fuzzy set A relative to fuzzy set D and λ_Q is the degree of neighbourhood of the two extreme quantifiers. The truth values of the existential quantifier, T_{∃,A/D}, and of the universal quantifier, T_{∀,A/D}, are defined by equations (3) and (4), where a_k and d_k are the membership functions of fuzzy sets A and D respectively, ∇ represents a t-norm and ∆ represents the corresponding t-conorm. By using fuzzy subsethood values as the degree of neighbourhood (λ_Q) of the quantifiers, any possible quantifier that exists between the existential and universal quantifiers can be created in principle. Initially, all linguistic terms of each attribute are used to describe the antecedent of each rule. The reason for keeping this complete form is that every linguistic term may contain important information that should be taken into account.
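The interpolation between the two extreme quantifiers can be sketched as follows. This is a minimal illustration under stated assumptions: the existential truth value is taken as the t-conorm over all objects of the t-norm of a_k and d_k, the universal truth value as the t-norm over all objects of the t-conorm of 1 − d_k and a_k, and λ_Q is taken as the weight placed on the existential end. These forms, the min/max operators, the function names and the data are assumptions, not a reproduction of the definitions in [42].

```python
def truth_exists(a, d):
    """Existential end: max over k of min(a_k, d_k) (assumed form)."""
    return max(min(a_k, d_k) for a_k, d_k in zip(a, d))


def truth_forall(a, d):
    """Universal end: min over k of max(1 - d_k, a_k) (assumed form)."""
    return min(max(1.0 - d_k, a_k) for a_k, d_k in zip(a, d))


def truth_quantifier(a, d, lam):
    """Linear interpolation between the two extremes, with lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * truth_forall(a, d) + lam * truth_exists(a, d)


# Hypothetical data: memberships a_k in a linguistic term and crisp class
# indicators d_k (1 = object belongs to the class, 0 = it does not).
a = [0.9, 0.7, 0.2]
d = [1.0, 1.0, 0.0]
print(truth_exists(a, d), truth_forall(a, d), truth_quantifier(a, d, lam=0.8))
```

Varying `lam` continuously between 0 and 1 moves the result smoothly between the universal and the existential truth value, which is what allows intermediate quantifiers such as 'almost all' to be expressed.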

FuzzyQSBA algorithm
The continuous fuzzy quantifiers are created using information extracted from data and behave as modifiers for each of the fuzzy terms. They can then be used to replace the crisp weights in WSBA, employing the quantification method proposed by Vila et al. [42]. Several reasons have been taken into account to support the use of Vila et al.'s approach:
i. The use of the degree of neighbourhood enables the implementation of continuous quantifiers. Thus, any possible quantifier can be created in principle.
ii. The relative quantifier based method proposed by Vila et al. can be adapted into WSBA easily, thanks to the structure of the WSBA general rule. Thus, the simplicity of WSBA can be preserved.
iii. Relative subsethood values can be used as the degree of neighbourhood of the fuzzy quantifiers. Thus, the two seemingly separate approaches are unified.
iv. This approach fulfills the desirable monotonicity and duality properties of quantification.
v. From a clinical point of view, continuous quantifiers are useful because their interpretability is normally regarded as highly important when developing decision support systems [43,44].
The resulting new method is called fuzzyQSBA [34] and the induced ruleset can computationally be represented by equation (5), where Q(A_ij, D_k) are fuzzy quantifiers as described in equation (2) and µ_Aij(x) are fuzzy linguistic terms [33].
The crisp weights that were used in WSBA are herein replaced by fuzzy quantifiers. The main difference between fuzzyQSBA and WSBA is that in WSBA the weights for each linguistic term are crisp values and behave as multiplication factors for the linguistic terms. In fuzzyQSBA both the quantifiers and the linguistic terms are fuzzy sets. This offers flexibility as it enables the use of t-norm operators to interpret Q(A_ij, D_k) ∇ µ_Aij(x) whilst guaranteeing that the inference results are fuzzy sets.
The use of fuzzy quantifiers in fuzzyQSBA also enables representation of the ruleset in a more natural way. This can be shown by the following example, in which a general rule is considered for the three different algorithms. For fuzzyQSBA, the rule reads: "IF A is ((almost all) A1 OR (a little) A2) AND B is ((almost all) B2 OR (almost a quarter of) B3) AND C is ((almost all) C1) THEN Output is D".
Clearly, the use of fuzzy quantifiers makes the model more readable, although the computation still needs to be performed using real numbers. Rules presented in the last example above are also useful for clinical judgment. For most of the biomarkers used in this work, there are no standard cut points used in clinical practice. The clinical cut point for ER and PgR, for example, is used to identify patients suitable for hormone therapy. However, there is evidence of a differential response to hormone therapy with increasing levels of these receptors, supporting the use of continuous data [45]. In addition, no evidence exists for a single clear HER2 status / protein level and response to treatment. For these reasons, the use of continuous rather than categorical data was deemed to be more appropriate for all markers, and hence the rules in the aforementioned form.
Based on the definitions of the fuzzy subsethood value, the existential quantifier and the universal quantifier (equations (1), (3) and (4)), it can be shown that if λ_Q is equal to zero then the truth-value of quantifier Q will also be equal to zero. Thus, during the rule generation process, the emerging ruleset is simplified, as any linguistic term whose quantifier has a truth-value of zero will be removed automatically from the fuzzy rule antecedents. Figure 1 shows the framework for this approach.
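A rule of the form shown in the example above (a conjunction over attributes of disjunctions over quantified linguistic terms) can be evaluated numerically as in the following Python sketch, which also drops terms whose quantifier is zero. The data layout, the function names and all numerical values are hypothetical, and treating an attribute with no remaining terms as unconstrained is an assumption.

```python
def prune_rule(rule):
    """Drop linguistic terms whose quantifier truth-value is zero."""
    return {
        attribute: {term: q for term, q in terms.items() if q > 0.0}
        for attribute, terms in rule.items()
    }


def rule_membership(rule, memberships, t_norm=min, t_conorm=max):
    """Evaluate one rule: AND over attributes of OR over terms of (q AND mu).

    rule:        {attribute: {term: quantifier truth-value}}
    memberships: {attribute: {term: membership of the case in the term}}
    """
    result = 1.0  # neutral element of a t-norm
    for attribute, terms in rule.items():
        if not terms:
            continue  # no remaining term: attribute is unconstrained (assumption)
        disjunction = 0.0  # neutral element of a t-conorm
        for term, q in terms.items():
            disjunction = t_conorm(disjunction, t_norm(q, memberships[attribute][term]))
        result = t_norm(result, disjunction)
    return result


# Hypothetical quantifiers for one class and memberships for one case.
rule = {
    "A": {"A1": 0.95, "A2": 0.1},
    "B": {"B1": 0.0, "B2": 0.9, "B3": 0.25},
    "C": {"C1": 1.0},
}
memberships = {
    "A": {"A1": 0.8, "A2": 0.2},
    "B": {"B1": 0.1, "B2": 0.6, "B3": 0.3},
    "C": {"C1": 0.7},
}
print(rule_membership(prune_rule(rule), memberships))
```

Passing other operators, for example a product t-norm and a probabilistic-sum t-conorm, through `t_norm` and `t_conorm` changes how the same rule is interpreted without changing its linguistic form.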

Algorithm specification
The dataset used for the development of the algorithm consists of a cohort of 1,073 patients presented at Nottingham City Hospital between 1986 and 1998 with primary operable breast cancer [46]. Among all the available information, the following ten markers were considered: ER, PgR, CK7/8, CK5/6, EGFR, HER2, HER3, HER4, p53 and MUC1. A larger panel of markers was used in earlier work [21]; this was subsequently reduced to the above-mentioned ten as the minimum number of markers compatible with retaining usefulness for clinical decision making [26]. The minimised panel of ten protein biomarkers has been recently used to identify core classes which are clinically meaningful and well-characterised [35]. Three of these classes had not been previously identified and, while their precise prognostic and therapeutic relevance is not yet clear, their elucidation serves as a basis for ongoing investigations in order to address these important factors.
The core molecular classes identified by Green et al. are similar to those determined by gene expression profiling, but we have been able to refine the definition of the luminal and basal tumours into further distinct classes with different clinical outcome.
The same seven classes previously identified [35] were considered, to be classified using the specified ten markers. The original distribution of patients in these seven groups is presented in the first row of table 1. It can be seen that 76 patients remained unclassified (either distant from all classes or presenting mixed characteristics). In the development of the algorithm, all ten markers were used for the identification of the proper class. No missing values were present in the data set, so the set of 1,073 patients is complete with all information for the ten markers.
[ Table 1 about here.]
The whole algorithm was coded using R, a free software environment for computing and graphics [47].

Class membership algorithm
The data-driven subsethood-based fuzzy rule induction algorithm, fuzzyQSBA [34], was used to determine the fuzzy class membership rules. In our particular case, the sets A and D of equation (2) are the set of fuzzified data and the set of classification outcomes, respectively. In addition, it is important to note that the classification outcome D is not fuzzy. Thus, the value of d_k in equations (3) and (4) is always one.
Training and test data sets were transformed (fuzzified) using membership functions to create values representing the terms 'high' and 'low'. Membership functions were represented using sigmoid equations. In particular, for the term 'low' the function f(x) = 1 / (1 + e^{k(x − c)}) was used, while f(x) = 1 / (1 + e^{−k(x − c)}) was used for 'high'. In these equations, k represents a constant value defining the slope of the curve, while c is the fixed cut-off point for the specific variable.
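The two sigmoid membership functions can be written directly in Python. The function names and the values chosen for c and k below are hypothetical and serve only to show the shape of the curves; the paper's own cut-offs and slopes are not reproduced here.

```python
import math


def mu_low(x, c, k):
    """Membership in 'low': 1 / (1 + exp(k * (x - c)))."""
    return 1.0 / (1.0 + math.exp(k * (x - c)))


def mu_high(x, c, k):
    """Membership in 'high': 1 / (1 + exp(-k * (x - c)))."""
    return 1.0 / (1.0 + math.exp(-k * (x - c)))


# Hypothetical cut-off and slope for a marker scored between 0 and 300.
c, k = 100.0, 0.05
for score in (0, 50, 100, 150, 300):
    low, high = mu_low(score, c, k), mu_high(score, c, k)
    print(score, round(low, 3), round(high, 3), round(low + high, 3))
```

At the cut-off both memberships equal 0.5, and because the same c and k are used in both functions the two values sum to one for every score.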
For each variable, cut-off points c were selected to determine whether a particular value should be considered 'high' or 'low'. This was done by combining clinical knowledge and information extracted from the data. In particular, the median value of markers was used for ER, PgR, CK7/8, HER3, HER4 and MUC1. Clinical expertise was used for those markers (CK5/6 and HER2) for which clinical knowledge concerning the appropriate cut-off value is well-established, and for those which had a median equal to zero (EGFR and p53). An example of possible membership functions for the ten markers is shown in figure 2.
Having selected the cut-off c for each variable, the same values of c and of the slope k were used for both the 'low' and 'high' membership functions, so that µ(low) = 1 − µ(high).
[ Figure 2 about here.]
The next step was to select the t-norms and t-conorms to be used for conjunction ('and') and disjunction ('or') operations. A t-norm is a kind of binary operation used in fuzzy logic which generalises conjunction in logic. T-norms are also used to construct the intersection of fuzzy sets. Different examples of t-norms have been proposed, with the most commonly used including the minimum, T_min(a, b) = min(a, b), and the product, T_prod(a, b) = a · b. T-conorms are dual to t-norms, generalising disjunction. Given a t-norm T, the complementary conorm is defined by ⊥(a, b) = 1 − T(1 − a, 1 − b). Important t-conorms are those dual to prominent t-norms: the maximum, ⊥_max(a, b) = max(a, b), and the probabilistic sum, ⊥_sum(a, b) = a + b − a · b.
In the development of the algorithm, the two different operator families (min-max and product-sum) were compared in both testing and training phases. It was found that the best overall performance, particularly in terms of the distribution of patients in the HER2 groups, was obtained when the min-max operators were used in the training phase (for deriving the classification rules) while the product-sum operators were used in applying the classification rules. This may be related to the fact that, with the min-max operators, there is a risk of losing some information. If, for instance, the minimum between two values has to be computed and one of them is always 0.01, then the result will not be affected by the second term being either 0.99 or 0.02. While we cannot explain the theoretical basis for this result, we nevertheless selected the best overall model.
The resulting class membership is a possibility, which differs from a conventional probability: in the latter case, the sum of all probabilities of a single instance across all classes should be one. For possibilities, instead, every number should be between 0 and 1, but the sum across classes may be greater than 1.
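The two operator families compared above, and the loss of information noted for the minimum, can be illustrated with a short Python sketch. The function names are hypothetical; the duality construction is the standard one, 1 − T(1 − a, 1 − b).

```python
def t_min(a, b):
    """Minimum t-norm."""
    return min(a, b)


def t_product(a, b):
    """Product t-norm."""
    return a * b


def dual_conorm(t_norm):
    """Build the t-conorm dual to a t-norm: S(a, b) = 1 - T(1 - a, 1 - b)."""
    def conorm(a, b):
        return 1.0 - t_norm(1.0 - a, 1.0 - b)
    return conorm


s_max = dual_conorm(t_min)           # behaves as max(a, b)
s_prob_sum = dual_conorm(t_product)  # behaves as a + b - a * b

# The minimum ignores the second argument once the first is very small,
# whereas the product still distinguishes the two cases.
print(t_min(0.01, 0.99), t_min(0.01, 0.02))
print(t_product(0.01, 0.99), t_product(0.01, 0.02))
```

With the minimum both pairs give the same result, while with the product they differ by a factor of roughly fifty, which is one way of seeing why the two families can lead to different class memberships for the same rules.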
As a result, in the original dataset, seven extra columns were to be added by the algorithm for each patient.
In each of these, a class membership (possibility) was reported. An example output is shown in

Class assignment algorithm
It is important to distinguish the fuzzy class membership algorithm from the class assignment algorithm. The former takes the H-scores for the ten markers (from clinical measurement) and uses the fuzzy methodology described in section 3.2 to determine the fuzzy class membership of the patient in each of the seven classes. The latter subsequently takes the results obtained from the class membership algorithm, and uses a 'hard' strategy described below to determine the single 'best' class to represent each patient. This allows classes to be populated to allow comparisons with previous classifications and to meet algorithm specifications. The class assignment algorithm works as follows. Once a patient has been assigned a membership value for each class, the first and the second highest membership values are considered. If the difference between them is greater than a specified threshold, then the patient is assigned to the class with the maximum membership.
If the difference is less than the threshold but the second maximum is in the same class family (luminal / basal / HER2) as the first maximum, then the patient is also assigned to the class with maximum membership. Otherwise the patient is assigned to the 'not classified' group. The specific values of the class assignment thresholds are not revealed in this paper as they are the intellectual property of a spin-out company, Nottingham Prognostic Ltd [48], which is commercialising the decision support system.
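The assignment procedure described above can be sketched in Python as follows. Since the actual thresholds are not disclosed, the value 0.1 used here is purely hypothetical, as are the function name and the membership values. The mapping of classes to families is also an assumption: classes 6 and 7 are described as HER2 and classes 4 and 5 as basal elsewhere in the paper, and classes 1 to 3 are taken to be luminal by elimination. A difference exactly equal to the threshold is treated like a smaller one, which the text leaves unspecified.

```python
# Assumed mapping from class number to class family.
CLASS_FAMILY = {
    1: "luminal", 2: "luminal", 3: "luminal",
    4: "basal", 5: "basal",
    6: "HER2", 7: "HER2",
}


def assign_class(memberships, threshold, families=CLASS_FAMILY):
    """Return the assigned class, or None for 'not classified'.

    memberships: {class number: class membership (possibility)}.
    threshold:   minimum gap between the two highest memberships.
    """
    ranked = sorted(memberships, key=memberships.get, reverse=True)
    first, second = ranked[0], ranked[1]
    gap = memberships[first] - memberships[second]
    if gap > threshold:
        return first  # clear winner
    if families[first] == families[second]:
        return first  # close call, but within the same family
    return None  # close call across families: not classified


# Hypothetical memberships for three patients.
clear = {1: 0.90, 2: 0.30, 3: 0.10, 4: 0.05, 5: 0.05, 6: 0.02, 7: 0.01}
same_family = {1: 0.55, 2: 0.50, 3: 0.10, 4: 0.05, 5: 0.05, 6: 0.02, 7: 0.01}
mixed = {1: 0.55, 2: 0.10, 3: 0.10, 4: 0.50, 5: 0.05, 6: 0.02, 7: 0.01}
for patient in (clear, same_family, mixed):
    print(assign_class(patient, threshold=0.1))
```

Only the third case, where the two leading classes are close and belong to different families, ends up in the 'not classified' group.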

Verification and Validation
Once completed, the algorithm was verified to assess whether it fulfils its requirements using the same 'internal' dataset. Following suggestions from clinicians, it was agreed that a suitable final classification should have between 12% and 15% of patients in HER2 classes (6 and 7 combined), while the number of 'not classified' patients should remain lower than 5%. Cohen's kappa index [49] and Kendall's tau coefficient [50] were used to measure the agreement between old and new classifications.
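For readers unfamiliar with the two agreement measures, the following self-contained Python sketch computes Cohen's kappa between two labellings and a simple Kendall's tau between two sequences of class sizes. The implementations are textbook versions written for illustration (the tau shown is the tau-a variant without tie correction, which may differ from the variant used in the study), and all data are made up; none of the numbers relate to the results of this paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labellings of the same cases."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labellings use one identical label throughout
    return (observed - expected) / (1.0 - expected)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are required")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Made-up labels for six cases and made-up sizes of four classes.
old_labels = [1, 1, 2, 2, 3, 3]
new_labels = [1, 1, 2, 3, 3, 3]
print(cohens_kappa(old_labels, new_labels))
print(kendall_tau_a([10, 20, 30, 40], [12, 18, 35, 33]))
```

Kappa compares the labels case by case, whereas tau applied to class sizes only asks whether the ordering of the classes by size is preserved, so the two measures answer different questions.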
To avoid the over-fitting problem and issues about performing a test on self, the method underwent preliminary validation on novel data to determine whether it is applicable to other sources. An additional set of 238 patients, recently added to the Nottingham Tenovus Primary Breast Carcinoma Series [46] was used. Information about the ten biomarkers was available for all patients. As a first measure of comparison between obtained results, boxplots of the marker distributions in each class were created for the original (1,073) cases and the new (238) cases.

Marker distributions were analysed in each class using Kruskal-Wallis tests.
A complete and thorough independent validation is still required in order to confirm the algorithm for clinical use. To perform this further validation, new data are currently being collected, and the whole validation process will be the subject of future work.

Results
The algorithm was run over the entire data. While the training of the algorithm was performed on the original dataset omitting the 76 not-classified cases (i.e. 997 cases), the whole dataset (1,073 cases) was used for testing purposes. Having defined all the necessary terms, equation (5) could be applied to define the ruleset and to compute membership values to each class.

Class membership algorithm
The linguistic rules table was generated using the quantifiers obtained by equation (2) and the cut-off points. In particular, the quantifiers table contained values for the 'high' and 'low' rules.
The difference d between these two values was compared with a threshold λ and the terms High, Low and Omit were placed in the linguistic rules table using some rule simplification techniques.
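The comparison of d with λ can be sketched as a small Python function. It assumes that d is the 'high' quantifier minus the 'low' quantifier and that a difference of exactly zero is reported as Omit; the function name, these conventions and the numbers are illustrative assumptions, and the actual value of λ used in the study is not given here.

```python
def linguistic_label(q_high, q_low, lam):
    """Map the quantifiers of one marker in one class to High, Low or Omit.

    q_high, q_low: quantifier values for the 'high' and 'low' terms.
    lam:           threshold below which the marker is not considered.
    """
    d = q_high - q_low
    if abs(d) < lam or d == 0:
        return "Omit"  # the two terms are too close to discriminate
    return "High" if d > 0 else "Low"


# Hypothetical quantifier pairs with a hypothetical threshold of 0.2.
for q_high, q_low in [(0.9, 0.1), (0.1, 0.9), (0.5, 0.45)]:
    print(q_high, q_low, linguistic_label(q_high, q_low, lam=0.2))
```

A larger λ produces more Omit entries and therefore shorter rules, at the cost of discarding markers that discriminate only weakly.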
In particular, if the absolute value of d was lower than λ then Omit was entered in the table. If d was greater than zero, then High was entered; otherwise, if d was smaller than zero, Low was placed in the table. By using this procedure, the linguistic rules table reported in table 3 was obtained.
In table 3, Omit means that the specified marker is not considered for the respective class membership. As all markers appear in at least one class rule, in general all ten markers are needed for any new case. It might be possible, of course, to implement a 'step-wise' algorithm that measures the minimum number of markers that characterise any one single class (four markers for class 3) and assesses whether they match. If so, the sample could be assigned to that class; if not, then more markers need to be measured. By doing so, it would be possible to reduce the number of markers measured for some samples, but at the expense of extra complexity. Note that the class assignment algorithm outlined in section 3.3 would also need detailed alteration, as the algorithm presented requires all seven class memberships to be calculated as input to the algorithm. We propose the simpler option of measuring all ten markers for all cases.
Table 3 was then compared to the ruleset defined by expert clinicians following the classification obtained by Green et al. [35] and reported in figure 3. It can be seen that ER, HER2, PgR and p53 are completely concordant, while CK7/8 and CK5/6 clearly identify the basal group (classes 4 and 5). HER4 discriminates between classes 1 and 2, while HER3 is also relevant in the characterisation of the latter. It is worth noting, in fact, that re-running a similar algorithm without considering the HER3 marker (i.e. using a 9-marker dataset) leaves a considerable number

Class assignment algorithm
By using the class assignment rules described above, the final classification was obtained as shown in table 1 (second row).

Verification and Validation
The HER2 group represented 13.7% of the total number of patients, while the 38 unclassified

Discussion
This paper has presented a data-driven subsethood-based fuzzy rule induction algorithm, fuzzyQSBA [34] and its application to a breast cancer dataset. The results show that the model is able to categorise patients into the seven treatment groups previously identified [35] and demonstrate that the final classification indeed meets the initial algorithm requirements and specifications. In addition, our proposed model provides a simple, understandable rule set for classification of patients.
In recent years, several studies have been carried out investigating the application of protein biomarker panels (with known relevance to breast cancer), to large numbers of cases using tissue microarrays, exploring the existence and clinical significance of distinct breast cancer classes [13][14][15][16]. In particular, Abd El-Rehim et al. [46] identified and characterised five breast cancer classes, with a sixth group of only four cases also identified but considered too small for further detailed assessment. Subsequently, we investigated the stability of the proposed classification across different case sets, assay methods and data analysis procedures by investigating the effects of multiple hard-clustering methods on a breast cancer dataset [21,51]. This led to a clear definition of cancer classes, but left many patients in a mixed-classified or unclassified group.
A different approach to hard-clustering is the use of fuzzy methodologies, which have become increasingly important over recent years in addressing classification problems. Specifically, fuzzy rule-based systems have been utilised to produce high classification accuracy through linguistic rulesets. In this context, the fuzzyQSBA algorithm was developed [29], which uses continuous fuzzy quantifiers to create the ruleset.
Using the fuzzyQSBA method together with the class assignment algorithm presented in this paper, it was possible to obtain a refinement of the seven breast cancer classes presented by Green et al. [35]. This has led to a more 'clinically acceptable' classification, with the proportion of HER2-positive patients ranging between 12.5% and 15% and the total number of unclassified patients being only 3.5% of the available cases. The distribution and percentages of breast cancer patients in the three main classes of luminal, basal and HER2 were established in a seminal study by Sorlie et al. [6] and confirmed by subsequent papers [52][53][54]. The method described here has produced a breast cancer classification consistent with the proportion of cancer subtypes reported in other studies. In addition, these new subclasses have significant differences in tumour characteristics and in clinical outcome, as reported in our most recent study [45]. A linguistic rules table representing the numerical ruleset was also produced (table 3), to facilitate the decision-making process for any future patient diagnosed with breast cancer. Comparing it with the expert ruleset created by clinicians (figure 3) shows how easily understandable and clinically interpretable our proposed model is. The only difference concerns the HER3 biomarker, which seems to be relevant only for the classification of patients in class 2 (and is Omit for all other classes). Further analysis is needed to check whether the incorporation of another marker in the model (Ki67/MIB1) can make HER3 redundant.
The distinction between the class membership and the class assignment algorithms in our proposed methodology is a real strength and can facilitate the medical decision-making process. First, the class membership provides an indication of each patient's likelihood of presenting characteristics of the specified classes. The resulting table can be directly analysed by medical experts when deciding which therapy might be the most beneficial for a particular patient. If, instead, a clearer and more decisive classification is required, it is sufficient to run the class assignment algorithm to obtain an indication of each class population. However, one can argue that too many variables and thresholds need to be manually passed to the proposed approach. While we acknowledge this, and accept that it may be seen as a limitation, we argue that the existence of such parameters makes it possible to adjust them in future to reflect different clinical priorities or external conditions. From a medical perspective, the definition of seven classes resulting from this paper has been used as a starting point for the creation of a clinically usable tool for prospective classification (called 'NPI+'), taking into account current therapeutic strategies [45].
In conclusion, we have shown how the use of fuzzy quantifiers in a subsethood-based algorithm may improve both the classification accuracy and the interpretability of derived rulesets. Clinicians can use the linguistic ruleset to quickly assess a patient's tumour biology and select the most appropriate treatment regimen accordingly. A thorough external validation phase is underway, in which more data from different European centres are being collected and scored to properly assess the accuracy of our methodology and to address concerns about biases and self-testing. In the meantime, validation on a newer small breast cancer cohort has given promising results. Future work will also focus on determining whether novel markers need to be incorporated in the model itself.

[Figure panel: after algorithm results (label 'nc38'); (b) 238 patients]

Table 2: An example of a possible output of the algorithm. Note that patient 2879 was originally assigned to class 4, although the basal marker CK5/6 has a value of zero. This explains why the possibility value for class 4 is not particularly high.