Coevolutionary Fuzzy Attribute Order Reduction With Complete Attribute-Value Space Tree

Since big data sets are structurally complex and high-dimensional, and their attributes exhibit redundant and irrelevant information, the selection, evaluation, and combination of those large-scale attributes pose huge challenges to traditional methods. Fuzzy rough sets have emerged as a powerful vehicle for dealing with uncertain and fuzzy attributes in big data problems that involve a very large number of variables to be analyzed in a very short time. To further overcome the inefficiency of traditional algorithms on uncertain and fuzzy big data, in this paper we present a new coevolutionary fuzzy attribute order reduction algorithm (CFAOR) based on a complete attribute-value space tree. A complete attribute-value space tree model of the decision table is designed in the attribute space to adaptively prune and optimize the attribute order tree. The fuzzy similarity of multimodality attributes can be extracted to satisfy the needs of users with better convergence speed and classification performance. Then, the decision rule sets generate a series of rule chains to form an efficient cascade of attribute order reduction and classification with a rough entropy threshold. Finally, the performance of CFAOR is assessed on a set of benchmark problems that contain complex high-dimensional datasets with noise. The experimental results demonstrate that CFAOR achieves higher average computational efficiency and classification accuracy than state-of-the-art methods. Furthermore, CFAOR is applied to extract different tissue surfaces of the dynamically changing infant cerebral cortex, and it achieves a satisfying consistency with those of medical experts, which shows its potential significance for disorder prediction of the infant cerebrum.


I. INTRODUCTION
In recent years, with the development of various technologies, massive amounts of data are being continually generated around us. Big data have attracted plenty of attention from a variety of fields such as biology, health, business management, cognition analysis of the human brain, and so on [1]-[3]. The data contents include a complex mixture of texts, speeches, images, and videos [4], [5]. It is also true that massive amounts of data can potentially provide a much deeper understanding of both nature and society, opening up new ways for research. It is, however, a challenging task to extract useful knowledge from such big data.
It has been observed that many real datasets are structurally complex, high-dimensional, and multi-granular, and their attributes usually exhibit some irrelevant and redundant information. In these cases, the big datasets increase dynamically in size, as occurs in fields such as public health and welfare, economic analysis, and medical research. Therefore, a series of emerging topics such as big data acquisition, storage, management and processing are important issues [6], [7]. It becomes highly desirable to develop effective and representative methodologies to analyze big data and further handle their characteristics, such as redundancy, uncertainty, fuzziness, and heterogeneity. The massive amount of data makes traditional data analytical methods inadequate to tackle many real-world high-dimensional problems. Although a large number of candidate attribute sets are provided, most of them may turn out to be redundant or irrelevant, which heavily deteriorates the performance of traditional methods.
With the over-flooding of big data, researchers and practitioners have started showing remarkable interest in exploring the data space, and have considered structuralized knowledge reasoning to be an effective computational paradigm for dealing with big data tasks. Granular computing (GC) focuses on knowledge representation and reasoning with information granules, and fuzzy sets and rough sets are two crucial branches of GC [8], [9]. Fuzzy set theory (FST) was introduced by Zadeh in 1965 to represent concepts with ambiguous boundaries and to understand the processes of complex human reasoning [10]. It has become a popular tool for the design of fuzzy classifiers. However, a fuzzy set is only characterized by a membership function, which largely ignores uncertainty. Rough set theory (RST) was presented by Pawlak in 1982 to quantitatively analyze uncertainty and to process incomplete knowledge [11]. It can find a decision-making table between strict statistics and random distribution. Since RST can typically describe the uncertainty of knowledge, it has been extensively used in data mining, knowledge discovery, and intelligent systems [12]-[16]. Fuzzy rough sets (FRS) combine the advantages of the two complementary areas (RST and FST), which provides an effective way to overcome the problem of discretization. They can be widely applied to various kinds of attribute reduction problems on numerical or continuous large-scale datasets [17]-[20]. An FRS is defined by two fuzzy sets, the fuzzy lower and upper approximations, which are obtained by extending the corresponding crisp rough set notions. Elements that belong to the lower approximation are considered to belong to the approximated set with absolute certainty. Elements in the fuzzy rough case have a membership degree in the range [0, 1], which allows for greater flexibility in handling uncertain information [21], [22].
Thus, there is a good potential to improve reasoning and understanding of big data by using a FRS method.

A. Related Work
In recent years, some significant algorithms and models based on FRS have been presented. Zhao et al. [23] used a generalized FRS model to construct a rule-based classifier, in which the consistence degree was used as the reasonable critical value to keep the discernibility information invariant. The experimental results showed this model was effective and feasible on noisy data; however, its computational capability on big data needed further improvement. Jensen and Cornelis [24] exploited the concepts of lower and upper approximations based on FRS and put forward a new nearest-neighborhood algorithm to classify all test objects and predict their decision values. Experimental results showed that this algorithm was competitive with some leading classification methods. However, one obvious limitation was that no way was designed to handle data possessing missing values. Hu et al. [25] summarized the properties of typical fuzzy rough models in handling noisy tasks and revealed why they were sensitive to the level of noise in fuzzy rough computation. Then a collection of robust FRS models based on fuzzy lower approximations was developed, and the experimental results on real-world tasks illustrated the effectiveness of these models. Parthaláin and Jensen [26] used FRS to select features for inclusion in and removal from the final candidate subset and presented two unsupervised feature selection approaches, UFRFS and dUFRFS. The approaches were shown to retain useful features, but UFRFS utilized a simple yet effective backward elimination search, whilst dUFRFS adopted a greedy forward selection search; both search techniques often returned sub-optimal results. Zeng et al. [27] combined the hybrid distance and the Gaussian kernel to construct a novel FRS, and presented incremental algorithms for feature selection.
The efficiency of updating feature sets can be improved, but the variation of multiple attributes was not taken into full consideration. Maji and Garai [28] defined the lower and upper relevance and significance of features for interval type-2 (IT2) fuzzy approximation spaces, and presented an IT2 FRS-based attribute selection method by integrating the merits of IT2 FRS and the maximal relevance-maximal significance (MRMS) criterion. The effectiveness of the proposed method was shown on several benchmarks and microarray gene expression data sets. Yang et al. [29] presented two incremental algorithms with FRS for attribute reduction in terms of one incoming sample and multiple incoming samples, respectively. The relative discernibility relation was incrementally calculated for each condition attribute. The experimental results demonstrated that the proposed algorithms could obtain the reduction result with good classification accuracy, but they were not applied to real-world big data. Hence their efficiency still needs to be improved for complex, high-dimensional, and multigranular big data applications.

B. Limitations and Challenges
In the era of big data, the recent apparent progress of FRS algorithms can benefit the analysis of the big data problems we confront. Meanwhile, it is worth mentioning that, although these FRS-based algorithms are dominant in classification performance, there is still a lack of deep studies of their applications to current complex big data. The traditional algorithms suffer from the following essential limitations and challenges:
1) Most traditional algorithms are more suitable for medium-sized datasets. If the sample size or the attribute size of the datasets becomes very large, the processing time of attribute reduction will grow tremendously with the increasing number of feature dimensions and instances. Furthermore, data dynamism arises because the mechanism that generates the data changes at different times or under different real-world circumstances, which adds new uncertainty to big data analysis; thus, the inherent interaction relations among different attributes are not fully captured. Although we can incorporate some known information about the desired data partitions into the decomposition process, this is not valid for handling dynamic big data tasks. Improving the efficiency of fuzzy rough attribute reduction algorithms on dynamically increasing big data has therefore become a significant research topic, as it accelerates the process of finding reduction sets.
2) The noise problem is one of the main sources of uncertainty in big data applications. When noisy or inconsistent data sets that have many boundary objects are added, most traditional algorithms usually produce undesirable feature subsets, since their auxiliary space occupies a large amount of memory, which is detrimental to the attribute reduction process. Moreover, data objects are normally associated with complex classification scenarios. With dramatically increasing noise, the speed and volume of data generation deteriorate rapidly, so it is very difficult to generate accurate fuzzy similarity relations for an effective fuzzy attribute reduction process. As a result, it often cannot be guaranteed that the desired reduction set is the optimal attribute set satisfying the user's need; consequently, the performance of attribute reduction is often unreliable. Clearly, the noise problem in big data greatly restricts the practical applications of traditional algorithms.

C. Contributions
To address the challenges listed above, we present a new co-evolutionary fuzzy attribute order reduction (CFAOR) algorithm based on a complete attribute-value space tree (CAST) to achieve efficient attribute reduction on high-dimensional and uncertain big data. The CAST of the decision table in the attribute space can adaptively prune and optimize the attribute order tree. The reduced attribute set can satisfy the needs of users with a better convergence speed while providing the same classification performance as the original attribute set. The CFAOR algorithm is not only suitable for changing large-scale datasets with interdependent and overlapping attribute variables, but also handles large-scale complex noisy data while preserving the consistency of a given decision table. The CAST provides a new viewpoint for understanding and extending FRS-based fuzzy attribute reduction on big data.
CFAOR is extensively compared with state-of-the-art fuzzy attribute reduction methods on publicly available datasets, and the experimental results demonstrate its superiority. CFAOR is also applied to identify different tissue surfaces of the dynamically changing infant cerebral cortex, where it finds more preferred tissue surfaces in dynamically changing infant cerebrum regions. These encouraging results achieve a satisfying consistency with those of medical experts. The main advantages of CFAOR are thus its high efficiency and robustness in attribute reduction, which make it particularly suitable for complex big data.

D. Organization
This paper is organized as follows: In Section II, we provide some preliminaries. A CAST model of decision table is constructed in Section III. A new CFAOR algorithm is presented in Section IV. An extensive experimental evaluation is provided in Section V. The application performance in the tissues extraction of dynamical changing infant cerebrum regions is detailed in Section VI. Finally, some conclusions are given in Section VII.

II. PRELIMINARIES
In this section, we introduce some relevant preliminaries related to fuzzy rough set theory and decision-making system with rough entropy threshold.
Definition 1: [10] Let U be a non-empty finite set of samples. Each sample is described by a set of real-valued condition attributes A and a symbolic decision attribute set D = {d}. The pair (U, A ∪ D) with A ∩ D = ∅ is called a fuzzy decision table. If the decision attribute d divides U into a family of disjoint subsets U/D = {Y1, Y2, . . . , Yt}, then the subset containing x is denoted as the decision class to which the sample x belongs.
Definition 2: [8] Generalizations of the granule-based approximation operators can be obtained by replacing the partition U/E by a covering of U. Let I be an index set; then a collection C = {Ci | i ∈ I} of non-empty subsets of U is called a covering of U if ∪i∈I Ci = U.
Definition 3: [30], [31] A pair of approximation operators is called dual if, for all A ⊆ U, apr(co(A)) = co(apr(A)), where co(·) denotes the complement operation on the universe of objects. Equivalence classes can be generalized by neighborhood operators. A neighborhood operator N is a mapping N : U → P(U), where P(U) represents the collection of subsets of U. It is assumed that the neighborhood operator is reflexive, i.e., x ∈ N(x) for all x ∈ U.
Definition 4: [32] For each condition attribute a ∈ A, one can define a fuzzy binary relation Ra, which is called a fuzzy equivalence relation if Ra is reflexive (R(x, x) = 1), symmetric (R(x, y) = R(y, x)), and sup-min transitive (R(x, z) ≥ sup_y min{R(x, y), R(y, z)}). A subset B ⊆ A also defines a fuzzy equivalence relation, denoted by RB(x, y) = min_{a∈B} Ra(x, y). Let F(U) be the fuzzy power set of U and B ⊆ A. For each x ∈ U, a pair of lower and upper approximation operators of X ∈ F(U) based on RB is defined as
R_B(X)(x) = inf_{y∈U} max{1 − RB(x, y), X(y)},
R̄_B(X)(x) = sup_{y∈U} min{RB(x, y), X(y)}.
Here R_B(X)(x) is considered as the degree to which x certainly belongs to X, while R̄_B(X)(x) is the degree to which x possibly belongs to X. (R_B(X), R̄_B(X)) is referred to as the fuzzy rough set of X with respect to B.
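As a numerical illustration of the lower and upper approximation operators in Definition 4, the following is a minimal NumPy sketch; the relation matrix R and fuzzy set X are toy inputs chosen for illustration, not data from the paper:

```python
import numpy as np

def fuzzy_lower(R, X):
    # R_B(X)(x) = inf_y max(1 - R(x, y), X(y))
    return np.min(np.maximum(1.0 - R, X[None, :]), axis=1)

def fuzzy_upper(R, X):
    # R-bar_B(X)(x) = sup_y min(R(x, y), X(y))
    return np.max(np.minimum(R, X[None, :]), axis=1)

# With the identity relation (each sample related only to itself),
# both approximations collapse to X itself.
R = np.eye(3)
X = np.array([0.2, 0.7, 1.0])
```

For any reflexive relation, the lower approximation never exceeds the upper approximation elementwise, matching the certainty/possibility reading in the text.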
The essence of the lower and upper approximations is illustrated in Fig. 2.
Definition 5: For a fuzzy decision table (U, A ∪ D) and B ⊆ A, the fuzzy-rough positive region of D with respect to B is defined as
POS_B(D)(x) = sup_{X∈U/D} R_B(X)(x), x ∈ U.
Definition 6: An attribute subset P ⊆ A is called a reduct of A relative to D if the following conditions are satisfied: i) for all x ∈ U, POS_P(D)(x) = POS_A(D)(x); ii) for all a ∈ P, there exists y ∈ U with POS_{P−{a}}(D)(y) ≠ POS_A(D)(y). So a reduct P is a minimal subset of condition attributes that keeps the positive region of D with respect to A. It can discern those sample pairs whose corresponding discernibility attribute sets are not empty.
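Definitions 5 and 6 can be illustrated with a short sketch that builds the fuzzy-rough positive region from the lower approximations; the dependency degree shown is the standard fuzzy-rough one, included here as an assumed illustration rather than the paper's exact measure:

```python
import numpy as np

def fuzzy_lower(R, X):
    # R_B(X)(x) = inf_y max(1 - R(x, y), X(y))
    return np.min(np.maximum(1.0 - R, X[None, :]), axis=1)

def positive_region(R, decision_classes):
    # POS_B(D)(x) = sup over decision classes Y in U/D of R_B(Y)(x)
    lowers = np.stack([fuzzy_lower(R, Y) for Y in decision_classes])
    return np.max(lowers, axis=0)

def dependency_degree(R, decision_classes):
    # standard fuzzy-rough dependency: mean positive-region membership
    pos = positive_region(R, decision_classes)
    return float(pos.sum() / len(pos))
```

With the identity relation every sample is fully discerned (dependency 1.0); with the all-ones relation nothing is discerned (dependency 0.0), which is the intuition behind keeping the positive region invariant in a reduct.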
Definition 7: The fuzzy-rough attribute reduction process must be able to find the dependency between various subsets of the original feature set in order to deal with multiple features. It is necessary to determine the degree of dependency of the decision attribute with respect to P = {a, b}. In the fuzzy case, since objects may belong to several equivalence classes, the Cartesian product of U/IND({a}) and U/IND({b}) must be considered to determine U/P, i.e., U/P = {X ∩ Y | X ∈ U/IND({a}), Y ∈ U/IND({b}), X ∩ Y ≠ ∅}.
Definition 8: For a fuzzy decision table (U, A ∪ D) and the condition attribute set A = {a1, a2, . . . , am}, the attribute order satisfying the user's requirements under A is denoted as S(A) : a1 ≺ a2 ≺ · · · ≺ ak. The optimal reduction model of attribute order is then defined as S(R) = g(S(A)), where g : A(S) → A(R) is any output reduction of the attribute sequence satisfying the needs of users, and R is a reduction of the decision table.
Definition 9: Suppose a fuzzy decision table (U, A ∪ D). Let the rough entropy threshold be μ (0.75 ≤ μ ≤ 0.95). The decision rule set with μ is defined as follows: i) if a decision rule class Yj (1 ≤ j ≤ t) is an absolutely rough set under the indiscernibility relation Q, then the rule set Q →μ d is an absolute rough decision rule set with μ; ii) if a decision rule class Yj (j > t) is a relatively rough set under the indiscernibility relation Q, then the rule set Q →μ d is a relative rough decision rule set with μ.
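A minimal sketch of how the rough entropy threshold μ might partition rule classes into absolute and relative rough decision rule sets; the per-rule certainty scores here are hypothetical stand-ins for the roughness measure used in the paper:

```python
def partition_rule_sets(rule_certainty, mu=0.92):
    # Split rule classes into absolute vs. relative rough decision rule
    # sets by the rough entropy threshold mu (scores are hypothetical).
    absolute = sorted(r for r, c in rule_certainty.items() if c >= mu)
    relative = sorted(r for r, c in rule_certainty.items() if c < mu)
    return absolute, relative
```

The default μ = 0.92 matches the threshold used in the experiments of Section V.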

III. COMPLETE ATTRIBUTE-VALUE SPACE TREE MODEL
In this section, we present a new optimization model, a complete attribute-value space tree structure, for fuzzy rough attribute order reduction to find the optimal solution. This tree can adaptively adjust the topological structure of the complete attribute tree, and it can successfully prune and optimize the attribute order tree for high-dimensional and uncertain big data. The reduced attribute set can satisfy the needs of users while maintaining good diversity and high convergence speed, providing the same classification performance as the original attribute set.
Definition 10: A basic attribute-value tree is a tree in which each node is assigned an attribute associated with a basic category, and each branch of a node is assigned a value in the range of that node's attribute values. The related attributes from the root node to any leaf node satisfy the given related attribute order.
Given a fuzzy decision table (U, A ∪ D) and an attribute subset B ⊆ A, the attribute order S(B) is defined as S(B) : a1 ≺ a2 ≺ · · · ≺ ak. The attribute-value tree with S(B) is a descending order tree, in which each non-leaf node is assigned an attribute value ai of B, each branch is assigned a value in the range of ai, and each node is associated with a subset of U.
Definition 11: Given a fuzzy decision table (U, A ∪ D) and a condition attribute subset B ⊆ A with attribute order S(B), the attribute-value tree on U rooted at attribute a is denoted as TD(a, S(B), U). The subtree rooted at the node ait is represented as TD(ait).
Definition 12: Suppose that the complete attribute-value space tree (CAST) is represented as n-order subtrees {T1, . . . , Ti, . . . , Tn}, as outlined in Fig. 3, where Subpopi, Pari, and Eliti refer to the ith subpopulation, the ith parent node and the ith elitist node in Ti, respectively. The CAST can self-adapt the subpopulation sizes in different subtrees, and it is employed to capture the interacting attribute order variables by exploiting the deep correlation and interdependency among interacting attribute order subsets of big data.
Initially, all co-evolutionary particles are assigned to the nodes of the original attribute-value space tree, and each inner branch is regarded as a subpopulation with the same number of nodes. As depicted in Fig. 3, the tree contains two kinds of nodes: ordinary particles, denoted by white hollow dots '○', and elitist particles, denoted by black solid dots '●', each of which is the best child node in its subtree.
To select the best elitist in each subpopulation, particles are compared by their fitness in the attribute-value space tree. In each iteration, each Pari in Subpopi is compared with Eliti in Ti. If f(Eliti) < f(Pari), the elitist node of the CAST is moved up and exchanged with the parent node. This procedure continues until all elitists are selected.
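The elitist promotion described above can be sketched as a bottom-up tree pass. This minimal sketch assumes fitness is minimized, matching the comparison f(Elit i) < f(Par i) in the text, and the Node class is a simplification of the CAST nodes:

```python
class Node:
    def __init__(self, fitness, children=()):
        self.fitness = fitness
        self.children = list(children)

def promote_elitists(node):
    # Bottom-up pass: in each subtree, if the best (lowest-fitness) child
    # beats the parent, swap fitness values so the elitist moves up,
    # mirroring the f(Elit_i) < f(Par_i) exchange described above.
    for child in node.children:
        promote_elitists(child)
    if node.children:
        elit = min(node.children, key=lambda c: c.fitness)
        if elit.fitness < node.fitness:
            node.fitness, elit.fitness = elit.fitness, node.fitness
    return node
```

After the pass, the best fitness in any subtree has bubbled up to its root, which is the state the extension operators in the next step start from.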
Because it excessively pursues elitists regardless of selection directions, CAST easily prunes and optimizes the attribute order tree in the opposite direction to the sink, so that the length of CAST increases. With the extension operators, the adjacent elitists are integrated to reconstruct a unified elitist attribute-value space tree.
The main construction processes of CAST are described as follows:
1. According to the attribute order S(B), all nodes in the ith layer (i ∈ [1, k]) are assigned the attribute value ai. The non-leaf nodes in the same level take the same attribute, and all leaf nodes are placed in the (k + 1)th layer.
2. Each node in the (i + 1)th layer is associated with an equivalence class. If the ith node in the ith layer is associated with the equivalence class Ei and the child nodes in the (i + 1)th layer are associated with the equivalence classes E(i+1)p, then the equivalence classes of the child nodes are mutually disjoint and satisfy Ei = ∪p E(i+1)p.
3. Prune the subtree TD(ait) as follows: (i) Calculate E+k = ∪p EP, where EP denotes the equivalence classes associated with all leaf nodes in TD(ait), and remove all branches of TD(ait), so that the node ait becomes the leaf node a+k associated with the equivalence class E+k. (ii) Replace a+k by the subtree TD(dk) according to the features of T(a, S(B), U), and continue to prune the corresponding subtree.
4. Adopt the truncated basis to optimize the underlying structure of the attribute order tree, where n is the number of variables, wi is the selected loading vector and ui is its corresponding coefficient.
5. Select the 'best-n-basis' of the attribute order tree in descending order of the normalized energy score, where x ∈ R^{d×n} = [x1, x2, . . . , xn] is the training data matrix of the branches of the n-order subtrees.
6. Generate a set of reference vectors Λ = {λ1, λ2, . . . , λn}, where H is the number of divisions set along each branch in Ti.
7. Perform the assignment of the subtree neighbourhood E(i) = {i1, i2, . . . , iT}, where (λi1, λi2, . . . , λiT) are the closest vectors to λi based on the Euclidean distance in Ti.
8. Adopt the Spearman rank correlation coefficient to calculate the similarity between the two nearest elitists.
9. Refine the average shared similarity based on all pairwise similarities.
10. Reconstruct the complete attribute-value space tree TE by the elitist sets with the 'best-n-basis' of the attribute order tree, where Eliti = {Elit1i, Elit2i, . . . , Elitni} is the elitist set of Ti for the 'best-n-basis' from the top-level basis Bp−1 as the p projection vectors (w1, w2, . . . , wp).
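The Spearman rank correlation used to compare nearest elitists can be sketched as follows, treating each elitist as a permutation of the same attribute indices (an assumption made here for illustration):

```python
import numpy as np

def spearman_similarity(order_a, order_b):
    # Spearman rank correlation between two elitist attribute orders,
    # each given as a permutation of the same attribute indices.
    order_a = np.asarray(order_a)
    order_b = np.asarray(order_b)
    n = len(order_a)
    rank_a = np.argsort(order_a)  # rank (position) of each attribute
    rank_b = np.argsort(order_b)
    d = rank_a - rank_b
    # classic formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    return 1.0 - 6.0 * float(np.sum(d * d)) / (n * (n * n - 1))
```

Identical orders give similarity 1 and fully reversed orders give −1, so the averaged pairwise similarity gives a natural criterion for merging adjacent elitists.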

IV. FUZZY ATTRIBUTE ORDER REDUCTION ALGORITHM BASED ON CAST
Since the CAST structure of the decision table in the attribute space adaptively prunes and optimizes the attribute order tree, the optimal attribute order reduction set can be achieved efficiently. The fuzzy similarity of multi-modality attributes can be fully extracted to satisfy the needs of users with better convergence speed and classification performance. In this section, we present a co-evolutionary fuzzy attribute order reduction (CFAOR) algorithm for large datasets, especially for their dynamic and uncertain analysis. The decision rule sets then generate a series of rule chains to develop an efficient cascade of attribute order reduction and classification with a rough entropy threshold. Fig. 4 illustrates the framework of the attribute order reduction process based on CAST. The main steps of CFAOR are described as follows:
1. Create n subpopulations with m particles in each particle subpopulation for optimizing the respective assigned attribute subset.
2. Divide the sample dataset into two equal parts, one as the training set and the other as the test set. Normalize the attribute order of the training samples and map them into the [0, 1] binary space.
3. Decompose the large-scale attribute order set into the respective particle subpopulations, and then compute the equivalence classes of their decision tables.
4. Map the attribute order space C(A) into the reduction space C(R), and construct subpopulations with niche neighborhood radius ri.
5. Generate the complete attribute-value space tree by using Algorithm 1 with the tree root node a1 and V(a1) = ∅ in each particle subpopulation PSpop(i).
6. Partition the decision system S = (U, A, V, f) into one master-table SQ = <U, Q, VQ, fQ> and sub-tables Si, i = 1, . . . , k.
7. Prune T(a, S(B)) by the defined pruning rules and obtain the attribute-value space tree TD(ai, S(B)).
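The decompose-and-merge skeleton of the steps above can be sketched as follows; the fitness function is a hypothetical stand-in, since the actual fitness in CFAOR depends on the fuzzy-rough positive region and rough entropy threshold:

```python
import random

def fitness(order):
    # hypothetical stand-in fitness: prefers orders close to ascending index
    return -sum(abs(pos - attr) for pos, attr in enumerate(order))

def cfaor_sketch(attributes, n_subpops=4, iters=50, seed=0):
    # Minimal sketch of steps 1-4: decompose the attribute order set into
    # subpopulations, evolve a best ordering (elitist) in each, then merge
    # the subpopulation elitists into one candidate reduced attribute order.
    rng = random.Random(seed)
    subsets = [attributes[i::n_subpops] for i in range(n_subpops)]
    elitists = []
    for subset in subsets:
        best = rng.sample(subset, len(subset))
        for _ in range(iters):
            cand = rng.sample(subset, len(subset))
            if fitness(cand) > fitness(best):
                best = cand
        elitists.append(best)
    return [a for elit in elitists for a in elit]
```

Because the subsets partition the attribute set, the merged result is always a permutation of the input attributes, regardless of the fitness used.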

V. EXPERIMENTAL RESULTS
In this section, we conduct some experiments to validate the effectiveness and robustness of the proposed CFAOR algorithm. We start with the description of experimental setting in Section A. The performance comparisons with time and space between CFAOR and some representative fuzzy attribute reduction algorithms are conducted in Section B. In Section C, we carry out the accuracy comparisons of fuzzy attribute reduction and classification.

A. Experimental Setup
To make the comparisons of fuzzy attribute reduction performance more comprehensive, we select ten real-world datasets from the UCI Machine Learning Repository and the NIPS 2003 feature selection challenge datasets [33]. Those datasets, as described in Table I, include five low-dimensional datasets and five very high-dimensional datasets which come from various real-world domains, such as text categorization (Dexter), radar (Ionosphere), drug science (Dorothea), medical science (Spect and Spectf), and biomedical science (Ovarian-cancer and Lung-cancer). We add 15% and 25% Gaussian noise ratios to the five low-dimensional and five high-dimensional datasets, respectively. All algorithms are implemented in Visual C# 2013, and all experiments are run on a virtual machine with 12 CPUs and 256 GB of memory on the University of Technology Sydney (UTS) High Performance Computing Linux Cluster with 8 nodes.
CFAOR is compared to three state-of-the-art algorithms for feature selection. To make the comparisons as fair as possible, the parameters of the compared algorithms are set as suggested in their respective references. The parameters of CFAOR are initialized as follows: the initial number of co-evolutionary particles is 2000, the number of particle subpopulations is 50, the number of particles in each particle subpopulation is 400, the number of iterations is 1000, and the number of iterations within each subpopulation is 500. The rough entropy threshold is set to μ = 92%. In the result tables, Time is the running time (in seconds) and Space is the space consumption (in megabytes); a bold value indicates the best result among the different fuzzy attribute reduction algorithms.

B. Performance Comparison Metrics With Time and Space for Different Algorithms
As shown in Table II, the best solutions obtained by CFAOR are better than those of NFRS [22], UFRFS [26] and RDRAR [29], and CFAOR substantially reduces both running time and space consumption. The bigger the dataset, the more significant the computational savings of CFAOR. Meanwhile, it is remarkable that in our experiments the large space consumption leads to memory overflow for NFRS on the Madelon, Dorothea and Lung-cancer datasets, and for UFRFS on the Madelon, Dexter and Dorothea datasets, whereas our proposed CFAOR algorithm copes with these situations well. As can be seen in Table II, CFAOR clearly outperforms NFRS, UFRFS and RDRAR in terms of the Time and Space performance metrics for most datasets. For the Spectf, Infant and Dorothea datasets, significant improvements are brought by CFAOR. For example, CFAOR consumes 35.20% less space (Space) than NFRS on the Spectf dataset, and 52.12% less on the Infant dataset. These facts indicate that CFAOR consistently runs much faster and consumes less memory than the other algorithms.
From Table II, one can clearly see that, in terms of the time results, CFAOR outperforms all compared algorithms across all instances. Furthermore, both the time and space results obtained by CFAOR are better than those of NFRS on all instances. The results reveal that the fuzzy attribute order reduction based on the complete attribute-value space tree contributes to CFAOR's performance, enabling it to produce high-quality results across all testing instances.

C. Accuracy Comparisons of Fuzzy Attribute Reduction and Classification
In this section, we compare the fuzzy attribute reduction and classification accuracy of CFAOR and five compared algorithms: NFRS [22], UFRFS [26], IT2-FR [28], FR-MRMS [28], and RDRAR [29]. Using C4.5 [34] as the classifier, we employ stratified 10×10-fold cross-validation (10-FCV): the original dataset is divided into 10 subsets of instances, where one subset is retained as the testing data and the remaining subsets are used as training data. Table III depicts the prediction accuracy and standard deviation of all algorithms. We can see that CFAOR is clearly better than the other five techniques. NFRS, RDRAR and IT2-FR cannot deal with high-dimensional datasets, and both RDRAR and NFRS show the worst performance among all algorithms, failing to offer results for most datasets. Meanwhile, it is also noticed that no algorithm achieves high prediction accuracy on the Madelon dataset because it contains many noisy features without predictive power.
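The cross-validation protocol can be sketched as follows; a nearest-centroid classifier stands in for C4.5 here purely to keep the sketch self-contained, and the fold splitting is unstratified for brevity:

```python
import numpy as np

def kfold_accuracy(X, y, k=10, seed=0):
    # Sketch of the k-fold cross-validation protocol: hold out one fold
    # for testing and train on the rest, then average the fold accuracies.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # nearest-centroid stand-in classifier (not C4.5)
        centroids = {c: X[train][y[train] == c].mean(axis=0)
                     for c in np.unique(y[train])}
        preds = np.array([min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
                          for x in X[test]])
        accs.append(float(np.mean(preds == y[test])))
    return float(np.mean(accs))
```

A 10×10-FCV, as used in the paper, repeats this procedure ten times with different seeds and averages the results.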
As shown in Table III, CFAOR with the C4.5 classifier achieves significant average classification accuracies of 92.17%, 91.79%, 95.56% and 100% on the Spectf, Ionosphere, Dorothea and Ovarian-cancer datasets, respectively; the corresponding results are identified by the symbol '♂'. If related algorithms cannot deal with high-dimensional datasets due to expensive computations, their results are identified by the symbol '−'.
It is obvious that CFAOR is consistently much better than its competitors on almost all datasets. Most compared algorithms simply select a few features from the correlated feature set, whereas CFAOR considers both strongly relevant features and their corresponding correlated features simultaneously, which turns out to be beneficial in reducing the classification error. The main reason is as follows: we construct a complete attribute-value space tree structure of the decision table in the attribute space to adaptively prune and optimize the attribute order tree. It can perform attribute order reduction in a much shorter time, and the reduced attribute set satisfies the needs of users while guaranteeing a better convergence speed. Therefore, CFAOR can avoid many recalculations by directly exploiting the previous results from the currently accumulated samples. Of course, it is also necessary to point out that in a few special cases the performance of CFAOR is slightly worse than that of the compared algorithms. The experimental results clearly demonstrate that a classification system employing CFAOR as the fuzzy attribute reduction algorithm achieves appealing classification accuracy. CFAOR provides an effective approach to obtaining the optimal result of fuzzy attribute reduction, which significantly enhances the classification accuracy with a reinforced noise tolerance.

VI. APPLICATION TO TISSUES EXTRACTION OF DYNAMICAL CHANGING INFANT CEREBRAL CORTEX
The extraction of various tissues of the infant cerebral cortex is a very useful technique for the assessment of different regional matters in brain neurodegenerative diseases. The classification of neonatal brain MRI is a critical step towards understanding infant brain development. During postnatal human brain development, the brain tissues undergo a wide range of changes. Because there are distinct differences in brain tissue structure between infants and adults, state-of-the-art methods for adult brain classification are not applicable to the infant brain, which poses additional challenges for the study of the infant cerebral cortex [35]-[37].
In this section, we employ CFAOR to accelerate the tissue extraction of the dynamically changing infant cerebral cortex with better accuracy and efficiency. We obtain infant brain datasets from the Internet Brain Segmentation Repository (IBSR) [38]. We apply the different algorithms to determine the brain volumes of these subjects and then employ the manually segmented brain areas to evaluate the extraction accuracy. CFAOR is compared with six popular methods (BET [39], BSE [40], FreeSurfer [41], LongSeg [42], BEaST [43], and LPG-PCA [44]). The default parameters of these methods are used as given in their respective references.

A. Extraction Results of Infant Brain Tissues From the Inner, Outer, and Central Cortical Surfaces
To reveal the segmentation accuracy of cortical surfaces for infant brain 3D-MRI, we evaluate the segmentation results from the inner, outer, and central cortical surfaces. We label the matter inside the inner neonatal cortical surface as WM, the matter between the inner and outer cortical surfaces as GM, and the matter between the skull and the outer cortical surface as CSF. For ease of viewing, only part of the segmentation results is shown in Fig. 5. It can be seen that there are some visible isolated holes caused by BET and BSE, whereas CFAOR smooths out most of these imperfections by treating them as parts of the cortical surfaces, and its results are consistent with the real inner and outer cortical surfaces.
In the following, the average surface distance error of the different cortical surfaces is further investigated. Since there are large developmental changes in the developing infant brain matter, we evaluate subjects at different birth months to validate the robustness of the different algorithms. We select 10 subjects aged from 2 to 20 months from the 3D MR-18 dataset. The average distance errors of the 10 subjects are illustrated in Fig. 6, where those of the inner, outer, and central cortical surfaces are around 0.779 mm/0.785 mm/0.667 mm (BET), 0.701 mm/0.728 mm/0.592 mm (BSE), and 0.620 mm/0.658 mm/0.539 mm (CFAOR), respectively. CFAOR achieves an overall 9.8%-25.6% improvement. In particular, CFAOR shows a better mean distance error on the central cortical surface (0.539 mm), reflecting its significantly higher sensitivity and accuracy. Although BSE also achieves good results, it needs to spend more computational cost to obtain those compromised results. Thus, CFAOR can use the crowns of gyri to properly simulate growth with simple image warping and generate more accurate results.
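The average surface distance error used above can be illustrated with a minimal NumPy sketch (not the paper's implementation): given two surfaces represented as vertex arrays, it averages the nearest-neighbour distances in both directions. The brute-force pairwise distance matrix is an assumption made for brevity; real meshes would use a spatial index.

```python
import numpy as np

def average_surface_distance(surf_a, surf_b):
    """Symmetric mean distance between two surfaces given as (N, 3) vertex arrays."""
    # Pairwise Euclidean distances between all vertices (brute force;
    # fine for small meshes, illustrative only).
    d = np.sqrt(((surf_a[:, None, :] - surf_b[None, :, :]) ** 2).sum(-1))
    a_to_b = d.min(axis=1)  # nearest-neighbour distance, A -> B
    b_to_a = d.min(axis=0)  # nearest-neighbour distance, B -> A
    return (a_to_b.mean() + b_to_a.mean()) / 2.0

# Toy example: two unit squares offset by 0.1 along z.
a = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], dtype=float)
b = a + np.array([0, 0, 0.1])
print(round(average_surface_distance(a, b), 3))  # 0.1
```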
From Fig. 6, it is obvious that CFAOR achieves the highest accuracy, indicating its superiority in characterizing the structural longitudinal surfaces of the infant cerebral cortex. Therefore, it can significantly improve the mean classification performance, whereas the two compared methods under-classify cerebral regions because some deformable surfaces are located within the inner cerebral surface regions.

B. Quantification Comparisons of Multi-Criteria Performance
High-resolution, high-contrast infant MRI allows for the tissue extraction of the dynamically changing cerebral cortex, which can lead to successful and accurate segmentation. In the following experiments, multiple criteria for brain tissue extraction assessment, such as the Jaccard similarity coefficient (JSC), the specificity and sensitivity coefficients, and the missed extraction probabilities, are employed to measure the comprehensive performance.
JSC is widely adopted to evaluate the similarity between the extracted brain region Y and the corresponding ground truth X as

JSC(X, Y) = |X ∩ Y| / |X ∪ Y|,

where | · | denotes the cardinality, and the value of JSC lies in [0, 1]. A high sensitivity is regarded as a high recognizing percentage for the cerebral cortex, denoted as S_e. A high specificity is regarded as a high rejecting percentage for the non-brain cerebral cortex, denoted as S_p. Brain tissue extraction is a compromise between S_e and S_p. Their coefficients are defined as

S_e = TP / (TP + FN), S_p = TN / (TN + FP),

where TP is the true positive count, i.e., the number of voxels correctly classified as brain cerebral cortex; FP is the false positive count, i.e., the number of voxels incorrectly classified as brain cerebral cortex; TN is the true negative count, i.e., the number of voxels correctly classified as non-brain tissue; and FN is the false negative count, i.e., the number of voxels incorrectly classified as non-brain cerebral cortex. Furthermore, we adopt two probabilities, of missed extraction p_m and of false alarm p_f, to measure the extraction risk, calculated as

p_m = |X \ Y| / |X|, p_f = |Z \ X| / |Z|,

where Z is the extracted brain region with false alarm.

Table IV shows the experimental results of CFAOR and the compared popular methods on the IBSR datasets. Jointly considering S_e and S_p, the accuracies of BET, BSE, and LongSeg are moderate. However, CFAOR is able to correctly detect the dynamically changing infant cerebral cortex and achieves remarkable improvement in terms of sensitivity and specificity. CFAOR often performs better than the compared popular methods on the multi-criteria performance, which indicates a better overlap of the extracted brain regions with the ground truths. We also notice that LPG-PCA can achieve better sensitivity in detecting almost all brain tissues at the expense of relatively low specificity.
Meanwhile, it is noticeable from Table IV that the computational time of CFAOR is obviously lower than those of BSE, BEaST, and FreeSurfer for all tested instances. CFAOR produces high-quality solutions with lower computational times, followed by LPG-PCA, BET, and LongSeg.
Through these analyses, we can conclude that the extraction of brain tissues yielded by CFAOR further facilitates the correction of intensity non-uniformity for the dynamically changing infant cerebral cortex.

C. Comparison of the Dice Similarity Coefficient of Expert Consensus Extraction
To further ensure the reliability of CFAOR, we compare the Dice coefficient between our extraction and the expert consensus extraction, and between each pair of expert manual extractions. We quantitatively examine the overlap level between our extraction and the manual expert extractions. The Dice similarity coefficient is defined as

Dice(A, B) = 2|A ∩ B| / (|A| + |B|),

where A and B are the voxel sets of two different extractions of the same tissue. We report the average Dice values of the ten subjects in Table V, which show the differences between our extraction and the expert consensus as well as the differences among experts. They indicate that the committed errors are in the same magnitude range as the inter-expert variability. CFAOR achieves the highest Dice similarity coefficient with the expert consensus extraction and produces much more consistent labeling boundaries for the large-scale, dynamically changing infant cerebral cortex.
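As a concrete check, the Dice coefficient between two binary voxel masks takes only a few lines of NumPy (an illustrative sketch, not the paper's code):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary voxel masks."""
    a, b = a.astype(bool), b.astype(bool)
    overlap = np.sum(a & b)            # |A ∩ B|
    return 2.0 * overlap / (np.sum(a) + np.sum(b))

a = np.array([1, 1, 1, 0, 0], dtype=bool)
b = np.array([1, 1, 0, 1, 0], dtype=bool)
d = dice(a, b)  # 2*2 / (3+3) ≈ 0.667
```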

VII. DISCUSSION AND CONCLUSION
A. Discussion of Experimental Results
As illustrated by the above experimental results, it is easy to conclude that CFAOR outperforms its competitors on most of the complex datasets used and achieves higher computational efficiency and classification accuracy; CFAOR is both effective and efficient. Furthermore, it can be applied to the tissue extraction of the dynamically changing infant cerebral cortex and achieves satisfying results. The reported results illustrate that its prediction is highly correlated with human learning and evaluation. CFAOR has very low time complexity and high accuracy with high-quality solutions, compared to the state-of-the-art methods.
In contrast, the other compared representative fuzzy attribute reduction algorithms, such as NFRS [22], UFRFS [26], IT2-FR [28], FR-MRMS [28], and RDRAR [29], load all features into memory at once, which is costly in both memory and time, so their time costs fluctuate noticeably with their computational complexity and their classification performance dynamically decreases as time increases.
CFAOR is robust to large-sample datasets with added noise percentages, which means it is more stable than the compared representative algorithms. CFAOR requires only a relatively short time to achieve remarkable classification performance, and it is more suitable for noisy attributes than the other representative algorithms. Moreover, CFAOR shows better classification performance on the five high-dimensional datasets with large percentages of noisy attributes.
Traditional fuzzy attribute reduction algorithms are unreliable for dealing with dynamically changing massive datasets of ever-greater size and complex fuzzy structure. CFAOR, however, is better suited than traditional methods to measuring the accuracy of attribute reduction and classification in dynamically changing, uncertain big data. The major reasons are as follows. The fuzzy attribute order reduction model based on the CAST structure can explore the structure of the dataset in the attribute dimension, which quickly reduces the effect of noisy attributes; therefore, it greatly improves the search efficiency in finding the optimal solution. More importantly, CAST can adaptively adjust the topological structure of the complete attribute tree, and it can successfully prune and optimize the attribute order tree for high-dimensional and uncertain big data. The reduced attribute set can satisfy the needs of users while maintaining good diversity and a high convergence speed. In contrast, the results obtained by the compared representative algorithms are not satisfactory; they are often not suitable for the attribute reduction and classification tasks on the complex real-world datasets described in the Introduction.
Therefore, we observe that CFAOR has a great superiority in terms of computational time and accuracy of fuzzy attribute reduction, especially for high-dimensional, uncertain, large-scale datasets. The classification superiority of CFAOR is clearly shown when noise percentages are added to the big datasets while its accuracy values remain stable, as depicted in Tables II and III. In summary, we conclude that CFAOR is a better choice for balancing the computational cost and accuracy of fuzzy attribute reduction, compared with the representative algorithms.

B. Closure
In this paper, we have presented a new coevolutionary fuzzy attribute order reduction algorithm, CFAOR. The algorithm can deal with the many uncertain variables of fuzzy attribute sets in big data, thus reducing their complexity and non-separability. A complete attribute-value space tree model of the decision table is constructed in the attribute space to adaptively prune and optimize the attribute order tree, while providing the same classification performance as the original attribute set. The experimental results have demonstrated that the proposed CFAOR algorithm can carry out attribute order reduction and classification more accurately, and that it can effectively deal with a variety of forms and distributions of big datasets. CFAOR turns out to be efficient and robust on large-scale attribute reduction and classification tasks. Furthermore, it has been applied to the automatic tissue extraction of the human cerebral cortex from infant MRIs, and our results indicate that CFAOR can be used for reliable tissue extraction in lower-resolution infant cerebral cortex images. Hence, the proposed CFAOR algorithm is useful for evaluating neuro-protective clinical trials on the infant brain.
Despite these promising results on the tissue extraction of the large-scale, dynamically changing infant cerebral cortex, the visual comparison of the automatic extraction results yielded by CFAOR shows a few regions of MWM buried in the gyri that cannot be detected. The main reason is that the partial volume effect makes it difficult for the MWM region to enter narrow channels in the cortical gray matter [45], [46]. This limits CFAOR's application to real-world large-scale infant cerebral cortexes. In the future, we plan to enable the straightforward use of the tissue extraction for an accurate reconstruction of the gradual myelination process, which should allow for a further improvement in complex infant cerebral resolution.