A support vector-based interval type-2 fuzzy system

In this paper, a new fuzzy regression model supported by support vector regression is presented. Type-2 fuzzy systems are able to tackle applications that have significant uncertainty. However, general type-2 fuzzy systems are more complex than type-1 fuzzy systems. Support vector machines are similar to fuzzy systems in that they can also model systems that are non-linear in nature. In the proposed model, the consequent parameters of type-2 fuzzy rules are learnt using support vector regression, and an efficient closed-form type-reduction strategy is used to simplify the computations. Support vector regression improved the generalisation performance of the fuzzy rule-based system, in which the fuzzy rules were a set of interpretable IF-THEN rules. The performance of the proposed model was demonstrated through case studies on non-linear system approximation and the prediction of chaotic time series. The model yielded promising results, and the simulation results are compared to results published in the area.


I. INTRODUCTION
FUZZY SYSTEMS are used to model various sources of uncertainty, including the uncertainty associated with imprecise linguistic knowledge [1]. Traditionally, expert knowledge has been the principal source of a rule-based fuzzy system. This source of information is difficult to obtain, and changes in knowledge over time are hard to capture and include in the model.
Type-2 fuzzy systems have often outperformed type-1 fuzzy systems in certain applications. Because computations with general type-2 fuzzy sets are complex, many applications use interval type-2 (IT2) fuzzy sets [2]. In practice, IT2 fuzzy sets are often easier to manage than general type-2 fuzzy sets, and the mathematics is much less complex [3]. A type-reduction mechanism is used in type-2 fuzzy systems in order to obtain a type-1 fuzzy set, the type-reduced set [4]. The Karnik-Mendel (KM) method is a commonly used type-reduction algorithm that finds the centroid of an IT2 fuzzy set, which in turn is a type-reduced set [5]. The main advantages of the KM type reducer are its consistency with the extension principle and its strong theoretical grounding [6]. Nevertheless, the KM algorithm can suffer from the computational cost of its iterations, particularly when it is used in fuzzy logic control systems [7]. Alternative type reducers that focus on simplifying the computations and improving the performance have been proposed in the literature [8], [9].
One systematic fuzzy modelling and identification methodology is the Takagi-Sugeno-Kang (TSK) fuzzy system [10], [11]. TSK is a fuzzy system that can systematically transform human knowledge and experience into a rule-based fuzzy system. There is a well-established need for learning methods that can enhance the optimisation of membership functions in fuzzy systems. Least-squares estimation is a common method used to minimise the output error of a TSK fuzzy system through training. This type of learning mostly uses a design approach referred to as fuzzy neural networks (FNNs) [12]. One main advantage of FNNs is that a high learning accuracy can be achieved when the model is less complicated. FNNs use least-squares estimation to minimise the empirical risk and do not take structural risk into account. As a consequence, one disadvantage of this approach is that it can suffer from overfitting. To avoid overfitting, support vector regression (SVR) can be used as an alternative regression approach that generalises better than least-squares estimation for fuzzy systems. The fuzzy rules and antecedent parameters can be obtained using fuzzy C-means clustering, and the consequent parameters are learnt with ε-insensitive learning [13]. An SVR-based fuzzy system approach has been applied to various research problems, including high-dimensional bioinformatics data sets, and yielded promising results [14], [15], [16]. One main advantage of SVR is that it takes the complexity of the model into account through its cost function. This cost function can be optimised to minimise a bound on the generalisation error, which yields better performance on unseen data and helps to prevent overfitting, in contrast to FNNs.
Given these disadvantages of least-squares estimation, the aim of this paper is to propose a hybrid learning system that is capable of building a robust fuzzy predictive model through the use of a type-2 TSK fuzzy system. A type-2 SVR-based approach that, in a way similar to fuzzy neural networks, replaces least-squares estimation with SVR for the consequent learning has recently been proposed [17]. Our approach addresses the computational cost of the type-reduction process in an SVR-based type-2 fuzzy system by using one of the recent closed-form type-reduction and defuzzification methods. The equation in the consequent part is described using the coefficients of the SVR.
The rest of the paper is organised as follows. Section II covers the materials and methods and describes the characteristics of our approach. Experimental studies on non-linear function approximation and chaotic time series prediction are given in Section III. Finally, Section IV concludes the paper.

A. Support Vector Regression
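Support Vector Machine (SVM), a statistical learning approach based on structural risk minimisation, can be used for classification and real-value estimation tasks [18]. The regression form of SVM is SVR, which uses the ε-insensitive loss function, depicted graphically in Fig. 1 [19], to approximate a linear function $f(\vec{x})$ of the following form:
\[ f(\vec{x}) = \langle \vec{w}, \vec{x} \rangle + b \tag{1} \]
where the coefficients $\vec{w}$ and $b$ are the weight vector and bias term, respectively. Mathematically, the constrained optimisation problem is formally defined as follows:
\[ \min_{\vec{w},\, b,\, \xi^{+},\, \xi^{-}} \; \frac{1}{2}\lVert \vec{w} \rVert^{2} + C \sum_{i=1}^{N} \left( \xi_i^{+} + \xi_i^{-} \right) \tag{2} \]
where $\xi_i^{+}$ and $\xi_i^{-}$ are the two non-negative slack variables, one for each direction of deviation. The constant $C > 0$ sets the trade-off between the complexity of the function and the extent to which deviations larger than ε are tolerated. The minimisation function (2) is subject to:
\[ y_i - \langle \vec{w}, \vec{x}_i \rangle - b \le \varepsilon + \xi_i^{+}, \qquad \langle \vec{w}, \vec{x}_i \rangle + b - y_i \le \varepsilon + \xi_i^{-}, \qquad \xi_i^{+},\, \xi_i^{-} \ge 0 \tag{3} \]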
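As an illustration of the ε-insensitive loss and the trade-off controlled by C, the following minimal Python sketch evaluates the SVR objective for a given linear model. The function names `eps_insensitive_loss` and `svr_objective` are hypothetical and chosen only for this example; the sketch relies on the standard fact that, at the optimum, the sum of the two slack variables of a sample equals its ε-insensitive loss. It does not solve the optimisation problem.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_objective(w, b, xs, ys, C, eps):
    """Evaluate 0.5*||w||^2 + C * (sum of eps-insensitive losses).

    w  : list of weights, b : bias term
    xs : list of input vectors, ys : list of targets
    """
    regulariser = 0.5 * sum(wj * wj for wj in w)
    total_loss = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(wj * xj for wj, xj in zip(w, x)) + b
        total_loss += eps_insensitive_loss(y, prediction, eps)
    return regulariser + C * total_loss
```

A larger C penalises deviations outside the tube more heavily, whereas a larger ε widens the tube within which errors are ignored.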

B. Support Vector based TSK Fuzzy System
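TSK is a fuzzy modelling method, proposed by Takagi, Sugeno and Kang, that can model systems exhibiting high dimensionality, nonlinearity, and complexity. Each rule in the structure of the TSK fuzzy system can be expressed in the following form [10]:
\[ R_i: \text{IF } x_1 \text{ is } A_{1i} \text{ and } x_2 \text{ is } A_{2i} \text{ and } \dots \text{ and } x_n \text{ is } A_{ni} \text{ THEN } y_i = c_{0i} + c_{1i} x_1 + \dots + c_{ni} x_n \]
where $i = 1, \dots, r$ indexes the rules and $r$ is the number of fuzzy rules; $(x_1, x_2, \dots, x_n)$ are the $n$ input variables; $A_{ni}$ denotes the fuzzy set for variable $n$ and rule $i$; $y_i$ is the rule output of the consequent part; and $c_{0i}, c_{1i}, \dots, c_{ni}$ are the coefficients of its linear equation.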
The fuzzy set $A_{ij}$ of rule $i$ and input variable $x_j$ can be described with any form of membership function, commonly the following Gaussian membership function:
\[ \mu_{A_{ij}}(x_j) = \exp\left( -\frac{(x_j - c_{ij})^{2}}{2\sigma_{ij}^{2}} \right) \]
where $\mu_{A_{ij}}(x_j)$ is the degree of membership for input variable $x_j$, and $c_{ij}$ and $\sigma_{ij}$ are the centre and standard deviation that characterise the fuzzy set, respectively. The t-norm operation can be defined as:
\[ f_i = \prod_{j=1}^{n} \mu_{A_{ij}}(x_j) \]
where $f_i$ is the firing strength determined by using a t-norm operation defined by the product (*) operator. A normalised firing strength can be defined in the following form:
\[ \hat{f}_i = \frac{f_i}{\sum_{k=1}^{r} f_k} \]
where $\hat{f}_i$ denotes the normalised firing strength. A defuzzification operation is processed by finding the overall output obtained by the weighted sum:
\[ y = \sum_{i=1}^{r} \hat{f}_i \, y_i \]
Let the input and real-valued output training data set be $D = \{(\vec{x}_1, y_1), (\vec{x}_2, y_2), \dots, (\vec{x}_N, y_N)\}$. In order to obtain the coefficients $\vec{w}$ (weight vector) and $b$ (bias term) of the SVR linear expression, each data item $\vec{x}_i$ in the training data set, along with its actual output $y_i$, is transformed into a training data pair $(\vec{x}'_i, y_i)$, which is fed into SVR. Once $\vec{w}$ and $b$ are obtained, a defuzzification operation for the support vector-based Takagi-Sugeno-Kang fuzzy system (TSK-SVR I) is formulated as:
\[ y' = \langle \vec{w}, \vec{x}' \rangle + b \]
where $\vec{x}'$ is the transformed input and the new defuzzified output of TSK-SVR I is denoted by $y'$. The SVR part of the hybrid method is implemented with the LIBSVM package [20].
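The type-1 TSK inference steps described in this subsection (Gaussian membership, product t-norm, normalised firing strength, weighted sum) can be sketched in Python as follows. This is an illustrative sketch only: the function names and the layout of `centres`, `sigmas` and `coeffs` are assumptions made for the example, and each rule consequent is assumed to be a linear equation with a constant term stored first.

```python
import math


def gaussian_mf(x, centre, sigma):
    """Gaussian membership degree of x for a set with the given centre and spread."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def tsk_output(x, centres, sigmas, coeffs):
    """Type-1 TSK output for one input vector.

    x       : list of n inputs
    centres : r x n list of Gaussian centres (one row per rule)
    sigmas  : r x n list of Gaussian standard deviations
    coeffs  : r x (n + 1) list of consequent coefficients, constant term first
              (assumed layout for this sketch)
    """
    firing = []
    for c_row, s_row in zip(centres, sigmas):
        strength = 1.0
        for xj, c, s in zip(x, c_row, s_row):
            strength *= gaussian_mf(xj, c, s)  # product t-norm
        firing.append(strength)

    total = sum(firing)
    normalised = [f / total for f in firing]

    rule_outputs = [
        co[0] + sum(cj * xj for cj, xj in zip(co[1:], x)) for co in coeffs
    ]
    # Defuzzified output: firing-strength-weighted sum of the rule outputs.
    return sum(fn * yi for fn, yi in zip(normalised, rule_outputs))
```

For example, with two rules whose sets are centred symmetrically around the input, both rules fire equally and the output is the mean of the two rule outputs.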

C. IT2-TSK A2-C0 Fuzzy System
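Generally, an interval A2-C0 TSK model can be defined as follows [21]:
\[ R_i: \text{IF } x_1 \text{ is } \tilde{A}_{1i} \text{ and } x_2 \text{ is } \tilde{A}_{2i} \text{ and } \dots \text{ and } x_n \text{ is } \tilde{A}_{ni} \text{ THEN } y_i = c_{0i} + c_{1i} x_1 + c_{2i} x_2 + \dots + c_{ni} x_n \]
where $i = 1, \dots, r$ indexes the IF-THEN rules of the fuzzy system; $x_1, x_2, \dots, x_n$ are the input variables; $\tilde{A}_{ni}$ is an interval type-2 fuzzy set for variable $n$ and rule $i$; $y_i$ is the rule output; and $c_{0i}, c_{1i}, c_{2i}, \dots, c_{ni}$ are the consequent parameters.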
IT2-TSK A2-C0 involves upper and lower membership functions in the antecedents, where the uncertainties may be encountered. The firing strengths of a fuzzy rule can be defined by the use of the t-norm operator:
\[ \underline{f}_i = \prod_{j=1}^{n} \underline{\mu}_{\tilde{A}_{ji}}(x_j), \qquad \overline{f}_i = \prod_{j=1}^{n} \overline{\mu}_{\tilde{A}_{ji}}(x_j) \]
where $\underline{f}_i$ and $\overline{f}_i$ represent the lower and upper firing strengths, respectively; $\overline{\mu}(x_j)$ is the upper degree of membership and $\underline{\mu}(x_j)$ is the lower degree of membership for input variable $x_j$; and the t-norm operation is defined by the product (*) operator.
The model has an interval type-1 fuzzy set at the end, which is determined by its left ($y_l$) and right ($y_r$) end points: ...
The end points can generally be calculated through the iterative KM algorithms, and the final output can be calculated as:
\[ y = \frac{y_l + y_r}{2} \]
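The lower and upper firing strengths of a single IT2 rule can be sketched in Python as below. The sketch assumes, purely for illustration, that each interval type-2 set is a Gaussian with a fixed centre and an uncertain standard deviation, so that the smaller spread gives the lower membership function and the larger spread gives the upper one; the source only states that Gaussian upper and lower membership functions are used. The function names are hypothetical.

```python
import math


def gaussian_mf(x, centre, sigma):
    """Gaussian membership degree of x."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def interval_firing(x, centres, sigma_lower, sigma_upper):
    """Return (lower, upper) firing strengths of one rule via the product t-norm.

    Assumes sigma_lower[j] <= sigma_upper[j] for every input j, so that the
    lower membership degree never exceeds the upper one.
    """
    f_lower, f_upper = 1.0, 1.0
    for xj, c, s_lo, s_up in zip(x, centres, sigma_lower, sigma_upper):
        f_lower *= gaussian_mf(xj, c, s_lo)
        f_upper *= gaussian_mf(xj, c, s_up)
    return f_lower, f_upper
```

The gap between the two returned values reflects the footprint of uncertainty of the antecedent sets at the given input; it vanishes when the input coincides with every centre.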

D. Biglarbegian-Melek-Mendel Type Reduction
Type reduction is generally performed by finding the end points with the iterative KM algorithms, and these end points are then used to calculate the final output. Due to the high computational cost of the iterative KM algorithms, alternative type-reduction algorithms that are faster to compute and have closed-form expressions have recently been proposed in the literature. Some of the computationally effective alternative type-reduction algorithms, many of which were designed for the defuzzification of Mamdani IT2 fuzzy logic systems, are the Liang-Mendel Unnormalised Method [22], the Wu-Mendel Uncertainty Bounds Method [23], the Coupland-John Geometric Method [24], the Greenfield-Chiclana-Coupland-John Collapsing Method [25], and the Nie-Tan Method [26].
The Biglarbegian-Melek-Mendel (BMM) method is one of the recent closed-form type-reduction and defuzzification methods that has been adapted to design the type-reduction process of an IT2-TSK fuzzy system [27], [28]. The closed-form type reduction, along with the defuzzification process, for an IT2-TSK fuzzy logic system (FLS) can be computed as:
\[ y = q \, \frac{\sum_{i=1}^{r} \underline{f}_i \, y_i}{\sum_{i=1}^{r} \underline{f}_i} + p \, \frac{\sum_{i=1}^{r} \overline{f}_i \, y_i}{\sum_{i=1}^{r} \overline{f}_i} \]
where $q$ and $p$ are the adjustable coefficients that weight the lower ($\underline{f}_i$) and upper ($\overline{f}_i$) firing strengths of each rule, respectively (if $r = 1$, then $q + p = 1$). The rule outputs denoted by $y_i$ are not required to be sorted in BMM type reduction.
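A minimal Python sketch of BMM-style closed-form type reduction and defuzzification is given below. It assumes the commonly cited form of the method, in which the output is q times the lower-firing-strength weighted average of the rule outputs plus p times the upper-firing-strength weighted average; the function name `bmm_output` is hypothetical.

```python
def bmm_output(f_lower, f_upper, rule_outputs, q, p):
    """Closed-form output of an IT2-TSK system in the BMM style.

    f_lower, f_upper : lower and upper firing strengths, one per rule
    rule_outputs     : crisp rule outputs y_i (no sorting is needed)
    q, p             : adjustable weights for the lower and upper parts
    """
    lower_part = sum(f * y for f, y in zip(f_lower, rule_outputs)) / sum(f_lower)
    upper_part = sum(f * y for f, y in zip(f_upper, rule_outputs)) / sum(f_upper)
    return q * lower_part + p * upper_part
```

Unlike the KM procedure, no iteration over switch points is involved: the output is obtained in a single pass over the rules.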

E. Support Vector based IT2-TSK Fuzzy System
This section introduces the hybrid learning system that incorporates SVR into the IT2-TSK A2-C0 fuzzy system. Generally, least-squares estimation is used to estimate the consequent parameters of TSK fuzzy systems. SVR is an alternative regression approach that generalises better than least-squares estimation. To address the computational cost, BMM is used as an alternative to the KM method. Let the input and real-valued output training data set be $D = \{(\vec{x}_1, y_1), (\vec{x}_2, y_2), \dots, (\vec{x}_N, y_N)\}$. This data set is transformed into training data pairs $D' = \{(\vec{x}'_1, y_1), (\vec{x}'_2, y_2), \dots, (\vec{x}'_N, y_N)\}$ using the design parameters of BMM type reduction. Each data item $\vec{x}'_i$ in the transformed training data set $D'$, along with its actual output $y_i$, is fed into SVR in order to obtain the coefficients $\vec{w}$ (weight vector) and $b$ (bias term) of the SVR linear expression. The design parameters $q$ and $p$, which weight the lower ($\underline{f}_i$) and upper ($\overline{f}_i$) firing strengths of each rule, respectively, can be optimised using a grid search. A defuzzification operation for the support vector-based IT2-TSK A2-C0 fuzzy system (TSK-SVR II) is formulated as:
\[ y' = \langle \vec{w}, \vec{x}' \rangle + b \]
where $\vec{x}'$ is the transformed input and the new defuzzified output of TSK-SVR II is denoted by $y'$. The SVR part of the hybrid method is implemented with the LIBSVM package.
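The grid search over the BMM design parameters q and p can be sketched in Python as follows. This is a simplified illustration and not the procedure of the paper: it assumes that the lower and upper weighted averages of the rule outputs have already been computed for every training sample (the hypothetical inputs `lower_parts` and `upper_parts`), whereas in the proposed system the SVR would be retrained for each candidate pair.

```python
import itertools
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def grid_search_qp(lower_parts, upper_parts, targets, q_grid, p_grid):
    """Return (best_q, best_p, best_rmse) over all (q, p) pairs in the grids."""
    best = None
    for q, p in itertools.product(q_grid, p_grid):
        predictions = [q * lo + p * up for lo, up in zip(lower_parts, upper_parts)]
        error = rmse(targets, predictions)
        if best is None or error < best[2]:
            best = (q, p, error)
    return best
```

The resolution of `q_grid` and `p_grid` determines both the cost of the search and how closely the selected pair approaches the best achievable RMSE.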

III. EXPERIMENTS
In this section, two simulations are presented. The first simulation is an example of nonlinear system approximation and the second is chaotic time series prediction. The results of the proposed approach are compared to those of various methods published in the literature. In order to provide an objective comparison of the proposed methods, a widely used statistical measure, the root mean square error (RMSE), is used. RMSE can be expressed in the following form:
\[ RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_{obs,i} - y_{prd,i} \right)^{2}} \]
where $y_{obs,i}$ and $y_{prd,i}$ are the observed data and predicted data, respectively, and $N$ is the number of samples. In addition, the improvement gained through the proposed type-2 method (IT2) over the type-1 (T1) method can be calculated as:
\[ \text{Improvement}\ (\%) = \frac{RMSE_{T1} - RMSE_{IT2}}{RMSE_{T1}} \times 100 \]
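The two evaluation measures can be sketched in Python as below. The RMSE follows its standard definition; the improvement measure is assumed here to be the relative reduction in RMSE of the type-2 model with respect to the type-1 model, expressed as a percentage. The function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error over N paired samples."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def improvement_percent(rmse_t1, rmse_it2):
    """Relative RMSE reduction of the IT2 model over the T1 model, in percent
    (assumed definition for this sketch)."""
    return (rmse_t1 - rmse_it2) / rmse_t1 * 100.0
```

A positive value indicates that the type-2 model has the lower error; a negative value would indicate that the type-1 model performed better.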

A. Nonlinear System Approximation
A nonlinear system that has appeared in many modelling exercises [29] is used for comparison purposes. The randomly generated data set shown in Table I, which consists of 10 input features within [1, 5], was taken from [28] for an unbiased comparison. The fuzzy rules and antecedent parameters of the proposed model are obtained using clustering. The model contains 3 rules. Type-1 fuzzy sets and interval type-2 fuzzy sets for rule 1, characterised by Gaussian membership functions, are depicted in Figs. 2 and 3. The prediction results of the proposed model are shown in Table II. The optimal TSK-SVR II parameters, assessed through the RMSE values, are found to be C = 3.00 and ε = 0.1. The performance of type-2 TSK A2-C0 systems relative to type-1 TSK systems is also assessed. The percentage improvement of TSK-SVR II over TSK-SVR I is found to be 25.3%. Using a grid search, the adjustable coefficients of BMM type reduction are obtained as q = 2.15 and p = 0.03.

B. Time-Series Prediction
The Mackey-Glass time series has chaotic and non-linear characteristics, and its data is produced by a time-delay differential equation expressed as:
\[ \frac{dx(t)}{dt} = \frac{a \, x(t - \tau)}{1 + x^{n}(t - \tau)} - b \, x(t) \tag{25} \]
where the constants $a$, $b$, and $n$ are used for the generation of the chaotic time series values and $t$ denotes the time. The chaotic behaviour comes from the delay parameter $\tau$, where $\tau > 16.8$. This equation was initially proposed for modelling blood cell regulation [30] and has been used as a benchmark in the literature for decades, particularly to assess the performance of prediction methods. The data set, given as x, consists of 1200 data samples, from which input samples of the form [x(t-18), x(t-12), x(t-6), x(t)] and output samples x(t+6) were produced. The input-output mappings are used to predict the future value of x at t+6. The discretised data is formed using the fourth-order Runge-Kutta method, and 1000 samples were generated by (25). The samples were divided into two equal-sized groups of 500 samples each. The first group was used for training and the second for testing the proposed model. The parameters learnt through training were used to construct a rule-based fuzzy logic system. RMSE was used to measure the training and testing prediction performance.
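The data preparation for this experiment can be sketched in Python as follows. The sketch makes several assumptions that are not stated in the text: the constants a = 0.2, b = 0.1, n = 10 and τ = 17 are the values commonly used for this benchmark, the history before t = 0 is held constant at a hypothetical initial value `x0`, and the delayed term is held fixed within each Runge-Kutta step, which is a common simplification rather than an exact delay-differential-equation solver. The function names are hypothetical.

```python
def mackey_glass(n_samples, tau=17.0, a=0.2, b=0.1, n=10, dt=0.1, x0=1.2):
    """Generate n_samples values at unit time spacing with fourth-order Runge-Kutta."""
    steps_per_unit = int(round(1.0 / dt))
    delay_steps = int(round(tau / dt))
    history = [x0] * (delay_steps + 1)  # assumed constant history for t <= 0

    def deriv(x_t, x_delayed):
        return a * x_delayed / (1.0 + x_delayed ** n) - b * x_t

    series = []
    for k in range(n_samples * steps_per_unit):
        x_t = history[-1]
        x_tau = history[-1 - delay_steps]  # delayed value, held fixed in this step
        k1 = deriv(x_t, x_tau)
        k2 = deriv(x_t + 0.5 * dt * k1, x_tau)
        k3 = deriv(x_t + 0.5 * dt * k2, x_tau)
        k4 = deriv(x_t + dt * k3, x_tau)
        history.append(x_t + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0)
        if (k + 1) % steps_per_unit == 0:
            series.append(history[-1])
    return series


def make_pairs(x, lags=(18, 12, 6, 0), horizon=6):
    """Build ([x(t-18), x(t-12), x(t-6), x(t)], x(t+6)) input-output pairs."""
    pairs = []
    for t in range(max(lags), len(x) - horizon):
        pairs.append(([x[t - lag] for lag in lags], x[t + horizon]))
    return pairs
```

Splitting the resulting list of pairs into a first half for training and a second half for testing mirrors the protocol described above.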
Table III shows the prediction results, given as test errors (RMSE values), of the proposed model and those reported in the literature. The fuzzy rules and antecedent parameters of the proposed model are obtained using clustering. The model contains 32 rules. Type-1 fuzzy sets and interval type-2 fuzzy sets for rule 1, characterised by Gaussian membership functions, are depicted in Figs. 4 and 5. The optimal TSK-SVR II parameters, assessed through the RMSE measure, are found to be C = 17.75 and ε = 0.01. The percentage improvement of TSK-SVR II over TSK-SVR I is found to be 12.5%. Using a grid search, the adjustable coefficients of BMM type reduction are obtained as q = 1.50 and p = 0.01.

IV. CONCLUSIONS AND FUTURE WORK
This paper proposed a hybrid system for the IT2-TSK A2-C0 fuzzy system. Learning the consequent parameters of the fuzzy system with the assistance of SVR yielded a good performance improvement for the given regression tasks. The computational cost is also reduced through the use of one of the recent closed-form type-reduction and defuzzification methods, which is adapted to design the type-reduction process of an IT2-TSK fuzzy system. One advantage of the proposed fuzzy system is that it offers interpretable rules, in contrast to published approaches that employ black-box models. Additionally, the generalisation of the overall system is increased, which improves the prediction performance on unseen data. In future work, bioinformatics data sets will be studied in order to find out how our approach copes with data sets that have high-dimensional and complex characteristics.

Fig. 3. Interval type-2 fuzzy sets (rule 1) for the non-linear approximation problem characterised by Gaussian upper and lower membership functions.

Fig. 4. Type-1 fuzzy sets (rule 1) for the chaotic time series prediction problem characterised by Gaussian membership functions.

Fig. 5. Interval type-2 fuzzy sets (rule 1) for the chaotic time series prediction problem characterised by Gaussian upper and lower membership functions.