Alpha-cut based compositional representation of fuzzy sets and exploration of associated fuzzy set regression

The compositional representation of data and associated statistical approaches is a powerful framework for modelling and reasoning about quantities which reflect proportions of a whole. Recently, an increasing body of work has started exploring the adoption of a compositional representation for modelling interval-valued data reflecting uncertainty or vagueness, for example interval-valued questionnaire responses. Results have flagged the intriguing potential of this approach, such as the elegant handling of traditional inference challenges, including implicitly ensuring coherence in linear regression for interval data, i.e. ensuring the estimated left bound of intervals is smaller than the right one. Building on these insights, extending the compositional representation via alpha-cut decomposition to fuzzy sets is an intuitive next step. In this paper, we discuss this compositional representation of fuzzy sets, building on prior interval work. We proceed to explore the adoption of compositional regression approaches to conduct linear regression on fuzzy set valued data sets. We demonstrate the approach, discuss results and in particular flag shortcomings and the challenges for next steps.


I. INTRODUCTION
Compositional data convey structural information about the quantitative parts of a whole: when the features of a dataset inherently influence each other, they are expressed as proportions in which all components are nonnegative and sum to a constant, e.g., 1, 24 hours or 100% [1], [2]. An example is data on the time spent on daily (24 hours) physical activity. This inherent dependency of compositional data is usually not taken into consideration in standard multivariate statistical approaches [1], [2], [3], [4], [5]. Compositional statistical approaches, such as compositional linear regression, are powerful tools for modelling and reasoning about data which capture proportions of a whole [6], [7], [8], [9], [10], [11], [12].
Recently, an increasing body of work has started exploring the adoption of a compositional representation for modelling interval-valued data reflecting uncertainty or vagueness, for example interval-valued questionnaire responses [13], [14]. Motivated by the mutual dependency of the interval endpoint parameters, a compositional transformation of intervals is proposed in [15]. The authors articulate how, why and when a compositional representation of interval-valued data may be appropriate, and further demonstrate compositional linear regression applied to interval-valued data [15]. The literature on interval-valued regression has so far afforded improved model accuracy [16], [17], [18] and increased resilience to parameter flipping or 'loss of mathematical coherence' [19], [20], [21], [22], [23]. The latter is a major challenge for interval regression, as models are required to maintain the mathematical structure of intervals with a left and a right endpoint, or a centre and (positive) range.
Compositional representation of interval-valued data and compositional interval regression results have flagged intriguing potential, such as the elegant handling of such traditional inference challenges, including implicitly ensuring coherence [15]. Building on these insights, extending the compositional representation to fuzzy sets (FSs) is an intuitive next step.
In this paper, we discuss an α-cut based compositional representation of FSs, building on prior interval work [13], [15]. We proceed to explore the adoption of compositional regression approaches to conduct linear regression on the compositional representations of FS valued data. We demonstrate the approach, discuss results and in particular flag shortcomings and the challenges for next steps.
The structure of this paper is as follows. Section II provides background information on the α-cut interval-valued representation of FSs, compositional data and compositional linear regression. Section III presents the α-cut based compositional representation of FSs and the overall methodology for exploring the associated FS regression. Section IV reports initial experiments applying the compositional linear model to interval data obtained from synthetically generated FSs, and illustrates the results. Section V provides conclusions and a reflection on future work.

II. BACKGROUND

A. FS and α-cut decomposition
A FS A is defined on a universe of discourse X and characterised by a membership function (MF) $\mu_A(x)$ that takes values in the interval [0, 1]. A FS A in X can be represented as a set of ordered pairs of a generic element x and its membership grade, as follows:
$$A = \{(x, \mu_A(x)) \mid x \in X\} \tag{1}$$
If membership grades are constrained to be either 0 or 1, then a crisp set is obtained. Otherwise, the membership grade $\mu_A(x)$ takes a value in the interval [0, 1] for each element x ∈ X.
The principal role of α-cuts and strong α-cuts in FS theory is their capability to represent FSs via α-cut decomposition [24]. The general idea of α-cut decomposition is to decompose FSs into a collection of crisp sets (intervals) related together via the α levels [24], [25]. For a level α ∈ [0, 1] and a given FS A, the α-cut is defined as follows:
$${}^{\alpha}[A] = \{x \in X \mid \mu_A(x) \geq \alpha\} \tag{2}$$
and the strong α-cut (${}^{\alpha+}[A]$) is defined as:
$${}^{\alpha+}[A] = \{x \in X \mid \mu_A(x) > \alpha\} \tag{3}$$
To illustrate the application of the α-cut, a FS A is shown with α = 0.5 in Fig. 1.
Fig. 1: α = 0.5 decomposition which leads to interval [5,8].
The 0.5-cut (${}^{0.5}[A]$) of the FS A provides the closed interval [5, 8]. The collection of all α-cuts provides the α-cut representation of that fuzzy set. Note that for normal, convex FSs, for each α ∈ (0, 1], the α-cut is a closed interval.
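As a minimal sketch of the α-cut, i.e. the set of elements whose membership grade is at least α, the following Python snippet computes the α-cut of a triangular FS in closed form. The triangular shape and its parameters (4, 6, 10) are assumptions chosen purely so that the 0.5-cut reproduces the interval [5, 8] quoted in the text; the actual FS in Fig. 1 may differ, and the function names are hypothetical.

```python
def triangular_mf(x, a, b, c):
    """Membership grade of x for a triangular FS with support [a, c] and peak b.

    Assumes a < b < c (illustrative shape, not taken from the paper).
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def alpha_cut_triangular(a, b, c, alpha):
    """Endpoints of the closed interval {x : mu(x) >= alpha}, for 0 < alpha <= 1."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    left = a + alpha * (b - a)    # rising edge reaches alpha here
    right = c - alpha * (c - b)   # falling edge reaches alpha here
    return left, right


# Hypothetical FS whose 0.5-cut matches the interval used in the text.
cut = alpha_cut_triangular(4.0, 6.0, 10.0, 0.5)
print(cut)  # (5.0, 8.0)
```

Repeating the call for several α levels yields the collection of intervals that forms the α-cut representation of the FS.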

B. Interval-Valued Data
A closed interval $a = [a^-, a^+]$ is formed by two endpoints, a lower endpoint $a^-$ and an upper endpoint $a^+$, with the condition $a^- \leq a^+$.
In this paper, the intervals are formed from the defined α-cuts on FSs (${}^{\alpha}[A]$), and vice-versa, on a finite domain X. At times, we refer to a set of n intervals over an α-cut

C. Compositional Data
A data set is called compositional if the quantities reflect proportions of a whole which is a fixed total sum κ, e.g., percentages of workers in different sectors, portions of the chemical elements in a mineral, concentrations of nutrients in a beverage, portions of working time spent on different tasks etc. [5]. More formally, a (row) vector $\eta = (\eta_1, \eta_2, \dots, \eta_D)$ is a D-part composition where all the components are strictly positive real numbers and reflect relative information. The sample space of this compositional structure is called a simplex $S^D$, which is expressed as follows:
$$S^D = \left\{ \eta = (\eta_1, \eta_2, \dots, \eta_D) \;\middle|\; \eta_i > 0,\ i = 1, \dots, D;\ \sum_{i=1}^{D} \eta_i = \kappa \right\} \tag{4}$$
As a common practice, three-part (D = 3) compositional data can be illustrated on a ternary diagram to show the components' compositional structure and inherent dependency. In a ternary diagram, the triangle vertices represent the three parts of the composition. A high proportion of a part lies close to its vertex, a low proportion lies further away, and equal proportions lie at the triangle's centre. For instance, the composition η = (0.5, 0.3, 0.2) is shown on the illustrative ternary diagram in Fig. 2.

D. Log-ratio Transformation
To enable standard unconstrained multivariate statistical operations, compositional data are commonly mapped from the simplex ($S^D$) to the real space ($\mathbb{R}^D$). For this purpose, in the 1980s, the log-ratio transformation was proposed [1], [2], which provides a one-to-one mapping onto real values (between −∞ and +∞) where any composition can be reformulated in terms of log-ratios, and vice versa.
In the literature, several log-ratio transformations of compositional data have been proposed (e.g., the centred log-ratio, the additive log-ratio or the isometric log-ratio [2], [26]). As one of the most common transformations, the centred log-ratio (clr) is used in this paper, in order to transform compositions (e.g., $\eta = (\eta_1, \eta_2, \dots, \eta_D) \in S^D$) into coordinates as $S^D \to \mathbb{R}^D$. The clr transformation is employed as follows:
$$\eta^{clr} = \mathrm{clr}(\eta) = \left( \ln\frac{\eta_1}{g(\eta)}, \ln\frac{\eta_2}{g(\eta)}, \dots, \ln\frac{\eta_D}{g(\eta)} \right) \tag{5}$$
where g(η) is the geometric mean of the composition vector η:
$$g(\eta) = \left( \prod_{i=1}^{D} \eta_i \right)^{1/D} \tag{6}$$
The inverse operation $\mathrm{clr}^{-1}$ is done as follows:
$$\mathrm{clr}^{-1}(\eta^{clr}) = \kappa \cdot \frac{\left( \exp(\eta^{clr}_1), \exp(\eta^{clr}_2), \dots, \exp(\eta^{clr}_D) \right)}{\sum_{j=1}^{D} \exp(\eta^{clr}_j)} \tag{7}$$
As explained in the next subsection, these transformed log-ratio data can be used in traditional regression models to investigate relationships between variables.
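The clr transform and its inverse can be sketched in a few lines of Python. This is an illustrative implementation using only the standard library, not the routine from the R library used in the experiments; the function names and the default closure constant `kappa=1.0` are assumptions made for the example.

```python
import math


def closure(parts, kappa=1.0):
    """Rescale strictly positive parts so that they sum to kappa."""
    total = sum(parts)
    return [kappa * p / total for p in parts]


def clr(composition):
    """Centred log-ratio: log of each part minus the log of the geometric mean."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)  # equals log(g(eta))
    return [v - mean_log for v in logs]


def clr_inverse(coords, kappa=1.0):
    """Map clr coordinates back to a composition summing to kappa."""
    return closure([math.exp(v) for v in coords], kappa)


eta = [0.5, 0.3, 0.2]        # the composition used in the ternary example
eta_clr = clr(eta)           # coordinates in real space
eta_back = clr_inverse(eta_clr)
print(eta_clr, sum(eta_clr))  # the clr coordinates sum to (numerically) zero
print(eta_back)               # recovers eta up to rounding
```

The zero-sum property of the clr coordinates is what makes the components mutually dependent in a regression setting, and the logarithm requires every part to be strictly positive.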

E. Compositional Linear Regression Models
In exploring relationships, compositional data can play the role of both independent and dependent variables through constructing an appropriate regression model on the log-ratio scale.
Most methods for classical linear regression have a close analogue in the form of compositional linear regression models [5]. In this paper, compositions (both dependent and independent variables [26]) are used as an alternative to interval regression models where both sets of variables are interval-valued [17], [20], [21], [22], [23].
In general, in univariate (non-compositional) linear regression models, the dependence of one variable Y on another variable X is modelled as follows:
$$Y = a + bX + \epsilon \tag{8}$$
where Y and X are the dependent and independent variables, respectively, a is the intercept, b is the regression coefficient and ϵ is an error term with, generally, zero mean and variance $\sigma^2$. The compositional regression model is built as follows. First, the dependent and independent variables are expressed as compositions. The data samples contain n observations of D-part compositions, resulting in an n × D matrix. In this paper, the dependent variables are denoted as Υ, and the independent variables are denoted as X. Both dependent and independent variables are transformed into coordinates $X^{clr}, \Upsilon^{clr} \in \mathbb{R}^D$ by using the chosen log-ratio transform (clr in our case) and the following statistical model is constructed:
$$\Upsilon^{clr} = a + b X^{clr} + \epsilon \tag{9}$$
The regression parameters can be estimated in the standard way by the least squares method [27].
Having given a brief overview of the α-cut representation of FSs, compositional data and the associated operations, we now proceed to the motivation and methodology of this paper.

III. MOTIVATION AND METHODOLOGY
In recent years, regression models for data sets where both dependent and independent variables are interval-valued have attracted increasing interest, with a view to improving regression model accuracy and maintaining mathematical coherence, i.e. ensuring that the estimated left endpoint of an interval does not exceed its right endpoint. While the latter poses one of the key challenges for interval regression models, recent studies have started exploring the adoption of a compositional representation to address it and unlock further potential advantages such as ensuring estimations remain within a given variable's domain [13], [15].
In this paper, building on these insights, we extend the compositional representation to FSs by using α-cut decomposition. Further, we explore the use of the obtained compositional representations of the FSs to facilitate linear regression of FS-valued data. The steps of this approach are as follows.

A. FSs to Compositional Data
In [15], closed intervals are transformed into their 3-part compositional representations using the cardinality of each individual interval on a fixed domain. The authors of [15] further applied the (interval) compositional representations to linear regression, articulating how, why and when the approach may be appropriate in terms of maintaining mathematical coherence and avoiding estimations outside of the fixed domain.
Adopting the approach in [15], in this paper, we extend the compositional transformation to FSs via α-cut decomposition. As mentioned in Section II-A, the α-cut decomposition theorem is used to obtain closed intervals (${}^{\alpha}[A]$) from FSs where α ∈ [0, 1]. Thus, first, intervals are obtained via the α-cut decomposition theorem and these intervals are then transformed into a compositional representation (e.g., ${}^{\alpha}\eta$) following the approach in [15], where the corresponding α level is denoted as the left superscript.
As an illustrative example, in Fig. 3a, on a finite domain X, the FS A is decomposed at α = 0.5 to obtain the interval ${}^{0.5}[A] = [5, 8]$. This interval (Fig. 3b on the domain X) is then represented as a 3-part composition, denoted by ${}^{0.5}\eta$, on the 3-part simplex (Fig. 3c), i.e. ${}^{0.5}\eta \in S^3$, as follows:
$${}^{0.5}\eta = \left( {}^{0.5}\eta_1, {}^{0.5}\eta_2, {}^{0.5}\eta_3 \right), \quad \sum_{i=1}^{3} {}^{0.5}\eta_i = \kappa \tag{10}$$
where $\kappa = X^+$, i.e. the parts sum to the maximum of the domain, and the parts themselves are defined as follows:
$${}^{0.5}\eta_1 = {}^{0.5}a^-, \quad {}^{0.5}\eta_2 = {}^{0.5}a^+ - {}^{0.5}a^-, \quad {}^{0.5}\eta_3 = X^+ - {}^{0.5}a^+ \tag{11}$$
where ${}^{0.5}a^-$ and ${}^{0.5}a^+$ are the endpoints of the interval obtained at α = 0.5, i.e. ${}^{0.5}[A] = [5, 8]$. In this way, the procedure in [15] is extended to α-cut decomposition, providing intervals of FSs which in turn are transformed into compositional data as exemplified above. As the next step of the methodology, the transformed compositional representations are processed through compositional linear regression.
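The mapping between an interval and its 3-part composition can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of [15]: the three parts are taken to be the portion of the domain below the interval, the interval width, and the portion above the interval, on a domain that starts at 0 so that the parts sum to the domain maximum. The function names and the domain [0, 10] are hypothetical choices for the example.

```python
def interval_to_composition(left, right, domain_max):
    """3-part composition of [left, right] on the domain [0, domain_max].

    Parts (assumed): below the interval, interval width, above the interval.
    """
    if not 0.0 <= left <= right <= domain_max:
        raise ValueError("interval must be coherent and lie inside the domain")
    return (left, right - left, domain_max - right)


def composition_to_interval(parts):
    """Recover the interval endpoints from the first two parts."""
    return (parts[0], parts[0] + parts[1])


eta = interval_to_composition(5.0, 8.0, 10.0)
print(eta)                           # (5.0, 3.0, 2.0)
print(composition_to_interval(eta))  # (5.0, 8.0)
```

Because every part of a composition is nonnegative, any composition mapped back through `composition_to_interval` has a left endpoint that does not exceed the right one and stays inside the domain. An interval that touches a domain boundary or has zero width produces a zero part, which would need separate treatment before any log-ratio transform.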

B. Compositional Linear (Interval) Regression
As detailed in Section II-E, the transformed compositional data are used to generate coefficients of compositional linear regression models.
Consider two sets of FSs (dependent and independent) that are transformed to their compositional representations as outlined in Section III-A. The compositional data thus obtained are transformed by using a log-ratio transform. As mentioned in Section II-D, various log-ratio transform techniques exist in the literature; in this paper, the clr transformation is used, as it expresses each part relative to the geometric mean of the three parts of the composition (5).
The approach can be divided into seven main steps and is illustrated in the flow chart in Fig. 4. Let two FSs A and B be instances of the independent and dependent variables, respectively, for the compositional linear regression. First, the α-cut decomposition intervals (${}^{\alpha}[A]$ and ${}^{\alpha}[B]$ where α ∈ [0, 1]) are obtained (Step 1 in Fig. 4). These intervals are transformed to their compositional representations, denoted by ${}^{\alpha}H$ and ${}^{\alpha}Z \in S^D$, following the procedures in Section III-A (Step 2 in Fig. 4).
Next, the clr log-ratio transform is applied to the obtained compositional representations, giving ${}^{\alpha}H^{clr}$ and ${}^{\alpha}Z^{clr}$ (Step 3 in Fig. 4). These log-ratio values are processed in the regression model to calculate the coefficients (Step 4 in Fig. 4), as detailed in Section II-E. Lastly, the estimations of the regression models are inverted back to compositional data (Step 5 in Fig. 4), which in turn are transformed back to intervals (Step 6 in Fig. 4) and 're-assembled' in conjunction with their α-level to FSs (Step 7 in Fig. 4). Note that in this initial exploration of the compositional representation of FSs for linear regression, we adopt a naive approach, where α-cuts are obtained and processed independently for each α-cut level.
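The steps above can be sketched end to end for a single α level. The following Python code is a simplified illustration, not the implementation used in the experiments (which relies on an R library): the interval data are invented, the domain is assumed to be [0, 10], the 3-part composition is assumed to be (below, width, above), and ordinary least squares is solved through the normal equations with the third clr predictor dropped, since the clr coordinates sum to zero. All function names are hypothetical.

```python
import math


def to_clr(left, right, domain_max):
    """Interval -> 3-part composition -> clr coordinates (Steps 2 and 3)."""
    parts = (left, right - left, domain_max - right)
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / 3.0
    return [v - mean_log for v in logs]


def from_clr(coords, domain_max):
    """clr coordinates -> composition -> interval (Steps 5 and 6)."""
    expd = [math.exp(v) for v in coords]
    total = sum(expd)
    parts = [domain_max * e / total for e in expd]
    return parts[0], parts[0] + parts[1]


def solve(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit(inputs, outputs):
    """Least squares per clr response (Step 4); predictors: 1, h1, h2."""
    design = [[1.0, h[0], h[1]] for h in inputs]
    gram = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    coefs = []
    for k in range(3):
        moment = [sum(r[i] * z[k] for r, z in zip(design, outputs)) for i in range(3)]
        coefs.append(solve(gram, moment))
    return coefs


def predict(coefs, h):
    return [c[0] + c[1] * h[0] + c[2] * h[1] for c in coefs]


DOMAIN_MAX = 10.0
# Invented alpha = 0.5 cuts of five independent (A) and dependent (B) FSs.
cuts_a = [(1.0, 3.0), (2.0, 5.0), (3.0, 6.5), (2.5, 7.0), (1.5, 8.0)]
cuts_b = [(2.0, 4.0), (3.0, 6.0), (4.0, 7.5), (3.5, 8.0), (2.5, 9.0)]

h_clr = [to_clr(lo, hi, DOMAIN_MAX) for lo, hi in cuts_a]
z_clr = [to_clr(lo, hi, DOMAIN_MAX) for lo, hi in cuts_b]
model = fit(h_clr, z_clr)
predicted = [from_clr(predict(model, h), DOMAIN_MAX) for h in h_clr]
for lo, hi in predicted:
    print(f"[{lo:.2f}, {hi:.2f}]")
```

Because the inverse clr transform always returns strictly positive parts, each estimated interval lies inside the domain with its left endpoint below its right endpoint, whatever the fitted coefficients are. Repeating the fit for every α level and collecting the estimated intervals with their levels corresponds to the re-assembly in Step 7.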

IV. EXPERIMENTS AND ILLUSTRATIONS
In Section III, we discussed the compositional representation of FSs and the application of compositional regression to these representations. In order to explore the behaviour of the model, two experiments are carried out. In the first experiment, the coefficients are calculated and the model is tested using the same FSs that were used in the training phase. In the second experiment, the same coefficients are tested on a different FS which is not involved in the coefficient generation phase.

A. Experiment 1
To illustrate the approach, we generate two sets of five synthetic FSs as independent and dependent variables, denoted A and B, shown in Fig. 5. We explore adopting linear regression on the compositional representation of the α-cut decompositions of the FSs to conduct regression effectively, mitigating some of the challenges of maintaining domain and parameter coherence.
As can be seen in Fig. 5, each dependent and independent variable FSs' support is gradually increased to explore and to clearly communicate the behaviour of the regression models in respect to different properties of the FSs. In other words, beyond considering model performance, we are conducting an analysis on the resulting regression models for systematic variations of the FSs.
The specific experimental steps are as follows, and each is visually illustrated in Fig. 5:
Step 1: To enable visualisation within the paper, we conduct α-cut decomposition using only three levels (α = 0+, α = 0.5 and α = 1), thus obtaining three sets of intervals for each instance of the independent and dependent variables.
Step 2: The intervals from Step 1 are transformed to the compositional representation (denoted by ${}^{\alpha}H$ and ${}^{\alpha}Z \in S^3$). Compositional linear regression is performed by using the 'Compositional' library in the R language [28].
Step 3: The clr log-ratio transform is applied to the obtained compositional representations and ${}^{\alpha}H^{clr}$ and ${}^{\alpha}Z^{clr}$ are calculated.
Step 4: Following the procedure in Section II-E, the coefficients are obtained using the linear regression approach applied to the transformed values generated in Step 3. Note that for each α-cut level, a different linear regression model (denoted by ${}^{\alpha}LM$) is optimised and, thus, a different set of coefficients is calculated. As an example, the ${}^{0.5}LM$ is given in (12).
Note that $\eta^{clr}_2$ and $\eta^{clr}_3$ are excluded/dropped to avoid having mutually dependent components in the regression stage in (12). $\eta^{clr}_2$ and $\eta^{clr}_3$ could be excluded and later recreated at the estimated unconstrained real space stage, since for clr all three components add up to zero. Excluding $\eta^{clr}_2$ and $\eta^{clr}_3$ would be computationally more efficient as it avoids the estimation of their model parameters, but it makes the pipeline more cumbersome as the recreation requires adjustment depending on the transform used.
With the compositional regression model in place, we can now estimate an output for a given FS A. In order to do so, first, we follow the same four steps in Fig. 5 to obtain ${}^{\alpha}H^{clr}$.
Step 5: We use the models (${}^{\alpha}LM$) in conjunction with ${}^{\alpha}H^{clr}$ to estimate ${}^{\alpha}Z^{clr}$ for each α-cut level. Then, the inverse operation (7) is applied to the estimations, $\mathrm{clr}^{-1}({}^{\alpha}Z^{clr})$, and the compositional representation ${}^{\alpha}Z$ is obtained.
To provide visual insight into the quality of the regression model, the estimated compositions (${}^{0.5}Z$) from the ${}^{0.5}LM$ are illustrated as black crosses in the ternary diagram in Fig. 6, with the blue circles representing the ground truth compositions ${}^{0.5}Z$.
Step 6: Each estimated composition is transformed into an interval, or more specifically an α-cut ${}^{\alpha}[B]$, where α ∈ {0+, 0.5, 1}. The estimated ${}^{0.5}[B]$ intervals are illustrated as black dashed lines in Fig. 7, where the blue lines represent the ground truth ${}^{0.5}[B]$.
In the testing phase of the generated coefficients, the same FSs (A) are used as input to the built linear regression models and the estimation results (black dashed intervals) are visualised on the given input FSs in Fig. 8. The initial results indicate that all the estimations lie within the fixed domain and maintain mathematical coherence (the left endpoints are smaller than the right endpoints), so the models meet intuitive expectations.

B. Experiment 2
In experiment 1, the generated FS pairs (A − B) are used to optimise coefficients and the same FSs A are tested on the generated coefficients to explore the estimation behaviour of the model.
In experiment 2, we conduct the analysis with a different FS to further explore the behaviour of the models (e.g., ${}^{0.5}LM$ given in (12)). First, we generate the Gaussian-shaped FS I (shown on the left-hand side of Fig. 9), which is not involved in the coefficient generation process. Following the same 7 steps in Fig. 4, I is transformed into clr log-ratio coordinates and the same linear regression model coefficients are used to generate the estimation. Then, the back transformation is carried out, and the estimations are re-assembled to visualise the result (Î), which is shown as black dashed lines on the right-hand side of Fig. 9.
Overall, based on the experiment, it can be observed that the estimations go beyond the given FS (I). We highlight the expected potential of the approach: the estimations remain in line with the fixed domain [0, 10] and the risk of parameter flipping is avoided, so both key issues are addressed in the given results.

V. CONCLUSIONS
Recent studies have started exploring the adoption of a compositional representation for data reflecting uncertainty or vagueness, for example, interval-valued questionnaire responses. Further effort has been made in articulating how regression for these compositional representations can be conducted and how this can provide advantages in mitigating some of the challenges faced by traditional approaches. This paper proposes the α-cut based compositional representation of FSs and conducts an initial empirical exploration of linear regression using the resulting FS representations.
In the experiments, initial explorations are carried out to examine the behaviour of the proposed approach, which shows promising results in principle. However, we highlight that substantial questions and challenges remain. For example, further experiments and a formal exploration of whether the compositional representation and associated regression address key concerns such as mathematical coherence are needed. Moreover, the simple approach of generating independent regression models for each α-cut, as outlined here, is expected to risk challenges, such as resulting in α-cuts at a higher α level not being a subset of lower α-cuts.
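The nesting concern raised above can be checked mechanically once the per-level estimations are available. The following Python sketch assumes the estimated α-cuts are stored as a dictionary from α level to interval endpoints (with α = 0+ represented by the key 0.0); both the data layout and the function name are hypothetical.

```python
def is_nested(cuts):
    """Return True if every higher-alpha cut lies inside the next lower one.

    cuts: dict mapping an alpha level to a (left, right) interval.
    Checking consecutive levels is sufficient because containment is transitive.
    """
    levels = sorted(cuts)
    for low, high in zip(levels, levels[1:]):
        left_low, right_low = cuts[low]
        left_high, right_high = cuts[high]
        if left_high < left_low or right_high > right_low:
            return False
    return True


# Invented estimations: the first set re-assembles into a valid FS,
# the second does not because the 0.5-cut sticks out of the 0+ cut.
print(is_nested({0.0: (2.0, 9.0), 0.5: (3.0, 8.0), 1.0: (5.0, 6.0)}))  # True
print(is_nested({0.0: (2.0, 9.0), 0.5: (1.0, 8.0), 1.0: (5.0, 6.0)}))  # False
```

A failed check indicates that the independently estimated intervals cannot be re-assembled into a FS whose α-cuts shrink as α grows, which is the situation the independent per-level models do not rule out.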
Thus, in future work, we will further explore the expected potential of this approach in addressing two key challenges: the risk of parameter flipping and the unexpected generation of estimations outside the dependent variable's domain. Furthermore, we will examine dependency across the multiple linear regressions associated with different α levels.
Fig. 8: Re-assembling FSs from α-cuts from ${}^{\alpha}LM$ estimations into interval-valued data (black dashed lines), corresponding to Step 7 in Fig. 4.
Fig. 9: Experiment 2 - Re-assembling estimation results from α-cuts in ${}^{\alpha}LM$ to interval-valued data (black dashed lines), corresponding to Step 7 in Fig. 4.