Reducing the impacts of intra-class spectral variability on the accuracy of soft classification and super-resolution mapping of shoreline

ABSTRACT The main objective of this research is to assess the impact of intra-class spectral variation on the accuracy of soft classification and super-resolution mapping. The accuracy of both analyses was negatively related to the degree of intra-class spectral variation, but the effect could be reduced through the use of spectral sub-classes. The latter is illustrated in mapping the shoreline at a sub-pixel scale from Landsat ETM+ data. Reducing the degree of intra-class spectral variation increased the accuracy of soft classification, with the correlation between predicted and actual class coverage rising from 0.87 to 0.94, and super-resolution mapping, with the RMSE in shoreline location decreasing from 41.13 m to 35.22 m.


Introduction
Land-cover mapping by means of image classification is one of the most common applications of remote sensing. However, the full potential of remote sensing as a source of land-cover information is often unrealized, mainly because of a set of technical problems. One of the most important problems limiting classification accuracy is that of mixed pixels (Fisher 1997; Cracknell 1998; Foody 2002b; Costa, Foody, and Boyd 2017), which are often abundant in remotely sensed imagery and cannot be appropriately or accurately classified by conventional (hard) image classifiers that associate each pixel with a single land-cover class. Soft or fuzzy classification techniques allow for partial- and multiple-class membership within each mixed pixel and may, therefore, be used to refine the standard mapping process as well as increase the accuracy of land-cover mapping from remote sensing (Wang 1990; Foody and Cox 1994; Doan and Foody 2007; Mather and Tso 2016).
The output of a soft classification is typically a set of fraction images that show the predicted coverage of each thematic class in the area represented by each pixel. This fractional value is often expressed as the proportion of the area of the pixel covered by the relevant class. Soft classifications have been found to provide more informative and potentially more accurate representations of land cover than conventional hard classifications (Foody 1996; Khatami, Mountrakis, and Stehman 2017). Although soft classifications predict the proportion of each land-cover class within each pixel, they do not indicate where the land-cover classes are spatially located within the pixels. The sub-pixel class components may, however, be located geographically through super-resolution mapping (SRM) analysis (Tatem et al. 2001; Atkinson 1997).
SRM may be considered a downscaling technique that predicts the location of land-cover classes within each image pixel from the fraction images derived from soft classification. In SRM, each pixel is divided into a matrix of sub-pixels, and each sub-pixel is allocated to a land-cover class on the basis of spatial dependence. A variety of methods have been used for super-resolution mapping in remote sensing, including those based on spatial dependence maximization (Atkinson 1997), the Hopfield neural network (HNN) (Tatem et al. 2001), linear optimization techniques (Verhoeye and De Wulf 2002), the contouring method (Foody 2002a), genetic algorithms (GA) (Mertens et al. 2003), backpropagation neural networks (BpNN) (Mertens et al. 2004), the pixel-swapping method (PSM) (Foody, Muslim, and Atkinson 2005; Atkinson 2005), Markov random fields (MRFs) (Kasetkasem, Arora, and Varshney 2005), attraction models (Mertens et al. 2006), indicator geostatistics (Boucher and Kyriakidis 2006), interpolation methods (Ling et al. 2013; Wang, Shi, and Atkinson 2014), adaptive mapping methods (Zhong et al. 2015; Xu, Zhong, and Zhang 2014), and a spatial distribution mapping strategy (Ge et al. 2016). Moreover, combinations of methods may be used, such as PSM, HNN and MRF (Li et al. 2016b), BpNN and GA, and contouring and PSM (Su et al. 2012b). In addition, methods may be refined to accommodate sensor variables such as the point spread function (Wang and Atkinson 2017) and the availability of ancillary data sets such as maps or fine spatial resolution imagery (Li et al. 2017). Central to each is the aim of locating the sub-pixel class composition provided by some form of pixel unmixing or soft classification analysis.
To reduce the impact of soft classification error, some algorithms have been proposed that relax the constraints on class area fractions, e.g. the regularization model (Ling et al. 2014) and the local endmember method (Li et al. 2016a). Other approaches seek to maintain the class proportion information predicted by the unmixing (Foody, Muslim, and Atkinson 2005; Foody and Doan 2007). This may be unwise as it effectively assumes that each class can be represented by a single spectral endmember. Since land-cover classes generally display some degree of intra-class spectral variation, this assumption is untenable. Moreover, as a result of the intra-class spectral variation, the spectral response observed for an image pixel could be associated with a range of land-cover compositions, not just the single composition associated with the use of a single endmember per class.
Indeed, the accuracy of unmixing or soft classification analyses may be negatively related to the degree of intra-class variation present (Jay et al. 2017; Huang et al. 2016), and approaches to refine basic unmixing methods to accommodate this issue have been developed (Song 2005; Xie et al. 2016). Rather than using a single endmember per class to derive a single class composition estimate in an unmixing analysis, a bundle of endmembers may be used (Bateson, Asner, and Wessman 2000) or a distribution of possible fractional covers could be obtained. This distribution may provide a richer description of the possible sub-pixel class compositions of mixed pixels. Although this provides more information, perhaps in the form of a range of possible locations for a class boundary, it may be preferable to reduce the impacts of intra-class variability on unmixing and hence on super-resolution mapping.
This article aims to explore and assess the impacts of intra-class spectral variability on the accuracy of unmixing and ultimately super-resolution mapping. Particular attention is focused on the potential to increase the accuracy of the soft classification and super-resolution mapping by reducing the degree of intra-class variation, here through the inclusion of spectral sub-class information into the analysis. Attention is focused on a simple scenario, the fitting of an instantaneous shoreline to imagery unmixed into land and water classes.

Background of the methods used
Because the input to a super-resolution mapping analysis is fraction imagery describing the class composition of the image pixels, many techniques may be used to derive the latter information (Settle and Drake 1993; Song 2005). The resulting fraction images provide the class composition information which may then be located in space via a super-resolution analysis. In this research, a linear mixture model (LMM), which is an efficient method for estimating fractional compositions from multispectral images (Settle 2006), was used. From the range of super-resolution mapping methods available, two were selected for this study: the contouring method (Foody, Muslim, and Atkinson 2005) and the HNN (Tatem et al. 2002). The contouring method was selected because it is considered the easiest method for predicting the location of a boundary at a sub-pixel scale (Su et al. 2012a), whereas the HNN has been shown to provide accurate sub-pixel classification (Nguyen, Atkinson, and Lewis 2011; Li et al. 2014a) and has recently been demonstrated to be a successful, and arguably the most widely used, tool for super-resolution land-cover mapping (Su et al. 2012a; Li et al. 2014b; Wang et al. 2015).

Linear mixture model
LMM assumes that the spectral response observed at an image pixel is a linear combination of the reflectances of all components within that pixel (Settle and Drake 1993); it can therefore be expressed mathematically (Haertel and Shimabukuro 2005) as

$$RF_i = \sum_{j=1}^{m} r_{i,j} f_j + E_i \qquad (1)$$

where $RF_i$ is the reflectance value of band $i$; $r_{i,j}$ is the reflectance value of land-cover class $j$ in band $i$; $f_j$ is the fractional value of land-cover class $j$; $E_i$ is the error, and $m$ is the total number of land-cover classes. The fraction $f_j$ is subject to the following constraints:

$$\sum_{j=1}^{m} f_j = 1, \qquad 0 \le f_j \le 1 \qquad (2)$$
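As an illustration of how fractions might be estimated under the linear mixture model, the following minimal sketch solves for the fractions by least squares. It is not the implementation used in this research: the function name `unmix_lmm`, the `weight` parameter, and the endmember values are hypothetical, the sum-to-one condition is imposed approximately through a heavily weighted extra equation, and the bounds on the fractions are handled crudely by clipping and renormalising.

```python
import numpy as np

def unmix_lmm(pixels, endmembers, weight=1e3):
    """Estimate class fractions under a linear mixture model.

    pixels:     (n_pixels, n_bands) array of pixel responses
    endmembers: (n_classes, n_bands) array of class endmember spectra
    weight:     hypothetical weight that enforces the sum-to-one condition
    """
    pixels = np.atleast_2d(np.asarray(pixels, dtype=float))
    endmembers = np.asarray(endmembers, dtype=float)
    n_classes = endmembers.shape[0]
    # Append a heavily weighted row of ones so the solution honours sum(f) = 1.
    A = np.vstack([endmembers.T, weight * np.ones((1, n_classes))])
    B = np.hstack([pixels, weight * np.ones((pixels.shape[0], 1))]).T
    fractions, *_ = np.linalg.lstsq(A, B, rcond=None)
    fractions = fractions.T
    # Crude treatment of the bounds: clip to [0, 1], then renormalise per pixel.
    fractions = np.clip(fractions, 0.0, 1.0)
    fractions /= fractions.sum(axis=1, keepdims=True)
    return fractions

# Illustrative endmembers for three bands (invented values, not from the study).
land = np.array([0.30, 0.35, 0.40])
water = np.array([0.10, 0.08, 0.05])
mixed = 0.7 * land + 0.3 * water
fractions = unmix_lmm(mixed, np.vstack([land, water]))
```

For a noise-free mixture such as `mixed`, the recovered fractions match the proportions used to build it; with real data the error term absorbs the mismatch between the pixel and the chosen endmembers.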

The contouring method
The contouring method is a simple generalization of the soft classification output in which a contour or isoline of class membership is fitted to the fraction imagery to represent the shoreline (Figure 1). The contouring approach to super-resolution mapping involves fitting a 0.5 class membership contour to the soft classification that indicates the proportional cover of the two classes to be represented. In other words, the contour follows the locations at which the membership is shared equally between the two classes, regardless of the fraction values of the individual pixels through which it passes.
The contour fitted to the output of a soft classification provides a representation of the boundary of land-cover patches. Here, a 0.5 class membership contour was used to separate two classes: a target object and its background. This approach allows the contour to run through pixels and can provide smooth boundaries rather than unrealistic jagged ones that arise when boundaries are fitted to hard classifications and constrained to lie between pixels.
This method has been applied previously to mapping the shoreline from remotely sensed data (Foody 2002a;Foody, Muslim, and Atkinson 2005) and is simple and quick to undertake. However, one potential drawback of the contour method to shoreline mapping is that the class proportional information provided by the soft classification is not maintained in fitting the contour. Thus, the generalization process involved in fitting the contour can lead to a different class composition within the area represented by a pixel than that predicted from the unmixing analysis, which could be problematic if the unmixing analysis was highly accurate.
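The idea of fitting a 0.5 membership contour can be illustrated in one dimension. The sketch below assumes a transect of land fractions sampled at pixel centres and uses linear interpolation between neighbouring centres; the function name `contour_crossings` and the sample values are hypothetical, and a full analysis would contour the two-dimensional fraction image instead.

```python
import numpy as np

def contour_crossings(fractions, pixel_size, level=0.5):
    """Locate level crossings along a transect of pixel-centre fraction values.

    Returns distances from the first pixel centre, in the units of pixel_size.
    """
    f = np.asarray(fractions, dtype=float) - level
    crossings = []
    for k in range(len(f) - 1):
        a, b = f[k], f[k + 1]
        if a == 0.0:
            # The contour passes exactly through this pixel centre.
            crossings.append(k * pixel_size)
        elif a * b < 0.0:
            # Sign change: interpolate linearly between the two centres.
            t = a / (a - b)
            crossings.append((k + t) * pixel_size)
    if f.size and f[-1] == 0.0:
        crossings.append((len(f) - 1) * pixel_size)
    return crossings

# Land fraction falling from 1 to 0 across five 300 m pixels (invented values).
transect = [1.0, 0.9, 0.6, 0.2, 0.0]
shoreline = contour_crossings(transect, pixel_size=300.0)
```

Because the crossing is interpolated, the boundary can fall inside a pixel instead of being forced onto a pixel edge, which is the property that gives the contouring method its sub-pixel character.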

Hopfield neural network
The HNN-based approach is computationally more demanding than the contour-based approach; however, this method has been demonstrated to be a successful tool for super-resolution land-cover mapping based on the output of soft classification (Nguyen, Atkinson, and Lewis 2011; Su et al. 2012a; Li et al. 2014b; Wang et al. 2015). With the HNN-based approach, the area represented by each pixel is sub-divided into a large number of sub-pixels. Each sub-pixel is given a hard class label, with the proportion of a pixel's sub-pixel components allocated to a class reflecting that predicted for the pixel by the unmixing analysis. The sub-pixels are then spatially rearranged within the area represented by a pixel until a suitable geographical representation has been derived. Fuller details of this method can be found in Tatem et al. (2002), but some of the salient features are outlined below.
The HNN is used as an optimization tool: it is initialized randomly using the class composition estimates from a soft classification and run until it converges to a stable state (Tatem et al. 2002). The zoom factor, z, determines the increase in spatial resolution from the original remotely sensed imagery, which was used to derive the soft classification output, to the new fine spatial resolution image. After convergence to a stable state, the output values of all neurons of the network are either 0 or 1, representing a binary classification of the land cover at the finer spatial resolution. The specific goals and constraints of the HNN energy function determine the final distribution of neuron output values. The energy function can be defined as

$$E = -\sum_{i}\sum_{j}\left(k_1 G1_{i,j} + k_2 G2_{i,j} + k_3 P_{i,j} + k_4 M_{i,j}\right) \qquad (3)$$

where $k_1$, $k_2$, $k_3$, and $k_4$ are weighting coefficients which define the contributions of the corresponding two goal functions ($G1_{i,j}$ and $G2_{i,j}$), the proportion constraint ($P_{i,j}$), and the multi-class constraint ($M_{i,j}$) to the energy function.
Using the class proportion images derived from soft classification (i.e. LMM) as the input, the HNN is implemented using carefully selected settings for the parameters, k 1 , k 2 , k 3 , and k 4 , as they control the optimisation process of the network. Typically, identifying the optimum weighting constraint values is a difficult and tedious task, so estimates based on certain assumptions and multiple-network trial runs are often used (Tatem et al. 2002). In addition, a zoom factor, z, which determines the increase in spatial resolution from the original remotely sensed imagery to the new fine spatial resolution imagery, should also be specified. Finally, the analyst must also define the number of iterations for the analysis.
The output of the HNN approach is a set of binary images with a spatial resolution that is z times finer than that of the input class proportional images derived from soft classification (Figure 2). The number of the binary images is equal to the number of land-cover classes to be mapped, with each image representing the location of a defined class. When, as in this study, only two classes are used, the boundary between them may be represented by a vector line fitted between sub-pixels with different labels.
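The sub-division of each pixel into z × z sub-pixels whose labels respect the predicted class proportion can be sketched as follows. This shows only a random, proportion-preserving starting arrangement for a two-class case; it is not the HNN optimisation itself, which would go on to rearrange the labels by minimising the energy function. The function name `init_subpixels` and the `seed` argument are hypothetical.

```python
import numpy as np

def init_subpixels(fraction_image, z, seed=0):
    """Randomly place round(f * z * z) class-1 sub-pixels inside each coarse pixel."""
    rng = np.random.default_rng(seed)
    fraction_image = np.asarray(fraction_image, dtype=float)
    rows, cols = fraction_image.shape
    fine = np.zeros((rows * z, cols * z), dtype=np.uint8)
    for r in range(rows):
        for c in range(cols):
            # Number of sub-pixels to label as the target class in this pixel.
            n_on = int(round(fraction_image[r, c] * z * z))
            block = np.zeros(z * z, dtype=np.uint8)
            block[rng.choice(z * z, size=n_on, replace=False)] = 1
            fine[r * z:(r + 1) * z, c * z:(c + 1) * z] = block.reshape(z, z)
    return fine

# Four coarse pixels with invented land fractions, zoom factor of 10.
fractions = np.array([[1.0, 0.5], [0.25, 0.0]])
fine = init_subpixels(fractions, z=10)
```

Each 10 × 10 block of `fine` contains exactly the number of labelled sub-pixels implied by the coarse fraction, so averaging the block returns the input proportion regardless of how the labels are later rearranged.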

Data used
The methodological flow of this study is shown in Figure 3. The focus was on mapping part of the shoreline of the Isle of Wight, UK, from Landsat ETM+ data (Figure 4(a)). It is apparent from Figure 4 that there was a large degree of intra-class spectral variation. The water class, for example, showed clear variation in terms of turbidity. Since the spectral variability, which is central to this research, was most evident in the shorter wavelength data only, the three shortest wavelength bands (ETM+ band 1, band 2, and band 3) were used for the analyses.
The original 30 m spatial resolution ETM+ image was classified visually into two classes, land and water, for use as a ground or reference dataset. The classified image was then vectorised along the boundary between land and water classes to generate the reference shoreline. The same procedure was used for separating two spectral subclasses, turbid water and clear water.
The Landsat ETM+ image was spatially degraded by a factor of 10 to simulate data sets with a relatively coarse spatial resolution of 300 m (Figure 4(b)). This coarse spatial resolution is comparable, in terms of pixel size, to that of systems such as MODIS and MERIS (Justice and Tucker 2009). The spatially degraded image was obtained by aggregating pixels to the desired spatial resolution, with each degraded DN expressed as the mean DN of the original undegraded pixels it comprised; this is not an ideal simulation of coarse spatial resolution data but provides a common, spatially coarse, data set for the research similar to that used in other studies. The simulated coarse spatial resolution image produced by the degradation process was used in the analyses to predict the shoreline location using super-resolution mapping methods based on the outputs of a soft classification. Since the shoreline mapping is based on a soft classification, the accuracy of the soft classification is critical as it will affect the ability to derive an accurate representation of the shoreline. The testing set used to assess the accuracy of the soft classifications contained 5000 pixels drawn randomly from the degraded data.

The impacts of intra-class spectral variation on the use of soft classification for super-resolution mapping
Using the simulated coarse spatial resolution image, a soft classification of the image into the land and water classes was derived through the use of the LMM (Settle and Drake 1993). This analysis was called the two-class analysis. The class endmembers required for the LMM were derived from a sample of 180 randomly selected pure pixels, comprising 90 pixels of each class. For illustrative purposes, the data set was also subjected to a principal components analysis, and the first two components were used to display the classes in feature space (Figure 5(a)).
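The spatial degradation described above, in which each coarse value is the mean of the fine pixels it covers, can be sketched as a block average. The function name `degrade` is hypothetical, and the sketch assumes a single-band image whose dimensions are exact multiples of the degradation factor. Applied to a binary land/water map, the same operation yields the proportion of land in each coarse pixel.

```python
import numpy as np

def degrade(image, factor):
    """Aggregate a 2-D image by taking the mean of each factor x factor block."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be multiples of the factor")
    # Split each axis into (block index, position within block) and average.
    blocks = image.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Invented 20 x 20 binary land mask: land occupies the first 13 columns.
fine = np.zeros((20, 20))
fine[:, :13] = 1.0
coarse = degrade(fine, 10)
```

In this toy case the left-hand coarse pixels are pure land and the right-hand pixels are mixed, with a land proportion of 0.3.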
The accuracy of the soft classification was evaluated using correlation coefficient (r) and root mean square error (RMSE) between the predicted and reference coverage.
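The two accuracy measures named above can be computed as in the following sketch. The function name `soft_accuracy` and the sample values are hypothetical; the inputs are assumed to be predicted and reference fractions for the same set of test pixels.

```python
import numpy as np

def soft_accuracy(predicted, reference):
    """Return (Pearson r, RMSE) between predicted and reference fractions."""
    predicted = np.asarray(predicted, dtype=float).ravel()
    reference = np.asarray(reference, dtype=float).ravel()
    r = np.corrcoef(predicted, reference)[0, 1]
    rmse = np.sqrt(np.mean((predicted - reference) ** 2))
    return r, rmse

# Invented fractions: every prediction is 0.1 too high.
reference = np.array([0.0, 0.2, 0.5, 0.7, 0.9])
predicted = reference + 0.1
r, rmse = soft_accuracy(predicted, reference)
```

The example is deliberately biased: the correlation is perfect while the RMSE is 0.1, which shows why the two measures are reported together, since r is insensitive to a constant offset.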
The output of the soft classification was an estimate of the proportional coverage of the two classes in each pixel. The contouring and HNN approaches were then used to derive super-resolution maps depicting the shoreline from the soft classification. The zoom factor, z, of the HNN was set to 10, to illustrate the potential to zoom from 300 to 30 m. As in previous studies, e.g. Tatem et al. (2001, 2002), the weighting coefficients in Equation 3 were set equal in this research. The magnitude of the weighting coefficients was, however, varied, and six different scenarios set over the range from 70 to 200 were evaluated. The accuracy of the shoreline predictions derived from both the contouring and HNN-based approaches to super-resolution mapping was assessed against the shoreline represented in a hard classification of the original, spatially undegraded, Landsat ETM+ imagery.
Initially, the shoreline boundary was derived using the single prediction from the conventional LMM in which the centroid of each class was taken to represent its endmember response. The class centroids derived from the training data, however, do not fully describe the spectral characteristics of the classes; the centroid represents just one point in the data cloud for a class in feature space. The use of a single point provides only one possible description of the endmember and does not recognize the potentially large amount of endmember variability and its associated impacts (Somers et al. 2011). Since a single class endmember and a single composition estimate for each pixel may be unrealistic, a distribution of possible class composition estimates for each pixel was derived. This was based on running the LMM repeatedly, using the spectral response of every training pixel as the endmember of the relevant class. The distributional information could be used to indicate the variety of possible class compositions with, for example, the range between the 5th and 95th percentiles of the distribution used to summarize the key characteristics.
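The repeated unmixing described above can be sketched for the two-class case, where the sum-to-one least-squares solution has a closed form. Several points are assumptions and are not stated in the text: every land training pixel is paired with every water training pixel, fractions are clipped to [0, 1], and the training samples are simulated here with invented centroids and noise. The function name `land_fraction_distribution` is hypothetical.

```python
import numpy as np

def land_fraction_distribution(pixel, land_samples, water_samples):
    """Land fraction of one pixel for every pairing of land and water endmembers."""
    pixel = np.asarray(pixel, dtype=float)
    land = np.asarray(land_samples, dtype=float)[:, None, :]    # (L, 1, bands)
    water = np.asarray(water_samples, dtype=float)[None, :, :]  # (1, W, bands)
    diff = land - water                                          # (L, W, bands)
    # Two-class sum-to-one least squares: project (pixel - water) onto (land - water).
    num = np.sum((pixel - water) * diff, axis=2)
    den = np.sum(diff * diff, axis=2)
    return np.clip(num / den, 0.0, 1.0).ravel()

rng = np.random.default_rng(0)
land_c = np.array([0.30, 0.35, 0.40])    # invented class centroid
water_c = np.array([0.10, 0.08, 0.05])   # invented class centroid
land_samples = land_c + rng.normal(0.0, 0.02, size=(90, 3))
water_samples = water_c + rng.normal(0.0, 0.02, size=(90, 3))
pixel = 0.7 * land_c + 0.3 * water_c

dist = land_fraction_distribution(pixel, land_samples, water_samples)
p5, p95 = np.percentile(dist, [5, 95])
```

The spread between `p5` and `p95` grows with the scatter of the training pixels around their centroids, which is the link between intra-class variation and the uncertainty of the composition estimate.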
The accuracy of the shoreline mapping was evaluated using the average distance and RMSE between the predicted and reference shoreline locations. The perpendicular distance between the predicted and actual location of the shoreline was measured at points spaced every 10 m along the selected stretch of shoreline (Figure 4), and its RMSE was calculated. The length of the shoreline in this study area was about 41.68 km. The range of possible shoreline positions was indicated by the average distance between the shorelines represented by the 5th and 95th percentiles of the class composition generated.
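A minimal sketch of the positional accuracy measures follows. It assumes the predicted shoreline has already been sampled at regular points and approximates the perpendicular distance by the shortest distance from each point to the reference polyline; the function names and coordinates are hypothetical, and the reference line is assumed to contain no zero-length segments.

```python
import numpy as np

def distance_to_polyline(points, line):
    """Shortest distance from each point to a polyline given as (n, 2) vertices."""
    points = np.asarray(points, dtype=float)
    line = np.asarray(line, dtype=float)
    start, end = line[:-1], line[1:]
    seg = end - start                                   # (S, 2) segment vectors
    seg_len2 = np.sum(seg * seg, axis=1)                # (S,) squared lengths
    rel = points[:, None, :] - start[None, :, :]        # (P, S, 2)
    # Position of the foot of the perpendicular, clamped to the segment ends.
    t = np.clip(np.sum(rel * seg[None, :, :], axis=2) / seg_len2, 0.0, 1.0)
    nearest = start[None, :, :] + t[:, :, None] * seg[None, :, :]
    dist = np.linalg.norm(points[:, None, :] - nearest, axis=2)
    return dist.min(axis=1)

def shoreline_errors(predicted_points, reference_line):
    """Return (mean distance, RMSE) of predicted points from the reference line."""
    d = distance_to_polyline(predicted_points, reference_line)
    return d.mean(), np.sqrt(np.mean(d ** 2))

# Invented coordinates in metres: a straight reference line and three samples.
reference = np.array([[0.0, 0.0], [100.0, 0.0]])
predicted = np.array([[0.0, 30.0], [50.0, 40.0], [100.0, 50.0]])
mean_d, rmse = shoreline_errors(predicted, reference)
```

Because the RMSE squares each distance before averaging, it is never smaller than the mean distance and responds more strongly to a few large displacements.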
To reduce the impacts of intra-class spectral variation on the soft classification and super-resolution mapping, the water class was divided into two spectral sub-classes, turbid water and clear water. In this case, the unmixing analysis essentially focused on three classes, yielding fraction images depicting the proportional cover of land, turbid water, and clear water. For comparability with the earlier work, the training set for each class comprised 90 pixels (Figure 5(a)). This analysis is hereafter described as the three-class analysis. Comparison of the accuracy of the soft classification derived in this way, and of the shorelines fitted to it, with the accuracy obtained in the earlier analysis would indicate the potential value of reducing the intra-class spectral variation for super-resolution mapping.
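The effect of splitting the water class can be illustrated with a toy comparison. The sketch below unmixes a pixel composed of land and turbid water, first with a single water endmember taken as the mean of the two sub-class spectra and then with separate turbid and clear water endmembers. All spectra are invented, the function name `unmix` and its `weight` parameter are hypothetical, and summing the two water fractions to obtain a total water fraction is an assumption about how the sub-class outputs might be combined.

```python
import numpy as np

def unmix(pixel, endmembers, weight=1e3):
    """Sum-to-one least-squares unmixing of one pixel (clipped and renormalised)."""
    endmembers = np.asarray(endmembers, dtype=float)
    n_classes = endmembers.shape[0]
    # A heavily weighted row of ones approximately enforces sum(f) = 1.
    A = np.vstack([endmembers.T, weight * np.ones((1, n_classes))])
    b = np.append(np.asarray(pixel, dtype=float), weight)
    f, *_ = np.linalg.lstsq(A, b, rcond=None)
    f = np.clip(f, 0.0, 1.0)
    return f / f.sum()

# Invented three-band spectra for land and the two water sub-classes.
land = np.array([0.30, 0.35, 0.40])
turbid = np.array([0.15, 0.14, 0.10])
clear = np.array([0.05, 0.04, 0.03])
pixel = 0.6 * land + 0.4 * turbid   # true land fraction is 0.6

# Two-class analysis: one water endmember (mean of the sub-class spectra).
f_two = unmix(pixel, [land, (turbid + clear) / 2.0])
# Three-class analysis: separate endmembers for turbid and clear water.
f_three = unmix(pixel, [land, turbid, clear])
water_three = f_three[1] + f_three[2]

error_two = abs(f_two[0] - 0.6)
error_three = abs(f_three[0] - 0.6)
```

In this constructed case the three-class analysis recovers the land fraction, whereas the single water endmember misrepresents the turbid water in the pixel and biases the estimate.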

Results and discussion
Sub-pixel class composition estimates for the two original classes, land and water, were derived for each pixel in the spatially degraded imagery. Initially, the endmembers were defined as the class centroids for input to the conventional LMM. There was a strong relationship between predicted and actual fractional cover (r = 0.87; RMSE = 0.26) (Table 1). This suggests that the soft classification outputs were accurate and could be used to derive a shoreline map.
The shorelines were derived using two approaches, contouring and the HNN, based on the output of the sub-pixel class compositions. Using the class proportion images derived from a soft classification as the input, the HNN is implemented using a set of parameters (Equation 3) which should be carefully chosen by the user. These are the four weighting constants (k1, k2, k3, and k4), the zoom factor z, and the number of iterations for which the network is run. The values of the weighting constants were estimated from certain assumptions and multiple trial runs of the network. Tatem et al. (2001, 2002) suggest that the weighting constants should be equal to obtain the highest performance and found, for their study, that an optimal value was 150. In this research, several trial networks were run with different values of the weighting constants (Table 2), a zoom factor of 10, and 5000 iterations. The highest accuracy of shoreline mapping was obtained with weighting constants of k1 = k2 = k3 = k4 = 70, and all the HNN-based results presented in this article are based on analyses that used these settings. The output of the HNN approach is a set of binary images with a spatial resolution that is z times finer than that of the input class proportional images derived from soft classification. With a zoom factor of 10 applied to input proportion images with a spatial resolution of 300 m, the HNN produced output maps with a spatial resolution of 30 m, equal to that of the reference image. The number of binary images is equal to the number of land-cover classes to be mapped, with each image showing the location of a defined class. In this study, for the purpose of shoreline mapping, the remotely sensed imagery was mapped into two land-cover classes, water and land.
The binary images derived from the HNN approach were then vectorised along the boundary between the land and water classes to generate the shoreline. Figure 6(a) shows the shoreline in a part of the study area derived from the HNN approach as an example and Table 3 illustrates the accuracy of the predicted shorelines derived.
For each image pixel, a distribution of possible class proportions was derived by mixing the distributions of the pure land and water pixels used to train the mixture model (Figure 7). The distributions derived for the area represented by a pixel in each image may then be used to derive an alternative indication of the shoreline, and it may be preferable to be aware of the range of possible shoreline positions. For illustrative purposes, the locations of the shorelines based on the 5th and 95th percentiles of the class composition distributions were generated. The nature of the distribution of possible mixing predictions for an image pixel will depend on the location of the point in feature space and the degree of intra-class variation and class co-variation present. This may affect the width of the zone of possible shoreline locations, bounded by the 5th and 95th percentiles of land coverage.
The width of the zone of possible shoreline positions varied along the coastal strip of the study area (Figure 8). The average width of the zone of possible shoreline positions is shown in Table 4. This information may be used as a means to measure the effect of intra-class variation on shoreline mapping. This range of possible shoreline positions may indicate the confidence that can be placed in the shoreline mapped from a single soft classification prediction. It suggests that trust in a single set of class proportion predictions as input for super-resolution mapping may be unwise and that the distribution of possible predictions may be used to provide a richer interpretation for this process. Furthermore, if the uncertainty of the distribution is large, there may be a problem for applications that use soft classification, such as change detection and super-resolution mapping.

Table 2. Accuracy of shoreline mapping derived from trial HNN (k1, k2, k3, k4 are the two goal functions, proportion, and multi-class weighting constants, respectively).

The potential to reduce the impacts of intra-class spectral variation on the accuracy of soft classification and super-resolution mapping by defining spectral sub-classes was illustrated by the three-class analysis. It was apparent that the accuracy of the predictions from the three-class analysis (r = 0.94, significant at the 0.01 significance level) was much higher than that from the two-class analysis (r = 0.87, significant at the 0.01 significance level). Figure 8 shows the shoreline positions derived from the output of soft classification of both the two-class and three-class analyses using the HNN-based approach as an example. For the purpose of comparison, Table 3 shows the accuracy of the shoreline mapping from these two analyses and highlights that, with both approaches used to generate shorelines, the accuracy of the shoreline mapping in the three-class analysis was higher than that in the two-class analysis. For example, using the HNN-based approach to derive the shoreline, the three-class analysis provided a shoreline of higher accuracy (RMSE = 35.22 m) than the two-class analysis (RMSE = 41.13 m). This suggests that reducing the intra-class spectral variability increased the accuracy of soft classification and shoreline mapping.
In terms of the distribution of class composition estimates for each pixel, according to Table 4, the average ranges of possible shoreline positions derived from both the contouring and HNN-based approaches in the three-class analysis were smaller than those from the two-class analysis. For example, using the HNN-based approach, the width of the zone of possible shoreline locations, bounded by the 5th and 95th percentiles of land coverage, was narrower in the three-class analysis (213.62 m; Figure 6(b)) than in the two-class analysis (281.60 m; Figure 6(a)). Similar trends were noted for the results based on the contouring approach (Table 4).

Concluding remarks
Mixed pixels are one of the main problems limiting the accuracy of mapping land cover from remotely sensed imagery. Soft classifications allow for partial- and multiple-class membership of mixed pixels and can give a more accurate and realistic representation of land cover than a hard classification. A further refinement may be made by locating geographically the sub-pixel class composition estimates depicted in a soft classification through super-resolution mapping. Some super-resolution mapping approaches attempt to maintain the class proportion information output from a soft classification, on the assumption that a class can be represented by a single spectral endmember. This may be unrealistic as classes typically display a degree of spectral variability, and the accuracy of soft classification is negatively related to the degree of intra-class variation present. The impacts of intra-class spectral variation on the accuracy of soft classification and super-resolution mapping were investigated through analyses with two land-cover classes (land and water) and then three classes (with the water class subdivided into turbid and clear water). The three-class approach reduced the intra-class variation of the water class compared with the two-class analysis and ultimately increased the accuracy of the super-resolution mapping. In terms of the sub-pixel class composition estimates derived by a soft classification, the accuracy of the predictions from the three-class analysis (e.g. r = 0.94 and RMSE = 0.20) was higher than that from the two-class analysis (e.g. r = 0.87 and RMSE = 0.27). In terms of shoreline mapping, with both approaches used to generate shorelines, the accuracy of the shoreline in the three-class analysis was higher than that in the two-class analysis. For example, using the HNN-based approach to derive the shoreline, the three-class analysis provided a shoreline with an RMSE of 35.22 m, while the RMSE in the two-class analysis reached 41.13 m.
Furthermore, the width of the zone of possible shoreline locations, bounded by the 5th and 95th percentiles of the distributions of sub-pixel class composition estimates, was narrower in the three-class analysis than in the two-class analysis. For example, using the contouring approach to derive shorelines, the average distance between the possible shorelines in the three-class analysis was 250.13 m, while in the two-class analysis it reached 311.29 m.
In conclusion, intra-class spectral variation has negative impacts on the accuracy of soft classification and of the super-resolution mapping derived from its output. Reducing the intra-class spectral variability, here through the use of spectral sub-classes, was one way to lessen these impacts. Specifically, reducing the degree of intra-class variation increased the accuracy of both soft classification and shoreline mapping and reduced the range of possible shoreline positions.