Iterative training sample expansion to increase and balance the accuracy of land classification from VHR imagery

Imbalanced training sets are known to produce suboptimal maps for supervised classification. Therefore, one challenge in mapping land cover is acquiring training data that will allow classification with high overall accuracy in which each class is also mapped with similar user’s accuracy. To solve this problem, we integrated local adaptive region and box-and-whisker plot (BP) techniques into an iterative algorithm to expand the size of the training sample for selected classes. The major steps of the proposed algorithm are as follows. First, a very small initial training sample for each class is labeled manually. Second, potential new training samples are found within an adaptive region by conducting local spectral variation analysis. Lastly, three new training samples are acquired to capture information regarding intra-class variation; these samples lie at the lower quartile, median, and upper quartile of the BP. After adding these new training samples to the initial training sample, the classifier is retrained and the process is continued iteratively until termination. The proposed approach was applied to three very high resolution (VHR) remote sensing images and compared with a set of cognate methods. The comparison demonstrated that the proposed approach produced the best result in terms of overall accuracy and exhibited superiority in balancing user’s accuracy. For example, the proposed approach was typically 2%-10% more accurate than the compared methods in terms of overall accuracy and it generally yielded the most balanced classification.

ZhiYong Lv 1,2, GuangFei Li 1, ZheNong Jin 2*, Jón Atli Benediktsson 3, Fellow, IEEE, Giles M. Foody 4
Index Terms-Very high-resolution remote sensing image, land cover classification, training sample collection.

I. INTRODUCTION
REMOTE sensing images with very high resolution (VHR) can currently be obtained with ease. Given advantages of VHR images such as their clarity and their ability to capture ground geometry details, these images have been applied successfully in applications such as land cover mapping [1]-[4] and change detection [5]. However, while VHR images typically have a fine spatial resolution, they may have a low spectral resolution and consequently may not necessarily yield an accurate classification because of limited spectral separability. Even for a VHR image acquired in many wavebands, such as from a hyperspectral sensor, a high classification accuracy is not guaranteed [6]-[8].
A popular technique used to address some of the problems encountered in the classification of VHR imagery is spatial-spectral feature extraction [9], [10]. This method explores spatial features by utilizing the correlation between a pixel and its neighbors, in terms of spectra or another derived feature, to complement insufficient spectral information. Zhang et al. developed an algorithm, called pixel shape index (PSI), for improving the classification of VHR remote sensing images [11]. On the basis of PSI, Zhang et al. proposed a similar algorithm, called object correlative index, for land cover mapping with VHR remote sensing images [12]. Texture is another helpful feature that can be coupled with spectra for improving classification accuracy [13]-[15]. Many mathematical models have been developed for extracting spatial features and supplementing insufficient spectra. For example, extended morphological profiles (EMPs) were proposed for improving hyperspectral and VHR remote sensing image classification [16]. Thereafter, multi-shape structural element morphological profiles (M EMPs) [17] and morphological attribute profiles (APs) [18] were presented. It should be noted that a spatial filter can also be a helpful tool for smoothing noise in classification maps based on VHR remote sensing images. For example, an edge-preserving filter has been developed for image classification, and comprehensive research has clearly demonstrated the advantages of that filter [19]-[21]. A rolling guide filter (RGF) has also been proposed and applied successfully to the classification of VHR remote sensing images [22], [23]. Recursive filters (RFs) have also been developed for increasing the accuracy of classification from hyperspectral and VHR remote sensing images [3]. In addition, object-based approaches [24], [25], deep learning methods [26], [27], and contextual models [28], [29] have been proposed and applied to the classification of VHR remote sensing images.
Although numerous methods have been developed and have improved classification accuracy for VHR remote sensing images in corresponding cases, no single method can be labeled "the best" or "the most appropriate" for all cases [30]-[32]. Moreover, in the supervised classification of VHR remote sensing images, training sample selection for most available methods is performed manually [33], [34]. Often an equal number of training samples is selected randomly per class [6], [35], [36]. Therefore, training sample selection depends excessively on the experience of practitioners, and manually assigning the precise label can be extremely difficult due to the limitations of human vision [37]. Furthermore, experimental results based on current methods clearly show that equal training samples per class may not provide a sufficiently accurate classification [2], [35], [38]-[41]. The quality of a classification may be assessed in a variety of ways [42]. Often the focus is on the accuracy of the entire classification assessed over all classes and on the accuracy achieved on a per-class basis. Indeed, popular Anderson-type targets focus on a level of overall accuracy to be achieved with the accuracy for the individual classes being broadly similar or balanced [43], [44]. Accuracy on a per-class basis can be assessed in a variety of ways depending on the objectives of the analysis [42]. For example, the concern here is with commission error, and hence accuracy on a per-class basis may be assessed using user's accuracy. It is argued here that the training sample can be acquired in a way that enhances the image classification and helps achieve the desired accuracy targets.
Regardless of whether spatial-spectral feature extraction, mathematical model-based approaches, or filter smoothing techniques are used, training and testing are two unavoidable tasks in using supervised classifiers. Although the training stage of a classification should be designed for the specific classifier to be used, with a training set being of variable value to different classifiers, a widely observed trend is for classification accuracy to increase with the size of the training set [45], [46]. Thus, methods to help select training samples have been suggested. For example, Richards et al. augmented the training set by using suitable neighbors to improve classification accuracy with hyperspectral and VHR remote sensing images for a maximum-likelihood classifier [47]. Imani et al. explored features using weighted training samples for land cover classification with VHR remote sensing images [48]. Kang et al. found that manually labeling the training sample may result in assigning an incorrect label. Consequently, they proposed an algorithm for correcting mislabeled training samples to improve the classification accuracy [37]. Tu et al. presented an algorithm for detecting and correcting noisy labels in the classification of hyperspectral and high spatial resolution images [49], [50]. In addition, many researchers have focused on classification with limited labeled training samples because manually labeling samples for VHR remote sensing images is both time consuming and labor intensive [51], [52]. For example, Huang et al. proposed the automatic labeling and selection of training samples with the assistance of OpenStreetMap [51]. Li et al. proposed an unsupervised sample collection method for urban land cover mapping with Landsat satellite images [53]. Despite considerable research related to training samples for hyperspectral or Landsat images, only a few approaches have focused on training sample collection for VHR image classification.
Although the current literature [37], [47] indicates that neighboring pixels can be used to exclude noise sample points and reduce their negative effects on classification accuracy, intra-class variation within a land cover patch should be further considered when collecting training samples. In particular, the intra-class heterogeneity, which manifests as low correlation among features of the same class, should be fully considered when expanding the initial training samples (ITSs) to improve classification accuracy. Furthermore, balancing user's accuracy among all classes remains a challenging problem when improving the classification accuracy of VHR remote sensing images.
The use of imbalanced training sets is often a source of classification error, with bias towards the more prevalent classes over the rarer classes. To tackle the issue of class imbalance, a well-known algorithm named Synthetic Minority Over-sampling Technique (SMOTE) was proposed in [54]. The basic idea of the SMOTE algorithm lies in generating synthetic minority examples in order to over-sample the minority class and obtain a relatively balanced classification map. In recent years, SMOTE has been improved and applied successfully in a range of fields, such as social data mining [55], hyperspectral data classification [56], and deep learning [57]. A more comprehensive analysis of SMOTE can be found in [58]. It should be noted that the initialization of the SMOTE algorithm requires an unbalanced training sample set for each class, and it also has limitations when applied to high-dimensional data [59].
Therefore, in this study, we propose an effective method for expanding training samples to improve classification accuracy while balancing user's accuracy for classification from VHR remote sensing images. The proposed approach consists of the following steps. First, an extremely small ITS set is labeled manually. Then, to account for the unknown shape and size of a land cover patch in a VHR image, an adaptive region around each point of the ITS is constructed to utilize the contextual information. To detect potential new training samples within the adaptive region, the spectral similarity among the pixels within the adaptive region is obtained, and a box-and-whisker plot (BP), which is a classical descriptive statistical tool, is used. Finally, three points around each initial labeled point are assigned as expanding training samples with the assistance of BP. The preceding steps are organized as an iterative algorithm. In the iterative process, whether the training sample for a specific class should be expanded further depends on whether the corresponding classification map satisfies a predefined rule proposed here (the details of the suggested rule are provided in Section III). Thus, this algorithm can automatically adjust the number of training samples for each class and seek to balance user's accuracy. The iterative algorithm is terminated when the expanded training samples for every class satisfy our proposed rule. The major contributions of the proposed approach are as follows.
1) The proposed approach provides a novel method for enriching the ITS set to improve the classification accuracy of VHR images. Intuitively, the shape and size of different individual land cover patches are unknown in a given VHR image scene. Thus, the adaptive region around each training sample point is more helpful than a regular window or a strict mathematical model for utilizing contextual information. In addition, given that the ITS for a specific class may not cover sufficient spectral signature for a land cover patch, the representative ability of the ITS may be scant for a class. To include additional spectral signatures for a class, which is the basic motivation of this study, an iterative algorithm based on the adaptive region and BP techniques is developed.
2) The proposed approach is effective for improving the overall accuracy and balancing the user's accuracy of classifications with VHR remote sensing images. Compared with the results based on ITSs without any processing, the training sample enrichment approach [47], and the relatively new mislabeled training sample detection and correction approach [37], it is shown in this paper that our proposed method achieves better overall accuracy and balanced user's accuracy in classifying VHR remote sensing images from three different sources. Furthermore, the experimental results effectively demonstrate that the proposed approach is feasible for the classification of different image features. Although we have only observed its accuracy on a support vector machine (SVM) classifier, the proposed method may exhibit potential ability for other supervised classification methods.
3) Notably, the proposed approach requires relatively fewer training samples for initialization, and its parameters do not necessitate hard-tuning for different datasets. The experimental section clearly demonstrates that the proposed approach can improve classification accuracy while adjusting user accuracy among different classes. For example, when the ITS is less than 4.2% of the ground reference samples used to test the classifier, the accuracy of the proposed approach without parameter hard-tuning remains competitive.
The remainder of this paper is organized as follows. The assessment of classification accuracy with remote sensing images is reviewed to enhance the understanding of the contributions of our proposed approach in Section II. The details of the proposed approach are presented in Section III. Experiments are conducted and analyzed in Section IV. Lastly, a conclusion is provided in Section V.

II. REVIEW OF THE LAND COVER MAPPING ASSESSMENT
The accuracy of the classifications produced was evaluated on an overall and per-class basis. Popular measures of accuracy include overall accuracy (OA), average accuracy (AA), and the kappa coefficient (Ka) [60]-[63]. If TP, TN, FP, and FN are defined as the numbers of "True Positives", "True Negatives", "False Positives", and "False Negatives", respectively, OA is calculated as $\mathrm{OA} = (TP + TN)/N$, where $N$ is the total number of pixels in a given image. In addition, Ka is the percentage of agreement corrected by the number of agreements that would be expected due to chance alone, and AA is the mean of the class-specific accuracies [64]. Apart from these measurements, user's accuracy (UA) is another important index for evaluating classification accuracy from the perspective of map users [60], [61]. User's accuracy essentially indicates how frequently the class on a map will actually be present on the ground. For a specific class $C_n$, the user's accuracy can be calculated as $\mathrm{UA}(C_n) = TP_{C_n} / (TP_{C_n} + FP_{C_n})$. This property is referred to as reliability. Thus, user's accuracy is important in practical engineering applications. In fact, UA is the complement of commission error (CE): UA = 100% − CE. Assume that the confusion matrix between the result and the ground reference is $M = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}$; then the CEs for classes $C_1$ and $C_2$ are $\mathrm{CE}(C_1) = C_{12}/(C_{11} + C_{12})$ and $\mathrm{CE}(C_2) = C_{21}/(C_{21} + C_{22})$, respectively. High user's accuracy thus indicates low CE and high reliability of the result. Therefore, apart from OA, AA, and Ka, the user's accuracy of each class should be improved in practical applications.
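The measures reviewed above can be illustrated with a minimal sketch that computes them from a confusion matrix. The function name `accuracy_metrics`, the dictionary keys, and the convention that rows hold the mapped class and columns the reference class (chosen to agree with the CE expressions above) are assumptions made for this illustration. AA is taken here as the mean of the per-class accuracies measured against the reference, which is one common reading and is also an assumption.

```python
import numpy as np


def accuracy_metrics(cm):
    """Compute OA, AA, Ka, UA and CE from a square confusion matrix.

    Assumed layout: cm[i, j] is the number of pixels mapped as class i
    whose ground-reference class is j.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()                      # total number of test pixels
    correct = np.diag(cm)             # correctly mapped pixels per class
    mapped = cm.sum(axis=1)           # pixels mapped as each class (rows)
    reference = cm.sum(axis=0)        # reference pixels of each class (columns)

    oa = correct.sum() / n            # overall accuracy
    ua = correct / mapped             # user's accuracy per class
    ce = 1.0 - ua                     # commission error per class
    aa = (correct / reference).mean() # mean per-class accuracy (assumed AA)

    # Agreement expected by chance, then the kappa coefficient.
    pe = (mapped * reference).sum() / n ** 2
    ka = (oa - pe) / (1.0 - pe)
    return {"OA": oa, "AA": aa, "Ka": ka, "UA": ua, "CE": ce}
```

The sketch does not guard against a class with no mapped or no reference pixels, for which the corresponding ratio is undefined.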

III. PROPOSED APPROACH
The major contribution of this paper is an effective method for expanding training samples to improve OA, AA, Ka, and user's accuracy. Two techniques, namely, adaptive region and BP, are integrated into an iterative algorithm to achieve these objectives. To clarify the termination condition of the proposed approach, the matched pixels between two iterations are defined and compared in (1):

$$\left| M^{C_n}_{k-1,k} - M^{C_n}_{k,k+1} \right| < \varepsilon \qquad (1)$$

where $M^{C_n}_{k-1,k}$ and $M^{C_n}_{k,k+1}$ are defined in (2) and (3), respectively. $M^{C_n}_{k-1,k}$ is the ratio between the matched pixels and the total pixels for class $C_n$ in the classification maps of iterations $k-1$ and $k$. In addition, $\varepsilon$ is a very small constant, fixed at 0.003 in this study. Therefore, (1) implies that when the difference among the classification maps ($M_{k-1}$, $M_k$, and $M_{k+1}$) obtained from the expanded training samples of successive iterations is less than $\varepsilon$ for a specific class $C_n$, the distribution and quantity of the current training samples for class $C_n$ are satisfactory. Thus, the training samples for class $C_n$ will not be adjusted in the next iteration. However, when the training samples for other classes are varied, these variations may affect the accuracy of class $C_n$ for a supervised classifier in the next iteration. Therefore, each class should be checked in each iteration to determine whether it meets the predefined condition (i.e., (1)). When the training samples for a class satisfy this condition, these samples are carried over directly to the next iteration. Otherwise, the training samples for this class continue to be expanded. The iteration is terminated when the result of each class satisfies (1).
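The per-class stopping rule described above can be sketched as follows. This is only an illustration under stated assumptions: the matched ratio is taken as the share of all image pixels that carry the class label in both maps, and the comparison is an absolute difference against ε. The helper names `matched_ratio` and `class_is_stable` are hypothetical and do not come from the referenced code.

```python
import numpy as np

EPSILON = 0.003  # value of epsilon fixed in the text


def matched_ratio(map_a, map_b, cls):
    """Assumed form of the matched ratio: fraction of all pixels that are
    labelled `cls` in both classification maps."""
    both = (map_a == cls) & (map_b == cls)
    return both.sum() / map_a.size


def class_is_stable(map_prev, map_curr, map_next, cls, eps=EPSILON):
    """Return True when the matched ratios of two consecutive map pairs
    differ by less than eps, so the class needs no further expansion."""
    m_prev_curr = matched_ratio(map_prev, map_curr, cls)
    m_curr_next = matched_ratio(map_curr, map_next, cls)
    return abs(m_prev_curr - m_curr_next) < eps
```

In a full loop, a check of this kind would be evaluated for every class after each retraining, and the iteration would stop once every class is reported stable.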
The details of the proposed algorithm are presented in Algorithm 1. In this algorithm, $ITS_0$ is defined as the original training sample set, and $L_{C_0}, L_{C_1}, L_{C_2}, \ldots, L_{C_T}$ are the individual sample sets for each class, where $T$ is the total number of classes.
From the perspective of technique and practical applications, the major advantages of the proposed approach lie in the following aspects.
(1) The proposed approach is highly automated. Although the proposed approach references three parameters ($T_1$, $T_2$, and $\varepsilon$) in the iteration, they do not require hard-tuning. Moreover, a competitive accuracy can be achieved under a fixed parameter setting for different datasets.
(2) The user's accuracy of each class can be balanced by adjusting the training samples. In the iterative process of the proposed approach, the training samples for each class are adjusted until all the training samples meet our predefined condition (Equation 1). The idea of balancing training samples among different classes to avoid bias in the corresponding accuracy is proposed here for the first time.

Algorithm 1: The algorithm of the proposed approach.
To support the repeatability of our proposed approach, the referenced code can be downloaded from https://github.com/Yzxy669/CodeLink. Running the code requires the OpenCV 2.4 library, C++, and Windows 10.

A. Construction of an adaptive region for utilizing contextual information
Here, we assume that a small initial training sample (ITS) with only a few points can be assigned manually to each class. Instead of being used directly for classification, the ITS is adopted for the initialization of our proposed approach. As shown in Algorithm 1, $ITS_0 = \{L_{C_0}, L_{C_1}, L_{C_2}, \ldots, L_{C_T}\}$, where $L_{C_0}$ is the labeled training sample set for class $C_0$, $T$ is the total number of classes of interest for a given image scene, $W$ and $H$ denote the width and height of the image, respectively, and $k$ is the iteration number of the algorithm. To explore potential additional training samples, an adaptive region technique is employed to utilize the contextual information and then detect potential additional samples around each labeled pixel. The adaptive region is constructed with two predefined parameters ($T_1$ and $T_2$). First, $\Delta s$ is defined as the spectral similarity between a labeled sample and its neighbors in terms of gray value, $\Delta s = P_{ij} - P_{sur}$, where $P_{ij}$ and $P_{sur}$ denote the gray values of the pixel at position $(i, j)$ and of its surrounding neighbors, respectively. Second, $\Delta d$ denotes the total number of assigned samples around a labeled sample. A region is then extended gradually and adaptively from a single labeled pixel, and the extension terminates when $\Delta s$ or $\Delta d$ is not less than the predefined $T_1$ or $T_2$, respectively. Therefore, the proposed adaptive region exhibits an advantage in capturing the shape and size of an irregular land cover patch, as shown in Fig. 1. This indicates that the adaptive technique with a fixed setting of $T_1$ and $T_2$ can describe the local boundary of different land cover patches with various shapes and sizes. Selecting potential samples within the adaptive region around the labeled pixel (green point) is beneficial for increasing the probability that the selected samples are correct. Thereafter, the adaptive region technique is applied to each pixel within the adaptive region of a labeled pixel, as shown in Fig. 2. The pixels within an adaptive region have different spectral values but are spectrally homogeneous because the adaptive region is extended gradually by comparing the central pixel and its eight neighboring pixels in terms of spectra.
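The region construction described above can be sketched as a simple region-growing routine. It is a minimal illustration, not the authors' implementation: it assumes a single-band gray image, compares each candidate with the pixel it is reached from through the absolute gray-value difference, and counts the seed towards the region size compared with the second threshold. The function name `adaptive_region` and its argument names are hypothetical.

```python
from collections import deque

import numpy as np


def adaptive_region(gray, seed, t1, t2):
    """Grow a region from `seed` over 8-connected neighbours.

    A neighbour joins while its gray-value difference to the pixel it is
    reached from (delta_s) is below t1 and the region size (delta_d, seed
    included) is below t2. Both readings of the text are assumptions.
    """
    height, width = gray.shape
    region = [seed]
    members = {seed}
    frontier = deque([seed])
    while frontier and len(region) < t2:
        i, j = frontier.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (di, dj) == (0, 0) or (ni, nj) in members:
                    continue
                if not (0 <= ni < height and 0 <= nj < width):
                    continue
                # Cast to float so uint8 images do not wrap around.
                delta_s = abs(float(gray[ni, nj]) - float(gray[i, j]))
                if delta_s < t1:
                    members.add((ni, nj))
                    region.append((ni, nj))
                    frontier.append((ni, nj))
                    if len(region) >= t2:
                        return region
    return region
```

Because growth stops at pixels whose gray value departs from that of their neighbour, the returned region follows the local patch boundary instead of a fixed window, which is the property the text attributes to the adaptive region.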
To further confirm which pixel is more suitable for expanding the training samples, the standard deviation (δ) between each pixel in the extended region and its central pixel is obtained on the basis of its corresponding adaptive region, as illustrated in Fig. 2. If the pixels within an adaptive region have a small δ, the homogeneity of the adaptive region is high. In Fig. 2, the number in each pixel represents the central pixel in the adaptive region, the green pixel symbolizes the initially or previously labeled sample point, and the yellow pixel is one within the extended adaptive region around the labeled sample. In accordance with Tobler's first law of geography (TFL) [24], everything is related to everything else, but near things are more related than distant things. The distribution of pixels in a remote sensing image also obeys this principle. Therefore, when the spectral variation around a pixel is low, the pixel is a good representative of its surrounding spectral signature. Here, for a gray image, the value of δ reflects the spectral variation of neighboring pixels. Thus, as implied by TFL, a large δ around a pixel indicates a low possibility of the pixel being a suitable sample, because δ measures the variation of the group of pixels that constitutes an adaptive region.

B. Expanding training samples with Box and Whisker Plots (BPs)
After acquiring the δ of each pixel within the extended adaptive region centered at a labeled pixel, BP is used to expand the original training sample and cover the intra-class spectral heterogeneity. BP has the advantage of identifying outliers in a given dataset without any distribution assumption [65]. Here, BP is used for the first time and integrated with the adaptive region for training sample collection in VHR image classification.
In this section, BP is adopted to cover more heterogeneous spectral signatures by expanding training samples. BP can achieve this objective because it can classify outliers into different levels with reference to three quartiles: lower (Q1), median (Q2), and upper (Q3). Because more than half of the information can be covered between Q1 and Q3, the three values corresponding to Q1, Q2, and Q3 are assigned as training samples. As shown in Fig. 3, 1) the range between Q1 and Q3 covers approximately 64.29% of the information; and 2) Q1, Q2, and Q3 signify the different values of a pixel group for an adaptive region. Therefore, selecting Q1, Q2, and Q3 as the expanding training sample points can not only expand the training sample but also cover a wider spectral heterogeneity of an entity.
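The quartile-based selection can be illustrated with a short sketch. It assumes that a δ value has already been computed for each candidate pixel of an adaptive region and that the pixel whose δ lies closest to each quartile is taken as the new sample; this nearest-value rule and the name `pick_quartile_samples` are assumptions made for illustration.

```python
import numpy as np


def pick_quartile_samples(pixels, deltas):
    """Return up to three pixels whose delta values are closest to the
    lower quartile (Q1), median (Q2) and upper quartile (Q3).

    `pixels` is a list of (row, col) tuples and `deltas` the matching
    list of standard deviations, one per candidate pixel.
    """
    deltas = np.asarray(deltas, dtype=float)
    quartiles = np.percentile(deltas, [25, 50, 75])
    chosen = []
    for q in quartiles:
        idx = int(np.argmin(np.abs(deltas - q)))
        # Skip duplicates when several quartiles map to the same pixel.
        if pixels[idx] not in chosen:
            chosen.append(pixels[idx])
    return chosen
```

`np.percentile` interpolates linearly between data points by default, so a quartile need not coincide with an observed δ; picking the nearest candidate keeps every new sample an actual pixel of the region.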
To clarify the expansion of an ITS based on our proposed approach, two GIF pictures from the Pavia University and ZH-3 satellite image datasets are presented in the supporting materials. When iteration = 0, the distribution and quantity of the ITS are presented as the legend in the pictures, and the training samples expand automatically around each point of the ITS as the iteration number increases.

IV. EXPERIMENTS
To verify the effectiveness and accuracy of the proposed approach, two experiments are conducted with the following objectives.
(1) Investigating superiority by comparison with the cognate methods: One experiment with two real VHR remote sensing images is designed to investigate the superiority of the proposed approach by comparing its results with those based on using ITS directly and two cognate methods [37], [47].
(2) Testing adaptability to different image features: One experiment with an image acquired from a camera mounted on an unmanned aerial vehicle (UAV) is presented with different spatial-spectral features, including RGFs [22], RFs [3], EMPs [16], and M EMPs [17].
To ensure the fairness of the comparisons among different methods, all the classification maps are obtained using the popular supervised classifier SVM. In addition to OA, AA, and Ka for quantitative evaluation and comparison, the standard deviation of user's accuracy (SDUA) is calculated to assess the ability of the proposed approach to balance user's accuracy among all classes, as defined in equation (4),
where SDUA is the standard deviation of user's accuracy, $T$ is, as mentioned before, the total number of classes of interest in an image scene, $UA_i$ is the user's accuracy for the $i$-th class, and $\overline{UA}$ is the mean value of user's accuracy over all the classes.
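As an illustration of this index, the sketch below computes SDUA from a list of per-class user's accuracies. The normalization is an assumption: the population form, sqrt((1/T) Σ (UA_i − mean UA)²), is used by default, and the hypothetical `ddof` argument switches to the sample form with T − 1 in the denominator if that is the intended definition.

```python
import numpy as np


def sdua(user_accuracies, ddof=0):
    """Standard deviation of the per-class user's accuracies.

    ddof=0 divides by T (population form, assumed here);
    ddof=1 divides by T - 1 (sample form).
    """
    ua = np.asarray(user_accuracies, dtype=float)
    return float(np.std(ua, ddof=ddof))
```

A value of zero means that every class has the same user's accuracy, so a smaller SDUA indicates a more balanced classification.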

A. Data description
Three real VHR remote sensing images acquired from different sensors are adopted for the following experiments. The details of the datasets are presented as follows.
(1) Pavia University image: This image was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS-03) sensor with a high spatial resolution of 1.3 m/pixel. The original image contains 115 bands with a spectral range of 0.43-0.86 µm. Here, bands 10, 27, and 46 were selected to signify the red, green, and blue bands, respectively. As shown in Fig. 4, the size of this area is 610 × 340 pixels, and nine classes are included in this image scene.
(2) ZH-3 image: This image was acquired by the QuickBird satellite, which has a spatial resolution of 0.62 m/pixel. The original image has five spectral bands. As shown in Fig. 5, seven classes are classified in this image scene. The size of this area is 943 × 926 pixels.
(3) UAV image scene: This image was acquired by a UAV equipped with a Canon 5D Mark II camera. The flight elevation when the image was acquired was approximately 100 m. The image has three bands and a resolution of 0.1 m/pixel. As shown in Fig. 6, the size of the image is 1400 × 1000 pixels, and seven classes are included in our experiments.
The first two images refer to an urban area, and the third image is a typical countryside area in China. The three images were acquired using different sensors, platforms, and resolutions. Therefore, achieving a land cover map with highly limited ITSs remains challenging.

B. Experiment setting
To achieve the experiment objectives, the experimental setting is as follows:
(1) Selection of ITSs for each experiment.
(3) Parameters of the compared sample extension methods.
The proposed approach focuses on improving overall classification accuracy and balancing the user's accuracy by expanding ITSs. To exclude the influence of other factors on accuracy, the parameters of the SVM and of the spatial-spectral feature extraction methods are fixed in each experiment. To ensure fairness in comparison, the parameters of the compared cognate approaches [37], [47] are optimized via the trial-and-error method.

C. Influence of parameters
As mentioned in Section III, three parameters ($T_1$, $T_2$, and $\varepsilon$) are fixed without hard-tuning for different images. Given that the proposed approach focuses on overall performance and user's accuracy, parameter $\varepsilon$ of the proposed approach governs the number of iterations and training samples. Thus, the relationship between training samples and OA, AA, and Ka for each iteration is investigated for each image.
As shown in Fig. 7, the first, second, and third rows present the relationship between the number of iterations and accuracy for the Pavia University, ZH-3, and UAV images, respectively. In the first row, OA, AA, and Ka increase with the number of iterations, and the iteration process is terminated at the ninth round when condition (1) is satisfied. In the second row, OA, AA, and Ka improve when the iteration goes from 0 to 1. Accuracy fluctuates without a clear trend when the iteration is larger than 2, and the iteration for the ZH-3 image is terminated at the 10th round in accordance with the predefined condition. Compared with the result based on ITSs, these observations clearly demonstrate that the accuracies (OA, AA, and Ka) can be improved sharply when the proposed approach is applied to expand the ITS.
The third row of Fig. 7 clearly illustrates the feasibility and adaptability of the proposed approach in terms of different spatial-spectral features. Accuracy improves gradually with the iteration of the proposed approach. When iteration = 5, accuracy reaches its peak and exhibits a horizontal trend with increasing iteration.

Fig. 7: Analysis of the influence of the iterations of the proposed approach on the three images in terms of OA, AA, and Ka. The first, second, and third rows represent the Pavia University, ZH-3 satellite, and UAV images, respectively.
The user's accuracy obtained by the proposed approach for the Pavia University image is presented in Fig. 8. This figure clearly shows the ability of the proposed approach to improve and balance the accuracy of most classes. For example, among the nine classes of interest, the accuracy of seven classes is improved. Compared with the accuracy based on ITSs, the improvements obtained with the expanded ITSs for these classes are 6.9% (asphalt), 17.17% (gravel), 34.4% (trees), 1.5% (painted metal sheets), 39.6% (bare soil), 16.8% (bitumen), and 9.8% (shadow). Although the accuracy of two classes (meadows and self-blocking bricks) is reduced, the losses for meadows and self-blocking bricks are 1.2% and 7.2%, respectively, which are relatively slight. Moreover, OA, AA, and Ka based on the proposed approach are higher than those based on ITSs.
Moreover, 1) the training sample for several classes may be maintained as the number of iterations increases, but the corresponding accuracy for these classes may still change, such as for asphalt (Fig. 8a), gravel (Fig. 8c), and trees (Fig. 8d); and 2) increasing training samples may not increase accuracy for several classes, such as meadows (Fig. 8b). These results demonstrate the ability and necessity of the proposed approach in balancing user's accuracy by automatically adjusting the quality of the training samples. This ability arises because different ratios of training samples among the classes of interest may produce different user's accuracies for a supervised classifier. A similar observation is presented in Fig. 9 for the ZH-3 image, where accuracy for six of the seven classes is improved, but accuracy for one class is slightly reduced.
To quantitatively evaluate the balancing ability of the proposed approach for the user's accuracy, SDUA is defined and adopted as the assessment index. As shown in Fig. 10, SDUA decreases with increasing number of iterations. This finding demonstrates that deviation in the user's accuracy decreases with increasing number of iterations, verifying the balancing ability for the user's accuracy of the proposed approach.
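As an illustration of how such an index can be computed, the following minimal sketch assumes that SDUA is the standard deviation of the per-class user's accuracy derived from a confusion matrix. The population form of the standard deviation, the row/column convention, the function names, and the two example matrices are assumptions made for illustration only and are not taken from the experiments.

```python
from statistics import pstdev


def users_accuracy(confusion):
    """Per-class user's accuracy (in %) from a square confusion matrix.

    Assumed convention: confusion[i][j] counts pixels of reference class i
    that were mapped to class j, so the user's accuracy of class j is the
    diagonal entry divided by the column total.
    """
    n = len(confusion)
    accuracies = []
    for j in range(n):
        column_total = sum(confusion[i][j] for i in range(n))
        # A class that is never mapped gets 0% here (an arbitrary choice).
        accuracies.append(100.0 * confusion[j][j] / column_total if column_total else 0.0)
    return accuracies


def sdua(confusion):
    """Standard deviation of the user's accuracy (population form, assumed)."""
    return pstdev(users_accuracy(confusion))


# Hypothetical 3-class confusion matrices, not data from the paper.
unbalanced = [[50, 10, 0],
              [30, 60, 10],
              [20, 30, 90]]
balanced = [[80, 10, 10],
            [10, 80, 10],
            [10, 10, 80]]

ua_unbalanced = users_accuracy(unbalanced)  # user's accuracy of 50%, 60%, and 90%
sdua_unbalanced = sdua(unbalanced)
sdua_balanced = sdua(balanced)              # all classes at 80%, so the deviation is 0
```

Under these assumptions, a smaller SDUA means that the user's accuracy values of the classes lie closer together, which is the sense in which a classification is described as balanced.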

D. Results
On the basis of the preceding parameter settings and analysis, the results of the two experiments are presented and analyzed in this section.
The first experiment is conducted on the Pavia University image using SVM. The result based on the proposed approach is compared with those using ITSs directly, Kang's method [37], and Richard's method [47]. Like the proposed algorithm, Kang's method [37] and Richard's method [47] also utilize contextual information to refine the training samples for improving classification accuracy. The quantitative results for each specific class are provided in Table I. The comparisons clearly demonstrate that the selected training sample refining methods achieve better accuracy than using ITSs directly. For the trees in the Pavia University image, Richard's method [47] improves accuracy from 40.5% to 50.8% and Kang's method [37] improves accuracy to 43.8%. The proposed approach improves the accuracy for trees to 74.9%. Compared with the other methods, the proposed approach yields the smallest SDUA. This result demonstrates that the deviation among the classes is the smallest, and the user's accuracy is better balanced in the proposed approach. Moreover, the table shows that the accuracy of nearly all the classes is evidently improved. Furthermore, the proposed approach achieves the best accuracy in terms of OA, AA, and Ka compared with using ITSs directly, Richard's method [47], and Kang's method [37]. The visual performance in Fig. 11 further verifies the effectiveness of the proposed approach. The proposed approach produces the least salt-and-pepper noise in the result compared with using ITSs directly, Richard's method [47], and Kang's method [37].
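For readers who wish to reproduce summary indices of this kind, the sketch below computes OA, AA, and Ka from a confusion matrix using their common textbook definitions. It assumes that AA is the mean of the per-class accuracies taken over the reference classes and that Ka is Cohen's kappa; the function name and the example matrix are hypothetical and do not correspond to Table I.

```python
def overall_metrics(confusion):
    """Return (OA, AA, Ka) as fractions from a square confusion matrix.

    Assumed convention: confusion[i][j] counts pixels of reference class i
    mapped to class j, and every reference class has at least one pixel.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_totals = [sum(confusion[i]) for i in range(n)]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]

    # Overall accuracy: fraction of correctly classified pixels.
    oa = sum(confusion[i][i] for i in range(n)) / total

    # Average accuracy: mean of the per-class accuracies (assumed definition).
    aa = sum(confusion[i][i] / row_totals[i] for i in range(n)) / n

    # Cohen's kappa: agreement corrected for the agreement expected by chance.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    ka = (oa - expected) / (1 - expected)
    return oa, aa, ka


# Hypothetical 3-class confusion matrix, not data from the paper.
example = [[80, 10, 10],
           [10, 80, 10],
           [10, 10, 80]]
oa, aa, ka = overall_metrics(example)
```

For this hypothetical matrix, OA and AA are both approximately 0.8 and Ka is approximately 0.7, because one third of the agreement would be expected by chance.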
The first experiment, when conducted on the ZH-3 satellite image, further demonstrates the feasibility and accuracy of the proposed approach. As shown in Table II, the user's accuracy of nearly all the classes is improved by the proposed approach compared with using ITSs directly, Richard's method [47], and Kang's method [37]. The proposed approach and Richard's method [47] achieve similar SDUA, but the former obtains better accuracy in terms of OA, AA, and Ka. Thus, the application of the proposed approach on the ZH-3 satellite image further confirms its feasibility and accuracy in improving OA, AA, and Ka and in balancing the user's accuracy. The visual presentation of these comparisons is provided in Fig. 12.
The second experiment is performed on a UAV image with a spatial resolution of 0.1 m/pixel to test the feasibility and adaptability of the proposed approach with regard to different spatial-spectral features. Table III provides the classification accuracy based on ITSs and our proposed approach while using a specific spatial-spectral feature. The comparison in the table indicates that, for each spatial-spectral feature, ITSs processed using our proposed approach achieve higher accuracy than ITSs used directly. Furthermore, SDUA shows that the proposed approach reduces deviation among different classes in terms of the user's accuracy. For example, OA for the RGF [22] feature is improved from 79.92% to 93.66% when the proposed approach is applied to process ITSs. Simultaneously, SDUA is reduced from 17.54 to 8.14. When the classification maps obtained using ITSs directly are compared with those obtained by applying the proposed approach to process ITSs for each spatial-spectral feature, the maps based on our proposed approach are visibly improved and contain less noise, as highlighted by the red rectangle in Fig. 13.
Figure caption (fragment): Richards' method [47], (c) Kang's method [37], (d) proposed approach, (e) ground reference map, and (f) legend.
The results and comparisons of the two experiments on three real VHR remote sensing images show the following findings. 1) The proposed approach can improve the classification accuracy of VHR remote sensing images in terms of OA, AA, and Ka by expanding ITSs. Compared with the cognate methods, the proposed approach achieves better accuracy in each experiment. 2) The proposed approach can improve the user's accuracy of nearly all classes and balance the deviation among the classes. 3) Although the proposed approach relies on three parameters (T1, T2, and ε), it can achieve the required accuracy for the different images without parameter hard-tuning. 4) The initialization of the proposed approach requires only minimal samples, such as fewer than 20 points for each class in the three images. Moreover, satisfactory accuracy can be achieved using these extremely few initial samples.

V. CONCLUSION
In this paper, a new training sample enrichment approach is proposed to improve classification accuracy and balance the deviation of the user's accuracy. To achieve these objectives, the proposed method assumes that the spectral feature in a land cover patch is typically heterogeneous in a VHR remote sensing image and attempts to enrich and adjust the training samples for each class during the supervised classification process. The proposed approach is implemented with a very small ITS and without parameter hard-tuning, but it achieves higher accuracy compared with state-of-the-art cognate methods. In addition, the proposed approach exhibits an advantage in improving the user's accuracy of nearly all the classes and balancing the deviation in the user's accuracy among classes. Thus, the proposed approach has the potential for wide practical use.
Figure caption (fragment): [22], RF [3], EMPs [16], and M EMP [17] spatial-spectral features, respectively, using ITSs directly; (b), (d), (f), and (h) are the results based on the RGF, RF, EMP, and M EMP spatial-spectral features, respectively, using ITSs processed with our proposed approach.
The effectiveness and adaptability of the proposed approach are validated on three VHR remote sensing images acquired by different sensors and platforms at different resolutions. The study areas are typical urban and rural scenes with various land cover types, including roads, trees, meadows, and buildings. Quantitative evaluation and visual comparisons based on the images indicate that the proposed approach can achieve higher accuracy and balance the user's accuracy for different classes.
In summary, the proposed approach is a promising algorithm for VHR remote sensing image classification. From the methodological perspective, however, the scientific concept of the proposed approach still requires comprehensive investigation. In our future study, we plan to upgrade our computing resources and apply the approach to large areas and other types of VHR remote sensing images to further test its robustness and adaptability. In addition, convolutional multi-scale features will be adopted instead of the single spectral feature used in the proposed approach to improve its robustness and effectiveness.

Guangfei Li is currently pursuing a master's degree in computer science at Xi'an University of Technology, Shaanxi, China. He is interested in spatial-spectral feature extraction, pattern recognition, ground target detection, and land cover/land use change detection through remote sensing images with high or very high spatial resolution (including satellite imagery and aerial images).
Zhenong Jin received his B.S. degree from Peking University and his Ph.D. from Purdue University. Currently, he is an Assistant Professor in the Department of Bioproducts and Biosystems Engineering at University of Minnesota-Twin Cities. As a broadly trained agroecologist, he leverages statistical and process-based models, remote sensing, and machine learning approaches to advance the science that enhances the adaptability and sustainability of crop production, food security, and ecosystem functioning in the context of climate change and human interventions. His past and current research mainly focuses on: (i) mapping agriculture features using high-resolution satellite imagery; (ii) forecasting crop yields for a range of applications; (iii) integrating crop models with remote sensing for precision management; and (iv) understanding the impacts of climate change on agroecosystems. Before joining University of Minnesota, he worked as a Postdoctoral Researcher at Stanford University and as the Lead Crop Scientist at AtlasAI P.B.C., where he directed the development of 10m-resolution maps of crop types and yield in East Africa.

Giles M. Foody (M'01-SM'10-F'13) earned the B.Sc. and Ph.D. degrees from The University of Sheffield, Sheffield, U.K., in 1983 and 1986, respectively. He is currently a Professor of Geographical Information Science with the School of Geography, The University of Nottingham, Nottingham, U.K. His research interests lie at the interface between remote sensing, ecology, and informatics with a core focus on image classification for land cover mapping and monitoring applications at scales ranging from the subpixel to global. He has authored or coauthored 9 books and more than 200 refereed journal articles. Dr. Foody is currently serving as the Founding Editor-in-Chief of Remote Sensing Letters. He holds additional editorial roles on over ten other journals, including Landscape Ecology, International Journal of Remote Sensing, Remote Sensing of Environment, Geocarto International, Remote Sensing, and the International Journal of Applied Earth Observation and Geoinformation.