Measuring River Wetted Width From Remotely Sensed Imagery at the Subpixel Scale With a Deep Convolutional Neural Network

River wetted width (RWW) is an important variable in the study of river hydrological and biogeochemical processes. Presently, RWW is often measured from remotely sensed imagery, and the accuracy of RWW estimation is typically low when coarse spatial resolution imagery is used because river boundaries often run through pixels that represent a region that is a mixture of water and land. Thus, when conventional hard classification methods are used in the estimation of RWW, the mixed pixel problem can become a large source of error. To address this problem, this paper proposes a novel approach to measure RWW at the subpixel scale. Spectral unmixing is first applied to the imagery to obtain a water fraction image that indicates the proportional coverage of water in image pixels. A fine spatial resolution river map from which RWW may be estimated is then produced from the water fraction image by superresolution mapping (SRM). In the SRM analysis, a deep convolutional neural network is used to eliminate the negative effects of water fraction errors and reconstruct the geographical distribution of water. The proposed approach is assessed in two experiments, with the results demonstrating that the convolutional neural network‐based SRM model can effectively estimate subpixel scale details of rivers and that the accuracy of RWW estimation is substantially higher than that obtained from the use of a conventional hard image classification. The improvement shows that the proposed method has great potential to derive more accurate RWW values from remotely sensed imagery.


Introduction
Rivers and streams are key routes for water movement and play a major role in local- to global-scale hydrological and biogeochemical cycles (Allen & Pavelsky, 2018; Downing et al., 2012). River wetted width (RWW) is a fundamental variable (Song et al., 2018) influencing, among other things, the water surface area that controls the flux of gases in the global carbon cycle (Butman & Raymond, 2011; Raymond et al., 2013). RWW is also an important input to scientific models for the estimation of properties such as river discharge (Azzari & Lobell, 2017; Benstead & Leigh, 2012; Bjerklie et al., 2005; Gleason & Smith, 2014; Sichangi et al., 2016; Song et al., 2018; Sun et al., 2018) as well as the simulation of river biogeochemical processes (Gomez-Velez & Harvey, 2014). Although RWW can be measured in the field, such activity is often time-consuming and can be challenging for some rivers. Recently, the estimation of RWW from remotely sensed imagery has emerged as a viable approach (Stumpf et al., 2016), and compared with field-based methods, remote sensing offers an inexpensive means for RWW estimation that is efficient, especially for the study of geographically large areas (Allen & Pavelsky, 2015; Isikdogan et al., 2017; Miller et al., 2014; Priestnall & Aplin, 2006; Zeng et al., 2015).
Remote sensing-based approaches to RWW estimation typically require the production of a map depicting rivers by manual digitization or automatic image processing methods (Pavelsky & Smith, 2008). Once produced, numerous estimates of RWW can be obtained for the area mapped (Allen & Pavelsky, 2015; Miller et al., 2014; O'Loughlin et al., 2013; Rawlins et al., 2014). A variety of remotely sensed data sets (images) is available for river mapping (Priestnall & Aplin, 2006). These images have different temporal and spatial resolutions and are suitable for a range of application requirements. For example, the MODerate-resolution Imaging Spectroradiometer (MODIS) provides imagery with a finest spatial resolution of 250 m and a daily repeat frequency, which may be attractive for the routine monitoring of large rivers at regional to global scales (Azzari & Lobell, 2017; Tarpanelli et al., 2013). Medium spatial resolution multispectral imagery, such as that acquired by the Landsat series of satellites, has a spatial resolution of 30 m, is acquired approximately every 16 days, and is widely used in regional scale studies (Allen & Pavelsky, 2015; O'Loughlin et al., 2013). Fine spatial resolution satellite images (Fisher et al., 2013) and aerial sensor imagery (King et al., 2018; Rawlins et al., 2014) can be used to estimate RWW for narrow rivers.
Compared with other land cover classes, a crucial feature to be considered in measuring RWW from remotely sensed imagery is that the area covered by river water bodies often fluctuates rapidly within a short time period (Benstead & Leigh, 2012; Stumpf et al., 2016). The rapid change of RWW implies a certain amount of difficulty in measuring RWW in a timely and accurate manner, due to the trade-off between the spatial and temporal resolution of satellite remotely sensed imagery (Muad & Foody, 2012). In principle, one of the most crucial factors that influences the accuracy of RWW estimation is the spatial resolution of the remotely sensed imagery used: RWW estimates are often most accurate when fine spatial resolution imagery is used. In practice, however, fine spatial resolution remotely sensed images tend to have a relatively small area of coverage, low temporal resolution, and high price, which greatly limits their utility for RWW estimation. In contrast, coarser spatial resolution remotely sensed images often cover large areas with a high temporal resolution and are inexpensive or freely available, and thus, such imagery is attractive for applications focused on the rapid fluctuation of RWW over large areas. However, it is often difficult to estimate RWW accurately with coarse spatial resolution images. The central reason for this is that the river's boundaries typically run through image pixels rather than along their edges, and the river cannot be accurately represented in river maps produced using conventional (hard) image classifications (Foody et al., 2005).
The magnitude of the aforementioned problem is a function of the river width relative to the pixel size, being greatest for narrow rivers with coarse spatial resolution images. As an indication of the magnitude of the problem, in the North American River Width data set produced from Landsat sensor images, the root mean square error (RMSE) is estimated to be 38 m, which is close to the minimum theoretical value obtainable from 30-m resolution Landsat imagery (Allen & Pavelsky, 2015). Moreover, in their results, the smallest RWW values are the same as the size of an image pixel (30 m), and the scatter of points around the regression line is greater for rivers whose widths are less than 100 m, which is mainly caused by the problem of mixed pixels along the river boundaries. This phenomenon poses a real problem for the study of narrower rivers, which comprise a significant proportion of the total river network. Headwaters illustrate this well: they are the finest branches of river networks and have been estimated to comprise 70% of the global river network in terms of length (Lowe & Likens, 2005) and 15% of the river surface area (Raymond et al., 2013; Wallin et al., 2011). Moreover, processes such as CO2 evasion occur at a higher rate in narrower rivers due to connectivity with carbon-rich terrain (Benstead & Leigh, 2012) and increased hydrodynamic turbulence (Aufdenkampe et al., 2011). It is thus imperative to address the uncertainty in measuring RWW for rivers whose width is narrower than the spatial resolution of the remotely sensed imagery.
In order to measure RWW from remotely sensed imagery, a binary river map always needs to be produced first. Standard hard classification techniques treat each pixel as belonging solely to a single land cover class. Recognizing that pixels that lie along the edge of the river are typically of mixed class composition in coarse-resolution imagery, river boundaries generally cannot be appropriately represented in the binary river map produced by hard classification. Producing a fine-resolution river map from coarse-resolution imagery is expected to increase the accuracy of RWW estimation. This can be achieved via a subpixel scale analysis, including spectral unmixing and superresolution mapping (SRM). Spectral unmixing aims to decompose the spectral signature of mixed pixels into a set of endmembers and their corresponding fraction images. Endmembers are pure spectra corresponding to each of the land cover classes, and fraction images represent the proportional areas of different land cover classes within mixed pixels (Foody & Cox, 1994; Settle & Drake, 1993). SRM is the postprocessing stage of spectral unmixing, which aims to determine the spatial distribution of different land covers within coarse-resolution pixels and generate a fine-resolution land cover map, using the coarse-resolution fraction images produced by spectral unmixing as the input. Over the past decades, a range of SRM algorithms have been proposed (Atkinson, 2009; Ling et al., 2010; Tatem et al., 2001) and successfully applied in various situations, including water body mapping. For example, Foody et al. (2005) applied SRM in shoreline mapping, achieving an RMSE of 2.25 m when using a 20-m spatial resolution image as input. Muad and Foody (2012) fused multiple MODIS images to map lake boundaries. Some researchers have incorporated fine-resolution digital elevation model (DEM) data sets into SRM to further increase accuracy (Huang et al., 2014; Li et al., 2013; Ling et al., 2008).
Although SRM has been found to be effective at increasing the accuracy with which water bodies can be represented based on remotely sensed imagery, current popular SRM methods might not be able to produce accurate river maps. The objective of SRM is to transform the coarse-resolution fraction images into a fine-resolution map, possibly with the aid of prior information on the land cover mosaic (Muad & Foody, 2012). The resultant fine-resolution map is thus dependent on the quality of the fraction images and, if used, the information on the spatial pattern of land cover; that is, a detailed, accurate representation of the geographical distribution of the land cover classes at a subpixel scale. In practice, however, the input fraction images include errors that will degrade the SRM analysis and its output. Moreover, the spatial dependence model, which aims to make the resultant fine-resolution land cover map have maximal spatial dependence in the SRM analysis, may only be effective for relatively large water bodies and will tend to overaggregate land cover patches, making it unsuitable for describing the spatial pattern of land cover mosaics containing rivers, especially smaller streams. Given these two concerns, a more effective SRM approach is required to produce fine-resolution river maps that might be used for accurate RWW estimation to support rapid and global measurement of RWW.
Recently, deep learning techniques have been successfully applied in a wide range of computer vision, remote sensing image analysis, and water resource analysis tasks (Shen, 2018; Zhang et al., 2016). Deep neural networks, such as the convolutional neural network (CNN), have been proposed for image denoising (Zhang et al., 2017), single image superresolution (Dong et al., 2016; Kim et al., 2016), and generating high-resolution climate change projections (Vandal et al., 2017). Deep neural networks can produce results with much higher accuracies than traditional approaches, owing to their powerful abilities of end-to-end nonlinear mapping and automated feature extraction from images. Consequently, deep neural networks should provide the opportunity for effective SRM analysis, by addressing two fundamental limitations encountered with contemporary SRM methods. First, errors in the fraction images arising from the initial spectral unmixing analysis can be considered to be noise that impacts negatively on the SRM analysis that is based on the fraction images. The CNN models used for image denoising can reduce this problem. Second, image superresolution, which aims to reconstruct a fine spatial resolution image from a coarse-resolution image, is similar to the SRM analysis, as the crucial problem for both is establishing the mapping relations between fine- and coarse-resolution images. CNN models have shown their powerful abilities in image superresolution (Dong et al., 2016; Kim et al., 2016) and should be introduced to the SRM analysis, in order to address the limitation of existing models in providing an accurate representation of land cover distribution at a subpixel scale. As such, deep neural networks should provide a means to address the main limitations of the SRM analysis and aid accurate RWW estimation.
This study proposes an approach to increase the accuracy of RWW estimation from coarse spatial resolution remotely sensed imagery. Specifically, this study aims to build a novel SRM algorithm based on the concept of deep learning that addresses the two crucial problems for accurate subpixel RWW estimation: generating an accurate fine-resolution river map and remaining robust to fraction errors.

Methods
The proposed approach comprises three main processing steps ( Figure 1). First, a coarse-resolution water fraction image is estimated from the observed remotely sensed imagery by spectral unmixing. Second, a fine-resolution river map is produced from the coarse-resolution water fraction image using a CNN-based SRM analysis. Third, RWW is estimated from the resultant fine-resolution river map, and its accuracy is assessed.

Spectral Unmixing
Conventionally, many image analyses have assumed that the region represented by a pixel is covered uniformly by a single class. In practice, however, mixed pixels, which are composed of more than one class, are common, especially in coarse-resolution imagery (Foody, 1996). An estimate of the class composition of a mixed pixel may be obtained by spectral unmixing (Foody & Cox, 1994; Settle & Drake, 1993). A popular spectral unmixing analysis is based on the linear mixture model, in which the spectrum of a coarse-resolution pixel is modeled as a linear sum of each endmember (class) spectrum weighted by its fractional cover (Settle & Drake, 1993), and can be represented as

$$\rho'_{\lambda} = \sum_{i=1}^{N} f_i \, \rho_{i\lambda} + \varepsilon_{\lambda}$$

where $\rho'_{\lambda}$ is the reflectance associated with a pixel at spectral band $\lambda$, N is the number of endmembers, $\rho_{i\lambda}$ is the reflectance and $f_i$ the fractional cover of the ith endmember, and $\varepsilon_{\lambda}$ is the residual term. The RMSE is used to determine the model fit and indicate accuracy:

$$\mathrm{RMSE} = \sqrt{\frac{1}{B} \sum_{\lambda=1}^{B} \varepsilon_{\lambda}^{2}}$$

where B is the number of bands.
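As an illustrative sketch only (not the software used in this study), the linear mixture model above can be inverted per pixel with non-negative least squares; a heavily weighted extra row approximates the sum-to-one constraint on the fractions, and the function name and weight `w` are choices made for this example:

```python
import numpy as np
from scipy.optimize import nnls

def unmix_pixel(pixel, endmembers, w=1000.0):
    """Estimate endmember fractions for one pixel via the linear mixture model.

    pixel:      (B,) reflectance vector for a coarse-resolution pixel.
    endmembers: (B, N) matrix whose columns are endmember spectra.
    w:          weight on the appended sum-to-one row (illustrative choice).
    """
    B, N = endmembers.shape
    # Append a heavily weighted row of ones so that sum(fractions) ~= 1,
    # while nnls itself enforces non-negative fractions.
    A = np.vstack([endmembers, w * np.ones(N)])
    b = np.concatenate([pixel, [w]])
    fractions, _ = nnls(A, b)
    residual = pixel - endmembers @ fractions
    rmse = float(np.sqrt(np.mean(residual ** 2)))  # model-fit RMSE over B bands
    return fractions, rmse
```

With well-separated endmembers, the recovered fractions closely match the true proportional coverage, and the RMSE reported corresponds to the band-wise residual term of the mixture model.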
The linear mixture model can often provide a good approximation of the physical process underlying the observations, but a key limitation is that the spectral variability of land cover classes is not considered. Consequently, the observed spectral response for a given pixel can be associated with a variety of class compositions. Multiple endmember spectral mixture analysis (MESMA) is an extension of the basic linear mixture model and allows the endmembers in each coarse spatial resolution pixel to vary in type and number (Dennison & Roberts, 2003;Roberts et al., 1998). MESMA typically comprises two main steps. First, a spectral library that includes the optimal endmembers is constructed. Second, the coarse-resolution pixels are unmixed using the spectral library to produce fraction images.
10.1029/2018WR024136
Water Resources Research

Here the approaches in the Visualization and Image Processing for Environmental Research Tools software package (Roberts et al., 2007) are used to build the spectral library. The minimum noise fraction transformation is first applied to the original multispectral image, and the purest pixels are found using the pixel purity index (Boardman, 1994). The spectra of different endmembers are then selected visually from these purest pixels, and the selected endmembers are further refined using the count-based endmember selection index (Dennison & Roberts, 2003). Once the appropriate endmembers have been identified, all possible endmember combinations are analyzed for each pixel, and the endmember combination with the lowest RMSE value in the spectral unmixing analysis is identified.
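A compact sketch of the MESMA selection step described above, using an unconstrained least-squares fit for brevity (the function and library names are illustrative, not taken from the original software):

```python
import numpy as np
from itertools import combinations, product

def unmix(pixel, E):
    """Unconstrained least-squares unmixing; returns fractions and fit RMSE."""
    f, *_ = np.linalg.lstsq(E, pixel, rcond=None)
    residual = pixel - E @ f
    return f, float(np.sqrt(np.mean(residual ** 2)))

def mesma_pixel(pixel, library):
    """Try every endmember combination and keep the one with the lowest RMSE.

    library: dict mapping class name -> list of candidate spectra, each (B,).
    """
    best = None
    classes = list(library)
    for n in range(2, len(classes) + 1):          # 2-, 3-, ... endmember models
        for combo in combinations(classes, n):
            for spectra in product(*(library[c] for c in combo)):
                E = np.column_stack(spectra)
                f, rmse = unmix(pixel, E)
                if best is None or rmse < best[0]:
                    best = (rmse, combo, f)
    return best  # (lowest RMSE, winning class combination, fractions)
```

In a full MESMA implementation the per-model fit would also enforce the non-negativity and sum-to-one constraints on the fractions.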

SRM by Deep CNN
With the coarse-resolution water fraction image produced by MESMA as input, the SRM analysis is applied to produce a fine-resolution land cover map that includes water and thereby depicts rivers. Suppose that the original coarse-resolution water fraction image has a spatial size of M1 × M2 pixels. Setting the zoom factor to z, SRM aims to generate a fine-resolution river map, which contains (z × M1) × (z × M2) pixels. Each fine-resolution pixel is considered to be pure (i.e., representing an area with a single class) and is assigned as river or nonriver. Based on recent advances with deep learning approaches, a CNN model is used in the SRM analysis. The whole procedure is shown in Figure 2 and comprises three parts: training sample generation, CNN model training, and finally, application of the trained CNN model to superresolution river mapping.

Generating Training Samples
The CNN model used in the SRM analysis aims to model the nonlinear relationship between the coarse-resolution water fraction image and the fine-resolution river map and needs to be trained with training samples before its application. Training samples, each of which consists of a coarse-resolution water fraction image and its corresponding fine-resolution river map, are critical to the successful use of the CNN model, and one of two methods can be used to form the training set. The first is based on the collection of real coarse-resolution remotely sensed imagery and the corresponding fine-resolution river maps. This method is intuitive but is often difficult to apply because there are many different satellite products with different spectral and spatial properties, and training samples would have to be collected for each of them. Moreover, it is often difficult to ensure that the coarse-resolution images match in time with the fine-resolution river maps, a potentially major problem if the water surface area changes rapidly, and spatial misregistration error is also common. The second approach to generate training samples involves using fine-resolution remote sensing images that can be degraded to a coarse resolution. With this method, the problem of temporal consistency and geo-registration is solved, but the number of bands provided by fine-resolution remotely sensed images is often low, which may limit spectral unmixing.
Given the limitations of these two approaches, the training samples in this study are simulated from collected fine-resolution river maps, which can be produced from existing fine-resolution remotely sensed imagery. An independent CNN model should be trained for each predefined zoom factor. During the simulation, a fine-resolution river map is initially used to generate its corresponding coarse-resolution water fraction image by spatial degradation. The area proportions of fine-resolution river pixels within each coarse-resolution pixel are calculated and assigned as the water fraction values according to the zoom factor. The resulting simulated water fraction values are error-free, but this is unrealistic because fraction errors are unavoidable during spectral unmixing. The CNN model should be trained not only to map the nonlinear relationship between the coarse-resolution water fraction image and the fine-resolution river map but also to remove the errors in the coarse-resolution water fraction image. Therefore, random noise is added to simulate the fraction errors. As a result, a fine-resolution river map and a simulated noise-added coarse-resolution fraction image together constitute one training sample. This method is flexible as only fine-resolution river maps are used, and one fine-resolution river map can be used to simulate different coarse-resolution water fraction images, since noise is randomly added. The training samples can then be easily expanded with different amounts of noise to provide more useful information for the training of the CNN model.
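The degradation-plus-noise simulation described above can be sketched as follows (numpy only; the block averaging assumes map dimensions divisible by the zoom factor, and the Gaussian noise level is an illustrative choice):

```python
import numpy as np

def simulate_training_pair(river_map, z, noise_sd=0.05, rng=None):
    """Simulate a coarse water fraction image from a fine binary river map.

    river_map: (H, W) binary array (1 = river), with H and W divisible by z.
    z:         zoom factor relating fine and coarse resolutions.
    noise_sd:  standard deviation of the added fraction noise (illustrative).
    """
    rng = np.random.default_rng(rng)
    H, W = river_map.shape
    # Block-average z x z fine pixels to get the true coarse water fraction.
    fractions = river_map.reshape(H // z, z, W // z, z).mean(axis=(1, 3))
    # Add random noise to mimic spectral unmixing errors, clipped to [0, 1].
    noisy = np.clip(fractions + rng.normal(0.0, noise_sd, fractions.shape), 0.0, 1.0)
    return noisy, river_map  # one training sample: (input fractions, target map)
```

Because the noise is drawn randomly, one fine-resolution river map can yield many distinct simulated fraction images, which is how the training set is expanded.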

Training the CNN Model
All simulated fine-resolution river maps and the corresponding noise-added coarse-resolution water fraction images are then used as samples to train the CNN model. In this research, a very deep CNN (Kim et al., 2016) is used. The CNN used comprises 20 layers. The image input layer is followed by the first 2-D convolutional layer that contains 64 filters of size 3 × 3 and a rectified linear unit layer. The second to the penultimate layers of the CNN are 18 alternating convolutional and rectified linear unit layers. Every convolutional layer contains 64 filters of size 3 × 3 × 64. The last layer consists of a single filter of size 3 × 3 × 64. Zero padding is used before convolutions to ensure the input and output images have the same size.
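The layer structure just described matches the VDSR architecture of Kim et al. (2016); a PyTorch sketch of it (PyTorch being our choice for illustration, not necessarily the authors' framework) might look like:

```python
import torch
import torch.nn as nn

class VDSRResidual(nn.Module):
    """20-convolution VDSR-style network that predicts a residual image."""
    def __init__(self, depth=20, width=64):
        super().__init__()
        # First layer: 64 filters of size 3 x 3 on the single-band input.
        layers = [nn.Conv2d(1, width, 3, padding=1), nn.ReLU(inplace=True)]
        # Middle layers: alternating 3 x 3 x 64 convolutions and ReLUs.
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True)]
        # Last layer: a single 3 x 3 x 64 filter producing the residual image.
        layers += [nn.Conv2d(width, 1, 3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # padding=1 plays the role of the zero padding mentioned in the text,
        # so input and output share the same spatial dimensions.
        return self.net(x)
```

Because every convolution preserves the spatial size, a fraction image of any size can be passed through the network, and the predicted residual can be added directly to the interpolated input.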
According to the structure of the CNN model, the input is not the coarse-resolution water fraction image in training samples but the interpolated image matching the size of the fine-resolution river map. A standard method such as bi-cubic interpolation with the appropriate zoom factor may be used to generate this input data set. The output of the CNN model is not the fine-resolution river map in training samples but the residual image, which represents the difference between the interpolated image and the fine-resolution river map. Therefore, the CNN model is not used to directly model the nonlinear relationship between the coarse-resolution water fraction image and the fine-resolution river map but the relationship between the interpolated image and the residual image, which has been shown to be an effective technique for image superresolution (Kim et al., 2016).
With this technique, however, the residual image cannot be calculated directly because the fine-resolution river map has discrete category labels. To solve this problem, a relaxation procedure may be applied to alter the labels of the river map to generate the water indicator image. If a fine-resolution pixel in the river map is labeled as river, it is entirely covered by water, and the value of 1 is assigned to it in the water indicator image; otherwise, it is covered by land, and the value of 0 is assigned to it. Therefore, pixels in the water indicator image take on numerical values in the range [0, 1] reflecting the proportional coverage of water in the area represented by a pixel. The residual image can then be calculated by subtracting the interpolated image from the water indicator image.
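Under this relaxation, the CNN input and its residual target can be formed as below (a numpy/scipy sketch; `scipy.ndimage.zoom` with `order=3` serves as a cubic stand-in for bicubic interpolation, and the function name is illustrative):

```python
import numpy as np
from scipy.ndimage import zoom

def make_training_target(river_map, coarse_fraction, z):
    """Build one (input, target) pair for residual training.

    river_map:       (H, W) fine-resolution binary river map.
    coarse_fraction: (H/z, W/z) coarse water fraction image.
    z:               zoom factor.
    """
    indicator = river_map.astype(float)           # 1 = water, 0 = land
    interp = zoom(coarse_fraction, z, order=3)    # cubic interpolation to fine grid
    residual = indicator - interp                 # what the CNN learns to predict
    return interp, residual
```

The residual thus folds together both the detail lost by interpolation and any simulated fraction noise, which is why the trained network both sharpens and denoises.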
Using interpolated images and corresponding residual images produced from all fine-resolution river map training samples, the CNN model can then be trained. Given that the residual image not only includes the errors caused by the interpolation process but also includes those caused by simulated fraction errors, the trained CNN model is intended to simultaneously recover the missing fine-resolution information about the spatial pattern of rivers and also remove noise due to fraction errors.

Superresolution River Mapping
Once the CNN model has been trained, it can be used for superresolution river mapping. Given that the trained CNN model has learned the nonlinear relationship between the coarse-resolution fraction image and the fine-resolution river map from the training samples, it is expected that the unknown fine-resolution river map can be reconstructed from the real coarse-resolution water fraction image produced by spectral unmixing, using the information encoded in the CNN model.
During this procedure, the coarse-resolution water fraction image produced by spectral unmixing is first interpolated to form a fine-resolution image. Then, a residual image is produced from the interpolated image using the CNN model. By adding the interpolated image and the residual image, a fine-resolution water indicator image is produced. Finally, a threshold is applied to generate the resultant fine-resolution river map from the fine-resolution indicator image. Here the threshold value is set to 0.5, meaning that if the value of a fine-resolution pixel in the fine-resolution indicator image exceeds 0.5, it is classified as a river pixel.
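Putting these inference steps together, a minimal sketch (again using cubic `scipy.ndimage.zoom` in place of bicubic interpolation, with `predict_residual` standing in for the trained CNN):

```python
import numpy as np
from scipy.ndimage import zoom

def srm_infer(coarse_fraction, z, predict_residual, threshold=0.5):
    """Reconstruct a fine-resolution binary river map from a coarse fraction image."""
    interp = zoom(coarse_fraction, z, order=3)       # interpolate to the fine grid
    indicator = interp + predict_residual(interp)    # fine-resolution water indicator
    return (indicator > threshold).astype(np.uint8)  # 1 = river pixel, 0 = nonriver
```

With threshold = 0.5, a fine-resolution pixel is labeled river exactly when its indicator value exceeds 0.5, as described above.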

RWW Estimation and Accuracy Assessment
RWW values are estimated automatically from the fine-resolution river map using RivWidth software (Pavelsky & Smith, 2008). To assess the accuracy of RWW, sample points are randomly selected from the RWW map and compared with those in the fine-resolution reference river map. In order to reduce the impact of mismatches between the river center lines extracted by RivWidth from different river maps, the average RWW value in the area covered by 5 × 5 pixels in the original coarse-resolution remotely sensed images is used for each sample point. The accuracy of the RWW estimates is assessed using the RMSE, the average error (AE), and the percentage error (PE), defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(\widehat{w}_k - w_k^{\mathrm{ref}}\right)^{2}}$$

$$\mathrm{AE} = \frac{1}{N}\sum_{k=1}^{N}\left(\widehat{w}_k - w_k^{\mathrm{ref}}\right)$$

$$\mathrm{PE} = \frac{\sum_{k=1}^{N}\left(\widehat{w}_k - w_k^{\mathrm{ref}}\right)}{\sum_{k=1}^{N} w_k^{\mathrm{ref}}} \times 100\%$$

where $\widehat{w}_k$ is the RWW value estimated with the proposed method for the kth sample, $w_k^{\mathrm{ref}}$ is the reference RWW value, and N is the number of samples.
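A numpy sketch of the three accuracy measures (the normalization of PE by the total reference width is an assumption made for this example):

```python
import numpy as np

def rww_accuracy(estimated, reference):
    """Compute RMSE, average error (AE), and percentage error (PE) of RWW estimates."""
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(reference, dtype=float)
    diff = est - ref
    rmse = float(np.sqrt(np.mean(diff ** 2)))   # root mean square error
    ae = float(np.mean(diff))                   # signed average error
    pe = float(100.0 * diff.sum() / ref.sum())  # percentage error (assumed form)
    return rmse, ae, pe
```

Note that AE and PE are signed, so a negative value indicates net underestimation of RWW, while RMSE is always non-negative.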
To provide benchmarks to aid the evaluation of the proposed approach, a pixel-based hard classification and three other popular SRM algorithms, pixel swapping (PS; Atkinson, 2005), the Hopfield Neural Network (HNN; Tatem et al., 2001), and the algorithm based on A+ (Timofte et al., 2014), are also applied for comparison. Detailed descriptions of these three SRM algorithms are provided in Supporting Information S1.

Experiments and Results
The potential of the proposed method was explored in two experiments. The first experiment used a MODIS data set, and the second experiment used a Landsat data set. The imagery was obtained from Earth Explorer (http://earthexplorer.usgs.gov), and the reference fine-resolution river map with a spatial resolution of 30 m was produced from a Landsat OLI image with a threshold approach based on the Modified Normalized Difference Water Index (Xu, 2006; Figure 3c).

Model Implementation
Using the MODIS image as input, MESMA was performed to produce a water fraction image. Here four land cover classes were used: vegetation, soil, urban, and water. Endmembers for these four land cover classes were selected with the Visualization and Image Processing for Environmental Research software (Roberts et al., 2007), and the simplex projection algorithm coded in Matlab (Heylen et al., 2011) was used for spectral unmixing. Once the water fraction image was estimated, the SRM analysis was performed to produce a fine-resolution river map.
To illustrate the impact of the zoom factor, five levels were used in the SRM analyses: 2, 4, 6, 8, and 10. For each zoom factor, an independent CNN model was trained. The training samples of the CNN model were generated from the Global River Width from Landsat data set (Allen & Pavelsky, 2018). These river maps were selected from seven places globally (Figure S1) and were divided into small image subsets, each of which had a size of 400 × 400 pixels. A total of 500 small image subsets was randomly selected to form the training data set (Figure S2). During the training process, we used the same parameters as Kim et al. (2016): the mini-batch size is 64, the number of training epochs is 80, and the initial learning rate is 0.1, reduced by a factor of 10 every 20 epochs.

Figure 4 shows river maps produced by different methods from the MODIS image. The main river networks produced from the MODIS image with hard classification and all four SRM models were similar to those in the reference map (Figure 3c), especially for rivers with widths larger than several kilometers. However, in the river map produced by hard classification, many narrow rivers became disconnected and the resultant river boundaries were unrealistically jagged, as only pixel scale information was considered. In contrast, the CNN model yielded maps that were visually more realistic, although this was dependent on the zoom factor used. When z = 2, more rivers were mapped in the study area, and their connectivity was maintained better than with the hard classification, but the river boundaries did not appear smooth; the jagged edge phenomenon was still apparent. With an increase of the zoom factor, this jagged edge phenomenon was reduced. With a large zoom factor, such as z = 8 (Figure 4e) and z = 10 (Figure 4f), the resultant river map was very similar to the reference (Figure 3c).
Many narrow rivers were correctly mapped, river connectivity was well-maintained, and the river boundaries were smooth. These results indicate the advantage of the CNN model for mapping rivers by considering subpixel information.

Results
Of the four SRM methods, the results produced by PS (Figure 4g), HNN (Figure 4h), and A+ (Figure 4i) were visually worse than those produced by the proposed CNN model at z = 10 (Figure 4f). In the river map produced by PS, there were many incorrectly mapped small rounded water bodies and many disconnected narrow rivers; these errors arose because of fraction errors in the unmixing. The results from HNN and A+ were better than those from PS, but most narrow rivers were not mapped and river boundaries were not as smooth as those from the CNN. The reason was that although HNN can eliminate fraction errors to some extent, many pixels associated with narrow rivers were treated as fraction errors and wrongly eliminated at the same time. Moreover, HNN uses the maximal spatial dependence model and thus cannot always represent the geographical distribution of the water class and hence rivers. A+ is a training-based technique; however, fraction errors were not included in its training samples. In contrast, the proposed CNN model not only can effectively eliminate errors in the input water fraction image but also can realistically reconstruct the spatial pattern of rivers.
The overall accuracies of different methods are shown in Table 1. The overall accuracies of CNN were higher than those obtained from hard classification and the other SRM methods at all zoom factors. In general, the resultant river map of CNN had a higher overall accuracy with a larger zoom factor. However, the overall accuracies of CNN with z = 8 and z = 10 were very close, indicating that the rate of increase in accuracy with increasing zoom factor diminished.
The RWW values measured with the different methods were further assessed quantitatively. Given that the river maps produced by PS, HNN, and A+ were worse than that produced by CNN, only the results of hard classification and the proposed CNN model were compared to assess the accuracy of RWW measured at the pixel and subpixel scales. Figure 5 shows scatter plots of the reference RWW values against those estimated from the MODIS image. For the hard classification (Figure 5a), the R² is 0.7979, while higher coefficients were observed for all SRM outputs, with a maximum of 0.9599 (Figure 5f). Moreover, the scatter of points declined with an increase in zoom factor (z), which is inversely related to pixel size. In the scatter plot based on the hard classification output (Figure 5a), the RWW values estimated from MODIS were all constrained to be at least 463 m, the size of a MODIS pixel. Through the SRM analysis, RWW could be estimated at the subpixel scale, and hence, widths below 463 m were possible. Table 2 summarizes the accuracy assessments for hard classification and the CNN-based SRM analysis. These quantitative comparisons show that the SRM analysis predicted RWW values more accurately than hard classification, with lower RMSE, AE, and PE values. For all accuracy measures, accuracy was positively related to the zoom factor, although at a diminishing rate: most of the improvement occurred from z = 2 to z = 4, and there was little difference between the accuracy values from z = 6 to z = 10. For z = 10, compared with the hard classification result, the RMSE decreased from 397.15 to 176.85 m, AE decreased from 92.03 to 4.31 m, and PE decreased from 11.33% to −0.37%. The overestimation of RWW by hard classification and by CNN with small zoom factors was expected, because rivers narrower than the spatial resolution cannot be represented and are assigned at least one pixel's width.
The overestimation of RWW was alleviated in the CNN results at large zoom factors, where a slight underestimation (negative PE) even occurred. Overall, the improvement in all three accuracy statistics showed the effectiveness of the SRM analysis for RWW measurement.
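The three statistics compared above can be sketched in a few lines. This is a minimal illustration assuming AE denotes the mean signed error and PE expresses that mean signed error as a percentage of the mean reference width; the paper's exact operational definitions may differ. The sample widths are illustrative, not data from the study.

```python
import math

def rww_accuracy(reference, estimated):
    """Accuracy statistics for estimated river wetted widths (in meters).

    Assumes AE is the mean signed error and PE is the mean signed error
    expressed as a percentage of the mean reference width.
    """
    errors = [est - ref for ref, est in zip(reference, estimated)]
    n = len(errors)
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)  # root-mean-square error
    ae = sum(errors) / n                                   # mean signed error
    pe = 100.0 * ae / (sum(reference) / n)                 # relative (percentage) error
    return rmse, ae, pe

# Hypothetical reference vs. estimated widths (m); positive AE and PE
# indicate overall overestimation, as seen for hard classification.
rmse, ae, pe = rww_accuracy([500, 300, 800], [520, 463, 780])
```

Because AE and PE are signed, overestimates and underestimates can cancel, which is why a near-zero PE (such as the −0.37% above) must be read alongside the RMSE.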

Landsat TM Image Experiment
Data Set
An experiment with Landsat TM imagery was undertaken to further assess the proposed method. The study region was in the Columbia River basin in northwestern Oregon in the United States of America, covering an area of about 5,000 km² (90 km × 54 km) and including part of the Willamette River basin and the whole Santiam River basin (Figure 6). The Willamette River is a tributary of the Columbia River, while the Santiam River is a tributary of the Willamette River. The Santiam River has two principal tributaries, the North Santiam River and the South Santiam River, with drainage areas of about 2,000 and 2,700 km², respectively.
A Landsat TM image acquired on 14 August 2005 was used to estimate RWW in the study area (Figure 6a). Fine spatial resolution images collected from Google Earth were used as the reference. The Google Earth images were acquired on 15 August 2005, only one day after the Landsat TM image. They had a spatial resolution of 1 m, and water boundaries were extracted by manual digitization. In the Willamette River, RWW values were mostly larger than 90 m. For the two principal tributaries of the Santiam River, the main stems of the North Santiam River and the South Santiam River, RWW mainly lay in the range between 30 and 90 m. For other lower-order tributaries, RWW values were mostly less than 30 m. Since very narrow rivers were almost invisible in the Landsat image, only rivers with an RWW >5 m (Figure 6b) were used in this study.

Model Implementation
The analysis was performed with methods similar to those used in the MODIS experiment. Four classes (vegetation, bare land, urban, and water) were used for MESMA. The fine-resolution river maps used for training the CNN model were produced from Google Earth images located outside the basin and comprised 500 images with a size of 400 × 400 pixels. The MODIS experiment showed that z = 8 produced sufficiently smooth river boundaries while achieving an accuracy similar to that obtained with larger zoom factors. Therefore, the zoom factor z = 8 was used in this experiment, meaning that the river map produced by the SRM analysis had a spatial resolution of 3.75 m.
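The relationship between zoom factor and output resolution is simply a subdivision of each coarse pixel; a one-line sketch makes the 3.75 m figure explicit:

```python
def srm_resolution(pixel_size_m, zoom_factor):
    """Spatial resolution of the SRM output map: each coarse pixel is
    subdivided into zoom_factor x zoom_factor subpixels."""
    return pixel_size_m / zoom_factor

srm_resolution(30, 8)    # Landsat TM with z = 8 -> 3.75 m
srm_resolution(463, 10)  # MODIS with z = 10 -> 46.3 m
```

This makes clear why subpixel widths become measurable: the minimum representable width drops from one coarse pixel to one subpixel.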

Results
For the Willamette River and the main stem of the Santiam River, the river maps produced from the Landsat TM image by the CNN-based SRM analysis and by hard classification were both very similar to the reference map derived from the fine spatial resolution Google Earth imagery (Figure 7). For the narrower North Santiam River and the two tributaries of the South Santiam River labeled 1 and 2 in Figure 7, the river widths in the reference and the SRM result were similar, but the widths measured by hard classification differed from the reference, notably because the latter was constrained by the pixel size to a minimum value of 30 m. Figure 8 shows enlarged images and river maps for subareas A and B in Figure 7. The water fraction images estimated from the Landsat TM image in the two subareas are shown in Figures 8b and 8k. Large water fraction errors occurred, especially in areas with tree shadows, which were misidentified as water during the spectral unmixing analysis. When a hard classification was applied to extract the river maps, the resultant river boundaries did not appear smooth, and the jagged edge phenomenon was obvious, as in the MODIS experiment. The river maps produced by PS, HNN, and A+ also differed from the reference map: there were many bulges along the river boundaries caused by fraction errors. Neither PS nor HNN could deal with these errors effectively, making the resultant rivers much wider than the reference, especially for the narrower rivers shown in subarea B. In the river map produced by A+, there were many incorrectly mapped patches, and river boundaries were not smooth. In contrast, the CNN model was superior to hard classification and the other three SRM techniques, producing river maps similar to the reference in both subareas.
The impact of fraction errors was greatly reduced, and the geographical distribution of water, and hence of rivers, was more accurately depicted, showing the effectiveness of the proposed method. Figure 9 shows scatter plots of the reference RWW values against those produced by the hard classification and the CNN model for 5,000 randomly selected sites. For the hard classification, a horizontal line at a river width of 30 m is evident as the minimum estimated RWW, a constraint arising from the pixel size of the original Landsat TM image. For the CNN model, river widths narrower than 30 m could also be estimated, because the river map produced by the SRM analysis had a spatial resolution of 3.75 m. The distribution of points along the 1:1 line is more dispersed in the scatter plot of the hard classification result than in that of the CNN result, implying that the uncertainty of the measured river widths was larger for hard classification than for the SRM analysis. The R² between the reference and estimated RWW was 0.8842 for the CNN result, higher than the 0.6917 of the hard classification result. The RMSE also showed the effectiveness of the CNN model for RWW measurement compared with the hard classification approach. The AE values were negative for both CNN and hard classification, indicating an overall underestimation of RWW. For hard classification, most narrow rivers were overestimated, while wide rivers were more likely to be underestimated, leading to an overall underestimation of RWW. For CNN, although RWW was still underestimated, both narrow and wide rivers were better estimated, and the AE value was better than that of hard classification.
In Figure 10, the PE values were analyzed with respect to river width. The RWW values were classified into three categories: ≤30 m, >30 and ≤90 m, and >90 m. In general, pure water pixels in the river center could be mapped accurately by both the CNN and hard classification approaches, so errors in the estimated river widths were mainly caused by the mixed pixels near the river boundaries. Therefore, the degree of overestimation and underestimation decreased as RWW increased. For rivers wider than 90 m, the percentage of PE values between −20% and 20% was larger than 60% for hard classification and larger than 80% for CNN. In contrast, for rivers narrower than 30 m, the percentages of PE values between −20% and 20% were only about 10% and 30% for hard classification and CNN, respectively. Specifically, for rivers narrower than 30 m, all PE values of the hard classification were positive because its minimum estimated RWW was 30 m, and the percentage of PE values larger than 100% reached 54.4%. In contrast, both positive and negative PE values existed for the SRM analysis, and the percentage of PE values larger than 100% was only 15.8%. These results showed that, for rivers with widths comparable to or less than the image pixel size, the proposed CNN model reduced the relative error compared with the hard classification approach.
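The per-class analysis above can be sketched as follows. This is a minimal illustration, assuming a per-site PE of 100 × (estimated − reference)/reference and the three width classes from the text; the function name and the sample widths are hypothetical.

```python
def pe_by_width_class(reference, estimated):
    """Group per-site percentage errors by reference-width class and
    report the share of sites with |PE| <= 20% in each class.
    Class bounds (30 m, 90 m) follow the analysis in the text."""
    classes = {"<=30 m": [], "30-90 m": [], ">90 m": []}
    for ref, est in zip(reference, estimated):
        pe = 100.0 * (est - ref) / ref  # per-site percentage error
        if ref <= 30:
            classes["<=30 m"].append(pe)
        elif ref <= 90:
            classes["30-90 m"].append(pe)
        else:
            classes[">90 m"].append(pe)
    # share of sites within +/-20% PE, per nonempty class
    return {name: sum(abs(p) <= 20 for p in pes) / len(pes)
            for name, pes in classes.items() if pes}

# Hypothetical sites: a 20-m river estimated at 30 m gives PE = +50%,
# reproducing the floor effect of hard classification on narrow rivers.
shares = pe_by_width_class([20, 50, 100, 120], [30, 55, 105, 150])
```

The pattern in the text, that the narrowest class has the smallest share of sites within ±20% PE, falls out directly because a fixed absolute error is a larger fraction of a smaller reference width.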

Discussion
Results from the two experiments described above showed that subpixel scale RWW measurement was more accurate than conventional pixel-based RWW measurement. The proposed CNN-based SRM model could effectively eliminate the impact of fraction errors and reconstruct the spatial pattern of rivers. In practice, however, some issues connected with the performance of the proposed method should be considered.

The Scale Issue
The scale issue is one of the most crucial factors that affect the analysis of remotely sensed data in general and is particularly important for RWW measurement at both pixel and subpixel scales.
For the pixel-based analysis, the scale issue mainly arises from the relationship between the size of the object of interest (e.g., RWW) and the image pixel. A simple example is shown in Figure 11 to illustrate this scale issue. The river is shown as a blue line with a width of 3.7 m, and four remote sensing images, with spatial resolutions of 1, 2, 4, and 8 m, are available to estimate its width. For simplicity, we assume that the river boundary runs parallel to the pixel boundaries. When hard classification is applied, a pixel is classified as river once >50% of its area is covered by the river. As shown in Figure 11a, when the image has a spatial resolution of 1 m, several pixels are fully covered by the river, and the pixels located at the boundaries are only partially covered. For example, if the upper and lower boundary pixels are each 35% covered by the river, neither is classified as river and the width is measured as 3 m; if the upper boundary pixel is 10% covered and the lower boundary pixel is 60% covered, the width is measured as 4 m. Similarly, the width could be estimated as 2 or 4 m at a pixel resolution of 2 m, 0 or 4 m at a pixel resolution of 4 m, and 0 m at a pixel resolution of 8 m. This example shows that the coarser the pixel resolution, the more uncertainty is associated with the measured river width and the areal extent of rivers. When the pixel size is much coarser than the river width, the river may not be detected at all. Even with the 1-m resolution image, the river width is not estimated exactly, but this uncertainty can be substantially reduced if, for example, a 0.1-m resolution river map is generated by the subpixel analysis.
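The thought experiment above can be reproduced with a short sketch. This is an illustrative model, not code from the study: the river is assumed to run parallel to the pixel grid, and the hypothetical `offset` parameter describes how the grid is aligned with the upper river bank.

```python
def hard_classified_width(river_width, pixel_size, offset):
    """Width (m) measured by hard classification for a river running
    parallel to the pixel grid.

    offset: distance (m) from the upper river bank back to the upper
    edge of the first pixel the river intersects (0 <= offset < pixel_size).
    A pixel counts as river when >50% of its area is covered.
    """
    measured = 0.0
    position = -offset  # upper edge of the first intersected pixel
    while position < river_width:
        # portion of this pixel lying inside the river
        covered = min(position + pixel_size, river_width) - max(position, 0.0)
        if covered / pixel_size > 0.5:
            measured += pixel_size
        position += pixel_size
    return measured

# Figure 11 example: a 3.7-m river seen at 1-m resolution.
hard_classified_width(3.7, 1.0, 0.65)  # boundary pixels 35%/35% covered -> 3.0 m
hard_classified_width(3.7, 1.0, 0.9)   # boundary pixels 10%/60% covered -> 4.0 m
hard_classified_width(3.7, 8.0, 0.0)   # pixel much wider than river -> 0.0 m
```

Sweeping `offset` over its range shows the full set of widths a given resolution can report, which is exactly the alignment-dependent uncertainty the text describes.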
For the subpixel scale analysis, the scale issue is mainly related to the zoom factor used in the SRM analysis. The larger the zoom factor, the finer the spatial resolution of the resultant river map; unfortunately, however, the larger the zoom factor, the greater the uncertainty in the SRM analysis. As shown in the experimental results, increasing the zoom factor beyond a certain value may not yield significant further increases in accuracy. Moreover, although the SRM analysis increased the accuracy of RWW estimation, the accuracy obtained for rivers narrower than a few pixels was much lower than that for wider rivers. In practice, therefore, the spatial resolution of the observed coarse remotely sensed image and the zoom factor used in the SRM analysis should be selected carefully.

The CNN Model
The key to accurate estimation of RWW at the subpixel scale is the SRM analysis. In practice, to produce an accurate fine-resolution river map ready for use in the RivWidth software, two crucial problems should be solved in the SRM analysis: how to model the spatial pattern of river networks, and how to eliminate the negative effects caused by errors in the coarse-resolution water fraction image. Both problems are hard to address with existing SRM models, such as HNN, PS, and A+. In contrast, the CNN model proposed in this study can overcome the shortcomings of other SRM models and enhance the accuracy of the SRM analysis. This improvement relates to two key issues: learning the land cover spatial pattern and handling unmixing fraction errors.
The CNN model can automatically learn the complex nonlinear relationship between the spatial textures of the coarse-resolution water fraction image and the fine-resolution river map, thanks to the large number of training samples and model parameters. In an SRM analysis, a spatial pattern model is often applied to provide information about the land cover spatial pattern, for example, the maximal spatial dependence model used in the HNN and PS algorithms. However, such prior models are not always effective in practice. For example, the maximal spatial dependence model is suitable when rivers are wide but not when river widths are close to the size of one or a few pixels. Instead of using a manually defined spatial pattern model, the CNN model learns the spatial pattern automatically from an existing data set, leading to a more powerful modeling ability, as shown by the experimental results. Further improvements are also possible, for example, using more powerful networks with different structures and/or using a larger number of fine spatial resolution images with richer river features as training samples.
The second advantage of the proposed SRM analysis is that fraction errors can be dealt with effectively. In practice, errors are unavoidable in the fraction images produced by spectral unmixing. In this study, random noise was added to the coarse-resolution water fraction image to simulate fraction errors in the training samples. Given that the corresponding fine-resolution river map is error free, the residual image used to train the CNN model includes not only the missing high-frequency information but also information about fraction errors. The CNN model is therefore trained both to recover the details needed for the spatial resolution transformation and to suppress fraction errors. In practice, however, completely eliminating the negative effects caused by fraction errors is impossible; the smaller the fraction errors, the higher the expected accuracy.
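The generation of a noisy training pair described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: the fine map is degraded by block averaging, and the noise level `noise_sd` is an assumed parameter, not a value from the study.

```python
import random

def make_training_pair(fine_map, zoom, noise_sd=0.05):
    """Build one (noisy coarse fraction image, fine river map) training pair.

    fine_map: 2-D list of 0/1 water labels at the fine resolution,
              with dimensions divisible by zoom.
    Each zoom x zoom block is averaged into a water fraction, and
    Gaussian noise (std = noise_sd, an assumed level) simulates
    spectral-unmixing errors; fractions are clipped to [0, 1].
    """
    rows, cols = len(fine_map), len(fine_map[0])
    coarse = []
    for i in range(0, rows, zoom):
        row = []
        for j in range(0, cols, zoom):
            block = [fine_map[i + di][j + dj]
                     for di in range(zoom) for dj in range(zoom)]
            fraction = sum(block) / (zoom * zoom)
            noisy = min(1.0, max(0.0, fraction + random.gauss(0, noise_sd)))
            row.append(noisy)
        coarse.append(row)
    return coarse, fine_map
```

Training on such pairs exposes the network to corrupted fractions whose target is still the clean fine map, which is what pushes it to denoise as well as superresolve.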
Computational cost is an important issue when the proposed method is applied to large geographical areas. The CNN model includes two steps, training and mapping. The training step is time-consuming and depends on the parameters used. In the MODIS experiment, the training time was about 20 hr on the Matlab platform running with an NVIDIA X1080 GPU with 8 GB RAM. The zoom factor has little impact on the training time. In contrast, the mapping step is very fast: only several seconds were needed in the MODIS experiment with a zoom factor of 10. By comparison, the PS and HNN models need hundreds or thousands of iterations to converge, and several hours were necessary in the MODIS experiment with a zoom factor of 10.
In practice, the CNN model is expected to have a higher efficiency for a large area analysis, because the training needs to be undertaken only once and then any subsequent mapping is very fast. This is particularly attractive for regional to global-scale analyses of RWW.

Uncertainty and Applications
The main limitation of the proposed method is the errors included in the water fraction image, which have a negative impact on the SRM analysis that is based on it. First, the uncertainty of spectral unmixing is unavoidable because of the spectral variations of land cover classes and the nonlinear mixture phenomenon (Roberts et al., 2007). Second, the spectral reflectance values of water are similar to those of shadow, which is often used as an implicit class in the MESMA analysis; even when the shadow class is not included, this similarity brings uncertainty to the resultant fraction image. Third, in practice, rivers are often covered by bridges crossing them or by trees located along river banks. In particular, trees can cover most of the water surface of narrow rivers. In this situation, rivers are invisible in remotely sensed images, and RWW cannot be measured directly, although some degree of remediation is sometimes possible via, for example, selection of imagery acquired during leaf-off conditions.
The proposed method can be further improved for future use. First, the proposed method was applied to a MODIS image with a resolution of about 500 m and a Landsat TM image with a resolution of 30 m. Both sensors offer spectral bands with finer spatial resolutions, including the first and second bands of MODIS, with a spatial resolution of about 250 m, and the panchromatic band of Landsat, with a spatial resolution of 15 m. Incorporating these finer-resolution bands is expected to increase the accuracy of an SRM analysis (Nguyen et al., 2006; Vandal et al., 2018). Second, the proposed method could be applied to other remotely sensed images, such as the newly launched Sentinel-2 imagery, which has a finest spatial resolution of 10 m. For remotely sensed imagery with a spatial resolution of several meters or finer than 1 m, the proposed method can also be applied to further improve the RWW estimates. With respect to measuring RWW, our method could now be applied to a "menu" of remotely sensed imagery selected according to catchment type. This could be conducted at local to global scales, building on studies such as Allen and Pavelsky (2018), with the knowledge that narrower rivers will be measured with higher accuracy and that larger rivers can be measured using imagery appropriate to the temporal dimension (i.e., in situations of flooding and low discharge).

Conclusions
RWW is a fundamental variable needed to analyze hydrological and biogeochemical processes in rivers and has been estimated from remotely sensed images acquired by satellite sensors. Although fine spatial resolution images may be used to estimate RWW accurately, research is often constrained to relatively coarse-resolution images for practical reasons, as well as for reasons of timing (e.g., historical analyses, cloud cover avoidance, and water flow). In coarse-resolution imagery there are many mixed pixels along river boundaries, and this greatly decreases the accuracy of RWW estimation when traditional pixel-based hard classification approaches are used, especially for narrow rivers. To address this problem, this paper proposes an approach to estimate RWW from remote sensing images at the subpixel scale. The approach first uses MESMA to extract water fraction images from the observed remotely sensed image. A fine spatial resolution river map is then produced by a CNN model-based SRM analysis and finally used for RWW estimation.
The proposed approach was assessed in two experiments, one focusing on estimation from a coarse spatial resolution MODIS image and the other on estimation from a medium-resolution Landsat TM image. The results show that the proposed CNN-based SRM model can overcome two main shortcomings of existing SRM models. The spatial pattern of rivers can be represented well by learning the nonlinear mapping between the coarse-resolution water fraction image and the fine-resolution river map with the CNN model. The negative effects caused by errors in the coarse-resolution water fraction image can also be overcome by introducing noise into the training samples to simulate fraction image errors. The results also show that the proposed method can substantially increase the accuracy of RWW estimation compared with the conventional hard classification approach and has advantages over other SRM-based methods. The proposed approach is valuable for measuring RWW from remotely sensed imagery, especially for relatively narrow rivers and streams, and is applicable at regional to global scales. This is a fundamentally useful feature, as such water bodies may play an important yet possibly underestimated role in gas fluxes with the atmosphere and hence in the global carbon cycle.