Introducing a Global Dataset of Open Standing Water Bodies
Santoro, Maurizio (1); Lamarche, Celine (2); Bontemps, Sophie (2); Wegmüller, Urs (1); Kalogirou, Vasileios (3); Arino, Olivier (3); Defourny, Pierre (2)
(1) Gamma Remote Sensing, SWITZERLAND; (2) Université catholique de Louvain, BELGIUM; (3) ESA/ESRIN, ITALY

Poor characterization of inland water bodies in current global land cover products triggered an investigation that exploits synthetic aperture radar (SAR) data to provide up-to-date and reliable information on water body extent and, possibly, temporal dynamics. To this end, Envisat Advanced SAR (ASAR) data acquired in the moderate-resolution (150 m) Wide Swath Mode (WSM) were considered because of the frequent observations. Up to daily observations are possible thanks to the strong overlap of swaths from adjacent orbital tracks. The high density of observations allows metrics of the temporal variability of the SAR backscatter (TV) and the minimum SAR backscatter (MB) to be generated from a time series of measurements. A preliminary analysis at six study areas demonstrated that open water has a unique signature in these multi-temporal SAR metrics (high TV and low MB) compared with other land cover types. A simple thresholding algorithm in the feature space of TV and MB allowed accurate separation (overall accuracy > 90%) of open standing water bodies from land surfaces. The main advantage of such an algorithm is its global classification rule, i.e., its independence from the local land cover types and, therefore, from any set of measurements needed to calibrate the algorithm. The main limitation of this approach is the imprecise delineation of water bodies: in pixels containing both land and water, the SAR multi-temporal metrics are affected more by land scatterers than by water scatterers, so that the pixel is mostly classified as land. To overcome this issue, it is necessary to work at a higher resolution than the desired final spatial resolution of the water body map and then aggregate the result with an expansion rule. For example, to generate a water body map at 300 m (the typical pixel size of global land cover products), water body classification should be carried out at 150 m and the results aggregated to 300 m. This also allows detection of thin water bodies, which would disappear if the detection algorithm were applied directly to SAR data at 300 m pixel size.
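
The per-pixel rule and the aggregation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the dataset: TV is assumed here to be the standard deviation of the backscatter time series in dB, the thresholds `TV_MIN_DB` and `MB_MAX_DB` are hypothetical placeholders, and the expansion rule is assumed to flag a 300 m pixel as water when at least one of its four 150 m sub-pixels is water.

```python
from statistics import pstdev

# Hypothetical thresholds (dB); the abstract does not give the actual values.
TV_MIN_DB = 2.0    # water is assumed to show TV above this value
MB_MAX_DB = -18.0  # water is assumed to show MB below this value


def temporal_metrics(series_db):
    """Return (TV, MB) for one pixel's backscatter time series in dB."""
    tv = pstdev(series_db)  # temporal variability, assumed to be a std. dev.
    mb = min(series_db)     # minimum backscatter
    return tv, mb


def is_water(series_db):
    """Global threshold rule: high TV together with low MB indicates water."""
    tv, mb = temporal_metrics(series_db)
    return tv > TV_MIN_DB and mb < MB_MAX_DB


def aggregate_2x2(mask_150m):
    """Aggregate a 150 m water mask to 300 m with an assumed expansion rule:
    a coarse pixel is water if any of its four fine pixels is water."""
    rows, cols = len(mask_150m), len(mask_150m[0])
    return [
        [
            any(mask_150m[r + dr][c + dc] for dr in (0, 1) for dc in (0, 1))
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# Synthetic time series (dB), for illustration only.
lake = [-22.0, -12.0, -20.5, -9.0, -23.0, -15.0]
forest = [-7.5, -8.0, -7.2, -7.9, -8.1, -7.6]

# A one-pixel-wide river in a 4 x 4 grid of 150 m pixels.
fine_mask = [
    [False, True, False, False],
    [False, True, False, False],
    [False, True, False, False],
    [False, True, False, False],
]
coarse_mask = aggregate_2x2(fine_mask)
```

In this toy grid the thin river occupies two of the four sub-pixels in each coarse pixel it crosses, so it survives the assumed expansion rule, whereas a rule requiring more than half of the sub-pixels to be water would remove it.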

The water body detection algorithm has been applied to Envisat ASAR data acquired between 2005 and 2010 to generate a nearly global dataset of open standing water bodies. The ultimate goal is to generate a layer to be included in the Climate Change Initiative Land Cover (CCI-LC) product, which represents the land cover (LC) "state" and "condition" of the land surface for several epochs at 300 m spatial resolution. Envisat ASAR WSM data were the primary dataset. The time span was maximized to limit gaps in the coverage and to capture the dynamics of water extent in areas with frequent data coverage. For gap filling, data acquired through the end of the Envisat mission (April 2012) were also used. Further gaps were filled with Envisat ASAR Image Mode Medium resolution data (IMM, 75 m resolution; central US and central Asia). Remaining gaps, primarily over South America and Australia, could only be filled with Global Monitoring mode data (GM1, 500 m resolution), oversampled to 150 m. The 500 m resolution of ASAR GM1 was sub-optimal for our mapping efforts; still, this dataset allowed complete coverage of all continents. Isolated islands or groups of islands (e.g., in Oceania) remained unmapped because there are practically no ASAR observations.
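
The order in which the acquisition modes were used for gap filling can be read as a simple priority rule. The sketch below is only an illustration of that reading: the function `select_source`, the tile names and the per-tile inventories are hypothetical, and the actual selection may well have operated at a different granularity than whole tiles.

```python
# Acquisition modes in the order of preference described in the text:
# WSM first, IMM for gaps, GM1 only where nothing else exists.
SOURCE_PRIORITY = ("WSM", "IMM", "GM1")


def select_source(available_modes):
    """Return the preferred mode available for a tile, or None if unmapped."""
    for mode in SOURCE_PRIORITY:
        if mode in available_modes:
            return mode
    return None


# Hypothetical per-tile inventories of available acquisition modes.
inventory = {
    "tile_a": {"WSM", "GM1"},
    "tile_b": {"IMM", "GM1"},
    "tile_c": {"GM1"},
    "tile_d": set(),  # e.g. an isolated island with no ASAR observations
}
chosen = {tile: select_source(modes) for tile, modes in inventory.items()}
```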

The ASAR data were available through the Grid Processing on Demand (G-POD) platform. WSM and IMM images were processed on G-POD to obtain the backscatter measurements at 150 m resolution required for classification. Processing was based on standard approaches, including terrain geocoding, speckle filtering (multi-channel approach) and normalization of the backscatter to reduce the effect of sloped terrain and different viewing geometries. Major effort had to be spent on the selection of the ASAR data because of multiple entries of the same dataset in the G-POD archives and the limited flexibility of the G-POD platform in handling very large datasets. Over 11 Gbyte of data were produced. In addition, 3 Gbyte of Global Monitoring data were obtained by processing on local servers. For optimal management of computing resources, the data were tiled according to a 1 x 1 degree tiling system.
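
As an illustration of the 1 x 1 degree tiling, the sketch below maps a coordinate to the tile that contains it. The function name `tile_id` and the naming convention (hemisphere letters plus the integer coordinates of the tile's lower-left corner) are assumptions made for this example; the abstract does not state how tiles were labelled.

```python
import math


def tile_id(lat, lon):
    """Return a label for the 1 x 1 degree tile containing (lat, lon).

    The label encodes the lower-left (south-west) corner of the tile;
    this naming convention is an assumption made for the example.
    """
    lat0 = math.floor(lat)  # southern edge of the tile
    lon0 = math.floor(lon)  # western edge of the tile
    ns = "N" if lat0 >= 0 else "S"
    ew = "E" if lon0 >= 0 else "W"
    return f"{ns}{abs(lat0):02d}{ew}{abs(lon0):03d}"


northern_tile = tile_id(46.95, 7.45)
southern_tile = tile_id(-0.5, -60.2)
```

Using `math.floor` rather than truncation keeps tiles south of the equator and west of Greenwich anchored on their lower-left corner as well.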

The SAR data were fed to the water body detection algorithm on a tile-by-tile basis. From the multi-temporal SAR backscatter measurements, single multi-year datasets of TV and MB were obtained. Water/land classification was supported by an image of the local terrain slope, derived from a global Digital Elevation Model (DEM) that merges several existing DEM datasets. The output of the classification was a map of potential water bodies that required further refinement, since the multi-temporal SAR metrics were affected by factors that differed locally depending on land cover type, seasonal conditions and amount of SAR data. The refinement rules were based on careful observation of false detections of land as water (more frequent) and water as land (less frequent). Qualitative evaluation of the refined water body map against imagery in Google Earth indicated that major flaws were corrected.
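
The abstract does not say how the slope image enters the classification. One plausible use, sketched here purely as an assumption, is to reject water candidates on steep terrain, where open standing water is unlikely. The function `mask_steep` and the threshold `MAX_WATER_SLOPE_DEG` are hypothetical.

```python
# Hypothetical slope limit (degrees) above which a water candidate is rejected.
MAX_WATER_SLOPE_DEG = 5.0


def mask_steep(candidates, slope_deg, max_slope=MAX_WATER_SLOPE_DEG):
    """Keep a water candidate only where the local slope is gentle enough."""
    return [
        [is_candidate and slope <= max_slope
         for is_candidate, slope in zip(cand_row, slope_row)]
        for cand_row, slope_row in zip(candidates, slope_deg)
    ]


# Synthetic 2 x 2 example: water candidates and local slope in degrees.
candidates = [[True, True], [False, True]]
slope_deg = [[0.5, 12.0], [1.0, 3.0]]
refined = mask_steep(candidates, slope_deg)
```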

The final water body product was then consolidated using an independent dataset made up of a combination of existing global water body products. The overlay of the SAR product on this dataset highlighted the zones of inconsistency. Based on decision rules related to the amount of input SAR data and on visual assessment of high-resolution imagery, further improvements were made to the CCI SAR WB product by removing remaining commission and omission errors.
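
The overlay step can be pictured as a per-pixel comparison of two binary masks. The sketch below is a minimal illustration with synthetic masks; the label names and the function `overlay_label` are hypothetical, and the decision rules applied afterwards are not reproduced here.

```python
from collections import Counter


def overlay_label(sar_water, ref_water):
    """Label one pixel by comparing the SAR mask with the reference mask."""
    if sar_water and ref_water:
        return "agree_water"
    if not sar_water and not ref_water:
        return "agree_land"
    # Disagreement: water in only one of the two datasets.
    return "sar_only" if sar_water else "ref_only"


# Synthetic masks (True = water), for illustration only.
sar = [[True, True, False], [False, True, False]]
ref = [[True, False, False], [False, True, True]]

labels = [
    [overlay_label(s, r) for s, r in zip(sar_row, ref_row)]
    for sar_row, ref_row in zip(sar, ref)
]
counts = Counter(label for row in labels for label in row)
```

Pixels labelled `sar_only` are candidates for commission errors in the SAR product and pixels labelled `ref_only` are candidates for omission errors, although either dataset may be the one at fault.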

A first quantitative validation is being conducted, based on a validation database of 2500 SAR footprints (150 m x 150 m) interpreted as "water" or "no-water" using Google Earth imagery. The footprints are distributed globally using stratified random sampling.
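
Stratified random sampling of footprints can be sketched as below. The strata, their sizes and the per-stratum sample size are hypothetical, since the abstract does not describe the stratification; a fixed seed is used only to make the example reproducible.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw up to n_per_stratum items at random from each stratum."""
    rng = random.Random(seed)  # local generator, so results are reproducible
    return {
        name: rng.sample(candidates, min(n_per_stratum, len(candidates)))
        for name, candidates in strata.items()
    }


# Hypothetical strata holding candidate footprint identifiers.
strata = {
    "stratum_a": list(range(0, 100)),
    "stratum_b": list(range(100, 130)),
    "stratum_c": list(range(130, 133)),  # fewer candidates than requested
}
sample = stratified_sample(strata, n_per_stratum=5)
```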
A more detailed validation protocol is foreseen during spring with the aim of better understanding the sources of uncertainty associated with the product. Various factors that may affect the accuracy of the water body product (e.g., climatology, geophysical characteristics, signal-based information) will be identified and retrieved over a globally distributed sample of points. These factors will then be compared with the accuracy of the product, and univariate and multivariate statistical analyses will be carried out to estimate the dependence of type I and type II errors on them. The results of the analysis could also contribute to current knowledge of SAR-based water detection.
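
The accuracy figures that such a validation yields can be computed from paired map and reference labels, as in the sketch below. The labels are synthetic, the function name `accuracy_metrics` is hypothetical, and the mapping of type I and type II errors to commission and omission of the water class is an assumption of this example.

```python
def accuracy_metrics(predicted, reference):
    """Overall accuracy plus commission and omission error for water.

    Both inputs are sequences of booleans (True = water).
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # water in both
    fp = sum(1 for p, r in pairs if p and not r)      # false water (type I)
    fn = sum(1 for p, r in pairs if not p and r)      # missed water (type II)
    tn = sum(1 for p, r in pairs if not p and not r)  # land in both
    return {
        "overall_accuracy": (tp + tn) / len(pairs),
        "commission_error": fp / (tp + fp) if tp + fp else 0.0,
        "omission_error": fn / (tp + fn) if tp + fn else 0.0,
    }


# Synthetic labels for ten footprints (1 = water, 0 = no-water).
predicted = [bool(v) for v in (1, 1, 1, 0, 0, 0, 0, 1, 0, 0)]
reference = [bool(v) for v in (1, 1, 0, 0, 0, 1, 0, 1, 0, 0)]
metrics = accuracy_metrics(predicted, reference)
```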