Recent advances in Biophysical Parameter Retrieval Methods - Opportunities for Sentinel-2
Verrelst, Jochem; Rivera, Juan Pablo; Camps-Valls, Gustavo; Moreno, Jose
University of Valencia, SPAIN

The Global Monitoring for Environment and Security (GMES) Sentinel satellite missions are designed to provide globally available information on an operational basis for services and applications related to land, ocean and atmosphere. Sentinel-2 (S2) in particular is intended for land monitoring applications. Building upon experience from earlier missions (e.g. Landsat 5/7, SPOT-5), S2 is configured with improved spectral capabilities (e.g. bands dedicated to better atmospheric correction and cloud screening, and bands in the red edge). This unprecedented data availability creates an urgent need for robust and accurate retrieval methods. This work presents an overview of state-of-the-art retrieval methods dedicated to the quantification of terrestrial biophysical parameters. The rationale of all these methods is that spectral observations are in some way related to the parameters of interest. In general, these methods can be categorized into three main domains: 1) parametric regression, 2) non-parametric regression, and 3) physically-based methods.

Over the last few years, IPL (Image Processing Laboratory; University of Valencia) has made significant advances in all three domains, including the development of software to automate these methods. This eventually led to an integrated software package, ARTMO (Automated Radiative Transfer Models Operator), that comprises multiple toolboxes and a suite of leaf and canopy radiative transfer models (RTMs). In ARTMO a pre-defined sensor can be chosen (e.g., Sentinel-2) so that the retrieval is processed directly for that sensor's band settings. The following toolboxes make it possible to exploit the three retrieval domains to the fullest:

Parametric methods refer to the use of regression models through spectral indices. The 'Vegetation Indices' toolbox encompasses a large collection of established vegetation indices (VIs) and allows systematic calculation of generic indices: all possible 2-band combinations of a sensor are calculated according to the formulation of an established index, e.g. SRI, NDVI. The predictive power of each VI can immediately be evaluated against in-situ data or input data coming from an RTM by using a fitting curve (e.g. linear, exponential, power). Options are provided to add noise and to control calibration/validation partitioning, along with various statistical indicators to evaluate performance (e.g., r2, RMSE, scatter plots). The best-performing regression model can subsequently be applied to an image, which leads to instantaneous mapping of the targeted biophysical parameter.
Non-parametric methods refer to the use of machine learning regression algorithms (MLRAs). The 'MLRA' toolbox encompasses a collection of adaptive MLRAs such as partial least squares (PLS), neural networks (NN), support vector regression (SVR), kernel ridge regression (KRR) and Gaussian process regression (GPR). MLRAs can be powerful because of their ability to perform adaptive nonlinear data fitting. Moreover, depending on the chosen MLRA, multi-output is possible (PLS, NN, KRR) or associated uncertainty estimates are delivered (GPR). This toolbox is designed along the same lines as the Vegetation Indices toolbox, with the same type of options and statistical indicators provided. The regression model generated by the best-performing MLRA can subsequently be applied to an image, which leads to instantaneous mapping of the targeted biophysical parameter(s).
Physically-based methods refer to the inversion of lookup table (LUT)-based RTMs through cost functions. This approach is considered physically sound and robust for retrieving biophysical parameters, but regularization strategies are required to mitigate the ill-posedness of the inversion. The 'Inversion' toolbox encompasses a collection of more than 60 cost functions originating from three mathematical families: information measures, M-estimates and minimum contrast methods. Various regularization options can be introduced in the inversion, namely adding noise, using multiple solutions, and data normalization. Simultaneous retrieval of multiple parameters is possible. Further, along with mean estimates, additional uncertainty estimates can be provided in the form of standard deviations and residuals. The best-performing inversion strategy can subsequently be applied to an image, which leads to mapping of the targeted biophysical parameter(s).
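As an illustration of the LUT-based inversion described above, the following is a minimal sketch and not ARTMO code. It assumes a hypothetical one-parameter function `toy_rtm` standing in for a real canopy RTM, uses RMSE as the cost function (only one of many possible choices), and applies two of the regularization options named above: noise added to the simulated spectra, and the mean of multiple best-matching solutions. The standard deviation of those solutions and the lowest cost serve as simple uncertainty indicators. All function names and parameter values are assumptions made for illustration.

```python
import numpy as np

def toy_rtm(lai, n_bands=10):
    """Hypothetical stand-in for a canopy RTM: maps LAI to a reflectance spectrum."""
    bands = np.linspace(0.0, 1.0, n_bands)      # normalized band positions
    soil = 0.10 + 0.15 * bands                  # assumed bare-soil spectrum
    veg = 0.05 + 0.45 * bands ** 2              # assumed dense-canopy spectrum
    # Canopy cover saturates with LAI; trailing axis allows broadcasting over bands.
    cover = 1.0 - np.exp(-0.5 * np.asarray(lai, dtype=float))[..., None]
    return (1.0 - cover) * soil + cover * veg

def build_lut(n_entries=2000, noise_sd=0.005, seed=0):
    """Sample the parameter space and simulate noisy spectra (regularization: noise)."""
    rng = np.random.default_rng(seed)
    lai = rng.uniform(0.0, 7.0, n_entries)
    spectra = toy_rtm(lai)
    spectra = spectra + rng.normal(0.0, noise_sd, spectra.shape)
    return lai, spectra

def invert(observed, lut_params, lut_spectra, n_best=20):
    """Return mean and std of the n_best LUT solutions, plus the lowest cost (residual)."""
    cost = np.sqrt(np.mean((lut_spectra - observed) ** 2, axis=1))  # RMSE per LUT entry
    best = np.argsort(cost)[:n_best]            # regularization: multiple solutions
    return lut_params[best].mean(), lut_params[best].std(), cost[best[0]]

lut_lai, lut_spectra = build_lut()
estimate, spread, residual = invert(toy_rtm(3.0), lut_lai, lut_spectra)
print(f"LAI estimate {estimate:.2f} +/- {spread:.2f}, residual {residual:.4f}")
```

Applied to an image, `invert` would be called once per pixel, which is why this kind of retrieval is slow compared with applying a fitted regression model.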

In this work, all these methods were evaluated using simulated Sentinel-2 data against ground-based validation data from the ESA campaign SPARC (Barrax, Spain). In addition to retrieval accuracy, processing speed and transportability to other images across the world were analyzed. Preliminary results lead us to conclude the following.
The Vegetation Indices module allows quick mapping of biophysical parameters when local calibration data are available. Calculating the optimal bands and regression models and mapping the parameter from a simulated S2 image is almost instantaneous. However, the absence of uncertainty estimates is a major weakness that obstructs transporting the obtained regression model to other images. This method therefore seems less suitable for operational approaches.
MLRA approaches require more computational effort to train the regression model, but once training is completed, application to an image is also fast, especially for NN. The main advantage of these approaches is their superior predictive power, which stems from their adaptive, flexible fitting. Another strength is the delivery of uncertainty estimates by some MLRAs (e.g. GPR), which makes it possible to evaluate their performance on other images. However, GPR's computational complexity means that only a limited number of training samples can be used (e.g. up to a few thousand), which, for the time being, makes this method less feasible for operational applications. Nevertheless, efforts are underway both to make GPR cope with larger datasets and to obtain uncertainty estimates from other MLRAs.
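The two GPR properties discussed above, per-prediction uncertainty and the limit on training set size, can be seen in a minimal sketch of exact GPR. This is not the ARTMO implementation. It assumes an RBF kernel with fixed, hand-picked hyperparameters and a one-dimensional toy input in place of band reflectances. In exact GPR the dominant cost is factorizing the n x n kernel matrix, which takes O(n^3) time and O(n^2) memory for n training samples.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential kernel between the rows of a (m, d) and b (n, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale ** 2)

def gpr_fit(x_train, y_train, length_scale=1.0, signal_var=1.0, noise_var=0.01):
    """Fit exact GPR with fixed (assumed) hyperparameters and a zero prior mean."""
    n = len(x_train)
    gram = rbf_kernel(x_train, x_train, length_scale, signal_var)
    chol = np.linalg.cholesky(gram + noise_var * np.eye(n))   # O(n^3) step
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    return {"x": x_train, "chol": chol, "alpha": alpha,
            "length_scale": length_scale, "signal_var": signal_var}

def gpr_predict(model, x_new):
    """Return predictive mean and standard deviation of the latent function."""
    k_star = rbf_kernel(x_new, model["x"], model["length_scale"], model["signal_var"])
    mean = k_star @ model["alpha"]
    v = np.linalg.solve(model["chol"], k_star.T)
    var = model["signal_var"] - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))   # clip tiny negative round-off

# Toy data: noisy samples of a smooth function on [0, 5].
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 5.0, 40)[:, None]
y_train = np.sin(x_train[:, 0]) + rng.normal(0.0, 0.05, 40)

model = gpr_fit(x_train, y_train)
# One query inside the training range, one far outside it.
mean, std = gpr_predict(model, np.array([[2.5], [10.0]]))
print(mean, std)
```

The predictive standard deviation grows for inputs far from the training data, which is what makes it usable as a flag when a trained model is applied to a different image.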
Finally, the main advantage of LUT-based RTM inversion through a cost function is that it is generally applicable and that associated uncertainty estimates can be calculated. However, this method is rather slow because processing against a LUT occurs pixel by pixel, and its performance tends to be somewhat poorer than that of MLRAs. To overcome these weaknesses, operational retrieval algorithms typically rely on hybrid forms, e.g. by training MLRAs (e.g., NN) on outputs from RTMs. An overview of the strengths and weaknesses of these methods will be provided, along with exemplary results for various biophysical parameters. We will close with consolidated guidelines towards powerful retrieval methods that are implementable in operational processing chains.
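The hybrid approach mentioned above, in which a regression algorithm is trained on RTM simulations rather than on field data, can be sketched as follows. This is an illustrative sketch under stated assumptions, not an operational algorithm: `toy_rtm` is a hypothetical one-parameter stand-in for a real RTM, kernel ridge regression (KRR) is swapped in for the NN named in the text because it has a closed-form solution, and the kernel width and ridge term are hand-picked.

```python
import numpy as np

def toy_rtm(lai, n_bands=10):
    """Hypothetical stand-in for a canopy RTM: maps LAI to a reflectance spectrum."""
    bands = np.linspace(0.0, 1.0, n_bands)
    soil = 0.10 + 0.15 * bands                  # assumed bare-soil spectrum
    veg = 0.05 + 0.45 * bands ** 2              # assumed dense-canopy spectrum
    cover = 1.0 - np.exp(-0.5 * np.asarray(lai, dtype=float))[..., None]
    return (1.0 - cover) * soil + cover * veg

def rbf(a, b, length_scale):
    """RBF kernel between the rows of a (m, d) and b (n, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

def train_hybrid(n_sim=500, noise_sd=0.005, length_scale=0.05, ridge=1e-2, seed=0):
    """Simulate spectra with the RTM, then fit KRR from spectra to LAI."""
    rng = np.random.default_rng(seed)
    lai = rng.uniform(0.0, 5.0, n_sim)
    spectra = toy_rtm(lai)
    spectra = spectra + rng.normal(0.0, noise_sd, spectra.shape)
    offset = lai.mean()                         # center the target
    gram = rbf(spectra, spectra, length_scale)
    weights = np.linalg.solve(gram + ridge * np.eye(n_sim), lai - offset)
    return spectra, weights, offset, length_scale

def predict(model, spectra_new):
    """Apply the trained model to new spectra in one vectorized step."""
    train_spectra, weights, offset, length_scale = model
    return rbf(spectra_new, train_spectra, length_scale) @ weights + offset

model = train_hybrid()
true_lai = np.linspace(0.5, 3.5, 50)
estimated = predict(model, toy_rtm(true_lai))
print("RMSE:", np.sqrt(np.mean((estimated - true_lai) ** 2)))
```

Once trained, the model is applied to all pixels at once instead of searching a LUT per pixel, which is the speed gain that motivates hybrid schemes.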