Tools and Processes for EO Research: A Scalable EO Data Processing Service Model
Rivolta, Giancarlo¹; Farres, Jordi²; Pinto, Salvatore¹; Cuccu, Roberto¹; Santoro, Maurizio³; Del Bianco, Samuele⁴; Oliva Balague, Roger⁵
¹Logica, ITALY; ²ESA ESRIN, ITALY; ³GAMMA RS, SWITZERLAND; ⁴CNR-IFAC, ITALY; ⁵ESA ESAC, SPAIN

The ESA Research and Service Support (RSS) service has provided resources to support Earth Observation (EO) data exploitation for many years. Recently, the RSS Grid Processing On-Demand (G-POD) service has been enhanced to provide the flexibility, in terms of data management and processing capacity, required by both EO research projects and EO algorithm validation activities, confirming its relevance to EO data exploitation in the coming Sentinel era as well. Service flexibility has been recognized as key to guaranteeing efficient EO research support. It has been obtained by enhancing the G-POD data management process and by implementing the new RSS scaling-up process, designed to resort to cloud infrastructure in a timely and cost-effective manner. In this paper we introduce the RSS service model and its recent enhancements. As concrete examples of how flexible support can produce value for EO research projects and algorithm validation activities, the results of three projects recently supported by RSS are also reported.
The first example, belonging to the class of "PI algorithm processing support" for the ENVISAT mission, is the BIOMASAR-II project, aimed at generating a pan-boreal Growing Stock Volume (GSV) map for the whole of 2010 by processing ASAR WS and GM multi-temporal data stacks with the existing, validated biomass retrieval algorithm imported onto the G-POD platform. The project processed about 70,000 ASAR products (WSM and GM1), using more than 50,000 CPU hours over 7 months in 2012 on the G-POD infrastructure.
Figure 1. Number of backscatter observations per pixel after processing of Envisat ASAR ScanSAR data on G-POD.
The processing was arranged in two separate services to introduce some flexibility into the long BIOMASAR processing chain. In the first service (service 1), the SAR dataset was processed from the original radar geometry to geometrically and radiometrically corrected geocoded images.
The second service (service 2) ingested the processed images and applied a model-based approach to retrieve forest GSV from the SAR backscatter. For service 1, the GAMMA Software was installed on G-POD, and batch processing scripts were ingested in G-POD and then operated via an ad hoc GUI. In this way, the operator could access the G-POD SAR database and process the data directly on the platform, exploiting multiple processing cores and avoiding the download of a huge amount of data for local processing. To optimize the processing resources available to the project, the study area, consisting of all land masses north of 30 degrees N latitude, was divided into 23 sub-regions. The large number of images within each sub-region forced further sub-setting into several short time intervals, resulting in several hundred processing tasks. The creation of each processing task required significant operator intervention to avoid processing duplicates present in the G-POD SAR database. The data search interface was found to be insufficient to support fast selection and de-selection of images when more than 100 of them were involved. Service 2 required the Matlab programming environment. Batch scripts were ingested in G-POD and a related GUI was created. As for service 1, the bottleneck lay in data selection and in the amount of resources available for processing, which forced working with rather small processing regions. Using multiple processors, however, made it possible to process several areas simultaneously, resulting in better performance than processing the data on local servers would have achieved.
Figure 2. Pan-boreal hyper-temporal RGB composite (R: average backscatter, G: minimum backscatter, B: temporal variability).
The second project belongs to the class of "PI algorithm processing support" for cross-sensor comparison.
The KLIMA (Kyoto protocoL Informed Management of the Adaptation) project aimed to investigate the ultimate capabilities of IASI (Infrared Atmospheric Sounding Interferometer) on the MetOp-A satellite for the retrieval of XCO2 total abundance averaged over a monthly to seasonal time scale and over a spatial scale compatible with the requirements of a comparison with the CO2 products of the satellite missions GOSAT (Greenhouse gases Observing SATellite) and OCO (Orbiting Carbon Observatory). Although the processing time per observation was around 2 hours, G-POD processed 5 TB of IASI data, corresponding to 2,300 orbits and more than 250,000 observations covering the period from March 2010 to February 2011, in 4 months during 2012 [Del Bianco et al. 2013]. The G-POD resources allowed the bulk processing of IASI measurements with the KLIMA algorithm. The features of the G-POD system made the geographical and temporal selection of IASI observations easy and provided an adequate means to analyze the IASI data and to conduct a global, one-year study of KLIMA performance for the retrieval of the XCO2 column. The KLIMA project results could be further enhanced by extending the analysis to the complete IASI dataset and by updating the algorithm for the analysis of IASI/MetOp-B observations. In addition, the G-POD service could be extended to third-party users to make the analysis tool available to the scientific community. Finally, with different optimizations and a different frequency band selection of the IASI spectra, the KLIMA-IASI program integrated on G-POD can be used for other purposes: the IASI measurements could be of interest for the validation of other missions and for the study of other key atmospheric constituents such as CH4 and CO. Scientific results have been reported in Del Bianco, S., et al. (2013), XCO2 retrieved from IASI using KLIMA algorithm, submitted to Annals of Geophysics.
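As a rough back-of-the-envelope check on the KLIMA figures above, the following Python sketch estimates the aggregate computing effort of the campaign. It rests on simplifying assumptions that are not stated in the source: the 2-hour figure is taken as CPU time on a single core, exactly 250,000 observations are counted, and the 4 months are approximated as 120 days of continuous processing. The function name is illustrative only.

```python
def average_concurrency(n_tasks, cpu_hours_per_task, wall_clock_days):
    """Average number of cores kept busy to finish all tasks in the given time."""
    total_cpu_hours = n_tasks * cpu_hours_per_task
    wall_clock_hours = wall_clock_days * 24
    return total_cpu_hours / wall_clock_hours


# Assumed inputs, rounded from the figures quoted in the text.
N_OBSERVATIONS = 250_000      # "more than 250,000 observations"
HOURS_PER_OBSERVATION = 2.0   # "around 2 hours", assumed to be single-core CPU time
CAMPAIGN_DAYS = 4 * 30        # "4 months", approximated as 120 days

klima_cpu_hours = N_OBSERVATIONS * HOURS_PER_OBSERVATION
klima_serial_years = klima_cpu_hours / (365 * 24)
klima_cores = average_concurrency(
    N_OBSERVATIONS, HOURS_PER_OBSERVATION, CAMPAIGN_DAYS
)

print(f"Total effort:         {klima_cpu_hours:,.0f} CPU hours")
print(f"Single-core duration: {klima_serial_years:.1f} years")
print(f"Average concurrency:  {klima_cores:.0f} cores")
```

Under these assumptions the campaign amounts to about 500,000 CPU hours, that is roughly 57 years on a single core, or on the order of 170 cores kept busy on average over the 4 months. The real figures depend on the actual per-observation cost and on how G-POD scheduled the jobs.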
The third project, belonging to the “Mission testbed” category, is the SMOS testbed, which aims to provide a flexible test environment to support the ESA calibration team in L1 calibration and the Expert Support Laboratories (ESLs) in L2 Soil Moisture and Ocean Salinity pre-validation. To streamline the optimization process aimed at ensuring that the final SMOS products are better corrected for orbital, seasonal and long-term variations, proposed improvements to the calibration procedure or to the image reconstruction algorithms are first validated through G-POD SMOS testbed campaigns. Such validation campaigns require processing daily samples of SMOS products over specific validation sites around the globe, for the entire mission. As an example, one of these G-POD testbed campaigns allowed the re-processing of all the Calibration CRS products (more than 20,000 products in a 25-month dataset) and of the sampled science products (more than 6,000 products covering the 25-month dataset) over the North Atlantic and Pacific areas, from L0 to L1C. A future enhanced version of the G-POD SMOS testbed is foreseen to introduce the instrument stability analysis algorithms, which the different ESLs currently apply to the L1C products derived from G-POD. This enhancement will allow the SMOS calibration team and the L2 ESLs to obtain the pre-defined quality metrics analysis directly as an output of the G-POD testbed processing.
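The daily sampling over validation sites that such campaigns rely on can be illustrated with a minimal Python sketch. It does not reproduce the G-POD catalogue or the actual SMOS testbed selection logic: the product tuples, the site names, the bounding boxes and the rule of keeping the first product per site and day are all hypothetical, chosen only to make the idea concrete.

```python
from datetime import date

# Hypothetical site boxes (lat_min, lat_max, lon_min, lon_max); not the real SMOS sites.
SITES = {
    "north_atlantic": (30.0, 60.0, -60.0, -10.0),
    "pacific": (-30.0, 30.0, -180.0, -120.0),
}


def daily_samples(products, sites):
    """Keep the first product per (site, day) whose centre falls inside the site box."""
    selected = {}
    for name, day, lat, lon in products:
        for site, (lat_min, lat_max, lon_min, lon_max) in sites.items():
            inside = lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
            if inside and (site, day) not in selected:
                selected[(site, day)] = name
    return selected


# Made-up catalogue entries: (product name, sensing day, centre latitude, centre longitude).
catalogue = [
    ("prod_a", date(2011, 1, 1), 45.0, -30.0),
    ("prod_b", date(2011, 1, 1), 46.0, -31.0),   # same site and day as prod_a, skipped
    ("prod_c", date(2011, 1, 2), 45.0, -30.0),
    ("prod_d", date(2011, 1, 1), 0.0, -150.0),
    ("prod_e", date(2011, 1, 1), 70.0, 20.0),    # outside both boxes, skipped
]

samples = daily_samples(catalogue, SITES)
for (site, day), name in sorted(samples.items()):
    print(site, day.isoformat(), name)
```

Reducing the archive to one product per site and day keeps a whole-mission campaign down to a manageable number of processing tasks while still covering every day of the mission over each site.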