Making One-Class Classification Feasible for Large Area Applications - An Iterative Approach.
Mack, Benjamin; Waske, Björn
University of Bonn, GERMANY

With the Sentinel missions, the availability of multitemporal data sets that cover large areas and come from complementary sensor types (e.g. optical and SAR data) will increase. Such data are particularly suitable for the classification of agricultural and (semi-)natural land cover classes. These areas are often characterized by great temporal variability and typical spatial patterns of high-frequency land cover changes. Thus, monotemporal approaches might be limited in terms of accuracy. In contrast, time series contain information about the phenology and allow the classification of classes which might not be separable with a single acquisition. Moreover, spectral ambiguities might be overcome by combining data from different Earth Observation systems (e.g. Sentinel-1 and Sentinel-2).
In the vast majority of cases, supervised classification methods are used for land cover classification. These methods require a training set, i.e. pixels whose class labels are known, for training the classifier. It is well known that the nature of the training set can have a significant impact on the classifier performance and the mapping accuracy.
Ideally, the classifier is trained with an exhaustive and representative training set. That is to say, i) labeled samples must be available for all classes within the study area and ii) the samples of each class should cover its distribution in the feature space. Generating such training sets can be expensive, especially if the area of interest is very large and contains a large number of classes.
However, many users are interested in only a few classes. In this case, one-class classifiers are an attractive option. These algorithms only require labeled samples for the class of interest, i.e. the positive class, but no samples of any other class, i.e. the negative class. Thus, the effort for generating the training set can be significantly reduced compared to a standard training set that comprises all classes.
One-class classifiers can be categorized into two general groups: i) classifiers that are trained only with labeled positive samples, and ii) classifiers that additionally exploit unlabeled data during the training stage.
Approaches of the first group (e.g. the one-class Support Vector Machine; Schoelkopf et al. 2001) exhibit short runtimes. However, several studies have shown that their classification accuracy is often significantly lower than the accuracy achievable with supervised classification. This is especially the case when the classification problem is difficult, i.e. when the positive and the negative classes overlap in the feature space.
In this case, approaches of the second group are, theoretically, still able to achieve accuracies comparable to supervised classifiers. Examples of such algorithms are the biased Support Vector Machine (biasedSVM) (Liu et al. 2003) and the Positive and Unlabeled Algorithm (PUL) (Li et al. 2011). However, this requires that the unlabeled data be exploited properly and accurately. This is a difficult task in general because the feature space might be high dimensional (e.g. a time series consisting of optical and SAR data).
Furthermore, when classifying large areas, a large number of unlabeled pixels is available. On the one hand, using a very large number of pixels makes the runtime of these approaches prohibitively long or the problem even intractable. On the other hand, using an insufficient number of unlabeled pixels can lead to inaccurate classification results.
To overcome this problem, we present an iterative one-class classification scheme. In the first iteration, a one-class classifier is trained with the labeled positive data and a moderate number of unlabeled examples. Using this initial model to classify the whole image leads to a classification result with a negligible false negative rate, i.e. the probability of misclassifying a pixel of the positive class as belonging to the negative class is zero or relatively low. The false positive rate, on the other hand, can still be high, i.e. the probability of misclassifying a pixel of the negative class as belonging to the positive class is high. This is why all pixels classified as negative can be rejected and excluded from the following iterations without making a significant error. In the next iteration, the same steps are carried out, but only on the pixels which have not yet been excluded.
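The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this study: the function names `train_score` and `iterative_occ`, the parameters `n_unlabeled` and `max_iter`, and the simple linear scoring rule are all hypothetical. The scoring rule is a stand-in for a real classifier such as the biasedSVM, and the loop stops when no further pixels are rejected, which is a placeholder for the Bayes-based stopping criterion.

```python
import random


def train_score(pos, unl):
    """Stand-in one-class model (assumption, not the biasedSVM).

    Scores a pixel by projecting it onto the direction that points from
    the mean of the unlabeled sample to the mean of the positive sample.
    """
    dim = len(pos[0])
    mean_pos = [sum(p[d] for p in pos) / len(pos) for d in range(dim)]
    mean_unl = [sum(u[d] for u in unl) / len(unl) for d in range(dim)]
    w = [a - b for a, b in zip(mean_pos, mean_unl)]

    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    # Conservative threshold: the lowest score among the positive training
    # samples, so that no training positive is rejected (low false negatives,
    # possibly many false positives).
    threshold = min(score(p) for p in pos)
    return score, threshold


def iterative_occ(pos, pixels, n_unlabeled=200, max_iter=10, seed=0):
    """Return indices of pixels that were never rejected as negative."""
    rng = random.Random(seed)
    remaining = list(range(len(pixels)))
    for _ in range(max_iter):
        if not remaining:
            break
        # Draw a moderate number of unlabeled pixels from the remaining pool.
        sample = rng.sample(remaining, min(n_unlabeled, len(remaining)))
        score, threshold = train_score(pos, [pixels[i] for i in sample])
        # Classify every remaining pixel; keep only those labeled positive.
        kept = [i for i in remaining if score(pixels[i]) >= threshold]
        if len(kept) == len(remaining):
            break  # placeholder stop: nothing was rejected in this iteration
        remaining = kept
    return remaining
```

Because each iteration draws its unlabeled sample only from pixels that survived the previous one, the sample concentrates on the region of the feature space that is still ambiguous, while its size, and therefore the training cost, stays moderate.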
The iterations are repeated until a stopping criterion is met, which is based on the theoretically optimal Bayes' rule. Bayes' rule has been applied before in the framework of one-step one-class classification, where it was realized in the original feature space (Guerrero-Curieses et al. 2002, Mantero et al. 2005). In these frameworks the main difficulty was to derive the necessary probability density functions, even though the data sets of their experiments did not exceed six dimensions. Here, the estimation of the probability densities is a feasible problem because the feature space is the continuous one-dimensional output of the one-class classifier.
The study area is located in Luxembourg, where RapidEye and Radarsat-2 time series are available for the analysis. The proposed approach is evaluated for several agricultural land cover classes (amongst others grassland, summer cereals, winter cereals, potatoes, corn, and rapeseed). Extensive reference data were available on a plot basis.
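To illustrate why density estimation becomes feasible on the one-dimensional classifier output, the following sketch estimates the two densities needed for Bayes' rule with a simple Gaussian kernel density estimator. It makes several assumptions that are not taken from this study: the helper names `gaussian_kde` and `posterior_positive`, the fixed bandwidth, and a known prior probability of the positive class (`prior_pos`). How the posterior is turned into the actual stopping criterion is not shown here.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate as a function of s."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(s):
        return norm * sum(
            math.exp(-0.5 * ((s - x) / bandwidth) ** 2) for x in samples
        )

    return density


def posterior_positive(s, p_pos, p_all, prior_pos):
    """Bayes' rule on the classifier output s: P(+|s) = P(+) p(s|+) / p(s).

    p_pos is estimated from the scores of the labeled positive samples,
    p_all from the scores of the unlabeled pixels (a mixture of both classes).
    prior_pos is assumed to be known or estimated elsewhere.
    """
    denom = p_all(s)
    if denom <= 0.0:
        return 0.0
    # Clip to 1.0 because the two densities are estimated independently.
    return min(1.0, prior_pos * p_pos(s) / denom)
```

A pixel with classifier output `s` would then be assigned to the positive class if `posterior_positive(s, ...)` is at least 0.5, which corresponds to Bayes' rule for two classes with equal misclassification costs.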
The proposed technique is evaluated independently for each class, with three different input data sets: the RapidEye time series, the Radarsat-2 time series (using only the VV and VH polarisations), and the combined optical and SAR time series. The achieved classification accuracies are compared with those of a common supervised classifier, a Support Vector Machine. A detailed accuracy assessment shows that the proposed approach performs similarly in terms of accuracy. Moreover, the combination of SAR and multispectral data increases classification accuracies.
Especially for large data sets, the presented iterative one-class classification constitutes a feasible approach and a useful modification of one-class classification, which makes it relevant with regard to Sentinel-1 and Sentinel-2.