This paper describes the relationship between spectral resolution and classification accuracy in analyses of hyperspectral imaging data acquired from crop leaves. The main scope is to discuss and reduce the risk of model over-fitting. Over-fitting of a classification model occurs when too many and/or irrelevant model terms are included (i.e., a large number of spectral bands), and it may lead to low robustness/repeatability when the classification model is applied to independent validation data. We outline a simple way to quantify the level of model over-fitting by comparing the observed classification accuracies with those obtained from explanatory random data. Hyperspectral imaging data were acquired from two crop-insect pest systems: (1) potato psyllid (Bactericera cockerelli) infestations of individual bell pepper plants (Capsicum annuum) with the acquisition of hyperspectral imaging data under controlled-light conditions (data set 1), and (2) sugarcane borer (Diatraea saccharalis) infestations of individual maize plants (Zea mays) with the acquisition of hyperspectral imaging data from the same plants under two markedly different image-acquisition conditions (data sets 2a and b). For each data set, reflectance data were analyzed based on seven spectral resolutions by dividing 160 spectral bands from 405 to 907 nm into 4, 16, 32, 40, 53, 80, or 160 bands. In the two data sets, similar classification results were obtained with spectral resolutions ranging from 3.1 to 12.6 nm. Thus, the size of the initial input data could be reduced fourfold with only a negligible loss of classification accuracy. In the analysis of data set 1, several validation approaches all demonstrated consistently that insect-induced stress could be accurately detected and that therefore there was little indication of model over-fitting. In the analyses of data set 2, inconsistent validation results were obtained and the observed classification accuracy (81.06%) was only a few percentage points above that obtained using random data (66.7-77.4%). Thus, our analysis highlights a potential risk of model over-fitting and emphasizes the importance of testing for this important aspect as part of developing reliable and robust classification models.
Vol. 9, Iss. 1 Virtual Journal for Biomedical Optics
Christian Nansen, Leandro Delalibera Geremias, Yingen Xue, Fangneng Huang, and Jose Roberto Parra, "Agricultural Case Studies of Classification Accuracy, Spectral Resolution, and Model Over-Fitting," Appl. Spectrosc. 67, 1332-1338 (2013)