Petroleum hydrocarbons are contaminants of great significance. The analytical method commonly used to assess total petroleum hydrocarbons (TPH) in soil samples is based on extraction with 1,1,2-trichlorotrifluoroethane (Freon 113), a substance whose use has been prohibited by the Environmental Protection Agency. During the past 20 years, a new quantitative methodology based on radiation reflected from solids has been widely adopted. In this approach, reflectance spectra across the visible, near-infrared, and shortwave-infrared region (400-2500 nm) are modeled against constituent values determined by traditional analytical chemistry methods, and the resulting models are then used to predict the constituents of unknown samples. This technology is environmentally friendly and permits rapid, cost-effective measurement of large numbers of samples. It therefore greatly reduces chemical analysis costs and secondary pollution, opening a new dimension in environmental monitoring. In this study, we adapted this approach and developed a procedure by which hydrocarbon contamination in soils can be determined rapidly, accurately, and cost-effectively from reflectance spectroscopy alone. Artificially contaminated samples were analyzed chemically and spectrally to build a database of five soils contaminated with three types of petroleum hydrocarbons (PHCs), creating 15 datasets of 48 samples each at contamination levels of 50-5000 ppm (parts per million, by weight). A brute-force preprocessing approach combined eight different preprocessing techniques with all possible datasets, resulting in 120 different mutations of each dataset. The brute-force search was run on an innovative computing system developed for this study. A new parameter for evaluating model performance, model performance scoring (MPS), is proposed based on a combination of several common statistical parameters. The effect on modeling accuracy of dividing the data into training, validation, and test sets is also discussed.
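The abstract does not name the eight preprocessing techniques or state how they were combined into 120 mutations, so the following is only a minimal sketch of the brute-force idea, assuming four common spectral preprocessing steps (absorbance transform, moving-average smoothing, first derivative, standard normal variate) as a hypothetical stand-in. It enumerates every non-empty subset of the steps, applies each subset in a fixed order, and pairs every resulting pipeline with every dataset. The step list, function names, and dataset labels are illustrative assumptions, and the sketch does not reproduce the paper's count of 120.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Spectrum = List[float]
Step = Tuple[str, Callable[[Sequence[float]], Spectrum]]


def absorbance(r: Sequence[float]) -> Spectrum:
    """Convert reflectance R (0 < R <= 1) to apparent absorbance log10(1/R)."""
    return [math.log10(1.0 / v) for v in r]


def smooth(x: Sequence[float], window: int = 3) -> Spectrum:
    """Centred moving average; the half-window at each edge is dropped."""
    half = window // 2
    return [sum(x[i - half:i + half + 1]) / window for i in range(half, len(x) - half)]


def first_derivative(x: Sequence[float]) -> Spectrum:
    """Forward finite difference between adjacent bands (one value shorter)."""
    return [b - a for a, b in zip(x, x[1:])]


def snv(x: Sequence[float]) -> Spectrum:
    """Standard normal variate: centre and scale a spectrum by its own mean and SD."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return [(v - mean) / sd for v in x]


# HYPOTHETICAL step list (not the paper's eight techniques), in application order.
STEPS: List[Step] = [
    ("absorbance", absorbance),
    ("smooth", smooth),
    ("derivative", first_derivative),
    ("snv", snv),
]


def enumerate_pipelines(steps: Sequence[Step]) -> List[Tuple[Step, ...]]:
    """Every non-empty subset of steps; combinations() keeps the list order."""
    return [combo for k in range(1, len(steps) + 1) for combo in combinations(steps, k)]


def apply_pipeline(spectrum: Sequence[float], pipeline: Sequence[Step]) -> Spectrum:
    """Run the steps of one pipeline on a single spectrum, in sequence."""
    out = list(spectrum)
    for _, fn in pipeline:
        out = fn(out)
    return out


def brute_force(datasets: Dict[str, List[Spectrum]], steps: Sequence[Step] = STEPS):
    """Yield (dataset name, pipeline label, preprocessed spectra) for every pairing."""
    for name, spectra in datasets.items():
        for pipeline in enumerate_pipelines(steps):
            label = "+".join(step_name for step_name, _ in pipeline)
            yield name, label, [apply_pipeline(s, pipeline) for s in spectra]


if __name__ == "__main__":
    reflectance = [0.20, 0.25, 0.31, 0.35, 0.42, 0.40, 0.38, 0.45, 0.50, 0.55]
    # Hypothetical dataset names for two soil/PHC combinations.
    datasets = {"soil_A_diesel": [reflectance], "soil_B_crude": [reflectance[::-1]]}
    variants = list(brute_force(datasets))
    print(len(variants))  # 30 = 2 datasets x 15 pipelines
```

Applying the steps in one fixed order keeps each subset well defined (for example, the logarithmic absorbance transform is always applied to raw reflectance, never to already-centred values); a study that also varied step order would enumerate permutations instead of subsets.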
The results of this study clearly show that low TPH concentrations in selected soils can be predicted with high precision. Dividing a dataset into training, validation, and test groups affects the modeling process, and different preprocessing methods, alone or in combination, need to be selected according to soil type and PHC type. MPS proved to be a better parameter than the ratio of prediction to deviation for selecting the best-performing model, yielding models of the same performance that were less complicated and more stable. The use of the “all possibilities” system proved essential for efficient, optimal modeling of reflectance spectroscopy data.
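The abstract compares MPS with the ratio of prediction to deviation (RPD) but does not give the MPS formula. The sketch below therefore shows only the standard pieces, under stated assumptions: a seeded random split of one 48-sample dataset into training, validation, and test groups (the 50/25/25 proportions are a hypothetical choice), the usual RMSE, R², and RPD (standard deviation of the reference values divided by RMSE), and an explicitly HYPOTHETICAL combined score, `illustrative_score`, that rewards test accuracy while penalizing the validation/test gap (instability) and the number of latent variables (complexity). That score is not the published MPS; it only illustrates why a combined score can prefer a simpler, more stable model that RPD alone would not distinguish.

```python
import math
import random
from typing import List, Sequence, Tuple


def split_indices(n: int, fractions=(0.5, 0.25, 0.25), seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Randomly partition sample indices into training, validation, and test groups."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root-mean-square error of predictions y_hat against reference values y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rpd(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Sample standard deviation of the reference values divided by the RMSE."""
    mean = sum(y) / len(y)
    sd = math.sqrt(sum((a - mean) ** 2 for a in y) / (len(y) - 1))
    return sd / rmse(y, y_hat)


def illustrative_score(r2_val: float, r2_test: float, n_latent: int,
                       max_latent: int = 15, penalty: float = 0.1) -> float:
    """HYPOTHETICAL combined score, NOT the published MPS formula.

    Starts from test R2, subtracts the validation/test gap (instability)
    and a penalty proportional to the number of latent variables (complexity).
    """
    return r2_test - abs(r2_val - r2_test) - penalty * n_latent / max_latent


if __name__ == "__main__":
    train, val, test = split_indices(48)  # 48 samples per dataset, as in the study
    print(len(train), len(val), len(test))  # 24 12 12
```

Because the split is seeded, rerunning with different `seed` values is one simple way to see how strongly a given partition affects the resulting scores.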
Guy Schwartz, Eyal Ben-Dor, and Gil Eshel, "Quantitative Assessment of Hydrocarbon Contamination in Soil Using Reflectance Spectroscopy: A “Multipath” Approach," Appl. Spectrosc. 67, 1323-1331 (2013)