The spectra in spectral reflectance datasets tend to be quite correlated and therefore they can be represented more compactly using standard techniques such as principal components analysis (PCA) as part of a lossy compression strategy. However, the presence of outlier spectra can often increase the overall error of the reconstructed spectra. This paper introduces a new outlier modeling (OM) method that detects, clusters, and separately models outliers with their own set of basis vectors. Outliers are defined in terms of the robust Mahalanobis distance using the fast minimum covariance determinant algorithm as a robust estimator of the multivariate mean and covariance from which it is computed. After removing the outliers from the main dataset, the performance of PCA on the remaining data improves significantly; however, since outlier spectra are a part of the image, they cannot simply be ignored. The solution is to cluster the outliers into a small number of clusters and then model each cluster separately using its own cluster-specific PCA-derived bases. Tests show that OM leads to lower spectral reconstruction errors of reflectance spectra in terms of both normalized RMS and goodness of fit.
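The pipeline described in the abstract (robust outlier detection, clustering of the outliers, then one PCA basis per group) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's `MinCovDet` as the fast MCD estimator, a chi-square quantile as the outlier cutoff, and plain k-means for clustering. The function names, the 0.975 quantile, the cluster count, and the synthetic demo data are all hypothetical, and the iterative refinement step listed in Table 5 is omitted.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.cluster import KMeans
from sklearn.covariance import MinCovDet
from sklearn.decomposition import PCA


def fit_outlier_model(spectra, n_basis=3, n_clusters=5, quantile=0.975, seed=0):
    """Label each spectrum as inlier (0) or outlier cluster (1..k), fit one PCA per group."""
    n_samples, n_bands = spectra.shape
    # Robust mean and covariance via FastMCD, then squared robust Mahalanobis distances.
    mcd = MinCovDet(random_state=seed).fit(spectra)
    d2 = mcd.mahalanobis(spectra)
    # Assumed cutoff: chi-square quantile with one degree of freedom per band.
    is_outlier = d2 > chi2.ppf(quantile, df=n_bands)

    labels = np.zeros(n_samples, dtype=int)
    n_out = int(is_outlier.sum())
    if n_out > 0:
        km = KMeans(n_clusters=min(n_clusters, n_out), n_init=10, random_state=seed)
        labels[is_outlier] = 1 + km.fit_predict(spectra[is_outlier])

    models = {}
    for group in np.unique(labels):
        members = spectra[labels == group]
        k = min(n_basis, len(members), n_bands)  # PCA cannot exceed these sizes
        models[int(group)] = PCA(n_components=k).fit(members)
    return labels, models


def reconstruct(spectra, labels, models):
    """Project each spectrum onto its own group's basis and map it back."""
    out = np.empty_like(spectra, dtype=float)
    for group, pca in models.items():
        mask = labels == group
        out[mask] = pca.inverse_transform(pca.transform(spectra[mask]))
    return out


def rms_error(a, b):
    """Plain RMS difference over all samples and bands."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


def make_demo_spectra(seed=0, n_bands=16):
    """Synthetic stand-in data: one large smooth group plus two small deviating groups."""
    rng = np.random.default_rng(seed)
    wl = np.linspace(0.0, 1.0, n_bands)

    def group(n, center, slope):
        base = center + slope * (wl - 0.5)
        bump = rng.normal(0.0, 0.05, (n, 1)) * np.sin(np.pi * wl)
        return base + bump + rng.normal(0.0, 0.01, (n, n_bands))

    return np.vstack([group(400, 0.5, 0.2), group(60, 0.2, -0.3), group(60, 0.8, 0.3)])


if __name__ == "__main__":
    spectra = make_demo_spectra()
    labels, models = fit_outlier_model(spectra, n_basis=3, n_clusters=2)
    recon = reconstruct(spectra, labels, models)
    print("groups:", sorted(models), "RMS:", rms_error(spectra, recon))
```

Because each group's PCA is the least-squares optimal affine subspace for that group, the total squared reconstruction error of this sketch cannot exceed that of a single global PCA with the same number of basis vectors. The cost is that a label, a mean, and a basis must be stored per group, which is why the tables report the compression ratio alongside the error.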
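Table 1.
Spectral Accuracy of Reflectance Reconstruction for the Fruits and Flowers Image Using Standard PCA Versus Outlier Modeling and Associated CR^a

| Method | #Spectra | CR | Normalized RMS: Mean | Normalized RMS: % of samples | Normalized RMS: % of samples | GFC: % of samples | GFC: % of samples |
|---|---|---|---|---|---|---|---|
| Classic PCA, 3D | 19200 | 11.8 | 0.68 | 2.48 | 0.71 | 45.70 | 22.13 |
| Classic PCA, 4D | 19200 | 9.5 | 0.77 | 7.00 | 0.40 | 68.47 | 31.24 |
| Classic PCA, 5D | 19200 | 8.0 | 0.83 | 11.09 | 0.02 | 82.84 | 37.64 |
| Outlier modeling Step 1: Inlier Cluster | 11459 | | 0.77 | 5.35 | 0 | 56.42 | 9.25 |
| Outlier modeling Step 1: Outliers | 7741 | | 0.67 | 5.51 | 0.94 | 66.50 | 39.73 |
| Outlier modeling Step 2: Outlier (Cluster 1) | 963 | | 0.90 | 19.94 | 0 | 92.63 | 45.48 |
| Outlier modeling Step 2: Outlier (Cluster 2) | 2932 | | 0.82 | 32.20 | 0 | 97.95 | 84.86 |
| Outlier modeling Step 2: Outlier (Cluster 3) | 844 | | 0.84 | 0 | 0 | 91.11 | 19.67 |
| Outlier modeling Step 2: Outlier (Cluster 4) | 2190 | | 0.76 | 0.55 | 1.19 | 96.89 | 70.55 |
| Outlier modeling Step 2: Outlier (Cluster 5) | 812 | | 0.91 | 47.04 | 0 | 94.58 | 79.06 |
| Outlier modeling Step 2: Mean | | | 0.82 | 19.76 | 0.37 | 95.89 | 68.19 |
| Outlier modeling of the full dataset | 19200 | 10.7 | 0.80 | 11.16 | 0.15 | 72.34 | 33.01 |

^a The reconstruction error for each cluster of spectra is listed separately. OM is done using three basis vectors.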
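The GFC columns report the percentage of spectra whose goodness-of-fit coefficient reaches some threshold. As a minimal sketch, the snippet below uses the standard GFC definition (the absolute inner product of the reference and reconstructed spectra divided by the product of their norms). The helper names and the 0.99 threshold in the usage lines are placeholders for illustration, not the values used in the tables.

```python
import math


def gfc(reference, estimate):
    """Goodness-of-fit coefficient between two spectra; 1.0 means identical shape."""
    dot = sum(r * e for r, e in zip(reference, estimate))
    norm_r = math.sqrt(sum(r * r for r in reference))
    norm_e = math.sqrt(sum(e * e for e in estimate))
    return abs(dot) / (norm_r * norm_e)


def percent_at_least(values, threshold):
    """Percentage of values that are greater than or equal to the threshold."""
    return 100.0 * sum(v >= threshold for v in values) / len(values)


if __name__ == "__main__":
    measured = [[0.10, 0.20, 0.30], [0.40, 0.35, 0.30]]
    rebuilt = [[0.11, 0.19, 0.31], [0.30, 0.35, 0.45]]
    scores = [gfc(m, r) for m, r in zip(measured, rebuilt)]
    # 0.99 is a hypothetical threshold chosen only for this example.
    print(scores, percent_at_least(scores, 0.99))
```

Note that GFC is insensitive to a uniform scaling of the reconstructed spectrum, so it measures agreement in spectral shape rather than in magnitude, which is one reason it is reported alongside an RMS-type measure.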
Table 2.
Accuracy of Reflectance Reconstruction for 10 Multispectral Images from the Hordley Database Using Classic PCA with a 4D Basis Versus the OM Approach Using Three Basis Vectors^a

| Image | Normalized RMS, OM: Mean | Normalized RMS, OM: % of samples | Normalized RMS, OM: % of samples | Normalized RMS, 4D Classic PCA: Mean | Normalized RMS, 4D Classic PCA: % of samples | Normalized RMS, 4D Classic PCA: % of samples | GFC, OM: % of samples | GFC, OM: % of samples | GFC, 4D Classic PCA: % of samples | GFC, 4D Classic PCA: % of samples |
|---|---|---|---|---|---|---|---|---|---|---|
| Daz | 0.52 | 2.25 | 12.06 | 0.45 | 4.10 | 17.72 | 76.73 | 29.07 | 78.02 | 28.43 |
| Persilnonbio | 0.70 | 1.57 | 2.40 | 0.71 | 1.73 | 2.30 | 91.75 | 27.12 | 92.07 | 34.85 |
| Goaheadbars | 0.71 | 5.41 | 2.41 | 0.69 | 6.71 | 4.83 | 76.35 | 35.67 | 78.82 | 34.86 |
| Couscous | 0.80 | 1.85 | 0.31 | 0.79 | 1.64 | 0.86 | 79.60 | 16.86 | 80.43 | 16.00 |
| Elastoplast | 0.78 | 3.57 | 1.16 | 0.78 | 2.77 | 2.23 | 82.68 | 25.12 | 85.79 | 24.20 |
| Kellogg’s | 0.56 | 1.06 | 8.60 | 0.35 | 1.62 | 24.50 | 66.02 | 25.93 | 69.65 | 18.48 |
| Freeform | 0.75 | 4.13 | 5.37 | 0.80 | 6.39 | 2.84 | 91.63 | 47.40 | 93.85 | 54.21 |
| Mulligatawny | 0.80 | 0.39 | 0.82 | 0.78 | 0.09 | 1.21 | 84.58 | 36.49 | 86.91 | 26.57 |
| Vanish | 0.58 | 2.12 | 8.00 | 0.58 | 4.38 | 8.59 | 79.22 | 34.42 | 79.80 | 37.00 |
| Fairy | 0.55 | 0.45 | 7.41 | 0.58 | 0.40 | 6.95 | 80.28 | 28.13 | 84.57 | 29.57 |
| MEAN | 0.67 | 2.28 | 4.85 | 0.65 | 2.98 | 7.20 | 80.88 | 30.62 | 83.00 | 30.41 |

^a The average CRs are 9.5 and 10.7, respectively. The best result in each column is highlighted in bold.
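Table 3.
Accuracy of Reflectance Reconstruction for Seven Multispectral Images from the Columbia University Database Using Classic PCA with Five Eigenvectors as well as the OM Approach Using Three Basis Vectors^a

| Image | Normalized RMS, OM: Mean | Normalized RMS, OM: % of samples | Normalized RMS, OM: % of samples | Normalized RMS, 5D Classic PCA: Mean | Normalized RMS, 5D Classic PCA: % of samples | Normalized RMS, 5D Classic PCA: % of samples | GFC, OM: % of samples | GFC, OM: % of samples | GFC, 5D Classic PCA: % of samples | GFC, 5D Classic PCA: % of samples |
|---|---|---|---|---|---|---|---|---|---|---|
| PomPoms | 0.86 | 21.42 | 0.32 | 0.85 | 15.67 | 0.40 | 86.03 | 48.50 | 86.02 | 44.28 |
| Watercolors | 0.77 | 3.36 | 1.19 | 0.81 | 4.62 | 0.52 | 81.75 | 67.77 | 84.29 | 71.74 |
| Beeds | 0.80 | 15.18 | 0.87 | 0.82 | 7.28 | 0.57 | 59.88 | 29.66 | 63.23 | 19.55 |
| Beer | 0.71 | 13.17 | 2.66 | 0.78 | 17.72 | 0.41 | 96.60 | 90.04 | 97.86 | 91.45 |
| Jelly_beans | 0.82 | 5.69 | 0.50 | 0.85 | 6.67 | 0.51 | 72.66 | 33.49 | 81.47 | 35.41 |
| Stuffed_toys | 0.82 | 9.53 | 1.86 | 0.73 | 5.02 | 0.79 | 87.84 | 41.49 | 48.50 | 24.67 |
| Paints | 0.81 | 2.83 | 0.55 | 0.80 | 0.39 | 0.51 | 89.01 | 49.29 | 82.37 | 39.52 |
| MEAN | 0.80 | 10.17 | 1.13 | 0.80 | 8.19 | 0.53 | 81.96 | 51.46 | 77.67 | 46.66 |

^a The CRs are about 8.0 and 10.7, respectively. The best result in each column is highlighted in bold.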
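Table 4.
Colorimetric Accuracy of Reflectance Reconstruction of 18 Spectral Images Taken from 3 Multispectral Databases Using Classic PCA and the Proposed OM Method^a

| Database | DE2000 (D65), OM: Mean | DE2000 (D65), OM: Median | DE2000 (D65), OM: 90th | DE2000 (D65), 4D Classic PCA: Mean | DE2000 (D65), 4D Classic PCA: Median | DE2000 (D65), 4D Classic PCA: 90th | DE2000 (A), OM: Mean | DE2000 (A), OM: Median | DE2000 (A), OM: 90th | DE2000 (A), 4D Classic PCA: Mean | DE2000 (A), 4D Classic PCA: Median | DE2000 (A), 4D Classic PCA: 90th |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Eastern Finland database | 1.49 | 1.06 | 3.24 | 1.70 | 1.15 | 3.98 | 1.52 | 1.07 | 3.10 | 1.52 | 1.15 | 3.28 |
| Hordley database | 0.90 | 0.72 | 1.98 | 1.19 | 0.89 | 2.49 | 0.95 | 0.70 | 1.75 | 1.02 | 0.77 | 2.08 |
| Columbia database | 1.38 | 0.90 | 3.01 | 1.86 | 1.32 | 3.85 | 1.44 | 0.95 | 3.08 | 1.75 | 1.18 | 3.78 |

^a The average CRs are 9.5 and 10.7, respectively.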
Table 5.
OM and Classic PCA Running Time (in Seconds) for the Smallest and Largest Images in the Datasets

| Image | OM: Fast MCD | OM: K-Means | OM: Iterative Refinement | OM: PCA | OM: Total | 4D Classic PCA |
|---|---|---|---|---|---|---|
| Fruits and Flowers | 14.24 | 0.16 | 1.06 | 0.29 | 15.75 | 0.72 |
| PomPoms | 20.65 | 1.14 | 26.52 | 6.47 | 54.78 | 9.94 |