Coded aperture spectroscopy with denoising through sparsity

Alex Mrozack, Daniel L. Marks, and David J. Brady


Optics Express, Vol. 20, Issue 3, pp. 2297-2309 (2012)
http://dx.doi.org/10.1364/OE.20.002297


Abstract

We compare noise and classification metrics for three aperture codes in dispersive spectroscopy. In contrast with previous theory, we show that multiplex codes may be advantageous even in systems dominated by Poisson noise. Furthermore, ill-conditioned codes with a regularized estimation strategy are shown to perform competitively with well-conditioned codes.

© 2011 OSA

1. Introduction

Following Golay’s seminal work [1], coded aperture spectroscopy has been applied widely over the past sixty years. By turning spectral estimation into a weighing design problem, a large increase in throughput can be achieved. When noise is independent of signal level, the advantages of this method are well established. Optimal weighing designs for estimating a spectrum in the presence of independent Gaussian noise were outlined in [2] in 1979. We note that a counter theory to [2] has recently been proposed in [3], although experimental results [4] and the simulation results of this paper support the established theory.

Two-dimensional arrays allowed for further improvement of system design starting in the 1990s. Mende et al. [5] first used a two-dimensional array to claim both the throughput and multiplex advantages. The work of [5] was improved through the two-dimensional aperture codes of [6], which allow for better conditioning of the measurement. The two-dimensional array also raised the question of multiplex imaging [7, 8]. Other work in coded aperture spectroscopy has involved coding outside of the entrance aperture. Fateley et al. [9] demonstrated a coded aperture spectrometer with encoding via a micro-mirror array.

Table 4.1 of [2] summarizes the motivation to multiplex spectra in the presence of independent Gaussian noise and the motivation to measure spectra directly in the presence of Poisson noise. While the MSE of a multiplexed system decreases by a factor of N/4, where N is the dimension of the code, for independent-Gaussian-noise-limited systems, it increases by a factor of two, regardless of the code dimension, for Poisson-noise-limited systems. These results, however, apply only to the linear unbiased estimator, which was assumed to be the Moore-Penrose pseudo-inverse. While the computing power of the 1970s limited the algorithms available to Harwit and Sloane, they noted “... there are also good arguments in favor of a biased estimate...” This paper studies Harwit and Sloane’s suggestion using modern non-linear estimators. In using regularization techniques based on a sparsity prior, certain solutions are favored over others. Regularization allows measurement energy to be focused into a subspace; the rest of the space is filled in by prior knowledge of the signal class. These biased estimators could allow codes previously determined to be suboptimal to outperform the codes that are optimal for the linear unbiased estimator.

For the remainder of the paper we compare three measurement systems: a slit spectrometer based on the identity matrix, a coded aperture spectrometer based on a pseudo-random matrix with elements uniformly selected from binary {0, 1}, and a coded aperture spectrometer based on the cyclic-s matrix. All three systems are representative of singly encoded spectrometers with binary elements, meaning these systems are achievable with a binary {0, 1} mask and a single detector. The matrices are square and invertible in all cases before processing (in some cases we reduce the rank during regularization). The singular value characteristics of each spectrometer are shown in Figure 1. The spectrometers were chosen to represent non-multiplexed and multiplexed measurements, and ill-conditioned and well-conditioned measurements: the pseudo-random and cyclic-s spectrometers are multiplexed while the slit spectrometer is not, and the slit and cyclic-s spectrometers are well-conditioned while the pseudo-random spectrometer is not. Using these spectrometers, signal estimation performance is first compared over two libraries of one-dimensional spectral signals (Figure 2). One library consists of synthetic data composed of spikes in the canonical basis. The other is a library of reflectance data that varies smoothly over a large wavelength range. Classification performance is then considered: samples are drawn from two source distributions and a classifier is designed to determine from which distribution each sample was drawn.

Fig. 1 SVD spectra of the three measurement systems to be compared. The slit code has a flat spectrum and is not multiplexed. The pseudo-random code has a decaying spectrum and is multiplexed. The cyclic-s code has a flat spectrum and is multiplexed.
Fig. 2 A sample spectrum drawn from each of the libraries, demonstrating the canonical sparsity of the synthetic data set and the smooth variation of the reflectance data set.
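For concreteness, the following is a minimal sketch (not the authors' code) of how the three 127-channel sensing matrices described above could be generated. The S-matrix here is built from a Sylvester-type Hadamard matrix, so it is not literally cyclic, but it shares the throughput and conditioning of the cyclic-s code; the variable names are illustrative.

```python
import numpy as np
from scipy.linalg import hadamard

N = 127                                    # spectral channels, as in the paper

# Slit spectrometer: the identity matrix.
H_slit = np.eye(N)

# Pseudo-random code: binary {0, 1} entries drawn uniformly at random.
rng = np.random.default_rng(0)
H_rand = rng.integers(0, 2, size=(N, N)).astype(float)

# S-matrix code: delete the first row and column of a normalized Hadamard
# matrix of order N + 1 = 128, then map +1 -> 0 and -1 -> 1.
Had = hadamard(N + 1)
H_s = (1 - Had[1:, 1:]) / 2

# Condition numbers illustrate Fig. 1: the slit and S-matrix codes are
# well conditioned, while the pseudo-random code is not.
for name, H in [("slit", H_slit), ("pseudo-random", H_rand), ("S-matrix", H_s)]:
    print(name, np.linalg.cond(H))
```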

2. Signal Estimation

2.1. Mathematical Framework

Coded aperture spectroscopy is a problem defined by a linear measurement model [2]. The generalized linear measurement problem is stated as

g = Hf + n    (1)

where g is the measured data from object f, H is the sensing matrix, and n is noise. There are two goals of linear measurement, which are not mutually exclusive. The first is to optimally select a measurement system, which defines H and therefore the mathematical properties of the system. The second is to select an estimator given knowledge of H and of the class of signals from which f is drawn. Selecting an estimator independent of knowledge of f amounts to selecting a linear unbiased estimator. The results for the linear unbiased estimator are those presented in [2], and are uniquely determined by the condition number and the throughput (the amount of energy transferred) of the matrix H [10]. This paper is concerned with an estimator for a special class of signals from which f may be drawn: the class of sparse signals, which has properties that allow for efficient denoising [11, 12]. Under additive Gaussian noise this estimator, a maximum a posteriori (MAP) estimator, solves the objective function
\hat{f} = \arg\min_{\check{f}} \|g - H\check{f}\|_2^2 + \tau \|\Psi^{-1}\check{f}\|_1    (2)
where \hat{f} is the estimate, τ is the weight of the sparsity-inducing prior (classically a Laplacian distribution), and Ψ is the inverse transform from the basis on which f is sparse to the canonical basis. The l2 norm in the functional comes from a Gaussian negative log-likelihood. The noise models observed in optics are most often a pure Poisson model or a mixed Poisson and Gaussian model, so the more appropriate negative log-likelihood would clearly be the Poisson one. An even better solution would be a minimum mean squared error (MMSE) estimator for the distributions of interest; however, since there is no simple analytical form for the posterior distribution with the priors and penalties we have chosen, computing an MMSE estimate would be impractical, so we do not consider it. For the Poisson-only noise model the negative log-likelihood to be used in a more optimal MAP estimator is
L(g|\check{f}) = 1^T H\check{f} - \sum_{i=1}^{N} g_i \log(e_i^T H\check{f})    (3)
where e_i is the canonical basis vector whose ith component is non-zero, and g_i is the ith component of the measurement [13]. However, the results in [13] from solving for the MAP estimate with the optimal negative log-likelihood are not dramatically better than solving for the Gaussian MAP estimate (at most a 7 percent error improvement was demonstrated in [13]), and the algorithms require non-negativity of the elements of the matrix H. While H contains only non-negative entries in any singly encoded coded aperture spectrometer, the structure of the principal components of the data, along with the structure of the measurement matrix, may make the system amenable to truncated singular value decomposition (truncated-SVD) reconstructions, as described in chapter 8 of [14]. Truncated-SVD aids the reconstruction process because the components of the data that correspond to weak singular vectors are removed, which makes the reconstruction more reliable. The system matrix of reduced dimensionality may have negative entries, and is therefore not usable in the framework laid out by [13]. When using truncated-SVD the objective function becomes
\hat{f} = \arg\min_{\check{f}} \|\tilde{g} - \tilde{H}\check{f}\|_2^2 + \tau \|\Psi^{-1}\check{f}\|_1    (4)

where

\tilde{g} = Wg    (5)
\tilde{H} = WH    (6)
H = USV^T    (7)
W = S_n U_n^T.    (8)
S_n denotes the pseudo-inverse of the n largest components of the diagonal matrix S, and U_n^T contains the corresponding left singular vectors of the sensing matrix H. It is important to note that truncating the singular values of the matrix makes the inversion ill-posed, but not unsolvable: because the signals we are trying to estimate are sparse, they can be estimated over their full dimension using the theory of compressive measurement [15].
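As an illustration of Eqs. (4)–(8), the following is a minimal sketch, not the authors' implementation, of the truncated-SVD projection followed by a proximal-gradient (ISTA) solve with canonical-basis sparsity. The paper itself uses the TwIST algorithm of [12]; the function names here are hypothetical, and the quadratic term carries a factor of 1/2, which only rescales τ.

```python
import numpy as np

def tsvd_projection(H, n_keep):
    """Build W = S_n U_n^T of Eqs. (5)-(8): the pseudo-inverse of the n_keep
    strongest singular values times the corresponding left singular vectors."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    W = np.diag(1.0 / s[:n_keep]) @ U[:, :n_keep].T
    return W @ H, W                        # (H_tilde, W); g_tilde = W @ g

def ista_l1(H, g, tau, n_iter=500):
    """Minimal ISTA solver for (1/2)||g - H f||_2^2 + tau ||f||_1, i.e. Eq. (4)
    with Psi = I (canonical-basis sparsity), up to a rescaling of tau."""
    step = 1.0 / np.linalg.norm(H, 2) ** 2          # 1 / Lipschitz constant
    f = np.zeros(H.shape[1])
    for _ in range(n_iter):
        grad = H.T @ (H @ f - g)                    # gradient of the quadratic term
        z = f - step * grad
        f = np.sign(z) * np.maximum(np.abs(z) - step * tau, 0.0)  # soft threshold
    return f
```

For the pseudo-random code, one would first call tsvd_projection and then run the solver on the reduced system, mirroring the regularized estimation strategy described above.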

2.2. Simulation Setup

To compare the performance of the slit, pseudo-random, and cyclic-s aperture codes, Monte Carlo simulations are performed on two sets of data: a canonically sparse synthetic data set and a smoothly-varying real reflectance data set. The slit code is simply represented by an identity matrix. The cyclic-s matrix is derived from a Hadamard matrix of dimension (n + 1), where n is the dimension of the spectrum to be measured. The pseudo-random matrix is a matrix with half the entries equal to one and half equal to zero, distributed uniformly throughout the matrix. The sparse data set consists of 9, 10, or 11 randomly placed spikes of random heights in a spectrum of 127 wavelengths. The reflectance set is downloadable from http://www.planetary.brown.edu/relab/. It was taken with a Bidirectional Visible Near-Infrared Spectrometer and is downsampled to 5 nm resolution on a 450–1080 nm window, which leads to a signal of 127 dimensions. The data is denoised before simulated measurement by a total variation (TV) method [16], exploiting the smooth variation of reflectance spectra, so the reconstruction does not have to fit noisy data.

One hundred signals are drawn from each data set, measured using each system matrix, and reconstructed by optimizing (4). The optimization is performed via the algorithm presented in [12] for three different signal levels and using two different noise models. One is a Poisson-only noise model, for which multiplexing is assumed suboptimal; the other is a mixed noise model with both Poisson and Gaussian noise. The Gaussian noise standard deviation is set to be on the same order as the standard deviation of the Poisson noise in slit spectrometer measurements, i.e. σg is set to the Poisson standard deviation σp of the mean spike value for the synthetic set, and to σp of the mean signal value for the real data set. This level of Gaussian standard deviation represents the transition point between a Poisson-limited and a Gaussian-limited measurement for the slit spectrometer. We are not concerned with the zero-level signal locations in the canonically sparse signals, because they do not contribute to noise in the Poisson-limited slit spectrometer measurements. The smoothly-varying signals, on the other hand, are assumed to be approximately constant, and therefore the transition between a Poisson-limited and a Gaussian-limited measurement is assumed to happen when the Gaussian noise standard deviation equals the square root of the mean value of the signal. For example, if the canonical data is measured with 1000 photons on average, then each spike generates 100 photons on average; this gives a shot noise level of 10 photons for each peak, which is what σg is set to for the mixed noise measurements. If instead the reflectance data is measured, it is assumed to have uniform intensity throughout the spectrum; a signal level of 10000 photons then generates about 80 photons per band, so σg is set to 9 photons. The only difference in the estimation process between the two data sets is the denoising step: the penalty is canonical l1 (i.e. Ψ is the identity matrix) for the synthetic data, and a total variation penalty (i.e. Ψ−1 transforms f into a gradient pseudo-basis) for the reflectance data. In measuring performance, each signal is first estimated using the regularization parameter τ in (4) that yields the lowest MSE on a training set; that is, each signal is reconstructed for many τ over a large range. (For the pseudo-random matrix the optimal truncation level is selected as well.) After the performance is evaluated on the training data sets with optimal parameters, a new random draw of one hundred signals from each data set generates a disjoint test set, and the process is repeated using only the most commonly selected parameters for each combination of measurement system, signal level, and noise type, as determined from the training set.
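A sketch of the simulated measurement step under the two noise models is given below. The photon-budget normalization and variable names are assumptions made for illustration; they follow the convention described above, in which σg = 0 gives the Poisson-only model and σg > 0 the mixed model.

```python
import numpy as np

rng = np.random.default_rng(1)

def measure(H, f, total_photons, sigma_g=0.0):
    """Simulate g = H f + n for a spectrum f scaled to a given photon budget.
    Poisson (shot) noise is always applied; sigma_g > 0 adds the Gaussian
    component of the mixed noise model."""
    f_scaled = f / f.sum() * total_photons        # allocate the photon budget
    g = rng.poisson(H @ f_scaled).astype(float)   # Poisson shot noise
    if sigma_g > 0:
        g += rng.normal(0.0, sigma_g, size=g.shape)  # additive Gaussian noise
    return g, f_scaled
```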

The metric used for performance is the mean squared error (MSE), defined by

MSE = \frac{1}{N} \sum_{i=1}^{N} (f_i - \hat{f}_i)^2    (9)

The baseline for comparison is established by using the linear unbiased estimator on the same data. The results of these control experiments should match the predictions of [2].

2.3. Simulation Results

The control experiment results are as predicted by [2] (Table 1). For the Poisson noise model the MSE was a factor of two lower for the slit spectrometer than for the cyclic-s spectrometer for nearly all input signal levels and for both data sets. The ill-conditioned pseudo-random coded spectrometer performs orders of magnitude worse than the well-conditioned systems. When a mixed noise model is employed, however, the MSE deteriorates by a substantially greater amount for the slit spectrometer than for the cyclic-s spectrometer. For the synthetic data the cyclic-s spectrometer outperforms the slit spectrometer by a large margin, and for the real reflectance data the cyclic-s spectrometer performs comparably to the slit spectrometer. Recall that the Gaussian noise level was chosen to be on the same order as the Poisson noise in the native signals; the multiplexed signals, however, have many more photons per measurement. So while the additive noise quickly becomes significant for the slit spectrometer, the multiplexed spectrometers effectively remain photon-limited. Clearly, as more Gaussian noise is added to the point where the noise model is Gaussian dominated, the multiplexing advantage will be realized to its full extent.

Table 1. Confirmation of the linear unbiased estimator results of [2] for the Poisson only noise model, and demonstration of the multiplexing advantage for the mixed noise model.


The convex optimization simulation results are summarized in Table 2. Both the training set, in which optimal parameters are selected to minimize the MSE, and the test set, in which the most commonly selected parameters from the training set are exclusively employed, are considered in the convex optimization analysis; they represent two different measurement scenarios. The training parameter set represents having each spectrum reconstructed by a skilled operator; this is common practice, and the results shown in [17] are of this form. The test set represents having the system calibrated and only using the parameters set in calibration for a given signal level, which would allow a more automated data acquisition process.

Table 2. Convex optimization reconstruction data showing the multiplexing advantage when convex optimization is used. % singular values refers to the percentage of singular values kept after performing truncated-SVD on the data.


The first result from the convex optimization simulations, and perhaps the most important, is the overall performance enhancement achieved through convex optimization. The non-linear processing allowed efficient denoising for most measurement schemes even with suboptimal parameter selection. The one exception is the slit spectrometer measuring canonically sparse data with a Poisson-only noise model, for which the lack of multiplexing makes l1 minimization impractical. Another main result is that in the presence of Poisson-only noise the multiplexed systems perform better relative to the slit spectrometer than the predictions of [2] suggest. In some cases in this empirical study of two sparse signal classes, the multiplexed systems even outperform the slit system in the presence of Poisson-only noise.

For the synthetic, canonically sparse data the multiplexed systems almost always perform better than the slit system. The spikes are reconstructed more accurately via the multiplexed systems [14, 5], and l1 denoising optimally removes the noise on the zero components. Once Gaussian noise is added the effect becomes even more pronounced. While l1 denoising allows the slit code to remove noise from the zero components similarly to the multiplexed codes, the less accurate matching of the peak intensities, discussed in chapter 9 of [14], becomes more apparent. The one exception is that on the cross-validation data the pseudo-random spectrometer performance is unpredictable relative to the slit spectrometer: since the pseudo-random spectrometer reconstructions require two parameters, the effects of non-optimal parameter selection are more strongly felt.

For the real reflectance data the multiplex and slit spectrometers perform nearly the same under Poisson noise, with two exceptions. When optimal parameters are selected, the pseudo-random coded spectrometer outperforms the other two at 1000 photons. At extremely low signal levels it is better to make a few measurements well than many measurements poorly, and the pseudo-random matrix focuses its measurement energy into often as few as 30% of its singular vectors when the truncated-SVD approach is used. This effect is not realized for the synthetic data, because on average many more measurements are needed to represent the synthetic data accurately, necessitating some of the ill-conditioned singular vectors of the pseudo-random system. The other extreme is realized at high signal levels on the reflectance data set: the slit spectrometer is able to reconstruct weak features lost in multiplexing noise, which begin to affect the relative reconstruction performance at high signal levels. The otherwise similar performance can be attributed to the similar coherence [18] of the sensing matrices with the TV pseudo-basis; if the TV basis is taken to be the gradient basis, all the matrices have a worst-case coherence of about 0.95. Once again, when Gaussian noise is added the multiplexed systems drastically outperform the slit spectrometer.
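A coherence figure of this kind can be checked with a short computation. The sketch below takes the gradient pseudo-basis to be a first-difference matrix, which is one possible convention and an assumption on our part; the exact convention used in [18] may differ, so the numerical value is illustrative only.

```python
import numpy as np

def worst_case_coherence(H, Psi):
    """Largest normalized inner product between the rows of the sensing
    matrix H and the columns of the sparsifying (pseudo-)basis Psi."""
    A = H / np.linalg.norm(H, axis=1, keepdims=True)      # unit-norm rows
    B = Psi / np.linalg.norm(Psi, axis=0, keepdims=True)  # unit-norm columns
    return float(np.abs(A @ B).max())

# One possible convention for the "gradient" pseudo-basis of the TV prior:
N = 127
Psi_grad = np.eye(N) - np.eye(N, k=-1)    # first-difference matrix
```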

Finally, we note that the pseudo-random matrix is of paramount importance for spectral imaging [17]. The coded aperture snapshot spectral imager (CASSI) is currently designed under the assumption that any subset of a pseudo-random code is still a valid code with statistics identical to any other part of the code; this is not true for a cyclic-s encoded aperture. The results of this paper show that for TV-sparse signals the pseudo-random code can perform as well as a code with optimally orthogonal columns when truncated-SVD is used. One may question how similar the coded aperture spectrometer matrix is to the CASSI matrix. This depends on the spatial resolution at which the object needs to be reconstructed. In the limit that the power spectral density is uniform over the entire aperture and spatial resolution is not important, the CASSI instrument can be used as a spectrometer. In any case, both instruments suffer from the ill-conditioning of the pseudo-random code. Truncated-SVD estimates have not yet been considered for CASSI; this leaves open a future study that should yield considerable improvements to the utility of the system, since CASSI measures reflectance spectra.

3. Signal Classification

3.1. Mathematical Framework

While estimation MSE gives an intuitive understanding of how well a signal can be reconstructed, it weights the reconstruction quality of all parts of the signal evenly. Here we use classification to clarify which signal characteristics make a signal more distinguishable when multiplexing, or not multiplexing, with a noise source correlated with the signal. Many different classification strategies exist; we present one hypothetical classification scenario for spectroscopy and motivate the use of a common classification technique.

Imagine a researcher has spectral data from two similar species, normally distributed with means {μ1, μ2} and covariances {Σ1, Σ2}. Unfortunately, the difference in means may be small relative to the covariances of the samples and the power of the measurement noise. This would make it difficult for an observer to tell from which distribution each sample was drawn by trying to distinguish reconstructed spectra with the naked eye. For this reason a computational method of distinguishing, or classifying, the spectra is employed. Any classifier needs a set of labeled, or pre-classified, data off of which to base future classifications. However, acquiring the labels of the pre-classified data is expensive; otherwise the researcher would exclusively classify the data through those outside means. We sketch one such process of selecting data to be labeled, and of using that labeled data to build an accurate classifier. This is not intended to be a full discussion of the inner workings of classification; detailed discussions of the methods used in this work can be found in [19, 20]. It is instead intended to provide the reader with some background for interpreting the simulation results in the following sections.

Batch classification as described in [19] and simplified in [20] is designed to prune a labeled set of data for classification, the training set, from the whole acquired data set in an optimized way. The classifier is specified as follows: given a sample x_i drawn from the data set X = {x_1, ..., x_N}, the corresponding label t_i is assigned by

t_i = \phi^T(x_i) w + n    (10)
\phi(x) = [1 \;\; K(x, b_1) \;\; K(x, b_2) \;\; \cdots \;\; K(x, b_n)]^T    (11)
K(x, b_i) = \exp(-\gamma \|x - b_i\|_2^2).    (12)
The weights, w, weigh the importance of matching each basis function, bj. How closely a vector xi matches a basis function bj is determined by the radial basis function K(xi, bj).

In selecting the basis functions we did not know which vectors would be labeled and which would be classified without labels. Since the precision matrix was not a function of the labels, we assumed that all the data was labeled and proceeded with selecting basis functions. Next, data must be selected for labeling. It makes sense to label the data already selected as basis vectors, so we begin with the labeled set X_L = {x_{L,1}, ..., x_{L,N_L}} equal to the set of basis vectors B. This yields a precision matrix from the labeled set of data

A_L^{N_L} = \Phi_L \Phi_L^T + \lambda I    (18)

where the diagonal loading is used to ensure invertibility and \Phi_L corresponds to the columns of \Phi representing the labeled set X_L. The goal is then to add more labels, once again maximizing the determinant of the precision matrix, this time A_L^{N_L+1} in place of A. Since A_L^{N_L+1} is created by adding another column to \Phi_L, the vector x_{L,N_L+1} that maximizes the determinant of A_L^{N_L+1} is

x_{L,N_L+1} = \arg\max_x \{1 + \phi(x)^T (A_L^{N_L})^{-1} \phi(x)\},    (19)

which follows from the matrix determinant lemma [21].
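A minimal sketch of the feature map of Eqs. (10)–(12) and of the determinant-maximizing label-selection step of Eq. (19) is shown below, using the λ = 0.001 and γ = 25 values quoted in Section 3.2. The helper names and data handling are illustrative rather than taken from the paper.

```python
import numpy as np

def phi(x, basis, gamma=25.0):
    """Feature map of Eqs. (10)-(12): a constant term followed by RBF kernel
    evaluations of x against each basis vector."""
    k = np.exp(-gamma * np.sum((basis - x) ** 2, axis=1))
    return np.concatenate(([1.0], k))

def select_next_label(X, labeled_idx, basis, lam=1e-3, gamma=25.0):
    """Greedy step of Eq. (19): among unlabeled samples, pick the one that most
    increases det(Phi_L Phi_L^T + lam I); by the matrix determinant lemma the
    increase factor is 1 + phi(x)^T A^{-1} phi(x)."""
    Phi_L = np.stack([phi(X[i], basis, gamma) for i in labeled_idx], axis=1)
    A = Phi_L @ Phi_L.T + lam * np.eye(Phi_L.shape[0])
    A_inv = np.linalg.inv(A)
    best_i, best_score = None, -np.inf
    for i in range(len(X)):
        if i in labeled_idx:
            continue                        # only consider unlabeled samples
        p = phi(X[i], basis, gamma)
        score = 1.0 + p @ A_inv @ p
        if score > best_score:
            best_i, best_score = i, score
    return best_i
```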

3.2. Simulation Setup

The first step in the classification process is generating the set of signals X to be classified. The true spectra set, F, is generated by drawing 300 signals from each of the multivariate Gaussian distributions defined in Table 3. The table shows that each signal has 9 peaks and that the two distributions have similar means and covariances; the 9 peaks occur at the same wavelengths in all spectra. Each spectrum is then measured at a signal level of 100 photons and reconstructed using the framework of Section 2 to generate a data set X of samples for classification. However, as opposed to the spectra in Section 2, the exact sparsity of the signals input to the estimator is known. For this reason a different optimization algorithm is employed to solve (2), removing the need to select an accurate regularization parameter. Orthogonal matching pursuit (OMP), described in [22], is a greedy algorithm for sparse approximation that is well suited to this classification task: since the species are known to have n characteristic peaks, the algorithm reconstructs an input signal with only the n peaks that best match the data. The only regularization parameter required is the truncation level of the ill-conditioned pseudo-random measurement matrix.
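For reference, a bare-bones version of an OMP reconstruction of this kind is sketched below. It is an illustration of the general algorithm in [22], not the authors' implementation, and the normalization details are assumptions.

```python
import numpy as np

def omp(H, g, n_peaks):
    """Bare-bones orthogonal matching pursuit: greedily add the column of H
    best correlated with the residual, refit by least squares, and stop after
    n_peaks atoms (the known number of characteristic peaks)."""
    col_norms = np.linalg.norm(H, axis=0)
    residual = g.astype(float).copy()
    support = []
    for _ in range(n_peaks):
        corr = np.abs(H.T @ residual) / col_norms    # normalized correlations
        corr[support] = 0.0                          # do not reselect atoms
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(H[:, support], g, rcond=None)
        residual = g - H[:, support] @ coef
    f_hat = np.zeros(H.shape[1])
    f_hat[support] = coef
    return f_hat
```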

Table 3. Means and variances of two distributions on which classification is to be performed.


Following reconstruction, classification results were generated using the framework of Section 3.1 with λ = 0.001 and γ = 25. The basis function selection procedure was run to exhaustion (until additional basis functions decreased the determinant of A), and the label selection procedure was run until the change in determinant was less than 0.1. The data set of reconstructed spectra was normalized and centered, as is standard procedure in classification. The classifier was built from signals reconstructed in two ways: once to the full support of the data, and once without the weakest peak included. The reconstruction size was reduced for classification because OMP is not guaranteed to select weak peaks correctly in the presence of noise, as demonstrated in Figure 3 [18]. Finally, it is important to note that in many cases it is preferable to classify on the projection data rather than the reconstructed data; that may even be true in this case. We present results of classification on the reconstructed data because we are trying to show which projection matrices provide the best projections for l1 denoising.

Fig. 3 A sample reconstruction via OMP showing the inaccurate placement of the weakest peak by both the pseudo-random and cyclic-s matrices.

3.3. Simulation Results

We use two metrics to determine the performance of the classifier. One metric is the receiver operating characteristic (ROC). The ROC measures detection rate (PD) versus false-positive rate (PF). If the researcher in the problem described cared equally about distinguishing between the two species, then the intersection of the curve with the PD = 1 − PF line is most informative. The second metric is the number of basis functions and labels required to meet the stopping thresholds. If a data set is classified to high accuracy but takes many labels to do so, its utility is limited.
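To clarify how the first metric is read off, the following sketch sweeps a decision threshold over hypothetical classifier scores for the two classes and returns the empirical operating point closest to the PD = 1 − PF line; the score convention is an assumption made for illustration.

```python
import numpy as np

def roc_equal_point(scores_pos, scores_neg):
    """Sweep a decision threshold and return the (P_F, P_D) point closest to
    the P_D = 1 - P_F line, where detection and false-positive rates balance."""
    thresholds = np.sort(np.concatenate([scores_pos, scores_neg]))
    best = None
    for t in thresholds:
        pd = np.mean(scores_pos >= t)      # detection rate at this threshold
        pf = np.mean(scores_neg >= t)      # false-positive rate at this threshold
        gap = abs(pd - (1.0 - pf))
        if best is None or gap < best[0]:
            best = (gap, pf, pd)
    return best[1], best[2]
```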

The ROC results (Figure 4) for 9-peak reconstructions show that the cyclic-s spectrometer intersects the PD = 1 − PF line with a 96% detection rate, the slit spectrometer with a 93% detection rate, and the pseudo-random spectrometer at various lower rates depending on the truncation level. These results, however, could be misleading: the cyclic-s spectrometer requires nearly twice the number of labels to meet the stopping criteria in building the classifier. This is overcome when the weakest peak of the spectrum is removed from the reconstruction. As noted above, the weakest peak cannot be reconstructed reliably by the multiplexing spectrometers. Upon its removal, the cyclic-s and pseudo-random spectrometer ROCs all improve slightly, but more importantly the number of labels required for classification with the multiplex spectrometers falls in line with the number required for the slit spectrometer.

Fig. 4 ROC curves for data reconstructed from signals with a total photon level of 100 photons. The cyclic-s matrix generates the highest ROC curves, but requires more labels to do so with a 9-peak reconstruction (see Table 4). The 8-peak reconstruction improves the quality of all of the multiplex ROCs, and allows for classification with the same number of labels as the identity matrix. Random pt XX refers to the fraction of singular values kept when truncated-SVD is performed. Random full refers to the performance of the pseudo-random matrix without truncated-SVD.

Table 4. Number of basis functions and total labels necessary to meet the stopping criteria for each system’s data.


We wished to know which characteristics are best preserved by which systems, and the result is surprisingly simple and intuitive. The ability to classify on a given feature is determined by the strength of that feature relative to the other features in the set. If it is weak, it will likely be unrecoverable from the multiplexing noise, while if it is strong it will be reconstructed more accurately by a well-conditioned multiplexing system. This result is of course identical to that demonstrated in chapter 9 of [14], but may be lost if only an MSE analysis is presented. It would be easy to conceive of a spectrum with many strong peaks and one weak peak that carries the most important information for identifying the substance, and just as easy to conceive of a spectrum with the converse properties.

4. Conclusion

We have shown that multiplex spectrometers can outperform non-multiplex spectrometers under a Poisson noise model. If MSE is the evaluation criterion, an optimal spectrometer is a multiplex spectrometer with optimal conditioning when the sparsity is exact. If the sparsity is approximate, the results become dependent on signal level: as the signal level increases, the ideal measurement system becomes the slit spectrometer, because the slit spectrometer best measures weak features. The weak-feature hypothesis is reinforced by the classification results. When the weak feature is removed from the data set, multiplex systems outperform the slit spectrometer. However, when the classification performance depends on the weak feature, the data does not cluster well for the multiplex systems, and many more labeled spectra are required for accurate classification. There is no overarching rule for which spectrometer coding a scientist should use based on the noise model alone; to make optimal spectral estimates or classifications the scientist must use knowledge of the data of interest.

References and links

1. M. J. E. Golay, “Multi-slit spectrometry,” J. Opt. Soc. Am. 39(6), 437–437 (1949). http://www.opticsinfobase.org/abstract.cfm?URI=josa-39-6-437
2. M. Harwit and N. J. A. Sloane, Hadamard Transform Optics (Academic Press, 1979).
3. A. Barducci, D. Guzzi, C. Lastri, V. Nardino, P. Marcoionni, and I. Pippi, “Radiometric and signal-to-noise ratio properties of multiplex dispersive spectrometry,” Appl. Opt. 49(28), 5366–5373 (2010). http://ao.osa.org/abstract.cfm?URI=ao-49-28-5366
4. A. A. Wagadarikar, M. E. Gehm, and D. J. Brady, “Performance comparison of aperture codes for multimodal, multiplex spectroscopy,” Appl. Opt. 46(22), 4932–4942 (2007). http://ao.osa.org/abstract.cfm?URI=ao-46-22-4932
5. S. B. Mende, E. S. Claflin, R. L. Rairden, and G. R. Swenson, “Hadamard spectroscopy with a two-dimensional detecting array,” Appl. Opt. 32(34), 7095–7105 (1993). http://ao.osa.org/abstract.cfm?URI=ao-32-34-7095
6. M. E. Gehm, S. T. McCain, N. P. Pitsianis, D. J. Brady, P. Potuluri, and M. E. Sullivan, “Static two-dimensional aperture coding for multimodal, multiplex spectroscopy,” Appl. Opt. 45(13), 2965–2974 (2006). http://ao.osa.org/abstract.cfm?URI=ao-45-13-2965
7. M. E. Gehm, R. John, D. J. Brady, R. M. Willett, and T. J. Schulz, “Single-shot compressive spectral imaging with a dual-disperser architecture,” Opt. Express 15(21), 14013–14027 (2007). http://www.opticsexpress.org/abstract.cfm?URI=oe-15-21-14013
8. A. Wagadarikar, R. John, R. Willett, and D. Brady, “Single disperser design for coded aperture snapshot spectral imaging,” Appl. Opt. 47(10), B44–B51 (2008). http://ao.osa.org/abstract.cfm?URI=ao-47-10-B44
9. R. A. DeVerse, R. M. Hammaker, and W. G. Fateley, “Realization of the Hadamard multiplex advantage using a programmable optical mask in a dispersive flat-field near-infrared spectrometer,” Appl. Spectrosc. 54(12), 1751–1758 (2000). http://as.osa.org/abstract.cfm?URI=as-54-12-1751
10. B. Noble, Applied Linear Algebra, 2nd ed. (Prentice Hall, 1977).
11. S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM J. Sci. Comput. 20, 33–61 (1998).
12. J. Bioucas-Dias and M. Figueiredo, “A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration,” IEEE Trans. Image Process. 16(12), 2992–3004 (2007).
13. Z. Harmany, R. Marcia, and R. Willett, “This is SPIRAL-TAP: sparse Poisson intensity reconstruction algorithms - theory and practice,” arXiv:1005.4274 (2010).
14. D. J. Brady, Optical Imaging and Spectroscopy (Wiley-Interscience, 2008).
15. D. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006).
16. A. Chambolle, “An algorithm for total variation minimization and applications,” J. Math. Imaging Vision 20, 89–97 (2004). http://dx.doi.org/10.1023/B:JMIV.0000011325.36760.1e
17. D. Kittle, K. Choi, A. Wagadarikar, and D. J. Brady, “Multiframe image estimation for coded aperture snapshot spectral imagers,” Appl. Opt. 49(36), 6824–6833 (2010). http://ao.osa.org/abstract.cfm?URI=ao-49-36-6824
18. T. T. Cai and L. Wang, “Orthogonal matching pursuit for sparse signal recovery,” Technical Report (2010).
19. S. C. H. Hoi, R. Jin, J. Zhu, and M. R. Lyu, “Batch mode active learning and its application to medical image classification,” in Proceedings of the 23rd International Conference on Machine Learning (ICML) (2006), pp. 417–424. http://doi.acm.org/10.1145/1143844.1143897
20. Y. Zhang, X. Liao, and L. Carin, “Detection of buried targets via active selection of labeled data: application to sensing subsurface UXO,” IEEE Trans. Geosci. Remote Sens. 42(11), 2535–2543 (2004).
21. K. B. Petersen and M. S. Pedersen, “The Matrix Cookbook,” (2008). http://matrixcookbook.com/
22. J. A. Tropp, “Greed is good: algorithmic results for sparse approximation,” IEEE Trans. Inf. Theory 50, 2231–2242 (2004).

OCIS Codes
(300.6190) Spectroscopy : Spectrometers
(070.2025) Fourier optics and signal processing : Discrete optical signal processing

ToC Category:
Spectroscopy

History
Original Manuscript: August 10, 2011
Revised Manuscript: October 26, 2011
Manuscript Accepted: October 28, 2011
Published: January 18, 2012

Citation
Alex Mrozack, Daniel L. Marks, and David J. Brady, "Coded aperture spectroscopy with denoising through sparsity," Opt. Express 20, 2297-2309 (2012)
http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-20-3-2297


