
Optics Express


  • Editor: C. Martijn de Sterke
  • Vol. 20, Iss. 10 — May 7, 2012
  • pp: 10658–10673

Hyperspectral video restoration using optical flow and sparse coding

Ajmal Mian and Richard Hartley


Optics Express, Vol. 20, Issue 10, pp. 10658-10673 (2012)
http://dx.doi.org/10.1364/OE.20.010658



Abstract

Hyperspectral video acquisition is a trade-off between spectral and temporal resolution. We present an algorithm for recovering dense hyperspectral video of dynamic scenes from a few measured multispectral bands per frame using optical flow and sparse coding. A different set of bands is measured in each video frame and optical flow is used to register them. Optical flow errors are corrected by exploiting sparsity in the spectra and the spatial correlation between images of a scene at different wavelengths. A redundant dictionary of atoms is learned that can sparsely approximate training spectra. The restoration of correct spectra is formulated as an ℓ1 convex optimization problem that minimizes a Mahalanobis-like weighted distance between the restored and corrupt signals, as well as between the restored signal and the median of the eight-connected neighbours of the corrupt signal, such that the restored signal is a sparse linear combination of the dictionary atoms. Spectral restoration is followed by spatial restoration using a guided dictionary approach in which one dictionary is learned for the measured bands and another for the band that is to be spatially restored. By constraining the sparse coding coefficients of both dictionaries to be the same, the restoration of the corrupt band is guided by the more reliable measured bands. Experiments on real data and comparison with an existing volumetric image denoising technique show the superiority of our algorithm.

© 2012 OSA

1. Introduction

Spectroscopy is the measurement and analysis of electro-optical spectra emitted or reflected by an object or transmitted through a medium. When spectral information is measured at multiple spatial points (for example using a rectangular grid), it is known as imaging spectroscopy. Imaging spectroscopy is also referred to as hyperspectral imaging. A hyperspectral image is a data cube with two spatial and one spectral dimension. Measuring this data cube is generally a sequential process. Either 2D spatial images are sequentially acquired at the desired wavelengths (see Fig. 1) or a 1D hyperspectral sensor, simultaneously measuring all wavelengths of interest along a 1D line, is scanned over a scene. In the latter case the sensor, known as a push-broom sensor, is usually mounted on a moving platform like a satellite or aircraft. In this paper, we focus on hyperspectral video (multiple cubes) acquisition using the former technique.

Fig. 1 Sample bands at 690, 650, 620, 610, 600, 590, 580, 540, 520, 510, 500, 490, 480 and 440 nm of a hyperspectral image cube. Each band is rendered as it would be seen by a human eye.

Fig. 2 An illustration of hyperspectral video. There are three hyperspectral frames each with five bands.
Fig. 3 (a) RGB image of the scene in Fig. 1. (b) Five bands (60nm apart) are sensed in each frame with a between-frame offset of 10nm. Six consecutive frames cover 30 bands (430–720nm) for example the first frame (row) comprises 430, 490, 550, 610, 670nm bands.

In this paper, we use the model presented in Fig. 3(b) to restore the dense (30 band) hyperspectral cubes of all the frames. We assume that a frame is acquired instantly, hence there is no motion between the bands of a frame. We do not make any further assumptions, such as constant velocity, constant acceleration or minimal motion between frames, because motion between adjacent frames can be significant. Since most objects (static or moving) are sensed at all wavelengths (bands), in theory their full spectral response can be recovered if correspondences are known between the frames. However, there are three main challenges in achieving this. Firstly, dense correspondence techniques such as optical flow are sensitive to intensity variations between frames. Unlike traditional video, where adjacent frames have similar illumination, the bands of adjacent frames have a wavelength offset in our case, which causes intensity or texture variations between them and makes optical flow more challenging. Secondly, sequential registration of bands that are many frames apart accumulates optical flow errors, so the resultant spectral response is corrupted at many pixels. Finally, the spectral response of objects that are occluded in some frames, or that enter or exit the field of view of the imager, is not measured at all bands or wavelengths. In this paper, we address these challenges and propose an algorithm for hyperspectral video restoration. Experiments on real data and comparison with an existing volumetric image denoising technique show the superiority of our method.

2. Prior work

To the best of our knowledge, this is the first algorithm proposed for hyperspectral video restoration. However, prior work exists on image denoising, volumetric image denoising and RGB color image restoration. From one perspective, our work falls into the category of sparse data acquisition and recovery (or compressive sensing, see e.g. [1–3]) since we acquire only a few bands at a given instant and the remaining bands are acquired after the scene has changed. However, since correspondences can be established between consecutive frames, we believe that our work is more relevant to image restoration and denoising. This section presents a brief survey of the most relevant techniques in order to put our work into perspective. We avoid surveying optical flow techniques as our primary contribution is in denoising spectral signals and bands to restore hyperspectral video. Our secondary contribution is the construction of spatio-spectral images, which is likely to improve the accuracy of any optical flow technique.

[1] D. Kittle, K. Choi, A. Wagadarikar, and D. Brady, “Multiframe image estimation for coded aperture snapshot spectral imagers,” Appl. Opt. 49, 6824–6833 (2010).
[3] A. Wagadarikar, N. Pitsianis, X. Sun, and D. Brady, “Video rate spectral imaging using a coded aperture snapshot spectral imager,” Opt. Express 17, 6368–6388 (2009).

A set of signals x ∈ ℝ^n is said to exhibit a sparse structure if each signal can be approximated as a linear combination of a few atoms from a dictionary D ∈ ℝ^{n×M}, where M is the number of atoms in the dictionary. The dictionary D contains prototype signal atoms and is usually overcomplete, i.e., the number of atoms is greater than the dimensionality of the signals (M > n). An input signal x is approximated as

\hat{x} = D\hat{\alpha},   (1)

where

\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad \|D\alpha - x\|_2^2 \le \varepsilon,   (2)

or alternatively

\hat{\alpha} = \arg\min_{\alpha} \left\{ \|D\alpha - x\|_2^2 + \gamma \|\alpha\|_0 \right\}.   (3)

Here α ∈ ℝ^M is sparse, i.e., it has only a few non-zero elements. The parameter γ sets the trade-off between the approximation error and the sparsity of α. The sparsity of α is ensured by minimizing its ℓ0 pseudo-norm (the number of non-zero entries) in the constrained optimization problem of Eq. (2) or its unconstrained version in Eq. (3). Minimizing the ℓ0 pseudo-norm is NP-hard, and greedy algorithms such as Orthogonal Matching Pursuit (OMP) [4] are used to solve the above equations. Since ℓ1 regularization also results in a sparse solution for α, it is frequently used to formulate sparse coding as a convex optimization problem

\hat{\alpha} = \arg\min_{\alpha} \left\{ \|D\alpha - x\|_2^2 + \gamma_1 \|\alpha\|_1 \right\},   (4)

commonly known as the Lasso [5]. Here γ1 is a regularization parameter similar to γ in Eq. (3).

[4] Y. Pati, R. Rezaiifar, and P. Krishnaprasad, “Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition,” in Proceedings of the 27th Asilomar Conference on Signals, Systems, and Computers (IEEE, 1993), 40–44.
[5] R. Tibshirani, “Regression shrinkage and selection via the Lasso,” J. R. Stat. Soc. Ser. B 58, 267–288 (1996).
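The Lasso of Eq. (4) can be solved with LARS or proximal-gradient methods. As a minimal illustration (not the paper's solver), the following numpy sketch minimizes the same objective by iterative soft-thresholding (ISTA) on a synthetic dictionary and signal:

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(D, x, gamma1, n_iter=500):
    """Minimize ||D a - x||_2^2 + gamma1 ||a||_1 by iterative soft-thresholding."""
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)   # 1/L, L = gradient Lipschitz const.
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ a - x)               # gradient of the data term
        a = soft_threshold(a - step * grad, step * gamma1)
    return a

# Toy check: a signal built from two atoms of a random unit-norm dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((30, 100))
D /= np.linalg.norm(D, axis=0)                       # unit-norm atoms
a_true = np.zeros(100)
a_true[[3, 40]] = [1.5, -2.0]
x = D @ a_true
a_hat = lasso_ista(D, x, gamma1=0.05)
```

The recovered coefficient vector is sparse and reconstructs x up to the small shrinkage bias introduced by the ℓ1 penalty.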

The choice of dictionary D is critical, especially for the task of signal denoising and restoration. Aharon et al. [6] proposed the K-SVD algorithm to learn a dictionary in an iterative process. K-SVD alternates between sparse coding of the training data using the current dictionary and updating the dictionary atoms using the SVD to better fit the data. Elad and Aharon [7] used the dictionary learned through the K-SVD algorithm to denoise grayscale images. For computational efficiency, the image was divided into smaller overlapping patches and the results were averaged. The main idea was to constrain the denoised image to be close to the original noisy image and to update the dictionary from the noisy image itself. In addition to this self-regularization, ℓ0 regularization was used for sparsity

\{\hat{A}, \hat{X}\} = \arg\min_{A, X} \left\{ \sum_{ij} \|D\alpha_{ij} - R_{ij}X\|_2^2 + \sum_{ij} \gamma_{ij} \|\alpha_{ij}\|_0 + \gamma_2 \|X - Y\|_2^2 \right\}.   (5)

In this expression, A represents the set of all α_ij, where ij runs over all pixels of the image; Y is the noisy image, X is its unknown denoised version and R_ij X is a patch around pixel ij extracted from image X. This is similar to Eq. (3) except for the last, self-regularization term, which forces the denoised image to stay close to the original noisy image. The parameter γ_ij controls the relative importance of the sparsity of patch ij and γ2 controls the relative importance of self-similarity of the complete reconstructed image. This approach worked well at lower noise levels but the results deteriorated rapidly at higher levels of noise. For denoising high-dimensional signals such as volumetric images, the same technique [7] can be extended by taking smaller overlapping volume patches. In Section 6, we compare our proposed approach with volumetric image denoising of hyperspectral image cubes and show that we achieve superior results.

[6] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process. 54, 4311–4322 (2006).
[7] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process. 15, 3736–3745 (2006).
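In practice, the patch-based scheme behind Eq. (5) is approximated by sparse-coding each overlapping patch against a fixed dictionary and averaging the overlapping reconstructions. A minimal grayscale sketch (our illustration; it omits the K-SVD dictionary-update and the self-regularization term):

```python
import numpy as np

def omp(D, x, k):
    """Greedy Orthogonal Matching Pursuit with least-squares refitting."""
    residual, support = x.astype(float).copy(), []
    coeff = np.zeros(0)
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coeff, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeff
    a = np.zeros(D.shape[1])
    a[support] = coeff
    return a

def denoise_patches(img, D, p=4, k=3):
    """Sparse-code every overlapping p*p patch over D and average the
    overlapping reconstructions at each pixel."""
    h, w = img.shape
    acc = np.zeros((h, w))
    cnt = np.zeros((h, w))
    for i in range(h - p + 1):
        for j in range(w - p + 1):
            patch = img[i:i + p, j:j + p].ravel()
            rec = (D @ omp(D, patch, k)).reshape(p, p)
            acc[i:i + p, j:j + p] += rec
            cnt[i:i + p, j:j + p] += 1
    return acc / cnt
```

After least-squares refitting the residual is orthogonal to the selected atoms, so OMP never picks the same atom twice.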

Mairal et al. [8] extended the K-SVD grayscale image denoising algorithm [7] to restore RGB color images. The color denoising algorithm follows the original K-SVD algorithm applied to p × p × 3 RGB patches, except for a new projection method in the OMP step. The inner product y^T x in the original OMP [4] is replaced with y^T (I + γK/p) x, where γ is an empirically selected control parameter and

K = \begin{pmatrix} J_p & 0 & 0 \\ 0 & J_p & 0 \\ 0 & 0 & J_p \end{pmatrix},   (6)

where J_p is a p × p matrix of ones.

[8] J. Mairal, M. Elad, and G. Sapiro, “Sparse representation for color image restoration,” IEEE Trans. Image Process. 17, 53–69 (2008).
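For concreteness, the matrix K of Eq. (6) and the modified projection can be sketched as follows; the sizes follow the equation as printed, and `color_inner` is our illustrative name for the replacement inner product:

```python
import numpy as np

def make_K(p):
    """Block-diagonal K of Eq. (6): three p*p all-ones blocks J_p."""
    J = np.ones((p, p))
    Z = np.zeros((p, p))
    return np.block([[J, Z, Z], [Z, J, Z], [Z, Z, J]])

def color_inner(y, x, gamma, p):
    """Modified OMP projection y^T (I + gamma*K/p) x used for RGB patches."""
    return y @ (np.eye(3 * p) + gamma * make_K(p) / p) @ x
```

With γ = 0 the modified projection reduces to the ordinary inner product of the original OMP.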

Othman and Qian [9] proposed wavelet shrinkage based hyperspectral image denoising. First, noise is removed in the spatial domain, followed by noise removal in the spectral domain, which also corrects artifacts resulting from the spatial denoising. The algorithm operates on the spectral derivative of the hyperspectral cube. Bourguignon et al. [10] used sparse representations in redundant transformation spaces for denoising astrophysical spectra. They model astrophysical data as the sum of line and continuous spectra and use ℓ1-norm regularization to impose sparsity constraints on their respective canonical and DCT bases. Results are reported on simulated data from the MUSE (Multi Unit Spectroscopic Explorer) consortium.

[9] H. Othman and S. Qian, “Noise reduction of hyperspectral imagery using hybrid spatial-spectral derivative-domain wavelet shrinkage,” IEEE Trans. Geosci. Remote Sens. 44, 397–408 (2006).
[10] S. Bourguignon, D. Mary, and E. Slezak, “Sparsity-based denoising of hyperspectral astrophysical data with colored noise: application to the MUSE instrument,” in 2nd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (IEEE, 2010), 1–4.

In general, the above algorithms assume that the noise is white Gaussian with zero mean. However, optical flow based registration of hyperspectral frames does not introduce Gaussian noise but artifacts due to non-linear mixing of multiple spectra. Thus, assumptions such as noise being mostly concentrated in the high frequency components of the signal do not hold.

3. Hyperspectral frame registration using optical flow

Estimating the optical flow between hyperspectral frames with heterogeneous bands introduces an additional challenge because of the wavelength-dependent spectral response of the scene. Figure 4 shows the third band from six consecutive frames of a hyperspectral video. In addition to changes due to motion (the moving block in the foreground), the intensity or texture of the images also varies significantly even though there is only a 10nm difference between the bands of consecutive frames. The 3D plots in Fig. 5 represent optical flow (horizontal direction only) calculated with Farnebäck’s algorithm [11] for the scene in Fig. 4. Notice that optical flow calculated from pairs of heterogeneous bands contains numerous errors (vertical spikes). Motion is incorrectly found in static regions of the scene and sometimes in the wrong direction on the moving block.

[11] G. Farnebäck, “Two-frame motion estimation based on polynomial expansion,” in Proceedings of the 13th Scandinavian Conference on Image Analysis (Springer, 2003), 363–370.

Fig. 4 Third band (rendered as gray scale images) from six consecutive frames of a dynamic scene with static background and a moving block in the foreground. Notice the varying texture which makes it challenging to calculate optical flow between frames/bands.
Fig. 5 Optical flow in the horizontal direction for the scene in Fig. 4 represented as 3D plots. The x,y directions correspond to the image dimensions and the vertical direction corresponds to horizontal displacement between frames. The frame bands used to calculate the optical flow are written under each plot. The bottom right plot shows optical flow calculated between five band spatio-spectral images of consecutive hyperspectral frames.

A naive approach is to use a common band between frames for registration. However, this approach decreases the efficiency (frame rate or spectral resolution) by 20% and offers no improvement in optical flow, since a single narrow band cannot capture all the texture in the scene. Moreover, deciding on a common band is a problem in itself, as the common band should ideally be scene-specific.

To address the above challenges, we construct a spatio-spectral image by ordering the five measured bands of a hyperspectral frame as shown in Fig. 6(a). Each set of 3 × 3 pixels is formed by ordering the corresponding pixels of the 5 bands (similar to the Bayer pattern). The corner pixels are interpolated from the nearest pixels in the same set, excluding the center pixel. Thus, the spatio-spectral image is nine times larger than any single-band image. Notice that in Fig. 6(b), the inner orange patch of the square is not distinguishable from its blue boundary at 550nm, whereas it is visible in the spatio-spectral image (rendered as RGB in Fig. 6(c)).

Fig. 6 (a) Ordering of bands in a spatio-spectral image. Each set of 3 × 3 pixels is formed by ordering the corresponding pixels of the 5 bands. The corner pixels are interpolated from the nearest pixels in the same set excluding the center pixel. (b) A scene patch at 550nm rendered as a gray scale image. (c) A spatio-spectral image of the same patch constructed from 5 bands i.e. 430, 490, 550, 610 and 670nm.
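A sketch of this construction, assuming a cross-shaped ordering (top, left, centre, right, bottom) for the five bands; the actual layout is the one fixed by Fig. 6(a), and each corner is interpolated as the mean of its two adjacent edge neighbours in the same set:

```python
import numpy as np

def spatio_spectral(bands):
    """Build a spatio-spectral image (9x the area) from 5 registered bands.

    Each pixel becomes a 3x3 set: the 5 band values fill the centre cross
    (ordering assumed here: top, left, centre, right, bottom) and each corner
    is interpolated from its two edge neighbours, excluding the centre pixel.
    """
    b = np.asarray(bands, dtype=float)            # shape (5, h, w)
    _, h, w = b.shape
    top, left, centre, right, bottom = b
    out = np.empty((3 * h, 3 * w))
    out[0::3, 1::3] = top
    out[1::3, 0::3] = left
    out[1::3, 1::3] = centre
    out[1::3, 2::3] = right
    out[2::3, 1::3] = bottom
    out[0::3, 0::3] = (top + left) / 2            # corners from edge neighbours
    out[0::3, 2::3] = (top + right) / 2
    out[2::3, 0::3] = (bottom + left) / 2
    out[2::3, 2::3] = (bottom + right) / 2
    return out
```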

Optical flow calculated from the spatio-spectral images is more accurate and shows almost no incorrect motion in static regions of the scene (see Fig. 5, bottom-right). Note that an offset of 10nm still exists between the spatio-spectral images constructed from consecutive hyperspectral frames. However, due to their increased textural information, they result in more accurate optical flow. Figure 7 shows an example of sequentially registered 540nm bands five frames apart. The left-most image is the measured 540nm band at frame 6, the middle one is sequentially registered using optical flow between five successive pairs of single heterogeneous bands, and the right-most is sequentially registered using optical flow calculated from five pairs of spatio-spectral images. Registration based on spatio-spectral optical flow is significantly better. Notice that the block as well as its reflection from the table is distorted in the middle image, whereas the distortions of the block and its reflection are minimal in the right-most image.

Fig. 7 A 540nm band (left) is sequentially registered from frame 6 to 1 using inter-band optical flow (center) and spatio-spectral image-based optical flow (right).
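Sequential registration across several frames amounts to chaining the pairwise warps, which is also why pairwise flow errors accumulate. A minimal nearest-neighbour sketch (our simplification; the paper uses Farnebäck flow with sub-pixel displacements):

```python
import numpy as np

def warp_nearest(img, flow):
    """Warp an image by a per-pixel (dy, dx) flow, nearest-neighbour sampling."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    ys = np.clip(np.rint(yy + flow[0]).astype(int), 0, h - 1)
    xs = np.clip(np.rint(xx + flow[1]).astype(int), 0, w - 1)
    return img[ys, xs]

def register_through(band, flows):
    """Chain per-pair flows to register a band across several frames;
    each warp compounds whatever error the pairwise flow contains."""
    out = band
    for f in flows:
        out = warp_nearest(out, f)
    return out
```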

4. Hyperspectral video restoration

Although the use of spatio-spectral images increases the accuracy of optical flow, some errors still exist. These errors accumulate and become more obvious after sequential registration of bands that are many frames apart. Notice that some distortions still exist in Fig. 7 even when the spatio-spectral images are used for optical flow. The spectral curves at the distorted image pixels are also distorted. Since six frames are registered to complete a hyperspectral cube, each spectral response can be a mixture of up to six different spectra. A pure spectral response is obtained at pixels with no optical flow errors, whereas a mixture of six spectra is obtained at pixels where optical flow errors exist between all pairs of frames.

Parkkinen et al. [12] measured the visible-range reflectance spectra of the 1257 chips in the Munsell Book of Color and reported that the spectra can be well approximated as a linear combination of eight characteristic spectra. This indicates sparsity in the reflectance spectra and the possibility of restoring the spectral response at corrupted pixels as a sparse linear combination of atoms from an overcomplete spectral dictionary. We used training data to learn the dictionary and tested two dictionary-learning algorithms for this purpose, namely the K-SVD algorithm [6] and the online learning algorithm of Mairal et al. [13]. We report results for the latter technique, as it performed better.

[12] J. Parkkinen, J. Hallikainen, and T. Jaaskelainen, “Characteristic spectra of Munsell colors,” J. Opt. Soc. Am. A 6, 318–322 (1989).
[13] J. Mairal, J. Ponce, and G. Sapiro, “Online learning for matrix factorization and sparse coding,” J. Mach. Learn. Res. 11, 19–60 (2010).

4.1. Spectral restoration

We propose a spectral restoration model that capitalizes on the fact that the spectral response is measured at five out of 30 wavelengths in each hyperspectral frame. These five measurements are therefore more reliable than the remaining 25, which come from optical flow based registration and may contain errors. Let s_ij ∈ ℝ^30 be the spectral response at pixel i, j of the 30-band hyperspectral frame (cube) obtained from optical flow and let D_λ ∈ ℝ^{30×M_λ} be the overcomplete spectral dictionary (where M_λ is the size of the spectral dictionary) learned from the static pixels of the hyperspectral frame. Then, according to sparse coding theory, the denoised hyperspectral frame can be recovered as

\hat{H}_{ij} = D_\lambda \hat{\alpha}_{ij},   (8)

where α̂_ij are sparse coefficients given by

\hat{\alpha}_{ij} = \arg\min_{\alpha_{ij}} \left\{ \|D_\lambda \alpha_{ij} - s_{ij}\|_2^2 + \gamma_1 \|\alpha_{ij}\|_1 \right\},   (9)

and γ1 is the sparsity regularizer. Note that each vector α̂_ij is computed separately. The restored hyperspectral frame Ĥ is the ensemble of the spectra Ĥ_ij defined at each pixel position ij. Note that Eq. (9) minimizes the ℓ2 error between the input signal s_ij and its sparse approximation, giving equal weight to all dimensions (bands/wavelengths). However, s_ij is more reliable at the measured bands/wavelengths. Moreover, optical flow between neighbouring frames is less likely to have errors than optical flow accumulated over five consecutive frames. Therefore, we introduce a weighting term such that

\hat{\alpha}_{ij} = \arg\min_{\alpha_{ij}} \left\{ \|W(D_\lambda \alpha_{ij} - s_{ij})\|_2^2 + \gamma_1 \|\alpha_{ij}\|_1 \right\},   (10)

where W ∈ ℝ^{30×30} is a diagonal matrix of weights that gives the highest weight to the measured wavelengths, followed by those estimated from optical flow between the nearest frames. Bands/wavelengths that are registered across distant frames get the least weight.
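Because W is diagonal and fixed, Eq. (10) is an ordinary Lasso on the re-weighted pair (W D_λ, W s_ij). A sketch of one possible weighting scheme (the geometric decay and the band indices below are our assumptions for illustration, not the paper's values):

```python
import numpy as np

def band_weights(measured_idx, frame_dist, w_meas=1.0, base=0.9):
    """Diagonal W of Eq. (10): measured bands get the highest weight; bands
    registered by optical flow are down-weighted by the number of frames the
    flow was accumulated over (geometric decay is illustrative only)."""
    w = base ** np.asarray(frame_dist, dtype=float)
    w[np.asarray(measured_idx)] = w_meas
    return np.diag(w)

# Eq. (10) reduces to the plain Lasso of Eq. (4) on the pair (W D, W s):
rng = np.random.default_rng(1)
D = rng.standard_normal((30, 100))
s = rng.standard_normal(30)
W = band_weights(measured_idx=[0, 6, 12, 18, 24],
                 frame_dist=np.arange(30) % 6)
WD, Ws = W @ D, W @ s
```

Any standard Lasso/LARS solver applied to (WD, Ws) then yields the weighted solution of Eq. (10).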

It was observed that improved results are obtained by applying a simple edge-preserving filter to the image prior to the restoration step. The spectral image is mixed with its median-filtered version to remove impulsive noise. Thus, we define

\tilde{s}_{ij} = (1 - \gamma_2)\, s_{ij} + \gamma_2\, \bar{s}_{ij},

where s̄_ij is the median of the eight-connected neighbours, computed independently in each band, and γ2 is a mixing constant that regularizes the relative importance of s_ij and s̄_ij. The sparse coefficients α̂_ij are then computed from this filtered image:

\hat{\alpha}_{ij} = \arg\min_{\alpha_{ij}} \left\{ \|W(D_\lambda \alpha_{ij} - \tilde{s}_{ij})\|_2^2 + \gamma_1 \|\alpha_{ij}\|_1 \right\}.   (11)
Equation (11) is convex for a known dictionary. The dictionary is learned in the formulation of Eq. (4) using the online dictionary learning algorithm [13]. The spectra in static regions of the scene are used as training data for dictionary learning. Note that WD_λ needs to be calculated only once. Equation (11) can be solved using the Least Angle Regression (LARS) algorithm [14]. Figure 8 shows examples of spectra recovered using the above model and Fig. 9 shows an example of a band recovered after optical flow based registration from frame 1 to frame 6.

[14] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least angle regression,” Ann. Stat. 32, 407–499 (2004).
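The median pre-filter can be sketched directly in numpy; the band-wise median over the eight-connected neighbourhood follows the definition above, with edge pixels handled by replication (our choice):

```python
import numpy as np

def median8_mix(cube, gamma2=0.3):
    """Edge-preserving pre-filter: mix each spectrum with the band-wise
    median of its eight-connected neighbours,
    s_tilde = (1 - gamma2)*s + gamma2*median8(s)."""
    b, h, w = cube.shape
    pad = np.pad(cube, ((0, 0), (1, 1), (1, 1)), mode='edge')
    neighbours = [pad[:, 1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (dy, dx) != (0, 0)]
    med = np.median(np.stack(neighbours), axis=0)
    return (1.0 - gamma2) * cube + gamma2 * med
```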

Fig. 8 Optical flow errors lead to incorrect spectral reflectance curves at many pixels (see above examples). Using the proposed technique, the correct spectral reflectance can be recovered.
Fig. 9 Left: A 550nm band registered sequentially from frame 1 to frame 6. Center: The errors propagated from optical flow are corrected by spectral restoration. Right: Ground truth 550nm band acquired with frame 6.

4.2. Spatial restoration

Fig. 10 Spectral restoration of a 490nm band sequentially registered from five frames apart. Some errors can be noticed around the boundaries of the moving blocks which are removed by the spatial restoration.

Equation (13) is convex with respect to each variable when the other is fixed. First, the dictionary is learned [13] from the part of the hyperspectral frame where no motion is detected, and then the region where motion is detected is restored using the same dictionary. Unlike [15], during the restoration of a band, the noisy patch as well as the corresponding noise-free patches from the measured bands are all sparsely approximated using the learned dictionary. Thus the restoration process is guided by both the five noise-free patches of the measured bands and the one noisy patch of the band estimated from optical flow followed by spectral restoration. Intuitively this gives better accuracy, which we also verified experimentally by removing the noisy patch from the sparse coding step. Once all the (overlapping) patches have been computed, the values corresponding to the band to be recovered are averaged. Figure 10 shows a sample band after spectral restoration and after spectral + spatial restoration. A magnified view is given in Fig. 11. Notice the improvement around the boundaries of the moving blocks.

[15] J. Yang, J. Wright, T. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Trans. Image Process. 19, 2861–2873 (2010).
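Constraining both dictionaries to share one coefficient vector is equivalent to sparse-coding the stacked signal against the stacked dictionary. A toy sketch with a greedy OMP solver (dictionary shapes and the two-atom test signal are invented for illustration):

```python
import numpy as np

def omp(D, x, k):
    """Greedy OMP with least-squares refitting on the selected atoms."""
    residual, support = x.astype(float).copy(), []
    coeff = np.zeros(0)
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coeff, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeff
    a = np.zeros(D.shape[1])
    a[support] = coeff
    return a

def guided_restore(Dm, Dt, patch_meas, patch_noisy, k=2):
    """Shared-coefficient sparse coding: one code alpha is fit to the
    stacked system [Dm; Dt] alpha ~ [measured patches; noisy patch],
    so the reliable measured bands guide the reconstruction Dt @ alpha."""
    D = np.vstack([Dm, Dt])
    x = np.concatenate([patch_meas, patch_noisy])
    return Dt @ omp(D, x, k)
```

Dropping `patch_noisy` from the stack corresponds to the ablation mentioned above, where the code is fit to the measured bands alone.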

Fig. 11 Magnified views of the (left most middle part of the) three images in Fig. 10. After spectral+spatial restoration (middle image), the boundaries are better recovered and the image has more resemblance to the ground truth (right image).

5. Experimental setup and data collection

Our hyperspectral imaging system comprises a CRi VariSpec Liquid Crystal Tunable Filter (LCTF), a 25mm lens and a Basler scA750–60fm camera with 752 × 480 spatial resolution (see Fig. 12). A halogen light was used to illuminate the scene and the Macbeth color checker was used for spectral calibration. The LCTF was tuned and synchronized with the camera using custom software. An image was acquired each time the filter was tuned to a different wavelength. The LCTF can be tuned to 33 different wavelengths from 400 to 720nm in 10nm steps. Figure 12 shows the transmittance of the LCTF, the camera’s CCD sensitivity (both provided by the respective manufacturers) and the spectrum of the halogen light measured with the StellarNet spectroradiometer. The exposure time of the camera was varied during acquisition to cater for the varying LCTF transmittance. Due to low LCTF transmittance, the 400, 410 and 420nm bands were dropped (see Fig. 12(b)) and the remaining 30 bands were used in our experiments. We collected a 30-frame hyperspectral video of a static scene with a moving object. A sample image of the scene is given in Fig. 9. The scene was static while the bands of a frame were measured; however, between frames, an object was moved in the scene. The movement was performed manually and was significant between frames. All 30 bands were measured in each hyperspectral frame but, as shown in Fig. 3, only five bands per frame were used to recover the full 30 band hyperspectral frames. The remaining 25 bands per frame were used as ground truth for quantitative analysis. This data is available for research purposes on the first author’s website.

Fig. 12 (a) Hyperspectral camera setup. (b) Transmittance of the LCTF. (c) Quantum efficiency of the camera CCD. (d) Spectral curve of the halogen light.

There are three free parameters in the proposed restoration model. Their values were set to γ1 = γ3 = 0.15 and γ2 = 0.3. The spectral dictionary size was set to over three times the dimensionality of the spectral signal (the number of bands), i.e. Mλ = 100, so that the dictionary is overcomplete. The patch size p in the spatial restoration was set to 3, since a patch smaller than 3 × 3 does not contain significant spatial information. Accordingly, the spatial dictionary size was set to over three times the dimensionality of the patch (3 × 3 × 6, where 6 corresponds to the 5 measured bands plus one band to be restored), i.e. Ms = 256. We also report results for p = 5 and Ms = 700.

6. Hyperspectral video restoration results

We report quantitative results using the RMSE (root mean squared error) between the recovered hyperspectral cube and the measured ground truth. The RMSE between a recovered hyperspectral frame H^r ∈ ℝ^{u×v×n} and its corresponding ground truth H^g ∈ ℝ^{u×v×n} is given by

\mathrm{RMSE} = \sqrt{ \frac{1}{uvn} \sum_{i=1}^{u} \sum_{j=1}^{v} \sum_{k=1}^{n} \left( H^r_{ijk} - H^g_{ijk} \right)^2 },   (14)
where n is the spectral and u × v are the spatial dimensions. To avoid bias in the results, the RMSE was always measured at only those pixels where motion was detected by optical flow. The RMSE of the full frames was much lower due to averaging with the more reliable spectra in the static regions. The input frames were divided into static regions and motion regions using two conservative masks obtained from optical flow. The masks ensured that only static regions were used for learning the dictionaries and that the RMSE was calculated at the motion pixels.
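The RMSE of Eq. (14), restricted to a motion mask as described above, can be sketched as:

```python
import numpy as np

def masked_rmse(Hr, Hg, mask=None):
    """RMSE of Eq. (14) between cubes shaped (u, v, n); if a boolean motion
    mask of shape (u, v) is given, only those pixels are scored."""
    diff2 = (np.asarray(Hr) - np.asarray(Hg)) ** 2
    if mask is not None:
        diff2 = diff2[mask]              # (n_motion_pixels, n) slice
    return float(np.sqrt(diff2.mean()))
```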

In the first experiment, we compare optical flow based registration to cubic interpolation (between measured bands) and use the proposed algorithm to restore the hyperspectral frames in both cases. Since learning a dictionary from interpolated bands leads to incorrect sparse representations, additional training data is required in the form of at least one full 30 band hyperspectral cube. We therefore used an additional hyperspectral cube to learn the dictionary for restoration of the interpolated bands. In the case of optical flow (in this and all remaining experiments), the dictionaries were learned from the static regions of the input frame.

Table 1 shows the results of our first experiment. Optical flow gives a smaller RMSE with respect to the ground truth than cubic interpolation. Moreover, the proposed restoration algorithm recovers the hyperspectral cubes more accurately from optical flow based registered bands. Nevertheless, it is interesting to see that our algorithm recovers the dense hyperspectral cube with reasonable accuracy even from a cube constructed by interpolation between only five measured bands.

Table 1. RMSE of restored hyperspectral frames from measured ground truth. Frames registered with optical flow give better restoration accuracy compared to cubic interpolation.


In the second experiment, we run the proposed restoration algorithm in a loop to find the minimum number of required iterations. Table 2 shows that the maximum improvement in RMSE was achieved in the first iteration; the second iteration improved the RMSE only in a few cases where the maximum frame distance of the optical flow was higher. However, while widely accepted, RMSE is not the best quality measure for denoised images since it does not take structural distortions into account [16]. Visual inspection shows that the second iteration improves the structural quality of the images, but the RMSE degrades slightly due to blurring caused by the second term in Eq. (11) and the averaging of the overlapping patches during spatial restoration. The results in Table 2 are given for the two patch sizes used in the spatial restoration stage, i.e. 3 × 3 × 6 and 5 × 5 × 6. The third dimension is fixed at 6 because in each case there are five measured bands and one additional band to be restored. Note that the smaller patch gives slightly better results and is also computationally more efficient, since a lower-dimensional vector is approximated from a smaller dictionary.

[16] P. Ndajah, H. Kikuchi, M. Yukawa, H. Watanabe, and S. Muramatsu, “An investigation on the quality of denoised images,” Int. J. Circuits, Systems and Signal Process. 5, 423–434 (2011).

Table 2. RMSE of recovered hyperspectral frames w.r.t. the number of iterations.


In the last experiment, we compare the proposed algorithm with the K-SVD volumetric image denoising algorithm [7]. For K-SVD denoising, we used the implementation provided by the authors [7]. The output of the optical flow was used as input to both algorithms. In both cases, the (initial) dictionaries were learned from exactly the same regions of the input hyperspectral frames, i.e. where no motion was detected by optical flow. Figure 13 shows a sample band recovered with the proposed approach and with K-SVD for qualitative analysis, whereas Table 3 provides a quantitative comparison of the two algorithms for all 30 hyperspectral frames. The K-SVD algorithm did not perform well in removing the optical flow artifacts and achieved lower performance than our spectral restoration model alone. To compare our spatial restoration model, we combined K-SVD volumetric denoising with our spectral restoration model in different configurations. The proposed spectral restoration followed by K-SVD volumetric denoising (λ+KSVD) did not improve the RMSE except for frames 7, 8 and 18. Overall, the best performance was achieved by the proposed spectro-spatial restoration model. Note that there is more motion between certain frames, causing their RMSE to be greater than that of others.
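For reference, the figures reported in Tables 2 and 3 are the standard root-mean-square error between a restored band and its measured ground-truth band; a minimal sketch:

```python
import numpy as np

def rmse(restored, ground_truth):
    """Root-mean-square error between a restored image (or band) and
    the measured ground truth, as reported in Tables 2 and 3."""
    restored = np.asarray(restored, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    return float(np.sqrt(np.mean((restored - ground_truth) ** 2)))
```

As the text notes, a lower RMSE does not guarantee better perceived quality, since structural distortions are not captured by this pixel-wise measure [16].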

Fig. 13 Comparison with K-SVD denoising. A 560nm band, registered from 5 frames apart is restored with (a) spectral restoration, (b) spectral + spatial restoration, (d) K-SVD volumetric denoising, (e) spectral restoration + KSVD volumetric denoising and (f) K-SVD volumetric denoising + spectral restoration. Measured ground truth is in (c).

Table 3. Comparison with the volumetric K-SVD algorithm under different configurations. λ: proposed spectral restoration only, λ+G: proposed spectral+spatial restoration, KSVD: K-SVD volumetric denoising [7]. The overall best performance is achieved by the proposed spectral+spatial restoration λ+G.


7. Processing time

All algorithms were tested on a 2.4 GHz quad-core machine with a 32-bit operating system and 4 GB RAM. The code for hyperspectral video acquisition and the optical-flow-based registration was implemented in Visual C++. Acquisition time for one hyperspectral frame was 0.66 seconds. Optical-flow-based registration took 4.04 seconds per hyperspectral frame. The spectral and spatial restoration algorithms were implemented in Matlab; the Sparse Modeling Software (SPAMS) [13] was used for dictionary learning and least angle regression [14]. The time required for spectral dictionary learning and restoration was 6.74 seconds, whereas the time required for spatial dictionary learning and restoration was 4.70 seconds per band.

8. Conclusion

We presented an algorithm for the restoration of dense hyperspectral video from a few measured bands per frame. The proposed approach increases the frame rate or spectral resolution of imaging systems many-fold. It exploits the sparsity in the spectral response of natural objects and the spatial correlation between images acquired at different wavelengths. The measured bands of each frame are arranged in a Bayer-like pattern to make spatio-spectral images, which offer better optical flow accuracy. Errors from optical flow are first removed using a spectral restoration model, followed by a spatial restoration model. Different formulations of sparse coding are used in the two models, and the dictionary is learned from regions of the input frame (to be restored) where no motion is detected. Experimental analysis on real data and comparison with an existing state-of-the-art volumetric image denoising technique, under various experimental configurations, shows that the proposed approach consistently achieves higher restoration accuracy. Unlike the majority of the image restoration literature, we did not attempt to remove noise or artifacts that had been synthetically introduced into the ground truth images. Instead, we measured the ground truth bands for comparison, which is a more realistic setting.

It is worth mentioning that the number of bands per frame determines a trade-off between accuracy and efficiency. The algorithm will still work with fewer or more bands per frame. Fewer bands per frame will increase the hyperspectral video frame rate but will deteriorate the optical flow and restoration accuracy. Similarly, measuring more wavelengths is likely to improve accuracy at the cost of a lower frame rate.
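As a rough illustration of this trade-off, assume the 0.66 s frame acquisition time reported in Section 7 is split evenly over the five measured bands (an assumption; the actual per-band split is not stated in the text):

```python
# Bands-per-frame vs. frame-rate trade-off, under the (assumed) uniform
# split of the measured 0.66 s acquisition time over five bands.
T_FRAME_5_BANDS = 0.66            # seconds per frame for 5 bands (measured)
t_band = T_FRAME_5_BANDS / 5      # ~0.132 s per band (assumed uniform)

for n_bands in (3, 5, 8):
    frame_time = n_bands * t_band
    print(f"{n_bands} bands/frame -> {1.0 / frame_time:.2f} frames/s")
```

Fewer bands raise the frame rate roughly in inverse proportion, but, as noted above, at the cost of optical flow and restoration accuracy.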

Acknowledgment

This research was supported by Australian Research Council grant DP110102399.

References and links

1. D. Kittle, K. Choi, A. Wagadarikar, and D. Brady, “Multiframe image estimation for coded aperture snapshot spectral imagers,” Appl. Opt. 49, 6824–6833 (2010). [CrossRef] [PubMed]
2. M. Shankar, N. Pitsianis, and D. Brady, “Compressive video sensors using multichannel imagers,” Appl. Opt. 49, B9–B17 (2010). [CrossRef] [PubMed]
3. A. Wagadarikar, N. Pitsianis, X. Sun, and D. Brady, “Video rate spectral imaging using a coded aperture snapshot spectral imager,” Opt. Express 17, 6368–6388 (2009). [CrossRef] [PubMed]
4. Y. Pati, R. Rezaiifar, and P. Krishnaprasad, “Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition,” in Proceedings of the 27th Asilomar Conference on Signals, Systems, and Computers (IEEE, 1993), 40–44. [CrossRef]
5. R. Tibshirani, “Regression shrinkage and selection via the Lasso,” J. R. Stat. Soc. Ser. B 58, 267–288 (1996).
6. M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process. 54, 4311–4322 (2006). [CrossRef]
7. M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process. 15, 3736–3745 (2006). [CrossRef] [PubMed]
8. M. Elad and G. Sapiro, “Sparse representation for color image restoration,” IEEE Trans. Image Process. 17, 53–69 (2008). [CrossRef] [PubMed]
9. H. Othman and S. Qian, “Noise reduction of hyperspectral imagery using hybrid spatial-spectral derivative-domain wavelet shrinkage,” IEEE Trans. Geosci. Remote Sens. 44, 397–408 (2006). [CrossRef]
10. S. Bourguignon, D. Mary, and E. Slezak, “Sparsity-based denoising of hyperspectral astrophysical data with colored noise: Application to the MUSE instrument,” in 2nd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (IEEE, 2010), 1–4. [CrossRef]
11. G. Farnebäck, “Two-frame motion estimation based on polynomial expansion,” in Proceedings of the 13th Scandinavian Conference on Image Analysis (Springer, 2003), 363–370.
12. J. Parkkinen, J. Hallikainen, and T. Jaaskelainen, “Characteristic spectra of Munsell colors,” J. Opt. Soc. Am. A 6, 318–322 (1989). [CrossRef]
13. J. Mairal, J. Ponce, and G. Sapiro, “Online learning for matrix factorization and sparse coding,” J. Mach. Learn. Res. 11, 19–60 (2010).
14. B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least angle regression,” Ann. Stat. 32, 407–499 (2004). [CrossRef]
15. J. Yang, J. Wright, T. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Trans. Image Process. 19, 2861–2873 (2010). [CrossRef]
16. P. Ndajah, H. Kikuchi, M. Yukawa, H. Watanabe, and S. Muramatsu, “An investigation on the quality of denoised images,” Int. J. Circuits, Systems and Signal Process. 5, 423–434 (2011).

OCIS Codes
(100.3020) Image processing : Image reconstruction-restoration
(100.4145) Image processing : Motion, hyperspectral image processing

ToC Category:
Image Processing

History
Original Manuscript: February 23, 2012
Revised Manuscript: April 19, 2012
Manuscript Accepted: April 19, 2012
Published: April 24, 2012

Citation
Ajmal Mian and Richard Hartley, "Hyperspectral video restoration using optical flow and sparse coding," Opt. Express 20, 10658-10673 (2012)
http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-20-10-10658


