Optics Express
Vol. 20, Iss. 10 — May 7, 2012, pp. 10971–10983

High-resolution lightfield photography using two masks

Zhimin Xu, Jun Ke, and Edmund Y. Lam


Optics Express, Vol. 20, Issue 10, pp. 10971-10983 (2012)
http://dx.doi.org/10.1364/OE.20.010971



Abstract

A major theme of computational photography is the acquisition of the lightfield, which opens up new imaging capabilities, such as focusing after image capture. However, to capture the lightfield, one normally has to sacrifice significant spatial resolution compared with normal imaging for a fixed sensor size. In this work, we present a new design for lightfield acquisition, which allows for the capture of a higher resolution lightfield by using two attenuation masks. They are positioned at the aperture stop and in the optical path, respectively, so that the four-dimensional (4D) lightfield spectrum is encoded and sampled by a two-dimensional (2D) camera sensor in a single snapshot. Then, during post-processing, by exploiting the coherence embedded in a lightfield, we can retrieve the desired 4D lightfield with a higher resolution using inverse imaging. The performance of our proposed method is demonstrated with simulations based on actual lightfield datasets.

© 2012 OSA

1. Introduction

Advances in computational imaging suggest that we can capture more information than a single two-dimensional (2D) projection of a three-dimensional (3D) scene. Although the picture acquired in this manner may not be visually pleasing, computational post-processing can yield data that could not be obtained with traditional methods [1–5]. In this paper, we focus on the camera design for computational photography, which allows us to capture the “lightfield”. This is a term commonly used in the computer graphics literature [6], but it is not a “field” in the wave optics sense [7]; instead, it is a collection of light rays in geometric optics, which takes into account not only the geometrical position of the rays but also their directions.

Generally, the radiance along all the rays in a region of 3D space is mathematically characterized by a five-dimensional (5D) plenoptic function [8], i.e., three coordinates for the position and two angles for the direction. In free space, as the radiance does not change along a line unless it is occluded, such a 5D representation may be reduced to a four-dimensional (4D) one, which is called the “lightfield” [6] or “lumigraph” [9]. With a lightfield, we can reconstruct, or render, various observations of the scene. For example, we can manipulate viewpoints and perform refocusing via ray-tracing techniques.

There are two main approaches to capturing lightfields. The first one is to sample each individual light ray directly. An early example is integral photography [10], which gathers multiple images from different perspectives by placing an array of microlenses directly before the sensor. This is optically similar to a camera array system [11]. More recently, Adelson and Wang [12], and Ng et al. [13], developed what they called plenoptic cameras. In the latter, an additional main lens is placed in front of the microlens array. Since the microlenses are located at the focal plane of this additional lens, the converging rays are separated and finally recorded by the sensor behind the microlens array. A second approach is to acquire the data in the Fourier domain. Veeraraghavan et al. developed dappled photography [14], where an attenuation mask is added to a regular camera. Its working principle will be discussed in more detail in Sections 2.1 and 2.2. After that, Agrawal et al. extended this design to the problem of capturing useful subsets of a time-varying 4D lightfield in a single snapshot [15]. This “reinterpretable” imaging system adopts a time-varying mask in the pupil plane and a static mask placed near the sensor, providing a variable resolution tradeoff among the spatial, angular and temporal dimensions.

Nevertheless, a common issue for these lightfield camera systems is that spatial resolution is traded for angular information (for both angular and temporal information in [15]), because the limited sensor elements have to be allocated across all these dimensions [16, 17]. For instance, to acquire a lightfield of 144 views on a sensor of size 3072 × 1536, a twelvefold reduction in each spatial dimension means that the maximum resolution achievable is only 256 × 128. There have been attempts to overcome this tradeoff, but they come at the expense of other aspects. For example, the camera array system [11] can gain the 4D radiance information with a high resolution (i.e., the full sensor size of each camera) for each perspective, but the system is also known for its large size, which ultimately limits its practical use. Alternatively, in a method known as programmable aperture photography [18], many image captures are needed to attain the required angular resolution. This results in a long acquisition time, which is not desirable in many practical applications. In [19], Lumsdaine and Georgiev describe a new design of a plenoptic camera, called the focused plenoptic camera, where the microlens array is positioned before or behind the focal plane of the main lens. This modification samples the lightfield in a way that allows for a higher spatial resolution. However, at the same time, the angular resolution is decreased. Moreover, the low angular resolution also introduces unwanted aliasing artifacts.

In this paper, we present a camera system that collects the 4D lightfield within a single exposure. With two attenuating masks separately placed at the aperture plane and the optical path of the camera, we can encode the lightfield spectrum in the Fourier domain, and then selectively sub-sample it. We show that this economical and easily adjustable design can overcome various limitations found in other lightfield acquisition systems.

2. A lightfield camera with two masks

2.1. Lightfield mapping via mask-based multiplexing

We explain the mapping of a lightfield with mask-based multiplexing. In geometrical optics, we describe light propagation in terms of rays, which together form a lightfield [6]. We describe the light rays by their intersections with two parallel planes as shown in Fig. 1, i.e., a first coordinate pair u = {u,v} (at the u-plane) and a second coordinate pair s = {s,t} (at the s-plane) [6]. The lightfield is then ℓ(u,v,s,t), which we abbreviate as ℓ(u,s) in the rest of this paper.
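
As a concrete picture of this parametrization, here is a minimal sketch of our own (not taken from the paper; the array shape is an arbitrary placeholder): a discretized two-plane lightfield is naturally stored as a four-dimensional array indexed by the angular coordinates (u,v) and the spatial coordinates (s,t).

    import numpy as np

    # A discretized two-plane lightfield l(u, v, s, t) held as a 4D array:
    # axes 0-1 index the angular samples (u, v), axes 2-3 the spatial samples (s, t).
    n_u, n_v, n_s, n_t = 10, 10, 128, 256        # placeholder resolutions
    lf = np.zeros((n_u, n_v, n_s, n_t))

    view = lf[3, 7]              # one perspective image: fix (u, v), keep all (s, t)
    pencil = lf[:, :, 64, 128]   # all rays reaching one spatial location (s, t)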

Fig. 1 The two-plane parametrization of a 4D lightfield.

Using this two-plane parametrization, we can analyze a conventional camera fitted with a mask between the u-plane and the s-plane. We depict such a camera in Fig. 2. The u-plane is taken to be at the aperture, while the s-plane at the sensor. They are separated by a distance d, while the mask is placed at a distance z in front of the sensor, where z ≤ d. Let m(u,s) be the attenuation on a lightfield produced by the mask. The lightfield measured behind the mask is then ℓo(u,s), given by
\ell_o(u, s) = \ell(u, s)\, m(u, s).    (1)
If we can capture ℓo(u,s), we can retrieve ℓ(u,s) since m(u,s) is known.

Fig. 2 Schematic diagram of a regular camera, with an attenuation mask placed inside it.

In fact, m(u,s) is completely determined by the 2D pattern c(x,y) printed on the mask when the distance d is known. We denote the mask plane as the x-plane, with x = {x,y}. With reference to Fig. 2, because ΔABC and ΔADE are similar triangles, we have
\frac{BC}{DE} = \frac{AB}{AD} \quad\Longrightarrow\quad \frac{x - u}{s - u} = \frac{d - z}{d}.    (2)
Based on Eq. (2), we have x = (1 − z/d)s + (z/d)u. But since u = {u,v} and s = {s,t},
x = \left(1 - \frac{z}{d}\right) s + \frac{z}{d}\, u.    (3)
Thus, m(u,s) can be expressed as
m(u, s) = c\!\left[\left(1 - \frac{z}{d}\right) s + \frac{z}{d}\, u\right].    (4)

In reality, however, we seldom directly capture the lightfield ℓo(u,s). Instead, it is instructive to consider the “lightfield-frequency” domain, obtained by applying the 4D Fourier transform to the lightfield in Eq. (1). Using fu and fs to denote the lightfield-frequency variables, we have
\mathcal{L}_o(f_u, f_s) = \mathcal{L}(f_u, f_s) * M(f_u, f_s),    (5)
where ℒo(fu, fs), ℒ(fu, fs) and M(fu, fs) are the respective Fourier transforms of ℓo(u,s), ℓ(u,s) and m(u,s), and * denotes the 4D convolution operation. Furthermore, we can express M(fu, fs) as
M(f_u, f_s) = \iint c\!\left[\left(1 - \frac{z}{d}\right) s + \frac{z}{d}\, u\right] \exp\!\left[-j 2\pi (f_u u + f_s s)\right] du\, ds
            = \int \left\{ \int c\!\left[\left(1 - \frac{z}{d}\right) s + \frac{z}{d}\, u\right] \exp(-j 2\pi f_s s)\, ds \right\} \exp(-j 2\pi f_u u)\, du.    (6)
Clearly, the positioning of the mask (i.e., the value of z) affects the lightfield ℓo(u,s). This effect is explained in further detail as follows.
  1. Generally, the mask is between the aperture and the sensor, so 0 < z < d. According to Eq. (6), the inner integration computes the Fourier transform over the dimension of s with some shift and scaling, i.e. [20],
     M(f_u, f_s) = \frac{d}{d - z} \int \left\{ C\!\left(\frac{d}{d - z} f_s\right) \exp\!\left[ j 2\pi \left(\frac{z}{d - z} f_s\right) u \right] \right\} \exp(-j 2\pi f_u u)\, du = \frac{d}{d - z}\, C\!\left(\frac{d}{d - z} f_s\right) \delta\!\left(f_u - \frac{z}{d - z} f_s\right),    (7)
     where C(·) represents the 2D Fourier transform of c(·). This means that the modulation caused by the mask in the lightfield-frequency domain happens along an inclined 2D plane, where fu − [z/(d − z)] fs = 0. Its inclination angle α, if we plot fs versus fu, is given by
     \alpha = \arctan \frac{z}{d - z}.    (8)
  2. Alternatively, the mask can be placed exactly at the aperture, where z = d. All the rays with the same location in the u-plane are attenuated equally by the mask. Substituting z = d into Eq. (6) gives
     M(f_u, f_s) = C(f_u)\, \delta(f_s).    (9)
     Thus, in the lightfield-frequency domain, the corresponding convolution only affects the lightfield spectrum along the fu axis (where fs = 0). This observation is critical to our design, as we will explain next. (A brief numerical check of Eqs. (7)–(9) is sketched after this list.)
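
The following sketch is our own numerical check of this geometry (not part of the paper; the grid size, mask frequency, and z/d values are arbitrary choices). It samples m(u,s) for a single-cosine mask pattern, takes a 2D FFT, and confirms that the spectral peaks lie on the line fu = [z/(d − z)] fs of Eqs. (7)–(8), collapsing onto the fu axis when z = d as in Eq. (9).

    import numpy as np

    # Toy check of Eqs. (7)-(9): for c(x) = cos(2*pi*f0*x), the 2D spectrum of
    # m(u, s) = c((1 - z/d)s + (z/d)u) peaks on the line f_u = [z/(d - z)] f_s,
    # and on the f_u axis when the mask sits at the aperture (z = d).
    N = 128
    f0 = 16.0                               # mask frequency, cycles per unit length
    u = np.arange(N) / N                    # u and s sampled on the unit interval
    s = np.arange(N) / N

    def mask_peak(z_over_d):
        x = (1.0 - z_over_d) * s[None, :] + z_over_d * u[:, None]    # Eq. (3)
        m = np.cos(2.0 * np.pi * f0 * x)                             # Eq. (4)
        M = np.fft.fft2(m)                          # axis 0 <-> f_u, axis 1 <-> f_s
        freqs = np.fft.fftfreq(N, d=1.0 / N)        # integer cycle counts
        iu, js = np.unravel_index(np.argmax(np.abs(M)), M.shape)
        return freqs[iu], freqs[js]

    fu, fs = mask_peak(0.25)
    print(fu, fs, fu / fs)      # 4.0 12.0 0.333... = z/(d - z), i.e. tan(alpha)
    print(mask_peak(1.0))       # (16.0, 0.0): peaks on the f_u axis, Eq. (9)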

2.2. Lightfield capture and image reconstruction

The sensor at the s-plane cannot capture the full 4D lightfield ℓo(u,s) as given in Eq. (1). Instead, all rays with the same (s,t) but different (u,v) are collected (i.e., integrated together) by the same photodetector. In the lightfield-frequency domain, this means the sensor only obtains data at fu = 0, or along the fs axis.

Ref. [14], however, provides a strategy to capture the 4D lightfield using a normal sensor, which we briefly review here. This will form the basis of our computational photography architecture, which makes use of two masks. Assume that c(x) is the sum of a series of cosine waves of equal amplitude; C(fx) is then an impulse train with even symmetry, which causes modulation along a slanted plane. Specifically, Eq. (5) suggests that ℒo(fu, fs) contains replications of ℒ(fu, fs) along a slanted plane at the angle α given by Eq. (8). This is shown in Fig. 3. For ease of explanation, we depict the lightfield spectrum as one consisting of several sections along the fu axis, each of which is called an angular spectral slice. By adjusting α and the distance between consecutive replications of the lightfield spectrum along the slanted plane, we can position all the sections along the fs axis. Therefore, the 2D slice of data collected by the sensor still contains all the information about the 4D lightfield.

Fig. 3 The modulation in the lightfield-frequency domain.

The tradeoff with this mode of capture is that the slice in Fig. 3 needs to be much longer than what would be needed for conventional photography; therefore, many more samples are needed to achieve the same 2D resolution for one reconstructed picture. Put another way, assume the overall number of pixels is q. Then, to resolve n different views, we only assign q/n of the pixels to sample each angular spectral slice, compared with using all q pixels for a single picture in conventional photography. This ultimately results in a loss of the spatial resolution with a scaling of 1/n. Our design of a lightfield camera seeks to ameliorate this problem by showing that when each angular spectral slice can contain more information than merely one perspective or view, fewer replicas of the lightfield spectrum are needed. This means that effectively the sensor slice is shortened, and as a result a higher resolution lightfield can be obtained with a fixed sensor size.

2.3. Lightfield capture with a double-mask design

We propose a lightfield camera as shown in Fig. 4. We assume that the lightfield spectrum is bandlimited, i.e., ℒ(fu, fs) = 0 for |fu| ≥ Bu/2 or |fs| ≥ Bs/2. This is reasonable because the optics imposes a cutoff in the optical transfer function in the fs axis. As for fu, Ref. [21] shows that the corresponding bandwidth is basically determined by the depth range of a scene.

Fig. 4 Our proposed lightfield camera, with two attenuation masks respectively placed at the aperture stop and the optical path in the camera.

We analyze the working principle of this camera by considering the operations in the lightfield-frequency domain as shown in Fig. 5. After passing through the first attenuation mask located at the aperture stop, the spectrum of the incoming bandlimited lightfield is convolved with the mask spectrum along the fu axis. If the mask frequency response is a series of impulses, the lightfield spectrum is replicated along the fu axis, causing the angular spectral slices to overlap with each other. This is the lightfield spectrum encoding. Because of the second mask, the encoded lightfield spectrum is then replicated along a slanted line. By adjusting the position of the mask, we can place the desired angular spectral slices along the fs axis. Thereafter, we perform the lightfield reconstruction from the 2D slice data collected by the sensor in the fashion described in Section 2.2.

Fig. 5 The corresponding lightfield-frequency domain operations in our double-mask light-field camera. (The asterisk pattern in the figure denotes the convolution.)

The analysis in the lightfield-frequency domain provides an intuitive understanding of our design. However, for the purpose of mask design and lightfield retrieval, we need to explicitly model the acquisition process. This is expressed as
i(s) = \int \ell(u, s)\, m_1(u, s)\, m_2(u, s)\, du = \int \ell(u, s)\, c_1(u)\, c_2\!\left[\left(1 - \frac{z}{d}\right) s + \frac{z}{d}\, u\right] du,    (10)
where i(s) is the 2D picture recorded by the sensor, and m1(u,s) and m2(u,s) are the respective attenuations provided by the masks at the aperture stop (c1(x)) and in the camera’s optical path (c2(x)) shown in Fig. 4. The formulas for the masks are given by Eq. (4).
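
To make the discretization of Eq. (10) concrete, here is a minimal 2D sketch of our own (one angular and one spatial coordinate; the sizes, mask patterns, and z/d value are placeholders) that evaluates the capture as a sum over the angular samples.

    import numpy as np

    # Toy 2D version of Eq. (10): one angular coordinate u, one spatial coordinate s.
    n, m = 10, 128                        # angular and spatial sample counts
    rng = np.random.default_rng(1)
    lf = rng.random((n, m))               # placeholder lightfield l(u, s)

    u = np.linspace(-0.5, 0.5, n)         # aperture coordinates (arbitrary units)
    s = np.linspace(-0.5, 0.5, m)         # sensor coordinates
    z_over_d = 0.1                        # mask position relative to the gap d

    c1 = lambda x: 1.0 + 0.5 * np.cos(2 * np.pi * 3 * x)   # placeholder patterns
    c2 = lambda x: 1.0 + 0.5 * np.cos(2 * np.pi * 7 * x)

    # i(s) = sum_u l(u, s) c1(u) c2((1 - z/d)s + (z/d)u) du
    du = u[1] - u[0]
    x = (1.0 - z_over_d) * s[None, :] + z_over_d * u[:, None]
    i_sensor = np.sum(lf * c1(u)[:, None] * c2(x), axis=0) * du
    print(i_sensor.shape)                 # (128,): the picture recorded by the sensor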

As indicated in Fig. 5, our design is based on a series of operations in the lightfield-frequency domain. Thus, it is natural to convert the integral in Eq. (10) into a form expressed in the Fourier basis. After discretizing Eq. (10) and converting it into matrix form, we have
\mathbf{i} = \mathbf{F}^{-1} \mathbf{M}_2 \mathbf{M}_1 \mathbf{F}\, \boldsymbol{\ell} = \mathbf{F}^{-1} \mathbf{M} \mathbf{F}\, \boldsymbol{\ell} = \mathbf{A}\, \boldsymbol{\ell},    (11)
where F and F−1 are the matrices consisting of the Fourier basis and its inverse, M1 and M2 respectively consist of the coefficients of the Fourier transforms of c1(x) and c2(x), and the projection matrix is A = F−1MF. Therefore, the image formation of our lightfield camera can be treated as a linear integration process in the context of geometrical optics, as indicated in [22, 23]. More specifically, it is a measuring procedure in the lightfield-frequency domain through a measurement matrix M = M2M1.

We note that the discretized lightfield ℓ is arranged into a 2D matrix of size n × m, with n as the resolution in the u dimension and m as the resolution in the s dimension. Assume M1 and M2 are of size p × n and k × p, respectively. Then, M is a k × n matrix, which means that we take k measurements of the coefficients decomposed over n Fourier bases. The size of the final captured picture i is k × m, meaning we need a sensor with km pixels. We can compare this with the design in [14], which forbids overlapping between the replicated spectra. Consequently, the matrix M in their case is diagonal (k = n). To achieve a lightfield with the same resolution, the dappled photography system will need nm pixels. In our design, however, the measurement matrix is the product of two matrices M2 and M1. This provides us with the means to control the size of the two dimensions of M separately. Hence, if we can achieve a measurement matrix M with k < n, fewer pixels will be used to sample the signal. In other words, we can acquire a higher spatial resolution lightfield using the same number of pixels. As discussed next, such a measurement matrix with k < n can indeed be realized in our design.
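
A quick shape check of this pixel-budget argument, using the numbers that appear later in Section 3 (our own back-of-the-envelope sketch; the value of p and the rows kept by M2 are hypothetical):

    import numpy as np

    # Eq. (11) bookkeeping: M1 is p-by-n, M2 keeps k of its p rows, so M = M2 @ M1
    # is k-by-n and the captured picture needs k*m pixels instead of n*m.
    n, m = 100, 128 * 256            # 10 x 10 views of 128 x 256 pixels (Section 3)
    p, k = 2 * n - 1, 36             # p is one plausible choice; k = 6 x 6 (36% case)
    rng = np.random.default_rng(2)
    M1 = rng.normal(size=(p, n))                      # stand-in for the Toeplitz M1
    M2 = np.zeros((k, p))
    M2[np.arange(k), np.arange(k)] = 1.0              # hypothetical row selection
    M = M2 @ M1
    spectrum = rng.normal(size=(n, m))                # stand-in for F applied to l
    print(M.shape, (M @ spectrum).shape)              # (36, 100) (36, 32768)
    print(k * m, "pixels needed here vs", n * m, "for a diagonal M (dappled)")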

2.4. Design of the two masks

In this section, we describe the pattern design of these two attenuation masks. For clarity, only the case of a 2D lightfield is treated here, but the conclusions extend readily to a 4D lightfield.

The first row of Fig. 5 shows the desired frequency response of the first mask, which is a symmetric impulse train. The interval between consecutive impulses is equal to the sampling interval of the lightfield spectrum along the fu axis. Thus, the corresponding physical mask pattern is the sum of multiple cosine waves with given amplitudes, which in turn determines M1 completely. Specifically, assume the first mask has the following frequency response:
C_1(f_u) = \sum_{i = -(n-1)}^{n-1} a_i\, \delta(f_u - i\, \Delta f_u),    (12)
where n is the expected resolution along the fu axis, ai is the amplitude of the i-th impulse, and Δfu is the sampling interval of the lightfield spectrum along the fu axis, which is equal to Bu/n with Bu as the bandwidth in the fu dimension. Because the frequency response of the first mask is convolved with the lightfield spectrum in the lightfield-frequency domain, by converting the convolution into a matrix multiplication we have
\mathbf{M}_1 = \begin{bmatrix} a_0 & a_1 & \cdots & a_{n-1} \\ a_{-1} & a_0 & \cdots & a_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{-(n-1)} & a_{-(n-2)} & \cdots & a_0 \end{bmatrix}_{p \times n}.    (13)

Thus, we have constructed a matrix M1 with a Toeplitz-structured block inside it. Because of the second mask, only k rows of M1 are selected; the remaining rows are indicated with ellipses in Eq. (13) for simplicity. Note that we can recover the original sparse signal with high probability from the limited observations measured by a well-designed Toeplitz-structured matrix [24, 25]. Several methods have been recommended to satisfy the conditions for such a design. As suggested in [24], we generate M1 with entries ai, i = 0,...,n − 1, drawn independently from a Gaussian distribution with zero mean. Since the ai are symmetric about a0, the values of ai for i = −(n − 1),...,−1 follow immediately. Eventually, we obtain the physical pattern of the first mask based on its frequency response in Eq. (12).
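
A minimal sketch of this construction (our own illustration; n, Δfu, and the choice of p are placeholders): draw the symmetric coefficients ai, assemble the Toeplitz-structured M1 of Eq. (13), and form the physical pattern implied by Eq. (12) as a sum of cosines, raising the DC term so the printed mask is nonnegative (cf. Section 3).

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10                                   # target angular resolution along f_u
    a_pos = rng.normal(size=n)               # a_0 .. a_{n-1}: zero-mean Gaussian draws
    a = np.concatenate([a_pos[:0:-1], a_pos])    # even symmetry a_{-i} = a_i, length 2n-1

    # Toeplitz-structured M1 of Eq. (13): entry (j, i) holds a_{(j - i) - (n - 1)}.
    # Taking p = 3n - 2 (the full convolution length) is one plausible choice of p.
    p = 3 * n - 2
    M1 = np.zeros((p, n))
    for j in range(p):
        for i in range(n):
            if 0 <= j - i < 2 * n - 1:
                M1[j, i] = a[j - i]

    # Physical pattern of the first mask (Eq. (12) inverted): a sum of cosines,
    # c1(x) = a_0 + 2 * sum_i a_i cos(2*pi*i*dfu*x), shifted to be nonnegative.
    dfu = 1.0                                # sampling interval along f_u (placeholder)
    x = np.linspace(0.0, 1.0, 512)
    c1 = a_pos[0] + 2.0 * sum(a_pos[i] * np.cos(2 * np.pi * i * dfu * x)
                              for i in range(1, n))
    c1 = c1 - min(c1.min(), 0.0)             # raise the DC so the transmittance is >= 0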

As for the second mask placed in the optical path, the second row in Fig. 5 shows a heuristic example. The frequency response of the second mask is a series of even-symmetric impulses with equal amplitudes. The number of impulses depends on how many measurements are required for reconstruction. To avoid aliasing between adjacent spectrum replicas, the interval of this impulse train is equal to the lightfield bandwidth in the fs dimension, i.e., Bs. Specifically, the frequency response of the second mask is given by
C_2(f_x) = \sum_{i = -(k-1)/2}^{(k-1)/2} \delta(f_x - i\, B_s),    (14)
where k is the number of measurements. The corresponding mask pattern c2(x) is obtained by taking the inverse Fourier transform of Eq. (14), which is a sum of cosine waves. Its matrix form M2 depends on which measurements are to be collected for further reconstruction; in effect, we realize the function of M2 by selecting k rows of M1 according to the specific design.
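
A companion sketch for the second mask (again our own illustration, with k taken odd so the symmetric index range in Eq. (14) is exact; Bs, the sizes, and the selected rows are placeholders): the pattern is a sum of equal-amplitude cosines, and M2 is modeled as a k-row selector.

    import numpy as np

    # Second mask: Eq. (14) inverted gives c2(x) = 1 + 2*sum_i cos(2*pi*i*Bs*x).
    k, Bs = 5, 4.0                           # number of measurements, f_s bandwidth
    x = np.linspace(0.0, 1.0, 512)
    c2 = 1.0 + 2.0 * sum(np.cos(2 * np.pi * i * Bs * x)
                         for i in range(1, (k - 1) // 2 + 1))
    c2 = c2 - min(c2.min(), 0.0)             # nonnegative transmittance, as with c1

    # In matrix form, M2 acts as a selector keeping k of the p rows associated with
    # M1 (see the previous sketch); which rows are kept is a design choice, so the
    # selection below is purely hypothetical.
    p, n = 3 * 10 - 2, 10
    M1 = np.random.default_rng(0).normal(size=(p, n))   # stand-in for the Toeplitz M1
    rows = np.linspace(0, p - 1, k).astype(int)
    M2 = np.zeros((k, p))
    M2[np.arange(k), rows] = 1.0
    M = M2 @ M1                              # the k-by-n measurement matrix M = M2 M1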

2.5. Lightfield reconstruction

This optimization can be solved via a nonlinear conjugate gradient method combined with backtracking line search, as adopted in [29].
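
The objective being referred to (and its Eq. (16), cited again in Section 3) is not reproduced in this excerpt, so the sketch below only illustrates the least-norm baseline that Section 3 compares against (its Eq. (15)), using a toy projection matrix of our own; it is not the authors' reconstruction algorithm.

    import numpy as np

    # Least-norm baseline for the underdetermined system i = A @ l (cf. the
    # "least-norm method" compared against in Section 3). A and l are toy stand-ins;
    # the paper's proposed method instead solves a regularized objective with a
    # nonlinear conjugate gradient and backtracking line search [29].
    rng = np.random.default_rng(4)
    k, n = 36, 100
    A = rng.normal(size=(k, n))              # stand-in projection matrix (Eq. (11))
    l_true = rng.random(n)                   # stand-in discretized lightfield
    i_meas = A @ l_true                      # simulated capture

    l_ln = np.linalg.pinv(A) @ i_meas        # minimum-norm estimate A^+ i
    print(np.allclose(A @ l_ln, i_meas))     # consistent with the measurements: True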

3. Experimental results

To verify the ability to achieve a high-resolution lightfield, a direct way is to use a fixed number of pixels to retrieve a lightfield with a higher spatial resolution. Alternatively, one can aim at obtaining a lightfield of a fixed resolution with fewer pixels, which is the approach we take here. The following experiments are based on actual lightfield datasets from the Stanford lightfield archive [30]. For computational considerations, we choose 100 views on a 10 × 10 grid and resize the image to 128 × 256 pixels.

Figure 6 shows the corresponding mask patterns that are adopted in the experiments. According to Eq. (12) in Section 2.4, the required frequency response of the mask at the aperture stop is an even-symmetric impulse train of size 19 × 19 (where n = 10 × 10 in our experiments). The corresponding amplitudes of these impulses are drawn independently from a Gaussian distribution with zero mean. The physical pattern shown in Fig. 6(a) is the one we use here. Since the mask at the aperture stop is responsible for encoding the lightfield spectrum, we keep this mask unchanged during our experiments.

Fig. 6 (a) The pattern of the first mask; (b)–(e) the pattern parts of the second mask when using the full, 64%, 36% and 16% sensor sizes, respectively.

For the mask placed in the optical path, the frequency response depends on the number of measurements required. For example, in the case of using the full sensor size (i.e., 1280 × 2560), it is a 10 × 10 impulse train with equal amplitudes based on Eq. (14). Similarly, we have 8 × 8 for the case of using 64% sensor size (i.e., 1024 × 2048), 6 × 6 for 36% sensor size (i.e., 768 × 1536) and 4 × 4 for 16% sensor size (i.e., 512 × 1024). Figures 6(b)–6(e) show the corresponding pattern parts in these different cases. Notice that since we cannot have negative values in the mask, we need to increase the DC component so that the values in these masks are nonnegative.

Next, we show the performance of our camera when using different sensor sizes. That is, we aim to retrieve the original lightfield at the same spatial resolution from the signals captured with different physical sensor sizes. Figure 7 shows the pictures captured by the proposed lightfield camera with different numbers of pixels. Figure 8 shows the corresponding reconstructed images at one selected viewpoint. For the sake of comparison, we use both the least-norm method in Eq. (15) and our proposed algorithm in Eq. (16) for lightfield reconstruction. In the case of using the full sensor, both methods yield perfect reconstructions matching the ground truth. With a mild reduction in sensor size, the recoveries still provide good detail comparable with the ground truth, as shown in the case of using 64% sensor size. With further reduction, the reconstruction becomes more difficult, although the reconstructed images are still satisfactory with 36% and 16% of the pixels. Furthermore, in comparison with the reconstructions using the least-norm method (the left column in Fig. 8), we can see that our method preserves more details and provides better artifact control (e.g., the ringing artifacts around the beans). Nevertheless, we also observe that with significant sensor size reduction, some of the details in the images are lost and the images become blurry.

Fig. 8 The reconstructed images at one selected viewpoint by using the least-norm method (left column) and the proposed method in Section 2.5 (right column): (a) ground truth, (b) and (c) full size, (d) and (e) 64% sensor size, (f) and (g) 36% sensor size, (h) and (i) 16% sensor size.

Finally, we show that a higher resolution lightfield can be acquired with our proposed system than with conventional lightfield cameras using the same sensor size. Figure 9 shows the case of using 36% sensor size (i.e., 768 × 1536). If we use conventional lightfield cameras such as the ones in [13, 14], the maximum spatial resolution that can be achieved is 76 × 153. From the results shown in Fig. 9, we can see that with our proposed camera the lightfield can be recovered at a higher spatial resolution. Such a resolution enhancement becomes more prominent in the case of using 16% sensor size (i.e., 512 × 1024). In this case, the best quality that can be achieved with the conventional method is 51 × 102. But by adopting the proposed camera, we can still reconstruct many details of the scene from the captured data. See Fig. 10 for details.
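
For reference, the conventional-camera figures quoted above follow from dividing each sensor dimension by the 10 × 10 view grid; a quick check of the arithmetic (our own, using the sensor sizes from the text):

    # Per-view resolution of a conventional 10 x 10-view lightfield camera on the
    # reduced sensors of Section 3.
    for h, w in [(768, 1536), (512, 1024)]:
        print((h, w), "->", (h // 10, w // 10))   # (76, 153) and (51, 102)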

Fig. 9 Reconstructions when using 36% sensor size: (a) ground truth; (b) the best quality that can be achieved by using the traditional lightfield cameras; (c) our reconstruction with the least-norm method; (d) our reconstruction with the proposed iterative method.
Fig. 10 Reconstructions when using 16% sensor size: (a) ground truth; (b) the best quality that can be achieved by using the traditional lightfield cameras; (c) our reconstruction with the least-norm method; (d) our reconstruction with the proposed iterative method.

4. Conclusions

We have shown a system that captures a 4D lightfield with two attenuation masks. Taking advantage of the correlations inherent in the lightfield, we develop a post-processing algorithm to reconstruct the lightfield from the 2D data captured by the sensor. The experimental results show that fewer pixels are needed to achieve the same resolution as that of a conventional lightfield camera.

Acknowledgments

This work was supported in part by the University Research Committee of the University of Hong Kong under Project 10208648.

References and links

1. E. Y. Lam, “Computational photography: Advances and challenges,” in Tribute to Joseph W. Goodman, H. J. Caulfield and H. H. Arsenault, eds., Proc. SPIE 8122, 81220O (2011).
2. W. T. Cathey and E. R. Dowski, “New paradigm for imaging systems,” Appl. Opt. 41, 6080–6092 (2002).
3. J. Mait, R. Athale, and J. van der Gracht, “Evolutionary paths in imaging and recent trends,” Opt. Express 11, 2093–2101 (2003).
4. W.-S. Chan, E. Y. Lam, M. K. Ng, and G. Y. Mak, “Super-resolution reconstruction in a computational compound-eye imaging system,” Multidim. Syst. Sign. Process. 18, 83–101 (2007).
5. T. Mirani, D. Rajan, M. P. Christensen, S. C. Douglas, and S. L. Wood, “Computational imaging systems: Joint design and end-to-end optimality,” Appl. Opt. 47, B86–B103 (2008).
6. M. Levoy and P. Hanrahan, “Light field rendering,” in Proceedings of ACM SIGGRAPH (1996), pp. 31–42.
7. J. W. Goodman, Introduction to Fourier Optics, 3rd ed. (Roberts and Company Publishers, 2004).
8. E. H. Adelson and J. R. Bergen, “The plenoptic function and the elements of early vision,” in Computational Models of Visual Processing, M. S. Landy and J. A. Movshon, eds. (MIT Press, 1991), pp. 3–20.
9. S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen, “The lumigraph,” in Proceedings of ACM SIGGRAPH (1996), pp. 43–54.
10. G. Lippmann, “Épreuves réversibles donnant la sensation du relief,” J. Phys. Théor. Appl. 7, 821–825 (1908).
11. B. Wilburn, N. Joshi, V. Vaish, E.-V. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy, “High performance imaging using large camera arrays,” in Proceedings of ACM SIGGRAPH (2005), pp. 765–776.
12. E. H. Adelson and J. Y. Wang, “Single lens stereo with a plenoptic camera,” IEEE Trans. Pattern Anal. Mach. Intell. 14, 99–106 (1992).
13. R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, “Light field photography with a hand-held plenoptic camera,” Stanford Tech. Report CTSR (2005), pp. 1–11.
14. A. Veeraraghavan, R. Raskar, A. Agrawal, A. Mohan, and J. Tumblin, “Dappled photography: mask enhanced cameras for heterodyned light fields and coded aperture refocusing,” in Proceedings of ACM SIGGRAPH 26 (2007).
15. A. Agrawal, A. Veeraraghavan, and R. Raskar, “Reinterpretable imager: Towards variable post-capture space, angle and time resolution in photography,” Comput. Graph. Forum 29, 763–772 (2010).
16. T. Georgeiv, K. C. Zheng, B. Curless, D. Salesin, S. Nayar, and C. Intwala, “Spatio-angular resolution tradeoff in integral photography,” in Proceedings of Eurographics Symposium on Rendering (2006), pp. 263–272.
17. Z. Xu and E. Y. Lam, “Light field superresolution reconstruction in computational photography,” in Signal Recovery and Synthesis (Optical Society of America, 2011), p. SMB3.
18. C.-K. Liang, T.-H. Lin, B.-Y. Wong, C. Liu, and H. H. Chen, “Programmable aperture photography: multiplexed light field acquisition,” in Proceedings of ACM SIGGRAPH 27 (2008), pp. 1–10.
19. A. Lumsdaine and T. Georgiev, “The focused plenoptic camera,” in Proceedings of IEEE International Conference on Computational Photography (IEEE, 2009), pp. 1–8.
20. R. N. Bracewell, The Fourier Transform and Its Applications, 3rd ed. (McGraw-Hill, 1999).
21. J.-X. Chai, X. Tong, S.-C. Chan, and H.-Y. Shum, “Plenoptic sampling,” in Proceedings of ACM SIGGRAPH 27 (2000), pp. 307–318.
22. A. Levin, W. T. Freeman, and F. Durand, “Understanding camera trade-offs through a Bayesian analysis of light field projections,” in Proceedings of the 10th European Conference on Computer Vision (2008), pp. 88–101.
23. Z. Xu and E. Y. Lam, “A spatial projection analysis of light field capture,” in Frontiers in Optics (Optical Society of America, 2010), p. FWH2.
24. W. U. Bajwa, J. D. Haupt, G. M. Raz, S. J. Wright, and R. D. Nowak, “Toeplitz-structured compressed sensing matrices,” in Proceedings of IEEE/SP 14th Workshop on Statistical Signal Processing (IEEE, 2007), pp. 294–298.
25. W. Yin, S. Morgan, J. Yang, and Y. Zhang, “Practical compressive sensing with Toeplitz and circulant matrices,” in Visual Communications and Image Processing, Proc. SPIE 7744, 77440K (2010).
26. L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D 60, 259–268 (1992).
27. E. Y. Lam, X. Zhang, H. Vo, T.-C. Poon, and G. Indebetouw, “Three-dimensional microscopy and sectional image reconstruction using optical scanning holography,” Appl. Opt. 48, H113–H119 (2009).
28. X. Zhang and E. Y. Lam, “Edge-preserving sectional image reconstruction in optical scanning holography,” J. Opt. Soc. Am. A 27, 1630–1637 (2010).
29. Z. Xu and E. Y. Lam, “Image reconstruction using spectroscopic and hyperspectral information for compressive terahertz imaging,” J. Opt. Soc. Am. A 27, 1638–1646 (2010).
30. “The (new) Stanford light field archive,” http://lightfield.stanford.edu/lfs.html.

OCIS Codes
(100.3020) Image processing : Image reconstruction-restoration
(100.3190) Image processing : Inverse problems
(110.1758) Imaging systems : Computational imaging
(110.3010) Imaging systems : Image reconstruction techniques

ToC Category:
Imaging Systems

History
Original Manuscript: March 8, 2012
Revised Manuscript: April 21, 2012
Manuscript Accepted: April 22, 2012
Published: April 26, 2012

Citation
Zhimin Xu, Jun Ke, and Edmund Y. Lam, "High-resolution lightfield photography using two masks," Opt. Express 20, 10971-10983 (2012)
http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-20-10-10971

