Optics Express

  • Vol. 16, Iss. 16 — Aug. 4, 2008
  • pp: 11776–11781

Real-time digital holographic microscopy using the graphic processing unit

Tomoyoshi Shimobaba, Yoshikuni Sato, Junya Miura, Mai Takenouchi, and Tomoyoshi Ito


Optics Express, Vol. 16, Issue 16, pp. 11776-11781 (2008)
http://dx.doi.org/10.1364/OE.16.011776


Abstract

Digital holographic microscopy (DHM) is a well-known and powerful method that allows both the amplitude and phase of a specimen to be observed simultaneously. Obtaining a reconstructed image from a hologram requires numerous Fresnel diffraction calculations. These calculations can be accelerated by the fast Fourier transform (FFT) algorithm. However, real-time reconstruction from a hologram is difficult even when a recent central processing unit (CPU) is used to calculate the Fresnel diffraction with the FFT algorithm. In this paper, we describe a real-time DHM system using a graphic processing unit (GPU) with many stream processors, which can be used as a highly parallel processor. The Fresnel diffraction is computed faster on the GPU than on recent CPUs. The real-time DHM system can obtain reconstructed images from holograms of 512×512 grids at 24 frames per second.

© 2008 Optical Society of America

1. Introduction

Digital holographic microscopy (DHM) is a well-known and powerful method that allows both the amplitude and phase of a specimen to be observed simultaneously [1, 2]. The technique records a hologram containing the information of a specimen electronically, using a CCD (charge-coupled device) or CMOS (complementary metal oxide semiconductor) image sensor. Obtaining a reconstructed image from a hologram requires numerous Fresnel diffraction calculations, which can be accelerated by the FFT (fast Fourier transform) algorithm [3]. However, real-time reconstruction from a hologram is difficult even when a recent central processing unit (CPU) is used to calculate the Fresnel diffraction with the FFT algorithm. For example, when an image is reconstructed from a 512×512 hologram on an Intel Core2Duo E6300 CPU, the Fresnel diffraction calculation takes about 1 second.

Using dedicated hardware is an effective way to increase the computational speed of the Fresnel diffraction. For example, a research group developed an FPGA (field programmable gate array)-based board, FFT-HORN, to accelerate the Fresnel diffraction in DHPIV (digital holographic particle image velocimetry) [4, 5]. Their latest machine, FFT-HORN2, could obtain reconstructed images from holograms of 1024 × 1024 grids, captured by a DHPIV optical system, in about 33 milliseconds. The FPGA approach showed excellent computational speed; however, it has the following restrictions: the high cost of developing the FPGA board, the long development time, and the technical know-how needed for FPGA technology.

On the other hand, recent GPUs (graphic processing units) with many stream processors can be used as highly parallel processors. A stream processor is a simple scalar processor that can execute 32-bit floating-point addition, multiplication, and multiply-add instructions.

The approach of accelerating numerical calculations using a GPU chip is referred to as GPGPU (general-purpose computation on GPU) or GPU computing. The merits of GPGPU are high computational power, the low cost of the GPU board, and short development time. In the optics field, several studies using the GPGPU technique for fast calculation of CGHs (computer-generated holograms) have been reported [6, 7]. A well-known problem in CGH is the enormous calculation cost of generating a CGH from three-dimensional (3D) object data. These studies addressed this problem and generated a CGH from a simple 3D object in real time.

In this paper, we describe a real-time DHM system using the GPGPU technique. The Fresnel diffraction is computed faster on the GPU than on recent CPUs. The real-time DHM system can obtain reconstructed images from holograms of 512×512 grids at 24 frames per second.

2. Real-time digital holographic microscopy system

In this section, we describe our real-time DHM system using the GPU card.

Fig. 1. Outline of the real-time DHM system using the GPU

2.1. Outline of the real-time DHM system

Figure 1 shows the set-up for our real-time DHM system. The system mainly consists of two parts: the optical system for recording a hologram and the real-time calculation system using a GPU.

The optical system is a traditional DHM set-up [2]. As shown in the figure, we used a 5-mW He-Ne laser as the reference light. The wavelength of the laser is 632.8 nm. "BS" and "ND" indicate a beam splitter and a neutral density filter, and "M" and "MO" indicate a mirror and an objective lens, respectively. We used a CCD camera with a resolution of 1360×1024 pixels and a pixel pitch of 4.65 μm × 4.65 μm, and a USAF 1951 test target as the sample. Holograms captured by the CCD camera are transferred to a personal computer (PC) via the USB 2.0 interface. The PC controls the GPU and the CCD camera. The GPU, a GeForce 8800 GTS made by NVIDIA, calculates the Fresnel diffraction at high speed, allowing us to obtain reconstructed images from holograms at about 24 frames per second.

2.2. Rapid calculation of the Fresnel Diffraction using the GPU

Here, we briefly describe the Fresnel diffraction. The Fresnel diffraction is expressed as:

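$$
u(x,y)=\frac{1}{i\lambda z}\exp\left(i\frac{2\pi}{\lambda}z\right)\int\!\!\int_{-\infty}^{+\infty} a(\xi,\eta)\exp\left(i\frac{\pi}{\lambda z}\left((x-\xi)^2+(y-\eta)^2\right)\right)d\xi\,d\eta \tag{1}
$$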

where (x,y) and (ξ,η) are coordinates on the reconstruction plane u(x,y) and on the hologram a(ξ,η) captured by the CCD, respectively, λ is the wavelength of the reference light, and z is the distance from the hologram to the reconstruction plane. Using the convolution theorem, the Fresnel diffraction is expressed as [2, 3]:

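$$
\begin{aligned}
u(x,y) &= C\int\!\!\int_{-\infty}^{+\infty} a(\xi,\eta)\,h(x-\xi,y-\eta)\,d\xi\,d\eta \\
&= C\times a(x,y)\ast h(x,y) = C\times F^{-1}\big[F[a(x,y)]\cdot F[h(x,y)]\big]
\end{aligned} \tag{2}
$$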

where $C=\frac{1}{i\lambda z}\exp\left(i\frac{2\pi}{\lambda}z\right)$ and $h(x,y)=\exp\left(i\frac{\pi}{\lambda z}(x^2+y^2)\right)$, and the operators $F[\cdot]$ and $F^{-1}[\cdot]$ indicate the forward and inverse Fourier transforms, respectively. To calculate the Fresnel diffraction on a computer or a GPU, we must discretize Eqs. (1) and (2) and then use the FFT algorithm.
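As an illustration of the discretized form of Eq. (2), the following is a minimal CPU-side sketch in Python with NumPy, not the authors' GPU implementation. The function name `fresnel_convolution` and its parameter names are hypothetical, the propagation distance `z` must be supplied by the caller (the text does not give a value), and the constant C is omitted. Without zero-padding the FFT product is a circular convolution, which is a simplification made here for brevity.

```python
import numpy as np

def fresnel_convolution(a, wavelength, z, pitch):
    """Discretized Eq. (2) without the constant C: F^-1[ F[a] . F[h] ].

    a          -- 2D hologram samples (hypothetical input array)
    wavelength -- wavelength of the reference light, in metres
    z          -- hologram-to-reconstruction distance, in metres (assumed value)
    pitch      -- sampling pitch of the sensor, in metres
    """
    ny, nx = a.shape
    # Sample coordinates centred on the hologram.
    x = (np.arange(nx) - nx // 2) * pitch
    y = (np.arange(ny) - ny // 2) * pitch
    xx, yy = np.meshgrid(x, y)
    # Impulse response h(x, y) = exp(i*pi/(lambda*z) * (x^2 + y^2)).
    h = np.exp(1j * np.pi / (wavelength * z) * (xx**2 + yy**2))
    # ifftshift moves the kernel centre to index (0, 0) so the result is not shifted.
    H = np.fft.fft2(np.fft.ifftshift(h))
    A = np.fft.fft2(a)
    # Element-wise product in the frequency domain, then inverse FFT.
    return np.fft.ifft2(A * H)
```

With the parameters quoted in the text, a call might look like `fresnel_convolution(hologram, 632.8e-9, z, 4.65e-6)`. A single bright pixel at the array centre reproduces the sampled kernel h, which is a convenient sanity check.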

Next, we describe rapid calculation of the Fresnel diffraction using the GPU. We used a GPU board made by GALAXY Technology, on which a GeForce 8800 GTS GPU chip is mounted. The specifications of the GPU board are a GPU clock of 1.2 GHz, a memory clock of 1.6 GHz, 96 stream processors, and 640 Mbytes of memory. The stream processor used in this paper is a scalar processor that can execute 32-bit floating-point addition, multiplication, and multiply-add instructions (note that the latest GPUs can also execute 64-bit floating-point operations). Therefore, the GPU chip has a peak performance of 2 operations/SP × 96 SPs × 1.2 GHz = 230.4 Gflops (giga floating-point operations per second). Thus, we can use the GPU chip as a highly parallel processor. We used CUDA (Compute Unified Device Architecture) as the programming environment for the GPU chip [8]. In comparison with the Cg (C for Graphics) language and HLSL (High Level Shader Language), the advantage of CUDA is that source code can be written in a C-like language.
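The peak-performance figure is the product of three quantities given in the text: operations per stream processor per clock cycle, the number of stream processors, and the clock frequency. The short sketch below reproduces that arithmetic; the variable names are illustrative only, and the result is a theoretical upper bound rather than a measured rate.

```python
# Peak throughput = operations per SP per cycle * number of SPs * clock (GHz).
ops_per_sp_per_cycle = 2   # one multiply-add counts as two floating-point operations
stream_processors = 96     # from the board specification in the text
clock_ghz = 1.2            # GPU clock in GHz, from the text

peak_gflops = ops_per_sp_per_cycle * stream_processors * clock_ghz
print(f"{peak_gflops:.1f} Gflops")  # prints "230.4 Gflops"
```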

In Eq. (2), the Fresnel diffraction involves two forward FFTs and one inverse FFT. For these we can use the FFT library CUFFT [9], which performs the forward and inverse FFT very efficiently on the GPU chip. The calculation process is as follows.

First, we send the hologram captured by the CCD to the memory on the GPU board. Second, the GPU chip calculates the term $F[a(x,y)]$ in Eq. (2) using CUFFT and stores the result in the memory on the board. Similarly, the GPU chip calculates the term $F[h(x,y)]$ and stores the result in the memory. Third, the GPU chip calculates the complex multiplication of $F[a(x,y)]$ and $F[h(x,y)]$ and stores the result in the memory. Fourth, the GPU chip calculates $u(x,y) = F^{-1}[F[a(x,y)]\cdot F[h(x,y)]]$ using the inverse FFT and stores the result in the memory. Finally, the GPU chip calculates $|u(x,y)|^2$, and the host computer receives it. On the PC, we normalize $|u(x,y)|^2$ and convert it to an 8-bpp (bits per pixel) image. Here, we can ignore the term C in Eq. (2) because $|C|^2 = 1/(\lambda z)^2$ is a constant that scales $|u(x,y)|^2$ uniformly and is therefore removed by the normalization. For the above process, we used our numerical calculation library for wave optics using the GPU, the GWO (GPU-based Wave Optics) library [10, 11].
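The sequence of steps above can be sketched on the CPU with NumPy as follows. This is an illustrative stand-in, not the GWO or CUFFT API: the function `reconstruct_8bpp` and its argument `H` (the already transformed kernel F[h]) are hypothetical names, and normalizing by the peak intensity is an assumption, since the text does not specify the normalization used.

```python
import numpy as np

def reconstruct_8bpp(hologram, H):
    """FFT, complex multiply, inverse FFT, intensity, then scaling to 8 bits."""
    A = np.fft.fft2(hologram)        # F[a(x, y)]
    u = np.fft.ifft2(A * H)          # F^-1[ F[a] . F[h] ]; the constant C is omitted
    intensity = np.abs(u) ** 2       # |u(x, y)|^2
    peak = intensity.max()
    if peak == 0:
        # An all-zero frame maps to an all-black image.
        return np.zeros(intensity.shape, dtype=np.uint8)
    # Normalize to [0, 255]; any constant factor such as |C|^2 cancels here.
    return np.round(255.0 * intensity / peak).astype(np.uint8)

# Demo with a trivial transfer function (all ones), which leaves the input unchanged.
holo = np.arange(16, dtype=float).reshape(4, 4)
image = reconstruct_8bpp(holo, np.ones((4, 4)))
```

Because the output is normalized, multiplying the hologram by any constant gives the same 8-bit image, which is why the factor C can be dropped from the GPU computation.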

2.3. Multiple threading for the real-time DHM system

The GPU can calculate the Fresnel diffraction faster than recent CPUs. For more effective processing, we introduce the multiple-threading technique into the real-time DHM system. Multiple threading is a method by which a program splits itself into two or more simultaneously running tasks [12]. Figures 2(a) and 2(b) show flowcharts of non-multiple threading and multiple threading, respectively. With non-multiple threading, the steps from capturing a hologram to displaying a reconstructed image are executed sequentially.
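One common way to overlap capture with reconstruction is a producer thread feeding a bounded queue. The sketch below shows that pattern in Python; it is an assumption about how such a pipeline could be organized, not the division of work in the authors' Fig. 2(b), which is not detailed in this text. `run_pipeline`, `capture`, and `reconstruct` are hypothetical names standing in for the camera read-out and the GPU computation.

```python
import queue
import threading

def run_pipeline(capture, reconstruct, n_frames, depth=2):
    """Overlap capture and reconstruction using one producer thread and a bounded queue."""
    frames = queue.Queue(maxsize=depth)
    results = []

    def producer():
        for _ in range(n_frames):
            frames.put(capture())   # blocks while the queue is full
        frames.put(None)            # sentinel: no more frames

    worker = threading.Thread(target=producer)
    worker.start()
    while True:
        frame = frames.get()        # blocks until a frame is available
        if frame is None:
            break
        results.append(reconstruct(frame))
    worker.join()
    return results
```

The bounded queue keeps the capture thread from running arbitrarily far ahead of reconstruction, so memory use stays fixed while the two stages run concurrently.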

3. Performance and optical results

Table 1 compares the reconstruction rate of the CPU-only case with that of the GPU. The unit of the reconstruction rate is frames per second (fps). The reconstruction rate includes the time from capturing a hologram to displaying a reconstructed image. We used an Intel Core2Duo E6300 as the "CPU" in the table, with 2 Gbytes of memory, running Microsoft Windows XP Professional SP2. The hologram size is 512 × 512 grids. At this size, the GPU can obtain reconstructed images about 20 times faster than the CPU. For this hologram size, single-precision floating-point arithmetic on the GPU chip is sufficient to obtain a reconstructed image from a hologram.
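A reconstruction rate of this kind can be measured by timing a fixed number of complete capture-to-display iterations. The helper below is a minimal sketch of such a measurement, not the authors' benchmark code; `measure_fps` and `process_frame` are hypothetical names, with `process_frame` standing for one full iteration.

```python
import time

def measure_fps(process_frame, n_frames):
    """Return the average frames per second over n_frames calls of process_frame."""
    start = time.perf_counter()
    for _ in range(n_frames):
        process_frame()             # one capture-reconstruct-display iteration
    elapsed = time.perf_counter() - start
    return n_frames / elapsed
```

At 24 fps, each iteration has a budget of roughly 1/24 s, or about 41.7 ms, for capture, transfer, reconstruction, and display combined.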

Fig. 2. Parallel processing of the host computer and the GPU using the multiple-threading technique

Table 1. Reconstruction rate between the case of CPU only and that of the GPU.

Figures 3 and 4 show snapshots of the reconstructed animation using the CPU only and the GPU chip, respectively. In the CPU-only case, we used the FFT library FFTW [13] for the FFT calculations.

Fig. 3. (362KB) Reconstruction image by CPU only. [Media 1]
Fig. 4. (1034KB) Reconstruction image by the GPU [Media 2]

4. Conclusion and future work

Acknowledgment

This research was partially supported by Yamagata Promotional Organization for Industrial Technology and the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Young Scientists (B), 19700082, 2007.

References and links

1. U. Schnars and W. Juptner, “Direct recording of holograms by a CCD target and numerical reconstruction,” Appl. Opt. 33, 179–181 (1994).
2. U. Schnars and W. Jueptner, Digital Holography - Digital Hologram Recording, Numerical Reconstruction, and Related Techniques (Springer, 2005).
3. O. K. Ersoy, Diffraction, Fourier Optics And Imaging (Wiley-Interscience, 2006).
4. N. Masuda, T. Ito, K. Kayama, H. Kono, S. Satake, T. Kunugi, and K. Sato, “Special purpose computer for digital holographic particle tracking velocimetry,” Opt. Express 14, 603–608 (2006).
5. Y. Abe, N. Masuda, H. Wakabayashi, Y. Kazo, T. Ito, S. Satake, T. Kunugi, and K. Sato, “Special purpose computer system for flow visualization using holography technology,” Opt. Express 16, 7686–7692 (2008).
6. N. Masuda, T. Ito, T. Tanaka, A. Shiraki, and T. Sugie, “Computer generated holography using a graphics processing unit,” Opt. Express 14, 587–592 (2008).
7. L. Ahrenberg, P. Benzie, M. Magnor, and J. Watson, “Computer generated holography using parallel commodity graphics hardware,” Opt. Express 14, 7636–7641 (2006).
8. NVIDIA, “NVIDIA CUDA Compute Unified Device Architecture Programming Guide Version 1.1,” NVIDIA (2007).
9. NVIDIA, “CUDA FFT Library Version 1.1 Reference Documentation,” NVIDIA (2007).
10. T. Shimobaba, T. Ito, N. Masuda, Y. Abe, Y. Ichihashi, H. Nakayama, N. Takada, A. Shiraki, and T. Sugie, “Numerical calculation library for diffraction integrals using the graphic processing unit: the GPU-based wave optics library,” J. Opt. A: Pure Appl. Opt. 10, 075308 (2008), http://www.iop.org/EJ/abstract/1464-4258/10/7/075308/.
11. The GWO library, http://sourceforge.net/projects/thegwolibrary/.
12. Wikipedia, http://en.wikipedia.org/wiki/Thread_(computer_science).
13. FFTW Home Page, http://www.fftw.org/.

OCIS Codes
(090.1995) Holography : Digital holography
(090.5694) Holography : Real-time holography

ToC Category:
Holography

History
Original Manuscript: June 6, 2008
Revised Manuscript: July 15, 2008
Manuscript Accepted: July 21, 2008
Published: July 23, 2008

Virtual Issues
Vol. 3, Iss. 9 Virtual Journal for Biomedical Optics

Citation
Tomoyoshi Shimobaba, Yoshikuni Sato, Junya Miura, Mai Takenouchi, and Tomoyoshi Ito, "Real-time digital holographic microscopy using the graphic processing unit," Opt. Express 16, 11776-11781 (2008)
http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-16-16-11776

