OSA's Digital Library

Virtual Journal for Biomedical Optics

| EXPLORING THE INTERFACE OF LIGHT AND BIOMEDICINE

  • Editors: Andrew Dunn and Anthony Durkin
  • Vol. 6, Iss. 1 — Jan. 3, 2011
Graphics processing unit accelerated non-uniform fast Fourier transform for ultrahigh-speed, real-time Fourier-domain OCT

Kang Zhang and Jin U. Kang


Optics Express, Vol. 18, Issue 22, pp. 23472-23487 (2010)
http://dx.doi.org/10.1364/OE.18.023472


Abstract

We implemented fast Gaussian gridding (FGG)-based non-uniform fast Fourier transform (NUFFT) on the graphics processing unit (GPU) architecture for ultrahigh-speed, real-time Fourier-domain optical coherence tomography (FD-OCT). The Vandermonde matrix-based non-uniform discrete Fourier transform (NUDFT) as well as the linear/cubic interpolation with fast Fourier transform (InFFT) methods are also implemented on GPU to compare their performance in terms of image quality and processing speed. The GPU accelerated InFFT/NUDFT/NUFFT methods are applied to process both the standard half-range FD-OCT and complex full-range FD-OCT (C-FD-OCT). GPU-NUFFT provides an accurate approximation to GPU-NUDFT in terms of image quality, but offers >10 times higher processing speed. Compared with the GPU-InFFT methods, GPU-NUFFT has improved sensitivity roll-off, higher local signal-to-noise ratio and immunity to side-lobe artifacts caused by the interpolation error. Using a high speed CMOS line-scan camera, we demonstrated the real-time processing and display of GPU-NUFFT-based C-FD-OCT at a camera-limited rate of 122 k line/s (1024 pixel/A-scan).

© 2010 OSA

1. Introduction

Fourier-domain optical coherence tomography (FD-OCT) is capable of providing depth/time-resolved images of biological tissues noninvasively with micron level resolution. These features make FD-OCT systems suitable for applications in microsurgical guidance and intervention [1

1. S. A. Boppart, M. E. Brezinski, and J. G. Fujimoto, “Surgical Guidance and Intervention,” in Handbook of Optical Coherence Tomography, B. E. Bouma and G. J. Tearney, eds. (Marcel Dekker, New York, NY, 2001).

–6

6. U. Sharma, N. M. Fried, and J. U. Kang, “All-fiber common-path optical coherence tomography: sensitivity optimization and system analysis,” IEEE J. Sel. Top. Quantum Electron. 11(4), 799–805 (2005). [CrossRef]

]. For practical clinical applications, FD-OCT systems require raw data acquisition, data processing, and visualization rates in excess of tens of kHz.

The required raw data acquisition rate (A-scan line rate) of FD-OCT systems has been achieved in recent years, and speeds of >100,000 line/s are now common [7

7. B. Potsaid, I. Gorczynska, V. J. Srinivasan, Y. Chen, J. Jiang, A. Cable, and J. G. Fujimoto, “Ultrahigh speed spectral / Fourier domain OCT ophthalmic imaging at 70,000 to 312,500 axial scans per second,” Opt. Express 16(19), 15149–15169 (2008), http://www.opticsinfobase.org/oe/abstract.cfm?uri=oe-16-19-15149. [CrossRef] [PubMed]

–14

14. M. W. Jenkins, F. Rothenberg, D. Roy, V. P. Nikolski, Z. Hu, M. Watanabe, D. L. Wilson, I. R. Efimov, and A. M. Rollins, “4D embryonic cardiography using gated optical coherence tomography,” Opt. Express 14(2), 736–748 (2006), http://www.opticsinfobase.org/oe/abstract.cfm?URI=OPEX-14-2-736. [CrossRef] [PubMed]

]. Rates of >1,000,000 line/s have even been achieved for multi-channel FD-OCT [15

15. W. Wieser, B. R. Biedermann, T. Klein, C. M. Eigenwillig, and R. Huber, “Multi-megahertz OCT: High quality 3D imaging at 20 million A-scans and 4.5 GVoxels per second,” Opt. Express 18(14), 14685–14704 (2010), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-18-14-14685. [CrossRef] [PubMed]

]. The speed of FD-OCT is determined either by the line rate of the CMOS/CCD camera (for spectrometer-based systems) or by the spectral sweeping frequency (for swept laser-based systems). Ultrahigh acquisition speed is essential for time-resolved 4D recording of dynamic biological processes such as eye blinking, pupillary reaction to light stimulus [10

10. I. Grulkowski, M. Gora, M. Szkulmowski, I. Gorczynska, D. Szlag, S. Marcos, A. Kowalczyk, and M. Wojtkowski, “Anterior segment imaging with Spectral OCT system using a high-speed CMOS camera,” Opt. Express 17(6), 4842–4858 (2009), http://www.opticsinfobase.org/oe/abstract.cfm?uri=oe-17-6-4842. [CrossRef] [PubMed]

,11

11. M. Gora, K. Karnowski, M. Szkulmowski, B. J. Kaluzny, R. Huber, A. Kowalczyk, and M. Wojtkowski, “Ultra high-speed swept source OCT imaging of the anterior segment of human eye at 200 kHz with adjustable imaging range,” Opt. Express 17(17), 14880–14894 (2009), http://www.opticsinfobase.org/oe/abstract.cfm?uri=oe-17-17-14880. [CrossRef] [PubMed]

], and embryonic heart beating [12

12. M. Gargesha, M. W. Jenkins, A. M. Rollins, and D. L. Wilson, “Denoising and 4D visualization of OCT images,” Opt. Express 16(16), 12313–12333 (2008), http://www.opticsinfobase.org/oe/abstract.cfm?uri=oe-16-16-12313. [CrossRef] [PubMed]

–14

].

However, data processing and image rendering/display speeds have not kept up with the ever-increasing rate of data acquisition, which has become one of the limiting factors for employing FD-OCT systems in practical microsurgery guidance and intervention applications. Currently, for most FD-OCT systems, the raw data is acquired in real time and saved for post-processing. For microsurgeries, such an imaging protocol provides valuable “pre-operative/post-operative” images, but is incapable of providing real-time, “intra-operative” imaging for surgical guidance and visualization. In addition, standard FD-OCT systems suffer from spatially reversed complex-conjugate ghost images that could severely misguide the users. As a solution, complex full-range FD-OCT (C-FD-OCT) has been utilized, which removes the complex-conjugate image by applying a phase modulation to the interferogram frames [16

16. Y. Yasuno, S. Makita, T. Endo, G. Aoki, M. Itoh, and T. Yatagai, “Simultaneous B-M-mode scanning method for real-time full-range Fourier domain optical coherence tomography,” Appl. Opt. 45(8), 1861–1865 (2006). [CrossRef] [PubMed]

–20

20. H. M. Subhash, L. An, and R. K. Wang, “Ultra-high speed full range complex spectral domain optical coherence tomography for volumetric imaging at 140,000 A scans per second,” Proc. SPIE 7554, 75540K (2010). [CrossRef]

]. A 140k line/s 2048-pixel C-FD-OCT has been implemented for volumetric anterior chamber imaging [20]. However, the complex-conjugate processing is even more time-consuming and presents an extra burden when providing real-time images during surgical procedures.

Several methods have been implemented to improve data processing and visualization of FD-OCT images: field-programmable gate arrays (FPGAs) have been applied to both spectrometer and swept source-based systems [21

21. T. E. Ustun, N. V. Iftimia, R. D. Ferguson, and D. X. Hammer, “Real-time processing for Fourier domain optical coherence tomography using a field programmable gate array,” Rev. Sci. Instrum. 79(11), 114301 (2008). [CrossRef] [PubMed]

,22

22. A. E. Desjardins, B. J. Vakoc, M. J. Suter, S. H. Yun, G. J. Tearney, and B. E. Bouma, “Real-time FPGA processing for high-speed optical frequency domain imaging,” IEEE Trans. Med. Imaging 28(9), 1468–1472 (2009). [CrossRef] [PubMed]

]; multi-core CPU parallel processing has been implemented, achieving a processing rate of 80,000 line/s on a nonlinear-k polarization-sensitive OCT system and 207,000 line/s on linear-k systems, both with 1024 points per A-scan [23

23. G. Liu, J. Zhang, L. Yu, T. Xie, and Z. Chen, “Real-time polarization-sensitive optical coherence tomography data processing with parallel computing,” Appl. Opt. 48(32), 6365–6370 (2009). [CrossRef] [PubMed]

,24

24. J. Probst, D. Hillmann, E. Lankenau, C. Winter, S. Oelckers, P. Koch, and G. Hüttmann, “Optical coherence tomography with online visualization of more than seven rendered volumes per second,” J. Biomed. Opt. 15(2), 026014 (2010). [CrossRef] [PubMed]

]. Moreover, recent progress in general-purpose computing on graphics processing units (GPGPU) makes it possible to implement heavy-duty OCT data processing and visualization on a variety of low-cost, many-core graphics cards.

For standard half-range FD-OCT, GPU-based data processing has been implemented on both linear-k and non-linear-k systems [25

25. Y. Watanabe and T. Itagaki, “Real-time display on Fourier domain optical coherence tomography system using a graphics processing unit,” J. Biomed. Opt. 14(6), 060506 (2009). [CrossRef]

–27

27. S. Van der Jeught, A. Bradu, and A. G. Podoleanu, “Real-time resampling in Fourier domain optical coherence tomography using a graphics processing unit,” J. Biomed. Opt. 15(3), 030511 (2010). [CrossRef] [PubMed]

]. Real-time 4D OCT imaging has also been achieved up to 10 volume/s through GPU-based volume rendering [24,26

26. K. Zhang and J. U. Kang, “Real-time 4D signal processing and visualization using graphics processing unit on a regular nonlinear-k Fourier-domain OCT system,” Opt. Express 18(11), 11772–11784 (2010), http://www.opticsinfobase.org/abstract.cfm?uri=oe-18-11-11772. [CrossRef] [PubMed]

]. We have found that, based on a 1024-pixel standard FD-OCT system using NVIDIA’s GTX 480 GPU, the maximum processing line rate is >3,000,000 line/s (effectively >1,000,000 line/s under data transfer limit), which will be shown in detail in Section 4.

For complex full-range FD-OCT, the processing workload is more than 3 times that of standard OCT, since each A-scan requires three fast Fourier transforms (FFTs) along different dimensions of the frame, band-pass filtering, and the necessary matrix transposes [16]. In a separate work, we realized real-time processing of 1024-pixel C-FD-OCT at >500,000 line/s (effectively >300,000 line/s under data transfer limit), and a real-time camera-limited display speed of 244,000 line/s [28

28. J. U. Kang and K. Zhang, “Real-time complex optical coherence tomography using graphics processing unit for surgical intervention,” to appear on IEEE Photonics Society Annual 2010, Denver, Colorado, USA, November, 2010.

]. Very recently, during the preparation of this manuscript, a 27,900 line/s 2048-pixel C-FD-OCT system was reported by Watanabe et al. [29

29. Y. Watanabe, S. Maeno, K. Aoshima, H. Hasegawa, and H. Koseki, “Real-time processing for full-range Fourier-domain optical-coherence tomography with zero-filling interpolation using multiple graphic processing units,” Appl. Opt. 49(25), 4756–4762 (2010). [CrossRef] [PubMed]

].

In most FD-OCT systems, the signal is sampled nonlinearly in k-space, which will seriously degrade the image quality if the FFT is directly applied to such signal. So far there have been both hardware and software solutions to the nonlinear-k issue. Hardware solutions such as linear-k spectrometer [30

30. Z. Hu and A. M. Rollins, “Fourier domain optical coherence tomography with a linear-in-wavenumber spectrometer,” Opt. Lett. 32(24), 3525–3527 (2007). [CrossRef] [PubMed]

], linear-k swept laser [31

31. C. M. Eigenwillig, B. R. Biedermann, G. Palte, and R. Huber, “K-space linear Fourier domain mode locked laser and applications for optical coherence tomography,” Opt. Express 16(12), 8916–8937 (2008), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-16-12-8916. [CrossRef] [PubMed]

] and k-triggering [32

32. D. C. Adler, Y. Chen, R. Huber, J. Schmitt, J. Connolly, and J. G. Fujimoto, “Three-dimensional endomicroscopy using optical coherence tomography,” Nat. Photonics 1(12), 709–716 (2007). [CrossRef]

] have been successfully implemented, but these methods generally increase the system complexity and cost. Software solutions include various interpolation methods such as simple linear interpolation, oversampled linear interpolation, zero-filling linear interpolation, and cubic spline interpolation. Different GPU-based interpolation methods have also been implemented and compared [25–27]. Alternatively, the non-uniform discrete Fourier transform (NUDFT) has been proposed recently for both swept source OCT [33

33. S. S. Sherif, C. Flueraru, Y. Mao, and S. Chang, “Swept Source Optical Coherence Tomography with Nonuniform Frequency Domain Sampling,” in Biomedical Optics, OSA Technical Digest (CD) (Optical Society of America, 2008), paper BMD86. [PubMed]

] and spectrometer-based OCT [34

34. K. Wang, Z. Ding, T. Wu, C. Wang, J. Meng, M. Chen, and L. Xu, “Development of a non-uniform discrete Fourier transform based high speed spectral domain optical coherence tomography system,” Opt. Express 17(14), 12121–12131 (2009), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-17-14-12121. [CrossRef] [PubMed]

] through direct Vandermonde matrix multiplication with the spectrum vector. Compared with the interpolation-FFT (InFFT) method, NUDFT is simpler to implement and immune to the interpolation-caused errors such as increased background noise and side-lobes, especially at larger image depth [35

35. S. Vergnole, D. Lévesque, and G. Lamouche, “Experimental validation of an optimized signal processing method to handle non-linearity in swept-source optical coherence tomography,” Opt. Express 18(10), 10446–10461 (2010), http://www.opticsinfobase.org/abstract.cfm?uri=oe-18-10-10446. [CrossRef] [PubMed]

]. Moreover, NUDFT has better sensitivity roll-off than InFFT [34]. However, NUDFT by direct matrix multiplication is extremely time-consuming, with a complexity of $O(N^2)$, where N is the raw data size of an A-scan. As an approximation to NUDFT, the gridding-based non-uniform fast Fourier transform (NUFFT) has been used to process simulated [36

36. D. Hillmann, G. Huttmann, and P. Koch, “Using nonequispaced fast Fourier transformation to process optical coherence tomography signals,” Proc. SPIE 7372, 73720R (2009). [CrossRef]

] and experimentally acquired data [35], with a reduced computational complexity of ~$O(N\log N)$. To the best of our knowledge, NUDFT/NUFFT have yet to be used in ultrahigh-speed, real-time FD-OCT systems because of their computational complexity and the associated latency in data processing.

In this work, we implemented the fast Gaussian gridding (FGG)-based NUFFT on the GPU architecture for ultrafast signal processing in a general FD-OCT system. The Vandermonde matrix-based NUDFT as well as the linear/cubic InFFT methods are also implemented on GPU as comparisons of image quality and processing speed. GPU-NUFFT provides a very close approximation to GPU-NUDFT in terms of image quality while offering >10 times higher processing speed. Compared with the GPU-InFFT methods, we have also observed improved sensitivity roll-off, a higher local signal-to-noise ratio, and absence of side-lobe artifacts in GPU-NUFFT. Using a high speed CMOS line-scan camera, we demonstrated the real-time processing and display of GPU-NUFFT-based C-FD-OCT at a camera-limited speed of 122 k line/s (1024 pixel/A-scan).

2. System configuration

3. Implementation of GPU-NUDFT and GPU-NUFFT in FD-OCT systems

In this section, the implementation of both GPU-NUDFT and GPU-NUFFT in a standard FD-OCT system is described. For the implementation, the wavenumber-pixel relation of the system k[i]=2π/λ[i] is pre-calibrated accurately, where i refers to the pixel index.

3.1 GPU-NUDFT in FD-OCT

After pre-calibrating the $k[i]$ relation, the depth information $A[z_m]$ can be obtained through a discrete Fourier transform over the non-uniformly distributed data $I[k_i]$, as in [33–35],

$$A[z_m]=\sum_{i=0}^{N-1}I[k_i]\exp\left[-j\frac{2\pi}{\Delta k}(k_i-k_0)\,m\right],\quad m=0,1,2,\ldots,N-1, \tag{1}$$

where N is the total pixel number of a spectrum and $z_m$ is the depth coordinate with pixel index m. $\Delta k=k_{N-1}-k_0$ is the wavenumber range.

For standard FD-OCT, where $I[k_i]$ is real-valued, Eq. (1) can be reduced to half range as

$$A[z_m]=\sum_{i=0}^{N-1}I[k_i]\exp\left[-j\frac{2\pi}{\Delta k}(k_i-k_0)\,m\right],\quad m=0,1,2,\ldots,N/2-1. \tag{2}$$

For C-FD-OCT, where $I[k_i]$ is complex-valued after the Hilbert transform [16], Eq. (1) can be modified to full range as

$$A[z_m]=\sum_{i=0}^{N-1}I[k_i]\exp\left[-j\frac{2\pi}{\Delta k}(k_i-k_0)\left(m-\frac{N}{2}\right)\right],\quad m=0,1,2,\ldots,N-1. \tag{3}$$

Here the index m is shifted by N/2 to set the DC component to the center of $A[z_m]$. Considering a frame with M A-scans, Eqs. (2) and (3) can be written in matrix form for processing the whole frame as
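$$\mathbf{A}_{\mathrm{half}}=\mathbf{D}_{\mathrm{half}}\,\mathbf{I}_{\mathrm{real}}\quad\text{(standard half-range FD-OCT)}, \tag{4}$$

where

$$\mathbf{A}_{\mathrm{half}}=\begin{bmatrix}
A_0[z_0] & A_1[z_0] & \cdots & A_{M-2}[z_0] & A_{M-1}[z_0]\\
A_0[z_1] & A_1[z_1] & \cdots & A_{M-2}[z_1] & A_{M-1}[z_1]\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
A_0[z_{N/2-2}] & A_1[z_{N/2-2}] & \cdots & A_{M-2}[z_{N/2-2}] & A_{M-1}[z_{N/2-2}]\\
A_0[z_{N/2-1}] & A_1[z_{N/2-1}] & \cdots & A_{M-2}[z_{N/2-1}] & A_{M-1}[z_{N/2-1}]
\end{bmatrix}, \tag{5}$$

$$\mathbf{I}_{\mathrm{real}}=\begin{bmatrix}
I_0[k_0] & I_1[k_0] & \cdots & I_{M-2}[k_0] & I_{M-1}[k_0]\\
I_0[k_1] & I_1[k_1] & \cdots & I_{M-2}[k_1] & I_{M-1}[k_1]\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
I_0[k_{N-2}] & I_1[k_{N-2}] & \cdots & I_{M-2}[k_{N-2}] & I_{M-1}[k_{N-2}]\\
I_0[k_{N-1}] & I_1[k_{N-1}] & \cdots & I_{M-2}[k_{N-1}] & I_{M-1}[k_{N-1}]
\end{bmatrix}, \tag{6}$$

$$\mathbf{D}_{\mathrm{half}}=\begin{bmatrix}
1 & 1 & \cdots & 1 & 1\\
p_0^{-1} & p_1^{-1} & \cdots & p_{N-2}^{-1} & p_{N-1}^{-1}\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
p_0^{-(N/2-2)} & p_1^{-(N/2-2)} & \cdots & p_{N-2}^{-(N/2-2)} & p_{N-1}^{-(N/2-2)}\\
p_0^{-(N/2-1)} & p_1^{-(N/2-1)} & \cdots & p_{N-2}^{-(N/2-1)} & p_{N-1}^{-(N/2-1)}
\end{bmatrix}, \tag{7}$$

and

$$\mathbf{A}_{\mathrm{full}}=\mathbf{D}_{\mathrm{full}}\,\mathbf{I}_{\mathrm{complex}}\quad\text{(complex full-range FD-OCT)}, \tag{8}$$

where

$$\mathbf{A}_{\mathrm{full}}=\begin{bmatrix}
A_0[z_0] & A_1[z_0] & \cdots & A_{M-2}[z_0] & A_{M-1}[z_0]\\
A_0[z_1] & A_1[z_1] & \cdots & A_{M-2}[z_1] & A_{M-1}[z_1]\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
A_0[z_{N-2}] & A_1[z_{N-2}] & \cdots & A_{M-2}[z_{N-2}] & A_{M-1}[z_{N-2}]\\
A_0[z_{N-1}] & A_1[z_{N-1}] & \cdots & A_{M-2}[z_{N-1}] & A_{M-1}[z_{N-1}]
\end{bmatrix}, \tag{9}$$

$$\mathbf{I}_{\mathrm{complex}}=\begin{bmatrix}
I_0[k_0] & I_1[k_0] & \cdots & I_{M-2}[k_0] & I_{M-1}[k_0]\\
I_0[k_1] & I_1[k_1] & \cdots & I_{M-2}[k_1] & I_{M-1}[k_1]\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
I_0[k_{N-2}] & I_1[k_{N-2}] & \cdots & I_{M-2}[k_{N-2}] & I_{M-1}[k_{N-2}]\\
I_0[k_{N-1}] & I_1[k_{N-1}] & \cdots & I_{M-2}[k_{N-1}] & I_{M-1}[k_{N-1}]
\end{bmatrix}, \tag{10}$$

$$\mathbf{D}_{\mathrm{full}}=\begin{bmatrix}
p_0^{+(N/2)} & p_1^{+(N/2)} & \cdots & p_{N-2}^{+(N/2)} & p_{N-1}^{+(N/2)}\\
p_0^{+(N/2-1)} & p_1^{+(N/2-1)} & \cdots & p_{N-2}^{+(N/2-1)} & p_{N-1}^{+(N/2-1)}\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
p_0^{-(N/2-2)} & p_1^{-(N/2-2)} & \cdots & p_{N-2}^{-(N/2-2)} & p_{N-1}^{-(N/2-2)}\\
p_0^{-(N/2-1)} & p_1^{-(N/2-1)} & \cdots & p_{N-2}^{-(N/2-1)} & p_{N-1}^{-(N/2-1)}
\end{bmatrix}, \tag{11}$$

where the subscripts of $A[z_m]$ and $I[k_i]$ denote the index of the A-scan within one frame, the complex factor is $p_i=\exp\left[j\frac{2\pi}{\Delta k}(k_i-k_0)\right]$, and $\mathbf{D}_{\mathrm{half}}$ and $\mathbf{D}_{\mathrm{full}}$ are Vandermonde matrices, which can be pre-calculated from $k[i]$.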
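As a minimal CPU-side sketch of Eqs. (2), (3), (7) and (11), the following plain-Python code builds the two Vandermonde matrices and applies them to one A-scan. It is an illustration only, not the CUDA implementation described in this work: the function names, the 64-pixel size, the 800–880 nm wavelength calibration and the synthetic single-reflector fringe are all assumptions made for the example.

```python
import cmath
import math

def complex_factors(k):
    """p_i = exp[j*2*pi*(k_i - k_0)/dk], with dk = k_{N-1} - k_0."""
    dk = k[-1] - k[0]
    return [cmath.exp(2j * math.pi * (k_i - k[0]) / dk) for k_i in k]

def vandermonde_half(k):
    """D_half, Eq. (7): row m (0..N/2-1) holds p_i**(-m)."""
    p = complex_factors(k)
    return [[p_i ** (-m) for p_i in p] for m in range(len(k) // 2)]

def vandermonde_full(k):
    """D_full, Eq. (11): row m (0..N-1) holds p_i**(N/2 - m), so DC sits at row N/2."""
    n = len(k)
    p = complex_factors(k)
    return [[p_i ** (n // 2 - m) for p_i in p] for m in range(n)]

def nudft(d, spectrum):
    """One A-scan as the matrix-vector product A = D I, costing O(N^2) operations."""
    return [sum(d_mi * s_i for d_mi, s_i in zip(row, spectrum)) for row in d]

# Hypothetical calibration: 64 pixels spaced evenly in wavelength (800-880 nm),
# hence unevenly in wavenumber.
N = 64
wavelengths = [800e-9 + 80e-9 * i / (N - 1) for i in range(N)]
k = [2 * math.pi / lam for lam in wavelengths]

# Synthetic real-valued fringe from a single reflector at depth index 10.
m0 = 10
dk = k[-1] - k[0]
spectrum = [math.cos(2 * math.pi * m0 * (k_i - k[0]) / dk) for k_i in k]

a_half = nudft(vandermonde_half(k), spectrum)
a_full = nudft(vandermonde_full(k), spectrum)
peak = max(range(N // 2), key=lambda m: abs(a_half[m]))
print("half-range peak at depth index", peak)
```

Because the synthetic spectrum is real-valued, the full-range output contains the same peak mirrored about row N/2, which is the complex-conjugate ghost that C-FD-OCT is designed to remove.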

To realize the C-FD-OCT mode, a phase modulation $\phi(x)=\beta x$ is applied to each B-scan's 2D interferogram frame $I(k,x)$ by slightly displacing the probe beam off the galvanometer's pivoting point, as shown in Fig. 1. Here x indicates the A-scan index in each B-scan, and by applying a Fourier transform along the x direction, the following equation can be obtained [16]:

$$\mathcal{F}_{x\to u}[I(k,x)]=|E_r(k)|^2\delta(u)+\Gamma_u\{\mathcal{F}_{x\to u}[E_s(k,x)]\}+\mathcal{F}_{x\to u}[E_s^*(k,x)E_r(k)]\otimes\delta(u+\beta)+\mathcal{F}_{x\to u}[E_s(k,x)E_r^*(k)]\otimes\delta(u-\beta), \tag{12}$$

where $E_s(k,x)$ and $E_r(k)$ are the electrical fields from the sample and reference arms, respectively, $\Gamma_u\{\cdot\}$ is the correlation operator, and $\otimes$ denotes convolution. The first three terms on the right-hand side of Eq. (12) represent the DC noise, autocorrelation noise, and complex-conjugate noise, respectively. The last term, which carries the signal, can be isolated by a proper band-pass filter in the u domain and then converted back to the x domain by applying an inverse Fourier transform along the x direction. Here, to implement the standard Hilbert transform, we use the Heaviside step function as the band-pass filter; more delicate filters such as a super-Gaussian filter can also be designed to optimize the performance [29]. Finally, the OCT image is obtained by NUDFT in the k domain and logarithmically scaled for display.
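The FFT-x, band-pass and IFFT-x sequence can be sketched for a single wavenumber pixel as below. This is a hedged illustration in plain Python: the naive DFT stands in for the FFT, the step filter is assumed to keep only strictly positive frequencies, and the frame size, carrier frequency and phase are hypothetical values chosen so that the result can be checked analytically.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(M^2) DFT along x; a stand-in for the FFT to keep the sketch short."""
    m = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * math.pi * u * n / m) for n in range(m))
           for u in range(m)]
    return [v / m for v in out] if inverse else out

def hilbert_along_x(row):
    """FFT-x, Heaviside band-pass, IFFT-x for one wavenumber pixel of a B-scan."""
    m = len(row)
    spec = dft(row)
    # Heaviside step in u: keep strictly positive frequencies, zero the rest.
    kept = [spec[u] if 0 < u < m // 2 else 0j for u in range(m)]
    return dft(kept, inverse=True)

# Hypothetical B-scan of M = 32 A-scans with carrier beta = 2*pi*8/32 rad per A-scan.
M = 32
beta = 2 * math.pi * 8 / M
phase0 = 0.7                                             # arbitrary sample phase
row = [math.cos(phase0 + beta * x) for x in range(M)]    # DC assumed already removed

analytic = hilbert_along_x(row)
# Only the delta(u - beta) term survives, i.e. half of the cosine's amplitude.
expected = [0.5 * cmath.exp(1j * (phase0 + beta * x)) for x in range(M)]
max_err = max(abs(a - e) for a, e in zip(analytic, expected))
print(max_err)
```

The complex row returned here is what would then be passed, pixel by pixel, to the full-range transform of Eq. (3).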

The GPU-CPU hybrid processing flowchart for standard/complex FD-OCT using GPU-NUDFT is shown in Fig. 2, where three major threads are used for data acquisition (Thread 1), GPU processing (Thread 2), and image display (Thread 3), respectively. The three threads are triggered unidirectionally and work in a synchronized pipeline mode, as indicated by the blue dashed arrows in Fig. 2. Thread 2 is a GPU-CPU hybrid thread which consists of hundreds of thousands of GPU threads. The solid arrows describe the main data stream and the hollow arrows indicate the internal data flow of the GPU. The DC removal is implemented by subtracting a pre-stored frame of reference signal. The Vandermonde matrix $\mathbf{D}_{\mathrm{half}}$/$\mathbf{D}_{\mathrm{full}}$ is pre-calculated and stored in graphics memory, shown as the blue block in Fig. 2. The NUDFT is implemented by an optimized matrix multiplication algorithm on CUDA using shared memory for maximum usage of the GPU's floating-point operation ability [39

39. NVIDIA, “NVIDIA CUDA C Best Practices Guide 3.1,” (2010).

]. At the current stage of our system, the graphics memory refers to global memory, which has relatively low bandwidth and is another major limitation of GPU processing in addition to the PCIE x16 bandwidth limit. The processing speed can be further increased by mapping to texture memory, which has higher bandwidth than global memory.

Fig. 2 Processing flowchart for GPU-NUDFT based FD-OCT: CL, Camera Link; FG, frame grabber; HM, host memory; GM, graphics global memory; DC, DC removal; MT, matrix transpose; FFT-x, fast Fourier transform in x direction; IFFT-x, inverse fast Fourier transform in x direction; BPF-x, band-pass filter in x direction; Log, logarithmic scaling. The solid arrows describe the main data stream and the hollow arrows indicate the internal data flow of the GPU. The blue dashed arrows indicate the direction of inter-thread triggering. The hollow dashed arrow denotes standard FD-OCT without the Hilbert transform in x direction. Blue blocks: memory for pre-stored data; yellow blocks: memory for data refreshed in real time.

3.2 GPU-NUFFT in FD-OCT

The direct GPU-NUDFT presented in Section 3.1 has a computational complexity of $O(N^2)$, which greatly limits the computation speed and scalability for real-time display even on a GPU, as shown experimentally in Section 4. As an alternative to direct GPU-NUDFT, we implemented fast Gaussian gridding-based GPU-NUFFT to approximate GPU-NUDFT: the raw signal $I[k_i]$ is first oversampled by convolution with a Gaussian interpolation kernel on a uniform grid, as in [40

40. L. Greengard and J. Lee, “Accelerating the nonuniform fast Fourier transform,” SIAM Rev. 46(3), 443–454 (2004). [CrossRef]

],
$$I_\tau[u]=\sum_i I[i]\,g_\tau\big[k_\tau[u]-k[i]\big],\quad u=0,1,2,\ldots,M_r-1, \tag{13}$$

$$g_\tau[k]=\exp\left[-\frac{k^2}{4\tau}\right], \tag{14}$$

$$\tau=\frac{1}{N^2}\,\frac{\pi}{R(R-0.5)}\,M_{sp}, \tag{15}$$

$$G_\tau[n]=\exp\left[n^2\tau\right], \tag{16}$$

where $k_\tau[u]$ is a uniform grid covering the same range as $k[i]$, $g_\tau[k]$ is the Gaussian interpolation kernel, $M_r=R\cdot N$ is the uniform gridding size, and R is the oversampling ratio. For each grid point $k_\tau[u]$, the source data $I[i]$ are selected such that $k_\tau[u]$ is within the nearest $2M_{sp}$ grid points to $k[i]$, where $M_{sp}$ is the kernel spread factor. The calculation of Eq. (13) is illustrated in Fig. 3, where we set R = 2 and $M_{sp}=2$, and the values of the blue dots on the same $k_\tau[u]$ grid point are summed to obtain $I_\tau[u]$. The selection of proper R and $M_{sp}$ is discussed further in Section 4.2. Subsequently, $I_\tau[u]$ is processed by the regular uniform FFT and finally undergoes a deconvolution of $g_\tau[k]$ by multiplication with the factor $G_\tau[n]$ of Eq. (16), where n denotes the axial index. If R > 1, there is redundant data due to oversampling and the final data needs to be truncated to size N. The processing flowchart of GPU-NUFFT-based FD-OCT is shown in Fig. 4.

Fig. 3 Convolution with a Gaussian interpolation kernel on a uniform grid when R = 2, Msp = 2.

Fig. 4 Processing flowchart for GPU-NUFFT based FD-OCT: CL, Camera Link; FG, frame grabber; HM, host memory; GM, graphics global memory; DC, DC removal; MT, matrix transpose; CON, convolution with Gaussian kernel; FFT-x, fast Fourier transform in x direction; IFFT-x, inverse fast Fourier transform in x direction; BPF-x, band-pass filter in x direction; FFT-kr, FFT in kr direction; TRUC, truncation of redundant data in kr direction; DECON, deconvolution with Gaussian kernel; Log, logarithmic scaling. The blue dashed arrows indicate the direction of inter-thread triggering. The solid arrows describe the main data stream and the hollow arrows indicate the internal data flow of the GPU. The hollow dashed arrow denotes standard FD-OCT without the Hilbert transform in x direction.
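The three steps of Eqs. (13)–(16) (Gaussian gridding, uniform FFT, deconvolution with truncation) can be sketched in plain Python as follows. This is a serial illustration under stated assumptions, not the GPU kernel used in this work: the sample positions are normalized to $\theta_i=2\pi(k_i-k_0)/\Delta k$ so that the dimensionless $\tau$ of Eq. (15) applies directly, the gridding is written as a scatter from each source point rather than a gather at each grid point, an overall constant $\sqrt{\pi/\tau}/M_r$ is included so that the output can be compared numerically with the direct sum, and a spread factor of 6 is chosen for the comparison. All function names and the test signal are hypothetical.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * q / n) * odd[q] for q in range(n // 2)]
    return [even[q] + tw[q] for q in range(n // 2)] + \
           [even[q] - tw[q] for q in range(n // 2)]

def nufft_half(theta, spectrum, R=2, msp=6):
    """Gaussian-gridding approximation of sum_i I[i] exp(-j m theta_i), m = 0..N/2-1.

    theta holds sample positions normalized to [0, 2*pi] (an assumption of this
    sketch); with N = len(theta), R*N must be a power of two for fft().
    """
    n = len(theta)
    mr = R * n                                       # oversampled grid size M_r
    h = 2 * math.pi / mr                             # uniform grid spacing
    tau = math.pi * msp / (n * n * R * (R - 0.5))    # Eq. (15)
    grid = [0j] * mr
    for th, val in zip(theta, spectrum):             # Eq. (13), written as a scatter
        c = math.floor(th / h)
        for u in range(c - msp + 1, c + msp + 1):    # nearest 2*msp grid points
            d = u * h - th
            grid[u % mr] += val * math.exp(-d * d / (4 * tau))   # Eq. (14)
    f = fft(grid)
    scale = math.sqrt(math.pi / tau) / mr            # constant not written in Eq. (16)
    # Deconvolution by G_tau[m] = exp(m^2 tau), then truncation to N/2 depths.
    return [scale * math.exp(m * m * tau) * f[m] for m in range(n // 2)]

def nudft_half(theta, spectrum):
    """Direct O(N^2) reference sum, equivalent to Eq. (2)."""
    n = len(theta)
    return [sum(s * cmath.exp(-1j * m * th) for th, s in zip(theta, spectrum))
            for m in range(n // 2)]

# Hypothetical sampling: 64 pixels evenly spaced in wavelength (800-880 nm).
N = 64
k = [2 * math.pi / (800e-9 + 80e-9 * i / (N - 1)) for i in range(N)]
theta = [2 * math.pi * (k_i - k[0]) / (k[-1] - k[0]) for k_i in k]
spectrum = [math.cos(10 * th) for th in theta]       # reflector at depth index 10

fast = nufft_half(theta, spectrum)
exact = nudft_half(theta, spectrum)
rel_err = max(abs(a - b) for a, b in zip(fast, exact)) / max(abs(b) for b in exact)
print(rel_err)
```

Smaller values of `msp` reduce the gridding work per sample at the cost of a larger deviation from the direct sum, which is the trade-off behind the choice of R and $M_{sp}$.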

It is worth noting that recent works have found the Kaiser-Bessel function to be the optimal convolution kernel for gridding-based NUFFT [35,36]. The implementation of Kaiser-Bessel convolution on the GPU is similar to that of the Gaussian kernel and will be studied and evaluated in our future work.

4. GPU processing test and comparison of different FD-OCT methods

4.1 GPU processing line rate for different FD-OCT methods

First, we benchmarked the line rate of the following FD-OCT processing methods:

  • LIFFT: Standard FD-OCT with linear spline interpolation;
  • LIFFT-C: C-FD-OCT with linear spline interpolation;
  • CIFFT: Standard FD-OCT with cubic spline interpolation;
  • CIFFT-C: C-FD-OCT with cubic spline interpolation;
  • NUDFT: Standard FD-OCT with NUDFT;
  • NUDFT-C: C-FD-OCT with NUDFT;
  • NUFFT: Standard FD-OCT with NUFFT;
  • NUFFT-C: C-FD-OCT with NUFFT.
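For orientation, the interpolation-based baseline in the list above (LIFFT) amounts to resampling each spectrum onto a uniform wavenumber grid and then applying a uniform transform. The plain-Python sketch below illustrates that idea only; the function names, the naive DFT standing in for the FFT, and the synthetic 64-pixel spectrum are assumptions of the example rather than details of the benchmarked GPU code.

```python
import bisect
import cmath
import math

def resample_linear(k, spectrum):
    """Linearly interpolate a spectrum sampled at k[i] onto a uniform k grid."""
    n = len(k)
    dk = k[-1] - k[0]
    t = [(k_i - k[0]) / dk for k_i in k]     # normalized positions, rising from 0 to 1
    out = []
    for u in range(n):
        t_u = u / (n - 1)                    # uniform grid point
        # Index of the source interval [t[j], t[j+1]] that contains t_u.
        j = min(max(bisect.bisect_right(t, t_u) - 1, 0), n - 2)
        w = (t_u - t[j]) / (t[j + 1] - t[j])
        out.append((1 - w) * spectrum[j] + w * spectrum[j + 1])
    return out

def dft_half(x):
    """Naive half-range DFT, standing in for the FFT to keep the sketch short."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * m * i / n) for i in range(n))
            for m in range(n // 2)]

# Hypothetical sampling: 64 pixels evenly spaced in wavelength (800-880 nm).
N = 64
k = [2 * math.pi / (800e-9 + 80e-9 * i / (N - 1)) for i in range(N)]
dk = k[-1] - k[0]
spectrum = [math.cos(2 * math.pi * 10 * (k_i - k[0]) / dk) for k_i in k]

a_scan = dft_half(resample_linear(k, spectrum))
peak = max(range(N // 2), key=lambda m: abs(a_scan[m]))
print("LIFFT peak at depth index", peak)
```

The cubic variant (CIFFT) differs only in the interpolation step; the NUDFT and NUFFT variants skip resampling altogether.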

All algorithms are tested on the GTX 480 GPU with 4096 lines of both 1024-pixel and 2048-pixel spectra. The 2048-pixel mode is tested for reference only, and the 1024-pixel mode is used for the real-time imaging tests in this work. For each case, both the peak internal processing line rate and the reduced line rate that accounts for the data transfer bandwidth of the PCIE x16 interface are listed in Fig. 5. The processing time is measured using CUDA functions, and all CUDA threads are synchronized before the measurement. In our system, the PCIE x16 interface between the host computer and the GPU has a data transfer bandwidth of about 4.6 GByte/s for both transfer-in and transfer-out. With 2 Byte allocated for each pixel, the data size is 8.4 MByte for 4096 lines of 1024-pixel spectrum and 16.8 MByte for 4096 lines of 2048-pixel spectrum.

Fig. 5 Benchmark line rate test of different FD-OCT processing methods. (a) 1024-pixel FD-OCT; (b) 2048-pixel FD-OCT; (c) 1024-pixel NUFFT-C with different frame size; (d) 2048-pixel NUFFT-C with different frame size. Both the peak internal processing line rate and the reduced line rate considering the data transfer bandwidth of PCIE x16 interface are listed.
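The quoted data sizes follow directly from the frame dimensions, and a rough feel for the transfer-limited line rate can be obtained with a simple serial model. The sketch below is an assumption-laden estimate, not the procedure used for Fig. 5: it supposes that the per-line time is the sum of the processing time and the host-to-GPU transfer time, ignores the transfer of results back to the host, and uses a hypothetical peak rate of 3,000,000 line/s as input.

```python
PCIE_BANDWIDTH = 4.6e9      # byte/s, the approximate figure quoted in the text
BYTES_PER_PIXEL = 2
LINES = 4096

def frame_bytes(pixels_per_line, lines=LINES):
    """Raw data size of one benchmark frame in bytes."""
    return lines * pixels_per_line * BYTES_PER_PIXEL

def transfer_limited_rate(peak_rate, pixels_per_line):
    """Assumed serial model: per-line time = processing time + transfer-in time."""
    t_transfer = pixels_per_line * BYTES_PER_PIXEL / PCIE_BANDWIDTH
    return 1.0 / (1.0 / peak_rate + t_transfer)

size_1024 = frame_bytes(1024)       # about 8.4 MByte
size_2048 = frame_bytes(2048)       # about 16.8 MByte
rate = transfer_limited_rate(3.0e6, 1024)
print(size_1024, size_2048, round(rate))
```

Under this model a peak rate of 3,000,000 line/s drops to somewhat above 1,000,000 line/s once the transfer is included; a pipeline that overlaps transfer with computation would behave differently.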

As in Fig. 5(a), the final processing line rate for 1024-pixel C-FD-OCT with GPU-NUFFT is 173k line/s, which is still higher than the maximum camera acquisition rate of 128k line/s, while the GPU-NUDFT speed is relatively lower in both standard and complex FD-OCT. Also it is notable that the processing speed of LIFFT goes up to >3,000,000 line/s (effectively >1,000,000 line/s under data transfer limit), achieving the fastest processing line rate to date to the best of our knowledge.

For C-FD-OCT, the Hilbert transform, which is implemented by two Fast Fourier transforms, has the computational complexity of ~O(M*logM), where M is the number of lines within one frame. Therefore the processing line rate of C-FD-OCT is also influenced by the frame size M. To verify this, here we tested the relation between processing line rate of NUFFT-C mode versus frame size, as shown in Fig. 5(c) and 5(d), and a speed decrease is observed with increasing frame size.

Zero-filling interpolation with FFT is also effective in suppressing the side-lobe effect and background noise for FD-OCT. However, zero-filling usually requires an oversampling factor of 4 or 8 and two additional FFTs, which considerably increase the array size and processing time of the data [41

41. C. Dorrer, N. Belabas, J. Likforman, and M. Joffre, “Spectral resolution and sampling issues in Fourier-transform spectral interferometry,” J. Opt. Soc. Am. B 17(10), 1795–1802 (2000). [CrossRef]

]. From Fig. 5(b), for the 2048-pixel NUFFT-C mode, a peak internal processing speed of about 100,000 line/s is achieved. This is >40% faster than the 70,000 line/s (1024 lines in 14.75 ms) achieved in Ref. [29], which implemented GPU-based zero-filling interpolation for C-FD-OCT using a 480-core GPU (NVIDIA GeForce GTX 295, 2 × 240 cores, 1.3 GHz per core).
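The comparison can be checked from the quoted figures alone, taking the approximate peak rate of 100,000 line/s at face value:

$$\frac{1024\ \text{lines}}{14.75\times10^{-3}\ \text{s}}\approx 6.94\times10^{4}\ \text{line/s},\qquad \frac{1.0\times10^{5}}{6.94\times10^{4}}\approx 1.44,$$

that is, roughly 44% faster, consistent with the ">40%" statement.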

4.2 Comparison of point spread function and sensitivity roll-off

Fig. 6 Point spread function and sensitivity roll-off of different processing methods: (a) LIFFT; (b) CIFFT; (c) NUDFT; (d) NUFFT; (e) Comparison of PSF at certain image depth using different processing methods; (f) Comparison of sensitivity roll-off using different processing methods; (g) A-scan FWHM with depth; (h) Performance of NUFFT with different Msp values.

Then we compared the point spread function (PSF) and sensitivity roll-off of the different FD-OCT processing methods, as shown in Fig. 6(a)–6(d), using GPU-based LIFFT, CIFFT, NUDFT and NUFFT, respectively. A mirror is used as the sample for evaluating the PSF. Here, 1024 GPU-processed 1024-pixel A-scans are averaged to obtain the mean noise level for the PSF at each axial position [35]. Figure 6(a) shows that the LIFFT method introduces a significant background level and side-lobes around the signal peak. The side-lobes tend to be broad and extend to neighboring peaks, which significantly degrades the OCT image. When CIFFT is used instead, as shown in Fig. 6(b), the side-lobes are suppressed at shallow image depths but remain considerably high at greater depths. Figures 6(c) and 6(d), which use NUDFT and NUFFT, respectively, show significant suppression of side-lobes over the whole image depth. Figure 6(e) presents an individual PSF at a depth close to the imaging depth limit (the intensity of each A-scan is normalized). Both LIFFT and CIFFT have large side-lobes, while both NUDFT and NUFFT have a flat background and no noticeable side-lobes, which also indicates a higher local SNR at deeper axial positions. The measured sensitivity roll-off is summarized in Fig. 6(f). All methods have very similar sensitivity at shallow image depths <400 µm, while the interpolation-based methods present a faster roll-off as the axial position increases. The sensitivity roll-off of LIFFT and CIFFT can be further improved by deconvolution of the linear/cubic interpolation kernel, and the result would be closer to that of NUDFT and NUFFT [35

35. S. Vergnole, D. Lévesque, and G. Lamouche, “Experimental validation of an optimized signal processing method to handle non-linearity in swept-source optical coherence tomography,” Opt. Express 18(10), 10446–10461 (2010), http://www.opticsinfobase.org/abstract.cfm?uri=oe-18-10-10446. [CrossRef] [PubMed]

,36

36. D. Hillmann, G. Huttmann, and P. Koch, “Using nonequispaced fast Fourier transformation to process optical coherence tomography signals,” Proc. SPIE 7372, 73720R (2009). [CrossRef]

]. It is also notable from Fig. 6(e) and 6(f) that the NUFFT method exhibit negligible difference from the more time-consuming NUDFT, indicating a very close and reliable approximation. The full width at half maximum (FWHM) values of individual A-scans processed by different methods are presented in Fig. 6(g), which indicates very close axial resolution for all methods.
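The FWHM comparison can be reproduced on any sampled PSF with a few lines of code. The sketch below is one common way to estimate the FWHM, by linearly interpolating the half-maximum crossings on either side of the peak; it assumes a linear-intensity (not dB) profile with a single dominant peak, and both the function name `fwhm` and the synthetic Gaussian test profile are hypothetical, not taken from the measurements in Fig. 6(g).

```python
import math

def fwhm(profile):
    """Full width at half maximum, in samples, of a single-peaked profile."""
    n = len(profile)
    peak = max(range(n), key=lambda i: profile[i])
    half = profile[peak] / 2.0

    # Walk left from the peak to the last sample still at or above half maximum.
    i = peak
    while i > 0 and profile[i - 1] >= half:
        i -= 1
    if i == 0:
        left = 0.0  # profile never drops below half maximum on this side
    else:
        left = (i - 1) + (half - profile[i - 1]) / (profile[i] - profile[i - 1])

    # Walk right from the peak in the same way.
    j = peak
    while j < n - 1 and profile[j + 1] >= half:
        j += 1
    if j == n - 1:
        right = float(n - 1)
    else:
        right = j + (profile[j] - half) / (profile[j] - profile[j + 1])

    return right - left

# Synthetic Gaussian PSF with sigma = 5 samples, centered at sample 50.
sigma = 5.0
profile = [math.exp(-((i - 50) ** 2) / (2.0 * sigma ** 2)) for i in range(101)]
width = fwhm(profile)  # analytic value is 2*sqrt(2*ln 2)*sigma
```

Multiplying the result by the axial pixel spacing converts it to a physical width, which is how curves such as those in Fig. 6(g) are typically expressed.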

Figure 6(h) presents the processing result of a certain A-scan using NUFFT with R = 2 and different Msp values from 1 to 3. Here R is set to 2 for the convenience of applying GPU-based FFT functions (for lengths of 2^N). From Fig. 6(h), it can be seen that for Msp ≥ 2 the NUFFT result is close enough to the NUDFT result; therefore Msp = 2 is selected to decrease the computational load.
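The roles of R and Msp can be illustrated with a small CPU sketch of Gaussian-gridding NUFFT checked against the direct NUDFT sum. This is not the GPU code of this work: the function names, the synthetic non-uniform sampling, and the test fringe are assumptions made for the example, and the kernel width tau follows a commonly quoted heuristic associated with reference [40], whose exact constant should be treated as an assumption. Sample positions x are assumed to be pre-scaled to [0, 2π).

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 forward FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def nudft(x, f, M):
    """Direct sum S(k) = sum_j f[j]*exp(-i*k*x[j]), k = -M/2 .. M/2-1."""
    return [sum(fj * cmath.exp(-1j * k * xj) for xj, fj in zip(x, f))
            for k in range(-M // 2, M // 2)]

def nufft_gaussian(x, f, M, R=2, Msp=2):
    """Approximate the same sum with Gaussian gridding and one FFT of length R*M."""
    Mr = R * M                                      # oversampled grid length
    h = 2.0 * math.pi / Mr                          # grid spacing
    tau = math.pi * Msp / (M * M * R * (R - 0.5))   # kernel width (assumed heuristic)
    grid = [0j] * Mr
    for xj, fj in zip(x, f):
        m0 = int(math.floor(xj / h))
        # Spread each sample onto its 2*Msp nearest grid points (periodic wrap).
        for m in range(m0 - Msp + 1, m0 + Msp + 1):
            d = xj - m * h
            grid[m % Mr] += fj * math.exp(-d * d / (4.0 * tau))
    G = fft(grid)
    scale = math.sqrt(math.pi / tau) / Mr
    # Undo the Gaussian smoothing in the transform domain.
    return [scale * math.exp(k * k * tau) * G[k % Mr]
            for k in range(-M // 2, M // 2)]

def max_rel_error(approx, exact):
    peak = max(abs(v) for v in exact)
    return max(abs(a - b) for a, b in zip(approx, exact)) / peak

# Synthetic, mildly non-uniform sample positions and a single-reflector fringe.
N, M = 64, 32
x = [2.0 * math.pi * (t + 0.2 * t * (1.0 - t)) for t in (j / N for j in range(N))]
f = [1.0 + math.cos(10.0 * xj) for xj in x]
exact = nudft(x, f, M)
for Msp in (1, 2, 3):
    print(Msp, max_rel_error(nufft_gaussian(x, f, M, Msp=Msp), exact))
```

In this sketch the gridding cost grows with 2·Msp per sample, while the FFT length is fixed by R·M, so a smaller Msp reduces the load at the price of a larger approximation error. Choosing R = 2 keeps the FFT length a power of two whenever M is.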

4.3 Comparison of real-time image quality

We compared the real-time image quality of GPU-accelerated C-FD-OCT using the different processing methods. Here a multi-layered polymer phantom is used as the sample. In the scanning protocol, each frame consists of 4296 A-scans in acquisition, but the first 200 lines are discarded before processing, since they fall within the fly-back period of the galvanometer. Each processed frame is therefore 4096 pixel (lateral) × 1024 pixel (axial).

First, a single frame is captured and GPU-processed using the different methods, as shown in Fig. 7. The red arrows in Fig. 7(a) and 7(b) indicate the ghost image due to the side-lobes of the reflective surface at a deeper image depth relative to the zero delay line (ZDL). The red lines correspond to the same A-scan position extracted from each image for comparison, shown collectively in Fig. 7(i). The resulting LIFFT/CIFFT images exhibit side-lobes on the order of 10/5 dB higher than the NUDFT/NUFFT images, as indicated by the blue arrow in Fig. 7(i).

Fig. 7 Real-time image of multilayered phantom using different processing methods, where the bars represent 1mm in both dimensions for all images: (a) LIFFT (Media 1, 29.8 fps); (b) CIFFT (Media 2, 29.8 fps); (c) NUDFT (Media 3, 9.3 fps); (d) NUFFT (Media 4, 29.8 fps). All images are originally 4096 pixel (lateral) × 1024 pixel (axial) and rescaled to 1024 pixel (lateral) × 512 pixel (axial) for display on the monitor. (e)~(h): Magnified view corresponding to the blue-boxed area in (a)~(d). ZDL: zero delay line. The red arrows in (a) and (b) indicate the ghost image due to the presence of side-lobes of the reflective surface at a large image depth relative to ZDL. The red lines correspond to the A-scans extracted from the same lateral position of each image, shown collectively in (i). The side-lobes of LIFFT/CIFFT are indicated by the blue arrow in (i).

We then screen-captured the real-time display of the different GPU-accelerated C-FD-OCT modes, shown in Media 1, Media 2, Media 3, and Media 4. The image frames are rescaled to 1024 pixel (lateral) × 512 pixel (axial) to accommodate the monitor display. The LIFFT/CIFFT/NUFFT modes run at 29.8 fps, corresponding to a camera-limited line rate of 122k line/s, while the NUDFT mode is GPU-limited to 9.3 fps (38k line/s). As shown in both Fig. 7 and the corresponding movies, in the LIFFT/CIFFT modes the ghost image is evident when the phantom surface moves further away from the ZDL. In contrast, there is no evidence of ghost images in the NUDFT/NUFFT modes wherever the surface is placed.
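The quoted line rates follow directly from the frame rates and the 4096 processed lines per frame, as the short check below shows. The function and variable names are illustrative only.

```python
def line_rate_from_frame_rate(fps, lines_per_frame):
    """Displayed line rate in line/s for a given frame rate and frame width."""
    return fps * lines_per_frame

LINES_PER_FRAME = 4096  # processed A-scans per frame, as stated in the text

nufft_line_rate = line_rate_from_frame_rate(29.8, LINES_PER_FRAME)  # camera-limited
nudft_line_rate = line_rate_from_frame_rate(9.3, LINES_PER_FRAME)   # GPU-limited

print(f"LIFFT/CIFFT/NUFFT: {nufft_line_rate / 1000:.0f} k line/s")
print(f"NUDFT:             {nudft_line_rate / 1000:.0f} k line/s")
```

The ratio of the two displayed rates is only about 3.2, because the NUFFT mode is limited by the camera rather than by the GPU; the displayed rates therefore understate the processing-speed advantage of NUFFT over NUDFT.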

From these results, it is clear that the GPU-NUFFT method is a very close approximation of GPU-NUDFT while offering much higher processing speed. GPU-NUFFT runs at a processing speed comparable to that of GPU-CIFFT and is immune to the ghost images caused by interpolation error.

5. In vivo human finger imaging using GPU-NUFFT based C-FD-OCT

Finally, we conducted in vivo human finger imaging using GPU-NUFFT-based C-FD-OCT, displayed at 29.8 fps with an original frame size of 4096 pixel (lateral) × 1024 pixel (axial). Figures 8(a) and 8(b) present coronal scans of the finger tip and palm, where epithelial structures such as the sweat duct (SD), stratum corneum (SC) and stratum spinosum (SS) are clearly distinguishable. Figures 8(c) and 8(d) present coronal scans of the finger nail fold region, showing the major dermatologic structures such as the epidermis (E), dermis (D), nail bed (NB), and nail root (NR), which are also visible in the sagittal scans in Fig. 8(e) and 8(f). The real-time display for each figure is captured as Media 5, Media 6, Media 7, Media 8, and Media 9, at 1024 pixel (lateral) × 512 pixel (axial). Compared to standard FD-OCT, the GPU-NUFFT C-FD-OCT image is free of conjugate artifact, DC noise, and autocorrelation noise, which are difficult to remove in standard FD-OCT. Moreover, because of the complex OCT processing, the image depth is effectively doubled, with the highest-SNR region at the zero delay point. Such ultrahigh-speed real-time C-FD-OCT could be highly useful for microsurgical guidance and intervention applications.

Fig. 8 Real-time C-FD-OCT images using GPU-NUFFT, where the bars represent 1mm in both dimensions for all images: (a) (Media 5) Finger tip, (coronal). (b) (Media 6) Finger palm (coronal). (c)~(d) (Media 7) Finger nail fold (coronal); (e)~(f) (Media 8, Media 9) Finger nail (sagittal). SD, sweat duct; SC, stratum corneum; SS, stratum spinosum; NP, nail plate; NB, nail bed; NR, nail root; E, epidermis; D, dermis.

6. Conclusion

In this work, we implemented and successfully demonstrated FGG-based NUFFT on the GPU architecture for online signal processing in a general FD-OCT system. The Vandermonde matrix-based NUDFT and the linear/cubic InFFT methods were also implemented on the GPU to compare image quality and processing speed. GPU-NUFFT provides an accurate approximation to GPU-NUDFT in terms of image quality while offering >10 times higher processing speed. Compared to the GPU-InFFT methods, GPU-NUFFT has better sensitivity roll-off, a higher local signal-to-noise ratio, and is immune to the side-lobe artifacts caused by interpolation error. Using a high-speed CMOS line-scan camera, we demonstrated the real-time processing and display of GPU-NUFFT-based C-FD-OCT at a camera-limited speed of 122 k line/s (1024 pixel/A-scan). The GPU processing speed can be increased further by implementing a multiple-GPU architecture using more than one GPU in parallel.

Acknowledgments

This work was supported by National Institutes of Health (NIH) grant R21 1R21NS063131-01A1.

References and links

1. S. A. Boppart, M. E. Brezinski, and J. G. Fujimoto, “Surgical Guidance and Intervention,” in Handbook of Optical Coherence Tomography, B. E. Bouma and G. J. Tearney, eds. (Marcel Dekker, New York, NY, 2001).
2. A. F. Low, G. J. Tearney, B. E. Bouma, and I. K. Jang, “Technology Insight: optical coherence tomography--current status and future development,” Nat. Clin. Pract. Cardiovasc. Med. 3(3), 154–162, quiz 172 (2006).
3. M. S. Jafri, R. Tang, and C. M. Tang, “Optical coherence tomography guided neurosurgical procedures in small rodents,” J. Neurosci. Methods 176(2), 85–95 (2009).
4. K. Zhang, W. Wang, J. Han, and J. U. Kang, “A surface topology and motion compensation system for microsurgery guidance and intervention based on common-path optical coherence tomography,” IEEE Trans. Biomed. Eng. 56(9), 2318–2321 (2009).
5. J. U. Kang, J.-H. Han, X. Liu, K. Zhang, C. G. Song, and P. Gehlbach, “Endoscopic functional Fourier domain common path optical coherence tomography for microsurgery,” IEEE J. Sel. Top. Quantum Electron. 16(4), 781–792 (2010).
6. U. Sharma, N. M. Fried, and J. U. Kang, “All-fiber common-path optical coherence tomography: sensitivity optimization and system analysis,” IEEE J. Sel. Top. Quantum Electron. 11(4), 799–805 (2005).
7. B. Potsaid, I. Gorczynska, V. J. Srinivasan, Y. Chen, J. Jiang, A. Cable, and J. G. Fujimoto, “Ultrahigh speed spectral / Fourier domain OCT ophthalmic imaging at 70,000 to 312,500 axial scans per second,” Opt. Express 16(19), 15149–15169 (2008), http://www.opticsinfobase.org/oe/abstract.cfm?uri=oe-16-19-15149.
8. R. Huber, D. C. Adler, and J. G. Fujimoto, “Buffered Fourier domain mode locking: Unidirectional swept laser sources for optical coherence tomography imaging at 370,000 lines/s,” Opt. Lett. 31(20), 2975–2977 (2006).
9. W.-Y. Oh, B. J. Vakoc, M. Shishkov, G. J. Tearney, and B. E. Bouma, “>400 kHz repetition rate wavelength-swept laser and application to high-speed optical frequency domain imaging,” Opt. Lett. 35(17), 2919–2921 (2010).
10. I. Grulkowski, M. Gora, M. Szkulmowski, I. Gorczynska, D. Szlag, S. Marcos, A. Kowalczyk, and M. Wojtkowski, “Anterior segment imaging with Spectral OCT system using a high-speed CMOS camera,” Opt. Express 17(6), 4842–4858 (2009), http://www.opticsinfobase.org/oe/abstract.cfm?uri=oe-17-6-4842.
11. M. Gora, K. Karnowski, M. Szkulmowski, B. J. Kaluzny, R. Huber, A. Kowalczyk, and M. Wojtkowski, “Ultra high-speed swept source OCT imaging of the anterior segment of human eye at 200 kHz with adjustable imaging range,” Opt. Express 17(17), 14880–14894 (2009), http://www.opticsinfobase.org/oe/abstract.cfm?uri=oe-17-17-14880.
12. M. Gargesha, M. W. Jenkins, A. M. Rollins, and D. L. Wilson, “Denoising and 4D visualization of OCT images,” Opt. Express 16(16), 12313–12333 (2008), http://www.opticsinfobase.org/oe/abstract.cfm?uri=oe-16-16-12313.
13. M. Gargesha, M. W. Jenkins, D. L. Wilson, and A. M. Rollins, “High temporal resolution OCT using image-based retrospective gating,” Opt. Express 17(13), 10786–10799 (2009), http://www.opticsinfobase.org/oe/abstract.cfm?uri=oe-17-13-10786.
14. M. W. Jenkins, F. Rothenberg, D. Roy, V. P. Nikolski, Z. Hu, M. Watanabe, D. L. Wilson, I. R. Efimov, and A. M. Rollins, “4D embryonic cardiography using gated optical coherence tomography,” Opt. Express 14(2), 736–748 (2006), http://www.opticsinfobase.org/oe/abstract.cfm?URI=OPEX-14-2-736.
15. W. Wieser, B. R. Biedermann, T. Klein, C. M. Eigenwillig, and R. Huber, “Multi-megahertz OCT: High quality 3D imaging at 20 million A-scans and 4.5 GVoxels per second,” Opt. Express 18(14), 14685–14704 (2010), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-18-14-14685.
16. Y. Yasuno, S. Makita, T. Endo, G. Aoki, M. Itoh, and T. Yatagai, “Simultaneous B-M-mode scanning method for real-time full-range Fourier domain optical coherence tomography,” Appl. Opt. 45(8), 1861–1865 (2006).
17. B. Baumann, M. Pircher, E. Götzinger, and C. K. Hitzenberger, “Full range complex spectral domain optical coherence tomography without additional phase shifters,” Opt. Express 15(20), 13375–13387 (2007), http://www.opticsinfobase.org/abstract.cfm?URI=oe-15-20-13375.
18. L. An and R. K. Wang, “Use of a scanner to modulate spatial interferograms for in vivo full-range Fourier-domain optical coherence tomography,” Opt. Lett. 32(23), 3423–3425 (2007).
19. S. Vergnole, G. Lamouche, and M. L. Dufour, “Artifact removal in Fourier-domain optical coherence tomography with a piezoelectric fiber stretcher,” Opt. Lett. 33(7), 732–734 (2008).
20. H. M. Subhash, L. An, and R. K. Wang, “Ultra-high speed full range complex spectral domain optical coherence tomography for volumetric imaging at 140,000 A scans per second,” Proc. SPIE 7554, 75540K (2010).
21. T. E. Ustun, N. V. Iftimia, R. D. Ferguson, and D. X. Hammer, “Real-time processing for Fourier domain optical coherence tomography using a field programmable gate array,” Rev. Sci. Instrum. 79(11), 114301 (2008).
22. A. E. Desjardins, B. J. Vakoc, M. J. Suter, S. H. Yun, G. J. Tearney, and B. E. Bouma, “Real-time FPGA processing for high-speed optical frequency domain imaging,” IEEE Trans. Med. Imaging 28(9), 1468–1472 (2009).
23. G. Liu, J. Zhang, L. Yu, T. Xie, and Z. Chen, “Real-time polarization-sensitive optical coherence tomography data processing with parallel computing,” Appl. Opt. 48(32), 6365–6370 (2009).
24. J. Probst, D. Hillmann, E. Lankenau, C. Winter, S. Oelckers, P. Koch, and G. Hüttmann, “Optical coherence tomography with online visualization of more than seven rendered volumes per second,” J. Biomed. Opt. 15(2), 026014 (2010).
25. Y. Watanabe and T. Itagaki, “Real-time display on Fourier domain optical coherence tomography system using a graphics processing unit,” J. Biomed. Opt. 14(6), 060506 (2009).
26. K. Zhang and J. U. Kang, “Real-time 4D signal processing and visualization using graphics processing unit on a regular nonlinear-k Fourier-domain OCT system,” Opt. Express 18(11), 11772–11784 (2010), http://www.opticsinfobase.org/abstract.cfm?uri=oe-18-11-11772.
27. S. Van der Jeught, A. Bradu, and A. G. Podoleanu, “Real-time resampling in Fourier domain optical coherence tomography using a graphics processing unit,” J. Biomed. Opt. 15(3), 030511 (2010).
28. J. U. Kang and K. Zhang, “Real-time complex optical coherence tomography using graphics processing unit for surgical intervention,” to appear in IEEE Photonics Society Annual 2010, Denver, Colorado, USA, November, 2010.
29. Y. Watanabe, S. Maeno, K. Aoshima, H. Hasegawa, and H. Koseki, “Real-time processing for full-range Fourier-domain optical-coherence tomography with zero-filling interpolation using multiple graphic processing units,” Appl. Opt. 49(25), 4756–4762 (2010).
30. Z. Hu and A. M. Rollins, “Fourier domain optical coherence tomography with a linear-in-wavenumber spectrometer,” Opt. Lett. 32(24), 3525–3527 (2007).
31. C. M. Eigenwillig, B. R. Biedermann, G. Palte, and R. Huber, “K-space linear Fourier domain mode locked laser and applications for optical coherence tomography,” Opt. Express 16(12), 8916–8937 (2008), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-16-12-8916.
32. D. C. Adler, Y. Chen, R. Huber, J. Schmitt, J. Connolly, and J. G. Fujimoto, “Three-dimensional endomicroscopy using optical coherence tomography,” Nat. Photonics 1(12), 709–716 (2007).
33. S. S. Sherif, C. Flueraru, Y. Mao, and S. Chang, “Swept Source Optical Coherence Tomography with Nonuniform Frequency Domain Sampling,” in Biomedical Optics, OSA Technical Digest (CD) (Optical Society of America, 2008), paper BMD86.
34. K. Wang, Z. Ding, T. Wu, C. Wang, J. Meng, M. Chen, and L. Xu, “Development of a non-uniform discrete Fourier transform based high speed spectral domain optical coherence tomography system,” Opt. Express 17(14), 12121–12131 (2009), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-17-14-12121.
35. S. Vergnole, D. Lévesque, and G. Lamouche, “Experimental validation of an optimized signal processing method to handle non-linearity in swept-source optical coherence tomography,” Opt. Express 18(10), 10446–10461 (2010), http://www.opticsinfobase.org/abstract.cfm?uri=oe-18-10-10446.
36. D. Hillmann, G. Hüttmann, and P. Koch, “Using nonequispaced fast Fourier transformation to process optical coherence tomography signals,” Proc. SPIE 7372, 73720R (2009).
37. NVIDIA, “NVIDIA CUDA Compute Unified Device Architecture Programming Guide Version 3.1,” (2010).
38. NVIDIA, “NVIDIA CUDA CUFFT Library Version 3.1,” (2010).
39. NVIDIA, “NVIDIA CUDA C Best Practices Guide 3.1,” (2010).
40. L. Greengard and J. Lee, “Accelerating the nonuniform fast Fourier transform,” SIAM Rev. 46(3), 443–454 (2004).
41. C. Dorrer, N. Belabas, J. Likforman, and M. Joffre, “Spectral resolution and sampling issues in Fourier-transform spectral interferometry,” J. Opt. Soc. Am. B 17(10), 1795–1802 (2000).

OCIS Codes
(100.2000) Image processing : Digital image processing
(110.4500) Imaging systems : Optical coherence tomography
(170.3890) Medical optics and biotechnology : Medical optics instrumentation

ToC Category:
Image Processing

History
Original Manuscript: September 17, 2010
Revised Manuscript: October 14, 2010
Manuscript Accepted: October 15, 2010
Published: October 22, 2010

Virtual Issues
Vol. 6, Iss. 1 Virtual Journal for Biomedical Optics

Citation
Kang Zhang and Jin U. Kang, "Graphics processing unit accelerated non-uniform fast Fourier transform for ultrahigh-speed, real-time Fourier-domain OCT," Opt. Express 18, 23472-23487 (2010)
http://www.opticsinfobase.org/vjbo/abstract.cfm?URI=oe-18-22-23472




Supplementary Material


Media 1: AVI (4002 KB)
Media 2: AVI (3968 KB)
Media 3: AVI (3674 KB)
Media 4: AVI (3862 KB)
Media 5: AVI (3938 KB)
Media 6: AVI (3943 KB)
Media 7: AVI (3956 KB)
Media 8: AVI (3156 KB)
Media 9: AVI (3944 KB)
