Applied Optics

  • Editor: Joseph N. Mait
  • Vol. 50, Iss. 13, May 1, 2011
  • pp: 1832–1838

Performance and scalability of Fourier domain optical coherence tomography acceleration using graphics processing units

Jian Li, Pavel Bloch, Jing Xu, Marinko V. Sarunic, and Lesley Shannon

Applied Optics, Vol. 50, Issue 13, pp. 1832-1838 (2011)


Fourier domain optical coherence tomography (FD-OCT) provides faster line rates, better resolution, and higher sensitivity for noninvasive, in vivo biomedical imaging than traditional time domain OCT (TD-OCT). However, because the signal processing for FD-OCT is computationally intensive, real-time FD-OCT applications demand powerful computing platforms to deliver acceptable performance. Graphics processing units (GPUs) have been used as coprocessors to accelerate FD-OCT by leveraging their relatively simple programming model to exploit thread-level parallelism. Unfortunately, GPUs do not “share” memory with their host processors, so data must be transferred between the GPU and CPU. In this paper, we implement a complete FD-OCT accelerator on a consumer-grade GPU/CPU platform. Our data acquisition system uses spectrometer-based detection and a dual-arm interferometer topology with numerical dispersion compensation for retinal imaging. We demonstrate that, because of the GPU platform’s memory model, the maximum line rate is dictated by the memory transfer time rather than the processing time. Finally, we discuss how the performance trends of GPU-based accelerators compare to the expected future requirements of FD-OCT data rates.
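The claim that transfer time, not processing time, bounds the line rate can be illustrated with a simple timing model. The sketch below is not the authors' implementation: the function names, the fixed per-transfer overhead, the serial (non-overlapped) execution, and every numeric value are assumptions chosen only for illustration.

```python
def batch_times(batch_lines, bytes_in_per_line, bytes_out_per_line,
                bandwidth_bytes_per_s, transfer_overhead_s,
                kernel_time_per_line_s):
    """Return (transfer_time, processing_time) in seconds for one batch.

    Assumed model: each host<->device copy costs a fixed overhead plus
    bytes / bandwidth; GPU processing scales linearly with line count.
    """
    t_in = transfer_overhead_s + batch_lines * bytes_in_per_line / bandwidth_bytes_per_s
    t_out = transfer_overhead_s + batch_lines * bytes_out_per_line / bandwidth_bytes_per_s
    t_proc = batch_lines * kernel_time_per_line_s
    return t_in + t_out, t_proc


def max_line_rate(batch_lines, bytes_in_per_line, bytes_out_per_line,
                  bandwidth_bytes_per_s, transfer_overhead_s,
                  kernel_time_per_line_s):
    """Lines per second, assuming copies and kernels run back to back."""
    t_transfer, t_proc = batch_times(
        batch_lines, bytes_in_per_line, bytes_out_per_line,
        bandwidth_bytes_per_s, transfer_overhead_s, kernel_time_per_line_s)
    return batch_lines / (t_transfer + t_proc)


def bottleneck(batch_lines, bytes_in_per_line, bytes_out_per_line,
               bandwidth_bytes_per_s, transfer_overhead_s,
               kernel_time_per_line_s):
    """Name the larger of the two time components for one batch."""
    t_transfer, t_proc = batch_times(
        batch_lines, bytes_in_per_line, bytes_out_per_line,
        bandwidth_bytes_per_s, transfer_overhead_s, kernel_time_per_line_s)
    return "transfer" if t_transfer > t_proc else "processing"


if __name__ == "__main__":
    # Hypothetical numbers: 2000 B in and 1000 B out per line,
    # 1 GB/s effective bus bandwidth, 10 us overhead per copy,
    # 1 us of GPU processing per line.
    for lines in (1, 64, 8192):
        rate = max_line_rate(lines, 2000, 1000, 1e9, 1e-5, 1e-6)
        which = bottleneck(lines, 2000, 1000, 1e9, 1e-5, 1e-6)
        print(f"batch={lines:5d}  rate={rate:10.0f} lines/s  bound by {which}")
```

Under these assumed values the transfer term dominates at every batch size, and larger batches raise the achievable rate only by amortizing the fixed copy overhead. A real system would need measured bandwidth and kernel timings in place of the placeholders.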

© 2011 Optical Society of America

OCIS Codes
(100.2000) Image processing : Digital image processing
(170.3890) Medical optics and biotechnology : Medical optics instrumentation
(170.4500) Medical optics and biotechnology : Optical coherence tomography

ToC Category:
Medical Optics and Biotechnology

Original Manuscript: November 1, 2010
Revised Manuscript: February 19, 2011
Manuscript Accepted: February 21, 2011
Published: April 22, 2011

Virtual Issues
Vol. 6, Iss. 6 Virtual Journal for Biomedical Optics

Jian Li, Pavel Bloch, Jing Xu, Marinko V. Sarunic, and Lesley Shannon, "Performance and scalability of Fourier domain optical coherence tomography acceleration using graphics processing units," Appl. Opt. 50, 1832-1838 (2011)
