OSA's Digital Library

Optics Express

Optics Express

  • Editor: C. Martijn de Sterke
  • Vol. 19, Iss. 13 — Jun. 20, 2011
  • pp: 12125–12130
« Show journal navigation

GPU based real-time quadrature transform method for 3-D surface measurement and visualization

Arturo Espinosa-Romero and Ricardo Legarda-Saenz  »View Author Affiliations


Optics Express, Vol. 19, Issue 13, pp. 12125-12130 (2011)
http://dx.doi.org/10.1364/OE.19.012125


View Full Text Article

Acrobat PDF (693 KB)





Browse Journals / Lookup Meetings

Browse by Journal and Year


   


Lookup Conference Papers

Close Browse Journals / Lookup Meetings

Article Tools

Share
Citations

Abstract

In this article, we propose a massively parallel, real-time algorithm for the estimation of the dynamic phase map of a vibrating object. The algorithm implements a Fourier-based quadrature transform and temporal phase unwrapping technique. CUDA, a graphic processing unit programming architecture was used to implement the algorithm. It was tested on a fringe pattern sequence using three devices with different capabilities, achieving a processing rate greater than 1600 frames per second (fps).

© 2011 OSA

1. Introduction

A Fourier-based quadrature transform and temporal phase unwrapping technique for estimation of the dynamic phase map from a vibrating object recently became available [1

1. R. Legarda-Saenz, R. Rodriguez-Vera, and A. Espinosa-Romero, “Dynamic 3-D shape measurement method based on quadrature transform,” Opt. Express 18(3), 2639–2645 (2010). [CrossRef] [PubMed]

]. The new method has significant computational advantages over traditional methods. The method allows the automatic processing of large fringe sequences, however, despite all of its advantages, its computation on a conventional computer takes long enough to make it unsuitable for real-time applications.

High-speed digital recording equipment allows to acquire a large sequence of images that record the fluctuations of the projected fringe due to object deformations. Although computer power has increased dramatically in the last 20 years, it is sill a challenge to use general purpose computing platforms to perform real-time image processing tasks. Graphic processing units (GPU), on the other hand, are massively parallel computers that are designed to work like numeric computing engines and, when paralellization is possible, they are perfectly suitable to implement image processing tasks in real-time. In this paper, we propose a real-time algorithm to estimate dynamic phase maps using a graphic processing unit. We tested the algorithms on three different CUDA capable devices [2

2. NVIDIA, NVIDIA CUDA C Programming Guide (version 3.2), http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/ (2010).

], achieving a 20x speedup compared to similar code executed on a conventional computer.

2. Theoretical development

The equation that models the observed dynamic fringe pattern can be defined as
I(r,t)=a(r)+b(r,t)cosψ(r,t)
(1)
where r = (x 1, x 2, ⋯ , xn) is an n-dimensional position vector, a(r) is the background illumination and b(r ,t) is the amplitude modulation. It should be noted that a(r) is considered to remain constant throughout the experiment.

A sequence of N frames are captured from the dynamic movement of the object in such way that
{I(r,t0),I(r,t0+Δt,I(r,t0+2Δt),,I(r,t0+(N1)Δt}
(2)
where Δt is the temporal period of the captured frame and is smaller than the temporal period of the vibration cycle. Under this condition each frame can be seen as
Ik(r)=a(r)+bk(r)cosψk(r)
(3)
where Ik(r) = I(r ,to + kΔt), bk(r) = b(r ,to + kΔt) and ψk(r) = ψ (r ,to + kΔt) for k = 0, 1, ⋯ , N – 1.

2.1. The general n-dimensional quadrature transform

As pointed out in [1

1. R. Legarda-Saenz, R. Rodriguez-Vera, and A. Espinosa-Romero, “Dynamic 3-D shape measurement method based on quadrature transform,” Opt. Express 18(3), 2639–2645 (2010). [CrossRef] [PubMed]

], an alternative method to Fourier-based demodulation that avoids the need to select the appropriate filter manually in order to isolate the desired frequency information for every fringe pattern, is to estimate the quadrature transform of every fringe pattern [4

4. K. G. Larkin, D. J. Bone, and M. A. Oldfield, “Natural demodulation of two-dimensional fringe patterns. I. General background of the spiral phase quadrature transform,” J. Opt. Soc. Am. A 18, 1862–1870 (2001). [CrossRef]

, 5

5. M. Servin, J. A. Quiroga, and J. L. Marroquin, “General n-dimensional quadrature transform and its application to interferogram demodulation,” J. Opt. Soc. Am. A 20, 924–934 (2003). [CrossRef]

]. Assuming that the constant background illumination a(r) is filtered from the fringe pattern defined in Eq. (2), the following fringe pattern results [7

7. T. Kreis, Holographic Interferometry: Principles and Methods (Akademie Verlag, 1996).

]
gk(r)=bkcosψk(r)
(4)

Quadrature estimation consists of transforming the fringe pattern of Eq. (4) into its quadrature term defined by
g^(r)=bk(r)sinψk(r)
and obtaining the following fringe complex pattern
gkig^(r)=bk(r)exp[iψk(r)]
(5)
where i=1.

The n-dimensional quadrature transform for fringe patterns with carrier frequency is defined as [5

5. M. Servin, J. A. Quiroga, and J. L. Marroquin, “General n-dimensional quadrature transform and its application to interferogram demodulation,” J. Opt. Soc. Am. A 20, 924–934 (2003). [CrossRef]

]
Qn{gk(r)}=bk(r)sin(ψk(r))=1{iωq|ωq|{gk(r)}}
(6)
where q = {u 1 , u 2 ,..., un} is the n-dimensional position vector on the frequency domain and {·} denotes de Fourier transform. Using Eq. (6) we can write the complex fringe pattern of Eq. (5) as
bk(r)exp[iψk(r)]=gk(r)i1{iωq|ωq|{gk(r)}}
(7)
which in turn can be reduced to the following expression
bk(r)exp[iψk(r)]=1{(1ωq|ωq|){gk(r)}}
(8)

2.2. Temporal phase unwrapping

Once the quadrature term has been computed using Eq. (6) the wrapped phase is obtained with
ψ^k(r)=W{ψk}=arctan(Q{gk(r)}gk)
(9)
where W {·} is called the wrapping operator [6

6. D. Ghigila and M. D. Pritt, Two-Dimensional Phase Unwrapping. Theory, Algorithms, and Software (John Wiley & Sons Ltd., 1998).

].

The transient phase change occurring over time can be retrieved by unwrapping the phase maps. The phase map φM(r) can be computed by the sum of the M – 1 phase differences using the following equation [8

8. J. M. Huntley and H. Saldner, “Temporal phase-unwrapping algorithm for automated interferogram analysis,” Appl. Opt. 32, 3047–3052 (1993). [CrossRef] [PubMed]

, 9

9. S. De Nicola and P. Ferraro, “Fourier transform method of fringe analysis for moire interferometry,” J. Opt. A: Pure Appl. Opt. 2, 228–233 (2000). [CrossRef]

]
φM(r)=k=1Marctancosψ^ksinψ^k1sinψ^kcosψ^k1cosψ^kcosψ^k1sinψ^ksinψ^k1
(10)

3. Parallel implementation of the quadrature transform and phase unwrapping on CUDA

Graphics processing units are processors that were originally designed to perform graphics computation for visualization purposes. They recently have taken a prominent role in scientific computing and other areas where computational requirements are large. In comparison to modern general-purpose processors, using multiple cores to attain a performance increase, GPU’s have a much larger number of cores, and therefore are designed to execute tasks where the parallelism is substantial, and throughput is more important than latency [10

10. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips “GPU computing,” Proc. IEEE 96(5), 879–899 (2008). [CrossRef]

].

The programming model of a GPU is single-program multiple-data (SMPD), i.e. the same program is concurrently executed in multiple processors, each one processing different data. The GPU hardware is organized as a set of multiprocessors which contain many cores that are capable to execute thousands of threads concurrently. In particular, we implement the algorithm described in this article using the CUDA architecture, a general purpose computing and development platform for NVIDIA GPU’s. CUDA provides a valuable abstraction that allows the organization of threads in a two-level hierarchy; at its most basic level, threads are grouped in blocks, while at a higher level these are organized like a grid. A block can be conceived as a three-dimensional array of threads, while a grid is a two-dimensional array of blocks. Threads within a block shares local memory and registers and can be scheduled on any of the available processor cores, either concurrently or sequentially; this allows a compiled CUDA program to run on different CUDA capable devices [2

2. NVIDIA, NVIDIA CUDA C Programming Guide (version 3.2), http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/ (2010).

].

A kernel is a c-like function that is executed in parallel in the GPU. Each thread in a block executes the same kernel, and is given two unique indexes that together identify the thread. These indexes are accessible within the kernel as two variables: threadId and blockId, a 3-component and 2-component vectors that identify respectively the thread within the block and the grid. Each kernel can determine the portion of data it should process using these two indexes. The programming model considers the GPU as an external device, and it distinguishes explicitly the code executed sequentially on the host computer and the one executed in parallel on the GPU (kernels). Therefore, CUDA programs run on at least two devices, the host computer and one (or more) GPU. The memory space of the GPU device and the host computer are different and, as a result, a program needs to transfer data between host and the device.

The input of the algorithm to estimate the dynamic phase map is a sequence of images captured from the vibrating surface where a fringe pattern has been projected; the algorithm output is an estimation of the fringe pattern phase. The algorithm has two well defined stages: application of the quadrature transform to a frame and estimation of the wrapped phase and the unwrapping of the temporal phase. The first stage involves first transforming the frame to the Fourier domain, applying a band-pass filter, and finally computing the quadrature transform and transform it back into the space domain. Applying the filter and computing the quadrature transform to the frame is done in a single step: each frame element is multiplied by its corresponding filter element, while the quadrature transform –as can be seen from Eq. (8), can be applied to the filtered frame by simply blocking half the spectral domain for which ω⃗ · q < 0, and doubling the other half [5

5. M. Servin, J. A. Quiroga, and J. L. Marroquin, “General n-dimensional quadrature transform and its application to interferogram demodulation,” J. Opt. Soc. Am. A 20, 924–934 (2003). [CrossRef]

]. The evaluation of the quadrature transform at every frame element in the Fourier domain is expressed in a linear equation as
Qn(gk(r))=1{ηFlt(q){Ik(q)}},η={0ωq<02otherwise
(11)
where Flt is a band-pass filter defined in the Fourier domain.

The second stage involves the calculation of the wrapped phase map using Eq. (9) and then performing the temporal phase unwrapping using Eq. (10). The computation of the unwrapped phase map at each step depends only on the current and last wrapped phase map calculated, plus the accumulated phase map φm(r) over the last k – 1 algorithm iterations.

A very important feature of this procedure is that since every element is processed independently, the main two stages of the algorithm can be easily parallelized. This allows the concurrent kernel execution on several GPU threads, thus speeding up the computation process.

Algorithm 1 Retrieval of the dynamic phase ϕ k

The algorithm 1 shows an outline of the method proposed. Each algorithm step is preceded by a label indicating where is executed. The labels M, H, and G indicate respectively memory transfer between the host and the GPU, execution in the host, and execution in the GPU. Three different operations are executed on the GPU: the computation of the FFT, the execution of the quadrature transform, and the phase map kernels, labelled respectively as G_FFT, G_K1 and G_K2. Although the algorithm does not show explicit synchronization barriers, it is assumed that the host waits for the GPU to complete its task before proceeding with its execution.

4. Experimental results

The methodology proposed in this article was tested using both a sequential and a parallel implementation of the algorithm. The sequential implementation uses the FFTW library [11

11. M. Frigo and S. G. Johnson, “The design and implementation of FFTW3,” Proc. IEEE 93, 216–231 (2005). [CrossRef]

] to compute the discrete Fourier transform while the CUFFT [12

12. NVIDIA, NVIDIA CUDA CUFFT LIBRARY (PG-05327-032_V02), http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/ (2010).

] –the DFT CUDA native library–is used in the parallel implementation. The code of the sequential implementation was a hand-optimized version of the one used in Ref. [1

1. R. Legarda-Saenz, R. Rodriguez-Vera, and A. Espinosa-Romero, “Dynamic 3-D shape measurement method based on quadrature transform,” Opt. Express 18(3), 2639–2645 (2010). [CrossRef] [PubMed]

]. The algorithms were tested in four different platforms: a computer with 4 GB of memory, and Intel Core2 duo processor running at 2.53 GHz, a NVIDIA GeForce 9400M GPU (16 cores), a Tesla C1060 GPU (240 cores), and a TESLA C2050 GPU (448 cores).

The algorithm was tested processing a sequence of 402 640 × 64 frames captured from a vibrating cantilever [13

13. C. Meneses-Fabian, R. Rodriguez-Vera, J. A. Rayas, F. Mendoza-Santoyo, and G. Rodriguez-Zurita, “Surface contour from a low-frequency vibrating object using phase differences and the Fourier-transform method,” Opt. Commun. 272(2), 310–313 (2007). [CrossRef]

]. The entire sequence was stored in the computer memory while the execution of the programs took place. The implementation used single-precision floating point numbers, in order to make the results comparable, given that not all the GPU used to test the algorithm supported double-precision numbers. Even though we expected some degradation on the results for the lack of precision, the phase maps estimated were consistent with the ones reported in Ref. [1

1. R. Legarda-Saenz, R. Rodriguez-Vera, and A. Espinosa-Romero, “Dynamic 3-D shape measurement method based on quadrature transform,” Opt. Express 18(3), 2639–2645 (2010). [CrossRef] [PubMed]

]. The dynamic phase map computed was used to generate a real-time 3D animation of the vibrating cantilever with the fringe pattern projected on it; two frames of that animation are shown in Fig. 1; an animation of the complete sequence can be found here ( Media 1).

Fig. 1 3D representations of the phase map with the projected fringe pattern on it.

Table 1 shows the execution times of the Fourier transform, the computation of the quadrature transform, the estimation of the phase map and the data transfer time data between devices. Table 1 also shows the total time needed to compute the whole sequence and the achieved speed in frames per second (FPS). In the table, we observe a linear acceleration as the number of GPU cores increases; the most time consuming task in the algorithm is the memory transfer between the host and the GPU, and the task that did benefit the most from the parallelization was the dynamic phase map estimation stage. There is no improvement in the time needed to compute the FFT, which can be explained by the relatively small frame size used in this experiment (almost one eighth of a VGA frame).

Table 1. The Execution Time of Algorithm 1 on Different Computing Platforms

table-icon
View This Table

5. Discussions and conclusions

In this paper, we propose a parallel implementation of a temporal phase-unwrapping and Fourier-based quadrature transform method for real-time dynamic phase map estimation. We tested the algorithm on a sequence of frames of a vibrating surface, generating a 3D visualization of the dynamic phase map. The speed achieved is fast enough to perform the dynamic phase map estimation on real-time from images captured on high-speed cameras; the proposed algorithm would allow to perform real-time analysis on mobile devices –the GeForce-9400M is a graphic processing unit on a consumer-grade laptop computer.

Acknowledgments

R. Legarda-Saenz was supported by Consejo Nacional de Ciencia y Tecnología (México) under grants 25793-CB-2005-01-49753 and 104870-SNI08.

References and links

1.

R. Legarda-Saenz, R. Rodriguez-Vera, and A. Espinosa-Romero, “Dynamic 3-D shape measurement method based on quadrature transform,” Opt. Express 18(3), 2639–2645 (2010). [CrossRef] [PubMed]

2.

NVIDIA, NVIDIA CUDA C Programming Guide (version 3.2), http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/ (2010).

3.

K. J. Gåsvik, Optical Metrology, 3rd ed. (John Wiley & Sons Ltd., 2002). [CrossRef]

4.

K. G. Larkin, D. J. Bone, and M. A. Oldfield, “Natural demodulation of two-dimensional fringe patterns. I. General background of the spiral phase quadrature transform,” J. Opt. Soc. Am. A 18, 1862–1870 (2001). [CrossRef]

5.

M. Servin, J. A. Quiroga, and J. L. Marroquin, “General n-dimensional quadrature transform and its application to interferogram demodulation,” J. Opt. Soc. Am. A 20, 924–934 (2003). [CrossRef]

6.

D. Ghigila and M. D. Pritt, Two-Dimensional Phase Unwrapping. Theory, Algorithms, and Software (John Wiley & Sons Ltd., 1998).

7.

T. Kreis, Holographic Interferometry: Principles and Methods (Akademie Verlag, 1996).

8.

J. M. Huntley and H. Saldner, “Temporal phase-unwrapping algorithm for automated interferogram analysis,” Appl. Opt. 32, 3047–3052 (1993). [CrossRef] [PubMed]

9.

S. De Nicola and P. Ferraro, “Fourier transform method of fringe analysis for moire interferometry,” J. Opt. A: Pure Appl. Opt. 2, 228–233 (2000). [CrossRef]

10.

J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips “GPU computing,” Proc. IEEE 96(5), 879–899 (2008). [CrossRef]

11.

M. Frigo and S. G. Johnson, “The design and implementation of FFTW3,” Proc. IEEE 93, 216–231 (2005). [CrossRef]

12.

NVIDIA, NVIDIA CUDA CUFFT LIBRARY (PG-05327-032_V02), http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/ (2010).

13.

C. Meneses-Fabian, R. Rodriguez-Vera, J. A. Rayas, F. Mendoza-Santoyo, and G. Rodriguez-Zurita, “Surface contour from a low-frequency vibrating object using phase differences and the Fourier-transform method,” Opt. Commun. 272(2), 310–313 (2007). [CrossRef]

OCIS Codes
(100.2650) Image processing : Fringe analysis
(100.5070) Image processing : Phase retrieval
(120.5050) Instrumentation, measurement, and metrology : Phase measurement

ToC Category:
Instrumentation, Measurement, and Metrology

History
Original Manuscript: January 5, 2011
Revised Manuscript: April 26, 2011
Manuscript Accepted: May 23, 2011
Published: June 8, 2011

Citation
Arturo Espinosa-Romero and Ricardo Legarda-Saenz, "GPU based real-time quadrature transform method for 3-D surface measurement and visualization," Opt. Express 19, 12125-12130 (2011)
http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-19-13-12125


Sort:  Author  |  Year  |  Journal  |  Reset  

References

  1. R. Legarda-Saenz, R. Rodriguez-Vera, and A. Espinosa-Romero, “Dynamic 3-D shape measurement method based on quadrature transform,” Opt. Express 18(3), 2639–2645 (2010). [CrossRef] [PubMed]
  2. NVIDIA, NVIDIA CUDA C Programming Guide (version 3.2), http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/ (2010).
  3. K. J. Gåsvik, Optical Metrology , 3rd ed. (John Wiley & Sons Ltd., 2002). [CrossRef]
  4. K. G. Larkin, D. J. Bone, and M. A. Oldfield, “Natural demodulation of two-dimensional fringe patterns. I. General background of the spiral phase quadrature transform,” J. Opt. Soc. Am. A 18, 1862–1870 (2001). [CrossRef]
  5. M. Servin, J. A. Quiroga, and J. L. Marroquin, “General n-dimensional quadrature transform and its application to interferogram demodulation,” J. Opt. Soc. Am. A 20, 924–934 (2003). [CrossRef]
  6. D. Ghigila and M. D. Pritt, Two-Dimensional Phase Unwrapping. Theory, Algorithms, and Software (John Wiley & Sons Ltd., 1998).
  7. T. Kreis, Holographic Interferometry: Principles and Methods (Akademie Verlag, 1996).
  8. J. M. Huntley and H. Saldner, “Temporal phase-unwrapping algorithm for automated interferogram analysis,” Appl. Opt. 32, 3047–3052 (1993). [CrossRef] [PubMed]
  9. S. De Nicola and P. Ferraro, “Fourier transform method of fringe analysis for moire interferometry,” J. Opt. A: Pure Appl. Opt. 2, 228–233 (2000). [CrossRef]
  10. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips “GPU computing,” Proc. IEEE 96(5), 879–899 (2008). [CrossRef]
  11. M. Frigo and S. G. Johnson, “The design and implementation of FFTW3,” Proc. IEEE 93, 216–231 (2005). [CrossRef]
  12. NVIDIA, NVIDIA CUDA CUFFT LIBRARY (PG-05327-032_V02), http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/ (2010).
  13. C. Meneses-Fabian, R. Rodriguez-Vera, J. A. Rayas, F. Mendoza-Santoyo, and G. Rodriguez-Zurita, “Surface contour from a low-frequency vibrating object using phase differences and the Fourier-transform method,” Opt. Commun. 272(2), 310–313 (2007). [CrossRef]

Cited By

Alert me when this paper is cited

OSA is able to provide readers links to articles that cite this paper by participating in CrossRef's Cited-By Linking service. CrossRef includes content from more than 3000 publishers and societies. In addition to listing OSA journal articles that cite this paper, citing articles from other participating publishers will also be listed.

Figures

Algorithm 1 Fig. 1
 

Supplementary Material


» Media 1: MPG (2229 KB)     

« Previous Article  |  Next Article »

OSA is a member of CrossRef.

CrossCheck Deposited