### Abstract

In this article, we propose a massively parallel, real-time algorithm for estimating the dynamic phase map of a vibrating object. The algorithm implements a Fourier-based quadrature transform and a temporal phase unwrapping technique. CUDA, a graphics processing unit (GPU) programming architecture, was used to implement the algorithm. It was tested on a fringe pattern sequence using three devices with different capabilities, achieving a processing rate greater than 1600 frames per second (fps).

## 1. Introduction

## 2. Theoretical development

Here $\mathbf{r} = (x_1, x_2, \cdots, x_n)$ is an $n$-dimensional position vector, $a(\mathbf{r})$ is the background illumination and $b(\mathbf{r}, t)$ is the amplitude modulation. It should be noted that $a(\mathbf{r})$ is considered to remain constant throughout the experiment.

$N$ frames are captured from the dynamic movement of the object, where $\Delta t$ is the temporal sampling period of the captured frames and is smaller than the temporal period of the vibration cycle. Under this condition each frame can be described by the sampled quantities $I_k(\mathbf{r}) = I(\mathbf{r}, t_o + k\Delta t)$, $b_k(\mathbf{r}) = b(\mathbf{r}, t_o + k\Delta t)$ and $\psi_k(\mathbf{r}) = \psi(\mathbf{r}, t_o + k\Delta t)$ for $k = 0, 1, \cdots, N - 1$.
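For orientation, sampled frames of this kind are usually written with the standard fringe-pattern model. The form below is an assumption inferred from the definitions above (constant background $a$, sampled modulation $b_k$ and sampled phase $\psi_k$); it is not a transcription of the paper's own equation.

$$
I_k(\mathbf{r}) = a(\mathbf{r}) + b_k(\mathbf{r}) \cos\left[\psi_k(\mathbf{r})\right], \qquad k = 0, 1, \cdots, N - 1 .
$$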

### 2.1. The general n-dimensional quadrature transform

When $a(\mathbf{r})$ is filtered from the fringe pattern defined in Eq. (2), a fringe pattern without the background term results [7].

The $n$-dimensional quadrature transform for fringe patterns with carrier frequency is defined as [5

$\mathbf{q} = \{u_1, u_2, \ldots, u_n\}$ is the $n$-dimensional position vector in the frequency domain and $\mathcal{F}\{\cdot\}$ denotes the Fourier transform. Using Eq. (6), the complex fringe pattern of Eq. (5) can be written in a form that, in turn, reduces to a simpler expression.
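As a rough illustration of the Fourier-based idea, the one-dimensional Python sketch below builds a complex fringe pattern by discarding the negative-frequency half of the spectrum and doubling the positive half. It is a minimal sketch under assumptions: the helper names `dft` and `complex_fringe`, the naive $O(n^2)$ transform and the treatment of the zero and Nyquist bins are illustrative choices, not the paper's implementation.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, kept tiny for clarity."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def complex_fringe(signal):
    """1-D analogue: zero the negative frequencies, double the positive ones."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        if 0 < k < n / 2:
            spectrum[k] *= 2      # keep and double the positive half
        elif k > n / 2:
            spectrum[k] = 0       # discard the negative half
        # assumption: the zero and Nyquist bins are left unchanged
    return dft(spectrum, inverse=True)

# Synthetic background-free fringe: carrier of 5 cycles plus a constant phase.
n, carrier, phi = 64, 5, 0.7
fringe = [math.cos(2 * math.pi * carrier * m / n + phi) for m in range(n)]
analytic = complex_fringe(fringe)
wrapped_phase = [cmath.phase(z) for z in analytic]
```

For this synthetic input the modulus of `analytic[m]` is 1 and its argument is the wrapped value of $2\pi \cdot 5m/64 + 0.7$, up to rounding.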

### 2.2. Temporal phase unwrapping

$W\{\cdot\}$ is called the wrapping operator [6].
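One common closed form for the wrapping operator maps any angle to its principal value. It is stated here as an assumption about the convention, not as the definition used in [6].

$$
W\{\theta\} = \operatorname{atan2}\left(\sin\theta, \cos\theta\right) \in (-\pi, \pi] .
$$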

$\varphi_M(\mathbf{r})$ can be computed by the sum of the $M - 1$ phase differences using the following equation [8
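The summation of wrapped phase differences can be sketched for a single pixel as follows. This is a minimal Python sketch, assuming that the phase changes by less than $\pi$ between consecutive frames and that the result is measured relative to the first frame; the function names `wrap` and `temporal_unwrap` are hypothetical.

```python
import math

def wrap(theta):
    """Wrapping operator W{.}: reduce an angle to its principal value."""
    return math.atan2(math.sin(theta), math.cos(theta))

def temporal_unwrap(wrapped):
    """Accumulate wrapped frame-to-frame differences at a single pixel."""
    total, history = 0.0, [0.0]
    for prev, curr in zip(wrapped, wrapped[1:]):
        total += wrap(curr - prev)    # one wrapped phase difference
        history.append(total)         # running sum, relative to frame 0
    return history

# A phase that grows well beyond pi, observed only through its wrapped values.
true_phase = [0.9 * k for k in range(12)]
wrapped = [wrap(p) for p in true_phase]
recovered = temporal_unwrap(wrapped)
```

Because each step of 0.9 rad is below $\pi$, `recovered` matches `true_phase` up to rounding even though every entry of `wrapped` lies in the principal interval.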

## 3. Parallel implementation of the quadrature transform and phase unwrapping on CUDA

*threadId* and *blockId*, a 3-component and a 2-component vector that identify, respectively, the thread within the block and the block within the grid. Each kernel can determine the portion of data it should process using these two indices. The programming model treats the GPU as an external device and explicitly distinguishes the code executed sequentially on the host computer from the code executed in parallel on the GPU (kernels). Therefore, CUDA programs run on at least two devices: the host computer and one or more GPUs. The memory spaces of the GPU device and the host computer are separate and, as a result, a program needs to transfer data between the host and the device.
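The way a thread derives its portion of the data from these two indices can be emulated on the CPU. The Python sketch below reproduces only the index arithmetic for a two-dimensional image; in CUDA C the same expression is normally written with the built-in variables `blockIdx`, `blockDim` and `threadIdx`. The function names, the 16 x 16 block shape and the image size are assumptions chosen for illustration.

```python
def pixel_for_thread(block_id, thread_id, block_dim, width, height):
    """Map (blockId, threadId) to a pixel, or None if outside the image."""
    x = block_id[0] * block_dim[0] + thread_id[0]
    y = block_id[1] * block_dim[1] + thread_id[1]
    return (x, y) if x < width and y < height else None

def grid_size(width, height, block_dim):
    """Blocks per axis, rounded up so that every pixel is covered."""
    return (-(-width // block_dim[0]), -(-height // block_dim[1]))

width, height, block_dim = 100, 60, (16, 16)
grid = grid_size(width, height, block_dim)

# Emulate the launch: on a GPU these loop bodies run concurrently.
covered = set()
for by in range(grid[1]):
    for bx in range(grid[0]):
        for ty in range(block_dim[1]):
            for tx in range(block_dim[0]):
                pixel = pixel_for_thread((bx, by), (tx, ty), block_dim, width, height)
                if pixel is not None:      # guard for partially filled blocks
                    covered.add(pixel)
```

The bounds check matters because the grid is rounded up: a 100 x 60 image needs 7 x 4 blocks of 16 x 16 threads, so some threads fall outside the image and must do nothing.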

$\vec{\omega} \cdot \mathbf{q} < 0$, and doubling the other half [5

*Flt* is a band-pass filter defined in the Fourier domain.
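The half-spectrum rule and the band-pass filter can be combined into a single per-frequency weight, which is the kind of work one GPU thread would do for one frequency sample. The Python sketch below is a CPU reference under assumptions: the spectrum is stored in standard unshifted FFT order, samples with $\vec{\omega} \cdot \mathbf{q} = 0$ are left unscaled, and `ring` is a hypothetical stand-in for *Flt*.

```python
def fft_frequency(index, size):
    """Signed frequency of a bin in standard (unshifted) FFT ordering."""
    return index if index < (size + 1) // 2 else index - size

def quadrature_weight(u, v, omega, band_pass=None):
    """Weight applied to the spectrum sample at frequency q = (u, v)."""
    projection = omega[0] * u + omega[1] * v
    if projection < 0:
        return 0.0                               # eliminated half
    gain = 2.0 if projection > 0 else 1.0        # assumption: boundary unscaled
    return gain * (band_pass(u, v) if band_pass else 1.0)

def weight_mask(width, height, omega, band_pass=None):
    """Mask indexed as mask[row][column] for a width x height spectrum."""
    return [[quadrature_weight(fft_frequency(i, width), fft_frequency(j, height),
                               omega, band_pass)
             for i in range(width)] for j in range(height)]

def ring(u, v):
    """Hypothetical band-pass Flt: keep radial frequencies between 1 and 3."""
    return 1.0 if 1.0 <= (u * u + v * v) ** 0.5 <= 3.0 else 0.0

mask = weight_mask(8, 8, omega=(1.0, 0.0), band_pass=ring)
```

Multiplying the Fourier transform of a frame by such a mask and transforming back yields a complex fringe pattern whose argument is the wrapped phase.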

$\varphi_m(\mathbf{r})$ over the last $k - 1$ algorithm iterations.
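One way to accumulate the phase map across iterations is to update it in place whenever a new complex frame arrives. The Python sketch below assumes that the wrapped phase step is taken as the argument of the current complex value times the conjugate of the previous one, which is equivalent to wrapping the phase difference; whether the paper's kernel is formulated this way is not confirmed by the text. The name `update_phase_map` and the test data are hypothetical.

```python
import cmath

def update_phase_map(phase_map, previous, current):
    """One iteration: add each pixel's wrapped phase step, in place."""
    for i, (z_prev, z_curr) in enumerate(zip(previous, current)):
        # arg(z_curr * conj(z_prev)) is the wrapped phase difference
        phase_map[i] += cmath.phase(z_curr * z_prev.conjugate())
    return phase_map

# Four pixels whose phase grows at different rates (rad per frame, all < pi).
rates = [0.2, 0.5, 1.0, 2.5]
frames = [[cmath.exp(1j * rate * k) for rate in rates] for k in range(10)]

phase_map = [0.0] * len(rates)
for previous, current in zip(frames, frames[1:]):
    update_phase_map(phase_map, previous, current)
```

Each pixel is independent of the others, so on a GPU the loop body maps naturally to one thread per pixel, and only the previous complex frame and the running phase map need to stay in device memory.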

$M$, $H$, and $G$ indicate, respectively, memory transfer between the host and the GPU, execution on the host, and execution on the GPU. Three different operations are executed on the GPU: the computation of the FFT, the quadrature transform kernel, and the phase map kernel, labelled respectively as *G_FFT*, *G_K1* and *G_K2*. Although the algorithm does not show explicit synchronization barriers, it is assumed that the host waits for the GPU to complete its task before proceeding with its execution.

## 4. Experimental results

## 5. Discussions and conclusions

## Acknowledgments

