## Acceleration of computation of φ-polynomials |

Optics Express, Vol. 21, Issue 23, pp. 29065-29072 (2013)

http://dx.doi.org/10.1364/OE.21.029065

Acrobat PDF (1084 KB)

### Abstract

The benefits of making an effective use of impressive computational power offered by multi-core platforms are investigated for the computation of φ-polynomials used in the description of freeform surfaces. Specifically, we devise parallel algorithms based upon the recurrence relations of both Zernike polynomials and gradient orthogonal Q-polynomials and implement these parallel algorithms on Graphical Processing Units (GPUs) respectively. The results show that more than an order of magnitude improvement is achieved in computational time over a sequential implementation if these recurrence-based parallel algorithms are adopted in the computation of the φ-polynomials.

© 2013 Optical Society of America

## 1. Introduction

1. K. Fuerschbach, J. P. Rolland, and K. P. Thompson, “A new family of optical systems employing φ-polynomial surfaces,” Opt. Express **19**(22), 21919–21928 (2011). [CrossRef] [PubMed]

3. O. Cakmakci, K. Thompson, P. Vallee, J. Cote, and J. P. Rolland, “Design of a freeform single-element head-worn display,” Proc. SPIE **7618**, 761803 (2010). [CrossRef]

4. J. C. Miñano, P. Benitez, and A. Santamaria, “Freeform optics for illumination,” Opt. Rev. **16**(2), 99–102 (2009). [CrossRef]

5. F. Zernike, “Beugungstheorie des schneidenver-fahrens und seiner verbesserten form, der phasenkontrastmethode,” Physica **1**(7–12), 689–704 (1934). [CrossRef]

7. G. W. Forbes, “Characterizing the shape of freeform optics,” Opt. Express **20**(3), 2483–2499 (2012). [CrossRef] [PubMed]

8. O. Cakmakci, B. Moore, H. Foroosh, and J. P. Rolland, “Optimal local shape description for rotationally non-symmetric optical surface design and analysis,” Opt. Express **16**(3), 1583–1589 (2008). [CrossRef] [PubMed]

10. R. W. Gray, C. Dunn, K. P. Thompson, and J. P. Rolland, “An analytic expression for the field dependence of Zernike polynomials in rotationally symmetric optical systems,” Opt. Express **20**(15), 16436–16449 (2012). [CrossRef]

11. I. Kaya, K. P. Thompson, and J. P. Rolland, “Comparative assessment of freeform polynomials as optical surface descriptions,” Opt. Express **20**(20), 22683–22691 (2012). [CrossRef] [PubMed]

12. G. W. Forbes, “Fitting freeform shapes with orthogonal bases,” Opt. Express **21**(16), 19061–19081 (2013). [CrossRef] [PubMed]

11. I. Kaya, K. P. Thompson, and J. P. Rolland, “Comparative assessment of freeform polynomials as optical surface descriptions,” Opt. Express **20**(20), 22683–22691 (2012). [CrossRef] [PubMed]

12. G. W. Forbes, “Fitting freeform shapes with orthogonal bases,” Opt. Express **21**(16), 19061–19081 (2013). [CrossRef] [PubMed]

13. C. L. Phillips, J. A. Anderson, and S. C. Glotzer, “Pseudo-random number generation for Brownian dynamics and dissipative particle dynamics simulations on GPU devices,” J. Comput. Phys. **230**(19), 7191–7201 (2011). [CrossRef]

14. Y. Jian, K. Wong, and M. V. Sarunic, “Graphics processing unit accelerated optical coherence tomography processing at megahertz axial scan rate and high resolution video rate volumetric rendering,” J. Biomed. Opt. **18**(2), 026002 (2013). [CrossRef] [PubMed]

15. S. Liu, P. Li, and Q. Luo, “Fast blood flow visualization of high-resolution laser speckle imaging data using graphics processing unit,” Opt. Express **16**(19), 14321–14329 (2008). [CrossRef] [PubMed]

## 2. Recurrence relations of φ-polynomials and their parallelization

7. G. W. Forbes, “Characterizing the shape of freeform optics,” Opt. Express **20**(3), 2483–2499 (2012). [CrossRef] [PubMed]

7. G. W. Forbes, “Characterizing the shape of freeform optics,” Opt. Express **20**(3), 2483–2499 (2012). [CrossRef] [PubMed]

**20**(3), 2483–2499 (2012). [CrossRef] [PubMed]

*2n + m*. The

**20**(3), 2483–2499 (2012). [CrossRef] [PubMed]

*(n-m)/2th*execution of the recurrence relation are computed together at once:

*for each thread*

*get the local id corresponding to the k*

^{th}recurrence run,*compute the a[k], b[k], c[k] locally (see*

*Eq. (5)*)

*store them*

*end*

*k*. For example, the computation of

*(r,φ).*Hence once all the threads return, the computation of the Zernike polynomial,

*for each thread*

*get the local sample ray point, (r,φ), to operate on.*

*create a local data cache [*

*3*

3. O. Cakmakci, K. Thompson, P. Vallee, J. Cote, and J. P. Rolland, “Design of a freeform single-element head-worn display,” Proc. SPIE **7618**, 761803 (2010). [CrossRef]

*].*

*store in cache[0]the first Jacobi, P*

_{0}, in cache [*1*

1. K. Fuerschbach, J. P. Rolland, and K. P. Thompson, “A new family of optical systems employing φ-polynomial surfaces,” Opt. Express **19**(22), 21919–21928 (2011). [CrossRef] [PubMed]

*] the second Jacobi, P*

_{1}at (r,φ).*while (recurrence exec num < (n-m)/2)*

*swap cache[0], cache [*

*1*

1. K. Fuerschbach, J. P. Rolland, and K. P. Thompson, “A new family of optical systems employing φ-polynomial surfaces,” Opt. Express **19**(22), 21919–21928 (2011). [CrossRef] [PubMed]

*], swap cache [*

*1*

**19**(22), 21919–21928 (2011). [CrossRef] [PubMed]

*], cache [*

*2*

*].*

*recurrence exec number + + .*

*end*

*compute power, r^m*

*compute sine/cosine (mφ).*

*store the result*

*end*

*kernels*completely executed on the GPU device. Both algorithms are implemented with CUDA C, and executed as kernels within hundreds of threads in each thread block on GPU.

**20**(3), 2483–2499 (2012). [CrossRef] [PubMed]

*f*and

*g*computation shown in Appendix A of [7

**20**(3), 2483–2499 (2012). [CrossRef] [PubMed]

*for each thread*

*get the local id corresponding to the k*

^{th}recurrence run,*compute the a[k], b[k], c[k], d[k] for the auxiliary recurrence (see Eq. (A.3) of [*

*7*

**20**(3), 2483–2499 (2012). [CrossRef] [PubMed]

*])*

*compute F[k], G[k] for Q-poly recurrence (see Eqs. (A.13) and (A.16) of [*

*7*

**20**(3), 2483–2499 (2012). [CrossRef] [PubMed]

*])*

*store them*

*synchronize threads*

*thread 0: compute f[k] and g[k] (see Eq. (A.18) of [*

*7*

**20**(3), 2483–2499 (2012). [CrossRef] [PubMed]

*])*

*end*

*for each thread*

*get the local sample ray point, (r,φ) to operate on.*

*create an auxiliary auxcache [*

*3*

3. O. Cakmakci, K. Thompson, P. Vallee, J. Cote, and J. P. Rolland, “Design of a freeform single-element head-worn display,” Proc. SPIE **7618**, 761803 (2010). [CrossRef]

*] and a data cache [*

*2*

*]*

*store auxcache[0] the first auxiliary, A*

_{0}, auxcache [*1*

**19**(22), 21919–21928 (2011). [CrossRef] [PubMed]

*] the second auxiliary, A*

_{1}at (r,φ).*store cache[0] the first gradient orthogonal Q-poly, Q*

_{0}at (r,φ).*while (recurrence exec num < n)*

*run gradient orthogonal Q-poly recurrence, store it cache [*

*1*

**19**(22), 21919–21928 (2011). [CrossRef] [PubMed]

*], (see*

*Eq. (6)*

*).*

*swap auxcache[0], auxcache [*

*1*

**19**(22), 21919–21928 (2011). [CrossRef] [PubMed]

*], swap auxcache [*

*1*

**19**(22), 21919–21928 (2011). [CrossRef] [PubMed]

*], auxcache [*

*2*

*].*

*swap cache[0], cache [*

*1*

**19**(22), 21919–21928 (2011). [CrossRef] [PubMed]

*].*

*recurrence exec number + + .*

*end*

*compute power, r^m*

*compute sine/cosine (mφ).*

*store the result*

*end*

## 3. Numerical results of SIMT parallelization of φ-polynomials

^{®}and CUDA C programming languages [18] to implement and run the parallel and sequential algorithms for the φ-polynomials. The MATLAB 2013a implements the state of the art numerical algorithms for vector addition and multiplication. All the implementations are run on a middle ranking laptop computer with a GeForce GT 650M GPU and a CPU of Intel

^{®}Core i7-3610QM.

*speedup*and this parameter increases as the ray grid size increases. Figure 2 shows the total execution times of the sequential and parallel algorithms on CPU and GPU and corresponding speedups of the φ-polynomials with respect to the ray grid size.

*speedup*increases with the total number of ray samples and grows significantly in average as the number of rays quadruples across the aperture.

^{®}Core i7 3610QM consists of 4 cores running 8 concurrent threads with hyper-threading [20

20. Intel, “Intel core 7 processor specifications” (Intel, 2013), http://www.intel.com/content/www/us/en/processors/core/core-i7-processor/Corei7Specifications.html.

## 4. Conclusion

## Acknowledgments

## References and links

1. | K. Fuerschbach, J. P. Rolland, and K. P. Thompson, “A new family of optical systems employing φ-polynomial surfaces,” Opt. Express |

2. | O. Cakmakci, S. Vo, K. P. Thompson, and J. P. Rolland, “Application of radial basis functions to shape description in a dual-element off-axis eyewear display: Field-of-view limit,” SID |

3. | O. Cakmakci, K. Thompson, P. Vallee, J. Cote, and J. P. Rolland, “Design of a freeform single-element head-worn display,” Proc. SPIE |

4. | J. C. Miñano, P. Benitez, and A. Santamaria, “Freeform optics for illumination,” Opt. Rev. |

5. | F. Zernike, “Beugungstheorie des schneidenver-fahrens und seiner verbesserten form, der phasenkontrastmethode,” Physica |

6. | M. Born and E. Wolf, |

7. | G. W. Forbes, “Characterizing the shape of freeform optics,” Opt. Express |

8. | O. Cakmakci, B. Moore, H. Foroosh, and J. P. Rolland, “Optimal local shape description for rotationally non-symmetric optical surface design and analysis,” Opt. Express |

9. | I. Kaya and J. P. Rolland, “Hybrid RBF and local phi-polynomial freeform surfaces,” Adv. Opt. Technol. |

10. | R. W. Gray, C. Dunn, K. P. Thompson, and J. P. Rolland, “An analytic expression for the field dependence of Zernike polynomials in rotationally symmetric optical systems,” Opt. Express |

11. | I. Kaya, K. P. Thompson, and J. P. Rolland, “Comparative assessment of freeform polynomials as optical surface descriptions,” Opt. Express |

12. | G. W. Forbes, “Fitting freeform shapes with orthogonal bases,” Opt. Express |

13. | C. L. Phillips, J. A. Anderson, and S. C. Glotzer, “Pseudo-random number generation for Brownian dynamics and dissipative particle dynamics simulations on GPU devices,” J. Comput. Phys. |

14. | Y. Jian, K. Wong, and M. V. Sarunic, “Graphics processing unit accelerated optical coherence tomography processing at megahertz axial scan rate and high resolution video rate volumetric rendering,” J. Biomed. Opt. |

15. | S. Liu, P. Li, and Q. Luo, “Fast blood flow visualization of high-resolution laser speckle imaging data using graphics processing unit,” Opt. Express |

16. | G. W. Forbes, “Robust and fast computation for the polynomials of optics,” Opt. Express |

17. | M. Abramowitz and I. Stegun, |

18. | NVIDIA, |

19. | NVIDIA, |

20. | Intel, “Intel core 7 processor specifications” (Intel, 2013), http://www.intel.com/content/www/us/en/processors/core/core-i7-processor/Corei7Specifications.html. |

21. | C. W. Clenshaw, “A note on the summation of Chebyshev series,” Math. Tables Other Aids Comput. |

**OCIS Codes**

(200.4960) Optics in computing : Parallel processing

(220.0220) Optical design and fabrication : Optical design and fabrication

**ToC Category:**

Optical Design and Fabrication

**History**

Original Manuscript: September 4, 2013

Revised Manuscript: November 1, 2013

Manuscript Accepted: November 9, 2013

Published: November 15, 2013

**Citation**

Ilhan Kaya and Jannick Rolland, "Acceleration of computation of φ-polynomials," Opt. Express **21**, 29065-29072 (2013)

http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-21-23-29065

Sort: Year | Journal | Reset

### References

- K. Fuerschbach, J. P. Rolland, and K. P. Thompson, “A new family of optical systems employing φ-polynomial surfaces,” Opt. Express19(22), 21919–21928 (2011). [CrossRef] [PubMed]
- O. Cakmakci, S. Vo, K. P. Thompson, and J. P. Rolland, “Application of radial basis functions to shape description in a dual-element off-axis eyewear display: Field-of-view limit,” SID16, 1089–1098 (2008).
- O. Cakmakci, K. Thompson, P. Vallee, J. Cote, and J. P. Rolland, “Design of a freeform single-element head-worn display,” Proc. SPIE7618, 761803 (2010). [CrossRef]
- J. C. Miñano, P. Benitez, and A. Santamaria, “Freeform optics for illumination,” Opt. Rev.16(2), 99–102 (2009). [CrossRef]
- F. Zernike, “Beugungstheorie des schneidenver-fahrens und seiner verbesserten form, der phasenkontrastmethode,” Physica1(7–12), 689–704 (1934). [CrossRef]
- M. Born and E. Wolf, Principles of Optics (Cambridge University, 1999).
- G. W. Forbes, “Characterizing the shape of freeform optics,” Opt. Express20(3), 2483–2499 (2012). [CrossRef] [PubMed]
- O. Cakmakci, B. Moore, H. Foroosh, and J. P. Rolland, “Optimal local shape description for rotationally non-symmetric optical surface design and analysis,” Opt. Express16(3), 1583–1589 (2008). [CrossRef] [PubMed]
- I. Kaya and J. P. Rolland, “Hybrid RBF and local phi-polynomial freeform surfaces,” Adv. Opt. Technol.2(1), 81–88 (2013).
- R. W. Gray, C. Dunn, K. P. Thompson, and J. P. Rolland, “An analytic expression for the field dependence of Zernike polynomials in rotationally symmetric optical systems,” Opt. Express20(15), 16436–16449 (2012). [CrossRef]
- I. Kaya, K. P. Thompson, and J. P. Rolland, “Comparative assessment of freeform polynomials as optical surface descriptions,” Opt. Express20(20), 22683–22691 (2012). [CrossRef] [PubMed]
- G. W. Forbes, “Fitting freeform shapes with orthogonal bases,” Opt. Express21(16), 19061–19081 (2013). [CrossRef] [PubMed]
- C. L. Phillips, J. A. Anderson, and S. C. Glotzer, “Pseudo-random number generation for Brownian dynamics and dissipative particle dynamics simulations on GPU devices,” J. Comput. Phys.230(19), 7191–7201 (2011). [CrossRef]
- Y. Jian, K. Wong, and M. V. Sarunic, “Graphics processing unit accelerated optical coherence tomography processing at megahertz axial scan rate and high resolution video rate volumetric rendering,” J. Biomed. Opt.18(2), 026002 (2013). [CrossRef] [PubMed]
- S. Liu, P. Li, and Q. Luo, “Fast blood flow visualization of high-resolution laser speckle imaging data using graphics processing unit,” Opt. Express16(19), 14321–14329 (2008). [CrossRef] [PubMed]
- G. W. Forbes, “Robust and fast computation for the polynomials of optics,” Opt. Express18(13), 13851–13862 (2010). [CrossRef] [PubMed]
- M. Abramowitz and I. Stegun, Handbook of Mathematical Functions (Dover, 1972), Chap. 22.
- NVIDIA, CUDA C Programming Guide (NVIDIA, 2012).
- NVIDIA, CUDA C Best Practices Guide, (NVIDIA, 2012).
- Intel, “Intel core 7 processor specifications” (Intel, 2013), http://www.intel.com/content/www/us/en/processors/core/core-i7-processor/Corei7Specifications.html .
- C. W. Clenshaw, “A note on the summation of Chebyshev series,” Math. Tables Other Aids Comput.9, 118–120 (1995).

## Cited By |
Alert me when this paper is cited |

OSA is able to provide readers links to articles that cite this paper by participating in CrossRef's Cited-By Linking service. CrossRef includes content from more than 3000 publishers and societies. In addition to listing OSA journal articles that cite this paper, citing articles from other participating publishers will also be listed.

« Previous Article | Next Article »

OSA is a member of CrossRef.