OSA's Digital Library

Optics Express

Optics Express

  • Editor: Andrew M. Weiner
  • Vol. 21, Iss. 13 — Jul. 1, 2013
  • pp: 15815–15825
« Show journal navigation

Higher-order wide-angle split-step spectral method for non-paraxial beam propagation

Brett H. Hokr, C. D. Clark, III, Rachel E. Grotheer, and Robert J. Thomas  »View Author Affiliations


Optics Express, Vol. 21, Issue 13, pp. 15815-15825 (2013)
http://dx.doi.org/10.1364/OE.21.015815


View Full Text Article

Acrobat PDF (4812 KB)





Browse Journals / Lookup Meetings

Browse by Journal and Year


   


Lookup Conference Papers

Close Browse Journals / Lookup Meetings

Article Tools

Share
Citations

Abstract

We develop a higher-order method for non-paraxial beam propagation based on the wide-angle split-step spectral (WASSS) method previously reported [Clark and Thomas, Opt. Quantum. Electron., 41, 849 (2010)]. The higher-order WASSS (HOWASSS) method approximates the Helmholtz equation by keeping terms up to third-order in the propagation step size, in the Magnus expansion. A symmetric exponential operator splitting technique is used to simplify the resulting exponential operators. The HOWASSS method is applied to the problem of waveguide propagation, where an analytical solution is known, to demonstrate the performance and accuracy of the method. The performance enhancement gained by implementing the HOWASSS method on a graphics processing unit (GPU) is demonstrated. When highly accurate results are required the HOWASSS method is shown to be substantially faster than the WASSS method.

© 2013 OSA

1. Introduction

In a recent publication, we described a numerical beam propagation method that represents the beam profile in the basis of the eigenvectors of the Laplacian operator and uses a symmetric operator splitting technique to account for the refractive index variations, known as the wide-angle split-step spectroscopic (WASSS) method [6

6. C. D. Clark and R. Thomas, “Wide-angle split-step spectral method for 2D or 3D beam propagation,” Opt. Quantum. Electron. 41, 849–857 (2010) [CrossRef] .

]. In general, the method provided a two-fold speedup to the finite-difference method reported by Sharma [4

4. A. Sharma and A. Agrawal, “New method for nonparaxial beam propagation,” J. Opt. Soc. Am. B 21, 1082–1087 (2004) [CrossRef] .

] This improvement could be increased by use of a fast Fourier transform algorithm. Here we develop a higher-order WASSS (HOWASSS) method that extends the approximation to higher-order, providing a more efficient method when high accuracies are required. We apply the HOWASSS method to the problem of waveguide propagation to demonstrate the performance and accuracy of the method. An additional performance enhancement is obtained by implementing the method on a graphics processing unit (GPU) using compute unified device architecture (CUDA) technology from NVIDIA™.

2. Formulation

Beam propagation in a medium with a non-uniform refractive index is described by the scalar Helmholtz equation,
2z2ψ(z,r)+r2ψ(z,r)+k02n¯2ψ(z,r)+k02(n2(z,r)n¯2)ψ(z,r)=0,
(1)
where ψ(z, r) is the complex scalar electric field, z is a Cartesian coordinate in the direction of propagation, r are the transverse coordinates, and r2 is the transverse Laplace operator. As usual, k0 is the free space wavenumber, n(z, r) is the refractive index distribution, and 2 = min [n2 (z, x)]. For simplicity we will only consider the two-dimensional (2D) case, however generalization to three dimensions (3D) follows quite trivially. In two dimensions, the electric field is denoted ψ(z, x). Note that x is not necessarily a Cartesian coordinate but could, for example, be a radial coordinate. Expanding ψ(z, x) in terms of the eigenfunctions of the transverse Laplace operator we write
ψ(z,x)=iai(z)ϕi(x),
(2)
where
x2ϕi(x)=λi2ϕi(x)
(3)
ϕi*(x)ϕj(x)dx=δi,j.
(4)
Multiplying the 2D form of Eq. (1) by ϕj*(x), integrating over the transverse coordinate, and applying Eqs. (3) and (4) we are left with
2z2aj(z)=(k02n¯2λj2)aj(z)iai(z)k02(n2(z,x)n¯2)ϕj*(x)ϕi(x)dx.
(5)

To discretize the problem space, let Nx and Nz be the number of discrete grid points in x and z respectively, and let i, j ∈ [1, Nx], and l ∈ [1, Nz]. We let xi = iΔx + x0 and zl = lΔz+ z0, where Δx and Δz are the corresponding step sizes. The matrix N(z) is defined by
Ni,j(z)=k02(n2(z,xi)n¯2)δi,j.
(6)
The discrete eigentransform matrix, S[6

6. C. D. Clark and R. Thomas, “Wide-angle split-step spectral method for 2D or 3D beam propagation,” Opt. Quantum. Electron. 41, 849–857 (2010) [CrossRef] .

], transforms a vector into the spectral basis, while S−1 transforms vectors back into the spatial basis. In the case of Cartesian coordinates with hard boundary conditions this matrix becomes the discrete Fourier transform matrix, and in cylindrical coordinates with azimuthal symmetry and hard boundary conditions, S takes the form of a discrete Hankel transform [7

7. M. Guizar-Sicairos and J. C. Gutiérrez-Vega, “Computation of quasi-discrete Hankel transforms of integer order for propagating optical wave fields,” J. Opt. Soc. Am. A 21, 53–58 (2004) [CrossRef] .

]. We also must define the constant matrix M with elements
Mi,j=k02n¯2λi2δi,j.
(7)
Notice that if λi > k0 then Mi,j becomes imaginary. If we allow eigenvalues such that λi > k0, then the trigonometric functions appearing in P will become hyperbolic (see Eq. (25)), making the method numerically unstable. For this reason, we will limit Nx to satisfy λi < k0. However, we must include at least enough eigenfrequencies to sufficiently represent the initial conditions, otherwise large errors will be incurred. Using these definitions Eq. (5) becomes
2z2a(z)=M2a(z)SN(z)S1a(z).
(8)
Now if we define
A(z)=[a(z)M1za(z)]
(9)
H(z)=[0MMM1SN(z)S10],
(10)
we can write the Helmholtz equation as a first order vector differential equation
zA(z)=H(z)A(z).
(11)

The exact solution to Eq. (11) is given by a Magnus expansion [8

8. W. Magnus, “On the exponential solution of differential equations for a linear operator,” Comm. Pure Appl. Math. 7, 649–673 (1954) [CrossRef] .

],
A(z)=exp(Ω1(z,z0)+Ω2(z,z0)+)A(z0).
(12)
The first two terms of the expansion are
Ω1(z,z0)=z0zH(t)dt
(13)
Ω2(z,z0)=12z0zz0t1[H(t1),H(t2)]dt2dt1.
(14)
Here [H(t1), H(t2)] is the usual commutator. Note that Eq. (11) is essentially the time-dependent Schrödinger equation, in which context the Magnus expansion is more well-known as the time-ordered exponential operator [9

9. M. Bauer, R. Chetrite, K. Ebrahimi-Fard, and F. Patras, “Time-ordering and a generalized Magnus expansion,” Lett. Math. Phys. 103, 331–350 (2012) [CrossRef] .

]. However, we will refer to Eq. (12) as a Magnus expansion to be consistent with discussions concerning initial value problems of the form of Eq. (11).

3. Numerical example

To test our method we will simulate beam propagation through a two-dimensional (Cartesian) symmetric Epstein-layer waveguide tilted at an angle θ from the positive z axis [5

5. A. Sharma and A. Agrawal, “Non-paraxial split-step finite-difference method for beam propagation,” Opt. Quantum. Electron. 38, 19–34 (2006) [CrossRef] .

]. Hard boundary conditions where ψ(z, x0) = 0 and ψ(z, xf) = 0 are assumed. The eigenfunctions of the transverse Laplace operator are then given by
ϕi(x)=sin(iπ(xfx0)x).
(29)
The resulting eigentransform matrix is given by the discrete Fourier matrix
Si,j=2Nx+1sin(π(i+1)(j+1)Nx+1)=Si,j1.
(30)
This particular Fourier matrix was chosen because the hard boundary conditions at x0 and xf are automatically satisfied. We do not need to include these points in our grid, defined by Δx = (xfx0)/(Nx + 1) and Δz = (zfz0)/Nz. The refractive index profile is
n(z,x)=n¯2+2n¯(Δn)sech2(2(x˜cos(θ)zsin(θ))w),
(31)
where is a shifted coordinate to align the waveguide in the center of our computational region to make the hard boundary conditions as negligible as possible, Δn is the height of the refractive index shift, w is the width of the waveguide. The shifted coordinate is given by = x − (1/2)(xfx0) + (1/2)(zfz0) tan(θ). The initial electric field is given by the ze-roth order mode of the Epstein-layer waveguide
ψ(0,x)=sechW(2x˜cos(θ)w)exp(iK0x˜sin(θ)),
(32)
Here, i is the unit imaginary number, not to be confused with the counting index used elsewhere. W and K0 are given by
W=12(1+2w2k02n¯Δn1)
(33)
K0=(2Ww)2+(k0n¯)2.
(34)
The exact solution for this tilted Epstein-layer waveguide is [10

10. M. J. Adams, An Introduction to Optical Waveguides (Wiley, 1981).

]
ψe(z,x)=sechW(2(x˜cos(θ)zsin(θ))w)exp(iK0(x˜sin(θ)+zcos(θ))).
(35)
To measure the error we use the correlation factor,
Error(z)=|1(x0xfψ*(z,x)ψ(z,x)dx)2(x0xfψe*(z,x)ψe(z,x)dx)2|,
(36)
as this provides a measure for both the profile shape and amplitude of the beam [5

5. A. Sharma and A. Agrawal, “Non-paraxial split-step finite-difference method for beam propagation,” Opt. Quantum. Electron. 38, 19–34 (2006) [CrossRef] .

, 6

6. C. D. Clark and R. Thomas, “Wide-angle split-step spectral method for 2D or 3D beam propagation,” Opt. Quantum. Electron. 41, 849–857 (2010) [CrossRef] .

].

C and CUDA versions of both the WASSS and HOWASSS methods were implemented. The code was compiled using nvcc version 4.2 for CUDA code and gcc version 4.6.3 for the C code. The code was run on a workstation equipped with an Intel Xeon X5960 CPU, which is a hyper-threaded six-core CPU clocked at 3.47 GHz with 48 GB of RAM, and a NVIDIA QUADRO 6000 GPU, which has 448 CUDA cores at 574 MHz core clock speed (750 MHz memory clock speed) and 6 GB of dedicated GPU DDR5 RAM.

Table 1. Parameters used in the numerical calculations

table-icon
View This Table
Fig. 1 Error as a function of propagation distance for an aligned waveguide with Nx = 1000 for the (a) WASSS and (b) HOWASSS methods. Note that we have sampled the error every 0.5 mm, and not at every z step calculated, because the rapid oscillations would make the graph difficult to read otherwise.
Fig. 2 Error as a function of propagation distance for a waveguide rotated 50 degrees with Nx = 1000 for the (a) WASSS and (b) HOWASSS methods.
Fig. 3 Plot of |ψ(z, x)| with Nx = 1000, Nz = 2000, and θ = 50° for the (a) HOWASSS and (b) exact solutions for a waveguide tilted at 50 degrees.

4. Discussion

4.1. Stability

By virtue of being a higher order method, the HOWASSS method has improved stability over the WASSS method (see Fig. 4). As we increase the tilt of the waveguide, the WASSS method performs more poorly as the angle increases, and at large step sizes even shows signs of being a bit unstable. However, the HOWASSS method actually becomes more accurate at larger angles. Traditionally, beam propagation methods struggle to obtain accurate results when there are rapid changes in the index of refraction. To simulate a rapidly changing index of refraction we increase the change in the index between the waveguide and the surrounding medium, Δn, while keeping the width of the waveguide fixed. We see that at a 50° waveguide tilt the WASSS method actually becomes unstable with a large stepsize, while the HOWASSS is able to remain stable for at least an additional order of magnitude increase in Δn. However, both methods do loose accuracy as the change in the refractive index becomes steeper, but by decreasing Δz accurate results can still be obtained in a reasonable run time. This same analysis was done for a 0° tilted waveguide and the results were quite similar and so are not shown here.

Fig. 4 (a) The maximum error obtained as a function of waveguide tilt angle showing that the HOWASSS method actually gains a small amount of accuracy at larger angles. (b) The maximum error obtained as a function of waveguide depth, Δn.

4.2. Speed

Computationally, the HOWASSS method requires the multiplication of matrices with vectors. If we restrict Nx so that the matrix M is real then, for the example in Section 3, all the matrices will be real, however the vector A will be complex. In general, the eigenfunctions and the refractive index could be complex. However, for simplicity we will assume these to be real.

Diagonal matrix multiplication is equivalent to element-by-element vector multiplication. Hence, multiplying an Nx × Nx diagonal matrix by an Nx element complex vector requires 2Nx operations. Multiplying an Nx × Nx dense matrix by an Nx element complex vector is equivalent to performing 2Nx dot products, each requiring Nx multiplications and Nx − 1 additions, for a total of 2Nx(2Nx1)=4Nx22Nx operations. Applying P requires a total of 4 diagonal matrix-vector multiplications for a total of 8Nx operations. Applying Q (zl+1) Q (zl) requires 2 diagonal matrix-vector multiplications, 2 dense matrix-vector multiplications, 2 vector-vector additions, and 1 scalar-vector multiplication for a total of 8Nx2+6Nx=2Nx(4Nx3) operations. Applying C(zl+1, zl) requires 4 dense matrix-vector multiplications, 1 vector-vector addition, and 2 element-by-element exponential operations (assumed to only be 1 operation per element) for a total of 16Nx22Nx=2Nx(8Nx1) operations. P is applied four times, Q(zl+1) Q (zl) is applied twice, and C(zl+1, zl) is applied once each step, giving a total of 32Nx2+42Nx=2Nx(16Nx+21) operations. Following the same logic for the WASSS method, we find that it requires 8Nx2+6Nx=2Nx(4Nx+3) operations. Note that this operation count differs from the one reported by Clark and Thomas [6

6. C. D. Clark and R. Thomas, “Wide-angle split-step spectral method for 2D or 3D beam propagation,” Opt. Quantum. Electron. 41, 849–857 (2010) [CrossRef] .

] because we are counting both multiplications and additions in this calculation.

Furthermore, to leading order in Nx, the HOWASSS method is approximately a factor of 4 slower for a given Δz (see Fig. 5). However, the initial hypothesis was that a significant improvement in accuracy may lead to a more efficient algorithm. To test this hypothesis, we executed the HOWASSS and WASSS methods using the GPU implementation with various step sizes, holding all other aspects of the problem constant. Figure 6 summarizes the result. For a propagation length of 100 micrometers, and a waveguide tilt angle of 50 degrees, we show the compute time per micron of propagation as a function of the maximum absolute error. For two different values of Nx, the efficiency at constant error is better for the HOWASSS method, as indicated by shorter compute times. In fact, for error values smaller than about 10−4, the HOWASSS method is much better, with efficiency rapidly exceeding an order of magnitude for absolute error values of less than 10−6.

Fig. 5 (a) Run times of HOWASSS and WASSS methods on the GPU and the single-core CPU for 1000 propagation steps. (b) Comparison of HOWASSS method run times using different number of cores on the CPU and using the GPU for 1000 propagation steps.
Fig. 6 Comparison of the HOWASSS method and WASSS method compute time per propagation distance with Nx = 1000 and Nx = 2000. Both methods were run on the GPU.

The data presented in this paper was generated using double precision arithmetic. This does not result in a significant difference in the run time when the code is run on a CPU. However, GPUs are intrinsically designed to deal with single-precision arithmetic. In order to compute one double-precision number the GPU must perform two single-precision calculations and some additional overhead to combine the result, leading to, at a minimum, a factor of two speed-up when single-precision numbers are used. This speed-up also depends on the specific graphics card used, as some have less capability than others when it comes to double precision arithmetic. Rounding errors can affect the HOWASSS method if Nz is large, around 2000 for the specific example considered in this paper. If the application does not require extreme accuracy, then a substantial speed-up can be achieved by using single-precision arithmetic on the GPU.

4.3. Parallelization

Both the WASSS and HOWASSS methods readily lend themselves to parallelization. For our implementation we chose to use both OpenMP, to make use of multi core CPUs, and NVIDIA compute unified device architecture (CUDA) to make use of the processing power of the graphics processing unit (GPU). Modern GPUs possess orders of magnitude more computational power than the typical CPU. However, to completely utilize this power the algorithm must possess a very high level of parallelism to completely saturate the many processing units of the GPU.

Unfortunately, neither the WASSS nor the HOWASSS method are ideal for utilizing the full power of the GPU because many of the matrix-vector multiplications involve diagonal matrices. Faster than their dense counterparts, these computationally reduce to simple element-by-element vector multiplication. While this is a perfectly parallel operation, it does not offer the large number of independent calculations needed to saturate the GPU. These element-by-element vector multiplications suffer additionally from the fact that they require two memory reads and one memory write to compute only one multiplication. On the GPU reading and writing to memory is much slower than math operations. Given this, there is still a significant speed-up when the code is run on the GPU versus on a single core of the CPU (see Fig. 5). This speed-up was accomplished with little attention payed to optimization of the code. With further optimization an additional factor of two or more would be likely. Additionally, extending this technique to even higher-orders might yield additional improvements.

4.4. Generalizations to other coordinate systems, boundary conditions, and higher dimensions

Both the WASSS and HOWASSS methods can be generalized to different coordinate systems or different boundary conditions. In fact the only complication is that the eigenfunctions and eigenvalues must be known. For a more detailed discussion on this topic see Clark and Thomas [6

6. C. D. Clark and R. Thomas, “Wide-angle split-step spectral method for 2D or 3D beam propagation,” Opt. Quantum. Electron. 41, 849–857 (2010) [CrossRef] .

].

5. Conclusion

Acknowledgments

The authors thank the Air Force Research Laboratory, and acknowledge support through the Human Effectiveness Directorate contract FA8650-08-D-6930 for this effort. This work was partially supported by National Science Foundation Grants ECCS-1250360, DBI-1250361, and CBET-125036. B. H. H. acknowledges the support of the High Energy Laser Joint Technology Office for their sponsorship of the Directed Energy Professional Society’s Summer Research Internship Program.

References and links

1.

G. R. Hadley, “Multistep method for wide-angle beam propagation,” Opt. Lett. 17, 1743–1745 (1992) [CrossRef] [PubMed] .

2.

K. Q. Le, R. Godoy-Rubio, P. Bienstman, and G. R. Hadley, “The complex Jacobi iterative method for three-dimensional wide-angle beam propagation,” Opt. Express 16, 17021–17030 (2008) [CrossRef] [PubMed] .

3.

Y. Y. Lu and P. L. Ho, “Beam propagation method using a [(p−1)/p] Padé approximant of the propagator,” Opt. Lett. 27, 683–685 (2002) [CrossRef] .

4.

A. Sharma and A. Agrawal, “New method for nonparaxial beam propagation,” J. Opt. Soc. Am. B 21, 1082–1087 (2004) [CrossRef] .

5.

A. Sharma and A. Agrawal, “Non-paraxial split-step finite-difference method for beam propagation,” Opt. Quantum. Electron. 38, 19–34 (2006) [CrossRef] .

6.

C. D. Clark and R. Thomas, “Wide-angle split-step spectral method for 2D or 3D beam propagation,” Opt. Quantum. Electron. 41, 849–857 (2010) [CrossRef] .

7.

M. Guizar-Sicairos and J. C. Gutiérrez-Vega, “Computation of quasi-discrete Hankel transforms of integer order for propagating optical wave fields,” J. Opt. Soc. Am. A 21, 53–58 (2004) [CrossRef] .

8.

W. Magnus, “On the exponential solution of differential equations for a linear operator,” Comm. Pure Appl. Math. 7, 649–673 (1954) [CrossRef] .

9.

M. Bauer, R. Chetrite, K. Ebrahimi-Fard, and F. Patras, “Time-ordering and a generalized Magnus expansion,” Lett. Math. Phys. 103, 331–350 (2012) [CrossRef] .

10.

M. J. Adams, An Introduction to Optical Waveguides (Wiley, 1981).

OCIS Codes
(000.4430) General : Numerical approximation and analysis
(080.1510) Geometric optics : Propagation methods
(350.5500) Other areas of optics : Propagation
(080.1753) Geometric optics : Computation methods

ToC Category:
Physical Optics

History
Original Manuscript: March 25, 2013
Revised Manuscript: May 24, 2013
Manuscript Accepted: June 14, 2013
Published: June 25, 2013

Citation
Brett H. Hokr, C. D. Clark, Rachel E. Grotheer, and Robert J. Thomas, "Higher-order wide-angle split-step spectral method for non-paraxial beam propagation," Opt. Express 21, 15815-15825 (2013)
http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-21-13-15815


Sort:  Author  |  Year  |  Journal  |  Reset  

References

  1. G. R. Hadley, “Multistep method for wide-angle beam propagation,” Opt. Lett.17, 1743–1745 (1992). [CrossRef] [PubMed]
  2. K. Q. Le, R. Godoy-Rubio, P. Bienstman, and G. R. Hadley, “The complex Jacobi iterative method for three-dimensional wide-angle beam propagation,” Opt. Express16, 17021–17030 (2008). [CrossRef] [PubMed]
  3. Y. Y. Lu and P. L. Ho, “Beam propagation method using a [(p−1)/p] Padé approximant of the propagator,” Opt. Lett.27, 683–685 (2002). [CrossRef]
  4. A. Sharma and A. Agrawal, “New method for nonparaxial beam propagation,” J. Opt. Soc. Am. B21, 1082–1087 (2004). [CrossRef]
  5. A. Sharma and A. Agrawal, “Non-paraxial split-step finite-difference method for beam propagation,” Opt. Quantum. Electron.38, 19–34 (2006). [CrossRef]
  6. C. D. Clark and R. Thomas, “Wide-angle split-step spectral method for 2D or 3D beam propagation,” Opt. Quantum. Electron.41, 849–857 (2010). [CrossRef]
  7. M. Guizar-Sicairos and J. C. Gutiérrez-Vega, “Computation of quasi-discrete Hankel transforms of integer order for propagating optical wave fields,” J. Opt. Soc. Am. A21, 53–58 (2004). [CrossRef]
  8. W. Magnus, “On the exponential solution of differential equations for a linear operator,” Comm. Pure Appl. Math.7, 649–673 (1954). [CrossRef]
  9. M. Bauer, R. Chetrite, K. Ebrahimi-Fard, and F. Patras, “Time-ordering and a generalized Magnus expansion,” Lett. Math. Phys.103, 331–350 (2012). [CrossRef]
  10. M. J. Adams, An Introduction to Optical Waveguides (Wiley, 1981).

Cited By

Alert me when this paper is cited

OSA is able to provide readers links to articles that cite this paper by participating in CrossRef's Cited-By Linking service. CrossRef includes content from more than 3000 publishers and societies. In addition to listing OSA journal articles that cite this paper, citing articles from other participating publishers will also be listed.


« Previous Article  |  Next Article »

OSA is a member of CrossRef.

CrossCheck Deposited