1. Introduction
Refractive index and its variation as a function of wavelength are material properties that are of general interest in a variety of disciplines. Applications in chemistry, acoustics, applied optics, etc. all rely on knowledge of how optical waves behave as they pass through a material. In this work we are specifically interested in material dispersion which quantifies the frequency dependence in optical phase velocity of a wave. Extracting the material dispersion from observed spectral data is fundamentally an estimation problem: given the experimental, noise-corrupted data, extract the true underlying dispersion parameter. “Truth”, of course, can only be defined in terms of a model that the practitioner believes describes the observed data. We will never know the true parameter value and the best we can do is try to minimize the estimation error.
The dispersion associated with a material’s refractive index has been estimated using numerous optical measurement techniques, many of which are summarized in [
9
L. Walmsley, L. Waxer, and C. Dorrer, “The role of dispersion in ultrafast optics,” Rev. Scie. Instrum.
72(1 I) 1–29 (2001). [CrossRef]
] and [
2
A. Brzsnyi, A. P. Kovcs, M. Grbe, and K. Osvay, “Advances and limitations of phase dispersion measurement by spectrally and spatially resolved interferometry,” Opt. Commun.
281(11) 3051–3061 (2008). [CrossRef]
]. Several approaches use spectral interferometry [
7
P. A. Merritt, R. P. Tatam, and D. A. Jackson, “Interferometric chromatic dispersion measurements on short lengths of monomode optical fiber,” J. Lightwave Technol.
7(4) 703–716 (1989). [CrossRef]
,
5
M. Joffre, L. Lepetit, and G. Cheriaux, “Linear techniques of phase measurement by femtosecond spectral interferometry for applications in spectroscopy,” J. Opt. Soc. Am. B
12(12) 2467–2474 (1995). [CrossRef]
,
4
F. K. Fatemi, T. F. Carruthers, and J. W. Lou, “Characterisation of telecommunications pulse trains by fourier-transform and dual-quadrature spectral interferometry,” Electron. Lett.
39(12), 921–922 (2003). [CrossRef]
] as a means of interrogating the sample; this is also the approach taken here. One particular spectral interferometric technique used with optical pulses is Fourier-transform spectral interferometry (FTSI). This technique is often used to characterize an unknown ultrafast optical pulse with a known reference pulse. Since this technique solves for the spectral phase of the unknown optical pulse, then the outcome will also describe dispersion. To measure dispersion, this technique uses a pulsed input to an interferometer in which one path of the interferometer contains the material for characterization. The interference of the two pulses produces a spectral interferogram with a functional form cos(
ϕ) where
ϕ is the optical phase difference between the two paths.
To recover the optical phase using FTSI the following steps are applied to the measured interferogram: 1) the wavelength data (λ-space) are interpolated to produce evenly spaced frequency data (ω-space); 2) the data are windowed to set the endpoints to zero; 3) the data are transformed into the Fourier domain; 4) the peak is located in Fourier domain (large linear phase term) and shifted to zero; 5) the Fourier-domain data are windowed and an inverse Fourier transform is performed; 6) the quadratic and higher phase terms are recovered by curve fitting the processed data.
This is a powerful technique and has been used extensively where dispersion is large. Estimating the dispersion can be difficult, however, in situations where the dispersion, the path through the optical material, or both are small. Each step in the FTSI technique (listed above) adds numerical error to the measured data resulting in added uncertainty in the estimate. Another approach is to fit the data to the model directly using a nonlinear least-squares approach as was done in [
7
P. A. Merritt, R. P. Tatam, and D. A. Jackson, “Interferometric chromatic dispersion measurements on short lengths of monomode optical fiber,” J. Lightwave Technol.
7(4) 703–716 (1989). [CrossRef]
]. Other approaches relying on numerical differentiation are also described in [
7
P. A. Merritt, R. P. Tatam, and D. A. Jackson, “Interferometric chromatic dispersion measurements on short lengths of monomode optical fiber,” J. Lightwave Technol.
7(4) 703–716 (1989). [CrossRef]
] and are referred to as the Central Wavelength and Turning Point Analysis approaches.
Perhaps the main issue with these approaches is that they are only capable of producing point estimates. That is to say, each method provides a single parameter estimate, obtained by minimizing a specific cost function. This cost function is chosen in accordance with the assumed form of the noise e.g., for Gaussian noise one minimizes the mean square error. One of the main problems with such an approach, particularly for the case of weak dispersion, is that the point estimate gives little sense of the degree to which the estimate can be trusted i.e. a confidence interval. Repeating a point estimate can be used to construct a confidence interval, however this interval is more indicative of ones ability to repeat the estimate rather than a true interval bounding the parameter value.
For this reason we suggest treating the parameters to be estimated as random variables, each with a different probability density function (PDF). In order to estimate the parameter PDFs we will use what is referred to as Bayesian analysis (to be described). The density function of most interest to us is pπ
(ϕ
2) where the parameter ϕ
2 will be used to determine the material’s group velocity dispersion. If we can approximate this distribution, we may take the point estimate of ϕ
2 to be the posterior distribution mean
where the operator
E[·] takes the expected value. We note, however, that because the entire distribution is being estimated other choices for
ϕ̂
2 are possible. For example, one may take
ϕ̂
2 to be the mode of the posterior distribution or possibly the value that maximizes the posterior [
10
J. Wang and N. Zabaras, “Hierarchical Bayesian models for inverse problems in heat conduction,” Inverse Prob.
21, 183–206 (2005). [CrossRef]
]. In addition we will also have information pertaining to the confidence we have in this estimate. For example, we may choose the confidence interval bounds to include 95% of the estimated density function, a so-called “credible interval”. This allows us to make statements such as
“There is a 95% chance that ϕ
2 lies in the interval 0.001 ≤ ϕ
2 ≤ 0.002 [ps
2]”.
There is no analogous statement that can be made using repeated point estimates. Additionally, because the Bayesian approach yields the entire distribution, there is no need to assume the form this distribution will take. The approach to be described is valid for Gaussian or highly non-Gaussian pπ
(ϕ
2). Finally, some techniques such as FTSI provide only quadratic and higher phase terms. With the Bayesian approach each of the parameters in the interferogram’s function are estimated simultaneously.
2. Interferometric dispersion measurements
For measurement of dispersion we used a shearing interferometer as shown in
Fig. 1. The laser source was an Agilent 8164A tunable source and two Newport 818-IR detectors simultaneously measured the reflected and transmitted light. Data were collected for two samples - 0.48-mm thick silicon and 6.43-mm thick BK-7 glass by sweeping the laser source from 1540 to 1560 nm at 0.01-nm intervals. Each sample was tilted at approximately 2° with respect to the incoming beam to allow the reflected light to be monitored without the need for a beamsplitter.
Expressions for the reflected and transmitted light are easily obtained using Snell’s Law and the Fresnel Equations. Since the incident light is not normal to the surface of the sample the reflection and transmission coefficients, respectively, are not exactly equal at each surface (as would be the case for normal incidence). In general, the reflected and transmitted optical intensities have the general forms
where ϕ(ϕ) = (2L/ω)ωn(ω) is the phase accumulated through one round trip through the sample neglecting the small, wavelength-dependent change in physical length L, ω = 2πfopt
is the optical angular frequency, n(ω) is the wavelength-dependent index of refraction, and c is the speed of light in vacuum. Exact expressions for the parameters a,b,u,v and γ are not needed (as will be seen below). To eliminate the Fabry-Perot denominator in each expression we analyze the ratio
which has the general form of a two-beam interferometer
where Idc
is the average value of the intensity ratio and IA
is the amplitude of the interference modulation in the intensity ratio.
Figure (
Fig. 2) shows typical transmission, reflection, and intensity ratio data for a 0.48-mm Silicon and a 6.43-mm BK-7 sample. For both samples, the data were obtained by sweeping the laser source from 1540 to 1560 nm at 0.01-nm wavelength intervals. The BK-7 data (
Fig. 2(a),
(c), and
(e)) has a fast oscillation frequency due to the relatively large (6.43-mm) thickness. In contrast, even though Si has a much higher optical index of refraction than BK-7 (3.5 vs. 1.5), the sample is much thinner (0.48-mm) leading to a much slower oscillation frequency (
Fig. 2(b),
(d), and
(f)). In both BK-7 and Si samples, the data show the relatively large linear phase term, but quadratic and higher order phase terms are not obvious from a visual inspection of the data.
Fig. 1. Experimental setup of the Shearing interferometer for a sample of thickness L. The laser source is reflected from the front and back surfaces of the material forming the interferometer.
Fig. 2. Interferograms of BK-7 (a, c, e) and Silicon (b, d, f) for transmission (a) and (b), reflection (c) and (d), and reflection divided by transmission (e) and (f).
Following the standard approach, we expand the refractive index near frequency ω = ω
0 in a Taylor series
where Δω = ω-ω
0 and n′ indicates the derivative of n with respect to ω evaluated at ω
0, and so on. Hence, the intensity ratio can be written
where the coefficients are given by
The term ϕ
2 characterizes the product of group velocity dispersion times the propagation length of the material under study. Given a sequence of measured intensity ratios I(ω) we wish to estimate the dispersion. Compounding this task are the fact that the parameters Idc,IA
,ϕ
0, and ϕ
1 are also unknown and must be estimated as well. Finally, we must consider the fact that the acquired data will be subject to noise. There are several possible sources of noise in the experiment, however the most prominent in this work is considered to be the additive noise on the measured intensity. In experiment the ratio is recorded at the discrete angular frequencies ωn n
= 1⋯N. Our measured intensity is therefore given by the model
where ηn
is a sequence of independent, identically distributed (iid), normally distributed random variables with variance σ
2
G
and In
(
ψ
) is the true, underlying intensity which depends on the model parameters
ψ
= (Idc,IA
,ϕ
0, ϕ
1, ϕ
2). The problem is to therefore estimate each of the parameter distributions pπj
(ψj
) j = 1⋯5 given the sequence of noisy observations yn
.
3. Bayesian estimation of ϕ2
Central to any estimation approach (Bayesian or otherwise) is formation of the likelihood function,
pL
(
yn
|
In
(
ψ
)). The likelihood function is a probabilistic model describing the distribution of the data given the data model as defined by the model parameters. Formulation of the likelihood function begins by examining the assumed form of the uncertainty (noise). In the case of Eq. (
3), the noise is taken as a sequence of iid sample drawn from a zero-mean, jointly Gaussian PDF
The observed data y ≡ yn
n = 1⋯N and the model In
(
ψ
) can therefore be related via
It should be noted that the noise variance σ
2
G
is also unknown and must be estimated from the data. The parameter vector to be estimated is therefore extended to include the noise variance and will be henceforth denoted
θ
= (Idc,IA
,ϕ
0,ϕ
1,ϕ
2, σ
2
G
).
The method of maximum likelihood estimates the model parameters to be those that maximize
pL
(
y
|
θ
) (e.g., the nonlinear least-squares approach of Merritt [
7
P. A. Merritt, R. P. Tatam, and D. A. Jackson, “Interferometric chromatic dispersion measurements on short lengths of monomode optical fiber,” J. Lightwave Technol.
7(4) 703–716 (1989). [CrossRef]
]). Bayesian analysis, on the other hand, makes use of the likelihood function as a means of relating a prior parameter PDF
p
π
(
θ
) to the posterior
p
π
(
θ
|
y) via
The above relationship is referred to as Bayes Theorem and lies at the heart of all Bayesian analysis. Equation (
5) relates any prior information we have about the parameter to the desired posterior distribution. The term in the denominator,
pD
(
y), is a normalizing constant that can be ignored in the following development for reasons that will be explained.
Equation (
5) provides the joint parameter distribution whereas we seek the marginal distributions of each individual parameter. The distribution of the
jth
parameter in the parameter vector is therefore given by the integral
where the notation ∫ℝ
P-1
dθ
-j
denotes the multi-dimensional integral over all parameters other than θj
. Thus, in order to approximate pπ
(ϕ
2) we require a means of performing a potentially high-dimensional integral involving potentially complex functions of our parameters. Fortunately there exists a convenient numerical approach to sampling from the marginal parameter distributions.
3.1. Markov-Chain Monte-Carlo Methods
The term “Monte Carlo methods” is used to describe simulation techniques for investigating probability distributions. These techniques can be highly efficient, especially when independent samples can be generated. Unfortunately, posterior distributions used in Bayesian inference are often complicated, making it difficult to draw independent samples. Nevertheless it is often easy to draw a dependent sequence of samples representing posterior distributions. Over the last 60 years, stochastic algorithms have been developed which sample new values using rules determined by a fixed number of previous observations. The result is a Markov chain; this form of simulation is known as Markov chain Monte Carlo (MCMC). The magic of MCMC is in producing algorithms for which the resultant Markov chains have stationary distributions equal to the distribution we wish to sample. One such implementation, described next, is the Metropolis Hastings algorithm.
Metropolis-hastings algorithm
Suppose that we wish to draw samples from a specific univariate (single parameter) distribution pπ
(θ). The Metropolis-Hastings (MH) algorithm generates a sequence {θ
(i)} by a 2-step procedure. At stage i, a candidate value θ
* is sampled based on the current value θ
(i-1); it is sampled from a second distribution function g(θ
*|θ
(i-1)). A Bernoulli trial is performed, with success probability r′ = min{r, 1}, where
if the result of the trial is success, we set θ
(i) = θ
*; otherwise, we set θ
(i) = θ
(i-1). The alternatives are often described as “moving” or “staying at the current value”, but it is important to note that in the latter case, the same value is recorded twice (once as θ
(i-1), once as The distribution g
(·) provides the probability of moving and is sometimes referred to as the transition PDF or candidate generating PDF.
It is often easy to describe a candidate generating distribution which is symmetric in its arguments, hence cancels out of the numerator and denominator of r. For example, if θ
* is uniformly distributed on an interval of length 2A centered on θ
(i-1), then its density function is
so that
g(
θ
(i-1)|
θ
*) =
g(
θ
(i-1)\
θ
*). The role of the parameter
A is to effectively decide how large a step to take when testing a candidate parameter value. Large values for the “tuning parameter”
A (Eq. (
7)) decreases the likelihood of accepting a new value, thus the samples in the chain tend to be highly correlated. Too small a step, however, and the algorithm can take a prohibitively long time to converge. Here we use a simple approach whereby the tuning parameter is adjusted on-the-fly in order to achieve an appropriate acceptance rate of between 30% – 50%. An implementation of this approach is given in the Appendix. One also must typically allow for some number of transient iterations to pass before the Markov Chain settles onto its stationary distribution. These transient iterations are referred to in the literature as “burn-in”.
In summary, the algorithm works by perturbing the current value θ
(i-1) →θ
* using the distribution g(·). The ratio r then provides the likelihood that we keep the new value θ
*. Parameters that provide a better fit tend to be kept. The process repeats some number of iterations until we have built up a stationary Markov Chain θ
(i)
i = 1⋯Niter
.
Two features of this remarkable algorithm deserve comment. First, that
the first-order Markov chain it produces will have stationary distribution equal to pπ
(
θ), provided only that the candidate generating distribution and the starting value
θ
(0) allow the chain to reach all values for which
pπ
(
θ) > 0. The second important feature of this algorithm is that its implementation depends on the desired distribution only through the ratio in
r. This is an enormous boon in calculating full conditional distributions, which are defined implicitly as the distribution proportional to product of likelihood and prior; the normalizing constant in Eq. (
5), which makes the distribution integrate to 1, need not be calculated.
3.1.1. Gibbs sampling and conditional conjugacy
For multi-parameter systems the algorithm must be slightly modified. Here we use a form of MCMC used specifically to sample multivariate distributions referred to as Gibbs sampling. Define
θ
-j
as the vector obtained by deleting the jth component of
θ
,viz.,
Gibbs sampling of Eq. (
6) produces a sequence {
θ
(i)} with components
θ
(i)
j
of the
ith sampled vector generated sequentially, for
j = 1,2,…,
P. The value θ
(i)
j
is sampled from its
full conditional distributionThe full conditional distribution of θj
can be thought of as the posterior distribution of θj
, if it happened that the other P - 1 parameters were known, and equal to their present values, i.e., if θk
= θ
(i-1)
k
for k ≠ j. Thus, in order to sample from each of the marginal parameter distributions (6) we simply repeat the MH procedure for each parameter j in turn, holding the other P - 1 parameters fixed.
As a final point, we mention that there are certain special cases when we do not need to resort to the MH algorithm in order to sample the parameter values. When full conditional and prior are both identifiable as members of the same parametric family one may take advantage of
conditional conjugacy [
6
W. A. Link and R. J. Barker, Bayesian Inference with ecological examples (Academic Press, San Diego, CA, 2010).
] to sample values of the posterior directly. In our case, it turns out that the noise variance
σ
2
G
may be treated in this fashion as shown in the Appendix.
4. Numerical results
To test the approach, data were simulated according to the signal model given by Eq. (
2) with additive Gaussian noise. The signal-to-noise ratio, defined as the ratio of signal variance to noise variance, is given by
where
σ
2
G
is the noise variance. In the first numerical example, the system parameters were fixed to the values
ϕ
0 =
π/3,
ϕ
1 = 11.5 [ps],
ϕ
2 = 3
e
-4 [ps
2],
Idc
= 0.65, and
IA
= 0.42 while the SNR was fixed to +20dB. The number of observations was taken to be
N = 1024. The MCMC algorithm was implemented using initial values (priors) chosen from a uniform distribution centered at the true value. For this particular function, the choice of prior value can also be made by simply trying a few different parameters and eyeballing the fit. As long as the initial values are reasonably chosen the algorithm will converge to the proper values. We used 25000 iterations of “burn-in” to tune the algorithm and captured the next 25000 iterations as members of the desired PDFs.
Figure 3 shows the sampled PDFs for each of the parameters and noise variance along with the true value used in simulating the data. The most difficult parameter to estimated is clearly
ϕ
2 as evidenced by the relatively large variance with respect to the true value. The other parameters can all be estimated to within a small percentage of their true values. The problem, of course, is that this very weak nonlinearity is hardly influencing the data over this 40nm bandwidth, thus the estimate cannot overcome the noise. If we consider a stronger nonlinearity
ϕ
2 = 1
e
-3 [
ps
2] the results are much improved.
Figure 4 shows the approximated distribution for the stronger nonlinearity (all other parameters are fixed to the same values as were used in
Fig. 3). Also shown is the stationary Markov chain associated with this distribution. The estimated PDF is much narrower and centered roughly about the true value. Not surprisingly, stronger nonlinearities result in improved estimation. From the Markov chain one can see how the initial (incorrect) prior value quickly converges to the correct stationary distribution. For this particular function all parameters show similar behavior, that is to say convergence of the chain is achieved after the first several thousand iterations.
Besides the strength of the dispersion, other parameters can also influence the quality of the estimate.
Figure 5 shows the progression of the confidence intervals as several key parameters were varied. The nominal parameter values used in this study were
SNR = +20
dB, Δ
λ = 40nm,
ϕ
0 =
π/3,
ϕ
1 = 1.5 [ps],
ϕ
2 = 1
e
-3 [ps
2] and
N = 1024 samples. The confidence intervals are taken as the central 95% of the estimated distribution. Clearly the bandwidth over which the measurement is taken has a large affect on the ability to identify the nonlinearity. This is to be expected as for very small nonlinearities it will take large deviations Δ
ω
in order for the quadratic (and higher) terms in the polynomial to contribute to the phase. The confidence intervals are also improved as the number of data are increased. This too is typical of the dependence observed in most estimation problems. The confidence intervals are entirely insensitive, however, to the number of oscillations observed. Regardless of the term governing the linear frequency
ϕ
1 we see roughly the same estimated parameter distributions for
ϕ
2. Finally, as expected, the quality of the estimate can be significantly degraded by noise. For low SNR (+5dB) a very large confidence interval is produced while for reasonable SNR values of +10
dB to +20
dB fairly narrow intervals are produced. Next, the approach is demonstrated on several experimental samples.
Fig. 3. (a-e) Estimated probability distributions associated with the five model parameters Idc,IA
,ϕ
0,ϕ
1,ϕ
2 along with (f) the estimated distribution for the noise variance. The true parameter values are marked by a vertical line.
Fig. 4. (a) Estimated probability density function for the ϕ
2 parameter where the true value was taken as ϕ
2 = 1e
-3 [ps2] and (b) the stationary Markov chain that produced this distribution
Fig. 5. 95% Confidence interval as a function of (a) bandwidth, (b) number of data, (c) the linear frequency term ϕ
1 and (d) the signal-to-noise ratio
5. Experimental results
Figure 6 shows the results of the MCMC algorithm as applied to the sample of 0.48-mm thick silicon. The data used in this experiment correspond to that shown in
Figure 2(b),
(d),
(f) and consisted of a sequence of
N = 2000 measured intensities over the 1540–1560nm bandwidth. The initial values in the Markov chain (priors) were chosen based on our knowledge of the signals. The prior for
Idc
, for example, was always chosen to be a Uniform distribution centered at
E[
y] i.e. the mean of the observed data. Similarly, the prior for
IA
is easily chosen based on the amplitude of the observed data. For
ϕ
1 and
ϕ
2 we use a uniform prior centered about the values predicted by the Sellmeier equation. Priors for
ϕ
0 (the constant phase term) were obtained visually by finding values for which the data and model appeared to be in phase. The parameter
ϕ
2 possesses by far the largest degree of uncertainty. For this reason we used a uniform prior on the interval -0.01 ≤
ϕ
2 ≤ 0.01 (i.e. in generating our Markov chains we took
ϕ
(0)
2 as a random draw from the distribution
pπ
(
ϕ
(0)
2)~
U(-0.01,0.01)). For weakly dispersive materials we would expect the true value to be somewhere in this interval. We have also tried using Gaussian priors centered at the aforementioned values and found no difference in the results.
Figure 6 shows the results of using the MCMC technique to identify the material dispersion using a burn-in of 20000 iterations and recording the next 10000 iterations as values of the desired distribution. The Markov chain is seen to quickly converge to a stationary distribution which is plotted as a histogram in
Fig. 6(b).
Figure 6(c) shows the model fit obtained by setting each of the parameters in
ψ
to the expected values (averages) of the distributions obtained via the MCMC algorithm. Again, we note that we could also have chosen the posterior mode as the final estimate. However, our calculated posteriors were unimodal and largely symmetric so that mean and mode coincide. We observed no substantive difference between these two estimators of
ϕ̂
2. Just as with the numerically generated data, the confidence intervals for all parameters other than
ϕ
2 were extremely narrow. The estimated distribution
pπ
(
ϕ
2) gives a 95% confidence interval of 1.1
e
-5 ≤
ϕ̂
2 ≤ 2.25
e
-3. The published Sellmeier coefficient for Si allows us to calculate the expected quadratic phase term of this sample to be
ϕ̂
2 = 1. 1
e
-3 [
ps
2] [
8
B. Tatian, “Fitting refractive-index data with the Sellmeier dispersion formula,” Appl. Opt.
23(24) 4477–4485 (1984). [CrossRef]
[PubMed]
]. Clearly the estimated distribution contains this value in the 95% confidence interval. In fact, the point estimate associated with this distribution (the mean value) yields
ϕ̂
2 = 1.23 [ps
2]. Additionally, the estimated noise variance was
σ̂
2
G
= 1.21
e
-3. Given the signal amplitude of
IA
= 0.4083 the signal-to-noise ratio was estimated to be
SNR = 70 = 18.5
dB.
Fig. 6. Identification of material dispersion for Si glass. (a) Stationary Markov chain for coefficient ϕ
2 (b) resulting estimated probability density function and (c) Sampled data along with the model fit
Fig. 7. Identification of material dispersion for BK-7. (a) Stationary Markov chain for coefficient ϕ
2 (b) resulting estimated probability density function and (c) Sampled data along with the model fit
As a second example, we estimated the dispersion of BK-7. Again, data were collected over the 1540–1560nm wavelengths. Samples of the acquired data are shown in
Fig. 2. For this sample we obtain the 95% interval estimate -1.1
e
-3 ≤
ϕ̂
2 ≤ - 2.9
e
-4 [
ps
2]. Again, this is consistent with calculations from Sellmeier data published in [
1]. In order to reduce the credible interval size we conducted two different experiments aimed at increasing the bandwidth of the measurement. In the first experiment, the bandwidth of the acquired data was extended to the range 1530–1570 using the same experimental set-up shown in
Fig. 1.
Figure 8(a) shows the estimated probability density associated with this new data set. Clearly the confidence interval has narrowed significantly from the 20nm bandwidth measurement. We now have a 95% credible interval of -6.40
e
-4 ≤
ϕ̂
2 ≤ -3.06
e
-4 [
ps
2] which is again is consistent with predicted values at 1550nm. Based on this new data set the upper limit has shifted slightly toward a more negative value while the lower limit shows a significant shift in the positive direction. In order to increase the measurment bandwidth further, a second experiment was conducted using a Mach-Zhender interferometer. For this second experiment data were collected in the 760–860nm bandwidth range. As expected, moving to 100nm of bandwidth drastically reduces the size of the confidence intervals as shown in
Fig. 8(b). Because the data are collected at 800nm (as opposed to 1550nm) the estimated posterior is now centered at
E[
ϕ̂
2] = 2.37
e
-4 [
ps
2]. The 95% credible interval extends over the narrow range 2.33
e
-4 ≤
ϕ̂
2 ≤ 2.42
e
-4, thus we have a high degree of faith in this estimate. We also note that the estimated interval for
ϕ
2 is slightly lower than that predicted by the Sellmeier equation [
3
J.-C. Diels and W. Rudolph, Ultrashort Laser Pulse Phenomena (Academic Press, San Diego, Ca, 2006).
].
Fig. 8. Estimated PDFs of material dispersion for BK-7 glass. a) Data collected at 1550nm over a 40nm bandwidth (2x larger than that used in generating Fig. 7b) and b) Data collected at 800nm over a 100nm bandwidth
6. Conclusion
Estimating a weak coefficient of dispersion in materials is a challenging problem. Uncertainty in the measurement can effectively mask the influence of a weak, quadratic nonlinearity. Here we have provided an approach that provides both the estimate and, perhaps more importantly, a sense of the uncertainty in the estimate. This was accomplished using a Bayesian estimation procedure referred to as the Markov Chain Monte Carlo approach. Using this approach, one assigns a prior probability to the model parameters and a transition distribution for moving from one parameter value to the next. These two pieces of information are sufficient for generating a stationary Markov chain; samples from this Markov chain may be taken as samples of the probability distribution for the parameter of interest, in this case dispersion. By estimating the entire distribution, the practitioner not only obtains the estimated dispersion value (taken as the distribution mean), but also the confidence interval about that estimate. Distributions with large variance imply large uncertainty in the estimate while narrow distributions are desirable. Here we have shown how the various experimentally controlled parameters influence both the quality of the estimates and the associated uncertainty. We have then demonstrated the approach effective in estimating dispersion from a number of experimental samples.
Appendices
7. Appendix A
The procedure for performing a Bayesian, MCMC estimation of model parameters
ψ
≡ (
Idc,IA
,
ϕ
0,
ϕ
1,
ϕ
2) and the noise variance
σ
2
G
is given in
Algorithm I. For notational convenience, define the sum-squared error between data and model as
It will be shown that the noise variance σ
2
G
can be sampled directly using conditional conjugacy and it is therefore given separate treatment in the algorithm from the first P - 1 model parameters.
Algorithm 1. The MCMC algorithm using Metropolis-Hastings with Gibbs sampling for Gaussian likelihood and Uniform “transition” distributiong(ψ
*
j
, ψj
(i - 1)) = 1/2Aj
.
The key step is the evaluation of the ratio r = pL
(y|
ψ
*)/PL
(y|
ψ
(i-1)). Essentially we are asking whether or not the new “perturbed” parameter value is a better fit to the data than the previously used value. Parameters that lead to a better fit tend to be kept while those leading to a worse fit are more likely to be discarded.
Additionally we are making use of conditional conjugacy in order to sample the noise variance σ
2
G
. Specifically, we define the parameter τ = 1/σ
2
G
and assume a Gamma prior distribution for τ. The gamma family of distributions described by density functions pΓ(x) ∝ x
α-1exp(-βx) for x > 0, indexed by parameters α,β > 0. It can be shown that if τ has a gamma prior with parameters α and β, the full conditional distribution for τ is also in the gamma family, with parameters α′ = α + N/2 and β′ = β + Q(y, ψ
)/2. The gamma prior is conditionally conjugate for τ. Thus in building the Markov chain for σ
2
G
= 1/τ we may simply sample from the inverse Gamma distribution at each iteration. In general, conditional conjugacy simplifies Gibbs sampling by allowing reference to standard probability distributions (like the beta or gamma) readily available in standard software packages.
The above pseudo-code implicitly assumes 1) a “diffuse” Gamma prior on the precision parameter 1/
σ
2
G
whereby we take
α = 1,
β = 0,2) Uniform priors (centered at values predicted by the Sellmeier equation) for material parameters 3) a Gaussian likelihood function, governed by the sum-squared error between data and model and 4) a uniform transition distribution (Eq.
7). The code is quite simple to implement, the costly step being the determination of the sum-squared error function
Q(·,·) (Eq.
9). Also note that in the above described implementation, as the Gibbs sampler moves through the parameter vector the newly updated values are used. That is to say, for
j = 2 we use
ψ
1(
i) rather than
ψ
1(
i - 1) as the first element of
ψ(
i- 1). Finally, many statistical packages provide routines that generate samples from a Gamma distribution but not from an inverse Gamma distribution. Fortunately there is a simple relationship between the Gamma
G(·,·) and inverse Gamma
IG(·,·) distributions
so that for our problem we may sample the variance directly by drawing
Finally, we note that during the burn-in period we divide the tuning parameters
Aj
by the constant value 1.007 after each rejection and multiply by 1.01 after each acceptance. Thus, an acceptance causes us to expand our parameter search while rejection results in smaller “kicks” to the previous value in the chain. The asymmetry in the constants causes a slight bias in the favor of rejection. This is one approach to adjusting
Aj
on-the-fly such that an appropriate acceptance probability of ~ 40% is obtained for each parameter [
6
W. A. Link and R. J. Barker, Bayesian Inference with ecological examples (Academic Press, San Diego, CA, 2010).
].
There are obviously many variations on the above described algorithm, however
Algorithm 1 presents a simple but useful implementation that is straightforward to code in software.
Acknowledgments
The authors would like to acknowledge the Naval Research Laboratory for providing funding for this work
References and links
1. |
http://cvilaser.com/Common/PDFs/Dispersion_Equations.pdf. |
2. |
A. Brzsnyi, A. P. Kovcs, M. Grbe, and K. Osvay, “Advances and limitations of phase dispersion measurement by spectrally and spatially resolved interferometry,” Opt. Commun.
281(11) 3051–3061 (2008). [CrossRef]
|
3. |
J.-C. Diels and W. Rudolph, Ultrashort Laser Pulse Phenomena (Academic Press, San Diego, Ca, 2006). |
4. |
F. K. Fatemi, T. F. Carruthers, and J. W. Lou, “Characterisation of telecommunications pulse trains by fourier-transform and dual-quadrature spectral interferometry,” Electron. Lett.
39(12), 921–922 (2003). [CrossRef]
|
5. |
M. Joffre, L. Lepetit, and G. Cheriaux, “Linear techniques of phase measurement by femtosecond spectral interferometry for applications in spectroscopy,” J. Opt. Soc. Am. B
12(12) 2467–2474 (1995). [CrossRef]
|
6. |
W. A. Link and R. J. Barker, Bayesian Inference with ecological examples (Academic Press, San Diego, CA, 2010). |
7. |
P. A. Merritt, R. P. Tatam, and D. A. Jackson, “Interferometric chromatic dispersion measurements on short lengths of monomode optical fiber,” J. Lightwave Technol.
7(4) 703–716 (1989). [CrossRef]
|
8. |
B. Tatian, “Fitting refractive-index data with the Sellmeier dispersion formula,” Appl. Opt.
23(24) 4477–4485 (1984). [CrossRef]
[PubMed]
|
9. |
L. Walmsley, L. Waxer, and C. Dorrer, “The role of dispersion in ultrafast optics,” Rev. Scie. Instrum.
72(1 I) 1–29 (2001). [CrossRef]
|
10. |
J. Wang and N. Zabaras, “Hierarchical Bayesian models for inverse problems in heat conduction,” Inverse Prob.
21, 183–206 (2005). [CrossRef]
|