3.1. Optical design
In the simplest configuration, as in Fig. 1, there is no lens between the object and sensor. In this case, the resolution limits of the system are dominated by the sensor pixel size and traditional diffraction limitations. The pixel size presents an object-space resolution limit because the finest spatial frequency which can be sampled on the sensor has period $2p$, where $p$ is the pixel separation. In the lensless case, such a fringe pattern on the sensor is generated by a grating of the same period at the object.
The other major limit on optical resolution is the traditional diffraction limit. It may not be obvious at first that such limitations apply for the holographic case, but the limitations can be understood simply by considering a localized grating of period $\Delta$ on the object, as illustrated in Fig. 4. Such a grating will scatter incident light at angles $\theta_m = \sin^{-1}(m\lambda/\Delta)$. If we insist that at least two scattering orders (including $m=0$) fall on the sensor regardless of the position of the localized grating, then we require $Np \geq 2\lambda d/\Delta$, where $N$ is the linear number of pixels and $d$ is the distance between the object and sensor. Taking the resolution as half the smallest resolvable grating period, the diffraction-limited object resolution is then
$$r_d = \frac{\lambda d}{Np}. \qquad (11)$$
This result is extremely similar to Rayleigh’s criterion, which provides $r_R \approx 1.22\,\lambda d/(Np)$.
Fig. 3. Two reconstructions at different depths from a single holographic data set. (a) shows a reconstruction at z=107 mm. The top target is “in focus” and the bottom target is not. (b) Using the same data but reconstructing at z=244 mm, the reverse is true.
Fig. 4. An illustration of how diffraction limits resolution for holographic reconstruction. So that the localized grating can be reconstructed, at least two diffraction orders (including m=0) must be recorded by the sensor. This limits the minimum grating size that can be resolved.
A reasonable system design approach is to equate these two resolution limits, suggesting $d_{\mathrm{opt}} \approx Np^2/\lambda$. For example, given the camera used to acquire the data presented in Figs. 2 and 3, the ideal working distance is $d_{\mathrm{opt}} = 511 \times (9\times10^{-6})^2/(632.8\times10^{-9}) = 0.065$ m, yielding an achievable diffraction-limited resolution of $r_d = 9$ µm, the pixel size. A larger distance (0.107 m) was used above due to mechanical constraints, resulting in a predicted resolution of $r_d = 15$ µm. The observed resolution is approximately 17 µm in the vertical direction and 25 µm in the horizontal direction, suggesting that the horizontal motion led to a reduction in resolution by approximately one pixel, as described in Subsection 3.5.
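As a quick numerical check of the design rule, the sketch below evaluates the two resolution limits for the camera parameters quoted in the text (N = 511, p = 9 µm, λ = 632.8 nm). The function names are illustrative and not part of the original work, and the small-angle form $r_d = \lambda d/(Np)$ is assumed.

```python
def diffraction_limited_resolution(wavelength, distance, n_pixels, pixel_size):
    """Diffraction-limited object resolution r_d = lambda * d / (N * p), in meters."""
    return wavelength * distance / (n_pixels * pixel_size)


def optimal_distance(wavelength, n_pixels, pixel_size):
    """Distance at which r_d equals the pixel size: d_opt = N * p**2 / lambda."""
    return n_pixels * pixel_size ** 2 / wavelength


def object_resolution(wavelength, distance, n_pixels, pixel_size):
    """Overall limit: the coarser of the pixel limit and the diffraction limit."""
    r_d = diffraction_limited_resolution(wavelength, distance, n_pixels, pixel_size)
    return max(pixel_size, r_d)


# Camera parameters quoted in the text.
WAVELENGTH = 632.8e-9   # m
N_PIXELS = 511
PIXEL_SIZE = 9e-6       # m

d_opt = optimal_distance(WAVELENGTH, N_PIXELS, PIXEL_SIZE)
r_used = object_resolution(WAVELENGTH, 0.107, N_PIXELS, PIXEL_SIZE)
print(f"d_opt = {d_opt:.3f} m, r_d at 0.107 m = {r_used * 1e6:.1f} um")
```

At distances shorter than `d_opt` the function returns the pixel size, reflecting that the pixel limit dominates there.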
Fig. 5. Portion of experimental setup when using a relay lens. The holographic sensor observes the image, allowing control of the object resolution by adjusting the magnification.
Using the design approach described above, the object resolution is completely controlled by the sensor pixel size. In order to change the object resolution, one can use relay optics to magnify the object as shown in Fig. 5. The holographic sensor then observes the optical image rather than the object itself. Because Fresnel-Kirchhoff propagation can go either backward or forward, the image may fall beyond the sensor. Also, the image can be either real or virtual. In these terms, the image magnification is $M_i = s_i/s_o$ and the resolution becomes
$$r_o = \frac{r_d}{M_i} = \frac{r_d\,s_o}{s_i},$$
where $r_o$ denotes the resolution in object space.
Of course, the lens must support this resolution. Specifically, its diameter $D_l$ must satisfy $D_l \geq \lambda s_i/p$. Also, the plane reference field must either be mixed with the scattered object light after the lens (as shown in Fig. 5) or the reference field must be prepared such that it has planar wavefronts as it strikes the sensor.
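The relay-lens constraints can be checked in the same way. This is a minimal sketch that assumes the object-space resolution is the sensor-side resolution divided by the magnification; the conjugate distances in the example are hypothetical values chosen only for illustration.

```python
def magnification(s_i, s_o):
    """Image magnification M_i = s_i / s_o for image and object distances."""
    return s_i / s_o


def object_space_resolution(sensor_side_resolution, s_i, s_o):
    """Assumed scaling: resolution at the object = resolution at the image / M_i."""
    return sensor_side_resolution / magnification(s_i, s_o)


def min_lens_diameter(wavelength, s_i, pixel_size):
    """Smallest lens diameter satisfying D_l >= lambda * s_i / p."""
    return wavelength * s_i / pixel_size


# Hypothetical relay: s_o = 40 mm and s_i = 200 mm (about 5x magnification).
wavelength, pixel_size = 632.8e-9, 9e-6
s_o, s_i = 0.040, 0.200
r_obj = object_space_resolution(pixel_size, s_i, s_o)
d_min = min_lens_diameter(wavelength, s_i, pixel_size)
print(f"M_i = {magnification(s_i, s_o):.1f}, r_obj = {r_obj * 1e6:.2f} um, "
      f"D_l >= {d_min * 1e3:.1f} mm")
```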
3.2. Noise and dynamic range
In any imaging system, noise and dynamic range are concerns. Due to the post-processing of the raw image data, the noise properties of this holographic system are not obvious. Here, we describe the effects of additive white Gaussian noise (AWGN) from the sensor, quantization noise, and shot noise.
For AWGN with the same distribution in each raw frame, it can be shown that the noise variance on the reconstructed field $\hat f$ at the sensor is given by
$$\sigma_{fg}^2 = \sigma_{Ig}^2\,\|P\|_F^2,$$
where $\sigma_{Ig}^2$ is the noise variance of measurement $I_j$ and $\|P\|_F$ is the Frobenius norm of $P$. The real and imaginary noise will in general be correlated, but Eq. (13) holds because the noise adds as $n_f = n'_f + i n''_f$ and so $|n_f|^2 = (n'_f)^2 + (n''_f)^2$, where $n_f$ is the actual noise on the field, and $n'_f$ and $n''_f$ are the real and imaginary parts, respectively. From Eq. (15), it is clear that to minimize the noise in $\hat f$, one must minimize $\|P\|_F$. Note that the third column in $R_p$ only serves to force the row-sums of $P$ to be zero, and the actual value in that column is unimportant. In the limit that it goes to infinity, the values in the third row of $R_p^+$ approach zero and $\|R_p^+\|_F = \|P\|_F$. Therefore, because the Moore-Penrose pseudo-inverse is minimum-norm, choosing $P = T R_p^+$ provides the minimum-norm solution of $P R_p = T$ and the minimum-AWGN linear solution for the field $f$.
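The Frobenius-norm relation can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes M ≥ 3 reference fields of equal amplitude R with equally spaced phases, recorded intensities $I_j = |R_j + f|^2$, and the closed-form inversion matrix for that case (rows $\cos\phi_j/(MR)$ and $\sin\phi_j/(MR)$). All function names are hypothetical.

```python
import cmath
import math
import random


def phase_step_inverse(M, R):
    """Return (phases, P) for references R_j = R*exp(i*phi_j), phi_j = 2*pi*j/M.

    P is a 2 x M list of lists whose rows recover Re(f) and Im(f) from the
    intensities. Assumes M >= 3 so that the phase sums cancel.
    """
    phases = [2 * math.pi * j / M for j in range(M)]
    row_re = [math.cos(p) / (M * R) for p in phases]
    row_im = [math.sin(p) / (M * R) for p in phases]
    return phases, [row_re, row_im]


def frobenius_sq(P):
    """Squared Frobenius norm: sum of squares of all entries."""
    return sum(v * v for row in P for v in row)


def reconstruct(P, intensities):
    """Linear estimate of the complex field from M intensity measurements."""
    re = sum(p * i for p, i in zip(P[0], intensities))
    im = sum(p * i for p, i in zip(P[1], intensities))
    return complex(re, im)


def simulated_noise_variance(P, sigma_I, trials=20000, seed=0):
    """Monte Carlo estimate of E|n_f|^2 for i.i.d. Gaussian noise on each I_j."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        noise = [rng.gauss(0.0, sigma_I) for _ in P[0]]
        # The reconstruction is linear, so the field noise is P applied to the noise.
        total += abs(reconstruct(P, noise)) ** 2
    return total / trials


M, R, sigma_I = 4, 2.0, 0.1
f_true = 0.5 + 0.3j
phases, P = phase_step_inverse(M, R)
clean = [abs(R * cmath.exp(1j * p) + f_true) ** 2 for p in phases]
f_hat = reconstruct(P, clean)
predicted = sigma_I ** 2 * frobenius_sq(P)
measured = simulated_noise_variance(P, sigma_I)
print(f"f_hat = {f_hat:.3f}, predicted = {predicted:.3e}, measured = {measured:.3e}")
```

For this particular choice of references the squared norm reduces to $1/(MR^2)$, so stronger reference fields and more frames both reduce the additive-noise contribution.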
The effects of quantization error are very similar to those of AWGN. For typical signals, the quantization noise takes the form of additive uniformly distributed noise. The quantization noise variance at the sensor can be written as
$$\sigma_{Iq}^2 = \left(\frac{I_{\max}}{2^b\,2\sqrt{3}}\right)^2,$$
where $I_{\max}$ is the maximum measurable intensity, $b$ is the number of bits, and the quantity $2\sqrt{3}$ comes from the assumption of a uniform intensity distribution. This assumption need only be valid on the scale of a quantization level $I_{\max}/2^b$. Calculating the quantization noise variance on $\hat f$ is very similar to the AWGN case, leading to
$$\sigma_{fq}^2 = \sigma_{Iq}^2\,\|P\|_F^2.$$
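The factor $2\sqrt{3}$ can be verified with a short simulation. The following sketch assumes an ideal rounding quantizer with step $I_{\max}/2^b$ and intensities drawn uniformly at random over $[0, I_{\max}]$; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random


def quantization_sigma(i_max, bits):
    """Predicted RMS quantization error: I_max / (2**bits * 2*sqrt(3))."""
    return i_max / (2 ** bits * 2 * math.sqrt(3))


def quantize(value, i_max, bits):
    """Round to the nearest multiple of the quantization step I_max / 2**bits."""
    step = i_max / 2 ** bits
    return round(value / step) * step


def simulated_sigma(i_max, bits, samples=100000, seed=1):
    """RMS error of the quantizer over uniformly distributed test intensities."""
    rng = random.Random(seed)
    sq = 0.0
    for _ in range(samples):
        x = rng.uniform(0.0, i_max)
        sq += (quantize(x, i_max, bits) - x) ** 2
    return math.sqrt(sq / samples)


predicted = quantization_sigma(1.0, 8)
measured = simulated_sigma(1.0, 8)
print(f"predicted = {predicted:.3e}, measured = {measured:.3e}")
```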
The shot noise calculation is less simple than either of the cases presented above, largely because shot noise is not additive. The shot noise variance observed on a raw pixel is proportional to the intensity of the light falling on that pixel, with a proportionality constant α which depends on the wavelength and sensor properties. The resulting noise variance is given by
which simplifies considerably in many common cases. Consider the specific case of
in which the combined AWGN, quantization, and shot noise contributions take the form
The corresponding signal-to-noise ratio (SNR) is given by
Equations (20) and (21) may be slightly misleading because they assume that the energy devoted to each frame remains constant as the number of frames increases. If the energy from the object $E_f = M|f|^2$ and from the reference $E_R = MR^2$ is held fixed, then the noise variance becomes
and the SNR becomes
Fig. 6. Simulated $\mathrm{SNR}_E$ as a function of sensor SNR for linear (solid) and nonlinear (dotted) reconstruction. The upper (high $\mathrm{SNR}_E$) lines are for M=3 and the lower (low $\mathrm{SNR}_E$) lines are for M=6. The region to the left is dominated by AWGN and the region to the right by shot noise.
As expected, the $\mathrm{SNR}_E$ contribution from shot noise does not depend on $M$, whereas the contribution from detector AWGN and quantization noise decreases with increasing $M$. Therefore, the best noise performance is achieved with small $M$.
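The dependence on M can be made concrete with a small model. This sketch is an illustration under explicit assumptions rather than a reproduction of Eqs. (20) through (23): it assumes equal-amplitude, equally phase-stepped references (so that $\|P\|_F^2 = 1/(MR^2)$), a shot-noise variance $\alpha I_j$ on each pixel, and an SNR defined as the power ratio $|f|^2/\sigma_f^2$. Under those assumptions the fixed-energy SNR reduces to $E_f E_R / [M\sigma_I^2 + \alpha(E_f + E_R)]$.

```python
def field_noise_variance(M, R, f_abs, sigma_I, alpha):
    """Noise variance on the reconstructed field for phase-stepped references.

    Additive noise contributes sigma_I**2 / (M R^2); shot noise contributes
    alpha * (R^2 + |f|^2) / (M R^2), since the interference terms sum to zero.
    """
    return (sigma_I ** 2 + alpha * (R ** 2 + f_abs ** 2)) / (M * R ** 2)


def snr_fixed_energy(M, E_f, E_R, sigma_I, alpha):
    """Power SNR when the total object energy E_f and reference energy E_R are fixed."""
    f_abs = (E_f / M) ** 0.5
    R = (E_R / M) ** 0.5
    return f_abs ** 2 / field_noise_variance(M, R, f_abs, sigma_I, alpha)


E_f, E_R = 1.0, 9.0
for M in (3, 6):
    awgn_limited = snr_fixed_energy(M, E_f, E_R, sigma_I=0.1, alpha=0.0)
    shot_limited = snr_fixed_energy(M, E_f, E_R, sigma_I=0.0, alpha=1e-3)
    print(f"M={M}: AWGN-limited SNR = {awgn_limited:.1f}, "
          f"shot-limited SNR = {shot_limited:.1f}")
```

In this model the additive-noise-limited power SNR halves when M doubles, a $1/\sqrt{M}$ dependence in amplitude, while the shot-noise-limited value is unchanged.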
Figure 6 shows a plot of simulated $\mathrm{SNR}_E$ vs. sensor $\mathrm{SNR} = I_{\max}/\sigma_{Ig}$ for two values of $M$ using both the linear inversion technique described above and also a nonlinear technique. The nonlinear approach chooses $f$ to minimize the magnitude of $(2R + \mathbf{1} f^T) f - D$ at each pixel. The simulation includes both AWGN and shot noise. In the high-sensor-noise region on the left, the performance is dominated by sensor noise. In that limit, the linear dependence of $\mathrm{SNR}_E$ on the sensor SNR dominates and the $1/\sqrt{M}$ dependence is clearly visible. Also, we see that there is a slight improvement (less than 1 dB) in $\mathrm{SNR}_E$ achieved by using nonlinear reconstruction. At high sensor SNR, performance is dominated by shot noise. In that regime, nonlinear reconstruction continues to provide improved $\mathrm{SNR}_E$, but the $1/\sqrt{M}$ dependence is lost.
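One possible way to carry out the nonlinear fit is a Gauss-Newton iteration on the two unknowns Re(f) and Im(f). The text does not specify the minimization method, so the solver below is an assumption, as are the function names and the definition $D_j = I_j - |R_j|^2$. The field is represented as the real 2-vector $f = (f', f'')$ and each row of $R$ as $(\mathrm{Re}\,R_j, \mathrm{Im}\,R_j)$.

```python
import math


def residuals(R, D, f):
    """r_j = (2 R_j + f) . f - D_j, i.e. one row of (2R + 1 f^T) f - D."""
    return [(2 * rx + f[0]) * f[0] + (2 * ry + f[1]) * f[1] - d
            for (rx, ry), d in zip(R, D)]


def gauss_newton_field(R, D, f0=(0.0, 0.0), iterations=20, tol=1e-14):
    """Least-squares estimate of f = (Re f, Im f) at one pixel."""
    f = list(f0)
    for _ in range(iterations):
        r = residuals(R, D, f)
        # Jacobian rows: d r_j / d f = 2 R_j + 2 f
        J = [(2 * rx + 2 * f[0], 2 * ry + 2 * f[1]) for rx, ry in R]
        # Normal equations (J^T J) delta = -J^T r for the 2x2 system.
        a = sum(jx * jx for jx, _ in J)
        b = sum(jx * jy for jx, jy in J)
        c = sum(jy * jy for _, jy in J)
        gx = sum(jx * rj for (jx, _), rj in zip(J, r))
        gy = sum(jy * rj for (_, jy), rj in zip(J, r))
        det = a * c - b * b
        if det == 0.0:
            break  # degenerate geometry; keep the current estimate
        dx = (-c * gx + b * gy) / det
        dy = (b * gx - a * gy) / det
        f[0] += dx
        f[1] += dy
        if dx * dx + dy * dy < tol:
            break
    return tuple(f)


# Noiseless example with three equally phase-stepped references of amplitude 2.
M, amp = 3, 2.0
R = [(amp * math.cos(2 * math.pi * k / M), amp * math.sin(2 * math.pi * k / M))
     for k in range(M)]
f_true = (0.5, 0.3)
I = [(rx + f_true[0]) ** 2 + (ry + f_true[1]) ** 2 for rx, ry in R]
D = [i - (rx ** 2 + ry ** 2) for i, (rx, ry) in zip(I, R)]
f_fit = gauss_newton_field(R, D)
print(f_fit)
```

With noiseless data the fit returns the true field; any difference from the linear inversion appears only once noise is added to `D`.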
Equations (21) and (23) clearly show that it is always advantageous to use large field strengths for both the reference and object fields. As illustrated in Fig. 7, best performance is achieved by maximizing the dynamic range, given by $4 f_{\max} R_{\max}$, where $f_{\max}$ and $R_{\max}$ are the maximum amplitudes of the object and reference fields, respectively. However, the sensor places a practical upper limit on the field strengths because the total intensity may not exceed $I_{\max}$. Therefore, one must impose the constraint $(f_{\max} + R_{\max})^2 \leq I_{\max}$. As an example, given an object field with fixed strength $f_{\max} = \sqrt{I_{\max}}/4$, one should choose $R_{\max} = 3\sqrt{I_{\max}}/4$, yielding a dynamic range of $3I_{\max}/4$. If both object and reference field strengths can be adjusted, then one should choose $f_{\max} = R_{\max} = \sqrt{I_{\max}}/2$.
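The trade-off between the two field strengths is easy to tabulate. The sketch below assumes the worst-case intensity is $(f_{\max} + R_{\max})^2$, which occurs when the two fields are in phase, and picks the largest reference amplitude that keeps this at or below $I_{\max}$. Function names are illustrative.

```python
import math


def best_reference_amplitude(f_max, i_max):
    """Largest R_max allowed by (f_max + R_max)**2 <= I_max."""
    r_max = math.sqrt(i_max) - f_max
    if r_max < 0:
        raise ValueError("object field alone already exceeds I_max")
    return r_max


def dynamic_range(f_max, r_max):
    """Span of intensities swept by the interference term: 4 * f_max * R_max."""
    return 4 * f_max * r_max


i_max = 1.0
for f_max in (0.25, 0.5):            # amplitudes in units of sqrt(I_max)
    r_max = best_reference_amplitude(f_max, i_max)
    print(f"f_max = {f_max}, R_max = {r_max}, "
          f"dynamic range = {dynamic_range(f_max, r_max)} * I_max")
```

The second case uses the full range of the sensor, which is why equal object and reference amplitudes are preferred when both can be adjusted.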
All of the preceding noise analysis is for the field on a single pixel of the sensor. The final field calculation also includes the Fresnel-Kirchhoff back-propagation. The primary effect of this back-propagation is to “color” the AWGN according to the magnitude of the amplitude transfer function (ATF) $|H(k_x, k_y)| = |\mathcal{F}[K(x,y)]|$. This filtering process can also be thought of as introducing spatial correlations. Fig. 8 shows plots of the ATF as a function of $k_x$ for a 500×500 sensor with 9 µm pixels for several reconstruction distances $z$. In all cases, the high frequencies are attenuated, but for larger $z$ the coloring effect is more extreme. The vertical lines mark the spatial frequencies $k_x = 1/(2r_d)$, where $r_d$ is the resolution derived in Eq. (11).
Figure 9 shows several simulated images corresponding to the same conditions. For larger $z$, the noise is less grainy, suggesting weaker high-frequency components. Obviously, the object field also experiences the same low-pass filtering, leading to worse resolution and a blurry image.
Fig. 7. An illustration of the intensities that may fall on the sensor given an object field of maximum amplitude $f_{\max}$ and maximum reference field amplitude $R_{\max}$. The central region between the dashed lines of height $4f_{\max}R_{\max}$ represents the part of the sensor’s full dynamic range that is actually used.
Fig. 8. Amplitude transfer function (ATF) as a function of spatial frequency $k_x$ for several values of $z$. In each case, the image is 500×500 pixels of 9 µm. The vertical lines show the spatial frequency $k_x = 1/(2r_d)$ corresponding to the predicted resolution at that value of $z$.
The shot noise will likely already have spatial structure because it depends on the object field intensity $|f(x,y)|^2$. Therefore, calculating its final spatial correlations is difficult, although, like the AWGN, it will be (additionally) colored by the ATF.
Fig. 9. Several simulated images corresponding to the ATF plots in Fig. 8. The top row of images shows the entire 500×500 image for each value of $z$, whereas the lower images are 120×120 pixels, showing the central region. Both the noise and the object field are smoothed at increasing $z$.
3.3. Errors in $R_j$
The previous section discussed the result of errors in the measurements $D$ due to noise. It is also likely that the actual values of the reference fields will differ from the desired values. The impact of such error is easily seen by considering an estimated reference matrix $\hat R$, the corresponding inversion matrix $\hat P$, and the resulting estimated field $\hat f$. Equation (5) can be solved using the same measurements $D$ with either the ideal or estimated versions of $R$, $P$, and $f$. By equating both versions, one finds
$$\hat f = (I + E)\,f,$$
where E is a 2×2 error matrix. This matrix is constant for all pixels. In general, it is difficult to predict the value of this matrix given arbitrary R and arbitrary error. However, we present a few common cases here.
If the measured reference fields relate to the true fields as $\hat R = (1+\varepsilon)R$, then $E = -\varepsilon I/(1+\varepsilon)$ and the resulting field is also scaled in amplitude. If all reference fields have a constant phase error $\hat R_j = e^{i\phi} R_j$, then the resulting field measurement is $\hat f = e^{-i\phi} f$.
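The amplitude-scaling case can be reproduced in a few lines. This is a sketch under illustrative assumptions (equal-amplitude references with equally spaced phases, intensities $I_j = |R_j + f|^2$, and a linear phase-stepping inversion); none of the helper names come from the original work.

```python
import cmath
import math


def reconstruct(intensities, ref_amplitude):
    """Linear phase-stepping estimate of f using the assumed reference amplitude."""
    M = len(intensities)
    total = sum(I * cmath.exp(2j * math.pi * k / M)
                for k, I in enumerate(intensities))
    return total / (M * ref_amplitude)


M, R_true, f_true = 4, 2.0, 0.5 + 0.3j
intensities = [abs(R_true * cmath.exp(2j * math.pi * k / M) + f_true) ** 2
               for k in range(M)]

eps = 0.05                                   # relative amplitude error in R-hat
f_ideal = reconstruct(intensities, R_true)
f_hat = reconstruct(intensities, (1 + eps) * R_true)

# E = -eps/(1+eps) times the identity, so the estimate is f / (1 + eps).
E_scalar = -eps / (1 + eps)
print(f_ideal, f_hat, (1 + E_scalar) * f_true)
```

The estimate is scaled by $1/(1+\varepsilon)$ with no change in phase, consistent with a purely diagonal error matrix.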
If the reference fields take the form of Eq. (19) and $\hat R_j = (1+\varepsilon_j)R_j$ with small Gaussian-distributed $\varepsilon_j$, then the RMS of the error matrix is given approximately by
where $\sigma_\varepsilon^2$ is the variance of $\varepsilon_j$. We see that the variance of the error in $f$ depends only on the variance of the error in $R_j$. Also, amplitude error in $R_j$ tends to create amplitude error in $f$, and phase error creates phase error. However, these hold only in the limit that $\sigma_\varepsilon \ll 1$; for larger $\sigma_\varepsilon$, phase and amplitude errors begin to mix.
Fig. 10. Illustration of how the resolving power with a defocus of Δz is the same for holographic reconstruction and traditional imaging with a lens of focal length $F = z/2$ and diameter $D = Np$.
3.4. Depth of field and reconstruction
Because this technique measures the electric field f(x,y) at the sensor and then reconstructs the field g(x,y) at any position z, it is useful to define two quantities: the depth of field (DOF) and depth of reconstruction (DOR). The DOF has a similar meaning to its use in traditional imaging; it is the range over which an object remains (approximately) in focus for a single reconstruction. In contrast, the DOR is the range of z over which a given f(x,y) can be used to reconstruct g(x,y) with good resolution.
The DOR has largely been calculated in Subsection 3.1, where we found that the diffraction-limited resolution varies approximately as $\lambda z/(Np)$ for the lensless case. Therefore, a factor of two loss in resolution occurs at $z = 2Np^2/\lambda$. There is no loss of resolution at $z < Np^2/\lambda$; the resolution is limited by the pixel size in that domain. However, one must use an appropriate diffraction calculation [2] (J. W. Goodman, Introduction to Fourier Optics, 3rd ed., Roberts & Company, Greenwood Village, 2004) and it may be necessary to upsample the image to avoid aliasing in the diffraction kernel.
The depth of field is less obvious. The resolving power of this system for an object placed at $z + \Delta z$, where $z$ is the reconstruction depth, is identical to that of a traditional lens with focal length $F = z/2$ and diameter $D = Np$, as illustrated in Fig. 10. This is because the lens and propagation through distance $z_0$ introduce the same quadratic phase term as back-propagation by a distance $-z_0$ [2]. Therefore, the DOF is roughly given by twice the Rayleigh range of the equivalent lens system,
$$\mathrm{DOF} \approx \frac{\pi r_d^2}{2\lambda}.$$
Over this range, the resolution remains less than approximately $\sqrt{2}\,r_d$. For the experimental results presented in Fig. 2, we find DOF = 0.56 mm.
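For reference, the quoted depth of field can be reproduced numerically if the Rayleigh range is evaluated as $\pi w_0^2/\lambda$ with a waist radius $w_0 = r_d/2$. That choice of waist is an assumption made here because it matches the 0.56 mm figure for $r_d = 15$ µm; the function name is hypothetical.

```python
import math


def depth_of_field(resolution, wavelength):
    """Twice the Rayleigh range pi*w0**2/lambda, with assumed waist w0 = resolution/2."""
    waist = resolution / 2
    return 2 * math.pi * waist ** 2 / wavelength


dof = depth_of_field(15e-6, 632.8e-9)
print(f"DOF = {dof * 1e3:.2f} mm")
```

Because the result scales with the square of the resolution, operating at a larger distance (and hence coarser $r_d$) lengthens the depth of field quickly.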
3.5. Motion-related error
While the technique presented in Subsection 2.2 compensates for constant-velocity linear lateral motion, deviation from that assumption has several effects on the resulting image. The most obvious is the traditional blur that results from an object moving a distance $t_e v_o$ that is long compared to the pixel size $p$, where $t_e$ is the exposure time and $v_o$ is the object velocity. This form of blur affects each frame identically and results in final-image blur that is very similar to the blur in a conventional image. However, in a traditional image, the intensity $I(x,y)$ is blurred, whereas in this case, the field $\hat f(x,y)$ (and $\hat g(x,y)$) is blurred. That is, a traditional image is blurred according to $\bar I(x,y) = I(x,y) * B(x,y)$, where $B(x,y)$ is a blur function. Because the blur is applied to each $I_j(x,y)$ identically, the result is
$$\bar{\hat f}(x,y) = \hat f(x,y) * B(x,y).$$
Fig. 11. MSE of a sample image as a function of defocus (solid) and axial shifts (dashed). The shifts were performed for M=3,6,9, with increasing error for larger M because the overall shift is larger. For defocus, the object was placed at $z = z_0 + \Delta z$. For the shifted images, each frame was acquired with the object at $z_j = z_0 + [j - (M+1)/2]\Delta z$. In all cases, $z_0 = 65$ mm and the image size is 500×500 pixels of 9 µm.
More pervasive are errors in the velocity estimation or deviation from constant-velocity linear motion. Either of these will result in motion-compensation errors such that the individual frames are not properly re-aligned, leading to blur-like errors on the length scale of the frame misalignment. Frame misalignments on the order of the pixel size are inevitable in the simplest approach, where one simply uses the location of the largest value of the cross-correlation to calculate the shift, and then shifts each frame by the nearest integer number of pixels. One can estimate sub-pixel shifts to reduce this error further [6] (R. C. Hardie, K. J. Barnard, J. G. Bognar, E. E. Armstrong, and E. A. Watson, “High-resolution image reconstruction from a sequence of rotated and translated frames and its application to an infrared imaging system,” Opt. Eng. 37, 247–260 (1998)).
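The simplest shift-estimation approach described above can be sketched in one dimension. This is an illustrative stand-in, not the authors' implementation: it uses a brute-force circular cross-correlation between the first and last frames, assumes constant velocity to assign a shift to each intermediate frame, and rounds to whole pixels. All names are hypothetical.

```python
def circular_cross_correlation(a, b):
    """c[s] = sum_i a[i] * b[(i + s) mod n] for every integer lag s."""
    n = len(a)
    return [sum(a[i] * b[(i + s) % n] for i in range(n)) for s in range(n)]


def estimate_shift(first, last):
    """Lag of the cross-correlation peak, with large lags wrapped to negative shifts."""
    corr = circular_cross_correlation(first, last)
    n = len(corr)
    peak = max(range(n), key=lambda s: corr[s])
    return peak if peak < n - n // 2 else peak - n


def per_frame_shifts(total_shift, n_frames):
    """Integer shift for each frame, assuming constant velocity between frames."""
    return [round(j * total_shift / (n_frames - 1)) for j in range(n_frames)]


# A small test pattern and a copy displaced by 3 pixels (circularly).
first = [0, 0, 1, 4, 9, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
last = first[-3:] + first[:-3]
total = estimate_shift(first, last)
print(total, per_frame_shifts(total, 4))
```

With three frames and a 3-pixel total displacement, the middle frame would ideally move 1.5 pixels, so integer rounding leaves a half-pixel residual; this is the kind of misalignment that sub-pixel estimation reduces.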
Another important consideration for velocity estimation is that the first and last frames should be, as much as possible, shifted versions of each other. Specifically, a fixed background in addition to the moving object, or large parts of the object falling out of the field of view (or entering it) between frames will confuse the cross-correlation and lead to errors in the velocity estimation. The algorithm is quite robust, but when it fails, it often fails catastrophically, leaping to a peak far from the correct one. For example, a fixed background and moving object will result in two strong peaks which may be far apart, and the algorithm will tend to “hop” between them.
As mentioned above, the process described here does not estimate or compensate for axial motion. However, axial translation introduces errors similar to those produced by simple defocus.
Figure 11 shows the simulated mean square error (MSE) of a sample image as a function of defocus (solid) and axial shifts (dashed). Several values of $M$ are shown and, because the total shift increases with $M$, the MSE is also larger for larger $M$. This figure demonstrates that in order to avoid losing resolution as a result of axial motion, one must keep the axial translation small compared to the DOF.