2. Principle
Figure 1 shows a schematic diagram of the proposed method. The reference and signal 3D objects are imaged by lens arrays. Each lens in a lens array forms its own image of the object space. Each lens of the lens array is referred to as an elemental lens, and the image it forms is referred to as an elemental image. Thus, an elemental image is an ordinary picture of the object space. Both the captured perspective of the object and its position in the elemental image depend on where the corresponding elemental lens is relative to the object. The elemental images formed by the lens array are captured by charge-coupled devices (CCDs). The captured elemental image arrays of the reference and signal objects are each digitally transformed into sub-image arrays. Selected sub-images of the reference object are then correlated with every sub-image of the signal object by means of a conventional joint transform correlator (JTC) scheme, which yields the 3D shift and out-of-plane rotation of the signal object with respect to the reference object.
The key distinguishing feature of the proposed method is that the correlation is performed on sub-images rather than on elemental images. A sub-image is the collection of pixels at the same position in all of the elemental images or, equivalently, at the same position relative to the optic axis of the corresponding elemental lens [9, 10].
Figure 2 illustrates how sub-images are generated. Fig. 2(a) shows five elemental lenses and hence five elemental images. The pixels at the same location in every elemental image are collected to form the corresponding sub-image. For example, the pixels at the blue-dot positions in all elemental images form one sub-image, and the pixels at the green-dot positions form another. Since Fig. 2(a) contains five elemental images, each sub-image consists of five pixels.
Figure 2(b) shows the 2D case. The pixels located at [1,1] in every elemental image (Fig. 2(b) shows 6(H)×4(V) elemental images) are collected to form the [1,1]th sub-image, and the pixels at [i,j] form the [i,j]th sub-image. Each sub-image in Fig. 2(b) consists of 6(H)×4(V) pixels, since there are 6(H)×4(V) elemental images.
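The sub-image generation described above is a pure reshuffling of pixels, so it can be sketched in a few lines of NumPy. The sketch assumes, for illustration only, that the captured elemental image array is a single 2D array in which every elemental image occupies an aligned `p`×`p` pixel block; the function name `elemental_to_subimages` is hypothetical.

```python
import numpy as np


def elemental_to_subimages(eia: np.ndarray, pixels_per_lens: int) -> np.ndarray:
    """Rearrange an elemental image array into a sub-image array.

    eia: 2D array of shape (n_v * p, n_h * p); each aligned p x p block is one
         elemental image (p = pixels_per_lens, n_v x n_h elemental lenses).
    Returns an array of shape (p, p, n_v, n_h) whose entry [i, j] is the
    [i, j]th sub-image: pixel (i, j) taken from every elemental image.
    """
    p = pixels_per_lens
    rows, cols = eia.shape
    if rows % p or cols % p:
        raise ValueError("array size must be a multiple of pixels_per_lens")
    n_v, n_h = rows // p, cols // p
    # Axes after reshape: (lens row, pixel row, lens column, pixel column).
    # Moving the two pixel axes to the front makes them index the sub-images.
    return eia.reshape(n_v, p, n_h, p).transpose(1, 3, 0, 2)


if __name__ == "__main__":
    # 4(V) x 6(H) lenses with 3 x 3 pixels each -> 3 x 3 sub-images of 4 x 6 pixels
    eia = np.arange(12 * 18).reshape(12, 18)
    sub = elemental_to_subimages(eia, 3)
    print(sub.shape)  # (3, 3, 4, 6)
```

With the parameters of Section 3 (50×50 lenses, 20×20 pixels each), the input would be a 1000×1000 array and the output 20×20 sub-images of 50×50 pixels each. Using `reshape` plus `transpose` returns a view, so no pixel data is copied.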
Fig. 2. Sub-image (a) geometry and (b) generation
The sub-image has two useful features that the proposed method exploits to realize a 3D correlation. The first is that each sub-image corresponds to a specific angle at which the object is observed, regardless of the 3D position of the object. For example, the $i$-th sub-image in Fig. 2(a) (the collection of blue dots) contains the perspective of the object observed at the angle $\theta_{\mathrm{sub},i}$ given by Eq. (1), where $y_i$ is the position of the $i$-th pixel with respect to the optic axis of the corresponding elemental lens. In an ordinary imaging system, by contrast, the observation angle is determined by the position of the imaging lens relative to the object. In the sub-image, this dependence of the observation angle on the object position is removed.
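A minimal sketch of how a pixel position maps to an observation angle, assuming a paraxial pinhole model in which the chief ray from a pixel at $y_i$ passes through the center of its elemental lens and the image plane lies a distance $g$ behind the lens array. Under this assumption $\theta_{\mathrm{sub},i} = \tan^{-1}(y_i/g)$, with the sign chosen to agree with the expression $u = (y + z\tan\theta)/\varphi$ used later in this section; the model, the symbol $g$, and the numbers below are illustrative and not necessarily the exact form of Eq. (1). What the sketch does show is that the angle depends only on the pixel index inside the elemental image and never on where the object is.

```python
import math


def subimage_angle_deg(pixel_index: int, pixels_per_lens: int,
                       pixel_pitch: float, gap: float) -> float:
    """Observation angle of a sub-image under an assumed pinhole model.

    pixel_index:  0 .. pixels_per_lens - 1, pixel position inside each elemental image
    pixel_pitch:  s, pixel pitch at the image plane (mm)
    gap:          assumed distance g from the lens array to the image plane (mm)
    """
    # y_i: offset of the pixel center from the elemental-lens optic axis,
    # assuming the optic axis passes through the center of the elemental image.
    y_i = (pixel_index - (pixels_per_lens - 1) / 2) * pixel_pitch
    return math.degrees(math.atan(y_i / gap))


if __name__ == "__main__":
    # Illustrative values: 20 pixels per lens, s = 0.05 mm, g = 3.3 mm.
    # No object coordinate appears anywhere in the computation.
    angles = [subimage_angle_deg(i, 20, 0.05, 3.3) for i in range(20)]
    print(f"{angles[0]:.2f} .. {angles[-1]:.2f} deg")
```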
Figure 3 illustrates this point. In the ordinary imaging system shown in Fig. 3(a), the captured perspective of the object changes as the object moves from position 1 to position 2. When the object is at position 1, the imaging lens observes it at an angle $\theta_{\mathrm{observation}}$, and the corresponding oblique perspective of the object is captured. When the object is at position 2, in contrast, the imaging lens faces the object at 0°, and the center perspective of the object is captured. In the sub-image, however, the perspective contained in each sub-image is the same regardless of the object shift, as shown in Fig. 3(b): the sub-image corresponding to the red pixels observes the object at 0°, and the sub-image corresponding to the blue pixels observes it at $\theta_{\mathrm{sub},i}$, for both positions 1 and 2. This angle invariance makes it possible to select a given observation angle deterministically, regardless of the object position.
Fig. 3. Observation-angle invariance of the sub-image: (a) ordinary image (or elemental image) and (b) sub-image
Another useful property of the sub-image is that the perspective size is invariant regardless of the object depth. In an ordinary imaging system, the perspective size is inversely proportional to the object depth, so if the object moves farther from the imaging lens, as shown in Fig. 4(a), its perspective in the captured image becomes smaller. In the sub-image, however, the object is sampled along parallel lines with a period equal to the elemental lens pitch $\varphi$, as shown in Fig. 2, so the size of the object perspective in the sub-image is constant. When the object depth changes, only the position of the object perspective in each sub-image changes; its size does not. For example, suppose that an object whose transverse size covers 5 elemental lenses is imaged by the lens array shown in Fig. 4(b). The size of the object perspective in a sub-image is determined by the number of that sub-image's parallel lines that intersect the object. In Fig. 4(b), it is 5 pixels for the sub-image corresponding to the red dots and 6 pixels for the sub-image corresponding to the blue dots. When the object moves longitudinally, as shown in the lower diagram of Fig. 4(b), the number of parallel lines intersecting the object is still 5 for the red dots and 6 for the blue dots, so the size of the object perspective in those sub-images does not change. Only the position of the object perspective changes: by 2 pixels in the sub-image corresponding to the blue dots and by 0 pixels in the sub-image corresponding to the red dots in Fig. 4(b). This size invariance removes the need for scale-invariant detection techniques such as the Mellin transform, even when the signal object shifts in depth.
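The size invariance can be checked with a one-dimensional sketch, again under an assumed pinhole model: pixel $k$ of a sub-image with observation angle $\theta$ samples the object-space position $y = k\varphi - z\tan\theta$ at depth $z$, which is the sampling implied by $u = (y + z\tan\theta)/\varphi$ later in this section. The object extent, angles, and focal length below are hypothetical values. The number of sub-image pixels covering an object of width $w$ stays at about $w/\varphi$ for every depth (up to the one-pixel sampling rounding that also makes the red and blue sub-images in Fig. 4(b) differ), whereas the single-lens image size $wf/z$ shrinks with depth.

```python
import math


def subimage_footprint(y_min: float, y_max: float, z: float,
                       tan_theta: float, pitch: float) -> list[int]:
    """Indices k of the sub-image pixels whose sampling line hits the object.

    Assumed model: pixel k of a sub-image with observation angle theta samples
    the object-space position y = k * pitch - z * tan_theta at depth z.
    """
    shift = z * tan_theta
    k_lo = math.ceil((y_min + shift) / pitch)
    k_hi = math.floor((y_max + shift) / pitch)
    return list(range(k_lo, k_hi + 1))


def ordinary_image_size(width: float, z: float, focal: float) -> float:
    """Image size through a single lens, using magnification ~ focal / z (z >> focal)."""
    return width * focal / z


if __name__ == "__main__":
    # Hypothetical object spanning y = 0.3 .. 5.1 mm, lens pitch 1 mm, f = 3.3 mm
    for z in (25.0, 35.0, 45.0):
        center = subimage_footprint(0.3, 5.1, z, 0.0, 1.0)   # theta = 0 sub-image
        oblique = subimage_footprint(0.3, 5.1, z, 0.1, 1.0)  # tan(theta) = 0.1
        print(z, center, oblique, round(ordinary_image_size(4.8, z, 3.3), 3))
```

In this example both footprints keep five pixels at every depth; the oblique one slides by one pixel per 10 mm of depth ($z\tan\theta/\varphi$), and the center one does not move at all, while the single-lens image shrinks from about 0.63 mm at 25 mm to about 0.35 mm at 45 mm.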
Fig. 4. Size-invariance of the sub-image: (a) ordinary image (b) sub-image
Using these two features of the sub-image, i.e., size invariance and angle invariance, the 3D shift and the out-of-plane rotation can be detected with a JTC scheme as follows. Suppose that the reference object is located at $(y_r, z_r)$ and the signal object at $(y_s, z_s)$, as shown in Fig. 2(a). First, for simplicity, assume that the signal object has no out-of-plane rotation, i.e., $\theta_{y\text{-}z} = 0^\circ$ in Fig. 2(a). Since there is no out-of-plane rotation, the perspective of the object contained in the $i$-th sub-image of the signal object is the same as that contained in the $i$-th sub-image of the reference object. This holds irrespective of where the signal object is located relative to the reference object, owing to the observation-angle invariance of the sub-image. Moreover, the perspectives in these two sub-images have the same size, owing to the size invariance. The position of the perspective in the $i$-th sub-image is $u_{r,i} = (1/\varphi)(y_r + z_r\tan\theta_{\mathrm{sub},y\text{-}z,i})$ for the reference object and $u_{s,i} = (1/\varphi)(y_s + z_s\tan\theta_{\mathrm{sub},y\text{-}z,i})$ for the signal object. Their position difference $\Delta u_{r,i,s,i}$ can be written as

$$\Delta u_{r,i,s,i} = u_{s,i} - u_{r,i} = \frac{1}{\varphi}\left[(y_s - y_r) + (z_s - z_r)\tan\theta_{\mathrm{sub},y\text{-}z,i}\right] \tag{2}$$
Since the $i$-th sub-images of the reference and signal objects contain the same perspective of the object at the same size, the position difference $\Delta u_{r,i,s,i}$ can be detected by correlating these two sub-images with the JTC. In Eq. (2), only $y_s$ and $z_s$ are unknown; thus, the 3D shift of the signal object can be found from two correlation operations with two different values of $i$.
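Because $\Delta u = [(y_s - y_r) + (z_s - z_r)\tan\theta_{\mathrm{sub},i}]/\varphi$, which follows from the expressions for $u_{r,i}$ and $u_{s,i}$ above, is linear in the two unknowns, two correlation measurements at different observation angles determine them directly. The sketch below is illustrative: the function names are hypothetical, angles are in radians, $\Delta u$ is in sub-image pixels with the sign convention $\Delta u = u_{s,i} - u_{r,i}$, and the peak positions are assumed to have already been read off the JTC output.

```python
import math


def delta_u_eq2(y_s, z_s, y_r, z_r, theta, pitch):
    """Forward model: offset u_s,i - u_r,i in sub-image pixels (no rotation)."""
    return ((y_s - y_r) + (z_s - z_r) * math.tan(theta)) / pitch


def solve_shift_eq2(du_a, theta_a, du_b, theta_b, y_r, z_r, pitch):
    """Recover (y_s, z_s) from offsets measured in two sub-image pairs."""
    ta, tb = math.tan(theta_a), math.tan(theta_b)
    if math.isclose(ta, tb):
        raise ValueError("the two pairs need different observation angles")
    dz = pitch * (du_a - du_b) / (ta - tb)   # z_s - z_r
    dy = pitch * du_a - dz * ta               # y_s - y_r
    return y_r + dy, z_r + dz


if __name__ == "__main__":
    # Hypothetical geometry: reference at (0, 25) mm, signal at (5, 40) mm, pitch 1 mm
    th_a, th_b = math.atan(0.1), math.atan(-0.12)
    du_a = delta_u_eq2(5.0, 40.0, 0.0, 25.0, th_a, 1.0)
    du_b = delta_u_eq2(5.0, 40.0, 0.0, 25.0, th_b, 1.0)
    print(solve_shift_eq2(du_a, th_a, du_b, th_b, 0.0, 25.0, 1.0))
```

Choosing the two sub-images far apart in angle makes $\tan\theta_a - \tan\theta_b$ large, which reduces the sensitivity of $z_s$ to errors in the measured peak positions.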
When the signal object has an out-of-plane rotation $\theta_{y\text{-}z}$, the 3D shift cannot be found by correlating the reference and signal sub-images with the same index, because they will, in general, contain different perspectives of the object. In this case, the sub-image pair that contains the same perspective of the object must be found first; in other words, the out-of-plane rotation must be detected first. The 3D shift can then be found taking the out-of-plane rotation into account. The out-of-plane rotation angle $\theta_{y\text{-}z}$ of the signal object is detected by correlating one arbitrarily chosen sub-image of the reference object with every sub-image of the signal object in turn. Among these, the sub-image pair yielding the strongest correlation peak satisfies $\theta_{\mathrm{sub},y\text{-}z,i} - \theta_{\mathrm{sub},y\text{-}z,j} = \theta_{y\text{-}z}$, where $\theta_{\mathrm{sub},y\text{-}z,i}$ is the observation angle of the $i$-th sub-image of the reference object and $\theta_{\mathrm{sub},y\text{-}z,j}$ is that of the $j$-th sub-image of the signal object, since these two sub-images contain the same perspective of the object. Therefore, the out-of-plane rotation angle $\theta_{y\text{-}z}$ is detected by finding the sub-image pair that produces the strongest correlation peak. Once $\theta_{y\text{-}z}$ is detected, the 3D position of the signal object can be detected by correlating two more sub-image pairs, as in the no-rotation case. Here, however, the $i$-th sub-image of the reference object is correlated with the $j$-th sub-image of the signal object satisfying $\theta_{\mathrm{sub},y\text{-}z,i} - \theta_{\mathrm{sub},y\text{-}z,j} = \theta_{y\text{-}z}$, since these contain the same perspective. The position difference $\Delta u_{r,i,s,j}$ between the object perspectives in the $i$-th reference sub-image and the $j$-th signal sub-image is given by

$$\Delta u_{r,i,s,j} = u_{s,j} - u_{r,i} = \frac{1}{\varphi}\left[(y_s - y_r) + z_s\tan\left(\theta_{\mathrm{sub},y\text{-}z,i} - \theta_{y\text{-}z}\right) - z_r\tan\theta_{\mathrm{sub},y\text{-}z,i}\right] \tag{3}$$
Therefore, the 3D position of the signal object can be found from Eq. (3) by selecting two sub-image pairs, each consisting of a reference sub-image with observation angle $\theta_{\mathrm{sub},y\text{-}z,i}$ and a signal sub-image with observation angle $\theta_{\mathrm{sub},y\text{-}z,i} - \theta_{y\text{-}z}$, and measuring the positions of their correlation peaks.
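The rotated case can be sketched the same way. Assume Eq. (3) has the form $\Delta u_{r,i,s,j} = [(y_s - y_r) + z_s\tan\theta_{\mathrm{sub},j} - z_r\tan\theta_{\mathrm{sub},i}]/\varphi$, obtained by applying the expressions for $u_{r,i}$ and $u_{s,j}$ to the $i$-th reference and $j$-th signal sub-images. Moving the known reference terms to one side leaves a linear system in $(y_s, z_s)$. Passing the actual observation angles of the chosen sub-images, rather than an idealized $\theta_{\mathrm{sub},i} - \theta_{y\text{-}z}$, lets the sketch absorb the discreteness of the sub-image angles. The function names and numbers are hypothetical.

```python
import math


def delta_u_eq3(y_s, z_s, y_r, z_r, theta_ref, theta_sig, pitch):
    """Assumed forward model: offset u_s,j - u_r,i in sub-image pixels."""
    return ((y_s - y_r) + z_s * math.tan(theta_sig)
            - z_r * math.tan(theta_ref)) / pitch


def solve_position_eq3(pairs, y_r, z_r, pitch):
    """pairs: two (du, theta_ref, theta_sig) tuples, one per correlated pair."""
    (du_a, tr_a, ts_a), (du_b, tr_b, ts_b) = pairs
    # Known reference terms moved to the left: c = y_s + z_s * tan(theta_sig)
    c_a = pitch * du_a + y_r + z_r * math.tan(tr_a)
    c_b = pitch * du_b + y_r + z_r * math.tan(tr_b)
    ta, tb = math.tan(ts_a), math.tan(ts_b)
    if math.isclose(ta, tb):
        raise ValueError("the two signal sub-images need different observation angles")
    z_s = (c_a - c_b) / (ta - tb)
    y_s = c_a - z_s * ta
    return y_s, z_s


if __name__ == "__main__":
    rot = math.radians(4.0)               # detected out-of-plane rotation (hypothetical)
    pairs = []
    for tr in (math.atan(0.1), math.atan(-0.1)):
        ts = tr - rot                     # signal sub-image with the same perspective
        pairs.append((delta_u_eq3(5.0, 40.0, 0.0, 25.0, tr, ts, 1.0), tr, ts))
    print(solve_position_eq3(pairs, 0.0, 25.0, 1.0))
```

With `rot = 0` each pair uses the same angle for reference and signal, and the computation reduces to the no-rotation case.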
Figure 5 shows the overall procedure used in the proposed method.
Fig. 5. Procedure for detecting out-of-plane rotation and 3D shift
In the detection of out-of-plane rotation, the minimum angle that the proposed method can resolve is determined by the difference between the observation angles of neighboring sub-images. Specifically, the angular resolution is $\Delta\theta = \theta_{\mathrm{sub},i+1} - \theta_{\mathrm{sub},i}$. Since the observation angle $\theta_{\mathrm{sub},i}$ of the $i$-th sub-image is given by Eq. (1), the angular resolution $\Delta\theta$ becomes
where $s$ is the pixel pitch at the image plane of the lens array. The angular range $\Omega$ that the proposed method can detect is determined by the range of the observation angles of the sub-images (the range of $\theta_{\mathrm{sub}}$). Since $y_i$ is restricted to $-\varphi/2 < y_i < \varphi/2$, the angular range $\Omega$ becomes
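To put rough numbers on $\Delta\theta$ and $\Omega$, the sketch below assumes a paraxial pinhole model with $\theta_{\mathrm{sub},i} = \tan^{-1}(y_i/g)$, where $g$ is the lens-array-to-image-plane distance, taken here as the 3.3 mm focal length of Section 3; this model and the symbol $g$ are assumptions and not necessarily the exact form of Eq. (1). Under it, $\Delta\theta = \tan^{-1}[(y_i+s)/g] - \tan^{-1}(y_i/g) \approx s/g$ near the axis, and $\Omega = 2\tan^{-1}[\varphi/(2g)]$. With $\varphi$ = 1 mm and 20 pixels per elemental image, $s$ = 0.05 mm if each elemental image exactly fills one lens pitch, which is a further assumption.

```python
import math


def angular_resolution_deg(y_i: float, s: float, g: float) -> float:
    """Angle between sub-images at pixel positions y_i and y_i + s (pinhole model)."""
    return math.degrees(math.atan((y_i + s) / g) - math.atan(y_i / g))


def angular_range_deg(pitch: float, g: float) -> float:
    """Angle spanned by pixel positions -pitch/2 < y_i < pitch/2 (pinhole model)."""
    return math.degrees(2.0 * math.atan(pitch / (2.0 * g)))


if __name__ == "__main__":
    pitch, g, pixels = 1.0, 3.3, 20     # lens pitch, assumed gap, pixels per lens
    s = pitch / pixels                  # assumes an elemental image fills one pitch
    print(f"resolution near axis: {angular_resolution_deg(-s / 2, s, g):.3f} deg")
    print(f"resolution at edge:   {angular_resolution_deg(pitch / 2 - s, s, g):.3f} deg")
    print(f"paraxial s/g:         {math.degrees(s / g):.3f} deg")
    print(f"angular range:        {angular_range_deg(pitch, g):.2f} deg")
```

With these assumed values the resolution is roughly 0.87° near the axis and slightly finer toward the edge of the elemental image, and the range is roughly 17°, which would contain the 0° to 6° rotations examined in Section 3.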
3. Experimental results
In the experiment, we used a 3D object consisting of two man-dolls, longitudinally separated by 30 mm, as the reference and signal objects, as shown in Fig. 6. The two man-dolls can be regarded as the two extreme ends of one 3D object. We captured this 3D object with a lens array consisting of 50×50 rectangular elemental lenses with a 3.3 mm focal length and a 1 mm lens pitch. The captured elemental images of the reference and signal objects were transformed into sub-image arrays, as shown in Fig. 6. In our experiment, each elemental image consisted of 20×20 CCD pixels, and thus 20×20 sub-images were generated for each of the reference and signal objects. Fig. 6 shows that each sub-image contains the corresponding perspective of the reference or signal object. Notably, the perspectives have the same size in the sub-images of the reference and signal objects even though their longitudinal positions $z_r$ and $z_s$ differ, which confirms the size invariance of the sub-image. To verify the out-of-plane rotation detection capability, we fixed the reference object at $(x_r, y_r, z_r)$ = (0 mm, 0 mm, 25 mm) and rotated the signal object, located at $(x_s, y_s, z_s)$ = (5 mm, 0 mm, 40 mm), by $\theta_{x\text{-}z}$ = 0°, 2°, 4°, and 6° with $\theta_{y\text{-}z}$ = 0°. One reference sub-image was correlated with each sub-image of the signal object. In the correlation operation, the joint power spectrum (JPS) was obtained optically using a He-Ne laser, a spatial light modulator (SLM) with a 0.036 mm pixel pitch, and a CCD; the JPS was then Fourier transformed digitally to produce the correlation peak.
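Because the JPS is Fourier transformed digitally, the whole correlation step can also be simulated in software to see where the peaks appear. The sketch below replaces the optical JPS capture (He-Ne laser, SLM, CCD) with an FFT, so it is a swapped-in digital simulation rather than the experimental setup; the side-by-side layout, the separation of two sub-image widths, and the function name `jtc_shift` are illustrative choices. The cross-correlation peak lands at the known input separation plus the relative shift of the two sub-images, and that shift plays the role of $\Delta u$ in Eqs. (2) and (3).

```python
import numpy as np


def jtc_shift(reference: np.ndarray, signal: np.ndarray) -> tuple[int, int]:
    """Estimate the (row, column) shift of `signal` relative to `reference`
    with a digitally simulated joint transform correlator."""
    if reference.shape != signal.shape:
        raise ValueError("reference and signal must have the same shape")
    h, w = reference.shape
    sep = 2 * w                         # known horizontal separation of the inputs
    H, W = 2 * h, 6 * w                 # zero padding: no circular wrap-around
    plane = np.zeros((H, W))
    plane[:h, :w] = reference           # joint input plane, reference on the left
    plane[:h, sep:sep + w] = signal     # signal on the right

    jps = np.abs(np.fft.fft2(plane)) ** 2            # joint power spectrum
    # Second Fourier transform -> correlation plane, zero lag moved to (H//2, W//2).
    # The JPS is real and even, so fft2 and ifft2 differ only by a scale factor here.
    corr = np.fft.fftshift(np.abs(np.fft.fft2(jps)))

    # Lags tau_x >= w contain only the reference-signal cross-correlation term;
    # the on-axis autocorrelation terms stay within |tau_x| < w.
    region = corr[:, W // 2 + w:]
    ky, kx = np.unravel_index(np.argmax(region), region.shape)
    tau_y, tau_x = int(ky) - H // 2, int(kx) + w
    return tau_y, tau_x - sep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pattern = rng.random((4, 6)) + 0.1
    ref = np.zeros((16, 16))
    ref[5:9, 4:10] = pattern
    sig = np.zeros((16, 16))
    sig[7:11, 7:13] = pattern           # same pattern shifted by (2, 3) pixels
    print(jtc_shift(ref, sig))
```

Separating the inputs by two widths keeps the cross-correlation terms clear of the strong on-axis autocorrelation term, the same reason a physical JTC places the reference and signal apart in the input plane.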
Figure 7 shows an example of the JPS captured on the CCD and the correlation peak obtained by Fourier transforming the JPS digitally. The correlation peak intensity as a function of the signal sub-image index is plotted in Fig. 8; each correlation peak was normalized by the energy of the corresponding signal sub-image. Fig. 8 shows that the peak intensity profile correctly reflects the out-of-plane rotation of the signal object. By finding the signal sub-image that yields the maximum correlation peak, the out-of-plane rotation angle $(\theta_{x\text{-}z}, \theta_{y\text{-}z})$ can, in principle, be detected. In our experiment, the detection was somewhat prone to errors owing to the insufficient resolution of each elemental image and to illumination conditions and shading that changed with the rotation. Nevertheless, as shown in Fig. 8, the profile still follows the correct trend, and the detection could be improved considerably if more robust optical correlation methods were used.
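The rotation read-out described above (normalize each peak by the signal sub-image energy, take the maximum, convert sub-image indices to angles) can be sketched briefly. Assumptions: the peak intensities for all signal sub-images are already in a 2D array indexed by (vertical, horizontal) sub-image index; the observation angle of each sub-image row and column is known, for example from Eq. (1); horizontal indices map to $\theta_{x\text{-}z}$ and vertical ones to $\theta_{y\text{-}z}$; and the sign follows $\theta_{\mathrm{sub},i} - \theta_{\mathrm{sub},j}$ = rotation, as in Section 2. The function name and numbers are hypothetical.

```python
import numpy as np


def detect_rotation(peaks, energies, theta_h, theta_v, ref_index):
    """Return (theta_xz, theta_yz) from normalized correlation-peak intensities.

    peaks[j_v, j_h]:    peak intensity for the reference sub-image at ref_index
                        vs. signal sub-image (j_v, j_h)
    energies[j_v, j_h]: energy of each signal sub-image (normalization)
    theta_h, theta_v:   observation angles of sub-image columns / rows
    """
    score = np.asarray(peaks, dtype=float) / np.asarray(energies, dtype=float)
    j_v, j_h = np.unravel_index(np.argmax(score), score.shape)
    i_v, i_h = ref_index
    # theta_sub,i - theta_sub,j = rotation, applied separately to each axis
    return float(theta_h[i_h] - theta_h[j_h]), float(theta_v[i_v] - theta_v[j_v])


if __name__ == "__main__":
    theta = np.linspace(-8.19, 8.19, 20)          # hypothetical angles in degrees
    peaks = np.ones((20, 20))
    energies = np.ones((20, 20))
    peaks[10, 7] = 5.0                            # best-matching perspective
    peaks[3, 3], energies[3, 3] = 50.0, 100.0     # bright sub-image, weaker match
    print(detect_rotation(peaks, energies, theta, theta, ref_index=(10, 10)))
```

The energy normalization is what keeps the bright but poorly matching sub-image at index (3, 3) from winning on raw peak height alone.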
Fig. 6. Examples of experimentally obtained elemental images and sub-images
Fig. 7. Example of (a) JPS captured by CCD and (b) correlation peak calculated by Fourier transforming the captured JPS digitally.
Fig. 8. Experimental result: intensity profile of the correlation peaks between one sub-image of a reference object located at $(x_r, y_r, z_r)$ = (0 mm, 0 mm, 25 mm) and each sub-image of a signal object located at $(x_s, y_s, z_s)$ = (5 mm, 0 mm, 40 mm) with $\theta_{x\text{-}z}$ = 0°, 2°, 4°, and 6° and $\theta_{y\text{-}z}$ = 0°
After the rotation angle is detected, the 3D location of the signal object is determined from the correlation peak positions of two sub-image pairs using Eq. (3).
Figures 9 and 10 show the detected positions of the correlation peak for various locations of the signal object, without and with out-of-plane rotation, respectively. Equations (2) and (3) indicate that the line relating $\tan\theta_{\mathrm{sub},x\text{-}z,i}$ and $\Delta u$ has a slope corresponding to $(z_s - z_r)$ and a $\Delta u$-offset corresponding to $(x_s - x_r)$ and, when the signal object is rotated, also to $z_s$. The experimental results in Figs. 9 and 10 clearly demonstrate this. The slope increases as the signal object moves longitudinally farther from the reference object (see the second to sixth graphs in Figs. 9 and 10), and the $\Delta u$-offset reflects $(x_s - x_r)$ in the case of no rotation (see the first graph in Fig. 9) or both $(x_s - x_r)$ and $z_s$ in the case of rotation (see every graph in Fig. 10). These results strongly support the 3D shift detection capability of the proposed method.
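With more than two sub-image pairs, the slope and offset can be estimated with a least-squares line fit instead of an exact two-point solution. The following sketch covers the no-rotation case of Eq. (2), written for the x-z plane used in the experiment: $\Delta u = [(x_s - x_r) + (z_s - z_r)\tan\theta_{\mathrm{sub},x\text{-}z,i}]/\varphi$, so the fitted slope times $\varphi$ estimates $z_s - z_r$ and the fitted offset times $\varphi$ estimates $x_s - x_r$. The least-squares fit via `numpy.polyfit`, the function name, and the synthetic numbers are illustrative choices.

```python
import numpy as np


def fit_shift_line(tan_theta, delta_u, pitch):
    """Least-squares fit delta_u = a * tan_theta + b (no-rotation case).

    Returns (z_s - z_r, x_s - x_r) estimated as (a * pitch, b * pitch).
    """
    a, b = np.polyfit(np.asarray(tan_theta, dtype=float),
                      np.asarray(delta_u, dtype=float), 1)
    return float(a * pitch), float(b * pitch)


if __name__ == "__main__":
    # Synthetic offsets for x_s - x_r = 5 mm, z_s - z_r = 15 mm, pitch 1 mm,
    # rounded to whole sub-image pixels as a crude stand-in for peak detection.
    tan_theta = np.tan(np.radians(np.linspace(-8.0, 8.0, 20)))
    delta_u = np.round((5.0 + 15.0 * tan_theta) / 1.0)
    print(fit_shift_line(tan_theta, delta_u, 1.0))
```

Fitting over all sub-image pairs averages out the whole-pixel quantization of individual peak positions, which a two-pair solution cannot do.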
Fig. 9. Experimental result: detected positions of the correlation peak for various locations of the signal object when the reference object is located at $(x_r, y_r, z_r)$ = (0 mm, 0 mm, 25 mm) and the signal object has no out-of-plane rotation.
Fig. 10. Experimental result: detected positions of the correlation peak for various locations of the signal object when the reference object is located at $(x_r, y_r, z_r)$ = (0 mm, 0 mm, 25 mm) and the signal object has a rotation of $\theta_{x\text{-}z}$ = 4° and $\theta_{y\text{-}z}$ = 0°.