1. INTRODUCTION
In automated vision applications, increasing the magnification of the imaging system always improves the resolution eventually achieved [
1. H. S. Cho, Optomechatronics: Fusion of Optical and Mechatronic Engineering (CRC Press, 2006).
]. In the case of stereovision, however, the geometrical configuration is cumbersome and forbids the use of short-working-distance lenses. The allowed optical magnification is thus limited to relatively low values, even when using expensive extra-long-working-distance lenses. Furthermore, high magnification implies a short depth of field, which is a limitation in three-dimensional (3D) experiments. In this context, achieving subpixel resolution is an alternative way to increase the performance of a vision system without changing the specifications of the imaging device. By feeding the 3D geometrical model of a stereovision setup with more accurate two-dimensional (2D) data resulting from subpixel evaluation, the resolution of the reconstructed 3D data is also improved. This principle was introduced in a previous work on the monitoring of the 3D position and orientation of a labeled target [
2. N. A. Arias H., P. Sandoz, J. E. Meneses, M. A. Suarez, and T. Gharbi, “3D localization of a labeled target by means of a stereo vision configuration with subvoxel resolution,” Opt. Express 18, 24152–24162 (2010).
]. In this case, subpixel resolution is based on a pseudo-periodic pattern (PPP) fixed on the moving target and specifically designed to obtain high accuracy by means of phase computations. The demonstration setup achieved a resolution of 1.6 μm in X and Y and 5.2 μm in Z, and of
in orientation following the three angular directions. These results were obtained by means of 12 mm focal-length lenses allowing a measurement volume of
with a working distance of 50 cm. We may notice that these performances correspond to a particular application consisting in the 3D localization of a known pattern. This principle cannot be translated to the most common application of stereovision—reconstructing unknown 3D scenes from multiple 2D images.
This paper presents further capabilities of this approach for the highly accurate positioning of a labeled target. This work focuses on the monitoring of 3D translations, as typically performed by microstages supporting specimens in various microtechnology and nanotechnology instruments. A practical application might be the straightforward localization of tiny areas of interest on specimens after their transfer from one instrument to another. Since only translations are tracked, the orientation of the pattern with respect to the pixel frames of the cameras is known a priori. This allows the design of a simplified PPP as well as faster processing procedures by replacing 2D Fourier transforms with one-dimensional (1D) ones. We also use a PPP whose dimensions are much larger than the field of observation of the cameras used in the stereovision system. This choice has several implications that improve the overall capabilities of the method:
• The allowed displacement range and the image definition are no longer linked with each other through the imaging lens magnification. Large displacements are reconstructed from images of nonoverlapping areas of the PPP. The absolute distance separating nonoverlapping zones of the pattern is retrieved from the systematic positioning of the current view with respect to the whole PPP.
• The PPP provides a size reference that can be used for the calibration of the displacements observed as well as of the magnification of the imaging system.
• Because of the large size of the pattern, the current view of each camera is always covering some zone of the pattern. Any area of the recorded images can thus be used for position reconstruction. A good choice is then to consider only the central pixel of each camera and to determine its corresponding position on the PPP. A complete description of the geometrical model [
3] of the setup is no longer necessary, since we always work on the optical axis of the two cameras. The effects of image distortions are thus reduced, and calibration can be thought of in a different way.
For the validation of the method, we chose to work with a relatively high-magnification lens in order to demonstrate submicrometer resolutions that cannot usually be achieved by means of stereovision. However, a lower magnification could also be used to allow an extended range of displacement, especially along the out-of-plane direction.
2. PSEUDO-PERIODIC PATTERN FOR 3D TRANSLATION MEASUREMENT
In our concept, the PPP fixed on the target of interest is made wider than the largest displacement expected. In this way, the images recorded by each camera are known to be simply rolling over the pattern. The purpose of the image processing thus becomes to identify, with the highest accuracy, the position of the current view with respect to the complete pattern. As in previous works [
4. P. Sandoz, R. Zeggari, L. Froelhy, J. L. Prétet, and C. Mougin, “Position referencing in optical microscopy thanks to sample holders with out-of-focus encoded patterns,” J. Microsc. 225, 293–303 (2007).
5. J. A. Galeano-Zea, P. Sandoz, E. Gaiffe, J. L. Prétet, and C. Mougin, “Pseudo-periodic encryption of extended 2-D surfaces for high accurate recovery of any random zone by vision,” Int. J. Optomech. 4, 65–82 (2010).
6. Z. Galeano, A. July, P. Sandoz, E. Gaiffe, S. Launay, L. Robert, M. Jacquot, F. Hirchaud, J. L. Prétet, and C. Mougin, “Position-referenced microscopy for live cell culture monitoring,” Biomed. Opt. Express 2, 1307–1318 (2011).
], the pattern design is intended to be used through two complementary steps of image decoding. The first step is based on binary image processing and performs a coarse but absolute positioning of the current view with respect to the complete pattern. The second step involves linear processing and is based on phase measurements. It provides a relative but high-accuracy positioning of the observed zone with respect to the pixel frame of each camera. The combination of these complementary data leads to absolute and high-accuracy position measurements. The basic principle of the position encryption technique has been described in full detail elsewhere [
5
] for the general case of 2D translations and in-plane rotations. However, the case of 3D translations as addressed in this paper allows the use of a simplified pattern concept, as presented in Fig.
1. The position encryption is based on a periodic frame of stripes altered by missing ones along both directions. The aim of the missing stripes is to encode the stripe order for the localization of any view within the complete pattern. For this purpose, pseudorandom binary sequences obtained by means of linear feedback shift registers (LFSRs) are used [
7. S. W. Golomb, Shift Register Sequences (Holden-Day, 1967).
]. In this technique, consecutive words are nested with each other. Any set of $n$ consecutive bits, where $n$ is the word length, is sufficient for identifying the corresponding word along the complete LFSR sequence. Figure
1(a) illustrates this principle for the case of words of 3 bits. Each bit is represented by a set of three consecutive stripes, with the central stripe present for bits of value 1 and absent for bits of value 0. Once the missing stripes and the corresponding bit values are identified, any set of 3 bits forms a word whose position along the sequence is obtained by means of a lookup table known from the sequence construction. In this case of words of 3 bits, the complete pattern is made of 30 stripes [$(2^3 + 2) \times 3$], while any position can be retrieved from the observation of only nine stripes ($3 \times 3$) corresponding to 3 bits.
Fig. 1. Position encryption sequence for 3-bit words. (a) Principle and (b) resulting 2D pattern.
Figure
1(b) presents the pattern suited for 2D position encoding, as obtained by multiplying the stripe pattern by itself after a rotation of 90°. The resulting pattern presents
ambiguities that are not problematic, since only translations are considered in our case. In the general case of target displacements versus the six degrees of freedom, the pattern designs discussed in [
5
] would provide solutions for unambiguous measurements. In practice, we used the pattern concept of Fig.
1 with words of 6 bits. The elementary stripe period was chosen to be 150 μm. The final size of the pattern is thus 31.05 mm × 31.05 mm, i.e., 207 lines and columns, while the smallest detail to be resolved by the imaging system is
. The minimum area required for the reading of the 6 consecutive bits necessary for a proper position identification is thus 2.7 mm × 2.7 mm, i.e., 18 lines and columns. Experimentally, the pattern is fixed on the translation stage in such a way that the pattern directions are parallel to those of the camera pixel frame.
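The encoding described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the feedback polynomial $x^n + x + 1$ is an assumption (it is primitive for n = 3 and n = 6), and the insertion of one extra 0, so that the all-zero word also appears, is an assumption made because it reproduces the stripe counts quoted in the text (30 stripes for 3-bit words, 207 for 6-bit words).

```python
def m_sequence(n):
    """One period (2**n - 1 bits) of the LFSR s[k+n] = s[k] XOR s[k+1].

    Assumption: feedback polynomial x^n + x + 1, which gives a
    maximal-length sequence for n = 3 and n = 6 (not for every n).
    """
    state = [1] * n
    out = []
    for _ in range(2**n - 1):
        out.append(state[0])
        state = state[1:] + [state[0] ^ state[1]]
    return out


def encoded_sequence(n):
    """Bit string of length 2**n + n - 1 in which every n-bit word occurs once."""
    s = "".join(str(b) for b in m_sequence(n))
    i = s.find("0" * (n - 1))       # the single run of n-1 zeros
    s = s[:i] + "0" + s[i:]         # assumed extra 0: adds the all-zero word
    return s + s[: n - 1]           # unroll the cyclic sequence


def lookup_table(seq, n):
    """Map each n-bit word to its position along the sequence."""
    return {seq[i:i + n]: i for i in range(len(seq) - n + 1)}


def stripes(seq):
    """Three stripes per bit; the central one is absent (0) for a 0 bit."""
    return [v for b in seq for v in (1, int(b), 1)]


seq6 = encoded_sequence(6)
table6 = lookup_table(seq6, 6)
print(len(seq6), len(stripes(seq6)), len(table6))
```

Reading any 6 consecutive bits in an image (18 stripes) and looking the word up in `table6` then gives the coarse, absolute position of the view along one direction.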
This kind of pattern has also been used for the 2D positioning of a patterned plate translated in the plane [
8. P. Sandoz and M. Jacquot, “Lensless vision system for in-plane positioning of a patterned plate with subpixel resolution,” J. Opt. Soc. Am. A 28, 2494–2500 (2011).
]. In this case, the vision system is based on digital holography, which is advantageous for compactness and wide depth of field. The present work can be seen as a 3D counterpart of this approach using a pseudorandomly encoded target.
3. 2D IMAGE PROCESSING
The procedure of image processing applied to each stereo image recorded is summarized in Fig.
2. Figure
2(a) presents the central band of a recorded image in the horizontal direction. The pattern features appear with a high contrast, whereas some noise due to pattern imperfections is present. The image distortions that can be observed will be dealt with later. Since the horizontal stripes of the pattern provide the same information, the signal summation along the columns allows a significant improvement of the SNR as represented in Fig.
2(b). This signal is representative of the zone of the pattern currently observed by the camera. The signal processing is based on the Morlet wavelet transform [
9. R. Kronland-Martinet, J. Morlet, and A. Grossmann, “Analysis of sound patterns through wavelet transforms,” Int. J. Pattern Recogn. Artif. Intell. 1, 273–302 (1987).
] of the intensity distribution of Fig.
2(b). Figures
2(c) and
2(d) present, respectively, the wrapped phase and the modulus of the component of the wavelet transform that corresponds to the spatial frequency of the stripes. The phase signal is representative of the fine position of the stripes with respect to the camera pixel frame, while the modulus minima are representative of the missing stripes. In practice, the Morlet wavelet transform is computed only for the spatial frequency of interest, but with two different resolution trade-offs. A wide wavelet, allowing a high spectral resolution, is used for phase determination, whereas a narrow one, leading to high spatial resolution, is used for modulus computation and thus for an easy discrimination of the missing stripes. Figure
2(e) presents an intermediate step used to convert data relative to the pixel indices into data relative to the stripe indices and vice versa. The binary signal of Fig.
2(e) actually results from the thresholding of the phase of Fig.
2(c) following the condition $|\varphi| < \pi/2$, where $\varphi$ denotes the wrapped phase. This condition is fulfilled for pixels that are closer to a maximum than to a minimum of the sinelike signal. Figure
2(e) can thus be used as a clock signal of the stripes suited for converting pixel indices into stripe indices and vice versa. Thanks to this conversion table, the modulus variations of Fig.
2(d) can be plotted as a function of the stripe index, as represented in Fig.
2(f). At this stage, the absent stripes are easily detected by thresholding. The definition of the threshold value with respect to the average wavelet transform modulus observed on a rolling window of five stripes makes this thresholding step very robust. The resulting positions of the missing stripes indicate which stripe is representative of a bit value within every set of three stripes. The pseudorandom sequence under observation is finally derived from Fig.
2(f). It provides a coarse determination of the zone of the pattern under view. This coarse measurement is adjusted with the phase data of Fig.
2(c) to obtain an absolute and high-resolution determination of the zone under view. The result is always expressed in phase as a function of the central pixel of the image frame as
$$\Phi_c = 2\pi k_c + \varphi_c, \qquad (1)$$
where $\Phi_c$ is the absolute phase of the central pixel, $k_c$ is the stripe index at the central pixel, and $\varphi_c$ is the wrapped phase at the central pixel. The latter is not taken directly from the data of Fig. 2(c) but is derived from a least-squares fit of the unwrapped phase, which averages the data and enhances the SNR.
Fig. 2. Image processing steps: (a) horizontal band of the recorded image, (b) intensity profile obtained by summing along image columns, (c) spectral phase of (b), (d) modulus of (b), (e) binary signal locating the intensity extrema, (f) analog signal for the discrimination of absent and present stripes.
This sequence of image processing is applied four times in the same way, i.e., to the images of the left and right cameras and along the horizontal and vertical directions. At the end we obtain the absolute and high-accuracy pattern position along the horizontal and vertical directions of each camera.
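The processing chain of this section can be illustrated on a synthetic 1D profile. The following NumPy code is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the Gaussian widths (3 and 0.6 stripe periods), and the threshold (0.6 times the global median, used here in place of the rolling five-stripe window) are all illustrative choices. The phase of the central pixel is read directly instead of being obtained by a least-squares fit, and the stripe index is counted from the first stripe in view rather than decoded from the pseudorandom sequence.

```python
import numpy as np


def morlet_coeffs(profile, period_px, sigma_px):
    """Complex Morlet coefficients at the single spatial frequency 1/period_px.

    The profile must be longer than the kernel (8 * sigma_px + 1 samples).
    """
    half = int(4 * sigma_px)
    x = np.arange(-half, half + 1)
    kernel = np.exp(-x**2 / (2 * sigma_px**2)) * np.exp(2j * np.pi * x / period_px)
    return np.convolve(profile, kernel, mode="same")


def decode_profile(profile, period_px):
    """Return (indices of missing stripes, absolute phase at the central pixel)."""
    # Wide wavelet for the phase, narrow wavelet for the modulus (assumed widths).
    phase = np.angle(morlet_coeffs(profile, period_px, 3.0 * period_px))
    modulus = np.abs(morlet_coeffs(profile, period_px, 0.6 * period_px))
    # Stripe clock: pixels closer to an intensity maximum than to a minimum.
    clock = np.abs(phase) < np.pi / 2
    # Index of the nearest stripe for every pixel, counted from the first stripe.
    k_pix = np.round(np.unwrap(phase) / (2 * np.pi)).astype(int)
    k_pix -= k_pix[np.argmax(clock)]
    # Mean modulus per stripe, then thresholding (simplified global threshold).
    counts = np.bincount(k_pix[clock])
    per_stripe = np.bincount(k_pix[clock], weights=modulus[clock]) / counts
    missing = np.flatnonzero(per_stripe < 0.6 * np.median(per_stripe))
    # Stripe index and wrapped phase combined at the central pixel.
    c = len(profile) // 2
    abs_phase = 2 * np.pi * k_pix[c] + phase[c]
    return missing, abs_phase


def synthetic_profile(bits, period_px, offset_px):
    """Sinelike stripes, three per bit, central stripe absent for a 0 bit."""
    present = np.array([v for b in bits for v in (1, b, 1)])
    x = np.arange(len(present) * period_px)
    k = np.round((x - offset_px) / period_px).astype(int)
    inside = (k >= 0) & (k < len(present))
    amp = np.where(inside, present[np.clip(k, 0, len(present) - 1)], 0)
    return amp * 0.5 * (1 + np.cos(2 * np.pi * (x - offset_px) / period_px))


bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
profile = synthetic_profile(bits, period_px=16, offset_px=7.3)
missing, abs_phase = decode_profile(profile, period_px=16)
print(missing, abs_phase / (2 * np.pi))
```

In this sketch the absolute phase divided by 2π is the position of the central pixel in stripe periods, counted from the first stripe in view. In the actual method the offset to the complete pattern would come from the decoded bit sequence.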
4. 3D TRANSLATION RECONSTRUCTION
The PPP described in Section
2 was observed by a stereovision system made of two USB cameras (uEye-UI-1540-M), each equipped with a C-mount zoom lens (Computar MLH-10x) set at maximum magnification. The stereovision configuration is schematized in Fig.
3 with the PPP lying in the (
) plane. The phase of the central pixel as given by Eq. (
1) expresses with high accuracy the pattern position crossing the optical axis of each camera and for both
and
directions. After a target translation from position
to
, the optical axes intersect the PPP at different abscissas that can be expressed as a function of the target displacement by
where
and
are the apparent target displacements in 2D images provided by cameras
and
, respectively, and where
and
are considered positive. Once the values of
and
are obtained from the image processing described in the previous section, the target displacements that are encoded in the relative variations of
and
can be reconstructed by
We may notice that the only parameters involved in Eqs. (
2) and (
3) are the angles of the optical axes of the two cameras with respect to the
axis of the target displacements. These angles can be determined experimentally by applying a calibrated target displacement along the
direction (
); then we have
In the same way, by applying a calibrated target displacement along the
direction (
), we should have
Such experiments were performed by means of a servo-controlled piezoelectric translator (PZT) (
), and the results are presented in Fig.
4. For a target displacement along the
direction, as observed in Fig.
4(a), the values of
and
change linearly with opposite signs as expected. The angles
and
evaluated from this measurement are equal to 32.69° and 34.81°, respectively. For a target displacement along the
direction, we can observe in Fig.
4(b) that
and
vary simultaneously as expected. A residual difference appears between
and
. It is simply due to a slight alignment mismatch between the PPP and the
direction of the PZT, which introduces a
component in the pattern displacement.
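The geometry discussed above can be illustrated by a short numerical sketch. It rests on an assumed sign convention that is not taken from the original equations. The PPP lies in the plane normal to Z. The optical axes of cameras A and B are tilted by α and β on opposite sides of the Z axis. A target translation (ΔX, ΔZ) then shifts the pattern abscissas at the two optical axes by Δx_A = -ΔX + ΔZ tan α and Δx_B = -ΔX - ΔZ tan β. All function and variable names are hypothetical.

```python
import math


def pattern_shifts(d_x, d_z, alpha_deg, beta_deg):
    """Forward model (assumed convention): shifts of the pattern abscissas
    seen on the optical axes of cameras A and B for a target translation."""
    ta = math.tan(math.radians(alpha_deg))
    tb = math.tan(math.radians(beta_deg))
    return -d_x + d_z * ta, -d_x - d_z * tb


def reconstruct_xz(dx_a, dx_b, alpha_deg, beta_deg):
    """Invert the forward model to recover the target translation (dX, dZ)."""
    ta = math.tan(math.radians(alpha_deg))
    tb = math.tan(math.radians(beta_deg))
    d_z = (dx_a - dx_b) / (ta + tb)
    d_x = -(dx_a * tb + dx_b * ta) / (ta + tb)
    return d_x, d_z


def calibrate_angles(dx_a, dx_b, d_z):
    """Angles (degrees) from a calibrated translation along Z only (dX = 0)."""
    alpha = math.degrees(math.atan(dx_a / d_z))
    beta = math.degrees(math.atan(-dx_b / d_z))
    return alpha, beta


# Angles reported in the text; displacements in micrometers.
ALPHA, BETA = 32.69, 34.81
dx_a, dx_b = pattern_shifts(d_x=0.0, d_z=340.0, alpha_deg=ALPHA, beta_deg=BETA)
print(dx_a, dx_b, calibrate_angles(dx_a, dx_b, 340.0))
```

Under this convention, a translation along Z moves the two measured abscissas in opposite directions, whereas a translation along X moves them together. This is the qualitative behavior described for Fig. 4.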
Fig. 3. Effects of target translation on the measured position for each camera. A and B stand for the cameras; and stand for the unit vectors along the target displacement axes; and are the angles of the optical axes with respect to the axis; and represent the vectors on which the target displacements are projected for cameras A and B, respectively; and represent the initial and final target positions, respectively; and are the pattern abscissa at its intersections with the optical axes of the two cameras; and represent the target displacement from to .
Fig. 4. Variations in the measured positions and for cameras A and B, respectively, while the target was translated along the (a) and (b) directions by 340 μm.
The
direction is not represented in Fig.
3. For a perfectly aligned setup, the
direction would be independent of the
and
directions, and both cameras should give the same
data:
In practice, alignment is not perfect and lenses produce distortions, especially at high magnification, as in our case. Some coupling between the
,
, and
components results from these actual conditions. Figure
5 presents the variations observed along the
direction for the target displacements corresponding to Fig.
4. Such coupling appears indeed, following two different aspects: linear and nonlinear. The linear contribution is due to the alignment mismatch between the
axes of the PZT, the PPP plane, and the camera optical axis. For instance, the pseudo-periodic plane appears to be rotated about 1.15° around the
axis of the PZT in Fig.
5(b). The nonlinear contribution presents a periodic behavior with an amplitude of about 0.25 μm and a period corresponding to that of the PPP [
in Fig.
5(a) and 150 μm in Fig.
5(b)]. This periodic modulation is due to the distortions introduced by the lenses, which are slightly visible in Fig.
2(a). As we have mainly a black and white pattern image, a translation of half a period with respect to any given analysis window is analogous to a shift of the analysis window itself, since the black parts of the image do not contribute to the phase computation. This apparent shift of the region of interest, which modulates the effects of the lens distortions, is responsible for the periodic modulation appearing in Fig.
5. We simply notice that this nonlinear modulation tends to be maximum at the highest magnification as in our demonstration setup; it would be relaxed in a lower resolution configuration. Similar comments could be made from Fig.
6, which presents the variations of the computed phase while the PZT is translated along the
direction, and in which both linear and nonlinear coupling also appear. Different strategies can be considered for minimizing or compensating for the nonlinear modulation. They will be presented in a later work dedicated to an in-depth discussion of the calibration issues linked to this approach, which is beyond the scope of this proof-of-principle paper.
Fig. 5. Variations in the measured positions and for cameras A and B, respectively, while the target was translated along the (a) and (b) directions by 340 μm.
Fig. 6. Variations in the measured positions (a) and and (b) and for cameras A and B, respectively, while the target was translated along the direction by 240 μm.
5. METHOD DEMONSTRATION AND PERFORMANCE
Various experiments were carried out for the demonstration of the method and for the evaluation of the performance obtained. Figures
7–
9 present the resulting data while an axial displacement of 10 μm was applied to the servo-controlled PZT along the
,
, and
directions, respectively. In each case, we present the reconstructed translation as well as the deviation from a straight line. We observe the ability of the method to detect and reconstruct target translations along the three directions. The standard deviations observed are 37, 33, and 45 nm for the
,
, and
directions, respectively. We may notice that these data are not affected by the systematic errors due to the misalignment and that the nonlinearities discussed previously have a low impact because of the short excursion of 10 μm relative to the pattern period of 150 μm. These deviations from a straight line are nevertheless representative of the ultimate capabilities of the method with respect to the SNR allowed by the devices used experimentally. By applying the usual statistical rule [
10. R. J. Hansman, “Characteristics of instrumentation,” in The Measurement, Instrumentation, and Sensors Handbook, J. G. Webster, ed. (Springer-Verlag, 1999).
], a good evaluation of the method resolution corresponds to three times the standard deviation observed, i.e., better than 0.15 μm for the results of Figs.
7–
9. This estimation is comparable to the theoretical limit of
of grid methods based on Fourier transforms [
11. B. Zhao and A. Asundi, “Microscopic grid methods—resolution and sensitivity,” Opt. Laser Eng. 36, 437–450 (2001).
], where
is the pattern period and
is the pixel number (
in our case).
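The resolution estimate used above (three times the standard deviation of the deviation from a straight line) can be written as a short sketch. The function name and the synthetic data are hypothetical. The noise level of 0.045 μm simply mimics the largest standard deviation quoted in the text and is not experimental data.

```python
import numpy as np


def straight_line_resolution(reference, measured):
    """Standard deviation of the residuals of a straight-line fit, and the
    corresponding resolution estimate taken as three times that value."""
    slope, intercept = np.polyfit(reference, measured, 1)
    residuals = measured - (slope * reference + intercept)
    sigma = residuals.std(ddof=2)   # two fitted parameters
    return sigma, 3.0 * sigma


# Synthetic 10 um excursion with Gaussian noise of 0.045 um (illustrative only).
rng = np.random.default_rng(0)
reference = np.linspace(0.0, 10.0, 2000)
measured = reference + 0.045 * rng.standard_normal(reference.size)
sigma, resolution = straight_line_resolution(reference, measured)
print(sigma, resolution)
```

With a standard deviation of 45 nm, three times the standard deviation is 135 nm, consistent with the bound of 0.15 μm stated above.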
Fig. 7. Reconstructed target translation along versus (a) the position given by the PZT capacitive sensor and (b) deviation from a straight line (standard deviation: 0.037 μm).
Fig. 8. Reconstructed target translation along versus (a) the position given by the PZT capacitive sensor and (b) deviation from a straight line (standard deviation: 0.045 μm).
Fig. 9. Reconstructed target translation along (a) versus the position given by the PZT capacitive sensor and (b) deviation from a straight line (standard deviation: 0.033 μm).
Figures
10 and
11 demonstrate the capabilities of the method to reconstruct 3D translations. For that purpose, a spiral-like displacement was applied to the PZT supporting the PPP. We observe that the 3D motions are properly reconstructed even for an excursion larger than the period of the PPP. The mismatch observed between the reconstructed data (black) and the PZT positions returned by its capacitive sensor (red) is due to the setup misalignment discussed previously. This mismatch can be compensated for by applying a coordinate transformation after setup calibration.
Fig. 10. Reconstruction of a 3D spiral-like translation applied to the PZT transducer. Black: reconstructed position; red: position returned by the PZT capacitive sensor. (a) 3D view, (b) front view, (c) side view.
Fig. 11. Same as Fig.
10 with a displacement range larger than the period of the PPP.
Figure
12 presents the outline of the 3D excursion allowed with the hardware used in the setup. This figure was obtained by translating the PPP by steps of 1 mm by means of three linear step-by-step motors (PI-M111-DC). The explored workspace is
along the
,
, and
directions, respectively. This excursion is limited by the actual size of the PPP along the lateral direction and by the depth of field of the lens used along the
direction. The usual trade-off that has to be found between resolution and workspace extension is partially relaxed here. It only stands along the
direction, because of the limited depth of field of the lens used, and that is slightly related to its magnification. The subpixel phase measurement leads, however, to a high dynamics along
(
versus a standard deviation of 45 nm). Along the lateral directions, the size of the PPP defines the allowed excursion thus:
where (
,
,
) is the excursion along the
,
,
directions, (
,
) is the PPP size, and (
,
) is the field of view of the vision system (
in the experimental setup).
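As an illustration of the relation described above, and only under the assumption that the camera view must remain entirely on the pattern, the lateral excursion can be sketched as the pattern size minus the field of view. The symbols $E$, $S$, and $F$ are chosen here for the excursion, the PPP size, and the field of view; they are not taken from the original equation.

```latex
\begin{aligned}
E_X &= S_X - F_X, \\
E_Y &= S_Y - F_Y, \\
S_X &= S_Y = 207 \times 150\ \mu\mathrm{m} = 31.05\ \mathrm{mm}.
\end{aligned}
```

Under this assumption, the lateral excursion is set by the printed size of the PPP and the field of view only, while the excursion along the third direction remains bounded by the depth of field of the lens, as stated in the text.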
Fig. 12. Reconstruction of an extended 3D translation (, , ).
6. CONCLUSION AND PROSPECTS
The stereovisual measurement of the 3D translations of a PPP is demonstrated in this paper with submicrometer resolution. This level of performance results from the high accuracy allowed by the phase processing of the PPP images. The approach based on an extended PPP has two main advantages. First, the measurement range is limited only by the actual size of the PPP and does not depend on the magnification of the vision system. The common trade-off between resolution and measurement range is thus partially relaxed. Second, the 3D position is retrieved from two pairs of coordinates on the PPP. These correspond to the two particular points of the PPP that cross the optical axes of the left and right cameras, respectively. The image processing is thus always based on the same areas of the recorded images, i.e., horizontal and vertical bands that avoid the image corners, where distortions are maximal. These image bands serve only to decrypt the sequence of pseudo-periodic code necessary for the identification of the intersection points with the optical axes. The effects of image distortions are thus significantly reduced in comparison with the usual case of 3D scene reconstruction from stereo images. Furthermore, these effects are kept constant during in-plane displacements and are only slightly affected by out-of-plane displacements of the target. Thanks to this property, the complete geometrical model of the stereovision configuration is not required. While the demonstration results of this proof of principle were obtained by means of an experimental evaluation of the camera orientations, further work remains to be carried out for a complete discussion of the calibration issues linked to this measurement approach.
Since in this approach the image processing is mainly based on 1D signals [cf. Fig.
2(b)], the stereovisual recording could be based on three linear cameras instead of on two 2D ones. In such a configuration, two linear cameras would correspond to the
and
directions considered in this paper, while the third one would correspond to the
direction. Then the amount of data to be transferred to the processing unit would be drastically reduced and the allowed image rate increased (up to 60 kHz with commercial devices). Provided that the image processing is fast enough (for instance, by using dedicated hardware such as digital signal processing or field-programmable gate array devices), a 3D positioning bandwidth of several tens of kilohertz could be expected. Another prospect would be to combine this stereovisual technique with the digital holography approach described elsewhere [
8
], with the advantage of having the almost unlimited depth of focus (several centimeters) allowed by digital holography while maintaining a submicrometer resolution.