The advantages of optical measurement, such as fast data acquisition, non-contact operation and the ability to measure soft tissue, are exploited in a wide range of technical, medical and security applications.
The aim of this work is to obtain a precise 3-D model of a human face for computer-aided surgery in dentistry. Because the patients are mainly children, fast data acquisition is essential. The measurement time should be less than 3 s and the measurement accuracy better than 100 μm. Furthermore, a low-cost measuring setup is desired.
Photogrammetric techniques use the same basic principle as human stereo vision to obtain 3-D information about the environment: images of the object are captured from two different perspectives. Pairs of image points resulting from the same object point are called homologous points. Given these points, the object can be reconstructed by triangulation.
A sketch of the technical realization is given in Fig. 1. For image acquisition, a convergent arrangement of two cameras is used. The image capture process is described by the pinhole camera model. For a precise reconstruction, all parameters of this model have to be known exactly. The parameters can be divided into intrinsic and extrinsic ones. The most important intrinsic parameter is the ratio of the distance between projection centre and image plane to the pixel size. Further intrinsic parameters are the coordinates of the principal point, which is the perpendicular projection of the projection centre ($\vec{x}_{cl}$) onto the corresponding image plane. In addition to the pinhole model, anisotropy and shear are taken into account; distortion is not yet included. The six extrinsic camera parameters (three for the centre of projection and three for the angles of rotation) describe the pose of the camera in an external world coordinate system by a simple Euclidean transformation. To reduce the number of parameters, the world coordinate system is identified with that of the left camera. At present, the intrinsic parameters of both cameras are determined beforehand by a calibration procedure using a planar calibration pattern [1].
Fig. 1. Schematic arrangement of the measuring setup
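The camera model described above can be illustrated with a minimal sketch. It assumes the common 3 × 3 intrinsic-matrix parameterization, in which anisotropy appears as an aspect factor and shear as an off-diagonal term; this parameterization, the function names and the principal point at the image centre are assumptions made for illustration, not details taken from the setup.

```python
import numpy as np

def intrinsic_matrix(f_over_pixel, aspect, shear, cx, cy):
    """Build a 3x3 intrinsic matrix K (hypothetical parameter names).

    f_over_pixel: distance projection centre to image plane, in pixel units
    aspect:       anisotropy (ratio of vertical to horizontal scale)
    shear:        skew between the pixel axes
    cx, cy:       principal point in pixels
    """
    return np.array([
        [f_over_pixel, shear, cx],
        [0.0, aspect * f_over_pixel, cy],
        [0.0, 0.0, 1.0],
    ])

def project(K, R, c, X):
    """Project world points X (n, 3) for a camera with rotation R and centre c."""
    X_cam = (R @ (X - c).T).T      # Euclidean transformation into the camera frame
    x = (K @ X_cam.T).T            # homogeneous pixel coordinates
    return x[:, :2] / x[:, 2:3]    # perspective division

# The left camera defines the world coordinate system: R = I, c = 0.
# 25 mm focal length and 4.65 um pixels; image-centre principal point is assumed.
K = intrinsic_matrix(25.0 / 4.65e-3, 1.0, 0.0, 640.0, 480.0)
pts = np.array([[0.0, 0.0, 1100.0], [100.0, 0.0, 1100.0]])  # mm
uv = project(K, np.eye(3), np.zeros(3), pts)
```

A point on the optical axis maps to the principal point, and lateral offsets scale with the ratio of focal length to object distance.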
From the located homologous points, the essential matrix is calculated with the normalized eight-point algorithm [2]. The extrinsic parameters are calculated from this matrix using quaternions [3]. This procedure makes the arrangement insensitive to environmental changes, because the relative orientation of the cameras is determined from the homologous points themselves.
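As an illustration of the estimation step, the following is a minimal sketch of a normalized eight-point estimate of the essential matrix. It assumes the image points have already been converted to calibrated (normalized) coordinates with the intrinsic parameters, and it enforces the essential-matrix constraint after undoing the normalization; both choices and all function names are assumptions of this sketch. The subsequent extraction of the rotation via quaternions is not shown.

```python
import numpy as np

def _normalize(pts):
    """Hartley normalization: zero centroid, mean distance sqrt(2)."""
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - centroid, axis=1))
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return (T @ homog.T).T, T

def essential_eight_point(x_left, x_right):
    """Estimate E with x_right^T E x_left = 0 from at least 8 point pairs (n, 2)."""
    xl, Tl = _normalize(np.asarray(x_left, dtype=float))
    xr, Tr = _normalize(np.asarray(x_right, dtype=float))
    # Each correspondence gives one linear equation in the 9 entries of E
    # (row-major); the third homogeneous coordinate of xr is 1.
    A = np.column_stack([xr[:, [0]] * xl, xr[:, [1]] * xl, xl])
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)           # null vector of A
    E = Tr.T @ E @ Tl                  # undo the normalization
    # Enforce the essential-matrix structure: singular values (s, s, 0).
    U, S, Vt = np.linalg.svd(E)
    s = 0.5 * (S[0] + S[1])
    return U @ np.diag([s, s, 0.0]) @ Vt
```

The result is defined only up to scale, so the length of the baseline between the cameras cannot be recovered from the homologous points alone.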
Here, $l_i$ is the intensity at the current position in image i of the left camera, and $\bar{l}$ is the average intensity of that pixel over all N images. The terms for the right camera are analogous. This implies that only a linear transformation between the intensities of homologous points is assumed. Nonlinearities, for example in the gain of the cameras or due to angle-dependent scattering, may therefore lead to systematic measurement errors. A point is accepted as homologous if ρ exceeds a certain threshold (e.g. ρth = 0.9). This threshold is essential to suppress remaining outliers, which mainly occur if an object point is visible in only one camera.
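The definition of ρ is not reproduced in this passage. Assuming it is the usual normalized cross-correlation coefficient of the two intensity sequences over the N images, which matches the symbols described above, the acceptance test can be sketched as follows. The function name and the sample values are illustrative only.

```python
import numpy as np

def temporal_correlation(left_seq, right_seq):
    """Normalized cross-correlation of two intensity sequences of length N.

    Assumed form: rho = sum((l_i - l_mean) * (r_i - r_mean))
                        / sqrt(sum((l_i - l_mean)^2) * sum((r_i - r_mean)^2))
    """
    left = np.asarray(left_seq, dtype=float)
    right = np.asarray(right_seq, dtype=float)
    dl = left - left.mean()
    dr = right - right.mean()
    denom = np.sqrt(np.sum(dl**2) * np.sum(dr**2))
    return float(np.sum(dl * dr) / denom) if denom > 0 else 0.0

RHO_TH = 0.9                           # threshold quoted in the text

l = np.array([10.0, 200.0, 35.0, 120.0, 80.0])
r = 0.7 * l + 12.0                     # purely linear change of gain and offset
rho = temporal_correlation(l, r)
is_homologous = rho > RHO_TH
```

A purely linear change of gain and offset leaves ρ at 1, whereas a nonlinear intensity response lowers it, which is why nonlinearities can bias the result.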
3. Experimental setup
The stereo-photogrammetric method described above is realized in an experimental setup that uses two FireWire cameras with 1.3 MP (1280 × 960, pixel size = 4.65 μm) and a commercially available XGA projector (1024 × 768). The focal length of the camera lenses is 25 mm, so each camera has a diagonal angle of view of about 2θ = 17°. The lateral resolution, determined by pixel size and focal length, is about 0.4 mm, whereas the longitudinal resolution ranges from 0.4 mm to 0.8 mm, depending on the angle between the optical axes of the cameras. In this setup the maximum permissible angle between the optical axes is limited by the nose of the person, because both sides of the nose need to be as visible as possible in both cameras. We used an angle of 20°, which leads to a longitudinal resolution of 0.8 mm. The distance between each camera and the measured person is about 1.1 m. The measurement volume is about 250 × 200 × 180 mm (H × W × D).
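The quoted angle of view follows from the sensor geometry and the focal length. A short check, assuming a thin-lens (pinhole) relation focused at infinity, is:

```python
import math

pixel_size_mm = 4.65e-3          # 4.65 um pixels
width_px, height_px = 1280, 960  # sensor resolution
focal_length_mm = 25.0

# Sensor diagonal and full diagonal angle of view 2*theta = 2*atan(d / (2 f)).
sensor_diag_mm = pixel_size_mm * math.hypot(width_px, height_px)
angle_deg = math.degrees(2.0 * math.atan(sensor_diag_mm / (2.0 * focal_length_mm)))

print(f"sensor diagonal: {sensor_diag_mm:.2f} mm, angle of view: {angle_deg:.1f} deg")
```

The sensor diagonal is 7.44 mm, which gives a diagonal angle of view of roughly 17°, in agreement with the value stated above.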
4. Data processing and optimization
4.1. Subpixel interpolation
An object point that is imaged at the centre of a pixel in the left camera will generally not be imaged at the centre of a pixel in the right camera. For precise measurements, the position of the corresponding pixel therefore has to be located with subpixel accuracy. The captured images are sampled signals, so the original signal can in principle be reconstructed at any subpixel position. Because of the long computing time, exact reconstruction by sinc interpolation is not practical. The simplest interpolation method is the bilinear one, which uses the four nearest neighbours to calculate the desired value. We first interpolate the intensity values at a specified subpixel position in each image and then compute the correlation coefficient ρ for this position. The position is shifted until ρ reaches a maximum. Figure 2 shows an example of a subpixel correlation function computed with bilinear interpolation in a 2 × 2 pixel field. The central value (u = v = 100) corresponds to the maximum of the integer-pixel search. This example shows that a maximum can occur in any of the four quadrants. Therefore, four search algorithms are required. The function displayed in Fig. 3 is computed with a bicubic algorithm [6].
Fig. 2. Bilinear subpixel correlation function
Fig. 3. Bicubic subpixel correlation function
This interpolation method takes the 16 nearest neighbours into account to compute an interpolated value. As a result, the new correlation function has only a single maximum, so only one optimization run is needed. The reason for this behaviour is that, between two adjacent quadrants, only 50% of the input data for the subpixel interpolation remain unchanged for bilinear interpolation (2 of 4 pixels), compared with 75% for the bicubic one (12 of 16 pixels). A single maximum cannot be guaranteed for the bicubic interpolation, but in practice it occurs in most relevant cases. The disadvantage of the more time-consuming bicubic algorithm is compensated by the need for only a single search.
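The subpixel search can be sketched as follows for the bilinear case. The search strategy is not specified in the text, so this sketch substitutes a brute-force grid search within ±1 pixel of the integer maximum; the step size, the function names and the `(N, H, W)` stack layout are assumptions. The bicubic variant would replace `bilinear_sample` with a 4 × 4 neighbourhood interpolation and is not shown.

```python
import numpy as np

def bilinear_sample(stack, u, v):
    """Sample an image stack (N, H, W) at column u, row v from the 4 nearest pixels."""
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * stack[:, v0, u0]
            + du * (1 - dv) * stack[:, v0, u0 + 1]
            + (1 - du) * dv * stack[:, v0 + 1, u0]
            + du * dv * stack[:, v0 + 1, u0 + 1])

def correlation(a, b):
    """Normalized cross-correlation of two intensity sequences."""
    da, db = a - a.mean(), b - b.mean()
    denom = np.sqrt(np.sum(da**2) * np.sum(db**2))
    return float(np.sum(da * db) / denom) if denom > 0 else 0.0

def refine_subpixel(left_seq, right_stack, u_int, v_int, step=0.05):
    """Grid search for the subpixel position with maximal rho.

    (u_int, v_int) is the integer-pixel maximum in the right image; it must lie
    at least 2 pixels away from the image border.
    Returns (rho, u, v).
    """
    offsets = np.arange(-1.0, 1.0 + step / 2, step)
    best = (-np.inf, float(u_int), float(v_int))
    for dv in offsets:
        for du in offsets:
            u, v = u_int + du, v_int + dv
            rho = correlation(left_seq, bilinear_sample(right_stack, u, v))
            if rho > best[0]:
                best = (rho, u, v)
    return best
```

An exhaustive grid avoids the problem of multiple local maxima in the bilinear case at the cost of many more evaluations of ρ than a directed search would need.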
4.2. Pattern structures
The disadvantage of the faster interpolation algorithms in comparison with sinc interpolation is their poorer transfer function. As a result, the initially used binary patterns yield no reasonable results, and the patterns should therefore be limited in their maximum spatial frequency. In addition, the minimum frequency of the patterns should be raised, because low frequencies lead to large homogeneous areas in the patterns, which produce flat correlation functions like those shown in Fig. 2 and Fig. 3. Such flat correlation functions result in higher noise. To avoid the negative influence of the pixelated patterns, the projector was slightly defocused.
Figure 4 (left) gives an example of a binary pattern in which every 2 × 2 pixel block was randomly switched to black or white. The other two patterns are Fourier transforms of random spectra. For the right image, both the high and the low frequencies were suppressed; for the middle one, only the high frequencies.
Fig. 4. Example parts (75 × 100 pixels) of a binary pattern, one pattern with limited maximum frequency and a bandlimited one
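A bandlimited pattern of the kind described can be generated by masking a random spectrum and transforming it back to the spatial domain. The following is a minimal sketch; the cut-off frequencies, the Gaussian random spectrum and the normalization to the range [0, 1] are assumptions, since the actual parameters are not given.

```python
import numpy as np

def bandlimited_pattern(height, width, f_min, f_max, seed=0):
    """Random pattern whose spatial frequencies satisfy f_min <= |f| <= f_max.

    Frequencies are in cycles per projector pixel; f_min = 0 keeps the low
    frequencies and only limits the maximum frequency.
    """
    rng = np.random.default_rng(seed)
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    radius = np.hypot(fx, fy)                      # radial spatial frequency
    spectrum = (rng.normal(size=(height, width))
                + 1j * rng.normal(size=(height, width)))
    spectrum[(radius < f_min) | (radius > f_max)] = 0.0
    pattern = np.real(np.fft.ifft2(spectrum))      # real part keeps the band limits
    pattern -= pattern.min()
    return pattern / pattern.max()                 # grey values in [0, 1]

# XGA projector resolution; the band limits are illustrative values only.
pattern = bandlimited_pattern(768, 1024, f_min=0.02, f_max=0.15)
```

Projecting a sequence of N such patterns with different seeds gives each object point its own intensity sequence over the image series.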
5.1. Evaluation of the measurement method
Fig. 5. Evaluation of the step height sensitivity using a plane with milled grooves between 5 μm and 160 μm; the projected patterns correspond to the ones shown in Fig. 4
To verify the accuracy of the measurement system, two objects of known geometry were tested. First, to quantify the minimal resolvable height step, a matt-finished aluminium plate with milled grooves from 5 μm to 160 μm was used. To separate this feature of interest from deficiencies caused by imperfect calibration, a two-dimensional fourth-order polynomial fit was subtracted. Fig. 5 shows both the resolved step height of 20 μm and the improved measurement quality obtained with the optimized illumination structures. No additional filtering of the data was carried out.
Fig. 6. Deviation from a flat reference in mm
To check the absolute measurement accuracy, a matt-finished calibrated granite plate was used. The height and width of this plate filled the full field of view. The result therefore contains all influences of the imperfectly determined intrinsic and extrinsic calibration parameters of the cameras, the neglected distortion, and the computation of the coordinates with subpixel interpolation. No filtering and no global fitting, except for subtracting the best-fit plane, were applied to the results shown in Fig. 6. The absolute error over the full field is less than 0.3 mm, whereas the standard deviation (rms) is smaller than 50 μm. The ratio of the rms error to the realized measurement field height of 250 mm is better than 2 × 10⁻⁴.
5.2. 3-D-Measurements of human faces
GIF animation of a rendered point cloud of a human face. The picture additionally shows a detailed view of the eye region as a point cloud and as a rendered image.
This article shows that bandlimited projection patterns in combination with subpixel interpolation can improve the capability of 3-D measurements by stereo photogrammetry. The experimentally determined values for the step height sensitivity and the full-field measurement uncertainty are 20 μm and 50 μm (rms), respectively, whereas the absolute error is less than 0.3 mm. The accuracy relative to the lateral measuring range is about 2 × 10⁻⁴. The achieved accuracy is sufficient for medical measurements of human faces. The short image acquisition time (< 3 s), the low hardware requirements and the self-calibration of the extrinsic parameters are additional advantages of this method. So far, the assumption of a linear intensity transformation between homologous points, mentioned at the outset, has not led to noticeable measurement errors.
In further developments the digital projector will be replaced by a cheaper analogue projection unit, which will allow non-pixelated projection patterns. Furthermore, distortion has to be added to the camera model to overcome the main geometric error. This requires the implementation of stable calibration algorithms.
This project was supported by the Thuringian Ministry of Science, Research and Culture under the topic: ‘3-D shape measurement for function orientated diagnostic and therapy in dentistry’.
References and links
1. Y. Ma, S. Soatto, and J. Kosecka, An Invitation to 3-D Vision (Springer, 2003).
2. R. I. Hartley, “In Defense of the Eight-Point Algorithm,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 6, pp. 580–593 (1997).
3. O. Faugeras, Three-Dimensional Computer Vision (Artificial Intelligence) (MIT Press, 1993).
4. F. Devernay, O. Bantiche, and E. Coste-Manière, “Structured light on dynamic scenes using standard stereoscopy algorithms,” Rapport de recherche de l’INRIA, No. 4477 (June 2002), http://www.inria.fr/rrrt/rr-4477.html
5. P. Albrecht and B. Michaelis, “Stereo Photogrammetry with Improved Spatial Resolution,” in 14th International Conference on Pattern Recognition, pp. 845–849 (1998).
6. I. E. Abdou and K. Y. Wong, “Analysis of Linear Interpolation Schemes for Bi-Level Image Applications,” IBM Journal of Research and Development, Vol. 26, No. 6, pp. 667–680 (1982).