Linear stratified approach using full geometric constraints for 3D scene reconstruction and camera calibration

Jae-Hean Kim and Bon-Ki Koo


Optics Express, Vol. 21, Issue 4, pp. 4456-4474 (2013)
http://dx.doi.org/10.1364/OE.21.004456


Abstract

This paper presents a new linear framework to obtain 3D scene reconstruction and camera calibration simultaneously from uncalibrated images using scene geometry. Our strategy uses the constraints of parallelism, coplanarity, collinearity, and orthogonality. These constraints are frequently available in general man-made scenes. This approach gives more stable results with fewer images and obtains the results using only linear operations. In this paper, it is shown that all the geometric constraints used independently in previous works can be implemented easily in the proposed linear method. Situations that cannot be handled by previous approaches are also studied, and it is shown that the proposed method, which can handle these cases, is more flexible in use. The proposed method uses a stratified approach, in which affine reconstruction is performed first and then metric reconstruction. In this procedure, the additional constraints newly derived in this paper play an important role in affine reconstruction in practical situations.

© 2013 OSA

1. Introduction

Developing a method that obtains camera calibration and 3D scene reconstruction simultaneously from images is one of the important topics in the field of computer vision. There have been many approaches concerning these problems.

First, classical approaches utilize known 3D positions of points, usually printed on an accurately manufactured calibration rig [1–3]. However, for large 3D scenes such position information can be obtained only through specialized acquisition systems and is rarely available in general situations. Second, camera calibration and scene reconstruction can be acquired simultaneously from the image sequences alone, using only constraints on the camera intrinsic parameters. This approach is known as self-calibration and provides great flexibility [4, 5]. However, to acquire high quality, a lengthy image sequence and a set of many accurate feature correspondences are necessary. Third, the information from restricted camera motion can be used to calibrate cameras [6–8]. Finally, there have been many methods that use the properties of scene geometry or special objects [9–19]. As these methods do not require a calibration rig but instead use the geometry of a scene or the shapes of objects as one, accurate calibration results can be acquired without a lengthy image sequence. The proposed algorithm belongs to this last category.

2. Related works

It is also possible to obtain projective reconstruction linearly from at least one plane visible in all views [12]. However, to obtain metric reconstruction linearly, this method requires that the plane be the plane at infinity, determined from three orthogonal vanishing points as in [9]. Given calibration and rotation parameters, a linear estimation method using the metric information of scene geometry was also suggested in [14]. Quadratic constraints from coplanar parallelograms were derived for camera calibration in [15]. These constraints can be converted to linear constraints when the metric information of the parallelograms is known. There have also been many methods using other constraints of scene geometry: symmetric polyhedra [17], ellipses [18], and spheres [19]. If a scene contains only symmetric polyhedra and no parallelograms, the method described in [17] is a good alternative to our method. However, that method assumes the simplest camera model, in which only the focal length is unknown, and needs a nonlinear optimization approach initialized by solving quadratic equations.

3. The infinity homography from parallelism and coplanarity

3.1. Preliminaries and motivations

Fig. 1 Examples of imaged parallelograms in camera images due to the two sets of parallel lines with different directions: Only (a) depicts the imaged parallelogram corresponding to an actual parallelogram existing on a plane in 3D.

3.2. Image rectification for a novel framework

The rectification homographies Hri, for i = 1, 2, are defined here for two views. The rectification homography is defined so that it maps each imaged parallelogram to a rectangle that has edges parallel to the vertical and horizontal axes of the image plane (see Fig. 2). The position and the shape of the rectangle can be chosen arbitrarily. A rectification homography is allocated to each imaged parallelogram in the views and is easily computed from the four vertex correspondences.
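As a concrete illustration, the following minimal numpy sketch (the function name and the choice of a unit-width rectangle are ours, not the paper's) computes one rectification homography by the standard DLT from the four vertex correspondences:

```python
import numpy as np

def rectification_homography(quad, aspect=1.0):
    """Map the four imaged vertices of a parallelogram (quad: 4x2 array,
    in order) to the rectangle (0,0), (1,0), (1,aspect), (0,aspect) by a
    direct linear transform (DLT) over the four correspondences."""
    rect = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, aspect], [0.0, aspect]])
    rows = []
    for (x, y), (u, v) in zip(np.asarray(quad, float), rect):
        # Each correspondence contributes two rows of the 8x9 DLT system.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 3)   # H_r, defined up to scale
```

Applying this to the same imaged parallelogram in each of the two views yields the pair Hr1 and Hr2 used below.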

Fig. 2 Relationship between the original H and the newly defined Hr, which is the infinite homography between the rectified images.

In the images transformed using Hri, it is clear that the two vanishing points deduced from the transformed parallelogram, which is a rectangle, are [1, 0, 0]^T and [0, 1, 0]^T in the homogeneous coordinate system, without any calculation, because the parallel lines intersect at infinity (see Fig. 2).

Since the newly deduced vanishing points from the rectangle are also a transformation of the original vanishing points, it is possible to conclude that the infinite homography, Hr, between the rectified cameras satisfies the following equations:
$$\lambda_x \,[1\;\, 0\;\, 0]^T = H_r \,[1\;\, 0\;\, 0]^T \tag{2}$$
and
$$\lambda_y \,[0\;\, 1\;\, 0]^T = H_r \,[0\;\, 1\;\, 0]^T, \tag{3}$$
where λx and λy are arbitrary scale factors. From Eqs. (2) and (3), Hr has the following form:
$$H_r = \begin{bmatrix} \lambda_x & 0 & u \\ 0 & \lambda_y & v \\ 0 & 0 & w \end{bmatrix}, \tag{4}$$
where u, v, and w are arbitrary scalar values.

From Fig. 2, the relationship between the original H and the newly defined Hr is as follows:
$$H \simeq H_{r2}^{-1}\, H_r\, H_{r1}. \tag{5}$$
This gives equations for the elements of H and Hr.

As Eq. (5) can be obtained for each parallelogram (two sets of parallel lines), the number of equations constraining the variables is 9m + 1 (the 1 fixes the scale factor), where m is the number of parallelograms. Since the number of variables is 9 for H and 5m for the Hrk (k = 1, ⋯, m), 9m + 1 ≥ 9 + 5m is required for the estimation to be possible. Thus, the required number of parallelograms is m ≥ 2. It is worthwhile to note that this requirement is identical to that of the method using vanishing point correspondences, because two parallelograms give four vanishing points. It can be concluded that, in the new framework, there is no change in the number of constraints. Accordingly, although Eq. (5) can be used to compute H, no advantage is gained without the additional constraint described in Section 3.1.

3.3. An additional constraint from an actual parallelogram

First, the camera calibration matrix of a rectified camera is deduced. The camera calibration matrix is
$$K = \begin{bmatrix} f_u & s & u_0 \\ 0 & f_v & v_0 \\ 0 & 0 & 1 \end{bmatrix},$$
where fu and fv denote the focal length expressed in horizontal and vertical pixel dimensions, respectively, s is the skew parameter, and (u0, v0) are the pixel coordinates of the principal point.

The aspect ratio of a parallelogram is defined as the ratio of the height to the length of the bottom side. Let the four vertices of a parallelogram in 3D be [0, 0, 1]^T, [1, 0, 1]^T, [r cot θ + 1, r, 1]^T, and [r cot θ, r, 1]^T, where r is the aspect ratio of the parallelogram and θ is the angle between the two sides. [0, 0, 1]^T, [1, 0, 1]^T, [1, ai, 1]^T, and [0, ai, 1]^T, for i = 1, 2, are chosen as the four vertices of the two rectangles in each view of the two rectified cameras, where ai, for i = 1, 2, are the aspect ratios of the two rectangles.

In fact, it will be shown that the variables r and θ disappear from the equation for the additional constraint; thus, the metric information of the parallelogram is not necessary.

The homography mapping between the vertices of the parallelogram and the rectangle of the ith camera can then be computed as
$$H_i = [h_{i1}\;\, h_{i2}\;\, h_{i3}] \simeq \begin{bmatrix} 1 & \cot\theta & 0 \\ 0 & a_i/r & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{6}$$
The circular points of the plane containing the parallelogram are imaged on the rectified image as hi1 ± j hi2. Let ωri be the image of the absolute conic (IAC) on the rectified image plane of the ith camera. It is then possible to obtain two equations that are linear in ωri [10]:
$$(h_{i1} \pm j h_{i2})^T \,\omega_{ri}\, (h_{i1} \pm j h_{i2}) = 0$$
or
$$h_{i1}^T \omega_{ri} h_{i2} = 0 \quad \text{and} \quad h_{i1}^T \omega_{ri} h_{i1} = h_{i2}^T \omega_{ri} h_{i2}. \tag{7}$$
From Eq. (7), it can be seen that ωri has the form
$$\omega_{ri} \simeq \begin{bmatrix} \sin^2\theta & (r/a_i)\sin\theta\cos\theta & \alpha \\ (r/a_i)\sin\theta\cos\theta & r^2/a_i^2 & \beta \\ \alpha & \beta & \gamma \end{bmatrix},$$
where α, β, and γ are arbitrary scalar values.

If Kri represents the camera calibration matrix of the ith rectified camera, then, because ωri = Kri^{−T} Kri^{−1} [20], after some manipulation it is possible to obtain
$$K_{ri} \simeq \begin{bmatrix} \sin\theta & \cos\theta & \alpha' \\ 0 & (a_i/r)\sin\theta & \beta' \\ 0 & 0 & \gamma' \end{bmatrix}, \tag{8}$$
where α′, β′, and γ′ are arbitrary scalar values.

As the circular points are fixed under a similarity transformation, the aforementioned selection of the vertices of the parallelogram and the rectangle does not lose generality.

Proof: Without loss of generality, it is assumed that the plane supporting the parallelogram is on Z = 0 of the world coordinate system and that [r1, r2, r3, t] is the rotation and translation relating the world coordinate system to the camera coordinate system. It can then be seen that
$$H_i \simeq K_{ri}\,[r_1\;\, r_2\;\, t]$$
or
$$\mu K_{ri}^{-1} H_i = [r_1\;\, r_2\;\, t], \tag{9}$$
where Hi corresponds to Eq. (6), Kri corresponds to Eq. (8), and μ is an arbitrary scale factor.

By substituting Eq. (6) and Eq. (8) into Eq. (9), the result is
$$\mu \begin{bmatrix} \sin\theta & 0 & \alpha'' \\ 0 & \sin\theta & \beta'' \\ 0 & 0 & \gamma'' \end{bmatrix} = [r_1\;\, r_2\;\, t], \tag{10}$$
where α″, β″, and γ″ are arbitrary scalar values.

Considering that ||r1|| = ||r2|| = 1 and r3 = r1 × r2, it follows that [r1, r2, r3] = R_I or R_Ī. Since the rotation matrix between the two rectified cameras is one of R_I^T R_I, R_Ī^T R_Ī, R_I^T R_Ī, and R_Ī^T R_I, it is R_I or R_Ī.

Proof: Let Kr1 and Kr2 be the camera calibration matrices of the two rectified cameras. Assume Rr is the rotation matrix between the two rectified cameras. From Theorem 1 and Eq. (8), Hr can be computed as follows [21]:
$$H_r \simeq K_{r2}\, R_r\, K_{r1}^{-1} \simeq \begin{bmatrix} 1 & 0 & u' \\ 0 & a_2/a_1 & v' \\ 0 & 0 & w' \end{bmatrix},$$
where u′, v′, and w′ are arbitrary scalar values.

As mentioned above, it is worthwhile to note that Hr is independent of the angle between the two sides of the parallelogram, θ, and of the aspect ratio of the parallelogram, r. From Theorem 2, if the two cameras observe the same parallelogram actually existing in 3D, the additional linear constraint λy = (a2/a1)λx can be inserted into Eq. (5), because the aspect ratios a1 and a2 can be chosen arbitrarily. If the parallelogram does not actually exist in 3D, it cannot be assumed that r is identical for each camera, due to the parallax between the two cameras (refer to Figs. 1(b) and 1(c)), so the constraint cannot be used.

Assume that m parallelograms are viewed by two cameras and that m_r (≤ m) is the number of actual parallelograms. From Eq. (5) and the additional constraint, the infinite homography, H, can be computed by solving the set of homogeneous equations
$$H \simeq (H_{r2}^k)^{-1}\, H_r^k\, H_{r1}^k, \quad \text{for } k = 1, \cdots, m \tag{11}$$
or
$$A\,[H_{11}, H_{12}, \cdots, H_{33}, \lambda_x^1, (\lambda_y^1), u^1, v^1, w^1, \cdots, \lambda_x^m, (\lambda_y^m), u^m, v^m, w^m]^T = 0, \tag{12}$$
where A is a 9m × (9 + 5m − m_r) matrix composed of the elements of the rectification homographies H_{r1}^k and H_{r2}^k, and H_{ij} is the (i, j) element of H. The parentheses in Eq. (12) indicate that, if the kth parallelogram is an actual parallelogram, the variable λy^k is not necessary due to the additional constraint.
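A compact numpy sketch of this solve is given below (our own illustration, with our own names; it rearranges Eq. (11) as Hr2^k H (Hr1^k)^{-1} ≃ Hr^k and assumes the rectangles were chosen with equal aspect ratios, so a2/a1 = 1 by default):

```python
import numpy as np

def estimate_infinite_homography(pairs, actual, ratio=1.0):
    """Stack Eq. (11) over all m parallelograms into the homogeneous
    system A x = 0 of Eq. (12) and solve it by SVD.

    pairs  : list of (Hr1_k, Hr2_k) rectification homography pairs.
    actual : actual[k] is True if the k-th parallelogram really exists
             in 3D, so lambda_y^k = ratio * lambda_x^k (ratio = a2/a1)
             and one unknown per such parallelogram disappears.
    """
    n_cols = 9 + sum(4 if a else 5 for a in actual)
    A = np.zeros((9 * len(pairs), n_cols))
    col = 9
    for k, ((Hr1, Hr2), act) in enumerate(zip(pairs, actual)):
        # Hr2_k @ H @ inv(Hr1_k) must have the sparse form of Eq. (4).
        # With row-major vectorization, vec(B H G) = kron(B, G.T) vec(H).
        blk = A[9 * k: 9 * (k + 1)]
        blk[:, :9] = np.kron(Hr2, np.linalg.inv(Hr1).T)
        lam_x = col
        lam_y = lam_x if act else col + 1
        off = col + (1 if act else 2)            # first of u^k, v^k, w^k
        blk[0, lam_x] -= 1.0                     # entry (0,0) = lambda_x^k
        blk[4, lam_y] -= ratio if act else 1.0   # entry (1,1) = lambda_y^k
        blk[2, off] -= 1.0                       # entry (0,2) = u^k
        blk[5, off + 1] -= 1.0                   # entry (1,2) = v^k
        blk[8, off + 2] -= 1.0                   # entry (2,2) = w^k
        col += 4 if act else 5
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1, :9].reshape(3, 3)              # H, up to scale
```

The four zero positions of Eq. (4) contribute rows containing only coefficients of H, which is where the extra constraining power of the framework resides.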

3.4. Comparison with related works

As mentioned in Section 3.1, four vanishing points, or three vanishing points together with the fundamental matrix, are required to obtain the infinite homography. However, it is difficult to acquire more than three vanishing points giving independent constraints in general man-made scenes, as in Fig. 3, and it is also cumbersome to obtain many point correspondences between all view pairs for fundamental matrix estimation. Even in that case, the proposed method can obtain the infinite homography due to the additional constraints arising from the actual parallelograms. A degenerate case of the proposed method occurs only when all parallelograms lie on planes parallel to each other. Moreover, even if there are four vanishing points, the additional constraints can improve the accuracy of camera calibration when metric constraints are not sufficient. This issue is explored further in Sections 7.1.2 and 7.2.1.

Fig. 3 Examples of the position of vanishing points in general man-made scenes. Since all parallel lines are orthogonal or parallel to the ground plane, there are only three independent vanishing points. The vanishing points v2, v3, v4, v5, and v6 are collinear and on the vanishing line of the ground plane.

However, these results can also be acquired from the method of [13], because a parallelepiped also gives only three vanishing points. This is due to the fact that coplanarity constraints are merged into the canonical projection matrix suggested in [13]. However, as mentioned previously, the proposed method can deal with general positions of parallelograms in practical situations due to the additional constraints and, consequently, is free from the restriction that the parallelograms be the faces of a parallelepiped.

Additional quadratic constraints can also be derived from parallelograms and used to calibrate camera parameters directly [15]. However, metric information about a parallelogram is needed to convert these non-linear constraint equations into linear ones; for example, the parallelogram should be a diamond or a rectangle. In fact, those linear constraint equations can also be derived in the proposed method: they are equivalent to the constraint equations in Section 5.1 when a parallelogram's two sides are equal in length or orthogonal, which means that the parallelogram is a diamond or a rectangle.

4. Reconstruction up to affine transformation

Once the infinite homography, H0i, between one reference view and the ith view is computed, the camera matrices of an affine reconstruction are P0 = [I | 0] and Pi = [H0i | ti] [20]. If the ith view does not share images of two or more parallelograms with the reference view, some manipulation may be needed using an intermediate view. If the infinite homographies Hji and H0j are given, H0i can be computed as follows:
$$H_{0i} = H_{ji}\, H_{0j}.$$
The variables remaining to be solved are the camera motions t and the 3D points X̃. The equation for point projection is
$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \simeq P \begin{bmatrix} \tilde{X} \\ 1 \end{bmatrix} \simeq \begin{bmatrix} h_1^T & t_1 \\ h_2^T & t_2 \\ h_3^T & t_3 \end{bmatrix} \begin{bmatrix} \tilde{X} \\ 1 \end{bmatrix}. \tag{13}$$
By taking the vector product of the two sides of Eq. (13), two independent homogeneous equations
$$\begin{bmatrix} u h_3^T - h_1^T & -1 & 0 & u \\ v h_3^T - h_2^T & 0 & -1 & v \end{bmatrix} \begin{bmatrix} \tilde{X} \\ t \end{bmatrix} = 0 \tag{14}$$
are obtained. Thus, a set of 2nm equations in 3n + 3(m − 1) unknowns is generated with m views and n 3D points. It is worthwhile to note that t of the reference view P0 is assumed to be 0 without loss of generality. These equations can be solved linearly to obtain the t and X̃ of an affine reconstruction. It is worthwhile to note that all 3D points need not be visible in all views in this process.
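A minimal numpy sketch of this linear solve follows (our own illustration; the observation layout `obs` and all names are assumptions, not the paper's notation):

```python
import numpy as np

def affine_reconstruction(H, obs):
    """Solve the stacked Eq. (14) for the points X and translations t of
    an affine reconstruction, up to a common scale.

    H   : list of m infinite homographies H_0i as 3x3 arrays (H[0] = I),
          so camera i is P_i = [H[i] | t_i] with t_0 = 0 fixed.
    obs : dict mapping (i, j) -> (u, v), the image of point j in view i;
          a point need not be visible in every view.
    """
    m = len(H)
    n = 1 + max(j for _, j in obs)
    n_unk = 3 * n + 3 * (m - 1)                  # t_0 is not an unknown
    A = np.zeros((2 * len(obs), n_unk))
    for r, ((i, j), (u, v)) in enumerate(obs.items()):
        h1, h2, h3 = H[i]                        # rows of H_0i
        A[2 * r, 3 * j: 3 * j + 3] = u * h3 - h1
        A[2 * r + 1, 3 * j: 3 * j + 3] = v * h3 - h2
        if i > 0:                                # columns for t_i
            c = 3 * n + 3 * (i - 1)
            A[2 * r, c: c + 3] = (-1.0, 0.0, u)
            A[2 * r + 1, c: c + 3] = (0.0, -1.0, v)
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    x = Vt[-1]                                   # null vector, up to sign
    X = x[:3 * n].reshape(n, 3)
    t = np.vstack([np.zeros(3), x[3 * n:].reshape(m - 1, 3)])
    return X, t
```

The system is sparse, so for large scenes a sparse null-space solver would be the natural substitute for the dense SVD used in this sketch.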

4.1. Parameter reduction using affine invariance

If there are parallelograms or parallelepipeds in the views, the number of variables can be reduced with geometric constraints. In the affine reconstruction process, all corner points of parallelograms or parallelepipeds are represented by one reference point and two or three vectors. Moreover, when using the above affine reconstruction formulation, the coordinates of a vanishing point in the first view can be considered a direction vector parallel to the line segments corresponding to that vanishing point. By representing the points on the edges and surfaces of parallelograms and parallelepipeds as weighted sums of these vectors, coplanarity and collinearity constraints can be satisfied. If the length ratios between line segments parallel to each other are given, these constraints can be implemented as well, using ratios of the scalar weights. All these constraints are affine invariant.

Let D be a column vector containing the scalar weights for the above direction vectors and the coordinates of the 3D points that cannot be represented by a weighted sum of these vectors. Then X̃ can be represented by
$$\tilde{X} = \Sigma D, \tag{15}$$
where Σ contains the direction vectors and the geometric constraints described above. Then, Eq. (14) can be rewritten as
$$\begin{bmatrix} (x h_3^T - h_1^T)\Sigma & -1 & 0 & x \\ (y h_3^T - h_2^T)\Sigma & 0 & -1 & y \end{bmatrix} \begin{bmatrix} D \\ t \end{bmatrix} = 0. \tag{16}$$
After solving these equations, X̃ can be obtained by Eq. (15). Due to this parameterization, the reconstructed vertices constitute parallelograms and parallelepipeds exactly, and the coplanarity and collinearity constraints for the points on the edges and surfaces of parallelograms and parallelepipeds are exactly satisfied.
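For instance, a single parallelogram can be encoded by a reference vertex p and two direction vectors d1 and d2, so that its four vertices are p, p + d1, p + d1 + d2, and p + d2. A minimal sketch of the corresponding Σ block (our own construction, illustrating Eq. (15) for one parallelogram) is:

```python
import numpy as np

def parallelogram_sigma():
    """Sigma block of Eq. (15) for one parallelogram: with
    D = [p, d1, d2] stacked into 9 scalars (reference vertex and two
    direction vectors), the 12 stacked coordinates of the four vertices
    p, p+d1, p+d1+d2, p+d2 equal Sigma @ D, so any solution for D yields
    an exact parallelogram."""
    I = np.eye(3)
    weights = [(0, 0), (1, 0), (1, 1), (0, 1)]   # affine-invariant weights
    return np.vstack([np.hstack([I, a * I, b * I]) for a, b in weights])
```

Known length ratios between parallel segments would enter as fixed scalar multiples of the same direction-vector columns.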

If nonlinear optimization is performed to refine the linear estimation results, this parameter reduction helps satisfy the geometric constraints described in this subsection. If these constraints are not embedded through the reduction, extra complex terms describing the geometric constraints between the 3D points must be added to the cost function.

4.2. Comments on linear reconstruction formula

The linear reconstruction formula described above is similar to the method presented in [12]. In that work, a reference plane visible in all views is needed for linear projective reconstruction. If three orthogonal vanishing points, which span the plane at infinity, are detected in all views, the internal parameters and a metric reconstruction can be obtained linearly. It was assumed that the camera had zero skew and a known aspect ratio, or fixed internal parameters. However, the assumptions about the aspect ratio and fixed internal parameters are not valid for arbitrary archival images or images gathered from the internet while modelling a scene.

The structure parameter reduction is similar to that of [11], which presents a single-view-based approach, and is also suggested in the linear step of [14]. However, in [14], given that the calibration and rotation parameters have been estimated previously by other calibration methods, the parameterization using the scene geometry is done in metric space, not in affine space. It is also straightforward to see that the regularity and symmetry information used in [14] can be implemented during the overall process of the proposed method by using the constraints of length ratios in Section 4.1 and orthogonality in Section 5.1. This means that, in the proposed method, the full geometric information contributes to the camera calibration and reconstruction simultaneously, because the calibration results are obtained at the last step.

5. Upgrade to metric reconstruction

5.1. Constraints from scene geometry

Assume that there are four points, Ei, for i = 1, ⋯, 4, in metric space and two vectors, dE1 = E2 − E1 and dE2 = E4 − E3. If the four points in affine space corresponding to the above four points are Ai, for i = 1, ⋯, 4, and dA1 = A2 − A1 and dA2 = A4 − A3, then dE1 = A^{−1}dA1 and dE2 = A^{−1}dA2. Assume that dE1 and dE2 are orthogonal to each other in metric space. Then, since dE1^T dE2 = 0, we can obtain the following constraint:
$$d_{A1}^T\, \Omega_A\, d_{A2} = 0. \tag{18}$$
This equation is similar to the equation describing the relation between the image of the absolute conic (IAC) and orthogonal vanishing points. However, Eq. (18) involves finite 3D points reconstructed in affine space and does not need particular points such as vanishing points.

If we know the ratio of dE1 to dE2 to be r, then
$$d_{E1}^T d_{E1} / d_{E2}^T d_{E2} = r^2$$
or
$$d_{A1}^T\, \Omega_A\, d_{A1} = r^2\, d_{A2}^T\, \Omega_A\, d_{A2}. \tag{19}$$
Moreover, if we additionally know the angle between dE1 and dE2 to be θ, then
$$\cos\theta = \frac{d_{E1}^T d_{E2}}{(d_{E1}^T d_{E1})^{1/2}\,(d_{E2}^T d_{E2})^{1/2}}$$
or
$$d_{A1}^T\, \Omega_A\, d_{A2} = r \cos\theta\; d_{A2}^T\, \Omega_A\, d_{A2}. \tag{20}$$
It is worthwhile to note that all equations derived in this section can be solved linearly.
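A minimal numpy sketch of this upgrade (ours; it assumes enough non-degenerate constraints that ΩA is determined and positive definite) stacks the rows of Eqs. (18) and (19), extracts ΩA from the SVD null vector, and factors it by Cholesky:

```python
import numpy as np

def sym_row(a, b):
    """Coefficients of the 6 unknowns (w11, w12, w13, w22, w23, w33) of
    the symmetric matrix Omega_A in the bilinear form a^T Omega_A b."""
    return np.array([a[0]*b[0], a[0]*b[1] + a[1]*b[0], a[0]*b[2] + a[2]*b[0],
                     a[1]*b[1], a[1]*b[2] + a[2]*b[1], a[2]*b[2]])

def metric_upgrade(orth_pairs, ratio_pairs=()):
    """Stack Eqs. (18) and (19), linear in Omega_A, and recover the
    affine-to-metric upgrade from the Cholesky factor.

    orth_pairs  : (dA1, dA2) with dE1 orthogonal to dE2    -> Eq. (18)
    ratio_pairs : (dA1, dA2, r) with |dE1| / |dE2| = r     -> Eq. (19)
    Returns M with X_metric ~ M @ X_affine, up to rotation and scale.
    """
    rows = [sym_row(d1, d2) for d1, d2 in orth_pairs]
    rows += [sym_row(d1, d1) - r**2 * sym_row(d2, d2)
             for d1, d2, r in ratio_pairs]
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    w = Vt[-1]                     # null vector: Omega_A up to scale
    Omega = np.array([[w[0], w[1], w[2]],
                      [w[1], w[3], w[4]],
                      [w[2], w[4], w[5]]])
    if np.trace(Omega) < 0:        # fix the global sign before factoring
        Omega = -Omega
    L = np.linalg.cholesky(Omega)  # Omega_A = L L^T = A^{-T} A^{-1}
    return L.T                     # A^{-1} = L^T, up to a rotation
```

Angle constraints of the form of Eq. (20) would add rows of the same six-unknown shape, so they stack into the same system.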

5.2. Constraints from partial knowledge of camera parameters

The absolute dual quadric (ADQ), Q*A, in affine space can be computed from HEA Q*E HEA^T [20], where Q*E is the ADQ in metric space, and has the following form:
$$Q^*_A = \begin{bmatrix} (\Omega_A)^{-1} & \mathbf{0} \\ \mathbf{0}^T & 0 \end{bmatrix}. \tag{21}$$

If the internal camera parameters are static, we can set all {ωi} to ω in Eq. (22); ω can then be eliminated using the relation ω = ΩA, which holds in the reference view. This results in linear equations on ΩA, provided H0i is normalized so that det(H0i) = 1.

If it is known that the pixels are rectangular (zero camera skew), that the principal point is at the origin, and that the aspect ratio is r, the following linear equations on ΩA can be obtained from Eq. (22):
$$(H_{0i}^{-T} \Omega_A H_{0i}^{-1})_{12} = 0,$$
$$(H_{0i}^{-T} \Omega_A H_{0i}^{-1})_{13} = (H_{0i}^{-T} \Omega_A H_{0i}^{-1})_{23} = 0,$$
$$r^2\, (H_{0i}^{-T} \Omega_A H_{0i}^{-1})_{11} = (H_{0i}^{-T} \Omega_A H_{0i}^{-1})_{22}.$$
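These constraints fit the same six-unknown representation of ΩA as in the previous sketch, so camera-knowledge rows can be stacked with the scene-geometry rows (a hedged sketch under our own naming; it assumes the H0i have already been estimated and normalized):

```python
import numpy as np

def sym_row(a, b):   # same helper as in the Section 5.1 sketch
    return np.array([a[0]*b[0], a[0]*b[1] + a[1]*b[0], a[0]*b[2] + a[2]*b[0],
                     a[1]*b[1], a[1]*b[2] + a[2]*b[1], a[2]*b[2]])

def camera_knowledge_rows(H0i, aspect=None):
    """Rows on the 6 unknowns of Omega_A from one view's prior camera
    knowledge.  With G = inv(H0i), each entry of the view's IAC is
    (omega_i)_{pq} = g_p^T Omega_A g_q, where g_p is the p-th column of
    G, so zero skew, a principal point at the origin, and a known
    aspect ratio all become linear rows."""
    g = np.linalg.inv(H0i).T      # g[p] is the p-th column of inv(H0i)
    rows = [sym_row(g[0], g[1]),  # zero skew:           (omega_i)_12 = 0
            sym_row(g[0], g[2]),  # principal point:     (omega_i)_13 = 0
            sym_row(g[1], g[2])]  #                      (omega_i)_23 = 0
    if aspect is not None:        # known aspect ratio:  r^2 w11 = w22
        rows.append(aspect**2 * sym_row(g[0], g[0]) - sym_row(g[1], g[1]))
    return rows
```

Rows collected from several views can be concatenated with the Section 5.1 rows and solved by the same SVD null-vector step.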

5.3. Comparison with related works

The constraints in Sections 5.1 and 5.2 can be derived using basic knowledge of projective geometry. These constraints are essentially equivalent to the constraints in [13], which are extracted from prior knowledge about the intrinsic parameters of the cameras and the parallelepipeds. The difference is that, in the proposed method, the 3D points used to derive the constraints are not only the vertices of parallelograms or parallelepipeds but also any 3D points in the scene.

6. Outline of the algorithm

7. Experimental results

7.1. Results on synthetic data

7.1.1. Performance evaluation

Simulated experiments were performed in order to carefully analyze the performance of the algorithm under various noise magnitudes, parallelogram sizes, and singular configurations. The scenario is shown in Fig. 4. Tests were performed with synthetic 1024 × 768 images, taken by three cameras with the following intrinsic parameters: (fu, fv, s, u0, v0) = (1200, 1000, 0, 512, 384). Two parallelograms were placed in front of the three cameras and arranged so that there were only three vanishing points. The distances from the cameras to the parallelograms were about 8 m. The cameras were placed horizontally and had arbitrary roll angles. The distances between neighboring cameras were about 6 m. The constraints used in this experiment were the orthogonality of the edges of the parallelograms and zero skew of the cameras.

Fig. 4 The environment of the simulated experiment for performance evaluation. The arrows illustrate the systematic variation of the relevant parameters.

The data normalization described in [22] was used for all experiments in this paper. For each parameter setting, 1,000 independent trials were performed, and the results shown are the averages. The estimated results are relative quantities and determined up to scale. To obtain the position error, the estimated positions of the vertices of the parallelograms were fitted to the ground truth using the method of [23].

First, the performance was evaluated with respect to the noise magnitude. Zero-mean, uniformly distributed noise over the interval [−n, n] pixels was added to the image projections of the vertices of the parallelograms, where n = 0.0, 0.25, 0.5, ⋯, 2. The edge size of the parallelograms and the angle between the planes of the two parallelograms were set to 1.5 m and 120°, respectively. The results are shown in Figs. 5(a) and 5(d). We can see that the errors increase linearly with the noise magnitude.

Fig. 5 The results from the simulated experiments to analyze the relation between the performance and noise magnitude ((a) and (d)), size of parallelograms ((b) and (e)), and angle between the two planes ((c) and (f)). [tx, ty, tz] and [X, Y, Z] indicate the position error of the estimated cameras and parallelograms, respectively.

Second, we tested the performance while varying the edge size of the parallelograms from 0.5 m to 2.0 m. In these experiments, the angle between the planes of the two parallelograms was set to 120°, and zero-mean, uniformly distributed noise over the interval [−0.5, 0.5] pixels was added. The results are shown in Figs. 5(b) and 5(e). We can see that the algorithm acquires reasonable results for edge sizes over 1.0 m.

Third, we tested the performance while varying the angle between the planes of the two parallelograms from 50° to 170°. In these experiments, the edge size of the parallelograms was set to 1.5 m, and zero-mean, uniformly distributed noise over the interval [−0.5, 0.5] pixels was added. The results are shown in Figs. 5(c) and 5(f). We can see that the results are unstable when the angle is near 100° or 170°. In the case of 100°, the parallelograms are nearly perpendicular to the image planes of the side cameras. In the case of 170°, the two parallelograms are almost parallel.

7.1.2. Effect of additional constraints

As mentioned in Section 3.4, the additional constraints can contribute to obtaining the infinite homography even in the case of three vanishing points. In these simulated experiments, the accuracy gain from the additional constraints is evaluated. The camera parameters are estimated when there is no geometric information in the scene other than two parallelograms. The prior knowledge used is that the cameras have static intrinsic parameters.

Comparisons were made with the following algorithms:
  • Corr: The method using vanishing point correspondences.
  • NAC: The method using the proposed framework not including the additional constraints.
  • AC: The proposed method including the additional constraints.
  • F: The method using the infinite homography obtained through fundamental matrix estimation and the projective reconstruction of the cameras and the plane at infinity [20].
  • Zhang: The classical method of Zhang [2].

The simulated camera had the following parameters: fu = 1200, fv = 1000, s = 0, u0 = 512, and v0 = 384. The resolution of the simulated image was 1024 × 768. Two parallelograms of side 1 m were placed around the origin. The angle between the planes containing the parallelograms was 90°.

Zero-mean, uniformly distributed noise over the interval [−n, n] pixels was added to the image projections of the vertices of the parallelograms, where n = 0.25, 0.5, 0.75, ⋯, 2. For each method and noise level, 1,000 independent trials were performed, and the results shown are the averages. For each trial, three images were taken from cameras randomly placed 5 m from the origin and pointing towards the origin so that the faces of the parallelograms could be viewed. The number of features for the fundamental matrix estimation of F was 20, and the features were randomly distributed over the 3D region containing the two parallelograms. For Zhang, it was assumed that the metric information of the vertices of the parallelograms was given. The estimation results of these experiments are shown in Fig. 6. Since the performances are highly similar both for fu and fv and for u0 and v0, the estimation results for fv and v0 are not depicted here.

Fig. 6 The results of the camera parameter estimates with simulated data: The mean of the absolute error of the calibration parameters is shown as a function of the noise level for the various methods. The cases in which three and four vanishing points exist are indicated by 3vps and 4vps, respectively. (a) and (d) refer to fu. (b) and (e) refer to the skew angle. (c) and (f) refer to u0.

The results with four vanishing point correspondences are shown in Figs. 6(a), 6(b), and 6(c). If none of the sides of the parallelograms is parallel to the plane containing the other parallelogram, there are four vanishing point correspondences, and three of these are not collinear. It is clear that the performance of AC is superior to that of Corr and NAC. It is worthwhile to note that the performances of Corr and NAC are similar, as expected in Section 3.2. The results of AC are similar to those of F, which requires fundamental matrix estimation from many feature correspondences.

If one side of one parallelogram is parallel to one side of the other parallelogram, as in Fig. 4, only three vanishing points exist. A comparison between the performances of the methods when the number of vanishing points is three or four is shown in Figs. 6(d), 6(e), and 6(f). Since the case of three vanishing points is degenerate for Corr and NAC, no results are depicted for these methods. However, AC and F can provide a solution even in the case of three vanishing points. It was found that the estimation results of AC are nearly identical for the two cases. The estimation results of F with three vanishing points are degraded compared to the case of four vanishing points. From these results, it can be concluded that AC is superior to F in practical situations in which there are only three vanishing points.

The camera parameters from AC are comparable to those of Zhang. However, it is worthwhile to note that AC uses only parallelism and does not require the metric information that is necessary when using Zhang.

7.2. Results on real images

Various experiments with real images were also performed to test the algorithms. All line segments in the images were extracted automatically [24] (see Fig. 7), and the line segments corresponding to the parallelograms' sides were selected manually. The vertices of the parallelograms were extracted from the intersections of the lines.

Fig. 7 Line extraction examples for the Plant Scene experiment presented in Section 7.2.2.

7.2.1. Tower scene

The resolution of the images was 1024 × 768. The images used in this experiment are shown in Fig. 8. The two parallelograms indicated in Fig. 8(a) with white dotted lines were used as the input for the algorithms. The prior information used by the algorithm was that the cameras have static intrinsic parameters. Fig. 8(d) shows the reconstructed model and the camera poses. Rendered new views of the reconstructed model are shown in Figs. 8(e) and 8(f).

Fig. 8 Three images used in the Tower Scene experiment and reconstructed model and camera poses.

Since ground truth or any other reference for the reconstruction results is not available in this experiment, the accuracy of the reconstruction of known geometry that is nevertheless not used in the algorithm is a useful measure of the performance of the algorithms. The line a and the plane defined by the two lines b and c depicted in Fig. 8(c) were reconstructed. It was known that the angle between the line a and the normal to the plane was zero. The angle measured with the proposed method was 2.22°. In this experiment, there were four vanishing points, so the method using vanishing point correspondences could also be used. Using that method, the measurement was 16.81°. From these results, it can be seen that the proposed method gives more accurate calibration results owing to the additional constraints, even in the case of four vanishing points, when only the static-camera constraint is used. Had the geometric constraints been sufficient in this case, the results from the two methods would have been comparable.

7.2.2. Plant scene

The resolution of the images was 1024 × 768, and the camera parameters did not remain static while the images were captured. Four captured images are shown in Fig. 9. Image 0 corresponds to the reference camera. The infinite homographies H01, H12, and H23 were computed in this experiment.

Fig. 9 Four captured images for Plant Scene experiment. (a) Image 0. (b) Image 1. (c) Image 2. (d) Image 3.

H01 can be obtained with the two parallelograms corresponding to the frontal face and the right face (B) of the building. However, only one parallelogram is shared between Image 1 and Image 2, which is insufficient because at least two parallelograms not parallel to each other are required to obtain sufficient constraints. In this case, the right and left faces (B and C) of the building can be considered the same parallelogram, because the proposed method of computing the infinite homography is independent of the translational motion of the parallelograms. Thus, faces A, B, and C were used to compute H12. In contrast to the method described in [13], the proposed method can use parallelograms and partially overlapping images, and is therefore more flexible in use. For H23, faces A and C were used. The metric constraints used in this experiment were the orthogonality of the edges of the building, the right angle between line segments a and b, and the length ratio of c to d. Fig. 10 shows the reconstructed model and the camera poses from new view positions.

Fig. 10 Reconstructed model and camera poses for Plant Scene experiment.

7.2.3. Scene of the Bank of China

Five images of the Bank of China in Hong Kong were gathered from the internet (see Fig. 11). The resolutions of the images varied from 419 × 783 to 1536 × 2048.

Fig. 11 Five images for the experiment of the scene of Bank of China. (a) Image 0. (b) Image 1. (c) Image 2. (d) Image 3. (e) Image 4.

H01, H02, H23, and H24 were computed using {A, C}, {A, B, D}, {D, E}, and {A, D, F, G}, respectively. It is worthwhile to note that there is no parallelepiped of which at least six vertices are seen across Image 0 and Image 2, so the method of [13] cannot be applied; however, common parallelograms can be found. The same holds for the pair of Image 2 and Image 3. In the estimation of H24, {A, G} and {D, F} can be considered the same parallelograms, respectively, as explained in Section 7.2.2. The metric constraints used in this experiment were the orthogonality of the edges of the building and the orthogonality of the crossing diagonal lines. The reconstructed model and the camera poses are shown in Fig. 12.

Fig. 12 Reconstructed model and camera poses for the experiment of Bank of China.

7.2.4. Scene of the Casa da Música

In this section, the reconstruction of the Casa da Música in Porto is presented. Ten images were collected from the internet (see Fig. 13). The resolutions of the images varied from 640 × 480 to 2313 × 2736. It is also noted that there is no parallelepiped of which at least six vertices are seen across the images, so the method of [13] cannot be applied.

Fig. 13 Ten images for the experiment of the scene of Casa da Música. (a) Image 0. (b) Image 1. (c) Image 2. (d) Image 3. (e) Image 4. (f) Image 5. (g) Image 6. (h) Image 7. (i) Image 8. (j) Image 9.

The infinite homographies H0i (i = 1, ⋯, 5) were computed using the faces A, B, and C. The infinite homographies H67, H78, and H89 were based on the faces D, E, F, and G. In this experiment, since no two parallelograms were commonly viewed across the image groups {0, ⋯, 5} and {6, ⋯, 9}, the reconstruction process was applied separately to each image group. Then, the two reconstruction results were merged so that the four vertices a, b, c, and d were aligned (see the points depicted in Figs. 13(a), 13(c), 13(h), and 13(i)).

The metric constraints used in this experiment were right angles for the parallelograms and zero camera skew. The reconstructed model and the camera poses are shown in Fig. 14.

Fig. 14 Reconstructed model and camera poses for the experiment of Casa da Música.

8. Conclusions

The proposed method uses the infinite homography estimated together with the additional constraints. The additional constraints arising from the parallelograms become clearly visible when the infinite homography is estimated in the novel framework based on the rectification of parallelograms. It was shown that the novel approach including the additional constraints has two advantages. First, more accurate camera parameters can be acquired when the geometric constraints are not sufficient. Second, even if only three vanishing points exist in the views, the infinite homography can be computed without the need to estimate the fundamental matrix.

Acknowledgments

This work was supported by the strategic technology development program of MCST/MKE/KEIT [KI001798, Development of Full 3D Reconstruction Technology for Broadcasting Communication Fusion].

References and links

1. R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," IEEE Trans. Robot. Autom. 3, 323–344 (1987).
2. Z. Zhang, "A flexible new technique for camera calibration," IEEE Trans. Pattern Anal. Mach. Intell. 22, 1330–1334 (2000).
3. J.-H. Kim and B.-K. Koo, "Convenient calibration method for unsynchronized camera networks using an inaccurate small reference object," Opt. Express 20, 25292–25310 (2012).
4. M. Pollefeys and L. V. Gool, "Stratified self-calibration with the modulus constraint," IEEE Trans. Pattern Anal. Mach. Intell. 21, 707–724 (1999).
5. M. Pollefeys, L. V. Gool, M. Vergauwen, F. Verbiest, K. Cornelis, J. Tops, and R. Koch, "Visual modeling with a hand-held camera," Int. J. Comput. Vision 59, 207–232 (2004).
6. T. Moons, L. V. Gool, M. Proesmans, and E. Pauwels, "Affine reconstruction from perspective image pairs with a relative object-camera translation in between," IEEE Trans. Pattern Anal. Mach. Intell. 18, 77–83 (1996).
7. P. Hammarstedt, F. Kahl, and A. Heyden, "Affine reconstruction from translational motion under various auto-calibration constraints," J. Math. Imaging Vis. 24, 245–257 (2006).
8. L. Agapito, E. Hayman, and I. Reid, "Self-calibration of rotating and zooming cameras," Int. J. Comput. Vision 45, 107–127 (2001).
9. R. Cipolla, T. Drummond, and D. P. Robertson, "Camera calibration from vanishing points in images of architectural scenes," in Proc. British Machine Vision Conference (Nottingham, England, 1999), pp. 382–391.
10. D. Liebowitz and A. Zisserman, "Combining scene and auto-calibration constraints," in Proc. IEEE International Conference on Computer Vision (Kerkyra, Greece, 1999), pp. 293–300.
11. D. Jelinek and C. J. Taylor, "Reconstruction of linearly parameterized models from single images with a camera of unknown focal length," IEEE Trans. Pattern Anal. Mach. Intell. 23, 767–773 (2001).
12. C. Rother and S. Carlsson, "Linear multi view reconstruction and camera recovery using a reference plane," Int. J. Comput. Vision 49, 117–141 (2002).
13. M. Wilczkowiak, P. Sturm, and E. Boyer, "Using geometric constraints through parallelepipeds for calibration and 3D modelling," IEEE Trans. Pattern Anal. Mach. Intell. 27, 194–207 (2005).
14. E. Grossmann and J. Santos-Victor, "Least-squares 3D reconstruction from one or more views and geometric clues," Comput. Vis. Image Und. 99, 151–174 (2005).
15. F. C. Wu, F. Q. Duan, and Z. Y. Hu, "An affine invariant of parallelograms and its application to camera calibration and 3D reconstruction," in Proc. European Conference on Computer Vision (2006), pp. 191–204.
16. L. G. de la Fraga and O. Schutze, "Direct calibration by fitting of cuboids to a single image using differential evolution," Int. J. Comput. Vision 80, 119–127 (2009).
17. N. Jiang, P. Tan, and L.-F. Cheong, "Symmetric architecture modeling with a single image," ACM T. Graphic. (Proc. SIGGRAPH Asia) 28 (2009).
18. F. Mai, Y. S. Hung, and G. Chesi, "Projective reconstruction of ellipses from multiple images," Pattern Recogn. 43, 545–556 (2010).
19. K.-Y. K. Wong, G. Zhang, and Z. Chen, "A stratified approach for camera calibration using spheres," IEEE Trans. Image Process. 20, 305–316 (2011).
20. R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Second Edition (Cambridge University Press, 2003).
21. Q.-T. Luong and T. Viéville, "Canonical representations for the geometries of multiple perspective views," Comput. Vis. Image Und. 64, 193–229 (1996).
22. R. Hartley, "In defence of the 8-point algorithm," in Proc. International Conference on Computer Vision (Sendai, Japan, 1995), pp. 1064–1070.
23. B. K. P. Horn, H. M. Hilden, and S. Negahdaripour, "Closed form solution of absolute orientation using orthonormal matrices," J. Opt. Soc. Am. 5, 1127–1135 (1988).
24. D. A. Forsyth and J. Ponce, Computer Vision: A Modern Approach (Prentice Hall, 2003).

OCIS Codes
(150.0155) Machine vision : Machine vision optics
(150.1135) Machine vision : Algorithms
(150.1488) Machine vision : Calibration

ToC Category:
Machine Vision

History
Original Manuscript: November 30, 2012
Revised Manuscript: January 28, 2013
Manuscript Accepted: January 29, 2013
Published: February 13, 2013

Citation
Jae-Hean Kim and Bon-Ki Koo, "Linear stratified approach using full geometric constraints for 3D scene reconstruction and camera calibration," Opt. Express 21, 4456-4474 (2013)
http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-21-4-4456


