Convenient calibration method for unsynchronized camera networks using an inaccurate small reference object

Jae-Hean Kim and Bon-Ki Koo


Optics Express, Vol. 20, Issue 23, pp. 25292-25310 (2012)
http://dx.doi.org/10.1364/OE.20.025292


Abstract

In this paper, a new and convenient calibration algorithm is proposed for unsynchronized camera networks with a large capture volume. The proposed method provides a simple and accurate means of calibration using a small 3D reference object. Moreover, since the inaccuracy of the object is compensated simultaneously, its manufacturing cost can be decreased. The extrinsic and intrinsic parameters are recovered simultaneously by capturing the object placed arbitrarily in different locations in the capture volume. The proposed method first resolves the problem linearly by factorizing projection matrices into camera and object pose parameters. Owing to the multi-view constraints embedded in the factorization, consistency of the rigid transformations among cameras and objects can be enforced. These consistent estimates can be further refined by a non-linear optimization process. The proposed algorithm is evaluated via simulated and real experiments in order to verify that it is more efficient than previous methods.

© 2012 OSA

1. Introduction

To perform vision-based tasks such as 3D reconstruction and tracking using camera networks, the cameras must be calibrated in advance. To obtain accurate calibration results, the first approach developed was to set up a calibration object manufactured with great accuracy [1]. In cases where the image capture volume is large, the calibration object should also be as large as possible, because the points on the object must cover the volume in order to provide enough reference information. However, in practice it is difficult to handle and maintain a large calibration object, and manufacturing a large object accurately is both expensive and technically difficult.

To avoid the problem of having to use a 3D calibration object, a method using a simple 2D object was proposed in [2, 3]. In practice, this method is very handy when a small capture volume is involved. However, certain drawbacks remain when the capture volume is increased. The correspondingly large 2D object must remain flat and has to be moved by hand to obtain the multiple images needed for calibration. Moreover, for camera networks in which the cameras are arranged around the capture volume, it is impossible for cameras on opposite sides to see the reference points concurrently when a 2D object is used. This is one of the disadvantages of using a 2D object compared to using a 3D object: to acquire good relative extrinsic parameters between the cameras, it is desirable to obtain as many reference points observed in a common coordinate system as possible.

Many methods adopting the idea of using moving bright spots have been suggested for the calibration of sparsely distributed camera networks. Kurillo et al. and Medeiros et al. each proposed a method based on pairwise calibration [5, 6]. They used a rod of known length with LED lights at each end to establish a consistent scale for sparsely distributed cameras. Chen et al. used an iterative approach to refine the calibration results from unsynchronized cameras [7]. However, all these methods assume that the intrinsic parameters are given a priori, which contributes to stable calibration results for sparsely distributed cameras.

Sinha and Pollefeys developed a multi-camera calibration method using the motion of silhouettes [8]. This method makes use of discrete sampling on the silhouette boundary of a subject to find the frontier points, which can be used to determine the epipolar geometry between any two views. This elegant approach has the advantage of not using any calibration object. However, many frontier points must be spread over the images (see Fig. 8 in [8]) to obtain reliable results, and such points can only be obtained when a subject moves widely over the capture volume. If the cameras are unsynchronized, the subject must therefore act as it would in a stop-motion animation. Although this work also suggested a scheme for handling unsynchronized video streams, an approximate interpolation strategy was used to treat the sub-frame discrepancy, and it can only be applied when the subject's inter-frame motion is small.

Another method that also uses object silhouettes, but introduces an optimization process over a continuous space, has also been suggested [9]. Although this method alleviates the above requirements concerning the distribution of the frontier points, it requires a good initialization of the camera parameters.

In this paper, we propose a new calibration algorithm for camera networks that uses a small 3D reference object. The purpose is to overcome the problems stated above and to eliminate the need for large 2D or 3D reference objects in the case of unsynchronized camera systems. The method is convenient in that a small reference object is easier to manufacture, move, and maintain. The extrinsic and intrinsic parameters are recovered simultaneously by capturing the object, which is placed arbitrarily in different locations in the capture volume. The calibration object is maneuvered so that the reference points on the object fully cover the capture volume. The relative pose parameters between the object placements are obtained simultaneously with the camera parameters; thus the object acts as a large virtual calibration object, and the proposed method provides a simple and accurate means of calibrating camera networks. Moreover, since the proposed method can compensate for the inaccuracy of the object, the manufacturing cost of the 3D reference object can be reduced. The radial lens distortion is also calibrated simultaneously, using a method similar to [2].

2. Related works

To calibrate camera networks, Kassebaum et al. proposed a method that places a 3D calibration object in only one position in each pairwise view overlap [10]. They assumed that the intrinsic parameters were given by a separate estimation method. Although the method can obtain reasonable results by virtue of the given parameters, the need for a large object to achieve accuracy cannot be avoided. Tests are presented in this paper to show that the results obtained with that approach are substantially less accurate than those achieved with our algorithm.

Most self-calibration methods based on the factorization approach decompose a point measurement matrix into camera and point parameters [4, 11–14]. The proposed method introduces a new factorization-based approach that decomposes the projection matrices corresponding to each object location into camera and object parameters. It is inspired by the works of [3, 15, 16], in which the measurement matrix of primitives, e.g., rectangles or parallelepipeds, was decomposed. The proposed method applies this type of approach to multi-camera calibration using a small reference object. Since the projection matrices carry more geometric information than points, this is very helpful in practice for sparsely distributed cameras compared to point-based self-calibration techniques.

The point-based factorization methods use an iterative approach to estimate the scale factors corresponding to the projective depths of each point [4, 12]. In the proposed framework, however, very simple approaches can be used to estimate the scale factors linearly. Two alternative approaches are introduced. One of them originates from the work of Wilczkowiak et al. [16], who also needed a rank-3 factorization, as in our case, to decompose the measurement matrix of parallelepipeds. In this paper, we analyze the two approaches and suggest which one is better in terms of accuracy.

3. Preliminaries

3.1. Camera parameterization

The camera is represented using a pinhole model. The projection of a point X in the 3D world coordinate system to a point x in the 2D image of the ith camera is expressed as follows:
$$\mathbf{x} \cong \mathbf{K}_i\,[\mathbf{R}_i \mid \mathbf{t}_i]\,\mathbf{X} \equiv \mathbf{M}_i\mathbf{X}, \tag{1}$$
where x and X are homogeneous coordinates of the points and '≅' indicates equality up to scale. Ki is the camera's intrinsic matrix, as defined in [1]. The rotation matrix Ri and the vector ti represent the camera's orientation and position, respectively. The 3 × 4 matrix Mi encapsulates the camera's intrinsic and extrinsic parameters.
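As a concrete illustration of Eq. (1), the following minimal numpy sketch (ours, not code from the paper) assembles Mi = Ki[Ri | ti] and projects a homogeneous world point; the intrinsic values match the simulation setup of Section 6.1, while R, t, and X are arbitrary assumptions.

```python
import numpy as np

def projection_matrix(K, R, t):
    """M = K [R | t], the 3x4 camera matrix of Eq. (1)."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(M, X_h):
    """Project a homogeneous 4-vector X_h and dehomogenize to pixel coordinates."""
    x_h = M @ X_h
    return x_h[:2] / x_h[2]

# Intrinsics as in the simulated experiments (fu = fv = 1600, u0 = 800, v0 = 600).
K = np.array([[1600.0, 0.0, 800.0],
              [0.0, 1600.0, 600.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 3.0])      # arbitrary example pose
X = np.array([0.1, -0.2, 1.0, 1.0])              # homogeneous world point
print(project(projection_matrix(K, R, t), X))    # pixel coordinates of the projection
```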

3.2. Object pose parameterization

Let Xref be the homogeneous coordinates of a reference point with respect to the coordinate system attached to the 3D object. When the object is located at the jth position in the world coordinate system, the point is represented as follows:
$$\mathbf{X} = \begin{bmatrix} \mathbf{S}_j & \mathbf{v}_j \\ \mathbf{0}^T & 1 \end{bmatrix}\mathbf{X}_{\mathrm{ref}} = \mathbf{N}_j\mathbf{X}_{\mathrm{ref}}, \tag{2}$$
where the rotation matrix Sj represents the object's orientation and the vector vj its position.

4. Image measurements

4.1. Projection matrix

Consider the projection of a reference point onto a camera image plane. Using the results from Section 3, the projection of the reference point in the image is:
$$\mathbf{x} \cong \mathbf{M}_i\mathbf{N}_j\mathbf{X}_{\mathrm{ref}} = \mathbf{P}_{ij}\mathbf{X}_{\mathrm{ref}}, \tag{3}$$
where
$$\mathbf{P}_{ij} = \mathbf{M}_i\mathbf{N}_j = \mathbf{K}_i\,[\mathbf{R}_i\mathbf{S}_j \mid \mathbf{R}_i\mathbf{v}_j + \mathbf{t}_i].$$

The 3 × 4 matrix Pij will be called the projection matrix. It represents a perspective projection that maps the reference points on the object located at the jth position onto the corresponding imaged points of the ith camera. This is illustrated in Fig. 1. The rotation matrix RiSj and the vector Rivj + ti represent the ith camera's rotation and position with respect to the coordinate system attached to the jth object.

Fig. 1 Camera and 3D object pose parameters
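As a sketch of Eq. (3) (our own illustration, with hypothetical example data), the per-placement projection matrix can be composed from the camera parameters of Eq. (1) and the object pose of Eq. (2); the final assertion checks the closed form Ki[RiSj | Rivj + ti].

```python
import numpy as np

def object_pose(S, v):
    """N_j of Eq. (2): the 4x4 rigid transform of the object's jth placement."""
    N = np.eye(4)
    N[:3, :3], N[:3, 3] = S, v
    return N

def camera_matrix(K, R, t):
    """M_i = K_i [R_i | t_i] of Eq. (1)."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def placement_projection(K, R, t, S, v):
    """P_ij = M_i N_j of Eq. (3)."""
    return camera_matrix(K, R, t) @ object_pose(S, v)

# Consistency check against the closed form K [R S | R v + t] (random example data).
rng = np.random.default_rng(0)
K = np.triu(rng.normal(size=(3, 3))); K[2, 2] = 1.0
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
S, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t, v = rng.normal(size=3), rng.normal(size=3)
P = placement_projection(K, R, t, S, v)
assert np.allclose(P, K @ np.hstack([R @ S, (R @ v + t).reshape(3, 1)]))
```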

When at least six reference points in general position on the object can be viewed, the projection matrix can be computed up to scale [17]. Assume that the object is positioned at the jth location and can be viewed from the ith camera. Let Xref^k be the kth point on the reference object with respect to the coordinate system attached to the object, and let x̃ij^k (= [u, v]^T) be the corresponding point measured in the captured image. For each point correspondence Xref^k ↔ x̃ij^k, we can derive the relationship
$$\begin{bmatrix} \mathbf{0}^T & -\mathbf{X}_{\mathrm{ref}}^{k\,T} & v\,\mathbf{X}_{\mathrm{ref}}^{k\,T} \\ \mathbf{X}_{\mathrm{ref}}^{k\,T} & \mathbf{0}^T & -u\,\mathbf{X}_{\mathrm{ref}}^{k\,T} \\ -v\,\mathbf{X}_{\mathrm{ref}}^{k\,T} & u\,\mathbf{X}_{\mathrm{ref}}^{k\,T} & \mathbf{0}^T \end{bmatrix}\mathbf{p} = \mathbf{0}, \tag{4}$$
where p^T = [Pij^{1T} Pij^{2T} Pij^{3T}] and Pij^l is a 4-vector, the lth row of Pij. From a set of n point correspondences, a 2n × 12 matrix Λ can be obtained by stacking up two rows of Eq. (4) for each correspondence (only two of the three rows are linearly independent). The projection matrix is computed up to scale by solving the set of equations
$$\Lambda\mathbf{p} = \mathbf{0}. \tag{5}$$

The parameters of the projection matrix are further refined by minimizing the re-projection error defined as follows:
$$\sum_{k \in I_{ij}} \left\| \tilde{\mathbf{x}}_{ij}^{k} - \hat{\mathbf{x}}_{ij}^{k}(\mathbf{P}_{ij}, \mathbf{X}_{\mathrm{ref}}^{k}) \right\|^{2}, \tag{6}$$
where Iij is the set of indices of the reference points that are visible from the ith camera when the object is placed at the jth location, and x̂ij^k represents the image point synthesized with Pij and Xref^k. This non-linear minimization problem is solved with the Levenberg-Marquardt algorithm, initialized with the result from Eq. (5).
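A minimal numpy sketch of the linear step, Eqs. (4)-(5) (our illustration, not the authors' code): two rows of Eq. (4) are stacked per correspondence and p is taken as the right singular vector of Λ associated with the smallest singular value. The subsequent Levenberg-Marquardt refinement of Eq. (6) (e.g., with scipy.optimize.least_squares) is omitted here.

```python
import numpy as np

def dlt_projection_matrix(X_ref, x_img):
    """Linear estimate of P_ij up to scale from >= 6 correspondences, Eqs. (4)-(5).

    X_ref : (n, 4) homogeneous reference points in the object frame.
    x_img : (n, 2) measured image points (u, v).
    """
    rows = []
    for (u, v), Xh in zip(x_img, X_ref):
        zero = np.zeros(4)
        rows.append(np.hstack([zero, -Xh, v * Xh]))   # first row of Eq. (4)
        rows.append(np.hstack([Xh, zero, -u * Xh]))   # second row of Eq. (4)
    Lam = np.asarray(rows)                            # the 2n x 12 matrix Lambda
    _, _, Vt = np.linalg.svd(Lam)
    p = Vt[-1]                                        # null-space direction of Lambda
    return p.reshape(3, 4)
```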

4.2. Measurement matrix

Let us now consider the case in which the object is located at n different positions and seen by m cameras. Let P̃ij be the estimate of the projection matrix associated with the ith camera and the jth object pose, and let λij be a scale factor such that the following equality holds:
$$\lambda_{ij}\tilde{\mathbf{P}}_{ij} = \mathbf{M}_i\mathbf{N}_j = \mathbf{P}_{ij}. \tag{7}$$

We may gather the estimated projection matrices for all m cameras and n object poses into the following single matrix:
$$\tilde{\mathbf{W}} = \begin{bmatrix} \tilde{\mathbf{P}}_{11} & \cdots & \tilde{\mathbf{P}}_{1n} \\ \vdots & \ddots & \vdots \\ \tilde{\mathbf{P}}_{m1} & \cdots & \tilde{\mathbf{P}}_{mn} \end{bmatrix}. \tag{8}$$
This matrix will be called the measurement matrix. When the scale factors are recovered, the measurement matrix can be factorized as follows:
$$\begin{bmatrix} \lambda_{11}\tilde{\mathbf{P}}_{11} & \cdots & \lambda_{1n}\tilde{\mathbf{P}}_{1n} \\ \vdots & \ddots & \vdots \\ \lambda_{m1}\tilde{\mathbf{P}}_{m1} & \cdots & \lambda_{mn}\tilde{\mathbf{P}}_{mn} \end{bmatrix} = \begin{bmatrix} \mathbf{P}_{11} & \cdots & \mathbf{P}_{1n} \\ \vdots & \ddots & \vdots \\ \mathbf{P}_{m1} & \cdots & \mathbf{P}_{mn} \end{bmatrix} = \begin{bmatrix} \mathbf{M}_1 \\ \vdots \\ \mathbf{M}_m \end{bmatrix}\begin{bmatrix} \mathbf{N}_1 & \cdots & \mathbf{N}_n \end{bmatrix}. \tag{9}$$
However, since the projection matrices are obtained only up to scale, the measurement matrix cannot be factorized in its current form.

5. Parameter estimation

5.1. Rescaling the measurement matrix (Method 1)

In this section, we will describe how to solve the scale-factor problem referred to in Section 4.2. Let H̃ij be the leading 3 × 3 sub-matrix of P̃ij, which satisfies H̃ij ≅ KiRiSj. Assume that, for alternative scale factors {ρij}, the following reduced measurement matrix can be factorized as follows:
$$\tilde{\mathbf{Y}} = \begin{bmatrix} \rho_{11}\tilde{\mathbf{H}}_{11} & \cdots & \rho_{1n}\tilde{\mathbf{H}}_{1n} \\ \vdots & \ddots & \vdots \\ \rho_{m1}\tilde{\mathbf{H}}_{m1} & \cdots & \rho_{mn}\tilde{\mathbf{H}}_{mn} \end{bmatrix} = \begin{bmatrix} a_1\mathbf{K}_1\mathbf{R}_1 \\ \vdots \\ a_m\mathbf{K}_m\mathbf{R}_m \end{bmatrix}\begin{bmatrix} \mathbf{S}_1 & \cdots & \mathbf{S}_n \end{bmatrix}. \tag{10}$$

If the {ai} have values such that
$$\det(a_i\mathbf{K}_i\mathbf{R}_i) = 1 \tag{11}$$
for all i, then
$$\det(\rho_{ij}\tilde{\mathbf{H}}_{ij}) = 1. \tag{12}$$
From this observation, we can see that if all the H̃ij's are rescaled to have unit determinants, the above factorization is possible. The alternative scale factors can therefore be determined as
$$\rho_{ij} = \det(\tilde{\mathbf{H}}_{ij})^{-1/3}. \tag{13}$$

5.2. Rescaling the measurement matrix (Method 2)

In fact, after obtaining P˜ij, we can acquire {λij} by RQ-decomposition of H˜ij’s. Assume that H˜ij is decomposed as H˜ij=ΣΘ, where Σ is an upper triangular matrix and Θ is an orthogonal matrix whose determinant is 1. Since Ki is an upper triangular matrix and (RiSj) is also rotation matrix, we can think that Σ=μijKi and Θ = RiSj. Moreover, since (Ki)33 is 1, (Σ)33 is μij.

Because Hij=KiRiSj, where Hij is the leading 3 × 3 sub-matrix of Pij, the scale factors can be determined as λij=1/μij.
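The two scale-factor computations admit a compact sketch (ours; scipy.linalg.rq is used for the RQ-decomposition, and the sign handling is an assumption made so that Θ is a proper rotation as in the text):

```python
import numpy as np
from scipy.linalg import rq

def scale_method1(H_tilde):
    """Method 1, Eq. (13): rho = det(H~)^(-1/3), so that rho * H~ has unit determinant."""
    d = np.linalg.det(H_tilde)
    return np.sign(d) * abs(d) ** (-1.0 / 3.0)   # sign factor handles a negatively scaled estimate

def scale_method2(H_tilde):
    """Method 2: lambda = 1/mu, with H~ = Sigma Theta, Sigma = mu*K, Theta = R*S."""
    Sigma, Theta = rq(H_tilde)
    if np.linalg.det(Theta) < 0:                 # enforce det(Theta) = +1 as assumed in the text
        Sigma, Theta = -Sigma, -Theta
    return 1.0 / Sigma[2, 2]                     # (Sigma)_33 = mu, because (K)_33 = 1
```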

5.3. Analysis of the two rescaling methods

In this section, a more detailed discussion of the two rescaling methods described in Sections 5.1 and 5.2 is given.

Suppose that H̃ij has an RQ-decomposition into the product of Σ and Θ, where Σ is an upper triangular matrix and Θ is an orthogonal matrix whose determinant is 1. Let the diagonal elements of Σ be a, b, and c. Then the scale factor from Method 2 is 1/c. It is worthwhile to note that {(Ki)11, (Ki)22} (= {fu, fv}) = {a/c, b/c}. The determinant of H̃ij is the product of the diagonal elements, abc.

Consequently, the scale factor ρij from Method 1 is (abc)^{−1/3} and λij from Method 2 is (ccc)^{−1/3}. It is worthwhile to note, however, that these scale factors are not ideal values because they are computed from an H̃ij contaminated with image noise. From this observation, we can see that ρij has better numerical conditioning than λij in the presence of noise. When H̃ij is contaminated with image noise, the values of a, b, and c also deviate from their true values. Since these variables are independent of each other, the product abc has a noise-averaging effect compared to ccc; the product of three c's, on the other hand, amplifies the noise in c.

To support this analysis, we performed the following simulated experiment. We generated 3000 random H̃ij's. Zero-mean Gaussian-distributed noise was added to the elements of these matrices; the standard deviation used for each element was 10% of the magnitude of that element. Fig. 2 shows the distribution of the relative error of abc and ccc. For example, the relative error of abc is |abc − ãb̃c̃|/abc, where {a, b, c} are obtained from Hij and {ã, b̃, c̃} from the contaminated H̃ij. The experimental comparison and evaluation of the two methods are presented in Section 6.1.2.

Fig. 2 The distribution of the relative error of abc and ccc. The variables a, b, and c are the diagonal elements of the upper triangular matrix obtained through RQ-decomposition of H (= KRS).
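The experiment of Fig. 2 can be reproduced roughly with the sketch below (our assumptions: random intrinsics in a plausible range, random rotations via QR, and the 10%-of-magnitude element-wise noise described above):

```python
import numpy as np
from scipy.linalg import rq

rng = np.random.default_rng(0)

def random_rotation():
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return Q if np.linalg.det(Q) > 0 else -Q

err_abc, err_ccc = [], []
for _ in range(3000):
    K = np.array([[rng.uniform(800, 2000), 0.0, rng.uniform(400, 1200)],
                  [0.0, rng.uniform(800, 2000), rng.uniform(300, 900)],
                  [0.0, 0.0, 1.0]])
    H = K @ random_rotation() @ random_rotation()          # noise-free H = K R S
    H_noisy = H + rng.normal(0.0, 0.1 * np.abs(H))         # 10%-of-magnitude noise per element
    a, b, c = np.diag(rq(H)[0])
    an, bn, cn = np.diag(rq(H_noisy)[0])
    err_abc.append(abs(a * b * c - an * bn * cn) / abs(a * b * c))
    err_ccc.append(abs(c ** 3 - cn ** 3) / abs(c ** 3))

print("mean relative error:", np.mean(err_abc), "(abc) vs", np.mean(err_ccc), "(ccc)")
```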

5.4. Factorization

5.5. Resolving affine ambiguity

Let the matrices Û3m×3 and V̂3n×3 of Eq. (16) be decomposed into 3 × 3 sub-matrices: Û^T = [U1^T ⋯ Um^T] and V̂^T = [V1^T ⋯ Vn^T].

From Eqs. (10) and (16), we have T^{−1}Vj^T = Sj. Since Sj is a rotation matrix, this can be written as:
$$\mathbf{T}\mathbf{T}^{T} = \mathbf{V}_j^{T}\mathbf{V}_j. \tag{17}$$
From the summation of Eq. (17) over all j's, TT^T can be determined and, consequently, T ≅ (TT^T)^{1/2}Rw, where Rw is an arbitrary rotation matrix representing the fact that the global Euclidean reference frame can be chosen arbitrarily.
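Since the details of the factorization step (Eqs. (14)-(16)) are not reproduced above, the sketch below is only one plausible reading (our assumption): Ỹ is split by a rank-3 truncated SVD into Û V̂^T, Eq. (17) is averaged over the object placements to obtain TT^T, and T is recovered through a Cholesky factor, which leaves exactly the arbitrary rotation Rw discussed in the text.

```python
import numpy as np

def rank3_factorization(Y, n):
    """Rank-3 split Y ~= U_hat @ V_hat.T of the 3m x 3n rescaled matrix of Eq. (10)."""
    U, s, Vt = np.linalg.svd(Y)
    U_hat = U[:, :3] * np.sqrt(s[:3])          # 3m x 3
    V_hat = Vt[:3].T * np.sqrt(s[:3])          # 3n x 3
    # T T^T = V_j^T V_j (Eq. (17)); average over all j for noise robustness.
    TTt = sum(V_hat[3 * j:3 * j + 3].T @ V_hat[3 * j:3 * j + 3] for j in range(n)) / n
    T = np.linalg.cholesky(TTt)                # T, determined up to a rotation R_w
    return U_hat, V_hat, T
```

With this T, UiT plays the role of aiKiRi and T^{−1}Vj^T that of Sj, as used in Section 5.6.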

5.6. Reconstructing the camera and object pose parameters

We can obtain Ki and Ri from the RQ-decomposition of UiT, and Sj from the rotation matrix closest to T^{−1}Vj^T [18].

Let h̃ij be the fourth column of P̃ij, which can be written as:
$$\rho_{ij}\tilde{\mathbf{h}}_{ij} = \mu_{ij}(\mathbf{K}_i\mathbf{R}_i\mathbf{v}_j + \mathbf{K}_i\mathbf{t}_i), \tag{18}$$
where μij can be obtained from det(ρijH̃ij) = det(μijKiRiSj), i.e., μij = [det(ρijH̃ij)/det(Ki)]^{1/3}. Eq. (18) yields three linear equations for each camera/object-pose pair; collecting these equations over all pairs gives a linear system that can be solved for the ti's and vj's.
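A sketch (ours) of how Eq. (18) can be collected into one linear least-squares system over all camera/object-pose pairs; the unknown vector stacks the ti's and vj's, and the remaining global-translation gauge freedom is absorbed by the minimum-norm solution returned by lstsq.

```python
import numpy as np

def solve_translations(h_tilde, rho, mu, K, R, visible, m, n):
    """Joint least-squares solve of Eq. (18) for all t_i and v_j.

    h_tilde : (m, n, 3) fourth columns of the P~_ij, rho/mu : (m, n) scale factors,
    K, R    : lists of the 3x3 intrinsic and rotation matrices of the cameras,
    visible : iterable of (i, j) pairs for which P~_ij was measured.
    """
    A, b = [], []
    for i, j in visible:
        block = np.zeros((3, 3 * (m + n)))
        block[:, 3 * i:3 * i + 3] = mu[i, j] * K[i]                      # K_i t_i term
        block[:, 3 * (m + j):3 * (m + j) + 3] = mu[i, j] * K[i] @ R[i]   # K_i R_i v_j term
        A.append(block)
        b.append(rho[i, j] * h_tilde[i, j])
    x, *_ = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)
    return x[:3 * m].reshape(m, 3), x[3 * m:].reshape(n, 3)   # (t_i), (v_j)
```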

5.7. Filling missing entries

In practice, for camera networks in which the fields of view of the cameras do not all overlap, the calibration object placed in a given location cannot be seen from all the cameras simultaneously. In this case, some entries in the measurement matrix are missing and the factorization cannot proceed. To fill the missing entries, the following simple deduction can be used:
$$\tilde{\mathbf{H}}_{ij} \cong \tilde{\mathbf{H}}_{il}\,(\tilde{\mathbf{H}}_{kl})^{-1}\,\tilde{\mathbf{H}}_{kj}, \tag{19}$$
where H̃il, H̃kl, and H̃kj have been measured for some intermediate kth camera and lth object position. There may be multiple candidates for the intermediate kth camera and lth object position; in this case, all the candidates can be used simultaneously to compensate for image noise.
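A sketch (ours) of the deduction of Eq. (19); when several intermediate (k, l) pairs exist, the candidates are normalized to unit determinant and averaged, which is one simple way to use them simultaneously (the paper does not prescribe a specific combination rule). In practice the outer loop is repeated until no further entry can be filled.

```python
import numpy as np

def fill_missing(H, known):
    """Fill missing H~_ij via H~_il (H~_kl)^-1 H~_kj, Eq. (19).

    H : (m, n, 3, 3) array of leading sub-matrices; known : (m, n) boolean mask.
    Note: chains with k == i or l == j never qualify, since known[i, j] is False.
    """
    m, n = known.shape
    for i in range(m):
        for j in range(n):
            if known[i, j]:
                continue
            cands = []
            for k in range(m):
                for l in range(n):
                    if known[i, l] and known[k, l] and known[k, j]:
                        Hij = H[i, l] @ np.linalg.inv(H[k, l]) @ H[k, j]
                        d = np.linalg.det(Hij)
                        cands.append(np.sign(d) * Hij / abs(d) ** (1.0 / 3.0))
            if cands:
                H[i, j] = np.mean(cands, axis=0)   # average of unit-determinant candidates
                known[i, j] = True
    return H, known
```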

Owing to the simple chain structure of Eq. (19), the filling process can be completed much more simply than with the complex procedures of the point-based factorization methods [13, 14], which fit a low-rank matrix to the measurement matrix.

5.8. Refining the parameters

5.9. Compensating the inaccuracy of a reference object

It is assumed that the object is composed of planes and that each plane is flat. Because the object used in the proposed method is small, it is not difficult to satisfy this requirement for the individual small planes. The inaccuracy can therefore be assumed to arise while these planes are being assembled and the patterns printed on their surfaces. Since it is easy to measure the relative 2D positions of the feature points on each surface plane, we can limit the variables describing the inaccuracy to the rotation matrices Rf and the translation vectors tf of the planes with respect to the coordinate system attached to the object. These variables are illustrated in Fig. 3 for one example plane. Since the planes at different positions can be viewed from more than two cameras concurrently, as in Fig. 3, and the variables do not change while the object moves, they can be computed so as to satisfy overall consistency. The re-projection error function of Eq. (20) is extended as follows:
$$\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k \in I_{ij}} \left\| \tilde{\mathbf{x}}_{ij}^{k} - \hat{\mathbf{x}}_{ij}^{k}\big(\mathbf{K}_i, \mathbf{R}_i, \mathbf{t}_i, \mathbf{S}_j, \mathbf{v}_j, \mathbf{X}_{\mathrm{ref}}^{k}(\mathbf{R}_{f(k)}, \mathbf{t}_{f(k)})\big) \right\|^{2}, \tag{21}$$
where f(k) is the index of the face containing the kth point. Rf(k) and tf(k) are initialized according to the original drawing of the object; the other parameters are initialized with the estimation results obtained up to Section 5.7.

Fig. 3 Rotation matrix Rf and translation vector tf representing the inaccuracy of the reference object.
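A small sketch (ours) of how the per-face pose variables enter Xref^k in Eq. (21): the measured 2D position of a point on its (assumed flat) face is lifted to 3D through the face pose (Rf, tf), which the refinement treats as unknowns alongside the camera and object-placement parameters (e.g., inside a scipy.optimize.least_squares residual).

```python
import numpy as np

def reference_point(xy_on_face, R_f, t_f):
    """X_ref^k(R_f, t_f) of Eq. (21): lift a measured in-plane 2D position into the
    object coordinate system through the pose of its face."""
    X_face = np.array([xy_on_face[0], xy_on_face[1], 0.0])   # the face is assumed flat
    return R_f @ X_face + t_f
```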

5.10. Dealing with radial distortion

Up to now, the lens distortion of the cameras has not been considered. In the proposed method, only the first two terms of radial distortion are considered, because it has been found that a more elaborate model offers negligible improvement compared with the effect of sensor quantization and merely causes numerical instability [2]. A method similar to that used in [2] is adopted. x̂ij^k of Eq. (21) is extended to include the radial lens distortion parameters. The real (distorted) normalized image coordinates (x̆, y̆) are represented as follows:
$$\breve{x} = x + x\left[k_1(x^2 + y^2) + k_2(x^2 + y^2)^2\right], \tag{22}$$
$$\breve{y} = y + y\left[k_1(x^2 + y^2) + k_2(x^2 + y^2)^2\right], \tag{23}$$
where (x, y) are the distortion-free normalized image coordinates, and k1 and k2 are the coefficients of the radial distortion. An initial guess of k1 and k2 can be obtained through equations similar to Eq. (13) in [2], modified appropriately for the proposed method, or simply by setting them to 0.
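Eqs. (22)-(23) in vectorized form (our sketch), applied to distortion-free normalized coordinates:

```python
import numpy as np

def distort(xy, k1, k2):
    """Two-term radial model of Eqs. (22)-(23); xy is an (..., 2) array of
    distortion-free normalized image coordinates."""
    r2 = np.sum(xy ** 2, axis=-1, keepdims=True)
    return xy * (1.0 + k1 * r2 + k2 * r2 ** 2)
```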

6. Experimental results

6.1. Simulated experiments

Simulated experiments were performed to analyze the performance of the proposed algorithm. The advantage of simulated experiments over real image experiments is that rigorous ground truth is available while the relevant parameters are varied systematically.

Two types of experimental environments were constructed. First, six cameras were evenly located on a circle with a radius of 3m (See Fig. 4(a)). Since all the cameras looked towards the center, all the fields of view of the cameras overlapped. In the second environment, five cameras were mounted on the side walls of a corridor (See Fig. 4(b)). Since the optical axes of all the cameras were set to be parallel to the end wall of the corridor, not all of the fields of view of the cameras overlapped. The length, width, and height of the corridor were 5m, 6m, and 2m, respectively. We refer to these two environments as ENV1 and ENV2, respectively.

Fig. 4 The environments for the simulated experiments. (a) ENV1. (b) ENV2.

Synthetic 1600×1200 images were taken by the cameras with the following intrinsic parameters: (fu, fv, s, u0, v0) = (1600, 1600, 0, 800, 600). Zero-mean Gaussian-distributed noise with standard deviation σ was added to the image projections. It was assumed that there were no gross outliers due to false correspondences. However, to account for outliers in feature point detection caused by perspective and photometric effects, 10% of the noise vectors were replaced with vectors drawn from another Gaussian distribution with standard deviation 2σ in all simulated experiments.
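This noise model can be written as the short helper below (our sketch; the outlier fraction and the 2σ width follow the description above):

```python
import numpy as np

def image_noise(n_points, sigma, outlier_frac=0.1, rng=None):
    """Zero-mean Gaussian noise vectors with std sigma; a fraction of them is
    redrawn from a wider Gaussian with std 2*sigma to mimic detection outliers."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, (n_points, 2))
    outliers = rng.random(n_points) < outlier_frac
    noise[outliers] = rng.normal(0.0, 2.0 * sigma, (int(outliers.sum()), 2))
    return noise
```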

The 3D object with the shape shown in Fig. 7(a) was used, and each face had 9 reference points. The number of trials was 100 for each simulation and the results shown are the averages. All results were obtained with the method described in Section 5.1 for determining the scale factors, except for the results of Section 6.1.2. Since the performances obtained for all intrinsic parameters are highly similar, the estimation results for fv, u0, and v0 are not depicted here.

6.1.1. Performance w.r.t noise level, object size, and number of object positions

In this section, we analyze the performance with respect to the noise magnitude, the size of the reference object, and the number of object positions.

All of the results are displayed in Table 1. From the results, it is clear that the proposed method can obtain accurate estimates with a relatively small calibration object. When the number of object positions is increased in the case of the smaller object (240mm), error compensation can be observed. An object with a size of 240mm was a reasonable minimum for acquiring stable results in the given environments. It is worth noting that the number of object positions is equal to the number of shots required to calibrate the network. In ENV2, more object positions were needed than in ENV1, because not all the cameras viewed the same region and the object had to be placed in several separate view overlaps. Fig. 5 shows the visibility map for the object positions in ENV2 when the number of object positions is 25. From this map, it can be seen that the results for ENV2 validate the process of filling the missing entries described in Section 5.7.

Fig. 5 The visibility map for the object positions in ENV2 when the number of object positions is 25. The horizontal and vertical axes represent the object position index and camera index, respectively.

Table 1. Results of the simulated experiments for performance evaluation. σ, W, and P represent the noise standard deviation in pixels, the object width in mm, and the number of object positions, respectively. RPE is the re-projection error in pixels. Efu is the error for fu. The rotation error ER is measured by the Frobenius norm of the discrepancy between the true and estimated rotation matrices. Et is the norm of the error for the translation vector in mm.


The last four rows of Table 1 compare the proposed method with the method suggested by Kassebaum et al. [10]. As expected, that method could not yield accurate estimation results without a sufficiently large object. To obtain estimation results comparable with those of the proposed method, it would have needed a very large object with a width of 1450mm in these simulation environments. Moreover, even when such a large object was used, that method obtained worse results than the proposed method when the intrinsic parameters were not given. This is due to the fact that feature points cannot be placed inside the object. Unlike a large object placed in only one position, the proposed method can provide a great number of feature points evenly distributed over the capture volume.

6.1.2. Comparison of the two rescaling methods

We suggested two methods of determining the scale factors in Sections 5.1 and 5.2. The comparison of the two methods is summarized in Table 2. The number of object positions was 25 in this experiment and the environment was ENV2. After the factorization step, the RPE results obtained with Method 1 were better than those obtained with Method 2. These results are to be expected from the analysis in Section 5.3.

Table 2. Results of the simulated experiments used to compare the methods in Sections 5.1 and 5.2. RPE1 is the re-projection error obtained after the factorization step and RPE2 after the refinement step.


These results are the initial values for the non-linear optimization described in Section 5.8. In the case of a small noise level (σ = 0.5) and a larger object (360mm), the final results after the refinement step are fortunately similar. However, the differences in the initial values affect the final results when the noise level increases or a smaller object (240mm) is used.

The feasibility of the factorization step depends on how close the reduced measurement matrix rescaled by each of the two methods is to rank 3. Fig. 6 shows the magnitudes of the singular values of the rescaled Ỹ for each environment of Table 2. To be close to rank 3, the ratio of the 3rd and 4th singular values should be large with respect to the ratio of the 1st and 3rd singular values. We can see that this ratio is larger for Method 1 than for Method 2. From these experimental results, it can be concluded that Method 1 is preferable to Method 2.

Fig. 6 The magnitudes of the singular values of the rescaled Ỹ for each environment of Table 2.

6.1.3. Performance w.r.t object inaccuracy and radial distortion

In this section, we test the feasibility of the approaches described in Sections 5.9 and 5.10. The width of the object was 360mm. The image noise was zero-mean Gaussian-distributed with a standard deviation of 0.5. The number of object positions was 9 for ENV1 and 25 for ENV2. The results are summarized in Table 3. It can be seen that the proposed method can be used even if the reference object is inaccurate. The radial distortion parameters were also well estimated. The proposed algorithm could not converge for inaccuracy levels greater than (θ̂f, ‖t̂f‖) = (3°, 5mm) in ENV1 and (1°, 3mm) in ENV2 (See Table 3 for an explanation of the parameters). Since there were many missing entries in ENV2 compared with ENV1, error propagation during the filling process caused this difference. Nevertheless, this performance means that a considerable degree of inaccuracy in the object can be tolerated, decreasing the manufacturing cost.

Table 3. Results of the simulated experiments on object inaccuracy and radial distortion. (θ̂f, ‖t̂f‖) represent the pose discrepancy of the object's faces from the original drawing: each face was rotated around a random axis by an angle θ̂f (°) and moved in a random direction by t̂f (mm). (k1, k2) are the lens distortion parameters. Et̂f is the norm of the error for t̂f. (Ek1, Ek2) are the errors for (k1, k2).


6.1.4. Comparison with the method of [4]

As mentioned in Section 1, the feature points detected for the proposed method can be used as the input to the algorithm of [4]. Unlike a moving bright spot, which can be viewed from almost any direction, not all the features on the object can be seen from every camera, because the object cannot be made transparent. Consequently, the survival duration of the features across the cameras is relatively short compared to instances where a moving bright spot is used. If the cameras are arranged more densely, more stable results may be obtained owing to the chaining of locally stable reconstructions. The simulation results for the method of [4] using the features detected for the proposed method are shown in Table 4. The width of the object was 360mm. The image noise was zero-mean Gaussian-distributed with a standard deviation of 0.5. Since the results of [4] are defined only up to scale, a similarity transformation was applied so that the distances between the estimated camera positions and their corresponding ground truths were minimized. From the results, we can see that satisfactory results cannot be obtained even if the number of object positions increases. Since the cameras of ENV2 look down at the capture volume, the points on the upper plane of the object can be viewed by three adjacent cameras simultaneously. This is why calibration results could be obtained in ENV2, unlike in ENV1, when the number of cameras was small (5 or 6). To identify the effect of the number of cameras, more cameras were placed evenly in the environments while the number of object positions was fixed. As mentioned above, more accurate calibration results were then obtained. This implies that the method of [4] can be applied only in the case of densely distributed cameras when the feature points are extracted from a 3D reference object for unsynchronized camera networks.

Table 4. Results of the simulated experiments for the method of [4] using the features detected for the proposed method. C and P represent the number of cameras and object positions, respectively. Cases in which calibration results could not be obtained with the method are indicated by "×".


6.2. Real image experiments

6.2.1. Manufacturing a reference object

Owing to the advantage of the proposed algorithm described in Section 6.1.3, we constructed the reference object manually from inexpensive foam boards (See Fig. 7(a)). Eighteen 2D patterns containing nine concentric circles were printed on sheets of paper, which were then attached to the reference object. Since at least two planes should be visible from any viewing direction in order to compute the projection matrices, this symmetric design was considered reasonable. The width of the object was about 360mm.

6.2.2. Extraction of feature points

The feature points on the reference object were extracted automatically. First, circle candidates were extracted via edge detection and blob coloring. Then, two concentric circles having the nearest mass centers were paired in order to discriminate them from spurious circles in the background. By recognizing the characters shown in Fig. 7(a), the indices of the faces and circles were identified. After obtaining the ellipse parameters of the imaged circles using the method of [19], the bi-tangents of every pair of circles on the same plane were computed, and the intersection points of the bi-tangents were used as feature points. The extraction results are shown in Fig. 7(b). It is worth noting that these point coordinates should be normalized to obtain stable calibration results [20]; this was also done in the simulated experiments outlined above.
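Point normalization in the spirit of [20] can be sketched as follows (ours): the image points are translated to zero mean and scaled so that their average distance from the origin is √2, and the corresponding similarity transform is undone on the estimated projection matrix afterwards.

```python
import numpy as np

def normalize_points(x):
    """Isotropic normalization of (n, 2) image points; returns the normalized
    points and the 3x3 similarity T such that x_norm_h = T @ x_h."""
    c = x.mean(axis=0)
    scale = np.sqrt(2.0) / np.mean(np.linalg.norm(x - c, axis=1))
    T = np.array([[scale, 0.0, -scale * c[0]],
                  [0.0, scale, -scale * c[1]],
                  [0.0, 0.0, 1.0]])
    x_h = np.hstack([x, np.ones((len(x), 1))])
    return (x_h @ T.T)[:, :2], T
```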

6.2.3. Actual camera network environments

The proposed method was evaluated on two actual camera networks. The environments were similar to ENV1 in Fig. 4(a) and ENV2 in Fig. 4(b); we refer to them as Network1 and Network2, respectively. Fig. 8 illustrates the two camera networks, and Table 5 gives their detailed specifications.

Fig. 7 (a) 3D reference object used in real image experiments. (b) Automatic feature extraction results for some circle pairs. The blue lines represent the bi-tangents. The extracted features are represented with red +’s.
Fig. 8 Two actual camera networks for evaluation of the proposed algorithm.

Table 5. The specifications of the actual camera network environments. C represents the number of cameras used in the experiment. P represents the number of object positions.


Network1 is shown in the left-hand image of Fig. 8. Images captured from some of the cameras of Network1 are shown in Fig. 9. Fig. 10 shows the recovered cameras and objects. From these figures, it can be seen that the qualitative positions of the cameras and objects were estimated correctly. Since ground truth is not available in real image experiments, the re-projection results are a good index of the performance of the proposed method. The average re-projection error (RPE) of the reference points was 0.0835 pixels. When the inaccuracy of the object was not compensated by the technique described in Section 5.9, the average RPE was 0.284 pixels. This implies that the object was indeed inaccurate and that the pose discrepancy of the object's faces from the original drawing was well estimated. The average estimated discrepancy of the position was 2.32mm.

Fig. 9 Example of captured images for Network1.
Fig. 10 Visualization of the calibration results for Network1. (a) Top and (b) side views are shown, respectively.

The right-hand image of Fig. 8 shows Network2. Fig. 11 shows the images of the reference object captured from some of the cameras. Fig. 12 shows the visibility map for the object positions in Network2. The filling process described in Section 5.7 was conducted successfully in this experiment. The visualization of the calibration results is shown in Fig. 13. The average RPE was 0.0473 pixels; when the inaccuracy of the object was not compensated, the average RPE was 0.148 pixels.

Fig. 11 Example of captured images for Network2.
Fig. 12 The visibility map for the object positions in Network2. The horizontal and vertical axes represent the object position index and the camera index, respectively.
Fig. 13 Visualization of the calibration results for Network2. (a) Top and (b) side views are shown, respectively.

In order to investigate the consistency of the proposed method, we applied it to the camera subsets {0,2,4} and {1,3} in both Network1 and Network2. The calibration results are shown in Tables 6 and 7. We can see that the differences between the results from the subgroups of cameras and those from all the cameras are quite small. These results imply that the proposed algorithm provides consistent and stable estimates.

Table 6. Results of the real image experiments for consistency evaluation. These results are obtained from the subgroups of the cameras, {0,2,4} and {1,3}. Compare these results with those of Table 7.


Table 7. Results of the real image experiments for consistency evaluation. These results are obtained from all the cameras. Compare these results with those of Table 6.


Calibration results could not be obtained using the method of [4] with the feature points detected in either Network1 or Network2. In these environments, no feature could be viewed from more than two cameras simultaneously due to the sparse distribution of the cameras.

7. Conclusions

The intrinsic and extrinsic parameters of camera networks were recovered simply by capturing a small, inaccurate reference object. The proposed method provides a simple and accurate means of calibrating unsynchronized camera networks without the need for a large reference object to achieve accuracy. The algorithm is based on the factorization of projection matrices into camera and object pose parameters. Since the camera and object pose parameters are acquired simultaneously, consistency of the rigid transformations among cameras and objects is obtained. These consistent results were used as the initial values for the non-linear optimization step for further refinement. It was shown that the proposed method can compensate for the inaccuracy of the object and, consequently, decrease the manufacturing cost of the 3D reference object. The radial lens distortion was also calibrated simultaneously. It was also shown that the method of [4] is not suitable for sparsely distributed cameras when a 3D reference object is used for unsynchronized camera networks. Various performance analyses of the proposed method were also presented through the simulated and real image experiments.

Acknowledgments

This work was supported by the strategic technology development program of MCST/MKE/KEIT [KI001798, Development of Full 3D Reconstruction Technology for Broadcasting Communication Fusion]. The authors would also like to thank the anonymous reviewers for their valuable comments to improve the quality of the paper, especially for the suggestions and ideas about the analysis of the two rescaling methods and the noise modelling.

References and links

1. R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. (Cambridge University Press, 2003).
2. Z. Zhang, "A flexible new technique for camera calibration," Tech. Rep. MSR-TR-98-71, Microsoft Corporation (1998).
3. T. Ueshiba and F. Tomita, "Plane-based calibration algorithm for multi-camera systems via factorization of homography matrices," in Proc. IEEE International Conference on Computer Vision, Nice, France (2003), pp. 966–973.
4. T. Svoboda, D. Martinec, and T. Pajdla, "A convenient multi-camera self-calibration for virtual environments," Presence: Teleop. Virt. Environ. 14, 407–422 (2005).
5. G. Kurillo, Z. Li, and R. Bajcsy, "Wide-area external multi-camera calibration using vision graphs and virtual calibration object," in Proc. ACM/IEEE International Conference on Distributed Smart Cameras (2008), pp. 1–9.
6. H. Medeiros, H. Iwaki, and J. Park, "Online distributed calibration of a large network of wireless cameras using dynamic clustering," in Proc. ACM/IEEE International Conference on Distributed Smart Cameras (2008), pp. 1–10.
7. X. Chen, J. Davis, and P. Slusallek, "Wide area camera calibration using virtual calibration objects," in Proc. IEEE International Conference on Computer Vision and Pattern Recognition, Hilton Head, SC, USA (2000), pp. 520–527.
8. S. N. Sinha and M. Pollefeys, "Camera network calibration and synchronization from silhouettes in archived video," Int. J. Comput. Vision 87, 266–283 (2010).
9. E. Boyer, "On using silhouettes for camera calibration," in Proc. Asian Conference on Computer Vision, Hyderabad, India (2006), pp. 1–10.
10. J. Kassebaum, N. Bulusu, and W.-C. Feng, "3-D target-based distributed smart camera network localization," IEEE Trans. Image Process. 19, 2530–2539 (2010).
11. C. Tomasi and T. Kanade, "Shape and motion from image streams under orthography: a factorization method," Int. J. Comput. Vision 9, 137–154 (1992).
12. P. Sturm and B. Triggs, "A factorization based algorithm for multi-image projective structure and motion," in Proc. European Conference on Computer Vision, Cambridge, UK (1996), pp. 709–720.
13. D. Jacobs, "Linear fitting with missing data: applications to structure from motion and to characterizing intensity images," in Proc. IEEE International Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico (1997), pp. 206–212.
14. D. Martinec and T. Pajdla, "Structure from many perspective images with occlusions," in Proc. European Conference on Computer Vision, Copenhagen, Denmark (2002), pp. 355–369.
15. P. Sturm, "Algorithms for plane-based pose estimation," in Proc. IEEE International Conference on Computer Vision and Pattern Recognition, Hilton Head Island, SC, USA (2000), pp. 706–711.
16. M. Wilczkowiak, P. Sturm, and E. Boyer, "Using geometric constraints through parallelepipeds for calibration and 3D modelling," IEEE Trans. Pattern Anal. Mach. Intell. 27, 194–207 (2005).
17. O. Faugeras, Three-Dimensional Computer Vision (The MIT Press, 1993).
18. K. S. Arun, T. S. Huang, and S. D. Blostein, "Least-squares fitting of two 3-D point sets," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-9, 698–700 (1987).
19. A. Fitzgibbon, M. Pilu, and R. B. Fisher, "Direct least square fitting of ellipses," IEEE Trans. Pattern Anal. Mach. Intell. 21, 476–480 (1999).
20. R. Hartley, "In defence of the 8-point algorithm," in Proc. International Conference on Computer Vision (1995), pp. 1064–1070.

OCIS Codes
(150.0155) Machine vision : Machine vision optics
(150.1135) Machine vision : Algorithms
(150.1488) Machine vision : Calibration

ToC Category:
Machine Vision

History
Original Manuscript: August 10, 2012
Revised Manuscript: September 27, 2012
Manuscript Accepted: October 2, 2012
Published: October 22, 2012

Citation
Jae-Hean Kim and Bon-Ki Koo, "Convenient calibration method for unsynchronized camera networks using an inaccurate small reference object," Opt. Express 20, 25292-25310 (2012)
http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-20-23-25292


