Optics Express

  • Editor: C. Martijn de Sterke
  • Vol. 16, Iss. 3 — Feb. 4, 2008
  • pp: 1448–1459

A practical algorithm for learning scene information from monocular video

Lin Zhu, Jie Zhou, Jingyan Song, Zhenlei Yan, and Quanquan Gu

Optics Express, Vol. 16, Issue 3, pp. 1448-1459 (2008)

Estimating scene information, such as the ground/non-ground regions, the relative depth of the ground, and the unevenness of the ground, is important for applications such as video surveillance and map building. Previous research in this field relies on specific assumptions that are difficult to satisfy in practical situations. In this paper, a practical algorithm is proposed to estimate scene information from monocular video. From pedestrian detection results collected over a period of time, the Pedestrian-Scene Map (PS Map), consisting of the average width of a pedestrian and the occurrence probability of a pedestrian at each position of the scene, is learned by integrating pedestrian samples of different sizes at different positions of the scene. The relative depth of the ground region, the ground/non-ground regions, and the unevenness of the ground can then be measured with the PS Map. Experimental results illustrate the effectiveness of the proposed method with a stationary, uncalibrated camera in unconstrained environments.
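
The PS Map idea can be illustrated with a minimal sketch. This is not the authors' implementation: the grid discretization, the function names (`build_ps_map`, `relative_depth`, `ground_mask`), the inverse-width depth rule, and the probability threshold are all assumptions made here for illustration. It assumes each detection is given as a foot position `(x, y)` in pixels plus a bounding-box width, and that apparent width shrinks in proportion to distance.

```python
def build_ps_map(detections, rows, cols, cell=1):
    """Accumulate pedestrian detections into a grid (hypothetical PS Map).

    detections: iterable of (x, y, width) tuples; x, y are image
    coordinates of the detection, width is its bounding-box width.
    Returns (avg_width, occurrence), two rows x cols nested lists.
    """
    counts = [[0] * cols for _ in range(rows)]
    width_sum = [[0.0] * cols for _ in range(rows)]
    for x, y, w in detections:
        r, c = int(y // cell), int(x // cell)
        if 0 <= r < rows and 0 <= c < cols:  # ignore out-of-grid samples
            counts[r][c] += 1
            width_sum[r][c] += w
    total = sum(sum(row) for row in counts)
    # Average width is undefined (None) where no pedestrian was seen.
    avg_width = [
        [width_sum[r][c] / counts[r][c] if counts[r][c] else None
         for c in range(cols)]
        for r in range(rows)
    ]
    # Occurrence probability: share of all samples that fell in each cell.
    occurrence = [
        [counts[r][c] / total if total else 0.0 for c in range(cols)]
        for r in range(rows)
    ]
    return avg_width, occurrence


def relative_depth(avg_width):
    """Assumed rule: depth is inversely proportional to average width.

    Normalized so the cell with the widest pedestrians has depth 1.0.
    """
    widths = [w for row in avg_width for w in row if w is not None and w > 0]
    if not widths:
        return [[None for _ in row] for row in avg_width]
    w_max = max(widths)
    return [
        [w_max / w if w is not None and w > 0 else None for w in row]
        for row in avg_width
    ]


def ground_mask(occurrence, min_prob=0.0):
    """Assumed rule: a cell is ground if pedestrians occur there often enough."""
    return [[p > min_prob for p in row] for row in occurrence]


# Toy usage: three detections on a 2 x 2 grid.
samples = [(0, 0, 10), (0, 0, 20), (1, 1, 30)]
avg_w, occ = build_ps_map(samples, rows=2, cols=2)
depth = relative_depth(avg_w)
mask = ground_mask(occ)
```

In this toy example the cell with narrower average pedestrians receives the larger relative depth, and cells with no detections are left undefined rather than guessed. A measure of ground unevenness is not sketched here, since the abstract does not say how it is derived from the PS Map.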

© 2008 Optical Society of America

OCIS Codes
(100.2000) Image processing : Digital image processing
(100.2960) Image processing : Image analysis
(100.6890) Image processing : Three-dimensional image processing
(110.6880) Imaging systems : Three-dimensional image acquisition
(150.6910) Machine vision : Three-dimensional sensing
(150.6044) Machine vision : Smart cameras

ToC Category:
Image Processing

Original Manuscript: November 19, 2007
Revised Manuscript: January 2, 2008
Manuscript Accepted: January 17, 2008
Published: January 18, 2008

Lin Zhu, Jie Zhou, Jingyan Song, Zhenlei Yan, and Quanquan Gu, "A practical algorithm for learning scene information from monocular video," Opt. Express 16, 1448-1459 (2008)
