Journal of the Optical Society of America A

OPTICS, IMAGE SCIENCE, AND VISION

  • Vol. 20, Iss. 7 — Jul. 1, 2003
  • pp: 1407–1418

Modeling global scene factors in attention

Antonio Torralba


JOSA A, Vol. 20, Issue 7, pp. 1407-1418 (2003)
http://dx.doi.org/10.1364/JOSAA.20.001407


Abstract

Models of visual attention have focused predominantly on bottom-up approaches that ignored structured contextual and scene information. I propose a model of contextual cueing for attention guidance based on the global scene configuration. It is shown that the statistics of low-level features across the whole image can be used to prime the presence or absence of objects in the scene and to predict their location, scale, and appearance before exploring the image. In this scheme, visual context information can become available early in the visual processing chain, which allows modulation of the saliency of image regions and provides an efficient shortcut for object detection and recognition.

© 2003 Optical Society of America

OCIS Codes
(100.5010) Image processing : Pattern recognition
(330.0330) Vision, color, and visual optics : Vision, color, and visual optics
(330.4060) Vision, color, and visual optics : Vision modeling

History
Original Manuscript: October 1, 2002
Revised Manuscript: February 12, 2003
Manuscript Accepted: February 12, 2003
Published: July 1, 2003

Citation
Antonio Torralba, "Modeling global scene factors in attention," J. Opt. Soc. Am. A 20, 1407-1418 (2003)
http://www.opticsinfobase.org/josaa/abstract.cfm?URI=josaa-20-7-1407


