OSA's Digital Library

Journal of the Optical Society of America A

OPTICS, IMAGE SCIENCE, AND VISION

  • Vol. 20, Iss. 7 — Jul. 1, 2003
  • pp: 1407–1418

Modeling global scene factors in attention

Antonio Torralba


JOSA A, Vol. 20, Issue 7, pp. 1407-1418 (2003)
http://dx.doi.org/10.1364/JOSAA.20.001407


Abstract

Models of visual attention have focused predominantly on bottom-up approaches that ignored structured contextual and scene information. I propose a model of contextual cueing for attention guidance based on the global scene configuration. It is shown that the statistics of low-level features across the whole image can be used to prime the presence or absence of objects in the scene and to predict their location, scale, and appearance before exploring the image. In this scheme, visual context information can become available early in the visual processing chain, which allows modulation of the saliency of image regions and provides an efficient shortcut for object detection and recognition.
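
The scheme in the abstract can be illustrated with a toy example. The sketch below is not the paper's model: the feature pooling, the linear predictor, and the names `global_features`, `location_prior`, and `modulate_saliency` are assumptions made for illustration, and the weights are set by hand where a real system would learn them from annotated images. It shows only the general idea that coarse, whole-image statistics can predict a likely target row and reweight a bottom-up saliency map before any local search.

```python
import math

def global_features(image, grid=2):
    """Pool local gradient energy over a coarse grid x grid layout (hypothetical features)."""
    h, w = len(image), len(image[0])
    feats = [0.0] * (grid * grid)
    counts = [0] * (grid * grid)
    for y in range(h - 1):
        for x in range(w - 1):
            gx = image[y][x + 1] - image[y][x]
            gy = image[y + 1][x] - image[y][x]
            cell = (y * grid // h) * grid + (x * grid // w)
            feats[cell] += gx * gx + gy * gy
            counts[cell] += 1
    return [f / c if c else 0.0 for f, c in zip(feats, counts)]

def location_prior(features, weights, bias, height, sigma):
    """Gaussian prior over image rows, centred on a row predicted linearly from the features."""
    expected_row = bias + sum(wt * f for wt, f in zip(weights, features))
    expected_row = min(max(expected_row, 0.0), height - 1.0)
    return [math.exp(-0.5 * ((y - expected_row) / sigma) ** 2) for y in range(height)]

def modulate_saliency(saliency, row_prior):
    """Scale each row of a bottom-up saliency map by the contextual prior, then renormalise."""
    out = [[s * p for s in row] for row, p in zip(saliency, row_prior)]
    total = sum(sum(r) for r in out)
    return [[v / total for v in r] for r in out] if total > 0 else out

# Toy 4x4 image with texture only in the lower half.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
saliency = [[1.0] * 4 for _ in range(4)]  # uniform bottom-up saliency
feats = global_features(image)
# Hand-set weights (assumption): energy in the lower cells pushes the expected row down.
prior = location_prior(feats, weights=[0.0, 0.0, 0.5, 0.5], bias=1.0, height=4, sigma=1.0)
modulated = modulate_saliency(saliency, prior)
```

With a uniform saliency map, the modulated map concentrates its mass on the rows that the global features favour, which is the sense in which context provides a shortcut ahead of local object detection.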

© 2003 Optical Society of America

OCIS Codes
(100.5010) Image processing : Pattern recognition
(330.0330) Vision, color, and visual optics : Vision, color, and visual optics
(330.4060) Vision, color, and visual optics : Vision modeling

Citation
Antonio Torralba, "Modeling global scene factors in attention," J. Opt. Soc. Am. A 20, 1407-1418 (2003)
http://www.opticsinfobase.org/josaa/abstract.cfm?URI=josaa-20-7-1407

