You are a creative genius

Visual Intelligence

May 7, 1999

In 1966, Richard Gregory wrote his influential monograph Eye and Brain, and followed it four years later with The Intelligent Eye. Before turning to Donald Hoffman's book, it is worth recalling them. Gregory's immense, and lasting, achievement is that his books have much to offer a diverse set of readers: brain researchers, whether perceptual psychologists or physiologists; students coming to brain research for the first time; and the intelligent general reader interested in science. The books can be commended on both form and content. Gregory's forte has always been to devise and communicate perceptual phenomena, and his books bristle with visual illusions and perceptual demonstrations. However, Gregory's purpose is ultimately scientific: to convince the reader of the correctness of Hermann von Helmholtz's dictum that perception relies profoundly on "unconscious inferences" about what is depicted in an image (or set of images), combined with the idea that, since uncertainty is ubiquitous in image processing and interpretation, the brain's inferences are probabilistic.

Consider one of the most fundamental ambiguities in vision, namely its projective nature: a single, monocular image could, as Helmholtz observed, have resulted from an infinity of scenes, since the scene point corresponding to any image point could, in principle, be displaced arbitrarily far from the camera along the corresponding optic ray. Gregory argues that illusions such as those of M. Ponzo and F. C. Müller-Lyer are best understood when they are interpreted as flat (pinhole) projections of three-dimensional scenes. Of course, human vision did not evolve for the interpretation of single, monocular images; but Gregory argues that the same process underlies vision generally.

In one sense, Hoffman brings The Intelligent Eye up to date with a great deal of recent research. He stresses throughout that "vision is not merely a matter of passive perception, it is an intelligent process of active construction - you are a creative genius". As regards form, the book is quite beautiful to look at. The vast majority of its pages carry images that together tell much of the story. Some of the images, for example the coloured subjective contours in chapter five, are simply stunning. The book stems from a set of lectures at the University of California, Irvine, which perhaps explains its breathless, bouncy, informal style. Chapter two (on the perception of line drawings) begins with the insight that "surgery without anaesthesia is no fun", while chapter three (on subjective contours) warns us that "carbon monoxide poisoning is a good thing to avoid". Whereas Gregory aimed at a relatively sophisticated reader, Hoffman takes care to joke with his reader and is intent on not over-taxing the reader's intelligence. I found the style grating; but if it captures the interest and fires the imagination of students, who am I to judge?

Since it takes account of recent experimental findings, Hoffman's book is more comprehensive than Gregory's, particularly on subjective contours, object recognition and the perception of motion. It is also considerably less detailed on several topics: there is next to nothing about the trichromatic theory of colour, about stereo vision or about the neurophysiological basis of vision. Overall, there is less scientific depth than in Gregory, though this may not be considered significant in an introductory book.

The book opens by introducing the idea that the reader is a creative genius through a discussion of subjective contours, the devil's triangle, and Niko Tinbergen's demonstration that simple shapes are interpreted as "mom" by blackbird nestlings (who presumably imagined they were in a tree, not in a laboratory). More significantly, chapter one also marks the scientific difference between Hoffman's account and Gregory's. Instead of emphasising unconscious (and uncertain) inferences, Hoffman argues that vision is based on a set of "universal rules", which, he suggests, parallel those proposed for language by, for example, Noam Chomsky and Steven Pinker. This view is developed in the second chapter, which discusses the interpretation of sketches typified by the Necker cube, and is then extended to his own work on the interpretation of curved surfaces.

Hoffman agrees with Helmholtz that the fundamental problem is seeing depth, and to this end he proposes ten rules that aim to construct only "generic" views. Rule one is that a straight line in an image should be interpreted as a straight line in space, while rule two (general position) is that "if the tips of two lines coincide in an image, then always interpret them as coinciding in 3D". Later rules interpret drawings of curved surfaces, for example: "Rule six: where possible, interpret a curve in an image as the rim of a surface in 3D." Later chapters add others, about colour, motion and object recognition, until the author has finally amassed 35 rules.

This "big bag of rules" account of vision may be pedagogically appealing, but it does not constitute a theory of vision in any reasonable sense. First, the book itself gives enough illustrations to reinforce the point that the "rules" are, in fact, occasionally broken. Consider rule six from the previous paragraph: in what sense is it a "rule" at all, when it is never made clear what constitutes "where possible"? Image curves are often not interpreted as the bounding rim of a surface; many image curves are interpreted as reflectance boundaries lying on a 3D surface, and in textured scenes they predominate. Furthermore, what are we to do if the rule does not apply? Do we simply remove it from the bag, or is there a weaker form that we can fall back on? Worse, in numerous examples presented in the book two or more of the "rules" are evidently in conflict, yet we are given no guidance about how the visual system might resolve such conflicts. In short, no mechanism is offered for interpreting, executing, generalising or applying the rules.

Hoffman is an accomplished scientist with a good grasp of modern geometry and modern computer vision. He might object that any such mechanism would inevitably be mathematically or computationally demanding, and would therefore be beyond the understanding of his audience. However, nowhere in the book is there even an acknowledgement of the need for such a mechanism, and for me this greatly diminishes the book's contribution as a modern account of vision.

Michael Brady is professor of information engineering, University of Oxford.

Visual Intelligence: How We Create What We See

Author - Donald D. Hoffman
ISBN - 0 393 04669 9
Publisher - Norton
Price - £21.00
Pages - 294
