Brussels, 15 Sep 2004
'Until now, people have communicated by speech or signs, but the conversion has been done in the human brain. Now we want to do it in a machine.' This is how the coordinator of the CogViSys project, Hans-Hellmut Nagel, describes his work to develop a 'virtual commentator' capable of translating visual information into a textual description.
The project began in 2001 and brings together research teams from Germany, France, Belgium, the UK and Switzerland. It is funded under the information society technologies (IST) section of the Fifth Framework Programme (FP5).
Significant steps have already been taken towards computers imitating humans' ability to recognise and categorise. We already have digital cameras able to shoot videos, digital processors, and high-capacity storage media. Many computers are also able to recognise objects for quality control purposes in a manufacturing environment. Research in the field of cognitive vision - the processing of visual sensory information in order to act and react in a dynamic environment - is now turning increasingly towards more ambitious tasks that resemble human activities and skills more closely.
The potential for a 'virtual commentator', a computer that describes what it sees, is boundless, as illustrated by the range of applications that the CogViSys consortium has investigated - recognising and 'translating' American sign language into words; providing a textual description of traffic conditions using information from surveillance cameras; providing textual descriptions of situation comedy (sitcom) films by learning 'ritualised' interactions within a small group of humans; and learning descriptive representations for objects from videos, thus facilitating a machine search of large video libraries for the occurrence of particular persons, objects or spatio-temporal configurations of these.
'In essence, one could conceive of a kind of "picture-based Google",' said Professor Nagel, referring to the video library search facility. 'The advantage of such approaches consists in not being forced to declare in detail what one is looking for (which would reduce redundancy of responses, but at the same time increase the miss rate, because semantically irrelevant differences between pictures would exclude them from being reported).'
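The idea behind such a 'picture-based Google' can be illustrated with a minimal sketch: rather than declaring a query in detail, one supplies an example descriptor and the system ranks stored clips by visual similarity. The feature vectors, clip names and the `search_library` function below are purely hypothetical illustrations, not part of the CogViSys system; in practice the descriptors would come from a vision model rather than being hand-written.

```python
import math

def cosine_similarity(a, b):
    """Similarity between two feature vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search_library(query_vec, library, top_k=2):
    """Rank stored clips by similarity to the query descriptor."""
    scored = sorted(library.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]

# Toy three-dimensional descriptors standing in for learned representations
library = {
    "clip_red_car": [0.9, 0.1, 0.0],
    "clip_pedestrian": [0.1, 0.8, 0.2],
    "clip_blue_car": [0.8, 0.2, 0.1],
}

# A query descriptor resembling the car clips ranks them above the pedestrian
print(search_library([0.85, 0.15, 0.05], library))
# → ['clip_red_car', 'clip_blue_car']
```

A search by similarity reports near matches even when pictures differ in semantically irrelevant ways, which is exactly the trade-off against miss rate that Professor Nagel describes.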
Significant progress has been made with 'translating' American sign language, Professor Nagel told CORDIS News. In order to be successful, a machine would have to recognise around 95 per cent of a signing person's movements, so that those using the system could communicate without having to interrupt each other too often. CogViSys has made good progress towards this goal, thanks in part to access to powerful computers.
Professor Nagel said that such technology would mean people increasingly perceiving their environment through a machine, and added that he would be interested in investigating further how this will influence perceptions.
Another potential application is as an observation and alert system for the elderly or infirm. A camera in each room of a house would watch movements and an algorithm would 'understand' the images - it would have time to become familiar with the inhabitant, their movements and the environment. If something out of the ordinary were to happen, an alert would be triggered. Under normal circumstances, however, the inhabitant would retain their privacy as there would be only a computer monitoring the images provided by the camera, and not a human being.
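The alert system described above amounts to anomaly detection: the algorithm first learns what normal activity looks like, then flags readings that deviate strongly from that baseline. The following is a minimal sketch of that idea using a simple statistical threshold; the `ActivityMonitor` class, its parameters and the motion-event readings are illustrative assumptions, not the project's actual method.

```python
import statistics

class ActivityMonitor:
    """Learns a baseline of activity readings and flags unusual ones."""

    def __init__(self, threshold_sigmas=3.0, history_needed=24):
        self.baseline = []                  # past readings, e.g. motion events per hour
        self.threshold = threshold_sigmas   # how many standard deviations count as unusual
        self.history_needed = history_needed

    def observe(self, activity_level):
        """Return True (alert) if the reading is anomalous, else record it as normal."""
        if len(self.baseline) >= self.history_needed:
            mean = statistics.mean(self.baseline)
            stdev = statistics.pstdev(self.baseline) or 1.0
            if abs(activity_level - mean) > self.threshold * stdev:
                return True                 # only now would a human be alerted
        self.baseline.append(activity_level)
        return False

monitor = ActivityMonitor()
# A day of ordinary readings builds the baseline without triggering anything
for reading in [12, 10, 11, 13, 12, 9, 11, 10, 12, 13, 11, 10,
                12, 11, 10, 13, 12, 11, 10, 12, 11, 13, 10, 12]:
    monitor.observe(reading)

print(monitor.observe(0))   # prolonged inactivity is far outside the baseline
# → True
```

Note that the privacy property follows from the structure: readings are processed entirely by the machine, and a human only sees anything when `observe` returns True.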
In order for any of these potential applications to function, a number of conceptual subgoals must first be achieved, for example in the field of categorisation - the technology must be capable not only of recognising particular textures, objects or motions, but also of recognising instantiations of classes thereof. 'It's difficult to communicate to people who want to know what they are getting for their money,' admitted Professor Nagel.
Professor Nagel is confident, however, that this is money well invested. The consortium has gained an understanding of the problems involved in developing a virtual commentator. He stops short of promising that the technology he has described will soon be available on the market - 'I have not said that we are there. I don't want to promise more than we can deliver. I have experienced the damage done by unfortunate formulations' - but he says that it is 'not inconceivable' that applications will be available soon.
In a ringing endorsement for the future European Research Council, Professor Nagel added: 'I can't really say when it will be available. You never know the good ideas of other people. That's why we do basic research - you never know how much a solution may be worth in the future.'
For further information, please consult the following web address: