Research collaboration will make computers sound more human

September 24, 1999

Speech and language processing researchers are benefiting by comparing notes. They meet at the Royal Society this week, writes Wendy Barnaby.

If you live in the United States, you can buy and sell shares over the phone. Your dialogue will be with a computer, not a person: a computer programmed to accept your inputs and speak appropriate responses.

If the stock market bores you and your passion is researching how television news presents science, you might like to run videos of last year's news bulletins through a computer database that will spot key words and print out a record showing when and how they were used.

This is speech processing, and it has historically used very different methods from the additional language processing needed for conversation about shares.

An international meeting at the Royal Society this week is aiming to build on a recent trend: to show researchers in each area how their work can benefit from using each other's methods.

The main driver for speech processing has been the US Defense Advanced Research Projects Agency, which over the decades has spent billions on it for use in intelligence - monitoring phone calls, for example.

Practical applications of speech and language-processing systems are further advanced in the US and continental Europe than in the UK.

"You can organise rail travel in Holland and Germany in this way," says one of the speakers at the meeting, Stephen Pulman, of the computer laboratory at the University of Cambridge.

"These countries have large-scale public funding of science. With some kinds of banking systems in the UK you're dealing with speech recognition systems, but the applications are very simple and they give the caller no initiative at all. That's because we're not able to do the more complicated things sufficiently reliably."

Language-processing systems are being used for translating texts from one language to another. One of the organisers of the meeting, Gerald Gazdar of the University of Sussex, is working on the structure of the lexicon: the computer dictionary needed for such a system.

At the moment such translation systems are used for user manuals sold with, for example, copiers, and for the pharmaceutical leaflets found in packets of pills. They do a workmanlike job, but the prose is not elegant. "It'll be 100 years before computers produce stylish prose," says Professor Gazdar.

The train-booking and share-buying systems use both speech and language processing, but until about ten years ago, the speech and language communities laboured away in ignorance of each other's efforts.

"Language processing is a top-down way of working that has been driven by people writing rule-sets," explains another of the meeting's organisers, Karen Spärck Jones of the computer laboratory at the University of Cambridge.

The rules concern grammar and meaning, and are developed using knowledge of the language and of a subject area, so that a computer can correctly process new texts in that area.

The speech community, on the other hand, has taken a bottom-up approach: looking at what people actually say and deriving information from it on a statistical basis.
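The bottom-up idea can be shown with a toy sketch (the utterances and code below are illustrative, not from any system discussed at the meeting): counting how often words follow one another in recorded utterances yields a statistical estimate of what a speaker is likely to say next, with no grammar rules at all.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for transcribed utterances (hypothetical data).
utterances = [
    "i want to buy shares",
    "i want to sell shares",
    "i want to check my balance",
]

# Count bigrams: how often each word follows the previous one.
follows = defaultdict(Counter)
for line in utterances:
    words = line.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

def prob(prev, nxt):
    """Estimate P(next word | previous word) from the counts."""
    total = sum(follows[prev].values())
    return follows[prev][nxt] / total if total else 0.0

print(prob("want", "to"))  # "to" follows "want" in every utterance -> 1.0
print(prob("to", "buy"))   # one of three observed continuations -> 1/3
```

Real recognisers of the period used far larger corpora and models, but the principle is the same: the information is derived from what people actually say.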

Before a closer association began to develop, the speech-processing engineers and statisticians did not see the importance of rules, while the language-processing linguists regarded statistics as a limited tool. In recent years, however, each group has begun to see how useful the other's methods could be.

"Statistics can help language processing," says Dr Pulman. "It can guide the process of developing rules. If you have access to a large body of data that you can analyse statistically, you can make sure that your rules will cover as much of it as possible, so you can make the rules broader than they would otherwise be."
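Dr Pulman's point can be made concrete with a minimal sketch (the rules and caller utterances are invented for illustration): by checking a rule-set against a body of real utterances, a developer can measure how much of the data the rules cover and see exactly which utterances fall through.

```python
import re

# Hypothetical grammar "rules" for a share-dealing dialogue, as patterns.
rules = [
    re.compile(r"^(buy|sell) \d+ shares$"),
    re.compile(r"^what is the price of \w+$"),
]

# A sample of things callers actually said (illustrative data).
corpus = [
    "buy 100 shares",
    "sell 50 shares",
    "what is the price of acme",
    "get me out of acme now",  # not covered: a candidate for a new rule
]

uncovered = [utt for utt in corpus if not any(r.match(utt) for r in rules)]
coverage = (len(corpus) - len(uncovered)) / len(corpus)
print(f"coverage: {coverage:.0%}")  # 75% here
print("missed:", uncovered)
```

The uncovered utterances then guide where the rules should be broadened, which is the feedback loop Dr Pulman describes.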

And the speech community? "They are asking how they can enrich their statistical analyses with rules, for instance of syntax," says Dr Spärck Jones.

"They're thinking more dynamically about what things they might use their statistically derived information for and how they might characterise it so they can roll it into rules."

Professor Gazdar is optimistic about collaboration between the two communities. "There's now much more prospect for the dividing line between speech and language processing to cease to be a line and to become more of a fuzzy merged area."

At the moment, speech is fed into and out of a natural language-processing system in the form of typed text. But text is not a good representation of speech: it contains no information about intonation, for example, and this limitation produces the characteristically robotic sound of generated speech.
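The limitation is easy to see in a small sketch (the representation below is hypothetical, not a standard of the time): bare text flattens a question and a statement into the same string, whereas a richer structure can carry the intonation alongside the words for a synthesiser to use.

```python
from dataclasses import dataclass

# Bare text, as passed between components today, carries no prosody:
# "You bought the shares" could be a statement or a question.
text = "You bought the shares"

# A richer (invented, illustrative) representation keeps pitch with each word.
@dataclass
class Word:
    form: str
    pitch: str  # e.g. "low", "high", "rising"

utterance = [
    Word("You", "low"),
    Word("bought", "low"),
    Word("the", "low"),
    Word("shares", "rising"),  # the rising pitch marks it as a question
]

# The words alone reproduce the flat text; the pitch field is what text loses.
print(" ".join(w.form for w in utterance))
```

A synthesiser fed the richer structure could vary its pitch contour rather than producing the flat, robotic delivery the article describes.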

If the two research areas merge, this text interface would disappear and speech would be generated from a richer information base. Then Stephen Hawking would sound more like a human being.
