Synthesised speech is rarely convincing. In our series on young researchers, Katrina Wishart meets Iain Murray, who wants to put life into monotone electronic voices
"It's like standing on the edge of a large desert -Jyou know which way to go, you just don't have the material to make a detailed map." This is how Iain Murray describes his research into synthesised speech, the kind used in answering and information services - which usually feature a deep, unexpressive voice with an American accent. Impressive, but rarely convincing. What Murray and his colleagues at the University of Dundee are researching is the possibility of injecting emotion into such voices, putting meaning into the monotone.
As a final-year student in electronic engineering, Murray, now in his mid-thirties, was all set for a career with a large electronics company until a lecturer offered students the possibility of conducting postgraduate research into emotion and electronically generated speech. When he started his research, a great deal of work had been done on synthesised speech but very little on how to make such speech sound natural: speech expressing emotions was seen as "a complication". Murray's starting point was psychology books: he gathered information about emotion and about how a person's voice changes with, say, fear or happiness.
Murray says that basic features, such as the rise in tone and quicker pace of a voice expressing anger, "can be implemented quite readily". But although such additions make a synthetic voice more convincing, it is still devoid of "human quality". Murray admits, "We do not know enough about the human voice to produce a set of rules to mimic vocal emotions very realistically." This is what makes the research so challenging - "today's synthesisers are only just good enough to try adding emotional effects, but even if better synthesisers were available, we would need to deepen our knowledge about emotion before we could try to make better models of it."
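For readers who want to see what "basic features" might look like as rules, here is a minimal Python sketch. It is an illustration only, not Murray's program: the `Prosody` type, the `EMOTION_RULES` table and every multiplier in it are hypothetical, chosen simply to raise the pitch and quicken the pace for anger, as described above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Prosody:
    """Two voice settings a rule-based synthesiser might expose (assumed)."""
    pitch_hz: float   # average pitch of the voice
    rate_wpm: float   # speaking rate in words per minute


# Hypothetical rule table. The multipliers are illustrative guesses,
# not measured values and not taken from Murray's research.
EMOTION_RULES = {
    "neutral": {"pitch": 1.0, "rate": 1.0},
    "anger": {"pitch": 1.2, "rate": 1.15},  # higher tone, quicker pace
}


def apply_emotion(base: Prosody, emotion: str) -> Prosody:
    """Return a copy of the base settings scaled by the rule for one emotion."""
    rule = EMOTION_RULES[emotion]
    return replace(
        base,
        pitch_hz=base.pitch_hz * rule["pitch"],
        rate_wpm=base.rate_wpm * rule["rate"],
    )


calm = Prosody(pitch_hz=120.0, rate_wpm=160.0)
angry = apply_emotion(calm, "anger")
```

A table like this is easy to write down, which is the point Murray makes: the hard part is knowing which values, and which further features, would make the voice sound human.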
Murray's synthesiser generates its speech entirely from rules in a computer program, but more common is the use of concatenative synthesisers, which record words then "glue" them together to create new phrases. The recorded speech does have a human quality but there are so many individual sounds that storing the information can cause difficulties - and there are audible "joins" between the sounds, which make the speech less natural.
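The "gluing" idea can be sketched in a few lines of Python. This is a toy under stated assumptions: each recorded word is a plain list of sample values, the `concatenate` function and its `overlap` parameter are hypothetical, and the linear crossfade is one common way to soften a join, not necessarily what any particular synthesiser does.

```python
def concatenate(units, words, overlap=0):
    """Join recorded word waveforms into one phrase.

    units:   dict mapping a word to its list of samples (assumed format)
    words:   the words to speak, in order
    overlap: number of samples to crossfade at each join; 0 gives a hard join
    """
    out = []
    for word in words:
        samples = units[word]
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(samples))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming sound
            out[start + i] = (1 - w) * out[start + i] + w * samples[i]
        out.extend(samples[n:])
    return out


# Two made-up "recordings": a loud sound followed by silence.
recordings = {"hello": [1.0, 1.0, 1.0, 1.0], "world": [0.0, 0.0, 0.0, 0.0]}
hard_join = concatenate(recordings, ["hello", "world"])
soft_join = concatenate(recordings, ["hello", "world"], overlap=2)
```

With a hard join the signal jumps straight from one recording to the next, which is the kind of discontinuity a listener hears as a "join"; the crossfade trades that jump for a short blended stretch.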
Although Murray has problems finding funding for his research, he is certain that there is a market for it. Recently he has been working with BT, adding emotion to the company's Laureate synthesiser. This uses concatenative synthesis to produce a highly natural voice, with a British accent. The armed forces also support a lot of research into the psychology of speech. (Military communication often involves radio contact during which the ability to recognise stress or fear in a soldier's speech is vital.) People with speech difficulties will benefit if Murray's research comes to fruition. At present they can type words into a computer that transforms them into synthesised speech. Such speech, however, cannot reflect the personality of its speaker - a young speech-impaired female, for example, may not want her voice to sound like that of an American male. Murray is experimenting with variations and is able to generate different types of voices, including that of a very convincing-sounding child.
To hear some of Iain Murray's computer-generated voices, log on to his website: http://alpha.mic.dundee.ac.uk/irmurray/hamlet.html