Michael Boyd reports on collaborative research efforts that should help to make searching the Web a more exact experience for users at all levels.
A vital computing priority is to reduce the time and effort needed to find relevant information on the Internet. Leading the way in resolving this problem is a team of researchers in the computer science departments at the universities of Aberdeen, Cardiff and Liverpool, who are collaborating with BT on the Knowledge Reuse and Fusion/Transformation (Kraft) initiative.
The main difficulty they have to overcome is that the World Wide Web is composed of a huge number of different information sources, which are currently searched in a very simplistic way. Kraft is intended not only to make finding information easier but also to merge related information from multiple sources.
"Essentially what we are doing is pooling our expertise in knowledge-based systems (KBS) and widely distributed databases so we can eventually automate the extraction of knowledge from multiple online information sources as now found on the Internet," says the project leader, Peter Gray of the University of Aberdeen.
Kraft will rely on constraints, a computing technique widely used in industrial planning and scheduling programs. Combining a number of constraints narrows down the possible solutions to a problem.
One database might be a collection of railway timetables while another has similar information about buses. Kraft would fuse the information so the user could plan a journey involving bus-rail connections.
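To make the idea concrete, here is a minimal sketch in Python of how combining constraints could narrow down bus-rail connections. It is an illustration only, not Kraft's own method: the timetable entries, the place names and the helper functions (`same_interchange`, `min_transfer`, `arrive_by`, `plan`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    mode: str
    origin: str
    destination: str
    depart: int  # minutes after midnight
    arrive: int  # minutes after midnight


def hhmm(hours, minutes):
    """Convert a clock time to minutes after midnight."""
    return hours * 60 + minutes


# Hypothetical entries standing in for two separate timetable databases.
BUS = [
    Leg("bus", "Village", "Station", hhmm(8, 5), hhmm(8, 35)),
    Leg("bus", "Village", "Station", hhmm(9, 5), hhmm(9, 35)),
]
RAIL = [
    Leg("rail", "Station", "City", hhmm(8, 30), hhmm(9, 40)),
    Leg("rail", "Station", "City", hhmm(8, 45), hhmm(9, 55)),
    Leg("rail", "Station", "City", hhmm(9, 45), hhmm(10, 55)),
]


def same_interchange(first, second):
    """The second leg must start where the first one ends."""
    return first.destination == second.origin


def min_transfer(minutes):
    """Build a constraint requiring at least `minutes` to change vehicles."""
    def check(first, second):
        return second.depart - first.arrive >= minutes
    return check


def arrive_by(deadline):
    """Build a constraint requiring the journey to end by `deadline`."""
    def check(first, second):
        return second.arrive <= deadline
    return check


def plan(first_legs, second_legs, constraints):
    """Keep only the pairs of legs that satisfy every constraint."""
    return [
        (a, b)
        for a in first_legs
        for b in second_legs
        if all(constraint(a, b) for constraint in constraints)
    ]


constraints = [same_interchange, min_transfer(5), arrive_by(hhmm(10, 0))]
journeys = plan(BUS, RAIL, constraints)
```

Each extra constraint shrinks the set of candidate journeys. A real system would also have to reconcile the differing formats and vocabularies of the two sources, which this sketch assumes away.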
"Essentially it is all about finding common data at the constraint level rather than the traditional search for the basic information itself," says Alex Gray at the University of Wales Cardiff.
"So in the example just given we are enriching the train timetable constraints with bus information and vice versa. Unfortunately this fusion is not easy to bring about and that is why we have a large grant to set up Kraft."
The project began in May last year with a workshop at the University of Wales. This established the design of the network which is now under construction. The other members of the team are Michael Shave at the University of Liverpool, and BT's Nader Azarmi.
Another data-matching problem to be overcome is that the same item may be described in different ways even within a common language. Americans say "sidewalk" whereas the British know it as a "pavement", and similar examples are legion in industry. Kraft's architecture therefore uses a new kind of distributed information base. It includes ontologies, or thesauri, which map these differences as well as defining all the constraints and other factors needed to solve particular types of problem. Local KBS problem solvers and databases will also become Kraft network nodes.
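The thesaurus idea can be sketched in a few lines of Python. This is an assumption-laden illustration rather than Kraft's actual design: the `THESAURUS` table, the shared term "footway", the two sample catalogues and the `canonical` and `search` helpers are all invented for the example.

```python
# Hypothetical thesaurus mapping regional terms to one shared term.
THESAURUS = {
    "sidewalk": "footway",
    "pavement": "footway",
    "curb": "kerb",
    "kerb": "kerb",
}


def canonical(term):
    """Return the shared term for a word, or the word itself if unmapped."""
    key = term.strip().lower()
    return THESAURUS.get(key, key)


# Hypothetical records from two sources that use different vocabularies.
US_SOURCE = [{"item": "Sidewalk", "source": "us_catalogue"}]
UK_SOURCE = [
    {"item": "pavement", "source": "uk_catalogue"},
    {"item": "kerb", "source": "uk_catalogue"},
]


def search(term, *sources):
    """Find records in any source whose item means the same as `term`."""
    wanted = canonical(term)
    return [
        record
        for source in sources
        for record in source
        if canonical(record["item"]) == wanted
    ]


hits = search("pavement", US_SOURCE, UK_SOURCE)
```

A search for "pavement" here also returns the American "Sidewalk" record, because both words resolve to the same shared term before they are compared.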
The goal is to transform existing knowledge into more accessible forms. When an Internet user asks for information, Kraft's "mediator" programs will find relevant data sources and broker the routing of messages between the user's web browser, the data sources and other Kraft programs which have relevant knowledge. The Aberdeen researchers have responsibility for developing the mediators.
Cardiff, building on earlier research, will develop agents known as facilitators and wrappers. The facilitator translates requests from other languages into English, which the mediator understands. The wrapper handles the return process: if the required database were in Chinese, it would transform the retrieved information into English and then, if necessary, into the user's native tongue.
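The division of labour between facilitator, mediator and wrapper might look something like the following Python sketch. It is a toy under stated assumptions, not the researchers' implementation: the phrasebooks stand in for real translation, and the `SOURCES` table, its single German-language record and every function name are hypothetical.

```python
# Hypothetical phrasebooks standing in for real language translation.
TO_ENGLISH = {
    "fr": {"horaires de train": "train timetables"},
    "de": {"Abfahrt": "departure"},
}
FROM_ENGLISH = {
    "fr": {"departure": "départ"},
}

# Hypothetical registry: the English topic each data source can answer.
SOURCES = {
    "train timetables": {"name": "rail_db", "language": "de", "record": "Abfahrt"},
}


def facilitator(request, user_language):
    """Translate the user's request into English for the mediator."""
    if user_language == "en":
        return request
    return TO_ENGLISH[user_language][request]


def mediator(english_request):
    """Find the data source that is relevant to an English request."""
    return SOURCES[english_request]


def wrapper(record, source_language, user_language):
    """Translate a retrieved record into English, then into the user's language."""
    english = record if source_language == "en" else TO_ENGLISH[source_language][record]
    if user_language == "en":
        return english
    return FROM_ENGLISH[user_language][english]


def answer(request, user_language):
    """Route a request through the facilitator, mediator and wrapper in turn."""
    source = mediator(facilitator(request, user_language))
    return wrapper(source["record"], source["language"], user_language)


reply = answer("horaires de train", "fr")
```

In this sketch a French request reaches a German-language source and comes back in French, with English as the common language in the middle.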
The Cardiff group's expertise is based on its development of an information transformation support environment (ITSE). Kraft will require a great deal of language transformation.
Each language translator program used to take a long time to build. Then the group developed a program in the Prolog language, which can analyse the syntax of a language and build up an extensive library of grammatical definitions.
Although translations might take a minute or so longer than with the old method, the program had the distinct advantage that a totally new language could be absorbed in about a week, as opposed to the six or seven months previously required.
Kraft's results will be applicable primarily to technical problems involving large shared knowledge sources, such as complex engineering designs or the investigation of protein structures using genome databases.