Precisely the program to hack through jungle Net

April 10, 1998

Helena Flusfeder reports on an Israeli Web search engine that should finally allow users to pinpoint the information they need

Researchers at the Technion-Israel Institute of Technology have developed a programmable software robot that explores web sites according to the user's precise instructions. It can follow links of a specified kind, search to the depth chosen, and even fill out forms to extract information from databases.

W3QS (World Wide Web Query System) is a search engine with its own programming language. It can look for structures or relationships between Web documents, rather than merely searching for words or phrases as conventional search engines do. In skilled hands it can avoid the multitude of irrelevant hits that are usually returned by search engines such as Infoseek, Lycos and Excite. Developed by Oded Shmueli, a professor in the Technion's computer science department, and PhD student David Konopnicki, the search tool is an advanced prototype.

Users can perform "simple", "power" or "advanced" searches. In a simple search you fill out a form which includes the URL (address) of the site required, whether images are required, the string of characters to be searched for, and the depth to which hypertext links are to be pursued. If you do not already have a URL, a "power" search will help you to find the site or sites you want. Once on the site, W3QS can be told to search specifically for LaTeX, HTML, PostScript or image files.

The "advanced" search option lets the skilled user deploy the full power of the W3QL query language.

"The current search engines are limited because the structural information, namely, the organisation of the document into parts pointing to each other, is usually lost," says Professor Shmueli. Most search engines today use "robot" programs which scan the Web periodically and create text-based indices. "Those searches are limited by the particular kind of textual analysis provided by the search service, and the depth of site exploration," Professor Shmueli says. "In particular, robots do not fill out forms themselves, as the number of possibilities is enormous, so they miss interesting avenues that humans might follow."

Professor Shmueli gives an example of the targeted searching that is possible with W3QS: "You're looking for the actual texts of scientific papers written by Drs Smith and Jones, who work in the computer science department at West University. With conventional searches, you might type in the words 'Smith', 'Jones', 'West' and 'University'. A basic Infoseek or AltaVista search could bring up dozens of sites that contain those words. But the results could very likely include a document that discusses the Smith building at Jones University on West Street, or Michael Jones's analysis of Jay Smith's university architecture in the West in the 1800s."

Professor Shmueli claims that if W3QS is used to look for these papers, the user could ask the search engine to go first to the computer science department at West University, and then follow hypertext links, while looking for documents that contain "Smith" and "Jones" in some specified place, together with a link to a file containing the text of a specific paper in, say, PostScript format. W3QS would then fetch that actual file and email it to the user. Conventional search robots turn back when they encounter a Web page with a fill-in-the-blanks form. W3QS can carry right on. "When you write a query, you can specify how the form should be filled out. The system keeps a database of forms which each user has encountered. When a similar form is encountered, it is filled in like the previous forms," Professor Shmueli says.
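The link-following search Professor Shmueli describes can be pictured as a walk over a graph of pages. What follows is only a rough sketch in Python, not W3QL and not the Technion implementation: the `PAGES` table, the `cs.west.example` addresses and the `structural_search` function are all invented for illustration, and a real system would fetch pages over the network rather than read them from a dictionary.

```python
from collections import deque

# Hypothetical stand-in for a small web site: URL -> (page text, outgoing links).
PAGES = {
    "http://cs.west.example/": (
        "Computer science department",
        ["http://cs.west.example/staff", "http://cs.west.example/news"],
    ),
    "http://cs.west.example/staff": (
        "Staff list",
        ["http://cs.west.example/pubs"],
    ),
    "http://cs.west.example/news": (
        "Smith building reopens on West Street",
        [],
    ),
    "http://cs.west.example/pubs": (
        "Papers by Smith and Jones",
        ["http://cs.west.example/paper1.ps"],
    ),
    "http://cs.west.example/paper1.ps": ("%!PS ...", []),
}


def structural_search(start, words, suffix, max_depth):
    """Walk breadth-first from `start`, following links at most `max_depth` deep.

    Returns (page, linked_file) pairs where the page text contains every word
    in `words` and the page links to a URL ending in `suffix`.
    """
    seen = {start}
    queue = deque([(start, 0)])
    hits = []
    while queue:
        url, depth = queue.popleft()
        text, links = PAGES.get(url, ("", []))
        # Content condition: all words must appear on the same page.
        if all(w.lower() in text.lower() for w in words):
            # Structure condition: the page must link to a file of the wanted type.
            hits.extend((url, link) for link in links if link.endswith(suffix))
        # Only follow links while the chosen depth has not been reached.
        if depth < max_depth:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return hits


if __name__ == "__main__":
    for page, paper in structural_search(
        "http://cs.west.example/", ["Smith", "Jones"], ".ps", max_depth=2
    ):
        print(page, "->", paper)
```

In this toy site the news page mentioning the Smith building is ignored, because it lacks "Jones" and has no link to a PostScript file; only the publications page two links from the starting point qualifies.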

When W3QS encounters an unfamiliar form, it asks the user to complete it manually. Currently, you can only do this if you access the system from an X terminal like those on the Technion site, but Professor Shmueli wants this feature to be available to PC users. Anyone with a Java-enabled browser can try the other features at il/W3QS/

"It's the first project to relate to the Web as a big graph," says Yakov Kogan, a master's degree student at the Hebrew University of Jerusalem, as he demonstrates the program.

"Using W3QS I can start from the exact point and get exact information. The regular search tools relate to the Web as a big set of unrelated documents. The user can't utilise the connections between documents."
