AnswerBus is an open-domain question answering system based on sentence-level information retrieval. It accepts users' natural-language questions in English, German, French, Spanish, Italian and Portuguese and extracts possible answers from the Web. It can respond to users' questions within several seconds. Five search engines and directories (Google, Yahoo, WiseNut, AltaVista, and Yahoo News) are used to retrieve Web pages that potentially contain answers. From these Web pages, AnswerBus extracts sentences that are determined to contain answers. The current rate of correct answers to TREC-8's 200 questions is 70.5%. AnswerBus demonstrates that practical question answering on the Web is highly feasible.
Figure 1. Working process of AnswerBus
The rest of the process consists mainly of four steps: 1) select two or three of the five search engines for information retrieval and form search-engine-specific queries based on the question; 2) contact the search engines and retrieve the documents referenced at the top of the hit lists; 3) extract sentences that potentially contain answers from the documents; 4) rank the answers and return the top choices to the user with contextual URL links. Instead of returning a fixed-length text snippet, AnswerBus returns sentences as answers, thus providing users with some contextual information for the answers.
The main approaches adopted in the process of query formation include
The sentence segmentation tool in AnswerBus is designed to process complicated Web documents. In addition to deleting HTML tags, it excludes non-contextual content, treats certain HTML tags as sentence boundaries, and accounts for various formatting exceptions.
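The segmentation behavior described above can be illustrated with a minimal Python sketch. It is not the AnswerBus implementation: the sets BOUNDARY_TAGS and SKIP_TAGS, the function name split_sentences, and the punctuation rule are all assumptions chosen for illustration, and the "formatting exceptions" the tool handles are not modeled.

```python
import re
from html.parser import HTMLParser

# Assumption: tags treated as sentence boundaries (the source does not list them).
BOUNDARY_TAGS = {"p", "br", "li", "div", "td", "tr",
                 "h1", "h2", "h3", "h4", "h5", "h6"}
# Assumption: tags whose content is non-contextual and is dropped entirely.
SKIP_TAGS = {"script", "style"}


class _TextExtractor(HTMLParser):
    """Collects visible text, starting a new chunk at each boundary tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = [[]]
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BOUNDARY_TAGS:
            self.chunks.append([])

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BOUNDARY_TAGS:
            self.chunks.append([])

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks[-1].append(data)


def split_sentences(html):
    """Return a list of sentences extracted from an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    sentences = []
    for chunk in parser.chunks:
        text = re.sub(r"\s+", " ", "".join(chunk)).strip()
        if not text:
            continue
        # Split after ., ! or ? when followed by whitespace and a capital or digit.
        for part in re.split(r"(?<=[.!?])\s+(?=[A-Z0-9])", text):
            if part:
                sentences.append(part)
    return sentences
```

Treating block-level tags as boundaries keeps headings and list items from being glued onto neighboring sentences, which a purely punctuation-based splitter would do on most Web pages.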
To determine whether a retrieved sentence is potentially an answer to the question, AnswerBus classifies every word in the original question and in the sentences of retrieved documents into one of two categories: matching words and non-matching words. All words that are used to form the search-engine-specific query are matching words. The rest are non-matching words.
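As a rough illustration of this two-way classification, the following Python sketch partitions the words of a text according to whether they appear in the query. The function names tokenize and classify_words, the lowercase alphanumeric tokenizer, and the exact-match comparison are assumptions made for the example; the source does not say how AnswerBus tokenizes or normalizes words.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify_words(text, query_words):
    """Split the words of `text` into (matching, non_matching) lists.

    A word is "matching" if it is one of the words used to form the
    search engine query; every other word is "non-matching".
    """
    query = {w.lower() for w in query_words}
    matching, non_matching = [], []
    for word in tokenize(text):
        if word in query:
            matching.append(word)
        else:
            non_matching.append(word)
    return matching, non_matching
```

For instance, with the hypothetical query words "invented" and "telephone", the question "Who invented the telephone?" yields those two words as matching and "who" and "the" as non-matching.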
The following formula is used to filter retrieved sentences.
In this formula, q is the number of matching words in the sentence; Q is the total number of matching words in the question. For example, if a query contains three words, then an answer candidate sentence should have at least two of them. When a sentence meets the condition as indicated by the above formula, it will receive a primary score based on the number of matching words it contains. Otherwise, it will receive a score of "0."
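The filtering rule can be sketched in Python as follows. The formula itself is not reproduced in this text, so the sketch assumes the threshold q ≥ ⌊√(Q − 1)⌋ + 1, which agrees with the three-word example above (Q = 3 requires q ≥ 2) but should be read as an assumption. The function names, the tokenizer, counting distinct matching words, and using q itself as the primary score are likewise illustrative choices.

```python
import math
import re


def min_matching_words(total_query_words):
    """Assumed threshold: floor(sqrt(Q - 1)) + 1 for Q >= 1."""
    if total_query_words < 1:
        raise ValueError("the query must contain at least one matching word")
    return math.isqrt(total_query_words - 1) + 1


def primary_score(sentence, query_words):
    """Return q if the sentence passes the filter, otherwise 0.

    q counts the distinct query words found in the sentence; using q
    directly as the score is an assumption for this sketch.
    """
    query = {w.lower() for w in query_words}
    tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    q = len(query & tokens)
    return q if q >= min_matching_words(len(query)) else 0
```

Under this assumed threshold, the required number of matching words grows with the square root of the query length, so longer queries tolerate proportionally more missing words than short ones.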