About AnswerBus Question Answering System

AnswerBus is an open-domain question answering system based on sentence-level information retrieval. It accepts users' natural-language questions in English, German, French, Spanish, Italian and Portuguese and extracts possible answers from the Web. It can respond to users' questions within several seconds. Five search engines and directories (Google, Yahoo, WiseNut, AltaVista, and Yahoo News) are used to retrieve Web pages that potentially contain answers. From these Web pages, AnswerBus extracts sentences that are determined to contain answers. The current rate of correct answers to TREC-8's 200 questions is 70.5%. AnswerBus demonstrates that practical question answering on the Web is highly feasible.


Figure 1 Working process of AnswerBus
Figure 1 describes the working process of AnswerBus. AnswerBus takes a user question in natural language. A simple language recognition module determines whether the question is in English or in one of the other five supported languages. If the question is not in English, AnswerBus sends the original question, together with the detected language, to AltaVista's translation tool Babel Fish and obtains an English translation of the question.

The rest of the process consists of four main steps: 1) select two or three of the five search engines for information retrieval and form search-engine-specific queries based on the question; 2) contact the search engines and retrieve the documents referred to at the top of the hit lists; 3) extract sentences that potentially contain answers from the documents; 4) rank the answers and return the top choices to the user with contextual URL links. Instead of returning a snippet of fixed-length text, AnswerBus returns sentences as answers, thus providing users with some contextual information for the answers.
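The four steps can be sketched as a small pipeline. This is only an illustrative outline, not the actual AnswerBus code: the `Answer` type and the four callables (`select_engines`, `form_query`, `fetch_top_documents`, `extract_candidates`) are hypothetical names introduced here, and each step is passed in as a function so the skeleton stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Answer:
    sentence: str   # the sentence returned as an answer
    url: str        # contextual link to the source document
    score: float    # score used for ranking


def answer_question(
    question: str,
    select_engines: Callable[[str], List[str]],
    form_query: Callable[[str, str], str],
    fetch_top_documents: Callable[[str, str], List[Tuple[str, str]]],
    extract_candidates: Callable[[str, str, str], List[Answer]],
    top_n: int = 5,
) -> List[Answer]:
    """Run the four steps on an English question (all helpers are hypothetical)."""
    candidates: List[Answer] = []
    # Step 1: choose engines and form an engine-specific query for each.
    for engine in select_engines(question):
        query = form_query(question, engine)
        # Step 2: retrieve the documents at the top of the hit list.
        for url, text in fetch_top_documents(engine, query):
            # Step 3: extract sentences that may contain an answer.
            candidates.extend(extract_candidates(question, text, url))
    # Step 4: rank the candidates and return the top choices.
    candidates.sort(key=lambda a: a.score, reverse=True)
    return candidates[:top_n]
```

Passing the steps in as functions keeps the outline independent of any particular search engine or extraction method.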

Relevant Documents Retrieval

AnswerBus aims to retrieve enough relevant documents from search engines within a response time that is acceptable to users. The main tasks at this stage are to select one or more appropriate search engines for a specific user question, and then to form queries that are tailored to the question as well as to the selected search engines. Query formation is an essential procedure because it strongly influences the recall and accuracy of question answering as well as the speed of the system.

The main approaches adopted in the process of query formation include

Candidate Answer Extraction

At this stage, AnswerBus downloads and processes the documents referred to at the top of the search results returned by the different search engines. It first parses the documents into sentences and then determines whether a sentence is an answer candidate through a process of word matching.

The sentence segmentation tool in AnswerBus is designed to process complicated Web documents. In addition to deleting HTML tags, it excludes non-contextual content, treats certain HTML tags as sentence boundaries, and takes various formatting exceptions into account.
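A minimal sketch of this kind of segmentation, using only the Python standard library, might look as follows. The choice of boundary tags, the decision to skip `script` and `style` content, and the punctuation-based splitting rule are all assumptions made for illustration; the actual rules used by AnswerBus are not described here.

```python
import re
from html.parser import HTMLParser
from typing import List

# Assumed set of tags that end a sentence even without punctuation.
BOUNDARY_TAGS = {"p", "br", "li", "td", "tr", "div", "title",
                 "h1", "h2", "h3", "h4", "h5", "h6"}
# Assumed set of tags whose content is not readable text.
SKIPPED_TAGS = {"script", "style"}


class _TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in BOUNDARY_TAGS:
            self.parts.append("\n")  # mark a sentence boundary

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BOUNDARY_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def split_sentences(html: str) -> List[str]:
    """Strip tags, then split on boundary tags and sentence punctuation."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    sentences: List[str] = []
    for block in "".join(parser.parts).split("\n"):
        for piece in re.split(r"(?<=[.!?])\s+", block):
            piece = " ".join(piece.split())  # collapse whitespace
            if piece:
                sentences.append(piece)
    return sentences
```

A splitter this simple will break on abbreviations such as "Dr." and on other formatting exceptions, which is the kind of case a production tool has to handle explicitly.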

In order to determine whether a retrieved sentence is potentially an answer to the question, AnswerBus classifies all words in the original question or sentences in retrieved documents into two categories: matching words and non-matching words. All words that are used to form the search engine specific query are matching words. The rest are non-matching words.

The following formula is used to filter retrieved sentences.

In this formula, q is the number of matching words in the sentence; Q is the total number of matching words in the question. For example, if a query contains three words, then an answer candidate sentence should have at least two of them. When a sentence meets the condition as indicated by the above formula, it will receive a primary score based on the number of matching words it contains. Otherwise, it will receive a score of "0."
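The filtering step can be sketched in a few lines. The formula itself is not reproduced on this page, so the sketch assumes a threshold of q ≥ ⌊√(Q − 1)⌋ + 1, which agrees with the three-word example (Q = 3 requires q ≥ 2) but should be treated as an assumption. The tokenization and the choice to use q itself as the primary score are likewise assumptions, and the function names are hypothetical.

```python
import math
import re
from typing import Set


def min_matches_required(total_matching_words: int) -> int:
    """Assumed threshold: floor(sqrt(Q - 1)) + 1."""
    if total_matching_words < 1:
        raise ValueError("the query must contain at least one matching word")
    return math.isqrt(total_matching_words - 1) + 1


def primary_score(sentence: str, matching_words: Set[str]) -> int:
    """Return q if the sentence passes the filter, otherwise 0.

    `matching_words` is the lowercase set of words used to form the query.
    Using q as the primary score is an assumption made for illustration.
    """
    tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    q = len(tokens & matching_words)
    return q if q >= min_matches_required(len(matching_words)) else 0
```

Under this assumed threshold, a query with five matching words would require at least three of them to appear in a candidate sentence.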

Answer Ranking

After the extraction of answer candidate sentences, each sentence has received a primary score. Sentences with a score of "0" are dropped. The primary scores alone, however, are not reliable enough to judge whether a sentence is a real answer. AnswerBus uses several techniques to refine the primary scores, including determination of the question type, use of a QA-specific dictionary, named-entity extraction, coreference resolution, and redundancy deletion. The final score that determines the rank of an answer is a combination of the primary score and the influence of all these factors.
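As a rough illustration of this stage, the sketch below drops zero-scored sentences, removes near-duplicate sentences, and sorts by a final score. How AnswerBus actually combines the factors is not specified here; the additive combination, the duplicate test based on normalized text, and the names `rank_answers` and `adjustments` are all assumptions.

```python
import re
from typing import Dict, List, Tuple

Candidate = Tuple[str, float, Dict[str, float]]  # (sentence, primary score, adjustments)


def _normalize(sentence: str) -> str:
    """Lowercase and strip punctuation so trivially different copies compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", sentence.lower()))


def rank_answers(candidates: List[Candidate], top_n: int = 5) -> List[Tuple[str, float]]:
    best: Dict[str, Tuple[str, float]] = {}
    for sentence, primary, adjustments in candidates:
        if primary <= 0:
            continue  # sentences with a primary score of 0 are dropped
        # Assumed combination rule: primary score plus the sum of all adjustments.
        final = primary + sum(adjustments.values())
        key = _normalize(sentence)
        # Redundancy deletion: keep only the best-scoring copy of a sentence.
        if key not in best or final > best[key][1]:
            best[key] = (sentence, final)
    return sorted(best.values(), key=lambda pair: pair[1], reverse=True)[:top_n]
```

Each entry in `adjustments` stands for one refinement factor, for example a bonus when a sentence contains a named entity of the type the question asks for.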

Evaluation

TREC-8's 200 questions are used to evaluate AnswerBus's question answering performance. The top five answers to each question are evaluated manually to determine how many of the 200 questions AnswerBus can answer correctly. AnswerBus's answers are first compared to the answer keys provided by TREC. Since AnswerBus is based on the Web, which differs from the large corpus of newswires on which the TREC questions are based, several answers that differ from the answer keys are also judged to be right. Sentences are also examined to ensure that contextually correct answers are located.
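The reported rate can be read as the fraction of questions for which at least one of the top five answers was judged correct. A minimal sketch of that calculation follows; the function name and the representation of manual judgments as lists of booleans are assumptions made for illustration.

```python
from typing import Sequence


def correct_answer_rate(judgments: Sequence[Sequence[bool]], top_n: int = 5) -> float:
    """Fraction of questions with a correct answer among the top `top_n`.

    `judgments[i][j]` is True if the j-th ranked answer to question i
    was judged correct by a human assessor.
    """
    if not judgments:
        raise ValueError("at least one question is required")
    answered = sum(1 for per_question in judgments if any(per_question[:top_n]))
    return answered / len(judgments)
```

With 200 questions, a rate of 70.5% corresponds to 141 questions answered correctly.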
© 2001-2010 AnswerBus