BiND: Biomedical Novelty Detection

Background hypothesis. Existing search engines, including the most advanced question-answering architectures such as EAGLi, return answers or documents to the user according to relevance-driven ranking criteria. However, a ranked list of documents or answers is not sufficient for novelty detection; being able to decide whether a retrieved answer is new would therefore be a major improvement, one that could affect the very definition of relevance. Aims. The present proposal extends the scope of our previous research by addressing a single crucial challenge: detecting new facts among the massively redundant observations reported in the literature. Indeed, the quantity and redundancy of published information in the life and health sciences is so great that it demands search instruments able to separate well-known facts from new results, in particular to identify new molecular functions in genomics and proteomics. Methods. The problem of novelty detection is complicated by its intrinsic relationship with relevance in question-answering (QA) systems: identifying new information is of interest only if that information is relevant to a given information request. The BiND project aims to extend the scope and methods of the EAGL project, which focused on question answering, by developing methods to distinguish known facts, as available in curated molecular biology databases, from previously unreported facts, i.e. genuinely new discoveries. Progress toward our objectives will be measured using TREC Genomics-like methodologies. Expected outcomes. These functionalities will be integrated into the existing online EAGL services for the benefit of biologists, experimentalists and database curators.

Starting date: Tuesday, January 1, 2008
Type: Join Project
Status: In Progress