Title | Automatic assignment of biomedical categories: toward a generic approach. |
Publication Type | Journal Article |
Year of Publication | 2006 |
Authors | Ruch, P |
Journal | Bioinformatics (Oxford, England) |
Volume | 22 |
Issue | 6 |
Pagination | 658-64 |
Date Published | 2006 Mar 15 |
ISSN | 1367-4803 |
Keywords | Abstracting and Indexing as Topic, Algorithms, Artificial Intelligence, Documentation, MEDLINE, Natural Language Processing, Pattern Recognition, Automated, Periodicals as Topic, Proteins |
Abstract | MOTIVATION: We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike usual automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent. METHODS: In order to evaluate the robustness of our approach we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically-motivated indexing units. RESULTS AND CONCLUSION: Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to <20% for GO, establishing a new baseline for categorizers based on retrieval methods. |
DOI | 10.1093/bioinformatics/bti783 |
Alternate Journal | Bioinformatics |
PubMed ID | 16287934 |