Text-Mining Services of the Swiss Variant Interpretation Platform for Oncology.

Title: Text-Mining Services of the Swiss Variant Interpretation Platform for Oncology.
Publication Type: Journal Article
Year of Publication: 2020
Authors: Caucheteur, D, Gobeill, J, Mottaz, A, Pasche, E, Michel, P-A, Mottin, L, Stekhoven, DJ, Barbié, V, Ruch, P
Journal: Stud Health Technol Inform
Date Published: 2020 Jun 16
Keywords: Abstracting and Indexing, Computational Biology, Data Mining, Humans, MEDLINE, Switzerland

The Swiss Variant Interpretation Platform for Oncology is a centralized, joint and curated database for clinical somatic variants piloted by a board of Swiss healthcare institutions and operated by the SIB Swiss Institute of Bioinformatics. To support this effort, SIB Text Mining designed a set of text analytics services. This report focuses on three of those services. First, the literature was automatically annotated with a set of terminologies, resulting in a large annotated version of MEDLINE and PMC. Second, a generator of variant synonyms for single nucleotide variants was developed using publicly available data resources as well as patterns of non-standard formats often found in the literature. Third, a literature ranking service retrieves a ranked set of MEDLINE abstracts given a variant and, optionally, a diagnosis. The annotation of MEDLINE and PMC produced 785,181,199 and 1,156,060,212 annotations, respectively, which corresponds to an average of 26 annotations per abstract and 425 per full-text article. The generator of variant synonyms returns up to 42 synonyms for a variant. The literature ranking service reaches a precision at rank 10 (P10) of 63%, which means that almost two-thirds of the top-10 returned abstracts are judged relevant. Further services will be implemented to complete this set, such as a service to retrieve relevant clinical trials for a patient and a literature ranking service for full-text articles.
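To make the P10 figure concrete, the following is a minimal sketch of how precision at rank 10 is conventionally computed and averaged over queries. It is not the authors' evaluation code: the function names `precision_at_k` and `mean_precision_at_k` and the relevance judgments in the example are hypothetical, and counting missing results as non-relevant is an assumption of this sketch.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[bool], k: int = 10) -> float:
    """Fraction of the top-k ranked results that were judged relevant.

    `relevance` lists the judgments in rank order (best-ranked first).
    Assumption: if fewer than k results were returned, the missing
    ranks count as non-relevant, so the denominator is always k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(bool(r) for r in relevance[:k]) / k


def mean_precision_at_k(runs: Sequence[Sequence[bool]], k: int = 10) -> float:
    """Average precision-at-k over several queries."""
    if not runs:
        raise ValueError("at least one query is required")
    return sum(precision_at_k(run, k) for run in runs) / len(runs)


# Hypothetical judgments for two (variant, diagnosis) queries.
query_a = [True, True, False, True, True, False, True, False, True, False]
query_b = [True, False, True, True, False, True, True, False, True, True]

p10_a = precision_at_k(query_a)                     # 6 of 10 relevant
p10_b = precision_at_k(query_b)                     # 7 of 10 relevant
mean_p10 = mean_precision_at_k([query_a, query_b])  # average of the two
```

Under this definition, a mean P10 of 63% corresponds to about 6.3 relevant abstracts among the first ten returned, averaged over the evaluated queries.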

Alternate Journal: Stud Health Technol Inform
PubMed ID: 32570509