Variomes - A High Recall Search Engine to Support the Curation of Genomic Variants
Precision oncology relies on the use of treatments targeting specific genetic variants. However, identifying clinically actionable variants, as well as relevant information likely to be used to treat a patient with a given cancer, is a labor-intensive task that includes searching the literature for a large set of variants. The lack of a universally adopted standard nomenclature for variants requires the development of variant-specific literature search engines.
We develop a system to perform triage of publications relevant to support an evidence-based decision. In addition to providing a ranked list of articles for a given variant, the system is also able to prioritize variants, as found in a Variant Call Format (VCF) file, assuming that the clinical actionability of a genetic variant is correlated with the volume of literature published about the variant. Our system searches within three pre-annotated document collections: MEDLINE abstracts, PubMed Central full-text articles and ClinicalTrials.gov clinical trials. A variant synonym generator is used to increase the comprehensiveness of the set of retrieved documents. We then apply different strategies to rank the publications.
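To illustrate the idea of synonym expansion and volume-based prioritization described above, here is a minimal Python sketch. It is not the Variomes synonym generator: the function names `protein_synonyms` and `prioritize`, the `search` callback, and the restriction to simple protein-level substitutions (such as V600E) are all assumptions made for this example.

```python
import re

# Standard one-letter to three-letter amino acid codes.
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def protein_synonyms(variant):
    """Return spelling variants of a simple protein substitution (hypothetical helper).

    Handles only forms such as "V600E" or "p.V600E"; any other input is
    returned unchanged as a single-element set.
    """
    match = re.fullmatch(r"(?:p\.)?([A-Z])(\d+)([A-Z])", variant)
    if match is None or match.group(1) not in AA3 or match.group(3) not in AA3:
        return {variant}
    ref, pos, alt = match.groups()
    short = f"{ref}{pos}{alt}"
    long = f"{AA3[ref]}{pos}{AA3[alt]}"
    return {short, f"p.{short}", long, f"p.{long}"}


def prioritize(variants, search):
    """Rank variants by the number of distinct documents matching any synonym.

    `search` is an assumed callback mapping a query term to a set of document
    identifiers; taking the union avoids counting a document twice when it
    mentions several synonyms of the same variant.
    """
    scored = []
    for variant in variants:
        docs = set()
        for synonym in protein_synonyms(variant):
            docs |= search(synonym)
        scored.append((variant, len(docs)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy in-memory index standing in for a real document collection.
toy_index = {"V600E": {1, 2, 3}, "p.Val600Glu": {3, 4}, "G12D": {5}}
ranked = prioritize(["G12D", "V600E"], lambda term: toy_index.get(term, set()))
```

With this toy index, V600E is ranked first because its synonyms jointly match four distinct documents, whereas G12D matches one. A real synonym generator would also need to cover other description levels (for example genomic and transcript-level notations), which this sketch does not attempt.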
We assess the search effectiveness of the system using different experimental settings:
- Experimental setting 1: The literature retrieval task is tuned and evaluated using the TREC Precision Medicine 2018 and 2019 benchmarks, consisting of 50 and 40 topics, respectively. Almost two thirds (62%) of the publications returned in the top-5 are relevant for clinical decision support.
- Experimental setting 2: The evaluation of the variant prioritization task is based on a manually created benchmark composed of eight patients for a total of 756 variants. For each patient, we used both their complete set of variants and tumor board reports. Our approach identified 81.8% of clinically actionable variants in the top-3.
- Experimental setting 3: A comparison of Variomes with LitVar, a well-known search engine for genetic variants, is performed. Variomes was able to retrieve on average 90.8% of the content, while LitVar retrieved on average 58.6%. For the 9.2% of articles "missed" by Variomes, an error analysis suggests that they are artefacts.
To conclude, we propose a competitive system to facilitate the curation of variants for personalized medicine.
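The figures reported in the experimental settings above are rank-cutoff measures. As a rough illustration, the sketch below computes precision at k (the share of the top-k results that are relevant) and recall at k (the share of relevant items found in the top-k). The function names are chosen for this example, and the exact metric definitions used in the evaluation may differ; they are given in the full report.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked items that are relevant."""
    top = ranked_ids[:k]
    return sum(1 for item in top if item in relevant_ids) / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant_ids:
        return 0.0
    top = set(ranked_ids[:k])
    return len(top & set(relevant_ids)) / len(relevant_ids)


# Toy data: two of the five returned documents are relevant.
p5 = precision_at_k(["a", "b", "c", "d", "e"], {"a", "c", "x"}, 5)

# Toy data: one of the two actionable variants is ranked in the top 3.
r3 = recall_at_k(["v1", "v2", "v3", "v4"], {"v1", "v4"}, 3)
```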
Full evaluation report available on bioRxiv: https://www.biorxiv.org/content/10.1101/2021.05.29.446224v1
The Variomes services were partially supported by the Swiss Personalized Health Network (SPHN) via the SVIP-O implementation study.