BiTeM Group
Who we are...
The BiTeM Group, headed by Patrick Ruch, is part of the Information Science Department of the HES-SO/HEG Geneva. It brings together a network of researchers (computer scientists, biologists, bioinformaticians, MDs...) affiliated with various research institutions in Geneva. More information about BiTeM can be found on the SIB Text Mining web pages. The Text Mining group of the SIB Swiss Institute of Bioinformatics gathers BiTeM's infrastructure services for biologists and biocurators.
Research areas
The BiTeM group is involved in several research projects, with a strong focus on clinical and biological data. The main research areas developed are:
- Text Mining: also referred to as text data mining and roughly equivalent to text analytics, text mining (TM) aims at deriving high-quality information from textual content. High-quality information is typically obtained by devising patterns and trends through association or pattern learning. Knowledge-intensive resources such as dictionaries, terminologies, ontologies and manually crafted rules play an important role in the domain. Text mining usually involves structuring unstructured or semi-structured input text to generate a more structured (or enriched) database. A typical text mining task (e.g. question answering) combines information retrieval, named-entity recognition and information extraction. Quality in text mining usually refers to some combination of relevance, novelty, and interestingness. Other common tasks include text categorization (filtering, descriptor assignment...), sentiment analysis, document summarization, and entity relation modeling (e.g. extraction of protein-protein interactions). Today many of these tasks rely on pre-trained language models, although data-poor approaches (e.g. information retrieval, rule-based methods, Support Vector Machines...) remain highly effective when data are sparse, as is often the case in real-world scenarios.
- Bibliomics: bibliomics is the bioinformatics study of the bibliome, that is, the totality of the biological text corpus. The term emphasizes the importance of biological text content for the biomedical sciences. In practice, bibliomics is often regarded as the application of text mining to the molecular biology literature, and to MEDLINE in particular. However, the notion tends to expand beyond the literature to other content, such as the web, patent documents or clinical reports. Biologists and computer scientists thus mine the bibliome to discover new gene targets and drugs, or to explore biotic interactions. Under the umbrella of the SIB Swiss Institute of Bioinformatics, and thanks to the ELIXIR Data Platform, the group maintains several literature services to support curators. In particular, we develop triage instruments that help biologists and clinicians efficiently access MEDLINE, PubMed Central or ClinicalTrials.gov.
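To make the role of dictionaries in named-entity recognition and literature triage more concrete, here is a minimal Python sketch. It is purely illustrative and does not reflect how the BiTeM or SIB services are implemented: the terminology, the example abstracts, and the function names `find_entities` and `triage` are all assumptions made for this example, and the scoring (a raw count of dictionary hits) is deliberately naive.

```python
import re

# Hypothetical mini-terminology (illustrative only): surface form -> entity type.
TERMINOLOGY = {
    "brca1": "GENE",
    "tp53": "GENE",
    "tamoxifen": "DRUG",
    "breast cancer": "DISEASE",
}


def find_entities(text):
    """Return (term, entity type) pairs found by case-insensitive dictionary lookup."""
    lowered = text.lower()
    hits = []
    for term, entity_type in TERMINOLOGY.items():
        # Word boundaries prevent matching a term inside a longer token.
        pattern = r"\b" + re.escape(term) + r"\b"
        for _ in re.finditer(pattern, lowered):
            hits.append((term, entity_type))
    return hits


def triage(documents):
    """Rank document ids by number of dictionary hits; drop documents with none."""
    scores = {doc_id: len(find_entities(text)) for doc_id, text in documents.items()}
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, score in ranked if score > 0]


# Made-up abstracts standing in for literature records.
abstracts = {
    "doc1": "BRCA1 mutations and tamoxifen response in breast cancer.",
    "doc2": "A survey of alpine flora.",
    "doc3": "TP53 status in tumours.",
}

entities_doc1 = find_entities(abstracts["doc1"])
ranking = triage(abstracts)
print(entities_doc1)
print(ranking)
```

A real triage service would typically replace the hit count with a learned or retrieval-based relevance score and use curated terminologies with synonym handling, but the overall shape (recognize entities, score documents, rank them for the curator) stays the same.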