BITEM Projects

METAPLANTCODE

Plants drive terrestrial ecosystems and, as autotrophs, are among the most important organisms in the terrestrial food web. Plants are directly affected by climate change, which will have a profound effect on ecosystems as well as on primary production and agriculture. Currently, an estimated 2 out of 5 plant species are threatened with extinction (Antonelli et al. 2020). Their loss will also affect other groups of organisms.

Plant metabarcoding makes it possible to monitor plants’ functional dependencies and organismal networks such as food webs, pollination and dispersal. Examples include analysing feces and plant traces in guts to monitor herbivores’ diets, and analysing plant traces attached to organisms, e.g. from Malaise or pan traps, for pollinator monitoring. Understanding and alleviating the driving forces behind the current unprecedented biodiversity change calls for accurate monitoring, harmonization of protocols, high data quality, interoperability among different databases and pipelines, and streamlining of efforts (IPBES 2019; Grooten et al. 2020; Kolter & Gemeinholzer 2021). Metabarcoding describes the analysis of complex environmental DNA (eDNA) samples with the aim of taxonomic identification. The method can be standardized and automated and is suitable for high-throughput, large-scale and long-term monitoring. Metabarcoding has the potential to provide a scale and accuracy in biodiversity surveys that was previously unattainable for many taxonomic groups (Deiner et al. 2017; Ruppert et al. 2019; Kolter & Gemeinholzer 2021).

Nevertheless, the increase in technical complexity compared to most other monitoring methods also implies a higher susceptibility to errors and therefore requires stringent quality control and harmonization across techniques, workflows and institutions (Deiner et al. 2017; Ruppert et al. 2019; Thalinger et al. 2020). Plant metabarcoding is not as straightforward as metabarcoding in many animal groups (for a review see Deiner et al. 2017): a single barcode marker is not sufficient for correct species-level identification across the whole plant kingdom due to evolutionary constraints (e.g. hybridization of species across different taxonomic levels, polyploidy, rapidly and slowly radiating groups, apomixis). DNA barcode sequences sometimes resolve only to clades of multiple species or to higher taxonomic levels, which does not meet the requirements of most monitoring purposes. The use of multiple markers in eDNA samples, however, causes assignment problems. Reference databases for comparing unknown sequences with known sequences are constantly being improved in international genbanks, in the course of national barcode initiatives (NorBOL, ARISE, GBOL, and others), in the Barcode of Life Data System (BOLD; Ratnasingham & Hebert 2007), and by initiatives like Biodiversity Genomics Europe (BGE), and this continuously increases the accuracy of identification. Nevertheless, even comprehensive reference databases decrease taxonomic precision if they are too non-specific.

The continuous optimization of reference databases highlights the importance of versioning and citability in appropriate repositories. Considering the great importance of plants for organismal interactions and as the primary producers of ecosystems, we here propose using not only DNA sequences but also additional information for correct species identification in an AI context (e.g. georeferenced species occurrence data (GBIF.org), site-specific vegetation information and species checklists, and information on polyploidy and hybridization, e.g. Biolflor.org). By adding further information to the plant metabarcoding results and evaluating results in an AI context, the accuracy of taxonomic identification will resemble BLAST results in genbanks such as ENA, NCBI and DDBJ. Furthermore, the applicability of taxonomic names changes over space and time, and the same name might refer to different taxonomic concepts (e.g. in reference databases, species checklists, genbanks, different versions of the plant red lists in different countries, etc.). Linking a taxonomic name to a specific concept defined in a taxonomic treatment in a reference publication (Biodiversity Literature Repository; ChecklistBank) will make it possible to compare identifications over space and time, to find additional data about the taxon through text and data mining, and to use synonyms and re-descriptions to widen the application of additional data. The project builds upon existing structures, networks and standards, which need further enhancement and harmonization across national and international initiatives (e.g. GBIF, BOLD, TDWG, GBOL, NorBOL, SBDI, ARISE). Furthermore, specific databases are emerging (e.g. ASV banks). Different bioinformatic pipelines are currently being built, and ELIXIR and the de.NBI-cloud services are available for online analysis.

Deep learning algorithms, and more specifically pre-trained transformer-based models, can make use of already available knowledge (e.g. DNA sequences, literature), parts of which still need to be made digitally available and taxonomically comparable in standardized ways. With selected user case studies, plant metabarcoding pipelines can be harmonized and optimized for different infrastructures and researchers across Europe. Exchange between European GBIF nodes and communication with other DNA barcoding monitoring activities (e.g. freshwater monitoring) can benefit a larger community in the future.

AIRating - Artificial Intelligence to support the evidence-based rating of information

With the support of the Swiss Innovation Agency Innosuisse, in this joint work with Impaakt, we aim to operationalize and evaluate the impact of companies and to provide a specialized search engine for ranking sources used in impact analysis. We will build on and extend recent advances in deep learning to analyze various dimensions of impact and reduce manual curation effort.

e-BioDiv - Open Biodiversity FAIR-ification Services for Biospecimens stored in Swiss Natural History Museums

The Earth’s scholarly knowledge about species diversity (biodiversity) is contained in a corpus of several hundred million pages of publications spanning over 250 years, with an arbitrary starting point of 1753 for plants and 1758 for animals. Each year an estimated 19,000 animal and plant species, and many times that number of data augmentations, are added to the approximately 1.9 million species already known. The data about each species are included in highly structured taxonomic treatments and figures. Increasingly, these treatments include implicit links to the data used to describe and augment them, such as omic and digitized specimen data produced by SwissBioCollection.

BiCIKL - Biodiversity Community Integrated Knowledge Library

Missions

BiCIKL will catalyse a culture change in the way biodiversity data is identified, linked, integrated and re-used across the research cycle. We will cultivate a more transparent, trustworthy and efficient research ecosystem.

Vision

BiCIKL will launch a new European starting community of key research infrastructures, researchers, citizen scientists and other stakeholders in the biodiversity and life sciences based on open science practices through access to data, tools and services.

Results

BiCIKL is building the Biodiversity Knowledge Hub (BKH) - a single knowledge portal to interlinked and machine-readable FAIR data (Findable, Accessible, Interoperable and Reusable) using unique stable identifiers on specimens, genomics, observations, taxonomy and publications.

SIB Text Mining / HES-SO BiTeM contributions

One of the main outcomes of the project will be supported by the SIB Literature Services (https://sibils.text-analytics.ch/), which will power the delivery of Biodiversity PMC (https://sibils.text-analytics.ch/search/).

CHEM::AI - Predicting and Exploring Novel Chemical Spaces Using Artificial Intelligence

With the support of the Swiss Innovation Agency Innosuisse, in this joint work with Douglas Teodoro's group and SpiroChem AG, a world-leading organic/synthetic chemistry innovator in the life science industry, we aim to provide effective solutions for the virtual synthesis of new molecules. This project will expand the currently accessible chemical space and reduce the cost and development time of new chemical entities.

Swiss Personalized Oncology

The goal of the Text-Mining group within the Swiss Personalized Oncology project (SPHN) is to offer a therapeutic answer to new patient cases from a large collection of text-based reports. In collaboration with the five Swiss University Hospitals (CHUV, HUG, Insel, USB, USZ), we are experimenting with a multiclass categorization tool that uses deep learning to mine huge collections of past clinical reports, identify therapeutic responses, generate hypotheses and ultimately support the understanding of cancer mechanisms.

CINECA - Common Infrastructure for National Cohorts in Europe, Canada, and Africa

The CINECA project aims to develop a federated cloud-based infrastructure for making genomic and biomolecular data accessible. This project has already assembled a virtual cohort from population, longitudinal and disease studies, such as the European Genome-phenome Archive (EGA), CanDIG, and H3Africa, and has contributed to harmonising metadata based on open global standards. The CINECA consortium will create one of the largest cross-continental implementations of human genetic and phenotypic data federation and interoperability for complex diseases. This project is funded by the European Union’s Horizon 2020 programme and the Canadian Institutes of Health Research.

WeIRD - Web Intelligence for Rare Disease

The WeIRD project aims to provide the informational instruments needed to navigate, search and ultimately question the web evidence space of rare diseases (RD) by providing access to high-quality, specific content that helps in diagnosing RD. The system will use advanced information retrieval and text mining methods to holistically crawl, index and finally analyze all the explicit and implicit knowledge available on rare diseases.