
Latest news

FHDportal

Theme: Data Deposition

Human data, especially genomic data, is increasingly being federated across borders and institutions, with many stakeholders participating in multinational and global biomedical and health data networks, fostering collaborations and partnerships. While such international efforts are essential for the compilation and reuse of data, regulatory constraints often hinder the movement of certain data beyond organisational or national boundaries. Centralised approaches such as the Central European Genome-Phenome Archive (CEGA) are valuable, but not all data can be centralised.

The Federated European Genome-phenome Archive network (FEGA) addresses this, with early work concentrated on local collection of data with central archiving of metadata. FHDportal aims to support both federated and central submission of metadata. It will do this by providing a reusable portal for gathering and storing metadata at a national level, and submitting required metadata centrally to enable discovery of datasets via the CEGA. FHDportal complements the existing system by providing a way to explore richer metadata (for example, including detailed information on specific datasets or local funding information), while enabling a core set of metadata to be queried centrally.
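The split between rich metadata kept at a node and a core set submitted centrally can be illustrated with a minimal Python sketch. The field names and the `CORE_FIELDS` set are hypothetical and chosen for illustration only; they are not the FEGA or FHDportal schema.

```python
# Hypothetical illustration: these field names are assumptions,
# not the actual FEGA / FHDportal metadata schema.
CORE_FIELDS = {"dataset_id", "title", "data_type", "node"}


def split_metadata(record):
    """Return (core, local_only) views of one dataset metadata record."""
    # Core fields would be submitted centrally to enable discovery.
    core = {k: v for k, v in record.items() if k in CORE_FIELDS}
    # Everything else stays at the national node as richer, local metadata.
    local_only = {k: v for k, v in record.items() if k not in CORE_FIELDS}
    return core, local_only


record = {
    "dataset_id": "example-0001",
    "title": "Example cohort",
    "data_type": "WGS",
    "node": "CH",
    "local_funding": "Example national grant",
    "detailed_description": "Longer free-text description kept at the node.",
}
core, local_only = split_metadata(record)
```

In this sketch, only `core` would leave the node, while `local_only` remains available for exploration through the local portal.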

FHDportal will be deployed and tested on FEGA nodes, and should be of interest to the many other countries seeking to join FEGA. The need for FHDportal is based on experience during onboarding and in moving to production nodes. It will offer a common solution for local mobilisation of data and metadata, which can be adapted to local situations. During development, it will be tested on both new and well-established nodes using different technical platforms and infrastructures. The resulting software will be provided to the whole community, and will hopefully become part of the emerging toolkit for new FEGA nodes wishing to establish themselves, and to ensure their nodes meet local needs while bringing European scale benefits.

The SIB Text Mining group will develop a service powered by a dedicated language model to support the semi-automatic assignment of descriptors at deposition time. The service aims to make it easier for end-users to provide metadata, as explored in Teodoro et al. 2017.

Nodes involved: ELIXIR Switzerland, ELIXIR Finland, ELIXIR Luxembourg, ELIXIR UK

Communities: Federated Human Data, Human Copy Number Variation

BioMoQA

For over a decade, the Swiss Institute of Bioinformatics and HES-SO/HEG Geneva have maintained the SIB Literature Services (SIBiLS, Gobeill et al. 2020). The services are part of a broader initiative, led by SIB, the Swiss BioData ecosystem (SBDe). SIBiLS is also an ELIXIR Data Resource, supported by the Data Platform commissioned services.

In 2023, SIBiLS received a significant upgrade with the launch of the Biodiversity PMC digital library, for which SIBiLS operates the backend. Biodiversity PMC is emerging as a global resource in the field: it complements biological scholarly databases such as NLM’s PMC or Europe PMC and addresses some of their limitations, and it is likely the largest digitally-native repository of articles for biodiversity research and related disciplines. It thus delivers a broad-coverage “One Health” library.

BioMoQA aims to enhance Biodiversity PMC to address key biodiversity questions about the effects of climate change, habitat loss or invasive species on biodiversity and ecosystems, and the implications for human societies. The new services will support experts and researchers in biodiversity, ecology and environmental sciences who need better search & access services to all bio-related publications.

We will enhance the platform's AI-powered analytical services (i.e. Question-Answering, SPARQL endpoint) and the acquisition and FAIR-ification of new and original content as operated by Plazi, in particular the TreatmentBank database, which contains openly accessible taxonomic treatments extracted from both Open Access (OA) and non-OA articles.

We will evaluate how these services can help ecologists - and in particular our partner, Prof. Clara Zemp, from the University of Neuchâtel - to monitor biodiversity on island ecosystems. These new services will serve to supply scientific communities interested in biodiversity with a holistic Single Access Point to enhance evidence-based conservation of biodiversity and support the restoration of ecosystems.

METAPLANTCODE Kickoff

The METAPLANTCODE project (see the project page) was kicked off in May in Kassel. We will take care of WP5.
Background: Species names are taxonomic hypotheses which are likely to change over time; for example, each accepted species name has on average eight synonyms, including homonyms, which may result in taxonomic confusion. Species names can also refer to different taxonomic concepts, which may hamper the understanding of how a species is delimited in monitoring programs in different countries. Species are defined by taxonomic treatments, that is, sections of text that document the discovery of a new species; subsequent treatments citing previous ones add new data about the respective species and thus provide the history of scientific names. Currently these treatments, liberated by Plazi, are reused in GBIF, and over 42,000 checklists derived from taxonomic publications are integrated in ChecklistBank. The explicit reference to a specific treatment ensures that data can be compared over time and space. These sets of taxonomic names can be used to comprehensively explore additional publications, such as ecological works or Redlists, and to complement the taxonomic works by applying text analytical pipelines, in particular for biotic interactions, using Natural Language Processing methods, including transformers, which are today state of the art (Naderi et al. 2021). In the context of the BICIKL project (Penev et al. 2022), the partners have developed a comprehensive literature bioannotation pipeline based on the SIB Literature Services (SIBiLS, Pasche et al. 2022) to deliver a comprehensive machine-readable biodiversity literature repository (e.g. TreatmentBank, PMC, MEDLINE). Contents of the repository are available in both JATS (Journal Article Tag Suite) and BioC (Biomedical Text Processing: Comeau et al. 2013) formats, and holistic access based on OpenAPI is available (SIBiLS: Gobeill et al. 2020).
Scientific work program: A reference database for all the treatments referenced at the WP1 sites will be created by collecting reference publications, converting these into a machine-actionable format, annotating and making treatments and figures FAIR using BLR, and generating JATS-compliant treatments to be (data) published in SIBiLS. The integration of names in treatments into Catalogue of Life will be performed by interacting with ChecklistBank (WP4). A persistent identifier will be minted and attached to the respective names in the metabarcoding results. The taxonomic names, complemented with their synonyms and with terms from biotic interaction vocabularies, will be used to mine the treatments and other available relevant publications and Redlists in SIBiLS, and the results will be made accessible for combined analyses with metabarcoding data. Semantic enrichment of literature will comprise defined standards and generated annotations used for the conversion of floras, Redlists, and ecological works for the identification of biotic interactions and related features (e.g. ENVO/EUNIS for habitats). We will furthermore design a JSON API and a demonstrator GUI to support WP3/WP4 discovery tasks (this subtask will build on top of the named entity recognition to deliver species-centric traits and biotic interaction networks).
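The idea of mining text with taxonomic names, their synonyms and interaction terms can be sketched as a simple dictionary lookup. This is a minimal illustration under stated assumptions, not the project's pipeline, which relies on named entity recognition and transformer-based methods. The synonym table, the interaction terms and the function names are all hypothetical.

```python
import re

# Hypothetical synonym table (accepted name -> name variants), for illustration only.
SYNONYMS = {
    "Bellis perennis": ["Bellis perennis", "Bellis hortensis"],
    "Apis mellifera": ["Apis mellifera", "Apis mellifica"],
}
# Hypothetical biotic interaction terms, standing in for a controlled vocabulary.
INTERACTION_TERMS = ["pollinates", "visits", "feeds on", "parasitizes"]


def find_mentions(sentence):
    """Return (accepted names, interaction terms) mentioned in one sentence."""
    lowered = sentence.lower()
    # Map every matched variant back to its accepted name.
    names = sorted(
        accepted
        for accepted, variants in SYNONYMS.items()
        if any(v.lower() in lowered for v in variants)
    )
    # Match interaction terms on word boundaries only.
    terms = [
        t for t in INTERACTION_TERMS
        if re.search(r"\b" + re.escape(t) + r"\b", lowered)
    ]
    return names, terms


def candidate_interaction(sentence):
    """Report a candidate only when two taxa co-occur with an interaction term."""
    names, terms = find_mentions(sentence)
    if len(names) >= 2 and terms:
        return {"taxa": names, "terms": terms}
    return None
```

Normalising synonyms to an accepted name before counting co-occurrences is what lets mentions under different names contribute to the same species-centric record.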

Main achievements of the group in 2023

This post is a brief recap of the main achievements of the group in 2023.

The year was extremely rich for the SIB Literature Services, as the Biodiversity PMC portal was launched and presented at TDWG 2023 in Tasmania. Biodiversity PMC builds on the BICIKL project, coordinated by Pensoft. The new platform is designed to serve biodiversity, ecological and environmental sciences, as it provides open access to the large collections of articles in the field. The platform also provides direct access to Plazi's treatments: a collection of about 500 000 curated taxonomic treatments extracted from the literature. Further, we also launched the Supplementary Data Index, which contains about 20 million files from PMC (e.g., XLS, CSV, DOC, PDF), including OCR-ized images (e.g., tif, gif, jpeg). A growing set of articles relevant for biodiversity studies (e.g., European Journal of Taxonomy) but not part of the NLM's PubMed Central archive is being added to the PMC collection of Biodiversity PMC; we therefore renamed this collection PMC+!


A recap of our publications can be found here.


Thanks to the e-BioDiv support from swissuniversities, we also delivered the e-BioDiv matching portal (see e-BioDiv Matching Services). Of particular interest for Swiss natural history and botanical garden collections, most biodiversity-related Swiss journals (e.g., Candollea) are now part of Biodiversity PMC thanks to Plazi's TreatmentBank.