
METAPLANTCODE Kickoff

METAPLANTCODE (see Project page) was kicked off in May in Kassel. We will take care of WP5.
Background: Species names are taxonomic hypotheses that are likely to change over time. For example, each accepted species name has on average eight synonyms, including homonyms, which may result in taxonomic confusion. Species names can also refer to different taxonomic concepts, which may hamper the understanding of how a species is delimited in monitoring programs in different countries. Species are defined by taxonomic treatments, sections of text that document the discovery of a new species; subsequent treatments that cite previous ones add new data about the respective species and thus provide the history of scientific names. Currently, the treatments liberated by Plazi are reused in GBIF, and over 42,000 checklists derived from taxonomic publications are integrated in ChecklistBank. An explicit reference to a specific treatment ensures that data can be compared over time and space. These sets of taxonomic names can be used to comprehensively explore additional publications, such as ecological works or Redlists, and to complement the taxonomic works by applying text-analytical pipelines, in particular for biotic interactions, using Natural Language Processing methods, including transformers, which are today the state of the art (Naderi et al. 2021). In the context of the BICIKL project (Penev et al. 2022), the partners have developed a comprehensive literature bioannotation pipeline based on the SIB Literature Services (SIBiLS, Pasche et al. 2022) to deliver a comprehensive machine-readable biodiversity literature repository (e.g. TreatmentBank, PMC, MEDLINE). The contents of the repository are available in both JATS (Journal Article Tag Suite) and BioC (Biomedical Text Processing: Comeau et al. 2013) formats, and holistic access based on OpenAPI is available (SIBiLS: Gobeill et al. 2020).
Scientific work program: A reference database for all the treatments referenced at the WP1 sites will be created by collecting reference publications, converting them into a machine-actionable format, annotating treatments and figures and making them FAIR using BLR, and generating JATS-compliant treatments to be (data) published in SIBiLS. The integration of names in treatments into the Catalogue of Life will be performed by interacting with ChecklistBank (WP4). A persistent identifier will be minted and attached to the respective names in the metabarcoding results. The taxonomic names, complemented with their synonyms, together with terms from biotic interaction vocabularies will be used to mine the treatments and other relevant publications and Redlists available in SIBiLS, and the results will be made accessible for combined analyses with metabarcoding data. Semantic enrichment of the literature will comprise defined standards and generated annotations used for the conversion of floras, Redlists, and ecological works, for the identification of biotic interactions and related features (e.g. ENVO/EUNIS for habitats). We will furthermore design a JSON API and a demonstrator GUI to support the WP3/WP4 discovery tasks; this subtask will build on top of the named entity recognition to deliver species-centric traits and biotic interaction networks.
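To illustrate the mining step described above, here is a minimal sketch of dictionary-based matching: names and their synonyms are mapped to an accepted name, and sentences mentioning two distinct taxa plus an interaction term are reported. It is not the SIBiLS pipeline, which relies on dedicated named entity recognition. The names (`Aus bus`, `Xus yus` and so on) are nomenclatural placeholders, and the synonym table, the interaction vocabulary and the function names are all assumptions made for this example.

```python
import re

# Placeholder checklist (hypothetical): accepted name -> list of synonyms.
SYNONYMS = {
    "Aus bus": ["Aus cus", "Dus bus"],
    "Xus yus": [],
}

# Placeholder interaction vocabulary (hypothetical, not a real ontology export).
INTERACTION_TERMS = ["pollinates", "parasitizes", "feeds on"]


def build_name_index(synonyms):
    """Map every lower-cased name or synonym to its accepted name."""
    index = {}
    for accepted, syns in synonyms.items():
        index[accepted.lower()] = accepted
        for syn in syns:
            index[syn.lower()] = accepted
    return index


def _alternation(terms):
    """Compile a case-insensitive regex matching any of the terms as whole words."""
    ordered = sorted(terms, key=len, reverse=True)  # prefer the longest match
    body = "|".join(re.escape(t) for t in ordered)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


def find_interactions(text, synonyms, terms):
    """Return candidate interactions: sentences with two taxa and one term."""
    index = build_name_index(synonyms)
    name_re = _alternation(index.keys())
    term_re = _alternation(terms)
    results = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        names = [index[m.group(0).lower()] for m in name_re.finditer(sentence)]
        found = [m.group(0).lower() for m in term_re.finditer(sentence)]
        others = [n for n in names if n != names[0]] if names else []
        if others and found:
            results.append({
                "subject": names[0],      # first taxon mentioned
                "interaction": found[0],  # first interaction term mentioned
                "object": others[0],      # first different taxon mentioned
                "sentence": sentence,
            })
    return results
```

Calling `find_interactions` on a text that mentions a synonym such as `Dus bus` reports the interaction under the accepted name `Aus bus`, which is the kind of normalization the names-plus-synonyms approach is meant to provide. The resulting list of dictionaries could be serialized with `json.dumps` as a stand-in for a JSON API response.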