ELIXIR FHD Community Day & HDTR Project Meeting
The SIB Text-Mining group will take part in the ELIXIR FHD Community Day and HDTR Project Meeting on 11 November 2025, presenting progress on the metadata assignment and search services developed within FHDportal.
These services aim to improve the description and findability of deposited human data through AI-assisted metadata assignment based on established vocabularies. We are developing a multi-class, multi-level classifier for MeSH terms, designed to handle the high-dimensional and sparse nature of these labels, and trained on a dataset of ~9 million annotated examples (supplementary data from SIBiLS).
However, scaling this approach is challenging: the high dimensionality, sparsity, and label imbalance of MeSH make both training and evaluation demanding. Fair and reliable descriptor assignment requires careful model design, including sparse-aware loss functions, sampling strategies, and the exploration of generative modeling approaches, all supported by large-scale distributed training.
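
To make the idea of a sparse-aware loss concrete, here is a minimal sketch of one common option: binary cross-entropy in which each label's positive term is up-weighted by its negative-to-positive ratio. This is an illustration only, not a description of the loss actually used in FHDportal; the function names, the weight cap of 100, and the labels "rare_term" and "common_term" are all hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pos_weights(label_counts, n_examples, cap=100.0):
    """Per-label positive weight: (#negatives / #positives), capped.

    The cap (an assumed value) keeps extremely rare labels from
    dominating the loss.
    """
    weights = {}
    for label, n_pos in label_counts.items():
        n_neg = n_examples - n_pos
        weights[label] = min(cap, n_neg / max(n_pos, 1))
    return weights


def weighted_bce(logits, targets, weights):
    """Mean weighted binary cross-entropy over all labels of one example.

    logits:  dict mapping label -> raw score
    targets: set of labels that are truly assigned to the example
    weights: dict mapping label -> positive weight
    """
    total = 0.0
    for label, z in logits.items():
        if label in targets:
            # -w * log(sigmoid(z)) == w * softplus(-z)
            total += weights[label] * softplus(-z)
        else:
            # -log(1 - sigmoid(z)) == softplus(z)
            total += softplus(z)
    return total / len(logits)


# Hypothetical label frequencies in a corpus of 10,000 examples.
weights = pos_weights({"rare_term": 10, "common_term": 5000}, 10_000)
logits = {"rare_term": 0.0, "common_term": 0.0}

loss_miss_rare = weighted_bce(logits, {"rare_term"}, weights)
loss_miss_common = weighted_bce(logits, {"common_term"}, weights)
```

With these weights, an uncertain prediction on a rare positive label is penalized far more than the same uncertainty on a common one, which counteracts the tendency of an unweighted loss to ignore infrequent descriptors.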