EMNLP 2025 in Suzhou: TransBERT Paper Published
We are pleased to announce our participation in EMNLP 2025 in Suzhou, China (Nov. 4–9, 2025) with the publication of the paper “TransBERT: A Framework for Synthetic Translation in Domain-Specific Language Modeling.”
This work presents TransBERT, a framework for pre-training language models on synthetically translated text, which addresses the limited availability of domain-specific data in non-English languages. The TransCorpus toolkit, which covers French, German, Spanish, and Hindi, and the TransBERT-bio-fr model are both available for you to explore and try.
☕ Julien Knafou is on site and happy to connect for a chat or coffee!
🤗 Model: https://huggingface.co/jknafou/TransBERT-bio-fr
🖥️ Toolkit: https://github.com/jknafou/TransCorpus
📚 Datasets: https://huggingface.co/jknafou/datasets?search=transcorpus
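
For readers who want to try the model, here is a minimal sketch of how it might be queried with the Hugging Face `transformers` library. It assumes that the checkpoint at the model link above can be loaded with the standard fill-mask pipeline; the model card is the authority on actual usage. The helper names and the example sentence are hypothetical and for illustration only.

```python
MODEL_ID = "jknafou/TransBERT-bio-fr"  # repository name taken from the model link above


def build_masked_sentence(template: str, mask_token: str) -> str:
    """Replace the {mask} placeholder with the tokenizer's own mask token."""
    return template.replace("{mask}", mask_token)


def top_predictions(template: str, k: int = 5):
    """Return the k most likely fillers for the masked position.

    Assumption: the checkpoint ships a masked-language-modeling head that the
    generic fill-mask pipeline can load. Requires `pip install transformers torch`
    and network access on first use.
    """
    from transformers import pipeline  # imported lazily so the helpers above work offline

    fill = pipeline("fill-mask", model=MODEL_ID)
    # Read the mask token from the tokenizer instead of hard-coding "[MASK]" or "<mask>".
    text = build_masked_sentence(template, fill.tokenizer.mask_token)
    return [(r["token_str"].strip(), r["score"]) for r in fill(text, top_k=k)]


if __name__ == "__main__":
    # Hypothetical French biomedical sentence, chosen only for illustration.
    for token, score in top_predictions("Le patient souffre d'une {mask} chronique."):
        print(f"{token}\t{score:.3f}")
```

Reading the mask token from the tokenizer keeps the sketch independent of the underlying architecture, which this announcement does not specify.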