EMNLP 2025 in Suzhou: TransBERT Paper Published

We are pleased to announce our participation in EMNLP 2025 in Suzhou, China (Nov. 4–9, 2025) with the publication of the paper “TransBERT: A Framework for Synthetic Translation in Domain-Specific Language Modeling.”

This work presents TransBERT, a framework for pre-training language models on synthetically translated text, which addresses the limited availability of domain-specific data in non-English languages. The TransCorpus toolkit, which covers French, German, Spanish, and Hindi, and the TransBERT-bio-fr model are both available to explore and try.

☕ Julien Knafou is on site and happy to connect for a chat or coffee!

🤗 Model: https://huggingface.co/jknafou/TransBERT-bio-fr

🖥️ Toolkit: https://github.com/jknafou/TransCorpus

📚 Datasets: https://huggingface.co/jknafou/datasets?search=transcorpus
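
To try the model quickly, a minimal sketch along the following lines may work. It assumes the checkpoint ships a masked-language-modeling head and loads with the Hugging Face `transformers` fill-mask pipeline; please check the model card to confirm. The helper names, the `{mask}` placeholder, and the French example sentence are illustrative and do not come from the paper.

```python
def build_masked_sentence(template: str, mask_token: str) -> str:
    """Replace the illustrative {mask} placeholder with the tokenizer's mask token."""
    return template.replace("{mask}", mask_token)


def top_predictions(template: str,
                    model_id: str = "jknafou/TransBERT-bio-fr",
                    k: int = 5):
    """Return the top-k (token, score) pairs for the masked position.

    Assumption: the checkpoint supports the fill-mask task.
    """
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id)
    # Use the tokenizer's own mask token rather than guessing "[MASK]" or "<mask>".
    sentence = build_masked_sentence(template, fill.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill(sentence, top_k=k)]


if __name__ == "__main__":
    # Hypothetical biomedical French prompt; downloads the model on first run.
    for token, score in top_predictions("Le patient souffre d'une {mask} chronique."):
        print(f"{token!r}: {score:.3f}")
```

Reading the mask token from the tokenizer avoids hard-coding a model-specific string, so the same sketch should carry over to other checkpoints that support fill-mask.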