Formal Text Translation Using Large Language Models: A Replication Study on LLM-Driven Translation of Open-Source Software Documentation from English to German Language Variants

Authors

  • Alejandro Lomelin, Independence High School, Ashburn, VA
  • Elijah Adejumo, Department of Computer Science, George Mason University, Fairfax, VA
  • Brittany Johnson, Department of Computer Science, George Mason University, Fairfax, VA

Abstract

Open-source software development is a global activity with the potential to engage a wide array of contributors. Central to contributing to open source is the documentation that projects make available. With the recent emergence of generative AI and Large Language Models (LLMs), translation across languages using these tools has become a prominent research topic. However, studies commonly focus on conversational language and overlook formal and legal texts. Accurately translating such texts, including open-source documentation, could improve engagement with global audiences. Furthermore, most work has focused on translation within global languages rather than across different languages. This study aims to evaluate the performance of LLM translation in formal contexts, specifically open-source documentation. To achieve this goal, we designed a replication study (based on “LLMs and Translation: different approaches to localization between Brazilian Portuguese and European Portuguese”) in which we evaluate different approaches to localization between language variants using an existing dataset of onboarding documents. However, instead of translating English to Portuguese, we focus on translation from English to German language variants. These efforts have implications for both research and practice, providing insights into the effectiveness of LLMs in translating formal (in our case, technical) documents. This could not only increase the accessibility of open-source documentation for global audiences but also improve the translation of similar technical texts.

Published

2024-10-13

Section

College of Engineering and Computing: Department of Computer Science