Using Language Models to Promote Inclusive Language in Software Development Communities

Authors

  • Joshua W. Jacas, Department of Computer Science, George Mason University, Fairfax, VA
  • Brittany Johnson, Department of Computer Science, George Mason University, Fairfax, VA

Abstract

The use of non-inclusive and harmful terminology in software development communities poses significant challenges to fostering an inclusive environment. The HaTe Detector project aims to address this issue by developing a tool that identifies and suggests replacements for harmful terms in computing artifacts. Non-inclusive language can perpetuate stereotypes, reinforce biases, and create an unwelcoming atmosphere for underrepresented groups; addressing it is important for promoting diversity, equality, and inclusion in tech. The HaTe Detector project builds on existing research and tools focused on inclusive language, including the GitHub Inclusifier project, which offers guidelines and automated corrections for promoting inclusive language in technical and everyday contexts. We designed an experiment to evaluate several different LLMs, including GPT-4, BERT, RoBERTa, T5, and DistilBERT, on their ability to detect and replace harmful terms. We created specific prompts covering detection, replacement suggestions, contextual understanding, and handling of complex scenarios. Preliminary results indicate that LLMs can effectively identify and suggest replacements for harmful terms, highlighting their potential to support automated tools for promoting inclusive language in tech. This project contributes to ongoing efforts to foster a more inclusive tech community by building on existing literature and applying robust evaluation methods.
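
To illustrate the kind of prompt-based detection and replacement the abstract describes, the sketch below queries GPT-4 through the OpenAI Python SDK. The prompt wording, the sample snippet, and the system message are illustrative assumptions for demonstration only; they are not the HaTe Detector's actual prompts or evaluation setup.

    # Minimal sketch, assuming the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
    # The snippet and prompt text are hypothetical examples, not the project's real prompts.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    SNIPPET = "The master branch syncs changes to each slave node via a whitelist."

    prompt = (
        "Review the following text from a software artifact. "
        "List any non-inclusive or harmful terms it contains, and for each term "
        "suggest an inclusive replacement that preserves the technical meaning.\n\n"
        f"Text: {SNIPPET}"
    )

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You help identify non-inclusive terminology in computing artifacts."},
            {"role": "user", "content": prompt},
        ],
    )

    # Prints the model's detections and suggested replacements,
    # e.g. master/slave -> main/replica, whitelist -> allowlist.
    print(response.choices[0].message.content)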


Published

2024-10-13

Section

College of Engineering and Computing: Department of Computer Science