Creating Complex Math Question Test Sets for LLMs to Train With

Authors

  • Himagnya Elaprolu, Department of Computer Science, George Mason University, Fairfax, VA
  • Jie Hao, Department of Computer Science, George Mason University, Fairfax, VA
  • Mingrui Liu, Department of Computer Science, George Mason University, Fairfax, VA

Abstract

Large language models, or LLMs (such as ChatGPT), can readily interpret human input and provide the desired output. However, when it comes to solving problems that cannot be conventionally analyzed, such as math questions, many LLMs struggle to produce accurate results. To address this problem, we created an algorithm that generates a set of complex math questions, which can be used as a training set in the machine learning process of LLMs. The algorithm asks the user for a number, then generates and prints a dataset of that many questions. Each question is randomly chosen to be an integral or a derivative, and its components (including nested functions) are randomly generated and separated by three predetermined operators. Because Python could not generate the answers with sufficient accuracy, the questions are input into an advanced LLM capable of performing calculus, and its output answers are stored in a CSV file as the accepted answers. The questions are then input into a less advanced LLM, and its output answers are likewise stored and compared against the accepted answers. Using evaluation metrics in Python, the accuracy score of the generated answers in one trial of 25 questions was 0.64, providing a baseline evaluation. Although current LLMs are highly capable, many cannot solve some of the complex problems humans pose. The described process offers a way to significantly improve the performance of LLMs by using these question sets to train them, enabling them to solve such problems and further aid humanity.
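As an illustration of the generation procedure described in the abstract, the sketch below produces a user-specified number of random calculus questions, each built from randomly chosen (and optionally nested) component functions joined by three predetermined operators. The function pool, nesting depth, and all names are assumptions for illustration; the paper's actual implementation is not reproduced on this page.

```python
import random

# Component templates; {u} marks where the argument (or a nested function) goes.
TEMPLATES = ["sin({u})", "cos({u})", "exp({u})", "ln({u})", "({u})**2", "sqrt({u})"]
OPERATORS = ["+", "-", "*"]  # three predetermined operators, per the abstract

def random_component(depth=0, max_depth=2):
    """Build one component, sometimes nesting another function inside it."""
    template = random.choice(TEMPLATES)
    if depth < max_depth and random.random() < 0.5:
        return template.format(u=random_component(depth + 1, max_depth))
    return template.format(u="x")

def random_question():
    """Join four random components with the three predetermined operators."""
    components = [random_component() for _ in range(4)]
    expression = components[0]
    for op, comp in zip(random.sample(OPERATORS, 3), components[1:]):
        expression = f"{expression} {op} {comp}"
    kind = random.choice(["derivative", "integral"])
    return f"Find the {kind} of {expression} with respect to x."

if __name__ == "__main__":
    n = int(input("How many questions? "))
    for _ in range(n):
        print(random_question())
```

The comparison step can be sketched similarly: read the accepted answers (from the advanced LLM) and the candidate answers (from the less advanced LLM) out of the stored CSV file and compute an accuracy score. The file name, column headers, and the use of exact string matching below are assumptions, not the paper's stated method.

```python
import csv
from sklearn.metrics import accuracy_score

# Read accepted answers (advanced LLM) and candidate answers (less advanced LLM).
accepted, candidate = [], []
with open("answers.csv", newline="") as f:
    for row in csv.DictReader(f):
        accepted.append(row["accepted_answer"].strip())
        candidate.append(row["model_answer"].strip())

# Exact-match accuracy; one 25-question trial in the abstract scored 0.64.
print("accuracy:", accuracy_score(accepted, candidate))
```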

Published

2024-10-13

Section

College of Engineering and Computing: Department of Computer Science