Generating Extended Regular Expressions: A LLM-Based Approach and a Systematic Synthesis Guide

Authors

  • Daniel Feng Department of Computer Science, George Mason University, Fairfax, VA
  • AJ McDaniel Department of Computer Science, George Mason University, Fairfax, VA
  • Ying Zhou Department of Computer Science, George Mason University, Fairfax, VA
  • August Shi Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX
  • Wing Lam Department of Computer Science, George Mason University, Fairfax, VA

DOI:

https://doi.org/10.13021/jssr2025.5220

Abstract

Runtime-verification (RV) tools built off of the Monitoring Oriented Programming framework (MOP), specifically TraceMOP and JavaMOP, verify that traces from executing a piece of software satisfy formalised runtime properties (e.g., “It is unsafe for two threads to use the same Appendable object”). These runtime properties are written in MOP files and typically have five components: a natural language (NL) specification that describes the property, parameters (e.g., Appendable a), events (e.g., safe_append, unsafe_append), handlers (e.g. @fail, @match), and a logic property (e.g., ere : safe_append*). MOP files require logic properties to be written in a supported formal language, one of which is extended regular expression (ERE). ERE is a popular choice, as it is built on the widely used regular expression (RE), but because of additional constructs (e.g., negation, epsilon/empty events), EREs can be more complex and more time-consuming to synthesize than REs. First, we attempt to use large language models (LLMs), specifically GPT-4o, to automate the creation of EREs. We repeatedly refined six LLM prompts, performing preliminary testing on all six and then extensively testing the best performing prompt. Next, we propose a systematic guide developers can follow to produce EREs in a setting that has the NL specification, parameters, events, and handlers defined. This guide transforms parts of the inherently creative procedure of ERE generation into an iterative verification task, and preliminary testing of the guide has shown promising results. Based on the guide, we plan to create an improved system for prompting LLMs. Then, we plan to create a tool that compares semantic equivalence of EREs for the purpose of automating validation. These developments are the first steps in formulating a system that autonomously generates the parameters, events, the logic property, and handlers solely from the NL specification.

Published

2025-09-25

Issue

Section

College of Engineering and Computing: Department of Computer Science