Recurring Pattern Discovery on Time Series via Large Language Models

Authors

  • Amba Narayanan, Department of Computer Science, George Mason University, Fairfax, VA
  • Thirumalai Vinjamoor Akhil Srinivas, Department of Computer Science, George Mason University, Fairfax, VA
  • Jessica Lin, Department of Computer Science, George Mason University, Fairfax, VA

Abstract

Time series motif discovery aims to uncover recurring patterns, or motifs, within large-scale time series data. These motifs have been shown to be valuable for a variety of downstream tasks, including forecasting, anomaly detection, and classification. Prior studies have demonstrated that inferring Context-Free Grammars (CFGs) from discretized time series is an effective approach for discovering such patterns. CFGs capture structured patterns by defining rules that generate valid subsequences, similar to how Artificial Intelligence (AI) systems interpret human language. Traditional CFG inference algorithms such as Sequitur adopt a bottom-up, greedy approach that commits to local repetitions early without considering larger, globally optimal patterns. To address this limitation, we propose a top-down approach inspired by human pattern recognition. Specifically, leveraging the Llama-3.1-8B-Instruct Large Language Model (LLM), we prioritize the discovery of high-level structures before decomposing them into finer, local patterns. We develop a system that instructs the LLM to produce CFGs for a set of 1000 complex strings, each 900 characters long. The CFGs generated by the LLM, and the repeated patterns they describe, are evaluated against those from Sequitur using annotated ground truth. Preliminary experiments suggest that the LLM-generated CFGs are more compact than those produced by Sequitur. We anticipate that this approach will have a significant impact on downstream applications, including motif discovery and sequence compression, and ultimately enable more AI-driven analysis of time series data.
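
To make the bottom-up, greedy behavior described above concrete, the following is a minimal Python sketch of grammar induction by repeated digram replacement on a discretized string. It is a simplified, offline Re-Pair-style procedure, not Sequitur itself and not the system evaluated in this work; the function names (most_frequent_digram, greedy_grammar, expand) and the rule labels R1, R2, ... are hypothetical and chosen only for illustration.

```python
from collections import Counter


def most_frequent_digram(seq):
    """Return the most frequent adjacent symbol pair and its count.

    Counting is overlap-naive: in "aaa" the pair ("a", "a") is counted twice.
    """
    counts = Counter(zip(seq, seq[1:]))
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


def greedy_grammar(text):
    """Induce a grammar bottom-up by repeatedly replacing the top digram.

    Returns (start_sequence, rules), where rules maps a rule label to the
    pair of symbols it expands to. Terminals are assumed to be single
    characters, so they cannot collide with labels such as "R1".
    """
    seq = list(text)
    rules = {}
    while True:
        pair, count = most_frequent_digram(seq)
        if count < 2:
            break  # no adjacent pair repeats, so nothing left to compress
        name = f"R{len(rules) + 1}"
        rules[name] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the terminal string it derives."""
    if symbol not in rules:
        return symbol
    return "".join(expand(s, rules) for s in rules[symbol])


if __name__ == "__main__":
    start, rules = greedy_grammar("abcabcabcxyzabc")
    print("start:", start)
    for name, body in rules.items():
        print(name, "->", body, "derives", expand(name, rules))
```

Because each step commits to the locally most frequent pair, longer repeats such as "abc" are only assembled indirectly from smaller rules, which illustrates the kind of early local commitment that a top-down approach aims to avoid.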

Published

2025-09-25

Issue

Section

College of Engineering and Computing: Department of Computer Science