Leveraging the Table of Contents to Improve Efficiency in a RAG-Powered Chatbot

Authors

  • Harshit Kondle, Bridgewater-Raritan High School, Bridgewater, NJ
  • Ruhaan Singh, Thomas Jefferson High School for Science and Technology, Alexandria, VA
  • Sai Satheshkumar, Emerson High School, McKinney, TX
  • Saketh Nandam, Rock Ridge High School, Ashburn, VA
  • Yuvaan Chandra, Chantilly High School, Greenbriar, VA
  • Anvit Koppella, Virginia Polytechnic Institute and State University, Blacksburg, VA
  • Lieutenant Colonel John McKee, Air Force CyberWorx, Air Force Academy, CO
  • Mihai Boicu, Department of Information Sciences and Technology, George Mason University, Fairfax, VA
  • Kamaljeet Sanghera, Department of Information Sciences and Technology, George Mason University, Fairfax, VA

DOI:

https://doi.org/10.13021/jssr2025.5163

Abstract

Demand for automated document processing has grown across many sectors, drawing attention to Retrieval-Augmented Generation (RAG) systems, which combine large language models with document search to provide answers exclusively from a specific knowledge base. However, RAG over large documents can be inefficient, suffering from high latency and heavy memory usage. This research uses the Table of Contents (TOC) to narrow the search space. The techniques examined are Keyword-First Pre-filtering and TOC Query Routing. Keyword-First Pre-filtering uses query-TOC keyword matching to cut out irrelevant parts of the document, dramatically reducing unnecessary computation; if no keywords match exactly, the technique falls back on semantic matching. In contrast, TOC Query Routing processes the whole document but uses the TOC to dynamically guide focus to certain sections. To test impact, 27 natural-language queries about the DAFMAN 36-2664 policy document, written by Lt. Col. John McKee, were fed into each model, and accuracy, average latency, and average memory usage were recorded. The reported figures come from a comparison with a near-identical model that used brute-force RAG. The base model used Apache Tika for text extraction, then parsed and chunked the extracted text; the chunks were embedded with the all-MiniLM-L6-v2 sentence-transformer model and stored in ChromaDB, and Meta's Llama 3 8B model generated context-aware answers from the retrieved context. Keyword-First Pre-filtering reduced latency by 54% and memory usage by 87% while maintaining answer accuracy, and TOC Query Routing decreased latency by 32% while also maintaining accuracy. These findings suggest that TOC-driven strategies can significantly improve the efficiency of RAG systems without compromising accuracy, with Keyword-First Pre-filtering being especially promising, making them well suited to environments such as the Department of the Air Force, where speed and resource constraints are critical. Future work could explore how to limit the search space effectively for queries that draw on chunks from disparate areas of a document.
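To make the Keyword-First Pre-filtering idea concrete, the following is a minimal sketch, not the authors' implementation: the TOC structure, section titles, chunk IDs, and the `threshold` value are illustrative assumptions. It uses the same all-MiniLM-L6-v2 model named in the abstract for the semantic fallback.

```python
from sentence_transformers import SentenceTransformer, util

# Hypothetical TOC mapping each section title to the IDs of the chunks it
# covers; a real implementation would parse this out of the document.
toc = {
    "Promotion Eligibility": ["chunk-0", "chunk-1", "chunk-2"],
    "Promotion Testing": ["chunk-3", "chunk-4"],
    "Demotion Procedures": ["chunk-5", "chunk-6"],
}

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def prefilter_sections(query: str, toc: dict, threshold: float = 0.4) -> list:
    """Return chunk IDs from TOC sections that match the query.

    Step 1: keep sections whose titles share a keyword with the query.
    Step 2: if no keywords match, fall back on semantic similarity
            between the query and the TOC titles.
    """
    query_terms = set(query.lower().split())
    hits = [ids for title, ids in toc.items()
            if query_terms & set(title.lower().split())]
    if not hits:
        # Semantic fallback: embed the query and every TOC title, keep
        # sections whose cosine similarity clears the threshold.
        titles = list(toc)
        sims = util.cos_sim(
            embedder.encode(query, convert_to_tensor=True),
            embedder.encode(titles, convert_to_tensor=True),
        )[0]
        hits = [toc[titles[i]] for i in range(len(titles)) if sims[i] >= threshold]
    # Flatten to a single list of candidate chunk IDs.
    return [cid for ids in hits for cid in ids]

# Example: only the matching sections' chunks reach the vector search.
print(prefilter_sections("Who is eligible for promotion testing?", toc))
```

Because only the surviving chunk IDs are loaded and searched downstream, most of the document never enters the retrieval step, which is consistent with the large latency and memory reductions the abstract reports.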
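TOC Query Routing differs in that the whole document stays indexed. The sketch below is one plausible reading of the abstract, assuming chunks were stored in ChromaDB with a `section` metadata field at index time; the collection name, metadata key, and `top_sections` parameter are assumptions for illustration.

```python
import chromadb
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.Client()
# Assumes chunks were added earlier with metadatas=[{"section": <TOC title>}],
# so the full document remains indexed and searchable.
collection = client.get_or_create_collection("dafman_chunks")

def route_query(query: str, toc_titles: list, top_sections: int = 2,
                n_results: int = 5):
    """Retrieve chunks, but only from the TOC sections closest to the query."""
    # Rank every TOC title against the query with the same embedding model.
    sims = util.cos_sim(
        embedder.encode(query, convert_to_tensor=True),
        embedder.encode(toc_titles, convert_to_tensor=True),
    )[0]
    order = sims.argsort(descending=True)[:top_sections]
    best = [toc_titles[int(i)] for i in order]
    # The vector search still runs over the full collection, but the
    # metadata filter dynamically focuses it on the routed sections.
    return collection.query(
        query_texts=[query],
        n_results=n_results,
        where={"section": {"$in": best}},
    )
```

The contrast with pre-filtering is that routing pays the full indexing cost up front and saves time only at query time, which would explain why it improves latency (32%) without the large memory savings seen for Keyword-First Pre-filtering.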

Published

2025-09-25

Section

College of Engineering and Computing: Department of Information Sciences and Technology