Enhancing the Performance of Small Language Models in Programming Tasks Through Deep Research and Task Decomposition
Abstract
Recent advances in large language models (LLMs) have demonstrated powerful capabilities for code generation and reasoning, accelerating development on real-world engineering tasks. However, their high resource demands limit deployment and discourage providers from experimenting with strategies that reason more deeply and consume more tokens. Smaller language models (<33B parameters) present a promising alternative but typically underperform on complex programming tasks. This research addresses that performance gap with a system that enhances small language model capabilities through task decomposition and deep research. We developed a modular agent framework, built on Pydantic, that orchestrates multiple LLMs within an isolated container and decomposes programming tasks into discrete subtasks: chunking, planning, research, and implementation. Each subtask undergoes iterative refinement until it meets predefined quality criteria before its output is integrated into subsequent stages. The approach was evaluated on a subset of SWE-bench Lite, a software engineering benchmark consisting of 300 real-world, self-contained functional Python bug fixes derived from GitHub repository issues. The issues involve popular Python libraries such as Django and scikit-learn and require working effectively in large codebases. Preliminary evaluations showed that our full pipeline outperformed the baseline (without decomposition) in successful task completion rates: the first two Python tasks failed under the baseline with different errors but completed successfully with the full pipeline. These initial findings suggest that systematic task decomposition and deep research offer a viable pathway for improving small-LLM effectiveness in complex programming scenarios, potentially enabling more accessible and cost-effective deployment of AI-assisted software development tools. This initial work prioritized task completion rates and did not explicitly optimize execution time; future research will focus on reducing end-to-end execution time through optimization techniques such as adaptive decomposition and result caching.
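
To make the decomposition pipeline concrete, the minimal sketch below shows one way the stage loop could be orchestrated with Pydantic models. The four stage names come from the abstract; the SubtaskResult schema, the run_stage placeholder, the quality threshold, and the iteration budget are illustrative assumptions rather than the system's published implementation.

from pydantic import BaseModel, Field

# Hypothetical structured output for each pipeline stage; the paper's
# actual schemas are not published, so these fields are illustrative.
class SubtaskResult(BaseModel):
    stage: str          # "chunking" | "planning" | "research" | "implementation"
    content: str        # model output for this stage
    quality_score: float = Field(ge=0.0, le=1.0)  # self-assessed quality

QUALITY_THRESHOLD = 0.8  # assumed acceptance criterion
MAX_ITERATIONS = 3       # assumed refinement budget

def run_stage(stage: str, context: str) -> SubtaskResult:
    # Placeholder for a call into a small LLM; a real system would send
    # a stage-specific prompt to a model client running in the container.
    return SubtaskResult(stage=stage, content=f"<{stage} output>", quality_score=1.0)

def refine_until_acceptable(stage: str, context: str) -> SubtaskResult:
    # Re-run a stage until it meets the quality criterion, mirroring the
    # refine-before-integrate loop described in the abstract.
    result = run_stage(stage, context)
    for _ in range(MAX_ITERATIONS - 1):
        if result.quality_score >= QUALITY_THRESHOLD:
            break
        result = run_stage(stage, context + "\n\nPrevious attempt:\n" + result.content)
    return result

def decompose_and_solve(issue: str) -> str:
    # Each stage's accepted output becomes context for the next stage.
    context = issue
    for stage in ("chunking", "planning", "research", "implementation"):
        result = refine_until_acceptable(stage, context)
        context += f"\n\n[{stage}]\n{result.content}"
    return context

Chaining accepted stage outputs into a growing context is one plausible reading of "integration into subsequent stages"; an alternative design would pass only the most recent stage's output forward to keep token usage low.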
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.