Chain-of-thought and Tree-of-thought Analysis

Authors

  • Neal Khalkho, Rock Ridge High School, Ashburn, VA
  • Amogh Katiki, Thomas Jefferson High School for Science and Technology, Fairfax, VA
  • Vaeshnavi Reddy Alla, Data Analytics Engineering Alumni, George Mason University, Fairfax, VA
  • Mihai Boicu, Department of Information Sciences and Technology, George Mason University, Fairfax, VA

Abstract

Large Language Models (LLMs) such as ChatGPT and Claude are increasingly used to support student learning. However, the most effective prompting strategies for beginner programmers remain unclear. Two methods, Chain-of-Thought (CoT) and Tree-of-Thought (ToT), structure reasoning in different ways: CoT provides a step-by-step explanation, while ToT explores multiple reasoning paths. Although both methods have shown promise for improving AI performance, little to no research has examined their comparative impact on human learning. This study evaluated how CoT and ToT prompts influenced beginners' understanding of sorting algorithms in JavaScript, including Bubble Sort, Selection Sort, Merge Sort, Insertion Sort, and Quick Sort. Lessons were generated with GPT-4o, selected for its top benchmark performance in coding and reasoning, and their effectiveness was assessed with a five-category rubric: clarity, technical accuracy, inclusion of warnings, retention, and understanding. Three students with little to no prior JavaScript experience participated in the experiment. Summed across the rubric, CoT scored a total of 66 points and ToT scored 55 points. Participants reported that CoT responses were easier to follow, boosted their confidence, and improved their ability to retain and apply what they learned. In contrast, although ToT encouraged broader thinking, it often introduced a cognitive load that hindered comprehension for beginners. These results indicate that CoT prompting outperformed ToT prompting and is more effective for teaching algorithmic fundamentals to beginners. Limitations of this research include the small sample size and limited content scope. Future work could teach a larger group of students with the two reasoning strategies and use tests and examinations to measure the results. Programming topics beyond sorting algorithms, as well as other reasoning strategies, could also be explored.

Published

2025-09-25

Section

College of Engineering and Computing: Department of Information Sciences and Technology