Exploring the creative output of six LLMs measured by Torrance’s 4 components of creative thinking based on holistic and analytic cognitive styles

Authors

  • Andy Wang Kinder High School for the Performing and Visual Arts, Houston, TX
  • Charvi Kanna Montville Township High School, Montville, NJ
  • Herui Ray Li South Forsyth High School, Cumming, GA
  • Mihai Boicu Department of Information Sciences and Technology, George Mason University, Fairfax, VA

DOI:

https://doi.org/10.13021/jssr2025.5159

Abstract

As Large Language Models (LLMs) become increasingly integrated into education, understanding their impact on creativity skills is essential. Existing research has explored the use of LLMs in creative education and how people with different cognitive styles judge and process AI-generated responses. Yet research into personalizing LLM responses to cognitive styles during creative tasks is minimal. This study explores how well LLMs support creative education, aiming to identify the LLM best suited to producing creative output that adapts effectively to the holistic-analytic dimension of cognitive styles. Three student researchers independently prompted six high-traffic, general-use LLMs using control, holistic, and analytic approaches to generate multiple solutions to a specific problem. Solutions were scored on a standardized 1-to-5 rubric measuring relevance, reasonability, creativity (assessed with Torrance’s four components: fluency, flexibility, originality, and elaboration), and adaptation to cognitive styles (styling). All LLMs achieved nearly perfect scores for relevance and reasonability (between 4.91 and 5). Within the four components of creativity, Gemini achieved the highest mean score in elaboration (4.97), Grok in flexibility (4.27), ChatGPT in fluency (3.05), and DeepSeek in originality (2.96). On the overall creativity score, Copilot (2.47) and Claude (2.62) performed worse than the other four LLMs, while Grok (3.28) scored highest, with a statistically significant difference from every other LLM (p < 0.05). In addition, Grok significantly outperformed all other LLMs in styling (p < 0.05) except Gemini (p = 0.064). No other LLM showed a statistically significant advantage. Future work will expand the study by prompting these six LLMs with additional specific problems, to better identify a single LLM for teaching creativity based on participants' cognitive styles.
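The abstract reports per-LLM mean rubric scores and pairwise p-values but does not name the aggregation formula or the statistical test. The Python sketch below shows one way such an analysis could be organized. The record layout, the treatment of the overall creativity score as the unweighted mean of fluency, flexibility, originality, and elaboration, the helper names, and the choice of a two-sided permutation test are all assumptions for illustration, not the authors' documented method.

```python
"""Hypothetical analysis sketch for rubric scores like those in this study.

Assumptions (not taken from the study): the record layout, treating the
overall creativity score as the unweighted mean of Torrance's four
components, and using a two-sided permutation test for pairwise comparisons.
"""
import random
from statistics import mean

# Torrance's four components of creative thinking, each scored 1 to 5.
TORRANCE_COMPONENTS = ("fluency", "flexibility", "originality", "elaboration")


def overall_creativity(scores: dict) -> float:
    """Assumed composite: unweighted mean of the four Torrance component scores."""
    return mean(scores[c] for c in TORRANCE_COMPONENTS)


def mean_by_llm(records: list, metric: str) -> dict:
    """Average one rubric metric (e.g. "styling") over every scored solution per LLM."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["llm"], []).append(record[metric])
    return {llm: mean(values) for llm, values in grouped.items()}


def permutation_test(a: list, b: list, n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in mean scores; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign all scores to the two groups
        diff = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Toy scores for two hypothetical LLMs; illustrative only, not study data.
    toy = [
        {"llm": "LLM-A", "fluency": 3, "flexibility": 4, "originality": 3, "elaboration": 5},
        {"llm": "LLM-A", "fluency": 4, "flexibility": 4, "originality": 3, "elaboration": 5},
        {"llm": "LLM-B", "fluency": 2, "flexibility": 3, "originality": 2, "elaboration": 4},
        {"llm": "LLM-B", "fluency": 2, "flexibility": 2, "originality": 3, "elaboration": 4},
    ]
    for record in toy:
        record["creativity"] = overall_creativity(record)
    print(mean_by_llm(toy, "creativity"))
    a = [r["creativity"] for r in toy if r["llm"] == "LLM-A"]
    b = [r["creativity"] for r in toy if r["llm"] == "LLM-B"]
    print(permutation_test(a, b))
```

A permutation test is used here because it makes no normality assumption about bounded 1-to-5 scores. Note that with only two solutions per model, as in the toy data, it cannot reach p < 0.05: only six distinct group assignments exist, so the smallest attainable two-sided p-value is about 1/3.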

Published

2025-09-25

Section

College of Engineering and Computing: Department of Information Sciences and Technology