An Analysis of Generative AI Agents in Automatic Task Completion in Interactive Web Environments

Authors

  • Sarah Li, Department of Computer Science, George Mason University, Fairfax, VA
  • Ziyu Yao, Department of Computer Science, George Mason University, Fairfax, VA

Abstract

Autonomous AI agents are software programs that can independently complete multi-step tasks given a prompt (e.g.,
“Tell me the full address of the nearest airport to George Mason University”). In the near future, they are expected to
complete mundane daily tasks that do not require human input. Autonomous agents have become more sophisticated in
recent years owing to advancements in machine learning and natural language processing. However, these agents are
still susceptible to mistakes in comprehension, reasoning, and action execution. WebArena is a standalone web
environment that can host autonomous agents. In this project, we analyzed WebArena and its accompanying
agent using the procedural guidelines and task prompts provided by the WebArena team. We ran WebArena using 100
unique prompts ranging from shopping and store management to mapping and navigation. We categorized the errors and
identified which occurred most frequently. The autonomous agent successfully completed the task in 23% of
runs. Many runs suffered from an incorrect, missing, or unnecessary action. Common reasons for errors included a lack
of world knowledge or human-like visualization, ignoring part of a prompt, and hallucinations. In the future, training
should ensure that autonomous agents have sufficient world knowledge and experience to navigate the web
successfully. Developers can also add weights to prompts so that agents focus more on the relevant criteria, and they
can introduce memory so that agents learn from past actions. Hopefully, with more time and research, autonomous
agents will soon improve the lives of many.

Published

2024-10-13

Section

College of Engineering and Computing: Department of Computer Science