Optimal Computing Budget Allocation for Monte Carlo Tree Search has been demonstrated to be a better tree policy than Upper Confidence bounds applied to Trees in the game of Othello

DANIEL QIU; Jie Xu

doi:10.13021/jssr2023.3996

Authors

DANIEL QIU Department of Systems Engineering and Operations Research, George Mason University, Fairfax, VA
Jie Xu Department of Systems Engineering and Operations Research, George Mason University, Fairfax, VA

DOI:

https://doi.org/10.13021/jssr2023.3996

Abstract

Through this internship, we aim to try to find better methods for selecting the best action from a set of choices using tree search. We chose to use Monte Carlo Tree Search with Optimal Computing Budget Allocation (MCTS-OCBA) tree policy as a way to find the best choice from a given set of moves from the board game Othello. Monte Carlo Tree search uses many simulations from a certain board state to generate a search tree with, and eventually choose the best move. A tree policy is the way one would choose the move to assign the next simulation given already collected information. Previously, the most widely accepted tree policy for MCTS was Upper Confidence bounds applied to Trees (MCTS-UCT). A newer algorithm, MCTS-OCBA, has recently been demonstrated to be more accurate than MCTS-UCT in solved board games, such as Tic-Tac-Toe, but has not been tested against MCTS-UCT in unsolved board games with larger board states such as Othello. We try to demonstrate that OCBA is a better tree policy than UCT for MCTS by creating an Othello game engine. From our results, MCTS-OCBA performed much better than MCTS-UCT when the number of nodes added to a tree is small, but asymptotically approaches MCTS-UCT’s success in Othello. We tried adding a heuristic, and experiment results suggested that adding a heuristic further improves MCTS-OCBA’s success over MCTS-UCT.