
Measuring Game Temperature With UCT-Monte Carlo


If you have a question about this talk, please contact Carl Scheffler.

Exhaustive search in reasonably complex game trees, such as those of the board game Go, is extremely expensive. A particular Monte Carlo policy called upper confidence bounds applied to trees (UCT) has emerged over the last three years as a very promising approach to such reinforcement learning problems, but has recently run into scaling problems when attempting to beat humans on large Go boards. It is a largely heuristic approach to games.
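For readers unfamiliar with UCT, the following is a minimal sketch of the UCB1 selection rule that UCT applies at each node of the search tree. It is not taken from the talk; the function name `ucb1_select`, the `(visits, total_reward)` representation, and the default exploration constant are assumptions chosen for illustration, and rewards are assumed to lie in [0, 1].

```python
import math


def ucb1_select(children, exploration=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (visits, total_reward) pairs (a hypothetical
    representation used only for this sketch).
    """
    total_visits = sum(visits for visits, _ in children)
    best_index, best_score = None, -math.inf
    for index, (visits, total_reward) in enumerate(children):
        if visits == 0:
            # Try every move once before comparing scores.
            return index
        mean = total_reward / visits
        # Exploration bonus: large for rarely visited children.
        bonus = exploration * math.sqrt(math.log(total_visits) / visits)
        score = mean + bonus
        if score > best_score:
            best_index, best_score = index, score
    return best_index
```

The bonus term shrinks as a child is visited more often, so the search gradually concentrates on moves with a high average reward while still occasionally revisiting neglected ones.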

Combinatorial game theory, on the other hand, is a branch of pure mathematics that provides a sturdy framework for full-information games that can be subdivided into independent subgames. It uses an abstract concept called “temperature”, roughly a measure of how urgent it is to move in a given subgame, to develop approximate strategies whose error relative to the perfect line of play is bounded. Unfortunately, it is extremely tedious to determine the temperature of games like Go using traditional exhaustive search.
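As a small illustration of temperature, consider the simplest hot game, a switch {a | b} with a ≥ b, where Left can move to the number a and Right can move to the number b. Its mean value is (a + b)/2 and its temperature is (a − b)/2. The sketch below computes both; the function name is hypothetical, and it covers only this closed-form case, not the general games the talk is concerned with.

```python
from fractions import Fraction


def switch_mean_and_temperature(left, right):
    """Mean and temperature of the switch game {left | right}.

    Assumes both options are plain numbers with left >= right.
    """
    if left < right:
        raise ValueError("a switch requires left >= right")
    left, right = Fraction(left), Fraction(right)
    mean = (left + right) / 2
    # Half the gap between the options: what a player gains, relative
    # to the mean, by moving first.
    temperature = (left - right) / 2
    return mean, temperature
```

For example, `switch_mean_and_temperature(4, -2)` gives a mean of 1 and a temperature of 3: whoever moves first shifts the outcome by 3 points away from the mean in their own favour.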

In this talk I will present preliminary results from an attempt to combine the two worlds of Monte Carlo planning and combinatorial game theory, producing a UCT algorithm that measures temperature while simultaneously searching for good moves in small (sub)games. There is a faint hope that this could lead to “divide-and-conquer” solutions for search in general AND/OR trees with bounded rewards.

This talk covers work in progress and is part of the preparation for my first-year report.

This talk is part of the Machine Learning Journal Club series.



© 2006-2023, University of Cambridge.