Learning Conditional Random Fields with Hierarchical Features: Application to the Game of Go
If you have a question about this talk, please contact Oliver Williams.
We examine an important subtask of policy learning in the game of Go: approximating the value function given a fixed policy. We model the value function as the expected territory outcome of a Go board configuration and learn to predict this outcome using a conditional random field (CRF). The task is made difficult by the scale of inference on a Go board (361 individual point territories to predict, each influenced by the surrounding positions) and by the use of roughly 4 million pattern-based features. This scale gives rise to many computational and statistical problems that must be addressed during both training and inference.
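As a rough sketch of the kind of model described (the abstract does not specify the potentials or features, so the symbols below are purely illustrative), the territory outcome t = (t_1, ..., t_361) over the board points can be modelled by a CRF conditioned on the board configuration b, with the value function taken as the expected total territory under that distribution:

p(\mathbf{t} \mid b; \mathbf{w}) = \frac{1}{Z(b)} \exp\Big( \sum_{i} \mathbf{w}^{\top} \boldsymbol{\phi}_i(t_i, b) + \sum_{(i,j)} \psi_{ij}(t_i, t_j, b) \Big), \qquad V(b) = \mathbb{E}_{p(\mathbf{t} \mid b)}\Big[ \sum_{i} t_i \Big],

where Z(b) is the partition function, \boldsymbol{\phi}_i are (pattern-based) local feature vectors, and \psi_{ij} are coupling terms between neighbouring points.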
In this work we examine a variety of models (Independent vs. Coupled, Flat vs. Hierarchical), learning algorithms (Local Training vs. Maximum Likelihood vs. Maximum Pseudo-Likelihood), and inference approaches (Loopy Belief Propagation vs. Sampling, Bayesian Model Averaging vs. Heuristic Model Selection). We present results from learning to predict territory in expert games and conclude with a prescription for future work on approximating the value function in Go.
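For reference (again only a hedged sketch, since the abstract does not spell out the objectives), maximum likelihood and maximum pseudo-likelihood training correspond to the following criteria for the parameters w on a training position (b, t):

\mathcal{L}_{\mathrm{ML}}(\mathbf{w}) = \log p(\mathbf{t} \mid b; \mathbf{w}), \qquad \mathcal{L}_{\mathrm{PL}}(\mathbf{w}) = \sum_{i} \log p\big(t_i \mid \mathbf{t}_{\mathcal{N}(i)}, b; \mathbf{w}\big),

where \mathcal{N}(i) denotes the neighbours of point i. The pseudo-likelihood avoids computing the global partition function Z(b), which is intractable for a coupled model over 361 points and motivates approximate inference such as loopy belief propagation or sampling.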
This is joint work with Thore Graepel and Ralf Herbrich with contributions by Tom Minka.
This talk is part of the Microsoft Research Machine Learning and Perception Seminars series.