An Application of Monte-Carlo Tree Search Algorithm for Shogi Player Based on Bayesian Approach

Daisaku Yokoyama 1,a)

Abstract: The Monte-Carlo Tree Search (MCTS) algorithm is quite effective for playing Go; however, it has weaknesses in tactical games such as Shogi. We propose a new MCTS method that uses a Bayesian approach to propagate distributions of leaf values, and apply it to a Shogi player. Through a large number of self-play evaluations we conclude that the method is highly effective. The evaluation also reveals several characteristics of the proposed method: the simulation search should maintain a certain minimum size, increasing the number of simulations is not effective, and so on.

1. Introduction

Monte-Carlo Tree Search has brought rapid progress in computer Go [1], and Monte-Carlo methods have also been applied to Shogi [2], [3].

1 Institute of Industrial Science, The University of Tokyo
a) yokoyama@tkl.iis.u-tokyo.ac.jp
2. Related Work

2.1 Monte-Carlo Tree Search
Conventional game programs are built on min-max search with a static evaluation function. Monte-Carlo Tree Search [6] instead evaluates positions by playouts; combined with the UCB (Upper Confidence Bound) selection rule it becomes the UCT algorithm [7], which has been highly successful in Go [1]. MCTS has also been applied to Amazons [8] and Lines of Action (LOA) [4]. Winands and Björnsson [4] combine a static evaluation function with Monte-Carlo search for LOA, reporting a win rate of 46% against an LOA program, and adopt min-max techniques such as futility-pruning in the playouts. The Monte-Carlo Tree Search Solver [9] extends MCTS to propagate exact min-max (win/loss) values.

2.2 Monte-Carlo Methods for Shogi
Monte-Carlo approaches to Shogi have also been reported [2], [3], [10].
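As a concrete illustration of the UCB/UCT selection rule discussed above (a minimal sketch, not code from the paper; the Node fields and the constant c are assumptions), child selection can be written as:

```python
import math

class Node:
    """Minimal illustrative MCTS node (fields are assumptions, not the paper's)."""
    def __init__(self):
        self.visits = 0      # number of playouts through this node
        self.wins = 0.0      # accumulated reward (e.g. wins)
        self.children = []

def ucb1(child, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average reward plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")  # unvisited children are tried first
    mean = child.wins / child.visits
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return mean + bonus

def select_child(parent):
    """UCT descends by repeatedly picking the child with the highest UCB1 score."""
    return max(parent.children, key=lambda ch: ucb1(ch, parent.visits))
```

UCT applies this rule at every interior node on the way down, runs a playout from the reached leaf, and backs the result up along the path.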
2.3 Randomized Evaluation Values
We also build on a search method that randomizes the static evaluation value [11].

2.4 Bayesian Approach
The Bayesian approach of [5] replaces the UCB selection rule of UCT with a quantity called QSS (Q Step Size).

3. Proposed Method

Our method is based on the Bayesian approach of [5].

3.2 Bayesian Approach
Bayesian treatments of game-tree search go back to [12], [13], and the idea has also been applied to MCTS [14]. In our method a leaf evaluation value v1 is treated not as a point estimate but as a distribution spread over v1 ± δ, and two moves are compared by the probability that one value exceeds the other (v1 > v2).

3.3 Search Parameters
The main parameters are the number of simulations (Simnum), the simulation size (Simdepth), and the search depth Playdepth.
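The comparison of two moves by P(v1 > v2), with each leaf value spread over v ± δ, can be sketched as below; treating the spread as a uniform distribution, and all names here, are illustrative assumptions rather than the paper's definitions:

```python
import random

def prob_first_better(v1, v2, delta, n=100_000, seed=0):
    """Monte-Carlo estimate of P(X1 > X2), where Xi is the evaluation of
    move i spread uniformly over [vi - delta, vi + delta]."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        x1 = rng.uniform(v1 - delta, v1 + delta)
        x2 = rng.uniform(v2 - delta, v2 + delta)
        if x1 > x2:
            wins += 1
    return wins / n
```

When the two intervals do not overlap, the probability is exactly 0 or 1; when they overlap, it measures how confidently one move can be preferred, which is the kind of quantity a distribution-propagating search can act on.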
[Search loop]
    while the root is not resolved:
        refine_p = find_refine()
        for p in [refine_p .. root]:
            update the value distribution of p (and U_all, QSS, ESS) from its children

[Leaf selection]
    function find_refine():
        if U_all is small enough (below Uall_th):
            return PV                    # uncertainty is low: deepen the principal variation
        sort the uncertain leaves by ESS and keep the top 1/10
        for p in [those leaves]:
            if depth(p) + Simdepth < Playdepth:
                return p                 # a simulation of size Simdepth still fits within Playdepth
        return PV                        # every candidate leaf is too deep

[Expansion]
The number of child moves considered at each node is limited to max(12 − depth × 2, 3).

[PV extension]
The principal variation (PV) from the root is refined until depth(PV) + Simdepth >= Playdepth + PVth; that is, the PV is searched PVth plies deeper than the other branches. Unlike UCT, which selects leaves by UCB, the Bayesian approach selects them by QSS (Q Step Size), and the overall uncertainty of the root distribution (U_all) controls when to stop widening the search and extend the PV instead.

3.4 Leaf Evaluation
Each leaf is evaluated by a min-max simulation of size Simdepth in which the static evaluation value v is randomized over v ± δ.
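A simplified, runnable rendering of the leaf-selection step above (a sketch under assumptions: the candidate list, the shallowest-first ordering in place of the ESS ordering, and returning None for "extend the PV" are all illustrative choices, not the paper's implementation):

```python
def find_refine(candidate_leaves, u_all, uall_th, playdepth, simdepth):
    """Choose the next leaf to refine.

    candidate_leaves: list of (depth, leaf) pairs for uncertain leaves.
    Returns a leaf to simulate, or None meaning "extend the PV instead".
    """
    if u_all < uall_th:
        return None  # overall uncertainty already low: deepen the PV
    # The paper orders candidates by ESS and keeps the top 1/10;
    # here we simply take them shallowest-first.
    for depth, leaf in sorted(candidate_leaves):
        if depth + simdepth < playdepth:
            return leaf  # a simulation of size simdepth still fits in playdepth
    return None  # every candidate is too deep: fall back to the PV
```

The key design point the sketch preserves is the two-way decision: refine an uncertain non-PV leaf while the depth budget allows it, otherwise spend the effort deepening the PV.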
[Fig. 4: win rate vs. σ, the standard deviation of the randomized evaluation value, for Simnum = 3 and 5]
[Fig. 5: win rate vs. δ, for Simnum = 1, 3, and 5]

4. Evaluation

4.1 Setup
Each parameter setting is evaluated through a large number of self-play games*1 with Playdepth = 12; win rates are reported with 95% confidence intervals.

*1 http://www.logos.t.u-tokyo.ac.jp/gekisashi/

4.2 Standard Deviation σ of the Randomized Evaluation
With Simdepth = 8 and PVth = 4, Fig. 4 shows win rates for Simnum = 3 and 5 as σ is varied; σ = 200 performs best, and we use σ = 200 hereafter.

4.3 Leaf-Value Spread δ
With PVth = 4, Fig. 5 shows win rates for Simnum = 1, 3, and 5 as δ is varied over 25, 50, 100, 200, 300, and 500. The best setting depends on Simnum: around δ = 50 for Simnum = 3 and δ = 200 for Simnum = 1.
[Fig. 6: win rate vs. Simdepth (simulation size), for Simnum = 1, 3, 5 and for Simnum = 1 with δ = 500]
[Fig. 7: win rate vs. PVth (additional PV length), for Simdepth = 2 to 10 (σ = 200)]

4.4 Simulation Size Simdepth and PV Extension PVth
With δ = 100 and PVth = 4, Fig. 6 shows win rates as Simdepth is varied from 2 to 10 for Simnum = 1, 3, and 5, and for Simnum = 1 with δ = 500. Fig. 7 shows win rates as PVth is varied for each Simdepth; Simdepth = 6 to 8 performs well.

[Fig. 8: win rate vs. consumed time ratio, for Simdepth = 2 to 10]

Fig. 8 plots the same results against the consumed time ratio; taking cost into account, Simdepth = 8 gives a good trade-off at roughly 2.5 to 3 times the baseline time.

4.5 Number of Simulations Simnum
With Simdepth = 8, Fig. 9 shows win rates as Simnum is varied from 1 to 7 for several PVth values; increasing Simnum beyond 1 does not improve the results.
[Fig. 9: win rate vs. Simnum (number of simulations), for additional PV length 0, 4, 8, and 12]
[Fig. 10: results vs. Simnum with δ = 500, for PVth = 0 and 4]
[Fig. 11: win rate vs. consumed time ratio, for Simnum = 1, 3, 5, 7 and PVth = 0 to 12]

With δ = 500, each leaf value is spread widely over v ± δ; Fig. 10 shows that even in this setting Simnum = 1 is sufficient.

4.6 Cost Effectiveness
With δ = 100 and Simdepth = 8, Fig. 11 plots win rate against the consumed time ratio while varying Simnum from 1 to 7 and PVth from 0 to 12; small Simnum (1 to 3) combined with PVth of 0 to 8 is the most cost-effective region.

4.7 Comparison with the Baseline
We also compare the proposed player with the original Bayesian-approach player of [5].

4.8 Discussion
The results are discussed in relation to the randomized-evaluation search [11] and the Bayesian approach [5].
The Bayesian approach to game-tree search originates with Baum and Smith [12]; the observations on the leaf-value spread (Fig. 5, Fig. 10) are discussed from this viewpoint.

5. Conclusion
We proposed an MCTS method for Shogi that propagates distributions of leaf values by a Bayesian approach, and evaluated it through a large number of self-play games.

References
[1] Sylvain Gelly, Yizao Wang, Rémi Munos, and Olivier Teytaud. Modification of UCT with Patterns in Monte-Carlo Go. Technical Report RR-6062, INRIA, 2006.
[2] 11th Game Programming Workshop, 2006. (In Japanese.)
[3] 13th Game Programming Workshop, 2008. (In Japanese.)
[4] Mark H. M. Winands and Yngvi Björnsson. Evaluation Function Based Monte-Carlo LOA. In ACG, pp. 33–44, 2009.
[5] 17th Game Programming Workshop, pp. 76–83, 2012. (In Japanese.)
[6] Rémi Coulom. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In CG 2006, 2006.
[7] Levente Kocsis and Csaba Szepesvári. Bandit Based Monte-Carlo Planning. In Proceedings of the 17th European Conference on Machine Learning (ECML '06), pp. 282–293, 2006.
[8] Richard J. Lorentz. Amazons Discover Monte-Carlo. In CG 2008, pp. 13–24, 2008.
[9] Mark H. M. Winands, Yngvi Björnsson, and Jahn-Takeshi Saito. Monte-Carlo Tree Search Solver. In CG 2008, pp. 25–36, 2008.
[10] 15th Game Programming Workshop, pp. 86–89, 2010. (In Japanese.)
[11] IPSJ Journal, Vol. 52, No. 11, pp. 3030–3037, Nov. 2011. (In Japanese.)
[12] Eric B. Baum and Warren D. Smith. A Bayesian Approach to Relevance in Game Playing. Artificial Intelligence, Vol. 97, No. 1–2, pp. 195–242, 1997.
[13] A. Junghanns. Are There Practical Alternatives to Alpha-Beta in Computer Chess? ICGA Journal, Vol. 21, No. 1, pp. 14–32, 1998.
[14] Gerald Tesauro, V. T. Rajan, and Richard Segal. Bayesian Inference in Monte-Carlo Tree Search. In UAI, pp. 580–588, 2010.