An Application of Monte-Carlo Tree Search Algorithm for Shogi Player Based on Bayesian Approach

Daisaku Yokoyama 1,a)

Abstract: The Monte-Carlo Tree Search (MCTS) algorithm is quite effective for playing Go, but it has weaknesses in tactical games such as Shogi. We propose a new MCTS method that uses a Bayesian Approach to propagate distributions of leaf values, and apply it to a Shogi player. Through a large number of self-play evaluations, we conclude that the method is highly effective. The evaluations also reveal several characteristics of the proposed method: the simulation search must keep a certain size, increasing the number of simulations is not effective, and so on.

1 Institute of Industrial Science, The University of Tokyo
a) yokoyama@tkl.iis.u-tokyo.ac.jp

1.
1.1 2 2 ( ) 10 [1] [2] [3]

[4]
1.2 [5] Bayesian Approach

2.
2.1 min-max min-max [6] UCB (Upper Confidence Bound) UCT [7] [1] ( ) Amazons [8] Lines of Action (LOA) [4] Winands [4] 1 LOA 46% LOA futility-pruning ( ) min-max Monte-Carlo Tree Search Solver [9]
2.2 [2] [3] [10]
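Section 2.1 names UCB (Upper Confidence Bound) and UCT [7], but the surviving text does not give the selection formula. As a reminder, here is a minimal Python sketch of the standard UCB1 rule that UCT applies at each tree node. The function names, the (total_reward, visits) representation, and the exploration constant c = sqrt(2) are assumptions made for illustration, not details taken from this paper.

```python
import math


def ucb1(mean_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score of one child (illustrative helper, not from the paper).

    Unvisited children score +inf so that each child is tried once
    before any child is revisited.
    """
    if visits == 0:
        return math.inf
    # Exploitation term plus an exploration bonus that shrinks with visits.
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children):
    """Return the index of the child with the highest UCB1 score.

    children: list of (total_reward, visits) tuples (assumed layout).
    """
    parent_visits = sum(visits for _, visits in children)
    scores = [
        ucb1(total / visits if visits else 0.0, visits, parent_visits)
        for total, visits in children
    ]
    return max(range(len(children)), key=scores.__getitem__)
```

With two children visited equally often, `select_child` prefers the one with the higher mean reward; any unvisited child is chosen first.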

2.3 [11] () ( )
2.4 Bayesian Approach [5] Bayesian Approach UCB QSS Bayesian Approach QSS

3. [5] Bayesian Approach 2
3.1 1
3.2 Bayesian Approach Bayesian Approach [12] [13] [14] Bayesian Approach 1 1 (v_1) δ 2 v_1 ± δ (v_1 v_2) 1
3.3 (Simnum) 2 3 Playdepth Playdepth
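The Bayesian Approach of [12] propagates a probability distribution, rather than a single score, from the leaves toward the root. The Python sketch below shows one way to do this for discrete distributions: it computes the distribution of the maximum over the children of a node. The two-point leaf model built from v ± δ, the independence assumption between children, and all function names are hypothetical; the paper's own propagation rule cannot be recovered from this text.

```python
def leaf_distribution(v, delta):
    """Hypothetical leaf model: equal mass on v - delta and v + delta."""
    return {v - delta: 0.5, v + delta: 0.5}


def max_distribution(children):
    """Distribution of the maximum of independent discrete child values.

    Uses P(max <= x) = product over children of P(child <= x), then
    differences the resulting CDF to get point masses.
    """
    support = sorted({x for dist in children for x in dist})
    result = {}
    prev_cdf = 0.0
    for x in support:
        cdf = 1.0
        for dist in children:
            cdf *= sum(p for value, p in dist.items() if value <= x)
        if cdf > prev_cdf:
            result[x] = cdf - prev_cdf
        prev_cdf = cdf
    return result


def negate(dist):
    """Flip a distribution to the opponent's point of view (negamax style)."""
    return {-x: p for x, p in dist.items()}


def expected_value(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


root = max_distribution([leaf_distribution(100, 50), leaf_distribution(80, 50)])
# root == {50: 0.25, 130: 0.25, 150: 0.5}
```

In this example the expected value of the maximum is 120, higher than either child's mean of 100 or 80, which is the kind of information a single backed-up score would lose.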

[ ]
while root :
    refine p = find_refine()
    refine p or
    for p in [refine p root ]:
        p p
    root Uall QSS ESS root leaf

[ ]
function find_refine():
    if Uall (100 ) : return PV
    if Uall (Uall th) : return PV
    leaves leaves ESS leaves 1/10
    for p in [ leaves ]:
        if p + Simdepth < Playdepth: return p
    // leaves simulation Playdepth
    return PV

[ ] root max(12 depth 2, 3)
[ ]
- root ( )
- PV Simnum depth(PV) + Simdepth >= Playdepth + PVth

2 PV PVth PV Bayesian Approach QSS (Q Step Size) Bayesian Approach (Uall) PV Playdepth Uall PV 1/10 Playdepth QSS 3 PV

3.4 Bayesian Approach 1 1 v v ± δ (Simdepth) 1

[Figure 4. x-axis: sigma, standard deviation of randomized evaluation value (0 to 1600). Series: # of sim: 3, # of sim: 5.]
[Figure 5. x-axis: delta, 1st pin drift (0 to 550). Series: # of sim: 1, # of sim: 3, # of sim: 5.]
[11] (Simnum)

4.
4.1 *1 31 500 1 1000 PV 12 Playdepth 12 1000 500 1000 CPU 95%
4.2 Simdepth 8 PVth 4 σ 4 Simnum 3,5 100 σ 800 σ 1500 σ = 200
4.3 1 v δ ( 1) δ Simdepth 800 PVth 4 δ 5 Simnum 1,3,5 δ 25, 50, 100, 200, 300, 500 δ = 500 ±δ 5 Simnum δ Simnum 1 δ 50 Simnum 3 δ 200 Simnum 1

*1 http://www.logos.t.u-tokyo.ac.jp/ gekisashi/

[Figure 6. x-axis: Simdepth, simulation size (2 to 10). Series: # of sim: 1, # of sim: 3, # of sim: 5, # of sim: 1 with delta 500.]
[Figure 7. x-axis: PVth, additional PV length (0 to 16). Series: sim depth: 2, 4, 6, 8, 10. Caption fragment: PV (σ = 200).]

1 δ Simnum

4.4 δ 100 PVth 4 Simdepth 6 Simnum 1,3,5 Simnum 1 δ 500 # of sim: 1, delta 500 Simdepth 6 δ Simdepth 2 10 PVth 7 Simdepth 6 Simdepth 8 12 Simdepth 10 8

[Figure 8. x-axis: consumed time ratio (10 to 80). Series: sim depth: 2, 4, 6, 8, 10. Caption fragment: PV.]

Simdepth 10 8 PVth 2.5 3 Simdepth 8

4.5 Simdepth 8 Simnum 1 7 9 PVth Simnum Simnum Simnum Simnum = 1

[Figure 9. x-axis: Simnum, number of simulation (0 to 8). Series: additional PV length: 0, 4, 8, 12.]
[Figure 10. x-axis: Simnum, number of simulation (1 to 5.5). Series: PVth: 0, PVth: 4. Caption fragment: (δ = 500).]
[Figure 11. x-axis: consumed time ratio (10 to 120). Series: # of sim: 1, # of sim: 3, # of sim: 5, # of sim: 7. Annotations: PVth = 0, PVth = 8, PVth = 12.]

PVth Simnum Simnum δ 500 v ± δ 5 Simnum 10 1

4.6 δ 100 Simdepth 8 PVth 11 Simnum 1 7 Simnum 1 3 PVth 0 12 PVth 0 8
4.7 [5]
4.8 1 3 [11][5] Bayesian Approach

Bayesian Approach [12] Baum ( 5) ( 10) 1 Bayesian Approach Bayesian Approach 1

5. Bayesian Approach

[1] Sylvain Gelly, Yizao Wang, Rémi Munos, and Olivier Teytaud. Modification of UCT with Patterns in Monte-Carlo Go. Technical Report RR-6062, INRIA, 2006.
[2],,.. 11, 2006.
[3],.. 13, 2008.
[4] Mark H. M. Winands and Yngvi Björnsson. Evaluation function based monte-carlo LOA. ACG, pp. 33-44, 2009.
[5].. 17, pp. 76-83, 2012.
[6] Rémi Coulom. Efficient selectivity and backup operators in monte-carlo tree search. In CG 2006, 2006.
[7] Levente Kocsis and Csaba Szepesvári. Bandit based monte-carlo planning. In Proceedings of the 17th European conference on Machine Learning, ECML 06, pp. 282-293, 2006.
[8] Richard J. Lorentz. Amazons discover monte-carlo. In CG 2008, pp. 13-24, 2008.
[9] Mark H. Winands, Yngvi Björnsson, and Jahn-Takeshi Saito. Monte-carlo tree search solver. In CG 2008, pp. 25-36, 2008.
[10],,.,. 15, pp. 86-89, 2010.
[11],,,.. IPSJ, Vol. 52, No. 11, pp. 3030-3037, Nov 2011.
[12] Eric B. Baum and Warren D. Smith. A bayesian approach to relevance in game playing. Artificial Intelligence, Vol. 97, No. 1-2, pp. 195-242, 1997.
[13] A. Junghanns. Are there practical alternatives to alphabeta in computer chess? ICGA Journal, Vol. 21, No. 1, pp. 14-32, 1998.
[14] Gerald Tesauro, V. T. Rajan, and Richard Segal. Bayesian inference in monte-carlo tree search. In UAI, pp. 580-588, 2010.