Applying Markov Decision Processes to Role-playing Game


Yasunari Maeda 1,a)    Fumitaro Goto 1    Hiroshi Masui 1    Fumito Masui 1    Masakiyo Suzuki 1

Received: August 25, 2011,  Accepted: March 2, 2012

Abstract: In previous research a part of a role-playing game (RPG) was represented with Markov decision processes (MDP). In this research we represent a more general RPG with MDP. We maximize an expected total reward under the condition that the true parameter of the MDP is known in the first proposition. We maximize an expected total reward with respect to a Bayes criterion under the condition that the true parameter of the MDP is unknown in the second proposition. We approximately maximize an expected total reward using learning data under the condition that the true parameter of the MDP is unknown in the third proposition.

Keywords: Markov decision processes, role-playing game, statistical decision theory, dynamic programming, Bayes criterion

1. Introduction

In previous research [1], [2] a part of a role-playing game (RPG) was represented with a Markov decision process (MDP) [3], [4], [5] and playing strategies were obtained by dynamic programming (DP). In this research we extend the formulation of [1], [2] and represent a more general RPG with an MDP.

1   Kitami Institute of Technology, Kitami, Hokkaido 090-8507, Japan
a)  maeda@cs.kitami-it.ac.jp

2. The RPG and its MDP representation

2.1 The RPG

The player has hit points (HP); the maximum HP is M_hp, and a player whose HP reaches 0 is sent back to the start square. The map consists of squares sm_i ∈ SM, SM = {sm_1, sm_2, ..., sm_|SM|}, where sm_1 is the start square. Each square has a terrain type f_i ∈ F, F = {f_1, f_2, ..., f_|F|}, and F(sm_i) denotes the terrain of square sm_i. Enemies are e_i ∈ E, E = {e_1, e_2, ..., e_|E|}; M(e_i) is the HP of enemy e_i, and G(e_i) is the reward obtained when the HP of enemy e_i reaches 0. The actions are the four moves a_1, a_2, a_3, a_4, the attack a_5, and the escape a_6. The move a_j takes the player from square sm_i to square mv(sm_i, a_j). After the move, enemy e_k appears with probability p(e_k | F(mv(sm_i, a_j)), θ*), and no enemy appears with probability 1 − Σ_{e_k ∈ E} p(e_k | F(mv(sm_i, a_j)), θ*). The attack a_5 hits enemy e_i with probability p(c(e_i) | a_5, e_i, θ*), reducing the enemy's HP by C(e_i), and misses with probability 1 − p(c(e_i) | a_5, e_i, θ*). Enemy e_i attacks with probability p(b(e_i) | e_i, θ*), reducing the player's HP by B(e_i), and does not attack with probability 1 − p(b(e_i) | e_i, θ*). The escape a_6 succeeds with probability p(map | a_6, θ*) and fails with probability 1 − p(map | a_6, θ*). The true parameter θ* governs all of these probabilities.

2.2 MDP representation of the RPG

An MDP consists of states s_i ∈ S, S = {s_1, s_2, ..., s_|S|}, actions a_i ∈ A, A = {a_1, a_2, ..., a_|A|}, transition probabilities p(s_k | s_i, a_j, ξ*) of reaching state s_k from state s_i under action a_j, where ξ* is the true parameter, and rewards r(s_i, a_j, s_k).

Fig. 1  An example of MDP.

When ξ* is known, the MDP is solved by dynamic programming (DP): the maximum expected total reward V(s_i, t) obtainable from state s_i at time t satisfies

    V(s_i, t) = max_{a_j ∈ A} Σ_{s_k ∈ S} p(s_k | s_i, a_j, ξ*) ( r(s_i, a_j, s_k) + V(s_k, t+1) ).   (1)

The state of the MDP representing the RPG at time t is

    x_t = (x_{t,1}, x_{t,2}, x_{t,3}, x_{t,4}),   (2)

where x_{t,1} is the player's HP at time t, x_{t,2} is the player's square at time t, x_{t,3} is the enemy present at time t, and x_{t,4} is the HP of that enemy at time t; x_{t,3} = x_{t,4} = 0 means that no enemy is present.
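As an illustration of the recursion (1), the following is a minimal sketch of finite-horizon backward induction for a generic MDP, assuming the transition probabilities and rewards are supplied as nested dictionaries p[s][a][s'] and r[s][a][s']; all names are illustrative.

def solve_finite_horizon_mdp(states, actions, p, r, T):
    """Backward induction for the recursion (1): returns V[t][s] and d[t][s]."""
    V = {T + 1: {s: 0.0 for s in states}}        # terminal values V(s, T+1) = 0
    d = {}
    for t in range(T, 0, -1):                    # t = T, T-1, ..., 1
        V[t], d[t] = {}, {}
        for s in states:
            best_q, best_a = float("-inf"), None
            for a in actions:
                # expected reward-to-go of taking action a in state s at time t
                q = sum(p[s][a].get(s2, 0.0) * (r[s][a].get(s2, 0.0) + V[t + 1][s2])
                        for s2 in states)
                if q > best_q:
                    best_q, best_a = q, a
            V[t][s], d[t][s] = best_q, best_a    # V(s, t) and the maximizing action
    return V, d

With the RPG states x_t of (2) as the states and the transitions of Section 2 as p and r, the same routine gives the DP solution used in Section 3.1.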

A(x_t) denotes the set of actions available in state x_t, and y_t denotes the action taken at time t. The transition from x_t to x_{t+1} is defined as follows.

If y_t is one of the moves a_1, ..., a_4, then with probability p(e_i | F(mv(x_{t,2}, y_t)), θ*) enemy e_i appears and

    x_{t+1} = (x_{t,1}, mv(x_{t,2}, y_t), e_i, M(e_i)).   (3)

For the start square sm_1, p(e_i | F(mv(x_{t,2}, y_t)), θ*) = 0. With probability 1 − Σ_{e_i ∈ E} p(e_i | F(mv(x_{t,2}, y_t)), θ*) no enemy appears and

    x_{t+1} = (M_hp, sm_1, x_{t,3}, x_{t,4})                 if mv(x_{t,2}, y_t) = sm_1,   (4)
    x_{t+1} = (x_{t,1}, mv(x_{t,2}, y_t), x_{t,3}, x_{t,4})  if mv(x_{t,2}, y_t) ≠ sm_1.   (5)

In (4) the player's HP is restored to M_hp at the start square sm_1.

If y_t is the attack a_5, then with probability (1 − p(c(x_{t,3}) | a_5, x_{t,3}, θ*))(1 − p(b(x_{t,3}) | x_{t,3}, θ*)) neither side hits and

    x_{t+1} = (x_{t,1}, x_{t,2}, x_{t,3}, x_{t,4}).   (6)

With probability (1 − p(c(x_{t,3}) | a_5, x_{t,3}, θ*)) p(b(x_{t,3}) | x_{t,3}, θ*) only the enemy hits and

    x_{t+1} = (x_{t,1} − B(x_{t,3}), x_{t,2}, x_{t,3}, x_{t,4})  if x_{t,1} > B(x_{t,3}),   (7)
    x_{t+1} = (M_hp, sm_1, 0, 0)                                 if x_{t,1} ≤ B(x_{t,3}).   (8)

In (8) the player is defeated and returned to the start square sm_1.

With probability p(c(x_{t,3}) | a_5, x_{t,3}, θ*)(1 − p(b(x_{t,3}) | x_{t,3}, θ*)) only the player hits and

    x_{t+1} = (x_{t,1}, x_{t,2}, x_{t,3}, x_{t,4} − C(x_{t,3}))  if x_{t,4} > C(x_{t,3}),   (9)
    x_{t+1} = (x_{t,1}, x_{t,2}, 0, 0)                           if x_{t,4} ≤ C(x_{t,3}).   (10)

In (10) the enemy x_{t,3} is defeated and the reward is r(x_t, a_5, x_{t+1}) = G(x_{t,3}).

With probability p(c(x_{t,3}) | a_5, x_{t,3}, θ*) p(b(x_{t,3}) | x_{t,3}, θ*) both sides hit and

    x_{t+1} = (x_{t,1}, x_{t,2}, 0, 0)                                        if x_{t,4} ≤ C(x_{t,3}),   (11)
    x_{t+1} = (x_{t,1} − B(x_{t,3}), x_{t,2}, x_{t,3}, x_{t,4} − C(x_{t,3}))  if x_{t,4} > C(x_{t,3}) and x_{t,1} > B(x_{t,3}),   (12)
    x_{t+1} = (M_hp, sm_1, 0, 0)                                              if x_{t,4} > C(x_{t,3}) and x_{t,1} ≤ B(x_{t,3}).   (13)

In (11) the enemy x_{t,3} is defeated and the reward is r(x_t, a_5, x_{t+1}) = G(x_{t,3}); in (13) the player is returned to the start square sm_1.

If y_t is the escape a_6, then with probability p(map | a_6, θ*) the escape succeeds and

    x_{t+1} = (x_{t,1}, x_{t,2}, 0, 0).   (14)

With probability (1 − p(map | a_6, θ*))(1 − p(b(x_{t,3}) | x_{t,3}, θ*)) the escape fails and the enemy misses, so

    x_{t+1} = (x_{t,1}, x_{t,2}, x_{t,3}, x_{t,4}).   (15)

With probability (1 − p(map | a_6, θ*)) p(b(x_{t,3}) | x_{t,3}, θ*) the escape fails and the enemy hits, so

    x_{t+1} = (x_{t,1} − B(x_{t,3}), x_{t,2}, x_{t,3}, x_{t,4})  if x_{t,1} > B(x_{t,3}),   (16)
    x_{t+1} = (M_hp, sm_1, 0, 0)                                 if x_{t,1} ≤ B(x_{t,3}).   (17)

In (17) the player is returned to the start square sm_1.
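The attack transitions (6)–(13) can be enumerated mechanically. The sketch below, with illustrative names, assumes the hit probabilities p(c(e) | a_5, e, θ*) and p(b(e) | e, θ*) and the constants C, B, G, M_hp and sm_1 are given as Python dictionaries and values.

def attack_transitions(x, p_hit, p_attack, C, B, G, M_hp, sm1):
    """List (probability, next state, reward) triples for the attack a_5, as in (6)-(13)."""
    hp, square, enemy, enemy_hp = x
    out = []

    # (6) both sides miss: the state is unchanged
    out.append(((1 - p_hit[enemy]) * (1 - p_attack[enemy]), x, 0.0))

    # (7)/(8) the player misses and the enemy hits
    p = (1 - p_hit[enemy]) * p_attack[enemy]
    if hp > B[enemy]:
        out.append((p, (hp - B[enemy], square, enemy, enemy_hp), 0.0))
    else:
        out.append((p, (M_hp, sm1, 0, 0), 0.0))          # player sent back to the start

    # (9)/(10) the player hits and the enemy misses
    p = p_hit[enemy] * (1 - p_attack[enemy])
    if enemy_hp > C[enemy]:
        out.append((p, (hp, square, enemy, enemy_hp - C[enemy]), 0.0))
    else:
        out.append((p, (hp, square, 0, 0), G[enemy]))    # enemy defeated, reward G

    # (11)-(13) both sides hit
    p = p_hit[enemy] * p_attack[enemy]
    if enemy_hp <= C[enemy]:
        out.append((p, (hp, square, 0, 0), G[enemy]))    # enemy defeated, reward G
    elif hp > B[enemy]:
        out.append((p, (hp - B[enemy], square, enemy, enemy_hp - C[enemy]), 0.0))
    else:
        out.append((p, (M_hp, sm1, 0, 0), 0.0))          # player sent back to the start
    return out

The move transitions (3)–(5) and the escape transitions (14)–(17) can be enumerated in the same way.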

When the enemy x_{t,3} is defeated by the attack a_5 the reward is r(x_t, a_5, x_{t+1}) = G(x_{t,3}); in every other case the reward r(x_t, y_t, x_{t+1}) = 0. The initial state is x_1 = (M_hp, sm_1, 0, 0). The values C(e_i), B(e_i), and G(e_i) are assumed to be known. The game is played for T time steps, and the objective is to maximize the total reward Σ_{t=1}^{T} r(x_t, y_t, x_{t+1}).

3. Maximizing the expected total reward

This section formulates the problem with statistical decision theory [6], [7].

3.1 When the true parameter is known

The utility of a decision function d(·,·) for a sequence x^{T+1} y^T = x_1 y_1 x_2 y_2 ... x_T y_T x_{T+1} under the true parameter θ* is the total reward

    U(d(·,·), x^{T+1} y^T, θ*) = Σ_{t=1}^{T} r(x_t, y_t, x_{t+1}),   (18)

where the action at each time is y_t = d(x_t, t). The expected utility of d(·,·) under θ* is

    EU(d(·,·), θ*) = E[ Σ_{t=1}^{T} r(x_t, y_t, x_{t+1}) ]
                   = Σ_{x_2} p(x_2 | x_1, y_1, θ*) ( r(x_1, y_1, x_2)
                     + Σ_{x_3} p(x_3 | x_2, y_2, θ*) ( r(x_2, y_2, x_3)
                     + ... + Σ_{x_{T+1}} p(x_{T+1} | x_T, y_T, θ*) r(x_T, y_T, x_{T+1}) ... ) ).   (19)

When θ* is known, the optimal decision function maximizes the expected utility:

    d*(·,·) = arg max_{d(·,·)} EU(d(·,·), θ*).   (20)

The maximization in (20) is carried out by DP. At the final time T, for each state x_T,

    d*(x_T, T) = arg max_{y_T ∈ A(x_T)} Σ_{x_{T+1}} p(x_{T+1} | x_T, y_T, θ*) r(x_T, y_T, x_{T+1}),   (21)

    V(x_T, T) = max_{y_T ∈ A(x_T)} Σ_{x_{T+1}} p(x_{T+1} | x_T, y_T, θ*) r(x_T, y_T, x_{T+1}),   (22)

where d*(x_T, T) is the optimal action in state x_T at time T and V(x_T, T) is the corresponding maximum expected reward. For 1 ≤ t ≤ T − 1 and each state x_t,

    d*(x_t, t) = arg max_{y_t ∈ A(x_t)} Σ_{x_{t+1}} p(x_{t+1} | x_t, y_t, θ*) ( r(x_t, y_t, x_{t+1}) + V(x_{t+1}, t+1) ),   (23)

    V(x_t, t) = max_{y_t ∈ A(x_t)} Σ_{x_{t+1}} p(x_{t+1} | x_t, y_t, θ*) ( r(x_t, y_t, x_{t+1}) + V(x_{t+1}, t+1) ),   (24)

where d*(x_t, t) is the optimal action in state x_t at time t and V(x_t, t) is the corresponding maximum expected total reward. Applying (21)–(24) backward from t = T to t = 1 gives the optimal actions d*(x_1, 1), ..., d*(x_T, T) for all times from 1 to T.
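The expected total reward EU(d(·,·), θ*) in (19) can also be estimated by simulation. The following sketch assumes a sampler step(x, y) that draws x_{t+1} and the reward r(x_t, y_t, x_{t+1}) from p(x_{t+1} | x_t, y_t, θ*), and a decision function d(x, t) such as the one obtained from (21)–(24); names are illustrative.

def estimate_expected_utility(d, step, x1, T, n_runs=10000):
    """Monte Carlo estimate of EU(d, theta*) in (19)."""
    total = 0.0
    for _ in range(n_runs):
        x, run_reward = x1, 0.0
        for t in range(1, T + 1):
            y = d(x, t)              # action chosen by the decision function
            x, reward = step(x, y)   # sample x_{t+1} and r(x_t, y_t, x_{t+1})
            run_reward += reward
        total += run_reward
    return total / n_runs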

3.2 Maximization under a Bayes criterion when the true parameter is unknown

When the true parameter θ* is unknown, a prior distribution p(θ) is assumed on θ, and the decision at time t may depend on the history x^t y^{t−1} = x_1 y_1 x_2 y_2 ... y_{t−1} x_t observed up to time t. The Bayes expected utility of a decision function d_B(·,·,·) is

    BEU(d_B(·,·,·), p(θ)) = ∫ p(θ) EU(d_B(·,·,·), θ) dθ,   (25)

and the Bayes-optimal decision function is

    d_B*(·,·,·) = arg max_{d_B(·,·,·)} BEU(d_B(·,·,·), p(θ)).   (26)

The maximization in (26) is again carried out by DP. At the final time T, for each state x_T with history x^T y^{T−1},

    d_B*(x_T, x^T y^{T−1}, T) = arg max_{y_T ∈ A(x_T)} Σ_{x_{T+1}} ∫ p(θ | x^T y^{T−1}) p(x_{T+1} | x_T, y_T, θ) dθ · r(x_T, y_T, x_{T+1}),   (27)

    V_B(x_T, x^T y^{T−1}, T) = max_{y_T ∈ A(x_T)} Σ_{x_{T+1}} ∫ p(θ | x^T y^{T−1}) p(x_{T+1} | x_T, y_T, θ) dθ · r(x_T, y_T, x_{T+1}),   (28)

where p(θ | x^T y^{T−1}) is the posterior distribution of θ given the history x^T y^{T−1}. For 1 ≤ t ≤ T − 1 and each state x_t with history x^t y^{t−1},

    d_B*(x_t, x^t y^{t−1}, t) = arg max_{y_t ∈ A(x_t)} Σ_{x_{t+1}} ∫ p(θ | x^t y^{t−1}) p(x_{t+1} | x_t, y_t, θ) dθ · ( r(x_t, y_t, x_{t+1}) + V_B(x_{t+1}, x^{t+1} y^t, t+1) ),   (29)

    V_B(x_t, x^t y^{t−1}, t) = max_{y_t ∈ A(x_t)} Σ_{x_{t+1}} ∫ p(θ | x^t y^{t−1}) p(x_{t+1} | x_t, y_t, θ) dθ · ( r(x_t, y_t, x_{t+1}) + V_B(x_{t+1}, x^{t+1} y^t, t+1) ).   (30)

Applying (27)–(30) backward from t = T to t = 1 gives the Bayes-optimal actions d_B*(x_1, x_1, 1), ..., for all times from 1 to T. The integrals in (27)–(30) can be computed following [8]. For example, when the action y_t at time t is a move and enemy e_i appears in x_{t+1},

    ∫ p(θ | x^t y^{t−1}) p(x_{t+1} | x_t, y_t, θ) dθ
        = ∫ p(θ | x^t y^{t−1}) p(e_i | F(mv(x_{t,2}, y_t)), θ) dθ
        = ( N(F(mv(x_{t,2}, y_t)), e_i | x^t y^{t−1}) + α_1 ) / ( N(F(mv(x_{t,2}, y_t)) | x^t y^{t−1}) + α_2 ),   (31)

where

    α_1 = α(e_i | F(mv(x_{t,2}, y_t))),   (32)

    α_2 = Σ_{e_j ∈ E} α(e_j | F(mv(x_{t,2}, y_t))) + α(mv(x_{t,2}, y_t) | F(mv(x_{t,2}, y_t))),   (33)

N(F(mv(x_{t,2}, y_t)), e_i | x^t y^{t−1}) is the number of times enemy e_i appeared on terrain F(mv(x_{t,2}, y_t)) in the history x^t y^{t−1}, N(F(mv(x_{t,2}, y_t)) | x^t y^{t−1}) is the number of times the player moved onto terrain F(mv(x_{t,2}, y_t)) in the history, α(e_i | F(mv(x_{t,2}, y_t))) is the hyperparameter corresponding to p(e_i | F(mv(x_{t,2}, y_t)), θ), and α(mv(x_{t,2}, y_t) | F(mv(x_{t,2}, y_t))) is the hyperparameter corresponding to the probability 1 − Σ_{e_i ∈ E} p(e_i | F(mv(x_{t,2}, y_t)), θ) that no enemy appears.
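A count-based sketch of the predictive probability (31)–(33) is given below, assuming the history x^t y^{t−1} is available as a list of (terrain, outcome) pairs recorded after each move, where outcome is the enemy that appeared or None when no enemy appeared; names are illustrative.

def predictive_enemy_prob(history, terrain, enemy, enemies, alpha):
    """Estimate of the integral in (31) from counts and hyperparameters."""
    n_terrain = sum(1 for f, _ in history if f == terrain)               # N(F | history)
    n_enemy = sum(1 for f, e in history if f == terrain and e == enemy)  # N(F, e_i | history)
    alpha1 = alpha[(terrain, enemy)]                                     # (32)
    alpha2 = (sum(alpha[(terrain, e)] for e in enemies)
              + alpha[(terrain, None)])                                  # (33)
    return (n_enemy + alpha1) / (n_terrain + alpha2)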

As in [6], [7], [8], [9], every hyperparameter is set to 0.5 in this research. Bayesian decision problems for Markov chains with unknown parameters are treated in [10]; this research applies such a formulation to the RPG of Section 2.

3.3 Approximation using learning data

The DP of (27)–(30) depends at each time t on the whole history x^t y^{t−1}, whereas the DP of (21)–(24) does not. In the third proposition, learning data L collected in advance is used instead of the history: the predictive probability ∫ p(θ | x^t y^{t−1}) p(x_{t+1} | x_t, y_t, θ) dθ is replaced by ∫ p(θ | L) p(x_{t+1} | x_t, y_t, θ) dθ, and the DP of (21)–(24) is applied with ∫ p(θ | L) p(x_{t+1} | x_t, y_t, θ) dθ used as the transition probabilities. This approximately maximizes the expected total reward.

4. Experiments

4.1 Settings

The experiments use the map of nine squares sm_1, ..., sm_9 shown in Fig. 2, starting from sm_1. Table 1 gives the terrain of each square, Tables 2 and 3 give the probabilities that define the true parameter θ*, and Tables 4 and 5 give the remaining settings.

Fig. 2  A map of experiment.

Table 1  Settings of the ground.
    F(sm_1) = f_1,  F(sm_2) = f_2,  F(sm_3) = f_3,  F(sm_4) = f_2,  F(sm_5) = f_2,
    F(sm_6) = f_3,  F(sm_7) = f_3,  F(sm_8) = f_3,  F(sm_9) = f_3

Table 2  Settings of probabilities (the first part).
    p(e_1 | f_2, θ*) = 0.3,  p(e_2 | f_2, θ*) = 0.0,  p(e_1 | f_3, θ*) = 0.0,  p(e_2 | f_3, θ*) = 0.8

Table 3  Settings of probabilities (the second part).
    p(c(e_1) | a_5, e_1, θ*) = 0.9,  p(c(e_2) | a_5, e_2, θ*) = 0.9,
    p(b(e_1) | e_1, θ*) = 0.6,  p(b(e_2) | e_2, θ*) = 0.6,  p(map | a_6, θ*) = 0.6

Table 4  Other settings (the first part).
    M_hp = 10,  E = {e_1, e_2},  M(e_1) = 2,  M(e_2) = 4,  G(e_1) = 1,  G(e_2) = 5

Table 5  Other settings (the second part).
    C(e_1) = 3,  C(e_2) = 3,  B(e_1) = 1,  B(e_2) = 3,  T = 10
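For reference, the settings of Tables 1–5 can be transcribed as Python data as below; the adjacency of the map in Fig. 2 (which squares the moves a_1–a_4 connect) is not reproduced here, and the dictionary names are illustrative.

# Experimental settings of Tables 1-5 as Python data (illustrative names).
terrain = {"sm1": "f1", "sm2": "f2", "sm3": "f3", "sm4": "f2", "sm5": "f2",
           "sm6": "f3", "sm7": "f3", "sm8": "f3", "sm9": "f3"}          # Table 1

p_enemy_appears = {("f2", "e1"): 0.3, ("f2", "e2"): 0.0,
                   ("f3", "e1"): 0.0, ("f3", "e2"): 0.8}                # Table 2

p_hit    = {"e1": 0.9, "e2": 0.9}    # p(c(e_i) | a_5, e_i, theta*)       Table 3
p_attack = {"e1": 0.6, "e2": 0.6}    # p(b(e_i) | e_i, theta*)            Table 3
p_escape = 0.6                       # p(map | a_6, theta*)               Table 3

M_hp = 10                            # player's maximum HP                Table 4
M = {"e1": 2, "e2": 4}               # enemy HP M(e_i)                    Table 4
G = {"e1": 1, "e2": 5}               # reward G(e_i) for defeating e_i    Table 4

C = {"e1": 3, "e2": 3}               # damage C(e_i) dealt by the player  Table 5
B = {"e1": 1, "e2": 3}               # damage B(e_i) dealt by enemy e_i   Table 5
T = 10                               # horizon                            Table 5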

4.2 Results

The proposition of Section 3.3 was evaluated with learning data of sizes 10, 100, and 1,000. Relative to the DP solution computed with the true parameter θ*, the results were 88% with 10 learning data, 94% with 100 learning data, and 96% with 1,000 learning data; performance approaches that of the DP solution as the amount of learning data grows.

5. Conclusion

In this research a more general RPG than in previous work [1], [2] was represented with an MDP. The expected total reward was maximized when the true parameter of the MDP is known, maximized with respect to a Bayes criterion when the true parameter is unknown, and approximately maximized using learning data when the true parameter is unknown. Future work includes incorporating related techniques [11], [12], [13], [14] and representing the RPG with partially observable MDPs (POMDP) [15], [16].

References

[1] RPG, GI, Vol.2001, No.28, pp.31–38 (2001).
[2] RPG, GI, Vol.2001, No.58, pp.67–74 (2001).
[3] (1973).
[4] Puterman, M.L.: Markov Decision Processes, John Wiley & Sons (1994).
[5] (1979).
[6] Berger, J.O.: Statistical Decision Theory and Bayesian Analysis, Springer-Verlag, New York (1980).
[7] (1985).
[8] Matsushima, T. and Hirasawa, S.: A Bayes coding algorithm for Markov models, Technical Report of IEICE, IT95-1, pp.1–6 (1995).
[9] (2009).
[10] Martin, J.J.: Bayesian Decision Problems and Markov Chains, John Wiley & Sons (1967).
[11] Vol.39, No.4, pp.1116–1126 (1998).
[12] k-, Vol.10, No.3, pp.454–463 (1995).
[13] l-, Vol.11, No.5, pp.804–808 (1996).
[14] MarcoPolo, Vol.12, No.1, pp.78–89 (1997).
[15] Kaelbling, L.P., Vol.12, No.6, pp.822–830 (1997).
[16] POMDPs, Vol.14, No.1, pp.148–156 (1999).

© 2012 Information Processing Society of Japan
