Applying Markov Decision Processes to Role-playing Game

Yasunari Maeda 1,a)  Fumitaro Goto 1  Hiroshi Masui 1  Fumito Masui 1  Masakiyo Suzuki 1

Received: August 25, 2011, Accepted: March 2, 2012

Abstract: In previous research a part of a role-playing game (RPG) was represented with Markov decision processes (MDP). In this research we represent a more general RPG with MDP. In the first proposition we maximize the expected total reward under the condition that the true parameter of the MDP is known. In the second proposition we maximize the expected total reward with respect to a Bayes criterion under the condition that the true parameter is unknown. In the third proposition we approximately maximize the expected total reward using learning data under the condition that the true parameter is unknown.

Keywords: Markov decision processes, role-playing game, statistical decision theory, dynamic programming, Bayes criterion

1. Introduction

In previous research [1], [2], a part of an RPG was represented with Markov decision processes (MDP) [3], [4], [5], and strategies were computed by dynamic programming (DP). In this research we represent a more general RPG with MDP and derive strategies that maximize the expected total reward under several conditions on the knowledge of the MDP parameter.

1 Kitami Institute of Technology, Kitami, Hokkaido 090-8507, Japan
a) maeda@cs.kitami-it.ac.jp

© 2012 Information Processing Society of Japan
2. Representing an RPG with MDP

2.1 The RPG

The player's hit points (HP) take values from 0 to the maximum M_hp. The map consists of squares sm_i ∈ SM = {sm_1, sm_2, ..., sm_|SM|}, where sm_1 is the starting square. Each square has a field type f_i ∈ F = {f_1, f_2, ..., f_|F|}, and F(sm_i) denotes the field of square sm_i. The enemies are e_i ∈ E = {e_1, e_2, ..., e_|E|}; M(e_i) is the HP of enemy e_i, and when the HP of e_i falls to 0 the player receives the reward G(e_i).

The player's actions are a_1 to a_6. Actions a_1 to a_4 are movements, and mv(sm_i, a_j) denotes the square reached from sm_i by movement a_j. After moving, enemy e_k appears with probability p(e_k | F(mv(sm_i, a_j)), θ*), and no enemy appears with probability 1 − Σ_{e_k ∈ E} p(e_k | F(mv(sm_i, a_j)), θ*). Action a_5 is an attack: with probability p(c(e_i) | a_5, e_i, θ*) the attack hits and reduces the HP of enemy e_i by C(e_i), and with probability 1 − p(c(e_i) | a_5, e_i, θ*) it misses. The enemy e_i attacks with probability p(b(e_i) | e_i, θ*), reducing the player's HP by B(e_i), and does not attack with probability 1 − p(b(e_i) | e_i, θ*). Action a_6 is an escape, which succeeds with probability p(map | a_6, θ*) and fails with probability 1 − p(map | a_6, θ*). θ* denotes the true parameter of these probabilities.

2.2 Representing the RPG with MDP

An MDP consists of states s_i ∈ S = {s_1, s_2, ..., s_|S|}, actions a_i ∈ A = {a_1, a_2, ..., a_|A|}, transition probabilities p(s_k | s_i, a_j, ξ*) with true parameter ξ*, and rewards r(s_i, a_j, s_k). Figure 1 shows an example of an MDP. When ξ* is known, the maximum expected total reward V(s_i, t) of state s_i at time t is obtained by DP:

V(s_i, t) = max_{a_j ∈ A} Σ_{s_k ∈ S} p(s_k | s_i, a_j, ξ*) (r(s_i, a_j, s_k) + V(s_k, t+1)).   (1)

The state x_t of the RPG MDP at time t is

x_t = (x_{t,1}, x_{t,2}, x_{t,3}, x_{t,4}),   (2)

where x_{t,1} is the player's HP at time t, x_{t,2} is the square at time t, and x_{t,3} is the enemy at time t.

Fig. 1 An example of MDP.
x_{t,4} is the HP of the enemy at time t, and x_{t,3} = x_{t,4} = 0 means that no enemy is present. A(x_t) denotes the set of actions available in state x_t, and y_t denotes the action of the RPG MDP at time t.

When no enemy is present at time t, the player moves by y_t ∈ {a_1, ..., a_4}. With probability p(e_i | F(mv(x_{t,2}, y_t)), θ*) enemy e_i appears and

x_{t+1} = (x_{t,1}, mv(x_{t,2}, y_t), e_i, M(e_i)).   (3)

No enemy appears in sm_1, that is, p(e_i | F(mv(x_{t,2}, y_t)), θ*) = 0 when mv(x_{t,2}, y_t) = sm_1. With probability 1 − Σ_{e_i ∈ E} p(e_i | F(mv(x_{t,2}, y_t)), θ*) no enemy appears, and

x_{t+1} = (M_hp, sm_1, x_{t,3}, x_{t,4})   (4)

if mv(x_{t,2}, y_t) = sm_1, or

x_{t+1} = (x_{t,1}, mv(x_{t,2}, y_t), x_{t,3}, x_{t,4})   (5)

if mv(x_{t,2}, y_t) ≠ sm_1. In Eq. (4), the player's HP is restored to M_hp in sm_1.

When an enemy is present at time t, the available actions y_t are a_5 and a_6. For the attack a_5, with probability (1 − p(c(x_{t,3}) | a_5, x_{t,3}, θ*))(1 − p(b(x_{t,3}) | x_{t,3}, θ*)) both attacks miss and

x_{t+1} = (x_{t,1}, x_{t,2}, x_{t,3}, x_{t,4}).   (6)

With probability (1 − p(c(x_{t,3}) | a_5, x_{t,3}, θ*)) p(b(x_{t,3}) | x_{t,3}, θ*) only the enemy's attack hits, and

x_{t+1} = (x_{t,1} − B(x_{t,3}), x_{t,2}, x_{t,3}, x_{t,4})   (7)

if x_{t,1} > B(x_{t,3}), or

x_{t+1} = (M_hp, sm_1, 0, 0)   (8)

if x_{t,1} ≤ B(x_{t,3}). In Eq. (8), the defeated player restarts from sm_1. With probability p(c(x_{t,3}) | a_5, x_{t,3}, θ*)(1 − p(b(x_{t,3}) | x_{t,3}, θ*)) only the player's attack hits, and

x_{t+1} = (x_{t,1}, x_{t,2}, x_{t,3}, x_{t,4} − C(x_{t,3}))   (9)

if x_{t,4} > C(x_{t,3}), or

x_{t+1} = (x_{t,1}, x_{t,2}, 0, 0)   (10)

if x_{t,4} ≤ C(x_{t,3}). In Eq. (10), the enemy x_{t,3} is defeated and the reward is r(x_t, a_5, x_{t+1}) = G(x_{t,3}). With probability p(c(x_{t,3}) | a_5, x_{t,3}, θ*) p(b(x_{t,3}) | x_{t,3}, θ*) both attacks hit, and

x_{t+1} = (x_{t,1}, x_{t,2}, 0, 0)   (11)

if x_{t,4} ≤ C(x_{t,3}),

x_{t+1} = (x_{t,1} − B(x_{t,3}), x_{t,2}, x_{t,3}, x_{t,4} − C(x_{t,3}))   (12)

if x_{t,4} > C(x_{t,3}) and x_{t,1} > B(x_{t,3}), or

x_{t+1} = (M_hp, sm_1, 0, 0)   (13)

if x_{t,4} > C(x_{t,3}) and x_{t,1} ≤ B(x_{t,3}). In Eq. (11), the enemy x_{t,3} is defeated and r(x_t, a_5, x_{t+1}) = G(x_{t,3}); in Eq. (13), the defeated player restarts from sm_1.

For the escape a_6 at time t, with probability p(map | a_6, θ*) the escape succeeds and

x_{t+1} = (x_{t,1}, x_{t,2}, 0, 0).   (14)

With probability (1 − p(map | a_6, θ*))(1 − p(b(x_{t,3}) | x_{t,3}, θ*)) the escape fails and the enemy's attack misses, so

x_{t+1} = (x_{t,1}, x_{t,2}, x_{t,3}, x_{t,4}).   (15)

With probability (1 − p(map | a_6, θ*)) p(b(x_{t,3}) | x_{t,3}, θ*) the escape fails and the enemy's attack hits.
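The four outcomes of the attack action a_5 (both miss, only the enemy hits, only the player hits, both hit) can be sketched as a one-step transition sampler. This is a minimal sketch: the probabilities, damages and reward below are the experiment's settings for enemy e_2 given later in the paper, and the function name and state encoding are our own, not the paper's.

```python
# One-step transition for the attack action a5, following Eqs. (6)-(13):
# the player's attack hits with probability p_c (damage C), the enemy's
# attack hits with probability p_b (damage B); a defeated enemy is cleared
# from the state with reward G, and a defeated player restarts at sm_1
# with full HP M_hp.  Numbers are the experiment's settings for enemy e2.

import random

M_hp, SM1 = 10, "sm1"
p_c, p_b = 0.9, 0.6          # hit probabilities for player and enemy
C, B, G = 3, 3, 5            # damages and defeat reward for e2

def attack(state, rng):
    """state = (hp, square, enemy, enemy_hp); returns (next_state, reward)."""
    hp, sq, en, en_hp = state
    hit = rng.random() < p_c         # player's attack hits?
    struck = rng.random() < p_b      # enemy's attack hits?
    new_en_hp = en_hp - C if hit else en_hp
    if hit and new_en_hp <= 0:       # enemy defeated: Eqs. (10)/(11), reward G
        return (hp, sq, 0, 0), G
    if struck and hp <= B:           # player defeated: Eqs. (8)/(13), restart
        return (M_hp, SM1, 0, 0), 0
    if struck:                       # player takes damage: Eqs. (7)/(12)
        return (hp - B, sq, en, new_en_hp), 0
    return (hp, sq, en, new_en_hp), 0   # Eqs. (6)/(9)

rng = random.Random(0)
print(attack((10, "sm5", "e2", 4), rng))
```

Note that, as in Eq. (11), a blow that defeats the enemy takes effect before the enemy's counterattack, so the player takes no damage on the defeating turn.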
In that case,

x_{t+1} = (x_{t,1} − B(x_{t,3}), x_{t,2}, x_{t,3}, x_{t,4})   (16)

if x_{t,1} > B(x_{t,3}), or

x_{t+1} = (M_hp, sm_1, 0, 0)   (17)

if x_{t,1} ≤ B(x_{t,3}). In Eq. (17), the defeated player restarts from sm_1. The reward is G(x_{t,3}) when the enemy x_{t,3} is defeated and 0 otherwise. The initial state is x_1 = (M_hp, sm_1, 0, 0). Given C(e_i), B(e_i), G(e_i) and the horizon T, the objective is to maximize the total reward Σ_{t=1}^{T} r(x_t, y_t, x_{t+1}).

3. Propositions

We formulate the problem with statistical decision theory [6], [7].

3.1 The case where the true parameter is known

When the true parameter θ* is known, the total reward of a decision function d(·,·) over the sequence x^{T+1}y^T = x_1 y_1 x_2 y_2 ... x_T y_T x_{T+1} is

U(d(·,·), x^{T+1}y^T, θ*) = Σ_{t=1}^{T} r(x_t, y_t, x_{t+1}).   (18)

The expected total reward of d(·,·) is

EU(d(·,·), θ*) = E[ Σ_{t=1}^{T} r(x_t, y_t, x_{t+1}) ]
              = Σ_{x_2} p(x_2 | x_1, y_1, θ*) ( r(x_1, y_1, x_2)
                + Σ_{x_3} p(x_3 | x_2, y_2, θ*) ( r(x_2, y_2, x_3)
                + ... + Σ_{x_{T+1}} p(x_{T+1} | x_T, y_T, θ*) r(x_T, y_T, x_{T+1}) ) ... ).   (19)

The optimal decision function maximizes EU(d(·,·), θ*):

d*(·,·) = arg max_{d(·,·)} EU(d(·,·), θ*).   (20)

Eq. (20) is solved by DP. First, at the final time T of the MDP, for each state x_T,

d*(x_T, T) = arg max_{y_T ∈ A(x_T)} Σ_{x_{T+1}} p(x_{T+1} | x_T, y_T, θ*) r(x_T, y_T, x_{T+1}),   (21)

V(x_T, T) = max_{y_T ∈ A(x_T)} Σ_{x_{T+1}} p(x_{T+1} | x_T, y_T, θ*) r(x_T, y_T, x_{T+1}),   (22)

where d*(x_T, T) is the optimal action in state x_T at time T and V(x_T, T) is the maximum expected reward from time T to T+1. Then, for each t with 1 ≤ t ≤ T−1 and each state x_t,

d*(x_t, t) = arg max_{y_t ∈ A(x_t)} Σ_{x_{t+1}} p(x_{t+1} | x_t, y_t, θ*) (r(x_t, y_t, x_{t+1}) + V(x_{t+1}, t+1)),   (23)

V(x_t, t) = max_{y_t ∈ A(x_t)} Σ_{x_{t+1}} p(x_{t+1} | x_t, y_t, θ*) (r(x_t, y_t, x_{t+1}) + V(x_{t+1}, t+1)),   (24)

where d*(x_t, t) is the optimal action in state x_t at time t and V(x_t, t) is the maximum expected total reward from time t on. Computing Eqs. (21) to (24) backward from t = T gives the optimal decision function d*(x_1, 1), ..., d*(x_T, T) for times 1 to T.
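The backward induction of Eqs. (21) to (24) can be sketched for a small generic finite-horizon MDP. The two-state, two-action model below is a made-up illustration, not the RPG of Section 2.

```python
# Finite-horizon dynamic programming, following Eqs. (21)-(24):
# V(s, t) = max_a sum_k p(k|s,a) * (r(s,a,k) + V(k, t+1)), with
# V(., T+1) = 0.  The transition and reward tables are illustrative.

S = [0, 1]            # states
A = [0, 1]            # actions
T = 3                 # horizon

# p[s][a][k]: transition probability, r[s][a][k]: immediate reward
p = {0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},
     1: {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0, 1: 0.0}}}
r = {0: {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}},
     1: {0: {0: 0.0, 1: 0.0}, 1: {0: 0.5, 1: 0.0}}}

# V[t][s]: maximum expected total reward from state s at time t;
# d[t][s]: an optimal action.  Backward induction from t = T down to 1.
V = {T + 1: {s: 0.0 for s in S}}
d = {}
for t in range(T, 0, -1):
    V[t], d[t] = {}, {}
    for s in S:
        best = max(A, key=lambda a: sum(p[s][a][k] * (r[s][a][k] + V[t + 1][k])
                                        for k in S))
        d[t][s] = best
        V[t][s] = sum(p[s][best][k] * (r[s][best][k] + V[t + 1][k]) for k in S)

print(V[1], d[1])
```

One backward sweep over the horizon suffices because V(·, t) depends only on V(·, t+1), exactly as in Eqs. (23) and (24).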
3.2 The case where the true parameter is unknown: Bayes criterion

When the true parameter θ* is unknown, we assume a prior distribution p(θ) on the parameter θ. The decision function d_B(·,·,·) may now depend on the history x^t y^{t−1} = x_1 y_1 ... y_{t−1} x_t as well as on the current state x_t and the time t. The Bayes criterion averages the expected utility over the prior:

BEU(d_B(·,·,·), p(θ)) = ∫ p(θ) EU(d_B(·,·,·), θ) dθ,   (25)

d_B*(·,·,·) = arg max_{d_B(·,·,·)} BEU(d_B(·,·,·), p(θ)).   (26)

Eq. (26) is again solved by DP. At the final time T, for each state x_T and history x^T y^{T−1},

d_B*(x_T, x^T y^{T−1}, T) = arg max_{y_T ∈ A(x_T)} Σ_{x_{T+1}} ∫ p(θ | x^T y^{T−1}) p(x_{T+1} | x_T, y_T, θ) dθ · r(x_T, y_T, x_{T+1}),   (27)

V_B(x_T, x^T y^{T−1}, T) = max_{y_T ∈ A(x_T)} Σ_{x_{T+1}} ∫ p(θ | x^T y^{T−1}) p(x_{T+1} | x_T, y_T, θ) dθ · r(x_T, y_T, x_{T+1}),   (28)

where p(θ | x^T y^{T−1}) is the posterior distribution given the history x^T y^{T−1}. Then, for each t with 1 ≤ t ≤ T−1, each state x_t and history x^t y^{t−1},

d_B*(x_t, x^t y^{t−1}, t) = arg max_{y_t ∈ A(x_t)} Σ_{x_{t+1}} ∫ p(θ | x^t y^{t−1}) p(x_{t+1} | x_t, y_t, θ) dθ · (r(x_t, y_t, x_{t+1}) + V_B(x_{t+1}, x^{t+1} y^t, t+1)),   (29)

V_B(x_t, x^t y^{t−1}, t) = max_{y_t ∈ A(x_t)} Σ_{x_{t+1}} ∫ p(θ | x^t y^{t−1}) p(x_{t+1} | x_t, y_t, θ) dθ · (r(x_t, y_t, x_{t+1}) + V_B(x_{t+1}, x^{t+1} y^t, t+1)).   (30)

Computing Eqs. (27) to (30) backward gives the Bayes-optimal decision function d_B*(x_1, x_1, 1), ... for times 1 to T. The integrals in Eqs. (27) to (30) are computed with the method of [8]. When enemy e_i appears after action y_t in state x_t at time t,

∫ p(θ | x^t y^{t−1}) p(x_{t+1} | x_t, y_t, θ) dθ
  = ∫ p(θ | x^t y^{t−1}) p(e_i | F(mv(x_{t,2}, y_t)), θ) dθ
  = ( N(F(mv(x_{t,2}, y_t)), e_i | x^t y^{t−1}) + α_1 ) / ( N(F(mv(x_{t,2}, y_t)) | x^t y^{t−1}) + α_2 ),   (31)

α_1 = α(e_i | F(mv(x_{t,2}, y_t))),   (32)

α_2 = Σ_{e_j ∈ E} α(e_j | F(mv(x_{t,2}, y_t))) + α(mv(x_{t,2}, y_t) | F(mv(x_{t,2}, y_t))),   (33)

where N(F(mv(x_{t,2}, y_t)), e_i | x^t y^{t−1}) is the number of times enemy e_i appeared on the field F(mv(x_{t,2}, y_t)) in the history x^t y^{t−1}, N(F(mv(x_{t,2}, y_t)) | x^t y^{t−1}) is the number of visits to the field F(mv(x_{t,2}, y_t)) in the history, α(e_i | F(mv(x_{t,2}, y_t))) is the hyperparameter for p(e_i | F(mv(x_{t,2}, y_t)), θ), and α(mv(x_{t,2}, y_t) | F(mv(x_{t,2}, y_t))) is the hyperparameter for the probability 1 − Σ_{e_i ∈ E} p(e_i | F(mv(x_{t,2}, y_t)), θ) that no enemy appears.
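The count-based predictive probability of Eq. (31) can be sketched as follows, with every hyperparameter set to 0.5 as in this research. The field and enemy names and the sample history are illustrative, not data from the paper.

```python
# Posterior-predictive estimate following Eq. (31):
#   (N(f, e_i) + alpha_1) / (N(f) + alpha_2),
# where the counts come from the play history and every hyperparameter
# is 0.5.  The outcomes on field f2 are the enemies plus "no enemy" (None).

from collections import Counter

enemies = ["e1", "e2"]
alpha = 0.5                      # hyperparameter for each outcome

# What appeared on each visit to a square with field f2 in the history:
# an enemy name, or None when no enemy appeared.
history_f2 = ["e1", None, "e2", "e1", None, "e1"]

counts = Counter(history_f2)
n_visits = len(history_f2)
# Denominator of Eq. (31): visits + alpha for each enemy + alpha for "no enemy".
alpha_2 = alpha * (len(enemies) + 1)

def predictive(outcome):
    """P(outcome on the next visit | history), outcome in enemies or None."""
    return (counts[outcome] + alpha) / (n_visits + alpha_2)

print({e: predictive(e) for e in enemies + [None]})
```

Because every outcome gets the same hyperparameter, the estimates over all outcomes (each enemy and "no enemy") sum to one, so they form a proper predictive distribution.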
In this research every hyperparameter is set to 0.5 [6], [7], [8], [9]. Bayesian decision problems for MDP with unknown parameters are also treated in [10], and Eqs. (27) to (30) apply such a treatment to the RPG.

3.3 An approximate method using learning data

The DP of Eqs. (27) to (30) depends on the history x^t y^{t−1} as well as on the state x_t, so its computation is expensive. In the third proposition, given learning data L observed in advance, the predictive probability ∫ p(θ | x^t y^{t−1}) p(x_{t+1} | x_t, y_t, θ) dθ is replaced with ∫ p(θ | L) p(x_{t+1} | x_t, y_t, θ) dθ, which does not depend on the history. The DP of Eqs. (21) to (24) is then applied with ∫ p(θ | L) p(x_{t+1} | x_t, y_t, θ) dθ in place of the true transition probability.

4. Experiments

4.1 Experiments on the method of Section 3.1

Figure 2 shows the map used in the experiments; it consists of the nine squares sm_1 to sm_9. Table 1 shows the field of each square, and Tables 2 and 3 show the probability settings.

Fig. 2 A map of experiment.

Table 1 Settings of the ground.
  F(sm_1) = f_1, F(sm_2) = f_2, F(sm_3) = f_3, F(sm_4) = f_2, F(sm_5) = f_2,
  F(sm_6) = f_3, F(sm_7) = f_3, F(sm_8) = f_3, F(sm_9) = f_3

Table 2 Settings of probabilities (the first part).
  p(e_1 | f_2, θ*) = 0.3, p(e_2 | f_2, θ*) = 0.0,
  p(e_1 | f_3, θ*) = 0.0, p(e_2 | f_3, θ*) = 0.8
Table 3 Settings of probabilities (the second part).
  p(c(e_1) | a_5, e_1, θ*) = 0.9, p(c(e_2) | a_5, e_2, θ*) = 0.9,
  p(b(e_1) | e_1, θ*) = 0.6, p(b(e_2) | e_2, θ*) = 0.6, p(map | a_6, θ*) = 0.6

Table 4 Other settings (the first part).
  M_hp = 10, E = {e_1, e_2}, M(e_1) = 2, M(e_2) = 4, G(e_1) = 1, G(e_2) = 5

Table 5 Other settings (the second part).
  C(e_1) = 3, C(e_2) = 3, B(e_1) = 1, B(e_2) = 3, T = 10

Starting from sm_1, the optimal strategy was computed by the DP of Section 3.1. The selected action depends on the player's remaining HP: for example, the action selected at sm_4 when the HP is 9 differs from the action selected when the HP is 1.

4.2 Experiments on the method of Section 3.3

The approximate method of Section 3.3 was evaluated with learning data of sizes 10, 100 and 1,000, averaging over 100 trials. Relative to the optimal strategy obtained by DP with the true parameter, the performance was 88% with 10 learning data, 94% with 100 learning data, and 96% with 1,000 learning data.

5. Conclusion

In this research we represented a more general RPG with MDP than in previous research. We maximized the expected total reward under the condition that the true parameter of the MDP is known, maximized it with respect to a Bayes criterion under the condition that the true parameter is unknown, and approximately maximized it using learning data. Future work includes representing other elements of RPGs with MDP, applying related learning methods [11], [12], [13], [14], and representing RPGs in which the state cannot be fully observed with partially observable MDP (POMDP) [15], [16].
References
[1] RPG, GI, Vol.2001, No.28, pp.31–38 (2001).
[2] RPG, GI, Vol.2001, No.58, pp.67–74 (2001).
[3] (1973).
[4] Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons (1994).
[5] (1979).
[6] Berger, J.O.: Statistical Decision Theory and Bayesian Analysis, Springer-Verlag, New York (1980).
[7] (1985).
[8] Matsushima, T. and Hirasawa, S.: A Bayes coding algorithm for Markov models, Technical Report of IEICE, IT95-1, pp.1–6 (1995).
[9] (2009).
[10] Martin, J.J.: Bayesian Decision Problems and Markov Chains, John Wiley & Sons (1967).
[11] Vol.39, No.4, pp.1116–1126 (1998).
[12] k-, Vol.10, No.3, pp.454–463 (1995).
[13] l-, Vol.11, No.5, pp.804–808 (1996).
[14] MarcoPolo, Vol.12, No.1, pp.78–89 (1997).
[15] Kaelbling, L.P., Vol.12, No.6, pp.822–830 (1997).
[16] POMDPs, Vol.14, No.1, pp.148–156 (1999).