Bayesian Variable Order n-gram Language Model based on Hierarchical Pitman-Yor Processes


Daichi Mochihashi and Eiichiro Sumita
ATR Spoken Language Communication Research Laboratories / National Institute of Information and Communications Technology (NICT)
(Daichi Mochihashi is presently with NTT Communication Science Laboratories.)

This paper proposes a variable order n-gram language model by extending a recently proposed model based on the hierarchical Pitman-Yor processes. Introducing a stochastic process on an infinite depth prediction suffix tree, we can infer the hidden n-gram context from which each word originated. Experiments on standard large corpora showed validity and efficiency of the proposed model. Our architecture is also applicable to general Markov models to estimate their variable orders of generation.

1. Introduction

Since Shannon 1), n-gram models have been the standard statistical model of natural language: each word is predicted from the preceding (n-1) words, that is, language is treated as an (n-1)-th order Markov process. Because the number of possible contexts grows as V^{n-1} for a vocabulary of size V, the order has traditionally been fixed to a small value such as n = 3, with n = 4, 5 becoming common only recently. Real text, however, clearly contains dependencies longer than n, while for many words even n = 3 is unnecessarily long; the appropriate context length differs from word to word. This paper therefore treats the n-gram order itself as a hidden variable and estimates, for every word, the length of the context from which it was generated 2).

Variable-length context modeling has been studied before: variable n-grams 3) and entropy-based pruning 4) shrink a fixed-order model, while prediction suffix trees 5) and variable length Markov chains 6)7) grow contexts selectively, typically using heuristic or MDL-like criteria. Recently, Teh 10)11) proposed a hierarchical Bayesian n-gram language model based on the Pitman-Yor process, in which the n-gram distribution is smoothed recursively by the (n-1)-gram, ..., 2-gram, 1-gram and finally the 0-gram distribution. That model still assumes a single fixed order n; in this paper we extend it so that the order from which each word is generated is itself inferred.

The rest of this paper is organized as follows. Section 2 reviews the hierarchical Pitman-Yor n-gram language model. Section 3 introduces a stochastic process on a suffix tree of unbounded depth, which yields a variable order model. Section 4 combines it with LDA, Section 5 reports experiments, Section 6 gives discussion, and Section 7 concludes.

2. The Hierarchical Pitman-Yor n-gram Language Model

The hierarchical Pitman-Yor language model (HPYLM) 10)11) is a hierarchical Bayesian n-gram language model (for related work on n-gram smoothing and back-off, see also 4), 8), 9)).

[Fig. 1: Suffix tree representation of a hierarchical Chinese Restaurant Process. Each customer corresponds to a count in the model.]

HPYLM can be viewed as a Bayesian interpretation of Kneser-Ney smoothing 12): it is built on the two-parameter Pitman-Yor process 13) and its hierarchical extension 14), and is represented in practice by a hierarchy of Chinese Restaurant Processes (CRPs) 15). As shown in Fig. 1, the n-gram contexts are arranged on a prediction suffix tree 25): the root is the empty context ɛ, its children are one-word contexts, their children two-word contexts, and so on. To predict "sing" in "she will sing", for example, we follow the context backwards from ɛ through the nodes "will" and "she will", and p(sing | she will) is computed at the deepest node while being interpolated with the distributions of the shallower nodes ɛ and "will" on the path. Each count in Fig. 1 corresponds to a customer in the CRP (restaurant) of its node, and proxy customers sent to parent restaurants realize the recursive smoothing with the 1-gram, 2-gram, ... distributions.
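As a small illustration of this arrangement (a hypothetical sketch under assumed data structures, not code from the paper), looking up the node for a context amounts to walking from the root ɛ through the preceding words in reverse order:

    #include <string.h>

    #define MAX_CHILDREN 256            /* toy bound, for this sketch only */

    /* Hypothetical prediction-suffix-tree node: the root is the empty
     * context, its children are one-word contexts, their children
     * two-word contexts, and so on. */
    typedef struct node {
        const char  *word;              /* last word of this context */
        struct node *child[MAX_CHILDREN];
        int          nchild;
    } node;

    /* Walk from the root through w_{t-1}, w_{t-2}, ... and return the
     * deepest existing node, e.g. "she will" when predicting
     * p(sing | she will); the shallower nodes passed on the way give the
     * back-off distributions used for interpolation. */
    node *find_context(node *root, const char **history, int len)
    {
        node *cur = root;
        for (int i = len - 1; i >= 0; i--) {   /* w_{t-1} comes first */
            node *next = NULL;
            for (int c = 0; c < cur->nchild; c++)
                if (strcmp(cur->child[c]->word, history[i]) == 0)
                    next = cur->child[c];
            if (next == NULL)
                break;                         /* context not in the tree */
            cur = next;
        }
        return cur;
    }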

This hierarchy is what makes the model robust to sparse counts. For example, even if "she will like" never occurred in the training data while "he will like" did, p(like | she will) does not vanish: since c(like | she will) = 0, the 3-gram probability falls back on the 2-gram p(like | will), which is in turn smoothed by the 1-gram probability of "like", and these estimates are combined by interpolation.

Concretely, for a context h = w_{t-n} ... w_{t-1}, HPYLM gives the predictive probability

    p(w|h) = \frac{c(w|h) - d\,t_{hw}}{\theta + c(h)} + \frac{\theta + d\,t_{h\cdot}}{\theta + c(h)}\, p(w|h')    (1)

where c(w|h) is the number of times w followed h in the training data, c(h) = \sum_w c(w|h), and h' = w_{t-n+1} ... w_{t-1} is the context with its oldest word removed; t_{hw} is the number of tables serving w in the Chinese restaurant of h, t_{h\cdot} = \sum_w t_{hw}, and d, \theta are the discount and strength parameters of the Pitman-Yor process. The lower-order probability p(w|h') is computed by applying (1) recursively on the suffix tree; the recursion terminates at the 1-gram distribution, which is further interpolated with the uniform 0-gram distribution 1/V. The seating arrangements that determine t_{hw} are latent and are inferred by MCMC 16); see 10) for details. When t_{hw} is restricted to 1 whenever c(w|h) >= 1, (1) reduces to interpolated Kneser-Ney smoothing, so HPYLM can be regarded as a Bayesian generalization of Kneser-Ney.
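To make the recursion in (1) concrete, here is a minimal sketch (under hypothetical data structures, not the authors' implementation) of the predictive probability:

    /* Hypothetical restaurant for a context h over a vocabulary of size V.
     * parent is the restaurant of the shortened context h'; cw[w] = c(w|h),
     * tw[w] = t_{hw}; c = c(h), t = t_{h.}; d and theta are the Pitman-Yor
     * discount and strength parameters of this node. */
    typedef struct restaurant {
        struct restaurant *parent;
        double *cw, *tw;
        double  c, t;
        double  d, theta;
    } restaurant;

    /* Equation (1): p(w|h), recursing to the parent context h' and
     * terminating at the uniform 0-gram distribution 1/V. */
    double hpylm_predict(const restaurant *h, int w, int V)
    {
        if (h == NULL)                      /* above the root: 0-gram */
            return 1.0 / V;

        double base = hpylm_predict(h->parent, w, V);   /* p(w|h') */
        if (h->c == 0.0)                    /* empty restaurant: back off */
            return base;

        double z = h->theta + h->c;
        return (h->cw[w] - h->d * h->tw[w]) / z
             + (h->theta + h->d * h->t) / z * base;
    }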

[Fig. 2: A stochastic process to add a customer to the suffix tree. (1 - q_i) means the penetration probability of node i.]

In HPYLM the order n is fixed in advance, so every customer sits at depth (n - 1) of the suffix tree. In this paper we instead consider a suffix tree of unbounded depth and let each customer choose its depth stochastically, as illustrated in Fig. 2; the chosen depth is the n-gram order from which that word is generated.

3. Variable Order Pitman-Yor Language Model

3.1 A Stochastic Process on the Suffix Tree

To every node i of the suffix tree we attach a stopping probability q_i: a customer that reaches node i stops there with probability q_i and penetrates it, descending one level further, with probability (1 - q_i). Since q_i \in [0, 1], we give it a Beta prior

    q_i \sim \mathrm{Be}(\alpha, \beta)    (2)

where \mathrm{Be}(\alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, q^{\alpha-1} (1-q)^{\beta-1} and E[q] = \alpha/(\alpha+\beta). Fig. 3 shows this density for several settings of the hyperparameters (\alpha, \beta).

[Fig. 3: Examples of Beta distributions with different hyperparameters: (0.5, 0.5) (Jeffreys prior), (1, 1) (uniform), (4, 1) and (2, 4).]

To generate a word w_t whose context is h = ... w_{t-2} w_{t-1}, a customer enters the suffix tree at the root ɛ and descends along the path ɛ, "w_{t-1}", "w_{t-2} w_{t-1}", ..., deciding at every node whether to stop. Writing the stopping probabilities along this path as q_0, q_1, q_2, ..., the probability that the customer stops at depth l, i.e. that the n-gram order is l, is

    p(n = l \mid h) = q_l \prod_{i=0}^{l-1} (1 - q_i)    (3)

(see Fig. 2). Because each q_i belongs to a node rather than to a depth, the distribution (3) over n differs from context to context.

3.2 Inference of the n-gram Orders

The q_i are never observed, so we integrate them out and infer the hidden orders by Gibbs sampling. Let w = w_1 w_2 ... w_T be the training data, n = n_1 n_2 ... n_T the latent n-gram orders, and s = s_1 s_2 ... s_T the latent seating arrangements of the customers 10). The marginal probability of the data is

    p(w) = \sum_{n} \sum_{s} p(w, n, s).    (4)

For each word w_t in turn, we remove its customer and resample its order (and seating) from the conditional distribution

    n_t \sim p(n_t \mid w, n_{-t}, s_{-t}),    (5)

where the subscript -t denotes all variables except those of w_t. Let a_i be the number of customers that stopped at node i and b_i the number that penetrated it, both computed from n_{-t} and s_{-t}; integrating out q_i under the prior (2) gives its posterior expectation

    E[q_i] = \frac{a_i + \alpha}{a_i + b_i + \alpha + \beta}.    (6)

By Bayes' rule, (5) factors as

    p(n_t \mid w, n_{-t}, s_{-t}) \propto p(w_t \mid w_{-t}, n, s_{-t})\; p(n_t \mid w_{-t}, n_{-t}, s_{-t}),    (7)

where the first factor is the HPYLM predictive probability (1) evaluated at depth n_t, and the second factor follows from (3) and (6) as

    p(n_t = l \mid w_{-t}, n_{-t}, s_{-t}) = \frac{a_l + \alpha}{a_l + b_l + \alpha + \beta} \prod_{i=0}^{l-1} \frac{b_i + \beta}{a_i + b_i + \alpha + \beta}.    (8)

We call the resulting model the Variable order Pitman-Yor Language Model (VPYLM). Its Gibbs sampler is shown in Fig. 4: the depth at which the customer of w_t currently sits is recorded as order[t]; remove_customer removes that customer, decrementing by 1 the a_i of the node where it stopped and the b_i of every node it penetrated, and add_customer samples a new depth according to (7)(8), seats the customer there, increments the counts, and returns the depth. Unlike HPYLM, where every customer sits at the fixed depth (n - 1), the depth now varies from word to word; in practice a maximum depth n_max can be imposed when sampling n_t from (8).

3.3 Prediction

To predict a word w following a context h = ... w_2 w_1 (w_1 being the most recent word), we marginalize over the unknown order n:

    p(w \mid h) = \sum_n p(w, n \mid h)    (9)
                = \sum_n p(w \mid n, h)\, p(n \mid h),    (10)

where p(w | n, h) is given by (1) at depth n and p(n | h) by (3) with the expectations (6). Note that (2) and (3) define, along every path of the suffix tree, a process closely related to the stick-breaking construction of the Beta two-parameter process 15).
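To make the order distribution and the prediction concrete, here is a minimal sketch (under assumed inputs, not the paper's code). pw[l] is assumed to hold p(w | n = l, h) computed by (1) at depth l, and a[l], b[l] the stop and penetration counts of the nodes on the path for h; the same expression as (8) serves as p(n | h) in (10) when the counts cover the whole training data:

    /* p(n = l | h) with q_i integrated out under Be(alpha, beta), as in (8):
     * a[i] = customers that stopped at node i of the path for context h,
     * b[i] = customers that penetrated node i.                           */
    double order_prob(const double *a, const double *b, int l,
                      double alpha, double beta)
    {
        double p = (a[l] + alpha) / (a[l] + b[l] + alpha + beta);
        for (int i = 0; i < l; i++)
            p *= (b[i] + beta) / (a[i] + b[i] + alpha + beta);
        return p;
    }

    /* Equation (10): p(w|h) = sum_l p(w | n = l, h) * p(n = l | h),
     * truncated at a maximum depth n_max in this sketch.                 */
    double vpylm_predict(const double *pw, const double *a, const double *b,
                         int n_max, double alpha, double beta)
    {
        double p = 0.0;
        for (int l = 0; l <= n_max; l++)
            p += pw[l] * order_prob(a, b, l, alpha, beta);
        return p;
    }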

The overall Gibbs sampler of VPYLM is summarized in Fig. 4.

    for j = 1 .. N {
      for t = randperm(1 .. T) {
        if (j > 1) then remove_customer(order[t], w_t, w_{1:t-1});
        order[t] = add_customer(w_t, w_{1:t-1});
      }
    }

Fig. 4  The Gibbs sampler of VPYLM.

Inside add_customer, the depth n_t is drawn from (8) combined with the HPYLM predictive probability (1), as in (7). Just as HPYLM contains Kneser-Ney smoothing as a special case, VPYLM contains HPYLM as the special case in which every word uses the maximal depth; prediction by (10) instead averages over n = 0, 1, 2, .... The hidden variables n and s are obtained by repeating this Gibbs sweep over the training data N times.

3.4 Implementation

Each node of the suffix tree keeps its child nodes and the restaurants of its words in splay trees 18), which give amortized O(log n) access and automatically keep frequently accessed items near their roots; this matches the power-law distribution of word frequencies well (cf. the prediction suffix trees of 5)). Fig. 5 shows the data structure of an n-gram node.

    struct ngram {            /* n-gram node       */
        ngram *parent;        /* parent node       */
        splay *children;      /* = (ngram **)      */
        splay *words;         /* = (restaurant **) */
        int    stop;          /* a_h               */
        int    through;       /* b_h               */
        int    ncounts;       /* c(h)              */
        int    ntables;       /* t_h               */
        int    id;            /* word id           */
    };

Fig. 5  Data structure of an n-gram node.

4. An n-gram Topic Model

The model above adapts the length of the n-gram context; in this section we additionally combine it with a document-level topic model.

4.1 Latent Dirichlet Allocation (LDA)

In LDA 19), each document d has its own distribution \theta_d = (\theta_1, \theta_2, ..., \theta_M) over M topics, drawn from a symmetric Dirichlet prior

    \theta \sim \mathrm{Dir}(\gamma) = \frac{\Gamma(M\gamma)}{\Gamma(\gamma)^M} \prod_{t=1}^{M} \theta_t^{\gamma-1},    (11)

where \gamma is a hyperparameter. A document d = w_1 w_2 ... w_N is generated as follows:
(1) Draw \theta_d \sim \mathrm{Dir}(\gamma).
(2) For n = 1 ... N:
    a. Draw a topic t \sim \mathrm{Mult}(\theta_d).
    b. Draw a word w_n \sim p(w | t),
where p(w | t) is the word distribution of topic t 19). Inference is usually performed by collapsed Gibbs sampling, in which \theta_d is integrated out and the topic assignments follow a CRP-like predictive distribution (cf. 17)).

LDA thus characterizes each topic only by a unigram distribution p(w | t) and ignores word order. We instead give each topic its own VPYLM, so that every topic is characterized by a variable order n-gram distribution; the generative sketch below indicates exactly where the two models differ.
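For reference, step (2) of the LDA generative process can be sketched as follows (an illustration only: theta_d is taken as given, whereas step (1) would draw it from Dir(gamma), and phi[t] is a hypothetical placeholder for the topic-word distribution p(w | t)). In VPYLDA, described next, the unigram draw in step 2b is replaced by a draw from topic t's VPYLM:

    #include <stdlib.h>

    /* Draw one index from a discrete distribution p[0..K-1] (sums to 1). */
    static int sample(const double *p, int K)
    {
        double r = rand() / (RAND_MAX + 1.0);
        for (int k = 0; k < K; k++)
            if ((r -= p[k]) < 0.0)
                return k;
        return K - 1;                        /* guard against rounding error */
    }

    /* Step (2): generate N words of a document given its topic
     * distribution theta_d (length M) and topic-word distributions
     * phi[t][w] (M x V).  Word ids are written into words[0..N-1]. */
    void generate_document(const double *theta_d, double **phi,
                           int M, int V, int N, int *words)
    {
        for (int n = 0; n < N; n++) {
            int t    = sample(theta_d, M);   /* 2a: t   ~ Mult(theta_d)      */
            words[n] = sample(phi[t], V);    /* 2b: w_n ~ p(w | t); VPYLDA
                                                replaces this draw with the
                                                VPYLM prediction of topic t  */
        }
    }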

4.2 VPYLDA

We call the combination of VPYLM and LDA "VPYLDA"; its Gibbs sampler is shown in Fig. 6. In the draw_topic step, the topic k of the word w_t is sampled according to

    p(k \mid w_t, w_{1:t-1}) \propto p(w_t \mid w_{1:t-1}, k)\; p(k \mid \theta_d)    (12)
                             \propto p(w_t \mid w_{1:t-1}, k)\; (n^d_{-t,k} + \gamma),    (13)

where p(w_t | w_{1:t-1}, k) is the prediction (10) by the VPYLM of topic k, and n^d_{-t,k} is the number of words of document d currently assigned to topic k, excluding w_t. Given the sampled k, the n-gram order and seating of w_t are sampled in topic k's VPYLM exactly as before; the seat (a node of that topic's suffix tree) is recorded as table[t] and the topic as topic[t].

    for j = 1 .. N {
      for t = randperm(1 .. T) {
        d = document(w_t);  k = topic[t];
        if (j > 1) then remove_customer(table[t]);
        k = draw_topic(w_t, w_{1:t-1}, d);
        table[t] = add_customer(w_t, w_{1:t-1}, k);
        topic[t] = k;
      }
    }

Fig. 6  The Gibbs sampler of VPYLDA.

5. Experiments

5.1 VPYLM on Text Corpora

Data.  For English we follow 21)22) and use the WSJ portion of the NAB (North American Business News) corpus: 409,246 sentences and 10,007,108 words, with 10,000 sentences held out as test data. (On the Chinese Sinica corpus 20), the corresponding n = 5 perplexity was 43.39.) For Japanese we use 26,497 newspaper articles from 2000, about 520,000 sentences and 10,079,410 words segmented with MeCab 23), again holding out 10,000 sentences; words occurring fewer than 10 times were treated as unknown, giving a vocabulary of 32,783 words.

Table 1  Test-set perplexities and the number of nodes in the model on the English and Japanese corpora. N/A means a memory overflow.

    (a) English (NAB corpus)
     n      HPYLM     VPYLM     Nodes(H)    Nodes(V)
     3      113.60    113.74     1,417K      1,344K
     5      101.08    101.69    12,699K      7,466K
     7      N/A       100.68        N/A     10,182K
     8      N/A       100.58        N/A     10,434K
     ∞       -        117.65         -      10,392K

    (b) Japanese
     n      HPYLM     VPYLM     Nodes(H)    Nodes(V)
     3       78.06     78.22     1,341K      1,243K
     5       68.36     69.35    12,140K      6,705K
     7      N/A        68.63        N/A      9,134K
     8      N/A        68.60        N/A      9,490K
     ∞       -         80.38         -       9,561K

Settings.  For both HPYLM and VPYLM we ran N = 200 Gibbs iterations. The hyperparameters of the suffix tree prior were set to (\alpha, \beta) = (4, 1); the sensitivity to (\alpha, \beta), including the uniform setting (1, 1), is examined in Section 6. All experiments were run on a Xeon 3.2 GHz Linux machine with 4 GB of memory.

Results.  Table 1 shows the test-set perplexities and model sizes of HPYLM and VPYLM for each order n (n_max in the case of VPYLM). VPYLM attains perplexities comparable to HPYLM while using about 40% fewer nodes at n = 5. Moreover, HPYLM could not be trained for n = 7, 8 because of memory overflow, whereas VPYLM could, and its perplexity continues to improve. For reference, Modified Kneser-Ney smoothing (SRILM 24)) gave perplexities of 118.91, 107.99, 107.24 and 107.21 for n = 3, 5, 7 and 8 respectively, all worse than HPYLM and VPYLM.

The Japanese results in Table 1(b) show the same tendency. The additional cost of inferring n is modest: VPYLM training took only around 20% longer than HPYLM. Fig. 7 shows the global distribution of the n-gram orders inferred by the 8-gram VPYLM: most words are generated from contexts of about n = 3, 4, with a long tail of longer contexts.

[Fig. 7: Global n-gram context length distribution inferred by an 8-gram VPYLM. n = 0 is a unigram, n = 1 is a bigram, and so on. Each n-gram is further interpolated by lower-order n-grams recursively through equation (1).]

5.2 Stochastic Phrases on the Suffix Tree

Using the joint probability p(w, n | h) of (9), we can compute for any node of the suffix tree the probability that a word w is generated from exactly the n preceding words w_n ... w_1 of its context h = ... w_2 w_1. When this probability is high, the word sequence w_n ... w_1 w behaves as a single phrase: for example, "america" is generated almost deterministically from the context "the united states of", so that "the united states of america" forms a stochastic phrase. Fig. 8 lists phrases with high p(w, n) computed from the suffix tree of the 8-gram VPYLM trained on the NAB corpus ("^" denotes the beginning of a sentence).

    p(w, n)   Stochastic phrases in the suffix tree
    0.9784    primary new issues
    0.9726    ^ at the same time
    0.9556    american telephone &
    0.9512    is a unit of
    0.9394    to # % from # %
    0.9026    from # % in # to # %
    0.8896    in a number of
    0.8831    in new york stock exchange composite trading
    0.8696    a merrill lynch & co.
    0.7566    mechanism of the european monetary
    0.7134    increase as a result of
    0.6617    tiffany & co.
      ...

Fig. 8  Stochastic phrases computed from the suffix tree of the 8-gram VPYLM trained on the NAB corpus.

5.3 VPYLDA on 20news-18828

Data.  We evaluated the n-gram topic model VPYLDA on the 20news-18828 dataset 26), which consists of postings from 20 newsgroups. We used 4,000 documents for training and 400 documents for evaluation; the training set is much smaller than NAB, containing 801,580 words with a vocabulary of 10,477 words. Inference used N = 250 Gibbs iterations. A test document h = w_1 ... w_{n-1} w_n ... is evaluated by predicting each word as

    p(w_n \mid h) = \sum_t p(w_n \mid h, t)\, p(t \mid h) \quad (n = 1 ... N),    (14)

where p(w_n | h, t) is the prediction (10) by the VPYLM of topic t, and p(t | h) is the topic distribution given the history, estimated by EM in the same way as in LDA 19).
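As one concrete possibility for this estimation (a sketch under assumed inputs, not necessarily the exact procedure used in the paper), the mixture weights p(t | h) can be fit by a few EM iterations while keeping each topic's VPYLM predictions fixed:

    #define M 20          /* number of topics, assumed for this sketch */

    /* EM estimation of the mixture weights p(t|h) of equation (14) for a
     * test history of `len` words, with pw[i*M + t] = p(w_i | w_{1:i-1}, t)
     * under the VPYLM of topic t.  The result is then used as
     * p(w_n | h) = sum_t pw[n*M + t] * weight[t].                        */
    void estimate_topic_weights(const double *pw, int len,
                                double *weight, int iters)
    {
        for (int t = 0; t < M; t++)
            weight[t] = 1.0 / M;                 /* uniform initialization */

        for (int it = 0; it < iters; it++) {
            double new_w[M] = { 0.0 };
            for (int i = 0; i < len; i++) {      /* E-step: responsibilities */
                double r[M], z = 0.0;
                for (int t = 0; t < M; t++)
                    z += (r[t] = weight[t] * pw[i * M + t]);
                for (int t = 0; t < M; t++)
                    new_w[t] += r[t] / z;
            }
            for (int t = 0; t < M; t++)          /* M-step: renormalize */
                weight[t] = new_w[t] / len;
        }
    }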

Results.  Table 2 shows the test-set perplexities of VPYLDA with M = 5, 10 and 20 topics, together with that of the plain VPYLM (with M = 1 the CRP over topics is degenerate and VPYLDA reduces to VPYLM). Increasing the number of topics consistently lowers the perplexity below that of VPYLM, although the improvement on this small dataset is modest.

Table 2  VPYLDA test-set perplexities on the 20news-18828 dataset.

    Model              PPL
    VPYLDA (M = 5)     104.69
    VPYLDA (M = 10)    103.57
    VPYLDA (M = 20)    103.28
    VPYLM              105.30

6. Discussion

Modeling contexts of variable length has also been studied as the Context Tree Weighting method 27) in data compression, as the prediction suffix trees of Pereira et al. 5) in language modeling, and in PPM*-style compressors. These methods mix or escape between context lengths with fixed schemes such as Witten-Bell smoothing, whereas the proposed model places an explicit generative process on the suffix tree based on the Pitman-Yor process, so that the distribution over n-gram orders is itself learned from the data.

The hyperparameters (\alpha, \beta) of the stopping prior control the prior preference over context lengths: through (6), the setting (\alpha, \beta) = (4, 1) used in Section 5 corresponds to a prior mean stopping probability E[q_i] = 0.8 at every node. They can also be estimated from the data, for example by Newton iterations 28); on 1M words of the NAB corpus this gave (\alpha, \beta) of roughly (0.85, 0.57). Fig. 9 plots the test-set perplexity of VPYLM over the range \alpha, \beta \in (0.1, 10); within this range the perplexity stays between about 124 and 136, so the model is not very sensitive to (\alpha, \beta).

[Fig. 9: Relationship between the VPYLM hyperparameters (\alpha, \beta) and test-set perplexities.]

Fig. 10 shows sentences generated by a random walk from the 5-gram VPYLM trained on the novel Maihime: at each step an n-gram order is sampled according to (8) and the next word is drawn from the node at that depth. Which continuation is appropriate naturally depends on the text the model is trained on, as in "mixture of Gaussians" versus "mixture of flour".

[Fig. 10: Random walk generation from the 5-gram VPYLM trained on Maihime.]

7. Conclusion

We have proposed VPYLM, a variable order n-gram language model based on the hierarchical Pitman-Yor process. By introducing a stochastic process on a suffix tree of unbounded depth, the hidden n-gram order from which each word was generated can be inferred by Gibbs sampling. Experiments on standard large corpora in English and Japanese showed that the model reaches perplexities comparable to or better than the fixed-order HPYLM with substantially fewer nodes, and that it yields useful by-products such as stochastic phrases and per-word context lengths. The architecture is not specific to language: it applies to general Markov models whose orders of generation are to be estimated. Further study of the stochastic process on the suffix tree, in connection with combinatorial stochastic processes 29), is left for future work.

References

1) Shannon, C. E.: A Mathematical Theory of Communication, Bell System Technical Journal, Vol.27, pp.379-423, 623-656 (1948).
2) NLP-2005 (2005).
3) Siu, M. and Ostendorf, M.: Variable n-grams and Extensions for Conversational Speech Language Modeling, IEEE Trans. on Speech and Audio Processing, Vol.8, pp.63-75 (2000).
4) Stolcke, A.: Entropy-based Pruning of Backoff Language Models, Proc. of DARPA Broadcast News Transcription and Understanding Workshop, pp.270-274 (1998).
5) Pereira, F., Singer, Y. and Tishby, N.: Beyond Word N-grams, Proc. of the Third Workshop on Very Large Corpora, pp.95-106 (1995).
6) Leonardi, F. G.: A Generalization of the PST Algorithm: Modeling the Sparse Nature of Protein Sequences, Bioinformatics, Vol.22, No.11, pp.1302-1307 (2006).
7) Buhlmann, P. and Wyner, A. J.: Variable Length Markov Chains, The Annals of Statistics, Vol.27, No.2, pp.480-513 (1999).
8) Kawabata, T. and Tamoto, M.: Back-off Method for N-gram Smoothing based on Binomial Posteriori Distribution, ICASSP-96, Vol.I, pp.192-195 (1996).
9) Cowans, P.: Probabilistic Document Modelling, PhD Thesis, University of Cambridge (2006). http://www.inference.phy.cam.ac.uk/pjc51/thesis/index.html
10) Teh, Y. W.: A Bayesian Interpretation of Interpolated Kneser-Ney, Technical Report TRA2/06, School of Computing, NUS (2006).
11) Teh, Y. W.: A Hierarchical Bayesian Language Model based on Pitman-Yor Processes, Proc. of COLING/ACL 2006, pp.985-992 (2006).
12) Kneser, R. and Ney, H.: Improved Backing-off for m-gram Language Modeling, Proceedings of ICASSP, Vol.1, pp.181-184 (1995).
13) Pitman, J. and Yor, M.: The Two-Parameter Poisson-Dirichlet Distribution Derived from a Stable Subordinator, Annals of Probability, Vol.25, No.2, pp.855-900 (1997).
14) Teh, Y. W., Jordan, M. I., Beal, M. J. and Blei, D. M.: Hierarchical Dirichlet Processes, Technical Report 653, Department of Statistics, University of California at Berkeley (2004).
15) Ghahramani, Z.: Non-parametric Bayesian Methods, UAI 2005 Tutorial (2005). http://learning.eng.cam.ac.uk/zoubin/talks/uai05tutorial-b.pdf
16) Gilks, W. R., Richardson, S. and Spiegelhalter, D. J.: Markov Chain Monte Carlo in Practice, Chapman & Hall / CRC (1996).
17) Daumé III, H.: Fast Search for Dirichlet Process Mixture Models, AISTATS 2007 (2007).
18) Sleator, D. and Tarjan, R.: Self-Adjusting Binary Search Trees, JACM, Vol.32, No.3, pp.652-686 (1985).
19) Blei, D. M., Ng, A. Y. and Jordan, M. I.: Latent Dirichlet Allocation, Journal of Machine Learning Research, Vol.3, pp.993-1022 (2003).
20) Huang, C.-R. and Chen, K.-j.: A Chinese Corpus for Linguistics Research, Proc. of COLING 1992, pp.1214-1217 (1992).
21) Chen, S. F. and Goodman, J.: An Empirical Study of Smoothing Techniques for Language Modeling, Proc. of ACL 1996, pp.310-318 (1996).

22) Goodman, J. T.: A Bit of Progress in Language Modeling, Extended Version, Technical Report MSR-TR-2001-72, Microsoft Research (2001).
23) Kudo, T.: MeCab: Yet Another Part-of-Speech and Morphological Analyzer. http://mecab.sourceforge.net/
24) Stolcke, A.: SRILM - An Extensible Language Modeling Toolkit, Proc. of ICSLP, Vol.2, pp.901-904 (2002).
25) Ron, D., Singer, Y. and Tishby, N.: The Power of Amnesia, Advances in Neural Information Processing Systems, Vol.6, pp.176-183 (1994).
26) Lang, K.: Newsweeder: Learning to Filter Netnews, Proceedings of the Twelfth International Conference on Machine Learning, pp.331-339 (1995).
27) Willems, F., Shtarkov, Y. and Tjalkens, T.: The Context-Tree Weighting Method: Basic Properties, IEEE Trans. on Information Theory, Vol.41, pp.653-664 (1995).
28) Minka, T. P.: Estimating a Dirichlet Distribution (2000). http://research.microsoft.com/~minka/papers/dirichlet/
29) Pitman, J.: Combinatorial Stochastic Processes, Technical Report 621, Department of Statistics, University of California, Berkeley (2002).

Daichi Mochihashi was born in 1973 and received his Ph.D. in 2005. He joined ATR in 2003 and has been with NTT Communication Science Laboratories since 2007.

Eiichiro Sumita graduated in 1982 and received his Ph.D. in 1999. He has been engaged in research at ATR and NICT. He is a member of IEICE, the Association for Natural Language Processing, ASJ, ACL and IEEE, and serves as an Associate Editor of ACM TSLP.