The partial derivatives of their Kullback-Leibler divergence are given by

Σχετικά έγγραφα
1. For each of the following power series, find the interval of convergence and the radius of convergence:

Homework for 1/27 Due 2/5

Solutions: Homework 3

Last Lecture. Biostatistics Statistical Inference Lecture 19 Likelihood Ratio Test. Example of Hypothesis Testing.

Ψηφιακή Επεξεργασία Εικόνας

A study on generalized absolute summability factors for a triangular matrix


α β

FREE VIBRATION OF A SINGLE-DEGREE-OF-FREEDOM SYSTEM Revision B

MATH 38061/MATH48061/MATH68061: MULTIVARIATE STATISTICS Solutions to Problems on Matrix Algebra

Μια εισαγωγή στα Μαθηματικά για Οικονομολόγους

DESIGN OF MACHINERY SOLUTION MANUAL h in h 4 0.

SUPERPOSITION, MEASUREMENT, NORMALIZATION, EXPECTATION VALUES. Reading: QM course packet Ch 5 up to 5.6

w o = R 1 p. (1) R = p =. = 1

Solution Series 9. i=1 x i and i=1 x i.

Econ 2110: Fall 2008 Suggested Solutions to Problem Set 8 questions or comments to Dan Fetter 1

Example Sheet 3 Solutions

IIT JEE (2013) (Trigonomtery 1) Solutions

CHAPTER 103 EVEN AND ODD FUNCTIONS AND HALF-RANGE FOURIER SERIES

Homework 4.1 Solutions Math 5110/6830

C.S. 430 Assignment 6, Sample Solutions

6.1. Dirac Equation. Hamiltonian. Dirac Eq.

Matrices and Determinants

Finite Field Problems: Solutions

Diane Hu LDA for Audio Music April 12, 2010

Other Test Constructions: Likelihood Ratio & Bayes Tests

Partial Differential Equations in Biology The boundary element method. March 26, 2013

Degenerate Perturbation Theory

Bessel function for complex variable

2. THEORY OF EQUATIONS. PREVIOUS EAMCET Bits.

Statistical Inference I Locally most powerful tests

CE 530 Molecular Simulation

Στα επόμενα θεωρούμε ότι όλα συμβαίνουν σε ένα χώρο πιθανότητας ( Ω,,P) Modes of convergence: Οι τρόποι σύγκλισης μιας ακολουθίας τ.μ.

The Heisenberg Uncertainty Principle

Lecture 17: Minimum Variance Unbiased (MVUB) Estimators

Outline. M/M/1 Queue (infinite buffer) M/M/1/N (finite buffer) Networks of M/M/1 Queues M/G/1 Priority Queue

SOLUTIONS TO MATH38181 EXTREME VALUES AND FINANCIAL RISK EXAM

CHAPTER 25 SOLVING EQUATIONS BY ITERATIVE METHODS

Srednicki Chapter 55

Introduction of Numerical Analysis #03 TAGAMI, Daisuke (IMI, Kyushu University)

The Neutrix Product of the Distributions r. x λ

Solve the difference equation

Second Order Partial Differential Equations

Solutions to Exercise Sheet 5

p n r

Bayesian statistics. DS GA 1002 Probability and Statistics for Data Science.

3.4 SUM AND DIFFERENCE FORMULAS. NOTE: cos(α+β) cos α + cos β cos(α-β) cos α -cos β

Homework 3 Solutions

On Generating Relations of Some Triple. Hypergeometric Functions

Section 8.3 Trigonometric Equations

EE512: Error Control Coding

SOLUTIONS TO MATH38181 EXTREME VALUES AND FINANCIAL RISK EXAM

Tridiagonal matrices. Gérard MEURANT. October, 2008

Exercises to Statistics of Material Fatigue No. 5

Phys460.nb Solution for the t-dependent Schrodinger s equation How did we find the solution? (not required)

Ordinal Arithmetic: Addition, Multiplication, Exponentiation and Limit

LAD Estimation for Time Series Models With Finite and Infinite Variance

n r f ( n-r ) () x g () r () x (1.1) = Σ g() x = Σ n f < -n+ r> g () r -n + r dx r dx n + ( -n,m) dx -n n+1 1 -n -1 + ( -n,n+1)

Math221: HW# 1 solutions

ST5224: Advanced Statistical Theory II

Empirical best prediction under area-level Poisson mixed models

4.6 Autoregressive Moving Average Model ARMA(1,1)

Approximation of distance between locations on earth given by latitude and longitude

6.3 Forecasting ARMA processes

EN40: Dynamics and Vibrations

SCITECH Volume 13, Issue 2 RESEARCH ORGANISATION Published online: March 29, 2018

Areas and Lengths in Polar Coordinates

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 19/5/2007

derivation of the Laplacian from rectangular to spherical coordinates

On the Galois Group of Linear Difference-Differential Equations

Reminders: linear functions

Lecture 2: Dirac notation and a review of linear algebra Read Sakurai chapter 1, Baym chatper 3

Problem Set 9 Solutions. θ + 1. θ 2 + cotθ ( ) sinθ e iφ is an eigenfunction of the ˆ L 2 operator. / θ 2. φ 2. sin 2 θ φ 2. ( ) = e iφ. = e iφ cosθ.

Chapter 3: Ordinal Numbers

Second Order RLC Filters

CRASH COURSE IN PRECALCULUS

The Simply Typed Lambda Calculus

2 Composition. Invertible Mappings

Jesse Maassen and Mark Lundstrom Purdue University November 25, 2013

Aquinas College. Edexcel Mathematical formulae and statistics tables DO NOT WRITE ON THIS BOOKLET

HW 3 Solutions 1. a) I use the auto.arima R function to search over models using AIC and decide on an ARMA(3,1)

SCHOOL OF MATHEMATICAL SCIENCES G11LMA Linear Mathematics Examination Solutions

5. Choice under Uncertainty

1. Matrix Algebra and Linear Economic Models

APPENDIX A DERIVATION OF JOINT FAILURE DENSITIES

b. Use the parametrization from (a) to compute the area of S a as S a ds. Be sure to substitute for ds!

HOMEWORK#1. t E(x) = 1 λ = (b) Find the median lifetime of a randomly selected light bulb. Answer:

Homework 8 Model Solution Section

Areas and Lengths in Polar Coordinates

Inverse trigonometric functions & General Solution of Trigonometric Equations

INTEGRATION OF THE NORMAL DISTRIBUTION CURVE

( )( ) ( ) ( )( ) ( )( ) β = Chapter 5 Exercise Problems EX α So 49 β 199 EX EX EX5.4 EX5.5. (a)

Quadratic Expressions

D Alembert s Solution to the Wave Equation

k A = [k, k]( )[a 1, a 2 ] = [ka 1,ka 2 ] 4For the division of two intervals of confidence in R +

2. Μηχανικό Μαύρο Κουτί: κύλινδρος με μια μπάλα μέσα σε αυτόν.

Biorthogonal Wavelets and Filter Banks via PFFS. Multiresolution Analysis (MRA) subspaces V j, and wavelet subspaces W j. f X n f, τ n φ τ n φ.

Every set of first-order formulas is equivalent to an independent set

The Equivalence Theorem in Optimal Design

Problem Set 3: Solutions

Transcript:

1 Gradiet of Kullbac-Leibler divergece Let λ ad λ be two sets of atural paraeters of a expoetial faily, that is, qβ; λ = hβ exp λ tβ aλ. 1 The partial derivatives of their Kullbac-Leibler divergece are give by λ D KLλ, λ = ] λ E qβ; λ λ log qβ; λ = λ E λ λ λ tβ aλ + aλ ] 3 where we have used the expoetial faily idetities λ aλ = E λtβ], Asyptotic behavior of trust-regio ethod = λ λ λ E λ tβ] + E λ tβ] aλ λ 4 = λ λ λ aλ + λ aλ aλ λ 5 = λ λ Iλ, 6 aλ = Iλ. 7 λ Here we show that the trust-regio update give below coverges to a atural gradiet step. dλ = argax dλ {L λ + dλ ξd KL λ + dλ, λ} 8 For large ξ, dλ will be close to zero so that we ca focus o the target fuctios first-order ters. For expoetial failies i caoical for, we have ] qβ; λ + dλ D KL λ + dλ, λ = E λ+dλ log 9 qβ; λ The gradiet of the KL divergece i dλ is thus give by = E λ+dλ dλ tβ aλ + dλ + aλ ] 10 = dλ aλ + dλ aλ + dλ + aλ. 11 D KL λ + dλ, λ = aλ + dλ + aλ + dλdλ aλ + dλ 1 Approxiatig the target fuctio aroud λ yields = aλ + dλdλ. 13 dλ L λ ξdλ aλdλ, 14 which whe axiized gives a update proportioal to the atural gradiet directio, dλ = 1 aλ L λ. 15 ξ 1

3 Latet Dirichlet Allocatio I LDA we have global paraeters β cosistig of distributios over words β ad local paraeters θ, z with distributios pβ = Dirβ ; η, 16 pθ = Dirθ ; α, 17 pz θ = θ z, 18 px z, β = β zx. 19 We approxiate the posterior distributio over β with qβ = Dirβ ; λ. 0 Writig the lielihood i the for of Equatio of the ai paper gives px, z, θ β = Dirθ; α θ z β zx 1 = hθ, z exp log β zx = hθ, z exp tβ, fz, x 3 where h ecopasses all ters which do ot deped o β ad tβ = log β, 4 fz, x = I zx, 5 where I ij is a atrix with etry i, j set to 1 ad all other etries set to 0. We here assue that β is a atrix whose rows are the topics β ad A, = veca vec = tra 6 for atrices A ad. Sice the stadard paraetrizatio of the Dirichlet distributio is already i caoical for, we ca iediately apply Equatio 10 of the ai paper to get the steps of the ier loop of the trustregio update, λ = 1 ρ t λ t + ρ t η + NE fx, z ]. 7 The beliefs over z i.e., are coputed i the usual aer lei et al., 003].

4 Mixture odels Cosider a ixture odel where the local paraeters are the cluster assigets ad the global paraeters are the prior weights π ad the copoets paraeters β. We assue the followig ore specific for for the odel, pπ = Dirπ; α, 8 p π = π, 9 pβ hβ exp η tβ, 30 px, β = gx exp tβ fx 31 for suitable futios t, f, g, h. The factors of a ea-field approxiatio to the posterior are give by qπ = Dirπ; γ, 3 q = φ, 33 qβ hβ exp λ tβ. 34 I each iteratio, the trust-regio ethod alterates betwee coputig the optial φ ad updatig γ ad λ. The approxiate posterior over ixture copoets, slightly abusig otatio, is give by where for the expected log-lielihood we have Oce is coputed, we update γ ad λ via = argax L λ, γ, φ 35 φ exp E q log px, βp π] 36 = exp E q log px, β] exp ψγ ψ γ 37 E q log px, β] = E q tβ ] fx + log gx. 38 γ = 1 ρ t γ t + ρ t α + N, 39 λ = 1 ρ t λ t + ρ t η + N fx. 40 For ii-batches of size, these updates becoe γ = 1 ρ t γ t + ρ t α + N, 41 λ = 1 ρ t λ t + ρ t η + N fx. 4 To use these results with ay cocrete ixture odel, we have to write dow the prior over β i caoical for ad ipleet the expected log-lielihood Equatio 38 ad the update i Equatio 4. 3

4.1 Mixture of ultivariate eroullis For the ultivariate eroulli odel, px, β = i β xi i 1 β i 1 xi, 43 we assue beta distributios for the prior ad approxiate posterior, pβ β a i 1 β i b, 44 qβ β a i i 1 β i bi. 45 λ = a, b are already the atural paraeters of the beta distributio, so that the updates Equatio 4 becoe a = 1 ρ t a t + ρ t a + N x i, 46 b = 1 ρ t b t + ρ t b + N 1 x i. 47 The expected log-lielihood eeded for the coputatio of is give by E q log px, β] = x ψa + 1 x ψb 1 ψa + b, 48 where the digaa fuctio ψ is applied poit-wise ad 1 is a vector of oes. 4. Mixture of Gaussias We assue a oral-iverse-wishart distributio NIW for the paraeters β = µ, Σ of each Gaussia distributio. The NIW for a sige copoet is give by pµ, Σ exp s µ Σ µ exp 1 trψσ Σ ν+d+1 49 = exp s µ Σ µ + sµ Σ 1 trs Σ 1 trψσ ν D + 1 log Σ log Σ 50 where = exp η, s, Ψ, ν tµ, Σ 51 η, s, Ψ, ν = s, s, vecs + Ψ, ν 5 = s, b, vec C, ν = η, 53 tµ, Σ = 1 µ Σ µ, Σ µ, vec Σ, log Σ. 54 4

are the atural paraeters ad sufficiet statistics of the distributio, respectively. The lielihood for a sigle data poit x is give by px µ, Σ exp 1 x µ Σ x µ / Σ 1 55 = exp 1 tr Σ xx + x Σ µ 1 µ Σ µ 1 log Σ 56 where = exp tµ, Σ fx, 57 Hece, assuig the atural paraeters of all copoets are give by fx = 1, x, vec xx, 1. 58 η = η, s, Ψ, ν = s, b, vec C, ν, 59 a update of the ier loop of the trust-regio ethod Equatio 4 is give by s = 1 ρ t s t + ρ t s + N, 60 b = 1 ρ t b t + ρ t b N x, 61 C = 1 ρ t C t + ρ t C + N x x, 6 ν = 1 ρ t ν t + ρ t ν + N. 63 Or, i ters of the ore traditioal paraetrizatio, = 1 ρ t st t + 1 ρ t s + N s s x, 64 Ψ = 1 ρ t Ψ t + s t t t + ρ t Ψ + s + N x x s. 65 A proof that these updates leave Ψ positive defiite is give below. Positive defiiteess of Ψ We show positive defiiteess of Ψ i two steps. First, we show that the costrait is fulfilled for ρ t = 0 ad ρ t = 1. Secod, we show that the set of valid atural paraeters iduced by the costrait is covex, iplyig that the costrait ust be fulfilled for ay liear iterpolatio of two atural paraeters. 5

For ρ t = 0, we have Ψ = Ψ t sice oe of the atural paraeters has chaged. Thus, Ψ is positive defiite if Ψ t is positive defiite. For ρ t = 1, we have Ψ = Ψ + s + N x x s, 66 s = 1 s + N s x s + N x, 67 s = s + N. 68 Note that p 0 = s/s ad p = N φ /s are positive ad su to oe ad therefore ca be cosidered probabilities. Let X be a rado variable which taes o value with probability p 0 ad value x with probability p. The we ca rewrite Equatio 66 as Ψ = Ψ + s E p XX ] s E p X]E p X] = Ψ + s VX]. 69 Sice Ψ is positive defiite ad the covariace atrix VX] is at least positive sei-defiite, Ψ ust be positive defiite. We ext show that the set of valid atural paraeters, is covex. Not that this set is covex iff is covex. This set is covex iff {s, b, C, ν : s > 0, ν > D 1, C 1 4s bb is p. d. }, 70 {s, b, C : s > 0, C 1 s bb is p. d. }, 71 {s, b, C : s > 0, C bb is p. d. }, 7 is covex, sice ay perspective fuctio preserves covexity ad P s, b, C = s, b/s, C is a perspective fuctio. Fially, this set is covex iff the followig set is covex, Assue b 1, C 1, b, C Ω ad let ρ 0, 1]. The Ω = {b, C : C bb is p. d. }. 73 ρc 1 + 1 ρc ρb 1 + 1 ρb ρb 1 + 1 ρb 74 = ρc 1 + 1 ρc ρb 1 b + b b 1 + 1 ρb b 1 75 = ρc 1 b 1 b 1 + 1 ρc b b + ρ1 ρb 1 b b 1 b, 76 which is a su of positive defiite ad sei-defiite atrices ad therefore positive defiite. Hece, ad Ω is covex. ρc 1 + 1 ρc, ρb 1 + 1 ρb Ω 77 6

Expected log-lielihood To copute he expected log-lielihood Equatio 38, we eed E q µ Σ µ ] = Eq Eq µ Σ µ ]] Σ 78 = E q tr s Σ Σ + Σ ] 79 = Ds + ν Ψ, 80 E q Σ µ ] x = ν Ψ x, 81 x ] E q Σ x = ν x Ψ x, 8 D E q log Σ ] ν + 1 i = ψ + D log + log Ψ. 83 i=1 For Equatio 79, see Peterse ad Pederse 008]. For Equatio 83, see ishop 006]. The expected loglielihood is thus give by E q log px, β] = E q 1 x Σ x + xσ µ 1 µ Σ µ + 1 log Σ D ] logπ 84 = ν x Ψ x + ν x Ψ D s ν Ψ 85 + 1 D ν + 1 i ψ + D log log Ψ D logπ 86 i=1 = ν x Ψ x D s + 1 D ν + 1 i ψ + 1 log Ψ D log π. i=1 87 Refereces C. M. ishop. Patter Recogitio ad Machie Learig. Spriger, 006. D. M. lei, A. Y. Ng, ad M. I. Jorda. Latet Dirichlet Allocatio. Joural of Machie Learig Research, 3: 993 10, 003. K. Peterse ad M. Pederse. The Matrix Cooboo, 008. 7