Topic Modeling with Latent Dirichlet Allocation

Transcript:

Topic Modeling with Latent Dirichlet Allocation
Vineet Mehta, University of Massachusetts Lowell

Contents
1 Introduction
2 Preliminaries
3 Modeling Text with Latent Dirichlet Allocation
4 Parameter Estimation


Section 1: Introduction

Background

A number of techniques for text analysis and information retrieval have been developed over the past decades. This presentation focuses on one such technique, Latent Dirichlet Allocation (LDA), introduced by David Blei, Andrew Ng, and Michael Jordan in a 2003 paper in the Journal of Machine Learning Research. Since its introduction, LDA has been employed in applications well beyond text analysis, and it has seen a number of extensions.

Topic Modeling

LDA aims to organize large collections of documents through statistical relationships among words, known as topics. Topics are distributions over a vocabulary of words, and LDA employs Bayesian inference to estimate them. The LDA approach to text analysis assumes no knowledge of language structure: documents in a collection are treated as bags of words.

Applications: A Small Sampling
- Exploring scientific, political, and Wikipedia articles
- Audio information retrieval using acoustic features
- Image segmentation using visual features
- Identifying surprising events in video data
- Analysis of stock categories using financial topic models
- Development of user recommendation systems in social media
- Topic models for gene expression analysis
- Analysis of Twitter data for public health status and trends

Finding Out More (a really small sampling... so just google it!)

Workshops:
- Topic Models: Computation, Application, and Evaluation (NIPS 2013)
- Applications for Topic Models: Text and Beyond (NIPS 2009)
- Workshop on Topic Models: Structure, Applications, Evaluation, and Extensions (ICML 2010)
- Topic Modeling for Humanities Research (MITH 2012)

Topic Modeling Software:
- lda-c: C code by David Blei
- lda: R-language package on CRAN
- gensim: Python package that includes Latent Dirichlet Allocation
- mallet: Java machine learning package, including topic modeling
- topictoolbox: Matlab toolbox from UCI

Data:
- UCI machine learning repository: http://archive.ics.uci.edu/ml/datasets.html
- infochimps: http://www.infochimps.com
- Enron dataset: https://www.cs.cmu.edu/~enron/
- LDC (not free): http://catalog.ldc.upenn.edu
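
Of the software above, gensim is perhaps the quickest to try. Below is a minimal sketch, assuming a toy three-document corpus; note that gensim's LdaModel fits the model by online variational Bayes rather than the collapsed Gibbs sampler derived later in these slides.

```python
# Minimal gensim LDA sketch; the toy corpus below is an illustrative assumption.
from gensim import corpora, models

texts = [
    ["topic", "model", "word", "distribution"],
    ["gibbs", "sampling", "posterior", "inference"],
    ["word", "document", "corpus", "topic"],
]
dictionary = corpora.Dictionary(texts)            # map each word to an integer id
corpus = [dictionary.doc2bow(t) for t in texts]   # bag-of-words counts per document
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
print(lda.print_topics())                         # top words in each learned topic
```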

Section 2: Preliminaries

Bayes' Theorem

Consider the dataset $X_N^{(K)} = \{x_1, \dots, x_N\}$ consisting of $N$ samples, where $x_i = [x_{1,i}, \dots, x_{K,i}]^T$ is a sample from the random vector $X_i^{(K)}$. Each $X_i^{(K)}$ has a distribution $p(x_i^{(K)} \mid \theta)$ parameterized by $\theta$, and the $\{X_i^{(K)}\}$ are independent and identically distributed.

Bayes' theorem:
$$p(\theta \mid X_N^{(K)}) = \frac{p(X_N^{(K)} \mid \theta)\, p(\theta)}{p(X_N^{(K)})} \qquad \text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$
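
Before introducing conjugate priors, it helps to see the theorem applied numerically. The sketch below (a toy example, not from the slides) evaluates the posterior of a Bernoulli parameter on a discrete grid:

```python
import numpy as np

# Toy Bayes update for a Bernoulli parameter theta on a discrete grid.
theta = np.linspace(0.01, 0.99, 99)       # candidate parameter values
prior = np.ones_like(theta) / theta.size  # flat prior p(theta)
data = np.array([1, 0, 1, 1, 0, 1])       # assumed observations
n1, n0 = data.sum(), (1 - data).sum()
likelihood = theta**n1 * (1 - theta)**n0  # i.i.d. Bernoulli likelihood
posterior = likelihood * prior
posterior /= posterior.sum()              # normalize by the evidence
print(theta[np.argmax(posterior)])        # posterior mode, near n1/N = 4/6
```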

Key Distributions: Univariate Case

Binomial distribution:
$$\mathrm{Bin}(x \mid \theta, N) \equiv p(X = x \mid \theta, N) = \binom{N}{x}\, \theta^x (1-\theta)^{N-x}, \qquad x \in \{0, 1, \dots, N\}$$

Bernoulli distribution:
$$\mathrm{Bern}(x \mid \theta) \equiv p(X = x \mid \theta) = \theta^x (1-\theta)^{1-x}, \qquad x \in \{0, 1\}$$

Likelihood of N Bernoulli observations:
$$p(X_N^{(1)} \mid \theta) = \prod_{i=1}^{N} \theta^{I(x_i = 1)} (1-\theta)^{I(x_i = 0)} = \theta^{n_1} (1-\theta)^{n_0}, \qquad N = n_0 + n_1$$
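
A quick numerical check of these formulas, assuming toy values for θ, N, and x and using scipy for comparison:

```python
import numpy as np
from math import comb
from scipy.stats import binom, bernoulli

theta, N, x = 0.3, 10, 4
manual = comb(N, x) * theta**x * (1 - theta)**(N - x)  # binomial pmf by hand
assert np.isclose(manual, binom.pmf(x, N, theta))      # matches scipy

data = bernoulli.rvs(theta, size=100, random_state=0)  # N Bernoulli observations
n1 = int(data.sum()); n0 = len(data) - n1
loglik = n1 * np.log(theta) + n0 * np.log(1 - theta)   # log of theta^n1 (1-theta)^n0
print(n1, n0, loglik)
```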

Key Distributions: Multivariate Case

Multinomial distribution:
$$\mathrm{Mult}(x \mid \theta, N) \equiv p(X^{(K)} = x \mid \theta, N) = \binom{N}{x_1 \cdots x_K} \prod_{k=1}^{K} \theta_k^{x_k}, \qquad \sum_{k=1}^{K} x_k = N, \quad \sum_{k=1}^{K} \theta_k = 1$$

Categorical distribution:
$$\mathrm{Cat}(x \mid \theta) \equiv p(X^{(K)} = x \mid \theta, 1) = \prod_{k=1}^{K} \theta_k^{x_k} = \theta_{\{k\,:\,x_k = 1\}}, \qquad x \in \{0, 1\}^K$$
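
In code, a categorical draw is just a multinomial draw with N = 1; a small sketch with an assumed θ:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([0.5, 0.3, 0.2])   # assumed category probabilities, sum to 1
x = rng.multinomial(1, theta)       # Cat(x | theta) is Mult(x | theta, N=1): one-hot
k = int(np.argmax(x))               # index of the sampled category
print(x, k)
```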

Key Distributions: Multivariate Case (continued)

Likelihood of N categorical observations:
$$p(X_N^{(K)} \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta) = \prod_{i=1}^{N} \prod_{k=1}^{K} \theta_k^{I(x_{i,k} = 1)} = \prod_{k=1}^{K} \theta_k^{n_k}$$

where $n_k$ is the number of observations in category $k$ and $\sum_{k=1}^{K} n_k = N$.

Parameterized Priors: Hyperparameters

Generalization: the parameter θ itself depends on a hyperparameter ϑ, so the prior $p(\theta)$ becomes $p(\theta \mid \vartheta)$.

Bayes' theorem:
$$p(\theta \mid X_N^{(K)}, \vartheta) = \frac{p(X_N^{(K)} \mid \theta)\, p(\theta \mid \vartheta)}{\int p(X_N^{(K)} \mid \theta)\, p(\theta \mid \vartheta)\, d\theta}$$

Conjugate Priors: Univariate Case, $p(\theta \mid \vartheta)$

Beta distribution, $\vartheta = (\alpha, \beta)$:
$$\mathrm{Beta}(\theta \mid \alpha, \beta) \equiv p(\theta \mid \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, \theta^{\alpha-1} (1-\theta)^{\beta-1}, \qquad B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$

Posterior distribution $p(\theta \mid X_N^{(1)}, \vartheta)$:
$$p(\theta \mid X_N^{(1)}, \vartheta) = \frac{\frac{1}{B(\alpha,\beta)}\, \theta^{n_1+\alpha-1} (1-\theta)^{n_0+\beta-1}}{\int \frac{1}{B(\alpha,\beta)}\, \theta^{n_1+\alpha-1} (1-\theta)^{n_0+\beta-1}\, d\theta} = \frac{1}{B(n_1+\alpha,\, n_0+\beta)}\, \theta^{n_1+\alpha-1} (1-\theta)^{n_0+\beta-1} = \mathrm{Beta}(\theta \mid n_1+\alpha,\, n_0+\beta)$$
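
The conjugate update reduces to adding the counts $(n_1, n_0)$ to the hyperparameters $(\alpha, \beta)$. A short sketch with assumed toy data:

```python
import numpy as np
from scipy.stats import beta as beta_dist

a, b = 2.0, 2.0                          # Beta(2, 2) prior hyperparameters (assumed)
data = np.array([1, 0, 1, 1, 0, 1])      # assumed Bernoulli observations
n1 = int(data.sum()); n0 = len(data) - n1
posterior = beta_dist(n1 + a, n0 + b)    # conjugacy: posterior is Beta(n1+a, n0+b)
print(posterior.mean())                  # (n1 + a) / (N + a + b) = 6/10
```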

Prediction: Univariate Case

Marginalizing out the likelihood parameters:
$$p(X_N^{(1)} \mid \alpha, \beta) = \int p(X_N^{(1)} \mid \theta)\, p(\theta \mid \alpha, \beta)\, d\theta = \frac{1}{B(\alpha,\beta)} \int \theta^{n_1+\alpha-1} (1-\theta)^{n_0+\beta-1}\, d\theta = \frac{B(n_1+\alpha,\, n_0+\beta)}{B(\alpha,\beta)} = \frac{\Gamma(n_1+\alpha)\,\Gamma(n_0+\beta)\,\Gamma(\alpha+\beta)}{\Gamma(n_1+n_0+\alpha+\beta)\,\Gamma(\alpha)\,\Gamma(\beta)}$$

New-sample likelihood:
$$p(\tilde{x} = 1 \mid X_N^{(1)}, \alpha, \beta) = \frac{p(\tilde{x} = 1, X_N^{(1)} \mid \alpha, \beta)}{p(X_N^{(1)} \mid \alpha, \beta)} = \frac{n_1 + \alpha}{n_1 + n_0 + \alpha + \beta}$$
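
Both the marginal likelihood and the new-sample (posterior predictive) probability are easy to verify numerically; the counts below are toy assumptions:

```python
import numpy as np
from scipy.special import betaln

a, b = 2.0, 2.0
n1, n0 = 4, 2                                         # assumed counts
log_evidence = betaln(n1 + a, n0 + b) - betaln(a, b)  # log B(n1+a, n0+b) / B(a, b)
pred = (n1 + a) / (n1 + n0 + a + b)                   # p(x_new = 1 | data)
print(np.exp(log_evidence), pred)                     # pred = 6/10 here
```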

Conjugate Priors: Multivariate Case, $p(\theta \mid \vartheta)$

Dirichlet distribution:
$$\mathrm{Dir}(\theta \mid \vartheta) \equiv p(\theta \mid \vartheta) = \frac{1}{\Delta(\vartheta)} \prod_{k=1}^{K} \theta_k^{\vartheta_k - 1}, \qquad \Delta(\vartheta) = \frac{\prod_{k=1}^{K} \Gamma(\vartheta_k)}{\Gamma\!\left(\sum_{k=1}^{K} \vartheta_k\right)}, \qquad \sum_{k=1}^{K} \theta_k = 1$$

Posterior distribution $p(\theta \mid X_N^{(K)}, \vartheta)$:
$$p(\theta \mid X_N^{(K)}, \vartheta) = \frac{\frac{1}{\Delta(\vartheta)} \prod_{k=1}^{K} \theta_k^{n_k + \vartheta_k - 1}}{\int \frac{1}{\Delta(\vartheta)} \prod_{k=1}^{K} \theta_k^{n_k + \vartheta_k - 1}\, d\theta} = \frac{1}{\Delta(n + \vartheta)} \prod_{k=1}^{K} \theta_k^{n_k + \vartheta_k - 1} = \mathrm{Dir}(\theta \mid n + \vartheta)$$
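
The multivariate update mirrors the Beta case: add the category counts $n$ to the hyperparameters $\vartheta$. A sketch with assumed counts:

```python
import numpy as np

rng = np.random.default_rng(0)
vartheta = np.array([1.0, 1.0, 1.0])    # assumed Dirichlet hyperparameters
counts = np.array([5, 2, 3])            # n_k: assumed category counts, N = 10
post = counts + vartheta                # conjugacy: Dir(theta | n + vartheta)
print(post / post.sum())                # posterior mean of theta
print(rng.dirichlet(post, size=3))      # a few posterior draws
```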

Section 3: Modeling Text with Latent Dirichlet Allocation

Modeling Text: Notation
- $x_{i,m}$: i-th word in the m-th document
- $z_{i,m}$: topic from which the i-th word in the m-th document is drawn
- $\omega_j$: value taken by $x_{i,m}$, where $j \in [1, V]$ and $V$ is the vocabulary size
- $\xi_k$: value taken by $z_{i,m}$, where $k \in [1, K]$ and $K$ is the topic count
- $\theta_m$: topic distribution for the m-th document
- $\phi_k$: word distribution for the k-th topic
- $\alpha$: hyperparameters for the document topic distributions
- $\beta$: hyperparameters for the topic word distributions
- $X_{N_m}^{(V)}$: words in document m
- $X_N^{(V)}$: words in all documents (the corpus)
- $Z_{N_m}^{(K)}$: topics associated with the words in document m
- $Z_N^{(K)}$: topics associated with the words in the corpus

Modeling Text: Latent Dirichlet Allocation Generative Model

for all topics $k \in [1, K]$ do
    $\phi_k \sim \mathrm{Dir}(\phi_k \mid \beta)$
for all documents $m \in [1, M]$ do
    $\theta_m \sim \mathrm{Dir}(\theta_m \mid \alpha)$
    $N_m \sim \mathrm{Pois}(N_m \mid \xi)$
    for all words $i \in [1, N_m]$ in document m do
        topic index: $z_{i,m} \sim \mathrm{Mult}(z_{i,m} \mid \theta_m, 1)$
        word: $x_{i,m} \sim \mathrm{Mult}(x_{i,m} \mid \phi_{\{k : I(z_{i,m} = \xi_k)\}}, 1)$
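
This generative process translates line by line into numpy; all sizes and hyperparameters below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, M = 3, 20, 5              # topics, vocabulary size, documents (assumed)
beta = np.full(V, 0.1)          # topic-word hyperparameters (assumed)
alpha = np.full(K, 0.5)         # document-topic hyperparameters (assumed)
xi = 30                         # Poisson rate for document length (assumed)

phi = rng.dirichlet(beta, size=K)       # phi_k ~ Dir(beta), one row per topic
docs = []
for m in range(M):
    theta_m = rng.dirichlet(alpha)      # theta_m ~ Dir(alpha)
    N_m = rng.poisson(xi)               # N_m ~ Pois(xi)
    z = rng.choice(K, size=N_m, p=theta_m)              # topic index per word
    x = np.array([rng.choice(V, p=phi[k]) for k in z])  # word from its topic's phi_k
    docs.append((z, x))
print(docs[0])                          # (topic indices, word ids) of document 0
```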

Section 4: Parameter Estimation

Latent Dirichlet Allocation: Joint Distribution of Known and Hidden Variables

For the i-th word in the m-th document the model specifies $p(x_{i,m}, z_{i,m}, \theta_m, \Phi \mid \alpha, \beta)$; collecting all words in document m gives $p(X_{N_m}^{(V)}, Z_{N_m}^{(K)}, \theta_m, \Phi \mid \alpha, \beta)$. Over the whole corpus:
$$p(X_N^{(V)}, Z_N^{(K)}, \Theta, \Phi \mid \alpha, \beta) = p(\Phi \mid \beta) \prod_{m=1}^{M} p(\theta_m \mid \alpha) \prod_{i=1}^{N_m} p(x_{i,m} \mid z_{i,m}, \Phi)\, p(z_{i,m} \mid \theta_m)$$

(the priors $p(\theta_m \mid \alpha)$ and $p(\Phi \mid \beta)$ appear once per document and once per corpus, respectively, not once per word)

Latent Dirichlet Allocation: Conditional Distributions, Word Likelihoods

Word in document:
$$p(x_{i,m} \mid z_{i,m}, \Phi) = \prod_{j=1}^{V} \prod_{k=1}^{K} \phi_{k,j}^{I(x_{i,m} = \omega_j \,\wedge\, z_{i,m} = \xi_k)}$$

All words in document:
$$\prod_{i=1}^{N_m} p(x_{i,m} \mid z_{i,m}, \Phi) = \prod_{i=1}^{N_m} \prod_{j=1}^{V} \prod_{k=1}^{K} \phi_{k,j}^{I(x_{i,m} = \omega_j \,\wedge\, z_{i,m} = \xi_k)} = \prod_{j=1}^{V} \prod_{k=1}^{K} \phi_{k,j}^{\rho_{k,j}}$$

where $\rho_{k,j}$ is the count of word $j$ assigned to topic $k$ (here within document m; multiplied over documents, the same symbol denotes corpus-wide counts).

Latent Dirichlet Allocation: Conditional Distributions, Topic Likelihoods

Topic likelihood for a single word:
$$p(z_{i,m} \mid \theta_m) = \prod_{k=1}^{K} \theta_{m,k}^{I(z_{i,m} = \xi_k)}$$

Topic likelihood for all words in a document:
$$\prod_{i=1}^{N_m} p(z_{i,m} \mid \theta_m) = \prod_{i=1}^{N_m} \prod_{k=1}^{K} \theta_{m,k}^{I(z_{i,m} = \xi_k)} = \prod_{k=1}^{K} \theta_{m,k}^{\upsilon_{m,k}}$$

where $\upsilon_{m,k}$ is the count of words in document m assigned to topic k.

Latent Dirichlet Allocation: Priors

Topic distribution for a document:
$$p(\theta_m \mid \alpha) = \frac{1}{\Delta(\alpha)} \prod_{k=1}^{K} \theta_{m,k}^{\alpha_k - 1}$$

Word distribution over topics:
$$p(\Phi \mid \beta) = \prod_{k=1}^{K} \frac{1}{\Delta(\beta)} \prod_{j=1}^{V} \phi_{k,j}^{\beta_j - 1}$$

Latent Dirichlet Allocation: Full Joint Distribution
$$p(X_N^{(V)}, Z_N^{(K)}, \Theta, \Phi \mid \alpha, \beta) = p(\Phi \mid \beta) \prod_{m=1}^{M} p(\theta_m \mid \alpha) \prod_{i=1}^{N_m} p(x_{i,m} \mid z_{i,m}, \Phi)\, p(z_{i,m} \mid \theta_m) = \frac{1}{\Delta(\alpha)^M\, \Delta(\beta)^K} \prod_{m=1}^{M} \prod_{k=1}^{K} \theta_{m,k}^{\upsilon_{m,k} + \alpha_k - 1} \prod_{k=1}^{K} \prod_{j=1}^{V} \phi_{k,j}^{\rho_{k,j} + \beta_j - 1}$$
where $\rho_{k,j}$ now counts assignments over the whole corpus.

Latent Dirichlet Allocation: Integrating out Θ and Φ
$$p(X_N^{(V)}, Z_N^{(K)} \mid \alpha, \beta) = \frac{1}{\Delta(\alpha)^M\, \Delta(\beta)^K} \int\!\!\int \prod_{m=1}^{M} \prod_{k=1}^{K} \theta_{m,k}^{\upsilon_{m,k} + \alpha_k - 1} \prod_{k=1}^{K} \prod_{j=1}^{V} \phi_{k,j}^{\rho_{k,j} + \beta_j - 1}\, d\Theta\, d\Phi = \prod_{m=1}^{M} \frac{\Delta(\upsilon_m + \alpha)}{\Delta(\alpha)} \prod_{k=1}^{K} \frac{\Delta(\rho_k + \beta)}{\Delta(\beta)}$$
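
Each integral here is a Dirichlet normalization constant evaluated at shifted parameters. Written out for a single document's $\theta_m$:

```latex
\int \prod_{k=1}^{K} \theta_{m,k}^{\,\upsilon_{m,k} + \alpha_k - 1} \, d\theta_m
  = \Delta(\upsilon_m + \alpha)
  = \frac{\prod_{k=1}^{K} \Gamma(\upsilon_{m,k} + \alpha_k)}
         {\Gamma\!\left(\sum_{k=1}^{K} (\upsilon_{m,k} + \alpha_k)\right)}
```

since $\mathrm{Dir}(\theta_m \mid \upsilon_m + \alpha)$ integrates to one; the Φ integrals factor the same way, one per topic.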

Gibbs Sampling: Sampling the Posterior $p(Z \mid X, \alpha, \beta)$

Sampling algorithm:
initialize $Z$ to $Z^{(0)} = \{z_1^{(0)}, \dots, z_N^{(0)}\}$ at iteration $l = 0$
for $l \in [0, L]$ do
    for $n \in [1, N]$ do
        sample $z_n^{(l+1)} \sim p(z_n^{(l+1)} \mid \{z_1^{(l+1)}, \dots, z_{n-1}^{(l+1)}, z_{n+1}^{(l)}, \dots, z_N^{(l)}\}, X, \alpha, \beta)$

After sufficient iterations the sampler converges, and the samples $z_n^{(l)}$ are instances of $p(Z \mid X, \alpha, \beta)$. A runnable sketch of this sampler, specialized to LDA, follows the simplified conditional below.

Constructing the Posterior for the Gibbs Sampler

$$p(z_n \mid Z_{\neg n}, X, \alpha, \beta) = \frac{p(X, Z \mid \alpha, \beta)}{p(X, Z_{\neg n} \mid \alpha, \beta)} = \frac{p(X, Z \mid \alpha, \beta)}{p(X_{\neg n}, Z_{\neg n} \mid \alpha, \beta)\, p(x_n \mid \alpha, \beta)} \propto \frac{p(X, Z \mid \alpha, \beta)}{p(X_{\neg n}, Z_{\neg n} \mid \alpha, \beta)}$$

$$p(X_{\neg n}, Z_{\neg n} \mid \alpha, \beta) = \int\!\!\int p(X_{\neg n} \mid Z_{\neg n}, \Phi)\, p(Z_{\neg n} \mid \Theta)\, p(\Theta \mid \alpha)\, p(\Phi \mid \beta)\, d\Theta\, d\Phi$$

Index the excluded word by the tuple $n = (q, r, s, t)$: it is word $r$ of document $q$, with value $x_n = \omega_s$ and topic assignment $z_n = \xi_t$. Then

$$p(X_{\neg n} \mid Z_{\neg n}, \Phi) = \prod_{(m,i,j,k) \neq (q,r,s,t)} \phi_{k,j}^{I(x_{i,m} = \omega_j \,\wedge\, z_{i,m} = \xi_k)} = \prod_{j=1}^{V} \prod_{k=1}^{K} \phi_{k,j}^{\rho_{k,j}^{(\neg n)}}$$

$$p(Z_{\neg n} \mid \Theta) = \prod_{(m,i,k) \neq (q,r,t)} \theta_{m,k}^{I(z_{i,m} = \xi_k)} = \prod_{m=1}^{M} \prod_{k=1}^{K} \theta_{m,k}^{\upsilon_{m,k}^{(\neg n)}}$$

Defining Counts for the Posterior in the Gibbs Sampler

Counts of words assigned to topics, $\rho_k$ and $\rho_k^{(\neg n)}$:
$$\rho_{k,j}^{(\neg n)} = \begin{cases} \rho_{k,j} & (j,k) \neq (s,t) \\ \rho_{k,j} - 1 & (j,k) = (s,t) \end{cases}$$

Counts of words in document m assigned to topics, $\upsilon_m$ and $\upsilon_m^{(\neg n)}$:
$$\upsilon_{m,k}^{(\neg n)} = \begin{cases} \upsilon_{m,k} & (m,k) \neq (q,t) \\ \upsilon_{m,k} - 1 & (m,k) = (q,t) \end{cases}$$

Joint Distributions and Posterior

$$p(X, Z \mid \alpha, \beta) = \prod_{m=1}^{M} \frac{\Delta(\upsilon_m + \alpha)}{\Delta(\alpha)} \prod_{k=1}^{K} \frac{\Delta(\rho_k + \beta)}{\Delta(\beta)}$$

$$p(X_{\neg n}, Z_{\neg n} \mid \alpha, \beta) = \prod_{m=1}^{M} \frac{\Delta(\upsilon_m^{(\neg n)} + \alpha)}{\Delta(\alpha)} \prod_{k=1}^{K} \frac{\Delta(\rho_k^{(\neg n)} + \beta)}{\Delta(\beta)}$$

$$p(z_n \mid Z_{\neg n}, X, \alpha, \beta) \propto \frac{p(X, Z \mid \alpha, \beta)}{p(X_{\neg n}, Z_{\neg n} \mid \alpha, \beta)} = \frac{\Delta(\upsilon_q + \alpha)\, \Delta(\rho_t + \beta)}{\Delta(\upsilon_q^{(\neg n)} + \alpha)\, \Delta(\rho_t^{(\neg n)} + \beta)}$$

with $\Delta(y) = \dfrac{\prod_{k=1}^{K} \Gamma(y_k)}{\Gamma\!\left(\sum_{k=1}^{K} y_k\right)}$ and $\Gamma(y+1) = y\,\Gamma(y)$.

Simplifying the Expression for the Posterior

$$p(z_n \mid Z_{\neg n}, X, \alpha, \beta) \propto \frac{\prod_{k=1}^{K} \Gamma(\upsilon_{q,k} + \alpha_k) \big/ \Gamma\!\big(\sum_{k=1}^{K} \upsilon_{q,k} + \alpha_k\big)}{\prod_{k=1}^{K} \Gamma(\upsilon_{q,k}^{(\neg n)} + \alpha_k) \big/ \Gamma\!\big(\sum_{k=1}^{K} \upsilon_{q,k}^{(\neg n)} + \alpha_k\big)} \cdot \frac{\prod_{j=1}^{V} \Gamma(\rho_{t,j} + \beta_j) \big/ \Gamma\!\big(\sum_{j=1}^{V} \rho_{t,j} + \beta_j\big)}{\prod_{j=1}^{V} \Gamma(\rho_{t,j}^{(\neg n)} + \beta_j) \big/ \Gamma\!\big(\sum_{j=1}^{V} \rho_{t,j}^{(\neg n)} + \beta_j\big)}$$

All factors with $k \neq t$ and $j \neq s$ cancel, leaving

$$\frac{\Gamma(\upsilon_{q,t} + \alpha_t)}{\Gamma(\upsilon_{q,t} + \alpha_t - 1)} \cdot \frac{\Gamma\!\big(\sum_{k=1}^{K} \upsilon_{q,k} + \alpha_k - 1\big)}{\Gamma\!\big(\sum_{k=1}^{K} \upsilon_{q,k} + \alpha_k\big)} \cdot \frac{\Gamma(\rho_{t,s} + \beta_s)}{\Gamma(\rho_{t,s} + \beta_s - 1)} \cdot \frac{\Gamma\!\big(\sum_{j=1}^{V} \rho_{t,j} + \beta_j - 1\big)}{\Gamma\!\big(\sum_{j=1}^{V} \rho_{t,j} + \beta_j\big)}$$

and with $\Gamma(y+1) = y\,\Gamma(y)$,

$$p(z_n = \xi_t \mid Z_{\neg n}, X, \alpha, \beta) \propto \frac{\upsilon_{q,t} + \alpha_t - 1}{\sum_{k=1}^{K} \upsilon_{q,k} + \alpha_k - 1} \cdot \frac{\rho_{t,s} + \beta_s - 1}{\sum_{j=1}^{V} \rho_{t,j} + \beta_j - 1}$$

Note that the counts $\upsilon_{m,k}$ and $\rho_{k,j}$ are updated over the Gibbs sampling iterations.
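
Putting the derivation to work: a compact collapsed Gibbs sampler implementing exactly this conditional. The corpus, dimensions, and symmetric hyperparameters below are toy assumptions:

```python
import numpy as np

def lda_gibbs(docs, K, V, alpha=0.5, beta=0.1, iters=200, seed=0):
    """Collapsed Gibbs sampler for LDA. docs: list of lists of word ids."""
    rng = np.random.default_rng(seed)
    M = len(docs)
    ups = np.zeros((M, K))   # upsilon[m,k]: words in doc m assigned to topic k
    rho = np.zeros((K, V))   # rho[k,j]: count of word j assigned to topic k (corpus)
    z = [rng.integers(K, size=len(d)) for d in docs]  # random initial assignments
    for m, d in enumerate(docs):
        for i, w in enumerate(d):
            ups[m, z[m][i]] += 1
            rho[z[m][i], w] += 1
    for _ in range(iters):
        for m, d in enumerate(docs):
            for i, w in enumerate(d):
                t = z[m][i]
                ups[m, t] -= 1; rho[t, w] -= 1   # remove word n: the (-n) counts
                # p(z_n = k | ...) ∝ (ups + alpha) * (rho + beta) / sum_j(rho + beta)
                p = (ups[m] + alpha) * (rho[:, w] + beta) / (rho.sum(axis=1) + V * beta)
                t = rng.choice(K, p=p / p.sum())
                z[m][i] = t
                ups[m, t] += 1; rho[t, w] += 1   # add word back with its new topic
    return z, ups, rho

# Toy corpus over a vocabulary of 6 word ids (assumed for illustration)
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 4], [0, 2, 1, 0], [4, 5, 3, 5]]
z, ups, rho = lda_gibbs(docs, K=2, V=6)
print(ups)   # document-topic counts after sampling
```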

Estimating Topic Model Parameters: Distribution of Topics in Documents

$$p(\theta_m \mid Z_{N_m}, \alpha) = \frac{1}{C_{\theta_m}}\, p(Z_{N_m} \mid \theta_m)\, p(\theta_m \mid \alpha) = \frac{1}{C_{\theta_m}\, \Delta(\alpha)} \prod_{i=1}^{N_m} \prod_{k=1}^{K} \theta_{m,k}^{I(z_{i,m} = \xi_k)} \prod_{k=1}^{K} \theta_{m,k}^{\alpha_k - 1} = \frac{1}{C_{\theta_m}\, \Delta(\alpha)} \prod_{k=1}^{K} \theta_{m,k}^{\upsilon_{m,k} + \alpha_k - 1} = \mathrm{Dir}(\theta_m \mid \upsilon_m + \alpha)$$

Estimating Topic Model Parameters (continued): Distribution of Words in Topics

Let $N(\xi_k) = \{(i, m) : z_{i,m} = \xi_k\}$.
$$p(\phi_k \mid X_{N(\xi_k)}, Z_{N(\xi_k)}, \beta) = \frac{1}{C_{\phi_k}}\, p(X_{N(\xi_k)} \mid \phi_k)\, p(\phi_k \mid \beta) = \frac{1}{C_{\phi_k}\, \Delta(\beta)} \prod_{(i,m) \in N(\xi_k)} \prod_{j=1}^{V} \phi_{k,j}^{I(x_{i,m} = \omega_j)} \prod_{j=1}^{V} \phi_{k,j}^{\beta_j - 1} = \frac{1}{C_{\phi_k}\, \Delta(\beta)} \prod_{j=1}^{V} \phi_{k,j}^{\rho_{k,j} + \beta_j - 1} = \mathrm{Dir}(\phi_k \mid \rho_k + \beta)$$

Estimating Topic Model Parameters (continued)

Given $x = (x_1, \dots, x_K) \sim \mathrm{Dir}(x \mid \alpha)$, with $\bar{\alpha} = \sum_{i=1}^{K} \alpha_i = \alpha^T \mathbf{1}$:
$$E[x_i] = \frac{\alpha_i}{\bar{\alpha}}, \qquad \mathrm{Var}[x_i] = \frac{\alpha_i (\bar{\alpha} - \alpha_i)}{\bar{\alpha}^2 (\bar{\alpha} + 1)}$$

Estimate for the distribution of topics in documents, with $a_m = (\upsilon_m + \alpha)^T \mathbf{1}$:
$$E[\theta_{m,k}] = \frac{\upsilon_{m,k} + \alpha_k}{a_m}, \qquad \mathrm{Var}[\theta_{m,k}] = \frac{(\upsilon_{m,k} + \alpha_k)\,[a_m - (\upsilon_{m,k} + \alpha_k)]}{a_m^2 (a_m + 1)}$$

Estimate for the distribution of words in topics, with $b_k = (\rho_k + \beta)^T \mathbf{1}$:
$$E[\phi_{k,j}] = \frac{\rho_{k,j} + \beta_j}{b_k}, \qquad \mathrm{Var}[\phi_{k,j}] = \frac{(\rho_{k,j} + \beta_j)\,[b_k - (\rho_{k,j} + \beta_j)]}{b_k^2 (b_k + 1)}$$
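
Given the final counts from the sampler, the posterior-mean estimates of θ and φ are simple normalizations; a sketch continuing the toy assumptions above:

```python
import numpy as np

# Posterior means from the counts produced by the Gibbs sampler sketch above.
# ups: (M, K) document-topic counts; rho: (K, V) topic-word counts (assumed shapes).
def estimate_params(ups, rho, alpha=0.5, beta=0.1):
    theta = (ups + alpha) / (ups + alpha).sum(axis=1, keepdims=True)  # E[theta_{m,k}]
    phi = (rho + beta) / (rho + beta).sum(axis=1, keepdims=True)      # E[phi_{k,j}]
    return theta, phi

ups = np.array([[4.0, 1.0], [0.0, 4.0]])            # toy counts: M=2 docs, K=2 topics
rho = np.array([[3.0, 1.0, 0.0], [0.0, 2.0, 3.0]])  # toy counts: K=2 topics, V=3 words
theta, phi = estimate_params(ups, rho)
print(theta.round(3)); print(phi.round(3))
```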