Various types of likelihood

STA 4508: Topics in Likelihood Inference, January 14, 2014

Various types of likelihood

1. likelihood, marginal likelihood, conditional likelihood, profile likelihood, adjusted profile likelihood, Bayesian asymptotics
2. quasi-likelihood, composite likelihood
3. semi-parametric likelihood, partial likelihood
4. empirical likelihood, penalized likelihood
5. bootstrap likelihood, h-likelihood, weighted likelihood, pseudo-likelihood, local likelihood, sieve likelihood, simulated likelihood

Nuisance parameters: notation

$\theta = (\psi, \lambda) = (\psi_1, \ldots, \psi_q, \lambda_1, \ldots, \lambda_{d-q})$

$U(\theta) = \begin{pmatrix} U_\psi(\theta) \\ U_\lambda(\theta) \end{pmatrix}$, with $U_\lambda(\psi, \hat\lambda_\psi) = 0$

$i(\theta) = \begin{pmatrix} i_{\psi\psi} & i_{\psi\lambda} \\ i_{\lambda\psi} & i_{\lambda\lambda} \end{pmatrix}, \qquad j(\theta) = \begin{pmatrix} j_{\psi\psi} & j_{\psi\lambda} \\ j_{\lambda\psi} & j_{\lambda\lambda} \end{pmatrix}$

$i^{-1}(\theta) = \begin{pmatrix} i^{\psi\psi} & i^{\psi\lambda} \\ i^{\lambda\psi} & i^{\lambda\lambda} \end{pmatrix}, \qquad j^{-1}(\theta) = \begin{pmatrix} j^{\psi\psi} & j^{\psi\lambda} \\ j^{\lambda\psi} & j^{\lambda\lambda} \end{pmatrix}$

$i^{\psi\psi}(\theta) = \{i_{\psi\psi}(\theta) - i_{\psi\lambda}(\theta)\, i_{\lambda\lambda}^{-1}(\theta)\, i_{\lambda\psi}(\theta)\}^{-1}$

$l_p(\psi) = l(\psi, \hat\lambda_\psi), \qquad j_p(\psi) = -l_p''(\psi)$
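The block-inversion (Schur complement) identity for $i^{\psi\psi}$ is easy to check numerically. A minimal sketch (an assumed illustration, not part of the course materials):

```python
# Checks that the Schur-complement formula i^{psi psi} = {i_pp - i_pl i_ll^{-1} i_lp}^{-1}
# matches the corresponding block of the full inverse information matrix.
import numpy as np

q, d = 2, 5  # dim(psi) = q, dim(theta) = d

# A symmetric positive-definite "information matrix" for illustration.
rng = np.random.default_rng(0)
A = rng.normal(size=(d, d))
i_theta = A @ A.T + d * np.eye(d)

i_pp = i_theta[:q, :q]   # i_{psi psi}
i_pl = i_theta[:q, q:]   # i_{psi lambda}
i_lp = i_theta[q:, :q]   # i_{lambda psi}
i_ll = i_theta[q:, q:]   # i_{lambda lambda}

i_psi_psi = np.linalg.inv(i_pp - i_pl @ np.linalg.solve(i_ll, i_lp))

# Agrees with the (psi, psi) block of the full inverse.
assert np.allclose(i_psi_psi, np.linalg.inv(i_theta)[:q, :q])
```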

Nuisance parameters: approximate pivots

$w_u(\psi) = U_\psi(\psi, \hat\lambda_\psi)^T \{i^{\psi\psi}(\psi, \hat\lambda_\psi)\}\, U_\psi(\psi, \hat\lambda_\psi) \;\dot\sim\; \chi^2_q$

$w_e(\psi) = (\hat\psi - \psi)^T \{i^{\psi\psi}(\hat\psi, \hat\lambda)\}^{-1} (\hat\psi - \psi) \;\dot\sim\; \chi^2_q$

$w(\psi) = 2\{l(\hat\psi, \hat\lambda) - l(\psi, \hat\lambda_\psi)\} = 2\{l_p(\hat\psi) - l_p(\psi)\} \;\dot\sim\; \chi^2_q$

For scalar $\psi$:

$r_u(\psi) = l_p'(\psi)\, j_p^{-1/2}(\hat\psi) \;\dot\sim\; N(0, 1)$

$r_e(\psi) = (\hat\psi - \psi)\, j_p^{1/2}(\hat\psi) \;\dot\sim\; N(0, 1)$

$r(\psi) = \mathrm{sign}(\hat\psi - \psi)\,[2\{l_p(\hat\psi) - l_p(\psi)\}]^{1/2} \;\dot\sim\; N(0, 1)$
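A small sketch of the three scalar pivots (an assumed example, not from the slides): interest parameter $\psi$ = mean of $N(\psi, \sigma^2)$ with $\sigma^2$ as nuisance, so $l_p(\psi) = -(n/2)\log\{\sum_i (y_i - \psi)^2/n\}$, with derivatives taken numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.5, size=30)
n = len(y)

def l_p(psi):  # profile log-likelihood, sigma^2 profiled out
    return -0.5 * n * np.log(np.mean((y - psi) ** 2))

psi_hat = y.mean()  # profile (= overall) MLE of psi

# Numerical first and second derivatives of l_p
h = 1e-5
def d1(f, x): return (f(x + h) - f(x - h)) / (2 * h)
def d2(f, x): return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

j_p_hat = -d2(l_p, psi_hat)  # observed profile information at psi_hat

def pivots(psi):
    r_u = d1(l_p, psi) * j_p_hat ** -0.5
    r_e = (psi_hat - psi) * j_p_hat ** 0.5
    r = np.sign(psi_hat - psi) * np.sqrt(2 * (l_p(psi_hat) - l_p(psi)))
    return r_u, r_e, r

print(pivots(2.0))  # all three approximately N(0,1) when psi = 2 is true
```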

Nuisance parameters: properties of likelihood

maximum likelihood estimates are equivariant: $\widehat{h(\theta)} = h(\hat\theta)$ for one-to-one $h(\cdot)$

question: which of $w_e$, $w_u$, $w$ are invariant under reparametrization of the full parameter, $\varphi(\theta)$?

question: which of $r_e$, $r_u$, $r$ are invariant under interest-respecting reparametrizations $(\psi, \lambda) \to \{\psi, \eta(\psi, \lambda)\}$?

consistency of the maximum likelihood estimate

equivalence of the maximum likelihood estimate and a root of the score equation

observed vs. expected information

Various types of likelihood

1. likelihood, marginal likelihood, conditional likelihood, profile likelihood, adjusted profile likelihood
2. quasi-likelihood, composite likelihood
3. semi-parametric likelihood, partial likelihood
4. empirical likelihood, penalized likelihood
5. bootstrap likelihood, h-likelihood, weighted likelihood, pseudo-likelihood, local likelihood, sieve likelihood, simulated likelihood

Marginal and conditional likelihoods

Example: $Y \sim N(X\beta, \sigma^2 I)$, $Y \in \mathbb{R}^n$

Example: $Y_{ij} \sim N(\mu_i, \sigma^2)$, $j = 1, \ldots, k$; $i = 1, \ldots, m$

Example: $Y_{ij} \sim N(\mu, \sigma_i^2)$, $j = 1, \ldots, k_i$; $i = 1, \ldots, m$

Example: $Y_{i1} \sim \text{Bernoulli}(p_{i1})$, $Y_{i2} \sim \text{Bernoulli}(p_{i2})$, $i = 1, \ldots, n$

Example: $Y_{i1} \sim \text{Exponential}(\lambda_i \psi)$, $Y_{i2} \sim \text{Exponential}(\lambda_i/\psi)$, or $\text{Exponential}(\psi \lambda_i)$, $\text{Exponential}(\psi/\lambda_i)$

Frequentist inference, nuisance parameters

first-order pivotal quantities:

$r_u(\psi) = l_p'(\psi)\, j_p(\hat\psi)^{-1/2} \;\dot\sim\; N(0, 1)$

$r_e(\psi) = (\hat\psi - \psi)\, j_p(\hat\psi)^{1/2} \;\dot\sim\; N(0, 1)$

$r(\psi) = \mathrm{sign}(\hat\psi - \psi)\,[2\{l_p(\hat\psi) - l_p(\psi)\}]^{1/2} \;\dot\sim\; N(0, 1)$

all based on treating the profile log-likelihood as a one-parameter log-likelihood

example: $y = X\beta + \epsilon$, $\epsilon \sim N(0, \sigma^2 I)$, with $\hat\sigma^2 = (y - X\hat\beta)^T(y - X\hat\beta)/n$

[Figure: profile log-likelihood (vertical axis, roughly −6 to 0) plotted against ψ]

Eliminating nuisance parameters

by using a marginal density:
$f(y; \psi, \lambda) = f_m(t_1; \psi)\, f_c(t_2 \mid t_1; \psi, \lambda)$
Example $N(X\beta, \sigma^2 I)$: $f(y; \beta, \sigma^2) = f_m(\mathrm{RSS}; \sigma^2)\, f_c(\hat\beta \mid \mathrm{RSS}; \beta, \sigma^2)$

by using a conditional density:
$f(y; \psi, \lambda) = f_c(t_1 \mid t_2; \psi)\, f_m(t_2; \psi, \lambda)$
Example $N(X\beta, \sigma^2 I)$: $f(y; \beta, \sigma^2) = f_c(\mathrm{RSS} \mid \hat\beta; \sigma^2)\, f_m(\hat\beta; \beta, \sigma^2)$
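A quick simulation sketch of the marginal factor in the normal-theory example (an assumed illustration): $\mathrm{RSS}/\sigma^2 \sim \chi^2_{n-p}$, free of $\beta$, so a marginal likelihood for $\sigma^2$ can be based on RSS alone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p, sigma = 50, 3, 2.0
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5, 0.3])

rss_vals = []
for _ in range(2000):
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_vals.append(np.sum((y - X @ beta_hat) ** 2))

# RSS / sigma^2 should behave like chi^2 with n - p degrees of freedom.
print(np.mean(rss_vals) / sigma**2, n - p)           # close to n - p
print(stats.kstest(np.array(rss_vals) / sigma**2,
                   stats.chi2(df=n - p).cdf).pvalue)  # not small
```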

Linear exponential families

conditional density free of the nuisance parameter:

$f(y_i; \psi, \lambda) = \exp\{\psi^T s(y_i) + \lambda^T t(y_i) - k(\psi, \lambda)\}\, h(y_i)$

$f(y; \psi, \lambda) = \exp\{\psi^T s + \lambda^T t - n\, k(\psi, \lambda)\} \prod_{i=1}^n h(y_i)$, with $s = \sum_{i=1}^n s(y_i)$, $t = \sum_{i=1}^n t(y_i)$

$f(s, t; \psi, \lambda)$ is again of exponential family form, and the conditional density $f(s \mid t; \psi)$ depends on $\psi$ only

Adjusted profile log-likelihood

$l_A(\psi) = l_p(\psi) + A(\psi) = l(\psi, \hat\lambda_\psi) + A(\psi)$, with $A(\psi)$ assumed to be $O_p(1)$

generic form (Fraser, 2003):
$A_{FR}(\psi) = +\tfrac12 \log|j_{\lambda\lambda}(\psi, \hat\lambda_\psi)| - \log\left|\frac{d\lambda}{d\hat\lambda_\psi}\right|$

closely related (SM 12.4.1; Barndorff-Nielsen, 1983):
$A_{BN}(\psi) = -\tfrac12 \log|j_{\lambda\lambda}(\psi, \hat\lambda_\psi)| + \log\left|\frac{d\hat\lambda}{d\hat\lambda_\psi}\right|$

if $i_{\psi\lambda}(\theta) = 0$, then $\hat\lambda_\psi = \hat\lambda + O_p(n^{-1})$, suggesting we ignore the last term

if $\psi$ is scalar, then in principle we can find a parametrization $(\psi, \lambda)$ in which $i_{\psi\lambda}(\theta) = 0$ (SM 12.4.2)
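A concrete sketch of the orthogonal case, where the adjustment reduces to $l_A(\psi) = l_p(\psi) - \tfrac12\log j_{\lambda\lambda}(\psi, \hat\lambda_\psi)$ (an assumed example, not from the slides): for $Y_i \sim N(\mu, \sigma^2)$ with $\psi = \sigma^2$ and nuisance $\lambda = \mu$, the adjusted maximum is $SS/(n-1)$ rather than the profile maximum $SS/n$, the REML-type estimate.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
y = rng.normal(5.0, 2.0, size=20)
n, SS = len(y), np.sum((y - y.mean()) ** 2)

def l_p(s2):   # profile log-likelihood: mu_hat = ybar for every sigma^2
    return -0.5 * n * np.log(s2) - SS / (2 * s2)

def l_A(s2):   # adjusted profile; here j_{lambda lambda} = n / sigma^2
    return l_p(s2) - 0.5 * np.log(n / s2)

s2_p = minimize_scalar(lambda s2: -l_p(s2), bounds=(1e-3, 50), method="bounded").x
s2_A = minimize_scalar(lambda s2: -l_A(s2), bounds=(1e-3, 50), method="bounded").x
print(s2_p, SS / n)        # profile maximum: SS / n
print(s2_A, SS / (n - 1))  # adjusted maximum: SS / (n - 1)
```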

Asymptotics for Bayesian inference

$\pi(\theta \mid y) = \frac{\exp\{l(\theta; y)\}\,\pi(\theta)}{\int \exp\{l(\theta; y)\}\,\pi(\theta)\, d\theta}$

expanding numerator and denominator about $\hat\theta$, using $l'(\hat\theta) = 0$:
$\pi(\theta \mid y) \doteq N\{\hat\theta, j^{-1}(\hat\theta)\}$

expanding the denominator only about $\hat\theta$:
$\pi(\theta \mid y) \doteq \frac{1}{(2\pi)^{d/2}}\, |j(\hat\theta)|^{1/2}\, \exp\{l(\theta; y) - l(\hat\theta; y)\}\, \frac{\pi(\theta)}{\pi(\hat\theta)}$

Posterior is asymptotically normal

$\pi(\theta \mid y) \;\dot\sim\; N\{\hat\theta, j^{-1}(\hat\theta)\}$, $\theta \in \mathbb{R}$, $y = (y_1, \ldots, y_n)$

careful statement below

... posterior is asymptotically normal

$\pi(\theta \mid y) \;\dot\sim\; N\{\hat\theta, j^{-1}(\hat\theta)\}$, $\theta \in \mathbb{R}$, $y = (y_1, \ldots, y_n)$

equivalently, $l_\pi(\theta) = l(\theta; y) + \log \pi(\theta)$ behaves asymptotically like a log-likelihood

... posterior is asymptotically normal

In fact, if $\pi(\theta) > 0$ and $\pi'(\theta)$ is continuous in a neighbourhood of $\theta_0$, there exist constants $D$ and $n_y$ such that

$|F_n(\xi) - \Phi(\xi)| < D\, n^{-1/2}$ for all $n > n_y$,

on an almost-sure set with respect to $\pi(\theta_0)\, f(y; \theta_0)$, where $y = (y_1, \ldots, y_n)$ is a sample from $f(y; \theta_0)$, $\theta_0$ is an observation from the prior density $\pi(\theta)$, and

$F_n(\xi) = \Pr\{(\theta - \hat\theta)\, j^{1/2}(\hat\theta) \le \xi \mid y\}$

Johnson (1970); Datta & Mukerjee (2004)

Laplace approximation

$\pi(\theta \mid y) \doteq \frac{1}{(2\pi)^{1/2}}\, |j(\hat\theta)|^{1/2}\, \exp\{l(\theta; y) - l(\hat\theta; y)\}\, \frac{\pi(\theta)}{\pi(\hat\theta)}$

$\pi(\theta \mid y) = \frac{1}{(2\pi)^{1/2}}\, |j(\hat\theta)|^{1/2}\, \exp\{l(\theta; y) - l(\hat\theta; y)\}\, \frac{\pi(\theta)}{\pi(\hat\theta)}\, \{1 + O_p(n^{-1})\}$

$\pi(\theta \mid y) = \frac{1}{(2\pi)^{1/2}}\, |j_\pi(\hat\theta_\pi)|^{1/2}\, \exp\{l_\pi(\theta; y) - l_\pi(\hat\theta_\pi; y)\}\, \{1 + O_p(n^{-1})\}$

$y = (y_1, \ldots, y_n)$, $\theta \in \mathbb{R}$
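A sketch of the last form, using $l_\pi$ and $j_\pi$ (an assumed example): for $y_i \sim \text{Exponential}(\theta)$ with a $\text{Gamma}(a, b)$ prior, the exact posterior is $\text{Gamma}(a + n, b + \sum y_i)$, which gives a direct check of the approximation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
y = rng.exponential(scale=1 / 1.5, size=25)  # true rate theta = 1.5
n, S = len(y), y.sum()
a, b = 2.0, 1.0                              # prior Gamma(a, b)

def l_pi(th):  # log-likelihood + log prior, additive constants dropped
    return n * np.log(th) - th * S + (a - 1) * np.log(th) - b * th

th_hat = (n + a - 1) / (S + b)  # mode of l_pi
j_pi = (n + a - 1) / th_hat**2  # -l_pi''(th_hat)

def laplace_density(th):
    return (j_pi / (2 * np.pi)) ** 0.5 * np.exp(l_pi(th) - l_pi(th_hat))

exact = stats.gamma(a + n, scale=1 / (b + S))
for th in [1.0, 1.5, 2.0]:
    print(th, laplace_density(th), exact.pdf(th))  # close for moderate n
```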

Posterior tail area

$\int_\theta^\infty \pi(\vartheta \mid y)\, d\vartheta \doteq \int_\theta^\infty \frac{1}{(2\pi)^{1/2}}\, e^{l(\vartheta; y) - l(\hat\vartheta; y)}\, |j(\hat\vartheta)|^{1/2}\, \frac{\pi(\vartheta)}{\pi(\hat\vartheta)}\, d\vartheta$

Posterior cdf

$\int_{-\infty}^\theta \pi(\vartheta \mid y)\, d\vartheta \doteq \int_{-\infty}^\theta \frac{1}{(2\pi)^{1/2}}\, e^{l(\vartheta; y) - l(\hat\vartheta; y)}\, |j(\hat\vartheta)|^{1/2}\, \frac{\pi(\vartheta)}{\pi(\hat\vartheta)}\, d\vartheta$

SM, 11.3

BDR, Ch.3, Cauchy with flat prior

Nuisance parameters

$y = (y_1, \ldots, y_n) \sim f(y; \theta)$, $\theta = (\psi, \lambda)$

$\pi_m(\psi \mid y) = \int \pi(\psi, \lambda \mid y)\, d\lambda = \frac{\int \exp\{l(\psi, \lambda; y)\}\,\pi(\psi, \lambda)\, d\lambda}{\int\!\!\int \exp\{l(\psi, \lambda; y)\}\,\pi(\psi, \lambda)\, d\psi\, d\lambda}$

... nuisance parameters

$y = (y_1, \ldots, y_n) \sim f(y; \theta)$, $\theta = (\psi, \lambda)$

$\pi_m(\psi \mid y) = \int \pi(\psi, \lambda \mid y)\, d\lambda = \frac{\int \exp\{l(\psi, \lambda; y)\}\,\pi(\psi, \lambda)\, d\lambda}{\int\!\!\int \exp\{l(\psi, \lambda; y)\}\,\pi(\psi, \lambda)\, d\psi\, d\lambda}$

$|j(\hat\theta)| = |j_{\psi\psi}(\hat\theta) - j_{\psi\lambda}(\hat\theta)\, j_{\lambda\lambda}^{-1}(\hat\theta)\, j_{\lambda\psi}(\hat\theta)|\; |j_{\lambda\lambda}(\hat\theta)|$

(the block-determinant identity used in Laplace-approximating numerator and denominator)

Posterior marginal cdf, d = 1

$\Pi_m(\psi \mid y) = \int_{-\infty}^\psi \pi_m(\xi \mid y)\, d\xi \doteq \int_{-\infty}^\psi \frac{1}{(2\pi)^{1/2}}\, e^{l_p(\xi) - l_p(\hat\xi)}\, j_p^{1/2}(\hat\xi)\, \frac{\pi(\xi, \hat\lambda_\xi)}{\pi(\hat\xi, \hat\lambda)}\, \frac{|j_{\lambda\lambda}(\hat\xi, \hat\lambda)|^{1/2}}{|j_{\lambda\lambda}(\xi, \hat\lambda_\xi)|^{1/2}}\, d\xi$

... posterior marginal cdf, d = 1

$\Pi_m(\psi \mid y) \doteq \Phi(r_B) = \Phi\left\{r + \frac{1}{r}\log\left(\frac{q_B}{r}\right)\right\}$

$r = r(\psi)$, $q_B = q_B(\psi) = \cdots$

normal circle, k = 2

[Figure, three slides: p-value functions for ψ, p-value (0 to 1) against ψ (2 to 8)]

normal circle, k = 2, 5, 10

[Figure, four slides: p-value functions for ψ with k = 2, 5, 10, p-value (0 to 1) against ψ (2 to 8)]

Link to adjusted log-likelihoods

$\pi_m(\psi \mid y) \doteq \frac{1}{(2\pi)^{d/2}}\, e^{l_p(\psi) - l_p(\hat\psi)}\, |j_p(\hat\psi)|^{1/2}\, \frac{\pi(\psi, \hat\lambda_\psi)}{\pi(\hat\psi, \hat\lambda)}\, \frac{|j_{\lambda\lambda}(\hat\psi, \hat\lambda)|^{1/2}}{|j_{\lambda\lambda}(\psi, \hat\lambda_\psi)|^{1/2}}$

$\pi_m(\psi \mid y) \doteq c\, \exp\{l_p(\psi) - \tfrac12 \log|j_{\lambda\lambda}(\psi, \hat\lambda_\psi)| + \log \pi(\psi, \hat\lambda_\psi)\}$

compare $l_A(\psi) = l_p(\psi) - \tfrac12 \log|j_{\lambda\lambda}(\psi, \hat\lambda_\psi)| + \log\left|\frac{d\hat\lambda}{d\hat\lambda_\psi}\right|$

if $i_{\psi\lambda}(\theta) = 0$, then $\hat\lambda_\psi = \hat\lambda + O_p(n^{-1})$

Composite likelihood

vector observation: $Y \sim f(y; \theta)$, $Y \in \mathcal{Y} \subset \mathbb{R}^m$, $\theta \in \mathbb{R}^d$

set of events: $\{A_k, k \in K\}$

composite likelihood (Lindsay, 1988):
$CL(\theta; y) = \prod_{k \in K} L_k(\theta; y)^{w_k}$

$L_k(\theta; y) = f(\{y \in A_k\}; \theta)$, the likelihood for the event $A_k$; $\{w_k, k \in K\}$ a set of weights

Examples

composite conditional likelihood (Besag, 1974):
$L_C(\theta; y) = \prod_{s \in S} f(y_s \mid y_{s^c}; \theta)^{w_s}$, and variants obtained by modifying the events

composite marginal likelihood:
$CML(\theta; y) = \prod_{s \in S} f_s(y_s; \theta)^{w_s}$, where $f_s(y_s; \theta)$ is the marginal density of the subvector $y_s$ induced by $f$

independence likelihood: $L_{\mathrm{ind}}(\theta; y) = \prod_{r=1}^m f(y_r; \theta)^{w_r}$

pairwise likelihood: $L_{\mathrm{pair}}(\theta; y) = \prod_{r<s} f(y_r, y_s; \theta)^{w_{rs}}$
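A minimal sketch of a pairwise likelihood (an assumed example): an $m$-dimensional normal with mean 0, unit variances, and equicorrelation ρ, with the pairwise log-likelihood built directly from the bivariate margins $f(y_r, y_s; \rho)$.

```python
import numpy as np
from scipy import stats

def pairwise_loglik(rho, Y):
    """Y: n x m data matrix; sums log f(y_ir, y_is; rho) over i and r < s."""
    n, m = Y.shape
    biv = stats.multivariate_normal(mean=[0.0, 0.0],
                                    cov=[[1.0, rho], [rho, 1.0]])
    total = 0.0
    for r in range(m):
        for s in range(r + 1, m):
            total += biv.logpdf(Y[:, [r, s]]).sum()
    return total

# Quick check on simulated equicorrelated data
rng = np.random.default_rng(5)
m, rho = 5, 0.4
R = (1 - rho) * np.eye(m) + rho * np.ones((m, m))
Y = rng.multivariate_normal(np.zeros(m), R, size=200)
grid = np.linspace(0.05, 0.8, 16)
print(grid[np.argmax([pairwise_loglik(r, Y) for r in grid])])  # near 0.4
```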

Derived quantities

log composite likelihood: $cl(\theta; y) = \log CL(\theta; y)$

score function: $U(\theta; y) = \nabla_\theta\, cl(\theta; y) = \sum_{s \in S} w_s U_s(\theta; y)$, with $U_s(\theta; y) = \nabla_\theta \log f_s(y_s; \theta)$

variability matrix: $J(\theta) = \mathrm{var}_\theta\{U(\theta; Y)\}$

sensitivity matrix: $H(\theta) = E_\theta\{-\nabla_\theta U(\theta; Y)\}$

Godambe information (or sandwich information): $G(\theta) = H(\theta)\, J(\theta)^{-1}\, H(\theta)$
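A sketch of estimating $J$, $H$, and $G$ empirically for a scalar θ (an assumed illustration), using per-observation numerical derivatives of a composite log-likelihood contribution:

```python
import numpy as np

def empirical_godambe(cl_i, theta, data, h=1e-5):
    """cl_i(theta, y_i): composite log-likelihood contribution of one observation."""
    U = np.array([(cl_i(theta + h, yi) - cl_i(theta - h, yi)) / (2 * h)
                  for yi in data])  # per-observation scores
    Hess = np.array([(cl_i(theta + h, yi) - 2 * cl_i(theta, yi)
                      + cl_i(theta - h, yi)) / h**2 for yi in data])
    J = np.mean(U ** 2)            # variability (scores have mean ~0 at the truth)
    H = -np.mean(Hess)             # sensitivity
    return J, H, H * (1 / J) * H   # Godambe G = H J^{-1} H

# Example: independence log-likelihood for pairs of N(theta, 1) observations;
# here the composite likelihood is the full likelihood, so J = H and G equals
# the Fisher information (= 2 per pair).
rng = np.random.default_rng(6)
data = rng.normal(1.0, 1.0, size=(500, 2))
cl_i = lambda th, yi: -0.5 * np.sum((yi - th) ** 2)
print(empirical_godambe(cl_i, 1.0, data))  # all approximately 2
```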

Inference

sample: $Y_1, \ldots, Y_n$ i.i.d., $CL(\theta; y) = \prod_{i=1}^n CL(\theta; y_i)$

$\sqrt{n}\,(\hat\theta_{CL} - \theta) \;\dot\sim\; N\{0, G^{-1}(\theta)\}$, with $G(\theta) = H(\theta)\, J(\theta)^{-1}\, H(\theta)$

... inference

$w(\theta) = 2\{cl(\hat\theta_{CL}) - cl(\theta)\} \;\dot\sim\; \sum_{a=1}^d \mu_a Z_a^2$, $Z_a \sim N(0, 1)$ independent

$\mu_1, \ldots, \mu_d$: eigenvalues of $J(\theta)\, H(\theta)^{-1}$

... inference

$w(\theta) = 2\{cl(\hat\theta_{CL}) - cl(\theta)\} \;\dot\sim\; \sum_{a=1}^d \mu_a Z_a^2$, $Z_a \sim N(0, 1)$ independent

$\mu_1, \ldots, \mu_d$: eigenvalues of $J(\theta)\, H(\theta)^{-1}$

$w(\theta) \doteq (\hat\theta_{CL} - \theta)^T \{n H(\theta)\} (\hat\theta_{CL} - \theta)$, with $\hat\theta_{CL} \;\dot\sim\; N\{\theta, G^{-1}(\theta)\}$
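The weighted chi-squared reference distribution is easy to simulate once the eigenvalues $\mu_a$ are in hand. A sketch with assumed, illustrative $H$ and $J$ matrices:

```python
import numpy as np

rng = np.random.default_rng(7)
# Illustrative 2x2 sensitivity and variability matrices (assumed values)
H = np.array([[2.0, 0.3], [0.3, 1.0]])
J = np.array([[2.5, 0.4], [0.4, 1.4]])

mu = np.linalg.eigvals(J @ np.linalg.inv(H)).real  # weights mu_1, ..., mu_d
Z2 = rng.chisquare(df=1, size=(100_000, len(mu)))
w_ref = Z2 @ mu                                    # draws of sum_a mu_a Z_a^2

# e.g. an approximate 95% critical value for w(theta) under this model
print(mu, np.quantile(w_ref, 0.95))
```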

Nuisance parameters

$\theta = (\psi, \lambda)$; constrained estimator: $\tilde\theta_\psi = \arg\sup_{\theta = \theta(\psi)} cl(\theta; y)$

$\sqrt{n}\,(\hat\psi_{CL} - \psi) \;\dot\sim\; N\{0, G^{\psi\psi}(\theta)\}$, with $G(\theta) = H(\theta)\, J(\theta)^{-1}\, H(\theta)$

$w(\psi) = 2\{cl(\hat\theta_{CL}) - cl(\tilde\theta_\psi)\} \;\dot\sim\; \sum_{a=1}^{d_0} \mu_a Z_a^2$

$\mu_1, \ldots, \mu_{d_0}$: eigenvalues of $(H^{\psi\psi})^{-1} G^{\psi\psi}$ (Kent, 1982)

Model selection

Akaike information criterion (Varin and Vidoni, 2005):
$AIC = 2\, cl(\hat\theta_{CL}; y) - 2 \dim(\theta)$

Bayesian information criterion (Gao and Song, 2009):
$BIC = 2\, cl(\hat\theta_{CL}; y) - \dim(\theta) \log n$

effective number of parameters (?): $\dim(\theta) = \mathrm{tr}\{H(\theta)\, G^{-1}(\theta)\}$

these criteria are used for model averaging (Hjort and Claeskens, 2008) or for selection of tuning parameters (Gao and Song, 2009)

Example: symmetric normal

$Y_i \sim N(0, R)$, $\mathrm{var}(Y_{ir}) = 1$, $\mathrm{corr}(Y_{ir}, Y_{is}) = \rho$

compound bivariate normal densities to form the pairwise log-likelihood:

$cl(\rho; y_1, \ldots, y_n) = -\frac{nm(m-1)}{4}\log(1-\rho^2) - \frac{m-1+\rho}{2(1-\rho^2)}\, SS_w - \frac{(m-1)(1-\rho)}{2(1-\rho^2)}\, SS_b$

$SS_w = \sum_{i=1}^n \sum_{s=1}^m (y_{is} - \bar y_{i\cdot})^2, \qquad SS_b = m \sum_{i=1}^n \bar y_{i\cdot}^2$

$l(\rho; y_1, \ldots, y_n) = -\frac{n(m-1)}{2}\log(1-\rho) - \frac{n}{2}\log\{1+(m-1)\rho\} - \frac{1}{2(1-\rho)}\, SS_w - \frac{1}{2\{1+(m-1)\rho\}}\, SS_b$
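A sketch implementing the closed-form pairwise log-likelihood $cl(\rho)$ and full log-likelihood $l(\rho)$ above (additive constants dropped), with a brute-force check of $cl(\rho)$ against a direct sum of bivariate normal log-densities:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, m, rho0 = 40, 5, 0.5
R = (1 - rho0) * np.eye(m) + rho0 * np.ones((m, m))
Y = rng.multivariate_normal(np.zeros(m), R, size=n)

ybar = Y.mean(axis=1)
SSw = np.sum((Y - ybar[:, None]) ** 2)
SSb = m * np.sum(ybar ** 2)

def cl(rho):  # pairwise log-likelihood, constants dropped
    return (-n * m * (m - 1) / 4 * np.log(1 - rho**2)
            - (m - 1 + rho) / (2 * (1 - rho**2)) * SSw
            - (m - 1) * (1 - rho) / (2 * (1 - rho**2)) * SSb)

def l(rho):   # full log-likelihood, constants dropped
    return (-n * (m - 1) / 2 * np.log(1 - rho)
            - n / 2 * np.log(1 + (m - 1) * rho)
            - SSw / (2 * (1 - rho)) - SSb / (2 * (1 + (m - 1) * rho)))

def cl_direct(rho):  # brute force over all pairs; constant matched to cl
    out = 0.0
    biv = stats.multivariate_normal([0, 0], [[1, rho], [rho, 1]])
    for r in range(m):
        for s in range(r + 1, m):
            out += biv.logpdf(Y[:, [r, s]]).sum()
    return out + n * m * (m - 1) / 2 * np.log(2 * np.pi)

print(np.isclose(cl(0.3), cl_direct(0.3)))  # True: formulas agree
grid = np.linspace(0.01, 0.9, 90)
print(grid[np.argmax([cl(r) for r in grid])],
      grid[np.argmax([l(r) for r in grid])])  # both near rho0 = 0.5
```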

... symmetric normal

$\mathrm{a.var}(\hat\rho) = \frac{2}{nm(m-1)}\, \frac{\{1+(m-1)\rho\}^2 (1-\rho)^2}{1+(m-1)\rho^2}$

$\mathrm{a.var}(\hat\rho_{CL}) = \frac{2(1-\rho)^2}{nm(m-1)}\, \frac{c(m, \rho)}{(1+\rho^2)^2}$

$c(m, \rho) = (1-\rho)^2(3\rho^2+1) + m\rho(-3\rho^3 + 8\rho^2 - 3\rho + 2) + m^2\rho^2(1-\rho)^2$

note: $\mathrm{a.var}(\hat\rho_{CL})$ is $O(1/n)$ as $n \to \infty$, but $O(1)$ as $m \to \infty$

... symmetric normal

[Figure: efficiency $\mathrm{a.var}(\hat\rho)/\mathrm{a.var}(\hat\rho_{CL})$ against ρ for m = 3, 5, 8, 10 (Cox & Reid, 2004); efficiency between about 0.85 and 1]

Likelihood ratio test

[Figure: four panels of log-likelihoods against ρ, for (ρ = 0.5, n = 10, q = 5), (ρ = 0.8, n = 10, q = 5), (ρ = 0.2, n = 10, q = 5), and (ρ = 0.2, n = 7, q = 5)]

... symmetric normal

$Y_i \sim N(\mu \mathbf{1}, \sigma^2 R)$, $R_{st} = \rho$

$\hat\mu = \hat\mu_{CL}$, $\hat\sigma^2 = \hat\sigma^2_{CL}$, $\hat\rho = \hat\rho_{CL}$

$G(\theta) = H(\theta)\, J(\theta)^{-1}\, H(\theta) = i(\theta)$, the expected Fisher information: pairwise likelihood is fully efficient

also true for $Y_i \sim N(\mu, \Sigma)$ (Mardia, Hughes, Taylor 2007; Jin 2009), because $U_{CL}(\theta) = J(\theta)\, H(\theta)^{-1}\, U_{\mathrm{full}}(\theta)$ (Pagui; Pace et al., 2011)

Example: dichotomized MV normal

$Y_{ir} = 1\{Z_{ir} > 0\}$, $Z_i \sim N(0, R)$, $r = 1, \ldots, m$; $i = 1, \ldots, n$

$l_2(\rho) = \sum_{i=1}^n \sum_{s<r} \{y_{ir} y_{is} \log P_{11} + y_{ir}(1-y_{is}) \log P_{10} + (1-y_{ir}) y_{is} \log P_{01} + (1-y_{ir})(1-y_{is}) \log P_{00}\}$, with $P_{jk} = \Pr(Y_r = j, Y_s = k)$

$\mathrm{a.var}(\hat\rho_{CL}) = \frac{1}{n}\, \frac{4\pi^2 (1-\rho^2)}{m^2(m-1)^2}\, \mathrm{var}(T), \qquad T = \sum_i \sum_{s<r} (2 y_{ir} y_{is} - y_{ir} - y_{is})$

$\mathrm{var}(T) = n\, m^4(p_{1111} - 2p_{111} + 2p_{11} - p_{11}^2 + \tfrac14) + m^3(-6 p_{1111} \cdots) + m^2(\cdots) + m(\cdots)$

[Figure: asymptotic variance of $\hat\rho$ against ρ, pairwise vs. full likelihood]

ρ:    0.02   0.05   0.12   0.20   0.40   0.50   0.60   0.70   0.80   0.90   0.95   0.98
ARE:  0.998  0.995  0.992  0.968  0.953  0.968  0.953  0.903  0.900  0.874  0.869  0.850

Example: multi-level probit model

latent variable: $z_{ir} = x_{ir}^T \beta + b_i + \epsilon_{ir}$, $\epsilon_{ir} \sim N(0, 1)$

binary observations: $y_{ir} = 1(z_{ir} > 0)$; $r = 1, \ldots, m_i$; $i = 1, \ldots, n$

probit model: $\Pr(y_{ir} = 1 \mid b_i) = \Phi(x_{ir}^T \beta + b_i)$, $b_i \sim N(0, \sigma_b^2)$

likelihood:
$L(\beta, \sigma_b) = \prod_{i=1}^n \int \prod_{r=1}^{m_i} \Phi(x_{ir}^T \beta + b_i)^{y_{ir}} \{1 - \Phi(x_{ir}^T \beta + b_i)\}^{1 - y_{ir}}\, \phi(b_i; \sigma_b^2)\, db_i$

pairwise likelihood:
$CL(\beta, \sigma_b) = \prod_{i=1}^n \prod_{r<s} P_{11}^{y_{ir} y_{is}}\, P_{10}^{y_{ir}(1-y_{is})}\, P_{01}^{(1-y_{ir}) y_{is}}\, P_{00}^{(1-y_{ir})(1-y_{is})}$

each $P_{jk} = \Pr(y_{ir} = j, y_{is} = k)$ evaluated using $\Phi_2(\cdot, \cdot; \rho_{irs})$ (Renard et al., 2004)
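A sketch of the pairwise construction (an assumed implementation following the idea described above, not Renard et al.'s code): marginally, $z_{ir}$ has variance $1 + \sigma_b^2$ and within-cluster latent correlation $\rho = \sigma_b^2/(1 + \sigma_b^2)$, so $\Pr(y_{ir} = 1, y_{is} = 1) = \Phi_2(\eta_r, \eta_s; \rho)$ with $\eta_r = x_{ir}^T\beta/\sqrt{1+\sigma_b^2}$.

```python
import numpy as np
from scipy import stats

def pair_probs(eta_r, eta_s, rho):
    """Return (P11, P10, P01, P00) for one within-cluster pair."""
    biv = stats.multivariate_normal(mean=[0, 0], cov=[[1.0, rho], [rho, 1.0]])
    P11 = biv.cdf([eta_r, eta_s])
    P1r, P1s = stats.norm.cdf(eta_r), stats.norm.cdf(eta_s)
    P10, P01 = P1r - P11, P1s - P11
    return P11, P10, P01, 1.0 - P11 - P10 - P01

def pairwise_loglik(beta, sigma_b, X_list, y_list):
    """X_list[i]: m_i x p covariates; y_list[i]: m_i binary responses."""
    rho = sigma_b**2 / (1 + sigma_b**2)
    scale = np.sqrt(1 + sigma_b**2)
    total = 0.0
    for X, y in zip(X_list, y_list):
        eta, m = X @ beta / scale, len(y)
        for r in range(m):
            for s in range(r + 1, m):
                probs = pair_probs(eta[r], eta[s], rho)
                idx = 2 * (1 - y[r]) + (1 - y[s])  # 0:11, 1:10, 2:01, 3:00
                total += np.log(probs[idx])
    return total

# Tiny simulated check
rng = np.random.default_rng(10)
beta, sigma_b = np.array([0.5, -1.0]), 1.0
X_list, y_list = [], []
for _ in range(50):
    X = rng.normal(size=(4, 2))
    b = rng.normal(0, sigma_b)
    y_list.append((X @ beta + b + rng.normal(size=4) > 0).astype(int))
    X_list.append(X)
print(pairwise_loglik(beta, sigma_b, X_list, y_list))
```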

... multi-level probit (Renard et al., 2004)

computational effort does not increase with the number of random effects

pairwise likelihood is numerically stable

efficiency losses relative to maximum likelihood are about 20% for estimation of β, and somewhat larger for estimation of $\sigma_b^2$

... Example

[Figure-only slide]

Markov chains (Hjort and Varin, 2008)

comparison of:

likelihood: $L(\theta; y) = \prod_r \Pr(Y_r = y_r \mid Y_{r-1} = y_{r-1}; \theta)$

adjoining-pairs CML: $CML(\theta; y) = \prod_r \Pr(Y_r = y_r, Y_{r-1} = y_{r-1}; \theta)$

composite conditional likelihood (= Besag's PL): $CCL(\theta; y) = \prod_r \Pr(Y_r = y_r \mid \text{neighbours}; \theta)$

... Markov chain example

random walk with p states and two reflecting barriers; transition matrix

$P = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0 \\
1-\rho & 0 & \rho & \cdots & 0 & 0 \\
0 & 1-\rho & 0 & \ddots & & 0 \\
\vdots & & \ddots & \ddots & \rho & \vdots \\
0 & 0 & \cdots & 1-\rho & 0 & \rho \\
0 & 0 & \cdots & 0 & 1 & 0
\end{pmatrix}$
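A sketch (an assumed example): build this transition matrix and form the adjoining-pairs CML, using $\Pr(Y_{r-1} = a, Y_r = b; \rho) = \pi_a(\rho)\, P_{ab}(\rho)$ with $\pi(\rho)$ the stationary distribution.

```python
import numpy as np

def transition_matrix(p, rho):
    P = np.zeros((p, p))
    P[0, 1] = 1.0          # reflecting barrier at state 0
    P[p - 1, p - 2] = 1.0  # reflecting barrier at state p-1
    for k in range(1, p - 1):
        P[k, k - 1], P[k, k + 1] = 1 - rho, rho
    return P

def stationary(P):
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return pi / pi.sum()

def cml(rho, chain, p):
    P, pi = transition_matrix(p, rho), stationary(transition_matrix(p, rho))
    return sum(np.log(pi[a] * P[a, b]) for a, b in zip(chain[:-1], chain[1:]))

# Simulate a chain with rho = 0.6 and maximize the CML over a grid
rng = np.random.default_rng(9)
p, rho0 = 5, 0.6
P0, chain = transition_matrix(p, rho0), [2]
for _ in range(2000):
    chain.append(rng.choice(p, p=P0[chain[-1]]))
grid = np.linspace(0.1, 0.9, 81)
print(grid[np.argmax([cml(r, chain, p) for r in grid])])  # near 0.6
```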

... Markov chain example

[Figure: reflecting barrier with five states; efficiency of pairwise likelihood (dashed line) and of Besag's pseudolikelihood (solid line)]

Example: longitudinal count data

subjects $i = 1, \ldots, n$; observed counts $y_{ir}$, $r = 1, \ldots, m_i$

model: $y_{ir} \mid u_{i1}, \ldots, u_{im_i} \sim \text{Poisson}\{u_{ir} \exp(x_{ir}^T \beta)\}$

gamma-distributed random effects, but correlated: $\mathrm{corr}(u_{ir}, u_{is}) = \rho^{|r-s|}$

the joint density has a combinatorial number of terms in $m_i$: impractical

weighted pairwise composite likelihood:
$L_{\mathrm{pair}}(\beta) = \prod_{i=1}^n \prod_{r=1}^{m_i} \prod_{s=r+1}^{m_i} f(y_{ir}, y_{is}; \beta)^{1/(m_i - 1)}$

weights chosen so that $L_{\mathrm{pair}}$ equals the full likelihood when ρ = 0 (Henderson & Shimakura, 2003)