Various types of likelihood

1. likelihood, marginal likelihood, conditional likelihood, profile likelihood, adjusted profile likelihood, Bayesian asymptotics
2. quasi-likelihood, composite likelihood
3. semi-parametric likelihood, partial likelihood
4. empirical likelihood, penalized likelihood
5. bootstrap likelihood, h-likelihood, weighted likelihood, pseudo-likelihood, local likelihood, sieve likelihood, simulated likelihood

STA 4508: Topics in Likelihood Inference, January 14, 2014

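Nuisance parameters: notation

$$\theta = (\psi, \lambda) = (\psi_1, \dots, \psi_q, \lambda_1, \dots, \lambda_{d-q})$$

$$U(\theta) = \begin{pmatrix} U_\psi(\theta) \\ U_\lambda(\theta) \end{pmatrix}, \qquad U_\lambda(\psi, \hat\lambda_\psi) = 0$$

$$i(\theta) = \begin{pmatrix} i_{\psi\psi} & i_{\psi\lambda} \\ i_{\lambda\psi} & i_{\lambda\lambda} \end{pmatrix}, \qquad j(\theta) = \begin{pmatrix} j_{\psi\psi} & j_{\psi\lambda} \\ j_{\lambda\psi} & j_{\lambda\lambda} \end{pmatrix}$$

$$i^{-1}(\theta) = \begin{pmatrix} i^{\psi\psi} & i^{\psi\lambda} \\ i^{\lambda\psi} & i^{\lambda\lambda} \end{pmatrix}, \qquad j^{-1}(\theta) = \begin{pmatrix} j^{\psi\psi} & j^{\psi\lambda} \\ j^{\lambda\psi} & j^{\lambda\lambda} \end{pmatrix}$$

$$i^{\psi\psi}(\theta) = \{ i_{\psi\psi}(\theta) - i_{\psi\lambda}(\theta)\, i_{\lambda\lambda}^{-1}(\theta)\, i_{\lambda\psi}(\theta) \}^{-1}$$

$$l_p(\psi) = l(\psi, \hat\lambda_\psi), \qquad j_p(\psi) = -l_p''(\psi)$$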

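Nuisance parameters: approximate pivots

$$w_u(\psi) = U_\psi(\psi, \hat\lambda_\psi)^T \{ i^{\psi\psi}(\psi, \hat\lambda_\psi) \} U_\psi(\psi, \hat\lambda_\psi) \mathrel{\dot\sim} \chi^2_q$$

$$w_e(\psi) = (\hat\psi - \psi)^T \{ i^{\psi\psi}(\hat\psi, \hat\lambda) \}^{-1} (\hat\psi - \psi) \mathrel{\dot\sim} \chi^2_q$$

$$w(\psi) = 2\{ l(\hat\psi, \hat\lambda) - l(\psi, \hat\lambda_\psi) \} = 2\{ l_p(\hat\psi) - l_p(\psi) \} \mathrel{\dot\sim} \chi^2_q$$

$$r_u(\psi) = l_p'(\psi)\, j_p^{-1/2}(\hat\psi) \mathrel{\dot\sim} N(0, 1)$$

$$r_e(\psi) = (\hat\psi - \psi)\, j_p^{1/2}(\hat\psi) \mathrel{\dot\sim} N(0, 1)$$

$$r(\psi) = \mathrm{sign}(\hat\psi - \psi) [ 2\{ l_p(\hat\psi) - l_p(\psi) \} ]^{1/2} \mathrel{\dot\sim} N(0, 1)$$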
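As a minimal sketch of the three scalar pivots, assume the model $Y_1, \dots, Y_n$ i.i.d. $N(\mu, \sigma^2)$ with interest parameter $\psi = \mu$ and nuisance parameter $\lambda = \sigma^2$. The model and the function name `normal_mean_pivots` are illustrative assumptions, not taken from the slides. Under this model $l_p(\mu) = -(n/2) \log \hat\sigma^2_\mu$ up to a constant, with $\hat\sigma^2_\mu = \sum (y_i - \mu)^2 / n$, so each pivot has a closed form.

```python
import math

def normal_mean_pivots(y, mu):
    """Return (r_u, r_e, r) for psi = mu in an assumed N(mu, sigma^2) sample."""
    n = len(y)
    ybar = sum(y) / n                                # psi_hat
    s2_hat = sum((v - ybar) ** 2 for v in y) / n     # unconstrained MLE of sigma^2
    s2_mu = sum((v - mu) ** 2 for v in y) / n        # constrained MLE of sigma^2 given mu
    jp_hat = n / s2_hat                              # j_p(psi_hat) = -l_p''(psi_hat)
    score = n * (ybar - mu) / s2_mu                  # l_p'(mu)
    r_u = score / math.sqrt(jp_hat)                  # score pivot
    r_e = (ybar - mu) * math.sqrt(jp_hat)            # Wald-type pivot
    w = max(n * math.log(s2_mu / s2_hat), 0.0)       # 2{l_p(psi_hat) - l_p(mu)}, guarded against rounding
    r = math.copysign(math.sqrt(w), ybar - mu)       # signed likelihood root
    return r_u, r_e, r

if __name__ == "__main__":
    print(normal_mean_pivots([4.1, 5.3, 6.2, 4.8, 5.9], 4.0))
```

In this particular model the three pivots agree in sign and satisfy $|r_u| \le |r| \le |r_e|$, which gives a quick sanity check on any implementation.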

Nuisance parameters: properties of likelihood
maximum likelihood estimates are equivariant: $\widehat{h(\theta)} = h(\hat\theta)$ for one-to-one $h(\cdot)$
question: which of $w_e$, $w_u$, $w$ are invariant under reparametrization of the full parameter, $\varphi(\theta)$?
question: which of $r_e$, $r_u$, $r$ are invariant under interest-respecting reparametrizations $(\psi, \lambda) \to \{\psi, \eta(\psi, \lambda)\}$?
consistency of maximum likelihood estimate
equivalence of maximum likelihood estimate and root of score equation
observed vs. expected information

Various types of likelihood
1. likelihood, marginal likelihood, conditional likelihood, profile likelihood, adjusted profile likelihood
2. quasi-likelihood, composite likelihood
3. semi-parametric likelihood, partial likelihood
4. empirical likelihood, penalized likelihood
5. bootstrap likelihood, h-likelihood, weighted likelihood, pseudo-likelihood, local likelihood, sieve likelihood, simulated likelihood

Marginal and conditional likelihoods
Example: $Y \sim N(X\beta, \sigma^2 I)$, $Y \in \mathbb{R}^n$
Example: $Y_{ij} \sim N(\mu_i, \sigma^2)$, $j = 1, \dots, k$; $i = 1, \dots, m$
Example: $Y_{ij} \sim N(\mu, \sigma_i^2)$, $j = 1, \dots, k_i$; $i = 1, \dots, m$
Example: $Y_{i1}, Y_{i2} \sim \mathrm{Bernoulli}(p_{i1}, p_{i2})$, $i = 1, \dots, n$
Example: $Y_{i1}, Y_{i2} \sim \mathrm{Exponential}(\lambda_i \psi, \lambda_i / \psi)$ or $(\psi \lambda_i, \psi / \lambda_i)$

Frequentist inference, nuisance parameters
first-order pivotal quantities

$$r_u(\psi) = l_P'(\psi)\, j_P(\hat\psi)^{-1/2} \mathrel{\dot\sim} N(0, 1),$$

$$r_e(\psi) = (\hat\psi - \psi)\, j_P(\hat\psi)^{1/2} \mathrel{\dot\sim} N(0, 1),$$

$$r(\psi) = \mathrm{sign}(\hat\psi - \psi) [ 2\{ l_P(\hat\psi) - l_P(\psi) \} ]^{1/2} \mathrel{\dot\sim} N(0, 1)$$

all based on treating profile log-likelihood as a one-parameter log-likelihood
example: $y = X\beta + \epsilon$, $\epsilon \sim N(0, \sigma^2)$

$$\hat\sigma^2 = (y - X\hat\beta)^T (y - X\hat\beta) / n$$
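The regression example shows what can go wrong when the profile log-likelihood is treated as an ordinary one-parameter log-likelihood: the divisor $n$ in $\hat\sigma^2$ ignores the degrees of freedom used to estimate $\beta$. The simulation below is a rough sketch under assumed settings (simple linear regression with $p = 2$ coefficients, design points $0, \dots, n-1$, true coefficients 1.0 and 0.5, a fixed seed, and hypothetical function names); none of these choices come from the slides.

```python
import random

def sigma2_mle(x, y):
    """Profile MLE of sigma^2 in simple linear regression: RSS / n."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # least-squares slope
    intercept = ybar - slope * xbar        # least-squares intercept
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return rss / n

def mean_sigma2_mle(n=10, sigma=1.0, reps=20000, seed=1):
    """Monte Carlo average of the MLE of sigma^2 over repeated samples."""
    rng = random.Random(seed)
    x = [float(i) for i in range(n)]
    total = 0.0
    for _ in range(reps):
        y = [1.0 + 0.5 * xi + rng.gauss(0.0, sigma) for xi in x]
        total += sigma2_mle(x, y)
    return total / reps

if __name__ == "__main__":
    # Theory: E(sigma2_hat) = sigma^2 (n - p) / n, which is 0.8 for n = 10, p = 2.
    print(mean_sigma2_mle())
```

With $n = 10$ and $p = 2$ the average should sit near $0.8\,\sigma^2$ rather than $\sigma^2$, and the shortfall grows with $p/n$.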

[Figure: log-likelihood plotted against ψ]

Eliminating nuisance parameters
by using marginal density

$$f(y; \psi, \lambda) \propto f_m(t_1; \psi)\, f_c(t_2 \mid t_1; \psi, \lambda)$$

Example $N(X\beta, \sigma^2 I)$:

$$f(y; \beta, \sigma^2) \propto f_m(\mathrm{RSS}; \sigma^2)\, f_c(\hat\beta \mid \mathrm{RSS}; \beta, \sigma^2)$$

by using conditional density

$$f(y; \psi, \lambda) \propto f_c(t_1 \mid t_2; \psi)\, f_m(t_2; \psi, \lambda)$$

Example $N(X\beta, \sigma^2 I)$:

$$f(y; \beta, \sigma^2) \propto f_c(\mathrm{RSS} \mid \hat\beta; \sigma^2)\, f_m(\hat\beta; \beta, \sigma^2)$$
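To make the marginal factor concrete, here is a short worked derivation for the normal linear model. It assumes $X$ has full column rank $p$ (the symbol $p$ is introduced here and does not appear on the slide) and uses the standard fact that $\mathrm{RSS}/\sigma^2 \sim \chi^2_{n-p}$.

```latex
\begin{align*}
f_m(\mathrm{RSS}; \sigma^2)
  &\propto (\sigma^2)^{-(n-p)/2}\, \mathrm{RSS}^{(n-p)/2 - 1}
     \exp\{-\mathrm{RSS}/(2\sigma^2)\}, \\
l_m(\sigma^2)
  &= -\frac{n-p}{2} \log \sigma^2 - \frac{\mathrm{RSS}}{2\sigma^2}, \\
\frac{\partial l_m}{\partial \sigma^2} = 0
  &\;\Longrightarrow\; \hat\sigma^2_m = \frac{\mathrm{RSS}}{n-p}.
\end{align*}
```

The marginal likelihood therefore recovers the usual degrees-of-freedom divisor $n - p$, whereas maximizing the profile log-likelihood gives $\mathrm{RSS}/n$.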

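Linear exponential families
conditional density free of nuisance parameter

$$f(y_i; \psi, \lambda) = \exp\{ \psi^T s(y_i) + \lambda^T t(y_i) - k(\psi, \lambda) \}\, h(y_i)$$

To be worked out for the sample $y = (y_1, \dots, y_n)$: the joint density $f(y; \psi, \lambda)$, the sufficient statistics $s$ and $t$, their joint density $f(s, t; \psi, \lambda)$, and the conditional density $f(s \mid t; \psi)$.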
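One way the quantities on this slide can be filled in is sketched below, assuming independent observations. The functions $\tilde h$, $h_t$ and $k_t$ are notation introduced for this sketch only; they collect whatever does not depend on the parameters shown.

```latex
\begin{align*}
f(y; \psi, \lambda)
  &= \exp\{\psi^T s + \lambda^T t - n\,k(\psi, \lambda)\} \prod_{i=1}^n h(y_i),
  \qquad s = \sum_{i=1}^n s(y_i), \quad t = \sum_{i=1}^n t(y_i), \\
f(s, t; \psi, \lambda)
  &= \exp\{\psi^T s + \lambda^T t - n\,k(\psi, \lambda)\}\, \tilde h(s, t), \\
f(s \mid t; \psi)
  &= \frac{f(s, t; \psi, \lambda)}{\int f(s', t; \psi, \lambda)\, ds'}
   = \exp\{\psi^T s - k_t(\psi)\}\, h_t(s).
\end{align*}
```

The factor $\exp\{\lambda^T t - n\,k(\psi, \lambda)\}$ cancels between numerator and denominator, which is why the conditional density is free of $\lambda$.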

Adjusted profile log-likelihood

$$l_A(\psi) = l_p(\psi) + A(\psi) = l(\psi, \hat\lambda_\psi) + A(\psi)$$

$A(\psi)$ assumed to be $O_p(1)$
generic form is (Fraser, 2003)

$$A_{FR}(\psi) = +\tfrac{1}{2} \log |j_{\lambda\lambda}(\psi, \hat\lambda_\psi)| - \log \left| \frac{d(\lambda)}{d\hat\lambda_\psi} \right|$$

closely related (SM 12.4.1, BN 1983)

$$A_{BN}(\psi) = -\tfrac{1}{2} \log |j_{\lambda\lambda}(\psi, \hat\lambda_\psi)| + \log \left| \frac{d\hat\lambda}{d\hat\lambda_\psi} \right|$$

if $i_{\psi\lambda}(\theta) = 0$, then $\hat\lambda_\psi = \hat\lambda + O_p(n^{-1})$, suggesting we ignore the last term
if $\psi$ is scalar, then in principle we can find a parametrization $(\psi, \lambda)$ in which $i_{\psi\lambda}(\theta) = 0$ (SM 12.4.2)
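A small numerical illustration of the adjustment $-\tfrac{1}{2} \log |j_{\lambda\lambda}(\psi, \hat\lambda_\psi)|$, assuming the balanced model $Y_{ij} \sim N(\mu_i, \sigma^2)$ with $\psi = \sigma^2$ and $\lambda = (\mu_1, \dots, \mu_m)$. In this model $\hat\lambda_\psi = \hat\lambda$ for every $\psi$, so the Jacobian term vanishes and $|j_{\lambda\lambda}| = (k/\sigma^2)^m$. The helper names and the ternary search are choices made for this sketch, not part of the slides.

```python
import math

def argmax_unimodal(f, lo, hi, iters=100):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def variance_estimates(groups):
    """Maximize l_p and l_A = l_p - 0.5 log|j_lambda_lambda| over sigma^2."""
    m = len(groups)          # number of groups (one nuisance mean per group)
    k = len(groups[0])       # common group size, assumed >= 2
    ss = sum(sum((v - sum(g) / k) ** 2 for v in g) for g in groups)

    def l_p(s2):             # profile log-likelihood, constants dropped
        return -0.5 * m * k * math.log(s2) - ss / (2.0 * s2)

    def l_a(s2):             # adjusted profile log-likelihood
        return l_p(s2) - 0.5 * m * math.log(k / s2)

    lo, hi = ss / (100.0 * m * k), ss    # bracket containing both maximizers
    return argmax_unimodal(l_p, lo, hi), argmax_unimodal(l_a, lo, hi)

if __name__ == "__main__":
    print(variance_estimates([[1.0, 2.0], [3.0, 5.0], [2.0, 2.5]]))
```

Under these assumptions the profile maximizer is $SS/(mk)$ and the adjusted maximizer is $SS/\{m(k-1)\}$, where $SS$ is the within-group sum of squares, so the adjustment restores the usual degrees-of-freedom correction.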

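Asymptotics for Bayesian inference

$$\pi(\theta \mid y) = \frac{\exp\{l(\theta; y)\}\, \pi(\theta)}{\int \exp\{l(\theta; y)\}\, \pi(\theta)\, d\theta}$$

expand numerator and denominator about $\hat\theta$, assuming $l'(\hat\theta) = 0$

$$\pi(\theta \mid y) \doteq N\{\hat\theta, j^{-1}(\hat\theta)\}$$

expand denominator only about $\hat\theta$
result

$$\pi(\theta \mid y) \doteq \frac{1}{(2\pi)^{d/2}}\, |j(\hat\theta)|^{+1/2} \exp\{l(\theta; y) - l(\hat\theta; y)\}\, \frac{\pi(\theta)}{\pi(\hat\theta)}$$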

Posterior is asymptotically normal

$$\pi(\theta \mid y) \mathrel{\dot\sim} N\{\hat\theta, j^{-1}(\hat\theta)\}, \qquad \theta \in \mathbb{R}, \quad y = (y_1, \dots, y_n)$$

careful statement

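... posterior is asymptotically normal

$$\pi(\theta \mid y) \mathrel{\dot\sim} N\{\hat\theta, j^{-1}(\hat\theta)\}, \qquad \theta \in \mathbb{R}, \quad y = (y_1, \dots, y_n)$$

equivalently, in terms of the log posterior (up to an additive constant)

$$l_\pi(\theta) = l(\theta; y) + \log \pi(\theta)$$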

... posterior is asymptotically normal
In fact, if $\pi(\theta) > 0$ and $\pi'(\theta)$ is continuous in a neighbourhood of $\theta_0$, there exist constants $D$ and $n_y$ such that

$$|F_n(\xi) - \Phi(\xi)| < D n^{-1/2}, \quad \text{for all } n > n_y,$$

on an almost-sure set with respect to $\pi(\theta_0) f(y; \theta_0)$, where $y = (y_1, \dots, y_n)$ is a sample from $f(y; \theta_0)$, and $\theta_0$ is an observation from the prior density $\pi(\theta)$. Here

$$F_n(\xi) = \Pr\{ (\theta - \hat\theta)\, j^{1/2}(\hat\theta) \le \xi \mid y \}$$

Johnson (1970); Datta & Mukerjee (2004)
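The $n^{-1/2}$ rate can be checked numerically in a case where the posterior is available exactly. The sketch below assumes an exponential sample with rate $\theta$, a flat prior, and data with $\sum y_i = n$ (so $\hat\theta = 1$); the posterior is then Gamma with shape $n + 1$ and rate $n$. The model and all function names are assumptions made for illustration.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gamma_cdf_integer_shape(x, shape):
    """P(G <= x) for G ~ Gamma(shape, rate 1), integer shape >= 1."""
    if x <= 0.0:
        return 0.0
    # Identity: P(G <= x) = 1 - P(Poisson(x) <= shape - 1).
    poisson_cdf = sum(
        math.exp(-x + k * math.log(x) - math.lgamma(k + 1)) for k in range(shape)
    )
    return 1.0 - poisson_cdf

def max_cdf_error(n):
    """Largest |F_n(xi) - Phi(xi)| over a grid of xi in [-3, 3]."""
    worst = 0.0
    for i in range(-300, 301):
        xi = i / 100.0
        theta = 1.0 + xi / math.sqrt(n)      # theta_hat + xi * j^{-1/2}(theta_hat)
        f_n = gamma_cdf_integer_shape(n * theta, n + 1)
        worst = max(worst, abs(f_n - std_normal_cdf(xi)))
    return worst

if __name__ == "__main__":
    for n in (10, 40, 160):
        err = max_cdf_error(n)
        print(n, err, err * math.sqrt(n))
```

If the bound is sharp in this example, quadrupling $n$ should roughly halve the maximum error, so the last printed column should stay roughly constant.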

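Laplace approximation

$$\pi(\theta \mid y) \doteq \frac{1}{(2\pi)^{1/2}}\, |j(\hat\theta)|^{+1/2} \exp\{l(\theta; y) - l(\hat\theta; y)\}\, \frac{\pi(\theta)}{\pi(\hat\theta)}$$

$$\pi(\theta \mid y) = \frac{1}{(2\pi)^{1/2}}\, |j(\hat\theta)|^{+1/2} \exp\{l(\theta; y) - l(\hat\theta; y)\}\, \frac{\pi(\theta)}{\pi(\hat\theta)}\, \{1 + O_p(n^{-1})\}$$

$$\pi(\theta \mid y) = \frac{1}{(2\pi)^{1/2}}\, |j_\pi(\hat\theta_\pi)|^{+1/2} \exp\{l_\pi(\theta; y) - l_\pi(\hat\theta_\pi; y)\}\, \{1 + O_p(n^{-1})\}$$

$y = (y_1, \dots, y_n)$, $\theta \in \mathbb{R}^1$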
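The relative error of order $n^{-1}$ can be seen exactly in a simple case. The sketch below assumes an exponential sample with rate $\theta$, sufficient statistic $s = \sum y_i$, and a flat prior, so the exact posterior is Gamma with shape $n + 1$ and rate $s$. These modelling choices and the function names are illustrative assumptions.

```python
import math

def laplace_posterior_density(theta, n, s):
    """Laplace approximation to the posterior density under a flat prior."""
    theta_hat = n / s                       # maximum likelihood estimate
    j_hat = n / theta_hat ** 2              # observed information at theta_hat

    def loglik(t):
        return n * math.log(t) - s * t

    return math.sqrt(j_hat / (2.0 * math.pi)) * math.exp(loglik(theta) - loglik(theta_hat))

def exact_posterior_density(theta, n, s):
    """Gamma(shape n + 1, rate s) density, the exact flat-prior posterior."""
    log_density = (n + 1) * math.log(s) + n * math.log(theta) - s * theta - math.lgamma(n + 1)
    return math.exp(log_density)

if __name__ == "__main__":
    n, s = 10, 8.0
    for theta in (1.0, 2.0):
        ratio = laplace_posterior_density(theta, n, s) / exact_posterior_density(theta, n, s)
        print(theta, ratio)
```

In this example the ratio does not depend on $\theta$: only the normalizing constant is approximated, and the ratio equals $n!$ divided by Stirling's formula, roughly $1 + 1/(12n)$.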

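Posterior tail area

$$\int_\theta^\infty \pi(\vartheta \mid y)\, d\vartheta \doteq \int_\theta^\infty \frac{1}{(2\pi)^{1/2}}\, e^{l(\vartheta; y) - l(\hat\vartheta; y)}\, |j(\hat\vartheta)|^{1/2}\, \frac{\pi(\vartheta)}{\pi(\hat\vartheta)}\, d\vartheta$$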

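Posterior cdf

$$\int_{-\infty}^\theta \pi(\vartheta \mid y)\, d\vartheta \doteq \int_{-\infty}^\theta \frac{1}{(2\pi)^{1/2}}\, e^{l(\vartheta; y) - l(\hat\vartheta; y)}\, |j(\hat\vartheta)|^{1/2}\, \frac{\pi(\vartheta)}{\pi(\hat\vartheta)}\, d\vartheta$$

SM, 11.3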

BDR, Ch.3, Cauchy with flat prior

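Nuisance parameters
$y = (y_1, \dots, y_n) \sim f(y; \theta)$, $\theta = (\psi, \lambda)$

$$\pi_m(\psi \mid y) = \int \pi(\psi, \lambda \mid y)\, d\lambda = \frac{\int \exp\{l(\psi, \lambda; y)\}\, \pi(\psi, \lambda)\, d\lambda}{\iint \exp\{l(\psi, \lambda; y)\}\, \pi(\psi, \lambda)\, d\psi\, d\lambda}$$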

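... nuisance parameters
$y = (y_1, \dots, y_n) \sim f(y; \theta)$, $\theta = (\psi, \lambda)$

$$\pi_m(\psi \mid y) = \int \pi(\psi, \lambda \mid y)\, d\lambda = \frac{\int \exp\{l(\psi, \lambda; y)\}\, \pi(\psi, \lambda)\, d\lambda}{\iint \exp\{l(\psi, \lambda; y)\}\, \pi(\psi, \lambda)\, d\psi\, d\lambda}$$

$$|j(\hat\theta)| = |j^{\psi\psi}(\hat\theta)|^{-1}\, |j_{\lambda\lambda}(\hat\theta)|$$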

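Posterior marginal cdf, d = 1

$$\Pi_m(\psi \mid y) = \int_{-\infty}^\psi \pi_m(\xi \mid y)\, d\xi \doteq \int_{-\infty}^\psi \frac{1}{(2\pi)^{1/2}}\, e^{l_p(\xi) - l_p(\hat\xi)}\, j_p^{1/2}(\hat\xi)\, \frac{\pi(\xi, \hat\lambda_\xi)}{\pi(\hat\xi, \hat\lambda)}\, \frac{|j_{\lambda\lambda}(\hat\xi, \hat\lambda)|^{1/2}}{|j_{\lambda\lambda}(\xi, \hat\lambda_\xi)|^{1/2}}\, d\xi$$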

... posterior marginal cdf, d = 1

$$\Pi_m(\psi \mid y) \doteq \Phi(r_B) = \Phi\left\{ r + \frac{1}{r} \log\left(\frac{q_B}{r}\right) \right\}$$

where $r = r(\psi)$ and $q_B = q_B(\psi)$
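A rough check of the $\Phi(r_B)$ form in the simplest setting, with no nuisance parameter. The sketch assumes an exponential sample with rate $\theta$, a flat prior, $r = \mathrm{sign}(\theta - \hat\theta)[2\{l(\hat\theta) - l(\theta)\}]^{1/2}$ and $q_B = -l'(\theta)\, j^{-1/2}(\hat\theta)\, \pi(\hat\theta)/\pi(\theta)$. This sign convention is chosen here so that $\Phi(r_B)$ increases in $\theta$ and approximates the cdf; with the opposite sign for both $r$ and $q_B$ the same formula gives the upper tail. These definitions, the model, and the function names are assumptions of the sketch, since the slide leaves $r$ and $q_B$ unspecified.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def posterior_cdf_rstar(theta, n, s):
    """Phi{r + log(q_B / r) / r} for an exponential sample with a flat prior.

    Not defined at theta == theta_hat, where r = q_B = 0.
    """
    theta_hat = n / s

    def loglik(t):
        return n * math.log(t) - s * t

    score = n / theta - s                    # l'(theta)
    j_hat = n / theta_hat ** 2               # observed information at theta_hat
    r = math.copysign(
        math.sqrt(2.0 * (loglik(theta_hat) - loglik(theta))), theta - theta_hat
    )
    q_b = -score / math.sqrt(j_hat)          # flat prior: pi(theta_hat) / pi(theta) = 1
    return std_normal_cdf(r + math.log(q_b / r) / r)

def posterior_cdf_exact(theta, n, s):
    """Exact cdf of the Gamma(n + 1, rate s) posterior via a Poisson sum."""
    x = theta * s
    return 1.0 - sum(
        math.exp(-x + k * math.log(x) - math.lgamma(k + 1)) for k in range(n + 1)
    )

if __name__ == "__main__":
    for theta in (0.6, 1.5):
        print(theta, posterior_cdf_rstar(theta, 5, 5.0), posterior_cdf_exact(theta, 5, 5.0))
```

Even with $n = 5$ the approximation should agree with the exact posterior cdf to about three decimal places in this example, far closer than the first-order normal approximation.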

[Figure: normal circle, k = 2; p-value plotted against ψ]

[Figure: normal circle, k = 2, 5, 10; p-value plotted against ψ]

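Link to adjusted log-likelihoods

$$\pi_m(\psi \mid y) \doteq \frac{1}{(2\pi)^{d/2}}\, e^{l_p(\psi) - l_p(\hat\psi)}\, |j_p(\hat\psi)|^{1/2}\, \frac{\pi(\psi, \hat\lambda_\psi)}{\pi(\hat\psi, \hat\lambda)}\, \frac{|j_{\lambda\lambda}(\hat\psi, \hat\lambda)|^{1/2}}{|j_{\lambda\lambda}(\psi, \hat\lambda_\psi)|^{1/2}}$$

$$\pi_m(\psi \mid y) \doteq c \exp\{ l_p(\psi) - \tfrac{1}{2} \log |j_{\lambda\lambda}(\psi, \hat\lambda_\psi)| + \log \pi(\psi, \hat\lambda_\psi) \}$$

$$l_A(\psi) = l_p(\psi) - \tfrac{1}{2} \log |j_{\lambda\lambda}(\psi, \hat\lambda_\psi)| + \log \left| \frac{d\hat\lambda}{d\hat\lambda_\psi} \right|$$

if $i_{\psi\lambda}(\theta) = 0$, then $\hat\lambda_\psi = \hat\lambda + O_p(n^{-1})$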