Lecture 12: Pseudo likelihood approach

Pseudo MLE
Let $X_1,\dots,X_n$ be a random sample from a pdf in a family indexed by two parameters $\theta$ and $\pi$, with likelihood $l(\theta,\pi)$. The method of pseudo MLE may be viewed as follows. Based on the sample, an estimate $\hat\pi$ of $\pi$ is obtained using some technique other than maximum likelihood. The pseudo MLE of $\theta$ is then obtained by maximizing the likelihood $l(\theta,\hat\pi)$.

Advantages of Pseudo MLE
$\pi$ is viewed as a nuisance parameter. Pseudo MLE replaces $\pi$ by an estimate and solves a reduced system of likelihood equations, which works when the higher-dimensional MLE is intractable but a lower-dimensional MLE is feasible. Consistency and asymptotic normality hold under fairly standard regularity conditions.

UW-Madison (Statistics) Math-Stat Lecture 12, Jan 2018
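As a minimal numerical sketch of the two-step recipe (my own toy example, not from the lecture): take a normal family with mean $\theta$ and variance $\pi$, estimate the nuisance variance by the sample variance, then maximize $l(\theta,\hat\pi)$ over $\theta$ numerically. The family, seed, and `scipy` optimizer are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)      # theta0 = 2, pi0 = 1.5**2

# Step 1: estimate the nuisance parameter pi (the variance) without MLE,
# here by the sample variance.
pi_hat = x.var(ddof=1)

# Step 2: the pseudo MLE of theta maximizes l(theta, pi_hat).
def neg_pseudo_loglik(theta):
    return -norm.logpdf(x, loc=theta, scale=np.sqrt(pi_hat)).sum()

theta_hat = minimize_scalar(neg_pseudo_loglik, bounds=(-10, 10), method="bounded").x
print(theta_hat)   # for this family the maximizer is the sample mean
```

In this toy family the one-dimensional maximization has a closed form (the sample mean), which makes it easy to verify the optimizer; in realistic models only the numerical step is available.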
Lemma
Let $X_1,\dots,X_n$ be iid random variables from a distribution $F_\pi$ on the real line, with $\pi\in\Pi\subset\mathbb{R}$. Let $\pi_0\in\Pi$ be the true value of the parameter, and let $\hat\pi$ be a sample estimator such that $\hat\pi\to_p\pi_0$. Let $\psi(x,\pi)$ be a differentiable function of $\pi$ for $\pi\in B$, an open neighborhood of $\pi_0$, and for almost all $x$ in the sample space. Suppose $E|\psi(X,\pi_0)|<\infty$. If $|\partial\psi(x,\pi)/\partial\pi|\le M(x)$ for all $\pi\in B$, where $E[M(X)]<\infty$, then
$$\frac{1}{n}\sum_{i=1}^n \psi(x_i,\hat\pi)\to_p E\,\psi(X,\pi_0).$$

Proof.
Consider the Taylor series expansion of $\frac{1}{n}\sum_{i=1}^n\psi(x_i,\hat\pi)$ around $\pi_0$.

Notation
$f(x\mid\theta,\pi)$: the pdf of $X_1$
$\Phi(x\mid\theta,\pi)=\log f(x\mid\theta,\pi)$
$l_n(\theta,\pi)=n^{-1}\sum_{i=1}^n\Phi(x_i\mid\theta,\pi)$
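A quick Monte Carlo check of the lemma's conclusion (my own toy example): take $\psi(x,\pi)=(x-\pi)^2$ with $X\sim N(\pi_0,1)$ and $\hat\pi=\bar X$, so that $E\,\psi(X,\pi_0)=\mathrm{Var}(X)=1$.

```python
import numpy as np

rng = np.random.default_rng(1)
pi0 = 3.0
# psi(x, pi) = (x - pi)^2; for X ~ N(pi0, 1), E psi(X, pi0) = Var(X) = 1
for n in (100, 10_000, 1_000_000):
    x = rng.normal(pi0, 1.0, size=n)
    pi_hat = x.mean()                    # consistent estimator of pi0
    avg = np.mean((x - pi_hat) ** 2)     # (1/n) sum_i psi(x_i, pi_hat)
    print(n, avg)                        # approaches E psi(X, pi0) = 1
```

The plug-in average converges to the expectation evaluated at the true $\pi_0$, which is exactly what the lemma asserts.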
Notation
$\Phi_\theta=\partial\Phi/\partial\theta$, $\Phi_{\theta\pi}=\partial^2\Phi/\partial\theta\,\partial\pi$, $\Phi_{\theta\theta\pi}=\partial^3\Phi/\partial\theta^2\,\partial\pi$, etc.

Regularity Conditions
(A1) The following partial derivatives exist: $\Phi_\theta,\Phi_{\theta\theta},\Phi_{\theta\theta\theta},\Phi_\pi,\Phi_{\theta\pi},\Phi_{\theta\theta\pi},\Phi_{\theta\pi\pi}$.
(A2) Interchange of differentiation and integration of $f$ is valid for the first and second partial derivatives.
(A3) $\phi_{11}$ and $\phi_{12}$ exist, where $\phi_{11}=E_{\theta_0,\pi_0}[\Phi_\theta]^2>0$ and $\phi_{12}=E_{\theta_0,\pi_0}[\Phi_\theta\Phi_\pi]$.
(A4) $\left|\dfrac{\partial}{\partial\pi}\log\dfrac{f(x\mid\theta,\pi)}{f(x\mid\theta_0,\pi)}\right|\le M(x,\theta)$, where $E\,M(X,\theta)<\infty$.
(A5) The third partial derivatives $\Phi_{\theta\theta\theta},\Phi_{\theta\theta\pi},\Phi_{\theta\pi\pi}$ are bounded by integrable functions $M(x)$.
(A6) For any $(\theta,\pi)\neq(\theta_0,\pi_0)$, $P_{\theta_0,\pi_0}\{f(X\mid\theta,\pi)=f(X\mid\theta_0,\pi_0)\}<1$.
Theorem (Consistency of Pseudo MLE)
Let $X_1,\dots,X_n$ be an iid sample with pdf $f(x\mid\theta_0,\pi_0)$, and let $\hat\pi$ be a consistent estimator of $\pi_0$. Suppose (A1), (A4) and (A6) hold. Then $\hat\theta\to_p\theta_0$, where $\hat\theta$ is a solution of $\frac{\partial}{\partial\theta}l_n(\theta,\hat\pi)=0$.

Proof.
By the lemma, for $\theta\neq\theta_0$,
$$l_n(\theta,\hat\pi)-l_n(\theta_0,\hat\pi)\to_p E_{\theta_0,\pi_0}\log\frac{f(X\mid\theta,\pi_0)}{f(X\mid\theta_0,\pi_0)}<\log E_{\theta_0,\pi_0}\frac{f(X\mid\theta,\pi_0)}{f(X\mid\theta_0,\pi_0)}=0,$$
where the strict inequality follows from Jensen's inequality and (A6). Hence, with probability tending to one, $l_n(\theta,\hat\pi)$ has a local maximum in $(\theta_0-\varepsilon,\theta_0+\varepsilon)$, so there is a $\hat\theta$ in this interval satisfying $\frac{\partial}{\partial\theta}l_n(\hat\theta,\hat\pi)=0$. This completes the proof.

Remark
The theorem establishes only that the pseudo likelihood equation has a consistent root. In many applications the pseudo maximum likelihood equation has a unique solution and the pseudo MLE is indeed consistent, because the logarithmic derivative of the pseudo likelihood is in most cases a decreasing function of the structural parameter.
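The key step of the proof is the strict Jensen inequality: $E_{\theta_0,\pi_0}\log[f(X\mid\theta,\pi_0)/f(X\mid\theta_0,\pi_0)]$ is zero at $\theta_0$ and strictly negative elsewhere. A Monte Carlo sanity check on a toy normal-location family (my example, not the lecture's):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
theta0, sd = 2.0, 1.5
x = rng.normal(theta0, sd, size=200_000)

# Estimate E log[f(X|theta)/f(X|theta0)] at several theta values:
# 0 at theta = theta0, strictly negative otherwise (Jensen + identifiability)
gaps = {}
for theta in (0.0, 1.5, 2.0, 2.5, 4.0):
    gaps[theta] = np.mean(norm.logpdf(x, theta, sd) - norm.logpdf(x, theta0, sd))
print(gaps)
```

Because the expected log-likelihood-ratio is maximized (at zero) only at $\theta_0$, the sample version $l_n(\theta,\hat\pi)-l_n(\theta_0,\hat\pi)$ eventually dips below zero away from $\theta_0$, forcing a stationary point nearby.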
Theorem (Asymptotic Normality of Pseudo MLE)
Let $X_1,\dots,X_n$ be an iid sample with pdf $f(x\mid\theta_0,\pi_0)$, and let $\hat\pi$ be a sample estimator such that $\hat\pi-\pi_0=O_p(1/\sqrt n)$. Suppose further that
$$\sqrt n\,\bigl(l_\theta(\theta_0,\pi_0),\ \hat\pi-\pi_0\bigr)\to_d N(0,\Sigma),\qquad \Sigma=\begin{pmatrix}\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{pmatrix},$$
where $l_\theta=\partial l_n/\partial\theta$. Then, under regularity conditions (A1)-(A6), a consistent pseudo MLE $\hat\theta$ is asymptotically normal,
$$\sqrt n(\hat\theta-\theta_0)\to_d N(0,\sigma^2),\qquad \sigma^2=\frac{1}{\phi_{11}}+\frac{\phi_{12}}{\phi_{11}^2}\bigl(\Sigma_{22}\phi_{12}-2\Sigma_{12}\bigr).$$

Remark
If $\phi_{22}=E_{\theta_0,\pi_0}[\Phi_\pi]^2$ exists and is positive, and if $\Sigma_{12}=0$ and $\hat\pi$ is asymptotically equivalent to the MLE of $\pi_0$, then $\Sigma_{22}=\phi_{11}/(\phi_{11}\phi_{22}-\phi_{12}^2)$ and $\sigma^2=\phi_{22}/(\phi_{11}\phi_{22}-\phi_{12}^2)$. Thus, in this case $\hat\theta$ is asymptotically efficient; if moreover $\phi_{12}=0$, then $\sigma^2$ reduces to $1/\phi_{11}$.
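The algebra behind the remark can be checked numerically: plugging $\Sigma_{12}=0$ and $\Sigma_{22}=\phi_{11}/(\phi_{11}\phi_{22}-\phi_{12}^2)$ into the variance formula recovers the MLE variance $\phi_{22}/(\phi_{11}\phi_{22}-\phi_{12}^2)$. The numeric values of $\phi_{11},\phi_{12},\phi_{22}$ below are arbitrary test inputs.

```python
import numpy as np

def pmle_var(phi11, phi12, Sigma12, Sigma22):
    """Asymptotic variance of the pseudo MLE from the theorem."""
    return 1.0 / phi11 + (phi12 / phi11**2) * (Sigma22 * phi12 - 2.0 * Sigma12)

# Remark check: Sigma12 = 0 and Sigma22 = phi11 / det (pi_hat asymptotically
# equivalent to the MLE) should give the MLE variance phi22 / det.
phi11, phi12, phi22 = 2.0, 0.6, 1.5
det = phi11 * phi22 - phi12**2
sigma2 = pmle_var(phi11, phi12, Sigma12=0.0, Sigma22=phi11 / det)
print(sigma2, phi22 / det)
```

The identity holds for any valid $(\phi_{11},\phi_{12},\phi_{22})$, since $1/\phi_{11}+\phi_{12}^2/(\phi_{11}\det)=\phi_{22}/\det$ after putting both terms over the common denominator.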
Proof.
By Taylor's expansion,
$$0=\sqrt n\,l_\theta(\hat\theta,\hat\pi)=\sqrt n\,l_\theta(\theta_0,\hat\pi)+\sqrt n(\hat\theta-\theta_0)\,l_{\theta\theta}(\theta_0,\hat\pi)+\tfrac12\sqrt n(\hat\theta-\theta_0)^2\,l_{\theta\theta\theta}(\theta^*,\hat\pi),$$
where $\theta^*$ lies between $\theta_0$ and $\hat\theta$. Then
$$-\sqrt n\,l_\theta(\theta_0,\hat\pi)=\sqrt n(\hat\theta-\theta_0)\Bigl(l_{\theta\theta}(\theta_0,\hat\pi)+\tfrac12(\hat\theta-\theta_0)\,l_{\theta\theta\theta}(\theta^*,\hat\pi)\Bigr).$$
Note that
$$\sqrt n\,l_\theta(\theta_0,\hat\pi)=\sqrt n\,l_\theta(\theta_0,\pi_0)-\sqrt n(\hat\pi-\pi_0)\phi_{12}+o_p(1),$$
$$l_{\theta\theta}(\theta_0,\hat\pi)+\phi_{11}=o_p(1),\qquad \tfrac12(\hat\theta-\theta_0)\,l_{\theta\theta\theta}(\theta^*,\hat\pi)=o_p(1).$$
We then see that $\sqrt n(\hat\theta-\theta_0)$ is asymptotically equivalent to
$$\bigl(\sqrt n\,l_\theta(\theta_0,\pi_0)-\sqrt n(\hat\pi-\pi_0)\phi_{12}\bigr)/\phi_{11}.$$
The conclusion follows from the fact that $\Sigma_{11}=\phi_{11}$.
Example: Signal Plus Noise Model
Let $X$ be a random variable whose distribution is that of the sum of independent variables $Y$ and $Z$. The distribution of $X$ is thus the convolution of $F_Y$ and $F_Z$, and may be thought of as a model for signals in additive noise. Let $X_1,\dots,X_n$ be a random sample from the distribution of $Y+Z$, where $Y\sim\mathrm{Poisson}(\theta_0)$ and $Z\sim B(N,p_0)$. Since $EX=\theta_0+Np_0$ and $\mathrm{Var}(X)=\theta_0+Np_0(1-p_0)$, so that $EX-\mathrm{Var}(X)=Np_0^2$, the moment estimators of $p_0$ and $\theta_0$ are
$$\hat p=\sqrt{(\bar X-S^2)/N},\qquad \hat\theta=\bar X-N\hat p,$$
where $\bar X$ and $S^2$ are the sample mean and variance, provided $\bar X>S^2$, which occurs with probability tending to 1 as $n\to\infty$. Moreover, by the $\delta$-method we can derive that
$$\sqrt n\,\bigl(l_\theta(\theta_0,p_0),\ \hat p-p_0\bigr)\to_d N(0,\Sigma),\qquad \Sigma=\begin{pmatrix}\phi_{11}&0\\ 0&t^2(\Gamma_{22}-2\Gamma_{23}+\Gamma_{33})\end{pmatrix},$$
with $t=1/(2Np_0)$, $\Gamma_{22}=\theta_0+Np_0(1-p_0)$, $\Gamma_{23}=\theta_0+Np_0(1-2p_0)$, and $\Gamma_{33}=\theta_0+2\bigl(\theta_0+Np_0(1-p_0)\bigr)^2+Np_0(1-p_0)\bigl(1-6p_0(1-p_0)\bigr)$.
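The example can be simulated end to end: draw $X_i=Y_i+Z_i$, form the moment estimators, then fix $p$ at $\hat p$ and maximize the Poisson-binomial convolution likelihood over $\theta$. This is my sketch of the procedure (sample size, seed, optimizer bounds, and the direct-summation pmf are illustrative choices, not code from the lecture).

```python
import numpy as np
from scipy.stats import poisson, binom
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
theta0, N, p0, n = 4.0, 10, 0.3, 2000
x = rng.poisson(theta0, n) + rng.binomial(N, p0, n)   # X = Y + Z

# Moment estimators: E X - Var X = N p^2, so p_hat = sqrt((xbar - s2)/N)
xbar, s2 = x.mean(), x.var(ddof=1)
p_hat = np.sqrt(max(xbar - s2, 0.0) / N)
theta_mm = xbar - N * p_hat                            # moment estimator of theta0

# Pseudo MLE: plug p_hat into the convolution likelihood, maximize over theta.
k = np.arange(N + 1)
pz = binom.pmf(k, N, p_hat)                            # P(Z = k), fixed once

def neg_loglik(theta):
    # f(x) = sum_k P(Z = k) P(Y = x - k); poisson.pmf is 0 for negative counts
    py = poisson.pmf(x[:, None] - k[None, :], theta)   # shape (n, N+1)
    return -np.log(py @ pz).sum()

theta_pmle = minimize_scalar(neg_loglik, bounds=(1e-6, 20), method="bounded").x
print(p_hat, theta_mm, theta_pmle)
```

Note that only a one-dimensional maximization over $\theta$ is needed; the two-dimensional MLE of $(\theta,p)$ for this convolution family is considerably harder, which is precisely the appeal of the pseudo likelihood approach here.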
Comparison of the asymptotic variances
The asymptotic variances of the MLE, pseudo MLE and moment estimator of the signal parameter $\theta_0$ are
$$\sigma^2_{\mathrm{MLE}}=\frac{\phi_{22}}{\phi_{11}\phi_{22}-\phi_{12}^2},\qquad \sigma^2_{\mathrm{PMLE}}=\frac{1}{\phi_{11}}+\frac{\phi_{12}^2}{\phi_{11}^2}\,t^2(\Gamma_{22}-2\Gamma_{23}+\Gamma_{33}),$$
$$\sigma^2_{\mathrm{MME}}=(1-Nt)^2\Gamma_{22}+2Nt(1-Nt)\Gamma_{23}+(Nt)^2\Gamma_{33}.$$
It is not easy to compare these expressions analytically. For a specific range of parameters we find $\mathrm{ARE}(\mathrm{PMLE}/\mathrm{MLE})<1$ and $\mathrm{ARE}(\mathrm{MME}/\mathrm{PMLE})<1$. This example leads us to conjecture that the pseudo MLE of $\theta_0$ is uniformly more efficient asymptotically than the moment estimator of $\theta_0$.
Discussion
Pseudo MLE can be viewed as solving a reduced system of likelihood equations, which promises to improve the asymptotic behavior of estimates of specific parameters.
The requirements on the model are slightly less stringent for pseudo MLE than for other methods, and the method may be applicable in situations where the standard theory for MLE breaks down.
The main virtue of pseudo MLEs is their tractability in problems in which optimal approaches are computationally infeasible.
An iterative procedure based on the method of pseudo MLE may provide more efficient estimates than the single iteration we have discussed. Under regularity conditions, such an algorithm should converge to the MLE, but the speed of this convergence may preclude its use.