IISA Talk, DeKalb, June 02

Efficient Semiparametric Estimators via Preliminary Estimators for Kullback-Leibler Minimizers

Eric Slud, Math Dept, Univ of Maryland
(Ongoing joint project with Ilia Vonta, Univ. of Cyprus.)

OUTLINE
I. MOTIVATION: Semiparametric Frailty Problems
II. APPROACH: Finite-Dimensional Version
III. BACKGROUND LITERATURE: Profile Likelihood, Semiparametrics, Frailty Models
IV. Semiparametric Examples
V. Efficient Estimator in Frailty Case
Survival Models with Frailties

Variables:
$T_i$ Survival times, $Z_i$ Discrete Covariates
$C_i$ Censoring Times, cond. surv. fcn. $R_z(c)$ given $(T_i, C_i)$

DATA: iid triples $(\min(T_i, C_i),\ I_{[T_i \le C_i]},\ Z_i)$

Observable processes:
$N_z^i(t) = I_{[Z_i = z,\ T_i \le \min(C_i, t)]}$, $Y_z^i(t) = I_{[\min(T_i, C_i) \ge t]}$

TRANSF. MODEL: $S_{T|Z}(t \mid z) = \exp(-G(e^{\beta' z}\Lambda(t)))$
$G$ known, $\beta \in R^m$, $\Lambda$ cumulative-hazard fcn

PROBLEM: efficient estimation of $\beta$.

Special Cases:
(1) Cox 1972: $G(x) \equiv x$
(2) Frailty: unobserved random intercept $\beta_0 = \xi_i$, $G(x) = -\log \int_0^\infty e^{-sx}\, dF(s)$
(3) Clayton-Cuzick 1986: $G(x) \equiv \frac{1}{b}\log(1 + bx)$
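The three special cases of $G$ can be made concrete with a short sketch. The function names below are hypothetical, and the choice of a Gamma frailty law $F$ with mean 1 and variance $b$ is an assumption made here for illustration (the slide leaves $F$ general). Under that assumption the frailty form (2) should reproduce the Clayton-Cuzick form (3), which the code checks by numerical integration.

```python
import math

def G_cox(x):
    """Cox (1972) special case: G(x) = x."""
    return x

def G_clayton_cuzick(x, b):
    """Clayton-Cuzick (1986) special case: G(x) = (1/b) log(1 + b x)."""
    return math.log1p(b * x) / b

def G_frailty_gamma(x, b, upper=60.0, steps=60000):
    """Frailty form G(x) = -log int_0^inf exp(-s x) dF(s), by the midpoint rule.

    ASSUMPTION (not from the slides): F is the Gamma law with shape 1/b and
    scale b, i.e. mean 1 and variance b. The midpoint rule is only accurate
    here for b <= 1, where the Gamma density is bounded near 0.
    """
    k, theta = 1.0 / b, b
    log_norm = math.lgamma(k) + k * math.log(theta)
    h = upper / steps
    total = 0.0
    for j in range(steps):
        s = (j + 0.5) * h
        log_density = (k - 1.0) * math.log(s) - s / theta - log_norm
        total += math.exp(log_density - s * x) * h
    return -math.log(total)

def survival(t, z, beta, Lambda, G):
    """Transformation model S(t|z) = exp(-G(exp(beta'z) * Lambda(t)))."""
    lin = sum(bj * zj for bj, zj in zip(beta, z))
    return math.exp(-G(math.exp(lin) * Lambda(t)))

# Compare the two forms at one point, with b = 0.5.
g_numeric = G_frailty_gamma(1.3, 0.5)
g_closed = G_clayton_cuzick(1.3, 0.5)
# Cox-model survival with unit baseline hazard, Lambda(t) = t.
s_cox = survival(2.0, [1.0], [0.5], lambda t: t, G_cox)
```

Passing a different `G` to `survival` switches between the Cox, frailty, and Clayton-Cuzick models without touching the rest of the code, which mirrors how the talk treats $G$ as a known ingredient.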
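Finite-dimensional Case

$X_i$, $i = 1, \ldots, n$ iid $\sim f(x, \beta, \lambda)$, $\beta \in R^m$, $\lambda \in R^d$ unknown, with true values $(\beta_0, \lambda_0)$

$\mathrm{loglik}(\beta, \lambda) = \sum_{i=1}^n \log f(X_i, \beta, \lambda)$

Profile Likelihood $= \mathrm{loglik}(\beta, \hat\lambda_\beta)$ with restricted MLE $\hat\lambda_\beta = \arg\max_\lambda \mathrm{loglik}(\beta, \lambda)$

Min Kullback-Leibler Modified Profile Approach (Severini and Wong 1992)

$K(\beta, \lambda) \equiv E_{\beta_0,\lambda_0}(\log f(X_1, \beta, \lambda)) = \int \{\log f(x, \beta, \lambda)\}\, f(x, \beta_0, \lambda_0)\, dx$

Define: $\lambda_\beta = \arg\max_\lambda K(\beta, \lambda)$
Then: $\tilde\lambda_\beta$ estimates curve $\lambda_\beta$

Candidate Estimator $\tilde\beta \equiv \arg\max_\beta \mathrm{loglik}(\beta, \tilde\lambda_\beta)$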
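A minimal sketch of the candidate estimator in a toy model may help fix ideas. The Gamma(shape $\beta$, rate $\lambda$) model, the helper names, and the simulation settings are all assumptions chosen here for illustration; they are not from the talk. In this model $\nabla_\lambda K = \beta/\lambda - E_0 X = 0$ gives $\lambda_\beta = \beta / E_0 X$, so a natural preliminary estimator replaces $E_0 X$ by the sample mean.

```python
import math
import random

def loglik(beta, lam, n, sum_x, sum_log_x):
    """Gamma(shape=beta, rate=lam) log-likelihood from sufficient statistics."""
    return (n * (beta * math.log(lam) - math.lgamma(beta))
            + (beta - 1.0) * sum_log_x - lam * sum_x)

def argmax_golden(fun, lo, hi, iters=100):
    """Golden-section search for the maximizer of a unimodal function."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    for _ in range(iters):
        if fun(c) > fun(d):
            b, d = d, c                 # maximizer lies in [a, old d]
            c = b - r * (b - a)
        else:
            a, c = c, d                 # maximizer lies in [old c, b]
            d = a + r * (b - a)
    return 0.5 * (a + b)

# Simulated data under hypothetical true values (beta0, lam0).
random.seed(0)
beta0, lam0, n = 2.0, 1.5, 2000
x = [random.gammavariate(beta0, 1.0 / lam0) for _ in range(n)]  # scale = 1/rate
sum_x = sum(x)
sum_log_x = sum(math.log(v) for v in x)
xbar = sum_x / n

def lam_tilde(beta):
    """Preliminary estimator of the curve lam_beta = beta / E0[X]."""
    return beta / xbar

def modified_profile(beta):
    return loglik(beta, lam_tilde(beta), n, sum_x, sum_log_x)

beta_tilde = argmax_golden(modified_profile, 0.05, 50.0)
```

In this particular toy model the preliminary curve happens to coincide with the restricted MLE $\hat\lambda_\beta$, so `beta_tilde` is also the ordinary profile-likelihood estimator. The two approaches separate only in models where the Kullback-Leibler minimizer is estimated by other means.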
Key mathematical features of this approach are the convenience of restricting attention to nuisance parameters such as hazards or density functions which satisfy smoothness restrictions; the replacement of operator-inversion within (blocks of) the generalized information operator by differentiation of the restricted Kullback-Leibler minimizer; and diminished need for high-order consistency of estimation, when consistent estimators of Kullback-Leibler minimizers and their derivatives with respect to structural parameters are available.

NB: Kullback-Leibler Functional $= K(\beta, \lambda)$
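Sketch of (Finite-dimensional) Theory

Fix $\beta_0, \lambda_0$ and $f_0(x) = f(x, \beta_0, \lambda_0)$.

Information Matrix:
$$I(\beta, \lambda) = \begin{pmatrix} A_{\beta,\lambda} & B_{\beta,\lambda} \\ B_{\beta,\lambda}^T & C_{\beta,\lambda} \end{pmatrix} = -\int \begin{pmatrix} \nabla^2_{\beta} \log f(x, \beta, \lambda) & \nabla^2_{\beta\lambda} \log f(x, \beta, \lambda) \\ \nabla^2_{\lambda\beta} \log f(x, \beta, \lambda) & \nabla^2_{\lambda} \log f(x, \beta, \lambda) \end{pmatrix} f_0(x)\, dx$$

$\lambda_\beta = \arg\max_\lambda K(\beta, \lambda)$ satisfies: $\nabla_\lambda K(\beta, \lambda_\beta) = 0$

Differentiate implicitly (total deriv) wrt $\beta$ to find:
$$-\nabla^T_\beta \left[ \nabla_\lambda K(\beta, \lambda_\beta) \right] = B^T + C\, \nabla_\beta \lambda_\beta = 0$$

This implies $\nabla_\beta \lambda_\beta = -C^{-1} B^T$, and $(\beta, \lambda_\beta)$ is a least-favorable nuisance-parameterization, and the usual Information about $\beta$ for this model is
$$I^0_\beta = A_{\beta_0,\lambda_0} - (\nabla_\beta \lambda_\beta)^T\, C_{\beta_0,\lambda_0}\, (\nabla_\beta \lambda_\beta) = A_{\beta_0,\lambda_0} - B_{\beta_0,\lambda_0}\, C^{-1}_{\beta_0,\lambda_0}\, B^T_{\beta_0,\lambda_0}$$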
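As a worked check of these formulas, consider a scalar toy model chosen here for illustration (it is an assumption, not an example from the talk): the Gamma density with shape $\beta$ and rate $\lambda$. Since all quantities are scalars, transposes play no role, and $\psi'$ denotes the trigamma function.

```latex
\begin{align*}
\log f(x,\beta,\lambda) &= \beta\log\lambda - \log\Gamma(\beta) + (\beta-1)\log x - \lambda x,\\
A_{\beta,\lambda} &= \psi'(\beta), \qquad
B_{\beta,\lambda} = -\frac{1}{\lambda}, \qquad
C_{\beta,\lambda} = \frac{\beta}{\lambda^{2}},\\
\nabla_\lambda K(\beta,\lambda) &= \frac{\beta}{\lambda} - E_0 X
  = \frac{\beta}{\lambda} - \frac{\beta_0}{\lambda_0} = 0
  \;\Longrightarrow\; \lambda_\beta = \frac{\beta\,\lambda_0}{\beta_0},
  \qquad \nabla_\beta\lambda_\beta = \frac{\lambda_0}{\beta_0},\\
-\,C^{-1}_{\beta_0,\lambda_0} B_{\beta_0,\lambda_0}
  &= \frac{\lambda_0^{2}}{\beta_0}\cdot\frac{1}{\lambda_0} = \frac{\lambda_0}{\beta_0}
  \quad\text{(agrees with } \nabla_\beta\lambda_\beta\text{)},\\
I^{0}_{\beta} &= A - B\,C^{-1}B
  = \psi'(\beta_0) - \frac{1}{\lambda_0^{2}}\cdot\frac{\lambda_0^{2}}{\beta_0}
  = \psi'(\beta_0) - \frac{1}{\beta_0}.
\end{align*}
```

The curve $\lambda_\beta$ passes through $\lambda_0$ at $\beta = \beta_0$, and the resulting $I^0_\beta$ is the familiar efficient information for a Gamma shape parameter when the rate is unknown.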
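Theory, continued

Information Ineq says: $\lambda_{\beta_0} = \lambda_0$, $\nabla_\beta K(\beta_0, \lambda_0) = 0$

Now assume $\tilde\lambda_\beta$, $\nabla^2_\beta \tilde\lambda_\beta$ consistent for $\lambda_\beta$, $\nabla^2_\beta \lambda_\beta$ unif on nbhd of $\beta_0$. Can check successively:

(A) $(\nabla^T_\beta)^2\, \mathrm{loglik}(\beta, \tilde\lambda_\beta)$ neg-def unif on $\beta_0$ nbhd

(B) $n^{-1}\, \nabla^T_\beta\, \mathrm{loglik}(\beta_0, \tilde\lambda_{\beta_0}) \to_P 0$

(C) $\tilde\beta$ unique loc sol'n of $\nabla^T_\beta\, \mathrm{loglik}(\beta, \tilde\lambda_\beta) = 0$, consistent for $\beta_0$.

(D) $n^{-1}\, \nabla_\lambda \mathrm{loglik}(\tilde\beta, \tilde\lambda_{\tilde\beta}) = n^{-1}\, \nabla_\lambda \mathrm{loglik}(\beta_0, \lambda_0) + o_P(\tilde\beta - \beta_0)$ [because $-C^{-1} B^T = \nabla_\beta \lambda_\beta$]

(E) $n^{-1}\, \nabla_\beta \mathrm{loglik}(\tilde\beta, \tilde\lambda_{\tilde\beta}) = n^{-1}\, \nabla_\beta \mathrm{loglik}(\beta_0, \lambda_0) - I^0_\beta\, (\tilde\beta - \beta_0) + o_P(\tilde\beta - \beta_0)$

(F) $\sqrt{n}\, (\tilde\beta - \beta_0) = (I^0_\beta)^{-1}\, \frac{1}{\sqrt{n}} \left( \nabla_\beta \mathrm{loglik}(\beta_0, \lambda_0) + \nabla_\lambda \mathrm{loglik}(\beta_0, \lambda_0)\, \nabla_\beta \lambda_{\beta_0} \right) + o_P(1)$

(G) $\sqrt{n}\, (\tilde\beta - \beta_0) \to_D N(0, (I^0_\beta)^{-1})$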
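Conclusion (G) can be probed by a small Monte Carlo experiment. The sketch below again assumes a Gamma(shape $\beta$, rate $\lambda$) toy model, for which $I^0_\beta = \psi'(\beta_0) - 1/\beta_0$ and $\tilde\lambda_\beta = \beta/\bar X$. The model, the helper names, and the simulation sizes are illustrative assumptions, not part of the talk. If (G) holds, the sample variance of $\sqrt{n}(\tilde\beta - \beta_0)$ times $I^0_\beta$ should be close to 1.

```python
import math
import random

def trigamma(x, terms=2000):
    """psi'(x) = sum_{k>=0} (x+k)^(-2): truncated sum plus an integral tail term."""
    head = sum(1.0 / (x + k) ** 2 for k in range(terms))
    return head + 1.0 / (x + terms - 0.5)

def beta_tilde(sample):
    """Maximize n^{-1} loglik(beta, beta / xbar) over beta by golden-section search."""
    n = len(sample)
    xbar = sum(sample) / n
    mean_log = sum(math.log(v) for v in sample) / n

    def obj(b):
        # n^{-1} loglik(b, lam) evaluated at lam = b / xbar
        return b * math.log(b / xbar) - math.lgamma(b) + (b - 1.0) * mean_log - b

    r = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.05, 50.0
    for _ in range(80):
        c, d = hi - r * (hi - lo), lo + r * (hi - lo)
        if obj(c) > obj(d):
            hi = d          # maximizer lies in [lo, d]
        else:
            lo = c          # maximizer lies in [c, hi]
    return 0.5 * (lo + hi)

random.seed(1)
beta0, lam0, n, reps = 2.0, 1.5, 400, 1000   # hypothetical settings
info = trigamma(beta0) - 1.0 / beta0          # I^0_beta in this toy model

z = []
for _ in range(reps):
    sample = [random.gammavariate(beta0, 1.0 / lam0) for _ in range(n)]
    z.append(math.sqrt(n) * (beta_tilde(sample) - beta0))

mean_z = sum(z) / reps
var_z = sum((v - mean_z) ** 2 for v in z) / (reps - 1)
ratio = var_z * info      # near 1 when Var ~ (I^0_beta)^{-1}
```

The ratio is only expected to be near 1 up to Monte Carlo error and finite-sample bias, so it is a plausibility check rather than a proof of (G).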
LITERATURE

Profile Likelihood
Cox, D. R. & Reid, N. (1987) JRSSB
McCullagh, P. & Tibshirani, R. (1990) JRSSB
Severini, T. & Wong, W. (1992) Ann. Stat.

Semiparametrics
Bickel, Klaassen, Ritov, & Wellner (1993) Book
Cox, D. R. (1972) JRSSB: Cox-Model paper
Owen, A. (1988) Biometrika: Empirical likelihood
Qin, J. & Lawless, J. (1994) Ann. Stat.: EL & GEE
Murphy & van der Vaart (2000) JASA

Transformation/Frailty Models
Cheng, Wei, & Ying (1995) Biometrika
Clayton & Cuzick (1986) ISI Centenary Session
Slud, E. & Vonta, I. (2002) submitted
Parner (1998) Ann. Stat.
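$\infty$-dim Examples & Applications

(I). Mean estimation in the location model
$X_i \sim \lambda_0(x - \beta_0)$, $\beta_0 = E(X_1)$, $\lambda_0 \in L^1(dx)$ a compactly supported 0-mean density.
(Easy example, $\bar X$ efficient. Owen 1988, Qin & Lawless 1994 studied in connection with empirical likelihood.)

For $\beta \ne \beta_0$, $\lambda_\beta(x) = \lambda_0(x + \beta)/(1 - \alpha x)$ with $\alpha$ solving $\int x\, \lambda_\beta(x)\, dx = 0$

Define $\tilde\lambda_\beta$ first at $\beta_0$ using density estimator
$$\tilde\lambda_{\beta_0}(x) = \frac{1}{n h_n} \sum_{i=1}^n \phi((x - X_i + \bar X)/h_n), \qquad h_n \to 0,$$
and then
$$\tilde\lambda_\beta(x) = \frac{\tilde\lambda_{\beta_0}(x + \beta)}{1 - \alpha x}, \qquad \int x\, \frac{\tilde\lambda_{\beta_0}(x + \beta)}{1 - \alpha x}\, dx = 0$$

(II). Cox model. For $q_z(t) = p_Z(z)\, R_z(t)\, \exp(-e^{z'\beta_0} \Lambda_0(t))$,
$$\Lambda_\beta(t) \equiv \int_0^t \lambda_\beta(s)\, ds = \int_0^t \frac{\sum_z q_z(x)\, e^{z'\beta_0}\, \lambda_0(x)}{\sum_z e^{z'\beta}\, q_z(x)}\, dx$$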
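The Cox-model expression for $\lambda_\beta$ can be recovered in a few lines. The sketch below assumes that $T_i$ and $C_i$ are conditionally independent given $Z_i$ and reads $R_z(t)$ as $P(C_i \ge t \mid Z_i = z)$; both readings are assumptions about the setup. Terms of the log-likelihood that involve only the censoring and covariate distributions are dropped.

```latex
\begin{align*}
E\,[dN_z(t)] &= q_z(t)\, e^{z'\beta_0}\lambda_0(t)\,dt, \qquad E\,[Y_z(t)] = q_z(t),\\
K(\beta,\lambda) &= \sum_z \int \Big\{ \big(z'\beta + \log\lambda(t)\big)\, e^{z'\beta_0}\lambda_0(t)
   \;-\; e^{z'\beta}\lambda(t) \Big\}\, q_z(t)\, dt \;+\; \text{const},\\
0 &= \frac{\partial}{\partial\lambda(t)}\{\text{integrand}\}
   = \frac{1}{\lambda(t)}\sum_z q_z(t)\, e^{z'\beta_0}\lambda_0(t) \;-\; \sum_z e^{z'\beta} q_z(t),\\
\lambda_\beta(t) &= \frac{\sum_z q_z(t)\, e^{z'\beta_0}\lambda_0(t)}{\sum_z e^{z'\beta} q_z(t)},
   \qquad \lambda_{\beta_0}(t) = \lambda_0(t).
\end{align*}
```

Because the integrand is concave in $\lambda(t)$ at each $t$, the pointwise stationary point is the maximizer, and integrating $\lambda_\beta$ over $[0,t]$ gives $\Lambda_\beta(t)$.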
Example (II), cont'd.

First let $\tilde\lambda_0$ be a consistent density estimate of $\lambda_0(x) = \Lambda_0'(x)$ (e.g., by smoothing and differentiating the Kaplan-Meier cumulative-hazard estimator on data in a $z = 0$ data-stratum). Then estimate elsewhere by
$$\tilde\lambda_\beta(t) = \sum_z e^{z'\beta_0}\, Y_z(t)\, \tilde\lambda_{\beta_0}(t) \Big/ \sum_z e^{z'\beta}\, Y_z(t)$$

NB. In this example, any $\tilde\beta$ estimator produced in this way collapses to the usual Cox Max Partial Likelihood Estimator!

(III). Transformation/Frailty Models

In the general $G$ transformation model case, must assume for some finite time $\tau_0$ with $\Lambda_0(\tau_0) < \infty$ that all data are censored at $\tau_0$.

In this model, Slud and Vonta (2002) characterize the $K$-optimizing hazard intensity $\lambda_\beta$ in its integrated form $L = \Lambda_\beta = \int_0^{\cdot} \lambda_\beta(x)\, dx$, through the second order ODE system:
Example (III), cont'd.

$$\frac{dL}{d\Lambda_0}(s) = \frac{\sum_z e^{z'\beta_0}\, q_z(s)\, G'(e^{z'\beta_0} \Lambda_0(s))}{\sum_z e^{z'\beta}\, q_z(s)\, G'(e^{z'\beta} L(s)) + Q(s)}$$

$$\frac{dQ}{d\Lambda_0}(s) = \sum_z e^{z'\beta}\, q_z(s)\, \frac{G''}{G'}\Big|_{e^{z'\beta} L(s)} \left( e^{z'\beta_0}\, G'(e^{z'\beta_0} \Lambda_0(s)) - e^{z'\beta}\, G'(e^{z'\beta} L(s))\, \frac{dL}{d\Lambda_0}(s) \right)$$

subject to the initial/terminal conditions $L(0) = 0$, $Q(\tau_0) = 0$.

Slud and Vonta (2002) show that these ODEs have unique solutions, smooth with respect to $\beta$ and differentiable in $t$, which (with $\lambda_\beta \equiv L'$) maximize the functional $J(\beta, \lambda_\beta)$ as desired.

Consistent preliminary estimators $\tilde\lambda_\beta$ can be developed by substituting consistent preliminary estimators for $\beta_0$, $\Lambda_0$ in those equations (smoothed wrt $t$).

Punchline: the new estimator $\tilde\beta = \arg\max_\beta \mathrm{loglik}(\beta, \tilde\lambda_\beta)$ is efficient!
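Two quick consistency checks on the ODE system are sketched below. They assume the system is read with the ratio $G''/G'$ evaluated at $e^{z'\beta} L(s)$ and with $Q(s)$ added to the denominator of $dL/d\Lambda_0$; that reading is a reconstruction from the slide and should be confirmed against Slud and Vonta (2002).

```latex
\begin{align*}
&\text{(a) } \beta = \beta_0:\quad L = \Lambda_0,\ Q \equiv 0 \text{ solves the system, since then}\\
&\qquad \frac{dL}{d\Lambda_0}
   = \frac{\sum_z e^{z'\beta_0} q_z\, G'(e^{z'\beta_0}\Lambda_0)}
          {\sum_z e^{z'\beta_0} q_z\, G'(e^{z'\beta_0}\Lambda_0)} = 1,
   \qquad
   \frac{dQ}{d\Lambda_0} = \sum_z e^{z'\beta_0} q_z\, \frac{G''}{G'}\,\big(e^{z'\beta_0}G' - e^{z'\beta_0}G'\cdot 1\big) = 0,\\
&\qquad \text{in agreement with } \lambda_{\beta_0} = \lambda_0.\\[4pt]
&\text{(b) Cox case } G(x) \equiv x:\quad G' \equiv 1,\ G'' \equiv 0
   \;\Longrightarrow\; \frac{dQ}{d\Lambda_0} = 0,\ Q(\tau_0) = 0 \;\Longrightarrow\; Q \equiv 0,\\
&\qquad \frac{dL}{d\Lambda_0}(s) = \frac{\sum_z e^{z'\beta_0} q_z(s)}{\sum_z e^{z'\beta} q_z(s)}
   \;\Longrightarrow\;
   \lambda_\beta(s) = L'(s) = \frac{\sum_z q_z(s)\, e^{z'\beta_0}\lambda_0(s)}{\sum_z e^{z'\beta} q_z(s)},
\end{align*}
```

Check (b) reproduces the Cox-model expression for $\lambda_\beta$ given in Example (II), so the general system reduces to the known special case.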