2012 6 Chinese Journal of Applied Probability and Statistics Vol.28 No.3 Jun. 2012 (,, 675000),,, EM,,. :,,, P-. : O212.7. 1. (count data), Poisson Poisson,, (zero-inflation).,.,, ;,,.,, Fahrmeir Echavarrri (2006) ; Yip Yau (2005) ; Xie et al. (2009, 2008) Poisson Score ; Ghosh et al. (2006) Bayesian.,,,.,,,,. Davidian Giltinan (1995), Diggle et al. (2002), Lin Carroll (2001), He et al. (2002), (2006). Zhang et al. (10BTJ001) (11171105). 2011 11 25, 2011 12 28.
312 (1998), (2006) Lin Carroll (2001),. : Poisson ; P- ; EM ;. 2., Poisson (Zero-Inflated Generalized Poisson Distribution), ZIGP. Y ZIGP(φ, λ, α), φ + (1 φ)p(0; λ, α), y = 0; P{Y = y} = (1 φ)p(y; λ, α), y > 0, p(y; λ, α) Poisson, p(y; λ, α) = 1 ( λ ) y(1 { + αy) y 1 exp y! 1 + αλ α, λ. Y, Y α 0, Poisson. λ(1 + αy) 1 + αλ E[Y λ, α] = λ, Var [Y λ, α] = λ(1 + αλ) 2, Y ZIGP(φ, λ, α), Y E[Y φ, λ, α] = (1 φ)λ, φ = 0, ZIGP GP. }, y = 0, 1, 2,..., Var [Y φ, λ, α] = (1 φ)λ[(1 + αλ) 2 + φλ, (2.1),, Y ij, y ij, i j i j, t ij, x ij, z ij, i = 1, 2,..., n, j = 1, 2,..., n i, (y ij, x ij, z ij, t ij )., Y ij ZIGP(φ ij, λ ij, α),, α, φ ij λ ij ( Wu Zhang (2006)). logit(φ ij ) = zij T γ + g(t ij) + u i, log(λ ij ) = x T ij β + h(t ij) + v i, (2.2)
: 313 γ β z ij x ij, g( ) h( ),, u i v i, (u T i, vt i ) N(0, Σ)., x ij z ij,, (2.1) (2.2) Poisson. 3. P-,,,,,.,, (P- ). Green Silverman (1994), (2.1) (2.2) PL(γ, β, g, h) = l(y; φ, λ) 1 2 λ 1 (g (t)) 2 1 2 λ 2 (h (t)) 2 dt, (3.1) λ 1, λ 2 > 0, l(y; φ, λ) = n n i { I(y ij = 0) log [φ ij + (1 φ ij )p(0; λ ij, α)]f(ω i ; Σ)dω i i=1 j=1 + I(y ij > 0) log [1 φ ij p(y ij ; λ ij, α)]f(ω i ; Σ)dω i }, (3.2) ω i = (u T i, vt i ), f(ω i; Σ)., Yu et al. (2002), Ruppert et al. (2003), g(t) = τ 0 + τ 1 t + + τ q t q + K τ q+k (t h k ) q +, k=1 h(t) = δ 0 + δ 1 t + + δ l t l + K δ l+k (t h k ) l +, w+ k = [max(0, w)] k, {h k } K k=1, k=1 τ = (τ 0, τ 1,..., τ q+k ) T, δ = (δ 0, δ 1,..., δ l+k ) T, B 1 (t) = {1, t,..., t q, (t h 1 ) q +,..., (t h k) q + }T, B 2 (t) = {1, t,..., t l, (t h 1 ) l +,..., (t h k ) l +} T, g( ) h( ) g(t) = τ T B 1 (t), h(t) = δ T B 2 (t). Yu et al. (2002), 5 10,, ;, 10,, K = 15.
314, Yu et al. (2002) ( g(t)) 2 dt (ḧ(t))2 dt τ T Kτ δ T Kδ, K t ;, Wu Zhang (2006), [rank(k T K)/3],, q = l = 3. 4. EM (2.1) (2.2), (3.1) (3.2),,,,, EM, ω = (u, v). D c = (D o, D m ), D m ω,, D o. PL(γ, β, δ, τ, Σ D c ) PL(γ, β, δ, τ, Σ D c ) = l 1 (φ, λ) + n log f(ω i ; Σ) 1 2 Nλ 1τ T Kτ 1 2 Nλ 2δ T Kδ, (4.1) l(φ, λ) = n N = n n i, λ 1, λ 2. i=1 i=1 j=1 (4.1) EM. E-step: M-step: i=1 n i {I(y ij = 0) log[φ ij + (1 φ ij )p(0; λ ij, α)] + I(y ij > 0) log[1 φ ij p(y ij ; λ ij, α)]}, Q(ψ ψ (m) ) = E[PL(γ, β, δ, τ, Σ D c ) D o, ψ (m) ]; ψ (m+1) = arg max ψ [Q(ψ ψ(m) )]. ψ = (γ, β, δ, τ, Σ), ψ (m) m. E,, Metropolis f ω y ( ), MonterCarlo., f(ω), ω, ω, ω, q(ω, ω ) = min {1, f ω y(ω y, γ, β, τ, δ, Σ)f(ω) } f ω y (ω y, γ, β, τ, δ, Σ)f(ω. ) M,, Σ Σ = 1 J J (ω (j) ) T ω (j), j=1
: 315 {ω (j) } J j=1 Metropolis. θ = (γ, β, δ, τ), [ PL(θ, Σ Dc ) ] E = 0, θ Newton-Raphson θ. λ 1, λ 2, GCV(λ 1, λ 2 ) = (2N) 1 Y 1 HY 1 2 {1 (2N) 1 tr(h)} 2, Y 1 = (M T, Y T ), Y T {y ij } (i = 1, 2,..., n, j = 1, 2,..., n i ) N, M T Y T, y ij = 0, M ij = 1, 0. H C = ( B1 0 0 B 2 ) H = SC(C T SC + ) 1 C T,, S = ( S1 0 0 I ), = ( λ1 K 0 0 λ 2 K S 1 = diag(exp{z T ijγ + g(t ij ) + u i }/[1 + exp{z T ijγ + g(t ij ) + u i }] 2 ). λ 1 GCV λ 2, λ 2 = λ 2, GCV λ 1 = λ 1, (λ 1, λ 2 ).. : (1) ψ (0), m = 0; (2) GCV GCV λ (m) 1, λ (m) 2 ; (3) EM ψ (m+1) ; (4) m = m + 1, (2),. 5., 5.1 ),,, Poisson : Y ij ZIP(φ ij, λ ij ), logit(φ ij ) = γ 0 + γ 1 x 1ij + γ 2 x 2i + u i + g(t ij ); log(λ ij ) = β 0 + β 1 x 1ij + β 2 x 2i + v i + h(t ij ),
316 i = 1, 2,..., n, j = 1, 2,..., n i, x 1ij U[ 1, 1], x 2i { 1, 1}, t ij U[ 2, 2], (u i, v i ) T N(0, Σ), Σ ( ) ( ) σ11 σ 12 1 0.3 Σ = =. σ 21 σ 22 0.3 1 h(t) = sin(πt/2) 1 + 2t 2, g(t) = sin(πt/2). [sign(t) + 1] β 0 = 1.5, β 1 = 1, β 2 = 2, γ 0 = 2, γ 1 = 0.5, γ 2 = 1, n = 50, n i = 20, 1000, 1. λ 1 3.25 3.56, λ 2 1.62 2.32, 1, 2 (,, 5% 95% ). 1 β 0 β 1 β 2 γ 0 γ 1 γ 2 0.036 0.066 0.023 0.012 0.058 0.032 0.397 0.175 0.367 0.271 0.305 0.290 1.5 1 0.5 0 0.5 1 1.5 2 t 1.5 1 0.5 0 0.5 1 1.5 2 t 1 g(t) 2 h(t) 5.2, ( 10BTJ001 ). 2008 2009 21 7 17 4687,, 47,,., (y), (x 1 ), BMI(x 2, (kg)/ (m 2 )) (t),, 10, 200.
: 317 y, y > 0 Poisson, α = 1.311. ZIGP, y ij ZIGP(φ ij, λ ij, α), i = 1, 2,..., 10 ( ), j = 1, 2,..., 200 ( ). logit(φ ij ) = γ 0 + γ 1 x 1ij + γ 2 x 2ij + u i + g(t ij ); log(λ ij ) = β 0 + β 1 x 1ij + β 2 x 2ij + v i + h(t ij ). 2, 3, 4. 2 β 0 β 1 β 2 γ 0 γ 1 γ 2 1.304 1.108 0.544-4.065-0.588 0.082 0.640 0.170 0.302 1.155 0.055 0.081 8 9 10 11 12 13 14 15 16 17 8 9 10 11 12 13 14 15 16 17 age age 3 g( ) 4 h( ) [1] Fahrmeir, L. and Echavarrri, L.O., Structured additive regression for overdispersed and zero-inflated count data, Applied Stochastic Models in Business and Industry, 22(5)(2006), 351 369. [2] Yip, K.C.H. and Yau, K.K.W., On modeling claim frequency data in general insurance with extra zeros, Insurance: Mathematics and Economics, 36(2)(2005), 153 163. [3] Xie, F.C., Wei, B.C. and Lin, J.G., Score test for zero-inflated generalized Poisson mixed regression models, Computational Statistics & Data Analysis, 53(9)(2009), 3478 3489. [4] Xie, F.C., Wei, B.C. and Lin, J.G., Assessing influence for pharmaceutical data in zero-inflated generalized Poisson mixed models, Statistics in Medicine, 27(8)(2008), 3656 3673. [5] Ghosh, S.K., Mukhopadhyay, P. and Lu, J.C., Bayesian analysis of zero-inflated regression models, Journal of Statistical Planning and Inference, 136(4)(2006), 1360 1375. [7] Davidian, M. and Giltinan, D.M., Nonlinear Models for Repeated Measurement Data, Chapman and Hall, 1995. [7] Diggle, P.J., Liang, K.Y. and Zeger, S.L., Analysis of Longitudinal Data, Cambridge University Press, 2002.
318 [8] Lin, X.H. and Carroll, R.J., Semiparametric regression for clustered data, Biometrika, 88(4)(2001), 1179 1185. [9] He, X.M., Zhu, Z.Y. and Fung, W.K., Estimation in semiparametrics model for longitudinal data with unspecificed dependence structure, Biometrika, 89(3)(2002), 579 590. [10],,,, 25(2)(2009), 171 184. [11] Zhang, D., Lin, X. and Sower, M.F., Semiparametric stochastic mixed model for longitudinal data, Journal of the American Statistical Association, 93(442)(1998), 710 719. [12] Wu, H. and Zhang, J.T., Nonparametric Regression Methods for Longitudinal Data Analysis, Wiley Interscience, 2006. [13] Green, P.J. and Silverman, B.W., Nonparametric Regression and Generalized Linear Models, Chapman and Hall, 1994. [14] Yu, Y. and Ruppert, D., Penalized spline estimation for partially linear single-index models, Journal of the American Statistical Association, 97(460)(2002), 1042 1054. [15] Ruppert, D., Wand, M.P. and Carroll, R.J., Semiparametric Regression, Cambridge University Press, 2003. Penalized Likelihood Estimation of a Class of Zero-Inflated Semiparametric Mixed-Effect Model Shi Hongxing (School of Primary Education, Chuxiong Normal University, Chuxiong, 675000 ) In this paper, we consider an extension of semiparametric linear mixed-effect model to a class of longitudinal data or clustered data with zero-inflation and propose a novel semiparametric mixed model. Based on the maximum penalized likelihood estimation and EM algorithm, we give a method for our proposed model which may estimate both of the parameters and nonparameters simultaneously. In the method, we use GCV to choose the smoothing parameter.finally, we study a simulation analysis and one real data set to illustrate the proposed method. Keywords: Zero-inflation, semiparametric model, penalized likelihood, P-spline. AMS Subject Classification: 62G05.