Survival Analysis: One-Sample Problem / Two-Sample Problem / Regression
Lu Tian and Richard Olshen
Stanford University
One-sample problem

$T_1, \dots, T_n$ i.i.d. with survival function $S(\cdot)$; $C_1, \dots, C_n$ i.i.d. with distribution function $G(\cdot)$; and $T_i \perp C_i$.

Observations: $(U_i, \delta_i)$, $i = 1, \dots, n$, where $U_i = T_i \wedge C_i$ and $\delta_i = I(T_i \le C_i)$.

Objective: estimate the cumulative hazard $H(t)$ and the survival function $S(t)$.
One-sample problem

$N_i(t) = I(U_i \le t)\,\delta_i$, $\quad Y_i(t) = I(U_i \ge t)$,

$M_i(t) = N_i(t) - \int_0^t Y_i(s)\,h(s)\,ds$,

$N(t) = \sum_{i=1}^n N_i(t)$, $\quad Y(t) = \sum_{i=1}^n Y_i(t)$, and $M(t) = \sum_{i=1}^n M_i(t)$.
One-sample problem

The Nelson-Aalen (NA) estimator:
$$\hat H(t) = \int_0^t I(Y(s) > 0)\,\frac{dN(s)}{Y(s)}.$$

Consider the decomposition
$$\hat H(t) - H(t) = Q(t) - D(t),$$
where
$$Q(t) = \int_0^t I(Y(s) > 0)\,\frac{dM(s)}{Y(s)} \quad \text{and} \quad D(t) = \int_0^t h(s)\,I(Y(s) = 0)\,ds.$$
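To make the estimator concrete, here is a minimal numpy sketch (the function name and conventions are ours, not from the slides): $\hat H$ is a step function that jumps by (events)/(number at risk) at each distinct event time.

```python
import numpy as np

def nelson_aalen(u, delta):
    """Nelson-Aalen estimate of the cumulative hazard H(t).

    u     : observed times U_i = min(T_i, C_i)
    delta : event indicators delta_i = I(T_i <= C_i)
    Returns the distinct event times and H-hat evaluated there.
    """
    u, delta = np.asarray(u, float), np.asarray(delta, int)
    times = np.unique(u[delta == 1])
    d = np.array([np.sum((u == t) & (delta == 1)) for t in times])  # events at t
    y = np.array([np.sum(u >= t) for t in times])                   # at risk Y(t)
    return times, np.cumsum(d / y)                                  # H-hat jumps by d/Y
```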
Bias

The bias of the NA estimator:
$$E\{\hat H(t) - H(t)\} = E\{Q(t)\} - E\{D(t)\} = -\int_0^t h(s)\,\big[1 - S(s)\{1 - G(s)\}\big]^n\,ds.$$

Suppose there exists $\tau > 0$ such that $S(\tau) > 0$ and $1 - G(\tau) > 0$. Then for $t \le \tau$,
$$0 < E\{D(t)\} < H(t)\,\big[1 - S(\tau)\{1 - G(\tau)\}\big]^n,$$
so the bias vanishes exponentially fast in $n$.
Asymptotic distribution

For $t \in [0, \tau]$,
$$n^{1/2}\{\hat H(t) - H(t)\} = n^{1/2} Q(t) - n^{1/2} D(t) = \sum_{i=1}^n \int_0^t \frac{n^{1/2}\,I(Y(s) > 0)}{Y(s)}\,dM_i(s) + o_p(1),$$
with $U_{in}(t) = \int_0^t n^{1/2} I(Y(s) > 0)\,Y(s)^{-1}\,dM_i(s)$.

Check the conditions of the martingale CLT (MCLT):

1. Condition (a):
$$\Big\langle \sum_i U_{in}, \sum_i U_{in} \Big\rangle(t) = \sum_{i=1}^n \int_0^t \frac{n\,I(Y(s) > 0)}{Y(s)^2}\,Y_i(s)\,h(s)\,ds = \int_0^t \frac{n\,I(Y(s) > 0)}{Y(s)}\,h(s)\,ds \to \int_0^t \frac{h(s)}{\pi(s)}\,ds.$$

2. Condition (b):
$$\Big\langle \sum_i U_{in,\epsilon}, \sum_i U_{in,\epsilon} \Big\rangle(t) = \int_0^t \frac{n\,I(Y(s) > 0)}{Y(s)}\,I\big(n^{1/2} I(Y(s) > 0)/Y(s) \ge \epsilon\big)\,h(s)\,ds \to 0,$$
since
$$P\Big(\sup_{[0,\tau]}\big\{n^{1/2} I(Y(s) > 0)/Y(s)\big\} \ge \epsilon\Big) \le P\big(n^{-1/2}\epsilon^{-1} \ge n^{-1} Y(\tau)\big) \to 0.$$
Asymptotic distribution

By the MCLT and Slutsky's theorem, $n^{1/2}\{\hat H(t) - H(t)\}$ converges weakly to a Gaussian process $X(t)$, $t \in [0, \tau]$, with independent increments and
$$\mathrm{var}\{X(t)\} = \sigma^2(t) = \int_0^t \frac{h(s)}{\pi(s)}\,ds, \quad \text{where } \pi(t) = S(t)\{1 - G(t)\}.$$

$\sigma^2(t)$ can be estimated by
$$\int_0^t \frac{d\hat H(s)}{\hat\pi(s)} = n \int_0^t \frac{I(Y(s) > 0)}{Y(s)^2}\,dN(s) = n \sum_{\tau_j \le t} \frac{d_j}{Y(\tau_j)^2},$$
where $\hat\pi(s) = n^{-1} Y(s)$, $\tau_1 < \tau_2 < \cdots$ are the distinct observed event times, and $d_j$ is the number of events at $\tau_j$.
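A companion sketch for the variance estimate above (same conventions as `nelson_aalen`): it returns $n \sum_{\tau_j \le t} d_j/Y(\tau_j)^2$, the estimated variance of $n^{1/2}\{\hat H(t) - H(t)\}$; dividing by $n$ gives an approximate variance for $\hat H(t)$ itself.

```python
import numpy as np

def na_variance(u, delta):
    """Estimated variance of n^{1/2}{H-hat(t) - H(t)} at the event times:
    n * sum_{tau_j <= t} d_j / Y(tau_j)^2."""
    u, delta = np.asarray(u, float), np.asarray(delta, int)
    n = len(u)
    times = np.unique(u[delta == 1])
    d = np.array([np.sum((u == t) & (delta == 1)) for t in times])
    y = np.array([np.sum(u >= t) for t in times])
    return times, n * np.cumsum(d / y**2)
```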
KM estimator

The Kaplan-Meier (KM) estimator is asymptotically equivalent to $e^{-\hat H(t)}$.

$\hat S(t)$ converges to $S(t)$ uniformly on $[0, \tau]$.

$n^{1/2}\{\hat S(t) - S(t)\}$ converges weakly to a Gaussian process with variance function
$$S(t)^2\,\sigma^2(t) = S(t)^2 \int_0^t \frac{h(s)}{\pi(s)}\,ds,$$
which can be estimated by
$$\hat S(t)^2 \int_0^t \frac{d\hat H(s)}{n^{-1} Y(s)},$$
similar but not identical to Greenwood's formula.
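A sketch of the product-limit estimator together with Greenwood's formula (function name is ours). Comparing `np.exp(-H)` from the `nelson_aalen` sketch against `s` below illustrates the asymptotic equivalence numerically.

```python
import numpy as np

def kaplan_meier(u, delta):
    """Product-limit (KM) estimate S-hat(t) and Greenwood's variance estimate.
    Returns event times, S-hat, and var-hat{S-hat(t)}."""
    u, delta = np.asarray(u, float), np.asarray(delta, int)
    times = np.unique(u[delta == 1])
    d = np.array([np.sum((u == t) & (delta == 1)) for t in times])
    y = np.array([np.sum(u >= t) for t in times])
    s = np.cumprod(1.0 - d / y)                      # product-limit estimator
    greenwood = s**2 * np.cumsum(d / (y * (y - d)))  # Greenwood's formula
    return times, s, greenwood
```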
Two-sample problem

$(T_i, C_i, Z_i)$, $i = 1, \dots, n$, with group indicator $Z_i \in \{0, 1\}$.

$P(T_i > t \mid Z_i = j) = 1 - F_j(t) = S_j(t)$

$P(C_i > t \mid Z_i = j) = 1 - G_j(t)$ ($G_1(t)$ may differ from $G_0(t)$.)

$n_j$: the size of group $j$.

Objective: test $H_0: S_0(\cdot) = S_1(\cdot)$.
Notation

$N_j(t) = \sum_{i=1}^n I(U_i \le t, \delta_i = 1, Z_i = j)$, $j = 0, 1$.

$Y_j(t) = \sum_{i=1}^n I(U_i \ge t, Z_i = j)$, $j = 0, 1$.

$M_j(t) = N_j(t) - \int_0^t Y_j(s)\,h_j(s)\,ds$.

$p_j = n_j/n$, $j = 0, 1$.

$\pi_j(t) = P(U_i \ge t \mid Z_i = j)$, $j = 0, 1$.
Two-sample problem

Weighted logrank test statistic $Q_w = Q_w(\infty)$, where
$$Q_w(t) = \int_0^t H_{1w}(s)\,dN_1(s) - \int_0^t H_{0w}(s)\,dN_0(s)$$
and
$$H_{jw}(s) = \Big(\frac{n}{n_0 n_1}\Big)^{1/2} \frac{Y_{1-j}(s)}{Y(s)}\,w_n(s), \quad j = 0, 1.$$
Two-sample problem

Martingale representation:
$$Q_w(t) = \int_0^t H_{1w}(s)\,dM_1(s) - \int_0^t H_{0w}(s)\,dM_0(s) + R_n(t),$$
where, with $A_j(t) = \int_0^t Y_j(s)\,h_j(s)\,ds$,
$$R_n(t) = \int_0^t H_{1w}(s)\,dA_1(s) - \int_0^t H_{0w}(s)\,dA_0(s) = \Big(\frac{n}{n_0 n_1}\Big)^{1/2} \int_0^t \frac{Y_0(s)\,Y_1(s)\,w_n(s)}{Y(s)}\,\{h_1(s) - h_0(s)\}\,ds.$$
Two-sample problem

Apply the MCLT: $\int_0^t H_{jw}(s)\,dM_j(s) \Rightarrow W(\sigma^2_{jw}(t))$ weakly, where
$$\sigma^2_{jw}(t) = p_{1-j} \int_0^t \frac{\pi_{1-j}(s)\,\pi_0(s)\,\pi_1(s)}{\pi(s)^2}\,w^2(s)\,h_j(s)\,ds,$$
with $\pi(s) = p_0\pi_0(s) + p_1\pi_1(s)$.

Therefore $\int_0^t H_{1w}(s)\,dM_1(s) - \int_0^t H_{0w}(s)\,dM_0(s)$ converges to $N\{0, \sigma^2_{1w}(t) + \sigma^2_{0w}(t)\}$ in distribution. The result can be generalized to $t = +\infty$.
Null variance

Under the null, $Q_w(\infty) \to N(0, \sigma^2_w)$, where
$$\sigma^2_w = \int_0^\infty \frac{f_0(s)\{1 - G_0(s)\}\{1 - G_1(s)\}\,w^2(s)}{p_0\{1 - G_0(s)\} + p_1\{1 - G_1(s)\}}\,ds.$$

$\sigma^2_w$ can be estimated by
$$\hat\sigma^2_w = \frac{n}{n_0 n_1} \int_0^\infty \frac{Y_0(s)\,Y_1(s)}{Y(s)}\,w_n^2(s)\,d\hat H(s) = \frac{n}{n_0 n_1} \int_0^\infty \frac{Y_0(s)\,Y_1(s)}{Y(s)^2}\,w_n^2(s)\,d\{N_1(s) + N_0(s)\}.$$
The null distribution

Under the null, $Z_w = Q_w/\hat\sigma_w \to N(0, 1)$ under $H_0$.

For the logrank test ($w \equiv 1$):
$$Z = \frac{\sum_j (O_j - E_j)}{\sqrt{\sum_j V_j}},$$
where at each distinct event time $\tau_j$, $O_j$ is the observed number of events in group 1, $E_j = d_j\,Y_1(\tau_j)/Y(\tau_j)$, and
$$V_j = \frac{Y_0(\tau_j)\,Y_1(\tau_j)\,d_j\,\{Y(\tau_j) - d_j\}}{Y(\tau_j)^2\,\{Y(\tau_j) - 1\}}.$$
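A minimal sketch of this computation (function name is ours; `z` is the 0/1 group label), accumulating $O_j - E_j$ and the hypergeometric variance $V_j$ over the distinct event times:

```python
import numpy as np

def logrank(u, delta, z):
    """Two-sample logrank test: Z = sum_j (O_j - E_j) / sqrt(sum_j V_j)."""
    u, delta, z = map(np.asarray, (u, delta, z))
    o_minus_e, v = 0.0, 0.0
    for t in np.unique(u[delta == 1]):
        at_risk = u >= t
        y, y1 = at_risk.sum(), (at_risk & (z == 1)).sum()
        d = ((u == t) & (delta == 1)).sum()                # total events at t
        d1 = ((u == t) & (delta == 1) & (z == 1)).sum()    # group-1 events at t
        o_minus_e += d1 - d * y1 / y
        if y > 1:
            v += d * (y1 / y) * (1 - y1 / y) * (y - d) / (y - 1)
    return o_minus_e / np.sqrt(v)   # approximately N(0,1) under H_0
```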
Under the alternative

Assume that under $H_1$ (a contiguous alternative),
$$\log\Big(\frac{h_1^{(n)}(t)}{h_0(t)}\Big) = g(t)/\sqrt{n}.$$

Under $H_1$: $R_n \to \xi$ and $Q_w \to N(\xi, \sigma^2_w)$.
The shift under alternatives

Under the contiguous alternatives,
$$R_n \approx \sqrt{p_0 p_1} \int_0^\infty \frac{\pi_0(s)\,\pi_1(s)}{\pi(s)}\,w(s)\,\sqrt{n}\,\{h_1^{(n)}(s) - h_0(s)\}\,ds \to \sqrt{p_0 p_1} \int_0^\infty \frac{f_0(s)\{1 - G_0(s)\}\{1 - G_1(s)\}\,w(s)\,g(s)}{p_0\{1 - G_0(s)\} + p_1\{1 - G_1(s)\}}\,ds = \xi.$$

The noncentrality parameter $\theta = \xi/\sigma_w$ determines the power of the test.
The shift under alternatives

Writing $m(s) = \frac{f_0(s)\{1 - G_0(s)\}\{1 - G_1(s)\}}{p_0\{1 - G_0(s)\} + p_1\{1 - G_1(s)\}}$, the noncentrality parameter is
$$\theta = \sqrt{p_0 p_1}\,\frac{\int m(s)\,g(s)\,w(s)\,ds}{\sqrt{\int m(s)\,w^2(s)\,ds}} \le \sqrt{p_0 p_1}\,\sqrt{\int m(s)\,g^2(s)\,ds}$$
by the Cauchy-Schwarz inequality. The maximum is achieved by letting $w(t) \propto g(t)$. In practice $g(\cdot)$ is unknown, so we choose $w(t)$ based on our best guess; see the sketch below.
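One common "best guess" is the Fleming-Harrington family $w(t) = \hat S(t-)^\rho \{1 - \hat S(t-)\}^\gamma$ built from the pooled KM estimate: $\rho = \gamma = 0$ recovers the logrank test, $\rho > 0$ up-weights early differences, $\gamma > 0$ late differences. A hedged sketch generalizing the `logrank` code above:

```python
import numpy as np

def weighted_logrank(u, delta, z, rho=0.0, gamma=0.0):
    """Fleming-Harrington weighted logrank statistic; rho = gamma = 0
    recovers the ordinary logrank test."""
    u, delta, z = map(np.asarray, (u, delta, z))
    num, var = 0.0, 0.0
    s_left = 1.0                                   # pooled KM estimate S-hat(t-)
    for t in np.unique(u[delta == 1]):
        at_risk = u >= t
        y, y1 = at_risk.sum(), (at_risk & (z == 1)).sum()
        d = ((u == t) & (delta == 1)).sum()
        d1 = ((u == t) & (delta == 1) & (z == 1)).sum()
        w = s_left**rho * (1.0 - s_left)**gamma    # weight uses S-hat(t-)
        num += w * (d1 - d * y1 / y)
        if y > 1:
            var += w**2 * d * (y1 / y) * (1 - y1 / y) * (y - d) / (y - 1)
        s_left *= 1.0 - d / y                      # update pooled KM
    return num / np.sqrt(var)
```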
Cox regression

$(T_i, C_i, Z_i)$, $i = 1, \dots, n$.

PH assumption: $h(t \mid Z_i) = h_0(t)\,e^{\beta' Z_i}$.

Noninformative censoring: $T_i \perp C_i \mid Z_i$.

There exists $\tau$ such that $C_i \le \tau$ and $P(T_i \ge \tau) > 0$.

$Z_i$ are i.i.d. bounded random variables.
PL function

$$S^{(j)}(\beta, s) = \sum_{i=1}^n I(U_i \ge s)\,e^{\beta' Z_i}\,Z_i^j, \quad j = 0, 1, 2.$$

log-PL function:
$$\log\{PL(\beta)\} = \sum_{i=1}^n \int_0^\tau \big[\beta' Z_i - \log S^{(0)}(\beta, s)\big]\,dN_i(s)$$

The score function:
$$S(\beta) = \sum_{i=1}^n \int_0^\tau \Big[Z_i - \frac{S^{(1)}(\beta, s)}{S^{(0)}(\beta, s)}\Big]\,dN_i(s)$$

The information matrix:
$$I(\beta) = \sum_{i=1}^n \int_0^\tau \Big[\frac{S^{(2)}(\beta, s)}{S^{(0)}(\beta, s)} - \frac{S^{(1)}(\beta, s)^2}{S^{(0)}(\beta, s)^2}\Big]\,dN_i(s).$$
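A direct numpy transcription of the score and information for a scalar covariate (function name is ours; ties are handled by the Breslow convention, i.e. the full risk set is used at each tied event time):

```python
import numpy as np

def cox_score_info(beta, u, delta, z):
    """Score S(beta) and information I(beta) of the Cox log partial
    likelihood for a single covariate."""
    u, delta, z = map(lambda a: np.asarray(a, float), (u, delta, z))
    score, info = 0.0, 0.0
    for i in np.flatnonzero(delta == 1):    # one term per observed event
        r = u >= u[i]                       # risk set at U_i
        w = np.exp(beta * z[r])
        s0 = w.sum()                        # S^(0)(beta, U_i)
        s1 = (w * z[r]).sum()               # S^(1)(beta, U_i)
        s2 = (w * z[r]**2).sum()            # S^(2)(beta, U_i)
        score += z[i] - s1 / s0
        info += s2 / s0 - (s1 / s0)**2
    return score, info
```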
Asymptotic normality

$$0 = S(\hat\beta) \approx S(\beta_0) - \hat I(\beta_0)(\hat\beta - \beta_0)
\;\Rightarrow\;
n^{1/2}(\hat\beta - \beta_0) \approx \big\{n^{-1}\hat I(\beta_0)\big\}^{-1}\,n^{-1/2} S(\beta_0)$$

1. $n^{-1}\hat I(\beta_0) - i(\beta_0) = o_p(1)$
2. $n^{-1/2} S(\beta_0) \to N\{0, i(\beta_0)\}$ as $n \to \infty$,

where, with $s^{(j)}(\beta, s)$ denoting the limit of $n^{-1} S^{(j)}(\beta, s)$,
$$i(\beta_0) = \int_0^\tau \Big[s^{(2)}(\beta_0, s) - \frac{s^{(1)}(\beta_0, s)^2}{s^{(0)}(\beta_0, s)}\Big]\,h_0(s)\,ds.$$
Information matrix

Consider the expansion
$$n^{-1}\hat I(\hat\beta) - i(\beta_0) = M_n(\hat\beta) + \big\{A_n(\hat\beta) - A(\hat\beta)\big\} + \big\{A(\hat\beta) - i(\beta_0)\big\},$$
where
$$M_n(\beta) = n^{-1} \sum_{i=1}^n \int_0^\tau \Big[\frac{S^{(2)}(\beta, s)}{S^{(0)}(\beta, s)} - \frac{S^{(1)}(\beta, s)^2}{S^{(0)}(\beta, s)^2}\Big]\,dM_i(s)$$
$$A_n(\beta) = n^{-1} \int_0^\tau \Big[\frac{S^{(2)}(\beta, s)}{S^{(0)}(\beta, s)} - \frac{S^{(1)}(\beta, s)^2}{S^{(0)}(\beta, s)^2}\Big]\,S^{(0)}(\beta, s)\,h_0(s)\,ds$$
$$A(\beta) = \int_0^\tau \Big[\frac{s^{(2)}(\beta, s)}{s^{(0)}(\beta, s)} - \frac{s^{(1)}(\beta, s)^2}{s^{(0)}(\beta, s)^2}\Big]\,s^{(0)}(\beta, s)\,h_0(s)\,ds.$$

$M_n(\hat\beta) = o_p(1)$ (MCLT), $A_n(\hat\beta) - A(\hat\beta) = o_p(1)$ (LLN), and $A(\hat\beta) - i(\beta_0) = o_p(1)$ (consistency of $\hat\beta$ and continuity, since $A(\beta_0) = i(\beta_0)$).
Scores

Score function:
$$n^{-1/2} S(\beta_0) = n^{-1/2} \sum_{i=1}^n \int_0^\tau \Big[Z_i - \frac{S^{(1)}(\beta_0, s)}{S^{(0)}(\beta_0, s)}\Big]\,dM_i(s)$$

By the MCLT, $n^{-1/2} S(\beta_0)$ converges weakly to $N(0, \Sigma_0)$, where
$$\Sigma_0 = \lim_{n\to\infty} n^{-1} \sum_{i=1}^n \int_0^\tau \Big[Z_i - \frac{S^{(1)}(\beta_0, s)}{S^{(0)}(\beta_0, s)}\Big]^2 Y_i(s)\,e^{\beta_0' Z_i}\,h_0(s)\,ds = i(\beta_0):$$
the information equality!
Asymptotic normality

Consistency: $\hat\beta - \beta_0 = o_p(1)$.

Normality: $n^{1/2}(\hat\beta - \beta_0) \to N(0, i(\beta_0)^{-1})$.

$i(\beta_0)$ can be consistently estimated by $n^{-1} I(\hat\beta)$.
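The estimate $\hat\beta$ solves $S(\beta) = 0$; a minimal Newton-Raphson sketch using `cox_score_info` above, returning $\hat\beta$ and the standard error $I(\hat\beta)^{-1/2}$ implied by the slide:

```python
import numpy as np
# assumes cox_score_info from the earlier sketch

def cox_fit(u, delta, z, tol=1e-8, max_iter=25):
    """Newton-Raphson for the partial-likelihood estimate beta-hat,
    with standard error from the inverse observed information."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, u, delta, z)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = cox_score_info(beta, u, delta, z)
    return beta, 1.0 / np.sqrt(info)   # beta-hat, se = I(beta-hat)^{-1/2}
```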
Asymptotic properties of the Breslow estimator

The cumulative baseline hazard function can be estimated as
$$\hat H_0(t) = \int_0^t \Big\{\sum_{i=1}^n I(U_i \ge s)\,e^{\hat\beta' Z_i}\Big\}^{-1}\,dN(s).$$

Consider the expansion
$$n^{1/2}\big\{\hat H_0(t) - H_0(t)\big\} = n^{1/2} \int_0^t \bigg[\Big\{\sum_{i=1}^n I(U_i \ge s)\,e^{\hat\beta' Z_i}\Big\}^{-1} - \Big\{\sum_{i=1}^n I(U_i \ge s)\,e^{\beta_0' Z_i}\Big\}^{-1}\bigg]\,dN(s) + n^{1/2}\bigg[\int_0^t \Big\{\sum_{i=1}^n I(U_i \ge s)\,e^{\beta_0' Z_i}\Big\}^{-1}\,dN(s) - H_0(t)\bigg] + o_p(1)$$
Asymptotic properties of the Breslow estimator

Continuing the expansion, $n^{1/2}\{\hat H_0(t) - H_0(t)\}$
$$= -\Big[\int_0^t s^{(1)}(\beta_0, s)\,\{s^{(0)}(\beta_0, s)\}^{-1}\,h_0(s)\,ds\Big]\,n^{1/2}(\hat\beta - \beta_0) + n^{-1/2} \sum_{i=1}^n \int_0^t \Big\{n^{-1} \sum_{j=1}^n I(U_j \ge s)\,e^{\beta_0' Z_j}\Big\}^{-1}\,dM_i(s) + o_p(1),$$
converging to a mean-zero Gaussian process.
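Before turning to variance estimation, a minimal sketch of computing the Breslow estimator itself, reusing the conventions of the earlier sketches (scalar covariate, Breslow tie handling):

```python
import numpy as np

def breslow(beta_hat, u, delta, z):
    """Breslow estimate of the cumulative baseline hazard H_0(t):
    a jump d / sum_{i at risk} exp(beta-hat * Z_i) at each event time."""
    u, delta, z = map(lambda a: np.asarray(a, float), (u, delta, z))
    times = np.unique(u[delta == 1])
    jumps = []
    for t in times:
        d = np.sum((u == t) & (delta == 1))
        s0 = np.exp(beta_hat * z[u >= t]).sum()   # sum I(U_i >= t) e^{beta Z_i}
        jumps.append(d / s0)
    return times, np.cumsum(jumps)
```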
Asymptotic properties of the Breslow estimator

The variance of $n^{1/2}\{\hat H_0(t) - H_0(t)\}$ can be estimated as the empirical variance of
$$-n^{1/2}\Big[\int_0^t \frac{S^{(1)}(\hat\beta, s)}{S^{(0)}(\hat\beta, s)}\,d\hat H_0(s)\Big]\,\hat I(\hat\beta)^{-1} \sum_{i=1}^n \int_0^\tau \Big[Z_i - \frac{S^{(1)}(\hat\beta, s)}{S^{(0)}(\hat\beta, s)}\Big]\,dN_i(s)\,G_i + n^{1/2} \sum_{i=1}^n \int_0^t \Big\{\sum_{j=1}^n I(U_j \ge s)\,e^{\hat\beta' Z_j}\Big\}^{-1}\,dN_i(s)\,G_i$$
over repeated draws of i.i.d. $G_i \sim N(0, 1)$, $i = 1, 2, \dots, n$.
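A hedged sketch of this perturbation scheme for a scalar covariate (function name and defaults are ours; the $n^{1/2}$ factors cancel because we target the standard error of $\hat H_0(t)$ itself rather than the normalized process):

```python
import numpy as np

def breslow_se(beta_hat, u, delta, z, t, n_rep=1000, seed=0):
    """Perturbation (multiplier) standard error of the Breslow estimator
    H_0-hat(t): each replicate perturbs the score and the baseline-hazard
    increments by i.i.d. G_i ~ N(0,1)."""
    rng = np.random.default_rng(seed)
    u, delta, z = map(lambda a: np.asarray(a, float), (u, delta, z))
    ev = np.flatnonzero(delta == 1)                  # indices of events
    s0 = np.empty(len(ev)); zbar = np.empty(len(ev)); info = 0.0
    for k, j in enumerate(ev):                       # risk-set sums per event
        zr = z[u >= u[j]]
        w = np.exp(beta_hat * zr)
        s0[k] = w.sum()
        zbar[k] = (w * zr).sum() / s0[k]
        info += (w * zr**2).sum() / s0[k] - zbar[k]**2
    in_t = u[ev] <= t
    c = np.sum(zbar[in_t] / s0[in_t])                # int_0^t S1/S0 dH0-hat
    reps = np.empty(n_rep)
    for b in range(n_rep):
        g = rng.standard_normal(len(u))
        dbeta = np.sum((z[ev] - zbar) * g[ev]) / info        # perturbed score term
        reps[b] = np.sum(g[ev][in_t] / s0[in_t]) - c * dbeta  # perturbed H0-hat(t)
    return reps.std()
```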
The consequences of model misspecification

If $h(t \mid Z_i) = h(t, Z_i) \ne h_0(t)\,e^{\beta' Z_i}$, then $\hat\beta$ converges to $\beta^*$, the solution of the estimating equation
$$\int_0^\tau \Big[\tilde s^{(1)}(t) - \frac{s^{(1)}(\beta, t)}{s^{(0)}(\beta, t)}\,\tilde s^{(0)}(t)\Big]\,dt = 0,$$
where $\tilde s^{(j)}(t) = \lim_{n\to\infty} n^{-1} \sum_{i=1}^n Y_i(t)\,h(t, Z_i)\,Z_i^j$.
The consequences of model misspecification

$n^{1/2}(\hat\beta - \beta^*)$ still converges to a normal distribution with mean zero and variance $A^{-1} B A^{-1}$, where
$$A = \int_0^\tau \Big[\frac{s^{(2)}(\beta^*, s)}{s^{(0)}(\beta^*, s)} - \frac{s^{(1)}(\beta^*, s)^2}{s^{(0)}(\beta^*, s)^2}\Big]\,\tilde s^{(0)}(s)\,ds,$$
$$B = E\bigg[\bigg\{\int_0^\tau \Big(Z_i - \frac{s^{(1)}(\beta^*, t)}{s^{(0)}(\beta^*, t)}\Big)\,dN_i(t) - \int_0^\tau \frac{Y_i(t)\,e^{\beta^* Z_i}}{s^{(0)}(\beta^*, t)}\Big(Z_i - \frac{s^{(1)}(\beta^*, t)}{s^{(0)}(\beta^*, t)}\Big)\,dE\{N_i(t)\}\bigg\}^2\bigg].$$

$\beta^*$ may depend on the censoring distribution!

Robust (sandwich) variance estimator for $\hat\beta$; see the sketch below.
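A plug-in sketch of the sandwich estimate $\hat A^{-1}\hat B\hat A^{-1}$ for a scalar covariate, with $dE\{N_i(t)\}$ estimated by $n^{-1}\,d\bar N(t)$ and the limits $s^{(j)}$ replaced by their sample versions (function name and plug-in choices are ours):

```python
import numpy as np

def sandwich_variance(beta_hat, u, delta, z):
    """Robust (sandwich) variance estimate A-hat^{-1} B-hat A-hat^{-1} / n
    for beta-hat under possible model misspecification."""
    u, delta, z = map(lambda a: np.asarray(a, float), (u, delta, z))
    n = len(u)
    a_hat, w = 0.0, np.zeros(n)         # w[i] accumulates the i-th influence term
    for j in np.flatnonzero(delta == 1):
        r = u >= u[j]
        ew = np.exp(beta_hat * z) * r   # I(U_i >= U_j) e^{beta Z_i}, all i at once
        s0, s1, s2 = ew.sum(), (ew * z).sum(), (ew * z**2).sum()
        zbar = s1 / s0
        a_hat += s2 / s0 - zbar**2      # contribution to n * A-hat
        w[j] += z[j] - zbar             # dN_i part of the influence term
        w -= ew * (z - zbar) / s0       # compensator part, this event time
    a_hat /= n
    b_hat = np.mean(w**2)
    return b_hat / (a_hat**2 * n)       # var-hat(beta-hat)
```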