Survival Analysis: One-Sample Problem / Two-Sample Problem / Regression. Lu Tian and Richard Olshen, Stanford University



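One-sample problem

$T_1, \dots, T_n \sim 1 - S(\cdot)$, $C_1, \dots, C_n \sim G(\cdot)$, and $T_i \perp C_i$.
Observations: $(U_i, \delta_i)$, $i = 1, \dots, n$, where $U_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$.
Objective: estimate the cumulative hazard $H(t)$ and the survival function $S(t)$.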

One-sample problem

$$N_i(t) = I(U_i \le t)\,\delta_i, \qquad Y_i(t) = I(U_i \ge t), \qquad M_i(t) = N_i(t) - \int_0^t Y_i(s)\, h(s)\,ds,$$
$$N(t) = \sum_{i=1}^n N_i(t), \qquad Y(t) = \sum_{i=1}^n Y_i(t), \qquad M(t) = \sum_{i=1}^n M_i(t).$$

One-sample problem

The Nelson-Aalen (NA) estimator:
$$\hat H(t) = \int_0^t \frac{I(Y(s) > 0)}{Y(s)}\,dN(s).$$
Consider
$$Q(t) = \int_0^t \frac{I(Y(s) > 0)}{Y(s)}\,dM(s) = \hat H(t) - H(t) + D(t),$$
where
$$D(t) = \int_0^t h(s)\, I(Y(s) = 0)\,ds.$$
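The NA estimator above is a step function that jumps by $d_j / Y(\tau_j)$ at each distinct failure time $\tau_j$. The following is a minimal pure-Python sketch of that computation; the function name `nelson_aalen` and the toy data are illustrative assumptions, not part of the slides.

```python
def nelson_aalen(times, events):
    """Return [(tau_j, H_hat(tau_j))] at the distinct observed failure times.

    times  -- observed follow-up times U_i
    events -- indicators delta_i (1 = failure observed, 0 = censored)
    """
    failure_times = sorted({u for u, d in zip(times, events) if d == 1})
    estimate, cumulative = [], 0.0
    for tau in failure_times:
        at_risk = sum(1 for u in times if u >= tau)          # Y(tau)
        deaths = sum(1 for u, d in zip(times, events)
                     if u == tau and d == 1)                 # jump of N at tau
        cumulative += deaths / at_risk                       # dN(tau) / Y(tau)
        estimate.append((tau, cumulative))
    return estimate


# Toy data (hypothetical): six subjects, two of them censored.
U = [2, 3, 3, 5, 7, 8]
delta = [1, 1, 0, 1, 0, 1]
na = nelson_aalen(U, delta)
```

Because the estimator only changes at failure times, censored observations enter solely through the at-risk count $Y(\tau_j)$.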

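Bias

The bias of the NA estimator:
$$E\{\hat H(t) - H(t)\} = E\{Q(t)\} - E\{D(t)\} = -\int_0^t h(s)\,[1 - S(s)\{1 - G(s)\}]^n\,ds,$$
since $Q(t)$ is a mean-zero martingale. Suppose there is a $\tau > 0$ such that $S(\tau) > 0$ and $1 - G(\tau) > 0$. Then, for $t \le \tau$,
$$0 < E\{D(t)\} < H(t)\,[1 - S(\tau)\{1 - G(\tau)\}]^n,$$
so the bias vanishes geometrically fast in $n$.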

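Asymptotic distribution

For $t \in [0, \tau]$,
$$n^{1/2}\{\hat H(t) - H(t)\} = n^{1/2} Q(t) - n^{1/2} D(t) = \sum_{i=1}^n \int_0^t n^{1/2}\,\frac{I(Y(s) > 0)}{Y(s)}\,dM_i(s) + o_p(1).$$
Check the conditions of the martingale central limit theorem (MCLT), writing $U_{in}(t)$ for the $i$th summand above and $U_{in,\epsilon}(t)$ for the same integral restricted to where the integrand is at least $\epsilon$, with $\pi(s) = S(s)\{1 - G(s)\}$:

1. Condition (a):
$$\Big\langle \sum_i U_{in}, \sum_i U_{in} \Big\rangle(t) = \sum_{i=1}^n \int_0^t \frac{n\,I(Y(s) > 0)}{Y^2(s)}\,Y_i(s)\, h(s)\,ds \to \int_0^t \frac{h(s)}{\pi(s)}\,ds \quad \text{in probability.}$$
2. Condition (b):
$$\Big\langle \sum_i U_{in,\epsilon}, \sum_i U_{in,\epsilon} \Big\rangle(t) = \int_0^t \frac{I(Y(s) > 0)}{n^{-1} Y(s)}\, I\!\left(\frac{n^{1/2} I(Y(s) > 0)}{Y(s)} \ge \epsilon\right) h(s)\,ds \to 0 \quad \text{in probability,}$$
since
$$P\Big(\sup_{s \in [0, \tau]} \{n^{1/2} I(Y(s) > 0)/Y(s)\} \ge \epsilon\Big) \le P\big(n^{-1/2}\epsilon^{-1} \ge n^{-1} Y(\tau)\big) \to 0.$$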

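Asymptotic distribution

By the MCLT and Slutsky's theorem, $n^{1/2}\{\hat H(t) - H(t)\}$ converges weakly to a Gaussian process $X(t)$, $t \in [0, \tau]$, with independent increments and
$$\mathrm{var}\{X(t)\} = \sigma^2(t) = \int_0^t \frac{h(s)}{\pi(s)}\,ds, \qquad \pi(t) = S(t)\{1 - G(t)\}.$$
$\sigma^2(t)$ can be estimated by
$$\int_0^t \frac{d\hat H(s)}{\hat\pi(s)} = n \int_0^t \frac{I(Y(s) > 0)}{Y(s)^2}\,dN(s) = n \sum_{\tau_j \le t} \frac{d_j}{Y(\tau_j)^2},$$
where $\hat\pi(s) = n^{-1} Y(s)$, the $\tau_j$ are the distinct observed failure times, and $d_j$ is the number of failures at $\tau_j$.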
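As a rough illustration of how the variance estimate $n \sum_{\tau_j \le t} d_j / Y(\tau_j)^2$ might be used, the sketch below accumulates it alongside the NA estimator and forms a pointwise Wald interval $\hat H(t) \pm 1.96\,\hat\sigma(t)/n^{1/2}$. The function name, the toy data, and the choice of a 95% level (the constant 1.96) are assumptions made for the example, not taken from the slides.

```python
import math


def na_with_variance(times, events, z_crit=1.96):
    """Return [(tau_j, H_hat, sigma2_hat, (lower, upper))] at failure times.

    sigma2_hat estimates the asymptotic variance of n^{1/2}(H_hat - H);
    z_crit = 1.96 is an assumed 95% normal quantile.
    """
    n = len(times)
    rows, H, sigma2 = [], 0.0, 0.0
    for tau in sorted({u for u, d in zip(times, events) if d == 1}):
        Y = sum(1 for u in times if u >= tau)                         # at-risk count
        d_j = sum(1 for u, d in zip(times, events) if u == tau and d == 1)
        H += d_j / Y                                                  # NA increment
        sigma2 += n * d_j / Y ** 2                                    # variance increment
        half_width = z_crit * math.sqrt(sigma2 / n)
        rows.append((tau, H, sigma2, (H - half_width, H + half_width)))
    return rows


# Toy data (hypothetical).
U = [2, 3, 3, 5, 7, 8]
delta = [1, 1, 0, 1, 0, 1]
rows = na_with_variance(U, delta)
```

The interval is only a large-sample approximation; with very few subjects at risk the lower limit can fall below zero.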

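KM estimator

The Kaplan-Meier (KM) estimator $\hat S(t)$ is asymptotically equivalent to $e^{-\hat H(t)}$.
$\hat S(t)$ converges to $S(t)$ uniformly in $[0, \tau]$.
$n^{1/2}\{\hat S(t) - S(t)\}$ converges weakly to a Gaussian process with variance function
$$S(t)^2 \sigma^2(t) = S(t)^2 \int_0^t \frac{h(s)}{\pi(s)}\,ds,$$
which can be estimated by
$$\hat S^2(t) \int_0^t \frac{d\hat\Lambda(s)}{n^{-1} Y(s)},$$
where $\hat\Lambda$ denotes the NA estimator $\hat H$. This is similar, but not identical, to Greenwood's formula.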
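To see how close the KM estimator and $e^{-\hat H(t)}$ are on a small sample, the sketch below computes both at each failure time. Since $1 - x \le e^{-x}$, the product-limit value can never exceed the exponentiated NA value. The function name and toy data are illustrative assumptions.

```python
import math


def km_and_exp_na(times, events):
    """Return [(tau_j, S_KM(tau_j), exp(-H_NA(tau_j)))] at distinct failure times."""
    rows, km, H = [], 1.0, 0.0
    for tau in sorted({u for u, d in zip(times, events) if d == 1}):
        Y = sum(1 for u in times if u >= tau)                         # at-risk count
        d_j = sum(1 for u, d in zip(times, events) if u == tau and d == 1)
        km *= 1.0 - d_j / Y          # product-limit factor
        H += d_j / Y                 # Nelson-Aalen increment
        rows.append((tau, km, math.exp(-H)))
    return rows


# Toy data (hypothetical).
rows = km_and_exp_na([2, 3, 3, 5, 7, 8], [1, 1, 0, 1, 0, 1])
```

The two curves differ most when few subjects remain at risk, for instance at the last failure time, where KM drops to zero if the largest observation is a failure.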

Two-sample problem

$(T_i, C_i, Z_i)$, $i = 1, \dots, n$.
$$P(T_i > t \mid Z_i = j) = 1 - F_j(t) = S_j(t), \qquad P(C_i > t \mid Z_i = j) = 1 - G_j(t), \qquad j = 0, 1.$$
($G_1(t)$ could be different from $G_0(t)$.)
$n_j$: the size of group $j$.
Objective: test $H_0: S_0(\cdot) = S_1(\cdot)$.

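Notation

$$N_j(t) = \sum_{i=1}^n I(U_i \le t,\ \delta_i = 1,\ Z_i = j), \qquad j = 0, 1,$$
$$Y_j(t) = \sum_{i=1}^n I(U_i \ge t,\ Z_i = j), \qquad j = 0, 1,$$
$$M_j(t) = N_j(t) - \int_0^t Y_j(s)\, h_j(s)\,ds, \qquad j = 0, 1,$$
$$p_j = n_j / n, \qquad \pi_j(s) = P(U_i \ge s \mid Z_i = j), \qquad j = 0, 1.$$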

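Two-sample problem

Weighted logrank test statistics:
$$Q_w = \left(\frac{n}{n_0 n_1}\right)^{1/2} U_w(\infty) = \int_0^\infty H_{1w}(s)\,dN_1(s) - \int_0^\infty H_{0w}(s)\,dN_0(s),$$
where
$$H_{jw}(s) = \left(\frac{n}{n_0 n_1}\right)^{1/2} \frac{Y_{1-j}(s)}{Y(s)}\, w_n(s), \qquad j = 0, 1,$$
with $Y(s) = Y_0(s) + Y_1(s)$ and $w_n(s)$ the weight function.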

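Two-sample problem

Martingale representation:
$$Q_w(t) = \int_0^t H_{1w}(s)\,dM_1(s) - \int_0^t H_{0w}(s)\,dM_0(s) + R_n(t),$$
where
$$R_n(t) = \int_0^t H_{1w}(s)\,dA_1(s) - \int_0^t H_{0w}(s)\,dA_0(s) = \left(\frac{n}{n_0 n_1}\right)^{1/2} \int_0^t \frac{Y_0(s)\, Y_1(s)\, w_n(s)}{Y(s)}\,\{h_1(s) - h_0(s)\}\,ds,$$
and $A_j(t) = \int_0^t Y_j(s)\, h_j(s)\,ds$ is the compensator of $N_j(t)$.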

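Two-sample problem

Apply the MCLT:
$$\int_0^t H_{jw}(s)\,dM_j(s) \to W(\sigma^2_{jw}(t)) \quad \text{weakly,}$$
where $W$ is a standard Brownian motion and
$$\sigma^2_{jw}(t) = p_{1-j} \int_0^t \frac{\pi_{1-j}(s)\,\pi_0(s)\,\pi_1(s)}{\pi(s)^2}\, w^2(s)\, h_j(s)\,ds, \qquad \pi(s) = p_0 \pi_0(s) + p_1 \pi_1(s).$$
Hence
$$\int_0^t H_{1w}(s)\,dM_1(s) - \int_0^t H_{0w}(s)\,dM_0(s)$$
converges to $N\{0,\ \sigma^2_{1w}(t) + \sigma^2_{0w}(t)\}$ in distribution. The result can be generalized to $t = +\infty$.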

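Null variance

Under the null, $Q_w(\infty)$ is approximately $N(0, \sigma_w^2)$, where
$$\sigma_w^2 = \int_0^\infty \frac{f_0(s)\,(1 - G_0(s))\,(1 - G_1(s))\, w^2(s)}{p_0 (1 - G_0(s)) + p_1 (1 - G_1(s))}\,ds.$$
$\sigma_w^2$ can be estimated by
$$\hat\sigma_w^2 = \int_0^\infty \frac{Y_0(s)\, Y_1(s)\, n_0^{-1} n_1^{-1}}{n^{-1} Y(s)}\, w_n^2(s)\,d\hat H(s) = \left(\frac{n}{n_0 n_1}\right) \int_0^\infty \frac{Y_0(s)\, Y_1(s)}{Y(s)^2}\, w_n^2(s)\,d\{N_1(s) + N_0(s)\}.$$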

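The null distribution

Under the null,
$$Z_w = \frac{Q_w}{\hat\sigma_w} \ \text{is approximately}\ N(0, 1) \quad \text{under } H_0.$$
For the logrank test,
$$Z = \frac{\sum_j (O_j - E_j)}{\sqrt{\sum_j V_j\, \dfrac{Y(\tau_j) - d_j}{Y(\tau_j) - 1}}}.$$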
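The standardized statistic can be computed directly from the risk-set counts at each failure time, since $\left(\frac{n}{n_0 n_1}\right)^{1/2}$ cancels between $Q_w$ and $\hat\sigma_w$. The sketch below is one possible implementation under stated assumptions: group labels are coded 0/1, $O_j - E_j$ is taken to be $d_{1j} - Y_1(\tau_j)\, d_j / Y(\tau_j)$, and the variance term is $w^2 Y_0 Y_1 d_j / Y^2$, optionally multiplied by the factor $(Y - d_j)/(Y - 1)$ for ties. The function name, the `tie_correction` flag, and the toy data are hypothetical.

```python
import math


def weighted_logrank(times, events, groups, weight=lambda t: 1.0,
                     tie_correction=False):
    """Z statistic for H0: S_0 = S_1 from (U_i, delta_i, Z_i), Z_i in {0, 1}."""
    score, variance = 0.0, 0.0
    for tau in sorted({u for u, d in zip(times, events) if d == 1}):
        Y0 = sum(1 for u, z in zip(times, groups) if u >= tau and z == 0)
        Y1 = sum(1 for u, z in zip(times, groups) if u >= tau and z == 1)
        Y = Y0 + Y1
        d1 = sum(1 for u, d, z in zip(times, events, groups)
                 if u == tau and d == 1 and z == 1)
        dj = sum(1 for u, d in zip(times, events) if u == tau and d == 1)
        w = weight(tau)
        score += w * (d1 - Y1 * dj / Y)          # observed minus expected, group 1
        v = w ** 2 * Y0 * Y1 * dj / Y ** 2       # martingale-based variance term
        if tie_correction and Y > 1:
            v *= (Y - dj) / (Y - 1)              # correction factor for tied failures
        variance += v
    return score / math.sqrt(variance)


# Toy data (hypothetical): group 1 fails first, no ties, no censoring.
U = [1, 2, 3, 4, 5, 6]
delta = [1, 1, 1, 1, 1, 1]
Z = [1, 1, 1, 0, 0, 0]
z_stat = weighted_logrank(U, delta, Z)
z_tie = weighted_logrank(U, delta, Z, tie_correction=True)
z_swapped = weighted_logrank(U, delta, [1 - z for z in Z])
```

With `weight` left at its default of 1 this is the ordinary logrank statistic; without tied failure times the correction factor equals 1, so both variance forms agree. The function divides by zero if no failure time has subjects at risk in both groups.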

Under the alternative

Assume that under $H_1$:
$$\log\left(\frac{h_1^{(n)}(t)}{h_0(t)}\right) = \frac{g(t)}{\sqrt{n}}.$$
Under $H_1$: $R_n \to \xi$.
Under $H_1$: $Q_w \to N(\xi, \sigma_w^2)$ in distribution.

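The shift under alternatives

Under the contiguous alternatives:
$$R_n \approx \sqrt{p_0 p_1} \int_0^\infty \frac{\pi_0(s)\,\pi_1(s)}{\pi(s)}\, w(s)\, \sqrt{n}\,\{h_1^{(n)}(s) - h_0(s)\}\,ds \to \sqrt{p_0 p_1} \int_0^\infty \frac{f_0(s)\,(1 - G_0(s))\,(1 - G_1(s))\, w(s)\, g(s)}{p_0 (1 - G_0(s)) + p_1 (1 - G_1(s))}\,ds = \xi.$$
The noncentrality parameter $\theta = \xi / \sigma_w$ determines the power of the test.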

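The shift under alternatives

The noncentrality parameter satisfies
$$\theta \propto \frac{\int_0^\infty m(s)\, g(s)\, w(s)\,ds}{\sqrt{\int_0^\infty m(s)\, w^2(s)\,ds}} \le \sqrt{\int_0^\infty m(s)\, g^2(s)\,ds},$$
where $m(s) = \dfrac{f_0(s)\,(1 - G_0(s))\,(1 - G_1(s))}{p_0 (1 - G_0(s)) + p_1 (1 - G_1(s))}$ is the factor common to $\xi$ and $\sigma_w^2$.
The maximum value is achieved by letting $w(t) \propto g(t)$.
In practice $g(\cdot)$ is unknown, so we may choose $w(t)$ based on our best guess.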
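The bound on the noncentrality parameter is an instance of the Cauchy-Schwarz inequality in the weighted space with weight $m(s)$. A short derivation, using only the expressions for $\xi$ and $\sigma_w^2$ given above (the constant $c$ is introduced here for illustration):

```latex
\begin{aligned}
\theta = \frac{\xi}{\sigma_w}
  &= \sqrt{p_0 p_1}\,
     \frac{\int_0^\infty m(s)\, g(s)\, w(s)\, ds}
          {\left\{\int_0^\infty m(s)\, w^2(s)\, ds\right\}^{1/2}} \\
  &\le \sqrt{p_0 p_1}\,
     \frac{\left\{\int_0^\infty m(s)\, g^2(s)\, ds\right\}^{1/2}
           \left\{\int_0^\infty m(s)\, w^2(s)\, ds\right\}^{1/2}}
          {\left\{\int_0^\infty m(s)\, w^2(s)\, ds\right\}^{1/2}}
   = \sqrt{p_0 p_1}\left\{\int_0^\infty m(s)\, g^2(s)\, ds\right\}^{1/2},
\end{aligned}
\qquad \text{with equality when } w(s) = c\, g(s),\ c > 0.
```

For example, under a proportional hazards alternative $g(t)$ is constant, so the optimal weight is constant and the unweighted logrank test attains the bound.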

Cox regression

$(T_i, C_i, Z_i)$, $i = 1, \dots, n$.
PH assumption: $h(t \mid Z_i) = h_0(t)\, e^{\beta' Z_i}$.
Noninformative censoring: $T_i \perp C_i \mid Z_i$.
There exists $\tau$ such that $C_i < \tau$ and $P(T_i \ge \tau) > 0$.
The $Z_i$ are i.i.d. bounded random variables.

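PL function

$$S^{(j)}(\beta, s) = \sum_{i=1}^n I(U_i \ge s)\, e^{\beta' Z_i}\, Z_i^{j}, \qquad j = 0, 1, 2$$
(for vector-valued $Z_i$, squares are read as outer products, $a^2 = a a'$).
The log-PL function:
$$\log\{PL(\beta)\} = \sum_{i=1}^n \int_0^\tau \left[\beta' Z_i - \log\{S^{(0)}(\beta, s)\}\right] dN_i(s).$$
The score function:
$$S(\beta) = \sum_{i=1}^n \int_0^\tau \left[Z_i - \frac{S^{(1)}(\beta, s)}{S^{(0)}(\beta, s)}\right] dN_i(s).$$
The information matrix:
$$I(\beta) = \sum_{i=1}^n \int_0^\tau \left[\frac{S^{(2)}(\beta, s)}{S^{(0)}(\beta, s)} - \frac{S^{(1)}(\beta, s)^2}{S^{(0)}(\beta, s)^2}\right] dN_i(s).$$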

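Asymptotic normality

By a Taylor expansion, with $\tilde\beta$ denoting a point between $\hat\beta$ and $\beta_0$,
$$0 = S(\hat\beta) = S(\beta_0) - \hat I(\tilde\beta)\,(\hat\beta - \beta_0),$$
$$n^{1/2}(\hat\beta - \beta_0) = \left\{n^{-1} \hat I(\tilde\beta)\right\}^{-1} n^{-1/2} S(\beta_0).$$
1. $n^{-1} \hat I(\tilde\beta) - i(\beta_0) = o_p(1)$.
2. $n^{-1/2} S(\beta_0) \to N\{0, i(\beta_0)\}$ as $n \to \infty$,
where
$$i(\beta_0) = \int_0^\tau \left[s^{(2)}(\beta_0, s) - \frac{s^{(1)}(\beta_0, s)^2}{s^{(0)}(\beta_0, s)}\right] h_0(s)\,ds$$
and $s^{(j)}(\beta, s)$ is the limit of $n^{-1} S^{(j)}(\beta, s)$.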

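Information matrix

Consider the expansion (with $\tilde\beta$ the intermediate point of the Taylor expansion of the score)
$$n^{-1} \hat I(\tilde\beta) - i(\beta_0) = M_n(\tilde\beta) + \{A_n(\tilde\beta) - A(\tilde\beta)\} + \{A(\tilde\beta) - i(\beta_0)\},$$
where
$$M_n(\beta) = n^{-1} \sum_{i=1}^n \int_0^\tau \left[\frac{S^{(2)}(\beta, s)}{S^{(0)}(\beta, s)} - \frac{S^{(1)}(\beta, s)^2}{S^{(0)}(\beta, s)^2}\right] dM_i(s),$$
$$A_n(\beta) = n^{-1} \int_0^\tau \left[\frac{S^{(2)}(\beta, s)}{S^{(0)}(\beta, s)} - \frac{S^{(1)}(\beta, s)^2}{S^{(0)}(\beta, s)^2}\right] S^{(0)}(\beta_0, s)\, h_0(s)\,ds,$$
$$A(\beta) = \int_0^\tau \left[\frac{s^{(2)}(\beta, s)}{s^{(0)}(\beta, s)} - \frac{s^{(1)}(\beta, s)^2}{s^{(0)}(\beta, s)^2}\right] s^{(0)}(\beta_0, s)\, h_0(s)\,ds.$$
$M_n(\tilde\beta) = o_p(1)$ (MCLT), $A_n(\tilde\beta) - A(\tilde\beta) = o_p(1)$ (LLN), and $A(\tilde\beta) - i(\beta_0) = o_p(1)$ (consistency of $\tilde\beta$ and continuity).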

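Scores

Score function:
$$n^{-1/2} S(\beta_0) = n^{-1/2} \sum_{i=1}^n \int_0^\tau \left[Z_i - \frac{S^{(1)}(\beta_0, s)}{S^{(0)}(\beta_0, s)}\right] dM_i(s).$$
By the MCLT, $n^{-1/2} S(\beta_0)$ converges weakly to $N(0, \Sigma)$, where
$$\Sigma = \lim_{n \to \infty} n^{-1} \sum_{i=1}^n \int_0^\tau \left[Z_i - \frac{S^{(1)}(\beta_0, s)}{S^{(0)}(\beta_0, s)}\right]^2 Y_i(s)\, e^{\beta_0' Z_i}\, h_0(s)\,ds = i(\beta_0),$$
which is the information equality.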

Asymptotic normality

Consistency: $\hat\beta - \beta_0 = o_p(1)$.
Normality: $n^{1/2}(\hat\beta - \beta_0) \to N(0, i(\beta_0)^{-1})$ in distribution.
$i(\beta_0)$ can be consistently estimated by $n^{-1} I(\hat\beta)$.
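The score and information formulas suggest a Newton-Raphson iteration $\beta \leftarrow \beta + I(\beta)^{-1} S(\beta)$, with the standard error of $\hat\beta$ approximated by $I(\hat\beta)^{-1/2}$. The following is a minimal sketch for a single scalar covariate; the function names, the starting value of 0, the tolerance, and the toy data are assumptions made for the example. Tied failure times are handled by letting each failure use the full risk set, which is a simplification.

```python
import math


def cox_score_info(beta, times, events, z):
    """Score S(beta) and information I(beta) for a scalar covariate."""
    score, info = 0.0, 0.0
    for u_i, d_i, z_i in zip(times, events, z):
        if d_i != 1:
            continue                                  # N_i jumps only at failures
        risk = [(math.exp(beta * z_l), z_l)
                for u_l, z_l in zip(times, z) if u_l >= u_i]
        s0 = sum(e for e, _ in risk)                  # S^(0)(beta, U_i)
        s1 = sum(e * z_l for e, z_l in risk)          # S^(1)(beta, U_i)
        s2 = sum(e * z_l ** 2 for e, z_l in risk)     # S^(2)(beta, U_i)
        score += z_i - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info


def cox_fit(times, events, z, tol=1e-10, max_iter=50):
    """Newton-Raphson for the maximum partial likelihood estimate."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, times, events, z)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = cox_score_info(beta, times, events, z)
    return beta, 1.0 / math.sqrt(info)                # estimate and standard error


# Toy data (hypothetical): binary covariate, no censoring.
U = [1, 2, 3, 4, 5, 6]
delta = [1, 1, 1, 1, 1, 1]
Z = [1, 0, 1, 0, 0, 1]
beta_hat, se = cox_fit(U, delta, Z)
final_score, final_info = cox_score_info(beta_hat, U, delta, Z)
```

The iteration assumes the partial likelihood has a finite maximizer; with perfectly separated data the estimate diverges and the loop simply stops after `max_iter` steps.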

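Asymptotic properties of the Breslow estimator

The cumulative baseline hazard function can be estimated by
$$\hat H_0(t) = \sum_{i=1}^n \int_0^t \Big\{\sum_{l=1}^n I(U_l \ge s)\, e^{\hat\beta' Z_l}\Big\}^{-1} dN_i(s).$$
Consider the expansion
$$n^{1/2}\{\hat H_0(t) - H_0(t)\} = n^{1/2} \int_0^t \Big[\Big\{\sum_{l=1}^n I(U_l \ge s)\, e^{\hat\beta' Z_l}\Big\}^{-1} - \Big\{\sum_{l=1}^n I(U_l \ge s)\, e^{\beta_0' Z_l}\Big\}^{-1}\Big]\, dN(s)$$
$$\qquad + \; n^{1/2} \Big[\int_0^t \Big\{\sum_{l=1}^n I(U_l \ge s)\, e^{\beta_0' Z_l}\Big\}^{-1} dN(s) - H_0(t)\Big] + o_p(1).$$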

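Asymptotic properties of the Breslow estimator

Continuing the expansion,
$$n^{1/2}\{\hat H_0(t) - H_0(t)\} = -\Big[\int_0^t s^{(1)}(\beta_0, s)\,\{s^{(0)}(\beta_0, s)\}^{-1}\, h_0(s)\,ds\Big]'\, n^{1/2}(\hat\beta - \beta_0)$$
$$\qquad + \; n^{1/2} \sum_{i=1}^n \int_0^t \Big\{\sum_{l=1}^n I(U_l \ge s)\, e^{\beta_0' Z_l}\Big\}^{-1} dM_i(s) + o_p(1),$$
which converges to a mean-zero Gaussian process.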

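Asymptotic properties of the Breslow estimator

The variance of $n^{1/2}\{\hat H_0(t) - H_0(t)\}$ can be estimated by the empirical variance, over repeated draws of $G_1, \dots, G_n$ with the data held fixed, of
$$-\,n^{1/2} \Big[\int_0^t \frac{S^{(1)}(\hat\beta, s)}{S^{(0)}(\hat\beta, s)}\,d\hat H_0(s)\Big]'\, \hat I(\hat\beta)^{-1} \sum_{i=1}^n \int_0^\tau \Big[Z_i - \frac{S^{(1)}(\hat\beta, s)}{S^{(0)}(\hat\beta, s)}\Big]\, dN_i(s)\, G_i$$
$$\qquad + \; n^{1/2} \sum_{i=1}^n \int_0^t \Big\{\sum_{l=1}^n I(U_l \ge s)\, e^{\hat\beta' Z_l}\Big\}^{-1} dN_i(s)\, G_i,$$
where $G_i \sim N(0, 1)$, $i = 1, 2, \dots, n$, and the unknown $\beta_0$ has been replaced by $\hat\beta$.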
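The Breslow estimator $\hat H_0(t)$ itself is simple to compute once $\hat\beta$ is available: at each failure time it jumps by the number of failures divided by $S^{(0)}(\hat\beta, \tau_j)$. The sketch below assumes a scalar covariate and takes $\hat\beta$ as an input; the function name and toy data are hypothetical, and the resampling-based variance described above is not implemented here.

```python
import math


def breslow(beta_hat, times, events, z):
    """Return [(tau_j, H0_hat(tau_j))] at the distinct observed failure times."""
    out, H0 = [], 0.0
    for tau in sorted({u for u, d in zip(times, events) if d == 1}):
        # S^(0)(beta_hat, tau): risk-set sum of exp(beta_hat * Z_l)
        s0 = sum(math.exp(beta_hat * z_l)
                 for u_l, z_l in zip(times, z) if u_l >= tau)
        d_j = sum(1 for u, d in zip(times, events) if u == tau and d == 1)
        H0 += d_j / s0
        out.append((tau, H0))
    return out


# Toy data (hypothetical). With beta_hat = 0 the risk-set sum is just the
# at-risk count, so the result coincides with the Nelson-Aalen estimator.
U = [2, 3, 3, 5, 7, 8]
delta = [1, 1, 0, 1, 0, 1]
Z = [0, 1, 1, 0, 1, 0]
h0_null = breslow(0.0, U, delta, Z)
h0_pos = breslow(0.5, U, delta, Z)
```

Setting `beta_hat = 0` gives a quick sanity check against the one-sample NA estimator.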

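The consequence of model misspecification

If $h(t \mid Z_i) = h(t, Z_i) \ne h_0(t)\, e^{\beta' Z_i}$, then $\hat\beta$ converges to a limit, written here as $\beta^*$, which is the solution in $\beta$ of the estimating equation
$$\int_0^\infty \left[s^{(1)}(t) - \frac{s^{(1)}(\beta, t)}{s^{(0)}(\beta, t)}\, s^{(0)}(t)\right] dt = 0,$$
where
$$s^{(j)}(t) = \lim_{n \to \infty} n^{-1} \sum_{i=1}^n Y_i(t)\, h(t, Z_i)\, Z_i^{j}.$$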

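The consequence of model misspecification

$n^{1/2}(\hat\beta - \beta^*)$, with $\beta^*$ the limit of $\hat\beta$ under the misspecified model, still converges to a normal distribution with mean zero and variance $A^{-1} B A^{-1}$, where
$$A = \int_0^\infty \left[\frac{s^{(2)}(\beta^*, s)}{s^{(0)}(\beta^*, s)} - \frac{s^{(1)}(\beta^*, s)^2}{s^{(0)}(\beta^*, s)^2}\right] s^{(0)}(s)\,ds,$$
$$B = E\left[\int_0^\infty \left\{Z_i - \frac{s^{(1)}(\beta^*, t)}{s^{(0)}(\beta^*, t)}\right\} dN_i(t) - \int_0^\infty \frac{Y_i(t)\, e^{Z_i' \beta^*}}{s^{(0)}(\beta^*, t)} \left\{Z_i - \frac{s^{(1)}(\beta^*, t)}{s^{(0)}(\beta^*, t)}\right\} dE\{N_i(t)\}\right]^2.$$
$\beta^*$ may depend on the censoring distribution!
Replacing $A$ and $B$ by empirical counterparts gives the robust variance estimator for $\hat\beta$.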