Variable Selection for Regression Models with Missing Data

Statistica Sinica (2010): Supplement

Ramon I. Garcia, Joseph G. Ibrahim and Hongtu Zhu

Department of Biostatistics, University of North Carolina at Chapel Hill, USA

Supplementary Material

S1. Assumptions for proofs of Theorems 1-2

Even though the model $\ell(\eta) = \sum_{i=1}^n \ell_i(\eta) = \sum_{i=1}^n \log f(D_{o,i} \mid \eta)$ may be misspecified, White (1994) has shown that the unpenalized ML estimate converges to the value of $\eta$ which maximizes $E[\sum_{i=1}^n \ell_i(\eta)] = \sum_{i=1}^n \int \ell_i(\eta)\, g(D_{o,i})\, dD_{o,i}$, where $g(\cdot)$ is the true density. We denote this value by $\eta^*_n = \arg\sup_\eta E[\ell(\eta)]$. For simplicity, we further assume that $E[\partial_\eta \ell_i(\eta^*)] = 0$ for all $i$ and that $\eta^* = \eta^*_n$ for all $n$. Similarly, we define $\eta^*_{Sn} = \arg\sup_{\eta:\, \beta_j = 0,\, j \notin S} E[Q(\eta \mid \eta^*)]$ and assume $\eta^*_{Sn} = \eta^*_S$ for all $n$. The following assumptions are needed to facilitate the development of our methods, although they may not be the weakest possible conditions.

(C1) $\eta^*$ is unique and an interior point of the parameter space $\Theta$, where $\Theta$ is compact.

(C2) $\hat\eta_0 \to \eta^*$ in probability.

(C3) For all $i$, $\ell_i(\eta)$ is three times continuously differentiable on $\Theta$, and $|\ell_i(\eta)|$, $|\partial_j \ell_i(\eta)|^2$ and $|\partial_j \partial_k \partial_l \ell_i(\eta)|$ are dominated by $B_i(D_{o,i})$ for all $j, k, l = 1, \ldots, d$, where $\partial_j = \partial/\partial\eta_j$. We require the same smoothness condition to hold for $h(D_{o,i}; \eta) = E[\log f(z_{m,i} \mid D_{o,i}; \eta) \mid D_{o,i}; \eta]$.

(C4) For each $\epsilon > 0$, there exists a finite $K$ such that
$$\frac{1}{n} \sum_{i=1}^n E\left[B_i(D_{o,i})\, 1_{\{B_i(D_{o,i}) > K\}}\right] < \epsilon$$
for all $n$.

(C5) Writing $Q(\eta \mid \eta^*) = \sum_{i=1}^n Q_i(\eta \mid \eta^*)$ for the observation-level terms, the limits
$$-\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n \partial^2_\eta \ell_i(\eta^*) = A(\eta^*), \qquad \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n \partial_\eta \ell_i(\eta^*)\, \partial_\eta \ell_i(\eta^*)^T = B(\eta^*),$$
$$-\lim_{n \to \infty} \frac{1}{n} D^{20} Q(\eta^*_S \mid \eta^*) = C(\eta^*_S \mid \eta^*), \qquad \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n D^{10} Q_i(\eta^*_S \mid \eta^*)\, D^{10} Q_i(\eta^*_S \mid \eta^*)^T = D(\eta^*_S \mid \eta^*)$$
exist, where $A(\eta^*)$ and $C(\eta^*_S \mid \eta^*)$ are positive definite and $D^{ij}$ denotes the $i$-th and $j$-th derivatives with respect to the first and second arguments of the Q-function, respectively.

(C6) Define $a_n = \max_j \{|p'_{\lambda_{jn}}(|\beta^*_j|)| : \beta^*_j \neq 0\}$ and $b_n = \max_j \{|p''_{\lambda_{jn}}(|\beta^*_j|)| : \beta^*_j \neq 0\}$. Then

1. $\max_j \{\lambda_{jn} : \beta^*_j \neq 0\} = o_p(1)$;
2. $a_n = O_p(n^{-1/2})$;
3. $b_n = o_p(1)$.

(C7) Define $d_n = \min_j \{\lambda_{jn} : \beta^*_j = 0\}$. Then

1. for all $j$ such that $\beta^*_j = 0$, $\liminf_{n \to \infty} \liminf_{\beta \to 0^+} \lambda_{jn}^{-1} p'_{\lambda_{jn}}(\beta) > 0$ in probability;
2. $n^{1/2} d_n \to_p \infty$.
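To make the quantities $a_n$, $b_n$ and $d_n$ in (C6)-(C7) concrete, the sketch below (ours, not part of the supplement) evaluates the penalty derivatives for the two penalties used in the paper: the SCAD derivative in the form given by Fan and Li (2001), with their suggested $a = 3.7$, and the ALASSO derivative implied by $p_{\lambda_{jn}}(|\beta|) = \lambda_n |\beta| / |\hat\beta_j|$. Because the SCAD derivative vanishes whenever $|\beta| > a\lambda$, each nonzero true coefficient contributes zero to $a_n$ and $b_n$ once $\lambda_n$ is small enough.

```python
import numpy as np

def scad_deriv(beta, lam, a=3.7):
    """SCAD penalty derivative p'_lam(|beta|) (Fan and Li, 2001):
    p'(t) = lam * { 1(t <= lam) + (a*lam - t)_+ / ((a - 1)*lam) * 1(t > lam) }."""
    t = np.abs(beta)
    return lam * np.where(t <= lam, 1.0,
                          np.maximum(a * lam - t, 0.0) / ((a - 1.0) * lam))

def alasso_deriv(beta, lam, beta_mle):
    """ALASSO derivative: p_{lam_j}(|b|) = (lam / |beta_mle_j|) * |b|,
    so p'_{lam_j}(|b|) = lam / |beta_mle_j| (constant in b)."""
    return (lam / np.abs(beta_mle)) * np.ones_like(np.asarray(beta, dtype=float))

# With lam small enough that |beta*_j| > a*lam for every nonzero coefficient,
# the SCAD derivative (and hence a_n) is exactly zero.
beta_star = np.array([3.0, 1.5, 2.0])   # hypothetical nonzero true coefficients
print(scad_deriv(beta_star, lam=0.1))   # -> [0. 0. 0.]
print(alasso_deriv(beta_star, lam=0.05, beta_mle=beta_star))
```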

Proof of Theorem 1a. Given assumptions (C1)-(C6), it follows from White (1994) that
$$n^{-1/2} \sum_{i=1}^n \partial_\eta \ell_i(\eta^*) \stackrel{D}{\longrightarrow} N(0, B(\eta^*)) \tag{1.2}$$
and
$$n^{1/2} (\hat\eta_0 - \eta^*) \stackrel{D}{\longrightarrow} N\left(0, A(\eta^*)^{-1} B(\eta^*) A(\eta^*)^{-1}\right). \tag{1.3}$$
To show that $\hat\eta_\lambda$ is a $\sqrt{n}$-consistent maximizer, it is enough to show that
$$P\left\{\sup_{\|u\| = C} \left[\ell(\eta^* + n^{-1/2} u) - n \sum_{j=1}^p p_{\lambda_{jn}}(|\beta^*_j + n^{-1/2} u_j|) - \ell(\eta^*) + n \sum_{j=1}^p p_{\lambda_{jn}}(|\beta^*_j|)\right] < 0\right\}$$
converges to 1 for large $C$, since this implies that there exists a local maximizer in the ball $\{\eta^* + n^{-1/2} u : \|u\| \le C\}$ and thus $\|\hat\eta_\lambda - \eta^*\| = O_p(n^{-1/2})$. Taking a Taylor series expansion of the penalized likelihood, we have
$$\begin{aligned}
&\ell(\eta^* + n^{-1/2} u) - \ell(\eta^*) - n \sum_{j=1}^{p} p_{\lambda_{jn}}(|\beta^*_j + n^{-1/2} u_j|) + n \sum_{j=1}^{p} p_{\lambda_{jn}}(|\beta^*_j|) \\
&\le \ell(\eta^* + n^{-1/2} u) - \ell(\eta^*) - n \sum_{j=1}^{p_1} p_{\lambda_{jn}}(|\beta^*_j + n^{-1/2} u_j|) + n \sum_{j=1}^{p_1} p_{\lambda_{jn}}(|\beta^*_j|) \\
&= n^{-1/2} u^T \partial_\eta \ell(\eta^*) - \frac{1}{2} u^T \left[-\frac{1}{n} \partial^2_\eta \ell(\eta^*)\right] u - n^{1/2} \sum_{j=1}^{p_1} p'_{\lambda_{jn}}(|\beta^*_j|) \operatorname{sgn}(\beta^*_j)\, u_j - \frac{1}{2} \sum_{j=1}^{p_1} p''_{\lambda_{jn}}(|\beta^*_j|)\, u_j^2 + o_p(1) \\
&\le n^{-1/2} u^T \partial_\eta \ell(\eta^*) - \frac{1}{2} u^T A(\eta^*) u + \sqrt{p_1}\, n^{1/2} a_n \|u_1\| + \frac{1}{2} b_n \|u_1\|^2 + o_p(1) \\
&\le n^{-1/2} u^T \partial_\eta \ell(\eta^*) - \frac{1}{2} u^T A(\eta^*) u + \sqrt{p_1}\, n^{1/2} a_n \|u_1\| + o_p(1),
\end{aligned} \tag{1.4}$$
where $u = (u_1^T, u_2^T)^T$ and $u_1$ is a $p_1 \times 1$ vector. The first inequality in (1.4) follows because $p_{\lambda_{jn}}(0) = 0$ and $p_{\lambda_{jn}} \ge 0$; the second follows from condition (C5) and the fact that $\sum_{j=1}^{p_1} |u_j| \le \sqrt{p_1} (\sum_{j=1}^{p_1} u_j^2)^{1/2}$; the last follows from (C6). Since the first and third terms in (1.4) are $O_p(1)$ by (1.2) and condition (C6)-2, and $u^T A(\eta^*) u$ is bounded below by $\|u\|^2$ times the smallest eigenvalue of $A(\eta^*)$, the second term dominates the rest, and the whole expression can be made negative for large enough $C$.

Proof of Theorem 1b. Suppose that the conditions of Theorem 1a hold and that $\hat\eta_\lambda$ is a $\sqrt{n}$-consistent estimator of $\eta^*$, so that $\|\hat\eta_\lambda - \eta^*\| = O_p(n^{-1/2})$ and $\hat\beta_{(2)\lambda} = O_p(n^{-1/2}) = o_p(1)$. It suffices to show that, for large $n$, the gradient of the penalized log-likelihood evaluated at such an $\hat\eta_\lambda$ can vanish only if $\hat\beta_{(2)\lambda} = 0$.

Taking a Taylor series expansion of the penalized log-likelihood about $\eta^*$, we have
$$\begin{aligned}
0 &= n^{-1/2} \partial_\eta \ell(\hat\eta_\lambda) - n^{1/2} \partial_\eta \sum_{j=1}^p p_{\lambda_{jn}}(|\beta_j|)\Big|_{\eta = \hat\eta_\lambda} \\
&= n^{-1/2} \partial_\eta \ell(\eta^*) - n^{1/2} (\hat\eta_\lambda - \eta^*)^T \left[-\frac{1}{n} \partial^2_\eta \ell(\eta^*) + O_p(n^{-1})\right] - n^{1/2} \partial_\eta \sum_{j=1}^p p_{\lambda_{jn}}(|\hat\beta_{j\lambda}|)\Big|_{\eta = \hat\eta_\lambda} \\
&= O_p(1) - n^{1/2} \partial_\eta \sum_{j=1}^p p_{\lambda_{jn}}(|\hat\beta_{j\lambda}|)\Big|_{\eta = \hat\eta_\lambda},
\end{aligned} \tag{1.5}$$
where the last equality follows because both $n^{-1/2} \partial_\eta \ell(\eta^*)$ and $n^{1/2} (\hat\eta_\lambda - \eta^*)^T \left[\partial^2_\eta \ell(\eta^*)/n\right]$ are $O_p(1)$. Therefore, for $j = p_1 + 1, \ldots, p$, the component of the second term of (1.5) corresponding to $\beta_j$ is $\operatorname{sgn}(\hat\beta_{j\lambda})\, n^{1/2} \lambda_{jn} \left[\lambda_{jn}^{-1} p'_{\lambda_{jn}}(|\hat\beta_{j\lambda}|)\right]$. Since $\hat\beta_{(2)\lambda} = o_p(1)$, condition (C7)-1 implies that $\lambda_{jn}^{-1} p'_{\lambda_{jn}}(|\hat\beta_{j\lambda}|)$ is bounded away from zero for large $n$, so this component is of the order $\operatorname{sgn}(\hat\beta_{j\lambda})\, n^{1/2} d_n$. Since $n^{1/2} d_n \to_p \infty$, we must have $\hat\beta_{j\lambda} = 0$ for $j = p_1 + 1, \ldots, p$; otherwise this component of the gradient could be made arbitrarily large in absolute value and could not equal zero.

Proof of Theorem 1c. Given conditions (C1)-(C7), Theorems 1a and 1b apply. Thus there exists a $\hat\beta_\lambda = (\hat\beta^T_{(1)\lambda}, 0^T)^T$ and $\hat\eta_\lambda = (\hat\beta^T_\lambda, \hat\tau^T_\lambda, \hat\alpha^T_\lambda, \hat\xi^T_\lambda)^T$ which is a $\sqrt{n}$-consistent local maximizer of (6). Let $\beta^* = (\beta^{*T}_{(1)}, 0^T)^T$, $\gamma = (\beta^T_{(1)}, \tau^T, \alpha^T, \xi^T)^T$, $\gamma^* = (\beta^{*T}_{(1)}, \tau^{*T}, \alpha^{*T}, \xi^{*T})^T$, $\hat\gamma_\lambda = (\hat\beta^T_{(1)\lambda}, \hat\tau^T_\lambda, \hat\alpha^T_\lambda, \hat\xi^T_\lambda)^T$, and $\tilde\ell(\gamma) = \ell((\beta^T_{(1)}, 0^T, \tau^T, \alpha^T, \xi^T)^T)$.

Let $\tilde A(\gamma^*)$ be the matrix obtained by removing rows and columns $p_1 + 1$ through $p$ from $A((\beta^{*T}_{(1)}, 0^T, \tau^{*T}, \alpha^{*T}, \xi^{*T})^T)$, and define $\tilde B(\gamma^*)$ similarly. Let
$$h_1(\beta^*_{(1)}) = \left(p'_{\lambda_1}(|\beta^*_1|) \operatorname{sgn}(\beta^*_1), \ldots, p'_{\lambda_{p_1}}(|\beta^*_{p_1}|) \operatorname{sgn}(\beta^*_{p_1})\right)^T, \qquad G_1(\beta^*_{(1)}) = \operatorname{diag}\left(p''_{\lambda_1}(|\beta^*_1|), \ldots, p''_{\lambda_{p_1}}(|\beta^*_{p_1}|)\right),$$
$$h(\gamma^*) = \begin{pmatrix} h_1(\beta^*_{(1)}) \\ 0 \end{pmatrix}, \qquad G(\gamma^*) = \begin{pmatrix} G_1(\beta^*_{(1)}) & 0 \\ 0 & 0 \end{pmatrix}, \qquad \Sigma(\gamma^*) = \left[\tilde A(\gamma^*) + G(\gamma^*)\right]^{-1} \tilde B(\gamma^*) \left[\tilde A(\gamma^*) + G(\gamma^*)\right]^{-1}.$$
Then, using a Taylor series expansion, we have
$$0 = \partial_\gamma \tilde\ell(\hat\gamma_\lambda) - n\, \partial_\gamma \sum_j p_{\lambda_j}(|\beta_j|)\Big|_{\gamma = \hat\gamma_\lambda} = \partial_\gamma \tilde\ell(\gamma^*) - n h(\gamma^*) - n (\hat\gamma_\lambda - \gamma^*)^T \left[-\frac{1}{n} \partial^2_\gamma \tilde\ell(\gamma^*) + G(\gamma^*)\right] + o_p(n^{1/2}),$$
so that, dividing by $n^{1/2}$,
$$0 = n^{-1/2} \partial_\gamma \tilde\ell(\gamma^*) - n^{1/2} h(\gamma^*) - n^{1/2} (\hat\gamma_\lambda - \gamma^*)^T \left[\tilde A(\gamma^*) + G(\gamma^*)\right] + o_p(1),$$
which indicates
$$n^{1/2} \left\{\hat\gamma_\lambda - \gamma^* + \left[\tilde A(\gamma^*) + G(\gamma^*)\right]^{-1} h(\gamma^*)\right\} = n^{-1/2} \left[\tilde A(\gamma^*) + G(\gamma^*)\right]^{-1} \partial_\gamma \tilde\ell(\gamma^*) + o_p(1),$$
and therefore
$$n^{1/2} \left\{\hat\gamma_\lambda - \gamma^* + \left[\tilde A(\gamma^*) + G(\gamma^*)\right]^{-1} h(\gamma^*)\right\} \stackrel{D}{\longrightarrow} N(0, \Sigma(\gamma^*)).$$
For the SCAD penalty with $\lambda_{jn} = \lambda_n$: if $\lambda_n = o_p(1)$, $n^{1/2} \lambda_n \to_p \infty$, and conditions (C1)-(C5) are satisfied, then the oracle properties of Theorem 1 hold. For the ALASSO penalty with $\lambda_{jn} = \lambda_n |\hat\beta_j|^{-1}$, where $\hat\beta_j$ is the unpenalized ML estimate: $\lambda_n = O_p(n^{-1/2})$, $n \lambda_n \to_p \infty$, and conditions (C1)-(C5) imply Theorem 1. Therefore, depending on the penalty function and the specification of $\lambda_{jn}$, the rates of $\lambda_{jn}$ which characterize the oracle properties may differ.

Under the assumptions of Theorem 1 for the SCAD and ALASSO penalty functions, $h(\eta^*) \to 0$; therefore the asymptotic covariance matrix of $\hat\gamma_\lambda$ is $n^{-1} \Sigma(\gamma^*)$. Using Louis's formula (Louis (1982)), an estimate of $\Sigma(\gamma^*)$ is
$$\widehat{\operatorname{Var}}(\hat\gamma_\lambda) \approx n^{-1} \left[\hat A(\hat\gamma_\lambda) + G(\hat\gamma_\lambda)\right]^{-1} \hat B(\hat\gamma_\lambda) \left[\hat A(\hat\gamma_\lambda) + G(\hat\gamma_\lambda)\right]^{-1}, \tag{1.6}$$
where
$$\dot Q_i(\gamma \mid \gamma^*) = \partial_\gamma \left[\int \log f(D_{c,i}; \gamma, \beta_{(2)} = 0)\, f(z_{m,i} \mid D_{o,i}; \gamma^*, \beta_{(2)} = 0)\, dz_{m,i}\right]_{\gamma = \gamma^*}, \qquad \hat B(\gamma^*) = n^{-1} \sum_{i=1}^n \dot Q_i(\gamma^* \mid \gamma^*)^{\otimes 2},$$
$$\dot Q(\gamma \mid \gamma^*) = \partial_\gamma Q((\beta_{(1)}, 0, \tau, \alpha, \xi) \mid \gamma^*, \beta_{(2)} = 0)\Big|_{\gamma = \gamma^*}, \qquad \ddot Q(\gamma \mid \gamma^*) = \partial^2_\gamma Q((\beta_{(1)}, 0, \tau, \alpha, \xi) \mid \gamma^*, \beta_{(2)} = 0)\Big|_{\gamma = \gamma^*},$$
and
$$\hat A(\gamma^*) = -n^{-1} \ddot Q(\gamma^* \mid \gamma^*) - n^{-1} \sum_{i=1}^n E\left[\left\{\partial_\gamma \log f(D_{c,i}; \gamma, \beta_{(2)} = 0)\right\}^{\otimes 2} \Big|\, D_{o,i}; \gamma^*, \beta_{(2)} = 0\right]\Big|_{\gamma = \gamma^*} + n^{-1} \sum_{i=1}^n \dot Q_i(\gamma^* \mid \gamma^*)^{\otimes 2},$$
where $v^{\otimes 2} = v v^T$.
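Once $\hat A$, $G$, and $\hat B$ have been computed, assembling the sandwich estimate (1.6) is a few lines of linear algebra. The sketch below is ours, with placeholder matrices; only the formula itself comes from the supplement.

```python
import numpy as np

def sandwich_se(A_hat, G, B_hat, n):
    """Standard errors from (1.6):
    Var_hat = n^{-1} (A_hat + G)^{-1} B_hat (A_hat + G)^{-1}."""
    M = np.linalg.inv(A_hat + G)     # [A_hat + G]^{-1}
    cov = M @ B_hat @ M / n          # n^{-1} sandwich covariance
    return np.sqrt(np.diag(cov))     # per-coefficient standard errors

# Hypothetical 2x2 inputs for illustration only.
A_hat = np.array([[2.0, 0.3], [0.3, 1.5]])
G = np.diag([0.1, 0.0])              # penalty curvature; zero for unpenalized blocks
B_hat = np.array([[2.2, 0.2], [0.2, 1.4]])
print(sandwich_se(A_hat, G, B_hat, n=60))
```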

Proof of Theorem 2a. To prove Theorem 2, we first show that, for $\eta_{tn} \to_p \eta_t$, $t = 1, 2$,
$$Q(\eta_{1n} \mid \eta_{2n}) - Q(\eta_1 \mid \eta_2) = o_p(n), \qquad E[Q(\eta_{1n} \mid \eta_{2n})] - E[Q(\eta_1 \mid \eta_2)] = o_p(n), \qquad Q(\eta_{1n} \mid \eta_{2n}) - E[Q(\eta_1 \mid \eta_2)] = o_p(n). \tag{1.7}$$
First we note that conditions (C3) and (C4) imply that $\{Q(\eta_1 \mid \eta_2) - E[Q(\eta_1 \mid \eta_2)]\}/n$ converges in probability to 0 for all $\eta_1, \eta_2 \in \Theta$. Furthermore, because conditions (C3) and (C4) satisfy the W-LIP assumption of Lemma 2 of Andrews (1992), we obtain the uniform continuity of $E[Q(\eta_1 \mid \eta_2)]/n$ and the stochastic continuity of $\{Q(\eta_1 \mid \eta_2) - E[Q(\eta_1 \mid \eta_2)]\}/n$. Because the stochastic continuity and pointwise convergence properties satisfy the assumptions of Theorem 3 of Andrews (1992), we have
$$\sup_{(\eta_1, \eta_2) \in \Theta \times \Theta} \frac{1}{n} \left|Q(\eta_1 \mid \eta_2) - E[Q(\eta_1 \mid \eta_2)]\right| \stackrel{p}{\longrightarrow} 0, \tag{1.8}$$
which implies (1.7).

We also need to show that the hypothetical estimator $\tilde\eta_S = \arg\sup_{\eta:\, \beta_j = 0,\, j \notin S} Q(\eta \mid \eta^*)$ is a $\sqrt{n}$-consistent estimator of $\eta^*_S$. To prove this, it is enough to show that
$$P\left[\sup_{\|u\| = C} \left\{Q(\eta^*_S + n^{-1/2} u \mid \eta^*) - Q(\eta^*_S \mid \eta^*)\right\} < 0\right] \ge 1 - \epsilon$$
for large $C$, since this implies that there exists a local maximizer in the ball $\{\eta^*_S + n^{-1/2} u : \|u\| \le C\}$ and thus $\|\tilde\eta_S - \eta^*_S\| = O_p(n^{-1/2})$.

Taking a Taylor series expansion in the first argument of the Q-function, we have
$$\begin{aligned}
Q(\eta^*_S + n^{-1/2} u \mid \eta^*) - Q(\eta^*_S \mid \eta^*) &= n^{-1/2} u^T D^{10} Q(\eta^*_S \mid \eta^*) - \frac{1}{2} u^T \left[-\frac{1}{n} D^{20} Q(\eta^*_S \mid \eta^*)\right] u + o_p(1) \\
&= n^{-1/2} u^T D^{10} Q(\eta^*_S \mid \eta^*) - \frac{1}{2} u^T C(\eta^*_S \mid \eta^*)\, u + o_p(1).
\end{aligned} \tag{1.9}$$
Conditions (C3) and (C5) ensure that $n^{-1/2} D^{10} Q(\eta^*_S \mid \eta^*) \stackrel{D}{\longrightarrow} N(0, D(\eta^*_S \mid \eta^*))$, hence it is $O_p(1)$, and that $C(\eta^*_S \mid \eta^*)$ is positive definite. Therefore the second term dominates the rest, and (1.9) can be made negative for large enough $C$.

Let $\hat\eta_{S_\lambda} = \arg\sup_{\eta:\, \beta_j = 0,\, j \notin S_\lambda} Q(\eta \mid \hat\eta_0)$. Since $\hat\eta_0 \to_p \eta^*$ and $\hat\eta_{S_\lambda} \to_p \eta^*_{S_\lambda}$, we have
$$\begin{aligned}
\frac{1}{n} \operatorname{dic}_Q(\lambda, 0) &= \frac{1}{n} \left(IC_Q(\lambda) - IC_Q(0)\right) = \frac{1}{n} \left[2 Q(\hat\eta_0 \mid \hat\eta_0) - 2 Q(\hat\eta_\lambda \mid \hat\eta_0) + \hat c_n(\hat\eta_\lambda) - \hat c_n(\hat\eta_0)\right] \\
&\ge \frac{2}{n} \left[Q(\hat\eta_0 \mid \hat\eta_0) - Q(\hat\eta_{S_\lambda} \mid \hat\eta_0)\right] + o_p(1) \\
&= \frac{2}{n} \left[Q(\hat\eta_0 \mid \hat\eta_0) - Q(\hat\eta_{S_\lambda} \mid \eta^*)\right] + o_p(1) \\
&\ge \frac{2}{n} \left[Q(\hat\eta_0 \mid \hat\eta_0) - Q(\tilde\eta_{S_\lambda} \mid \eta^*)\right] + o_p(1) \\
&= \frac{2}{n} \left\{E[Q(\eta^* \mid \eta^*)] - E[Q(\eta^*_{S_\lambda} \mid \eta^*)]\right\} + o_p(1) \\
&\ge \frac{2}{n} \min_{S \not\supseteq S_T} \left\{E[Q(\eta^* \mid \eta^*)] - E[Q(\eta^*_S \mid \eta^*)]\right\} + o_p(1),
\end{aligned}$$
where the second and fourth inequalities follow because $Q(\hat\eta_\lambda \mid \hat\eta_0) \le Q(\hat\eta_{S_\lambda} \mid \hat\eta_0)$ and $Q(\hat\eta_{S_\lambda} \mid \eta^*) \le Q(\tilde\eta_{S_\lambda} \mid \eta^*)$ for all $\lambda$, and the third and fifth equalities follow from (1.7). Therefore we have
$$P\left(\inf_{\lambda \in R_u} IC_Q(\lambda) > IC_Q(0)\right) \to 1,$$
which yields Theorem 2a.
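The criterion being compared here has the form $IC_Q(\lambda) = -2 Q(\hat\eta_\lambda \mid \hat\eta_0) + \hat c_n(\hat\eta_\lambda)$, as can be read off the difference $IC_Q(\lambda) - IC_Q(0)$ above. A minimal sketch of the selection step, with hypothetical Q values and complexity penalties standing in for real fits:

```python
import numpy as np

def ic_q(q_value, c_penalty):
    """IC_Q(lambda) = -2 Q(eta_hat_lambda | eta_hat_0) + c_n(eta_hat_lambda)."""
    return -2.0 * q_value + c_penalty

# Hypothetical values over a grid of three lambdas.
q_vals = np.array([-512.4, -515.1, -520.8])  # Q(eta_hat_lambda | eta_hat_0)
c_pens = np.array([14.0, 10.0, 6.0])         # c_n(eta_hat_lambda)
ics = ic_q(q_vals, c_pens)
print(ics, "-> choose lambda index", np.argmin(ics))  # smallest IC_Q wins
```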

Proof of Theorem 2b. Under the assumptions of Theorem 2b, we have
$$\begin{aligned}
n^{-1/2} \delta_Q(\lambda_2, \lambda_1) &= n^{-1/2} \left(IC_Q(\lambda_2) - IC_Q(\lambda_1)\right) \\
&= 2 n^{-1/2} \left(Q(\hat\eta_{\lambda_1} \mid \hat\eta_0) - Q(\hat\eta_{\lambda_2} \mid \hat\eta_0)\right) + n^{-1/2} \left(\hat c(\hat\eta_{\lambda_2}) - \hat c(\hat\eta_{\lambda_1})\right) \\
&= 2 n^{-1/2} \left(Q(\hat\eta_{\lambda_1} \mid \hat\eta_0) - E[Q(\eta^*_{S_{\lambda_1}} \mid \hat\eta_0)]\right) - 2 n^{-1/2} \left(Q(\hat\eta_{\lambda_2} \mid \hat\eta_0) - E[Q(\eta^*_{S_{\lambda_2}} \mid \hat\eta_0)]\right) \\
&\qquad + 2 n^{-1/2} \left(E[Q(\eta^*_{S_{\lambda_1}} \mid \hat\eta_0)] - E[Q(\eta^*_{S_{\lambda_2}} \mid \hat\eta_0)]\right) + n^{-1/2} \delta_c(\lambda_2, \lambda_1) \\
&= O_p(1) + n^{-1/2} \delta_c(\lambda_2, \lambda_1),
\end{aligned}$$
and $n^{-1/2} \delta_c(\lambda_2, \lambda_1) \to_p \delta_{c21}$. Thus $IC_Q(\lambda_2) > IC_Q(\lambda_1)$ in probability, which yields Theorem 2b. The proof of Theorem 2c is similar to that of Theorem 2b.

S2. Statistical model for application of the SIAS method to the linear regression simulations

To implement SIAS, we assume that the response model is $y_i \sim N(u_i^T \beta, \sigma^2)$, that the covariate distribution is $u_i \sim N(\mu_u, \Sigma_u)$ for $i = 1, \ldots, n$, and that the missing covariates are MAR. For the prior distribution of all the parameters we assume
$$\pi(\beta, \gamma, \sigma^2, \mu_u, \Sigma_u) = \prod_{j=1}^p \left\{\pi(\beta_j \mid \gamma_j)\, \pi(\gamma_j)\right\} \pi(\sigma^2)\, \pi(\mu_u \mid \Sigma_u)\, \pi(\Sigma_u),$$
where $\mu_u \mid \Sigma_u \sim N_8(0, \delta^{-1} \Sigma_u)$, $\Sigma_u^{-1} \sim \operatorname{Wishart}(r, I_8)$, $\sigma^{-2} \sim \operatorname{Gamma}(\nu/2, \nu\omega/2)$, $\beta_j \mid \gamma_j \sim (1 - \gamma_j) N(0, t_j^2) + \gamma_j N(0, c_j^2 t_j^2)$, and $\gamma_j \sim \operatorname{Bernoulli}(1/2)$. The hyperparameters were selected to reflect a lack of prior information on the parameters, i.e., $\delta = \nu = \omega = .001$ and $r = 8$. For the values of $t_j$ and $c_j$, we used those suggested by George and McCulloch (1993), namely $(\sigma^2_{\beta_j}/t_j^2, c_j^2) = (1, 5), (1, 10), (10, 100), (10, 300)$, where $\sigma^2_{\beta_j}$ was estimated using preliminary simulations. We performed 5,000 simulations after a burn-in period of 5,000 iterations. The posterior probability of $\gamma$ was calculated from the posterior simulations, and the model with the highest probability was selected as the best model. The results for $(\sigma^2_{\beta_j}/t_j^2, c_j^2) = (1, 10)$ are presented, since this setting gives the best model with the highest posterior probability.
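For concreteness, the sketch below (ours, not part of the supplement) draws from the George and McCulloch (1993) spike-and-slab prior described above. With a hypothetical $\sigma^2_{\beta_j} = 1$, the reported setting $(\sigma^2_{\beta_j}/t_j^2, c_j^2) = (1, 10)$ corresponds to $t_j = 1$ and $c_j = \sqrt{10}$.

```python
import numpy as np

def draw_spike_slab(p, t, c, rng):
    """One draw from the George-McCulloch prior:
    gamma_j ~ Bernoulli(1/2),
    beta_j | gamma_j ~ (1 - gamma_j) N(0, t^2) + gamma_j N(0, c^2 t^2)."""
    gamma = rng.binomial(1, 0.5, size=p)
    sd = np.where(gamma == 1, c * t, t)  # slab sd if gamma_j = 1, spike sd otherwise
    beta = rng.normal(0.0, sd)
    return beta, gamma

rng = np.random.default_rng(0)
# Hypothetical setting: sigma2_beta_j = 1, so t_j = 1 and c_j = sqrt(10).
beta, gamma = draw_spike_slab(p=8, t=1.0, c=np.sqrt(10.0), rng=rng)
print(gamma, np.round(beta, 2))
```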

S3. Simulation results evaluating the performance of the standard errors of the penalized estimates for the linear regression simulations

Table 1.1: Standard errors of penalized estimates for the linear regression model with covariates missing at random

               --------- β1 --------   --------- β2 --------   --------- β5 --------
Method          SD    SD_m   SD_mad     SD    SD_m   SD_mad     SD    SD_m   SD_mad
SCAD-RE        .138   .164   .042      .170   .187   .039      .160   .180   .039
SCAD-IC_Q      .141   .161   .039      .178   .180   .048      .163   .175   .038
ALASSO-RE      .157   .161   .031      .183   .180   .035      .165   .173   .036
ALASSO-IC_Q    .139   .164   .039      .198   .185   .037      .166   .176   .038
Oracle         .138   .155   .036      .179   .157   .040      .147   .139   .028

In order to test the accuracy of the asymptotic standard error formula (1.6), we estimated the standard errors of the significant coefficients $\beta_1$, $\beta_2$ and $\beta_5$ of the linear regression model using $n = 60$ and $\sigma = 1$, with the covariates missing at random. The median of the absolute deviations $|\hat\beta_{j\lambda} - \beta^*_j|$ of the 100 penalized estimates, divided by .6745 and denoted SD, can be regarded as the true standard error. The median of the estimated standard errors is denoted SD_m. The median absolute deviation of the estimated standard errors, divided by .6745 and denoted SD_mad, measures the overall performance of the standard error formula. The results, presented in Table 1.1, indicate that the standard error estimate does a good job of estimating the true standard error; all of the SD_mad values are less than .05.
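A minimal sketch of how the three summaries in Table 1.1 can be computed from simulation output (ours; the replicate arrays below are placeholders, and SD_mad is computed as the median absolute deviation of the estimated standard errors about their median, which is our reading of the definition above):

```python
import numpy as np

def sd_summaries(beta_hats, se_hats, beta_true):
    """Table 1.1 summaries for one coefficient across replicates:
    SD     = median(|beta_hat - beta_true|) / .6745  (treated as the true SE),
    SD_m   = median of the estimated standard errors,
    SD_mad = median(|se_hat - median(se_hat)|) / .6745 (assumed definition)."""
    sd = np.median(np.abs(beta_hats - beta_true)) / 0.6745
    sd_m = np.median(se_hats)
    sd_mad = np.median(np.abs(se_hats - sd_m)) / 0.6745
    return sd, sd_m, sd_mad

# Placeholder replicates standing in for 100 penalized estimates and their
# formula-(1.6) standard errors.
rng = np.random.default_rng(1)
beta_hats = 3.0 + rng.normal(0.0, 0.15, size=100)
se_hats = 0.15 + rng.normal(0.0, 0.02, size=100)
print(np.round(sd_summaries(beta_hats, se_hats, beta_true=3.0), 3))
```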

S4. Statistical model for application of the SIAS method to the Melanoma data

Using the definitions of $y_i$, $z_i$ and $x_i$ in the main document, we assume a logistic regression model for $y_i \mid x_i, \beta$ with $E(y_i \mid x_i, \beta) = \exp(\gamma_i)/(1 + \exp(\gamma_i))$, where $\gamma_i = (1, z_i, x_i)^T \beta$ and $\beta = (\beta_0, \beta_1, \ldots, \beta_6)^T$. We assume that the covariates are MAR, with the following covariate distribution for $i = 1, \ldots, n$:
$$f(z_i \mid x_i; \alpha) = f(z_{i3} \mid z_{i1}, z_{i2}, x_i; \alpha_3)\, f(z_{i1}, z_{i2} \mid x_i; \alpha_1, \alpha_2).$$
Since the $x_i$ are completely observed, they are conditioned on throughout. We take $(z_{i1}, z_{i2}) \mid x_i \sim N_2(\mu_i, \Sigma)$, where $\mu_i = (\mu_{i1}, \mu_{i2})^T$, $\mu_{is} = \alpha_{s0} + \sum_{j=1}^3 \alpha_{sj} x_{ij}$ for $s = 1, 2$ and $i = 1, \ldots, n$, and $\Sigma$ is an unstructured $2 \times 2$ covariance matrix. We also assume a logistic regression model for $z_{i3}$ conditional on $(z_{i1}, z_{i2}, x_i)$, with $E(z_{i3} \mid z_{i1}, z_{i2}, x_i) = \exp(\psi_i)/(1 + \exp(\psi_i))$, where $\psi_i = (1, z_{i1}, z_{i2}, x_i)^T \phi$ and $\phi = (\phi_0, \phi_1, \ldots, \phi_5)^T$. Let $\nu_j = (\alpha_{1j}, \alpha_{2j})^T$ for $j = 0, \ldots, 3$. For the prior distribution, we assume
$$\pi(\beta, \phi, \nu_0, \ldots, \nu_3, \Sigma) = \prod_{j=1}^{6} \left\{\pi(\beta_j \mid \gamma_j)\, \pi(\gamma_j)\right\} \prod_{l=0}^{5} \pi(\phi_l) \prod_{k=0}^{3} \pi(\nu_k \mid \Sigma)\, \pi(\Sigma),$$
where $\phi_l \sim N(0, \delta^{-1})$ for $l = 0, \ldots, 5$, $\nu_k \mid \Sigma \sim N_2(0, \delta^{-1} \Sigma)$ for $k = 0, \ldots, 3$, $\Sigma^{-1} \sim \operatorname{Wishart}(r, I_2)$, and $\beta_j \mid \gamma_j \sim (1 - \gamma_j) N(0, t_j^2) + \gamma_j N(0, c_j^2 t_j^2)$ with $\gamma_j \sim \operatorname{Bernoulli}(1/2)$ for $j = 1, \ldots, 6$. The hyperparameters were selected to reflect a lack of prior information on the parameters, i.e., $\delta = .001$ and $r = 2$. We set $(\sigma^2_{\beta_j}/t_j^2, c_j^2) = (1, 10)$. The posterior probability of $\gamma$ was calculated from 5,000 simulated observations after 5,000 burn-in iterations, and the model with the highest probability was selected as the best model.

References

Andrews, D. W. K. (1992). Generic uniform convergence. Econometric Theory 8, 241-257.

George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association 88, 881-889.

Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B 44, 226-233.

White, H. (1994). Estimation, Inference, and Specification Analysis. Cambridge University Press, New York.