Supplementary Materials: Proofs


The following lemmas prepare for the proof of the concentration properties of the prior density $\pi_\alpha(\beta_j/\sigma)$, $j = 1, \dots, p_n$, on the $m_n$-dimensional vector $\beta_j/\sigma$. As introduced earlier, we consider the shrinkage prior
\[
\beta_j \mid \lambda_j, \sigma^2 \overset{\text{ind}}{\sim} N_{m_n}\big(0, \lambda_j \sigma^2 I_{m_n}\big), \tag{1}
\]
\[
\lambda_j \mid \psi_j \overset{\text{ind}}{\sim} \operatorname{Exp}(\psi_j) \ \text{(with mean } \psi_j\text{)}, \qquad \psi_j \overset{\text{iid}}{\sim} \operatorname{Gamma}(\alpha, 1/2). \tag{2}
\]
For $j = 1, \dots, p_n$, we let $p(\lambda_j \mid \psi_j)$ denote the prior density of $\lambda_j$ given $\psi_j$ and $p_\alpha(\psi)$ denote the prior density of $\psi_j$ given the hyper-parameter $\alpha$. The following lemmas on the multivariate Dirichlet-Laplace (DL) prior show that it concentrates within a small neighborhood of zero with a dominant, steep peak, while retaining heavy tails away from zero.

Lemma 1. For any positive constant $\mu > 0$ and thresholding value $a_n \asymp \epsilon_n/(\sqrt{m_n}\, p_n)$,
\[
1 - \int_{\|x\| \le a_n} \pi_\alpha(x)\, dx \lesssim p_n^{-(1+\mu)}, \tag{3}
\]
if the Dirichlet parameter satisfies $\alpha \lesssim p_n^{-(1+\nu)}$ for $\nu > \mu$.

Lemma 2. For a small constant $\eta > 0$ and $E_0$ defined in assumption $A_2$(2),
\[
-\log \left( \int_{\sum_j \|x_j\| \le \eta \sqrt{m_n}\,\epsilon_n} \prod_{j=1}^{p_n - s} \pi_\alpha(x_j)\, dx_1 \cdots dx_{p_n - s} \right) \lesssim n\epsilon_n^2, \tag{4}
\]
\[
-\log \Big( \inf_{\|x\| \le \sqrt{m_n}\, E_0} \pi_\alpha(x) \Big) \lesssim n\epsilon_n^2 / s. \tag{5}
\]
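Remark (numerical illustration). The hierarchy (1)-(2) is straightforward to simulate, and a quick Monte Carlo run displays the behavior claimed in Lemma 1: with a Dirichlet parameter of order $p_n^{-(1+\nu)}$, the prior mass outside a ball of radius $a_n$ is tiny. This is a minimal sketch, not the paper's code; the concrete values of $m_n$, $p_n$, $\epsilon_n$ are assumptions chosen only for the demonstration.

```python
# Illustrative sketch: sample beta_j/sigma from the hierarchy (1)-(2) and estimate
# the tail mass P(||beta_j/sigma|| > a_n) that Lemma 1 bounds by roughly p_n^{-(1+mu)}.
import numpy as np

rng = np.random.default_rng(0)
m_n, p_n = 5, 200                        # basis dimension and number of covariates (assumed)
eps_n = np.sqrt(np.log(p_n) / 50.0)      # stand-in for the contraction rate (assumed)
a_n = eps_n / (np.sqrt(m_n) * p_n)       # thresholding value of Lemma 1
alpha = p_n ** (-2.0)                    # alpha ~ p_n^{-(1+nu)} with nu = 1

n_draws = 200_000
psi = rng.gamma(shape=alpha, scale=2.0, size=n_draws)        # psi_j ~ Gamma(alpha, 1/2)
lam = rng.exponential(scale=psi)                             # lambda_j | psi_j ~ Exp(psi_j)
x = rng.normal(size=(n_draws, m_n)) * np.sqrt(lam)[:, None]  # beta_j/sigma | lambda_j

tail_mass = np.mean(np.linalg.norm(x, axis=1) > a_n)
print(f"estimated P(||x|| > a_n) = {tail_mass:.2e}, target order p_n^-2 = {p_n**-2.0:.2e}")
```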

Lemma 3. The prior density $\pi_\alpha(\beta_j/\sigma)$ satisfies
\[
\max_{j \in \xi_0}\ \sup_{\|x_1\|,\, \|x_2\| \,\in\, [\,\|\beta_{0j}/\sigma_0\| - C'\sqrt{m_n}\,\epsilon_n,\ \|\beta_{0j}/\sigma_0\| + C'\sqrt{m_n}\,\epsilon_n\,]} \frac{\pi_\alpha(x_1)}{\pi_\alpha(x_2)} < l_n \tag{6}
\]
for a constant $C'$, where $l_n$ satisfies $s \log l_n \lesssim \log p_n$.

The three lemmas above specify the conditions on prior concentration. Lemma 1 and Lemma 2 establish that the prior probability that the $L_2$-norm of each basis-coefficient block exceeds $a_n$ is arbitrarily small under the constraint $a_n \asymp \epsilon_n/(\sqrt{m_n}\, p_n)$; hence the shrinkage prior on $\beta_j/\sigma$ has a dominant, steep peak around zero. Lemma 3 controls the variation of the prior density of the basis coefficients at points not too close to zero. The shrinkage prior density thus concentrates in a very small neighborhood of zero while retaining heavy and sufficiently flat tails, so the prior mimics a mixture of a continuous distribution and a point mass at zero. The proofs of these lemmas follow.

Proof of Lemma 1. For a vector $x$ of dimension $m_n$, the prior density is defined in (1) and (2). Given $\lambda$, $T \overset{\text{def}}{=} \|x\|^2/\lambda \sim \chi^2_{m_n}$. Next, we have
\[
1 - \int_{\|x\| \le a_n} \pi_\alpha(x)\, dx
= \int_0^\infty \int_0^\infty P\!\left(T > \frac{a_n^2}{\lambda}\right) p(\lambda \mid \psi)\, p_\alpha(\psi)\, d\psi\, d\lambda
\]
\[
= \int_0^{a_n^2/p_n} \left[ \int_0^\infty P\!\left(T > \frac{a_n^2}{\lambda}\right) p(\lambda \mid \psi)\, p_\alpha(\psi)\, d\psi \right] d\lambda
+ \int_{a_n^2/p_n}^{\infty} \left[ \int_0^\infty P\!\left(T > \frac{a_n^2}{\lambda}\right) p(\lambda \mid \psi)\, p_\alpha(\psi)\, d\psi \right] d\lambda. \tag{7}
\]
For any constant $\mu > 0$, we can conclude that
\[
\int_0^{a_n^2/p_n} \left[ \int_0^\infty P\!\left(T > \frac{a_n^2}{\lambda}\right) p(\lambda \mid \psi)\, p_\alpha(\psi)\, d\psi \right] d\lambda
\le P(T > p_n)\, P_\alpha\big(\lambda < a_n^2/p_n\big) \le e^{-p_n/2}, \tag{8}
\]
where the last inequality is obtained from the tail property of the $\chi^2$-distribution.
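Remark (numerical illustration). The decomposition in (7) rests on the fact that, conditionally on $\lambda$, $\|x\|^2/\lambda \sim \chi^2_{m_n}$, so the prior tail mass is an average of conditional chi-square tails. The following sketch (illustrative only; the values of $m_n$, $\alpha$, $a_n$ are assumptions) checks this identity by comparing the averaged conditional tail with a direct Monte Carlo estimate.

```python
# Sketch: P(||x|| > a_n) = E_lambda[ P(chi^2_{m_n} > a_n^2 / lambda) ], the first
# step in the proof of Lemma 1.  A moderate alpha keeps the probability visible.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m_n, alpha, a_n = 5, 0.2, 0.1            # illustrative values (assumptions)
n_draws = 400_000

psi = rng.gamma(shape=alpha, scale=2.0, size=n_draws)
lam = rng.exponential(scale=psi)
# Rao-Blackwellized estimate: average the conditional chi-square tail over lambda
rb = np.mean(stats.chi2.sf(a_n**2 / np.maximum(lam, 1e-300), df=m_n))
# direct estimate: draw x | lambda and count exceedances of a_n
x = rng.normal(size=(n_draws, m_n)) * np.sqrt(lam)[:, None]
direct = np.mean(np.linalg.norm(x, axis=1) > a_n)
print(f"averaged chi-square tail = {rb:.4f}, direct estimate = {direct:.4f}")
```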

Next, from Lemma 3.3 in Bhattacharya et al. (2015), it follows that
\[
\int_{a_n^2/p_n}^{\infty} P\!\left(T > \frac{a_n^2}{\lambda}\right) \left[ \int_0^\infty p(\lambda \mid \psi)\, p_\alpha(\psi)\, d\psi \right] d\lambda
\le P_\alpha\big(\lambda > a_n^2/p_n\big) \le C_1\, \alpha \log\!\big(p_n/a_n^2\big) \tag{9}
\]
for a constant $C_1 > 0$. Combining the inequalities in (7), (8) and (9), we have
\[
1 - \int_{\|x\| \le a_n} \pi_\alpha(x)\, dx \le e^{-p_n/2} + C_1\, \alpha \log\!\big(p_n/a_n^2\big) \lesssim p_n^{-(1+\mu)}
\]
for $\alpha \lesssim p_n^{-(1+\nu)}$ where $\nu > \mu$.

Proof of Lemma 2. Given $\lambda_j$, the $m_n$-dimensional vector $x_j$ satisfies $\|x_j\|/\sqrt{\lambda_j} \overset{\text{iid}}{\sim} \chi_{m_n}$ for $j = 1, \dots, p_n$. Therefore, for the prior on $\lambda_j$ defined in (2),
\[
E(\|x_j\|) = E\big[ E(\|x_j\| \mid \lambda_j) \big]
= E\big[ \sqrt{\lambda_j}\; E\big(\|x_j\|/\sqrt{\lambda_j} \mid \lambda_j\big) \big]
= \frac{\sqrt{2}\,\Gamma\big((m_n + 1)/2\big)}{\Gamma(m_n/2)}\; E\big(\sqrt{\lambda_j}\big), \tag{10}
\]
and $E(\sqrt{\lambda_j}) = E\big[E(\sqrt{\lambda_j} \mid \psi_j)\big] = \Gamma(3/2)\, E(\sqrt{\psi_j}) = \Gamma(3/2)\, \frac{\sqrt{2}\,\Gamma(\alpha + 1/2)}{\Gamma(\alpha)}$. Then, for a constant $C_2 > 0$ free of $\alpha$, $E(\|x_j\|) \le C_2 \sqrt{m_n}\,\alpha$. By the Central Limit Theorem,
\[
P\!\left( \sum_{j=1}^{p_n - s} \|x_j\| \le \eta \sqrt{m_n}\,\epsilon_n \right)
= P\!\left( \frac{1}{p_n - s} \sum_{j=1}^{p_n - s} \frac{\|x_j\|}{\sqrt{m_n}} \le \frac{\eta\,\epsilon_n}{p_n - s} \right) \ge \frac{1}{2} \tag{11}
\]
if $\alpha \le \frac{C_3\,\epsilon_n}{p_n - s}$, which can be verified since $\alpha \lesssim p_n^{-(1+\nu)}$ for $\nu > 0$. Therefore, we obtain
\[
-\log \left( \int_{\sum_j \|x_j\| \le \eta \sqrt{m_n}\,\epsilon_n} \prod_{j=1}^{p_n - s} \pi_\alpha(x_j)\, dx_1 \cdots dx_{p_n - s} \right) \lesssim n\epsilon_n^2. \tag{12}
\]
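Remark (numerical illustration). The moment calculation in (10) has a closed form that is easy to verify numerically. The sketch below is illustrative only; $m_n$ and $\alpha$ are assumed values, with $\alpha$ taken moderate so the Monte Carlo error is visible. It compares a simulation estimate of $E\|x_j\|$ with $\frac{\sqrt{2}\,\Gamma((m_n+1)/2)}{\Gamma(m_n/2)}\,\Gamma(3/2)\,\frac{\sqrt{2}\,\Gamma(\alpha+1/2)}{\Gamma(\alpha)}$.

```python
# Monte Carlo check of E||x_j|| = E(chi_{m_n}) * E(sqrt(lambda_j)),
# used in the proof of Lemma 2 (illustrative values, not from the paper).
import numpy as np
from scipy.special import gamma as G

rng = np.random.default_rng(2)
m_n, alpha = 6, 0.05
n_draws = 500_000

psi = rng.gamma(shape=alpha, scale=2.0, size=n_draws)         # psi ~ Gamma(alpha, 1/2)
lam = rng.exponential(scale=psi)                              # lambda | psi ~ Exp(psi)
x = rng.normal(size=(n_draws, m_n)) * np.sqrt(lam)[:, None]

mc = np.linalg.norm(x, axis=1).mean()
closed_form = (np.sqrt(2) * G((m_n + 1) / 2) / G(m_n / 2)) * G(1.5) * np.sqrt(2) * G(alpha + 0.5) / G(alpha)
print(f"Monte Carlo E||x_j|| = {mc:.4f}, closed form = {closed_form:.4f}")
```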

Next, we need to prove that $-\log\big( \inf_{\|x\| \le \sqrt{m_n} E_0}\, \pi_\alpha(x) \big) \lesssim n\epsilon_n^2/s$. For $s_1/s_2$ close to zero,
\[
\inf_{\|x\| \le \sqrt{m_n} E_0} \pi_\alpha(x)
= \int_0^\infty \int_0^\infty (2\pi\lambda)^{-m_n/2}\, e^{-m_n E_0^2/2\lambda}\, p(\lambda \mid \psi)\, p_\alpha(\psi)\, d\lambda\, d\psi
\ge (2\pi s_2)^{-m_n/2}\, e^{-m_n E_0^2/2 s_1}\, P_\alpha(s_1 < \lambda < s_2). \tag{13}
\]
Since $\lambda$ follows the prior distribution in (2) for a given hyper-parameter $\alpha$, we have
\[
P_\alpha(s_1 < \lambda < s_2)
= \int_0^\infty \big( e^{-s_1/\psi} - e^{-s_2/\psi} \big)\, p_\alpha(\psi)\, d\psi
\ge C_4\, \alpha\, (1 - s_1/s_2)\, e^{-s_2/2}
\]
for a constant $C_4 > 0$. Therefore, when $m_n/s_1 \asymp n\epsilon_n^2/s$ and $s_2 \asymp n\epsilon_n^2/s$, we have
\[
-\log\Big( \inf_{\|x\| \le \sqrt{m_n} E_0} \pi_\alpha(x) \Big)
\le \frac{m_n}{2}\log(2\pi s_2) + \frac{m_n E_0^2}{2 s_1} + (1 + \nu)\log p_n + \frac{s_2}{2} \lesssim \frac{n\epsilon_n^2}{s}. \tag{14}
\]
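Remark (numerical illustration). The representation $P_\alpha(s_1 < \lambda < s_2) = \int (e^{-s_1/\psi} - e^{-s_2/\psi})\, p_\alpha(\psi)\, d\psi$ used in (13) relies only on $\lambda \mid \psi$ being exponential with mean $\psi$. The sketch below (illustrative only, with an assumed moderate $\alpha$ so the probability is not vanishingly small) confirms the identity numerically.

```python
# Check P(s1 < lambda < s2) = ∫ (e^{-s1/psi} - e^{-s2/psi}) p_alpha(psi) dpsi,
# where p_alpha is the Gamma(alpha, 1/2) density (illustrative values only).
import numpy as np
from scipy import integrate, stats

alpha, s1, s2 = 0.3, 0.5, 5.0

def integrand(psi):
    return (np.exp(-s1 / psi) - np.exp(-s2 / psi)) * stats.gamma.pdf(psi, a=alpha, scale=2.0)

quad_val, _ = integrate.quad(integrand, 0.0, np.inf)

rng = np.random.default_rng(3)
psi = rng.gamma(shape=alpha, scale=2.0, size=1_000_000)
lam = rng.exponential(scale=psi)
mc_val = np.mean((lam > s1) & (lam < s2))
print(f"quadrature = {quad_val:.4f}, Monte Carlo = {mc_val:.4f}")
```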

Proof of Lemma 3. By assumption, $\min_{j\in\xi_0} \|\beta_{0j}/\sigma_0\|/\sqrt{m_n} > \epsilon_n$, $\max_{j\in\xi_0} \|\beta_{0j}/\sigma_0\|/\sqrt{m_n} < E_0$, and
\[
\pi_\alpha(x)
= \int_0^\infty \int_0^\infty \phi(x \mid \lambda)\, p(\lambda \mid \psi)\, p_\alpha(\psi)\, d\psi\, d\lambda
= \int_0^\infty \int_0^\infty (2\pi\lambda)^{-m_n/2}\, e^{-\|x\|^2/2\lambda}\, p(\lambda \mid \psi)\, p_\alpha(\psi)\, d\psi\, d\lambda,
\]
where the integrand, viewed as a function of $\lambda$, $f(\lambda) \overset{\text{def}}{=} (2\pi\lambda)^{-m_n/2}\, e^{-\|x\|^2/2\lambda}$, is unimodal: it is increasing on $(0, \|x\|^2/m_n]$ and decreasing on $(\|x\|^2/m_n, \infty)$. We can therefore split the integral at $\|x\|^2/(m_n \log p_n)$, so that $f(\lambda)$ is increasing on the first piece. Since $f$ is increasing up to $\|x\|^2/m_n$,
\[
\int_0^{\|x\|^2/(m_n\log p_n)} f(\lambda) \left[ \int_0^\infty p(\lambda \mid \psi)\, p_\alpha(\psi)\, d\psi \right] d\lambda
\le \left( \frac{2\pi\|x\|^2}{m_n \log p_n} \right)^{-m_n/2} e^{-m_n \log p_n/2}, \tag{15}
\]
while
\[
\int_{\|x\|^2/(m_n\log p_n)}^\infty f(\lambda) \left[ \int_0^\infty p(\lambda \mid \psi)\, p_\alpha(\psi)\, d\psi \right] d\lambda
\ge f\!\left( \frac{2\|x\|^2}{m_n \log p_n} \right) P_\alpha\!\left( \frac{2\|x\|^2}{m_n \log p_n} < \lambda < \frac{\|x\|^2}{m_n} \right)
\ge C_5 \left( \frac{2\pi\|x\|^2}{m_n} \right)^{-m_n/2} e^{-m_n \log p_n/4}\, \alpha\, e^{-\|x\|^2/2 m_n} \tag{16}
\]
for $C_5 > 0$ a constant free of $n$. The inequality in (16) is obtained from the conclusion in the proof of Lemma 2. Then, for $m_n \log p_n \gtrsim n\epsilon_n^2$, we can prove that
\[
\frac{\int_0^{\|x\|^2/(m_n\log p_n)} f(\lambda) \big[ \int_0^\infty p(\lambda \mid \psi)\, p_\alpha(\psi)\, d\psi \big]\, d\lambda}{\int_{\|x\|^2/(m_n\log p_n)}^\infty f(\lambda) \big[ \int_0^\infty p(\lambda \mid \psi)\, p_\alpha(\psi)\, d\psi \big]\, d\lambda}
\le \frac{(\log p_n)^{m_n/2}\, e^{-m_n \log p_n/2}}{C_5\, e^{-m_n \log p_n/4}\, \alpha\, e^{-\|x\|^2/2 m_n}}
= \exp\left\{ -\log C_5 + \frac{m_n}{2}\log\log p_n - \frac{m_n \log p_n}{4} + (1+\nu)\log p_n + \frac{\|x\|^2}{2 m_n} \right\}
\le \exp\{ -C_5' n\epsilon_n^2 \}
\]
for a constant $C_5' > 0$. The inequality on this ratio implies that, for a given $x$ such that $\|x\|/\sqrt{m_n} \in (\epsilon_n, E_0)$, the prior density of $x$ has its dominant mass on $\lambda \in (\|x\|^2/(m_n \log p_n), \infty)$:
\[
\frac{\pi_\alpha(x)}{\int_{\|x\|^2/(m_n\log p_n)}^\infty \int_0^\infty \phi(x \mid \lambda)\, p(\lambda \mid \psi)\, p_\alpha(\psi)\, d\psi\, d\lambda} \rightarrow 1. \tag{17}
\]
From (17), we have the following result: for any $x_1$ and $x_2$ with $\|x_1\| < \|x_2\|$, the contributions of $\lambda$ to $\pi_\alpha(x_1)$ and $\pi_\alpha(x_2)$ outside the interval $\lambda \in (s_3, \infty)$ are negligible.

Here $s_3 \overset{\text{def}}{=} \min\big( \|x_1\|^2/(m_n \log p_n),\ \|x_2\|^2/(m_n \log p_n) \big)$. Writing $\phi_\lambda(\cdot)$ for the $N_{m_n}(0, \lambda I_{m_n})$ density, we then have
\[
l_n = \max_{j\in\xi_0}\ \sup_{\|x_1\|,\,\|x_2\|\,\in\, \|\beta_{0j}/\sigma_0\| \pm C'\sqrt{m_n}\,\epsilon_n} \frac{\pi_\alpha(x_1)}{\pi_\alpha(x_2)}
\approx \max_{j\in\xi_0}\ \sup \frac{\int_{s_3}^\infty \int_0^\infty \phi_\lambda(x_1)\, p(\lambda\mid\psi)\, p_\alpha(\psi)\, d\psi\, d\lambda}{\int_{s_3}^\infty \int_0^\infty \phi_\lambda(x_2)\, p(\lambda\mid\psi)\, p_\alpha(\psi)\, d\psi\, d\lambda}
\]
\[
= \max_{j\in\xi_0}\ \sup \frac{\int_{s_3}^\infty \int_0^\infty e^{\frac{1}{2\lambda}(\|x_2\|^2 - \|x_1\|^2)}\, \phi_\lambda(x_2)\, p(\lambda\mid\psi)\, p_\alpha(\psi)\, d\psi\, d\lambda}{\int_{s_3}^\infty \int_0^\infty \phi_\lambda(x_2)\, p(\lambda\mid\psi)\, p_\alpha(\psi)\, d\psi\, d\lambda}
\le \max_{j\in\xi_0}\ \sup \exp\left\{ \frac{\big|\,\|x_2\|^2 - \|x_1\|^2\,\big|}{2 s_3} \right\}
\le \max_{j\in\xi_0} \exp\left\{ \frac{2 m_n \big( \|\beta_{0j}/\sigma_0\|/\sqrt{m_n} + C'\epsilon_n \big)\, C'\epsilon_n}{s_3} \right\}.
\]
Hence $s \log l_n \le C_6\, s\, m_n E_0\, \epsilon_n/s_3$ for a constant $C_6 > 0$. In view of $s\, m_n\, \epsilon_n/s_3 \lesssim \log p_n$, it follows that $s \log l_n \lesssim \log p_n$.

Proof of Theorem 4.1. The proof of Theorem 4.1 follows the results of Theorem A.1 and Theorem A.2 in Song and Liang (2016); the difference is that we have an additional bias term in the true model,
\[
Y = \sum_{j=1}^{p_n} f_{0j}(X_j) + \sigma_0\varepsilon = B(X)\beta_0 + \sigma_0\delta + \sigma_0\varepsilon, \tag{18}
\]
where each additive function is estimated by an $m_n$-dimensional basis expansion. Therefore, $Y$ includes both a bias term and a linear-regression part, analogous to the model in Song and Liang (2016). The coefficient $\beta = (\beta_1^T, \dots, \beta_{p_n}^T)^T$ is a $p_n m_n$-dimensional basis-coefficient vector, and a multivariate DL prior is imposed on the block for each covariate. The true model is the additive non-parametric model $\sum_{j=1}^{p_n} f_{0j}(X_j) + \sigma_0\varepsilon$, where the $f_{0j}$ are $\kappa$-smooth functions for $j \in \xi_0$ and $f_{0j} = 0$ (zero effects) for $j \notin \xi_0$. In our proposed model, we estimate each function by the basis expansion $B(X_j)\beta_j$, which introduces an extra bias term $\delta$ in (18) due to approximation error. The magnitude of $\delta$ is bounded by a multiple of $m_n^{-\kappa}$.
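Remark (numerical illustration). For concreteness, the basis-expansion design matrix $B(X)$ described above can be formed covariate by covariate, stacking $m_n$ B-spline basis functions per covariate. The sketch below is a minimal construction under assumed settings (cubic splines on $[0,1]$ with equally spaced knots); it is not the paper's implementation.

```python
# Minimal sketch of the additive-model design matrix B(X): each covariate X_j is
# expanded into m_n cubic B-spline basis functions, so sum_j f_j(X_j) ≈ B(X) beta
# with one m_n-dimensional coefficient block per covariate (assumed settings).
import numpy as np
from scipy.interpolate import BSpline

def bspline_design(x, m_n, degree=3):
    """Evaluate m_n B-spline basis functions at x, assuming x lies in [0, 1]."""
    n_interior = m_n - degree - 1                       # number of interior knots
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.r_[np.zeros(degree + 1), interior, np.ones(degree + 1)]
    # the i-th column is the i-th basis element (unit coefficient vector e_i)
    return np.column_stack(
        [BSpline(knots, np.eye(m_n)[i], degree)(x) for i in range(m_n)]
    )

rng = np.random.default_rng(4)
n, p_n, m_n = 100, 5, 8
X = rng.uniform(size=(n, p_n))
B = np.hstack([bspline_design(X[:, j], m_n) for j in range(p_n)])  # shape (n, p_n * m_n)
print(B.shape)
```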

For
\[
B_n = \big\{ \text{at least } \tilde p \text{ of the } \|\beta_j/\sigma\| \text{ are larger than } a_n \big\}
\]
with $m_n \tilde p \asymp n\epsilon_n^2/\log p_n$ and $m_n \tilde p \lesssim n\epsilon_n^2$,
\[
C_n = \Big\{ \big\| B(X)\beta - \sum_{j=1}^{p_n} f_{0j}(X_j) \big\| \ge \sigma_0 \sqrt{n}\,\epsilon_n \Big\}
\cup \big\{ \sigma^2/\sigma_0^2 > (1 + \epsilon_n)/(1 - \epsilon_n) \big\}
\cup \big\{ \sigma^2/\sigma_0^2 < (1 - \epsilon_n)/(1 + \epsilon_n) \big\},
\]
and $A_n = B_n \cup C_n$, we first consider the test function $\phi_n = \max\{\phi_n', \phi_n''\}$, where
\[
\phi_n' = \max_{\{\xi:\ \xi \supseteq \xi_0,\ |\xi| \le \tilde p + s\}} 1\left\{ \left| \frac{Y^T (I - H_\xi) Y}{(n - m_n|\xi|)\,\sigma_0^2} - 1 \right| \ge c'\epsilon_n \right\}, \tag{19}
\]
\[
\phi_n'' = \max_{\{\xi:\ \xi \supseteq \xi_0,\ |\xi| \le \tilde p + s\}} 1\left\{ \Big\| B(X_\xi)\big( B(X_\xi)^T B(X_\xi) \big)^{-1} B(X_\xi)^T Y - \sum_{j\in\xi} f_{0j}(X_j) \Big\| \ge c''\sigma_0 \sqrt{n}\,\epsilon_n/5 \right\} \tag{20}
\]
for sufficiently large constants $c'$ and $c''$, where $H_\xi = B(X_\xi)\big( B(X_\xi)^T B(X_\xi) \big)^{-1} B(X_\xi)^T$ is the hat matrix corresponding to the subgroup $\xi$ and $\sum_{j\in\xi} f_{0j}(X_j)$ is the sum of the true non-parametric functions for the covariates $X_j$ within the subgroup $\xi$.

For any $\xi$ that satisfies $\xi \supseteq \xi_0$ and $m_n|\xi| \le m_n(\tilde p + s) \lesssim n\epsilon_n^2$,
\[
E_{(f_0,\sigma_0^2)}\, 1\left\{ \left| \frac{Y^T (I - H_\xi) Y}{(n - m_n|\xi|)\,\sigma_0^2} - 1 \right| \ge c'\epsilon_n \right\}
= E_{(f_0,\sigma_0^2)}\, 1\left\{ \left| \frac{\varepsilon^T (I - H_\xi)\varepsilon}{n - m_n|\xi|} - 1 + \frac{2\,\varepsilon^T (I - H_\xi)\delta}{n - m_n|\xi|} + \frac{\delta^T (I - H_\xi)\delta}{n - m_n|\xi|} \right| \ge c'\epsilon_n \right\}
\]
\[
\le P\Big( \big| \varepsilon^T (I - H_\xi)\varepsilon - (n - m_n|\xi|) \big| + 2\,\big| \varepsilon^T (I - H_\xi)\delta \big| + \delta^T (I - H_\xi)\delta \ge (n - m_n|\xi|)\, c'\epsilon_n \Big).
\]
Since the magnitude of the basis-expansion bias is of order $m_n^{-\kappa}$, we have
\[
\delta^T (I - H_\xi)\delta \le \|\delta\|^2 \lesssim n\, m_n^{-2\kappa} \lesssim (n - m_n|\xi|)\,\epsilon_n,
\qquad
2\,\big| \varepsilon^T (I - H_\xi)\delta \big| \le 2\,\|\delta\|\,\|\varepsilon\| \lesssim 2\sqrt{n}\, m_n^{-\kappa}\, \|\varepsilon\|.
\]
For normally distributed errors $\varepsilon$, given the tail property of the chi-square distribution, we have
\[
P\Big( 2\,\big| \varepsilon^T (I - H_\xi)\delta \big| \ge (n - m_n|\xi|)\, c'\epsilon_n \Big)
\le P\Big( \|\varepsilon\|^2 \ge c\, n\, m_n^{2\kappa}\, \epsilon_n^2 \Big)
\le \exp\{ -c''' n\epsilon_n^2 \}.
\]
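Remark (numerical illustration). The first test in (19) is a residual chi-square check based on the hat matrix $H_\xi$. Purely as an illustration of the statistic, with an arbitrary stand-in design and data generated so that the fitted model holds (none of the values come from the paper), it can be computed as follows.

```python
# Illustration of the statistic Y^T (I - H_xi) Y / ((n - m_n|xi|) sigma0^2) from (19),
# where H_xi = B_xi (B_xi^T B_xi)^{-1} B_xi^T.  Under a correct fit it is close to 1.
import numpy as np

rng = np.random.default_rng(5)
n, m_n, xi_size, sigma0 = 200, 4, 3, 1.0           # illustrative sizes (assumptions)
B_xi = rng.uniform(size=(n, m_n * xi_size))         # stand-in for B(X_xi), entries in [0, 1]
Y = sigma0 * rng.normal(size=n)                     # data with no signal outside the fit

H_xi = B_xi @ np.linalg.solve(B_xi.T @ B_xi, B_xi.T)    # hat matrix
stat = Y @ (np.eye(n) - H_xi) @ Y / ((n - m_n * xi_size) * sigma0**2)
print(f"test statistic = {stat:.3f}")
```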

Combining this with the conclusion in (A.3) of Song and Liang (2016), we can prove that
\[
E_{(f_0,\sigma_0^2)}\, 1\left\{ \left| \frac{Y^T (I - H_\xi) Y}{(n - m_n|\xi|)\,\sigma_0^2} - 1 \right| \ge c'\epsilon_n \right\} \le \exp\{ -c_3 n\epsilon_n^2 \}. \tag{21}
\]
Next, $f_{0j}(X_j)$ can be approximated by the B-spline basis expansion $B(X_j)\beta_{0j}$ with bias term $\delta$, where $\beta_{0j}$ collects the B-spline basis coefficients of the true non-parametric function. We have
\[
\Big\| B(X)\beta - \sum_{j=1}^{p_n} f_{0j}(X_j) \Big\|
= \big\| B(X)(\beta - \beta_0) - \sigma_0\delta \big\|
\le \big\| B(X_{\xi_0})(\beta_{\xi_0} - \beta_{0,\xi_0}) \big\| + \big\| B(X_{\xi_0^c})\beta_{\xi_0^c} \big\| + \sigma_0\|\delta\|,
\]
where $\| B(X_{\xi_0})(\beta_{\xi_0} - \beta_{0,\xi_0}) \| \lesssim \sqrt{n}\,\| \beta_{\xi_0} - \beta_{0,\xi_0} \|$, $\| B(X_{\xi_0^c})\beta_{\xi_0^c} \| \le \sqrt{n}\, a_n p_n \lesssim \sqrt{n}\,\epsilon_n$ and $\|\delta\| \lesssim \sqrt{n}\,\epsilon_n$. By the condition that $m_n(\tilde p + s) \lesssim n\epsilon_n^2$, we have, for some constant $c_3'$,
\[
E_{(f_0,\sigma_0^2)}\, 1\left\{ \Big\| B(X_\xi)\big( B(X_\xi)^T B(X_\xi) \big)^{-1} B(X_\xi)^T Y - \sum_{j\in\xi} f_{0j}(X_j) \Big\| \ge c''\sigma_0 \sqrt{n}\,\epsilon_n/5 \right\}
\]
\[
\le E_{(f_0,\sigma_0^2)}\, 1\left\{ \Big\| \big( B(X_\xi)^T B(X_\xi) \big)^{-1} B(X_\xi)^T \varepsilon + \big( B(X_\xi)^T B(X_\xi) \big)^{-1} B(X_\xi)^T \delta \Big\| \ge c''\sqrt{m_n}\,\epsilon_n/5 \right\}
\le \exp\big( -c_3' n\epsilon_n^2 \big), \tag{22}
\]
where the last inequality is given by (A.4) of Song and Liang (2016) and the fact that $\|\delta\| \lesssim \sqrt{n}\, m_n^{-\kappa}$. Therefore, combining (21) and (22), we can prove that $E_{(f_0,\sigma_0^2)}\,\phi_n \le \exp(-c_3 n\epsilon_n^2)$ for some fixed $c_3$, according to (A.5) in Song and Liang (2016). For the additive model, the argument proving that $\sup_{(\beta,\sigma^2)\in C_n} E_{(\beta,\sigma^2)}(1 - \phi_n)$ is exponentially small is similar to the proof of Theorem A.1 in Song and Liang (2016), replacing their covariate matrix $X$ with $B(X)$.

Therefore, we have the inequalities
\[
E_{(f_0,\sigma_0^2)}\,\phi_n \le \exp\big( -c\, n\epsilon_n^2 \big), \tag{23}
\]
\[
\sup_{(\beta,\sigma^2)\in C_n} E_{(\beta,\sigma^2)}\big( 1 - \phi_n \big) \le \exp\big( -c''\, n\epsilon_n^2 \big). \tag{24}
\]
The next step is to show that
\[
\lim_{n\to\infty} P\left\{ \frac{m(D_n)}{L_0(D_n)} \ge \exp\big( -c_4 n\epsilon_n^2 \big) \right\} \ge 1 - \exp\{ -c_5 n\epsilon_n^2 \} \tag{25}
\]
for any positive $c_4$, where $m(D_n)$ is the marginal likelihood of the data after integrating the parameters over their priors and $L_0(D_n)$ is the likelihood under the truth, so that
\[
\frac{m(D_n)}{L_0(D_n)}
= \int \frac{\sigma^{-n}\exp\big\{ -\|Y - B(X)\beta\|^2/2\sigma^2 \big\}}{\sigma_0^{-n}\exp\big\{ -\|Y - \sum_{j=1}^{p_n} f_{0j}(X_j)\|^2/2\sigma_0^2 \big\}}\; \pi(\beta, \sigma^2)\, d\beta\, d\sigma^2.
\]
From the proof of Theorem A.1 in Song and Liang (2016), we only need to show that, conditional on $|\sigma^2 - \sigma_0^2| \le \eta'\epsilon_n^2$, for some constants $c_5'$ and $c_6$,
\[
P\left( \pi\Big( \|Y - B(X)\beta\|^2/2\sigma^2 < \big\|Y - \sum_{j=1}^{p_n} f_{0j}(X_j)\big\|^2/2\sigma_0^2 + c_5' n\epsilon_n^2 \Big) \ge e^{-c_6 n\epsilon_n^2} \right) \ge 1 - e^{-c_5' n\epsilon_n^2}.
\]
For our proposed model, each element of $B(X)$ is smaller than 1 and $\|\delta\|^2 \lesssim n\, m_n^{-2\kappa} \lesssim n\epsilon_n^2$. By the properties of the chi-square and normal distributions, with probability larger than $1 - \exp\{ -c_5 n\epsilon_n^2 \}$ we have $\|\varepsilon\|^2 \le n(1 + c_5')$ and $\|\varepsilon^T B(X)\|_\infty \le c_5'\, n\epsilon_n$ for a constant $c_5'$.

Then, for model parameters $(\beta, \sigma^2)$, we have
\[
\Big\{ \|Y - B(X)\beta\|^2/2\sigma^2 < \big\| Y - \big( f_{01}(X_1) + \cdots + f_{0 p_n}(X_{p_n}) \big) \big\|^2/2\sigma_0^2 + c_5' n\epsilon_n^2 \Big\}
= \Big\{ \big\| \varepsilon\sigma_0/\sigma + B(X)(\beta_0 - \beta)/\sigma + \delta\sigma_0/\sigma \big\|^2 < \|\varepsilon\|^2 + 2 c_5' n\epsilon_n^2 \Big\}
\]
\[
\supseteq \Big\{ \big\| B(X)(\beta_0 - \beta)/\sigma \big\| < \sqrt{\|\varepsilon\|^2 + 2 c_5' n\epsilon_n^2} - (\sigma_0/\sigma)\|\varepsilon\| - (\sigma_0/\sigma)\|\delta\| \Big\}
\supseteq \Big\{ \sum_{j=1}^{p_n} \big\| (\beta_j - \beta_{0j})/\sigma \big\| \le 2\eta'\sqrt{m_n}\,\epsilon_n \Big\}
\]
\[
\supseteq \Big\{ \|\beta_j/\sigma\| \in \big[ \|\beta_{0j}/\sigma_0\| - \eta'\sqrt{m_n}\,\epsilon_n/s,\ \|\beta_{0j}/\sigma_0\| + \eta'\sqrt{m_n}\,\epsilon_n/s \big] \text{ for all } j\in\xi_0 \Big\}
\cap \Big\{ \sum_{j\notin\xi_0} \|\beta_j/\sigma\| \le \eta'\sqrt{m_n}\,\epsilon_n \Big\}
\]
for some constant $\eta'$. The second-to-last subset relation is given by Assumption $A_1$(3), under which the eigenvalues of $B(X)^T B(X)$ are of the same order as $n/m_n$. By Lemma 1 and Lemma 2,
\[
\pi_\alpha\Big( \sum_{j\notin\xi_0} \|\beta_j/\sigma\| \le \eta'\sqrt{m_n}\,\epsilon_n,\ |\sigma^2 - \sigma_0^2| \le \eta'\epsilon_n^2 \Big)
\gtrsim \int_{\sum_j \|x_j\| \le \eta'\sqrt{m_n}\,\epsilon_n} \prod_{j=1}^{p_n - s} \pi_\alpha(x_j)\, dx_1 \cdots dx_{p_n - s}
\ge \exp\{ -c_6' n\epsilon_n^2 \},
\]
\[
\pi_\alpha\Big( \|\beta_j/\sigma\| \in \big[ \|\beta_{0j}/\sigma_0\| - \eta'\sqrt{m_n}\,\epsilon_n/s,\ \|\beta_{0j}/\sigma_0\| + \eta'\sqrt{m_n}\,\epsilon_n/s \big],\ |\sigma^2 - \sigma_0^2| \le \eta'\epsilon_n^2 \Big)
\gtrsim \frac{\eta'\epsilon_n}{s}\, \inf_{\|x\| \le \sqrt{m_n} E_0} \pi_\alpha(x)
\ge \exp\{ -c_6' n\epsilon_n^2 \}.
\]
These verify the inequality in (25). Finally, from the proof of Theorem A.1 in Song and Liang (2016), for $B_n$ defined above we have $P(B_n) < e^{-c_7 n\epsilon_n^2}$ for some constant $c_7$, given that $\int_{\|x\| > a_n} \pi_\alpha(x)\, dx \lesssim p_n^{-(1+\mu)}$. Therefore, combining the result $P(B_n) < e^{-c_7 n\epsilon_n^2}$ with (23), (24) and (25), we have
\[
P\left( \pi\Big( \big\| B(X)\beta - \sum_{j=1}^{p_n} f_{0j}(X_j) \big\| \ge \sigma_0\sqrt{n}\,\epsilon_n \ \text{ or } \ \big| \sigma^2/\sigma_0^2 - 1 \big| > c_4\epsilon_n \;\Big|\; X, Y \Big) \ge e^{-c_1 n\epsilon_n^2} \right) \le e^{-c_2 n\epsilon_n^2}.
\]
In conclusion, the result of Theorem 4.1 is proved.

Proof of Theorem 4.2. The proof of Theorem 4.2 follows the proof of Theorem A.3 in Song and Liang (2016). Note that in our proposed model we have the basis-expansion matrix $B(X)$ instead of the covariate matrix $X$, and the coefficient vector $\beta = (\beta_1^T, \dots, \beta_{p_n}^T)^T$ is shrunk within the blocks $\beta_1, \dots, \beta_{p_n}$. Furthermore, our model has an additional bias term $\delta$ due to the B-spline basis expansion. We first define the set $S_1 = \{ \|\beta - \beta_0\| \le c_1\epsilon_n,\ |\sigma^2 - \sigma_0^2| \le c_2\epsilon_n \}$, and
\[
\underline{\pi}(\beta \mid \sigma^2) = \inf_{(\beta,\sigma^2)\in S_1} \pi(\beta, \sigma^2)/\pi(\sigma^2),
\qquad
\overline{\pi}(\beta \mid \sigma^2) = \sup_{(\beta,\sigma^2)\in S_1} \pi(\beta, \sigma^2)/\pi(\sigma^2).
\]
The maximum $L_2$-norm over the groups within the coefficient vector is defined as $\|\beta\|_{\max} = \max_j \|\beta_j\|$. We can conclude, by the proof of Theorem A.3 in Song and Liang (2016), that
\[
P(\xi = \xi_0 \mid X, Y) \ \gtrsim\
\pi\big( \|\beta_{\xi_0^c}\|_{\max} \le a_n \big)\;
\underline{\pi}(\beta_{\xi_0} \mid \sigma^2)\;
(2\pi)^{m_n|\xi_0|/2}\, \det\!\big( B(X_{\xi_0})^T B(X_{\xi_0}) \big)^{-1/2}\;
\frac{\Gamma\big( a_0 + (n - m_n|\xi_0|)/2 \big)}{\displaystyle \sup_{\|\beta_{\xi_0^c}\|_{\max} \le a_n(\sigma_0 + c_2\epsilon_n)} \big( \mathrm{SSE}(\beta_{\xi_0^c})/2 + b_0 \big)^{a_0 + (n - m_n|\xi_0|)/2}}, \tag{26}
\]
where $\mathrm{SSE}(\beta_{\xi_0^c})$ is the sum of squared errors when $\beta_{\xi_0^c}$ is given and fixed. Next, we let $\|\beta\|_{\min} = \min_j \|\beta_j\|$ be the minimum of the $L_2$-norms over the predictors within the coefficient vector. Similarly, we can show that, for any $\xi' \ne \xi_0$,
\[
P(\xi = \xi' \mid X, Y) \ \lesssim\
\pi\big( \|\beta_{\xi'\setminus\xi_0}\|_{\min} > a_n,\ \|\beta_{\xi'^c}\|_{\max} \le a_n \big)\;
\overline{\pi}(\beta_{\xi'} \mid \sigma^2)\;
(2\pi)^{m_n|\xi'|/2}\, \det\!\big( B(X_{\xi'})^T B(X_{\xi'}) \big)^{-1/2}\;
\frac{\Gamma\big( a_0 + (n - m_n|\xi'|)/2 \big)}{\displaystyle \inf_{\|\beta_{\xi'^c}\|_{\max} \le a_n(\sigma_0 + c_2\epsilon_n)} \big( \mathrm{SSE}(\beta_{\xi'^c})/2 + b_0 \big)^{a_0 + (n - m_n|\xi'|)/2}},
\]
where $\mathrm{SSE}(\beta_{\xi'^c})$ is the sum of squared errors when $\beta_{\xi'^c}$ is given and fixed. Using the fact that the basis-expansion bias $\delta$ satisfies $\|\delta\|^2 \lesssim n m_n^{-2\kappa}$, we can follow the derivation of (A.13) and (A.14) of Song and Liang (2016).

With probability larger than $1 - 4 p_n\, p_n^{-c_6}$,
\[
\mathrm{SSE}(\beta_{\xi^c})
= \big( Y - B(X_{\xi^c})\beta_{\xi^c} \big)^T (I - H_\xi)\big( Y - B(X_{\xi^c})\beta_{\xi^c} \big)
\le \sigma_0^2\, \varepsilon^T (I - H_\xi)\varepsilon + \sigma_0^2\, \delta^T (I - H_\xi)\delta + \big\| B(X_{\xi^c})\beta_{\xi^c} \big\|^2 + 2\sigma_0 \sqrt{2 c_6 n \log p_n} \sum_{j\in\xi^c} \|\beta_j\|; \tag{27}
\]
\[
\mathrm{SSE}(\beta_{\xi^c})
\ge \sigma_0^2\, \varepsilon^T (I - H_\xi)\varepsilon + \sigma_0^2\, \delta^T (I - H_\xi)\delta - 2\sigma_0 \sqrt{2 c_6 n \log p_n} \sum_{j\in\xi^c} \|\beta_j\|, \tag{28}
\]
and with probability $1 - p_n^{-c_6'}$ for any constant $c_6'$, we have
\[
\big| \varepsilon^T (I - H_\xi)\varepsilon - \varepsilon^T (I - H_{\xi_0})\varepsilon \big| \le c_7\, m_n |\xi \setminus \xi_0| \log p_n.
\]
Since $\big| \delta^T (I - H_\xi)\delta - \delta^T (I - H_{\xi_0})\delta \big| \le c_7\, m_n |\xi \setminus \xi_0|\, n m_n^{-2\kappa}$, we can conclude that
\[
\frac{\displaystyle \sup_{\|\beta_{\xi^c}\|_{\max} \le a_n(\sigma_0 + c_2\epsilon_n)} \big( \mathrm{SSE}(\beta_{\xi^c})/2 + b_0 \big)^{a_0 + (n - m_n|\xi|)/2}}{\displaystyle \inf_{\|\beta_{\xi^c}\|_{\max} \le a_n(\sigma_0 + c_2\epsilon_n)} \big( \mathrm{SSE}(\beta_{\xi^c})/2 + b_0 \big)^{a_0 + (n - m_n|\xi|)/2}}
\le \exp\big\{ c_8'\, m_n |\xi \setminus \xi_0| \log p_n + c_8'\, m_n |\xi \setminus \xi_0|\, n m_n^{-2\kappa} \big\}. \tag{29}
\]
If $m_n \gtrsim (n/\log p_n)^{1/2\kappa}$, we have $n m_n^{-2\kappa} \lesssim \log p_n$, so that the right-hand side of (29) is bounded by $\exp\{ c_8\, m_n |\xi \setminus \xi_0| \log p_n \}$ for some constant $c_8 > 0$. Given (6) in Lemma 3, $\overline{\pi}(\beta_{\xi_0} \mid \sigma^2)/\underline{\pi}(\beta_{\xi_0} \mid \sigma^2) \le l_n^{s}$. We can then proceed with
\[
\frac{P(\xi = \xi' \mid X, Y)}{P(\xi = \xi_0 \mid X, Y)}
\le l_n^{s} \left[ \frac{p_n^{-(1+\mu)}}{1 - p_n^{-(1+\mu)}} \right]^{|\xi' \setminus \xi_0|}
\exp\big\{ c_8\, m_n |\xi' \setminus \xi_0| \log p_n \big\}\, \frac{\sigma_0 + c_2\epsilon_n}{c_2\epsilon_n}
\le p_n^{-\mu'\, m_n |\xi' \setminus \xi_0|},
\]
with a proper choice of the constants.

Therefore, with $\min_{j\in\xi_0} \|\beta_{0j}\| \gtrsim \sqrt{m_n}\,\epsilon_n$, following the proof of Theorem A.2 in Song and Liang (2016) we can prove that
\[
\sum_{\xi \ne \xi_0} P(\xi \mid X, Y)
+ \pi\Big( \|\beta - \beta_0\| > \min_{j\in\xi_0} \|\beta_{0j}\| - a_n\sigma \;\Big|\; X, Y \Big)
\le e^{-c n\epsilon_n^2}.
\]
In conclusion, we have
\[
P\Big\{ \pi\big( \xi(a_n) = \xi_0 \mid X, Y \big) > 1 - p_n^{-\mu} \Big\} > 1 - p_n^{-\mu'}
\]
for some constants $\mu, \mu' > 0$.