On the general understanding of the empirical Bayes method


On the general understanding of the empirical Bayes method
Judith Rousseau (1), Botond Szabó (2)
(1) Université Paris Dauphine, Paris, France
(2) Budapest University of Technology and Economics, Budapest, Hungary
ERCIM 2014, Pisa, 8.12.2014

Table of contents
1. Introduction
2. General Theorem on EB
3. Examples: Gaussian white noise model; Nonparametric regression; Density function problem
4. Epilogue

Motivation
Applications: genetics, Clark & Swanson (2005); contextual region classification, Lazebnik et al. (2009); high-dimensional classification, Chen et al. (2008); robotics, Schauerte et al. (2013).
Although the empirical Bayes method is widely used in practice, it does not have a full theoretical underpinning.

Bayes vs frequentist approach
Statistical model: consider a collection of distributions $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$.
Frequentist school. Model: $X^{(n)} \sim P_{\theta_0}$ for some $\theta_0 \in \Theta$. Goal: recover $\theta_0$ with an estimator $\hat\theta(X^{(n)})$.
Bayesian school. Model: $\theta \sim \Pi$ (prior), $X^{(n)} \mid \theta \sim P_\theta$. Goal: update our belief about $\theta$ via the posterior $\theta \mid X^{(n)}$.
Frequentist Bayes: investigate Bayesian techniques from a frequentist perspective, i.e. assume that there exists a true $\theta_0$ and study the behaviour of the posterior $\theta \mid X^{(n)}$.

Adaptive Bayes
Assume that we have a family of prior distributions indexed by a hyper-parameter $\lambda$: $\{\Pi_\lambda : \lambda \in \Lambda\}$.
Problem: in nonparametric models the posterior crucially depends on the prior, and hence on the hyper-parameter.
Question: how to choose $\lambda$?
Fixed $\lambda$: without a strong prior belief, potentially misleading.
Use the data to find $\lambda$, i.e. adaptive techniques:
Hierarchical Bayes: endow $\lambda$ with a hyper-prior $\pi(\lambda)$.
Empirical Bayes: estimate $\lambda$ from the data $X^{(n)}$.

Empirical Bayes method
EB method: use a frequentist estimator for the hyper-parameter $\lambda$.
Marginal likelihood empirical Bayes: plug the marginal maximum likelihood estimator
$$\hat\lambda_n = \arg\max_{\lambda \in \Lambda} \int_\Theta e^{\ell_n(\theta)} \, \Pi_\lambda(d\theta),$$
where $\ell_n(\theta)$ is the log-likelihood, into the posterior:
$$\Pi_{\hat\lambda_n}(\cdot \mid X^{(n)}) = \Pi_\lambda(\cdot \mid X^{(n)}) \big|_{\lambda = \hat\lambda_n}.$$
This mimics the HB method.
Other frequentist estimators for $\hat\lambda_n$: MM, MRE, ...
Widely used in the literature, BUT missing a full theoretical justification.
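To fix ideas, here is a minimal sketch of marginal likelihood empirical Bayes in a toy conjugate model; the model, sample size, and all variable names are assumptions for the demo, not part of the talk. With $\theta_i \sim N(0,\lambda)$ and $X_i \mid \theta_i \sim N(\theta_i, 1)$, the marginal likelihood is explicit and $\hat\lambda_n$ has a closed form (a hierarchical Bayes analysis would instead put a hyper-prior on $\lambda$).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model (illustration only, not from the talk):
# theta_i ~ N(0, lambda), X_i | theta_i ~ N(theta_i, 1), i = 1..n.
# Marginally X_i ~ N(0, lambda + 1), so the marginal likelihood is explicit.
n = 500
lam_true = 2.0
theta = rng.normal(0.0, np.sqrt(lam_true), n)
x = theta + rng.normal(0.0, 1.0, n)

# Marginal maximum likelihood estimator of the hyper-parameter:
# argmax_lambda sum_i log N(x_i; 0, lambda + 1) = max(mean(x^2) - 1, 0).
lam_hat = max(np.mean(x**2) - 1.0, 0.0)

# Plug-in ("empirical Bayes") posterior: theta_i | X ~ N(w * x_i, w),
# with shrinkage weight w = lam_hat / (lam_hat + 1).
w = lam_hat / (lam_hat + 1.0)
post_mean, post_var = w * x, w

print(f"lambda_hat = {lam_hat:.3f} (true {lam_true})")
print(f"EB risk: {np.mean((post_mean - theta)**2):.3f}  vs MLE risk: {np.mean((x - theta)**2):.3f}")
```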

Theoretical investigation
Frequentist analysis: consider a loss $L$ and a collection of nested sub-classes $\{\Theta_\beta : \beta \in B\}$.
Minimax risk: $r_{n,\beta} = \inf_{\hat\theta_n \in T_n} \sup_{\theta \in \Theta_\beta} E_\theta L(\hat\theta_n, \theta)$.
Do we have an adaptive contraction rate, i.e.
$$\inf_{\theta_0 \in \Theta_\beta} E_{\theta_0} \Pi_{\hat\lambda_n}\big(\theta : L(\theta, \theta_0) \le M r_{n,\beta} \mid X^{(n)}\big) \to 1$$
for all $\beta \in B$ and a large enough constant $M > 0$?
Literature:
Specific models: Florens & Simoni (2012), Knapik et al. (2012), Sz. et al. (2013), Serra & Krivobokova (2014).
Comparing EB and HB in parametric models: Petrone et al. (2014).
General nonparametric models, BUT for well-behaved estimators $\hat\lambda_n$: Donnet et al. (2014).

The set of possible hyper-parameters λ
Determine the location of $\hat\lambda_n$: define $\varepsilon_n(\lambda) = \varepsilon_n(\lambda, \theta_0)$ such that
$$\Pi_\lambda\big(\theta : \|\theta - \theta_0\|_2 \le K \varepsilon_n(\lambda)\big) = e^{-n \varepsilon_n^2(\lambda)},$$
for some $K > 0$ (specified later).
Denote $m_n = \inf_{\beta \in B} r_{n,\beta}$ and assume that $m_n \ge 1/\sqrt{n}$. Define the set $\Lambda_n = \{\lambda : \varepsilon_n(\lambda) \ge m_n\}$ and let $\varepsilon_{n,0} = \min_\lambda\{\varepsilon_n(\lambda) : \lambda \in \Lambda_n\}$.
Finally, define the set of probable hyper-parameters
$$\Lambda_0 = \{\lambda : \varepsilon_n(\lambda) \le M_n \varepsilon_{n,0}\} \cup \Lambda_n^c,$$
for some $M_n$ tending to infinity. Our first goal is to show that $\hat\lambda_n \in \Lambda_0$.

Conditions
Following Donnet et al. (2014) we introduce some assumptions:
Entropy (hyper): discretize $\Lambda_0^c$ into $N_n$ balls $B(\lambda_i, u_n, \|\cdot\|)$ and assume that for some $w_n \le M_n$:
$$\log N_n = o(w_n^2\, n \varepsilon_{n,0}^2).$$
Transformation: let $\psi_{\lambda,\lambda'} : \Theta \to \Theta$ be such that if $\theta \sim \Pi_\lambda(\cdot)$ then $\psi_{\lambda,\lambda'}(\theta) \sim \Pi_{\lambda'}(\cdot)$ for $\lambda, \lambda' \in \Lambda$, and introduce the notation
$$dQ^\theta_{\lambda,n}(X^{(n)}) = \sup_{\|\lambda' - \lambda\| \le u_n} e^{\ell_n(\psi_{\lambda,\lambda'}(\theta))(X^{(n)})} \, d\mu(X^{(n)}).$$
Boundedness: for all $\theta \in B(\theta_0, \varepsilon_n(\lambda), \|\cdot\|_2)$,
$$Q^\theta_{\lambda,n}(X^n) \le e^{c\, n \varepsilon_n^2(\lambda)}, \qquad c < 1.$$
Sieve: for all $\lambda \in \Lambda_0^c$ assume that there exists $\Theta_n(\lambda)$ such that
$$\int_{\Theta_n(\lambda)^c} Q^\theta_{\lambda,n}(X^n)\, \Pi_\lambda(d\theta) \le e^{-w_n^2 n \varepsilon_{n,0}^2}.$$

Conditions II
Tests: for all $\lambda \in \Lambda_0^c$ and all $\theta \in \Theta_n(\lambda)$ there exist tests $\varphi_{n,i}(\theta)$ with
$$E_{\theta_0}(\varphi_{n,i}(\theta)) \le e^{-c_1 n d^2(\theta, \theta_0)}, \qquad \sup_{d(\theta,\theta') \le \zeta d(\theta,\theta_0)} Q^{\theta'}_{\lambda_i,n}(1 - \varphi_{n,i}(\theta)) \le e^{-c_1 n d^2(\theta,\theta_0)},$$
with $0 < \zeta < 1$, and for some large enough $C > 0$
$$\{\|\theta - \theta_0\|_2 > C \varepsilon_n(\lambda),\ \theta \in \Theta_n(\lambda)\} \subset \{d(\theta, \theta_0) \ge \varepsilon_n(\lambda),\ \theta \in \Theta_n(\lambda)\}.$$
Entropy: for all $u \ge C \varepsilon_n(\lambda)$:
$$\log N\big(\zeta u, \{u \le d(\theta, \theta_0) \le 2u\} \cap \Theta_n(\lambda), d(\cdot,\cdot)\big) \le c_1 n u^2 / 2.$$
Local metric exchange: there exist $M_1, M_2 > 0$ and $\lambda_n \in \Lambda_0$ satisfying $\varepsilon_n(\lambda_n) \le M_1 \varepsilon_{n,0}$ such that
$$\{\|\theta - \theta_0\|_2 \le \varepsilon_n(\lambda_n)\} \subset B_n\big(\theta_0, M_2 \varepsilon_n(\lambda_n), \{KL(p_\theta, p_{\theta_0}), V_{0,k}(p_\theta, p_{\theta_0})\}\big).$$

Main theorem
Theorem: assume that all the above conditions hold. Then, for all $\theta_0 \in \Theta$, with $P_{\theta_0}$-probability tending to one we have $\hat\lambda_n \in \Lambda_0$.
Our next goal is to give upper bounds for the EB contraction rates. Following Donnet et al. (2014), assume in addition:
Uniform likelihood ratio:
$$\sup_{\lambda \in \Lambda_0} \sup_{\theta \in B(\theta_0, \lambda)} P_{\theta_0}\Big\{ \inf_{\|\lambda' - \lambda\| \le u_n} \ell_n(\psi_{\lambda,\lambda'}(\theta)) - \ell_n(\theta_0) \le -K_5 n \varepsilon_{n,0}^2 \Big\} = o\big(1 / N_n(u_n)\big).$$
Stronger entropy (hyper): $N_n(u_n) = o\big((n \varepsilon_{n,0}^2)^{k/2}\big)$.
Theorem: under the preceding conditions the empirical Bayes posterior distribution contracts around the truth at the rate $M_n \varepsilon_{n,0}$:
$$\Pi_{\hat\lambda_n}\big(\theta : \|\theta - \theta_0\|_2 \le M_n \varepsilon_{n,0} \mid X^{(n)}\big) \xrightarrow{P_{\theta_0}} 1.$$

Gaussian white noise model
Model: we observe the sequence $X^{(n)} = (X_1, X_2, \ldots)$ satisfying
$$X_i = \theta_{0,i} + \tfrac{1}{\sqrt{n}} Z_i, \qquad i = 1, 2, \ldots,$$
where $\theta_0 = (\theta_{0,1}, \theta_{0,2}, \ldots)$ is the unknown infinite-dimensional parameter and the $Z_i$ are iid standard normal random variables.
Sub-classes: $\Theta_\beta(M) = \{\theta \in \ell_2 : \sum_i \theta_i^2 i^{1+2\beta} \le M\}$.
Priors:
$\Pi_\alpha(\cdot) = \bigotimes_{i=1}^\infty N(0, i^{-1-2\alpha})$, see Knapik et al. (2012).
$\Pi_\tau(\cdot) = \bigotimes_{i=1}^\infty N(0, \tau^2 i^{-1-2\alpha})$, see Sz. et al. (2013).
$\Pi_N(\cdot) = \bigotimes_{i=1}^N g(\cdot)$, where $G_1 e^{-G_2 |t|^\alpha} \le g(t) \le G_3 e^{-G_4 |t|^\alpha}$, see Arbel et al. (2012) for HB.
$\Pi_\gamma(\cdot) = \bigotimes_{i=1}^\infty N(0, e^{-\gamma i})$, see Castillo et al. (2014) for HB.
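The MMLE is particularly transparent here: under $\Pi_\alpha$, marginally $X_i \sim N(0, i^{-1-2\alpha} + 1/n)$ independently, so the marginal likelihood can be evaluated exactly. Below is a numerical sketch; the truncation level, sample size, the particular $\theta_0$, and the grid search are all assumptions for the demo. For a signal of regularity $\beta$ one expects $\hat\alpha_n$ to land near $\beta$, in line with the EB rate on the next slide.

```python
import numpy as np

rng = np.random.default_rng(1)

# Truncated Gaussian white noise sequence model (finite-dimensional sketch):
# X_i = theta_{0,i} + Z_i / sqrt(n), prior Pi_alpha = prod_i N(0, i^{-1-2 alpha}).
n, N = 5000, 2000
i = np.arange(1, N + 1)
beta = 0.8                          # "true" regularity, an assumption for the demo
theta0 = i**(-1.0 - beta)           # lies in Theta_s for every s < beta (boundary at beta)
x = theta0 + rng.normal(size=N) / np.sqrt(n)

def log_marginal(alpha):
    # Marginally X_i ~ N(0, i^{-1-2 alpha} + 1/n), independent, so the
    # marginal likelihood integral has a closed form.
    v = i**(-1.0 - 2.0 * alpha) + 1.0 / n
    return -0.5 * np.sum(np.log(v) + x**2 / v)

# Marginal maximum likelihood estimator of alpha by grid search.
grid = np.linspace(0.1, 3.0, 300)
alpha_hat = grid[np.argmax([log_marginal(a) for a in grid])]
print(f"alpha_hat = {alpha_hat:.2f} (signal regularity beta = {beta})")
```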

GWN model with regularity hyper-parameter
Prior: $\Pi_\alpha(\cdot) = \bigotimes_{i=1}^\infty N(0, i^{-1-2\alpha})$. All the conditions of our theorems are met.
Upper bound on $\varepsilon_n(\alpha)$ (for $\theta_0 \in \Theta_\beta$):
Concentration inequality, van der Vaart & van Zanten (2008):
$$n \varepsilon_n^2(\alpha) = -\log \Pi_\alpha\big(\theta : \|\theta - \theta_0\| \le K \varepsilon_n(\alpha)\big) \le \varphi_{\alpha,\theta_0}\big(K \varepsilon_n(\alpha)/2\big),$$
where the concentration function splits into a centered small-ball term and an RKHS approximation term.
Centered small ball, Li & Shao (2001):
$$-\log \Pi_\alpha\big(\theta : \|\theta\|_2 \le K \varepsilon_n(\alpha)/2\big) \asymp \big(K \varepsilon_n(\alpha)/2\big)^{-1/\alpha}.$$
RKHS term, with $\|h\|_{\mathbb{H}^\alpha}^2 = \sum_i i^{1+2\alpha} h_i^2$:
$$\inf_{h \in \mathbb{H}^\alpha : \|h - \theta_0\|_2 \le \varepsilon_n(\alpha)} \|h\|_{\mathbb{H}^\alpha}^2 \le C \varepsilon_n(\alpha)^{-(1+2\alpha-2\beta)/\beta}.$$
Solution: $\varepsilon_n(\alpha) \asymp n^{-(\beta \wedge \alpha)/(1+2\alpha)}$.
EB rate: $\alpha_0 = \beta$, hence $\varepsilon_{n,0} = \varepsilon_n(\alpha_0) \asymp n^{-\beta/(1+2\beta)}$.
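The final step balances the two terms of the concentration function against $n\varepsilon_n^2(\alpha)$; spelled out (a short derivation consistent with the bounds above):

```latex
% Balancing n\,\varepsilon_n^2(\alpha) against each term of the bound
% \varepsilon_n^{-1/\alpha} + \varepsilon_n^{-(1+2\alpha-2\beta)/\beta}:
\begin{align*}
n\,\varepsilon_n^2 &\asymp \varepsilon_n^{-1/\alpha}
  &&\Longrightarrow\ \varepsilon_n \asymp n^{-\alpha/(1+2\alpha)}
  &&\text{(small-ball term)},\\
n\,\varepsilon_n^2 &\asymp \varepsilon_n^{-(1+2\alpha-2\beta)/\beta}
  &&\Longrightarrow\ \varepsilon_n \asymp n^{-\beta/(1+2\alpha)}
  &&\text{(RKHS term)}.
\end{align*}
% The larger of the two solutions dominates, giving
% \varepsilon_n(\alpha) \asymp n^{-(\beta \wedge \alpha)/(1+2\alpha)},
% which is minimized over \alpha at \alpha_0 = \beta.
```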

GWN model with scaling hyper-parameter
Prior: $\Pi_\tau(\cdot) = \bigotimes_{i=1}^\infty N(0, \tau^2 i^{-1-2\alpha})$, with fixed $\alpha > 0$. All the conditions of our theorems are met.
Upper bound on $\varepsilon_n(\tau)$ (for $\theta_0 \in \Theta_\beta$): similarly to the regularity hyper-parameter case,
$$n \varepsilon_n^2(\tau) \asymp -\log \Pi_\tau\big(\theta : \|\theta\|_2 \le K \varepsilon_n(\tau)/2\big) + \inf_{h \in \mathbb{H}^\tau : \|h - \theta_0\|_2 \le \varepsilon_n(\tau)} \|h\|_{\mathbb{H}^\tau}^2 \lesssim \tau^{1/\alpha} \varepsilon_n(\tau)^{-1/\alpha} + \tau^{-2} \sum_{i=1}^{\varepsilon_n(\tau)^{-1/\beta}} i^{2(\alpha-\beta)}.$$
EB posterior rate:
$$\varepsilon_n(\tau_0) \asymp \begin{cases} n^{-\beta/(1+2\beta)}, & \text{if } \beta < \alpha + 1/2,\\ n^{-\beta/(1+2\beta)} (\log n)^{1/(1+2\beta)}, & \text{if } \beta = \alpha + 1/2,\\ n^{-(1/2+\alpha)/(2+2\alpha)}, & \text{if } \beta > \alpha + 1/2. \end{cases}$$
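The same closed-form marginal likelihood argument applies to the scale: marginally $X_i \sim N(0, \tau^2 i^{-1-2\alpha} + 1/n)$. A short self-contained variation of the earlier sketch; again, all concrete choices are assumptions for the demo, not the talk's code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Scaling variant: fix the regularity alpha and estimate the scale tau by
# marginal maximum likelihood in the truncated sequence model.
n, N, alpha, beta = 5000, 2000, 1.5, 0.8
i = np.arange(1, N + 1)
theta0 = i**(-1.0 - beta)                 # signal of regularity ~ beta
x = theta0 + rng.normal(size=N) / np.sqrt(n)

def log_marginal(tau):
    # Marginally X_i ~ N(0, tau^2 * i^{-1-2 alpha} + 1/n), independent.
    v = tau**2 * i**(-1.0 - 2.0 * alpha) + 1.0 / n
    return -0.5 * np.sum(np.log(v) + x**2 / v)

# Grid search over tau on a logarithmic grid.
taus = np.exp(np.linspace(np.log(1e-3), np.log(1e3), 400))
tau_hat = taus[np.argmax([log_marginal(t) for t in taus])]
# Here beta < alpha + 1/2, i.e. the first regime of the displayed rate.
print(f"tau_hat = {tau_hat:.3g}")
```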

Nonparametric regression model
Fixed design: assume that we observe $Y_1, Y_2, \ldots, Y_n$ satisfying
$$Y_i = f_0(x_i) + Z_i, \qquad i = 1, 2, \ldots, n,$$
where the $Z_i$ are iid standard Gaussian random variables and $x_i = i/n$.
Series decomposition: denote by $\theta_0 = (\theta_{0,1}, \theta_{0,2}, \ldots)$ the Fourier coefficients of the regression function $f_0 \in L_2(M)$:
$$f_0(t) = \sum_{j=1}^\infty \theta_{0,j} \psi_j(t).$$
Prior: $\Pi_\alpha(\cdot) = \bigotimes_{i=1}^\infty N(0, i^{-1-2\alpha})$ on $\theta_0$.
EB posterior rate: for $\theta_0 \in S^\beta(M)$ we have $\varepsilon_n(\alpha_0) \asymp n^{-\beta/(1+2\beta)}$.
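The reduction to the sequence model can be seen numerically: projecting the observations onto an orthonormal basis yields empirical coefficients with approximately $N(0, 1/n)$ noise. A sketch under assumed choices (cosine basis, a hypothetical $f_0$, equispaced design):

```python
import numpy as np

rng = np.random.default_rng(2)

# Projecting fixed-design regression data onto an orthonormal cosine basis
# gives theta_hat_j ~ theta_{0,j} + N(0, 1/n), i.e. (approximately) the
# Gaussian white noise sequence model of the previous example.
n, J = 2048, 50
x = np.arange(1, n + 1) / n
f0 = np.abs(x - 0.5)                      # hypothetical regression function
y = f0 + rng.normal(size=n)

# psi_1 = 1, psi_j = sqrt(2) cos(pi (j-1) x): orthonormal basis of L_2[0,1].
psi = np.vstack([np.ones(n)] + [np.sqrt(2) * np.cos(np.pi * j * x) for j in range(1, J)])
theta_hat = psi @ y / n                   # empirical Fourier coefficients
theta_true = psi @ f0 / n
print(np.round(np.sqrt(n) * (theta_hat - theta_true)[:5], 2))  # ~ N(0,1) fluctuations
```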

Density function
Model: let $X_1, X_2, \ldots, X_n$ be an iid sample from the density function $f_0$ on $[0,1]$. Assume that the density takes the form
$$f_0(x) = \exp\Big(\sum_{j=1}^\infty \theta_{0,j} \varphi_j(x) - c(\theta_0)\Big), \qquad \theta_0 \in \ell_2,$$
where $(\varphi_j)$ is an orthonormal basis of $L_2([0,1])$.
Prior: log-linear priors, Rivoirard & Rousseau (2012):
$$f_\theta(x) = \exp\Big(\sum_{j=1}^\infty \theta_j \varphi_j(x) - c(\theta)\Big), \qquad \theta \in \ell_2,$$
where the parameter $\theta \in \ell_2$ follows $\Pi_\tau(\cdot) = \bigotimes_{i=1}^\infty N(0, \tau^2 i^{-1-2\alpha})$ or $\Pi_N(\cdot) = \bigotimes_{i=1}^N g(\cdot)$.
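For intuition, the log-linear family is easy to evaluate once the normalizer $c(\theta)$ is computed; the sketch below does this by quadrature on a grid. The basis, truncation level $J$, and the prior draw are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Log-linear density sketch: f_theta(x) = exp(sum_j theta_j phi_j(x) - c(theta))
# on [0, 1] with a cosine basis; c(theta) is the log-normalizer.
J = 10
xs = (np.arange(10000) + 0.5) / 10000     # midpoint grid on [0, 1]
phi = np.vstack([np.sqrt(2) * np.cos(np.pi * j * xs) for j in range(1, J + 1)])

def log_f(theta):
    g = theta @ phi                       # sum_j theta_j phi_j(x) on the grid
    c = np.log(np.mean(np.exp(g)))        # c(theta) by midpoint quadrature
    return g - c

# One draw of theta from a Pi_tau-like prior (tau = 1, alpha = 1 assumed).
theta = rng.normal(size=J) * np.arange(1, J + 1)**(-1.5)
f = np.exp(log_f(theta))
print(f"density integrates to {np.mean(f):.4f}")   # ~ 1.0 by construction
```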

Summary
We characterized a set $\Lambda_0$ to which the marginal likelihood estimator $\hat\lambda_n$ belongs with probability tending to one.
We gave an upper bound on the EB contraction rate.
We investigated various examples: the Gaussian white noise model (reproducing multiple specific results from the literature), nonparametric regression, and the density function problem.

Future/Ongoing work
Extensions:
Consider other, more complex models.
Consider other metrics (at the moment we have the $L_2$-norm).
Lower bounds on the contraction rates of the EB posterior.
Inverse problems.
Investigate the coverage properties of EB credible sets (under the polished tail assumption, see Sz. et al. (2014)).
Use the EB results on coverage to derive general theorems on hierarchical Bayes credible sets.