Modern Bayesian Statistics Part III: high-dimensional modeling Example 3: Sparse and time-varying covariance modeling

Σχετικά έγγραφα
Bayesian statistics. DS GA 1002 Probability and Statistics for Data Science.

Online Appendix To: Bayesian Doubly Adaptive Elastic-Net Lasso For VAR Shrinkage

Bayesian modeling of inseparable space-time variation in disease risk

Statistical Inference I Locally most powerful tests

6.1. Dirac Equation. Hamiltonian. Dirac Eq.

Solution Series 9. i=1 x i and i=1 x i.

Other Test Constructions: Likelihood Ratio & Bayes Tests

Homework 3 Solutions

Nondifferentiable Convex Functions

Molecular evolutionary dynamics of respiratory syncytial virus group A in

Approximation of distance between locations on earth given by latitude and longitude

ST5224: Advanced Statistical Theory II

Tridiagonal matrices. Gérard MEURANT. October, 2008

The challenges of non-stable predicates

5.4 The Poisson Distribution.

Srednicki Chapter 55

Parametrized Surfaces

An Inventory of Continuous Distributions

Introduction to the ML Estimation of ARMA processes

6.3 Forecasting ARMA processes

Statistics 104: Quantitative Methods for Economics Formula and Theorem Review

Solutions to Exercise Sheet 5

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 19/5/2007

2 Composition. Invertible Mappings

10.7 Performance of Second-Order System (Unit Step Response)

Lecture 2: Dirac notation and a review of linear algebra Read Sakurai chapter 1, Baym chatper 3

Tutorial on Multinomial Logistic Regression

Econ 2110: Fall 2008 Suggested Solutions to Problem Set 8 questions or comments to Dan Fetter 1

Phys460.nb Solution for the t-dependent Schrodinger s equation How did we find the solution? (not required)

4.6 Autoregressive Moving Average Model ARMA(1,1)

SCHOOL OF MATHEMATICAL SCIENCES G11LMA Linear Mathematics Examination Solutions

Web-based supplementary materials for Bayesian Quantile Regression for Ordinal Longitudinal Data

The ε-pseudospectrum of a Matrix

Aquinas College. Edexcel Mathematical formulae and statistics tables DO NOT WRITE ON THIS BOOKLET

Section 8.3 Trigonometric Equations

794 Appendix A:Tables

Math 6 SL Probability Distributions Practice Test Mark Scheme

Second Order Partial Differential Equations

Homework 8 Model Solution Section

Lecture 34: Ridge regression and LASSO

Chapter 6: Systems of Linear Differential. be continuous functions on the interval

Probability and Random Processes (Part II)

Ordinal Arithmetic: Addition, Multiplication, Exponentiation and Limit

6. MAXIMUM LIKELIHOOD ESTIMATION

DERIVATION OF MILES EQUATION FOR AN APPLIED FORCE Revision C

ω ω ω ω ω ω+2 ω ω+2 + ω ω ω ω+2 + ω ω+1 ω ω+2 2 ω ω ω ω ω ω ω ω+1 ω ω2 ω ω2 + ω ω ω2 + ω ω ω ω2 + ω ω+1 ω ω2 + ω ω+1 + ω ω ω ω2 + ω

Mean-Variance Analysis

DESIGN OF MACHINERY SOLUTION MANUAL h in h 4 0.

EE512: Error Control Coding

Jesse Maassen and Mark Lundstrom Purdue University November 25, 2013

Table A.1 Random numbers (section 1)

Απόκριση σε Μοναδιαία Ωστική Δύναμη (Unit Impulse) Απόκριση σε Δυνάμεις Αυθαίρετα Μεταβαλλόμενες με το Χρόνο. Απόστολος Σ.

b. Use the parametrization from (a) to compute the area of S a as S a ds. Be sure to substitute for ds!

SOLUTIONS TO MATH38181 EXTREME VALUES AND FINANCIAL RISK EXAM

SOLUTIONS TO MATH38181 EXTREME VALUES AND FINANCIAL RISK EXAM

Matrices and vectors. Matrix and vector. a 11 a 12 a 1n a 21 a 22 a 2n A = b 1 b 2. b m. R m n, b = = ( a ij. a m1 a m2 a mn. def


3.4 SUM AND DIFFERENCE FORMULAS. NOTE: cos(α+β) cos α + cos β cos(α-β) cos α -cos β

ANSWERSHEET (TOPIC = DIFFERENTIAL CALCULUS) COLLECTION #2. h 0 h h 0 h h 0 ( ) g k = g 0 + g 1 + g g 2009 =?

Description of the PX-HC algorithm

CHAPTER 25 SOLVING EQUATIONS BY ITERATIVE METHODS

Matrices and Determinants

Additional Results for the Pareto/NBD Model

Περιεχόµενα. 1. Γενικό πλαίσιο. 2. Η ΚΑΠ σήµερα. 3. Γιατί χρειαζόµαστε τη µεταρρύθµιση; 4. Νέοι στόχοι, µελλοντικά εργαλεία και πολιτικές επιλογές

Figure A.2: MPC and MPCP Age Profiles (estimating ρ, ρ = 2, φ = 0.03)..

ΕΙΣΑΓΩΓΗ ΣΤΗ ΣΤΑΤΙΣΤΙΚΗ ΑΝΑΛΥΣΗ

HW 3 Solutions 1. a) I use the auto.arima R function to search over models using AIC and decide on an ARMA(3,1)

HOMEWORK 4 = G. In order to plot the stress versus the stretch we define a normalized stretch:

Problem Set 3: Solutions

Dr. D. Dinev, Department of Structural Mechanics, UACEG

Numerical Analysis FMN011

Spherical Coordinates

상대론적고에너지중이온충돌에서 제트입자와관련된제동복사 박가영 인하대학교 윤진희교수님, 권민정교수님

Lecture 7: Overdispersion in Poisson regression

derivation of the Laplacian from rectangular to spherical coordinates

Πρόβλημα 1: Αναζήτηση Ελάχιστης/Μέγιστης Τιμής

k A = [k, k]( )[a 1, a 2 ] = [ka 1,ka 2 ] 4For the division of two intervals of confidence in R +

2.153 Adaptive Control Lecture 7 Adaptive PID Control

A Bonus-Malus System as a Markov Set-Chain. Małgorzata Niemiec Warsaw School of Economics Institute of Econometrics

Lecture 2. Soundness and completeness of propositional logic

Pg The perimeter is P = 3x The area of a triangle is. where b is the base, h is the height. In our case b = x, then the area is

5. Choice under Uncertainty

The Probabilistic Method - Probabilistic Techniques. Lecture 7: The Janson Inequality

Notes on the Open Economy

w o = R 1 p. (1) R = p =. = 1

ΚΥΠΡΙΑΚΟΣ ΣΥΝΔΕΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY 21 ος ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ Δεύτερος Γύρος - 30 Μαρτίου 2011

Mean bond enthalpy Standard enthalpy of formation Bond N H N N N N H O O O

Spatial Modeling with Spatially Varying Coefficient Processes

Lecture 21: Scattering and FGR

Sequent Calculi for the Modal µ-calculus over S5. Luca Alberucci, University of Berne. Logic Colloquium Berne, July 4th 2008

Mock Exam 7. 1 Hong Kong Educational Publishing Company. Section A 1. Reference: HKDSE Math M Q2 (a) (1 + kx) n 1M + 1A = (1) =

Estimation for ARMA Processes with Stable Noise. Matt Calder & Richard A. Davis Colorado State University

ΕΓΓΡΑΦΟ ΕΡΓΑΣΙΑΣ ΤΩΝ ΥΠΗΡΕΣΙΩΝ ΤΗΣ ΕΠΙΤΡΟΠΗΣ. Εκθεση χώρας - Κύπρος {COM(2015) 85 final}

DATA SHEET Surface mount NTC thermistors. BCcomponents

C.S. 430 Assignment 6, Sample Solutions

CHAPTER 48 APPLICATIONS OF MATRICES AND DETERMINANTS

Queensland University of Technology Transport Data Analysis and Modeling Methodologies

Fractional Colorings and Zykov Products of graphs

Strain gauge and rosettes

Lecture 21: Properties and robustness of LSE

Transcript:

Modern Bayesian Statistics Part III: high-dimensional modeling Example 3: Sparse and time-varying covariance modeling Hedibert Freitas Lopes 1 hedibert.org 13 a amostra de Estatística IME-USP, October 2018 1 Professor of Statistics and Econometrics at Insper, São Paulo. 1

Example 3: Sparse and time-varying covariance modeling Consider the Gaussian linear model y = X β + ε, ε N (0, σ 2 I ), where y is n-length vector and X is a n q design matrix. Ridge Regression (l 2 penalty on β) ˆβ ridge = arg min β { y X β 2 + λ β 2 2 leading to ˆβ ridge = (X X + λi ) 1 X y. LASSO (l 1 penalty on β) }, λ 0, ˆβ lasso = arg min{ y X β 2 + λ β 1 }, λ 0, β which can be solved by using quadratic programming techniques such as coordinate gradient descent. 2

Bayesian regularization in linear regression problems Hierarchical scale mixture of normals: β ψ N (0, ψ) and ψ θ p(ψ), Maximum a posteriori (MAP): arg max{p(y β, σ 2 )p(β ψ)} β A few cases: Prior p(ψ) p(β) Bayesian Lasso ψ E(λ 2 /2) Laplace Ridge ψ IG(a, b) Scaled-t Normal-Gamma ψ G(λ, 1/(2γ 2 )) below p(β λ, γ 2 ) = 1 π2 λ 1/2 γ λ+1/2 Γ(λ) β λ 1/2 K λ 1/2 ( β /γ), where Var(β λ, γ 2 ) = 2λγ 2 and excess kurtosis 3/λ. 3

The Normal-Gamma prior High mass close to zero and heavy tails Figure: λ = 0.1 (dot), λ = 0.33 (dot-dashed), λ = 1 (solid). 4

Spike-and-slab priors Stochastic search variable selection (SSVS) SSVS places independent mixture priors directly on the coefficients β J (1 J) N (0, τ 2 ) +J N (0, c 2 τ 2 ), }{{}}{{} spike slab with c > 1 large, τ > 0 small and J Ber(ω). SMN representation β ψ N (0, ψ) ψ J (1 J)δ τ 2(.) + Jδ c 2 τ 2(.) 5

Spike-and-slab priors on the scale parameter ψ Normal mixture of Inverse-Gammas (NMIG) β K N (0, Kτ 2 ), K ω (1 ω)δ υ0 (.) + ωδ υ1 (.), υ 0 /υ 1 1, τ 2 IG(a τ, b τ ). (1) SMN representation β ψ N (0, ψ) and ψ (1 ω)ig(a τ, υ 0 b τ ) + ωig(a τ, υ 1 b τ ) The resulting marginal distribution of β is a two component mixture of scaled Student s t distributions. 6

Spike-and-slab priors on the scale parameter ψ Figure: Conditional density for hypervariance ψ for NMIG mixture prior where υ 0 = 0.005, υ 1 = 1, a τ = 5, b τ = 50 and (a) ω = 0.5 (black line), (b) ω = 0.95 (red line). Note that as ω has a Uniform prior, (a) also corresponds to the marginal density of ψ. Observe that only the height of the density changes as ω varied. 7

Spike-and-slab priors: summary Prior Spike ψ J = 0 Slab ψ J = 1 Marginal β ω Constant c SSVS ψ J = 0 = δrq(.) ψ J = 1 = δq(.) ωn (0, Q) + (1 ω)n (0, rq) 1 NMIG IG(ν, rq) IG(ν, Q) ωt2ν(0, Q/ν) + (1 ω)t2ν(0, rq/ν) 1/(ν 1) Mixture of Laplaces E(1/2rQ) E(1/2Q) ωlap( Q) + (1 ω)lap( rq) 2 Mixture of Normal-Gammas G(a, 1/2rQ) G(a, 1/2Q) ωng(βj a, Q) + (1 ω)ng(βj a, r, Q) 2a Laplace-t E(1/2rQ) IG(ν, Q) ωt2ν(0, Q/ν) + (1 ω)lap( rq) c1 = 2, c2 = 1/(ν 1) 8

Gaussian dynamic regression problems Consider the univariate Gaussian dynamic linear model (DLM) expressed by y t = F tβ t + ν t, ν t N (0, V t ) (2) β t = G t β t 1 + ω t, ω t N (0, W t ), (3) where β is of length q and β 0 N (m 0, C 0 ). Dynamic regression model: F t = X t and G t = I q. Static regression model: W t = 0 for all t. 9

Shrinkage in dynamic regression problems Two main obstacles: 1. Time-varying parameters (states), and 2. A large number of predictors q. Two sources of sparsity: 1. horizontal sparsity: β j,t = 0, t for some coefficients j. 2. vertical sparsity: β j,t = 0 for several js at time t. 10

Shrinkage in dynamic regression problems Two main obstacles: 1. Time-varying parameters (states), and 2. A large number of predictors q. Two sources of sparsity: 1. horizontal sparsity: β j,t = 0, t for some coefficients j. 2. vertical sparsity: β j,t = 0 for several js at time t. Illustration: q = 5 and T = 12 jan feb mar apr may jun jul aug sep oct nov x 0 β 1,1 β 1,2 β 1,3 β 1,4 β 1,5 β 1,6 β 1,7 β 1,8 β 1,9 β 1,10 β 1,11 x 1 0 0 0 0 0 0 0 0 0 0 0 x 2 β 3,1 β 3,2 β 3,3 β 3,4 β 3,5 0 0 0 β 3,9 β 3,10 β 3,11 x 3 0 0 β 4,3 β 4,4 β 4,5 β 4,6 β 4,7 β 4,8 β 4,9 β 4,10 β 4,11 x 4 β 5,1 β 5,2 β 5,3 β 5,4 β 5,5 0 0 0 β 5,9 β 5,10 β 5,11 11

Our contribution: a dynamic spike-and-slab model Our contribution is defining a a spike-and-slab prior that not only shrinks time-varying coefficients in dynamic regression problems but allows for dynamic variable selection. We use a non-centered parametrization: y t = F tβ t + ν t, ν t N (0, σt 2 ) β t = G t β t 1 + ω t, ω t N (0, W t ), where β t = ( ) β 1,t β q,t,..., ψ1,t ψq,t G t = diag(ϕ 1,..., ϕ q ) W t = diag((1 ϕ 2 1),..., (1 ϕ 2 q)), F t = (X 1,t ψ1,t,..., X q,t ψq,t ). 12

Our contribution: a dynamic spike-and-slab model For shrinking the states β 1,..., β T, for any j coefficient, we place independent priors for each ψ t = τ 2 K t as τ 2 iid p(τ 2 θ), (K t K t 1 = υ i ) ind ω 1,i δ 1 (.) + (1 ω 1,i )δ r (.), ω 1,i = p (K t = 1 K t 1 = υ i ), where υ i {r, 1}, p(k 1 = r) = p(k 1 = 1) = 1/2, r = Var spike (β θ)/ Var slab (β θ) 1 and p(τ 2 θ) is one of priors from the previous Table. Markov switching process for K t That is, K t is a binary random latent variable that can assume binary values (regimes) υ 0 = r or υ 1 = 1, depending only on the previous value of K t 1 and the prior transition probabilities {ω 0,0 ; ω 0,1 ; ω 1,1 ; ω 1,0 }. 13

Our contribution: direct acyclic graph 14