Online Appendix To: Bayesian Doubly Adaptive Elastic-Net Lasso For VAR Shrinkage

Deborah Gefang
Department of Economics
University of Lancaster
email: d.gefang@lancaster.ac.uk

April 7, 2013

I would like to thank Gary Koop, Esther Ruiz and two anonymous referees for their constructive comments. I would also like to thank the conference participants of CFE, ESEM2012, and RCEF2012 for helpful discussions. Any remaining errors are my own responsibility.

1 Technical Details for Models Nested in DAELasso

This section presents the priors, posteriors, and full conditional Gibbs schemes for Lasso, adaptive Lasso, e-net Lasso, and adaptive e-net Lasso.

1.1 Lasso VAR Shrinkage

Following Song and Bickel (2011), we define the Lasso estimator for a VAR as:

β̂_L = arg min_β { [y − (I_N ⊗ X)β]′[y − (I_N ⊗ X)β] + λ_1 ∑_{j=1}^{N²k} |β_j| }   (1)

Correspondingly, the conditional multivariate mixture prior for β takes the following form:

π(β | Σ, Γ, λ_1) ∝ ∏_{j=1}^{N²k} { ∫_0^∞ (2π f_j(Γ))^{−1/2} exp[−β_j²/(2 f_j(Γ))] d(f_j(Γ)) } · |M|^{−1/2} exp(−(1/2) Γ′M⁻¹Γ)   (2)

where Γ = [γ_1, γ_2, …, γ_{N²k}]′, M = Σ ⊗ I_{Nk}, and f_j(Γ) is a function of Γ and Λ_1 to be defined later. In this mixture prior, the terms associated with the L_1 penalty are conditional on Σ through f_j(Γ). In equation (2), the variances of β_a and β_b for a ≠ b are related through M; however, β_a and β_b themselves are independent of each other. We need to find an appropriate f_j(Γ) that provides us with tractable posteriors.

The last term in equation (2) takes the form of a multivariate Normal distribution, Γ ~ N(0, M). For ease of exposition, we first write the N²k × N²k covariance matrix M as follows:

M = [ M_{1,1}    …  M_{1,j}    M_{1,j+1}    …  M_{1,N²k}
      …
      M_{j,1}    …  M_{j,j}    M_{j,j+1}    …  M_{j,N²k}
      M_{j+1,1}  …  M_{j+1,j}  M_{j+1,j+1}  …  M_{j+1,N²k}
      …
      M_{N²k,1}  …  M_{N²k,j}  M_{N²k,j+1}  …  M_{N²k,N²k} ]   (3)

Let H_j = (M_{j,j+1}, …, M_{j,N²k}) [ M_{j+1,j+1}  …  M_{j+1,N²k}
                                      …
                                      M_{N²k,j+1}  …  M_{N²k,N²k} ]⁻¹

We next construct independent variables τ_j for j = 1, 2, …, N²k using standard textbook techniques (e.g. Anderson, 2003; Muirhead, 1982):

τ_1 = γ_1 − H_1 (γ_2, γ_3, …, γ_{N²k})′   (4)
τ_2 = γ_2 − H_2 (γ_3, γ_4, …, γ_{N²k})′   (5)
…
τ_{N²k−1} = γ_{N²k−1} − H_{N²k−1} γ_{N²k}   (6)
τ_{N²k} = γ_{N²k}   (7)

The joint density of τ_1, τ_2, …, τ_{N²k} is

N(τ_1 | 0, σ²_{γ_1}) N(τ_2 | 0, σ²_{γ_2}) ⋯ N(τ_{N²k} | 0, σ²_{γ_{N²k}})   (8)

where σ²_{γ_j} = M_{j,j} − H_j (M_{j,j+1}, …, M_{j,N²k})′, with σ²_{γ_{N²k}} = M_{N²k,N²k}. Note that it is computationally feasible to derive σ²_{γ_j} when M is sparse. The Jacobian of transforming Γ ~ N(0, M) to (8) is 1. Defining η_j = τ_j/λ_1, we can write (8) as

N(η_1 | 0, σ²_{γ_1}/λ_1²) N(η_2 | 0, σ²_{γ_2}/λ_1²) ⋯ N(η_{N²k} | 0, σ²_{γ_{N²k}}/λ_1²)   (9)

Let f_j(Γ) = 2η_j²; the scale mixture prior is:

π(β | Σ, Γ, λ_1) ∝ ∏_{j=1}^{N²k} { ∫_0^∞ (2π(2η_j²))^{−1/2} exp[−β_j²/(2(2η_j²))] · (λ_1²/(2σ²_{γ_j})) exp[−(2η_j²)/(2σ²_{γ_j}/λ_1²)] d(2η_j²) }   (10)

The last two terms in (10) constitute a scale mixture of Normals (with an exponential mixing density), which can be expressed as the univariate Laplace distribution (λ_1/(2√σ²_{γ_j})) exp(−λ_1|β_j|/√σ²_{γ_j}).

Equation (10) shows that the conditional prior for β_j is N(0, 2η_j²), and the conditional prior for β is

β | Γ, Σ, Λ_1, Λ_2 ~ N(0, D_Γ)   (11)

where D_Γ = diag([2η_1², 2η_2², …, 2η²_{N²k}]).
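To see the construction in (4)-(8) and the mixture representation in (10) at work, the following is a minimal numerical sketch in Python. It uses a small toy covariance matrix in place of Σ ⊗ I_{Nk}, assumes the standard regression-decomposition convention for H_j and τ_j, and all variable names are ours rather than the paper's.

```python
# Part 1: check that the tau_j in (4)-(7) are independent with the
# variances sigma^2_{gamma_j} in (8), on a toy SPD matrix M.
import numpy as np

rng = np.random.default_rng(0)
d = 4
A = rng.standard_normal((d, d))
M = A @ A.T + d * np.eye(d)              # toy stand-in for Sigma kron I_{Nk}

# H_j = M_{j,(j+1):} [M_{(j+1):,(j+1):}]^{-1}; M is symmetric, so solve works
H = [np.linalg.solve(M[j+1:, j+1:], M[j, j+1:]) for j in range(d - 1)]
# sigma^2_{gamma_j} = M_{j,j} - H_j M_{(j+1):,j}; the last one is M_{d,d}
sig2 = [M[j, j] - H[j] @ M[j, j+1:] for j in range(d - 1)] + [M[-1, -1]]

G = rng.multivariate_normal(np.zeros(d), M, size=100_000)
tau = np.column_stack([G[:, j] - G[:, j+1:] @ H[j] for j in range(d - 1)]
                      + [G[:, -1]])
print(np.round(np.cov(tau.T), 2))        # ~ diag(sig2): off-diagonals ~ 0
print(np.round(sig2, 2))

# Part 2: check the scale mixture in (10): beta_j | 2eta_j^2 ~ N(0, 2eta_j^2)
# with 2eta_j^2 ~ Exponential(mean 2 sigma^2/lambda_1^2) is marginally
# Laplace, with E|beta_j| = sigma/lambda_1 and Var(beta_j) = 2 sigma^2/lambda_1^2.
lam1, sig = 2.0, 1.5
s = rng.exponential(2 * sig**2 / lam1**2, size=200_000)
b = rng.normal(0.0, np.sqrt(s))
print(np.round([np.mean(np.abs(b)), sig / lam1], 3))     # ~ equal
print(np.round([np.var(b), 2 * sig**2 / lam1**2], 3))    # ~ equal
```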

Priors for Σ and λ_1² can be elicited following standard practice in the VAR and Lasso literature. In this paper, we use a Wishart prior for Σ⁻¹ and a Gamma prior for λ_1²: Σ⁻¹ ~ W(S⁻¹, ν), λ_1² ~ G(μ_{λ_1²}, ν_{λ_1²}).

The full conditional posterior for β is β ~ N(β̄, V̄_β), where V̄_β = [(I_N ⊗ X)′(Σ⁻¹ ⊗ I_T)(I_N ⊗ X) + (D_Γ)⁻¹]⁻¹ and β̄ = V̄_β[(I_N ⊗ X)′(Σ⁻¹ ⊗ I_T)y]. The full conditional posterior for Σ⁻¹ is W(S̄⁻¹, ν̄), with S̄ = (Y − XB)′(Y − XB) + 2Q′Q + S and ν̄ = T + 2Nk + ν, where vec(Q) = Γ. The full conditional posterior for λ_1² is G(μ̄_{λ_1}, ν̄_{λ_1}), where ν̄_{λ_1} = ν_{λ_1} + 2N²k and μ̄_{λ_1} = ν̄_{λ_1} μ_{λ_1} / (ν_{λ_1} + 2μ_{λ_1} ∑_{j=1}^{N²k} τ_j²/σ²_{γ_j}). Finally, the full conditional posterior of 1/(2η_j²) is Inverse-Gaussian: IG(√(λ_1²/(σ²_{γ_j} β_j²)), λ_1²/σ²_{γ_j}); we adopt the same form of the inverse-Gaussian density as used in Park and Casella (2008). Γ cannot be drawn directly from the posteriors, but it can be recovered in each Gibbs iteration using the draws of 1/(2η_j²) and Σ.

Conditional on arbitrary starting values, the Gibbs sampler contains the following five steps:

1. draw β | Σ, λ_1, Γ from N(β̄, V̄_β);
2. draw Σ⁻¹ | β, λ_1, Γ from W(S̄⁻¹, ν̄);
3. draw λ_1² | Σ, β, Γ from G(μ̄_{λ_1}, ν̄_{λ_1});
4. draw 1/(2η_j²) | β, Σ, λ_1 from IG(√(λ_1²/(σ²_{γ_j} β_j²)), λ_1²/σ²_{γ_j}) for j = 1, 2, …, N²k;
5. calculate Γ based on the draws of Σ and 1/(2η_j²) in the current iteration.
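To make these conditional draws concrete, here is a minimal Python sketch of the Lasso-specific updates (steps 1, 3 and 4), assuming the σ²_{γ_j} have already been computed from M. The Gamma draw uses Koop's mean/degrees-of-freedom parameterization, and the names Z (for I_N ⊗ X), W (for Σ⁻¹ ⊗ I_T) and the draw_* functions are illustrative assumptions, not the code behind the paper's results.

```python
# Sketch of Gibbs steps 1, 3 and 4 for the Lasso VAR (illustrative only).
import numpy as np
from scipy.stats import invgauss

def draw_beta(y, Z, W, d_gamma, rng):
    """Step 1: beta ~ N(beta_bar, V_bar) with
    V_bar = (Z'WZ + D_Gamma^{-1})^{-1}, beta_bar = V_bar Z'W y."""
    V = np.linalg.inv(Z.T @ W @ Z + np.diag(1.0 / d_gamma))
    return rng.multivariate_normal(V @ (Z.T @ W @ y), V)

def draw_lambda1_sq(tau, sig2_g, mu0, nu0, rng):
    """Step 3: lambda_1^2 ~ G(mu_bar, nu_bar); a G(mu, nu) variate is
    Gamma(shape=nu/2, scale=2*mu/nu)."""
    nu_bar = nu0 + 2 * tau.size
    mu_bar = nu_bar * mu0 / (nu0 + 2 * mu0 * np.sum(tau**2 / sig2_g))
    return rng.gamma(shape=nu_bar / 2.0, scale=2.0 * mu_bar / nu_bar)

def draw_inv_2eta2(beta, lam1_sq, sig2_g, rng):
    """Step 4: 1/(2 eta_j^2) ~ IG(mean_j, shape_j) elementwise; scipy's
    invgauss(mu, scale=s) has mean mu*s and shape s, hence mu = mean/shape."""
    shape = lam1_sq / sig2_g
    mean = np.sqrt(lam1_sq / (sig2_g * beta**2))
    return invgauss.rvs(mean / shape, scale=shape, random_state=rng)

# toy call with N^2 k = 3 coefficients
rng = np.random.default_rng(1)
print(draw_inv_2eta2(np.array([0.2, -0.4, 0.1]), 4.0, np.ones(3), rng))
```

Steps 2 and 5 are the standard Wishart update for Σ⁻¹ and the reconstruction of Γ from the τ_j equations above.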

1.2 Adaptive Lasso VAR Shrinkage

We define the adaptive Lasso estimator for a VAR as:

β̂_{AL} = arg min_β { [y − (I_N ⊗ X)β]′[y − (I_N ⊗ X)β] + ∑_{j=1}^{N²k} λ_{1,j} |β_j| }   (12)

Correspondingly, the conditional multivariate mixture prior for β takes the following form:

π(β | Σ, Γ, Λ_1) ∝ ∏_{j=1}^{N²k} { ∫_0^∞ (2π f_j(Γ))^{−1/2} exp[−β_j²/(2 f_j(Γ))] d(f_j(Γ)) } · |M|^{−1/2} exp(−(1/2) Γ′M⁻¹Γ)   (13)

where Γ = [γ_1, γ_2, …, γ_{N²k}]′, M = Σ ⊗ I_{Nk}, and f_j(Γ) is a function of Γ and Λ_1 to be defined later. In this mixture prior, the terms associated with the L_1 penalty are conditional on Σ through f_j(Γ). In equation (13), the variances of β_a and β_b for a ≠ b are related through M; however, β_a and β_b themselves are independent of each other. We need to find an appropriate f_j(Γ) that provides us with tractable posteriors.

The last term in equation (13) takes the form of a multivariate Normal distribution, Γ ~ N(0, M). For ease of exposition, we first write the N²k × N²k covariance matrix M as follows:

M = [ M_{1,1}    …  M_{1,j}    M_{1,j+1}    …  M_{1,N²k}
      …
      M_{j,1}    …  M_{j,j}    M_{j,j+1}    …  M_{j,N²k}
      M_{j+1,1}  …  M_{j+1,j}  M_{j+1,j+1}  …  M_{j+1,N²k}
      …
      M_{N²k,1}  …  M_{N²k,j}  M_{N²k,j+1}  …  M_{N²k,N²k} ]   (14)

Let H_j = (M_{j,j+1}, …, M_{j,N²k}) [ M_{j+1,j+1}  …  M_{j+1,N²k}
                                      …
                                      M_{N²k,j+1}  …  M_{N²k,N²k} ]⁻¹

We next construct independent variables τ_j for j = 1, 2, …, N²k using standard textbook techniques (e.g. Anderson, 2003; Muirhead, 1982):

τ_1 = γ_1 − H_1 (γ_2, γ_3, …, γ_{N²k})′   (15)
τ_2 = γ_2 − H_2 (γ_3, γ_4, …, γ_{N²k})′   (16)
…
τ_{N²k−1} = γ_{N²k−1} − H_{N²k−1} γ_{N²k}   (17)
τ_{N²k} = γ_{N²k}   (18)

The joint density of τ_1, τ_2, …, τ_{N²k} is

N(τ_1 | 0, σ²_{γ_1}) N(τ_2 | 0, σ²_{γ_2}) ⋯ N(τ_{N²k} | 0, σ²_{γ_{N²k}})   (19)

where σ²_{γ_j} = M_{j,j} − H_j (M_{j,j+1}, …, M_{j,N²k})′, with σ²_{γ_{N²k}} = M_{N²k,N²k}. Note that it is computationally feasible to derive σ²_{γ_j} when M is sparse. The Jacobian of transforming Γ ~ N(0, M) to (19) is 1. Defining η_j = τ_j/λ_{1,j}, we can write (19) as

N(η_1 | 0, σ²_{γ_1}/λ_{1,1}²) N(η_2 | 0, σ²_{γ_2}/λ_{1,2}²) ⋯ N(η_{N²k} | 0, σ²_{γ_{N²k}}/λ_{1,N²k}²)   (20)

Let f_j(Γ) = 2η_j²; the scale mixture prior is:

π(β | Σ, Γ, Λ_1) ∝ ∏_{j=1}^{N²k} { ∫_0^∞ (2π(2η_j²))^{−1/2} exp[−β_j²/(2(2η_j²))] · (λ_{1,j}²/(2σ²_{γ_j})) exp[−(2η_j²)/(2σ²_{γ_j}/λ_{1,j}²)] d(2η_j²) }   (21)

Equation (21) shows that the conditional prior for β_j is N(0, 2η_j²), and the conditional prior for β is

β | Γ, Σ, Λ_1, Λ_2 ~ N(0, D_Γ)   (22)

where D_Γ = diag([2η_1², 2η_2², …, 2η²_{N²k}]).

Priors for Σ and the λ_{1,j}² can be elicited following standard practice in the VAR and Lasso literature. In this paper, we use a Wishart prior for Σ⁻¹ and Gamma priors for the λ_{1,j}²: Σ⁻¹ ~ W(S⁻¹, ν), λ_{1,j}² ~ G(μ_{λ_{1,j}²}, ν_{λ_{1,j}²}).

The full conditional posterior for β is β ~ N(β̄, V̄_β), where V̄_β = [(I_N ⊗ X)′(Σ⁻¹ ⊗ I_T)(I_N ⊗ X) + (D_Γ)⁻¹]⁻¹ and β̄ = V̄_β[(I_N ⊗ X)′(Σ⁻¹ ⊗ I_T)y]. The full conditional posterior for Σ⁻¹ is W(S̄⁻¹, ν̄), with S̄ = (Y − XB)′(Y − XB) + 2Q′Q + S and ν̄ = T + 2Nk + ν, where vec(Q) = Γ. The full conditional posterior for λ_{1,j}² is G(μ̄_{λ_{1,j}}, ν̄_{λ_{1,j}}), where ν̄_{λ_{1,j}} = ν_{λ_{1,j}} + 2 and μ̄_{λ_{1,j}} = ν̄_{λ_{1,j}} σ²_{γ_j} μ_{λ_{1,j}} / (2τ_j² μ_{λ_{1,j}} + ν_{λ_{1,j}} σ²_{γ_j}). Finally, the full conditional posterior of 1/(2η_j²) is Inverse-Gaussian: IG(√(λ_{1,j}²/(σ²_{γ_j} β_j²)), λ_{1,j}²/σ²_{γ_j}). Γ cannot be drawn directly from the posteriors, but it can be recovered in each Gibbs iteration using the draws of 1/(2η_j²) and Σ.

Conditional on arbitrary starting values, the Gibbs sampler contains the following five steps:

1. draw β | Σ, Λ_1, Γ from N(β̄, V̄_β);
2. draw Σ⁻¹ | β, Λ_1, Γ from W(S̄⁻¹, ν̄);
3. draw λ_{1,j}² | β, Σ, Λ_{1,−j}, Γ from G(μ̄_{λ_{1,j}}, ν̄_{λ_{1,j}}) for j = 1, 2, …, N²k;
4. draw 1/(2η_j²) | β, Σ, Λ_1 from IG(√(λ_{1,j}²/(σ²_{γ_j} β_j²)), λ_{1,j}²/σ²_{γ_j}) for j = 1, 2, …, N²k;
5. calculate Γ based on the draws of Σ and 1/(2η_j²) in the current iteration.

1.3 E-net Lasso VAR Shrinkage

We define the e-net Lasso estimator for a VAR as:

β̂_{EL} = arg min_β { [y − (I_N ⊗ X)β]′[y − (I_N ⊗ X)β] + λ_1 ∑_{j=1}^{N²k} |β_j| + λ_2 ∑_{j=1}^{N²k} β_j² }   (23)

Correspondingly, the conditional multivariate mixture prior for β takes the following form:

π(β | Σ, Γ, λ_1, λ_2) ∝ ∏_{j=1}^{N²k} { √(λ_2/(2π)) exp(−(λ_2/2)β_j²) ∫_0^∞ (2π f_j(Γ))^{−1/2} exp[−β_j²/(2 f_j(Γ))] d(f_j(Γ)) } · |M|^{−1/2} exp(−(1/2) Γ′M⁻¹Γ)   (24)

where Γ = [γ_1, γ_2, …, γ_{N²k}]′, M = Σ ⊗ I_{Nk}, and f_j(Γ) is a function of Γ and Λ_1 to be defined later. In this mixture prior, the terms associated with the L_1 penalty are conditional on Σ through f_j(Γ). In equation (24), the variances of β_a and β_b for a ≠ b are related through M; however, β_a and β_b themselves are independent of each other. We need to find an appropriate f_j(Γ) that provides us with tractable posteriors.

The last term in equation (24) takes the form of a multivariate Normal distribution, Γ ~ N(0, M). For ease of exposition, we first write the N²k × N²k covariance matrix M as follows:

M = [ M_{1,1}    …  M_{1,j}    M_{1,j+1}    …  M_{1,N²k}
      …
      M_{j,1}    …  M_{j,j}    M_{j,j+1}    …  M_{j,N²k}
      M_{j+1,1}  …  M_{j+1,j}  M_{j+1,j+1}  …  M_{j+1,N²k}
      …
      M_{N²k,1}  …  M_{N²k,j}  M_{N²k,j+1}  …  M_{N²k,N²k} ]   (25)

Let H_j = (M_{j,j+1}, …, M_{j,N²k}) [ M_{j+1,j+1}  …  M_{j+1,N²k}
                                      …
                                      M_{N²k,j+1}  …  M_{N²k,N²k} ]⁻¹

We next construct independent variables τ_j for j = 1, 2, …, N²k using standard textbook techniques (e.g. Anderson, 2003; Muirhead, 1982):

τ_1 = γ_1 − H_1 (γ_2, γ_3, …, γ_{N²k})′   (26)
τ_2 = γ_2 − H_2 (γ_3, γ_4, …, γ_{N²k})′   (27)
…
τ_{N²k−1} = γ_{N²k−1} − H_{N²k−1} γ_{N²k}   (28)
τ_{N²k} = γ_{N²k}   (29)

The joint density of τ_1, τ_2, …, τ_{N²k} is

N(τ_1 | 0, σ²_{γ_1}) N(τ_2 | 0, σ²_{γ_2}) ⋯ N(τ_{N²k} | 0, σ²_{γ_{N²k}})   (30)

where σ²_{γ_j} = M_{j,j} − H_j (M_{j,j+1}, …, M_{j,N²k})′, with σ²_{γ_{N²k}} = M_{N²k,N²k}. Note that it is computationally feasible to derive σ²_{γ_j} when M is sparse. The Jacobian of transforming Γ ~ N(0, M) to (30) is 1. Defining η_j = τ_j/λ_1, we can write (30) as

N(η_1 | 0, σ²_{γ_1}/λ_1²) N(η_2 | 0, σ²_{γ_2}/λ_1²) ⋯ N(η_{N²k} | 0, σ²_{γ_{N²k}}/λ_1²)   (31)

Let f_j(Γ) = 2η_j²; the scale mixture prior is:

π(β | Σ, Γ, λ_1, λ_2) ∝ ∏_{j=1}^{N²k} { √(λ_2/(2π)) exp(−(λ_2/2)β_j²) ∫_0^∞ (2π(2η_j²))^{−1/2} exp[−β_j²/(2(2η_j²))] · (λ_1²/(2σ²_{γ_j})) exp[−(2η_j²)/(2σ²_{γ_j}/λ_1²)] d(2η_j²) }   (32)

where η_j = τ_j/λ_1. The last two terms in (32) constitute a scale mixture of Normals (with an exponential mixing density), which can be expressed as the univariate Laplace distribution (λ_1/(2√σ²_{γ_j})) exp(−λ_1|β_j|/√σ²_{γ_j}).

Equation (32) shows that the conditional prior for β_j is N(0, 2η_j²/(2λ_2η_j² + 1)), and the conditional prior for β is

β | Γ, Σ, Λ_1, Λ_2 ~ N(0, D_Γ)   (33)

where D_Γ = diag([2η_1²/(2λ_2η_1² + 1), 2η_2²/(2λ_2η_2² + 1), …, 2η²_{N²k}/(2λ_2η²_{N²k} + 1)]).

Priors for Σ, λ_1² and λ_2 can be elicited following standard practice in the VAR and Lasso literature. In this paper, we use a Wishart prior for Σ⁻¹ and Gamma priors for λ_1² and λ_2: Σ⁻¹ ~ W(S⁻¹, ν), λ_1² ~ G(μ_{λ_1²}, ν_{λ_1²}), λ_2 ~ G(μ_{λ_2}, ν_{λ_2}).

The full conditional posterior for β is β ~ N(β̄, V̄_β), where V̄_β = [(I_N ⊗ X)′(Σ⁻¹ ⊗ I_T)(I_N ⊗ X) + (D_Γ)⁻¹]⁻¹ and β̄ = V̄_β[(I_N ⊗ X)′(Σ⁻¹ ⊗ I_T)y]. The full conditional posterior for Σ⁻¹ is W(S̄⁻¹, ν̄), with S̄ = (Y − XB)′(Y − XB) + 2Q′Q + S and ν̄ = T + 2Nk + ν, where vec(Q) = Γ. The full conditional posterior for λ_1² is G(μ̄_{λ_1}, ν̄_{λ_1}), where ν̄_{λ_1} = ν_{λ_1} + 2N²k and μ̄_{λ_1} = ν̄_{λ_1} μ_{λ_1} / (ν_{λ_1} + 2μ_{λ_1} ∑_{j=1}^{N²k} τ_j²/σ²_{γ_j}). The full conditional posterior for λ_2 is G(μ̄_{λ_2}, ν̄_{λ_2}), where ν̄_{λ_2} = ν_{λ_2} + N²k and μ̄_{λ_2} = ν̄_{λ_2} μ_{λ_2} / (ν_{λ_2} + μ_{λ_2} ∑_{j=1}^{N²k} β_j²). Finally, the full conditional posterior of 1/(2η_j²) is Inverse-Gaussian: IG(√(λ_1²/(σ²_{γ_j} β_j²)), λ_1²/σ²_{γ_j}). Γ cannot be drawn directly from the posteriors, but it can be recovered in each Gibbs iteration using the draws of 1/(2η_j²) and Σ.

Conditional on arbitrary starting values, the Gibbs sampler contains the following six steps:

1. draw β | Σ, Λ_1, Λ_2, Γ from N(β̄, V̄_β);
2. draw Σ⁻¹ | β, Λ_1, Λ_2, Γ from W(S̄⁻¹, ν̄);
3. draw λ_1² | β, Σ, Λ_2, Γ from G(μ̄_{λ_1}, ν̄_{λ_1});
4. draw λ_2 | β, Σ, Λ_1, Γ from G(μ̄_{λ_2}, ν̄_{λ_2});
5. draw 1/(2η_j²) | β, Σ, Λ_1, Λ_2 from IG(√(λ_1²/(σ²_{γ_j} β_j²)), λ_1²/σ²_{γ_j}) for j = 1, 2, …, N²k;
6. calculate Γ based on the draws of Σ and 1/(2η_j²) in the current iteration.

1.4 Adaptive E-net Lasso VAR Shrinkage

In line with Zou and Zhang (2009), we define the adaptive e-net Lasso estimator for a VAR as follows:

β̂_{AEL} = arg min_β { [y − (I_N ⊗ X)β]′[y − (I_N ⊗ X)β] + ∑_{j=1}^{N²k} λ_{1,j} |β_j| + λ_2 ∑_{j=1}^{N²k} β_j² }   (34)

Correspondingly, the conditional multivariate mixture prior for β takes the following form:

π(β | Σ, Γ, Λ_1, λ_2) ∝ ∏_{j=1}^{N²k} { √(λ_2/(2π)) exp(−(λ_2/2)β_j²) ∫_0^∞ (2π f_j(Γ))^{−1/2} exp[−β_j²/(2 f_j(Γ))] d(f_j(Γ)) } · |M|^{−1/2} exp(−(1/2) Γ′M⁻¹Γ)   (35)

where Γ = [γ_1, γ_2, …, γ_{N²k}]′, M = Σ ⊗ I_{Nk}, and f_j(Γ) is a function of Γ and Λ_1 to be defined later. In this mixture prior, the terms associated with the L_1 penalty are conditional on Σ through f_j(Γ). We need to find an appropriate f_j(Γ) that provides us with tractable posteriors.

The last term in equation (35) takes the form of a multivariate Normal distribution, Γ ~ N(0, M). For ease of exposition, we first write the N²k × N²k covariance matrix M as follows:

M = [ M_{1,1}    …  M_{1,j}    M_{1,j+1}    …  M_{1,N²k}
      …
      M_{j,1}    …  M_{j,j}    M_{j,j+1}    …  M_{j,N²k}
      M_{j+1,1}  …  M_{j+1,j}  M_{j+1,j+1}  …  M_{j+1,N²k}
      …
      M_{N²k,1}  …  M_{N²k,j}  M_{N²k,j+1}  …  M_{N²k,N²k} ]   (36)

Let H_j = (M_{j,j+1}, …, M_{j,N²k}) [ M_{j+1,j+1}  …  M_{j+1,N²k}
                                      …
                                      M_{N²k,j+1}  …  M_{N²k,N²k} ]⁻¹

We next construct independent variables τ_j for j = 1, 2, …, N²k using standard textbook techniques (e.g. Anderson, 2003; Muirhead, 1982):

τ_1 = γ_1 − H_1 (γ_2, γ_3, …, γ_{N²k})′   (37)
τ_2 = γ_2 − H_2 (γ_3, γ_4, …, γ_{N²k})′   (38)
…
τ_{N²k−1} = γ_{N²k−1} − H_{N²k−1} γ_{N²k}   (39)
τ_{N²k} = γ_{N²k}   (40)

The joint density of τ_1, τ_2, …, τ_{N²k} is

N(τ_1 | 0, σ²_{γ_1}) N(τ_2 | 0, σ²_{γ_2}) ⋯ N(τ_{N²k} | 0, σ²_{γ_{N²k}})   (41)

where σ²_{γ_j} = M_{j,j} − H_j (M_{j,j+1}, …, M_{j,N²k})′, with σ²_{γ_{N²k}} = M_{N²k,N²k}. Note that it is computationally feasible to derive σ²_{γ_j} when M is sparse. The Jacobian of transforming Γ ~ N(0, M) to (41) is 1. Defining η_j = τ_j/λ_{1,j}, we can write (41) as

N(η_1 | 0, σ²_{γ_1}/λ_{1,1}²) N(η_2 | 0, σ²_{γ_2}/λ_{1,2}²) ⋯ N(η_{N²k} | 0, σ²_{γ_{N²k}}/λ_{1,N²k}²)   (42)

Let f_j(Γ) = 2η_j². The scale mixture prior in (35) can be rewritten as:

π(β | Σ, Γ, Λ_1, λ_2) ∝ ∏_{j=1}^{N²k} { √(λ_2/(2π)) exp(−(λ_2/2)β_j²) ∫_0^∞ (2π(2η_j²))^{−1/2} exp[−β_j²/(2(2η_j²))] · (λ_{1,j}²/(2σ²_{γ_j})) exp[−(2η_j²)/(2σ²_{γ_j}/λ_{1,j}²)] d(2η_j²) }   (43)

The last two terms in (43) constitute a scale mixture of Normals (with an exponential mixing density), which can be expressed as the univariate Laplace distribution (λ_{1,j}/(2√σ²_{γ_j})) exp(−λ_{1,j}|β_j|/√σ²_{γ_j}).

Equation (43) shows that the conditional prior for β_j is N(0, 2η_j²/(2λ_2η_j² + 1)), and the conditional prior for β is

β | Γ, Σ, Λ_1, Λ_2 ~ N(0, D_Γ)   (44)

where D_Γ = diag([2η_1²/(2λ_2η_1² + 1), 2η_2²/(2λ_2η_2² + 1), …, 2η²_{N²k}/(2λ_2η²_{N²k} + 1)]).

Priors for Σ, the λ_{1,j}² and λ_2 can be elicited following standard practice in the VAR and Lasso literature. In this paper, we use a Wishart prior for Σ⁻¹ and Gamma priors for the λ_{1,j}² and λ_2: Σ⁻¹ ~ W(S⁻¹, ν), λ_{1,j}² ~ G(μ_{λ_{1,j}²}, ν_{λ_{1,j}²}), λ_2 ~ G(μ_{λ_2}, ν_{λ_2}).

The full conditional posterior for β is β ~ N(β̄, V̄_β), where V̄_β = [(I_N ⊗ X)′(Σ⁻¹ ⊗ I_T)(I_N ⊗ X) + (D_Γ)⁻¹]⁻¹ and β̄ = V̄_β[(I_N ⊗ X)′(Σ⁻¹ ⊗ I_T)y]. The full conditional posterior for Σ⁻¹ is W(S̄⁻¹, ν̄), with S̄ = (Y − XB)′(Y − XB) + 2Q′Q + S and ν̄ = T + 2Nk + ν, where vec(Q) = Γ. The full conditional posterior for λ_{1,j}² is G(μ̄_{λ_{1,j}}, ν̄_{λ_{1,j}}), where ν̄_{λ_{1,j}} = ν_{λ_{1,j}} + 2 and μ̄_{λ_{1,j}} = ν̄_{λ_{1,j}} σ²_{γ_j} μ_{λ_{1,j}} / (2τ_j² μ_{λ_{1,j}} + ν_{λ_{1,j}} σ²_{γ_j}). The full conditional posterior for λ_2 is G(μ̄_{λ_2}, ν̄_{λ_2}), where ν̄_{λ_2} = ν_{λ_2} + N²k and μ̄_{λ_2} = ν̄_{λ_2} μ_{λ_2} / (ν_{λ_2} + μ_{λ_2} ∑_{j=1}^{N²k} β_j²). Finally, the full conditional posterior of 1/(2η_j²) is Inverse-Gaussian: IG(√(λ_{1,j}²/(σ²_{γ_j} β_j²)), λ_{1,j}²/σ²_{γ_j}). Γ cannot be drawn directly from the posteriors, but it can be recovered in each Gibbs iteration using the draws of 1/(2η_j²) and Σ.

Conditional on arbitrary starting values, the Gibbs sampler contains the following six steps:

1. draw β | Σ, Λ_1, Λ_2, Γ from N(β̄, V̄_β);
2. draw Σ⁻¹ | β, Λ_1, Λ_2, Γ from W(S̄⁻¹, ν̄);
3. draw λ_{1,j}² | β, Σ, Λ_{1,−j}, Λ_2, Γ from G(μ̄_{λ_{1,j}}, ν̄_{λ_{1,j}}) for j = 1, 2, …, N²k;
4. draw λ_2 | β, Σ, Λ_1, Γ from G(μ̄_{λ_2}, ν̄_{λ_2});
5. draw 1/(2η_j²) | β, Σ, Λ_1, Λ_2 from IG(√(λ_{1,j}²/(σ²_{γ_j} β_j²)), λ_{1,j}²/σ²_{γ_j}) for j = 1, 2, …, N²k;
6. calculate Γ based on the draws of Σ and 1/(2η_j²) in the current iteration.

2 Detailed Forecast Evaluation Results

Tables 1-4 report the DAELasso forecast results along with those of Lasso, adaptive Lasso, e-net Lasso, and adaptive e-net Lasso, as well as those of the factor models and the seven popular Bayesian shrinkage priors in Koop (2011). In line with Koop (2011), we present MSFEs relative to the random walk and sums of log predictive likelihoods for GDP, CPI and FFR. The results for DAELasso and the four other Lasso-type shrinkage methods are reported at the top of each table, followed by those of the methods reported in Koop (2011).

Koop (2011) considers three variants of the Minnesota prior. The first is the natural conjugate prior used in Banbura et al (2010), which is labelled "Minn. Prior as in BGR". The second is the traditional Minnesota prior of Litterman (1986), which is labelled "Minn. Prior Σ diagonal". The third is the traditional Minnesota prior except that the upper-left 3 × 3 block of Σ is not assumed to be diagonal, which is labelled "Minn. Prior Σ not diagonal". Koop (2011) also evaluates the performance of four types of SSVS priors. The first is the same as in George et al (2008), which is labelled "SSVS Non-conj. semi-automatic". The second is a combination of the non-conjugate SSVS prior and the Minnesota prior with variables selected in a data-based fashion, which is labelled "SSVS Non-conj. plus Minn. Prior". The third is a conjugate SSVS prior, which is labelled "SSVS Conjugate semi-automatic". The fourth is a combination of the conjugate SSVS prior and the Minnesota prior, which is labelled "SSVS Conjugate plus Minn. Prior". Finally, the results for factor-augmented VAR models with one and four lagged factors are labelled "Factor model p=1" and "Factor model p=4", respectively. We refer to Koop (2011) for a lucid description of these priors.
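For reference, the sketch below shows schematically how the two reported statistics are computed for a single variable: the MSFE as a proportion of the random walk MSFE, and the sum of log predictive likelihoods. All inputs are placeholder arrays, and the Normal predictive density is an assumption made purely for illustration; in the tables these quantities come from each model's own predictive densities.

```python
# Schematic computation of the two statistics reported in Tables 1-4
# (relative MSFE and sum of log predictive likelihoods); toy inputs only.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
T = 80
actual = rng.standard_normal(T).cumsum()        # placeholder realized series
pred_mean = actual + rng.normal(0.0, 0.5, T)    # placeholder point forecasts
pred_sd = np.full(T, 0.6)                       # placeholder predictive sds
rw_pred = np.roll(actual, 1)                    # random-walk forecast y_{t-1}

rel_msfe = (np.mean((actual[1:] - pred_mean[1:]) ** 2)
            / np.mean((actual[1:] - rw_pred[1:]) ** 2))
sum_log_pl = norm.logpdf(actual[1:], pred_mean[1:], pred_sd[1:]).sum()
print(f"{rel_msfe:.2f} ({sum_log_pl:.1f})")     # formatted like a table cell
```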

Table 1: Rolling Forecasting for h = 1

                                   GDP             CPI             FFR
DAELasso                           0.58 (-98.9)    0.32 (-92.7)    0.57 (-2.7)
adaptive e-net Lasso               0.67 (-95.8)    0.40 (-99.4)    0.63 (-25.0)
e-net Lasso                        0.68 (-25.3)    0.40 (-2.6)     0.63 (-223.7)
adaptive Lasso                     0.77 (-225.6)   0.3 (-209.2)    0.62 (-228.3)
Lasso                              0.67 (-255.8)   0.39 (-24.3)    0.63 (-257.6)
Minn. Prior as in BGR              0.58 (-90.5)    0.34 (-209.2)   0.5 (-77.4)
Minn. Prior Σ diagonal             0.6 (-94.0)     0.30 (-93.0)    0.52 (-8.7)
Minn. Prior Σ not diagonal         0.6 (-92.)      0.3 (-202.4)    0.53 (-85.9)
SSVS Conjugate semi-automatic      0.8 (-209.4)    0.38 (-23.8)    0.63 (-75.8)
SSVS Conjugate plus Minn. Prior    0.59 (-9.4)     0.35 (-22.)     0.5 (-79.2)
SSVS Non-conj. semi-automatic      0.88 (-234.3)   0.47 (-236.0)   0.73 (-23.0)
SSVS Non-conj. plus Minn. Prior    0.68 (-97.9)    0.34 (-95.2)    0.52 (-77.2)
Factor model p=1                   .2 (-252.8)     0.59 (-242.7)   .42 (-236.4)
Factor model p=4                   4.46 (-40.7)    .88 (-457.0)    2.88 (-352.7)

Notes: MSFEs as proportion of random walk MSFEs. Sum of log predictive likelihoods in parentheses.

Table 2: Rolling Forecasting for h = 4

                                   GDP             CPI             FFR
DAELasso                           0.55 (-206.9)   0.48 (-205.9)   0.65 (-230.9)
adaptive e-net Lasso               0.53 (-95.7)    0.47 (-204.4)   0.55 (-29.9)
e-net Lasso                        0.53 (-25.2)    0.47 (-23.5)    0.55 (-225.5)
adaptive Lasso                     0.74 (-233.9)   0.54 (-223.0)   0.78 (-247.7)
Lasso                              0.53 (-255.9)   0.47 (-242.6)   0.55 (-259.0)
Minn. Prior as in BGR              0.59 (-27.)     0.55 (-227.7)   0.59 (-23.4)
Minn. Prior Σ diagonal             0.59 (-2.)      0.55 (-232.4)   0.59 (-246.6)
Minn. Prior Σ not diagonal         0.58 (-20.6)    0.58 (-222.2)   0.58 (-22.)
SSVS Conjugate semi-automatic      .23 (-282.6)    0.99 (-284.3)   .32 (-273.8)
SSVS Conjugate plus Minn. Prior    0.63 (-230.2)   0.54 (-22.2)    0.6 (-23.5)
SSVS Non-conj. semi-automatic      .60 (-294.)     .22 (-266.2)    .64 (-268.8)
SSVS Non-conj. plus Minn. Prior    0.63 (-209.9)   0.5 (-20.3)     0.58 (-98.)
Factor model p=1                   .39 (-280.)     0.9 (-255.5)    .35 (-283.4)
Factor model p=4                   5.03 (-562.9)   3.64 (-522.3)   6.73 (-593.8)

Notes: MSFEs as proportion of random walk MSFEs. Sum of log predictive likelihoods in parentheses.

Table 3: Recursive Forecasting for h = 1

                                   GDP             CPI             FFR
DAELasso                           0.55 (-20.4)    0.29 (-90.9)    0.56 (-224.2)
adaptive e-net Lasso               0.67 (-242.0)   0.40 (-20.6)    0.63 (-239.8)
e-net Lasso                        0.68 (-225.6)   0.40 (-22.5)    0.63 (-237.9)
adaptive Lasso                     0.62 (-29.2)    0.28 (-96.4)    0.60 (-226.8)
Lasso                              0.67 (-236.5)   0.39 (-22.)     0.62 (-242.9)
Minn. Prior as in BGR              0.56 (-92.3)    0.30 (-95.9)    0.5 (-229.)
Minn. Prior Σ diagonal             0.58 (-204.3)   0.28 (-82.2)    0.54 (-238.8)
Minn. Prior Σ not diagonal         0.55 (-95.4)    0.27 (-84.)     0.52 (-249.5)
SSVS Conjugate semi-automatic      0.68 (-99.9)    0.27 (-9.2)     0.63 (-245.3)
SSVS Conjugate plus Minn. Prior    0.56 (-92.5)    0.3 (-97.6)     0.5 (-228.5)
SSVS Non-conj. semi-automatic      0.64 (-205.)    0.32 (-96.5)    0.58 (-237.2)
SSVS Non-conj. plus Minn. Prior    0.65 (-203.9)   0.29 (-87.6)    0.54 (-228.9)
Factor model p=1                   0.68 (-98.3)    0.30 (-93.2)    0.67 (-227.9)
Factor model p=4                   0.90 (-22.9)    0.35 (-29.)     0.77 (-245.6)

Notes: MSFEs as proportion of random walk MSFEs. Sum of log predictive likelihoods in parentheses.

Table 4: Recursive Forecasting for h = 4

                                   GDP             CPI             FFR
DAELasso                           0.54 (-28.3)    0.48 (-206.6)   0.6 (-239.6)
adaptive e-net Lasso               0.53 (-25.6)    0.47 (-207.0)   0.55 (-247.3)
e-net Lasso                        0.53 (-225.5)   0.47 (-23.7)    0.55 (-239.0)
adaptive Lasso                     0.63 (-228.0)   0.52 (-24.7)    0.66 (-242.2)
Lasso                              0.53 (-236.2)   0.47 (-222.8)   0.55 (-244.3)
Minn. Prior as in BGR              0.6 (-24.7)     0.52 (-29.4)    0.59 (-249.6)
Minn. Prior Σ diagonal             0.6 (-24.0)     0.52 (-27.6)    0.6 (-278.)
Minn. Prior Σ not diagonal         0.62 (-23.3)    0.52 (-26.)     0.59 (-244.8)
SSVS Conjugate semi-automatic      0.65 (-22.4)    0.60 (-225.0)   0.59 (-249.5)
SSVS Conjugate plus Minn. Prior    0.84 (-29.6)    0.70 (-246.6)   0.67 (-258.5)
SSVS Non-conj. semi-automatic      0.75 (-293.2)   0.77 (-226.4)   0.88 (-268.)
SSVS Non-conj. plus Minn. Prior    0.67 (-29.0)    0.49 (-20.6)    0.53 (-233.7)
Factor model p=1                   0.84 (-228.9)   0.55 (-2.6)     0.69 (-244.)
Factor model p=4                   0.89 (-243.6)   0.62 (-227.4)   0.68 (-249.)

Notes: MSFEs as proportion of random walk MSFEs. Sum of log predictive likelihoods in parentheses.