Spatial Modeling with Spatially Varying Coefficient Processes

Spatial Modeling with Spatially Varying Coefficient Processes Alan E. Gelfand, Hyon-Jung Kim, C. F. Sirmans, Sudipto Banerjee Presenters: Halley Brantley and Chris Krut September 28, 2015

Introduction Want to model selling price of single family homes in Baton Rouge. Covariates Age See Figure 1. Pg. 392 Square feet of liv Square feet of ot # of bathrooms

Linear Model Y (s) = X(s)β + W (s) + ɛ β = (β 0, β 1, β 2, β 3, β 4 ) X(s) = (1, Age, LivingArea, OtherArea, Bathrooms) ɛ N(0, τ 2 ) independent error term Spatial Components 1. s i is the location of observation i 2. Spatial random effect W (s) N(0, σ 2 H) 3. (H) ij = ρ(s i s j φ) 4. cov(y (s i ), Y (s j ) β, τ 2, φ) = σ 2 ρ(s i s j φ)

Challenge: βs may not be constant across space Possibility 1: divide area into similar blocks. Must assume coefficient is constant on specified area. Arbitrary scale of resolution Can t interpolate the surface Possibility 2: Assume β is a polynomial or spline function of lat/lon. Arbitrary specification of polynomial could be too inflexible. Splines are more flexible, but need to specify number and location of knots.

Gaussian Process Definition Y (s) is a Gaussian process with mean function µ(s) and covariance function cov(y (s), Y (s )) = H(s, s ) if for every subset of locations s 1,..., s n the vector Ỹ = (Y (s 1),..., Y (s n )) T Ỹ MV N n ( µ, H) (1) where µ = (µ(s 1 ),..., µ(s n )) T and H is a matrix such that {H} ij = H(s i, s j ; φ). To be a valid covariance function H(s, s : φ) must be positive semidefinite in the sense that it generates covariance matrices H which are positive semi definite v T Hv 0[1, Pg. 80].

Spatial Intercept Model: Process Model Y (s) = β 0 + x(s)β 1 + β 0 (s) + ε(s) E(ε(s)) = 0 cov(ε(s), ε(s )) = τ 2 I(s = s ) β 0 (s) Gaussian Process(0,H(s, s ; φ, σ 2 )) H(s, s ; φ, σ 2 ) = σ 2 ρ(s s ; φ). β 0 (s) = β 0 + β 0 (s) Assign Prior distributions to remaining parameters (β 0, β 1, σ 2 0, τ 2, φ 0 )

Spatial Intercept Model: Discretized Model Observe Y (s), x(s), over set of locations s 1,..., s n. Y (s i ) = β 0 + x(s i )β 1 + β 0 (s i ) + ε(s i ) ε(s i ) N(0, τ 2 ) (Y (s i ) β 0, β 1, β 0 (s i )) N(β 0 + x(s i )β 1 + β 0 (s i ), τ 2 ) β 0 = (β 0 (s 1 ),..., β 0 (s n )) β 0 MV N(0, σ 2 0H 0 (φ 0 )) Assign Prior distributions to remaining parameters (β 0, β 1, σ 2 0, τ 2, φ 0 )

Spatial Slope Model: Process Model Y (s) = β 0 + x(s)β 1 + x(s)β 1 (s) + ε(s) E(ε(s)) = 0 cov(ε(s), ε(s )) = τ 2 I(s = s ) β 0 (s) Gaussian Process(0,H(s, s ; φ, σ 2 )) H(s, s ; φ, σ 2 ) = σ 2 ρ(s s ; φ)). β 1 (s) = β 1 + β 1 (s) Assign Prior distributions to remaining parameters (β 0, β 1, σ 2 0, τ 2, φ 0 ) The process Y (s) ends up being heterogeneous and non-stationary. var(y (s) β0, β 1, τ 2, σ 2 1, φ 1 ) = x 2 (s)σ 2 1 + τ 2 cov(y (s), Y (s ) β 0, β 1, τ 2, σ 2 1, φ 1 ) = x(s)x(s )σ 2 1ρ 1 (s s ; φ 1 )

Spatial Slope Model: Discretized Model Observe Y (s), x(s), over set of locations s 1,..., s n. Y (s i ) = β 0 + x(s i )β 1 + x(s i )β 1 (s i ) + ε(s) ε(s i ) N(0, τ 2 ) (Y (s i ) β 0, β 1, β 1 (s i )) N(β 0 + x(s i )β 1 + x(s i )β 1 (s i ), τ 2 ) β 1 = (β 1 (s 1 ),..., β 1 (s n )) β 1 MV N(0, σ 2 1H 1 (φ 1 )) Assign Prior distributions to remaining parameters (β 0, β 1, σ 2 0, τ 2, φ 0 )

Spatial Varying Coefficients Model Y (s) = β 0 + x(s)β 1 + β 0 (s) + x(s)β 1 (s) + ε(s) Y (s i ) = β 0 + x(s i )β 1 + β 0 (s i ) + x(s i )β 1 (s i ) + ε(s i ) Vector of spatial random effects at observed locations: β = (β 0 (s 1 ), β 1 (s 1 ),..., β 0 (s n ), β 1 (s n )) T β MV N(0, H(φ) T ) ( ) T11 T Positive definite matrix: T = 12 T 21 T 22 cov(β 0 (s i ), β 0 (s j )) = ρ(s i s j : φ)t 11 cov(β 1 (s i ), β 1 (s j )) = ρ(s i s j : φ)t 22 cov(β 0 (s i ), β 1 (s j )) = ρ(s i s j : φ)t 12

Multiple Regression with Spatially Varying coefficients Y (s) = X T (s) β(s) + ε(s) Y (s) = X T (s)µ β + X T (s)β(s) + ε(s) X T (s) = (1, x 1 (s),..., x p (s)) β(s) = µ β + β(s) µ β = (β 0, β 1,..., β p ) β(s) = (β 0 (s), β 1 (s),..., β p (s))

Multiple Regression with Spatially Varying coefficients y = X(1 µ β ) + Xβ + ε y = (Y (s 1 ),..., Y (s n )) T X(s 1 ) X(s 2 ) X =... X(s) µ β = (β 0, β 1,..., β p ) T β = (β 0 (s 1 ),..., β p (s 1 ),..., β 0 (s n ),..., β p (s n )) T β MV N(0, H(φ) T ) T is a p + 1 p + 1 positive definite matrix. cov(β i (s), β j (s )) = ρ(s s : φ)t i+1j+1

Marginalized Likelihood L(µ β, T, τ 2, φ y) = f(y X, µ β, β, T, φ, τ 2 )π(β)dβ L(µ β, T, τ 2, φ y) = (X(H(φ) T )X T + τ 2 I) 1/2 exp( 1 2 (y X(1 µ β)) T (X(H(φ) T )X T + τ 2 I) 1 (y X(1 µ β )))

Slice Sampling Want to sample from posterior f(µ β, T, τ 2, φ y) Introduce an auxiliary variable U unif(0, L(µ β, T, τ 2, φ y)) and use the fact f(µ β, T, τ 2, φ, U y) = I(U < L(µ β, T, τ 2, φ y))π(µ β, T, τ 2, φ) Sample U from unif(0, L(µ β, T, τ 2, φ y)). Sample µ β, T, τ 2, φ from π(µ β, T, τ 2, φ) subject to I(U < L(µ β, T, τ 2, φ y)).

Sampling β(s) 1. We can obtain the Marginal Posterior f( β, y) = f( β µ β, T, φ, τ 2, y) f(µ β, T, φ, τ 2 y) 2. Full Conditional is of a known form f( β µ β, T, φ, τ 2, y) = N(Bb, B) B = (X T X/τ 2 + H 1 (φ) T 1 ) 1 b = X T y/τ 2 + (H 1 (φ) T 1 )(1 µ β ) 3. We can generate from f( β, y) using the samples from f(µ β, T, φ, τ 2 y) and the full conditional distribution in 2.

Sampling β(s n+1 ) for a new location s n+1 1. We can obtain the Marginal Posterior f( β(s n+1 ), y) = f( β(s n+1 ) β, µ β, T, φ, τ 2 )f( β, µ β, T, φ, τ 2 y) 2. Conditional distribution has a known form f( β(s n+1 ) β, µ β, T, φ, τ 2 ) = N(µ, σ) µ = µ β + (h T new(φ)h 1 (φ) I)( β 1 n 1 µ β ) σ = (I h T new(φ)h 1 (φ)h new (φ))t h new (φ) = (ρ(s n+1 s 1 ; φ),..., ρ(s n+1 s 1 ; φ)) T 3. Similar to before we can use posterior samples to generate β(s n+1 )

Predict y(s n+1 ) from predictive posterior 1. The Predictive Posterior is given by f(y(s n+1 ) y) = f(y(s n+1 ) β(s n+1 ), τ 2 ) f( β(s n+1 ) β, µ β, T, φ) f( β, µ β, T, φ, τ 2 y) 2. Can sample from f(y(s n+1 ) y) by iteratively generating from f( β, µ β, T, φ, τ 2 y), f( β(s n+1 ) β, µ β, T, φ), f(y(s n+1 ) β(s n+1 ), τ 2 ).

Application to housing prices Compared 4 different models: 1. Allow all βs to vary spatially (multivariate). 2. Allow 3 of the βs to vary spatially. 3. Allow 2 of the βs to vary spatially. 4. Allow all βs to vary spatially (independently). Model comparison was based on the goodness-of-fit and penalty for model complexity. D = (Y obs (s, t) µ(s, t)) 2 + k σ 2 (s, t) (s,t) (s,t)

Spatial Results See Table 3, Pg. 392

Spatial Results See Figure 4, Pg. 393

Space-Time Regression with Varying Coefficients Y (s) = X T (s, t) β(s, t) + ε(s, t) β(s, t) = µ β + β(s, t) 1. β(s, t) = β(s) 2. β(s, t) = β(s) + α(t) 2.a α k (t) = I(t = 1)α k1 +... + I(t = M)α km 2.b α(t) = (α 0 (t),..., α P (t)) T cov(α(t), α(t )) = ρ (2) (t t ; γ)t 3. β(s, t) = β t (s)(independent processes nested in t) (cov(β t (s), β t (s )) = ρ(s s ; φ (t) )T t 4. Separable covariance cov(β(s, t), β(s, t )) = ρ 1 (s s ; φ) ρ 2 (t t : γ)t

Spatio-Temporal Results Model prices over 4 years. Model 1: β(s, t) = β(s) Model 2: β(s, t) = β(s) + α(t) Model 2a: α(t) as 4 iid time dummies Model 2b: α(t) as multivariate temporal process Model 3: β(s, t) = β (t) (s)(independent process at each t) Model 3a: common µ b across t. Model 3b: µ (t) b Model 4: Separable spatio-temporal model.

See Table 6 Pg. 395

See Table 7 Pg. 395

Generalized Linear Model f(y(s i ) θ(s i )) = h(y(s i )) exp(θ(s i )y(s i ) b(θ(s i ))) L( β y) ( = exp y(si )X T (s i ) β(s i ) b(x T (s i ) β(s ) i )) θ(s i ) = X T (s i ) β(s i ) β N(1 n 1 µ β, H(φ) T ) Constructing a satisfactory sampler is quite challenging. More work needs to be done on model fitting side.

Carl Edward Rasmussen. Gaussian processes for machine learning. 2006.