An Introduction to Splines

Σχετικά έγγραφα
Numerical Analysis FMN011

EE512: Error Control Coding

Statistics 104: Quantitative Methods for Economics Formula and Theorem Review

Approximation of distance between locations on earth given by latitude and longitude

2 Composition. Invertible Mappings

CHAPTER 25 SOLVING EQUATIONS BY ITERATIVE METHODS

Fourier Series. MATH 211, Calculus II. J. Robert Buchanan. Spring Department of Mathematics

3.4 SUM AND DIFFERENCE FORMULAS. NOTE: cos(α+β) cos α + cos β cos(α-β) cos α -cos β

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 24/3/2007

ΕΛΛΗΝΙΚΗ ΔΗΜΟΚΡΑΤΙΑ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΡΗΤΗΣ. Ψηφιακή Οικονομία. Διάλεξη 10η: Basics of Game Theory part 2 Mαρίνα Μπιτσάκη Τμήμα Επιστήμης Υπολογιστών

ΕΙΣΑΓΩΓΗ ΣΤΗ ΣΤΑΤΙΣΤΙΚΗ ΑΝΑΛΥΣΗ

Partial Differential Equations in Biology The boundary element method. March 26, 2013

SCHOOL OF MATHEMATICAL SCIENCES G11LMA Linear Mathematics Examination Solutions

Other Test Constructions: Likelihood Ratio & Bayes Tests

Modern Regression HW #8 Solutions

Phys460.nb Solution for the t-dependent Schrodinger s equation How did we find the solution? (not required)

5.4 The Poisson Distribution.

Section 8.3 Trigonometric Equations

Statistical Inference I Locally most powerful tests

Generalized additive models in R

Reminders: linear functions

ST5224: Advanced Statistical Theory II

Lecture 2: Dirac notation and a review of linear algebra Read Sakurai chapter 1, Baym chatper 3

6.3 Forecasting ARMA processes

Biostatistics for Health Sciences Review Sheet

Math 6 SL Probability Distributions Practice Test Mark Scheme

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 19/5/2007

Chapter 6: Systems of Linear Differential. be continuous functions on the interval

Homework 3 Solutions

Queensland University of Technology Transport Data Analysis and Modeling Methodologies

Section 9.2 Polar Equations and Graphs

Forced Pendulum Numerical approach

Άσκηση 10, σελ Για τη μεταβλητή x (άτυπος όγκος) έχουμε: x censored_x 1 F 3 F 3 F 4 F 10 F 13 F 13 F 16 F 16 F 24 F 26 F 27 F 28 F

The Simply Typed Lambda Calculus

TMA4115 Matematikk 3

HISTOGRAMS AND PERCENTILES What is the 25 th percentile of a histogram? What is the 50 th percentile for the cigarette histogram?

HOMEWORK 4 = G. In order to plot the stress versus the stretch we define a normalized stretch:

Problem Set 3: Solutions

Module 5. February 14, h 0min

10.7 Performance of Second-Order System (Unit Step Response)

Μηχανική Μάθηση Hypothesis Testing

Bayesian modeling of inseparable space-time variation in disease risk

Lecture 2. Soundness and completeness of propositional logic

ΕΛΛΗΝΙΚΗ ΔΗΜΟΚΡΑΤΙΑ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΡΗΤΗΣ. Ψηφιακή Οικονομία. Διάλεξη 7η: Consumer Behavior Mαρίνα Μπιτσάκη Τμήμα Επιστήμης Υπολογιστών

Local Approximation with Kernels

Areas and Lengths in Polar Coordinates

Matrices and Determinants

2. THEORY OF EQUATIONS. PREVIOUS EAMCET Bits.

Correction Table for an Alcoholometer Calibrated at 20 o C

Math221: HW# 1 solutions

The challenges of non-stable predicates

Solution Series 9. i=1 x i and i=1 x i.

EPL 603 TOPICS IN SOFTWARE ENGINEERING. Lab 5: Component Adaptation Environment (COPE)

Concrete Mathematics Exercises from 30 September 2016

Solutions to Exercise Sheet 5

DESIGN OF MACHINERY SOLUTION MANUAL h in h 4 0.

Tridiagonal matrices. Gérard MEURANT. October, 2008

Quadratic Expressions

Πρόβλημα 1: Αναζήτηση Ελάχιστης/Μέγιστης Τιμής

k A = [k, k]( )[a 1, a 2 ] = [ka 1,ka 2 ] 4For the division of two intervals of confidence in R +

Areas and Lengths in Polar Coordinates

Στατιστική Ανάλυση Δεδομένων II. Γραμμική Παλινδρόμηση με το S.P.S.S.

Instruction Execution Times

Econ 2110: Fall 2008 Suggested Solutions to Problem Set 8 questions or comments to Dan Fetter 1

Mean bond enthalpy Standard enthalpy of formation Bond N H N N N N H O O O

derivation of the Laplacian from rectangular to spherical coordinates

If we restrict the domain of y = sin x to [ π, π ], the restrict function. y = sin x, π 2 x π 2

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

D Alembert s Solution to the Wave Equation

Congruence Classes of Invertible Matrices of Order 3 over F 2

If we restrict the domain of y = sin x to [ π 2, π 2

C.S. 430 Assignment 6, Sample Solutions

Estimation for ARMA Processes with Stable Noise. Matt Calder & Richard A. Davis Colorado State University

HOMEWORK#1. t E(x) = 1 λ = (b) Find the median lifetime of a randomly selected light bulb. Answer:

Nowhere-zero flows Let be a digraph, Abelian group. A Γ-circulation in is a mapping : such that, where, and : tail in X, head in

Homework 8 Model Solution Section

( ) 2 and compare to M.

Main source: "Discrete-time systems and computer control" by Α. ΣΚΟΔΡΑΣ ΨΗΦΙΑΚΟΣ ΕΛΕΓΧΟΣ ΔΙΑΛΕΞΗ 4 ΔΙΑΦΑΝΕΙΑ 1

Fractional Colorings and Zykov Products of graphs

4.6 Autoregressive Moving Average Model ARMA(1,1)

Matrices and vectors. Matrix and vector. a 11 a 12 a 1n a 21 a 22 a 2n A = b 1 b 2. b m. R m n, b = = ( a ij. a m1 a m2 a mn. def

Ordinal Arithmetic: Addition, Multiplication, Exponentiation and Limit

Απόκριση σε Μοναδιαία Ωστική Δύναμη (Unit Impulse) Απόκριση σε Δυνάμεις Αυθαίρετα Μεταβαλλόμενες με το Χρόνο. Απόστολος Σ.

FORMULAS FOR STATISTICS 1

Aquinas College. Edexcel Mathematical formulae and statistics tables DO NOT WRITE ON THIS BOOKLET

Bayesian statistics. DS GA 1002 Probability and Statistics for Data Science.

ANSWERSHEET (TOPIC = DIFFERENTIAL CALCULUS) COLLECTION #2. h 0 h h 0 h h 0 ( ) g k = g 0 + g 1 + g g 2009 =?

Nondifferentiable Convex Functions

Second Order Partial Differential Equations

Chapter 6: Systems of Linear Differential. be continuous functions on the interval

[2] T.S.G. Peiris and R.O. Thattil, An Alternative Model to Estimate Solar Radiation

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 6/5/2006

Srednicki Chapter 55

Second Order RLC Filters

forms This gives Remark 1. How to remember the above formulas: Substituting these into the equation we obtain with

Overview. Transition Semantics. Configurations and the transition relation. Executions and computation

SUPERPOSITION, MEASUREMENT, NORMALIZATION, EXPECTATION VALUES. Reading: QM course packet Ch 5 up to 5.6

Assalamu `alaikum wr. wb.

Μορφοποίηση υπό όρους : Μορφή > Μορφοποίηση υπό όρους/γραμμές δεδομένων/μορφοποίηση μόο των κελιών που περιέχουν/

CHAPTER 48 APPLICATIONS OF MATRICES AND DETERMINANTS

Transcript:

An Introduction to Splines Trinity River Restoration Program Workshop on Outmigration: Population Estimation October 6 8, 2009

An Introduction to Splines 1 Linear Regression Simple Regression and the Least Squares Method Least Squares Fitting in R Polynomial Regression 2 Smoothing Splines Simple Splines B-splines Overfitting and Smoothness

An Introduction to Bayesian Inference 1 Linear Regression Simple Regression and the Least Squares Method Least Squares Fitting in R Polynomial Regression

An Introduction to Bayesian Inference 1 Linear Regression Simple Regression and the Least Squares Method Least Squares Fitting in R Polynomial Regression

Simple Linear Regression Daily temperatures in Montreal from April 1 (Day 81) to June 30 (Day 191), 1961. Montreal Temp. April 1 to June 30, 1961 Temperature 0 5 10 15 20 100 120 140 160 180 Day of Year Introduction to Splines: Linear Regression, Simple Regression and the Least Squares Method 5/52

Simple Linear Regression The Model Assumptions Mean On average, the change in the response is proportional to the change in the predictor. Errors 1. The deviation in the response for any observation does not depend on any other observation. 2. The average magnitude of the deviation is the same for all values of the predictor. Mathematically For i = 1,..., n: y i = β 0 + β 1 x i + ɛ i where ɛ 1,..., ɛ n are independent with mean 0 and variance σ 2. Introduction to Splines: Linear Regression, Simple Regression and the Least Squares Method 6/52

The Least Squares Method Example: The Montreal Data Montreal Temp. April 1 to June 30, 1961 Montreal Temp. April 1 to June 30, 1961 Temperature 0 5 10 15 20 Temperature 0 5 10 15 20 100 120 140 160 180 100 120 140 160 180 Day of Year Day of Year Introduction to Splines: Linear Regression, Simple Regression and the Least Squares Method 7/52

The Least Squares Method The Residuals Definition Given values for β 0 and β 1, the residual for the i th observation is the difference between the observed and the predicted response: where ŷ i = β 0 + β 1 x i. e i = y i ŷ i Introduction to Splines: Linear Regression, Simple Regression and the Least Squares Method 8/52

The Least Squares Method The Least Squares Criterion The least squares method defines the best values of β 0 and β 1 to be those that minimize the sum of the squared residuals: SS = n ei 2 = i=1 n (y i ŷ i ) 2. i=1 Introduction to Splines: Linear Regression, Simple Regression and the Least Squares Method 9/52

The Least Squares Method Example: The Montreal Data Montreal Temp. April 1 to June 30, 1961 Montreal Temp. April 1 to June 30, 1961 Temperature 0 5 10 15 20 Temperature 0 5 10 15 20 100 120 140 160 180 100 120 140 160 180 Day of Year Day of Year SS=1549.37 SS=1148.56 Introduction to Splines: Linear Regression, Simple Regression and the Least Squares Method 10/52

An Introduction to Bayesian Inference 1 Linear Regression Simple Regression and the Least Squares Method Least Squares Fitting in R Polynomial Regression

Least Squares Fitting in R The Data Suppose that the data is a data frame with elements: x: the days from 90 to 181 y: the observed temperatures > data = read. table (" MontrealTemp1. txt ") > summary ( data ) x y Min. : 90.0 Min. : -0.90 1st Qu.:112.8 1st Qu.: 5.60 Median : 135. 5 Median : 11. 55 Mean :135.5 Mean :11.46 3rd Qu.:158.2 3rd Qu.:16.70 Max. :181.0 Max. :23.60 > Introduction to Splines: Linear Regression, Least Squares Fitting in R 12/52

Least Squares Fitting in R Fitting the Model Fitting the model with lm: > lm(y~x, data ) Call : lm( formula = y ~ x, data = data ) Coefficients : ( Intercept ) x -15.3996 0.1982 Introduction to Splines: Linear Regression, Least Squares Fitting in R 13/52

Least Squares Fitting in R Fitting the Model Fitting the model with lm: > lmfit = lm(y~x, data ) > attributes ( lmfit ) $ names [1] " coefficients " " residuals " [3] " effects " " rank " [5] " fitted. values " " assign " [7] "qr" "df. residual " [9] " xlevels " " call " [11] " terms " " model " $ class [1] "lm" > Introduction to Splines: Linear Regression, Least Squares Fitting in R 14/52

Least Squares Fitting in R The Fitted Line Plotting the fitted line over the raw data: # Plot the raw data > plot ( data $x, data $y, main =" Montreal Temp.... ", xlab =" Day of Year ",ylab =" Temperature ") # Add the fitted line > lines ( data $x, lmfit $fit, col =" red ",lwd =3) Introduction to Splines: Linear Regression, Least Squares Fitting in R 15/52

Least Squares Fitting in R The Fitted Line Montreal Temp. April 1 to June 30, 1961 Temperature 0 5 10 15 20 100 120 140 160 180 Day of Year Introduction to Splines: Linear Regression, Least Squares Fitting in R 16/52

The Least Squares Method Goodness-of-Fit Testing Residual Diagnostics The value of the residuals should not depend on x or y in any systematic way. Common indications of lack of fit: trends with x or y (curves or clusters of high/low values) constant increase/decrease (funnel shape) increase followed by decrease (football shape) very large (+ or -) values (outliers) Assessed by plotting e versus x and y. Introduction to Splines: Linear Regression, Least Squares Fitting in R 17/52

Least Squares Fitting in R Residual Plots Plotting the residuals versus the predictor and response: ## Plot the residuals versus day > plot ( data $x, lmfit $ resid, xlab =" Day of Year ",ylab =" Residual ") > abline (h =0) ## Plot the residuals versus temperature > plot ( data $y, lmfit $ resid, xlab =" Temperature ",ylab =" Residual ") > abline (h =0) Introduction to Splines: Linear Regression, Least Squares Fitting in R 18/52

Least Squares Fitting in R The Fitted Line Residuals vs. Day Residuals vs. Temperature 100 120 140 160 180 10 5 0 5 10 Day of Year Residual 0 5 10 15 20 10 5 0 5 10 Temperature Residual Introduction to Splines: Linear Regression, Least Squares Fitting in R 19/52

Exercises 1. Montreal Temperature Data April 1 to June 30, 1961 File: Intro to splines\exercises\montreal temp 1.R Use the provide code to fit the simple linear regression model to the Montreal temperature data from the spring of 1961, plot the fitted line, and produce the residual plots. 2. Montreal Temperature Data Jan. 1 to Dec. 31, 1961 File: Intro to splines\exercises\montreal temp 2.R Repeat exercise 1 with the data from all of 1961. Introduction to Splines: Linear Regression, Least Squares Fitting in R 20/52

An Introduction to Bayesian Inference 1 Linear Regression Simple Regression and the Least Squares Method Least Squares Fitting in R Polynomial Regression

Polynomial Regression Motivation 0 100 200 300 20 10 0 10 20 Montreal Temp. January 1 to December 31, 1961 Day of Year Temperature Introduction to Splines: Linear Regression, Polynomial Regression 22/52

Polynomial Regression Motivation Residuals vs. Day Residuals vs. Temperature 0 100 200 300 20 10 0 10 20 Day of Year Residual 20 10 0 10 20 20 10 0 10 20 Temperature Residual Introduction to Splines: Linear Regression, Polynomial Regression 23/52

Polynomial Regression Polynomials Definition A polynomial of degree D is a function formed by linear combinations of the powers of its argument up to D: y = β 0 + β 1 x + β 2 x 2 + + β D x D Specific Polynomials Linear y = β 0 + β 1 x Quadratic y = β 0 + β 1 x + β 2 x 2 Cubic y = β 0 + β 1 x + β 2 x 2 + β 3 x 3 Quartic y = β 0 + β 1 x + β 2 x 2 + β 3 x 3 + β 4 x 4 Quintic y = β 0 + β 1 x + β 2 x 2 + β 3 x 3 + β 4 x 4 + β 5 x 5 Introduction to Splines: Linear Regression, Polynomial Regression 24/52

Polynomial Regression The Design Matrix Definition The design matrix for a regression model with n observations and p predictors is the matrix with n rows and p columns such that the value of the j th predictor for the i th observation is located in column j of row i. Design matrix for a polynomial of degree D 1 2 3. n 1 x 1 x1 2 x1 3 x1 D 1 x 2 x2 2 x2 3 x2 D 1 x 3 x3 2 x3 3 x3 D. 1 x n xn 2 xn 3 xn D Introduction to Splines: Linear Regression, Polynomial Regression 25/52

Polynomial Regression in R Constructing the Design Matrix Quadratic The design matrix for polynomial regression can be generated with the function outer(): > D = 2 > X = outer ( data $x,1:d,"^") > X [1:5,] [,1] [,2] [1,] 1 1 [2,] 2 4 [3,] 3 9 [4,] 4 16 [5,] 5 25 > Note: we do not need to include the intercept column. Introduction to Splines: Linear Regression, Polynomial Regression 26/52

Polynomial Regression in R Least Squares Fitting Quadratic > lmfit = lm(y~x, data ) > attributes ( lmfit ) $ names [1] " coefficients " " residuals "... $ class [1] "lm" > lmfit $ coefficients ( Intercept ) X1 X2-23. 715358962 0. 413901580-0. 001014625 Introduction to Splines: Linear Regression, Polynomial Regression 27/52

Polynomial Regression in R Fitted Model Quadratic 0 100 200 300 20 10 0 10 20 Montreal Temp. January 1 to December 31, 1961 Day of Year Temperature 0 100 200 300 15 10 5 0 5 10 15 Day of Year Residual Introduction to Splines: Linear Regression, Polynomial Regression 28/52

Exercises 1. Montreal Temperature Data Jan. 1 to Dec. 31, 1961 File: Intro to splines\exercises\montreal temp 3.R Use the provided code to fit polynomial regression models of varying degree to the data for all of 1961. Models of different degree are constructed by setting the variable D (e.g., D=2 produces a quadratic model). What is the minimal degree required for the model to fit well? 2. Montreal Temperature Data Jan. 1, 1961, to Dec. 31, 1962 File: Intro to splines\exercises\montreal temp 4.R Repeat this exercise using the data from both 1961 and 1962. Introduction to Splines: Linear Regression, Polynomial Regression 29/52

An Introduction to Bayesian Inference 2 Smoothing Splines Simple Splines B-splines Overfitting and Smoothness

An Introduction to Bayesian Inference 2 Smoothing Splines Simple Splines B-splines Overfitting and Smoothness

Splines Motivation 0 200 400 600 20 10 0 10 20 Montreal Temp. January 1 to December 31, 1962 Day of Year Temperature 0 200 400 600 15 10 5 0 5 10 Day of Year Residual How is the temperature changing in the spring of 1962? y = 7.6 8.3x 0.3x 2 5.2 10 4 x 3 + 4.4 10 6 x 4 2.1 10 8 x 5 + 6.0 10 11 x 6 8.9 10 14 x 7 + 5.5 10 17 x 8 Introduction to Splines: Smoothing Splines, Simple Splines 32/52

Splines A Linear Spline for the Montreal Temperature Data 0 200 400 600 20 10 0 10 20 Montreal Temp. January 1 to December 31, 1962 Day of Year Temperature 0 200 400 600 15 10 5 0 5 10 Day of Year Residual How is the temperature changing in the spring of 1962? y = 144.5 +.3x Introduction to Splines: Smoothing Splines, Simple Splines 33/52

Splines Linear Splines Definition A linear spline is a continuous function formed by connecting linear segments. The points where the segments connect are called the knots of the spline. Introduction to Splines: Smoothing Splines, Simple Splines 34/52

Splines Higher Order Splines Definition A spline of degree D is a function formed by connecting polynomial segments of degree D so that: the function is continuous, the function has D 1 continuous derivatives, and the D th derivative is constant between knots. Introduction to Splines: Smoothing Splines, Simple Splines 35/52

Simples Splines The Truncated Polynomials Definition The truncated polynomial of degree D associated with a knot ξ k is the function which is equal to 0 to the left of ξ k and equal to (x ξ k ) D to the right of ξ k. { (x ξ k ) D 0 x < ξk + = (x ξ k ) D x ξ k The equation for a spline of degree D with K knots is: D K y = β 0 + β d x d + b k (x ξ k ) D + d=1 k=1 Introduction to Splines: Smoothing Splines, Simple Splines 36/52

Simple Splines The Design Matrix The design matrix for a spline of degree D with K knots is the n by 1 + D + K matrix with entries: 1 x 1 x1 2 x1 D (x 1 ξ 1 ) D + (x 1 ξ K ) D + 1 x 2 x2 2 x2 D (x 2 ξ 1 ) D + (x 2 ξ K ) D + 1 x 3 x3 2 x3 D (x 3 ξ 1 ) D + (x 3 ξ K ) D +. 1 x n xn 2 xn D (x n ξ 1 ) D + (x n ξ K ) D + Introduction to Splines: Smoothing Splines, Simple Splines 37/52

Simple Splines in R The Design Matrix After defining the degree and the locations of the knots, the design matrix can be generated with the functions outer and cbind: > D = 3 > K = 5 > knots = 730 * (1: K)/(K +1) > X1 = outer ( data $x,1:d,"^") > X2 = outer ( data $x,knots,">") * outer ( data $x,knots,"-")^d > X = cbind (X1,X2) > round (X[c (1,150,300),1:5],1) [,1] [,2] [,3] [,4] [,5] [1,] 1 1 1 0.0 0 [2,] 150 22500 3375000 22745.4 0 [3,] 300 90000 27000000 5671495.4 181963 > Introduction to Splines: Smoothing Splines, Simple Splines 38/52

Simple Splines in R Fitting the Spline Model lmfit = lm(y~x, data = data ) Introduction to Splines: Smoothing Splines, Simple Splines 39/52

Simple Splines in R Fitted Cubic Spline 0 200 400 600 20 10 0 10 20 Montreal Temp. January 1 to December 31, 1962 Day of Year Temperature 0 200 400 600 15 10 5 0 5 10 Day of Year Residual Introduction to Splines: Smoothing Splines, Simple Splines 40/52

Exercises 1. Montreal Temperature Data Jan. 1 to Dec. 31, 1961 File: Intro to splines\exercises\montreal temp 5.R Use the code provided to fit splines of varying degree and with different numbers of knots to the data from 1961 and 1962. Introduction to Splines: Smoothing Splines, Simple Splines 41/52

An Introduction to Bayesian Inference 2 Smoothing Splines Simple Splines B-splines Overfitting and Smoothness

The B-Spline Basis Troubles with Truncated Polynomials Splines computed from the truncated polynomials may be numerically unstable because: the values in the design matrix may be very large, and the columns of the design matrix may be highly correlated. Introduction to Splines: Smoothing Splines, B-splines 43/52

The B-spline Basis in R Generating the Design Matrix and Fitting the Model The B-spline design matrix can be constructed via the function bs provided by the splines library: > library ( splines ) > D = 3 > K = 5 > knots = 730 * (1: K)/(K +1) > X = bs( data $x, knots = knots, degree =D, intercept = TRUE ) > lmfit = lm(y~x -1, data = data ) > Introduction to Splines: Smoothing Splines, B-splines 44/52

The B-spline Basis in R Fitted Cubic B-spline Model 0 200 400 600 20 10 0 10 20 Montreal Temp. January 1, 1961, to December 31, 1962 Day of Year Temperature 0 200 400 600 15 10 5 0 5 10 Day of Year Residual Introduction to Splines: Smoothing Splines, B-splines 45/52

Exercises 1. Montreal Temperature Data Jan. 1 to Dec. 31, 1961 File: Intro to splines\exercises\montreal temp 6.R Fit B-splines to the data from 1961 and 1962 using the code in the file. Increase the number of knots to see how this affects the fit of the curve. What happens when the number of knots is very large, say K = 50? Introduction to Splines: Smoothing Splines, B-splines 46/52

An Introduction to Bayesian Inference 2 Smoothing Splines Simple Splines B-splines Overfitting and Smoothness

Overfitting and Smoothness Motivation A cubic spline with 50 knots: 0 200 400 600 20 10 0 10 20 Montreal Temp. January 1, 1961, to December 31, 1962 Day of Year Temperature Introduction to Splines: Smoothing Splines, Overfitting and Smoothness 48/52

Overfitting and Smoothness Knot Selection Concept The shape of a spline can be controlled by carefully choosing the number of knots and their exact locations in order to: 1. allow flexibility where the trend changes quickly, and 2. avoid overfitting where the trend changes little. Challenge Choosing the number of knots and their location is a very difficult problem to solve. Introduction to Splines: Smoothing Splines, Overfitting and Smoothness 49/52

Overfitting and Smoothness Penalization Concept We can also balance overfitting and smoothness by controlling the size of the spline coefficients. Introduction to Splines: Smoothing Splines, Overfitting and Smoothness 50/52

Overfitting and Smoothness Penalization for Truncated Polynomials Penalization for the Linear Spline Consider the equation for each segment of the spline: (0, ξ 1 ) : y = β 0 + β 1 x (ξ 1, ξ 2 ) : y = (β 0 b 1 ξ 1 ) + (β 1 + b 1 ) x (ξ 2, ξ 3 ) : y = (β 0 b 1 ξ 1 b 2 ξ 2 ) + (β 1 + b 1 + b2) x The spline is smooth if b 1, b 2,..., b K are all close to 0. Penalized Least Squares PSS = n K (y i ŷ i ) 2 + λ i=1 k=1 b 2 k Introduction to Splines: Smoothing Splines, Overfitting and Smoothness 51/52

Overfitting and Smoothness Penalization for the B-spline Basis Penalization for the B-spline The spline is smooth if b 1, b 2,..., b K are all close to each other. (But not necessarily close to 0.) Penalized Least Squares PSS = n K (y i ŷ i ) 2 + λ ((b k b k 1 ) (b k 1 b k 2 )) 2 i=1 k=3 Introduction to Splines: Smoothing Splines, Overfitting and Smoothness 52/52

Overfitting and Smoothness A Penalized Cubic B-spline A penalized cubic B-spline with 50 knots and λ = 5: 0 200 400 600 20 10 0 10 20 Montreal Temp. January 1, 1961, to December 31, 1962 Day of Year Temperature Introduction to Splines: Smoothing Splines, Overfitting and Smoothness 53/52

Exercises 1. Montreal Temperature Data Jan. 1 to Dec. 31, 1961 File: Intro to splines\exercises\montreal temp 7.R Fit penalized cubic B-splines to the Montreal temperature data for 1961 and 1962 using the provided code. Introduction to Splines: Smoothing Splines, Overfitting and Smoothness 54/52