waffle Chris Parrish June 18, 2016

Σχετικά έγγραφα
categorical Chris Parrish June 7, 2016

m4.3 Chris Parrish June 16, 2016

ΠΑΝΕΠΙΣΤΗΜΙΟ ΑΙΓΑΙΟΥ ΣΧΟΛΗ ΕΠΙΣΤΗΜΩΝ ΤΗΣ ΙΟΙΚΗΣΗΣ ΠΡΟΓΡΑΜΜΑ ΜΕΤΑΠΤΥΧΙΑΚΩΝ ΣΠΟΥ ΩΝ ΜΕΤΑΠΤΥΧΙΑΚΟ ΙΠΛΩΜΑ ΙΟΙΚΗΣΗΣ ΕΠΙΧΕΙΡΗΣΕΩΝ ΜΕ. Ι..Ε.

United States of America

5.1 logistic regresssion Chris Parrish July 3, 2016

SPRING /1-5/30 (PRE SPRING 12/1-5/30)

519.22(07.07) 78 : ( ) /.. ; c (07.07) , , 2008

Modern Regression HW #8 Solutions

2018 Bowling.com Youth Open Championships OFFICIAL RESULTS Division: Doubles, U20 **All scholarship points are divided amongst team members** Rank

Οδηγίες χρήσης του R, μέρος 2 ο

Wan Nor Arifin under the Creative Commons Attribution-ShareAlike 4.0 International License. 1 Introduction 1

5.6 evaluating, checking, comparing Chris Parrish July 3, 2016

Ηλεκτρονικοί Υπολογιστές IV

Wan Nor Arifin under the Creative Commons Attribution-ShareAlike 4.0 International License. 1 Introduction 1

Biostatistics for Health Sciences Review Sheet


DirichletReg: Dirichlet Regression for Compositional Data in R

ΠΟΛΛΑΠΛΗ ΠΑΛΙΝΔΡΟΜΗΣΗ: ΑΣΚΗΣΕΙΣ

Statistics 104: Quantitative Methods for Economics Formula and Theorem Review

Bayesian statistics. DS GA 1002 Probability and Statistics for Data Science.

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

Queensland University of Technology Transport Data Analysis and Modeling Methodologies

Άσκηση 10, σελ Για τη μεταβλητή x (άτυπος όγκος) έχουμε: x censored_x 1 F 3 F 3 F 4 F 10 F 13 F 13 F 16 F 16 F 24 F 26 F 27 F 28 F

Lampiran 1 Output SPSS MODEL I

Supplementary figures


[2] T.S.G. Peiris and R.O. Thattil, An Alternative Model to Estimate Solar Radiation

( ) ( ) STAT 5031 Statistical Methods for Quality Improvement. Homework n = 8; x = 127 psi; σ = 2 psi (a) µ 0 = 125; α = 0.

Γραµµική Παλινδρόµηση

Start Random numbers Distributions p-value Confidence interval.


An Introduction to Splines

Οδηγίες χρήσης του R, μέρος 1 ο. Κατεβάζουμε το λογισμικό από την ιστοσελίδα

MATHACHij = γ00 + u0j + rij

Does anemia contribute to end-organ dysfunction in ICU patients Statistical Analysis

Table 1: Military Service: Models. Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7 Model 8 Model 9 num unemployed mili mili num unemployed

Supplementary Appendix

Ανάλυση Δεδομένων με χρήση του Στατιστικού Πακέτου R

Generalized additive models in R

Μηχανική Μάθηση Hypothesis Testing

Ανάλυση Δεδομένων με χρήση του Στατιστικού Πακέτου R

Το άτομο του Υδρογόνου

Homework 3 Solutions

APPENDICES APPENDIX A. STATISTICAL TABLES AND CHARTS 651 APPENDIX B. BIBLIOGRAPHY 677 APPENDIX C. ANSWERS TO SELECTED EXERCISES 679

1. Εισαγωγή...σελ Δεδομένα της εργασίας...σελ Μεθοδολογία...σελ Ανάλυση των δεδομένων.σελ Συγκριτικά αποτελέσματα..σελ.

Partial Trace and Partial Transpose

Optimizing Microwave-assisted Extraction Process for Paprika Red Pigments Using Response Surface Methodology

PENGARUHKEPEMIMPINANINSTRUKSIONAL KEPALASEKOLAHDAN MOTIVASI BERPRESTASI GURU TERHADAP KINERJA MENGAJAR GURU SD NEGERI DI KOTA SUKABUMI

ΠΑΝΔΠΗΣΖΜΗΟ ΠΑΣΡΩΝ ΣΜΖΜΑ ΖΛΔΚΣΡΟΛΟΓΩΝ ΜΖΥΑΝΗΚΩΝ ΚΑΗ ΣΔΥΝΟΛΟΓΗΑ ΤΠΟΛΟΓΗΣΩΝ ΣΟΜΔΑ ΤΣΖΜΑΣΩΝ ΖΛΔΚΣΡΗΚΖ ΔΝΔΡΓΔΗΑ

ΣΤΑΤΙΣΤΙΚΗ ΕΠΙΧΕΙΡΗΣΕΩΝ ΕΙΔΙΚΑ ΘΕΜΑΤΑ. Κεφάλαιο 13. Συμπεράσματα για τη σύγκριση δύο πληθυσμών

1.1 t Rikon * --- Signif. codes: 0 *** ** 0.01 *

Λογιστική Παλινδρόµηση

SECTION II: PROBABILITY MODELS

Μενύχτα, Πιπερίγκου, Σαββάτης. ΒΙΟΣΤΑΤΙΣΤΙΚΗ Εργαστήριο 6 ο

Approximation of distance between locations on earth given by latitude and longitude

1. A fully continuous 20-payment years, 30-year term life insurance of 2000 is issued to (35). You are given n A 1

Section 9.2 Polar Equations and Graphs

Web-based supplementary materials for Bayesian Quantile Regression for Ordinal Longitudinal Data

2. ΧΡΗΣΗ ΣΤΑΤΙΣΤΙΚΩΝ ΠΑΚΕΤΩΝ ΣΤΗ ΓΡΑΜΜΙΚΗ ΠΑΛΙΝΔΡΟΜΗΣΗ

6.3 Forecasting ARMA processes

Jesse Maassen and Mark Lundstrom Purdue University November 25, 2013

Transient Voltage Suppression Diodes: 1.5KE Series Axial Leaded Type 1500 W

HOMEWORK 4 = G. In order to plot the stress versus the stretch we define a normalized stretch:

k A = [k, k]( )[a 1, a 2 ] = [ka 1,ka 2 ] 4For the division of two intervals of confidence in R +

Solution Series 9. i=1 x i and i=1 x i.

Στοιχεία από την r-project για την επεξεργασία και χαρτογράφηση χωρική κατανομή σημειακών παρατηρήσεων

!"!"!!#" $ "# % #" & #" '##' #!( #")*(+&#!', & - #% '##' #( &2(!%#(345#" 6##7

Απλή Ευθύγραµµη Συµµεταβολή

2. ΕΠΙΛΟΓΗ ΤΟΥ ΜΕΓΕΘΟΥΣ ΤΩΝ ΠΑΡΑΤΗΡΗΣΕΩΝ

ΠΕΡΙΓΡΑΦΙΚΗ και ΕΠΑΓΩΓΙΚΗ ΣΤΑΤΙΣΤΙΚΗ

8. ΑΠΛΗ ΓΡΑΜΜΙΚΗ ΠΑΛΙΝΔΡΟΜΗΣΗ Ι

CE 530 Molecular Simulation

ΕΙΣΑΓΩΓΗ ΣΤΗ ΣΤΑΤΙΣΤΙΚΗ ΑΝΑΛΥΣΗ

Για να ελέγξουµε αν η κατανοµή µιας µεταβλητής είναι συµβατή µε την κανονική εφαρµόζουµε το test Kolmogorov-Smirnov.

Aquinas College. Edexcel Mathematical formulae and statistics tables DO NOT WRITE ON THIS BOOKLET

MSWD = 1.06, probability = 0.39

UDZ Swirl diffuser. Product facts. Quick-selection. Swirl diffuser UDZ. Product code example:

Repeated measures Επαναληπτικές μετρήσεις

Εργαστήριο στατιστικής Στατιστικό πακέτο S.P.S.S.

Monolithic Crystal Filters (M.C.F.)

Simon et al. Supplemental Data Page 1

Section 8.3 Trigonometric Equations

S

Αλληλεπίδραση ακτίνων-χ με την ύλη

ST5224: Advanced Statistical Theory II

5.4 The Poisson Distribution.

ΕΡΓΑΙΑ Εθηίκεζε αμίαο κεηαπώιεζεο ζπηηηώλ κε αλάιπζε δεδνκέλωλ. Παιεάο Δπζηξάηηνο

Ψηφιακή Επεξεργασία Φωνής

DOUGLAS FIR BEETLE TRAP-SUPPRESSION STUDY STATISTICAL REPORT

: Monte Carlo EM 313, Louis (1982) EM, EM Newton-Raphson, /. EM, 2 Monte Carlo EM Newton-Raphson, Monte Carlo EM, Monte Carlo EM, /. 3, Monte Carlo EM


χ 2 test ανεξαρτησίας

Πρόβλημα 1: Αναζήτηση Ελάχιστης/Μέγιστης Τιμής


Math 6 SL Probability Distributions Practice Test Mark Scheme

Ηλεκτρονικοί Υπολογιστές I

ΚΥΠΡΙΑΚΟΣ ΣΥΝΔΕΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY 21 ος ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ Δεύτερος Γύρος - 30 Μαρτίου 2011

(i) Περιγραφική ανάλυση των μεταβλητών PRICE

10. ΠΟΛΛΑΠΛΗ ΓΡΑΜΜΙΚΗ ΠΑΛΙΝΔΡΟΜΗΣΗ

Supervised Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Regression Example: Boston Housing. Regression Example: Boston Housing

Transcript:

waffle Chris Parrish June 18, 2016 Contents Waffle House 1 data..................................................... 2 exploratory data analysis......................................... 2 Waffle Houses............................................. 2 Marriage.s.............................................. 3 MedianAgeMarriage.s....................................... 4 MedianAgeMarriage.s and Marriage.s.............................. 5 model divorce rate ~ age at marriage................................... 6 map m5.1............................................... 6 model divorce rate ~ marriage rate..................................... 8 map m5.2............................................... 8 model divorce rate ~ age at marriage and marriage rate......................... 9 map m5.3............................................... 9 model marriage rate ~ age at marriage.................................. 10 map m5.4............................................... 10 plotting multivariate posteriors...................................... 11 predictor residual plots........................................ 11 counterfactual plots......................................... 12 posterior prediction plots...................................... 15 simulating spurious association................................... 17 waffle reference: - McElreath, Statistical Rethinking, chap 5, pp.119-164 Waffle House library(rethinking) ## Loading required package: rstan ## Loading required package: ggplot2 ## rstan (Version 2.9.0-3, packaged: 2016-02-11 15:54:41 UTC, GitRev: 05c3d0058b6a) ## For execution on a local, multicore CPU with excess RAM we recommend calling ## rstan_options(auto_write = TRUE) ## options(mc.cores = parallel::detectcores()) ## Loading required package: parallel ## rethinking (Version 1.58) library(ggplot2) 1

data ## R code 5.1 # load data data(waffledivorce) d <- WaffleDivorce str(d) ## 'data.frame': 50 obs. of 13 variables: ## $ Location : Factor w/ 50 levels "Alabama","Alaska",..: 1 2 3 4 5 6 7 8 9 10... ## $ Loc : Factor w/ 50 levels "AK","AL","AR",..: 2 1 4 3 5 6 7 9 8 10... ## $ Population : num 4.78 0.71 6.33 2.92 37.25... ## $ MedianAgeMarriage: num 25.3 25.2 25.8 24.3 26.8 25.7 27.6 26.6 29.7 26.4... ## $ Marriage : num 20.2 26 20.3 26.4 19.1 23.5 17.1 23.1 17.7 17... ## $ Marriage.SE : num 1.27 2.93 0.98 1.7 0.39 1.24 1.06 2.89 2.53 0.58... ## $ Divorce : num 12.7 12.5 10.8 13.5 8 11.6 6.7 8.9 6.3 8.5... ## $ Divorce.SE : num 0.79 2.05 0.74 1.22 0.24 0.94 0.77 1.39 1.89 0.32... ## $ WaffleHouses : int 128 0 18 41 0 11 0 3 0 133... ## $ South : int 1 0 0 1 0 0 0 0 0 1... ## $ Slaves1860 : int 435080 0 0 111115 0 0 0 1798 0 61745... ## $ Population1860 : int 964201 0 0 435450 379994 34277 460147 112216 75080 140424... ## $ PropSlaves1860 : num 0.45 0 0 0.26 0 0 0 0.016 0 0.44... head(d) ## Location Loc Population MedianAgeMarriage Marriage Marriage.SE Divorce ## 1 Alabama AL 4.78 25.3 20.2 1.27 12.7 ## 2 Alaska AK 0.71 25.2 26.0 2.93 12.5 ## 3 Arizona AZ 6.33 25.8 20.3 0.98 10.8 ## 4 Arkansas AR 2.92 24.3 26.4 1.70 13.5 ## 5 California CA 37.25 26.8 19.1 0.39 8.0 ## 6 Colorado CO 5.03 25.7 23.5 1.24 11.6 ## Divorce.SE WaffleHouses South Slaves1860 Population1860 PropSlaves1860 ## 1 0.79 128 1 435080 964201 0.45 ## 2 2.05 0 0 0 0 0.00 ## 3 0.74 18 0 0 0 0.00 ## 4 1.22 41 1 111115 435450 0.26 ## 5 0.24 0 0 0 379994 0.00 ## 6 0.94 11 0 0 34277 0.00 exploratory data analysis Waffle Houses ggplot(d, aes(x = WaffleHouses / Population, y = Divorce)) + geom_point(shape = 20, color = "darkred") + geom_smooth(method = "lm") + labs(x = "Waffle Houses per million", y = "Divorce rate") 2

14 12 Divorce rate 10 8 6 0 10 20 30 40 Waffle Houses per million Marriage.s Standardize marriage. d$marriage.s <- (d$marriage - mean(d$marriage))/sd(d$marriage) ggplot(d, aes(marriage.s, Divorce)) + geom_point(aes(x = Marriage.s, y = Divorce), shape = 20, color = "darkred") + geom_smooth(method = "lm") + labs(x = "Marriage.s", y = "Divorce") 3

12 Divorce 10 8 6 1 0 1 2 3 Marriage.s M edianagem arriage.s Standardize median age at marriage. # standardize predictor d$medianagemarriage.s <- (d$medianagemarriage-mean(d$medianagemarriage))/ sd(d$medianagemarriage) ggplot(d, aes(medianagemarriage.s, Divorce)) + geom_point(aes(x = MedianAgeMarriage.s, y = Divorce), shape = 20, color = "darkred") + geom_smooth(method = "lm") + labs(x = "MedianAgeMarriage.s", y = "Divorce") 4

12 Divorce 10 8 6 2 1 0 1 2 3 MedianAgeMarriage.s M edianagem arriage.s and M arriage.s How are marriage rate and median age at marriage related? ggplot(d, aes(medianagemarriage.s, Marriage.s)) + geom_point(shape = 20, color = "darkred") + geom_smooth(method = "lm") + labs(x = "MedianAgeMarriage.s", y = "Marriage.s") 5

3 2 1 Marriage.s 0 1 2 3 2 1 0 1 2 3 MedianAgeMarriage.s model divorce rate ~ age at marriage. D i Normal(µ i, σ) µ i = α + β A A i α Normal(10, 10) β A Normal(0, 1) σ Uniform(0, 10) map m5.1 # fit model m5.1 <- map( alist( Divorce ~ dnorm( mu, sigma ), mu <- a + ba * MedianAgeMarriage.s, a ~ dnorm( 10, 10 ), ba ~ dnorm( 0, 1 ), sigma ~ dunif( 0, 10 ) ), data = d ) precis(m5.1) ## Mean StdDev 5.5% 94.5% ## a 9.69 0.20 9.36 10.01 6

## ba -1.04 0.20-1.37-0.72 ## sigma 1.45 0.14 1.22 1.68 plot( precis(m5.1) ) a ba sigma 0 2 4 6 8 10 Value ## R code 5.2 # compute percentile interval of mean MAM.seq <- seq( from=-3, to=3.5, length.out=30 ) mu <- link( m5.1, data=data.frame(medianagemarriage.s=mam.seq) ) ## [ 100 / 1000 ] [ 200 / 1000 ] [ 300 / 1000 ] [ 400 / 1000 ] [ 500 / 1000 ] [ 600 / 1000 ] [ 700 / 1000 ] [ 800 / 1000 ] [ 900 / 1000 ] [ 1000 / 1000 ] mu.pi <- apply( mu, 2, PI ) # plot it all plot( Divorce ~ MedianAgeMarriage.s, data=d, col=rangi2 ) abline( m5.1 ) ## Warning in abline(m5.1): only using the first two of 3 regression ## coefficients shade( mu.pi, MAM.seq ) 7

Divorce 6 8 10 12 2 1 0 1 2 3 MedianAgeMarriage.s model divorce rate ~ marriage rate. D i Normal(µ i, σ) µ i = α + β R R i α Normal(10, 10) β R Normal(0, 1) σ Uniform(0, 10) map m5.2 ## R code 5.3 d$marriage.s <- (d$marriage - mean(d$marriage))/sd(d$marriage) m5.2 <- map( alist( Divorce ~ dnorm( mu, sigma ), mu <- a + br * Marriage.s, a ~ dnorm( 10, 10 ), br ~ dnorm( 0, 1 ), sigma ~ dunif( 0, 10 ) ), data = d ) precis(m5.2) ## Mean StdDev 5.5% 94.5% ## a 9.69 0.24 9.31 10.07 ## br 0.64 0.23 0.27 1.02 ## sigma 1.67 0.17 1.40 1.94 plot( precis(m5.2) ) 8

a br sigma 0 2 4 6 8 10 Value model divorce rate ~ age at marriage and marriage rate. D i Normal(µ i, σ) µ i = α + β R R i + β A A i α Normal(10, 10) β R Normal(0, 1) β A Normal(0, 1) σ Uniform(0, 10) map m5.3 ## R code 5.4 m5.3 <- map( alist( Divorce ~ dnorm( mu, sigma ), mu <- a + br*marriage.s + ba*medianagemarriage.s, a ~ dnorm( 10, 10 ), br ~ dnorm( 0, 1 ), ba ~ dnorm( 0, 1 ), sigma ~ dunif( 0, 10 ) ), data = d ) precis( m5.3 ) ## Mean StdDev 5.5% 94.5% ## a 9.69 0.20 9.36 10.01 ## br -0.13 0.28-0.58 0.31 ## ba -1.13 0.28-1.58-0.69 ## sigma 1.44 0.14 1.21 1.67 ## R code 5.5 plot( precis(m5.3) ) 9

a br ba sigma 2 0 2 4 6 8 10 Value model marriage rate ~ age at marriage map m5.4 ## R code 5.6 m5.4 <- map( alist( Marriage.s ~ dnorm( mu, sigma ), mu <- a + b*medianagemarriage.s, a ~ dnorm( 0, 10 ), b ~ dnorm( 0, 1 ), sigma ~ dunif( 0, 10 ) ), data = d ) precis(m5.4) ## Mean StdDev 5.5% 94.5% ## a 0.00 0.10-0.16 0.16 ## b -0.71 0.10-0.87-0.56 ## sigma 0.69 0.07 0.58 0.80 plot( precis(m5.4) ) 10

a b sigma 0.5 0.0 0.5 Value plotting multivariate posteriors predictor residual plots ## R code 5.7 # compute expected value at MAP, for each State mu <- coef(m5.4)['a'] + coef(m5.4)['b']*d$medianagemarriage.s # compute residual for each State m.resid <- d$marriage.s - mu ## R code 5.8 plot( Marriage.s ~ MedianAgeMarriage.s, d, col=rangi2 ) abline( m5.4 ) ## Warning in abline(m5.4): only using the first two of 3 regression ## coefficients # loop over States for ( i in 1:length(m.resid) ) { x <- d$medianagemarriage.s[i] # x location of line segment y <- d$marriage.s[i] # observed endpoint of line segment # draw the line segment lines( c(x,x), c(mu[i],y), lwd=0.5, col=col.alpha("black",0.7) ) } 11

Marriage.s 1 0 1 2 2 1 0 1 2 3 MedianAgeMarriage.s counterfactual plots ## R code 5.9 # prepare new counterfactual data A.avg <- mean( d$medianagemarriage.s ) R.seq <- seq( from=-3, to=3, length.out=30 ) pred.data <- data.frame( Marriage.s=R.seq, MedianAgeMarriage.s=A.avg ) # compute counterfactual mean divorce (mu) mu <- link( m5.3, data=pred.data ) ## [ 100 / 1000 ] [ 200 / 1000 ] [ 300 / 1000 ] [ 400 / 1000 ] [ 500 / 1000 ] [ 600 / 1000 ] [ 700 / 1000 ] [ 800 / 1000 ] [ 900 / 1000 ] [ 1000 / 1000 ] mu.mean <- apply( mu, 2, mean ) mu.pi <- apply( mu, 2, PI ) # simulate counterfactual divorce outcomes R.sim <- sim( m5.3, data=pred.data, n=1e4 ) ## [ 1000 / 10000 ] 12

[ 2000 / 10000 ] [ 3000 / 10000 ] [ 4000 / 10000 ] [ 5000 / 10000 ] [ 6000 / 10000 ] [ 7000 / 10000 ] [ 8000 / 10000 ] [ 9000 / 10000 ] [ 10000 / 10000 ] R.PI <- apply( R.sim, 2, PI ) # display predictions, hiding raw data with type="n" plot( Divorce ~ Marriage.s, data=d, type="n" ) mtext( "MedianAgeMarriage.s = 0" ) lines( R.seq, mu.mean ) shade( mu.pi, R.seq ) shade( R.PI, R.seq ) MedianAgeMarriage.s = 0 Divorce 6 8 10 12 1 0 1 2 Marriage.s ## R code 5.10 R.avg <- mean( d$marriage.s ) A.seq <- seq( from=-3, to=3.5, length.out=30 ) pred.data2 <- data.frame( Marriage.s=R.avg, MedianAgeMarriage.s=A.seq ) mu <- link( m5.3, data=pred.data2 ) ## [ 100 / 1000 ] [ 200 / 1000 ] [ 300 / 1000 ] [ 400 / 1000 ] 13

[ 500 / 1000 ] [ 600 / 1000 ] [ 700 / 1000 ] [ 800 / 1000 ] [ 900 / 1000 ] [ 1000 / 1000 ] mu.mean <- apply( mu, 2, mean ) mu.pi <- apply( mu, 2, PI ) A.sim <- sim( m5.3, data=pred.data2, n=1e4 ) ## [ 1000 / 10000 ] [ 2000 / 10000 ] [ 3000 / 10000 ] [ 4000 / 10000 ] [ 5000 / 10000 ] [ 6000 / 10000 ] [ 7000 / 10000 ] [ 8000 / 10000 ] [ 9000 / 10000 ] [ 10000 / 10000 ] A.PI <- apply( A.sim, 2, PI ) plot( Divorce ~ MedianAgeMarriage.s, data=d, type="n" ) mtext( "Marriage.s = 0" ) lines( A.seq, mu.mean ) shade( mu.pi, A.seq ) shade( A.PI, A.seq ) Marriage.s = 0 Divorce 6 8 10 12 2 1 0 1 2 3 MedianAgeMarriage.s 14

posterior prediction plots ## R code 5.11 # call link without specifying new data # so it uses original data mu <- link( m5.3 ) ## [ 100 / 1000 ] [ 200 / 1000 ] [ 300 / 1000 ] [ 400 / 1000 ] [ 500 / 1000 ] [ 600 / 1000 ] [ 700 / 1000 ] [ 800 / 1000 ] [ 900 / 1000 ] [ 1000 / 1000 ] # summarize samples across cases mu.mean <- apply( mu, 2, mean ) mu.pi <- apply( mu, 2, PI ) # simulate observations # again no new data, so uses original data divorce.sim <- sim( m5.3, n=1e4 ) ## [ 1000 / 10000 ] [ 2000 / 10000 ] [ 3000 / 10000 ] [ 4000 / 10000 ] [ 5000 / 10000 ] [ 6000 / 10000 ] [ 7000 / 10000 ] [ 8000 / 10000 ] [ 9000 / 10000 ] [ 10000 / 10000 ] divorce.pi <- apply( divorce.sim, 2, PI ) ## R code 5.12 plot( mu.mean ~ d$divorce, col=rangi2, ylim=range(mu.pi), xlab="observed divorce", ylab="predicted divorce" ) abline( a=0, b=1, lty=2 ) for ( i in 1:nrow(d) ) lines( rep(d$divorce[i],2), c(mu.pi[1,i],mu.pi[2,i]), col=rangi2 ) ## R code 5.13 identify( x=d$divorce, y=mu.mean, labels=d$loc, cex=0.8 ) 15

Predicted divorce 6 8 10 12 ## integer(0) 6 8 10 12 Observed divorce ## R code 5.14 # compute residuals divorce.resid <- d$divorce - mu.mean # get ordering by divorce rate o <- order(divorce.resid) # make the plot dotchart( divorce.resid[o], labels=d$loc[o], xlim=c(-6,5), cex=0.6 ) abline( v=0, col=col.alpha("black",0.2) ) for ( i in 1:nrow(d) ) { j <- o[i] # which State in order lines( d$divorce[j]-c(mu.pi[1,j],mu.pi[2,j]), rep(i,2) ) points( d$divorce[j]-c(divorce.pi[1,j],divorce.pi[2,j]), rep(i,2), pch=3, cex=0.6, col="gray" ) } 16

ME AR AL AK KY OK GA CO RI LA MS IN NH TN AZ SD OR VT WV NM WA MD MA KS IA OH NC DC DE MI TX VA HI MO WY IL MT FL CA NY PA WI SC NE UT CT ND MN NJ ID 6 4 2 0 2 4 simulating spurious association ## R code 5.15 N <- 100 # number of cases x_real <- rnorm( N ) # x_real as Gaussian with mean 0 and stddev 1 x_spur <- rnorm( N, x_real ) # x_spur as Gaussian with mean=x_real y <- rnorm( N, x_real ) # y as Gaussian with mean=x_real d <- data.frame(y,x_real,x_spur) # bind all together in data frame pairs(~ y + x_real + x_spur, data=d, col="darkred") 17

2 1 0 1 2 y 3 1 1 3 2 0 1 2 x_real x_spur 2 0 1 2 3 3 2 1 0 1 2 3 2 1 0 1 2 3 demo.lm <- lm(y ~ x_real + x_spur, data=d) options(show.signif.stars=false) summary(demo.lm) ## ## Call: ## lm(formula = y ~ x_real + x_spur, data = d) ## ## Residuals: ## Min 1Q Median 3Q Max ## -2.5484-0.5838 0.1744 0.6885 2.3940 ## ## Coefficients: ## Estimate Std. Error t value Pr(> t ) ## (Intercept) -0.003954 0.103260-0.038 0.970 ## x_real 0.864603 0.159877 5.408 4.58e-07 ## x_spur 0.059178 0.098300 0.602 0.549 ## ## Residual standard error: 1.025 on 97 degrees of freedom ## Multiple R-squared: 0.368, Adjusted R-squared: 0.355 ## F-statistic: 28.24 on 2 and 97 DF, p-value: 2.158e-10 18