DirichletReg: Dirichlet Regression for Compositional Data in R

Σχετικά έγγραφα
DirichletReg: Dirichlet Regression for Compositional Data in R

Modern Regression HW #8 Solutions


5.1 logistic regresssion Chris Parrish July 3, 2016

Wan Nor Arifin under the Creative Commons Attribution-ShareAlike 4.0 International License. 1 Introduction 1

Wan Nor Arifin under the Creative Commons Attribution-ShareAlike 4.0 International License. 1 Introduction 1

519.22(07.07) 78 : ( ) /.. ; c (07.07) , , 2008

Λογιστική Παλινδρόµηση

Generalized additive models in R

waffle Chris Parrish June 18, 2016

Biostatistics for Health Sciences Review Sheet

Γραµµική Παλινδρόµηση

Supplementary figures

Statistics 104: Quantitative Methods for Economics Formula and Theorem Review

Εργασία. στα. Γενικευμένα Γραμμικά Μοντέλα

Επιστηµονική Επιµέλεια ρ. Γεώργιος Μενεξές. Εργαστήριο Γεωργίας. Viola adorata

Table 1: Military Service: Models. Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7 Model 8 Model 9 num unemployed mili mili num unemployed

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

Does anemia contribute to end-organ dysfunction in ICU patients Statistical Analysis

Supplementary Appendix

Supplementary Information 1.

Άσκηση 10, σελ Για τη μεταβλητή x (άτυπος όγκος) έχουμε: x censored_x 1 F 3 F 3 F 4 F 10 F 13 F 13 F 16 F 16 F 24 F 26 F 27 F 28 F

Lampiran 1 Output SPSS MODEL I

Λογαριθμικά Γραμμικά Μοντέλα Poisson Παλινδρόμηση Παράδειγμα στο SPSS

ΠΟΛΛΑΠΛΗ ΠΑΛΙΝΔΡΟΜΗΣΗ: ΑΣΚΗΣΕΙΣ

PENGARUHKEPEMIMPINANINSTRUKSIONAL KEPALASEKOLAHDAN MOTIVASI BERPRESTASI GURU TERHADAP KINERJA MENGAJAR GURU SD NEGERI DI KOTA SUKABUMI

R & R- Studio. Πασχάλης Θρήσκος PhD Λάρισα

Ανάλυση της ιακύµανσης

Μαντζούνη, Πιπερίγκου, Χατζή. ΒΙΟΣΤΑΤΙΣΤΙΚΗ Εργαστήριο 5 ο

Interpretation of linear, logistic and Poisson regression models with transformed variables and its implementation in the R package tlm

Μενύχτα, Πιπερίγκου, Σαββάτης. ΒΙΟΣΤΑΤΙΣΤΙΚΗ Εργαστήριο 5 ο

SECTION II: PROBABILITY MODELS

2. ΕΠΙΛΟΓΗ ΤΟΥ ΜΕΓΕΘΟΥΣ ΤΩΝ ΠΑΡΑΤΗΡΗΣΕΩΝ

!"!"!!#" $ "# % #" & #" '##' #!( #")*(+&#!', & - #% '##' #( &2(!%#(345#" 6##7

Μενύχτα, Πιπερίγκου, Σαββάτης. ΒΙΟΣΤΑΤΙΣΤΙΚΗ Εργαστήριο 6 ο

Για να ελέγξουµε αν η κατανοµή µιας µεταβλητής είναι συµβατή µε την κανονική εφαρµόζουµε το test Kolmogorov-Smirnov.

t ts P ALEPlot t t P rt P ts r P ts t r P ts

Queensland University of Technology Transport Data Analysis and Modeling Methodologies

Ανάλυση Δεδομένων με χρήση του Στατιστικού Πακέτου R

FORMULAS FOR STATISTICS 1

An Introduction to Splines

Απλή Ευθύγραµµη Συµµεταβολή

794 Appendix A:Tables

1.1 t Rikon * --- Signif. codes: 0 *** ** 0.01 *

1. Ιστόγραμμα. Προκειμένου να αλλάξουμε το εύρος των bins κάνουμε διπλό κλικ οπουδήποτε στο ιστόγραμμα και μετά

( ) ( ) STAT 5031 Statistical Methods for Quality Improvement. Homework n = 8; x = 127 psi; σ = 2 psi (a) µ 0 = 125; α = 0.

Αν οι προϋποθέσεις αυτές δεν ισχύουν, τότε ανατρέχουµε σε µη παραµετρικό τεστ.

(i) Περιγραφική ανάλυση των μεταβλητών PRICE

Funktionsdauer von Batterien in Abhängigkeit des verwendeten Materials und der Umgebungstemperatur

1. Εισαγωγή...σελ Δεδομένα της εργασίας...σελ Μεθοδολογία...σελ Ανάλυση των δεδομένων.σελ Συγκριτικά αποτελέσματα..σελ.

Table A.1 Random numbers (section 1)

5.6 evaluating, checking, comparing Chris Parrish July 3, 2016

Αυτό το κεφάλαιο εξηγεί τις ΠΑΡΑΜΕΤΡΟΥΣ προς χρήση αυτού του προϊόντος. Πάντα να μελετάτε αυτές τις οδηγίες πριν την χρήση.

ΕΡΓΑΙΑ Εθηίκεζε αμίαο κεηαπώιεζεο ζπηηηώλ κε αλάιπζε δεδνκέλωλ. Παιεάο Δπζηξάηηνο

Γενικευµένα Γραµµικά Μοντέλα

Ηλεκτρονικοί Υπολογιστές IV

APPENDICES APPENDIX A. STATISTICAL TABLES AND CHARTS 651 APPENDIX B. BIBLIOGRAPHY 677 APPENDIX C. ANSWERS TO SELECTED EXERCISES 679


Εφαρμοσμένη Στατιστική Δημήτριος Μπάγκαβος Τμήμα Μαθηματικών και Εφαρμοσμένων Μαθηματικών Πανεπισ τήμιο Κρήτης 28 Μαρτίου /36

Introduction to the ML Estimation of ARMA processes

1 (forward modeling) 2 (data-driven modeling) e- Quest EnergyPlus DeST 1.1. {X t } ARMA. S.Sp. Pappas [4]

Σεμινάριο Προηγμένα Θέματα Στατιστικής. Dr. Nikolaos Mittas Dr. Theodosios Theodosiou

Optimizing Microwave-assisted Extraction Process for Paprika Red Pigments Using Response Surface Methodology

!"#$%& '!(#)& a<.21c67.<9 /06 :6>/ 54.6: 1. ]1;A76 _F -. /06 4D26.36 <> A.:4D6:6C C4/4 /06 D:43? C</ O=47?6C b*dp 12 :1?6:E /< D6 3:4221N6C 42 D:A6 O=

ΟΙΚΟΝΟΜΕΤΡΙΑ. Βιολέττα Δάλλα. Εθνικό και Καποδιστριακό Πανεπιστήµιο Αθηνών

+ ε βελτιώνει ουσιαστικά το προηγούμενο (β 3 = 0;) 2. Εξετάστε ποιο από τα παρακάτω τρία μοντέλα:

LAMPIRAN. Lampiran I Daftar sampel Perusahaan No. Kode Nama Perusahaan. 1. AGRO PT Bank Rakyat Indonesia AgroniagaTbk.

ΠΕΡΙΓΡΑΦΙΚΗ ΣΤΑΤΙΣΤΙΚΗ ΚΑΙ ΕΛΕΓΧΟΣ ΥΠΟΘΕΣΕΩΝ

Supplementary Material for The Cusp Catastrophe Model as Cross-Sectional and Longitudinal Mixture Structural Equation Models

Άσκηση 1. Πληθυσμός (Χ i1 )

Άσκηση 2. i β. 1 ου έτους (Υ i )

8. ΑΠΛΗ ΓΡΑΜΜΙΚΗ ΠΑΛΙΝΔΡΟΜΗΣΗ Ι

ΔPersediaan = Persediaan t+1 - Persediaan t

Γενικευμένα Γραμμικά Μοντέλα (GLM) Επισκόπηση

APPENDIX B NETWORK ADJUSTMENT REPORTS JEFFERSON COUNTY, KENTUCKY JEFFERSON COUNTY, KENTUCKY JUNE 2016

ΑΝΑΛΥΣΗ ΠΑΛΙΝΔΡΟΜΗΣΗΣ,

DOUGLAS FIR BEETLE TRAP-SUPPRESSION STUDY STATISTICAL REPORT

Ανάλυση Δεδομένων με χρήση του Στατιστικού Πακέτου R

$ι ιι η ι ι!η ηι ι ANOVA. To ANOVA ι ι ι η η η ιη (Analysis of Variance). * ι! ι ι ι ι ι η ιη. ;, ι ι ι! η ιι ηιη ι ι!η ι η η ιη ι ι η ι η.

Lampiran 2 Hasil Kuesioner No. BA1 BA2 BA3 BA4 PQ1 PQ2 PQ3 PQ4 PQ

Εργαστήριο στατιστικής Στατιστικό πακέτο S.P.S.S.

ΑΝΑΛΥΣΗ Ε ΟΜΕΝΩΝ. 7. Παλινδρόµηση


ΠΕΡΙΓΡΑΦΙΚΗ και ΕΠΑΓΩΓΙΚΗ ΣΤΑΤΙΣΤΙΚΗ

MATHACHij = γ00 + u0j + rij

Description of the PX-HC algorithm

χ 2 test ανεξαρτησίας

Προσοµοίωση Εξέτασης στο µάθηµα του Γεωργικού Πειραµατισµού


ο),,),--,ο< $ι ιι!η ι ηι ι ιι ιι t (t-test): ι ι η ι ι. $ι ι η ι ι ι 2 x s ι ι η η ιη ι η η SE x

Στατιστική Ανάλυση Δεδομένων II. Γραμμική Παλινδρόμηση με το S.P.S.S.

Α. Μπατσίδης Πρόχειρες βοηθητικές διδακτικές σημειώσεις

Supplemental tables and figures

ΜΑΘΗΜΑ 3ο. Υποδείγματα μιας εξίσωσης

3 Regressionsmodelle für Zähldaten

Szabolcs Sofalvi, M.S., D-ABFT-FT Cleveland, Ohio

ΠΑΡΑΡΤΗΜΑ 1 (4 ου ΜΑΘΗΜΑΤΟΣ): ΠΑΡΑ ΕΙΓΜΑΤΑ ΕΛΕΓΧΩΝ ΥΠΟΘΕΣΕΩΝ, ΕΠΙΛΟΓΗΣ ΜΟΝΤΕΛΩΝ ΚΑΙ ΜΕΤΑΒΛΗΤΩΝ

ΑΝΑΛΥΣΗ ΔΕΔΟΜΕΝΩΝ. Δρ. Βασίλης Π. Αγγελίδης Τμήμα Μηχανικών Παραγωγής & Διοίκησης Δημοκρίτειο Πανεπιστήμιο Θράκης

Research on Economics and Management

Λυμένες Ασκήσεις για το μάθημα:

Transcript:

DirichletReg: Dirichlet Regression for Compositional Data in R Marco J. Maier Wirtschaftsuniversität Wien Abstract... Keywords: Dirichlet regression, Dirichlet distribution, multivariate generalized linear model, rates, proportions, rates, compositional data, simplex, R. 4. Application examples 4.1. The Arctic lake (common parametrization) > library(dirichletreg) > head(arcticlake) sand silt clay depth 1 0.775 0.195 0.030 10.4 2 0.719 0.249 0.032 11.7 3 0.507 0.361 0.132 12.8 4 0.522 0.409 0.066 13.0 5 0.700 0.265 0.035 15.7 6 0.665 0.322 0.013 16.3 > AL <- DR_data(ArcticLake[, 1:3]) > AL[1:6, ] sand silt clay 1 0.7750000 0.1950000 0.0300000 2 0.7190000 0.2490000 0.0320000 3 0.5070000 0.3610000 0.1320000 4 0.5235707 0.4102307 0.0661986 5 0.7000000 0.2650000 0.0350000 6 0.6650000 0.3220000 0.0130000 > lake1 <- DirichReg(AL ~ depth, ArcticLake) > lake1 Call: DirichReg(formula = AL ~ depth, data = ArcticLake) using the common parametrization

2 DirichletReg: Dirichlet Regression for Compositional Data in R > par(mfrow = c(2, 1)) > plot(al, cex = 0.5, a2d = list(colored = FALSE, c.grid = FALSE)) > plot(rep(arcticlake$depth, 3), as.numeric(al), pch = 21, bg = rep(c("#e495a5", + "#86B875", "#7DB0DD"), each = 39), xlab = "Depth (m)", ylab = "Proportion", + ylim = 0:1) sand silt.9.8.7.6.5.4.3.2.1.9.8.7.6.5.4.3.1.2.3.4.5.6.7.8.9.2.1 clay Figure 1: Arctic lake: Ternary plot and depth vs. composition.

Marco J. Maier 3 Log-likelihood: 101.4 on 6 df (54+3 iterations) ---------------------------------------- Coefficients for variable no. 1: sand (Intercept) depth 0.11662 0.02335 ---------------------------------------- Coefficients for variable no. 2: silt (Intercept) depth -0.31060 0.05557 ---------------------------------------- Coefficients for variable no. 3: clay (Intercept) depth -1.1520 0.0643 ---------------------------------------- > coef(lake1) $sand (Intercept) depth 0.11662480 0.02335114 $silt (Intercept) depth -0.31059591 0.05556745 $clay (Intercept) depth -1.15195642 0.06430175 > lake2 <- update(lake1,. ~. + I(depth^2). + I(depth^2). + I(depth^2)) > anova(lake1, lake2) Analysis of Deviance Table Model 1: DirichReg(formula = AL ~ depth, data = ArcticLake) Model 2: DirichReg(formula = AL ~ depth + I(depth^2) depth + I(depth^2) depth + I(depth^2), data = ArcticLake) Deviance N. par Difference df p-value Model 1-202.7393 6 - - - Model 2-217.9937 9 15.25441 3 0.001611655 > summary(lake2) Call: DirichReg(formula = AL ~ depth + I(depth^2) depth + I(depth^2) depth + I(depth^2), data = ArcticLake) Standardized Residuals: Min 1Q Median 3Q Max sand -1.7647-0.7080-0.1786 0.9598 3.0460 silt -1.1379-0.5330-0.1546 0.2788 1.5604 clay -1.7661-0.6583-0.0454 0.6584 2.0152 Beta-Coefficients for variable no. 1: sand (Intercept) 1.4361967 0.8026814 1.789 0.0736.

4 DirichletReg: Dirichlet Regression for Compositional Data in R depth -0.0072383 0.0329433-0.220 0.8261 I(depth^2) 0.0001324 0.0002761 0.480 0.6315 Beta-Coefficients for variable no. 2: silt (Intercept) -0.0259705 0.7598827-0.034 0.9727 depth 0.0717450 0.0343089 2.091 0.0365 * I(depth^2) -0.0002679 0.0003088-0.867 0.3857 Beta-Coefficients for variable no. 3: clay (Intercept) -1.7931487 0.7362293-2.436 0.01487 * depth 0.1107906 0.0357705 3.097 0.00195 ** I(depth^2) -0.0004872 0.0003308-1.473 0.14079 Signif. codes: `***' <.001, `**' < 0.01, `*' < 0.05, `.' < 0.1 Log-likelihood: 109 on 9 df (168+2 iterations) AIC: -200, BIC: -185.0217 Number of Observations: 39 Link: Log Parametrization: common 4.2. Blood samples (alternative parametrization) > Bld <- BloodSamples > Bld$Smp <- DR_data(Bld[, 1:4]) > blood1 <- DirichReg(Smp ~ Disease 1, Bld, model = "alternative", base = 3) > blood2 <- DirichReg(Smp ~ Disease Disease, Bld, model = "alternative", base = 3) > anova(blood1, blood2) Analysis of Deviance Table Model 1: DirichReg(formula = Smp ~ Disease 1, data = Bld, model = "alternative", base = 3) Model 2: DirichReg(formula = Smp ~ Disease Disease, data = Bld, model = "alternative", base = 3) Deviance N. par Difference df p-value Model 1-303.8560 7 - - - Model 2-304.6147 8 0.7586655 1 0.3837465 > summary(blood1) Call: DirichReg(formula = Smp ~ Disease 1, data = Bld, model = "alternative", base = 3) Standardized Residuals: Min 1Q Median 3Q Max Albumin -2.1310-0.9307-0.1234 0.8149 2.8429 Pre.Albumin -1.0687-0.4054-0.0789 0.1947 1.5691 Globulin.A -2.0503-1.0392 0.1938 0.7927 2.2393 Globulin.B -1.8176-0.5347 0.1488 0.5115 1.3284 MEAN MODELS: Coefficients for variable no. 1: Albumin (Intercept) 1.11639 0.09935 11.237 <2e-16 ***

Marco J. Maier 5 > par(mar = c(4, 4, 4, 4) + 0.1) > plot(rep(arcticlake$depth, 3), as.numeric(al), pch = 21, bg = rep(c("#e495a5", + "#86B875", "#7DB0DD"), each = 39), xlab = "Depth (m)", ylab = "Proportion", + ylim = 0:1, main = "Sediment Composition in an Arctic Lake") > Xnew <- data.frame(depth = seq(min(arcticlake$depth), max(arcticlake$depth), + length.out = 100)) > for (i in 1:3) lines(cbind(xnew, predict(lake2, Xnew)[, i]), col = c("#e495a5", + "#86B875", "#7DB0DD")[i], lwd = 2) > legend("topleft", legend = c("sand", "Silt", "Clay"), lwd = 2, col = c("#e495a5", + "#86B875", "#7DB0DD"), pt.bg = c("#e495a5", "#86B875", "#7DB0DD"), pch = 21, + bty = "n") > par(new = TRUE) > plot(cbind(xnew, predict(lake2, Xnew, F, F, T)), lty = "24", type = "l", ylim = c(0, + max(predict(lake2, Xnew, F, F, T))), axes = F, ann = F, lwd = 2) > axis(4) > mtext(expression(paste("precision (", phi, ")", sep = "")), 4, line = 3) > legend("top", legend = c(expression(hat(mu[c] == hat(alpha)[c]/hat(alpha)[0])), + expression(hat(phi) == hat(alpha)[0])), lty = c(1, 2), lwd = c(3, 2), bty = "n") Sediment Composition in an Arctic Lake Proportion 0.0 0.2 0.4 0.6 0.8 1.0 Sand Silt Clay ^ µ c = α^c α^0 φ^ = α^0 0 50 100 150 Precision (φ) 20 40 60 80 100 Depth (m) Figure 2: Arctic lake: Fitted values of the quadratic model.

6 DirichletReg: Dirichlet Regression for Compositional Data in R > AL <- ArcticLake > AL$AL <- DR_data(ArcticLake[, 1:3]) > dd <- range(arcticlake$depth) > X <- data.frame(depth = seq(dd[1], dd[2], length.out = 200)) > pp <- predict(dirichreg(al ~ depth + I(depth^2), AL), X) > plot(al$al, cex = 0.1, reset_par = FALSE) > points(dirichletreg:::coord.trafo(al$al[, c(2, 3, 1)]), pch = 16, cex = 0.5, + col = gray(0.5)) > lines(dirichletreg:::coord.trafo(pp[, c(2, 3, 1)]), lwd = 3, col = c("#6e1d34", + "#004E42")[2]) > Dols <- log(cbind(arcticlake[, 2]/ArcticLake[, 1], ArcticLake[, 3]/ArcticLake[, + 1])) > ols <- lm(dols ~ depth + I(depth^2), ArcticLake) > p2 <- predict(ols, X) > p2m <- exp(cbind(0, p2[, 1], p2[, 2]))/rowSums(exp(cbind(0, p2[, 1], p2[, 2]))) > lines(dirichletreg:::coord.trafo(p2m[, c(2, 3, 1)]), lwd = 3, col = c("#6e1d34", + "#004E42")[1], lty = "21") sand.1.9.4.3.2.8.7.6.5.5.6.4.7.3.9.8.2.1 silt.1.2.3.4.5.6.7.8.9 clay Figure 3: Arctic lake: OLS (dashed) vs. Dirichlet regression (solid) predictions.

Marco J. Maier 7 > par(mfrow = c(1, 4)) > for (i in 1:4) { + boxplot(bld$smp[, i] ~ Bld$Disease, ylim = range(bld$smp[, 1:4]), main = paste(names(bld)[i]), + xlab = "Disease Type", ylab = "Proportion") + segments(c(-5, 1.5), unique(fitted(blood2)[, i]), c(1.5, 5), unique(fitted(blood2)[, + i]), lwd = 3, lty = 2) + } Albumin Pre.Albumin Globulin.A Globulin.B Proportion 0.1 0.2 0.3 0.4 0.5 Proportion 0.1 0.2 0.3 0.4 0.5 Proportion 0.1 0.2 0.3 0.4 0.5 Proportion 0.1 0.2 0.3 0.4 0.5 A B A B A B A B Disease Type Disease Type Disease Type Disease Type Figure 4: Blood samples: Box plots and fitted values (dashed lines indicate the fitted values for each group). DiseaseB -0.07002 0.13604-0.515 0.607 Coefficients for variable no. 2: Pre.Albumin (Intercept) 0.5490 0.1082 5.076 3.86e-07 *** DiseaseB -0.1276 0.1493-0.855 0.393 Coefficients for variable no. 3: Globulin.A - variable omitted (reference category) - Coefficients for variable no. 4: Globulin.B (Intercept) 0.4863 0.1094 4.445 8.8e-06 *** DiseaseB 0.1819 0.1472 1.236 0.216 PRECISION MODEL: (Intercept) 4.2227 0.1475 28.64 <2e-16 *** Signif. codes: `***' <.001, `**' < 0.01, `*' < 0.05, `.' < 0.1 Log-likelihood: 151.9 on 7 df (56+2 iterations) AIC: -289.9, BIC: -280.0476 Number of Observations: 30 Links: Logit (Means) and Log (Precision) Parametrization: alternative > alpha <- predict(blood2, data.frame(disease = factor(c("a", "B"))), F, T, F) > L <- sapply(1:2, function(i) ddirichlet(dr_data(bld[31:36, 1:4]), unlist(alpha[i, + ])))

8 DirichletReg: Dirichlet Regression for Compositional Data in R > LP <- L/rowSums(L) > dimnames(lp) <- list(paste("c", 1:6), c("a", "B")) > print(data.frame(round(lp * 100, 1), pred. = as.factor(ifelse(lp[, 1] > LP[, + 2], "==> A", "==> B"))), print.gap = 2) A B pred. C 1 59.4 40.6 ==> A C 2 43.2 56.8 ==> B C 3 38.4 61.6 ==> B C 4 43.8 56.2 ==> B C 5 36.6 63.4 ==> B C 6 70.2 29.8 ==> A 4.3. Reading skills data (alternative parametrization) > RS <- ReadingSkills > RS$acc <- DR_data(RS$accuracy) > RS$dyslexia <- C(RS$dyslexia, treatment) > rs1 <- DirichReg(acc ~ dyslexia * iq dyslexia * iq, RS, model = "alternative") > rs2 <- DirichReg(acc ~ dyslexia * iq dyslexia + iq, RS, model = "alternative") > anova(rs1, rs2) Analysis of Deviance Table Model 1: DirichReg(formula = acc ~ dyslexia * iq dyslexia * iq, data = RS, model = "alternative") Model 2: DirichReg(formula = acc ~ dyslexia * iq dyslexia + iq, data = RS, model = "alternative") Deviance N. par Difference df p-value Model 1-133.4682 8 - - - Model 2-131.8037 7 1.664453 1 0.1970031 > a <- RS$accuracy > logra_a <- log(a/(1 - a)) > rlr <- lm(logra_a ~ dyslexia * iq, RS) > summary(rlr) Call: lm(formula = logra_a ~ dyslexia * iq, data = RS) Residuals: Min 1Q Median 3Q Max -2.66405-0.37966 0.03687 0.40887 2.50345 Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) 2.8067 0.2822 9.944 2.27e-12 *** dyslexiayes -2.4113 0.4517-5.338 4.01e-06 *** iq 0.7823 0.2992 2.615 0.0125 * dyslexiayes:iq -0.8457 0.4510-1.875 0.0681. --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 1.2 on 40 degrees of freedom Multiple R-squared: 0.6151, Adjusted R-squared: 0.5862 F-statistic: 21.31 on 3 and 40 DF, p-value: 2.083e-08 > summary(rs2)

Marco J. Maier 9 > B2 <- DR_data(BloodSamples[, c(1, 2, 4)]) > plot(b2, cex = 0.001, reset_par = FALSE) > div.col <- c("#023fa5", "#1A44A4", "#2749A4", "#314DA4", "#3952A5", "#4056A6", + "#465BA8", "#4D5FA9", "#5264AA", "#5868AC", "#5D6CAE", "#6371AF", "#6875B1", + "#6D79B3", "#727DB5", "#7681B7", "#7B85B8", "#8089BA", "#848DBC", "#8991BE", + "#8D95BF", "#9199C1", "#959CC3", "#9AA0C5", "#9EA4C6", "#A2A7C8", "#A5ABCA", + "#A9AECB", "#ADB1CD", "#B1B5CE", "#B4B8D0", "#B8BBD1", "#BBBED2", "#BEC1D4", + "#C1C4D5", "#C5C7D6", "#C8CAD8", "#CBCCD9", "#CDCFDA", "#D0D1DB", "#D3D4DC", + "#D5D6DD", "#D7D8DE", "#D9DADF", "#DBDCE0", "#DDDEE0", "#DFDFE1", "#E0E0E1", + "#E1E1E2", "#E2E2E2", "#E2E2E2", "#E2E1E1", "#E2E0E0", "#E1DFDF", "#E1DDDD", + "#E0DBDC", "#E0D9DA", "#DFD6D8", "#DED4D6", "#DDD1D3", "#DCCFD1", "#DBCCCE", + "#DBC9CC", "#D9C6C9", "#D8C2C6", "#D7BFC3", "#D6BCC0", "#D5B8BD", "#D4B5BA", + "#D2B1B7", "#D1ADB3", "#CFA9B0", "#CEA5AC", "#CCA1A9", "#CB9DA5", "#C999A2", + "#C7949E", "#C6909A", "#C48C96", "#C28792", "#C0828E", "#BE7E8A", "#BC7986", + "#B97482", "#B76F7E", "#B56A7A", "#B36576", "#B06071", "#AE5A6D", "#AB5569", + "#A94F64", "#A64A60", "#A3445B", "#A03E57", "#9D3752", "#9B304E", "#982949", + "#952045", "#911640", "#8E063B") > temp <- (alpha/rowsums(alpha))[, c(2, 4, 1)] > points(dirichletreg:::coord.trafo(temp/rowsums(temp)), pch = 22, bg = div.col[c(1, + 100)], cex = 2, lwd = 0.25) > temp <- B2[1:30, c(2, 3, 1)] > points(dirichletreg:::coord.trafo(temp/rowsums(temp)), pch = 21, bg = (div.col[c(1, + 100)])[BloodSamples$Disease[1:30]], cex = 0.5, lwd = 0.25) > temp <- B2[31:36, c(2, 3, 1)] > points(dirichletreg:::coord.trafo(temp/rowsums(temp)), pch = 21, bg = div.col[round(100 * + LP[, 2], 0)], cex = 1, lwd = 0.5) > legend("topleft", bty = "n", legend = c("disease A", "Disease B", NA, "Expected Values"), + pch = c(21, 21, NA, 22), pt.bg = c(div.col[c(1, 100)], NA, "white")) Disease A Disease B Expected Values Albumin.1.9.2.8.3.7.4.6.6.5.5.4.7.3.8.2.9.1 Pre.Albumin.1.2.3.4.5.6.7.8.9 Globulin.B Figure 5: Blood samples: Observed values and predictions

10 DirichletReg: Dirichlet Regression for Compositional Data in R Call: DirichReg(formula = acc ~ dyslexia * iq dyslexia + iq, data = RS, model = "alternative") Standardized Residuals: Min 1Q Median 3Q Max 1 - accuracy -1.5661-0.8204-0.5112 0.5211 3.4334 accuracy -3.4334-0.5211 0.5112 0.8204 1.5661 MEAN MODELS: Coefficients for variable no. 1: 1 - accuracy - variable omitted (reference category) - Coefficients for variable no. 2: accuracy (Intercept) 1.8649 0.2991 6.235 4.52e-10 *** dyslexiayes -1.4833 0.3029-4.897 9.74e-07 *** iq 1.0676 0.3359 3.178 0.001482 ** dyslexiayes:iq -1.1625 0.3452-3.368 0.000757 *** PRECISION MODEL: (Intercept) 1.5579 0.3336 4.670 3.01e-06 *** dyslexiayes 3.4931 0.5880 5.941 2.83e-09 *** iq 1.2291 0.4596 2.674 0.00749 ** Signif. codes: `***' <.001, `**' < 0.01, `*' < 0.05, `.' < 0.1 Log-likelihood: 65.9 on 7 df (37+2 iterations) AIC: -117.8, BIC: -105.3144 Number of Observations: 44 Links: Logit (Means) and Log (Precision) Parametrization: alternative > confint(rs2) 95% Confidence Intervals (original form) - Beta-Parameters: Variable: 1 - accuracy variable omitted Variable: accuracy 2.5% Est. 97.5% (Intercept) 1.279 1.86 2.451 dyslexiayes -2.077-1.48-0.890 iq 0.409 1.07 1.726 dyslexiayes:iq -1.839-1.16-0.486 - Gamma-Parameters 2.5% Est. 97.5% (Intercept) 0.904 1.56 2.21 dyslexiayes 2.341 3.49 4.65 iq 0.328 1.23 2.13 > confint(rs2, exp = TRUE) 95% Confidence Intervals (exponentiated) - Beta-Parameters: Variable: 1 - accuracy

Marco J. Maier 11 variable omitted Variable: accuracy 2.5% exp(est.) 97.5% (Intercept) 3.592 6.455 11.601 dyslexiayes 0.125 0.227 0.411 iq 1.506 2.908 5.618 dyslexiayes:iq 0.159 0.313 0.615 - Gamma-Parameters 2.5% exp(est.) 97.5% (Intercept) 2.47 4.75 9.13 dyslexiayes 10.39 32.89 104.12 iq 1.39 3.42 8.41 Affiliation: Marco J. Maier Institute for Statistics and Mathematics Wirtschaftsuniversität Wien Vienna University of Economics and Business Augasse 2 6 1090 Vienna, Austria Telephone: +43/1/31336 4335 E-mail: Marco.Maier@wu.ac.at URL: http://statmath.wu.ac.at/~maier/

12 DirichletReg: Dirichlet Regression for Compositional Data in R > gcol <- c("#e495a5", "#39BEB1")[3 - as.numeric(rs$dyslexia)] > tmt <- c(-3, 3) > par(mfrow = c(3, 2)) > qqnorm(residuals(rlr, "pearson"), ylim = tmt, xlim = tmt, pch = 21, bg = gcol, + main = "Normal Q Q Plot: OLS Residuals", cex = 0.75, lwd = 0.5) > abline(0, 1, lwd = 2) > qqline(residuals(rlr, "pearson"), lty = 2) > qqnorm(residuals(rs2, "standardized")[, 2], ylim = tmt, xlim = tmt, pch = 21, + bg = gcol, main = "Normal Q Q Plot: DirichReg Residuals", cex = 0.75, lwd = 0.5) > abline(0, 1, lwd = 2) > qqline(residuals(rs2, "standardized")[, 2], lty = 2) > plot(readingskills$iq, residuals(rlr, "pearson"), pch = 21, bg = gcol, ylim = c(-3, + 3), main = "OLS Residuals", xlab = "IQ", ylab = "Pearson Residuals", cex = 0.75, + lwd = 0.5) > abline(h = 0, lty = 2) > plot(readingskills$iq, residuals(rs2, "standardized")[, 2], pch = 21, bg = gcol, + ylim = c(-3, 3), main = "DirichReg Residuals", xlab = "IQ", ylab = "Standardized Residuals", + cex = 0.75, lwd = 0.5) > abline(h = 0, lty = 2) > plot(fitted(rlr), residuals(rlr, "pearson"), pch = 21, bg = gcol, ylim = c(-3, + 3), main = "OLS Residuals", xlab = "Fitted", ylab = "Pearson Residuals", + cex = 0.75, lwd = 0.5) > abline(h = 0, lty = 2) > plot(fitted(rs2)[, 2], residuals(rs2, "standardized")[, 2], pch = 21, bg = gcol, + ylim = c(-3, 3), main = "DirichReg Residuals", xlab = "Fitted", ylab = "Standardized Residuals", + cex = 0.75, lwd = 0.5) > abline(h = 0, lty = 2) Normal Q Q Plot: OLS Residuals Normal Q Q Plot: DirichReg Residuals Sample Quantiles 3 2 1 0 1 2 3 Sample Quantiles 3 2 1 0 1 2 3 3 2 1 0 1 2 3 3 2 1 0 1 2 3 Theoretical Quantiles Theoretical Quantiles OLS Residuals DirichReg Residuals Pearson Residuals 3 2 1 0 1 2 3 Standardized Residuals 3 2 1 0 1 2 3 1 0 1 2 IQ 1 0 1 2 IQ OLS Residuals DirichReg Residuals iduals 1 2 3 esiduals 1 2 3

Marco J. Maier 13 > g.ind <- as.numeric(rs$dyslexia) > g1 <- g.ind == 1 > g2 <- g.ind!= 1 > par(mar = c(4, 4, 4, 4) + 0.1) > plot(accuracy ~ iq, RS, pch = 21, bg = c("#e495a5", "#39BEB1")[3 - g.ind], cex = 1.5, + main = "Dyslexic (Red) vs. Control (Green) Group", xlab = "IQ Score", ylab = "Reading Accuracy", + xlim = range(readingskills$iq)) > x1 <- seq(min(rs$iq[g1]), max(rs$iq[g1]), length.out = 200) > x2 <- seq(min(rs$iq[g2]), max(rs$iq[g2]), length.out = 200) > n <- length(x1) > X <- data.frame(dyslexia = factor(rep(0:1, each = n), levels = 0:1, labels = c("no", + "yes")), iq = c(x1, x2)) > pv <- predict(rs2, X, TRUE, TRUE, TRUE) > lines(x1, pv$mu[1:n, 2], col = c("#e495a5", "#39BEB1")[2], lwd = 3) > lines(x2, pv$mu[(n + 1):(2 * n), 2], col = c("#e495a5", "#39BEB1")[1], lwd = 3) > ols <- 1/(1 + exp(-predict(rlr, X))) > lines(x1, ols[1:n], col = c("#ad6071", "#00897D")[2], lwd = 3, lty = 2) > lines(x2, ols[(n + 1):(2 * n)], col = c("#ad6071", "#00897D")[1], lwd = 3, lty = 2) > par(new = TRUE) > plot(x1, pv$phi[1:n], col = c("#6e1d34", "#004E42")[2], lty = "11", type = "l", + ylim = c(0, max(pv$phi)), axes = F, ann = F, lwd = 2, xlim = range(rs$iq)) > lines(x2, pv$phi[(n + 1):(2 * n)], col = c("#6e1d34", "#004E42")[1], lty = "11", + type = "l", lwd = 2) > axis(4) > mtext(expression(paste("precision (", phi, ")", sep = "")), 4, line = 3) > legend("topleft", legend = c(expression(hat(mu)), expression(hat(phi)), "OLS"), + lty = c(1, 3, 2), lwd = c(3, 2, 3), bty = "n") Dyslexic (Red) vs. Control (Green) Group Reading Accuracy 0.5 0.6 0.7 0.8 0.9 1.0 µ^ φ^ OLS 0 100 200 300 400 500 600 700 Precision (φ) 1 0 1 2 IQ Score Figure 7: Reading skills: Predicted values of Dirichlet regression and OLS regression.