Πολυμεταβλητές αναλύσεις Κυργίδης Αθανάσιος MD, DDS, BΟpt, PhD MSc Medical Research, Μετεκπαίδευση ΕΠΙ ΕΚΑΒ Γναθοπροσωπικός Χειρουργός Ass Editor, Hippokratia
Θεωρία του Χάους Edward Norton Lorenz (born May 23, 97) Edward N Lorenz, "Deterministic non-periodic flow," Journal of the Atmospheric Sciences, 963 Stephen H Kellert, In the Wake of Chaos: Unpredictable Order in Dynamical Systems, University of Chicago Press, 993
Παράξενοι ελκυστές στο χώρο φάσεων παράξενοι ελκυστές στην καρδιά του Χάους υπάρχει τάξη (ντετερμινισμός) Stephen H Kellert, In the Wake of Chaos: Unpredictable Order in Dynamical Systems, University of Chicago Press, 993 Edward N Lorenz, "Deterministic non-periodic flow," Journal of the Atmospheric Sciences, 963
Narrow focus = limited number of endpoints p<5 /2 statistical significance
Drug Multiple Comparisons (outcomes) t-test t-test More symptoms, more likely the improvement t-test Safety More side-effects Post-Hoc analyses: MANOVA Correction per Bonferroni Choice of other statistical methods (multinomial or polytomous statistics) Gauger et al Environ Health Perspect 24
Simpson's Paradox and Clinical Trials: What You Find Is Not Necessarily What You Prove RCT: brain injury following cardiac arrest Myocardial Infarction Drowning Adverse effect: hypotension Abramson et al Ann Emerg Med 992 Kyrgidis et al J Clin Oncol 2 Kyrgidis et al Lancet Oncol 23
Placebo Therapy Experimental Therapy 2 Patients 95 Dead 45% Mortality 2 Patients Dead 55% Mortality Hypotensive Normotensive Hypotensive Normotensive 2 Patients 5 Dead 75% Mortality 8 Patients 8 Dead 44% Mortality 5 Patients 93 Dead 62% Mortality 5 Patients 7 Dead 34% Mortality MI Drowning MI Drowning MI Drowning MI Drowning 5 Patients 3 Dead 87% 5 Patients 2 Dead 4% 6 Patients 49 Dead 82% 2 Patients 3 Dead 26% 6 Patients 54 Dead 9% 6 Patients 39 Dead 43% 6 Patients 5 Dead 83% 44 Patients 2 Dead 27% Abramson et al Ann Emerg Med 992 Kyrgidis et al J Clin Oncol 2 Kyrgidis et al Lancet Oncol 23
Simpson s Paradox Example: 44% of male applicants are admitted by a university, but only 33% of female applicants Does this mean there is unfair discrimination? University investigates and breaks down figures for Engineering and English programmes Male Accept 35 2 Refuse entry 45 4 Total 8 6 Female
Simpson s Paradox No relationship between sex and acceptance for either programme So no evidence of discrimination Why? More females apply for the English programme, but it it hard to get into More males applied to Engineering, which has a higher acceptance rate than English Must look deeper than single cross-tab to find this out Engineering Male Accept 3 Refuse entry 3 Total 6 2 Female English Male Female Accept 5 Refuse entry 5 3 Total 2 4
Another Example A study of graduates salaries showed negative association between economists starting salary and the level of the degree ie PhDs earned less than Masters degree holders, who in turn earned less than those with just a Bachelor s degree Why? The data was split into three employment sectors Teaching, government and private industry Each sector showed a positive relationship Employer type was confounded with degree level Abramson et al Ann Emerg Med 992 Kyrgidis et al J Clin Oncol 2 Kyrgidis et al Lancet Oncol 23
Υπάρχει Στατιστική Σχέση για πρόβλεψη; Πόσο ισχυρή είναι αυτή η σχέση; Ποιο είναι το καταλληλότερο μοντέλο; Πώς εκφράζεται η καλύτερη προσαρμογή του μοντέλου; ΑΝΑΛΥΣΗ ΣΥΣΧΕΤΙΣΗΣ Αναφέρεται στη γραμμική σχέση μεταξύ των δυο μεταβλητών ΠΑΛΙΝΔΡΟΜΗΣΗ Παραδεχόμαστε ή υποθέτουμε μια συναρτησιακή σχέση ανάμεσα στις μεταβλητές
Selecting a Multivariate Technique Dependency dependent (criterion) variables and independent (predictor) variables are present Interdependency variables are interrelated without designating some dependent and others independent
Dependency Techniques Multiple regression Discriminant analysis Multivariate analysis of variance(manova) Linear structural relationships (LISREL) Conjoint analysis
Uses for Multiple Regression Control for confounding variables to better evaluate the contribution of other variables
Uses for Multiple Regression Control for confounding variables to better evaluate the contribution of other variables Predict values for a criterion variable by developing a self-weighting estimating equation
Uses for Multiple Regression Control for confounding variables to better evaluate the contribution of other variables Predict values for a criterion variable by developing a self-weighting estimating equation Test and explain causal theories Path analysis
Uses for Discriminant Analysis Classify persons or objects into various groups Analyze known groups to determine the relative influence of specific factors
Use for MANOVA Assess relationship between two or more dependent variables and classificatory variables or factors samples Eg measure differences between employees customers manufactured items production parts
Uses of LISREL Explains causality among constructs not directly measured Two parts Measurement model Structural Equation model
Use for Conjoint Analysis Market research Product development
Binary Logistic Regression Anxious? Similarity to linear regression: 2 3 4 Model fit Interpreting coefficients Inferential statistics Predicting Y for values of the independent variables
Binary Logistic Regression Linear regression on a dichotomous dependent variable: Yes = No = Y = Support Privatizing Social Security X = Income
Methods of Regression Forced Entry: All variables entered simultaneously Hierarchical: Variables entered in blocks Blocks should be based on past research, or theory being tested Good Method Stepwise: Variables entered on the basis of statistical criteria (ie relative contribution to predicting outcome) Should be used only for exploratory analysis Slide 3
Equation for Step The number e is a famous irrational number, and is one of the most important numbers in mathematics The first few digits are:278288284594523536287473527 (and more ) e is the base of the Natural Logarithms (invented by John Napier) We can say that the odds of a patient who is treated being cured are 34 times higher than those of a patient who is not treated, with a 95% CI of 56 to 748 e, 229 The important thing about this confidence interval is that it doesn t cross OR e e 3, 47 ) This is important because values greater (both values are greater than e than mean that as the predictor variable(s) increase, so do the odds of (in this case) being cured Values less than mean the opposite: as the ln(orthe) ln( 3,decreases 47 ),229 predictor increases, odds of being cured
Μια πλήρης παρέκκλιση από την βασική μας ανάλυση: Η ιστορία της πολλαπλής παλινδρόμησης Πόσα χρήματα θα συγκεντρώνονταν από την επιβολή ενός εισαγωγικού δασμού στα ζωικά και φυτικά έλαια (βούτυρο, έλαιο σόγιας, κλπ); Ο υπολογισμός αυτός απαιτεί να γνωρίζουμε τις ελαστικότητες προσφοράς και ζήτησης, τόσο τις εγχώριες, όσο και των κρατών απ όπου εισάγονται τα τα έλαια Το πρόβλημα αυτό έλυσε πρώτος ο Wright το 928 στο Παράρτημα Β του έργου του The Tariff on Animal and Vegetable Oils 32
Διάγραμμα 4, σελ 296, Παράρτημα Β (928): 33
Ποιος, όμως, έγραψε το Παράρτημα Β ; το παράρτημα αυτό πιστεύεται ότι το έγραψε είτε ο ίδιος ο Philip Wright σε συνεργασία με το γιό του, Sewall Wright, που ήταν εξαίρετος στατιστικολόγος ή ο γιος του μόνος του Ποιοι ήταν οι δύο αυτοί άντρες και ποια η ιστορία τους; 34
Philip Wright (86-934) Sewall Wright (889-988) MA Harvard, Econ, 887 Lecturer,Harvard,93-97 ScD Harvard, Biology, 95 Prof, U Chicago, 93-954 άσημος οικονομολόγος και ποιητής διάσημος γενετικός στατιστικολόγος 35
Παράδειγμα: Ζήτηση τσιγάρων Πόσο θα μειωνόταν η κατανάλωση τσιγάρων από την επιβολή ενός (υποθετικού) φόρου; Για να απαντήσουμε στο ερώτημα αυτό, χρειάζεται να γνωρίζουμε την ελαστικότητα της ζήτησης τσιγάρων, δηλαδή, το συντελεστή παλινδρόμησης 36
Logistic regression () Table 2 Age and signs of coronary heart disease (CD) Age CD Age CD Age CD 22 23 24 27 28 3 3 32 33 35 38 4 4 46 47 48 49 49 5 5 5 52 54 55 58 6 6 62 65 67 7 77 8
How can we analyse these data? Comparison of the mean age of diseased and non-diseased women Non-diseased: 386 years Diseased: 587 years (p<) Linear regression?
Dot-plot: Data from Table 2 Signs of coronary disease Yes No 2 4 AGE (years) 6 8
Logistic regression (2) Table 3 Prevalence (%) of signs of CD according to age group Diseased Age group # in group # % 2-29 5 3-39 6 7 4-49 7 2 29 5-59 7 4 57 6-69 5 4 8 7-79 2 2 8-89
Dot-plot: Data from Table 3 Diseased % 8 6 4 2 2 3 4 Age (years) 5 6 7
The logistic function Probability of disease 8 e α βx P(y x ) e α βx 6 4 2 x
multinomial logistic regression Case Processing Summary N Marginal Percentage Histological_diagnosisa Nodular Histological_diagnosis Superficial 77 Nodular 54 Micronodular 9 Morpheaform 8 Mixed 5 Pinkus 8 Infiltrating 32 Flat No 87 Yes 26 Elevated No 89 Yes 24 Nodular No 249 Yes 64 Arborizingtelangiectasia 45,4% Yes 7 Largeblue_grayovoidnests Intercept B Std Error -5,953 [Flat=] 24,6% 49,2% 6,% 2,6% 4,8% 2,6%,2% 59,7% 4,3% 6,4% 39,6% 79,6% 2,4% No 95% Confidence Interval for Exp(B) 322,44 9,47 [Flat=] b [Elevated=] 8,3 [Elevated=] b [Nodular=] -,44 [Nodular=] b [Arborizingtelangiectasia=] -,477 [Arborizingtelangiectasia=] b [Largeblue_grayovoidnests=] -,988 [Largeblue_grayovoidnests=] b [Ulceration=] -,6 [Ulceration=] b [PseudoPigmentation=],58 [PseudoPigmentation=] b [Shortfinesuperficialtelangiectasia= Wald 6,72,3,3 62,74 9,42,68,439,932,54,462 2,95 Upper Bound 463,92,42,544,25,23,787,88,45,586,89,37,65 2,53E38,92E-39,228, 2,553E4 6,47E-34,643,2 5,896E4,495E-33 9389,5,998,325 Lower Bound,959,682,955 Exp(B),985,,48 Sig, 6,72,788 df,284,665 2,99 4,8,889 5,445 ] [Shortfinesuperficialtelangiectasia= ] [Multiplesmallerosions=] 42 [Multiplesmallerosions=] 54,6% No b,64 b [Shinywhite_redstructurelessareas= 23 -,296,58,4,529,33,748,78,576,435,744 3,9,264 2,97
Kyrgidis Athanassios, MD, MSc, PhD 3 Papazoli St, Thessaloniki, 546 3, Greece Τel +3-6947-566727 Fax +3-23-5467 E-mail: akyrgidi@gmailcom