Statistics & Research methods. Athanasios Papaioannou University of Thessaly Dept. of PE & Sport Science


Normal Curve, Variance & Z-scores
[Figure: histogram of Height, 1.65 to 1.69 m, frequency axis 0 to 30]

[Figure: normal curve over the Height histogram, horizontal axis marked -2, -1, 0, +1, +2]
Point of inflection: where the curve changes from convex to concave.
The unit of measurement from 0 to 1 is standard: the Standard Deviation.
The area from -1 to +1 covers 68.26% of the total area, i.e. 68.26% of all cases.
The area from -2 to +2 covers 95.44% of the total area, i.e. 95.44% of all cases.
From -3 to +3: 99.73% of all cases.

Standard Deviation
$S.D. = \sqrt{\Sigma d^2 / (N-1)}$, where $d = X - M$ is the deviation of each score from the Mean.

$S.D. = \sqrt{\Sigma d^2 / (N-1)} = \sqrt{84/9} = 3.055$

 No.    X     d    d^2
  1    20     5    25
  2    19     4    16
  3    17     2     4
  4    16     1     1
  5    15     0     0
  6    14    -1     1
  7    14    -1     1
  8    13    -2     4
  9    11    -4    16
 10    11    -4    16

Σx = 150, M = 150/10 = 15, Σd^2 = 84
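The worked example above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `sample_sd` is an illustrative choice, not something from the slides.

```python
import math

# The ten scores from the worked example
scores = [20, 19, 17, 16, 15, 14, 14, 13, 11, 11]

def sample_sd(xs):
    """Sample standard deviation: sqrt(sum of squared deviations / (N - 1))."""
    n = len(xs)
    mean = sum(xs) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(sum_sq_dev / (n - 1))

mean = sum(scores) / len(scores)
sum_sq_dev = sum((x - mean) ** 2 for x in scores)
sd = sample_sd(scores)

print(f"M = {mean}, sum of d^2 = {sum_sq_dev}, S.D. = {sd:.3f}")
```

The standard library function `statistics.stdev` uses the same N - 1 denominator and should give the same value.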

Standard deviation in SPSS
Statistics > Descriptives
or
Statistics > Frequencies > Descriptives

Variance
$Variance = S.D.^2$

Z-Scores (or Standard Scores)
[Figure: normal curve with Z-score axis -2, -1, 0 (M), +1, +2, a score X and the point of inflection marked]
Z = 0 indicates the Mean.
A Standard Score gives information about the position of a particular score relative to the Mean of the distribution. It shows how far above or below the Mean the score lies.
$Z = (X - M) / S.D.$

Exercise 1
[Figure: normal curve with Z-score axis; X = 1.70 and M = 1.77 marked]
$Z = (X - M) / S.D.$
Calculate the percentage of men whose height is below 1.70 m when the Mean = 1.77 m and S.D. = .10 m.
Z = (1.70 - 1.77)/.10 = -.07/.10 = -.70.
25.80% of cases lie between the Mean (1.77 m) and -.70 S.D. (1.70 m).
Thus 50% - 25.8% = 24.2% have a height below 1.70 m.

Exercise 2
[Figure: normal curve with Z-score axis; M = 46 and X = 60 marked]
$Z = (X - M) / S.D.$
Exam scores on a 0-100 scale: if the Mean = 46 and S.D. = 28, what percentage of people score above 60?
Z = (60 - 46)/28 = 14/28 = 0.5.
19.15% of people score between the Mean and 60.
Thus 100% - (50% + 19.15%) = 30.85% score above 60.
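Both exercises read the area under the normal curve from a table. As a rough cross-check, the same areas can be computed from the error function in Python's standard library. The helper names `z_score` and `normal_cdf` are illustrative, and the sketch assumes heights and exam scores are normally distributed, as the exercises do.

```python
import math

def z_score(x, mean, sd):
    """Standard score: distance of x from the mean in S.D. units."""
    return (x - mean) / sd

def normal_cdf(z):
    """Proportion of a standard normal distribution lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Exercise 1: proportion of men shorter than 1.70 m (M = 1.77, S.D. = .10)
z1 = z_score(1.70, 1.77, 0.10)
below_170 = normal_cdf(z1)

# Exercise 2: proportion scoring above 60 (M = 46, S.D. = 28)
z2 = z_score(60, 46, 28)
above_60 = 1.0 - normal_cdf(z2)

print(f"Exercise 1: Z = {z1:.2f}, below 1.70 m = {below_170:.2%}")
print(f"Exercise 2: Z = {z2:.2f}, above 60 = {above_60:.2%}")
```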

Skewness
When the Mean is not at the centre of the distribution.
Positive skewness: long tail to the right.
Negative skewness: long tail to the left.
$Z = S / s_s$, where $S$ = skewness and $s_s$ = standard error of skewness.
If |Z| > 2.58, skewness is significant at p < .01.
If |Z| > 3.29, skewness is significant at p < .001.

Kurtosis
When the curve is too peaked or too flat.
$Z = K / s_k$, where $K$ = kurtosis and $s_k$ = standard error of kurtosis.
If |Z| > 2.58, kurtosis is significant at p < .01.
If |Z| > 3.29, kurtosis is significant at p < .001.

Transformations

 Shape of distribution             Transformation
 Moderate positive skewness        SQRT(X)
 Substantial positive skewness     LG10(X)
 Severe positive skewness          1/X
 Moderate negative skewness        SQRT(K-X)
 Substantial negative skewness     LG10(K-X)
 Severe negative skewness          1/(K-X)

where K = maximum value + 1
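Outside SPSS, the same transformations can be sketched in a few lines of Python. The function `transform` and its `kind` and `reflect` parameters are hypothetical names chosen for this illustration. The sketch assumes all values are positive before the logarithm or the inverse is taken.

```python
import math

def transform(values, kind, reflect=False):
    """Apply SQRT, LG10 or 1/X to a list of scores.

    With reflect=True the scores are first replaced by K - X, where
    K = maximum value + 1, as used for negatively skewed data.
    """
    if reflect:
        k = max(values) + 1
        values = [k - x for x in values]
    if kind == "sqrt":
        return [math.sqrt(x) for x in values]
    if kind == "lg10":
        return [math.log10(x) for x in values]   # requires x > 0
    if kind == "inverse":
        return [1 / x for x in values]           # requires x != 0
    raise ValueError(f"unknown transformation: {kind}")

scores = [1, 4, 9, 100]
moderate_positive = transform(scores, "sqrt")                # SQRT(X)
moderate_negative = transform(scores, "sqrt", reflect=True)  # SQRT(K-X), K = 101

print(moderate_positive)
print(moderate_negative)
```

Reflecting with K - X reverses the order of the scores, which should be kept in mind when interpreting results on the transformed variable.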

Hypothesis testing
Central Limit Theorem & sampling error, levels of confidence, statistical significance, Type I & Type II Errors

Sampling Error
If many samples are drawn from the same population, their characteristics will not be identical to each other or to those of the population: this is Sampling Error.
Because of Sampling Error, the Mean of a sample will almost never be exactly equal to the Mean of the population.

Central Limit Theorem
If random samples of equal (and sufficiently large) size are repeatedly drawn from any population, the Means of these samples will be approximately normally distributed.
The average of the sample Means will be approximately the same as the population Mean.
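A small simulation can make the theorem concrete. The sketch below is an illustration only: the exponential population, the sample size of 50, the 2000 repetitions and the seed are all arbitrary assumptions, chosen because the exponential distribution is clearly not normal (mean 1, S.D. 1, long right tail).

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50     # size of each random sample (assumed)
NUM_SAMPLES = 2000   # number of samples drawn (assumed)

def draw_sample_mean(n):
    """Mean of one random sample of size n from an exponential population."""
    sample = [random.expovariate(1.0) for _ in range(n)]
    return statistics.fmean(sample)

sample_means = [draw_sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)   # should be close to 1
sd_of_means = statistics.stdev(sample_means)     # spread of the sample means
expected_se = 1.0 / SAMPLE_SIZE ** 0.5           # population S.D. / sqrt(N)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"S.D. of sample means: {sd_of_means:.3f} (S.D./sqrt(N) = {expected_se:.3f})")
```

Plotting a histogram of `sample_means` would show the roughly bell-shaped distribution, even though the individual scores are strongly skewed.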

The Mean of the population is a fixed value that we do not know (unless we examine the entire population).
The Standard Deviation of the theoretical distribution of the sample Means indicates the Sampling Error and is called the STANDARD ERROR OF THE MEAN (S.E.).
Based on the Central Limit Theorem:
68.26% of all sample Means lie within ± 1 S.E. of the Mean of the population.
95.44% of all sample Means lie within ± 2 S.E. of the Mean of the population.
99.73% of all sample Means lie within ± 3 S.E. of the Mean of the population.

Standard error of the mean
The Standard Deviation of the theoretical distribution of the sample Means indicates the Sampling Error and is called the STANDARD ERROR OF THE MEAN:
$S.E.M. = S.D. / \sqrt{N}$

Example 1
Within which limits do we expect the Mean of the population of adult men if we randomly choose a sample of 100 men with Mean = 1.75 m and S.D. = .10 m?
S.E. = .10/√100 = .10/10 = .01. Thus:
68.26%: 1.75 ± .01 (1.74 to 1.76)
95.44%: 1.75 ± .02 (1.73 to 1.77)
99.73%: 1.75 ± .03 (1.72 to 1.78)

Example 2
What deviation do we expect for the percentage of votes for a party if the sample is 10,000 voters, Mean = 42 (%) and S.D. = 5 (%)?
S.E. = 5/√10000 = 5/100 = .05. Thus, with 99.73% probability (an error probability of 2.7 in a thousand):
42 ± (3 × .05) = 41.85 to 42.15
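The arithmetic in Examples 1 and 2 follows one pattern: divide the S.D. by the square root of N, then step 1, 2 or 3 standard errors either side of the sample Mean. A minimal sketch, with illustrative function names `standard_error` and `interval`:

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: S.D. / sqrt(N)."""
    return sd / math.sqrt(n)

def interval(mean, se, k):
    """Limits k standard errors either side of the sample mean."""
    return (mean - k * se, mean + k * se)

# Example 1: 100 men, Mean = 1.75 m, S.D. = .10 m
se_height = standard_error(0.10, 100)
height_limits = {k: interval(1.75, se_height, k) for k in (1, 2, 3)}

# Example 2: 10,000 voters, Mean = 42 %, S.D. = 5 %
se_votes = standard_error(5, 10_000)
vote_limits = interval(42, se_votes, 3)

for k, (low, high) in height_limits.items():
    print(f"Example 1, ±{k} S.E.: {low:.2f} to {high:.2f} m")
print(f"Example 2, ±3 S.E.: {vote_limits[0]:.2f} to {vote_limits[1]:.2f} %")
```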

From Example 1
For a random sample of 100 men with Mean = 1.75 m and S.D. = .10 m, S.E. = .10/√100 = .01, so the Mean of the population of adult men is expected within:
68.26%: 1.75 ± .01 (1.74 to 1.76)
95.44%: 1.75 ± .02 (1.73 to 1.77)
99.73%: 1.75 ± .03 (1.72 to 1.78)
That is, with 99.73% probability the Mean height of men is 175 ± 3 cm (172 to 178).
If we then have a sample of 100 handball athletes with Mean = 178.1 cm and SD = 10 cm, what is the probability that this sample is a representative sample of all adult men? Less than 100% - 99.73% = .27%.
Can we say that the height of handball athletes differs from the height of the typical population of adult men in a non-random way? If yes, how sure are we? With what probability can we be confident about it?
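The handball question can also be answered with an exact tail probability instead of the 3 S.E. bound. The sketch below treats 175 cm as the population Mean (the value the slide takes from Example 1) and uses a two-sided probability. Both choices are assumptions of this illustration, and `normal_cdf` is a helper defined here rather than anything from the slides.

```python
import math

def normal_cdf(z):
    """Proportion of a standard normal distribution lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

pop_mean = 175.0              # cm, Mean of adult men taken from Example 1
se = 10 / math.sqrt(100)      # S.E. for N = 100 and SD = 10 cm
sample_mean = 178.1           # cm, Mean of the handball sample

# How many standard errors the handball Mean lies from the population Mean
z = (sample_mean - pop_mean) / se

# Probability of a sample Mean at least this far from 175 in either direction
p_two_sided = 2 * (1 - normal_cdf(abs(z)))

print(f"Z = {z:.2f}, two-sided p = {p_two_sided:.4f}")
```

The result is below the .27% bound quoted above, because 178.1 cm lies slightly more than 3 S.E. from 175 cm.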

Levels of Confidence (from the Normal Curve)

 Level of confidence   Interval range       P(sample Mean within interval)   P(sample Mean outside interval)
 95.44%                Mpop ± 2.00 S.E.     0.9544                           0.0456
 95%                   Mpop ± 1.96 S.E.     0.95                             0.05
 99.73%                Mpop ± 3.00 S.E.     0.9973                           0.0027
 99%                   Mpop ± 2.58 S.E.     0.99                             0.01
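The multipliers and probabilities in this table come from the standard normal distribution and can be reproduced, to rounding, with Python's standard library. The function names `coverage` and `multiplier` are illustrative.

```python
import math
from statistics import NormalDist

def coverage(k):
    """Probability that a standard normal value lies within ±k."""
    return math.erf(k / math.sqrt(2))

def multiplier(level):
    """Number of standard errors whose central interval has the given probability."""
    return NormalDist().inv_cdf(0.5 + level / 2)

for k in (1.96, 2.00, 2.58, 3.00):
    inside = coverage(k)
    print(f"±{k:.2f}: within {inside:.4f}, outside {1 - inside:.4f}")

for level in (0.95, 0.99):
    print(f"{level:.0%} level of confidence: ±{multiplier(level):.2f}")
```

Small differences in the last decimal place (for example 0.9545 against 0.9544) reflect rounding in printed normal-curve tables.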

Null Hypothesis
Null Hypothesis: the two sample Means (randomly chosen men & handball athletes) were drawn from the same population.
Rejection of the Null Hypothesis: the two sample Means were NOT drawn from the same population.
Alternative Hypothesis: the two sample Means were drawn from different populations (e.g., typical men & handball athletes).

Type I Error
Type I Error (α) at the 95% level of confidence (or the p < .05 level of significance): a 5% probability of rejecting the Null Hypothesis when it is true.
We reduce the probability of a Type I Error when we increase the level of confidence (e.g., from 95% to 99%).
However, this increases the probability of accepting a Null Hypothesis that should be rejected (i.e. the samples may in fact come from different populations).

Type II Error
Type II Error (β): the probability of not rejecting the Null Hypothesis when we should have rejected it.
Example: distribution of the height of a randomly selected sample of men, N = 100, Mean = 175 cm and SD = 10 cm.
Thus SE = 10/√100 = 1, and for a 95% level of confidence the limits are 175 ± (1.96 × 1) cm = 173.04 to 176.96.

[Figure: sampling distribution of the Mean for men selected at random (M = 175, 95% level of confidence, upper limit 176.96, α = 5%, Type I Error) overlapping the one for volleyball athletes (M = 178, β = 15%, Type II Error)]
Height of volleyball athletes: N = 100, Mean = 178 cm and SD = 10 cm. Thus SE = 10/√100 = 1.
Z = (176.96 - 178)/1 = -1.04
This corresponds to 15% of the sampling distribution for volleyball athletes. Thus there is a 15% probability that a sample of volleyball athletes has a Mean below 176.96 (the limit set for typical men), in which case we would fail to reject the Null Hypothesis although it is false.
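The β value can be recomputed directly from the normal curve. The sketch below follows the slide's reasoning and considers only the upper limit of 176.96 cm; the area below the lower limit of 173.04 cm is negligible for a distribution centred on 178 cm and is ignored here. `normal_cdf` is a helper defined for this illustration.

```python
import math

def normal_cdf(z):
    """Proportion of a standard normal distribution lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

se = 10 / math.sqrt(100)            # S.E. for N = 100 and SD = 10 cm
upper_limit = 175 + 1.96 * se       # upper 95% limit for typical men
athlete_mean = 178                  # cm, Mean of volleyball athletes

# Position of the limit within the athletes' sampling distribution
z = (upper_limit - athlete_mean) / se

beta = normal_cdf(z)    # probability of a Type II Error
power = 1 - beta        # probability of correctly rejecting the Null Hypothesis

print(f"limit = {upper_limit:.2f} cm, Z = {z:.2f}, beta = {beta:.3f}, power = {power:.3f}")
```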

Confusion Matrix

                            Reality
 Statistical decision       H0        Halt
 H0                         1-α       β
 Halt                       α         1-β

                            Reality
 Statistical decision       H0        Halt
 H0                         .95       .15
 Halt                       .05       .85

Because we never know the reality (we do not study populations):
If in reality H0 is true, the correct statistical decision is to set a high level of confidence for accepting it.
If in reality Halt is true, the correct statistical decision is to set a high level of confidence for accepting it.
If in reality H0 is true, the probability of the correct statistical decision (H0) is 95%, or 1-α.
If in reality H0 is true, the probability of the wrong statistical decision (Halt) is 5%, or α.
If in reality Halt is true, the probability of the wrong statistical decision (H0) is 15%, or β.
If in reality Halt is true, the probability of the correct statistical decision (Halt) is 85%, or 1-β.
1-β = power
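The four cells of the matrix can be approximated by simulation. The sketch below is a rough illustration under stated assumptions: sample Means are drawn from a normal distribution with S.E. = 1 around 175 cm (H0 true) or 178 cm (Halt true), the Null Hypothesis is rejected whenever the sample Mean falls outside 175 ± 1.96 cm, and the seed and the number of trials are arbitrary.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

SE = 1.0                                        # S.E. for N = 100, SD = 10 cm
LOWER, UPPER = 175 - 1.96 * SE, 175 + 1.96 * SE  # 95% limits under H0

def rejects_h0(sample_mean):
    """Decision rule: reject H0 when the sample Mean is outside the 95% limits."""
    return sample_mean < LOWER or sample_mean > UPPER

def rejection_rate(true_mean, trials=20_000):
    """Share of simulated sample Means that lead to rejecting H0."""
    hits = sum(rejects_h0(random.gauss(true_mean, SE)) for _ in range(trials))
    return hits / trials

alpha_hat = rejection_rate(175)   # H0 true: wrong rejections (Type I)
power_hat = rejection_rate(178)   # Halt true: correct rejections
beta_hat = 1 - power_hat          # Halt true: missed rejections (Type II)

print(f"H0 true:   accept {1 - alpha_hat:.3f}, reject {alpha_hat:.3f}")
print(f"Halt true: accept {beta_hat:.3f}, reject {power_hat:.3f}")
```

Changing 1.96 to 2.58 in the limits should lower the estimated α and raise the estimated β, which is the trade-off between the two error types.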