Statistics & Research methods Athanasios Papaioannou University of Thessaly Dept. of PE & Sport Science
Normal Curve, Variance & Z-scores
[Figure: histogram of Height (1.65 to 1.69), vertical axis 0 to 30]
The Normal Curve
[Figure: normal curve over the Height histogram (1.65 to 1.69), horizontal axis marked -2, -1, 0, +1, +2]
Point of inflection: the point where the curve changes from convex to concave.
The unit of measurement from 0 to 1 is standard: the Standard Deviation.
The area from -1 to +1 covers 68.26% of the total area, i.e. 68.26% of all cases.
The area from -2 to +2 covers 95.44% of the total area, i.e. 95.44% of all cases.
From -3 to +3: 99.73% of all cases.
Standard Deviation
$\mathrm{S.D.} = \sqrt{\dfrac{\sum d^2}{N-1}}$, where $d = X - M$ is the deviation of each score from the Mean.
Standard Deviation: worked example

Case    X     d    d^2
 1     20     5    25
 2     19     4    16
 3     17     2     4
 4     16     1     1
 5     15     0     0
 6     14    -1     1
 7     14    -1     1
 8     13    -2     4
 9     11    -4    16
10     11    -4    16

$\sum X = 150$, $M = 150/10 = 15$, $\sum d^2 = 84$
$\mathrm{S.D.} = \sqrt{\sum d^2 / (N-1)} = \sqrt{84/9} = 3.055$
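The worked example above can be checked with a few lines of Python. This is only a sketch that follows the slide's formula; the names `scores` and `sample_sd` are illustrative and do not come from the slides.

```python
import math

# The ten scores from the worked example.
scores = [20, 19, 17, 16, 15, 14, 14, 13, 11, 11]

def sample_sd(values):
    """Standard deviation with N - 1 in the denominator, as on the slide."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in values)  # sum of d^2
    return math.sqrt(sum_sq_dev / (n - 1))

mean = sum(scores) / len(scores)
sd = sample_sd(scores)
variance = sd ** 2  # the variance is the squared standard deviation

print(mean, round(sd, 3), round(variance, 3))  # 15.0 3.055 9.333
```

The standard library function `statistics.stdev` uses the same N - 1 denominator and should give the same value.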
Standard deviation in SPSS
Statistics > Descriptives
or
Statistics > Frequencies > Descriptives
Variance
$\text{Variance} = \mathrm{S.D.}^2$
Z-Scores (or Standard Scores)
[Figure: normal curve with the point of inflection marked, a score X below the Mean, and a Z-score axis -2, -1, 0 (M), +1, +2]
Z = 0 indicates the Mean.
A Standard Score gives information about the position of a specific score relative to the Mean of the distribution. It shows how far above or below the Mean the score lies.
$Z = (X - M) / \mathrm{S.D.}$
Exercise 1
[Figure: normal curve with X = 1.70 marked below the Mean M = 1.77]
Calculate the percentage of men whose height is below 1.70 m when Mean = 1.77 m and S.D. = .10 m.
$Z = (X - M) / \mathrm{S.D.} = (1.70 - 1.77) / .10 = -.07 / .10 = -.70$
25.80% of cases lie between the Mean (1.77 m) and -.70 S.D. (1.70 m).
Thus 50% - 25.80% = 24.20% have a height below 1.70 m.
Exercise 2
[Figure: normal curve with the Mean M = 46 and X = 60 marked above it]
Exam marks on a 0-100 scale: if Mean = 46 and S.D. = 28, what percentage of people score above 60?
$Z = (X - M) / \mathrm{S.D.} = (60 - 46) / 28 = 14 / 28 = 0.5$
19.15% of people score between the Mean and 60.
Thus 100% - (50% + 19.15%) = 30.85% score above 60.
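Both exercises can be reproduced without a printed Z table. The sketch below assumes the scores are normally distributed, as the exercises do, and obtains the area under the standard normal curve from the error function; `z_score` and `normal_cdf` are illustrative names.

```python
import math

def z_score(x, mean, sd):
    """Z = (X - M) / S.D."""
    return (x - mean) / sd

def normal_cdf(z):
    """Proportion of the standard normal curve lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Exercise 1: proportion of men shorter than 1.70 m (M = 1.77, S.D. = .10).
z1 = z_score(1.70, 1.77, 0.10)
below_170 = normal_cdf(z1)

# Exercise 2: proportion scoring above 60 (M = 46, S.D. = 28).
z2 = z_score(60, 46, 28)
above_60 = 1.0 - normal_cdf(z2)

print(round(z1, 2), round(below_170 * 100, 1))  # about -0.7 and 24.2 (%)
print(round(z2, 2), round(above_60 * 100, 2))   # about 0.5 and 30.85 (%)
```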
Skewness
The distribution is skewed when the Mean is not at the middle of the distribution.
Positive skewness: long tail to the right.
Negative skewness: long tail to the left.
$Z = S / s_s$, where $S$ = Skewness and $s_s$ = Standard Error of skewness.
If |Z| > 2.58, the skewness is significant at p < .01.
If |Z| > 3.29, the skewness is significant at p < .001.
Kurtosis
The distribution shows kurtosis when the curve is too peaked or too flat.
$Z = K / s_k$, where $K$ = Kurtosis and $s_k$ = Standard Error of kurtosis.
If |Z| > 2.58, the kurtosis is significant at p < .01.
If |Z| > 3.29, the kurtosis is significant at p < .001.
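A minimal sketch of the Z test for skewness follows. The slides give only the ratio of the statistic to its standard error; the estimator (adjusted Fisher-Pearson skewness) and the standard error formula used here are assumptions, chosen because they are the ones commonly reported by statistical packages. The data and the name `skewness_z` are made up for illustration.

```python
import math

def skewness_z(values):
    """Return (skewness, standard error of skewness, Z = skewness / SE)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    # Assumption: adjusted Fisher-Pearson sample skewness.
    skew = (n / ((n - 1) * (n - 2))) * sum(((x - mean) / sd) ** 3 for x in values)
    # Assumption: the usual standard error of that estimator.
    se_skew = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se_skew, skew / se_skew

# Hypothetical right-skewed data (long tail to the right).
data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 12]
skew, se_skew, z = skewness_z(data)
print(round(skew, 2), round(se_skew, 2), round(z, 2))
print("significant at p < .01" if abs(z) > 2.58 else "not significant at p < .01")
```

The same ratio applies to kurtosis, with the kurtosis statistic and its own standard error in place of the skewness values.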
Transformations

Shape of the distribution          Transformation
Moderate positive skewness         SQRT(X)
Substantial positive skewness      LG10(X)
Severe positive skewness           1/X
Moderate negative skewness         SQRT(K-X)
Substantial negative skewness      LG10(K-X)
Severe negative skewness           1/(K-X)

Where K = maximum value + 1.
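The six transformations can be sketched in Python as below. The function name `transform` and the `kind` labels are illustrative; the sketch assumes all scores are positive, because the logarithm and the inverse are undefined at zero.

```python
import math

def transform(values, kind):
    """Apply one of the slide's transformations to a list of scores."""
    k = max(values) + 1  # K = maximum value + 1, used to reflect the scores
    ops = {
        "sqrt": lambda x: math.sqrt(x),                  # SQRT(X)
        "log10": lambda x: math.log10(x),                # LG10(X)
        "inverse": lambda x: 1.0 / x,                    # 1/X
        "reflect_sqrt": lambda x: math.sqrt(k - x),      # SQRT(K-X)
        "reflect_log10": lambda x: math.log10(k - x),    # LG10(K-X)
        "reflect_inverse": lambda x: 1.0 / (k - x),      # 1/(K-X)
    }
    return [ops[kind](x) for x in values]

scores = [1, 4, 9]
print(transform(scores, "sqrt"))          # [1.0, 2.0, 3.0]
print(transform(scores, "reflect_sqrt"))  # K = 10, so sqrt(9), sqrt(6), sqrt(1)
```

The reflected versions (K-X) reverse the order of the scores, so a high transformed value corresponds to a low original score.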
Hypothesis testing Central Limit Theorem & sampling error, levels of confidence, statistical significance, Type 1 & Type 2 Errors
Sampling Error
If many samples are drawn from the same population, they will not have identical characteristics, either to each other or to the population. This is Sampling Error.
Because of Sampling Error, the Mean of a sample is very unlikely to be exactly the same as the Mean of the population.
Central Limit Theorem
If random samples of equal size are repeatedly drawn from any population, the Means of these samples will be approximately normally distributed, provided the samples are large enough.
The Mean of the sample Means will be approximately the same as the population Mean.
The Mean of the population is a fixed value that we do not know (unless we examine the entire population).
The Standard Deviation of the theoretical distribution of the sample Means indicates the Sampling Error and is called the STANDARD ERROR OF THE MEAN (S.E.).
Based on the Central Limit Theorem:
68.26% of all sample Means lie within ± 1 S.E. of the Mean of the population.
95.44% of all sample Means lie within ± 2 S.E. of the Mean of the population.
99.73% of all sample Means lie within ± 3 S.E. of the Mean of the population.
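These claims can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not part of the lecture: the population is uniform on 0 to 1 (clearly not normal), the sample size and number of samples are arbitrary, and the theoretical standard error is taken as the population S.D. divided by the square root of the sample size.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50    # arbitrary choice for the illustration
N_SAMPLES = 2000    # arbitrary choice for the illustration
POP_MEAN = 0.5                   # mean of a uniform(0, 1) population
POP_SD = math.sqrt(1.0 / 12.0)   # S.D. of a uniform(0, 1) population

# Draw many samples of equal size and keep the Mean of each one.
sample_means = []
for _ in range(N_SAMPLES):
    sample = [rng.random() for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)          # empirical standard error
theoretical_se = POP_SD / math.sqrt(SAMPLE_SIZE)
within_1_se = sum(
    abs(m - POP_MEAN) <= theoretical_se for m in sample_means
) / N_SAMPLES

print(round(mean_of_means, 3))                         # close to 0.5
print(round(sd_of_means, 4), round(theoretical_se, 4))
print(round(within_1_se, 3))                           # roughly 0.68
```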
Standard Error of the Mean
The Standard Deviation of the theoretical distribution of the Means of the samples indicates the Sampling Error and is called the STANDARD ERROR OF THE MEAN.
$\mathrm{S.E.M.} = \mathrm{S.D.} / \sqrt{N}$

Example 1
Within which limits do we expect the Mean of the population of adult men if we randomly choose a sample of 100 men with Mean = 1.75 m and S.D. = .10 m?
$\mathrm{S.E.} = .10 / \sqrt{100} = .10 / 10 = .01$. Thus:
68.26%: 1.75 ± .01 (1.74 to 1.76)
95.44%: 1.75 ± .02 (1.73 to 1.77)
99.73%: 1.75 ± .03 (1.72 to 1.78)
Example 2
What deviation do we expect for the percentage of votes for a party when the sample is 10,000 voters, Mean = 42 (%) and S.D. = 5 (%)?
$\mathrm{S.E.} = 5 / \sqrt{10000} = 5 / 100 = .05$
Thus, with 99.73% probability (an error probability of 2.7 in a thousand): 42 ± (3 × .05) = 41.85 to 42.15.
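The two examples above follow the same recipe, which can be written as a short sketch. The function names `standard_error` and `mean_interval` are illustrative.

```python
import math

def standard_error(sd, n):
    """S.E. = S.D. / sqrt(N)."""
    return sd / math.sqrt(n)

def mean_interval(mean, sd, n, z):
    """Limits mean ± z * S.E. for the population Mean."""
    se = standard_error(sd, n)
    return mean - z * se, mean + z * se

# Example 1: 100 men, Mean = 1.75 m, S.D. = .10 m.
for z, level in [(1, "68.26%"), (2, "95.44%"), (3, "99.73%")]:
    low, high = mean_interval(1.75, 0.10, 100, z)
    print(level, round(low, 2), round(high, 2))

# Example 2: 10,000 voters, Mean = 42 (%), S.D. = 5 (%), ± 3 S.E.
low, high = mean_interval(42, 5, 10000, 3)
print(round(low, 2), round(high, 2))  # 41.85 42.15
```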
From the 1st Example
Within which limits do we expect the Mean of the population of adult men if we randomly choose a sample of 100 men with Mean = 1.75 m and S.D. = .10 m?
$\mathrm{S.E.} = .10 / \sqrt{100} = .10 / 10 = .01$. Thus:
68.26%: 1.75 ± .01 (1.74 to 1.76)
95.44%: 1.75 ± .02 (1.73 to 1.77)
99.73%: 1.75 ± .03 (1.72 to 1.78)
So we know that, with 99.73% probability, the Mean height of men lies within 175 ± 3 cm (172 to 178 cm).
Thus, if we have a sample of 100 handball athletes with Mean = 178.1 cm and S.D. = 10 cm, what is the probability that this sample is a representative sample of all adult men?
Less than 100% - 99.73% = .27%.
Can we say that the height of handball athletes differs from the height of the typical population of adult men in a non-random way? If yes, how sure are we? With what probability can we be confident about it?
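The handball question can be put into numbers with a short sketch. It follows the slide in treating 175 cm as the reference Mean and 1 cm as the standard error; the two-tailed probability is an added assumption, since the slide only states that the probability is below .27%.

```python
import math

def two_tailed_p(z):
    """Probability of a Z at least this far from 0, in either direction."""
    return math.erfc(abs(z) / math.sqrt(2.0))

reference_mean = 175.0          # Mean height of men from the 1st example, in cm
se = 10.0 / math.sqrt(100)      # S.E. = S.D. / sqrt(N) = 1 cm
handball_mean = 178.1

z = (handball_mean - reference_mean) / se
p = two_tailed_p(z)

print(round(z, 2))          # 3.1, i.e. beyond 3 S.E.
print(round(p * 100, 2))    # about 0.19 (%), which is below 0.27%
```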
Levels of Confidence
From the Normal Curve (S.E. is the Standard Deviation of the distribution of sample Means):

Level of      Interval range        Probability of Sample    Probability of Sample
confidence                          Mean within interval     Mean outside interval
95.44%        Mpop ± 2.00 S.E.      0.9544                   0.0456
95%           Mpop ± 1.96 S.E.      0.95                     0.05
99.73%        Mpop ± 3.00 S.E.      0.9973                   0.0027
99%           Mpop ± 2.58 S.E.      0.99                     0.01
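The multipliers in this table come straight from the standard normal curve. The sketch below recovers them with `statistics.NormalDist` from the Python standard library (Python 3.8 or later); `critical_z` and `coverage` are illustrative names.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def critical_z(confidence):
    """Z such that the central area between -Z and +Z equals `confidence`."""
    return std_normal.inv_cdf(0.5 + confidence / 2.0)

def coverage(z):
    """Central area of the normal curve between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

print(round(critical_z(0.95), 2))  # 1.96
print(round(critical_z(0.99), 2))  # 2.58
print(round(coverage(2.0), 4))     # about 0.9545
print(round(coverage(3.0), 4))     # 0.9973
```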
Null Hypothesis
Null Hypothesis: The two sample Means (randomly chosen men and handball athletes) were drawn from the same population.
Rejection of the Null Hypothesis: The two sample Means were NOT drawn from the same population.
Alternative Hypothesis: The two sample Means were drawn from different populations (e.g., typical men and handball athletes).
Type I Error
Type I Error (α) at the 95% level of confidence (or p < .05 level of significance): there is a 5% probability of rejecting the Null Hypothesis when it is true.
We reduce the probability of a Type I Error when we increase the level of confidence (e.g., from 95% to 99%).
However, this increases the probability of accepting a Null Hypothesis that should be rejected (i.e., when the samples in fact correspond to different populations).
Type II Error
Type II Error (β): the probability of not rejecting the Null Hypothesis when we should have rejected it.
Example: distribution of height in a randomly selected sample of men, N = 100, Mean = 175 cm and S.D. = 10 cm.
Thus $\mathrm{S.E.} = 10 / \sqrt{100} = 1$.
Thus, for a 95% level of confidence, the limits are 175 ± (1.96 × 1) cm = 173.04 to 176.96 cm.
[Figure: two overlapping distributions. Height of men selected at random, M = 175, with the 95% level of confidence limit at 176.96 and α = 5% (Type I Error) beyond it. Height of volleyball athletes, M = 178, with β = 15% (Type II Error) below 176.96.]
Height of volleyball athletes: N = 100, Mean = 178 cm and S.D. = 10 cm. Thus S.E. = 1.
$Z = (176.96 - 178) / 1 = -1.04$
This corresponds to 15% of the distribution for the volleyball athletes.
Thus, there is a 15% probability that the volleyball athletes belong to a population with Mean < 176.96 (the limit set for typical men).
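The value β = 15% and the corresponding power can be recomputed as follows. The sketch follows the slide in using only the upper 95% limit (176.96 cm); the area below the lower limit of 173.04 cm is negligible here and is ignored. Variable names are illustrative.

```python
import math
from statistics import NormalDist

null_mean = 175.0              # men selected at random, in cm
alt_mean = 178.0               # volleyball athletes, in cm
se = 10.0 / math.sqrt(100)     # S.E. = S.D. / sqrt(N) = 1 cm in both groups

# Upper limit of the 95% interval around the null Mean.
upper_limit = null_mean + 1.96 * se

# beta: probability that a sample Mean from the athletes' distribution
# falls below the limit, so the Null Hypothesis is not rejected.
z = (upper_limit - alt_mean) / se
beta = NormalDist(mu=alt_mean, sigma=se).cdf(upper_limit)
power = 1.0 - beta

print(round(upper_limit, 2), round(z, 2))  # 176.96 -1.04
print(round(beta, 2), round(power, 2))     # 0.15 0.85
```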
Confusion Matrix

                          Reality
Statistical decision      H0        Halt
H0                        1-α       β
Halt                      α         1-β

With the values from the example:

                          Reality
Statistical decision      H0        Halt
H0                        .95       .15
Halt                      .05       .85

Because we never know the reality (we do not study whole populations):
If in reality H0 is true, the correct statistical decision is to set a high level of confidence for accepting H0.
If in reality Halt is true, the correct statistical decision is to set a high level of confidence for accepting Halt.
If in reality H0 is true, the probability of the correct statistical decision (H0) is 95%, or 1-α.
If in reality H0 is true, the probability of the wrong statistical decision (Halt) is 5%, or α.
If in reality Halt is true, the probability of the wrong statistical decision (H0) is 15%, or β.
If in reality Halt is true, the probability of the correct statistical decision (Halt) is 85%, or 1-β.
1-β = power