Biostatistics Tutorial. Hypothesis Testing. Παύλος Αγιανιάν (Pavlos Agianian), Assistant Professor, Democritus University of Thrace (ΔΠΘ). Alexandroupoli, 2015

Original distribution $N(\mu_0, \sigma^2)$, centered at $\mu_0$, and the standardized distribution $N(0, 1)$, with critical values $-Z_{1-\alpha/2}$ and $Z_{1-\alpha/2}$.

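[Figure: $N(\mu_0, \sigma^2)$ mapped onto $N(0, 1)$; marked points $-Z_{1-\alpha/2}$, $\mu_0$, $Z_{1-\alpha/2}$.] The Z value shows how many standard deviations (σ) a value lies from the mean: $Z = \frac{x - \mu_0}{\sigma}$.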
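A minimal Python sketch of the standardization step (the same computation that Excel's STANDARDIZE(x, mean, standard_dev) performs). The function name `z_score` is illustrative, and the numbers μ0 = 80, σ = 10 are borrowed from the worked examples further on.

```python
def z_score(x: float, mu0: float, sigma: float) -> float:
    """Return how many standard deviations (sigma) the value x lies from the mean mu0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu0) / sigma


# Values taken from the worked examples: N(mu0 = 80, sigma = 10)
print(z_score(85, 80, 10))   # 0.5
print(z_score(100, 80, 10))  # 2.0
```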


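[Figure: $N(\mu_0, \sigma^2)$ with a rejection region below $-Z_{1-\alpha/2}$ and another above $Z_{1-\alpha/2}$.] The critical value (test criterion) is read from statistical tables and refers to the standardized distribution, i.e. $\mu_0 = 0$ and $\sigma = 1$. For $\alpha/2 = 0.025$, $Z_{1-\alpha/2} = 1.96$.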

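[Figure: right-tailed test on $N(\mu_0, \sigma^2)$, critical value $Z_{1-\alpha}$.] For $\alpha = 0.05$, $Z_{1-\alpha} = 1.645$; for $\alpha = 0.025$, $Z_{1-\alpha} = 1.96$.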

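[Figure: left-tailed test on $N(\mu_0, \sigma^2)$, critical value $-Z_{1-\alpha}$.] For $\alpha = 0.05$, $-Z_{1-\alpha} = -1.645$; for $\alpha = 0.025$, $-Z_{1-\alpha} = -1.96$.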
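Instead of reading the tables, the critical values can be computed from the inverse of the standard normal CDF. This is a sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+); `z_crit` is a hypothetical helper name. Left-tailed critical values are simply the negatives of the one-tailed ones.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)  # the standardized distribution N(0, 1)


def z_crit(alpha: float, two_tailed: bool) -> float:
    """Upper critical value: Z_{1-alpha/2} for a two-tailed test, Z_{1-alpha} for a one-tailed test."""
    p = 1 - alpha / 2 if two_tailed else 1 - alpha
    return STD_NORMAL.inv_cdf(p)


for alpha in (0.05, 0.025):
    upper = z_crit(alpha, two_tailed=False)
    print(f"alpha = {alpha}: right tail {upper:.3f}, left tail {-upper:.3f}")
print(f"two-tailed, alpha = 0.05: +/-{z_crit(0.05, two_tailed=True):.2f}")
```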

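[Figure: right-tailed test on $N(\mu_0, \sigma^2)$, critical value $Z_{1-\alpha}$.] $\bar{x}$ = the estimated value (e.g. the sample mean from an experiment or a survey). Hypothesis test: reject $H_0$ if $\bar{x} > \mu_0 + Z_{1-\alpha}\,\sigma$.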

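$Z_0 = \frac{\bar{x} - \mu_0}{\sigma}$ = the number of standard deviations by which $\bar{x}$ lies from $\mu_0$ (σ is the standard deviation of the distribution of $\bar{x}$; for a mean of n independent observations it is the standard error $\sigma/\sqrt{n}$). Reject $H_0$ if $Z_0 > Z_{1-\alpha}$.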

[Figure: left-tailed test on $N(\mu_0, \sigma^2)$, critical value $-Z_{1-\alpha}$.] Reject $H_0$ if $\bar{x} < \mu_0 - Z_{1-\alpha}\,\sigma$.

Reject $H_0$ if $Z_0 < -Z_{1-\alpha}$.

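[Figure: two-tailed test on $N(\mu_0, \sigma^2)$, critical values $-Z_{1-\alpha/2}$ and $Z_{1-\alpha/2}$.] Reject $H_0$ if $\bar{x} < \mu_0 - Z_{1-\alpha/2}\,\sigma$ or $\bar{x} > \mu_0 + Z_{1-\alpha/2}\,\sigma$.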

Reject $H_0$ if $Z_0 < -Z_{1-\alpha/2}$ or $Z_0 > Z_{1-\alpha/2}$, i.e. if $|Z_0| > Z_{1-\alpha/2}$.
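The three decision rules above can be collected in one small function. This is a sketch that assumes the test statistic $Z_0$ has already been computed; `reject_h0` and the `tail` labels are illustrative names, not part of any library.

```python
from statistics import NormalDist


def reject_h0(z0: float, alpha: float, tail: str) -> bool:
    """Z-test decision rule.

    tail = "right": H1 is mu > mu0, reject if Z0 > Z_{1-alpha}
    tail = "left":  H1 is mu < mu0, reject if Z0 < -Z_{1-alpha}
    tail = "two":   H1 is mu != mu0, reject if |Z0| > Z_{1-alpha/2}
    """
    std_normal = NormalDist()
    if tail == "right":
        return z0 > std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return z0 < -std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        return abs(z0) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'right', 'left' or 'two'")


print(reject_h0(2.0, 0.05, "right"), reject_h0(1.7, 0.05, "two"))
```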

[Figure: $N(\mu_0, \sigma^2)$ with $Z_0 = 1.25$.] MS Excel: =NORMSDIST(z) returns the cumulative probability (p) up to a point, e.g. =NORMSDIST(1.25) -> 0.894. =1-NORMSDIST(z) returns the probability from a point onwards (the upper tail), e.g. =1-NORMSDIST(1.25) -> 0.106.
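For readers working outside Excel, a sketch of the same two calculations with Python's `statistics.NormalDist`; the `cdf` method plays the role of NORMSDIST here.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

p_below = std_normal.cdf(1.25)      # like =NORMSDIST(1.25): P(Z <= 1.25)
p_above = 1 - std_normal.cdf(1.25)  # like =1-NORMSDIST(1.25): P(Z > 1.25)
print(round(p_below, 3), round(p_above, 3))  # 0.894 0.106
```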

How many individuals are expected to have values > 85? $N(\mu_0 = 80, \sigma = 10)$, n = 100. $Z_0 = (85 - 80)/10 = 0.5$. =1-NORMSDIST(0.5) -> 0.3085, and 0.3085 × 100 ≈ 31 individuals.
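The same expected-count calculation as a short Python sketch. `expected_count_above` is a hypothetical helper; it works directly on the original scale, so it corresponds to =1-NORMDIST(85, 80, 10, TRUE) rather than standardizing first, which gives the same probability.

```python
from statistics import NormalDist


def expected_count_above(threshold: float, mu0: float, sigma: float, n: int):
    """Return (p, expected count): p = P(X > threshold) for X ~ N(mu0, sigma^2), count = p * n."""
    p = 1 - NormalDist(mu0, sigma).cdf(threshold)  # like =1-NORMDIST(threshold, mu0, sigma, TRUE)
    return p, p * n


p, count = expected_count_above(85, 80, 10, 100)
print(f"p = {p:.4f}, expected number of individuals = {count:.0f}")  # p = 0.3085, ... = 31
```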

Conversely, which $Z_{crit}$ corresponds to p = 0.3085? $N(\mu_0 = 80, \sigma = 10)$, n = 100. $Z_{crit}$ = NORMSINV(1-0.3085) -> 0.5001 ≈ 0.5, which corresponds to the value 85.
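The inverse direction in Python is a sketch with `NormalDist.inv_cdf`, which corresponds to NORMSINV for N(0, 1) and to NORMINV for a distribution with a given mean and standard deviation.

```python
from statistics import NormalDist

p = 0.3085
z = NormalDist().inv_cdf(1 - p)                 # like =NORMSINV(1-0.3085)
x = NormalDist(mu=80, sigma=10).inv_cdf(1 - p)  # the same point on the original scale
print(round(z, 4), round(x, 2))
```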

Which p corresponds to the value 85? $N(\mu_0 = 80, \sigma = 10)$, n = 100, $Z_0 = 0.5$. p = 1 - NORMDIST(85, 80, 10, TRUE) -> 0.3085

Which p corresponds to the value 85, and is 85 significantly greater than the mean (80)? $N(\mu_0 = 80, \sigma = 10)$, n = 100, $Z_0 = 0.5$. One-tailed test: p = 1 - NORMDIST(85, 80, 10, TRUE) -> 0.3085 > α = 0.05, so we accept (do not reject) $H_0$.

Is the value 100 significantly greater than the mean (80)? $N(\mu_0 = 80, \sigma = 10)$, n = 100. One-tailed test: $Z_{crit}$ = NORMSINV(1-0.05) = 1.645, and p = 1 - NORMDIST(100, 80, 10, TRUE) -> 0.0228 < α = 0.05, so we reject $H_0$ (equivalently, $Z_0 = 2 > 1.645$).
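The two one-tailed examples (values 85 and 100) can be reproduced with a short Python sketch; `right_tail_p` is a hypothetical helper equivalent to =1-NORMDIST(x, mu0, sigma, TRUE), and the decision compares p with α = 0.05.

```python
from statistics import NormalDist


def right_tail_p(x: float, mu0: float, sigma: float) -> float:
    """One-tailed (right) p-value, like =1-NORMDIST(x, mu0, sigma, TRUE)."""
    return 1 - NormalDist(mu0, sigma).cdf(x)


ALPHA = 0.05
for x in (85, 100):
    p = right_tail_p(x, 80, 10)
    decision = "reject H0" if p < ALPHA else "do not reject H0"
    print(f"x = {x}: p = {p:.4f} -> {decision}")
```

The value 85 gives p ≈ 0.31 and is not significant; the value 100 gives p ≈ 0.023 and is.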

Which value corresponds to the rejection point (p = 0.05)? $N(\mu_0 = 80, \sigma = 10)$, n = 100. One-tailed test: $Z_{crit}$ = NORMSINV(1-0.05) = 1.645, and $x_{crit}$ = NORMINV(1-0.05, 80, 10) = 96.45.
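A sketch of the rejection point in Python. It also checks the relation behind NORMINV: on the original scale the critical value is $x_{crit} = \mu_0 + Z_{crit}\,\sigma$.

```python
from statistics import NormalDist

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)                 # like =NORMSINV(1-0.05)
x_crit = NormalDist(mu=80, sigma=10).inv_cdf(1 - alpha)  # like =NORMINV(1-0.05, 80, 10)
print(round(z_crit, 3), round(x_crit, 2))  # 1.645 96.45
```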

Is the value 97 significantly greater or smaller than the mean (80)? $N(\mu_0 = 80, \sigma = 10)$, n = 100. Two-tailed test: for a one-tailed test $Z_{crit}(0.05) = 1.645$, but here $Z_{crit}$ = NORMSINV(1-0.025) = 1.96. p = 1 - NORMDIST(97, 80, 10, TRUE) -> 0.0446, which is < α = 0.05 but > α/2 = 0.025, so we accept (do not reject) $H_0$.
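A Python sketch of the two-tailed example. `two_tailed_z_test` is a hypothetical helper; it doubles the one-side tail probability and compares it with α, which is the same decision as comparing the one-side tail with α/2 as done above.

```python
from statistics import NormalDist


def two_tailed_z_test(x: float, mu0: float, sigma: float, alpha: float = 0.05):
    """Two-tailed Z test for a single value x against N(mu0, sigma^2)."""
    z0 = (x - mu0) / sigma
    tail = 1 - NormalDist().cdf(abs(z0))  # probability beyond |Z0| on one side
    p_two_sided = 2 * tail                # both rejection regions together
    return z0, tail, p_two_sided, p_two_sided < alpha  # last item: reject H0?


z0, tail, p, reject = two_tailed_z_test(97, 80, 10)
print(f"Z0 = {z0:.2f}, one-side tail = {tail:.4f}, two-sided p = {p:.4f}, reject H0: {reject}")
# Z0 = 1.70, one-side tail = 0.0446, two-sided p = 0.0891, reject H0: False
```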

For n > 30 we have $s \approx \sigma$: the sample standard deviation s is a good estimate of the population SD.

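Student's t-test. William Sealy Gosset (1876-1937), an employee of Guinness, published his papers under the pseudonym "Student", hence the name Student's t-test. The t statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, which follows the t distribution with $n - 1$ degrees of freedom.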

In practice, when the sample is small (n < 30) we use the t statistic instead of Z. For hypothesis testing, everything shown above for Z also holds for t, except that the critical value ($t_{crit}$) is taken from the t distribution rather than the normal distribution. When the sample is large, $Z_{crit}$ and $t_{crit}$ differ only negligibly. Note that t uses the sample standard deviation (s), not the population SD. Example: $t_{crit}(\alpha = 0.05$, two-tailed, df = 17$) = 2.11$, whereas $Z_{crit}(0.05) = 1.645$ (one-tailed) and 1.96 (two-tailed).
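A minimal sketch of the one-sample t statistic in Python. `t_statistic` is a hypothetical helper and the six observations are made-up illustrative data; the key point is `statistics.stdev`, which divides by n - 1 (the sample s, like Excel's STDEV), not by n.

```python
import math
import statistics


def t_statistic(sample, mu0):
    """Return (t, df) for a one-sample t-test: t = (x_bar - mu0) / (s / sqrt(n)), df = n - 1."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(sample)  # sample SD (divides by n - 1), not the population SD
    return (statistics.mean(sample) - mu0) / (s / math.sqrt(n)), n - 1


# Hypothetical small sample tested against mu0 = 80
t, df = t_statistic([82, 85, 79, 88, 91, 84], 80)
print(f"t = {t:.3f}, df = {df}")  # t = 2.778, df = 5
```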

NOTE: $t_{crit}$, and conversely the probability corresponding to a value t (or $t_{crit}$), are computed with n - 1 degrees of freedom (df). In MS Excel:
p-value = TDIST(t, df, tails), with tails = 2 for a two-tailed test and tails = 1 for a one-tailed test.
$t_{crit}$ = TINV(α, df), with df = n - 1, for a two-tailed test; for a one-tailed test use probability 2α, i.e. TINV(2α, df), because TINV always works with the two-tailed probability.
The confidence interval derived from $t_{crit}$ is: $\bar{x} \pm t_{crit}\,\frac{s}{\sqrt{n}}$
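Python's standard library has no t distribution, so this sketch approximates Excel's TDIST and TINV numerically: the t density is integrated with Simpson's rule, and TINV is obtained by bisection. The helper names (`t_pdf`, `tdist`, `tinv`, `t_confidence_interval`) are hypothetical; in practice a library routine such as SciPy's `scipy.stats.t` would normally be used instead.

```python
import math
import statistics


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 1000) -> float:
    """P(T > t) for t >= 0: 0.5 minus the integral of the density over [0, t] (Simpson's rule)."""
    if t < 0:
        raise ValueError("t must be non-negative (Excel's TDIST has the same restriction)")
    if t == 0:
        return 0.5
    h = t / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def tdist(t: float, df: int, tails: int) -> float:
    """Analogue of Excel's TDIST(t, df, tails) for t >= 0 and tails in {1, 2}."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return tails * t_upper_tail(t, df)


def tinv(prob: float, df: int) -> float:
    """Analogue of Excel's TINV(prob, df): the t whose TWO-tailed p-value equals prob."""
    if not 0 < prob <= 1:
        raise ValueError("prob must be in (0, 1]")
    lo, hi = 0.0, 1.0
    while tdist(hi, df, 2) > prob:  # expand the upper bound until the root is bracketed
        hi *= 2
    for _ in range(50):  # bisection: tdist decreases as t grows
        mid = (lo + hi) / 2
        if tdist(mid, df, 2) > prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_confidence_interval(sample, alpha: float = 0.05):
    """x_bar +/- t_crit * s / sqrt(n), with t_crit = TINV(alpha, n - 1)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    half_width = tinv(alpha, n - 1) * statistics.stdev(sample) / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


print(round(tinv(0.05, 17), 2))      # two-tailed critical value, alpha = 0.05: 2.11
print(round(tinv(2 * 0.05, 17), 2))  # one-tailed critical value (use 2 * alpha): 1.74
```

The second call illustrates the 2α rule for one-tailed tests mentioned in the note.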

List of MS Excel 2007 statistical functions

AVERAGE: Returns the arithmetic mean of the specified numbers and takes the form =AVERAGE(number1,number2,...), where the numbers can be names, arrays, or references that resolve to numbers. Cells containing text, logical values, or empty cells are ignored, but cells containing a zero value are included.
INTERCEPT: Calculates the point at which a line will intersect the y-axis by using existing x-values and y-values and takes the form =INTERCEPT(known_y's,known_x's), where known_y's is the dependent set of observations or data and known_x's is the independent set of observations or data.
CHIDIST: Returns the one-tailed probability of the chi-squared distribution (used to compare observed vs. expected values) and takes the form =CHIDIST(x,degrees_freedom), where x is the value at which you want to evaluate the distribution and degrees_freedom is the number of degrees of freedom.
CHIINV: Returns the inverse of CHIDIST (the one-tailed probability of the chi-squared distribution) and takes the form =CHIINV(probability,degrees_freedom), where probability is a probability associated with the chi-squared distribution and degrees_freedom is the number of degrees of freedom.
KURT: Returns the kurtosis of a data set (characterizes the relative peakedness or flatness of a distribution compared with the normal distribution), takes the form =KURT(number1,number2,...), and accepts up to 255 numeric arguments.
LARGE: Returns the k-th largest value in a data set and takes the form =LARGE(array,k), where k is the position, counting from the largest, of the value you want to find.
MAX: Returns the largest value in a set of values, takes the form =MAX(number1,number2,...), and accepts up to 255 arguments, ignoring text, error values, and logical values.
CHITEST: Returns the test for independence (the p-value of the chi-squared test) and takes the form =CHITEST(actual_range,expected_range), where actual_range is the range of data that contains observations to test against expected values and expected_range is the range of data that contains the ratio of the product of row totals and column totals to the grand total.
CONFIDENCE: Returns the confidence interval for a population mean, strictly its half-width (the interval is mean ± CONFIDENCE), and takes the form =CONFIDENCE(alpha,standard_dev,size), where alpha is the significance level used to compute the confidence level (an alpha of 0.1 indicates a 90 percent confidence level); standard_dev is the population standard deviation for the data range and is assumed to be known; and size is the sample size.
COUNTIF: Counts the number of cells within a range that match specified criteria and takes the form =COUNTIF(range,criteria), where range is the range you want to test and criteria is the logical test to be performed on each cell.
DEVSQ: Returns the sum of squares of deviations of data points from their sample mean, takes the form =DEVSQ(number1,number2,...), where the numbers can be names, arrays, or references that resolve to numbers, and accepts up to 255 arguments.
FDIST: Returns the (right-tailed) F probability distribution and takes the form =FDIST(x,degrees_freedom1,degrees_freedom2), where x is the value at which to evaluate the function, degrees_freedom1 is the numerator degrees of freedom, and degrees_freedom2 is the denominator degrees of freedom.
FREQUENCY: Returns how often values occur within ranges of values (bins), as a vertical array, and takes the form =FREQUENCY(data_array,bins_array).
MEDIAN: Returns the median of a set of numbers, takes the form =MEDIAN(number1,number2,...), and accepts up to 255 arguments, ignoring text, error values, and logical values.
MIN: Returns the smallest value in a set of values, takes the form =MIN(number1,number2,...), and accepts up to 255 arguments, ignoring text, error values, and logical values.
NORMDIST: Returns the normal distribution for the specified mean and standard deviation and takes the form =NORMDIST(x,mean,standard_dev,cumulative), where x is the value for which you want the distribution; mean is the arithmetic mean of the distribution; standard_dev is the standard deviation of the distribution; and cumulative is a logical value that determines the form of the function (if TRUE, returns the cumulative distribution function; if FALSE, returns the probability density function).
NORMINV: Returns the inverse of the normal cumulative distribution for the specified mean and standard deviation and takes the form =NORMINV(probability,mean,standard_dev), where probability is a probability corresponding to the normal distribution; mean is the arithmetic mean of the distribution; and standard_dev is the standard deviation of the distribution.
NORMSDIST: Returns the standard normal cumulative distribution function (mean 0, standard deviation 1) and takes the form =NORMSDIST(z).
NORMSINV: Returns the inverse of the standard normal cumulative distribution (with a mean of zero and a standard deviation of one) and takes the form =NORMSINV(probability), where probability is a probability corresponding to the normal distribution.

QUARTILE: Returns the quartile of a data set and takes the form =QUARTILE(array,quart), where quart (0 to 4) selects the minimum, first quartile, median, third quartile, or maximum.
SKEW: Returns the skewness of a distribution (the degree of asymmetry of a distribution around its mean), takes the form =SKEW(number1,number2,...), and accepts up to 255 arguments.
SLOPE: Returns the slope of the linear regression line and takes the form =SLOPE(known_y's,known_x's).
ZTEST: Returns the one-tailed P-value of a z-test (it computes a standard score for x with respect to the data set, array, and returns the upper-tail probability of the normal distribution) and takes the form =ZTEST(array,x,sigma), where array is the array or range of data against which to test x; x is the value to test; and sigma is the known population standard deviation (if omitted, the sample standard deviation is used).
MODE: Returns the most frequently occurring value in a set of numbers, takes the form =MODE(number1,number2,...), and accepts up to 255 arguments, ignoring text, error values, and logical values.
SMALL: Returns the k-th smallest value in a data set and takes the form =SMALL(array,k), where k is the position, counting from the smallest, of the value you want to find.
STANDARDIZE: Returns a normalized value from a distribution characterized by mean and standard_dev and takes the form =STANDARDIZE(x,mean,standard_dev), where x is the value you want to normalize; mean is the arithmetic mean of the distribution; and standard_dev is the standard deviation of the distribution.
LINEST: Calculates the statistics for a line by using the least squares method to find the straight line that best fits the given data and takes the form =LINEST(known_y's,known_x's,const,stats).
STDEV: Estimates the standard deviation, assuming that the arguments represent only a sample of the total population, takes the form =STDEV(number1,number2,...), and accepts up to 255 arguments.
TDIST: Returns the probability (p-value) for Student's t-distribution and takes the form =TDIST(x,degrees_freedom,tails), where x is the numeric value of t at which to evaluate the distribution (it must be non-negative); degrees_freedom is an integer indicating the number of degrees of freedom; and tails specifies the number of distribution tails to return (if 1, returns the one-tailed distribution; if 2, returns the two-tailed distribution).
TINV: Returns the inverse of Student's t-distribution as a function of the probability and the degrees of freedom and takes the form =TINV(probability,degrees_freedom), where probability is the probability associated with the two-tailed Student's t-distribution and degrees_freedom is the number of degrees of freedom that characterize the distribution.
TTEST: Returns the probability associated with a Student's t-test and takes the form =TTEST(array1,array2,tails,type), where array1 is the first data set; array2 is the second data set; tails specifies the number of distribution tails (if 1, uses the one-tailed distribution; if 2, uses the two-tailed distribution); and type is the kind of t-test to perform (1 = paired; 2 = two-sample equal variance; 3 = two-sample unequal variance).
VAR: Estimates the variance, assuming that the arguments represent only a sample of the total population, takes the form =VAR(number1,number2,...), and accepts up to 255 arguments.
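Several functions in the list have direct counterparts in Python's `statistics` module (Python 3.8+). This is a sketch, not an exact replica of Excel's behaviour: `confidence` and `ztest` are hypothetical helpers written from the descriptions above (CONFIDENCE as a half-width, ZTEST as a one-tailed p-value), and the data are made up.

```python
import math
from statistics import NormalDist, mean, median, mode, stdev, variance


def confidence(alpha: float, sd: float, n: int) -> float:
    """Like Excel's CONFIDENCE: half-width Z_{1-alpha/2} * sd / sqrt(n) of the interval."""
    return NormalDist().inv_cdf(1 - alpha / 2) * sd / math.sqrt(n)


def ztest(data, mu0: float, sigma=None) -> float:
    """Like Excel's ZTEST: one-tailed p-value P(Z > z) for the sample mean against mu0.
    If sigma is omitted, the sample standard deviation is used, as in Excel."""
    s = stdev(data) if sigma is None else sigma
    z = (mean(data) - mu0) / (s / math.sqrt(len(data)))
    return 1 - NormalDist().cdf(z)


data = [3, 6, 7, 8, 6, 5, 4, 2, 1, 9]  # hypothetical data
print(mean(data), median(data), mode(data))  # AVERAGE, MEDIAN, MODE
print(stdev(data), variance(data))           # STDEV, VAR (sample versions, divide by n - 1)
print(confidence(0.05, 2.5, 50))             # CONFIDENCE(0.05, 2.5, 50)
print(ztest(data, 4))                        # ZTEST(data, 4)
```

A two-tailed z-test p-value can be obtained from the one-tailed value as 2 * min(p, 1 - p).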