HISTOGRAMS AND PERCENTILES

What is the 25th percentile of a histogram?

The point on the horizontal axis such that 25% of the area under the histogram lies to the left of that point (and 75% to the right).

What is the 25th percentile in this case?

[Figure: histogram of the distribution of the number of cigarettes smoked per day by male current smokers in 1970. Horizontal axis: number of cigarettes, from 0 to 80. Vertical axis: % per cigarette. The block from 0 to 10 has area 15%; the block from 10 to 20 has area 35%.]

  Point on horizontal axis    Area to the left of it
  10                          15%
  11                          15% + 3.5%
  12                          15% + 7%
  13                          15% + 10.5%

So the 25th percentile is about 13.

What does that say about the people in the study? 1 out of 4 of them smoked 13 or fewer cigarettes per day.

Other percentiles are defined in a similar way. E.g., the 95th percentile is the point on the horizontal axis such that 95% of the area under the histogram lies to the left of it.

What is the 50th percentile for the cigarette histogram?

50th percentile = 20, because the area to the left of 20 is 15% + 35% = 50%. So 1 out of 2 of these men smoked 20 or fewer cigarettes per day.

The 25th, 50th, and 75th percentiles are called quartiles:

  25th percentile = first quartile (1Q)
  50th percentile = second quartile (2Q) = the median
  75th percentile = third quartile (3Q)

The interquartile range (IQR) is the distance between the first and third quartiles; this is a measure of spread that is not sensitive to outlying values. In the cigarette example, 1Q ≈ 13 and 3Q ≈ 37, so IQR = 3Q - 1Q ≈ 37 - 13 = 24.
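The percentile calculation for the cigarette histogram can be sketched in Python. This is only an illustration: the function name `histogram_percentile` is made up here, and area is assumed to be spread evenly within each block. The first two block areas (15% and 35%) come from the slide; the areas of the last two blocks (30% and 20%) are assumptions, chosen only so that the third quartile comes out near the quoted value of 37.

```python
def histogram_percentile(edges, areas, p):
    """Return the point with p percent of the histogram's area to its left.

    edges: block boundaries; areas: percent of area in each block.
    Area is assumed to be spread evenly within each block.
    """
    cumulative = 0.0
    for left, right, area in zip(edges, edges[1:], areas):
        if area > 0 and cumulative + area >= p:
            # Fraction of this block needed to reach p percent in total.
            fraction = (p - cumulative) / area
            return left + fraction * (right - left)
        cumulative += area
    raise ValueError("p exceeds the total area of the histogram")


edges = [0, 10, 20, 40, 80]
areas = [15, 35, 30, 20]  # 15 and 35 are from the slide; 30 and 20 are ASSUMED

q1 = histogram_percentile(edges, areas, 25)
median = histogram_percentile(edges, areas, 50)
q3 = histogram_percentile(edges, areas, 75)
print(round(q1), round(median), round(q3), round(q3 - q1))  # 13 20 37 24
```

With other block areas the first quartile and the median stay the same, since they depend only on the first two blocks; the third quartile and the IQR would change.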
WHY ARE NORMAL CURVES OF INTEREST?

Normal curves often provide a simple, compact way of describing how some variable is distributed.

Many variables (e.g., height and blood pressure, but not years of education) have histograms which follow (match up well with) a normal curve.

[Figure: histogram of heights of women in HANES (1976-1980), with the approximating normal curve. Horizontal axis: HEIGHT (INCHES), with ticks from 53.5 to 73.5 in steps of 2.5.]

For such variables, areas under the histogram (that is, population percentages) can be approximated by the corresponding areas under the normal curve.

Areas under the normal curve can be computed easily knowing only the average and the SD.

Normal curves are well known and well understood. They are a convenient means of communication.

As Chapter 18 explains, the sampling distribution of sample averages tends to follow the normal curve. This is the cornerstone of statistical inference!

THE STANDARD NORMAL CURVE

[Figure: the standard normal curve. Horizontal axis: STANDARD UNITS, from -4 to 4. Vertical axis: PERCENT PER STANDARD UNIT, from 0 to 40.]

The equation of the curve is

  $\text{ordinate} = \dfrac{100\%}{\sqrt{2\pi}}\, e^{-(\text{abscissa})^2/2}$

Two very important properties are:

  The total area under the curve is 100%, just like a histogram.
  The curve is symmetric about 0.
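The two properties of the standard normal curve can be checked numerically. The sketch below evaluates the equation of the curve and approximates the total area with a simple midpoint rule; the function names and the cutoff at 8 standard units on either side are choices made for this illustration, not part of the slides.

```python
import math


def ordinate(z):
    """Height of the standard normal curve at z, in percent per standard unit."""
    return 100.0 / math.sqrt(2 * math.pi) * math.exp(-z * z / 2)


def area(a, b, steps=100_000):
    """Approximate the area under the curve from a to b (midpoint rule)."""
    width = (b - a) / steps
    return sum(ordinate(a + (i + 0.5) * width) for i in range(steps)) * width


total = area(-8, 8)  # the area beyond 8 standard units is negligible
print(round(ordinate(0), 1))            # peak height, about 39.9
print(round(total, 4))                  # very close to 100
print(ordinate(1.5) == ordinate(-1.5))  # symmetry about 0
```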
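A BRIEF TABLE OF AREAS UNDER THE STANDARD NORMAL CURVE

The following figure shows some benchmark areas under the standard normal curve:

[Figure: the standard normal curve from -4 to 4, with the benchmark areas marked.]

  Between -1 and 1:  68 1/4 %, roughly 68%
  Between -2 and 2:  95 1/2 %, roughly 95%
  Between -3 and 3:  99 3/4 %, all but 1/4 of 1%

What is the area under the standard normal curve to the right of 1?

  area to the right of 1 = half of the area outside the interval from -1 to 1
                         = 1/2 × [100% - 68%]
                         = 1/2 × 32% = 16%

What is the area under the standard normal curve between 1 and 2?

  area between 1 and 2 = 1/2 × [(area between -2 and 2) - (area between -1 and 1)]
                       = 1/2 × [95% - 68%]
                       = 1/2 × 27% = 13 1/2 %

Alternatively,

  area between 1 and 2 = (area to the right of 1) - (area to the right of 2)
                       = 16% - 2 1/2 % = 13 1/2 %

What is the area under the standard normal curve between -1 and 2?

  area between -1 and 2 = (area between -1 and 0) + (area between 0 and 2)
                        = 1/2 × 68% + 1/2 × 95%
                        = 34% + 47 1/2 % = 81 1/2 %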
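The benchmark arithmetic can be compared with more precise areas. The sketch below uses the standard identity that the area between -z and z under the standard normal curve is 100% × erf(z/√2); the helper name `central_area` is made up for this illustration. The benchmark answers agree with the precise ones to within about half a percentage point.

```python
import math


def central_area(z):
    """Percent of the area under the standard normal curve between -z and z."""
    return 100.0 * math.erf(z / math.sqrt(2))


# The same three calculations as on the slide, without rounding to 68% and 95%.
right_of_1 = (100.0 - central_area(1)) / 2
between_1_and_2 = (central_area(2) - central_area(1)) / 2
between_minus1_and_2 = central_area(1) / 2 + central_area(2) / 2

print(round(right_of_1, 2))            # benchmark answer: 16
print(round(between_1_and_2, 2))       # benchmark answer: 13 1/2
print(round(between_minus1_and_2, 2))  # benchmark answer: 81 1/2
```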
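A NORMAL TABLE

The following table is like the one on page A86 of FPPA, except that it omits the columns of heights of the normal curve. Area is the percentage of the area under the standard normal curve between -z and z.

  z      Area (percent)    z      Area       z      Area
  0.00    0                1.50   86.64      3.00   99.730
  0.05    3.99             1.55   87.89      3.05   99.771
  0.10    7.97             1.60   89.04      3.10   99.806
  0.15   11.92             1.65   90.11      3.15   99.837
  0.20   15.85             1.70   91.09      3.20   99.863
  0.25   19.74             1.75   91.99      3.25   99.885
  0.30   23.58             1.80   92.81      3.30   99.903
  0.35   27.37             1.85   93.57      3.35   99.919
  0.40   31.08             1.90   94.26      3.40   99.933
  0.45   34.73             1.95   94.88      3.45   99.944
  0.50   38.29             2.00   95.45      3.50   99.953
  0.55   41.77             2.05   95.96      3.55   99.961
  0.60   45.15             2.10   96.43      3.60   99.968
  0.65   48.43             2.15   96.84      3.65   99.974
  0.70   51.61             2.20   97.22      3.70   99.978
  0.75   54.67             2.25   97.56      3.75   99.982
  0.80   57.63             2.30   97.86      3.80   99.986
  0.85   60.47             2.35   98.12      3.85   99.988
  0.90   63.19             2.40   98.36      3.90   99.990
  0.95   65.79             2.45   98.57      3.95   99.992
  1.00   68.27             2.50   98.76      4.00   99.9937
  1.05   70.63             2.55   98.92      4.05   99.9949
  1.10   72.87             2.60   99.07      4.10   99.9959
  1.15   74.99             2.65   99.20      4.15   99.9967
  1.20   76.99             2.70   99.31      4.20   99.9973
  1.25   78.87             2.75   99.40      4.25   99.9979
  1.30   80.64             2.80   99.49      4.30   99.9983
  1.35   82.30             2.85   99.56      4.35   99.9986
  1.40   83.85             2.90   99.63      4.40   99.9989
  1.45   85.29             2.95   99.68      4.45   99.9991

Benchmarks to read off from the table: the values of z that give areas of 50%, 90%, and 95%, and the areas for z = 1, 2, and 3.

THE QUARTILES OF THE STANDARD NORMAL CURVE

What is the first quartile of the standard normal curve?

[Figure: the standard normal curve with the quartiles marked on the horizontal axis.]

By symmetry, the first and third quartiles enclose the middle 50% of the area. In the table, an area of 50% falls between z = 0.65 (48.43%) and z = 0.70 (51.61%), at about z = 0.675.

The quartiles of the standard normal curve are:

  1Q = -0.675
  2Q = 0
  3Q = 0.675

The interquartile range (IQR) for the standard normal curve is

  IQR = 3Q - 1Q = 0.675 - (-0.675) = 1.35 ≈ 1.33 ≈ 4/3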
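Entries of the normal table and the quartiles can be recomputed with Python's standard library. This is a sketch using `statistics.NormalDist` (available from Python 3.8); the helper name `table_area` is made up for the illustration.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # average 0, SD 1


def table_area(z):
    """Percent of the area under the standard normal curve between -z and z."""
    return 100.0 * (STANDARD.cdf(z) - STANDARD.cdf(-z))


# A few rows of the table.
for z in (0.05, 0.50, 1.00, 1.50, 2.00, 3.00):
    print(f"{z:4.2f}  {table_area(z):7.3f}")

# Quartiles: the third quartile has 75% of the area to its left.
third_quartile = STANDARD.inv_cdf(0.75)
first_quartile = -third_quartile  # by symmetry about 0
iqr = third_quartile - first_quartile
print(round(third_quartile, 3), round(iqr, 2))
```

The computed third quartile is slightly below 0.675, the value obtained by interpolating in the table.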
STANDARD UNITS

What are standard units?

Standard units say how many SDs a value is above (+ sign) or below (- sign) average.

The women in the HANES study had heights averaging 63.5 inches, with an SD of 2.5 inches.

What is 61 in standard units?

  61 is 2.5 inches below average. That's 1 SD below average. So 61 is -1 in standard units.

What is 68.5 in standard units?

  68.5 is 5 inches above average. That's 2 SDs above average. So 68.5 is +2 in standard units.

What height is -2.4 in standard units?

  The height is 2.4 SDs below average. That's 2.4 × 2.5 = 6 inches below average. The height is 63.5 - 6 = 57.5 inches.

  Height (inches)     Standard units (dimensionless)
  57.5                -2.4
  61                  -1
  63.5                 0
  68.5                +2
  Average = 63.5      Average = 0
  SD = 2.5            SD = 1

Is there a formula for converting a value to standard units? Yes, the formula is

  $\text{standard units} = \dfrac{\text{value} - \text{average}}{\text{SD}}$

In our example, to express 68.5 in standard units you compute

  (68.5 - 63.5) / 2.5 = 5 / 2.5 = 2

Is there a formula for converting back from standard units to the original scale? Yes, the formula is

  $\text{value} = \text{average} + (\text{standard units} \times \text{SD})$

In our example, to find the height corresponding to -2.4 standard units, you compute

  63.5 + (-2.4 × 2.5) = 63.5 - 6 = 57.5
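The two conversion formulas translate directly into code. A minimal sketch, with function names invented for this illustration, using the HANES average and SD from the example:

```python
def to_standard_units(value, average, sd):
    """Number of SDs by which value is above (+) or below (-) average."""
    return (value - average) / sd


def from_standard_units(z, average, sd):
    """Value on the original scale that corresponds to z standard units."""
    return average + z * sd


AVERAGE, SD = 63.5, 2.5  # heights of women in HANES, in inches

print(to_standard_units(61, AVERAGE, SD))      # -1.0
print(to_standard_units(68.5, AVERAGE, SD))    # 2.0
print(from_standard_units(-2.4, AVERAGE, SD))  # 57.5, up to rounding
```

Converting a value to standard units and back returns the original value, up to floating-point rounding.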
THE NORMAL APPROXIMATION

If a list of numbers follows the normal curve, the percentage of entries falling in a given interval can be estimated by first converting the interval to standard units and then finding the corresponding area under the standard normal curve. This procedure is called the normal approximation.

Consider the heights of women in HANES:

[Figure: histogram of heights with the normal curve. Height (inches) axis: 61, 63.5, 68.5. Standard units (dimensionless) axis: -1, 0, 2. Average = 63.5, SD = 2.5.]

The percentage of women with heights between 61 and 68.5 is exactly equal to the area under the histogram from 61 to 68.5, and approximately equal to the area under the normal curve between -1 and 2, namely 81.5%.

If a histogram follows the normal curve, about 68 percent of the area lies within one SD of the average, and about 95 percent within two SDs of the average.

Warning: The normal approximation, especially for one-sided areas, is only valid if the histogram is approximately normal. Use your judgement.

USING THE NORMAL APPROXIMATION

A group of people have heights that follow a normal curve with average 69 inches and SD 3 inches. About what percentage of these people have heights of 66 inches or under?

[Figure: normal curve with the area to the left of 66 shaded. Height (inches) axis: 66, 69. Ave = 69, SD = 3. Standard units axis below.]

In standard units, 66 is (66 - 69) / 3 = -1. By symmetry, the area to the left of -1 equals the area to the right of 1.

Answer = 16% (see the earlier calculation of the area to the right of 1).

The method: original units → standard units → standard normal curve

Are these men, or women? Ave height = 5 foot 9: they're men.
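The method (original units, then standard units, then the standard normal curve) can be sketched in Python. The function name `percent_between` is made up for this illustration, and `statistics.NormalDist` stands in for the normal table, so the results differ slightly from the benchmark answers of 81.5% and 16%.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # the standard normal curve: average 0, SD 1


def percent_between(low, high, average, sd):
    """Normal approximation to the percent of values between low and high."""
    z_low = (low - average) / sd    # convert the endpoints to standard units
    z_high = (high - average) / sd
    return 100.0 * (STANDARD.cdf(z_high) - STANDARD.cdf(z_low))


# Women in HANES: heights between 61 and 68.5 inches.
women = percent_between(61, 68.5, average=63.5, sd=2.5)

# Heights of 66 inches or under, for average 69 and SD 3 (a one-sided area).
z = (66 - 69) / 3
under_66 = 100.0 * STANDARD.cdf(z)

print(round(women, 1))     # benchmark answer: 81.5
print(round(under_66, 1))  # benchmark answer: 16
```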
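Same population (heights averaging 69 inches with an SD of 3 inches). What height is exceeded by 5% of the population?

[Figure: normal curve with the area to the right of ? shaded and labeled 5%. Height (inches) axis: 69, ?. Ave = 69, SD = 3. Standard units axis below.]

If 5% of the area lies to the right of a point, then by symmetry 90% of the area lies between that point and its negative. In the normal table, an area of 90% corresponds to z = 1.65.

Thus ? = height = 1.65 SDs above average = 1.65 × 3 + 69 ≈ 5 + 69 = 74 inches.

The method: standard normal curve → standard units → original units

SUMMARY

What is the general procedure for working these kinds of problems?

DRAW THE PICTURE

  Sketch the normal curve.
  Put in the axis for the original units.
  Put in the axis for the standard units.
  Shade the area of interest.
  Proceed.

[Figure: blank template of a normal curve. Upper axis: Name? (units?), with Ave = ? and SD = ?. Lower axis: Standard units.]

Be sure to follow this procedure on the homework, quizzes, and exams!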
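Going in the reverse direction (standard normal curve, then standard units, then original units) can be sketched the same way. The function name `value_exceeded_by` is invented for this illustration, and `NormalDist.inv_cdf` from the standard library replaces the table lookup.

```python
from statistics import NormalDist


def value_exceeded_by(percent, average, sd):
    """Return (z, value): the point exceeded by the given percent of a
    population whose histogram follows the normal curve."""
    # The point has (100 - percent) percent of the area to its left.
    z = NormalDist().inv_cdf(1 - percent / 100.0)
    return z, average + z * sd  # convert back to the original units


z, height = value_exceeded_by(5, average=69, sd=3)
print(round(z, 2), round(height))  # table answer: 1.65 and 74
```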