Solution Manual for Statistics, Updated Edition, 13th Edition

Chapter 2
Methods for Describing Sets of Data

2.2 In a bar graph, a bar or rectangle is drawn above each class of the qualitative variable corresponding to the class frequency or class relative frequency. In a pie chart, each slice of the pie corresponds to the relative frequency of a class of the qualitative variable.

2.4 First, we find the frequency of the grade A. The sum of the frequencies for all 5 grades must be 200. Therefore, subtract the sum of the frequencies of the other 4 grades from 200. The frequency for grade A is:

200 - (36 + 90 + 30 + 28) = 200 - 184 = 16

To find the relative frequency for each grade, divide the frequency by the total sample size, 200. The relative frequency for the grade B is 36/200 = .18. The rest of the relative frequencies are found in a similar manner and appear in the table:

Grade on Statistics Exam    Frequency    Relative Frequency
A: 90-100                       16             .08
B: 80-89                        36             .18
C: 65-79                        90             .45
D: 50-64                        30             .15
F: Below 50                     28             .14
Total                          200            1.00

2.6 a. The graph shown is a pie chart.

b. The qualitative variable described in the graph is opinion on library importance.

c. The most common opinion is "more important", with 46.0% of the responders indicating that they think libraries have become more important.

d. Using MINITAB, the Pareto diagram is:

[Figure: Pareto diagram, "Chart of Percent". Vertical axis: Percent. Horizontal axis: Importance (More, Same, Less).]

Of those who responded to the question, almost half (46%) believe that libraries have become more important to their community. Only 18% believe that libraries have become less important.

Copyright © 2017 Pearson Education, Inc.

2.8 a. From the pie chart, 50.4% or 0.504 of adults living in the U.S. use the internet and pay to download music. From the data, 506 out of 1,003 adults or 506/1,003 = 0.504 of adults in the U.S. use the internet and pay to download music. These two results agree.

b.
Using MINITAB, a pie chart of the data is:

[Figure: Pie Chart of Download-Music. Pay 67.0%, No Pay 33.0%.]

2.10 a. Data were collected on 3 questions. For questions 1 and 2, the responses were either 'yes' or 'no'. Since these are not numbers, the data are qualitative. For question 3, the responses include 'character counts', 'roots of empathy', 'teacher designed', 'other', and 'none'. Since these responses are not numbers, the data are qualitative.

b. Using MINITAB, bar charts for the 3 questions are:

[Figure: Chart of Classroom Pets. Vertical axis: Count. Horizontal axis: Classroom Pets (Yes, No).]

[Figure: Chart of Pet Visits. Vertical axis: Count. Horizontal axis: Pet Visits (Yes, No).]

[Figure: Chart of Education. Vertical axis: Count. Horizontal axis: Education (Character Counts, Roots of Empathy, Teacher designed, Other, None).]

c. Many different things can be written. Possible answers might be: Most of the classroom teachers surveyed (61/75 = 0.813) keep classroom pets. A little less than half of the surveyed classroom teachers (35/75 = 0.467) allow visits by pets.

2.12 Using MINITAB, the pie chart is:

[Figure: Pie Chart of Loc. Urban 61.5%, Suburban 32.8%, Rural 5.7%.]

2.14 a. The two qualitative variables graphed in the bar charts are the occupational titles of clan individuals in the continued line and the occupational titles of clan individuals in the dropout line.

b. In the Continued Line, about 63% were in either the high or the middle grade. Only about 20% were in the nonofficial category. In the Dropout Line, only about 22% were in either the high or middle grade while about 64% were in the nonofficial category. The percentages in the low grade and provincial official categories were about the same for the two lines.

2.16
Using MINITAB, the Pareto chart is:

[Figure: Pareto chart, "Chart of Allocation". Vertical axis: Relative frequency. Horizontal axis: Track, in the order #5, #8, #6, #7, #10, #11, #2, #3, #4, #9, #1.]

From the graph, it appears that tracks #5 and #8 were over-utilized and track #1 is underutilized.

2.18 a. Using MINITAB, the Pareto chart of the total annual shootings involving the Boston street gang is:

[Figure: Pareto chart, "Chart of Total Shootings". Vertical axis: Proportion of Total Shootings. Horizontal axis: Year, in the order 2006, 2007, 2009, 2008, 2010.]

b. Using MINITAB, the Pareto chart of the annual shootings of the Boston gang members is:

[Figure: Pareto chart, "Chart of Gang Member Shootings". Vertical axis: Proportion of Gang Members. Horizontal axis: Year, in the order 2006, 2007, 2009, 2008, 2010.]

c. Because the proportion of shootings per year dropped drastically after 2007 for both the total annual shootings and annual shootings of the Boston street gang members, it appears that Operation Ceasefire was effective.

2.20 Using MINITAB, the side-by-side bar graphs showing the distribution of dives for the three match situations are:

[Figure: "Chart of Team behind, Tied, Team ahead". Panels: Team behind, Tied, Team ahead. Vertical axis: Proportion. Horizontal axis: Dive (Left, Middle, Right).]

From the graphs, it appears that when a team is tied or ahead, there is no difference in the proportion of times the goal-keeper dives right or left. However, if the team is behind, the goal-keeper tends to dive right much more frequently than left.

2.22 Using MINITAB, a bar graph of the data is:

[Figure: Chart of Measure. Vertical axis: Freq. Horizontal axis: Measure (Total Visitors, Paying visitors, Big shows, Funds raised, Members).]

The researcher concluded that "there is a large amount of variation within the museum community with regards to . . . performance measurement and evaluation".
From the data, there are only 5 different performance measures. I would not say that this is a large amount. Within these 5 categories, the number of times each is used does not vary that much. I would disagree with the researcher. There is not much variation.

2.24 Using MINITAB, a bar chart for the Extinct status versus flight capability is:

[Figure: Chart of Extinct, Flight. Vertical axis: Count. Horizontal axis: Flight (No, Yes) within Extinct (Absent, No, Yes).]

It appears that extinct status is related to flight capability. For birds that do have flight capability, most of them are present. For those birds that do not have flight capability, most are extinct.

The bar chart for Extinct status versus Nest Density is:

[Figure: Chart of Extinct, Nest Density. Vertical axis: Count. Horizontal axis: Nest Density (H, L) within Extinct (Absent, No, Yes).]

It appears that extinct status is not related to nest density. The proportion of birds present, absent, and extinct appears to be very similar for nest density high and nest density low.

The bar chart for Extinct status versus Habitat is:

[Figure: Chart of Extinct, Habitat. Vertical axis: Count. Horizontal axis: Habitat (A, TA, TG) within Extinct (Absent, No, Yes).]

It appears that the extinct status is related to habitat. For those in aerial terrestrial (TA), most species are present. For those in ground terrestrial (TG), most species are extinct. For those in aquatic, most species are present.

2.26 The difference between a bar chart and a histogram is that a bar chart is used for qualitative data and a histogram is used for quantitative data. For a bar chart, the categories of the qualitative variable usually appear on the horizontal axis. The frequency or relative frequency for each category usually appears on the vertical axis. For a histogram, values of the quantitative variable usually appear on the horizontal axis and either frequency or relative frequency usually appears on the vertical axis.
The quantitative data are grouped into intervals which appear on the horizontal axis. The number of observations appearing in each interval is then graphed. Bar charts usually leave spaces between the bars while histograms do not.

2.28 In a histogram, a class interval is a range of numbers above which the frequency of the measurements or relative frequency of the measurements is plotted.

2.30 a. This is a frequency histogram because the number of observations is displayed rather than the relative frequencies.

b. There are 14 class intervals used in this histogram.

c. The total number of measurements in the data set is 49.

2.32 Using MINITAB, the relative frequency histogram is:

[Figure: Relative frequency histogram. Vertical axis: Relative frequency. Horizontal axis: Class Interval, with boundaries from 0.5 to 16.5 in steps of 2.]

2.34 a. Using MINITAB, the relative frequency histogram is:

[Figure: Histogram of RDER. Vertical axis: Relative Frequency. Horizontal axis: RDER Value, from -45 to 255 in steps of 30.]

b. From the graph, the proportion of subjects with RDER values between 75 and 105 is about 0.18. The exact proportion is 13/71 = 0.183.

c. From the graph, the proportion of subjects with RDER values below 15 is about 0.01 + 0.08 = 0.09. The exact proportion is (1 + 6)/71 = 0.099.

2.36 a. Because the label on the vertical axis is 'Percent', this is a relative frequency histogram.

b. From the graph, the percentage of the 992 senior managers who reported a high level of support for corporate sustainability is about 3.8 + 2.4 + 2.1 + 1.2 + 1.2 + 0.5 + 0.7 + 0.2 + 0.1 + 0 + 0.1 = 12.3%.

2.38
Using MINITAB, the stem-and-leaf display is:

Stem-and-Leaf Display: Depth

Stem-and-leaf of Depth  N = 18
Leaf Unit = 0.10

  2   13  29
  4   14  00
  8   15  7789
 (3)  16  125
  7   17  08
  5   18  11
  3   19  347

The data in the stem-and-leaf display are displayed to 1 decimal place while the actual data are displayed to 2 decimal places. To 1 decimal place, there are 3 numbers that appear twice: 14.0, 15.7, and 18.1. However, to 2 decimal places, none of these numbers are the same. Thus, no molar depth occurs more frequently in the data.

2.40 a. Using MINITAB, the dot plot of the honey dosage data is:

[Figure: Dotplot of Honey Dosage Group. Horizontal axis: Improvement Score, from 4 to 16.]

b. Both 10 and 12 occurred 6 times in the honey dosage group.

c. From the graph in part c, 8 of the top 11 scores (72.7%) are from the honey dosage group. Of the top 30 scores, 18 (60%) are from the honey dosage group. This supports the conclusions of the researchers that honey may be a preferable treatment for the cough and sleep difficulty associated with childhood upper respiratory tract infection.

2.42 a. Using MINITAB, the stem-and-leaf display is:

Stem-and-Leaf Display: Spider

Stem-and-leaf of Spider  N = 10
Leaf Unit = 10

  1   0  0
  3   0  33
 (3)  0  455
  4   0  67
  2   0  9
  1   1  1

b. The spiders with a contrast value of 70 or higher are in bold type in the stem-and-leaf display in part a. There are 3 spiders in this group.

c. The sample proportion of spiders that a bird could detect is 3/10 = 0.3. Thus, we could infer that a bird could detect a crab-spider sitting on the yellow central part of a daisy about 30% of the time.

2.44 a. A stem-and-leaf display of the data using MINITAB is:

Stem-and-Leaf Display: FNE

Stem-and-leaf of FNE  N = 25
Leaf Unit = 1.0

  2   0  67
  3   0  8
  6   1  001
 10   1  3333
 12   1  45
 (2)  1  66
 11   1  8999
  7   2  0011
  3   2  3
  2   2  45

b. The numbers in bold in the stem-and-leaf display represent the bulimic students.
Those numbers tend to be the larger numbers. The larger numbers indicate a greater fear of negative evaluation. Thus, the bulimic students tend to have a greater fear of negative evaluation.

c. A measure of reliability indicates how certain one is that the conclusion drawn is correct. Without a measure of reliability, anyone could just guess at a conclusion.

2.46 a. Using MINITAB, histograms of the two sets of SAT scores are:

[Figure: Histogram of TOT2011, TOT2014. Panels: TOT2011, TOT2014. Vertical axis: Frequency. Horizontal axis: score, from 1400 to 1800.]

It appears that the distributions of both sets of scores are somewhat skewed to the right. Although the distributions are not identical for the two years, they are similar.

b. Using MINITAB, a histogram of the differences of the 2014 and 2011 SAT scores is:

[Figure: Histogram of Diff. Vertical axis: Frequency. Horizontal axis: Diff, from -200 to 50.]

c. It appears that there are more differences above 0 than below 0. Thus, it appears that in general, the 2014 SAT scores are higher than the 2011 SAT scores. However, there are many differences that are very close to 0.

d. Wyoming had the largest improvement in SAT scores from 2011 to 2014, with an increase of 65 points.

2.48 Using MINITAB, the side-by-side histograms are:

[Figure: Histogram of ZETA without, ZETA with GYPSUM. Panels: ZETA without, ZETA with GYPSUM. Vertical axis: Frequency. Horizontal axis: zeta potential, from -60 to -12.]

The addition of calcium/gypsum increases the values of the zeta potential of silica. All of the values of zeta potential for the specimens containing calcium/gypsum are greater than all of the values of zeta potential for the specimens without calcium/gypsum.

2.50 A measure of central tendency measures the "center" of the distribution while measures of variability measure how spread out the data are.
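The distinction in 2.50 can be illustrated with a minimal Python sketch. The two data sets below are hypothetical and chosen only for illustration (they do not come from the exercises), and the helper name `describe` is likewise an assumption. Both sets share the same center, but one is far more spread out.

```python
import statistics

# Hypothetical data sets for illustration only; they are not from the text.
narrow = [4, 5, 6]
wide = [1, 5, 9]

def describe(data):
    """Return (mean, median, sample standard deviation) of a list of numbers."""
    return (
        statistics.mean(data),    # central tendency
        statistics.median(data),  # central tendency
        statistics.stdev(data),   # variability (divides by n - 1)
    )

narrow_summary = describe(narrow)
wide_summary = describe(wide)

print("narrow:", narrow_summary)
print("wide:  ", wide_summary)
```

The mean and median agree for the two sets, so a measure of central tendency alone cannot tell them apart; only the standard deviation shows the difference in spread.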
2.52 A skewed distribution is a distribution that is not symmetric and not centered around the mean. One tail of the distribution is longer than the other. If the mean is greater than the median, then the distribution is skewed to the right. If the mean is less than the median, the distribution is skewed to the left.

2.54 a. For a distribution that is skewed to the left, the mean is less than the median.

b. For a distribution that is skewed to the right, the mean is greater than the median.

c. For a symmetric distribution, the mean and median are equal.

2.56 Assume the data are a sample. The mode is the observation that occurs most frequently. For this sample, the mode is 15, which occurs 3 times.

The sample mean is:

$\bar{x} = \frac{\sum x}{n} = \frac{18 + 10 + 15 + 13 + 17 + 15 + 12 + 15 + 18 + 16 + 11}{11} = \frac{160}{11} = 14.545$

The median is the middle number when the data are arranged in order. The data arranged in order are: 10, 11, 12, 13, 15, 15, 15, 16, 17, 18, 18. The middle number is the 6th number, which is 15.

2.58 The median is the middle number once the data have been arranged in order. If n is even, there is not a single middle number. Thus, to compute the median, we take the average of the middle two numbers. If n is odd, there is a single middle number. The median is this middle number.

A data set with 5 measurements arranged in order is 1, 3, 5, 6, 8. The median is the middle number, which is 5.

A data set with 6 measurements arranged in order is 1, 3, 5, 5, 6, 8. The median is the average of the middle two numbers, which is $\frac{5 + 5}{2} = \frac{10}{2} = 5$.

2.60 a. The mean is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1 + 2 + 3 + \cdots + 9}{13} = \frac{42}{13} = 3.23$. This is the average number of sword shafts buried at each grave site.

b. To find the median, the data must be arranged in order. The data arranged in order are:

1 1 1 2 2 2 2 3 4 4 5 6 9

There are a total of 13 observations, which is an odd number.
The median is the middle number, which is 2. Half of the grave sites had 2 or fewer sword shafts buried and half had 2 or more.

c. The mode is the number that occurs most frequently. In this case, the mode is 2.

2.62 a. The mean is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{54 + 42 + 51 + \cdots + 40}{8} = \frac{365}{8} = 45.625$. This is the average Performance Anxiety Inventory (PAI) scale value for participants in 8 different studies.

b. To find the median, the data must be arranged in order. The data arranged in order are:

39 40 41 42 43 51 54 55

There are a total of 8 observations, which is an even number. The median is the average of the middle 2 numbers, which are 42 and 43. The median is $\frac{42 + 43}{2} = \frac{85}{2} = 42.5$. Half of the studies had a PAI scale value less than 42.5 and half had a value greater than 42.5.

c. If 39 were eliminated, the mean becomes $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{54 + 42 + 51 + \cdots + 40}{7} = \frac{326}{7} = 46.571$. The data arranged in order are now: 40 41 42 43 51 54 55. The median is the middle number, which is 43. The mean increased by 0.946 while the median only increased by 0.5.

2.64 a. There are 35 observations in the honey dosage group. Thus, the median is the middle number, once the data have been arranged in order from the smallest to the largest. The middle number is the 18th observation, which is 11.

b. There are 33 observations in the DM dosage group. Thus, the median is the middle number, once the data have been arranged in order from the smallest to the largest. The middle number is the 17th observation, which is 9.

c. There are 37 observations in the control group. Thus, the median is the middle number, once the data have been arranged in order from the smallest to the largest. The middle number is the 19th observation, which is 7.

d.
Since the median of the honey dosage group is the highest, the median of the DM group is the next highest, and the median of the control group is the smallest, we can conclude that the honey dosage is the most effective, the DM dosage is the next most effective, and nothing (control) is the least effective.

2.66 a. The mean of the driving performance index values is:

$\bar{x} = \frac{\sum x}{n} = \frac{77.07}{40} = 1.927$

The median is the average of the middle two numbers once the data have been arranged in order. After arranging the numbers in order, the 20th and 21st numbers are 1.75 and 1.76. The median is:

$\frac{1.75 + 1.76}{2} = 1.755$

The mode is the number that occurs the most frequently and is 1.4.

b. The average driving performance index is 1.927. The median is 1.755. Half of the players have driving performance index values less than 1.755 and half have values greater than 1.755. Three of the players have the same index value of 1.4.

c. Since the mean is greater than the median, the data are skewed to the right. Using MINITAB, a histogram of the data is:

[Figure: Histogram of Performance. Vertical axis: Frequency. Horizontal axis: Performance, from 1.5 to 3.5.]

2.68 a. The mean number of ant species discovered is:

$\bar{x} = \frac{\sum x}{n} = \frac{3 + 3 + \cdots + 4}{11} = \frac{141}{11} = 12.82$

The median is the middle number once the data have been arranged in order: 3, 3, 4, 4, 4, 5, 5, 5, 7, 49, 52. The median is 5.

The mode is the value with the highest frequency. Since both 4 and 5 occur 3 times, both 4 and 5 are modes.

b. For this case, we would recommend the median as a better measure of central tendency than the mean. There are 2 very large numbers compared to the rest. The mean is greatly affected by these 2 numbers, while the median is not.

c.
The mean total plant cover percentage for the Dry Steppe region is:

$\bar{x} = \frac{\sum x}{n} = \frac{40 + 52 + \cdots + 27}{5} = \frac{202}{5} = 40.4$

The median is the middle number once the data have been arranged in order: 27, 40, 40, 43, 52. The median is 40.

The mode is the value with the highest frequency. Since 40 occurs 2 times, 40 is the mode.

d. The mean total plant cover percentage for the Gobi Desert region is:

$\bar{x} = \frac{\sum x}{n} = \frac{30 + 16 + \cdots + 14}{6} = \frac{168}{6} = 28$

The median is the mean of the middle 2 numbers once the data have been arranged in order: 14, 16, 22, 30, 30, 56. The median is $\frac{22 + 30}{2} = \frac{52}{2} = 26$.

The mode is the value with the highest frequency. Since 30 occurs 2 times, 30 is the mode.

e. Yes, the total plant cover percentage distributions appear to be different for the 2 regions. The percentage of plant coverage in the Dry Steppe region is much greater than that in the Gobi Desert region.

2.70 a. Using MINITAB, the simple statistics are:

Descriptive Statistics: ZETA without, ZETA with GYPSUM

Variable            N    Mean      Median    Mode    N for Mode
ZETA without        50   -52.070   -52.250   -50.2   3
ZETA with GYPSUM    50   -10.958   -11.300   -11.3   5

For the liquid solutions prepared without calcium/gypsum, the mean zeta potential measurement is -52.070, the median is -52.250 and the mode is -50.2. The average zeta potential measurement for liquid solutions prepared without calcium/gypsum is -52.070. Half of the zeta potential measurements for liquid solutions prepared without calcium/gypsum are less than or equal to -52.250 and half are greater than -52.250. The most common zeta potential measurement for liquid solutions prepared without calcium/gypsum is -50.2.

b. For the liquid solutions prepared with calcium/gypsum, the mean zeta potential measurement is -10.958, the median is -11.300 and the mode is -11.3. The average zeta potential measurement for liquid solutions prepared with calcium/gypsum is -10.958. Half of the zeta potential measurements for liquid solutions prepared with calcium/gypsum are less than or equal to -11.300 and half are greater than -11.300. The most common zeta potential measurement for liquid solutions prepared with calcium/gypsum is -11.3.

c. The interpretation remains the same as in Exercise 2.48. The addition of calcium/gypsum increases the values of the zeta potential of silica. The mean, median and mode of the values of zeta potential for the specimens containing calcium/gypsum are greater than the mean, median and mode of the values of zeta potential for the specimens without calcium/gypsum.

2.72 a. The mean number of power plants is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5 + 3 + 4 + \cdots + 3}{20} = \frac{79}{20} = 3.95$

The median is the mean of the middle 2 numbers once the data have been arranged in order: 1, 1, 1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 9, 11. The median is $\frac{3 + 4}{2} = \frac{7}{2} = 3.5$.

The number 1 occurs 5 times. The mode is 1.

b. Deleting the largest number, 11, the new mean is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5 + 3 + 4 + \cdots + 3}{19} = \frac{68}{19} = 3.58$

The median is the middle number once the data have been arranged in order: 1, 1, 1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 9. The median is 3.

The number 1 occurs 5 times. The mode is 1.

By dropping the largest measurement from the data set, the mean drops from 3.95 to 3.58. The median drops from 3.5 to 3 and the mode stays the same.

c. Deleting the lowest 2 and highest 2 measurements leaves the following:

1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7

The new mean is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1 + 1 + 1 + \cdots + 7}{16} = \frac{57}{16} = 3.56$

The trimmed mean has the advantage that some possible outliers have been eliminated.
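The trimming step in 2.72 c can be sketched in Python as follows. This is only an illustrative sketch: the function name `trimmed_mean` and its parameter `k` are assumptions introduced here, not MINITAB output. The full data set is rebuilt from the 16 values listed in part c plus the four deleted measurements (1, 1, 9 and 11).

```python
import statistics

def trimmed_mean(data, k):
    """Mean after deleting the k smallest and k largest measurements."""
    if k < 0 or 2 * k >= len(data):
        raise ValueError("k must leave at least one measurement")
    ordered = sorted(data)
    return statistics.mean(ordered[k:len(ordered) - k])

# The 16 values kept in part c, plus the lowest 2 and highest 2 measurements.
kept = [1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7]
plants = [1, 1] + kept + [9, 11]

full_mean = statistics.mean(plants)  # 79 / 20
trimmed = trimmed_mean(plants, 2)    # 57 / 16

print(full_mean, trimmed)
```

Sorting before slicing means the function does not depend on the order in which the measurements were recorded.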
2.74 The primary disadvantage of using the range to compare variability of data sets is that the two data sets can have the same range and be vastly different with respect to data variation. Also, the range is greatly affected by extreme measures.

2.76 The variance of a data set can never be negative. The variance of a sample is the sum of the squared deviations from the mean divided by n - 1. The square of any number, positive or negative, is never negative. Thus, the variance cannot be negative.

The variance is usually greater than the standard deviation. However, it is possible for the variance to be smaller than the standard deviation. If the data are between 0 and 1, the variance will be smaller than the standard deviation. For example, suppose the data set is 0.8, 0.7, 0.9, 0.5, and 0.3. The sample mean is:

$\bar{x} = \frac{\sum x}{n} = \frac{0.8 + 0.7 + 0.9 + 0.5 + 0.3}{5} = \frac{3.2}{5} = 0.64$

The sample variance is:

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{2.28 - \frac{3.2^2}{5}}{5 - 1} = \frac{2.28 - 2.048}{4} = 0.058$

The standard deviation is $s = \sqrt{0.058} = 0.241$.

2.78 a. $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{84 - \frac{20^2}{10}}{10 - 1} = 4.8889$, $s = \sqrt{4.8889} = 2.211$

b. $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{380 - \frac{100^2}{40}}{40 - 1} = 3.3333$, $s = \sqrt{3.3333} = 1.826$

c. $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{18 - \frac{17^2}{20}}{20 - 1} = 0.1868$, $s = \sqrt{0.1868} = 0.432$

2.80 a. Range = 4 - 0 = 4

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{22 - \frac{8^2}{5}}{5 - 1} = 2.3$, $s = \sqrt{2.3} = 1.52$

b. Range = 6 - 0 = 6

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{63 - \frac{17^2}{7}}{7 - 1} = 3.619$, $s = \sqrt{3.619} = 1.90$

c. Range = 8 - (-2) = 10

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{145 - \frac{27^2}{9}}{9 - 1} = 8$, $s = \sqrt{8} = 2.828$

d. Range = 2 - (-3) = 5

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{29 - \frac{(-5)^2}{18}}{18 - 1} = 1.624$, $s = \sqrt{1.624} = 1.274$

2.82 This is one possibility for the two data sets.

Data Set 1: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Data Set 2: 0, 0, 1, 1, 2, 2, 3, 3, 9, 9

The two sets of data above have the same range = largest measurement - smallest measurement = 9 - 0 = 9.

The means for the two data sets are:

$\bar{x}_1 = \frac{\sum x}{n} = \frac{0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9}{10} = \frac{45}{10} = 4.5$

$\bar{x}_2 = \frac{\sum x}{n} = \frac{0 + 0 + 1 + 1 + 2 + 2 + 3 + 3 + 9 + 9}{10} = \frac{30}{10} = 3$

The dot diagrams for the two data sets are shown below.

[Figure: Dotplot of Data by Group (1, 2), with the means x-bar1 and x-bar2 marked. Horizontal axis: Data, from 0 to 8.]

2.84 a. $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{226 - \frac{28^2}{5}}{5 - 1} = \frac{69.2}{4} = 17.3$, $s = \sqrt{17.3} = 4.1593$

b. $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{1213 - \frac{55^2}{4}}{4 - 1} = \frac{456.75}{3} = 152.25$ square feet, $s = \sqrt{152.25} = 12.339$ feet

c. $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{59 - \frac{(-15)^2}{6}}{6 - 1} = \frac{21.5}{5} = 4.3$, $s = \sqrt{4.3} = 2.0736$

d. $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{\frac{24}{25} - \frac{2^2}{6}}{6 - 1} = \frac{0.2933}{5} = 0.0587$ square ounces, $s = \sqrt{0.0587} = 0.2422$ ounce

2.86 a. The range is the difference between the largest and smallest numbers. For this data set, the range is 9 - 1 = 8.

b. The sample variance is $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{202 - \frac{42^2}{13}}{13 - 1} = 5.526$.

c. The sample standard deviation is $s = \sqrt{5.526} = 2.351$.

d. Both the range and the standard deviation have the same units of measure as the original variable.

2.88 a. The range is the difference between the largest and smallest observations and is 17.83 - 4.90 = 12.93 meters.

b.
The variance is:

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{1428.64 - \frac{126.32^2}{13}}{13 - 1} = 16.767$ square meters

c. The standard deviation is $s = \sqrt{16.767} = 4.095$ meters.

2.90 a. For Group A, the range is 67.20. From the printout, the range is 122.40 - 55.20 = 67.2.

b. For Group A, the standard deviation is 14.48. From the printout, $s = \sqrt{s^2} = \sqrt{209.53} = 14.48$.

c. Group B has the more variable permeability data. Group B has the largest range, the largest variance and the largest standard deviation.

2.92 a. Range = 11 - 1 = 10

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{455 - \frac{79^2}{20}}{20 - 1} = 7.524$, $s = \sqrt{s^2} = \sqrt{7.524} = 2.743$

b. Dropping the largest measurement:

Range = 9 - 1 = 8

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{334 - \frac{68^2}{19}}{19 - 1} = 5.035$, $s = \sqrt{s^2} = \sqrt{5.035} = 2.244$

By dropping the largest observation from the data set, the range decreased from 10 to 8, the variance decreased from 7.524 to 5.035 and the standard deviation decreased from 2.743 to 2.244.

c. Dropping the largest and smallest measurements:

Range = 9 - 1 = 8

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{333 - \frac{67^2}{18}}{18 - 1} = 4.918$, $s = \sqrt{s^2} = \sqrt{4.918} = 2.218$

By dropping the largest and smallest observations from the data set, the range decreased from 10 to 8, the variance decreased from 7.524 to 4.918 and the standard deviation decreased from 2.743 to 2.218.

2.94 The Empirical Rule applies only to data sets that are mound-shaped, that is, approximately symmetric, with a clustering of measurements about the midpoint of the distribution and tailing off as one moves away from the center of the distribution.

2.96 Since no information is given about the data set, we can only use Chebyshev's rule.

a. At least 0 of the measurements will fall between $\bar{x} - s$ and $\bar{x} + s$.

b. At least 3/4 or 75% of the measurements will fall between $\bar{x} - 2s$ and $\bar{x} + 2s$.

c. At least 8/9 or 89% of the measurements will fall between $\bar{x} - 3s$ and $\bar{x} + 3s$.

2.98 a. $\bar{x} = \frac{\sum x}{n} = \frac{206}{25} = 8.24$

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{1778 - \frac{206^2}{25}}{25 - 1} = 3.357$, $s = \sqrt{s^2} = 1.83$

b.
Interval                               Number of Measurements in Interval    Percentage
$\bar{x} \pm s$, or (6.41, 10.07)      18                                    18/25 = 0.72 or 72%
$\bar{x} \pm 2s$, or (4.58, 11.90)     24                                    24/25 = 0.96 or 96%
$\bar{x} \pm 3s$, or (2.75, 13.73)     25                                    25/25 = 1 or 100%

c. The percentages in part b are in agreement with Chebyshev's rule and agree fairly well with the percentages given by the Empirical Rule.

d. Range = 12 - 5 = 7

$s \approx \text{range}/4 = 7/4 = 1.75$

The range approximation provides a satisfactory estimate of s.

2.100 a. Using MINITAB, the histogram is:

[Figure: Histogram of Wheels. Vertical axis: Frequency. Horizontal axis: Wheels, from 1 to 8.]

Although the distribution is somewhat mound-shaped, the distribution is skewed to the right.

b. Using MINITAB, the mean and standard deviation are:

Descriptive Statistics: Wheels

Variable    N    Mean     StDev
Wheels      28   3.214    1.371

c. $\bar{x} \pm 2s \Rightarrow 3.214 \pm 2(1.371) \Rightarrow 3.214 \pm 2.742 \Rightarrow (0.472, 5.956)$

d. According to Chebyshev's rule, at least 3/4 of the measurements will fall within 2 standard deviations of the mean.

e. According to the Empirical Rule, approximately 95% of the measurements will fall within 2 standard deviations of the mean.

f. Twenty-six of the twenty-eight observations fall within the interval. The proportion is 26/28 = 0.929. The Empirical Rule does provide a good estimate of the proportion. The actual percentage is 92.9%, which is close to 95%.

2.102 a. If the data are symmetric and mound-shaped, then the Empirical Rule will describe the data. About 95% of the observations will fall within 2 standard deviations of the mean. The interval two standard deviations below and above the mean is $\bar{x} \pm 2s \Rightarrow 39 \pm 2(6) \Rightarrow 39 \pm 12 \Rightarrow (27, 51)$.
This range would be 27 to 51.

b. To find the number of standard deviations above the mean a score of 51 would be, we subtract the mean from 51 and divide by the standard deviation. Thus, a score of 51 is $\frac{51 - 39}{6} = 2$ standard deviations above the mean. From the Empirical Rule, about .025 of the drug dealers will have WR scores above 51.

c. By the Empirical Rule, about 99.7% of the observations will fall within 3 standard deviations of the mean. Thus, nearly all the scores will fall within 3 standard deviations of the mean. The interval three standard deviations below and above the mean is $\bar{x} \pm 3s \Rightarrow 39 \pm 3(6) \Rightarrow 39 \pm 18 \Rightarrow (21, 57)$. This range would be 21 to 57.

2.104 a. Because the histogram in Exercise 2.34 is skewed to the right, Chebyshev's rule is more appropriate for describing the distribution of the RDER values.

b. The interval $\bar{x} \pm 2s$ is $78.19 \pm 2(63.24) \Rightarrow 78.19 \pm 126.48 \Rightarrow (-48.29, 204.67)$. At least 3/4 or 75% of the observations will be between -48.29 and 204.67.

2.106 a. By Chebyshev's rule, at least 75% of the observations will fall within 2 standard deviations of the mean. This interval is $\bar{x} \pm 2s \Rightarrow 0.90 \pm 2(1.10) \Rightarrow 0.90 \pm 2.20 \Rightarrow (-1.30, 3.10)$.

b. No. A value of 7 would be (7 - 0.90)/1.10 = 5.55 standard deviations above the mean. This would be very unusual.

2.108 a. There are 2 observations with missing values for egg length, so there are only 130 useable observations.

$\bar{x} = \frac{\sum x}{n} = \frac{7,885}{130} = 60.65$

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{727,842 - \frac{(7,885)^2}{130}}{130 - 1} = \frac{249,586.4231}{129} = 1,934.7785$

$s = \sqrt{s^2} = \sqrt{1,934.7785} = 43.99$

b. The data are not symmetrical or mound-shaped. Thus, we will use Chebyshev's rule. We know that there are at least 8/9 or 88.9% of the observations within 3 standard deviations of the mean.
Thus, at least 88.9% of the observations will fall in the interval:

$\bar{x} \pm 3s \Rightarrow 60.65 \pm 3(43.99) \Rightarrow 60.65 \pm 131.97 \Rightarrow (-71.32, 192.62)$

Since it is impossible to have negative egg lengths, at least 88.9% of the egg lengths will be between 0 and 192.62.

2.110 a. The mean and standard deviation for Group A are 73.62 and 14.48. The histogram of the data from Group A is skewed to the right, so Chebyshev's rule is more appropriate. We know at least 8/9 or 88.9% of the observations will fall within 3 standard deviations of the mean. This interval is $\bar{x} \pm 3s \Rightarrow 73.62 \pm 3(14.48) \Rightarrow 73.62 \pm 43.44 \Rightarrow (30.18, 117.06)$.

b. The mean and standard deviation for Group B are 128.54 and 21.97. The histogram of the data from Group B is skewed to the left, so Chebyshev's rule is more appropriate. We know at least 8/9 or 88.9% of the observations will fall within 3 standard deviations of the mean. This interval is $\bar{x} \pm 3s \Rightarrow 128.54 \pm 3(21.97) \Rightarrow 128.54 \pm 65.91 \Rightarrow (62.63, 194.45)$.

c. The mean and standard deviation for Group C are 83.07 and 20.05. The histogram of the data from Group C is skewed to the right, so Chebyshev's rule is more appropriate. We know at least 8/9 or 88.9% of the observations will fall within 3 standard deviations of the mean. This interval is $\bar{x} \pm 3s \Rightarrow 83.07 \pm 3(20.05) \Rightarrow 83.07 \pm 60.15 \Rightarrow (22.92, 143.22)$.

d. It appears that weathering type B results in faster decay.

2.112 To decide which group the patient is most likely to come from, we will compute the z-score for each group.

Group T: $z = \frac{x - \mu}{\sigma} = \frac{22.5 - 10.5}{7.6} = 1.58$

Group V: $z = \frac{x - \mu}{\sigma} = \frac{22.5 - 3.9}{7.5} = 2.48$

Group C: $z = \frac{x - \mu}{\sigma} = \frac{22.5 - 1.4}{7.5} = 2.81$

The patient is most likely to have come from Group T. The z-score for Group T is $z = 1.58$. This would not be an unusual z-score if the patient was in Group T.
The z-scores for the other 2 groups are both greater than 2. We know that z-scores greater than 2 are rather unusual.

2.114 a. The 50th percentile is also called the median.

b. The $Q_L$ is the lower quartile. This is also the 25th percentile or the score which has 25% of the observations less than it.

c. The $Q_U$ is the upper quartile. This is also the 75th percentile or the score which has 75% of the observations less than it.

2.116 For mound-shaped distributions, we can use the Empirical Rule. About 95% of the observations will fall within 2 standard deviations of the mean. Thus, about 95% of the measurements will have z-scores between -2 and 2.

2.118 We first compute z-scores for each x value.

a. $z = \frac{x - \mu}{\sigma} = \frac{100 - 50}{25} = 2$

b. $z = \frac{x - \mu}{\sigma} = \frac{1 - 4}{1} = -3$

c. $z = \frac{x - \mu}{\sigma} = \frac{0 - 200}{100} = -2$

d. $z = \frac{x - \mu}{\sigma} = \frac{10 - 5}{3} = 1.67$

The above z-scores indicate that the x value in part a lies the greatest distance above the mean and the x value of part b lies the greatest distance below the mean.

2.120 The mean score is 153. This is the arithmetic average score of U.S. twelfth graders on the mathematics assessment test. The 25th percentile score is 111. This indicates that 25% of the U.S. twelfth graders scored 111 or lower on the assessment test. The 75th percentile score is 177. This indicates that 75% of the U.S. twelfth graders scored 177 or lower on the assessment test. The 90th percentile score is 197. This indicates that 90% of the U.S. twelfth graders scored 197 or lower on the assessment test.

2.122 a. From Exercise 2.35, the proportion of fup/fumic ratios that fall above 1 is 0.034. The percentile rank of 1 is $(1 - 0.034)100\% = 96.6$, the 96.6th percentile.

b. From Exercise 2.35, the proportion of fup/fumic ratios that fall below 0.4 is 0.695. The percentile rank of 0.4 is $(0.695)100\% = 69.5$, the 69.5th percentile.

2.124 a. The z-score associated with a score of 30 is $z = \frac{x - \bar{x}}{s} = \frac{30 - 39}{6} = -1.50$. This means that a score of 30 is 1.5 standard deviations below the mean.

b. The z-score associated with a score of 39 is $z = \frac{x - \bar{x}}{s} = \frac{39 - 39}{6} = 0$. Half or 0.5 of the observations are below a score of 39.

2.126 Since the 90th percentile of the study sample in the subdivision was 0.00372 mg/L, which is less than the USEPA level of 0.015 mg/L, the water customers in the subdivision are not at risk of drinking water with unhealthy lead levels.

2.128 a. If the distribution is mound-shaped and symmetric, then the Empirical Rule can be used. Approximately 68% of the scores will fall within 1 standard deviation of the mean, or 53% ± 15%, which is between 38% and 68%. Approximately 95% of the scores will fall within 2 standard deviations of the mean, or 53% ± 2(15%), which is between 23% and 83%. Approximately all of the scores will fall within 3 standard deviations of the mean, or 53% ± 3(15%), which is between 8% and 98%.

b. If the distribution is mound-shaped and symmetric, then the Empirical Rule can be used. Approximately 68% of the scores will fall within 1 standard deviation of the mean, or 39% ± 12%, which is between 27% and 51%. Approximately 95% of the scores will fall within 2 standard deviations of the mean, or 39% ± 2(12%), which is between 15% and 63%. Approximately all of the scores will fall within 3 standard deviations of the mean, or 39% ± 3(12%), which is between 3% and 75%.

c. Since the scores on the red exam are shifted to the left of those on the blue exam, a score of 20% is more likely to occur on the red exam than on the blue exam.

2.130 Yes. From the graph in Exercise 2.129 c, we can see that there are 4 observations with z-scores greater than 3. There is then a gap down to 2.18. Those 4 observations are quite different from the rest of the data. After those 4 observations, the data are fairly similar. We know that by ranking the data, we can reduce the influence of outliers.
But, by doing this, we lose valuable information.

2.132 The interquartile range is the distance between the upper and lower quartiles.

2.134 For a mound-shaped distribution, the Empirical Rule can be used. Almost all of the observations will fall within 3 standard deviations of the mean. Thus, almost all of the observations will have z-scores between -3 and 3.

2.136 The interquartile range is $IQR = Q_U - Q_L = 85 - 60 = 25$.

The lower inner fence $= Q_L - 1.5(IQR) = 60 - 1.5(25) = 22.5$.
The upper inner fence $= Q_U + 1.5(IQR) = 85 + 1.5(25) = 122.5$.
The lower outer fence $= Q_L - 3(IQR) = 60 - 3(25) = -15$.
The upper outer fence $= Q_U + 3(IQR) = 85 + 3(25) = 160$.

With only this information, the box plot would look something like the following:

[Figure: box plot]

The whiskers extend to the inner fences unless no data points are that small or that large. The upper inner fence is 122.5. However, the largest data point is 100, so the whisker stops at 100. The lower inner fence is 22.5. The smallest data point is 18, so the whisker extends to 22.5. Since 18 is between the inner and outer fences, it is designated with a *. We do not know if there is more than one data point below 22.5, so we cannot be sure that the box plot is entirely correct.

2.138 To determine if the measurements are outliers, compute the z-score.

a. $z = \frac{x - \bar{x}}{s} = \frac{65 - 57}{11} = .727$. Since this z-score is less than 3 in magnitude, 65 is not an outlier.

b. $z = \frac{x - \bar{x}}{s} = \frac{21 - 57}{11} = -3.273$. Since this z-score is more than 3 in magnitude, 21 is an outlier.

c. $z = \frac{x - \bar{x}}{s} = \frac{72 - 57}{11} = 1.364$. Since this z-score is less than 3 in magnitude, 72 is not an outlier.

d. $z = \frac{x - \bar{x}}{s} = \frac{98 - 57}{11} = 3.727$. Since this z-score is more than 3 in magnitude, 98 is an outlier.

2.140 a.
Using MINITAB, the box plot for the data is given below.

[Figure: box plot of Data]

b. In this data set, there is one outlier. It corresponds to the value 140.

2.142 a. The z-score is $z = \frac{x - \bar{x}}{s} = \frac{175 - 79}{23} = 4.17$.

b. Yes, we would consider this measurement an outlier. Any observation with a z-score that has an absolute value greater than 3 is considered a highly suspect outlier.

2.144 The z-score corresponding to 155 is $z = \frac{x - \bar{x}}{s} = \frac{155 - 67.755}{26.871} = 3.25$. Because this observation is more than 3 standard deviations from the mean, it is considered a highly suspect outlier. It would not be considered typical of the study sample.

2.146 a. The approximate 25th percentile PASI score before treatment is 10. The approximate median before treatment is 15. The approximate 75th percentile PASI score before treatment is 27.5.

b. The approximate 25th percentile PASI score after treatment is 3.5. The approximate median after treatment is 5. The approximate 75th percentile PASI score after treatment is 7.5.

c. Since the 75th percentile after treatment is lower than the 25th percentile before treatment, it appears that the ichthyotherapy is effective in treating psoriasis.

2.148 Using MINITAB, a boxplot of the data is:

[Figure: box plot of Rockfall]

From the boxplot, there is no indication that there are any outliers. We will now use the z-score criterion for determining outliers. From Exercises 2.61 and 2.88, $\bar{x} = 9.72$ and $s = 4.095$. The z-score associated with the minimum value is $z = \frac{x - \bar{x}}{s} = \frac{4.9 - 9.72}{4.095} = -1.18$ and the z-score associated with the maximum value is $z = \frac{x - \bar{x}}{s} = \frac{17.83 - 9.72}{4.095} = 1.98$. Neither of these indicates there are any outliers.

2.150 a.
Using MINITAB, the boxplots of the three groups are:

[Figure: box plots of Honey, DM, and Control]

b. The median improvement score for the honey dosage group is larger than the median improvement scores for the other two groups. The median improvement score for the DM dosage group is higher than the median improvement score for the control group.

c. Because the interquartile range for the DM dosage group is larger than the interquartile ranges of the other 2 groups, the variability of the DM group is largest. The variability of the honey dosage group and the control group appear to be about the same.

d. There appears to be one outlier in the honey dosage group and one outlier in the control group.

2.152 a. $z = \frac{x - \mu}{\sigma} = \frac{4 - 7}{1} = -3$

b. The z-score is low enough to suspect that the librarian's claim is incorrect. Even without any knowledge of the shape of the distribution, Chebyshev's rule states that at least 8/9 of the measurements will fall within 3 standard deviations of the mean (and, consequently, at most 1/9 will be above $z = 3$ or below $z = -3$).

c. The Empirical Rule states that almost none of the measurements should be above $z = 3$ or below $z = -3$. Hence, the librarian's claim is even more unlikely.

d. When $\sigma = 2$, $z = \frac{x - \mu}{\sigma} = \frac{4 - 7}{2} = -1.5$. This is not an unlikely occurrence, whether or not the data are mound-shaped. Hence, we would not have reason to doubt the librarian's claim.

2.154 A bivariate relationship is a relationship between 2 quantitative variables.

2.156 A positive association between two variables means that as one variable increases, the other variable tends to also increase. A negative association between two variables means that as one variable increases, the other variable tends to decrease.
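As a check on the z-score reasoning in Exercise 2.152, the two cases ($\sigma = 1$ and $\sigma = 2$) and the Chebyshev bound can be sketched in a few lines of Python. This is only an illustrative sketch; the function names `z_score` and `chebyshev_tail_bound` are hypothetical helpers introduced here, not part of the text or of MINITAB.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


def chebyshev_tail_bound(k):
    """Chebyshev's rule: at most 1/k^2 of the measurements lie more
    than k standard deviations from the mean (informative for k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is informative only for k > 1")
    return 1.0 / k**2


# Values from Exercise 2.152: observed x = 4, claimed mean = 7.
z_sigma_1 = z_score(4, 7, 1)  # part a, sigma = 1
z_sigma_2 = z_score(4, 7, 2)  # part d, sigma = 2

# Upper bound on the fraction of measurements at least as extreme as z = -3.
bound = chebyshev_tail_bound(abs(z_sigma_1))

print(z_sigma_1, z_sigma_2, round(bound, 3))
```

With $\sigma = 1$ the observation sits 3 standard deviations below the mean, where Chebyshev's rule allows at most 1/9 of the measurements; with $\sigma = 2$ it sits only 1.5 standard deviations below, which is unremarkable.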
2.158 Using MINITAB, the scatterplot is as follows:

[Figure: scatterplot of Variable 2 vs Variable 1]

It appears that as variable 1 increases, variable 2 also increases.

2.160 Using MINITAB, a scatterplot of the data is:

[Figure: scatterplot of SLUGPCT vs ELEVATION]

If one uses the one obvious outlier (Denver), then there does appear to be a trend in the data. As the elevation increases, the slugging percentage tends to increase. However, if the outlier is removed, then it does not look like there is a trend in the data.

2.162 a. A scattergram of the data is:

[Figure: scatterplot of Strikes vs Age]

b. There appears to be a trend. As the age increases, the number of strikes tends to decrease.

2.164 Using MINITAB, a scatterplot of the data is:

[Figure: scatterplot of Freq vs Resonance]

There is an increasing trend and there is very little variation in the plot. This supports the researcher's theory.

2.166 a. Using MINITAB, a graph of the Anthropogenic Index against the Natural Origin Index is:

[Figure: scatterplot of F-Anthro vs F-Natural]

This graph does not support the theory that there is a straight-line relationship between the Anthropogenic Index and the Natural Origin Index. There are several points that do not lie on a straight line.

b.
After deleting the three forests with the largest anthropogenic indices, the graph of the data is:

[Figure: scatterplot of F-Anthro vs F-Natural, three forests removed]

After deleting the 3 data points, the relationship between the Anthropogenic Index and the Natural Origin Index is much closer to a straight line.

2.168 Using MINITAB, a scattergram of the data is:

[Figure: scatterplot of Mass vs Time]

Yes, there appears to be a negative trend in this data. As time increases, the mass tends to decrease. There appears to be a curvilinear relationship. As time increases, mass decreases at a decreasing rate.

2.170 One way the bar graph can mislead the viewer is that the vertical axis has been cut off. Instead of starting at 0, the vertical axis starts at 12. Another way the bar graph can mislead the viewer is that as the bars get taller, the widths of the bars also increase.

2.172 a. This graph is misleading because it looks like as the days are increasing, the number of barrels collected per day is also increasing. However, the bars are the cumulative number of barrels collected. The cumulative value can never decrease.

b. Using MINITAB, the graph of the daily collection of oil is:

[Figure: bar chart of Barrells by Day, May-16 through May-23]

This graph shows that there has not been a steady improvement in the suctioning process. There was an increase for 3 days, then a leveling off for 3 days, then a decrease.

2.174 The range can be greatly affected by extreme measures, while the standard deviation is not as affected.

2.176 The z-score approach for detecting outliers is based on the distribution being fairly mound-shaped. If the data are not mound-shaped, then the box plot would be preferred over the z-score method for detecting outliers.
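The point made in Exercise 2.176 can be illustrated with a short Python sketch. The data below are hypothetical (they do not come from the text) and were chosen to be skewed to the right. The cutoff $|z| > 3$ and the $1.5(IQR)$ inner fences follow the rules used in the solutions above. Quartile conventions differ between software packages, so fences computed with `statistics.quantiles` may not match MINITAB's exactly.

```python
import statistics

# Hypothetical right-skewed sample (illustrative only, not from the text).
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)

# z-score method: flag observations more than 3 standard deviations from the mean.
z_outliers = [x for x in data if abs((x - mean) / sd) > 3]

# Box plot method: flag observations outside the inner fences.
q_l, _median, q_u = statistics.quantiles(data, n=4)
iqr = q_u - q_l
lower_fence = q_l - 1.5 * iqr
upper_fence = q_u + 1.5 * iqr
fence_outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("z-score outliers:", z_outliers)
print("box plot outliers:", fence_outliers)
```

In this sample the extreme value 20 inflates the standard deviation enough that its own z-score stays below 3, while the fences, which depend only on the quartiles, still flag it.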
2.178 One technique for distorting information on a graph is by stretching the vertical axis by starting the vertical axis somewhere above 0.

2.180 From part a of Exercise 2.179, the 3 z-scores are -1, 1 and 2. Since none of these z-scores are greater than 2 in absolute value, none of them are outliers.

From part b of Exercise 2.179, the 3 z-scores are -2, 2 and 4. There is only one z-score greater than 2 in absolute value. The score of 80 (associated with the z-score of 4) would be an outlier. Very few observations are as far away from the mean as 4 standard deviations.

From part c of Exercise 2.179, the 3 z-scores are 1, 3, and 4. Two of these z-scores are greater than 2 in absolute value. The scores associated with the two z-scores 3 and 4 (70 and 80) would be considered outliers.

From part d of Exercise 2.179, the 3 z-scores are .1, .3, and .4. Since none of these z-scores are greater than 2 in absolute value, none of them are outliers.

2.182 $\sigma \approx \text{range}/4 = 20/4 = 5$

2.184 a. $\sum x = 13 + 1 + 10 + 3 + 3 = 30$

$\sum x^2 = 13^2 + 1^2 + 10^2 + 3^2 + 3^2 = 288$

$\bar{x} = \frac{\sum x}{n} = \frac{30}{5} = 6$

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{288 - \frac{30^2}{5}}{5 - 1} = \frac{108}{4} = 27$

$s = \sqrt{27} = 5.196$

b. $\sum x = 13 + 6 + 6 + 0 = 25$

$\sum x^2 = 13^2 + 6^2 + 6^2 + 0^2 = 241$

$\bar{x} = \frac{\sum x}{n} = \frac{25}{4} = 6.25$

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{241 - \frac{25^2}{4}}{4 - 1} = \frac{84.75}{3} = 28.25$

$s = \sqrt{28.25} = 5.315$

c. $\sum x = 1 + 0 + 1 + 10 + 11 + 11 + 15 = 49$

$\sum x^2 = 1^2 + 0^2 + 1^2 + 10^2 + 11^2 + 11^2 + 15^2 = 569$

$\bar{x} = \frac{\sum x}{n} = \frac{49}{7} = 7$

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{569 - \frac{49^2}{7}}{7 - 1} = \frac{226}{6} = 37.667$

$s = \sqrt{37.667} = 6.137$

d. $\sum x = 3 + 3 + 3 + 3 = 12$

$\sum x^2 = 3^2 + 3^2 + 3^2 + 3^2 = 36$

$\bar{x} = \frac{\sum x}{n} = \frac{12}{4} = 3$

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{36 - \frac{12^2}{4}}{4 - 1} = \frac{0}{3} = 0$

$s = \sqrt{0} = 0$

e. a) $\bar{x} \pm 2s \Rightarrow 6 \pm 2(5.2) \Rightarrow 6 \pm 10.4 \Rightarrow (-4.4, 16.4)$. All or 100% of the observations are in this interval.

b) $\bar{x} \pm 2s \Rightarrow 6.25 \pm 2(5.32) \Rightarrow 6.25 \pm 10.64 \Rightarrow (-4.39, 16.89)$. All or 100% of the observations are in this interval.

c) $\bar{x} \pm 2s \Rightarrow 7 \pm 2(6.14) \Rightarrow 7 \pm 12.28 \Rightarrow (-5.28, 19.28)$. All or 100% of the observations are in this interval.

d) $\bar{x} \pm 2s \Rightarrow 3 \pm 2(0) \Rightarrow 3 \pm 0 \Rightarrow (3, 3)$. All or 100% of the observations are in this interval.

2.186 Suppose we construct a relative frequency bar chart for this data. This will allow the archaeologists to compare the different categories more easily. First, we must compute the relative frequencies for the categories. These are found by dividing the frequencies in each category by the total, 837. For the burnished category, the relative frequency is 133/837 = 0.159. The rest of the relative frequencies are found in a similar fashion and are listed in the table.

Pot Category              Number Found   Computation   Relative Frequency
Burnished                 133            133/837       0.159
Monochrome                460            460/837       0.550
Slipped                   55             55/837        0.066
Curvilinear Decoration    14             14/837        0.017
Geometric Decoration      165            165/837       0.197
Naturalistic Decoration   4              4/837         0.005
Cycladic White clay       4              4/837         0.005
Conical cup clay          2              2/837         0.002
Total                     837                          1.001

A relative frequency bar chart is:

[Figure: relative frequency bar chart of Pot Category]

The most frequently found type of pot was the Monochrome. Of all the pots found, 55% were Monochrome. The next most frequently found type of pot was the Painted in Geometric Decoration. Of all the pots found, 19.7% were of this type.
Very few pots of the types Painted in naturalistic decoration, Cycladic white clay, and Conical cup clay were found.

2.188 a. Using MINITAB, the stem-and-leaf display is as follows.

Stem-and-leaf of Books   N = 14
Leaf Unit = 1.0

  1   1  6
  5   2  0124
  6   2  8
 (3)  3  044
  5   3  9
  4   4  002
  1   4
  1   5  3

b. The leaves that correspond to students who earned an "A" grade are highlighted in the graph above. Those students who earned A's tended to read the most books.

c. The mean is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{53 + 42 + 40 + \cdots + 16}{14} = \frac{443}{14} = 31.643$. This is the average number of books read per student.

To find the median, the data must be arranged in order. In this problem, the data are already arranged in order. There are a total of 14 observations, which is an even number. The median is the average of the middle 2 numbers, which are 30 and 34. The median is $\frac{34 + 30}{2} = \frac{64}{2} = 32$. Half of the students read more than 32 books and half read fewer.

The mode is the observation appearing the most. In this data set, there are two modes, 34 and 40, because each appears 2 times in the data set. The most frequent number of books read is either 34 or 40.

d. Since the mean and the median are almost the same, the distribution of the data set is approximately symmetric. This can be verified by the stem-and-leaf display in part a.

e. For those students who earned A, the mean is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{53 + 42 + 40 + \cdots + 24}{8} = \frac{296}{8} = 37$.

The variance is $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{11{,}482 - \frac{296^2}{8}}{7} = \frac{530}{7} = 75.7143$.

The standard deviation is $s = \sqrt{s^2} = \sqrt{75.7143} = 8.701$.

f.
For those students who earned a B or C, the mean is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{40 + 28 + 22 + \cdots + 16}{6} = \frac{147}{6} = 24.5$.

The variance is $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{3{,}965 - \frac{147^2}{6}}{5} = \frac{363.5}{5} = 72.7$.

The standard deviation is $s = \sqrt{s^2} = \sqrt{72.7} = 8.526$.

g. The students who received A's have a more variable distribution of the number of books read. The variance and standard deviation for this group are greater than the corresponding values for the B-C group.

h. The z-score for a score of 40 books is $z = \frac{x - \bar{x}}{s} = \frac{40 - 37}{8.701} = 0.345$. Thus, someone who read 40 books read more than the average number of books, but that number is not very unusual.

i. The z-score for a score of 40 books is $z = \frac{x - \bar{x}}{s} = \frac{40 - 24.5}{8.526} = 1.82$. Thus, someone who read 40 books read many more than the average number of books. Very few students who received a B or a C read more than 40 books.

j. The group of students who earned A's is more likely to have read 40 books. For this group, the z-score corresponding to 40 books is 0.34. This is not unusual. For the B-C group, the z-score corresponding to 40 books is 1.82. This is close to 2 standard deviations from the mean. This would be fairly unusual.

2.190 A pie chart of the data is:

[Figure: pie chart of Drive Star. 2 stars: 4.1%; 3 stars: 17.3%; 4 stars: 60.2%; 5 stars: 18.4%]

More than half of the cars received 4-star ratings (60.2%). A little less than a quarter of the cars tested received ratings of 3 stars or less.

2.192 a.
Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Ratio
Variable   N    Mean    StDev   Minimum   Maximum
Ratio      26   3.507   0.634   2.250     5.060

The z-score associated with the largest ratio is $z = \frac{x - \bar{x}}{s} = \frac{5.06 - 3.507}{0.634} = 2.45$.

The z-score associated with the smallest ratio is $z = \frac{x - \bar{x}}{s} = \frac{2.25 - 3.507}{0.634} = -1.98$.

The z-score associated with the mean ratio is $z = \frac{x - \bar{x}}{s} = \frac{3.507 - 3.507}{0.634} = 0$.

b. Yes, I would consider the z-score associated with the largest ratio to be unusually large. We know if the data are approximately mound-shaped that approximately 95% of the observations will be within 2 standard deviations of the mean. A z-score of 2.45 would indicate that less than 2.5% of all the measurements will be larger than this value.

c. Using MINITAB, the box plot is:

[Figure: box plot of Ratio]

From this box plot, there are no observations marked as outliers.

2.194 a. Using MINITAB, a histogram of the data is:

[Figure: histogram of pH (vertical axis: Percent)]

From the graph, it looks like the proportion of wells with pH levels less than 7.0 is:

$0.005 + 0.01 + 0.02 + 0.015 + 0.027 + 0.031 + 0.05 + 0.07 + 0.017 + 0.05 = 0.295$

b. Using MINITAB, a histogram of the MTBE levels for those wells with detectable levels is:

[Figure: histogram of MTBE-Level for MTBE-Detect = Detect (vertical axis: Percent)]

From the graph, it looks like the proportion of wells with MTBE levels greater than 5 is:

$0.03 + 0.01 + 0.01 + 0.01 + 0.01 + 0.01 + 0.01 = 0.09$

c.
The sample mean is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{7.87 + 8.63 + 7.11 + \cdots + 6.33}{223} = \frac{1{,}656.16}{223} = 7.427$

The variance is:

$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{12{,}447.9812 - \frac{1{,}656.16^2}{223}}{223 - 1} = \frac{148.13391}{222} = 0.66727$

The standard deviation is: $s = \sqrt{s^2} = \sqrt{0.66727} = 0.8169$

$\bar{x} \pm 2s \Rightarrow 7.427 \pm 2(0.8169) \Rightarrow 7.427 \pm 1.6338 \Rightarrow (5.7932, 9.0608)$

From the histogram in part a, the data look approximately mound-shaped. From the Empirical Rule, we would expect about 95% of the wells to fall in this range. In fact, 212 of 223 or 95.1% of the wells have pH levels between 5.7932 and 9.0608.

d. The sample mean of the wells with detectable levels of MTBE is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{0.23 + 0.24 + 0.24 + \cdots + 48.10}{70} = \frac{240.86}{70} = 3.441$

The variance is:

$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{6112.266 - \frac{240.86^2}{70}}{70 - 1} = \frac{5283.5011}{69} = 76.5725$

The standard deviation is: $s = \sqrt{s^2} = \sqrt{76.5725} = 8.7506$

$\bar{x} \pm 2s \Rightarrow 3.441 \pm 2(8.7506) \Rightarrow 3.441 \pm 17.5012 \Rightarrow (-14.0602, 20.9422)$

From the histogram in part b, the data do not look mound-shaped. From Chebyshev's Rule, we would expect at least 3/4 or 75% of the wells to fall in this range. In fact, 67 of 70 or 95.7% of the wells have MTBE levels between -14.0602 and 20.9422.

2.196 a. Using MINITAB, the dot plot for the 9 measurements is:

[Figure: dot plot of Cesium]

b. Using MINITAB, the stem-and-leaf display is:

Stem-and-leaf of Cesium   N = 9
Leaf Unit = 0.10

  1   -6  0
  2   -5  5
  4   -5  00
 (3)  -4  865
  2   -4  11

c. Using MINITAB, the histogram is:

[Figure: histogram of Cesium (vertical axis: Frequency)]

d. The stem-and-leaf display appears to be more informative than the other graphs.
Since there are only 9 observations, the histogram and dot plot have very few observations per category.

e. There are 4 observations with radioactivity level of -5.00 or lower. The proportion of measurements with a radioactivity level of -5.0 or lower is 4/9 = 0.444.

2.198 Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Ammonia
Variable   N   Mean     StDev    Minimum   Q1       Median   Q3       Maximum
Ammonia    8   1.4713   0.0640   1.3700    1.4125   1.4900   1.5250   1.5500

The stem-and-leaf display for the data is:

Stem-and-leaf of Ammonia   N = 8
Leaf Unit = 0.010

  1   13  7
  3   14  12
  4   14  8
  4   15  013
  1   15  5

Since the data look fairly mound-shaped, we will use the Empirical Rule. We know that approximately 99.7% of all observations will fall within 3 standard deviations of the mean. For this data, the interval 3 standard deviations below the mean to 3 standard deviations above the mean is:

$\bar{x} \pm 3s \Rightarrow 1.471 \pm 3(0.064) \Rightarrow 1.471 \pm 0.192 \Rightarrow (1.279, 1.663)$

We would be fairly confident that the ammonia level of a randomly selected day will fall between 1.279 and 1.663 parts per million.

2.200 a. From the histogram, the data do not follow the true mound-shape very well. The intervals in the middle are much higher than they should be. In addition, there are some extremely large velocities and some extremely small velocities. Because the data do not follow a mound-shaped distribution, the Empirical Rule would not be appropriate.

b. Using Chebyshev's rule, at least $1 - 1/4^2 = 1 - 1/16 = 15/16$ or 93.8% of the velocities will fall within 4 standard deviations of the mean. This interval is:

$\bar{x} \pm 4s \Rightarrow 27{,}117 \pm 4(1{,}280) \Rightarrow 27{,}117 \pm 5{,}120 \Rightarrow (21{,}997, 32{,}237)$

At least 93.75% of the velocities will fall between 21,997 and 32,237 km per second.

c. Since the data look approximately symmetric, the mean would be a good estimate for the velocity of galaxy cluster A2142. Thus, this estimate would be 27,117 km per second.

2.202 If we assume that the distributions are symmetric and mound-shaped, then the Empirical Rule will describe the data. We will compute the mean plus or minus one, two and three standard deviations for both data sets:

Low income:
$\bar{x} \pm s \Rightarrow 7.62 \pm 8.91 \Rightarrow (-1.29, 16.53)$
$\bar{x} \pm 2s \Rightarrow 7.62 \pm 2(8.91) \Rightarrow 7.62 \pm 17.82 \Rightarrow (-10.20, 25.44)$
$\bar{x} \pm 3s \Rightarrow 7.62 \pm 3(8.91) \Rightarrow 7.62 \pm 26.73 \Rightarrow (-19.11, 34.35)$

Middle income:
$\bar{x} \pm s \Rightarrow 15.55 \pm 12.24 \Rightarrow (3.31, 27.79)$
$\bar{x} \pm 2s \Rightarrow 15.55 \pm 2(12.24) \Rightarrow 15.55 \pm 24.48 \Rightarrow (-8.93, 40.03)$
$\bar{x} \pm 3s \Rightarrow 15.55 \pm 3(12.24) \Rightarrow 15.55 \pm 36.72 \Rightarrow (-21.17, 52.27)$

The histogram for the low income group is as follows:

[Figure: relative frequency histogram of Complexity for the low income group, horizontal axis marked at the mean and at 1, 2, and 3 standard deviations on either side]

The histogram for the middle income group is as follows:

[Figure: relative frequency histogram of Complexity for the middle income group, horizontal axis marked at the mean and at 1, 2, and 3 standard deviations on either side]

The spread of the data for the middle income group is much larger than that of the low income group. The middle of the distribution for the middle income group is 15.55, while the middle of the distribution for the low income group is 7.62. Thus, the middle of the distribution for the middle income group is shifted to the right of that for the low income group.

We might be able to compare the means for the two groups. From the data provided, it looks like the mean score for the middle income group is greater than the mean score for the lower income group.

(Note: From looking at the data, it is rather evident that the distributions are not mound-shaped and symmetric. For the low income group, the standard deviation is larger than the mean. Since the smallest measurement allowed is 0, this indicates that the data set is not symmetric but skewed to the right.
A similar argument could be used to indicate that the data set of middle income scores is also skewed to the right.)

2.204 a. For site A, there is no real pattern to the data that would indicate that the data are skewed. For site G, most of the data are concentrated from 250 and up. There are relatively few observations less than 250. This indicates that the data are skewed to the left.

b. For site A, there are 2 modes (two distance intervals with the largest number of observations). Since there is more than one mode, this would indicate that the data are probably from hearths inside dwellings. For site G, there is only one mode. This would indicate that the data are probably from open air hearths.

2.206 The relative frequency for each cell is found by dividing the frequency by the total sample size, $n = 743$. The relative frequency for the digit 1 is 109/743 = 0.147. The rest of the relative frequencies are found in the same manner and are shown in the table.

First Digit   Frequency   Relative Frequency
1             109         0.147
2             75          0.101
3             77          0.104
4             99          0.133
5             72          0.097
6             117         0.157
7             89          0.120
8             62          0.083
9             43          0.058
Total         743         1.000

Using MINITAB, the relative frequency bar chart is:

[Figure: relative frequency bar chart of FirstDigit]

Benford's Law indicates that certain digits are more likely to occur as the first significant digit in a randomly selected number than other digits. The law also predicts that the number "1" is the most likely to occur as the first digit (30% of the time). From the relative frequency bar chart, one might be able to argue that the digits do not occur with the same frequency (the relative frequencies appear to be slightly different). However, the bar chart does not support the claim that the digit "1" occurs as the first digit about 30% of the time.
In this sample, the number “1” only occurs 14.7% of the time, which is less than half the expected 30% using Benford’s Law.

2.208 If the distributions of the standardized tests are approximately mound-shaped, then it would be impossible for 90% of the school districts’ students to score above the mean. If the distributions are mound-shaped, then the mean and median are approximately the same. By definition, only 50% of the students would score above the median. If the distributions are not mound-shaped, but skewed to the left, it would be possible for more than 50% of the students to score above the mean. However, it would be almost impossible for 90% of the students to score above the mean.

2.210 For the first professor, we would assume that most of the grade-points will fall within 3 standard deviations of the mean. This interval would be:

x̄ ± 3s ⇒ 3.0 ± 3(.2) ⇒ 3.0 ± .6 ⇒ (2.4, 3.6)

Thus, if you had the first professor, you would be pretty sure that your grade-point would be between 2.4 and 3.6.

For the second professor, we would again assume that most of the grade-points will fall within 3 standard deviations of the mean. This interval would be:

x̄ ± 3s ⇒ 3.0 ± 3(1) ⇒ 3.0 ± 3.0 ⇒ (0.0, 6.0)

Thus, if you had the second professor, you would be pretty sure that your grade-point would be between 0.0 and 6.0. If we assume that the highest grade-point one could receive is 4.0, then this interval would be (0.0, 4.0). We have gained no information by using this interval, since we know that all grade-points are between 0.0 and 4.0. However, since the standard deviation is so large compared to the mean, we could infer that the distribution of grade-points in this class is not symmetric, but skewed to the left. There are many high grades, but there are several very low grades.
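The interval arithmetic above can be checked with a minimal sketch. The function names `k_sigma_interval` and `clip_to_scale` are hypothetical helpers introduced only for illustration, and the 0.0 to 4.0 grade scale is the same assumption the solution makes.

```python
def k_sigma_interval(mean, sd, k=3):
    """Return (mean - k*sd, mean + k*sd), rounded to two decimals."""
    half_width = k * sd
    return (round(mean - half_width, 2), round(mean + half_width, 2))


def clip_to_scale(interval, lowest=0.0, highest=4.0):
    """Trim an interval to the attainable grade-point range (assumed 0.0 to 4.0)."""
    low, high = interval
    return (max(low, lowest), min(high, highest))


first = k_sigma_interval(3.0, 0.2)      # first professor: mean 3.0, s = .2
second = k_sigma_interval(3.0, 1.0)     # second professor: mean 3.0, s = 1
second_clipped = clip_to_scale(second)  # restrict to grades that can occur

print(first, second, second_clipped)
```

The same helper applies to any mean and standard deviation; for instance, `k_sigma_interval(7.62, 8.91, 3)` gives the three-standard-deviation interval for the low income group computed earlier.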
By taking the first professor, you can be almost certain that you will get a final grade of at least 2.4, but you have almost no chance of getting a final grade of 4. By taking the second professor, you know the grades are skewed to the left and that many of the students will get high grades, but also a few will get very low grades.

2.212 The answers to this will vary. Some things that should be included in the discussion are:

From the graph, it is obvious that the amount of money spent on education has increased tremendously over the period from 1966 to 2000 (from about $4.5 billion in 1966 to about $22.5 billion in 2000). However, one should note that the number of students has also increased. It might be better to report the amount of money spent per student over the years from 1966 to 2000 rather than the total amount spent.

In the description of the exercise, it says that the horizontal line represents the annual average fourth-grade children’s reading ability. It also indicates that the fourth-grade reading test scores are designed to have an average of 250 with a standard deviation of 50. Thus, regardless of whether the children’s reading abilities increase or decrease, the annual average will always be 250. This line does not give any information about whether the children’s reading abilities are improving or not. In addition, if the reading scores of seventh and twelfth graders and the mathematics scores of fourth graders improved over the same time period, one could conclude that the reading scores of the fourth graders also improved over the same time period.

Thus, this graph does not support the government’s position that our children are not making classroom improvements despite federal spending on education. This graph only portrays that the total amount of money spent on education over the time period from 1966 to 2000 increased.
