Section 2-1: Frequency Distributions for Organizing and Summarizing Data
Chapter 2: Exploring Data with Tables and Graphs
1. The table summarizes measurements from 40 subjects. It is not possible to identify the exact values of all of the
original cotinine measurements.
2. The classes of 0–100, 100–200, …, 400–500 overlap, so it is not always clear which class we should put a value
in. For example, the value of 100 could go in the first class or the second class. The classes should be mutually
exclusive.
3.
Cotinine (ng/mL)    Relative Frequency
0–99                27.5%
100–199             30.0%
200–299             35.0%
300–399             2.5%
400–499             5.0%
4. The sum of the relative frequencies is 125%, but it should be 100%, apart from a small round-off error. All of the
relative frequencies appear to be roughly the same, but if they are from a normal distribution, they should start
low, reach a maximum, and then decrease.
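The sum check in Exercise 4 can be sketched in Python. The counts below are an assumption: they are back-computed from the 40 subjects in Exercise 1 and the percentages in Exercise 3 (for example, 27.5% of 40 is 11), and the function name is only illustrative.

```python
def relative_frequencies(frequencies):
    """Return each class frequency as a percentage of the total."""
    total = sum(frequencies)
    return [100 * f / total for f in frequencies]

# Assumed counts, back-computed from 40 subjects and the Exercise 3 percentages.
cotinine_frequencies = [11, 12, 14, 1, 2]

percentages = relative_frequencies(cotinine_frequencies)
print(percentages)       # one percentage per class
print(sum(percentages))  # should be 100, apart from round-off
```

A total far from 100%, such as the 125% in Exercise 4, signals an error in the table rather than round-off.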
5.
Class width: 100
Class midpoints: 49.5, 149.5, 249.5, 349.5, 449.5, 549.5
Class boundaries: −0.5, 99.5, 199.5, 299.5, 399.5, 499.5, 599.5
6.
Class width: 90
Class midpoints: 1004.5, 1094.5, 1184.5, 1274.5, 1364.5, 1454.5
Class boundaries: 959.5, 1049.5, 1139.5, 1229.5, 1319.5, 1409.5, 1499.5
7.
Class width: 100
Class midpoints: 49.5, 149.5, 249.5, 349.5, 449.5, 549.5, 649.5
Class boundaries: −0.5, 99.5, 199.5, 299.5, 399.5, 499.5, 599.5, 699.5
8.
Class width: 100
Class midpoints: 149.5, 249.5, 349.5, 449.5, 549.5
Class boundaries: 99.5, 199.5, 299.5, 399.5, 499.5, 599.5
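The quantities in Exercises 5 through 8 follow mechanically from the class limits. The sketch below assumes the Exercise 8 limits are 100–199 through 500–599 (inferred from the midpoints and boundaries listed above, not stated in this manual); the function name is hypothetical.

```python
def class_summary(limits):
    """Summarize classes given as (lower, upper) limits in increasing order."""
    # Class width: difference between consecutive lower class limits.
    width = limits[1][0] - limits[0][0]
    # Class midpoint: average of the lower and upper limits of a class.
    midpoints = [(lo + hi) / 2 for lo, hi in limits]
    # Class boundaries split the gap between one upper limit and the next lower limit.
    gap = limits[1][0] - limits[0][1]
    boundaries = [limits[0][0] - gap / 2] + [hi + gap / 2 for _, hi in limits]
    return width, midpoints, boundaries

# Assumed class limits for Exercise 8.
limits = [(100, 199), (200, 299), (300, 399), (400, 499), (500, 599)]
width, midpoints, boundaries = class_summary(limits)
print(width)
print(midpoints)
print(boundaries)
```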
9. No. The maximum frequency is in the first class instead of being near the middle, so the frequencies below the
maximum do not mirror those above the maximum.
10. No. The frequencies start high and then decrease, so the frequencies below the maximum do not mirror those
above the maximum.
11. By symmetry, the last three frequencies would be 18, 12, and 2. The middle frequency would be
153 − 2(18) − 2(12) − 2(2) = 89.
12. No. The maximum frequency is in the second class instead of being near the middle, and the first two
frequencies are much greater than the last two frequencies, so the frequencies below the maximum do not mirror
those above the maximum.
13. The pulse rates appear to have a distribution that is approximately normal.
Pulse Rate (Male)    Frequency
40–49                2
50–59                23
60–69                53
70–79                43
80–89                25
90–99                5
100–109              2
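A frequency distribution like the one in Exercise 13 can be tallied from raw values as sketched below. The pulse rates are hypothetical (the original data set is not reproduced in this manual), the function name is illustrative, and the sketch assumes integer data at or above the first lower class limit.

```python
from collections import Counter

def frequency_distribution(values, first_lower, width):
    """Tally integer values into classes of equal width."""
    # Index of the class each value falls in (0 for the first class).
    counts = Counter((v - first_lower) // width for v in values)
    n_classes = max(counts) + 1
    table = []
    for i in range(n_classes):
        lower = first_lower + i * width
        upper = lower + width - 1  # upper class limit for integer data
        table.append(((lower, upper), counts.get(i, 0)))
    return table

# Hypothetical pulse rates, for illustration only.
pulse_rates = [44, 52, 58, 61, 64, 66, 68, 72, 75, 83]
table = frequency_distribution(pulse_rates, first_lower=40, width=10)
for (lower, upper), frequency in table:
    print(f"{lower}-{upper}: {frequency}")
```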
14. The pulse rates appear to have a distribution that is approximately normal.
Pulse Rate (Female)    Frequency
30–39                  1
40–49                  1
50–59                  17
60–69                  33
70–79                  41
80–89                  37
90–99                  13
100–109                4
15. The verbal IQ scores appear to have a distribution that is approximately normal.
IQ (Verbal)    Frequency
50–59          3
60–69          8
70–79          13
80–89          26
90–99          18
100–109        6
110–119        2
120–129        2
16. The verbal IQ scores appear to have a distribution that is approximately normal.
IQ (Verbal)    Frequency
60–69          1
70–79          7
80–89          8
90–99          4
100–109        1
17. Yes. The distribution appears to be approximately normal.
Red Blood Cell Count (Males)    Frequency
3.00–3.49                       1
3.50–3.99                       16
4.00–4.49                       29
4.50–4.99                       57
5.00–5.49                       44
5.50–5.99                       6
18. Yes. The distribution appears to be approximately normal.
Red Blood Cell Count (Females)    Frequency
3.00–3.49                         1
3.50–3.99                         20
4.00–4.49                         85
4.50–4.99                         33
5.00–5.49                         7
5.50–5.99                         0
6.00–6.49                         1
19.
Weight (kg) in September    Frequency
50–59                       2
60–69                       12
70–79                       11
80–89                       3
90–99                       4
20.
Weight (kg) in April    Frequency
40–49                   2
50–59                   20
60–69                   28
70–79                   8
80–89                   7
90–99                   1
100–109                 1
21. The frequency distribution suggests that the reported heights were rounded with disproportionately many 0s and
5s. This suggests that the results are not very accurate.
Last Digit    Frequency
0             9
1             2
2             1
3             3
4             1
5             15
6             2
7             0
8             3
9             1
22. The frequency distribution suggests that the reported weights were not rounded since the last digits seem
equally distributed.
Last Digit    Frequency
0             4
1             3
2             6
3             4
4             4
5             5
6             7
7             7
8             6
9             4
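The last-digit tallies used in Exercises 21 and 22 can be produced from raw measurements as sketched below. The reported heights are hypothetical whole-number values chosen for illustration, and the function name is an assumption.

```python
from collections import Counter

def last_digit_frequencies(values):
    """Return a list of ten counts, one per last digit 0 through 9."""
    counts = Counter(abs(v) % 10 for v in values)
    return [counts.get(digit, 0) for digit in range(10)]

# Hypothetical reported heights (cm), for illustration only.
reported_heights = [170, 165, 180, 172, 175, 160, 185, 168, 170, 175]
digit_counts = last_digit_frequencies(reported_heights)
for digit, frequency in enumerate(digit_counts):
    print(digit, frequency)
```

A pile-up at 0 and 5, as in Exercise 21, suggests reported or rounded values; roughly even counts, as in Exercise 22, are what measured values tend to produce.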
23. The two distributions differ substantially. The presence of cotinine appears to be much higher for smokers than
for nonsmokers exposed to smoke.
Cotinine (ng/mL)    Smokers    Nonsmokers Exposed to Smoke
0–99                27.5%      85.0%
100–199             30.0%      5.0%
200–299             35.0%      2.5%
300–399             2.5%       2.5%
400–499             5.0%       0%
500–599             0%         5.0%
24. The two distributions show moderate difference. It appears females have slightly higher pulse rates.
Pulse Rate    Males    Females
0–99          0.7%     0%
100–199       33.3%    16.7%
200–299       58.8%    61.3%
300–399       6.5%     18.7%
400–499       0%       0%
500–599       0%       1.3%
600–699       0.7%     2%
25.
Cotinine (Nonsmokers Exposed to Smoke in ng/mL)    Cumulative Frequency
Less than 100                                      34
Less than 200                                      36
Less than 300                                      37
Less than 400                                      38
Less than 500                                      38
Less than 600                                      40
26.
Brain Volume (cm³)    Cumulative Frequency
Less than 1049        6
Less than 1139        13
Less than 1229        16
Less than 1319        18
Less than 1409        19
Less than 1499        20
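A cumulative frequency distribution is a running total of the class frequencies. The sketch below uses class frequencies of 34, 2, 1, 1, 0, and 2 for the nonsmokers, which are the successive differences of the cumulative values in Exercise 25.

```python
from itertools import accumulate

# Class frequencies for 0-99, 100-199, ..., 500-599 (differences of the Exercise 25 values).
frequencies = [34, 2, 1, 1, 0, 2]
upper_cutoffs = [100, 200, 300, 400, 500, 600]

# Each cumulative frequency is the sum of that class and all lower classes.
cumulative = list(accumulate(frequencies))
for cutoff, total in zip(upper_cutoffs, cumulative):
    print(f"Less than {cutoff}: {total}")
```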
27. a. The values of 551 and 543 are clearly outliers; the values of 384, 241, 197, and 178 could also be outliers.
b. The number of classes increases from six to ten. The outlier can greatly increase the number of classes. If
there are too many classes, we might use a larger class width with the effect that the true nature of the
distribution may be hidden.
Cotinine (Nonsmokers Exposed to Smoke in ng/mL)    Frequency
0–99                                               34
100–199                                            2
200–299                                            1
300–399                                            1
400–499                                            0
500–599                                            2
600–699                                            0
700–799                                            0
800–899                                            0
900–999                                            1
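The jump from six to ten classes in part (b) follows directly from the largest value. In the sketch below, 551 is the largest value named in part (a), while 950 is a hypothetical stand-in for an outlier in the 900–999 class, since the actual value is not given here.

```python
def number_of_classes(max_value, first_lower=0, width=100):
    """Number of equal-width classes needed to reach max_value."""
    return (max_value - first_lower) // width + 1

classes_without_outlier = number_of_classes(551)
classes_with_outlier = number_of_classes(950)  # 950 is a hypothetical outlier value
print(classes_without_outlier, classes_with_outlier)
```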
Section 2-2: Histograms
1. It is easier to see the distribution of the data by examining the graph of the histogram than by examining the
numbers in a frequency distribution.
2. Not necessarily. Because the sample subjects themselves chose to be included, the voluntary response sample
might not be representative of the population.
3. With a data set that is so small, the true nature of the distribution cannot be seen with a histogram.
4. No, the term “normal distribution” has a different meaning than the term “normal” that is used in ordinary
speech. A normal distribution will have a histogram that is approximately bell shaped. Determining whether a
histogram fits the bell shape would be subjective.
5. approximately 50
6. Class width: 0.5 mm, lower limit of first class: 2.0 mm, upper limit of first class: 2.4 mm
7. The largest possible value would be approximately 4.5 mm, which would not be an outlier.
8. The sample does not appear to be from a normal distribution, since it is not symmetric about the middle class.
9. The pulse rates of males appear to have a distribution that is approximately normal.
[Histogram for Exercise 9: Frequency vs. Pulse Rate (Male)]
10. The pulse rates of females appear to have a distribution that is approximately normal.
[Histogram for Exercise 10: Frequency vs. Pulse Rate (Female)]
11. The IQ scores appear to have a distribution that is approximately normal.
[Histogram for Exercise 11: Frequency vs. Verbal IQ (Low Lead)]
12. The IQ scores appear to have a distribution that is approximately normal.
[Histogram for Exercise 12: Frequency vs. Verbal IQ (High Lead)]
13. Yes. The red blood cell counts appear to have a distribution that is very approximately normal, although some
might describe the distribution as being left-skewed instead of normal.
[Histogram for Exercise 13: Frequency vs. Red Blood Cell Count (Male)]
14. Yes. The red blood cell counts appear to have a distribution that is very approximately normal.
[Histogram for Exercise 14: Frequency vs. Red Blood Cell Count (Female)]
15. The histogram is shown below.
[Histogram for Exercise 15: Frequency vs. September Weight (kg) of Males]
16. The histogram is shown below.
[Histogram for Exercise 16: Frequency vs. April Weight (kg) of Males]
17. The histogram suggests that the reported heights were rounded with disproportionately many 0s and 5s. This
suggests that the results are not very accurate.
[Histogram for Exercise 17: Frequency vs. Last Digit]
18. The histogram suggests that the reported weights were not rounded since the last digits seem equally
distributed.
[Histogram for Exercise 18: Frequency vs. Last Digit]
19. Only part (c) appears to represent data from a normal distribution. Part (a) has a systematic pattern that is not
that of a straight line, part (b) has points that are not close to a straight-line pattern, and part (d) is really bad
because it shows a systematic pattern and points that are not close to a straight-line pattern.
Section 2-3: Graphs That Enlighten and Graphs That Deceive
1. The data set is too small for a graph to reveal important characteristics of the data. With such a small data set, it
would be better to simply list the data or place them in a table.
2. No. If the sample is a bad sample, such as one obtained from voluntary responses, there are no graphs or other
techniques that can be used to salvage the data.
3. No. Graphs should be constructed in a way that is fair and objective. The readers should be allowed to make
their own judgments, instead of being manipulated by misleading graphs.
4. Center, variation, distribution, outliers, change in the characteristics of data over time. The time-series graph
does the best job of giving us insight into the change in the characteristics of data over time.
5. The pulse rate of 36 beats per minute appears to be an outlier.
6. There do not appear to be any outliers.
7. The data are arranged in order from lowest to highest, as 36, 56, 56, and so on.
8. The two values closest to the middle are 72 mm Hg and 74 mm Hg.
9. There was a steep jump in the first four years, but the numbers of triplets have shown a downward trend in the
past several years.
[Graph for Exercise 9: Number of Triplets vs. Year]
10. The number of fatalities is decreasing, most likely due to greater enforcement of DUI laws and greater public
awareness campaigns.
[Graph for Exercise 10: Fatalities vs. Year]
11. Misconduct includes fraud, duplication, and plagiarism, so it does appear to be a major factor.
[Graph for Exercise 11: Number of Retractions by category (Fraud, Error, Duplication, Other, Plagiarism)]
12. The overwhelming response was that thank-you notes should be sent to everyone who is met during a job
interview. Given what is at stake, that seems like a wise strategy.
[Graph for Exercise 12: Number by response (Everyone, Most Senior, Most Time, Best Conversation, Don't Send)]
13.
14.
15. The distribution appears to be roughly bell-shaped, so the distribution is approximately normal.
[Graph for Exercise 15: Frequency vs. Pulse Rate (Male)]
16. The distribution appears to be roughly bell-shaped, so the distribution is approximately normal.
[Graph for Exercise 16: Frequency vs. Pulse Rate (Female)]
17. Because the vertical scale starts with a frequency of 200 instead of 0, the difference between the “no” and “yes”
responses is greatly exaggerated. The graph makes it appear that about five times as many respondents said
“no,” when the ratio is actually a little less than 2.5 to 1.
18. The two costs are one-dimensional in nature, but the baby bottles are three-dimensional objects. The $4500 cost
isn't even twice the $2600 cost, but the baby bottles make it appear that the larger cost is about five times the
smaller cost.
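The "about five times" impression in Exercise 18 can be reproduced with a short calculation. This is a sketch under one assumption not stated in the answer: that each linear dimension of the larger bottle is drawn in proportion to the cost, so the apparent volume grows with the cube of the cost ratio.

```python
larger_cost = 4500
smaller_cost = 2600

# Actual ratio of the two one-dimensional quantities.
cost_ratio = larger_cost / smaller_cost

# Assumed: height, width, and depth are all scaled by the cost ratio,
# so the apparent volume scales by the cube of that ratio.
volume_ratio = cost_ratio ** 3

print(round(cost_ratio, 2))    # less than 2
print(round(volume_ratio, 2))  # roughly 5
```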
Section 2-4: Scatterplots, Correlation, and Regression
1. The term linear refers to a straight line, and r measures how well a scatterplot of the sample paired data fits a
straight-line pattern.
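As a minimal sketch of how r measures straight-line fit, the function below computes the linear correlation coefficient from paired data using the standard sums of squared deviations. The data values are hypothetical and the function name is illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired data lying exactly on straight lines.
xs = [1, 2, 3, 4, 5]
rising = [3, 5, 7, 9, 11]    # y = 2x + 1
falling = [11, 9, 7, 5, 3]   # y = -2x + 13

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
print(r_rising, r_falling)
```

Points exactly on a rising line give r close to 1, and points exactly on a falling line give r close to −1; scattered points give values nearer 0.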
2. No. Finding the presence of a statistical correlation between two variables does not justify any conclusion that
one of the variables is a cause of the other.
3. A scatterplot is a graph of paired (x, y) quantitative data. It helps us by providing a visual image of the data
plotted as points, and such an image is helpful in enabling us to see patterns in the data and to recognize that there
may be a correlation between the two variables.
4.
a. 1
b. 0
c. 0
d. −1
5. There does not appear to be a linear correlation between brain volume and IQ score.
[Scatterplot for Exercise 5: IQ vs. Volume]
6. There does appear to be a linear correlation between chest sizes and weights of bears.
[Scatterplot for Exercise 6: Weight (lb) vs. Chest (in.)]
7. There does not appear to be a linear correlation between body temperature at 8 AM on one day and at 8 AM on
the following day.
[Scatterplot for Exercise 7: Day 2 vs. Day 1]
8. There does not appear to be a linear correlation between heights of fathers and the heights of their first sons.
[Scatterplot for Exercise 8: Height of first son (in.) vs. Height of father (in.)]
9. With n = 5 pairs of data, the critical values are ±0.878. Because r = 0.127 is between −0.878 and 0.878,
evidence is not sufficient to conclude that there is a linear correlation.
10. With n = 7 pairs of data, the critical values are ±0.754. Because r = 0.980 is in the right tail region beyond
0.754, there are sufficient data to conclude that there is a linear correlation.
11. With n = 7 pairs of data, the critical values are ±0.754. Because r = 0.502 is between −0.754 and 0.754,
evidence is not sufficient to conclude that there is a linear correlation.
12. With n = 10 pairs of data, the critical values are ±0.632. Because r = −0.017 is between −0.632 and 0.632,
evidence is not sufficient to conclude that there is a linear correlation.
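The decision rule used in Exercises 9 through 12 can be sketched as a single comparison: r supports a linear correlation only when it falls beyond the critical values. The r values and critical values below are the ones quoted in those answers; the function name is illustrative.

```python
def supports_linear_correlation(r, critical_value):
    """True when r lies in either tail beyond the critical values."""
    return abs(r) > critical_value

# exercise number: (r, critical value), as quoted in Exercises 9-12
exercises = {
    9: (0.127, 0.878),
    10: (0.980, 0.754),
    11: (0.502, 0.754),
    12: (-0.017, 0.632),
}

decisions = {
    number: supports_linear_correlation(r, critical)
    for number, (r, critical) in exercises.items()
}
print(decisions)
```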
Chapter Quick Quiz
1. The class width is 0.12 − 0.08 = 0.04.
2. The class boundaries are 0.075 and 0.115.
3. No, it is impossible to determine the original values.
4. 16, 17, 18, 18, 19
5. bell-shaped
6. variation
7. time-series graph
8. scatterplot
9. Pareto chart
10. A frequency distribution is in the format of a table, but a histogram is a graph.
Review Exercises
1.
Temperature (°F)    Frequency
97.0–97.4           2
97.5–97.9           4
98.0–98.4           7
98.5–98.9           5
99.0–99.4           2
2. Yes, the data appear to be from a population with a normal distribution because the bars start low and reach a
maximum, then decrease, and the left half of the histogram is approximately a mirror image of the right half.
[Histogram for Exercise 2: Frequency vs. Body Temperature (Fahrenheit)]
3. By using fewer classes, the histogram does a better job of illustrating the distribution.
4. There are no outliers.
5. Yes. There is a pattern suggesting that there is a relationship.
[Scatterplot for Exercise 5: Neck Size (in.) vs. Weight (lb)]
6.
a. time-series graph
b. scatterplot
c. Pareto chart
7. By using a vertical scale that starts at 45% instead of 0%, the difference is greatly exaggerated. The graph
creates the false impression that male enrollees outnumber female enrollees by a ratio of about 3:1, but the actual
percentages of 53% and 47% are much closer than that.
Cumulative Review Exercises
1.
Grooming Time (min)    Frequency
0–9                    2
10–19                  3
20–29                  9
30–39                  4
40–49                  2
2. The histogram is approximately bell-shaped. The frequencies increase to a maximum and then decrease, and the
left half of the histogram is roughly a mirror image of the right half. The data do appear to be from a population
with a normal distribution.
[Histogram for Exercise 2: Frequency vs. Grooming Time (min)]
3.
4. There are disproportionately many last digits of 0 and 5. Fourteen of the 20 times have last digits of 0 or 5. It
appears that the subjects reported their results and they tended to round the results. The data do not appear to be
very accurate.
Last Digit    Frequency
0             5
1             0
2             2
3             0
4             1
5             9
6             0
7             2
8             1
9             0
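The "fourteen of the 20" count in Exercise 4 can be checked directly from the table. The comparison with a 20% share is an added assumption for illustration: it is what two of ten equally likely last digits would account for.

```python
# Frequencies of last digits 0 through 9, from the Exercise 4 table.
last_digit_frequencies = [5, 0, 2, 0, 1, 9, 0, 2, 1, 0]

total = sum(last_digit_frequencies)
zeros_and_fives = last_digit_frequencies[0] + last_digit_frequencies[5]
share = zeros_and_fives / total

# Assumed benchmark: two of ten equally likely digits would cover 20% of values.
expected_share_if_uniform = 2 / 10

print(zeros_and_fives, total)
print(share, expected_share_if_uniform)
```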
5.
a. ratio
b. continuous
c. No. The grooming times are quantitative data.
d. statistic
6. The scatterplot helps address the issue of whether there is a correlation between heights of mothers and heights
of their daughters. The scatterplot does not reveal a clear pattern suggesting that there is a correlation.
[Scatterplot for Exercise 6: Daughter's Height (in.) vs. Mother's Height (in.)]
Copyright © 2018 Pearson Education, Inc.

