Read assignments before reaching out to me.
Due: by Monday 2 PM
The Chapter 7 assignment is all of the exercises at the end.
You will be given a link; you must use information from that site for the CDC assignment. Also, 2010 is the best time frame to use
for the examples.
Section
7
Analyzing our Marketing Test, Survey Results
and Other Metrics Using
Confidence Intervals
Rhonda Knehans Drake
Associate Professor, New York University
Data Analytics, Interpretation and Reporting
copyright © 2013
2
• When we estimate population averages or percentages based on
samples, a certain amount of error is present.
• The amount of error present is a function mostly of your
sample size.
• The larger the sample, the less error in your estimates.
• Today we will learn how to place bounds around our estimates
obtained.
• With such an interval, we will then be able to say with 90%, 95%
or 99% confidence that the true population estimate will lie in
these bounds.
Introduction
3
The eight confidence interval formulas we will discuss are for
the following situations:
1. Average or mean based on large samples
2. Average or mean based on small samples
3. Response rate or survey percent based on large samples
4. Response rate or survey percent based on small samples
5. Difference between 2 averages for large samples
6. Difference between 2 averages for small samples
7. Difference between 2 response rates for large samples
8. Difference between 2 response rates for small sample
Confidence Intervals
A-B Split Tests
4
1. Confidence Interval for Averages or Means Based on
Large Samples (n ≥ 30)
To calculate a confidence interval around a mean, the following
information is required:
– The sample mean x obtained from the test.
– The sample standard deviation S obtained from the test.
Many software packages, including Microsoft ExcelTM, can
automatically calculate this value for you. (Review Section 3 for the
standard deviation formula.)
– The sample size n of the test.
This is the number of observations used to calculate your mean. The
sample size must be greater than or equal to 30 in size.
– The desired confidence level: 90%, 95% or 99%.
A confidence interval constructed around the mean will guarantee, with
your specified level of confidence, the true mean will fall within those
bounds.
Sample Means
(Large Sample)
5
• Once all information is known, you construct the confidence
interval around the mean by adding and subtracting from the
mean a multiple of your standard deviation associated with the
sample mean. The “multiple” depends on your desired
confidence level.
• The formula for a confidence interval around the mean is
calculated as follows:
x̄ ± (Z)(S/√n)
Where:
• S is the standard deviation associated with the sample.
• n is the sample size.
• Z is equal to 1.645, 1.96 or 2.575 for a 90%, 95% or 99%
confidence level
Sample Means
(Large Sample)
6
Example:
Money magazine conducts a survey of 100 retirees across the US
and asks them how much they have in their retirement fund.
You obtain an average of $84.75 with a standard deviation of
$18.75, both in thousands of dollars.
You are about to write an article based on this average but realize
that the true average is something more or less than this in reality.
Construct a 95% confidence interval around this average.
Sample Means
(Large Sample)
7
Sample Means
(Large Sample)
1. 84.75 ± 1.96 (18.75/√100)
2. 84.75 ± 1.96 (1.875)
3. 84.75 ± 3.675
4. ($81,075, $88,425)
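The four steps above can be sketched in a few lines of Python. This is only an illustration, not part of the Plan-alyzer: the function name `mean_ci_large` is made up for this example, and the Z value is taken from Python's standard normal distribution rather than from the rounded table values.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_large(mean, sd, n, confidence=0.95):
    """Large-sample (n >= 30) confidence interval for a mean.

    Hypothetical helper for illustration; it follows the slide formula
    mean +/- Z * (S / sqrt(n)).
    """
    # Two-sided multiplier, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


# Retirement fund example, in thousands of dollars.
low, high = mean_ci_large(84.75, 18.75, 100, confidence=0.95)
print(f"({low:.3f}, {high:.3f})")
```

The result agrees with the hand calculation to the nearest dollar; any tiny difference comes from using the unrounded Z value.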
8
Later we will discuss how to choose the confidence level and
address whether 95% was the appropriate level for this example.
Sample Means
(Large Sample)
9
• Let’s do the previous example again but using the Plan-alyzer.
1. Select the tab “Table of Calculators”
2. Select “Confidence Interval Calculators for Averages, Large Samples”
3. Select “One Sample”
Sample Means
(Large Sample)
10
Sample Means
(Large Sample)
Input the required info.
11
Sample Means
(Large Sample)
See the answer.
12
2. Confidence Interval for Averages or Means Based on
Small Samples (n < 30)
To calculate a confidence interval around a mean, the following
information is required:
– The sample mean x obtained from the test.
– The sample standard deviation S obtained from the test.
Many software packages, including Microsoft ExcelTM, can
automatically calculate this value for you. (Review Section 3 for the
standard deviation formula.)
– The sample size n of the test.
This is the number of observations used to calculate your mean. For
this formula the sample size is less than 30.
– The desired confidence level: 90%, 95% or 99%.
A confidence interval constructed around the mean will guarantee,
with your specified level of confidence, the true mean will fall within
those bounds.
Sample Means
(Small Sample)
13
• Once all information is known, you construct the confidence
interval around the mean by adding and subtracting from the
mean a multiple of your standard deviation associated with the
sample mean. The “multiple” depends on your desired
confidence level.
• The formula for a confidence interval around the mean is
the same as the prior formula except we use a value from the
“t-distribution”, which is for approximately normally distributed
data:
x̄ ± (t)(S/√n)
Where:
• S is the standard deviation associated with the sample.
• n is the sample size.
• t is obtained by using the Excel function TINV as will be seen
shortly.
Sample Means
(Small Sample)
14
Example:
Suppose Money Magazine surveyed a sample of only 10 retirees
instead of 100 as in our prior example, all else the
same.
Construct a 95% confidence interval around this average.
Sample Means
(Small Sample)
15
Sample Means
(Small Sample)
We construct the confidence interval as before except we will use
the t-distribution.
84.75 ± (t)(18.75/√10)
Where the value of t = TINV(1 – confidence level, n – 1) = TINV(.05, 9) = 2.262
84.75 ± (2.262)(18.75/3.1623)
84.75 ± (2.262)(5.93)
84.75 ± 13.41
($71.34, $98.16), in thousands of dollars
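The small-sample calculation can be sketched the same way. In this illustration the t value is not computed in Python; it is passed in as the number you get from Excel's TINV(1 – confidence level, n – 1). The function name `mean_ci_small` is made up for this example.

```python
from math import sqrt


def mean_ci_small(mean, sd, n, t_value):
    """Small-sample (n < 30) confidence interval for a mean.

    Hypothetical helper for illustration. t_value is supplied by the
    caller, for example from Excel's TINV(1 - confidence level, n - 1).
    """
    margin = t_value * sd / sqrt(n)
    return mean - margin, mean + margin


# Same survey with only 10 retirees; t = TINV(.05, 9) = 2.262.
low, high = mean_ci_small(84.75, 18.75, 10, t_value=2.262)
print(f"({low:.2f}, {high:.2f})")  # in thousands of dollars
```

Passing the multiplier in keeps the sketch free of extra libraries and mirrors the workflow on the slide, where t comes from Excel.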
16
Note our confidence interval is wider for two reasons:
1. The smaller sample size.
2. Because our sample is smaller than 30, we cannot assume it is normal but
only approximately normal, so our multiplier is larger (2.262
versus 1.96).
Sample Means
(Small Sample)
17
• Let’s do the previous example again but using the Plan-alyzer.
1. Select the tab “Table of Calculators”
2. Select “Confidence Interval Calculators for Averages, Small Samples”
3. Select “One Sample”
Sample Means
(Small Sample)
18
Sample Means
(Small Sample)
Input the required info.
19
Sample Means
(Small Sample)
See the answer.
20
3. Confidence Intervals for Response Rates or Survey
Percentages Based on Large Samples (where n*p and
n*(1 – p) are both ≥ 5)
To calculate a confidence interval around a sample proportion, the
following information is required.
– The sample proportion p obtained from the test.
– The sample size n of the test.
This is the number of observations used to calculate your proportion.
The sample size, when multiplied by the sample proportion and when
multiplied by one minus the sample proportion, must both be greater
than or equal to 5.
– The desired confidence level: 90%, 95% or 99%.
A confidence interval constructed around the sample proportion will
guarantee, with your specified level of confidence, the true population
proportion will fall within those bounds.
Sample Proportions
(Large Sample)
21
• Once all information is known, you construct the confidence
interval around the sample proportion by adding and subtracting
from the sample proportion a multiple of the standard deviation
associated with the sample proportion. The “multiple” depends
on your desired confidence level.
• The formula for a confidence interval around the sample
proportion is calculated as follows:
p̂ ± (Z)·√(p̂(1 – p̂)/n)
• Where p̂ is your sample proportion, n is your sample size and Z is
1.645, 1.96 or 2.575 for a 90%, 95% or 99% confidence interval.
• The term √(p̂(1 – p̂)/n) is the standard deviation for a
proportion or response rate.
Sample Proportions
(Large Sample)
22
Example:
AT&T samples a new prospect list and sends them an offer to
order their new wireless cellular service.
They sample 10,000 prospects and receive a 0.89% response rate.
What is the margin of error at 90% confidence?
Sample Proportions
(Large Sample)
23
Sample Proportions
(Large Sample)
• So, for our example the confidence interval is:
p̂ ± (Z)(S_p̂)
1. .0089 ± (1.645)·√[(.0089)(.9911)/10,000]
2. .0089 ± (1.645)·√.00000088
3. .0089 ± (1.645)·(.000939)
4. .0089 ± .0015
5. (.0074, .0104) or (.74%, 1.04%)
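Here is a short Python sketch of the response rate calculation. The function name `proportion_ci_large` is made up for this example, and the check on n*p and n*(1 – p) simply mirrors the large-sample condition stated for this formula.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci_large(p, n, confidence=0.90):
    """Large-sample confidence interval for a proportion.

    Hypothetical helper for illustration; returns (low, high, margin).
    """
    # Large-sample condition from the slides: both counts at least 5.
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("n*p and n*(1-p) must both be at least 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin, margin


# AT&T example: 10,000 prospects, 0.89% response rate, 90% confidence.
low, high, margin = proportion_ci_large(0.0089, 10_000, confidence=0.90)
print(f"margin of error: {margin:.4f}")
print(f"interval: ({low:.4f}, {high:.4f})")
```

The margin of error rounds to .0015, the same answer as the hand calculation.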
24
• Let’s do the previous example again but using the Plan-alyzer.
1. Select the tab “Table of Calculators”
2. Select “Confidence Interval Calculators for Percentages, Large Samples”
3. Select “One Sample”
Sample Proportions
(Large Sample)
25
Sample Proportion
(Large Sample)
Input the required info.
26
Sample Proportion
(Large Sample)
See the answer.
27
4. Confidence Intervals for Response Rates or Survey
Percentages Based on Small Samples (where either n*p or
n*(1 – p) is < 5)
To calculate a confidence interval around a sample proportion, the
following information is required:
– The sample proportion p obtained from the test.
– The sample size n of the test.
This is the number of observations used to calculate your proportion.
For the small-sample case, the sample size multiplied by the sample
proportion, or the sample size multiplied by one minus the sample
proportion, is less than 5.
– The desired confidence level: 90%, 95% or 99%.
A confidence interval constructed around the sample proportion will
guarantee, with your specified level of confidence, the true population
proportion will fall within those bounds.
Sample Proportions
(Small Sample)
28
• Once all information is known, you construct the confidence
interval around the sample proportion by adding and subtracting
from the sample proportion a multiple of the standard deviation
associated with the sample proportion. The “multiple” depends
on your desired confidence level.
• The formula for a confidence interval around the sample
proportion is the same as the prior formula except we use a
value from the “t-distribution”, which is for approximately normally
distributed data:
p̂ ± (t)·√(p̂(1 – p̂)/n)
• Where n is your sample size and t is obtained by using the
Excel function TINV as will be seen shortly.
• The term √(p̂(1 – p̂)/n) is the standard deviation for a
proportion or response rate.
Sample Proportions
(Small Sample)
29
Example:
Suppose AT&T only sampled 100 prospects instead of 10,000 as
in our previous example, all else the same.
What is the margin of error at 90% confidence?
Sample Proportions
(Small Sample)
30
Sample Proportions
(Small Sample)
We construct the confidence interval as before except we will use
the t-distribution.
.0089 ± (t)·√[(.0089)(1 – .0089)/100]
Where the value of t = TINV(1 – confidence level, n – 1) = TINV(.10, 99) = 1.66
= .0089 ± (1.66)·√[(.0089)(.9911)/100]
= .0089 ± (1.66)·√.0000882
= .0089 ± (1.66)·(.0093914)
= .0089 ± .0156
= (0.00, 0.0245) or (0.00%, 2.45%)
The lower bound here cannot be negative, so we change it to zero.
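A sketch of the small-sample response rate interval, including the step where a negative lower bound is changed to zero. The function name `proportion_ci_small` is made up for this example, the t value is passed in from Excel's TINV, and capping the upper bound at 1 is an added assumption that is not on the slide.

```python
from math import sqrt


def proportion_ci_small(p, n, t_value):
    """Small-sample confidence interval for a proportion.

    Hypothetical helper for illustration. t_value is supplied by the
    caller, for example from Excel's TINV(1 - confidence level, n - 1).
    """
    margin = t_value * sqrt(p * (1 - p) / n)
    low = max(0.0, p - margin)   # a response rate cannot be negative
    high = min(1.0, p + margin)  # assumption: also cap the rate at 100%
    return low, high


# AT&T example with only 100 prospects; t = TINV(.10, 99) = 1.66.
low, high = proportion_ci_small(0.0089, 100, t_value=1.66)
print(f"({low:.4f}, {high:.4f})")
```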
31
• Let’s do the previous example again but using the Plan-alyzer.
1. Select the tab “Table of Calculators”
2. Select “Confidence Interval Calculators for Percentages, Small Samples”
3. Select “One Sample”
Sample Proportions
(Small Sample)
32
Sample Proportions
(Small Sample)
Input the required info.
33
See the answer.
Sample Proportions
(Small Sample)
34
5. Confidence Interval for the Difference between 2 Means or
Averages for Large Samples (n1 ≥ 30 and n2 ≥ 30 )
To calculate a confidence interval around the difference between
two sample means, the following information is required:
– The means of both samples (x1 and x2).
– The standard deviation of both samples (S1 and S2).
Many software packages, including Microsoft ExcelTM, can
automatically calculate these values.
– The size of both samples (n1 and n2).
These are the number of observations that went into calculating each
of your means. Both sample sizes n1 and n2 must be greater than or
equal to 30 in size.
– The desired confidence level: 90%, 95% or 99%.
A confidence interval constructed around the difference between the
sample means will guarantee, with your specified level of confidence,
that the true difference between the population means will fall within
those bounds.
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Means – Large Samples)
35
• Once all information is known, you construct the confidence
interval around the difference between two means by adding and
subtracting from the difference in means a multiple of the
standard deviation associated with the difference. The
“multiple” depends on your desired confidence level.
• The formula for a confidence interval around the difference
between means is calculated as follows:
(X̄1 – X̄2) ± (Z)·√(S1²/n1 + S2²/n2)
• Where Z is 1.645, 1.96 or 2.575 for a 90%, 95% or 99%
confidence interval.
• The term √(S1²/n1 + S2²/n2) is the standard deviation for the
difference in averages.
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Means – Large Samples)
36
Example:
You sample 100 home sales in San Francisco and 100 home sales
in NYC for 2010 with the following results:
n_SF = 100    X̄_SF = $745.25    S_SF = $40
n_NYC = 100   X̄_NYC = $775.10   S_NYC = $45
(all dollar amounts in thousands)
Is there any difference in home prices between NYC and San
Francisco? Base your answer on the 95% confidence level.
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Means – Large Samples)
37
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Means – Large Samples)
• Assume S_SF = $40 and S_NYC = $45
1. (775.10 – 745.25) ± (1.96)·√[(40²/100) + (45²/100)]
2. 29.85 ± (1.96)·√(16 + 20.25)
3. 29.85 ± (1.96)·(6.021)
4. 29.85 ± 11.80
5. ($18,050, $41,650)
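The home price comparison can be sketched in Python as well. The function name `mean_diff_ci_large` is made up for this example; it applies the difference-in-means formula and then checks whether zero falls inside the interval.

```python
from math import sqrt
from statistics import NormalDist


def mean_diff_ci_large(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Large-sample confidence interval for mean1 - mean2.

    Hypothetical helper for illustration; both samples should have
    at least 30 observations.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = mean1 - mean2
    margin = z * sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff - margin, diff + margin


# NYC versus San Francisco home sales, in thousands of dollars.
low, high = mean_diff_ci_large(775.10, 45, 100, 745.25, 40, 100)
contains_zero = low <= 0 <= high
print(f"({low:.2f}, {high:.2f}), contains zero: {contains_zero}")
```

Because the whole interval sits above zero, the sketch reports that zero is not in the interval, which matches the conclusion that NYC prices are higher.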
38
How do we interpret?
– With 95% confidence, we can say that NYC home prices in
2010 are higher than SF home prices by anywhere from
$18,050 to $41,650.
• What if the interval was -$18,050 to $41,650? How would you
interpret it, and is it okay in this case to have a negative value?
– If zero were in the interval then you would say there is no
difference between the two (a hypothesis test!!).
– It means there’s no statistical evidence to conclude that
NYC prices are different from those of SF.
– It’s OK to have a negative value.
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Means – Large Samples)
39
• Let’s do the previous example again but using the Plan-alyzer.
1. Select the tab “Table of Calculators”
2. Select “Confidence Interval Calculators for Averages, Large Samples”
3. Select “Test vs. Control”
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Means – Large Samples)
40
Input the required info.
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Means – Large Samples)
41
See the answer.
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Means – Large Samples)
42
6. Confidence Interval for the Difference between 2 Means or
Averages for Small Samples (n1 < 30 or n2 < 30).
– If one or both samples are smaller than 30, then you will
replace the Z value with the t value.
– You will again use the TINV function in Excel.
– Your parameters are TINV(1 – confidence level, n1 + n2 – 2).
– All else the same.
– This will be used for small market research problems.
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Means – Small Samples)
43
7. Confidence Intervals for the Difference Between 2
Percentages for Large Samples (n1*p1, n1*(1-p1), n2*p2,
n2*(1-p2), all ≥ 5)
To calculate a confidence interval around the difference between
two sample proportions, the following information is required:
– The proportions (p1 and p2) for both samples.
– The size of both samples (n1 and n2).
These are the number of observations used in calculating each of the
sample proportions. Both sample sizes, when multiplied by their
respective sample proportions and when multiplied by one minus their
respective sample proportions, must all be greater than or equal to 5.
– The desired confidence level: 90%, 95% or 99%.
A confidence interval constructed around the difference between the
sample proportions will guarantee, with your specified level of
confidence, that the true difference between the population
proportions will fall within those bounds.
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Proportions – Large Samples)
44
• Once all information is known, you construct the confidence
interval around the difference between two proportions by adding
and subtracting from the difference in proportions a multiple of the
standard deviation associated with the difference. The “multiple”
depends on your desired confidence level.
• The formula for a confidence interval around the difference
between proportions is calculated as follows:
(p1 – p2) ± (Z)·√[p1(1 – p1)/n1 + p2(1 – p2)/n2]
• Where Z is 1.645, 1.96 or 2.575 for a 90%, 95% or 99% confidence
interval.
• The term √[p1(1 – p1)/n1 + p2(1 – p2)/n2] is the standard deviation
for the difference in proportions.
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Proportions – Large Samples)
45
Example:
You are in charge of new card acquisitions at American Express.
You conduct a new offer test for the green card versus your
control offer with the following results:
Offer | Sample Size | Response Rate
Control Offer with 10,000 Bonus Miles | 10,000 | 1.10%
Test Offer with 25,000 Bonus Miles | 10,000 | 1.38%
Did the test beat the control with 95% confidence? Do you have a
winner?
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Proportions – Large Samples)
46
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Proportions – Large Samples)
(.0138 – .0110) ± (1.96)·(standard deviation of the difference)
.0028 ± .00297
(–.00017, .00577) or (–.017%, .577%)
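A Python sketch of the test versus control comparison follows. It assumes the usual unpooled standard deviation for a difference in proportions, √[p1(1 – p1)/n1 + p2(1 – p2)/n2]; the function name `proportion_diff_ci_large` is made up for this example. Carrying every digit through gives a margin of about .0031 rather than the .00297 shown on the slide, a gap that looks like intermediate rounding. The interval still contains zero either way.

```python
from math import sqrt
from statistics import NormalDist


def proportion_diff_ci_large(p_test, n_test, p_control, n_control,
                             confidence=0.95):
    """Large-sample confidence interval for p_test - p_control.

    Hypothetical helper for illustration; returns (low, high, margin).
    Assumes the unpooled standard deviation of the difference.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_test - p_control
    std_dev = sqrt(p_test * (1 - p_test) / n_test
                   + p_control * (1 - p_control) / n_control)
    margin = z * std_dev
    return diff - margin, diff + margin, margin


# American Express example: test 1.38% versus control 1.10%.
low, high, margin = proportion_diff_ci_large(0.0138, 10_000, 0.0110, 10_000)
print(f"margin: {margin:.5f}")
print(f"interval: ({low:.5f}, {high:.5f})")
print("contains zero:", low <= 0 <= high)
```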
47
So how do we interpret?
• With 95% confidence the test can do worse than the control by .017%
OR do better than the control by .577%.
• As such we say the test and the control are not significantly different
since the confidence interval wrapped around the difference in
response rates contains zero.
• Had the lower bound been above zero then we would say the test has
beaten the control.
• But let’s be real here. For all purposes, the test is a winner. The lower
bound is soooo close to zero. So worst case the test is the same as
the control with much upside potential.
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Proportions – Large Samples)
48
So how do we interpret (continuation)?
• But remember, just because the test beat the control from a statistical
point of view does not mean that it won from a marketing point of
view.
• In this example we were giving away additional sky miles. So the test
will need to beat the control by some minimum most likely greater than
zero or else we will not generate the same revenue.
• Suppose I told you that based on the cost of the additional sky miles
the test needs to beat the control by at least .025% to break-even.
Would you consider it a winner?
• What if I told you the test needs to beat the control by .25% to break
even. Would you now consider the test a winner?
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Proportions – Large Samples)
49
• Let’s do the previous example again but using the Plan-alyzer.
1. Select the tab “Table of Calculators”
2. Select “Confidence Interval Calculators for Proportions, Large Samples”
3. Select “Test vs. Control”
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Proportions – Large Samples)
50
Input the required info.
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Proportions – Large Samples)
51
See the answer.
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Proportions – Large Samples)
52
8. Confidence Intervals for the Difference Between 2
Percentages for Small Samples (where n1*p1 or n1*(1-p1)
or n2*p2 or n2*(1-p2) < 5)
• You will replace the Z value with the t value.
• You will again use the TINV function in Excel.
• Your parameters are TINV(1 – confidence level, n1 + n2 – 2).
• All else the same.
• This will be used for small market research problems.
Confidence Interval Estimation (A-B Split Testing)
(Difference Between Two Sample Proportions – Small Samples)
53
• Remember, interpretation of confidence intervals is not that
simple.
• They will not tell you what to do.
• They simply give you valid best and worst case scenarios to
take into consideration.
• They give you additional information upon which to help you
base your marketing decisions:
• Worst case, are the results meeting your criteria?
• How does the upside potential compare to the downside potential?
Interpretation of Confidence Intervals
54
• No direct marketer should ever consider evaluating their test
results with a confidence level lower than 90%. To do so assumes
way too much risk.
• And, fishing for a confidence level that yields significance should
never be practiced.
• The rules that any good direct marketer should follow regarding
significance are shown on the next slide.
Setting the Confidence Level
55
Evaluate your test response rate at the 95% confidence level. Significant?
• Yes: Is it significant at the 99% confidence level?
– Yes: A no brainer. Let’s roll!
– No: That’s okay… at a minimum let’s consider a partial to full rollout.
• No: Not that we want to go that low, but is it significant at the
90% confidence level?
– Yes: Okay, so we have something here. Let’s either retest or go for
a partial rollout.
– No: Not good. We should scrap this from further consideration.
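The decision rules on this slide can be sketched as a small Python function. The names `is_significant` and `rollout_decision` are made up for this example, and "significant" is taken to mean that the lower bound of the interval around (test minus control) is above zero, which is how the response rate example was read.

```python
from statistics import NormalDist


def is_significant(diff, std_dev, confidence):
    """True when the lower bound of the interval is above zero."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff - z * std_dev > 0


def rollout_decision(diff, std_dev):
    """Walk the 95% -> 99% / 90% decision rules (hypothetical helper).

    diff is test minus control; std_dev is the standard deviation
    of that difference.
    """
    if is_significant(diff, std_dev, 0.95):
        if is_significant(diff, std_dev, 0.99):
            return "roll out"
        return "consider a partial to full rollout"
    if is_significant(diff, std_dev, 0.90):
        return "retest or go for a partial rollout"
    return "scrap from further consideration"


# American Express example: difference of .0028 with a standard
# deviation of about .0015649 for the difference.
print(rollout_decision(0.0028, 0.0015649))
```

For the bonus miles test this prints the "retest or go for a partial rollout" branch: not significant at 95%, but significant at 90%.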
Setting the Confidence Level
56
It is important to keep in mind the following facts regarding the
creation of a confidence interval.
• If you want more confidence in your estimates, the resulting interval
will widen.
• If you increase your sample size, the resulting interval will become
tighter.
• The more accuracy you need in your test estimate, the higher you
should set your confidence level.
• The confidence level you set should depend on the risk you are
willing to take in making an incorrect decision.
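A quick way to see the first two facts is to compute the width of a large-sample interval for a mean at several confidence levels and sample sizes. This is a sketch only; `ci_width` is a made-up name, and the standard deviation of 18.75 is borrowed from the retirement fund example.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sd, n, confidence):
    """Full width (upper minus lower) of a large-sample interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sd / sqrt(n)


# More confidence -> wider interval (sample size held at 100).
for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%} confidence: width {ci_width(18.75, 100, confidence):.2f}")

# Larger sample -> tighter interval (confidence held at 95%).
for n in (100, 400, 1600):
    print(f"n = {n}: width {ci_width(18.75, n, 0.95):.2f}")
```

Note that quadrupling the sample size only halves the width, because the sample size enters the formula through a square root.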
Setting the Confidence Level
57
• If our sample represents a large percent (> 10%) of the population in total,
then we typically apply a correction factor to our margin of error estimates.
• The larger our sample is as a percent of the total population the more valid
our estimates.
• Of the following two samples, which would you think would yield a better
parameter estimate?
• A sample of size 5,000 from a population of size 10,000 in total
• A sample of size 5,000 from a population of size 10,000,000 in total.
• If our sample represents 10% or more of the total population, you
multiply the margin of error for the first four formulas by
√[(N – n)/(N – 1)], where N is the population size and n is the
sample size.
Finite Population Correction Factor
58
Example:
Going back to our first exercise, suppose the survey of 100 people
was conducted among the 800 residents of the Happy Retirement Village and
not Money Magazine subscribers.
What is our correction factor and what is the new interval?
Finite Population Correction Factor
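1. Correction factor: √[(800 – 100)/(800 – 1)] = √(700/799) = .936
2. New margin of error: 3.675 × .936 = 3.4398
3. New interval: 84.75 ± 3.4398, or ($81,310, $88,190)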
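The correction can be checked with a short Python sketch. It assumes the usual finite population correction factor √[(N – n)/(N – 1)], which reproduces the 3.4398 shown for this example; the function name `finite_population_correction` is made up for this illustration.

```python
from math import sqrt


def finite_population_correction(n, population):
    """Correction factor sqrt((N - n) / (N - 1)) for a sample of size n
    drawn from a population of size N (hypothetical helper)."""
    return sqrt((population - n) / (population - 1))


# Happy Retirement Village: 100 surveyed out of 800 residents.
fpc = finite_population_correction(100, 800)
margin = 1.96 * 18.75 / sqrt(100)  # 3.675 from the first example
corrected_margin = margin * fpc
low, high = 84.75 - corrected_margin, 84.75 + corrected_margin
print(f"correction factor: {fpc:.3f}")
print(f"corrected margin: {corrected_margin:.4f}")
print(f"interval: ({low:.2f}, {high:.2f})")  # in thousands of dollars
```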
59
7.1 Briefly explain how the width of a confidence interval decreases
with an increase in the sample size. Give me an example.
7.2 Briefly explain how the width of a confidence interval decreases
with a decrease in the confidence level. Give me an example.
7.3 According to a study done by Dr. Martha S. Linet and others, the mean
duration of the recent headache was 8.2 hours for a sample of 5055
females aged 12 through 29. Assume that this sample represents the
current population of all headaches for all females aged 12 through 29
and that the standard deviation for this sample is 2.4 hours. Make a 95%
confidence interval for the mean duration of all headaches for all
12-to-29-year-old females. Do by hand and using The Plan-alyzer.
7.4 A sample of 12 observations was drawn from a population of size 100.
Calculate a 95% confidence interval around the average for this sample
by hand. HINT: You will want to use the finite population correction factor for this
problem as found on slides 57 and 58.
13 15 9 11 8 16 14 9 10 14 16 12
Section 7 Exercises
60
7.5 A company wants to estimate the mean net weight of its “Big Top Circus”
cereal boxes. A sample of 16 such boxes produced the mean net weight of
31.98 ounces with a standard deviation of .26 ounces. Make a 95%
confidence interval for the mean net weight of all boxes. Do by hand and
using The Plan-alyzer.
7.6 Crate and Barrel Cataloger promises its customers that the products
ordered will be mailed within 72 hours after an order is placed. The quality
control department at the company checks from time to time to see if this
promise is fulfilled. Recently, the quality control department took a sample
of 50 orders and found that 42 of them were mailed within 72 hours of the
placement of the orders.
a) Construct a 95% confidence interval for the percentage of all orders
that are mailed within 72 hours of their placement. Do by hand and
using The Plan-alyzer.
b) Suppose the confidence interval obtained in part a is too wide. How
can the width of this interval be reduced? Discuss all possible
alternatives. Which of these alternatives is the best?
Section 7 Exercises
61
7.7 In virtual reality a person views a computer-generated scene that changes
as if the viewer’s body were in motion. Some individuals experience
unpleasant side effects from virtual reality, such as nausea, dizziness, or
disorientation. In a recent study by Clare Tegan of Britain’s Defense
Research Agency, each of the 150 people included in the study spent 20
minutes wearing a head-mounted virtual reality system through which he
or she explored a virtual environment consisting of a series of rooms.
Either during their time in the virtual environment or in the 10 minutes
immediately afterward, 61% of these 150 persons suffered side effects.
Find the 95% confidence interval for the proportion of all virtual reality
users who would suffer side effects. Do by hand and using The Plan-alyzer.
Section 7 Exercises
62
7.8 One of the major problems faced by department stores is a high
percentage of returns. The manager of Macy’s wanted to estimate the
percentage of all sales that result in returns. A sample of 40 sales showed
that 8 of them had products returned within the time allowed for returns.
a) Construct a 99% confidence interval for the percentage of all sales
that result in returns. Do by hand and using The Plan-alyzer.
b) Do you think 99% confidence is appropriate in this case and if not
what would be a more appropriate level of confidence to use?
7.9 According to a survey, the mean price of gasoline in the U.S. was
$1.20 per gallon in 1995 and $1.10 per gallon in 1994 (Wow, don’t you
wish!). Suppose these means were based on random samples of 100
gas stations for 1995 and 120 gas stations for 1994. Also, assume that
the sample standard deviations were $.11 for 1995 and $.10 for 1994.
Find a 90% confidence interval for the difference between the mean
gasoline prices for 1995 and 1994. Do by hand and using The Plan-alyzer.
Section 7 Exercises
63
7.10 An insurance company wants to know if the average speed at which men
drive cars is higher than that of women drivers. The company took a
random sample of 27 cars driven by men on a highway and found the
mean speed to be 68 miles per hour with a standard deviation of 2.2 miles per hour.
Another sample of 18 cars driven by women on the same highway gave a
mean speed of 65 miles per hour with a standard deviation of 2.5 miles per hour.
Assume that the speeds at which all men and all women drive cars on this
highway are both known to be normally distributed. Construct a 99%
confidence interval for the difference between the mean speeds of cars
driven by all men and all women drivers on this highway. Do by hand and
using The Plan-alyzer.
Section 7 Exercises
64
7.11 Removed.
7.12 In a Prevention magazine survey released in 2008, Princeton Survey
Research Association examined the weight of children aged 3 to 17.
According to this study, 24% of children in this age group were
overweight in 2000, and 31% were considered overweight in 2008.
Suppose that these percentages are based on random samples of 400
and 500 children in the given age group in 2000 and 2008, respectively.
Construct a 95% confidence interval for the difference between the
proportions of overweight 3-to-17-year-olds in 2000 and 2008. Do by
hand and using The Plan-alyzer.
Section 7 Exercises
CDC CASE STUDY ASSIGNMENT
Rising Insurance Premiums for the Northeast
You are a journalist for the New York Times. Your specialty is research.
Some readers have written to you complaining that their health insurance premiums are going up while that is not the case for people living in the Midwest. Upon calling several insurance companies (at your readers’ prompting), you were told the same thing –
“Those residing in the Northeast compared to those residing in the Midwest live a more risky lifestyle in terms of alcohol consumption, driving habits, etc. As such their premiums unfortunately are going up while those living in the Midwest are not.”
You have decided to research this and write an article for the newspaper to either confirm or refute this claim by the insurance companies. You suspect they are not telling the whole story and that the cost difference is most likely a function of the higher cost of living in the Northeast.
Upon searching for various data sources to assist you in this effort, you ran across the Center for Disease Control’s “2010 Behavior Risk Factor Surveillance Survey.” This survey is conducted every 10 years and compiles the following information by state.
Variable | Label
SMK | Current cigarette smokers
WEI | Overweight (based on government height-weight formula)
SED | Sedentary lifestyle (fewer than three 20-minute exercise sessions a week)
ACT | No leisure-time activity off the job
ALC | Binge drinking (five or more drinks on occasion)
DWI | Drinking and driving (after too much to drink)
SEA | Seat-belt use (occasionally or never)
STATE | U.S. state (alphanumeric)
N | Number of people surveyed
In this study you will note that about 1,000 people from each state were randomly chosen to estimate the true population percentages.
The file has been uploaded to Blackboard. A printout of the dataset is attached. Please note that some data are missing for certain states; missing values are denoted with an asterisk.
Your task:
1. First you will need to determine which states represent the Midwest and which represent the Northeast. You can find this information as defined by the U.S. government at the web address: www.census.gov/geo/www/us_regdiv
2. Once determined you will then need to weight together the survey percentages based on population estimates for each state for the northeast and again for the Midwest. Use the 2000 population estimates. You can find these population estimates at the web address:
http://quickfacts.census.gov/qfd/index.html
3. Once all data is obtained and weighted together, you will then be ready to conduct some hypothesis tests to determine statistical significance. I suggest you create confidence intervals around the difference in percentages or averages. For example, you may want to determine if the percentage of those that binge drink is higher in the Northeast versus the Midwest and if so, is the difference statistically significant.
4. In addition to using the CDC data, you are also required to add some census data on income (median income) and education (% with a bachelor’s degree) to the mix. But you must confirm it makes sense by correlating these measures with the risky CDC behaviors at a state level. In other words, you will conduct a correlation analysis where you correlate the census median incomes and percent with a bachelor’s degree versus the CDC smoking and alcohol consumption percentages. Are they correlated and if so how?
5. Additionally, read the latest research on education and life expectancy. Google “Do Rich People Live Longer Harvard.” You may wish to cite portions of this study in your findings.
6. In addition, I want you to consider the cost of living differences for the NE states versus the Midwest states. Google “Money Magazine Cost of Living Calculator” to find a tool to help you conduct this portion of the analysis. The true reason as to why we in the NE pay more is simply because our cost of living is more. Is that true? What do these calculators suggest?
7. At this stage you will then be ready to create a final presentation. The presentations will need to display the raw data and findings in table and graph format using Excel. Be creative. (Hint: Perry loves presentations with nice backgrounds and graphics).
8. Your final report will be in PowerPoint format with no more than 15 slides.
9. Each Presentation must have at a minimum the following sections:
· Introduction
· Data Used and How Collected
· Analysis Steps
· Findings
· Conclusion
10. In the last class each person will present their findings.
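For tasks 2 and 4, the weighting and the correlation can be sketched in Python before you build them in Excel. Everything here is illustrative: the function names `weighted_percentage` and `pearson_r` are made up, and the numbers are hypothetical placeholders, NOT real CDC or census figures.

```python
from math import sqrt


def weighted_percentage(percentages, populations):
    """Population-weighted average of state-level survey percentages."""
    total = sum(populations)
    return sum(p * w for p, w in zip(percentages, populations)) / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values for three states in one region.
binge_pct = [15.0, 20.0, 10.0]               # ALC percentages by state
population = [1_000_000, 3_000_000, 1_000_000]
median_income = [52_000, 61_000, 48_000]     # census median incomes

region_pct = weighted_percentage(binge_pct, population)
print(f"weighted regional percentage: {region_pct:.1f}%")
print(f"correlation with income: {pearson_r(median_income, binge_pct):.2f}")
```

The larger state pulls the regional percentage toward its own value, which is the point of weighting by population instead of taking a simple average of the state percentages.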
Census Web Site Screenshot
1. This is where you will go. At this location you can get mini reports by state including population estimates, median income values, education levels, etc. It is a vast source of free information by state.
2. To find the map that defines which states represent the Northeast versus the Midwest, go to the PDF file located at: www.census.gov/geo/www/us_regdiv
2010 CDC Survey Data Regarding Behavior Risk