c06Exploration_6.1A.indd Page 143 23/07/15 12:27 AM f-389/201/WB01616/9781118335888/ch06/text_s

Student Name:


EXPLORATION 6.1A: Haircut Prices

Do women pay more than men for haircuts? Is this a statistical tendency or always true? By

how much do women spend more than men, on average? How much do haircut prices vary

within a sex as well as between sexes?

To investigate these questions a professor asked students in her class to report the cost of

their most recent haircut, along with their sex.

1. Which would you consider to be the explanatory variable and which the response? Also

classify the type (categorical or quantitative) for each variable.

Explanatory:

Type:

Response:

Type:

2. Is this an experiment or an observational study? Explain briefly.

3. Did the professor who collected the data make use of random sampling, random assignment, both, or neither?

The following “parallel” dotplots (using the same scale along the horizontal axis) reveal the sample distributions of haircut prices for each sex:

[Figure: parallel dotplots of haircut cost (in dollars), one row per sex (Female, Male), on a shared horizontal axis running from 0 to 150.]

4. Compare and contrast the distributions of haircut prices between men and women in this

class. (Hint: As you learned in the Preliminaries, comment on center, variability, shape,

and unusual observations. You should give enough detail that someone reading your comments could re-create the overall pattern of the graphs from your description. Also be sure

to relate your comments to the context.)

5. Further explore the data.

a. Explain why the right skewness of these distributions makes sense in this context.


CHAPTER 6 Comparing Two Means

b. Explain why the unusual behavior at the lower end of the dotplots—several values at $0

and then a gap to the next smallest prices—makes sense in this context.

6. Based on the dotplots (without performing any calculations), make a guess for each sex’s

mean haircut price.

Men:

Women:

7. Based on the dotplots (without performing any calculations), which sex do you think has

the larger standard deviation of haircut prices? Explain your answer.

8. Copy the haircut data, which you can access from the book’s website, to the clipboard.

Open the Descriptive Statistics applet. Check the Stacked check box (notice the applet

assumes the explanatory variable is the first column and the response variable is the

second column.) Also keep the Includes header box checked and press Clear. Paste

the data into the Sample data box and press Use Data. You should see that the dotplots

are similar to the ones shown previously. Check the Actual boxes to show the means and

standard deviations (Std dev).

a. Report the sample size, mean haircut price, and standard deviation (SD) of haircut

prices for each sex. (Include appropriate symbols and measurement units.)

          Sample size     Sample mean     Sample SD
Men
Women

b. Which sex has the larger mean haircut price? Is this what you predicted in #6?

c. Which sex has the larger SD of haircut prices? Is this what you predicted in #7?

9. Would you conclude that these data show an association between haircut price and a

person’s sex? If so, describe the nature of this association.


10. Let’s return to the research questions that we started with. Address these questions based

on the previous graphs and statistics.

a. Do women pay more than men for haircuts? If so, is this a statistical tendency (i.e., true

on average) or always true?

b. By how much do women spend more than men, on average?

11. Do the unequal sample sizes between the two sexes lead you to doubt whether any conclusions can be drawn from these data? Explain.

12. Based on how these data were collected, would you feel comfortable generalizing your

results to the population of all college students in the U.S.? How about the population of

all college students at the professor’s university? Explain your answers.

13. Based on both the differences in centers and the amount of overlap in the distributions

of haircut prices between men and women, do you predict that the difference will turn

out to be statistically significant? (You will learn how to assess statistical significance in

the next section.)

Further analyses

The individual values of the haircut prices that you have been analyzing are:

Women (n = 37): 0, 0, 0, 15, 15, 15, 20, 20, 20, 25, 30, 30, 35, 35, 35, 40, 45, 45, 45, 45, 50, 50, 50, 50, 55, 60, 65, 70, 70, 75, 90, 110, 120, 120, 150, 150, 150

Men (n = 13): 0, 0, 0, 14, 15, 15, 20, 20, 20, 22, 23, 60, 75

As you learned in Section 3.2, the median is the middle value in a data set once the values are arranged in order. The location of the median can be found by calculating (n + 1)/2.
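The location rule above can be sketched in a few lines of Python. This is only an illustration: the function name `median_by_location` and the two small data sets are made up for the example and are not part of the exploration. When (n + 1)/2 is not a whole number, the sketch averages the two ordered values on either side of that location.

```python
def median_by_location(values):
    """Median found from the (n + 1)/2 location in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) / 2                      # 1-based position of the median
    below = ordered[int(location) - 1]          # value at the location, rounded down
    above = ordered[int(location + 0.5) - 1]    # value at the location, rounded up
    return (below + above) / 2                  # same value twice when n is odd

# Hypothetical data, chosen only to show the odd and even cases.
odd_example = [3, 1, 7, 5, 9]    # n = 5, location (5 + 1)/2 = 3
even_example = [3, 1, 7, 5]      # n = 4, location (4 + 1)/2 = 2.5

print(median_by_location(odd_example))    # 5.0
print(median_by_location(even_example))   # 4.0
```

The same function can be applied to the haircut prices to check a by-hand answer.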


14. Now let’s look at the median haircut prices and compare them to the mean haircut price.

a. Determine (by hand) the median haircut price for each sex. (You can verify your calculation by checking the Actual box for the Median in the applet.)

Median haircut price for women:

Median haircut price for men:

b. Which sex has the larger median haircut price? Is this what you expected?

c. Are the medians less than or greater than the means? Is this consistent with the right-skewed distributions of haircut prices? Explain.

Definitions

The value for which 25% of the data lie below that value is called the lower quartile (or 25th percentile). Similarly, the value for which 25% of the data lie above that value is called the upper quartile (or 75th percentile). Quartiles can be calculated by determining the median of the values above/below the location of the overall median. The difference between the quartiles is called the inter-quartile range (IQR), another measure of variability along with standard deviation.

The five-number summary for the distribution of a quantitative variable consists of the minimum, lower quartile, median, upper quartile, and maximum.

One way to summarize the distribution of a quantitative variable such as haircut price is by

dividing the distribution into four pieces of roughly equal size (number of observations). In

other words, summarize the distribution by determining where the bottom 25% of the data

are, the next 25%, the next 25%, and then the top 25%.

15. Explore the data using quartiles and the five-number summary.

a. Calculate the lower quartile for the women’s haircut prices. First note that the median is the (37 + 1)/2 = 19th ordered value. So, the lower quartile is the median of the bottom 18 values, which is found in position (18 + 1)/2 = 9.5. So, the lower quartile is the average of the 9th and 10th ordered values from the bottom.

b. Similarly, calculate the upper quartile for the women’s haircut prices.

c. Calculate the lower and upper quartiles for the men’s haircut prices.
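As a way to check the by-hand work in #15, the quartile rule from the Definitions box (the median of the values below or above the location of the overall median) can be sketched in Python. The function names here are illustrative assumptions, and other software may use a slightly different quartile convention, so results can differ from other tools.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum (needs n >= 2)."""
    ordered = sorted(values)
    half = len(ordered) // 2        # count of values below the median location
    lower_half = ordered[:half]     # leaves out the median itself when n is odd
    upper_half = ordered[-half:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

# Haircut prices as listed in the exploration.
women = [0, 0, 0, 15, 15, 15, 20, 20, 20, 25, 30, 30, 35, 35, 35, 40, 45, 45,
         45, 45, 50, 50, 50, 50, 55, 60, 65, 70, 70, 75, 90, 110, 120, 120,
         150, 150, 150]
men = [0, 0, 0, 14, 15, 15, 20, 20, 20, 22, 23, 60, 75]

print("Women:", five_number_summary(women))
print("Men:  ", five_number_summary(men))
```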

16. Report the five-number summary for the women’s haircut prices and for the men’s haircut prices:

                          Minimum    Lower quartile    Median    Upper quartile    Maximum
Women’s haircut prices
Men’s haircut prices


17. In the applet, check the box for Boxplot to overlay the two boxplots as well. Describe what

the boxplots reveal about how the distributions of haircut prices compare between male

and female students in the professor’s class.

18. Find the IQR for both the men’s and women’s haircut prices.

One advantage of the IQR is that, unlike the standard deviation, it is not sensitive to extreme values or outliers. In the same way, the median is not sensitive to extreme values or outliers, but the mean is.

KEY IDEA

The IQR is a resistant measure of variability, whereas the standard deviation is sensitive to extreme values and skewness.

19. Remove the male haircut price of $75 from the data and (in the applet) find the new SD

and IQR. Compare them to the values you had earlier and confirm that this illustrates that

the IQR is more resistant to extreme values than the SD.
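The comparison in #19 can also be sketched outside the applet. The Python below is a rough check, assuming the quartile rule from the Definitions box (median of each half) and the sample standard deviation with an n − 1 divisor; the applet's exact conventions are not stated here, so small differences are possible.

```python
from statistics import stdev

def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def iqr(values):
    """Upper quartile minus lower quartile, using the median-of-halves rule."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[-half:]) - median(ordered[:half])

men = [0, 0, 0, 14, 15, 15, 20, 20, 20, 22, 23, 60, 75]
men_without_75 = [price for price in men if price != 75]

for label, data in [("all 13 men", men), ("without $75", men_without_75)]:
    print(f"{label}: SD = {stdev(data):.2f}, IQR = {iqr(data):.2f}")
```

Comparing the relative change in each statistic, rather than the raw change, makes the contrast between the two measures easier to see.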

More data

20. Collect data from yourself and your classmates on most recent haircut price and sex.

Enter the data into the applet and examine dotplots, boxplots, and summary statistics.

Write a paragraph or two summarizing what the data reveal about whether haircut price

is associated with the sex of your classmates. Also address the research questions with

which this exploration began. Finally, comment on how broadly you would be willing to

generalize your findings.

Definition

A boxplot is a visual display of the five-number summary. The box displays the middle 50% of the distribution and its width (the IQR) helps us compare the spread of the distribution; the whiskers extend to the smallest and largest values in the data set.

EXPLORATION 6.2: Lingering Effects of Sleep Deprivation

STEP 1: Ask a research question. Many students pull “all-nighters” when they have

an important exam or a pressing assignment. Concerns that may arise include: Can you really

function well the next day after a sleepless night? What about several days later: Can you recover

from a sleepless night by getting a full night’s sleep on the following nights?

1. What is your research conjecture about whether or not one can recover from a sleepless

night by getting a full night’s sleep on the following nights? What are some other related

questions that you would be interested in investigating related to this issue?

STEP 2: Design a study and collect data. Researchers Stickgold, James, and Hobson

investigated delayed effects of sleep deprivation on learning in a study published in Nature

Neuroscience (2000). Twenty-one volunteers, aged 18–25 years, were first trained on a visual

discrimination task that involved watching stimuli appear on a computer screen and reporting

what was seen. See Figure 6.9.

FIGURE 6.9 Subjects were flashed the screen on the left and then it was masked by the screen on the right. Then subjects were asked whether they had seen an L or a V and whether the slanted lines were placed vertically or horizontally.

After the training period, subjects were tested. Performance was recorded as the minimum time (in milliseconds) between the appearance of stimuli and an accurate response. Following these baseline measurements, one group was randomly assigned to be deprived of sleep

for 30 hours, followed by two full nights of unrestricted sleep, whereas the other group was allowed to get unrestricted sleep on all three nights. Following this, both groups were retested on

the task to see how well they remembered the training from the first day. Researchers recorded

the improvement in performance as the decrease in time required at retest compared to training.

(Note: For example, if someone took 5 milliseconds (ms) to respond at the beginning of the

study and then 2 ms to respond at the end, the improvement score is 3 ms. But if someone took

2 ms at the beginning and then 5 ms at the end, the improvement score is −3 ms.)

The goal of the study was to see whether the improvement scores tend to be higher for the

unrestricted sleep treatment than for the sleep deprivation treatment.


2. Identify the explanatory and response variables in this study. Also classify them as either

categorical or quantitative.

Explanatory:

Type:

Response:

Type:

3. Was this an experiment or an observational study? Explain how you are deciding.

4. Let μ_unrestricted be the long-run mean improvement on this task three days later when someone has had unrestricted sleep and let μ_deprived denote the long-run mean improvement when someone is sleep deprived on the first night.

In words and symbols, state the null and the alternative hypotheses to investigate whether

sleep deprivation has a negative effect on improvement in performance on visual discrimination tasks. (Hint: For the alternative hypothesis: Do you expect the people to do better

or worse when sleep deprived? Based on your answer, what sign/direction should you

choose for the alternative hypothesis?)

Here are the data, with positive values indicating better performance at retest than at training,

and negative values indicating worse performance at retest than at training:

Unrestricted-sleep group’s improvement scores (milliseconds):

25.20, 14.50, −7.00, 12.60, 34.50, 45.60, 11.60, 18.60, 12.10, 30.50

Sleep-deprived group’s improvement scores (milliseconds):

−10.70, 4.50, 2.20, 21.30, −14.70, −10.70, 9.60, 2.40, 21.80, 7.20, 10.00

STEP 3: Explore the data.

5. To look at graphical and numerical summaries of the data from the study, go to the

Multiple Means applet. The sleep deprivation data have already been entered into the applet.

a. Notice that the applet creates parallel dotplots, one for each study group. Based on

these dotplots alone, which group (unrestricted or deprived) appears to have had the

higher mean improvement? How are you deciding?

b. Based on the dotplots alone, which group (unrestricted or deprived) appears to have

had more variability in improvement? How are you deciding?


c. Notice that the applet also computes numerical summaries of the data, such as the mean and standard deviation (SD) for the improvements in each group.

i. For the unrestricted group, record the sample size (n), mean, and SD.

n_unrestricted =          x̄_unrestricted =          SD_unrestricted =

ii. For the deprived group, record the sample size (n), mean, and SD.

n_deprived =          x̄_deprived =          SD_deprived =

Recall from earlier in the course that the standard deviation is a measure of variability.

Relatively speaking, smaller standard deviation values indicate less variability and a distribution whose data values tend to cluster more closely together, compared to a distribution with

a larger standard deviation.

d. Based on the numerical summaries reported in #5(c), which group (unrestricted or

deprived) had the higher mean improvement?

e. Based on the numerical summaries reported in #5(c), which group (unrestricted or

deprived) had the higher variability in improvement?

f. Notice that the applet also reports the observed difference in means for the improvements of the two groups. Record this value (and its measurement units).

x̄_unrestricted − x̄_deprived =

g. Before you conduct an inferential analysis, does this difference in sample means (as

reported in #5(f)) strike you as a meaningful difference? Explain your answer.

STEP 4: Draw inferences.

6. What are two possible explanations for why we observed the two groups to have different

sample means for improvement in performance?

7. Describe how you might go about deciding whether the observed difference between the

two sample means is statistically significant. (Hint: Think about how you assessed whether

an observed difference between two sample proportions was statistically significant in

Chapter 5. Use the same strategy, with an appropriate modification for working with

means instead of proportions.)


Once again the key question is how often random assignment alone would produce a difference

in the groups at least as extreme as the difference observed in this study if there really were no

effect of sleep condition on improvement score. You addressed similar questions in Chapter 5

when you analyzed the dolphin therapy and yawning studies. The only change is that now the

response variable is quantitative rather than categorical, so the relevant statistic is the difference

in group means rather than the difference in group proportions. Also once again, we use simulation to investigate how often such an extreme difference would occur by chance (random

assignment) alone (if the null hypothesis of no difference/no effect/no association were true).

In other words, we will again employ the 3S strategy.

1. Statistic:

8. A natural statistic for measuring how different the observed group means are from each

other is the difference in the mean improvement scores between the two groups. Report

the value of this statistic, as you did in #5(f).

2. Simulate: You will start by using index cards to perform a tactile simulation of randomly assigning the 21 subjects between the two groups, assuming that sleep condition has

no impact on improvement.

Because the null hypothesis asserts that improvement score is not associated with sleep

condition, we will assume that the 21 subjects would have had exactly the same improvement

scores as they did, regardless of which sleep condition group (unrestricted or deprived) the

subject had been assigned.
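One repetition of this rerandomization can be sketched in Python as a stand-in for the index cards. This is an illustrative sketch only: the function name `shuffled_difference` and the seed value are arbitrary choices, not part of the study or the applet.

```python
import random
from statistics import mean

# Improvement scores (milliseconds) as listed in the exploration.
unrestricted = [25.20, 14.50, -7.00, 12.60, 34.50, 45.60, 11.60, 18.60,
                12.10, 30.50]
deprived = [-10.70, 4.50, 2.20, 21.30, -14.70, -10.70, 9.60, 2.40, 21.80,
            7.20, 10.00]

def shuffled_difference(rng):
    """Shuffle the 21 scores and deal them into stacks of 10 and 11."""
    cards = unrestricted + deprived          # one "card" per subject
    rng.shuffle(cards)
    new_unrestricted = cards[:10]            # rerandomized unrestricted group
    new_deprived = cards[10:]                # rerandomized deprived group
    return mean(new_unrestricted) - mean(new_deprived)

observed = mean(unrestricted) - mean(deprived)
rng = random.Random(1)                       # arbitrary seed, for repeatability

print(f"Observed difference in means: {observed:.2f} ms")
print(f"One shuffled difference:      {shuffled_difference(rng):.2f} ms")
```

Each call to `shuffled_difference` plays the role of one dot on the class dotplot.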

9. a. How many index cards do you need to conduct this simulation?

b. What will you write on each index card?

To conduct one repetition of this simulation:

• Shuffle the stack of 21 cards well and then randomly distribute cards into two stacks: one

stack with 10 cards (the unrestricted group) and one with 11 (the sleep-deprived group).

• Calculate and report the sample means for each rerandomized group:

Rerandomized unrestricted group’s mean:

Rerandomized deprived group’s mean:

• Calculate the difference in group means: unrestricted mean minus sleep-deprived mean.

Report this value.


• Combine this result with your classmates’ to create a dotplot that shows the distribution

of several possible values of difference in sample means that could have happened due to

pure chance if sleep condition has no impact on improvement. Sketch a dotplot on an axis

like the one below, being sure to label the horizontal axis.

[Blank horizontal axis for the class dotplot, with tick marks from −20 to 20 in steps of 5.]

Label:

c. At about what value is the dotplot centered? Explain why this makes sense. (Hint: What

are we assuming to be true when we conduct the simulation?)

d. Where is the observed difference in means from the original study (as reported in #8)

on the dotplot? Did this value happen often, somewhat rarely, or very rarely? How are

you deciding?

10. As before with simulation-based analyses, you would now like to conduct many, many

more repetitions to determine what is typical and what is not for the difference in group

means, assuming that sleep condition has no impact on improvement score. We think

you would prefer to use a computer applet to do this rather than continue to shuffle cards

for a very long time, calculating the difference of group means by hand. Go back to the

Multiple Means applet, check the Show Shuffle Options box, select the Plot display, and

press Shuffle Responses.

a. Describe what the applet is doing and how this relates to your null hypothesis from #4.

b. Record the simulated difference in sample means for the rerandomized groups, as

given in the applet output. Is this difference more extreme than the observed difference

from the study (as reported in #8)? How are you deciding?

c. Click on Shuffle Responses again and record the simulated difference in sample means

for the rerandomized groups. Did it change from #10(b)?


d. Click on Shuffle Responses again and record the simulated difference in sample means

for the rerandomized groups. Did it change from #10(b) and #10(c)?

e. Now to see many more possible values of the difference in sample means, assuming

sleep condition has no impact on improvement, do the following in the Multiple

Means applet:

• Change Number of Shuffles from 1 to 997.

• Press Shuffle Responses to produce a total of 1000 shuffles and rerandomized

statistics.

f. Consider the histogram of the 1,000 could-have-been values of difference in sample

means, assuming that sleep condition has no effect on improvement.

i. What does one observation on the histogram represent? (Hint: Think about what

you would have to do to put another observation on the graph.)

ii. Describe the overall shape of the null distribution displayed in this histogram.

iii. Where does the observed difference in sample means (as reported in #8) fall in this

histogram: near the middle or out in a tail? Are there a lot of observations that are

even more extreme than the observed difference, assuming sleep condition has no

impact on improvement? How are you deciding?

g. To estimate a p-value, continue with the Multiple Means applet.

• Type in the observed difference in group means (as reported in #8) in the Count

Samples box (for the one-sided alternative hypothesis) and press Count.

• Record the approximate p-value.

h. Fill in the blanks of the following sentence to complete the interpretation of the p-value.

The p-value of _______ is the probability of observing _____________

__________________ assuming ______________________________.
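The shuffling and counting steps in #10 can be approximated with a short simulation. The sketch below is not the applet's code: the helper name `simulate_null`, the seed, and the choice of exactly 1,000 shuffles are assumptions made for illustration. The estimated p-value will vary a little from run to run when the seed changes.

```python
import random
from statistics import mean

unrestricted = [25.20, 14.50, -7.00, 12.60, 34.50, 45.60, 11.60, 18.60,
                12.10, 30.50]
deprived = [-10.70, 4.50, 2.20, 21.30, -14.70, -10.70, 9.60, 2.40, 21.80,
            7.20, 10.00]

def simulate_null(group_a, group_b, statistic, shuffles, seed):
    """Shuffle the pooled responses and recompute statistic(a) - statistic(b)."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    results = []
    for _ in range(shuffles):
        rng.shuffle(pooled)
        results.append(statistic(pooled[:n_a]) - statistic(pooled[n_a:]))
    return results

observed = mean(unrestricted) - mean(deprived)
null_diffs = simulate_null(unrestricted, deprived, mean, shuffles=1000, seed=2024)

# One-sided p-value: proportion of shuffled differences at least as large
# as the observed difference (unrestricted minus deprived).
p_value = sum(d >= observed for d in null_diffs) / len(null_diffs)
print(f"Observed difference: {observed:.2f} ms")
print(f"Approximate p-value: {p_value:.3f}")
```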

3. Strength of evidence:

11. Based on the p-value, evaluate the strength of evidence provided by the experimental data

against the null hypothesis that sleep condition has no effect on improvement score: not

much evidence, moderate evidence, strong evidence, or very strong evidence?


12. Significance: Summarize your conclusion with regard to strength of evidence in the context of this study.

13. Estimation:

a. Use the 2SD method to approximate a 95% confidence interval for the difference in

long-run mean improvement score for subjects who get unrestricted sleep minus the

long-run mean improvement score for subjects who are sleep deprived. (Hints: Remember the observed value of the difference in group means and obtain the SD of the

difference in group means from the applet’s simulation results. The interval should be

observed difference in means ±2SD, where SD represents the standard deviation of

the null distribution of the difference in group means.)

b. Interpret what this confidence interval reveals, paying particular attention to whether

the interval is entirely positive, entirely negative, or contains zero. (Hint: Be sure to

convey “direction” in your interpretation by saying how much larger improvement

scores are on average for the treatment you find to have the larger long-run mean: I’m

95% confident that the long-run mean improvement score is __________ to ________

higher with the _________ treatment as opposed to the _________ treatment.)
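The 2SD calculation described in #13(a) can be sketched as follows, taking the SD from a simulated null distribution rather than from the applet. The helper name `simulate_null`, the seed, and the number of shuffles are illustrative assumptions, and the endpoints will shift slightly with a different seed.

```python
import random
from statistics import mean, stdev

unrestricted = [25.20, 14.50, -7.00, 12.60, 34.50, 45.60, 11.60, 18.60,
                12.10, 30.50]
deprived = [-10.70, 4.50, 2.20, 21.30, -14.70, -10.70, 9.60, 2.40, 21.80,
            7.20, 10.00]

def simulate_null(group_a, group_b, shuffles, seed):
    """Shuffled differences in group means (first group minus second)."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(shuffles):
        rng.shuffle(pooled)
        diffs.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    return diffs

observed = mean(unrestricted) - mean(deprived)
null_sd = stdev(simulate_null(unrestricted, deprived, shuffles=1000, seed=7))

# 2SD method: observed difference plus or minus two null-distribution SDs.
lower = observed - 2 * null_sd
upper = observed + 2 * null_sd
print(f"SD of null distribution: {null_sd:.2f} ms")
print(f"Approximate 95% interval: ({lower:.2f}, {upper:.2f}) ms")
```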

STEP 5: Formulate conclusions.

14. Generalization: Were the participants in this study randomly selected from a larger population? Describe the population to which you would feel comfortable generalizing the results of this study.

15. Causation: Were the participants in the study randomly assigned to a sleep condition?

How does this affect the scope of conclusion that you can draw?

Another statistic

Could we have chosen a statistic other than the difference in group means to summarize how

different the two groups’ improvement scores were? Yes, for example we could have used

the difference in group medians. Why might we do this? For one reason, the median is less

affected by outliers than the mean (see Section 3.2).


To analyze the difference in group medians, we carry out the 3S strategy as before, except:

• We calculate the observed value of the difference in medians as the statistic.

• After we conduct the rerandomizing, we calculate the difference in medians for the rerandomized data. Then we repeat this process a large number of times.

• We determine the p-value by counting how many of the simulated statistics are at least as

large as the observed value of the difference in medians.
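The three steps above can be sketched by swapping the median in for the mean as the statistic. As before, this is an illustration under assumptions: the helper name `simulate_null`, the seed, and the 1,000 shuffles are arbitrary choices and not a description of how the applet is implemented.

```python
import random
from statistics import median

unrestricted = [25.20, 14.50, -7.00, 12.60, 34.50, 45.60, 11.60, 18.60,
                12.10, 30.50]
deprived = [-10.70, 4.50, 2.20, 21.30, -14.70, -10.70, 9.60, 2.40, 21.80,
            7.20, 10.00]

def simulate_null(group_a, group_b, shuffles, seed):
    """Shuffled differences in group medians (first group minus second)."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(shuffles):
        rng.shuffle(pooled)
        diffs.append(median(pooled[:n_a]) - median(pooled[n_a:]))
    return diffs

# Statistic: observed difference in medians (unrestricted minus deprived).
observed_median_diff = median(unrestricted) - median(deprived)

# Simulate, then count shuffled statistics at least as large as observed.
null_median_diffs = simulate_null(unrestricted, deprived, shuffles=1000, seed=11)
p_value = sum(d >= observed_median_diff for d in null_median_diffs) / 1000

print(f"Observed difference in medians: {observed_median_diff:.2f} ms")
print(f"Approximate p-value: {p_value:.3f}")
```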

16. Return to the Multiple Means applet and use the Statistic pull-down menu (on the left)

to select Difference in Medians.

a. From the Summary Statistics for the original data, record the median improvement

score for each group. Also record the difference between the medians (unrestricted

median minus deprived median).

Unrestricted median:

Deprived median:

Difference (unrestricted − deprived):

b. Enter 1000 for Number of Shuffles and press Shuffle Responses. Describe the resulting null distribution of difference in group medians. Does this null distribution appear to be centered near zero? Does it seem to have a bell-shaped distribution?

c. To calculate a p-value based on the difference in medians, enter the observed value in

the Count samples box. Then press Count. Report both the value that you enter into

the applet and the resulting p-value.

d. Does this p-value indicate strong evidence that sleep deprivation has a harmful effect

on improvement score? Explain how you are deciding.

e. With which statistic (difference in means or difference in medians) do the data provide

stronger evidence that sleep deprivation has a harmful effect on improvement score?

Explain how you are deciding.

17. STEP 6: Look back and ahead.

Looking back: Did anything about the design and conclusions of this study concern you?

Issues you may want to critique include:

• Any mismatch between the research question and the study design

• How the experimental units were selected

• How the treatments were assigned to the experimental units


• How the measurements were recorded

• The number of experimental units in the study

• Whether what we observed is of practical value

Looking ahead: What should the researchers’ next steps be to fix the limitations or build

on this knowledge?

EXPLORATION 6.3: Close Friends

How many close friends do you have? You know, the kind of people you like to talk to about

important personal matters. Do men and women tend to differ on the number of close

friends? And if so, by how much do men and women differ with regard to how many close

friends they have? One of the questions asked of a random sample of adult Americans on the

2004 General Social Survey (GSS) was:

From time to time, most people discuss important matters with other people. Looking back

over the last six months—who are the people with whom you discussed matters important

to you? Just tell me their first names or initials.

The interviewer then recorded how many names each person gave, along with keeping

track of the person’s sex. The GSS is a survey of a representative sample of U.S. adults who are

not institutionalized.

1. Identify the variables recorded. Also classify each as either categorical or quantitative, and

identify each variable’s role: explanatory or response.

2. Did this study make use of random assignment, random sampling, both, or neither?

3. Was this an experiment or an observational study? Explain how you are deciding.

4. In words, state the null and the alternative hypotheses to test whether American men and

women differ with regard to how many friends they have.

5. Define the parameters of interest and assign symbols to them.

6. State the null and the alternative hypotheses in symbols.


The survey responses are summarized in Table 6.5 (and the data can be found in expanded form in the file CloseFriends).

Table 6.5 Number of close friends reported by sex

         Number of close friends reported
           0      1      2      3      4      5      6    Total
Men      196    135    108    100     42     40     33      654
Women    201    146    155    132     86     56     37      813

7. Now you will obtain numerical summaries (statistics, such as mean and SD) of the survey

data using the Multiple Means applet.

• Open the data file CloseFriends to access the raw data. Copy the data (e.g., CTRL-A

and CTRL-C).

• Open the Multiple Means applet and press Clear. Click inside the Sample data box

and paste (e.g., CTRL-V). Then press Use Data.

a. Report the sample size, sample mean, and sample SD of the number of close friends

reported for each sex. Based on the sample statistics, who tends to have more close

friends, on average: men or women? How are you deciding?

b. Based on the sample statistics, who tends to have more variability with respect to how

many close friends they have: men or women? How are you deciding?

c. Based on the data as presented in Table 6.5, are the distributions of number of close

friends symmetric or skewed? If skewed, in which direction?

d. Calculate the observed difference in the mean number of close friends between these two

groups (men − women). (Verify your calculation by checking the Observed diff output.)

e. Notice that the value recorded in (d) is a negative number. Well, number of friends can

only be 0 or more. Why is the number recorded in (d) less than 0?
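If the CloseFriends file is not at hand, the summaries in #7 can be approximated directly from the counts in Table 6.5. The sketch below expands the table back into one value per respondent; the helper name `expand` is an illustrative assumption, and the SD uses the sample (n − 1) form.

```python
from statistics import mean, stdev

# Counts from Table 6.5: number of respondents naming 0, 1, ..., 6 friends.
friend_counts = [0, 1, 2, 3, 4, 5, 6]
men_freq = [196, 135, 108, 100, 42, 40, 33]
women_freq = [201, 146, 155, 132, 86, 56, 37]

def expand(values, frequencies):
    """Repeat each value as many times as its frequency."""
    return [v for v, f in zip(values, frequencies) for _ in range(f)]

men = expand(friend_counts, men_freq)
women = expand(friend_counts, women_freq)

for label, data in [("Men", men), ("Women", women)]:
    print(f"{label}: n = {len(data)}, mean = {mean(data):.3f}, "
          f"SD = {stdev(data):.3f}")

observed_diff = mean(men) - mean(women)
print(f"Observed difference (men - women) = {observed_diff:.3f}")
```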

The question we want to answer is, Is the difference in mean number of close friends between

men and women, as seen in the GSS sample data, something that could plausibly have happened

by chance, by random sampling alone, if, on average, men and women have the same number

of close friends?

8. Using the same approach as in Section 6.2 with the Multiple Means applet, you can use

this applet to generate possible values of the difference in sample means under the null

hypothesis by shuffling which response values go with which explanatory variable values.

Check the Show Shuffle Options box and enter 1,000 for the Number of Shuffles. Press


the Shuffle Responses button (there will be a pause; this is a large data set!). The histogram on the right is the simulated null distribution for the difference in sample means.

Enter the observed difference in sample means in the Count Samples box and press

Count.

a. Record your estimated p-value.

b. Fill in the blanks of the following sentence, to complete the interpretation of the

p-value.

The p-value of ________ is the probability of observing

________________________________ assuming

__________________________________.

9. If the observed difference in means were larger (farther from 0), how would this affect the

size of the p-value?

10. How would increasing the sample size (all else remaining the same) change the p-value?

Why?

11. How would increasing the standard deviation of the number of close friends for both

males and females (all else remaining the same) change the p-value? Why?

As we saw earlier, another measure of the strength of evidence against the null hypothesis

would be to standardize the statistic by subtracting the hypothesized value and dividing by

the standard error of the statistic.

12. On the left side of the applet (below the Sample data window) use the Statistic pull-down

menu to select the t-statistic (again, there will be a pause). Record the value of

the t-statistic that is computed for your data. Write a one-sentence interpretation of this

value.

Notice that the standardized statistic uses the letter t (the one you should remember from

Chapter 3) instead of the z (from Chapters 1 and 5) we saw when testing proportions. This

is because the theoretical distribution used is now a t-distribution instead of a normal distribution. These t-distributions are very similar to normal distributions, especially when sample

sizes are large. The t-statistic, like the z-statistic, tells us how many standard deviations our

sample difference is above or below the mean of the null distribution, and it can be judged in the same manner. More

details about the t-statistic are given earlier in this section and in the Calculation Details

appendix.
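
As a rough illustration of the standardization described above, the sketch below subtracts the hypothesized difference (0) from the observed difference in means and divides by a standard error. It assumes the unpooled standard error, the square root of s1²/n1 + s2²/n2; the Calculation Details appendix is the authority on the formula the applet actually uses. The data and the function name `two_sample_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(group_a, group_b, hypothesized_diff=0.0):
    """Standardized difference in means (a - b), using an unpooled standard error."""
    n_a, n_b = len(group_a), len(group_b)
    # Assumption: unpooled SE = sqrt(s_a^2 / n_a + s_b^2 / n_b).
    standard_error = sqrt(variance(group_a) / n_a + variance(group_b) / n_b)
    return (mean(group_a) - mean(group_b) - hypothesized_diff) / standard_error


# Hypothetical responses (NOT the GSS data).
men = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6]
women = [1, 1, 2, 2, 3, 3, 4, 4, 5, 6]

t_observed = two_sample_t(men, women)
print(f"t-statistic (men - women) = {t_observed:.2f}")
```

The sign of the result follows the order of subtraction, and its size is read the same way as a z-statistic: values far from 0 indicate stronger evidence against the null hypothesis.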

13. In light of the value of the standardized statistic, should you expect the p-value to be large

or small? How are you deciding?

14. The histogram on the far right now displays the null distribution of simulated t-statistics.

Describe the behavior of this distribution.

15. Below the null distribution check the box Overlay t distribution. Does the t-distribution

appear to adequately predict the behavior of the shuffled t-statistics? What do you think

this suggests about whether the validity conditions will be met for these data?

16. Enter the observed value of the t-statistic in the Count Samples box and press Count.

(Remember to use the pull-down menu to specify what direction(s) you want to consider more extreme based on the alternative hypothesis.) How does the p-value from the

t-distribution (in orange) compare to the simulation-based p-value (from #8(a))?
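
The comparison in #14 through #16 can be imitated with a short simulation. This is a sketch under several assumptions: the data are randomly generated stand-ins (not the GSS sample), the standard error is unpooled, the p-values are two-sided, and the theory-based p-value uses a standard normal curve as a large-sample stand-in for the t-distribution, because Python's standard library has no t-distribution.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def t_statistic(group_a, group_b):
    """Difference in means (a - b) divided by an unpooled standard error."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def shuffled_t_statistics(group_a, group_b, num_shuffles=1000, seed=0):
    """Null distribution of t-statistics obtained by shuffling the responses."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    results = []
    for _ in range(num_shuffles):
        rng.shuffle(pooled)
        results.append(t_statistic(pooled[:n_a], pooled[n_a:]))
    return results


# Hypothetical data: 200 made-up responses per group, each between 0 and 6.
data_rng = random.Random(42)
men = [data_rng.randint(0, 6) for _ in range(200)]
women = [data_rng.randint(0, 6) for _ in range(200)]

observed_t = t_statistic(men, women)
null_t = shuffled_t_statistics(men, women, num_shuffles=1000, seed=1)

# Simulation-based p-value: share of shuffled t-statistics at least as extreme.
simulated_p = sum(abs(t) >= abs(observed_t) for t in null_t) / len(null_t)
# Theory-based approximation: two-sided tail area under a standard normal curve.
theory_p = 2 * (1 - NormalDist().cdf(abs(observed_t)))

print(f"t = {observed_t:.2f}, simulated p = {simulated_p:.3f}, theory p = {theory_p:.3f}")
```

When the groups are large and not strongly skewed, the two p-values should be close; a large gap would be a hint that the theory-based approach may not be trustworthy for those data.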

Validity conditions

The conditions required for this theory-based approach (a “two-sample t-test”) to be

valid are shown in the box.

VALIDITY CONDITIONS

The quantitative variable should have a symmetric distribution in both groups or you

should have at least 20 observations in each group and the sample distributions

should not be strongly skewed.

17. Do the validity conditions appear to be satisfied for these data? Justify your answer.

18. Based on the above analysis, state your conclusion in the context of the study. Be sure to

comment on the following.

a. Statistical significance: Do the data provide evidence that the mean number of close

friends that men have is different from the mean number of close friends women have? How

are you deciding?

b. Causation: Do the data provide evidence that how many friends one has is caused by

one’s sex? How are you deciding?

c. Generalization: To whom can you apply the results of this study? All people? All adults?

How are you deciding?

19. Estimation: When you have selected the t-statistic and the theory-based overlay, you can

also use this applet for estimating the parameter of interest. (You can also go directly to

the Theory-Based Inference applet which will also allow you to change the confidence

level.) Check the box next to 95% CI(s) for difference in means.

a. Identify, in words related to the context of this study, the relevant parameter to be

estimated here.

b. Report the 95% confidence interval for this parameter.

c. Does the 95% confidence interval calculated from the GSS sample data contain the

value 0? What does that imply? (Hint: Recall that a confidence interval is an interval

of plausible values for the parameter of interest, and the interval is calculated using the

sample statistics.)

d. Does the 95% confidence interval agree with your conclusion in #18(a)? How are you deciding?

e. Fill in the blanks of the sentence below to complete the interpretation of the 95%

confidence interval:

We are _______% confident that the mean number of close friends men have is

___________ than the mean number of close friends that women have,

by between ________ and _______ friends.

f. How would the width of the interval change if you increased the confidence level to

99%? Why?

g. How would the width of the interval change if you increased the sample size? Why?

h. How would the width of the interval change if the variability of the number of close

friends increased for both males and females? Why?
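
The interval in #19 has the form statistic ± multiplier × standard error, and the sketch below shows how its width responds to the confidence level and the sample size. It rests on assumptions that should be checked against the applet: an unpooled standard error, and a multiplier taken from the standard normal distribution as a large-sample stand-in for the t multiplier. The data and the name `diff_in_means_ci` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def diff_in_means_ci(group_a, group_b, confidence=0.95):
    """Approximate confidence interval for the difference in means (a - b)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    # Assumption: normal multiplier (about 1.96 for 95%) instead of a t multiplier.
    multiplier = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = mean(group_a) - mean(group_b)
    return diff - multiplier * se, diff + multiplier * se


# Hypothetical responses (NOT the GSS data).
men = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6]
women = [1, 1, 2, 2, 3, 3, 4, 4, 5, 6]

low_95, high_95 = diff_in_means_ci(men, women, confidence=0.95)
low_99, high_99 = diff_in_means_ci(men, women, confidence=0.99)
# Same responses repeated twice, to mimic a larger sample with a similar spread.
low_big, high_big = diff_in_means_ci(men * 2, women * 2, confidence=0.95)

print(f"95% CI width: {high_95 - low_95:.2f}")
print(f"99% CI width: {high_99 - low_99:.2f}")
print(f"95% CI width with doubled sample: {high_big - low_big:.2f}")
```

Comparing the three printed widths gives a concrete check on the reasoning asked for in (f) and (g); the same function can be rerun with more spread-out hypothetical data to explore (h).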

20. Step 6: Look back and ahead.

Looking back: Did anything about the design and conclusions of this study concern you?

Issues you may want to critique include:

• The match between the research question and the study design

• How the observational units were selected

• How the measurements were recorded

• The number of observational units in the study

• Whether what we observed is of practical value

Looking ahead: What should the researchers’ next steps be to fix the limitations in this

study and/or build on this knowledge?
