Statistics Final
Name
Institution
Running head: STATISTICS FINAL 1
1. Classify the following studies as descriptive or inferential and explain your reasons:
a. (1 pt.) A study on stress concluded that more than half of all Americans older than 18 have at least “moderate” stress in their lives. The study was based on responses of 34,000 households to the 1985 National Health Interview Survey.
This is an inferential study because it draws a conclusion about a large population (all Americans older than 18) from the analysis of a sample (34,000 households). This is typical of inferential studies, where the whole population of interest is not accessible and findings must be based on a limited amount of data. The study above took the results from a sample and generalized them to the larger American population.
b. (1 pt.) A report in a farming magazine indicates that more than 95% of the 400 largest farms in the nation are still considered family operations.
This is a descriptive study. The data were collected from the whole group of interest (the 400 largest farms), and a statistical measure (more than 95%) is used simply to describe that group. The results do not allow us to draw conclusions about a larger group.
2. Thirty-five fourth-grade students were asked the traditional question “What do you want to be when you grow up?” The responses are summarized in the following table:
Employment        Frequency   Relative Frequency
Teacher           8           0.229
Doctor            6           0.171
Scientist         3           0.086
Police Officer    9           0.257
Athlete           9           0.257
a. (2 pts.) Construct a pie chart for relative frequency
b. (2 pts.) Construct a bar graph for the relative frequencies
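Both charts are drawn from the relative frequencies, each of which is a frequency divided by the total of 35 students. A minimal Python sketch of that arithmetic, with variable names chosen only for illustration:

```python
# Frequencies as listed in the table for question 2.
frequencies = {
    "Teacher": 8,
    "Doctor": 6,
    "Scientist": 3,
    "Police Officer": 9,
    "Athlete": 9,
}

total = sum(frequencies.values())  # number of students surveyed

# Relative frequency = frequency / total, rounded to three decimals.
relative = {job: round(count / total, 3) for job, count in frequencies.items()}

for job, value in relative.items():
    print(f"{job:15s} {value:.3f}")
```

These values are the slice sizes of the pie chart and the bar heights of the bar graph.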
3. In a college freshman English course, the following 20 grades were recorded:
48
88
47
39
45
44
98
76
84
54
67
91
84
38
75
38
35
82
42
82
The grades sorted in ascending order:
Subject   Grade
1         35
2         38
3         38
4         39
5         42
6         44
7         45
8         47
9         48
10        54
11        67
12        75
13        76
14        82
15        82
16        84
17        84
18        88
19        91
20        98
Total     1257

STDDEV    21.59989
Mean      62.85
Variance  466.5553
Q1        43.0
Q2        60.5
Q3        83.0
Find the:
a. (1 pt.) Quartiles for the above data set
1st quartile = 43
2nd quartile = 60.5
3rd quartile = 83
Interquartile range = 83 − 43 = 40
b. (1 pt.) Range for the above data set
Range = 98 − 35 = 63 (the grades run from 35 to 98)
c. (1 pt.) Mean for the above data set
Mean = 1257 / 20 = 62.85
d. (1 pt.) Variance for the above data set
Sample variance = 466.5553
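The summary values for question 3 can be checked with a short Python sketch. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, which reproduces 43 and 83; other conventions (for example spreadsheet quartile functions) can give slightly different values.

```python
import statistics

grades = [48, 88, 47, 39, 45, 44, 98, 76, 84, 54,
          67, 91, 84, 38, 75, 38, 35, 82, 42, 82]

ordered = sorted(grades)
half = len(ordered) // 2  # 10 values in each half

# Quartiles as medians of the lower half, the full set, and the upper half.
q1 = statistics.median(ordered[:half])
q2 = statistics.median(ordered)
q3 = statistics.median(ordered[half:])
iqr = q3 - q1

data_range = max(grades) - min(grades)
mean = statistics.mean(grades)
variance = statistics.variance(grades)  # sample variance, divisor n - 1
std_dev = statistics.stdev(grades)      # sample standard deviation

print(q1, q2, q3, iqr, data_range, mean, round(variance, 4), round(std_dev, 5))
```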
4. The age distribution of students at a community college is given below:
Age in Years   Number of Students (f)
Under 21       4946
21 – 25        4808
26 – 30        2673
31 – 35        29036
Over 35        525
Suppose a student is selected at random. Let
A = the event the student is under 21
B = the event the student’s age is between 21 and 25
C = the event the student’s age is between 26 and 30
D = the event the student’s age is between 31 and 35
E = the event the student’s age is under 35
Event   Age in years   Number of students
A       Under 21       4946
B       21 – 25        4808
C       26 – 30        2673
D       31 – 35        29036
        Over 35        525
Total                  41988

P (B)   0.114508907
P (E)   0.987496428
a. (2 pts.) Find P (B)
P (B) = 4808 / 41988 = 0.114508907
b. (2 pts.) Find P (E)
P (E) = (4946 + 4808 + 2673 + 29036) / 41988 = 41463 / 41988 = 0.987496428
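Each probability in question 4 is a class count divided by the total number of students. A small Python sketch, in which the dictionary keys are labels chosen for illustration and event E is taken to cover every class except "Over 35":

```python
# Class counts from the age distribution table.
counts = {
    "Under 21": 4946,
    "21-25": 4808,
    "26-30": 2673,
    "31-35": 29036,
    "Over 35": 525,
}

total = sum(counts.values())

# P(B): the student is between 21 and 25.
p_b = counts["21-25"] / total

# P(E): every class except "Over 35" (assumed reading of event E).
p_e = (total - counts["Over 35"]) / total

print(total, round(p_b, 9), round(p_e, 9))
```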
5. A study of the effect of college education on job satisfaction was conducted. A contingency table is presented below:
                         Attended College   Did not Attend   Total
Satisfied with job       325                186              511
Not satisfied with job   190                369              559
Total                    515                555              1070
If you were to randomly sample an individual from this population, find the probability of selecting an individual who is
a. (2 pts.) satisfied with job
Individuals satisfied with job = 325 + 186 = 511
Total population = 1070
P (satisfied with job) = 511/1070 = 0.478
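b. (3 pts.) did not attend college given not satisfied with the job
Because the condition restricts attention to individuals who are not satisfied with the job, the denominator is that group rather than the whole population.
Individuals not satisfied with the job = 559
Of these, individuals who did not attend college = 369
P (did not attend college given not satisfied with the job) = 369/559 = 0.660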
c. (3 pts.) not satisfied with job, and did not attend college
The joint probability is read directly from the table cell; multiplying the two marginal probabilities would only be valid if the events were independent.
Number of individuals who are both not satisfied with job and did not attend college = 369
Total population = 1070
P (not satisfied with job, and did not attend college) = 369/1070 = 0.345
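The three probabilities in question 5 can be sketched from the contingency table cells in Python. The sketch takes the conditional probability as the joint count divided by the count of the conditioning event, and the joint probability as the cell count divided by the grand total; the key names are illustrative only.

```python
# Cell counts from the contingency table: (job satisfaction, college attendance).
table = {
    ("satisfied", "college"): 325,
    ("satisfied", "no_college"): 186,
    ("not_satisfied", "college"): 190,
    ("not_satisfied", "no_college"): 369,
}

total = sum(table.values())
satisfied = sum(n for (job, _), n in table.items() if job == "satisfied")
not_satisfied = total - satisfied
both = table[("not_satisfied", "no_college")]

# a. Marginal probability of being satisfied with the job.
p_satisfied = satisfied / total

# b. Conditional probability: restrict to the "not satisfied" row.
p_no_college_given_not_satisfied = both / not_satisfied

# c. Joint probability: one cell over the grand total.
p_not_satisfied_and_no_college = both / total

print(round(p_satisfied, 3),
      round(p_no_college_given_not_satisfied, 3),
      round(p_not_satisfied_and_no_college, 3))
```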
6. The random variable x is the number of houses sold by a realtor in a single month at the real estate office. Its probability distribution is:
Houses sold (x)   Probability P(x)
0                 0.09
1                 0.24
2                 0.21
3                 0.17
4                 0.03
5                 0.15
6                 0.09
7                 0.02
a. (3 pts.) Compute the mean of the random variable.
\(\mu_x = \sum x_i P(x_i)\)
Houses sold (x)   Probability P(x)   xP(x)
0                 0.09               0
1                 0.24               0.24
2                 0.21               0.42
3                 0.17               0.51
4                 0.03               0.12
5                 0.15               0.75
6                 0.09               0.54
7                 0.02               0.14
μx                                   2.72
Mean of random variable x = 2.72
b. (3 pts.) Compute the standard deviation of the random variable.
\(\sigma_x^2 = \sum (x_i - \mu_x)^2 P(x_i)\)
x   P(x)   x − μx   (x − μx)²   (x − μx)² P(x)
0   0.09   −2.72    7.3984      0.665856
1   0.24   −1.72    2.9584      0.710016
2   0.21   −0.72    0.5184      0.108864
3   0.17   0.28     0.0784      0.013328
4   0.03   1.28     1.6384      0.049152
5   0.15   2.28     5.1984      0.77976
6   0.09   3.28     10.7584     0.968256
7   0.02   4.28     18.3184     0.366368
Σ (x − μx)² P(x) = 3.6616
\(\sigma_x = \sqrt{3.6616} = 1.913530768\)
Standard deviation σx = 1.913530768
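The mean and standard deviation of the discrete random variable in question 6 can be checked with a short Python sketch that applies the two weighted sums directly. The variable names are chosen for illustration.

```python
import math

# Probability distribution of houses sold per month: x -> P(x).
distribution = {0: 0.09, 1: 0.24, 2: 0.21, 3: 0.17,
                4: 0.03, 5: 0.15, 6: 0.09, 7: 0.02}

# Mean: sum of x * P(x).
mean = sum(x * p for x, p in distribution.items())

# Variance: sum of (x - mean)^2 * P(x); standard deviation is its square root.
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())
std_dev = math.sqrt(variance)

print(round(mean, 2), round(variance, 4), round(std_dev, 4))
```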
7. According to the U.S. National Center for Health Statistics, the mean height of 18–24-year-old American males is μ = 69.7 inches. Assume the heights are normally distributed with a standard deviation of σ = 2.7 inches.
Fill in the following blanks:
68.26% of heights lie within 1σ of the mean
1σ = 2.7 inches
69.7 − 2.7 = 67 inches
69.7 + 2.7 = 72.4 inches
95.44% of heights lie within 2σ of the mean
2σ = 2.7 * 2 = 5.4 inches
69.7 − 5.4 = 64.3 inches
69.7 + 5.4 = 75.1 inches
99.74% of heights lie within 3σ of the mean
3σ = 2.7 * 3 = 8.1 inches
69.7 − 8.1 = 61.6 inches
69.7 + 8.1 = 77.8 inches
a. (1 pt.) About 68.26% of 18–24-year-old American males are between 67 and 72.4 inches tall.
b. (1 pt.) About 95.44% of 18–24-year-old American males are between 64.3 and 75.1 inches tall.
c. (1 pt.) About 99.74% of 18–24-year-old American males are between 61.6 and 77.8 inches tall.
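The three intervals in question 7 follow from the mean plus or minus one, two, and three standard deviations. The Python sketch below uses the standard library's `statistics.NormalDist` to compute the bounds and the share of the distribution inside each one; the computed shares (about 68.27%, 95.45%, 99.73%) differ from the table-based figures in the question only in the last digit.

```python
from statistics import NormalDist

heights = NormalDist(mu=69.7, sigma=2.7)

intervals = {}
for k in (1, 2, 3):
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    # Share of the distribution that falls between low and high.
    share = heights.cdf(high) - heights.cdf(low)
    intervals[k] = (round(low, 1), round(high, 1), share)
    print(f"within {k} sd: {low:.1f} to {high:.1f} inches ({share:.2%})")
```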
8. The average age of freshman college students is μ = 18.5 years, with a standard deviation σ = 0.4 years.
a. (4 pts.) Let x̅ denote the mean age of a random sample of n = 50 students. Determine the mean and standard deviation of the random variable x̅.
Average age μ = 18.5 years
x̅ is the sample mean of 50 randomly chosen students.
The mean of random variable x̅ = population mean = 18.5 years.
The standard deviation of random variable x̅ is given by \(\sigma_{\bar{x}} = \sigma / \sqrt{n}\)
Where σ = population standard deviation, which is given as 0.4 years
\(\sigma_{\bar{x}} = 0.4 / \sqrt{50} = 0.05656854\) years
b. (4 pts.) Repeat part (a) with n = 100.
The mean of random variable x̅ = population mean = 18.5 years.
σ = 0.4 years
The standard deviation of random variable x̅ is given by \(\sigma_{\bar{x}} = 0.4 / \sqrt{100} = 0.04\) years
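The two standard deviations of the sample mean in question 8 come from dividing the population standard deviation by the square root of the sample size. A minimal Python sketch, where `standard_error` is a helper name introduced only for this illustration:

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)

sigma = 0.4  # population standard deviation in years

se_50 = standard_error(sigma, 50)
se_100 = standard_error(sigma, 100)

print(round(se_50, 8), round(se_100, 8))
```

Doubling the sample size from 50 to 100 shrinks the standard error by a factor of √2, not by half.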
9. A brand of salsa comes in jars marked net weight 680 grams. Suppose the actual mean net weight μ = 680 grams with a standard deviation of 22.7 grams. Further suppose that the net weights are normally distributed.
a. (4 pts.) Determine the probability that a randomly selected jar of this brand of salsa will have a weight less than 660 grams.
Z = z-score
\(Z = (x - \mu) / (\sigma / \sqrt{n}) = (660 - 680) / (22.7 / \sqrt{1}) = -0.881057\)
P (weight less than 660 grams) = P (Z ˂ −0.881057) = 0.189143412
b. (4 pts.) Determine the probability that the 15 randomly selected jars of this brand of salsa will have a mean weight of less than 660 grams.
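\(Z = (\bar{x} - \mu) / (\sigma / \sqrt{n}) = (660 - 680) / (22.7 / \sqrt{15}) = -3.4123\)
P (15 randomly selected jars will have a mean weight less than 660 grams) = P (Z ˂ −3.4123) ≈ 0.0003, which is effectively 0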
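Both probabilities in question 9 are left-tail areas of a normal distribution: one for a single jar, one for the mean of 15 jars, whose standard deviation is σ / √15. A Python sketch using `statistics.NormalDist` from the standard library:

```python
import math
from statistics import NormalDist

mu, sigma = 680, 22.7  # grams

# a. One jar: weight ~ Normal(mu, sigma).
p_single = NormalDist(mu, sigma).cdf(660)

# b. Mean of 15 jars: sample mean ~ Normal(mu, sigma / sqrt(15)).
se_15 = sigma / math.sqrt(15)
z_15 = (660 - mu) / se_15
p_mean_15 = NormalDist(mu, se_15).cdf(660)

print(round(p_single, 4), round(z_15, 4), round(p_mean_15, 4))
```

The second probability is small but not exactly zero; it rounds to 0 only at two or three decimal places.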
10. (8 pts.) Each year a large university collects data on average beginning monthly salaries of its business school graduates. A random sample of 125 recent graduates with bachelor’s degrees in marketing has a mean starting monthly salary of x̅ = $1635 with a standard deviation of s = $288. Use these data to obtain a 90% confidence interval estimate for the mean starting monthly salary, µ, of all recent graduates with bachelor’s degrees in marketing from this university.
Confidence interval (CI) = mean − (t * S.E.M) to mean + (t * S.E.M)
Where standard error of mean (S.E.M) = \(SD / \sqrt{n}\)
Sample size n = 125
Degrees of freedom = 125 − 1 = 124
Probability = 1 − 0.9 = 0.1
From Excel: TINV (0.1, 124)
Value of critical t = 1.657234971
x̅ = $1635
SD = $288
S.E.M = \(288 / \sqrt{125}\) = 25.7595031
25.7595031 * 1.657234971 = $42.68954937
For the lower end of the range: 1635 − 42.68954937 = 1592.31
For the upper end of the range: 1635 + 42.68954937 = 1677.69
90% confidence interval = $1592.31 to $1677.69
We can be 90% confident that the mean starting monthly salary for all recent graduates with bachelor’s degrees in marketing lies between $1592.31 and $1677.69.
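The interval arithmetic in question 10 can be sketched in Python. The critical t value is taken from the Excel result quoted in the answer rather than recomputed, since the standard library has no t-distribution; everything else follows from x̅, s, and n.

```python
import math

n = 125
x_bar = 1635.0  # sample mean, dollars
s = 288.0       # sample standard deviation, dollars

# Critical t for 90% confidence, 124 degrees of freedom.
# Assumption: value copied from the answer's Excel TINV(0.1, 124) result.
t_crit = 1.657234971

sem = s / math.sqrt(n)  # standard error of the mean
margin = t_crit * sem   # half-width of the interval
lower, upper = x_bar - margin, x_bar + margin

print(round(sem, 4), round(margin, 2), round(lower, 2), round(upper, 2))
```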
11. A college administrator wants to study the average age of students who drop out of college after only attending one semester. He randomly selects 25 students who are in this group. Their ages are listed below:
35.6
20.1
18.1
21.3
20.1
19.2
18.5
18.9
18.6
18.4
19.2
18.8
17.7
21.0
19.3
24.2
19.0
19.6
18.6
19.4
20.3
20.4
19.6
19.9
19.2
Assume that the ages are normally distributed with a standard deviation of σ = 0.8 years.
a. (5 pts.) Find a 95% confidence interval for the mean age, µ, of first semester college dropouts.
Sample size n = 25
Total age = 505
Mean age = 505 / 25 = 20.2
Sample mean = 20.2 years
Because the population standard deviation σ is given and the ages are normally distributed, the interval uses the standard normal critical value rather than a t value.
Probability P = 1 − 0.95 = 0.05
Value of critical z = 1.96
x̅ = 20.2
σ = 0.8 years
S.E.M = \(0.8 / \sqrt{25}\) = 0.16
1.96 * 0.16 = 0.3136 years
95% CI = from (20.2 − 0.3136) years to (20.2 + 0.3136) years
For the lower end of the range: 19.8864 years
For the upper end of the range: 20.5136 years
The 95% confidence interval for the mean age of first-semester college dropouts is from 19.9 to 20.5 years.
b. (3 pts.) Interpret your results in part (a) in words.
The results above imply that we are 95% confident that the true population mean age of first-semester college dropouts lies within the calculated confidence interval, i.e. between 19.9 years and 20.5 years.
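A Python sketch of the interval in question 11. It assumes a z-interval, since the question supplies the population standard deviation σ = 0.8, and obtains the critical value from `statistics.NormalDist`; rounded to one decimal, it gives 19.9 to 20.5 years.

```python
import math
from statistics import NormalDist, mean

ages = [35.6, 20.1, 18.1, 21.3, 20.1, 19.2, 18.5, 18.9, 18.6, 18.4,
        19.2, 18.8, 17.7, 21.0, 19.3, 24.2, 19.0, 19.6, 18.6, 19.4,
        20.3, 20.4, 19.6, 19.9, 19.2]

sigma = 0.8  # population standard deviation given in the question
x_bar = mean(ages)

# Two-sided 95% interval: critical z leaves 2.5% in each tail.
z = NormalDist().inv_cdf(0.975)
margin = z * sigma / math.sqrt(len(ages))
lower, upper = x_bar - margin, x_bar + margin

print(round(x_bar, 2), round(z, 2), round(lower, 1), round(upper, 1))
```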
12. An insurance company stated that in 1987, the average yearly car insurance cost for a family in the U.S. was $1188. In the same year, a random sample of 37 families in California resulted in a mean cost of x̅ = $1228 with a standard deviation of s = $21.00.
a. (4 pts.) Does this suggest that the average insurance cost for a family in California in 1987 exceeded the national average?
The sample mean ($1228) is $40 above the national average ($1188), which suggests that the average insurance cost (μ1) for a family in California exceeded the national average (μ2) for the year 1987. However, the sample alone cannot be used to make a conclusive judgment, because a difference between a sample mean and a population mean can arise from sampling variation. The statement remains an assumption until a statistical procedure is used to verify it.
b. (4 pts.) State the appropriate null and alternative hypotheses for this question.
The appropriate hypotheses are:
Null hypothesis, H0: μ1 − μ2 = 0;
Alternative hypothesis, H1: μ1 − μ2 ≠ 0
c. (4 pts.) Perform the statistical test of the null hypothesis at a significance level of 5%
n = 37
Degrees of freedom (DF) = n − 1 = 37 − 1 = 36
S = $21
Solution:
Standard error of the mean (SE) = \(S / \sqrt{n} = 21 / \sqrt{37}\) = 3.4524
t-score (t) = (d’ − D) / SE = (40 − 0) / 3.4524 = 11.586
d’ is the difference between the sample mean and the national average = 1228 − 1188 = 40
D = 0 is the hypothesized difference
Finding P (t ˂ −11.586) ≈ 0; and P (t ˃ 11.586) ≈ 0
For this two-tailed test, the P-value, i.e. the probability that a t-score having 36 degrees of freedom is less than −11.586 or greater than 11.586, is approximately 0.
Since the P-value (≈ 0) is less than the significance level (0.05), the null hypothesis can be rejected, i.e. the data do not support the claim that the average insurance cost (μ1) for a family in California was equal to the national average (μ2) for the year 1987. Because the t-score is positive, the California average was the higher of the two.
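The test statistic for question 12 can be sketched in Python as a one-sample t statistic. The sketch assumes the standard error is s / √n with n = 37; the P-value is not computed because the standard library has no t-distribution, so the statistic would be compared with a critical value from a t table or spreadsheet.

```python
import math

n = 37
x_bar = 1228.0  # sample mean cost in California, dollars
s = 21.0        # sample standard deviation, dollars
mu_0 = 1188.0   # national average, the hypothesized mean

se = s / math.sqrt(n)         # standard error of the mean
t_stat = (x_bar - mu_0) / se  # one-sample t statistic
df = n - 1

print(round(se, 4), round(t_stat, 3), df)
```

A t statistic above 11 with 36 degrees of freedom lies far beyond any conventional critical value, so the conclusion to reject the null hypothesis does not hinge on the exact P-value.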
13. (10 pts.) A computerized tutorial center at a local college wants to compare two different statistical software programs. Students going to the center are matched with other students having similar abilities in statistics (assume the matching process creates matched pairs acceptable for use with the appropriate paired test statistic for the null hypothesis of no difference). A random sample of 10 student pairs is selected. For each pair, one student is randomly assigned program A, the other program B. After two weeks of using the program, the students are given an evaluation test. Their grades are:
Program A   Program B
64          62
68          72
75          79
97          57
90          91
55          56
68          88
64          89
92          77
95          76
Do the data provide evidence, at the 5% significance level, that there is a difference in mean student performance between the two software programs? Assume that the population of all possible paired differences is approximately normally distributed. In support of your decision show the null and alternative hypothesis and the value of the test statistics computed for assessing the significance level.
Pairs   Program A   Program B   Difference, d   (d – d’)   (d – d’) squared
1       64          62          2               −0.1       0.01
2       68          72          −4              −6.1       37.21
3       75          79          −4              −6.1       37.21
4       97          57          40              37.9       1436.41
5       90          91          −1              −3.1       9.61
6       55          56          −1              −3.1       9.61
7       68          88          −20             −22.1      488.41
8       64          89          −25             −27.1      734.41
9       92          77          15              12.9       166.41
10      95          76          19              16.9       285.61
Total                           21                         3204.9

Mean of d = d’   2.1
Probability P    0.733003906
n = 10
Degrees of freedom (DF) = n − 1 = 10 − 1 = 9
Solution:
Hypothesis
Null hypothesis H0: μd = 0
Alternative hypothesis Ha: μd ≠ 0
Conducting a matched-pairs t-test of the null hypothesis:
Standard deviation of the differences = S
\(S = \sqrt{\sum (d - d')^2 / (n - 1)} = \sqrt{3204.9 / (10 - 1)} = \sqrt{356.1} = 18.8706\)
Standard error of the mean difference (SE) = \(S / \sqrt{n} = 18.8706 / \sqrt{10} = 5.96741\)
t-score test statistic (t) = (d’ − D) / SE = (2.1 − 0) / 5.96741 = 0.3519
d’ is the mean difference between the sample pairs = 2.1
D = 0 is the hypothesized mean difference between population pairs
For this two-tailed test, the P-value, i.e. the probability that a t-score having 9 degrees of freedom is less than −0.3519 or greater than 0.3519, is 0.733, as found using Excel’s formula.
Interpretation of results:
Since the P-value (0.733) is greater than the significance level (0.05), the null hypothesis cannot be rejected. The data do not provide evidence of a difference in mean student performance between the two software programs.
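The matched-pairs calculation for question 13 can be sketched in Python. The grade lists below are an assumption reconstructed from the worked differences (in particular, pair 9 is taken as 92 for program A and 77 for program B, which matches the difference of 15 used in the calculation). The P-value is not recomputed because the standard library has no t-distribution.

```python
import math
import statistics

# Assumed pairing, consistent with the differences in the worked solution.
program_a = [64, 68, 75, 97, 90, 55, 68, 64, 92, 95]
program_b = [62, 72, 79, 57, 91, 56, 88, 89, 77, 76]

# Paired differences d = A - B.
diffs = [a - b for a, b in zip(program_a, program_b)]

n = len(diffs)
mean_d = statistics.mean(diffs)   # d'
s_d = statistics.stdev(diffs)     # sample standard deviation of d
se = s_d / math.sqrt(n)           # standard error of the mean difference
t_stat = (mean_d - 0) / se        # hypothesized mean difference is 0

print(diffs, mean_d, round(s_d, 4), round(se, 5), round(t_stat, 4))
```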
14. Ten students in a graduate program were randomly selected. Their grade point averages (GPAs) when they entered the program were between 3.5 and 4.0. The following data were obtained regarding their GPAs on entering the program versus their current GPAs:
Entering GPA   Current GPA
3.5            3.5
3.8            3.6
3.9            3.5
3.7            3.9
4.0            3.5
4.0            3.7
3.6            3.6
3.9            3.6
3.7            4.0
4.0            3.9
a. (3 pts.) Determine the linear regression equation for the data.
A linear regression line has the formula Y = A + BX
Subject   X      Y      XY       X2       Y2
1         3.5    3.5    12.25    12.25    12.25
2         3.8    3.6    13.68    14.44    12.96
3         3.9    3.5    13.65    15.21    12.25
4         3.7    3.9    14.43    13.69    15.21
5         4      3.5    14       16       12.25
6         4      3.7    14.8     16       13.69
7         3.6    3.6    12.96    12.96    12.96
8         3.9    3.6    14.04    15.21    12.96
9         3.7    4      14.8     13.69    16
10        4      3.9    15.6     16       15.21
Total     38.1   36.8   140.21   145.45   135.74
X – Entering GPA
Y – Current GPA
n = 10
\(A = [(\sum Y)(\sum X^2) - (\sum X)(\sum XY)] / [n(\sum X^2) - (\sum X)^2]\)
\(B = [n(\sum XY) - (\sum X)(\sum Y)] / [n(\sum X^2) - (\sum X)^2]\)
X mean = 3.81
Y mean = 3.68
Slope B = 0.0069
Intercept A = 3.6536
The linear regression equation: Y = 3.6536 + 0.0069 X
b. (3 pts.) Graph the regression equation
c. (3 pts.) Describe the apparent relationship between the entering GPAs and current GPAs for students in this graduate program.
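The apparent relationship between the entering GPAs (X) and the current GPAs (Y) is that the current GPAs are lower than the entering GPAs for most students (6 out of the 10 students). The slope of the fitted line is close to zero, so there is almost no linear relationship between the two.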
d. (3 pts.) What does the slope for the regression line represent in terms of current GPAs?
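The slope of the regression line (0.0069) is the predicted change in current GPA for each one-point increase in entering GPA. It shows that the current GPAs tend to increase only minimally even for a big increase in the entering GPAs.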
e. (3 pts.) Use the regression equation to predict the current GPA of a student with an entering GPA of 3.6
The linear regression equation: Y = 3.6536 + 0.0069 X
Entering GPA = X = 3.6
Current GPA = Y = 3.6536 + 0.0069 (3.6) = 3.67844
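The regression coefficients and the prediction in question 14 can be sketched in Python from the column sums, using the same least-squares formulas for A and B. The function name `predict` is introduced only for illustration. With unrounded coefficients the prediction is about 3.6785, which agrees with the hand calculation to three decimal places.

```python
entering = [3.5, 3.8, 3.9, 3.7, 4.0, 4.0, 3.6, 3.9, 3.7, 4.0]  # X
current = [3.5, 3.6, 3.5, 3.9, 3.5, 3.7, 3.6, 3.6, 4.0, 3.9]   # Y

n = len(entering)
sum_x = sum(entering)
sum_y = sum(current)
sum_xy = sum(x * y for x, y in zip(entering, current))
sum_xx = sum(x * x for x in entering)

# Least-squares slope (B) and intercept (A).
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

def predict(x: float) -> float:
    """Predicted current GPA for an entering GPA of x."""
    return intercept + slope * x

print(round(slope, 4), round(intercept, 4), round(predict(3.6), 4))
```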
[Figure: Regression Line; x-axis: Entering GPA; y-axis: GPA]
[Figure: Relative frequency pie chart, by employment]
[Figure: Bar graph; x-axis: Employment; y-axis: Relative frequency]