Statistics Question

List of materials, Statistical Analysis, 2022
For the final exam you will be tested on the following:
• Marginal, joint, and conditional probability under statistically independent and
dependent conditions
• Use the rules of complements, addition, and multiplication
• Use and interpret Bayes’ Theorem
• Answer questions on useful distributions including Bernoulli, Binomial,
Poisson, Uniform, Exponential, Normal, t, and Chi-squared
• Use counting rules (Combination and Permutation) to calculate probabilities
• Calculate standard deviations (errors) for sample means and proportions
• Calculate and interpret Confidence Intervals for population means and
proportions using z and/or t statistics
• Conduct and interpret Hypothesis tests for population means and proportions
using z or t statistics, for one or two populations, including p-values
• Conduct and interpret the two Chi-square tests; understand degrees of freedom
• Conduct and interpret ANOVA tests; understand degrees of freedom
• Interpret each part of a linear regression result table
• Explain quantitatively the relationship between the independent variables and
the dependent variable using the coefficients
• Understand R², residuals, and confidence intervals for linear regressions
• Understand that correlation does not imply causality – come up with
alternative stories for linear regression results
• Understand other “pitfalls” of linear regression models
Note: Since the final will be covering the whole course, it is important for you to
understand which part(s) each question is about, i.e., which tool(s) we learned in
this course should be used for each application.
Final Exam Practice Questions
Notes:
• Our final will include short questions (multiple choice) and full questions
(calculation and/or interpretation)

Here we provide you 15 examples of “short questions”. The goal is to show you the
style of short questions we will see in the final. It also provides extra examples for
you to practice.

For the style of “full questions”, you can review our quiz/homework questions as
well as the exercise questions in the textbook.

The list of points of knowledge covered here and that covered in our final are not
necessarily (exactly) the same – so you should still follow your own pace for
reviewing materials, instead of only working on the practice questions.
Section A: Multiple Choices I (only one correct option in each question)
1. A recent survey conducted by the personnel manager of a major enterprise
resources planning (ERP) company showed that 35% of the employees were dissatisfied
with their salary, 80% were satisfied with their work assignments, 15% were dissatisfied
with their work hours, 17% were dissatisfied with both their salary and work assignments,
and 8% were dissatisfied with both their work assignments and work hours. What is the
percentage of employees who are satisfied with both their salary and work assignments?
A) 0.38
B) 0.02
C) 0.62
D) 0.52
E) None of the above
2. Toss a fair die four times. What is the probability that you get an even number on
the first toss, an odd number on the second toss, and two 6’s on the last two tosses?
A) 1/2
B) (1/2)*(1/2)*(1/6)*(1/6)
C) (1/6)*(1/6)*(1/6)*(1/6)
D) 0
E) None of the above.
3. Suppose the heights of graduate students at University Tallmen are approximately
normally distributed with mean 180 cm and standard deviation 15 cm, and the heights of
undergraduate students at the same university are approximately normally distributed with
mean 180 cm and standard deviation 10 cm. Find the height x such that about 16% of
the undergraduate students are at least of height x.
A) 195 cm
B) 180 cm
C) 165 cm
D) 170 cm
E) 190 cm
4. Suppose we have constructed a 95% confidence interval for the population
proportion parameter: [0.1, 0.7]. Which of the following interpretations is correct?
A) In repeated sampling, if we construct interval estimates using the same formula as
we did for calculating [0.1, 0.7], 95% of these intervals cover the true proportion.
B) 95% of the observations in the sample lie in the given interval.
C) We are certain that the true population proportion is in [0.1, 0.7].
D) None of the above
5. A and B are two events. P(A|B) = 0.3, P(B) = 0.5, P(A) = 0.4. What is P(A∩B) + P(B|A)?
A) Cannot determine based on given information.
B) 0.525
C) 0.575
D) 0.5
E) 0.45
6. Many people claim that at least 45% of residents in MD rank Maryland University
above JHU. You want to show that the percentage is much lower. How should you formulate
the alternative hypothesis for your test?
A) p ≥ 0.45
B) p < 0.45
C) p ≠ 0.45
D) None of the above
7. Suppose your portfolio includes 100 shares of stock A plus 100 dollars in cash. Let X
denote the return of one share of stock A, and you know E(X) = 3 dollars. What is the
expected value of your portfolio?
A) 400 dollars
B) 103 dollars
C) 300 dollars
D) 100 dollars
8. A commuter owns two cars, one a compact and one an SUV. About 80% of the time he uses
the compact to travel to work and the SUV for the remaining 20%. When he uses the
compact car he gets home by 3 p.m. about 70% of the time; if he uses the SUV he gets home
by 3 p.m. about 60% of the time. If on one day he gets home after 3 p.m., what is the
probability that he used the compact car?
A) 40%
B) 70%
C) 75%
D) 50%
9. The restaurant chain FastFoodGo surveys 400 customers with two questions: (i) what’s
the quality of the food (4 levels)? (ii) Will you recommend to a friend? Summarizing all
questionnaires produced the following table of joint probabilities (one entry is missing).
Rating      Will recommend   Will not recommend
Poor        0.02             0.10
Fair        0.08             (missing)
Good        0.35             0.14
Excellent   0.20             0.02
Let P1 denote the probability that a customer who gave the restaurant a rating of Poor
will recommend the restaurant to a friend. Let P2 denote the probability that a customer
who will not recommend the restaurant to a friend gave an “Excellent” rating to the
restaurant. What is P1 + P2?
A) 0.08
B) 0.17
C) 0.65
D) 0.22
Use the following regression output to answer questions 10-12:
A sample of data was used to create a regression model to predict life expectancy (in
years) in various countries as a function of health expenditure per capita (in $1k),
percentage of smokers, alcohol consumption per capita (in Liters), and percentage of obese
population. [Note: these sample questions are created based on the example covered in our
lecture. This is not necessarily the case for our final exam.]
Residuals:
    Min      1Q  Median      3Q     Max
-5.7286 -0.8571  0.0276  1.8234  3.4527
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        80.635844   2.226717  36.213  < 2e-16 ***
health_expenditure  0.763668   0.255114   2.993  0.00598 **
smoker_perc        -0.005631   0.086649  -0.065  0.94868
alcohol            -0.104202   0.173024  -0.602  0.55223
obese_perc         -0.120079   0.074961  -1.602  0.12126
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.192 on 26 degrees of freedom
(4 observations deleted due to missingness)
Multiple R-squared: 0.3622, Adjusted R-squared: 0.2641
F-statistic: 3.691 on 4 and 26 DF, p-value: 0.0165
10. Among all data points (countries), what is the maximum difference (in absolute value)
between the predicted life expectancy and actual life expectancy?
A) 5.73
B) 3.45
C) 80.64
D) 0.76
E) It cannot be determined given the current info.
11. Are the coefficients of obese_perc and alcohol significantly different from 0 at a 5%
significance level?
A) obese_perc is significant, but alcohol is not significant
B) obese_perc is not significant, but alcohol is significant
C) obese_perc is significant, and alcohol is significant
D) obese_perc is not significant, and alcohol is not significant
E) We cannot determine their significance given the current info.
12. Which of the following is a correct explanation based on the result:
A) A 1% increase in health_expenditure per capita is associated with a 0.76% increase in
life_expectancy on average
B) A $1 increase in health_expenditure per capita is associated with a 0.76% increase in
life_expectancy on average
C) A $1k increase in health_expenditure per capita is associated with a 0.76-year increase
in life_expectancy on average
D) A $1 increase in health_expenditure per capita is associated with a 0.76-year increase
in life_expectancy on average
Section B: Multiple Choices II (one or more correct options in each question)
[Note: these will be implemented as True/False questions.]
13. Which of the following is/are correct regarding t-distribution and chi-square
distribution?
A) Both distributions have density curves (pdf) symmetric about 0.
B) Both distributions have one parameter.
C) Both distributions take only non-negative values.
D) The area under the t-distribution density curves is bigger than that under the
chi-square distribution density curves.
14. Consider a random sample (of size n), whose observations are independently and
identically drawn from a distribution. Regarding the sampling distribution of the sample
mean, which of the following is/are correct?
A) The sampling distribution of the sample mean is always normally distributed.
B) The expected value of the sample mean is smaller than the population mean.
C) The standard deviation of the sample mean decreases as the sample size increases.
D) When the population standard deviation is unknown, the sample mean follows a
t-distribution with n-1 degrees of freedom.
15. Which of the following statements are true regarding the standard normal and
t-distribution(s)?
A) As the degrees of freedom (df) increase, the t-distribution looks more and more like
the standard normal distribution
B) They are both symmetric around 0
C) z_{0.05} – t_{0.05,20} ≥ 0
D) They are both bell-shaped
Suggested Answers:
1: C) 1-(0.35+0.2-0.17)=0.62
2: B)
3: E) Note the info about graduate students is irrelevant
4: A) This is the definition of confidence interval
5: B) 0.3*0.5+(0.3*0.5)/0.4
6: B) You wish to show that “the percentage is much lower than 45%”, hence you would
like to put that statement as the alternative in the test. (Following the logic of proof by
contradiction)
7: A) E(100X+100) = 100*E(X)+100 = 400.
8: C)
9: D) First, the missing cell is 1-(0.02+0.08+0.35+0.20+0.1+0.14+0.02)=0.09.
P1 = 0.02/(0.02+0.10), and P2 = 0.02/(0.10+0.09+0.14+0.02), so P1+P2 = 0.22.
10: A) This is asking for the largest residual (in abs. value)
11: D)
12: C)
13: B) only
14: C) only
15: A), B) and D)

Statistical Analysis
Class 1. Probability
Prof. Yiqing Xing

Logistics
• Instructor: Prof. Yiqing Xing xingyq@jhu.edu
• TA: Mr. Ronghao Lu, rlu23@jhu.edu
• Office hours (Zoom):
• Tuesdays 12 – 2 pm; starting next week
• Best way to reach us: emails
• Zoom/recordings available upon your request.
Johns Hopkins Carey Business School_Probability_ Slide 1

Grading
40% Final + 50% Homework/Quiz + 10% Attendance
• 3 in-class quizzes – classes 2, 4 and 6
• 2 Homework
• 1 Empirical exercise
• Homework group: 3-4 per group. Will send out google docs for you to sign up; by this
Sunday (09/04)
• No pop-up quizzes

Overview - Probability
1. Basic Concepts
2. Probability Rules
3. Conditional Probability
4. Bayes’ Theorem
5. Counting Rules

Experiment, Outcome, Sample Space
Experiment - process of observation that has an uncertain outcome
Outcome - potential result of an experiment
Sample Space (S) - the set of all possible outcomes from an experiment
Common elementary examples
1. Toss a coin
2. Draw a card
[Figure: a 52-card deck, ranks A 2 3 4 5 6 7 8 9 10 J Q K in each of four suits]

Event
Event - a collection of some outcomes
• NA or #(A): number of outcomes in an event A
• Sample space S itself is an event, with its number of outcomes NS
Example [Experiment: “Draw a card”]
Some events
A: “getting an Ace”
B: “getting a [suit]”
C: “getting a 5 or K”
NA = ?, NB = ?, NC = ?

Probability
Probability - the chance that something will happen, expressed in fractions, decimals or
percentages, such that
1. If o is an outcome, then 0 ≤ P(o) ≤ 1
2. The probabilities of all of the sample space outcomes must sum to 1.
The probability of an event is the sum of the probabilities of its outcomes
• E is a certain event if P(E) = 1
• E is an impossible event if P(E) = 0

Probability Rules
• Rule of complements
• Addition rule
• Multiplication Rule (later in this lecture)

Rule of Complements
The complement (Ā) of an event A is the set of all sample space outcomes not in A
Rule of Complements: P(Ā) = 1 − P(A)
[Figure: Venn diagram of A and Ā]
Next: the Addition Rule …

Union and Intersection
The union of A and B is the event that includes outcome(s) in either A or B, or both.
Written as: A∪B
The intersection of A and B is the event that includes outcome(s) that belong to both
A and B. Written as: A∩B

Union and Intersection
Example [Experiment: “Draw a card”]
A: “an A”
B: “a [suit]”
A∪B: an A or a [suit]
A∩B: the A of [suit]
B̄: a card of one of the other three suits
[Figure: a 52-card deck]

The Addition Rule 1: Mutually Exclusive Events
A and B are mutually exclusive if they have no outcomes in common
In other words: P(A∩B) = 0
If A and B are mutually exclusive: P(A∪B) = P(A) + P(B)

The Addition Rule (cont.)
If A and B are non-mutually exclusive:
P(A∪B) = P(A) + P(B) − P(A∩B)
where P(A∩B) is the joint probability of A and B both occurring together

Exercise for you
[Experiment: “Draw a card”]
A: “getting an A”, P = 1/13
B: “getting a [suit]”, P = 1/4
C: “getting a 5 or K”, P = 2/13
A and C are mutually exclusive: P(A∪C) = 1/13 + 2/13 = 3/13
A and B are not mutually exclusive: P(A∪B) = 1/13 + 1/4 – 1/52 = 16/52 = 4/13

Conditional Probability
Source: https://watch.global.nba.com/players/hotzones/#!/stephen_curry

Conditional Probability
“Researchers found 19 infections among 488 unvaccinated people. They also found 24
infections among 2,352 fully vaccinated people.”
A: infection; B: vaccinated
            vacc   unvacc   total
infection   24     19       43
no infect
total       2352   488      2840
• P(A) = 43/2840 = 1.5%
• Suppose you know someone is vaccinated, then the probability of A (infection) becomes
24/2352 = 1.0%
The above is called conditional probability, written as P(A|B), read as “probability of
event A given event B”

Calculating Conditional Probability
P(A | B) = P(A∩B) / P(B)
P(inf | vacc) = P(inf ∩ vacc) / P(vacc) = (24/2840) / (2352/2840) = 24/2352 = 1.0%
P(inf | unvacc) = P(inf ∩ unvacc) / P(unvacc) = 19/488 = 3.9%
Exercise: P(no infect | vacc) = ??
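The conditional-probability calculation above can be checked with a few lines of Python. This is a minimal sketch that uses only the counts quoted on the slide; the variable and function names are illustrative, not part of the course material.

```python
from fractions import Fraction

# Counts quoted on the slide (infections and group sizes).
infected = {"vacc": 24, "unvacc": 19}
group_total = {"vacc": 2352, "unvacc": 488}
n = sum(group_total.values())  # 2840 people in total


def p_infection_given(group):
    """P(infection | group) = P(infection and group) / P(group)."""
    p_joint = Fraction(infected[group], n)
    p_group = Fraction(group_total[group], n)
    return p_joint / p_group


p_inf = Fraction(sum(infected.values()), n)        # marginal P(infection)
p_inf_given_vacc = p_infection_given("vacc")
p_inf_given_unvacc = p_infection_given("unvacc")

print(f"P(inf)          = {float(p_inf):.1%}")
print(f"P(inf | vacc)   = {float(p_inf_given_vacc):.1%}")
print(f"P(inf | unvacc) = {float(p_inf_given_unvacc):.1%}")
```

Because the common denominator 2840 cancels, the conditional probability equals the count within the group divided by the group size, which is why the slide can shortcut to 24/2352.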

Independence of Events
Two events A and B are independent if and only if: P(A | B) = P(A)
This is equivalent to P(B | A) = P(B)
[Assumes P(A) and P(B) greater than zero]
Question: Are “vaccinated” and “infected” independent?
Exercise: Name an experiment with two independent events

Multiplication Rule: Calculating the Joint Probability
The joint probability of A and B (the intersection of A and B):
P(A∩B) = P(A) P(B | A) = P(B) P(A | B)
[simply reverse the definition of conditional probability]
If A and B are independent, this becomes P(A∩B) = P(A) P(B)
Example:
B: “getting a red card”: P(B) = 1/2
C: “getting a K”: P(C) = 1/13
The prob. of getting “a red K” = P(B∩C) = P(B)P(C) = 1/26

Bayes’ Theorem: Example
Why?? Unvaccinated → higher chances of infections → more costly

Example: Age and Cancer
An insurance company wishes to know whether Bob has cancer. You know Bob is above
65 years old.
Suppose A = “Someone has cancer”; B = “Someone is 65+”
In data you found that P(A) = 1%, P(B) = 15%, and P(B|A) = 45%.
What’s the likelihood of Bob having cancer, i.e. P(A|B)?

Example: Age and Cancer (cont.)
P(A|B) = P(A∩B) / P(B) = P(A) P(B|A) / P(B) = (1% × 45%) / 15% = 3%
(Bayes’ Rule/Theorem)

Learning From the Evidence
Belief before: P(A) (“prior”) → Evidence → Belief after: P(A|B) (“posterior”)
P(A|B) > P(A): given the evidence (65+), it is more likely for
Bob to have cancer
Why?
• Evidence more likely to occur under A: P(B|A) > P(B|Ā)
(recall: Ā = “Not A”)
This is pretty much Bayes’ Theorem!
Bayes Updating
• Two competing hypotheses A1 and A2
• You observe some evidence B
• You have some statistics (e.g., from frequencies in a large sample) about
• P(A1) and P(A2)
• P(B|A1) and P(B|A2)
• You can have an updated belief about the likelihood of A1, A2
Belief before: P(A1) (“prior”) → News/evidence: B → Belief after: P(A1|B) (“posterior”)
Bayes Updating: Example
Jack and Bill sell insurance in your insurance agency.
• Jack sells 80% of the policies, and Bill sells the rest.
• 10% of the policies Jack sells have a Claim filed within one year,
compared to 25 percent of those sold by Bill.
A client announces his intention to file a claim. What is the
probability Jack sold him the policy?
Model/Math:
P(Jack) = 0.80
P(Bill) = 0.20
P(Claim | Jack) = 0.10
P(Claim | Bill) = 0.25
Q: P(Jack | Claim) = ?
Bayes’ Theorem: Insurance Example
P(Jack) = 0.80, P(Bill) = 0.20
P(Claim | Jack) = 0.10, P(Claim | Bill) = 0.25
Recall Bayes’ Theorem:
P(Jack | Claim) = P(Jack and Claim) / P(Claim)
However, you do not know P(Jack and Claim) and P(Claim) directly…
How to calculate them?
P(Jack and Claim) = P(Claim | Jack) × P(Jack)   (Multiplication Rule)
                  = 0.10 × 0.80 = 0.08
Similarly,
P(Bill and Claim) = 0.25 × 0.20 = 0.05
Bayes’ Theorem: Insurance Example
P(Jack | Claim) = P(J and Claim) / [P(J and Claim) + P(B and Claim)]
                = P(J) P(Claim | J) / [P(J) P(Claim | J) + P(B) P(Claim | B)]
                = 0.08 / (0.08 + 0.05) = 8/13 ≈ 61.5%

           Jack   Bill   Total
Claim      .08    .05    .13
No Claim   .72    .15    .87
Total      .80    .20    1.0
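The insurance example can be reproduced step by step in Python. The sketch below is only an illustration of the multiplication rule followed by Bayes' theorem with the numbers from the slide; the dictionary names are chosen here for readability.

```python
# Priors and conditional claim rates from the slide.
prior = {"Jack": 0.80, "Bill": 0.20}
p_claim_given = {"Jack": 0.10, "Bill": 0.25}

# Multiplication rule: P(seller and Claim) = P(seller) * P(Claim | seller).
joint = {seller: prior[seller] * p_claim_given[seller] for seller in prior}

# Total probability of a claim: sum of the joint probabilities.
p_claim = sum(joint.values())

# Bayes' theorem: P(seller | Claim) = P(seller and Claim) / P(Claim).
posterior = {seller: joint[seller] / p_claim for seller in prior}

for seller in prior:
    print(f"P({seller} and Claim) = {joint[seller]:.2f}, "
          f"P({seller} | Claim) = {posterior[seller]:.1%}")
```

The two posteriors sum to 1, which is a quick sanity check that the joint probabilities were normalized by the right total.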
Example: HIV Testing
0.6% of the population has HIV.
An HIV test has a 1% type-I error and a 0.1% type-II error, i.e.
• 99.9% of people with HIV test positive
• 1% of people without HIV test (falsely) positive
If Allen gets a positive result from the test, how likely is it that
he really has HIV?
Define HIV= “someone has HIV”, NoH= “someone has no
HIV”, Pos = “Positive result”. We have:
P(HIV) = 0.6%, P(NoH) = 99.4%, P(Pos|HIV) = 99.9%,
P(Pos|NoH) = 1%
Q: P(HIV|Pos) = ?
Example: HIV Testing (cont.)
P(HIV) = 0.6%, P(NoH) = 99.4%, P(Pos|HIV) = 99.9%,
P(Pos|NoH) = 1%. Q: P(HIV|Pos) = ?
P(HIV|Pos) = P(HIV) P(Pos|HIV) / [P(HIV) P(Pos|HIV) + P(NoH) P(Pos|NoH)]
           = 0.5994% / (0.5994% + 0.994%) = 0.5994% / 1.5934% ≈ 37.6%
Upon receiving a positive test result, Allen only has a 37.6%
chance of truly having HIV!
Example: HIV Testing (cont.)
Why is the ending belief P(HIV | Pos) = 37.6% so low?
Belief before (“prior”): P(HIV) = 0.6%
→ News/evidence: Positive result
→ Belief after (“posterior”): P(HIV|Pos) = 37.6%
• Strong evidence (“Pos”) ≠ Strong posterior
• The former only tells you by how much to update your belief
• In this case, belief is updated a lot: from 0.6% to 37.6%
• Though the ending belief does not look very strong …
With More Than Two Events …
More Generally …
k mutually exclusive events (A1, …, Ak), one of which must be true
• Knowing an event B
• The posterior probability of Ai is
$$P(A_i \mid B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(A_i)\,P(B \mid A_i)}{P(B)} = \frac{P(A_i)\,P(B \mid A_i)}{P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2) + \dots + P(A_k)P(B \mid A_k)}$$
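The general formula translates directly into a small function. The sketch below is an illustration, not course code: `bayes_posteriors` is a hypothetical helper name, and it assumes the hypotheses are mutually exclusive and exhaustive, as the slide requires. It is applied to the HIV testing numbers from the earlier example.

```python
def bayes_posteriors(priors, likelihoods):
    """Return P(A_i | B) for each hypothesis A_i.

    priors[i]      = P(A_i), assumed mutually exclusive and summing to 1
    likelihoods[i] = P(B | A_i)
    """
    joints = [p * like for p, like in zip(priors, likelihoods)]  # P(A_i and B)
    p_evidence = sum(joints)                                     # P(B)
    return [j / p_evidence for j in joints]


# HIV example: hypotheses are (HIV, NoH), evidence B is a positive test.
post_hiv, post_noh = bayes_posteriors([0.006, 0.994], [0.999, 0.01])
print(f"P(HIV | Pos) = {post_hiv:.1%}")
print(f"P(NoH | Pos) = {post_noh:.1%}")
```

The same function works for any number of hypotheses k, since the denominator is just the sum of all k joint probabilities.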
Intuitive Method: Flipping Probability Tree
P(HIV) = 0.6%, P(NoH) = 99.4%, P(Pos|HIV) = 99.9%, P(Pos|NoH) = 1%
Probability Tree (HIV? → Test Result?), joint probabilities:
• HIV (0.6%) and Pos (99.9%): 0.5994%
• No HIV (99.4%) and Pos (1%): 0.994%
Probability Tree (flipped: Test Result? → HIV?):
Step 0: draw the flipped tree
Step 1: copy the joint prob’s in the flipped tree (Pos and HIV: 0.5994%; Pos and No HIV: 0.994%)
Step 2: get the total prob. for “Pos”: 0.5994% + 0.994% = 1.5934%
Step 3: calculate the sought-after posteriors: P(HIV|Pos) = 37.6%, P(No HIV|Pos) = 62.4%
Intuitive Method: Flipping Probability Tree
Probability Tree (Who sells policy? → Claim?), joint probabilities:
• Jack (80%) and Claim (10%): 8%
• Bill (20%) and Claim (25%): 5%
Probability Tree (flipped: Claim? → Who sells policy?):
Step 0: draw the flipped tree
Step 1: copy the joint prob’s in the flipped tree (Claim and Jack: 8%; Claim and Bill: 5%)
Step 2: get the total prob. for “Claim”: 8% + 5% = 13%
Step 3: calculate the sought-after posteriors: P(Jack|Claim) = 8/13, P(Bill|Claim) = 5/13
Overview – Probability
1. Basic Concepts
2. Probability Rules
3. Conditional Probability
4. Bayes’ Theorem
5. Counting Rules
(Notes: Multiple Trials)
Example combinations
How many ways to form a homework group of 3 among 8
students (A, B, C, D, E, F, G, H)?
[Note: Treat groups ABC, ACB, BAC, etc. as the same.]
8C3 = 8! / (3!(8 − 3)!) = 8! / (3!·5!) = (8 × 7 × 6 × 5!) / (3 × 2 × 1 × 5!) = 8 × 7 = 56
Example permutations
Number of ways to award 3 prizes among 8 students?
“Order” is important: 1st, 2nd and 3rd prizes are different!
Recall counting rule: 8 x 7 x 6 = 336
8 x 7 x 6 = (8 x 7 x 6 x 5 x 4 x 3 x 2 x 1) / (5 x 4 x 3 x 2 x 1)
= 8! / (8-3)!
Permutations vs Combinations
In permutations ordering is important.
e.g. Ways to award 3 prizes (1st, 2nd and 3rd) among 8
participants?
(Sam, Joe, Ray) is different from (Joe, Ray, Sam).
When the ordering is not important, we use combinations.
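Both counting rules are available in Python's standard library (`math.comb` and `math.perm`, Python 3.8 or later). The short sketch below re-derives the two examples with 8 students and shows how the counts are related; the variable names are illustrative.

```python
import math

n, r = 8, 3

# Combinations: order does not matter (homework groups of 3 among 8).
groups = math.comb(n, r)

# Permutations: order matters (1st, 2nd and 3rd prizes among 8).
prize_awards = math.perm(n, r)

# Each unordered group of 3 can be ordered in 3! ways,
# so permutations = combinations * r!.
orderings_per_group = math.factorial(r)

print(groups, prize_awards, orderings_per_group)
```

Dividing the permutation count by r! removes the double counting of orderings such as ABC, ACB, BAC, which is exactly the difference between the two formulas.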
Example permutations
The CEO of NanoSOFT must select five people from a list
of 15 young executives to serve as examples of outstanding
managerial talent. Each executive is to receive a monetary
reward. The first one selected will get the highest bonus,
the second one the second highest, and so on.
15P5 = 15! / (15 − 5)! = (15 × 14 × 13 × 12 × 11 × 10!) / 10! = 360,360
Notes: Multiple Trials
[Use P{} to represent probability related to multiple trials]
• P{AB}: Probability of “A on trial 1, followed by B on trial 2”
• P{B|A}: Probability of B on trial 2, given A (has occurred on
trial 1)
When trials are independent:
• P{AB} = P{A} × P{B}
• P{B|A} = P{B} = P{B|B}
Multiple Trials: Example
Consider the experiment of “tossing a coin twice”
• Name a possible outcome: HT
• What’s the sample space? {TT, HT, TH, HH}
• P{HT} = P(H)P(T) = 0.5 * 0.5 = 0.25
Now, tossing a coin 10 times
• Which one is more likely?
A: H H H H H H H H H H
B: H T T H T H H T T H
• Which one is more likely?
A: 10 H
B: 5 H 5 T
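The two "which one is more likely?" questions differ in whether the order of the tosses is specified. A minimal sketch, assuming a fair coin and independent tosses as in the slide, makes the distinction concrete; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n = 10
p_one_sequence = Fraction(1, 2) ** n  # any fully specified sequence of 10 tosses

# Question 1: two specific sequences (order given) are equally likely.
p_seq_all_heads = p_one_sequence       # H H H H H H H H H H
p_seq_mixed = p_one_sequence           # H T T H T H H T T H

# Question 2: counts only (order not given), so count the matching sequences.
p_ten_heads = comb(n, 10) * p_one_sequence       # exactly 1 sequence
p_five_heads_five_tails = comb(n, 5) * p_one_sequence

print(p_seq_all_heads, p_seq_mixed)
print(p_ten_heads, p_five_heads_five_tails)
```

The count "5 H 5 T" is more likely than "10 H" only because many different sequences produce it, not because any single mixed sequence is more likely than all heads.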
Multiple Trials: Example
Draw two cards one after the other. On each trial, let
A = “Ace of any suit”
B = “K of any suit”
Part 1: If cards are drawn with replacement: trials are
independent
• P{A|A} = 4/52 = P{A|B}
• P{AA} = (4/52)*(4/52)
Part 2: If cards are drawn without replacement: trials are not
independent
• P{A|B} = 4/51, P{A|A} = 3/51
• P{AA} = (4/52)*(3/51)
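The with-replacement and without-replacement cases can be compared with exact fractions. This is a small sketch of the multiplication rule P{AA} = P{A} × P{A|A} for the two-ace example; the names are chosen here for illustration.

```python
from fractions import Fraction

aces, deck = 4, 52

# With replacement: the deck is restored, so the second draw is independent.
p_second_ace_with = Fraction(aces, deck)             # P{A|A} = 4/52
p_two_aces_with = Fraction(aces, deck) * p_second_ace_with

# Without replacement: one ace and one card are gone before the second draw.
p_second_ace_without = Fraction(aces - 1, deck - 1)  # P{A|A} = 3/51
p_two_aces_without = Fraction(aces, deck) * p_second_ace_without

print(p_two_aces_with, p_two_aces_without)
```

Removing an ace lowers the chance of a second ace, so the without-replacement probability is the smaller of the two.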
Multiple Trials: Example
Which of the following is most likely, second most likely, and least likely?
a. Drawing a red marble from a bag containing 50% red marbles
and 50% white marbles.
Answer: P(R) = 50%
b. Drawing a red marble from a bag containing 50% red marbles
and 50% white marbles, given that on the last draw you selected a red
marble (with replacement).
Answer: P{R|R} = P(R) = 50%
c. Drawing a red marble seven times in succession with replacement,
from a bag containing 90% red marbles and 10% white marbles.
Answer: P{RRRRRRR} = (0.90)^7 ≈ 0.48
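The three marble answers can be verified numerically. The following sketch only restates the slide's arithmetic, assuming independent draws with replacement; the variable names are illustrative.

```python
# a. One draw from a 50/50 bag.
p_a = 0.50

# b. With replacement the previous draw is irrelevant: P{R|R} = P(R).
p_b = 0.50

# c. Seven independent red draws in a row from a 90% red bag.
p_c = 0.90 ** 7

ranking = sorted({"a": p_a, "b": p_b, "c": p_c}.items(),
                 key=lambda item: item[1], reverse=True)
print(f"P(c) = {p_c:.4f}")
print(ranking)
```

Even with a 90% chance on each draw, requiring seven successes in a row pushes the joint probability just below 50%.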
To do…
1. Sign up for homework groups – links will be sent out shortly
2. Homework 1 will be posted over the weekend, due in about two weeks
Statistical Analysis
Class 5: Chi-Square and ANOVA
Yiqing Xing
Logistics
1. Homework 1 (done) 10%
2. Quiz 1 (done) 10%
3. Homework 2 (due next Friday) 10%
4. Quiz 2 (next week) 10%
5. Empirical Exercise (will be posted later this week) 10%
Final Exam: 10/20, Thursday morning, 9:30-12 conflicts??
• Online via Canvas/Zoom

“Formula sheet”: bring it to the final, with (handwritten!) notes
Also posted: list of coverage + some practice questions
Overview – Class 5
1. Chi-Squared Test for Goodness-of-fit
2. Chi-Squared Test for Independence

One-Way Analysis of Variance (ANOVA)
Example
A fashion store wishes to
compare consumer
preferences in MD with a known
distribution (based on
historical market shares in CA.)
The store surveys a random
sample of 400 MD consumers.
Source: www.pinterest.com
Example
What should happen if the distribution is true?
Brand   Distribution (CA Mkt Share)   MD Frequency
1       20%                           102
2       35%                           121
3       30%                           120
4       15%                           57
Total   100%                          400
Example
What should happen if the distribution is true?
Brand   Distribution (CA Mkt Share)   MD Frequency   Expected Frequency
1       20%                           102            80
2       35%                           121            140
3       30%                           120            120
4       15%                           57             60
Total   100%                          400            400
Let’s analyze squared differences
Example
fi: observed frequency (MD Frequency) of brand i
Ei = npi: expected frequency of brand i
Now we get a statistic. What’s its distribution?
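The expected frequencies and the sum of scaled squared differences can be computed in a few lines. This is a minimal sketch using the fashion store numbers; it assumes the statistic is the usual goodness-of-fit sum of (fi − Ei)²/Ei, and the variable names are illustrative.

```python
ca_share = [0.20, 0.35, 0.30, 0.15]   # hypothesized distribution p_i
observed = [102, 121, 120, 57]        # MD frequencies f_i

n = sum(observed)                     # 400 consumers
expected = [n * p for p in ca_share]  # E_i = n * p_i

# Contribution of each brand: (f_i - E_i)^2 / E_i
contributions = [(f - e) ** 2 / e for f, e in zip(observed, expected)]
chi_sq = sum(contributions)

for brand, (f, e, c) in enumerate(zip(observed, expected, contributions), start=1):
    print(f"Brand {brand}: observed={f}, expected={e:.0f}, contribution={c:.2f}")
print(f"chi-squared statistic = {chi_sq:.2f}")
```

Dividing each squared difference by Ei puts large and small categories on a comparable scale before they are added.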
What is Chi-Square Distribution?
• Suppose each term fi − Ei follows a Normal
distribution; then the test statistic χ² is
a sum of squared Normal RVs.
Consider k random variables Z1, Z2, …, Zk, independent
and identically distributed according to N(0,1)
• Recall: Z1 + Z2 + … + Zk ~ N(0, k)
• What about Z1² + Z2² + … + Zk²?
What is Chi-Squared Distribution?
• Formally, defined as the sum of k
independent Z² (recall Z ~ N(0,1))
• Specified by the degrees of freedom df
(df = # of independent Z² in the sum)
Not a normal distribution
• Skewed to the right and takes only nonnegative values
Called a “chi-squared distribution”.
Examples of chi-squared distributions
Chi-Squared: Software and Table
Excel functions:
• CHIDIST(number for lookup, df)
• converts the variable value to a (right-tail) probability
• CHIINV(probability, df)
• converts a (right-tail) probability to the variable value
Table
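For readers without Excel, the same two lookups can be sketched in plain Python. The functions below are illustrative helpers written for this note, not library calls: `chi2_right_tail` plays the role of CHIDIST using a standard recurrence for the upper incomplete gamma function (valid for integer df ≥ 1), and `chi2_critical` plays the role of CHIINV by bisection.

```python
import math


def chi2_right_tail(x, df):
    """P(X >= x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-half)              # Q(1, x/2)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, x/2)
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        tail += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return tail


def chi2_critical(prob, df, upper=1000.0):
    """Value x with right-tail probability `prob`, found by bisection."""
    lower = 0.0
    for _ in range(200):
        mid = (lower + upper) / 2.0
        if chi2_right_tail(mid, df) > prob:
            lower = mid   # tail still too large: move right
        else:
            upper = mid
    return (lower + upper) / 2.0


print(round(chi2_right_tail(3.841, 1), 3))  # right-tail area for df = 1
print(round(chi2_critical(0.05, 3), 3))     # 5% critical value for df = 3
```

In practice a statistics library would be used instead; the point of the sketch is only that the two Excel functions are inverses of each other for a fixed df.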
Chi-squared test for goodness of fit: Formal Set-up

Each of n items (400 consumers) is classified into one of k
groups (4 brands)

Goal: to test
H0: p1, …, pk are the true probabilities
(p1+…+ pk = 1)

fi (i th observed frequency): observed counts in group i.
(i = 1, 2, …, k)

Ei = npi (i th expected frequency): expected number in
group i, if pi is indeed the true probability

Intuition: Compare fi’s to Ei’s, to see whether the
observed and expected are consistent.
Chi-square test for goodness of fit: steps
H0: probabilities are p1, p2, … , pk
Ha: the null hypothesis is not true
1. Compute the test statistic: $\chi^2 = \sum_{i=1}^{k} (f_i - E_i)^2 / E_i$
2. Find the p-value (with software/table) using the chi-square
distribution with (k – 1) degrees of freedom
3. Reject H0 at significance level α if p-value < α
Note: Large values of the test statistic provide evidence against H0.

Fashion Store Example
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - E_i)^2}{E_i}$$
H0: p1 = 0.20, p2 = 0.35, p3 = 0.30, p4 = 0.15
Ha: H0 fails to hold
Let α = 0.05; df = (4-1) = 3
Brand   Distribution   MD Frequency (fi)   Expected Frequency (Ei = npi)   (fi − Ei)²/Ei
1       20%            102                 80                              6.05
2       35%            121                 140                             2.58
3       30%            120                 120                             0
4       15%            57                  60                              0.15
Total   100%           400                 400                             8.78
p-value = 0.0324 < α = 0.05
Decision: Reject H0 at the 0.05 significance level
Interpretation: our evidence does not support the null hypothesis that MD consumer
preference follows the same distribution as in CA

Exercise
A manager wishes to test whether employee absence is equally likely across days of the
week (Monday through Friday). She collected the counts of the number of days of absence
of 200 employees by day of the week.
• What are the hypotheses?
H0: p1 = 20%, p2 = 20%, … , p5 = 20%
Ha: the null hypothesis is not true
• What are the degrees of freedom, df?
We have k = 5 groups (Monday through Friday), so df = k-1 = 4.

Additional Note 1: Size Guidelines for goodness of fit test
The test generally works well when all the expected counts (Ei) are at least 5
• What if not?
• We can group some categories
Category   Expected frequencies
1          2
2          2.3
3          1.2
4          29.2
5          10.1
6          5.2
Total      50
After grouping:
Category    Expected frequencies
1, 2 or 3   5.5
4           29.2
5           10.1
6           5.2
Total       50
Q: df = ?
df = 4 - 1 = 3

Additional Note 2: Estimating Parameters from the Sample
Example: Officials of the SAT test claim that the scores are normally distributed.
Consider a sample of 200 writing scores.
Score     Frequencies
200-350   21
351-425   24
426-500   45
501-575   67
576-650   22
651-800   21
Total     200

Challenge: we do not know the parameters for the normal distribution! We need to estimate the mean μ and the standard deviation σ.
Solution: we estimate those using the sample mean x̄ and the sample standard deviation s, calculated from the sample. Then do the chi-square test.
Now df = k - 1 - m = 6 - 1 - 2 = 3
(m = 2 is the number of parameters estimated from the sample)

Chi-Squared Tests

• How should a fashion store compare consumer preferences in Maryland with the known distribution (based on historical market shares) in California?
  → Chi-squared test for goodness of fit
• A financial institution sells several kinds of investment products. How do we examine whether customer satisfaction depends on the type of investment product purchased?
  → Chi-squared test for independence

Overview – Class 5

1. Chi-Squared Test for Goodness-of-fit
2. Chi-Squared Test for Independence
• One-Way Analysis of Variance (ANOVA) [Optional]

Example

A financial institution sells several kinds of investment products. Does client satisfaction depend on investment fund type?

Client Satisfaction
Fund     High   Low   Med   All
Bond     15     3     12    30
Stock    24     2     4     30
TaxDef   1      15    24    40
All      40     20    40    100

Chi-Squared Test for Independence

H0: there is independence between the row variable and the column variable.
Ha: the two variables are dependent

Again, we need a measure of how much the observed data deviate from what we would expect under H0.

Observed Freq.
Fund     High   Low   Med   All
Bond     15     3     12    30
Stock    24     2     4     30
TaxDef   1      15    24    40
All      40     20    40    100

Expected Freq.
Fund     High   Low   Med   All
Bond     12     6     12    30
Stock    12     6     12    30
TaxDef   16     8     16    40
All      40     20    40    100

Chi-Squared Test Statistic

$\chi^2 = \sum_{\text{all cells}} \frac{(f_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}} = \sum_{\text{all cells}} \frac{(\text{obs. freq} - \text{exp. freq})^2}{\text{exp. freq}}$

in which $\hat{E}_{ij} = \frac{r_i \times c_j}{n}$, i.e., exp. freq at cell ij = (row i total × column j total) / n

Larger $\chi^2$ → Smaller p → More likely to reject → More evidence against H0

Degrees of Freedom

Degrees of Freedom: df = (r - 1) × (c - 1)
(r is the number of rows in the table, and c is the number of columns)
Reject H0 at significance level α if p-value < α

Client Satisfaction Example

H0: Client satisfaction is independent of fund type
Ha: Client satisfaction is not independent of fund type

Using the observed and expected frequencies shown above:

$\chi^2 = \sum_{\text{all cells}} \frac{(f_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}} = \frac{(15 - 12)^2}{12} + \frac{(3 - 6)^2}{6} + \cdots + \frac{(24 - 16)^2}{16} \approx 46.44$

df = (3-1)(3-1) = 4
p-value = 0.0000 < α = 0.05
Decision: Reject H0 at the 0.05 significance level
Interpretation: our evidence suggests that client satisfaction depends on fund type.

Size Guidelines for independence test

The test generally works well when all the expected cell counts ($\hat{E}_{ij}$) are 5 or more
• If not: again we can combine some rows/columns, then do the chi-square test based on the transformed data.
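As a cross-check on the two worked examples (the fashion store goodness-of-fit test and the client satisfaction independence test), here is a minimal Python sketch using only the standard library. The function names chi2_sf, goodness_of_fit, and independence are illustrative and do not come from the course materials. The p-value is computed from a closed-form series for the chi-square upper tail with integer degrees of freedom, which stands in for the software/table lookup mentioned in the steps.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    with y = x / 2, starting from Q(1, y) = exp(-y) for even df
    or Q(1/2, y) = erfc(sqrt(y)) for odd df.
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def goodness_of_fit(observed, probs):
    """Chi-square goodness-of-fit test: E_i = n * p_i, df = k - 1."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)


def independence(table):
    """Chi-square independence test: E_ij = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (f - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Fashion store example: MD frequencies against the CA distribution.
gof_stat, gof_df, gof_p = goodness_of_fit(
    [102, 121, 120, 57], [0.20, 0.35, 0.30, 0.15]
)

# Client satisfaction example: rows Bond, Stock, TaxDef; columns High, Low, Med.
ind_stat, ind_df, ind_p = independence([[15, 3, 12], [24, 2, 4], [1, 15, 24]])

print(f"goodness of fit: chi2 = {gof_stat:.2f}, df = {gof_df}, p = {gof_p:.4f}")
print(f"independence:    chi2 = {ind_stat:.2f}, df = {ind_df}, p = {ind_p:.4f}")
```

If the sketch is right, the printed statistics should agree with the slide values (about 8.78 with df = 3 and p near 0.0324 for the fashion store, and about 46.44 with df = 4 and a p-value that rounds to 0.0000 for client satisfaction).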
Exercise

Suppose you obtain a simple random sample of 50 working DC residents, and you want to test whether the distribution of job satisfaction level (1, 2, …, 5) stays the same across three age groups (
