List of materials, Statistical Analysis, 2022
For the final exam you will be tested on the following:
• Marginal, joint, and conditional probability under statistically independent and dependent conditions
• Use the rules of complements, addition, and multiplication
• Use and interpret Bayes’ Theorem
• Answer questions on useful distributions, including Bernoulli, Binomial, Poisson, Uniform, Exponential, Normal, t, and Chi-squared
• Use counting rules (combinations and permutations) to calculate probabilities
• Calculate standard deviations (standard errors) for sample means and proportions
• Calculate and interpret confidence intervals for population means and proportions using z- and/or t-statistics
• Conduct and interpret hypothesis tests for population means and proportions using z or t statistics, for one or two populations, including p-values
• Conduct and interpret the two chi-square tests; understand degrees of freedom
• Conduct and interpret ANOVA tests; understand degrees of freedom
• Interpret each part of a linear regression result table
• Explain quantitatively the relationship between the independent variables and the dependent variable using the coefficients
• Understand R², residuals, and confidence intervals for linear regressions
• Understand that correlation does not imply causality; come up with alternative stories for linear regression results
• Understand other “pitfalls” of linear regression models
Note: Since the final will cover the whole course, it is important for you to understand which part(s) of the course each question is about, i.e., which tool(s) from this course should be used for each application.
Final Exam Practice Questions
Notes:
• Our final will include short questions (multiple choice) and full questions (calculation and/or interpretation).
• Here we provide 15 examples of “short questions”. The goal is to show you the style of short questions you will see in the final. They also provide extra examples for you to practice.
• For the style of “full questions”, you can review our quiz/homework questions as well as the exercise questions after each chapter in the textbook.
• The topics covered here and those covered in our final are not necessarily (exactly) the same, so you should still follow your own pace for reviewing materials, instead of only working on the practice questions.
Section A: Multiple Choice I (only one correct option in each question)
1. A recent survey conducted by the personnel manager of a major enterprise resource planning (ERP) company showed that 35% of the employees were dissatisfied with their salary, 80% were satisfied with their work assignments, 15% were dissatisfied with their work hours, 17% were dissatisfied with both their salary and work assignments, and 8% were dissatisfied with both their work assignments and work hours. What proportion of employees are satisfied with both their salary and work assignments?
A) 0.38
B) 0.02
C) 0.62
D) 0.52
E) None of the above
2. Toss a fair die four times. What is the probability that you get an even number on the first toss, an odd number on the second toss, and two 6’s on the last two tosses?
A) 1/2
B) (1/2)*(1/2)*(1/6)*(1/6)
C) (1/6)*(1/6)*(1/6)*(1/6)
D) 0
E) None of the above.
3. Suppose the heights of graduate students at University Tallmen are approximately normally distributed with mean 180cm and standard deviation 15cm, and the heights of undergraduate students at the same university are approximately normally distributed with mean 180cm and standard deviation 10cm. Find the height x such that about 16% of the undergraduate students are at least x tall.
A) 195cm
B) 180cm
C) 165cm
D) 170cm
E) 190cm
4. Suppose we have constructed a 95% confidence interval for the population proportion: [0.1, 0.7]. Which of the following interpretations is correct?
A) In repeated sampling, if we construct interval estimates using the same formula as we did for calculating [0.1, 0.7], 95% of these intervals cover the true proportion.
B) 95% of the observations in the sample lie in the given interval.
C) We are certain that the true population proportion is in [0.1, 0.7].
D) None of the above
5. A and B are two events. P(A|B) = 0.3, P(B) = 0.5, P(A) = 0.4. What is P(A∩B) + P(B|A)?
A) Cannot determine based on given information.
B) 0.525
C) 0.575
D) 0.5
E) 0.45
6. Many people claim that at least 45% of residents in MD rank Maryland University above JHU. You want to show that the percentage is much lower. How should you formulate the alternative hypothesis for your test?
A) p ≥ 0.45
B) p < 0.45
C) p ≠ 0.45
D) None of the above
7. Suppose your portfolio includes 100 shares of stock A plus 100 dollars in cash. Let X denote the return of one share of stock A, and you know E(X) = 3 dollars. What is the expected value of your portfolio?
A) 400 dollars
B) 103 dollars
C) 300 dollars
D) 100 dollars
8. A commuter owns two cars, a compact and an SUV. About 80% of the time he uses the compact to travel to work, and he uses the SUV the remaining 20%. When he uses the compact car he gets home by 3 p.m. about 70% of the time; if he uses the SUV he gets home by 3 p.m. about 60% of the time. If on one day he gets home after 3 p.m., what is the probability that he used the compact car?
A) 40%
B) 70%
C) 75%
D) 50%
9. The restaurant chain FastFoodGo surveys 400 customers with two questions: (i) What’s the quality of the food (4 levels)? (ii) Will you recommend it to a friend? Summarizing all questionnaires produced the following table of joint probabilities (one entry is missing).

Rating      Will recommend   Will not recommend
Poor        0.02             0.10
Fair        0.08             (missing)
Good        0.35             0.14
Excellent   0.20             0.02

Let P1 denote the probability that a customer who gave the restaurant a rating of Poor will recommend the restaurant to a friend. Let P2 denote the probability that a customer who will not recommend the restaurant to a friend gave an “Excellent” rating to the restaurant. What is P1 + P2?
A) 0.08
B) 0.17
C) 0.65
D) 0.22
Use the following regression output to answer questions 10-12:
A sample of data was used to create a regression model to predict life expectancy (in years)
in various countries as a function of health expenditure per capita (in $1k), percentage of
smokers, alcohol consumption per capita (in Liters), and percentage of obese population.
[Note: these sample questions are created based on the example covered in our lecture. This
is not necessarily the case for our final exam.]
Residuals:
    Min      1Q  Median      3Q     Max
-5.7286 -0.8571  0.0276  1.8234  3.4527

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        80.635844   2.226717  36.213  < 2e-16 ***
health_expenditure  0.763668   0.255114   2.993  0.00598 **
smoker_perc        -0.005631   0.086649  -0.065  0.94868
alcohol            -0.104202   0.173024  -0.602  0.55223
obese_perc         -0.120079   0.074961  -1.602  0.12126
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.192 on 26 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared:  0.3622,  Adjusted R-squared:  0.2641
F-statistic: 3.691 on 4 and 26 DF,  p-value: 0.0165
10. Among all data points (countries), what is the maximum difference (in absolute value) between the predicted life expectancy and the actual life expectancy?
A) 5.73
B) 3.45
C) 80.64
D) 0.76
E) It cannot be determined given the current info.
11. Are the coefficients of obese_perc and alcohol significantly different from 0 at the 5% significance level?
A) obese_perc is significant, but alcohol is not significant
B) obese_perc is not significant, but alcohol is significant
C) obese_perc is significant, and alcohol is significant
D) obese_perc is not significant, and alcohol is not significant
E) We cannot determine their significance given the current info.
12. Which of the following is a correct interpretation based on the result?
A) A 1% increase in health_expenditure per capita is associated with a 0.76% increase in life_expectancy on average
B) A $1 increase in health_expenditure per capita is associated with a 0.76% increase in life_expectancy on average
C) A $1k increase in health_expenditure per capita is associated with a 0.76-year increase in life_expectancy on average
D) A $1 increase in health_expenditure per capita is associated with a 0.76-year increase in life_expectancy on average
Section B: Multiple Choice II (one or more correct options in each question)
[Note: these will be implemented as True/False questions.]
13. Which of the following is/are correct regarding the t-distribution and the chi-square distribution?
A) Both distributions have density curves (pdf) symmetric about 0.
B) Both distributions have one parameter.
C) Both distributions take only non-negative values.
D) The area under the t-distribution density curve is bigger than that under the chi-square distribution density curve.
14. Consider a random sample (of size n) whose observations are independently and identically drawn from a distribution. Regarding the sampling distribution of the sample mean, which of the following is/are correct?
A) The sampling distribution of the sample mean is always normal.
B) The expected value of the sample mean is smaller than the population mean.
C) The standard deviation of the sample mean decreases as the sample size increases.
D) When the population standard deviation is unknown, the sample mean follows a t-distribution with n-1 degrees of freedom.
15. Which of the following statements are true regarding the standard normal and t-distribution(s)?
A) As the degrees of freedom (df) increase, the t-distribution looks more and more like the standard normal distribution
B) They are both symmetric around 0
C) $z_{0.05} - t_{0.05,20} \ge 0$
D) They are both bell-shaped
Suggested Answers:
1: C) 1-(0.35+0.2-0.17) = 0.62
2: B)
3: E) About 16% of a normal distribution lies more than one standard deviation above the mean, so x ≈ 180 + 10 = 190cm. Note the info about graduate students is irrelevant.
4: A) This is the definition of a confidence interval.
5: B) 0.3*0.5 + (0.3*0.5)/0.4 = 0.15 + 0.375 = 0.525
6: B) You wish to show that “the percentage is much lower than 45%”, hence you would like to put that statement as the alternative in the test (following the logic of proof by contradiction).
7: A) E(100X+100) = 100*E(X)+100 = 400.
8: C) (0.8*0.3)/(0.8*0.3 + 0.2*0.4) = 0.24/0.32 = 0.75
9: D) First, the missing cell is 1-(0.02+0.08+0.35+0.20+0.10+0.14+0.02) = 0.09. P1 = 0.02/(0.02+0.10), and P2 = 0.02/(0.10+0.09+0.14+0.02), so P1+P2 ≈ 0.22.
10: A) This is asking for the largest residual (in absolute value).
11: D)
12: C)
13: B) only
14: C) only
15: A), B) and D)
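Several of the numeric answers above can be double-checked in a few lines of Python. This is a minimal sketch using only the standard library (`statistics.NormalDist` needs Python 3.8+); the variable names are our own and not part of the course materials.

```python
from statistics import NormalDist

# Q1: P(dissatisfied with salary or assignments) via the addition rule,
# then the complement gives "satisfied with both"
q1 = 1 - (0.35 + (1 - 0.80) - 0.17)

# Q3: height exceeded by about 16% of undergraduates, heights ~ N(180, 10)
q3 = NormalDist(mu=180, sigma=10).inv_cdf(1 - 0.16)

# Q5: P(A∩B) from the multiplication rule, then P(B|A) = P(A∩B) / P(A)
p_a_and_b = 0.3 * 0.5
q5 = p_a_and_b + p_a_and_b / 0.4

# Q8: Bayes' theorem, where "late" means getting home after 3 p.m.
late_and_compact = 0.8 * (1 - 0.7)
late_and_suv = 0.2 * (1 - 0.6)
q8 = late_and_compact / (late_and_compact + late_and_suv)

# Q9: the missing joint probability makes the table sum to 1
known_cells = [0.02, 0.10, 0.08, 0.35, 0.14, 0.20, 0.02]
missing = 1 - sum(known_cells)
p1 = 0.02 / (0.02 + 0.10)                    # P(recommend | Poor)
p2 = 0.02 / (0.10 + missing + 0.14 + 0.02)   # P(Excellent | not recommend)
q9 = p1 + p2

for label, value in [("Q1", q1), ("Q3", q3), ("Q5", q5), ("Q8", q8), ("Q9", q9)]:
    print(f"{label}: {value:.4f}")
```

Q3 comes out slightly below 190cm because exactly 15.87% (not 16%) of a normal distribution lies above one standard deviation.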
Statistical Analysis
Class 1. Probability
Prof. Yiqing Xing
Logistics
• Instructor: Prof. Yiqing Xing xingyq@jhu.edu
• TA: Mr. Ronghao Lu, rlu23@jhu.edu
• Office hours (Zoom): Tuesdays 12-2 pm, starting next week
• Best way to reach us: email
• Zoom/recordings available upon request.
Johns Hopkins Carey Business School_Probability_ Slide 1
Grading
40% Final + 50% Homework/Quiz + 10% Attendance
• 3 in-class quizzes – classes 2, 4 and 6
• 2 Homework
• 1 Empirical exercise
• Homework groups: 3-4 students per group. We will send out Google Docs for you to sign up by this Sunday (09/04)
• No pop-up quizzes
Overview - Probability
1. Basic Concepts
2. Probability Rules
3. Conditional Probability
4. Bayes’ Theorem
5. Counting Rules
Experiment, Outcome, Sample Space
Experiment: a process of observation that has an uncertain outcome
Outcome: a potential result of an experiment
Sample Space (S): the set of all possible outcomes of an experiment
Common elementary examples
1. Toss a coin
2. Draw a card
[Figure: a standard 52-card deck, A through K in each of the four suits]
Event
Event: a collection of some outcomes
• N_A or #(A): the number of outcomes in an event A
• The sample space S itself is an event, with N_S outcomes
Example [Experiment: “Draw a card”]
Some events
[Figure: card deck highlighting the outcomes in events A, B and C]
A: “getting an Ace”
B: “getting a ”
C: “getting a 5 or K”
N_A = ?, N_B = ?, N_C = ?
Probability
Probability: the chance that something will happen, expressed in fractions, decimals, or percents, such that
1. If o is an outcome, then 0 ≤ P(o) ≤ 1
2. The probabilities of all of the sample space outcomes must sum to 1.
The probability of an event is the sum of the probabilities of its outcomes.
• E is a certain event if P(E) = 1
• E is an impossible event if P(E) = 0
Overview - Probability
1. Basic Concepts
2. Probability Rules
3. Conditional Probability
4. Bayes’ Theorem
5. Counting Rules
Probability Rules
• Rule of complements
• Addition rule
• Multiplication Rule (later in this lecture)
Rule of Complements
The complement (Ā) of an event A is the set of all sample space outcomes not in A.
Rule of Complements:
P(Ā) = 1 − P(A)
[Venn diagram: event A and its complement Ā]
Next: the Addition Rule …
Union and Intersection
The union of A and B is the event that includes outcome(s) in either A or B, or both.
Written as: A∪B
The intersection of A and B is the event that includes outcome(s) that belong to both A and B.
Written as: A∩B
Union and Intersection
Example [Experiment: “Draw a card”]
A: “an A”
B: “a ”
A∪B: an A or a 
[Figure: card deck highlighting the outcomes in A∪B, A∩B and B̄]
A∩B: an A
B̄: an , , or 
The Addition Rule 1: Mutually Exclusive Events
A and B are mutually exclusive if they have no outcomes in common
[Venn diagram: two non-overlapping events A and B]
In other words:
P(A ∩ B) = 0
If A and B are mutually exclusive:
P(A∪B) = P(A) + P(B)
The Addition Rule (cont.)
If A and B are not mutually exclusive:
P(A∪B) = P(A) + P(B) − P(A∩B)
where P(A∩B) is the joint probability of A and B occurring together
Exercise for you
[Experiment: “Draw a card”]
A: “getting an A”, P = 1/13
B: “getting a ”, P = 1/4
C: “getting a 5 or K”, P = 2/13
[Figure: card deck highlighting the outcomes in events A, B and C]
A and C are mutually exclusive
A and B are not mutually exclusive
P(A∪C) = 1/13 + 2/13 = 3/13
P(A∪B) = 1/13 + 1/4 − 1/52 = 16/52 = 4/13
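The exercise can be verified by brute force: list all 52 equally likely cards and count outcomes. A minimal Python sketch; since the suit symbol for B is missing from these notes, it assumes (hypothetically) that B is “getting a spade”, which does not change P(B) = 1/4.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely outcomes (rank, suit)

def prob(event):
    """P(event) = (# outcomes in event) / (# outcomes in sample space)."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def is_ace(card):
    return card[0] == "A"

def is_spade(card):  # assumed suit for event B
    return card[1] == "spades"

def is_5_or_k(card):
    return card[0] in ("5", "K")

# Unions counted directly from the outcomes ...
p_a_or_b = prob(lambda c: is_ace(c) or is_spade(c))
p_a_or_c = prob(lambda c: is_ace(c) or is_5_or_k(c))

# ... and via the addition rule
rule_a_or_b = prob(is_ace) + prob(is_spade) - prob(lambda c: is_ace(c) and is_spade(c))
rule_a_or_c = prob(is_ace) + prob(is_5_or_k)  # mutually exclusive: no overlap term

print(p_a_or_b, rule_a_or_b)
print(p_a_or_c, rule_a_or_c)
```

Using `Fraction` keeps the results exact, so they can be compared with 4/13 and 3/13 without rounding.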
Overview - Probability
1. Basic Concepts
2. Probability Rules
3. Conditional Probability
4. Bayes’ Theorem
5. Counting Rules
Conditional Probability
Source: https://watch.global.nba.com/players/hotzones/#!/stephen_curry
Conditional Probability
“Researchers found 19 infections among 488 unvaccinated people. They also found 24 infections among 2,352 fully vaccinated people.”
A: infection
B: vaccinated
• P(A) = 43/2840 = 1.5%

             vacc   unvacc   total
infection      24       19      43
no infect
total        2352      488    2840

• Suppose you know someone is vaccinated; then the probability of A (infection) becomes 24/2352 = 1.0%
The above is called conditional probability, written as P(A|B) and read as “probability of event A given event B”.
Calculating Conditional Probability
P(A | B) = P(A ∩ B) / P(B)
P(inf | vacc) = P(inf ∩ vacc) / P(vacc)
= (24/2840) / (2352/2840)
= 24 / 2352 = 1.0%
P(inf | unvacc) = P(inf ∩ unvacc) / P(unvacc)
= 19/488 = 3.9%
Exercise: P(no infect | vacc) = ??
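The conditional probabilities on this slide come straight from the counts. A minimal Python sketch that uses only the numbers quoted in the study; the variable names are our own.

```python
from fractions import Fraction

# Counts quoted in the study
infected_vacc, total_vacc = 24, 2352
infected_unvacc, total_unvacc = 19, 488
total = total_vacc + total_unvacc            # 2840 people

p_inf = Fraction(infected_vacc + infected_unvacc, total)   # P(A)

# P(A | B) = P(A ∩ B) / P(B); the common denominator `total` cancels
p_inf_and_vacc = Fraction(infected_vacc, total)
p_vacc = Fraction(total_vacc, total)
p_inf_given_vacc = p_inf_and_vacc / p_vacc

# Same idea for the unvaccinated group, written directly as a ratio of counts
p_inf_given_unvacc = Fraction(infected_unvacc, total_unvacc)

print(f"P(inf)          = {float(p_inf):.3f}")
print(f"P(inf | vacc)   = {float(p_inf_given_vacc):.3f}")
print(f"P(inf | unvacc) = {float(p_inf_given_unvacc):.3f}")
```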
Independence of Events
Two events A and B are independent if and only if:
P(A | B) = P(A)
This is equivalent to
P(B | A) = P(B)
[Assumes P(A) and P(B) are greater than zero]
Question: Are “vaccinated” and “infected” independent?
Exercise: Name an experiment with two independent events
Multiplication Rule: Calculating the Joint Probability
The joint probability of A and B (the intersection of A and B ) :
P(A∩B) = P(A) P(B | A) = P(B) P(A | B)
[simply rearrange the definition of conditional probability]
If A and B are independent, this becomes
P(A∩B) = P(A) P(B)
Example:
B: “getting a red card”: P(B) = 1/2
C: “getting a K”: P(C) = 1/13
The prob. of getting “a red K” = P(B∩C) = P(B)P(C) = 1/26
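Enumerating the deck checks both the probabilities in this example and the independence of “red” and “K” (a king has probability 4/52 = 1/13, and there are 2 red kings). A minimal Python sketch; the suit names are our own labels.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "spades", "clubs"]
RED_SUITS = {"hearts", "diamonds"}
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) pairs

def prob(event):
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

p_red = prob(lambda c: c[1] in RED_SUITS)
p_king = prob(lambda c: c[0] == "K")
p_red_king = prob(lambda c: c[1] in RED_SUITS and c[0] == "K")

# Independence: the joint probability equals the product of the marginals
independent = p_red_king == p_red * p_king
print(p_red, p_king, p_red_king, independent)
```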
Overview - Probability
1. Basic Concepts
2. Probability Rules
3. Conditional Probability
4. Bayes’ Theorem
5. Counting Rules
Bayes’ Theorem: Example
Why??
Unvaccinated → higher chances of infections → more costly
Example: Age and Cancer
An insurance company wishes to know whether Bob has cancer. You know Bob is over 65 years old.
Suppose A = “Someone has cancer”; B = “Someone is 65+”
In data you found that
P(A) = 1%, P(B) = 15%, and P(B|A) = 45%.
What’s the likelihood of Bob having cancer, i.e. P(A|B)?
Example: Age and Cancer (cont.)
P(A) = 1%, P(B) = 15%, and P(B|A) = 45%.
What’s the likelihood of Bob having cancer, i.e. P(A|B)?
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)\,P(B \mid A)}{P(B)} = \frac{1\% \times 45\%}{15\%} = 3\%$$
Bayes’ Rule/Theorem
Learning From the Evidence
Belief before: P(A), the “prior” → Evidence → Belief after: P(A|B), the “posterior”
P(A|B) > P(A): given the evidence (65+), it is more likely for Bob to have cancer.
Why?
• The evidence is more likely to occur under A than under Ā: P(B|A) > P(B|Ā) (recall: Ā = “Not A”)
This is pretty much Bayes’ Theorem!
Bayes Updating
• Two competing hypotheses A1 and A2
• You observe some evidence B
• You have some statistics (e.g., from frequencies in a large sample):
  • P(A1) and P(A2)
  • P(B|A1) and P(B|A2)
• You can then form an updated belief about the likelihood of A1 and A2
Belief before: P(A1), the “prior” → News/evidence: B → Belief after: P(A1|B), the “posterior”
Bayes Updating: Example
Jack and Bill sell insurance in your insurance agency.
• Jack sells 80% of the policies, and Bill sells the rest.
• 10% of the policies Jack sells have a claim filed within one year, compared to 25% of those sold by Bill.
A client announces his intention to file a claim. What is the probability Jack sold him the policy?
Model/Math:
P(Jack) = 0.80
P(Bill) = 0.20
P(Claim | Jack) = 0.10
P(Claim | Bill) = 0.25
Q: P(Jack | Claim) = ?
Bayes’ Theorem: Insurance Example
P(Jack) = 0.80, P(Bill) = 0.20, P(Claim | Jack) = 0.10, P(Claim | Bill) = 0.25
Recall Bayes’ Theorem:
P(Jack | Claim) = P(Jack and Claim) / P(Claim)
However, you do not know P(Jack and Claim) and P(Claim) directly… How to calculate them?
P(Jack and Claim) = P(Claim | Jack) × P(Jack)   (Multiplication Rule)
= 0.10 × 0.80 = 0.08
Similarly,
P(Bill and Claim) = 0.25 × 0.20 = 0.05
Bayes’ Theorem: Insurance Example
$$P(\text{Jack} \mid \text{Claim}) = \frac{P(J \text{ and Claim})}{P(J \text{ and Claim}) + P(B \text{ and Claim})} = \frac{P(J)\,P(\text{Claim} \mid J)}{P(J)\,P(\text{Claim} \mid J) + P(B)\,P(\text{Claim} \mid B)} = \frac{0.08}{0.08 + 0.05} = \frac{8}{13} = 61.5\%$$

             Jack   Bill   Total
Claim         .08    .05     .13
No Claim      .72    .15     .87
Total         .80    .20     1.0
Example: HIV Testing
0.6% of the population has HIV.
An HIV test has a 1% type-I error and a 0.1% type-II error, i.e.
• 99.9% of people with HIV test positive
• 1% of people without HIV test (falsely) positive
If Allen gets a positive result from the test, how likely is it that he really has HIV?
Define HIV = “someone has HIV”, NoH = “someone has no HIV”, Pos = “positive result”. We have:
P(HIV) = 0.6%, P(NoH) = 99.4%, P(Pos|HIV) = 99.9%, P(Pos|NoH) = 1%
Q: P(HIV|Pos) = ?
Example: HIV Testing (cont.)
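P(HIV) = 0.6%, P(NoH) = 99.4%, P(Pos|HIV) = 99.9%, P(Pos|NoH) = 1%. Q: P(HIV|Pos) = ?
$$P(\text{HIV} \mid \text{Pos}) = \frac{P(\text{HIV})\,P(\text{Pos} \mid \text{HIV})}{P(\text{HIV})\,P(\text{Pos} \mid \text{HIV}) + P(\text{NoH})\,P(\text{Pos} \mid \text{NoH})} = \frac{0.6\% \times 99.9\%}{0.6\% \times 99.9\% + 99.4\% \times 1\%} = \frac{0.5994\%}{1.5934\%} \approx 37.6\%$$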
Example: HIV Testing (cont.)
Upon receiving a positive test result, Allen only has a 37.6%
chance of truly having HIV!!!
Example: HIV Testing (cont.)
Why is the ending belief P(HIV | Pos) = 37.6% so low??
Belief before: P(HIV) = 0.6%, the “prior” → News/evidence: positive result → Belief after: P(HIV|Pos) = 37.6%, the “posterior”
• Strong evidence (“Pos”) ≠ strong posterior
• The former only tells you by how much to update your belief
• In this case, belief is updated a lot: from 0.6% to 37.6%
• Though the ending belief does not look very strong …
With More Than Two Events …
More Generally …
k mutually exclusive events (A1, …, Ak), one of which must be true
• Knowing an event B
• The posterior probability of Ai is
$$P(A_i \mid B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(A_i)\,P(B \mid A_i)}{P(B)} = \frac{P(A_i)\,P(B \mid A_i)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + \dots + P(A_k)\,P(B \mid A_k)}$$
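The general formula translates directly into code: multiply each prior by its likelihood to get the joint probabilities, add them up to get P(B), and divide. A minimal Python sketch; `posteriors` is a hypothetical helper name, applied to the insurance and HIV examples from this class.

```python
from fractions import Fraction

def posteriors(priors, likelihoods):
    """Bayes' theorem for k mutually exclusive hypotheses, one of which is true.

    priors[i]      = P(A_i)
    likelihoods[i] = P(B | A_i)
    Returns [P(A_1 | B), ..., P(A_k | B)].
    """
    joints = [p * l for p, l in zip(priors, likelihoods)]  # P(A_i ∩ B)
    p_b = sum(joints)                                       # total probability P(B)
    return [j / p_b for j in joints]

# Insurance example: hypotheses Jack, Bill; evidence = a claim is filed
jack, bill = posteriors([Fraction(80, 100), Fraction(20, 100)],
                        [Fraction(10, 100), Fraction(25, 100)])
print(jack, bill)

# HIV example: hypotheses HIV, NoH; evidence = a positive test
hiv, no_hiv = posteriors([0.006, 0.994], [0.999, 0.01])
print(round(hiv, 3), round(no_hiv, 3))
```

The same function works for any k; the denominator is exactly the sum in the formula above.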
Intuitive Method: Flipping Probability Tree
P(HIV) = 0.6%, P(NoH) = 99.4%, P(Pos|HIV) = 99.9%, P(Pos|NoH) = 1%
Probability Tree (HIV? → Test result?):
• HIV (0.6%) → Pos (99.9%): joint prob. 0.5994%; or Neg
• No HIV (99.4%) → Pos (1%): joint prob. 0.994%; or Neg
Probability Tree, flipped (Test result? → HIV?):
• Pos (total 1.5934%) → HIV: 0.5994% / 1.5934% = 37.6%
• Pos (total 1.5934%) → No HIV: 0.994% / 1.5934% = 62.4%
• Neg → HIV or No HIV
Step 0: draw the flipped tree
Step 1: copy the joint prob’s into the flipped tree
Step 2: get the total prob. for “Pos”
Step 3: calculate the sought-after posteriors
Intuitive Method: Flipping Probability Tree
Probability Tree (Who sells policy? → Claim?):
• Jack (80%) → Claim: Yes (10%): joint prob. 8%; or No
• Bill (20%) → Claim: Yes (25%): joint prob. 5%; or No
Probability Tree, flipped (Claim? → Who sells policy?):
• Yes (total 13%) → Jack: 8/13
• Yes (total 13%) → Bill: 5/13
• No → Jack or Bill
Step 0: draw the flipped tree
Step 1: copy the joint prob’s into the flipped tree
Step 2: get the total prob. for “Claim”
Step 3: calculate the sought-after posteriors
Overview – Probability
1. Basic Concepts
2. Probability Rules
3. Conditional Probability
4. Bayes’ Theorem
5. Counting Rules
(Notes: Multiple Trials)
Example combinations
How many ways to form a homework group of 3 among 8
students (A, B, C, D, E, F, G, H)?
[Note: Treat groups ABC, ACB, BAC, etc. as the same.]
$$_{8}C_{3} = \frac{8!}{3!\,(8-3)!} = \frac{8!}{3!\,5!} = \frac{8 \times 7 \times 6 \times 5!}{3 \times 2 \times 1 \times 5!} = 8 \times 7 = 56$$
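The combination count can be checked in Python: `math.comb` (Python 3.8+) computes it directly, the factorial formula gives the same number, and listing the groups with `itertools.combinations` shows that ABC, ACB, BAC, etc. are counted once. A minimal sketch:

```python
import math
from itertools import combinations

students = "ABCDEFGH"
n, r = len(students), 3

by_formula = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
by_library = math.comb(n, r)
groups = list(combinations(students, r))  # each unordered group appears exactly once

print(by_formula, by_library, len(groups))
print(groups[:3])
```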
Example permutations
Number of ways to award 3 prizes among 8 students?
“Order” is important: 1st, 2nd and 3rd prizes are different!
Recall counting rule: 8 x 7 x 6 = 336
8 x 7 x 6 = (8 x 7 x 6 x 5 x 4 x 3 x 2 x 1) / (5 x 4 x 3 x 2 x 1)
= 8! / (8-3)!
Permutations vs Combinations
In permutations ordering is important.
e.g. Ways to award 3 prizes (1st, 2nd and 3rd) among 8
participants?
(Sam, Joe, Ray) is different from (Joe, Ray, Sam).
When the ordering is not important, we use combinations.
Example permutations
The CEO of NanoSOFT must select five people from a list
of 15 young executives to serve as examples of outstanding
managerial talent. Each executive is to receive a monetary
reward. The first one selected will get the highest bonus,
the second one the second highest, and so on.
$$_{15}P_{5} = \frac{15!}{(15-5)!} = \frac{15 \times 14 \times 13 \times 12 \times 11 \times 10!}{10!} = 360{,}360$$
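Both permutation examples can be checked with `math.perm` (Python 3.8+), and the smaller one by brute force: `itertools.permutations` lists ordered (1st, 2nd, 3rd) prize assignments, so (Sam, Joe, Ray) and (Joe, Ray, Sam) count separately. A minimal sketch:

```python
import math
from itertools import permutations

# 3 distinct prizes among 8 students: order matters
prizes = math.perm(8, 3)
assert prizes == math.factorial(8) // math.factorial(8 - 3)

# NanoSOFT: 5 ranked bonuses among 15 executives
bonuses = math.perm(15, 5)

# Brute-force check on the smaller case: count ordered (1st, 2nd, 3rd) triples
ordered = sum(1 for _ in permutations(range(8), 3))

print(prizes, ordered, bonuses)
```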
Overview – Probability
1. Basic Concepts
2. Probability Rules
3. Conditional Probability
4. Bayes’ Theorem
5. Counting Rules
(Notes: Multiple Trials)
Notes: Multiple Trials
[Use P{} to represent probabilities related to multiple trials]
• P{AB}: probability of “A on trial 1, followed by B on trial 2”
• P{B|A}: probability of B on trial 2, given A (has occurred on trial 1)
When trials are independent:
• P{AB} = P{A} × P{B}
• P{B|A} = P{B} = P{B|B}
Multiple Trials: Example
Consider the experiment of “tossing a coin twice”
• Name a possible outcome: HT
• What’s the sample space? {TT, HT, TH, HH}
[Figure: two-level tree with H or T on each toss]
• P{HT} = P(H)P(T) = 0.5 * 0.5 = 0.25
Now, toss a coin 10 times
• Which one is more likely?
A: H H H H H H H H H H
B: H T T H T H H T T H
• Which one is more likely?
A: 10 H
B: 5 H 5 T
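The two questions have different answers: any one specific sequence of 10 fair tosses has probability (1/2)^10, while the event “5 H 5 T” collects many sequences. A minimal Python sketch, assuming a fair coin as on the slide:

```python
import math
from fractions import Fraction

n = 10
p_sequence = Fraction(1, 2) ** n            # any one specific H/T sequence

# Question 1: two specific sequences are equally likely
p_all_heads_seq = p_sequence                # H H H H H H H H H H
p_mixed_seq = p_sequence                    # H T T H T H H T T H

# Question 2: events defined by counts (order ignored)
p_10_heads = math.comb(n, 10) * p_sequence  # only one sequence qualifies
p_5_heads = math.comb(n, 5) * p_sequence    # C(10, 5) = 252 sequences qualify

print(p_all_heads_seq, p_mixed_seq)
print(p_10_heads, p_5_heads)
```

This connects the multiplication rule for independent trials with the combination counts from the previous section.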
Multiple Trials: Example
Draw two cards one after the other. On each trial, let
A = “Ace of any suit”
B = “K of any suit”
Part 1: If cards are drawn with replacement, the trials are independent
• P{A|A} = 4/52 = P{A|B}
• P{AA} = (4/52)*(4/52)
Part 2: If cards are drawn without replacement, the trials are not independent
• P{A|B} = 4/51, P{A|A} = 3/51
• P{AA} = (4/52)*(3/51)
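Both parts can be checked by listing every equally likely ordered draw of two cards: `itertools.product` gives draws with replacement (52 × 52), and `itertools.permutations` gives draws without replacement (52 × 51). A minimal Python sketch; suits are labelled 0-3 purely for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
DECK = [(rank, suit) for rank in RANKS for suit in range(4)]  # 52 cards

def p_two_aces(draws):
    """Fraction of equally likely ordered draws (first, second) that are both aces."""
    draws = list(draws)
    hits = sum(1 for first, second in draws if first[0] == "A" and second[0] == "A")
    return Fraction(hits, len(draws))

with_replacement = p_two_aces(product(DECK, repeat=2))     # independent trials
without_replacement = p_two_aces(permutations(DECK, 2))    # dependent trials

print(with_replacement, Fraction(4, 52) * Fraction(4, 52))
print(without_replacement, Fraction(4, 52) * Fraction(3, 51))
```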
Multiple Trails: Example
Which of the following appears most likely, second most likely, and least likely?
a. Drawing a red marble from a bag containing 50% red marbles and 50% white marbles.
Answer: P(R) = 50%
b. Drawing a red marble from a bag containing 50% red marbles and 50% white marbles, given that on the last draw you selected a red marble (with replacement).
Answer: P(R | R) = P(R) = 50%
c. Drawing a red marble seven times in succession, with replacement, from a bag containing 90% red marbles and 10% white marbles.
Answer: P(RRRRRRR) = (.90)^7 = .48
To do…
1. Sign up for homework groups – links will be sent out shortly
2. Homework 1 will be posted over the weekend, due in about two weeks
Statistical Analysis
Class 5: Chi-Square and ANOVA
Yiqing Xing
Logistics
1. Homework 1 (done) 10%
2. Quiz 1 (done) 10%
3. Homework 2 (due next Friday) 10%
4. Quiz 2 (next week) 10%
5. Empirical Exercise (will be posted later this week) 10%
Final Exam: 10/20, Thursday morning, 9:30-12. Conflicts??
• Online via Canvas/Zoom
• “Formula sheet”: bring it to the final, with (handwritten!) notes
Also posted: list of coverage + some practice questions
Overview – Class 5
1. Chi-Squared Test for Goodness-of-fit
2. Chi-Squared Test for Independence
• One-Way Analysis of Variance (ANOVA)
Example
A fashion store wishes to compare consumer preferences in MD with a known distribution (based on historical market shares in CA). The store surveys a random sample of 400 MD consumers.
Source: www.pinterest.com
Example
What should happen if the distribution is true?

Brand   Distribution (CA Mkt Share)   MD Frequency
1       20%                           102
2       35%                           121
3       30%                           120
4       15%                           57
Total   100%                          400
Example
What should happen if the distribution is true?

Brand   Distribution (CA Mkt Share)   MD Frequency   Expected Frequency
1       20%                           102            80
2       35%                           121            140
3       30%                           120            120
4       15%                           57             60
Total   100%                          400            400

Let’s analyze squared differences.
Example
fi = observed (MD) frequency of brand i; Ei = npi = expected frequency of brand i if the CA distribution holds (e.g., E1 = 400 × 20% = 80)
Summing the squared differences, each scaled by Ei, gives the test statistic χ² = Σ (fi − Ei)²/Ei.
Now we get a statistic. What’s its distribution?
What is Chi-Square Distribution?
• Suppose each term fi − Ei follows a Normal distribution; then the test statistic χ² is (approximately) a sum of several squared Normal RVs.
Consider k random variables Z1, Z2, …, Zk, independent and identically distributed according to N(0,1)
• Recall: Z1 + Z2 + … + Zk ~ N(0, k)
• What about Z1² + Z2² + … + Zk²?
What is Chi-Squared Distribution?
• Formally, defined as the sum of k independent Z² (recall Z ~ N(0,1))
• Specified by the degrees of freedom df (df = # of independent Z² in the sum)
• Not a normal distribution: skewed to the right, and takes only non-negative values
Called a “chi-squared distribution”.
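The definition can be seen in a quick simulation: draw k independent standard normals, square and add them, and repeat many times. A minimal Python sketch with a fixed seed; the comparison values (mean k, variance 2k) are standard facts about the chi-squared distribution, not stated on the slide.

```python
import random
import statistics

def chi_squared_draw(k, rng):
    """One draw of Z1^2 + ... + Zk^2 with Zi independent N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(2022)       # fixed seed so the run is reproducible
k = 3                           # degrees of freedom
draws = [chi_squared_draw(k, rng) for _ in range(100_000)]

print(f"min      = {min(draws):.4f}")                   # never negative
print(f"mean     = {statistics.fmean(draws):.3f}")      # theory: k
print(f"variance = {statistics.variance(draws):.3f}")   # theory: 2k
print(f"median   = {statistics.median(draws):.3f}")     # below the mean: right skew
```

A median below the mean is one simple symptom of the right skew mentioned above.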
Examples of chi-squared distributions
Chi-Squared: Software and Table
Excel functions:
• CHIDIST(number for lookup, df)
  • converts the variable value to a (right-tail) probability
• CHIINV(probability, df)
  • converts a (right-tail) probability back to the variable value
Table
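Python’s standard library has no chi-squared function (SciPy’s `scipy.stats.chi2.sf` is the usual tool). As a dependency-free sketch of what CHIDIST computes for integer df, the right-tail probability can be built from two closed forms and a recurrence of the regularized upper incomplete gamma function. `chi2_right_tail` is a hypothetical helper name, not a library function.

```python
import math

def chi2_right_tail(x, df):
    """P(X >= x) for X ~ chi-squared with integer df >= 1 (what CHIDIST returns).

    Uses Q(1) = erfc(sqrt(x/2)), Q(2) = exp(-x/2) and the recurrence
    Q(df + 2) = Q(df) + (x/2)**(df/2) * exp(-x/2) / Gamma(df/2 + 1).
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q

# Fashion-store statistic from this class: chi-squared = 8.78 with df = 3
print(f"{chi2_right_tail(8.78, 3):.4f}")
```

For the fashion-store example this gives about 0.0324, the p-value reported on the slide.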
Chi-squared test for goodness of fit: Formal Set-up
• Each of n items (400 consumers) is classified into one of k groups (4 brands)
• Goal: to test H0: p1, …, pk are the true probabilities (p1 + … + pk = 1)
• fi (i-th observed frequency): observed count in group i (i = 1, 2, …, k)
• Ei = npi (i-th expected frequency): expected number in group i if pi is indeed the true probability
• Intuition: compare the fi’s to the Ei’s to see whether the observed and expected counts are consistent
Chi-square test for goodness of fit: steps
H0: probabilities are p1, p2, … , pk
Ha: the null hypothesis is not true
1. Compute the test statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - E_i)^2}{E_i}$$
2. Find the p-value (with software/table) using the chi-square distribution with (k − 1) degrees of freedom
3. Reject H0 at significance level α if p-value < α
Note: Large values of the test statistic provide evidence against H0.
Fashion Store Example
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - E_i)^2}{E_i}$$
H0: p1 = 0.20, p2 = 0.35, p3 = 0.30, p4 = 0.15
Ha: H0 fails to hold
Let α = 0.05; df = (4 − 1) = 3

Brand   Distribution   MD Frequency (fi)   Expected Frequency (Ei = npi)   (fi − Ei)²/Ei
1       20%            102                 80                              6.05
2       35%            121                 140                             2.58
3       30%            120                 120                             0
4       15%            57                  60                              0.15
Total   100%           400                 400                             8.78

p-value = 0.0324 < α = 0.05
Decision: Reject H0 at the 0.05 significance level
Interpretation: our evidence does not support the null hypothesis that MD consumer preferences follow the same distribution as in CA.
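The table above can be reproduced in a few lines. A minimal Python sketch of the goodness-of-fit statistic; instead of a p-value it compares χ² with the 5% critical value for df = 3 (7.815, taken from a standard chi-squared table and hard-coded here as an assumption), which leads to the same decision.

```python
observed = [102, 121, 120, 57]             # MD frequencies f_i
shares = [0.20, 0.35, 0.30, 0.15]          # H0: CA market shares p_i
n = sum(observed)                          # 400 consumers

expected = [n * p for p in shares]         # E_i = n * p_i
contributions = [(f - e) ** 2 / e for f, e in zip(observed, expected)]
chi_sq = sum(contributions)
df = len(observed) - 1                     # k - 1

# Critical value for alpha = 0.05, df = 3, from a standard chi-squared table
CRITICAL_05_DF3 = 7.815

print([round(c, 2) for c in contributions], round(chi_sq, 2), df)
print("Reject H0" if chi_sq > CRITICAL_05_DF3 else "Do not reject H0")
```

Rejecting when χ² exceeds the critical value is equivalent to rejecting when the p-value is below α.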
Exercise
A manager wishes to test whether employee absence is equally
likely across day of the week (Monday through Friday). She
collected the counts of the number of days of absence of 200
employees by day of the week.
• What are the hypotheses?
H0: p1 = 20%, p2 = 20%, … , p5 = 20%
Ha: the null hypothesis is not true
• What are the degrees of freedom, df?
We have k = 5 groups (Monday through Friday), so df = k − 1 = 4.
Additional Note 1: Size Guidelines for the goodness-of-fit test
The test generally works well when all the expected counts ($E_i$) are at least 5.
• What if not?
• We can group some categories.

Category | Expected frequency
1 | 2
2 | 2.3
3 | 1.2
4 | 29.2
5 | 10.1
6 | 5.2
Total | 50

After grouping categories 1, 2 and 3:

Category | Expected frequency
1, 2 or 3 | 5.5
4 | 29.2
5 | 10.1
6 | 5.2
Total | 50

Q: df = ?
df = 4 – 1 = 3 (k = 4 categories remain after grouping)
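One way to automate the grouping step is sketched below under two assumptions. First, the categories have a natural order, so only adjacent categories are merged. Second, merging runs greedily from left to right. `group_small_categories` is a hypothetical helper; other groupings are possible, and the choice should make substantive sense for the data.

```python
def group_small_categories(labels, expected, min_count=5.0):
    """Hypothetical helper: greedily merge adjacent categories (left to right)
    until each group's expected count is at least min_count.

    A leftover group at the end that is still too small is folded into the
    previous group. Returns (group_labels, group_expected).
    """
    groups, sums = [], []
    current, running = [], 0.0
    for label, e in zip(labels, expected):
        current.append(label)
        running += e
        if running >= min_count:
            groups.append(current)
            sums.append(running)
            current, running = [], 0.0
    if current:  # leftover small tail
        if groups:
            groups[-1].extend(current)
            sums[-1] += running
        else:
            groups.append(current)
            sums.append(running)
    return groups, sums


labels = [1, 2, 3, 4, 5, 6]
expected = [2, 2.3, 1.2, 29.2, 10.1, 5.2]   # expected frequencies from the slide
groups, sums = group_small_categories(labels, expected)
df = len(groups) - 1                        # k - 1 after grouping
for g, s in zip(groups, sums):
    print(g, round(s, 1))
print("df =", df)
```

On the expected frequencies above, it merges categories 1, 2 and 3 into one group with expected count 5.5, leaving k = 4 groups and df = 3. This is the same grouping as on the slide.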
Additional Note 2: Estimating Parameters from the Sample
Example: Officials of the SAT test claim that the scores are normally distributed. Consider a sample of 200 writing scores.

Score | Frequency
200-350 | 21
351-425 | 24
426-500 | 45
501-575 | 67
576-650 | 22
651-800 | 21
Total | 200

Challenge: we do not know the parameters of the normal distribution! We need to estimate $\mu$ and $\sigma$.
Solution: we estimate them using $\bar{x}$ and $s$, calculated from the sample. Then do the chi-square test.
Now df = k – 1 – m = 6 – 1 – 2 = 3, where m = 2 is the number of parameters estimated from the sample.
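Below is a sketch of the mechanics. It assumes the bin boundaries fall halfway between the integer score limits (350.5, 425.5, ...) and that the outer bins are left open so the probabilities sum to 1. Both are modelling assumptions, not part of the slide. `normal_cdf`, `expected_counts` and `gof_df` are hypothetical helper names.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def expected_counts(cut_points, mu_hat, sigma_hat, n):
    """Hypothetical helper: expected bin counts n * P(bin) under N(mu_hat, sigma_hat^2).

    cut_points are the interior bin boundaries; the first bin is open to
    -infinity and the last to +infinity, so the probabilities sum to 1.
    """
    cdf = [0.0] + [normal_cdf(c, mu_hat, sigma_hat) for c in cut_points] + [1.0]
    return [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


def gof_df(k, m):
    """Hypothetical helper: df when m parameters are estimated from the sample."""
    return k - 1 - m


# SAT example: k = 6 score bins, m = 2 estimated parameters (mu and sigma).
print(gof_df(6, 2))
```

The slide does not report $\bar{x}$ and $s$, so no expected counts are computed here. With the sample estimates in hand, `expected_counts([350.5, 425.5, 500.5, 575.5, 650.5], xbar, s, 200)` would give the $E_i$ for the chi-square statistic. That statistic is then compared against a chi-square distribution with df = 3.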
Chi-Squared Tests
• How should a fashion store compare consumer
preferences in Maryland with the known distribution
(based on historical market shares) in California?
Chi-squared test for goodness of fit
• A financial institution sells several kinds of investment
products. How do we examine whether customer
satisfaction depends on the type of investment product
purchased?
Chi-squared test for independence
Overview – Class 5
1. Chi-Squared Test for Goodness-of-fit
2. Chi-Squared Test for Independence
• One-Way Analysis of Variance (ANOVA) [Optional]
Example
A financial institution sells several kinds of investment products. Does client satisfaction depend on investment fund type?

Client Satisfaction
Fund | High | Low | Med | All
Bond | 15 | 3 | 12 | 30
Stock | 24 | 2 | 4 | 30
TaxDef | 1 | 15 | 24 | 40
All | 40 | 20 | 40 | 100
Chi-Squared Test for Independence
H0: the row variable and the column variable are independent.
Ha: the two variables are dependent.
Again, we need a measure of how much the observed data deviate from what we would expect under H0.

Observed Freq.
Fund | High | Low | Med | All
Bond | 15 | 3 | 12 | 30
Stock | 24 | 2 | 4 | 30
TaxDef | 1 | 15 | 24 | 40
All | 40 | 20 | 40 | 100

Expected Freq.
Fund | High | Low | Med | All
Bond | 12 | 6 | 12 | 30
Stock | 12 | 6 | 12 | 30
TaxDef | 16 | 8 | 16 | 40
All | 40 | 20 | 40 | 100
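The expected table can be reproduced from the observed counts alone. Below is a minimal sketch in plain Python. It uses the rule that, under independence, the expected count in cell $ij$ is (row $i$ total × column $j$ total) / n.

```python
# Observed counts from the slide (rows: Bond, Stock, TaxDef; columns: High, Low, Med).
observed = [
    [15, 3, 12],
    [24, 2, 4],
    [1, 15, 24],
]

row_totals = [sum(row) for row in observed]            # r_i
col_totals = [sum(col) for col in zip(*observed)]      # c_j
n = sum(row_totals)

# Under H0 (independence): E_ij = r_i * c_j / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

for name, row in zip(["Bond", "Stock", "TaxDef"], expected):
    print(name, row)
```

It prints 12, 6, 12 for Bond and Stock and 16, 8, 16 for TaxDef, matching the Expected Freq. table.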
Chi-Squared Test Statistic
$\chi^2 = \sum_{\text{all cells}} \frac{(f_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}} = \sum_{\text{all cells}} \frac{(\text{obs. freq} - \text{exp. freq})^2}{\text{exp. freq}}$
in which $\hat{E}_{ij} = \frac{r_i \times c_j}{n}$, i.e., exp. freq at cell $ij$ = (row $i$ total × column $j$ total) / $n$
Larger $\chi^2$ ⇒ smaller p-value ⇒ more evidence against H0 ⇒ more likely to reject H0
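Below is a sketch of the statistic as a function of the observed table, in plain Python. `chi_square_independence` is a hypothetical helper name. It sums $(f_{ij} - \hat{E}_{ij})^2 / \hat{E}_{ij}$ over all cells, with $\hat{E}_{ij} = r_i c_j / n$.

```python
def chi_square_independence(observed):
    """Hypothetical helper: return (chi2, expected) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum (f_ij - E_ij)^2 / E_ij over every cell of the table.
    chi2 = sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for f, e in zip(obs_row, exp_row)
    )
    return chi2, expected


observed = [[15, 3, 12], [24, 2, 4], [1, 15, 24]]  # Bond, Stock, TaxDef x High, Low, Med
chi2, _ = chi_square_independence(observed)
print(f"chi2 = {chi2:.2f}")
```

For the client satisfaction counts it returns about 46.44 (46.4375 before rounding).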
Degrees of Freedom
Degrees of Freedom:
df = (r - 1) × (c - 1)
(r is the number of rows in the table and c is the number of columns, not counting the "All" total row and column)
Reject H0 at significance level α if p-value < α
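For even df, the chi-square upper tail has a closed form, $P(\chi^2_{df} > x) = e^{-x/2} \sum_{k=0}^{df/2 - 1} (x/2)^k / k!$. That is enough for a 3 × 3 table (df = 4). Below is a sketch assuming only the Python standard library. `independence_df` and `chi2_sf_even_df` are hypothetical helper names, and odd df would need a different method (software or a table).

```python
import math


def independence_df(r, c):
    """Hypothetical helper: df = (r - 1) * (c - 1), r and c excluding the totals."""
    return (r - 1) * (c - 1)


def chi2_sf_even_df(x, df):
    """Hypothetical helper: P(X > x) for a chi-square with EVEN df (closed form)
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df % 2 != 0 or df <= 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


df = independence_df(3, 3)           # 3 fund types x 3 satisfaction levels
p_value = chi2_sf_even_df(46.44, df)
alpha = 0.05
print(df, f"{p_value:.4f}", "Reject H0" if p_value < alpha else "Do not reject H0")
```

It gives df = 4 and a p-value of roughly 2 × 10⁻⁹ for χ² = 46.44. That rounds to the 0.0000 reported on the client satisfaction slide.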
Client Satisfaction Example
H0: Client satisfaction is independent of fund type
Ha: Client satisfaction is not independent of fund type

Observed Freq.
Fund | High | Low | Med | All
Bond | 15 | 3 | 12 | 30
Stock | 24 | 2 | 4 | 30
TaxDef | 1 | 15 | 24 | 40
All | 40 | 20 | 40 | 100

Expected Freq.
Fund | High | Low | Med | All
Bond | 12 | 6 | 12 | 30
Stock | 12 | 6 | 12 | 30
TaxDef | 16 | 8 | 16 | 40
All | 40 | 20 | 40 | 100

$\chi^2 = \sum_{\text{all cells}} \frac{(f_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}} = \frac{(15 - 12)^2}{12} + \frac{(3 - 6)^2}{6} + \cdots + \frac{(24 - 16)^2}{16} \approx 46.44$
df = (3 – 1)(3 – 1) = 4
p-value = 0.0000 < α = 0.05
Decision: Reject H0 at the 0.05 significance level
Interpretation: our evidence suggests that client satisfaction depends on fund type.
Size Guidelines for independence test
The test generally works well when all the expected cell counts ($\hat{E}_{ij}$) are 5 or more.
• If not: again, we can combine some rows/columns and then do the chi-square test on the combined table.
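Checking the size guideline before running the test takes only a few lines. The sketch below uses `expected_counts_ok`, a hypothetical helper name. It returns whether every expected cell count is at least 5, along with the smallest one.

```python
def expected_counts_ok(observed, min_expected=5.0):
    """Hypothetical helper: return (ok, smallest expected count) for a table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count for every cell is row total * column total / n.
    smallest = min(r * c / n for r in row_totals for c in col_totals)
    return smallest >= min_expected, smallest


ok, smallest = expected_counts_ok([[15, 3, 12], [24, 2, 4], [1, 15, 24]])
print(ok, smallest)
```

For the client satisfaction table, the smallest expected count is 6 (the Low column for Bond and Stock), so no rows or columns need to be combined.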
Exercise
Suppose you obtain a simple random sample of 50 working DC
residents, and you want to test whether the distribution of job
satisfaction level (1, 2, …, 5) stays the same across three age
groups (