Math Exam: Feb 23, 8 am

Course material attached

Advanced Business Statistics

▪ Introduction to Hypothesis Testing (One Sample)

Winter 2022

Agenda

Introduction to Hypothesis Testing (One Sample)

❑ Mean, when σ is known

❑ Mean, when σ is unknown

❑ Proportion

Inferential Statistics

• Estimating

• Hypothesis Testing

Both are applied to population means and population proportions.

Concept of Hypothesis Testing

For many people, the term hypothesis testing may seem new, although its applications and the concepts underlying it are quite familiar.

The best example is a criminal trial.

A person accused of a crime faces a trial. The prosecutor presents the case, and the jury makes a decision based on the evidence presented.

Instructor: Ahmad Teymouri All rights Reserved
Concept of Hypothesis Testing

In fact, the jury is conducting a test of hypotheses. It considers two hypotheses:

❑ Null hypothesis (𝐻0): The defendant is innocent

❑ Alternative hypothesis (𝐻𝐴 or 𝐻1): The defendant is guilty

After reviewing the evidence presented by both the prosecutor and the defendant, the jury can make only two possible decisions: guilty or not guilty.

Concept of Hypothesis Testing

❑ When the defendant is convicted, the jury is rejecting the null hypothesis in favor of the alternative hypothesis. Statistically: there is enough evidence to conclude that the defendant is guilty.

❑ When the defendant is acquitted, the jury is not rejecting the null hypothesis in favor of the alternative hypothesis. Statistically: there is not enough evidence to conclude that the defendant is guilty.

Concept of Hypothesis Testing

Important: we interpret the result by saying either that there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis, or that there is not enough evidence to reject the null hypothesis. We do not say that we accept the null (or the alternative) hypothesis.

Concept of Hypothesis Testing

Two possible errors may occur:

❑ Error type one: we reject a null hypothesis although it is true

❑ Error type two: we do not reject a null hypothesis although it is false

                  𝐻0 is true                               𝐻0 is false
Reject 𝐻0         Error type one, P(error type one) = α    Correct decision
Not reject 𝐻0     Correct decision                         Error type two, P(error type two) = β

Concept of Hypothesis Testing

▪ There are two hypotheses: (1) the null and (2) the alternative.

▪ In hypothesis testing, we start by assuming that the null hypothesis is true.

▪ The main objective is to determine whether there is enough evidence to reject 𝐻0 in favor of 𝐻𝐴.

▪ Two possible results are:

o there is enough evidence to support the alternative

o there is not enough evidence to support the alternative

▪ Two possible errors are:

o Reject a true null hypothesis, P(error type one) = α

o Not reject a false null hypothesis, P(error type two) = β

Concept of Hypothesis Testing

As mentioned before, we start by assuming that the null hypothesis is true. For example, a police officer wants to test whether or not the average speed of vehicles in a city is 85 km/h.

The null hypothesis is 𝐻0: 𝜇 = 85. For the alternative hypothesis, however, there are three possibilities: 𝐻1: 𝜇 > 85, 𝐻1: 𝜇 < 85, or 𝐻1: 𝜇 ≠ 85.

Concept of Hypothesis Testing

Depending on the question asked, the hypotheses are constructed in one of three ways:

One-tail right: 𝐻0: 𝜇 = 85 vs 𝐻1: 𝜇 > 85

One-tail left: 𝐻0: 𝜇 = 85 vs 𝐻1: 𝜇 < 85

Two-tail: 𝐻0: 𝜇 = 85 vs 𝐻1: 𝜇 ≠ 85

Concept of Hypothesis Testing

The rejection region is a range of values such that, if the test statistic falls into that range, we decide to reject the null hypothesis in favor of the alternative hypothesis.

[Figure: 𝐻0 rejection regions under the normal curve. One-tail right: to the right of 𝑍𝛼. One-tail left: to the left of −𝑍𝛼. Two-tail: to the left of −𝑍𝛼/2 and to the right of 𝑍𝛼/2.]

1 − α = confidence level

α = significance level

Testing Population Mean (µ) when Population Standard
Deviation (σ) is Known – Main Steps

1. Construct the hypotheses (one of):

𝐻0: 𝜇 = 𝑎 vs 𝐻1: 𝜇 > 𝑎 (one-tail right), or
𝐻0: 𝜇 = 𝑎 vs 𝐻1: 𝜇 < 𝑎 (one-tail left), or
𝐻0: 𝜇 = 𝑎 vs 𝐻1: 𝜇 ≠ 𝑎 (two-tail)

2. Draw the appropriate standard normal (z) graph and locate the rejection region: to the right of 𝑍𝛼 (one-tail right), to the left of −𝑍𝛼 (one-tail left), or beyond −𝑍𝛼/2 and 𝑍𝛼/2 (two-tail).

3. Find the critical z value from the Normal table and put it on the graph: 𝑍𝛼 for one-tail tests and 𝑍𝛼/2 for two-tail tests.

4. Compute the z-stat and put its value on the graph:

$$Z_{stat} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

5. Make a decision:

One-tail right: If 𝑍𝑠𝑡𝑎𝑡 > 𝑍𝛼, there is enough evidence to reject 𝐻0.

One-tail left: If 𝑍𝑠𝑡𝑎𝑡 < −𝑍𝛼, there is enough evidence to reject 𝐻0.

Two-tail: If 𝑍𝑠𝑡𝑎𝑡 > 𝑍𝛼/2 or 𝑍𝑠𝑡𝑎𝑡 < −𝑍𝛼/2, there is enough evidence to reject 𝐻0.
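Steps 3 to 5 can be sketched in Python with only the standard library. This is a minimal sketch, not part of the course material: `statistics.NormalDist().inv_cdf` stands in for the Normal table (it returns about 1.645 where the table gives 1.64), and the function name `one_sample_z_test` and its `tail` labels are our own illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(x_bar, mu0, sigma, n, alpha, tail):
    """Return (z_stat, critical value, reject H0?) for tail in {'right', 'left', 'two'}."""
    z_stat = (x_bar - mu0) / (sigma / sqrt(n))        # step 4: test statistic
    std_normal = NormalDist()                          # mean 0, standard deviation 1
    if tail == "right":
        z_crit = std_normal.inv_cdf(1 - alpha)         # Z_alpha
        reject = z_stat > z_crit
    elif tail == "left":
        z_crit = -std_normal.inv_cdf(1 - alpha)        # -Z_alpha
        reject = z_stat < z_crit
    elif tail == "two":
        z_crit = std_normal.inv_cdf(1 - alpha / 2)     # Z_alpha/2
        reject = z_stat > z_crit or z_stat < -z_crit
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return z_stat, z_crit, reject

# Hypothetical numbers: X-bar = 105, mu = 100, sigma = 15, n = 36, alpha = 0.05, one-tail right
z, crit, rej = one_sample_z_test(105, 100, 15, 36, 0.05, "right")
print(z, round(crit, 3), rej)
```

With these hypothetical numbers the z-stat is 2.0, which exceeds the critical value of about 1.645, so 𝐻0 is rejected.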

Example 1

Conduct the following test and interpret the result.

𝐻0: 𝜇 = 800

𝐻1: 𝜇 > 800

σ = 150, X̄ = 770, 𝑛 = 100, 𝛼 = 0.05

The test is one-tail right.

For the z critical value, we use the Normal table: 𝑍𝛼 = 𝑍0.05 = 1.64

For the z-stat, we use the formula:

$$Z_{stat} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{770 - 800}{150/\sqrt{100}} = -2$$

z-stat (−2) < z-critical (1.64). Therefore, the z-stat does not fall in the rejection region (to the right of 1.64). There is not enough evidence to reject 𝐻0.

Example 2

Conduct the following test and interpret the result.

𝐻0: 𝜇 = 12

𝐻1: 𝜇 < 12

σ = 4, X̄ = 11, 𝑛 = 144, 𝛼 = 0.01

The test is one-tail left.

For the z critical value, we use the Normal table: −𝑍𝛼 = −𝑍0.01 = −2.33

For the z-stat, we use the formula:

$$Z_{stat} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{11 - 12}{4/\sqrt{144}} = -3$$

z-stat (−3) < −z-critical (−2.33). Therefore, the z-stat falls in the rejection region (to the left of −2.33). There is enough evidence to reject 𝐻0.

Example 3

Conduct the following test and interpret the result.

𝐻0: 𝜇 = 50,000

𝐻1: 𝜇 ≠ 50,000

σ = 8,000, X̄ = 51,150, 𝑛 = 200, 𝛼 = 0.1

The test is two-tail.

For the z critical value, we use the Normal table: 𝑍𝛼/2 = 𝑍0.1/2 = 𝑍0.05 = 1.64, so the rejection region lies below −1.64 and above +1.64.

For the z-stat, we use the formula:

$$Z_{stat} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{51{,}150 - 50{,}000}{8{,}000/\sqrt{200}} = 2.03$$

z-stat (2.03) > z-critical (1.64). Therefore, the z-stat falls in the rejection region. There is enough evidence to reject 𝐻0.
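The arithmetic of Examples 1 to 3 can be re-checked with a short sketch. It assumes Python's `statistics.NormalDist` in place of the Normal table, so critical values carry more decimals than the 1.64 and 2.33 used on the slides; the helper name `z_stat` is ours.

```python
from math import sqrt
from statistics import NormalDist

def z_stat(x_bar, mu0, sigma, n):
    """One-sample z statistic: (X-bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / sqrt(n))

inv_cdf = NormalDist().inv_cdf  # inverse standard normal CDF (replaces the Normal table)

# Example 1: one-tail right, alpha = 0.05: reject if z_stat > Z_0.05
ex1 = z_stat(770, 800, 150, 100)
print(round(ex1, 2), ex1 > inv_cdf(0.95))        # -2.0 False: do not reject H0

# Example 2: one-tail left, alpha = 0.01: reject if z_stat < -Z_0.01
ex2 = z_stat(11, 12, 4, 144)
print(round(ex2, 2), ex2 < -inv_cdf(0.99))       # -3.0 True: reject H0

# Example 3: two-tail, alpha = 0.1: reject if |z_stat| > Z_0.05
ex3 = z_stat(51_150, 50_000, 8_000, 200)
print(round(ex3, 2), abs(ex3) > inv_cdf(0.95))   # 2.03 True: reject H0
```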

Example 4

Example 04: A business school claims that, on average, the GMAT score of its MBA students is more than 600. To examine the claim, an MBA student asks a random sample of 16 of her classmates about their GMAT scores. The results are shown here.

680 620 570 585 590 600 600 650
630 590 590 610 600 600 580 640

Can the student conclude at the 5% significance level that the claim is true, assuming that GMAT scores are normally distributed with a standard deviation of 35?

Example 4

𝐻0: 𝜇 = 600

𝐻1: 𝜇 > 600

σ = 35, 𝑛 = 16, 𝛼 = 0.05

$$\bar{X} = \frac{\sum x}{n} = \frac{9735}{16} = 608.44$$

The test is one-tail right.

For the z critical value, we use the Normal table: 𝑍𝛼 = 𝑍0.05 = 1.64

For the z-stat, we use the formula:

$$Z_{stat} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{608.44 - 600}{35/\sqrt{16}} = 0.96$$

z-stat (0.96) < z-critical (1.64). Therefore, the z-stat does not fall in the rejection region. There is not enough evidence to reject 𝐻0.
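Example 4 starts from raw scores, so the sample mean has to be computed first. A minimal sketch using Python's `statistics` module, with `NormalDist` standing in for the Normal table:

```python
from math import sqrt
from statistics import NormalDist, mean

# GMAT scores of the 16 sampled classmates (Example 4)
scores = [680, 620, 570, 585, 590, 600, 600, 650,
          630, 590, 590, 610, 600, 600, 580, 640]
x_bar = mean(scores)                                # 9735 / 16 = 608.4375
z_stat = (x_bar - 600) / (35 / sqrt(len(scores)))   # sigma = 35 is given
z_crit = NormalDist().inv_cdf(0.95)                 # one-tail right, alpha = 0.05
print(sum(scores), x_bar, round(z_stat, 3), z_stat > z_crit)
# 9735 608.4375 0.964 False: not enough evidence to reject H0
```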

Testing Population Mean (µ) when Population Standard
Deviation (σ) is Unknown – Main Steps

1. Construct the hypotheses (one of):

𝐻0: 𝜇 = 𝑎 vs 𝐻1: 𝜇 > 𝑎 (one-tail right), or
𝐻0: 𝜇 = 𝑎 vs 𝐻1: 𝜇 < 𝑎 (one-tail left), or
𝐻0: 𝜇 = 𝑎 vs 𝐻1: 𝜇 ≠ 𝑎 (two-tail)

2. Draw the appropriate Student t graph and locate the rejection region: to the right of 𝑡𝛼 (one-tail right), to the left of −𝑡𝛼 (one-tail left), or beyond −𝑡𝛼/2 and 𝑡𝛼/2 (two-tail).

3. Find the critical t value from the Student t table and put it on the graph: 𝑡𝛼 for one-tail tests and 𝑡𝛼/2 for two-tail tests. The degrees of freedom are 𝑛 − 1.

4. Compute the t-stat, where 𝑠 is the sample standard deviation, and put its value on the graph:

$$t_{stat} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

5. Make a decision:

One-tail right: If 𝑡𝑠𝑡𝑎𝑡 > 𝑡𝛼, there is enough evidence to reject 𝐻0.

One-tail left: If 𝑡𝑠𝑡𝑎𝑡 < −𝑡𝛼, there is enough evidence to reject 𝐻0.

Two-tail: If 𝑡𝑠𝑡𝑎𝑡 > 𝑡𝛼/2 or 𝑡𝑠𝑡𝑎𝑡 < −𝑡𝛼/2, there is enough evidence to reject 𝐻0.
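Steps 4 and 5 can be sketched with Python's standard library. Because the standard library has no Student t quantile function, this sketch assumes the critical value `t_crit` is read from the t table and passed in; the function names and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t_stat(sample, mu0):
    """Step 4: t_stat = (X-bar - mu) / (s / sqrt(n)); returns (t_stat, degrees of freedom n - 1)."""
    n = len(sample)
    s = stdev(sample)                      # sample standard deviation (n - 1 divisor)
    return (mean(sample) - mu0) / (s / sqrt(n)), n - 1

def t_decision(t_stat, t_crit, tail):
    """Step 5: compare t_stat with the positive table value t_crit (t_alpha or t_alpha/2)."""
    if tail == "right":
        return t_stat > t_crit
    if tail == "left":
        return t_stat < -t_crit
    if tail == "two":
        return t_stat > t_crit or t_stat < -t_crit
    raise ValueError("tail must be 'right', 'left' or 'two'")

# Hypothetical sample, H0: mu = 12 vs H1: mu > 12, alpha = 0.05
t, df = one_sample_t_stat([12, 14, 15, 13, 16], 12)
print(round(t, 3), df, t_decision(t, 2.132, "right"))   # 2.828 4 True (t_0.05,4 = 2.132 from the t table)
```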

Example 5

A nurse claims that, on average, Americans are less than 10 kg overweight. A random sample of 11 Americans was weighed to measure the difference between their actual and ideal weights. The results (kg) are shown here.

9 11 12 10 8.5 8 8 7 8 9 8

Can the nurse conclude that her claim is true? She uses a 90% confidence level.

Example 5

First, we construct the hypotheses:

𝐻0: 𝜇 = 10

𝐻1: 𝜇 < 10

$$\bar{X} = \frac{\sum x}{n} = \frac{98.5}{11} = 8.95$$

𝑠 = 1.49 (Excel function STDEV.S), 𝑛 = 11, 1 − 𝛼 = 0.9, so 𝛼 = 0.1

The test is one-tail left.

For the t critical value, we use the t table with 11 − 1 = 10 degrees of freedom: −𝑡𝛼 = −𝑡0.1,10 = −1.37

For the t-stat, we use the formula:

$$t_{stat} = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{8.95 - 10}{1.49/\sqrt{11}} \approx -2.33$$

t-stat (−2.33) < −t-critical (−1.37). Therefore, the t-stat falls in the rejection region. There is enough evidence to reject 𝐻0, which supports the nurse's claim.
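Example 5 can be reproduced from the raw data. In this sketch `statistics.stdev` plays the role of Excel's STDEV.S (both use the 𝑛 − 1 divisor), and the critical value 1.372 is taken from the t table because the standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, stdev

# Differences between actual and ideal weights (kg), Example 5
diffs = [9, 11, 12, 10, 8.5, 8, 8, 7, 8, 9, 8]
n = len(diffs)
x_bar = mean(diffs)        # 98.5 / 11
s = stdev(diffs)           # sample standard deviation, like Excel's STDEV.S
t_stat = (x_bar - 10) / (s / sqrt(n))
t_crit = 1.372             # t_{0.10, 10} read from the t table
print(round(x_bar, 2), round(s, 2), round(t_stat, 2), t_stat < -t_crit)   # 8.95 1.49 -2.33 True
```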

Testing Population Proportion – Main Steps

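1. Construct the hypotheses (one of):

𝐻0: 𝑝 = 𝑏 vs 𝐻1: 𝑝 > 𝑏 (one-tail right), or
𝐻0: 𝑝 = 𝑏 vs 𝐻1: 𝑝 < 𝑏 (one-tail left), or
𝐻0: 𝑝 = 𝑏 vs 𝐻1: 𝑝 ≠ 𝑏 (two-tail)

2. Draw the appropriate standard normal (z) graph and locate the rejection region: to the right of 𝑍𝛼 (one-tail right), to the left of −𝑍𝛼 (one-tail left), or beyond −𝑍𝛼/2 and 𝑍𝛼/2 (two-tail).

3. Find the critical z value from the Normal table and put it on the graph: 𝑍𝛼 for one-tail tests and 𝑍𝛼/2 for two-tail tests.

4. Compute the z-stat, where 𝑝̂ is the sample proportion, and put its value on the graph:

$$Z_{stat} = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$$

5. Make a decision:

One-tail right: If 𝑍𝑠𝑡𝑎𝑡 > 𝑍𝛼, there is enough evidence to reject 𝐻0.

One-tail left: If 𝑍𝑠𝑡𝑎𝑡 < −𝑍𝛼, there is enough evidence to reject 𝐻0.

Two-tail: If 𝑍𝑠𝑡𝑎𝑡 > 𝑍𝛼/2 or 𝑍𝑠𝑡𝑎𝑡 < −𝑍𝛼/2, there is enough evidence to reject 𝐻0.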
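The proportion test follows the same pattern, with the standard error built from the hypothesized 𝑝. A minimal sketch, assuming `statistics.NormalDist` in place of the Normal table; the function name and the data in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(x, n, p0, alpha, tail):
    """Test H0: p = p0 from x successes in n trials; returns (p_hat, z_stat, reject H0?)."""
    p_hat = x / n                                     # sample proportion
    z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error uses p0 from H0
    inv_cdf = NormalDist().inv_cdf
    if tail == "right":
        reject = z_stat > inv_cdf(1 - alpha)
    elif tail == "left":
        reject = z_stat < -inv_cdf(1 - alpha)
    elif tail == "two":
        reject = abs(z_stat) > inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return p_hat, z_stat, reject

# Hypothetical data: 60 successes out of 100, H0: p = 0.5 vs H1: p != 0.5, alpha = 0.05
p_hat, z, reject = proportion_z_test(60, 100, 0.5, 0.05, "two")
print(round(p_hat, 3), round(z, 2), reject)   # 0.6 2.0 True
```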

Example 6

An insurance company wants to know what proportion of drivers in a city have at least one police ticket for exceeding the speed limit. The finance department claims that more than three-quarters of the drivers fall in this group.

As a test, a random sample of 200 cars that have auto insurance with the company was selected. It was found that 143 of the drivers have a police ticket for exceeding the speed limit.

Does the insurance company have enough evidence at the 10% significance level to support its belief?

Example 6

𝐻0: 𝑝 = 0.75

𝐻1: 𝑝 > 0.75

𝛼 = 0.1, 𝑛 = 200, 𝑥 = 143

$$\hat{p} = \frac{x}{n} = \frac{143}{200} = 0.715$$

The test is a one-tail right proportion test.

For the z critical value, we use the Normal table: 𝑍𝛼 = 𝑍0.1 = 1.28

For the z-stat, we use the formula:

$$Z_{stat} = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} = \frac{0.715 - 0.75}{\sqrt{0.75(1 - 0.75)/200}} = -1.14$$

z-stat (−1.14) < z-critical (1.28). Therefore, the z-stat does not fall in the rejection region. There is not enough evidence to reject 𝐻0.
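A quick check of Example 6 with Python's standard library (a sketch; `NormalDist` replaces the Normal table). It also shows why the sample proportion should be kept as 0.715 rather than rounded to 0.71: rounding shifts the z-stat from about −1.14 to about −1.31.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0, alpha = 143, 200, 0.75, 0.10
p_hat = x / n                                       # 0.715
z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z_crit = NormalDist().inv_cdf(1 - alpha)            # Z_0.10, about 1.28
print(p_hat, round(z_stat, 2), round(z_crit, 2), z_stat > z_crit)   # 0.715 -1.14 1.28 False
```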

In Class Activity 1

Because television audiences of newscasts tend to be older (and because older people suffer from a variety of medical ailments), pharmaceutical companies' advertising often appears on the national news of the three networks (ABC, CBS, and NBC). The ads concern

prescription drugs such as those to treat heartburn. To determine how

effective the ads are, a survey was undertaken. Adults over 50 who

regularly watch network newscasts were asked whether they had

contacted their physician to ask about one of the prescription drugs

advertised during the newscast. The responses (1 = No and 2 = Yes)

were recorded. Estimate with 95% confidence the fraction of adults over

50 who have contacted their physician to inquire about a prescription

drug.

In Class Activity 2

A random sample of 18 young adult men (20–30 years old) was sampled.

Each person was asked how many minutes of sports he watched on television

daily. The responses are listed here. It is known that σ = 10. Test to determine

at the 5% significance level whether there is enough statistical evidence to

infer that the mean amount of television watched daily by all young adult men

is greater than 50 minutes.

50 48 65 74 66 37 45 68 64

65 58 55 52 63 59 57 74 65

In Class Activity 3

A manufacturer of lightbulbs advertises that, on average, its long-life

bulb will last more than 5,000 hours. To test the claim, a statistician took

a random sample of 100 bulbs and measured the amount of time until

each bulb burned out. If we assume that the lifetime of this type of bulb

has a standard deviation of 400 hours, can we conclude at the 5%

significance level that the claim is true?

In Class Activity 4

Companies that sell groceries over the Internet are called e-grocers.

Customers enter their orders, pay by credit card, and receive delivery by truck.

A potential e-grocer analyzed the market and determined that the average

order would have to exceed $85 if the e-grocer were to be profitable. To

determine whether an e-grocery would be profitable in one large city, she

offered the service and recorded the size of the order for a random sample of

customers. Can we infer from these data that an e-grocery will be profitable in

this city?


Thank you

Advanced Business Statistics

▪ Continuous Probability Distributions

▪ Normal Distribution

▪ Student t Distribution

▪ Data Collection and Sampling

Winter 2022

Agenda

❑ Review Distributions

❑ Normal Distribution

❑ Student t Distribution

❑ Sampling and Data Collection

Random Variables

A random variable is a function or rule that assigns a number to each outcome of an experiment. Alternatively, the value of a random variable is a numerical event.

Two Types of Random Variables:

– Discrete Random Variable: one that takes on a countable number of values
   – E.g., the sum of the values on a roll of two dice: 2, 3, 4, …, 12

– Continuous Random Variable: one whose values are not discrete and not countable
   – E.g., time (30.1 minutes? 30.10000001 minutes?)

Probability Distributions

A probability distribution is a table, formula, or graph that describes the values of a random variable and the probabilities associated with these values.

➢ Discrete (Binomial, Poisson, …)

A discrete variable can take on a countable number of values.

➢ Continuous (Uniform, Normal, …)

A continuous variable is one whose values are uncountable, forming an infinite continuum of possible values.

Notation: an upper-case letter, usually X, represents the name of the random variable, and its lower-case counterpart, x, represents a value of the random variable. The probability that the random variable X will equal x is written P(X = x).

Probability: a discrete probability distribution must satisfy

$$0 \le P(x) \le 1 \text{ for all } x \quad \text{and} \quad \sum_{\text{all } x_i} P(x_i) = 1$$
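The two requirements can be checked mechanically. A minimal sketch with a hypothetical helper `is_valid_discrete_distribution`; the small tolerance allows for floating-point rounding when the probabilities are summed.

```python
def is_valid_discrete_distribution(probs, tol=1e-9):
    """Check 0 <= P(x) <= 1 for every x and that the P(x) values sum to 1 (within a float tolerance)."""
    values = list(probs.values())
    return all(0 <= p <= 1 for p in values) and abs(sum(values) - 1) <= tol

fair_die = {face: 1 / 6 for face in range(1, 7)}         # result of the toss of a fair die
print(is_valid_discrete_distribution(fair_die))           # True
print(is_valid_discrete_distribution({0: 0.5, 1: 0.6}))   # False: probabilities sum to 1.1
```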

Probability Distributions

Examples of discrete variables:

❖ number of defective items produced during a week (possible values

0,1,2,…)

❖ result of the toss of a fair die (1,2,3,4,5,6)

❖ result of the flip of a coin (tails = 0, heads = 1)

❖ budget for a project when there are 3 alternatives ($25,000, $40,000, and $50,000)

Example

The Statistical Abstract of the United States is published annually. It contains a wide variety of information based on the census as well as other sources.

The objective is to provide information about a variety of different aspects of the lives of the country's residents. One of the questions asks households to report the number of persons living in the household. The following table summarizes the data. Develop the probability distribution of the random variable defined as the number of persons per household.

Number of Persons                  1      2      3      4      5     6     7 or more   Total
Number of Households (Millions)    31.1   38.6   18.8   16.2   7.2   2.7   1.4         116

Probability Distributions

X            P(X)
1            31.1/116 = 0.268
2            38.6/116 = 0.333
3            18.8/116 = 0.162
4            16.2/116 = 0.140
5            7.2/116 = 0.062
6            2.7/116 = 0.023
7 or more    1.4/116 = 0.012

The probability that a household has 3 or fewer persons:

P(X ≤ 3) = 0.268 + 0.333 + 0.162 = 0.763

The probability that a household has 6 persons:

P(X = 6) = 0.023

The probability that a household has 5 or more persons:

P(X ≥ 5) = 0.062 + 0.023 + 0.012 = 0.097
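The household distribution can be built directly from the counts, which avoids accumulating rounding from the three-decimal probabilities. A sketch; coding the "7 or more" category as the key 7 is our own convention.

```python
# Number of households (millions) by number of persons, from the table above
counts = {1: 31.1, 2: 38.6, 3: 18.8, 4: 16.2, 5: 7.2, 6: 2.7, 7: 1.4}  # key 7 means "7 or more"
total = sum(counts.values())                          # 116.0 (millions)
dist = {x: c / total for x, c in counts.items()}      # P(X = x) = relative frequency

p_at_most_3 = sum(p for x, p in dist.items() if x <= 3)
p_six = dist[6]
p_at_least_5 = sum(p for x, p in dist.items() if x >= 5)
print(round(total, 1), round(p_at_most_3, 3), round(p_six, 3), round(p_at_least_5, 3))
# 116.0 0.763 0.023 0.097
```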

In Class Activity

A survey of Amazon.com shoppers reveals the following probability distribution

of the number of books purchased per hit.

a. What is the probability that an Amazon.com visitor will buy four books?

b. What is the probability that an Amazon.com visitor will buy eight books?

c. What is the probability that an Amazon.com visitor will not buy any books?

d. What is the probability that an Amazon.com visitor will buy at least one

book?

x 0 1 2 3 4 5 6 7

P(x) 0.35 0.25 0.20 0.08 0.06 0.03 0.02 0.01

In Class Activity

A university librarian produced the following probability distribution of the

number of times a student walks into the library over the period of a semester.

a. P( X ≥ 20 )

b. P( X = 60 )

c. P( X > 50 )

d. P( X > 100 )

x      0     5     10    15    20    25    30    40    50    75    100

P(x)   0.22  0.29  0.12  0.09  0.08  0.05  0.04  0.04  0.03  0.03  0.01

Continuous Distributions

Unlike a discrete random variable, a continuous random variable is one that

can assume an uncountable number of values.

❑ We cannot list the possible values because there is an infinite

number of them.

❑ Because there is an infinite number of values, the probability of any individual value is 0.

❑ Thus, we can determine the probability of a range of values only.

Probability Density Functions

A function f(x) is called a probability density function (over the range a ≤ x ≤ b) if it meets the following requirements:

• f(x) ≥ 0 for all x between a and b

• The total area under the curve between a and b is 1.0

[Figure: density curve f(x) over a ≤ x ≤ b; the area under the curve is 1]

Normal Distribution

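The normal distribution, with the well-known “bell-shaped” curve, is defined by its mean 𝜇 and its standard deviation σ. Its probability density function is given by:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty$$

where 𝑒 = 2.71828… and 𝜋 = 3.14159…

▪ Bell-shaped

▪ Symmetric about the mean 𝜇

▪ Total area under the curve is equal to 1

[Figure: bell-shaped normal curve centered at 𝜇 with spread 𝜎]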
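The density formula can be typed in directly and cross-checked against Python's `statistics.NormalDist.pdf`. A minimal sketch; `normal_pdf` is our own helper name.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-0.5 * ((x - mu) / sigma) ** 2)"""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# Cross-check against the standard library's implementation
print(abs(normal_pdf(110, 100, 20) - NormalDist(100, 20).pdf(110)) < 1e-12)  # True
print(round(normal_pdf(0, 0, 1), 4))   # peak of the standard normal curve: 0.3989
```

The curve is symmetric about 𝜇, so for example `normal_pdf(95, 100, 20)` equals `normal_pdf(105, 100, 20)`.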

Normal Distribution

[Figure: Normal distributions with the same variance but different means (𝜇 = 5, 𝜇 = 9, 𝜇 = 13)]

[Figure: Normal distributions with the same mean but different variances (𝜎 = 8, 𝜎 = 13, 𝜎 = 16)]

Example 2

The daily hours that 100 students spend on computer games are shown in the table below.

Student # hours Student # hours Student # hours Student # hours Student # hours

1 1.53 21 4.11 41 2.59 61 3.39 81 4.50

2 4.51 22 4.36 42 6.51 62 6.00 82 4.51

3 3.11 23 3.59 43 4.45 63 3.58 83 4.58

4 6.13 24 4.26 44 6.36 64 8.50 84 5.00

5 4.10 25 4.43 45 5.24 65 7.22 85 5.28

6 4.11 26 4.22 46 5.00 66 4.10 86 7.00

7 6.33 27 6.45 47 2.50 67 5.00 87 1.25

8 3.00 28 4.00 48 4.00 68 6.00 88 2.41

9 4.59 29 4.51 49 5.44 69 6.27 89 6.00

10 4.05 30 5.25 50 6.43 70 5.11 90 3.43

11 5.44 31 4.28 51 3.50 71 3.24 91 4.44

12 5.21 32 2.50 52 5.00 72 6.00 92 4.18

13 6.00 33 3.00 53 3.21 73 5.16 93 2.55

14 5.12 34 5.24 54 3.30 74 3.43 94 4.34

15 5.00 35 7.00 55 3.34 75 5.14 95 6.00

16 5.44 36 4.00 56 3.05 76 4.51 96 3.09

17 5.12 37 6.37 57 5.26 77 4.00 97 5.00

18 4.24 38 5.05 58 6.53 78 1.10 98 6.42

19 5.00 39 5.00 59 3.37 79 5.00 99 5.11

20 2.35 40 4.48 60 7.15 80 3.21 100 0.15

Normal Distribution

Hours            Frequency
0 ≤ X < 1        1
1 ≤ X < 2        3
2 ≤ X < 3        6
3 ≤ X < 4        17
4 ≤ X < 5        27
5 ≤ X < 6        25
6 ≤ X < 7        16
7 ≤ X < 8        4
8 ≤ X < 9        1
Total            100

[Figure: histogram of the frequencies above, hours 0 to 9 on the horizontal axis]

Normal Distribution

• The probability of spending less than 3 hours on computer games:

• The probability of spending more than 6 hours on computer games:

• The probability of spending between 4 and 7 hours on computer games:
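These probabilities can be estimated as relative frequencies from the frequency table. This sketch assumes each interval includes its lower bound (a ≤ X < b), which is what the raw data in the student table produce. Note that the table can only give P(X ≥ 6), which also counts students who reported exactly 6.00 hours; separating those out requires the raw data.

```python
# Frequency table above; bin k covers k <= X < k + 1 hours (k = 0, ..., 8)
freq = [1, 3, 6, 17, 27, 25, 16, 4, 1]
n = sum(freq)  # 100 students

def rel_freq(lo, hi):
    """Share of students in bins lo <= X < hi (lo and hi must be whole hours)."""
    return sum(freq[lo:hi]) / n

print(rel_freq(0, 3))   # P(X < 3) estimate: 0.1
print(rel_freq(4, 7))   # P(4 <= X < 7) estimate: 0.68
print(rel_freq(6, 9))   # P(X >= 6) estimate: 0.21
```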

The Empirical Rule

The 68-95-99.7 Rule (the Empirical Rule)

In bell-shaped distributions, about 68% of the values fall within one

standard deviation of the mean, about 95% of the values fall within two

standard deviations of the mean, and about 99.7% of the values fall

within three standard deviations of the mean.
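For an exactly normal distribution the three percentages can be computed from the standard normal CDF. A sketch with Python's `statistics.NormalDist`; for other bell-shaped distributions the rule remains an approximation.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mu = 0, sigma = 1
# P(mu - k*sigma < X < mu + k*sigma) = P(-k < Z < k)
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} sd: {share:.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```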

Standard Normal Distribution

To calculate the probability that a normal random variable falls into any interval, the area under the curve over that interval must be computed. Unlike for some other distributions, this area cannot be found with a simple formula.

Instead, the random variable is standardized by subtracting its mean μ and dividing by its standard deviation σ, and the probability is then read from the normal distribution table.

When the variable is normal, the transformed variable is called a “standard normal” random variable, denoted by Z, with μ = 0 and σ = 1; that is:

$$Z = \frac{X - \mu}{\sigma}$$

Standard Normal Distribution

P( Z ≤ -2.45 ) = 0.0071

P( Z ≤ -1.28 ) = 0.1003

[Figure: shaded left-tail areas under the standard normal curve, below −2.45 and below −1.28]

Example 3

X is normally distributed with mean 100 and standard deviation 20. What is

the probability that X is less than 145?

$$P(X < 145) = P\left(\frac{X - 100}{20} < \frac{145 - 100}{20}\right) = P(Z < 2.25) = 0.9878$$

(from the normal distribution table)

Example 4

X is normally distributed with mean 1,000 and standard deviation 250. What is the probability that X lies between 800 and 1,100?

$$P(800 < X < 1100) = P\left(\frac{800 - 1000}{250} < \frac{X - 1000}{250} < \frac{1100 - 1000}{250}\right) = P(-0.8 < Z < 0.4)$$

$$= P(Z < 0.4) - P(Z < -0.8) = 0.6554 - 0.2119 = 0.4435$$

(from the normal distribution table)

Example 5

The lifetimes of light bulbs that are advertised to last for 5,000 hours are normally distributed with a mean of 5,100 hours and a standard deviation of 200 hours. What is the probability that a bulb lasts longer than the advertised figure?

$$P(X > 5000) = P\left(\frac{X - 5100}{200} > \frac{5000 - 5100}{200}\right) = P(Z > -0.5)$$

$$= 1 - P(Z < -0.5) = 1 - 0.3085 = 0.6915$$

(from the normal distribution table)
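Examples 3 to 5 can be reproduced by standardizing and then evaluating the standard normal CDF, which `statistics.NormalDist().cdf` provides in place of the table. In Example 4 the result is 0.4436 rather than 0.4435 because the slide subtracts two rounded table entries.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """z = (x - mu) / sigma"""
    return (x - mu) / sigma

Phi = NormalDist().cdf  # standard normal CDF, P(Z < z)

ex3 = Phi(standardize(145, 100, 20))                                        # P(X < 145)
ex4 = Phi(standardize(1100, 1000, 250)) - Phi(standardize(800, 1000, 250))  # P(800 < X < 1100)
ex5 = 1 - Phi(standardize(5000, 5100, 200))                                 # P(X > 5000)
print(round(ex3, 4), round(ex4, 4), round(ex5, 4))  # 0.9878 0.4436 0.6915
```

Standardizing is shown explicitly to mirror the slides; `NormalDist(100, 20).cdf(145)` gives the same answer in one call.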

In Class Activity

a. P(Z < 2.23)

b. P(Z > 1.87)

c. P(1.04 < Z < 2.03)

In Class Activity

The long-distance calls made by the employees of a company are normally

distributed with a mean of 6.3 minutes and a standard deviation of 2.2

minutes. Find the probability that a call

a. lasts between 5 and 10 minutes.

b. lasts more than 7 minutes.

c. lasts less than 4 minutes.

Finding Values of Z

Often, we are asked to find the value of Z for a given probability; that is, given an area A under the curve, what is the corresponding value 𝑧𝐴 on the horizontal axis that gives us this area? That is:

$$P(Z > z_A) = A$$

Finding Values of Z

Example 6: Because of relatively high interest rates, most consumers attempt to pay off their credit card bills promptly. However, this is not always possible. An analysis of the amount of interest paid monthly by a bank's Visa cardholders reveals that the amount is normally distributed with a mean of $27 and a standard deviation of $7.

What interest payment is exceeded by only 20% of the bank's Visa cardholders?

𝜇 = 27, 𝜎 = 7

$$P(Z > z_A) = 0.2 \;\rightarrow\; 1 - P(Z < z_A) = 0.2 \;\rightarrow\; P(Z < z_A) = 0.8 \;\rightarrow\; z_A = 0.84$$

$$z = \frac{X - \mu}{\sigma} \;\rightarrow\; 0.84 = \frac{X - 27}{7} \;\rightarrow\; X = 32.88$$

The interest payment exceeded by only 20% of the bank's Visa cardholders is $32.88.
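The inverse lookup in Example 6 corresponds to the inverse CDF, available as `statistics.NormalDist.inv_cdf`. A sketch; the exact quantile 0.8416 gives $32.89, while the slide's table value 0.84 gives $32.88.

```python
from statistics import NormalDist

z_A = NormalDist().inv_cdf(1 - 0.20)   # P(Z < z_A) = 0.8, so P(Z > z_A) = 0.2
x = 27 + z_A * 7                        # X = mu + z_A * sigma
print(round(z_A, 2), round(x, 2))       # 0.84 32.89
print(round(NormalDist(27, 7).inv_cdf(0.8), 2))  # same answer in one call: 32.89
```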

Student’s t Distribution

In probability and statistics, Student's t distribution (or simply the t distribution) is any member of a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown.

The density function of the Student t distribution is:

$$f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^{2}}{\nu}\right)^{-\frac{\nu + 1}{2}}, \quad -\infty < t < \infty$$

where 𝜈 (nu) is called the degrees of freedom and Γ is the Gamma function; for a positive integer k, Γ(k) = (k − 1)! = (k − 1)(k − 2)⋯(2)(1).
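The density can be evaluated with `math.lgamma`, which handles the non-integer arguments 𝜈/2 and avoids overflow for large 𝜈. A minimal sketch; `t_pdf` is our own helper name. With 𝜈 = 1 the peak is 1/𝜋, and for large 𝜈 it approaches the standard normal peak 1/√(2𝜋).

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(t, nu):
    """Student t density with nu degrees of freedom (log-gamma avoids overflow for large nu)."""
    log_coef = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return exp(log_coef - (nu + 1) / 2 * log(1 + t * t / nu))

print(round(t_pdf(0, 1), 4))                      # 1/pi = 0.3183
print(abs(t_pdf(0, 1000) - 1 / sqrt(2 * pi)))     # small: close to the standard normal peak
```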

Student’s t Distribution

In much the same way that µ and σ define the normal distribution, ν, the degrees of freedom, defines the Student t distribution.

As the number of degrees of freedom increases, the t distribution approaches

the standard normal distribution.

Determining Student t Values

The Student t distribution is used extensively in statistical inference.

The t distribution table in the appendix lists values of 𝑡𝐴,𝜐, that is, values of a Student t random variable with 𝜐 degrees of freedom such that:

$$P(t > t_{A,\nu}) = A$$

The values of A are predetermined “critical” values, typically 10%, 5%, 2.5%, 1%, and 0.5%.

Determining Student t Values

For example, if we want the value of t with 10 degrees of freedom such that the area under the Student t curve to its right is 0.05:

$$t_{0.05,10} = 1.812$$
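Python's standard library has no Student t quantile function, so this sketch recovers table values numerically: it integrates the density with Simpson's rule to get the upper-tail area and then bisects for 𝑡𝐴,𝜈. The helper names, step counts, and the search range [0, 100] are our own assumptions, adequate for the usual table values of A. If SciPy is available, `scipy.stats.t.ppf(1 - A, nu)` returns the same value directly.

```python
from math import exp, lgamma, log, pi

def t_pdf(t, nu):
    """Student t density with nu degrees of freedom."""
    log_coef = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return exp(log_coef - (nu + 1) / 2 * log(1 + t * t / nu))

def upper_tail(x, nu, steps=2000):
    """P(t > x) for x >= 0: 0.5 minus the area from 0 to x (composite Simpson's rule, even steps)."""
    h = x / steps
    area = t_pdf(0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - area * h / 3

def t_critical(A, nu, lo=0.0, hi=100.0, iters=50):
    """Find t_{A,nu} with P(t > t_{A,nu}) = A (0 < A < 0.5) by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if upper_tail(mid, nu) > A:
            lo = mid   # tail area still too large: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, 10), 3))   # 1.812, matching the table
```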

In Class Activity

Use the t table to find the following values of t.

– 𝑡0.10,15

– 𝑡0.10,23

– 𝑡0.025,83

– 𝑡0.05,195

Data, Statistics, and Information

Statistics is a tool for converting data into information:

• But where then does data come from?

• How is it gathered?

• How do we ensure it is accurate?

• Is the data reliable?

• Is it representative of the population from which it was drawn?

Data → Statistics → Information

Methods of Collecting Data

There are many methods used to collect or obtain data for statistical analysis. Three of the most popular methods are:

• Surveys

• Direct observation (e.g., the number of customers entering a bank per hour)

• Experiments (e.g., new ways to produce things to minimize costs)

Questionnaire Design – Key Principles

1. Keep the questionnaire as short as possible.

2. Ask short, simple, and clearly worded questions.

3. Start with demographic questions to help respondents get started

comfortably.

4. Use dichotomous (yes/no) and multiple-choice questions.

5. Use open-ended questions cautiously.

6. Avoid leading questions.

7. Pretest a questionnaire on a small number of people.

8. Think about the way you intend to use the collected data when

preparing the questionnaire.

Simple Random Sampling

A simple random sample is a sample selected in such a way that every possible sample of the same size is equally likely to be chosen.

For example, drawing three names from a hat containing the names of all the students in the class produces a simple random sample: any group of three names is as likely to be picked as any other group of three names.
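Drawing names from a hat corresponds to `random.sample` in Python, which picks `k` distinct items so that every group of size `k` is equally likely. A sketch with a hypothetical class list; the seed only makes the draw reproducible.

```python
import random

# Hypothetical class list; any names would do
students = ["Ava", "Ben", "Chen", "Dara", "Eli", "Farah", "Gus", "Hana", "Ivan", "Jo"]

rng = random.Random(2022)            # seeded only so the draw is reproducible
sample = rng.sample(students, k=3)   # every group of 3 names is equally likely
print(sample)
```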

Sampling

A sample is a subset of a whole population. Sampling is often done for two reasons:

• Cost → it is less expensive to sample 1,000 television viewers than 100 million TV viewers

• Practicality → performing a crash test on every automobile produced is impractical

Three Sampling Methods

• Simple random sampling

• Stratified random sampling

• Cluster sampling

Stratified Random Sampling

A stratified random sample is obtained by separating the population into mutually exclusive sets, or strata, and then drawing simple random samples from each stratum.

Strata 1: Gender (male, female)

Strata 2: Age (less than 20, 20–30, 31–40, 41–50, 51–60, more than 60)

Strata 3: Occupation (professional, clerical, blue collar, other)
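A stratified sample can be sketched as "group, then draw a simple random sample within each group". The slide does not say how many units to draw per stratum; this sketch assumes proportional allocation (the same fraction from every stratum, at least one unit each), and the population and helper name are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Split the population into strata, then draw a simple random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))   # proportional allocation, at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population of (id, gender) records: 60 female, 40 male
people = [(i, "Female" if i < 60 else "Male") for i in range(100)]
s = stratified_sample(people, stratum_of=lambda p: p[1], fraction=0.1, seed=1)
print(len(s), sum(1 for p in s if p[1] == "Female"))   # 10 6
```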

Cluster Random Sampling

A cluster sample is a simple random

sample of groups or clusters of elements

(vs. a simple random sample of individual

objects).

This method is useful when it is difficult or

costly to develop a complete list of the

population members or when the

population elements are widely dispersed

geographically.
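In cluster sampling the random draw is over whole clusters rather than individuals. A minimal sketch with hypothetical city blocks as clusters; every household on a chosen block enters the sample.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Simple random sample of whole clusters; every element of a chosen cluster is included."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster labels, not individuals
    return {label: clusters[label] for label in chosen}

# Hypothetical city blocks (clusters) and the households on each block
blocks = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"], "C": ["c1", "c2", "c3", "c4"], "D": ["d1"]}
picked = cluster_sample(blocks, n_clusters=2, seed=7)
print(picked)
```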

Sampling and Non-Sampling Errors

Two major types of error can arise when a sample of observations is taken from a population:

Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample. Increasing the sample size will reduce this error.

Non-sampling errors are more serious and are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly. Increasing the sample size will not reduce this type of error. Three types of non-sampling errors are:

• Errors in data acquisition

• Nonresponse errors

• Selection bias


Thank you

Advanced Business Statistics

▪ Introduction to Hypothesis Testing (Two Samples)

Winter 2022

Agenda

Introduction to Hypothesis Testing of the Difference Between Means (Two Samples)

❑ When 𝜎1 and 𝜎2 are known

❑ When 𝜎1 and 𝜎2 are unknown

• Considering equal variances

• Considering unequal variances

❑ For Proportion

Concept of Hypothesis Testing – Review

▪ There are two hypotheses: (1) the null and (2) the alternative.

▪ In hypothesis testing, we start by assuming that the null hypothesis is true.

▪ The main objective is to determine whether there is enough evidence to reject 𝐻0 in favor of 𝐻𝐴.

▪ Two possible results are:

o there is enough evidence to support the alternative

o there is not enough evidence to support the alternative

▪ Two possible errors are:

o Reject a true null hypothesis, P(error type one) = α

o Not reject a false null hypothesis, P(error type two) = β

Summary of one-sample hypothesis tests (one-tail right, one-tail left, two-tail):

1. Mean, σ known

Hypotheses: 𝐻0: 𝜇 = 𝑎 vs 𝐻1: 𝜇 > 𝑎, 𝐻1: 𝜇 < 𝑎, or 𝐻1: 𝜇 ≠ 𝑎

$$Z_{stat} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

Reject 𝐻0 if 𝑍𝑠𝑡𝑎𝑡 > 𝑍𝛼 (one-tail right), if 𝑍𝑠𝑡𝑎𝑡 < −𝑍𝛼 (one-tail left), or if 𝑍𝑠𝑡𝑎𝑡 > 𝑍𝛼/2 or 𝑍𝑠𝑡𝑎𝑡 < −𝑍𝛼/2 (two-tail).

2. Mean, σ unknown

Hypotheses: 𝐻0: 𝜇 = 𝑎 vs 𝐻1: 𝜇 > 𝑎, 𝐻1: 𝜇 < 𝑎, or 𝐻1: 𝜇 ≠ 𝑎

$$t_{stat} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

Reject 𝐻0 if 𝑡𝑠𝑡𝑎𝑡 > 𝑡𝛼 (one-tail right), if 𝑡𝑠𝑡𝑎𝑡 < −𝑡𝛼 (one-tail left), or if 𝑡𝑠𝑡𝑎𝑡 > 𝑡𝛼/2 or 𝑡𝑠𝑡𝑎𝑡 < −𝑡𝛼/2 (two-tail).

3. Proportion

Hypotheses: 𝐻0: 𝑝 = 𝑏 vs 𝐻1: 𝑝 > 𝑏, 𝐻1: 𝑝 < 𝑏, or 𝐻1: 𝑝 ≠ 𝑏

$$Z_{stat} = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$$

Reject 𝐻0 if 𝑍𝑠𝑡𝑎𝑡 > 𝑍𝛼 (one-tail right), if 𝑍𝑠𝑡𝑎𝑡 < −𝑍𝛼 (one-tail left), or if 𝑍𝑠𝑡𝑎𝑡 > 𝑍𝛼/2 or 𝑍𝑠𝑡𝑎𝑡 < −𝑍𝛼/2 (two-tail).

Two Populations – Two Samples

Population 1: parameters 𝜇1 and 𝜎1. Sample: size 𝑛1, statistics X̄1 and 𝑠1.

Population 2: parameters 𝜇2 and 𝜎2. Sample: size 𝑛2, statistics X̄2 and 𝑠2.

Inference is about the difference between the two population means, 𝜇1 − 𝜇2.

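Testing the Difference Between Population Means: 𝝈𝟏 and 𝝈𝟐 Are Known – Main Steps

When the standard deviations of both populations are known, the following steps test the difference between the population means.

1. Construct the hypotheses (one of):

𝐻0: 𝜇1 = 𝜇2 vs 𝐻1: 𝜇1 > 𝜇2 (one-tail right), or
𝐻0: 𝜇1 = 𝜇2 vs 𝐻1: 𝜇1 < 𝜇2 (one-tail left), or
𝐻0: 𝜇1 = 𝜇2 vs 𝐻1: 𝜇1 ≠ 𝜇2 (two-tail)

2 and 3. Draw the standard normal graph, find the critical value from the Normal table (𝑧𝛼 for one-tail tests, 𝑧𝛼/2 for two-tail tests), and mark the rejection region: to the right of 𝑧𝛼, to the left of −𝑧𝛼, or beyond −𝑧𝛼/2 and 𝑧𝛼/2.

4. Compute the z-stat:

$$Z_{stat} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^{2}}{n_1} + \frac{\sigma_2^{2}}{n_2}}}$$

5. Make a decision:

One-tail right: If 𝑍𝑠𝑡𝑎𝑡 > 𝑍𝛼, there is enough evidence to reject 𝐻0.

One-tail left: If 𝑍𝑠𝑡𝑎𝑡 < −𝑍𝛼, there is enough evidence to reject 𝐻0.

Two-tail: If 𝑍𝑠𝑡𝑎𝑡 > 𝑍𝛼/2 or 𝑍𝑠𝑡𝑎𝑡 < −𝑍𝛼/2, there is enough evidence to reject 𝐻0.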
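The two-sample steps can be sketched the same way as the one-sample test. The function name and the numbers in the usage line are hypothetical, and `statistics.NormalDist` stands in for the Normal table.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(x1_bar, x2_bar, sigma1, sigma2, n1, n2, alpha, tail):
    """Test H0: mu1 = mu2 when both population standard deviations are known."""
    z_stat = (x1_bar - x2_bar) / sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    inv_cdf = NormalDist().inv_cdf
    if tail == "right":
        reject = z_stat > inv_cdf(1 - alpha)            # H1: mu1 > mu2
    elif tail == "left":
        reject = z_stat < -inv_cdf(1 - alpha)           # H1: mu1 < mu2
    elif tail == "two":
        reject = abs(z_stat) > inv_cdf(1 - alpha / 2)   # H1: mu1 != mu2
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return z_stat, reject

# Hypothetical numbers: X-bar1 = 52, X-bar2 = 50, sigma1 = 4, sigma2 = 3, n1 = n2 = 25, two-tail, alpha = 0.05
z, reject = two_sample_z_test(52, 50, 4, 3, 25, 25, 0.05, "two")
print(round(z, 2), reject)   # 2.0 True
```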

Example 1

A baby-food producer, ABC, claims that its product helps babies gain weight faster than a leading competitor's product, XYZ. A survey was conducted as follows:

• Mothers were asked which product (ABC or XYZ) they intended to feed their babies.

• 38 mothers said they would feed their babies ABC, and 29 said they would feed their babies XYZ.

• Mothers were asked to keep track of their babies' weight gains over the next 3 months.

• Each baby's weight gain (in grams) was recorded (refer to the Excel data).

Assume, according to historical data, that the standard deviation of the weight gain of babies fed ABC is 285 grams and of babies fed XYZ is 320 grams. Using weight gain as our criterion, can we conclude that ABC's product is indeed superior? Assume a confidence level of 95%.


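✓ The question is a hypothesis test.

✓ The population standard deviations are known, so we use z.

$$\bar{X}_1 = \frac{\sum x_1}{n_1} = \frac{94{,}078}{38} = 2{,}476, \qquad \sigma_1 = 285, \qquad n_1 = 38$$

$$\bar{X}_2 = \frac{\sum x_2}{n_2} = \frac{66{,}605}{29} = 2{,}297, \qquad \sigma_2 = 320, \qquad n_2 = 29$$

$1 - \alpha = 0.95$, so $\alpha = 0.05$ and, from the Z table, $z_{0.05} = 1.64$.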


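First, we construct the hypotheses:

$H_0: \mu_1 = \mu_2$

$H_1: \mu_1 > \mu_2$

The test is one-tail right. For the critical value we use the Normal (Z) table: $z_\alpha = z_{0.05} = 1.64$, so the rejection region is $z_{stat} > 1.64$.

For the test statistic we use the formula:

$$z_{stat} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{2{,}476 - 2{,}297}{\sqrt{\dfrac{285^2}{38} + \dfrac{320^2}{29}}} = \frac{179}{75.29} \approx 2.38$$

Since $z_{stat} = 2.38 > z_{0.05} = 1.64$, the test statistic falls in the rejection region. There is enough evidence to reject $H_0$: babies fed ABC gain weight faster on average, so ABC is superior.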
Testing Difference Population Means:
𝝈𝟏 and 𝝈𝟐 are Unknown

When the standard deviations of both populations are unknown, the test for the difference of population means depends on whether the two unknown population variances are equal or not:

• $\sigma_1^2 = \sigma_2^2$

• $\sigma_1^2 \neq \sigma_2^2$

In both situations the Student t distribution is applied; however, the formulas for the test statistic differ.

Testing Difference Population Means:
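𝝈𝟏 and 𝝈𝟐 are Unknown – Main Steps
When $\sigma_1^2 = \sigma_2^2$, the following steps are used to test the difference between the population means.

Hypotheses:

$H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 > \mu_2$ (one-tail right)
$H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 < \mu_2$ (one-tail left)
$H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 \neq \mu_2$ (two-tail)

Critical values (from the t table): $t_\alpha$ (one-tail right), $-t_\alpha$ (one-tail left), $\pm t_{\alpha/2}$ (two-tail).

Test statistic:

$$t_{stat} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where the pooled variance and the degrees of freedom are

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad \text{degrees of freedom} = n_1 + n_2 - 2$$

Decision rules:

One-tail right: if $t_{stat} > t_\alpha$, there is enough evidence to reject $H_0$.
One-tail left: if $t_{stat} < -t_\alpha$, there is enough evidence to reject $H_0$.
Two-tail: if $t_{stat} > t_{\alpha/2}$ or $t_{stat} < -t_{\alpha/2}$, there is enough evidence to reject $H_0$.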

Example 2

A large car manufacturer developed a mandatory training program for its

workers in 4 countries. In assessing the effectiveness of this program, an

operations manager designed a survey measuring two factors:

• Workers' improved performance (time saved, in seconds).

• Age of each worker, in two groups: 21-to-30 and 31-to-40.

The survey measured the improved performance of 200 workers in each age group; the data are listed in the Excel file. With 90% confidence, can we conclude that the training program has been more effective for the 31-to-40 age group? Assume the variances of the improved performance for workers aged 21-to-30 and 31-to-40 are equal ($\sigma_1^2 = \sigma_2^2$).

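✓ The question is a hypothesis test.

✓ The population standard deviations are unknown, so we use t.

$$\bar{X}_1 = \frac{\sum x_1}{n_1} = \frac{17{,}575}{200} = 87.87, \qquad s_1 = 13.38, \qquad n_1 = 200$$

$$\bar{X}_2 = \frac{\sum x_2}{n_2} = \frac{10{,}254}{200} = 51.22, \qquad s_2 = 17.06, \qquad n_2 = 200$$

Degrees of freedom $= n_1 + n_2 - 2 = 200 + 200 - 2 = 398$

$1 - \alpha = 0.90$, so $\alpha = 0.10$ and, from the t table, $t_{0.10} = 1.282$.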

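First, we construct the hypotheses (group 1 = 21-to-30, group 2 = 31-to-40):

$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 < \mu_2$

The test is one-tail left. For the critical value we use the Student t table: $t_\alpha = t_{0.10} = 1.282$, so the rejection region is $t_{stat} < -1.282$.

For the test statistic we first compute the pooled variance:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(200 - 1)(13.38)^2 + (200 - 1)(17.06)^2}{200 + 200 - 2} = 235$$

$$t_{stat} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{87.87 - 51.22}{\sqrt{235\left(\dfrac{1}{200} + \dfrac{1}{200}\right)}} = 23.9$$

Since $t_{stat} = 23.9 > -1.282$, the test statistic does not fall in the rejection region. We fail to reject $H_0$: there is not enough evidence to conclude that the training program has been more effective for the 31-to-40 age group.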
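The pooled-variance calculation in Example 2 can be reproduced with a short Python sketch. It uses the rounded summary statistics from the slides and takes the critical value 1.282 from the t table as given; `pooled_t_test` is a hypothetical helper name.

```python
from math import sqrt


def pooled_t_test(x1_bar, x2_bar, s1, s2, n1, n2):
    """Return (pooled variance, t statistic, degrees of freedom)
    for H0: mu1 = mu2 assuming equal population variances."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    t_stat = (x1_bar - x2_bar) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t_stat, df


sp2, t_stat, df = pooled_t_test(87.87, 51.22, 13.38, 17.06, 200, 200)
t_crit = 1.282  # t_0.10 from the t table; one-tail left test
print(f"sp^2 = {sp2:.2f}, t = {t_stat:.1f}, df = {df}")
print("reject H0:", t_stat < -t_crit)  # the statistic is far to the right
```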

Testing Difference Population Means:
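𝝈𝟏 and 𝝈𝟐 are Unknown – Main Steps
When $\sigma_1^2 \neq \sigma_2^2$, the following steps are used to test the difference between the population means.

Hypotheses:

$H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 > \mu_2$ (one-tail right)
$H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 < \mu_2$ (one-tail left)
$H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 \neq \mu_2$ (two-tail)

Critical values (from the t table): $t_\alpha$ (one-tail right), $-t_\alpha$ (one-tail left), $\pm t_{\alpha/2}$ (two-tail).

Test statistic:

$$t_{stat} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

with

$$\text{degrees of freedom} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

Decision rules:

One-tail right: if $t_{stat} > t_\alpha$, there is enough evidence to reject $H_0$.
One-tail left: if $t_{stat} < -t_\alpha$, there is enough evidence to reject $H_0$.
Two-tail: if $t_{stat} > t_{\alpha/2}$ or $t_{stat} < -t_{\alpha/2}$, there is enough evidence to reject $H_0$.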

Example 3

A business analyst designed a survey to determine the effect of gender on the automobile insurance rate. A random sample of young men and women was drawn (the data are listed in the Excel file): 178 men and 211 women were asked how many kilometers they had driven in the past year.

After analysing the data, she claims that men and women drive almost the same distance in a year. With 90% confidence, do you accept her claim? Assume the variances of the distance driven by men and women are not equal ($\sigma_1^2 \neq \sigma_2^2$).

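✓ The question is a hypothesis test.

✓ The population standard deviations are unknown, so we use t.

$$\bar{X}_1 = \frac{\sum x_1}{n_1} = \frac{3{,}589{,}962}{178} = 20{,}168, \qquad s_1 = 3{,}609, \qquad n_1 = 178$$

$$\bar{X}_2 = \frac{\sum x_2}{n_2} = \frac{3{,}913{,}864}{211} = 18{,}549, \qquad s_2 = 3{,}386, \qquad n_2 = 211$$

$$\text{degrees of freedom} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{\left(\dfrac{3{,}609^2}{178} + \dfrac{3{,}386^2}{211}\right)^2}{\dfrac{\left(3{,}609^2/178\right)^2}{178 - 1} + \dfrac{\left(3{,}386^2/211\right)^2}{211 - 1}} = 367$$

$1 - \alpha = 0.90$, $\alpha = 0.10$, $\alpha/2 = 0.05$; from the t table, $t_{0.05} = 1.645$ (for this many degrees of freedom, t is very close to z).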

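First, we construct the hypotheses:

$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$

The test is two-tail. For the critical values we use the Student t table: $t_{\alpha/2} = t_{0.05} = 1.645$, so the rejection region is $t_{stat} < -1.645$ or $t_{stat} > 1.645$.

For the test statistic we use the formula:

$$t_{stat} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{20{,}168 - 18{,}549}{\sqrt{\dfrac{3{,}609^2}{178} + \dfrac{3{,}386^2}{211}}} = 4.53$$

Since $t_{stat} = 4.53 > 1.645$, the test statistic falls in the rejection region. We have enough evidence to reject $H_0$, so we do not accept the analyst's claim that men and women drive about the same distance.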
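The unequal-variance statistic and its degrees of freedom from Example 3 can be checked with the Python sketch below. It uses the rounded sample means and standard deviations from the slides; the helper name is hypothetical, and the critical value 1.645 is taken from the table as in the example.

```python
from math import sqrt


def welch_t_test(x1_bar, x2_bar, s1, s2, n1, n2):
    """Return (t statistic, degrees of freedom) for H0: mu1 = mu2
    when the population variances are not assumed equal."""
    v1 = s1**2 / n1  # estimated variance of X1_bar
    v2 = s2**2 / n2  # estimated variance of X2_bar
    t_stat = (x1_bar - x2_bar) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_stat, df


t_stat, df = welch_t_test(20168, 18549, 3609, 3386, 178, 211)
t_crit = 1.645  # t_{alpha/2} = t_0.05, two-tail test
print(f"t = {t_stat:.2f}, df = {df:.0f}")
print("reject H0:", abs(t_stat) > t_crit)
```

The degrees-of-freedom formula usually gives a non-integer (about 366.9 here); the slides round it to the nearest whole number before using the t table.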

Testing Difference Population Proportions

In some cases, the objective is to determine the difference between the proportions of two populations, $p_1 - p_2$.

To draw inferences about $p_1$ and $p_2$, a sample of size $n_1$ is taken from population 1 and a sample of size $n_2$ from population 2. For each sample, the number of successes is represented by x, labelled $x_1$ and $x_2$ respectively, and the sample proportions are

$$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}$$

Population 1: parameter $p_1$; sample of size $n_1$; statistic $\hat{p}_1$.
Population 2: parameter $p_2$; sample of size $n_2$; statistic $\hat{p}_2$.

Testing Difference Population Proportions
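– Main Steps

Hypotheses:

$H_0: p_1 = p_2$ vs $H_1: p_1 > p_2$ (one-tail right)
$H_0: p_1 = p_2$ vs $H_1: p_1 < p_2$ (one-tail left)
$H_0: p_1 = p_2$ vs $H_1: p_1 \neq p_2$ (two-tail)

Critical values (from the Z table): $z_\alpha$ (one-tail right), $-z_\alpha$ (one-tail left), $\pm z_{\alpha/2}$ (two-tail).

Decision rules:

One-tail right: if $z_{stat} > z_\alpha$, there is enough evidence to reject $H_0$.
One-tail left: if $z_{stat} < -z_\alpha$, there is enough evidence to reject $H_0$.
Two-tail: if $z_{stat} > z_{\alpha/2}$ or $z_{stat} < -z_{\alpha/2}$, there is enough evidence to reject $H_0$.

Test statistic:

$$z_{stat} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

Here $\hat{p}$ is the pooled sample proportion, used because $H_0$ assumes $p_1 = p_2$.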
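The pooled two-proportion z statistic can be sketched in Python as follows; `two_proportion_z` is a hypothetical name and this is an illustration only. Applied to the counts of Example 4 (111 of 354 and 105 of 478), it works with unrounded proportions, so its result (about 3.05) is somewhat larger than a hand calculation with proportions rounded to two decimals; the conclusion of the one-tail left test is the same either way.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled estimate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


z = two_proportion_z(111, 354, 105, 478)
print(f"z = {z:.2f}")  # a one-tail left test rejects only if z < -2.33
```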

Example 4

Selling extended warranties for products is a profitable business for many stores. The extended warranty is offered both at regular prices and during sales. A store manager recently conducted a survey about the difference in the proportion of customers who bought an extended warranty. The results are:

Sale price (population 1): sample size 354; number who bought the extended warranty 111.
Regular price (population 2): sample size 478; number who bought the extended warranty 105.

At the 1% significance level, can we say that people who paid the regular price are more likely to buy an extended warranty?


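✓ The question is a hypothesis test about population proportions.

$x_1 = 111$, $n_1 = 354$, $\hat{p}_1 = \dfrac{x_1}{n_1} = \dfrac{111}{354} = 0.31$

$x_2 = 105$, $n_2 = 478$, $\hat{p}_2 = \dfrac{x_2}{n_2} = \dfrac{105}{478} = 0.22$

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{111 + 105}{354 + 478} = 0.26$$

$\alpha = 0.01$; from the Z table, $z_{0.01} = 2.33$.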

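First, we construct the hypotheses:

$H_0: p_1 = p_2$
$H_1: p_1 < p_2$

This is the one-tail left case. For the critical value we use the Normal (Z) table: $z_\alpha = z_{0.01} = 2.33$, so the rejection region is $z_{stat} < -2.33$.

For the test statistic we use the formula:

$$z_{stat} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.31 - 0.22}{\sqrt{0.26(1 - 0.26)\left(\dfrac{1}{354} + \dfrac{1}{478}\right)}} \approx 2.93$$

Since $z_{stat} = 2.93 > -2.33$, the test statistic does not fall in the rejection region. There is not enough evidence to reject $H_0$, so we cannot conclude that people who paid the regular price are more likely to buy an extended warranty.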

In Class Activity 1

How important to your health are regular vacations? In a study a random

sample of men and women were asked how frequently they take

vacations. The men and women were divided into two groups each. The

members of group 1 had suffered a heart attack; the members of group

2 had not. The number of days of vacation last year was recorded for

each person. Can we infer that men and women who suffer heart

attacks vacation less than those who did not suffer a heart attack?

In Class Activity 2

In designing advertising campaigns to sell magazines, it is important to know

how much time each of a number of demographic groups spends reading

magazines. In a preliminary study, 40 people were randomly selected. Each

was asked how much time per week he or she spends reading magazines;

additionally, each was categorized by gender and by income level (high or

low). The data are stored in the following way: column 1 = Time spent reading

magazines per week in minutes for all respondents; column 2 = Gender (1 =

Male, 2 = Female); column 3 = Income level (1 = Low, 2 = High).

Is there sufficient evidence at the 10% significance level to conclude that men

and women differ in the amount of time spent reading magazines?

In Class Activity 3

The manager of a dairy is in the process of deciding which of two new carton-filling machines to use. The most important attribute is the consistency of the fills. In a preliminary study she measured the fills in 1-liter cartons and listed them here. Can the manager infer that the two machines differ in the consistency of their fills?

Data Analysis – Microsoft Excel

Let’s answer Example 1 with the Data Analysis add-in in Excel.


References

• Business Statistics in Practice: Second Canadian Edition, Bowerman,

O’Connell, et al. McGraw-Hill, Third Canadian Edition

• G. Keller (2017) Statistics for Management and Economics (Abbreviated),

11th Edition, South-Western (students can also use the 8th edition of the

same textbook).

• M. Middleton (1997) Data Analysis Using Microsoft Excel, Duxbury. (A good

reference for basic statistical work with Excel.)

Thank you

Advanced Business Statistics

▪ Introduction to Estimation (One Sample)

Winter 2022

Agenda

Introduction to Estimation of the Mean (One Sample)

❑ When σ is known

❑ When σ is unknown

❑ For Proportion

Introduction

In almost all realistic situations parameters of population are unknown. We

use the sampling distribution to draw inferences about the unknown

population parameters.

Statistical inference is the process by

which we acquire information and

draw conclusions about populations

from samples.


Important Definitions


Population:

A population is the entire group of units under consideration by the statistician, such as the blood pressures of all men.

Parameter:

A parameter is a characteristic of a population, such as its size, mean, or standard deviation.

Sample:

A sample is any portion of the population selected for study, such as the blood pressures of 100 men selected from that population.

Statistic:

A statistic is a characteristic of a sample, such as its size, mean, or standard deviation.

Important Symbols

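Confidence level: $1 - \alpha$; significance level: $\alpha$.

Population: mean $\mu$, standard deviation $\sigma$, size $N$, proportion $p$.

Sample: mean $\bar{X}$, standard deviation $s$, size $n$, proportion $\hat{p}$.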

Population Inference

There are two types of inference:

❑ Estimation

❑ Hypothesis Testing

The objective of estimation is to determine the approximate value of a

population parameter on the basis of a sample statistic.

Concept of Estimation

As its name suggests, estimation determines the approximate value of a population parameter on the basis of a sample statistic. An important example is estimating the population mean $\mu$ by the sample mean $\bar{X}$.

There are two situations when estimating $\mu$:

❖ When the population standard deviation $\sigma$ is known

❖ When the population standard deviation $\sigma$ is unknown

Estimating μ when σ Is Known

When the standard deviation of the population is known, the following interval can be used to estimate the population mean:

$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Therefore:

$\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the Lower Confidence Limit (LCL)

$\bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the Upper Confidence Limit (UCL)

Also, $\sigma/\sqrt{n}$ is called the Standard Error.

Estimating μ when σ Is Known – Main Steps

To estimate $\mu$ when $\sigma$ is known, follow these steps:

Step 1: write down all the provided data.

Step 2: look up $z_{\alpha/2}$ in the Z table.

Step 3: write the appropriate interval formula for $\mu$.

Step 4: plug in the numbers.
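The four steps can be sketched in Python. Instead of reading $z_{\alpha/2}$ from the Z table (Step 2), this sketch computes it with `statistics.NormalDist` from the Python standard library (3.8+), so its value can differ slightly from a rounded table value such as 1.64; `z_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for mu when sigma is known: (LCL, UCL)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z * sigma / sqrt(n)  # z_{alpha/2} times the standard error
    return x_bar - half_width, x_bar + half_width


# Data of Example 2: X_bar = 250, sigma = 40, n = 100, 95% confidence.
lcl, ucl = z_interval(250, 40, 100, 0.95)
print(f"{lcl:.2f} < mu < {ucl:.2f}")
```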

Example 1

A statistics practitioner took a random sample of 49 observations from a

population with a standard deviation of 21 and computed the sample

mean to be 100. Estimate the population mean with 90%

confidence.

The population standard deviation is known, so we use z for estimation:

$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

$\bar{X} = 100$, $\sigma = 21$, $n = 49$

$1 - \alpha = 0.90$, $\alpha = 0.10$, $\alpha/2 = 0.05$; from the Z table, $z_{0.05} = 1.64$

$$100 - 1.64\frac{21}{\sqrt{49}} < \mu < 100 + 1.64\frac{21}{\sqrt{49}}$$

$$95.08 < \mu < 104.92$$

With 90% confidence, the population mean lies between 95.08 and 104.92.

Example 2

The mean of a random sample of 100 observations from a normal population

with a standard deviation of 40 is 250. Estimate the population mean with 95%

confidence.
The population standard deviation is known, so we use z for estimation:

$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

$\bar{X} = 250$, $\sigma = 40$, $n = 100$

$1 - \alpha = 0.95$, $\alpha = 0.05$, $\alpha/2 = 0.025$; from the Z table, $z_{0.025} = 1.96$

$$250 - 1.96\frac{40}{\sqrt{100}} < \mu < 250 + 1.96\frac{40}{\sqrt{100}}$$

$$242.16 < \mu < 257.84$$

With 95% confidence, the population mean lies between 242.16 and 257.84.

Example 3

How many rounds of golf do physicians (who play golf) play per year? A survey

of 12 physicians revealed the following numbers:

11 21 19 5 14 35 26 31 13 19 36 44

Estimate with 99% confidence the mean number of rounds per year played by

physicians. The number of rounds is normally distributed with a standard

deviation of 10.

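The population standard deviation is known, so we use z for estimation:

$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

$\bar{X} = \dfrac{\sum x}{n} = \dfrac{274}{12} = 22.83$, $\sigma = 10$, $n = 12$

$1 - \alpha = 0.99$, $\alpha = 0.01$, $\alpha/2 = 0.005$; from the Z table, $z_{0.005} = 2.576$

$$22.83 - 2.576\frac{10}{\sqrt{12}} < \mu < 22.83 + 2.576\frac{10}{\sqrt{12}}$$

$$15.39 < \mu < 30.27$$

With 99% confidence, the mean number of rounds per year played by physicians lies between 15.39 and 30.27.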

Sample Size to Estimate Mean

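As explained before, the following interval is used to estimate the population mean:

$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Setting the half-width of the interval equal to a chosen bound B, $z_{\alpha/2}\,\sigma/\sqrt{n} = B$, and solving for n gives the sample size required to estimate the mean:

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$$

where B is the bound on the error of estimation (the "within" amount: the estimate should be within B units of μ). Because n must be a whole number, the result is always rounded up.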
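The sample-size formula, including the rule of rounding up, fits in a small Python function. This is a sketch; the function name is hypothetical and $z_{\alpha/2}$ is passed in as read from the Z table.

```python
from math import ceil


def sample_size_for_mean(z, sigma, bound):
    """n = (z * sigma / B)^2, rounded up to the next whole observation."""
    return ceil((z * sigma / bound) ** 2)


print(sample_size_for_mean(1.96, 40, 12))  # 95%, sigma = 40, within 12 units: 43
print(sample_size_for_mean(2.576, 20, 4))  # 99%, sigma = 20, within 4 units: 166
```

`ceil` rather than `round` is used because rounding down would give an interval slightly wider than the required bound.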

Example 4

Determine the sample size required to estimate a population mean to within 12

units given that the population standard deviation is 40. A confidence level of

95% is judged to be appropriate.

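$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$$

$B = 12$, $\sigma = 40$, $n = ?$

$1 - \alpha = 0.95$, $\alpha = 0.05$, $\alpha/2 = 0.025$; from the Z table, $z_{0.025} = 1.96$

$$n = \left(\frac{1.96 \times 40}{12}\right)^2 = 42.68 \;\Rightarrow\; n = 43$$

A sample of 43 observations is required for this estimation.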

Example 5

A medical statistician wants to estimate the average blood pressure of people who are on a new diet plan. From a preliminary study, he knows that the standard deviation of blood pressure in the population is about 20 mmHg. How large a sample should he take to estimate the mean blood pressure to within 4 mmHg? Assume the confidence level is 99%.

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$$

$B = 4$, $\sigma = 20$, $n = ?$

$1 - \alpha = 0.99$, $\alpha = 0.01$, $\alpha/2 = 0.005$; from the Z table, $z_{0.005} = 2.576$

$$n = \left(\frac{2.576 \times 20}{4}\right)^2 = 165.89 \;\Rightarrow\; n = 166$$

Blood pressure measurements from 166 people are required for this estimation.

Estimating μ when σ Is Unknown

When the standard deviation of the population is unknown, the sample standard deviation s replaces σ and the t distribution replaces z:

$$\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}, \qquad \text{degrees of freedom} = n - 1$$

Therefore:
$\bar{X} - t_{\alpha/2}\,s/\sqrt{n}$ is called the Lower Confidence Limit (LCL)
$\bar{X} + t_{\alpha/2}\,s/\sqrt{n}$ is called the Upper Confidence Limit (UCL)
Also, $s/\sqrt{n}$ is called the (estimated) Standard Error.

To estimate $\mu$ when $\sigma$ is unknown, follow these steps:

Step 1: write down all the provided data.

Step 2: look up $t_{\alpha/2}$ in the t table. Remember that you need the degrees of freedom, n − 1, to use the t table.

Step 3: write the appropriate interval formula for $\mu$.

Step 4: plug in the numbers.
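When the raw data are available, the steps for the t interval can be sketched in Python. `statistics.stdev` computes the sample standard deviation s (the same n − 1 definition as Excel's STDEV.S); the Python standard library has no t distribution, so $t_{\alpha/2}$ is passed in from the t table. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(data, t_crit):
    """Confidence interval for mu when sigma is unknown: (LCL, UCL).

    t_crit is t_{alpha/2} with n - 1 degrees of freedom, read from a t table.
    """
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Times (minutes) from Example 7; t_0.005 with 19 degrees of freedom = 2.861.
times = [6, 4, 6, 9, 5, 6, 7, 4, 5, 8, 4, 4, 5, 6, 5, 4, 3, 4, 3, 5]
lcl, ucl = t_interval(times, 2.861)
print(f"{lcl:.2f} < mu < {ucl:.2f}")
```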

Example 6

A random sample of 25 was drawn from a population. The sample mean and

standard deviation are 450 and 90. Estimate mean of the population with 95%

confidence.
The population standard deviation is unknown, so we use t for estimation:

$$\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$

$\bar{X} = 450$, $s = 90$, $n = 25$, degrees of freedom $= n - 1 = 25 - 1 = 24$

$1 - \alpha = 0.95$, $\alpha = 0.05$, $\alpha/2 = 0.025$; from the t table, $t_{0.025} = 2.064$

$$450 - 2.064\frac{90}{\sqrt{25}} < \mu < 450 + 2.064\frac{90}{\sqrt{25}}$$

$$412.85 < \mu < 487.15$$

With 95% confidence, the population mean lies between 412.85 and 487.15.

Example 7

A police control officer is analysing the amount of time police cars spend at gas stations. A quick survey of 20 police cars that had just left a gas station recorded the following times (in minutes):

6 4 6 9 5 6 7 4 5 8 4 4 5 6 5 4 3 4 3 5

Estimate with 99% confidence the mean amount of time a police car spends at a gas station.

The population standard deviation is unknown, so we use t for estimation:

$$\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$

$\bar{X} = \dfrac{\sum x}{n} = \dfrac{103}{20} = 5.15$, $s = 1.56$ (Excel function STDEV.S), $n = 20$, degrees of freedom $= n - 1 = 20 - 1 = 19$

$1 - \alpha = 0.99$, $\alpha = 0.01$, $\alpha/2 = 0.005$; from the t table, $t_{0.005} = 2.861$

$$5.15 - 2.861\frac{1.56}{\sqrt{20}} < \mu < 5.15 + 2.861\frac{1.56}{\sqrt{20}}$$

$$4.15 < \mu < 6.15$$

With 99% confidence, the mean amount of time a police car spends at a gas station lies between 4.15 and 6.15 minutes.

Example 8

What is the average salary of students working in summer internships? To answer this, a random sample of 16 students' salaries (in US dollars) was drawn:

15,500 12,400 13,000 19,000 21,000 22,000 18,000 12,300

16,100 13,200 17,900 10,400 9,200 16,800 17,100 15,600

Estimate with 90% confidence the mean salary of students working in summer internships.

The population standard deviation is unknown, so we use t for estimation:

$$\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$

$\bar{X} = \dfrac{\sum x}{n} = \dfrac{249{,}500}{16} = 15{,}594$, $s = 3{,}636$ (Excel function STDEV.S), $n = 16$, degrees of freedom $= n - 1 = 16 - 1 = 15$

$1 - \alpha = 0.90$, $\alpha = 0.10$, $\alpha/2 = 0.05$; from the t table, $t_{0.05} = 1.753$

$$15{,}594 - 1.753\frac{3{,}636}{\sqrt{16}} < \mu < 15{,}594 + 1.753\frac{3{,}636}{\sqrt{16}}$$

$$14{,}001 < \mu < 17{,}187$$

With 90% confidence, the average salary of students working in summer internships lies between 14,001 and 17,187 US dollars.

Estimating a Population Proportion

In many situations, we need to estimate a population proportion, for example, the proportion of people who watch the advertisements during a TV show. To estimate a proportion, the following interval is used:

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

The left end is the Lower Confidence Limit (LCL) and the right end is the Upper Confidence Limit (UCL).
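A minimal Python sketch of the proportion interval follows; the helper name is hypothetical and $z_{\alpha/2}$ is passed in from the Z table. With the counts from Example 9 (152 successes out of 424) and z = 1.64 it gives roughly 0.32 to 0.40.

```python
from math import sqrt


def proportion_interval(successes, n, z):
    """Confidence interval for a population proportion: (LCL, UCL)."""
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lcl, ucl = proportion_interval(152, 424, 1.64)
print(f"{lcl:.2f} < p < {ucl:.2f}")
```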

Example 9

A business school wanted to know whether its graduates can find a job within the first three months after finishing their studies. 424 graduates were surveyed about their employment; after tallying the responses, only 152 reported being hired within the first three months of graduation. Estimate with 90% confidence the proportion of all the school's business graduates who find a job within the first three months of graduation.

A population proportion needs to be estimated:

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Number of successes $= 152$, $n = 424$, $\hat{p} = \dfrac{152}{424} = 0.36$

$1 - \alpha = 0.90$, $\alpha = 0.10$, $\alpha/2 = 0.05$; from the Z table, $z_{0.05} = 1.64$

$$0.36 - 1.64\sqrt{\frac{0.36(1 - 0.36)}{424}} < p < 0.36 + 1.64\sqrt{\frac{0.36(1 - 0.36)}{424}}$$

$$0.32 < p < 0.40$$

With 90% confidence, the proportion of the school's graduates who find a job within the first three months lies between 0.32 and 0.40.

Sample Size to Estimate Proportion

If we solve the proportion interval for n, we obtain the sample size required to estimate a proportion:

$$n = \left(\frac{z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})}}{B}\right)^2$$

where B is the bound on the error of estimation (the "within" amount for the proportion). As before, n is rounded up.
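The proportion sample-size formula can be sketched in Python in the same way; the function name is hypothetical and $z_{\alpha/2}$ comes from the Z table. The second call uses $\hat{p} = 0.5$, which maximizes $\hat{p}(1 - \hat{p})$ and therefore gives the most conservative sample size when no prior estimate of the proportion is available.

```python
from math import ceil, sqrt


def sample_size_for_proportion(z, p_hat, bound):
    """n = (z * sqrt(p_hat * (1 - p_hat)) / B)^2, rounded up."""
    return ceil((z * sqrt(p_hat * (1 - p_hat)) / bound) ** 2)


print(sample_size_for_proportion(1.64, 0.6, 0.03))  # 718
print(sample_size_for_proportion(1.64, 0.5, 0.03))  # 748
```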

Example 10

Determine the sample size necessary to estimate a population proportion to

within 0.03 with 90% confidence. Assume Ƹ𝑝 = 0.6.

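$$n = \left(\frac{z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})}}{B}\right)^2$$

$\hat{p} = 0.60$, $B = 0.03$

$1 - \alpha = 0.90$, $\alpha = 0.10$, $\alpha/2 = 0.05$; from the Z table, $z_{0.05} = 1.64$

$$n = \left(\frac{1.64\sqrt{0.6(1 - 0.6)}}{0.03}\right)^2 = 717.23 \;\Rightarrow\; n = 718$$

A sample of 718 is required for this estimation.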

Data Analysis Plus – Microsoft Excel

Download the “Data Analysis Plus” add-in from the website below and follow the instructions to install it.

Let’s solve Example 8 using Microsoft Excel.


Advanced Business Statistics

▪ Introduction to Estimation

(Two Sample)

Winter 2022

Agenda

Introduction to Estimation of the Difference of Means

(Two Sample)

❑ When 𝜎1 and 𝜎2 are known

❑ When 𝜎1 and 𝜎2 are unknown

• Considering equal variances

• Considering unequal variances

❑ For Proportion

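Review – Population Mean Estimation for One Sample

Population mean, σ known:

$$\mu = \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \qquad n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$$

Population mean, σ unknown:

$$\mu = \bar{X} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$

Population proportion:

$$p = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}, \qquad n = \left(\frac{z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})}}{B}\right)^2$$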

Two Populations – Two Samples

For example, we may compare the average salary of retired people in Ontario (population 1) with the average salary of retired people in Quebec (population 2), drawing one sample from each population.

Inference about the Difference between Two
Means
To estimate the difference between two population means, we draw independent random samples from each of the two populations.

Population 1: parameters $\mu_1$ and $\sigma_1$; a sample of size $n_1$ gives the statistics $\bar{X}_1$ and $s_1$.

Population 2: parameters $\mu_2$ and $\sigma_2$; a sample of size $n_2$ gives the statistics $\bar{X}_2$ and $s_2$.

Estimate Difference Population Means:
𝝈𝟏 and 𝝈𝟐 are known

When the standard deviations of both populations are known, the following interval can be used to estimate the difference between the population means:

$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Therefore:

Lower Confidence Limit (LCL): $(\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

Upper Confidence Limit (UCL): $(\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

Standard Error: $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

In practice this formula is rarely used, because the population variances are usually unknown; the standard error then has to be estimated from the samples.
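The known-σ interval for $\mu_1 - \mu_2$ can be sketched in Python; the helper name is hypothetical and $z_{\alpha/2}$ is passed in from the Z table. The usage line plugs in the summary numbers of Example 1 that follows (σ1 = 150, σ2 = 90).

```python
from math import sqrt


def diff_means_z_interval(x1_bar, x2_bar, sigma1, sigma2, n1, n2, z):
    """CI for mu1 - mu2 when both population sigmas are known."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # standard error
    diff = x1_bar - x2_bar
    return diff - z * se, diff + z * se


lcl, ucl = diff_means_z_interval(2476, 2297, 150, 90, 38, 29, 1.96)
print(f"{lcl:.0f} < mu1 - mu2 < {ucl:.0f}")
```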

Example 1

A baby-food producer, ABC, claims that its product helps babies gain weight faster than a leading competitor's product, XYZ. A survey was conducted as follows:

• Mothers were asked which product (ABC or XYZ) they intended to feed their babies.

• 38 mothers said they would feed their babies ABC and 29 said they would feed their babies XYZ.

• Mothers were asked to keep track of their babies' weight gains over the next 3 months.

• Each baby's weight gain (in grams) was recorded (refer to the Excel data).

Assume that, according to historical data, the standard deviation of weight gain is 150 grams for babies fed ABC and 90 grams for babies fed XYZ.

Estimate with 95% confidence the difference between the mean weight gains of ABC and XYZ.

The population standard deviations are known, so we use z for estimation:

$$\bar{X}_1 = \frac{\sum x_1}{n_1} = \frac{94{,}078}{38} = 2{,}476, \qquad \sigma_1 = 150, \qquad n_1 = 38$$

$$\bar{X}_2 = \frac{\sum x_2}{n_2} = \frac{66{,}605}{29} = 2{,}297, \qquad \sigma_2 = 90, \qquad n_2 = 29$$

$1 - \alpha = 0.95$, $\alpha = 0.05$, $\alpha/2 = 0.025$; from the Z table, $z_{0.025} = 1.96$

$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (2{,}476 - 2{,}297) \pm 1.96\sqrt{\frac{150^2}{38} + \frac{90^2}{29}}$$

$$121 < \mu_1 - \mu_2 < 237$$

With 95% confidence, the difference between the mean weight gains of ABC and XYZ lies between 121 and 237 grams.

Estimate Difference Population Means:
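𝝈𝟏 and 𝝈𝟐 are unknown

In many situations, the standard deviations of the populations are unknown. The interval for the difference of population means then depends on whether the two unknown population variances are equal or not:

• $\sigma_1^2 = \sigma_2^2$

• $\sigma_1^2 \neq \sigma_2^2$

In both situations the Student t distribution is applied; however, the formulas differ.

When $\sigma_1^2 = \sigma_2^2$ (pooled variance):

$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

with degrees of freedom $= n_1 + n_2 - 2$; the square-root term is the standard error.

When $\sigma_1^2 \neq \sigma_2^2$:

$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

$$\text{degrees of freedom} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

again with the square-root term as the standard error.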
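The pooled-variance interval can be sketched in Python. This is an illustration with a hypothetical helper name; $t_{\alpha/2}$ is passed in from the t table, and the usage line plugs in the rounded summary statistics of Example 2 that follows.

```python
from math import sqrt


def pooled_t_interval(x1_bar, x2_bar, s1, s2, n1, n2, t_crit):
    """CI for mu1 - mu2 assuming equal population variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    se = sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error
    diff = x1_bar - x2_bar
    return diff - t_crit * se, diff + t_crit * se


lcl, ucl = pooled_t_interval(87.87, 51.22, 13.38, 17.06, 200, 200, 1.645)
print(f"{lcl:.2f} < mu1 - mu2 < {ucl:.2f}")
```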

Example 2

A large car manufacturer developed a mandatory training program for its

workers in 4 countries. In assessing the effectiveness of this program, an

operations manager designed a survey measuring two factors:

• Workers' improved performance (time saved, in seconds).

• Age of each worker, in two groups: 21-to-30 and 31-to-40.

The survey measured the improved performance of 200 workers in each age group; the data are listed in the Excel file. Estimate with 90% confidence the difference in mean improved performance between the two age groups, 21-to-30 and 31-to-40. Assume the variances of the improved performance for workers aged 21-to-30 and 31-to-40 are equal ($\sigma_1^2 = \sigma_2^2$).

The population standard deviations are unknown (assumed equal), so we use t for estimation:

$$\bar{X}_1 = \frac{\sum x_1}{n_1} = \frac{17{,}575}{200} = 87.87, \qquad s_1 = 13.38, \qquad n_1 = 200$$

$$\bar{X}_2 = \frac{\sum x_2}{n_2} = \frac{10{,}254}{200} = 51.22, \qquad s_2 = 17.06, \qquad n_2 = 200$$

Degrees of freedom $= n_1 + n_2 - 2 = 200 + 200 - 2 = 398$

$1 - \alpha = 0.90$, $\alpha = 0.10$, $\alpha/2 = 0.05$; from the t table, $t_{0.05} = 1.645$ (large degrees of freedom)

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(200 - 1)(13.38)^2 + (200 - 1)(17.06)^2}{200 + 200 - 2} = 235$$

$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (87.87 - 51.22) \pm 1.645\sqrt{235\left(\frac{1}{200} + \frac{1}{200}\right)}$$

$$34.13 < \mu_1 - \mu_2 < 39.17$$

With 90% confidence, the difference in mean improved performance between the 21-to-30 and 31-to-40 age groups lies between 34.13 and 39.17 seconds.

Example 3

A business analyst designed a survey to determine the effect of gender on the automobile insurance rate. A random sample of young men and women was drawn (the data are listed in the Excel file): 178 men and 211 women were asked how many kilometers they had driven in the past year. Estimate with 90% confidence the difference between the mean distances driven by male and female drivers. Assume the variances of the distance driven by men and women are not equal ($\sigma_1^2 \neq \sigma_2^2$).

The population standard deviations are unknown (assumed unequal), so we use t for estimation:

$$\bar{X}_1 = \frac{\sum x_1}{n_1} = \frac{3{,}589{,}962}{178} = 20{,}168, \qquad s_1 = 3{,}609, \qquad n_1 = 178$$

$$\bar{X}_2 = \frac{\sum x_2}{n_2} = \frac{3{,}913{,}864}{211} = 18{,}549, \qquad s_2 = 3{,}386, \qquad n_2 = 211$$

$$\text{degrees of freedom} = \frac{\left(\dfrac{3{,}609^2}{178} + \dfrac{3{,}386^2}{211}\right)^2}{\dfrac{\left(3{,}609^2/178\right)^2}{178 - 1} + \dfrac{\left(3{,}386^2/211\right)^2}{211 - 1}} = 367$$

$1 - \alpha = 0.90$, $\alpha = 0.10$, $\alpha/2 = 0.05$; from the t table, $t_{0.05} = 1.645$

$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (20{,}168 - 18{,}549) \pm 1.645\sqrt{\frac{3{,}609^2}{178} + \frac{3{,}386^2}{211}}$$

$$1{,}032 < \mu_1 - \mu_2 < 2{,}207$$

With 90% confidence, the difference between the mean distances driven by male and female drivers lies between 1,032 km and 2,207 km.
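The unequal-variance interval can be sketched in Python as well; the helper name is hypothetical and $t_{\alpha/2} = 1.645$ is taken from the table as in the example. The sketch computes the sample means from the column totals instead of the rounded means, which is why its upper limit matches 2,207 exactly.

```python
from math import sqrt


def welch_interval(x1_bar, x2_bar, s1, s2, n1, n2, t_crit):
    """CI for mu1 - mu2 without assuming equal population variances."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)  # standard error
    diff = x1_bar - x2_bar
    return diff - t_crit * se, diff + t_crit * se


# Example 3: means from the column totals, t_0.05 = 1.645.
x1_bar = 3_589_962 / 178
x2_bar = 3_913_864 / 211
lcl, ucl = welch_interval(x1_bar, x2_bar, 3609, 3386, 178, 211, 1.645)
print(f"{lcl:,.0f} < mu1 - mu2 < {ucl:,.0f}")
```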

Estimate Difference Population Proportions

In some cases, the objective is to determine the difference between the proportions of two populations, $p_1 - p_2$.

To draw inferences about $p_1$ and $p_2$, a sample of size $n_1$ is taken from population 1 and a sample of size $n_2$ from population 2. For each sample, the number of successes is represented by x, labelled $x_1$ and $x_2$ respectively, and the sample proportions are

$$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}$$

Population 1: parameter $p_1$; sample of size $n_1$; statistic $\hat{p}_1$.
Population 2: parameter $p_2$; sample of size $n_2$; statistic $\hat{p}_2$.

The confidence interval estimator of $p_1 - p_2$ is computed with the following formula:

$$p_1 - p_2 = (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

where the square-root term is the standard error.
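A minimal Python sketch of this interval follows; the helper name is hypothetical. Unlike the hypothesis test, it uses the unpooled standard error, since no $H_0: p_1 = p_2$ is assumed when estimating. The usage line plugs in the counts of Example 4 that follows, with $z_{0.025} = 1.96$.

```python
from math import sqrt


def diff_proportions_interval(x1, n1, x2, n2, z):
    """CI for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# 111 of 354 (sale price) vs 105 of 478 (regular price).
lcl, ucl = diff_proportions_interval(111, 354, 105, 478, 1.96)
print(f"{lcl:.2f} < p1 - p2 < {ucl:.2f}")
```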

Example 4

Selling extended warranties for products is a profitable business for many stores. The extended warranty is offered both at regular prices and during sales. A store manager recently conducted a survey about the difference in the proportion of customers who bought an extended warranty. The results are:

Sale price (population 1): sample size 354; number who bought the extended warranty 111.
Regular price (population 2): sample size 478; number who bought the extended warranty 105.

Estimate with 95% confidence the difference between the proportions of sale-price and regular-price customers who bought an extended warranty.

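The question is about the difference between the proportions of two populations:

$x_1 = 111$, $n_1 = 354$, $\hat{p}_1 = 111/354 = 0.31$

$x_2 = 105$, $n_2 = 478$, $\hat{p}_2 = 105/478 = 0.22$

$1 - \alpha = 0.95$, $\alpha = 0.05$, $\alpha/2 = 0.025$; from the Z table, $z_{0.025} = 1.96$

$$p_1 - p_2 = (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = (0.31 - 0.22) \pm 1.96\sqrt{\frac{0.31(1 - 0.31)}{354} + \frac{0.22(1 - 0.22)}{478}} = 0.09 \pm 0.06$$

$$0.03 < p_1 - p_2 < 0.15$$

With 95% confidence, the proportion of sale-price customers who bought an extended warranty exceeds that of regular-price customers by between 3 and 15 percentage points.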

Data Analysis Plus – Microsoft Excel

As explained before, we can use the “Data Analysis Plus” add-in in Microsoft Excel.

Let’s answer Examples 2, 3, and 4 with the Data Analysis Plus add-in.
