# Statistics & Probability Worksheet

Exam 3 Practice Questions

1. True or false: The deterministic model is more practically useful.
2. Epsilon is:
a. What the p-value is referred to in multiple regression
b. The Greek symbol representing the predictor variable
c. The error term
d. Used in simple linear regression but not multiple linear regression
3. True or false: when we add at least one additional predictor variable to a simple linear
regression model, it is called multiple linear regression.
4. True or false: a t-test is the statistic used to determine if a simple linear regression
equation is statistically useful.
5. True or false: we first look at the results of a global F-test to determine if a multiple
regression equation is statistically useful.
6. How can we best determine if a model is practically useful:
a. The R2 is high
b. 2s is low
c. The prediction interval is narrow
d. All of the above
7. When do we use a global F?
a. To test the interaction term in a multiple regression
b. To test the quadratic term
c. To test the levels of a qualitative variable
d. To test the overall fit of a model with multiple predictors
8. True or false: The dependent variable is the predictor variable.
9. If the p-value associated with the t-statistic in a hypothesis test for a simple linear
regression is below alpha, it means:
a. The model is statistically useful
b. The model is practically useful
c. This is the best model
d. This model works very well
10. If the relationship between the temperature and crowd levels at the beach depends on
whether it is a weekday or a weekend, there is:
a. A curvilinear relationship
b. An interaction
d. A causal relationship
11. If the relationship between temperature and crowds at the beach is stronger the higher
the temperature gets, there is:
a. A curvilinear relationship
b. An interaction
c. A causal relationship
d. A qualitative variable
12. To create the quadratic term, we need to:
a. Square the variable
b. Multiply the variable by itself
c. Divide the variable by itself
d. A or B are both correct
13. True or false: After we find that a model containing two predictors and the interaction
between them is statistically useful, the next step is to run a global F test to see if the
interaction term is useful.
14. The y-intercept can be practically interpreted when:
a. It makes sense for x to equal zero
b. An x of zero is within the range of your data set
c. Either A or B
d. Both A and B
15. True or false: a quadratic term and an interaction term are the same thing.
16. True or false: in multiple linear regression, the goal is to use two or more variables to
predict the dependent variable.
17. Which of the following fit indices would indicate the best-fitting model:
a. An R2 of .767 and a standard deviation of 289
b. An R2 of .622 and a standard deviation of 310
c. An R2 of .799 and a standard deviation of 270
d. An R2 of .699 and a standard deviation of 320
18. If we have a qualitative variable with 3 dummy variables, which of the following could be
the qualitative variable:
a.
b.
c.
d.
Season of the year
Month of the year
Days of the week
19. If we can reject the null hypothesis for the global F test for a model containing two
predictors and their interaction, the next step is to:
a. Divide the p-value by 2
b. Test each of the predictors to see which work
c. Stop. You are done.
d. Look at the t-test for the interaction
20. Studies show that the amount of money people earn is correlated with a higher degree
of happiness but that the strength of this relationship weakens as income increases. This
is an example of a(n):
a. downward curvature
b. upward curvature
c. negative interaction
d. positive interaction
1. True or false: The deterministic model is more practically useful.
False
2. Epsilon is:
a. What the p-value is referred to in multiple regression
b. The Greek symbol representing the predictor variable
c. The error term
d. Used in simple linear regression but not multiple linear regression
3. True or false: when we add at least one additional predictor variable to a simple linear
regression model, it is called multiple linear regression.
TRUE
4. True or false: a t-test is the statistic used to determine if a simple linear regression
equation is statistically useful.
TRUE
5. True or false: we look at the results of a global F-test to determine if a multiple
regression equation is statistically useful.
TRUE
6. How can we best determine if a model is practically useful:
a. The R2 is high
b. 2s is low
c. The prediction interval is narrow
d. All of the above
7. When do we use a global F?
a. To test the interaction term in a multiple regression
b. To test the quadratic term
c. To test the levels of a qualitative variable
d. To test the overall fit of a model with multiple variables
8. True or false: The dependent variable is the predictor variable.
FALSE
9. If the p-value associated with the t-statistic in a hypothesis test for a simple linear
regression is below alpha, it means:
a. The model is statistically useful
b. The model is practically useful
c. This is the best model
d. This model works very well
10. If the relationship between the temperature and crowd levels at the beach depends on
whether it is a weekday or a weekend, there is:
a. A curvilinear relationship
b. An interaction
d. A causal relationship
11. If the relationship between temperature and crowds at the beach is stronger the higher
the temperature gets, there is:
a. A curvilinear relationship
b. An interaction
c. A causal relationship
d. A qualitative variable
12. To create the quadratic term, we need to:
a. Square the variable
b. Multiply the variable by itself
c. Divide the variable by itself
d. A or B are both correct
13. True or false: After we find that a model containing two predictors and the interaction
between them is statistically useful, the next step is to run a global F test to see if the
interaction term is useful.
FALSE
14. The y-intercept can be practically interpreted when:
a. It makes sense for x to equal zero
b. An x of zero is within the range of your data set
c. Either A or B
d. Both A and B
15. True or false: a quadratic term and an interaction term are the same thing.
FALSE
16. True or false: in multiple linear regression, the goal is to use two or more variables to
predict the dependent variable.
TRUE
17. Which of the following fit indices would indicate the best-fitting model:
a. An R2 of .767 and a standard deviation of 289
b. An R2 of .622 and a standard deviation of 310
c. An R2 of .799 and a standard deviation of 270
d. An R2 of .699 and a standard deviation of 320
18. If we have a qualitative variable with 3 dummy variables, which of the following could be
the qualitative variable:
a.
b.
c.
d.
Season of the year
Month of the year
Days of the week
19. If we can reject the null hypothesis for the global F test for a model containing two
predictors and their interaction, the next step is to:
a. Divide the p-value by 2
b. Test each of the predictors to see which work
c. Stop. You are done.
d. Look at the t-test for the interaction
20. Studies show that the amount of money people earn is correlated with a higher degree
of happiness but that the strength of this relationship weakens as income increases. This
is an example of a(n):
a. downward curvature
b. upward curvature
c. negative interaction
d. positive interaction
Lecture 18: Exam 3 Review
Linear Regression
• Regression goal: predict the value of one quantitative (QN) variable from the values of related variables. Think inputs and outputs!
• Dependent variable (DV): QN variable to be predicted (y)
• Independent variables (IVs): predictor variables (x1, x2, …)
• Experimental unit: object upon which measurements (y, x) are taken
• “Linear” → use a straight-line model to relate y to x
• Two types of linear models:
Deterministic: y = β0 + β1x, where β0 = y-intercept and β1 = slope
Probabilistic: y = β0 + β1x + ε, where ε = random error
1) Hypothesize the model: E(y) = β0 + β1x …
2) Assumptions on random error
3) Collect data; estimate betas
Yields prediction equation: ŷ = β̂0 + β̂1x
e.g., ŷ = 10 + 2x
4) Test model utility (Is model useful for predicting y?)
5) If yes, use model for prediction/inferences
Note: Steps 2 and 3 are interchangeable
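Step 3 can be sketched in a few lines of Python. This is a minimal illustration only: the data points are hypothetical (chosen so the fit reproduces the example equation ŷ = 10 + 2x), and the function name is not from the lecture.

```python
def fit_simple_linear_regression(x, y):
    """Return least-squares estimates (b0, b1) for the line y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx          # estimated slope
    b0 = y_bar - b1 * x_bar     # estimated y-intercept
    return b0, b1


# Hypothetical sample (not from the lecture)
x = [1, 2, 3, 4, 5]
y = [12.5, 13, 16.5, 18, 20]

b0, b1 = fit_simple_linear_regression(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

The observed y values scatter around the fitted line; that scatter is the random error ε in the probabilistic model.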
Multiple Regression
Multiple independent variables (IVs): x1, x2, x3, x4, … , xk
Independent variables can be quantitative or qualitative
Model: E(y) = β0 + β1x1 + β2x2 + … + βkxk
Example 12.1: (p. 687): Predict auction price of GF clock
Dependent Variable
y = Auction Price (Experimental unit = a single clock)- QN
Independent Variables
x1 = Age of clock (in years) – QN
x2 = Number of bidders on the clock – QN
Theory: Price increases linearly with Age & # Bidders
Step 1: E(y) = β0 + β1x1 + β2x2 (1st-order model)
Model proposes 2 straight lines:
(1) relating y to x1 (with slope β1)
(2) relating y to x2 (with slope β2)
[Figure: two straight-line plots of y, one against Age (x1) with slope β1 and one against Bidders (x2) with slope β2]
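STATISTIX software results:
Least Squares Linear Regression of PRICE

| Predictor Variables | Coefficient | Std Error | T | P | VIF |
|---|---|---|---|---|---|
| Constant | -1338.95 | 173.809 | -7.70 | 0.0000 | 0.0 |
| AGE | 12.7406 | 0.90474 | 14.08 | 0.0000 | 1.1 |
| NUMBIDS | 85.9530 | 8.72852 | 9.85 | 0.0000 | 1.1 |

| Statistic | Value |
|---|---|
| R2 | 0.8923 |
| Adjusted R2 | 0.8849 |
| AICc | 319.55 |
| PRESS | 646070 |
| Mean Square Error (MSE) | 17818.2 |
| Standard Deviation | 133.485 |

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 2 | 4283063 | 2141531 | 120.19 | 0.0000 |
| Residual | 29 | 516727 | 17818.2 | | |
| Total | 31 | 4799790 | | | |

Cases Included 32, Missing Cases 0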
Least Squares prediction equation:
ŷ = -1339 + 12.74x1 + 85.95x2
MR Example
Interpreting estimated betas: E(y) = β0 + β1x1 + β2x2
β̂1 = 12.74:
For every 1-year increase in Age (x1), we estimate Price (y) to increase \$12.74, holding Number of bidders (x2) fixed (i.e., after accounting for x2)
β̂2 = 85.95:
For every 1-bidder increase in Number of bidders (x2), we estimate Price (y) to increase \$85.95, holding Age (x1) fixed (i.e., after accounting for x1)
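The “holding the other variable fixed” reading can be checked numerically. The sketch below plugs the coefficients from the STATISTIX printout into the 1st-order prediction equation; the function and variable names are illustrative, not part of the lecture.

```python
# Estimated betas taken from the STATISTIX printout for the 1st-order model
B0, B_AGE, B_BIDS = -1338.95, 12.7406, 85.9530


def predict_price(age, bidders):
    """Predicted auction price from the 1st-order model."""
    return B0 + B_AGE * age + B_BIDS * bidders


base = predict_price(150, 5)
one_year_older = predict_price(151, 5)     # Age + 1, bidders held fixed
one_more_bidder = predict_price(150, 6)    # bidders + 1, Age held fixed

print(round(one_year_older - base, 2))     # change due to one extra year
print(round(one_more_bidder - base, 2))    # change due to one extra bidder
```

Each difference equals the corresponding estimated beta, whatever starting values of Age and bidders are chosen, because the 1st-order model has no interaction term.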
Is the model statistically useful?
❖ Conduct “global” F-test
E(y) = β0 + β1x1 + β2x2
Test: H0: β1 = β2 = 0 (model is not useful)
Ha: At least one β ≠ 0 (model is “statistically useful”)
Test statistic: F = 120.19 (from the printout)
P-value: p ≈ 0 (printout shows 0.0000)
❖ Individual t-tests for x1 and x2
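As a rough check on where the global F statistic comes from, it can be recomputed from the sums of squares in the ANOVA part of the printout as F = MSR / MSE. The variable names below are illustrative.

```python
# Sums of squares from the ANOVA table of the 1st-order model printout
ss_regression = 4283063
ss_residual = 516727
k = 2    # number of predictors (Age, Number of bidders)
n = 32   # cases included

ms_regression = ss_regression / k             # mean square for regression
ms_residual = ss_residual / (n - (k + 1))     # mean square error, df = 29
f_stat = ms_regression / ms_residual

print(round(f_stat, 2))
```

The result agrees with the F value reported on the printout up to rounding.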
Is the model practically useful?
Look at R2 = .892 (SX printout)
Interpretation:
89.2% of the sample variation in auction prices (y) can be explained by the 1st-order model with Age (x1) and Number of Bidders (x2).
Look at 2s = 2(133.5) = 267 (SX printout)
Interpretation:
Approximately 95% of the sampled auction prices will fall within \$267 of their predicted values using the 1st-order model with Age (x1) and Number of Bidders (x2).
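Both practical-usefulness measures can be reproduced from the ANOVA sums of squares on the printout. This is a small sketch using the standard formulas R2 = 1 − SSE/SST and s = √MSE; the adjusted R2 line uses the usual degrees-of-freedom adjustment, and the variable names are illustrative.

```python
import math

# Values from the ANOVA table of the 1st-order model printout
sse = 516727     # residual sum of squares
sst = 4799790    # total sum of squares
n, k = 32, 2     # cases and number of predictors

r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - (k + 1))
s = math.sqrt(sse / (n - (k + 1)))   # model standard deviation
two_s = 2 * s

print(round(r2, 4), round(adj_r2, 4), round(s, 3), round(two_s))
```

A higher R2 and a smaller 2s both point the same way: predictions sit closer to the observed prices.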
Numerical Measures of Model Fit
1) Coefficient of Determination (R2)
2) Coefficient of Correlation (r)
Coefficient of Determination – measures percentage of
variation in y “explained” by the model
Coefficient of Correlation – measures the strength and
the direction of the linear relationship between y and x
Final Step: Using the model
1) Predict Price (y) for a GF clock with …
Age(x1)=150 years and # Bidders (x2) =5 bidders
2) Estimate mean Price for all GF clocks with …
Age(x1)=150 years and # Bidders (x2) =5 bidders
STATISTIX software results:
Predicted/Fitted Values of PRICE

| Quantity | Value |
|---|---|
| Lower Predicted Bound | 713.61 |
| Predicted Value | 1001.9 |
| Upper Predicted Bound | 1290.2 |
| SE (Predicted Value) | 140.95 |
| Lower Fitted Bound | 909.29 |
| Fitted Value | 1001.9 |
| Upper Fitted Bound | 1094.5 |
| SE (Fitted Value) | 45.279 |
| Unusualness (Leverage) | 0.1151 |
| Percent Coverage | 95 |
| Corresponding T | 2.05 |

Predictor Values: AGE=150.00, NUMBIDS=5.0000
MR Step 5
95% PI for y: (714, 1290)
Interpretation: We are 95% confident that the price of a single 150-year-old GF clock with 5 bidders will fall between \$714 and \$1,290.
95% CI for E(y): (909, 1095)
Interpretation: We are 95% confident that the average price of all 150-year-old GF clocks with 5 bidders will fall between \$909 and \$1,095.
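Both intervals have the form estimate ± t × standard error; only the standard error differs. The sketch below rebuilds them from the printout values. Because the printout's T is rounded to 2.05, the bounds come out within about a dollar of the printed ones rather than matching exactly. The helper name is illustrative.

```python
import math

# Values from the Predicted/Fitted Values printout (AGE = 150, NUMBIDS = 5)
predicted = 1001.9
se_predicted = 140.95   # SE for predicting a single clock's price
se_fitted = 45.279      # SE for estimating the mean price
t_value = 2.05          # "Corresponding T" on the printout (rounded)
s = 133.485             # model standard deviation from the regression printout


def interval(center, se, t):
    """Return (lower, upper) for center +/- t * se."""
    margin = t * se
    return center - margin, center + margin


pi_low, pi_high = interval(predicted, se_predicted, t_value)   # 95% PI for y
ci_low, ci_high = interval(predicted, se_fitted, t_value)      # 95% CI for E(y)

# The prediction SE combines the model's s with the SE of the fitted mean,
# which is why the PI is always wider than the CI.
se_check = math.hypot(s, se_fitted)

print(round(pi_low), round(pi_high))
print(round(ci_low), round(ci_high))
print(round(se_check, 2))
```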
Interaction Model
• The relationship between y and x1 depends on x2
• The relationship between y and x2 depends on x1
E(y) = β0 + β1x1 + β2x2 + β3x1x2
Slope of y vs. x1 line = (β1 + β3x2)
Slope of y vs. x2 line = (β2 + β3x1)
No interaction:
• parallel
• same slope
Interaction:
• Non-parallel
• different slopes
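To see the non-parallel lines numerically, the slope formulas can be evaluated at a few values of the other variable. The sketch uses the estimates reported in this lecture for the clock data's interaction model; the chosen bidder counts and age are illustrative.

```python
# Estimated betas for the interaction model of the clock data
# (order: Constant, AGE, NUMBIDS, AGEBIDS)
B0, B1, B2, B3 = 320.458, 0.87814, -93.2648, 1.29785


def age_slope(bidders):
    """Slope of the Price vs. Age line at a given number of bidders."""
    return B1 + B3 * bidders


def bidders_slope(age):
    """Slope of the Price vs. Bidders line at a given age."""
    return B2 + B3 * age


for bidders in (5, 10, 15):
    print(bidders, round(age_slope(bidders), 2))

print(round(bidders_slope(150), 2))
```

Because β̂3 is positive, the Price vs. Age line gets steeper as the number of bidders grows. With β3 = 0 every line would share the slope β1 and the lines would be parallel.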
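STATISTIX software results:
Least Squares Linear Regression of PRICE

| Predictor Variables | Coefficient | Std Error | T | P | VIF |
|---|---|---|---|---|---|
| Constant | 320.458 | 295.141 | 1.09 | 0.2868 | 0.0 |
| AGE | 0.87814 | 2.03216 | 0.43 | 0.6690 | 12.2 |
| NUMBIDS | -93.2648 | 29.8916 | -3.12 | 0.0042 | 28.3 |
| AGEBIDS | 1.29785 | 0.21233 | 6.11 | 0.0000 | 30.5 |

| Statistic | Value |
|---|---|
| R2 | 0.9539 |
| Adjusted R2 | 0.9489 |
| AICc | 295.25 |
| PRESS | 288487 |
| Mean Square Error (MSE) | 7905.79 |
| Standard Deviation | 88.9145 |

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 3 | 4578427 | 1526142 | 193.04 | 0.0000 |
| Residual | 28 | 221362 | 7905.79 | | |
| Total | 31 | 4799789 | | | |

Cases Included 32, Missing Cases 0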
Least Squares prediction equation:
ŷ = 320 + .878x1 - 93.27x2 + 1.298x1x2
Statistically Useful?
Model: E(y) = β0 + β1x1 + β2x2 + β3x1x2
Test H0: β1 = β2 = β3 = 0 (nothing in the model works)
Ha: At least one βi is not 0 (something works)
Global F = 193, p-value ≈ 0
Conclusion: α = .05 > p-value ≈ 0 → Reject H0
Test H0: β3 = 0 (no interaction)
Ha: β3 > 0 (positive interaction: slope increases)
t-value = 6.11, p-value = 0/2 ≈ 0 (one-tailed test, so the printout's two-tailed p-value is halved)
Conclusion: α = .05 > p-value ≈ 0 → Reject H0
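The “divide the p-value by 2” step only applies when the sign of the t statistic agrees with the direction in Ha. A small sketch of that rule follows; the function name and the second set of numbers are hypothetical, used only to show the case where the sign disagrees.

```python
def one_sided_p(two_sided_p, t_value, alternative):
    """Convert a printout's two-tailed p-value to a one-tailed p-value.

    alternative is "greater" (Ha: beta > 0) or "less" (Ha: beta < 0).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    sign_agrees = t_value > 0 if alternative == "greater" else t_value < 0
    half = two_sided_p / 2
    # If the estimate points the wrong way, the one-tailed p-value is large
    return half if sign_agrees else 1 - half


# Interaction test from the lecture: t = 6.11, printout p = 0.0000, Ha: beta3 > 0
print(one_sided_p(0.0000, 6.11, "greater"))

# Hypothetical illustration of both directions with the same printout p-value
print(one_sided_p(0.04, 2.1, "greater"))
print(one_sided_p(0.04, 2.1, "less"))
```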
Caveat #1: Avoid interpreting other t-tests
Caveat #2: Be careful when interpreting beta estimates (just don’t do it)
Adjusted R2 = .949: goes up with interaction included
94.9% of the sample variation in auction prices (y) can be explained by the interaction model with Age (x1) and Number of Bidders (x2).
2s = 2(88.91) = 178: goes down with interaction included
Approximately 95% of the sampled auction prices will fall within \$178 of their predicted values using the interaction model with Age (x1) and Number of Bidders (x2).
E(y) = β0 + β1x + β2x² (2nd-order model)
Graphs as a “quadratic” or curve relating y to x
Experimental unit = home (sample n = 15 homes)
Dependent Variable: Monthly Electric Usage (QN)
Independent Variable: Size of Home (QN)
Theory:
Rate of increase of usage (y) with
size (x) is slower for larger homes
[Figure: Scatter Plot of USAGE vs SIZE (y-axis: USAGE, x-axis: SIZE)]
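STATISTIX software results:
Least Squares Linear Regression of USAGE

| Predictor Variables | Coefficient | Std Error | T | P | VIF |
|---|---|---|---|---|---|
| Constant | -806.717 | 166.872 | -4.83 | 0.0004 | 0.0 |
| SIZE | 1.96162 | 0.15252 | 12.86 | 0.0000 | 74.2 |
| SIZESQ | -3.404E-04 | 3.212E-05 | -10.60 | 0.0000 | 74.2 |

| Statistic | Value |
|---|---|
| R2 | 0.9773 |
| Adjusted R2 | 0.9735 |
| AICc | 126.13 |
| PRESS | 56695 |
| Mean Square Error (MSE) | 2520.02 |
| Standard Deviation | 50.1998 |

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 2 | 1300900 | 650450 | 258.11 | 0.0000 |
| Residual | 12 | 30240 | 2520.02 | | |
| Total | 14 | 1331140 | | | |

Cases Included 15, Missing Cases 0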
Estimate betas: E(y) = β0 + β1x + β2x²
β̂0 = -806.7, β̂1 = 1.96, β̂2 = -.00034
Interpretations:
β0: y-intercept of curve
No practical interpretation, since Size (x) = 0 is nonsensical
β1: not a slope, but a shift parameter
Shifts parabola right or left along the x-axis;
No practical interpretation
β2: Rate of curvature; the larger the magnitude, the faster the rate; negative sign indicates downward curvature
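The effect of a negative β̂2 can be seen by evaluating the fitted curve and its rate of change, β̂1 + 2β̂2x, at two home sizes. This is an illustrative sketch: the coefficients come from the printout, while the sizes 1500 and 2500 are example inputs assumed to lie within the sampled range.

```python
# Estimated betas from the STATISTIX printout for the quadratic model
B0, B1, B2 = -806.717, 1.96162, -3.404e-04


def predict_usage(size):
    """Predicted monthly electric usage for a home of the given size."""
    return B0 + B1 * size + B2 * size ** 2


def rate_of_increase(size):
    """Instantaneous change in predicted usage per unit of size."""
    return B1 + 2 * B2 * size


for size in (1500, 2500):
    print(size, round(predict_usage(size), 1), round(rate_of_increase(size), 3))
```

Usage still rises at both sizes, but the rate at 2500 is much smaller than at 1500, which is the downward curvature stated in the theory.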
Model: E(y) = β0 + β1x + β2x²
Test: H0: β1 = β2 = 0
Ha: At least one β is not zero
Global-F Test Statistic = 258, p-value ≈ 0
Conclusion: α = .05 > p-value ≈ 0 → Reject H0
Test: H0: β2 = 0 (no curvature)
Ha: β2 < 0 (downward curvature)
t-value = -10.60, p-value = 0/2 ≈ 0
Conclusion: α = .05 > p-value ≈ 0 → Reject H0
Caveat #1: Avoid interpreting other t-tests
Caveat #2: Be careful when predicting y
Avoid “extrapolation” – selecting x outside range of the sample
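One way to enforce the extrapolation caveat in practice is to refuse predictions outside the sampled range of x. The sketch below is an assumption-laden illustration: the bounds `X_MIN` and `X_MAX` are hypothetical placeholders (the actual smallest and largest sampled home sizes are not given in these notes), and only the coefficients come from the printout.

```python
# Coefficients from the quadratic-model printout
B0, B1, B2 = -806.717, 1.96162, -3.404e-04

# Hypothetical sample range; replace with the real min and max of SIZE
X_MIN, X_MAX = 1300, 3500


def predict_usage_checked(size, x_min=X_MIN, x_max=X_MAX):
    """Predict usage, but reject sizes outside the sampled range."""
    if not x_min <= size <= x_max:
        raise ValueError(
            f"size {size} is outside the sampled range [{x_min}, {x_max}]; "
            "predicting here would be extrapolation"
        )
    return B0 + B1 * size + B2 * size ** 2


print(round(predict_usage_checked(2000), 1))

try:
    predict_usage_checked(6000)
except ValueError as err:
    print(err)
```

A quadratic fitted with downward curvature eventually turns down, so predictions far beyond the sample can become badly misleading.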
97.3% of the sample variation in Usage (y) values can be explained by the quadratic model with Size (x)
2s = 2(50) = 100
Approximately 95% of the sampled Usage (y) values will fall within 100 kilowatt-hours of their predicted values using the quadratic model with Size (x)
Modeling Qualitative Data (2 levels)
Discrimination in the workplace example:
Model Salary of USF professor based on Gender (M,F)
Dependent Var: y=salary; Independent Var: Gender (M,F)
Experimental Unit = a single USF professor
QL Var with 2 levels: Model using x = {1 if Female, 0 if Male}
– referred to as a “dummy” variable
– the value assigned zero is called the “base” level
Model: E(y) = β0 + β1x
-where E(y) is the mean salary
Interpreting betas in dummy variable model:
E(y) = β0 + β1x, where x = {1 if Female, 0 if Male}
x=0: E(y) = β0 = Mean salary for Males (μM)
x=1: E(y) = β0 + β1 = Mean salary for Females (μF)
β1 = μF – μM = Difference between mean salary of Females and mean salary of Males
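A quick numerical check of this interpretation: fitting the dummy-variable model by least squares returns the base-level mean as the intercept and the difference in group means as the slope. The salary figures below are hypothetical, in thousands of dollars, and are not from the USF example.

```python
from statistics import mean

# Hypothetical salaries in thousands; x = 1 if Female, 0 if Male (base level)
data = [(0, 100), (0, 110), (0, 120), (1, 90), (1, 100), (1, 104)]
x = [row[0] for row in data]
y = [row[1] for row in data]

# Least-squares estimates for E(y) = b0 + b1*x
x_bar, y_bar = mean(x), mean(y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Group means computed directly, for comparison
mean_male = mean(yi for xi, yi in data if xi == 0)
mean_female = mean(yi for xi, yi in data if xi == 1)

print(b0, mean_male)                  # intercept vs. base-level mean
print(b1, mean_female - mean_male)    # slope vs. difference in means
```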
Test to Conduct:
H0: β1 = 0 (μF = μM; no discrimination)
Ha: β1 < 0 (μF
