# Linear data modelling

INSTRUCTIONS

Every learner should submit his/her own homework solutions. You are allowed to discuss the homework with each other (in fact, I encourage you to form groups and/or use the forums), but you may not copy someone else’s solution.

The homework consists of three parts:

1. True or false questions

2. Fill in the missing values and interpret

3. Coding and analysis

The first two parts will help you get ready for the midterm as the question types could be very similar.

The third part will help you develop skills, understand the flow of an analysis, and get ready for individual work.

Bonus questions are not included in the grade, but if you do them, it will make me happy and I may award extra points for them.

Use the Word document provided to add your answers, output, plots, and anything else that helps with your explanation (the in-class practice solutions could give you ideas). Try to make it look professional (do not just spit out code).

Submission: Create a zip file named with your WSU id (e.g. w999w999_HW1). In the zip file you should save both the written analysis (doc or pdf format) and your program(s). If you are submitting a Jupyter notebook, please attach an HTML output as well for easy viewing.

Good luck!

True or false questions:

1. The constant variance assumption is diagnosed using histograms.

a. True

b. False

2. The regression coefficients measure the linear dependence between variables

a. True
b. False

3. In the simple linear regression we lose three degrees of freedom because of the estimation of

a. True
b. False

4. A negative value of is consistent with an inverse relationship between x and y.

a. True
b. False

Fill in the missing values:

    Call:
    lm(formula = happiness ~ income, data = data)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.20427    0.08884       A   0.0219 *
    income             B    0.01854  38.505   <2e-16 ***

    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.7181 on 496 degrees of freedom
    Multiple R-squared:  0.7493,    Adjusted R-squared:  0.7488
    F-statistic:  1483 on 1 and 496 DF,  p-value: < 2.2e-16

1. What is our regression testing? (What are the variables?)

2. What is the value of A?

3. What is the value of B?

4. How many observations did the dataset have?

5. What is the correlation between income and happiness?

6. What is the variance of the model?
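
The missing entries and the follow-up questions all rest on a few identities that tie the parts of an `lm` summary together. Here is a minimal sketch of those identities in Python. The function names are hypothetical and the numbers are made up, not taken from the output above.

```python
import math

def t_value(estimate, std_error):
    """t statistic for H0: coefficient = 0 is estimate / standard error."""
    return estimate / std_error

def estimate_from_t(t, std_error):
    """The same identity rearranged: estimate = t * standard error."""
    return t * std_error

def n_observations(residual_df, n_coefficients=2):
    """Simple linear regression estimates two coefficients (intercept, slope),
    so the number of observations is residual df + 2."""
    return residual_df + n_coefficients

def correlation_from_r2(r_squared, slope):
    """With one predictor, |r| = sqrt(R^2) and r carries the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

def residual_variance(residual_standard_error):
    """The estimated error variance is the residual standard error squared."""
    return residual_standard_error ** 2

# Made-up inputs, chosen only so the arithmetic is easy to follow.
print(t_value(1.5, 0.5))                # 3.0
print(estimate_from_t(2.0, 0.25))       # 0.5
print(n_observations(98))               # 100
print(correlation_from_r2(0.25, -2.0))  # -0.5
print(residual_variance(0.5))           # 0.25
```

Plugging in the values printed in the summary table should give you everything you need for questions 2 to 6.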

House Price Prediction
The data for the problem is in house.txt.

Note the units the variables are in:

Value and Price: one unit is 1,000 dollars

Square footage: one unit is 100 square feet

Number_Bedrooms: one unit is 1 bedroom

1. Load the data and print the first 6 lines

R-users:

data = read.table("house.txt", sep = "", header = TRUE)
head(data)

Python users:

a. How many columns and how many rows does your data have?

b. What can you say about our data?

i. What are the mean price and the mean size? What are the minimum and maximum of each? Is the median different from the mean? And so on.
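
Python users can load and summarize the data with something like the sketch below. It uses only the standard library and assumes that house.txt is whitespace-delimited with one header row, as the R call suggests. The helper names `read_table` and `describe`, the column names and the four sample rows are all made up for illustration; they are not the real file.

```python
import io
import statistics

def read_table(handle):
    """Parse a whitespace-delimited table with a header row into named columns."""
    header = handle.readline().split()
    columns = {name: [] for name in header}
    for line in handle:
        fields = line.split()
        if not fields:          # skip blank lines
            continue
        for name, field in zip(header, fields):
            columns[name].append(float(field))
    return columns

def describe(values):
    """Basic summary statistics for one column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

# Hypothetical stand-in for the real file. With the real data you would use:
#   with open("house.txt") as handle:
#       data = read_table(handle)
sample = io.StringIO(
    "Value Price Size Number_Bedrooms\n"
    "110 120 15 3\n"
    "150 160 20 4\n"
    "90 100 12 2\n"
    "200 210 25 4\n"
)
data = read_table(sample)

rows = list(zip(*data.values()))
print(len(data), "columns,", len(rows), "rows")
print(list(data))               # column names
for row in rows[:6]:            # first 6 rows (fewer if the table is shorter)
    print(row)
print(describe(data["Price"]))
```

If pandas is available, `pandas.read_csv("house.txt", sep=r"\s+")` followed by `.head(6)` and `.describe()` is a shorter route to the same information.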

2. For our analysis we are only going to use two variables. Intuitively, we all think that the size of a house matters when determining its price. Create your dependent variable “price” (column 2 is the price of the house in thousands of dollars) and your independent variable “size” (column 3 is the size in hundreds of square feet).

a. Plot price vs size.

b. Do you see a linear relationship?

3. Obtain the equation of the line that fits the data.

a. Write the equation for the regression line

b. How do you interpret the intercept? (Watch out for the units in which it was measured.)

c. How do you interpret the slope? (Watch out for the units in which it was measured.)

d. Are the coefficients statistically significant?

i. What are the null and alternative hypotheses that you are testing?

ii. What are your conclusions and why?

4. What is the variance of the model?
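
For questions 3 and 4, it can help to see where the numbers in a regression summary come from. The sketch below fits a least-squares line by hand on a small made-up dataset (not the house data) and returns the coefficients, their standard errors, the t statistics for H0: coefficient = 0, and the estimated error variance. The function name `fit_line` is hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x, with standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual sum of squares; two estimated coefficients leave n - 2 df.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = sse / (n - 2)

    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1 / n + x_bar ** 2 / sxx))
    return {
        "intercept": intercept,
        "slope": slope,
        "sigma2": sigma2,                    # estimated error variance
        "se_intercept": se_intercept,
        "se_slope": se_slope,
        "t_intercept": intercept / se_intercept,
        "t_slope": slope / se_slope,
        "df": n - 2,
    }

# Made-up toy data, only to exercise the function.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = fit_line(x, y)
for name, value in fit.items():
    print(name, round(value, 4))
```

Each t statistic is compared with a t distribution on n - 2 degrees of freedom. If SciPy is installed, `2 * scipy.stats.t.sf(abs(t), df)` gives the two-sided p-value.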

5. Bonus: Plot the regression fitted line on the scatterplot.

6. What is the 99% confidence interval for the model coefficients? Does the interval include zero? What does that mean?

7. What price do you predict for a house with size = 20?

8. Bonus: What is your 99% prediction interval?
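
Questions 6 to 8 all use the same ingredients: the fitted line, the estimated error variance and a critical value from the t distribution. The sketch below shows one way to compute a confidence interval for the slope, a point prediction and a prediction interval. It again uses made-up toy data, and the function name `intervals` is hypothetical. The critical value is passed in as an argument; 5.841 is the table value for 99% two-sided coverage with 3 degrees of freedom, which only fits this toy example.

```python
import math

def intervals(x, y, x_new, t_crit):
    """Slope confidence interval, prediction at x_new, and prediction interval."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = sse / (n - 2)

    # Confidence interval for the slope: estimate +/- t * standard error.
    se_slope = math.sqrt(sigma2 / sxx)
    slope_ci = (slope - t_crit * se_slope, slope + t_crit * se_slope)

    # Point prediction and prediction interval for one new observation.
    # The leading 1 inside the square root accounts for the new error term.
    y_hat = intercept + slope * x_new
    se_pred = math.sqrt(sigma2 * (1 + 1 / n + (x_new - x_bar) ** 2 / sxx))
    pred_int = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return slope_ci, y_hat, pred_int

# Made-up toy data; t_crit = 5.841 is the 99% two-sided value for df = 3.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope_ci, y_hat, pred_int = intervals(x, y, x_new=6, t_crit=5.841)
print("slope CI:", slope_ci)
print("prediction:", y_hat)
print("prediction interval:", pred_int)
```

For the house data you will need the critical value for your own degrees of freedom, from a t table or from `scipy.stats.t.ppf(0.995, df)` if SciPy is installed. A slope interval that contains zero means the data cannot rule out "no linear relationship" at that confidence level.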

9. Model assumptions:

a. What are the model assumptions?

b. How do you test for them? Do they hold?

c. Do you see outliers?
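
Residual plots are the main tool for question 9, but a numeric check can back up what you see. The sketch below computes internally studentized residuals (each residual divided by its estimated standard deviation, using the leverage of the point) and flags those beyond a cutoff. The data are made up, with one deliberately shifted point. The function name `studentized_residuals` is hypothetical, and the cutoff of 2 is a common rule of thumb, not a value from the course.

```python
import math

def studentized_residuals(x, y):
    """Internally studentized residuals for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

    # Leverage of each point: larger for x values far from the mean.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

# Made-up data on the line y = 1 + 2x, except the point at x = 6, shifted up by 12.
x = list(range(1, 11))
y = [3, 5, 7, 9, 11, 25, 15, 17, 19, 21]

r = studentized_residuals(x, y)
cutoff = 2  # rule-of-thumb threshold, an assumption
flagged = [i for i, ri in enumerate(r) if abs(ri) > cutoff]
print("flagged indices:", flagged)
```

A flagged point is a candidate outlier, not a verdict; look at it on the scatterplot before deciding what to do with it.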

10. What does the Box-Cox transformation suggest you do?
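
To see what the Box-Cox procedure is doing, the sketch below evaluates the profile log-likelihood over a grid of lambda values and picks the maximizer. It uses made-up data built so that the square root of y is nearly linear in x, so the grid should favour lambda = 0.5. The function names and the grid are hypothetical choices. R users would normally use `boxcox` from the MASS package instead.

```python
import math

def sse_of_line(x, z):
    """Residual sum of squares from regressing z on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return sum((zi - z_bar - slope * (xi - x_bar)) ** 2 for xi, zi in zip(x, z))

def box_cox_profile(x, y, lambdas):
    """Profile log-likelihood (up to a constant) for each lambda; y must be positive."""
    n = len(y)
    sum_log_y = sum(math.log(yi) for yi in y)
    profile = {}
    for lam in lambdas:
        if lam == 0:
            z = [math.log(yi) for yi in y]           # limiting case of the transform
        else:
            z = [(yi ** lam - 1) / lam for yi in y]
        # The second term is the Jacobian of the transformation.
        profile[lam] = -0.5 * n * math.log(sse_of_line(x, z) / n) + (lam - 1) * sum_log_y
    return profile

# Made-up data: sqrt(y) is linear in x plus a small alternating perturbation.
x = list(range(1, 11))
noise = [0.05 if i % 2 == 0 else -0.05 for i in range(10)]
y = [(1 + 0.5 * xi + e) ** 2 for xi, e in zip(x, noise)]

lambdas = [i / 4 for i in range(-8, 9)]   # -2.0, -1.75, ..., 2.0
profile = box_cox_profile(x, y, lambdas)
best = max(profile, key=profile.get)
print("lambda with the highest profile log-likelihood:", best)
```

In practice the maximizing lambda is rounded to a convenient value (for example 0 for a log, 0.5 for a square root, 1 for no transformation) when that value lies inside the confidence interval for lambda.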

11. What is the correlation between the independent and dependent variable?

12. What is the value of the coefficient of determination and how do you interpret it?
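
Questions 11 and 12 are closely linked: with a single predictor, the coefficient of determination equals the squared correlation. The sketch below checks this on made-up toy data. The function names are hypothetical.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def r_squared(x, y):
    """Coefficient of determination, 1 - SSE / SST, for the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Made-up toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
r2 = r_squared(x, y)
print("r =", round(r, 4), " r^2 =", round(r ** 2, 4), " R^2 =", round(r2, 4))
```

R-squared is read as the share of the variation in the dependent variable that the fitted line accounts for.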

13. Do you have any other suggestions to improve our model?
