ACTU PS5841 Data Science in Finance and Insurance – Autumn 2019
Dr. Yubo Wang
Assignment-1
Assigned 9/5/19, Due 9/17/19 (Tue)
Problem 1. Statistical Learning
Suppose the observed data are generated by
$y = 1 + 2x + \varepsilon$,
$x \in [-50, 50]$,
$\varepsilon \sim N(\mu = 0, \sigma^2 = 10^2)$
Using your preferred data analysis tool (a spreadsheet can be useful to many at this stage), demonstrate
numerically that a simple linear regression model $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ is able to learn.
[a] Specifically, use a test set of size 100 and training sets of various sizes (30, 100, 200, 300) to
numerically estimate the corresponding expected test MSE, and complete the following table.
| Training set size | Expected test MSE |
|-------------------|-------------------|
| 30                |                   |
| 100               |                   |
| 200               |                   |
| 300               |                   |
[b] Please also provide a plot of the expected test MSE against the training set size.
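For readers working outside a spreadsheet, one possible way to set up the simulation in Problem 1 is sketched below in Python with NumPy. It is only an illustration, not the required method. Several choices are assumptions that the problem statement does not fix: $x$ is drawn uniformly on $[-50, 50]$, a fresh test set of size 100 is drawn in every replication, and the expectation is approximated by averaging over `n_rep = 2000` replications. The function name `expected_test_mse` is hypothetical.

```python
import numpy as np

def expected_test_mse(n_train, n_test=100, n_rep=2000, seed=0):
    """Monte Carlo estimate of the expected test MSE of a fitted straight line."""
    rng = np.random.default_rng(seed)

    def draw(n):
        # Assumption: x is uniform on [-50, 50]; the handout only gives the range.
        x = rng.uniform(-50, 50, n)
        y = 1 + 2 * x + rng.normal(0, 10, n)  # sigma = 10
        return x, y

    mses = np.empty(n_rep)
    for r in range(n_rep):
        x_tr, y_tr = draw(n_train)
        x_te, y_te = draw(n_test)
        b1, b0 = np.polyfit(x_tr, y_tr, 1)  # highest-degree coefficient first
        mses[r] = np.mean((y_te - (b0 + b1 * x_te)) ** 2)
    return mses.mean()

results = {n: expected_test_mse(n) for n in (30, 100, 200, 300)}
for n, mse in results.items():
    print(f"training size {n:>3}: estimated expected test MSE = {mse:.2f}")
```

Under these assumptions the estimates should approach the irreducible error $\sigma^2 = 100$ from above as the training set grows, which is the sense in which the model "learns". The `results` dictionary can be passed to any plotting tool for part [b].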
Problem 2. Bias-Variance Trade-off
Suppose the observed data are generated by
$y = \dfrac{x}{\sqrt{1 + x^2}} + \varepsilon$,
$x \in [-25, 25]$,
$\varepsilon \sim N(\mu = 0, \sigma^2 = 0.5^2)$
Suppose you use polynomial regressions $\hat{y} = \sum_{k=0}^{n} \hat{\beta}_k x^k$, $n = 1, 2, \dots, 6$, to learn from data and make
predictions.
Using your preferred data analysis tool (a spreadsheet can be useful to many at this stage), numerically
demonstrate the trade-off between bias and variance.
Specifically, use 300 training sets and test them on the test set associated with $x = -20, -10, 0, 10, 20$.
[a] Please complete the following table with your estimates to demonstrate that the bias-variance
decomposition roughly holds for each model.
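| Degree n | Expected test MSE | Squared bias | Variance | Variance of error term | LHS – RHS |
|----------|-------------------|--------------|----------|------------------------|-----------|
| 1        |                   |              |          |                        |           |
| 2        |                   |              |          |                        |           |
| 3        |                   |              |          |                        |           |
| 4        |                   |              |          |                        |           |
| 5        |                   |              |          |                        |           |
| 6        |                   |              |          |                        |           |

Here LHS is the expected test MSE and RHS is the sum of the squared bias, the variance, and the variance of the error term.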
[b] Please provide a graph based on your estimates demonstrating the bias-variance trade-off.
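A rough sketch of how the estimates for Problem 2 could be computed in Python with NumPy follows. It is one possible setup, not the prescribed one. The assumptions are labeled in the code: each training set has `N_TRAIN = 50` points (the handout does not give a size), $x$ is uniform on $[-25, 25]$, and one noisy test response is drawn per training set at each test point. The names `decompose`, `N_TRAIN`, and `rows` are hypothetical. `Polynomial.fit` is used because it rescales $x$ internally, which keeps the degree-6 fit well conditioned.

```python
import numpy as np
from numpy.polynomial import Polynomial

SIGMA = 0.5
X_TEST = np.array([-20.0, -10.0, 0.0, 10.0, 20.0])
N_SETS = 300   # number of training sets, as stated in the problem
N_TRAIN = 50   # assumption: training set size is not given in the handout

def f(x):
    """True regression function x / sqrt(1 + x^2)."""
    return x / np.sqrt(1.0 + x ** 2)

def decompose(degree, rng):
    """Return (test MSE, squared bias, variance, noise variance, LHS - RHS)."""
    preds = np.empty((N_SETS, X_TEST.size))
    for s in range(N_SETS):
        x = rng.uniform(-25, 25, N_TRAIN)  # assumption: uniform design
        y = f(x) + rng.normal(0, SIGMA, N_TRAIN)
        fit = Polynomial.fit(x, y, degree, domain=[-25, 25])
        preds[s] = fit(X_TEST)
    # One fresh noisy response per training set and test point.
    y_test = f(X_TEST) + rng.normal(0, SIGMA, preds.shape)
    mse = np.mean((y_test - preds) ** 2)
    bias2 = np.mean((preds.mean(axis=0) - f(X_TEST)) ** 2)
    var = np.mean(preds.var(axis=0))
    noise = SIGMA ** 2
    return mse, bias2, var, noise, mse - (bias2 + var + noise)

rng = np.random.default_rng(0)
rows = {n: decompose(n, rng) for n in range(1, 7)}
for n, row in rows.items():
    print(n, *(f"{v:.4f}" for v in row))
```

Each printed row corresponds to one row of the table in part [a]; the last column should be close to zero, up to simulation noise.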
Please see notes on linear model and on Excel on the next page.
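Notes on linear model
For a linear model $\hat{Y} = \hat{\beta}_0 + X^T \hat{\boldsymbol{\beta}}$, the coefficients based on least squares estimation are
$(\hat{\beta}_0, \hat{\boldsymbol{\beta}}^T)^T = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}$
where the left-hand side stacks the intercept $\hat{\beta}_0$ on top of the slope coefficients $\hat{\boldsymbol{\beta}}$, $\mathbf{X} = (\mathbf{1}, \mathbf{x}_1, \dots, \mathbf{x}_p)$ with $\mathbf{x}_j = (x_{1j}, \dots, x_{nj})^T$, and $\mathbf{Y} = (y_1, \dots, y_n)^T$.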
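As an illustration of the least squares formula in these notes, the short NumPy sketch below builds the design matrix with a leading column of ones and solves the normal equations. The helper name `least_squares` is hypothetical, and the data are a noise-free version of the Problem 1 line, chosen only so the result is easy to check by hand. Solving the linear system with `np.linalg.solve` is used here instead of forming the inverse explicitly; the two are algebraically equivalent.

```python
import numpy as np

def least_squares(features, y):
    """Solve (X^T X) beta = X^T y, where X = (1, x_1, ..., x_p)."""
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        features = features[:, None]        # single predictor -> one column
    X = np.column_stack([np.ones(len(features)), features])
    return np.linalg.solve(X.T @ X, X.T @ np.asarray(y, dtype=float))

# Noise-free illustration: y = 1 + 2x exactly, so beta_hat should be (1, 2).
x = np.linspace(-50, 50, 11)
beta_hat = least_squares(x, 1 + 2 * x)
print(beta_hat)
```

The first entry of `beta_hat` is the intercept and the remaining entries are the slopes, in the same order as the columns of the design matrix.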
Notes on Excel
Transposition: $A^T$ = TRANSPOSE(A)
Matrix multiplication: $AB$ = MMULT(A, B)
Inverse matrix: $A^{-1}$ = MINVERSE(A)
RAND() returns a number sampled uniformly at random from [0, 1).
NORM.INV(probability, mean, stdev) returns the inverse of the normal cumulative distribution for
the specified mean and standard deviation.
Data -> What-If Analysis -> Data Table is a convenient tool for automating repetitive tasks.
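Combining two of the functions above as NORM.INV(RAND(), mean, stdev) is a common way to draw a normal random number in a spreadsheet. For comparison, a minimal sketch of the same idea using only the Python standard library is shown below. The helper names `norm_inv` and `sample_normal` are hypothetical and simply mirror the Excel names. The loop that rejects a draw of exactly 0 is an added safeguard, because `NormalDist.inv_cdf` is only defined for probabilities strictly between 0 and 1.

```python
import random
from statistics import NormalDist

def norm_inv(probability, mean, stdev):
    """Counterpart of NORM.INV: inverse of the normal CDF."""
    return NormalDist(mu=mean, sigma=stdev).inv_cdf(probability)

def sample_normal(mean, stdev, rng=random):
    """Counterpart of NORM.INV(RAND(), mean, stdev)."""
    u = rng.random()          # uniform on [0, 1), like RAND()
    while u == 0.0:           # inv_cdf requires 0 < u < 1
        u = rng.random()
    return norm_inv(u, mean, stdev)

random.seed(0)
errors = [sample_normal(0, 10) for _ in range(20000)]
print(sum(errors) / len(errors))  # sample mean, should be near 0
```

The list `errors` plays the role of a column of simulated error terms with mean 0 and standard deviation 10, as in Problem 1.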
[Figure: Bias vs Variance. Prediction error for the training sample and the test sample plotted against model complexity (low to high); low complexity corresponds to high bias and low variance, high complexity to low bias and high variance.]