CIS-STA 3920 Project Assignment, Draft #1
Turn in 2 documents, one to each of the two portals.
Doc#1: Submit this project only as a Microsoft Word document, to the designated Project
Portal. Late work loses 20 points automatically if it is up to 24 hours late; after 24 hours, no credit.
Doc#2: Submit the Excel document in which you computed the lagged stock return and lagged risk
data sets, from 2006 on, according to the instructions I presented in class and in LN5.A. Submit it to
the Data Portal.
For the Word document filename, put your last name first, then your first name, then the word
Project. For the Excel filename, put your last name first, then your first name, then the word Data.
Showcase your ability to produce a well-crafted document, your capacity to learn how to do
something challenging on your own, and your ability to engage with the reader and to explain
what is happening. Demonstrate your ability to follow instructions and to work with integrity.
✓ On the cover page, include your name, the course number, and the date.
✓ On the cover page, show an interesting title and graphic, with the goal of encouraging the reader
to turn to the second page rather than giving up at a glance. Use a graphic from your
paper, and include your name in the title of the graph.
✓ On the second page, place a well-formatted Table of Contents.
✓ Number your pages.
✓ Number the questions in the same way as shown: 1, 2,…
✓ Answer each question at its numbered position. I will not look elsewhere for the answers.
✓ Insert your name in the main title of every R-graphic you show.
✓ Do not show the text of my questions. Instead, fold the question into the answer. For
example, instead of repeating my first question, you could say, “1. I am going to begin with
an introduction to the corporation from which…,” that is, don’t show the question.
✓ When using a quotation, use quotation marks and then a footnote to identify the source.
✓ Source all material taken from books, articles, my lecture notes, or the Web.
✓ For recommended layout style, review my lecture notes.
✓ Write in the first person.
✓ Share some of your (a) thought processes, (b) miscues, (c) workarounds, and (d) insights.
✓ Demonstrate engagement with the concepts and approaches taken in the Lecture Notes.
✓ Do not share your project work with anyone else; that is cheating.
✓ Do not plagiarize; do not plagiarize me.
TO START: First, select a classification methodology to learn in your project. Select either
Classification Trees or Support Vector Machines. Those topics are covered in the following
chapters in the ISLR text:
(a) Chapter 8, pages 303-331, covers Tree-Based Methods (or CART, short for
classification and regression trees). Only show results for classification trees, not
regression trees.
(b) Chapter 9, pages 337-368, covers SVM (“support vector machines”).
In terms of coverage, if you pick Chapter 8, Tree-Based Methods, get through boosting.
In Chapter 9, SVM, get through the section on support vector machines.
Appendix. After making your choice from above, get introduced to it by walking through the lab
example provided in the ISLR text. Show that work in an Appendix, including any graphs.
On those graphs, insert your name into the titles. To actually get the data used in the ISLR
examples, you will likely need to download an R package called ISLR; it contains the data
sets used in the text.
1. [20 points] Begin by introducing your reader to the corporation from which your stock data
comes. Tell the reader something you learned about that corporation that you found
interesting, something which would demonstrate to a recruiter that you possess curiosity and
the ability to employ it.
Explain to the reader why we took two different transformations of the price data,
return and risk. Illustrate your discussion with before-and-after graphics. Review LN2.A
and your LN2 homework as a refresher.
Then, using trimmed screenshots from Excel where needed, sketch out for the reader
how you converted your Yahoo-sourced stock data into the lagged stock risk data set, from 2006 on.
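For readers who want to see the transformation logic outside Excel, here is a minimal Python sketch. It rests on assumed definitions that may differ from LN2.A and LN5.A: the return is the daily log return ln(P_t / P_{t-1}), risk is approximated by the absolute log return, HiLo marks whether a day's value is above the series median, and the lag1 and lag2 predictors are the standardized values from one and two days earlier. The function names and the prices are made up for illustration.

```python
import math
import statistics


def log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1}) from a list of closing prices."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def lagged_rows(series):
    """Rows of (HiLo, z_lag1, z_lag2) for every day t that has two prior days.

    Assumed (hypothetical) definitions, not necessarily the course's:
      - HiLo is "Hi" when the day's value is above the series median, else "Lo"
      - z_lag1 and z_lag2 are the standardized values from days t-1 and t-2
    """
    mu = statistics.mean(series)
    sd = statistics.stdev(series)
    z = [(v - mu) / sd for v in series]
    med = statistics.median(series)
    return [("Hi" if series[t] > med else "Lo", z[t - 1], z[t - 2])
            for t in range(2, len(series))]


prices = [100.0, 101.5, 99.8, 102.3, 104.0, 103.1]  # made-up closing prices
r = log_returns(prices)
return_rows = lagged_rows(r)
risk_rows = lagged_rows([abs(v) for v in r])  # assumed risk proxy: |r_t|
```

Each row pairs day t's HiLo label with the standardized values from days t-1 and t-2, the same structure as the lagged columns built in Excel; substituting the course's own risk and HiLo definitions changes only the risk-proxy line and the HiLo expression.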
2.  First, draw a random sample of size n=300 without replacement from your stock return
data set. Recall that your stock return data contains a HiLo return column and standardized
log lag1 and log lag2 return columns. I will call this your n=300 stock return data set.
Show and explain how this is done.
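In Excel, one common approach is to add a helper column of =RAND(), sort the rows on that column, and keep the first 300. The same idea in Python can be sketched as follows; random.sample draws without replacement, while the function name and the seed value are arbitrary choices, and fixing the seed simply makes the sample reproducible.

```python
import random


def sample_without_replacement(rows, n=300, seed=3920):
    """Draw n distinct rows at random; each row can be chosen at most once."""
    if n > len(rows):
        raise ValueError("n is larger than the number of available rows")
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(rows, n)
```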
Second, draw another random sample of size n=300 without replacement, this time from your stock
risk data set. Recall that your stock risk data contains a HiLo risk column and
standardized log lag1 and log lag2 columns. I will call this your n=300 stock risk data set.
Next, using your n=300 stock return data set, walk through the steps covered by the
ISLR text for your chosen method, SVM or CART, explaining in your own words what you
are doing. Put your name into the title of any graphs you show. Where you are unclear as to
what is happening or why it is being done, say so, and document your efforts to work
towards understanding. Do not waste time trying to fake comprehension via plagiarism.
You are welcome to read up on your chosen method in other sources,
including textbooks, academic articles, and the web. However, carefully credit the
sources from which you gain insight, and do not plagiarize them! It is natural that you will
not understand much of what you read, but you can start wrestling with it, and you can
document that wrestling.
Finally, run the program on your n=300 stock risk data set and compare the
performance to that of your n=300 stock return data set. Include use of the chi-square test.
Discuss the differences, the reasons that these would happen, and the lessons learned about
the nature of the stock market.
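One reasonable reading of "use of the chi-square test" (an assumption, since the exact test is not specified here) is a Pearson test of independence on a 2x2 table: either actual versus predicted class for one data set, or data set (return versus risk) versus outcome (correct versus incorrect). The sketch below computes the statistic term by term so each step is visible; the function name and the counts are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    No continuity correction is applied, and all row and column totals
    must be positive. Returns (statistic, p_value) with 1 degree of freedom.
    """
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical confusion matrix: rows = actual (Hi, Lo), columns = predicted (Hi, Lo).
stat, p = chi_square_2x2([[90, 60], [70, 80]])
```

R's chisq.test() applies Yates's continuity correction to 2x2 tables by default, so its statistic will be somewhat smaller than this uncorrected value unless correct = FALSE is passed.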
3.  Select one of the tuning parameters or decision criteria that lie beneath the surface of
your chosen methodology, SVM or CART. Engage with it by researching beyond the
ISLR text. Then experiment with it, using your data and other data sets.
Try decreasing or increasing n. Look at other sources for help, documenting the sources.
When borrowing text, use quotation marks and footnote the source. Do not plagiarize text.
Here are some examples of possibilities from Chapter 8 on CART. On page 312, the
Gini index is defined, but what is it? Can you compute it yourself? How is the G statistic
used after it is computed? What is it compared against? Is that a parameter that you can
tune? Also, at the top of that page, the text says that the “…classification error rate is not
sufficiently sensitive.” See if you can demonstrate a lack of sensitivity! See if you can
figure out what “sensitive” means in this context. To what is it not sensitive? The text
also says that entropy is an alternative to Gini and gives similar results. Do you find that to
be true?
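To see the lack of sensitivity concretely, the sketch below scores two hypothetical splits of a node holding 400 "Hi" and 400 "Lo" observations, using ISLR's definitions of the classification error rate, the Gini index, and entropy. The counts are made up, and the natural log is assumed for entropy.

```python
import math


def class_error(counts):
    """Classification error rate E = 1 - max_k p_k for one node."""
    n = sum(counts)
    return 1 - max(counts) / n


def gini(counts):
    """Gini index G = sum_k p_k (1 - p_k) for one node."""
    n = sum(counts)
    return sum((c / n) * (1 - c / n) for c in counts)


def entropy(counts):
    """Entropy D = -sum_k p_k log p_k (natural log), skipping empty classes."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def split_score(measure, children):
    """Node measure averaged over the child nodes, weighted by node size."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * measure(child) for child in children)


# Hypothetical parent node (400 Hi, 400 Lo) split two different ways.
split_a = [(300, 100), (100, 300)]
split_b = [(200, 400), (200, 0)]  # the second child node is pure
for name, measure in [("error", class_error), ("gini", gini), ("entropy", entropy)]:
    print(name, round(split_score(measure, split_a), 4), round(split_score(measure, split_b), 4))
```

Both splits score 0.25 on weighted classification error, so that criterion cannot tell them apart, while Gini (0.375 versus about 0.333) and entropy (about 0.562 versus 0.477) both prefer split B, which produces a pure node.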
Here is an example from Chapter 9 on SVM. On page 346, a parameter denoted C is
introduced, but what does it do? Demonstrate that you worked diligently on this problem:
the goal is to engage with the issue, not to produce miracles of comprehension or a
plagiarism dump! Take a hands-on, practical approach; that requires experimentation.
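For orientation, the optimization problem in which C appears is usually written as below, in ISLR's notation; check it against the equations near page 346 before relying on it. C acts as a budget for the slack variables ε_i: C = 0 forces every ε_i to 0 (no margin violations), while a larger C tolerates more violations and so allows a wider margin. An observation with ε_i > 1 lies on the wrong side of the hyperplane.

```latex
\begin{aligned}
&\underset{\beta_0,\ldots,\beta_p,\ \epsilon_1,\ldots,\epsilon_n,\ M}{\text{maximize}} \quad M \\
&\text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 = 1, \\
&\qquad y_i\left(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}\right) \ge M(1 - \epsilon_i), \\
&\qquad \epsilon_i \ge 0, \qquad \sum_{i=1}^{n} \epsilon_i \le C.
\end{aligned}
```

One caution when experimenting: the cost argument of svm() in R's e1071 package, used in the ISLR lab, works in the opposite direction; a small cost gives wide margins and many violations.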
4.  Create classification space plots for both of your n=300 data sets, using your chosen
methodology, SVM or CART. Be sure to explain how you went about this. Create the plot
using the same techniques that we used in our plots for other methods. Work it out yourself,
step by step.
If you are doing SVM, you will find that the SVM software automatically outputs
classification space plots. Do not show me those plots, as they will count ZERO on this
assignment. I am requiring that you create a classification space plot yourself, step by step. As
we know from trying to do that ourselves, it is not easy. Evidence of thoughtful and diligent
work is more important than getting your plots to work perfectly.
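The general recipe, assuming it resembles the grid-based approach used for the earlier methods, is to lay a fine grid over the two predictors (lag1 and lag2), ask the fitted model to classify every grid point, and then color the points by predicted class with the 300 observations overlaid. The Python sketch below covers the first two steps; stump() is a hypothetical stand-in for a fitted tree or SVM, and the grid limits of -3 to 3 are an assumption suited to standardized predictors.

```python
def grid_points(x_min, x_max, y_min, y_max, steps=50):
    """Evenly spaced (lag1, lag2) points covering the plotting window."""
    xs = [x_min + i * (x_max - x_min) / (steps - 1) for i in range(steps)]
    ys = [y_min + j * (y_max - y_min) / (steps - 1) for j in range(steps)]
    return [(x, y) for x in xs for y in ys]


def classify_grid(predict, points):
    """Attach the classifier's predicted class to every grid point."""
    return [(x, y, predict(x, y)) for x, y in points]


def stump(lag1, lag2):
    """Hypothetical fitted model: a single split on lag1 at 0 (lag2 unused)."""
    return "Hi" if lag1 < 0.0 else "Lo"


labeled = classify_grid(stump, grid_points(-3.0, 3.0, -3.0, 3.0))
```

The labeled points can then be drawn with any scatter-plot tool, for example matplotlib's scatter(), using one color per predicted class.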
5.  Prepare a comparative study of knn, naive Bayes, logistic regression, and your selected
method. Make this comparison on your two n=300 data sets, splitting the data randomly in
half to get the training and testing sets.
Explain what you are doing as you go along, explain what you understand about what
distinguishes the methods, discuss reasons why the results vary, and why there might be
systematic differences in performance between return data and risk data. Show
classification space plots for knn, naïve Bayes, and logistic regression. At the end, show a
single table in which you summarize the overall correct forecast rate for the stock returns for
the four methods; then another table summarizing the performance on the stock risk data.
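The mechanics of the comparison, independent of the four classifiers themselves, can be sketched as follows: shuffle the rows, split them in half, compute the proportion of correct forecasts on the test half, and collect the four rates into one table per data set. The function names and the seed are illustrative assumptions.

```python
import random


def split_in_half(rows, seed=3920):
    """Randomly split rows into equal-sized training and testing halves."""
    rng = random.Random(seed)
    shuffled = list(rows)  # copy so the caller's order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def correct_rate(actual, predicted):
    """Proportion of test cases whose predicted class matches the actual class."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


def summary_table(rates):
    """Format {method name: correct forecast rate} as a plain-text table."""
    lines = [f"{'Method':<22}{'Correct rate':>14}"]
    for method, rate in rates.items():
        lines.append(f"{method:<22}{rate:>14.3f}")
    return "\n".join(lines)
```

Calling summary_table() once with the four return-data rates and once with the four risk-data rates produces the two requested tables; reusing the same seed keeps the splitting procedure reproducible.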