Question 1 (10 points)

Which of the following characteristics are true of Nominal/Categorical variables?

Question 1 options:

A. The “order” of values is known

B. It is possible to calculate the Mode of the values

C. It is possible to calculate the Median of the values

D. It is possible to calculate the Mean of the values

E. It is possible to quantify the difference between each value

F. It is possible to add or subtract values

G. It is possible to multiply and divide values

H. The variable has a “true” zero point

Question 2 (10 points)

Which of the variable types below are considered “continuous”?

Question 2 options:

A. Nominal / Categorical

B. Ordinal

C. Interval

D. Ratio

Question 3 (10 points)

Which of the following characteristics are true of Ordinal variables?

Question 3 options:

A. The “order” of values is known

B. It is possible to calculate the Mode of the values

C. It is possible to calculate the Median of the values

D. It is possible to calculate the Mean of the values

E. It is possible to quantify the difference between each value

F. It is possible to add or subtract values

G. It is possible to multiply and divide values

H. The variable has a “true” zero point

Question 4 (10 points)

Which of the following characteristics are true of Interval variables?

Question 4 options:

A. The “order” of values is known

B. It is possible to calculate the Mode of the values

C. It is possible to calculate the Median of the values

D. It is possible to calculate the Mean of the values

E. It is possible to quantify the difference between each value

F. It is possible to add or subtract values

G. It is possible to multiply and divide values

H. The variable has a “true” zero point
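The practical difference between these levels of measurement shows up in which summary statistics make sense. The following is a minimal Python sketch using the standard `statistics` module. The colors, size ranking, and temperatures are hypothetical examples, not data from this exam.

```python
import statistics

# Nominal/categorical: labels have no order, so counts and the mode are meaningful.
colors = ["red", "blue", "red", "green", "red"]  # hypothetical sample
print(statistics.mode(colors))  # prints red

# Ordinal: the order is known, so a median is meaningful,
# but the "distance" between levels is not.
size_rank = {"small": 1, "medium": 2, "large": 3}  # hypothetical ranking
sizes = ["small", "large", "medium", "medium", "large"]
ranks = sorted(size_rank[s] for s in sizes)
median_rank = statistics.median_low(ranks)  # median_low always returns an existing level
median_size = next(k for k, v in size_rank.items() if v == median_rank)
print(median_size)  # prints medium

# Interval: differences are meaningful, so means and subtraction work,
# but there is no true zero, so 20 °C is not "twice as hot" as 10 °C.
temps_c = [10.0, 15.0, 20.0]  # hypothetical readings
print(statistics.mean(temps_c), temps_c[2] - temps_c[0])  # prints 15.0 10.0
```

A ratio variable (for example, a count or a price) additionally has a true zero, which is what makes multiplication and division meaningful.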

Question 5 (10 points)

Volume, Velocity, and Variety are characteristics commonly associated with:

Question 5 options:

A. Prescriptive analytics

B. Predictive analytics

C. The Analytical Framework

D. The CoNVO model

E. Business intelligence

F. Big Data

G. The Internet of Things (IoT)

H. Measures of variability

Question 6 (10 points)

Online services like Netflix, Pandora, and Amazon provide their customers with product recommendations based on their past behavior by primarily using:

Question 6 options:

A. Live human assistance

B. Descriptive analysis

C. Inferential analysis

D. Predictive analytics

E. Prescriptive analytics

F. Unstructured text analysis

Question 9

The ________ identifies the number of standard deviations a particular value is from the mean of its distribution.

Question 9 options:

A. Z-score

B. F statistic

C. coefficient of variation

D. t-score
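As a worked illustration of the z-score, z = (x − mean) / standard deviation, here is a small Python sketch. The data are hypothetical, and the sketch uses the population standard deviation (`statistics.pstdev`); when working from a sample you would typically substitute `statistics.stdev`.

```python
import statistics

def z_score(x: float, data: list[float]) -> float:
    """Number of standard deviations that x lies from the mean of data."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation (n denominator)
    return (x - mean) / sd

scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data: mean 5, SD 2
print(z_score(9.0, scores))  # prints 2.0: 9 is two standard deviations above the mean
```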

Question 10 (10 points)

You conduct a random survey of 1000 SPS graduate students at NYU to collect basic demographic data. After doing some descriptive statistical analysis, you notice that there is a unimodal distribution of age among the students, with a median that is much lower than the mean age. This would indicate that the age data has a:

Question 10 options:

A. Normal distribution

B. Negatively skewed distribution

C. Positively skewed distribution

D. Distribution of unknown skewness given the information provided
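When a unimodal distribution has a long right tail, a few large values pull the mean above the median. The sketch below uses a hypothetical list of ages (not survey data) to show that relationship; note that mean versus median is a common rule of thumb for skew direction rather than a strict test.

```python
import statistics

ages = [22, 23, 24, 24, 25, 26, 27, 45, 58]  # hypothetical ages with a long right tail
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
print(mean_age, median_age)

# A mean well above the median is the usual signature of positive (right) skew.
print("positively skewed" if mean_age > median_age else "not positively skewed")
```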

Question 11 (10 points)

Open the attached Excel file, Sales Associate Performance Report.xlsx. It lists the number of sales that each of the company’s 25 sales associates made in the month of January.

Conduct a Pareto analysis of the data in Excel (do NOT simply create a Pareto chart) and answer the following question:

Which sales associates are responsible for approximately the top 80% of all sales in January? (Check all answers that apply.)

In your Excel file (which you will upload at the end of the exam), please briefly list all of the steps you followed to calculate your Pareto analysis.

Sales Associate Performance Report.xlsx

Question 11 options:

A. Quentin

B. Laura

C. Ellen

D. Nancy

E. Jason

F. Isaac

G. Tanya

H. Charles

I. Charles

J. Karin

K. Brett

L. Jason

M. Thomas

N. Barbara

O. Bryan

P. Edgar

Q. Morgan

R. Roberta

S. Miguel

T. Beatrice

U. Allen
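The usual Pareto steps (sort descending, compute a running total, convert it to a cumulative percentage, and keep items until the cutoff is reached) can be sketched in Python as follows. The associate names and counts here are hypothetical placeholders, not the figures in the attached workbook, and whether to include the item that first crosses 80% is a judgment call that this sketch resolves by including it.

```python
def pareto_top(counts: dict[str, int], threshold: float = 0.80) -> list[str]:
    """Return names, largest first, until their cumulative share first reaches threshold."""
    total = sum(counts.values())
    # Step 1: sort by sales, largest first.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    selected: list[str] = []
    cumulative = 0
    for name, count in ranked:
        # Step 2: keep a running total (cumulative sales).
        cumulative += count
        selected.append(name)
        # Step 3: stop once the cumulative percentage reaches the cutoff.
        if cumulative / total >= threshold:
            break
    return selected

# Hypothetical counts; the real figures are in the attached workbook.
monthly_sales = {"Rep A": 50, "Rep B": 35, "Rep C": 8, "Rep D": 4, "Rep E": 3}
print(pareto_top(monthly_sales))  # prints ['Rep A', 'Rep B']
```

In Excel, the same steps correspond to sorting the column in descending order, adding a running-sum column, and dividing it by the grand total.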

Question 12 (10 points)

Consider the following data set:

Variable 1    Variable 2
15            2
15            9
14            5
10            3
5             8
5             10
8             17
3             15

The sample correlation coefficient for this data set is __________.

Question 12 options:

A. -0.18

B. -0.55

C. 0.55

D. 0.47

E. -0.47

F. 0.18

G. -0.80

H. -0.61
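The sample (Pearson) correlation coefficient is r = Sxy / sqrt(Sxx · Syy), where Sxy is the sum of cross-products of deviations from the means and Sxx and Syy are the sums of squared deviations. The sketch below assumes the data set's values pair up row by row (15 with 2, 15 with 9, and so on); Excel's CORREL function computes the same quantity.

```python
import math

# Pairs read row by row from the data set (Variable 1, Variable 2).
var1 = [15, 15, 14, 10, 5, 5, 8, 3]
var2 = [2, 9, 5, 3, 8, 10, 17, 15]

def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation; the n - 1 factors cancel, so raw sums suffice."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(round(pearson_r(var1, var2), 2))
```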

Question 13 (10 points)

In a hypothesis test, if we fail to reject the null hypothesis when it is NOT true, we are committing a:

Question 13 options:

A. Type I error

B. Type II error

C. Type III error

D. Type IV error

Question 14 (10 points)

A marketing analyst at a magazine publisher wants to determine if there are statistically significant differences in the subscription renewal rates for two of the fashion magazines that it sells. If the analyst conducts a one-way ANOVA test on the renewal rate data in JMP and sees in the results that “Prob>F” is equal to 0.0058, which of the following conclusions should be drawn?

Question 14 options:

A. If the “Prob>F” is greater than the F Ratio, we can reject the null hypothesis and conclude that the means are statistically different.

B. Since the “Prob>F” is less than 0.05, we can accept the null hypothesis. Therefore, the means are NOT statistically different.

C. Since the “Prob>F” is less than 0.05, we can reject the null hypothesis. Therefore, the means are statistically different.

D. Since the “Prob>F” is positive, we can reject the null hypothesis. Therefore, the means are statistically different.

E. Since the “Prob>F” is a non-zero value, we can accept the null hypothesis and conclude that the means are NOT statistically different.

F. Since the “Prob>F” is positive, we can accept the null hypothesis. Therefore, the means are NOT statistically different.

Question 15 (10 points)

Which analytical technique would you use to test, on a sample of approximately 17,000 consumers, whether their marital status (single, married, divorced, or widowed) plays a significant role in whether they prefer using 1) a standard dishwasher, 2) a mini counter-top dishwasher, 3) a premium “smart home” dishwasher, or 4) simply washing dishes by hand?

Question 15 options:

A. Z-Score Analysis

B. Correlation Analysis

C. Prescriptive Analytics

D. Pareto Analysis

E. Chi-Square Analysis

F. Fisher’s Exact Test

Question 16 (10 points)

The marketing analyst at a national travel agency is using A/B testing to evaluate a new promotional email to customers against the one that was used last time. The new promotional email seems to have a higher conversion rate than the older one, but the analyst still wants to conduct a chi-squared test to determine whether the results are significant.

How should the analyst interpret the results?

Question 16 options:

A. If p value is less than 0.05, then we can say with 5% certainty that the results are not due to chance.

B. If p value is less than 0.05, then we can say with 95% certainty that the results are due to chance.

C. If p value is less than 0.95, then we can say with 95% certainty that the results are not due to chance.

D. If p value is less than 0.05, then we can say with 95% certainty that the results are not due to chance.

E. If p value is more than 0.05, then we can say with 95% certainty that the results are not due to chance.

Question 17 (10 points)

Westworld Vacations has a customer database of 2000 people and decides to create an email campaign with a discount code in order to generate sales through its website. It creates an email and then modifies the Call to Action (the part of the copy that encourages customers to do something; in the case of a sales campaign, to make a purchase).

• To 1000 people, it sends the email with the call to action stating, “Offer ends this Saturday! Use code A1”.

• To another 1000 people, it sends the email with the call to action stating, “Offer ends soon! Use code B1”.

All other elements of the email’s copy and layout are identical.

The company then monitors which campaign has the higher success rate by analyzing the use of the promotional codes.

The email using the code A1 has a 6.5% response rate (65 of the 1000 people emailed used the code to buy a product), and the email using the code B1 has a 4.2% response rate (42 of the 1000 recipients used the code to buy a product).

Conduct a chi-squared analysis using the attached Excel worksheet to analyze the email campaign results.

What do the results show?

Test for Significance of A-B Test.xlsx

Question 17 options:

A. The p-value is less than 0.05, so the difference is due to chance

B. The p-value cannot be interpreted

C. The p-value is more than 0.05, so the difference is not due to chance

D. The p-value is less than 0.05, so the difference is not due to chance
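The chi-squared test compares observed counts with the counts expected if response rate and email version were independent (expected = row total × column total / grand total). Here is a minimal Python sketch for the 2x2 case using only the standard library. It assumes no Yates continuity correction (the attached worksheet may differ) and uses the fact that, for 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).

```python
import math

def chi_square_2x2(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-square statistic and p-value (df = 1) for a 2x2 table, no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: code A1, code B1; columns: used the code, did not use the code.
campaign = [[65, 935], [42, 958]]
chi2, p = chi_square_2x2(campaign)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

The p-value is then compared with the chosen significance level (commonly 0.05).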

Question 18 (40 points)

Match each of the following analytical terms with its intended purpose.

Question 18 options:

1. Chi-Square Analysis

2. Box-and-Whisker Plot (Box Plot)

3. Z-score

4. p-value

5. Coefficient of Determination

6. Kurtosis

7. Skewness

8. Fisher’s Exact Test

• Using a statistical significance test on contingency tables with a small sample size (for example, one where any of the expected values is less than 5 or the total of the expected values is less than 50).

• Showing a visualization of the distribution of a variable’s values around its median, as well as the boundaries of its quartiles.

• Measuring the degree of “tailedness” (i.e., the propensity to produce outliers) for the probability distribution of a random variable.

• Measuring the degree of asymmetry of the distribution of a variable’s values around its mean.

• Determining the proportion of the variance in the dependent variable that is predictable from the independent variable.

• Estimating the probability, for a given statistical model, that when the null hypothesis is true, the statistical summary (such as the sample mean difference between two compared groups) would be greater than or equal to the actual observed results.

• Hypothesis testing applied to two sets of data (observed vs. expected frequencies) to evaluate how likely it is that the differences between the sets arose by random chance due to sampling error.

• Determining the number of standard deviations a particular value is from the mean of its distribution.
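To make the skewness and kurtosis definitions concrete, here is a minimal Python sketch of the moment-based (population) formulas: skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 − 3, where mk is the k-th central moment. The data are hypothetical, and packages such as Excel or JMP may report bias-adjusted sample versions, so their values can differ slightly.

```python
def central_moment(data: list[float], k: int) -> float:
    """k-th central moment (population form, n denominator)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def moment_skewness(data: list[float]) -> float:
    """Asymmetry around the mean: m3 / m2**1.5 (0 for a symmetric sample)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data: list[float]) -> float:
    """Tailedness relative to a normal distribution: m4 / m2**2 - 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]         # hypothetical, symmetric about 3
right_tailed = [1, 1, 2, 2, 3, 10]  # hypothetical, long right tail
print(moment_skewness(symmetric), excess_kurtosis(symmetric))
print(moment_skewness(right_tailed))
```

A flat sample such as 1 through 5 has negative excess kurtosis (thin tails), while the single large value in the second sample produces positive skewness.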

Question 19 (10 points)

A ________ predicts the change in a dependent variable due to a one-unit increase in an independent variable while holding other variables constant.

Question 19 options:

A. residual

B. coefficient of determination

C. regression coefficient

D. correlation coefficient
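For a single predictor, the least squares regression coefficient (slope) is Sxy / Sxx and the intercept is mean(y) − slope × mean(x). The Python sketch below shows that calculation on hypothetical values; in multiple regression each coefficient is estimated jointly with the others, which is what "holding other variables constant" refers to, and this one-variable sketch does not cover that case.

```python
def least_squares_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the ordinary least squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx                    # predicted change in y per one-unit increase in x
    intercept = mean_y - slope * mean_x  # predicted y when x = 0
    return slope, intercept

# Hypothetical values, for illustration only.
cylinders = [4.0, 6.0, 8.0]
liters = [2.0, 3.0, 4.0]
print(least_squares_line(cylinders, liters))  # prints (0.5, 0.0)
```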

Question 20 (10 points)

A strong, robust multiple linear regression model requires which of the following assumptions? (Be sure to choose ALL correct answers.)

Question 20 options:

A. The model needs to show a high degree of overfitting

B. The coefficient of determination can never be positive

C. The presence of collinearity among the predictor variables

D. The absence of collinearity among the predictor variables

E. The intercept coefficient should always be positive

F. All of the independent variables must be continuous.

G. Homoscedasticity in regard to the error terms in the fitted model

H. Heteroscedasticity in regard to the error terms in the fitted model

Question 21 (10 points)

Z-Mobile, a mobile phone company, is trying to predict each month whether its customers will cancel their service based on the following variables: 1) customer income, 2) monthly spend on domestic calls, 3) monthly spend on international calls, and 4) monthly spend on data services.

Which type of analytical technique would be the most appropriate one for the company to use?

Question 21 options:

A. Analysis of Variance (ANOVA)

B. Fisher’s Exact Test

C. Logistic Regression

D. T-Test

E. Multiple Linear Regression

F. Simple Linear Regression

G. Factor Analysis

H. Chi-Square Testing

Question 22 (10 points)

Which analytical technique would you use to determine how a museum’s monthly attendance is affected by the following factors?

• number of elementary and high school students in the area

• monthly budget spent on “special exhibits”

• number of “special exhibits” currently featured

• amount spent on advertising that month and the month prior

• number of days in the month that fall on a holiday

Question 22 options:

A. Chi-Squared Analysis

B. Logistic Regression

C. Factor Analysis

D. Multivariate Linear Regression

E. Fisher’s Exact Test

F. Simple Linear Regression

G. Analysis of Variance (ANOVA)

Question 23 (10 points)

Which of the following terms is used to describe standard deviations of the error terms in a linear regression model that are constant and do not depend on the x-value?

Question 23 options:

A. Kurtosis

B. Lepto-Kurtosis

C. Heteroscedasticity

D. Multicollinearity

E. Homoscedasticity

F. Skewness

Question 24 (10 points)

The Least Squares Estimates method is used to calculate the value of the dependent variable in Logistic Regression analysis.

Question 24 options:

A. True

B. False

Question 25 (10 points)

You are an entertainment blogger who runs a popular YouTube channel about movies, TV, and video games that people can subscribe to.

Several months ago, you created a collection of movie/TV character-themed t-shirts, which seems to be selling well but, in your opinion, could be doing much better. While thinking up better ways to market your t-shirts, you wonder if subscribers to your YouTube channel are much more likely to buy your t-shirts than non-subscribers.

You go to your customer database and pull up a large random sample of people, along with data on whether they bought any of your t-shirts and whether they subscribe to your YouTube channel. The variable BuyTshirt is equal to “1” if they purchased any of your t-shirts and “0” if they did not. The variable YouTube is equal to “1” if they subscribe to your YouTube channel and “0” if they do not.

You then run a Logistic Regression on the two variables to calculate odds ratios for buying a shirt given whether or not one subscribes to your YouTube channel. You first decide to compare those who are non-subscribers (YouTube = 0 in the numerator) to those who are subscribers (YouTube = 1 in the denominator), as follows:

Odds ratio = Odds(BuyTshirt = 1 | YouTube = 0) / Odds(BuyTshirt = 1 | YouTube = 1)

When you calculate this in JMP, the result produced is 0.07.

True or False: The result means that YouTube channel non-subscribers have a 0.07% chance of buying one of your t-shirts.

Question 25 options:

A. True

B. False
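An odds ratio compares two odds (events divided by non-events), so it is not itself a probability. The Python sketch below uses hypothetical counts, not data from the scenario, to show an odds ratio near 0.07 alongside the much different purchase probability it came from. For a single binary predictor, the odds ratio from a logistic regression matches this cross-product ratio from the 2x2 table.

```python
def odds(events: int, non_events: int) -> float:
    """Odds of an event: count of events divided by count of non-events."""
    return events / non_events

# Hypothetical counts, for illustration only.
nonsub_bought, nonsub_not = 10, 990  # YouTube = 0
sub_bought, sub_not = 120, 880       # YouTube = 1

odds_ratio = odds(nonsub_bought, nonsub_not) / odds(sub_bought, sub_not)
prob_nonsub = nonsub_bought / (nonsub_bought + nonsub_not)  # an actual probability

print(round(odds_ratio, 3))  # prints 0.074: non-subscribers' odds are about 7% of subscribers' odds
print(prob_nonsub)           # prints 0.01: a 1% chance of buying, not 0.074%
```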

Question 26 (40 points)

Which of the following factors make a data visualization more effective in a presentation to an important audience? (Check all that apply.)

Question 26 options:

A. Designing your data visualizations with your audience in mind

B. A high signal-to-noise ratio

C. A low signal-to-noise ratio

D. Choosing 3D charts over 2D charts in order to increase audience engagement

E. Minimizing the “white space” as much as possible

F. A sense of narrative

G. An emphasis on functional design over decorative design

H. Always choosing style over substance

I. Using as wide a variety of colors as possible to make data visualizations eye-catching

J. A greater emphasis on making all data visualizations more “Referenceable” rather than “Glanceable”

Question 27 (10 points)

Which of the following data visualizations is best for presentation to an important audience? (If the images do not appear in your browser, consult the attached file.)

Which of the following data visualizations is best for presentation to an important audience.pdf

Question 27 options:

A. It depends upon the goals, narrative and the audience of the presentation.

B. It depends specifically upon the viewer’s knowledge of statistics.

C. It depends upon your aesthetic or artistic tastes.

Kuiper Car Company

Kuiper is a (fictional) car manufacturer that has collected some data on its competitors’ cars. You work for the marketing department of the car manufacturer, and your boss has asked you to help with a competitive analysis of the market.

Use the following data file to conduct your analysis in JMP: Kuiper-b.jmp

Question 28 (10 points)

What is the price of the most expensive car?

Question 28 options:

A. $74,819.22

B. $69,872.92

C. None of these options

D. $71,917.44

E. $70,755.47

Question 29 (10 points)

What is the average (mean) price of a car?

Question 29 options:

A. $21,235.53

B. $18,024.99

C. None of these options

D. $21,343.14

E. $26,787.88

Question 30 (10 points)

What is the shape of the price distribution?

Question 30 options:

A. Positively-skewed

B. Symmetric

C. Bimodal

D. None of these options

E. Negatively-skewed

Question 31 (10 points)

On average, which manufacturer has the lowest priced cars?

Question 31 options:

A. Buick

B. Chevrolet

C. Saturn

D. Pontiac

E. Mercury

F. None of these options

Question 32 (10 points)

Which of the following best describes the relationship between liters and cylinders?

Question 32 options:

A. None of these options

B. There is no correlation between liters and cylinders

C. There is a strong positive correlation between liters and cylinders

D. There is a weak positive correlation between liters and cylinders

E. There is a strong negative correlation between liters and cylinders

Question 33 (10 points)

What are the slope and intercept of the regression line predicting liters based on cylinders?

Question 33 options:

A. The slope is 0.318 and the intercept is 0.917

B. The slope is 0.7632361 and the intercept is -0.983916

C. The slope is 0.917 and the intercept is 0.318

D. None of these options

E. The slope is -0.983916 and the intercept is 0.7632361

