Instruction:
For this assignment, we will be working on understanding the behaviors and characteristics of people who use a digital application related to health and wellness. A free version offers limited content, such as tracking your exercise. A paid subscription service offers personalized coaching from a certified exercise trainer. The company generally values the utilization of the application. The product managers want the users to sign in frequently. Paid coaching is a source of revenue for the company. The product managers are also interested in analyzing their data to better understand the patterns of user behaviors. Your work on this assignment will support that effort.
Data:
To begin to investigate these questions, the company has gathered some information (exercise_application_user_data.csv, attached below). The data include a sample of new users of the application. Each user’s behavior was tracked for 90 days. For each user, basic demographic information (Age.Group and Country) was collected from the user’s profile. Then the following characteristics were measured:
Weekly.Exercise.Sessions: This records an individual user’s rate of weekly exercise sessions tracked in the application over the 90-day period.
Daily.Logins: This measures the user’s rate of logins per day over the 90-day period.
Paid.Coaching: This measure (TRUE/FALSE) indicates whether the user paid for any coaching service within 90 days.
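Here is a minimal R sketch for loading and preparing these variables before modelling. It assumes two things:
- the column names listed above;
- US users are coded as "USA" in Country. This is an assumption, so check the actual labels with unique(users$Country).

The helper name prepare_users and the USA column it adds are hypothetical.

```r
# Sketch: set variable types for the exercise application user data.
# Assumption: Country stores US users as "USA"; verify with unique(users$Country).
prepare_users <- function(users) {
  # Paid.Coaching may arrive as logical or as "TRUE"/"FALSE" text
  users$Paid.Coaching <- as.logical(users$Paid.Coaching)
  # Treat the demographic fields as categorical predictors
  users$Age.Group <- factor(users$Age.Group)
  users$Country <- factor(users$Country)
  # Hypothetical helper column: TRUE for US users, FALSE for all others
  users$USA <- users$Country == "USA"
  users
}

# Usage with the assignment file (assumed to be in the working directory):
# users <- prepare_users(read.csv("exercise_application_user_data.csv"))
# str(users)                  # check variable types
# table(users$Country)        # number of users per country
# mean(users$Paid.Coaching)   # overall proportion paying for coaching
```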
Based on the information above and the data provided, please answer the following questions. For multiple choice questions, please select the best answer. When numeric answers are requested, enter the value to at least 4 decimal places. For questions with a free response, please answer in approximately 2-4 sentences. Your answers should be directed to the product manager of the digital application.
So far, we’ve covered linear and logistic regression, so please do not go beyond these two areas. Please also provide R code and brief explanations with the answers. Thank you.
Logistic Regression and Odds Ratio Example:
A survey was conducted among 600 teenagers to determine if college graduation is
related to the likelihood of wearing a seatbelt when in a motor vehicle (beltalways). The
primary outcome (dependent variable) was whether a seatbelt was always worn (coded
0 for no and 1 for yes). The main independent variable was college graduate (grad)
(coded 0=non-graduate and 1=graduate) and a possible confounding variable was male
sex (sexm) (coded 0 for no and 1 for yes).
Because the outcome variable is binary (0/1), we use multiple logistic regression, fitted with the “glm” function in R (generalized linear models). Since “glm” can fit several model families, we add the argument “family = binomial(link = logit)” to specify logistic regression. We can conduct the logistic analysis using the code below:
> log.out <- glm(beltalways ~ sexm + grad, family = binomial(link = logit))
> summary(log.out)
The default output shows the regression coefficients (on the log-odds scale). These can be used to judge the direction of associations and their statistical significance. Note that grad has a positive association with wearing seatbelts and sexm has a negative association with wearing seatbelts.
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.31109    0.84639  -1.549 0.121376
sexm        -0.15505    0.17060  -0.909 0.363414
grad         0.17192    0.07599   2.262 0.023672 *
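The z value and Pr(>|z|) columns are Wald statistics. Each estimate is divided by its standard error and compared against a standard normal distribution. The short R check below reproduces both columns from the printed numbers. Small discrepancies in the last digits come from rounding of the printed estimates.

```r
# Coefficient table copied from the summary() output above
est <- c(`(Intercept)` = -1.31109, sexm = -0.15505, grad = 0.17192)
se  <- c(`(Intercept)` =  0.84639, sexm =  0.17060, grad = 0.07599)

z <- est / se               # Wald z statistic
p <- 2 * pnorm(-abs(z))     # two-sided p-value from the standard normal
round(cbind(z, p), 4)
```

With the fitted model in memory, summary(log.out)$coefficients returns this same table as a numeric matrix.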
We need to exponentiate the coefficients in order to get the odds ratios. R can do this
with the following command:
> exp(coef(log.out))
(Intercept)        sexm        grad
  0.2695271   0.8563720   1.1875858
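Exponentiating also works for the confidence limits. The minimal sketch below uses the estimates and standard errors printed above to build 95% Wald intervals on the odds-ratio scale. An interval that contains 1 corresponds to a two-sided p-value above 0.05.

```r
# Estimates and standard errors copied from the summary() output above
est <- c(sexm = -0.15505, grad = 0.17192)
se  <- c(sexm =  0.17060, grad = 0.07599)

z_crit <- qnorm(0.975)             # about 1.96 for a 95% interval
or     <- exp(est)                 # odds ratios
lower  <- exp(est - z_crit * se)   # lower 95% limit on the odds-ratio scale
upper  <- exp(est + z_crit * se)   # upper 95% limit on the odds-ratio scale
round(cbind(OR = or, lower, upper), 4)
```

With the fitted model, exp(cbind(OR = coef(log.out), confint.default(log.out))) gives the same Wald intervals for every coefficient, including the intercept.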
Interpreting These Odds Ratios
In this case, the outcome is always wearing a seatbelt. Males had about 14% lower odds of wearing a seatbelt (OR = 0.86), but the difference was not statistically significant (p = 0.36). College graduates had about 19% higher odds of wearing a seatbelt (OR = 1.19, p = 0.024), adjusting for sex.
Remember: a variable with a negative association with the outcome (a negative coefficient, as for sexm) has an odds ratio less than 1 once exponentiated. A variable with a positive association (a positive coefficient, as for grad) has an odds ratio greater than 1.
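The percentages in the interpretation come from (OR - 1) * 100, which is the percent change in the odds, not in the probability. A small R helper applies this formula to the odds ratios above; pct_change_in_odds is a hypothetical name. The same conversion turns an odds ratio of 1.574 per additional daily login into the 57.4% increase in the odds quoted in the Question 15 options.

```r
# Percent change in the odds implied by an odds ratio
pct_change_in_odds <- function(or) (or - 1) * 100

# Odds ratios from exp(coef(log.out)) above
or <- c(sexm = 0.8563720, grad = 1.1875858)
round(pct_change_in_odds(or), 1)   # sexm: -14.4, grad: 18.8

# Odds ratio of 1.574 per additional daily login (Question 15 options)
pct_change_in_odds(1.574)          # 57.4
```

A negative percentage corresponds to an odds ratio below 1 and a negative coefficient, which matches the rule above.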
Question 12
5 pts
The managers are interested in whether users from the USA tend to log
in as often as users from the other countries. We also want to take
differences in weekly exercise sessions and age groups into account.
(Paid Coaching status is related to a separate goal, so we won’t include it
here.) Build a model that will show how much more or less often the
users from the USA sign in. Display relevant information about the
model. Then provide a short interpretation (2-4 sentences) about your
findings.
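Because Daily.Logins is a continuous rate, a multiple linear regression fits this question. The minimal sketch below assumes Country codes US users as "USA". The coefficient on the USA indicator is named USATRUE in R's output. It estimates how many more (if positive) or fewer (if negative) logins per day US users average than users from the other countries, holding weekly exercise sessions and age group constant. The function name fit_login_model is hypothetical.

```r
# Sketch for Question 12: linear regression of daily logins on a USA
# indicator, adjusting for weekly exercise sessions and age group.
# Assumption: US users are coded "USA" in Country.
fit_login_model <- function(users) {
  users$USA <- users$Country == "USA"   # TRUE for US users, FALSE otherwise
  lm(Daily.Logins ~ USA + Weekly.Exercise.Sessions + Age.Group, data = users)
}

# Usage with the assignment file:
# users <- read.csv("exercise_application_user_data.csv")
# login_fit <- fit_login_model(users)
# summary(login_fit)               # estimates, p-values, R-squared
# coef(login_fit)[["USATRUE"]]     # adjusted difference in daily logins, USA vs. others
# confint(login_fit)["USATRUE", ]  # 95% confidence interval for that difference
```

Paid.Coaching is left out of the formula, as the question requests.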
Question 14
5 pts
Now build a multivariable model to show the increase in the odds of paid
coaching for users in the USA relative to other users. Make sure to
include adjustments for the other measured variables (age group, weekly
exercise sessions, and daily logins). Calculate the estimated odds ratio
(the exponential of the coefficient) for users in the USA relative to others.
Round this value to 3 decimal places.
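The minimal sketch below makes the same assumption about how Country is coded. It fits a logistic regression of Paid.Coaching on the USA indicator plus age group, weekly exercise sessions, and daily logins. It then exponentiates the USA coefficient and rounds it to 3 decimal places. The helper names are hypothetical.

```r
# Sketch for Question 14: multivariable logistic regression for paid coaching.
# Assumption: US users are coded "USA" in Country.
fit_coaching_model <- function(users) {
  users$USA <- users$Country == "USA"
  users$Paid.Coaching <- as.logical(users$Paid.Coaching)  # TRUE/FALSE outcome
  glm(Paid.Coaching ~ USA + Age.Group + Weekly.Exercise.Sessions + Daily.Logins,
      family = binomial(link = logit), data = users)
}

# Adjusted odds ratio for US users relative to all others
usa_odds_ratio <- function(fit, digits = 3) {
  round(exp(coef(fit)[["USATRUE"]]), digits)
}

# Usage with the assignment file:
# users <- read.csv("exercise_application_user_data.csv")
# coaching_fit <- fit_coaching_model(users)
# summary(coaching_fit)
# usa_odds_ratio(coaching_fit)
# exp(cbind(OR = coef(coaching_fit), confint.default(coaching_fit)))  # all ORs with Wald CIs
```

The same fitted model also gives the odds ratio for Daily.Logins, exp(coef(coaching_fit)[["Daily.Logins"]]), which is the quantity behind Question 15.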
Question 15
What does the model from the previous question tell you about the
relationship between daily logins and paid coaching?
5 pts
Additional log-ins cause users to become more likely to pay for coaching. Every
additional log-in per day increases the relative odds by about 57.4%.
Paid coaching causes users to log in 57.4% more frequently.
There is a positive relationship between these variables, with additional daily
log-ins associated with about a 57.4% relative increase in the odds.
There are too many potentially confounding factors to say anything about the
relationship between paid coaching and daily log-ins.
Question 16
5 pts
Thinking about all of the questions you have addressed so far, what type
of research study was conducted?
A randomized, controlled trial that tested the effect of age on subscription
rates.
An observational study that compared users based on their self-selected and
pre-existing characteristics.
An observational study that compared users by random assignment to
treatment and control groups.
A quasi-experimental study that looked at different times and places.
Question 17
5 pts
What are the potential concerns with the study’s design? Select the best
answer.
The study has too much of a limitation because it was only conducted in 3
countries. We don’t know if other countries would have had the same kinds of
user behaviors or effects.
The company should have focused on age effects on paid coaching. Younger
users under 35 had lower rates of paid coaching in the multivariable model
with a statistically significant p-value. The problem was not with the kind of
research we conducted; it’s that we investigated the wrong questions.
The study was observational. The effects we discovered might not be fully
attributable to the independent variables that we studied. Selection bias could
lead to imbalanced groups, and the statistical results we calculated might not
isolate the effect of the independent variable. Confounding variables, both
measured and unmeasured, might have an impact on this effect.
All of the above.
Question 18
Thinking about the entire range of analyses, our study was:
5 pts
Actionable because the findings of the analysis will guide us on where to focus
for further research in strategic development.
Actionable because a clear change in strategy was investigated, and we now
have good information about the likely effects of that change.
Not actionable because the findings of the analysis will not guide us on where
to focus for further research in strategic development.
Not actionable because a clear change in strategy was not investigated.
Question 19
5 pts
What else could you recommend to the managers of the product for
improving their performance in terms of daily logins and paid
subscriptions? Select a strategy that is actionable, would have
measurable effects, and is amenable to experimentation.
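Whatever strategy is chosen, testing it as a randomized experiment lets the company attribute differences to the strategy itself. Random assignment balances measured and unmeasured user characteristics on average. The minimal sketch below analyzes such an experiment while staying within linear and logistic regression. The arm column (control versus treatment) and the function name are hypothetical, since no experiment has been run yet.

```r
# Sketch: analyzing a randomized test of a candidate strategy.
# Hypothetical column `arm`: "control" = current app, "treatment" = new strategy.
analyze_experiment <- function(trial) {
  trial$arm <- factor(trial$arm, levels = c("control", "treatment"))
  # Logistic regression for the binary outcome (paid coaching)
  coaching <- glm(Paid.Coaching ~ arm, family = binomial(link = logit), data = trial)
  # Linear regression for the continuous outcome (daily logins)
  logins <- lm(Daily.Logins ~ arm, data = trial)
  list(
    coaching_odds_ratio = exp(coef(coaching)[["armtreatment"]]),  # treatment vs. control
    login_difference = coef(logins)[["armtreatment"]]             # extra logins per day
  )
}
```

Because assignment is random, these coefficients estimate the effect of the strategy rather than a mere association, unlike the observational models above.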
Question 20
5 pts
For the strategy you selected in the previous question, how much do you
think it will increase the proportion of users who pay for coaching within
3 months of signing up? State and justify an opinion on the expected
effect size.
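One way to state an expected effect size is as an odds ratio, which can then be translated into the proportion of users paying for coaching. In the minimal sketch below, the baseline proportion would come from the data, and the odds ratio is the analyst's stated assumption. The output is only as good as that assumption. The function name is hypothetical.

```r
# Sketch: implied coaching proportion under an assumed odds ratio.
# p0: baseline proportion paying for coaching; or: assumed odds ratio for the strategy.
implied_proportion <- function(p0, or) {
  odds0 <- p0 / (1 - p0)   # baseline odds
  odds1 <- or * odds0      # odds if the strategy works as assumed
  odds1 / (1 + odds1)      # convert odds back to a proportion
}

# Illustrative inputs only: a 20% baseline and an assumed odds ratio of 2
implied_proportion(0.20, 2)   # 0.3333333

# Usage with the assignment data (the odds ratio of 1.2 is an illustrative assumption):
# p0 <- mean(as.logical(users$Paid.Coaching))
# implied_proportion(p0, or = 1.2) - p0   # expected gain, as a difference in proportions
```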