I need some tutoring with statistics using SPSS

Question #5 indicates there are 23 outliers, based on Jackknife distances. We are using boxplots, not Jackknife distances. You will get a different number of outliers using boxplots, so ignore the Jackknife indication. Most likely you will see 31 outliers using boxplots. Go with what you get with boxplots. I’m more concerned with the process and logic of your responses than with the specific answers, as those will vary somewhat from person to person. The answers to these graded assignment questions vary considerably because they depend on the number of outliers you identify.
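For reference, the boxplot rule flags any value lying more than 1.5 interquartile ranges beyond the quartiles. The sketch below illustrates that rule in Python rather than SPSS. The function name `boxplot_outliers` is hypothetical, and the quartile method (`statistics.quantiles` with `method="inclusive"`) is an assumption that may not match the hinges SPSS uses, which is one reason outlier counts can differ slightly between tools. The sample values are the first ten total flight time (X9) entries from the Research Project 3 data in this post.

```python
import statistics

def boxplot_outliers(values, k=1.5):
    """Return the values outside the boxplot fences Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: "inclusive" quartiles; other quartile definitions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# First ten X9 (total flight time) values from the dataset in this post.
x9_sample = [12500, 3500, 1800, 2600, 2200, 1500, 1075, 2900, 1600, 350]
print(boxplot_outliers(x9_sample))
```

Because the rule is applied one variable at a time, a case would typically be dropped if it is flagged on any of the variables being analyzed.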

1. What challenges did Research Project 3 pose as you were completing it?

2. In Question 9 you ran four independent bivariate regression analyses. Suppose instead of keeping these analyses separate, you instead ran a single multiple regression analysis that included all four IVs being analyzed collectively and in the presence of each other. How different do you think the respective B values would be? Explain.

3. As you begin thinking about your own dissertation research, what role do you perceive correlation and regression analysis playing relative to the statistical strategy you would use?

Week 11
Applying Linear Regression and Prediction: A Guided Example
This handout material provides a guided example of a regression analysis. The example is
similar in structure to the previous guided examples that have been presented in Weeks 7, 8, and
10. Prior to working through this example you are encouraged to review Chapter 9 of the
assigned textbook by Wilson and Joye as well as the Gallo supplement on regression.
Guided Example Context
The research context for this example is to determine if pilots’ sex (male vs. female) has
any relationship to their annual salary, and to be able to generate a prediction equation that could
be used to predict pilots’ salary based on their sex. A random sample of N = 62 full-time ATPs
from Spirit Airlines was selected and participants were asked to report their annual salary and
sex (male or female). A copy of the data acquired from this study is given in the Excel file,
“Week 11 Guided Example Data.”
Pre-Data Analysis
Pre-data analysis involves stating the research question, identifying the correct research
methodology to answer the RQ, and conducting an a priori power analysis to determine the
minimum sample size needed.
What is the RQ? The overriding research question for the current example is: “What is
the effect of sex on the annual salaries of ATPs?” In the context of the current study, an ATP is
defined as a full-time pilot working for Spirit Airlines who holds an ATP certificate, which is the
highest level of aircraft pilot certificate issued by the FAA. In part 121, or air carrier operations,
each pilot is required to have an ATP certificate. Annual salary is defined in U.S. dollars, and sex
refers to male or female.
What is the research methodology? One research methodology that could be used to
answer this question is ex post facto because the study involves a pre-existing group membership
that cannot be manipulated. Alternatively, the research methodology could be correlational
because the study involves a single group (ATPs) and is examining the relationship between
multiple variables (salary and sex) of this single group.
So which is the more appropriate methodology: ex post facto or correlational? If the
focus of the study were to examine the differences in salary between male and female ATPs to
see if one group’s mean annual salary was significantly higher (or lower) than a second group’s
Michael A. Gallo © 2018
Week 11: Gallo Guided Example: Applying Linear Regression and Prediction Page 1
mean annual salary, then effects-type ex post facto would be appropriate. In fact, this would be
analogous to the second guided example given in Week 7 that examined the difference in mean
FAA IRA scores between two groups of students: those who prepared for the exam using
Gleim’s software and those who prepared for the exam using Sheppard Air’s software.
However, given that one of the study’s goals is to generate a prediction equation that can
be used to predict salaries based on sex, then this is a prediction study and therefore the
appropriate research methodology/design is prediction correlational research. This is because
the results of the study will be used to forecast future behavior by examining the correlations
between variables. More concretely, because we endeavor to estimate the effect on a dependent
measure (annual salary) relative to a change in a predictor variable (sex), the focus is on the
regression coefficient (B) because the regression coefficient is a reflection of causal effects.
What is the minimum sample size needed? To determine the minimum sample size, we
consult G*Power based on the following parameters:
• Test family = t tests
• Statistical test = Linear multiple regression: Fixed model, single regression coefficient
• Type of power analysis: A priori
• Tail(s) = Two
• Effect size f² = 0.15, which is a medium effect
• α error prob = .05
• Power = .8
• Number of predictors = 1
The minimum total sample size is N = 55. We will use a two-tailed test because we do not know
what the effect will be. Furthermore, in the absence of any guidance from theory, the literature,
or a preliminary study, we set the expected effect size to medium.
Data Analysis
We now conduct the hypothesis test. Following is a summary of the steps associated with
the corresponding hypothesis test.
Step 1: Formulate the null and alternative hypotheses.
H0: β = 0: The slope of the regression line in the population is zero (horizontal line).
More specifically, sex (x) has no significant effect on annual salary (y) among
full-time ATPs.
H1: β ≠ 0: The slope of the regression line in the population differs significantly from
zero (oblique line). More specifically, sex (x) has a significant effect on
annual salary (y) among full-time ATPs.
Step 2: Determine the test criteria. The test statistic is t, the level of significance is α =
.05, and the boundary of the critical region is determined from the table of critical values for t
given in Table 7.1 (p. 4) of the Gallo supplement for Week 7. From the Excel file, N = 62 and
therefore the degrees of freedom are df = 62 – 2 = 60. For a 2-tailed test with α = .05, the critical
value is tcritical = ± 2.0. Thus, to be significant, the calculated t must be greater than or equal to
tcritical = 2.0 or less than or equal to tcritical = -2.0. This is shown in Figure 1. Note that the sample
size is greater than the minimum sample size needed.
Figure 1. Critical regions for two-tailed t test for α = .05 and df = 60 (N = 62): reject H0 if t(60) ≤ −2.0 or t(60) ≥ 2.0, with α = .025 in each tail.
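The decision rule in Step 2 can be written as a short check. This is only a sketch: the function name `decide` is hypothetical, and the critical value of 2.0 is taken from the table lookup above rather than computed. The example call uses the t value reported for this analysis, t(60) = -1.59.

```python
def decide(t_calc, t_crit):
    """Two-tailed rule: reject H0 when |t| meets or exceeds the critical value."""
    return "reject H0" if abs(t_calc) >= t_crit else "fail to reject H0"

# t(60) = -1.59 from this guided example, critical value 2.0 from the t table.
print(decide(-1.59, 2.0))
```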
Step 3: Collect data and compute sample statistics. We import the given Excel file into
our statistical software program and then run a regression analysis. The first thing we do is
observe that the independent variable, x = sex, is categorical. Before we can perform a regression
analysis we have to express this categorical variable as a continuous variable. Because x
represents a dichotomy, we will use a binary coding scheme of 0 and 1 and assign 0 to males and
1 to females. Thus, we must create a new variable in the data set that has these assignments
before we conduct any analyses.
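As an illustration of the coding step, the sketch below maps sex labels to the 0/1 scheme described above. The names `SEX_CODES` and `dummy_code` and the sample labels are hypothetical; in SPSS the same result would come from recoding into a new variable.

```python
# Coding scheme used in this example: 0 = male, 1 = female.
SEX_CODES = {"male": 0, "female": 1}

def dummy_code(labels):
    """Convert sex labels to 0/1 codes, ignoring case and stray spaces."""
    return [SEX_CODES[label.strip().lower()] for label in labels]

sex = ["Male", "Female", "female", "MALE"]  # hypothetical raw entries
sex_coded = dummy_code(sex)
print(sex_coded)
```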
Check assumptions. The next thing to do is check to see if the data satisfy the underlying
assumptions. As indicated in the Gallo supplement, there are four main assumptions: (a)
linearity, (b) homoscedasticity, (c) independence of the residuals, and (d) normality. The single
best way to confirm these assumptions is by examining residual plots. Recall that residuals are
nothing more than the difference between an observed y score and the corresponding predicted y
score for a particular x value. Your statistical software should provide scatter plots of the
residuals as part of its output for a regression analysis. If it does not do so by default, then you
will have to invoke additional menu commands to get the plots. For readers using SPSS, here is a
link that describes how to test for these assumptions using SPSS:
http://blog.uwgb.edu/bansalg/statistics-data-analytics/linear-regression/what-are-the-fourassumptions-of-linear-regression/. You also must make sure that you will be working with the
modified continuous version of the x variable and not the categorical x variable.
Linearity of residuals. To confirm the linearity assumption, we examine a residual plot in
which the y scores are placed on the y axis and the residuals are placed on the x axis.
(Alternatively, we could examine a scatter plot that involves plotting the residuals on the y axis
and the predicted y values on the x axis.) A copy of this plot is shown in Figure 2. We now
examine this plot to see if the relationship is linear in form. This appears to be the case. The
scatter plot does not suggest a nonlinear pattern. As a result, this assumption is satisfied.
Figure 2. Scatter plot of y vs. residuals.
Homoscedasticity of residuals. Recall from the guided example in Week 10 involving
correlation that homoscedasticity means “equal variances,” which implies that the variances are
the same for all values of each variable. To confirm this is the case for a regression analysis, we
examine the same residuals vs. predicted scatter plot we used for the linearity assumption. What
we seek is a plot that has no discernible pattern. From Figure 2, there appears to be no detected
systematic trend (other than the linear relationship). More specifically, the data do not “fan out”
in a horizontal “V” form or show a curvilinear form. As a result, this assumption is satisfied.
Independence of residuals. In a regression analysis the residuals also must be
independent of each other. There are several instances when this assumption has the potential to
be violated. One form of dependency in the residuals occurs when there is a systematic change
over time in the nature of the participants such as in a longitudinal study. Although this is not the
case of the current study because it is cross-sectional in nature (the data were collected only one
time), it would still be prudent to confirm this assumption. For our purposes, the best strategy for
detecting violations of this assumption is to examine a plot of the residuals vs. the case numbers.
If there is no detectable/discernible pattern, then there is a good indication that the residuals are
independent. To do so, we plot the residuals on the y axis and the case numbers on the x axis. As
shown in Figure 3, there appears to be no detectable/discernible pattern and therefore this
assumption is satisfied.
Figure 3. Scatter plot of residuals vs. case numbers (1–62).
Normality of residuals. To confirm the normality assumption, we examine a normal q-q
plot similar to what we have done in previous examples. For the current situation, though, the q-q plot involves the residuals. As shown in Figure 4, the points appear to fall along the line and
are contained within the 95% confidence band. This assumption also can be confirmed by
examining the corresponding Shapiro-Wilk Goodness of Fit test. The results of this test confirm
the residuals are from a normal distribution.
Figure 4. Normal q-q plot of the residuals.
These preliminary analyses confirm that the four principal assumptions of regression are
satisfied. As a result, we now run the regression analysis and review the primary results.
Run the analysis. Now that the assumptions have been met, we run the analysis and
review the findings. As part of our review we also will interpret the findings in the context of the
research setting.
The scatter plot. A copy of the scatter plot that corresponds to this analysis is shown in
Figure 5. Note how the scatter plot shows the dichotomy between Males (coded 0) and Females
(coded 1) relative to their salaries.
Figure 5. Results of regression analysis in which y = annual
salaries were regressed on x = sex (male or female).
The regression equation. The results of the analysis produced the following regression
equation: y = -3902.09x + 56515.06 with B = -3902.09, which is the slope, and B0 = 56515.06,
which is the y value of the y intercept. From a generic perspective, B is interpreted as follows: for every
one-unit change in x we can expect on average a B-unit change in y. Because x represents a
dichotomy with 0 = Males and 1 = Females, the only possible change in x is from 0 to 1.
This can be seen from Figure 5. Note how the slope of the line is negative: It decreases from
Males to Females, which indicates that the annual salary of males is higher than that of females.
This can be confirmed by simply substituting the coded values into the regression equation:
• When we substitute 0 (Males) for x into the equation, we get
  y = -3902.09(0) + 56515.06 = 0 + 56515.06 = 56515.06
  This result represents the predicted average salary for males.
• When we substitute 1 (Females) for x into the equation, we get
  y = -3902.09(1) + 56515.06 = -3902.09 + 56515.06 = 52612.97
  This result represents the predicted average salary for females.
Thus, based on these results, the mean salary for males is $56,515.06 and the mean
salary for females is $52,612.97. Note that the difference in these mean salaries is
$56,515.06 – $52,612.97 = $3,902.09
In absolute value, this difference is exactly what B is equal to in the regression equation (B is
negative because salaries drop from males to females). So when conducting a
regression analysis involving a dichotomy where one group is coded 1 and another group is
coded 0, B represents the difference in group means, and B0 represents the mean of the group that
was coded 0.
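The relationship between B, B0, and the group means can be checked numerically. The sketch below fits a least-squares line to a small set of hypothetical salaries (not the study data) and compares the slope and intercept with the group means. The helper name `fit_line` is an illustration, not part of any statistical package.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (slope B, intercept B0)."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return b, mean_y - b * mean_x

# Hypothetical salaries in U.S. dollars; males are coded 0 and females 1.
males = [58000, 55000, 61000]
females = [50000, 54000, 52000]
x = [0] * len(males) + [1] * len(females)
y = males + females

b, b0 = fit_line(x, y)
print("B  =", b, "| mean(females) - mean(males) =", mean(females) - mean(males))
print("B0 =", b0, "| mean(males) =", mean(males))

# The same arithmetic with the values reported in this example:
print(round(52612.97 - 56515.06, 2))
```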
The coefficient of determination. The coefficient of determination is r² = .04. This is
interpreted from an explained variance perspective as follows: 4% of the variance in salaries (y)
is explained by the variance in sex (x). Because the variance in sex is the difference between
males and females, we would conclude that 4% of the variance in salaries is being explained by
whether a pilot is male or female. This implies that 96% of the variance in salaries is being
explained by something else. We also can interpret this from a prediction perspective: If we
know the sex of a pilot (male or female), then we have 4% of the information needed to perfectly predict
his or her annual salary.
The result of the t test. The t test result is t(60) = -1.59, p = .1173. Because p is greater than α = .05, the
regression equation is not significant and the corresponding r² also is not significant.
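With a single predictor, the reported r² and t value are tied together by the identity r² = t² / (t² + df), where df = N − 2. The sketch below uses that identity as a consistency check on the figures reported in this example. The function name `r_squared_from_t` is hypothetical.

```python
def r_squared_from_t(t, df):
    """Bivariate regression identity: r^2 = t^2 / (t^2 + df), with df = N - 2."""
    return t * t / (t * t + df)

# Full sample: t(60) = -1.59
print(round(r_squared_from_t(-1.59, 60), 2))
# Sample without the two outliers: t(58) = -1.76
print(round(r_squared_from_t(-1.76, 58), 2))
```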
Standard error of estimate. Recall that the standard error of estimate is a metric that
provides an idea of how far “off” we will be on average in using x to predict y. Our statistical
software reports this error as the root mean square error. Based on the results, RMSE = 9586.92,
which indicates that if we use the sex of a pilot (male vs. female) to predict pilots’ annual salary,
we will be “off” on average by $9,586.92 or by approximately $9,600.
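For a bivariate regression, the standard error of estimate is the square root of the sum of squared residuals divided by N − 2. The sketch below computes it for a small set of hypothetical salaries (not the study data). The helper names `fit_line` and `standard_error_of_estimate` are illustrations only.

```python
import math

def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (slope B, intercept B0)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return b, mean_y - b * mean_x

def standard_error_of_estimate(x, y):
    """Root mean square error: sqrt(sum of squared residuals / (N - 2))."""
    b, b0 = fit_line(x, y)
    ss_res = sum((yi - (b * xi + b0)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss_res / (len(x) - 2))

# Hypothetical data: 0 = male, 1 = female; salaries in U.S. dollars.
x = [0, 0, 0, 1, 1, 1]
y = [58000, 55000, 61000, 50000, 54000, 52000]
see = standard_error_of_estimate(x, y)
print(round(see, 2))
```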
Outliers. Before we continue to Step 4 of hypothesis testing, it might be prudent to first
check for outliers. Although this is not an assumption of regression as it is with correlation, it is
still possible that outliers could have an impact on the results. Running an outlier analysis using
Jackknife distances, we discover two outliers: Case 48 (male pilot) and Case 60 (female pilot),
and both appear to have relatively high salaries compared to the rest of the sample. The results of
a regression analysis in the absence of these two outliers are as follows:

• y = -3944.10x + 55721.29
• r² = .05
• t(58) = -1.76, p = .0831
• RMSE = 8584.68
Note that although the results appear to be nearly the same as those with outliers present,
there is a slight improvement in the analysis without outliers: (a) There is a gain of one
additional percent in explained variance or prediction (5% vs. 4%); (b) There is a stronger t
value, and the corresponding p value is closer to the preset threshold for committing a Type I
error (.0831 vs. .1173); and (c) There is less error when using the regression equation to predict
salaries (RMSE = 8584.68 vs. 9586.92). Based on these findings, it appears the outliers are
having a slight impact on the findings in that they are suppressing (or masking) the relationship.
Because the two outliers represent only 3% of the sample, and because they are not having that
great of an impact on the results, we choose to keep the outliers to reflect a real-world situation.
Step 4: Make a decision: Either reject or fail to reject the null hypothesis. The
decision is fail to reject the null hypothesis and conclude that sex has no significant effect on the
annual salaries of full-time ATPs from Spirit Airlines.
Post-Data Analysis
We now determine and report the corresponding effect size, power, and 95% confidence
interval. We also discuss plausible explanations for the results.
What is the effect size? Recall that the effect size of regression is f², which is equal to the
ratio of r² and (1 – r²). Given r² = .04, f² = .04/.96 = .042, which is a small effect size.
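The effect size arithmetic is simple enough to verify directly. In this sketch the function name `effect_size_f2` is hypothetical; the input is the r² reported for this example.

```python
def effect_size_f2(r2):
    """Effect size for regression: f^2 = r^2 / (1 - r^2)."""
    return r2 / (1 - r2)

print(round(effect_size_f2(0.04), 3))
```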
What is the power of the study? To determine the actual power of the study we consult
G*Power and focus on the post hoc side by entering the parameters to reflect the actual results
from the study: Effect size f² = .042 and Total sample size = 62. When these changes are made,
the power of the study is .355, which indicates there is a 35.5% probability of detecting an
effect of this size if it truly exists in the population.
What is the corresponding confidence interval? The 95% CI reported in the output from
our statistical software is as follows: 95% CI = [-8420.99, 532.79], which tells us that 95% of the
time we can expect the true effect sex has on salaries to lie within the interval [-8420.99, 532.79].
In other words, 95% of the time we can expect female ATPs from Spirit Airlines to earn anywhere
from $8,420.99 less than their male counterparts to $532.79 more than their male counterparts.
Another way of looking at this is to say that if we were to randomly select 100 samples of size N =
62 from the same population, then in 95 of these samples the effect of sex on salaries would be
between -8420.99 and 532.79. Furthermore, because the null hypothesis states that the relationship
will be 0, and because 0 is within the 95% CI, we fail to reject the corresponding null hypothesis.
Finally, we would conclude that based on the width of the 95% CI, the corresponding accuracy in
parameter estimation (AIPE) is not that good (judgment call). In other words, the sample data are
not a good source for accurately predicting the true effect sex has on salaries.
What are some plausible explanations for the result? One plausible explanation for the
result of this study is that Spirit Airlines might have made a sincere effort to address the gender gap
among its ATPs. A second plausible explanation is a combination of sample size and sampling
strategy. Of the 62 participants, 27 were females, which represented approximately 44% of the
sample. According to the FAA, approximately 6% of the nation’s ATPs are female. Thus, if the
sampling strategy had been stratified random sampling instead of simple random sampling, and the
sample size was larger than 62, it is possible that the sex effect would have been significant. What
other plausible explanations for the results can you think of?
Week 12
Research Project 3
This week is dedicated exclusively to engaging in a research project that will enable you to apply
the various concepts you learned primarily from the previous 2 weeks, but also throughout the
entire course thus far. This research project uses selected variables from the certified flight
instructors (CFIs) study, which was examined in Research Project 1 (Week 4) and Research
Project 2 (Week 9). The variables that have been targeted for the current project are as follows:
X2 = Participants’ age in years
X6 = Total years participants have held a CFI certificate
X7 = Participants’ total hours dual given
X9 = Participants’ total flight time (in hours)
Y = Complacency scores, which were measured on a 7-item Likert scale ranging
from 1 = Strongly Disagree to 5 = Strongly Agree. Thus, scores could range from
7 to 35, with higher scores indicating a greater likelihood toward complacency as
a flight instructor.
The dataset is given in the Excel file “Week 12–Research Project 3 Data.” Also note that the
sample size is now N = 276 instead of the original N = 340.
1. Using your statistical software package, generate a correlation matrix that involves all five
variables. This matrix is essentially a table that contains the bivariate correlations of all the
variables. The table also will have 1s along its diagonal due to symmetry as follows:
        Y     X2    X6    X7    X9
Y       1
X2            1
X6                  1
X7                        1
X9                              1
2. Interpret each r value associated with y in the context of the given research setting. (Note:
From the correlation matrix table, the correlations will fall along the first column.)
3. For each r value in Question 2, interpret the corresponding r2 value from both an explained
variance perspective and a prediction perspective.
4. Of the four correlation coefficients between each IV and y, which factor has the strongest
relationship with y? What is a plausible explanation for this strong relationship?
5. Conduct an outlier analysis using Jackknife distances involving all the variables. In other
words, include all the variables (Y, X2, X6, X7, and X9) in the outlier analysis collectively
(there should be 23 outliers). Repeat Question 1 and compare the two correlation matrices.
What impact did the outliers have on the correlations with y?
6. Using the data set without outliers, conduct a hypothesis test involving the IV that has the
strongest relationship with y. You are to structure this test in exactly the same manner as
given in the guided example: pre-data analysis, data analysis, and post-data analysis. Provide
a summary report of your findings.
7. Extend Question 6 by conducting a hypothesis test for regression. You are to structure this
test in exactly the same manner as given in the guided example: pre-data analysis, data
analysis, and post-data analysis. Provide a summary report of your findings.
8. Answer the following two questions:
a. Does it make sense to develop a regression equation to predict a score on one variable
from a score on a second variable if the two variables are not correlated? Why or why
not? Give an example that would support a “yes” response and an example that would
support a “no” response.
b. Based on your response to Question 8a, to what extent would the prediction equation
from Question 7 be beneficial from a practical perspective?
9. Conduct four independent bivariate regression analyses, one for each IV.
a. Summarize the results in the following table. (The first row is completed for you.)
Factor                       Bi        B0      ti(251)   p       r²
X2 = Age                     -0.0247   17.15   -1.63     .1047   .01
X6 = Years Held CFI Cert.
X7 = Hours Dual Given
X9 = Total Flight Time
Note. N = 253. Y = Level of complacency.
b. Interpret each Bi value in the context of the research setting.
c. Of the four IVs, which has the strongest predictive power? Is this consistent with your
response to Question 4?
10. Let’s now apply this study to research fundamentals:
a. To what extent are the results from Questions 6, 7, and 9 generalizable from both
population and ecological external validity perspectives?
b. What are some limitations/delimitations of this study?
c. What would be an appropriate recommendation for future research?
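Question 1 asks for a correlation matrix of all five variables. As a rough guide to what the software is doing, the sketch below computes Pearson correlations for the first six cases of the Research Project 3 data in this post. The function names `pearson_r` and `correlation_matrix` are hypothetical, and six cases are far too few for interpretation; the full analysis should be run in SPSS on the complete dataset.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def correlation_matrix(columns):
    """Nested dict of bivariate correlations for every pair of variables."""
    names = list(columns)
    return {r: {c: pearson_r(columns[r], columns[c]) for c in names} for r in names}

# First six cases of the dataset (Y, X2, X6, X7, X9).
data = {
    "Y":  [21, 19, 18, 24, 13, 14],
    "X2": [50, 44, 28, 34, 36, 25],
    "X6": [30, 13, 6, 14, 15, 2.5],
    "X7": [2500, 2500, 1500, 2000, 1800, 1050],
    "X9": [12500, 3500, 1800, 2600, 2200, 1500],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(row[c], 2) for c in data])
```

The first column of the printed matrix holds the correlations with Y, which are the values Question 2 asks you to interpret.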
Y = Complacency, X2 = Age, X6 = Years Held CFI Cert., X7 = Hours Dual Given, X9 = Total Flight Time
21, 50, 30, 2500, 12500
19, 44, 13, 2500, 3500
18, 28, 6, 1500, 1800
24, 34, 14, 2000, 2600
13, 36, 15, 1800, 2200
14, 25, 2.5, 1050, 1500
18, 29, 2, 1075, 1075
13, 30, 7, 2500, 2900
19, 31, 7, 1300, 1600
18, 23, 1, 75, 350
12, 37, 18, 574, 4413
16, 38, 13, 2500, 3000
7, 65, 43, 4500, 6000
17, 68, 38, 2500, 4500
16, 66, 28, 5381, 8748
15, 61, 30, 3000, 7000
21, 65, 10, 1100, 2000
13, 72, 17, 8000, 12000
14, 66, 10, 1800, 3500
17, 34, 13, 1300, 4800
24, 30, 5, 700, 1350
17, 78, 50, 5000, 12500
15, 75, 54, 4500, 6500
20, 56, 15, 800, 2300
18, 55, 26, 12000, 15500
15, 62, 33, 1200, 3600
8, 55, 26, 6000, 8000
13, 77, 34, 250, 2500
16, 59, 20, 20, 5500
16, 68, 43, 4000, 15000
15, 64, 41, 4500, 9000
14, 69, 34, 1500, 4800
17, 65, 42, 2000, 14500
19, 70, 25, 4100, 5600
15, 68, 45, 10000, 35000
13, 69, 20, 6500, 8300
13, 59, 12, 3100, 4600
14, 70, 34, 12000, 14782
14, 70, 41, 2000, 3500
15, 65, 40, 4000, 7000
14, 43, 20, 5410, 7800
23, 59, 13, 4000, 5800
16, 76, 25, 5000, 7500
19, 68, 36, 3600, 8400
16, 63, 5, 410, 1500
19, 37, 10, 1500, 13000
14, 44, 21, 2400, 8800
18, 67, 8, 268, 4310
9, 55, 33, 3000, 12000
20, 60, 37, 2000, 5500
14, 68, 10, 50, 1200
19, 35, 15, 2500, 3900
9, 63, 30, 1000, 11500
12, 66, 10, 1500, 2700
12, 56, 25, 3000, 20000
8, 78, 59, 4000, 27000
14, 72, 50, 8000, 16000
15, 47, 29, 4000, 18800
17, 48, 10, 3000, 4000
15, 58, 25, 750, 1600
18, 72, 42, 3500, 5000
20, 28, 7, 2500, 3100
15, 58, 36, 3000, 20000
8, 71, 40, 1000, 15000
18, 73, 20, 3500, 6300
17, 48, 20, 2400, 3500
12, 35, 15, 1000, 4500
16, 71, 48, 4000, 6500
15, 78, 60, 4200, 8200
17, 58, 38, 5000, 24000
17, 61, 37, 1400, 10000
13, 67, 21, 2000, 4500
16, 61, 5, 500, 3800
8, 55, 25, 500, 1300
13, 64, 29, 7500, 8500
9, 80, 55, 5000, 12000
15, 74, 15, 1500, 3000
18, 85, 70, 3500, 7500
16, 76, 55, 8000, 13000
17, 61, 30, 1200, 4740
16, 72, 21, 5599, 10166
15, 60, 8, 820, 1800
29, 62, 15, 2800, 4000
18, 66, 43, 20000, 10000
21, 48, 7, 1500, 3000
19, 62, 20, 5300, 8780
17, 78, 22, 2500, 6700
19, 38, 4, 1400, 2200
15, 64, 9, 4000, 5000
16, 36, 9, 550, 1425
17, 56, 14, 375, 1600
17, 58, 17, 2000, 3100
18, 48, 3, 125, 650
20, 69, 46, 1400, 14600
16, 42, 20, 1100, 1900
16, 55, 1, 250, 900
22, 58, 35, 2000, 23700
13, 49, 15, 423, 1200
11, 66, 25, 700, 15000
15, 74, 20, 1165, 3538
18, 48, 25, 1000, 3000
21, 54, 9, 2000, 4400
16, 66, 45, 2000, 3000
17, 69, 21, 2700, 6850
10, 50, 30, 1000, 4000
13, 67, 48, 1210, 74
17, 52, 8, 500, 5000
10, 60, 27, 1100, 11000
18, 60, 12, 800, 2200
16, 69, 23, 2000, 2600
17, 31, 12, 4000, 5000
16, 32, 4, 400, 2900
14, 50, 24, 1555, 7300
16, 40, 17, 2000, 3000
20, 40, 1, 120, 750
19, 21, 1, 75, 400
8, 78, 50, 1000, 2000
13, 66, 4, 700, 1500
19, 67, 23, 2900, 5400
16, 69, 25, 10, 450
17, 65, 3, 800, 2800
16, 89, 60, 5477, 12212
18, 53, 26, 1000, 1500
14, 34, 1, 426, 1103
13, 29, 1.5, 400, 700
14, 68, 42, 5000, 13000
15, 58, 15, 2100, 4700
8, 73, 9, 350, 3000
17, 55, 5, 50, 640
16, 50, 12, 5000, 6000
17, 52, 8, 1000, 1450
24, 56, 6, 3, 1100
24, 72, 3, 500, 1700
18, 53, 20, 1465, 2350
16, 51, 15, 1000, 1700
13, 63, 39, 11000, 25500
12, 51, 13, 1300, 3400
14, 36, 5, 700, 1300
17, 79, 41, 10000, 11700
17, 67, 19, 1950, 3800
18, 62, 13, 2500, 4000
17, 71, 20, 2300, 5050
11, 51, 26, 9900, 10100
22, 58, 12, 1000, 1800
17, 61, 42, 5000, 20500
12, 51, 29, 600, 1800
17, 55, 25, 6000, 12000
14, 65, 30, 4000, 11000
18, 54, 21, 2100, 2800
14, 64, 26, 950, 15460
10, 24, 0.5, 1, 305
17, 71, 10, 100, 1500
18, 57, 9, 1600, 2500
15, 70, 25, 650, 3265
17, 70, 47, 3500, 12000
14, 83, 39, 5000, 8000
10, 35, 10, 2300, 4300
15, 66, 5, 1900, 4000
11, 57, 10, 200, 1850
13, 33, 11, 500, 1020
17, 24, 3, 650, 2000
18, 40, 1.5, 1300, 1650
14, 53, 13, 1900, 2900
25, 36, 15, 5000, 7000
14, 75, 53, 6900, 11000
29, 23, 1, 66, 510
15, 63, 33, 1500, 3000
14, 28, 9, 894, 5175
15, 42, 10, 1750, 4500
15, 44, 8, 3000, 4000
17, 72, 7, 625, 2100
17, 80, 32, 4800, 7100
11, 41, 1, 200, 650
14, 52, 30, 1200, 1800
18, 70, 15, 2500, 4700
16, 63, 30, 12000, 19000
13, 36, 4, 350, 900
16, 55, 25, 1800, 6000
15, 39, 1.5, 50, 1065
18, 27, 6, 600, 800
12, 57, 28, 2500, 8700
19, 49, 9, 1000, 2500
13, 60, 6, 1005, 3500
14, 62, 15, 500, 2200
22, 45, 10, 1000, 2000
16, 33, 3, 39, 670
15, 54, 5, 750, 4000
8, 67, 42, 5600, 7000
17, 39, 12, 560, 6300
12, 72, 39, 3000, 7400
20, 52, 4, 230, 1250
17, 73, 53, 3500, 10500
17, 62, 24, 140, 1400
16, 55, 5, 400, 1250
9, 67, 41, 6000, 8000
16, 60, 8, 900, 1500
14, 66, 5, 450, 3200
16, 69, 44, 2800, 4800
15, 46, 25, 1000, 12000
29, 42, 13, 2600, 8600
18, 53, 22, 2500, 4500
19, 47, 15, 2000, 4500
11, 35, 13, 1230, 4500
12, 67, 34, 4500, 11100
9, 70, 25, 6000, 14000
18, 64, 15, 1200, 1800
20, 64, 15, 800, 3000
16, 50, 6, 400, 1100
16, 35, 13, 4000, 6000
13, 41, 14, 1500, 3000
11, 86, 33, 19000, 23400
9, 67, 44, 7700, 24200
22, 54, 2, 347, 1005
21, 57, 35, 2200, 5000
15, 74, 73, 6995, 8816
12, 64, 42, 4000, 11000
16, 60, 30, 3100, 8700
18, 33, 6, 1800, 3500
13, 48, 1, 50, 1700
8, 31, 7, 5900, 6800
9, 34, 4, 500, 1050
11, 64, 38, 5000, 12000
8, 54, 27, 2400, 4000
16, 48, 10, 1850, 3500
19, 50, 5, 450, 1400
14, 69, 42, 3500, 9800
14, 64, 32, 3400, 4100
7, 63, 37, 1200, 3200
18, 56, 11, 860, 1385
14, 60, 40, 2500, 20000
11, 65, 41, 450, 22200
15, 52, 30, 4000, 7000
13, 51, 1, 45, 225
11, 63, 41, 2500, 7500
22, 71, 15, 2000, 2500
12, 61, 41, 5000, 25000
27, 59, 29, 1500, 4300
16, 38, 17, 3000, 12000
17, 57, 9, 250, 2800
16, 75, 47, 2800, 6600
19, 61, 33, 1000, 12000
15, 44, 22, 3700, 9700
15, 75, 41, 8000, 13000
19, 30, 2, 150, 650
20, 45, 10, 2000, 2500
15, 77, 13, 1800, 4000
17, 64, 8, 2000, 4000
19, 34, 5, 1000, 1600
24, 65, 35, 4024, 8860
19, 39, 2, 200, 800
21, 63, 42, 400, 8000
16, 56, 32, 500, 2200
16, 35, 12, 900, 2400
10, 66, 35, 4000, 6000
16, 27, 3, 900, 2100
22, 59, 25, 600, 5000
16, 60, 40, 1000, 4600
14, 67, 40, 9000, 11000
14, 51, 7, 450, 1450
16, 65, 36, 2000, 13000
15, 57, 4, 260, 750
10, 68, 12, 115, 1200
18, 68, 35, 5000, 22000
22, 77, 20, 75, 7900
12, 60, 37, 1523, 3195
16, 25, 6, 150, 775
11, 61, 15, 500, 1700
15, 44, 3, 100, 4500
20, 54, 35, 1100, 15000
18, 70, 38, 3000, 5000
23, 68, 4, 20, 2020
17, 57, 15, 600, 1200
10, 61, 21, 1100, 1800
16, 55, 35, 5000, 20000
16, 45, 25, 1700, 4500
16, 54, 4, 1450, 2000
Week 4
Research Project 1
This week is dedicated exclusively to engaging in a research project that will enable you
to apply the various concepts you learned from the previous 3 weeks. The Research Project 1
dataset, which is given as an Excel file, contains actual research data collected from a random
sample of N = 340 certified flight instructors (CFIs). The dataset consists of 10 independent
variables (IVs) and one dependent variable (DV). A description of each variable follows.
X1 = Participants’ gender
X2 = Participants’ age in years
X3 = Participants’ race/ethnicity
X4 = Participants’ marital status
X5 = Participants’ highest level of education
X6 = Total years participants have held a CFI certificate
X7 = Participants’ total hours dual given
X8 = Participants’ total hours dual given in previous 90 days
X9 = Participants’ total flight time (in hours)
X10 = Types of certificates participants currently hold
Y = Complacency scores, which were measured on a 7-item Likert scale ranging
from 1 = Strongly Disagree to 5 = Strongly Agree. Thus, scores could range from
7 to 35, with higher scores indicating a greater likelihood toward complacency as
a flight instructor.
1. Based on the IVs and DV, what would be an appropriate overall RQ for this study?
There are two RQs:
(a) What is the relationship between CFIs’ demographics (X1 through X5) and their level of
complacency?
(b) What is the relationship between CFIs’ professional experience/background (X6 through X10)
and their level of complacency?
2. What would be an appropriate research methodology/design for this study? Explain.
The research methodology is correlational. This is because we are dealing with one group (CFIs)
and multiple variables.
3. Using your statistical software package, find the appropriate measures of central tendency and
variability for Y and X1– X9. Complete a chart similar to the one below. The first row is
completed for you as a guide.
Factor                       Appropriate Measure  Appropriate Measure                           Reason               Actual Measure of
                             of Central Tendency  of Variability                                                     Central Tendency
X1                           Mode                 Not applicable                                Data are nominal     Male
X2 = age                     Median*              Standard Deviation and range (20 to 89)       Data are skewed      60
X3 = race/ethnicity          Mode                 Not applicable                                Data are nominal     Caucasian
X4 = marital status          Mode                 Not applicable                                Data are nominal     Married
X5 = Education               Mode                 Not applicable                                Data are nominal     4-year degree
X6 = years held CFI          Median*              Standard Deviation and range (0.5 to 73)      Data are skewed      20
X7 = hours dual given        Median*              Standard Deviation and range (1 to 20000)     Data are skewed      1850
X8 = hours dual past 90 days Median*              Standard Deviation and range (0 to 3000)      Data are skewed      20
X9 = total flight time       Median*              Standard Deviation and range (11.7 to 35000)  Data are skewed      6185
Y                            Mean                 Standard Deviation                            Data are continuous  15.8
*Although the median is appropriate because of the skewness, note that it also is appropriate
to report the mean. It also is important to report the range (low to high). Also note that the
skewness could be related to outliers. Nevertheless, for descriptive statistics, we keep the
outliers because they are part of the story we are telling.
4. Explain how you would determine the measure of central tendency for X10.
There is no direct measure of central tendency because participants simply listed the types of
certificates they held. One way to report a “central” metric would be to indicate the mean
number of certificates. For example, participants averaged 2.5 certificates. Another way
would be to simply report the mode, which would be the most frequently held certificate
(CFI). Other responses are acceptable as long as they are reasonable.
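Both approaches described above can be sketched in a few lines of Python. This is only an illustration: the five response rows are invented, and the function name certificate_summary is a hypothetical helper, not part of the assignment or of SPSS.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses: each participant lists the certificates held.
# These rows are invented for illustration; they are NOT from the CFI dataset.
responses = [
    ["CFI", "CFII"],
    ["CFI", "CFII", "MEI"],
    ["CFI"],
    ["CFI", "MEI", "ATP"],
    ["CFI", "CFII", "ATP"],
]

def certificate_summary(rows):
    """Return (mean number of certificates per participant, modal certificate)."""
    # "Central" metric 1: average count of certificates per participant.
    mean_count = mean(len(row) for row in rows)
    # "Central" metric 2: the single most frequently listed certificate.
    counts = Counter(cert for row in rows for cert in row)
    modal_cert, _ = counts.most_common(1)[0]
    return mean_count, modal_cert

mean_count, modal_cert = certificate_summary(responses)
print(f"Mean certificates held: {mean_count:.2f}")
print(f"Most frequently held certificate: {modal_cert}")
```

With real data, each row would hold one participant's X10 response split into individual certificate names.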
5. Prepare a descriptive statistics summary table for all the continuous variables. Use the table
below as a guide. (Round to two decimal places.)
Factor | N | M | SD | Range (L–H) | Shape of Distribution
X2 = age | 321 | 56.71 | 14.8 | 20–89 | Skewed left
X6 = years held CFI | 307 | 22.26 | 15.4 | 0.5–73 | Skewed right
X7 = hours dual given | 309 | 2559 | 2804.0 | 1–20000 | Skewed right
X8 = hours dual past 90 days | 318 | 53.29 | 238.4 | 0–3000 | Skewed right
X9 = total flight time | 306 | 6185 | 5836.0 | 11.7–35000 | Skewed right
Y | 337 | 15.8 | 3.8 | 7–29 | Symmetrical
6. Consider the outliers associated with X8. Determine which outliers you think are rare cases
and which you think might be contaminants. Explain.
There appear to be two outliers that are contaminants: Row 60: X8 = 3000 and Row 128: X8 =
2990. Given that X8 represents the total number of dual hours given in the past 90 days, it is
impossible for someone to accumulate 3000 hours of dual given in this time frame, because 90
days contain only 90 × 24 = 2,160 hours. All other hours listed appear to be reasonable. These
two cases definitely need to be reconciled or eliminated for inferential statistics.
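The logic for separating contaminants from rare cases can be expressed as a simple feasibility check. The sketch below assumes the only hard limit is the number of clock hours in 90 days. The values for rows 60 and 128 come from the answer above; the other two rows and the helper name classify_x8 are hypothetical.

```python
HOURS_PER_DAY = 24
WINDOW_DAYS = 90
# Absolute ceiling: nobody can log more hours than the window contains.
MAX_POSSIBLE_HOURS = HOURS_PER_DAY * WINDOW_DAYS  # 2160

def classify_x8(hours):
    """Label an X8 value 'contaminant' if it is physically impossible.

    'plausible' only means the value is not impossible; a large but
    possible value may still be a rare case worth inspecting.
    """
    if hours < 0 or hours > MAX_POSSIBLE_HOURS:
        return "contaminant"
    return "plausible"

# Rows 60 and 128 are from the handout; rows 12 and 200 are made up.
x8_by_row = {60: 3000, 128: 2990, 12: 45, 200: 310}
flagged = {row: h for row, h in x8_by_row.items()
           if classify_x8(h) == "contaminant"}
print(flagged)
```

The ceiling of 2,160 hours is deliberately generous. A stricter limit based on realistic working hours would be a judgment call and should be justified in your write-up.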
7. Based on the results for Y, how would you assess CFIs’ overall level of complacency?
As noted in the problem description, Y scores could range from 7 to 35 with higher scores
indicating a greater likelihood toward complacency as a flight instructor. The mean score was
M = 15.8, the median was Mdn = 16, the range was 7 to 29, and SD = 3.8. If we were to
consider the midrange of the possible scores (Note: The midrange is the sum of the low score
and the high score, divided by 2), we get 7 + 35 = 42, and 42 divided by 2 is equal to 21.
If we now compare the mean of M = 15.8 (or about 16) to the midrange of 21, the mean is below the
midrange, which indicates that overall complacency scores were low. Thus, for this sample of
CFIs, it appears that they have a relatively low level of complacency with respect to flight
instruction.
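The midrange comparison above is simple arithmetic, and the short sketch below reproduces it. The per-item average is an extra check added here for illustration; it assumes the 7 items are weighted equally, which the problem description implies but does not state.

```python
N_ITEMS = 7
ITEM_MIN, ITEM_MAX = 1, 5

def midrange(low, high):
    """Midpoint between the lowest and highest possible values."""
    return (low + high) / 2

# Possible total scores run from 7 (all 1s) to 35 (all 5s).
scale_midrange = midrange(N_ITEMS * ITEM_MIN, N_ITEMS * ITEM_MAX)

sample_mean = 15.8  # reported mean of Y
mean_is_below_midrange = sample_mean < scale_midrange

# Same comparison on the item scale: average response per item
# versus the neutral point of the 1-to-5 scale.
per_item_mean = sample_mean / N_ITEMS
item_neutral = midrange(ITEM_MIN, ITEM_MAX)

print(scale_midrange, mean_is_below_midrange)
print(round(per_item_mean, 2), item_neutral)
```

On the item scale, the average response falls below the neutral point of 3, which supports the same conclusion of relatively low complacency.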
8. Based on the results for X1 and X5, to what population do you think the results of this study
would be generalizable? Why?
Note that for gender, there were 301 male CFIs and 36 female CFIs. This makes us wonder if
gender is really a variable or a constant when 89% of the sample is male. For highest level of
education, it looks like 4-year (37.5%) and master’s degree (30.7%) are the two highest
categories. Thus, 68% (approximately two-thirds) of the sample’s highest level of education is
a 4-year degree or a master’s degree. Putting this all together, it would be reasonable to
conclude that the results of the study are generalizable to male CFIs who have at least a
4-year college degree.
9. Interpret in the context of the given research setting Q1, Q2, and Q3 for Y.
• Q1 = 14 (25% of the CFIs’ scores were lower than 14)
• Q2 = 16 (50% of the CFIs’ scores were lower than 16)
• Q3 = 18 (75% of the CFIs’ scores were lower than 18)
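Because outliers in this course are identified with boxplots, the quartiles above can also be turned into outlier fences. The following is a minimal sketch, assuming the common 1.5 × IQR rule. Statistical packages may compute quartiles or hinges slightly differently, so the exact cases flagged can vary. The function name boxplot_fences is hypothetical.

```python
def boxplot_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) using the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles for Y as reported above.
q1, q3 = 14, 18
lower_fence, upper_fence = boxplot_fences(q1, q3)
print(f"IQR = {q3 - q1}, fences = ({lower_fence}, {upper_fence})")
```

Under this rule, Y scores falling outside the two fences would be drawn as individual points on the boxplot rather than as part of the whiskers.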
10. What type of distribution do the complacency (Y) scores form? What statistical data do you
have to support your answer? If we were to assume that the complacency scores
approximate a normal distribution, what is the probability that a CFI selected at random will
have a complacency score of at least 21?
• Symmetrical (bell-shaped)
• The mean (M = 15.8) and the median (Mdn = 16.0) are nearly identical
• Round the mean to 16 and the standard deviation to 4. If x = 21, then the corresponding z
score is z = (21 – 16)/4 = 1.25. The area under the curve between z = 0 and z = 1.25 is .3944.
Therefore, the probability that a Y score is at least 21 is .5000 – .3944 = .1056. Thus, there is
a 10.56% likelihood of a CFI scoring at least 21 on the complacency instrument.
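The table lookup for this probability can be checked with the normal cumulative distribution function. The sketch below uses only the Python standard library and the rounded values from the answer (mean 16, standard deviation 4). The function name normal_upper_tail is a hypothetical helper.

```python
from math import erf, sqrt

def normal_upper_tail(x, mean, sd):
    """Return (z, P(X >= x)) for a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(z / sqrt(2)))
    return z, 1 - cdf

# Rounded values used in the answer: M = 16, SD = 4, cutoff x = 21.
z, p = normal_upper_tail(21, 16, 4)
print(f"z = {z:.2f}, P(Y >= 21) = {p:.4f}")
```

Using the unrounded mean and standard deviation (15.8 and 3.8) gives a somewhat smaller probability, so state which values you used when reporting the result.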
