INSTRUCTIONS
You are to test your hypotheses from Assignment #2.
You will produce a report written in APA format.
You must include:
An Overview of the study: the overall purpose of the study and your hypotheses (how did you come up with those hypotheses?)
The Data: a description of the data set
The Statistics: what statistics did you employ? You must present:
Descriptive Statistics: include the appropriate graphical displays for each of your three hypotheses and the relevant numbers
Inferential Statistics (t-test, Chi-Square): How did you test your hypotheses? Did you reject or fail to reject the null? How do you know? Include ONLY the relevant outcomes, meaning: if you can't explain what a statistic is, do not include it.
Outcomes: your major findings or trends found as a result of your analysis
Summary: a short, interpretative summary of your results and conclusions
Finally: in just a few sentences, what would you have done differently?
Running Head: Statistics 200
Statistics 200 Example: Descriptives, Correlation, Chi-Square
Discrimination within the Workplace
Discrimination has been an ongoing issue within an organization of 474 employees.
There is a 56.2% gap between the number of minority and majority employees in the workforce. Part one
of this study analyzed the relationship between minority status, employment category, and
current salary. There were two research hypotheses for part one of the study: H1 predicted a
positive correlation between employment category and current salary, and H1b predicted a
negative correlation between minority status and employment category. Part two of the
study investigated the claims of discrimination between minority groups and their employment
status. If the claims are true, the experimental hypothesis (H2) would show that there are
differences between the majority and minority groups in their employment status; otherwise,
we would fail to reject the null hypothesis (H0).
Method
Participants
Participants in this study were current employees of the organization. There was a total of
474 participants (258 males and 216 females) in the study (see Table 1), and out of the total sample
21.9% identified as being a member of a minority (see Table 2). Additionally, there were more
minority males (64) than minority females (40), and more majority males (194) than majority
females (176) (see Tables 1, 2, and 3).
Table 1: Gender

         Frequency   Percent   Valid Percent   Cumulative Percent
Female   216         45.6      45.6            45.6
Male     258         54.4      54.4            100.0
Total    474         100.0     100.0
Table 2: Minority Classification

         Frequency   Percent   Valid Percent   Cumulative Percent
No       370         78.1      78.1            78.1
Yes      104         21.9      21.9            100.0
Total    474         100.0     100.0

Table 3: Crosstab Count (Gender * Minority Classification)

          Minority Classification
Gender    No      Yes     Total
Female    176     40      216
Male      194     64      258
Total     370     104     474
Materials and Procedure
No materials were used beyond the organization’s internal data systems.
Data Collection Instruments
Internal Human Resources data were used for this study. Information regarding an employee's
sex, date of birth, education level, employment status, EEO-ADA information, current and past salary,
and retention information was collected and analyzed. Participants were not required to provide
additional consent for this study; per their contract with the organization, information collected during
employment could be used for federal, state, and local reporting, as well as policy and program evaluation
and changes.
Results
The data showed that H1 and H1b were supported for part one of the study. There was a
positive correlation between employment category and current salary (r = .780, N = 474, p < .0005,
one-tailed) and a negative correlation between minority status and employment category (r = -.144,
N = 474, p = .002, one-tailed) (see Table 4). As displayed in Table 5, the majority group makes up
significantly more of the organization's workforce (78.1%) than the minority group (21.9%). The
majority group also held more of the clerical (76.0%) and managerial (95.2%) positions than their
counterparts. There was only a slight difference between the two groups in custodial positions,
minority (48.1%) and majority (51.9%), which, given that minorities make up only 21.9% of the
workforce, indicates that minorities are more likely to be hired into lower-level positions than
mid- and high-level positions. This is borne out by the fact that only 24.0% of clerical roles
and 4.8% of managerial positions are held by the minority group. H2
was retained, as the data showed that there were more minority males than minority females in the
workforce overall (see Table 3). The relationship between minority classification and
employment category (clerical, custodial, and manager) was significant, χ2(2, N = 474) = 26.172,
p < .0005 (see Table 6).
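The chi-square statistic above can be reproduced by hand from the observed counts in Table 5. A minimal sketch in Python (standard library only; the counts are taken directly from the crosstabulation):

```python
# Chi-square test of independence, computed by hand from the observed
# counts in Table 5 (rows: majority/minority; columns: clerical,
# custodial, manager).
observed = [[276, 14, 80],   # Minority Classification = No
            [87, 13, 4]]     # Minority Classification = Yes

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n  # e.g., 370 * 363 / 474
        chi2 += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi2({df}, N = {n}) = {chi2:.3f}")  # chi2(2, N = 474) = 26.172
```

The same value (plus the p-value) comes from scipy.stats.chi2_contingency(observed), which is effectively what SPSS computes for Table 6.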
Table 4: Correlations

                                                Employment   Current   Minority
                                                Category     Salary    Classification
Employment Category       Pearson Correlation   1            .780**    -.144**
                          Sig. (2-tailed)                    .000      .002
                          N                     474          474       474
Current Salary            Pearson Correlation   .780**       1         -.177**
                          Sig. (2-tailed)       .000                   .000
                          N                     474          474       474
Minority Classification   Pearson Correlation   -.144**      -.177**   1
                          Sig. (2-tailed)       .002         .000
                          N                     474          474       474
**. Correlation is significant at the 0.01 level (2-tailed).
Table 5: Minority Classification * Employment Category Crosstabulation

                                               Employment Category
Minority Classification                   Clerical   Custodial   Manager   Total
No      Count                             276        14          80        370
        Expected Count                    283.4      21.1        65.6      370.0
        % within Minority Classification  74.6%      3.8%        21.6%     100.0%
        % within Employment Category      76.0%      51.9%       95.2%     78.1%
Yes     Count                             87         13          4         104
        Expected Count                    79.6       5.9         18.4      104.0
        % within Minority Classification  83.7%      12.5%       3.8%      100.0%
        % within Employment Category      24.0%      48.1%       4.8%      21.9%
Total   Count                             363        27          84        474
        Expected Count                    363.0      27.0        84.0      474.0
        % within Minority Classification  76.6%      5.7%        17.7%     100.0%
        % within Employment Category      100.0%     100.0%      100.0%    100.0%
Table 6: Chi-Square Tests

                               Value     df   Asymptotic Significance (2-sided)
Pearson Chi-Square             26.172a   2    .000
Likelihood Ratio               29.436    2    .000
Linear-by-Linear Association   9.778     1    .002
N of Valid Cases               474
a. 0 cells (0.0%) have expected count less than 5. The minimum
expected count is 5.92.
Table 7: Correlations

                                                Employment   Minority
                                                Category     Classification
Employment Category       Pearson Correlation   1            -.144**
                          Sig. (2-tailed)                    .002
                          N                     474          474
Minority Classification   Pearson Correlation   -.144**      1
                          Sig. (2-tailed)       .002
                          N                     474          474
**. Correlation is significant at the 0.01 level (2-tailed).
Conclusion
The results from part one of the study showed that H1 and H1b were retained. There
was an overall positive relationship between employment category and salary, and a negative
relationship between minority status and employment category. Clearly, more needs to be done
to increase diversity throughout the workforce. This would help to shrink the employment gap
within the clerical and managerial positions, and as more minorities rise through the ranks,
the salary gap between the minority and majority groups should begin to shrink.
Example Write-up Independent T-Test
An Advertising Agency is commissioned to create a TV advert to promote a new product. Since the product is designed
for men and women, the TV advert has to appeal to men and women equally. Before the company that commissioned
the Advertising Agency spends $250,000 across a number of TV networks, it wants to make sure that the TV advert
created by the Advertising Agency appeals equally to men and women. More specifically, the company wants to know
whether the way that men and women engage with the TV advert is the same. To achieve this, the TV advert is shown to
20 men and 20 women, who are then asked to fill in a questionnaire that measures their engagement with the
advertisement. The questionnaire provides an overall engagement score.
There were 20 male and 20 female participants. The advertisement was more engaging to male viewers (M = 5.56, SD =
0.29) than female viewers (M = 5.30, SD = 0.39).
There was homogeneity of variances for engagement scores for males and females, as assessed by Levene's test for
equality of variances (p = .174).
Mean difference between groups
To determine the mean difference between the two groups and provide a measure of the likely range (plausible values)
of this mean difference, you need to consult the last four columns of the Independent Samples Test, as highlighted
below:
Note: You established, in a previous section, that the mean engagement score for males (5.56 ± 0.29) was higher than
that for females (5.30 ± 0.39). The mean difference between these two group means is presented in the "Mean
Difference" column and has a value of 0.25900. This mean difference is calculated as the difference between the mean
engagement score for males and for females, i.e., 5.56 – 5.30 = 0.26 ≈ 0.25900 (to account for rounding errors). We can
now establish that the mean engagement score for males is 0.25900 higher than for females. However, we would also
like to report a measure of variability of the mean difference, which we can do with the standard error of the mean
difference (the "Std. Error Difference" column), which is 0.10954, or using 95% confidence intervals (the "Lower" and
"Upper" columns), which are 0.03726 to 0.48074. My preference is for reporting 95% confidence intervals, but either is
fine (unlike when reporting descriptive statistics, with sampling distributions it is fine to use the standard error; see
Altman & Bland, 2005). As before, the results in this example will be reported to 2 decimal places, but you should report
them to whatever level of precision is appropriate for your dependent variable.
You could report these results as follows:
Male mean engagement score was 0.26, 95% CI [0.04, 0.48], higher than female mean engagement score.
OR
Male mean engagement score was 0.26 (SE = 0.11) higher than female mean engagement score.
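These calculations can be reproduced from the summary statistics alone. A minimal Python sketch, using the rounded group statistics quoted above (M, SD, and n per group); because the inputs are rounded, the results match the SPSS output only to about two decimal places, and the t-table critical value for df = 38 is hard-coded rather than computed:

```python
# Mean difference, its standard error, and 95% CI from summary statistics.
import math

m_male, sd_male, n_male = 5.56, 0.29, 20
m_female, sd_female, n_female = 5.30, 0.39, 20

mean_diff = m_male - m_female
df = n_male + n_female - 2                       # 38

# Pooled variance (equal-variances t-test, justified by Levene's p = .174)
pooled_var = ((n_male - 1) * sd_male**2 + (n_female - 1) * sd_female**2) / df
se_diff = math.sqrt(pooled_var * (1 / n_male + 1 / n_female))

t_crit = 2.024                                   # t-table value, df = 38, alpha = .05 two-tailed
lower = mean_diff - t_crit * se_diff
upper = mean_diff + t_crit * se_diff
print(f"Mean difference = {mean_diff:.2f}, SE = {se_diff:.2f}, "
      f"95% CI [{lower:.2f}, {upper:.2f}]")
# Mean difference = 0.26, SE = 0.11, 95% CI [0.04, 0.48]
```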
Reporting statistical significance
To determine whether the independent-samples t-test is statistically significant (i.e., the mean difference is statistically
significant), you need to consult the middle portion of the Independent Samples Test table, as highlighted below:
You are presented with the observed t-value (the "t" column), the degrees of freedom (the "df" column), and the
statistical significance (p-value) (the "Sig. (2-tailed)" column). If p < .05, this means that the mean difference between
the two groups is statistically significant. Alternatively, if p > .05, you do not have a statistically significant mean
difference between the two groups. In this example, p = .023 (i.e. p < .05). Therefore, it can be concluded that males and
females have statistically significantly different mean engagement scores. Or, phrased another way, the mean difference
in engagement score between males and females is statistically significant. What this result means is that there is a 23 in
1,000 chance (2.3%) of getting a mean difference at least as large as the one obtained if the null hypothesis was true
(the null hypothesis stating that there is no difference between the group means). Remember, the independent-samples
t-test is testing whether the means are equal in the population.
Note: It is important to remember that the level of significance (p-value) does not indicate the strength or importance of
the mean difference between groups, only the likelihood of a mean difference as large as or larger than the one you
observed, given that the null hypothesis is true. For example, if this example had produced a p-value of .0115, this does
not mean the result is twice as 'strong' or 'important' as p = .023. In layman's terms, the p-value simply tells you how
likely it is that the difference between the two groups you studied is not a 'fluke', and that you really would expect to see
a difference like the one in your study in the population (not just in your sample). In that sense, a lower p-value simply
indicates how confident you can be that your result is a 'real' one. Whether that makes it important or large, it cannot
tell you.
Results: There was a statistically significant difference in mean engagement score between males and females, t(38) =
2.365, p = .023.
The breakdown of the last part (i.e., t(38) = 2.365, p = .023) is as follows:

Part   Value      Meaning                                                       Column in Table
1      t          Indicates that we are comparing to a t-distribution (t-test).
2      (38)       Indicates the degrees of freedom, which is N - 2.             df
3      2.365      Indicates the obtained value of the t-statistic.              t
4      p = .023   Indicates the probability of obtaining the observed t-value   Sig. (2-tailed)
                  if the null hypothesis is correct.
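As a quick sanity check, the observed t-value is just the mean difference divided by its standard error. Using the unrounded SPSS values quoted earlier in this example (0.25900 and 0.10954):

```python
# t = mean difference / standard error of the difference
mean_diff = 0.25900
se_diff = 0.10954
t_obs = mean_diff / se_diff
print(round(t_obs, 3))  # 2.364 -- SPSS reports 2.365 from fuller internal precision
```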
When reporting the statistical significance of the test in your report, you would normally include it directly after
reporting the mean difference and 95% confidence intervals. So, adding in this information in this example, you could
report the results as:
There was a statistically significant difference in engagement scores between males and females, with males scoring
higher than females, M = 0.26, 95% CI [0.04, 0.48], t(38) = 2.365, p = .023.
OR if you have chosen to use the standard error (of the mean difference), you could report the result as:
There was a statistically significant difference in engagement scores between males and females, with males scoring
higher than females, M = 0.26, SE = 0.11, t(38) = 2.365, p = .023.
Sometimes you might be asked to state the null and alternative hypotheses of the independent-samples t-test for your
data, and then to state whether you should reject the null hypothesis and accept the alternative hypothesis, or fail to
reject the null hypothesis and reject the alternative hypothesis. Notice that it is not possible to accept the null
hypothesis, so don't ever state this.
In this example, as the test result was statistically significant (p = .023), you could report the result as:
There was a statistically significant difference between means (p < .05), and therefore, we can reject the null hypothesis
and accept the alternative hypothesis.
Putting it all together, you can report the results as follows:
There were 20 male and 20 female participants. An independent-samples t-test was run to determine if there were
differences in engagement to an advertisement between males and females. There were no outliers in the data, as
assessed by inspection of a boxplot. Engagement scores for each level of gender were normally distributed, as assessed
by Shapiro-Wilk's test (p > .05), and there was homogeneity of variances, as assessed by Levene’s test for equality of
variances (p = .174). The advertisement was more engaging to male viewers (M = 5.56, SD = 0.29) than female viewers
(M = 5.30, SD = 0.39), a statistically significant difference, M = 0.26, 95% CI [0.04, 0.48], t(38) = 2.365, p = .023.
Interpretation? That’s up to you
Pearson’s Product-Moment Correlation
The bivariate Pearson correlation indicates two things: whether a statistically significant linear relationship exists
between two continuous variables, and the strength of that linear relationship (i.e., how close the relationship is to
being a perfectly straight line).
What is a bivariate relationship?
Bivariate correlation is a measure of the relationship between two variables; it measures the strength of
their relationship, which can range in absolute value from 0 to 1. The stronger the relationship, the closer the absolute value is to 1.
What is the Pearson’s r?
In statistics, the Pearson product-moment correlation coefficient (sometimes referred to as the PPMCC, PCC, or
Pearson’s r) is a measure of the linear correlation between two variables X and Y, giving a value between +1 and
−1 inclusive, where +1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation.
What is a Bivariate Analysis?
Bivariate analysis is one of the simplest forms of quantitative (statistical) analysis. It involves the analysis of two
variables (often denoted as X, Y), for the purpose of determining the empirical relationship between them.
What is a bivariate regression model?
Bivariate Regression. The simplest form of regression is bivariate regression, in which one variable is the outcome and
one is the predictor. Very little information can be extracted from this type of analysis.
What is the value of R in a scatter plot?
Pearson’s r can range from -1 to 1. An r of -1 indicates a perfect negative linear relationship between variables, an r of 0
indicates no linear relationship between variables, and an r of 1 indicates a perfect positive linear relationship between
variables. Figure 1 shows a scatter plot for which r = 1.
Why would you use a Pearson correlation?
A Pearson’s correlation is used when there are two quantitative variables. The possible research hypotheses are that
there is a positive linear relationship between the variables, a negative linear relationship between the variables, or no
linear relationship between the variables.
What is an example of bivariate data?
Univariate means “one variable” (one type of data); bivariate means “two variables.” Example of univariate data: Travel
Time (minutes): 15, 29, 8, 42, 35, 21, 18, 42, 26. The variable is Travel Time. The data become bivariate when each
travel time is paired with a second variable, such as the distance traveled.
What does the Pearson correlation coefficient show?
The Pearson correlation coefficient, r, can take a range of values from +1 to -1. A value of 0 indicates that there is no
association between the two variables. A value greater than 0 indicates a positive association; that is, as the value of one
variable increases, so does the value of the other variable.
What are the limits of the correlation coefficient?
Properties: Limit: coefficient values can range from +1 to -1, where +1 indicates a perfect positive relationship, -1
indicates a perfect negative relationship, and 0 indicates that no relationship exists. Pure number: it is independent of
the unit of measurement.
What is a strong correlation?
Here r = +1.0 describes a perfect positive correlation and r = -1.0 describes a perfect negative correlation. The closer
the coefficient is to +1.0 or -1.0, the greater the strength of the relationship between the variables.
What is correlation in data analysis?
Correlation is a technique for investigating the relationship between two quantitative, continuous variables, for
example, age and blood pressure. Pearson’s correlation coefficient (r) is a measure of the strength of the association
between the two variables.
Pearson Product-Moment Correlation
What does this test do?
The Pearson product-moment correlation coefficient (or Pearson correlation coefficient, for short) is a measure of the
strength of a linear association between two variables and is denoted by r. Basically, a Pearson product-moment
correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation
coefficient, r, indicates how far away all these data points are from this line of best fit (i.e., how well the data points fit this
new model/line of best fit).
What values can the Pearson correlation coefficient take?
The Pearson correlation coefficient, r, can take a range of values from +1 to -1. A value of 0 indicates that there is no
association between the two variables. A value greater than 0 indicates a positive association; that is, as the value of one
variable increases, so does the value of the other variable. A value less than 0 indicates a negative association; that is, as
the value of one variable increases, the value of the other variable decreases. This is shown in the diagram below:
How can we determine the strength of association based on the Pearson correlation coefficient?
The stronger the association of the two variables, the closer the Pearson correlation coefficient, r, will be to either +1 or
-1 depending on whether the relationship is positive or negative, respectively. Achieving a value of +1 or -1 means that
all your data points are included on the line of best fit – there are no data points that show any variation away from this
line. Values for r between +1 and -1 (for example, r = 0.8 or -0.4) indicate that there is variation around the line of best
fit. The closer the value of r is to 0, the greater the variation around the line of best fit. Different relationships and their
correlation coefficients are shown in the diagram below:
Are there guidelines to interpreting Pearson’s correlation coefficient?
The following are general guidelines.

Strength of Association   Coefficient, r (Positive)   Coefficient, r (Negative)
Small                     .1 to .3                    -0.1 to -0.3
Medium                    .3 to .5                    -0.3 to -0.5
Large                     .5 to 1.0                   -0.5 to -1.0
Remember that these values are guidelines and whether an association is strong or not will also depend on what you are
measuring.
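The guideline table above can be encoded as a small helper. This is only a sketch of the conventional cut-offs, not a hard rule, and the "negligible" label for |r| < .1 is an added convention, not part of the table:

```python
def correlation_strength(r: float) -> str:
    """Classify |r| using the conventional small/medium/large guidelines."""
    magnitude = abs(r)
    if magnitude >= 0.5:
        return "large"
    if magnitude >= 0.3:
        return "medium"
    if magnitude >= 0.1:
        return "small"
    return "negligible"  # below the smallest guideline cut-off

print(correlation_strength(-0.144))  # small
print(correlation_strength(0.780))   # large
```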
Can you use any type of variable for Pearson’s correlation coefficient?
No, the two variables have to be measured on either an interval or ratio scale. However, both variables do not need to
be measured on the same scale (e.g., one variable can be ratio and one can be interval).
Do the two variables have to be measured in the same units?
No, the two variables can be measured in entirely different units. For example, you could correlate a person’s age with
their blood sugar levels. Here, the units are completely different; age is measured in years and blood sugar level
measured in mmol/L (a measure of concentration). Indeed, the calculations for Pearson’s correlation coefficient were
designed such that the units of measurement do not affect the calculation. This allows the correlation coefficient to be
comparable and not influenced by the units of the variables used.
What about dependent and independent variables?
The Pearson product-moment correlation does not take into consideration whether a variable has been classified as a
dependent or independent variable. It treats all variables equally. For example, you might want to find out whether
basketball performance is correlated to a person’s height. You might, therefore, plot a graph of performance against
height and calculate the Pearson correlation coefficient. Let's say, for example, that r = .67. That is, as height increases so
does basketball performance. This makes sense. However, if we plotted the variables the other way around and wanted
to determine whether a person’s height was determined by their basketball performance (which makes no sense), we
would still get r = .67. This is because the Pearson correlation coefficient makes no account of any theory behind why
you chose the two variables to compare. This is illustrated below:
Does the Pearson correlation coefficient indicate the slope of the line?
It is important to realize that the Pearson correlation coefficient, r, does not represent the slope of the line of best fit.
Therefore, if you get a Pearson correlation coefficient of +1 this does not mean that for every unit increase in one
variable there is a unit increase in another. It simply means that there is no variation between the data points and the
line of best fit. This is illustrated below:
What assumptions does Pearson’s correlation make?
There are five assumptions with respect to Pearson’s correlation:
1. The variables must be either interval or ratio measurements.
2. The variables must be approximately normally distributed.
3. There is a linear relationship between the two variables.
4. Outliers are either kept to a minimum or are removed entirely.
5. There is homoscedasticity of the data.
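The definition of r (covariance of the two variables divided by the product of their standard deviations) can be computed directly. A minimal from-scratch sketch in Python; the x and y values below are invented purely for illustration:

```python
# Pearson's r from first principles on a small made-up data set.
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sum of cross-products of deviations, and deviation sums of squares
cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
ss_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))

r = cross / (ss_x * ss_y)
print(round(r, 3))  # 0.853 -- a strong positive linear relationship
```

In practice you would use scipy.stats.pearsonr(x, y), which also returns the p-value; the hand computation is shown here only to make the formula concrete.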
ANOVA
The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant differences
between the means of two or more independent (unrelated) groups (although you tend to only see it used when there
are a minimum of three, rather than two groups).
Example
A manager wants to raise the productivity at his company by increasing the speed at which his employees can use a
particular spreadsheet program. As he does not have the skills in-house, he employs an external agency that provides
training in this spreadsheet program. They offer 3 courses: a beginner, intermediate and advanced course. He is unsure
which course is necessary for the type of work they do at his company, so he sends 10 employees on the beginner
course, 10 on the intermediate and 10 on the advanced course. When they all return from the training, he gives them a
problem to solve using the spreadsheet program, and times how long it takes them to complete the problem. He then
compares the three courses (beginner, intermediate, advanced) to see if there are any differences in the average time it
took to complete the problem.
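The manager's comparison can be sketched as a hand-computed one-way ANOVA. The completion times below are invented for illustration (chosen so the group means match the descriptives reported in the Results; the SDs will differ from Table 1):

```python
# One-way ANOVA from scratch: partition total variation into between-groups
# and within-groups sums of squares, then form the F-ratio.
groups = {
    "beginner":     [27, 30, 25, 29, 26, 28, 24, 27, 31, 25],  # mean 27.2
    "intermediate": [24, 22, 25, 23, 21, 26, 24, 22, 25, 24],  # mean 23.6
    "advanced":     [24, 22, 25, 23, 26, 24, 23, 25, 22, 20],  # mean 23.4
}

all_times = [t for times in groups.values() for t in times]
grand_mean = sum(all_times) / len(all_times)

ss_between = sum(
    len(times) * (sum(times) / len(times) - grand_mean) ** 2
    for times in groups.values()
)
ss_within = sum(
    (t - sum(times) / len(times)) ** 2
    for times in groups.values()
    for t in times
)

df_between = len(groups) - 1              # k - 1 = 2
df_within = len(all_times) - len(groups)  # N - k = 27

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

scipy.stats.f_oneway(*groups.values()) returns the same F plus its p-value; as the text notes, a significant F says the group means differ somewhere, and post-hoc tests are needed to say where.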
Results
Descriptive Analysis Table
Table 1 displays the descriptive statistics for the beginner, intermediate, and advanced groups on the
dependent variable Time (M = 27.2 vs. M = 23.6 and M = 23.4, respectively). Group means suggest that differences of
interest exist between the beginner group and both the intermediate and advanced groups. Differences in SD and Std.
Error suggest that greater variability exists in the intermediate group (SD = 3.30, Std. Error = 1.04) and the advanced
group (SD = 3.24, Std. Error = 1.024) and less in the beginner group (SD = 3.05, Std. Error = .964). Differences in the
95% confidence interval for the mean exist as well (and you would cite those differences).
Table 1: Descriptive Results
ANOVA Table
Displayed are the results of the ANOVA analysis, which show whether there are any significant differences between
groups but do not identify where those differences exist. If the results are significant (p