Application Assignment #1
For this application assignment, we will use data from Glassdoor about employee compensation. The data set has
9 variables (columns) and 1000 employees (rows). The variables, in order, are: Job Title, Gender, Age, Latest
Performance Evaluation Score (1-5), Highest Education, Department, Seniority Level (1-5), Base Pay, and Bonus Pay.
Overarching question/task: Imagine you have been hired to explore whether this company’s dataset shows
evidence of a gender pay gap. Do not alter the data set in any way unless you can justify the exclusion of data
points due to them being outliers or errors (and I am not implying that these are in the data set).
Chapter 1
a) Which variables are categorical in the data set and what does that mean? Are there limits/downsides to this
and is there any way to change that?
b) Which variables are quantitative? Explain how you know that (or what that means).
c) Variables like Performance Evaluation and Seniority Level are evaluated from 1 to 5. Explain what level of
measurement this is and how that impacts how scores are interpreted.
d) One type of data, ordinal, is not present. Explain how we could take data such as Bonus and convert that to
ordinal data. Explain an upside and a downside to converting those data in that way.
e) We can use these data in a descriptive manner or in an inferential way. Explain what that means.
f) If we use these data to look for gender pay gap issues, are there data mining or ethical issues we need to
consider?
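For (d), the mechanics of turning a quantitative variable into ordered categories can be sketched as below. This is only an illustration in Python using the standard library; the assignment does not prescribe a tool. The bonus values, cut points, and labels are all hypothetical and are not taken from the data set.

```python
from bisect import bisect_right

# Hypothetical bonus values and cut points, for illustration only;
# they are NOT taken from the assignment's data set.
bonuses = [1200, 4500, 7800, 9900, 3100]
cut_points = [3000, 6000, 9000]                   # boundaries between bands
labels = ["Low", "Medium", "High", "Very high"]   # ordered categories

def to_ordinal(value):
    """Return the ordered category for value.

    A value exactly equal to a boundary falls in the higher band.
    """
    return labels[bisect_right(cut_points, value)]

bonus_levels = [to_ordinal(b) for b in bonuses]
print(bonus_levels)
```

The number of bands and the placement of the cut points are choices you would need to justify in your answer.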
Chapter 2
[You can use relative or percent frequency distributions for (g) and (h) too.]
g) For one interesting categorical variable (not gender because it only has 2 categories), create a frequency
distribution and graphical display that shows the distribution of employees across the categories.
h) Now repeat (g) with a quantitative variable.
i) Create cross-tabulations of average base pay and bonus by gender. You can do these together or in 2 tables.
j) Create graphical displays that show the average base pay and bonus for men and women.
k) In (j), we looked at financial compensation by sex/gender, but that doesn’t factor in the other aspects of
gender pay gaps. Imagine you are hired to create at least two graphical displays to show that there are gender
pay gaps across job titles, education level, departments, and/or levels of job performance or seniority.
Present those and write a brief explanation of what someone should see when viewing those. You get to
select the issues you want to explore – you don’t have to do all of them. Make sure you select images that
make really compelling cases for a gap OR no gap!
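As a rough illustration of (g) and (i), the sketch below builds a frequency and relative frequency distribution for one categorical variable and then averages base pay by gender. It uses only the Python standard library, and the five records are hypothetical stand-ins, not rows from the data set. Any spreadsheet or statistics package can produce the same tables.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records for illustration only: (department, gender, base pay).
records = [
    ("Sales", "Female", 50000),
    ("Sales", "Male", 54000),
    ("Engineering", "Female", 90000),
    ("Engineering", "Male", 94000),
    ("Engineering", "Male", 98000),
]

# Frequency and relative (percent) frequency of one categorical variable.
counts = Counter(dept for dept, _, _ in records)
total = sum(counts.values())
for dept, n in counts.most_common():
    print(f"{dept:12s} {n:3d} {n / total:6.1%}")

# Average base pay by gender: collect the values per group, then average.
pay_by_gender = defaultdict(list)
for _, gender, pay in records:
    pay_by_gender[gender].append(pay)
mean_pay = {gender: mean(values) for gender, values in pay_by_gender.items()}
print(mean_pay)
```

The same grouping step extends to (k) by keying on a pair such as (department, gender) instead of gender alone.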
Chapter 3
l) Capture the center/location and spread/variability of base pay for each gender/sex. This should include
measures that are standard, such as
1. mean, median, mode, and quartiles
2. standard deviation, range, interquartile range (IQR)
m) Display the distribution of base pay so that you can evaluate its shape.
n) Are there any outliers in the data? Explain on what you base this assessment.
o) If the shape of a distribution is skewed and there are outliers, explain how that would impact your responses
in (l) regarding measures of central tendency.
p) Calculate and display (boxplot) the five-number summary for bonus pay separately for each gender/sex.
q) Go back to your consideration of the data in (k), for whatever variables you selected there, create summary
statistics for each group within the variable and summarize what you find. For example, I might examine
gender pay disparities across seniority to see if the gap grows as seniority increases. Summary statistics might
show that the mean base pay for men and women is closer together at a seniority ranking of “1” but the
difference grows to be a bigger gap with each increase in seniority. The summary statistics (a mean and
median in this case) will help me see that in fine detail.
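The measures asked for in (l), the five-number summary in (p), and one common way of flagging outliers for (n) can be sketched as follows. This is a minimal Python example with hypothetical pay values, not values from the data set. The 1.5 x IQR fence is one widely used convention and is an assumption here, not a rule stated in the assignment. Quartile values also depend on the interpolation method, so other software may report slightly different numbers.

```python
from statistics import mean, median, multimode, quantiles, stdev

# Hypothetical base pay values for one group, for illustration only.
pay = [42000, 48000, 51000, 51000, 56000, 60000, 63000, 150000]

# Quartiles using the module's default ("exclusive") method.
q1, q2, q3 = quantiles(pay, n=4)
iqr = q3 - q1

summary = {
    "mean": mean(pay),
    "median": median(pay),
    "mode": multimode(pay),          # list, since there can be ties
    "sd": stdev(pay),                # sample standard deviation
    "range": max(pay) - min(pay),
    "iqr": iqr,
    "five_number": (min(pay), q1, q2, q3, max(pay)),
}

# Assumed convention: flag values more than 1.5 x IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in pay if x < lower_fence or x > upper_fence]

for name, value in summary.items():
    print(name, value)
print("outliers:", outliers)
```

Running the same steps once per gender, or once per gender within each seniority level, gives the group-by-group summaries described in (q).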
Final summary:
Given what you opted to examine here, what do you think about the gender pay issue for this company? Is there
an issue or not? If so, where might you suggest the company look to address the issue?