Post a cohesive response based on the scenario provided. To prepare for the discussion, read the Learning Resources and draw on your professional experience. Be sure to discuss the following (see the attachment for detailed instructions):
Workplace Evaluation and Testing
In the world of learning and performance, evaluation is the act of passing judgment on the value of a problem and its proposed solutions. Measurement is the act of gathering data and then using what is found out as a basis for decisions as to the worth of a problem and the value of a solution. Measures are the attributes that the people doing the evaluation pay attention to when making a judgment, such as customer service, timeliness, security, return on investment, and so on. Metrics are units of measurement such as how frequently a behavior occurs, how long before a behavior appears in seconds or hours, how many checks or levels of approval there are, and how much money is gained in hundreds or thousands of dollars. For example, if a client wants to measure customer service, the metric might be how frequently people exhibit the previously determined desired behaviors. If the measure is time, the metric may be years, days, or milliseconds, depending on the circumstances. Taken together measures and metrics are what people accept as evidence that there is a problem and that circumstances improved after a solution was imposed.
Moseley, J. L., & Dessinger, J. C. (Eds.). (2009). Handbook of improving performance in the workplace (Vol. 3). San Francisco: Pfeiffer.
To prepare for this Discussion, pay particular attention to the following Learning Resources:
· Review this week’s Learning Resources, especially:
· Read the Week 4 Lecture (see the Word document).
· Read Chapters 11 and 12 (see the Word document).
Assignment:
This week you learned about evaluation and testing. Some might feel that testing does not have a place in the organization, while others feel it is critical to an organization’s success.
· In a debate-like format, take a stance on organizational testing strategies.
· Explain how you feel about the need for testing within the workplace.
· Be sure to summarize the use of testing within your current work environment.
· 3 – 4 paragraphs
· No plagiarism
· APA citations
Evaluating Results and Benefits – Week #4 Lecture 1
Performance Improvement
Welcome to Week 4. We are officially halfway through the course. This week we will discuss the importance of performance improvement within the workplace. This is an essential topic when considering the success of an organization.
Performance improvement is defined as the measurement of the output of a business process. The process is then modified to increase the output and/or improve the efficiency or effectiveness of the process. Performance improvement can be applied at an individual level or at an organizational level, which makes it an effective tool for generating organizational success (Moseley & Dessinger, 2009).
Performance improvement is considered an organizational change in which management puts a program into place to measure the current level of performance throughout the organization. This allows management to develop ideas that can modify organizational behavior and infrastructure. The end goal is higher output, effectiveness, and efficiency. In addition, organizational efficacy may improve because the measurements can highlight goals and objectives that need attention.
In the workplace, human performance can often be improved by engaging employees in a rewarding experience. Rewarding employees can modify behavior and motivate them to become more productive. When employees are motivated, it is easier to direct them toward the organization’s goals, which ultimately leads to success.
Rewards do not always have to be monetary. Organizational or departmental competitions might be one way to motivate an employee. Time off, gift cards, and flex time are examples of non-cash rewards that might motivate an individual within the workplace. The goal is to connect the employees with the rewards as a means of being successful in performance improvement.
Return on Investment, or ROI, in training and development can be defined as a means of measuring the economic return generated by an investment in a training program. The returns are compared against the cost of the program to arrive at an annual rate of return on the investment. So, you might be wondering what this has to do with performance overall. The answer is simple: ROI is about judging the investment made in training and development. Customer complaints and returns can also be measured as part of ROI, which in the end gives a solid measure of the success of a program and/or product. If the program boosts the bottom line, you have a solid program. If not, it is time to reconsider, make changes, and move forward (Moseley & Dessinger, 2009).
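To make the arithmetic concrete, here is a minimal sketch in Python of one commonly used formulation, in which ROI is expressed as net program benefits divided by program costs. The figures below are hypothetical assumptions for illustration, not numbers from the lecture or the handbook.

```python
# Hypothetical figures for illustration only.
program_cost = 50_000.00          # design, delivery, participant time, materials
monetary_benefits = 80_000.00     # first-year gains attributed to the training
                                  # (e.g., fewer customer complaints and returns)

net_benefits = monetary_benefits - program_cost

# A common formulation: ROI (%) = net program benefits / program costs * 100
roi_percent = net_benefits / program_cost * 100

# The benefit-cost ratio is a related, sometimes-reported figure.
bcr = monetary_benefits / program_cost

print(f"Net benefits: ${net_benefits:,.2f}")   # $30,000.00
print(f"ROI: {roi_percent:.0f}%")              # 60%
print(f"Benefit-cost ratio: {bcr:.2f}")        # 1.60
```

With these assumed numbers the program returns 60 cents for every dollar invested beyond recovering its cost; a negative ROI would signal that it is time to reconsider the program.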
Now that you understand performance improvement and the importance of return on investment, it is important to discuss performance testing. After all, it works hand in hand with the above. Performance tests require an individual to perform a task while an evaluator observes. The performance test examines workplace processes to ensure accuracy, efficiency, and reliability. A performance test happens in real time and allows for immediate feedback from the evaluator. If a task is not working properly, the evaluation provides ideas for improvement. Once feedback is received, the task can be performed again with the improvements to determine efficiency, and the cycle continues until success occurs.
There are two steps in designing a performance test: design and development. Design synthesizes the analysis data and then specifies a solution; development builds several testing scenarios to determine the best output. Design is typically the first step, and development follows. This sequence gives you an opportunity for trial and error so that the performance testing conducted gives you the most bang for your buck. In the end, you will confirm that performance meets organizational and industry standards, creating a productive and profitable organization.
Resources:
Moseley, J. L., & Dessinger, J. C. (Eds.). (2009). Handbook of improving performance in the workplace (Vol. 3). San Francisco: Pfeiffer.
CHAPTER ELEVEN
Performance-Based Evaluation: Tools, Techniques, and Tips
Judith A. Hale
This chapter focuses on six rules and their associated tools, techniques, and tips for measuring the magnitude of problems and the effect of solutions so that the evaluations are more evidence-based, that is, they are based on actual observations or outcomes, not hypothetical events or hearsay. Collectively, the rules, tools, techniques, and tips are meant to support the evaluation of interventions or solutions designed to improve human performance. Their use increases the chances that evaluation is based on valid information that is useful to decision-makers. Rules are prescribed guides for what to do, when, and why. The rules begin with how to get agreement on what measures and metrics to use as the basis of the evaluation. They conclude with how to present findings to clients to facilitate understanding and decisions. Tools are instruments used in the execution of a task. They are a means to an end. Techniques are suggestions about how to carry out a task or make better use of a tool usually with the intent of saving time or reducing error. Tips are bits of expert advice intended to make the application of a rule or the use of a tool easier. Tools, techniques, and tips are meaningless without rules; likewise, rules without tools, techniques, and tips are difficult to apply.
THE RULES
The rules for evaluating needs and solutions based on facts or evidence are
1. Get sufficient clarity—Have clients explain what they perceive as a need or goal in detail. The factors and observations they are using as a basis for determining there is a problem are the same factors they will use to judge improvement or success. Clarity about the details facilitates gaining consensus about the need and the evidence.
2. Set a baseline—Set a baseline or describe the current state of affairs sufficiently so that improvement can be measured. Clients cannot determine whether circumstances have changed unless they have something against which to compare the new situation.
3. Leverage data already being collected—Leverage data the client already has to measure whether change is happening and the desired level of improvement occurred. This saves time, reduces the cost of evaluating, and increases the likelihood the evidence will be accepted.
4. Track leading indicators—Leading indicators are the presence of interim behaviors or results that predict results if they continue. When clients track leading indicators, they are in a better position to take corrective action in time to make a difference.
5. Analyze the data—Examine the data for patterns, frequency, and significance so they guide future decisions. The analysis should lead to insights and better understanding of the current situation and how much change has occurred.
6. Tell the story—Communicate the logic behind the decision and the evidence used to measure the effectiveness of the solution. This will facilitate commitment to the solution and meaningful dialogue about the need for any next steps to further support improvement.
The rules are somewhat linear or similar to a procedure; however, it helps to have a deeper understanding of some of the more common performance improvement measures and metrics to use them efficiently.
MEASURES, METRICS, AND EVIDENCE
In the world of learning and performance, evaluation is the act of passing judgment on the value of a problem and its proposed solutions. Measurement is the act of gathering data and then using what is found out as a basis for decisions as to the worth of a problem and the value of a solution. Measures are the attributes that the people doing the evaluation pay attention to when making a judgment, such as customer service, timeliness, security, return on investment, and so on. Metrics are units of measurement such as how frequently a behavior occurs, how long before a behavior appears in seconds or hours, how many checks or levels of approval there are, and how much money is gained in hundreds or thousands of dollars. For example, if a client wants to measure customer service, the metric might be how frequently people exhibit the previously determined desired behaviors. If the measure is time, the metric may be years, days, or milliseconds, depending on the circumstances. Taken together measures and metrics are what people accept as evidence that there is a problem and that circumstances improved after a solution was imposed.
1. Get Sufficient Clarity
The first rule is to get sufficient clarity as to what stakeholders are using as evidence that a need exists and what information they will accept as proof that performance improved. A desired by-product of getting clarity is consensus among stakeholders as to the importance of the need and what they will accept as evidence of improvement. Clients typically dictate solutions, such as training, coaching, new software, or a change in personnel to improve performance. They may assume the basis for the request is obvious and accepted by others. However, until the information on which they are making the request is explicit, it is difficult to determine whether there is agreement or whether there is sufficient evidence to warrant action. The best time to help clients articulate the basis for their request is at the time of the request. There are tools, techniques, and tips to help clients better articulate or express what they are using as evidence a need exists or what they will take as evidence that the situation improved as a result of some intervention.
Tool 1a: Getting Clarity. A simple but effective tool is shown in Table 11.1, Getting Clarity. It can be a spreadsheet or table that lists the problem and the evidence in different columns. Clients use it to capture what is known and what is suspected. The Issue column is where clients list the problem they are concerned about. The Evidence column is where clients note what information they are using as a basis for their conclusion that there is a problem and how pervasive it is. It helps clients connect the problem with the evidence. For example, the issue might be customer complaints, turnover of key personnel, or cost overruns. The questions then are about how clients know these are the issues. Tool 1a, Getting Clarity, as shown in Table 11.1, has examples in it. However, when using the tool, put only the information relevant to the situation in each column.
There are at least two ways to use Tool 1a: (1) ask questions and fill it out based on what is learned or (2) prepare it ahead of time using one’s best guess or past experience.
Technique and Tip 1. Ask Questions. A simple technique is to probe, simply asking for more information about the logic behind the request. For example, if clients were told it seemed they had given the situation a lot of thought and the goal was to not waste their time or misuse their resources, they may be more willing to openly discuss the basis on which they decided there was a problem. They may be more willing to share what led them to the conclusion that an action or a solution was needed. The intent is to get clients to explain what they have seen happening that convinced them that a solution is needed and what behaviors will convince them that the situation had improved. A tip is to position the questions asked of them as a desire to save time, avoid mistakes, and use resources wisely. Most people are willing to share their experiences and reasoning if the request is not experienced as a statement of doubt about how they made the decision but rather a genuine interest in better understanding the problem.
Technique and Tip 2. Come Prepared and Have an Organization Scheme. It is best to have measures and metrics already in mind before discussing a problem or a solution. This is easier to do when one has more experience with a client or a performance problem. The list of measures and metrics is used to facilitate a more robust conversation with clients. A technique that supports this tip is to develop an organizing scheme for measures that quickly presents a mental image or reference point about how to evaluate a need or a solution. Table 11.2, Function and Program Measures, presents one way of organizing measures. It separates measuring a function’s worth from measuring a solution’s worth. It also suggests measures that clients may already be thinking about but may not express.
Measures of Contribution. These measures are used to judge the overall degree to which the learning and performance function adds value to the organization. Examples of contribution might be
1. Alignment—The degree clients see the link between what actions are being proposed and their needs being met. The metric might be the number of programs explicitly tied to major initiatives.
2. Productivity—The degree clients see how much was delivered and how timely the work was done. Metrics might be the number of programs produced within a year and the lapse time in days or weeks between the request and the delivery.
3. Cost competitive—The degree clients see the use of cost competitive resources and their being used wisely. Metrics might be the number and cost of internal and external resources used to develop solutions.
4. Customer relations—The degree clients experience the learning and performance improvement function as easy to work with. Metrics might be the average rating of customers’ opinion on a survey and the number of anecdotes commending the function’s work.
Program Measures. These are the factors clients consider when judging the worth of specific products, programs, and services. They might include:
1. Satisfaction—How satisfied stakeholders are with the current state and how satisfied they are after implementing the solution. Metrics might be the average rating of opinions on a survey and the standard deviation (the amount of variance) among those opinions.
2. Learning—How proficient workers were before a solution was implemented compared to after it was implemented. Metrics might be pre- and post-test scores, how frequently completed work met standards, and how quickly tasks were done.
3. Transfer or behavior change—How many people’s behavior changed after the solution was implemented and how quickly did it change. Metrics might be the frequency of discrete behaviors and how many days it took for those behaviors to show up consistently.
4. Goal accomplishment—To what degree did the solution deliver on the promise? The metric depends on the goal. If the goal was increased sales, the metric might be the number of proposals accepted or the number of leads that converted to sales.
5. Time to proficiency—How long it takes to bring people to proficiency after the implementation of the solution compared to how long it took before. The metric might be the quantity of work performed within a given time frame, the accuracy of the work, or how quickly people can do the work to standard without supervision.
6. Cost of proficiency—What it costs in time and dollars to bring a workforce to proficiency and how much it would cost to increase the level of proficiency. The metrics might include the fee for external resources compared to the aggregate cost of using employees, such as salary, benefits, facilities, equipment, and so forth.
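As a rough illustration of the cost-of-proficiency measure above, the sketch below compares an external vendor’s fee against the aggregate internal cost of using employees, loaded for salary, benefits, and facilities. All figures are hypothetical assumptions, not data from the chapter.

```python
# Hypothetical comparison: cost to bring 20 new hires to proficiency using an
# external vendor versus internal staff time. Illustrative assumptions only.
learners = 20
training_days = 10

external_fee_per_learner = 2_500.00     # vendor's all-inclusive fee per learner

salary_per_learner_day = 320.00         # average daily salary per learner
benefits_rate = 0.30                    # benefits as a fraction of salary
facilities_per_learner_day = 40.00      # rooms, equipment, materials

internal_cost_per_learner = training_days * (
    salary_per_learner_day * (1 + benefits_rate) + facilities_per_learner_day
)

external_total = learners * external_fee_per_learner   # $50,000
internal_total = learners * internal_cost_per_learner  # $91,200

print(f"External option: ${external_total:,.0f}")
print(f"Internal option (salary + benefits + facilities): ${internal_total:,.0f}")
```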
Measures by Level. Table 11.3 is another example of how to organize issues, that is, at the workplace, work, or worker levels. The issues listed are examples.
Each identified issue then lends itself to questions about what the evidence is to determine whether there is a problem and what can be used to measure improvement. For all three levels, the measures and metrics might be the frequency of rework, misused resources, loss of talent, and the like. What may be different are the cause and the solution. When clients are given a menu of measures and metrics, they are in a better position to pick the ones that are most relevant, accessible, and would help them make better decisions. Having an organizing schema and using tools like that shown in Table 11.1 also allow clients to add metrics meaningful to their situation. In the process, it will become clear on what basis clients currently judge that there is a problem, the adequacy of the work done to address those problems, and the value of the solutions.
2. Set a Baseline
The baseline is simply the current state of affairs. Without this information there is little or no basis for determining whether circumstances improved as a result of an intervention or solution. The tool used to gain clarity (Table 11.1) can be expanded to record the baseline by simply adding another column, as shown in Table 11.4. The second column lists what is being used as evidence of a need, and the third column is where the baseline is recorded. Table 11.4 has examples of the type of information one might capture in the Getting Clarity Tool.
Technique and Tip 3. Do Not Be Afraid of Fuzzy Data; Instead Improve It. Sometimes, in the desire to be precise, people too easily reject or are suspicious of information about the current state of affairs because it is old or the client has doubts about its accuracy. A technique that helps is to take the data in whatever condition they are in and suggest using the solution as an opportunity to get better data. For example, when clients say “yes, but …” to the suggestion to use customer satisfaction survey results as a baseline, first discuss what other data might be available. In the retail industry, the number of returns and aging receivables might be used to augment customer satisfaction data. In the financial services industry, the number of referrals and renewals of contracts for services could augment customer satisfaction data. Next, acknowledge that the data may be incomplete, but offer that they still provide a baseline and that future measurement will produce better data. Finally, offer suggestions about how to get more accurate baseline data, such as leveraging data from other sources.
3. Leverage Data Already Being Collected
One of the frequently cited excuses for not evaluating program effectiveness is the argument that evaluation costs money and takes time. What people unfortunately conclude is that they lack the money and the time to measure change efficiently or cost-effectively. The argument presumes the measurement has to start from scratch or the beginning, so to speak. However, if clients leverage the measurement that is already occurring, they can save time and avoid unnecessary expenses.
Table 11.5 has examples of measurement activities commonly done. All of these measures could be leveraged to identify needs, set baselines, and measure improvement.
Table 11.5. Typical Ongoing Measurement
Annual employee morale survey usually done by human resources
Customer satisfaction survey done by marketing
Exit interviews done by human resources
Safety reports usually done by safety or quality control
Call center technical and customer support call sheets, usually done by the call centers themselves
Periodic compliance studies done by internal audit or quality assurance
Aging receivables report usually done by accounting, specifically accounts receivable
Sales logs with number of calls, who was called, usually done by sales staff or their managers
Technique and Tip 4. Assume the Data Already Exists. Most organizations collect an immense amount of data about their costs, operations, customer satisfaction, and the like. Therefore, the tip is to assume someone in the organization can already produce meaningful measures and metrics. The technique is to partner or collaborate with other departments that already capture different types of performance data. Going back to Tool 1a, the table has a column for current evidence. These are the data the organization is already getting. A question clients might be asked is, “How do they use these data to measure change or improvement?” A challenge might be getting access to the data. A tip is to offer to help the other departments get better data or help them argue their case for greater management support.
4. Track Leading Indicators
Leading indicators are data that predict success or failure. A common mistake is to wait until a lot of time has passed to determine whether circumstances improved. Waiting also results in lost opportunities to reinforce a solution or to take corrective action. Examples of leading indicators have been added to Tool 1a, Getting Clarity, in Table 11.6. Another column can be added, or the baseline data column can be replaced with one for suggested leading indicators. In most instances, the data sources for the leading indicators are the same as for the baseline, but the data are captured and reported more frequently.
In other instances, the information sources for the leading indicators are not the same as for the baseline, and, therefore, would have to be collected. Here are some examples:
If the goal is for employees to get more timely feedback on their performance on the premise that this will result in fewer grievances and improved efficiencies; and the solutions include asking supervisors to do more frequent performance reviews, to redesign the performance review form, to automate the process, and to train supervisors on how to use the form, the leading indicators might be
The number of supervisors asking for technical support to use the new system each month
The number of reviews posted on the automated system monthly
The number of employees reporting that they got reviews in the last thirty or sixty days
If the goal is to improve customer retention on the premise that it will increase profits or margins and cash flow because repeat customers require less technical support and are more likely to buy more product and buy it more quickly; and the solutions are to offer technical training to customers, to certify customers who complete the training, and to have account executives call customers more frequently, then the leading indicators might be
The number of customers requesting information about the training and certification
The number of customers participating in the training and applying for certification
The number of account executives calling key customers more frequently
The average sales cycle times of customers who are signed up for training and eventually certified compared to those who do not sign up for training
The frequency and time duration of technical support to clients who are participating in training and later certified compared to those who are not trained or certified
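The indicators in the examples above lend themselves to simple month-over-month tracking. The sketch below uses hypothetical counts and an illustrative growth threshold to tally a few leading indicators for the customer-retention example and flag any indicator whose adoption appears to have stalled, so corrective action can be taken early.

```python
# Hypothetical monthly counts for the customer-retention example above.
# Indicator names, counts, and the threshold are illustrative assumptions.
monthly_counts = {
    "customers requesting training info":        [14, 22, 31, 35],
    "customers enrolled / applying to certify":  [5, 9, 18, 24],
    "account execs calling key customers":       [10, 11, 11, 10],
}

MIN_GROWTH = 0.10  # flag indicators growing less than 10% month over month

for indicator, counts in monthly_counts.items():
    latest, previous = counts[-1], counts[-2]
    growth = (latest - previous) / previous
    status = "on track" if growth >= MIN_GROWTH else "stalled -- investigate"
    print(f"{indicator:45s} {previous:>3} -> {latest:>3}  ({growth:+.0%})  {status}")
```

Reporting a flag like “stalled -- investigate” within a month of the launch is exactly the kind of early corrective signal leading indicators are meant to provide.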
Technique and Tip 5. Think of Leading Indicators as Formative Evaluation or a Way to Measure Transfer. Typically, formative evaluation occurs before the launch of a product or program. It is done to confirm the usability of a solution and the accuracy of the information. However, formative evaluation can also be done after the launch to measure usability rates and the target audience’s initial perceptions. In this case what is being measured is the rate of transfer; the goal is to identify early what needs to be done to increase usage and overcome resistance. Unfortunately, organizations often invest in a program, launch it, and then believe the target audience will automatically use it or adopt the new desired behaviors without further intervention. However, if the target audience does not use the program or adopt the new behaviors in a timely fashion, the odds are they will not do it later. Therefore, formative evaluation that is done after the launch can increase the odds that a program will be successful. A technique is to more purposefully do post-launch formative evaluation, or measure transfer, and to decide ahead of time what indicators to use to measure acceptance and resistance. In this case, the indicators become leading indicators or predictors.
Technique and Tip 6. Use Self-Report and Let People Know It. Self-report is the process of asking a target audience to report on its own behavior. Should someone question the validity of self-report, there is some research that shows it is valid (Norwick, Choi, & Ben-Shachar, 2002). The technique is to survey the people whose behavior is expected to change, usually the target audience of the solution. For example, if the solution was for people to use a procedure, system, or performance support tool, simply ask them how frequently they are using it. A tip is to let people know in advance that they will be asked at some time in the future about their usage. Another tip is to be sure to get permission from the target audience’s supervisor to solicit their input and then be sure to tell them permission was granted so they know any future questions are legitimate. Tool 2a, as shown in Table 11.7, suggests a five-point scale for surveying a target audience and possible questions.
Technique and Tip 7. Poll Vested Parties to Confirm Self-Report Data. Should clients continue to doubt the self-report data, a technique is to confirm the results by polling others who have a vested interest in the adoption of the new behaviors, such as supervisors, team leads, or customers. However, let people know in advance that they will be asked for their observations, and remind them what the behaviors are that they should be looking for. Tool 2b, as shown in Table 11.8, suggests some questions and scales for polling vested parties to corroborate self-report data.
5. Analyze the Data
The value of data comes from the insights they provide. However, insights require some analysis, which usually means some mathematical and statistical manipulation. Analysis also requires skill in sampling, instrument design, and quantitative and qualitative analytics.
Technique and Tip 8. Get Help from the Experts. Experts can suggest the best ways to get data, how to design data instruments so they are effective, how to sample people or work products so clients can have confidence in the conclusions, and how to analyze the data. A tip is to form a relationship with staff in the statistics department at a local university. Such faculty and graduate students have access to and experience in using data analysis software and, therefore, can save time and increase clients’ confidence in the results.
Technique and Tip 9. Understand the Different Types of Data. In the world of learning and performance, data come in many forms; therefore, it is important to understand the differences when starting to interpret the data. Data are usually classified as:
Hard data or data that can be independently verified, whether they are facts or opinions, or expressed in numbers or words.
Soft data or data that cannot or are not independently verified, whether they are in the form of numbers or words.
Quantitative data are numerical data (counts, percentages, weights, degrees, and so forth) and, therefore, lend themselves to some mathematical or statistical manipulation. They are used to predict outcomes or show relationships between variables. Quantitative data may be hard or soft, depending on whether or not they were independently verified. They are usually collected through the use of tests and measurement instruments such as weight scales, temperature probes, rulers, and the like.
Qualitative data are data that reflect opinions and may be expressed in words or numbers, as in the example, “On a scale of 1 to 5 with 1 meaning strongly disagree and 5 meaning strongly agree, how much do you agree or not with the statement X?” When they are expressed as numbers, they can be mathematically manipulated, but the data are still qualitative because they are measures of opinions. Also, when opinions are expressed as words or phrases, the frequency of specific words or phrases can be counted and manipulated mathematically. Qualitative data may be hard or soft, depending on whether or not they were independently verified. Qualitative data are usually collected through interviews, observations, document checks, surveys, in-basket exercises, and the like.
Here are two examples:
Self-report data using a five-point scale are soft (not verified), qualitative data (opinions even though the opinions have a numerical rating and can be averaged). Because a scale is used, the opinions can be mathematically manipulated. The ratings can be added and an average calculated along with percentages. Even the standard deviation of the cumulative scores can be computed. However, if data from another independent source are added, such as a survey sent to bosses, audit reports, or marketing studies, the data become hard data because now the results can be validated.
Results from a focus group session are soft qualitative data; however, adding corroborative data from another source such as exit interviews or surveys makes the data hard qualitative data.
The tip is to use data gathering methods that get the required data and to find ways to independently validate the initial findings. For example, test data that measure how much people know about how to do a task (quantitative data) might be validated by examining data from time sheets or production reports (measures of how much they did), quality assurance (measures of how well they did it), or direct observation of their performing the task in the work setting (measures of how they did the task). Test data alone are not a valid measure of whether or not people can do a task. A technique is to go back to Tool 1, designed to help get clarity, and add yet another column for how you will validate the data you gather when measuring the effectiveness of your solutions, as shown in Table 11.9, Getting Clarity Expanded More.
Technique and Tip 10. Understand What Descriptive and Inferential Statistics Do and When Each Is Used. The tip is to understand the more common statistical methods and what questions each is intended to answer or to measure. There are two types of statistics, descriptive and inferential.
Descriptive Statistics. Descriptive statistics are used to evaluate data derived from interviews, surveys, focus groups, document checks, observations, time studies, and the like. They are also used to analyze test data by calculating how many people received what score, what the average score was, and how many scored significantly higher or lower than the majority of the group. Commonly used descriptive statistics are
Frequency count—Answers the question of how often a value occurred, such as how many people took a test, how many passed the test, how many scored 70 or above, how many used the sales job aid, and so forth.
Percentage—Answers the question of how many compared to the whole group did each of these activities, such as what portion of the group could have taken the test, passed the test, and used the job aid.
Averages—Answer the question about what is typical. There are three ways to calculate the average. One is to calculate the mean, or the arithmetic average. Mode is the most frequent value, and median is the point at which there are an equal number of values above it as there are below it. Collectively, the mean, mode, and median help clients determine what is typical or common. For example, averages answer questions like, “Overall, how well did people do on the test?” There are three ways to answer this question: (1) the mean test score was X; (2) the most frequently earned test score was Y; and (3) half of the people scored above Z and half of the group scored below Z.
Standard deviation—Answers the question about what is not typical because a value is too far above or too far below the mean. Values that are one or more standard deviations above or below the mean are considered significant.
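The descriptive statistics above can be computed with Python’s standard library alone. The sketch below uses a small set of made-up test scores (an illustrative assumption, not data from the chapter) to show a frequency count, a percentage, the three averages, and the standard deviation.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Hypothetical post-test scores for 12 learners (illustrative data only).
scores = [55, 62, 70, 70, 75, 78, 80, 80, 80, 85, 90, 95]
passing = 70

# Frequency count: how many took the test and how many passed.
n_taken = len(scores)                              # 12
n_passed = sum(1 for s in scores if s >= passing)  # 10
score_frequencies = Counter(scores)                # e.g., a score of 80 occurred 3 times

# Percentage: how many passed compared to the whole group.
pct_passed = n_passed / n_taken * 100              # ~83%

# Averages: mean (arithmetic average), mode (most frequent), median (middle value).
avg = mean(scores)                                 # ~76.7
print(f"mean={avg:.1f}  mode={mode(scores)}  median={median(scores)}")

# Standard deviation: how far scores typically fall from the mean; values more
# than one standard deviation away are the atypical ones worth a closer look.
sd = stdev(scores)
atypical = [s for s in scores if abs(s - avg) > sd]
print(f"passed={pct_passed:.0f}%  stdev={sd:.1f}  beyond one stdev: {atypical}")
```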
Inferential Statistics. Inferential statistics are used to determine whether there is a relationship between two variables, whether from two groups or different data from the same group. Inferential statistics help compare people’s performance before the intervention and their performance afterward, match people’s actual performance to what was expected of them, and answer the question of whether changing something produced a change in something else. Inferential statistics require the use of formulas and statistical tables to interpret the results. There are many different types of inferential statistical formulae. Here are three commonly used ones:
Correlation coefficient (Spearman rho)—Answers the question of whether there is a relationship between two sets of data. For example, it answers the question of whether or not success on a test influences people’s tenure on the job, or the opposite question: Did people’s tenure on the job influence their performance on a test?
Chi-square goodness of fit—Answers the question of whether or not a value is different from what was expected. For example, it answers the question of whether people’s performance on a test was what bosses expected.
Two-tailed t tests—Commonly used to determine whether pre- and post-tests scores are significantly different.
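A minimal sketch of those three tests using SciPy follows; the data, variable names, and counts are all hypothetical assumptions, and the point is only to show which routine answers which question.

```python
from scipy import stats

# Hypothetical data: 8 employees' test scores, their job tenure in months,
# and pre/post scores around a training program. Illustrative only.
test_scores   = [62, 70, 74, 78, 81, 85, 88, 93]
tenure_months = [ 5, 12,  9, 20, 18, 26, 24, 31]

# Spearman rank correlation: is there a relationship between scores and tenure?
rho, p_rho = stats.spearmanr(test_scores, tenure_months)

# Chi-square goodness of fit: did pass/fail counts differ from what was expected?
observed = [34, 6]        # observed pass / fail
expected = [36, 4]        # expected pass / fail
chi2, p_chi2 = stats.chisquare(f_obs=observed, f_exp=expected)

# Paired (two-tailed) t test: are pre- and post-test scores significantly different?
pre  = [60, 55, 72, 68, 75, 64, 70, 66]
post = [74, 70, 80, 75, 83, 72, 79, 73]
t, p_t = stats.ttest_rel(pre, post)

print(f"Spearman rho={rho:.2f} (p={p_rho:.3f})")
print(f"Chi-square={chi2:.2f} (p={p_chi2:.3f})")
print(f"Paired t={t:.2f} (p={p_t:.4f})")
```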
Working with an expert is especially helpful when deciding how to analyze data, what methods to use, and what conclusions can be safely drawn from the results.
Technique and Tip 11. Do Not Confuse Correlation with Causation. Correlation measures the relationship between two factors, such as test scores and job retention. It answers the question about the degree to which a factor is influenced by a change in another factor. The influence may be positive in that, when there is a change in one factor’s value, the other factor’s value changes in the same direction, that is, they both increase or decrease. For example, assume you want to know whether people with higher test scores stay in a job longer than those with lower test scores. The correlation is positive if those with higher test scores do stay in the job longer. A negative correlation means that, as the value of the first factor increases, the value of the second factor decreases, that is, they go in opposite directions. For example, people with higher test scores seem to spend less time in a job. If there is no correlation, then there is no relationship between test scores and job tenure.
Causation measures the degree to which one factor causes a change in another factor. In this case, a goal may be to determine whether higher test scores (seemingly being smarter) cause people to stay in a job longer. Causation is much more difficult to prove because the other factors that may contribute to tenure have to be ruled out or eliminated. In the case of job tenure, it is important to isolate test scores from age, proximity to work, the relationship with the boss or co-workers, salary, the presence of other employment opportunities, and so on. Because of the need to isolate the one variable being measured, tests of causation require controlling the other variables, which is why they are called “controlled studies.” These studies select two identical groups, do something to one group (such as train or offer incentives), and see whether something happens to that group compared to the other. For example, to determine whether offering tuition reimbursement reduced turnover, it would be offered to one group of employees but not to another group, and then we would see whether there was any effect on retention. However, it would be essential that the two groups be identical on other factors, such as education, job experience, and age, and that they work with bosses with identical characteristics and management styles, and so on. Therefore, the tip is to do correlations unless there is the luxury of doing a controlled group study and sufficient time to determine whether the assumptions about how performance will be affected come true.
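Continuing the tuition-reimbursement example, here is a minimal sketch of the controlled-study logic with entirely hypothetical counts: retention is compared between a group offered reimbursement and a matched control group, which is what moves the claim from correlation toward causation.

```python
from scipy.stats import chi2_contingency

# Hypothetical controlled comparison. Two groups of 50 employees, matched on
# education, job experience, age, and manager; only one group is offered
# tuition reimbursement. All counts are illustrative assumptions.
offered     = {"stayed": 46, "left": 4}
not_offered = {"stayed": 39, "left": 11}

retention_offered = offered["stayed"] / sum(offered.values())          # 92%
retention_control = not_offered["stayed"] / sum(not_offered.values())  # 78%
print(f"Retention with offer: {retention_offered:.0%}, control: {retention_control:.0%}")

# Because the groups were matched, a chi-square test of independence asks
# whether the retention gap is larger than chance alone would explain.
table = [[offered["stayed"], offered["left"]],
         [not_offered["stayed"], not_offered["left"]]]
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square={chi2:.2f}, p={p_value:.3f}")
```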
Technique and Tip 12. Use Tools and Guidelines Already Available. A tip is to make use of the proven tools and guidelines developed by others. For example, the book Performance-Based Evaluation (Hale, 2002) comes with a disk that has forty-four tools, including ones for developing surveys, designing and conducting interviews, conducting focus groups, and doing observation. It also has guidelines on how to analyze the results from interviews, surveys, focus groups, and observations. The book The Value of Learning (Phillips & Phillips, 2007) is full of guidelines on how to get and interpret data. This handbook and the references in this and the other chapters also provide a list of proven resources that can help save time and money. It is smart to leverage the work of others.
6. Tell the Story
It is not enough to gather and analyze data. It is important to communicate what was done, what data were used, and what changes were tracked. Communicating helps keep clients engaged and may prevent the misuse of the data or the propensity to draw the wrong conclusions. Telling the story goes beyond reporting the results or comparing the before and after states. It means presenting information in ways that help clients see relationships and implications.
Technique and Tip 14. Use Pictures or Graphics. Once data are analyzed, the numbers may be sufficient by themselves to tell the story. However, clients may not be willing to spend the time reading the detail. A technique is to develop an illustration that reflects the whole story. The illustration can take many different forms. For example, it may be a chart, graph, diagram, or another form. The goal is to communicate what was done, the agreements made, and the results achieved in one picture. One side benefit of simply telling the story is a more productive conversation about what was accomplished and what must still be done to fully appreciate the benefits of the solution. Figure 11.1 shows Tool 3. Tell the Story. Tool 3, like Tool 1, is meant to be modified or adapted to each situation. For example, it is missing a timeline, which can be easily added when illustrating a specific story.
Figure 11.1. Tool 3. Tell the Story.
Example: Reducing the Cost of Sales. Assume a consultant, as part of a team, is asked to help reduce the cost of sales. The cost of sales is mostly driven by the time and related travel expenses invested by the salesperson in prospecting (repeatedly calling the prospect to negotiate a convenient time to meet and finally meeting with the prospect to get enough information to develop a proposal), then developing and delivering the proposal, conducting follow-up calls and meetings, and so on. The more time the salesperson spends on getting the contract, the more costly the sale. The costs are even higher when the time invested does not result in a sale, since this cost of time has to be spread across successful sales. Currently 40 percent of the proposals are unsuccessful, meaning they do not result in a sale. A deeper examination of the problem showed that one factor that contributed to the failure to get the contract was that salespersons do not appropriately qualify the prospects; instead they engage in the full sales process independent of whether the prospect is potentially a viable customer. It is also learned that salespersons are rewarded for the number of prospects they have in their sales pipelines, not for doing a good job of screening out prospects less likely to buy.
The team recommended that the company change the way it measures and rewards the sales staff and adopt a more rigorous process for qualifying prospects. The sales staff was trained on the new qualifying process. Sales management endorsed the new qualifying process and agreed to reward sales staff for using the process. Sales management tracked and reported monthly the number of pre-qualified prospects in the pipeline compared to the number not pre-qualified and the percentage that resulted in sales. Finance compared and reported last year’s aggregate sales call expenses with the coming year’s expenses on a monthly basis. Everyone agreed the leading indicators were how frequently the new process was used and the number of qualified prospects in the pipeline. Finance also agreed to correlate the number of salespeople trained in the qualifying process with sales and the cost of sales. At the end of the year, 80 percent of the leads had been qualified using the new process, the share of unsuccessful proposals had dropped to 12 percent, and the cost of sales had dropped by almost 25 percent.
Figure 11.2 helps tell the story.
Figure 11.2. Reducing Cost of Sales.
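The arithmetic behind the story can be sketched as follows. The win rates (60 percent before, 88 percent after) come from the example; the per-proposal cost is a hypothetical assumption. The mechanism is that the cost of every pursued proposal, won or lost, must be recovered from the proposals that are won.

```python
# Hypothetical cost model for the example above. The win rates come from the
# story; the per-proposal cost is an illustrative assumption.
cost_per_pursued_proposal = 8_000.00   # prospecting, travel, proposal, follow-up

def cost_per_sale(win_rate: float) -> float:
    # Spread the cost of all pursued proposals across the ones that are won.
    return cost_per_pursued_proposal / win_rate

before = cost_per_sale(win_rate=0.60)   # 40% of proposals were unsuccessful
after  = cost_per_sale(win_rate=0.88)   # unsuccessful proposals fell to 12%

reduction = (before - after) / before
print(f"Cost per sale before: ${before:,.0f}")   # ~$13,333
print(f"Cost per sale after:  ${after:,.0f}")    # ~$9,091
print(f"Reduction: {reduction:.0%}")             # ~32% with these assumptions
```

With these assumptions the reduction comes out near 32 percent; the roughly 25 percent figure in the story presumably also reflects the cost of the qualifying work itself and changes in how many prospects were pursued.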
Example: Shorten the Time to Proficiency. Assume a team is commissioned to shorten the time it takes to bring new hires to full proficiency, meaning they can independently perform all of the tasks to standard measured in terms of accuracy, completeness, and safety. Currently new hires complete a 9-week training program; however, according to their managers it takes an additional five months of on-the-job experience before they are considered “proficient.” Nine weeks plus five months were accepted as the baseline. The team recommended that electronic performance support tools be created to reinforce the training and support new hires when they are doing the more complex tasks. The team also recommended that new hires be told what criteria will be used to judge their proficiency. Information about the evidence of proficiency and the use of the performance support tools was then incorporated into the training. Once the support tools were developed, new hires reported weekly on how frequently they used them and which ones they used; this also reinforced the use of the tools. Managers were asked monthly to rate the new hires’ level of proficiency. Asking managers to rate the new hires on a monthly basis also focused the managers’ attention on the behaviors and results they had agreed were evidence of proficiency; this also reinforced the desired behaviors. Usage and managers’ ratings were accepted as leading indicators in that the data were reported in monthly management meetings to sustain interest in the initiative and support for the tools. To determine if the new training format with the added performance support tools was effective, the time to proficiency of new hires trained in the new format was compared to that of new hires trained using the old format. The time to proficiency went from five months to six weeks.
Figures 11.3 and 11.4 were used to tell the story.
Technique and Tip 15. Remember That Two Heads Are Better Than One. Consolidating information into one or two graphics helps to focus on the important metrics, the assumptions about what behaviors will more likely lead to improved results, and what clients are willing to accept as evidence of improvement. It also is more respectful of clients’ time. A tip is to collaborate with clients and colleagues when deciding how to best tell the story. Another tip is to create a draft using Tool 3, for example, and let clients modify it. The goal is to learn ways to more quickly communicate the situation, the measures and metrics, and the measurement process. A tip is to think less is more and simpler is easier. However, communicating complex relationships is not simple, which is why it helps to work with a team to come up with innovative ideas for telling the story.
Figure 11.4. New Time to Proficiency.
When to Apply the Rules, Tools, Techniques, and Tips
Rules and tools are best used when clients are considering a significant investment of money and resources either to fix a problem or take advantage of an opportunity. They are also helpful when anticipating questions or doubts about the usefulness or wisdom of a future or past investment.
SUMMARY
This handbook contains many resources and guidelines for evaluating interventions. This chapter has focused on six rules, three tools, and fifteen tips and techniques for measuring the need for intervention and the results of intervening. Collectively, the rules, tools, techniques, and tips are intended to increase the odds that evaluation is done, is cost-effective, and is meaningful. For example, to increase the odds that evaluations:
Add value to clients—Make sure there is clarity of purpose and agreement on the measures and metrics.
Are done and done well—Leverage the measurement activities of other groups in the organization.
Produce results that will be accepted as valid—Work with measurement experts and involve clients in the process of identifying metrics.
Help interventions fully realize their potential—Track leading indicators to identify the need for further intervention early.
Are appreciated—Learn to tell the story simply of what was done and the results that were achieved.
References
Dessinger, J. C., & Moseley, J. L. (2004). Confirmative evaluation: Practical strategies for valuing continuous improvement. San Francisco: Pfeiffer.
Guerra-López, I. (2008). Performance evaluation: Proven approaches for improving program and organizational performance. San Francisco: Jossey-Bass.
Hale, J. A. (2002). Performance-based evaluation: Tools and techniques to measure the impact of training. San Francisco: Pfeiffer.
Hale, J. A. (2007). Performance consultant’s fieldbook: Tools and techniques for improving organizations and people (2nd ed.). San Francisco: Pfeiffer.
Norwick, R., Choi, Y. S., & Ben-Shachar, T. (2002). In defense of self-reports. The Observer, 15(3).
Phillips, P. P., & Phillips, J. J. (2007). The value of learning: How organizations capture value and ROI and translate into support, improvement, and funds. San Francisco: Pfeiffer.
Stolovitch, H., & Keeps, E. (2004). Analysis and return on investment toolkit. San Francisco: Pfeiffer.
CHAPTER TWELVE
Testing Strategies: Verifying Capability to Perform
Peter R. Hybert
What is the best way to assess that someone
Knows how to do the job?
Is ready to “go solo”?
Beyond general impressions, what is the best way to verify someone’s capability to perform the job? The standard answer would be “create a test.”
Presently, the business climate seems to be increasing its emphasis on testing and certification. Seemingly, more businesses are using testing in a training context—as a way to establish a baseline entry point, as a way to “test out” of unneeded training, or as a way to ensure capability at the end of training. Many professional associations and outside companies are offering certifications (for example, Microsoft, Project Management Institute, American Society for Quality, American Society for Training and Development, International Society for Performance Improvement) as a way for employees to build up credentials as evidence of their capability. In some cases, especially in regulated environments, testing is even used as a “gate” to control entry into the workplace (meaning that employees need to be qualified before they are allowed to go “solo”).
Once it is decided that a test is needed, it is easy to jump directly to creating a series of multiple-choice and true-false questions. However, testing is too important for this haphazard approach. It is worth taking the time and effort to define an overall strategy for testing that fits the performance and the business situation. Many, if not most, of the key decisions are business decisions, not training decisions. How does management want to run the workplace? How much control do they need or want over who is allowed to perform what work? What regulatory requirements exist? What liability and risk are they accepting when they allow employees to perform work for which they may not be capable? How much effort (that is, infrastructure, time, attention) are they willing to spend on testing?
Another reason that test strategy is important is that testing can be challenged. All testing involves evaluating a sample of the total performance and, if someone dislikes the result, he or she may well challenge whether the test was fair or relevant. Legal challenges can be expensive and damaging to the credibility of the organization. If a business intends to test, then it will want to be on solid footing.
An important business consideration is that effort spent creating and implementing testing adds cost to the business. Technically, it also adds no value … at least not to the learner or the ultimate performance. It is similar to quality inspection in manufacturing. Testing is necessary to ensure an acceptable end product, but the actual act of checking for quality is verification; it only provides peace of mind. From the standpoint of cost, the fewer resources needed for inspection, the better. The best result is when no defects are found . . . which means the test was unnecessary after all. Much like insurance, it is better if it is never needed. How much to invest in testing is a business decision.
As an aside, when the performers receive feedback on their test performance, there is value. But, if the feedback is used for learning, the “inspection” has become more than a test—it is now a learning activity. And, because it changes the performance, it is no longer a measure of capability but a continuous improvement mechanism.
Finally, there are times when testing is just poorly done—superstitions and ungrounded assumptions are often built into the way tests are developed and administered which undermine their usefulness and accuracy.
COMMON PROBLEMS AND MISCONCEPTIONS ABOUT TESTING
There is no such thing as a completely objective test. All tests require judgment, even “objective” tests. Having a single defensible correct answer does not make a test objective. How was that test item selected? How was it worded? How was the test administered? All of these decisions affected the question in front of the learner, and all probably involved some subjective decision making. If the testing is intended to reflect capability to perform, consider how many job situations occur in which there is one right answer that the performer simply needs to choose from a list of options . . . not many.
Tests test whatever they ask the performer to do. If a test asks a participant a multiple-choice question, it tests the ability to perform “trial and error” reasoning. Good test-takers can eliminate some of the options and improve their chances of getting it right. In fact, there are books published to help college students figure out how to do just that. On the other hand, token testing is set up merely to meet a compliance requirement; everybody passes because they have to. In this kind of test, all that is tested is the employee’s ability to “go through the motions because in life there are stupid rules that should just be followed”—it may be easier, but it is probably not something to encourage.
The common “cut score” of 80 percent is more a custom than a reasoned decision—at the least, there is judgment involved (Shrock & Coscarelli, 2007). In some cases, people insist on 100 percent as the passing score because they are unwilling to specifically identify the 20 percent that is not important. In other cases, there is a perceived risk about any employee who does not pass a test, even if he or she passes it later. The concern is that, if something happens in the future, those test records could be used as evidence that the company knew the employee was not capable. So to cover this eventuality, the tests are made very easy; they no longer test the boundary conditions that would clearly delineate capability from non-capability (FDA, 2008).
In short, many tests really do not measure what their authors really want to know. At best, they test only memory, test-taking skills, or the ability to do abstract reasoning. But they rarely look at the overall ability to execute a task or perform a job in a real situation. If memory, test-taking skills, or abstract reasoning skills were reliable indicators of capability, it might be an acceptable tradeoff, but people can perform poorly on tests and still get the job done quite well. This is the same phenomenon as people who do poorly in school but do exceptionally well in business. This is not saying anything about school or business, but simply that the criteria for success in school are not necessarily the same as the criteria for success in business. If the goal is to verify that employees have the capability to perform, then test the ability to perform. Knowledge testing does not do this; performance testing does.
Finally, there is the question of cost for value. Using a knowledge test for important decisions such as hiring, promotion, or pay requires a significant investment to ensure that the company is not exposed to legal risk through challenges of bias or validity. It takes a lot more effort to develop and implement a knowledge test. In many business settings, there is time and cost pressure. Large corporations are typically risk-averse. Performance testing can reduce these concerns while also providing a better picture of learner capability.
What can be done to implement effective testing? Below are four keys to a testing strategy that verifies capability to perform.
1. Develop an overall testing plan that supports and is integrated with business management practices.
2. Develop testing plans for strategic areas of the performance; do not test everything.
3. Design a logical “library” of tests that can be used in training, on the job, or even in selection situations.
4. Use performance testing when possible and knowledge testing only when required or when there is no other alternative.
WHAT IS A PERFORMANCE TEST?
Performance tests are exactly what they sound like—the candidate performs a task and the evaluator observes the process or evaluates the output using the same criteria used to evaluate real job performance. If employees can perform, they pass.
A knowledge test, on the other hand, asks employees questions about facts, rules, classifications, and occasionally even situations so that the employees can describe a response believed to be correct. In many cases, test-makers make these tests easier to grade by making employees select options from a list, rather than generate their answer from scratch.
These two approaches to testing are much different—see Table 12.1 below.
To take these definitions even further, let’s look at the difference between a performance test and a skill test. They are really not the same thing. A skill test may consist of performing something, but it may still not be a test of the actual performance.
For example, consider the typing test that is often used as a requirement for data-entry jobs. The assumption is that, if applicants can manipulate a keyboard at a certain rate with no more than a certain number of errors, they will be able to perform the job. This may be true. Yet, many jobs are measured by more than just the rate of data entered. Imagine someone making a hiring decision for a data-entry job. Is it better to hire the fastest typist or to hire someone who notices when something seems to be incorrect? Is someone who just types whatever is provided more desirable than someone who notices gaps or errors? The employer may be better off hiring someone who would actually look up and insert an address into a letter, instead of someone who would literally type “put address here” right into the document.
A typing test measures a skill, not the entire performance. A better test would be to identify some typical work products (for example, letters, reports, presentations) and give the individual a reasonable amount of time to create them and then review the complete performance. This testing would be more reflective of that person’s actual capability to perform on the job.
COMMENTS ON KNOWLEDGE TESTING
Until now, the chapter has emphasized performance tests. However, there are still occasions when a knowledge test must be used. A knowledge test can be an effective way to confirm that learners remember information they have received, can apply rules they have learned, and can even identify appropriate strategies for performance.
Where knowledge tests are used, it is important to be clear about the reason for using them, their limitations, and how the results will be used. Introducing statistics and validation adds a level of complexity that is beyond the scope of this chapter. However, a key principle to apply is to “get as close as possible to performance.”
As mentioned earlier, work situations almost never present a decision with a clear list of five options or only a binary (true or false) decision. But portions of the process may require a decision or an answer to a question. Those situations can be developed into performance-like questions—where the performer can demonstrate, if not “know-how,” at least “know-what” as in “know what to do.”
GUIDELINES FOR TESTING—A PRACTITIONER’S VIEW
Training professionals working on designing or developing tests often have to work within a set of limitations and challenges.
Most practitioners have to move quickly.
In many cases there is a small audience to address, frequently under one thousand people.
There are often multiple stakeholders. The testing strategy needs to meet the requirements of business leaders, regulators, line managers, and employees.
Budgets are typically limited, especially for something that may only be a quality check.
There are cultural expectations and biases against tests, rooted in people's experience in school, that must be overcome.
Changes in the business climate drive ongoing maintenance of the content over time and, possibly, even re-testing.
They need to avoid rewriting things that already exist, for example, taking the procedure and turning it into a test by putting checkmarks next to each step.
Finally, they need to really test what their learners need to know and whether learners actually have the capability to perform.
Performance tests meet the requirements above and should be the weapon of choice for most testing, especially when testing is used for decisions that affect the employee's future. Knowledge tests make the most sense for prerequisite concepts and informal “checks for understanding.”
TEST PLANNING PROCESS
If testing is used in a larger context, whether a qualification/certification system or as part of a training strategy, it is important to design an approach that applies testing at the right places and in the right amounts to meet the business need without unnecessarily adding time and cost to the process.
The first consideration is how testing supports the overall business intent. After that, specify individual tests and how they fit into the performance sequence, learning strategy, and role requirements. The final phase is actually designing and developing the test instruments. Although there are many alternative paths to take, a general concept structure such as the one shown in Table 12.2 can help guide thinking and provide a basis for the communication needed to make the combined business and training decisions.
TEST PLANNING CONCEPT MODEL
The model in Figure 12.1 illustrates a way to think through the decisions needed to define a test strategy. It is a way of visually breaking down the performance to identify the key components. This breakdown can help the test designer choose where to place the measurements, that is, where to use testing to get the best information with the least risk and effort.
At the top of Figure 12.1, at the level of “Workplace/On-the-Job,” is the actual work the employee performs. It could be a call center agent answering an incoming call, in which case the output would be an order or a refund or an answer. Or it could be a manufacturing operator, in which case the steps might be the actual manufacturing process and the output could be a component or even a finished product.
The middle level shows a partial view of the supporting capabilities needed to perform the work in the level above. This model is essentially like a learning hierarchy (Gagné, 1977) but instead of showing what is required to learn, it shows the “chunks” of capability needed to perform. This model works well for visualizing the relationship but it has a weakness. Quite often, the supporting capabilities are needed to perform multiple steps. Consequently, in trying to include all the links, the diagram can become unhelpfully complicated.
Figure 12.1 Components of Performance.
Source: Peter R. Hybert, 2007. Reproduced with permission.
The bottom level includes supporting capabilities that are outside the scope of consideration and are, as a result, “prerequisite.”
The “callouts” in the figure indicate potential test points within the model. At the Workplace level, evaluate the importance of the performance. Is it necessary to actually verify that employees can execute the performance per the company standards before allowing them to do so? In other words, does this performance require a “gate” or “checkpoint”?
If it does, identify the output criteria. For example, if the output is a component design, the criteria may be that the component can be readily manufactured. Or that it meets cost targets. Or even that the design documents (drawings, specifications, and so forth) meet internal format standards.
Next, identify the process criteria. In a manufacturing process, for example, each step may require specific things to be done in a specific order: does the operator wait until the tank has cooled before beginning to drain it?
Moving down to the Training level, decide whether you need any verification of the supporting capabilities. If so, it may be appropriate to define knowledge or skill tests for individual capabilities.
The supporting capabilities may be trained and tested separately; they may be grouped and trained all at once but then tested only at the step level; or they may be tested only at the output level. All those decisions depend on the amount of learning and doing involved in the work and the capability of the audience.
At the prerequisite level, decide whether to create entry assessments (for example, literacy, typing accuracy, and so forth) to confirm that learners have the necessary prerequisites to begin learning the supporting capabilities they need to perform in the workplace.
An important insight is that, from the bottom to the top of the model, all of the supporting capabilities below must be present in order to perform the blocks above. That means that if an employee can perform the steps and produce the output, it can be assumed that he or she has the underlying capabilities as well. However, testing for the underlying capabilities doesn’t necessarily prove that employees will be able to perform the steps and produce the output.
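This top-down inference can be sketched in code. The following minimal Python sketch is not from the chapter; the performance names, capability names, and data structures are hypothetical and exist only to illustrate why a passed performance test implies the supporting capabilities, while possessing the supporting capabilities does not prove the integrated performance.

# Hypothetical sketch of the concept model: Workplace performances rest on
# supporting capabilities (Training level), which rest on prerequisites.
# All names and data are illustrative, not taken from the chapter.

SUPPORTING_CAPABILITIES = {
    # workplace performance -> capabilities needed to perform it
    "run_labeling_line": {"set_up_labeler", "verify_label_stock", "clear_jams"},
    "process_refund_call": {"navigate_order_system", "apply_refund_policy"},
}

def implied_capabilities(passed_performances):
    """Top-down inference: executing the whole performance implies the
    supporting capabilities beneath it."""
    implied = set()
    for perf in passed_performances:
        implied |= SUPPORTING_CAPABILITIES.get(perf, set())
    return implied

def parts_present(passed_capabilities, performance):
    """Bottom-up check only: even if this returns True, it shows the parts
    are present, not that the person can integrate them on the job."""
    return SUPPORTING_CAPABILITIES[performance] <= set(passed_capabilities)

print(implied_capabilities({"run_labeling_line"}))
# -> {'set_up_labeler', 'verify_label_stock', 'clear_jams'}
print(parts_present({"set_up_labeler", "verify_label_stock", "clear_jams"},
                    "run_labeling_line"))
# -> True, but still not evidence of integrated performance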
A common-sense way to prove this point is to ask: Whom would a reasonable person rather have performing an operation on her loved one? Someone who earned good grades in medical school and has great skills in identifying the appropriate organs, suturing, and so on, or someone who has actually performed the specific procedure in question on over one hundred patients with no undesirable outcomes? Most people would choose the second physician. By definition, anyone who can consistently and successfully perform the procedure must also have enough of the needed medical skills and smarts to do it. There is no comparable evidence that the first physician, however skilled and smart, can actually “put it all together” and integrate the supporting capabilities into the desired performance.
The question above also surfaces another key point—real performance deals with “noise factors” and non-standard occurrences. The physician is expected to be able to execute the procedure not only when everything is normal, but also when confronted with unexpected circumstances. What if an abnormal heart rhythm develops? What if the patient goes into shock? What if the patient has a reaction to the anesthetic? All the knowledge and skills learned in a “safe learning environment” are necessary but not sufficient. Performers who can “get it all done” in real situations constitute the desired outcome. As much as possible, learners should be trained and tested to deal with those real-world situations.
DESIGN DECISIONS FOR PERFORMANCE TESTS
When it comes to actually creating tests, there are two separate activities, design and development. Although these terms are often used interchangeably in the training business, they are really two different processes. Design is synthesizing analysis data and then defining or specifying a solution. Development is actually building that solution. Development is where the team can “divide and conquer” to build several tests in parallel, as long as there is a clear design that provides a framework within which the developers can work.
In general, it is more efficient to design all of the tests and test-related business processes before beginning development.
Assuming a need for verification tests, additional business decisions must be made regarding how the testing will be implemented. They include:
Where will the testing be done? In the training environment, the work environment, or both?
Who will administer the testing? In most cases in which performance tests are used, the best resource to implement the test is a master performer, that is, someone who is very capable at performing the task, rather than a trainer or supervisor who does not regularly do the work.
Is the organization willing and ready to implement performance testing? If the use of certification is being driven primarily by political or marketing motives, there may not be support for the work analysis and processes needed to implement true performance testing in the workplace. Sponsors who are more interested in simply getting a test in place than in a true verification of capability will probably be satisfied with a simple one-shot written test, likely delivered via computer.
Companies that do decide to seriously implement performance testing may choose to collect and sequence a number of tests into a qualification path for key roles or sections of the organization. One advantage of these systems is that they can allow people to legitimately “test out” of training if they already have the capability, while still giving the company a way to verify that employees are, in fact, capable.
One company built a performance-test-based qualification system when several critical factors came together. They had a sponsor who had always wanted to change the compliance-oriented approach of “going through the motions” that was used historically. The company had a plant shutdown scheduled, which gave the test development team and master performers a window of time to implement the tests without impacting production. And they had a business reason—they needed to qualify the operators when they returned from shutdown on the new/changed equipment and processes. Of course, it is not necessary to wait for all the stars to align to make performance testing viable, but in this case it made it an easier decision for management.
Defining Test “Chunks”
After the business-direction questions are settled, the next step is to begin designing the tests. The first design problem is how to “chunk” the work into areas for which to build tests. Once again, this is really a business question.
For example, in a manufacturing environment, a supervisor may decide she wants all operators to be qualified before allowing them to perform specific tasks. But does she want them qualified to do everything in the area or on subsets of the work, such as operating a specific machine? One view says to qualify them in the entire area so that supervisors know they can assign anyone in the area to any task that needs to be done. Another school of thought is to qualify employees on smaller tasks so that they can be put to work more quickly instead of waiting for them to get qualified on everything. That is a management question. A smart test designer will design the tests to be as reconfigurable as possible, though, just in case.
Once the big blocks of performance are identified, the test designer needs to decide whether each requires one or many actual tests. For example:
Are separate tests necessary for each product or product variation?
Can different shifts or locations use the same test?
The key here is the concept of the “assignable chunk.” If a supervisor can assign someone to a task with discrete boundaries, that may be the appropriate chunk for a test. For example, quality control technicians often use test equipment in a lab to perform similar tests on a range of products. But the actual test steps and parameters may vary. This will require a decision about the number and scope of tests needed. If the performer can perform the test on one product, can he or she perform the test on any product? The answer is “it depends.” The best person to make the decision is probably the master performer.
A tempting option is to reduce this to simply “how to use the particular piece of equipment.” Doing so, however, downgrades the test from a performance test to a skill test, and it takes a bigger leap of faith to assume the learner will be able to put the entire performance together in the real environment.
The “assignable chunk” approach allows a test designer to break a larger performance into smaller parts that can be tested practically. If an employee might conceivably only perform a portion of some process, it becomes difficult to test the complete process in the real environment. For example, some manufacturing processes run over several days and several shifts. That means that the employee may be assigned to one set of tasks one day and another the next day. A second-shift operator may never perform certain tasks. Breaking the performance into the blocks an employee will likely be assigned to within one shift makes it easier to keep track of who is partially qualified and what parts remain to be observed.
For very long-term (multi-year) processes, a “portfolio approach” can be used. For example, engineers may contribute to a large number of product development projects. Engineers maintain a portfolio and, when they have actually performed all the pieces, even if over several different projects, they can submit the portfolio for review by an evaluator. The evaluator would critique the end-product (the portfolio), not a process, so it doesn’t affect accuracy to have the evaluator checking it all at once. A downside though is that the employee will have been doing the work for quite a while without being officially qualified; this may or may not be acceptable in some situations. (This is usually managed with supervisor/mentor oversight.)
Using the “assignable chunk” approach is also a good way to figure out the library of tests needed for a specific role or area and then sequencing them for individual qualification. Perhaps the second-shift quality control technicians do not perform all the same tests as the day shift. Maybe the first part of the process is not the easiest to perform, so it may not be the first part to learn. Or maybe the Line 2 operators make a variation of the end-product that does not require some of the same tasks as needed by the Line 1 operators. Does the supervisor need to be able to shift people from one manufacturing line to the other or do people always work on the same line? The answers to these questions describe the assignable chunks that can be used to create the overall list of tests for qualification requirements for specific roles, shifts, areas, and so forth.
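For illustration only, here is a small Python sketch of how assignable chunks might feed a qualification library; the roles, chunk names, and data layout are assumptions, not drawn from the chapter.

# Hypothetical qualification library built from assignable chunks.
# Roles (or shifts/lines) map to the chunks a supervisor could assign;
# comparing against the chunks an employee has passed shows who is
# partially qualified and what remains to be observed.

REQUIRED_CHUNKS = {
    "qc_technician_day_shift": {"moisture_test", "viscosity_test", "label_inspection"},
    "qc_technician_second_shift": {"moisture_test", "label_inspection"},
    "line_2_operator": {"line_start_up", "product_changeover", "line_shutdown"},
}

def qualification_status(role, passed_chunks):
    """Return (fully_qualified, chunks_still_to_observe) for a given role."""
    remaining = REQUIRED_CHUNKS[role] - set(passed_chunks)
    return (not remaining, remaining)

qualified, remaining = qualification_status("qc_technician_day_shift", {"moisture_test"})
print(qualified)   # False -> partially qualified
print(remaining)   # the chunks still to observe: viscosity_test, label_inspection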
Product or Process?
The model in Table 12.3 addresses the next test design question: assuming performance is being tested, should the test measure the end-product or the process? A good way to think about the difference is to imagine how someone's ability to make the turkey for a family's Thanksgiving dinner would be tested: by evaluating the finished turkey or by watching the cook work.
In general, testing the product offers many advantages, but it will not work for every situation. If evaluators only have to assess the product and not also observe the process, they can schedule the assessment at a convenient time, rather than having to be present during the performance. It also allows flexibility in how people get things done—as long as they generate the desired end-result. Of course, there are times when flexibility is exactly what the business does NOT want. In those cases, evaluators may need to observe the process.
When necessary, testing both the product and process is certainly feasible. However, additional testing generates additional effort and cost so it is important to evaluate the cost/benefit.
Real or Simulated Work?
To test performance, is it better to test the actual performance in the job environment or in a simulated setting? Testing the performance of real work has the benefit of including “noise” factors, as well as possible boundary situations. And the employee is getting work done during the testing—the downtime of being off the job to be tested is eliminated. But sometimes it is more effective to design a simulated test to ensure that specific challenges are built into the test and will always occur. There are advantages to both approaches, depending on the performance being tested, some of which are shown in Table 12.4.
Regardless of which setting is utilized, performance tests should be “open book” tests. When employees are being tested, they should be able to reference any information or tools they would normally use on the job. Procedures, computer look-ups, reference guides, “cheat sheets,” and other performance support tools (PSTs) should be allowed. The only document the employee should not have access to during the actual testing is the performance test itself, since it is not a reference resource used on the job.
Overall Pass/Fail—“Deal Breakers”
Eventually, specific criteria must be defined, either for individual process steps or for the characteristics of the end-product. However, prior to that, the overall “deal breakers” must be decided upon and delineated. “Deal breakers” are any criteria that, if not met, result in failure on the test, regardless of the performance on individual steps. These criteria usually involve safety or legal requirements as well as general criteria for the end-product. For example, if drivers can perform all the steps to change a tire, they should still fail if they forget to secure one of the lug nuts (end-product criteria). They should also fail if they put their feet under the wheel while jacking up the car because this is a safety violation (process criteria). In most business settings, these overall criteria minimally include compliance with laws and procedures and the use of safe work practices.
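To make the rule concrete, here is a minimal Python sketch, assuming the evaluator records each criterion as met or missed and keeps a separate list of deal breakers. The tire-change criteria are illustrative, and requiring every remaining criterion to be met is an assumption of the sketch, not a rule stated in the chapter.

# Hypothetical pass/fail logic with "deal breakers": any missed deal breaker
# fails the test outright, regardless of performance on the other steps.

def evaluate(step_results, deal_breakers):
    """step_results maps each observed criterion to True (met) or False (missed)."""
    missed_breakers = [c for c in deal_breakers if not step_results.get(c, False)]
    if missed_breakers:
        return False, ["deal breaker missed: " + c for c in missed_breakers]
    # Assumption for this sketch: all remaining criteria must also be met.
    missed_steps = [c for c, met in step_results.items() if not met]
    return (not missed_steps), missed_steps

observed = {
    "loosens lug nuts before jacking the car": True,
    "keeps feet clear of the wheel while jacking": False,  # safety violation
    "all lug nuts secured": True,
}
print(evaluate(observed, deal_breakers=[
    "keeps feet clear of the wheel while jacking",
    "all lug nuts secured",
]))
# -> (False, ['deal breaker missed: keeps feet clear of the wheel while jacking'])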
TIPS FOR DEVELOPING TESTS
Once all of the tests are designed, the development process is fairly straightforward—anyone who can perform a detailed task analysis can probably develop a performance test. Just like any tool, the test has to be designed for usability in the environment by the audience.
Identify the Critical Elements (Performance “Micro-Criteria”)
To create the test, one needs to understand the work at a very detailed level. One of the most effective ways to accomplish this is in-person observation of a master performer doing the work, augmented by questions and explanations. A comprehensive treatment of performance analysis is beyond the scope of this chapter, but the analysis itself is critical to developing effective performance tests. Below are key “watch-fors” and tips to keep in mind when doing this work.
Find a master performer, not just an expert, but someone who does the job well.
Have the official procedure available, if there is one.
Actually watch the person work or talk through examples—interviews can be helpful, but it is much better to be in the environment, see how the performer organizes the workspace, uses tools and information during the process, and so forth.
Imagine performing the work yourself.
Ask questions until the person’s answer is completely understood. Any question that the test designer would ask is probably a question that a learner would ask and quite possibly something that should be trained on and tested for. Besides, master performers are often better at doing the job than at explaining it.
Build rapport. Especially in a manufacturing setting, employees may be concerned that anyone observing them is trying to find something wrong. In this same vein, it is essential to handle the situation delicately when variances from the procedure are observed or when an operator explains something about how things “really get done.”
Focus on what the performer is doing: decisions he or she has to make, how he or she knows what to do next. Remember, it is a performance test, not a process audit. Quite often the human performance is not the same as the work process.
Look for “boundary conditions” and non-standard decisions and actions. Use questions like “Why?” “What happens if X doesn’t happen?” or “What determines when X is ready for Y?” to find out tacit know-how.
The goal of the analysis is simple enough—to identify the critical criteria for each step of the performance. The intent is to find out, when the performer does the task, what is important about it and what can be more or less ignored because it doesn’t affect the outcome.
To capture the information during the analysis process, simply take notes. Capture as much as possible, but keep pace with the work as it happens. Quick diagrams or digital photos can also help with recalling details later. Remember that not every observation will be captured the first time around, so some situations require multiple observation sessions.
An effective approach might use a performance checklist as the actual test instrument so that the steps and criteria are visible and the coach can use the test as a recording tool during the observation process. This format can also serve as a coaching tool or practice document for the learner during training.
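One hypothetical way to structure such a checklist so it can double as a recording and coaching tool is sketched below in Python; the field names and example steps are illustrative and are not taken from Figure 12.2.

# Hypothetical performance checklist used as the test instrument itself:
# the steps and criteria are visible to learner and coach, and the same
# form records the evaluator's observations.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ChecklistItem:
    step: str                   # what the employee does
    criteria: str               # what the evaluator looks for
    met: Optional[bool] = None  # recorded during observation
    notes: str = ""             # coaching comments

@dataclass
class PerformanceChecklist:
    task: str
    items: List[ChecklistItem] = field(default_factory=list)

    def record(self, step, met, notes=""):
        for item in self.items:
            if item.step == step:
                item.met, item.notes = met, notes

    def as_text(self):
        lines = ["Performance checklist: " + self.task]
        for i in self.items:
            mark = {True: "[met]", False: "[missed]"}.get(i.met, "[not observed]")
            lines.append(mark + " " + i.step + " -- criteria: " + i.criteria +
                         ((" -- " + i.notes) if i.notes else ""))
        return "\n".join(lines)

checklist = PerformanceChecklist("Apply product label", [
    ChecklistItem("Load label stock", "stock number matches the batch record"),
    ChecklistItem("Verify first labeled unit", "lot code and expiration date legible"),
])
checklist.record("Load label stock", True, "confirmed against batch record")
print(checklist.as_text())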
An Example of Performance Test Content
Using the example of a product-labeling process operator, Figure 12.2 provides a partial performance test with detailed criteria. It lists the steps from the perspective of what the employee would be doing, along with the criteria the evaluator would use to determine whether the employee met the requirements of each step. For an output review, the test would instead describe the steps the evaluator should perform and the criteria to apply when reviewing the output.
Identifying the detailed criteria often generates new know-how. Looking at each step and deciding what the key criteria are may involve a level of scrutiny the work has never received. Procedures are often developed when a process is new, but over time attention shifts to newer work processes. As a result, continuous improvement (or evolution) and best practices are not always fed back into job documents. When performance tests are developed for existing work, there are often discoveries that can be used to standardize, or even streamline, the work.
Figure 12.2 Performance Checklist Example.
On occasion, merely watching the performance is not enough. Some situations require finding out whether an employee understands what to do in non-standard situations or situations that are unlikely to occur during the observation process. “What if there was a problem here?” “What should be done if there are three rejects in a row?” These types of questions can be inserted into the performance checklist at appropriate times to check for this know-how without having to actually observe it. When creating the test, keep the flow of the work in mind, because evaluators may not be able to interrupt employees with questions when they need to be focused on the work they are doing.
TIPS FOR DEVELOPING KNOWLEDGE TESTS
With a little work, a number of the same principles from performance testing can be applied to knowledge testing:
Define an overall strategy for testing—where are tests really necessary and why? Figure out the business requirements for testing and be clear about whether a test is a verification step or simply an in-process check within training.
Design the test before developing it. This sounds obvious, but in practice test developers often just write questions. Designing first means identifying the key capabilities to be tested and then allocating test items proportionally across those capabilities (a hypothetical allocation sketch follows this list).
Include tests of knowledge of what to do in non-standard situations to ensure that capability in “boundary situations” is verified.
Consider making the test open book, in the sense that the employee is allowed to use reference materials and information that would normally be available on the job. This will help keep the test closer to the performance.
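For illustration, here is a hypothetical Python sketch of the blueprint step: allocate a fixed number of items across the key capabilities in proportion to assumed weights before any questions are written. The capability names and weights are invented for the example.

# Hypothetical knowledge-test blueprint: proportional item allocation
# using a largest-remainder rule so the counts sum to the total.

def allocate_items(weights, total_items):
    total_weight = sum(weights.values())
    raw = {cap: total_items * w / total_weight for cap, w in weights.items()}
    counts = {cap: int(r) for cap, r in raw.items()}
    leftover = total_items - sum(counts.values())
    # hand the remaining items to the largest fractional remainders
    for cap in sorted(raw, key=lambda c: raw[c] - counts[c], reverse=True)[:leftover]:
        counts[cap] += 1
    return counts

blueprint = allocate_items(
    {"refund policy rules": 5, "order system navigation": 3, "non-standard situations": 2},
    total_items=20,
)
print(blueprint)
# -> {'refund policy rules': 10, 'order system navigation': 6, 'non-standard situations': 4}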
CONCLUSION
Ultimately, testing is a way to confirm, or verify, capability of employees to perform a task, duty, or job. Testing that is used for important human resource decisions such as hiring, promotion, compensation, or even work assignments should be fair and valid. Performance tests can be an effective way to teach, coach, and test capability if the organization can implement the processes and build the culture to support them. The benefits are significant—improved performance, reduced liability, and even more consistent work processes.
For effective performance testing, start from the top and identify the most critical work performances. The most critical tasks are those that require extensive employee capability to perform or those that, if performed poorly, may pose risk to customers, employees, or equipment.
First, decide whether a test is necessary to ensure employees are capable of performing. Then, selectively target testing where it will give the best view of capability. Base “pass or fail” on job performance criteria. Test the boundary conditions—don’t make it easy. But do design the tests to reflect the real job challenges, the “noise factors,” and situations that people will likely encounter.
Performance testing will yield improved clarity of what is really required on the job. The information and tools can be used in training and coaching as well as for testing. Because performance tests are almost a mirror image of analysis, they can be developed rapidly and early in an overall training process. Ultimately, performance testing will give people who can do the job a fair and valid way to demonstrate it.
References
Coscarelli, W. C., & Shrock, S. A. (1996). Criterion-referenced test development: Technical and legal guidelines for corporate training (2nd ed.). San Francisco: International Society for Performance Improvement/Pfeiffer.
FDA (2008, May). Guideline on general principles. www.fda.gov/cder/guidance/pv.htm.
Gagné, R. (1977). Analysis of objectives. In L. Briggs (Ed.), Instructional design: Principles and applications. Englewood Cliffs, NJ: Educational Technology Publications.
Gilbert, T. F. (1978). Human competence: Engineering worthy performance. New York: McGraw-Hill.
Hybert, P. (2006). Project profile: Designing a performance measurement system. www.prhconsulting.com.
Hybert, P., & Smith, K. (1999). It only counts if you can do it on the job! www.prhconsulting.com.
Lentz, R. (2008, August). Feynman Challenger report appendix. www.ralentz.com/old/space/feynman-report.html.
Merrill, M. D. (2002). First principles of instruction. Educational Technology Research and Development, 50(3), 43–59.
Practical assessment research and evaluation. (2008, August). http://pareonline.net/.
Shrock, S. A., & Coscarelli, W. C. (2007). Criterion-referenced test development: Technical and legal guidelines for corporate training (3rd ed.). San Francisco: International Society for Performance Improvement/Pfeiffer.
Svenson, R., & Wallace, G. (2008). Performance-based employee qualification/certification systems. www.eppic.biz/services/Performance-basedEmployeeQualification-Certifi-cationSystems2008.