Surg Endosc (2013) 27:2631–2637
DOI 10.1007/s00464-012-2771-9

NEW TECHNOLOGIES
A primer on standards setting as it applies to surgical education
and credentialing
Juan Cendan • Daryl Wier • Kevin Behrns
Received: 24 September 2012 / Accepted: 11 December 2012 / Published online: 26 January 2013
Springer Science+Business Media New York 2013
Abstract
Background Surgical technological advances in the past
three decades have led to dramatic reductions in the morbidity associated with abdominal procedures and permanently altered the surgical practice landscape. Significant
changes continue apace including surgical robotics, natural
orifice-based surgery, and single-incision approaches.
These disruptive technologies have on occasion been
injurious to patients, and high-stakes assessment before
adoption of new technologies would be reasonable.
Methods We reviewed the drivers for well-established
psychometric techniques available for the standards-setting
process.
Results We present a series of examples that are relevant
in the surgical domain including standards setting for
knowledge and skills assessments.
Conclusions Defensible standards for knowledge and
procedural skills will likely become part of surgical clinical
practice. Understanding the methodology for determining
standards should position the surgical community to assist
in the process and lead within their clinical settings as
standards are considered that may affect patient safety and
physician credentialing.

Keywords Clinical assessment · Credentialing · High-stakes assessment · Performance metrics · Skill assessment · Standard setting

J. Cendan (✉)
Department of Medical Education, College of Medicine, University of Central Florida, 6850 Lake Nona Blvd., Suite 317, Orlando, FL 32827, USA
e-mail: juan.cendan@ucf.edu

D. Wier
Department of Surgery, University of Central Florida, Orlando, FL, USA

K. Behrns
Department of Surgery, University of Florida, Gainesville, FL, USA
Surgical technological advances in the past three decades
have led to dramatic reductions in the morbidity associated
with abdominal procedures and permanently expanded the practicing surgeon's armamentarium. Laparoscopy led the way in the late 1980s; however, significant changes continue apace, including surgical
robotics, natural orifice-based surgery, and single-incision
approaches. Although generally associated with patient
care improvements, these advances have brought temporarily disruptive changes; the initial deployments of the
technologies have on occasion been injurious to patients.
Patient safety concerns and a general intolerance for error
in our societal and medicolegal infrastructure suggest that
high-stakes assessment before adoption of new clinical
technologies will be inevitable.
Advances in innovative technical approaches and novel
devices are presented to surgeons who must then consider
clinical adoption of those devices and techniques with little
more than a description or brief skills laboratory session.
For both the clinician and the equipment developer,
eventual clinical adoption represents a balance of the
clinical relevance and the possible business marketability
(profitability) for the device. The training program
involved in widespread clinical adoption may not be the
highest-order priority in this process, which tends to be
driven by the surgical device industry. Thus, the surgical
community must help shape the process such that surgeons
adopting new techniques and processes are able to do so
through a defensible mechanism.
The principal stakeholders in any standards-setting
process are the consumers, funding organizations, and
professional representative bodies [1]. In the case of surgeons, that would be, respectively, our patients, the insurance and governmental funding agencies that have an
interest in safe and cost-effective procedural delivery, and
those entities that license and accredit our clinical practice.
This latter group includes the hospital we operate in, the
American Board of Surgery, the Accreditation Council for
Graduate Medical Education (ACGME) and its residency
review committees, the American College of Surgeons
(ACS), and our partners and colleagues.
Surgical education in residency and beyond
The mechanisms currently in place for surgical education
broadly represent the concept of graded and supervised
responsibility. Overall this has been a success, and it is a
model replicated throughout the Americas and Europe. The
process begins with the undergraduate medical curriculum
(UGME) and follows with the residency (graduate medical
education, GME). After residency, a process of continuing
medical education (CME) comes into place, largely guided
by the certifying specialty and by state boards. Efforts to tighten the regulation of CME training are being implemented to address activities occurring at the professional level, when the active surgeon is no longer working within a controlled educational environment.
Standards in UGME have paralleled efforts in general
education with local and national standards procedures in
place for many topic areas. A longtime standard-bearer in
this arena has been the National Board of Medical Examiners, which, together with the Federation of State Medical
Boards, provides the United States Medical Licensing
Examination, a three-part examination consisting of
knowledge and skills performance assessment. The standards set in these examinations are routinely monitored and
continuously reviewed by nationally recognized experts.
In the GME arena, the ACGME is currently revising its
competence criteria to reflect developmental acquisition of
milestones. Milestones should be acquired over time and should build on one another in a progressive manner; over time, a trainee would accumulate the knowledge and skill that lead to competence in an area.
The CME environment has lagged behind in the adoption of stringent processes to document maintenance of, and competence in, new skills. The majority of CME programs
remain at a level that reflects mere attendance at a function.
The ACS has initiated a program that classifies CME
activities according to five levels that reflect the involvement of the learner, ranging from class attendance (lowest)
to in situ supervision by an expert (highest level). CME is
evolving from attendance-based knowledge acquisition to a
process that includes attendance and self-assessment. The
American Board of Surgery Maintenance of Certification
(ABS-MOC, Part 2) requires 90 h of Category I CME over
the 3-year MOC cycle. Of the 90 h, 60 h must include a
self-assessment activity. The ABS also requires MOC, Part
4, which stipulates that diplomates must demonstrate ongoing participation in a national, regional, or local outcomes database or a quality assessment program.
In GME and CME, there is a notion that an actual high-stakes assessment is unnecessary because at this stage of
professional development, the idea of continual improvement and self-education should be paramount. Optimally,
graduate students or professionals would identify an area of
weakness and develop a plan for self-education; implementation of the idea in clinical practice would be followed
by outcomes analysis. This cycle would be continually
repeated in areas in which the student or professional
exhibits weakness. However, this type of formative analysis and development does not exclude the possibility for
well-focused summative evaluations.
Assessment options
Surgical skills assessment has been traditionally an observational process wherein a resident is assessed in a clinical
environment; feedback is then provided, and over the
course of the resident’s training, goals of increasing complexity are achieved. This method has come under criticism
for being too subjective, prone to bias, and possibly not
representative of the resident’s entire skill set [2, 3]. The
surgical education community has made significant strides
toward objectifying the process, particularly since the
widespread adoption of surgical simulation platforms;
however, a recent meta-analysis by van Hove et al. [4] concludes that ''most methods of skills assessment are valid for feedback or measuring progress of training, but few can be used for examination or credentialing''. In order to sit
for the ABS qualifying examination, the applicant must not
only complete the general surgery training program, but
also demonstrate successful completion of ACLS, ATLS,
and FLS. It is worth noting that even for such a rigorous
and high-stakes assessment, the published data are scant;
there may be proprietary data, however, that are not generally available. Van Hove et al. note that of the nine
articles referencing the Fundamentals of Laparoscopic
Surgery (FLS) module, only one provides level 1b (prospective) data, with one other article presenting level 3b
(nonconsecutive) and the remaining seven having level 4
(nonmatched case series) data, despite the adoption of FLS
into the surgical certifying process in North America.
In the near future, applicants for the ABS Qualifying Examination must be evaluated in two Clinical Assessment and Management Examination (outpatient) (CAMEO) encounters and two Operative Performance Evaluations. Soon this will increase from two evaluations of each over the 6 years to six evaluations of each over the 5 years. Additionally, applicants will need to demonstrate successful completion of the Fundamentals of Endoscopic Surgery (FES) program. FES is much like FLS; however, there is a prescribed curriculum, and FES will be more time-intensive than FLS. In short, surgeons are likely to be subjected to high-stakes scrutiny as part of licensure, and just how competence will be measured is of great consequence.
Defining standards of competence
The responsibility of determining who is competent to
practice surgery—or more generally to perform procedures—is pronounced [5]. Particular to surgery, there are
perioperative concerns and intraprocedural variations that
complicate the standards-setting process. There is more to the management of the operative patient than the mechanical skill of completing the operation; the need for a justifiable, documented, accountable, and defensible method for declaring comprehensive competence is critical. Observational methodologies for assessing procedure-based [6] and nontechnical skills in context (in the operating room) are currently being investigated [7]. A balanced approach to any assessment would reflect the overall appropriateness of the patient's care. That is, does the patient have the
appropriate indications for a procedure, and was the process
of evaluation and management leading up to the procedure
appropriate? In effect, the process of defining competence
can lead to fundamental exploration regarding how we
teach, what we want our learners to know, and how we want
them to behave.
It is worth noting that content and performance standards are not the same. Content standards refer to the
material and curriculum that learners are expected to know;
performance standards refer to the level of performance
expected from trainees. In either case, the standard should be set so that competent students pass and those who are not fail. The procedure defining the standards
must include the input of experts; however, these judges
must be careful to set the passing mark at a level that is
reasonable. The process for agreeing on where that mark
should be set can be tremendously enlightening to all
involved and will generally be met with vigorous
discussion.
There is substantial discussion about how experts evaluate students who are deemed competent; although
tempting, it is not enough to paraphrase Supreme Court
Justice Potter Stewart in Jacobellis v. Ohio [8]: ‘‘We know
it [competence] when we see it.’’ Experts, in fact, may
judge students by mastery criteria rather than competence
criteria. The differences between competence criteria and
mastery criteria can be difficult to define, but the standardssetting process must be defensible, and a mastery assessment could be unfair to a surgeon who is trying to maintain
or enhance his or her practice in a competitive
environment.
The marginal pass
An interesting concept in this process is the idea of the
learner who manages a marginal pass [9]. Experts involved
in the standards-setting process must keep in mind the examinees who are about to take the examination and the likelihood that they will know or correctly perform a particular question or skill. Furthermore, the conceptual examinee should reflect a reasonable, or even marginal, pass. That is, the judge cannot be thinking of the
likelihood that the ‘‘best’’ student or the ‘‘top 10’’ student
would know the required information, but rather the likelihood that someone who will be a reasonable and safe
practitioner of that specific task would know. In this process, we have found it useful to conceive of a resident who will be safe, good in the operating room, and knowledgeable, a resident who will make sound decisions for the
patient but is not otherwise a superstar. The judges need to
couch the pass/fail mark in this reality, and all judgments
and decisions must flow from that visualization.
The difficulty in defining a marginal pass is an issue for
the ABS; the failure rate for the certifying examination has
increased to 28 %. Perhaps this failure rate has occurred
because of the difficulty in distinguishing a marginal pass
from a failure. Approaches to further standardize the
examination and evaluation are in process, but the tendency to include experts in decision-making panels may
complicate the process by pulling the expectations toward a
particular subject matter expert’s data set.
A central tenet is the idea that performance standards
must be defensible; this outcome can only be guaranteed
with a transparent and well-documented standards-setting
process. The process needs to be reasonable, thoughtful,
and systematic [10]. In particular, the selection and training
of panelists, the sequencing of activities in the assessment,
and careful documentation are critical. Failure to be clear
on any one of these categories can lead to an invalid outcome declaration. Development of the judges is important;
for example, the judges need to understand the difference
between a passing score (e.g., the percentage correct score)
and the passing rate (e.g., the percentage of students who
pass the test at any given score). Background materials
may need to be reviewed in preparation for such a process.
The number of judges required for such a process varies in
the literature, but a minimum of six and a maximum of 12
is reflected in most topic reviews [11].
Methods
There are a number of methods for setting performance
standards [11–13]. Generally performance can be assessed
by comparing one student to the others (relative), by setting
a specific cutoff of performance (absolute), or by creating
an assessment that reflects both an absolute and a relative
behavior—a compromise method. Additionally, in creating
a comprehensive assessment, the judges may face the need
to declare compensatory or noncompensatory standards
[11]—that is, if the performance on isolated parts will be
averaged (poor performance can be compensated by good
performance elsewhere) or if there are components that are
critical to perform (must-pass components). In some cases,
there may be a need to mix these approaches within the
same assessment; if that is the case, each section will need
different performance metrics.
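To make the compensatory/noncompensatory distinction concrete, the short Python sketch below uses invented section names, scores, and a 70 % overall pass mark; none of these values is prescribed by the ABS or any other body.

# Hypothetical section scores for one examinee (0-100 scale).
scores = {"knowledge": 82, "skills_station": 55, "oral": 75}

# Noncompensatory (must-pass) components and their own cutoffs.
must_pass = {"skills_station": 60}

def compensatory(scores, cutoff=70):
    # Sections are averaged, so weakness in one area can be offset elsewhere.
    return sum(scores.values()) / len(scores) >= cutoff

def noncompensatory(scores, must_pass, cutoff=70):
    # Every must-pass component has to clear its own mark before averaging.
    gates_ok = all(scores[name] >= mark for name, mark in must_pass.items())
    return gates_ok and compensatory(scores, cutoff)

print(compensatory(scores))                # True: the mean is about 70.7
print(noncompensatory(scores, must_pass))  # False: the skills station misses its 60-point gate

In this invented example the examinee passes under the compensatory rule but fails once the skills station is declared a must-pass component.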
Relative methods
The most familiar method of standards setting is norm
referencing. All of us experienced norm referencing in
secondary education classes, where an examination was
given and a curve was devised for the class. In essence, the
examinee is compared to a peer group and the score distribution curve determines excellent—or, conversely,
poor—performance. From the standpoint of the educator,
this process is simple and reproducible; however, the mathematics guarantees that there will always be poor performers (there is always a lowest quartile), even though the entire class may have performed exceedingly
well and in excess of a conceptual minimal standard. There
has been a shift away from this type of analysis, and we
will not provide an example of it here.
Absolute methods
The Angoff method for establishing pass/fail standards has
a well-established history and can be easily incorporated
into assessment standards setting. The process is one where
the evaluators describe the characteristics of a borderline
examinee and share some examples of prior students. Each
judge then reviews the examination items and determines
the likelihood that the borderline student would know that
item on a 0–100 % scale, or even a binary scale (e.g., true/
false, yes/no). A recorder notes these observations on a
chart, and once complete, the entire group reviews these
metrics. The process for review must be systematic and not
favor a specific judge, such as a senior faculty member or
the chairman. It is best to have that process defined a priori.
If there are wide variations between expected results, then
the group must discuss the issue and come to a conclusion.
It is generally accepted that variations greater than 20 %
should be discussed. The average of the judges’ scores for
that particular item then becomes the pass/fail mark for that
item and the average of all items is the pass/fail mark for
the examination.
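As an illustration of the bookkeeping the Angoff procedure implies, the Python sketch below uses invented ratings for two items from a ten-judge panel; it flags items whose ratings spread by more than 20 points for discussion and averages the item means into an examination-level pass mark.

# Hypothetical Angoff ratings: each list holds ten judges' estimates (0-100)
# of the probability that a borderline examinee gets the item right.
ratings = {
    "item_1": [80, 85, 75, 90, 60, 85, 80, 70, 75, 85],
    "item_2": [65, 70, 68, 72, 70, 66, 74, 70, 68, 72],
}

item_means = {}
for item, judged in ratings.items():
    spread = max(judged) - min(judged)
    if spread > 20:
        # Variations greater than 20 % are flagged for group discussion.
        print(f"{item}: spread of {spread} points, discuss before accepting")
    item_means[item] = sum(judged) / len(judged)

# The mean of the item means becomes the pass/fail mark for the examination.
exam_pass_mark = sum(item_means.values()) / len(item_means)
print(item_means, round(exam_pass_mark, 1))

Here item_1 (a spread of 30 points) would be revisited by the panel before its mark of 78.5 is accepted, and the provisional examination pass mark is 74.0.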
Compromise methods
The Hofstee method combines an a priori judgment about
performance on the examination with actual data from the
test takers [14]. Judges are asked to define minimum and
maximum acceptable passing scores and failure rates.
Graphically, this creates a boundary set that is rectangular
in shape (Fig. 1), the midpoint of which becomes the
pass/fail mark. This method is best used with some prior
knowledge of performance on the examination; however, it
can be deployed from a subgroup analysis or even during
the assessment exercise using the graphical Hofstee variation [15]. This method focuses on the performance of the
whole test, not the specific items. This is a practical method
when you have existing performance metrics and wish to
now set a pass/fail point for an examination.
Below we work through a sample case that will seem
familiar to anyone involved in surgical education, whether
at the UGME or GME level. We will demonstrate several
of the aforementioned techniques for standards setting.
Our program faced a recent challenge that forced us to
consider these steps. In particular, we were preparing to
start a series of new residencies that will incorporate
ACGME milestones criteria from inception. Several of the
milestones criteria required standardized patient (SP)
encounters and resident expertise with patient interviewing
skills. Our SP platform had not yet been developed to the
point where summative evaluation was possible, and we
had to complete a process of standards setting through a
combination of Angoff-style judgment-based reviews of
individual assessment items, a group declaration of Hofstee
limits for pass/fail, and construction of borderline regression curves for the individual stations, which is beyond the
scope of this article [16]. We now have in place defensible
performance standards for these stations and feel confident
that residents who perform beyond a certain score should
be passed and those below should fail and thus require
remediation.
Fig. 1 Rectangular boundary set indicating the pass/fail mark
Table 1 Statements used to tabulate the validity of a test according to the Hofstee method

Statement                                                                                      | Result
The lowest acceptable percentage of residents that will fail the exam is…                     | Minimum fail rate
The highest acceptable percentage of residents that will fail the exam is…                    | Maximum fail rate
The lowest acceptable percentage-correct score that allows a borderline resident to pass is…  | Minimum pass score
The highest acceptable percentage-correct score that allows a borderline resident to pass is… | Maximum pass score
Sample case
You are asked by your chair to create a comprehensive,
high-stakes exam for residents that incorporates multiple
choice questions (knowledge) and procedures (skills). This
assessment will determine whether the resident can pass
from junior to senior level (high stakes, must be defensible). Where do you start? The program director wrote an
exam 2 years ago and has prior performance data but has
not formally declared a pass/fail mark for the assessment;
furthermore, she has decided to add a new 10-station skills
examination to this end-of-year evaluation. Thus, the entire
instrument will have 100 multiple-choice questions and 10
skills-station performance evaluations. This standards-setting procedure mirrors what surgeons face in the ABS
qualifying examination (multiple-choice questions with
one correct response) as well as the certifying examination
(oral exam questions that may pose a broader range of
acceptable behaviors). The process for determining pass/
fail standards is consistent with our description herein.
You need to convene a representative panel and are
fortunate to have 10 judges. The panelists must have a
reasonable understanding of the actual level of the learner
and a grasp of the range of behaviors that are observed at that level.

Table 2 Tabulating the judges' responses to the statements in Table 1 according to the Hofstee method

Finding  | Judges 1–10                             | Mean
Min pass | 70, 60, 65, 65, 62, 70, 72, 60, 65, 68  | 65.7
Max pass | 75, 70, 75, 75, 80, 75, 75, 70, 72, 80  | 74.7
Min fail | 0, 5, 8, 12, 8, 10, 5, 5, 10, 8         | 7.1
Max fail | 12, 10, 12, 15, 10, 15, 10, 10, 20, 15  | 12.9

The panel could include the program director, a
chief resident, and a minimally invasive surgery fellow
with an interest in education, then be filled out with other
faculty members. The general faculty members could be
specialists as long as they fit the aforementioned caveat—
that is, realistic expectations specific to the learner and
considerate to the concept of the marginal pass.
Because you have prior performance data for the
assessment, it is time to consider it. The panel learns that
the prior average was a 79.2 %, the minimum–maximum
range was 58–92 %, and all questions were answered
correctly by at least 40 % of test takers. The panel may
decide that this is a good exam; thus, they are not interested
in reviewing each of the 100 questions but would rather take a global approach to the exam using the Hofstee method. You should then ask the judges to respond to the four statements in Table 1 and tabulate the responses as shown in Table 2.

Table 3 Example of the dichotomous method

Item                                      | Judges 1–10                                    | Mean
Question 1                                | 70, 80, 82, 80, 78, 72, 75, 77, 75, 70         | 75.9
Question 2                                | 70, 80, 75, 90, 90, 95, 88, 90, 75, 80         | 83.3
Able to tie intracorporeal knot in <60 s  | Yes, No, Yes, Yes, Yes, No, No, Yes, Yes, No   | 60
… for every question on the assessment    |                                                |
Overall pass/fail, mean of means          |                                                | 73.07
You create a graphic that represents the boundaries of the
judgment ranges and overlay it with the actual cumulative
performance of the residents taking the exam. The responses
from your panel are represented in Fig. 1. The point at which a
diagonal line across the rectangle intersects the actual performance curve is the pass/fail point for the whole assessment.
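One way to carry out that graphical step numerically is sketched below. The judge boundaries are the Table 2 means; the prior-cohort scores are invented for the example, so the resulting cutoff is illustrative only.

# Hofstee boundaries taken from the Table 2 means.
min_pass, max_pass = 65.7, 74.7   # acceptable passing-score range (%)
min_fail, max_fail = 7.1, 12.9    # acceptable failure-rate range (%)

# Hypothetical percentage-correct scores from the prior administration.
scores = [60, 64, 68, 72, 75, 76, 77, 78, 79, 80,
          81, 82, 83, 84, 85, 86, 87, 88, 90, 92]

def fail_rate(cutoff):
    # Percentage of examinees who would fail at a candidate cutoff score.
    return 100 * sum(s < cutoff for s in scores) / len(scores)

# Walk along the diagonal of the Hofstee rectangle from (min_pass, max_fail)
# toward (max_pass, min_fail) and stop where the observed failure curve crosses it.
cutoff = None
steps = 1000
for i in range(steps + 1):
    candidate = min_pass + (max_pass - min_pass) * i / steps
    allowed_fail = max_fail + (min_fail - max_fail) * i / steps
    if fail_rate(candidate) >= allowed_fail:
        cutoff = candidate
        break

if cutoff is None:
    # If the curve never crosses the diagonal inside the rectangle,
    # the midpoint of the acceptable passing-score range is a common fallback.
    cutoff = (min_pass + max_pass) / 2

print(round(cutoff, 1))

With these made-up scores the crossing lands just above 68 %, inside the judges' 65.7–74.7 % window; had there been no crossing, the midpoint (about 70.2 %) would have served as the compromise mark.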
It is possible that the panel decides to review each and every
question in order to determine the pass/fail point. The Angoff
methodology (demonstrated in the subsequent paragraphs) has
been used by psychometricians to address this need. Although
time-consuming, it can reveal wide ranges of expectations
among the panelists (important for the program to address) and
may also reveal areas of consistent expectations (which can be
developed into must-pass stations, if desired).
Creating a pass/fail for the skills portion of the examination leaves us with two new quandaries: first, the assessment
is new, and there are no prior performance metrics to rely on;
and second, the individual assessments are not in traditional
multiple-choice format. We can utilize the Angoff method
for setting standards for this new examination as long as the
judgment panel has a reasonable concept of what is being
measured and what the performance should be.
Sharing this information is the responsibility of the
assessment creator, and there are a number of ways to
address the problem in an informed manner. For example, a
station that utilizes a virtual reality simulator metric may
have published performance data [17] that can inform the
process. Suppose, for example, that the time to complete a particular activity on your simulator is 60–90 s for the 25th–75th percentile of all users. If this is the case, your judgment
panel may use that information in the context of your
specific residents. It is worth noting that this type of
information is useful, but final judgment requires local
reckoning with the actual level of the learner, access to
equipment, degree of supervision, and so on.
Some components of the skills examination may be
dichotomous (e.g., they tied the knot), and some may be
graded along a time or other scale. Scaled numbers can be
averaged across judges, whereas dichotomous items can be
assigned a 0 or 100 and thus averaged across the entire
assessment. An example of this method is shown in Table 3.
It is worth noting that the judges could decide that inability to
pass a particular dichotomous item translates to an overall
failure. This invokes a separate type of process that we will
not discuss further, but it also requires significant discussion
and agreement among the judges—a minimal skills set.
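A short Python sketch of that tabulation, reproducing the arithmetic of Table 3 (dichotomous judgments are recoded to 0 or 100 before averaging; the must-pass variant just mentioned is deliberately left out):

# Judges' entries from Table 3: scaled items keep the 0-100 estimates,
# dichotomous items are recorded as Yes/No and recoded to 100/0.
items = {
    "question_1": [70, 80, 82, 80, 78, 72, 75, 77, 75, 70],
    "question_2": [70, 80, 75, 90, 90, 95, 88, 90, 75, 80],
    "knot_under_60_s": ["Yes", "No", "Yes", "Yes", "Yes",
                        "No", "No", "Yes", "Yes", "No"],
}

def as_number(entry):
    # Yes -> 100, No -> 0; numeric estimates pass through unchanged.
    return {"Yes": 100, "No": 0}.get(entry, entry)

item_means = {name: sum(as_number(e) for e in entries) / len(entries)
              for name, entries in items.items()}

overall = sum(item_means.values()) / len(item_means)   # mean of means
print(item_means)          # 75.9, 83.3, and 60.0, as in Table 3
print(round(overall, 2))   # 73.07, the overall pass/fail mark from Table 3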
This same process could have been followed had the
panel decided to review each and every question in order to
determine the pass/fail point. Although time-consuming, it
can reveal wide ranges of expectations among the panelists
(important for the program to address) and may also reveal
areas of consistent expectations that can be developed into
must-pass stations, if desired.
Conclusions
The standards-setting process is laborious and can create
great discomfort. Professionals have long worked under the
premise that part of their charge was to self-regulate and,
by extension, accept the responsibility of their work
product and its consequences. The development and
adoption of novel surgical technologies and the confluence
of the patient safety movement, a societal move toward
regulation, and risk minimization lead us to consider the
process of skills assessment for high-stakes activities. The
surgical community should lead the way in defining
appropriate knowledge and skills metrics along the learning arc from UGME to GME and into the CME environment. This document can serve as a primer, introducing some of the general concepts that drive the standards-setting process. The principal concepts are that any standards-setting process should be (1) deliberate and open to critique and (2) purposefully fair to those being assessed,
while (3) continuing to couch the relevance of the process
within the higher goal of patient safety.
An entire psychometrics literature exists to support these
endeavors, and we review only the basics here. Although
the term marginal may seem pejorative, it is a critical
concept in the process, and defining that level crystalizes
our understanding of knowledge and performance expectations. We maintain that the standards-setting process can
be tremendously enlightening to everyone involved in
forming the curriculum, the assessing body, and eventually
the person being evaluated. Surgeons must participate in
this process.
Disclosures Drs. Cendan, Wier, and Behrns have no conflicts of
interest or financial ties to disclose.
References
1. Southgate L, Hays RB, Norcini J, Mulholland H, Ayers B,
Woolliscroft J, Cusimano M, McAvoy P, Ainsworth M, Haist S,
Campbell M (2001) Setting performance standards for medical
practice: a theoretical framework. Med Educ 35:474–481
2. Darzi A, Smith S, Taffinder N (1999) Assessing operative skill
needs to become more objective. BMJ 318:887–888
3. Taffinder N, Sutton C, Fishwick RJ, McManus IC, Darzi A
(1998) Validation of virtual reality to teach and assess psychomotor skills in laparoscopic surgery: results from randomised
controlled studies using the MIST VR laparoscopic simulator.
Stud Health Technol Inf 50:124–130
4. van Hove PD, Tuijthof GJ, Verdaasdonk EG, Stassen LP,
Dankelman J (2010) Objective assessment of technical surgical
skills. Br J Surg 97:972–987
5. Searle J (2000) Defining competence—the role of standard setting. Med Educ 34:363–366
6. Marriott J, Purdie H, Crossley J, Beard JD (2011) Evaluation of
procedure-based assessment for assessing trainees’ skills in the
operating theatre. Br J Surg 98:450–457
7. Crossley J, Marriott J, Purdie H, Beard JD (2011) Prospective
observational study to evaluate NOTSS (non-technical skills for
surgeons) for assessing trainees’ non-technical performance in
the operating theatre. Br J Surg 98:1010–1020
8. Brennan (1964) Jacobellis v Ohio. Thomson Reuters. http://caselaw.lp.findlaw.com/scripts/getcase.pl?court=us&vol=378&invol=184
9. Zieky M (ed) (2001) So much has changed: how the setting of
cutscores has evolved since the 1980s. Lawrence Erlbaum
Associates, Mahwah
10. Hambleton RK, Brennan RL, Brown W, Dodd B, Forsyth RA,
Mehrens WA, Nellhaus J, Reckase M, Rindone D, van der Linden
WJ, Zwick R (2000) A response to ‘‘Setting reasonable and
useful performance standards in the National Academy of Sciences’ Grading the Nation’s Report Card’’. Educ Measure Issues
Pract 19(2):5–13
11. Downing SM, Tekian A, Yudkowsky R (2006) Procedures for
establishing defensible absolute passing scores on performance
examinations in health professions education. Teach Learn Med
18(1):50–57
12. Cusimano M (1996) Standard setting in medical education. Acad
Med 71(10):S112–S120
13. Cusimano MD, Rothman AI (2003) The effect of incorporating
normative data into a criterion-referenced standard setting in
medical education. Acad Med 78(10):S88–S90
14. Hofstee W (1983) The case for compromise in educational
selection and grading. Jossey-Bass, San Francisco
15. DeGruijter D (1985) Compromise models for establishing
examination standards. J Educ Measure 22:263–269
16. Gormley G (2011) Summative OSCEs in undergraduate medical
education. Ulster Med J 80(3):127–132
17. von Websky MW, Vitz M, Raptis DA, Rosenthal R, Clavien PA,
Hahnloser D (2012) Basic laparoscopic training using the Simbionix LAP Mentor: setting the standards in the novice group.
J Surg Educ 69(4):459–467
Working Smart: a professional practice forum
Building Interoperability Standards
and Ensuring Patient Safety
By Michael Glickman, MSE, and Anna Orlova, PhD
Anyone who has ever developed a standard knows well the
many challenges that must be surmounted. Once a standard is
published, however, it’s not the end but in many respects only
the beginning. Moving standards from specification to practice
requires an equivalent if not greater effort, as does ensuring
that standards are not stuck at a point in time but are “living”
and are periodically updated to reflect experience from users as
well as advances in the state-of-the-art health information and
communication technology. More importantly, individual standards have to work together to enable information sharing and
interoperability across various health information and communication technology (HICT) products.
Sixteen years of standards development have led the International Organization for Standardization (ISO) Technical Committee 215, Health Informatics (ISO/TC 215), to the
practical realization that a “bundle” of individual standards
is required to create interoperable health information technology (health IT) standards that will ensure both adoption
and sustainability.
Building Interoperability Standards
A bundle of individual standards that work together to enable
interoperability represents a high-level standard specification—an assembly of individual standards that move information from sender to receiver. Interoperability standards are
harmonized and integrated individual standards constrained
to meet healthcare and business needs for sharing information
among organizations and systems for a specific scenario (use
case) of health information exchanges.
According to the Health Level Seven (HL7) definition, interoperability comprises the following three components (pillars):
1. Semantic interoperability—shared content
2. Technical interoperability—shared information exchange
infrastructure (transport)
3. Functional interoperability—shared rules of information
exchanges (i.e., business rules and information governance
(IG), “the rules of the road”)
Thus, the interoperability standard—a bundle or assembly of
individual standards—has to include individual standards from
these three components of interoperability. The concept of “a
bundle” of individual standards working together was first introduced by the Health Information Technology Standards Panel (HITSP, www.hitsp.org) in 2005. HITSP operated as a public
and private collaborative supported through a contract from
the Office of the National Coordinator for Health IT (ONC) to
the American National Standards Institute (ANSI). The HITSP
bundle was formally called “Interoperability Specification (IS).”
Between 2005 and 2009, HITSP developed 19 ISs for various national use cases including Electronic Health Record (EHR) Laboratory Result Reporting (IS 01), Biosurveillance (IS 02), Consumer Empowerment (IS 03), Quality (IS 06), and Consultation
and Transfer of Care (IS 09), among many others.
HITSP IS included specific individual standards grouped by
the following categories:
Semantic Interoperability
– Data Standards (vocabularies and terminology standards)
– Information Standards (reference information models,
information templates, and other)
Technical Interoperability
– Information Exchange Standards (message-based and
document-based)
– Identifier Standards
– Privacy and Security Standards
Functional Interoperability
– Functional Standards (requirements for health information and communication technology derived from
the analysis of the use case)
– Business Processes Standards (guidelines and best
practices described in the use cases)
For example, HITSP Biosurveillance IS 02 included 110 individual standards (see Figure 1).1 This assembly of standards supported a charge formulated in the National Biosurveillance Use Case of transmitting "essential data from electronically enabled
healthcare to authorized public health agencies in real-time.”
Essential data included 40 data elements defined by the Centers for Disease Control and Prevention (CDC). The biosurveillance use case was the first of three national use cases developed
for HITSP by the American Health Information Community
(AHIC)—an ONC advisory committee that identified priorities
for health IT interoperability and developed national use cases.
The first three use cases included biosurveillance, EHR laboratory result reporting, and consumer empowerment. A total of
152 national use cases were developed by AHIC between 2005
and 2009. These use cases served as business requirements for
the HITSP interoperability specifications.
Building upon the HITSP methodology, ISO/TC 215 decided
to move forward with developing interoperability standards.
The working title for the ISO “bundle” is “Standards Reference
Portfolio (RSP).” The first domain selected for developing ISO
RSP is clinical imaging. The work has been conducted in collaboration between ISO/TC 215 and DICOM, a standards development organization.2
ISO RSP includes standards for content and payload (semantic interoperability), transport (technical interoperability), and
rules (functional interoperability) (i.e., standards for information governance and information management practices—
which are strategic AHIMA imperatives).3,4,5,6
Critical constituents of the RSP bundle’s functional interoperability standards include standards that enable data capture (information availability), data quality validation (data integrity), data
protection (capture of patient consent for healthcare procedures as
well as information sharing; protection of privacy, confidentiality,
and security of information), and other standards for information
governance principles in healthcare defined by AHIMA.7
ISO RSP also defines conformance criteria, which are
statements that specify how various individual standards
should work together. These criteria will be used by vendors to test RSP and to deploy standards into their products.
They also will be used in HICT certification processes, so
users know that the product is compliant with the interoperability standard.
Maintenance is important to keep RSP up-to-date. In order
to ensure that standards remain relevant, ISO has developed directives that govern all ISO standards. One of these is the compulsory five-year systematic review, in which standards developers employ a systematic review process to determine whether a standard (a) is still relevant and in use, (b) is no longer needed and should be retired, or (c) is in use but should be revised to ensure its continued value to the industry. Adoption and continuing feedback from users regarding standards-based capabilities allow developers to keep the standard updated to meet user needs.

Figure 1: Number of Individual Standards Included in the HITSP Biosurveillance IS 02

Standard category                                   | Number of standards
Semantic Interoperability (Content)                 |
  Data Standards                                    | 28
  Information Standards                             | 17
Technical Interoperability (Transport)              |
  Information Exchange Standards                    | 46
  Identifier Standards                              | 11
  Privacy and Security Standards                    | 5
Functional Interoperability (Rules)                 |
  Functional Standard                               | 1
  Business Processes Standards (guidelines, best practices, use cases) | 1
Total                                               | 109
Figure 2 presents the interoperability standard
framework with various RSP components and enablers.
Ensuring Patient Safety through
Interoperability Standards
One of the keys to health IT adoption is ensuring patient safety in the use of standards-based health information and communication technology. Specific aspects of ensuring health technology safety through standardization are specified
in the ISO/IEC 80001 standard published in 2010.8 It was up for
a systematic review in 2015. This standard was born out of the
recognition that networked medical devices are increasingly
being deployed on general purpose IT infrastructure. Though
the manufacturers have to rigorously apply risk management
to identify and manage potential safety issues and receive regulatory clearance to place their technology on the market, once
the product is purchased, implemented, and placed in use, risk
management processes are rarely applied to the resulting network of integrated devices, health information and communication technology systems, and applications. Unintended consequences that compromise patient safety had been occurring
far too frequently and overall confidence in the technology had
been suffering accordingly.

Figure 2: Interoperability Standard Framework: Components and Enablers
Specific safety risks associated with non-interoperability of
health IT products include:
– Data quality, misidentification and integration of patient data from multiple sources (record matching on a patient represents a critical record management step, so that the information from one patient cannot be added to the chart of another patient)
– Data accuracy, availability, and integrity issues due to configuration, security, or IT operations failures
– Decision support failures due to incorrect or outdated medical logic, reference data, algorithms, or alert triggers
– Failures and inconsistencies in delivery, integration, or presentation of diagnostic information results
– Failures and inconsistencies in delivery, integration, or presentation of therapy information (such as radiotherapy information)
– Insufficient attention to workflow, human factors, change management, or training of clinicians
– Privacy breaches, data governance issues, or other causes that erode provider and consumer confidence
Risk management for health information and communication technology can be formulated in four questions:
1. What can go wrong?
2. How can it happen?
3. What can be done about it?
4. How do we know we have done enough?
Evaluation of the ISO/IEC 80001 standard under the ISO systematic review process demonstrated that:
– Yes, the ISO/IEC 80001 standard remains highly relevant, even more so given the increasingly complex health IT environments and the increased integration of medical devices and various health IT products
– No, the ISO/IEC 80001 standard has not been widely implemented, although it is widely recognized as a key component of addressing safety and security when interoperable technology is deployed
– Yes, the state of the art in health information and communication technology has advanced
– Yes, much has been learned about what is needed to ensure the safe use of information in healthcare
There is a need for new understanding of medical device safety, and of the safety of any collection of objects running software
and being connected, such as the Internet of Things, in the context of a specific use or use case.
The ISO/TC 215 Health Software Ad Hoc Group looked at the broader issue of health software safety standards, acknowledging that
“while our initial focus was on health software, we have recognized that the architecture of health software safety standards
must also address the safety of the broader health IT system,
and the socio-technical environment of which health software
is a component.”
This “environment” includes not only the information technology (i.e., hardware, software, networks, interfaces to other
systems and data), but also the:
– People (i.e., clinicians, patients, consumers, caregivers, administrators)
– Care processes (i.e., clinical workflow, decision algorithms and care protocols)
– Organization (i.e., capacity, governance, configuration decisions about how health IT is applied)
– External environment (i.e., regulations, public opinion, ambient conditions)
The group further focused on defining end-to-end safety management strategy, leveraging standards in areas such as risk,
quality, security, IT lifecycle, information governance, etc., and
identifying gaps that need to be filled. The report finalized during
spring 2015 identified the technology lifecycle over which safety
must be established and maintained. Eight key topics are integral
to achieving health information and communication technology safety. Grouped under the three categories—people, technology, and policies—they include:
People
1. Organization’s culture, roles, and competencies
2. Human factors, usability, and change management
Technology
3. Systems and software lifecycle processes
4. Safety management processes across software lifecycle
Policies
5. IT and information governance
6. Risk management
7. Quality management
8. Information privacy and security management
The ISO/IEC 80001 standard is an example of a standard that
will be included in the ISO/TC 215 RSP (bundle) to ensure that
standards included in the RSP properly address risks associated with semantic, technical, and functional components of
interoperability.
Notes
1. Health Information Technology Standardization Panel
(HITSP). “Biosurveillance Interoperability Specification
(IS) Number 02.” 2009. www.hitsp.org/InteroperabilitySet_Details.aspx?MasterIS=true&InteroperabilityId=49&P
refixAlpha=1&APrefix=IS&PrefixNumeric=02.
2. Digital Imaging and Communication in Medicine (DICOM). http://dicom.nema.org.
3. Tech Terms. “Payload definition.” www.techterms.com/
definition/payload.
4. Cohasset Associates and AHIMA. “A Call to Adopt Information Governance Practices: 2014 Information Governance in Healthcare.” 2014. www.ahima.org/~/media/
AHIMA/Files/HIM-Trends/IG_Benchmarking.ashx.
5. Cohasset Associates and AHIMA. “Professional Readiness
and Opportunity: 2015 Information Governance in Healthcare.” 2015. www.ahima.org/~/media/AHIMA/Files/HIMTrends/IGSurveyWhitePaperCR_7_27.ashx?la=en.
6. AHIMA. “A Call to Adopt Information Governance… .”
7. Ibid.
8. International Organization for Standardization (ISO)
and International Electrotechnical Commission
(IEC). “ISO/IEC 80001-1:2010. Application of risk management for IT-networks incorporating medical devices—Part 1: Roles, responsibilities and activities.”
October 1, 2010. www.iso.org/iso/catalogue_detail.
htm?csnumber=44863.
Michael L. Glickman (MGlickman@CNAInc.com) is CEO of Computer Network Architects and chair of ISO/TC 215 Health Informatics. Anna Orlova
(anna.orlova@ahima.org) is senior director for standards at AHIMA and an
ISO/TC 215 member.
Discussion Topic: Health Care Standard Setting
Define health care standard setting as it relates to patient safety. Next, discuss
your experience with the Vila Health scenario as it relates to understanding
patient safety standards.
Below is a scenario from Vila Health. Your solution for this discussion will focus on the problems Vila Health is facing in remaining profitable under regulatory demands: some of its profits have been directed to improving other areas solely to satisfy CMS, Joint Commission, and CDC regulations.

What can the organization do to strike the right balance between making a profit and meeting demands from regulatory agencies? Make recommendations, based on the resources provided, in two double-spaced pages.
Regulatory and Compliance Inventory
From: Frederick Mora, Director of Quality Management
To: Sam
Sam,
At yesterday afternoon’s Advisory Board meeting, there was a lot of concern about that bad
inspection Clarion Court received from the state Department of Health, as well as some disturbing
long-term trends in their reported quality measures. It is of course completely unacceptable to have
black marks like these associated with one of our facilities when we, as an organization, work so
hard to be known for quality and safety.
It’s clear that we are going to need to take a very close look at Clarion Court’s day-to-day operations.
But before we do that, there’s wide agreement that there are broader concerns here at Vila Health
corporate. Things have changed so much so quickly on the regulatory and accreditation front that we
may have become out of touch with the realities on the ground. If we’re going to straighten things up
at individual facilities, that just won’t do.
I came out of that meeting with a mandate for information gathering. We need to amass perspectives
from across the organization and determine exactly where we are, particularly in terms of meeting
basic accreditation requirements and complying with ever-changing regulations. To this end, I’m
asking the members of our QA team to consult with leadership of all of our facilities and get their
perspective on their regulatory and compliance situations. I’m dividing the portfolio of Vila Health
facilities among the different members of the QA team; I would like you to focus on Valley City
Regional Hospital and of course the Clarion Court skilled nursing facility. I’ve prepared a brief fact
sheet that provides some essential information about each of them.
I’d particularly like you to talk to people about three things: which regulatory bodies they’re
concerned with in their unit, what the key regulation from that body is, and what their difficulties are
in complying with that regulation. Take good notes!
Frederick
Facilities Fact Sheet
Vila Health is proud to offer exceptional, compassionate health care services across the upper
Midwest, serving a wide variety of communities and situations.
Clarion Court Skilled Nursing Facility (Burnsville, MN)
Clarion Court is a 112-bed skilled nursing facility that has been part of the Vila Health network since
1983. Clarion Court provides residents of the southern Twin Cities area with cutting-edge senior
care.
1479 Riverwood Drive, Burnsville, MN 55337
(855) 556-2577
Valley City Regional Hospital (Valley City, ND)
Offering primary care and specialty services for both inpatient and outpatient clients, Valley City
Regional Hospital is a 60-bed facility serving the greater Valley City region. Our physicians are
connected to the award-winning health services throughout the Vila Health network.
721 Chautauqua Blvd., Valley City, ND 58072
(701) 846-7700