Using probability and z scores to support research results

Discussion:


This week was our first foray into the field of probabilities and z scores.  Probability questions often confuse even scientifically inclined people, yet probabilities and their associated z scores are foundational to statistics and research design.  For this discussion thread, answer the three questions below.

(1) Based on the readings this week, what do you (as a researcher) think is the most useful probability tenet/concept for supporting the results of your research?

(2) In this course (and likely several other courses in your graduate program), you will encounter statements about some result being “statistically significant”.  If you had to explain “statistically significant” to someone with little or no prior background in probability theory, how would you explain it?  [For this question, just do your best to reduce this complex concept into the simplest explanation you can; again, there is no one right answer.]

(3) Using any of the articles you identified earlier in this course for use in your research, provide an example of either (a) how the author(s) used probabilities or z scores in their research, or (b) perhaps could have used probabilities or z scores in their research.  Note: Not every research article will include probabilities or z scores and not all research would necessarily require their use.  If you think this is the case for your selected article, then state so and provide your rationale as to why based on the readings this week.

©Richard Lowry, 1999-2021
All rights reserved.

Chapter 5. Basic Concepts of Probability
Part I

“Not chaos-like, together crush’d and bruis’d,
But, as the world, harmoniously confused

Where order in variety we see,
And where, tho’ all things differ, all agree.”

—Pope, Windsor Forest

Imagine you are sitting in your statistics class one day when a man walks in and proclaims:
“I have developed the power of mind over matter. If one of you will stand in one corner of this
room and toss 100 pennies, I will stand in the opposite corner and, through the sheer power of
thought, cause those penny tosses to come up heads. Now mind you, I don’t claim that each and
every toss will come up heads—I haven’t yet perfected this skill to the level of 100 percent. But
what I will do is produce an impressive number of heads, to the point where you will at least be
willing to take my claim seriously.”

And so your class decides to put him to the test. One student stands in the front left corner of
the room to toss the pennies one-by-one; another is positioned nearby to record the head or
tail outcome of each toss; and the man who has made the claim stands in the right rear
corner of the room, attempting to exert the sheer power of his thought upon the outcomes.
The rest of the class watches closely to make sure that there is no hanky-panky or collusion.

I have concocted this little scenario to show that you already have some good, solid intuitions
about the generalities of probability, even though you might not yet know anything at all
about its technical details. You can begin eliciting these intuitions by asking yourself the
question: How many heads would have to turn up in the 100 penny tosses before I would find
the results “impressive” enough to take the man’s claim seriously? When I ask the students in
my own statistics classes to reflect on this question, an occasional ardent skeptic will proclaim
that he or she would not be impressed even if 100% of the tosses turned up as heads. Most,
however, say that their threshold for being sufficiently impressed falls somewhere between
70% and 80%, that is, between 70 and 80 heads out of 100 tosses. Never in all my years
before the blackboard have I found a student who would be impressed with only 50% heads,
nor with anything less than 50% heads.

But of course not, you will say—obviously! And indeed it is obvious—but do pause for a
moment to consider why you find it so. I expect it comes down to something like this.
Somewhere within you is the idea that a standard minted coin, on any particular toss, has a
50% chance of coming up heads and a 50% chance of coming up tails. And then there is a
process of inference. For some students it might take place step-by-step as a deliberate
conscious process; for others it is perhaps a kind of global intuitive leap. Either way, here is
the underlying logic of it. If any particular toss of a coin has a 50% chance of coming up
heads and a 50% chance of coming up tails, then in any multiplicity of coin tosses we would
expect the total numbers of heads and tails outcomes to be about evenly divided, about half-
and-half, 50/50. Thus you would not be impressed if our mind-over-matter claimant came up
with only 50 heads out of 100 tosses, because that is just about what you would expect of
any set of coin tosses, irrespective of whether anyone was there trying to zap the tosses with
the sheer power of thought.

I expect you would also not be impressed by 51 heads out of 100, nor 52, nor 53. But what
about 63, or 73, or 83? On the other hand, all but the most fervent skeptic would be
impressed with 100 heads out of 100. But what about 95, or 85, or 75? In brief, where do you
draw the line? Somewhere between 50 heads and 100 heads is a point where a rational and
open-minded person could reasonably say “I am impressed.” Intuitively, you know the line is
there, and you probably have an intuitive hunch of its general vicinity. Let us suppose that
after reflecting on this question you decide that the line falls somewhere in the vicinity of
70%. Anything less than 70 heads out of 100 will not impress you; an even 70 heads out of
100 will begin to impress you; 71 out of 100 will impress you a bit more; 72 out of 100 still
more; and so on.

Now for another question, also aimed at eliciting some intuitions that you already have about
the subject. Suppose that the test of the man’s claim had involved only 10 penny tosses. The
question is, would you in this case still draw the line at 70%? That is, would you be anywhere
near as impressed with 7 heads out of 10 tosses as you would be with 70 heads out of 100
tosses? I expect your answer to this question will be an immediate “no,” based on a strong
underlying intuition to the effect that a 70% line could be much more easily reached or
exceeded with only 10 tosses than it could be with as many as 100 tosses. Later in this
chapter we will examine inferential procedures by which you can determine that the
respective likelihoods for these two outcomes occurring by mere chance are

OUTCOME                                   LIKELIHOOD

70% or more heads in 10 tosses:           17%
70% or more heads in 100 tosses:          0.005%

The first of these would fall far short of the standard 5% criterion of statistical significance
introduced in Chapter 4 (i.e., equal to or less than 5%), while the second would far surpass it.
So if our mind-over-matter claimant actually were to get as many as 70% heads in 100
tosses, you could allow yourself to be very impressed indeed—assuming of course that the
pennies were not somehow mechanically biased in favor of heads, that there was no collusion,
that the outcomes of the tosses were accurately recorded, and so on. For it is very highly
unlikely (5/1000ths of 1%) that he or anyone else would get as many as 70% heads in 100
tosses by mere chance coincidence.
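
If you would like to verify these two likelihoods yourself, the exact binomial arithmetic developed
later in this chapter can be expressed in a few lines of Python. The sketch below is not part of the
original text; the function name is illustrative, and a fair coin (p = .5) is assumed.

    from math import comb

    def tail_probability(n, k_min, p=0.5):
        """Probability of k_min or more successes in n independent trials."""
        q = 1 - p
        return sum(comb(n, k) * p**k * q**(n - k) for k in range(k_min, n + 1))

    print(tail_probability(10, 7))    # about 0.17 -- roughly 17%
    print(tail_probability(100, 70))  # about 0.00004 -- a few thousandths of one percent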

But more of such details later. First we must lay some foundations. A couple of centuries ago
the great mathematician Laplace observed that the theory of probability is “at bottom only
common sense reduced to calculation.” I think a better way of putting it would be to say that
the theory of probability is common sense expanded by calculation—for in fact there are many
aspects of the subject that go well beyond common sense, and some that are even flatly
contradictory of common sense. Still, the conceptual and computational apparatus of
probability does have its roots in common sense, and for that reason you will find you have a
substantial head start in the task of studying it. As we begin developing the subject, please do
not be lulled if you find some of the illustrative examples—coin tosses and the like—to be
rather trivial. Keep your eye not on the examples, but on the general concepts that lie behind
them. And bear in mind that these concepts are the foundation of all statistical inference, and
thus of virtually everything else in this text that will follow.

Elementary Probabilities: “common sense reduced to calculation”

The basic concept is an idea of utter simplicity. Imagine you have four small balls, similar in
every respect except that one is red and the other three are blue. Place the balls in a box,
close the lid, and shake the box so as to jumble the order of its contents. Now reach into the
box and blindly withdraw one ball—but before you do so, place a bet on whether the ball you
draw will be red or blue. Assuming that you would want to place your bet rationally rather
than just arbitrarily, the task is to determine which of the two colors has the greater chance of
being drawn. And that is tantamount to asking which color has the greater probability of being
drawn. The common sense of it is this. If you are blindly drawing 1 ball out of 4, and if these
4 balls include 3 blue and 1 red, then you have 3 chances out of 4 of drawing a blue ball, but
only 1 chance out of 4 of drawing a red one. Thus the rational choice would be to bet on blue,
for while you would still have 1 chance out of 4 of losing your bet, you would have 3 chances
out of 4 of winning it. A bet on red, on the other hand, would have only 1 chance out of 4 of
winning and 3 chances out of 4 of losing.

Although the calculations to which this common sense is “reduced” can sometimes grow quite
complex, the basic operation is just elementary arithmetic. It amounts to taking common-
sense concepts such as “3 chances out of 4” and converting them into a meaningful and
useful numerical form. In general, for any common-sense concept of the form “this particular
event has x chances out of y of occurring,” the probability of that event can be defined
numerically as the ratio of x to y. Thus the probability, P, of drawing a blue ball (with 3
chances out of 4) is

P(blue) = 3/4 = .75

and the probability of drawing a red ball (with 1 chance out of 4) is

P(red) = 1/4 = .25

So far, so good. Calculating that the respective probabilities are P(blue)=.75 and P(red)=.25,
you rationally place your bet on drawing a blue ball. The bet is down, you blindly reach into
the box and draw a ball—and it is red! The moral of this scenario is that a probability value
such as P(blue)=.75 or P(red)=.25 is not a reliable predictor of the outcome of a singular
event. It is a reliable predictor only in reference to a multiplicity of events; and the greater
the number of such events, the more reliable the prediction. For the present example, the
practical predictive meaning of the two probability values is this: If you were to perform the
ball-drawing operation an indefinitely large number of times (returning the drawn ball to the
box and shaking the box prior to each new draw), you would draw one or another of the blue
balls in 75% of the cases and the red ball in 25% of the cases. If you were to do it only a few
times, your observed percentages of blue and red would perhaps differ markedly from these
theoretical values of 75% and 25%; however, the more you repeated the operation, the closer
they would come.
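
The long-run behavior described in this paragraph is easy to see in a quick simulation. The following
sketch is illustrative rather than anything from the text; it repeats the draw-and-replace operation a
chosen number of times and reports the observed proportion of blue draws.

    import random

    def blue_proportion(draws):
        """Draw one ball (3 blue, 1 red) with replacement `draws` times; return the proportion of blue."""
        box = ["blue", "blue", "blue", "red"]
        hits = sum(1 for _ in range(draws) if random.choice(box) == "blue")
        return hits / draws

    for draws in (10, 100, 100_000):
        print(draws, blue_proportion(draws))
    # With only 10 draws the observed proportion may stray well away from .75;
    # with 100,000 draws it will come very close to the theoretical value.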

To take another example, imagine a statistics class containing 30 students, of whom 7 are
freshmen, 12 are sophomores, 10 are juniors, and one is a senior. If you were to select one
student at random from this class, what is the probability that the student you select will be a
freshman? As there is a total of 30 students in the class, of whom exactly 7 are freshmen,
that probability is

P(freshman) = 7/30 = .2333

By the same reasoning, the probability of selecting a sophomore is

P(sophomore) = 12/30 = .40

and so on for the categories junior and senior:

P(junior) = 10/30 = .3333

P(senior) = 1/30 = .0333

These examples illustrate what is known as the relative frequency concept of probability, so
named because it defines the probability of an event in terms of the number (or frequency) of
possibilities favorable to the occurrence of that event, relative to the total number of
possibilities. Thus, in its general conceptual structure, the probability of the occurrence of a
certain event, x, is framed as

P(x) = (number of possibilities favorable to the occurrence of x) / (total number of pertinent possibilities)

So the probability of randomly drawing a blue ball from a box is

P(blue) = (the total number of blue balls in the box) / (the total number of balls in the box)

and the probability of randomly selecting a sophomore from a statistics class is

P(sophomore) = (the number of sophomores in the class) / (the total number of students in the class)

Similarly, if we were to select at random one person from a room full of persons, the
probability of selecting a woman would be

P(woman) = (the number of women in the room) / (the total number of persons in the room)

It will be fairly evident that probability values determined in this fashion will always fall
somewhere between P=0 and P=1.0, inclusive; for the operation always involves the division
of one number by another number, and the divisor is in every case equal to or larger than the
number it divides. Abstract though this point might seem, it makes direct contact with
common sense. If there are no women in a room of 100 persons, then the probability of
randomly selecting a woman from that room is nil. Hence

P(woman) = 0/100 = 0

Conversely, if the room with 100 persons contains only women, the probability of selecting a
woman is what common sense would call one-hundred percent:

P(woman) = 100/100 = 1.0

If the number of women in the room is greater than zero but less than 100, then the
probability of selecting a woman will fall somewhere in between P=0 and P=1.0; and the
greater the proportion of women, the greater the probability. That, indeed, is exactly what a
probability value is—a statement of proportion. Multiply any probability value by 100, and you
convert it into a common-sense statement of percentage.
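
The relative-frequency definition is simple enough to capture in a one-line helper. This small sketch
is an illustration only (the function name is mine, not the author's); it reproduces the probabilities
computed above for the 30-student class.

    def probability(favorable, total):
        """Relative-frequency probability: favorable possibilities divided by total possibilities."""
        return favorable / total

    print(probability(7, 30))   # P(freshman)  = .2333
    print(probability(12, 30))  # P(sophomore) = .40
    print(probability(10, 30))  # P(junior)    = .3333
    print(probability(1, 30))   # P(senior)    = .0333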

There are certain kinds of situations where the particular values for “number of possibilities
favorable to the occurrence of x” and “total number of pertinent possibilities” can be precisely
known in advance, either by physically counting all the relevant possibilities, or else by
enumerating them through logical analysis. In a case of this sort, the resulting probability
value is said to be determined a priori. With our example of balls in a box, for instance, we
know in advance that the box contains exactly 4 balls, of which exactly 3 are blue. Nothing
else is needed. Given these prior facts, we can then proceed by “pure logic” to conclude that
the probability of blindly drawing a blue ball from the box is exactly P(blue)=.75. A variation
on the theme of a priori probability reasoning is illustrated by our introductory example of
tossing pennies. From everything we know about the physical properties of pennies and the
physical principles involved in flipping them, it is reasonable to assume that the two possible
outcomes, heads and tails, are equally likely. Thus, P(head)=1/2=.5 and P(tail)=1/2=.5.

There are many other kinds of situations, however, where the appropriate probability value
cannot be known precisely in advance, but rather must be estimated on the basis of observing
a large number of actual instances. Here the determination is said to be a posteriori. This is
what lurks behind the scenes when a meteorologist tells us there is a 30-percent probability
of rain today, or when a physician tells a patient there is an 87-percent chance that a certain
surgical procedure will be successful. For the meteorologist it is an estimate based on
observations of the frequency of rain in the past under similar combinations of temperature,
humidity, atmospheric pressure, and other relevant factors:

P(rain) = (number of previously observed similar days that produced rain) / (total number of previously observed similar days)

while for the physician it is an estimate based on the previously observed success rate of this
particular surgical procedure for patients of this particular age, gender, physical condition, and
so forth:

P(success) = (number of similar patients for whom the surgery proved successful) / (total number of similar patients on whom the surgery was performed)

I expect it is intuitively obvious to you that the confidence one can have in such a probability
estimate increases in proportion to the number of observations on which it is based. We will
make more of this point later in the present chapter and in other chapters as well.

End of Chapter 5, Part I.

Chapter 5. Basic Concepts of Probability
Part II

Compound Probabilities: Common Sense Expanded by Calculation

The usefulness of “reducing” common-sense concepts of probability to precise statements of
proportionality is that they can then be subjected to the powerful analytical apparatus of
mathematical calculation. Under certain circumstances they can be added, subtracted,
multiplied, or divided. And while these operations themselves are elementary and
commonplace, they end up telling us things about probability that go far beyond the raw
intuitions of common sense.

There are basically two ways in which individual probability values can be linked together
mathematically, and these in turn correspond to two basic kinds of logical linkage. The first is
associated with the common-sense meaning of the word “and,” and the second with the
common-sense meaning of the word “or.” In formal logic, the relationships denoted by these
words are spoken of as conjunction and disjunction, respectively. Conjunctive probability
questions take the general form, “What is the probability of having A and B occur?” and
disjunctive questions take the form, “What is the probability of having A or B occur?”
(Compound probabilities are sometimes described in the language of set theory. In this case,
conjunction will be spoken of as “intersection” and disjunction will be described as “union.”)

¶Conjunctive Probabilities: ‘A and B’, ‘A and B and C’, etc.

Here again the basic concept is firmly rooted in common sense. Suppose you were to toss a
penny twice. There is a 50% chance of getting a head on the first toss; and then, if you do
get a head on the first toss, there is also a 50% chance of getting a head on the second toss.
The probability that you will get a head on the first toss and on the second toss is therefore
50% of 50%. The rest is just elementary arithmetic. One-half of one-half is one-quarter; 50%
of 50% is 25%; or in decimal form, .5x.5=.25. If you were to toss a penny three times, the
probability that all three tosses would come up heads would be 50% of 50% of 50%, which is
12.5%; or again in decimal form, .5x.5x.5=.125. And so on for four tosses, five, ten, a
hundred, or a thousand.

The general principle is that conjunctive probabilities are linked together by the mathematical
act of multiplication. Thus, in the abstract, the probability of having A and B occur is the
product of the two separate component probabilities for A and B:

P(A and B) = P(A) x P(B)

The probability of having A and B and C occur is the product of the three separate component
probabilities for A and B and C:

P(A and B and C) = P(A) x P(B) x P(C)

and so on.
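
As a quick numerical check of the multiplication rule for independent events, here is a minimal,
purely illustrative sketch:

    # Probability that every toss of a fair coin comes up heads, for 2, 3, and 10 tosses
    for n in (2, 3, 10):
        print(n, 0.5 ** n)
    # 2 -> 0.25, 3 -> 0.125, 10 -> about 0.001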

In the simple case of repeatedly tossing a coin, the probability of getting a head on any
particular toss is completely independent of the outcome of any other toss, past, present, or
future. If you get a head on the first toss, the probability of getting a head on the second toss
is P(H)=.5; and if you get a tail on the first toss, the probability of getting a head on the
second toss is also P(H)=.5. No matter how many times you have tossed the coin, no matter
how many heads have already come up, or how many tails, the probability of getting a head
on the next toss is still exactly P(H)=.5. There are many other kinds of situations, however,
where the probability of an event is not independent but dependent—that is, where the
probability of one event depends on the outcome of some other event.

We will illustrate this distinction with an example that will also show how calculated
probabilities sometimes end up telling a story quite different from what common-sense
intuitions might have led you to expect. Imagine a room that contains 4 females and 6 males.
Question 1: If you were to select 3 persons from this room at random, what is the probability
that all 3 would be females? And question 2: If you were to select 3 persons from this room at
random, what is the probability that all 3 would be males? Since the room contains more
males than females, you will surely see intuitively that the probability for all-3-males will be
greater than the probability for all-3-females. But I expect it will not be intuitively obvious to
you at all that the probability for all-3-males is more than five times as great as the
probability for all-3-females. Here is how it works. The critical point to take note of is that the
probability of the outcome for each successive draw, after the first, depends on the
outcome(s) of the preceding draw(s).

First for the possibility that all 3 of the persons selected will be females. Clearly, the
probability of selecting a female on the first draw is the ratio 4/10, since there are 10 persons
in the room, of whom 4 are females. But then, once you have made your first selection, there
remain only 9 persons in the room from whom to make your second selection; and if your
first selection is a female, then only 3 of the remaining persons are females. Thus, if your first
selection happens to be a female, the probability of selecting a female on the second draw is
not 4/10, but rather 3/9.

After the second selection there are only 8 persons left in the room; and if both of the persons
already drawn are females, then only 2 of these 8 remaining are females. Thus, the
probability that the third draw will also be a female is not 4/10 nor 3/9, but rather 2/8. The
probability of selecting females in all three draws in this situation is therefore

P(all 3 females) = (4/10)x(3/9)x(2/8) = .033

The logic is the same for the possibility of selecting males on all 3 draws. The probability of
selecting a male on the first draw is 6/10, since the room contains 10 persons, of whom 6 are
males. But then, once you have made your first selection, there remain only 9 persons in the
room from whom to make your second selection; and if your first selection is a male, then
only 5 of the persons remaining are males. Thus, if your first selection happens to be a male,
the probability of selecting a male on the second draw is not 6/10, but rather 5/9. After the
second selection there are only 8 persons left in the room; and if both of the persons already
drawn are males, then only 4 of these 8 remaining are males. Thus, the probability that the
third draw will also be a male is not 6/10 nor 5/9, but rather 4/8. The probability of selecting
males on all three draws in this situation is therefore

P(all 3 males) = (6/10)x(5/9)x(4/8) = .167

which you will note is about 5 times larger than the P=.033 probability of selecting females on
all three draws.
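
The same dependent-draw logic can be spelled out with exact fractions, which makes the five-fold
difference easy to verify. This sketch is illustrative only; the room composition (4 females, 6 males)
is taken from the example above.

    from fractions import Fraction

    # Selecting 3 people without replacement from a room of 4 females and 6 males
    p_all_females = Fraction(4, 10) * Fraction(3, 9) * Fraction(2, 8)
    p_all_males   = Fraction(6, 10) * Fraction(5, 9) * Fraction(4, 8)

    print(float(p_all_females))                # about .033
    print(float(p_all_males))                  # about .167
    print(float(p_all_males / p_all_females))  # 5.0 -- five times as great when computed exactly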

In both of these calculations the basic principle is the same as the one we outlined at the
beginning of this section: the conjunctive probability of any two or more events is equal to the
product of their separate component probabilities. It is simply a matter of making sure that
the values assigned to the component probabilities are the right ones.

¶Disjunctive Probabilities: ‘A or B’, ‘A or B or C’, etc.

The principle of disjunctive probabilities is even simpler. If you toss a coin, there is a 50%
chance that it will come up heads, a 50% chance that it will come up tails, and thus a 100%
chance that it will come up either heads or tails. With each lottery ticket you purchase you
have a 1 in 100 million chance of winning. With two lottery tickets your chance of winning
with either the first or the second is therefore 2 in 100 million. With three tickets, the chance
of winning with either the first or the second or the third is 3 in 100 million; and so forth.
When two or more chance events are disjunctively linked by the word “or,” the corresponding
mathematical linkage is just simple addition. Thus, the probability of having A or B occur is
equal to the sum of the two component probabilities for A and B:

P(A or B) = P(A) + P(B)

The probability of having A or B or C occur is the sum of the three separate component
probabilities for A and B and C:

P(A or B or C) = P(A) + P(B) + P(C)

and so on.

To put it concretely, imagine a class of 30 students composed of 4 freshmen, 12 sophomores,
10 juniors, and 4 seniors. The instructor announces that one of these students will be
randomly selected to win the class lottery prize, which is an automatic A in the course. The
probability that the lucky winner will be either a freshman or a sophomore is

(4 freshmen / 30 students) + (12 sophomores / 30 students) = 16/30 = .53

and the probability that it will be either a freshman or a sophomore or a junior is

(4 freshmen / 30 students) + (12 sophomores / 30 students) + (10 juniors / 30 students) = 26/30 = .87

The restriction on this additive operation is that component probabilities can be added
together in this simple fashion only when they pertain to possibilities that are mutually
exclusive. In the example that we have just examined, each of the four academic-class
categories excludes all the others. If you are a freshman, you cannot at the same time be a
sophomore or a junior or a senior. If you are a sophomore, you cannot at the same time be a
freshman or a junior or a senior. And so forth.

But now consider the following variation on this classroom theme. Suppose the class has a
total of 26 students, of whom 12 are sophomores and 14 are juniors. Of the sophomores, 7
are females and 5 are males; and of the 14 juniors, 8 are females and 6 are males.
Question: In randomly selecting one of the members of this class to win the lottery, what is
the probability that the student selected will be either a sophomore or a female?

Here is how not to answer the question. The probability of selecting a sophomore is
12/26 (right!), and the probability of selecting a female is 15/26 (right!). The probability of
selecting either a sophomore or a female is therefore

P(soph. or female) = P(soph.) + P(female) = 12/26 + 15/26 = 27/26 = 1.038  [Wrong!]

Applying the simple additive formula in this situation would lead you to conclude that the
chance of selecting either a sophomore or a female is about 104%—which is patently absurd.
For reasons described earlier, the probability that any particular event or outcome will occur
must always fall within the range bounded at the bottom by P=0 (0%) and at the top by
P=1.00 (100%).

The reason why simple addition does not work in this situation can be gleaned by going back
and looking closely at some details. When you say that the component probability of selecting
a sophomore is 12/26, what you are really saying is

P(soph.) = (7 soph. females + 5 soph. males) / 26 students = 12/26

And when you say that the component probability of selecting a female is 15/26, what you are
really saying is

P(female) = (7 soph. females + 8 jr. females) / 26 students = 15/26

In and of themselves, these two component probabilities are quite correct—but now see what
happens when you add them together.

P(soph.) + P(female)
  = (7 soph. females + 5 soph. males) / 26 students  +  (7 soph. females + 8 jr. females) / 26 students
  = 12/26 + 15/26
  = 27/26
  = 1.038

The complication here is that the two components, sophomore and female, are not mutually
exclusive. Seven of the 26 students are both sophomores and females, and these 7 students
are being counted twice: once because they are sophomores, and then again because they
are females. If you are counting apples in a basket and mistakenly count some of them twice,
you end up with an inflated measure of the number of apples. If you are counting the
elements that enter into a probability calculation and count some of them twice, you end up
with an inflated measure of probability.

The method commonly recommended for correcting this inflation is to calculate the disjunctive
probability of “A or B” by simple addition, as we have just done, and then subtract from that
inflated sum the conjunctive (multiplicative) probability of “A and B.” That is, for cases where
the component probabilities are not mutually exclusive

P(A or B) = P(A) + P(B) - P(A and B)

The reason why this method works is that in subtracting the conjunctive probability of “A and
B,” you are in effect removing from the inflated sum the amount that was counted twice.

There are, however, two limitations of this method. The first is that the elemental
probabilities, P(A) and P(B), are sometimes not independent, in which case the conjunctive
portion of the above formula, “— P(A and B),” will involve some more or less complicated
conditional probabilities. Here again is an illustration of how not to do something.

P(soph. or female)
  = P(soph.) + P(female) - P(soph. and female)
  = (12/26) + (15/26) - [(12/26)(15/26)]
  = 1.038 - .266
  = .772  [Wrong!]

Although this result is not obviously preposterous, as was our earlier P=1.038, a moment’s
reflection will show that it clearly cannot be correct. Among the 26 students are 6, namely the
junior males, who are neither sophomores nor females. The probability of selecting one of
these is 6/26=.231; and the probability of selecting someone other than one of these, namely
a sophomore or a female, is accordingly 1 - .231 = .769. [Correct!]

The first result, .772, is not merely wrong by a certain small amount. It is wrong
fundamentally and in principle, because the probability of selecting someone who is both a
sophomore and a female is not simply

P(soph.) x P(female) = (12 soph. / 26 students) x (15 females / 26 students) = .266  [Wrong!]

Sure enough, the probability of selecting a sophomore is 12/26 (the number of sophomores
divided by the total number of students). But then, if you do select a sophomore, the
probability that this particular one of the 12 sophomores will also be a female is not 15/26
(the number of females divided by the total number of students), but rather 7/12 (the
number of female sophomores divided by the total number of sophomores). Hence

P(soph.) x P(female, given soph.)
  = (12 soph. / 26 students) x (7 soph. females / 12 soph.)
  = 7 soph. females / 26 students
  = .269  [Correct!]

So the true probability of selecting either a sophomore or a female, correcting for inflation and
taking account of the overlap between “sophomore” and “female,” is

P(soph. or female)
  = P(soph.) + P(female) - P(soph. and female)
  = (12/26) + (15/26) - [(12/26)(7/12)]
  = 1.038 - .269
  = .769  [Correct!]

which is of course identical with our earlier calculation of 1 - .231 = .769.
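
A direct count of the students gives the same answer and is a handy way to double-check any
overlapping-category calculation. A small illustrative sketch, using the class composition given above:

    # Class of 26: 7 sophomore females, 5 sophomore males, 8 junior females, 6 junior males
    students = (["soph_female"] * 7 + ["soph_male"] * 5 +
                ["jr_female"] * 8 + ["jr_male"] * 6)

    favorable = sum(1 for s in students if s.startswith("soph") or s.endswith("female"))
    print(favorable, len(students), favorable / len(students))  # 20 26 0.769...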

The second limitation is that even when the elemental probabilities are independent, the
method will not work when there are more than two items in the disjunction (A or B or C; A or
B or C or D; etc.). For example, suppose you are tossing 3 coins, A, B, and C. What is the
probability of getting a head (H) on either A or B or C? When you set it up in the manner just
described, the result is

P(H[A] or H[B] or H[C])
  = P(H[A]) + P(H[B]) + P(H[C]) - P(H[A] and H[B] and H[C])
  = (.5 + .5 + .5) - (.5 x .5 x .5)
  = 1.375  [Wrong!]

which, again, is patently absurd.

The method that will work in such multi-item disjunctions, and which I recommend for two-
item disjunctions as well, is actually quite a simple one. In any particular situation, there is a
certain probability that the event or outcome in question will occur, and a certain probability
that it will not occur; and taken together, these two complementary probabilities must
always add up to P(Total)=1.0. Thus, one way of determining the probability that a disjunction
(“A or B,” “A or B or C,” etc.) will occur is to figure out the probability that it will not occur,
and then subtract that amount from 1.0. That is

P(that x will occur)
  = P(Total) - P(that x will not occur)
  = 1 - P(that x will not occur)

For example, when tossing 3 coins, A, B, and C, the only way you could not get the
disjunctive outcome of a head (H) on either A or B or C would be to get the conjunctive
outcome of a tail (T) on A and B and C. The probability of getting a tail on all 3 of the tosses
is

.5x.5x.5 = .125

So the complementary probability of not getting a tail on all 3 tosses, which can happen only
if you get a head on at least one of the tosses—A or B or C—is

1.0 - .125 = .875
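
Expressed directly, the complement method is a one-liner. A minimal illustrative sketch:

    # P(at least one head in 3 tosses) = 1 - P(tails on all 3 tosses)
    print(1 - 0.5 ** 3)   # 0.875

    # The same idea scales to any number of independent events, e.g. 10 tosses:
    print(1 - 0.5 ** 10)  # about 0.999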

In principle, virtually any probability situation you might ever encounter within the context of
inferential statistics can be seen as a more or less complex combination of elemental
conjunctive and disjunctive probabilities. In actual practice you will not normally need to see it
this way, because most inferential statistical procedures are streamlined to the point where
these elemental constituents are no longer visible. Nonetheless, it is important that you have
a sense of what is going on behind the scenes in these streamlined procedures. In the next
section we will examine some of the basic ways in which conjunctive and disjunctive
probabilities can be combined, and then in Chapter 6 we will see how these potentially very
complex and unwieldy combinations can be transformed into structures of elegant simplicity.

End of Chapter 5, Part II.

Chapter 5. Basic Concepts of Probability
Part III

Conjunction and Disjunction as the Building Blocks of More Complex Probabilities

We will illustrate our points with an imaginary example taken from a medical context. There is
a certain disease that has an abrupt and unmistakable onset, and for which there is currently
no effective treatment. Of all the persons who come down with this disease, 40%
spontaneously recover within two months and the remaining 60% do not recover within two
months. As there is no way of predicting which patient will recover and which will not, the
two-month outcome for any particular patient is regarded as a matter of mere chance. Thus,
for any particular patient who comes down with the disease, the probability of spontaneously
recovering within two months is

P(recovery)=.4

and the probability of not recovering within two months is

P(non-recovery)=.6

(Note that these two probabilities are a posteriori, in the sense that they are based on the
rates of recovery and non-recovery in previously observed cases of the disease.)

Then comes a report from a field botanist working in the Brazilian rain forests of a certain
plant that the indigenous peoples of the region have long used to treat the disease,
apparently with some degree of success. A team of researchers reasons that perhaps the
plant contains an ingredient that could be developed into an effective medical treatment for
the disease. But as the isolation and refinement of that ingredient would surely prove quite
costly, they first undertake to determine experimentally whether an extract from the plant, in
its raw form, shows any effectiveness at all in treating the disease. In the next few
paragraphs we will be describing their two-stage effort to put the raw plant extract to an
experimental test—first, with a small sample of patients; and then, when the first experiment
proves promising, with a much larger sample. We will stipulate that the researchers determine
beforehand that the raw plant extract has no toxic effects. Also, please note that in both of
these investigations the experimental hypothesis is directional. The researchers are expecting
that the extract will produce a recovery rate significantly greater than the 40% rate
attributable to mere chance spontaneous recovery.

In their preliminary test of this hypothesis, the researchers administer the plant extract to 10
randomly selected patients, beginning for each patient immediately after the onset of the
disease and continuing for two months. As the experiment runs its course, the investigators
observe that 7 of the 10 patients have recovered within their respective two-month periods,
while 3 have not recovered.

Now clearly, the 70% recovery rate observed in this experiment is greater than the 40%
baseline rate for spontaneous recovery. But the question is, is it significantly greater? This is
the question of statistical significance, which you will recall is essentially a question of how
likely or unlikely it is that the observed result could have been produced by nothing more than
mere chance coincidence. Specifically for the present example, how likely or unlikely it is that
as many as 7 of the 10 patients could have recovered by mere chance (“spontaneously”), if
the plant extract was completely ineffective?

The first thing to notice about a question of this general type is that it has several layers. The
first layer is indicated by the phrase “as many as,” which reveals itself upon analysis to be
simply another way of saying “this many or more.” The logic of this point applies to scientific
research in general. In most cases, the pertinent probability question is not, What is the
probability of getting exactly this result? Rather it is, How likely is it that mere chance
coincidence might have produced a result “as large as this,” which is to say, “this large or
larger?” (Sometimes the question is in the opposite direction: “… this small or smaller?”) So
the first layer of the question takes the form of a disjunction: How likely is it that mere chance
coincidence could have produced as many as 7 recoveries out of 10, which is to say, either
exactly 7 recoveries out of 10, or exactly 8 out of 10, or 9 out of 10, or 10 out of 10?
Abbreviating “recoveries” as “R,” the formal expression would be

P(7 or more R out of 10)
= P(7R or 8R or 9R or 10R)
= P(7R) + P(8R) + P(9R) + P(10R)

So, essentially, it is just a task of adding up several component probabilities. The only
complication is that, in order to add them, you first have to get them. We will introduce the
logic of this type of task by examining a much simpler analogy involving coin tosses. If you
toss 3 coins, A, B, and C, what is the mere-chance probability of getting as many as 2 heads,
that is, either exactly 2 heads or exactly 3 heads? Here again (abbreviating “heads” as “H”),
the first statement of the question takes the form of a disjunction:

P(2 or more H in 3 tosses)
= P(2H or 3H)
= P(2H) + P(3H)

But once you start getting into its details, you find that even this relatively simple situation
has a rather complex structure. The source of the complexity is that, while there is only one
combination of heads (H) and tails (T) that will yield exactly 3 heads in 3 tosses:

A B C
H H H

there are several different ways of getting exactly two heads in 3 tosses, namely:
A B C
H H T
H T H
T H H

As illustrated in the following table, you can think of this complex situation as a network of
probability pathways. The network has two main branches, labeled 2H and 3H, of which the
first has three separate sub-branches, one for each of the three ways in which it is possible to
get exactly 2 heads in 3 tosses. Each separate branch or sub-branch represents a conjunctive
(multiplicative) probability, and where branches or sub-branches converge there is a
disjunctive (additive) probability. Thus, there are 3 separate sub-pathways for reaching the
outcome of exactly 2 heads, each with the conjunctive probability .5x.5x.5=.125. (Recall that
P(H)=.5 and P(T)=.5.) The disjunctive probability of reaching this outcome one way or the
other is therefore .125+.125+.125=.375. There is only one pathway for reaching the outcome
of exactly 3 heads; it also has a conjunctive probability of .5x.5x.5=.125. Thus, the overall
disjunctive probability of getting as many as 2 heads in 3 tosses (i.e., exactly 2 heads or
exactly 3 heads) is .375+.125=.5.

Outcome    Sub-pathway Probability          Main Pathway Probability

2H         HHT   .5 x .5 x .5 = .125
           HTH   .5 x .5 x .5 = .125        .375
           THH   .5 x .5 x .5 = .125

3H         HHH   .5 x .5 x .5 = .125        .125

                                   Total  = .5
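
The pathway bookkeeping in this table can also be done by brute-force enumeration, which some readers
find clarifying. The sketch below is illustrative, not from the text; it lists every head/tail
sequence for 3 coins and adds up the probabilities of those with 2 or more heads.

    from itertools import product

    # Every head (H) / tail (T) sequence for 3 fair coins has probability .5 x .5 x .5 = .125
    total = 0.0
    for outcome in product("HT", repeat=3):
        if outcome.count("H") >= 2:
            total += 0.5 ** 3
    print(total)  # 0.5 -- that is, .375 + .125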

Let us now apply the same logic to the more complex example of our medical experiment.
The task, recall, is to figure out the disjunctive probability of getting by mere chance, in the
absence of any effective treatment, one or the other of the following potential outcomes:

7 patients recover and 3 do not recover; or
8 patients recover and 2 do not recover; or
9 patients recover and 1 does not recover; or
all 10 patients recover

We will begin with the most extreme outcome, because that is the simplest. Just as there is
only one way of getting 3 heads in 3 coin tosses, there is only one way of getting 10
recoveries in 10 patients. Obviously that could occur only if patient 1 recovers, and patient 2
recovers, and patient 3 recovers, and so on through patient 10. (Recall that P(recovery)=.4 and
P(non-recovery)=.6.) The probability for the extreme outcome of 10 recoveries out of 10 is
therefore just the single conjunctive pathway

Patient:    1    2    3    4    5    6    7    8    9   10

           .4 x .4 x .4 x .4 x .4 x .4 x .4 x .4 x .4 x .4  =  .000,105

As we move on to the less extreme outcomes, however, we begin to find the pathways
branching into a multiplicity of sub-pathways. The extreme outcome of 10 recoveries out of
10 patients can be reached by only one route. The nearest less extreme outcome, 9
recoveries out of 10, on the other hand, can be reached by any one or another of 10 possible
sub-pathways. That is, this outcome would be produced by any one or another of 10 different
combinations of recoveries and non-recoveries. The first of these would be for the case where
all patients except patient 1 recover; the second would be for the case where all except
patient 2 recover; the third for the case where all except patient 3 recover; and so on, up
through the case where all patients except patient 10 recover. As shown by the following
calculations, this would constitute a total of 10 conjunctive sub-pathways for the potential
outcome of 9 recoveries out of 10 patients, each with a probability equal to .4^9 x .6 = .000,157.

Patient:    1    2    3    4    5    6    7    8    9   10

           .6 x .4 x .4 x .4 x .4 x .4 x .4 x .4 x .4 x .4  =  .4^9 x .6  =  .000,157
           .4 x .6 x .4 x .4 x .4 x .4 x .4 x .4 x .4 x .4  =  .4^9 x .6  =  .000,157
           .4 x .4 x .6 x .4 x .4 x .4 x .4 x .4 x .4 x .4  =  .4^9 x .6  =  .000,157
           .4 x .4 x .4 x .6 x .4 x .4 x .4 x .4 x .4 x .4  =  .4^9 x .6  =  .000,157
           .4 x .4 x .4 x .4 x .6 x .4 x .4 x .4 x .4 x .4  =  .4^9 x .6  =  .000,157
           .4 x .4 x .4 x .4 x .4 x .6 x .4 x .4 x .4 x .4  =  .4^9 x .6  =  .000,157
           .4 x .4 x .4 x .4 x .4 x .4 x .6 x .4 x .4 x .4  =  .4^9 x .6  =  .000,157
           .4 x .4 x .4 x .4 x .4 x .4 x .4 x .6 x .4 x .4  =  .4^9 x .6  =  .000,157
           .4 x .4 x .4 x .4 x .4 x .4 x .4 x .4 x .6 x .4  =  .4^9 x .6  =  .000,157
           .4 x .4 x .4 x .4 x .4 x .4 x .4 x .4 x .4 x .6  =  .4^9 x .6  =  .000,157

The disjunctive probability of reaching the outcome one way or the other is therefore the sum
of these 10 sub-pathway probabilities, which (since the 10 separate probabilities are all the
same) can be calculated multiplicatively as

10 x (.4^9 x .6) = .00157

The principle illustrated by this calculation applies to all such cases where we are interested in
the probability of getting “10 out of 10,” “9 out of 10,” “22 out of 30,” and so forth.
Abstractly, we can speak of it as the probability of getting “k out of N,” where k represents the
first number in such an expression and N represents the second. Two other abstract terms
that we will need for this formulation are p, which is the probability that the event in question
(e.g., the recovery of a patient) will occur in any particular instance; and q, which is the
complementary probability that the event in question will not occur. For all situations of this
general type, the number of ways in which it is possible to get the result of “k out of N” (10
out of 10, 9 out of 10, 8 out of 10, and so on) is given by the formula

number of ways (k out of N)  =  N! / [k! x (N-k)!]

and the probability of getting the result in any particular one of these ways is

p^k x q^(N-k)

Put it all together and you have

P(k out of N)  =  N! / [k! x (N-k)!]  x  p^k x q^(N-k)

(A brief refresher course on the factorial and exponential operations required to apply this
formula is provided in SideTrip 5.1).
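
In practice you can also let a computer handle the factorial and exponential operations. The sketch
below is an illustration rather than the author's code; Python's math.comb supplies the
N! / [k!(N-k)!] term.

    from math import comb

    def binomial_probability(k, n, p):
        """P(exactly k events out of N trials), with q = 1 - p."""
        q = 1 - p
        return comb(n, k) * p**k * q**(n - k)

    # The four pathway probabilities for the medical example (N=10, p=.4):
    for k in (7, 8, 9, 10):
        print(k, round(binomial_probability(k, 10, 0.4), 4))
    # 7 -> .0425, 8 -> .0106, 9 -> .0016, 10 -> .0001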

Listed below are the full calculations, using this formula, for the four pathway probabilities
that are needed to answer our original question concerning the probability of getting as many
as 7 recoveries out of 10 patients, by mere chance, in the absence of any effective treatment.
The results of the intermediate exponential operations are rounded to six decimal places, and
the final result of each calculation is rounded to four decimal places.

Probability that exactly 7 out of 10 patients will recover:

[N=10, k=7, p=.4, q=.6]

10! / (7! x 3!)  x  .4^7 x .6^3

  =  [3,628,800 / (5,040 x 6)]  x  (.001,638 x .216)

  =  120 x .000,354

  =  .0425

Translation: There are 120 different ways of reaching the result of 7 recoveries out of 10
patients, and each of those ways has a mere-chance probability of .000,354. The probability
of reaching the result one way or the other is therefore 120x.000,354=.0425.

Probability that exactly 8 out of 10 patients will recover:

[N=10, k=8, p=.4, q=.6]

10! / (8! x 2!)  x  .4^8 x .6^2

  =  [3,628,800 / (40,320 x 2)]  x  (.000,655 x .36)

  =  45 x .000,236

  =  .0106

Translation: There are 45 different ways of reaching the result of 8 recoveries out of 10
patients, and each of those ways has a mere-chance probability of .000,236. The probability
of reaching the result one way or the other is therefore 45x.000,236=.0106.

Probability that exactly 9 out of 10 patients will recover:

[N=10, k=9, p=.4, q=.6]

10! / (9! x 1!)  x  .4^9 x .6^1

  =  [3,628,800 / (362,880 x 1)]  x  (.000,262 x .6)

  =  10 x .000,157

  =  .0016

Translation: There are 10 different ways of reaching the result of 9 recoveries out of 10
patients, and each of those ways has a mere-chance probability of .000,157. The probability
of reaching the result one way or the other is therefore 10x.000,157=.0016.

Probability that all 10 patients will recover:

[N=10, k=10, p=.4, q=.6]

10! / (10! x 0!)  x  .4^10 x .6^0

  =  [3,628,800 / (3,628,800 x 1)]  x  (.000,105 x 1)

  =  1 x .000,105

  =  .0001

Translation: There is only one way of reaching the result of 10 recoveries out of 10 patients,
and that way has a mere-chance probability of .0001.

The full answer to the question is then simply the sum of these separate pathway
probabilities:

P(7 or more R out of 10)

= P(7R) + P(8R) + P(9R) + P(10R)

= .0425 + .0106 + .0016 + .0001 = .0548

Translation: There is a 5.48% likelihood that as many as 7 recoveries out of 10 patients would
occur by mere chance coincidence, in the absence of any effective treatment.
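
The same figure can be reproduced in one line by summing the binomial formula over k = 7 through 10.
A brief illustrative check:

    from math import comb

    p, q, n = 0.4, 0.6, 10
    p_7_or_more = sum(comb(n, k) * p**k * q**(n - k) for k in range(7, n + 1))
    print(round(p_7_or_more, 4))  # 0.0548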

You will recall from Chapter 4 that the standard criterion for statistical significance is the 5%
level. Any observed result that has a mere-chance likelihood equal to or less than 5% (P<.05) is regarded as significant, and any observed result that has a mere-chance likelihood greater than 5% (P>.05) is regarded as non-significant. By this criterion the present result (P=.0548)
falls short of significance, though only by a very slight distance. Most investigators would be
inclined to describe such a narrow miss as marginally significant, which is a compact way of
saying “not quite significant, but close enough to warrant further investigation.”

Following this custom, our medical researchers regard their results as promising and proceed
to put the plant extract to a much fuller test. Making use of the contacts they have with a
number of clinical facilities throughout the land, they arrange to have the extract administered
to a total of 1,000 patients, again beginning for each patient immediately after the onset of
the disease and continuing for two months. As the new experiment runs its course, the
investigators observe that 430 of the 1,000 patients have recovered within their respective
two-month periods, while 570 have not recovered. This 43% recovery rate is of course much
lower than the non-significant 70% rate observed in the first experiment—and, indeed, only
slightly above the 40% baseline rate of mere chance spontaneous recovery. But keep in mind
that we are now dealing with a much larger sample. It is the same point made earlier in our
introductory mind-over-matter example, concerning the difference between a 10-toss test and
a 100-toss test. Clearly, it would be much easier for mere chance to produce a 70% recovery
rate in a sample of 10 patients than in a sample of 1,000 patients. What you are now being
asked to consider is the possibility that for a sample of 1,000 patients even a recovery rate of
43% would be very unlikely to result from mere chance.

From the fact that I am bothering to use this “430 out of 1,000” scenario as an example, you
can safely assume that the result will prove significant, once we figure out the details. Except
for the difference in the numbers, the question of statistical significance here is exactly the
same as before: given that any particular patient has a 40% chance of spontaneous recovery
within two months, and a 60% chance of non-recovery, how likely or unlikely it is that as
many as 430 out of any particular set of 1000 patients could recover by mere chance, if the
plant extract has no effect whatsoever? Essentially, it is the disjunctive probability of getting
either 430 recoveries out of 1000, or 431 out of 1000, or 432 out of 1000, and so on, up
through 1000 out of 1000.

The logic for this question is exactly the same as for the one we have just examined. The only
difference is in the complexity of the details. For the probability of 7 or more recoveries out of
10 you only need to perform four main pathway calculations, whereas for 430 or more out of
1000 you need to perform a total of 571—and most of these would involve factorial and
exponential operations of rather staggering proportions. Take a few minutes to try to work out
just the first of these calculations

P(430 R out of 1000)  =  1000! / (430! x 570!)  x  .4^430 x .6^570

and you will find yourself hoping there might be an easier way. Fortunately, there is.
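
The easier way the author is pointing toward is developed in Chapter 6: the jagged binomial
distribution is approximated by a smooth normal curve, and the observed result is re-expressed as a
z score. As a preview, here is a hedged sketch of that approach; the continuity correction and the
use of math.erfc for the normal tail area are my own choices, not the text's.

    from math import sqrt, erfc

    N, p, q = 1000, 0.4, 0.6
    mean = N * p           # 400 expected spontaneous recoveries
    sd = sqrt(N * p * q)   # about 15.49

    # z score for 430 observed recoveries, with a 0.5 continuity correction
    z = (430 - 0.5 - mean) / sd
    p_value = 0.5 * erfc(z / sqrt(2))      # one-tailed area beyond z under the normal curve
    print(round(z, 2), round(p_value, 3))  # roughly z = 1.90, P = .028 -- below the .05 criterion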

End of Chapter 5.

Chapter 6.
Introduction to Probability Sampling Distributions:

From Jagged Complexity to Streamlined Simplicity

“Simplicity, simplicity, simplicity! I say, let
your affairs be as two or three, and not a

hundred or a thousand. … Simplify, simplify.”
—Thoreau, Walden

In the historical development of probability theory, the first step in the streamlining occurred
in connection with what are known as binomial probabilities. Although the term is perhaps
new to you, the concept it describes is simply an extension of the matters we have just been
discussing in Chapter 5. A binomial probability situation is one, such as the mere-chance
occurrence of heads when tossing coins, or the mere-chance recovery of patients from a
disease, for which you can specify

p    the probability that the event or outcome in question will occur in any particular instance.
     E.g., p=.5, the probability that any particular tossed coin will come up as a head; p=.4, the
     probability that any particular patient will spontaneously recover.

q    the probability that the event or outcome in question will not occur in any particular instance.
     E.g., q=.5, the probability that any particular tossed coin will not come up as a head; q=.6, the
     probability that any particular patient will not spontaneously recover.

N    the number of instances in which the event or outcome has the opportunity to occur.

To show how the streamlining came about, we will begin by considering the very simple
binomial probability situation in which N is equal to 2, and then work our way up through
some more complex binomial situations. If you are tossing 2 coins with elemental probabilities
of p=.5 for the outcome of getting a head on any particular one of the coins (labeled below
as “H”) and q=.5 for the outcome of not getting a head (labeled as “–“), the possible
conjunctive outcomes and their associated probabilities are

Coin           Sub-pathway          Number         Main Pathway
A    B         Probability          of Heads       Probability

–    –         .5 x .5 = .25        0              .25  (25%)

H    –         .5 x .5 = .25
–    H         .5 x .5 = .25        1              .50  (50%)

H    H         .5 x .5 = .25        2              .25  (25%)

                                    Totals         1.0  (100%)

Similarly, if you are considering 2 randomly selected patients with elemental probabilities of
p=.4 for any particular one of the patients spontaneously recovering (labeled below as “R”)
and q=.6 for not spontaneously recovering (labeled as “–“), the possible conjunctive
outcomes and their associated probabilities are

Patient        Sub-pathway          Number of      Main Pathway
A    B         Probability          Recoveries     Probability

–    –         .6 x .6 = .36        0              .36  (36%)

R    –         .4 x .6 = .24
–    R         .6 x .4 = .24        1              .48  (48%)

R    R         .4 x .4 = .16        2              .16  (16%)

                                    Totals         1.0  (100%)
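
Both N = 2 tables can be generated mechanically by enumerating every sub-pathway and grouping the
results by the number of events, which is exactly the bookkeeping a sampling distribution formalizes.
An illustrative sketch:

    from itertools import product
    from collections import defaultdict

    def binomial_sampling_distribution(n, p):
        """Probability of each possible number of 'successes' in n trials, built from sub-pathways."""
        q = 1 - p
        dist = defaultdict(float)
        for outcome in product((1, 0), repeat=n):    # 1 = the event occurs, 0 = it does not
            prob = 1.0
            for result in outcome:
                prob *= p if result else q           # conjunctive (multiplicative) sub-pathway
            dist[sum(outcome)] += prob               # disjunctive (additive) main pathway
        return dict(sorted(dist.items()))

    print(binomial_sampling_distribution(2, 0.5))  # {0: 0.25, 1: 0.5, 2: 0.25}
    print(binomial_sampling_distribution(2, 0.4))  # roughly {0: 0.36, 1: 0.48, 2: 0.16}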

In Figure 6.1 we display these two sets of probabilities side-by-side in the form of two
histograms. In each case, you can think of the total area of the histogram as representing
100% of the total probability that applies to that particular situation. Thus, for both examples
there is a 100% chance that the outcome will include either zero or 1 or 2 of the particular
events in question—heads or patient recoveries. All that differs is the way in which the 100%
total is divided up by the 3 possible outcomes: 25%|50%|25% for the coin-toss example
versus 36%|48%|16% for the patient-recovery example.

Figure 6.1. Two Binomial Sampling Distributions for the Case where N=2

These two sets of probabilities are illustrative of a large class of theoretical structures
collectively known as probability sampling distributions, so named because they describe
the probabilities for the entire range of outcomes that are possible in the particular situations
that are under consideration.

Following this chapter is an appendix that will dynamically generate the graphical outlines and
numerical details of the binomial sampling distribution for any values of p and q, and for any
value of N between 1 and 40, inclusive.

Thus, the sampling distribution for the coin-toss situation (N=2, p=.5, q=.5) specifies that
any particular randomly selected sample of 2 tossed coins has a 25% chance of including zero
heads, a 50% chance of including exactly 1 head, and a 25% chance of including 2 heads.
The sampling distribution for the patient-recovery situation (N=2, p=.4, q=.6) specifies that
any particular sample of 2 randomly selected patients who have come down with this disease
has a 36% chance of ending up with zero recoveries, a 48% chance of ending up with exactly 1 recovery, and a 16% chance of ending up with 2 recoveries. By extension, these two
sampling distributions would also specify that any particular randomly selected sample of 2
tossed coins has a 50%+25%=75% chance of including at least 1 head, and that any
particular sample of 2 randomly selected patients has a 48%+16%=64% chance of ending up
with at least one recovery.

Another way of interpreting a probability sampling distribution would be to say that it
describes the manner in which the outcomes of any large number of randomly selected
samples will tend to be distributed. Thus, a large number of randomly selected samples of 2
tossed coins would tend to have 25% of the samples including zero heads, 50% including
exactly 1 head, and 25% including 2 heads. A large number of randomly selected 2-patient
samples would tend to have 36% of the samples with zero recoveries, 48% with exactly 1
recovery, and 16% with 2 recoveries.

The interpretation of central tendency and variability for a sampling distribution is essentially
the same as for any other distribution. The central tendency of the distribution describes the
average of all the outcomes, and its variability is the measure of the tendency of individual
outcomes to be dispersed away from that average. For the particular case of a binomial
probability sampling distribution, these parameters can be easily determined according to the
formulas given below. [Note the new symbols used here. When referring to an entire
population of potential outcomes, the convention is to use μ (lower-case Greek letter “mu”)
for the mean and σ² and σ (lower-case Greek letter “sigma”) for the variance and standard
deviation.]

mean: μ = Np

variance: σ² = Npq

standard deviation: σ = sqrt[Npq]

Thus, for the coin-toss example, with N=2, p=.5, and q=.5, you have

mean: μ = 2x.5 = 1.0

variance: σ² = 2x.5x.5 = 0.5

standard deviation: σ = sqrt[0.5] = ±0.71

And for the patient-recovery example, with N=2, p=.4, and q=.6, you have

mean: μ = 2x.4 = 0.8

variance: σ² = 2x.4x.6 = 0.48

standard deviation: σ = sqrt[0.48] = ±0.69
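
The same parameters can be computed in a couple of lines; this small Python check (ours, not the text's) reproduces the values above:

from math import sqrt

def binomial_moments(n, p):
    """Mean, variance, and standard deviation of a binomial sampling distribution."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    return mean, variance, sqrt(variance)

print(binomial_moments(2, 0.5))   # (1.0, 0.5, 0.707...)  -> mean 1.0, SD about 0.71
print(binomial_moments(2, 0.4))   # (0.8, 0.48, 0.692...) -> mean 0.8, SD about 0.69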

Although we are illustrating these various points about probability sampling distributions with
the specific examples of coin tosses and patient recoveries, please keep in mind as we
proceed that these distributions are abstract and general, in the sense that they apply to any
probability situation at all that has certain defining properties. Thus, the sampling distribution
shown in the left-hand histogram of Figure 6.1 is not limited to the example of tossing two
coins. It applies to any situation at all in which the probability of a certain event is p=.5, the
complementary probability of its non-occurrence is q=.5, and there are N=2 opportunities for
it to occur. Similarly, the sampling distribution shown in the right-hand histogram of
Figure 6.1 applies to any situation at all in which the defining properties are p=.4, q=.6, and
N=2.


At any rate, in examining Figure 6.1 you can hardly fail to notice that the two sampling
distributions have rather different shapes. Both are unimodal, but whereas the one for the
coin-toss situation (p=.5, q=.5) is symmetrical, the one for the patient-recovery example
(p=.4, q=.6) has a pronounced positive skew. But now see what happens when the size of
the sample is increased from N=2 to N=10. The specific probabilities for the outcomes
depicted by the histogram columns in Figure 6.2 have been calculated using the factorial and
exponential formula examined in Chapter 5,

P(k out of N) = [N! / (k!(N−k)!)] x p^k x q^(N−k)

substituting N=10 along with the appropriate values of p, q, and k.
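
As an illustration of this formula (a short sketch of ours, not taken from the chapter), the exact probability of any particular k can be computed directly:

from math import comb

def binomial_probability(k, n, p):
    """Exact binomial probability of k occurrences in n opportunities."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binomial_probability(5, 10, 0.5))   # about 0.2461, the peak of the coin-toss histogram
print(binomial_probability(4, 10, 0.4))   # about 0.2508, the peak of the patient-recovery histogram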

Figure 6.2. Two Binomial Sampling Distributions for the Case where N=10

Please take a moment to compare the outlines of Figure 6.2 with those of Figure 6.1, for the
transformation is quite remarkable. As a result of increasing the size of the sample to N=10,
both sampling distributions have grown smoother, and the one for the patient-recovery
example has in addition grown much more symmetrical. You will also surely recognize the
smooth curve that has been superimposed upon the two sampling distributions in Figure 6.2
as the by-now familiar outline of the normal distribution.

In Figure 6.3 we increase the size of the sample to N=20, and there you can see these trends
carried even further. The principle illustrated by Figures 6.1, 6.2, and 6.3 is that, as the size
of N increases, the shape of a binomial sampling distribution comes closer and closer to the
shape of the normal distribution. Eventually it comes so close as to be equivalent to the
normal distribution, for all practical purposes—and that is the point where all the jagged
complexity that we have been describing suddenly gives way to a smooth, streamlined
simplicity. The reason for this remarkable convergence is explained in SideTrip 6.1.

Figure 6.3. Two Binomial Sampling Distributions for the Case where N=20

If the elemental probabilities, p and q, are both equal to .5, as in the coin-toss example, a
sufficiently close approximation to the normal distribution is reached at the point where N is
equal to 10. If p and q are anything other than .5, it is reached at some point beyond N=10.
In general, a binomial sampling distribution may be regarded as a sufficiently close
approximation to the normal distribution if the products of Np and Nq are both equal to or
greater than 5. That is

Np ≥ 5 and Nq ≥ 5
Criterion for determining that a binomial sampling distribution is a sufficiently close approximation of the normal distribution.

For elemental probabilities of .4 and .6, as in our patient-recovery example, the point of
sufficiently close approximation is reached at N=13; for .3 and .7, it is reached at N=17;
for .2 and .8, it is reached at N=25; and so on. The simple way of determining these points is
to divide 5 by either p or q, whichever is smaller, rounding the result up to the next higher
integer if it comes out with a decimal fraction. Thus, for .4 and .6 you have 5/.4=12.5, which
rounds up to N=13; for .3 and .7 it is 5/.3=16.67, which rounds up to N=17; and so on.
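
That rule of thumb is easy to automate; the following sketch (ours) reproduces the cut-off values quoted in the text:

from math import ceil

def min_n_for_normal_approximation(p):
    """Smallest N for which both Np and Nq reach 5, where q = 1 - p."""
    return ceil(5 / min(p, 1 - p))

for p in (0.5, 0.4, 0.3, 0.2):
    print(p, min_n_for_normal_approximation(p))   # 10, 13, 17, 25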

The graph of the normal distribution that appears in Figure 6.4 will help you see where all this
is leading. Once you know that a distribution is normal, or at least a close approximation of
the normal, you are then in a position to specify the proportion of the distribution that falls to
the left or right of any particular point along the horizontal axis, or alternatively the
proportion that falls in-between any two points along the horizontal axis. And specifying the
proportion is tantamount to specifying the probability. For the standardized normal
distribution, these points along the horizontal axis are marked out in units of z, with each unit
of z being equal to 1 unit of standard deviation, σ.

Figure 6.4. The Unit Normal Distribution

Most of the percentage values that appear in the body of the graph will already be familiar to
you. To the right of z=+1 falls 15.87% of the total distribution; to the right of z=+2 falls
2.28% of the total distribution; and so on. And since the normal distribution is precisely
symmetrical, the same percentages also fall to the left of corresponding negative values of z.
For reasons that will be evident in a moment, Figure 6.4 also shows shaded areas
corresponding to the proportions of the normal distribution that fall to the left of z=—1.14 and
to the right of z=+1.14—in each case, 12.71%. An abbreviated version of the standard table
of the normal distribution is shown in Table 6.1. The complete version is given in Appendix A.
(Please note that the entries in Table 6.1 are not listed as percentages but as proportions;
e.g., .1587 instead of 15.87%.)

Table 6.1. Proportions of the normal distribution falling to the left of negative values of z or to the right of positive values of z. Entries marked with an asterisk (*) pertain to examples discussed in the text.

 ±z      Area Beyond ±z          ±z      Area Beyond ±z
 0.00        .5000               1.80        .0359
 0.20        .4207               1.90*       .0287
 0.40        .3446               2.00        .0228
 0.60        .2743               2.20        .0139
 0.80        .2119               2.40        .0082
 1.00        .1587               2.60        .0047
 1.14*       .1271               2.80        .0026
 1.20        .1151               3.00        .0013
 1.40        .0808               3.50        .0002
 1.60        .0548               4.00        .00003
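
These tabled proportions can be regenerated from the standard normal distribution itself; the helper below (our illustration, using only the Python standard library) computes the area beyond any ±z:

from math import erf, sqrt

def area_beyond(z):
    """Upper-tail area of the standard normal distribution beyond +z
    (by symmetry, also the area below -z)."""
    return 0.5 * (1 - erf(z / sqrt(2)))

for z in (1.00, 1.14, 1.90, 2.00):
    print(z, round(area_beyond(z), 4))   # .1587, .1271, .0287, .0228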

Now turn to Figure 6.5 where you will see this same standardized normal distribution
superimposed on the binomial sampling distribution that applies to our patient-recovery
example with a sample size of 20. The defining properties of this sampling distribution are
N=20, p=.4, q=.6. Hence

mean: μ = 20x.4 = 8.0

standard deviation: σ = sqrt[20x.4x.6] = ±2.19

Figure 6.5. Binomial Sampling Distribution for N=20, p=.4, q=.6

You will note that there are two scales along the horizontal axis of Figure 6.5. The first,
labeled k, delineates “number of recoveries in 20 patients”; and the second, labeled z, is a
direct translation of this binomial k scale into units of standard deviation, with each unit of
standard deviation equal to ±2.19 units of k. Any particular one of the discrete values on the
k scale—0, 1, 2, … , 18, 19, 20—can be directly translated into its corresponding value on the
z scale by application of the formula

z = [(k − μ) ± .5] / σ

Note 1. Only the positive value of σ is used in the denominator of this expression.

Note 2. The component ±.5 (‘plus or minus .5’) in the numerator is a ‘correction
for continuity’, aimed at transforming the zig-zag angularity of the binomial
distribution into the smooth curve of the normal distribution. It achieves this effect
by reducing the distance between k and the mean by one-half unit. Thus, when k
is smaller than the mean you add half a unit (+.5), and when k is larger than the
mean you subtract half a unit (—.5).

Thus, for k=5 we have

z = [(5 − 8) + .5] / 2.19 = −2.5 / 2.19 = −1.14

and for k=11:

z = [(11 − 8) − .5] / 2.19 = +2.5 / 2.19 = +1.14

With the calculation of z for a binomial outcome of the general type “k or fewer” or “k or
more,” you are then in a position to use the normal distribution for assessing the probability
associated with that outcome. You simply treat your calculated value of z as though it
belonged to the z scale of the normal distribution—and read off the corresponding probability
value from the table of the normal distribution. The two particular values of z that we have
just calculated are z=—1.14 for “5 or fewer” recoveries and z=+1.14 for “11 or more”
recoveries. As we saw a moment ago in Figure 6.5 and Table 6.1, the proportions of the
normal distribution falling to the left of z=—1.14 and to the right of z=+1.14 are in each case
equal to .1271. Hence, on the basis of the normal distribution we would judge that with a
random sample of 20 patients there is a 12.71% chance that as few as 5 will spontaneously
recover, and an equal 12.71% chance that as many as 11 will spontaneously recover. If you
work out the exact binomial probabilities for these two outcomes using the factorial and
exponential formula given earlier, you will find that they come out to 12.56% for “5 or fewer”
and 12.75% for “11 or more.” The 12.71% probability values arrived at via the normal
distribution do not hit these targets precisely, but for all practical purposes they are close
enough. The difference between .1271 and .1256 is only .0015; between .1271 and .1275 it is
only .0004.
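
The closeness of the approximation is easy to verify numerically. The sketch below (ours, not the author's) compares the exact binomial probabilities with the continuity-corrected normal values for the N=20, p=.4 example:

from math import comb, erf, sqrt

def binom_at_most(k, n, p):
    """Exact probability of k or fewer occurrences in n opportunities."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def normal_tail(z):
    """Area of the standard normal distribution beyond a positive z."""
    return 0.5 * (1 - erf(z / sqrt(2)))

n, p = 20, 0.4
mu, sigma = n * p, sqrt(n * p * (1 - p))     # 8.0 and about 2.19
print(binom_at_most(5, n, p))                # exact "5 or fewer"   ~ 0.1256
print(normal_tail((mu - 5 - 0.5) / sigma))   # normal approximation ~ 0.127
print(1 - binom_at_most(10, n, p))           # exact "11 or more"   ~ 0.1275
print(normal_tail((11 - mu - 0.5) / sigma))  # normal approximation ~ 0.127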

To appreciate just how useful and labor-saving this streamlining can be, return for a moment
to the second experiment of our medical researchers, the one in which there were 430
recoveries out of 1,000 patients. The question is: In a random sample of 1,000 patients, how
likely is it that as many as 430 (k ≥ 430) would recover by mere chance, if the plant extract
had no effectiveness whatsoever? First try to imagine how much it would cost you in time and
patience to perform the requisite 571 exact binomial calculations according to the factorial
and exponential formula, assuming you had access to a calculator powerful enough to perform
huge operations such as 1,000! and .4^430 x .6^570—and then reflect on how very much easier it
is to reach the same goal on the basis of the normal distribution.

First establish the defining properties of the relevant binomial sampling distribution, making
sure that the products of Np and Nq are both equal to or greater than 5:

N=1,000, p=.4, q=.6. Hence

mean: μ = 1,000x.4 = 400

standard deviation: σ = sqrt[1,000x.4x.6] = ±15.49

Then perform the simple calculation that translates the details of the binomial situation into
the language of the normal distribution:

z = [(430 − 400) − .5] / 15.49 = +29.5 / 15.49 = +1.90

And then refer your calculated value of z to the table of the normal distribution. For the
moment you need only refer to the abbreviated version shown in Table 6.1, where you will
find that the proportion of the normal distribution falling to the right of z=+1.9 is equal
to .0287.

The meaning of this proportion is illustrated in Figure 6.6. Of all the possible spontaneous-
recovery outcomes that could have occurred within this sample—zero recoveries, 1 recovery,
2 recoveries, and so on—only 2.87% would include as many as 430 recoveries. The mere-
chance probability of the observed result is therefore P=.0287, which of course clears the
conventional cut-off point for statistical significance (P<.05) by a considerable margin.
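
For completeness, this streamlined N=1,000 calculation can be reproduced in a few lines (again a sketch of ours, not part of the chapter):

from math import erf, sqrt

def normal_tail(z):
    """Area of the standard normal distribution beyond a positive z."""
    return 0.5 * (1 - erf(z / sqrt(2)))

n, p = 1000, 0.4
mu, sigma = n * p, sqrt(n * p * (1 - p))      # 400 and about 15.49
z = ((430 - mu) - 0.5) / sigma                # continuity-corrected z, about +1.90
print(round(z, 2), round(normal_tail(z), 4))  # about 1.90 and about .028, in line with the .0287 read from Table 6.1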

Figure 6.6. Location of z=+1.90 within the Unit Normal Distribution

Moving beyond the rather rigid mechanics of the .05 criterion of statistical significance, you
can also think of this P=.0287 probability value in terms of confidence. There is only about a
3% likelihood that the observed outcome would have occurred by mere chance coincidence.
Hence, you can be about 97% confident that it is the result of something other than mere
chance coincidence—presumably some active ingredient of the plant extract. The researchers
can now go on to the costly task of isolating and refining that ingredient, with a high degree
of confidence that they are on the right track.

As we indicated earlier, the streamlining of binomial probabilities was only the first historical
step. In the course of time there came the discovery and detailed analysis of several other
families of sampling distributions, each providing a relatively simple streamlined method for
answering inferential-statistical questions of the general type: How likely is it that such-and-
such might occur by mere chance coincidence? In the remaining chapters of this text we will
be making extensive use of some of these families of sampling distributions, in particular,
several that are variations on the normal distribution, plus some others that are known as
t-distributions, F-distributions, and chi-square distributions. Of all of these, it is the family of
chi-square sampling distributions whose logic follows most closely upon the concepts we have
been developing in the present chapter, so that is where we will go next, in Chapter 8. First,
however, in Chapter 7, another brief interlude on the general concept of statistical
significance.

This chapter includes two appendices:


1. Binomial Sampling Distribution Generator, which will produce a graphic and numerical
display of the properties of a binomial sampling distribution; and

2. z to P Calculator, which will calculate the probabilities associated with any particular
value of z.

End of Chapter 6.

American Economic Journal: Macroeconomics 2015, 7(2): 249–290
http://dx.doi.org/10.1257/mac.20130255


Unemployment Insurance Fraud and Optimal Monitoring †

By David L. Fuller, B. Ravikumar, and Yuzhe Zhang *

An important incentive problem for the design of unemployment
insurance is the fraudulent collection of unemployment benefits by
workers who are gainfully employed. We show how to efficiently
use a combination of tax/subsidy and monitoring to prevent such
fraud. The optimal policy monitors the unemployed at fixed intervals.
Employment tax is nonmonotonic: it increases between verifications
but decreases after a verification. Unemployment benefits are rela-
tively flat between verifications but decrease sharply after a verifica-
tion. Our quantitative analysis suggests that the optimal monitoring
cost is 60 percent of the cost in the current US system. (JEL D82,
H24, J64, J65)

Unemployment insurance programs insure workers against the risk of losing their jobs through no fault of their own. Such insurance, however, has many
potential incentive problems. In this paper, we study the incentive problem associ-
ated with fraudulent collection of unemployment benefits. The US Department of
Labor finds that more than 60 percent of unemployment insurance fraud overpay-
ments are attributed to concealed earnings fraud—when a worker collecting unem-
ployment benefits finds a job but continues collecting the benefits. Motivated by this
fact, we study optimal unemployment insurance in an environment where workers
can conceal earnings and collect unemployment benefits.

We study an infinitely lived worker in continuous time who has CARA prefer-
ences, is initially unemployed, and faces a stochastic arrival of employment oppor-
tunities. Employment is assumed to be an absorbing state. An employed worker can
conceal his employment status and continue to claim unemployment benefits. The
worker’s employment status can be detected using a costly monitoring technology.
In order to focus on the issue of hidden employment, we abstract from moral hazard issues by assuming that there is no search effort decision and that the wage offer distribution is degenerate.1

* Fuller: Department of Economics, University of Wisconsin Oshkosh, 800 Algoma Boulevard, Oshkosh, WI 54901 (e-mail: fullerdl@gmail.com); Ravikumar: Research Division, Federal Reserve Bank of St. Louis, PO Box 442, St. Louis, MO 63166 (e-mail: b.ravikumar@wustl.edu); Zhang: Department of Economics, Texas A&M University, College Station, TX 77843 (e-mail: yuzhe-zhang@econmail.tamu.edu). We are grateful to Árpád Ábrahám, Nicola Pavoni, an anonymous referee, seminar participants at the Federal Reserve Bank of St. Louis, University of Missouri, and Toulouse School of Economics, and participants at the Workshop on Macroeconomic Applications of Dynamic Games and Contracts, Midwest Macroeconomics Meeting, Midwest Theory Meeting, Asia Meeting of the Econometric Society, Society for the Advancement of Economic Theory Conference, and Tsinghua Workshop in Macroeconomics for their helpful comments. We would also like to thank George Fortier for editorial assistance. The views expressed in this article are those of the authors and do not necessarily reflect the views of the Federal Reserve Bank of St. Louis or the Federal Reserve System.

† Go to http://dx.doi.org/10.1257/mac.20130255 to visit the article page for additional materials and author disclosure statement(s) or to comment in the online discussion forum.

In our model, there are two instruments to deter fraudulent collection of unem-
ployment benefits: tax/subsidy and monitoring. Both instruments are costly. The
first distorts consumption relative to full insurance, and the second has a direct cost.
We deliver a precommitment mechanism that optimally trades off between the two
instruments. Our mechanism allows both instruments to be fully history dependent.
As a result, the unemployed worker’s consumption (i.e., the unemployment bene-
fits) and the employed worker’s consumption vary over time.

Since employment is an absorbing state in our model, the treatment of the worker
who reports transitioning to employment is straightforward: constant consumption
forever and no monitoring. Since employment status is private information, the
worker who reports being unemployed is not fully insured and is monitored.

We consider two monitoring mechanisms: deterministic verification and stochas-
tic verification. Under deterministic verification, the worker is either verified with
probability one or not verified at all. We focus on this case for most of the paper since
it is simpler and makes the results more transparent. We show later that our results
remain the same under stochastic verification, where the worker is verified with a
probability between zero and one. That is, even though our deterministic mechanism
appears restrictive, the general mechanism of stochastic verification does not offer
any additional economic insights on unemployment insurance and monitoring.

Under deterministic verification, the optimal contract has three key features.
First, monitoring occurs at fixed intervals and is independent of history. Second,
the unemployment benefits decrease with the duration of unemployment between
monitoring dates and jump downward at every monitoring date. Third, there is a
nonmonotonic tax on employment.

The periodicity of monitoring follows from the fact that with CARA preferences
the worker’s utility flows in a new cycle are proportional to those in the previous
cycle. Hence, his incentive to commit fraud remains the same and he is monitored in
the same manner as in the previous cycle. Unemployment benefits decreasing with
duration is a familiar feature from the previous literature. Unemployment benefits
jump downward at the monitoring date because the unemployed worker’s premon-
itoring consumption is distorted upward. In our model, increasing the unemployed
worker’s premonitoring consumption benefits the truth-teller more than it benefits
the liar.2 Within a monitoring cycle, the employment tax increases with duration of
unemployment. The consumption for the worker who transitions to employment
earlier exceeds that of the worker who transitions later. However, the employment
tax decreases after the monitoring date. This is because the unemployed worker who
transitions to employment shortly after the monitoring date can conceal earnings until the next monitoring date, while the worker who transitions to employment at the monitoring date cannot.

1 The literature on the optimal provision of unemployment insurance concentrates on moral hazard and examines incentives for optimal search effort (e.g., Baily 1978; Shavell and Weiss 1979; and Hopenhayn and Nicolini 1997). Hopenhayn and Nicolini (1997) and Wang and Williamson (2002) show that the search effort margin is quantitatively insignificant: The unemployed worker’s optimal search effort almost equals what the current US system implies.

2 For the same reason, in Mirrleesian taxation models with hidden ability, the labor supply of a low-ability worker is distorted downward.

Our optimal mechanism also deters fraud due to quits. This occurs when workers
quit their jobs, become unemployed, and start collecting unemployment benefits.
The incentives in our optimal contract ensure that the employed workers do not
engage in such behavior.3

To assess the empirical relevance of our theoretical analysis, we conduct a par-
tial equilibrium quantitative exercise similar to Hopenhayn and Nicolini (1997).
We find that the optimal monitoring cost is 60 percent of the cost incurred by the
US unemployment insurance system. Furthermore, using the same resources as
the US system, the optimal contract delivers higher utility to the average worker:
1.55 percent higher consumption at every date. This gain arises from two sources:
(i) improved consumption smoothing between employed and unemployed states,
and (ii) reduced monitoring costs (or higher average consumption). Almost all of
the gain in our optimal contract comes from (i). This is similar to the quantitative
finding in Hopenhayn and Nicolini (1997) and Wang and Williamson (2002). The
cost saving in their optimal contracts is due to improved consumption smoothing
and not due to faster transitions from unemployment to employment.

The remainder of the paper proceeds as follows. In Section I, we present the key
facts on unemployment insurance fraud. We also provide evidence that deterring
concealed earnings fraud involves a case-by-case investigation and, thus, a per case
cost, as in our model. Section II describes the model. In Section III, we establish two
properties of the optimal mechanism: scaling and periodic monitoring. In Section
IV, we use these properties to analyze the optimal unemployment insurance scheme
with exogenously given monitoring dates. Then, we characterize the optimal mon-
itoring dates in Section V. In Section VI, we show that our mechanism prevents
employed workers from quitting. In Section VII, we examine the stochastic moni-
toring case. In this section, we also describe the similarities and differences between
the insights from the deterministic mechanism and the insights from the stochastic
mechanism. We conclude in Section VIII.

I. Unemployment Insurance Fraud Data

In this section, we first briefly describe the program in place for determining
the accuracy of payments in the US unemployment insurance system. Second,
we provide details on the nature of “fraud” overpayments by category for 2007
(Appendix A provides information for more years). Third, we present data on how
these payments were detected. Finally, we discuss “off-the-books” employment.

Accuracy of Benefit Payments.—Unemployment insurance benefits in the United
States are paid out by the states, with each state deciding its benefit levels and how
to finance the benefits. The US Department of Labor’s BAM (Benefit Accuracy Measurement) program determines the accuracy of these expenditures by choosing a random sample of weekly unemployment insurance claims and determining whether there were any overpayments. The investigators also interview some claimants if necessary. Some overpayments are simple errors in calculating benefits, while some represent fraud overpayments.

3 Hansen and Imrohoroglu (1992) study a model where unemployed workers can reject job offers and an exogenous fraction of such workers are denied benefits. In our optimal mechanism, the unemployed worker who receives a job offer has no incentive to refuse the offer.

The goal of the program is different from the goal of unemployment insurance
fraud investigators. While the latter look to recapture overpayments, BAM investi-
gators calculate statistics of the unemployment insurance program (see BAM State
Operations Handbook ET No. 495, 4th edition). We use these statistics throughout
the paper.

Overpayments Due to Fraud.—There are several types of unemployment insur-
ance fraud. Examples include collecting unemployment benefits while being
employed, after quitting a job, or after refusing a suitable job offer. Table 1 catego-
rizes the overpayments by type of fraud.

“Concealed Earnings” refers to cases where payments are made to individu-
als who are simultaneously earning wages and collecting unemployment benefits.
“Insufficient Job Search” refers to cases where individuals did not meet the manda-
tory work search requirement (e.g., a minimum number of job applications must be
filed each week). “Refused Suitable Offer” refers to cases where individuals were
offered a job deemed suitable, but rejected it. “Quits” and “Fired,” respectively, refer
to cases where payments are made to individuals who voluntarily left their jobs or
who were fired from their jobs for a valid reason (e.g., poor performance or missing
work). “Unavailable for Work” refers to cases where payments are made to individ-
uals who cannot work (e.g., disability).

Overpayments due to concealed earnings fraud in 2007 were ten times overpay-
ments due to unemployed agents not actively searching or refusing suitable work
(see Table 1). While the data indicate that concealed earnings fraud is the dominant
source of overpayments, it does not imply that moral hazard from reduced search
effort is unimportant for the design of unemployment insurance. It might be the case
that the current unemployment insurance system provides adequate incentives to
search but does not deter concealed earnings fraud.

Table 1—Unemployment Insurance Overpayments in the United States, 2007

Category                      Percent of fraud overpayments
Concealed earnings                60.06
Insufficient job search            4.95
Refused suitable offer             0.80
Quits                             13.29
Fired                              4.17
Unavailable for work               7.06
Other                              9.67
Total                            100.00

Source: BAM program, US Department of Labor. Note that these are our calculations. Our
definitions of each type of fraud differ slightly from those used in the BAM reports available
online.


Detection Technologies.—The detection technologies used by BAM are shown in
Table 2. For example, “Verification of search contact” refers to cases when the BAM
investigator verifies the potential job contact reported by the unemployed person;
“Claimant interview” is an interview with the person collecting benefits.

Since 2003 , states have used a cross-matching technology, comparing unem-
ployment insurance records with employment records. One might think concealed
earnings fraud could be automatically detected this way; however, only 7.0 per-
cent of the fraud cases are detected by cross-matching with the state’s directory
of new hires (see Table 2). For instance, cross-matching technology would not
automatically catch a worker who is collecting unemployment benefits in one state
while employed in another state. Furthermore, the directory of new hires is updated
monthly, so even within individual states some workers who truthfully report unem-
ployment in a specific week may show up in a cross-match of employment records
and be mistakenly flagged for fraud. In most cases when a worker appears in both
unemployment insurance records and employment records, further investigation is
necessary to determine if fraud has actually occurred.

In addition, the worker could commit a more nuanced form of concealed earnings
fraud by truthfully reporting the transition to employment but underreporting the
earnings. (The worker is entitled to collect some unemployment benefits as long
as the reported earnings are sufficiently low.) In 2007, roughly 40 percent of those
committing concealed earnings fraud reported positive earnings. Less than 2 percent
of these cases were detected by cross-matching the unemployment insurance records
with wage records (updated quarterly) in each state (see Table 2). In fact, employees
working in a sector not covered by the unemployment insurance system will never
show up in the state wage records (e.g., federal employees and self-employed).

These data suggest that more than 90 percent of the overpayments due to con-
cealed earnings fraud were not detectable under the automatic procedures available
to the state authorities. Instead, detection involves a case-by-case investigation and,
thus, a per-case cost of verification.

Table 2—Detection Technologies, 2007

Detection method                               Percent of concealed earnings fraud
                                               overpayments detected by method
Verification of search contact                      1.20
Verification of wages and/or separation            63.99
Claimant interview                                 10.06
Verification of eligibility with 3rd parties        1.26
Unemployment insurance records                     13.69
Job/employment service records                      0.16
Verification with union                             0.65
Crossmatch with state directory of new hires        7.00
Crossmatch with state wage record files             1.99

Source: Benefit Accuracy Measurement Program, US Department of Labor

Working “Off-the-Books”.—A worker could collect unemployment benefits while working “off-the-books” and being paid in cash. In such cases, verifying the true employment status might be prohibitively expensive. However, the evidence suggests that concealed earnings fraud is committed by workers in “official” employment. While the worker is committing concealed earnings fraud, his weekly earnings are similar to the weekly earnings in the preunemployment job (which, by design, has to be official for the worker to collect unemployment benefits). In 2007, those committing concealed earnings fraud were earning 82 percent of their previous job’s wages, on average. One-fourth of those committing this fraud were earning more while collecting benefits than before they became unemployed. Such relatively high earnings while committing fraud suggest official or “on-the-books” employment rather than “off-the-books” employment.4

II. Model

The Unemployment Insurance authority is a risk-neutral principal with a discount
rate r > 0 . She provides insurance to a risk-averse worker, whose preferences are
given by

E[ ∫_0^∞ e^{−rt} rv(c(t)) dt ],

where c(t) is consumption at time t , v(c) = − e −ρc is a CARA utility function with
risk aversion ρ , r is the discount rate, and E is the expectation operator. Note that
the flow utility is rv(c) and that the agent’s subjective discount rate is the same as
the principal’s.

A worker can be either employed with wage w > 0 or unemployed with wage
zero. The worker is unemployed at t = 0 and transitions to employment with Pois-
son rate π > 0 . We assume that employment is permanent. (For similar assump-
tions, see the unemployment insurance model of Hopenhayn and Nicolini 1997 and
the disability insurance model of Golosov and Tsyvinski 2006.)

The worker’s employment status is private information, so an employed worker
can claim to be unemployed and continue collecting the unemployment benefits. We
refer to this as fraud. The principal can verify the worker’s unemployment report
at a cost of γ units of the consumption good. Verification reveals the worker’s true
employment status.

We study precommitment mechanisms that efficiently deliver unemployment
benefits and deter fraud. In addition to the tax/subsidy instrument used by the
unemployment insurance literature, our mechanism uses the monitoring instrument
to provide incentives.5

We assume that the principal always collects the wage, so an unemployed worker
can never claim to be employed. Hence, there is no need for verification when
the worker reports a transition to employment. Furthermore, since employment is
an absorbing state, verification is unnecessary forever if the worker reports to be employed just once in the past. The incentive problem then reduces to ensuring that an employed worker does not claim to be unemployed.

4 The BAM program detects 10.0 percent of the fraud overpayments by interviewing the claimants (see Table 2). Such interviews might reveal some cash earnings.

5 See Setty (2014) for a model of optimal unemployment insurance where the agent’s search effort is monitored. Empirically, as noted in Table 1, fraudulent behavior in search effort is not as costly as concealment of earnings.

We focus on deterministic verification mechanisms. In each period the worker is
either verified with probability one or not verified at all. This mechanism is subop-
timal; it is dominated by a stochastic verification mechanism in our environment.
One may then ask why study the deterministic case? Our goal is to characterize
the optimal combination of the two instruments: tax/subsidy and monitoring. In
Section VII, we show that the key economic insights on these two instruments are
nearly identical in both the deterministic and stochastic cases. In both cases, optimal
monitoring and employment tax have the same pattern. The stochastic monitoring
case requires cumbersome notation and provides less intuition, so we start by ana-
lyzing the deterministic case.

In our deterministic mechanism, the verification in any period is based on the
history of employment status reports and past verification outcomes. Since verifi-
cation is necessary only for agents who have been reporting unemployment in every
period in the past, a sufficient statistic for history is the duration of unemployment
reports. In other words, at t = 0 the principal commits to all future verification
periods, mapping durations of unemployment reports to {0, 1} . In a verification
period, clearly no worker would misreport. (Any penalty ϵ > 0 induces truth tell-
ing in the verification period.) Thus, the principal does not have to keep track of
the outcomes of past verifications. We represent the set of verification periods as
{ m i ; i = 1, 2, …} , where m i is the date of the i th verification.6

The timing is as follows. In the initial period, the worker is unemployed. Then
the stochastic job opportunity arrives. The worker either remains unemployed or
transitions to employment. He then chooses to report either employment or unem-
ployment to the principal. Conditional on the unemployment report, the principal
verifies the true employment status if the period is a verification period. Then, condi-
tional on the report and the outcome of the verification, the principal assigns current
and future consumptions. In subsequent periods, if the worker reported employment
in the past, he is in an absorbing state and no further reports are necessary. If the
worker reported unemployment in every period in the past, then the sequence of
events is the same as in the initial period.

If an unemployed worker transitions to employment at t , let c E (t, s) denote his
consumption at time s ≥ t . Because the principal and the worker have the same
discount rate and employment is an absorbing state, efficiency requires that the
worker’s consumption remain constant after t for all s . We therefore suppress s in

c E (t, s) and denote this constant level of consumption as c E (t) . The flow utility
from this level of consumption then is rv ( c E (t)) . We denote the discounted sum
of utilities to a worker who accepts a job offer for the first time at t as E(t) , i.e.,
E(t) = ∫_t^∞ e^{−r(s−t)} rv(c_E(t)) ds = v(c_E(t)). Since employment status is private
information, E(t) is also the continuation utility to a worker who accepted an offer
before t , but reports employment for the first time at t .

6 There is no loss of generality in assuming a countable collection of verification periods. Since each verification
costs γ > 0 , the principal would not want to verify infinitely many times in any finite time interval.


An unemployed worker’s consumption at t is denoted by c U (t) and his flow utility
is rv ( c U (t)) . His continuation utility,

U(t) ≡ ∫_t^∞ e^{−r(x−t)} e^{−π(x−t)} rv(c_U(x)) dx + ∫_t^∞ e^{−r(x−t)} e^{−π(x−t)} πE(x) dx,

is the sum of expected utilities before and after the transition ( e −π(x−t) in the first
integral is the conditional probability of remaining unemployed at date x and
e −π(x−t) π in the second integral is the density function of the transition time). Hence,

(1)  U(t) = ∫_t^∞ e^{−(r+π)(x−t)} (πE(x) + ru(x)) dx

           = ∫_t^s e^{−(r+π)(x−t)} (πE(x) + ru(x)) dx + e^{−(r+π)(s−t)} U(s),  for all t < s,

where u(x) ≡ v ( c U (x)) . We will refer to (1) as promise-keeping constraints.
The principal commits at t = 0 to verification periods { m i ; i = 1, 2, …} and

consumptions { ( c E (t), c U (t)) ; t ≥ 0} . The verification periods and consumptions
are history dependent. We denote this precommitment contract as σ .

Incentive compatibility requires that a worker who transitioned to employment at
t ∈ ( m i , m i+1 ) does not have the incentive to delay the report of the transition to a
later time s ∈ (t, m i+1 ) , i.e., report unemployment and commit fraud from t to s and
then report employment from s onward:

(2)  E(t) ≥ ∫_t^s e^{−r(x−t)} rv(c_U(x) + w) dx + e^{−r(s−t)} E(s),  ∀ s ∈ (t, m_{i+1}).

Note that the worker cannot delay the report beyond the next verification period m i+1 .
We restrict contract allocations to

(3) E(t) ≥ U(t), for all t .

Restriction (3) rules out the fraud due to refusal of offers noted in Table 1 (0.8 per-
cent of total fraud overpayments). This restriction can be derived by adding a job
refusal option to our model. For ease of exposition, we have imposed the restric-
tion on the mechanism; Appendix B describes the job refusal option and derives
this restriction.

The expected cost for the principal is

c(σ) = ∫_0^∞ e^{−(r+π)t} (π c_E(t) + r c_U(t)) dt + Σ_i e^{−(r+π) m_i} γ.

There should, in fact, be an additional term in c(σ): the discounted income obtained by the principal, πw/(r + π). However, unlike the unemployment insurance literature that endogenizes job-finding probabilities, the discounted income in our model is a constant, so it does not affect the optimal σ.

The principal’s problem is to find an incentive compatible σ that minimizes c(σ)
and delivers the initial promised utility U(0) , i.e.,

(4)  min_σ c(σ)

subject to U(0) = ∫_0^∞ e^{−(r+π)t} (πE(t) + ru(t)) dt,

and constraints (2), (3).

With a slight abuse of notation, denote the principal’s cost function as c(U(0)) .7

III. A Simplification of the Optimal Contract

We begin our analysis by presenting two features of the optimal contract. In
Section IIIA, we establish a “scaling” property. Then, in Section IIIB, we show that
the optimal monitoring is periodic. These properties simplify our analysis of the
optimal contract by narrowing the search of a solution to problem (4) to a smaller
space.

To help us simplify, we rewrite problem (4) in terms of continuation utilities E( · ) ,
U( · ) , and flow variable u( · ) , instead of consumptions. The objective becomes

c(σ) = ∫_0^∞ e^{−(r+π)t} (π c(E(t)) + r c(u(t))) dt + Σ_i e^{−(r+π) m_i} γ,

where c : (−∞, 0) → ℝ denotes the inverse of the utility function:

(5)  c(v) = −log(−v)/ρ.

The incentive constraint (2) becomes

(6)  E(t) ≥ ∫_t^s e^{−r(x−t)} e^{−ρw} ru(x) dx + e^{−r(s−t)} E(s),  ∀ s ∈ (t, m_{i+1}),

since CARA utility implies that v(c_U(x) + w) = e^{−ρw} v(c_U(x)) = e^{−ρw} u(x).
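
The algebra behind this identity is a one-line consequence of the exponential form of CARA utility; the check below is ours and is not displayed in the paper:

\[
v\bigl(c_U(x)+w\bigr) \;=\; -e^{-\rho\,(c_U(x)+w)} \;=\; e^{-\rho w}\bigl(-e^{-\rho\,c_U(x)}\bigr) \;=\; e^{-\rho w}\,v\bigl(c_U(x)\bigr).
\]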

7 Ravikumar and Zhang (2012) analyze the problem of tax compliance in a costly state verification model where
the verification technology is imperfect (a low-income agent might be mistakenly labeled as high income). They
solve for the principal’s cost function using the Hamilton-Jacobi-Bellman equation. In contrast, we study optimal
unemployment insurance in an environment with a perfect verification technology. We characterize the path of
unemployment benefits by formulating the optimal control problem and using the Pontryagin minimum principle.


A. Scaling

Our mechanism exhibits a scaling property: If the initial promise U(0) is scaled
by α > 0 , then the optimal contract is also scaled by α . More formally,

LEMMA 1: If {(U(t), E(t), u(t)); t ≥ 0} are optimal utilities for initial promise U(0), then the optimal utilities for initial promise αU(0) are {(αU(t), αE(t), αu(t)); t ≥ 0}.

Alternatively, Lemma 1 states that the consumption of the worker with initial
promise αU(0) differs from that of the worker with promise U(0) by a constant,
−log (α)/ρ , at all dates and states.

The scaling property in Lemma 1 is related to the fact that CARA utility has
no wealth effect. Although a worker with high promised utility consumes (perma-
nently) more than a worker with low promised utility, the level of promised utility
does not have an effect on the worker’s incentives to conceal earnings. In other
words, the incentive constraint (6) holds when all of the utilities are scaled by the
same factor.
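
To see the absence of a wealth effect at work, note that the incentive constraint (6) is linear (homogeneous of degree one) in the utility variables, so multiplying every utility by α > 0 leaves it unaffected. This worked check is ours and is not spelled out in the paper:

\[
E(t) \;\ge\; \int_t^{s} e^{-r(x-t)}\, e^{-\rho w}\, r\,u(x)\, dx \;+\; e^{-r(s-t)}\,E(s)
\quad\Longleftrightarrow\quad
\alpha E(t) \;\ge\; \int_t^{s} e^{-r(x-t)}\, e^{-\rho w}\, r\,\alpha u(x)\, dx \;+\; e^{-r(s-t)}\,\alpha E(s),
\]

since α > 0 can be multiplied through or divided out on both sides. The same applies to the promise-keeping constraint (1), which is also linear in (U, E, u).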

Since the incentives to conceal earnings are the same for workers with different
promised utilities, the optimal sequence of monitoring dates, { m i ; i ≥ 1} , is inde-
pendent of the initial promised utility. Again, no wealth effect implies that the level
of promised utility does not change how the worker is monitored, even if it does
change the worker’s consumption.

B. Periodicity

At time 0 , the principal knows the true employment status of the agent. After the
verification at m 1 , the principal again knows the true employment status. Hence, the
continuation problem at m 1 is the same as the problem at time 0 , except for the “ini-
tial” promised utility. The scaling property implies that, if U( m 1 ) = αU(0) , then
the optimal utilities from m 1 forward are scaled by α . Thus, starting with a promise
U(0) , if the principal finds it optimal to monitor the unemployed agent at m 1 , then
it must be the case that starting with the promise αU(0), the principal would again
find it optimal to monitor at m 1 . Put differently, having monitored the agent at m 1 ,
the next optimal monitoring period is 2 m 1 . We immediately conclude the following:

PROPOSITION 1: The optimal monitoring is periodic, i.e., m i = i m 1 for all i ≥ 1 .

To understand the intuition for the periodic monitoring, consider policies where
the interval between verifications is either increasing or decreasing over time. First,
it is suboptimal for the planner to verify more frequently at the beginning. Since
the worker starts out unemployed, he stays unemployed for some duration initially.
Frequent verifications early on merely incur unnecessary verification cost. Second,
one might think that it is optimal to verify more frequently later since the proba-
bility of a long duration of unemployment is small. However, this policy is also suboptimal.
The worker’s conditional probability of transitioning to employment is
independent of how long he has been unemployed. Moreover, because the princi-
pal knows the true employment status after each verification, the scaling property
implies that from the principal’s perspective the worker who was just verified to be
unemployed is no different from the worker at time zero. Thus, the interval between
consecutive monitoring periods is a constant.

While we have established that the optimal monitoring is periodic, finding the
optimal periodicity is difficult. To determine the optimal m 1, we must first determine
the optimal utilities in the intervals [0, m 1 ] , [ m 1 , 2 m 1 ] , etc. Toward this end, we break
the principal’s problem into two steps. First, assume that m 1 is exogenous and the
principal learns the agent’s employment status at dates m 1 , 2 m 1 , etc. Given m 1 , the
principal solves for the endogenous utility paths in [0, m 1 ] , [ m 1 , 2 m 1 ] , etc. Second,
the principal chooses m 1 optimally. We analyze the first step in the next section and
the second step in Section V.

IV. Optimal Unemployment Insurance with Exogenous Monitoring

Given the simplification in Section III, we now present the features of the optimal
unemployment insurance scheme. For a given m 1 , we first formulate the optimal
control problem in Section IVA. This allows us to analyze the time paths of the vari-
ables of interest. We then describe some features of the continuation utilities E( · )
and U( · ) in Section IVB and use these features to illustrate the employment tax in
Section IVC and unemployment benefits in Section IVD. Finally, in Section IVE
we use the Pontryagin Minimum Principle to explicitly characterize E( · ) and U( · ) .

A. Optimal Control

Following Zhang (2009), we formulate the principal’s problem for interval [0, m 1 ]
as one of optimal control. Our analysis for [0, m 1 ] applies to other intervals as well.

First, we rewrite the constraints recursively. The promise-keeping constraint (1)
is equivalent to the differential equation:

U′ (t) = r (U(t) − u(t)) + π (U(t) − E(t)) .

On the right side of the differential equation, the first term is the rate of change of U
when there is no uncertainty (i.e., when there is no transition to employment), and
the second term captures the additional rate of change due to uncertainty.

The incentive constraint (6) is equivalent to the following differential inequality:

(7) r (v ( c U (t) + w) − v ( c E (t)) ) + E′(t) ≤ 0 .

That is, the short-term benefit that the agent gets from fraud, r (v ( c U (t) + w) −
v ( c E (t)) ) , is offset by lower continuation utility he receives after he delays the
employment report. Note that E( · ) could have downward jumps: When E(t) >
lim s↓t E(s) , we interpret the discontinuity as E′(t) = −∞ , and the differential
inequality (7) still holds under this interpretation. Introducing a slack variable
μ(t) ≥ 0 , we may rewrite (7) as

E′(t) = rE(t) − e −ρw ru(t) − μ(t) .

In Lemma 4 in Appendix C, we show that the above differential equation and
inequality are equivalent to (1) and (6).

Second, the scaling property implies that the cost function c( · ) satisfies

c(αU) = c(U) − log(α)/ρ.

Recalling the definition of the inverse utility in (5), and applying the scaling property with α = |U| (recall that utilities are negative, so U < 0), we rewrite c(U) as

(8)  c(U) = c(|U| · (−1)) = c(−1) − log(−U)/ρ ≡ ψ + c(U),

where ψ ≡ c(−1) is the cost of private information: It is the one-time cost that the principal is willing to pay to permanently remove private information from the model. (In the right-most expression of (8), c(U) is the inverse utility from (5), i.e., the full-information cost of delivering the promise U, while the c(U) on the left-hand side is the principal's cost function.)

With ψ + c(U( m 1 )) as the continuation cost at m 1 , we rewrite the principal’s
problem as one of optimal control with a convex objective and linear constraints:

(9)  min_{u(t), U(t), E(t), 0≤t≤m_1}  ∫_0^{m_1} e^{−(r+π)t} (π c(E(t)) + r c(u(t))) dt + e^{−(r+π) m_1} (γ + ψ + c(U(m_1)))

(10)  subject to  U′(t) = (r + π)U(t) − πE(t) − ru(t),

(11)  E′(t) = rE(t) − e^{−ρw} ru(t) − μ(t),

(12)  E(t) ≥ U(t),

U(0) is given.

B. Continuation Utilities

The continuation utilities E( · ) and U( · ) help us uncover the consumption paths
for the employed and the unemployed. We focus on the properties of E( · ) and
U( · ) in [0, m 1 ] ; those in other monitoring cycles can be obtained by scaling (see
Lemma 1).

We demonstrate five properties:

(i) E(t) > E(s) for t < s ≤ m 1 .

(ii) E(t) > U(t) for all t < m 1 .


(iii) E( m 1 ) = U( m 1 ) .

(iv) E( · ) jumps up immediately after m 1 .

(v) U( · ) declines over time.

Property (i) states that the payoff to a worker who reports the transition to employ-
ment earlier is higher than the payoff to one who reports the transition later. The
worker who transitions to employment at t but commits fraud consumes c U (t) + w
at t , whereas the worker who tells the truth consumes c E (t) . It is intuitive that
c E (t) < c U (t) + w ; otherwise deterring fraud would not be an issue. In terms of utilities, E(t) < e −ρw u(t) . Incentive compatibility (11) requires that delaying the report yields a lower payoff (see Figure 1). Thus, E(t) > E(s) within a monitoring
cycle.

For property (ii), recall that restriction (12) imposes E(t) must be greater than
or equal to U(t) . If the agent who transitions to employment before m 1 is offered
the same payoff as the agent who remains unemployed, then the employed agent
will claim to be unemployed and consume more than the unemployed agent. He
can continue cheating until the verification period m 1 (see Figure 2). Thus, within a
monitoring cycle, E(t) must be greater than U(t) .

To understand (iii), note that the true employment status is revealed at m 1 , so the
principal does not face an incentive problem at that instant. Hence, there is no reason
to reward the (lucky) agent who transitioned to employment at m 1 relative to the
(unlucky) agent who remains unemployed, i.e., no reason to set E( m 1 ) > U( m 1 ) .
Thus, E( m 1 ) = U( m 1 ) . (Again, recall restriction (12): E(t) ≥ U(t) for all t .)

Figure 1. Lower Payoff for Late Reporters (E(t) > E(s) for t < s)

Property (iv) states that U( m 1 ) = E( m 1 ) < E( m 1 +), where E( m 1 +) is the utility for a worker who is unemployed at m 1 but transitions to employment immediately after m 1 , i.e., E( m 1 +) = lim t↓ m 1 E(t) (see Figure 3). Suppose, to the contrary, that U( m 1 ) = E( m 1 +). Then incentive compatibility in [ m 1 , 2 m 1 ] would be violated
because the worker employed immediately after m 1 can claim to be unemployed
and consume more than the employed worker until the next verification period, 2 m 1 .
Note that if there is no verification at date t , then an upward jump in E( · ) violates
the incentive constraint: A worker who transitions to employment prior to t would
benefit from delaying the employment report. At the moment of verification, how-
ever, the worker cannot delay the employment report since the true employment
status is revealed.

Figure 2. Continuation Utilities E( · ) and U( · ) in [0, m 1 ].

Figure 3. Continuation Utility E( · ) is Nonmonotonic

(Both figures plot the continuation utilities U and E against time, with the verification date m 1 marked.)

To understand why U( · ) declines, suppose U( m 1 ) > U(0) . Then lowering U( m 1 )
has two benefits. First, the unemployed agent’s continuation utility path is flatter,
which implies better insurance for the unemployed. Second, lower U( m 1 ) (and
E( m 1 ) ) reduces E′( · ) , generating stronger incentives to deter fraud. In addition,
U( · ) can never jump. Because U( · ) is the promised utility to the unemployed agent,
any jump in U( · ) would violate the promise-keeping constraint.

C. Employment Tax

Here we examine the consumption allocated to the agent who reports employ-
ment earlier relative to the consumption for the agent who reports it later. Recall
that E(t) > E(s) within a monitoring cycle and the continuation utility E( · ) jumps
up after verification. Since employment is an absorbing state, any agent who reports
a transition to employment at t is allocated constant consumption c E (t) forever and
is not monitored. Thus, E(t) maps into c E (t) instant by instant and, hence, c E (t) >
c E (s) within a monitoring cycle. Furthermore, the consumption for the agent who
reports the transition to employment immediately after m 1 is higher than that for the
employed agent at m 1 (see Figure 4).

The nonmonotonicity is closely related to the way incentives are provided in
our model. Within a cycle, the principal does not monitor and relies exclusively on
consumption distortions to induce truth-telling: c E must fall sufficiently fast for the
worker not to postpone his report of employment. At m 1 , c E falls to a level such
that the agent is indifferent between transitioning to employment and remaining
unemployed. The principal can perfectly insure the agent against the unemployment
shock at m 1 because the true employment status is revealed. Immediately after

[Figure 4. Permanent Consumption for Workers Who Transition to Employment in Different Periods. Vertical axis: c^E; horizontal axis: time, with m_1 and 2m_1 marked.]


m 1 , the principal treats the worker employed right after m 1 better than the worker
employed at m 1 . This is because the worker who transitions to employment after m 1
can commit fraud until the next monitoring period, while the worker who transitions
to employment at m 1 cannot commit fraud. Hence, the principal must offer the for-
mer a higher permanent consumption to induce truth-telling.

The difference between wage w and consumption c E can be interpreted as an
employment tax. Our contract implies that within a verification cycle, the employ-
ment tax for late reporters is higher than that for the early reporters. However, unlike
the existing unemployment insurance literature, the employment tax is nonmono-
tonic: It decreases immediately following verification.

D. Unemployment Benefits

Unlike the case where c E (t) maps into E(t) at every instant, c U (t) is not pinned
down at every instant by U(t) , since the unemployed agent is not fully insured.
Instead, the path of c U ( · ) in [0, m 1 ] requires knowledge of the entire path of U( · )
in the interval. We obtain the entire trajectories of c U ( · ) and U( · ) after solving (9)
in Section IVE. However, monotonicity of U( · ) in Section IVB suggests that c U ( · )
declines with unemployment duration. As in Hopenhayn and Nicolini (1997), our
contract implies that the unemployment benefit c U eventually reaches an arbitrarily
low level with positive probability.8

Figure 5 shows that the unemployment benefits jump down at the verification
period. To understand the jump, we argue that it is optimal for the principal to set
u(t) above u( m 1 ) when m 1 − t > 0 is small. Doing this relaxes the incentive con-
straint at time t , as the following variational argument shows. The promise-keeping
constraint at m 1 − δ , for a small positive δ , is

U(m_1 − δ) = rδ u(m_1 − δ) + e^{−rδ}[(πδ)E(m_1) + (1 − πδ)U(m_1)]

           = rδ u(m_1 − δ) + e^{−rδ} U(m_1),

where the second equality uses the aforementioned property E( m 1 ) = U( m 1 ) . The
incentive constraint at m 1 − δ is

E(m_1 − δ) ≥ rδ e^{−ρw} u(m_1 − δ) + e^{−rδ} E(m_1).

Suppose u( m 1 − δ) = u( m 1 ) . Then the principal can maintain the promise-keeping
constraint but relax the incentive constraint by increasing u( m 1 − δ) and decreasing
u( m 1 ) . Specifically, consider the variation

ũ(m_1 − δ) = u(m_1 − δ) + e^{−rδ} ϵ,   ũ(m_1) = u(m_1) − ϵ,   Ẽ(m_1) = E(m_1) − rδϵ.

8 In contrast to Hopenhayn and Nicolini (1997) and our paper, Pavoni (2007) imposes an exogenous lower
bound on promised utility and shows that the optimal benefits decrease with the duration of unemployment, but
remain constant after the promised utility reaches the lower bound. Alvarez-Parra and Sanchez (2009) show a sim-
ilar result in a model with an endogenous lower bound on promised utility.


Because the unemployed worker's consumption after m_1 remains unchanged, his continuation utility at m_1 is Ũ(m_1) = U(m_1) − rδϵ, which is equal to Ẽ(m_1). Therefore, the promise-keeping constraint U(m_1 − δ) = rδ ũ(m_1 − δ) + e^{−rδ} Ũ(m_1) still holds, and the incentive constraint is relaxed:

rδ e^{−ρw} ũ(m_1 − δ) + e^{−rδ} Ẽ(m_1) = rδ e^{−ρw} u(m_1 − δ) + e^{−rδ} E(m_1) − (1 − e^{−ρw})rδϵ

                                       < rδ e^{−ρw} u(m_1 − δ) + e^{−rδ} E(m_1).

Starting from u(m_1 − δ) = u(m_1), the additional cost of consumption incurred by this variation is second order, but the effect on the incentive constraint is first order. Hence, the principal always chooses u(t) above u(m_1) when t is close to (but below) m_1.
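To see the variational argument numerically, the following minimal Python sketch applies exactly the perturbation above and checks that the promise-keeping constraint is unchanged while the right-hand side of the incentive constraint falls. All numbers (r, π, ρ, w, δ, ϵ, and the continuation utilities) are illustrative assumptions, not the calibration used later.

import numpy as np

# Illustrative parameters (assumptions)
r, rho, w = 0.001, 2 / 692, 692.0
delta, eps = 0.1, 1e-3              # small period length and perturbation size

# Arbitrary CARA utilities (negative numbers); E(m1) = U(m1) at verification
u_before, U_m1, E_m1 = -0.20, -0.18, -0.18

# Original promise-keeping and incentive right-hand sides at m1 - delta
# (promise-keeping uses E(m1) = U(m1), so the pi*delta terms collapse)
pk_rhs = r * delta * u_before + np.exp(-r * delta) * U_m1
ic_rhs = r * delta * np.exp(-rho * w) * u_before + np.exp(-r * delta) * E_m1

# Perturbation: raise flow utility just before m1, lower utilities at m1
u_tilde = u_before + np.exp(-r * delta) * eps
U_tilde = U_m1 - r * delta * eps     # consumption after m1 unchanged
E_tilde = E_m1 - r * delta * eps

pk_rhs_new = r * delta * u_tilde + np.exp(-r * delta) * U_tilde
ic_rhs_new = r * delta * np.exp(-rho * w) * u_tilde + np.exp(-r * delta) * E_tilde

print("promise-keeping unchanged:", np.isclose(pk_rhs, pk_rhs_new))
print("incentive RHS falls by   :", ic_rhs - ic_rhs_new)   # positive number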

We summarize these findings in the following proposition. The proof is in
Appendix C.

PROPOSITION 2: The unemployment benefit c^U(·) is monotonically decreasing with unemployment duration, with downward jumps at verification, while c^E(·) is nonmonotonic: It decreases between verifications, with upward jumps immediately after verification.

Unemployment insurance systems in many countries feature benefits schemes
similar to the one in Proposition 2. For example, in Spain, workers receive a replace-
ment rate of 70 percent for the first 6 months of unemployment, 60 percent for the
next 18 months, and a minimum payment thereafter.

[Figure 5. Consumption for the Unemployed. Vertical axis: unemployment benefits c^U; horizontal axis: time, with m_1 and 2m_1 marked.]


E. Pontryagin Minimum Principle

We construct a solution to the optimal control problem (9) in which the incentive constraint (11) binds (i.e., μ(t) = 0) for all t < m_1. The problem faced by the principal is to choose an initial state E(0) and a time path u(·) to minimize the cost in (9), given U(0). The promise-keeping and incentive constraints (10) and (11) then imply a time path (U(·), E(·)) for continuation utilities.

One way to think about this problem is to think of choosing u(t) at each date, given the values of U(t) and E(t) that have been attained by that date. The principal faces a tradeoff between the current period cost and the cost of delivering continuation utilities. Hence, she needs to set “prices,” Φ and λ, on increments to the continuation utilities U and E. Because it is costly for the principal to maintain a low E as a threat, it must be the case that λ ≤ 0. Moreover, we have argued in Section IVB that E(t) ≥ U(t) is slack except at m_1, so we impose only the constraint E(m_1) = U(m_1).

A central construct in the optimal control problem is the current value
Hamiltonian ℋ defined by

ℋ = πc(E(t)) + rc(u(t)) + Φ(t)((r + π)U(t) − πE(t) − ru(t)) + λ(t)(rE(t) − r e^{−ρw} u(t)),

which is just the sum of current period cost and the rate of increase in continuation
utilities valued at Φ(t) and λ(t) . An optimal allocation must minimize ℋ at each
date t .

The first-order condition for minimizing ℋ with respect to u is

(13) c′(u) = Φ + e^{−ρw} λ.

The left-hand side is the marginal cost of today’s utility, while the right-hand side
is the marginal cost of starting with higher continuation utility U tomorrow, offset
by the benefit of a slacker incentive constraint (it is a benefit because λ ≤ 0 ). The
utility u must be chosen to equalize the costs at each date.
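Under the CARA specification, c′(x) = −(ρx)^{−1}, so (13) can be inverted in closed form as u(t) = −[ρ(Φ + e^{−ρw} λ(t))]^{−1}; this is the same expression that appears as (D3) in Appendix D. The following minimal Python sketch, with hypothetical values of Φ and λ, simply verifies the inversion.

import numpy as np

rho, w = 2 / 692, 692.0        # illustrative CARA coefficient and wage
Phi, lam = 2500.0, -300.0      # hypothetical shadow prices, with lambda <= 0

def c_prime(x):
    """Marginal consumption cost of utility under CARA: c'(x) = -1/(rho*x)."""
    return -1.0 / (rho * x)

# Closed-form solution of the first-order condition c'(u) = Phi + exp(-rho*w)*lambda
u = -1.0 / (rho * (Phi + np.exp(-rho * w) * lam))

assert np.isclose(c_prime(u), Phi + np.exp(-rho * w) * lam)
print("flow utility u =", u)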

The prices Φ and λ must satisfy

(14) Φ′(t) = (r + π)Φ − ∂ℋ/∂U = 0,

(15) λ′(t) = (r + π)λ − ∂ℋ/∂E = π(Φ − c′(E) + λ),

at each date t if (u( · ), U( · ), E( · )) is an optimal path. Equation (14) implies that
Φ(t) is a constant. Moreover, since multiplier Φ(0) is the marginal cost of U(0) , we
have

Φ = c′(U(0)) = −(ρU(0))^{−1} > 0.


Since the planner can choose E(0) freely,

(16) λ(0) = 0 .

At m 1 , the shadow prices Φ and λ( m 1 ) must satisfy

(17) Φ = −κ + c′(U( m 1 )),

(18) λ( m 1 ) = κ,

where e −(r+π) m 1 κ is the multiplier on the constraint E( m 1 ) = U( m 1 ) . Since the
principal’s problem is convex, these conditions (13–18) are both necessary and suf-
ficient for a minimum.

When (11) holds as equality, the states (U, E) and the costate λ satisfy differen-
tial equations:

(19) U′(t) = (r + π)U − πE − ru,

(20) E′(t) = rE − r e^{−ρw} u,

(21) λ′(t) = π(Φ − c′(E) + λ).

The ODE system contains three variables and would be difficult to analyze in a
general context. However, we can solve (20) and (21) regardless of (19) because
neither (20) nor (21) relies on U . Once (20) and (21) are solved, it is easy to solve
(19). Formally,

LEMMA 2: If (20) and (21) hold, then (19) holds if and only if

(22) ΦU(t) + λ(t)E(t) + ρ^{−1} = 0, ∀ t ∈ [0, m_1].

To solve the reduced ODE system, (20) and (21), we need two boundary conditions. The first is (16), λ(0) = 0. The second cannot be a value for E(0), as E(0) is endogenous and unknown a priori. We obtain the second boundary condition, E(m_1) = −ρ^{−1}(Φ + λ(m_1))^{−1}, from E(m_1) = U(m_1) and equation (22).

The following lemma shows that these two boundary conditions pin down a
unique solution curve for the system (20) and (21). Figure 6 shows the phase dia-
gram. That λ < 0 implies that the incentive constraint binds for all t < m 1 .

LEMMA 3: For any m_1 > 0, there is a unique initial condition E(0) such that the solution starting at (λ(0) = 0, E(0)) satisfies E(m_1) = −ρ^{−1}(Φ + λ(m_1))^{−1}.
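The following Python sketch illustrates one way to compute this solution numerically. It works in the (λ, g) coordinates of Appendix C (g ≡ c′(E) = −(ρE)^{−1}), integrates (21) and (C4) forward from (λ(0) = 0, g(0)), records the time m(g(0)) at which the path hits the line g = Φ + λ, and bisects on g(0) so that m(g(0)) = m_1, mirroring the proof of Lemma 3. All parameter values, tolerances, and the guard against paths that escape the region are illustrative implementation choices, not part of the model.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

# Illustrative weekly parameters (assumptions)
r, pi, rho, w = 0.001, 1 / 16.85, 2 / 692, 692.0
U0 = -np.exp(-rho * 692.0)          # hypothetical promised utility level
Phi = -1.0 / (rho * U0)             # Phi = c'(U(0)) = -(rho*U(0))^{-1} > 0
m1 = 12.0                           # target verification date (weeks)
T_MAX = 500.0                       # treat "no hit by T_MAX" as a very long cycle

def rhs(t, y):
    """ODE system (21) and (C4) in (lambda, g) coordinates."""
    lam, g = y
    dlam = pi * (Phi - g + lam)                              # (21), with c'(E) = g
    dg = r * g**2 / (Phi * np.exp(rho * w) + lam) - r * g    # (C4)
    return [dlam, dg]

def hit_line(t, y):
    """Event: the path reaches the boundary line g = Phi + lambda."""
    lam, g = y
    return g - (Phi + lam)
hit_line.terminal = True
hit_line.direction = -1

def cross_upper(t, y):
    """Guard event: the path leaves the region through g = Phi*e^{rho w} + lambda."""
    lam, g = y
    return (Phi * np.exp(rho * w) + lam) - g
cross_upper.terminal = True
cross_upper.direction = -1

def hitting_time(g0):
    """m(g(0)): time for the path starting at (0, g0) to hit g = Phi + lambda."""
    sol = solve_ivp(rhs, (0.0, T_MAX), [0.0, g0], events=[hit_line, cross_upper],
                    rtol=1e-10, atol=1e-12, max_step=0.5)
    return sol.t_events[0][0] if sol.t_events[0].size else T_MAX

# Lemma 3: m(g(0)) -> 0 as g(0) -> Phi and is strictly increasing, so bisect on g(0).
g_lo = Phi * (1 + 1e-8)
g_hi = Phi * np.exp(rho * w) * (1 - 1e-8)
g0_star = brentq(lambda g0: hitting_time(g0) - m1, g_lo, g_hi)

# Recover the path and check the boundary condition g(m1) = Phi + lambda(m1).
sol = solve_ivp(rhs, (0.0, m1), [0.0, g0_star], rtol=1e-10, atol=1e-12, max_step=0.5)
lam_m1, g_m1 = sol.y[0, -1], sol.y[1, -1]
print("E(0) =", -1.0 / (rho * g0_star), "  E(m1) =", -1.0 / (rho * g_m1))
print("boundary check: g(m1) =", g_m1, " vs Phi + lambda(m1) =", Phi + lam_m1)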


V. Optimal Monitoring

Until this point, we have taken m 1 as exogenous. In this section, we characterize
the optimal choice of m 1 . The tradeoff in choosing m 1 is as follows. Monitoring more
frequently implies higher verification cost, but the principal can provide better insur-
ance: The consumption path for the unemployed is similar to that for the employed.
Monitoring less frequently implies lower verification cost but worse insurance.

For any m_1 > 0, denote the minimized cost in (9) as 𝒞(m_1); that is,

𝒞(m_1) = ∫_0^{m_1} e^{−(r+π)t} (πc(E(t)) + rc(u(t))) dt + e^{−(r+π)m_1} (γ + ψ + c(U(m_1))).

Intuitively, delaying monitoring (i.e., a small increase in m 1 ) saves the principal both
the cost of monitoring and the cost of (after-monitoring) consumptions, because
the payment of γ + ψ + c(U( m 1 )) is postponed. By doing so, however, the prin-
cipal must maintain the consumptions c(E( · )) and c(u( · )) for a longer duration.
Subtracting the benefit from the cost (algebraic details in Appendix C) yields

𝒞′(m_1) = e^{−(r+π)m_1} ( r ρ^{−1} log( (Φ + e^{−ρw} λ(m_1)) / (Φ + λ(m_1)) ) − (r + π)(γ + ψ) ).

[Figure 6. Phase Diagram for (λ, E). Axes: λ and E; the initial point (0, E(0)), the origin, and the curve E = −ρ^{−1}(Φ + λ)^{−1} are marked.]


Thus, the first-order condition for m 1 is

(23) r ρ^{−1} log( (Φ + e^{−ρw} λ(m_1)) / (Φ + λ(m_1)) ) = (r + π)(γ + ψ).

PROPOSITION 3: The optimal m 1 is the unique solution to (23). That is, (23) is
both necessary and sufficient for the minimum of 𝒞( m 1 ) .
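Since the left side of (23) depends only on the ratio λ(m_1)/Φ, (23) can be solved for that ratio in closed form; the optimal m_1 is then the date at which the path of Section IVE delivers this ratio. The short Python sketch below carries out that algebra with purely illustrative values of γ and ψ (in particular, ψ is not the fixed point of Remark 1, and the numbers are chosen only to keep the example interior).

import numpy as np

# Illustrative parameters (assumptions)
r, pi, rho, w = 0.001, 1 / 16.85, 2 / 692, 692.0
gamma, psi = 5.0, 5.0     # hypothetical monitoring cost and conjectured fixed cost

# Equation (23): (r/rho) * log((Phi + e^{-rho w} lam)/(Phi + lam)) = (r+pi)(gamma+psi).
# Dividing numerator and denominator by Phi shows only x = lam(m1)/Phi matters:
#   log((1 + e^{-rho w} x)/(1 + x)) = K  with  K = rho*(r+pi)*(gamma+psi)/r,
# which rearranges to x = (e^K - 1)/(e^{-rho w} - e^K), a number in (-1, 0).
K = rho * (r + pi) * (gamma + psi) / r
x = (np.exp(K) - 1.0) / (np.exp(-rho * w) - np.exp(K))

lhs = (r / rho) * np.log((1.0 + np.exp(-rho * w) * x) / (1.0 + x))
print("required lambda(m1)/Phi =", x)
print("check: LHS of (23) =", lhs, " RHS =", (r + pi) * (gamma + psi))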

REMARK 1: Although our analysis relies on an undetermined parameter ψ, the parameter can be uniquely pinned down by a fixed-point condition that the actual cost function at time zero must equal the conjectured function ψ + c(U(0)). Further details are in Appendix C.

REMARK 2: Our analytical results rely on the assumption of CARA preferences. Unlike the CARA case where the length of the monitoring cycle is independent of history, the cycle length in the CRRA case depends on the worker's continuation utility. However, most of the main features of the optimal contract remain valid even if the worker has CRRA preferences. We demonstrate this through a numerical example in Fuller, Ravikumar, and Zhang (2013).

VI. Quits

Another type of fraud that could arise in our model is quits. An agent in our model
could transition to employment in period t , claim to be unemployed until almost m 1 ,
and then quit to become unemployed at m 1 . The verification at m 1 would not reveal
him to be a cheater. Thus, quitting is possible in our model.

Our mechanism guarantees that the agent does not commit such a fraud. The
continuation utilities E( · ) and U( · ) are such that the agent is indifferent between
reporting the transition immediately and delaying it to the next period. By following
the path above and quitting at m 1 , he becomes truly unemployed, is subject to the
stochastic arrival rate of employment opportunity, and is worse off.

Hopenhayn and Nicolini (2009) examine a model where quits cannot be distin-
guished from layoffs and the only fraudulent behavior is quits. In their model, the
employment status is observable and nonabsorbing, and disutility from working is
greater than that from searching for employment. Employed agents might want to
opportunistically quit their job, enjoy more leisure, and collect unemployment bene-
fits. To discourage quits, the principal offers (i) higher consumption to the employed
workers who stay on the job longer and (ii) more generous benefits to unemployed
workers with longer employment spells, as quitters have shorter employment spells
on average. In our model, the utility functions for the unemployed worker and the
employed worker are the same, and employment status is private information. Since
employment is an absorbing state, quitting as considered in Hopenhayn and Nicolini
(2009) cannot arise in our model. The potential reason for quitting in our model is
to cover up the fraudulent collection of unemployment benefits before the verifica-
tion period. Our optimal mechanism provides incentives for the agent not to delay
reporting his transition to employment and not to conceal his earnings.


Overpayment due to quits is small relative to the overpayment due to concealed
earnings (see Table 1). Our mechanism deters fraud due to both concealed earnings
and quits.

VII. Stochastic Verification

Our monitoring mechanism in the previous sections was restricted to determin-
istic verification. Here we consider a more general mechanism where the principal
verifies randomly after receiving the unemployment report. As in Section II, veri-
fication reveals the worker’s true employment status. (In Appendix E, we consider
an imperfect verification technology: An unemployed worker might be erroneously
labeled as employed.) Conditional on the unemployment report at t , the principal
chooses the monitoring Poisson rate p(t) ≥ 0 . That is, over a period of length dt ,
the principal monitors with probability p(t) dt and she does not monitor with proba-
bility 1 − p(t) dt . (Since our model is in continuous time, p(t) is not the monitoring
probability.)

We assume that if a worker is monitored and caught cheating, he has to pay a
finite penalty forever. With infinite penalty, an arbitrarily small monitoring prob-
ability would deliver the full-information constant consumption. In our model, if
the principal can choose any finite penalty between 0 and ϕ > 0, she would always
choose ϕ . Henceforth, we assume that the finite penalty is ϕ units of the consump-
tion good, forever.

Similar to (10) and (11), the promise-keeping constraint and incentive constraint
are

(24) U′ = r(U − u) − π(E − U) − p(Ũ − U),

(25) E′ ≤ rE − r e^{−ρw} u − p(e^{ρϕ} − 1)E,

where Ũ is the unemployed agent's continuation utility after monitoring. Because the probability that monitoring does not occur in [0, t) is e^{−∫_0^t p(s) ds}, the principal's objective is

(26) ∫_0^∞ e^{−(r+π)t − ∫_0^t p(s) ds} (πc(E(t)) + rc(u(t)) + p(t)(γ + c(Ũ(t)))) dt.

The principal chooses the utilities {U(t), E(t), u(t), Ũ(t); t ≥ 0} and the arrival rates of monitoring {p(t); t ≥ 0} to minimize (26) subject to (24), (25), and the constraint E(t) ≥ U(t), ∀ t ≥ 0.
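The survival factor e^{−∫_0^t p(s) ds} in (26) is the probability that no monitoring has arrived by date t; for a constant rate p it equals e^{−pt}. The following small Monte Carlo sketch, with illustrative values of p and t, confirms this.

import numpy as np

rng = np.random.default_rng(0)
p, t = 0.05, 10.0            # illustrative constant monitoring rate (per week) and horizon
n_sim = 200_000

# With a Poisson arrival rate p, the first monitoring time is exponential with mean 1/p,
# so the probability that no monitoring occurs in [0, t) is exp(-p*t).
first_monitoring = rng.exponential(1.0 / p, size=n_sim)
print("simulated P(no monitoring by t):", (first_monitoring > t).mean())
print("exp(-p*t)                      :", np.exp(-p * t))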

Since the penalty for a worker with high promised utility is the same as that for a
worker with low promised utility, we obtain a scaling property similar to the one in
Section IIIA. Thus, the incentives to conceal earnings are the same for workers with
different promised utilities. Similar to our model with deterministic verification, we
show in Proposition 4 that the optimal stochastic verification mechanism consists of
cycles. See Appendix D for the proof.


PROPOSITION 4: There exists an n > 0 such that the principal monitors the unemployed with a constant arrival rate p > 0 if and only if t ≥ n. Before n, the time path (U(·), E(·)) converges to the 45-degree line; after n, it moves along the 45-degree line toward (−∞, −∞) until the agent is randomly drawn to be verified. After the verification, (U, E) jumps to a new state (Ũ, Ẽ) and a new cycle starts.

The unemployed worker is in one of two states: (i) not monitored
(i.e., p(t) = 0 ) or (ii) randomly drawn to be monitored (i.e., p(t) ≡ p > 0 ).
Within each cycle, an unemployed worker is initially in the not-monitored state.
He is moved to the random monitoring state if the duration of his unemployment
report exceeds the threshold n . If he is randomly drawn to be monitored, then
he is moved to the not-monitored state after being monitored, and a new cycle
begins. While the date of monitoring is stochastic, the threshold duration is not.
That is, within each cycle, the principal guarantees that the worker will not be
monitored until the threshold duration is reached, similar to the deterministic ver-
ification case.

The intuition for why the worker is not monitored before the threshold duration
is as follows. The Unemployment Insurance agency has access to two instruments:
tax/subsidy and monitoring. Recall that at verification the true employment status
is revealed, and E is reset to a level such that its shadow price is zero, which means
that, immediately after monitoring, the employment tax can be varied at no cost.
The cost of the tax/subsidy instrument is lower than the cost of monitoring, γ > 0 ,
immediately after monitoring, and remains so until some threshold unemployment
duration is reached. Hence, it is optimal to use only the tax/subsidy instrument for
the provision of incentives before the threshold.

REMARK 3: The absence of verification until a threshold duration is unlikely to be robust to other types of penalties. For instance, in Popov (2009) there is an exogenous lower bound on the worker's continuation utility, and a worker who is caught cheating is pushed to this lower bound. So the penalty for a worker with high continuation utility is larger than that for a worker with low continuation utility. With hidden independently and identically distributed income, he shows that the verification probability is always positive.

The stochastic monitoring mechanism clearly dominates the deterministic mechanism characterized in Section V. To see this, consider a stochastic monitoring scheme in which the arrival rate of monitoring is higher than p for workers in the random monitoring state. Denote this higher arrival rate as p̃. Proposition 4 implies that p̃ is suboptimal. By continuity, the limiting scheme as p̃ → ∞ should also be suboptimal. This limiting scheme is exactly the deterministic monitoring mechanism.

We argue below that the key insights on the use of tax/subsidy and monitoring
instruments in the suboptimal deterministic mechanism are nearly identical to the
insights from the optimal stochastic mechanism. We describe in detail the similari-
ties and differences between the implications of the two mechanisms.


A. Comparison of Monitoring with the Deterministic Case

First, both the stochastic and deterministic mechanisms have the feature that
monitoring does not occur before a threshold unemployment duration; m 1 in the
deterministic case and n in the stochastic case. These thresholds, however, could be
different; i.e., in general m 1 ≠ n .

Second, both mechanisms feature cycles. In the deterministic case, after m 1 , a
new cycle begins; the new cycle has exactly the same length as the previous cycle.
Similarly, in the stochastic case, after monitoring occurs a new cycle begins and
verification does not occur again before the threshold n is reached. The exact date
when the monitoring occurs in the stochastic case is random. This is because, after
n , monitoring arrives according to a Poisson process and, hence, the exact length of
each cycle depends on when the worker is actually verified. As in the deterministic
case, however, the value of n is the same in each cycle.

B. Comparison of Tax/Subsidy with the Deterministic Case

Consumptions in the stochastic monitoring case are similar to those in the deterministic case. Within each cycle, before the threshold n, the patterns of consumption are identical to (c^E, c^U) in Figures 4 and 5. After n, if a worker is monitored and verified to be truly unemployed, then the unemployment benefits jump down, as in the deterministic case.

The only difference is that in the deterministic case, continuation utilities and consumptions are reset when the threshold m_1 is reached. In the stochastic case, after the threshold n and before the monitoring actually arrives, continuation utilities and consumptions smoothly decline with the duration of unemployment. The decreasing continuation utilities and the monitoring (and finite punishment) jointly provide incentives for truth telling; the worker is indifferent between reporting a job offer and committing fraud.

C. Quantitative Analysis

To illustrate our optimal contract, we follow Hopenhayn and Nicolini (1997)
closely and perform a quantitative exercise similar to theirs. We let the agents in
our model face a stylized version of the US unemployment insurance system. We
calibrate the model to match the observed rate of concealed earnings fraud. We then
compute the gain from switching to the optimal mechanism in our model.

To perform this exercise, we have to add some heterogeneity to our model; oth-
erwise everyone would cheat or no one would cheat, and we would not be able to
match the observed rate of concealed earnings fraud. We assume that the workers
are heterogeneous in the wages they earn and, hence, the replacement rate for unem-
ployment benefits. Concretely, we assume that the wage distribution is lognormal
with parameters μ w and σ w 2 .

The BAM data provide earnings information for an individual’s previous employ-
ment (the earnings that determine the unemployment benefits for the individual). In
the 2007 sample of BAM data, the mean weekly wage is $692 and the coefficient


of variation is 0.79 . Using these data moments, we calibrate μ w = 6.296 and
σ w 2 = 0.488 . By construction, the earnings in the BAM data are only for those who
collect unemployment benefits. Instead of using the BAM data we could use the CPS
data on earnings for the entire employed population to calibrate the wage distribution
in the model. However, individuals collecting unemployment benefits generally earn
less (while employed) than the individuals in the entire employed population.9

We calculate the unemployment benefits as a function of wages, again using the
BAM 2007 data: ln(unemployment benefits) = 1.31 + 0.65 ln(wages).

We assume that the model period is one week and that the interest rate
r = 0.001 . Since the average duration of unemployment in 2007 is 16.85 weeks, we
calibrate the job arrival rate to be π = 1/16.85 . The monitoring cost γ is calibrated as
follows. On average, the BAM investigators spend 12.6 hours per case and the average
wage of the investigators is $43 in 2012 (the only year when such data are available).
So, adjusting the average wage to 2007 dollars, we calibrate γ to be $501 . We cali-
brate the value of absolute risk aversion ρ such that the relative risk aversion for the
average wage earner is 2 . Since the average wage is $692 in our sample, ρ = 2/692 .
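The lognormal parameters follow from the standard moment identities mean = e^{μ_w + σ_w²/2} and CV² = e^{σ_w²} − 1. The short Python script below is a sketch that recovers the calibrated values from the stated data moments; small differences from the reported figures reflect rounding of the inputs.

import math

mean_wage, cv = 692.0, 0.79        # BAM 2007 moments
avg_duration = 16.85               # weeks
rra_at_mean = 2.0                  # target relative risk aversion at the average wage

# Lognormal wage distribution: mean = exp(mu + sigma2/2), CV^2 = exp(sigma2) - 1
sigma2_w = math.log(1.0 + cv**2)
mu_w = math.log(mean_wage) - sigma2_w / 2.0

pi = 1.0 / avg_duration            # weekly job arrival rate
rho = rra_at_mean / mean_wage      # CARA coefficient: rho * w = 2 at w = $692

print(f"mu_w     = {mu_w:.3f}")    # paper reports 6.296
print(f"sigma2_w = {sigma2_w:.3f}")# paper reports 0.488
print(f"pi       = {pi:.4f}")
print(f"rho      = {rho:.6f}")     # 2/692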

We then calibrate the probability of monitoring and the penalty in the US system if
caught cheating to match two targets: fraction of people committing concealed earn-
ings fraud and fraction of people caught cheating among those committing the fraud.

With CARA preferences, wage heterogeneity is not relevant for matching the two
targets, but it is relevant for computing the distribution of initial promised utility in
the baseline. In the counterfactual, we take these initial promised utilities as given,
calculate the optimal monitoring and benefits, and then compute the cost of deliver-
ing the initial promised utilities. The job arrival rate, wage distribution, and penalty
are held fixed at the same values as the baseline calibration.

The results imply that, measured in present value, the cost of optimal monitoring
is 60 percent of the cost in the current US system. In the optimal contract (averag-
ing across the initial promised utilities), n = 11.64 weeks. That is, the planner
guarantees that monitoring does not occur for roughly the first 12 weeks of the
unemployment spell and, thus, reduces the monitoring cost with an efficient use of
the monitoring technology.

To determine the magnitude of the gain from switching to the optimal mecha-
nism, suppose that the planner is restricted to use the same amount of resources
as the current US system. How much additional utility can the planner deliver to
the average worker? The answer is a utility gain equivalent to 1.55 percent more
consumption at every date than the US system provides. This gain arises from two
sources: (i) improved consumption smoothing between employed and unemployed
states and (ii) reduced monitoring costs or higher consumption on average. The US
system spends only 0.24 percent of its resources on monitoring the average worker and
spends the rest on unemployment benefits (net of wages), but the same resources are
allocated differently in the optimal contract: 0.17 percent is spent on monitoring the
average worker and the rest is spent on unemployment benefits. Thus, almost all of
the gain in our model comes from improved consumption smoothing.

9 The mean weekly wage among employed workers in the March 2007 CPS is $861 , and the coefficient of
variation is 1.27 .


There are some obvious limitations to this analysis. Most notably, our exercise
is a partial equilibrium analysis, as in Hopenhayn and Nicolini (1997). To fully
quantify the welfare gains from adopting the optimal contract, we have to conduct
a general equilibrium analysis incorporating transition from employment to unem-
ployment and disciplining the model with aggregate worker flows.

VIII. Conclusion

The most prevalent incentive problem in the US unemployment insurance system
is that individuals collect unemployment benefits while being gainfully employed.
We examine a model of optimal unemployment insurance where a worker can con-
ceal his employment status and the unemployment insurance authority has a tech-
nology to verify his employment status. We find that the optimal interval between
consecutive monitoring periods is a constant, independent of history. The optimal
employment tax is nonmonotonic, increasing between verifications and decreasing
immediately after a verification. The optimal unemployment benefits decline with
unemployment duration with sharp declines after each verification. Our optimal
contract also prevents fraud due to quits.

Unemployment insurance in our model is a form of social insurance protecting
workers against the risk of job loss. Acemoglu and Shimer (1999, 2000); Shimer
and Werning (2008); and Alvarez-Parra and Sanchez (2009) explore another role
of unemployment insurance. They examine environments with heterogeneous jobs,
and unemployment insurance helps the worker wait for the appropriate job. Some
jobs have higher productivity than others, but such job opportunities arrive less fre-
quently. Unemployment benefits help workers wait for more productive matches
and endure longer unemployment durations. The benefits in these environments
affect the aggregate composition of jobs. An interesting direction for future research
is to extend our environment to multiple jobs and examine optimal monitoring in the
presence of the alternative role of unemployment insurance.

Finally, our model does not include any job retention effort. Incorporating the
job retention effort into our model requires employment to be stochastic. If workers
can conceal earnings, their hidden income could affect their job retention effort.
Analyzing interaction between effort and fraud is another interesting direction for
future research.

Appendix A. Data

Fraud and Overpayments.—Table A1 details the various types of fraud overpay-
ments from 2005−2009 , averaged over all US states. Concealed earnings fraud is
the dominant source of overpayments in every year.

The unemployment insurance system might incur another form of overpayment
if workers strategically delay the start date of employment. That is, workers might
accept a job offer but agree to start the job after their unemployment benefits have
expired. Gauthier-Loiselle (2011) documents that unemployment insurance expen-
ditures are higher in Canada because of such cases. In the United States, this is not


considered fraud. Thus, the BAM data include no information on such cases, so they
are not included in the fraud overpayments statistics.

Overpayments Due to Insufficient Search.—In Table 1 in Section I, the over-
payments due to concealed earnings fraud were almost 12 times the overpayments
due to insufficient search fraud. Do the data understate the incidence of insuffi-
cient search? Recall that the BAM program measures only the extensive margin—
whether the individual submits the required number of applications. It is possible
that the unmeasured intensive margin—effort that turns an application into a job
offer—is large enough to make the overpayments due to insufficient search compa-
rable in magnitude to the overpayments due to concealed earnings. The following
facts, however, suggest that the unmeasured component is unlikely to be large:
• Measured overpayments due to insufficient search have been declining: In 1988 they accounted for 34 percent of the total overpayments due to all fraud, whereas in 2007 they accounted for less than 5 percent. (The corresponding numbers for concealed earnings fraud were 41 percent and more than 60 percent.)

• The job search requirements that make an unemployed person eligible for benefits have increased over time, so the decline in the measured component is not due to changes in eligibility criteria. Hence, for the insufficient search overpayments to be the same in 2007 as those measured in 1988, the unmeasured component has to be almost six times that of the measured component in 2007.

• If unmeasured efforts to translate a job application into a job offer were substantially higher in 2007, then the increase in efforts should imply a substantially higher transition rate from unemployment to employment. However, the transition rate is roughly constant: The quarterly rate was 0.31 for the period 1988–1997 and 0.33 for 1998–2007.

From a normative point of view, as noted in footnote 1, the prevailing quantitative
theory prescribes an intensive margin search effort that is less than the effort exerted
under the current unemployment insurance program in the United States. In other
words, insufficient search is not a critical incentive problem in the United States.
(Using evidence from randomized trials in four US sites, Ashenfelter, Ashmore,
and Deschenes (2005) find that insufficient job search is not a significant source of
unemployment insurance overpayments.)

Table A1—Fraud Overpayments
(Percent of total fraud overpayments)

Cause                      2005     2006     2007     2008     2009
Concealed earnings        62.64    54.40    60.06    67.32    65.89
Insufficient job search    4.55     4.15     4.95     3.02     2.75
Refused suitable offer     1.50     1.23     0.80     0.36     0.77
Quits                     12.78    16.41    13.29    12.69     9.61
Fired                      4.27     4.60     4.17     4.60     7.38
Unavailable for work       4.94     6.95     7.06     5.04     5.14
Other                      9.33    12.27     9.67     6.97     8.46
Total                    100.00   100.00   100.00   100.00   100.00

Source: Benefit Accuracy Measurement Program, US Department of Labor


Appendix B. Microfoundations for E(t) ≥ U(t)

Suppose that the worker can privately refuse a job offer. The timing in each period is
as follows. The stochastic job opportunity arrives and the worker either receives an offer
or does not. He then chooses to report the offer (if any) to the principal. Conditional
on the report of an offer, the principal recommends that the worker either accept or
reject the offer. The worker then chooses whether to follow the principal’s recommen-
dation. (In contrast, job acceptance is implicitly imposed in our model in Section II.)
Conditional on the report, the principal assigns current and future consumptions.

In such a job-refusal model, it is optimal for the principal to always recommend
to the worker who reports an offer to accept the offer. Recommending “accept”
minimizes the cost of delivering the promised utility since the worker’s consump-
tion is constant upon job acceptance and the principal gets the perpetual wage.
Recommending “reject” means that the continuation contract involves additional
uncertainty of job offers, reports, and incentive constraints. So the consumption cost
of delivering the same promised utility is higher under “reject.” Recall that, unlike
Atkeson and Lucas (1995), we do not have disutility to working so it is optimal to
always recommend “accept.”

The incentive compatibility for an agent with a job offer is as follows. If he reports
his offer and receives a recommendation to accept, he strictly prefers “accept” to
“reject.” This is because rejecting the offer would not make him eligible for any
unemployment insurance benefits, but would make him lose his wage income. If the
agent does not report his offer, then either he rejects the offer and obtains U(t) or
he accepts the offer and commits fraud (i.e., he works and collects unemployment
benefits at the same time). For the agent to truthfully report his offer, the utility of
reporting and accepting the offer, E(t) , must be higher than both U(t) and the utility
he obtains by committing concealed earnings fraud. These incentive compatibility
constraints are exactly conditions (2) and (3) in our model in Section II.

Appendix C. Proofs

PROOF OF LEMMA 1:
Suppose that a contract σ ≡ {(U(t), E(t), u(t), c^U(t), c^E(t), m_i); t ≥ 0, i ≥ 1} delivers the continuation utility U. Then, a contract

σ_α ≡ {(αU(t), αE(t), αu(t), c^U(t) − log(α)/ρ, c^E(t) − log(α)/ρ, m_i); t ≥ 0, i ≥ 1}

delivers αU. The reverse is also true. Further, σ is incentive compatible if and only if σ_α is incentive compatible. Therefore, {(U*(t), E*(t), u*(t), c^{U*}(t), c^{E*}(t), m_i*); t ≥ 0, i ≥ 1} is the optimal contract to deliver U if and only if

{(αU*(t), αE*(t), αu*(t), c^{U*}(t) − log(α)/ρ, c^{E*}(t) − log(α)/ρ, m_i*); t ≥ 0, i ≥ 1}

is the optimal contract to deliver αU. ∎

LEMMA 4: The promise-keeping constraint (1) and the incentive constraint (6) hold for all 0 ≤ t < s ≤ m_1 if and only if

(C1) U(s) − U(t) = ∫_t^s ((r + π)U(x) − πE(x) − ru(x)) dx,

(C2) E(s) − E(t) ≤ ∫_t^s (rE(x) − r e^{−ρw} u(x)) dx,

hold for all 0 ≤ t < s ≤ m_1. Taking the limit as s goes to t yields the differential equations (10) and (11).

PROOF:
We show only the equivalence between (6) and (C2), since the equivalence between (1) and (C1) can be obtained similarly by replacing the inequalities below with equalities.

Necessity: If (6) holds for all t < s, then

E(t) + ∫_t^s (rE(x) − r e^{−ρw} u(x)) dx

  ≥ ∫_t^s e^{−r(x−t)} r e^{−ρw} u(x) dx + e^{−r(s−t)} E(s) + ∫_t^s ( r ( ∫_x^s e^{−r(η−x)} r e^{−ρw} u(η) dη + e^{−r(s−x)} E(s) ) − r e^{−ρw} u(x) ) dx

  = ( e^{−r(s−t)} + ∫_t^s r e^{−r(s−x)} dx ) E(s) + ∫_t^s ( e^{−r(x−t)} − 1 ) r e^{−ρw} u(x) dx + ∫_t^s r ( ∫_x^s e^{−r(η−x)} r e^{−ρw} u(η) dη ) dx

  = E(s) + ∫_t^s ( e^{−r(x−t)} − 1 ) r e^{−ρw} u(x) dx + ∫_t^s ( ∫_t^η r e^{−r(η−x)} dx ) r e^{−ρw} u(η) dη

  = E(s) + ∫_t^s ( e^{−r(x−t)} − 1 ) r e^{−ρw} u(x) dx + ∫_t^s ( 1 − e^{−r(η−t)} ) r e^{−ρw} u(η) dη

  = E(s).

Hence, inequality (C2) is verified.

Sufficiency: Define an absolutely continuous function f(·) as

f(s) ≡ ∫_t^s e^{−r(x−t)} r e^{−ρw} u(x) dx + e^{−r(s−t)} ( E(t) + ∫_t^s (rE(x) − r e^{−ρw} u(x)) dx ).

Because f is absolutely continuous, it is differentiable almost everywhere (a.e.), and

f′(s) = e^{−r(s−t)} r e^{−ρw} u(s) − r e^{−r(s−t)} ( E(t) + ∫_t^s (rE(x) − r e^{−ρw} u(x)) dx ) + e^{−r(s−t)} (rE(s) − r e^{−ρw} u(s))

      = r e^{−r(s−t)} ( E(s) − E(t) − ∫_t^s (rE(x) − r e^{−ρw} u(x)) dx ), a.e.

If (C2) holds, then f′(s) ≤ 0 a.e. Then, it follows from Theorem 29.15 in Aliprantis and Burkinshaw (1990) that

f(s) = f(t) + ∫_t^s f′(x) dx ≤ f(t) = E(t).

Therefore,

∫_t^s e^{−r(x−t)} r e^{−ρw} u(x) dx + e^{−r(s−t)} E(s) ≤ f(s) ≤ E(t),

which verifies inequality (6). ∎

PROOF OF LEMMA 2:
If (19), (20), and (21) all hold, we can substitute them into (ΦU + λE)′ and obtain

(ΦU + λE)′ = ΦU′ + λ′E + λE′

           = Φ((r + π)U − πE − ru) + π(Φ − c′(E) + λ)E + λ(rE − r e^{−ρw} u)

           = (r + π)(ΦU + λE) − πc′(E)E − r(Φ + e^{−ρw} λ)u.

Because −c′(E)E = ρ^{−1} and −(ρu)^{−1} = c′(u) = Φ + e^{−ρw} λ, we have

(C3) (ΦU + λE)′ = (r + π)(ΦU + λE + ρ^{−1}).

Because ΦU(0) + λ(0)E(0) + ρ^{−1} = 0, it follows from (C3) that ΦU(t) + λ(t)E(t) + ρ^{−1} = 0 for all t ∈ [0, m_1].

On the other hand, if (20) and (21) hold and

ΦU(t) + λ(t)E(t) + ρ^{−1} = 0, ∀ t ∈ [0, m_1],

then (ΦU + λE)′ = 0 for all t ∈ [0, m_1]. Then (19) can be derived by reversing the above steps. ∎

PROOF OF LEMMA 3:
First, it is convenient to transform the state variable E, which may approach −∞, into a bounded one. To do so, we replace E with

g ≡ c′(E) = −(ρE)^{−1}.

Now, the ODE system consists of (21) and

(C4) g′ = E′/(ρE²) = r g²/(Φ e^{ρw} + λ) − rg,

with boundary condition g(m_1) = Φ + λ(m_1) (Figure C1 shows the phase diagram). Let m(g(0)) be the time to hit the straight line g = Φ + λ starting with (λ(0) = 0, g(0)).

Second, we show that lim_{g(0)↓Φ} m(g(0)) = 0. If λ = 0 and g = Φ, then

(g − λ)′(t) = ( r g²/(Φ e^{ρw} + λ) − rg + π(g − λ − Φ) ) |_{(λ, g)=(0, Φ)} = rΦ²/(Φ e^{ρw}) − rΦ < 0.

[Figure C1. Phase Diagram for (λ, g). Axes: λ and g; the lines g = Φ + λ and g = Φ e^{ρw} + λ are shown, with the solution path starting at (0, g(0)) and Φ < g(0) < Φ e^{ρw}.]

Continuity of the ODE system (21), (C4) implies that (g − λ)′(t) < 0 in a small neighborhood of (0, Φ). If λ(0) = 0 and g(0) approaches Φ from above, then g(0) − λ(0) − Φ approaches zero. Since the solution curve starting with (0, g(0)) will remain in the small neighborhood of (0, Φ) for a while, it will decrease and hit the line g = Φ + λ quickly if g(0) − λ(0) − Φ is sufficiently small.

Third, we show that m(g(0)) is strictly increasing in g(0). Consider two paths that start with initial conditions (0, g_1(0)) and (0, g_2(0)), where Φ < g_1(0) < g_2(0). We will show that g_1(t) − λ_1(t) < g_2(t) − λ_2(t) for all t. By contradiction, suppose (g_1 − λ_1)(t) = (g_2 − λ_2)(t) for the first time at t = t*. Because the two paths cannot cross, we cannot have g_1(t*) ≤ g_2(t*). Then g_1(t*) > g_2(t*) and λ_1(t*) > λ_2(t*). Hence,

(g_1 − λ_1)′(t*) = − ( r g_1/(Φ e^{ρw} + λ_1) ) (Φ e^{ρw} + λ_1 − g_1) − π(Φ + λ_1 − g_1)

                < − ( r g_2/(Φ e^{ρw} + λ_2) ) (Φ e^{ρw} + λ_2 − g_2) − π(Φ + λ_2 − g_2)

                = (g_2 − λ_2)′(t*),

where the inequality follows from g_1/(Φ e^{ρw} + λ_1) > g_2/(Φ e^{ρw} + λ_2). That (g_1 − λ_1)′(t*) < (g_2 − λ_2)′(t*) contradicts the facts that (g_1 − λ_1)(t*) = (g_2 − λ_2)(t*) and (g_1 − λ_1)(t) < (g_2 − λ_2)(t) for all t < t*. Thus, g_1(t) − λ_1(t) < g_2(t) − λ_2(t) for all t, and the path (λ_1(t), g_1(t)) reaches g = Φ + λ sooner.

Finally, we show there exists a unique g(0) to satisfy m(g(0)) = m_1 for any m_1 > 0. The second step in this proof shows that lim_{g(0)↓Φ} m(g(0)) = 0. Part (ii) in Lemma 5 shows that m(g(0)) can be arbitrarily large with high values of g(0). Hence, the existence of a unique solution to m(g(0)) = m_1 follows from the intermediate value theorem and the monotonicity of m(g(0)) in g(0). ∎

PROOF OF PROPOSITION 2:
First, we show that E, c^U, U, and U/E all fall on [0, m_1]. It follows from g′(t) < 0 that E′(t) = ρE²(t)g′(t) < 0. Equation (13) implies that u′(t) = e^{−ρw} λ′(t)/c″(u) < 0, or (c^U)′(t) < 0. Equation (22) implies that U′(t) = −Φ^{−1}(λ(t)E(t))′ < 0. Equation (22) also implies that U/E = Φ^{−1}(g − λ). Hence, part (i) in Lemma 5 implies that (U/E)′(t) < 0.

Second, to see the downward jump in c^U(·) at m_1, we show that

lim_{t↑m_1} c′(u(t)) > lim_{t↓m_1} c′(u(t)).

The left side is Φ + e^{−ρw} λ(m_1) according to (13). To obtain the right side, we apply (13) to the interval [m_1, 2m_1) and obtain

c′(u(t)) = c′(U(m_1)) + e^{−ρw} λ̃(t), t ≥ m_1,

where λ̃ denotes the multiplier λ for the problem on the interval [m_1, 2m_1). Because λ̃(m_1) = 0, we have lim_{t↓m_1} c′(u(t)) = c′(u(m_1)) = c′(U(m_1)) + 0 = Φ + λ(m_1). Therefore,

lim_{t↑m_1} c′(u(t)) = Φ + e^{−ρw} λ(m_1) > Φ + λ(m_1) = lim_{t↓m_1} c′(u(t)). ∎

PROOF OF PROPOSITION 3:
First, because (i) (Φ + e^{−ρw} λ)/(Φ + λ) decreases in λ and (ii) λ(m_1) decreases in g(0) and m_1, there is a unique value for g(0) (as well as m_1) for a given ψ.

Second, to show that (23) is sufficient, we prove that

𝒞′(m_1) < 0 for m_1 < m_1*,  and  𝒞′(m_1) > 0 for m_1 > m_1*.

This is because (Φ + e^{−ρw} λ(m_1))/(Φ + λ(m_1)) strictly increases in m_1: (Φ + e^{−ρw} λ(m_1))/(Φ + λ(m_1)) decreases in λ(m_1), and the proof of Lemma 3 shows that λ(m_1) decreases in g(0) and m_1. ∎

Details in the computation of 𝒞′(m_1).—Rewrite 𝒞(m_1) as

∫_0^{m_1} e^{−(r+π)t} ( πc(E^{m_1}) + rc(u^{m_1}) + Φ((r + π)U^{m_1} − πE^{m_1} − ru^{m_1} − (U^{m_1})′) + λ^{m_1}(rE^{m_1} − r e^{−ρw} u^{m_1} − (E^{m_1})′) ) dt

+ e^{−(r+π)m_1} (γ + ψ + c(U^{m_1}(m_1))) + e^{−(r+π)m_1} λ^{m_1}(m_1)(E^{m_1}(m_1) − U^{m_1}(m_1)),

where we put a superscript m_1 on U(·), E(·), u(·), and λ(·) because these optimal paths rely on m_1. We use the envelope theorem to simplify the computation of 𝒞′(m_1). Since U^{m_1}(t), E^{m_1}(t), u^{m_1}(t) are already optimally chosen at each t, we may view them as fixed when we vary m_1. Further, U^{m_1}(m_1) and E^{m_1}(m_1) can be viewed as varying only with the terminal date in the parenthesis.10 Viewed in this light, a small increment of m_1 is just an extrapolation of all time paths over a longer duration of unemployment, while the paths themselves are fixed. That is, we view all superscripts as being fixed and omit them when we calculate derivatives. Because E(m_1) − U(m_1) = 0, we have

𝒞′(m_1) = e^{−(r+π)m_1} ( πc(E(m_1)) + rc(u(m_1)) − (r + π)(γ + ψ + c(U(m_1))) + c′(U(m_1))U′(m_1) + λ(m_1)(E′(m_1) − U′(m_1)) ).

It follows from c′(U(m_1)) = Φ + λ(m_1), λ′(m_1) = 0, and Lemma 2 that

c′(U(m_1))U′(m_1) + λ(m_1)(E′(m_1) − U′(m_1)) = ΦU′(m_1) + λ(m_1)E′(m_1) = (ΦU(m_1) + λ(m_1)E(m_1))′ = 0.

Therefore,

𝒞′(m_1) = e^{−(r+π)m_1} ( πc(E(m_1)) + rc(u(m_1)) − (r + π)(γ + ψ + c(U(m_1))) )

        = e^{−(r+π)m_1} ( r ρ^{−1} log( (Φ + e^{−ρw} λ(m_1)) / (Φ + λ(m_1)) ) − (r + π)(γ + ψ) ).

10 This is because U^{m̃_1}(m_1) and E^{m̃_1}(m_1) can be viewed as being fixed when we vary m̃_1.

Fixed-point condition for ψ.—The condition for ψ is that ψ is the fixed point of operator T, i.e.,

ψ + c(U(0)) = T(ψ) + c(U(0)) ≡ min_σ c(σ).

We obtain ψ from the first-order condition (23) for m_1,

ψ = ( r ρ^{−1}/(r + π) ) log( (Φ + e^{−ρw} λ(m_1)) / (Φ + λ(m_1)) ) − γ.

We obtain T(ψ) from the HJB equation for the cost function at time zero:

T(ψ) + c(U(0)) = ( πc(E(0)) + rc(u(0)) + Φ((r + π)U(0) − πE(0) − ru(0)) ) / (r + π)

               = ( π/(r + π) ) ( Φ/g(0) − log(Φ/g(0)) − 1 ) + c(U(0)).

The fixed-point condition ψ = T(ψ) is rewritten as

(C5) (r + π)γ = r ρ^{−1} log( (Φ + e^{−ρw} λ(m_1)) / (Φ + λ(m_1)) ) − π ( Φ/g(0) − log(Φ/g(0)) − 1 ).

PROPOSITION 5: The path that satisfies (C5) exists and is unique.

PROOF:
The existence of a path that satisfies (C5) follows from the intermediate value theorem and the fact that the right side of (C5) is either extremely large or extremely small if we vary g(0). To see this, note that the proof of Lemma 3 shows that lim_{g(0)↓Φ} m_1 = 0 = lim_{g(0)↓Φ} λ(m_1). Therefore,

lim_{g(0)↓Φ} [ r ρ^{−1} log( (Φ + e^{−ρw} λ(m_1)) / (Φ + λ(m_1)) ) − π ( Φ/g(0) − log(Φ/g(0)) − 1 ) ] = 0.

On the other hand, the proof of part (ii) of Lemma 5 shows the existence of paths with λ(m_1) approaching −Φ and g(0) ∈ (Φ, Φ e^{ρw}). For these paths, log( (Φ + e^{−ρw} λ(m_1)) / (Φ + λ(m_1)) ) can be arbitrarily large, while Φ/g(0) remains bounded.

The uniqueness can be shown by contradiction. Suppose there are two paths satisfying (C5). Associated with the two paths are two fixed points, ψ < ψ̃. Because the principal facing ψ̃ may monitor at m_1(ψ) > 0 and adopt the optimal consumption paths under ψ,

T(ψ̃) ≤ ψ + e^{−(r+π)m_1(ψ)} (ψ̃ − ψ) < ψ̃,

which contradicts the fact that ψ̃ is a fixed point. ∎

LEMMA 5: Consider the ODE system (21), (C4) with time running backward, that is,

(C6) λ′ = π(g − Φ − λ),

(C7) g′ = rg − r g²/(Φ e^{ρw} + λ).

Suppose the initial condition is (λ(0), g(0) = Φ + λ(0)), −Φ < λ(0) < 0, and m^−(λ(0)) denotes the first time to hit the g-axis, i.e., m^−(λ(0)) = min_t {t > 0 : λ(t) = 0}:

(i) (g − λ)′(t) > 0 for all t ∈ [0, m^−(λ(0))].

(ii) m^−(λ(0)) is finite, and lim_{λ(0)↓−Φ} m^−(λ(0)) = ∞.

PROOF:
(i) The path starting with (λ(0), g(0) = Φ + λ(0)) has

λ′(0) = π(g(0) − Φ − λ(0)) = 0,

g′(0) = rg(0) − rg(0)²/(Φ e^{ρw} + λ(0)) > 0.

Hence, it moves beyond g = Φ + λ at time zero and satisfies Φ + λ < g < Φ e^{ρw} + λ before reaching the g-axis. If Φ + λ < g < Φ e^{ρw} + λ, then g′ > 0 and λ′ > 0.

To show that (g − λ)′(t) > 0 for all t ∈ [0, m^−(λ(0))], suppose to the contrary that (g − λ)′(s) ≤ 0 for some s. Let t* = min_s {s > 0 : (g − λ)′(s) ≤ 0}. It is easily seen that (g − λ)′(t*) = 0 and (g − λ)″(t*) ≤ 0. Since (g − λ)′ = rg − r g²/(Φ e^{ρw} + λ) − π(g − Φ − λ),

(g − λ)″(t*) = ( r − 2rg(Φ e^{ρw} + λ)/(Φ e^{ρw} + λ)² − π ) g′(t*) + ( r g²/(Φ e^{ρw} + λ)² + π ) λ′(t*)

            = ( r + (r g² − 2rg(Φ e^{ρw} + λ))/(Φ e^{ρw} + λ)² ) g′(t*)

            = r ( (Φ e^{ρw} + λ − g)²/(Φ e^{ρw} + λ)² ) g′(t*) > 0,

where the second equality follows from g′(t*) = λ′(t*). This contradicts that (g − λ)″(t*) ≤ 0.

(ii) First, we show that m^−(λ(0)) is finite. We know from the proof of part (i) that λ′ > 0. It follows from (C6) and (g − λ)′ > 0 in part (i) that

λ″ = π(g − λ)′ > 0.

Hence, starting from λ(0) < 0, λ(t) accelerates and will reach zero in finite time.

Second, we show that lim_{λ(0)↓−Φ} m^−(λ(0)) = ∞. If λ(0) = −Φ and g(0) = 0, then

λ′(0) = π(g(0) − Φ − λ(0)) = 0,

g′(0) = rg(0) − rg(0)²/(Φ e^{ρw} + λ(0)) = 0.

Continuity of the ODE system (C6), (C7) implies that (λ, g) will stay in a small neighborhood of (−Φ, 0) for a long duration if λ(0) is sufficiently close to −Φ and g(0) = Φ + λ(0). Therefore, lim_{λ(0)↓−Φ} m^−(λ(0)) = ∞. ∎


Appendix D. Stochastic Verification

A. Construction of a Contract

To prove Proposition 4, we first construct a contract σ ∗ in which E(t) > U(t)
implies p(t) = 0 and E(t) = U(t) implies p(t) > 0 . This contract has the features
described in Proposition 4, and in the next section we verify it is indeed optimal.

First, since the principal does not monitor in this contract when E > U, we still use the ODE system (20), (21) to find a solution path in the interval [0, n], where n satisfies

(D1) −∫_0^n λ(t)(rE − r e^{−ρw} u) dt − λ(n)(e^{ρϕ} − 1)E(n) + γ = 0.

The two boundary conditions for the ODE system (20), (21) are still λ(0) = 0 and E(n) = −ρ^{−1}(Φ + λ(n))^{−1}.

LEMMA 6: The n that satisfies (D1) exists and is unique.

PROOF:
For uniqueness, we show that f(n) ≡ −∫_0^n λ(t)(rE − r e^{−ρw} u) dt − λ(n)(e^{ρϕ} − 1)E(n) decreases with n. Since both λ(n) and E(n) are negative and decreasing with n, −λ(n)(e^{ρϕ} − 1)E(n) decreases with n. Moreover,

−λ(rE − r e^{−ρw} u) = ( r|λ| / (g(Φ e^{ρw} + λ)) ) (g − λ − Φ e^{ρw}).

For fixed t, r|λ|/(g(Φ e^{ρw} + λ)) increases with n, while (g − λ − Φ e^{ρw}) is more negative with higher n. Therefore, −∫_0^n λ(rE − r e^{−ρw} u) dt decreases with n too.

For existence, note that lim_{n→0} f(n) = 0. Because lim_{n→∞} λ(n) = −Φ and lim_{n→∞} E(n) = −∞, we have lim_{n→∞} f(n) = −∞. ∎

Second, choose p > 0 after n so that the state vector stays on the 45-degree line before the monitoring arrives, i.e., U(t) = E(t) for all t ≥ n. Choosing Ũ(n) = U(0) = −1/(ρΦ) and solving the equation U′(n) = E′(n), we have

(D2) p = r(1 − e^{−ρw})(Φ + e^{−ρw} λ(n))^{−1} / ( e^{ρϕ}(Φ + λ(n))^{−1} − Φ^{−1} ) > 0.

Note that p is independent of Φ. This also implies that p > 0 is time invariant after n because U(t) = E(t) for t ≥ n.
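For a sense of magnitudes, the arrival rate in (D2) can be evaluated directly. The snippet below uses hypothetical values of Φ, λ(n), and the penalty ϕ (none taken from the calibration) and also checks that p is invariant to scaling Φ and λ(n) by a common factor.

import numpy as np

r, rho, w, phi = 0.001, 2 / 692, 692.0, 100.0   # phi: hypothetical penalty (consumption units)
Phi, lam_n = 2500.0, -600.0                     # hypothetical shadow prices, lam(n) in (-Phi, 0)

def p_of(Phi, lam_n):
    """Monitoring arrival rate from equation (D2)."""
    num = r * (1.0 - np.exp(-rho * w)) / (Phi + np.exp(-rho * w) * lam_n)
    den = np.exp(rho * phi) / (Phi + lam_n) - 1.0 / Phi
    return num / den

print("p =", p_of(Phi, lam_n))
print("scaling Phi and lam(n) by 3 leaves p unchanged:",
      np.isclose(p_of(Phi, lam_n), p_of(3 * Phi, 3 * lam_n)))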

Third, the constructed solution path defines a contract σ* as follows. For each t ∈ [0, n], the policy u(t) is obtained by the first-order condition (13):

(D3) u(t) = −1/(ρ(Φ + e^{−ρw} λ(t))).

If t ≥ n, then the state vector moves along the 45-degree line, and u(t) is always proportional to (U(t), E(t)). That is, for all t ≥ n,

(D4) u′(t)/u(t) = E′(t)/E(t) = U′(t)/U(t) = r − r(Φ + λ(n))/(Φ + e^{−ρw} λ(n)) + p(1 − (Φ + λ(n))/Φ) > 0.

The contract σ* is defined by (D1–D4), and the property that the continuation contract after a monitoring at t ≥ n starts a new cycle in which the continuation utility is Ũ(t) = ((Φ + λ(n))/Φ) U(t) instead of U(0). In this construction, σ* has the features mentioned in Proposition 4.

B. Optimality of the Contract

First, using the path obtained in Lemma 6, we construct a cost function c as

(D5) (r + π)c(U(t), E(t)) = πc(E(t)) + rc(u(t)) + Φ((r + π)U(t) − πE(t) − ru(t)) + λ(t)(rE(t) − r e^{−ρw} u(t)).

LEMMA 7: c_U(U(t), E(t)) = Φ, and c_E(U(t), E(t)) = λ(t).

PROOF:
Differentiating (D5) with respect to t, we have

(r + π)(c_U U′(t) + c_E E′(t)) = πc′(E)E′(t) + Φ((r + π)U′(t) − πE′(t)) + λ(t)rE′(t) + λ′(t)E′(t),

which, after substituting λ′(t) = π(Φ − c′(E) + λ), becomes

c_U U′(t) + c_E E′(t) = ΦU′(t) + λ(t)E′(t).

Homogeneity of c(·, ·) implies that c_U U(t) + c_E E(t) + ρ^{−1} = 0 = ΦU(t) + λ(t)E(t) + ρ^{−1}. Because the vectors (U′(t), E′(t)) and (U(t), E(t)) are linearly independent (we have shown that (U/E)′(t) < 0 in the proof of Proposition 2, which is E′(t)/E(t) > U′(t)/U(t)), we have c_U = Φ and c_E = λ(t). ∎


Second, we verify that the cost function c satisfies the HJB equation:

(D6) (r + π)c(U, E) = min_{u, p, Ũ, Ẽ} { rc(u) + πc(E) + p(c(Ũ, Ẽ) + γ − c(U, E)) + c_U(r(U − u) − π(E − U) − p(Ũ − U)) + c_E(rE − r e^{−ρw} u − p(e^{ρϕ} − 1)E) },

where (Ũ, Ẽ) is the new state vector the principal chooses after the next monitoring.

LEMMA 8: The c(·, ·) defined in (D5) satisfies (D6).

PROOF:
The only differences between (D5) and (D6) are the terms associated with arrival rate p, which will be shown to be zero in this proof. Fix a t ∈ [0, n] and consider the HJB equation at (U(t), E(t)). The first-order condition for Ũ implies that Ũ = U(0). Then we have

c(Ũ, Ẽ) + γ − c(U, E) − Φ(Ũ − U) − c_E(e^{ρϕ} − 1)E = −∫_0^t λ(s)(rE(s) − r e^{−ρw} u(s)) ds − λ(t)(e^{ρϕ} − 1)E(t) + γ.

The above is decreasing in t because λ(t) < 0 and E(t) < 0 both decrease in t. Moreover, the integral −∫_0^t λ(s)(rE(s) − r e^{−ρw} u(s)) ds decreases in t because

rE(t) − r e^{−ρw} u(t) = E′(t) = ρE²(t)g′(t) < 0.

Therefore, the definition of n in (D1) implies that

c(Ũ, Ẽ) + γ − c(U, E) − Φ(Ũ − U) − c_E(e^{ρϕ} − 1)E  > 0 if t < n,  and  = 0 if t = n.

This implies that

min_{p≥0} p ( c(Ũ, Ẽ) + γ − c(U, E) − Φ(Ũ − U) − c_E(e^{ρϕ} − 1)E ) = 0,

which finishes the proof. ∎

Finally, to complete the proof of Proposition 4, we show that the contract σ* is optimal.


PROOF OF PROPOSITION 4:
Because the technique of using the HJB equation to verify optimality is standard, we spare the reader the detailed steps. Given the initial promised utilities (U, E), we need to verify that

(i) The cost of the contract σ* is c(U, E).

(ii) The costs of other I.C. contracts are weakly higher than c(U, E).

We only verify (ii) here, since the proof for (i) can be obtained simply by replacing the following inequalities with equalities.

To see that the cost of an I.C. contract {(c̃^E(t), c̃^U(t), p̃(t)); t ≥ 0} is higher than c(U, E), define

h(T) = ∫_0^T e^{−(r+π)t − ∫_0^t p̃(x) dx} ( πc(Ẽ(t)) + r c̃^U(t) + p̃(t)(c(Ũ(t), Ẽ(t)) + γ) ) dt + e^{−(r+π)T − ∫_0^T p̃(x) dx} c(U(T), E(T)).

The HJB equation implies that h′(T) ≥ 0. Therefore, h(T) increases in T, and

c(U, E) = h(0) ≤ h(T).

Taking the limit T → ∞, we have

c(U, E) ≤ ∫_0^∞ e^{−(r+π)t − ∫_0^t p̃(x) dx} ( πc(Ẽ(t)) + r c̃^U(t) + p̃(t)(c(Ũ(t), Ẽ(t)) + γ) ) dt,

which can be rewritten as

c(U, E) ≤ E[ ∫_0^{τ_1} e^{−rt} (πc(Ẽ(t)) + r c̃^U(t)) dt ] + E[ e^{−rτ_1} γ ] + E[ e^{−rτ_1} c(Ũ(τ_1), Ẽ(τ_1)) ],

where τ_1 is the first monitoring time and (Ũ(τ_1), Ẽ(τ_1)) is the state vector immediately after monitoring. Inductively, we obtain

c(U, E) ≤ E[ ∫_0^{τ_n} e^{−rt} (πc(Ẽ(t)) + r c̃^U(t)) dt ] + E[ Σ_{i=1}^{n} e^{−rτ_i} γ ] + E[ e^{−rτ_n} c(Ũ(τ_n), Ẽ(τ_n)) ],

where τ_n is the nth monitoring time. Without loss of generality, we may assume that lim_{n→∞} τ_n = ∞ almost surely (otherwise the principal monitors infinitely many times in finite time and the monitoring cost is infinite). Taking the limit n → ∞ yields

c(U, E) ≤ E[ ∫_0^∞ e^{−rt} (πc(Ẽ(t)) + r c̃^U(t)) dt ] + E[ Σ_{i=1}^{∞} e^{−rτ_i} γ ]. ∎

Appendix E. Imperfect Detection

This section presents a version of the stochastic verification model where detection is imperfect. Specifically, there is a positive probability ϖ > 0 of monitoring error. In the event of monitoring error, an unemployed worker is labeled as employed. If an unemployed worker is monitored after reporting unemployment, the principal observes either an unemployed signal with probability 1 − ϖ or an employed signal with probability ϖ. On the other hand, there is no monitoring error that labels an employed worker as being unemployed, i.e., if an employed worker is monitored after reporting unemployment, the principal observes the employed signal with probability one.

The timing of the problem is similar to the stochastic verification case in Section VII. The planner still chooses the arrival rate of monitoring, p(t), conditional on the report of unemployment in period t. There are, however, two differences in the case of imperfect detection. First, the planner assigns continuation utilities based not only on whether or not monitoring occurs (as above) but also on the signal from monitoring. Let U^u(t) and U^e(t) be the continuation utilities of a monitored unemployed worker with the unemployed and employed signals at t, respectively. Let E^e(t) be the continuation utility of a monitored employed worker (whose signal can only be the employed one) at t. Finally, E^u(t) is the continuation utility of a monitored unemployed worker with the unemployed signal who transited to employment immediately after being monitored. Second, the penalty is exogenous in the case of perfect detection above, but is endogenous with imperfect detection.

Similar to (24) and (25), the promise-keeping constraint and incentive constraint are

(E1) U′ = r(U − u) − π(E − U) − p[(1 − ϖ)U^u + ϖU^e − U],

(E2) E′ ≤ rE − r e^{−ρw} u − p(E^e − E).

There are two differences between these two equations and (24) and (25). First, the promise-keeping constraint (E1) incorporates the possibility that an unemployed worker may be labeled as employed after monitoring. Second, in (25) the last term on the right-hand side results from the exogenous and finite penalty, ϕ, whereas in (E2) the last term allows the penalty E^e to be endogenous.

The main results from the perfect detection case with stochastic monitoring still hold here. That is, the optimal monitoring mechanism consists of cycles. Within each cycle, there exists some n such that the planner sets p = 0 before n, and then monitors at rate p thereafter. Formally, we state the following proposition.


PROPOSITION 6: There exists an n > 0 such that the principal monitors the unemployed with a constant arrival rate p > 0 if and only if t ≥ n. Before n, the time path (U(·), E(·)) converges to the 45-degree line; after n, the utility pair (U(t), E(t)) remains stationary (i.e., U(t) = E(t) = U(n) = E(n) for all t ≥ n) until the worker is randomly drawn to be monitored. If the observed signal from monitoring is $\mathcal{E}$, the worker is punished, $U^{\mathcal{E}} = E^{\mathcal{E}} < U(n)$. If the signal is $\mathcal{U}$, the worker is rewarded, $U^{\mathcal{U}} > U(n)$, and the contract enters a new cycle.




