Stochastic Process Hmk

Due at 22:30 ET. See attachment.

http://www.cambridge.org/9781107008007

Stochastic Processes
This comprehensive guide to stochastic processes gives a complete overview of the
theory and addresses the most important applications. Pitched at a level accessible
to beginning graduate students and researchers from applied disciplines, it is both
a course book and a rich resource for individual readers. Subjects covered include
Brownian motion, stochastic calculus, stochastic differential equations, Markov
processes, weak convergence of processes, and semigroup theory. Applications include
the Black–Scholes formula for the pricing of derivatives in financial mathematics, the
Kalman–Bucy filter used in the US space program, and also theoretical applications
to partial differential equations and analysis. Short, readable chapters aim for clarity
rather than for full generality. More than 350 exercises are included to help readers put
their new-found knowledge to the test and to prepare them for tackling the research
literature.
Richard F. Bass is Board of Trustees Distinguished Professor in the Department
of Mathematics at the University of Connecticut.

CAMBRIDGE SERIES IN STATISTICAL AND PROBABILISTIC MATHEMATICS
Editorial Board
Z. Ghahramani (Department of Engineering, University of Cambridge)
R. Gill (Mathematical Institute, Leiden University)
F. P. Kelly (Department of Pure Mathematics and Mathematical Statistics,
University of Cambridge)
B. D. Ripley (Department of Statistics, University of Oxford)
S. Ross (Department of Industrial and Systems Engineering,
University of Southern California)
M. Stein (Department of Statistics, University of Chicago)
This series of high-quality upper-division textbooks and expository monographs covers all
aspects of stochastic applicable mathematics. The topics range from pure and applied statistics
to probability theory, operations research, optimization, and mathematical programming. The
books contain clear presentations of new developments in the field and also of the state of
the art in classical methods. While emphasizing rigorous treatment of theoretical methods, the
books also contain applications and discussions of new techniques made possible by advances
in computational practice.
A complete list of books in the series can be found at http://www.cambridge.org/statistics.
Recent titles include the following:
11. Statistical Models, by A. C. Davison
12. Semiparametric Regression, by David Ruppert, M. P. Wand and R. J. Carroll
13. Exercises in Probability, by Loïc Chaumont and Marc Yor
14. Statistical Analysis of Stochastic Processes in Time, by J. K. Lindsey
15. Measure Theory and Filtering, by Lakhdar Aggoun and Robert Elliott
16. Essentials of Statistical Inference, by G. A. Young and R. L. Smith
17. Elements of Distribution Theory, by Thomas A. Severini
18. Statistical Mechanics of Disordered Systems, by Anton Bovier
19. The Coordinate-Free Approach to Linear Models, by Michael J. Wichura
20. Random Graph Dynamics, by Rick Durrett
21. Networks, by Peter Whittle
22. Saddlepoint Approximations with Applications, by Ronald W. Butler
23. Applied Asymptotics, by A. R. Brazzale, A. C. Davison and N. Reid
24. Random Networks for Communication, by Massimo Franceschetti and Ronald Meester
25. Design of Comparative Experiments, by R. A. Bailey
26. Symmetry Studies, by Marlos A. G. Viana
27. Model Selection and Model Averaging, by Gerda Claeskens and Nils Lid Hjort
28. Bayesian Nonparametrics, edited by Nils Lid Hjort et al.
29. From Finite Sample to Asymptotic Methods in Statistics, by Pranab K. Sen,
Julio M. Singer and Antonio C. Pedrosa de Lima
30. Brownian Motion, by Peter Mörters and Yuval Peres
31. Probability, by Rick Durrett
33. Stochastic Processes, by Richard F. Bass
34. Structured Regression for Categorical Data, by Gerhard Tutz

Stochastic Processes
Richard F. Bass
University of Connecticut

CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town,
Singapore, São Paulo, Delhi, Tokyo, Mexico City
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9781107008007
© R. F. Bass 2011
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2011
Printed in the United Kingdom at the University Press, Cambridge
A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data
Bass, Richard F.
Stochastic processes / Richard F. Bass.
p. cm. – (Cambridge series in statistical and probabilistic mathematics ; 33)
Includes index.
ISBN 978-1-107-00800-7 (hardback)
1. Stochastic analysis. I. Title.
QA274.2.B375 2011
519.2′32 – dc23 2011023024
ISBN 978-1-107-00800-7 Hardback
Cambridge University Press has no responsibility for the persistence or
accuracy of URLs for external or third-party internet websites referred to
in this publication, and does not guarantee that any content on such
websites is, or will remain, accurate or appropriate.

To Meredith, as always

Contents
Preface
Frequently used notation
1 Basic notions
1.1 Processes and σ-fields
1.2 Laws and state spaces
2 Brownian motion
2.1 Definition and basic properties
3 Martingales
3.1 Definition and examples
3.2 Doob’s inequalities
3.3 Stopping times
3.4 The optional stopping theorem
3.5 Convergence and regularity
3.6 Some applications of martingales
4 Markov properties of Brownian motion
4.1 Markov properties
4.2 Applications
5 The Poisson process
6 Construction of Brownian motion
6.1 Wiener’s construction
6.2 Martingale methods
7 Path properties of Brownian motion
8 The continuity of paths
9 Continuous semimartingales
9.1 Definitions
9.2 Square integrable martingales
9.3 Quadratic variation
9.4 The Doob–Meyer decomposition
10 Stochastic integrals
10.1 Construction
10.2 Extensions
11 Itô’s formula
12 Some applications of Itô’s formula
12.1 Lévy’s theorem
12.2 Time changes of martingales
12.3 Quadratic variation
12.4 Martingale representation
12.5 The Burkholder–Davis–Gundy inequalities
12.6 Stratonovich integrals
13 The Girsanov theorem
13.1 The Brownian motion case
13.2 An example
14 Local times
14.1 Basic properties
14.2 Joint continuity of local times
14.3 Occupation times
15 Skorokhod embedding
15.1 Preliminaries
15.2 Construction of the embedding
15.3 Embedding random walks
16 The general theory of processes
16.1 Predictable and optional processes
16.2 Hitting times
16.3 The debut and section theorems
16.4 Projection theorems
16.5 More on predictability
16.6 Dual projection theorems
16.7 The Doob–Meyer decomposition
16.8 Two inequalities
17 Processes with jumps
17.1 Decomposition of martingales
17.2 Stochastic integrals
17.3 Itô’s formula
17.4 The reduction theorem
17.5 Semimartingales
17.6 Exponential of a semimartingale
17.7 The Girsanov theorem
18 Poisson point processes
19 Framework for Markov processes
19.1 Introduction
19.2 Definition of a Markov process
19.3 Transition probabilities
19.4 An example
19.5 The canonical process and shift operators
20 Markov properties
20.1 Enlarging the filtration
20.2 The Markov property
20.3 Strong Markov property
21 Applications of the Markov properties
21.1 Recurrence and transience
21.2 Additive functionals
21.3 Continuity
21.4 Harmonic functions
22 Transformations of Markov processes
22.1 Killed processes
22.2 Conditioned processes
22.3 Time change
22.4 Last exit decompositions
23 Optimal stopping
23.1 Excessive functions
23.2 Solving the optimal stopping problem
24 Stochastic differential equations
24.1 Pathwise solutions of SDEs
24.2 One-dimensional SDEs
24.3 Examples of SDEs
25 Weak solutions of SDEs
26 The Ray–Knight theorems
27 Brownian excursions
28 Financial mathematics
28.1 Finance models
28.2 Black–Scholes formula
28.3 The fundamental theorem of finance
28.4 Stochastic control
29 Filtering
29.1 The basic model
29.2 The innovation process
29.3 Representation of $\mathcal{F}^Z$-martingales
29.4 The filtering equation
29.5 Linear models
29.6 Kalman–Bucy filter
30 Convergence of probability measures
30.1 The portmanteau theorem
30.2 The Prohorov theorem
30.3 Metrics for weak convergence
31 Skorokhod representation
32 The space C[0, 1]
32.1 Tightness
32.2 A construction of Brownian motion
33 Gaussian processes
33.1 Reproducing kernel Hilbert spaces
33.2 Continuous Gaussian processes
34 The space D[0, 1]
34.1 Metrics for D[0, 1]
34.2 Compactness and completeness
34.3 The Aldous criterion
35 Applications of weak convergence
35.1 Donsker invariance principle
35.2 Brownian bridge
35.3 Empirical processes
36 Semigroups
36.1 Constructing the process
36.2 Examples
37 Infinitesimal generators
37.1 Semigroup properties
37.2 The Hille–Yosida theorem
37.3 Nondivergence form elliptic operators
37.4 Generators of Lévy processes
38 Dirichlet forms
38.1 Framework
38.2 Construction of the semigroup
38.3 Divergence form elliptic operators
39 Markov processes and SDEs
39.1 Markov properties
39.2 SDEs and PDEs
39.3 Martingale problems
40 Solving partial differential equations
40.1 Poisson’s equation
40.2 Dirichlet problem
40.3 Cauchy problem
40.4 Schrödinger operators
41 One-dimensional diffusions
41.1 Regularity
41.2 Scale functions
41.3 Speed measures
41.4 The uniqueness theorem
41.5 Time change
41.6 Examples
42 Lévy processes
42.1 Examples
42.2 Construction of Lévy processes
42.3 Representation of Lévy processes
Appendices
A Basic probability
A.1 First notions
A.2 Independence
A.3 Convergence
A.4 Uniform integrability
A.5 Conditional expectation
A.6 Stopping times
A.7 Martingales
A.8 Optional stopping
A.9 Doob’s inequalities
A.10 Martingale convergence theorem
A.11 Strong law of large numbers
A.12 Weak convergence
A.13 Characteristic functions
A.14 Uniqueness and characteristic functions
A.15 The central limit theorem
A.16 Gaussian random variables
B Some results from analysis
B.1 The monotone class theorem
B.2 The Schwartz class
C Regular conditional probabilities
D Kolmogorov extension theorem
References
Index

Preface
Why study stochastic processes? This branch of probability theory offers sophisticated
theorems and proofs, such as the existence of Brownian motion, the Doob–Meyer decomposition,
and the Kolmogorov continuity criterion. At the same time, stochastic processes have
far-reaching applications: the explosive growth in options and derivatives in financial
markets throughout the world derives from the Black–Scholes formula, while NASA relies on
the Kalman–Bucy method to filter signals from satellites and probes sent into outer space.
A graduate student taking a year-long course in probability theory first learns about
sequences of random variables and topics such as laws of large numbers, central limit
theorems, and discrete time martingales. In the second half of the course, the student will
then turn to stochastic processes, which is the subject of this text. Topics covered here are
Brownian motion, stochastic integrals, stochastic differential equations, Markov processes,
the Black–Scholes formula of financial mathematics, and the Kalman–Bucy filter, among
many others.
The 42 chapters of this book can be grouped into seven parts. The first part consists
of Chapters 1–8, where some of the basic processes and ideas are introduced, including
Brownian motion. The next group of chapters, Chapters 9–15, introduces the theory of
stochastic calculus, including stochastic integrals and Itô’s formula. Chapters 16–18 explore
jump processes. This requires a study of the foundations of stochastic processes, which
is also known as the general theory of processes. Next we take up Markov processes in
Chapters 19–23. A formidable obstacle to the study of Markov processes is the notation, and
I have attempted to make this as accessible as possible. Chapters 24–29 involve stochastic
differential equations. Two very important applications, to financial mathematics and to
filtering, appear in Chapters 28 and 29, respectively. Probability measures on metric spaces
and the weak convergence of random variables taking values in a metric space prove to
be relevant to the study of stochastic processes. These and related topics are treated in
Chapters 30–35. We then return to Markov processes, namely, their construction and some
important examples, in Chapters 36–42. Tools used in the construction include infinitesimal
generators, Dirichlet forms, and solutions to stochastic differential equations, while two
important examples that we consider are diffusions on the real line and Lévy processes.
The prerequisites to this book are a sound knowledge of basic measure theory and a
course in the classical aspects of probability. The probability topics needed are provided
(with proofs) in an appendix.
There is far too much material in this book to cover in a single semester, and even too
much for a full year. I recommend that as a minimum the following chapters be studied:
Chapters 1–5, Chapters 9–13, Chapters 19–21, and Chapter 24. If possible, include either
Chapter 28 or Chapter 29. In Chapter 11, the statement and corollaries of Itô’s formula are
very important, but the proof of Itô’s formula may be omitted.
I would like to thank the many students who patiently sat through my lectures, pointed out
errors, and made suggestions. I especially would like to thank my colleague Sasha Teplyaev
who taught a course from a preliminary version of this book and made a great number of
useful suggestions.

Frequently used notation
Here are some notational conventions we will use. We use the letter $c$, either with or without
subscripts, to denote a finite positive constant whose exact value is unimportant and which
may change from line to line. We use $B(x, r)$ to denote the open Euclidean ball centered at
$x$ with radius $r$. $a \wedge b$ is the minimum of $a$ and $b$, while $a \vee b$ is the maximum of $a$ and $b$.
$x^+ = x \vee 0$ and $x^- = (-x) \vee 0$. The symbol $\exists$ is used in a few formulas and means “there
exists.” $\mathbb{Q}$, $\mathbb{Q}_+$, $\mathbb{N}$, and $\mathbb{Z}$ denote the rationals, the positive rationals, the natural numbers,
and the integers, respectively. If $C$ is a matrix, $C^T$ is the transpose of $C$.
For a set $A$, we use $A^c$ for the complement of $A$. If $A$ is a subset of a topological space, $\overline{A}$,
$A^o$, and $\partial A$ denote the closure, interior, and boundary of $A$, respectively.
Given a topological space $S$, we use $C(S)$ for the space of continuous functions on $S$,
where we use the supremum norm. If $S$ is a domain in $\mathbb{R}^d$, $C^k(S)$ refers to the set of
continuous functions with domain $S$ whose partial derivatives up to order $k$ are continuous.
$C^\infty$ functions are those that are infinitely differentiable.
We will on a few occasions use the Fourier transform, which we define by
$$\hat f(u) = \int e^{iu\cdot x} f(x)\, dx$$
for $f$ integrable. This agrees with the convention in Rudin (1987).
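The sign in the exponent matters only for densities that are not symmetric. The following sketch checks the convention above numerically; it assumes NumPy and a plain trapezoid rule on a uniform grid, and the helper name `fourier_transform` is hypothetical. For the standard normal density the transform is $e^{-u^2/2}$, while for the rate-1 exponential density $e^{-x}1_{(0,\infty)}(x)$ it is $1/(1-iu)$ rather than $1/(1+iu)$. With this sign, the Fourier transform of a probability density is its characteristic function.

```python
import numpy as np

def fourier_transform(f, u, x):
    """Trapezoid-rule approximation of f_hat(u) = integral of exp(i*u*x) f(x) dx.

    x must be a uniform grid covering essentially all of the mass of f.
    """
    g = np.exp(1j * u * x) * f(x)
    dx = x[1] - x[0]
    return dx * (g.sum() - 0.5 * (g[0] + g[-1]))

def normal_density(x):
    # standard normal density (symmetric, so the sign convention is invisible)
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def exponential_density(x):
    # rate-1 exponential density; only used on a grid starting at 0
    return np.exp(-x)

x_sym = np.linspace(-20.0, 20.0, 200001)
x_pos = np.linspace(0.0, 50.0, 200001)

for u in (0.5, 1.0, 2.0):
    print(u, fourier_transform(normal_density, u, x_sym), np.exp(-u**2 / 2))
    # with exp(+iux) the exact value is 1/(1 - iu)
    print(u, fourier_transform(exponential_density, u, x_pos), 1 / (1 - 1j * u))
```

Flipping the sign of the exponent would conjugate the second column of results, which is one quick way to detect a mismatched convention when comparing against another text.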
If $X$ is a stochastic process whose paths are right continuous with left limits, then
$X_{t-} = \lim_{s \uparrow t,\, s<t} X_s$. … $\mathcal{F}_{t+} = \bigcap_{\varepsilon>0} \mathcal{F}_{t+\varepsilon}$. A filtration is right continuous if $\mathcal{F}_{t+} = \mathcal{F}_t$ for all $t \ge 0$.
The $\sigma$-field $\mathcal{F}_{t+}$ is supposed to represent what one knows if one looks ahead an infinitesimal
amount. Most of the filtrations we will come across will be right continuous, but see
Exercise 1.1.
A null set $N$ is one that has outer probability 0. This means that
$$\inf\{\mathbb{P}(A) : N \subset A,\ A \in \mathcal{F}\} = 0.$$
A filtration is complete if each $\mathcal{F}_t$ contains every null set. A filtration that is right continuous
and complete is said to satisfy the usual conditions.
Given a filtration $\{\mathcal{F}_t\}$, whether or not it satisfies the usual conditions, we define $\mathcal{F}_\infty$ to be
the $\sigma$-field generated by $\bigcup_{t\ge0}\mathcal{F}_t$, that is, the smallest $\sigma$-field containing $\bigcup_{t\ge0}\mathcal{F}_t$, and we write
$$\mathcal{F}_\infty = \bigvee_{t\ge0} \mathcal{F}_t.$$
Recall that the arbitrary intersection of $\sigma$-fields is a $\sigma$-field, but the union of even two $\sigma$-fields
need not be a $\sigma$-field.
We say that a stochastic process $X$ is adapted to a filtration $\{\mathcal{F}_t\}$ if $X_t$ is $\mathcal{F}_t$ measurable
for each $t$. Often one starts with a stochastic process $X$ and wants to define a filtration with
respect to which $X$ is adapted.
The simplest way to do this is to let $\mathcal{F}_t$ be the $\sigma$-field generated by the random variables
$\{X_s, s \le t\}$. More often one wants to have a slightly larger filtration than the one generated
by $X$.
We define the minimal augmented filtration generated by $X$ to be the smallest filtration that
is right continuous and complete and with respect to which the process $X$ is adapted. For each
$t$, $\mathcal{F}_t$ is in general strictly larger than the smallest $\sigma$-field with respect to which $\{X_s : s \le t\}$ is
measurable because of the inclusion of the null sets. It is important to include the null sets;
see Exercise 1.5. There is no widely accepted name for what we call the minimal augmented
filtration; I like this nomenclature because it is descriptive and sufficiently different from
“filtration generated by $X$” to avoid confusion.
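The minimal augmented filtration generated by the process $X_t$ can be constructed in three
steps. First, let $\{\mathcal{F}^{00}_t\}$ be the smallest filtration with respect to which $X$ is adapted, that is,
$$\mathcal{F}^{00}_t = \sigma(X_s;\ s \le t). \tag{1.1}$$
Let $\mathbb{P}^*$ be the outer probability corresponding to $\mathbb{P}$: for $A \subset \Omega$,
$$\mathbb{P}^*(A) = \inf\{\mathbb{P}(B) : B \in \mathcal{F},\ A \subset B\}.$$
Let $\mathcal{N}$ be the collection of null sets, so that $\mathcal{N} = \{A \subset \Omega : \mathbb{P}^*(A) = 0\}$. The second step is
to let $\mathcal{F}^0_t$ be the smallest $\sigma$-field containing $\mathcal{F}^{00}_t$ and $\mathcal{N}$, or
$$\mathcal{F}^0_t = \sigma(\mathcal{F}^{00}_t \cup \mathcal{N}). \tag{1.2}$$
The third step is to let
$$\mathcal{F}_t = \bigcap_{\varepsilon>0} \mathcal{F}^0_{t+\varepsilon}. \tag{1.3}$$
Exercise 1.2 asks you to check that $\{\mathcal{F}_t\}$ is the minimal augmented filtration generated by $X$.
We will refer to $\{\mathcal{F}^{00}_t\}$ as the filtration generated by $X$.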
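Two stochastic processes $X$ and $Y$ are said to be indistinguishable if $\mathbb{P}(X_t \ne Y_t \text{ for some } t \ge 0) = 0$.
$X$ and $Y$ are versions of each other if for each $t \ge 0$, we have $\mathbb{P}(X_t \ne Y_t) = 0$. An
example of two processes that are versions of each other but are not indistinguishable is to
let $\Omega = [0, 1]$, $\mathcal{F}$ the Borel $\sigma$-field on $[0, 1]$, $\mathbb{P}$ Lebesgue measure on $[0, 1]$, $X(t, \omega) = 0$
for all $t$ and $\omega$, and $Y(t, \omega)$ equal to 1 if $t = \omega$ and 0 otherwise. For each fixed $t$, the event
$\{X_t \ne Y_t\}$ is contained in $\{t\}$ and so has probability 0, while $\{X_t \ne Y_t \text{ for some } t \ge 0\} = \Omega$.
Note that the functions $t \mapsto X(t, \omega)$ are continuous for each $\omega$, but the functions $t \mapsto Y(t, \omega)$
are not continuous for any $\omega$.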
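The example can be mimicked numerically. The sketch below is only an illustration, assuming NumPy; the functions `X` and `Y` are hypothetical stand-ins for the two processes, and sampling $\omega$ from Lebesgue measure cannot prove that an event has probability 0. What it does show is the mechanism: at a fixed time the sampled values of $X_t$ and $Y_t$ agree, yet every sampled path of $Y$ differs from the corresponding path of $X$ at the time $t = \omega$.

```python
import numpy as np

def X(t, omega):
    # X(t, omega) = 0 for all t and omega: every path is continuous
    return 0.0

def Y(t, omega):
    # Y(t, omega) = 1 if t == omega and 0 otherwise: one jump, at t = omega
    return 1.0 if t == omega else 0.0

rng = np.random.default_rng(0)
omegas = rng.uniform(0.0, 1.0, size=10_000)  # samples from P = Lebesgue measure on [0, 1]

# Versions: for fixed t, {X_t != Y_t} is contained in {t}, an event of probability 0.
t_fixed = 0.5
agree_at_fixed_t = all(X(t_fixed, w) == Y(t_fixed, w) for w in omegas)

# Not indistinguishable: for every omega the two paths differ at the time t = omega.
paths_differ = all(X(w, w) != Y(w, w) for w in omegas)

print(agree_at_fixed_t, paths_differ)
```

The point is that the exceptional set $\{X_t \ne Y_t\}$ moves with $t$; each one is negligible, but their union over uncountably many $t$ is all of $\Omega$.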
If X is a stochastic process, the functions t → X (t, ω) are called the paths or trajectories
of X . There will be one path for each ω. If the paths of X are continuous functions, except
for a set of ω’s in a null set, then X is called a continuous process, or is said to be continuous.
We similarly define right continuous process, left continuous process, etc.
A function $f(t)$ is right continuous with left limits if $\lim_{h>0,\, h\downarrow 0} f(t + h) = f(t)$ for all
$t$ and $\lim_{h<0,\, h\uparrow 0} f(t + h)$ exists for all $t > 0$. Almost all our stochastic processes will have
the property that except for a null set of $\omega$'s the function $t \mapsto X(t, \omega)$ is right continuous
and has left limits. The term cadlag is often used for paths that are right continuous with left
limits; it abbreviates the French “continue à droite, limite à gauche.”

1.2 Laws and state spaces
Let $S$ be a topological space. The Borel $\sigma$-field on $S$ is defined to be the $\sigma$-field generated
by the open sets of $S$. A function $f : S \to \mathbb{R}$ is Borel measurable if $f^{-1}(G)$ is in the Borel
$\sigma$-field of $S$ whenever $G$ is an open subset of $\mathbb{R}$. A random variable $Y : \Omega \to S$ is measurable
with respect to a $\sigma$-field $\mathcal{F}$ of subsets of $\Omega$ if $\{\omega \in \Omega : Y(\omega) \in A\}$ is in $\mathcal{F}$ whenever $A$ is in
the Borel $\sigma$-field on $S$.
A stochastic process taking values in a topological space $S$ is a map $X : [0, \infty)\times\Omega \to S$,
where for each $t$, the random variable $X_t$ is measurable with respect to $\mathcal{F}$.
Recall that if we have a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and $Y : \Omega \to \mathbb{R}$ is a random variable,
then the law of $Y$ is the probability measure $\mathbb{P}_Y$ on the Borel subsets of $\mathbb{R}$ defined by
$\mathbb{P}_Y(A) = \mathbb{P}(Y \in A)$. Similarly, if $Y : \Omega \to \mathbb{R}^d$ is a $d$-dimensional random vector, then the law
of $Y$ is the probability measure $\mathbb{P}_Y$ on the Borel subsets of $\mathbb{R}^d$ defined by $\mathbb{P}_Y(A) = \mathbb{P}(Y \in A)$.
We extend this definition to random variables $Y$ taking values in a topological space $S$. In
this case $\mathbb{P}_Y$ is a probability measure on the Borel subsets of $S$ with the same definition:
$\mathbb{P}_Y(A) = \mathbb{P}(Y \in A)$. In particular, if $Y$ and $Z$ are two random variables with the same state
space $S$, then $Y$ and $Z$ will have the same law if $\mathbb{P}(Y \in A) = \mathbb{P}(Z \in A)$ for all Borel subsets
$A$ of $S$.
The relevance of the preceding paragraph to stochastic processes is this. Suppose $X$ and
$Y$ are stochastic processes with continuous paths. Let $S = C[0, \infty)$ be the collection of
real-valued continuous functions on $[0, \infty)$ together with the usual metric defined in terms
of the supremum norm:
$$d(f, g) = \sup_{0 \le t < \infty} |f(t) - g(t)|.$$
(Strictly speaking, we should write $C([0, \infty))$, but we follow the usual convention and drop
the outside parentheses.) Let the random variable $X$ taking values in $S$ be defined by setting
$X(\omega)$ to be the continuous function $t \mapsto X(t, \omega)$, and define $Y$ similarly. More precisely,
$X : \Omega \to S$ with
$$X(\omega)(t) = X(t, \omega), \qquad t \ge 0.$$
Then $X$ and $Y$ are random variables taking values in the metric space $S$, and saying that $X$
and $Y$ have the same law means that $\mathbb{P}(X \in A) = \mathbb{P}(Y \in A)$ for all Borel subsets $A$ of $S$.
When this happens, we also say that the stochastic processes $X$ and $Y$ have the same law.
Two stochastic processes $X$ and $Y$ have the same finite-dimensional distributions if for
every $n \ge 1$ and every $t_1 < \cdots < t_n$, the laws of $(X_{t_1}, \ldots, X_{t_n})$ and $(Y_{t_1}, \ldots, Y_{t_n})$ are equal.

Most often the topological spaces we will consider will also be metric spaces, but there will be a few occasions when we want to consider topological spaces that are not metric spaces. Suppose $S = \mathbb{R}^{[0,\infty)}$. We furnish $S$ with the product topology. $S$ can be identified with the collection of real-valued functions on $[0, \infty)$, but the topology is given neither by the supremum norm nor by any other metric. We use $f$ for elements of $S$, where $f(t)$ is the $t$th coordinate of $f$. We call a subset $A$ of $S$ a cylindrical set if there exist $n \ge 1$, non-negative reals $t_1, t_2, \ldots, t_n$, and a Borel subset $B$ of $\mathbb{R}^n$ such that
$$A = \{f \in S : (f(t_1), \ldots, f(t_n)) \in B\}.$$
The appropriate $\sigma$-field to use on $S$ is the one generated by the collection of cylindrical sets.

We want to generalize this notion slightly by allowing more general index sets and by allowing for the possibility of considering only a subset of the product space.

Definition 1.1 Let $U$ be a topological space, $T$ an arbitrary index set, and $B$ a subset of $U^T$, the collection of functions from $T$ into $U$. We say a set $C$ is a cylindrical subset of $B$ if there exist $n \ge 1$, $t_1, \ldots, t_n \in T$, and a Borel subset $A$ of $U^n$ such that
$$C = \{f \in B : (f(t_1), \ldots, f(t_n)) \in A\}.$$

Exercises
1.1 This exercise gives an example where $\{\mathcal{F}^{00}_t\}$ defined by (1.1) is not right continuous. Let $\Omega = \{a, b\}$, let $\mathcal{F}$ be the collection of all subsets of $\Omega$, and let $\mathbb{P}(\{a\}) = \mathbb{P}(\{b\}) = \frac12$. Define
$$X_t(\omega) = \begin{cases} 0, & t \le 1;\\ 0, & t > 1 \text{ and } \omega = a;\\ t - 1, & t > 1 \text{ and } \omega = b. \end{cases}$$
Calculate $\mathcal{F}^{00}_t = \sigma(X_s;\ s \le t)$ and show $\{\mathcal{F}^{00}_t\}$ is not right continuous.
1.2 If $X$ is a stochastic process, let $\mathcal{F}^{00}_t$, $\mathcal{F}^0_t$, and $\mathcal{F}_t$ be defined by (1.1), (1.2), and (1.3), respectively.
Show that $\{\mathcal{F}_t\}$ is the minimal augmented filtration generated by $X$.
1.3 Let $\{\mathcal{F}_t\}$ be a filtration satisfying the usual conditions and let $\mathcal{B}[0, t]$ be the Borel $\sigma$-field on
$[0, t]$. A real-valued stochastic process $X$ is progressively measurable if for each $t \ge 0$, the
map $(s, \omega) \mapsto X(s, \omega)$ from $[0, t] \times \Omega$ to $\mathbb{R}$ is measurable with respect to the product $\sigma$-field
$\mathcal{B}[0, t] \times \mathcal{F}_t$.
(1) If $X$ is adapted to $\{\mathcal{F}_t\}$ and we define
$$X^{(n)}_t(\omega) = \sum_{k=0}^{\infty} X_{k/2^n}(\omega)\, 1_{[k/2^n,\,(k+1)/2^n)}(t),$$
show that $X^{(n)}$ is progressively measurable for each $n \ge 1$.
(2) Use (1) to show that if $X$ is adapted to $\{\mathcal{F}_t\}$ and has left continuous paths, then $X$ is
progressively measurable.
(3) If $X$ is adapted to $\{\mathcal{F}_t\}$ and we define
$$Y^{(n)}_t(\omega) = \sum_{k=0}^{\infty} X_{(k+1)/2^n}(\omega)\, 1_{[k/2^n,\,(k+1)/2^n)}(t),$$
show that for each $t \ge 0$, the map $(s, \omega) \mapsto Y^{(n)}(s, \omega)$ from $[0, t] \times \Omega$ to $\mathbb{R}$ is measurable with
respect to $\mathcal{B}[0, t] \times \mathcal{F}_{t+2^{-n}}$.
(4) Show that if $X$ is adapted to $\{\mathcal{F}_t\}$ and has right continuous paths, then $X$ is progressively
measurable.
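To see what the approximations in Exercise 1.3 do to a single path, here is a small sketch. It only illustrates the definitions and is not a solution of the measurability questions; the helper names `left_dyadic`, `right_dyadic`, and `step_path` are hypothetical. $X^{(n)}_t$ freezes the path at the left dyadic endpoint $k/2^n$, while $Y^{(n)}_t$ uses the right endpoint $(k+1)/2^n$, which lies at most $2^{-n}$ in the future; that look-ahead is why part (3) involves $\mathcal{F}_{t+2^{-n}}$. For a right continuous path with a jump at the non-dyadic time $1/3$, only $Y^{(n)}_{1/3}$ recovers the value $X_{1/3}$.

```python
import math

def left_dyadic(path, n, t):
    """X^(n)_t: value of the path at the left endpoint k/2^n of the dyadic interval containing t."""
    k = math.floor(t * 2**n)
    return path(k / 2**n)

def right_dyadic(path, n, t):
    """Y^(n)_t: value of the path at the right endpoint (k+1)/2^n of that interval."""
    k = math.floor(t * 2**n)
    return path((k + 1) / 2**n)

def step_path(s):
    # right continuous path with left limits, jumping from 0 to 1 at the non-dyadic time 1/3
    return 1.0 if s >= 1 / 3 else 0.0

t = 1 / 3
for n in (1, 5, 10, 20):
    # left values stay at the pre-jump value 0.0; right values equal step_path(t) = 1.0
    print(n, left_dyadic(step_path, n, t), right_dyadic(step_path, n, t))
```

For a left continuous path the roles are reversed: the left endpoints $k/2^n \le t$ increase to $t$, so $X^{(n)}_t \to X_t$, which matches the split between parts (2) and (4).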
1.4 Let $S = \mathbb{R}^{[0,1]}$, the set of functions from $[0, 1]$ to $\mathbb{R}$, and let $\mathcal{F}$ be the $\sigma$-field generated by the
cylindrical sets. The purpose of this exercise is to show that the elements of $\mathcal{F}$ depend on only
countably many coordinates.
Let $S_0 = \{(x_1, x_2, \ldots)\}$, the set of sequences taking values in $\mathbb{R}$. Let $\mathcal{F}_0$ be the $\sigma$-field
generated by the cylindrical subsets of $\mathbb{R}^{\mathbb{N}}$, where $\mathbb{N} = \{1, 2, \ldots\}$.
Show that $B \in \mathcal{F}$ if and only if there exist $t_1, t_2, \ldots$ in $[0, 1]$ and a set $C \in \mathcal{F}_0$ such that
$B = \{f \in S : (f(t_1), f(t_2), \ldots) \in C\}$.
1.5 Null sets are sometimes important! Let $S$ and $\mathcal{F}$ be as in Exercise 1.4. Show that $D \notin \mathcal{F}$, where
$D = \{f \in S : f \text{ is a continuous function on } [0, 1]\}$.
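1.6 Suppose $X$ is a stochastic process, $\{\mathcal{F}_t\}$ its minimal augmented filtration, and $\mathcal{F}_\infty = \bigvee_{t\ge0}\mathcal{F}_t$.
Suppose with probability one, the paths of $X$ are right continuous with left limits. Let $X_{t-} =
\lim_{s \uparrow t,\, s<t} X_s$. If $A = \{\ldots > 1\}$,
prove $A \in \mathcal{F}_\infty$.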
1.7 Suppose $X$ is a stochastic process, $\{\mathcal{F}_t\}$ is the minimal augmented filtration for $X$, and $\mathcal{F}_\infty =
\bigvee_{t\ge0}\mathcal{F}_t$. If the paths of $X$ are right continuous with left limits with probability one, show that
the event
$$A = \{X \text{ has continuous paths}\}$$
is in $\mathcal{F}_\infty$.
Notes
The older literature sometimes uses the notion of a separable stochastic process, but this is
rarely seen nowadays. For much more on measurability, see Chapter 16. For the complete
story on the foundations of stochastic processes, see Dellacherie and Meyer (1978).

2
Brownian motion
Brownian motion is by far the most important stochastic process. It is the archetype of
Gaussian processes, of continuous time martingales, and of Markov processes. It is basic to
the study of stochastic differential equations, financial mathematics, and filtering, to name
only a few of its applications.
In this chapter we define Brownian motion and consider some of its elementary aspects.
Later chapters will take up the construction of Brownian motion and properties of Brownian
motion paths.
2.1 Definition and basic properties
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and let $\{\mathcal{F}_t\}$ be a filtration, not necessarily satisfying
the usual conditions.
Definition 2.1 $W_t = W_t(\omega)$ is a one-dimensional Brownian motion with respect to $\{\mathcal{F}_t\}$ and
the probability measure $\mathbb{P}$, started at 0, if
(1) $W_t$ is $\mathcal{F}_t$ measurable for each $t \ge 0$.
(2) $W_0 = 0$, a.s.
(3) $W_t - W_s$ is a normal random variable with mean 0 and variance $t - s$ whenever $s < t$.
(4) $W_t - W_s$ is independent of $\mathcal{F}_s$ whenever $s < t$.
(5) $W_t$ has continuous paths.

If instead of (2) we have $W_0 = x$, we say we have a Brownian motion started at $x$. Definition 2.1(4) is referred to as the independent increments property of Brownian motion. The fact that $W_t - W_s$ has the same law as $W_{t-s}$, which follows from Definition 2.1(3), is called the stationary increments property. When no filtration is specified, we assume the filtration is the filtration generated by $W$, i.e., $\mathcal{F}_t = \sigma(W_s;\ s \le t)$. Sometimes a one-dimensional Brownian motion started at 0 is called a standard Brownian motion. Figure 2.1 is a simulation of a typical Brownian motion path.

We define $d$-dimensional Brownian motion with respect to a filtration $\{\mathcal{F}_t\}$ and started at $x = (x_1, \ldots, x_d)$ to be $(W^{(1)}_t, \ldots, W^{(d)}_t)$, where the $W^{(i)}$ are each one-dimensional Brownian motions with respect to $\{\mathcal{F}_t\}$ started at $x_i$, respectively, and $W^{(1)}, \ldots, W^{(d)}$ are all independent.

The law of a Brownian motion is called Wiener measure. More precisely, given a Brownian motion $W$, we can view it as a random variable taking values in $C[0, \infty)$, the space of real-valued continuous functions on $[0, \infty)$. The law of $W$ is the measure $\mathbb{P}_W$ on $C[0, \infty)$ defined by $\mathbb{P}_W(A) = \mathbb{P}(W \in A)$ for all Borel subsets $A$ of $C[0, \infty)$. The measure $\mathbb{P}_W$ is Wiener measure.

Figure 2.1 Simulation of a typical Brownian motion path.

There are a number of transformations one can perform on a Brownian motion that yield a new Brownian motion. The first one is called the scaling property of Brownian motion, or simply scaling.

Proposition 2.2 If $W$ is a Brownian motion started at 0, $a > 0$, and $Y_t = aW_{t/a^2}$, then $Y_t$ is a Brownian motion started at 0.
Proof We use $\mathcal{G}_t = \mathcal{F}_{t/a^2}$ for the filtration for $Y$. Clearly $Y_t$ has continuous paths, $Y_0 = 0$, a.s., and $Y_t$ is $\mathcal{G}_t$ measurable. If $s < t$, $Y_t - Y_s = a(W_{t/a^2} - W_{s/a^2})$ is independent of $\mathcal{F}_{s/a^2}$, hence is independent of $\mathcal{G}_s$. Finally, if $s < t$, then $Y_t - Y_s$ is a normal random variable with mean zero and
$$\operatorname{Var}(Y_t - Y_s) = a^2 \operatorname{Var}(W_{t/a^2} - W_{s/a^2}) = a^2\Big(\frac{t}{a^2} - \frac{s}{a^2}\Big) = t - s.$$
This suffices to give our result.

For some other transformations, see Exercises 2.3 and 2.5.

Recall what it means for a finite collection of random variables to be jointly normal; see (A.29). A stochastic process $X$ is Gaussian or jointly normal if all its finite-dimensional distributions are jointly normal, that is, if for each $n \ge 1$ and $t_1 < \cdots < t_n$, the collection of random variables $X_{t_1}, \ldots, X_{t_n}$ is a jointly normal collection.

Proposition 2.3 If $W$ is a Brownian motion, then $W$ is a Gaussian process.

Proof Suppose $W$ is a Brownian motion and let $0 = t_0 < t_1 < \cdots < t_n$. Define
$$Z_i = \frac{W_{t_i} - W_{t_{i-1}}}{\sqrt{t_i - t_{i-1}}}, \qquad i = 1, 2, \ldots, n.$$
By Definition 2.1(4), $Z_i$ is independent of $\mathcal{F}_{t_{i-1}}$, and hence independent of $Z_1, \ldots, Z_{i-1}$. By Definition 2.1(3), $Z_i$ is a mean-zero normal random variable with variance one. We can write
$$W_{t_j} = \sum_{i=1}^{j} (t_i - t_{i-1})^{1/2} Z_i, \qquad j = 1, \ldots, n,$$
and so $(W_{t_1}, \ldots, W_{t_n})$ is jointly normal. It follows that Brownian motion is a Gaussian process.

Since the law of a finite collection of jointly normal random variables is determined by their means and covariances, let's calculate the covariance of $W_s$ and $W_t$ when $W$ is a Brownian motion. If $s \le t$, then
$$t - s = \operatorname{Var}(W_t - W_s) = \operatorname{Var} W_t + \operatorname{Var} W_s - 2\operatorname{Cov}(W_s, W_t) = t + s - 2\operatorname{Cov}(W_s, W_t)$$
from Definition 2.1(2) and (3). Hence $\operatorname{Cov}(W_s, W_t) = s$ if $s \le t$. This is frequently written as
$$\operatorname{Cov}(W_s, W_t) = s \wedge t. \tag{2.1}$$

We have the following converse.

Theorem 2.4 If $W$ is a process such that all the finite-dimensional distributions are jointly normal, $\mathbb{E} W_s = 0$ for all $s$, $\operatorname{Cov}(W_s, W_t) = s$ when $s \le t$, and the paths of $W_t$ are continuous, then $W$ is a Brownian motion.

Proof For $\mathcal{F}_t$ we take the filtration generated by $W$. If we take $s = t$, then $\operatorname{Var} W_t = \operatorname{Cov}(W_t, W_t) = t$. In particular, $\operatorname{Var} W_0 = 0$, and since $\mathbb{E} W_0 = 0$, then $W_0 = 0$, a.s. We have
$$\operatorname{Var}(W_t - W_s) = \operatorname{Var} W_t - 2\operatorname{Cov}(W_s, W_t) + \operatorname{Var} W_s = t - 2s + s = t - s,$$
and $W_t - W_s$, being a linear combination of the jointly normal random variables $W_s$ and $W_t$, is normal with mean 0. We have thus established all the parts of Definition 2.1 except for the independence of $W_t - W_s$ from $\mathcal{F}_s$. If $r \le s < t$, then
$$\operatorname{Cov}(W_t - W_s, W_r) = \operatorname{Cov}(W_t, W_r) - \operatorname{Cov}(W_s, W_r) = r - r = 0,$$
and so $W_t - W_s$ is independent of $W_r$ by Proposition A.55. More generally, for any $r_1, \ldots, r_k \le s$ the random variables $W_t - W_s, W_{r_1}, \ldots, W_{r_k}$ are jointly normal and $W_t - W_s$ is uncorrelated with each $W_{r_i}$, so $W_t - W_s$ is independent of $(W_{r_1}, \ldots, W_{r_k})$. This shows that $W_t - W_s$ is independent of $\mathcal{F}_s$.

We now look at two results that are more technical. These should only be skimmed on the first reading of the book: read the statements, but not the proofs. The first result says that if $W$ is a Brownian motion with respect to the filtration generated by $W$, then it is also a Brownian motion with respect to the minimal augmented filtration.

Proposition 2.5 Let $W_t$ be a Brownian motion with respect to $\{\mathcal{F}^{00}_t\}$, where $\mathcal{F}^{00}_t = \sigma(W_s;\ s \le t)$. Let $\mathcal{N}$ be the collection of null sets, $\mathcal{F}^0_t = \sigma(\mathcal{F}^{00}_t \cup \mathcal{N})$, and $\mathcal{F}_t = \bigcap_{\varepsilon>0} \mathcal{F}^0_{t+\varepsilon}$.
(1) $W$ is a Brownian motion with respect to the filtration $\{\mathcal F_t\}$.
(2) $\mathcal F_t = \mathcal F^0_t$ for each $t$.
Proof (1) The only property we need to check is Definition 2.1(4). If $f$ is a continuous bounded function on $\mathbb R$, $A \in \mathcal F^{00}_s$, and $s < t$, then because $W$ is a Brownian motion with respect to $\{\mathcal F^{00}_t\}$, the independent increments property shows that
$$\mathbb E[f(W_t - W_s); A] = \mathbb E[f(W_t - W_s)]\, \mathbb P(A). \tag{2.2}$$
If $A$ is such that $A \setminus B$ and $B \setminus A$ are null sets for some $B \in \mathcal F^{00}_s$, it is easy to see that (2.2) continues to hold. By linearity, it also holds if $A$ is a finite disjoint union of such sets. If $\mathcal C_1$ is the collection of subsets of $\mathcal F^0_s$ that are finite disjoint unions of such sets, then $\mathcal C_1$ is an algebra of subsets of $\mathcal F^0_s$. Let $\mathcal M_1$ be the collection of subsets of $\mathcal F^0_s$ for which (2.2) holds. It is readily checked that $\mathcal M_1$ is a monotone class. By the monotone class theorem (Theorem B.2), $\mathcal M_1$ is equal to the smallest σ-field containing $\mathcal C_1$, which is $\mathcal F^0_s$. Therefore (2.2) holds for all $A \in \mathcal F^0_s$.

Now suppose $A \in \mathcal F_s = \mathcal F^0_{s+}$. Then for each $\varepsilon > 0$, $A \in \mathcal F^0_{s+\varepsilon}$, and so using (2.2) with $s$ replaced by $s + \varepsilon$ and $t$ replaced by $t + \varepsilon$, we have
$$\mathbb E[f(W_{t+\varepsilon} - W_{s+\varepsilon}); A] = \mathbb E[f(W_{t+\varepsilon} - W_{s+\varepsilon})]\, \mathbb P(A). \tag{2.3}$$
Letting ε → 0 and using the facts that f is bounded and continuous and W has continuous
paths, the dominated convergence theorem implies that
$$\mathbb E[f(W_t - W_s); A] = \mathbb E[f(W_t - W_s)]\, \mathbb P(A). \tag{2.4}$$
This equation holds whenever f is continuous and A ∈ Fs. By a limit argument, (2.4) holds
whenever f is the indicator of a Borel subset of R. That says that Wt − Ws and Fs are
independent.
(2) Fix $t$ and choose $t_0 > t$. Let $\mathcal M_2$ be the collection of subsets of $\mathcal F^{00}_{t_0}$ whose conditional expectation with respect to $\mathcal F_t$ is $\mathcal F^0_t$ measurable, that is, $A \in \mathcal M_2$ if $A \in \mathcal F^{00}_{t_0}$ and $\mathbb E[1_A \mid \mathcal F_t]$ is $\mathcal F^0_t$ measurable. Let $\mathcal C_2$ be the collection of events $A$ for which there exist $n \ge 1$, $0 \le s_0 < s_1 < \cdots < s_n \le t_0$ with $t$ equal to one of the $s_i$, and Borel subsets $B_1, \ldots, B_n$ of $\mathbb R$ such that
$$A = (W_{s_1} - W_{s_0} \in B_1, \ldots, W_{s_n} - W_{s_{n-1}} \in B_n).$$
Suppose $A$ is of this form, and suppose $t = s_i$. Then by the independence result that we proved in (1),
$$\mathbb E[1_A \mid \mathcal F_t] = 1_{(W_{s_1} - W_{s_0} \in B_1, \ldots, W_{s_i} - W_{s_{i-1}} \in B_i)} \times \mathbb P(W_{s_{i+1}} - W_{s_i} \in B_{i+1}, \ldots, W_{s_n} - W_{s_{n-1}} \in B_n),$$
which is $\mathcal F^0_t$ measurable. Thus $\mathcal C_2 \subset \mathcal M_2$. Finite unions of sets in $\mathcal C_2$ form an algebra of subsets of $\mathcal F^{00}_{t_0}$ that generates $\mathcal F^{00}_{t_0}$. It is easy to check that $\mathcal M_2$ is a monotone class, so by the monotone class theorem, $\mathcal M_2$ equals $\mathcal F^{00}_{t_0}$. By linearity and taking monotone limits, if $Y$ is non-negative and $\mathcal F^{00}_{t_0}$ measurable, then $\mathbb E[Y \mid \mathcal F_t]$ is $\mathcal F^0_t$ measurable.

To finish, suppose $A \in \mathcal F_t$. Then since $t < t_0$, we see that $A \in \mathcal F^0_{t_0}$. By Exercise 2.7, there exists an $\mathcal F^{00}_{t_0}$ measurable $Y$ such that $1_A = Y$, a.s. Then $\mathbb E[Y \mid \mathcal F_t]$ is $\mathcal F^0_t$ measurable. Since $\mathcal F^0_t$ contains all the null sets, $1_A = \mathbb E[1_A \mid \mathcal F_t]$ is also $\mathcal F^0_t$ measurable, or $A \in \mathcal F^0_t$. This proves (2).

The final item we consider in this chapter is a subtle one. The question is this: if $W$ and $W'$ are both Brownian motions, do they have all the same properties? To illustrate this issue, let’s revisit the example of Chapter 1 where $\Omega = [0, 1]$, $\mathcal F$ is the Borel σ-field on $[0, 1]$, $\mathbb P$ is Lebesgue measure on $[0, 1]$, $X(t, \omega) = 0$ for all $t$ and $\omega$, and $Y(t, \omega)$ is 1 if $t = \omega$ and 0 otherwise. For each $t$, $\mathbb P(X_t = Y_t) = 1$, so $X$ and $Y$ have the same finite-dimensional distributions. However, if $A = \{f : f \text{ is not a continuous function on } [0, 1]\}$, then $(X \in A)$ is a null set but $(Y \in A)$ is not. Even though $X$ and $Y$ have the same finite-dimensional distributions, $X$ has continuous paths but $Y$ does not. To rephrase our question, is it true that $\mathbb P(W \in A) = \mathbb P(W' \in A)$ for every Borel subset $A$ of $C[0, \infty)$?
We know $W$ and $W'$ have the same finite-dimensional distributions because each is jointly normal with zero means and $\operatorname{Cov}(W_s, W_t) = s \wedge t = \operatorname{Cov}(W'_s, W'_t)$. The fact that the answer to our question is yes then comes from the following theorem. We look at $C[0, t_0]$ instead of $C[0, \infty)$ for the sake of simplicity.

Theorem 2.6 Let $t_0 > 0$ and let $X, Y$ be random variables taking values in $C[0, t_0]$ which have the same finite-dimensional distributions. Then the laws of $X$ and $Y$ are equal.
Proof Let $\mathcal M$ be the collection of Borel subsets $A$ of $C[0, t_0]$ for which $\mathbb P(X \in A)$ equals $\mathbb P(Y \in A)$. We will show that $\mathcal M$ is a monotone class and then use the monotone class theorem to show that $\mathcal M$ is equal to the Borel σ-field on $C[0, t_0]$.

First, let $\mathcal C$ be the collection of all cylindrical subsets of $C[0, t_0]$ (defined by Definition 1.1). Since the finite-dimensional distributions of $X$ and $Y$ are equal, $\mathcal M$ contains $\mathcal C$. It is easy to check that $\mathcal C$ is an algebra of subsets of $C[0, t_0]$. If $A_1 \supset A_2 \supset \cdots$ are elements of $\mathcal M$, then
$$\mathbb P\Big(X \in \bigcap_n A_n\Big) = \lim_n \mathbb P(X \in A_n) = \lim_n \mathbb P(Y \in A_n) = \mathbb P\Big(Y \in \bigcap_n A_n\Big)$$
since $\mathbb P$ is a finite measure. Therefore $\bigcap_n A_n \in \mathcal M$. A very similar argument shows that if $A_1 \subset A_2 \subset \cdots$ are elements of $\mathcal M$, then $\bigcup_n A_n \in \mathcal M$. Therefore $\mathcal M$ is a monotone class. By the monotone class theorem, $\mathcal M$ contains $\sigma(\mathcal C)$, the smallest σ-field containing $\mathcal C$. We will show that $\sigma(\mathcal C)$ contains all the open sets; then $\sigma(\mathcal C)$ contains the Borel σ-field on $C[0, t_0]$, hence so does $\mathcal M$, and we will be done.

Since $C[0, t_0]$ is separable, every open set is the countable union of open balls. Because $\sigma(\mathcal C)$ is a σ-field, it suffices to show that it contains the open balls in $C[0, t_0]$, that is, all sets of the form
$$B(f_0, r) = \Big\{f \in C[0, t_0] : \sup_{0 \le t \le t_0} |f(t) - f_0(t)| < r\Big\},$$
where $r > 0$ and $f_0 \in C[0, t_0]$. For each $m$ and $n$,
$$\Big\{f \in C[0, t_0] : \sup_{0 \le k \le 2^n t_0} |f(k/2^n) - f_0(k/2^n)| \le r - (1/m)\Big\}$$
is a set in $\mathcal C$. As $n \to \infty$, these sets decrease to
$$D_m = \Big\{f \in C[0, t_0] : \sup_{0 \le t \le t_0} |f(t) - f_0(t)| \le r - (1/m)\Big\},$$
since all the functions we are considering are continuous; hence $D_m \in \sigma(\mathcal C)$. Finally, $D_m$ increases to $B(f_0, r)$ as $m \to \infty$, so $B(f_0, r)$ is in $\sigma(\mathcal C)$ as desired.
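As a numerical companion to Proposition 2.3 and the covariance formula (2.1), here is a minimal Monte Carlo sketch in Python (standard library only). It builds $(W_s, W_t)$ from independent standard normals exactly as in the proof of Proposition 2.3 and estimates $\operatorname{Cov}(W_s, W_t)$, which should be close to $s \wedge t$. The function names, sample size, and seed are illustrative choices, not part of the text.

```python
import math
import random

def brownian_at_times(times, rng):
    """Sample (W_{t_1}, ..., W_{t_n}) for 0 < t_1 < ... < t_n.

    As in the proof of Proposition 2.3: W_{t_j} is the sum over i <= j of
    sqrt(t_i - t_{i-1}) * Z_i, with t_0 = 0 and Z_i independent standard normals.
    """
    values, w, prev = [], 0.0, 0.0
    for t in times:
        if t <= prev:
            raise ValueError("times must be positive and strictly increasing")
        w += math.sqrt(t - prev) * rng.gauss(0.0, 1.0)
        values.append(w)
        prev = t
    return values

def empirical_cov(s, t, n_samples=20_000, seed=0):
    """Monte Carlo estimate of Cov(W_s, W_t) for 0 < s < t."""
    rng = random.Random(seed)
    pairs = [brownian_at_times([s, t], rng) for _ in range(n_samples)]
    mean_s = sum(p[0] for p in pairs) / n_samples
    mean_t = sum(p[1] for p in pairs) / n_samples
    return sum((p[0] - mean_s) * (p[1] - mean_t) for p in pairs) / (n_samples - 1)

if __name__ == "__main__":
    # Should be close to min(0.5, 2.0) = 0.5, as predicted by (2.1).
    print(empirical_cov(0.5, 2.0))
```

By Theorem 2.6, matching these covariances (together with zero means and continuous paths) is all that is needed to pin down the law of the process.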
Exercises
2.1 Suppose $W$ is a Brownian motion on $[0, 1]$. Let
$$Y_t = W_{1-t} - W_1.$$
Show that $Y_t$ is a Brownian motion on $[0, 1]$.
2.2 This exercise shows that the projection of a $d$-dimensional Brownian motion onto a line through the origin yields a one-dimensional Brownian motion. Suppose $(W^{(1)}_t, \ldots, W^{(d)}_t)$ is a $d$-dimensional Brownian motion started from 0 and $\lambda_1, \ldots, \lambda_d \in \mathbb R$ with $\sum_{i=1}^d \lambda_i^2 = 1$. Show that $X_t = \sum_{i=1}^d \lambda_i W^{(i)}_t$ is a one-dimensional Brownian motion started from 0.
2.3 This exercise shows that rotating a Brownian motion about the origin yields another Brownian motion. Let $W$ be a $d$-dimensional Brownian motion started at 0 and let $A$ be a $d \times d$ orthogonal matrix, that is, $A^{-1} = A^T$. Show that $Y_t = AW_t$ is again a $d$-dimensional Brownian motion.
2.4 Here is a converse to Exercise 2.2: roughly speaking, if all the projections of a $d$-dimensional process $X$ onto lines through the origin are one-dimensional Brownian motions, then $X$ is a $d$-dimensional Brownian motion.

Suppose $(X^1_t, \ldots, X^d_t)$ is a $d$-dimensional continuous process, i.e., one taking values in $\mathbb R^d$. Let $\{\mathcal F_t\}$ be the minimal augmented filtration generated by $X$. Suppose that whenever $\lambda_1, \ldots, \lambda_d \in \mathbb R$ with $\sum_{i=1}^d \lambda_i^2 = 1$, then $\sum_{i=1}^d \lambda_i X^i_t$ is a one-dimensional Brownian motion started at 0 with respect to the filtration $\{\mathcal F_t\}$.

(1) If $u = (u_1, \ldots, u_d)$, let $\|u\| = (\sum_j u_j^2)^{1/2}$ and let $\lambda_j = u_j / \|u\|$. Calculate
$$\mathbb E \exp\Big(i \sum_{j=1}^d u_j X^j_t\Big) = \mathbb E \exp\Big(i \|u\| \sum_{j=1}^d \lambda_j X^j_t\Big),$$
the joint characteristic function of $X_t$.

(2) If $t_0 < t_1 < \cdots < t_n$, use independence and (1) to calculate
$$\mathbb E \exp\Big(i \sum_{k=0}^{n-1} \sum_{j=1}^d u_{kj} (X^j_{t_{k+1}} - X^j_{t_k})\Big).$$

(3) Prove that $(X^1_t, \ldots, X^d_t)$ is a $d$-dimensional Brownian motion started from 0.

(Some care is needed with the filtrations. If we only know that $Y^\lambda = \sum_i \lambda_i X^i$ is a Brownian motion with respect to the filtration generated by $Y^\lambda$ for each $\lambda = (\lambda_1, \ldots, \lambda_d)$, the assertion is not true. See Revuz and Yor (1999), Exercise I.1.19.)

2.5 Let $W_t$ be a Brownian motion and suppose
$$\lim_{t \to \infty} W_t / t = 0, \quad \text{a.s.} \tag{2.5}$$
Let $Z_t = t W_{1/t}$ if $t > 0$ and set $Z_0 = 0$. (This is called time inversion.) Show that $Z$ is a Brownian motion. (We will see later that the assumption (2.5) is superfluous; see Theorem 7.2.)
2.6 Let $X$ and $Y$ be two independent Brownian motions started at 0 and let $t_0 > 0$. Let
$$Z_t = \begin{cases} X_t, & t \le t_0, \\ X_{t_0} + Y_{t - t_0}, & t > t_0. \end{cases}$$
Prove that $Z$ is also a Brownian motion.
2.7 Let $\mathcal F^{00}_t$ and $\mathcal F^0_t$ be defined as in (1.1) and (1.2). Prove that if $X$ is $\mathcal F^0_t$ measurable, there exists $Z$ such that $Z$ is $\mathcal F^{00}_t$ measurable and $X = Z$, a.s.
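2.8 Let $\mathcal F^{00}_t$ and $\mathcal F^0_t$ be defined as in (1.1) and (1.2). The symmetric difference of two sets $A$ and $B$ is defined by $A \,\triangle\, B = (A \setminus B) \cup (B \setminus A)$. Prove that
$$\mathcal F^0_t = \{A \subset \Omega : A \,\triangle\, B \in \mathcal N \text{ for some } B \in \mathcal F^{00}_t\}.$$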
Notes
Brownian motion is named for Robert Brown, a botanist who observed the erratic motion
of colloidal particles in suspension in the 1820s. Brownian motion was used by Bachelier
in 1900 in his PhD thesis to model stock prices and was the subject of an important paper
by Einstein in 1905. The rigorous mathematical foundations for Brownian motion were first
given by Wiener in 1923.

3
Martingales
Although discrete-time martingales are useful in a first course on probability, they are nowhere
near as useful as continuous-time martingales are in the study of stochastic processes.
The whole theory of stochastic integrals and stochastic differential equations is based on
martingales indexed by times t ∈ [0, ∞). After giving the definition and some examples, we
extend Doob’s inequalities, the optional stopping theorem, and the martingale convergence
theorem to continuous-time martingales. We then derive some estimates for Brownian motion
using martingale techniques.
3.1 Definition and examples
We define continuous-time martingales. Let $\{\mathcal F_t\}$ be a filtration, not necessarily satisfying the usual conditions.
Definition 3.1 $M_t$ is a continuous-time martingale with respect to the filtration $\{\mathcal F_t\}$ and the probability measure $\mathbb P$ if
(1) $\mathbb E|M_t| < \infty$ for each $t$;
(2) $M_t$ is $\mathcal F_t$ measurable for each $t$;
(3) $\mathbb E[M_t \mid \mathcal F_s] = M_s$, a.s., if $s < t$.

Part (2) of the definition can be rephrased as saying $M_t$ is adapted to $\mathcal F_t$. If in part (3) “=” is replaced by “≥,” then $M_t$ is a submartingale, and if it is replaced by “≤,” then we have a supermartingale.

Taking expectations in Definition 3.1(3), we see that if $s < t$, then $\mathbb E M_s \le \mathbb E M_t$ if $M$ is a submartingale and $\mathbb E M_s \ge \mathbb E M_t$ if $M$ is a supermartingale. Thus submartingales tend to increase, on average, and supermartingales tend to decrease, on average.

There are many martingales associated with Brownian motion. Here are three examples.

Example 3.2 Let $M_t = W_t$, where $W_t$ is a Brownian motion. Then $M_t$ is a martingale. To verify Definition 3.1(3), we write
$$\mathbb E[M_t \mid \mathcal F_s] = M_s + \mathbb E[W_t - W_s \mid \mathcal F_s] = M_s + \mathbb E[W_t - W_s] = M_s,$$
using the independent increments property of Brownian motion and the fact that $\mathbb E[W_t - W_s] = 0$.

Example 3.3 Let $M_t = W_t^2 - t$, where $W_t$ is a Brownian motion. To show $M_t$ is a martingale, we write
$$\begin{aligned}
\mathbb E[M_t \mid \mathcal F_s] &= \mathbb E[(W_t - W_s + W_s)^2 \mid \mathcal F_s] - t \\
&= W_s^2 + \mathbb E[(W_t - W_s)^2 \mid \mathcal F_s] + 2\mathbb E[W_s(W_t - W_s) \mid \mathcal F_s] - t \\
&= W_s^2 + \mathbb E[(W_t - W_s)^2] + 2W_s\, \mathbb E[W_t - W_s \mid \mathcal F_s] - t \\
&= W_s^2 + \mathbb E[(W_t - W_s)^2] + 2W_s\, \mathbb E[W_t - W_s] - t \\
&= W_s^2 + (t - s) - t = M_s.
\end{aligned}$$
We used the facts that $W_s$ is $\mathcal F_s$ measurable and that $W_t - W_s$ is independent of $\mathcal F_s$.

Example 3.4 Again let $W_t$ be a Brownian motion, let $a \in \mathbb R$, and let $M_t = e^{aW_t - a^2 t/2}$. Since $W_t - W_s$ is normal with mean zero and variance $t - s$, we know $\mathbb E e^{a(W_t - W_s)} = e^{a^2(t-s)/2}$; see (A.6). Then
$$\begin{aligned}
\mathbb E[M_t \mid \mathcal F_s] &= e^{-a^2 t/2} e^{aW_s}\, \mathbb E[e^{a(W_t - W_s)} \mid \mathcal F_s] = e^{-a^2 t/2} e^{aW_s}\, \mathbb E[e^{a(W_t - W_s)}] \\
&= e^{-a^2 t/2} e^{aW_s} e^{a^2(t-s)/2} = M_s.
\end{aligned}$$
We give one more example of a martingale, although not one derived from Brownian motion.

Example 3.5 Recall that given a filtration $\{\mathcal F_t\}$, each $\mathcal F_t$ is contained in $\mathcal F$, where $(\Omega, \mathcal F, \mathbb P)$ is our probability space. Let $X$ be an integrable $\mathcal F$ measurable random variable, and let $M_t = \mathbb E[X \mid \mathcal F_t]$.
Then $\mathbb E[M_t \mid \mathcal F_s] = \mathbb E[\mathbb E[X \mid \mathcal F_t] \mid \mathcal F_s] = \mathbb E[X \mid \mathcal F_s] = M_s$, and $M$ is a martingale.

3.2 Doob’s inequalities
We derive the analogs of Doob’s inequalities in the stochastic process context.

Theorem 3.6 Suppose $M_t$ is a martingale or non-negative submartingale with paths that are right continuous with left limits. Then
(1) $\mathbb P(\sup_{s \le t} |M_s| \ge \lambda) \le \mathbb E|M_t| / \lambda$.
(2) If $1 < p < \infty$, then
$$\mathbb E\Big[\sup_{s \le t} |M_s|\Big]^p \le \Big(\frac{p}{p-1}\Big)^p \mathbb E|M_t|^p.$$

Proof We will do the case where $M_t$ is a martingale, the submartingale case being nearly identical. Let $D_n = \{kt/2^n : 0 \le k \le 2^n\}$. If we set $N^{(n)}_k = M_{kt/2^n}$ and $\mathcal G^{(n)}_k = \mathcal F_{kt/2^n}$, it is clear that $\{N^{(n)}_k\}$ is a discrete-time martingale with respect to $\{\mathcal G^{(n)}_k\}$. Let
$$A_n = \Big\{\sup_{s \le t,\, s \in D_n} |M_s| > \lambda\Big\}.$$

By Doob’s inequality for discrete-time martingales (see Theorem A.32),
$$\mathbb P(A_n) = \mathbb P\Big(\max_{k \le 2^n} |N^{(n)}_k| > \lambda\Big) \le \frac{\mathbb E|N^{(n)}_{2^n}|}{\lambda} = \frac{\mathbb E|M_t|}{\lambda}.$$
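Note that the $A_n$ are increasing, and since $M_t$ is right continuous,
$$\bigcup_n A_n = \Big\{\sup_{s \le t} |M_s| > \lambda\Big\}.$$
Then
$$\mathbb P\Big(\sup_{s \le t} |M_s| > \lambda\Big) = \mathbb P\Big(\bigcup_n A_n\Big) = \lim_{n \to \infty} \mathbb P(A_n) \le \mathbb E|M_t| / \lambda.$$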
If we apply this with λ replaced by λ − ε and let ε → 0, we obtain (1).
The proof of (2) is similar. By Doob’s inequality for discrete-time martingales (see Theorem A.33),
$$\mathbb E\Big[\max_{k \le 2^n} |N^{(n)}_k|^p\Big] \le \Big(\frac{p}{p-1}\Big)^p \mathbb E|N^{(n)}_{2^n}|^p = \Big(\frac{p}{p-1}\Big)^p \mathbb E|M_t|^p.$$
Since $\max_{k \le 2^n} |N^{(n)}_k|^p$ increases to $\sup_{s \le t} |M_s|^p$ by the right continuity of $M$, (2) follows by Fatou’s lemma.
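The maximal inequality (1) and the martingale examples of Section 3.1 can be checked numerically. The sketch below (Python, standard library only) simulates Brownian paths on the grid $\{kt/n\}$, compares the empirical value of $\mathbb P(\max_k |W_{kt/n}| \ge \lambda)$ with the bound $\mathbb E|W_t|/\lambda = \sqrt{2t/\pi}/\lambda$, and checks that $W_t^2 - t$ (Example 3.3) and $e^{aW_t - a^2 t/2}$ (Example 3.4) have sample means near their constant expectations 0 and 1. Grid size, sample sizes, parameters, and seeds are arbitrary illustrative choices.

```python
import math
import random

def grid_path(t, n_steps, rng):
    """Values of a simulated Brownian path at k*t/n_steps, k = 0..n_steps."""
    sd = math.sqrt(t / n_steps)
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + sd * rng.gauss(0.0, 1.0))
    return path

def doob_check(t=1.0, lam=1.5, n_paths=2000, n_steps=200, seed=1):
    """Return (empirical P(max_k |W_{kt/n}| >= lam), E|W_t| / lam).

    The grid maximum plays the role of max_k |N^(n)_k| in the proof;
    E|W_t| = sqrt(2t/pi) because W_t is normal with mean 0 and variance t.
    """
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n_paths)
        if max(abs(w) for w in grid_path(t, n_steps, rng)) >= lam
    )
    return hits / n_paths, math.sqrt(2 * t / math.pi) / lam

def constant_means(t=2.0, a=0.7, n=50_000, seed=2):
    """Sample means of W_t^2 - t and exp(a W_t - a^2 t / 2) (Examples 3.3, 3.4)."""
    rng = random.Random(seed)
    sum_sq = sum_exp = 0.0
    for _ in range(n):
        w = math.sqrt(t) * rng.gauss(0.0, 1.0)
        sum_sq += w * w - t
        sum_exp += math.exp(a * w - a * a * t / 2)
    return sum_sq / n, sum_exp / n

if __name__ == "__main__":
    print(doob_check())       # (empirical probability, Doob bound)
    print(constant_means())   # both close to (0, 1)
```

The bound in (1) is far from sharp here; it only uses the first moment of $|W_t|$.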
3.3 Stopping times
Throughout this section we suppose we have a filtration {Ft} satisfying the usual conditions.
Definition 3.7 A random variable $T : \Omega \to [0, \infty]$ is a stopping time if for all $t$, $(T < t) \in \mathcal F_t$. We say $T$ is a finite stopping time if $T < \infty$, a.s. We say $T$ is a bounded stopping time if there exists $K \in [0, \infty)$ such that $T \le K$, a.s.

Note that $T$ can take the value infinity. Stopping times are also known as optional times.

Given a stochastic process $X$, we define $X_T(\omega)$ to be equal to $X(T(\omega), \omega)$; that is, for each $\omega$ we evaluate $t = T(\omega)$ and then look at $X(\cdot, \omega)$ at this time.

Proposition 3.8 Suppose $\mathcal F_t$ satisfies the usual conditions. Then
(1) $T$ is a stopping time if and only if $(T \le t) \in \mathcal F_t$ for all $t$.
(2) If $T = t$, a.s., then $T$ is a stopping time.
(3) If $S$ and $T$ are stopping times, then so are $S \vee T$ and $S \wedge T$.
(4) If $T_n$, $n = 1, 2, \ldots$, are stopping times with $T_1 \le T_2 \le \cdots$, then so is $\sup_n T_n$.
(5) If $T_n$, $n = 1, 2, \ldots$, are stopping times with $T_1 \ge T_2 \ge \cdots$, then so is $\inf_n T_n$.
(6) If $s \ge 0$ and $S$ is a stopping time, then so is $S + s$.

Proof We will just prove part of (1), leaving the rest as Exercise 3.4. Note $(T \le t) = \bigcap_{n \ge N} (T < t + 1/n) \in \mathcal F_{t+1/N}$ for each $N$. Thus $(T \le t) \in \bigcap_N \mathcal F_{t+1/N} \subset \mathcal F_{t+} = \mathcal F_t$.

For a Borel measurable set $A$, let
$$T_A = \inf\{t > 0 : X_t \in A\}. \tag{3.1}$$

Proposition 3.9 Suppose $\mathcal F_t$ satisfies the usual conditions and $X_t$ has continuous paths.
(1) If $A$ is open, then $T_A$ is a stopping time.
(2) If $A$ is closed, then $T_A$ is a stopping time.
Proof (1) $(T_A < t) = \bigcup_{q \in \mathbb Q_+,\, q < t} (X_q \in A)$ […] $0,\ A \cap (T \le t) \in \mathcal F_t\}$. (3.3)
This definition of $\mathcal F_T$, which is supposed to be the collection of events that are “known” by time $T$, is not very intuitive. But it turns out that this definition works well in applications. Exercise 3.6 gives an equivalent definition that is more appealing but not as useful.

Proposition 3.10 Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions.
(1) $\mathcal F_T$ is a σ-field.
(2) If $S \le T$, then $\mathcal F_S \subset \mathcal F_T$.
(3) If $\mathcal F_{T+} = \bigcap_{\varepsilon > 0} \mathcal F_{T+\varepsilon}$, then $\mathcal F_{T+} = \mathcal F_T$.
(4) If $X_t$ has right-continuous paths, then $X_T$ is $\mathcal F_T$ measurable.
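Proof If $A \in \mathcal F_T$, then $A^c \cap (T \le t) = (T \le t) \setminus [A \cap (T \le t)] \in \mathcal F_t$, so $A^c \in \mathcal F_T$. The rest of the proof of (1) is easy.

Suppose $A \in \mathcal F_S$ and $S \le T$. Then $A \cap (T \le t) = [A \cap (S \le t)] \cap (T \le t)$. We have $A \cap (S \le t) \in \mathcal F_t$ because $A \in \mathcal F_S$, while $(T \le t) \in \mathcal F_t$ because $T$ is a stopping time. Therefore $A \cap (T \le t) \in \mathcal F_t$, which proves (2).

For (3), if $A \in \mathcal F_{T+}$, then $A \in \mathcal F_{T+\varepsilon}$ for every $\varepsilon$, and so $A \cap (T + \varepsilon \le t) \in \mathcal F_t$ for all $t$. Hence $A \cap (T \le t - \varepsilon) \in \mathcal F_t$ for all $t$, or equivalently $A \cap (T \le t) \in \mathcal F_{t+\varepsilon}$ for all $t$. This is true for all $\varepsilon$, so $A \cap (T \le t) \in \mathcal F_{t+} = \mathcal F_t$. This says $A \in \mathcal F_T$.

(4) Define $T_n$ by (3.2). Note
$$(X_{T_n} \in B) \cap (T_n = k/2^n) = (X_{k/2^n} \in B) \cap (T_n = k/2^n) \in \mathcal F_{k/2^n}.$$
Since $T_n$ only takes values in $\{k/2^n : k \ge 0\}$, we conclude $(X_{T_n} \in B) \cap (T_n \le t) \in \mathcal F_t$ and so $(X_{T_n} \in B) \in \mathcal F_{T_n} \subset \mathcal F_{T+1/2^n}$. Hence $X_{T_n}$ is $\mathcal F_{T+1/2^n}$ measurable. If $n \ge m$, then $X_{T_n}$ is measurable with respect to $\mathcal F_{T+1/2^n} \subset \mathcal F_{T+1/2^m}$. Since $T_n \downarrow T$ and the paths of $X$ are right continuous, $X_{T_n} \to X_T$, and so $X_T$ is $\mathcal F_{T+1/2^m}$ measurable for each $m$. Therefore $X_T$ is measurable with respect to $\mathcal F_{T+} = \mathcal F_T$.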
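The dyadic approximations $T_n$ used above can be made concrete. The definition (3.2) is not reproduced in this section, so the sketch below assumes, consistently with the identity $(T_n = k/2^n) = (T < k/2^n) \setminus (T < (k-1)/2^n)$ used in the proof of Theorem 4.2, that $T_n = k/2^n$ exactly when $(k-1)/2^n \le T < k/2^n$. It illustrates the two facts used in proving Proposition 3.10(4): $T < T_n \le T + 2^{-n}$, and $T_n$ decreases in $n$. The helper name `dyadic_upper` is hypothetical.

```python
import math

def dyadic_upper(T, n):
    """Assumed form of T_n: the dyadic k/2^n with (k-1)/2^n <= T < k/2^n."""
    # An infinite stopping time stays infinite.
    if math.isinf(T):
        return math.inf
    k = math.floor(T * 2 ** n) + 1
    return k / 2 ** n

if __name__ == "__main__":
    # T_n decreases to T = 0.3 from above, never equal to it.
    print([dyadic_upper(0.3, n) for n in range(1, 6)])
    # prints [0.5, 0.5, 0.375, 0.3125, 0.3125]
```

Because each $T_n$ takes only countably many values, events such as $(X_{T_n} \in B)$ can be split over the sets $(T_n = k/2^n)$, which is exactly how the proof above proceeds.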
3.4 The optional stopping theorem
We will need Doob’s optional stopping theorem for continuous-time martingales. An example to keep in mind is $M_t = W_{t \wedge t_0}$, where $W$ is a Brownian motion and $t_0$ is some fixed time. Exercise 3.12 is a version of the optional stopping theorem with slightly weaker hypotheses that is often useful.
Theorem 3.11 Let $\{\mathcal F_t\}$ be a filtration satisfying the usual conditions. If $M_t$ is a martingale or non-negative submartingale whose paths are right continuous, $\sup_{t \ge 0} \mathbb E M_t^2 < \infty$, and $T$ is a finite stopping time, then $\mathbb E M_T \ge \mathbb E M_0$.

Proof We do the submartingale case, the martingale case being very similar. By Doob’s inequality (Theorem 3.6(2) with $p = 2$),
$$\mathbb E\Big[\sup_{s \le t} M_s^2\Big] \le 4\, \mathbb E M_t^2.$$
Letting $t \to \infty$, we have $\mathbb E[\sup_{t \ge 0} M_t^2] < \infty$ by Fatou’s lemma. Let us first suppose that $T < K$, a.s., for some real number $K$. Define $T_n$ by (3.2). Let $N^{(n)}_k = M_{k/2^n}$, $\mathcal G^{(n)}_k = \mathcal F_{k/2^n}$, and $S_n = 2^n T_n$. By Doob’s optional stopping theorem applied to the submartingale $N^{(n)}_k$, we have
$$\mathbb E M_0 = \mathbb E N^{(n)}_0 \le \mathbb E N^{(n)}_{S_n} = \mathbb E M_{T_n}.$$
Since $M$ is right continuous, $M_{T_n} \to M_T$, a.s. The random variables $|M_{T_n}|$ are bounded by $1 + \sup_{t \ge 0} M_t^2$, so by dominated convergence, $\mathbb E M_{T_n} \to \mathbb E M_T$.

We apply the above to the stopping time $T \wedge K$ to get $\mathbb E M_{T \wedge K} \ge \mathbb E M_0$. The random variables $M_{T \wedge K}$ are bounded by $1 + \sup_{t \ge 0} M_t^2$, so by dominated convergence, we get $\mathbb E M_T \ge \mathbb E M_0$ when we let $K \to \infty$.

3.5 Convergence and regularity
We present the continuous-time version of Doob’s martingale convergence theorem. We will see that not only do we get limits as $t \to \infty$, but also a regularity result.

Let $D_n = \{k/2^n : k \ge 0\}$, $D = \bigcup_n D_n$.

Theorem 3.12 Let $\{M_t : t \in D\}$ be either a martingale, a submartingale, or a supermartingale with respect to $\{\mathcal F_t : t \in D\}$ and suppose $\sup_{t \in D} \mathbb E|M_t| < \infty$. Then
(1) $\lim_{t \to \infty} M_t$ exists, a.s.
(2) With probability one $M_t$ has left and right limits along $D$.

The second conclusion says that except for a null set, if $t_0 \in [0, \infty)$, then both $\lim_{t \in D,\, t \uparrow t_0} M_t$ and $\lim_{t \in D,\, t \downarrow t_0} M_t$ exist and are finite. The null set does not depend on $t_0$.

Proof Martingales are also submartingales and if $M_t$ is a supermartingale, then $-M_t$ is a submartingale, so we may without loss of generality restrict our attention to submartingales. By Doob’s inequality (Theorem 3.6(1)),
$$\mathbb P\Big(\sup_{t \in D_n,\, t \le n} |M_t| > \lambda\Big) \le \frac{1}{\lambda} \mathbb E|M_n|.$$
Letting $n \to \infty$ and using Fatou’s lemma,
$$\mathbb P\Big(\sup_{t \in D} |M_t| > \lambda\Big) \le \frac{1}{\lambda} \sup_t \mathbb E|M_t|.$$
This is true for all $\lambda$, so with probability one, $\{|M_t| : t \in D\}$ is a bounded set.
Therefore the only way either (1) or (2) can fail is if for some pair of rationals $a$ […] $S_i : M_t \ge b\}$, and $S_{i+1} = \inf\{t > T_i : M_t \le a\}$, then the number of upcrossings up to time $u$ is $\sup\{k : T_k \le u\}$.
Doob’s upcrossing lemma (Theorem A.34) tells us that if $V_n$ is the number of upcrossings by $\{M_t : t \in D_n \cap [0, n]\}$, then
$$\mathbb E V_n \le \frac{\mathbb E(M_n - a)^+}{b - a} \le \frac{\mathbb E|M_n| + |a|}{b - a}.$$
Letting $n \to \infty$ and using Fatou’s lemma, the number of upcrossings of $[a, b]$ by $\{M_t : t \in D\}$ has finite expectation, hence is finite, a.s. If $N_{a,b}$ is the null set where the number of upcrossings of $[a, b]$ by $\{M_t : t \in D\}$ is infinite and $N = \bigcup_a$ […] $t,\ u \to t$ $M_u$.
It is clear that $\tilde M$ has paths that are right continuous with left limits. Since $\mathcal F_{t+} = \mathcal F_t$ and $\tilde M_t$ is $\mathcal F_{t+}$ measurable, then $\tilde M_t$ is $\mathcal F_t$ measurable.

Let $N$ be fixed. We will show $\{M_t; t \le N\}$ is a uniformly integrable family of random variables; see Section A.4. Let $\varepsilon > 0$. Since $M_N$ is integrable, there exists $\delta$ such that if $\mathbb P(A) < \delta$, then $\mathbb E[|M_N|; A] < \varepsilon$. If $L$ is large enough, $\mathbb P(|M_t| > L) \le \mathbb E|M_t| / L \le \mathbb E|M_N| / L < \delta$. Then
$$\mathbb E[|M_t|; |M_t| > L] \le \mathbb E[|M_N|; |M_t| > L] < \varepsilon,$$
since $|M_t|$ is a submartingale and $(|M_t| > L) \in \mathcal F_t$. Uniform integrability is proved.

Now let $t < N$. If $B \in \mathcal F_t$,
$$\mathbb E[\tilde M_t; B] = \lim_{u \in D,\, u > t,\, u \to t} \mathbb E[M_u; B] = \mathbb E[M_t; B].$$
Here we used the Vitali convergence theorem (Theorem A.19) and the fact that $M_t$ is a martingale. Since $\tilde M_t$ is $\mathcal F_t$ measurable, this proves that $\tilde M_t = M_t$, a.s. Since $N$ was arbitrary, we have this for all $t$. We thus have found a version of $M$ that has paths that are right continuous with left limits. That $\tilde M_t$ is a martingale is easy.
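The upcrossing count used in the proof of Theorem 3.12 is easy to compute for a finite sequence of values. The sketch below assumes the usual starting convention that $S_1$ is the first index with value $\le a$ (the definition of $S_1$ is not visible in this section), then alternates between waiting for a value $\ge b$ (a time $T_i$) and a value $\le a$ (a time $S_{i+1}$); the count is the number of $T_i$ reached. The function name `count_upcrossings` is illustrative.

```python
def count_upcrossings(values, a, b):
    """Number of completed upcrossings of [a, b] by the finite sequence `values`."""
    if not a < b:
        raise ValueError("need a < b")
    count = 0
    waiting_for_low = True  # searching for the next S_i (a value <= a)
    for v in values:
        if waiting_for_low:
            if v <= a:
                waiting_for_low = False  # reached S_i; now wait for T_i
        elif v >= b:
            count += 1  # reached T_i: one more completed upcrossing
            waiting_for_low = True
    return count

if __name__ == "__main__":
    print(count_upcrossings([0, -1, 2, -1, 2, 0], a=-0.5, b=1.5))  # prints 2
```

Applied to $\{M_t : t \in D_n \cap [0, n]\}$ listed in time order, this is the quantity $V_n$ bounded by Doob’s upcrossing lemma.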
The following technical result will be used several times in this book. A function $f$ is increasing if $s < t$ implies $f(s) \le f(t)$. A process $A_t$ has increasing paths if the function $t \mapsto A_t(\omega)$ is increasing for almost every $\omega$.

Proposition 3.14 Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions and suppose $A_t$ is an adapted process with paths that are increasing, are right continuous with left limits, and $A_\infty = \lim_{t \to \infty} A_t$ exists, a.s. Suppose $X$ is a non-negative integrable random variable, and $M_t$ is a version of the martingale $\mathbb E[X \mid \mathcal F_t]$ which has paths that are right continuous with left limits. Suppose $\mathbb E[X A_\infty] < \infty$. Then
$$\mathbb E \int_0^\infty X\, dA_s = \mathbb E \int_0^\infty M_s\, dA_s. \tag{3.4}$$

Proof First suppose $X$ and $A$ are bounded. Let $n > 1$ and write $\mathbb E \int_0^\infty X\, dA_s$ as
$$\sum_{k=1}^\infty \mathbb E[X(A_{k/2^n} - A_{(k-1)/2^n})].$$
Conditioning the $k$th summand on $\mathcal F_{k/2^n}$, this is equal to
$$\mathbb E\Big[\sum_{k=1}^\infty \mathbb E[X \mid \mathcal F_{k/2^n}](A_{k/2^n} - A_{(k-1)/2^n})\Big].$$
Given $s$ and $n$, define $s_n$ to be that value of $k/2^n$ such that $(k-1)/2^n < s \le k/2^n$. We then have
$$\mathbb E \int_0^\infty X\, dA_s = \mathbb E \int_0^\infty M_{s_n}\, dA_s. \tag{3.5}$$
For any value of $s$, $s_n \downarrow s$ as $n \to \infty$, and since $M$ has right-continuous paths, $M_{s_n} \to M_s$. Since $X$ is bounded, so is $M$. By dominated convergence, the right-hand side of (3.5) converges to $\mathbb E \int_0^\infty M_s\, dA_s$. This completes the proof when $X$ and $A$ are bounded. We apply this to $X \wedge N$ and $A \wedge N$, let $N \to \infty$, and use monotone convergence for the general case. The only reason we assume $X$ is non-negative is so that the integrals make sense.

The equation (3.4) can be rewritten as
$$\mathbb E \int_0^\infty X\, dA_s = \mathbb E \int_0^\infty \mathbb E[X \mid \mathcal F_s]\, dA_s. \tag{3.6}$$
We also have
$$\mathbb E \int_0^t X\, dA_s = \mathbb E \int_0^t \mathbb E[X \mid \mathcal F_s]\, dA_s \tag{3.7}$$
for each $t$. This follows either by following the above proof or by applying Proposition 3.14 to $A_{s \wedge t}$.

3.6 Some applications of martingales
The following estimates are very useful.

Proposition 3.15 If $W_t$ is a Brownian motion, then
$$\mathbb P\Big(\sup_{s \le t} W_s \ge \lambda\Big) \le e^{-\lambda^2/2t}, \qquad \lambda > 0, \tag{3.8}$$
and
$$\mathbb P\Big(\sup_{s \le t} |W_s| \ge \lambda\Big) \le 2e^{-\lambda^2/2t}, \qquad \lambda > 0. \tag{3.9}$$
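Proof For any $a$ the process $\{e^{aW_t}\}$ is a submartingale. To see this, since $x \mapsto e^{ax}$ is convex, the conditional expectation form of Jensen’s inequality (Proposition A.21) implies
$$\mathbb E[e^{aW_t} \mid \mathcal F_s] \ge e^{a\mathbb E[W_t \mid \mathcal F_s]} = e^{aW_s}.$$
Now take $a > 0$, so that $x \mapsto e^{ax}$ is increasing. By Doob’s inequality (Theorem 3.6(1)) applied to the non-negative submartingale $e^{aW_t}$,
$$\mathbb P\Big(\sup_{s \le t} W_s \ge \lambda\Big) = \mathbb P\Big(\sup_{s \le t} e^{aW_s} \ge e^{a\lambda}\Big) \le \frac{\mathbb E e^{aW_t}}{e^{a\lambda}}. \tag{3.10}$$
Since $\mathbb E e^{aY} = e^{a^2 \operatorname{Var} Y/2}$ if $Y$ is Gaussian with mean 0 by (A.6), it follows that the right side of (3.10) equals $e^{-a\lambda} e^{a^2 t/2}$. This expression is smallest when $a = \lambda/t$, and with this choice we obtain (3.8). Inequality (3.9) follows by applying (3.8) to $W$ and to $-W$ and adding.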
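The choice $a = \lambda/t$ in the proof minimizes $e^{-a\lambda + a^2 t/2}$ over $a > 0$. A short Python sketch (standard library only) confirms this by a grid search and compares (3.8) with a Monte Carlo estimate of $\mathbb P(\sup_{s \le t} W_s \ge \lambda)$ computed on a time grid. The grid maximum slightly underestimates the true supremum; the grid sizes, parameters, and seed are illustrative choices.

```python
import math
import random

def exponential_bound(a, lam, t):
    """e^{-a lam} E e^{a W_t} = e^{-a lam + a^2 t / 2}, the right side of (3.10)."""
    return math.exp(-a * lam + a * a * t / 2)

def best_a_on_grid(lam, t, a_max=10.0, n_grid=100_000):
    """Grid minimizer of exponential_bound over a in (0, a_max]."""
    grid = (a_max * k / n_grid for k in range(1, n_grid + 1))
    return min(grid, key=lambda a: exponential_bound(a, lam, t))

def running_max_tail(lam, t, n_paths=3000, n_steps=200, seed=3):
    """Monte Carlo estimate of P(max_k W_{k t / n_steps} >= lam)."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n_steps)
    hits = 0
    for _ in range(n_paths):
        w = 0.0
        for _ in range(n_steps):
            w += sd * rng.gauss(0.0, 1.0)
            if w >= lam:
                hits += 1
                break
    return hits / n_paths

if __name__ == "__main__":
    lam, t = 2.0, 1.0
    print(best_a_on_grid(lam, t))            # close to lam / t = 2
    print(math.exp(-lam ** 2 / (2 * t)))     # the bound (3.8)
    print(running_max_tail(lam, t))          # well below the bound
```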
Let us use martingales to calculate some probabilities. Let us suppose $a, b > 0$ and set $T = \inf\{t > 0 : W_t = -a \text{ or } W_t = b\}$, the first time Brownian motion exits the interval $[-a, b]$. By Proposition 3.9, $T$ is a stopping time.
We have the following.

Proposition 3.16 Let $W$ be a Brownian motion, let $a, b > 0$, and let $T = \inf\{t > 0 : W_t \notin (-a, b)\}$. Then
$$\mathbb P(W_T = -a) = \frac{b}{a+b}, \qquad \mathbb P(W_T = b) = \frac{a}{a+b}, \tag{3.11}$$
and
$$\mathbb E T = ab. \tag{3.12}$$
Proof Since $W_t^2 - t$ is a martingale with $W_0 = 0$, it is easy to check that for each $u$, $W_{t \wedge u}^2 - (t \wedge u)$ is also a martingale. Applying Theorem 3.11, we see that $\mathbb E W_{u \wedge T}^2 = \mathbb E[u \wedge T]$. As $u \to \infty$, the right-hand side tends to $\mathbb E T$ by monotone convergence. $|W_{u \wedge T}|^2$ is bounded by $(a + b)^2$, so by dominated convergence the left-hand side tends to $\mathbb E W_T^2 \le (a + b)^2$ as $u \to \infty$. Therefore
$$\mathbb E T = \mathbb E W_T^2. \tag{3.13}$$
In particular, $\mathbb E T < \infty$, so we know $T < \infty$, a.s.

We use that $T$ is finite, a.s., to conclude that $\mathbb P(W_T \in \{-a, b\}) = 1$, or
$$1 = \mathbb P(W_T = -a) + \mathbb P(W_T = b). \tag{3.14}$$
Since $W_t$ is a martingale, then so is $W_{t \wedge u}$ for each $u$, and therefore $\mathbb E W_{u \wedge T} = 0$. Letting $u \to \infty$ and using dominated convergence (noting $|W_{u \wedge T}|$ is bounded by $a + b$), we have $\mathbb E W_T = 0$, or
$$0 = (-a)\,\mathbb P(W_T = -a) + b\,\mathbb P(W_T = b). \tag{3.15}$$
We get (3.11) by solving (3.14) and (3.15) for the unknowns $\mathbb P(W_T = -a)$ and $\mathbb P(W_T = b)$. We get (3.12) by (3.13), writing
$$\mathbb E T = \mathbb E W_T^2 = (-a)^2\, \mathbb P(W_T = -a) + b^2\, \mathbb P(W_T = b),$$
and substituting the values from (3.11).

In proving Proposition 3.16, we used the fact that $W_{t \wedge T}$ is a martingale and $\mathbb P(T < \infty) = 1$. The same proof shows the following.

Corollary 3.17 Suppose $M_t$ is a martingale with continuous paths and with $M_0 = 0$, a.s., $T = \inf\{t \ge 0 : M_t \notin [-a, b]\}$, and $T < \infty$, a.s. Then
$$\mathbb P(M_T = -a) = \frac{b}{a+b}, \qquad \mathbb P(M_T = b) = \frac{a}{a+b}.$$

We can also use martingales to get more subtle results. Suppose $r > 0$. Since $e^{rW_t - r^2 t/2}$ is a martingale, as above
$$\mathbb E e^{rW_{T \wedge t} - r^2 (T \wedge t)/2} = 1.$$
The exponent is bounded by $rb$ if $r > 0$, so we can let $t \to \infty$ and use dominated convergence to get
$$\mathbb E e^{rW_T - r^2 T/2} = 1.$$
This can be written as
$$e^{-ra}\, \mathbb E[e^{-r^2 T/2}; W_T = -a] + e^{rb}\, \mathbb E[e^{-r^2 T/2}; W_T = b] = 1.$$
Since $e^{-rW_t - r^2 t/2}$ is also a martingale, similar reasoning gives us
$$e^{ra}\, \mathbb E[e^{-r^2 T/2}; W_T = -a] + e^{-rb}\, \mathbb E[e^{-r^2 T/2}; W_T = b] = 1.$$
We can solve those two equations to obtain
$$\mathbb E\big[e^{-r^2 T/2}; W_T = -a\big] = \frac{e^{rb} - e^{-rb}}{e^{r(a+b)} - e^{-r(a+b)}} \tag{3.16}$$
and
$$\mathbb E\big[e^{-r^2 T/2}; W_T = b\big] = \frac{e^{ra} - e^{-ra}}{e^{r(a+b)} - e^{-r(a+b)}}. \tag{3.17}$$
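Formulas (3.11), (3.12), (3.16), and (3.17) can be sanity-checked numerically. The Python sketch below evaluates the right-hand sides of (3.16) and (3.17), verifies that they solve the two linear equations displayed above, and confirms that (3.16) tends to $b/(a+b)$ as $r \to 0$, in agreement with (3.11). As a discrete stand-in for Brownian motion (an assumption of this sketch, not something the text uses), it also simulates a simple ±1 random walk, for which the same optional stopping argument gives exit probability $b/(a+b)$ at $-a$ and expected exit time $ab$ when $a$ and $b$ are positive integers. Function names, parameters, and the seed are illustrative.

```python
import math
import random

def exit_laplace(r, a, b):
    """Right sides of (3.16) and (3.17): E[e^{-r^2 T/2}; W_T = -a] and E[...; W_T = b]."""
    denom = math.exp(r * (a + b)) - math.exp(-r * (a + b))
    at_low = (math.exp(r * b) - math.exp(-r * b)) / denom
    at_high = (math.exp(r * a) - math.exp(-r * a)) / denom
    return at_low, at_high

def gamblers_ruin(a, b, n_runs=20_000, seed=4):
    """Simple +-1 walk from 0 until it hits -a or b (a, b positive integers).

    Returns (fraction of runs ending at -a, mean number of steps).
    """
    rng = random.Random(seed)
    low_hits = total_steps = 0
    for _ in range(n_runs):
        s = steps = 0
        while -a < s < b:
            s += 1 if rng.random() < 0.5 else -1
            steps += 1
        if s == -a:
            low_hits += 1
        total_steps += steps
    return low_hits / n_runs, total_steps / n_runs

if __name__ == "__main__":
    print(exit_laplace(0.8, 1.0, 2.0))
    print(exit_laplace(1e-4, 1.0, 2.0))  # first entry close to b/(a+b) = 2/3
    print(gamblers_ruin(2, 3))           # compare with (3/5, 2*3) = (0.6, 6)
```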

The left-hand sides of (3.16) and (3.17) are the Laplace transforms of the quantities $\mathbb P(T \in dt; W_T = -a)/dt$ and $\mathbb P(T \in dt; W_T = b)/dt$, respectively, and finding the inverse Laplace transforms of the right-hand sides of (3.16) and (3.17) gives us formulas for $\mathbb P(T \in dt; W_T = -a)/dt$ and $\mathbb P(T \in dt; W_T = b)/dt$. If we add the two formulas, we get an expression for $\mathbb P(T \in dt)/dt$, and integrating over $t$ from 0 to $t_0$ gives an expression for $\mathbb P(T \le t_0)$.
We sketch how to invert the Laplace transform and leave the detailed calculations and
justification for inverting a Laplace transform term by term to the interested reader. See also
Karatzas and Shreve (1991), Section 2.8. The right-hand side of (3.16) is equal to
$$\frac{e^{-ra} - e^{-ra-2rb}}{1 - e^{-2r(a+b)}}.$$
Since $e^{-2r(a+b)} < 1$, we can use $(1 - x)^{-1} = \sum_{n=0}^\infty x^n$ to expand the denominator as a power series; if we set $\lambda = r^2/2$, then
$$\mathbb E\big[e^{-\lambda T}; W_T = -a\big] = \sum_{n=0}^\infty \Big(e^{-(2n+1)\sqrt{2\lambda}\,a - 2n\sqrt{2\lambda}\,b} - e^{-(2n+1)\sqrt{2\lambda}\,a - (2n+2)\sqrt{2\lambda}\,b}\Big). \tag{3.18}$$
We then use the fact that the Laplace transform of $\frac{k}{2\sqrt{\pi t^3}}\, e^{-k^2/4t}$ is $e^{-k\sqrt{\lambda}}$ to find the inverse Laplace transform of the right-hand side of (3.18) by inverting term by term.

Similarly (see Exercises 3.15 and 3.16), if $b > 0$, $W$ is a Brownian motion, and $S = \inf\{t > 0 : W_t = b\}$, then $\mathbb E e^{-\lambda S} = e^{-\sqrt{2\lambda}\, b}$. Inverting the Laplace transform,
$$\mathbb P(S \in dt) = \frac{b}{\sqrt{2\pi t^3}}\, e^{-b^2/2t}\, dt, \qquad t > 0. \tag{3.19}$$
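Two of the identities in this subsection lend themselves to a quick numerical check: the series (3.18) should reproduce the closed form (3.16) with $r = \sqrt{2\lambda}$, and integrating $e^{-\lambda t}$ against the density (3.19) should give $e^{-\sqrt{2\lambda}\, b}$. The sketch below (Python, standard library only) does both, using a plain midpoint rule on $(0, 60)$ for the integral. Truncating the series after 60 terms and the integral at $t = 60$ are illustrative choices that are ample for the parameters tested, not part of the text.

```python
import math

def series_3_18(lam, a, b, n_terms=60):
    """Partial sum of the series (3.18) for E[e^{-lam T}; W_T = -a]."""
    r = math.sqrt(2 * lam)
    total = 0.0
    for n in range(n_terms):
        total += math.exp(-(2 * n + 1) * r * a - 2 * n * r * b)
        total -= math.exp(-(2 * n + 1) * r * a - (2 * n + 2) * r * b)
    return total

def closed_form_3_16(lam, a, b):
    """Right side of (3.16) with r = sqrt(2 lam)."""
    r = math.sqrt(2 * lam)
    return (math.exp(r * b) - math.exp(-r * b)) / (math.exp(r * (a + b)) - math.exp(-r * (a + b)))

def hitting_density(t, b):
    """Density (3.19) of S = inf{t > 0 : W_t = b} at time t > 0."""
    return b / math.sqrt(2 * math.pi * t ** 3) * math.exp(-b * b / (2 * t))

def laplace_of_density(lam, b, t_max=60.0, n=200_000):
    """Midpoint rule for the integral of e^{-lam t} * hitting_density(t, b) over (0, t_max)."""
    h = t_max / n
    return h * sum(
        math.exp(-lam * (k + 0.5) * h) * hitting_density((k + 0.5) * h, b) for k in range(n)
    )

if __name__ == "__main__":
    print(series_3_18(0.5, 1.0, 2.0), closed_form_3_16(0.5, 1.0, 2.0))
    print(laplace_of_density(1.0, 1.0), math.exp(-math.sqrt(2.0)))
```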
Exercises
3.1 If $W$ is a Brownian motion, show that
$$W_t^3 - 3\int_0^t W_s\, ds$$
is a martingale.
3.2 Suppose {Ft} is a filtration satisfying the usual conditions. Show that if Mt is a submartingale
and E Mt = E M0 for all t, then M is a martingale.
3.3 Let $X$ be a submartingale. Show that $\sup_{t \ge 0} \mathbb E|X_t| < \infty$ if and only if $\sup_{t \ge 0} \mathbb E X_t^+ < \infty$.
3.4 Prove all parts of Proposition 3.8.
3.5 If $T_n$ is defined by (3.2), show $T_n$ is a stopping time for each $n$ and $T_n \downarrow T$.
3.6 This exercise gives an alternative definition of $\mathcal F_T$ which is more appealing, but not as useful. Suppose that $\{\mathcal F_t\}$ satisfies the usual conditions. Show that $\mathcal F_T$ is equal to the σ-field generated by the collection of random variables $Y_T$ such that $Y$ is a bounded process with paths that are right continuous with left limits and $Y$ is adapted to the filtration $\{\mathcal F_t\}$.
3.7 Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions. Show that if $T$ is a stopping time, then $T$ is $\mathcal F_T$ measurable.
3.8 Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions and $T$ is a stopping time. Show that if $S$ is an $\mathcal F_T$ measurable random variable with $S \ge T$, then $S$ is a stopping time.
3.9 This exercise demonstrates that the conclusion of Corollary 3.13 cannot be extended to submartingales. Find a filtration $\{\mathcal F_t\}$ satisfying the usual conditions and a submartingale $X$ with respect to $\{\mathcal F_t\}$ such that $X$ does not have a version with paths that are right continuous with left limits.
3.10 Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions. Show that if $S$ and $T$ are stopping times and $X$ is a bounded $\mathcal F_\infty$ measurable random variable, then
$$\mathbb E[\mathbb E[X \mid \mathcal F_S] \mid \mathcal F_T] = \mathbb E[X \mid \mathcal F_{S \wedge T}].$$
Hint: Let $Y_t = \mathbb E[X \mid \mathcal F_t]$ and $Z_t = Y_{t \wedge S}$. Show the left-hand side is equal to $Y_{S \wedge T}$.
3.11 A martingale or submartingale $M_t$ is uniformly integrable if the family $\{M_t : t \ge 0\}$ is a uniformly integrable family of random variables. Show that if $M_t$ is a uniformly integrable martingale with paths that are right continuous with left limits, then $\{M_T : T \text{ a finite stopping time}\}$ is a uniformly integrable family of random variables. Show this also holds if $M_t$ is a non-negative submartingale with paths that are right continuous with left limits.
3.12 This exercise weakens the conditions on the optional stopping theorem. Show that if $M_t$ is a uniformly integrable martingale that is right continuous with left limits and $T$ is a finite stopping time, then $\mathbb E M_T = \mathbb E M_0$.
3.13 Let $W$ be a Brownian motion and let $T$ be a stopping time with $\mathbb E T < \infty$. Prove that $\mathbb E W_T = 0$ and $\mathbb E W_T^2 = \mathbb E T$. This is not an easy application of the optional stopping theorem because we do not know that $W_{t \wedge T}$ is necessarily a uniformly integrable martingale.
3.14 Suppose that $(W^1_t, \ldots, W^d_t)$ is a $d$-dimensional Brownian motion. Show that if $i \ne j$, then $W^i_t W^j_t$ is a martingale.
3.15 Let $W_t$ be a Brownian motion, $b > 0$, and $T = \inf\{t > 0 : W_t = b\}$. Show $T < \infty$, a.s. Show $\mathbb E T = \infty$. Hint: Take a limit in (3.11).
3.16 Suppose $W$ is a Brownian motion and $b > 0$. If $S = \inf\{t > 0 : W_t = b\}$, show that the Laplace transform of the density of $S$ is given by
$$\mathbb E e^{-\lambda S} = e^{-\sqrt{2\lambda}\, b}.$$
3.17 Let $W_t$ be a Brownian motion. Show that if $\alpha > 1/2$, then
$$\lim_{t \to \infty} \frac{W_t}{t^\alpha} = 0, \quad \text{a.s.}$$
Hint: Let $\alpha_0 \in (1/2, \alpha)$, estimate
$$\mathbb P\Big(\sup_{2^n \le s \le 2^{n+1}} |W_s| \ge (2^n)^{\alpha_0}\Big)$$
using (3.9), and then use the Borel–Cantelli lemma.
3.18 Let $W_t$ be a one-dimensional Brownian motion and $\alpha \in (0, 1/2]$. Prove that
$$\limsup_{t \to \infty} \frac{|W_t|}{t^\alpha} > 0, \quad \text{a.s.}$$
3.19 If $W$ is a Brownian motion and $b$ is a constant, then the process $X_t = W_t + bt$ is a Brownian motion with drift. Prove that if $b > 0$, then
$$\lim_{t \to \infty} X_t = \infty, \quad \text{a.s.}$$

4
Markov properties of Brownian motion
In later chapters we will discuss extensively the Markov property and strong Markov property.
The Brownian motion case is much simpler, and we do that now.
4.1 Markov properties
Let us begin with the Markov property.
Theorem 4.1 Let $\{\mathcal F_t\}$ be a filtration, not necessarily satisfying the usual conditions, and let $W$ be a Brownian motion with respect to $\{\mathcal F_t\}$. If $u$ is a fixed time, then $Y_t = W_{t+u} - W_u$ is a Brownian motion independent of $\mathcal F_u$.
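Proof Let $\mathcal G_t = \mathcal F_{t+u}$. It is clear that $Y$ has continuous paths, is zero at time 0, and is adapted to $\{\mathcal G_t\}$. If $s < t$, then since $Y_t - Y_s = W_{t+u} - W_{s+u}$, the increment $Y_t - Y_s$ is a mean zero normal random variable with variance $(t + u) - (s + u) = t - s$ that is independent of $\mathcal F_{s+u} = \mathcal G_s$. Hence $Y$ is a Brownian motion with respect to $\{\mathcal G_t\}$. Moreover, for $0 = t_0 < t_1 < \cdots < t_n$ each increment $Y_{t_i} - Y_{t_{i-1}}$ is independent of $\mathcal G_{t_{i-1}}$, which contains $\mathcal G_0 = \mathcal F_u$ and the earlier increments, so $(Y_{t_1}, \ldots, Y_{t_n})$ is independent of $\mathcal F_u$; thus $Y$ is independent of $\mathcal F_u$.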
The strong Markov property is the Markov property extended by replacing fixed times u
by finite stopping times.
Theorem 4.2 Let $\{\mathcal F_t\}$ be a filtration, not necessarily satisfying the usual conditions, and let $W$ be a Brownian motion with respect to $\{\mathcal F_t\}$. If $T$ is a finite stopping time, then $Y_t = W_{T+t} - W_T$ is a Brownian motion independent of $\mathcal F_T$.
Proof We will first show that whenever $m \ge 1$, $t_1 < \cdots < t_m$, $f$ is a bounded continuous function on $\mathbb R^m$, and $A \in \mathcal F_T$, then
$$\mathbb E[f(Y_{t_1}, \ldots, Y_{t_m}); A] = \mathbb E[f(W_{t_1}, \ldots, W_{t_m})]\, \mathbb P(A). \tag{4.1}$$
Once we have done this, we will then show how (4.1) implies our theorem.

To prove (4.1), define $T_n$ by (3.2). We have
$$\begin{aligned}
&\mathbb E[f(W_{T_n+t_1} - W_{T_n}, \ldots, W_{T_n+t_m} - W_{T_n}); A] \\
&\quad = \sum_{k=1}^\infty \mathbb E[f(W_{T_n+t_1} - W_{T_n}, \ldots, W_{T_n+t_m} - W_{T_n}); A, T_n = k/2^n] \\
&\quad = \sum_{k=1}^\infty \mathbb E[f(W_{t_1+k/2^n} - W_{k/2^n}, \ldots, W_{t_m+k/2^n} - W_{k/2^n}); A, T_n = k/2^n].
\end{aligned} \tag{4.2}$$
Following the usual practice in probability that “,” means “and,” we use the notation “$\mathbb E[\cdots; A, T_n = k/2^n]$” as an abbreviation for “$\mathbb E[\cdots; A \cap (T_n = k/2^n)]$.” Since $A \in \mathcal F_T$, then
$$A \cap (T_n = k/2^n) = A \cap \big((T < k/2^n) \setminus (T < (k-1)/2^n)\big) \in \mathcal F_{k/2^n}.$$
We use the independent increments property of Brownian motion and the fact that $W_t - W_s$ has the same law as $W_{t-s}$ to see that the sum in the last line of (4.2) is equal to
$$\begin{aligned}
&\sum_{k=1}^\infty \mathbb E[f(W_{t_1+k/2^n} - W_{k/2^n}, \ldots, W_{t_m+k/2^n} - W_{k/2^n})]\, \mathbb P(A, T_n = k/2^n) \\
&\quad = \sum_{k=1}^\infty \mathbb E[f(W_{t_1}, \ldots, W_{t_m})]\, \mathbb P(A, T_n = k/2^n) = \mathbb E[f(W_{t_1}, \ldots, W_{t_m})]\, \mathbb P(A),
\end{aligned}$$
which is the right-hand side of (4.1). Thus
$$\mathbb E[f(W_{T_n+t_1} - W_{T_n}, \ldots, W_{T_n+t_m} - W_{T_n}); A] = \mathbb E[f(W_{t_1}, \ldots, W_{t_m})]\, \mathbb P(A). \tag{4.3}$$
Now let $n \to \infty$. By the right continuity of the paths of $W$, the boundedness and continuity of $f$, and the dominated convergence theorem, the left-hand side of (4.3) converges to the left-hand side of (4.1).

If we take $A = \Omega$ in (4.1), we obtain
$$\mathbb E[f(Y_{t_1}, \ldots, Y_{t_m})] = \mathbb E[f(W_{t_1}, \ldots, W_{t_m})]$$
whenever $m \ge 1$, $t_1, \ldots, t_m \in [0, \infty)$, and $f$ is a bounded continuous function on $\mathbb R^m$. This implies that the finite-dimensional distributions of $Y$ and $W$ are the same. Since $Y$ has continuous paths, $Y$ is a Brownian motion.

Next take $A \in \mathcal F_T$. By using a limit argument, (4.1) holds whenever $f$ is the indicator of a Borel subset $B$ of $\mathbb R^m$, or in other words,
$$\mathbb P(Y \in B, A) = \mathbb P(Y \in B)\, \mathbb P(A) \tag{4.4}$$
whenever $B$ is a cylindrical set.
Let $\mathcal M$ be the collection of all Borel subsets $B$ of $C[0,\infty)$ for which (4.4) holds. Let $\mathcal C$ be the collection of all cylindrical subsets of $C[0,\infty)$. Then $\mathcal M$ is a monotone class containing $\mathcal C$, and $\mathcal C$ is an algebra of subsets of $C[0,\infty)$ generating the Borel σ-field of $C[0,\infty)$. By the monotone class theorem (Theorem B.2), $\mathcal M$ is equal to the Borel σ-field on $C[0,\infty)$, and since (4.4) holds for all sets $B\in\mathcal M$, this establishes the independence of $Y$ and $\mathcal F_T$.

In the future, we will not put in the details for the arguments using the monotone class theorem.

Observe that what is needed for the above proof to work is not that $W$ be a Brownian motion, but that the process $W$ have right continuous paths and that $W_t-W_s$ be independent of $\mathcal F_s$ and have the same distribution as $W_{t-s}$. We therefore have the following corollary.

Corollary 4.3 Let $\{\mathcal F_t\}$ be a filtration, not necessarily satisfying the usual conditions, and let $X$ be a process adapted to $\{\mathcal F_t\}$. Suppose $X$ has paths that are right continuous with left limits and suppose $X_t-X_s$ is independent of $\mathcal F_s$ and has the same law as $X_{t-s}$ whenever $s<t$. If $T$ is a finite stopping time, then $Y_t = X_{T+t}-X_T$ is a process that is independent of $\mathcal F_T$, and $X$ and $Y$ have the same law.

4.2 Applications

Figure 4.1 The reflection principle. [Figure: levels $x$, $b$, and $2b-x$ marked on the vertical axis.]

The first application is known as the reflection principle and allows us to get control of the maximum of a Brownian motion. The idea is the following. Suppose that $W_t$ is a Brownian motion and for some path, the Brownian motion goes above a level $b$ before time $t$ but that at time $t$ the value of $W_t$ is less than $x$, where $x<b$. We could take the graph of this path and reflect it across the horizontal line at level $b$ from the first time the path crosses the level $b$ onward (Figure 4.1). This will give us a new path that ends up above $2b-x$. Thus there is a one-to-one correspondence between the paths where the maximum up to time $t$ is above $b$ and $W_t$ is below $x$, and the paths where $W_t$ is above $2b-x$.
More precisely, we have the following.

Theorem 4.4 Let $W_t$ be a Brownian motion, $b>0$, $T=\inf\{t: W_t\ge b\}$, and $x<b$. Then
$$P\Big(\sup_{s\le t}W_s\ge b,\ W_t<x\Big) = P(W_t>2b-x).\tag{4.5}$$

Proof Let $T_n$ be defined by (3.2). We first show that
$$P(T_n\le t,\ W_t-W_{T_n}<x-b) = P(T_n\le t,\ W_t-W_{T_n}>b-x).\tag{4.6}$$
Writing $[x]$ for the integer part of $x$, the left-hand side of (4.6) is equal to
$$\begin{aligned}\sum_{k=0}^{[2^nt]}P(T_n=k/2^n,\ W_t-W_{T_n}<x-b) &= \sum_{k=0}^{[2^nt]}P(T_n=k/2^n,\ W_t-W_{k/2^n}<x-b)\\ &= \sum_{k=0}^{[2^nt]}P(T_n=k/2^n)\,P(W_t-W_{k/2^n}<x-b),\end{aligned}$$
using the independent increments property of Brownian motion and the fact that $(T_n=k/2^n)\in\mathcal F_{k/2^n}$. Using the symmetry of the normal distribution, that is, that $W_t-W_s$ and $W_s-W_t$ have the same law, this is the same as
$$\sum_{k=0}^{[2^nt]}P(T_n=k/2^n)\,P(W_t-W_{k/2^n}>b-x),$$
and reversing the steps above, this equals the right-hand side of (4.6).

Since $W$ has continuous paths, $W_T=b$, so $(T=t)\subset(W_t=b)$. Because $W_t$ is a normal random variable, $P(T=t)=0$. Also, $P(W_t-W_T=b-x)$ and $P(W_t-W_T=x-b)$ are both zero. If we now let $n\to\infty$ in (4.6), we obtain
$$P(T\le t,\ W_t-W_T<x-b) = P(T\le t,\ W_t-W_T>b-x).$$
Since $W_T=b$, this is the same as
$$P(T\le t,\ W_t<x) = P(T\le t,\ W_t>2b-x).\tag{4.7}$$
By the definition of $T$ and the continuity of the paths of $W$, the left-hand side is equal to the left-hand side of (4.5). If $W_t>2b-x$, then automatically $T\le t$ (because $2b-x>b$), so the right-hand side of (4.7) is equal to the right-hand side of (4.5).
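The reflection argument has an exact discrete analogue for the simple symmetric random walk $S_k$ (steps $\pm1$ with probability $1/2$ each, $S_0=0$): for integers $b>0$ and $x<b$, $P(\max_{k\le n}S_k\ge b,\ S_n<x) = P(S_n>2b-x)$, by the same path-reflection bijection as in Figure 4.1. The Python sketch below computes both sides exactly with rational arithmetic; the helper names `max_and_end_prob` and `end_above_prob` are hypothetical and not from the text, and the walk is a stand-in used only to make the one-to-one correspondence concrete, not part of the proof of Theorem 4.4.

```python
from fractions import Fraction
from math import comb


def max_and_end_prob(n, b, x):
    """P(max_{k<=n} S_k >= b, S_n < x) for a simple symmetric random walk.

    Dynamic programming over states (position, reached), where `reached`
    records whether the walk has already touched level b.
    """
    dist = {(0, False): Fraction(1)}  # S_0 = 0 and b > 0, so b not yet reached
    half = Fraction(1, 2)
    for _ in range(n):
        nxt = {}
        for (pos, reached), p in dist.items():
            for step in (-1, 1):
                q = pos + step
                key = (q, reached or q >= b)
                nxt[key] = nxt.get(key, Fraction(0)) + p * half
        dist = nxt
    return sum((p for (pos, reached), p in dist.items() if reached and pos < x),
               Fraction(0))


def end_above_prob(n, level):
    """P(S_n > level), using S_n = 2*(number of up-steps) - n (binomial law)."""
    return sum((Fraction(comb(n, k), 2 ** n) for k in range(n + 1) if 2 * k - n > level),
               Fraction(0))


if __name__ == "__main__":
    for n, b, x in [(10, 3, 1), (15, 2, 0), (20, 4, -2)]:
        lhs = max_and_end_prob(n, b, x)
        rhs = end_above_prob(n, 2 * b - x)
        print(n, b, x, lhs, rhs, lhs == rhs)
```

Exact fractions are used so that the comparison is an equality check rather than a floating-point tolerance.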
Our second application will be useful when studying local time in Chapter 14.
Proposition 4.5 Let $W_t$ be a Brownian motion with respect to a filtration $\{\mathcal F_t\}$ satisfying the usual conditions. Let $T$ be a finite stopping time and $s>0$. If $a<b$, then
$$P(W_{T+s}\in[a,b]\mid\mathcal F_T)\le\frac{|b-a|}{\sqrt{2\pi s}}.$$

Proof If $A\in\mathcal F_T$, let $k>0$ and write
$$\begin{aligned}P(W_{T+s}\in[a,b],\ A) &= \sum_{j=-\infty}^{\infty}P\big(W_{T+s}\in[a,b],\ A,\ j/k\le W_T<(j+1)/k\big)\\ &\le\sum_{j=-\infty}^{\infty}P\big(W_{T+s}-W_T\in[a-(j+1)/k,\ b-j/k],\ A,\ j/k\le W_T<(j+1)/k\big).\end{aligned}$$
Using the fact that $W_{T+s}-W_T$ is a Brownian motion independent of $\mathcal F_T$, this is less than or equal to
$$\begin{aligned}&\sum_{j=-\infty}^{\infty}P\big(W_s\in[a-(j+1)/k,\ b-j/k]\big)\,P\big(A,\ j/k\le W_T<(j+1)/k\big)\\ &\qquad\le\sum_{j=-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\,\frac{b-a+1/k}{\sqrt s}\,P\big(A,\ j/k\le W_T<(j+1)/k\big)\le\frac{1}{\sqrt{2\pi}}\,\frac{b-a+1/k}{\sqrt s}\,P(A).\end{aligned}$$
We used here the formula for the density of a normal random variable with mean zero and variance $s$: the density is bounded by $1/\sqrt{2\pi s}$, so an interval of length $b-a+1/k$ has probability at most $(b-a+1/k)/\sqrt{2\pi s}$. This is true for all $k$, so letting $k\to\infty$ yields our result.

Exercises

4.1 If $W$ is a Brownian motion, let $S_t=\sup_{s\le t}W_s$. Find the density for $S_t$.

4.2 With $W$ and $S$ as in Exercise 4.1, find the joint density of $(S_t,W_t)$.

4.3 Let $W$ be a Brownian motion started at $a>0$ and let $T_0$ be the first time $W$ hits 0. Find the law of $\sup_{t\le T_0}W_t$.
4.4 Use the reflection principle to prove that if $W$ is a Brownian motion and $T=\inf\{t>0: W_t\in(0,\infty)\}$, then
$$P(T=0)=1.$$
In other words, Brownian motion enters the interval $(0,\infty)$ immediately. By symmetry it enters the interval $(-\infty,0)$ immediately. Conclude that Brownian motion hits 0 infinitely often in every time interval $[0,t]$.
4.5 Let $W_t$ be a Brownian motion and $\{\mathcal F_t\}$ be the minimal augmented filtration generated by $W$. Let
$$T=\inf\Big\{t>0: W_t=\sup_{0\le s\le1}W_s\Big\}.$$
Show that $T$ is not a stopping time with respect to $\{\mathcal F_t\}$.
4.6 Let $W$ and $S$ be as in Exercise 4.1.
(1) Let $0<s<t<u$ and let $a<b$ with $b-a\le1$. Show that there exists a constant $c$, depending on $s$, $t$, and $u$, but not $a$ or $b$, such that
$$P\Big(S_s\in[a,b],\ \sup_{t\le r\le u}W_r\in[a,b]\Big)\le c(b-a)^2.$$
(2) Show that the path of a Brownian motion does not take on the same value as a local maximum twice. That is, if $S$ and $T$ are times when $W$ has a local maximum, then $W_S\ne W_T$, a.s.

4.7 Let $V_t$ be the number of upcrossings of $[0,1]$ by a Brownian motion $W$ up to time $t$. This means we let $S_1=0$, $T_i=\inf\{t>S_i: W_t\ge1\}$, and $S_{i+1}=\inf\{t>T_i: W_t\le0\}$ for $i=1,2,\dots$, and we set $V_t=\sup\{k: T_k\le t\}$. Show that $V_t\to\infty$, a.s., as $t\to\infty$.

4.8 Let $W$ be a Brownian motion. The zero set of Brownian motion is the random set
$$Z(\omega)=\{t\in[0,1]: W_t(\omega)=0\}.$$
(1) Show that $Z(\omega)$ is a closed set for each $\omega$.
(2) Show that with probability one, every point of $Z(\omega)$ is a limit point of $Z(\omega)$. Conclude that $Z(\omega)$ is an uncountable set.
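4.9 Let $W$ be a one-dimensional Brownian motion and $\delta>0$.
(1) Prove that there exists $\gamma$ such that if $t\le\gamma$, then
$$P(0\le W_t\le\delta/2)\ge1/4\quad\text{and}\quad P(-\delta/2\le W_t\le0)\ge1/4.$$
(2) Prove there exists $\gamma$ such that
$$P\Big(\sup_{s\le\gamma}|W_s|>\delta/2\Big)\le1/8.$$
(3) Prove that if $m\ge1$, then
$$\begin{aligned}&P\Big(\sup_{m\gamma\le s\le(m+1)\gamma}|W_s-W_{m\gamma}|\le\delta/2,\ W_{m\gamma}\in[0,\delta/2],\ |W_{(m+1)\gamma}|\le\delta/2\ \Big|\ \mathcal F_{m\gamma}\Big)\\ &\qquad\ge\frac18\,P\Big(\sup_{m\gamma\le s\le(m+1)\gamma}|W_s-W_{m\gamma}|\le\delta/2,\ W_{m\gamma}\in[0,\delta/2]\Big)\end{aligned}$$
and the same with $W_{m\gamma}\in[-\delta/2,0]$ in place of $W_{m\gamma}\in[0,\delta/2]$. Conclude that
$$\begin{aligned}&P\Big(\sup_{m\gamma\le s\le(m+1)\gamma}|W_s-W_{m\gamma}|\le\delta/2,\ |W_{m\gamma}|\le\delta/2,\ |W_{(m+1)\gamma}|\le\delta/2\ \Big|\ \mathcal F_{m\gamma}\Big)\\ &\qquad\ge\frac18\,P\Big(\sup_{m\gamma\le s\le(m+1)\gamma}|W_s-W_{m\gamma}|\le\delta/2,\ |W_{m\gamma}|\le\delta/2\Big).\end{aligned}$$
(4) Use induction to prove that if $t_0>0$, there exists $c_1>0$ such that
$$P\Big(\sup_{s\le t_0}|W_s|\le\delta\Big)>c_1.$$
(5) Prove that if $W$ is a $d$-dimensional Brownian motion, $t_0>0$, and $\delta>0$, there exists $c_2$ such that
$$P\Big(\sup_{s\le t_0}|W_s|\le\delta\Big)>c_2.$$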
4.10 The $p$-variation of a function $f$ on the interval $[0,1]$ is defined by
$$V_p(f)=\sup\Big\{\sum_{i=0}^{n-1}|f(t_{i+1})-f(t_i)|^p: n\ge1,\ 0=t_0<t_1<\cdots<t_n=1\Big\};$$
the supremum is over all partitions $P$ of $[0,1]$. In this exercise we will prove that if $p<2$ and $W$ is a Brownian motion, then $V_p(W)=\infty$, a.s.
(1) Let $X_i$ be an i.i.d. sequence of random variables with finite mean. Use the strong law of large numbers to prove that if $K>EX_1$, then
$$P\Big(\sum_{i=1}^nX_i>Kn\Big)\to0$$
as $n\to\infty$.

(2) If $p<2$, take $r\in(p,2)$, and let $\varepsilon_n=n^{-1/r}$. Let $S_0=0$ and for $i\ge0$, set $S_{i+1}=\inf\{t>S_i: |W_t-W_{S_i}|>\varepsilon_n\}$. Set $X_i=\varepsilon_n^{-2}(S_i-S_{i-1})$. Prove that the $X_i$ are i.i.d. with finite mean.
(3) Use (1) to show that
$$P(S_n>1)=P\Big(\sum_{i=1}^nX_i>\varepsilon_n^{-2}\Big)\to0$$
as $n\to\infty$.
(4) Using the partition $\{S_0,S_1,\dots,S_n\}$, show that $V_p(W)\ge n\varepsilon_n^p$ on the event $(S_n\le1)$.
(5) Conclude $V_p(W)=\infty$, a.s.

5
The Poisson process
At the opposite extreme from Brownian motion is the Poisson process. This is a process
that only changes value by means of jumps, and even then, the jumps are nicely spaced. The
Poisson process is the prototype of a pure jump process, and later we will see that it is the
building block for an important class of stochastic processes known as Lévy processes.
Definition 5.1 Let $\{\mathcal F_t\}$ be a filtration, not necessarily satisfying the usual conditions. A Poisson process with parameter $\lambda>0$ is a stochastic process $X$ satisfying the following properties:
(1) $X_0=0$, a.s.
(2) The paths of $X_t$ are right continuous with left limits.
(3) If $s<t$, then $X_t-X_s$ is a Poisson random variable with parameter $\lambda(t-s)$.
(4) If $s<t$, then $X_t-X_s$ is independent of $\mathcal F_s$.

Define $X_{t-}=\lim_{s\to t,\,s<t}X_s$ […] $>0$. If there were a jump of size 2 or larger at some time $t$ strictly less than $t_0$, then for each $n$ sufficiently large there exists $0\le k_n\le2^n$ such that $X_{(k_n+1)t_0/2^n}-X_{k_nt_0/2^n}\ge2$. Therefore
$$\begin{aligned}P(\exists s<t_0: \Delta X_s\ge2)&\le P\big(\exists k\le2^n: X_{(k+1)t_0/2^n}-X_{kt_0/2^n}\ge2\big)\\ &\le2^n\sup_{k\le2^n}P\big(X_{(k+1)t_0/2^n}-X_{kt_0/2^n}\ge2\big)\\ &=2^nP(X_{t_0/2^n}\ge2)\\ &\le2^n\big(1-P(X_{t_0/2^n}=0)-P(X_{t_0/2^n}=1)\big)\\ &=2^n\big(1-e^{-\lambda t_0/2^n}-(\lambda t_0/2^n)e^{-\lambda t_0/2^n}\big).\end{aligned}\tag{5.1}$$
We used property 5.1(3) for the two equalities. By l'Hôpital's rule, $(1-e^{-x}-xe^{-x})/x\to0$ as $x\to0$. We apply this with $x=\lambda t_0/2^n$, and see that the last line of (5.1) tends to 0 as $n\to\infty$. Since the left-hand side of (5.1) does not depend on $n$, it must be 0. This holds for each $t_0$.

Another characterization of the Poisson process is as follows. Let $T_1=\inf\{t: \Delta X_t=1\}$, the time of the first jump. Define $T_{i+1}=\inf\{t>T_i: \Delta X_t=1\}$, so that $T_i$ is the time of the $i$th jump.
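The limit at the end of (5.1) can be checked numerically. With $x=\lambda t_0/2^n$, the last line of (5.1) equals $\lambda t_0\,(1-e^{-x}-xe^{-x})/x$, and a Taylor expansion gives $1-e^{-x}-xe^{-x}=x^2/2-x^3/3+O(x^4)$, so the bound behaves like $(\lambda t_0)^2/2^{n+1}$. The Python sketch below evaluates the bound; the function name `jump_bound` is ours, and computing $1-e^{-x}$ as `-math.expm1(-x)` is a design choice to avoid catastrophic cancellation when $x$ is tiny.

```python
import math


def jump_bound(lam, t0, n):
    """Last line of (5.1): 2^n * (1 - e^{-x} - x e^{-x}) with x = lam * t0 / 2^n."""
    x = lam * t0 / 2 ** n
    # 1 - e^{-x} is computed as -expm1(-x) to keep precision when x is tiny.
    return 2 ** n * (-math.expm1(-x) - x * math.exp(-x))


if __name__ == "__main__":
    lam, t0 = 3.0, 2.0
    for n in (5, 10, 20, 30):
        leading = (lam * t0) ** 2 / 2 ** (n + 1)  # leading-order Taylor term
        print(n, jump_bound(lam, t0, n), leading)
```

Halving the mesh roughly halves the bound, which is the quantitative content of the l'Hôpital step.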
Proposition 5.3 The random variables $T_1, T_2-T_1,\dots,T_{i+1}-T_i,\dots$ are independent exponential random variables with parameter $\lambda$.

Proof In view of Corollary 4.3 it suffices to show that $T_1$ is an exponential random variable with parameter $\lambda$. If $T_1>t$, then the first jump has not occurred by time $t$, so $X_t$ is still zero. Hence
$$P(T_1>t)=P(X_t=0)=e^{-\lambda t},$$
using the fact that $X_t$ is a Poisson random variable with parameter $\lambda t$.
We can reverse the characterization in Proposition 5.3 to construct a Poisson process. We
do one step of the construction, leaving the rest as Exercise 5.4.
Let $U_1,U_2,\dots$ be independent exponential random variables with parameter $\lambda$ and let $T_j=\sum_{i=1}^jU_i$. Define
$$X_t(\omega)=k\quad\text{if }T_k(\omega)\le t<T_{k+1}(\omega).\tag{5.2}$$
An examination of the densities shows that an exponential random variable has a gamma distribution with parameters $\lambda$ and $r=1$, so by Proposition A.49, $T_j$ is a gamma random variable with parameters $\lambda$ and $j$. Thus
$$P(X_t<k)=P(T_k>t)=\int_t^\infty\frac{\lambda e^{-\lambda x}(\lambda x)^{k-1}}{\Gamma(k)}\,dx.$$
Performing the integration by parts repeatedly shows that
$$P(X_t<k)=\sum_{i=0}^{k-1}e^{-\lambda t}\frac{(\lambda t)^i}{i!},$$
and so $X_t$ is a Poisson random variable with parameter $\lambda t$.

We will use the following proposition later.

Proposition 5.4 Let $\{\mathcal F_t\}$ be a filtration satisfying the usual conditions. Suppose $X_0=0$, a.s., $X$ has paths that are right continuous with left limits, $X_t-X_s$ is independent of $\mathcal F_s$ if $s<t$, and $X_t-X_s$ has the same law as $X_{t-s}$ whenever $s<t$. If the paths of $X$ are piecewise constant, increasing, all the jumps of $X$ are of size 1, and $X$ is not identically 0, then $X$ is a Poisson process.

Proof Let $T_0=0$ and $T_{i+1}=\inf\{t>T_i: \Delta X_t=1\}$, $i=0,1,2,\dots$. We will show that if we set $U_i=T_i-T_{i-1}$, then the $U_i$ are i.i.d. exponential random variables and then appeal to Exercise 5.4.
By Corollary 4.3, the $U_i$ are independent and have the same law. Hence it suffices to show $U_1$ is an exponential random variable. We observe
$$\begin{aligned}P(U_1>s+t)&=P(X_{s+t}=0)=P(X_{s+t}-X_s=0,\ X_s=0)\\ &=P(X_{t+s}-X_s=0)\,P(X_s=0)=P(X_t=0)\,P(X_s=0)\\ &=P(U_1>t)\,P(U_1>s).\end{aligned}$$
Setting $f(t)=P(U_1>t)$, we thus have $f(t+s)=f(t)f(s)$. Since $f(t)$ is decreasing and $0<f(t)<1$, we conclude $P(U_1>t)=f(t)=e^{-\lambda t}$ for some $\lambda>0$, or $U_1$ is an exponential random variable.
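Construction (5.2) translates directly into a simulation. The Python sketch below is a minimal illustration; the helpers `jump_times`, `X_at`, `gamma_tail`, and `poisson_below` are hypothetical names of ours. It builds a path from i.i.d. exponential interarrival times, and separately checks with a midpoint rule that the gamma tail integral for $P(T_k>t)$ agrees with the Poisson sum obtained by repeated integration by parts. The truncation point `upper` and the step count are illustrative choices.

```python
import math
import random


def jump_times(lam, horizon, rng):
    """Jump times T_1 < T_2 < ... <= horizon, T_j = U_1 + ... + U_j, U_i ~ Exp(lam)."""
    times = []
    t = rng.expovariate(lam)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(lam)
    return times


def X_at(t, times):
    """X_t = k if T_k <= t < T_{k+1}, i.e. the number of jump times <= t, as in (5.2)."""
    return sum(1 for s in times if s <= t)


def gamma_tail(lam, k, t, upper=80.0, steps=200_000):
    """Midpoint-rule value of the integral from t to `upper` of the gamma(lam, k) density."""
    h = (upper - t) / steps
    total = 0.0
    for i in range(steps):
        x = t + (i + 0.5) * h
        total += lam * math.exp(-lam * x) * (lam * x) ** (k - 1) / math.gamma(k)
    return total * h


def poisson_below(lam, k, t):
    """sum_{i=0}^{k-1} e^{-lam t} (lam t)^i / i!, the Poisson value of P(X_t < k)."""
    return sum(math.exp(-lam * t) * (lam * t) ** i / math.factorial(i) for i in range(k))


if __name__ == "__main__":
    rng = random.Random(12345)
    lam, t = 2.0, 3.0
    samples = [X_at(t, jump_times(lam, t, rng)) for _ in range(20_000)]
    print("sample mean of X_t:", sum(samples) / len(samples), "vs lam*t =", lam * t)
    print(gamma_tail(lam, 4, 1.5), poisson_below(lam, 4, 1.5))
```

The sample mean is only a Monte Carlo check of $E X_t=\lambda t$; the integral comparison is deterministic.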
Exercises
5.1 Suppose $P_t$ is a Poisson process and we write $X_t=P_{t-}$. Is $P_1-X_{1-t}$ a Poisson process on $[0,1]$? Why or why not?

5.2 Let $P$ be a Poisson process with parameter $\lambda$. Show that
$$\lim_{n\to\infty}\sup_{t\le1}\Big|\frac{P_{nt}}{n}-\lambda t\Big|=0,\quad\text{a.s.}$$

5.3 Show that if $P^{(1)}$ and $P^{(2)}$ are independent Poisson processes with parameters $\lambda_1$ and $\lambda_2$, respectively, then $P^{(1)}_t+P^{(2)}_t$ is a Poisson process with parameter $\lambda_1+\lambda_2$.

5.4 If $X$ is defined by (5.2), show that $X$ is a Poisson process.

5.5 Let $X_t$ be a stochastic process and let $\{\mathcal F^{00}_t\}$ be the filtration generated by $X$. Suppose $X$ is a Poisson process with respect to the filtration $\{\mathcal F^{00}_t\}$. Show that $X$ is a Poisson process with respect to the minimal augmented filtration generated by $X$.
Hint: Imitate the proof of Proposition 2.5.

5.6 Suppose $P_t$ is a Poisson process and $f$ and $g$ are non-negative bounded deterministic functions with compact support. Find necessary and sufficient conditions on $f$ and $g$ so that $\int_0^\infty f(s)\,dP_s$ and $\int_0^\infty g(s)\,dP_s$ are independent.
Hint: First show that the characteristic function of $F=\int_0^\infty f(s)\,dP_s$ is
$$Ee^{iuF}=\exp\Big(\int_0^\infty\big(e^{iuf(s)}-1\big)\,ds\Big).$$

5.7 We will talk about weak convergence in general metric spaces in Chapters 30–35. This exercise is concerned with the weak convergence of real-valued random variables as defined in Section A.12.
Suppose for each $n$, $P_n$ is a Poisson random variable with parameter $\lambda_n$ and $\lambda_n\to\infty$ as $n\to\infty$. Prove that
$$\frac{P_n-\lambda_n}{\sqrt{\lambda_n}}$$
converges weakly to a normal random variable with mean zero and variance one.
Hint: Imitate the proof of Theorem A.51.

6
Construction of Brownian motion
There are several ways of constructing Brownian motion, none of them easy. Here we give
two constructions. The first is the one that Wiener used, which is based on Fourier series.
The second uses martingale techniques. A method due to Lévy can be found in Bass (1995);
see also Exercises 6.4 and 6.5. We will see several other constructions in later chapters.
6.1 Wiener’s construction
For any of the constructions of Brownian motion, the main step is to construct $W_t$ for $t\in[0,1]$. Once we have done this, we get Brownian motion for all $t$ rather easily. More specifically, suppose we have a Brownian motion $Y^{(0)}$ started at 0 on the time interval $[0,1]$. Take independent copies $Y^{(1)},Y^{(2)},\dots$, each on $[0,1]$. We have $Y^{(i)}_0=0$ for each $i$, and now to get Brownian motion started at 0, define $W_t$ to be equal to $Y^{(0)}_t$ if $t\le1$, equal to $Y^{(0)}_1+Y^{(1)}_{t-1}$ if $1<t\le2$, and more generally
$$W_t=\Big(\sum_{i=0}^{[t]-1}Y^{(i)}_1\Big)+Y^{([t])}_{t-[t]}\qquad\text{if }t\ge1,$$
where $[t]$ is the largest integer less than or equal to $t$. This will give Brownian motion started at 0 on the time interval $[0,\infty)$. Therefore the crux of the problem is to construct Brownian motion on $[0,1]$.

Because we are working with Fourier series, it is more convenient to look at Brownian motion on $[0,\pi]$; we can just disregard times between 1 and $\pi$ when we are done.

Throughout this chapter we make the supposition that we can find a countable sequence $Z_1,Z_2,\dots$ of independent and identically distributed mean zero normal random variables with variance one that are $\mathcal F$ measurable, where $(\Omega,\mathcal F,P)$ is our probability space. This is an extremely mild condition.

Theorem 6.1 There exists a process $\{W_t; 0\le t\le1\}$ that is Brownian motion.

Proof If we fix $t\in[0,\pi]$ and compute the Fourier series for the function $f(s)=s\wedge t$, it is an exercise in calculus to get the Fourier coefficients. We end up with
$$s\wedge t=\frac{st}{\pi}+\frac{2}{\pi}\sum_{k=1}^\infty\frac{\sin ks\,\sin kt}{k^2}.\tag{6.1}$$
This suggests letting $Z_0,Z_1,\dots$ be i.i.d. normal random variables with mean 0 and variance 1 and setting
$$W_t=\frac{t}{\sqrt\pi}Z_0+\sum_{k=1}^\infty\Big(\sqrt{\frac2\pi}\,\frac{\sin kt}{k}\Big)Z_k.\tag{6.2}$$
Assuming there is no problem with convergence, we see that $W_t$ has mean zero, since each of the $Z_i$ does, and that
$$E[W_sW_t]=\frac{st}{\pi}+\sum_{k=1}^\infty\frac2\pi\,\frac{\sin ks\,\sin kt}{k^2}=s\wedge t\tag{6.3}$$
as required. We used the independence of the $Z_i$ here to show that $E[Z_iZ_j]=0$ if $i\ne j$.

We argue that there is in fact no difficulty with the convergence. Note that $\sum_{k=1}^m\sin^2kt/k^2$ increases as $m$ increases to a finite limit. Therefore
$$E\Big[\Big(\sum_{k=m}^nZ_k\frac{\sin kt}{k}\Big)^2\Big]=\sum_{k=m}^n\frac{\sin^2kt}{k^2}\to0$$
as $m,n\to\infty$. This means that the sum on the right of (6.2) is a Cauchy sequence in $L^2$. By the completeness of $L^2$, the sum on the right of (6.2) converges in $L^2$. A use of the Cauchy–Schwarz inequality allows us to justify the formula for the expectation of $W_sW_t$.
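Identity (6.1) and the truncated series in (6.2) are easy to examine numerically. The Python sketch below (function names and the truncation levels are illustrative choices of ours) evaluates partial sums of the right side of (6.1), which are exactly the covariances of the series truncated after `terms` terms, and draws one sample path of that truncated series on a grid in $[0,\pi]$. The tail of (6.1) after $j$ terms is at most $(2/\pi)\sum_{k>j}k^{-2}\le2/(\pi j)$, which is the accuracy the covariance check can be expected to reach.

```python
import math
import random


def covariance_partial_sum(s, t, terms):
    """st/pi + (2/pi) * sum_{k<=terms} sin(ks) sin(kt) / k^2, cf. (6.1) and (6.3)."""
    total = s * t / math.pi
    for k in range(1, terms + 1):
        total += (2 / math.pi) * math.sin(k * s) * math.sin(k * t) / k ** 2
    return total


def wiener_partial_path(grid, terms, rng):
    """Sample t Z_0/sqrt(pi) + sum_{k<=terms} sqrt(2/pi) sin(kt)/k Z_k on a grid."""
    z = [rng.gauss(0.0, 1.0) for _ in range(terms + 1)]  # Z_0, ..., Z_terms
    path = []
    for t in grid:
        w = t * z[0] / math.sqrt(math.pi)
        for k in range(1, terms + 1):
            w += math.sqrt(2 / math.pi) * math.sin(k * t) / k * z[k]
        path.append(w)
    return path


if __name__ == "__main__":
    for s, t in [(0.3, 0.7), (1.0, 2.5), (2.0, 2.0)]:
        print(s, t, covariance_partial_sum(s, t, 100_000), min(s, t))
    rng = random.Random(0)
    grid = [i * math.pi / 200 for i in range(201)]
    path = wiener_partial_path(grid, 500, rng)
    print(path[0], path[-1])
```

The same random coefficients are reused at every grid point, which is what makes the output a single path rather than independent samples.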
If we let
$$W^j_t=\frac{t}{\sqrt\pi}Z_0+\sum_{k=1}^j\Big(\sqrt{\frac2\pi}\,\frac{\sin kt}{k}\Big)Z_k,$$
then $(W^j_{t_1},\dots,W^j_{t_m})$ is a jointly normal collection of random variables for each $j$ whenever $t_1,\dots,t_m\in[0,\pi]$. By Remark A.56, it follows that $(W_{t_1},\dots,W_{t_m})$ is a jointly normal collection of random variables. Therefore $W_t$ is a Gaussian process. Since each $W_t$ has mean zero and $\mathrm{Cov}(W_s,W_t)=s\wedge t$, $W_t$ has the correct finite-dimensional distributions to be a Brownian motion.

The only part remaining to the construction is to show that $W_t$ as constructed above has continuous paths, for we can then use Theorem 2.4. In what follows, pay attention to where the absolute values are placed. If one is cavalier about placing them, one will very likely run into trouble. Define
$$S_m(t)=\sum_{k=m}^{2m-1}\frac{\sin kt}{k}Z_k$$
and let $T_m=\sup_{0\le t\le\pi}|S_m(t)|$. We write
$$W_t=\frac{t}{\sqrt\pi}Z_0+\sqrt{\frac2\pi}\sum_{n=0}^\infty S_{2^n}(t).$$
We will show
$$E\,T_m^2\le\frac{c}{m^{1/2}}.\tag{6.4}$$
Once we have this, then by the Fubini theorem and then Jensen's inequality,
$$E\sum_{n=0}^\infty T_{2^n}=\sum_{n=0}^\infty E\,T_{2^n}\le\sum_{n=0}^\infty\big(E[T_{2^n}^2]\big)^{1/2}<\infty.$$
Therefore $\sum_{n=0}^\infty T_{2^n}<\infty$, a.s., and by the Weierstrass M-test (see, e.g., Rudin, 1976), with probability 1, $\sum_{n=0}^\infty S_{2^n}(t)$ converges uniformly in $t$. Since each $S_{2^n}(t)$ is a continuous function of $t$, the uniform limit is also continuous and we are done.

We therefore have to prove (6.4). Using $|\sum_ka_k|^2=\sum_{j,k}a_k\overline{a_j}$ for $a_k$ complex valued, we have
$$\begin{aligned}T_m^2&\le\sup_{0\le t\le\pi}\Big|\sum_{k=m}^{2m-1}\frac{e^{ikt}}{k}Z_k\Big|^2\le\sup_{0\le t\le\pi}\Big|\sum_{j,k=m}^{2m-1}\frac{e^{ikt}e^{-ijt}}{jk}Z_jZ_k\Big|\\ &\le\sum_{k=m}^{2m-1}\frac{1}{k^2}Z_k^2+2\sup_{0\le t\le\pi}\Big|\sum_{\ell=1}^{m-1}\sum_{j=m}^{2m-\ell-1}\frac{e^{i\ell t}}{j(j+\ell)}Z_jZ_{j+\ell}\Big|\\ &\le\sum_{k=m}^{2m-1}\frac{1}{k^2}Z_k^2+2\sum_{\ell=1}^{m-1}\Big|\sum_{j=m}^{2m-\ell-1}\frac{1}{j(j+\ell)}Z_jZ_{j+\ell}\Big|.\end{aligned}\tag{6.5}$$
In the third inequality we wrote $\sum_{j,k=m}^{2m-1}=\sum_{m\le j=k\le2m-1}+2\sum_{m\le j<k\le2m-1}$ […] $\lim_{u>t,\,u\in D,\,u\to t}W_u$. Since $E(W_u-W_t)^2=u-t\to0$ as $u\to t$, then $W'_t=W_t$, a.s., or $W'$ is a version of $W$ with paths that are right continuous with left limits. We now drop the primes. Set $W_t=W_1$ if $t\ge1$.
For any $t_0\in[0,1]$, $W_{t+t_0}-W_{t_0}$ is also a martingale, and by Jensen's inequality for conditional expectations (Proposition A.21), $|W_{t+t_0}-W_{t_0}|^4$ is a submartingale. Using Doob's inequalities (Theorem 3.6), if $\lambda>0$ and $t_0,\delta\in[0,1]$,
$$P\Big(\sup_{t_0\le t\le t_0+\delta}|W_t-W_{t_0}|\ge\lambda\Big)=P\Big(\sup_{t_0\le t\le t_0+\delta}|W_t-W_{t_0}|^4\ge\lambda^4\Big)\le c\,\frac{E|W_{t_0+\delta}-W_{t_0}|^4}{\lambda^4}.$$
Since $W_{t_0+\delta}-W_{t_0}$ is a mean zero normal random variable with variance $\delta$ if $t_0+\delta\le1$, we have
$$P\Big(\sup_{t_0\le t\le t_0+\delta}|W_t-W_{t_0}|\ge\lambda\Big)\le c\,\frac{\delta^2}{\lambda^4}.\tag{6.7}$$

Let
$$A_n=\Big\{\exists k\le2^n: \sup_{k/2^n\le t\le(k+2)/2^n}|W_t-W_{k/2^n}|>2^{-n/8}\Big\}.$$
From (6.7) with $\delta=2^{-n+1}$ and $\lambda=2^{-n/8}$,
$$P(A_n)\le2^n\max_{k\le2^n}P\Big(\sup_{k/2^n\le t\le(k+2)/2^n}|W_t-W_{k/2^n}|>2^{-n/8}\Big)\le c\,2^n\,\frac{2^{-2n}}{2^{-n/2}}=c\,2^{-n/2},$$
which is summable. By the Borel–Cantelli lemma, $P(A_n\text{ i.o.})=0$. (The event $(A_n\text{ i.o.})$ is the event where $\omega$ is in infinitely many of the $A_n$.)
Except for a set of $\omega$'s in a null set, there exists a positive integer $N$ (which will depend on $\omega$) such that if $n\ge N$, then $\omega\notin A_n$. Given $\varepsilon>0$, take $n\ge N$ such that $2^{-n/8}<\varepsilon/2$. If $|t-s|\le2^{-n}$ with $s,t\in[0,1]$, then $s,t\in[k/2^n,(k+2)/2^n]$ for some $k\le2^n$. Since $\omega\notin A_n$,
$$|W_t-W_s|\le|W_t-W_{k/2^n}|+|W_s-W_{k/2^n}|\le2\cdot2^{-n/8}<\varepsilon.$$
This proves the continuity of $W_t$.

There is nothing special about the trigonometric polynomials in this second construction. Let $\langle f,g\rangle=\int_0^1f(r)g(r)\,dr$ be the inner product for the Hilbert space $L^2[0,1]$; we consider only real-valued functions for simplicity. Let $\{\varphi_n\}$ be a complete orthonormal system for $L^2[0,1]$: we have $\langle\varphi_m,\varphi_n\rangle=0$ if $m\ne n$, $\langle\varphi_n,\varphi_n\rangle=1$ for each $n$, and $f=0$, a.e., if $\langle f,\varphi_n\rangle=0$ for all $n$. One property of a complete orthonormal system is Parseval's identity, which says that
$$\langle f,f\rangle=\sum_{n=1}^\infty|\langle f,\varphi_n\rangle|^2;$$
see Folland (1999). If we replace $f$ by $g$ and then by $f+g$ and use $\langle f,g\rangle=\frac12\big[\langle f+g,f+g\rangle-\langle f,f\rangle-\langle g,g\rangle\big]$, we obtain
$$\langle f,g\rangle=\sum_{n=1}^\infty\langle f,\varphi_n\rangle\langle g,\varphi_n\rangle.$$
Now let $a_n(t)=\langle1_{[0,t]},\varphi_n\rangle=\int_0^t\varphi_n(r)\,dr$. If $Z_1,Z_2,\dots$ are independent mean zero normal random variables with variance one, let
$$W_t=\sum_{n=1}^\infty a_n(t)Z_n.\tag{6.8}$$
Assuming there is no difficulty with the convergence, we have
$$\mathrm{Cov}(W_s,W_t)=\sum_{n=1}^\infty a_n(s)a_n(t)=\sum_{n=1}^\infty\langle1_{[0,s]},\varphi_n\rangle\langle1_{[0,t]},\varphi_n\rangle=\langle1_{[0,s]},1_{[0,t]}\rangle=s\wedge t.$$
Exercise 6.2 asks you to verify that the process $W$ defined by (6.8) is a mean zero Gaussian process on $[0,1]$ with the same covariances as a Brownian motion.

Exercises

6.1 Let $Z_0,Z_1,Z_2,\dots$ be a sequence of independent identically distributed mean zero normal random variables with variance one. Define
$$X_t=\frac{t^2}{2\sqrt\pi}Z_0+\sum_{k=1}^\infty\Big(\sqrt{\frac2\pi}\,\frac{\cos kt}{k^2}\Big)Z_k.\tag{6.9}$$
(1) Show that the convergence in (6.9) is absolute and uniform over $t\in[0,1]$.
(2) Show that $X_t$ is a Gaussian process.
(3) If $W_t$ is a Brownian motion and $Y_t=\int_0^tW_r\,dr$, $t\in[0,1]$, show that $X$ and $Y$ have the same finite-dimensional distributions. Show that $X$ and $Y$ have the same law when viewed as random variables taking values in $C[0,1]$.
(The process $X$ is sometimes known as integrated Brownian motion.)
(4) Find $\mathrm{Cov}(X_s,X_t)$.

6.2 Let $\{\varphi_n\}$ be a complete orthonormal system for $L^2[0,1]$. Show that the sum (6.8) converges in $L^2$ and give the details of the proof that the resulting process $W$ is a mean zero Gaussian process with $\mathrm{Cov}(W_s,W_t)=s\wedge t$ if $s,t\in[0,1]$.

6.3 Let $D=\{k/2^n: n\ge1,\ k=0,1,\dots,2^n\}$ be the dyadic rationals. Suppose the collection of random variables $\{V_t: t\in D\}$ is jointly normal, each $V_t$ has mean zero, and $\mathrm{Cov}(V_s,V_t)=s\wedge t$.
(1) Prove that the paths of $V$ are uniformly continuous over $t\in D$.
(2) If we define $W_t=\lim_{s\in D,\,s\to t}V_s$, prove that $W$ is a Brownian motion.

6.4 In this and the next exercise we give the Haar function construction of Brownian motion. Let $\varphi_{00}=1$ on $[0,1]$ and for $i=1,2,\dots$, and $1\le j\le2^{i-1}$, set
$$\varphi_{ij}(x)=\begin{cases}2^{(i-1)/2}, & (2j-2)/2^i\le x<(2j-1)/2^i,\\ -2^{(i-1)/2}, & (2j-1)/2^i\le x<2j/2^i,\\ 0, & \text{otherwise.}\end{cases}$$
It is a well-known and easily proved result from analysis (see, e.g., Bass (1995), Section I.2) that the collection $\{\varphi_{ij}\}$ is a complete orthonormal system for $L^2[0,1]$. For each $i,j$, define
$$\psi_{ij}(t)=\int_0^t\varphi_{ij}(s)\,ds,$$
for each $i$ and $j$, let $Y_{ij}$ be independent mean zero normal random variables with variance one, and let
$$V_i(t)=\sum_{j=1}^{2^{i-1}}Y_{ij}\psi_{ij}(t)$$
for $i\ge1$. Set $V_0=Y_{00}\psi_{00}$.
(1) Fix $i\ge1$. Prove that each $\psi_{ij}$ is bounded by $2^{(-i-1)/2}$. Prove that the sets $\{t: \psi_{ij}(t)>0\}$, $j=1,\dots,2^{i-1}$, are disjoint.
(2) Fix $i\ge1$. Write
$$P\big(\exists t\in[0,1]: |V_i(t)|>i^{-2}\big)\le P\big(\exists j\le2^{i-1}: |Y_{ij}|\,2^{(-i-1)/2}>i^{-2}\big),$$
use Proposition A.52 to estimate this, and conclude that
$$\sum_{i=1}^\infty P\Big(\sup_{0\le t\le1}|V_i(t)|>i^{-2}\Big)<\infty.\tag{6.10}$$

6.5 This is a continuation of Exercise 6.4. With $\varphi_{ij}$, $\psi_{ij}$, $Y_{ij}$, and $V_i$ as in that problem, let $W_t=\sum_{i=0}^\infty V_i(t)$.
(1) Prove that $W$ is a jointly normal Gaussian process with mean zero and $\mathrm{Cov}(W_s,W_t)=s\wedge t$.
(2) Use (6.10) and the Borel–Cantelli lemma to show that $\sum_{i=1}^n|V_i(t)|$ converges uniformly over $[0,1]$. Conclude that $W$ is a Brownian motion.

7
Path properties of Brownian motion

The paths of Brownian motion are continuous, but we will see that they are not differentiable. How continuous are they? We will see that the paths satisfy what is known as a Hölder continuity condition. A precise description of the oscillatory behavior of Brownian motion will be given by the law of the iterated logarithm.

A function $f:[0,1]\to\mathbb R$ is said to be Hölder continuous of order $\alpha$ if there exists a constant $M$ such that
$$|f(t)-f(s)|\le M|t-s|^\alpha,\qquad s,t\in[0,1].\tag{7.1}$$
We show that the paths of Brownian motion are Hölder continuous of order $\alpha$ if $\alpha<\frac12$. (They are not Hölder continuous of order $\alpha$ if $\alpha\ge\frac12$; we will see this from the law of the iterated logarithm.)

Theorem 7.1 If $\alpha<\frac12$, the paths of Brownian motion are Hölder continuous of order $\alpha$ on $[0,1]$.

Proof Step 1. First we apply the Borel–Cantelli lemma to a certain sequence of sets. Let $W$ be a Brownian motion and set
$$A_n=\Big\{\exists k\le2^n-1: \sup_{k/2^n\le t\le(k+1)/2^n}|W_t-W_{k/2^n}|>2^{-n\alpha}\Big\}.$$
Since $W_{t+k/2^n}-W_{k/2^n}$ is a Brownian motion,
$$\begin{aligned}P(A_n)&\le2^n\sup_{k\le2^n}P\Big(\sup_{t\le1/2^n}|W_{t+k/2^n}-W_{k/2^n}|>2^{-n\alpha}\Big)\\ &\le2^nP\Big(\sup_{t\le1/2^n}|W_t|>2^{-n\alpha}\Big)\\ &\le2\cdot2^n\exp\big(-2^{-2n\alpha}/(2\cdot2^{-n})\big).\end{aligned}\tag{7.2}$$
Here we used Proposition 3.15. Since $\alpha<\frac12$, then $2^{n(1-2\alpha)}>2n$ for $n$ large, and the last line of (7.2) is less than
$$2^{n+1}\exp\big(-2^{n(1-2\alpha)}/2\big)\le2^{n+1}e^{-n}$$
if $n$ is large. Hence $\sum P(A_n)<\infty$, and $P(A_n\text{ i.o.})=0$ by the Borel–Cantelli lemma.

Step 2. Next we show that this implies the Hölder continuity. For almost every $\omega$ there exists $N$ (depending on $\omega$) such that if $n\ge N$, then $\omega\notin A_n$. Let $s\le t$ be two points in $[0,1]$. If $2^{-(n+2)}\le t-s\le2^{-(n+1)}$ for some $n\ge N$ and $k$ is the largest integer such that $k/2^{n+2}\le s$, then
$$\begin{aligned}|W_t-W_s|&\le|W_t-W_{t\wedge((k+1)/2^{n+2})}|+|W_{t\wedge((k+1)/2^{n+2})}-W_{k/2^{n+2}}|+|W_s-W_{k/2^{n+2}}|\\ &\le3\cdot2^{-n\alpha}\le3\cdot4^\alpha|t-s|^\alpha.\end{aligned}$$
We know $|W_t(\omega)|$ is bounded on $[0,1]$ since the paths are continuous; let $K$ (depending on $\omega$) be the bound. If $|t-s|\ge2^{-(N+1)}$, then
$$|W_t-W_s|\le2K\le(2K)(2^{N+1})|t-s|\le(2K)(2^{N+1})|t-s|^\alpha.$$
Thus, no matter whether $|t-s|$ is small or large, there exists $L$ (depending on $\omega$) such that $|W_t(\omega)-W_s(\omega)|\le L|t-s|^\alpha$ for all $s,t\in[0,1]$.

One of the most beautiful theorems in probability theory is the law of the iterated logarithm (LIL). It describes precisely how Brownian motion oscillates.

Theorem 7.2 Let $W$ be a Brownian motion. We have
$$\limsup_{t\to\infty}\frac{|W_t|}{\sqrt{2t\log\log t}}=1,\quad\text{a.s.}$$
and
$$\limsup_{t\to0}\frac{|W_t|}{\sqrt{2t\log\log(1/t)}}=1,\quad\text{a.s.}$$

Proof The second assertion follows from the first by time inversion; see Exercise 2.5. Thus we only need to prove the first assertion.

Proof of upper bound: We use the Borel–Cantelli lemma. Let $\varepsilon>0$ and then choose $q$ larger than 1 but close enough to 1 so that $(1+\varepsilon)^2/q>1$. Let
$$A_n=\Big(\sup_{s\le q^n}|W_s|>(1+\varepsilon)\sqrt{2q^{n-1}\log\log q^{n-1}}\Big).$$
By Proposition 3.15,
$$P(A_n)\le2\exp\Big(-\frac{(1+\varepsilon)^2\,2q^{n-1}\log\log q^{n-1}}{2q^n}\Big)=2\exp\Big(-\frac{(1+\varepsilon)^2}{q}\big(\log(n-1)+\log\log q\big)\Big)=\frac{c}{(n-1)^{(1+\varepsilon)^2/q}},$$
where we are using our convention that the letter $c$ denotes a constant whose exact value is unimportant. This is summable in $n$, so $\sum P(A_n)<\infty$. By the Borel–Cantelli lemma, $P(A_n\text{ i.o.})=0$. Hence, except for a null set, there exists $N=N(\omega)$ such that $\omega\notin A_n$ if $n\ge N(\omega)$. If $t\ge q^N$, then for some $n\ge N+1$ we have $q^{n-1}\le t\le q^n$, and
$$|W_t|\le\sup_{s\le q^n}|W_s|\le(1+\varepsilon)\sqrt{2q^{n-1}\log\log q^{n-1}}\le(1+\varepsilon)\sqrt{2t\log\log t}.$$
Therefore
$$\limsup_{t\to\infty}\frac{|W_t|}{\sqrt{2t\log\log t}}\le1+\varepsilon,\quad\text{a.s.}\tag{7.3}$$
Since $\varepsilon>0$ is arbitrary, the upper bound is proved.
Proof of lower bound: We start with the second half of the Borel–Cantelli lemma. Let $\varepsilon>0$ and then take $q>1$ very large so that
$$\frac{(1-\varepsilon)^2(1+\varepsilon)}{1-q^{-1}}<1\qquad\text{and}\qquad\frac{2}{\sqrt q}<\frac{\varepsilon}{2}.$$
This is possible because $(1-\varepsilon)^2(1+\varepsilon)=(1-\varepsilon^2)(1-\varepsilon)<1$. Let
$$B_n=\Big(W_{q^{n+1}}-W_{q^n}>(1-\varepsilon)\sqrt{2q^{n+1}\log\log q^{n+1}}\Big).$$
Since Brownian motion has independent increments, the events $B_n$ are independent. Let
$$Z=\frac{W_{q^{n+1}}-W_{q^n}}{\sqrt{q^{n+1}-q^n}}.$$
Then $Z$ is a mean zero normal random variable with variance one. By Proposition A.52, we see that
$$\begin{aligned}P(B_n)&=P\Big(Z>(1-\varepsilon)\sqrt{2q^{n+1}\log\log q^{n+1}}\Big/\sqrt{q^{n+1}-q^n}\Big)\\ &\ge\exp\Big(-\frac{(1-\varepsilon)^2(1+\varepsilon)\,2q^{n+1}\log\log q^{n+1}}{2(q^{n+1}-q^n)}\Big)\\ &=c\exp\Big(-\frac{(1-\varepsilon)^2(1+\varepsilon)\big(\log(n+1)+\log\log q\big)}{1-q^{-1}}\Big)\end{aligned}$$
for $n$ large. Hence
$$\sum_nP(B_n)\ge c\sum_n\frac{1}{(n+1)^{(1-\varepsilon)^2(1+\varepsilon)/(1-q^{-1})}}=\infty.$$
By the Borel–Cantelli lemma, with probability one, $\omega$ is in infinitely many $B_n$. Consequently, with probability one, infinitely often
$$W_{q^{n+1}}-W_{q^n}>(1-\varepsilon)\sqrt{2q^{n+1}\log\log q^{n+1}}.\tag{7.4}$$
The inequality (7.4) is not exactly what we want, as we want a lower bound for $W_{q^{n+1}}$, but we can derive the desired lower bound by using the upper bound proved above. We know from (7.3) that for $n$ large enough,
$$|W_{q^n}|\le2\sqrt{2q^n\log\log q^n}\le\frac{2}{\sqrt q}\sqrt{2q^{n+1}\log\log q^{n+1}}<\frac{\varepsilon}{2}\sqrt{2q^{n+1}\log\log q^{n+1}}.$$
Thus infinitely often
$$W_{q^{n+1}}>(1-3\varepsilon/2)\sqrt{2q^{n+1}\log\log q^{n+1}}.$$
This proves
$$\limsup_{n\to\infty}\frac{W_{q^{n+1}}}{\sqrt{2q^{n+1}\log\log q^{n+1}}}\ge1-\frac{3\varepsilon}{2},\quad\text{a.s.}$$
Since $\varepsilon$ is arbitrary, the lower bound follows.
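The proof only looks at $W$ along the times $q^n$, and along those times $W$ can be sampled exactly, because the increments $W_{q^{n+1}}-W_{q^n}$ are independent normals with variance $q^{n+1}-q^n$. Writing $Y_n=W_{q^n}/\sqrt{q^n}$, this becomes $Y_{n+1}=Y_n/\sqrt q+\sqrt{1-1/q}\,Z_{n+1}$ with $Z_{n+1}$ standard normal, which avoids overflow for large $n$. The Python sketch below tracks the ratio $W_{q^n}/\sqrt{2q^n\log\log q^n}$; the name `lil_ratios`, the value $q=10$, the seed, and the number of steps are illustrative assumptions, not from the text. It is only a finite-sample illustration: convergence in the law of the iterated logarithm is very slow, so the running maximum over a late block of indices should be expected to be of order 1, not equal to 1.

```python
import math
import random


def lil_ratios(q, n_max, rng):
    """Return [(n, W_{q^n} / sqrt(2 q^n log log q^n))] for n = 1..n_max.

    Uses Y_n = W_{q^n} / sqrt(q^n), Y_0 = W_1 ~ N(0, 1), and the exact recursion
    Y_{n+1} = Y_n / sqrt(q) + sqrt(1 - 1/q) Z_{n+1} (independent increments).
    Requires q > e so that log log q^n = log(n log q) > 0 for every n >= 1.
    """
    assert q > math.e
    a, b = 1 / math.sqrt(q), math.sqrt(1 - 1 / q)
    y = rng.gauss(0.0, 1.0)  # Y_0 = W_1
    out = []
    for n in range(1, n_max + 1):
        y = a * y + b * rng.gauss(0.0, 1.0)
        out.append((n, y / math.sqrt(2 * math.log(n * math.log(q)))))
    return out


if __name__ == "__main__":
    rng = random.Random(2024)
    ratios = lil_ratios(10.0, 100_000, rng)
    late = [r for n, r in ratios if n > 50_000]
    print("max ratio over 50_000 < n <= 100_000:", max(late))
```

Since $\mathrm{Var}(Y_{n+1})=\mathrm{Var}(Y_n)/q+(1-1/q)$, the recursion keeps each $Y_n$ exactly standard normal, matching the scaling used for $Z$ in the proof.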
The law of the iterated logarithm shows that the paths of $W_t$ are not differentiable at time 0, a.s. Applying this to $W_{s+t}-W_t$, we see that for each $t$, $W$ is not differentiable at time $t$, a.s. But the null set $N_t$ might depend on $t$, and it is even conceivable that $\bigcup_{t\in[0,1]}N_t$ is not a null set. We have the following stronger result, which says that except for a set of $\omega$'s that form a null set, $t\mapsto W_t(\omega)$ is a function that does not have a derivative at any time $t\in[0,1]$.
Theorem 7.3 With probability one, the paths of Brownian motion are nowhere differentiable.

Proof Note that if $Z$ is a normal random variable with mean 0 and variance 1, then
$$P(|Z|\le r)=\frac{1}{\sqrt{2\pi}}\int_{-r}^re^{-x^2/2}\,dx\le2r.\tag{7.5}$$
Let $M,h>0$ and let
$$A_{M,h}=\{\exists s\in[0,1]: |W_t-W_s|\le M|t-s|\text{ if }|t-s|\le h\},$$
$$B_n=\{\exists k\le2n: |W_{k/n}-W_{(k-1)/n}|\le4M/n,\ |W_{(k+1)/n}-W_{k/n}|\le4M/n,\ |W_{(k+2)/n}-W_{(k+1)/n}|\le4M/n\}.$$
We check that $A_{M,h}\subset B_n$ if $n\ge2/h$. To see this, if $\omega\in A_{M,h}$, there exists an $s$ such that $|W_t-W_s|\le M|t-s|$ if $|t-s|\le2/n$; let $k/n$ be the largest multiple of $1/n$ less than or equal to $s$. Then
$$|(k+2)/n-s|\le2/n\quad\text{and}\quad|(k+1)/n-s|\le2/n,$$
and therefore
$$|W_{(k+2)/n}-W_{(k+1)/n}|\le|W_{(k+2)/n}-W_s|+|W_s-W_{(k+1)/n}|\le2M/n+2M/n=4M/n.$$
Similarly $|W_{(k+1)/n}-W_{k/n}|$ and $|W_{k/n}-W_{(k-1)/n}|$ are at most $4M/n$. Using the independent increments property, the stationary increments property, and (7.5),
$$\begin{aligned}P(B_n)&\le2n\sup_{k\le2n}P\big(|W_{k/n}-W_{(k-1)/n}|\le4M/n,\ |W_{(k+1)/n}-W_{k/n}|\le4M/n,\ |W_{(k+2)/n}-W_{(k+1)/n}|\le4M/n\big)\\ &\le2nP\big(|W_{1/n}|\le4M/n,\ |W_{2/n}-W_{1/n}|\le4M/n,\ |W_{3/n}-W_{2/n}|\le4M/n\big)\\ &=2nP(|W_{1/n}|\le4M/n)\,P(|W_{2/n}-W_{1/n}|\le4M/n)\,P(|W_{3/n}-W_{2/n}|\le4M/n)\\ &=2n\big(P(|W_{1/n}|\le4M/n)\big)^3\le cn\Big(\frac{4M}{\sqrt n}\Big)^3,\end{aligned}$$
which tends to 0 as $n\to\infty$. Hence for each $M$ and $h$,
$$P(A_{M,h})\le\limsup_{n\to\infty}P(B_n)=0.$$
This implies that the probability that there exists $s\le1$ such that
$$\limsup_{h\to0}\frac{|W_{s+h}-W_s|}{|h|}\le M$$
is zero. Since $M$ is arbitrary, this proves the theorem.

Exercises

7.1 Here you are asked to find a more precise description of the modulus of continuity of Brownian paths. Prove that
$$\lim_{\delta\to0}\ \sup_{s,t\in[0,1],\,0<|t-s|<\delta}\frac{|W_t-W_s|}{\sqrt{\delta\log(1/\delta)}}<\infty,\quad\text{a.s.}$$
Hint: Imitate the proof of Theorem 7.1.

7.2 The following is part of what is known as Chung's law of the iterated logarithm. We will see in Section 40.3 that there exists $c_1$ such that
$$P\Big(\sup_{s\le t}|W_s|\le\lambda\Big)\le c_1e^{-\pi^2t/8\lambda^2}$$
for $t/\lambda^2$ sufficiently large. Prove that
$$\liminf_{t\to\infty}\frac{\sup_{s\le t}|W_s|}{\sqrt{t/\log\log t}}<\infty,\quad\text{a.s.}$$

7.3 Let $W_t$ be a one-dimensional Brownian motion. We will see in Section 40.3 that there exists $c_2$ such that
$$P\Big(\sup_{s\le t}|W_s|\le\lambda\Big)\ge c_2e^{-\pi^2t/8\lambda^2}$$
if $t/\lambda^2$ is sufficiently large. Prove that
$$\liminf_{t\to\infty}\frac{\sup_{s\le t}|W_s|}{\sqrt{t/\log\log t}}>0,\quad\text{a.s.}$$
This is the other half of Chung's law of the iterated logarithm. In fact,
$$\liminf_{t\to\infty}\frac{\sup_{s\le t}|W_s|}{\sqrt{t/\log\log t}}=c,\quad\text{a.s.}\tag{7.6}$$
Identify $c$ and prove (7.6).

7.4 A function $f$ is Hölder continuous of order $\alpha$ at a point $t$ if there exists $c$ such that $|f(u)-f(t)|\le c|u-t|^\alpha$ for all $u$. Suppose $\alpha>1/2$ and $W_t$ is a Brownian motion. Show that the event
$$A=\{\exists t\in[0,1]: W\text{ is Hölder continuous of order }\alpha\text{ at }t\}$$
has probability 0.
Hint: Imitate the proof of nowhere differentiability, but use more than three time intervals.

7.5 Let W be a one-dimensional Brownian motion and let Mt = sups≤t Ws (with no absolute value
signs). Prove that if ε > 0, then
lim inf
t→∞
Mt√
t/(log t)1+ε
> 0, a.s.
7.6 This is a complement to Exercise 4.10. Prove that if p > 2 and W is a Brownian motion, then
the p-variation of W , defined in Exercise 4.10, is finite, a.s.
Hint: Use the fact that the paths of Brownian motion are Hölder continuous of order α
if α < 1/2. 7.7 Let W be a Brownian motion and let Z be the zero set: Z = {t ∈ [0, 1] : Wt = 0}. (1) Show there exists a constant c not depending on x or δ such that P(∃s ≤ δ : Ws = −x) ≤ P(sup s≤δ |Ws| ≥ |x|) ≤ ce−x2/2δ. (2) Use the Markov property of Brownian motion to show that there exists a constant c not depending on s or t such that P(Z ∩ [s, t] �= ∅) ≤ c ( 1 ∧ √ t − s s ) . 7.8 Given a Borel measurable subset A of [0, 1], define Hγ (A) = lim sup δ→0 [ inf { ∞∑ i=1 [bi − ai]γ : A ⊂ ∪∞i=1[ai, bi], sup i |bi − ai| ≤ δ }] . In other words, cover A by the union of intervals [ai, bi] and define the analog of Lebesgue measure. The differences are that we look at |bi − ai|γ but do not require that γ be one, and we require that none of the intervals be longer than δ. The quantity Hγ (A) is called the Hausdorff measure of A with respect to the function xγ . The Hausdorff dimension of a set A is defined to be inf{γ : Hγ (A) > 0} = sup{γ : Hγ (A) = ∞}.
(For subsets of $\mathbb R^d$, we replace the intervals $[a_i, b_i]$ by balls of radius $r_i$.) As a warm-up to this exercise, prove that the Hausdorff dimension of the standard Cantor set in [0,1] is log 2/log 3.
The purpose of this exercise is to show that if W is a Brownian motion and $Z = \{t \in [0,1] : W_t = 0\}$ is the zero set, then the Hausdorff dimension of Z is no more than 1/2.
(1) For each n, let $C_n$ be the collection of intervals $[i/2^n, (i+1)/2^n]$ contained in [0,1] that intersect Z. ($C_n$ is random.) If $\#C_n$ is the cardinality of $C_n$, use Exercise 7.7 to show
$$E[\#C_n] \le \sum_{i=0}^{2^n-1} P\big(Z \cap [i/2^n,(i+1)/2^n] \ne \emptyset\big) \le c2^{n/2}.$$
(2) Write
$$\sum_{[i/2^n,(i+1)/2^n]\in C_n} |2^{-n}|^\gamma = 2^{-n\gamma}\,\#C_n.$$
Use the Chebyshev inequality and (1) to conclude that the Hausdorff dimension of Z is less than or equal to 1/2, a.s. (We will show that it is at least 1/2 in Exercise 14.10.)
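As a numerical aside (not a proof), the sketch below computes the covering sums for the natural level-n cover of the Cantor set, which equal $2^n3^{-n\gamma}$ and stay at 1 exactly when γ = log 2/log 3, and evaluates the deterministic sum that appears in part (1) with the unspecified constant c of Exercise 7.7 set to 1. The function names are illustrative; a genuine Hausdorff-measure computation would need the infimum over all covers, which this does not attempt.

```python
import math
from fractions import Fraction


def cantor_intervals(level):
    """Intervals of the level-`level` stage of the standard Cantor set."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(level):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))   # keep the left third
            refined.append((b - third, b))   # keep the right third
        intervals = refined
    return intervals


def cover_sum(intervals, gamma):
    """Sum of |b_i - a_i|**gamma over a cover by intervals [a_i, b_i]."""
    return sum(float(b - a) ** gamma for a, b in intervals)


def zero_set_count_bound(n):
    """sum_{i=0}^{2**n - 1} min(1, 1/sqrt(i)), the i = 0 term read as 1:
    the shape of the bound on E[#C_n] in part (1), with c set to 1."""
    return 1 + sum(min(1.0, 1 / math.sqrt(i)) for i in range(1, 2 ** n))


if __name__ == "__main__":
    critical = math.log(2) / math.log(3)
    for gamma in (0.5, critical, 0.7):
        print(round(gamma, 4), cover_sum(cantor_intervals(10), gamma))
    for n in (4, 8, 12):
        print(n, zero_set_count_bound(n) / 2 ** (n / 2))
```

The ratio printed in the second loop stays bounded, which is the content of the estimate $E[\#C_n] \le c2^{n/2}$.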

8
The continuity of paths
It is often important to know whether a stochastic process has continuous paths. An important sufficient condition is the Kolmogorov continuity criterion. This criterion is also useful in showing the continuity of a family of random variables $X^a$ in the variable a, where a is a parameter other than time. Kolmogorov's continuity criterion is part (2) of Theorem 8.1.
Let $D_n = \{k/2^n : 0 \le k \le 2^n\}$ and let $D = \bigcup_n D_n$. The set D is known as the set of dyadic rationals in [0,1]. We will use
$$\sum_{i=1}^\infty i^{-2} \le 1 + \int_1^\infty x^{-2}\,dx = 2.$$
(In fact, by a standard exercise using Parseval's identity in the theory of Fourier series, $\sum_{i=1}^\infty i^{-2}$ is actually equal to $\pi^2/6$.)
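A quick numerical check of the comparison $\sum_{i\le n} i^{-2} \le 1 + \int_1^n x^{-2}\,dx = 2 - 1/n \le 2$ and of the value $\pi^2/6$; this is an illustration using only the Python standard library, with hypothetical helper names, and is not part of the argument.

```python
import math


def partial_sum(n):
    """sum_{i=1}^{n} i**-2."""
    return sum(1.0 / i ** 2 for i in range(1, n + 1))


def integral_comparison_bound(n):
    """1 + int_1^n x**-2 dx = 2 - 1/n.  Each term i**-2 with i >= 2 is at most
    int_{i-1}^{i} x**-2 dx, so this dominates partial_sum(n) and stays below 2."""
    return 2.0 - 1.0 / n


if __name__ == "__main__":
    for n in (1, 10, 1000, 100000):
        print(n, partial_sum(n), integral_comparison_bound(n), math.pi ** 2 / 6)
```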
We will be considering at first a real-valued process $\{X_t : t \in D\}$. To show continuity by considering $X_t - X_s$ for all pairs (s,t) doesn't work: there are too many pairs. Kolmogorov's proof circumvents this problem by considering only a restricted collection of pairs. To bound $X_{15/32} - X_{11/32}$, for example, we compare $X_{15/32}$ to $X_{7/16}$, compare $X_{7/16}$ to $X_{3/8}$, and compare $X_{3/8}$ to $X_{1/4}$, and we also compare $X_{11/32}$ to $X_{5/16}$ and compare $X_{5/16}$ to $X_{1/4}$. The advantage of this complicated way of matching pairs is that each comparison, for example $X_{3/8}$ to $X_{1/4}$, is used for a great many of the possible pairs (s,t).
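The pairs listed above can be reproduced by rounding a dyadic rational to successively coarser dyadic levels, each point being rounded from the previous one. The sketch below assumes that ties are broken downward, which is one convention that yields exactly the comparisons in the example; the helper names are hypothetical. Because both chains end at 1/4, the differences along them telescope to $X_{15/32} - X_{11/32}$.

```python
import math
from fractions import Fraction


def round_to_level(t, j):
    """Closest multiple of 2**-j to t; ties are broken downward
    (an assumed convention that reproduces the pairs in the text)."""
    return Fraction(math.ceil(t * 2 ** j - Fraction(1, 2)), 2 ** j)


def comparison_chain(t, coarsest):
    """Round the dyadic rational t to successively coarser levels, each point
    rounded from the previous one, stopping at level `coarsest`.
    Returns the list of pairs (finer point, coarser point) that differ."""
    level = t.denominator.bit_length() - 1   # t = k / 2**level in lowest terms
    pairs = []
    current = t
    for j in range(level - 1, coarsest - 1, -1):
        coarser = round_to_level(current, j)
        if coarser != current:
            pairs.append((current, coarser))
        current = coarser
    return pairs


if __name__ == "__main__":
    for t in (Fraction(15, 32), Fraction(11, 32)):
        print(t, [(str(a), str(b)) for a, b in comparison_chain(t, 2)])
```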
The proof of Theorem 8.1 has three main steps. Step 1 is to reduce the problem to proving
the bound (8.3). The second step is to set up the comparisons that we need, and the third is
to obtain estimates on all the comparisons.
Theorem 8.1 Suppose $\{X_t : t \in D\}$ is a real-valued process and there exist $c_1$, ε, and p > 0 such that
$$E[|X_t - X_s|^p] \le c_1|t - s|^{1+\varepsilon}, \qquad s,t \in D. \tag{8.1}$$
Then the following hold.
(1) There exists $c_2$ depending only on $c_1$, p, and ε such that for M > 0,
$$P\Big(\sup_{s,t\in D,\ s\ne t}\frac{|X_t - X_s|}{|t - s|^{\varepsilon/(4p)}} \ge M\Big) \le c_2/M^p. \tag{8.2}$$
(2) With probability one, $X_t$ is uniformly continuous on D.
Proof Step 1. Let $\lambda_n = M2^{-(n+1)\varepsilon/(4p)}$ and
$$A_n = \{|X_t - X_s| \ge \lambda_n \text{ for some } s,t \in D \text{ with } |t - s| \le 2^{-n}\}.$$
Recall our convention that the letter c denotes unimportant constants which can change from line to line. We will show
$$P(A_n) \le c2^{-n\varepsilon/4}M^{-p}. \tag{8.3}$$
This implies (1) and (2) as follows. If $|X_t - X_s| \ge M|t - s|^{\varepsilon/(4p)}$ for some $s,t \in D$ with $s \ne t$, choose n such that $2^{-(n+1)} < |t - s| \le 2^{-n}$; then $|X_t - X_s| \ge M2^{-(n+1)\varepsilon/(4p)} = \lambda_n$, so $A_n$ holds. The event on the left-hand side of (8.2) is therefore contained in $\bigcup_n A_n$, and (8.3) shows that
$$P\Big(\bigcup_n A_n\Big) \le cM^{-p}\sum_{n=1}^\infty 2^{-n\varepsilon/4} = cM^{-p},$$
which implies (1). Let
$$B_M = \Big\{\sup_{s,t\in D,\ s\ne t}\frac{|X_t - X_s|}{|t - s|^{\varepsilon/(4p)}} \ge M\Big\}.$$
Note $B_M$ decreases as M increases, and from (1) we have $P(\bigcap_{M=1}^\infty B_M) = 0$. Thus except for an event of probability zero, each ω is in $B_M^c$ for some M (where M depends on ω), and this implies (2). Thus we must show (8.3).

Step 2. Define a(j,t) to be the integer multiple of $2^{-j}$ that is closest to t (if there are two different multiples that are equally close, we use some convention to break the tie). If $t \in D_m$, then a(m,t) = t. If $|t - s| \le 2^{-n}$, then $|a(n,t) - a(n,s)| \le 2^{-n+2}$. Now if $s,t \in D_m$ and m ≥ n, we use the triangle inequality to write
$$\begin{aligned}
|X_t - X_s| &= |X_{a(m,t)} - X_{a(m,s)}|\\
&\le |X_{a(n,t)} - X_{a(n,s)}| + |X_{a(n+1,t)} - X_{a(n,t)}| + \cdots + |X_{a(m,t)} - X_{a(m-1,t)}|\\
&\quad + |X_{a(n+1,s)} - X_{a(n,s)}| + \cdots + |X_{a(m,s)} - X_{a(m-1,s)}|.
\end{aligned}\tag{8.4}$$
If $|X_{a(n,t)} - X_{a(n,s)}| < \lambda_n/2$ and for each i
$$|X_{a(n+i+1,t)} - X_{a(n+i,t)}| < \frac{\lambda_n}{8(i+1)^2},$$
and the same holds with t replaced by s, then by (8.4)
$$|X_t - X_s| < \frac{\lambda_n}{2} + 2\sum_{i=0}^\infty \frac{\lambda_n}{8(i+1)^2} \le \lambda_n.$$
Hence if $|X_t - X_s| \ge \lambda_n$ for some $s,t \in D_m$ with $|t - s| \le 2^{-n}$, then at least one of the events E, $F_i$, or $G_i$, i ≥ 0, must hold, where
$$\begin{aligned}
E &= \{|X_{a(n,t)} - X_{a(n,s)}| \ge \lambda_n/2 \text{ for some } s,t \in D_m \text{ with } |s - t| \le 2^{-n}\},\\
F_i &= \{|X_{a(n+i+1,t)} - X_{a(n+i,t)}| \ge \lambda_n/(8(i+1)^2) \text{ for some } t\},\\
G_i &= \{|X_{a(n+i+1,s)} - X_{a(n+i,s)}| \ge \lambda_n/(8(i+1)^2) \text{ for some } s\}.
\end{aligned}$$

Step 3. For the event E to hold, we must have $|X_r - X_q| \ge \lambda_n/2$ for some $q,r \in D_n$ with $|q - r| \le 2^{-n+2}$. There are at most $c2^n$ such pairs (q,r), so the probability of E is bounded, using Chebyshev's inequality and (8.1), by
$$(c2^n)\sup_{q,r\in D_n,\ |r-q|\le 2^{-n+2}} P(|X_r - X_q| \ge \lambda_n/2) \le c2^n\sup_{q,r\in D_n,\ |r-q|\le 2^{-n+2}}\frac{E[|X_r - X_q|^p]}{(\lambda_n/2)^p} \le \frac{c2^n}{\lambda_n^p}(2^{-n+2})^{1+\varepsilon} \le \frac{c2^{-n\varepsilon}}{\lambda_n^p}.$$
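Step 2 can be checked mechanically. The sketch below uses exact rational arithmetic and hypothetical helper names, and breaks ties in a(j,t) downward as an assumed convention. It lists the signed terms whose absolute values appear in (8.4), which sum exactly to $X_t - X_s$ for any function X on $D_m$, and it computes the budget $1/2 + 2\sum_i 1/(8(i+1)^2)$ in units of $\lambda_n$, which stays below 1 because $\sum_i i^{-2} \le 2$.

```python
import math
from fractions import Fraction


def a(j, t):
    """The multiple of 2**-j closest to t, ties broken downward (one possible
    convention; Step 2 leaves the tie rule open)."""
    return Fraction(math.ceil(t * 2 ** j - Fraction(1, 2)), 2 ** j)


def chaining_terms(X, s, t, n, m):
    """Signed terms whose absolute values appear on the right of (8.4),
    for s, t in D_m and m >= n.  They sum exactly to X(t) - X(s)."""
    terms = [X(a(n, t)) - X(a(n, s))]
    for j in range(n, m):
        terms.append(X(a(j + 1, t)) - X(a(j, t)))
        terms.append(-(X(a(j + 1, s)) - X(a(j, s))))
    return terms


def chaining_budget(levels):
    """1/2 + 2 * sum_{i < levels} 1/(8 (i+1)^2): the bound on |X_t - X_s|
    from Step 2, measured in units of lambda_n."""
    return Fraction(1, 2) + 2 * sum(Fraction(1, 8 * (i + 1) ** 2) for i in range(levels))
```

Taking absolute values term by term in `chaining_terms` gives exactly the triangle-inequality bound (8.4).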
For $F_i$ to hold, that is, for $|X_{a(n+i+1,t)} - X_{a(n+i,t)}|$ to be at least $\lambda_n/(8(i+1)^2)$ for some t, we must have $|X_r - X_q| \ge \lambda_n/(8(i+1)^2)$ for some $r \in D_{n+i}$, $q \in D_{n+i+1}$ with $|r - q| \le 2^{-n-i+2}$. There are at most $c2^{n+i}$ such pairs, and so the probability of $F_i$ is bounded by
$$(c2^{n+i})\sup_{r\in D_{n+i},\ q\in D_{n+i+1},\ |r-q|\le 2^{-n-i+2}} P\Big(|X_r - X_q| \ge \frac{\lambda_n}{8(i+1)^2}\Big) \le \frac{c2^{n+i}2^{(-n-i+2)(1+\varepsilon)}(8(i+1)^2)^p}{\lambda_n^p} \le \frac{c2^{-n\varepsilon}2^{-i\varepsilon/2}}{\lambda_n^p}.$$
Here we used the fact that $2^{-i\varepsilon}(i+1)^{2p} \le c2^{-i\varepsilon/2}$ for some constant c depending on p and ε but not i. We have the same bound for $G_i$. Therefore
$$P\Big(\bigcup_i(F_i \cup G_i) \cup E\Big) \le \sum_{i=0}^\infty\frac{c2^{-n\varepsilon/2}2^{-i\varepsilon/2}}{\lambda_n^p} + \frac{c2^{-n\varepsilon/2}}{\lambda_n^p} \le c2^{-n\varepsilon/2}\lambda_n^{-p}.$$
Letting m → ∞ we have
$$P(A_n) \le c2^{-n\varepsilon/2}\lambda_n^{-p} = c2^{-n\varepsilon/4}M^{-p}$$
as required.

The proof of Theorem 8.1 is an example of what is known as a metric entropy or chaining argument. In the above, the only place we relied on the fact that we were using real-valued processes was in using the triangle inequality. Therefore with only slight changes in notation, we have the following theorem.

Theorem 8.2 Suppose X takes values in some metric space S with metric $d_S$ and there exist $c_1$, ε, and p > 0 such that
$$E[d_S(X_s,X_t)^p] \le c_1|t - s|^{1+\varepsilon}, \qquad s,t \in D. \tag{8.5}$$
Then the following hold.
(1) There exists $c_2$ depending only on $c_1$, p, and ε such that for M > 0,
$$P\Big(\sup_{s,t\in D,\ s\ne t}\frac{d_S(X_s,X_t)}{|t - s|^{\varepsilon/(2p)}} \ge M\Big) \le c_2/M^p.$$
(2) With probability one, $X_t$ is uniformly continuous on D.
Remark 8.3 Theorem 8.2 is stated for random variables indexed by time, but the analogous result holds for the continuity in a of random variables $X^a$ indexed by some parameter a running through D. We may also let the parameter a run instead through the dyadic rationals in $[b_1, b_2]$ for any $b_1 < b_2$.

The proof of the following corollary is an adaptation of the proof of Theorem 8.1 and is left as Exercise 8.1.

Corollary 8.4 Suppose there exist $c_1$, ε, N, and p > 0 such that if n ≤ N,
$$E[d_S(X_s,X_t)^p] \le c_1|t - s|^{1+\varepsilon}, \qquad s,t \in D_n.$$
Then there exists $c_2$ depending on $c_1$, ε, and p but not N such that for M > 0 and n ≤ N we have
$$P\Big(\sup_{s,t\in D_n,\ s\ne t}\frac{d_S(X_s,X_t)}{|t - s|^{\varepsilon/(2p)}} \ge M\Big) < c_2M^{-p}.$$
Recall the definition of Hölder continuity from (7.1).

Proposition 8.5 If α < 1/2, then the paths of a one-dimensional Brownian motion $\{W_t;\ 0 \le t \le 1\}$ are Hölder continuous of order α with probability one.

Proof By the stationary increments property and scaling, for s < t,
$$E|W_t - W_s|^p = E|W_{t-s}|^p = |t - s|^{p/2}E|W_1|^p.$$
If α < 1/2, choose p large enough so that ((p/2) − 1)/p > α and then take ε = (p/2) − 1.
(Here ε is large!) Take γ > 0 sufficiently small that (ε/p) − γ > α. Then by Exercise 8.2 the paths of $W_t$ are Hölder continuous of order α, with probability one, provided we restrict t to D. But the paths of Brownian motion are continuous, so we see that we have Hölder continuity of order α when t ∈ [0,1].
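The exponent bookkeeping in this proof can be made concrete. The sketch below picks the smallest integer p with ((p/2) − 1)/p > α, sets ε = (p/2) − 1, and takes γ halfway between 0 and (ε/p) − α; this is one admissible choice among many, and the function name is illustrative.

```python
import math
from fractions import Fraction


def kolmogorov_exponents(alpha):
    """One admissible choice of (p, eps, gamma) from the proof of
    Proposition 8.5 for a target Holder order 0 < alpha < 1/2.

    E|W_t - W_s|^p = |t - s|^{p/2} E|W_1|^p, so (8.1) holds with
    eps = p/2 - 1; we need eps/p - gamma > alpha with 0 < gamma < eps/p."""
    alpha = Fraction(alpha)
    if not 0 < alpha < Fraction(1, 2):
        raise ValueError("alpha must lie strictly between 0 and 1/2")
    # eps/p = 1/2 - 1/p > alpha  <=>  p > 1 / (1/2 - alpha)
    p = math.floor(1 / (Fraction(1, 2) - alpha)) + 1
    eps = Fraction(p, 2) - 1
    gamma = (eps / p - alpha) / 2   # halfway between 0 and eps/p - alpha
    return p, eps, gamma


if __name__ == "__main__":
    for alpha in (Fraction(1, 4), Fraction(2, 5), Fraction(49, 100)):
        p, eps, gamma = kolmogorov_exponents(alpha)
        print(alpha, p, eps, gamma, eps / p - gamma)
```

As α approaches 1/2 the required p grows without bound, which is why the proof remarks that ε is large.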
Exercises
8.1 Prove Corollary 8.4.
8.2 If the hypothesis of Theorem 8.2 holds and 0 < γ < ε/p, show that there exists $c_2$ depending only on $c_1$, ε, γ, and p such that for M > 0
$$P\Big(\sup_{s,t\in D,\ s\ne t}\frac{d_S(X_s,X_t)}{|t - s|^{(\varepsilon/p)-\gamma}} \ge M\Big) \le c_2M^{-p}.$$

8.3 Suppose X is a real-valued process and there exist constants $c_1, c_2$ such that
$$P(|X_t - X_s| > \lambda) \le c_1\exp\big(-c_2\lambda\log^4(1/|t - s|)\big), \qquad s,t \in [0,1].$$
Prove that with probability one, X has a version which is uniformly continuous on the dyadic rationals in [0,1].
8.4 Suppose $(X_t, t \in [0,1])$ is a mean zero Gaussian process and there exist c and ε > 0 such that
$$\operatorname{Var}(X_t - X_s) \le c|t - s|^\varepsilon, \qquad s,t \in [0,1].$$
Prove that there is a version of X that has continuous paths on [0,1].
8.5 Let X be as in Exercise 8.4. For what values α will X have paths that are Hölder continuous of order α? (α will depend on ε.)
8.6 Let $\{X_{s,t} : s,t \in [0,1]\}$ be a collection of random variables. Suppose there exist c, p, and ε > 0 such that
$$E|X_{s',t'} - X_{s,t}|^p \le c(|t' - t| + |s' - s|)^{2+\varepsilon}.$$
Prove that with probability one, the map $(s,t) \mapsto X_{s,t}(\omega)$ is uniformly continuous on $D \times D = \{(s,t) : s,t \in D\}$.

9
Continuous semimartingales
Roughly speaking, a semimartingale is the sum of a martingale and a process whose paths are of bounded variation. In this chapter we consider semimartingales whose paths are continuous. We will give definitions, and then investigate in more detail the class of martingales that are square integrable. Finally we present a proof of the Doob–Meyer decomposition for continuous supermartingales. The Doob–Meyer decomposition used to be considered a very hard theorem, but at least in the continuous case, an elementary proof is possible. For a proof in the general case, see Chapter 16.
9.1 Definitions
Let $\{\mathcal F_t\}$ be a filtration satisfying the usual conditions and let
$$\mathcal F_\infty = \bigvee_{t\ge 0}\mathcal F_t = \sigma\Big(\bigcup_{t\ge 0}\mathcal F_t\Big).$$
We say a process X has increasing paths or that X is an increasing process if the functions $t \mapsto X_t(\omega)$ are increasing with probability one. Throughout this book saying f is “increasing” means that s < t implies f(s) ≤ f(t), while saying f is “strictly increasing” means that s < t implies f(s) < f(t). A process X with paths of bounded variation is just what one would expect: with probability one, the functions $t \mapsto X_t(\omega)$ are of bounded variation. We say X has paths locally of bounded variation if there exist stopping times $R_n \to \infty$ such that the process $X_{t\wedge R_n}$ has paths of bounded variation for each n.

We turn to martingales. A martingale M is a uniformly integrable martingale if the family of random variables $\{M_t\}$ is uniformly integrable. A process X is a local martingale if there exist stopping times $R_n \to \infty$ such that $M^n_t = X_{t\wedge R_n}$ is a uniformly integrable martingale for each n. A martingale whose paths are continuous is called a continuous martingale, and we similarly define a right-continuous martingale. A semimartingale is a process X of the form $X_t = M_t + A_t$, where $M_t$ is a local martingale and $A_t$ is a process whose paths are locally of bounded variation. As a consequence of the Doob–Meyer decomposition we will see that submartingales and supermartingales are semimartingales.

As an example, a Brownian motion $W_t$ is a martingale and is a local martingale (let $R_n$ be identically equal to n), but is not a uniformly integrable martingale. We will define what it means to be a square integrable martingale in the next section; Brownian motion is not a square integrable martingale.

9.2 Square integrable martingales

Definition 9.1 A martingale is a square integrable martingale if there exists an $\mathcal F_\infty$ measurable random variable $M_\infty$ such that $E M_\infty^2 < \infty$ and $M_t = E[M_\infty \mid \mathcal F_t]$ for all t.

An example of a square integrable martingale would be $M_t = W_{t\wedge t_0}$, where $W_t$ is a Brownian motion and $t_0$ is a fixed time; in this case $M_\infty = W_{t_0}$.

Proposition 9.2 Let $\{\mathcal F_t\}$ be a filtration satisfying the usual conditions and M a right-continuous process. The following are equivalent:
(1) $M_t$ is a square integrable martingale.
(2) M is a martingale with $\sup_{t\ge 0} E M_t^2 < \infty$.
(3) M is a martingale with $E[\sup_{t\ge 0} M_t^2] < \infty$.

Proof To show (1) implies (2), suppose M is a square integrable martingale. Then by Jensen's inequality for conditional expectations (Proposition A.21),
$$E M_t^2 = E\big[(E[M_\infty \mid \mathcal F_t])^2\big] \le E\big[E[M_\infty^2 \mid \mathcal F_t]\big] = E M_\infty^2.$$
To show (2) implies (3), note that for each N,
$$E\Big[\sup_{0\le t\le N} M_t^2\Big] \le 4E M_N^2$$
by Doob's inequalities. That (2) implies (3) follows by letting N → ∞ and using Fatou's lemma.

Now suppose (3) holds, and we will show (1) holds. Since $E M_n^2$ is uniformly bounded in n, the martingale convergence theorem (Theorem A.35) implies that $M_n$ converges almost surely and in $L^2$. Let us call the limit $M_\infty$; we have $E M_\infty^2 < \infty$ by the $L^2$ convergence. Since $E M_n^2$ is uniformly bounded, $M_n$ is a uniformly integrable martingale, and by Proposition A.37, $M_n = E[M_\infty \mid \mathcal F_n]$. If n − 1 ≤ t ≤ n, we have
$$M_t = E[M_n \mid \mathcal F_t] = E\big[E[M_\infty \mid \mathcal F_n] \mid \mathcal F_t\big] = E[M_\infty \mid \mathcal F_t],$$
as required.

For the remainder of this section all our martingales will have paths that are right continuous with left limits.

Proposition 9.3 If M is a square integrable martingale and S ≤ T are finite stopping times, then $E[M_T \mid \mathcal F_S] = M_S$.

Proof Let $A \in \mathcal F_S$ and define $U(\omega) = S(\omega)1_A(\omega) + T(\omega)1_{A^c}(\omega)$. Thus U is equal to S if ω ∈ A and otherwise is equal to T. Since $A \in \mathcal F_S \subset \mathcal F_T$, we have
$$(U \le t) = [(S \le t) \cap A] \cup [(T \le t) \cap A^c] \in \mathcal F_t,$$
and therefore U is a stopping time. By Proposition 3.11,
$$E M_0 = E M_U = E[M_S; A] + E[M_T; A^c]$$
and
$$E M_0 = E M_T = E[M_T; A] + E[M_T; A^c].$$
These two equations imply that $E[M_S; A] = E[M_T; A]$, which is what we needed to prove.

By Exercise 3.11, the conclusion is valid if M is a uniformly integrable martingale. As an immediate corollary we have

Corollary 9.4 Suppose M is a square integrable martingale and T is a stopping time. Then $X_t = M_{t\wedge T}$ is a martingale with respect to $\{\mathcal F_{t\wedge T}\}$.

The proof of the following proposition is similar to that of Proposition 9.3.
It may be viewed as a converse of the optional stopping theorem.

Proposition 9.5 Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions and M is a process that is adapted to $\{\mathcal F_t\}$ such that $M_t$ is integrable for each t. If $E M_T = 0$ for every bounded stopping time T, then $M_t$ is a martingale.

Proof Suppose s < t and $A \in \mathcal F_s$. Define T to be equal to s if ω ∈ A and equal to t if ω ∉ A. As in the proof of Proposition 9.3, but even more simply, T is a stopping time, so
$$0 = E M_T = E[M_s; A] + E[M_t; A^c].$$
The fixed time t is a stopping time, hence
$$0 = E M_t = E[M_t; A] + E[M_t; A^c].$$
Comparing, $E[M_t; A] = E[M_s; A]$, which proves M is a martingale.

Proposition 9.6 Suppose $M_t$ is a square integrable martingale and S ≤ T are finite stopping times. Then
$$E[(M_T - M_S)^2 \mid \mathcal F_S] = E[M_T^2 - M_S^2 \mid \mathcal F_S]. \tag{9.1}$$

Proof By Proposition 9.3,
$$E[(M_T - M_S)^2 \mid \mathcal F_S] = E[M_T^2 \mid \mathcal F_S] - 2M_S E[M_T \mid \mathcal F_S] + M_S^2 = E[M_T^2 \mid \mathcal F_S] - M_S^2 = E[M_T^2 - M_S^2 \mid \mathcal F_S]$$
and we are done.

If we take expectations in (9.1), we obtain
$$E[(M_T - M_S)^2] = E M_T^2 - E M_S^2. \tag{9.2}$$

Theorem 9.7 Suppose $M_0 = 0$, $M_t$ is a continuous local martingale, and the paths of $M_t$ are locally of bounded variation. Then M is identically 0, a.s., that is, $P(M_t = 0 \text{ for all } t) = 1$.

Proof Using the definition of local martingale, it suffices to suppose M is a continuous uniformly integrable martingale. Let $t_0$ be fixed and let $A_t$ denote the total variation of the paths of M up to time t. If $T_N = \inf\{t : A_t \ge N\}$, we look at $M^N_t = M_{T_N\wedge t\wedge t_0}$. Using Proposition 9.3 and the remark following it, we see that $M^N$ is also a continuous martingale with paths of bounded variation, and if $M^N$ is identically zero, then letting N → ∞ and $t_0 \to \infty$, we obtain our result. Therefore it suffices to suppose the total variation of $M_t$ is bounded by N, a.s. In particular, $M_t$ is bounded by N.

Let n ≥ 1 and set
$$V_n = \sup_{k\le 2^n-1}|M_{(k+1)t_0/2^n} - M_{kt_0/2^n}|.$$
Note $V_n \le 2N$, a.s., and $V_n \to 0$, a.s., as n → ∞ by the uniform continuity of the paths of M on $[0,t_0]$.
By dominated convergence, $E V_n \to 0$ as n → ∞. We write
$$\begin{aligned}
E M_{t_0}^2 &= E\Big[\sum_{k=0}^{2^n-1}\big(M_{(k+1)t_0/2^n}^2 - M_{kt_0/2^n}^2\big)\Big] = E\Big[\sum_{k=0}^{2^n-1}\big(M_{(k+1)t_0/2^n} - M_{kt_0/2^n}\big)^2\Big]\\
&\le E\Big[V_n\sum_{k=0}^{2^n-1}|M_{(k+1)t_0/2^n} - M_{kt_0/2^n}|\Big] \le N\,E V_n.
\end{aligned}$$
The second equality follows by (9.2). Since n is arbitrary and $E V_n \to 0$, then $E M_{t_0}^2 = 0$. By Doob's inequalities, $E[\sup_{s\le t_0} M_s^2] = 0$. Hence M is identically 0 up to time $t_0$.

9.3 Quadratic variation

Definition 9.8 A continuous square integrable martingale $M_t$ has quadratic variation $\langle M\rangle_t$ (sometimes written $\langle M,M\rangle_t$) if $M_t^2 - \langle M\rangle_t$ is a martingale, where $\langle M\rangle_t$ is a continuous adapted increasing process with $\langle M\rangle_0 = 0$.

In the case where W is a Brownian motion, $t_0$ is fixed, and $M_t = W_{t\wedge t_0}$, the quadratic variation of M is just $\langle M\rangle_t = t\wedge t_0$ by Example 3.3. Brownian motion itself does not fit perfectly into the framework of stochastic integration because it is not a square integrable martingale, although it is a martingale; we will be dealing with this point several times in what follows.

We will show existence and uniqueness of $\langle M\rangle_t$ by means of the Doob–Meyer decomposition, Theorem 9.12, below. However, we defer the proof of the Doob–Meyer decomposition until the next section.

A process Z is of class D if $\{Z_T : T \text{ a finite stopping time}\}$ is a uniformly integrable family of random variables.

Theorem 9.9 Let $M_t$ be a continuous square integrable martingale. There exists a continuous adapted increasing process $\langle M\rangle_t$ with $\langle M\rangle_0 = 0$ such that $M_t^2 - \langle M\rangle_t$ is a martingale. If $A_t$ is a continuous adapted increasing process with $A_0 = 0$ such that $M_t^2 - A_t$ is a martingale, then $P(A_t \ne \langle M\rangle_t \text{ for some } t) = 0$.

Proof By Jensen's inequality for conditional expectations,
$$E[M_t^2 \mid \mathcal F_s] \ge (E[M_t \mid \mathcal F_s])^2 = M_s^2$$
if s < t, and so $M_t^2$ is a submartingale. Since $M_\infty$ is square integrable, given ε there exists δ such that $E[M_\infty^2; A] < \varepsilon$ if $P(A) < \delta$. Since $M_t^2$ is a submartingale, if $K > E M_\infty^2/\delta$, then
$$P(M_t^2 > K) \le E M_t^2/K \le E M_\infty^2/K < \delta,$$
and consequently
$$E[M_t^2; M_t^2 > K] \le E[M_\infty^2; M_t^2 > K] < \varepsilon.$$
By Exercise 3.11, $M_t^2$ is of class D. Applying the Doob–Meyer decomposition (Theorem 9.12) to $-M_t^2$, we write $-M_t^2 = N_t - B_t$, where $N_t$ is a martingale and $B_t$ has increasing paths. We then set $\langle M\rangle_t = B_t$. The uniqueness follows from the uniqueness part of the Doob–Meyer decomposition.

In view of Proposition 9.3 and the definition of $\langle M\rangle$, we have
$$\begin{aligned}
&E[(M_T - M_S)^2 - (\langle M\rangle_T - \langle M\rangle_S) \mid \mathcal F_S]\\
&\qquad = E[M_T^2 - M_S^2 - (\langle M\rangle_T - \langle M\rangle_S) \mid \mathcal F_S] = 0
\end{aligned}\tag{9.3}$$
if S ≤ T are finite stopping times and M is a continuous square integrable martingale.

If M and N are two square integrable martingales, we define $\langle M,N\rangle_t$ by
$$\langle M,N\rangle_t = \tfrac12\big[\langle M+N\rangle_t - \langle M\rangle_t - \langle N\rangle_t\big]. \tag{9.4}$$
This is sometimes called the covariation of M and N.

An alternative representation of $\langle M\rangle_t$ is the following. A proof could be given now, but it is a bit messy; after we have Itô's formula this will be easier.

Theorem 9.10 Let M be a square integrable martingale and let $t_0 > 0$. Then $\langle M\rangle_{t_0}$ is the
limit in probability of
$$\sum_{k=0}^{[2^nt_0]}\big(M_{(k+1)/2^n} - M_{k/2^n}\big)^2,$$
where $[2^nt_0]$ is the largest integer less than or equal to $2^nt_0$.
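Theorem 9.10 can be illustrated (not proved) by simulation. Assuming $M_t = W_{t\wedge t_0}$ for a Brownian motion W, so that $\langle M\rangle_t = t\wedge t_0$, the sketch below forms the dyadic sums of squared increments, and also the dyadic sums of products of increments of two independent such martingales, which by (9.4) and Exercise 9.4 should approach 0. It uses Python's `random.gauss`; the seed, grid level, and helper name are arbitrary choices.

```python
import math
import random


def dyadic_sums(n, horizon, t0, seed=0):
    """Simulate two independent Brownian motions W1, W2 on the grid k/2**n,
    stop both at time t0 (assumed to be a multiple of 2**-n), and return
    (sum of (dM1)^2, sum of dM1*dM2) over the grid on [0, horizon],
    where M^i_t = W^i at time min(t, t0)."""
    rng = random.Random(seed)
    sd = math.sqrt(2.0 ** -n)          # standard deviation of one increment
    steps = int(horizon * 2 ** n)
    stop = int(t0 * 2 ** n)            # increments after t0 vanish for M
    quad, cross = 0.0, 0.0
    for k in range(steps):
        if k < stop:
            d1, d2 = rng.gauss(0.0, sd), rng.gauss(0.0, sd)
        else:
            d1 = d2 = 0.0
        quad += d1 * d1
        cross += d1 * d2
    return quad, cross


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        print(n, dyadic_sums(n, horizon=2.0, t0=1.0))
```

With horizon 2 and $t_0 = 1$, the first coordinate should settle near $\langle M\rangle_2 = 1$ as n grows, and the second near 0; a single path only illustrates convergence in probability.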
9.4 The Doob–Meyer decomposition
In this section we give a proof of the Doob–Meyer decomposition for continuous supermartingales. First we need the following inequality, which has many other uses as well.
Proposition 9.11 Suppose $A^1$ and $A^2$ are two increasing adapted continuous processes starting at zero with $A^i_\infty = \lim_{t\to\infty} A^i_t < \infty$, a.s., i = 1, 2, and suppose there exists a positive real K such that for all t,
$$E[A^i_\infty - A^i_t \mid \mathcal F_t] \le K, \quad \text{a.s.},\ i = 1,2. \tag{9.5}$$
Let $B_t = A^1_t - A^2_t$. Suppose there exists a non-negative random variable V with $EV^2 < \infty$ such that for all t,
$$|E[B_\infty - B_t \mid \mathcal F_t]| \le E[V \mid \mathcal F_t], \quad \text{a.s.} \tag{9.6}$$
Then
$$E\sup_{t\ge 0} B_t^2 \le 8EV^2 + 32\sqrt2\,K(EV^2)^{1/2}. \tag{9.7}$$

Proof We start by showing
$$E(A^i_\infty)^2 \le 2K^2, \qquad i = 1,2. \tag{9.8}$$
First suppose $A^i_\infty$ is bounded by a positive real number L. Note that we have $E A^i_\infty = E\big[E[A^i_\infty - A^i_0 \mid \mathcal F_0]\big] \le K$. A simple calculation shows that
$$(A^i_\infty)^2 = 2\int_0^\infty (A^i_\infty - A^i_t)\,dA^i_t.$$
We then have, using Proposition 3.14,
$$\begin{aligned}
E(A^i_\infty)^2 &= 2E\int_0^\infty (A^i_\infty - A^i_t)\,dA^i_t = 2E\int_0^\infty \big(E[A^i_\infty \mid \mathcal F_t] - A^i_t\big)\,dA^i_t\\
&= 2E\int_0^\infty E[A^i_\infty - A^i_t \mid \mathcal F_t]\,dA^i_t \le 2KE\int_0^\infty dA^i_t = 2KE A^i_\infty \le 2K^2.
\end{aligned}$$
If we let $T_L = \inf\{t : A^1_t + A^2_t \ge L\}$ and $A^{i,L}_t = A^i_{t\wedge T_L}$, then (9.5) still holds if we replace $A^i_t$ by $A^{i,L}_t$. We obtain $E(A^{i,L}_\infty)^2 \le 2K^2$, and then letting L → ∞ and using Fatou's lemma proves (9.8).

We next write
$$B_\infty^2 = 2\int_0^\infty (B_\infty - B_t)\,dB_t,$$
and hence, since $|dB_t| \le d(A^1_t + A^2_t)$,
$$E B_\infty^2 = 2E\int_0^\infty E[B_\infty - B_t \mid \mathcal F_t]\,dB_t \le 2E\int_0^\infty E[V \mid \mathcal F_t]\,d(A^1_t + A^2_t) = 2E\int_0^\infty V\,d(A^1_t + A^2_t) = 2E[V(A^1_\infty + A^2_\infty)].$$
The bound (9.8) takes care of the integrability concerns. Since $E[(A^1_\infty + A^2_\infty)^2] \le 2E(A^1_\infty)^2 + 2E(A^2_\infty)^2 \le 8K^2$, the Cauchy–Schwarz inequality gives
$$E B_\infty^2 \le 2\big(E[(A^1_\infty + A^2_\infty)^2]\big)^{1/2}(EV^2)^{1/2} \le 4\sqrt2\,K(EV^2)^{1/2}.$$
Now let $M_t = E[B_\infty \mid \mathcal F_t]$, $N_t = E[V \mid \mathcal F_t]$, where we take the right-continuous versions (see Corollary 3.13), and let $X_t = M_t - B_t$. We have $|X_t| = |E[B_\infty - B_t \mid \mathcal F_t]| \le N_t$, and using Doob's inequalities,
$$E\sup_{t\ge 0} X_t^2 \le E\sup_{t\ge 0} N_t^2 \le 4E N_\infty^2 \le 4EV^2.$$
Also by Doob's inequalities,
$$E\sup_{t\ge 0} M_t^2 \le 4E M_\infty^2 = 4E B_\infty^2.$$
Since $\sup_{t\ge 0}|B_t| \le \sup_{t\ge 0}|X_t| + \sup_{t\ge 0}|M_t|$, we have $E\sup_t B_t^2 \le 2E\sup_t X_t^2 + 2E\sup_t M_t^2 \le 8EV^2 + 8E B_\infty^2$, and our result follows.

We now prove the Doob–Meyer decomposition for continuous supermartingales. In view of the proof of Proposition A.30, we would like to let
$$A_t = -\int_0^t E\Big[\frac{dZ_s}{ds}\,\Big|\,\mathcal F_s\Big]\,ds,$$
but this doesn't make sense.
We instead define an approximation $A^h_t$ by (9.9) and show that $A^h_t$ converges to what we want as h → 0.

Theorem 9.12 Suppose $Z_t$ is a continuous adapted supermartingale of class D. Then there exists an increasing adapted continuous process $A_t$ with paths locally of bounded variation started at 0 and a continuous local martingale $M_t$ such that $Z_t = M_t - A_t$. If M′ and A′ are two other such processes with $Z_t = M'_t - A'_t$, then $M_t = M'_t$ and $A_t = A'_t$ for all t, a.s.

Proof Let us prove the second assertion first. Let $S_N$ be the first time that $|M_t| + |M'_t|$ exceeds N. If $Z_t = M_t - A_t = M'_t - A'_t$, then
$$M_{t\wedge S_N} - M'_{t\wedge S_N} = A_{t\wedge S_N} - A'_{t\wedge S_N}$$
is a martingale whose paths are locally of bounded variation. By Theorem 9.7, $M_{t\wedge S_N} = M'_{t\wedge S_N}$, a.s. Since this is true for all N, then $M_t = M'_t$, and hence also $A_t = A'_t$.

Now let us prove the existence of M and A. Let $T_N = \inf\{t : |Z_t| \ge N\}\wedge N$ and $Z^N_t = Z_{t\wedge T_N}$. By Exercise 9.2, $Z^N$ is a supermartingale. If we prove the decomposition $Z^N_t = M^N_t - A^N_t$ for each N, then by the uniqueness assertion, if $N_1 < N_2$, we have $A^{N_1}_t$ and $M^{N_1}_t$ agreeing with $A^{N_2}_t$ and $M^{N_2}_t$, respectively, for $t \le T_{N_1}$. Hence given t, we can choose N large enough so that $t \le T_N$ and then define $M_t = M^N_t$, $A_t = A^N_t$. Clearly this gives the desired decomposition. Thus we may suppose that $Z_t$ is bounded by some N and that $Z_t$ is constant for t ≥ N.

Let $V_\delta = \sup_{|t-s|\le\delta}|Z_t - Z_s|$. Since Z has continuous paths,
$$V_\delta = \sup_{s,t\in\mathbb Q_+,\ |t-s|\le\delta}|Z_t - Z_s|,$$
and therefore $V_\delta$ is measurable with respect to $\mathcal F_\infty$. Since the paths of Z are uniformly continuous, $V_\delta \to 0$, a.s., as δ → 0, and since $|V_\delta| \le 2N$, we have by dominated convergence that $EV_\delta^2 \to 0$ as δ → 0. We define
$$A^h_t = \frac1h\int_0^t \big(Z_s - E[Z_{s+h} \mid \mathcal F_s]\big)\,ds. \tag{9.9}$$
At this point we do not know even that $E[Z_{s+h} \mid \mathcal F_s]$ has any nice measurability properties (it is not a martingale, for example); let us assume that it has a version that has continuous paths, is adapted, and is jointly measurable in t and ω, and prove this fact a bit later on. Because Z is a supermartingale, $A^h$ is increasing. Since $Z_s = Z_N$ for s ≥ N, we have $\int_t^\infty (Z_s - Z_{s+h})\,ds = \int_t^{t+h} Z_s\,ds - hZ_N$ for every t, and therefore (note Exercise 9.6)
$$\begin{aligned}
E[A^h_\infty - A^h_t \mid \mathcal F_t] &= \frac1h E\Big[\int_t^\infty E[Z_s - Z_{s+h} \mid \mathcal F_s]\,ds\,\Big|\,\mathcal F_t\Big] = \frac1h\int_t^\infty E[Z_s - Z_{s+h} \mid \mathcal F_t]\,ds\\
&= \frac1h E\Big[\int_t^{t+h} Z_s\,ds - hZ_N\,\Big|\,\mathcal F_t\Big] = E\Big[\int_0^1 Z_{t+uh}\,du - Z_N\,\Big|\,\mathcal F_t\Big].
\end{aligned}$$
Since Z is bounded by N, it follows that $A^h$ satisfies (9.5) with K = 2N. If k < h, then
$$\big|E[(A^h_\infty - A^h_t) - (A^k_\infty - A^k_t) \mid \mathcal F_t]\big| = \Big|E\Big[\int_0^1 (Z_{t+uh} - Z_{t+uk})\,du\,\Big|\,\mathcal F_t\Big]\Big| \le E[V_h \mid \mathcal F_t].$$
Now apply Proposition 9.11 to see that $E\sup_{t\ge 0}(A^h_t - A^k_t)^2 \to 0$ as k, h → 0. This shows that whenever $h_n$ decreases to 0, then $A^{h_n}$ is a Cauchy sequence in a normed linear space, where the norm is given by
$$\|X\| = \Big(E\sup_{t\ge 0}|X_t|^2\Big)^{1/2}, \tag{9.10}$$
which is complete by Exercise 9.5. Therefore there exists a limit A. Since $E\sup_{t\ge 0}(A^h_t - A_t)^2 \to 0$ as h → 0, there exists a subsequence $h_n \to 0$ such that $\sup_{t\ge 0}(A^{h_n}_t - A_t)^2 \to 0$, a.s., which proves that $A_t$ is continuous and increasing. We calculate
$$E[A_\infty - A_t \mid \mathcal F_t] = \lim_{h\to 0} E[A^h_\infty - A^h_t \mid \mathcal F_t] = \lim_{h\to 0} E\Big[\int_0^1 Z_{t+uh}\,du - Z_N\,\Big|\,\mathcal F_t\Big] = Z_t - E[Z_N \mid \mathcal F_t].$$
Therefore $Z_t = E[A_\infty + Z_N \mid \mathcal F_t] - A_t$, which is the decomposition of Z into a martingale minus an increasing process.

Fix h. It remains to show that there is a version of $E[Z_{s+h} \mid \mathcal F_s]$ that is a continuous jointly measurable adapted process. Define $Y_t = Z_{t+h}$ and define $Y^n_t$ to be equal to $Y_{k/2^n}$ if $k/2^n \le t < (k+1)/2^n$. Take the right-continuous version $\tilde Y^{k,n}_t$ of the martingale $E[Y_{k/2^n} \mid \mathcal F_t]$ (see Corollary 3.13) and let
$$\tilde Y^n_t(\omega) = \sum_{k=0}^\infty 1_{[k/2^n,(k+1)/2^n)}(t)\,\tilde Y^{k,n}_t(\omega).$$
Note that $\tilde Y^n_t = E[Y^n_t \mid \mathcal F_t]$, a.s., for all t.
Moreover, $\tilde Y^n_t$ is right continuous, so we see that it is jointly measurable in t and ω. Now for n > m,
$$\sup_{t\ge 0}|\tilde Y^n_t - \tilde Y^m_t| \le \sup_{t\ge 0}E[V_{2^{-m}} \mid \mathcal F_t]. \tag{9.11}$$
By Doob's inequalities, $E\big[\sup_{t\ge 0}(E[V_{2^{-m}} \mid \mathcal F_t])^2\big] \le 4EV_{2^{-m}}^2 \to 0$, so there exists a subsequence along which the right-hand side of (9.11) converges to 0 almost surely. Hence along the appropriate subsequence, $\tilde Y^n_t$ converges uniformly. If we call the limit $\tilde Y$, we see that $\tilde Y_t$ is right continuous, adapted, and jointly measurable. If $k/2^n \le t \le (k+1)/2^n$, then $|Y^n_t - Y^n_{k/2^n}| \le V_{2^{-n}}$, so
$$|\tilde Y^n_t - \tilde Y^n_{k/2^n}| = |E[Y^n_t - Y^n_{k/2^n} \mid \mathcal F_t]| \le E[V_{2^{-n}} \mid \mathcal F_t].$$
By the triangle inequality,
$$|\tilde Y^n_t - \tilde Y^n_s| \le 2\sup_{t\ge 0}E[V_{2^{-n}} \mid \mathcal F_t]$$
if $k/2^n \le s,t \le (k+1)/2^n$. Therefore the largest jump of $\tilde Y^n_t$ is bounded by $2\sup_{t\ge 0}E[V_{2^{-n}} \mid \mathcal F_t]$, and we conclude the limit $\tilde Y$ has continuous paths. Finally, $Y^n_t$ differs from $Y_t$ by at most $V_{2^{-n}}$, so we see by passing to the limit that $\tilde Y_t$ is a version of $E[Z_{t+h} \mid \mathcal F_t]$.
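The approximation (9.9) has an exact discrete-time analogue, the Doob decomposition $A_k = \sum_{j<k}(Z_j - E[Z_{j+1} \mid \mathcal F_j])$. The sketch below is a toy model, not the continuous-time construction: it takes $Z_k = -|S_k|$ for a simple symmetric random walk S started at 0, for which $Z_j - E[Z_{j+1} \mid \mathcal F_j]$ equals 1 when $S_j = 0$ and 0 otherwise, so A counts visits to 0 and M = Z + A is a martingale. The helper names are hypothetical.

```python
from itertools import product


def doob_decomposition(f, steps):
    """Discrete analogue of (9.9) with h = 1 for Z_k = f(S_k), where S is a
    simple symmetric random walk started at 0 whose first steps are `steps`
    (each +1 or -1).  Returns (S, A, M) with
        A_k = sum_{j<k} (Z_j - E[Z_{j+1} | F_j]),   M_k = Z_k + A_k,
    so that Z = M - A."""
    positions = [0]
    for step in steps:
        positions.append(positions[-1] + step)
    A = [0.0]
    for x in positions[:-1]:
        cond_exp = 0.5 * (f(x + 1) + f(x - 1))   # E[Z_{j+1} | F_j] when S_j = x
        A.append(A[-1] + f(x) - cond_exp)
    M = [f(x) + a for x, a in zip(positions, A)]
    return positions, A, M


def neg_abs(x):
    """Z_k = -|S_k| is a supermartingale; its increasing part counts visits to 0."""
    return -abs(x)


if __name__ == "__main__":
    S, A, M = doob_decomposition(neg_abs, (1, -1, -1, 1, 1, 1, -1, -1))
    print(S)
    print(A)
    print(M)
```

Here A is increasing and predictable ($A_{k}$ depends only on $S_0, \dots, S_{k-1}$), mirroring the role of $A^h$ in the proof above.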
Exercises
9.1 Let $W_t$ be a Brownian motion started at 1 and $T_0 = \inf\{t > 0 : W_t = 0\}$. Is $M_t = W_{t\wedge T_0}$ a square integrable martingale? A locally square integrable martingale? A uniformly integrable martingale? A martingale? A local martingale? A semimartingale?
9.2 Prove that if M is a submartingale such that the paths of M are continuous, $\sup_t|M_t|$ is integrable, and S ≤ T are finite stopping times, then $E[M_T \mid \mathcal F_S] \ge M_S$. Note that the last part of the proof of Proposition 9.3 breaks down here.
9.3 Suppose $M_t$ is a local martingale with continuous paths. Show that if N > 0, $T_N = \inf\{t : |M_t| \ge N\}$, and $M^N_t = M_{t\wedge T_N}$, then $M^N$ is a uniformly integrable martingale.
9.4 Suppose $W^1_t$ and $W^2_t$ are two independent Brownian motions, $t_0 > 0$, and $M^i_t = W^i_{t\wedge t_0}$, i = 1, 2. Show $\langle M^1, M^2\rangle_t = 0$.
9.5 Show that the norm defined in (9.10) is complete.
9.6 Let $Z_t$ be a bounded supermartingale with continuous paths that is constant from some time $t_0$ on. Show that for each t
$$E\Big[\int_t^\infty E[Z_s - Z_{s+h} \mid \mathcal F_s]\,ds\,\Big|\,\mathcal F_t\Big] = \int_t^\infty E[Z_s - Z_{s+h} \mid \mathcal F_t]\,ds, \quad \text{a.s.}$$

9.7 We mentioned that one can prove the existence of $\langle M\rangle$ without using the Doob–Meyer theorem. Here is how that argument starts. Let M be a bounded continuous martingale and for each n, define
$$I_n(t) = \sum_{i=0}^{[t2^n]}\big(M_{(i+1)/2^n} - M_{i/2^n}\big)^2.$$
Here [x] is the integer part of x. Prove that for each t > 0, $E|I_n(t) - I_m(t)|^2 \to 0$ as n, m → ∞. One can then define $\langle M\rangle_t$ as the $L^2$ limit of $I_n(t)$.
Hint: If n > m, note that
$$M_{(i+1)/2^m} - M_{i/2^m} = \sum_{j=2^{n-m}i}^{2^{n-m}(i+1)-1}\big(M_{(j+1)/2^n} - M_{j/2^n}\big).$$
Notes
The first proof of the Doob–Meyer decomposition was by Meyer in the early 1960s and was
a major breakthrough. There are now a number of alternate proofs. The proof we give here
for continuous supermartingales is new.

10
Stochastic integrals
This chapter is devoted to the construction of stochastic integrals, primarily with respect to continuous square integrable martingales. The motivating example is $\int_0^t H_s\,dW_s$, where W is a Brownian motion and H is an adapted process satisfying certain conditions. We cannot define this integral as a Lebesgue–Stieltjes integral because the paths of Brownian motion are nowhere differentiable (Theorem 7.3); in particular, they are not of bounded variation on any interval.
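A simulation sketch (an illustration under simple assumptions, not part of the construction) shows why a pathwise Riemann–Stieltjes definition fails for $\int_0^1 W_s\,dW_s$. For a continuously differentiable path, evaluating the integrand at the left or right endpoint of each grid interval gives sums with the same limit; for a simulated Brownian path the two sums differ by $\sum(\Delta W)^2$, which stays near 1, and the left-endpoint sum is close to $(W_1^2 - 1)/2$. The function name, grid, and seed are arbitrary choices.

```python
import math
import random


def riemann_sums_w_dw(n, seed=0):
    """Simulate W on the grid k/2**n of [0, 1]; return (left sum, right sum, W_1)
    for the integral of W dW, evaluating the integrand at the left or at the
    right endpoint of each grid interval."""
    rng = random.Random(seed)
    sd = math.sqrt(2.0 ** -n)
    w = 0.0
    left = right = 0.0
    for _ in range(2 ** n):
        dw = rng.gauss(0.0, sd)
        left += w * dw          # integrand known at the start of the interval
        right += (w + dw) * dw  # integrand evaluated at the end of the interval
        w += dw
    return left, right, w


if __name__ == "__main__":
    for n in (6, 10, 14, 18):
        left, right, w1 = riemann_sums_w_dw(n)
        print(n, right - left, left, (w1 * w1 - 1) / 2)
```

The left-endpoint choice uses only information available at the start of each interval, which is the reason the integrands in the construction are required to be predictable.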
One way to visualize a stochastic integral is to think of $dW_s$ as “white noise” on a radio and $H_s$ as the volume control, which increases or decreases the white noise by a factor. For another model, if $W_s$ is supposed to represent a stock price at time s (of course, stock prices can't be negative, while Brownian motion can!) and $H_s$ is the number of shares held at time s, then the stochastic integral represents the net profit.
10.1 Construction
Let $M_t$ be a continuous square integrable martingale with respect to a filtration $\{\mathcal F_t\}$ satisfying the usual conditions, and suppose $H_t$ is an adapted process. Under appropriate additional assumptions on H, we want to define
$$N_t = \int_0^t H_s\,dM_s, \tag{10.1}$$
the stochastic integral of H with respect to M.

We impose two conditions on the integrand $H_t$, a measurability one and an integrability one. First we define the predictable σ-field $\mathcal P$ on $[0,\infty)\times\Omega$. This is the smallest σ-field of subsets of $[0,\infty)\times\Omega$ with respect to which all left continuous, bounded, and adapted processes are measurable. In symbols,
$$\mathcal P = \sigma(X : X \text{ is left continuous, bounded, and adapted to } \{\mathcal F_t\}).$$
This can be rephrased by saying $\mathcal P$ is the σ-field on $[0,\infty)\times\Omega$ generated by the collection of all sets of the form
$$\{(t,\omega) \in [0,\infty)\times\Omega : X_t(\omega) > a\},$$
where $a \in \mathbb R$ and X is a bounded, adapted, left continuous process. We require $H : [0,\infty)\times\Omega \to \mathbb R$ to be measurable with respect to $\mathcal P$. When this happens, we say H is predictable.
The integrability condition is easier to state: we require
$$E\int_0^\infty H_s^2\,d\langle M\rangle_s < \infty. \tag{10.2}$$
Observe that H will meet both requirements if H is bounded, adapted, and has continuous paths.

We define $\int_0^t H_s\,dM_s$ in three steps:
Step 1. When $H_s(\omega) = K(\omega)1_{(a,b]}(s)$, where K is bounded and $\mathcal F_a$ measurable.
Step 2. When $H_s$ is a sum of processes of the form in Step 1.
Step 3. When H is predictable and satisfies (10.2).

If $M_t = W_{t\wedge t_0}$, where W is a Brownian motion and $t_0$ is a fixed time, then $\langle M\rangle_t = t\wedge t_0$, and it might help the reader to work through the proofs in this special case. Even in this situation, all the elements of the general construction are present.

We will need the following easy lemma.

Lemma 10.1 The predictable σ-field $\mathcal P$ is generated by the collection $\mathcal C$ of processes of the form $X_t(\omega) = \sum_{i=1}^n K_i(\omega)1_{(a_i,b_i]}(t)$, where for each i, $K_i$ is a bounded $\mathcal F_{a_i}$ measurable random variable.

Proof If $X \in \mathcal C$, then X is bounded, adapted, and left continuous, hence X is a predictable process. Thus $\sigma(\mathcal C) \subset \mathcal P$. On the other hand, if Y is a bounded, adapted, left-continuous process, we can approximate Y by the processes
$$Y^n_t(\omega) = \sum_{i=0}^{n2^n} Y_{i/2^n}(\omega)1_{(i/2^n,(i+1)/2^n]}(t).$$
Each such $Y^n$ is in $\mathcal C$, and $Y^n_t \to Y_t$ for every t > 0 by left continuity. Therefore the σ-field generated by $\mathcal C$ contains $\mathcal P$.

Proposition 10.2 Suppose H is as in Step 1 above. Then $N_t = K(M_{t\wedge b} - M_{t\wedge a})$ is a continuous martingale,
$$E N_\infty^2 = E\int_0^\infty K^2 1_{(a,b]}(s)\,d\langle M\rangle_s = E[K^2(\langle M\rangle_b - \langle M\rangle_a)],$$
and
$$\langle N\rangle_t = \int_0^t K^2 1_{(a,b]}(s)\,d\langle M\rangle_s.$$

Proof The continuity of the paths of N is clear. Set $N_\infty = K(M_b - M_a)$. Since K is bounded and M is square integrable, $E N_\infty^2 < \infty$. We will show $N_t = E[N_\infty \mid \mathcal F_t]$, which will prove that $N_t$ is a martingale. If t ≥ b, then since K, $M_b$, and $M_a$ are $\mathcal F_t$ measurable, $E[N_\infty \mid \mathcal F_t] = K(M_b - M_a) = N_t$. If a ≤ t ≤ b, K is $\mathcal F_t$ measurable, and
$$E[K(M_b - M_a) \mid \mathcal F_t] = KE[M_b - M_a \mid \mathcal F_t] = K(M_t - M_a) = N_t.$$
In particular, $N_a = E[N_\infty \mid \mathcal F_a] = 0$. Finally, if t ≤ a,
$$E[N_\infty \mid \mathcal F_t] = E\big[E[N_\infty \mid \mathcal F_a] \mid \mathcal F_t\big] = 0 = N_t.$$
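For $E N_\infty^2$, we have by (9.3) with S = a and T = b,
$$E N_\infty^2 = E[K^2(M_b - M_a)^2] = E\big[K^2E[(M_b - M_a)^2 \mid \mathcal F_a]\big] = E\big[K^2E[\langle M\rangle_b - \langle M\rangle_a \mid \mathcal F_a]\big] = E[K^2(\langle M\rangle_b - \langle M\rangle_a)].$$
To verify the formula for $\langle N\rangle_t$, let
$$L_\infty = K^2(M_b - M_a)^2 - K^2(\langle M\rangle_b - \langle M\rangle_a), \qquad L_t = K^2(M_{b\wedge t} - M_{a\wedge t})^2 - K^2(\langle M\rangle_{b\wedge t} - \langle M\rangle_{a\wedge t}).$$
Then $L_t = N_t^2 - \int_0^t K^2 1_{(a,b]}(s)\,d\langle M\rangle_s$, and we must show that $L_t$ is a martingale. To do this, it suffices to show $L_t = E[L_\infty \mid \mathcal F_t]$. If t ≥ b, then $L_\infty$ is $\mathcal F_t$ measurable, so $E[L_\infty \mid \mathcal F_t] = L_\infty = L_t$. If a ≤ t ≤ b, write $M_b - M_a = (M_b - M_t) + (M_t - M_a)$; since K and $M_t - M_a$ are $\mathcal F_t$ measurable and $E[M_b - M_t \mid \mathcal F_t] = 0$, the cross term vanishes, and (9.3) with S = t and T = b gives
$$\begin{aligned}
E[L_\infty \mid \mathcal F_t] &= K^2E\big[(M_b - M_t)^2 - (\langle M\rangle_b - \langle M\rangle_t) \mid \mathcal F_t\big] + K^2\big[(M_t - M_a)^2 - (\langle M\rangle_t - \langle M\rangle_a)\big]\\
&= K^2\big[(M_t - M_a)^2 - (\langle M\rangle_t - \langle M\rangle_a)\big] = L_t.
\end{aligned}$$
In particular, $E[L_\infty \mid \mathcal F_a] = L_a = 0$. Finally, if t ≤ a,
$$E[L_\infty \mid \mathcal F_t] = E\big[E[L_\infty \mid \mathcal F_a] \mid \mathcal F_t\big] = 0 = L_t,$$
as required.

Next suppose
$$H_s(\omega) = \sum_{j=1}^J K_j 1_{(a_j,b_j]}(s), \tag{10.3}$$
where each $K_j$ is $\mathcal F_{a_j}$ measurable and bounded. We may rewrite H so that the intervals $(a_j,b_j]$ satisfy $a_1 < b_1 \le a_2 < b_2 \le \cdots \le a_J < b_J$. For example, if $H_s = K_1 1_{(a_1,b_1]} + K_2 1_{(a_2,b_2]}$ with $a_1 < a_2 < b_1 < b_2$, we may rewrite $H_s$ as
$$K_1 1_{(a_1,a_2]} + (K_1 + K_2)1_{(a_2,b_1]} + K_2 1_{(b_1,b_2]}.$$
Define
$$N_t = \sum_{j=1}^J K_j(M_{t\wedge b_j} - M_{t\wedge a_j}). \tag{10.4}$$
We need to check that rewriting $H_s$ so that $a_1 < b_1 \le a_2 < \cdots < b_J$ does not affect the value of $N_t$, but this is routine.

Proposition 10.3 With H as in (10.3) and N defined by (10.4), $N_t$ is a continuous martingale,
$$E N_\infty^2 = E\int_0^\infty H_s^2\,d\langle M\rangle_s, \qquad \text{and} \qquad \langle N\rangle_t = \int_0^t H_s^2\,d\langle M\rangle_s. \tag{10.5}$$

Proof By linearity, $N_t$ is a continuous martingale. We have
$$E N_\infty^2 = E\Big[\sum_j K_j^2(M_{b_j} - M_{a_j})^2\Big] + 2E\Big[\sum_{i<j}K_iK_j(M_{b_i} - M_{a_i})(M_{b_j} - M_{a_j})\Big]. \tag{10.6}$$
The cross terms vanish, because when i < j and we condition on $\mathcal F_{a_j}$, we have
$$E\big[K_iK_j(M_{b_i} - M_{a_i})E[M_{b_j} - M_{a_j} \mid \mathcal F_{a_j}]\big] = 0.$$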
For the terms in the first sum in (10.6), by (9.3)
$$E[K_j^2(M_{b_j} - M_{a_j})^2] = E\big[K_j^2E[(M_{b_j} - M_{a_j})^2\mid\mathcal F_{a_j}]\big] = E\big[K_j^2E[\langle M\rangle_{b_j} - \langle M\rangle_{a_j}\mid\mathcal F_{a_j}]\big] = E[K_j^2(\langle M\rangle_{b_j} - \langle M\rangle_{a_j})].$$
Therefore
$$EN_\infty^2 = E\int_0^\infty H_s^2\,d\langle M\rangle_s.\qquad(10.7)$$
The argument for $\langle N\rangle_t$ is similar.

Now suppose $H_s$ is predictable and (10.2) holds. Choose $H^n_s$ of the form given in (10.3) above such that
$$E\int_0^\infty(H^n_s - H_s)^2\,d\langle M\rangle_s\to0.$$
To see that this can be done, define
$$\|Y\|_2 = \Big(E\int_0^\infty Y_t^2\,d\langle M\rangle_t\Big)^{1/2}$$
for $Y$ predictable. Then $\|Y\|_2$ is an $L^2$ norm on functions on $[0,\infty)\times\Omega$, so by Lemma 10.1 we can approximate $H$ in this norm by processes of the form given in (10.3). (When $H$ is bounded, adapted, and has continuous paths, taking $H^n_s = H_{k/2^n}$ if $k/2^n<s\le(k+1)/2^n$ for $s<n$, and $H^n_s = 0$ if $s\ge n$, will work.) By Doob's inequalities we have
$$E\Big[\sup_{t\ge0}\Big(\int_0^t(H^n_s - H^m_s)\,dM_s\Big)^2\Big]\le4E\Big(\int_0^\infty(H^n_s - H^m_s)\,dM_s\Big)^2 = 4E\int_0^\infty(H^n_s - H^m_s)^2\,d\langle M\rangle_s\to0.$$
The norm
$$\|Y\|_\infty = \big(E[\sup_t|Y_t|^2]\big)^{1/2}\qquad(10.8)$$
is complete; this was shown in Exercise 9.5. Thus there exists a process $N_t$ such that $\sup_{t\ge0}|N_t - \int_0^tH^n_s\,dM_s|\to0$ in $L^2$. If $H^n$ and $\widetilde H^n$ are two sequences converging in the $\|\cdot\|_2$ norm to $H$, then
$$E\Big(\int_0^t(H^n_s - \widetilde H^n_s)\,dM_s\Big)^2 = E\int_0^t(H^n_s - \widetilde H^n_s)^2\,d\langle M\rangle_s\to0,$$
so the limit is independent of which sequence $H^n$ we choose.

It is easy to see, because of the $L^2$ convergence, that $N_t$ is a martingale. If $s<t$ and $A\in\mathcal F_s$, then
$$E\Big[\int_0^tH^n_r\,dM_r;A\Big] = E\Big[\int_0^sH^n_r\,dM_r;A\Big]$$
by Proposition 10.3. Now use that
$$\Big|E\Big[\int_0^tH^n_r\,dM_r - N_t;A\Big]\Big|\le E\Big|\int_0^tH^n_r\,dM_r - N_t\Big|\le\Big(E\Big(\int_0^tH^n_r\,dM_r - N_t\Big)^2\Big)^{1/2}\to0,$$
and similarly with $t$ replaced by $s$. Similar arguments using the $L^2$ convergence show that
$$EN_t^2 = E\int_0^tH_s^2\,d\langle M\rangle_s\qquad(10.9)$$
and
$$\langle N\rangle_t = \int_0^tH_s^2\,d\langle M\rangle_s.\qquad(10.10)$$
Because $\sup_{t\ge0}|N_t - \int_0^tH^n_s\,dM_s|\to0$ in $L^2$, there exists a subsequence $\{n_k\}$ such that the convergence takes place almost surely, that is,
$$\sup_{t\ge0}\Big|\int_0^tH^{n_k}_s\,dM_s - N_t\Big|\to0,\quad\text{a.s.}$$
Since each $\int_0^tH^n_s\,dM_s$ has continuous paths, with probability one $N_t$ has continuous paths. We write $N_t = \int_0^tH_s\,dM_s$ and call $N_t$ the stochastic integral of $H$ with respect to $M$.

We summarize our construction as follows.

Theorem 10.4 Suppose the filtration $\{\mathcal F_t\}$ satisfies the usual conditions and $M_t$ is a square integrable martingale with continuous paths. Suppose $H$ is of the form
$$H_s(\omega) = \sum_{j=1}^JK_j(\omega)1_{(a_j,b_j]}(s),\qquad(10.11)$$
where each $K_j$ is bounded and $\mathcal F_{a_j}$ measurable. In this case define
$$\int_0^tH_s\,dM_s = \sum_{j=1}^JK_j(M_{t\wedge b_j} - M_{t\wedge a_j}).$$
If $H$ is predictable and $E\int_0^\infty H_s^2\,d\langle M\rangle_s<\infty$, choose $H^n$ of the form given in (10.11) with $E\int_0^\infty(H^n_s - H_s)^2\,d\langle M\rangle_s\to0$, and define $N_t = \int_0^tH_s\,dM_s$ to be the limit with respect to the norm (10.8) of $\int_0^tH^n_s\,dM_s$. Then $N_t$ is a continuous martingale,
$$EN_\infty^2 = E\int_0^\infty H_s^2\,d\langle M\rangle_s,\quad\text{and}\quad\langle N\rangle_t = \int_0^tH_s^2\,d\langle M\rangle_s.$$
Moreover, the definition of $N_t$ is independent of the particular choice of the $H^n$.

10.2 Extensions

There are some extensions of the definition that are fairly routine.

Extension 1. If $\int_0^\infty H_s^2\,d\langle M\rangle_s<\infty$, a.s., but without the expectation being finite, let
$$T_N = \inf\Big\{t:\int_0^tH_s^2\,d\langle M\rangle_s>N\Big\}.$$
Then $M'_t = M_{t\wedge T_N}$ is a square integrable martingale with $\langle M'\rangle_t = \langle M\rangle_{t\wedge T_N}$, so
$$\int_0^\infty H_s^2\,d\langle M'\rangle_s\le N.$$
Define $\int_0^tH_s\,dM_s$ to be the quantity $\int_0^tH_s\,dM_{s\wedge T_N}$ if $t\le T_N$. If $t\le T_K\le T_N$, we need to check that
$$\int_0^tH_s\,dM_{s\wedge T_K} = \int_0^tH_s\,dM_{s\wedge T_N},$$
so that our definition is consistent. This is part of Exercise 10.2.
Extension 2. If $M_t$ is a continuous local martingale (see Section 9.1 for the definition), let $S_n = \inf\{t:|M_t|\ge n\}$. By Exercise 9.3, $M_{t\wedge S_n}$ will be a uniformly integrable martingale, and in fact, since $M_{t\wedge S_n}$ is bounded, it is square integrable. For $t\le S_n$ we set
$$\int_0^tH_s\,dM_s = \int_0^tH_s\,dM_{s\wedge S_n}$$
and $\langle M\rangle_t = \langle M\rangle_{t\wedge S_n}$. Again there is consistency to check, which is also part of Exercise 10.2.

Extension 3. Suppose that $X_t = M_t + A_t$ is a semimartingale with continuous paths, so that $M$ is a local martingale and $A$ is a process with paths locally of bounded variation. If $\int_0^\infty H_s^2\,d\langle M\rangle_s + \int_0^\infty|H_s|\,|dA_s|<\infty$, we define
$$\int_0^tH_s\,dX_s = \int_0^tH_s\,dM_s + \int_0^tH_s\,dA_s,$$
where the first integral on the right is a stochastic integral and the second is a Lebesgue–Stieltjes integral. For a semimartingale, we define
$$\langle X\rangle_t = \langle M\rangle_t.\qquad(10.12)$$
Given two semimartingales $X$ and $Y$ we define $\langle X,Y\rangle_t$ by
$$\langle X,Y\rangle_t = \tfrac12\big[\langle X + Y\rangle_t - \langle X\rangle_t - \langle Y\rangle_t\big].$$

Exercises

10.1 Prove (10.5) in Proposition 10.3.

10.2 Check the consistency of the first two extensions of the definition of stochastic integrals.

10.3 Show that if $M$ is a continuous square integrable martingale, and $T$ a finite stopping time, then $\int_0^\infty1_{[0,T]}(s)\,dM_s = M_T$.

10.4 Show that if $N_t = \int_0^tH_s\,dM_s$, where $M$ is a continuous square integrable martingale, $H$ is predictable, and $E\int_0^\infty H_s^2\,d\langle M\rangle_s<\infty$, and $L_t = \int_0^tK_s\,dN_s$, where $K$ is predictable and $E\int_0^\infty K_s^2\,d\langle N\rangle_s<\infty$, then $L_t = \int_0^tH_sK_s\,dM_s$.

10.5 Show that if $M$, $H$, and $N$ are as in Exercise 10.4, then $\langle M,N\rangle_t = \int_0^tH_s\,d\langle M\rangle_s$. Hint: Derive a formula for $\langle N + M\rangle_t$ from the fact that $N_t + M_t = \int_0^t(1 + H_s)\,dM_s$.

10.6 Suppose that $M$ and $L$ are square integrable martingales, $H$ is predictable and satisfies (10.2), and $N_t = \int_0^tH_s\,dM_s$. Show that
$$\langle N,L\rangle_t = \int_0^tH_s\,d\langle M,L\rangle_s.\qquad(10.13)$$
Sometimes the stochastic integral of $H$ with respect to $M$ is defined to be the square integrable martingale $N$ for which (10.13) holds for all square integrable martingales $L$.

10.7 Show that if $M$ and $N$ are square integrable martingales with continuous paths, then $\langle M,N\rangle_t\le(\langle M\rangle_t)^{1/2}(\langle N\rangle_t)^{1/2}$. Hint: Imitate an appropriate proof of the Cauchy–Schwarz inequality. This result is a special case of the inequality of Kunita–Watanabe.

11
Itô's formula

The most important result in the theory of stochastic integration is Itô's formula. This is also known as the change of variables formula. Let $C^k$ be the functions that are $k$ times continuously differentiable and $C^k_b$ those functions in $C^k$ such that the function and its $i$th-order derivatives are bounded for $i\le k$.
Theorem 11.1 Let $X_t$ be a semimartingale with continuous paths and suppose $f\in C^2$. Then for almost every $\omega$
$$f(X_t) = f(X_0) + \int_0^tf'(X_s)\,dX_s + \frac12\int_0^tf''(X_s)\,d\langle X\rangle_s,\qquad t\ge0.\qquad(11.1)$$

Step 1 will be to reduce to the case when $f\in C^3_b$ and $X$ has appropriate boundedness conditions. Step 2 is a use of Taylor's formula; see (11.2). Step 3 shows that each term converges to the appropriate quantity, and Step 4 removes the restriction that $f$ be in $C^3_b$.

Proof Step 1. If $X_t = M_t + A_t$ is the decomposition of $X$ into a local martingale $M$ and a process $A$ that has paths locally of bounded variation, let $V_t$ be the total variation of $A$ up to time $t$: $V_t = \int_0^t|dA_s|$. Let
$$T_N = \inf\{t:|M_t|>N\text{ or }\langle M\rangle_t>N\text{ or }V_t>N\}.$$
By the continuity of paths, $T_N\to\infty$, a.s., as $N\to\infty$, so for almost every $\omega$ and for each $t$, $t\wedge T_N = t$ for $N$ large enough. Since Itô's formula is a path-by-path result, it suffices to prove Itô's formula for $X_{t\wedge T_N}$ for each $N$, or what amounts to the same thing, we may take $N$ arbitrary and assume $M_t$, $\langle M\rangle_t$, $A_t$, and $V_t$ are all bounded by $N$. In this case, $X_t$ is bounded by $2N$.

Since $X$ is bounded, we may modify $f$, $f'$, and $f''$ outside of $[-2N,2N]$ without affecting the validity of Itô's formula. Therefore we will also assume $f\in C^2$ with compact support. Let us temporarily assume in addition that $f'''$ exists and is continuous; we will remove this last assumption later on.

Let $t_0>0$, $\varepsilon>0$, $S_0 = 0$, and define
$$S_{i+1} = S_{i+1}(\varepsilon) = \inf\{t>S_i:|M_t - M_{S_i}|>\varepsilon\text{ or }\langle M\rangle_t - \langle M\rangle_{S_i}>\varepsilon\text{ or }V_t - V_{S_i}>\varepsilon\}\wedge t_0.$$
Note $S_i = t_0$ for $i$ sufficiently large (how large depends on $\omega$) by the continuity of the paths.
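To make the stopping times $S_i$ concrete, here is a minimal sketch on a discretized path. It assumes $M = W$ is a Brownian motion sampled on a uniform grid over $[0,1]$, so $\langle M\rangle_t = t$, and takes $A = 0$ (hence $V = 0$). The helper `stopping_partition` is a hypothetical name used only for illustration. On a grid the "first time the threshold is exceeded" becomes the first grid index at which it is exceeded.

```python
import math
import random

def stopping_partition(m, qv, v, eps):
    """Grid analogue of S_{i+1} = inf{t > S_i : |M_t - M_{S_i}| > eps or
    <M>_t - <M>_{S_i} > eps or V_t - V_{S_i} > eps}, capped at t0.

    m, qv, v: samples of M, <M>, V on one time grid whose last index plays
    the role of t0. Returns grid indices 0 = S_0 < S_1 < ... < S_last.
    """
    last = len(m) - 1
    stops = [0]
    s = 0  # index of the most recent stopping time
    for k in range(1, last + 1):
        if (abs(m[k] - m[s]) > eps or qv[k] - qv[s] > eps
                or v[k] - v[s] > eps):
            stops.append(k)
            s = k
    if stops[-1] != last:
        stops.append(last)  # the final S_i equals t0
    return stops

# Example: M = W on [0, 1], so <M>_t = t, and A = 0 (so V = 0).
random.seed(1)
n, t0 = 10_000, 1.0
dt = t0 / n
m = [0.0]
for _ in range(n):
    m.append(m[-1] + random.gauss(0.0, math.sqrt(dt)))
qv = [k * dt for k in range(n + 1)]
v = [0.0] * (n + 1)
stops = stopping_partition(m, qv, v, eps=0.1)
```

Between consecutive indices in `stops`, none of the three processes has moved by more than `eps` from its value at the left endpoint. This is the property used to control the cross terms and the remainder in Steps 2 and 3. On a grid the overshoot at the stopping index itself is one grid increment. That overshoot has no analogue for continuous paths.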
Step 2. The key idea to proving Itô's formula is Taylor's theorem. We write
$$\begin{aligned}f(X_{t_0}) - f(X_0) &= \sum_{i=0}^\infty[f(X_{S_{i+1}}) - f(X_{S_i})]\qquad(11.2)\\ &= \sum_{i=0}^\infty f'(X_{S_i})(X_{S_{i+1}} - X_{S_i}) + \frac12\sum_{i=0}^\infty f''(X_{S_i})(X_{S_{i+1}} - X_{S_i})^2 + \sum_{i=0}^\infty R_i,\end{aligned}$$
where $R_i$ is the remainder term. We have $|R_i|\le c\|f'''\|_\infty|X_{S_{i+1}} - X_{S_i}|^3$.
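The expansion (11.2) can be checked numerically. The sketch below assumes $X = W$ is a standard Brownian motion started at 0 and uses only the Python standard library. `ito_check` is a hypothetical helper name. A fixed grid $t_k = kt/n$ replaces the stopping times $S_i$, as in Exercise 11.6. This is adequate for an illustration but is not the construction used in the proof. With $f = \sin$, the left-point sum plus the $\frac12\int f''(W_s)\,ds$ term should match $f(W_t) - f(W_0)$ up to discretization error.

```python
import math
import random

def ito_check(f, df, d2f, n=100_000, t=1.0, seed=0):
    """Return f(W_t) - f(W_0), the discretized right side of (11.1) for X = W
    (so d<X>_s = ds), and the left-point sum without the f'' correction."""
    rng = random.Random(seed)
    dt = t / n
    w = 0.0
    ito_sum = 0.0     # sum of f'(W_{t_k}) * (W_{t_{k+1}} - W_{t_k})
    correction = 0.0  # (1/2) * sum of f''(W_{t_k}) * dt
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        ito_sum += df(w) * dw
        correction += 0.5 * d2f(w) * dt
        w += dw
    return f(w) - f(0.0), ito_sum + correction, ito_sum

lhs, rhs, without_correction = ito_check(math.sin, math.cos, lambda x: -math.sin(x))
print(lhs, rhs, without_correction)
```

The gap between `lhs` and `rhs` comes from $\frac12\sum f''(W_{t_k})[(\Delta W_k)^2 - \Delta t]$ and the cubic remainders. Both shrink as the grid is refined, mirroring Steps 3 and 4.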
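Step 3. Let us first look at the terms with $f'$ in them. Let $H^\varepsilon_s = f'(X_{S_i})$ if $S_i<s\le S_{i+1}$. By the continuity of $f'$ and $X_s$, we see that $H^\varepsilon_s$ converges boundedly and pointwise to $f'(X_s)$ as $\varepsilon\to0$. In particular, $\int_0^{t_0}|H^\varepsilon_s - f'(X_s)|\,dV_s\to0$ boundedly, hence $E\int_0^{t_0}|H^\varepsilon_s - f'(X_s)|\,dV_s\to0$. Also,
$$E\Big(\int_0^{t_0}(H^\varepsilon_s - f'(X_s))\,dM_s\Big)^2 = E\int_0^{t_0}|H^\varepsilon_s - f'(X_s)|^2\,d\langle M\rangle_s\to0$$
as $\varepsilon\to0$. We then have
$$\sum_if'(X_{S_i})(X_{S_{i+1}} - X_{S_i}) = \int_0^{t_0}H^\varepsilon_s\,(dM_s + dA_s)\to\int_0^{t_0}f'(X_s)\,(dM_s + dA_s),$$
which leads to the $f'$ term in Itô's formula.

Next let us look at the $f''$ terms. We can write
$$(X_{S_{i+1}} - X_{S_i})^2 = (M_{S_{i+1}} - M_{S_i})^2 + 2(M_{S_{i+1}} - M_{S_i})(A_{S_{i+1}} - A_{S_i}) + (A_{S_{i+1}} - A_{S_i})^2.$$
By the definition of the $S_i$, $|M_{S_{i+1}} - M_{S_i}|\le\varepsilon$, so $\sum_if''(X_{S_i})(M_{S_{i+1}} - M_{S_i})(A_{S_{i+1}} - A_{S_i})$ is bounded in absolute value by
$$\sum_i\varepsilon\|f''\|_\infty|A_{S_{i+1}} - A_{S_i}|\le\varepsilon\|f''\|_\infty\int_0^{t_0}dV_s\le\varepsilon\|f''\|_\infty N,$$
which goes to 0 as $\varepsilon\to0$. Similarly the expression $\sum_if''(X_{S_i})(A_{S_{i+1}} - A_{S_i})^2$ also goes to 0. Therefore we need to show
$$\sum_if''(X_{S_i})(M_{S_{i+1}} - M_{S_i})^2\to\int_0^{t_0}f''(X_s)\,d\langle X\rangle_s.$$
By an argument very similar to the one for the $f'$ terms,
$$\frac12\sum_if''(X_{S_i})(\langle M\rangle_{S_{i+1}} - \langle M\rangle_{S_i})\to\frac12\int_0^{t_0}f''(X_s)\,d\langle M\rangle_s,\qquad(11.3)$$
and since $\langle X\rangle_t = \langle M\rangle_t$ for semimartingales (see (10.12)), the right-hand side of (11.3) is the correct $f''$ term. We thus need to show that
$$\sum_if''(X_{S_i})\big[(M_{S_{i+1}} - M_{S_i})^2 - (\langle M\rangle_{S_{i+1}} - \langle M\rangle_{S_i})\big]\to0\qquad(11.4)$$
as $\varepsilon\to0$. We will show
$$E\Big(\sum_{i=0}^\infty B_i\Big)^2\to0,\qquad(11.5)$$
where $B_i = f''(X_{S_i})\big[(M_{S_{i+1}} - M_{S_i})^2 - (\langle M\rangle_{S_{i+1}} - \langle M\rangle_{S_i})\big]$. We have
$$E\Big(\sum_iB_i\Big)^2 = E\sum_iB_i^2 + 2E\sum_{i<j}B_iB_j.$$
If $i<j$, then $E[B_iB_j] = E\big[B_iE[B_j\mid\mathcal F_{S_{i+1}}]\big]$. Since $S_{i+1}\le S_j$ and $f''(X_{S_j})$ is $\mathcal F_{S_j}$ measurable, (9.2) gives
$$E[B_j\mid\mathcal F_{S_{i+1}}] = E\Big[f''(X_{S_j})\,E\big[(M_{S_{j+1}} - M_{S_j})^2 - (\langle M\rangle_{S_{j+1}} - \langle M\rangle_{S_j})\bigm|\mathcal F_{S_j}\big]\Bigm|\mathcal F_{S_{i+1}}\Big] = 0,$$
so the cross products vanish. Therefore to prove (11.5) it remains to show $E\sum_iB_i^2\to0$ as $\varepsilon\to0$.

We use the easy inequality $(x + y)^2\le2x^2 + 2y^2$. Since $f''$ is bounded,
$$E\sum_iB_i^2\le2\|f''\|_\infty^2\sum_iE[(M_{S_{i+1}} - M_{S_i})^4] + 2\|f''\|_\infty^2\sum_iE[(\langle M\rangle_{S_{i+1}} - \langle M\rangle_{S_i})^2].\qquad(11.6)$$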
Since $|M_{S_{i+1}} - M_{S_i}|\le\varepsilon$, the first sum on the right-hand side of (11.6) is bounded by
$$2\varepsilon^2\|f''\|_\infty^2\sum_iE[(M_{S_{i+1}} - M_{S_i})^2] = 2\varepsilon^2\|f''\|_\infty^2E[M_{t_0}^2 - M_0^2]\le8\varepsilon^2\|f''\|_\infty^2N^2.$$
Since $\langle M\rangle_{S_{i+1}} - \langle M\rangle_{S_i}\le\varepsilon$, the second sum on the right-hand side of (11.6) is bounded by
$$2\varepsilon\|f''\|_\infty^2\sum_iE[\langle M\rangle_{S_{i+1}} - \langle M\rangle_{S_i}]\le2\varepsilon\|f''\|_\infty^2E\langle M\rangle_{t_0}\le2\varepsilon\|f''\|_\infty^2N.$$
Both of these tend to 0 as $\varepsilon\to0$. Therefore $E\sum_iB_i^2\to0$, and the proof of the convergence for the $f''$ term is complete.

The final terms to examine are the remainder terms. We have shown that $E\sum_i(X_{S_{i+1}} - X_{S_i})^2$ remains bounded as $\varepsilon\to0$. Since $|R_i|\le c\varepsilon\|f'''\|_\infty(X_{S_{i+1}} - X_{S_i})^2$, we see $E\sum_i|R_i|\to0$ as $\varepsilon\to0$.

Step 4. To finish up, we remove the assumption that $f\in C^3$. (We still assume that $f\in C^2$ with compact support.) Take a sequence $\{f_m\}$ of $C^3$ functions such that $f_m$, $f'_m$, and $f''_m$ converge uniformly to $f$, $f'$, and $f''$, respectively. Apply Itô's formula with $f_m$ and then let $m\to\infty$. The terms $f_m(X_t)$ and $f_m(X_0)$ clearly converge to $f(X_t)$ and $f(X_0)$. The $f'_m$ terms converge because
$$E\Big(\int_0^{t_0}(f'_m(X_s) - f'(X_s))\,dM_s\Big)^2 = E\int_0^{t_0}|f'_m(X_s) - f'(X_s)|^2\,d\langle M\rangle_s\to0$$
and
$$E\Big|\int_0^{t_0}(f'_m(X_s) - f'(X_s))\,dA_s\Big|\le E\int_0^{t_0}|f'_m(X_s) - f'(X_s)|\,dV_s\to0$$
as $m\to\infty$. The $f''_m$ terms converge by dominated convergence.

This shows that (11.1) holds for each $t_0$, except for a null set $\mathcal N_{t_0}$ depending on $t_0$. Let $\mathcal N = \bigcup_{t\in\mathbb Q_+}\mathcal N_t$, where $\mathbb Q_+$ denotes the non-negative rationals. If $\omega\notin\mathcal N$, then (11.1) holds for every rational $t_0$. Each term in (11.1) is continuous in $t$, a.s. (with a null set $\mathcal N'$ independent of $t_0$). Therefore if $\omega\notin\mathcal N\cup\mathcal N'$, (11.1) holds for all $t_0$.

There is a multivariate version of Itô's formula, which is proved in a very similar way:

Theorem 11.2 Suppose $X^1_t,\dots,X^d_t$ are continuous semimartingales, $X_t = (X^1_t,\dots,X^d_t)$, and $f$ is a $C^2$ function on $\mathbb R^d$. Then with probability one,
$$f(X_t) = f(X_0) + \int_0^t\sum_{i=1}^d\frac{\partial f}{\partial x_i}(X_s)\,dX^i_s + \frac12\int_0^t\sum_{i,j=1}^d\frac{\partial^2f}{\partial x_i\partial x_j}(X_s)\,d\langle X^i,X^j\rangle_s\qquad(11.7)$$
for all $t\ge0$.
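As a check on (11.7), take $d = 2$ and $f(x,y) = xy$. Then $\partial^2f/\partial x\partial y = 1$, and (11.7) reduces to the product formula $X_tY_t = X_0Y_0 + \int Y\,dX + \int X\,dY + \langle X,Y\rangle_t$. The sketch assumes $X = W^1$ and $Y = \rho W^1 + \sqrt{1-\rho^2}\,W^2$ for independent Brownian motions $W^1$ and $W^2$, so that $\langle X,Y\rangle_t = \rho t$. The helper name and the value $\rho = 0.6$ are illustrative choices. On a grid, $X_tY_t = \sum X\,\Delta Y + \sum Y\,\Delta X + \sum\Delta X\,\Delta Y$ holds exactly. The last sum is the discrete stand-in for $\langle X,Y\rangle_t$.

```python
import math
import random

def product_rule_check(rho=0.6, n=100_000, t=1.0, seed=2):
    """Left-point sums for int X dY, int Y dX and the discrete covariation
    of two correlated Brownian motions started at 0."""
    rng = random.Random(seed)
    dt = t / n
    x = y = 0.0
    int_x_dy = int_y_dx = cross = 0.0
    for _ in range(n):
        z1 = rng.gauss(0.0, math.sqrt(dt))
        z2 = rng.gauss(0.0, math.sqrt(dt))
        dx = z1
        dy = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2  # correlated increment
        int_x_dy += x * dy
        int_y_dx += y * dx
        cross += dx * dy  # discrete analogue of d<X,Y>
        x += dx
        y += dy
    return x * y, int_x_dy, int_y_dx, cross

xy, ixdy, iydx, cross = product_rule_check()
print(xy, ixdy + iydx + cross, cross)
```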
The following is known as the integration by parts formula or Itô's product formula, and is very useful.

Corollary 11.3 If $X$ and $Y$ are semimartingales with continuous paths, then
$$X_tY_t = X_0Y_0 + \int_0^tX_s\,dY_s + \int_0^tY_s\,dX_s + \langle X,Y\rangle_t.$$

Proof By Itô's formula,
$$X_t^2 = X_0^2 + 2\int_0^tX_s\,dX_s + \langle X\rangle_t.$$
The analogous formula holds when $X$ is replaced by $Y$ and when $X$ is replaced by $X + Y$. We then use $X_tY_t = \frac12[(X_t + Y_t)^2 - X_t^2 - Y_t^2]$. Substituting the formulas for $X_t^2$, $Y_t^2$, and $(X_t + Y_t)^2$ that we obtained from Itô's formula and doing some algebra yields our result.

Exercises

11.1 Suppose $W_t$ is a Brownian motion and $a\in\mathbb R$. Show that the amount of time Brownian motion spends at the point $a$ is zero, i.e., that $\int_0^t1_{\{a\}}(W_s)\,ds = 0$, a.s., for all $t>0$.
11.2 Let $a<b$ and let $f_{a,b}$ be the $C^1$ function such that $f_{a,b}(0) = f'_{a,b}(0) = 0$ and $f'_{a,b}(x) = \int_0^x1_{[a,b]}(y)\,dy$, $x\in\mathbb R$. In other words, $f_{a,b}$ is the function whose second derivative is $1_{[a,b]}$, except that the second derivative is not defined at $a$ and $b$. Show Itô's formula holds for $f_{a,b}$:
$$f_{a,b}(W_t) = \int_0^tf'_{a,b}(W_s)\,dW_s + \frac12\int_0^t1_{[a,b]}(W_s)\,ds.$$

11.3 If $W_t$ is a Brownian motion, $a>0$, and $T = \inf\{t>0:|W_t| = a\}$, calculate $E\int_0^T(W_s)^k\,ds$ for each non-negative integer $k$. Also calculate
$$E\int_0^T1_{[b_1,b_2]}(W_s)\,ds$$
if $[b_1,b_2]\subset[-a,a]$.
11.4 Let $W$ be a Brownian motion, let $t_0<t_1<\cdots<t_n = 1$, and let $B_i = (W_{t_i} - W_{t_{i-1}})^2 - (t_i - t_{i-1})$. Show there exists a constant $c_1$ not depending on $\{t_0,\dots,t_n\}$ such that
$$E\Big(\sum_{i=1}^nB_i\Big)^2\le c_1\max_{1\le i\le n}|t_i - t_{i-1}|.$$

11.5 Use Exercise 11.4 and the Borel–Cantelli lemma to prove that if $W$ is a Brownian motion, then
$$\lim_{n\to\infty}\sum_{k=1}^{2^n}(W_{k/2^n} - W_{(k-1)/2^n})^2 = 1,\quad\text{a.s.}$$

11.6 In our proof of Itô's formula, the use of stopping times simplifies the proof considerably. This exercise considers a proof of Itô's formula using fixed times. Suppose $M$ is a bounded continuous martingale, $A$ is a continuous process whose paths have total variation bounded by $N>0$, a.s., and $X_t = M_t + A_t$.

(1) Writing $[x]$ for the integer part of $x$, prove that for each $t$,
$$\sum_{i=1}^{[2^nt]+1}(X_{(i+1)/2^n} - X_{i/2^n})^2$$
converges in probability to $\langle X\rangle_t$.

(2) Prove that if $f$ is a $C^2$ function whose second derivative is bounded, then
$$\sum_{i=1}^{[2^nt]+1}f''(X_{i/2^n})(X_{(i+1)/2^n} - X_{i/2^n})^2$$
converges in probability to $\int_0^tf''(X_s)\,d\langle X\rangle_s$.

Since the increments of $M$ and $A$ are not uniformly bounded by something small, this is much harder than the proof of Theorem 11.1 given in this chapter.
11.7 Here is an alternate way to prove Itô's formula.

(1) Suppose $X = M + A$, where $M$ and $A$ are as in Exercise 11.6. Write
$$X_t^2 - X_0^2 = \sum_{i=0}^{[t2^n]-1}(X_{(i+1)/2^n}^2 - X_{i/2^n}^2) = \sum_{i=0}^{[t2^n]-1}2X_{i/2^n}(X_{(i+1)/2^n} - X_{i/2^n}) + \sum_{i=0}^{[t2^n]-1}(X_{(i+1)/2^n} - X_{i/2^n})^2.$$
Use Exercise 11.6 to show that Itô's formula holds when $f(x) = x^2$.

(2) Derive the Itô product formula. Then use induction to show that Itô's formula holds when $f(x) = x^n$, $n$ a positive integer.

(3) Given $f\in C^2$, find polynomials $P_m$ such that $P_m$, $P'_m$, $P''_m$ converge uniformly to $f$, $f'$, $f''$, respectively, on a compact interval as $m\to\infty$. Apply Itô's formula for $P_m$ and show that one can take limits to derive Itô's formula for $f$.
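Exercises 11.4 and 11.5 concern the dyadic sums $\sum_{k=1}^{2^n}(W_{k/2^n} - W_{(k-1)/2^n})^2$. A quick simulation shows these sums settling near $\langle W\rangle_1 = 1$ as $n$ grows. It uses only the standard library, and the helper name is hypothetical. It simulates $W$ once on the finest grid and subsamples it for the coarser levels, so all levels refer to the same path. This is only an illustration of the statement, not a substitute for the Borel–Cantelli argument the exercise asks for.

```python
import math
import random

def dyadic_quadratic_variations(max_level=16, seed=3):
    """Simulate W on the grid k/2^max_level of [0, 1] and return, for each
    level n <= max_level, the sum over k of (W_{k/2^n} - W_{(k-1)/2^n})^2."""
    rng = random.Random(seed)
    size = 2 ** max_level
    sd = math.sqrt(1.0 / size)
    w = [0.0]
    for _ in range(size):
        w.append(w[-1] + rng.gauss(0.0, sd))
    sums = {}
    for n in range(max_level + 1):
        step = 2 ** (max_level - n)  # level-n grid is every step-th point
        sums[n] = sum((w[k] - w[k - step]) ** 2
                      for k in range(step, size + 1, step))
    return sums

qv = dyadic_quadratic_variations()
for n in (2, 6, 10, 16):
    print(n, round(qv[n], 4))
```

At level $n$ the sum has variance $2\cdot2^{-n}$, which is the $L^2$ estimate of Exercise 11.4 with $c_1 = 2$ for equally spaced points.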

12
Some applications of Itô’s formula
We will be using Itô’s formula throughout the book. In this chapter we give some applications,
each of which will turn out to be quite useful.
12.1 Lévy’s theorem
The following is known as Lévy's theorem. Recall that if $M$ is a local martingale with continuous paths and $T_N = \inf\{t:|M_t|\ge N\}$, we defined $\langle M\rangle_t$ to be equal to $\langle M\rangle_{t\wedge T_N}$ if $t\le T_N$; see Section 10.2. Moreover, by Exercise 9.3, $M_{t\wedge T_N}$ is a square integrable martingale for each $N$.
Theorem 12.1 Let $M_t$ be a continuous local martingale with respect to a filtration $\{\mathcal F_t\}$ satisfying the usual conditions such that $M_0 = 0$ and $\langle M\rangle_t = t$. Then $M_t$ is a Brownian motion with respect to $\{\mathcal F_t\}$.
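Lévy's theorem says that the quadratic variation alone identifies Brownian motion. A standard illustration, not taken from the text, is $M_t = \int_0^t\operatorname{sgn}(W_s)\,dW_s$. The integrand is predictable with $H_s^2 = 1$, so $\langle M\rangle_t = t$, and Theorem 12.1 implies $M$ is a Brownian motion even though $M$ is built from $W$ in a nonlinear way. The sketch below uses the standard library, hypothetical helper names, and the convention $\operatorname{sgn}(0) = 1$. It estimates $E\cos(uM_1)$ and $EM_1^2$, which should be close to $e^{-u^2/2}$ and $1$. In this left-point discretization the conclusion is in fact exact: each increment $\operatorname{sgn}(W_{t_k})\Delta W_k$ is $N(0,\Delta t)$ and independent of $\mathcal F_{t_k}$.

```python
import math
import random

def sign(x):
    # sgn with the convention sgn(0) = 1, so that H_s^2 = 1 and <M>_t = t
    return 1.0 if x >= 0 else -1.0

def levy_sample(n_paths=4000, n_steps=100, t=1.0, seed=4):
    """Samples of M_t = int_0^t sgn(W_s) dW_s via left-point sums."""
    rng = random.Random(seed)
    dt = t / n_steps
    samples = []
    for _ in range(n_paths):
        w = m = 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            m += sign(w) * dw  # integrand evaluated at the left endpoint
            w += dw
        samples.append(m)
    return samples

ms = levy_sample()
u = 1.0
char_fn = sum(math.cos(u * x) for x in ms) / len(ms)  # real part of E e^{iuM_1}
second_moment = sum(x * x for x in ms) / len(ms)
print(char_fn, math.exp(-u * u / 2), second_moment)
```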
Proof Fix $t_0$ and let $N_t = M_{t+t_0} - M_{t_0}$, $\mathcal F'_t = \mathcal F_{t+t_0}$. It is routine to check that $N_t$ is a martingale with respect to $\mathcal F'_t$ and that $\langle N\rangle_t = t$. Note $\mathcal F'_0$ will not be the trivial σ-field in general. We see that
$$EN_t^2 = EM_{t+t_0}^2 - EM_{t_0}^2 = t<\infty.$$
If $f$ is a function mapping the reals to the complex numbers, we may still use Itô's formula: just apply Itô's formula to the real and imaginary parts of $f$. Doing this for $f(x) = e^{iux}$, where $u$ and $x$ are real, we have
$$e^{iuN_t} = 1 + iu\int_0^te^{iuN_s}\,dN_s - \frac{u^2}2\int_0^te^{iuN_s}\,ds.\qquad(12.1)$$
If we take $T_K = \inf\{t:|N_t|\ge K\}$, then
$$e^{iuN_{t\wedge T_K}} = 1 + iu\int_0^{t\wedge T_K}e^{iuN_s}\,dN_s - \frac{u^2}2\int_0^{t\wedge T_K}e^{iuN_s}\,ds.\qquad(12.2)$$
Take $A\in\mathcal F'_0$, multiply (12.2) by $1_A$, and take expectations. The stochastic integral is a martingale, so this term will have 0 expectation. Then let $K\to\infty$, using bounded convergence (since $|e^{iux}| = 1$), and we are left with
$$E[e^{iuN_t};A] = P(A) - \frac{u^2}2\int_0^tE[e^{iuN_s};A]\,ds.\qquad(12.3)$$
We used the Fubini theorem here. (The reason we introduced the stopping time $T_K$ is that $N_{t\wedge T_K}$ is a square integrable martingale, and hence the stochastic integral is a martingale. We might run into integrability problems if we worked with (12.1) instead of (12.2).)

Write $J(t) = E[e^{iuN_t};A]$, so we have
$$J(t) = P(A) - \frac{u^2}2\int_0^tJ(s)\,ds.\qquad(12.4)$$
Since $J$ is bounded, (12.4) shows that $J$ is continuous. Since $J$ is continuous, using (12.4) again shows that $J$ is differentiable. Hence $J'(t) = -\frac{u^2}2J(t)$ with $J(0) = P(A)$. The only solution to this ordinary differential equation is
$$J(t) = P(A)e^{-u^2t/2}.\qquad(12.5)$$
If we set $A = \Omega$, this tells us that $Ee^{iuN_t} = e^{-u^2t/2}$, and by the uniqueness theorem for characteristic functions (Theorem A.48), $M_{t+t_0} - M_{t_0}$ is a mean zero normal random variable with variance $t$. Equation (12.5) also tells us that
$$E[e^{iuN_t};A] = E[e^{iuN_t}]P(A)\qquad(12.6)$$
when $A\in\mathcal F'_0$.

Let $f$ be a $C^\infty$ function with compact support. The Fourier transform $\hat f(u)$ will be in the Schwartz class; see Section B.2. Replacing $u$ by $-u$ in (12.6), multiplying the resulting equation by $\hat f(u)$, and integrating over $u\in\mathbb R$, we have
$$\int\hat f(u)E[e^{-iuN_t};A]\,du = \int\hat f(u)E[e^{-iuN_t}]\,du\;P(A).$$
Using the Fubini theorem and the Fourier inversion theorem, and dividing by a constant, we conclude
$$E[f(N_t);A] = E[f(N_t)]P(A).$$
Since $\hat f$ is in the Schwartz class, integrability is not a problem when applying the Fubini theorem. A limit argument shows that this equation holds with $f$ equal to $1_B$, where $B$ is a Borel subset of $\mathbb R$, hence
$$P(M_{t+t_0} - M_{t_0}\in B,A) = P(M_{t+t_0} - M_{t_0}\in B)P(A).$$
This shows the independence of $M_{t+t_0} - M_{t_0}$ and $\mathcal F_{t_0}$. We thus see that $M_t$ is a continuous process starting at 0 with $M_{t+t_0} - M_{t_0}$ being a mean zero normal random variable with variance $t$ independent of $\mathcal F_{t_0}$, and therefore $M$ is a Brownian motion.

12.2 Time changes of martingales

The next theorem says that most continuous martingales arise from Brownian motion via a time change. That is, the paths are the same, but the rate at which one moves along the paths varies. In fact, it is possible to show that all continuous martingales arise from a time change of a Brownian motion that is possibly stopped at a random time.

Theorem 12.2 Suppose $M_t$ is a continuous local martingale, $M_0 = 0$, $\langle M\rangle_t$ is strictly increasing, and $\lim_{t\to\infty}\langle M\rangle_t = \infty$, a.s. Let $\tau(t) = \inf\{u:\langle M\rangle_u\ge t\}$. Then $W_t = M_{\tau(t)}$ is a Brownian motion with respect to $\mathcal F'_t = \mathcal F_{\tau(t)}$.

Proof Let us first suppose that $W_t^2$ is integrable. We have by Proposition 9.3 that
$$E[W_t\mid\mathcal F'_s] = E[M_{\tau(t)}\mid\mathcal F_{\tau(s)}] = M_{\tau(s)} = W_s,$$
so $W_t$ is a continuous martingale. Similarly, since $\langle M\rangle_{\tau(t)} = t$, $W_t^2 - t$ is a martingale. Now apply Lévy's theorem, Theorem 12.1. Removing the assumption that $W_t^2$ is integrable is left as Exercise 12.1.

12.3 Quadratic variation

Itô's formula allows us to prove Theorem 9.10 fairly simply.

Proof of Theorem 9.10 If $T_K = \inf\{t:|M_t|\ge K\}$, we will show that
$$\sum_{k=0}^{[t_02^n]}(M_{T_K\wedge(k+1)/2^n} - M_{T_K\wedge k/2^n})^2$$
converges to $\langle M\rangle_{t_0\wedge T_K}$. Since $T_K\to\infty$ as $K\to\infty$, this will prove the theorem. Thus we may assume $M$ is bounded by $K$. If $s>0$ and we let $N_t = M_{s+t} - M_s$, then $N_t$ is a martingale with respect to the filtration
$\mathcal F'_t = \mathcal F_{s+t}$, and we can check that $\langle N\rangle_t = \langle M\rangle_{t+s} - \langle M\rangle_s$. By Itô's formula applied to the process $N$, we obtain
$$(M_{t+s} - M_s)^2 = 2\int_s^{s+t}(M_r - M_s)\,dM_r + (\langle M\rangle_{t+s} - \langle M\rangle_s).$$
Applying this with $t = 1/2^n$ and $s = k/2^n$ and summing, we see that, with $u_n = ([t_02^n] + 1)/2^n$,
$$\sum_{k=0}^{[t_02^n]}(M_{(k+1)/2^n} - M_{k/2^n})^2 - \langle M\rangle_{t_0} = 2\int_0^{u_n}L^n_r\,dM_r + R,\qquad(12.7)$$
where $L^n_r = M_r - M_{k/2^n}$ for $k/2^n\le r<(k+1)/2^n$ and $R = \langle M\rangle_{u_n} - \langle M\rangle_{t_0}$. Note
$$E\Big(2\int_0^{u_n}L^n_r\,dM_r\Big)^2 = 4E\int_0^{u_n}(L^n_r)^2\,d\langle M\rangle_r.\qquad(12.8)$$
The integrand $(L^n_r)^2$ is bounded by $4K^2$, $E\langle M\rangle_t = E[M_t^2 - M_0^2]\le K^2$ is finite for each $t$, and $L^n_r$ tends to 0 as $n\to\infty$ by the continuity of paths. By dominated convergence, the right-hand side of (12.8) tends to 0 as $n\to\infty$. As for the remainder term, $R$ goes to 0 by the continuity of the paths of $\langle M\rangle_t$. The reason we only have convergence in probability rather than in $L^2$ is due to the stopping time argument involving $T_K$.

12.4 Martingale representation

The next theorem says that every martingale adapted to the filtration of a Brownian motion can be expressed as a stochastic integral with respect to the Brownian motion. This used to be a rather arcane result that was of interest only to probabilists specializing in martingales. But then it turned out that this theorem is the basis for showing the completeness of the market in the theory of financial mathematics; see Chapter 28. The martingale representation theorem is also key to the innovations approach to stochastic filtering; see Chapter 29.

Theorem 12.3 Let $\mathcal F_t$ be the minimal augmented filtration generated by a one-dimensional Brownian motion $W_t$, let $t_0>0$, and let $Y$ be $\mathcal F_{t_0}$ measurable with $EY^2<\infty$. There exists a predictable process $H_s$ with $E\int_0^{t_0}H_s^2\,ds<\infty$ such that
$$Y = EY + \int_0^{t_0}H_s\,dW_s,\quad\text{a.s.}\qquad(12.9)$$

The proof consists of showing (12.9) holds for successively larger classes of random variables. Step 1 of the proof shows that the equation holds for random variables of the form $e^{iu(W_t - W_s)}$, and Step 2 shows that (12.9) holds for products of such random variables. In Step 3, it is shown that if the equation holds for a set of random variables, it holds for the closure of that set with respect to the $L^2$ norm.

Proof Step 1. Let $X_t = iuW_t + u^2t/2$. Note $\langle X\rangle_t = (iu)^2\langle W\rangle_t = -u^2t$.
By Itô's formula applied with $f(x) = e^x$,
$$e^{iuW_t + u^2t/2} = 1 + \int_0^te^{X_r}\,d(iuW_r + u^2r/2) + \frac12\int_0^t(-u^2)e^{X_r}\,dr = 1 + \int_0^tiue^{iuW_r + u^2r/2}\,dW_r.$$
Therefore
$$e^{iuW_t} = e^{-u^2t/2} + \int_0^tiue^{iuW_r + u^2r/2 - u^2t/2}\,dW_r.\qquad(12.10)$$
The integrand in the stochastic integral in (12.10) is $e^{iuW_r}$ times a deterministic function, hence is predictable. Therefore (12.9) holds when $Y = e^{iuW_t}$. Moreover, the support of $H$ in this case is contained in $[0,t]$, that is, $H_r = 0$ if $r\notin[0,t]$. Similarly, (12.9) holds when $Y = e^{iu(W_t - W_s)}$, and in this case the support of the corresponding $H$ is contained in $[s,t]$.

Step 2. Suppose now that $Y_1$ and $Y_2$ are two random variables for which (12.9) holds with the supports of the corresponding $H_1$ and $H_2$ overlapping by at most finitely many points. To be more precise, if
$$Y_i = EY_i + \int_0^{t_0}H_i(s)\,dW_s,\qquad i = 1,2,$$
then we suppose that, with probability one, $H_1(s)H_2(s) = 0$ except for finitely many points $s$. This implies $\int_0^{t_0}H_1(s)H_2(s)\,ds = 0$.

Let
$$Z_i(t) = EY_i + \int_0^tH_i(s)\,dW_s,\qquad i = 1,2.$$
Note $Z_i(0) = EY_i$ and $Z_i(t_0) = Y_i$. Then by the product formula (Corollary 11.3),
$$\begin{aligned}Y_1Y_2 &= (EY_1)(EY_2) + \int_0^{t_0}Z_1(s)\,dZ_2(s) + \int_0^{t_0}Z_2(s)\,dZ_1(s) + \langle Z_1,Z_2\rangle_{t_0}\\ &= (EY_1)(EY_2) + \int_0^{t_0}Z_1(s)H_2(s)\,dW_s + \int_0^{t_0}Z_2(s)H_1(s)\,dW_s + \int_0^{t_0}H_1(s)H_2(s)\,ds\\ &= (EY_1)(EY_2) + \int_0^{t_0}K_s\,dW_s,\end{aligned}\qquad(12.11)$$
where $K_s = Z_1(s)H_2(s) + Z_2(s)H_1(s)$, and so the support of $K_s$ is contained in the union of the supports of $H_1(s)$ and $H_2(s)$. Taking an expectation in (12.11), $E[Y_1Y_2] = (EY_1)(EY_2)$. Thus (12.9) holds for $Y_1Y_2$. Using induction, (12.9) will hold for the product of $n$ random variables $Y_i$, $i = 1,\dots,n$, provided the supports of any two of the corresponding $H_i$ overlap by at most finitely many values of $s$. Combining this with Step 1, we see that if $s_1<s_2<\cdots<s_{n+1}\le t_0$, then the random variables of the form
$$Y = \exp\Big(i\sum_{j=1}^nu_j(W_{s_{j+1}} - W_{s_j})\Big)\qquad(12.12)$$
satisfy (12.9).

Step 3. We claim that random variables of the form (12.12) generate $\sigma(W_s;s\le t_0)$.
To see this, we proceed as in the last paragraph of the proof of Theorem 12.1. We replace each $u_j$ by $-u_j$, multiply by $\hat f(u_1,\dots,u_n)$, the Fourier transform of a $C^\infty$ function $f$ with compact support, and integrate over $(u_1,\dots,u_n)\in\mathbb R^n$. Using the Fubini theorem and the Fourier inversion theorem, we obtain random variables of the form $f(W_{s_2} - W_{s_1},\dots,W_{s_{n+1}} - W_{s_n})$ for $f$ in $C^\infty$ with compact support. By a limit argument, such random variables generate $\sigma(W_s;s\le t_0)$.

We will prove that whenever $Y_n$ satisfies (12.9) and $Y_n\to Y$ in $L^2$, then $Y$ satisfies (12.9). By Exercise 2.7 and Proposition 2.5, this will prove our theorem.

Suppose each $Y_n$ satisfies (12.9) with integrand $H_n(s)$ and suppose $Y_n\to Y$ in $L^2$. Then $EY_n\to EY$, and $Y_n - EY_n$ converges in $L^2$ to $Y - EY$. Since
$$E\int_0^{t_0}(H_n(s) - H_m(s))^2\,ds = E\big((Y_n - EY_n) - (Y_m - EY_m)\big)^2\to0,$$
the sequence $H_n$ is a Cauchy sequence with respect to the norm $\|X\| = (E\int_0^{t_0}X_s^2\,ds)^{1/2}$, which is an $L^2$ norm and hence complete. Therefore there exists $H_s$ (which is predictable because each $H_n(s)$ is predictable) such that $E\int_0^{t_0}H_s^2\,ds<\infty$ and $E\int_0^{t_0}(H_n(s) - H_s)^2\,ds\to0$. Hence
$$E\Big((Y_n - EY_n) - \int_0^{t_0}H_s\,dW_s\Big)^2 = E\int_0^{t_0}(H_n(s) - H_s)^2\,ds\to0.$$
Since $Y_n - EY_n$ converges in $L^2$ to $Y - EY$, it follows that $Y - EY = \int_0^{t_0}H_s\,dW_s$, a.s.

Corollary 12.4 Suppose $M_t$ is a right-continuous square integrable martingale with respect to the minimal augmented filtration $\{\mathcal F_t\}$ generated by a one-dimensional Brownian motion, and suppose $M_0 = 0$. Let $t_0>0$. Then there exists a predictable process $H_s$ with $E\int_0^{t_0}H_s^2\,ds<\infty$ such that with probability one $M_t = \int_0^tH_s\,dW_s$ for all $t\le t_0$.

Proof Since $M_t$ is a martingale, $E[M_{t_0}\mid\mathcal F_0] = M_0$, and taking expectations, $EM_{t_0} = EM_0 = 0$. By Theorem 12.3, there exists a predictable process $H$ with $E\int_0^{t_0}H_s^2\,ds<\infty$ such that $M_{t_0} = \int_0^{t_0}H_s\,dW_s$. Taking conditional expectations with respect to $\mathcal F_t$, we obtain $M_t = \int_0^tH_s\,dW_s$. This holds almost surely for each $t$. Thus except for a null set of $\omega$'s, it holds for all rational $t$. Since $M_t$ and the stochastic integral are both right continuous, it holds for all $t\le t_0$.

Corollary 12.5 If $M_t$ is a square integrable martingale with respect to the minimal augmented filtration of a one-dimensional Brownian motion $W$, then $M_t$ has a version with continuous paths.

Proof By Corollary 3.13, $M$ has a version with right-continuous paths. By Corollary 12.4, $M$ can be written as a stochastic integral with respect to $W$. But such stochastic integrals have continuous paths by Theorem 10.4.

It is important for the martingale representation theorem that $M_t$ be a martingale with respect to the minimal augmented filtration of $W$ and not a larger filtration. For example, let $(X,Y)$ be a two-dimensional Brownian motion and let $\{\mathcal F_t\}$ be the minimal augmented filtration generated by $(X,Y)$. We show that we cannot write $Y_1$ as a stochastic integral with respect to $X_t$. If it were possible to do so, since $Y_1$ has mean zero, we would have $Y_1 = \int_0^1H_s\,dX_s$. Taking conditional expectations, $Y_t = \int_0^tH_s\,dX_s$. Then $\langle X,Y\rangle_t = \int_0^tH_s\,ds$ by Exercise 10.5. But if $(X,Y)$ is two-dimensional Brownian motion, then $X$ and $Y$ are independent, and so $\langle X,Y\rangle_t = 0$ by Exercise 9.4. Hence $H_s = 0$ for almost every $s$, which forces $Y_1 = 0$, a contradiction. (However, it is true, by a proof similar to that of Theorem 12.3, that if $\{\mathcal F_t\}$ is the minimal augmented filtration of a $d$-dimensional Brownian motion $(W^1,\dots,W^d)$ and $Y$ is square integrable and $\mathcal F_{t_0}$ measurable, then there exist suitable processes $H^i_s$ such that $Y = EY + \sum_{i=1}^d\int_0^{t_0}H^i_s\,dW^i_s$.)
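For a concrete instance of (12.9), take $t_0 = 1$ and $Y = W_{t_0}^2$. Itô's formula gives $W_{t_0}^2 = t_0 + \int_0^{t_0}2W_s\,dW_s$, so $EY = t_0$ and $H_s = 2W_s$. The isometry then gives $\operatorname{Var}Y = E\int_0^{t_0}4W_s^2\,ds = 2t_0^2$. The Monte Carlo sketch below checks both facts with left-point sums. It uses only the standard library, and its names are hypothetical. The residual $Y - t_0 - \sum2W_{t_k}\Delta W_k$ equals $\sum(\Delta W_k)^2 - t_0$ exactly, so its mean square is $2t_0^2/n$.

```python
import math
import random

def representation_check(t0=1.0, n_steps=100, n_paths=5000, seed=5):
    """For each path return Y = W_{t0}^2 and the left-point sum approximating
    int_0^{t0} 2 W_s dW_s, the stochastic integral in (12.9) for this Y."""
    rng = random.Random(seed)
    dt = t0 / n_steps
    ys, integrals = [], []
    for _ in range(n_paths):
        w = integral = 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            integral += 2.0 * w * dw  # H_s = 2 W_s at the left endpoint
            w += dw
        ys.append(w * w)
        integrals.append(integral)
    return ys, integrals

T0 = 1.0
ys, integrals = representation_check(t0=T0)
residual = [y - T0 - i for y, i in zip(ys, integrals)]
mean_sq_residual = sum(r * r for r in residual) / len(residual)
mean_y = sum(ys) / len(ys)                                # estimates EY = t0
var_y = sum((y - mean_y) ** 2 for y in ys) / len(ys)      # estimates 2 t0^2
print(mean_sq_residual, mean_y, var_y)
```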
12.5 The Burkholder–Davis–Gundy inequalities

Next we turn to a pair of basic inequalities, those of Burkholder, Davis, and Gundy. In both of the following theorems, the constant depends on $p$, the exponent. As stated and proved below, we require $p\ge2$ for Theorems 12.6 and 12.7. In fact, the two theorems are true (with a different proof) as long as $p>0$; see Bass (1995), pp. 62–4, or Exercise 12.12. The proof we present here is a nice application of Itô's formula.

Define
$$M^*_t = \sup_{s\le t}|M_s|.$$
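For $p = 2$ both Burkholder–Davis–Gundy inequalities are elementary. Doob's inequality gives $E(M^*_T)^2\le4EM_T^2 = 4E\langle M\rangle_T$, while $E\langle M\rangle_T = EM_T^2\le E(M^*_T)^2$ (for, say, bounded $M^*_T$). So for Brownian motion and $T = 1$, the ratio $E(W^*_1)^2/E\langle W\rangle_1$ must lie in $[1,4]$. The sketch below estimates this ratio by simulation. The helper name and the discrete running maximum, which slightly underestimates $W^*_1$, are assumptions of the illustration. The bounds 1 and 4 are specific to $p = 2$; the constants in Theorems 12.6 and 12.7 depend on $p$.

```python
import math
import random

def bdg_ratio(p=2.0, n_paths=2000, n_steps=200, seed=6):
    """Monte Carlo estimate of E (W*_1)^p / E <W>_1^{p/2} for Brownian motion.
    Since <W>_1 = 1, the denominator is exactly 1."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 / n_steps)
    total = 0.0
    for _ in range(n_paths):
        w = running_max = 0.0
        for _ in range(n_steps):
            w += rng.gauss(0.0, sd)
            running_max = max(running_max, abs(w))  # grid proxy for W*_t
        total += running_max ** p
    return total / n_paths

ratio = bdg_ratio()
print(ratio)
```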
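Theorem 12.6 Let $M_t$ be a continuous local martingale with $M_0 = 0$, a.s., and suppose $2\le p<\infty$. There exists a constant $c_1$ depending on $p$ such that for any finite stopping time $T$,
$$E(M^*_T)^p\le c_1E\langle M\rangle_T^{p/2}.$$

Proof There is nothing to prove if the left-hand side is zero, so we may assume it is positive. First suppose $M^*_T$ is bounded by a positive constant $K$. Note for $p\ge2$ the function $x\mapsto|x|^p$ is $C^2$, with first derivative $p|x|^{p-1}\operatorname{sgn}(x)$ and second derivative $p(p-1)|x|^{p-2}$. By Doob's inequalities and then Itô's formula (and the fact that $|M_s|\le M^*_T$ for $s\le T$), we have
$$\begin{aligned}E|M^*_T|^p\le cE|M_T|^p &= cE\int_0^Tp|M_s|^{p-1}\operatorname{sgn}(M_s)\,dM_s + \frac12cE\int_0^Tp(p-1)|M_s|^{p-2}\,d\langle M\rangle_s\\ &\le cE\int_0^T(M^*_T)^{p-2}\,d\langle M\rangle_s = cE[(M^*_T)^{p-2}\langle M\rangle_T].\end{aligned}$$
The stochastic integral has zero expectation because its integrand is bounded and $E\langle M\rangle_T = EM_T^2<\infty$. (Recall our convention about constants and the letter $c$.) Using Hölder's inequality with exponents $p/(p-2)$ and $p/2$, we obtain
$$E(M^*_T)^p\le c\big(E(M^*_T)^p\big)^{\frac{p-2}p}\big(E\langle M\rangle_T^{p/2}\big)^{\frac2p}.$$
Dividing both sides by $(E(M^*_T)^p)^{(p-2)/p}$ and then taking both sides to the power $p/2$ gives our result. We then apply the above to $T\wedge U_K$, where $U_K = \inf\{t:|M_t|\ge K\}$, let $K\to\infty$, and use Fatou's lemma.

Theorem 12.7 Let $M_t$ be a continuous local martingale with $M_0 = 0$, a.s., and suppose $2\le p<\infty$. There exists a constant $c_2$ depending on $p$ such that for any finite stopping time $T$,
$$E\langle M\rangle_T^{p/2}\le c_2E(M^*_T)^p.$$

Proof As in the previous theorem, we may assume the left-hand side is positive. Set $r = p/2$. Let us first suppose $\langle M\rangle_T$ and $M^*_T$ are bounded by a positive constant $K$. Let $N_t = M_{t\wedge T}$, so that $\langle N\rangle_\infty = \langle M\rangle_T$, and let $A_t = \langle M\rangle_{t\wedge T}^{r-1}$. Using integration by parts,
$$\int_0^\infty\langle N\rangle_s\,dA_s = \langle N\rangle_\infty A_\infty - \int_0^\infty A_s\,d\langle N\rangle_s = \langle N\rangle_\infty^r - \frac1r\langle N\rangle_\infty^r.$$
Since $\int_0^\infty\langle N\rangle_\infty\,dA_s = \langle N\rangle_\infty^r$, we then have
$$\langle N\rangle_\infty^r = r\int_0^\infty(\langle N\rangle_\infty - \langle N\rangle_s)\,dA_s.$$
Using Propositions 3.14 and 9.6,
$$\begin{aligned}E\langle M\rangle_T^r = E\langle N\rangle_\infty^r &= rE\int_0^\infty(\langle N\rangle_\infty - \langle N\rangle_s)\,dA_s = rE\int_0^\infty\big(E[\langle N\rangle_\infty\mid\mathcal F_s] - \langle N\rangle_s\big)\,dA_s\\ &= rE\int_0^\infty E[\langle N\rangle_\infty - \langle N\rangle_s\mid\mathcal F_s]\,dA_s = rE\int_0^\infty E[N_\infty^2 - N_s^2\mid\mathcal F_s]\,dA_s\\ &\le cE\int_0^\infty E[(N^*_\infty)^2\mid\mathcal F_s]\,dA_s = cE[(N^*_\infty)^2A_\infty] = cE[(M^*_T)^2\langle M\rangle_T^{r-1}].\end{aligned}$$
We use Hölder's inequality with exponents $r$ and $r/(r-1)$, divide both sides by the quantity $(E\langle M\rangle_T^r)^{(r-1)/r}$, and then take both sides to the $r$th power.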
We then get $E\langle M\rangle_T^r\le cE(M^*_T)^{2r}$, which is what we wanted. To remove the restriction that $\langle M\rangle$ and $M^*$ are bounded, we apply the above to $T\wedge V_K$ in place of $T$, where $V_K = \inf\{t:\langle M\rangle_t + M^*_t\ge K\}$, let $K\to\infty$, and use Fatou's lemma.

12.6 Stratonovich integrals

For stochastic differential geometry and also many other purposes, the Stratonovich integral is more convenient than the Itô integral. If $X$ and $Y$ are continuous semimartingales, the Stratonovich integral, denoted $\int_0^tX_s\circ dY_s$, is defined by
$$\int_0^tX_s\circ dY_s = \int_0^tX_s\,dY_s + \frac12\langle X,Y\rangle_t.$$
Both the beauty and the difficulty of Itô's formula are due to the quadratic variation term. The change of variables formula for the Stratonovich integral avoids this.

Theorem 12.8 Suppose $f\in C^3$ and $X$ is a continuous semimartingale. Then
$$f(X_t) = f(X_0) + \int_0^tf'(X_s)\circ dX_s.$$

Proof By Itô's formula applied to the function $f$ and the definition of the Stratonovich integral, it suffices to show that
$$\langle f'(X),X\rangle_t = \int_0^tf''(X_s)\,d\langle X\rangle_s.\qquad(12.13)$$
Applying Itô's formula to the function $f'$, which is in $C^2$,
$$f'(X_t) = f'(X_0) + \int_0^tf''(X_s)\,dX_s + \frac12\int_0^tf'''(X_s)\,d\langle X\rangle_s.$$
The last term has paths of bounded variation and so does not contribute to the covariation with $X$, and (12.13) follows.

If $X$ and $Y$ are continuous semimartingales and we apply the change of variables formula with $f(x) = x^2$ to $X + Y$ and $X - Y$, we obtain
$$(X_t + Y_t)^2 = (X_0 + Y_0)^2 + 2\int_0^t(X_s + Y_s)\circ d(X_s + Y_s)$$
and
$$(X_t - Y_t)^2 = (X_0 - Y_0)^2 + 2\int_0^t(X_s - Y_s)\circ d(X_s - Y_s).$$
Taking the difference and then dividing by 4, we have the product formula for Stratonovich integrals:
$$X_tY_t = X_0Y_0 + \int_0^tX_s\circ dY_s + \int_0^tY_s\circ dX_s.\qquad(12.14)$$
The Stratonovich integral $\int H_s\circ dX_s$ can be represented as a limit of Riemann sums.

Proposition 12.9 Suppose $H$ and $X$ are continuous semimartingales and $t_0>0$. Then
0 Hs ◦ dXs is the limit in probability as n → ∞ of
2n−1∑
k=0
Hkt0/2n + H(k+1)t0/2n
2
(X(k+1)t0/2n − Xkt0/2n ).
Proof We write the sum as∑
Hkt0/2n (X(k+1)t0/2n − Xkt0/2n )
+ 12
∑
(H(k+1)t0/2n − Hkt0/2n )(X(k+1)t0/2n − Xkt0/2n ).
The first sum tends to
∫ t
0 Hs dXs while by Exercise 12.10 the second sum tends to
1
2 〈H, X 〉t .
This proves the proposition.
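The sketch below is an informal numerical illustration, not part of the original argument. It simulates a Brownian path on a dyadic grid and forms two sums with $H = X = W$: the left-endpoint (Itô) sum and the trapezoid sum of Proposition 12.9. The trapezoid sum telescopes to exactly $W_{t_0}^2/2$, which is the Stratonovich chain rule of Theorem 12.8 with $f(x) = x^2/2$. The two sums differ by half the sum of products of increments from Exercise 12.10, and that sum should be close to $\langle W\rangle_{t_0} = t_0$. The grid size, the seed, and the function names are arbitrary choices made for this sketch.

```python
import math
import random


def brownian_path(t0, n_steps, rng):
    """Sample W at the grid times k*t0/n_steps, k = 0, ..., n_steps, with W_0 = 0."""
    sd = math.sqrt(t0 / n_steps)
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path


def riemann_sums(h, x):
    """Return (left-endpoint sum, trapezoid sum, sum of products of increments)."""
    left = trap = cross = 0.0
    for k in range(len(x) - 1):
        dx = x[k + 1] - x[k]
        left += h[k] * dx                      # Ito-type sum
        trap += 0.5 * (h[k] + h[k + 1]) * dx   # sum in Proposition 12.9
        cross += (h[k + 1] - h[k]) * dx        # sum in Exercise 12.10
    return left, trap, cross


rng = random.Random(12)
t0 = 1.0
w = brownian_path(t0, 2 ** 16, rng)
ito, strat, qv = riemann_sums(w, w)

print(f"trapezoid sum  {strat:.6f}   W_t0^2/2        = {w[-1] ** 2 / 2:.6f}")
print(f"left sum       {ito:.6f}   (W_t0^2 - t0)/2 = {(w[-1] ** 2 - t0) / 2:.6f}")
print(f"sum of squared increments {qv:.4f}   (compare t0 = {t0})")
```

The trapezoid identity holds exactly, up to rounding, for every path. Only the closeness of the quadratic-variation sum to $t_0$ is a statement that holds in probability.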
Exercises
12.1 Show that $W_t$ and $W_t^2 - t$ are local martingales, where $W$ is defined in the statement of Theorem 12.2.
12.2 Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions, $X$ is a Brownian motion with respect to $\{\mathcal F_t\}$, and $T$ is a finite stopping time with respect to this same filtration. Let $Y$ be another Brownian motion that is independent of $\{\mathcal F_t\}$ and define
$$Z_t = \begin{cases} X_t, & t < T,\\ X_T + Y_{t-T}, & t \ge T.\end{cases}$$
Show that $Z$ is a Brownian motion (although not necessarily with respect to $\{\mathcal F_t\}$).
12.3 Suppose $M_t$ is a continuous local martingale with respect to a filtration $\{\mathcal F_t\}$ satisfying the usual conditions, $T$ is a stopping time with respect to $\{\mathcal F_t\}$, and $\langle M\rangle_t = t \wedge T$. Prove that $M_{t\wedge T}$ has the same law as a Brownian motion stopped at time $T$.
12.4 Here is a multidimensional version of Lévy's theorem. Let $\{\mathcal F_t\}$ be a filtration satisfying the usual conditions. Suppose $(M^1_t, \dots, M^d_t)$ is a $d$-dimensional process such that each component $M^i_t$ is a continuous martingale with respect to $\{\mathcal F_t\}$ with $\langle M^i\rangle_t = t$. Suppose that $\langle M^i, M^j\rangle_t = 0$ if $i \ne j$. Prove that $(M^1_t, \dots, M^d_t)$ is a $d$-dimensional Brownian motion.
12.5 Let $\{\mathcal F_t\}$ be a filtration satisfying the usual conditions. Let $A_t$ be a strictly increasing continuous process adapted to $\{\mathcal F_t\}$ with $\lim_{t\to\infty} A_t = \infty$, a.s. Suppose $(M^1_t, \dots, M^d_t)$ is a $d$-dimensional process such that each component $M^i_t$ is a continuous martingale with respect to $\{\mathcal F_t\}$ and $\langle M^i\rangle_t = A_t$. Suppose that $\langle M^i, M^j\rangle_t = 0$ if $i \ne j$. Prove that $(M^1_t, \dots, M^d_t)$ is a time change of $d$-dimensional Brownian motion.
12.6 Suppose $M$ is a continuous local martingale such that $\langle M\rangle_t$ is deterministic. Prove that $M$ is a Gaussian process.
12.7 Suppose $M$ is a continuous local martingale with $M_0 = 0$, a.s. Show that there exist a Brownian motion $W$, an increasing process $\tau_t$, and a stopping time $T$ such that $M_t = W_{\tau_t\wedge T}$ for all $t$.
12.8 Let $M_t$ be a continuous local martingale. Show that the events $(M^*_\infty < \infty)$ and $(\langle M\rangle_\infty < \infty)$ differ by at most a null set.
12.9 Let $M_t$ be a continuous local martingale. Prove that
$$\mathbb P\Big(\sup_{t\ge0}|M_t| > x,\ \langle M\rangle_\infty < y\Big) \le 2e^{-x^2/2y}.$$
12.10 Suppose $X$ and $Y$ are continuous semimartingales and $t_0 > 0$. Prove that
$$\sum_{k=0}^{2^n-1}\big(X_{(k+1)t_0/2^n} - X_{kt_0/2^n}\big)\big(Y_{(k+1)t_0/2^n} - Y_{kt_0/2^n}\big)$$
converges to $\langle X, Y\rangle_{t_0}$ in probability as $n \to \infty$.
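12.11 Let $p > 0$. Suppose $X$ and $Y$ are non-negative random variables, $\beta > 1$, $\delta \in (0, 1)$, and $\varepsilon \in (0, \beta^{-p}/2)$ are such that
$$\mathbb P(X > \beta\lambda,\ Y < \delta\lambda) \le \varepsilon\,\mathbb P(X \ge \lambda)\quad\text{for all } \lambda > 0.$$
This inequality is known as a good-$\lambda$ inequality. Prove that there exists a constant $c$ such that $\mathbb E\,X^p \le c\,\mathbb E\,Y^p$. The constant may depend on $\beta$, $\delta$, $\varepsilon$, and $p$, but not on $X$ or $Y$.
Hint: First assume $X$ is bounded. Write
$$\mathbb P(X/\beta > \lambda) \le \mathbb P(X > \beta\lambda,\ Y < \delta\lambda) + \mathbb P(Y \ge \delta\lambda) \le \varepsilon\,\mathbb P(X \ge \lambda) + \mathbb P(Y/\delta \ge \lambda).$$
Multiply by $p\lambda^{p-1}$, integrate over $\lambda$, and use the fact that $\varepsilon < \beta^{-p}/2$.
12.12 Use Exercise 12.11 to prove that the Burkholder–Davis–Gundy inequalities hold for all $p > 0$.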
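Hint: Use time change to reduce to the case of a Brownian motion $W$. If $T$ is a stopping time and $U = \inf\{t : W^*_{t\wedge T} > \lambda\}$, write
$$\mathbb P(W^*_T > \beta\lambda,\ T^{1/2} < \delta\lambda) = \mathbb P(W^*_T > \beta\lambda,\ T < \delta^2\lambda^2,\ U < \infty) \le \mathbb P\Big(\sup_{U\le t\le U+\delta^2\lambda^2}|W_t - W_U| > (\beta - 1)\lambda,\ U < \infty\Big).$$
Condition on $\mathcal F_U$, use Theorem 4.2, and notice that $\mathbb P(U < \infty) = \mathbb P(W^*_T > \lambda)$.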
12.13 Define the $H^1$ norm of a martingale by
$$\|M\|_{H^1} = \mathbb E\Big[\sup_{t\ge0}|M_t|\Big].$$
Prove that this is a norm. Does there exist a uniformly integrable continuous martingale that is not in $H^1$?
12.14 Let $W$ be a Brownian motion and let $T$ be a stopping time. Prove that if $\mathbb E\,T^{1/2} < \infty$, then $\mathbb E\,W_T = 0$.
12.15 Suppose $W = (W^1, \dots, W^d)$ is a $d$-dimensional Brownian motion started at 0, and let $\{\mathcal F_t\}$ be the minimal augmented filtration of $W$. Suppose $Y$ is an $\mathcal F_1$ measurable random variable with mean zero and finite variance. Prove there exist predictable processes $H^1, \dots, H^d$ such that $\mathbb E\int_0^1 (H^i_s)^2\,ds < \infty$ for each $i$ and
$$Y = \sum_{i=1}^d\int_0^1 H^i_s\,dW^i_s.$$
12.16 Suppose $W$ is a Brownian motion and $H$ is adapted, bounded, and right continuous. Let $t \ge 0$. Show that
$$\frac{1}{W_{t+h} - W_t}\int_t^{t+h} H_s\,dW_s$$
converges in probability to $H_t$ as $h \downarrow 0$.
12.17 Let $W$ be a Brownian motion and $\alpha > 0$. Show that, for $t > 0$,
$$\int_0^t \frac{1}{|W_s|^\alpha}\,ds$$
is infinite almost surely if $\alpha \ge 1$ but finite almost surely if $\alpha < 1$.
12.18 Here is a useful inequality. Suppose $A$ is an increasing process with $A_0 = 0$, a.s., and suppose there exists a non-negative random variable $B$ such that for each $t$,
$$\mathbb E[A_\infty - A_t \mid \mathcal F_t] \le \mathbb E[B \mid \mathcal F_t],\quad\text{a.s.}$$
Prove that for each integer $p \ge 1$, there exists a constant $c_p$ depending only on $p$ such that $\mathbb E\,A_\infty^p \le c_p\,\mathbb E\,B^p$.
Hint: Write $A_\infty^p = p\int_0^\infty (A_\infty - A_t)^{p-1}\,dA_t$, take expectations, and use Proposition 3.14.
12.19 Let $W$ be a one-dimensional Brownian motion with filtration $\{\mathcal F_t\}$ and let $f(r, s)$ be a deterministic function. Define the multiple stochastic integral by
$$\int_0^t\int_0^s f(r, s)\,dW_r\,dW_s = \int_0^t\Big(\int_0^s f(r, s)\,dW_r\Big)\,dW_s,$$
provided $\int_0^t\int_0^s f(r, s)^2\,dr\,ds < \infty$, and similarly for higher-order multiple stochastic integrals.
(1) Suppose $f : \mathbb R^m \to \mathbb R$ and $g : \mathbb R^n \to \mathbb R$ are bounded and deterministic, with $n \ne m$. Let
$$M^f_t = \int_0^t\cdots\int_0^{r_{m-1}} f\,dW_{r_1}\cdots dW_{r_m},$$
and define $M^g_t$ similarly. Show that $\mathbb E[M^f_tM^g_t] = 0$ for all $t$.
(2) Show that the collection of random variables $\{M^f_1 : f \text{ has domain } \mathbb R^m \text{ for some } m \text{ and is bounded and deterministic}\}$ is dense, in the $L^2(\mathbb P)$ norm, in the set of mean zero $\mathcal F_1$ measurable random variables.

13 The Girsanov theorem
We look at what happens to a Brownian motion when we change $\mathbb P$ to another probability measure $\mathbb Q$. This may seem strange, but there are many applications of this, including in financial mathematics and in filtering; see Chapters 28 and 29. We will give another application at the end of this chapter, in Section 13.2: determining the probability that a Brownian motion $W_s$ crosses a line $a + bs$ before time $t$.

13.1 The Brownian motion case
We start with an observation. Suppose $Y_t$ is a continuous local martingale with $Y_0 = 0$ and let $Z_t = e^{Y_t - \langle Y\rangle_t/2}$. Applying Itô's formula to $X_t = Y_t - \frac12\langle Y\rangle_t$ with the function $e^x$ yields
$$Z_t = e^{Y_t - \langle Y\rangle_t/2} = 1 + \int_0^t e^{X_s}\,d\big(Y_s - \tfrac12\langle Y\rangle_s\big) + \tfrac12\int_0^t e^{X_s}\,d\langle Y\rangle_s = 1 + \int_0^t Z_s\,dY_s. \tag{13.1}$$
This can be abbreviated as $dZ_t = Z_t\,dY_t$. $Z_t$ is called the exponential of the martingale $Y$. Since $Z$ is a stochastic integral with respect to a local martingale, it is itself a local martingale. Before stating the Girsanov theorem, we need two technical lemmas.
Lemma 13.1 Suppose $Y$ is a continuous local martingale with $Y_0 = 0$ and $Z_t = e^{Y_t - \langle Y\rangle_t/2}$. If $\langle Y\rangle_t$ is a bounded random variable for each $t$, then $\mathbb E\,|Z_t|^p < \infty$ for each $p > 1$ and each $t$.
Proof Let us first suppose $Y$ is bounded in absolute value by $N$. Since $Z_t \ge 0$, we have by the Cauchy–Schwarz inequality
$$\begin{aligned}\mathbb E\,Z_t^p = \mathbb E\,e^{pY_t - p\langle Y\rangle_t/2} &= \mathbb E\Big[e^{pY_t - p^2\langle Y\rangle_t}\,e^{(p^2 - p/2)\langle Y\rangle_t}\Big]\\ &\le \Big(\mathbb E\,e^{2pY_t - 2p^2\langle Y\rangle_t}\Big)^{1/2}\Big(\mathbb E\,e^{(2p^2 - p)\langle Y\rangle_t}\Big)^{1/2}.\end{aligned}\tag{13.2}$$
Repeat the calculation in (13.1) with $Y$ replaced by $2pY$, whose quadratic variation is $4p^2\langle Y\rangle$. It shows that $e^{2pY_t - 2p^2\langle Y\rangle_t}$ is 1 plus a stochastic integral of a bounded integrand with respect to a bounded martingale. Hence it is a martingale with expectation 1, so the first factor on the last line of (13.2) is 1. By our assumption that $\langle Y\rangle_t$ is bounded, the second factor on this line is finite and does not depend on $N$.
If $Y$ is not bounded, let $T_N = \inf\{s : |Y_s| \ge N\}$, apply the above argument to $Y_{t\wedge T_N}$, and let $N \to \infty$.
The second lemma is the following.
Lemma 13.2 Suppose $A_t$ is a continuous increasing process adapted to a filtration $\{\mathcal F_t\}$ satisfying the usual conditions. Let $X$ be a bounded random variable, $H$ a bounded adapted process, $s < t$, and $B \in \mathcal F_s$. Then
$$\mathbb E\Big[\int_s^t XH_r\,dA_r;\ B\Big] = \mathbb E\Big[\int_s^t \mathbb E[X \mid \mathcal F_r]\,H_r\,dA_r;\ B\Big].$$
Proof By linearity, it suffices to suppose $X$ and $H$ are non-negative. Let $A'_r = A_{r+s}$, $H'_r = H_{r+s}$, and $\mathcal F'_r = \mathcal F_{r+s}$. Let $C_r = \int_0^r H'_u1_B\,dA'_u$. We must then show
$$\mathbb E\int_0^{t-s} X\,dC_r = \mathbb E\int_0^{t-s}\mathbb E[X \mid \mathcal F'_r]\,dC_r.$$
This follows by Proposition 3.14.
Let $M_t$ be a non-negative continuous martingale with $M_0 = 1$, a.s. Define a new probability measure $\mathbb Q$ by $\mathbb Q(A) = \mathbb E[M_t; A]$ if $A \in \mathcal F_t$. Note that $\mathbb Q$ is a probability measure because $\mathbb Q(\Omega) = \mathbb E\,M_t = \mathbb E\,M_0 = 1$. $\mathbb Q$ is well defined: if $A \in \mathcal F_s \subset \mathcal F_t$, then $\mathbb E[M_t; A] = \mathbb E[M_s; A]$ because $M$ is a martingale. A more general version of the Girsanov theorem is possible (see Exercise 13.5), but the Girsanov theorem is most frequently used with Brownian motion.
Theorem 13.3 Suppose $W_t$ is a Brownian motion with respect to $\mathbb P$, $H$ is bounded and predictable,
$$M_t = \exp\Big(\int_0^t H_r\,dW_r - \frac12\int_0^t H_r^2\,dr\Big), \tag{13.3}$$
and
$$\mathbb Q(B) = \mathbb E_{\mathbb P}[M_t; B]\quad\text{if } B \in \mathcal F_t. \tag{13.4}$$
Then $W_t - \int_0^t H_r\,dr$ is a Brownian motion with respect to $\mathbb Q$.
Proof We prove the theorem by showing that $W_t - \int_0^t H_r\,dr$ satisfies the hypotheses of Lévy's theorem (Theorem 12.1). We first show that $W_t - \int_0^t H_r\,dr$ is a martingale with respect to $\mathbb Q$. By (13.1) with $Y_t = \int_0^t H_r\,dW_r$ and $Z_t = M_t$,
$$M_t = 1 + \int_0^t M_rH_r\,dW_r.$$
By Exercise 10.5,
$$\langle M, W\rangle_t = \int_0^t M_rH_r\,dr. \tag{13.5}$$
We want to show that if $B \in \mathcal F_s$, then
$$\mathbb E_{\mathbb Q}\Big[W_t - \int_0^t H_r\,dr;\ B\Big] = \mathbb E_{\mathbb Q}\Big[W_s - \int_0^s H_r\,dr;\ B\Big]. \tag{13.6}$$
If $B \in \mathcal F_s$, then using the definition of $\mathbb Q$ and the product formula (Corollary 11.3),
$$\mathbb E_{\mathbb Q}[W_t; B] = \mathbb E_{\mathbb P}[M_tW_t; B] = \mathbb E_{\mathbb P}\Big[\int_0^t M_r\,dW_r; B\Big] + \mathbb E_{\mathbb P}\Big[\int_0^t W_r\,dM_r; B\Big] + \mathbb E_{\mathbb P}[\langle M, W\rangle_t; B] \tag{13.7}$$
and
$$\mathbb E_{\mathbb Q}[W_s; B] = \mathbb E_{\mathbb P}[M_sW_s; B] = \mathbb E_{\mathbb P}\Big[\int_0^s M_r\,dW_r; B\Big] + \mathbb E_{\mathbb P}\Big[\int_0^s W_r\,dM_r; B\Big] + \mathbb E_{\mathbb P}[\langle M, W\rangle_s; B]. \tag{13.8}$$
Since $H$ is bounded, $\big\langle\int_0^\cdot H_r\,dW_r\big\rangle_t \le ct$. By Lemma 13.1, $M_t$ is a martingale and $\mathbb E\,|M_t|^p < \infty$ for each $t$ and each $p \ge 1$.
Since stochastic integrals with respect to martingales are martingales,
$$\mathbb E_{\mathbb P}\Big[\int_0^t M_r\,dW_r; B\Big] = \mathbb E_{\mathbb P}\Big[\int_0^s M_r\,dW_r; B\Big] \tag{13.9}$$
and
$$\mathbb E_{\mathbb P}\Big[\int_0^t W_r\,dM_r; B\Big] = \mathbb E_{\mathbb P}\Big[\int_0^s W_r\,dM_r; B\Big]. \tag{13.10}$$
Combining (13.7), (13.8), (13.9), and (13.10), we see that (13.6) will follow if we show
$$\mathbb E_{\mathbb P}\big[\langle M, W\rangle_t - \langle M, W\rangle_s; B\big] = \mathbb E_{\mathbb Q}\Big[\int_s^t H_r\,dr; B\Big]. \tag{13.11}$$
Using Lemma 13.2 and (13.5), we have
$$\begin{aligned}\mathbb E_{\mathbb Q}\Big[\int_s^t H_r\,dr; B\Big] &= \mathbb E_{\mathbb P}\Big[M_t\int_s^t H_r\,dr; B\Big] = \mathbb E_{\mathbb P}\Big[\int_s^t M_tH_r\,dr; B\Big] = \mathbb E_{\mathbb P}\Big[\int_s^t \mathbb E[M_t \mid \mathcal F_r]\,H_r\,dr; B\Big]\\ &= \mathbb E_{\mathbb P}\Big[\int_s^t M_rH_r\,dr; B\Big] = \mathbb E_{\mathbb P}\big[\langle M, W\rangle_t - \langle M, W\rangle_s; B\big],\end{aligned}$$
which proves (13.11). A similar proof shows that $(W_t - \int_0^t H_r\,dr)^2 - t$ is a martingale with respect to $\mathbb Q$. Hence the quadratic variation of $W_t - \int_0^t H_r\,dr$ under $\mathbb Q$ is still $t$ (or see Exercise 13.2). Since the process $W_t - \int_0^t H_r\,dr$ has continuous paths, Lévy's theorem shows that it is a Brownian motion under $\mathbb Q$.
The assumption that $H$ be bounded can be weakened, but in practice it is more common to use a stopping time argument; for an example, see the proof of Theorem 29.3.

13.2 An example
Let us give an example of the use of the Girsanov theorem, namely, to compute the probability that Brownian motion crosses a line $a + bt$ by time $t_0$, where $a > 0$. We want to find an exact expression for $\mathbb P(\exists t \le t_0 : W_t = a + bt)$, where $W$ is a Brownian motion.
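Let $W_t$ be a Brownian motion under $\mathbb P$. Define $\mathbb Q$ on $\mathcal F_{t_0}$ by
$$\frac{d\mathbb Q}{d\mathbb P} = M_{t_0},\qquad\text{where } M_t = e^{-bW_t - b^2t/2}.$$
By the Girsanov theorem (with $H_r \equiv -b$), the process $\widetilde W_t = W_t + bt$ is a Brownian motion under $\mathbb Q$, and $W_t = \widetilde W_t - bt$.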
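Let $A = (\sup_{s\le t_0} W_s \ge a)$. If we set $S = \inf\{t > 0 : W_t = a\}$, then $A = (S \le t_0)$ and $A \in \mathcal F_{S\wedge t_0}$. We write
$$\mathbb P(\exists t \le t_0 : W_t = a + bt) = \mathbb P(\exists t \le t_0 : W_t - bt = a) = \mathbb P\Big(\sup_{s\le t_0}(W_s - bs) \ge a\Big). \tag{13.12}$$
$W_t$ is a Brownian motion under $\mathbb P$ while $\widetilde W_t$ is a Brownian motion under $\mathbb Q$. Therefore the last expression in (13.12) is equal to
$$\mathbb Q\Big(\sup_{s\le t_0}(\widetilde W_s - bs) \ge a\Big).$$
This in turn is equal to
$$\mathbb Q\Big(\sup_{s\le t_0} W_s \ge a\Big) = \mathbb Q(A).$$
To evaluate $\mathbb Q(A)$, note that $M_S = e^{-ab - b^2S/2}$. Also, by (3.19) with $b$ replaced by $a$,
$$\mathbb P(S \in ds) = \frac{a}{\sqrt{2\pi s^3}}\,e^{-a^2/2s}\,ds.$$
Now we use optional stopping to obtain
$$\begin{aligned}\mathbb P(\exists t \le t_0 : W_t = a + bt) = \mathbb Q(A) &= \mathbb E_{\mathbb P}[M_{t_0}; A] = \mathbb E_{\mathbb P}[M_{S\wedge t_0}; S \le t_0] = \mathbb E_{\mathbb P}[M_S; S \le t_0]\\ &= \int_0^{t_0} e^{-ab - b^2s/2}\,\frac{a}{\sqrt{2\pi s^3}}\,e^{-a^2/2s}\,ds.\end{aligned}\tag{13.13}$$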
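As a numerical sanity check of (13.13), offered only as a sketch, the code evaluates the integral by Simpson's rule. It compares the result with the standard closed form for the probability that a Brownian motion with drift $-b$ reaches level $a$ by time $t_0$, namely $\Phi((-a - bt_0)/\sqrt{t_0}) + e^{-2ab}\,\Phi((-a + bt_0)/\sqrt{t_0})$. That closed form is quoted here without proof and is not derived in the text. When $b = 0$ it reduces to the reflection-principle value $2\Phi(-a/\sqrt{t_0})$. The parameter values, the number of Simpson subintervals, and the function names are illustrative choices.

```python
import math
from statistics import NormalDist


def crossing_prob_integral(a, b, t0, n=20000):
    """Simpson's rule for the integral in the last line of (13.13); n must be even."""
    def integrand(s):
        if s == 0.0:
            return 0.0  # exp(-a^2/2s) -> 0 much faster than s^(-3/2) blows up
        return (math.exp(-a * b - b * b * s / 2)
                * a / math.sqrt(2 * math.pi * s ** 3)
                * math.exp(-a * a / (2 * s)))

    h = t0 / n
    total = integrand(0.0) + integrand(t0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3


def crossing_prob_closed_form(a, b, t0):
    """Standard formula for P(sup_{s<=t0} (W_s - b s) >= a), a > 0 (quoted, not derived here)."""
    Phi = NormalDist().cdf
    r = math.sqrt(t0)
    return Phi((-a - b * t0) / r) + math.exp(-2 * a * b) * Phi((-a + b * t0) / r)


for a, b, t0 in [(1.0, 0.5, 2.0), (1.0, -0.5, 2.0), (0.5, 0.0, 1.0)]:
    print(a, b, t0, crossing_prob_integral(a, b, t0), crossing_prob_closed_form(a, b, t0))
```

Simpson's rule is adequate here because the integrand is smooth on $(0, t_0]$ and all of its derivatives vanish as $s \downarrow 0$.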
Exercises
13.1 Whether a filtration satisfies the usual conditions depends on the class of null sets, and hence on the probability measure involved. Suppose $\{\mathcal F_t\}$ satisfies the usual conditions with respect to $\mathbb P$, $H$ is a bounded predictable process, $W$ is a Brownian motion with respect to $\mathbb P$, $M$ is defined by (13.3), and $\mathbb Q$ is defined by (13.4). If $t_0 > 0$ and $A \in \sigma(W_s;\ s \le t_0)$, show that $\mathbb P(A) = 0$ if and only if $\mathbb Q(A) = 0$.

13.2 Theorem 9.10 allows us to avoid some calculations in the last paragraph of the proof of Theorem
13.3. Suppose X is a continuous semimartingale under P and Q is a probability measure
equivalent to P. That is, a set is a null set for P if and only if it is a null set for Q. Show X is a
semimartingale under Q and the quadratic variation of X under P equals the quadratic variation
of X under Q.
13.3 Let $W = (W^1, \dots, W^d)$ be a $d$-dimensional Brownian motion with minimal augmented filtration $\{\mathcal F_t\}$ and let $H_1, \dots, H_d$ be bounded predictable processes. Let
$$M_t = \exp\Big(\sum_{i=1}^d\int_0^t H_i(s)\,dW^i_s - \frac12\sum_{i=1}^d\int_0^t |H_i(s)|^2\,ds\Big).$$
Define a probability measure $\mathbb Q$ by setting $\mathbb Q(A) = \mathbb E_{\mathbb P}[M_t; A]$ if $A \in \mathcal F_t$. Let $\widetilde W^i_t = W^i_t - \int_0^t H_i(s)\,ds$ for each $i$. Prove that $\widetilde W = (\widetilde W^1, \dots, \widetilde W^d)$ is a $d$-dimensional Brownian motion under $\mathbb Q$.
13.4 Let $W_t$ be a $d$-dimensional Brownian motion started at 0 and let $\delta, t_0 > 0$. Let $f : [0, t_0] \to \mathbb R^d$ be a continuous function with $f(0) = 0$. Prove that there exists a constant $c > 0$ such that
$$\mathbb P\Big(\sup_{s\le t_0}|W_s - f(s)| < \delta\Big) > c.$$
This is known as the support theorem for Brownian motion.
Hint: First assume that $f$ has a bounded derivative. Use Exercise 4.9 and the Girsanov theorem.
13.5 Here is a more general form of the Girsanov theorem. Suppose $L_t$ is a bounded continuous martingale under $\mathbb P$ and $M_t = e^{L_t - \langle L\rangle_t/2}$. Let $\mathbb Q$ be the probability measure defined by $\mathbb Q(A) = \mathbb E_{\mathbb P}[M_{t_0}; A]$ if $A \in \mathcal F_{t_0}$. Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions with respect to both $\mathbb P$ and $\mathbb Q$. Show that if $X$ is a martingale under $\mathbb P$, then $X_t - \langle X, L\rangle_t$ is a martingale under $\mathbb Q$.

14 Local times
Let $W_t$ be a one-dimensional Brownian motion. The Lebesgue measure of the random set $\{t : W_t = 0\}$ is 0, a.s. Nevertheless there is an increasing continuous process which grows only when the Brownian motion is at 0. This increasing process is known as the local time at 0. We want to derive some of its properties.
14.1 Basic properties
Let $W$ be a Brownian motion. By Jensen's inequality for conditional expectations (Proposition A.21), $|W_t|$ is a submartingale. By the Doob–Meyer decomposition (Theorem 9.12), it can be written as a martingale plus an increasing process. Away from 0, $|W_t|$ coincides locally with $\pm W_t$, and $W_t$ is itself a martingale. Hence the increasing process grows only at times when the Brownian motion is at 0.
Rather than appealing to the Doob–Meyer decomposition, we give the explicit decomposition of $|W_t|$. We define
$$\operatorname{sgn}(x) = \begin{cases} 1, & x > 0;\\ 0, & x = 0;\\ -1, & x < 0.\end{cases}$$
Theorem 14.1 Let $W_t$ be a one-dimensional Brownian motion.
(1) There exists a non-negative increasing continuous adapted process $L^0_t$ such that
$$|W_t| = \int_0^t \operatorname{sgn}(W_s)\,dW_s + L^0_t. \tag{14.1}$$
(2) $L^0_t$ increases only when $W$ is at 0. More precisely, if $W_s(\omega) \ne 0$ for $r \le s \le t$, then $L^0_r(\omega) = L^0_t(\omega)$.
$L^0_t$ is called the local time at 0. Equation (14.1) is called the Tanaka formula.
Proof Define
$$f_\varepsilon(x) = \begin{cases} x^2/2\varepsilon, & |x| < \varepsilon;\\ |x| - \varepsilon/2, & |x| \ge \varepsilon.\end{cases}$$
The function $f_\varepsilon$ is an approximation to the function $|\cdot|$. Note that $f_\varepsilon(0) = f'_\varepsilon(0) = 0$, while $f''_\varepsilon(x) = \varepsilon^{-1}1_{[-\varepsilon,\varepsilon]}(x)$ except at $x = \pm\varepsilon$.
We apply the extension of Itô's formula given in Exercise 11.2 to $f_\varepsilon(W_t)$ and obtain
$$f_\varepsilon(W_t) = \int_0^t f'_\varepsilon(W_s)\,dW_s + \frac12\int_0^t f''_\varepsilon(W_s)\,ds.$$
As we let $\varepsilon \to 0$, we see that $f_\varepsilon(x) \to |x|$ uniformly, and $f'_\varepsilon(x) \to \operatorname{sgn}(x)$ boundedly. By Doob's inequalities, if $t_0 > 0$,
$$\mathbb E\sup_{t\le t_0}\Big|\int_0^t f'_\varepsilon(W_s)\,dW_s - \int_0^t \operatorname{sgn}(W_s)\,dW_s\Big|^2 \to 0, \tag{14.2}$$
while $\sup_{t\le t_0}\big|f_\varepsilon(W_t) - |W_t|\big| \to 0$, a.s. Therefore there exist an increasing process $L^0_t$ and a subsequence $\varepsilon_n \to 0$ such that
$$\sup_{t\le t_0}\Big|\frac{1}{2\varepsilon_n}\int_0^t 1_{[-\varepsilon_n,\varepsilon_n]}(W_s)\,ds - L^0_t\Big| \to 0,\quad\text{a.s.} \tag{14.3}$$
Hence for almost every $\omega$ the convergence is uniform over $t$ in finite intervals, so $L^0_t$ is continuous in $t$. The integral $\frac{1}{2\varepsilon_n}\int_0^t 1_{[-\varepsilon_n,\varepsilon_n]}(W_s)\,ds$ increases only at those times $t$ where $|W_t| \le \varepsilon_n$. Hence $L^0_t$ increases only on the set of times when $W_t = 0$.
In the Tanaka formula, the stochastic integral term is a martingale, say $N_t$. Note that $\langle N\rangle_t = t$. This holds because $\operatorname{sgn}(x)^2 = 1$ unless $x = 0$, and we have seen that Brownian motion spends zero time at 0 (Exercise 11.1). By Lévy's theorem, $N$ is therefore a Brownian motion. We have thus exhibited reflecting Brownian motion, namely $|W_t|$, as the sum of another Brownian motion, $N_t$, and a continuous process that increases only when $W$ is at zero.
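The Tanaka formula has an exact discrete analogue that may help intuition; this is an illustration and is not used in the text. Let $S_k$ be a simple random walk started at 0, with $\operatorname{sgn}$ as defined above. Then $|S_n| = \sum_{k<n}\operatorname{sgn}(S_k)(S_{k+1} - S_k) + \#\{k < n : S_k = 0\}$. Away from 0, a step of size one cannot change the sign. Each visit to 0 adds exactly one to $|S|$. The visit count therefore plays the role of $L^0$, and heuristically $n^{-1/2}$ times the count is the discrete counterpart of the local time. The function names below are hypothetical.

```python
import random


def sgn(x):
    """sgn as defined above: 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def discrete_tanaka(steps):
    """For S_0 = 0 and S_{k+1} = S_k + steps[k] (steps are +1 or -1), return
    (|S_n|, sum_k sgn(S_k)(S_{k+1} - S_k), number of k < n with S_k = 0)."""
    s = 0
    martingale_part = 0
    visits_to_zero = 0
    for step in steps:
        if s == 0:
            visits_to_zero += 1
        martingale_part += sgn(s) * step
        s += step
    return abs(s), martingale_part, visits_to_zero


rng = random.Random(7)
n = 10_000
steps = [rng.choice((-1, 1)) for _ in range(n)]
abs_s, mart, visits = discrete_tanaka(steps)
print(abs_s, "=", mart, "+", visits)            # the identity holds exactly
print("visits / sqrt(n) =", visits / n ** 0.5)  # heuristically comparable to L^0_1
```

The identity is exact for every path because it only uses integer arithmetic. The scaling remark is heuristic.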
Let $M_t$ denote $\sup_{s\le t} W_s$. Note that there is no absolute value here. The following result, due to Lévy, is often useful.
Theorem 14.2 The two-dimensional processes $(|W|, L^0)$ and $(M - W, M)$ have the same law.
Proof Let $V_t = -N_t$ in the Tanaka formula, so that
$$|W_t| = -V_t + L^0_t. \tag{14.4}$$
Let $S_t = \sup_{s\le t} V_s$. We will show $S_t = L^0_t$. This will prove the result. Indeed, $V$ is a Brownian motion, so $(M - W, M)$ is equal in law to $(S - V, S) = (|W|, L^0)$.
From (14.4), $V_t = L^0_t - |W_t|$, so $V_t \le L^0_t$ for all $t$. Since $L^0$ is increasing, it follows that $S_t \le L^0_t$. $L^0_t$ increases only when $W_t = 0$, and at those times
$$L^0_t = V_t + |W_t| = V_t \le S_t.$$
Now consider two increasing continuous functions $f \le g$ with $f(0) = g(0)$, such that $f(t) = g(t)$ at those times when $g$ increases. Then $f$ and $g$ are equal for all $t$: if $u$ is the last time up to $t$ at which $g$ increases, then $g(t) = g(u) = f(u) \le f(t)$. Applying this with $f = S$ and $g = L^0$ gives $L^0_t = S_t$ for all $t$.
Just as we defined $L^0_t$ via the Tanaka formula, we can construct the local time at the level $a$ by the formula
$$|W_t - a| - |W_0 - a| = \int_0^t \operatorname{sgn}(W_s - a)\,dW_s + L^a_t, \tag{14.5}$$
and the same proof as above shows that $L^a_t$ is the limit in $L^2$ of
$$\frac{1}{2\varepsilon}\int_0^t 1_{[a-\varepsilon,a+\varepsilon]}(W_s)\,ds.$$
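For Brownian motion itself, a discretized version of (14.5) can be examined by simulation; this is only an illustrative sketch. Because $|\cdot - a|$ is convex and $\operatorname{sgn}(x - a)$ is a subgradient at $x$, each discrete increment $|W_{k+1} - a| - |W_k - a| - \operatorname{sgn}(W_k - a)(W_{k+1} - W_k)$ is non-negative. It also vanishes unless the step touches or crosses the level $a$. So the discrete remainder behaves like $L^a_t$ and increases only near $a$. The code uses a Gaussian random-walk discretization with an arbitrary step count, seed, and $\varepsilon$. It also prints the occupation-density estimate $\frac{1}{2\varepsilon}\int_0^t 1_{[a-\varepsilon,a+\varepsilon]}(W_s)\,ds$ for comparison. The two printed numbers are only expected to be roughly comparable, not equal.

```python
import math
import random


def sgn(x):
    return (x > 0) - (x < 0)


def local_time_sketch(a, t, n_steps, eps, rng):
    """Simulate W on [0, t] with W_0 = 0 and return (discrete Tanaka increments,
    flags saying whether each step touched or crossed a, occupation estimate)."""
    dt = t / n_steps
    sd = math.sqrt(dt)
    w = 0.0
    increments, crossed = [], []
    occupation = 0.0
    for _ in range(n_steps):
        if abs(w - a) <= eps:
            occupation += dt / (2 * eps)  # (1/2eps) * time spent in [a - eps, a + eps]
        w_next = w + rng.gauss(0.0, sd)
        # |W_{k+1} - a| - |W_k - a| - sgn(W_k - a)(W_{k+1} - W_k) >= 0 by convexity
        increments.append(abs(w_next - a) - abs(w - a) - sgn(w - a) * (w_next - w))
        crossed.append((w - a) * (w_next - a) <= 0)
        w = w_next
    return increments, crossed, occupation


rng = random.Random(3)
inc, crossed, occ = local_time_sketch(a=0.2, t=1.0, n_steps=100_000, eps=0.02, rng=rng)
print("discrete Tanaka remainder:", sum(inc))
print("occupation-density estimate:", occ)
```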
14.2 Joint continuity of local times
Next we will prove that $L^a_t$ can be taken to be jointly continuous in both $t$ and $a$.
Theorem 14.3 Let $W$ be a one-dimensional Brownian motion and let $L^a_t$ be the local time of $W$ at level $a$. For each $a \in \mathbb R$ there exists a version $\widetilde L^a_t$ of $L^a_t$ such that, with probability one, $\widetilde L^a_t$ is jointly continuous in $t$ and $a$.
Recall that two processes $X$ and $Y$ are versions of each other if for each $t$, $X_t = Y_t$, a.s. We will use the Kolmogorov continuity criterion, Corollary 8.2, together with Remark 8.3. We will obtain an estimate on $\widetilde N^a_t - \widetilde N^b_t$, where $\widetilde N^a_t = \int_0^t \operatorname{sgn}(W_s - a)\,dW_s$, by means of the Burkholder–Davis–Gundy inequalities.
Proof Let $M > 0$ be arbitrary. It suffices to show the joint continuity for times less than or equal to $M$ and for $|a| \le M$. Let
$$N^a_t = \int_0^{M\wedge t}\operatorname{sgn}(W_s - a)\,dW_s.$$
Since $|W_t - a|$ is uniformly continuous in $t$ and $a$ for $|t| \le M$, $|a| \le M$, by the Tanaka formula (14.5) it suffices to establish the same fact for $N^a_t$.
Let $T$ be a stopping time bounded by $M$ and let $a < b$. Since $(N^a_t - N^b_t)^2 - \langle N^a - N^b\rangle_t$ is a martingale,
$$\begin{aligned}\mathbb E\big[((N^a_M - N^b_M) - (N^a_T - N^b_T))^2 \mid \mathcal F_T\big] &= \mathbb E\Big[\int_T^M (\operatorname{sgn}(W_s - a) - \operatorname{sgn}(W_s - b))^2\,ds \,\Big|\, \mathcal F_T\Big] = 4\,\mathbb E\Big[\int_T^M 1_{[a,b]}(W_s)\,ds \,\Big|\, \mathcal F_T\Big]\\ &\le 4\,\mathbb E\Big[\int_T^{M+T} 1_{[a,b]}(W_s)\,ds \,\Big|\, \mathcal F_T\Big] = 4\,\mathbb E\Big[\int_0^M 1_{[a,b]}(W_{s+T})\,ds \,\Big|\, \mathcal F_T\Big];\end{aligned}$$
recall Exercise 11.1. From Proposition 4.5 we deduce
$$\mathbb E\Big[\int_0^M 1_{[a,b]}(W_{s+T})\,ds \,\Big|\, \mathcal F_T\Big] \le \int_0^M \frac{c(b - a)}{\sqrt s}\,ds \le c(b - a).$$
Thus $\mathbb E[((N^a_M - N^b_M) - (N^a_T - N^b_T))^2 \mid \mathcal F_T] \le c|b - a|$, and so by (9.3)
$$\mathbb E\big[\langle N^a - N^b\rangle_M - \langle N^a - N^b\rangle_T \mid \mathcal F_T\big] \le c|b - a|.$$
Write $A_t = \langle N^a - N^b\rangle_t$. By Proposition 3.14,
$$\mathbb E\,A_M^2 = 2\,\mathbb E\int_0^M (A_M - A_t)\,dA_t = 2\,\mathbb E\Big[\int_0^M (\mathbb E[A_M \mid \mathcal F_t] - A_t)\,dA_t\Big] = 2\,\mathbb E\Big[\int_0^M \mathbb E[A_M - A_t \mid \mathcal F_t]\,dA_t\Big] \le c|b - a|\,\mathbb E\int_0^M dA_t \le c|b - a|^2,$$
where the last step uses the previous display with $T = 0$. Applying the Burkholder–Davis–Gundy inequalities,
$$\mathbb E\Big[\sup_{t\le M}|N^a_t - N^b_t|^4\Big] \le c|b - a|^2. \tag{14.6}$$
Apply the Kolmogorov continuity criterion on the Banach space of continuous functions with the metric $d(f, g) = \sup_{t\le M}|f(t) - g(t)|$. We see that $N^a_t$ is continuous as a function of $a$ for $a$ in the dyadic rationals in $[-M, M]$, uniformly over $t \le M$. Therefore $L^a_t$ is continuous over $a$ in the dyadic rationals in $[-M, M]$, uniformly for $t \le M$. Also, (14.5) and (14.6) imply
$$\mathbb E\Big[\sup_{t\le M}|L^a_t - L^b_t|^4\Big] \le c\,(|a - b| \wedge 1)^2. \tag{14.7}$$
Define $\widetilde L^a_t = \lim L^{b_n}_t$, where the limit is taken as $b_n \to a$ with each $b_n$ a dyadic rational. Then (14.7) implies that $\widetilde L^a_t = L^a_t$, a.s. The uniform continuity of $L^a_t$ over $a$ in the dyadic rationals and $t \le M$ implies the joint continuity of $\widetilde L^a_t$.

14.3 Occupation times
If we integrate local times over a set, we obtain occupation times. More precisely, we have the following.
Theorem 14.4 Let $W_t$ be a Brownian motion and $L^y_t$ the local time at the level $y$, where we take $L^y_t$ to be jointly continuous in $t$ and $y$. If $f$ is non-negative and Borel measurable, then
$$\int f(y)L^y_t\,dy = \int_0^t f(W_s)\,ds,\quad\text{a.s.}, \tag{14.8}$$
with the null set independent of $f$ and $t$.
Proof Suppose we prove the above equality for each $C^2$ function $f$ with compact support, and denote the null set by $N_f$. Take a countable collection $\{f_i\}$ of non-negative $C^2$ functions with compact support that is dense, in the supremum norm, in the set of non-negative continuous functions on $\mathbb R$ with compact support. Let $N = \bigcup_i N_{f_i}$. If $\omega \notin N$, we have the above equality for all $f_i$. By taking limits, we have (14.8) for all non-negative continuous $f$ with compact support. A further limiting procedure implies our result.
Suppose $f$ is non-negative and $C^2$ with compact support. Notice that the process $\int f(y)L^y_t\,dy$ is increasing and continuous. Define
$$g(x) = \int f(y)|x - y|\,dy. \tag{14.9}$$
By Exercise 14.1, $g$ is $C^2$ with $\frac12 g'' = f$. Take the Tanaka formula (14.5), replace $a$ by $y$, multiply by $f(y)$, and integrate over $\mathbb R$ with respect to $y$. We see that
$$g(W_t) - g(W_0) = \text{martingale} + \int f(y)L^y_t\,dy.$$
Using Itô's formula,
$$g(W_t) - g(W_0) = \text{martingale} + \frac12\int_0^t g''(W_s)\,ds = \text{martingale} + \int_0^t f(W_s)\,ds.$$
Thus $\int f(y)L^y_t\,dy - \int_0^t f(W_s)\,ds$ is a continuous martingale with paths locally of bounded variation. By Theorem 9.7 it is therefore identically 0.
Exercises
14.1 Suppose $f$ is $C^2$ with compact support and $g(x) = \int f(y)|x - y|\,dy$. Show that $g$ is $C^2$ and $g'' = 2f$.
14.2 Let $L^y_t$ be the jointly continuous local times of a Brownian motion $W$. Show that
$$\frac{1}{2\varepsilon}\int_0^t 1_{[y-\varepsilon,y+\varepsilon]}(W_s)\,ds \to L^y_t,\quad\text{a.s.},$$
as $\varepsilon \to 0$. Show the null set can be taken to be independent of $y$. Thus there is no need to take a subsequence $\varepsilon_n$ to get almost sure convergence to $L^y_t$.
14.3 Let $W$ be a Brownian motion and fix $t$. Show that the function $x \mapsto \int_0^t 1_{(-\infty,x]}(W_s)\,ds$ is continuous, a.s., but that the function $x \mapsto 1_{(-\infty,x]}(W_t)$ is not continuous.
14.4 Let $\{\mathcal F_t\}$ be a filtration satisfying the usual conditions. Suppose $W_t$ is a Brownian motion and $X_t = W_t + A_t$. Here $X_t \ge 0$ for all $t$, a.s., and $A_t$ is an increasing continuous adapted process such that $A$ increases only at those times when $X_t = 0$.
Suppose also that $X'_t = W_t + A'_t$, where $X'_t \ge 0$ for all $t$, a.s., and $A'_t$ is an increasing continuous adapted process that increases only when $X'_t = 0$. Show that $X'_t = X_t$ and $A_t = A'_t$, a.s., for all $t \ge 0$.
14.5 Let $W$ be a Brownian motion and $L^0_t$ the local time at 0. Since $L^0_t$ is increasing, for each $\omega$ there is a Lebesgue–Stieltjes measure $dL^0_t$. Show that the support of $dL^0_t$ is equal to $\{t : W_t = 0\}$. Theorem 14.1(2) states that $L^0_t$ does not increase when $W_t$ is not equal to 0. So what you need to show is that, with probability one, if $W_u(\omega) = 0$ and $t < u < v$, then $L^0_v(\omega) > L^0_t(\omega)$.
14.6 Use Tanaka's formula to show the following. Let $L^y_t$ be the local time of Brownian motion at level $y$, let $a \le x \le y \le b$, and let $T = \inf\{t > 0 : W_t \notin [a, b]\}$. Then
$$\mathbb E^x L^y_T = \frac{2(x - a)(b - y)}{b - a}.$$
14.7 If $L^0_t$ is the local time of a Brownian motion at 0, show that for $a > 0$, $L^0_{at}$ has the same law as $\sqrt a\,L^0_t$.
14.8 Let $W$ be a Brownian motion with local times $L^y_t$. Set $L^*_t = \sup_y L^y_t$. Let $p > 0$. Prove that there exist constants $c_1, c_2$ such that if $T$ is any finite stopping time,
$$c_1\,\mathbb E\,T^{p/2} \le \mathbb E\,(L^*_T)^p \le c_2\,\mathbb E\,T^{p/2}.$$
The constants $c_1, c_2$ can depend on $p$, but not on $T$.
Hint: Use Exercise 12.11.
14.9 This exercise defines the local time of a continuous martingale. If $M$ is a continuous martingale, then $|M_t|$ is a submartingale and so equals a martingale plus an increasing process. The increasing process $L^0_t$ is called the local time of $M$ at 0.
(1) Prove the analog of Tanaka's formula.
(2) Define the local time $L^a_t$ of $M$ at $a$. Prove that $L^a_t$ is jointly continuous in $t$ and $a$.
(3) Prove that
$$\int_0^t f(M_s)\,d\langle M\rangle_s = \int_{\mathbb R} L^a_t\,f(a)\,da,\quad\text{a.s.},$$
if $f$ is non-negative and measurable.
14.10 This exercise is a complement to Exercise 7.8. Let $W$ be a Brownian motion and let $Z = \{t \in [0, 1] : W_t = 0\}$, the zero set. Let $\varepsilon \in (0, 1/2)$ and let $\delta > 0$. Fix $\omega$ and let $\{B_i\}$ be any countable covering of $Z(\omega)$ by closed intervals such that the interiors of the $B_i$'s are pairwise disjoint and the length of each $B_i$ is less than or equal to $\delta$. We write $B_i = [a_i, b_i]$.
Since $L^0$ has the same law as the maximum of Brownian motion, there exists a $c$ (depending on $\omega$) such that
$$L^0_t - L^0_s \le c\,(t - s)^{\frac12 - \frac\varepsilon2}$$
for each $0 \le s \le t \le 1$. Write
$$\sum_i |b_i - a_i|^{\frac12 - \varepsilon} \ge \frac{\delta^{-\varepsilon/2}}{c}\,c\sum_i |b_i - a_i|^{\frac12 - \frac\varepsilon2} \ge \frac{\delta^{-\varepsilon/2}}{c}\sum_i \big(L^0_{b_i} - L^0_{a_i}\big) \ge \frac{\delta^{-\varepsilon/2}}{c}\,\big[L^0_1 - L^0_0\big].$$
Show that this implies that the Hausdorff dimension of $Z$ is at least $1/2$.

15 Skorokhod embedding
Suppose $Y$ is a random variable with mean zero and finite variance. Skorokhod proved the remarkable fact that if $W$ is a Brownian motion, there exists a stopping time $T$ such that $W_T$ has the same law as $Y$. Without any restrictions on $T$, there is a trivial solution (see Exercise 15.1), so one wants to require that $\mathbb E\,T < \infty$. Skorokhod's construction required an additional random variable that is independent of the Brownian motion. Since then there have been 15 or 20 other constructions. Most of them do not require the extra randomization, that is, $T$ is a stopping time for the minimal augmented filtration generated by $W$. Conceptually some constructions are easier than others, but none is easy from the point of view of technical details. We will give a construction that does not have any optimality properties, but is a nice example of stochastic calculus. Then we will use this to prove an embedding for random walks.

15.1 Preliminaries
A function $f : \mathbb R \to \mathbb R$ is a Lipschitz function if there exists a constant $k$ such that
$$|f(y) - f(x)| \le k|y - x|,\qquad x, y \in \mathbb R. \tag{15.1}$$
By the mean value theorem, if $f$ has a bounded derivative, then $f$ is a Lipschitz function. We will need the following well-known theorem from the theory of ordinary differential equations.
Theorem 15.1 Suppose $F : [0, \infty) \times \mathbb R \to \mathbb R$ is a bounded function and there exists a positive real $k$ such that $|F(t, x) - F(t, y)| \le k|x - y|$ for all $t \ge 0$ and all $x, y \in \mathbb R$. Let $y_0 \in \mathbb R$ and define the function $y_0$ by $y_0(t) = y_0$ for all $t \ge 0$. Define the functions $y_i$ inductively by
$$y_{i+1}(t) = y_0 + \int_0^t F(s, y_i(s))\,ds,\qquad t \ge 0. \tag{15.2}$$
Then the functions $y_i$ converge uniformly on bounded intervals to a function $y$ that satisfies
$$y(t) = y_0 + \int_0^t F(s, y(s))\,ds. \tag{15.3}$$
At any $s$ at which the function $r \mapsto F(r, y(r))$ is continuous, $y$ satisfies
$$\frac{dy}{ds} = F(s, y(s)). \tag{15.4}$$
The solution to (15.3) is unique. This inductive procedure for obtaining the solution to (15.4) is known as Picard iteration.
Proof Note that each $y_i(t)$ is bounded in absolute value by $|y_0| + t\sup|F|$. Let $g_i(t) = \sup_{s\le t}|y_{i+1}(s) - y_i(s)|$.
If $s \le t$, then
$$|y_{i+1}(s) - y_i(s)| = \Big|\int_0^s [F(r, y_i(r)) - F(r, y_{i-1}(r))]\,dr\Big| \le \int_0^t |F(r, y_i(r)) - F(r, y_{i-1}(r))|\,dr \le k\int_0^t |y_i(r) - y_{i-1}(r)|\,dr \le k\int_0^t g_{i-1}(r)\,dr.$$
Taking the supremum over $s \le t$, we have $g_i(t) \le k\int_0^t g_{i-1}(r)\,dr$. Fix $t_0$. Now $g_1(t)$ is bounded for $t \le t_0$, say by $L$. Then $g_2(t) \le k\int_0^t L\,dr = kLt$ for each $t \le t_0$. Next, $g_3(t) \le k\int_0^t (kLr)\,dr = k^2Lt^2/2$ and $g_4(t) \le k\int_0^t (k^2Lr^2/2)\,dr = k^3Lt^3/3!$. By induction, $g_i(t) \le k^{i-1}Lt^{i-1}/(i-1)!$. We conclude $\sum_{i=1}^\infty g_i(t_0) < \infty$. Then for $m < n$,
$$\sup_{s\le t_0}|y_n(s) - y_m(s)| \le \sum_{i=m}^{n-1} g_i(t_0),$$
which tends to zero as $m$ and $n$ tend to infinity. By the completeness of the space $C[0, t_0]$, there exists a continuous function $y$ such that $\sup_{s\le t_0}|y_n(s) - y(s)| \to 0$ as $n \to \infty$. $F$ is continuous in the $x$ variable, so taking the limit in (15.2) shows that $y$ solves (15.3). If $r \mapsto F(r, y(r))$ is continuous at a particular value of $s$, then (15.4) holds by the fundamental theorem of calculus.
To prove uniqueness, suppose $x$ and $y$ are solutions to (15.3) and let us set $g(t) = \sup_{s\le t}|x(s) - y(s)|$. If $s \le t$, then
$$|x(s) - y(s)| \le \int_0^s |F(r, x(r)) - F(r, y(r))|\,dr \le k\int_0^t |x(r) - y(r)|\,dr \le k\int_0^t g(r)\,dr.$$
Taking the supremum over $s \le t$, we obtain $g(t) \le k\int_0^t g(r)\,dr$. For $t \le t_0$, $|x(t)|$ and $|y(t)|$ are bounded by a constant, so $g(t)$ is bounded for $t \le t_0$, say by $L$. We then have $g(t) \le k\int_0^t L\,dr = kLt$ for each $t \le t_0$, and then $g(t) \le k\int_0^t kLr\,dr = k^2Lt^2/2$. Iterating, we have $g(t) \le k^it^iL/i!$ for each $i$, and hence $g(t) = 0$. This is true for each $t \le t_0$, hence $x(s) = y(s)$ for all $s \le t_0$.
Suppose the random variable $Y$ that we are considering is equal to 0, a.s. Then we can just let our stopping time $T$ equal 0, a.s., and $W_T = 0 = Y$ if $W$ is a Brownian motion started at 0. In the remainder of this section and the next we assume $\mathbb E\,Y = 0$ and $\mathbb E\,Y^2 < \infty$, but that $Y$ is not identically zero. Define
$$p_s(y) = \frac{1}{\sqrt{2\pi s}}\,e^{-y^2/2s},$$
the density of a mean zero normal random variable with variance $s$. Use $p'_s(x)$ to denote the derivative of $p_s$ with respect to $x$.
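The Picard iteration of Theorem 15.1 is easy to run numerically; the sketch below is an illustration only. It replaces each integral in (15.2) by a cumulative trapezoid rule on a uniform grid. At each step it records $\sup_{s\le t_0}|y_{i+1}(s) - y_i(s)|$, the quantity $g_i(t_0)$ from the proof. As a test case it uses $F(t, x) = \cos x$, which is bounded and Lipschitz with $k = 1$. With $y_0 = 0$, the exact solution is $y(t) = \arctan(\sinh t)$. The grid size and the iteration count are arbitrary choices.

```python
import math


def picard(F, y0, t_max, n_grid=2000, n_iter=30):
    """Picard iteration (15.2) on a uniform grid of [0, t_max].

    Each integral is replaced by the cumulative trapezoid rule. Returns the grid,
    the last iterate, and the sup-distances between successive iterates."""
    h = t_max / n_grid
    ts = [k * h for k in range(n_grid + 1)]
    y = [y0] * (n_grid + 1)  # y_0(t) = y0 for all t
    gaps = []
    for _ in range(n_iter):
        f = [F(t, yt) for t, yt in zip(ts, y)]
        new = [y0]
        for k in range(n_grid):
            new.append(new[-1] + 0.5 * h * (f[k] + f[k + 1]))
        gaps.append(max(abs(u - v) for u, v in zip(new, y)))
        y = new
    return ts, y, gaps


# F(t, x) = cos x is bounded and Lipschitz in x with k = 1.
ts, y, gaps = picard(lambda t, x: math.cos(x), 0.0, 2.0)
err = max(abs(yt - math.atan(math.sinh(t))) for t, yt in zip(ts, y))
print("first gaps:", [f"{g:.1e}" for g in gaps[:6]])
print("max error against arctan(sinh t):", err)
```

The gaps decay roughly like the factorial bound $k^{i-1}Lt_0^{i-1}/(i-1)!$ in the proof. The remaining error comes from the trapezoid rule, not from stopping the iteration.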
Lemma 15.2 Suppose $W$ is a Brownian motion and $g : \mathbb R \to \mathbb R$ is such that $\mathbb E[g(W_1)^2] < \infty$. For $0 < s < 1$, let
$$a(s, x) = -\int p'_{1-s}(z - x)\,g(z)\,dz \tag{15.5}$$
and
$$b(s, x) = \int p_{1-s}(z - x)\,g(z)\,dz. \tag{15.6}$$
We have
$$g(W_1) = \mathbb E\,g(W_1) + \int_0^1 a(s, W_s)\,dW_s,\quad\text{a.s.}, \tag{15.7}$$
and
$$\mathbb E[g(W_1) \mid \mathcal F_s] = b(s, W_s),\quad\text{a.s.} \tag{15.8}$$
Proof We will first prove (15.7), starting with the case $g(x) = e^{iux}$. Apply Itô's formula with the function $f(x) = e^x$ to the semimartingale $X_t = iuW_t + u^2t/2$:
$$e^{iuW_t + u^2t/2} = 1 + \int_0^t e^{X_s}\,d(iuW_s + u^2s/2) + \frac12\int_0^t (-u^2)e^{X_s}\,ds = 1 + iu\int_0^t e^{iuW_s + u^2s/2}\,dW_s,$$
so
$$e^{iuW_1} = e^{-u^2/2} + \int_0^1 iu\,e^{iuW_s}e^{u^2(s-1)/2}\,dW_s.$$
We need to check that $iu\,e^{iux}e^{u^2(s-1)/2} = a(s, x)$. Using integration by parts,
$$a(s, x) = -\int p'_{1-s}(z - x)\,g(z)\,dz = \int p_{1-s}(z - x)\,g'(z)\,dz = iu\int \frac{1}{\sqrt{2\pi(1-s)}}\,e^{-(z-x)^2/2(1-s)}\,e^{iuz}\,dz.$$
This is $iu$ times the characteristic function of a normal random variable with mean $x$ and variance $1 - s$. By (A.25) it equals $iu\,e^{iux}e^{-u^2(1-s)/2}$, as desired. We therefore have
$$e^{iuW_1} = \mathbb E\,e^{iuW_1} - \int_0^1\int p'_{1-s}(z - W_s)\,e^{iuz}\,dz\,dW_s. \tag{15.9}$$
Now suppose $g$ is in the Schwartz class (see Section B.2). Replace $u$ by $-u$ in (15.9), multiply by the Fourier transform of $g$, and integrate over $u \in \mathbb R$. We then obtain
$$(2\pi)^{-1}g(W_1) = (2\pi)^{-1}\,\mathbb E\,g(W_1) - \int\int_0^1\int p'_{1-s}(z - W_s)\,e^{-iuz}\,\hat g(u)\,dz\,dW_s\,du, \tag{15.10}$$
where $\hat g$ is the Fourier transform of $g$. Using the Fubini theorem (check that there is no trouble with the stochastic integral; see Exercise 15.2) and the inversion formula for Fourier transforms, the triple-integral term on the right-hand side of (15.10) is equal to
$$-(2\pi)^{-1}\int_0^1\int p'_{1-s}(z - W_s)\,g(z)\,dz\,dW_s, \tag{15.11}$$
which gives us (15.7) when $g$ is in the Schwartz class. A limit argument gives us (15.7) for all $g$ that we are interested in.
To prove (15.8) we again start with the case $g(x) = e^{iux}$. We have
$$\mathbb E[e^{iuW_1} \mid \mathcal F_s] = e^{iuW_s}\,\mathbb E[e^{iu(W_1 - W_s)} \mid \mathcal F_s] = e^{iuW_s}\,\mathbb E[e^{iu(W_1 - W_s)}] = e^{iuW_s}e^{-u^2(1-s)/2},$$
using the independent increments property of Brownian motion and (A.25).
On the other hand, the definition of $b(s, x)$ shows the following: when $g(x) = e^{iux}$, $b(s, x)$ is the characteristic function of a normal random variable with mean $x$ and variance $1 - s$. So $b(s, x) = e^{iux}e^{-u^2(1-s)/2}$. Replacing $x$ by $W_s$ proves (15.8) in the case $g(x) = e^{iux}$. We extend this to general $g$ in the same way as in the proof of (15.7).
Next, we want to find a reasonable function $g$ such that $g(W_1)$ is equal in law to $Y$, where again $W$ is a Brownian motion. Let $F_Y(x) = \mathbb P(Y \le x)$ be the distribution function of $Y$ and let $\Phi(x) = \mathbb P(W_1 \le x)$. Then
$$\mathbb P(\Phi(W_1) \le x) = \mathbb P(W_1 \le \Phi^{-1}(x)) = \Phi(\Phi^{-1}(x)) = x$$
for $x \in [0, 1]$, so $\Phi(W_1)$ is a uniform random variable on $[0, 1]$. Define
$$g(x) = F_Y^{-1}(\Phi(x)). \tag{15.12}$$
We use the right-continuous version of $F_Y^{-1}$ if $F_Y^{-1}$ is not continuous. Then
$$\mathbb P(g(W_1) \le x) = \mathbb P(\Phi(W_1) \le F_Y(x)) = F_Y(x),$$
so $Y$ is equal in law to $g(W_1)$, as desired. Note that $g$ is an increasing function. We will need the following estimates.
Proposition 15.3 Let $g$ be defined by (15.12) and define $a$ and $b$ by (15.5) and (15.6).
(1) For each $L > 0$ and $s_0 < 1$, $a$ is continuously differentiable on $[0, s_0] \times [-L, L]$. Also, for each $L > 0$ and $s_0 < 1$, $a$ is bounded below by a positive constant on $[0, s_0] \times [-L, L]$.
(2) For each $L > 0$ and $s_0 < 1$, $b$ is continuously differentiable on $[0, s_0] \times [-L, L]$.
(3) For each $s \in [0, s_0]$, the function $x \mapsto b(s, x)$ is strictly increasing. For each fixed $s$, let $B(s, x)$ be the inverse of $b(s, x)$, so that $B(s, b(s, x)) = x$ and $b(s, B(s, x)) = x$. For each $L > 0$ and $s_0 < 1$, $B$ is continuously differentiable on $[0, s_0] \times [-L, L]$.
Proof To start, we observe that for every $r > 0$,
$$E\,e^{r|W_1|}\le E\,e^{rW_1} + E\,e^{-rW_1}<\infty.$$
Since $|z|^m\le m!\,e^{|z|}$ if $m$ is a non-negative integer, the Cauchy–Schwarz inequality and the fact that $EY^2<\infty$ give
$$\begin{aligned}\int|z|^me^{r|z|}e^{-z^2/2}|g(z)|\,dz &\le m!\int e^{(r+1)|z|}e^{-z^2/2}|g(z)|\,dz = \sqrt{2\pi}\,m!\,E\big[e^{(r+1)|W_1|}|g(W_1)|\big]\\ &\le\sqrt{2\pi}\,m!\,\big(E\,e^{2(r+1)|W_1|}\big)^{1/2}\big(E\,g(W_1)^2\big)^{1/2} = \sqrt{2\pi}\,m!\,\big(E\,e^{2(r+1)|W_1|}\big)^{1/2}(EY^2)^{1/2}<\infty.\end{aligned}\tag{15.13}$$
We now turn to (1). If $s\le s_0$ and $|x|\le L$, then
$$\begin{aligned}|p'_{1-s}(z-x)| &\le c\,\frac{|z-x|}{(1-s)^{3/2}}e^{-(z-x)^2/2(1-s)}\le c|z-x|\,e^{-x^2/2(1-s)}e^{zx/(1-s)}e^{-z^2/2(1-s)}\\ &\le c(|z|+L)e^{|z|L/(1-s_0)}e^{-z^2/2}\le c|z|e^{c'|z|}e^{-z^2/2} + c\,e^{c'|z|}e^{-z^2/2}.\end{aligned}$$
Therefore
$$|a(s,x)|\le\int c|z|e^{c'|z|}e^{-z^2/2}|g(z)|\,dz + \int c\,e^{c'|z|}e^{-z^2/2}|g(z)|\,dz,$$
which is finite by (15.13). This gives an upper bound for $a$.
By the mean value theorem,
$$|p'_{1-s}(z-x) - p'_{1-s}(z-(x+h))|\le c|h|(1+|z|^2+L^2)e^{-(z-x)^2/2(1-s)}$$
if $s\le s_0$, $|x|\le L$, and $|h|\le1$, so
$$\Big|\frac1h\big(p'_{1-s}(z-x) - p'_{1-s}(z-(x+h))\big)\Big|\le c(1+|z|^2)e^{c'|z|}e^{-z^2/2}.$$
In view of (15.13), we can use dominated convergence to conclude that
$$\frac{\partial a}{\partial x}(s,x) = \int p''_{1-s}(z-x)g(z)\,dz$$
and that $|\partial a(s,x)/\partial x|$ is bounded above on $[0,s_0]\times[-L,L]$. By a similar argument we obtain that $|\partial a(s,x)/\partial s|$ is also bounded above on $[0,s_0]\times[-L,L]$. The same argument shows that the second partial derivatives of $a$ are bounded, and hence the first partial derivatives are continuous.
Using integration by parts, $a(s,x) = \int p_{1-s}(z-x)\,dg(z)$, where the integral is a Lebesgue–Stieltjes integral; recall that $g$ is an increasing function. We are working under the assumption that $Y$ is not identically zero; since $EY = 0$, $Y$ is then not a.s. constant, so $g$ is not constant and $dg$ is a nonzero measure. Because $p_{1-s}(z-x)$ is bounded below by a positive constant for $z$ in any compact set, uniformly in $s\le s_0$ and $|x|\le L$, it follows that $a$ is bounded below by a positive constant for $s\le s_0$ and $|x|\le L$.
The proof of (2) is quite similar. To prove (3), as above, we can use a dominated convergence argument to prove
$$\frac{\partial b(s,x)}{\partial x} = a(s,x).$$
Since $a(s,x)>0$ for each $x$ and each $s\le s_0$, we conclude that $x\mapsto b(s,x)$ is strictly increasing. The estimates for $B$ follow from the implicit function theorem applied to $f(s,x,y) = 0$, where $f(s,x,y) = b(s,x) - y$.
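As a concrete check of (15.5), (15.6), (15.12) and Proposition 15.3, here is a minimal numerical sketch (Python standard library only), assuming $Y$ takes the values $\pm1$ with probability $1/2$ each. Then (15.12) gives $g = -1$ on $(-\infty,0)$ and $g = 1$ on $(0,\infty)$, so $b(s,x) = 2\Phi(x/\sqrt{1-s}) - 1$, and since $dg$ is a point mass of size 2 at 0, $a(s,x) = \int p_{1-s}(z-x)\,dg(z) = 2p_{1-s}(x)$. The midpoint quadrature rule, the truncation at ten standard deviations, and the helper names are illustrative choices, not part of the text.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal law; STD_NORMAL.cdf plays the role of Phi


def quantile_rademacher(u):
    """F_Y^{-1}(u) = inf{y : F_Y(y) >= u} for Y = -1 or +1 with probability 1/2 each."""
    return -1.0 if u <= 0.5 else 1.0


def g(x):
    """g(x) = F_Y^{-1}(Phi(x)) as in (15.12); here g jumps from -1 to +1 at x = 0."""
    return quantile_rademacher(STD_NORMAL.cdf(x))


def p(t, y):
    """Gaussian density p_t(y) with mean 0 and variance t."""
    return math.exp(-y * y / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def dp(t, y):
    """Derivative in y of the Gaussian density: p'_t(y) = -(y / t) p_t(y)."""
    return -(y / t) * p(t, y)


def integrate(f, center, spread, h=1e-3):
    """Midpoint rule over the cells [i*h, (i+1)*h] covering [center - spread, center + spread].

    Because 0 is always a cell edge, the jump of g at 0 does not spoil the accuracy.
    """
    lo = math.floor((center - spread) / h)
    hi = math.ceil((center + spread) / h)
    return h * sum(f((i + 0.5) * h) for i in range(lo, hi))


def b(s, x):
    """b(s, x) = int p_{1-s}(z - x) g(z) dz from (15.6), for 0 <= s < 1."""
    t = 1.0 - s
    return integrate(lambda z: p(t, z - x) * g(z), x, 10.0 * math.sqrt(t))


def a(s, x):
    """a(s, x) = -int p'_{1-s}(z - x) g(z) dz from (15.5), for 0 <= s < 1."""
    t = 1.0 - s
    return integrate(lambda z: -dp(t, z - x) * g(z), x, 10.0 * math.sqrt(t))


if __name__ == "__main__":
    s, x = 0.5, 0.3
    t = 1.0 - s
    print(b(s, x), 2.0 * STD_NORMAL.cdf(x / math.sqrt(t)) - 1.0)  # quadrature vs closed form
    print(a(s, x), 2.0 * p(t, x))                                 # quadrature vs closed form
```

Evaluating `b` and `a` at a few points also illustrates Proposition 15.3 in this example: $b(s,\cdot)$ is strictly increasing, $a>0$, and a central difference of `b` reproduces `a`, in line with $\partial b/\partial x = a$.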
15.2 Construction of the embedding
Theorem 15.4 Suppose $Y$ is a random variable with $EY = 0$ and $EY^2<\infty$. There exist a Brownian motion $N$ and a stopping time $T$ with respect to the minimal augmented filtration of $N$ such that $N_T$ is equal in law to $Y$. Moreover $E\,T = EY^2$.
Proof The idea is to define $M$ by (15.14) below and do a time change so that $N_T = M_1 = g(W_1)$. To show that $T$ is a stopping time relative to the minimal augmented filtration for $N$, we set up an ordinary differential equation that the time change solves and use Picard iteration to show that the solution can be obtained in a constructive way.
The case where $Y$ is identically zero is trivial, for we take $T = 0$, so we suppose $Y$ is not identically zero. Let $W_t$ be a Brownian motion and let $\{\mathcal F_t\}$ be its minimal augmented filtration. Define the function $g$ by (15.12) and define $a$ and $b$ for $s<1$ by (15.5) and (15.6). Define $a(s,x) = 1$ and $b(s,x) = x$ if $s\ge1$. Now let
$$M_t = \int_0^t a(s,W_s)\,dW_s, \tag{15.14}$$
and hence $\langle M\rangle_t = \int_0^t a(s,W_s)^2\,ds$. Note $\langle M\rangle_t\to\infty$, a.s., as $t\to\infty$, because $a = 1$ for $s\ge1$. Since $EY = 0$, then $E\,g(W_1) = 0$, so $M_1 = g(W_1)$ by (15.7).
Let $\tau_t = \inf\{s : \langle M\rangle_s\ge t\}$, the inverse of $\langle M\rangle$. By Theorem 12.2, if we set $N_t = M_{\tau_t}$, then $N$ is a Brownian motion. Let $\{\mathcal G_t\}$ be the minimal augmented filtration generated by $N$. We let $T = \langle M\rangle_1$. Then
$$N_T = N_{\langle M\rangle_1} = M_{\tau_{\langle M\rangle_1}} = M_1 = g(W_1),$$
and $N_T$ has the same law as $Y$. For the integrability of $T$ we have
$$E\,T = E\langle M\rangle_1 = E\,M_1^2 = E[g(W_1)^2] = EY^2 = \operatorname{Var}Y<\infty. \tag{15.15}$$
It remains to show that $T$ is a stopping time with respect to $\{\mathcal G_t\}$. Since $T = \lim_{s\uparrow1}\langle M\rangle_s$, it suffices to show that $\langle M\rangle_s$ is a stopping time with respect to $\{\mathcal G_t\}$ for each $s<1$. Fix $K$. We will show
$$\Big(\tau_t\le s,\ \sup_{r\le t}|N_r|\le K\Big)\in\mathcal G_t,\qquad s<1. \tag{15.16}$$
Letting $K\to\infty$ will then show $(\langle M\rangle_s\ge t) = (\tau_t\le s)\in\mathcal G_t$ for $s<1$. Since $\tau$ is the inverse of $\langle M\rangle$,
$$\frac{d\tau_t}{dt} = \frac{1}{d\langle M\rangle_{\tau_t}/d\tau_t} = \frac{1}{a(\tau_t,W_{\tau_t})^2},\qquad\tau_0 = 0,\quad\text{a.s.}$$
With $B(s,x)$ being the inverse of $b(s,x)$ in the $x$ variable,
$$M_s = E[M_1\mid\mathcal F_s] = E[g(W_1)\mid\mathcal F_s] = b(s,W_s),\quad\text{or}\quad W_s = B(s,M_s),\qquad s<1.$$
Therefore $W_{\tau_t} = B(\tau_t,M_{\tau_t}) = B(\tau_t,N_t)$ on the event $(\tau_t\le s)$ if $s<1$. Thus $\tau_t$ solves the equation
$$\frac{d\tau_t}{dt} = \frac{1}{a(\tau_t,B(\tau_t,N_t))^2},\quad\tau_0 = 0,\qquad\text{or}\qquad\tau_t = \int_0^t\frac{1}{a(\tau_u,B(\tau_u,N_u))^2}\,du.$$
Fix $s$ and $t$ and choose $s_0\in(s,1)$. Let $S_K = \inf\{t : |N_t|\ge K\}$ and let $N^K_t = N_{t\wedge S_K}$. Define
$$\psi(q,r) = \frac{1}{a(r,B(r,N^K_q(\omega)))^2}\qquad\text{if }r\le s_0.$$
Observe that $\psi$ depends on $\omega$. Define $\psi(q,r) = 1$ for $r\ge1$ and define $\psi(q,r)$ by linear interpolation for $r\in(s_0,1)$. Note that by Proposition 15.3 (applied with $L = K$, which is possible because $|N^K_q|\le K$), $\psi$ is continuous, bounded, and there exists $k>0$ such that
$$|\psi(q,r) - \psi(q,r')|\le k|r-r'|,\qquad r\in\mathbb R,\ q\in[0,\infty).$$
At least as long as $t\le S_K$ and $\tau_t\le s_0$, $\tau_t$ solves the equation
$$\tau_t = \int_0^t\psi(u,\tau_u)\,du.$$
We solve the differential equation
$$y(t) = \int_0^t\psi(u,y(u))\,du \tag{15.17}$$
using Theorem 15.1. The function $y_0(t)$ in the statement of Theorem 15.1 is identically zero, and the function $y_1(t) = \int_0^t\psi(u,y_0(u))\,du$ (which depends on $\omega$ because $\psi$ does) will be $\mathcal G_t$ measurable, since $\psi(u,\cdot)$ is determined by $N^K_u$, which is $\mathcal G_u$ measurable. By induction, the functions $y_i(t)$ will be $\mathcal G_t$ measurable. Therefore the limit, $y(t)$, will be $\mathcal G_t$ measurable. Since $|N^K_q(\omega)|\le K$ for all $q$ and we are only interested in the solution to (15.17) for $y(t)\le s$, then $\tau_t = y(t)$ as long as $\tau_t\le s$; therefore (15.16) holds and the proof is complete.
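The Picard scheme used above can be sketched on a time grid. This is a minimal illustration, assuming a hypothetical deterministic right-hand side `psi(q, r)` that stands in for $\psi(q,r)$ at one fixed $\omega$ and is bounded and Lipschitz in $r$; the trapezoid rule, grid size, and iteration count are arbitrary choices, whereas Theorem 15.1 is a statement about exact integrals. It makes visible the feature the measurability argument relies on: the value of every iterate at time $t$ uses `psi(u, ·)` only for $u\le t$.

```python
def picard(psi, t_max, steps=2000, iterations=60):
    """Approximate the solution of y(t) = int_0^t psi(u, y(u)) du on [0, t_max].

    Iterates y_{i+1}(t) = int_0^t psi(u, y_i(u)) du starting from y_0 = 0, storing each
    iterate on a uniform grid and integrating with the trapezoid rule. The value of every
    iterate at grid time k * dt depends on psi(u, .) only for u <= k * dt.
    """
    dt = t_max / steps
    grid = [k * dt for k in range(steps + 1)]
    y = [0.0] * (steps + 1)  # y_0 is identically zero
    for _ in range(iterations):
        f = [psi(u, yu) for u, yu in zip(grid, y)]
        new_y = [0.0]
        for k in range(1, steps + 1):
            new_y.append(new_y[-1] + 0.5 * dt * (f[k - 1] + f[k]))
        y = new_y
    return grid, y


if __name__ == "__main__":
    # Hypothetical right-hand side, bounded and Lipschitz in r; the exact solution obeys y + y^3/3 = t.
    grid, y = picard(lambda q, r: 1.0 / (1.0 + r * r), 2.0)
    print(y[-1], y[-1] + y[-1] ** 3 / 3.0)
```

Changing `psi(q, ·)` only for $q>1$ leaves the computed solution on $[0,1]$ untouched, which is the discrete counterpart of "$y(t)$ is $\mathcal G_t$ measurable".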
In the above theorem, we started with a Brownian motion $W$, constructed a new Brownian motion $N$, and then defined our stopping time $T$ in terms of $N$. We can actually start with a Brownian motion $W$ and define a stopping time with respect to the minimal augmented filtration of $W$ itself.
Corollary 15.5 Let $W$ be a Brownian motion and let $\{\mathcal F_t\}$ be the minimal augmented filtration for $W$. Let $Y$ be a random variable with $EY = 0$ and $\operatorname{Var}Y<\infty$. There exists a stopping time $V$ with respect to $\{\mathcal F_t\}$ such that $W_V$ has the same law as $Y$.
Proof We sketch the proof and ask you to give the details in Exercise 15.3. Define
$$\psi(q,r) = \frac{1}{a(r,B(r,W_q(\omega)))^2}$$
and solve the equation
$$\frac{d\bar\tau_t}{dt} = \psi(t,\bar\tau_t),\qquad\bar\tau_0 = 0$$
by Picard iteration. The proof of Theorem 15.4 shows that the solution $\bar\tau_t$ will satisfy $(\bar\tau_t\le s)\in\mathcal F_t$ for every $t$ as long as $s<1$. Let $A$ be the inverse of $\bar\tau$, and define $V = \lim_{s\uparrow1}A_s$. Then $V$ will be the desired stopping time.
15.3 Embedding random walks
Let us give an application of Skorokhod embedding to show that we can find a Brownian motion that is relatively close to a random walk. Suppose $Y_1,Y_2,\ldots$ is an i.i.d. sequence of real-valued random variables with mean zero and variance one. Given a Brownian motion $W_t$ we can find a stopping time $T_1$ such that $W_{T_1}$ has the same distribution as $Y_1$. We use the strong Markov property at time $T_1$ and find a stopping time $T_2$ for $W_{T_1+t} - W_{T_1}$ so that $W_{T_1+T_2} - W_{T_1}$ has the same distribution as $Y_2$ and is independent of $\mathcal F_{T_1}$. We continue. We see that the $T_i$ are i.i.d. and by Theorem 15.4, $E\,T_i = EY_i^2 = 1$. Let $U_k = \sum_{i=1}^kT_i$. Then for each $n$, $S_n = \sum_{i=1}^nY_i$ has the same distribution as $W_{U_n}$.
Theorem 15.6 $\sup_{i\le n}|W_{U_i} - W_i|/\sqrt n$ tends to $0$ in probability as $n\to\infty$.
Proof We will show that for each $\varepsilon>0$
$$\limsup_{n\to\infty}P\Big(\sup_{k\le n}|W_{U_k} - W_k|>\varepsilon\sqrt n\Big)\le\varepsilon. \tag{15.18}$$
Since the paths of Brownian motion are continuous, we can find δ ≤ 1 small such that
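$$P\Big(\sup_{s,t\le2,\ |t-s|\le\delta}|W_t - W_s|>\varepsilon\Big)<\varepsilon/2.$$
By scaling,
$$P\Big(\sup_{s,t\le2n,\ |t-s|\le\delta n}|W_t - W_s|>\varepsilon\sqrt n\Big)<\varepsilon/2. \tag{15.19}$$
The strong law of large numbers (Theorem A.38) says that $U_n/n\to E\,T_1 = 1$, a.s., and in fact, by Proposition A.39, we even have
$$\frac{\max_{k\le n}|U_k - k|}{n}\to0,\quad\text{a.s.} \tag{15.20}$$
If $\max_{k\le n}|U_k - k|\le\delta n$, then, because $\delta\le1$, we have $U_k\le2n$ for all $k\le n$, so each pair $U_k$, $k$ lies in $[0,2n]$ with $|U_k - k|\le\delta n$. Therefore
$$\begin{aligned}P\Big(\max_{k\le n}|W_{U_k} - W_k|>\varepsilon\sqrt n\Big) &\le P\Big(\max_{k\le n}|U_k - k|>\delta n\Big) + P\Big(\sup_{s,t\le2n,\ |t-s|\le\delta n}|W_t - W_s|>\varepsilon\sqrt n\Big)\\ &\le P\Big(\max_{k\le n}\frac{|U_k - k|}{n}>\delta\Big) + \frac\varepsilon2.\end{aligned}$$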
By (15.20) this will be less than ε if we take n sufficiently large.
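A quick simulation illustrates Section 15.3 in the simplest case. It is a minimal Monte Carlo sketch, assuming the $Y_i$ take the values $\pm1$ with probability $1/2$ each, and it uses the classical exit-time embedding $T = \inf\{t : |W_t| = 1\}$ instead of the construction of Section 15.2: for this $Y$, $W_T$ has the law of $Y$ and $E\,T = 1 = EY^2$. Brownian increments are simulated on a grid of mesh `dt` (an assumption of the sketch), which biases the exit times slightly upward. By the strong Markov property, each $T_i$ is simulated from a fresh path started at 0.

```python
import random


def exit_time(rng, dt):
    """Simulate Brownian motion from 0 on a grid of mesh dt until |W| >= 1.

    Returns (elapsed time, sign of the exit level). Discrete monitoring slightly
    overestimates the exit time, by an amount of order sqrt(dt).
    """
    w, t = 0.0, 0.0
    step_sd = dt ** 0.5
    while abs(w) < 1.0:
        w += rng.gauss(0.0, step_sd)
        t += dt
    return t, (1 if w > 0 else -1)


def embed_walk(n, dt=2e-3, seed=1):
    """Embed n steps of a simple symmetric random walk into Brownian motion.

    Returns the T_i, the steps Y_i (signs of the exit levels), the times
    U_k = T_1 + ... + T_k, and the walk S_k (equal to W_{U_k} for the exact embedding).
    """
    rng = random.Random(seed)
    times, steps, u, walk = [], [], [], []
    total_time, position = 0.0, 0
    for _ in range(n):
        t, y = exit_time(rng, dt)
        times.append(t)
        steps.append(y)
        total_time += t
        position += y
        u.append(total_time)
        walk.append(position)
    return times, steps, u, walk


if __name__ == "__main__":
    times, steps, u, walk = embed_walk(200)
    print(sum(times) / len(times))                                    # compare with E T = E Y^2 = 1
    print(max(abs(uk - (k + 1)) for k, uk in enumerate(u)) / len(u))  # max_k |U_k - k| / n, cf. (15.20)
```

The second printed quantity is the ratio controlled by (15.20); it should be small for moderately large $n$, up to the upward bias caused by the grid.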
Exercises
15.1 Without some supplemental conditions on T , the problem of Skorokhod embedding is trivial.
Suppose W is a Brownian motion with respect to a filtration {Ft} satisfying the usual conditions.
Suppose Y is a finite random variable and suppose h is a real-valued function such that h(W1)
has the same law as Y .

(1) Show that if T = inf{t > 1 : Wt = h(W1)}, then WT and Y have the same law.
(2) Give an example of a mean zero random variable Y with finite variance such that if T is
defined as in (1), then E T = ∞.
15.2 Show that the triple integral on the right-hand side of (15.10) is equal to the expression in
(15.11).
15.3 A sketch was given for the proof of Corollary 15.5. Provide a detailed proof.
15.4 Here is another approach to proving Corollary 15.5. Let Y , N , T , and {Gt} be as in the proof of
Theorem 15.4.
(1) Show that there is a random variable $U$ that is measurable with respect to $\sigma(N_s : 0\le s<\infty)$ such that $U = T$, a.s.
(2) Show there is a Borel measurable map $H : C[0,\infty)\to[0,\infty)$ such that $U = H(N)$.
(3) If $W$ is a Brownian motion, define $V = H(W)$. Show $V$ is a stopping time with respect to the minimal augmented filtration generated by $W$ such that $W_V$ has the same law as $Y$.
15.5 Suppose $p\in(0,1/2)$ and $Y$ is a random variable such that $P(Y = 1) = P(Y = -1) = p$ and $P(Y = 0) = 1-2p$. Let $W$ be a Brownian motion. Let $S_x = \inf\{t>0 : W_t = x\}$ and let
T = inf{t > Sx ∧ S−x : Wt ∈ {−1, 0, 1}}. Determine x such that WT and Y have the same law.
15.6 Suppose Y is a mean zero random variable and there exists a real number K > 0 such that
$|Y|\le K$, a.s. Let $W$ be a Brownian motion and let $T$ be a stopping time with $E\,T<\infty$ such that $W_T$ and $Y$ have the same law. (We do not necessarily assume that $T$ was constructed by the method of Section 15.2.) Let $S_K = \inf\{t : |W_t|\ge K\}$. Prove that $T\le S_K$, a.s.
15.7 Let $Y_i$ be a sequence of i.i.d. random variables with $P(Y_i = 1) = P(Y_i = -1) = \frac12$, and let $S_n = \sum_{i=1}^nY_i$. $S_n$ is called a simple symmetric random walk. Let $T_1,T_2,\ldots$ and $U_1,U_2,\ldots$ be as in Section 15.3.
(1) Prove that $E\,T_1^p<\infty$ for all $p\ge1$.
(2) Prove that if $\varepsilon>0$,
$$\lim_{n\to\infty}\frac{\sup_{k\le n}|U_k - k|}{n^{(1/2)+\varepsilon}} = 0,\quad\text{a.s.}$$
Hint: Use Doob’s inequalities to estimate
$$P\Big(\sup_{k\le n}|U_k - k|\ge\delta n^{(1/2)+\varepsilon}\Big).$$
(3) Show that
$$\sup_{i\le n}|W_{U_i} - W_i|/n^{(1/4)+(\varepsilon/2)}$$
tends to zero in probability as $n\to\infty$.
15.8 Let $S_n$, $T_i$, and $U_i$ be as in Exercise 15.7. Prove that
$$\lim_{n\to\infty}\frac{\sup_{i\le n}|W_{U_i} - W_i|}{\sqrt n} = 0,\quad\text{a.s.}$$
15.9 Let Sn be a simple symmetric random walk; see Exercise 15.7. Let Y be a bounded symmetric
random variable that takes values only in Z. (Y being symmetric means that Y and −Y have the
same law.) Does there necessarily exist a stopping time N such that SN and Y have the same
law? Why or why not?

Notes
The survey article Obłój (2004) summarizes many different methods of Skorokhod embedding. The embedding presented here is from Bass (1983); see also Stroock (2003), pp. 213–17.

16
The general theory of processes
The name “general theory of processes” refers to the foundations of stochastic processes.
Specific topics include measurability issues and classifications of stopping times. This chapter
is fairly technical and abstract and should only be skimmed on the first reading of this book:
read the definitions and statements of theorems, propositions, and lemmas, but not the
proofs.
The two main results we discuss are the measurability of hitting times and the Doob–Meyer decomposition of submartingales (Theorem 16.29).
16.1 Predictable and optional processes
Suppose $(\Omega,\mathcal F,P)$ is a probability space. The outer probability $P^*$ associated with $P$ is given by
$$P^*(A) = \inf\{P(B) : A\subset B,\ B\in\mathcal F\}. \tag{16.1}$$
A set $A$ is a $P$-null set if $P^*(A) = 0$. We suppose throughout this chapter that $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions; recall from Chapter 1 that this means that each $\mathcal F_t$ contains all the $P$-null sets and that $\cap_{\varepsilon>0}\mathcal F_{t+\varepsilon} = \mathcal F_t$ for each $t$. Let $\pi : [0,\infty)\times\Omega\to\Omega$ be defined by
$$\pi(t,\omega) = \omega. \tag{16.2}$$
We define the predictable σ-field $\mathcal P$ to be the σ-field on $[0,\infty)\times\Omega$ generated by the collection of all bounded left-continuous processes adapted to $\{\mathcal F_t\}$. That is, $\mathcal P$ is the σ-field on $[0,\infty)\times\Omega$ generated by the collection of all sets of the form
$$\{(t,\omega)\in[0,\infty)\times\Omega : X_t(\omega)>a\},$$
where $a\in\mathbb R$ and $X$ is a bounded, adapted, left-continuous process. The optional σ-field $\mathcal O$ is the σ-field on $[0,\infty)\times\Omega$ generated by the collection of all bounded right-continuous processes adapted to $\{\mathcal F_t\}$. The word for predictable in French is “prévisible.” The older literature uses “well measurable” in place of the word “optional.”
If $S$ and $T$ are random variables taking values in $[0,\infty]$, let $[S,T) = \{(t,\omega)\in[0,\infty)\times\Omega : S(\omega)\le t<T(\omega)\}$, and define $(S,T]$, $(S,T)$, etc. similarly. With this notation, $[T,T]$, the graph of $T$, is equal to $\{(t,\omega)\in[0,\infty)\times\Omega : T(\omega) = t<\infty\}$. Note that $[T,T]$ is a subset of $[0,\infty)\times\Omega$, so $\pi([T,T]) = (T<\infty)$.
Recall that a stopping time can take the value $\infty$. A stopping time $T$ is predictable if there exists a sequence of stopping times $T_n$ such that for all $\omega$
(1) $T_1(\omega)\le T_2(\omega)\le\cdots$,
(2) $\lim_{n\to\infty}T_n(\omega) = T(\omega)$, and
(3) if $T(\omega)>0$, then $T_n(\omega)<T(\omega)$ for each $n$.
In this case, the stopping times $T_n$ predict $T$ or announce $T$. If $T$ is a stopping time satisfying (1)–(3) above and $S = T$, a.s., then we call $S$ a predictable stopping time as well. A stopping time $T$ is totally inaccessible if $P(T = S<\infty) = 0$ for every predictable stopping time $S$.
For an example of a predictable stopping time, let $W_t$ be a Brownian motion started at 0 and let $T = \inf\{t>0 : W_t = 1\}$. The stopping time $T$ is predicted by the stopping times
$$T_n = \inf\{t>0 : W_t = 1-(1/n)\}.$$
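A simulated path makes the predicting sequence visible. This is a minimal sketch, assuming Brownian motion is sampled on a grid of mesh `dt`, so each hitting time is approximated by the first grid time at or above the level; the helper names and the seed are illustrative. The hitting times of the levels $1-1/n$ are nondecreasing in $n$ and never exceed the hitting time of 1. On the grid, however, the strict inequality $T_n<T$ can fail when a single increment crosses both levels, which is an artifact of discretization rather than a property of Brownian motion.

```python
import math
import random


def brownian_path(n_steps, dt, seed=0):
    """Sample a Brownian path started at 0 at the grid times 0, dt, ..., n_steps * dt."""
    rng = random.Random(seed)
    w, path = 0.0, [0.0]
    for _ in range(n_steps):
        w += rng.gauss(0.0, math.sqrt(dt))
        path.append(w)
    return path


def first_passage(path, dt, level):
    """First grid time at which the sampled path is >= level (math.inf if it never is)."""
    for k, w in enumerate(path):
        if w >= level:
            return k * dt
    return math.inf


if __name__ == "__main__":
    dt = 1e-3
    path = brownian_path(50_000, dt, seed=3)
    T = first_passage(path, dt, 1.0)
    T_n = [first_passage(path, dt, 1.0 - 1.0 / n) for n in range(1, 21)]
    print(T, T_n[-5:])  # the T_n are nondecreasing in n and never exceed T
```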
For an example of a totally inaccessible stopping time, let Pt be a Poisson process with
parameter 1 and let T = inf{t : Pt = 1}, the first time the Poisson process jumps. Since
Pt has independent increments, Pt − t is a martingale, just as in Example 3.2. By (A.8),
$E[(P_t-t)^2]<\infty$. If $S$ is a bounded predictable stopping time, then by the optional stopping theorem, $E\,P_S = E\,S$. If $S_n$ are stopping times predicting $S$, then by monotone convergence
$$E\,P_{S-} = \lim_{n\to\infty}E\,P_{S_n} = \lim_{n\to\infty}E\,S_n = E\,S.$$
Therefore $E[P_S - P_{S-}] = 0$, and since $P_t$ is an increasing process, this says that $P$ does not jump at time $S$, a.s. Applying this to $S\wedge M$ and letting $M\to\infty$, we see that $P$ does not jump at any predictable time $S$, whether or not $S$ is bounded. Since $P$ jumps at time $T$, it follows that $P(T = S<\infty) = 0$, so $T$ is totally inaccessible.
The proof of the following proposition is reminiscent of that of the Vitali covering theorem from measure theory.
Proposition 16.1 Let $T$ be a stopping time. There exist predictable stopping times $S_1,S_2,\ldots$ and a totally inaccessible stopping time $U$ such that
$$[T,T] = [U,U]\cup\Big(\bigcup_{i=1}^\infty[S_i,S_i]\Big).$$
Proof Let
$$a_1 = \sup\{P(S = T<\infty) : S\text{ is a predictable stopping time}\}$$
and choose $S_1$ to be a predictable stopping time such that $P(S_1 = T<\infty)\ge\frac12a_1$. Given $S_1,\ldots,S_n$, let
$$a_{n+1} = \sup\{P(S = T<\infty,\ S\ne S_1,\ldots,S\ne S_n) : S\text{ is a predictable stopping time}\}$$
and choose $S_{n+1}$ such that
$$P(S_{n+1} = T<\infty,\ S_{n+1}\ne S_1,\ldots,S_{n+1}\ne S_n)\ge\tfrac12a_{n+1}.$$
If this procedure stops after $n$ steps (that is, $a_{n+1} = 0$), set $U(\omega)$ equal to $T(\omega)$ if $T(\omega)$ is not equal to any of $S_1(\omega),\ldots,S_n(\omega)$, and equal to infinity otherwise. It is easy to check that $U$ is a stopping time that is totally inaccessible. The other alternative is that this procedure continues indefinitely. In this case define
$$U(\omega) = \begin{cases}T(\omega), & T(\omega)\ne S_1(\omega),S_2(\omega),\ldots,\\ \infty, & \text{otherwise.}\end{cases}$$
There is no problem checking that $U$ is a stopping time, but we need to show that $U$ is totally inaccessible. The events $(S_n = T<\infty,\ S_n\ne S_1,\ldots,S_n\ne S_{n-1})$ are disjoint and each has probability at least $\frac12a_n$; since probabilities are bounded by one, $\sum_na_n\le2$, and in particular $a_n\to0$. If there exists a predictable stopping time $S$ such that $b = P(S = U<\infty)>0$, then $b>2a_n$ for some
n, and in our construction we would have then chosen S in place of the Sn we did choose.
Therefore such a stopping time S cannot exist.
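Returning to the Poisson example before Proposition 16.1, the optional stopping identity $E\,P_S = E\,S$, and the way it fails with $P_{S-}$ in place of $P_S$, can be checked in the simplest case. Take $S = T\wedge1$ with $T$ the first jump time, which is exponential with mean 1. Then $P_S = 1_{(T\le1)}$, so $E\,P_S = 1-e^{-1}$, and $E\,S = \int_0^1P(T>t)\,dt = 1-e^{-1}$ as well, while $P_{S-} = 0$ identically. Thus $E[P_S - P_{S-}]>0$, which by the argument in the text means $S$ cannot be predictable. The Monte Carlo sketch below (sample size and seed are arbitrary) estimates both sides of the identity.

```python
import math
import random


def poisson_stopping_check(n_samples=200_000, seed=7):
    """Estimate E P_S and E S for S = min(T, 1), T the first jump of a rate-1 Poisson process."""
    rng = random.Random(seed)
    total_p_s = 0.0
    total_s = 0.0
    for _ in range(n_samples):
        t = rng.expovariate(1.0)                # first jump time T ~ Exp(1)
        total_p_s += 1.0 if t <= 1.0 else 0.0   # P_S = 1 exactly when the first jump occurs by time 1
        total_s += min(t, 1.0)                  # S = min(T, 1)
    return total_p_s / n_samples, total_s / n_samples


if __name__ == "__main__":
    est_p_s, est_s = poisson_stopping_check()
    print(est_p_s, est_s, 1.0 - math.exp(-1.0))  # both estimates should be close to 1 - 1/e
```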
Proposition 16.2 (1) The optional σ -field O is generated by the collection of sets
{[S, T ) : S, T stopping times}.
(2) $\mathcal O$ is generated by the collection of sets of the form $[a,b)\times C$, where $a<b$ and $C\in\mathcal F_a$.
(3) The predictable σ-field $\mathcal P$ is generated by the collection of sets $\{(S,T] : S,T\text{ stopping times}\}$.
(4) $\mathcal P$ is generated by the collection of sets $\{[S,T) : S,T\text{ predictable stopping times}\}$.
(5) $\mathcal P$ is generated by the collection of sets of the form $[b,c)\times C$, where $a<b<c$ and $C\in\mathcal F_a$.
Proof (1) Since $1_{[S,T)}$ is a bounded right-continuous process that is adapted to $\{\mathcal F_t\}$, sets of the form $[S,T)$ are optional. Now suppose $X$ is a bounded adapted process with right-continuous paths. Let $\varepsilon>0$, let $U_0 = 0$, and let
$$U_{i+1} = \inf\{t>U_i : |X_t - X_{U_i}|>\varepsilon\},\qquad i\ge0. \tag{16.3}$$
Since X has right-continuous paths,
$$(U_1<t) = \bigcup_{q\in\mathbb Q_+,\ q<t}\big(|X_q - X_0|>\varepsilon\big),$$
where Q+ denotes the positive rationals, and it follows that U1 is a stopping time. Similarly
Ui is a stopping time for each i; Exercise 16.4 asks you to prove this. If we set
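$$X^\varepsilon_t(\omega) = \sum_{i=0}^\infty X_{U_i(\omega)}(\omega)\,1_{[U_i(\omega),U_{i+1}(\omega))}(t),$$
then $\sup_t|X_t - X^\varepsilon_t|\le\varepsilon$. Therefore it suffices to show that each process $X^\varepsilon$ is measurable with respect to the σ-field $\hat{\mathcal O}$ generated by the collection of sets of the form $[S,T)$.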
To do that, it suffices to show that processes of the form
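$$Y_t(\omega) = 1_A(\omega)\,1_{[U_i(\omega),U_{i+1}(\omega))}(t),$$
where $A\in\mathcal F_{U_i}$, are measurable with respect to $\hat{\mathcal O}$. If we set $S(\omega)$ equal to $U_i(\omega)$ if $\omega\in A$ and equal to $\infty$ otherwise, and we set $T(\omega)$ equal to $U_{i+1}(\omega)$ if $\omega\in A$ and $\infty$ otherwise, then $S$ and $T$ are stopping times (because $A\in\mathcal F_{U_i}\subset\mathcal F_{U_{i+1}}$) and
$$Y_t(\omega) = 1_{[S(\omega),T(\omega))}(t).$$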
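The approximation step in the proof of (1) is easy to mimic for a path sampled on a finite grid. This minimal sketch assumes the path is given by its values at equally spaced times (a discretization not in the text) and uses the grid analogue of the times $U_i$ in (16.3); by construction the resulting piecewise-constant process stays within $\varepsilon$ of the path at every grid time, the discrete counterpart of $\sup_t|X_t - X^\varepsilon_t|\le\varepsilon$.

```python
import math


def eps_approximation(values, eps):
    """Grid analogue of (16.3): U_0 = 0 and U_{i+1} = first index after U_i with |X - X_{U_i}| > eps.

    values[k] is the path at the k-th grid time. Returns the indices U_0, U_1, ... and the
    piecewise-constant process X^eps, which equals X_{U_i} on [U_i, U_{i+1}).
    """
    stops = [0]
    approx = []
    anchor = values[0]
    for k, x in enumerate(values):
        if k > 0 and abs(x - anchor) > eps:
            stops.append(k)   # a new U_i: the path has moved more than eps from X_{U_{i-1}}
            anchor = x
        approx.append(anchor)
    return stops, approx


if __name__ == "__main__":
    # A right-continuous path with a jump of size 0.5 at index 300.
    path = [math.sin(0.01 * k) + (0.5 if k >= 300 else 0.0) for k in range(1000)]
    stops, approx = eps_approximation(path, 0.1)
    print(stops[:5], max(abs(x - y) for x, y in zip(path, approx)))
```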
(2) If C ∈ Fa, then 1C(ω)1[a,b)(t) is a bounded right-continuous adapted process, so it is
optional. By (1), every bounded right-continuous adapted process can be approximated by
linear combinations of processes of the form $1_{[S,T)}$. Now $1_{[S,T)} = 1_{[S,\infty)} - 1_{[T,\infty)}$, and $1_{[S,\infty)}$ is the limit of $1_{[S_n,\infty)}$, where $S_n = k/2^n$ if $(k-1)/2^n\le S<k/2^n$, and we can similarly approximate $1_{[T,\infty)}$. Note
$$1_{[S_n(\omega),\infty)}(t) = \sum_{k=1}^\infty1_{((k-1)/2^n\le S(\omega)<k/2^n)}\,1_{[k/2^n,\infty)}(t).$$
[…] We have
$$(S,T] = \bigcup_k\Big\{\bigcap_m\big[S+\tfrac1k,\,T+\tfrac1m\big)\Big\}.$$
On the other hand, if S and T are predictable and are predicted by sequences Sn and Tm,
respectively, then
[S, T ) = ∩n{∪m(Sn, Tm]}.
(4) now follows by using (3).
(5) As long as $a+(1/n)<b$, the processes $1_C(\omega)1_{(b-(1/n),c-(1/n)]}(t)$ are left continuous, bounded, and adapted, hence predictable. The process $1_C(\omega)1_{[b,c)}(t)$ is the limit of these processes as $n\to\infty$, so it is predictable. On the other hand, if $X_t$ is a bounded adapted left-continuous process, it can be approximated by
$$\sum_{k=1}^{n2^n-1}X_{(k-1)/2^n}(\omega)\,1_{(k/2^n,(k+1)/2^n]}(t).$$
Each summand can be approximated by linear combinations of processes of the form $1_C(\omega)1_{(b,c]}(t)$, where $C\in\mathcal F_a$ and $a<b<c$. Finally, $1_C(\omega)1_{(b,c]}(t)$ is the limit of $1_C(\omega)1_{[b+(1/n),c+(1/n))}(t)$ as $n\to\infty$.
A consequence of Proposition 16.2(1) and (4) is that $\mathcal P\subset\mathcal O$.
16.2 Hitting times
Let $\mathcal S$ be a separable metric space. Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions and $X$ is a stochastic process taking values in $\mathcal S$ whose paths are right continuous and such that the jump times are totally inaccessible. Saying the jump times are totally inaccessible means that if $T$ is a predictable stopping time, then $X_{T-} = X_T$, a.s., where $X_{T-} = \lim_{s\uparrow T}X_s$. For a Borel set $B\subset\mathcal S$, let
$$U_B = \inf\{t\ge0 : X_t\in B\},\qquad T_B = \inf\{t>0 : X_t\in B\}.$$
TB is known as the first hitting time of B and UB as the first entry time of B.
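The difference between the first entry time and the first hitting time shows up only at time 0. A minimal sketch on a sampled path (the grid representation and the helper name are illustrative assumptions): $U_B$ counts time 0 and $T_B$ does not. On a grid $T_B\ge dt$ always, whereas in continuous time $T_B$ can be 0 even when the process starts in $B$, for instance when $B$ is open and $X_0\in B$, since right continuity then forces $X_t\in B$ for all small $t>0$; the grid cannot detect this.

```python
import math


def entry_and_hitting(path, dt, in_b):
    """Grid versions of U_B = inf{t >= 0 : X_t in B} and T_B = inf{t > 0 : X_t in B}.

    path[k] is the value at time k * dt; in_b tests membership in B. math.inf means "never".
    """
    u_b = next((k * dt for k, x in enumerate(path) if in_b(x)), math.inf)
    t_b = next((k * dt for k, x in enumerate(path) if k > 0 and in_b(x)), math.inf)
    return u_b, t_b


if __name__ == "__main__":
    path = [0.0, 0.5, 1.2, 0.9, 1.5]
    print(entry_and_hitting(path, 0.5, lambda x: x >= 1.0))      # B = [1, oo): the two times agree
    print(entry_and_hitting(path, 0.5, lambda x: abs(x) < 0.1))  # only X_0 is in B: U_B = 0, T_B never
```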
Proposition 16.3 (1) If A is an open set, then TA and UA are stopping times.
(2) If A is a compact set, then TA and UA are stopping times.
Proof (1) Since the paths of Xt are right continuous and A is open, for each t,
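$$(T_A<t) = \bigcup_{q\in\mathbb Q_+,\ q<t}(X_q\in A),$$
[…] If $n>m$, then $X_{T_{A_n}}\in\overline{A_n}\subset\overline{A_m}$, the closure of $A_m$. Either $T_{A_n}(\omega) = T(\omega)$ for all $n$ sufficiently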
large, in which case $X_T(\omega)\in\overline{A_m}$, or else $T_{A_n}(\omega)<T(\omega)$ for all $n$. In the latter case, $X_T(\omega) = \lim_{n\to\infty}X_{T_{A_n}}(\omega)\in\overline{A_m}$ except for $\omega$'s in a null set, since the jump times of $X$ are totally inaccessible. In either case, $X_T\in\overline{A_m}$. This is true for all $m$, so $X_T\in\bigcap_m\overline{A_m} = A$, and therefore $T_A\le T$. We conclude $T_A$ is a stopping time. To prove $U_A$ is a stopping time, we argue using (16.4) as above.
For the proof of the following, which uses Choquet's capacity theorem, we refer the reader to Blumenthal and Getoor (1968), Section I.10. Fix $t$ and define
$$R_t(A) = \{\omega : X_s(\omega)\in A\text{ for some }s\in[0,t]\} = (U_A\le t). \tag{16.5}$$
Theorem 16.4 If $A$ is a Borel subset of $\mathcal S$, then $R_t(A)\in\mathcal F_t$ and there exists an increasing sequence of compact sets $K_n$ contained in $A$ such that $P(R_t(K_n))\uparrow P(R_t(A))$.
Since $(U_A\le t) = R_t(A)$, we have the following as an immediate corollary.
Theorem 16.5 For all Borel sets $A$, $U_A$ is a stopping time.
Here is the main theorem of this section.
Theorem 16.6 Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions and $X$ is a right-continuous process whose jump times are totally inaccessible. If $B$ is a Borel subset of $\mathcal S$, then $T_B$ is a stopping time.
Proof If we let $Y^\delta_t = X_{t+\delta}$ and $U^\delta_B = \inf\{t\ge0 : Y^\delta_t\in B\}$, then by the above, $U^\delta_B$ is a stopping time with respect to the filtration $\{\mathcal F^\delta_t\}$, where $\mathcal F^\delta_t = \mathcal F_{t+\delta}$. It follows that $\delta+U^\delta_B$ is a stopping time with respect to the filtration $\{\mathcal F_t\}$. Since $(1/m)+U^{1/m}_B\downarrow T_B$, $T_B$ is a stopping time with respect to $\{\mathcal F_t\}$.
We now show that the hitting times of Borel sets can be approximated by the hitting times of compact sets.
Proposition 16.7 There exists an increasing sequence of compact sets $K_n$ contained in $B$ such that $U_{K_n}\downarrow U_B$ on $(U_B<\infty)$, $P$-a.s.
Proof For each $t$ we can find an increasing sequence of compact sets $L^t_n$ contained in $B$ with $P(R_t(L^t_n))\uparrow P(R_t(B))$. Let $q_j$ be an enumeration of the non-negative rationals. Let $K_n = L^{q_1}_n\cup\cdots\cup L^{q_n}_n$.
Then the $K_n$ are compact, form an increasing sequence, and are all contained in $B$. Thus $U_{K_n}$ decreases, say to $S$, and since $U_{K_n}\ge U_B$ for all $n$, then $S\ge U_B$. If we prove $S\le U_B$, $P$-a.s., then $S = U_B$, and we have our result. If $U_B<S$, there exists a rational $q_j$ with $U_B<q_j<S$. Hence it suffices to prove $P(U_B<q_j<S) = 0$ for all $j$. If $U_B<q_j$, then $\omega\in R_{q_j}(B)$. Since $R_{q_j}(L^{q_j}_n)\uparrow R_{q_j}(B)$, a.s., then except for a null set, $\omega$ will be in $R_{q_j}(L^{q_j}_n)$ for all $n$ large enough, hence in $R_{q_j}(K_n)$ if $n\ge j$ is large enough. Then $U_{K_n}(\omega)\le q_j$, so $S\le q_j$. Therefore $P(U_B<q_j<S) = 0$.
Theorem 16.8 There exists an increasing sequence of compact sets $K_n$ contained in $B$ such that $T_{K_n}\downarrow T_B$.
Proof Let $Y^\delta_t = X_{t+\delta}$ and $U^\delta_B = \inf\{t\ge0 : Y^\delta_t\in B\}$. Applying the above proposition to $Y^{1/m}$, for each $m$ there exist compact sets $L^m_n$, increasing in $n$ and contained in $B$, such that $U^{1/m}_{L^m_n}\downarrow U^{1/m}_B$. Let $K_n = L^1_n\cup\cdots\cup L^n_n$. Then $K_n$ is an increasing sequence of compact sets contained in $B$, and $U^{1/m}_{K_n}\downarrow U^{1/m}_B$. Also, for each $n$, $1/m+U^{1/m}_{K_n}\downarrow T_{K_n}$ and $1/m+U^{1/m}_B\downarrow T_B$. We write
$$T_B = \lim_m\big(1/m+U^{1/m}_B\big) = \lim_m\lim_n\big(1/m+U^{1/m}_{K_n}\big) = \lim_n\lim_m\big(1/m+U^{1/m}_{K_n}\big) = \lim_nT_{K_n}.$$
Since $1/m+U^{1/m}_{K_n} = \inf\{t\ge1/m : X_t\in K_n\}$ is decreasing in both $m$ and $n$, the change in the order of taking limits is justified. Since $T_{K_n}$ is decreasing, this completes the proof.
16.3 The debut and section theorems
If $E\subset[0,\infty)\times\Omega$, let $D_E(\omega) = \inf\{t\ge0 : (t,\omega)\in E\}$, the debut of $E$. An important generalization of Theorem 16.6 is the following, known as the debut theorem.
Theorem 16.9 If $E\in\mathcal O$, then $D_E$ is a stopping time.
The proof of this theorem is beyond the scope of this book, and we refer the reader to Dellacherie and Meyer (1978) for a proof. Using Theorem 16.9, we can weaken the assumptions on $X$ in Theorem 16.6.
Theorem 16.10 If $X$ is an optional process taking values in $\mathcal S$ and $B$ is a Borel subset of $\mathcal S$, then $U_B$ and $T_B$ are stopping times.
Proof Since $B$ is a Borel subset of $\mathcal S$ and $X$ is an optional process, $1_B(X_t)$ is also an optional process. $U_B$ is then the debut of the set $E = \{(s,\omega) : 1_B(X_s(\omega)) = 1\}$, and therefore is a stopping time. To prove that $T_B$ is a stopping time, we argue exactly as in the proof of Theorem 16.6.
Remark 16.11 In the theory of Markov processes, the notion of completion of a σ-field is a bit different. However, it is still the case that the hitting times of Borel sets by right-continuous processes are stopping times. See Remark 20.4.
The optional section theorem is the following.
Theorem 16.12 If $E$ is an optional set and $\varepsilon>0$, there exists a stopping time $T$ such that
$[T,T]\subset E$ and $P(\pi(E))\le P(T<\infty)+\varepsilon$.
The statement of the predictable section theorem is very similar.
Theorem 16.13 If $E$ is a predictable set and $\varepsilon>0$, there exists a predictable stopping time
$T$ such that $[T,T]\subset E$ and $P(\pi(E))\le P(T<\infty)+\varepsilon$.
Again we refer to Dellacherie and Meyer (1978) for proofs. We note that Proposition 16.7 is a precursor of the optional section theorem. To see this, let $A$ be a Borel set and let $E = \{(t,\omega) : X_t\in A\}$. Then $D_E = U_A$. If the process is right continuous, then $X_{U_{K_n}}\in K_n\subset A$, where the $K_n$ are as in Proposition 16.7, and the graphs of the $U_{K_n}$ are contained in $E$.
Here is a corollary of Theorems 16.12 and 16.13.
Corollary 16.14 (1) If $X$ and $Y$ are optional processes such that $P(X_T = Y_T) = 1$ for every finite stopping time $T$, then $X$ and $Y$ are indistinguishable: $P(X_t = Y_t\text{ for all }t) = 1$.
(2) If $X$ and $Y$ are predictable processes with $P(X_T = Y_T) = 1$ for every finite predictable stopping time $T$, then $X$ and $Y$ are indistinguishable.
Proof We prove (1), the proof of (2) being similar. Let $F = \{(t,\omega) : X_t(\omega)\ne Y_t(\omega)\}$. Then $F$ is an optional set, and if $P(\pi(F))>0$, there exists a stopping time $U$ with $[U,U]\subset F$ and $P(U<\infty)>0$. Choose $N$ so large that $P(U\le N)>0$ and let $T = U\wedge N$. Then $T$ is a finite stopping time and $X_T\ne Y_T$ on $(U\le N)$, a contradiction.
Another application of the section theorems is the following.
Proposition 16.15 Suppose [T, T ] is a predictable set. Then T is a predictable stopping
time.
Proof Since T is the debut of [T, T ], then T is a stopping time. By the predictable section
theorem, Theorem 16.13, for each n there exists a predictable stopping time Sn such that
[Sn, Sn] ⊂ [T, T ] and
$$P(\pi([S_n,S_n]))\ge P(\pi([T,T])) - 2^{-n}.$$
Saying [Sn, Sn] ⊂ [T, T ] implies that for each ω, either Sn(ω) = T (ω) or else Sn(ω) = ∞.
The set of $\omega$'s for which $T(\omega)<\infty$ but $S_n(\omega) = \infty$ has probability at most $2^{-n}$. Let $Q_n = S_1\wedge\cdots\wedge S_n$. Then the $Q_n$'s are predictable stopping times by Exercise 16.1, they decrease, $[Q_n,Q_n]\subset[T,T]$, and $P(\pi([Q_n,Q_n]))\ge P(\pi([T,T])) - 2^{-n+1}$. Let $Q = \lim_nQ_n$. If $Q(\omega)<\infty$, then $Q_n(\omega)<\infty$ for all $n$ sufficiently large (how large depends on $\omega$); since $Q_n(\omega)$ is either equal to $T(\omega)$ or to $\infty$, $Q_n(\omega) = Q(\omega)$ for all $n$ sufficiently large, and hence $Q(\omega) = T(\omega)$. If $T(\omega)<\infty$, then except for a set of $\omega$'s of probability zero, $Q_n(\omega) = T(\omega)$ for $n$ sufficiently large. Therefore $Q = T$, a.s.
Choose $R_{nm}$ predicting $Q_n$ as $m\to\infty$. Choose $m_n$ large enough that
$$P(R_{nm_n}+2^{-n}<Q_n<\infty)<2^{-n}\quad\text{and}\quad P(R_{nm_n}<n,\ Q_n = \infty)<2^{-n}.$$
Let $U_n = n\wedge R_{nm_n}\wedge R_{n+1,m_{n+1}}\wedge\cdots$. Fix $n$ for the moment. If $0<Q(\omega)<\infty$, then $R_{jm_j}(\omega)<Q_j(\omega) = Q(\omega)$ for all $j$ sufficiently large. Choosing $j>n$ sufficiently large,
$$U_n(\omega)\le R_{jm_j}(\omega)<Q(\omega).$$
The $U_n$ increase; we show that their limit is $Q$. By the Borel–Cantelli lemma, if $Q(\omega)<\infty$, then $R_{nm_n}(\omega)\ge Q_n(\omega) - 2^{-n} = Q(\omega) - 2^{-n}$ for all $n$ sufficiently large, except for a set of $\omega$'s of probability zero. Therefore $U_n(\omega)\ge Q(\omega) - 2^{-n+1}$ for $n$ sufficiently large, and we conclude that $U_n(\omega)\uparrow Q(\omega)$, except for a set of $\omega$'s of probability zero. If $Q(\omega) = \infty$, then $Q_n(\omega) = \infty$ for all $n$. By the Borel–Cantelli lemma, except for a set of probability zero, $R_{nm_n}\ge n$ for $n$ sufficiently large. Hence $U_n(\omega) = n$ for $n$ sufficiently large, so $U_n(\omega)<Q(\omega)$ and $U_n(\omega)\uparrow Q(\omega)$. Thus $Q$ is predictable and $T = Q$, a.s. (We leave consideration of those $\omega$ for which $Q(\omega) = 0$ to the reader.)
Proposition 16.16 Let $X_t$ be a predictable process with paths that are right continuous with left limits. If $a\in\mathbb R$ and $T = \inf\{t>0 : X_t\ge a\}$, then $T$ is a predictable stopping time.
Proof The set A = {(t, ω) : Xt (ω) ≥ a} is a predictable set. Since Xt is right continuous,
[T, ∞) = A ∪ (T, ∞) ∈ P by Proposition 16.2, and so [T, T ] = [T, ∞) \ (T, ∞) ∈ P .
Now apply Proposition 16.15.

16.4 Projection theorems
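Let $\mathcal B[0,\infty)$ be the Borel σ-field on $[0,\infty)$, let $\mathcal F_\infty = \bigvee_{t\ge0}\mathcal F_t$, and let $\mathcal H$ be the product σ-field
$$\mathcal H = \mathcal B[0,\infty)\times\mathcal F_\infty. \tag{16.6}$$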
The following is the optional projection theorem.
Theorem 16.17 Let $X$ be a bounded process that is $\mathcal H$ measurable. There exists a unique optional process ${}^oX$ such that
$${}^oX_T1_{(T<\infty)} = E[X_T1_{(T<\infty)}\mid\mathcal F_T] \tag{16.7}$$
for all stopping times $T$, including those taking infinite values. If $X\ge0$, then ${}^oX\ge0$.
${}^oX$ is called the optional projection of $X$. If $X$ is already optional, then $X$ itself satisfies (16.7), so by the uniqueness result, Corollary 16.14, ${}^oX = X$. If we take our stopping time $T$ in (16.7) equal to a fixed time $t$, we have
$${}^oX_t = E[X_t\mid\mathcal F_t],\quad\text{a.s.} \tag{16.8}$$
This observation is sometimes useful when $X$ is not an adapted process and one wants a version of $E[X_t\mid\mathcal F_t]$ that is jointly measurable in $t$ and $\omega$.
If (16.7) holds, then taking expectations shows that
$$E[{}^oX_T;\,T<\infty] = E[X_T;\,T<\infty] \tag{16.9}$$
for all stopping times $T$. Conversely, suppose (16.9) holds for all stopping times $T$. If $S$ is a stopping time and $A\in\mathcal F_S$, let $S_A$ be defined by
$$S_A(\omega) = \begin{cases}S(\omega), & \omega\in A,\\ \infty, & \omega\notin A.\end{cases} \tag{16.10}$$
Then (16.9) with $T$ replaced by $S_A$ implies that
$$E[{}^oX_S1_{(S<\infty)};\,A] = E[X_S1_{(S<\infty)};\,A].$$
Since ${}^oX_S1_{(S<\infty)}$ is $\mathcal F_S$ measurable, this implies (16.7) holds for the stopping time $S$. Consequently (16.7) holding for all stopping times $T$ is equivalent to (16.9) holding for all stopping times $T$.
Proof of Theorem 16.17 The uniqueness is immediate from Corollary 16.14. We look at existence. If $X_t(\omega) = 1_F(\omega)1_{[a,b)}(t)$ where $F\in\mathcal F_\infty$, we set ${}^oX_t$ equal to $E[1_F\mid\mathcal F_t]1_{[a,b)}(t)$, where we use Corollary 3.13 to take the right-continuous version of the martingale $E[1_F\mid\mathcal F_t]$. We check, using optional stopping for this martingale:
$$E[{}^oX_T;\,T<\infty] = E\big[E[1_F\mid\mathcal F_T]1_{[a,b)}(T);\,T<\infty\big] = E[1_F1_{[a,b)}(T);\,T<\infty] = E[X_T;\,T<\infty],$$
since $(T<\infty)$ and $1_{[a,b)}(T)$ are both $\mathcal F_T$ measurable. We then use linearity and limits to define ${}^oX$ for bounded measurable $X$. The positivity of ${}^oX$ when $X\ge0$ is clear from the construction.
Almost the same proof gives
Theorem 16.18 Let $X$ be a bounded measurable process. There exists a unique predictable process ${}^pX$, called the predictable projection of $X$, such that $E[{}^pX_T;\,T<\infty] = E[X_T;\,T<\infty]$ for every predictable stopping time $T$. If $X\ge0$, then ${}^pX\ge0$.
Proof Uniqueness is as before.
If $X_t = 1_F(\omega)1_{(a,b]}(t)$, we let ${}^pX_t = 1_{(a,b]}(t)Z_{t-}(\omega)$, where $Z_{t-}$ denotes the left-hand limit of $Z$ at time $t$ and $Z_t$ is the right-continuous version of the martingale $E[1_F\mid\mathcal F_t]$. We use linearity and limits to define ${}^pX$ for bounded measurable $X$. The positivity of ${}^pX$ when $X\ge0$ is clear.
16.5 More on predictability
If $U$ is a random time, i.e., an $\mathcal F_\infty$ measurable map from $\Omega$ to $[0,\infty]$, define
$$\mathcal F_{U-} = \sigma\{X_U : X\text{ is bounded and predictable}\}.$$
Lemma 16.19 Suppose $T$ is a predictable stopping time predicted by stopping times $T_n$. Then $\mathcal F_{T-} = \bigvee_{n=1}^\infty\mathcal F_{T_n}$.
Proof If $X$ is left continuous, adapted, and bounded, then $X_T = \lim_mX_{T_m}$ and $X_{T_m}$ is $\mathcal F_{T_m}$ measurable, with $\mathcal F_{T_m}\subset\bigvee_n\mathcal F_{T_n}$, so $X_T$ is $\bigvee_n\mathcal F_{T_n}$ measurable. An argument using the monotone class theorem shows $\mathcal F_{T-}\subset\bigvee_n\mathcal F_{T_n}$. On the other hand, suppose $A\in\mathcal F_{T_n}$ for some $n$. Define $X = 1_{(U_n,\infty)}$, where $U_n = T_n$ if $\omega\in A$ and $\infty$ otherwise. Since $T_n<T$ on $(T>0)$, then $X_T = 1_A$. (We leave consideration of
what happens on the event (T = 0) to the reader.) X is predictable since it is left continuous,
adapted, and bounded, so $A$ is $\mathcal F_{T-}$ measurable. Therefore $\mathcal F_{T_n}\subset\mathcal F_{T-}$ for all $n$, and we conclude $\bigvee_n\mathcal F_{T_n}\subset\mathcal F_{T-}$.
Corollary 16.20 Suppose T is a predictable stopping time. If M is a uniformly integrable
martingale with right-continuous paths, then
$$E[M_T\mid\mathcal F_{T-}] = M_{T-}.$$
Proof If Xt = Mt−, then X is left continuous, hence predictable, so MT− = XT is FT−
measurable by the definition of FT − and a limit argument. Suppose the sequence Tn predicts
T . If A ∈ FTm and n > m, then A ∈ FTm ⊂ FTn , and by optional stopping (see Exercise 3.12),
E [MT ; A] = E [MTn; A] → E [MT−; A]
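as $n\to\infty$. Since $\mathcal F_{T-} = \bigvee_m\mathcal F_{T_m}$, a monotone class argument extends this from $A\in\bigcup_m\mathcal F_{T_m}$ to all $A\in\mathcal F_{T-}$, so $E[M_T;A] = E[M_{T-};A]$ for all $A\in\mathcal F_{T-}$. Now use the definition of conditional expectation.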
Corollary 16.21 Let $S$ be a predictable stopping time, $M$ a square integrable martingale, and $N_t = \Delta M_S1_{(t\ge S)}$. Then $N_t$ is a square integrable martingale.

Proof Since $|N_t|\le2\sup_{s\ge0}|M_s|$, $N$ is square integrable. We will show $N$ is a martingale by showing $E\,N_T = 0$ for all bounded stopping times $T$, and then appealing to
Proposition 9.5.
If T is a bounded stopping time, then (T ≥ S) ∈ FS−; to see this, if Sm is a sequence of
stopping times predicting S, then (T ≥ S) = ∩m(T ≥ Sm) ∈ ∨mFSm . Using Corollary 16.20,
E NT = E �MS1(T≥S) = E [MS; T ≥ S] − E [MS−; T ≥ S] = 0,
and we are done.
We now show that every stopping time for Brownian motion is predictable.
Proposition 16.22 Let {Ft} be the minimal augmented filtration of a Brownian motion. If T
is a stopping time with respect to {Ft}, then T is a predictable stopping time.
Proof Let $T$ be a stopping time for Brownian motion. Let $g$ be a continuous strictly increasing function from $[0,\infty]$ to $[0,1]$, e.g., $g(s) = (2/\pi)\arctan s$ with $g(\infty) = 1$. Let $M_t$ be the right-continuous modification of the martingale $E[g(T) \mid \mathcal F_t]$. The key property of Brownian motion here is that every martingale adapted to the filtration of a Brownian motion is continuous; see Corollary 12.5. Hence $M_t$ can be taken to be continuous.
Let $V_t = M_t - g(T\wedge t)$. Then $V_t$ has continuous paths, and since $g(T\wedge t)$ increases with $t$, $V$ is a supermartingale. Since $g(T\wedge t)$ is $\mathcal F_t$ measurable, we have
$$V_t = E[g(T) - g(T\wedge t) \mid \mathcal F_t],$$
so $V$ is non-negative. Clearly $V_T = 0$. If $S$ is the first time that $V_t$ is 0, then $S \le T$. Also,
$$0 = E V_S = E[g(T) - g(T\wedge S)],$$
so $g(T) = g(T\wedge S)$, a.s. Since $g$ is strictly increasing, $S \ge T$, and therefore $S = T$, a.s.
We let $T_n = \inf\{t : V_t = 1/n\}$. By the continuity of $V$, each $T_n$ is strictly less than $T$ if $T > 0$, and the $T_n$ increase up to $T$. Hence $T$ is predictable.
Now let us suppose that $A_t$ is a right-continuous adapted process whose paths are increasing. We call such a process an increasing process. $\Delta A_t$ denotes the jump of $A$ at time $t$, that is, $\Delta A_t = A_t - A_{t-}$.
Proposition 16.23 Suppose $A_t$ is an increasing process such that
(1) $\Delta A_T = 0$ whenever $T$ is a totally inaccessible stopping time, and
(2) $\Delta A_T$ is $\mathcal F_{T-}$ measurable whenever $T$ is a predictable stopping time.
Then $A$ is predictable.
Proof Let $U_{mi}$ be the $i$th time $|\Delta A_t| \in (2^{-m}, 2^{-m+1}]$. The $U_{mi}$ are predictable stopping times by Exercise 16.5. We decompose each $U_{mi}$ as in Proposition 16.1. Since $A$ does not jump at totally inaccessible times, none of the $U_{mi}$ has a totally inaccessible part.
We do this for each $m$ and $i$ and obtain a countable collection of predictable stopping times, the union of whose graphs contains all the jump times of $A$. We order them in some way as $R_1, R_2, \ldots$

Define $T_1 = R_1$. Define $T_2$ by setting $T_2(\omega) = R_2(\omega)$ if $R_2(\omega) \ne R_1(\omega)$ and infinity otherwise. In general, set $T_n(\omega) = R_n(\omega)$ if $R_n(\omega) \ne R_1(\omega), \ldots, R_{n-1}(\omega)$, and $T_n(\omega) = \infty$ otherwise.

We thus get a sequence of predictable stopping times $T_n$ with disjoint graphs. Except for a set of $\omega$'s of probability zero, $\bigcup_n [T_n, T_n]$ includes all the jumps of $A$. The $T_n$ are predictable stopping times by Exercise 16.6.
Since $A$ jumps only at the predictable stopping times $T_n$, we can write
$$A_t = A^c_t + \sum_n \Delta A_{T_n}\, 1_{[T_n,\infty)}(t),$$
where $A^c$ is a continuous increasing process. Being continuous and adapted, $A^c$ is predictable. By hypothesis, $\Delta A_{T_n}$ is $\mathcal F_{T_n-}$ measurable. Therefore the proof will be complete once we show that $(\Delta A_{T_n}) 1_{[T_n,\infty)}$ is a predictable process.
It therefore suffices to show that the process $Y_t = 1_B(\omega)1_{[T,\infty)}(t)$ is predictable if $T$ is a predictable stopping time and $B \in \mathcal F_{T-}$. We have $Y_t = 1_{[T_B,\infty)}(t)$, where $T_B$ is equal to $T$ if $\omega \in B$ and equal to infinity otherwise. The predictability of $Y$ therefore follows by Exercise 16.3.
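The relabeling step in the proof above is pathwise. For a fixed $\omega$, any value $R_n(\omega)$ that has already appeared among $R_1(\omega), \ldots, R_{n-1}(\omega)$ is replaced by $\infty$. The Python sketch below does this for one $\omega$. The function name and the truncation to finitely many $R_n$ are assumptions made for illustration; they are not part of the proof.

```python
import math


def disjoint_graph_times(r_values):
    """Pathwise relabeling from the proof of Proposition 16.23, for one fixed omega.

    r_values: finitely many values R_1(omega), R_2(omega), ... (the proof uses an
    infinite sequence; truncating it is an illustrative assumption).
    Returns T_1(omega), T_2(omega), ..., where T_n(omega) = R_n(omega) unless
    R_n(omega) equals some earlier R_j(omega), in which case T_n(omega) = infinity.
    """
    seen = set()
    t_values = []
    for r in r_values:
        if r in seen:
            # this jump time is already covered by the graph of an earlier T_j
            t_values.append(math.inf)
        else:
            seen.add(r)
            t_values.append(r)
    return t_values


R = [0.5, 1.2, 0.5, math.inf, 1.2, 2.0]
T = disjoint_graph_times(R)
print(T)  # [0.5, 1.2, inf, inf, inf, 2.0]
```

The finite outputs are pairwise distinct, and their set equals the set of finite inputs. This is the "disjoint graphs, same union" property used in the proof.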
16.6 Dual projection theorems
In this section $A_t$ is a right-continuous increasing process with $A_0 = 0$, a.s. We do not necessarily assume that $A_t$ is adapted, only that $A$ is measurable with respect to $\mathcal H$ defined by (16.6). Define $\mu_A$ on elements of $\mathcal H$ by
$$\mu_A(B) = E\int_0^\infty 1_B(t,\omega)\, dA_t(\omega).$$
If $X$ is bounded and $\mathcal H$ measurable, we define $\mu_A(X)$ by $E\int_0^\infty X_t\, dA_t$. Note that if $X = 0$, then $\mu_A(X) = 0$.
Theorem 16.24 Suppose $\mu$ is a bounded positive measure on $\mathcal H$ such that $\mu(X) = 0$ whenever $X = 0$. Then there exists a unique right-continuous increasing process $A$ with $A_0 = 0$, a.s., such that $\mu = \mu_A$.
Proof First, uniqueness. Suppose $\mu = \mu_A = \mu_B$. Let $t > 0$, let $\varepsilon > 0$, and let $C$ be the set of $\omega$'s where $A_t(\omega) > B_t(\omega) + \varepsilon$. Then $\mu_A([0,t]\times C) \ge \mu_B([0,t]\times C) + \varepsilon P(C)$, which implies $P(C) = 0$. Since $\varepsilon$ is arbitrary, $A_t \le B_t$, a.s., and by symmetry $A_t = B_t$, a.s. Applying this to all rational $t$ and using the right continuity of $A$ and $B$, we conclude $A = B$.
To prove existence, for each rational $q$ define $\nu_q(C) = \mu([0,q] \times C)$. If $P(C) = 0$, then $1_{[0,q]\times C} = 0$, so $\nu_q(C) = 0$; thus $\nu_q$ is absolutely continuous with respect to $P$. Let $\tilde A_q$ be the Radon–Nikodym derivative of $\nu_q$ with respect to $P$. Since $\mu$ is positive, $\tilde A_q$ is increasing in $q$. Let $A_t = \limsup_{q\to t,\, q>t} \tilde A_q$. It is easy to check that $\mu_A = \mu$.
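On a finite probability space, the existence half of Theorem 16.24 becomes elementary. There the Radon–Nikodym derivative of $C \mapsto \mu([0,q]\times C)$ is just a ratio of masses. The sketch below makes three assumptions:

- $\Omega$ is finite.
- $P(\{\omega\}) > 0$ for every $\omega$.
- Time runs on a finite grid, so no right-limit step is needed.

It checks $\mu_A = \mu$ on one integrand using exact rational arithmetic. All names and numbers are hypothetical.

```python
from fractions import Fraction


def increasing_process_from_measure(probs, mass):
    """Finite toy version of the existence part of Theorem 16.24.

    probs[w]   : P({w}) > 0 for each outcome w of a finite Omega
    mass[k][w] : mu({t_k} x {w}) >= 0 on a finite time grid t_0 < t_1 < ...
    Returns A[k][w] = A_{t_k}(w), the Radon-Nikodym derivative of
    C -> mu([0, t_k] x C) with respect to P (on a finite space, a ratio).
    """
    running = [Fraction(0)] * len(probs)
    A = []
    for row in mass:
        for w, p in enumerate(probs):
            running[w] += Fraction(row[w]) / Fraction(p)
        A.append(list(running))
    return A


def mu_of(X, mass):
    """mu(X) = sum over grid points (t_k, w) of X(t_k, w) * mu({t_k} x {w})."""
    return sum(X[k][w] * Fraction(mass[k][w])
               for k in range(len(mass)) for w in range(len(mass[k])))


def mu_A(X, probs, A):
    """mu_A(X) = E sum_k X(t_k, .) (A_{t_k} - A_{t_{k-1}}), with A_{t_{-1}} = 0."""
    total = Fraction(0)
    for w, p in enumerate(probs):
        prev = Fraction(0)
        for k in range(len(A)):
            total += Fraction(p) * X[k][w] * (A[k][w] - prev)
            prev = A[k][w]
    return total


probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
# no mass at t_0, so that A_0 = 0 as in the theorem
mass = [[0, 0, 0],
        [Fraction(1, 4), Fraction(1, 6), 0],
        [0, Fraction(1, 12), Fraction(1, 3)]]
A = increasing_process_from_measure(probs, mass)
X = [[5, 5, 5], [2, -1, 3], [7, 4, 1]]
print(mu_of(X, mass), mu_A(X, probs, A))
```

Positivity of $\mu$ shows up as non-negative masses, which make each path $k \mapsto A_{t_k}(\omega)$ non-decreasing.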
Theorem 16.25 Suppose $A$ is right continuous, $A_0 = 0$, a.s., and $\mu_A(X) = \mu_A({}^oX)$ for every bounded $\mathcal H$ measurable process $X$. Then $A_t$ is optional.
Proof Since $A_t$ is right continuous, we need only show that $A_t$ is adapted. Fix $t$, let $Y$ be a bounded $\mathcal F_\infty$ measurable random variable,
$$Z = Y - E[Y \mid \mathcal F_t],$$
and $X_s(\omega) = 1_{[0,t]}(s)Z(\omega)$. If $T$ is a stopping time, then $(T \le t) \in \mathcal F_t$, and so by the definitions of $X$ and $Z$,
$$E[{}^oX_T; T<\infty] = E[X_T; T<\infty] = E[Z; T \le t] = 0.$$
This implies ${}^oX = 0$ by the definition of ${}^oX$. Hence
$$E[A_tZ] = E\Big[\int_0^\infty X_s\,dA_s\Big] = \mu_A(X) = \mu_A({}^oX) = 0.$$
Thus $E[A_tY] = E[A_tE[Y \mid \mathcal F_t]]$. We write
$$\begin{aligned} E[A_tY] &= E\big[A_tE[Y\mid\mathcal F_t]\big] = E\big[E[A_tE[Y\mid\mathcal F_t]\mid\mathcal F_t]\big]\\ &= E\big[E[A_t\mid\mathcal F_t]\,E[Y\mid\mathcal F_t]\big] = E\big[E[Y\,E[A_t\mid\mathcal F_t]\mid\mathcal F_t]\big] = E\big[Y\,E[A_t\mid\mathcal F_t]\big]. \end{aligned}$$
Hence $E[A_tY] = E[Y E[A_t\mid\mathcal F_t]]$ for all bounded $Y$, or $A_t = E[A_t\mid\mathcal F_t]$, a.s., which says that $A_t$ is $\mathcal F_t$ measurable.
Theorem 16.26 If $\mu_A(X) = \mu_A({}^pX)$ for all bounded $X$, then $A$ is predictable and can be taken to be right continuous.
Proof By hypothesis, together with Exercise 16.8,
$$\mu_A({}^oX) = \mu_A({}^p({}^oX)) = \mu_A({}^pX) = \mu_A(X).$$
By Theorem 16.25, $A_t$ is right continuous and optional. We need to show that $A$ does not jump at totally inaccessible times and that $\Delta A_T$ is $\mathcal F_{T-}$ measurable at predictable times $T$; we then use Proposition 16.23.
Let $T$ be a totally inaccessible stopping time and let $B = (\Delta A_T > 0)$. Set $T_B$ equal to $T$ on $B$ and equal to infinity otherwise. It is easy to check that $T_B$ is also totally inaccessible. Let $X = 1_{[T_B,T_B]}$. If $U$ is a predictable stopping time, $E[X_U; U<\infty] = P(T_B = U < \infty) = 0$. By the definition of predictable projection, ${}^pX = 0$. Hence
$$E[\Delta A_T; \Delta A_T > 0] = E[\Delta A_{T_B}] = \mu_A(X) = \mu_A({}^pX) = 0.$$
Now suppose $T$ is a predictable stopping time. Let $Y$ be a bounded $\mathcal F_\infty$ measurable random variable, set
$$Z = Y - E[Y\mid\mathcal F_{T-}],$$
and $X = Z1_{[T,T]}$. Let $S$ be any predictable stopping time. If $W = 1_{[S,S]}$, then $W = \lim_{n\to\infty}1_{[S,S+(1/n))}$ is a predictable process by Proposition 16.2(4). By the definition of $\mathcal F_{T-}$, $W_T$ is $\mathcal F_{T-}$ measurable. This is the same as saying $(S = T < \infty) \in \mathcal F_{T-}$. Therefore
$$E[X_S; S<\infty] = E[Z; S = T<\infty] = 0.$$
This implies ${}^pX = 0$, and then $0 = \mu_A({}^pX) = \mu_A(X) = E[Z\,\Delta A_T]$. Similarly to the proof of Theorem 16.25,
$$E[\Delta A_T Y] = E\big[\Delta A_T E[Y\mid\mathcal F_{T-}]\big] = E\big[E[\Delta A_T\mid\mathcal F_{T-}]\,E[Y\mid\mathcal F_{T-}]\big] = E\big[Y\,E[\Delta A_T\mid\mathcal F_{T-}]\big].$$
Since this holds for all $Y$, $\Delta A_T = E[\Delta A_T\mid\mathcal F_{T-}]$ is $\mathcal F_{T-}$ measurable.
We now define the dual optional projection and the dual predictable projection of an increasing process. Given a right-continuous increasing, not necessarily adapted, process $A_t$ with $A_0 = 0$, a.s., define $\mu^o$ by
$$\mu^o(X) = \mu_A({}^oX) \qquad (16.11)$$
for bounded $\mathcal H$ measurable $X$. Exercise 16.11 asks you to prove that $\mu^o$ is a measure. Clearly $\mu^o({}^oX) = \mu_A({}^o({}^oX)) = \mu_A({}^oX) = \mu^o(X)$. By Theorem 16.17, ${}^oX \ge 0$ if $X \ge 0$, hence $\mu^o$ is a positive measure. If $X = 0$, then ${}^oX = 0$, so $\mu^o(X) = \mu_A({}^oX) = 0$. Therefore, by Theorems 16.24 and 16.25, $\mu^o$ corresponds to an optional increasing process $A^o$, called the dual optional projection of $A$.
The dual optional projection is used in excursion theory. More commonly used is the dual predictable projection, which is defined in a very similar way. Define $\mu^p(X) = \mu_A({}^pX)$, and let $A^p$ be the predictable increasing process associated with $\mu^p$. We often denote $A^p$ by $\tilde A$ and call it the compensator of $A$. The reason for this terminology is the following proposition.
Proposition 16.27 Let $A_t$ be an adapted increasing process with $A_0 = 0$, a.s. Then $A_t - \tilde A_t$ is a martingale.
Proof Let $s < t$ and $B \in \mathcal F_s$, and define
$$S(\omega) = \begin{cases} s, & \omega\in B,\\ \infty, & \omega\notin B,\end{cases} \qquad T(\omega) = \begin{cases} t, & \omega\in B,\\ \infty, & \omega\notin B.\end{cases}$$
Let $X = 1_{(S,T]}$. Since $X$ is left continuous and adapted, it is predictable, so ${}^pX = X$. Then
$$E[A_t - A_s; B] = \mu_A(X) = \mu_A({}^pX) = \mu_{A^p}(X) = E[A^p_t - A^p_s; B],$$
which does it.

16.7 The Doob–Meyer decomposition

Proposition 16.28 If $M$ is a predictable uniformly integrable martingale with paths that are right continuous with left limits, then $M$ is continuous.
Proof Let $\varepsilon > 0$ and let $T = \inf\{t : |\Delta M_t| > \varepsilon\}$. $T$ is a predictable stopping time by Exercise 16.2. By Corollary 16.20, $E[M_T\mid\mathcal F_{T-}] = M_{T-}$. By the definition of $\mathcal F_{T-}$ and a limit argument, $M_T$ is $\mathcal F_{T-}$ measurable, and thus $E[M_T\mid\mathcal F_{T-}] = M_T$. Hence $M_T = M_{T-}$ at all predictable stopping times, and in particular at time $T$. But $\varepsilon$ is arbitrary, so $M$ has no jumps.
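Returning to Proposition 16.27, take a Poisson process $P_t$ with rate $\lambda$. It is a standard fact that its compensator is the deterministic process $\lambda t$. So $P_t - \lambda t$ should be a martingale, that is, $E[(P_t - \lambda t) - (P_s - \lambda s); B] = 0$ for every $B \in \mathcal F_s$. The Monte Carlo sketch below estimates this quantity with $B = (P_s \ge 1)$. The rate, times, seed, sample size, and helper names are illustrative assumptions.

```python
import random


def arrival_times(rate, horizon, rng):
    """Jump times in [0, horizon] of a Poisson process with the given rate,
    generated from i.i.d. exponential inter-arrival times."""
    times, t = [], rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def compensated_increment_on_event(rate, s, t, n_paths=100_000, seed=0):
    """Monte Carlo estimate of E[(P_t - rate*t) - (P_s - rate*s); P_s >= 1].

    The event (P_s >= 1) lies in F_s, so if rate*t is the compensator of P,
    Proposition 16.27 says the exact value is 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        jumps = arrival_times(rate, t, rng)   # one path on [0, t]
        p_s = sum(1 for u in jumps if u <= s)
        p_t = len(jumps)
        if p_s >= 1:
            total += (p_t - rate * t) - (p_s - rate * s)
    return total / n_paths


print(compensated_increment_on_event(rate=2.0, s=1.0, t=3.0))  # close to 0
```

$P_s$ and $P_t$ are read off the same simulated path, so the check uses no independence assumption beyond the construction of the process itself.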
We say a process $X$ is of class D if the family $\{X_T : T \text{ a stopping time}\}$ is uniformly integrable. If $Z_t$ is a supermartingale, then $-Z_t$ is a submartingale. Whether we state the Doob–Meyer decomposition in terms of submartingales or supermartingales is therefore only a matter of convenience. The Doob–Meyer decomposition is the following.
Theorem 16.29 Suppose $Z_t$ is a submartingale of class D with paths that are right continuous with left limits and such that $Z_0 = 0$, a.s. Then $Z_t = M_t + A_t$, where $M_t$ is a uniformly integrable right-continuous martingale with $M_0 = 0$, a.s., and $A_t$ is a predictable increasing process with $A_0 = 0$, a.s. The decomposition is unique.
The existence is the hard part. We define a measure $\mu$ by $\mu((S,T]) = E[Z_T - Z_S]$ for stopping times $S \le T$. We then let $A$ be the increasing process such that $\mu_A(X) = \mu({}^pX)$.
Proof We start with uniqueness. If $Z_t = M_t + A_t = N_t + B_t$, then $M_t - N_t = B_t - A_t$. So $M_t - N_t$ is a uniformly integrable martingale that is predictable, because $B_t - A_t$ is. By Proposition 16.28, $M_t - N_t$ is a continuous martingale. Since $M_t - N_t = B_t - A_t$, its paths are of bounded variation on each finite time interval. Since also $M_0 - N_0 = 0$, Theorem 9.7 gives $M_t - N_t = 0$. This proves uniqueness.
We turn to existence. By the martingale convergence theorem (Theorem 3.12), $Z_\infty = \lim_{t\to\infty}Z_t$ exists, a.s. By Fatou's lemma, $E|Z_\infty| < \infty$.

Let $\mathcal I$ denote the collection of finite unions of subsets of $[0,\infty)\times\Omega$ of the form $(S,T]$, where $S \le T$ are stopping times. Define $\mu((S,T]) = E[Z_T - Z_S]$. Since $Z$ is a submartingale, $\mu$ is non-negative. We note that $\mathcal I$ is an algebra and that $\mu$ is finitely additive on $\mathcal I$. If $K = (S_1,T_1]\cup\cdots\cup(S_n,T_n]$ with $S_1 \le T_1 \le S_2 \le \cdots \le T_n$, set $\overline K = [S_1,T_1]\cup\cdots\cup[S_n,T_n]$.

If $H = (S,T]$ and $\varepsilon > 0$, let
$$S_n(\omega) = \begin{cases} S(\omega) + (1/n), & S(\omega) + (1/n) < T(\omega),\\ \infty, & \text{otherwise},\end{cases} \qquad T_n(\omega) = \begin{cases} T(\omega), & S(\omega) + (1/n) < T(\omega),\\ \infty, & \text{otherwise}.\end{cases}$$
Then $[S_n,T_n] \subset (S,T]$, $S_n \downarrow S$, and $T_n \downarrow T$. Since $Z$ is right continuous and of class D, $\mu((S_n,T_n]) = E[Z_{T_n} - Z_{S_n}] \to E[Z_T - Z_S] = \mu(H)$. Thus if $n$ is sufficiently large and we take $K = (S_n,T_n]$, then $\overline K \subset H$ and $\mu(K) > \mu(H) - \varepsilon$.
We now prove that $\mu$ is countably additive on $\mathcal I$. Suppose $H_n \in \mathcal I$ with $H_n \downarrow \emptyset$. We need to show that $\mu(H_n) \downarrow 0$.
Let $\varepsilon > 0$ and choose $K_n \in \mathcal I$ such that $\overline K_n \subset H_n$ and $\mu(K_n) > \mu(H_n) - \varepsilon/2^n$. Let $L_n = \overline K_1 \cap \cdots \cap \overline K_n$. Then for each $n$ we have $\mu(H_n) \le \mu(L_n) + \varepsilon$. Since $L_n \subset \overline K_n \subset H_n$, we have $L_n \downarrow \emptyset$.
Let $D_{L_n}$ be the debut of $L_n$. The stopping times $D_{L_n}$ increase; let $R$ be the limit. Let $F_n = F_n(\omega) = \{t : (t,\omega) \in L_n\}$. This is a closed subset of $[0,\infty)$, and $D_{L_n}(\omega) \in F_n \subset F_m$ whenever $n \ge m$ and $D_{L_n}(\omega) < \infty$. If $R(\omega) < \infty$, then $R(\omega) \in F_m$ for each $m$, which contradicts $\bigcap_m L_m = \emptyset$. Therefore $R = \infty$. Since $Z$ is of class D, $Z_{D_{L_n}}$ converges almost surely and in $L^1$ to $Z_\infty$. Thus
$$\mu(L_n) \le E[Z_\infty - Z_{D_{L_n}}] \to 0.$$
Hence $\limsup_n \mu(H_n) \le \varepsilon$, and since $\varepsilon$ is arbitrary, $\mu(H_n) \to 0$. This proves that $\mu$ is countably additive on $\mathcal I$.
By the Carathéodory extension theorem, $\mu$ may be extended to a measure on $\mathcal P$. Define $\tilde\mu(X) = \mu({}^pX)$. Then $\tilde\mu({}^pX) = \mu({}^p({}^pX)) = \mu({}^pX) = \tilde\mu(X)$, and so there exists a predictable right-continuous increasing process $A_t$ such that $\tilde\mu = \mu_A$. Since
$$E A_\infty = \mu_A(1_{(0,\infty)}) = \mu({}^p1_{(0,\infty)}) = \mu(1_{(0,\infty)}) = E[Z_\infty - Z_0] < \infty,$$
$A_\infty$ is integrable. Since $A_t$ is an increasing process, the collection of random variables $\{A_t\}$ is therefore uniformly integrable. If $S$ is any stopping time, then by Proposition 16.2, $(S,\infty)$ is a predictable set, hence ${}^p1_{(S,\infty)} = 1_{(S,\infty)}$. We thus have
$$E[A_\infty - A_S] = \tilde\mu((S,\infty)) = \mu({}^p1_{(S,\infty)}) = \mu(1_{(S,\infty)}) = E[Z_\infty - Z_S].$$
Let $t > 0$ and $B \in \mathcal F_t$, and define $S = t$ if $\omega \in B$ and $S = \infty$ otherwise. Then
$$E[A_\infty - A_t; B] = E[A_\infty - A_S] = E[Z_\infty - Z_S] = E[Z_\infty - Z_t; B],$$
or $M_t = Z_t - A_t$ is a martingale. Proposition A.17 tells us that $M$ is a uniformly integrable martingale.
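In discrete time, the analogue of Theorem 16.29 is Doob's decomposition. There the predictable increasing part is simply
$$A_k = \sum_{j\le k} E[Z_j - Z_{j-1}\mid\mathcal F_{j-1}],$$
and no measure extension is needed. The sketch below is a swapped-in discrete illustration, not the continuous-time construction above. It computes $A$ exactly for functionals of a fair $\pm1$ random walk $S_k$ by enumerating all paths. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def doob_increasing_part(z, n):
    """Exact discrete-time Doob decomposition Z_k = M_k + A_k.

    z maps a tuple of +-1 steps (a path prefix of a fair coin-flip walk) to Z_k;
    the prefix of length k generates F_k.
    Returns {path: [A_0, ..., A_n]} for every path of length n, where
    A_k - A_{k-1} = E[Z_k - Z_{k-1} | F_{k-1}] is computed exactly.
    """
    result = {}
    for path in product((-1, 1), repeat=n):
        a = [Fraction(0)]
        for k in range(1, n + 1):
            prefix = path[:k - 1]
            # given F_{k-1}, each of the two next steps has probability 1/2
            drift = sum(Fraction(z(prefix + (step,)) - z(prefix)) for step in (-1, 1)) / 2
            a.append(a[-1] + drift)
        result[path] = a
    return result


def walk_squared(prefix):
    return sum(prefix) ** 2


def walk_abs(prefix):
    return abs(sum(prefix))


squares = doob_increasing_part(walk_squared, 4)
absolute = doob_increasing_part(walk_abs, 4)
print([int(x) for x in absolute[(1, -1, 1, -1)]])  # [0, 1, 1, 2, 2]
```

For $Z_k = S_k^2$ the increasing part is $A_k = k$ on every path. For $Z_k = |S_k|$ it goes up by 1 exactly at the steps taken from 0, so $A_k$ counts the visits of the walk to 0 at times $0, \ldots, k-1$.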
A process X is of class DL if there exist stopping times Vn → ∞ such that Xt∧Vn is of
class D for each n. It is clear that there is a version of the Doob–Meyer decomposition for
submartingales of class DL.
Proposition 16.30 The process A is continuous if and only if E ZTn → E ZT whenever Tn ↑ T
and Tn < T on (T > 0).
Proof Let $T$ be a predictable stopping time predicted by the sequence $T_n$. Since we know $E[A_\infty - A_{T_n}] = E[Z_\infty - Z_{T_n}]$, taking limits gives
$$E[A_\infty - A_{T-}; T<\infty] = E[Z_\infty - Z_{T-}; T<\infty],$$
using the fact that $Z$ is of class D. Also $E[A_\infty - A_T] = E[Z_\infty - Z_T]$. Thus $E[A_T - A_{T-}] = E[Z_T - Z_{T-}]$. Then $E[A_T - A_{T-}] = 0$ if and only if $E Z_T = E Z_{T-}$.
Corollary 16.31 Let $S$ be a totally inaccessible stopping time, $Y$ a non-negative bounded random variable that is $\mathcal F_S$ measurable, and $A_t = Y1_{(t\ge S)}$. Let $\tilde A$ be the compensator of $A$. Then $\tilde A$ has continuous paths.
Proof Let $T$ be a stopping time and let $T_n$ be stopping times increasing to $T$. If $P(T = S) = 0$, then $\lim_{n\to\infty} A_{T_n} = A_T$, a.s., since $A$ jumps only at time $S$. If $P(T = S) > 0$, then $[T,T]$ cannot contain the graph of a predictable stopping time, since $S$ is totally inaccessible. Therefore we cannot have $T_n < T$ for all $n$ with positive probability, hence $T_n(\omega) = T(\omega)$ for all $n$ sufficiently large (depending on $\omega$). Thus again $\lim_{n\to\infty} A_{T_n} = A_T$, a.s. Since $Y$ is bounded, $E A_{T_n} \to E A_T$ by dominated convergence. Apply Proposition 16.30 to the submartingale $A$, whose Doob–Meyer increasing part is $\tilde A$; it shows that $\tilde A$ is continuous.

16.8 Two inequalities

Proposition 16.32 Suppose $Z_t = M_t - A_t$, where $M_t$ is a uniformly integrable martingale and $A_t$ is an increasing predictable process with $A_0 = 0$, a.s. Suppose $Z$ is bounded, that is, there exists $K > 0$ such that $P(|Z_t| > K \text{ for some } t) = 0$. If $p$ is any positive integer, $E A_\infty^p < \infty$.
Proof Let $\lambda > 0$ and let $M = 4K$. Let $T = \inf\{t : A_t \ge \lambda\}$. Because $A_{T-} \le \lambda$,
$$\begin{aligned} P(A_\infty \ge \lambda + M) &= P(A_\infty \ge \lambda + M, T<\infty) \le P(A_\infty - A_{T-} \ge M, T<\infty)\\ &\le E\Big[\frac{A_\infty - A_{T-}}{M}; A_\infty - A_{T-} \ge M, T<\infty\Big] \le \frac1M E[A_\infty - A_{T-}; T<\infty]. \end{aligned}$$
We will show
$$\frac1M E[A_\infty - A_{T-}; T<\infty] \le \tfrac12 P(T<\infty). \qquad (16.12)$$
Since $P(T<\infty) = P(A_\infty \ge \lambda)$, this implies
$$P(A_\infty \ge \lambda + M) \le \tfrac12 P(A_\infty \ge \lambda). \qquad (16.13)$$
Taking $\lambda = kM$ in (16.13) yields $P(A_\infty \ge (k+1)M) \le \frac12 P(A_\infty \ge kM)$. Since $P(A_\infty \ge M) \le 1$, induction tells us
$$P(A_\infty \ge kM) \le \frac{1}{2^{k-1}}.$$
This implies our conclusion, because
$$E A_\infty^p \le \sum_{k=0}^\infty \big((k+1)M\big)^p P(A_\infty \ge kM) \le M^p + \sum_{k=1}^\infty \frac{\big((k+1)M\big)^p}{2^{k-1}} < \infty.$$
Therefore we need to prove (16.12). $T$ is a predictable stopping time by Proposition 16.16. Let $T_n$ be stopping times with $T_n \uparrow T$ and $T_n < T$ on $(T > 0)$. Let $n$ be fixed for the moment and let $N > 0$. If $j > n$,
$$E[A_\infty - A_{T_j}; T_n<N] = E\big[E[A_\infty - A_{T_j}\mid\mathcal F_{T_j}]; T_n<N\big] = -E\big[E[Z_\infty - Z_{T_j}\mid\mathcal F_{T_j}]; T_n<N\big] \le 2KP(T_n<N),$$
since $Z_t + A_t$ is a martingale, $(T_n<N) \in \mathcal F_{T_n} \subset \mathcal F_{T_j}$, and $|Z|$ is bounded by $K$. Letting $j\to\infty$ and using Fatou's lemma, we get $E[A_\infty - A_{T-}; T_n<N] \le 2KP(T_n<N)$. Letting $n\to\infty$ and using Fatou's lemma again, $E[A_\infty - A_{T-}; T<N] \le 2KP(T\le N)$. Finally, letting $N\to\infty$, by monotone convergence, $E[A_\infty - A_{T-}; T<\infty] \le 2KP(T<\infty)$. By our choice of $M$, this gives (16.12).
For use in the reduction theorem in Chapter 17, we will need a variation of the preceding proposition.
Proposition 16.33 Let $U$ be a stopping time and $Y$ a non-negative integrable random variable that is $\mathcal F_U$ measurable. Let $N_t$ be the right-continuous version of $E[Y\mid\mathcal F_t]$. Suppose there exists $K > 0$ such that $N_t \le K$ if $t < U$. Let $Z_t = Y1_{(t\ge U)}$, which is an increasing process, and let $A_t$ be its compensator. If $p$ is a positive integer, then $E A^p_\infty < \infty$.
Proof As in the proof of Proposition 16.32 (now with $M = 2K$), it suffices to show
$$E[A_\infty - A_{T-}; T<\infty] \le KP(T<\infty), \qquad (16.14)$$
where $\lambda > 0$ and $T = \inf\{t : A_t \ge \lambda\}$. Since $A$ is a predictable process, $T$ is a predictable stopping time by Proposition 16.16. Let $T_n$ be stopping times predicting $T$.
Let $N, n \ge 1$. If $j > n$, then $(T_n < N) \in \mathcal F_{T_n} \subset \mathcal F_{T_j}$ and
$$E[A_\infty - A_{T_j}; T_n<N] = E[Z_\infty - Z_{T_j}; T_n<N]. \qquad (16.15)$$
We observe that $Z_\infty - Z_{T_j} = 0$ on the event $(T_j \ge U)$, while $Z_\infty - Z_{T_j} = Y$ on the event $(T_j < U)$. Therefore
$$\begin{aligned} E[Z_\infty - Z_{T_j}; T_n<N] &= E[Y; T_j<U, T_n<N] = E\big[E[Y\mid\mathcal F_{T_j}]; T_j<U, T_n<N\big]\\ &= E[N_{T_j}; T_j<U, T_n<N] \le KP(T_j<U, T_n<N) \le KP(T_n<N). \end{aligned}$$
With this and (16.15), we can now proceed as in the proof of Proposition 16.32 to obtain (16.14).

Exercises

16.1 Show that if $S_1, \ldots, S_n$ are predictable stopping times, then so are $S_1\wedge\cdots\wedge S_n$ and $S_1\vee\cdots\vee S_n$.
16.2 If $A_t$ is a predictable process with paths that are right continuous with left limits and $a > 0$, show $T = \inf\{t > 0 : \Delta A_t > a\}$ is a predictable stopping time.
16.3 Show that if $T$ is a predictable stopping time, $B \in \mathcal F_{T-}$, and $T_B(\omega)$ is defined to be equal to $T(\omega)$ if $\omega \in B$ and equal to $\infty$ otherwise, then $T_B$ is a predictable stopping time.
16.4 Let $X$ be a bounded adapted right-continuous process, let $\varepsilon > 0$, let $U_0 = 0$, a.s., and define $U_i$ by (16.3) for $i \ge 1$. Show each $U_i$ is a stopping time.
16.5 Let $A$ be a predictable increasing process and let $S_k$ be the $k$th time $A$ jumps more than $\varepsilon$. Thus $S_0 = 0$, a.s., and $S_{k+1} = \inf\{t > S_k : \Delta A_t > \varepsilon\}$. Show each $S_k$ is a predictable stopping time.
16.6 Show that the stopping times $T_n$ defined in the proof of Proposition 16.23 are predictable.
16.7 Show that if $P_t$ is a Poisson process, then $({}^pP)_t = P_{t-}$.
16.8 Show that if $X$ is bounded and measurable with respect to the product $\sigma$-field $\mathcal B[0,\infty)\times\mathcal F_\infty$, then ${}^p({}^oX) = {}^pX$.
16.9 Suppose $T$ is a totally inaccessible stopping time. Show that if $X = 1_{[T,T]}$, then ${}^pX = 0$.
16.10 If $P$ is a Poisson process with parameter $\lambda$, determine $P^o_t$ and $P^p_t$.
16.11 Show that $\mu^o$ defined in (16.11) is a measure.
16.12 Let $X_t$ be a continuous process and suppose there exists $K > 0$ such that for all $t$,
$$E\big[\,|X_\infty - X_t|\;\big|\;\mathcal F_t\big] \le K, \quad\text{a.s.}$$
Let $X^*_\infty = \sup_{t\ge0}|X_t|$. Prove that there exists $a$ depending only on $K$ such that
$$E e^{aX^*_\infty} < \infty.$$
This is sometimes called the John–Nirenberg inequality, after the inequality of the same name in analysis. Hint: Imitate the proof of Proposition 16.32. This exercise is somewhat easier than the proof of that proposition because $X$ has continuous paths.
16.13 A martingale $M$ is said to be in the space BMO if
$$\sup_{t\ge0} E[M^2_\infty - M^2_t \mid \mathcal F_t] < \infty, \quad\text{a.s.}$$
Let $M^*_t = \sup_{s\le t}|M_s|$. Show that if $M$ is in BMO, then there exists $a > 0$ such that
$$E e^{aM^*_\infty} < \infty.$$
The name BMO comes from the "bounded mean oscillation" spaces of harmonic analysis. Hint: Use Exercise 16.12.

Notes

A progressively measurable set is one whose indicator is a progressively measurable process, which is defined in Exercise 1.3. In fact, the debut of a progressively measurable set is a stopping time; see Dellacherie and Meyer (1978). An elementary proof of the general Doob–Meyer theorem along the lines of the proof given in Chapter 9 can be found in Bass (1996). See Dellacherie and Meyer (1978) for more on the general theory of processes.

17 Processes with jumps

In this chapter we investigate the stochastic calculus for processes that may have jumps as well as a continuous component. If $X$ is not a continuous process, it is no longer true that $X_{t\wedge T_N}$ is a bounded process when $T_N = \inf\{t : |X_t| \ge N\}$, since there could be a large jump at time $T_N$. We investigate stochastic integrals with respect to square integrable (not necessarily continuous) martingales, Itô's formula, and the Girsanov transformation. We prove the reduction theorem that allows us to look at semimartingales that are not necessarily bounded.
Since I encouraged you to skim Chapter 16 on the first reading of this book, it is only fair that I tell you the facts that we will need from that chapter. We will need:

- the Doob–Meyer decomposition (Theorem 16.29);
- Proposition 16.1;
- Corollaries 16.21 and 16.31;
- the two inequalities in Propositions 16.32 and 16.33.

17.1 Decomposition of martingales

We assume throughout this chapter that $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions. This means that each $\mathcal F_t$ contains every $P$-null set and $\bigcap_{\varepsilon>0}\mathcal F_{t+\varepsilon} = \mathcal F_t$ for each $t$.
Let us begin by recalling a few definitions and facts. The predictable $\sigma$-field is the $\sigma$-field of subsets of $[0,\infty)\times\Omega$ generated by the collection of bounded, left-continuous processes that are adapted to $\{\mathcal F_t\}$; see Section 10.1. A stopping time $T$ is predictable, and predicted by the sequence of stopping times $T_n$, if $T_n\uparrow T$ and $T_n < T$ on the event $(T > 0)$. A stopping time $T$ is totally inaccessible if $P(T = S) = 0$ for every predictable stopping time $S$. The graph of a stopping time $T$ is $[T,T] = \{(t,\omega) : t = T(\omega) < \infty\}$; see Section 16.1. If $X_t$ is a process that is right continuous with left limits, we set $X_{t-} = \lim_{s\to t,\,s<t}X_s$ … $T_{i1} : |\Delta M_t| \in [2^i, 2^{i+1})\}$, and so on; $i$ can be both positive and negative.

Since $M_t$ is right continuous with left limits, for each $i$, $T_{ij}\to\infty$ as $j\to\infty$. We conclude that $M_t$ has at most countably many jumps. Next we decompose each $T_{ij}$ into predictable and totally inaccessible parts by Proposition 16.1. We relabel the jump times as $S_1, S_2, \ldots$ so that:

- each $S_k$ is either predictable or totally inaccessible;
- the graphs of the $S_k$ are disjoint;
- $M$ has a jump at each time $S_k$ and only at these times;
- $|\Delta M_{S_k}|$ is bounded for each $k$.

Compare the proof of Proposition 16.23. We do not assume that $S_{k_1} \le S_{k_2}$ if $k_1 \le k_2$, and in general it would not be possible to arrange this.
If $S_i$ is a totally inaccessible stopping time, let
$$A_i(t) = \Delta M_{S_i}1_{(t\ge S_i)} \qquad (17.2)$$
and
$$M_i(t) = A_i(t) - \tilde A_i(t), \qquad (17.3)$$
where $\tilde A_i$ is the compensator of $A_i$. $A_i(t)$ is the process that is 0 up to time $S_i$ and then jumps an amount $\Delta M_{S_i}$; thereafter it is constant. By Corollary 16.31, $\tilde A_i$ is continuous. If $S_i$ is a predictable stopping time, let
$$M_i(t) = \Delta M_{S_i}1_{(t\ge S_i)}. \qquad (17.4)$$
By Corollary 16.21, $M_i$ is a martingale. Note that in either case, $M - M_i$ has no jump at time $S_i$.
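Here is a concrete instance of (17.2) and (17.3) under three illustrative assumptions:

- $S$ is exponential with rate $\lambda$.
- $\Delta M_S = 1$.
- The filtration is the one generated by $A_t = 1_{(t\ge S)}$.

In that setting it is a standard fact that $S$ is totally inaccessible and that the compensator is $\tilde A_t = \lambda(t\wedge S)$. This compensator is continuous, as Corollary 16.31 predicts. The sketch below checks $E[A_t - \tilde A_t] = 0$ by simulation. It also checks $E[\lambda(t\wedge S)] = 1 - e^{-\lambda t} = P(S \le t)$ by quadrature. The names, seed, and parameters are hypothetical.

```python
import math
import random


def compensated_single_jump_mean(rate, t, n_paths=200_000, seed=1):
    """Monte Carlo estimate of E[A_t - A~_t], where A_t = 1(t >= S),
    S ~ Exponential(rate), and A~_t = rate * min(t, S) is the compensator
    (for the filtration generated by A itself, an assumption of this sketch)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        s = rng.expovariate(rate)              # the jump time S
        jump_part = 1.0 if t >= s else 0.0     # A_t = 1(t >= S)
        compensator = rate * min(t, s)         # continuous in t, no jump at S
        total += jump_part - compensator
    return total / n_paths


def expected_compensator(rate, t, steps=100_000):
    """E[rate * min(t, S)] = rate * integral_0^t P(S > u) du, by the midpoint rule."""
    h = t / steps
    return rate * h * sum(math.exp(-rate * (k + 0.5) * h) for k in range(steps))


print(compensated_single_jump_mean(1.5, 2.0))                        # close to 0
print(expected_compensator(1.5, 2.0), 1.0 - math.exp(-1.5 * 2.0))   # nearly equal
```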
Theorem 17.3 Suppose $M$ is a square integrable martingale and we define $M_i$ as in (17.3) and (17.4).
(1) Each $M_i$ is square integrable.
(2) $\sum_{i=1}^\infty M_i(\infty)$ converges in $L^2$.
(3) If $M^c_t = M_t - \sum_{i=1}^\infty M_i(t)$, then $M^c$ is square integrable and we can find a version that has continuous paths.
(4) For each $i$ and each stopping time $T$, $E[M^c_T M_i(T)] = 0$.
Proof (1) If $S_i$ is a totally inaccessible stopping time, let $B_t = (\Delta M_{S_i})^+1_{(t\ge S_i)}$ and $C_t = (\Delta M_{S_i})^-1_{(t\ge S_i)}$; then (1) follows by Lemma 17.1. If $S_i$ is predictable, (1) follows by Corollary 16.21.
(2) Let $V_n(t) = \sum_{i=1}^n M_i(t)$. By the orthogonality lemma (Lemma 17.2), $E[M_i(\infty)M_j(\infty)] = 0$ if $i \ne j$, and $E[M_i(\infty)(M_\infty - V_n(\infty))] = 0$ if $i \le n$. We thus have
$$\begin{aligned} \sum_{i=1}^n E M_i(\infty)^2 &= E V_n(\infty)^2 \le E\big[M_\infty - V_n(\infty)\big]^2 + E V_n(\infty)^2\\ &= E\big[M_\infty - V_n(\infty) + V_n(\infty)\big]^2 = E M_\infty^2 < \infty. \end{aligned}$$
Therefore the series $\sum_{i=1}^\infty E M_i(\infty)^2$ converges. If $n > m$,
$$E\big[V_n(\infty) - V_m(\infty)\big]^2 = E\Big[\sum_{i=m+1}^n M_i(\infty)\Big]^2 = \sum_{i=m+1}^n E M_i(\infty)^2.$$
This tends to 0 as $n, m\to\infty$, so $V_n(\infty)$ is a Cauchy sequence in $L^2$, and hence converges.
(3) By (2) and Doob's inequality applied to the martingales $V_n - V_m$, we have $E\sup_{t\ge0}|V_n(t) - V_m(t)|^2 \le 4E[V_n(\infty) - V_m(\infty)]^2 \to 0$ as $n, m\to\infty$. By the completeness of $L^2$, the processes $M_t - V_n(t)$ therefore converge in $L^2$, uniformly in $t \ge 0$, as $n \to \infty$. Let $M^c_t = \lim_{n\to\infty}[M_t - V_n(t)]$. There is a sequence $n_k$ such that
$$\sup_{t\ge0}\big|(M_t - V_{n_k}(t)) - M^c_t\big| \to 0, \quad\text{a.s.}$$
We conclude that the paths of $M^c_t$ are right continuous with left limits. By the construction of the $M_i$, $M - V_{n_k}$ has jumps only at times $S_i$ for $i > n_k$. We therefore see that $M^c$ has no jumps, i.e., it is continuous.
(4) By the orthogonality lemma and (17.1),
$$E\big[M_i(T)(M_T - V_n(T))\big] = 0$$
if $T$ is a stopping time and $i \le n$. Letting $n$ tend to infinity proves (4).
17.2 Stochastic integrals
If $M_t$ is a square integrable martingale, then $M_t^2$ is a submartingale by Jensen's inequality for conditional expectations. Just as in the case of continuous martingales, we can use the Doob–Meyer decomposition (this time, Theorem 16.29 instead of Theorem 9.12) to find a predictable increasing process starting at 0, denoted $\langle M\rangle_t$, such that $M_t^2 - \langle M\rangle_t$ is a martingale.
Let us define
$$[M]_t = \langle M^c\rangle_t + \sum_{s\le t}|\Delta M_s|^2. \qquad (17.5)$$
Here $M^c$ is the continuous part of the martingale $M$ as defined in Theorem 17.3. As an example, if $M_t = P_t - t$, where $P_t$ is a Poisson process with parameter 1, then $M^c_t = 0$ and
$$[M]_t = \sum_{s\le t}(\Delta P_s)^2 = \sum_{s\le t}\Delta P_s = P_t,$$
because all the jumps of $P_t$ are of size one. In this case $\langle M\rangle_t = t$; this follows from Proposition 17.4 below.
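Here is a Monte Carlo sanity check of this example, offered as a sketch rather than as part of the argument. With rate 1, $E[M_t^2 - [M]_t]$ should be 0, as Proposition 17.4 below asserts. Also, $E M_t^2$ should equal $t$, consistent with $\langle M\rangle_t = t$. The helper names, seed, and sample size are assumptions.

```python
import random


def poisson_jump_times(horizon, rng):
    """Jump times on [0, horizon] of a rate-1 Poisson process."""
    times, u = [], rng.expovariate(1.0)
    while u <= horizon:
        times.append(u)
        u += rng.expovariate(1.0)
    return times


def bracket_moments(t, n_paths=200_000, seed=2):
    """Monte Carlo estimates of E[M_t^2 - [M]_t] and E[M_t^2] for M_t = P_t - t."""
    rng = random.Random(seed)
    diff_sum = 0.0
    square_sum = 0.0
    for _ in range(n_paths):
        # every jump of P has size one
        jump_sizes = [1.0] * len(poisson_jump_times(t, rng))
        m_t = sum(jump_sizes) - t                   # M_t = P_t - t
        bracket = sum(d * d for d in jump_sizes)    # [M]_t = sum of squared jumps, since M^c = 0
        diff_sum += m_t * m_t - bracket
        square_sum += m_t * m_t
    return diff_sum / n_paths, square_sum / n_paths


print(bracket_moments(2.0))  # first entry near 0, second near 2.0
```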

In defining stochastic integrals, one could work with 〈M〉t , but the process [M]t is the one
that shows up naturally in many formulas, such as the product formula.
Proposition 17.4 $M_t^2 - [M]_t$ is a martingale.
Proof By the orthogonality lemma and (17.1) it is easy to see that
$$\langle M\rangle_t = \langle M^c\rangle_t + \sum_i \langle M_i\rangle_t.$$
Since $M_t^2 - \langle M\rangle_t$ is a martingale, we need only show $[M]_t - \langle M\rangle_t$ is a martingale. Since
$$[M]_t - \langle M\rangle_t = \Big(\langle M^c\rangle_t + \sum_{s\le t}|\Delta M_s|^2\Big) - \Big(\langle M^c\rangle_t + \sum_i\langle M_i\rangle_t\Big),$$
it suffices to show that $\sum_i\langle M_i\rangle_t - \sum_i\sum_{s\le t}|\Delta M_i(s)|^2$ is a martingale.
By Exercise 17.1,
$$M_i(t)^2 = 2\int_0^t M_i(s-)\,dM_i(s) + \sum_{s\le t}|\Delta M_i(s)|^2, \qquad (17.6)$$
where the first term on the right-hand side is a Lebesgue–Stieltjes integral. Approximating this integral by a Riemann sum and using the fact that $M_i$ is a martingale, we see that the first term on the right in (17.6) is a martingale. Thus $M_i(t)^2 - \sum_{s\le t}|\Delta M_i(s)|^2$ is a martingale. Since $M_i(t)^2 - \langle M_i\rangle_t$ is a martingale, summing over $i$ completes the proof.
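Identity (17.6) is a pathwise statement about a finite-variation path, so it can be checked numerically on a single path. The sketch below assumes a hypothetical path of the shape in (17.3): a single unit jump at time $s$ minus the continuous compensator $\lambda(t\wedge s)$. It approximates the Lebesgue–Stieltjes integral by a left-endpoint Riemann sum, the same approximation used in the proof. The values $s = 0.7$ and $\lambda = 1.5$ are arbitrary.

```python
def single_jump_path(u, s=0.7, rate=1.5):
    """One sample path M(u) = 1(u >= s) - rate * min(u, s): a unit jump at time s
    minus a continuous compensator, the shape of M_i in (17.3)."""
    return (1.0 if u >= s else 0.0) - rate * min(u, s)


def check_identity_17_6(horizon=2.0, n_steps=200_000):
    """Return (M(T)^2, 2 * Riemann sum for int_0^T M(u-) dM(u) + sum of squared jumps)."""
    h = horizon / n_steps
    values = [single_jump_path(k * h) for k in range(n_steps + 1)]
    # left-endpoint Riemann-Stieltjes sum; M(0) = 0, so the left side is just M(T)^2
    riemann = sum(values[k] * (values[k + 1] - values[k]) for k in range(n_steps))
    squared_jumps = 1.0 ** 2   # the path has exactly one jump, of size 1
    return values[-1] ** 2, 2.0 * riemann + squared_jumps


lhs, rhs = check_identity_17_6()
print(lhs, rhs)  # agree up to an error of order of the mesh
```

On any grid, $M(T)^2 = 2\sum_k M(t_k)\,\Delta_k M + \sum_k (\Delta_k M)^2$ holds exactly. As the mesh shrinks, the last sum tends to the sum of squared jumps, because the continuous finite-variation part contributes nothing in the limit.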
If $H_s$ is of the form
$$H_s(\omega) = \sum_{i=1}^n K_i(\omega)1_{(a_i,b_i]}(s), \qquad (17.7)$$
where each $K_i$ is bounded and $\mathcal F_{a_i}$ measurable, define the stochastic integral by
$$N_t = \int_0^t H_s\,dM_s = \sum_{i=1}^n K_i\big[M_{b_i\wedge t} - M_{a_i\wedge t}\big].$$
Proofs very similar to those in Chapter 10 show that $N_t$ is a martingale and (with $[\cdot]$ instead of $\langle\cdot\rangle$) that $N_t^2 - [N]_t$ is a martingale.
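Below is a sketch of the simple stochastic integral (17.7) evaluated along one path. It is followed by a Monte Carlo check of the isometry $E N_t^2 = E\int_0^t H_s^2\,d[M]_s$ for the rate-1 compensated Poisson martingale $M_t = P_t - t$.

The integrand is the hypothetical choice $H = 1_{(0,1]} + K1_{(1,3]}$ with $K = 1_{(P_1\ge1)}$. Here $K$ is $\mathcal F_1$ measurable, as (17.7) requires. Because $P$ has independent increments, the exact value of $E\int_0^3 H_s^2\,d[M]_s$ for this choice is $1 + 2(1 - e^{-1}) \approx 2.2642$. Names, seed, and sample size are illustrative.

```python
import random


def simple_integral(m_path, pieces, t):
    """N_t = sum_i K_i (M_{min(b_i, t)} - M_{min(a_i, t)}) for H = sum_i K_i 1_(a_i, b_i].

    m_path: function u -> M_u for one sample path
    pieces: list of (K_i, a_i, b_i); the caller must ensure each K_i is already
            determined by the path up to time a_i (F_{a_i} measurable)."""
    return sum(k * (m_path(min(b, t)) - m_path(min(a, t))) for k, a, b in pieces)


def compensated_poisson(jump_times):
    """Sample path u -> P_u - u of a rate-1 compensated Poisson process."""
    return lambda u: sum(1 for s in jump_times if s <= u) - u


def isometry_check(n_paths=100_000, seed=3):
    """Estimate E N_3, E N_3^2 and E int_0^3 H^2 d[M] for H = 1_(0,1] + K 1_(1,3]."""
    rng = random.Random(seed)
    mean_n = mean_n2 = mean_h2 = 0.0
    for _ in range(n_paths):
        jumps, u = [], rng.expovariate(1.0)
        while u <= 3.0:
            jumps.append(u)
            u += rng.expovariate(1.0)
        k = 1.0 if any(s <= 1.0 for s in jumps) else 0.0   # decided by time 1
        pieces = [(1.0, 0.0, 1.0), (k, 1.0, 3.0)]
        n3 = simple_integral(compensated_poisson(jumps), pieces, 3.0)
        # [M] = P here, so int H^2 d[M] adds K_i^2 for each jump in (a_i, b_i]
        h2 = sum(ki * ki for ki, a, b in pieces for s in jumps if a < s <= b)
        mean_n += n3 / n_paths
        mean_n2 += n3 * n3 / n_paths
        mean_h2 += h2 / n_paths
    return mean_n, mean_n2, mean_h2


print(isometry_check())  # roughly (0, 2.26, 2.26)
```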
If $H$ is $\mathcal P$ measurable and $E\int_0^\infty H_s^2\,d[M]_s < \infty$, approximate $H$ by integrands $H^n_s$ of the form (17.7) so that
$$E\int_0^\infty (H_s - H^n_s)^2\,d[M]_s \to 0,$$
and define $N^n_t$ as the stochastic integral of $H^n$ with respect to $M_t$. By almost the same proof as that of Theorem 10.4, the martingales $N^n_t$ converge in $L^2$. We call the limit $N_t = \int_0^t H_s\,dM_s$ the stochastic integral of $H$ with respect to $M$. A subsequence of the $N^n$ converges uniformly over $t \ge 0$, a.s., and therefore the limit has paths that are right continuous with left limits. The same arguments as those of Theorem 10.4 prove that the stochastic integral is a martingale and
$$[N]_t = \int_0^t H_s^2\,d[M]_s.$$
A consequence of this last equation is that
$$E\Big(\int_0^t H_s\,dM_s\Big)^2 = E\int_0^t H_s^2\,d[M]_s. \qquad (17.8)$$

17.3 Itô's formula

We will first prove Itô's formula for a special case, namely, we suppose $X_t = M_t + A_t$, where $M_t$ is a square integrable martingale and $A_t$ is a process of bounded variation whose total variation is integrable. The extension to semimartingales without the integrability conditions will be done later in the chapter (in Section 17.5) and is easy. Define $\langle X^c\rangle_t$ to be $\langle M^c\rangle_t$.
Theorem 17.5 Suppose $X_t = M_t + A_t$, where $M_t$ is a square integrable martingale and $A_t$ is a process with paths of bounded variation whose total variation is integrable. Suppose $f$ is $C^2$ on $\mathbb R$ with bounded first and second derivatives. Then
$$\begin{aligned} f(X_t) = f(X_0) &+ \int_0^t f'(X_{s-})\,dX_s + \frac12\int_0^t f''(X_{s-})\,d\langle X^c\rangle_s\\ &+ \sum_{s\le t}\big[f(X_s) - f(X_{s-}) - f'(X_{s-})\Delta X_s\big]. \end{aligned} \qquad (17.9)$$
Proof The proof will be given in several steps. Set
$$S(t) = \int_0^t f'(X_{s-})\,dX_s, \qquad Q(t) = \frac12\int_0^t f''(X_{s-})\,d\langle X^c\rangle_s,$$
and
$$J(t) = \sum_{s\le t}\big[f(X_s) - f(X_{s-}) - f'(X_{s-})\Delta X_s\big].$$
We use these letters as mnemonics for "stochastic integral term," "quadratic variation term," and "jump term," respectively.
Step 1. Suppose $X_t$ has a single jump at time $T$, which is either a predictable stopping time or a totally inaccessible stopping time. Suppose also that there exists $N > 0$ such that $|\Delta M_T| + |\Delta A_T| \le N$, a.s.
If $T$ is totally inaccessible, let $C_t = \Delta M_T1_{(t\ge T)}$ and let $\tilde C_t$ be its compensator. If we replace $M_t$ by $M_t - C_t + \tilde C_t$ and $A_t$ by $A_t + C_t - \tilde C_t$, we may assume that $M_t$ is continuous. If $T$ is a predictable stopping time, replace $M_t$ by $M_t - \Delta M_T1_{(t\ge T)}$ and $A_t$ by $A_t + \Delta M_T1_{(t\ge T)}$, and again we may assume $M$ is continuous.
Let $B_t = \Delta X_T1_{(t\ge T)}$. Set $\hat X_t = X_t - B_t$ and $\hat A_t = A_t - B_t$. Then $\hat X_t = M_t + \hat A_t$, and $\hat X_t$ is a continuous process that agrees with $X_t$ up to but not including time $T$. We have $\hat X_{s-} = \hat X_s$ and $\Delta\hat X_s = 0$ if $s \le T$. By Theorem 11.1,
$$\begin{aligned} f(\hat X_t) &= f(\hat X_0) + \int_0^t f'(\hat X_s)\,d\hat X_s + \frac12\int_0^t f''(\hat X_s)\,d\langle M\rangle_s\\ &= f(\hat X_0) + \int_0^t f'(\hat X_{s-})\,d\hat X_s + \frac12\int_0^t f''(\hat X_{s-})\,d\langle\hat X^c\rangle_s\\ &\qquad + \sum_{s\le t}\big[f(\hat X_s) - f(\hat X_{s-}) - f'(\hat X_{s-})\Delta\hat X_s\big], \end{aligned}$$
since the sum on the last line is zero. For $t < T$, $\hat X_t$ agrees with $X_t$. At time $T$, $f(X_t)$ has a jump of size $f(X_T) - f(X_{T-})$. The stochastic integral term $S(t)$ jumps $f'(X_{T-})\Delta X_T$, $Q(t)$ does not jump at all, and $J(t)$ jumps $f(X_T) - f(X_{T-}) - f'(X_{T-})\Delta X_T$. Therefore both sides of (17.9) jump the same amount at time $T$, and hence in this case (17.9) holds for $t \le T$.
Step 2. Suppose there exist times $T_1 < T_2 < \cdots$ with $T_n\to\infty$ satisfying three conditions. First, each $T_i$ is either a totally inaccessible stopping time or a predictable stopping time. Second, for each $i$ there exists $N_i > 0$ such that $|\Delta M_{T_i}|$ and $|\Delta A_{T_i}|$ are bounded by $N_i$. Third, $X_t$ is continuous except at the times $T_1, T_2, \ldots$ Let $T_0 = 0$.
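Fix $i$ for the moment. Define $X'_t = X_{(t-T_i)^+}$, define $A'_t$ and $M'_t$ similarly, and apply Step 1 to $X'$ at time $T_i + t$. We have for $T_i \le t \le T_{i+1}$
$$\begin{aligned} f(X_t) = f(X_{T_i}) &+ \int_{T_i}^t f'(X_{s-})\,dX_s + \frac12\int_{T_i}^t f''(X_{s-})\,d\langle X^c\rangle_s\\ &+ \sum_{T_i<s\le t}\big[f(X_s) - f(X_{s-}) - f'(X_{s-})\Delta X_s\big] \end{aligned}$$
… $N_i > 0$ such that $|\Delta M_{S_i}| + |\Delta A_{S_i}| \le N_i$. Moreover each $S_i$ is either a predictable stopping time or a totally inaccessible stopping time. Let $M$ be decomposed into $M^c$ and $M_i$ as in Theorem 17.3 and let
$$A^c_t = A_t - \sum_{i=1}^\infty \Delta A_{S_i}1_{(t\ge S_i)}.$$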
Since $A_t$ is of bounded variation, $A^c$ will be finite and continuous. Define
$$M^n_t = M^c_t + \sum_{i=1}^n M_i(t) \qquad\text{and}\qquad A^n_t = A^c_t + \sum_{i=1}^n \Delta A_{S_i}1_{(t\ge S_i)},$$
and let $X^n_t = M^n_t + A^n_t$. We already know that $M^n$ converges uniformly over $t\ge0$ to $M$ in $L^2$. Let
$$B^n_t = \sum_{i=1}^n(\Delta A_{S_i})^+1_{(t\ge S_i)}, \qquad C^n_t = \sum_{i=1}^n(\Delta A_{S_i})^-1_{(t\ge S_i)},$$
and let $B_t = \sup_n B^n_t$ and $C_t = \sup_n C^n_t$. Since $A$ has paths of bounded variation, with probability one $B^n_t \to B_t$ and $C^n_t \to C_t$ uniformly over $t\ge0$, and $A_t - A^c_t = B_t - C_t$. In particular, we have convergence in total variation norm:
$$E\int_0^\infty \big|d(A^n_t - A_t)\big| \to 0.$$
We define $S_n(t)$, $Q_n(t)$, and $J_n(t)$ analogously to $S(t)$, $Q(t)$, and $J(t)$, respectively. By applying Step 2 to $X^n$, we have
$$f(X^n_t) = f(X^n_0) + S_n(t) + Q_n(t) + J_n(t),$$
and we need to show convergence of each term. We now examine the various terms.
Uniformly in $t$, $X^n_t$ converges to $X_t$ in probability, that is,
$$P\Big(\sup_{t\ge0}|X^n_t - X_t| > \varepsilon\Big) \to 0$$
as $n\to\infty$ for each $\varepsilon > 0$. Since $\int_0^t d\langle M^c\rangle_s < \infty$, by dominated convergence
$$\int_0^t f''(X^n_{s-})\,d\langle M^c\rangle_s \to \int_0^t f''(X_{s-})\,d\langle M^c\rangle_s$$
in probability. Therefore $Q_n(t) \to Q(t)$ in probability. Also, $f(X^n_t) \to f(X_t)$ and $f(X^n_0) \to f(X_0)$, both in probability.
We now show $S_n(t) \to S(t)$. Write
$$\begin{aligned} \int_0^t f'(X^n_{s-})\,dA^n_s - \int_0^t f'(X_{s-})\,dA_s &= \Big[\int_0^t f'(X^n_{s-})\,dA^n_s - \int_0^t f'(X^n_{s-})\,dA_s\Big]\\ &\quad + \Big[\int_0^t f'(X^n_{s-})\,dA_s - \int_0^t f'(X_{s-})\,dA_s\Big] = I^n_1 + I^n_2. \end{aligned}$$
We see that $|I^n_1| \le \|f'\|_\infty\int_0^t|dA^n_s - dA_s| \to 0$ as $n\to\infty$, while by dominated convergence, $|I^n_2|$ also tends to 0. We next look at the stochastic integral part of $S_n(t)$:
$$\begin{aligned} \int_0^t f'(X^n_{s-})\,dM^n_s - \int_0^t f'(X_{s-})\,dM_s &= \Big[\int_0^t f'(X^n_{s-})\,dM^n_s - \int_0^t f'(X_{s-})\,dM^n_s\Big]\\ &\quad + \Big[\int_0^t f'(X_{s-})\,dM^n_s - \int_0^t f'(X_{s-})\,dM_s\Big] = I^n_3 + I^n_4. \end{aligned}$$
The square of the $L^2$ norm of $I^n_3$ is bounded by
$$E\int_0^t|f'(X^n_{s-}) - f'(X_{s-})|^2\,d[M^n]_s \le E\int_0^t|f'(X^n_{s-}) - f'(X_{s-})|^2\,d[M]_s,$$
which goes to zero by dominated convergence. Also
$$I^n_4 = -\int_0^t f'(X_{s-})\sum_{i=n+1}^\infty dM_i(s),$$
so by the orthogonality lemma (Lemma 17.2), the square of the $L^2$ norm of $I^n_4$ is at most
$$\|f'\|_\infty^2\sum_{i=n+1}^\infty E[M_i]_\infty \le \|f'\|_\infty^2\sum_{i=n+1}^\infty E M_i(\infty)^2,$$
which goes to zero as $n\to\infty$.
Finally, we look at the convergence of $J_n$. The idea is to break both $J(t)$ and $J_n(t)$ into two parts. The first part holds the jumps that might be relatively large, namely the jumps at times $S_i$ for $i \le N$, where $N$ will be chosen appropriately. The second part holds the remaining jumps. Let $N > 1$ be chosen later.
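$$\begin{aligned}
J(t) - J_n(t) &= \sum_{s\le t} \big[f(X_s) - f(X_{s-}) - f'(X_{s-})\Delta X_s\big] - \sum_{s\le t} \big[f(X^n_s) - f(X^n_{s-}) - f'(X^n_{s-})\Delta X^n_s\big] \\
&= \sum_{\{i:S_i\le t\}} \big[f(X_{S_i}) - f(X_{S_i-}) - f'(X_{S_i-})\Delta X_{S_i}\big] - \sum_{\{i:S_i\le t\}} \big[f(X^n_{S_i}) - f(X^n_{S_i-}) - f'(X^n_{S_i-})\Delta X^n_{S_i}\big] \\
&= \sum_{\{i>N:S_i\le t\}} \big[f(X_{S_i}) - f(X_{S_i-}) - f'(X_{S_i-})\Delta X_{S_i}\big] - \sum_{\{i>N:S_i\le t\}} \big[f(X^n_{S_i}) - f(X^n_{S_i-}) - f'(X^n_{S_i-})\Delta X^n_{S_i}\big] \\
&\quad + \sum_{\{i\le N,\,S_i\le t\}} \Big\{\big[f(X_{S_i}) - f(X_{S_i-}) - f'(X_{S_i-})\Delta X_{S_i}\big] - \big[f(X^n_{S_i}) - f(X^n_{S_i-}) - f'(X^n_{S_i-})\Delta X^n_{S_i}\big]\Big\} \\
&= I^N_5 - I^{n,N}_6 + I^{n,N}_7.
\end{aligned}$$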
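Since $M$ and $A$ are right continuous with left limits, $|\Delta M_{S_i}| \le 1/2$ and $|\Delta A_{S_i}| \le 1/2$ if $i$ is large enough (depending on $\omega$); then $|\Delta X_{S_i}| \le 1$, and also
$$|\Delta X_{S_i}|^2 \le 2|\Delta M_{S_i}|^2 + 2|\Delta A_{S_i}|^2 \le 2|\Delta M_{S_i}|^2 + |\Delta A_{S_i}|,$$
where the last inequality uses $|\Delta A_{S_i}| \le 1/2$.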

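We have
$$|I^N_5| \le \|f''\|_\infty \sum_{i>N,\,S_i\le t} (\Delta X_{S_i})^2$$
and
$$|I^{n,N}_6| \le \|f''\|_\infty \sum_{n\ge i>N,\,S_i\le t} (\Delta X_{S_i})^2.$$
Since $\sum_{i=1}^\infty |\Delta M_{S_i}|^2 \le [M]_\infty < \infty$ and $\sum_{i=1}^\infty |\Delta A_{S_i}| < \infty$, given $\varepsilon > 0$ we can choose
$N$ large such that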
$$\mathbb{P}\big(|I^N_5| + |I^{n,N}_6| > \varepsilon\big) < \varepsilon.$$
Once we choose $N$, we then see that $I^{n,N}_7$ tends to zero in probability as $n \to \infty$, since $X^n_t$ converges in probability to $X_t$ uniformly over $t \ge 0$. We conclude that $J_n(t)$ converges to $J(t)$ in probability as $n \to \infty$. This completes the proof.

17.4 The reduction theorem
Let $M$ be a process adapted to $\{\mathcal{F}_t\}$. If there exist stopping times $T_n$ increasing to $\infty$ such that each process $M_{t\wedge T_n}$ is a uniformly integrable martingale, we say $M$ is a local martingale. If each $M_{t\wedge T_n}$ is a square integrable martingale, we say $M$ is a locally square integrable martingale. We say a stopping time $T$ reduces a process $M$ if $M_{t\wedge T}$ is a uniformly integrable martingale.

Lemma 17.6 (1) The sum of two local martingales is a local martingale.
(2) If $S$ and $T$ both reduce $M$, then so does $S \vee T$.
(3) If there exist times $T_n \to \infty$ such that $M_{t\wedge T_n}$ is a local martingale for each $n$, then $M$ is a local martingale.

Proof (1) If the sequence $S_n$ reduces $M$ and the sequence $T_n$ reduces $N$, then $S_n \wedge T_n$ will reduce $M + N$.
(2) $M_{t\wedge(S\vee T)}$ is bounded in absolute value by $|M_{t\wedge T}| + |M_{t\wedge S}|$. Both $\{|M_{t\wedge T}|\}$ and $\{|M_{t\wedge S}|\}$ are uniformly integrable families of random variables. Now use Proposition A.17.
(3) Let $S_{nm}$ be a family of stopping times reducing $M_{t\wedge T_n}$ and let $S'_{nm} = S_{nm} \wedge T_n$. Renumber the stopping times into a single sequence $R_1, R_2, \ldots$ and let $H_k = R_1 \vee \cdots \vee R_k$. Note $H_k \uparrow \infty$. To show that $H_k$ reduces $M$, we need to show that $R_i$ reduces $M$ and use (2). But $R_i = S'_{nm}$ for some $m, n$, so $M_{t\wedge R_i} = M_{t\wedge S_{nm}\wedge T_n}$ is a uniformly integrable martingale.

Let $M$ be a local martingale with $M_0 = 0$. We say that a stopping time $T$ strongly reduces $M$ if $T$ reduces $M$ and the martingale $\mathbb{E}[\,|M_T| \mid \mathcal{F}_s]$ is bounded on $[0, T)$, that is, there exists $K > 0$ such that
$$\sup_{0\le s<T} \mathbb{E}\big[\,|M_T| \,\big|\, \mathcal{F}_s\big] \le K.$$
T ) | Ft].
The first term is bounded since T strongly reduces M . For the second term, if t < T , 1(tT ) | Ft] = E [ |MS|1(S>T )1(t 0, a.s., and so Mt never equals zero,
a.s. Observe that $M_T$ is the Radon–Nikodym derivative of $\mathbb{Q}$ with respect to $\mathbb{P}$ on $\mathcal{F}_T$.
Let $L_t$ be the local martingale defined by
$$L_t = \int_0^t \frac{1}{M_{s-}}\, dM_s,$$
so that
$$dM_t = M_{t-}\, dL_t,$$
that is, $M$ is the exponential of $L$.
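As a concrete sanity check of the relation $dM_t = M_{t-}\, dL_t$, here is a minimal sketch under an assumption of our own (it is not an example from the text): $P$ is a rate-$\lambda$ Poisson process and $M_t = e^{-(\mu-\lambda)t}(\mu/\lambda)^{P_t}$. Between jumps $M$ decays at rate $\mu - \lambda$, and at a jump $M_s = (\mu/\lambda) M_{s-}$, so $L_t = (\mu/\lambda - 1)(P_t - \lambda t)$. The Python code simulates one path and compares $M_T$ with $1 + \int_0^T M_{s-}\, dL_s$, evaluating the $ds$-part exactly on each interval between jumps. The function name `exponential_check` and all parameter values are hypothetical.

```python
import numpy as np

def exponential_check(lam=2.0, mu=3.0, T=5.0, seed=1):
    """Compare M_T with 1 + int_0^T M_{s-} dL_s along one simulated path.

    Hypothetical example: M_t = exp(-(mu - lam) t) * (mu/lam)**P_t for a rate-lam
    Poisson process P, so that L_t = (mu/lam - 1) * (P_t - lam * t).
    """
    rng = np.random.default_rng(seed)
    r, c = mu / lam, mu - lam
    n = rng.poisson(lam * T)                        # number of jumps on [0, T]
    jumps = np.sort(rng.uniform(0.0, T, size=n))    # jump times given n
    m, prev, integral = 1.0, 0.0, 0.0               # M_0 = 1
    for s in jumps:
        decay = np.exp(-c * (s - prev))
        integral -= m * (1.0 - decay)   # ds-part of M_{-} dL: -c * int_prev^s M du
        m *= decay                      # m is now M_{s-}
        integral += (r - 1.0) * m       # dP-part of M_{-} dL at the jump
        m *= r                          # M_s = (mu/lam) * M_{s-}
        prev = s
    decay = np.exp(-c * (T - prev))
    integral -= m * (1.0 - decay)       # last stretch without jumps
    m *= decay                          # m is now M_T
    closed_form = np.exp(-c * T) * r ** n
    return m, 1.0 + integral, closed_form

print(exponential_check())  # the three numbers agree up to rounding
```

The pathwise recursion is used instead of a time grid so that the comparison is exact up to floating-point error.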
Theorem 17.14 Suppose $X$ is a local martingale with respect to $\mathbb{P}$. Then $X_t - D_t$ is a local martingale with respect to $\mathbb{Q}$, where
$$D_t = \int_0^t \frac{1}{M_s}\, d[X, M]_s = \int_0^t \frac{M_{s-}}{M_s}\, d[X, L]_s.$$
Note that in the formula for $D$, we are using a Lebesgue–Stieltjes integral.
Proof Exercise 17.6 tells us that it suffices to show that $M_t(X_t - D_t)$ is a local martingale with respect to $\mathbb{P}$. By Corollary 17.12,
$$d\big(M(X - D)\big)_t = (X - D)_{t-}\, dM_t + M_{t-}\, dX_t - M_{t-}\, dD_t + d[M, X - D]_t.$$

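The first two terms on the right are local martingales with respect to $\mathbb{P}$. Since $D$ is of bounded variation, $[M, D]$ has no continuous part, hence
$$[M, D]_t = \sum_{s\le t} \Delta M_s\, \Delta D_s = \int_0^t \Delta M_s\, dD_s.$$
Since $M_{s-} + \Delta M_s = M_s$, combining $-M_{t-}\, dD_t$ with $-d[M, D]_t$ gives
$$M_t(X_t - D_t) = \text{local martingale} + [M, X]_t - \int_0^t M_s\, dD_s.$$
By the definition of $D$, $\int_0^t M_s\, dD_s = [X, M]_t$, so the last two terms cancel and $M_t(X_t - D_t)$ is a local martingale.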
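Here is a hedged numerical illustration of Theorem 17.14 in a special case that we chose ourselves: $P$ is a rate-$\lambda$ Poisson process, $X_t = P_t - \lambda t$, and $d\mathbb{Q}/d\mathbb{P} = M_T = e^{-(\mu-\lambda)T}(\mu/\lambda)^{P_T}$ on $\mathcal{F}_T$, which is the usual likelihood ratio that changes the rate from $\lambda$ to $\mu$. At each jump $\Delta X_s = 1$ and $M_s = (\mu/\lambda)M_{s-}$, so the definition of $D$ gives $D_t = \sum_{s\le t} \Delta P_s\, (\mu/\lambda - 1) M_{s-}/M_s = (1 - \lambda/\mu)P_t$, and $X_t - D_t = (\lambda/\mu)(P_t - \mu t)$. The Monte Carlo sketch checks that $\mathbb{E}\, M_T \approx 1$ and that $\mathbb{E}_{\mathbb{Q}}[X_T - D_T] = \mathbb{E}[M_T(X_T - D_T)] \approx 0$. It only looks at one fixed time, so it is a consistency check, not a verification of the local martingale property. The function name and parameters are hypothetical.

```python
import numpy as np

def girsanov_poisson(lam=2.0, mu=3.0, T=1.0, n_paths=400_000, seed=0):
    """Monte Carlo check of Theorem 17.14 for a hypothetical Poisson rate change."""
    rng = np.random.default_rng(seed)
    P_T = rng.poisson(lam * T, size=n_paths)            # P_T sampled under P
    M_T = np.exp(-(mu - lam) * T) * (mu / lam) ** P_T   # assumed dQ/dP on F_T
    X_T = P_T - lam * T                                 # P-martingale evaluated at T
    D_T = (1.0 - lam / mu) * P_T                        # D_T, worked out by hand
    mean_density = M_T.mean()                           # should be close to 1
    q_mean = (M_T * (X_T - D_T)).mean()                 # E_Q[X_T - D_T], close to 0
    return mean_density, q_mean

print(girsanov_poisson())
```

By contrast, $\mathbb{E}[M_T X_T]$ is close to $(\mu - \lambda)T$, which shows that $X$ alone is not a $\mathbb{Q}$-martingale.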
Exercises
17.1 Suppose $a(t)$ is a deterministic right-continuous nondecreasing function of $t$ with $a(0) = 0$. Prove the following formulas:
$$a(t)^2 = \int_0^t \big[(a(t) - a(s)) + (a(t) - a(s-))\big]\, da(s), \tag{17.14}$$
and
$$a(t)^2 = \int_0^t \big(2a(s-) + \Delta a(s)\big)\, da(s) = 2\int_0^t a(s-)\, da(s) + \sum_{s\le t} (\Delta a(s))^2. \tag{17.15}$$
Hint: First do the case where $a$ has only finitely many discontinuities.
17.2 If At is an increasing process and Ãt is its compensator, show that Ã jumps only when A does.
17.3 Let $P^j_t$, $j \in \mathbb{Z}$, be independent Poisson processes with parameter $\lambda_j$. Suppose $\lambda_j = \lambda_{-j}$ for each $j \ne 0$. Suppose $\lambda_j$ decreases as $j$ increases for $j \ge 1$. Let
$$X_t = \sum_{j\in\mathbb{Z}} P^j_t.$$
Determine reasonable conditions on the sequence $\lambda_j$ so that $X$ is (a) a semimartingale, (b) a local martingale, (c) a martingale, (d) a locally square integrable martingale.
17.4 Show that if $f(t)$ is a purely discontinuous function, then $e^{f(t)}$ is also.
17.5 Suppose M is a non-negative right-continuous martingale and T = inf{t > 0 : Mt = 0}. Show
that Mt = 0 on (t > T ).
17.6 Suppose P and Q are two equivalent probability measures, M∞ is the Radon–Nikodym derivative
of Q with respect to P, and Mt = E [M∞ | Ft ]. Show that Yt is a local martingale with respect
to Q if and only if YtMt is a local martingale with respect to P.
17.7 Suppose Xt is an increasing process with paths that are right continuous with left limits, X0 = 0,
a.s., X is purely discontinuous, and all jumps are of size +1 only. Suppose Xt − t is a martingale.
Prove that X is a Poisson process.
Hint: Imitate the proof of Theorem 12.1. When using Itô’s formula, it is important to use the
fact that $\Delta X_t$ is always 0 or 1.

17.8 Suppose Xt is an increasing process with paths that are right continuous with left limits, X0 = 0,
a.s., X is purely discontinuous, and all jumps are of size +1 only. Suppose $\lim_{t\to\infty} X_t = \infty$, a.s.
Prove that X is a time change of a Poisson process.
17.9 Suppose $P_t$ is a Poisson process with parameter $\lambda$, $\{\mathcal{F}_t\}$ is the minimal augmented filtration for $P$, and $M_t = P_t - \lambda t$. Suppose $Y$ is an $\mathcal{F}_1$ measurable random variable with finite mean and variance. Prove that there exists a predictable process $H$ such that
$$Y = \mathbb{E}\,Y + \int_0^1 H_s\, dM_s.$$
17.10 Let $P^1$ and $P^2$ be two independent Poisson processes with the same parameter. Let $X_t = P^1_t - P^2_t$ and let $\{\mathcal{F}_t\}$ be the minimal augmented filtration for $X$. Find a bounded mean zero random variable $Y$ that is $\mathcal{F}_1$ measurable which does not satisfy
$$Y = \int_0^1 H_s\, dX_s$$
for any predictable process $H$.
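Formula (17.15) of Exercise 17.1 can be sanity-checked numerically for one concrete function. This is a check, not a solution. The sketch uses a hypothetical $a(s) = s$ plus upward jumps of sizes 0.5 and 1.0 at times 0.3 and 0.7, so that $a(0) = 0$ and $a(1)^2 = 2.5^2 = 6.25$. It approximates $\int_0^1 a(s-)\, da(s)$ by left-endpoint Riemann–Stieltjes sums on a fine grid. Using the left endpoint $a(s_i)$ on $(s_i, s_{i+1}]$ plays the role of $a(s-)$, because a jump inside that interval is not yet seen by the integrand. The names `JUMP_TIMES`, `JUMP_SIZES` and `check_17_15` are illustrative.

```python
import numpy as np

JUMP_TIMES = (0.3, 0.7)   # hypothetical jump locations
JUMP_SIZES = (0.5, 1.0)   # hypothetical jump sizes

def a(s):
    """Right-continuous nondecreasing test function with a(0) = 0."""
    s = np.asarray(s, dtype=float)
    out = s.copy()                           # continuous part: s
    for tk, hk in zip(JUMP_TIMES, JUMP_SIZES):
        out = out + hk * (s >= tk)           # jump of size hk at time tk
    return out

def check_17_15(t=1.0, n=100_000):
    """Return (a(t)^2, 2 * int_0^t a(s-) da(s) + sum of squared jumps)."""
    grid = np.linspace(0.0, t, n + 1)
    vals = a(grid)
    stieltjes = np.sum(vals[:-1] * np.diff(vals))    # left-endpoint sum for int a(s-) da(s)
    jumps_sq = sum(h ** 2 for tk, h in zip(JUMP_TIMES, JUMP_SIZES) if tk <= t)
    lhs = float(a(t)) ** 2
    rhs = 2.0 * float(stieltjes) + jumps_sq
    return lhs, rhs

print(check_17_15())  # both entries close to 6.25
```

Dropping the term $\sum (\Delta a(s))^2$ makes the right side about $5.0$ instead of $6.25$, which shows why the jump correction is needed.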

18
Poisson point processes
Poisson point processes are random measures that are related to Poisson processes. We will
use them when we study Lévy processes in Chapter 42. Poisson point processes are also
useful in the study of excursions, even excursions of a continuous process such as Brownian
motion (see Chapter 27), and they arise when studying stochastic differential equations with
jumps.
Let S be a metric space, G the collection of Borel subsets of S , and λ a measure on (S,G).
Definition 18.1 We say a map
$$N : \Omega \times [0, \infty) \times \mathcal{G} \to \{0, 1, 2, \ldots\}$$
(writing $N_t(A)$ for $N(\omega, t, A)$) is a Poisson point process if
(1) for each Borel subset $A$ of $S$ with $\lambda(A) < \infty$, the process $N_t(A)$ is a Poisson process with parameter $\lambda(A)$, and
(2) for each $t$ and $\omega$, $N(t, \cdot)$ is a measure on $\mathcal{G}$.

A model to keep in mind is where $S = \mathbb{R}$ and $\lambda$ is Lebesgue measure. For each $\omega$ there is a collection of points $\{(s, z)\}$ (where the collection depends on $\omega$). The number of points in this collection with $s \le t$ and $z$ in a subset $A$ is $N_t(A)(\omega)$. Since $\lambda(\mathbb{R}) = \infty$, there are infinitely many points in every time interval.

A consequence of the definition is that since $\lambda(\emptyset) = 0$, $N_t(\emptyset)$ is a Poisson process with parameter 0; in other words, $N_t(\emptyset)$ is identically zero.

Our main result is that $N_t(A)$ and $N_t(B)$ are independent if $A$ and $B$ are disjoint.

Theorem 18.2 Let $\{\mathcal{F}_t\}$ be a filtration satisfying the usual conditions. Let $S$ be a metric space furnished with a positive measure $\lambda$. Suppose that $N_t(A)$ is a Poisson point process with respect to the measure $\lambda$. If $A_1, \ldots, A_n$ are pairwise disjoint measurable subsets of $S$ with $\lambda(A_k) < \infty$ for $k = 1, \ldots, n$, then the processes $N_t(A_1), \ldots, N_t(A_n)$ are mutually independent.

Proof We first make the observation that because $N(t, \cdot)$ is a measure and $A_1, A_2, \ldots, A_n$ are disjoint, $\sum_{k=1}^n N_t(A_k) = N_t(\cup_{k=1}^n A_k)$ is a Poisson process with finite parameter. A Poisson process has jumps of size one only, hence no two of the $N_t(A_k)$ have jumps at the same time. To prove the theorem, it suffices to let $0 = r_0 < r_1 < \cdots < r_m$ and show that the random variables
$$\{N_{r_j}(A_k) - N_{r_{j-1}}(A_k) : 1 \le j \le m,\ 1 \le k \le n\}$$
are independent. Since for each $j$ and each $k$, $N_{r_j}(A_k) - N_{r_{j-1}}(A_k)$ is independent of $\mathcal{F}_{r_{j-1}}$, it suffices to show that for each $j \le m$, the random variables
$$\{N_{r_j}(A_k) - N_{r_{j-1}}(A_k) : 1 \le k \le n\}$$
are independent. We will do the case $j = m = 1$ and write $r$ for $r_j$ for simplicity; the case when $j, m > 1$ differs only in notation.
We will prove this using induction. We start with the case $n = 2$ and show the independence of $N_r(A_1)$ and $N_r(A_2)$. Each $N_t(A_k)$ is a Poisson process, and so $N_t(A_k)$ has moments of all orders. Let $u_1, u_2 \in \mathbb{R}$ and set
$$\varphi_k = \lambda(A_k)(e^{iu_k} - 1), \qquad k = 1, 2.$$
Let
$$M^k_t = e^{iu_k N_t(A_k) - t\varphi_k}.$$
We see that $M^k_t$ is a martingale because $\mathbb{E}\, e^{iu_k N_t(A_k)} = e^{t\varphi_k}$, and therefore
$$\begin{aligned}
\mathbb{E}[M^k_t \mid \mathcal{F}_s] &= M^k_s\, \mathbb{E}\big[e^{iu_k(N_t(A_k) - N_s(A_k)) - (t-s)\varphi_k} \mid \mathcal{F}_s\big] \\
&= M^k_s\, e^{-(t-s)\varphi_k}\, \mathbb{E}\big[e^{iu_k(N_t(A_k) - N_s(A_k))}\big] = M^k_s,
\end{aligned}$$
using the independence and stationarity of the increments of a Poisson process.
Since we have argued that no two of the $N_t(A_k)$ jump at the same time, the same is true for the $M^k_t$, and so $[M^j, M^k]_t = 0$ if $j \ne k$. By the product formula (Corollary 17.12) and Itô’s formula (Theorem 17.10),
$$\begin{aligned}
M^k_t &= 1 - \varphi_k \int_0^t e^{iu_k N_{s-}(A_k) - s\varphi_k}\, ds + iu_k \int_0^t e^{iu_k N_{s-}(A_k) - s\varphi_k}\, dN_s(A_k) \\
&\quad + \sum_{s\le t} e^{iu_k N_{s-}(A_k) - s\varphi_k}\big[e^{iu_k \Delta N_s(A_k)} - 1 - iu_k \Delta N_s(A_k)\big] \\
&= 1 - \varphi_k \int_0^t e^{iu_k N_{s-}(A_k) - s\varphi_k}\, ds + \sum_{s\le t} e^{iu_k N_{s-}(A_k) - s\varphi_k}\big[e^{iu_k \Delta N_s(A_k)} - 1\big] \\
&= 1 - \widetilde{B}^k_t + B^k_t.
\end{aligned}$$
We see therefore that $M^k_t - 1$ is of the form $B^k_t - \widetilde{B}^k_t$, where $B^k_t$ is a complex-valued process whose paths are locally of bounded variation, and $\widetilde{B}^k_t$ is the compensator of $B^k_t$.
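Let $\overline{M}^k_t = M^k_{t\wedge r} - 1$. Since the $M^k_t$ do not jump at the same time, by the orthogonality lemma (Lemma 17.2), $\mathbb{E}\, \overline{M}^1_\infty \overline{M}^2_\infty = 0$, which, since $\mathbb{E}\, M^1_r = \mathbb{E}\, M^2_r = 1$, translates to
$$\mathbb{E}\, M^1_r M^2_r = 1.$$
This implies
$$\mathbb{E}\big[e^{i(u_1 N_r(A_1) + u_2 N_r(A_2))}\big] = e^{r\varphi_1} e^{r\varphi_2} = \mathbb{E}\big[e^{iu_1 N_r(A_1)}\big]\, \mathbb{E}\big[e^{iu_2 N_r(A_2)}\big].$$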
Since this holds for all u1, u2, then Nr(A1) and Nr(A2) are independent. We conclude that the
processes Nt (A1) and Nt (A2) are independent.

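To handle the case $n = 3$, we first show that $M^1_t M^2_t$ is a martingale. We write
$$\begin{aligned}
\mathbb{E}[M^1_t M^2_t \mid \mathcal{F}_s]
&= M^1_s M^2_s\, e^{-(t-s)(\varphi_1+\varphi_2)}\, \mathbb{E}\big[e^{i(u_1(N_t(A_1) - N_s(A_1)) + u_2(N_t(A_2) - N_s(A_2)))} \mid \mathcal{F}_s\big] \\
&= M^1_s M^2_s\, e^{-(t-s)(\varphi_1+\varphi_2)}\, \mathbb{E}\big[e^{i(u_1(N_t(A_1) - N_s(A_1)) + u_2(N_t(A_2) - N_s(A_2)))}\big] \\
&= M^1_s M^2_s,
\end{aligned}$$
using the fact that $N_t(A_1)$ and $N_t(A_2)$ are independent of each other and each has stationary and independent increments.
Note that $M^3_t = e^{iu_3 N_t(A_3) - t\varphi_3}$ has no jumps in common with $M^1_t$ or $M^2_t$. Therefore, applying the orthogonality lemma to the martingales $M^3_{t\wedge r} - 1$ and $M^1_{t\wedge r} M^2_{t\wedge r} - 1$, we get
$$\mathbb{E}\big[(M^3_r - 1)(M^1_r M^2_r - 1)\big] = 0,$$
and as before this leads to
$$\mathbb{E}\big[M^3_r(M^1_r M^2_r)\big] = 1.$$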
As above this implies that $N_r(A_1)$, $N_r(A_2)$, and $N_r(A_3)$ are independent. The general
induction step is similar.
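The identity $\mathbb{E}\big[e^{i(u_1 N_r(A_1) + u_2 N_r(A_2))}\big] = e^{r\varphi_1} e^{r\varphi_2}$ from the case $n = 2$ can be checked by simulation. The setup is an assumption of ours, chosen so that $\lambda$ is finite: $S = [0, L]$ with Lebesgue measure, $A_1 = [0, 1)$, $A_2 = [1, 3)$. At time $r$ the sketch places a Poisson$(rL)$ number of points with independent uniform marks on $[0, L]$, which is a standard way to realize a Poisson point process on a bounded set. Both counts are then read off from the same marks, so independence is not built in by hand. The function name and parameters are hypothetical.

```python
import numpy as np

def joint_cf_gap(u1=0.7, u2=-1.3, r=1.0, L=4.0, n_rep=200_000, seed=0):
    """|empirical E exp(i(u1 N_r(A1) + u2 N_r(A2))) - exp(r phi1) exp(r phi2)|.

    Hypothetical setup: S = [0, L], lambda = Lebesgue measure, A1 = [0, 1), A2 = [1, 3).
    """
    rng = np.random.default_rng(seed)
    K = rng.poisson(r * L, size=n_rep)           # total number of points per replication
    owner = np.repeat(np.arange(n_rep), K)       # replication index of each point
    z = rng.uniform(0.0, L, size=int(K.sum()))   # marks z of all points
    N1 = np.bincount(owner[z < 1.0], minlength=n_rep)                  # N_r(A1)
    N2 = np.bincount(owner[(z >= 1.0) & (z < 3.0)], minlength=n_rep)   # N_r(A2)
    phi1 = 1.0 * (np.exp(1j * u1) - 1.0)         # lambda(A1) = 1
    phi2 = 2.0 * (np.exp(1j * u2) - 1.0)         # lambda(A2) = 2
    empirical = np.mean(np.exp(1j * (u1 * N1 + u2 * N2)))
    theory = np.exp(r * phi1) * np.exp(r * phi2)
    return abs(empirical - theory)

print(joint_cf_gap())  # of the order of 1/sqrt(n_rep)
```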
We will also need the following corollary.
Corollary 18.3 Let $\mathcal{F}_t$ and $N_t(A_k)$ be as in Theorem 18.2. Suppose $Y_t$ is a process with paths that are right continuous with left limits such that $Y_t - Y_s$ is independent of $\mathcal{F}_s$ whenever $s < t$ and $Y_t - Y_s$ has the same law as $Y_{t-s}$ for each $s < t$. Suppose moreover that $Y$ has no jumps in common with any of the $N_t(A_k)$. Then the processes $N_t(A_1), \ldots, N_t(A_n)$, and $Y_t$ are independent.

Proof The law of $Y_0$ is the same as that of $Y_t - Y_t$, so $Y_0 = 0$, a.s. By the fact that $Y$ has stationary and independent increments,
$$\mathbb{E}\, e^{iuY_{s+t}} = \mathbb{E}\, e^{iuY_s}\, \mathbb{E}\, e^{iu(Y_{s+t} - Y_s)} = \mathbb{E}\, e^{iuY_s}\, \mathbb{E}\, e^{iuY_t},$$
which implies that the characteristic function of $Y$ is of the form $\mathbb{E}\, e^{iuY_t} = e^{t\psi(u)}$ for some function $\psi(u)$. We fix $u \in \mathbb{R}$ and define $M^Y_t = e^{iuY_t - t\psi(u)}$. As in the proof of Theorem 18.2, we see that $M^Y_t$ is a martingale. Since $M^Y$ has no jumps in common with any of the $M^k_t$, applying Lemma 17.2 to the martingales stopped at time $r$, as in the proof of Theorem 18.2, gives
$$\mathbb{E}\big[M^Y_r M^1_r \cdots M^n_r\big] = 1.$$
This leads as above to the independence of $Y$ from all the $N_t(A_k)$’s.

We now turn to stochastic integrals with respect to Poisson point processes. In the same way that a nondecreasing function on the reals gives rise to a measure, so $N_t(A)$ gives rise to a random measure $\mu(dt, dz)$ on the product $\sigma$-field $\mathcal{B}[0, \infty) \times \mathcal{G}$, where $\mathcal{B}[0, \infty)$ is the Borel $\sigma$-field on $[0, \infty)$; $\mu$ is determined by
$$\mu([0, t] \times A)(\omega) = N_t(A)(\omega).$$
Define a nonrandom measure $\nu$ on $\mathcal{B}[0, \infty) \times \mathcal{G}$ by $\nu([0, t] \times A) = t\lambda(A)$ for $A \in \mathcal{G}$. If $\lambda(A) < \infty$, then $\mu([0, t] \times A) - \nu([0, t] \times A)$ is the same as a Poisson process minus its mean, hence is locally a square integrable martingale.

We can define a stochastic integral with respect to the random measure $\mu - \nu$ as follows. Suppose $H(\omega, s, z)$ is of the form
$$H(\omega, s, z) = \sum_{i=1}^n K_i(\omega)\, 1_{(a_i, b_i]}(s)\, 1_{A_i}(z), \tag{18.1}$$
where for each $i$ the random variable $K_i$ is bounded and $\mathcal{F}_{a_i}$ measurable and $A_i \in \mathcal{G}$ with $\lambda(A_i) < \infty$. For such $H$ we define
$$N_t = \int_0^t \int H(\omega, s, z)\, d(\mu - \nu)(ds, dz) = \sum_{i=1}^n K_i\, (\mu - \nu)\big(((a_i, b_i] \cap [0, t]) \times A_i\big). \tag{18.2}$$
Let us assume without loss of generality that the $A_i$ are disjoint.
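For a simple integrand of the form (18.1), a short simulation can show that $N_t$ from (18.2) has mean zero and that $\mathbb{E}\, N_t^2$ agrees with both $\mathbb{E}\int\int H^2\, d\mu$ and $\mathbb{E}\int\int H^2\, d\nu$ (the two quantities that show up in (18.3) and (18.4)). Everything concrete here is our own choice: $S = [0, 3]$ with Lebesgue measure, $A_1 = [0, 1)$, $A_2 = [1, 3)$, and
$$H = K_1 1_{(0, 0.5]}(s) 1_{A_1}(z) + K_2 1_{(0.5, 1]}(s) 1_{A_2}(z),$$
with $K_1 = 2$ and $K_2 = 1 + \min(N_{0.5}(A_1), 3)$, which is bounded and $\mathcal{F}_{0.5}$ measurable. The two space-time rectangles are disjoint, so by Theorem 18.2 and the independence of increments their counts are independent Poisson variables with means $\nu = 0.5$ and $\nu = 1$.

```python
import numpy as np

def simple_integral_check(n_rep=500_000, seed=0):
    """Return (E N_1, E N_1^2, E int int H^2 dmu, E int int H^2 dnu) by Monte Carlo,
    for the hypothetical simple integrand described in the text above."""
    rng = np.random.default_rng(seed)
    C1 = rng.poisson(0.5 * 1.0, size=n_rep)   # mu((0, 0.5] x A1) = N_0.5(A1); nu-mass 0.5
    C2 = rng.poisson(0.5 * 2.0, size=n_rep)   # mu((0.5, 1] x A2); nu-mass 1.0
    K1 = 2.0                                  # deterministic, hence F_0 measurable
    K2 = 1.0 + np.minimum(C1, 3)              # bounded and F_0.5 measurable
    N1 = K1 * (C1 - 0.5) + K2 * (C2 - 1.0)    # (18.2) at t = 1
    bracket = K1**2 * C1 + K2**2 * C2         # int int H^2 dmu
    angle = K1**2 * 0.5 + K2**2 * 1.0         # int int H^2 dnu
    return N1.mean(), (N1**2).mean(), bracket.mean(), angle.mean()

print(simple_integral_check())  # first entry near 0, last three roughly equal
```

Letting $K_2$ depend on the future, for instance on $C_2$, breaks the mean-zero property, which is why (18.1) requires $K_i$ to be $\mathcal{F}_{a_i}$ measurable.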
It is not hard to see (Exercise 18.3) that $N_t$ is a martingale, that $N^c = 0$, and that
$$[N]_t = \int_0^t \int H(\omega, s, z)^2\, \mu(ds, dz). \tag{18.3}$$
Since $\langle N\rangle_t$ must be predictable and all the jumps of $N$ are totally inaccessible, it follows from Proposition 16.30 that $\langle N\rangle_t$ is continuous. Since $[N]_t - \langle N\rangle_t$ is a martingale, we conclude
$$\langle N\rangle_t = \int_0^t \int H(\omega, s, z)^2\, \nu(ds, dz). \tag{18.4}$$
Suppose $H(s, z)$ is a predictable process in the following sense: $H$ is measurable with respect to the $\sigma$-field generated by all processes of the form (18.1). Suppose also that
$$\mathbb{E}\int_0^\infty \int_S H(s, z)^2\, \nu(ds, dz) < \infty.$$
Take processes $H^n$ of the form (18.1) converging to $H$ in the space $L^2$ with norm $\big(\mathbb{E}\int_0^\infty \int_S H^2\, d\nu\big)^{1/2}$. The corresponding $N^n_t = \int_0^t \int H^n(s, z)\, d(\mu - \nu)$ are easily seen to be a Cauchy sequence in $L^2$, and the limit $N_t$ we call the stochastic integral of $H$ with respect to $\mu - \nu$. As in the continuous case, we may prove that $\mathbb{E}\, N_t^2 = \mathbb{E}\,[N]_t = \mathbb{E}\,\langle N\rangle_t$, and it follows from this, (18.3), and (18.4) that
$$[N]_t = \int_0^t \int_S H(s, z)^2\, \mu(ds, dz), \qquad \langle N\rangle_t = \int_0^t \int_S H(s, z)^2\, \nu(ds, dz). \tag{18.5}$$
One may think of the stochastic integral as follows: if $\mu$ gives unit mass to a point at time $t$ with value $z$, then $N_t$ jumps at this time $t$ and the size of the jump is $H(t, z)$.

Exercises
18.1 Suppose $\{\mathcal{F}_t\}$ is a filtration satisfying the usual conditions and $P^1_t$ and $P^2_t$ are Poisson processes with respect to $\{\mathcal{F}_t\}$ with parameters $\lambda_1, \lambda_2$, respectively. Suppose $P^1_t + P^2_t$ is a Poisson process with parameter $\lambda_1 + \lambda_2$. Prove that $P^1$ and $P^2$ are independent processes.
18.2 Suppose $\{\mathcal{F}_t\}$ is a filtration satisfying the usual conditions, $P_t$ is a Poisson process with respect to $\{\mathcal{F}_t\}$, and $W_t$ is a Brownian motion with respect to $\{\mathcal{F}_t\}$. Show that if $W_t + P_t$ has stationary and independent increments, then $P$ and $W$ are independent processes.
18.3 If $H$ is as in (18.1) and $N$ is defined by (18.2), show that $N$ is a martingale, $N^c = 0$, and $[N]_t$ is given by (18.3).
18.4 Suppose $\{A_s, 0 < s < \infty\}$ is a collection of subsets of $S$ such that $\lambda(A_s) \to \infty$ as $s \to \infty$. Show that $N_t(A_s)/\lambda(A_s)$ converges to $t$ uniformly over finite intervals, where the convergence is in probability.
18.5 Suppose $\{A_s, 0 < s < \infty\}$ is a collection of subsets of $S$ such that $A_r \subset A_s$ if $r \le s$ and $\lambda(A_s) \to \infty$ as $s \to \infty$. Show that for each $t$,
$$\sup_{u\le t} \Big|\frac{N_u(A_s)}{\lambda(A_s)} - u\Big|$$
tends to zero almost surely as $s \to \infty$.
18.6 Let $S$ be a metric space and $\lambda$ a $\sigma$-finite measure on $S$. Construct a Poisson point process which has $\lambda$ as the corresponding measure.
18.7 Let $P^j_t$, $j = 1, 2, \ldots$ be independent Poisson processes with parameter $\beta_j$. Let $X_t = \sum_{j=1}^\infty a_j P^j_t$, where $a_j$ is a sequence such that $X_t$ is finite, a.s. For $A \subset \mathbb{R} \setminus \{0\}$, define $N_t(A)$ to be the number of times before time $t$ that $X$ has a jump whose size is in $A$:
$$N_t(A) = \sum_{s\le t} 1_A(X_s - X_{s-}).$$
Prove that $N_t$ is a Poisson point process and determine $\lambda$.

19
Framework for Markov processes
It is not uncommon for a Markov process to be defined as a sextuple $(\Omega, \mathcal{F}, \mathcal{F}_t, X_t, \theta_t, \mathbb{P}^x)$, and for additional notation (e.g., $\zeta$, $\Delta$, $\mathcal{S}$, $P_t$, $R_\lambda$, etc.) to be introduced rather rapidly. This can be intimidating for the beginner. We will explain all this notation in as gentle a manner as possible. We will consider a Markov process to be a pair $(X_t, \mathbb{P}^x)$ (rather than a sextuple), where $X_t$ is a single stochastic process and $\{\mathbb{P}^x\}$ is a family of probability measures, one probability measure $\mathbb{P}^x$ corresponding to each element $x$ of the state space.

19.1 Introduction
The idea that a Markov process consists of one process and many probabilities is one that takes some getting used to. To explain this, let us first look at an example. Suppose $X_1, X_2, \ldots$ is a Markov chain with stationary transition probabilities with $K$ states: $1, 2, \ldots, K$. Everything we want to know about $X$ can be determined if we know
$$p(i, j) = \mathbb{P}(X_1 = j \mid X_0 = i)$$
for each $i$ and $j$ and $\mu(i) = \mathbb{P}(X_0 = i)$ for each $i$.
We sometimes think of having a different Markov chain for every choice of starting distribution $\mu = (\mu(1), \ldots, \mu(K))$. But instead let us define a new probability space by taking $\Omega'$ to be the collection of all sequences $\omega = (\omega_0, \omega_1, \ldots)$ such that each $\omega_n$ takes one of the values $1, \ldots, K$. Define $X_n(\omega) = \omega_n$. Define $\mathcal{F}_n$ to be the $\sigma$-field generated by $X_0, \ldots, X_n$; this is the same as the $\sigma$-field generated by sets of the form $\{\omega : \omega_0 = a_0, \ldots, \omega_n = a_n\}$, where $a_0, \ldots, a_n \in \{1, 2, \ldots, K\}$. For each $x = 1, 2, \ldots, K$, define a probability measure $\mathbb{P}^x$ on $\Omega'$ by
$$\mathbb{P}^x(X_0 = x_0, X_1 = x_1, \ldots, X_n = x_n) = 1_{\{x\}}(x_0)\, p(x_0, x_1) \cdots p(x_{n-1}, x_n). \tag{19.1}$$
We have $K$ different probability measures, one for each of $x = 1, 2, \ldots, K$, and we can start with an arbitrary probability distribution $\mu$ if we define $\mathbb{P}^\mu(A) = \sum_{i=1}^K \mathbb{P}^i(A)\mu(i)$. We have lost no information by this redefinition, and it turns out this works much better when doing technical details. The value of $X_0(\omega) = \omega_0$ can be any of $1, 2, \ldots, K$; the notion of starting at $x$ is captured by $\mathbb{P}^x$, not by $X_0$. The probability measure $\mathbb{P}^x$ is concentrated on those $\omega$’s for which $\omega_0 = x$, and $\mathbb{P}^x$ gives no mass to any other $\omega$.

Let us now look at Brownian motion, and see how this framework plays out there. Let $\mathbb{P}$ be a probability measure and let $W_t$ be a one-dimensional Brownian motion with respect to $\mathbb{P}$ started at 0. Then $W^x_t = x + W_t$ is a one-dimensional Brownian motion started at $x$. Let $\Omega' = C[0, \infty)$ be the set of continuous functions from $[0, \infty)$ to $\mathbb{R}$, so that each element $\omega$ in $\Omega'$ is a continuous function. (We do not require that $\omega(0) = 0$ or that $\omega(0)$ take any particular value of $x$.) Define
$$X_t(\omega) = \omega(t). \tag{19.2}$$
This will be our process. Let $\mathcal{F}$ be the $\sigma$-field on $\Omega' = C[0, \infty)$ generated by the cylindrical subsets of $C[0, \infty)$; see Definition 1.1. Now define $\mathbb{P}^x$ to be the law of $W^x$.
This means that $\mathbb{P}^x$ is the probability measure on $(\Omega', \mathcal{F})$ defined by
$$\mathbb{P}^x(X \in A) = \mathbb{P}(W^x \in A), \qquad x \in \mathbb{R},\ A \in \mathcal{F}. \tag{19.3}$$
The probability measure $\mathbb{P}^x$ is determined by the fact that if $n \ge 1$, $t_1 \le \cdots \le t_n$, and $B_1, \ldots, B_n$ are Borel subsets of $\mathbb{R}$, then
$$\mathbb{P}^x(X_{t_1} \in B_1, \ldots, X_{t_n} \in B_n) = \mathbb{P}(W^x_{t_1} \in B_1, \ldots, W^x_{t_n} \in B_n).$$
We call the pair $(X_t, \mathbb{P}^x)$, $x \in \mathbb{R}$, $t \ge 0$, a Brownian motion.

19.2 Definition of a Markov process
We want to allow our Markov processes to take values in spaces other than the Euclidean ones. For now, we take our state space $\mathcal{S}$ to be a separable metric space, furnished with the Borel $\sigma$-field. For the beginner, just think of $\mathbb{R}$ in place of $\mathcal{S}$. To define a Markov process, we start with a measurable space $(\Omega, \mathcal{F})$ and suppose we have a filtration $\{\mathcal{F}_t\}$ (not necessarily satisfying the usual conditions).

Definition 19.1 A Markov process $(X_t, \mathbb{P}^x)$ is a stochastic process $X : [0, \infty) \times \Omega \to \mathcal{S}$ and a family of probability measures $\{\mathbb{P}^x : x \in \mathcal{S}\}$ on $(\Omega, \mathcal{F})$ satisfying the following.
(1) For each $t$, $X_t$ is $\mathcal{F}_t$ measurable.
(2) For each $t$ and each Borel subset $A$ of $\mathcal{S}$, the map $x \mapsto \mathbb{P}^x(X_t \in A)$ is Borel measurable.
(3) For each $s, t \ge 0$, each Borel subset $A$ of $\mathcal{S}$, and each $x \in \mathcal{S}$, we have
$$\mathbb{P}^x(X_{s+t} \in A \mid \mathcal{F}_s) = \mathbb{P}^{X_s}(X_t \in A), \qquad \mathbb{P}^x\text{-a.s.} \tag{19.4}$$

Some explanation is definitely in order. Let
$$\varphi(x) = \mathbb{P}^x(X_t \in A), \tag{19.5}$$
so that $\varphi$ is a function mapping $\mathcal{S}$ to $\mathbb{R}$. Part of the definition of filtration given in Chapter 1 is that each $\mathcal{F}_t \subset \mathcal{F}$. Since we are requiring $X_t$ to be $\mathcal{F}_t$ measurable, that means $(X_t \in A)$ is in $\mathcal{F}$ and it makes sense to talk about $\mathbb{P}^x(X_t \in A)$. Definition 19.1(2) says that the function $\varphi$ is Borel measurable. This is a very mild assumption, and will be satisfied in the examples we look at. The expression $\mathbb{P}^{X_s}(X_t \in A)$ on the right-hand side of (19.4) is a random variable and its value at $\omega \in \Omega$ is defined to be $\varphi(X_s(\omega))$, with $\varphi$ given by (19.5). Note that the randomness in $\mathbb{P}^{X_s}(X_t \in A)$ is thus all due to the $X_s$ term and not the $X_t$ term.
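For a finite Markov chain, the function $\varphi(y) = \mathbb{P}^y(X_t \in A)$ is just the vector $p^t 1_A$, where $p^t$ is the $t$-step transition matrix. The Markov property (19.4) can then be checked by simulation under $\mathbb{P}^x$. The sketch groups paths by $(X_{s-1}, X_s)$, which is more information from $\mathcal{F}_s$ than $X_s$ alone, and compares the conditional frequency of $\{X_{s+t} \in A\}$ with $\varphi(X_s)$. The gap should be small no matter what $X_{s-1}$ was. The transition matrix, the set $A$, and the function names are hypothetical, and states are labeled $0, 1, 2$ in the code rather than $1, 2, 3$.

```python
import numpy as np

p = np.array([[0.5, 0.3, 0.2],     # hypothetical transition matrix p(i, j)
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
A = np.array([False, True, True])  # hypothetical set A = {1, 2}

def simulate(x, n_steps, n_paths, rng):
    """Paths X_0 = x, X_1, ..., X_{n_steps} of the chain, sampled under P^x."""
    cum = np.cumsum(p, axis=1)
    paths = np.empty((n_paths, n_steps + 1), dtype=int)
    paths[:, 0] = x
    for k in range(n_steps):
        u = rng.random(n_paths)
        nxt = (u[:, None] > cum[paths[:, k]]).sum(axis=1)   # inverse-CDF sampling
        paths[:, k + 1] = np.minimum(nxt, p.shape[0] - 1)   # guard against rounding in cum
    return paths

def markov_property_gap(x=0, s=2, t=2, n_paths=400_000, seed=0):
    """Largest |P^x(X_{s+t} in A | X_{s-1} = v, X_s = y) - phi(y)| over well-populated groups."""
    rng = np.random.default_rng(seed)
    phi = np.linalg.matrix_power(p, t) @ A.astype(float)    # phi(y) = P^y(X_t in A)
    X = simulate(x, s + t, n_paths, rng)
    hit = A[X[:, s + t]]
    gap = 0.0
    for v in range(3):
        for y in range(3):
            mask = (X[:, s - 1] == v) & (X[:, s] == y)
            if mask.sum() >= 1000:
                gap = max(gap, abs(hit[mask].mean() - phi[y]))
    return gap

print(markov_property_gap())  # small, of Monte Carlo size
```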
Definition 19.1(3) can be rephrased as saying that for each $s, t$, each $A$, and each $x$, there is a set $N_{s,t,x,A} \subset \Omega$ that is a null set with respect to $\mathbb{P}^x$ and for $\omega \notin N_{s,t,x,A}$, the conditional expectation $\mathbb{P}^x(X_{s+t} \in A \mid \mathcal{F}_s)$ is equal to $\varphi(X_s)$.

We have now explained all the terms in the sextuple $(\Omega, \mathcal{F}, \mathcal{F}_t, X_t, \theta_t, \mathbb{P}^x)$ except for $\theta_t$. These are called shift operators and are maps from $\Omega \to \Omega$ such that $X_s \circ \theta_t = X_{s+t}$. We defer the precise meaning of the $\theta_t$ and the rationale for them until Section 19.5, where they will appear in a natural way.

In the remainder of the section and in Section 19.3 we define some of the additional notation commonly used for Markov processes. The first one is almost self-explanatory. We use $\mathbb{E}^x$ for expectation with respect to $\mathbb{P}^x$. As with $\mathbb{P}^{X_s}(X_t \in A)$, the notation $\mathbb{E}^{X_s} f(X_t)$, where $f$ is bounded and Borel measurable, is to be taken to mean $\psi(X_s)$ with $\psi(y) = \mathbb{E}^y f(X_t)$. If we want to talk about our Markov process started with distribution $\mu$, we define
$$\mathbb{P}^\mu(B) = \int \mathbb{P}^x(B)\, \mu(dx),$$
and similarly for $\mathbb{E}^\mu$; here $\mu$ is a probability on $\mathcal{S}$.

19.3 Transition probabilities
If $\mathcal{B}$ is the Borel $\sigma$-field on a metric space $\mathcal{S}$, a kernel $Q(x, A)$ on $\mathcal{S}$ is a map from $\mathcal{S} \times \mathcal{B} \to \mathbb{R}$ satisfying the following.
(1) For each $x \in \mathcal{S}$, $Q(x, \cdot)$ is a measure on $(\mathcal{S}, \mathcal{B})$.
(2) For each $A \in \mathcal{B}$, the function $x \mapsto Q(x, A)$ is Borel measurable.
The definition of Markov transition probabilities (or simply transition probabilities) is the following.

Definition 19.2 A collection of kernels $\{P_t(x, A);\ t \ge 0\}$ are Markov transition probabilities for a Markov process $(X_t, \mathbb{P}^x)$ if
(1) $P_t(x, \mathcal{S}) = 1$ for each $t \ge 0$ and each $x \in \mathcal{S}$.
(2) For each $x \in \mathcal{S}$, each Borel subset $A$ of $\mathcal{S}$, and each $s, t \ge 0$,
$$P_{t+s}(x, A) = \int_{\mathcal{S}} P_t(y, A)\, P_s(x, dy). \tag{19.6}$$
(3) For each $x \in \mathcal{S}$, each Borel subset $A$ of $\mathcal{S}$, and each $t \ge 0$,
$$P_t(x, A) = \mathbb{P}^x(X_t \in A). \tag{19.7}$$
Definition 19.2(3) can be rephrased as saying that for each $x$, the measures $P_t(x, dy)$ and $\mathbb{P}^x(X_t \in dy)$ are the same.
We define
$$P_t f(x) = \int f(y)\, P_t(x, dy) \tag{19.8}$$
when $f : \mathcal{S} \to \mathbb{R}$ is Borel measurable and either bounded or non-negative.

Lemma 19.3 Suppose $P_t$ are Markov transition probabilities. If $f$ is Borel measurable and either non-negative or bounded, then $P_t f$ is non-negative (respectively, bounded) and Borel measurable and
$$P_t f(x) = \mathbb{E}^x f(X_t), \qquad x \in \mathcal{S}. \tag{19.9}$$

Proof Using (19.7) and Definition 19.1(2), the Borel measurability and (19.9) hold when $f$ is the indicator of a set $A$. By linearity they hold for simple functions, and then using monotone convergence they hold for non-negative functions. Using linearity again, we have measurability and (19.9) holding for $f$ bounded and Borel measurable. The non-negativity (respectively, the boundedness) of $P_t f$ follows from (19.9).

The equations (19.6) are known as the Chapman–Kolmogorov equations. They can be rephrased in terms of equality of measures: for each $x$
$$P_{s+t}(x, dz) = \int_{y\in\mathcal{S}} P_t(y, dz)\, P_s(x, dy). \tag{19.10}$$
Multiplying (19.10) by a bounded Borel measurable function $f(z)$ and integrating gives
$$P_{s+t} f(x) = \int P_t f(y)\, P_s(x, dy). \tag{19.11}$$
The right-hand side is the same as $P_s(P_t f)(x)$, so we have
$$P_{s+t} f(x) = P_s P_t f(x), \tag{19.12}$$
i.e., the functions $P_{s+t} f$ and $P_s P_t f$ are the same. The equation (19.12) is known as the semigroup property. By Lemma 19.3, $P_t$ is a linear operator on the space of bounded Borel measurable functions on $\mathcal{S}$. We can then rephrase (19.12) simply as
$$P_{s+t} = P_s P_t. \tag{19.13}$$
Operators satisfying (19.13) are called a semigroup, and are much studied in functional analysis. We will show in Chapter 36 how to construct the Markov process corresponding to a given semigroup $P_t$. More about semigroups can also be found in Chapter 37.

One more observation about semigroups: if we take expectations in (19.4), we obtain
$$\mathbb{P}^x(X_{s+t} \in A) = \mathbb{E}^x\big[\mathbb{P}^{X_s}(X_t \in A)\big].$$
The left-hand side is $P_{s+t} 1_A(x)$ and the right-hand side is $\mathbb{E}^x[P_t 1_A(X_s)] = P_s P_t 1_A(x)$, and so (19.4) encodes the semigroup property.
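On a finite state space, the kernels $P_t(x, \{y\})$ form a matrix, $P_t f$ becomes a matrix-vector product, and (19.6), (19.12) and (19.13) all reduce to the matrix identity $P_{s+t} = P_s P_t$. The sketch below assumes a continuous-time chain with a hypothetical symmetric generator $Q$ (nonnegative off-diagonal rates, rows summing to 0), so that $P_t = e^{tQ}$ can be computed from the eigendecomposition given by `numpy.linalg.eigh`. It then checks the Chapman–Kolmogorov equations, the semigroup property applied to a bounded $f$, and Definition 19.2(1).

```python
import numpy as np

# Hypothetical generator: off-diagonal rates >= 0, rows sum to 0, symmetric.
Q = np.array([[-1.0, 0.7, 0.3],
              [0.7, -1.2, 0.5],
              [0.3, 0.5, -0.8]])
evals, V = np.linalg.eigh(Q)            # Q = V diag(evals) V^T

def P(t):
    """Transition matrix P_t = exp(tQ); entry (x, y) is P_t(x, {y})."""
    return (V * np.exp(t * evals)) @ V.T

s, t = 0.4, 1.1
f = np.array([2.0, -1.0, 0.5])          # a bounded function on {0, 1, 2}

print(np.allclose(P(s + t), P(s) @ P(t)))              # (19.6), (19.13)
print(np.allclose(P(s + t) @ f, P(s) @ (P(t) @ f)))    # (19.12) applied to f
print(np.allclose(P(t).sum(axis=1), 1.0))              # Definition 19.2(1)
print(bool((P(t) >= -1e-12).all()))                    # entries are probabilities
```

Each line prints `True`. Symmetry of $Q$ is only a convenience for computing the matrix exponential; the identities do not depend on it.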
The resolvent or $\lambda$-potential of a semigroup $P_t$ is defined by
$$R_\lambda f(x) = \int_0^\infty e^{-\lambda t} P_t f(x)\, dt, \qquad \lambda \ge 0,\ x \in \mathcal{S}.$$
This can be recognized as the Laplace transform of $P_t$. By Lemma 19.3 and the Fubini theorem, we see that
$$R_\lambda f(x) = \mathbb{E}^x \int_0^\infty e^{-\lambda t} f(X_t)\, dt.$$
Resolvents are useful because they are typically easier to work with than semigroups.

When practitioners of stochastic calculus tire of a martingale, they “stop” it. Markov process theorists are a harsher lot and they “kill” their processes. To be precise, attach an isolated point $\Delta$ to $\mathcal{S}$. Thus one looks at $\widehat{\mathcal{S}} = \mathcal{S} \cup \{\Delta\}$, and the topology on $\widehat{\mathcal{S}}$ is the one generated by the open sets of $\mathcal{S}$ and $\{\Delta\}$. $\Delta$ is called the cemetery point. All functions on $\mathcal{S}$ are extended to $\widehat{\mathcal{S}}$ by defining them to be 0 at $\Delta$. At some random time $\zeta$ the Markov process is killed, which means that $X_t = \Delta$ for all $t \ge \zeta$. The time $\zeta$ is called the lifetime of the Markov process.

19.4 An example
Let us give an example, that of Brownian motion, of course. Let $X_t$ and $\mathbb{P}^x$ be defined by (19.2) and (19.3). Define $\mathcal{F}_t = \sigma(X_r;\ r \le t)$. Clearly Definition 19.1(1) holds. Observe that since, under $\mathbb{P}$, $W_t$ is a mean zero normal random variable with variance $t$,
$$\mathbb{P}^x(X_t \in A) = \mathbb{P}(W^x_t \in A) = \mathbb{P}(x + W_t \in A) = \frac{1}{\sqrt{2\pi t}} \int_A e^{-(y-x)^2/2t}\, dy. \tag{19.14}$$
By dominated convergence, $x \mapsto \mathbb{P}^x(X_t \in A)$ is continuous, therefore measurable. This proves Definition 19.1(2). It remains to prove Definition 19.1(3), which is the following proposition.

Proposition 19.4 Let $W$ be a Brownian motion as defined by Definition 2.1, let $W^x_t = x + W_t$, and let $(X_t, \mathbb{P}^x)$ be defined by (19.2) and (19.3). If $f$ is bounded and Borel measurable,
$$\mathbb{E}^x[f(X_{t+s}) \mid \mathcal{F}_s] = \mathbb{E}^{X_s} f(X_t), \qquad \mathbb{P}^x\text{-a.s.} \tag{19.15}$$

Proof We will first prove
$$\mathbb{E}^x[f(X_{t+s}) \mid \mathcal{F}_s] = \mathbb{E}^{X_s} f(X_t) \tag{19.16}$$
when $f(x) = e^{iux}$.
Using independent increments and the fact that $W_{t+s} - W_s$ has the same law as $W_t$, we see that under each $\mathbb{P}^x$, $X_{t+s} - X_s$ is independent of $\mathcal{F}_s$ and has the same law as a mean zero normal random variable with variance $t$. We conclude that $\mathbb{E}^x e^{iu(X_{t+s} - X_s)} = e^{-u^2 t/2}$; see (A.25). We then write
$$\mathbb{E}^x\big[e^{iuX_{t+s}} \mid \mathcal{F}_s\big] = \mathbb{E}^x\big[e^{iu(X_{t+s} - X_s)} \mid \mathcal{F}_s\big] e^{iuX_s} = \mathbb{E}^x\big[e^{iu(X_{t+s} - X_s)}\big] e^{iuX_s} = e^{-u^2 t/2} e^{iuX_s}.$$
On the other hand, for any $y$,
$$\mathbb{E}^y e^{iuX_t} = \mathbb{E}\, e^{iuW^y_t} = \mathbb{E}\, e^{iuW_t}\, e^{iuy} = e^{-u^2 t/2} e^{iuy}.$$
Replacing $y$ by $X_s$ proves (19.16) for this $f$.

Now suppose that $f \in C^\infty$ with compact support and let $\hat f$ be the Fourier transform of $f$. In (19.16) we replace $u$ by $-u$, multiply both sides by $\hat f(u)$, and integrate over $u \in \mathbb{R}$. Using the Fourier inversion formula, we then have
$$\mathbb{E}^x[f(X_{t+s}) \mid \mathcal{F}_s] = (2\pi)^{-1} \mathbb{E}^x\Big[\int e^{-iuX_{t+s}} \hat f(u)\, du \;\Big|\; \mathcal{F}_s\Big] = (2\pi)^{-1} \mathbb{E}^{X_s} \int e^{-iuX_t} \hat f(u)\, du = \mathbb{E}^{X_s} f(X_t).$$
We used the Fubini theorem several times to interchange expectation and integration; this is justified because $f$ in $C^\infty$ with compact support implies $\hat f$ is in the Schwartz class; see Section B.2. This proves the proposition for $f$ in $C^\infty$ with compact support, and a limit argument gives it for all bounded and measurable $f$. The same proof works for $d$-dimensional Brownian motion.

Set
$$P_t(x, A) = \mathbb{P}^x(X_t \in A) = \mathbb{P}(W_t + x \in A) = \frac{1}{\sqrt{2\pi t}} \int_A e^{-(y-x)^2/2t}\, dy. \tag{19.17}$$
Clearly for each $x$ and $t$, $P_t(x, \cdot)$ is a measure with total mass 1. As we mentioned earlier, the function $x \mapsto P_t(x, A)$ is continuous, hence Borel measurable. We will show the Chapman–Kolmogorov equations. These follow from the next proposition.

Proposition 19.5 If $s, t > 0$ and $x, z \in \mathbb{R}$, then
$$\int_{y\in\mathbb{R}} \frac{1}{\sqrt{2\pi t}} e^{-(y-x)^2/2t}\, \frac{1}{\sqrt{2\pi s}} e^{-(z-y)^2/2s}\, dy = \frac{1}{\sqrt{2\pi(s+t)}} e^{-(z-x)^2/2(s+t)}. \tag{19.18}$$
Proof This is a well-known property of the Gaussian density, but we can derive (19.18) from Proposition 19.4. Let $f$ be continuous with compact support. Taking expectations in (19.15),
$$\mathbb{E}^x f(X_{t+s}) = \mathbb{E}^x\big[\mathbb{E}^{X_s} f(X_t)\big],$$
or
$$P_{t+s} f(x) = P_s P_t f(x).$$
Using Lemma 19.3 and (19.17),
$$\int f(x) \frac{1}{\sqrt{2\pi(s+t)}} e^{-(z-x)^2/2(s+t)}\, dx = \int f(x) \int \frac{1}{\sqrt{2\pi t}} e^{-(y-x)^2/2t}\, \frac{1}{\sqrt{2\pi s}} e^{-(z-y)^2/2s}\, dy\, dx.$$
Since this holds for all continuous $f$ with compact support, (19.18) holds for almost every $x$. Since both sides of (19.18) are continuous in $x$, (19.18) holds for all $x$.
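Identity (19.18) can also be checked by direct numerical integration in $y$. The sketch uses a plain Riemann sum on a wide, fine grid, which is very accurate here because the integrand is a smooth, rapidly decaying Gaussian product. The grid limits and the test points $x, z, s, t$ are arbitrary choices of ours.

```python
import numpy as np

def gauss(x, var):
    """Normal density with mean 0 and variance var, evaluated at x."""
    return np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def check_19_18(x=0.3, z=-1.2, s=0.5, t=1.7):
    """Return (left side, right side) of (19.18) for the given x, z, s, t."""
    y = np.linspace(-40.0, 40.0, 400_001)            # integration grid in y
    dy = y[1] - y[0]
    integrand = gauss(y - x, t) * gauss(z - y, s)    # p_t(x, y) * p_s(y, z)
    lhs = float(np.sum(integrand) * dy)              # Riemann sum over y
    rhs = float(gauss(z - x, s + t))                 # p_{s+t}(x, z)
    return lhs, rhs

print(check_19_18())  # the two numbers agree to many decimal places
```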

19.5 The canonical process and shift operators
Suppose we have a Markov process $(X_t, \mathbb{P}^x)$ where $\mathcal{F}_t = \sigma(X_s;\ s \le t)$. Suppose for the moment that $X_t$ has continuous paths. For this to even make sense, it is necessary that the set $\{t \mapsto X_t \text{ is not continuous}\}$ be in $\mathcal{F}$, and then we require this event to be $\mathbb{P}^x$-null for each $x$. Define $\widetilde\Omega$ to be the set of continuous functions on $[0, \infty)$. If $\widetilde\omega \in \widetilde\Omega$, set $\widetilde X_t = \widetilde\omega(t)$. Define $\widetilde{\mathcal{F}}_t = \sigma(\widetilde X_s;\ s \le t)$ and $\widetilde{\mathcal{F}}_\infty = \bigvee_{t\ge 0} \widetilde{\mathcal{F}}_t$. Finally define $\widetilde{\mathbb{P}}^x$ on $(\widetilde\Omega, \widetilde{\mathcal{F}}_\infty)$ by $\widetilde{\mathbb{P}}^x(\widetilde X \in \cdot) = \mathbb{P}^x(X \in \cdot)$. Thus $\widetilde{\mathbb{P}}^x$ is specified uniquely by
$$\widetilde{\mathbb{P}}^x(\widetilde X_{t_1} \in A_1, \ldots, \widetilde X_{t_n} \in A_n) = \mathbb{P}^x(X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n)$$
for $n \ge 1$, $A_1, \ldots, A_n$ Borel subsets of $\mathcal{S}$, and $t_1 < \cdots < t_n$. Clearly there is so far no loss (or gain) by looking at the Markov process $(\widetilde X_t, \widetilde{\mathbb{P}}^x)$, which is called the canonical process.

Let us now suppose we are working with the canonical process, and we drop the tildes everywhere. We define the shift operators $\theta_t : \Omega \to \Omega$ as follows. $\theta_t(\omega)$ will be an element of $\Omega$ and therefore is a continuous function from $[0, \infty)$ to $\mathcal{S}$. Define
$$\theta_t(\omega)(s) = \omega(t + s).$$
Then
$$X_s \circ \theta_t(\omega) = X_s(\theta_t(\omega)) = \theta_t(\omega)(s) = \omega(t + s) = X_{t+s}(\omega).$$
The shift operator $\theta_t$ takes the path of $X$ and chops off and discards the part of the path before time $t$.

We will use expressions like $f(X_s) \circ \theta_t$. If we apply this to $\omega \in \Omega$, then
$$(f(X_s) \circ \theta_t)(\omega) = f(X_s(\theta_t(\omega))) = f(X_{s+t}(\omega)),$$
or $f(X_s) \circ \theta_t = f(X_{s+t})$.

If the paths of $X$ are not continuous, but instead only right continuous with left limits, we can follow exactly the above procedure, except we start with $\widetilde\Omega$ being the collection of functions from $[0, \infty)$ to $\mathcal{S}$ that are right continuous with left limits. Even if we are not in this canonical setup, from now on we will suppose there exist shift operators mapping $\Omega$ into itself so that $X_s \circ \theta_t = X_{s+t}$.

Exercises
19.1 Suppose $(X_t, \mathbb{P}^x)$ is a Brownian motion and $S_t = \sup_{s\le t} X_s$. Show that $((X_t, S_t), \mathbb{P}^x)$ is a Markov process and determine the transition probabilities.
19.2 Suppose $(X_t, \mathbb{P}^x)$ is a Brownian motion, $f$ a non-negative, bounded, Borel measurable function, and $A_t = \int_0^t f(X_s)\, ds$. Show that $((X_t, A_t), \mathbb{P}^x)$ is a Markov process.
19.3 Suppose $P_t$ is a Poisson process with parameter $\lambda$. Let $\Omega'$ be the collection of functions on $[0, \infty)$ which are right continuous and which have left limits, let $\mathcal{F}$ be the $\sigma$-field on $\Omega'$ generated by the cylindrical subsets of $\Omega'$, let $P^x_t = x + P_t$, and let $\mathbb{P}^x$ be the law of $x + P$. Show that $(X_t, \mathbb{P}^x)$ is a Markov process and determine the transition probabilities.
19.4 Suppose $m$ is a measure on the Borel subsets $\mathcal{B}$ of a metric space $\mathcal{S}$. Suppose for each $t > 0$ there
exist jointly measurable non-negative functions $p_t : \mathcal{S} \times \mathcal{S} \to \mathbb{R}$ such that
$$\int p_t(x, y)\, m(dy) = 1$$
for each $x$ and $t$, and define
$$P_t(x, A) = \int_A p_t(x, y)\, m(dy).$$
Show that the kernels $P_t$ satisfy the Chapman–Kolmogorov equations if and only if
$$\int p_s(x, y)\, p_t(y, z)\, m(dy) = p_{s+t}(x, z)$$
for every $s, t \ge 0$, every $x \in \mathcal{S}$, and $m$-almost every $z$.
19.5 The Ornstein–Uhlenbeck process $Y$ started at $x$ is a continuous Gaussian process with $\mathbb{E}\, Y_t = e^{-t/2}x$ and covariance
$$\operatorname{Cov}(Y_s, Y_t) = e^{-(s+t)/2}(e^{s\wedge t} - 1).$$
If $X$ is the canonical process and $\mathbb{P}^x$ is the law of an Ornstein–Uhlenbeck process started at $x$, show that $(X_t, \mathbb{P}^x)$ is a Markov process and determine the transition probabilities.
Notes
For more, see Blumenthal and Getoor (1968).

20
Markov properties
We want to accomplish three things in this chapter. First, we want to talk about what it means
in the Markov process context for a filtration to satisfy the usual conditions. This is now
more complicated than in Chapter 1 because we have more than one probability measure.
Second, we want to extend the Markov property to expressions that are more complicated
than $E^x[f(X_{s+t}) \mid \mathcal F_s]$. Third, we want to look at the strong Markov property, which means
we look at expressions like $E^x[f(X_{T+t}) \mid \mathcal F_T]$, where $T$ is a stopping time.
Throughout this chapter we assume that X has paths that are right continuous with left
limits. To be more precise, if
N = {ω : the function t → Xt (ω) is not right continuous with left limits},
then we assume N ∈ F and N is Px-null for every x ∈ S .
20.1 Enlarging the filtration
Let us first introduce some notation. Define
$$\mathcal F^{00}_t = \sigma(X_s;\ s \le t), \qquad t \ge 0. \tag{20.1}$$
This is the smallest σ-field with respect to which each $X_s$ is measurable for $s \le t$. We let $\mathcal F^0_t$ be the completion of $\mathcal F^{00}_t$, but we need to be careful what we mean by completion here, because we have more than one probability measure present. Let $\mathcal N$ be the collection of sets that are $P^x$-null for every $x \in S$. Thus $N \in \mathcal N$ if $(P^x)^*(N) = 0$ for each $x \in S$, where $(P^x)^*$ is the outer probability corresponding to $P^x$. The outer probability $(P^x)^*$ is defined by
$$(P^x)^*(A) = \inf\{P^x(B) : A \subset B,\ B \in \mathcal F\}.$$
Let
$$\mathcal F^0_t = \sigma(\mathcal F^{00}_t \cup \mathcal N). \tag{20.2}$$
Finally, let
$$\mathcal F_t = \mathcal F^0_{t+} = \bigcap_{\varepsilon>0} \mathcal F^0_{t+\varepsilon}. \tag{20.3}$$
We call $\{\mathcal F_t\}$ the minimal augmented filtration generated by $X$. Ultimately, we will work only with $\{\mathcal F_t\}$, but we need the other two filtrations at intermediate stages. The reason for worrying about which filtrations to use is that $\{\mathcal F^{00}_t\}$ is too small to include many interesting sets (such as those arising in the law of the iterated logarithm, for example), while if the filtration is too large, the Markov property will not hold for that filtration.
The filtration matters when defining a Markov process; see Definition 19.1(3). We will assume throughout this section that $(X_t, P^x)$ is a Markov process with respect to the filtration $\{\mathcal F^{00}_t\}$, that is,
$$P^x(X_{s+t} \in A \mid \mathcal F^{00}_s) = P^{X_s}(X_t \in A), \qquad P^x\text{-a.s.}, \tag{20.4}$$
whenever $A$ is a Borel subset of $S$ and $s, t \ge 0$.
We will also make the following assumption, which will be needed here and also in
Section 20.3.
Assumption 20.1 Suppose Pt f is continuous on S whenever f is bounded and continuous
on S .
Markov processes satisfying Assumption 20.1 are called Feller processes or weak Feller
processes. If Pt f is continuous whenever f is bounded and Borel measurable, then the
Markov process is said to be a strong Feller process.
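The sketch below illustrates the difference between the weak and strong Feller properties on two one-dimensional examples. Both examples are our choice for illustration and are not taken from the text.

- **Brownian motion.** Here $P_t f(x) = E f(x + \sqrt t Z)$ with $Z$ standard normal.
- **Uniform motion.** Here $X_t = x + t$, so $P_t f(x) = f(x+t)$.

We apply both to the bounded, discontinuous function $f = 1_{[0,\infty)}$.

- Brownian motion returns the continuous function $\Phi(x/\sqrt t)$. This is consistent with Brownian motion being a strong Feller process.
- Translation returns the indicator $1_{[-t,\infty)}$, which jumps at $x = -t$. Translation maps bounded continuous functions to continuous ones, so it is Feller, but it is not strong Feller.

The helper names are hypothetical.

```python
import math

def f(x: float) -> float:
    """Bounded but discontinuous test function: the indicator of [0, infinity)."""
    return 1.0 if x >= 0 else 0.0

def bm_Ptf(x: float, t: float) -> float:
    """P_t f(x) for 1-d Brownian motion: P(x + sqrt(t) Z >= 0) = Phi(x / sqrt(t))."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * t)))

def drift_Ptf(x: float, t: float) -> float:
    """P_t f(x) = f(x + t) for the deterministic motion X_t = x + t."""
    return f(x + t)

def jump_at(g, point: float, h: float = 1e-9) -> float:
    """Size of the jump of g across `point`, measured with a tiny offset h."""
    return abs(g(point + h) - g(point - h))

t = 0.5
print(jump_at(lambda x: bm_Ptf(x, t), -t))     # essentially 0: continuous
print(jump_at(lambda x: drift_Ptf(x, t), -t))  # 1.0: a jump at x = -t
```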
We show that we can replace $\mathcal F^{00}_t$ in (20.4) by $\mathcal F^0_t$.
Proposition 20.2 Let $(X_t, P^x)$ be a Markov process and suppose that (20.4) holds. If $A$ is a Borel subset of $S$, $x \in S$, and $s, t \ge 0$, then
$$P^x(X_{s+t} \in A \mid \mathcal F^0_s) = P^{X_s}(X_t \in A), \qquad P^x\text{-a.s.} \tag{20.5}$$
Proof Since the right-hand side is a function of $X_s$ and hence $\mathcal F^0_s$ measurable, we need to show that if $B \in \mathcal F^0_s$, then
$$P^x(X_{s+t} \in A,\ B) = E^x\big[P^{X_s}(X_t \in A);\ B\big]. \tag{20.6}$$
This holds for $B \in \mathcal F^{00}_s$ by (20.4). It holds for sets $B \in \mathcal N$, the class of null sets, since both sides are 0. Therefore it holds for sets $B$ such that there exists $B_1 \in \mathcal F^{00}_s$ with $B \,\triangle\, B_1$ being a null set. By linearity it holds for finite disjoint unions of sets of the form just described. The class of such finite disjoint unions is a monotone class that generates $\mathcal F^0_s$, and our result follows by the monotone class theorem, Theorem B.2.
The next step is to go from $\mathcal F^0_s$ to $\mathcal F_s$.
Proposition 20.3 Let $(X_t, P^x)$ be a Markov process and suppose that (20.4) holds. If Assumption 20.1 holds and $f$ is a bounded Borel measurable function, then
$$E^x[f(X_{s+t}) \mid \mathcal F_s] = E^{X_s} f(X_t), \qquad P^x\text{-a.s.} \tag{20.7}$$
It will turn out (see Proposition 20.7 below) that $\mathcal F^0_s$ is equal to $\mathcal F_s$, but we do not know this yet.
Proof We start with (20.5). By linearity, we have
$$E^x[f(X_{s+t}) \mid \mathcal F^0_s] = E^{X_s} f(X_t), \qquad P^x\text{-a.s.}, \tag{20.8}$$
when $f$ is a simple function, then by monotone convergence when $f$ is non-negative, and then by linearity again when $f$ is bounded and Borel measurable. In particular, we have this when $f$ is bounded and continuous.

If $B \in \mathcal F_s = \mathcal F^0_{s+}$, then $B \in \mathcal F^0_{s+\varepsilon}$ for every $\varepsilon > 0$. Hence by (20.8) with $s$ replaced by $s + \varepsilon$, if $f$ is bounded and continuous,
$$E^x[f(X_{s+t+\varepsilon});\ B] = E^x\big[E^{X_{s+\varepsilon}} f(X_t);\ B\big]. \tag{20.9}$$
The right-hand side is equal to $E^x[P_t f(X_{s+\varepsilon});\ B]$. Since $P_t f$ is continuous and $X_t$ has paths that are right continuous with left limits, this converges as $\varepsilon \to 0$ to
$$E^x[P_t f(X_s);\ B] = E^x\big[E^{X_s} f(X_t);\ B\big]$$
by dominated convergence. The left-hand side of (20.9) converges, using dominated convergence, the continuity of $f$, and the fact that $X$ has paths that are right continuous with left limits, to $E^x[f(X_{s+t});\ B]$. We therefore have
$$E^x[f(X_{s+t});\ B] = E^x\big[E^{X_s} f(X_t);\ B\big]. \tag{20.10}$$
A limit argument shows this holds whenever $f$ is bounded and measurable. Since $B$ is an arbitrary event in $\mathcal F_s$, that completes the proof.
Remark 20.4 In Chapter 16, we discussed the fact that the first time a right continuous
process whose jump times are totally inaccessible hits a Borel set is a stopping time, provided
the filtration satisfies the usual conditions. Even though the notion of completion of a filtration
is a bit different in the context of Markov processes, the result is still true. See Blumenthal
and Getoor (1968).
20.2 The Markov property
We start with the Markov property given by Proposition 20.3:
$$E^x[f(X_{s+t}) \mid \mathcal F_s] = E^{X_s}[f(X_t)], \qquad P^x\text{-a.s.} \tag{20.11}$$
Since $f(X_{s+t}) = f(X_t) \circ \theta_s$, if we write $Y$ for the random variable $f(X_t)$, we have
$$E^x[Y \circ \theta_s \mid \mathcal F_s] = E^{X_s} Y, \qquad P^x\text{-a.s.} \tag{20.12}$$
We wish to generalize this to other random variables $Y$.
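To make the identity $f(X_{s+t}) = f(X_t) \circ \theta_s$ concrete, the sketch below makes the following choices for illustration only:

- a path $\omega$ is a Python function of time;
- the particular path and the function $f$ are arbitrary;
- the names `theta` and `X` are our own.

It implements $\theta_s(\omega)(u) = \omega(s+u)$ and the coordinate map $X_t(\omega) = \omega(t)$. It then checks that $(f(X_t) \circ \theta_s)(\omega) = f(X_{s+t}(\omega))$.

```python
import math
from typing import Callable

Path = Callable[[float], float]  # a path omega: [0, inf) -> R

def theta(s: float, omega: Path) -> Path:
    """Shift operator: theta_s(omega)(u) = omega(s + u)."""
    return lambda u: omega(s + u)

def X(t: float, omega: Path) -> float:
    """Canonical process: X_t(omega) = omega(t)."""
    return omega(t)

omega: Path = lambda u: math.sin(u) + 0.1 * u  # an illustrative continuous path
f = lambda x: x * x                              # any function of the state

s, t = 1.3, 0.7
lhs = f(X(t, theta(s, omega)))  # (f(X_t) o theta_s)(omega)
rhs = f(X(s + t, omega))        # f(X_{s+t}(omega))
print(lhs == rhs)
```

The same representation shows the semigroup property of the shifts, $\theta_s \circ \theta_t = \theta_{s+t}$, up to floating-point rounding.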
Proposition 20.5 Let $(X_t, P^x)$ be a Markov process and suppose (20.11) holds. Suppose $Y = \prod_{i=1}^n f_i(X_{t_i - s})$, where the $f_i$ are bounded and Borel measurable, and $s \le t_1 \le \cdots \le t_n$. Then (20.12) holds.

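Proof We will prove this by induction on $n$. The case $n = 1$ is (20.11), so we suppose the equality holds for $n$ and prove it for $n + 1$.
Let $V = \prod_{j=2}^{n+1} f_j(X_{t_j - t_1})$ and $h(y) = E^y V$. By the induction hypothesis,
$$\begin{aligned}
E^x\Big[\prod_{j=1}^{n+1} f_j(X_{t_j}) \,\Big|\, \mathcal F_s\Big]
&= E^x\big[E^x[V \circ \theta_{t_1} \mid \mathcal F_{t_1}]\, f_1(X_{t_1}) \,\big|\, \mathcal F_s\big]\\
&= E^x\big[(E^{X_{t_1}} V)\, f_1(X_{t_1}) \,\big|\, \mathcal F_s\big]\\
&= E^x[(h f_1)(X_{t_1}) \mid \mathcal F_s].
\end{aligned}$$
By (20.11) this is $E^{X_s}[(h f_1)(X_{t_1 - s})]$. For any $y$,
$$\begin{aligned}
E^y[(h f_1)(X_{t_1-s})] &= E^y\big[(E^{X_{t_1-s}} V)\, f_1(X_{t_1-s})\big]\\
&= E^y\big[E^y[V \circ \theta_{t_1-s} \mid \mathcal F_{t_1-s}]\, f_1(X_{t_1-s})\big]\\
&= E^y\big[(V \circ \theta_{t_1-s})\, f_1(X_{t_1-s})\big].
\end{aligned}$$
If we replace $V$ by its definition, replace $y$ by $X_s$, and use the definition of $\theta_{t_1-s}$, we get the desired equality for $n + 1$ and hence the induction step.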
We now come to the general version of the Markov property. As usual, $\mathcal F_\infty = \bigvee_{t\ge0} \mathcal F_t$.
The expression Y ◦ θt for general Y may seem puzzling at first. We will give some examples
when we get to applications of the strong Markov property in Chapter 21.
Theorem 20.6 Let $(X_t, P^x)$ be a Markov process and suppose (20.11) holds. Suppose $Y$ is bounded and measurable with respect to $\mathcal F_\infty$. Then
$$E^x[Y \circ \theta_s \mid \mathcal F_s] = E^{X_s} Y, \qquad P^x\text{-a.s.} \tag{20.13}$$
Proof If in Proposition 20.5 we take $f_j = 1_{A_j}$ for Borel measurable sets $A_j$, we have
$$E^x[1_B \circ \theta_s \mid \mathcal F_s] = E^{X_s} 1_B \tag{20.14}$$
when $B = \{\omega : \omega(t_1) \in A_1, \ldots, \omega(t_n) \in A_n\}$. It is easy to see that the set of $B$'s for which (20.14) holds is a monotone class. By an argument using the monotone class theorem, (20.14) holds for all $B$ that are measurable with respect to $\mathcal F_\infty$. Taking linear combinations, (20.13) holds for $Y$'s that are simple random variables. Using monotone convergence, (20.13) holds for non-negative $Y$'s, and then by linearity for bounded $Y$'s.
Proposition 20.7 Let $(X_t, P^x)$ be a Markov process with respect to $\{\mathcal F_t\}$. Let $\mathcal F^0_t$ and $\mathcal F_t$ be defined by (20.2) and (20.3). Then $\mathcal F_t = \mathcal F^0_t$ for each $t \ge 0$.
Proof Let $Y_1 = \prod_{i=1}^n f_i(X_{t_i})$ and $Y_2 = \prod_{j=1}^m g_j(X_{u_j})$, where $t_1 < \cdots < t_n \le s$ and $0 \le u_1 < \cdots < u_m$, and the $f_i$ and $g_j$ are bounded Borel measurable functions. Then by Proposition 20.5,
$$E^x[Y_1 (Y_2 \circ \theta_s) \mid \mathcal F_s] = Y_1 E^{X_s} Y_2.$$
Since $E^{X_s} Y_2$ is a function of $X_s$, the product $Y_1 E^{X_s} Y_2$ is $\mathcal F^0_s$ measurable. Using a monotone class argument, we conclude that if $Y$ is bounded and $\mathcal F_\infty$ measurable, then $E^x[Y \mid \mathcal F_s]$ is $\mathcal F^0_s$ measurable. Now apply this to $Y = 1_A$ for $A \in \mathcal F_s$ to obtain that $1_A = E^x[1_A \mid \mathcal F_s]$ is $\mathcal F^0_s$ measurable.

The following is known as the Blumenthal 0–1 law.
Proposition 20.8 Let $(X_t, P^x)$ be a Markov process with respect to $\{\mathcal F_t\}$. If $A \in \mathcal F_0$, then for each $x$, $P^x(A)$ is equal to 0 or 1.
Proof Suppose $A \in \mathcal F_0$. Under $P^x$, $X_0 = x$, a.s., and then
$$P^x(A) = E^{X_0} 1_A = E^x[1_A \circ \theta_0 \mid \mathcal F_0] = 1_A \circ \theta_0 = 1_A \in \{0, 1\}, \qquad P^x\text{-a.s.},$$
since $1_A \circ \theta_0$ is $\mathcal F_0$ measurable. Our result follows because $P^x(A)$ is a real number and not random.

20.3 Strong Markov property
Given a stopping time $T$, recall that the σ-field of events known up to time $T$ is defined to be
$$\mathcal F_T = \{A \in \mathcal F_\infty : A \cap (T \le t) \in \mathcal F_t \text{ for all } t > 0\}.$$
We define $\theta_T$ by $\theta_T(\omega)(t) = \omega(T(\omega) + t)$. Thus, for example, $X_t \circ \theta_T(\omega) = X_{T(\omega)+t}(\omega)$ and $X_T(\omega) = X_{T(\omega)}(\omega)$.
Now we can state the strong Markov property. The notation and definition are admittedly
a bit opaque at this stage – be patient until we reach the examples in the next chapter.
Theorem 20.9 Suppose $(X_t, P^x)$ is a Markov process with respect to $\{\mathcal F_t\}$, that Assumption 20.1 holds, and that $T$ is a finite stopping time. If $Y$ is bounded and measurable with respect to $\mathcal F_\infty$, then
$$E^x[Y \circ \theta_T \mid \mathcal F_T] = E^{X_T} Y, \qquad P^x\text{-a.s.}$$
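The proof below approximates $T$ from above by $T_n$, which equals $(k+1)/2^n$ on $\{k/2^n \le T < (k+1)/2^n\}$. This small sketch uses exact rational arithmetic and a few sample values of $T$ chosen for illustration; the helper name `dyadic_upper` is ours. It shows the facts the argument uses:

- $T_n$ takes only the values $k/2^n$;
- $T < T_n \le T + 2^{-n}$;
- $T_n$ decreases to $T$ as $n$ increases.

```python
import math
from fractions import Fraction

def dyadic_upper(T: Fraction, n: int) -> Fraction:
    """T_n = (k + 1) / 2^n on the event k / 2^n <= T < (k + 1) / 2^n."""
    k = math.floor(T * 2**n)
    return Fraction(k + 1, 2**n)

for T in [Fraction(0), Fraction(1, 3), Fraction(5, 8)]:
    approx = [dyadic_upper(T, n) for n in range(1, 8)]
    print(T, [str(a) for a in approx])
```

Note that $T_n > T$ even when $T$ is itself dyadic. This strict inequality is why the sum in the proof starts at $k = 1$.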
Proof Following the proofs of Section 20.2, it is enough to prove
$$E^x[f(X_{T+t}) \mid \mathcal F_T] = E^{X_T} f(X_t) \tag{20.15}$$
for $f$ bounded. We can obtain this by a limit argument if we have (20.15) for $f$ bounded and continuous. Define $T_n$ to be equal to $(k+1)/2^n$ on the event $(k/2^n \le T < (k+1)/2^n)$. If $A \in \mathcal F_T$, then $A \in \mathcal F_{T_n}$. Therefore $A \cap (T_n = k/2^n) \in \mathcal F_{k/2^n}$, and we have by the Markov property, Theorem 20.6,
$$\begin{aligned}
E^x[f(X_{T_n+t});\ A,\ T_n = k/2^n] &= E^x[f(X_{t+k/2^n});\ A,\ T_n = k/2^n]\\
&= E^x[E^{X_{k/2^n}} f(X_t);\ A,\ T_n = k/2^n]\\
&= E^x[E^{X_{T_n}} f(X_t);\ A,\ T_n = k/2^n].
\end{aligned}$$
Then
$$\begin{aligned}
E^x[f(X_{T_n+t});\ A] &= \sum_{k=1}^\infty E^x[f(X_{T_n+t});\ A,\ T_n = k/2^n]\\
&= \sum_{k=1}^\infty E^x\big[E^{X_{T_n}} f(X_t);\ A,\ T_n = k/2^n\big] = E^x[E^{X_{T_n}} f(X_t);\ A].
\end{aligned}$$
Now let $n \to \infty$. Then $E^x[f(X_{T_n+t});\ A] \to E^x[f(X_{T+t});\ A]$ by dominated convergence, the continuity of $f$, and the right continuity of $X_t$. On the other hand, using the continuity of $P_t f$,
$$E^{X_{T_n}} f(X_t) = P_t f(X_{T_n}) \to P_t f(X_T) = E^{X_T} f(X_t).$$
Therefore $E^x[f(X_{T+t});\ A] = E^x[E^{X_T} f(X_t);\ A]$ for all $A \in \mathcal F_T$, and hence (20.15) holds.

Recall that we are restricting our attention to Markov processes whose paths are right continuous with left limits. If we have a Markov process $(X_t, P^x)$ whose paths are right continuous with left limits, which has shift operators $\{\theta_t\}$, and which satisfies the conclusion of Theorem 20.9, whether or not Assumption 20.1 holds, then we say that $(X_t, P^x)$ is a strong Markov process. A strong Markov process is said to be quasi-left continuous if $X_{T_n} \to X_T$, a.s., on $\{T < \infty\}$ whenever $T_n$ are stopping times increasing up to $T$. Unlike in the definition of predictable stopping times given in Chapter 16, we are not requiring the $T_n$ to be strictly less than $T$. A Hunt process is a strong Markov process that is quasi-left continuous. Quasi-left continuity does not imply left continuity; consider the Poisson process.
Proposition 20.10 If $(X_t, P^x)$ is a strong Markov process and Assumption 20.1 holds, then $X_t$ is quasi-left continuous.
Proof First suppose $T$ is bounded, $T_n$ increases to $T$, $Y = \lim_{n\to\infty} X_{T_n}$, and $f$ and $g$ are bounded and continuous.
If $T_n = T$ for some $n$, then $\lim_{n\to\infty} g(X_{T_n+t}) = g(X_{T+t})$, and if $T_n < T$ for all $n$, then $\lim_{n\to\infty} g(X_{T_n+t}) = g(X_{(T+t)-})$, where $X_{s-}$ is the left-hand limit at time $s$. In either case,
$$\lim_{t\to0}\lim_{n\to\infty} g(X_{T_n+t}) = g(X_T).$$
Then
$$\begin{aligned}
E^x[f(Y)g(X_T)] &= \lim_{t\to0}\lim_{n\to\infty} E^x[f(X_{T_n})\, g(X_{T_n+t})]\\
&= \lim_{t\to0}\lim_{n\to\infty} E^x[f(X_{T_n})\, P_t g(X_{T_n})]\\
&= \lim_{t\to0} E^x[f(Y)\, P_t g(Y)] = E^x[f(Y)g(Y)].
\end{aligned}$$
By a limit argument we have
$$E^x[h(Y, X_T)] = E^x[h(Y, Y)] \tag{20.16}$$
for all bounded measurable functions $h$ on $S \times S$. Now take $h(x,y)$ to be zero if $x = y$ and one otherwise. The right-hand side of (20.16) is 0, so the left-hand side is also; that is, $Y = X_T$, $P^x$-a.s.
If $T$ is not bounded, apply the argument in the preceding paragraph to the stopping time $T \wedge M$, where $M$ is a positive real, and then let $M \to \infty$.

Exercises
20.1 Suppose that $S$ is a locally compact separable metric space and $C_0$ is the set of continuous functions on $S$ that vanish at infinity. To say a continuous function $f$ vanishes at infinity means that given $\varepsilon > 0$ there exists a compact set $K$ such that $|f(x)| < \varepsilon$ if $x \notin K$. Show that if Assumption 20.1 is replaced by the assumptions that $P_t f \in C_0$ whenever $f \in C_0$ and $P_t f \to f$ uniformly as $t \to 0$ whenever $f \in C_0$, then the conclusion of Theorem 20.9 still holds.
20.2 Suppose $(X_t, P^x)$ is a Markov process with respect to a filtration $\{\mathcal F_t\}$. Suppose that $\mathcal E_t \subset \mathcal F_t$ for each $t$ and that $X_t$ is $\mathcal E_t$ measurable for each $t$. Show that $(X_t, P^x)$ is a Markov process with respect to the filtration $\{\mathcal E_t\}$.
20.3 Give an example of a Markov process that is not a strong Markov process. Hint: Let the state space be $[0,\infty)$ and starting from $x \in (0,\infty)$, let $X$ move deterministically at constant speed to the right. Starting at 0, let $X$ wait an exponential length of time, and then begin moving at constant speed to the right.
20.4 Let $(X_t, P^x)$ be Brownian motion and let $\{\mathcal F_t\}$ be the minimal augmented filtration. Suppose $B \in \bigvee_{t\ge0} \mathcal F_t$ and for some $s > 0$ is of the form $1_B = 1_A \circ \theta_s$. Show that if $B$ is a $P^x$-null set for some $x$, then it is a $P^x$-null set for every $x$.
20.5 Let Pt be transition probabilities for a Poisson process with parameter λ. These are defined in
Exercise 19.3. Show that Assumption 20.1 holds.
20.6 Suppose $(X_t, P^x)$ is a Markov process with transition probabilities $P_t$, $f$ is a bounded Borel measurable function, $t_0 > 0$, and we define $M_t = P_{t_0 - t} f(X_t)$ for $t \le t_0$. Show that $(M_t, t \le t_0)$ is a $P^x$-martingale for each $x$.
20.7 Use the Blumenthal 0–1 law to show that if W is a one-dimensional Brownian motion and
T = inf{t > 0 : Wt > 0} is the first time Brownian motion hits (0,∞), then P(T = 0) = 1.
20.8 Let A be a Borel subset of a metric space S. Let TA = inf{t : Xt ∈ A}, where (Xt , Px) is a strong
Markov process. Show that Px(TA = 0) is either 0 or 1 for each x.
20.9 Let (Xt , Px) be a strong Markov process and let A be a Borel subset of S. We define Ar by setting
Ar = {x : Px(TA = 0) = 1}, where TA is the first hitting time of A. Thus Ar is the set of points
that are regular for A. Prove that for each x,
Px(XTA ∈ A ∪ Ar) = 1.

21
Applications of the Markov properties
We give some applications of the Markov property and the strong Markov property. In the
first application, we show that d-dimensional Brownian motion is transient if d ≥ 3. Next
we consider estimates on additive functionals. (An example of an additive functional is $A_t = \int_0^t f(X_s)\,ds$, where $f$ is a non-negative function on the state space of the Markov process $X$.) Third is a sufficient criterion for a Markov process to have continuous paths.
Finally, we discuss harmonic functions and show how to solve the classical Dirichlet problem
of analysis and partial differential equations.
21.1 Recurrence and transience
Let $W_t = (W_1(t), \ldots, W_d(t))$ be a $d$-dimensional Brownian motion started at 0 with $d \ge 3$ and let $W^x_t = x + W_t$ be Brownian motion started at $x$. Let $h(y) = |y|^{2-d}$. A direct calculation of derivatives shows that
$$\Delta h(x) = \sum_{i=1}^d \frac{\partial^2 h}{\partial x_i^2}(x) = 0, \qquad x \ne 0.$$
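As a quick numerical sanity check of this calculation, the sketch below approximates the Laplacian of $h(y) = |y|^{2-d}$ by central second differences. It evaluates at a few points away from the origin, for $d = 3, 4, 5$. The step size, the test points and the helper names are arbitrary choices for illustration. The output should be zero up to discretization and rounding error. For contrast, the non-harmonic function $|y|^2$ gives $2d$.

```python
import math

def h(y: list[float]) -> float:
    """h(y) = |y|^(2 - d), where d = len(y)."""
    d = len(y)
    return math.sqrt(sum(c * c for c in y)) ** (2 - d)

def laplacian(fn, y: list[float], step: float = 1e-3) -> float:
    """Central-difference approximation of sum_i d^2 fn / dy_i^2 at y."""
    total = 0.0
    for i in range(len(y)):
        up = y.copy()
        up[i] += step
        dn = y.copy()
        dn[i] -= step
        total += (fn(up) - 2.0 * fn(y) + fn(dn)) / step**2
    return total

for y in ([1.0, 0.5, -0.3], [0.7, 0.2, 0.4, -0.9], [1.2, -0.4, 0.3, 0.8, 0.5]):
    print(len(y), laplacian(h, y))  # close to 0 in each dimension
```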
The identity
$$\frac{\partial}{\partial y_i}|y| = \frac{\partial}{\partial y_i}(y_1^2 + \cdots + y_d^2)^{1/2} = \frac{y_i}{|y|}$$
helps with this calculation. By Exercise 9.4, $\langle W_i, W_j\rangle_t$ equals 0 if $i \ne j$, and we saw in Section 9.3 that it equals $t$ if $i = j$. Suppose $r < |x| < R$, and let $S = \inf\{t : |W^x_t| \le r \text{ or } |W^x_t| \ge R\}$. $S$ is finite, a.s., because $|W^x_t| \ge |W_1(t)| - |x|$ and $W_1(t)$ exits $[-2R, 2R]$ in finite time by Theorem 7.2. By Itô's formula,
$$h(W^x_{t\wedge S}) = h(W^x_0) + \text{martingale} + \frac12 \int_0^{t\wedge S} \sum_{i=1}^d \frac{\partial^2 h}{\partial x_i^2}(W^x_s)\,ds = h(x) + \text{martingale}.$$
Therefore $h(W^x_{t\wedge S}) - h(x)$ is a martingale started at 0. The function $h$ is equal to $r^{2-d}$ on $\partial B(0,r)$, the boundary of $B(0,r)$, and equal to $R^{2-d}$ on $\partial B(0,R)$, the boundary of $B(0,R)$. By Corollary 3.17, we deduce
$$\begin{aligned}
P(W^x_t \text{ hits } \partial B(0,r) \text{ before } \partial B(0,R)) &= P\big(h(W^x_t) - h(x) \text{ hits } r^{2-d} - |x|^{2-d} \text{ before } R^{2-d} - |x|^{2-d}\big)\\
&= \frac{|x|^{2-d} - R^{2-d}}{r^{2-d} - R^{2-d}}.
\end{aligned}$$
If we let $R \to \infty$ and recall that $2 - d < 0$, we see that
$$P(W^x_t \text{ ever hits } \partial B(0,r)) = \Big(\frac{r}{|x|}\Big)^{d-2}. \tag{21.1}$$
We want to use the strong Markov property to go from (21.1) to $\lim_{t\to\infty} |W^x_t| = \infty$. (There are other ways besides the strong Markov property of showing this.) The first step in doing this is to convert to the Markov process notation. Let $(X_t, P^x)$ be a Brownian motion. What we have shown is that
$$P^x(X_t \text{ ever hits } \partial B(0,r)) = \Big(\frac{r}{|x|}\Big)^{d-2}. \tag{21.2}$$
Let $M > 0$ and let
S1 = inf{t : |Xt | ≥ 2M},
T1 = inf{t > S1 : |Xt | ≤ M},
S2 = inf{t > T1 : |Xt | ≥ 2M},
T2 = inf{t > S2 : |Xt | ≤ M},
and so on. Another way of writing this is to define
S = inf{t > 0 : |Xt | ≥ 2M}, T = inf{t > 0 : |Xt | ≤ M},
and then to let S1 = S, and for each i ≥ 1,
$$T_i = S_i + T \circ \theta_{S_i}, \qquad S_{i+1} = T_i + S \circ \theta_{T_i}.$$
Let us explain what is going on. Given a path $\omega$, which is a continuous function from $[0,\infty)$ to $\mathbb R^d$, $T \circ \theta_{S_i}$ means to proceed along the path until time $S_i$, disregard this piece, and then see how long it takes after time $S_i$ to first enter $B(0,M)$. If we add the quantity $S_i$ to $T \circ \theta_{S_i}$, we then get the time at which $X_t$ first enters $B(0,M)$ after time $S_i$. Thus $T_i$ with the shift notation is the same as $\inf\{t > S_i : X_t \in B(0,M)\}$. The shift notation interpretation of $S_{i+1}$ is similar.
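The recursion $T_i = S_i + T \circ \theta_{S_i}$, $S_{i+1} = T_i + S \circ \theta_{T_i}$ can be run directly on a discretized path. This is only an illustration under our own assumptions:

- a path is a made-up list of values of $|X|$ at the integer times $0, 1, 2, \ldots$;
- dropping the first $s$ entries of the list plays the role of $\theta_s$;
- `first_time` returns `None` for "never", the analogue of $+\infty$;
- all helper names are hypothetical.

The check is that the shift recursion produces the same $T_i$ as the direct description $T_i = \inf\{t > S_i : |X_t| \le M\}$.

```python
from typing import Callable, Optional

def first_time(path: list[float], hit: Callable[[float], bool]) -> Optional[int]:
    """inf{k > 0 : hit(path[k])}, or None (playing the role of +infinity)."""
    return next((k for k in range(1, len(path)) if hit(path[k])), None)

def shift(path: list[float], s: int) -> list[float]:
    """Discrete shift operator: theta_s(path)[k] = path[s + k]."""
    return path[s:]

def crossing_times(path: list[float], M: float) -> list[tuple[int, int]]:
    """Pairs (S_i, T_i) from T_i = S_i + T o theta_{S_i}, S_{i+1} = T_i + S o theta_{T_i}."""
    def S(p: list[float]) -> Optional[int]:
        return first_time(p, lambda r: r >= 2 * M)  # reach |x| >= 2M
    def T(p: list[float]) -> Optional[int]:
        return first_time(p, lambda r: r <= M)      # reach |x| <= M
    pairs, s = [], S(path)
    while s is not None:
        dt = T(shift(path, s))
        if dt is None:
            break
        t = s + dt
        pairs.append((s, t))
        ds = S(shift(path, t))
        s = None if ds is None else t + ds
    return pairs

# |X| along a made-up path, sampled at integer times.
radii = [0.5, 1.5, 2.2, 1.7, 0.9, 1.4, 2.5, 3.0, 0.8, 2.1, 2.6, 3.3, 4.0]
print(crossing_times(radii, M=1.0))  # [(2, 4), (6, 8)]
```

In this example the path leaves for good after time 9, so the final $T_i$ is "infinite". That is the behaviour the argument below shows happens almost surely in $d \ge 3$.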
Now we can apply the strong Markov property. Since $T_{i+1} = S_{i+1} + T \circ \theta_{S_{i+1}}$, we can write
$$\begin{aligned}
P^x(T_{i+1} < \infty) &= P^x(S_{i+1} < \infty,\ T \circ \theta_{S_{i+1}} < \infty)\\
&= E^x\big[P^x(T \circ \theta_{S_{i+1}} < \infty \mid \mathcal F_{S_{i+1}});\ S_{i+1} < \infty\big]\\
&= E^x\big[P^{X_{S_{i+1}}}(T < \infty);\ S_{i+1} < \infty\big].
\end{aligned}$$
At time $S_{i+1}$, we have $|X_{S_{i+1}}| = 2M$, and by (21.1), $P^{X_{S_{i+1}}}(T < \infty) = (\tfrac12)^{d-2}$. Therefore
$$P^x(T_{i+1} < \infty) \le 2^{2-d} P^x(S_{i+1} < \infty) \le 2^{2-d} P^x(T_i < \infty).$$
The last inequality is simply the fact that $S_{i+1} \ge T_i$. Since $P^x(T_1 < \infty) \le 1$, induction tells us that
$$P^x(T_i < \infty) \le 2^{(2-d)(i-1)} \to 0$$
as $i \to \infty$. Hence $P^x(T_i < \infty \text{ for all } i) = 0$. Since $T_i$ increases as $i$ increases, for almost all $\omega$, $T_i$ will be infinite for $i$ sufficiently large (how large will depend on $\omega$). Hence $X_t$ returns to $B(0,M)$ for a last time, a.s. Since $M$ is arbitrary, this proves that $|X_t|$ tends to $\infty$ as $t \to \infty$. We have thus proved

Proposition 21.1 If $(X_t, P^x)$ is a $d$-dimensional Brownian motion and $d \ge 3$, then $|X_t| \to \infty$ as $t \to \infty$ with $P^x$-probability one for each $x$.

21.2 Additive functionals
Let $D$ be a closed subset of $S$, let $f : D \to [0,\infty)$, let $S = \tau_D$, and let
$$A = \sup_{x\in D} E^x \int_0^S f(X_s)\,ds,$$
where $\tau_D = \inf\{t > 0 : X_t \notin D\}$ is the first time $X$ exits $D$.
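To see how conservative the bound in Proposition 21.2 below is, we take a concrete case of our own choosing:

- one-dimensional Brownian motion;
- $D = [-1, 1]$;
- $f \equiv 1$, so that $\int_0^S f(X_s)\,ds = \tau_D$.

We use, without proof here, the standard fact $E^x \tau_D = 1 - x^2$ for this interval. Hence $A = 1$, and the proposition predicts $\sup_x P^x(\tau_D \ge 2k) \le 2^{-k}$.

The sketch simulates $\tau_D$ from $x = 0$ with a simple Euler random walk. The step `dt`, the seed and the sample size are arbitrary. The discretization slightly overestimates exit times, so only rough agreement of the mean with 1 should be expected.

```python
import math
import random

def exit_time(rng: random.Random, x: float = 0.0, dt: float = 1e-2) -> float:
    """Euler approximation of tau_D = inf{t > 0 : X_t not in [-1, 1]} for 1-d BM."""
    t, sd = 0.0, math.sqrt(dt)
    while -1.0 <= x <= 1.0:
        x += rng.gauss(0.0, sd)
        t += dt
    return t

rng = random.Random(12345)
samples = [exit_time(rng) for _ in range(4000)]
A = 1.0  # sup_x E^x tau_D = sup_x (1 - x^2) for D = [-1, 1]
print("mean exit time from 0:", sum(samples) / len(samples))
for k in range(1, 4):
    tail = sum(s >= 2 * k * A for s in samples) / len(samples)
    print(k, tail, "<=", 2.0 ** -k)
```

In this example the true tails decay much faster than $2^{-k}$. The point of (21.3) is that it needs nothing beyond the uniform bound on the expectation.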
Proposition 21.2 If $A < \infty$, then
$$\sup_{x\in D} P^x\Big(\int_0^S f(X_s)\,ds \ge 2kA\Big) \le 2^{-k}. \tag{21.3}$$
This is rather remarkable: as soon as one gets a bound on the expectation, although it must be uniform in $x$, one gets exponential tails for the distribution. A use of Chebyshev's inequality would only give the bound $(2k)^{-1}$.
Proof Let $B_t = \int_0^{t\wedge S} f(X_s)\,ds$. This is a special case of what is known as an additive functional; see Section 22.3. Let $U_1 = \inf\{t : B_t \ge 2A\}$, and let $U_{i+1} = U_i + U_1 \circ \theta_{U_i}$. To explain this formula, composing $\omega$ with $\theta_{U_i}$ means we disregard the path before time $U_i$. Thus $U_1 \circ \theta_{U_i}$ is the length of time after time $U_i$ until $B_t$ has increased an amount $2A$ over its value at $U_i$. Therefore $U_i + U_1 \circ \theta_{U_i}$ is the $(i+1)$st time $B$ has increased by $2A$. The probability $P^x(B_S \ge 2kA)$ is bounded by
$$\begin{aligned}
P^x(U_k \le S) &= P^x(U_{k-1} \le S,\ U_1 \circ \theta_{U_{k-1}} \le S \circ \theta_{U_{k-1}})\\
&= E^x\big[P^x(U_1 \circ \theta_{U_{k-1}} \le S \circ \theta_{U_{k-1}} \mid \mathcal F_{U_{k-1}});\ U_{k-1} \le S\big]\\
&= E^x\big[P^{X_{U_{k-1}}}(U_1 \le S);\ U_{k-1} \le S\big].
\end{aligned}$$
If $U_{k-1} \le S$, then $X_{U_{k-1}} \in D$. If $y \in D$,
$$P^y(U_1 \le S) \le P^y\Big(\int_0^S f(X_s)\,ds \ge 2A\Big) \le \frac{E^y \int_0^S f(X_s)\,ds}{2A} \le \frac12$$
by Chebyshev's inequality. Then $P^x(U_k \le S) \le \frac12 P^x(U_{k-1} \le S)$, and (21.3) follows by induction.

We give another proof of Proposition 4.5.
Proposition 21.3 Let $W$ be a one-dimensional Brownian motion. If $T$ is a finite stopping time and $a < b$, then
$$P(W_{T+t} \in [a,b] \mid \mathcal F_T) \le \frac{b-a}{\sqrt{2\pi t}}, \qquad \text{a.s.}$$
Proof Let $(X_t, P^x)$ be a one-dimensional Brownian motion. If $y \in \mathbb R$, then
$$P^y(X_t \in [a,b]) = P^0(X_t \in [a-y, b-y]) = \frac{1}{\sqrt{2\pi t}} \int_{a-y}^{b-y} e^{-z^2/2t}\,dz \le \frac{b-a}{\sqrt{2\pi t}}. \tag{21.4}$$
By the strong Markov property,
$$P(W_{T+t} \in [a,b] \mid \mathcal F_T) = P^0(X_{T+t} \in [a,b] \mid \mathcal F_T) = E^0[1_{[a,b]}(X_t) \circ \theta_T \mid \mathcal F_T] = E^{X_T}[1_{[a,b]}(X_t)] = P^{X_T}(X_t \in [a,b]).$$
Now use (21.4) with $y$ replaced by $X_T$.

21.3 Continuity
Let us now come up with a criterion for a Markov process to have continuous paths. We assume we have a strong Markov process $(X_t, P^x)$ whose paths are right continuous with left limits. Let $d(\cdot,\cdot)$ be the metric for the state space $S$.
Lemma 21.4 Let $(X_t, P^x)$ be a strong Markov process with state space $S$. For all $x \in S$ and all $\lambda \ge 0$,
$$P^x\Big(\sup_{s\le t} d(X_s, x) \ge \lambda\Big) \le 2 \sup_{s\le t}\sup_{y\in S} P^y(d(X_s, X_0) \ge \lambda/2).$$
Note that the left-hand side has the supremum inside the probability while the right-hand side has the suprema outside.
Proof Let us use the notation
$$F(t,\lambda) = \sup_{s\le t}\sup_{y\in S} P^y(d(X_s, X_0) \ge \lambda). \tag{21.5}$$
Write $S = \inf\{t : d(X_t, X_0) \ge \lambda\}$. Then by the strong Markov property,
$$\begin{aligned}
P^x\Big(\sup_{s\le t} d(X_s, x) \ge \lambda\Big) &\le P^x(d(X_t, x) \ge \lambda/2) + P^x(S < t,\ d(X_t, X_0) \le \lambda/2)\\
&\le F(t, \lambda/2) + E^x\big[P^{X_S}(d(X_{t-S}, X_0) \ge \lambda/2)\big]\\
&\le 2F(t, \lambda/2);
\end{aligned} \tag{21.6}$$
see Exercise 21.2.
Proposition 21.5 Let $(X_t, P^x)$ be a strong Markov process. With $F(t,\lambda)$ defined as in (21.5), suppose
$$\frac{F(t,\lambda)}{t} \to 0 \tag{21.7}$$
as $t \to 0$ for each $\lambda > 0$. Then $X_t$ has continuous paths with $P^x$-probability one for each $x$.
For $X$ a Brownian motion, $F(t,\lambda) \le 2e^{-\lambda^2/8t}$ by Proposition 3.15, and hence $F(t,\lambda)/t \to 0$
as $t \to 0$. Thus Brownian motion satisfies (21.7). On the other hand, (21.7) is not satisfied
for the Poisson process; see Exercise 21.3.
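For one-dimensional Brownian motion the suprema in (21.5) can be computed explicitly. By translation invariance, $\sup_y$ can be dropped. Since $P(|W_s| \ge \lambda)$ increases in $s$, we get $F(t,\lambda) = P(|W_t| \ge \lambda) = \operatorname{erfc}(\lambda/\sqrt{2t})$.

The sketch is restricted to $d = 1$, with $\lambda$ and the grid of $t$ values chosen by us. It evaluates $F(t,\lambda)/t$ for shrinking $t$ and checks $F$ against the bound $2e^{-\lambda^2/8t}$ quoted above.

```python
import math

def F_bm(t: float, lam: float) -> float:
    """F(t, lam) for 1-d Brownian motion: P(|W_t| >= lam) = erfc(lam / sqrt(2 t))."""
    return math.erfc(lam / math.sqrt(2.0 * t))

lam = 0.5
for t in [0.1, 0.05, 0.02, 0.01, 0.005]:
    F = F_bm(t, lam)
    bound = 2.0 * math.exp(-lam**2 / (8.0 * t))
    print(f"t={t:<6} F/t={F / t:.3e}  F<=bound: {F <= bound}")
```

The ratio collapses superexponentially fast as $t \to 0$, which is much more than (21.7) requires.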
Proof Suppose $\lambda, t_0 > 0$ and $X$ has a jump of size at least $4\lambda$ at some time before $t_0$ with positive probability, that is,
$$P^x\Big(\sup_{t\le t_0} d(X_{t-}, X_t) \ge 4\lambda\Big) > 0,$$
where Xt− = lims↑t,s 0 : Xt ∈ A}. Thus a point x is regular for a set A if starting at x the Brownian
motion enters A immediately. For example, a consequence of Theorem 7.2 is that the point
0 is regular for the set A = (0, ∞) when we have a one-dimensional Brownian motion.
Theorem 21.7 Suppose $D$ is a bounded open domain in $\mathbb R^d$ and $f$ is a function on $\partial D$ that is continuous on $\partial D$. Let $(X_t, P^x)$ be a $d$-dimensional Brownian motion and $\tau_D = \inf\{t : X_t \in D^c\}$. If each point of $\partial D$ is regular for $D^c$, then $h(x) = E^x f(X_{\tau_D})$ is a solution to the Dirichlet problem.
The regularity condition says that starting at any point of $\partial D$, Brownian motion enters $D^c$ immediately. Uniqueness of the solution to the Dirichlet problem is easy, and we do not address this here.
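The representation $h(x) = E^x f(X_{\tau_D})$ suggests a Monte Carlo method. The sketch below uses the walk-on-spheres algorithm, a technique not discussed in the text. By rotational invariance and the strong Markov property, Brownian motion started at the center of a ball first exits it at a uniformly distributed point of the sphere. So one may jump from sphere to sphere until within a small distance `eps` of $\partial D$.

We make the following assumptions for illustration:

- $D$ is the unit disk in $\mathbb R^2$;
- the boundary data are $f(y) = y_1^2 - y_2^2$, the restriction of the harmonic function $x_1^2 - x_2^2$, so the exact answer is known;
- the sample size, seed and helper names are hypothetical.

```python
import math
import random

def walk_on_spheres(x: tuple[float, float], f, rng: random.Random, eps: float = 1e-3) -> float:
    """One sample of f(X_{tau_D}) for planar BM started at x in the unit disk D."""
    px, py = x
    while True:
        r = 1.0 - math.hypot(px, py)  # radius of the largest disk around (px, py) inside D
        if r < eps:
            norm = math.hypot(px, py)  # close enough: project onto the boundary circle
            return f(px / norm, py / norm)
        phi = rng.uniform(0.0, 2.0 * math.pi)  # uniform exit point of that disk
        px += r * math.cos(phi)
        py += r * math.sin(phi)

def dirichlet_estimate(x, f, n: int = 10000, seed: int = 0) -> float:
    """Monte Carlo estimate of h(x) = E^x f(X_{tau_D})."""
    rng = random.Random(seed)
    return sum(walk_on_spheres(x, f, rng) for _ in range(n)) / n

f = lambda y1, y2: y1 * y1 - y2 * y2
x = (0.3, 0.4)
print(dirichlet_estimate(x, f), x[0] ** 2 - x[1] ** 2)  # estimate vs exact value
```

Unlike a time-stepping scheme, no Brownian path is simulated between exit points. Only the strong Markov property at the successive exit times is used.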
Proof We have already seen in Proposition 21.6 and the remarks immediately following
the proof of that proposition that h is harmonic in D. This implies that h is continuous in D.
Thus we only need to show that h agrees with f on ∂D.
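Our first step is to fix $t$ and $\varepsilon$ and to show that the set
$$\{x : P^x(\tau_D \le t) > 1 - \varepsilon\}$$
is an open set. Let $s < t$, define $\varphi_s(x) = P^x(\tau_D \le t - s)$, and let $w_s(x) = P^x(X_u \in D^c \text{ for some } u \in [s,t])$. By the Markov property at time $s$,
$$\begin{aligned}
w_s(x) &= E^x P^{X_s}(X_u \in D^c \text{ for some } u \in [0, t-s]) = E^x[P^{X_s}(\tau_D \le t-s)]\\
&= E^x \varphi_s(X_s) = (2\pi s)^{-d/2} \int \varphi_s(y)\, e^{-|x-y|^2/2s}\,dy.
\end{aligned}$$
By dominated convergence, the last integral is a continuous function of $x$. If $w_0(x) = P^x(X_u \in D^c \text{ for some } u \in [0,t])$, then $w_s(x) \uparrow w_0(x)$ as $s \downarrow 0$, so
$$\{x : w_0(x) > 1-\varepsilon\} = \bigcup_{s\in(0,t)} \{x : w_s(x) > 1-\varepsilon\}$$
is open, being a union of open sets.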
Let $z \in \partial D$. Let $\varepsilon > 0$ and choose $\eta$ such that $|f(w) - f(z)| < \varepsilon$ if $|w - z| < \eta$ and $w \in \partial D$. Pick $t$ small so that $P^0(\sup_{s\le t}|X_s| > \eta/2) < \varepsilon$; this is possible because Brownian motion has continuous paths. Because $z \in \partial D$ and every point of $\partial D$ is regular for $D^c$, $P^z(\tau_D \le t) = 1$. By the first step, $\{w : P^w(\tau_D \le t) > 1-\varepsilon\}$ is an open set containing $z$, so we may finally choose $\delta < (\eta/2) \wedge \varepsilon$ so that if $|w - z| < \delta$ and $w \in D$, then $P^w(\tau_D \le t) > 1 - \varepsilon$.
Now if $|w - z| < \delta$ and $w \in D$, then
$$\begin{aligned}
P^w(|X_{\tau_D} - z| < \eta) &\ge P^w\Big(\tau_D \le t,\ \sup_{s\le t}|X_s - w| \le \eta/2\Big)\\
&\ge P^w(\tau_D \le t) - P^0\Big(\sup_{s\le t}|X_s| > \eta/2\Big)\\
&\ge (1-\varepsilon) - \varepsilon.
\end{aligned}$$
The set $\partial D$ is a bounded and closed subset of $\mathbb R^d$, hence compact, and since $f$ is continuous on $\partial D$, there exists $M$ such that $|f|$ is bounded by $M$. If $|w - z| < \delta$ and $w \in D$,
$$\begin{aligned}
|h(w) - f(z)| &= |E^w f(X_{\tau_D}) - f(z)|\\
&\le \big|E^w[f(X_{\tau_D});\ |X_{\tau_D} - z| < \eta] - f(z)\, P^w(|X_{\tau_D} - z| < \eta)\big| + 2M\, P^w(|X_{\tau_D} - z| \ge \eta)\\
&\le \varepsilon\, P^w(|X_{\tau_D} - z| < \eta) + 4M\varepsilon \le (1 + 4M)\varepsilon.
\end{aligned}$$
We used the fact that $|f(X_{\tau_D}) - f(z)| < \varepsilon$ if $|X_{\tau_D} - z| < \eta$. Since $\varepsilon$ is arbitrary, this proves that $h(w) \to f(z)$ as $w \to z$ inside $D$.
Let us give a sufficient condition for a point to be regular for a domain $D$. Let
$$\tilde V_a = \{(x_1, \ldots, x_d) : x_1 > 0,\ x_2^2 + \cdots + x_d^2 < a^2 x_1^2\}.$$
The vertex of $\tilde V_a$ is the origin. A cone $V$ in $\mathbb R^d$ is a translation and rotation of $\tilde V_a$ for some $a$. The following is known as the Poincaré cone condition.
Proposition 21.8 Suppose there exists a cone $V$ with vertex $y \in \partial D$ such that $V \cap B(y,r) \subset D^c$ for some $r > 0$. Then $y$ is regular for $D^c$.
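Proof By translation and rotation of the coordinates, we may suppose $y = 0$ and $V = \tilde V_a$ for some $a$. Then for each $t$,
$$P^0(\tau_D \le t) \ge P^0(X_t \in D^c) \ge P^0(X_t \in V \cap B(0,r)) \ge P^0(X_t \in V) - P^0(X_t \notin B(0,r)).$$
By scaling, and since $V$ is a cone with vertex 0, the last expression is $P^0(X_1 \in V) - P^0(X_1 \notin B(0, r/\sqrt t))$, which converges to
$$P^0(X_1 \in V) = (2\pi)^{-d/2} \int_V e^{-|z|^2/2}\,dz > 0$$
as $t \to 0$. Observe that $P^0(\tau_D \le t)$ converges to $P^0(\tau_D = 0)$ as $t \to 0$, so $P^0(\tau_D = 0) > 0$. Since $(\tau_D = 0) \in \mathcal F_0$, the Blumenthal 0–1 law (Proposition 20.8) gives $P^0(\tau_D = 0) = 1$.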
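In two dimensions the constant $P^0(X_1 \in V)$ in this proof is explicit. $\tilde V_a$ is the sector $\{x_1 > 0,\ |x_2| < a x_1\}$ of opening angle $2\arctan a$. Since planar Brownian motion is rotationally invariant, $P^0(X_t \in \tilde V_a) = \arctan(a)/\pi$ for every $t > 0$, which is exactly the scaling step used above.

The Monte Carlo sketch below checks this. The choices $d = 2$, $a = 0.5$, the values of $t$, the seed and the helper names are ours.

```python
import math
import random

def in_cone(x1: float, x2: float, a: float) -> bool:
    """Membership in V_a = {x1 > 0, x2^2 < a^2 x1^2} for d = 2."""
    return x1 > 0 and x2 * x2 < a * a * x1 * x1

def cone_prob(a: float, t: float, n: int = 50000, seed: int = 1) -> float:
    """Monte Carlo estimate of P^0(X_t in V_a) for planar Brownian motion."""
    rng = random.Random(seed)
    sd = math.sqrt(t)
    hits = sum(in_cone(rng.gauss(0.0, sd), rng.gauss(0.0, sd), a) for _ in range(n))
    return hits / n

a = 0.5
exact = math.atan(a) / math.pi
for t in [1.0, 0.01, 1e-6]:
    print(t, cone_prob(a, t), exact)  # the estimate does not depend on t
```

Because the same seed is reused, the three estimates coincide. Rescaling a Gaussian sample by $\sqrt t$ never moves it into or out of a cone with vertex at the origin.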
Continue to suppose (Xt, Px) is a d-dimensional Brownian motion and D is a bounded
domain, but now we suppose d ≥ 3. Define
$$U(x,A) = E^x \int_0^\infty 1_A(X_s)\,ds, \qquad x \in D.$$

This is the same as the λ-resolvent of $1_A$ with $\lambda = 0$. We write
$$\begin{aligned}
U(x,A) &= E^x \int_0^\infty 1_A(X_s)\,ds = \int_0^\infty P^x(X_s \in A)\,ds\\
&= \int_0^\infty \int_A \frac{1}{(2\pi s)^{d/2}}\, e^{-|y-x|^2/2s}\,dy\,ds\\
&= \int_A \int_0^\infty \frac{1}{(2\pi s)^{d/2}}\, e^{-|y-x|^2/2s}\,ds\,dy.
\end{aligned}$$
Some calculus shows that the inside integral is equal to $c|x-y|^{2-d}$. If we denote $c|x-y|^{2-d}$ by $u(x,y)$, we then have that
$$U(x,A) = \int_A u(x,y)\,dy. \tag{21.9}$$
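The "some calculus" can be made explicit, and the constant checked numerically. Substituting $u = r^2/2s$ with $r = |x-y|$ turns the inner integral into a Gamma integral. Our own computation, not stated in the text, gives $c = \Gamma(d/2 - 1)/(2\pi^{d/2})$; for example $c = 1/(2\pi)$ when $d = 3$.

The sketch below compares this constant with a trapezoid rule applied in the variable $v = \log s$. The integration limits, step and helper names are ad hoc choices.

```python
import math

def inner_integral(r: float, d: int, lo: float = -40.0, hi: float = 80.0, h: float = 1e-2) -> float:
    """Trapezoid rule for int_0^inf (2 pi s)^(-d/2) exp(-r^2 / 2s) ds after s = e^v."""
    def g(v: float) -> float:
        s = math.exp(v)
        # the factor s is the Jacobian ds = s dv
        return (2.0 * math.pi * s) ** (-d / 2) * math.exp(-r * r / (2.0 * s)) * s
    n = round((hi - lo) / h)
    total = 0.5 * (g(lo) + g(hi)) + sum(g(lo + k * h) for k in range(1, n))
    return total * h

def newton_constant(d: int) -> float:
    """c in u(x, y) = c |x - y|^(2 - d), namely Gamma(d/2 - 1) / (2 pi^(d/2))."""
    return math.gamma(d / 2 - 1) / (2.0 * math.pi ** (d / 2))

for d in (3, 4, 5):
    for r in (0.5, 2.0):
        print(d, r, inner_integral(r, d), newton_constant(d) * r ** (2 - d))
```

The logarithmic substitution is a design choice. It turns the singular behaviour near $s = 0$ and the slow decay as $s \to \infty$ into smooth, exponentially decaying tails, where the plain trapezoid rule is very accurate.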
The expression u(x, y) is called the Newtonian potential density. Note that u(x, y) is a
function only of |x − y|, it blows up as |x − y| → 0, and tends to 0 as |x − y| → ∞.
If $x$ is in the interior of $D$, then $u(x,\cdot)$ will be bounded on $\partial D$. Define $h_x(z) = E^z u(x, X_{\tau_D})$; we saw above that $h_x$ is harmonic. Now define $g_D(x,y) = u(x,y) - h_x(y)$; this function of two variables is called the Green's function or Green function for $D$ with pole at $x$. This is a well-known object in analysis; let us give a probabilistic interpretation. Since $u(x,y)$ is symmetric in $x$ and $y$, if $A \subset D$ we have
$$\begin{aligned}
\int_A g_D(x,y)\,dx &= \int_A u(x,y)\,dx - \int_A E^y u(x, X_{\tau_D})\,dx\\
&= E^y \int_0^\infty 1_A(X_s)\,ds - E^y \int_A u(x, X_{\tau_D})\,dx\\
&= E^y \int_0^\infty 1_A(X_s)\,ds - E^y\Big[E^{X_{\tau_D}} \int_0^\infty 1_A(X_s)\,ds\Big].
\end{aligned} \tag{21.10}$$
Using the strong Markov property and then a change of variables,
$$\begin{aligned}
E^y\Big[E^{X_{\tau_D}} \int_0^\infty 1_A(X_s)\,ds\Big] &= E^y\Big[E^y\Big[\int_0^\infty 1_A(X_s) \circ \theta_{\tau_D}\,ds \,\Big|\, \mathcal F_{\tau_D}\Big]\Big]\\
&= E^y \int_0^\infty 1_A(X_s) \circ \theta_{\tau_D}\,ds\\
&= E^y \int_0^\infty 1_A(X_{\tau_D+s})\,ds = E^y \int_{\tau_D}^\infty 1_A(X_s)\,ds.
\end{aligned}$$
Substituting this in (21.10) we have
$$\int_A g_D(x,y)\,dx = E^y \int_0^{\tau_D} 1_A(X_s)\,ds.$$
For this reason $g_D$ is sometimes called the occupation time density for $D$.
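The occupation-time identity can be tested by simulation. The derivation above is for $d \ge 3$. Purely for a cheap illustration we switch to one-dimensional Brownian motion on $D = (0,1)$. There, as we assume here (a standard fact, not shown in the text), the Green function for $\frac12\, d^2/dx^2$ is $g_D(x,y) = 2\min(x,y)\,(1 - \max(x,y))$. With $y = 1/2$ and $A = [0.4, 0.6]$ this gives $\int_A g_D(x,y)\,dx = 0.09$.

The Euler discretization (step `dt`, our choice) lets the walk exit slightly late, biasing the estimate upward, so only rough agreement should be expected. Helper names are hypothetical.

```python
import math
import random

def green_1d(x: float, y: float) -> float:
    """Green function of (1/2) d^2/dx^2 on D = (0, 1), vanishing on the boundary."""
    return 2.0 * min(x, y) * (1.0 - max(x, y))

def occupation_time(rng: random.Random, y: float, a: float, b: float, dt: float = 2e-3) -> float:
    """Euler approximation of int_0^{tau_D} 1_[a, b](X_s) ds for BM started at y."""
    x, occ, sd = y, 0.0, math.sqrt(dt)
    while 0.0 < x < 1.0:
        if a <= x <= b:
            occ += dt
        x += rng.gauss(0.0, sd)
    return occ

y, a, b = 0.5, 0.4, 0.6
m = 2000  # midpoint rule for int_a^b g_D(x, y) dx
exact = sum(green_1d(a + (k + 0.5) * (b - a) / m, y) for k in range(m)) * (b - a) / m
rng = random.Random(7)
n = 4000
estimate = sum(occupation_time(rng, y, a, b) for _ in range(n)) / n
print(exact, estimate)
```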

Exercises
21.1 Suppose d = 2, (Xt , Px) is a two-dimensional Brownian motion, and r > 0. Imitate the argument
of Proposition 21.1 but with h(x) = log(|x|) to show that Px(Xt hits B(0, r)) = 1 when |x| > r.
Then use the strong Markov property to show that there are times Ti → ∞ with XTi ∈ B(0, r).
That is, two-dimensional Brownian motion is neighborhood recurrent.
21.2 In the proof of Lemma 21.4, justify each inequality in (21.6).
21.3 Let (Xt , Px) be a Poisson process with parameter a and let F be defined by (21.5). Show
F (t, 1/2)/t does not converge to 0 as t → 0.
21.4 Suppose $d \ge 3$, $(X_t, P^x)$ is a $d$-dimensional Brownian motion, and
$$Uf(x) = E^x \int_0^\infty f(X_s)\,ds.$$
Show that if $f$ is bounded and measurable with compact support, then $Uf$ is continuous and $|Uf(x)| \to 0$ as $|x| \to \infty$. Show that if $f \in C^2$ with compact support, then $Uf$ is $C^2$. Show that $\frac12 \Delta Uf = -f$.
21.5 Let Wt be a Brownian motion and f a continuous function. Prove that if f (Wt ) is a submartingale,
then f must be convex.
21.6 Prove the maximum principle for harmonic functions. This says that if $h$ is harmonic in a bounded domain $D$, then
$$\sup_{x\in D}|h(x)| \le \sup_{x\in\partial D}|h(x)|.$$
21.7 If W is a d-dimensional Brownian motion started at 0, find E T , where T is the first time W exits
the ball of radius r centered at the origin.
Hint: Use the fact that $|W_t|^2 - dt$ is a martingale.
21.8 Let $f : \mathbb R \to \mathbb R$ be a bounded function with $|f(x) - f(y)| \le |x - y|$ for all $x, y \in \mathbb R$. Let $D_\varepsilon = \{(x,y) \in \mathbb R^2 : f(x) < y < f(x) + \varepsilon\}$ for $\varepsilon \in (0,1)$. Let $(X_t, P^x)$ be a two-dimensional Brownian motion and let $\tau_\varepsilon = \inf\{t : X_t \notin D_\varepsilon\}$. Prove that there exists a constant $c$ not depending on $\varepsilon$ such that $E^0 \tau_\varepsilon \le c\varepsilon^2$.
Hint: By Exercise 21.7 the expected time for two-dimensional Brownian motion to leave a ball of radius $2\varepsilon$ is less than $c\varepsilon^2$. Then use the strong Markov property repeatedly at the times $S_i$, where $S_i$ is the first time after time $S_{i-1}$ that Brownian motion has moved at least $2\varepsilon$ from $X_{S_{i-1}}$.

22
Transformations of Markov processes
There are a number of interesting transformations that make new Markov processes out of old. We will look at four: killing, conditioning, changing time, and stopping at a last exit time. These are only a few of the possible transformations.
22.1 Killed processes
One sometimes wants to consider a Markov process up until a stopping time $\zeta$, called the lifetime of the process. We affix to our state space $S$ an isolated point $\Delta$, called the cemetery state, and the topology on $S_\Delta = S \cup \{\Delta\}$ is the one generated by the collection of open sets of $S$ together with the set $\{\Delta\}$. We define the killed process $\hat X$ by
$$\hat X_t = \begin{cases} X_t, & t < \zeta,\\ \Delta, & t \ge \zeta, \end{cases} \tag{22.1}$$
and we say we kill the process $X$ at time $\zeta$. Every function $f$ on $S$ is defined to be 0 at $\Delta$.
One example of this situation would be to let $\zeta = \tau_D$, where $D$ is a subset of $S$ and $\tau_D = \inf\{t > 0 : X_t \notin D\}$, the first exit from the set $D$. Another common occurrence is to let $\zeta = S$, where $S$ is a random variable with an exponential distribution with parameter $\lambda$, i.e., $P(S > t) = e^{-\lambda t}$, such that $S$ is independent of $X$. A third possibility would be to let $\zeta = \inf\{t : \int_0^t f(X_s)\,ds \ge 1\}$, where $f$ is a non-negative function. The crucial property of $\zeta$ is that it be a terminal time:
$$\zeta = s + \zeta \circ \theta_s \quad \text{if } s < \zeta. \tag{22.2}$$
Proposition 22.1 If $(X_t, P^x)$ is a strong Markov process and (22.2) holds, then $(\hat X_t, P^x)$ satisfies the Markov and strong Markov properties.
Proof As in Section 20.2, we need to show
$$E^x[f(\hat X_t) \circ \theta_T \mid \mathcal F_T] = E^{\hat X_T} f(\hat X_t), \qquad P^x\text{-a.s.}$$
If $A \in \mathcal F_T$,
$$E^x[f(\hat X_t) \circ \theta_T;\ A] = E^x[f(X_{t+T});\ A,\ T + t < \zeta].$$
On the other hand,
$$\begin{aligned}
E^{\hat X_T} f(\hat X_t) &= E^{X_T}[f(X_t);\ t < \zeta]\, 1_{(T<\zeta)}\\
&= E^x[f(X_t) \circ \theta_T;\ t \circ \theta_T < \zeta \circ \theta_T \mid \mathcal F_T]\, 1_{(T<\zeta)}\\
&= E^x[f(X_{t+T});\ T + t \circ \theta_T < T + \zeta \circ \theta_T,\ T < \zeta \mid \mathcal F_T]\\
&= E^x[f(X_{t+T});\ T + t < \zeta \mid \mathcal F_T],
\end{aligned}$$
since $T + t \circ \theta_T = T + t$ and $T + \zeta \circ \theta_T = \zeta$ on $(T < \zeta)$. Hence
$$E^x[E^{\hat X_T} f(\hat X_t);\ A] = E^x[f(X_{t+T});\ T + t < \zeta,\ A],$$
as required.

22.2 Conditioned processes
Another type of transformation of a Markov process is by conditioning, also known as Doob's $h$-path transform. To motivate this, let $D$ be a domain in $\mathbb R^d$ and let $X_t$ be a Brownian motion killed on exiting the domain. One would like to give a precise meaning to the intuitive notion of Brownian motion conditioned to exit the domain at a certain point. Let $h$ be a positive harmonic function in $D$ (i.e., $h$ is $C^2$ in $D$, and $\Delta h = 0$ there) and suppose that $h$ is 0 everywhere on the boundary of $D$ except at one point $z$. The Poisson kernel for the ball or for the half-space gives examples of such harmonic functions. Then, heuristically, we have by the Markov property at time $t$,
$$P^x(X_t \in dy \mid X_{\tau_D} = z) = \frac{P^x(X_t \in dy,\ X_{\tau_D} = z)}{P^x(X_{\tau_D} = z)} = \frac{P^x(X_t \in dy)\, P^y(X_{\tau_D} = z)}{P^x(X_{\tau_D} = z)}.$$
If $p_0(t, x, dy)$ represents the probability that Brownian motion started at $x$ and killed on leaving $D$ is in $dy$ at time $t$, we then expect that the analogous probability for Brownian motion conditioned to exit $D$ at $z$ ought to be $h(y)\,p_0(t,x,dy)/h(x)$. We now make this precise.
Let us look at a strong Markov process $X$. We say a function $h$ is invariant with respect to $X$ if $P_t h(x) = h(x)$ for all $t$ and $x$, where $P_t$ is the semigroup associated with $X$.
If $h$ is invariant, by the Markov property,
$$E^x[h(X_t)\mid\mathcal F_s] = E^x[h(X_{t-s})\circ\theta_s\mid\mathcal F_s] = E^{X_s}h(X_{t-s}) = P_{t-s}h(X_s) = h(X_s),$$
and so for each $x$, $h(X_t)$ is a martingale with respect to $P^x$. Conversely, if $h(X_t)$ is a martingale with respect to $P^x$ for all $x$, then $P_th(x) = E^xh(X_t) = h(x)$ by the definition of martingale, and so $h$ is invariant. In the case of Brownian motion killed on leaving a domain, the invariant functions are thus the harmonic ones.
Now let $h$ be a non-negative invariant function for a strong Markov process $X$. Letting $M_t = h(X_t)/h(X_0)$, $M_t$ is a non-negative continuous martingale with $M_0 = 1$, $P^x$-a.s., as long as $h(x)>0$.
We define the h-path transform of the Markov process $X$ by setting
$$P^x_h(A) = E^x[M_t; A], \qquad A\in\mathcal F_t. \qquad (22.3)$$
Since $M_0 = 1$, $P^x_h(\Omega) = 1$. Observe that $P^x_h$ gives more mass to paths where $h(X_t)$ is big and
less to where it is small. Note the similarity to the Girsanov theorem.
We have the following.
Proposition 22.2 Suppose $(X_t, P^x)$ is a strong Markov process and that $h$ is non-negative
and invariant. Then $(X_t, P^x_h)$ forms a strong Markov process.
Proof Suppose $A\in\mathcal F_s$ and $h(x)\neq0$. (We leave consideration of the case where $h(x) = 0$
to the reader.) Then
$$E^x_h[f(X_{t+s}); A] = \frac{E^x[f(X_{t+s})h(X_{t+s}); A]}{h(x)} = \frac{E^x\big[E^{X_s}[f(X_t)h(X_t)]; A\big]}{h(x)} = \frac{1}{h(x)}\,E^x\Big[\frac{1}{h(X_s)}E^{X_s}[f(X_t)h(X_t)]\,h(X_s); A\Big]$$
by the Markov property for $X$. This is equal to
$$\frac{1}{h(x)}\,E^x\big[E^{X_s}_h[f(X_t)]\,h(X_s); A\big] = E^x_h\big[E^{X_s}_hf(X_t); A\big].$$
The Markov property follows from this. The strong Markov property is proved in almost
identical fashion.
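A discrete-time toy version can make (22.3) and Proposition 22.2 concrete. The model below is an assumption of this sketch, not a construction from the text: a simple symmetric random walk on {1, ..., n} that is killed when it steps to 0 and absorbed at n stands in for the killed process, and h(x) = x plays the role of the invariant function. The sketch checks that Ph = h and then forms the one-step transformed kernel Q(x, y) = P(x, y)h(y)/h(x), the one-step analogue of the law of X under $P^x_h$. All function names are hypothetical.

```python
from fractions import Fraction


def killed_walk(n):
    """Sub-stochastic one-step kernel P[x][y] on {1, ..., n} (hypothetical toy model).

    From 1 <= x <= n - 1 the walk moves to x - 1 or x + 1 with probability 1/2;
    a move to 0 kills the walk, so that mass is simply left out of the row.
    State n is absorbing.
    """
    half = Fraction(1, 2)
    P = {x: {} for x in range(1, n + 1)}
    for x in range(1, n):
        if x - 1 >= 1:
            P[x][x - 1] = half
        P[x][x + 1] = half
    P[n][n] = Fraction(1)
    return P


def is_invariant(P, h):
    """Discrete analogue of P_t h = h: check sum_y P(x, y) h(y) == h(x) for every x."""
    return all(sum(p * h(y) for y, p in row.items()) == h(x) for x, row in P.items())


def h_transform(P, h):
    """One-step Doob h-transform Q(x, y) = P(x, y) h(y) / h(x); requires h > 0."""
    return {x: {y: p * h(y) / h(x) for y, p in row.items()} for x, row in P.items()}


def h(x):
    return Fraction(x)  # h(x) = x, kept exact with fractions


n = 6
P = killed_walk(n)
assert is_invariant(P, h)
Q = h_transform(P, h)
print(Q[1], Q[3])  # from 1 the walk must step up; from 3 it steps up with probability 2/3
```

Every row of Q sums to 1 even though P loses mass at 1, and Q(x, x + 1) = (x + 1)/(2x) > 1/2: the transformed walk is pushed upward, the discrete counterpart of the drift $1/X_t$ that appears in the Brownian example that follows.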
Let us consider an example. Let (Xt, Px) be a Brownian motion on the non-negative axis
killed on first hitting 0. This is the same as a Brownian motion killed on exiting (0, ∞). This
will be a strong Markov process. Since the second derivative of the function h(x) = x is 0,
then h is harmonic on (0, ∞), and so is invariant for the killed Brownian motion. Let us
now condition using the function h to get Brownian motion conditioned to hit infinity before
hitting zero.
To identify the resulting process, we argue as follows. Fix $x$ and let $T_\varepsilon = \inf\{t>0 : X_t<\varepsilon\}$. The Radon–Nikodym derivative of $P^x_h$ with respect to $P^x$ on $\mathcal F_{t\wedge T_\varepsilon}$ is $M_{t\wedge T_\varepsilon} = h(X_{t\wedge T_\varepsilon})/h(x)$. Using Itô's formula, we can rewrite $M_{t\wedge T_\varepsilon}$ as
$$M_{t\wedge T_\varepsilon} = \exp(\log X_{t\wedge T_\varepsilon}-\log x) = \exp\Big(\int_0^{t\wedge T_\varepsilon}\frac{1}{X_s}\,dX_s-\frac12\int_0^{t\wedge T_\varepsilon}\Big(\frac{1}{X_s}\Big)^2ds\Big).$$
By the Girsanov theorem, under $P^x_h$,
$$W_{t\wedge T_\varepsilon} = X_{t\wedge T_\varepsilon}-x-\int_0^{t\wedge T_\varepsilon}\frac{1}{X_s}\,ds$$
is a martingale. By Exercise 13.2, its quadratic variation is $t\wedge T_\varepsilon$, and so by Exercise 12.3, $W_{t\wedge T_\varepsilon}$ is a Brownian motion stopped at time $T_\varepsilon$. We have
$$X_{t\wedge T_\varepsilon} = x+W_{t\wedge T_\varepsilon}+\int_0^{t\wedge T_\varepsilon}\frac{1}{X_s}\,ds,$$
or $X$ satisfies the stochastic differential equation
$$dX_t = dW_t+\frac{1}{X_t}\,dt$$
for $t\le T_\varepsilon$. We will see later (Section 24.3) that this is the stochastic differential equation defining the Bessel process of order 3. The same argument shows that Brownian motion killed on exiting $(0,a)$ and then conditioned to hit $a$ before 0 is also a Bessel process of order 3 up until the time of first hitting $a$.

22.3 Time change
An additive functional is an increasing adapted process with $A_0 = 0$, a.s., such that
$$A_t = A_s+A_{t-s}\circ\theta_s \qquad (22.4)$$
if $s<t$. The simplest examples are what are known as classical additive functionals:
$$A_t = \int_0^tf(X_r)\,dr,$$
where $f$ is a non-negative measurable function. We have
$$A_t-A_s = \int_s^tf(X_r)\,dr = \Big(\int_0^{t-s}f(X_r)\,dr\Big)\circ\theta_s = A_{t-s}\circ\theta_s.$$
A uniform limit of additive functionals is again an additive functional; thus, for example, the local times $L^x_t$ of a one-dimensional Brownian motion are also additive functionals. Given a Markov process $X$ and an additive functional $A$, let $B_t = \inf\{u : A_u>t\}$
and
$$X'_t = X_{B_t}.$$
Let $\mathcal F'_t = \mathcal F_{B_t}$. Thus $X'$ is a time change of $X$.
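For a classical additive functional the inverse $B_t$ can be read off a sampled path. The sketch below assumes a path observed on a uniform grid of mesh dt and approximates $A_u = \int_0^u f(X_r)\,dr$ by left-endpoint Riemann sums; the function names and the toy path are hypothetical, and the grid version of $B_t$ is only an approximation to the construction in the text.

```python
import bisect
import itertools


def additive_functional(path, f, dt):
    """Left-endpoint Riemann sums for A_u = int_0^u f(X_r) dr at the grid times u_k = k * dt."""
    increments = (f(x) * dt for x in path[:-1])
    return [0.0] + list(itertools.accumulate(increments))


def inverse_index(A, t):
    """Grid version of B_t = inf{u : A_u > t}: the first index k with A[k] > t.

    A must be non-decreasing (true when f >= 0); returns None if A never exceeds t.
    """
    k = bisect.bisect_right(A, t)
    return k if k < len(A) else None


dt = 1 / 64
path = [k * dt for k in range(129)]  # toy deterministic "path" X_u = u on [0, 2]
A = additive_functional(path, lambda x: 2.0, dt)  # f = 2, so A_u = 2u and B_t is about t/2
k = inverse_index(A, 0.5)
print(k * dt, path[k])  # 0.265625 0.265625: the first grid time with A_u > 1/2
```

The time-changed value $X'_t$ is then `path[k]`. Where f vanishes, A is flat and B jumps across that stretch of time, so the time-changed path skips it.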
Proposition 22.3 Let $(X_t, P^x)$ be a strong Markov process and $A_t$ an additive functional.
With $B$ defined as above, $(X'_t, P^x)$ is also a strong Markov process.
Proof We verify the strong Markov property. If $T$ is a stopping time for $\{\mathcal F'_t\}$, we have
$$E^x[f(X'_{T+t})\mid\mathcal F'_T] = E^x[f(X_{B_{T+t}})\mid\mathcal F_{B_T}].$$
$B_T$ can be seen to be a stopping time with respect to $\{\mathcal F_t\}$, and $B_{T+t} = B_T+B_t\circ\theta_{B_T}$, where the $\theta_t$ are the shift operators, so that $X_{B_{T+t}} = X_{B_t}\circ\theta_{B_T}$. By the strong Markov property of $X$ at time $B_T$, this is
$$E^{X_{B_T}}f(X_{B_t}) = E^{X'_T}f(X'_t).$$
This suffices to show that $X'_t$ is a strong Markov process.

22.4 Last exit decompositions
Let $A$ be a Borel set, and let $L$ be the last visit to $A$:
$$L = \sup\{s : X_s\in A\}.$$
We define $L$ to be 0 if $X$ never hits $A$. The random time $L$ is not a stopping time, but we can
nevertheless kill the process $X$ at time $L$. It turns out that the resulting process $Y$ is the process
$X$ conditioned by the function $h(x) = P^x(T_A<\infty)$. The intuitive meaning of this is that $Y$ is $X$ conditioned to hit the set $A$. Let $T = \inf\{t : X_t\in A\}$, and set
$$Y_t = \begin{cases}X_t, & t<L,\\ \Delta, & t\ge L.\end{cases}$$
Let $\mathcal H_t = \sigma(Y_s; s\le t)$.
Proposition 22.4 If $(X_t, P^x)$ is a strong Markov process, then $(Y_t, P^x)$ is a Markov process with respect to $\{\mathcal H_t\}$.
Proof If $B\subset S$ (so that $\Delta\notin B$), then
$$(Y_t\in B) = (X_t\in B, L>t) = (X_t\in B, T\circ\theta_t<\infty),$$
since $L$, the last time that $X$ is in $A$, will be larger than $t$ if and only if $X$ hits $A$ at some time after time $t$. Using the Markov property at time $t$, we conclude that the function $x\mapsto P^x(Y_t\in B)$ is Borel measurable. Since
$$P^x(Y_t=\Delta) = P^x(L\le t) = 1-P^x(L>t) = 1-P^x(T\circ\theta_t<\infty),$$
the function $x\mapsto P^x(Y_t=\Delta)$ is also Borel measurable. We need to show that if $C\in\mathcal H_s$,
$$E^x[f(Y_t); C] = E^x[Q_{t-s}f(Y_s); C], \qquad (22.5)$$
where $f$ is bounded and measurable, $h(x) = P^x(L>0)$, and
$$Q_tg(x) = \frac{1}{h(x)}P_t(gh)(x)$$
when $h(x)\neq0$. (Set $Q_tg(x) = 0$ if $h(x) = 0$.)
It suffices to show (22.5) when $C = (Y_{r_1}\in B_1,\dots,Y_{r_n}\in B_n)$ for $r_1\le\cdots\le r_n\le s$ and
the $B_1,\dots,B_n$ are Borel sets. If we set
$$C_s = (X_{r_1}\in B_1,\dots,X_{r_n}\in B_n),$$
then $C_s\in\mathcal F_s$, $C\cap(L>s) = C_s\cap(L>s)$, and $C\cap(L>t) = C_s\cap(L>t)$.
We start with
$$E^x[f(Y_t); C] = E^x[f(X_t); C, L>t] = E^x[f(X_t); C_s, L>t] = E^x[f(X_t); C_s, L\circ\theta_t>0].$$
Conditioning on $\mathcal F_t$, this is equal to
$$E^x[f(X_t)P^{X_t}(L>0); C_s] = E^x[f(X_t)h(X_t); C_s].$$
Conditioning on $\mathcal F_s$, this in turn is equal to
$$\begin{aligned}E^x[P_{t-s}(fh)(X_s); C_s] &= E^x[h(X_s)Q_{t-s}f(X_s); C_s] \qquad (22.6)\\ &= E^x[P^{X_s}(L>0)\,Q_{t-s}f(X_s); C_s]\\ &= E^x[Q_{t-s}f(X_s); C_s, L\circ\theta_s>0],\end{aligned}$$
where we used the Markov property for the last equality. Continuing, we have that the last
line of (22.6) is equal to
$$E^x[Q_{t-s}f(X_s); C_s, L>s] = E^x[Q_{t-s}f(X_s); C, L>s] = E^x[Q_{t-s}f(Y_s); C],$$
as desired.
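A one-step discrete analogue shows what $Q_t$ does. Under assumptions made only for this sketch (a simple symmetric random walk on {0, ..., n}, killed when it steps to -1 or to n + 1, with A = {0}), gambler's ruin gives h(x) = P^x(T_A < ∞) = (n + 1 - x)/(n + 1). The kernel Q(x, y) = P(x, y)h(y)/h(x), the one-step version of $Q_tg = P_t(gh)/h$, keeps full mass away from A and loses mass exactly at A, where the last visit can occur. The helper names are hypothetical.

```python
from fractions import Fraction


def h(x, n):
    """P^x(hit 0 before being killed) for the walk killed on leaving {0, ..., n}."""
    return Fraction(n + 1 - x, n + 1)


def step(x, n):
    """One-step probabilities of the killed walk; moves to -1 or n + 1 are dropped."""
    return {y: Fraction(1, 2) for y in (x - 1, x + 1) if 0 <= y <= n}


def last_exit_kernel(n):
    """Q(x, y) = P(x, y) h(y) / h(x), the one-step analogue of Q_t g = P_t(g h) / h."""
    return {x: {y: p * h(y, n) / h(x, n) for y, p in step(x, n).items()} for x in range(n + 1)}


n = 4
Q = last_exit_kernel(n)
row_sums = {x: sum(row.values()) for x, row in Q.items()}
print(row_sums)  # 2/5 at x = 0, and 1 elsewhere
```

The mass missing at 0, namely $1 - n/(2(n+1)) = (n+2)/(2(n+1))$, is exactly the probability that a walk sitting at 0 never returns to A, i.e. that the current visit is the last one.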
We can also look at $X_{L+t}$, where $L$ is as above. This new process is again a strong Markov
process, and this time it is the process $X$ conditioned by the function $h(x) = P^x(T_A=\infty)$. The
intuitive meaning of this is that $X_{L+t}$ is $X$ conditioned never to hit $A$. Since we are looking
at the process after the last visit to $A$, this is entirely plausible. For a proof of the Markov
property of $X_{L+t}$, see Meyer et al. (1972).
Exercises
22.1 Let $(X_t, P^x)$ be a one-dimensional Brownian motion, $L^x_t$ the local time of Brownian motion at $x$,
and $m$ a positive finite measure on $\mathbb R$. Show that $A_t = \int L^x_t\,m(dx)$ is an additive functional.
22.2 We consider the space-time process. Let Vt = V0 + t. The process Vt is simply the process that
increases deterministically at unit speed. Thus Vt can represent time. If (Xt , Px) is a Markov
process, show that ((Xt ,Vt ), P(x,v)) is also a Markov process. Is ((Xt ,Vt ), P(x,v)) necessarily a
strong Markov process if (Xt , Px) is a strong Markov process?
For some applications, one lets Vt = V0 − t, and one thinks of time running backwards.
Space-time processes are useful when considering parabolic partial differential equations.
22.3 Suppose $(X_t, P^x)$ is a strong Markov process and $f$ is a non-negative invariant function for
$(X_t, P^x)$. Write $Q^x$ for $P^x_f$. Suppose $g$ is a non-negative invariant function for $(X_t, Q^x)$. Show that
$fg$ is a non-negative invariant function for $(X_t, P^x)$ and that $Q^x_g = P^x_{fg}$.
22.4 Suppose $A$ and $B$ are additive functionals for a Markov process and $A$ and $B$ have continuous
paths. Prove that if $E^xA_t = E^xB_t$ for all $x$ and $t$, then
$$P^x(A_t\neq B_t \text{ for some } t\ge0) = 0$$
for all $x$.
Hint: Show $A_t-B_t$ is a martingale.
22.5 Suppose $A$ and $B$ are additive functionals with continuous paths and suppose $E^xA_\infty = E^xB_\infty<\infty$ for each $x$. Show $P^x(A_t\neq B_t \text{ for some } t\ge0) = 0$ for each $x$.
Hint: If $f(x) = E^xA_\infty$, then $E^x[A_\infty\mid\mathcal F_t]-A_t = E^{X_t}A_\infty = f(X_t)$, and similarly with $B$ in place of $A$. Then $A-B$ is a $P^x$ martingale for each $x$.
22.6 Let $A$ be an additive functional with continuous paths. Suppose there exists $K>0$ such that
$E^xA_\infty\le K$ for each $x$. Prove that there exists a constant $c$ depending only on $K$ such that
$$E^xe^{cA_\infty}<\infty, \qquad x\in S.$$
22.7 Here is an argument that the law of a Brownian motion conditioned to have a maximum at a certain level is a Bessel process of order 3. Let $W$ be a one-dimensional Brownian motion killed on hitting 0. Let $S_t = \sup_{s\le t}W_s$ be the maximum. By Exercise 19.1, $X = (W,S)$ is a Markov process. Determine the law of $X$ for $t\le L$, where $L$ is the last time $X$ hits the diagonal. To define $L$ more precisely, let $D = \{(w,s) : w=s, w>0\}$ and $L = \sup\{t\ge0 : X_t\in D\}$. $L$ is finite, a.s., because $W$ will hit 0
in finite time with probability one.
Notes
Markov processes are in some sense supposed to have the property that the past and the
future are independent given the present. From this point of view, one might hope that a
Markov process run backwards is again a Markov process. This is, more or less, the case;
see Chung and Walsh (1969) or Rogers and Williams (2000a).

23
Optimal stopping
A nice application of Markov process theory is optimal stopping. Suppose we have a reward
function $g\ge0$ and we want to find the stopping time $T$ that maximizes $E^xg(X_T)$, as well as
the value of this maximum. This is the optimal stopping problem.
An important example of an optimal stopping problem is pricing the American put. (See
Chapter 28 for more on options.) A European put is an option to sell a share of stock at a
fixed price K at a certain time t0. If at time t0 the price St0 of the stock is lower than K, one
can make a profit by buying a share of stock on the stock exchange for St0 dollars, exercising
the put (which means selling a share of stock for K dollars), and taking home a profit of
K − St0 . If the price of the stock is above K at time t0, it would be silly to exercise the put,
and thus the put is worthless. An American put is almost the same, but one has the option to
sell a share of stock at price K at any time before time t0. An American put is more valuable
than a European put because if one exercises the option early, that is, sells the share of stock
before time t0, then one can put the money in a risk-free asset such as a bond or in the bank
and earn interest on the money. When should one exercise an American put to maximize the
expected return? One cannot look into the future, so the time should be a stopping time. The
stopping time should depend on the stock price, the exercise price, and also the time until
time t0. Thus one is in the optimal stopping context with Xt = (t, St ), where St is the stock
price, and one wants to find a stopping time T that maximizes a certain reward function.
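In discrete time the American put problem can be solved by backward induction, which is a convenient way to see the early-exercise premium. The sketch below swaps in the Cox–Ross–Rubinstein binomial model, which is not the continuous-time setting of the text, and all parameters (spot, strike, rate, volatility, maturity, number of steps) are hypothetical. At each node the American value is the larger of the payoff from exercising now and the discounted risk-neutral expected value of continuing; this is the discrete optimal stopping recursion.

```python
import math


def binomial_put(S0, K, r, sigma, T, steps, american=True):
    """Cox-Ross-Rubinstein binomial price of a put (illustrative sketch).

    For the American put each node takes max(exercise now, continuation value),
    the discrete-time optimal stopping recursion; the European put only continues.
    """
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1 / u
    disc = math.exp(-r * dt)
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral probability of an up-move
    assert 0 < p < 1, "time step too coarse for these parameters"

    def payoff(n, j):
        # exercise value at time step n after j up-moves
        return max(K - S0 * u**j * d**(n - j), 0.0)

    values = [payoff(steps, j) for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            cont = disc * (p * values[j + 1] + (1 - p) * values[j])
            values[j] = max(payoff(n, j), cont) if american else cont
    return values[0]


amer = binomial_put(100, 100, 0.05, 0.2, 1.0, 500)
euro = binomial_put(100, 100, 0.05, 0.2, 1.0, 500, american=False)
print(f"American {amer:.4f}  European {euro:.4f}")
```

The American price is at least the European one, since exercising only at maturity is one of the admissible stopping rules; the gap reflects the interest that can be earned after an early sale.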
23.1 Excessive functions
A solution to the optimal stopping problem can be given in the Markov case through the use
of excessive functions. A non-negative function $f$ is excessive for a Markov process $X$ if
$P_tf(x)\le f(x)$ for all $t$ and $x$ and $P_tf(x)$ increases up to $f(x)$ pointwise as $t\to0$. Here $P_t$ is
the semigroup associated with the Markov process $X$. If $g\ge0$, define
$$Ug(x) = \int_0^\infty P_sg(x)\,ds = E^x\int_0^\infty g(X_s)\,ds. \qquad (23.1)$$
When $g\ge0$, $Ug$ is excessive. To see this, let $f = Ug$; using the semigroup property and a change of
variables,
$$P_tf(x) = P_t\Big(\int_0^\infty P_sg\,ds\Big)(x) = \int_0^\infty P_{s+t}g(x)\,ds = \int_t^\infty P_sg(x)\,ds.$$
This is at most the integral from 0 to ∞, hence at most $f(x)$, and $P_tf(x)$
increases up to $f(x)$ as $t\to0$ by monotone convergence. (It is possible that $f$ is infinite for some or
all $x$.)
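The identity $P_tUg = \int_t^\infty P_sg\,ds$ has a discrete-time counterpart that is easy to check numerically. In the sketch below (an assumption: a discrete-time chain rather than the continuous-time semigroup of the text) the potential is $Ug = \sum_{k\ge0}P^kg$ for a sub-stochastic matrix $P$, and then $P(Ug) = Ug-g\le Ug$. The example is a simple symmetric random walk on {0, ..., 4} killed on leaving that set, with g = 1, so that Ug(x) is the expected lifetime (x + 1)(5 - x).

```python
def apply_kernel(P, f):
    """(P f)(x) = sum_y P[x][y] f[y] for a matrix stored as a list of rows."""
    return [sum(p * fy for p, fy in zip(row, f)) for row in P]


def potential(P, g, tol=1e-13, max_iter=100_000):
    """U g = sum_{k >= 0} P^k g, truncated once the terms drop below tol."""
    total = list(g)
    term = list(g)
    for _ in range(max_iter):
        term = apply_kernel(P, term)
        total = [a + b for a, b in zip(total, term)]
        if max(abs(t) for t in term) < tol:
            break
    return total


n = 4
P = [[0.5 if abs(x - y) == 1 else 0.0 for y in range(n + 1)] for x in range(n + 1)]
g = [1.0] * (n + 1)
u = potential(P, g)
Pu = apply_kernel(P, u)
print([round(v, 6) for v in u])  # [5.0, 8.0, 9.0, 8.0, 5.0]
```

The check `Pu <= u` is the discrete form of $P_tUg\le Ug$; here the deficit is exactly g, mirroring the missing piece $\int_0^tP_sg\,ds$.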
The theory of excessive functions is an important part of Markov process theory and we
refer the reader to Blumenthal and Getoor (1968), a book which has inspired a generation of
Markov process theorists.
We have the following.
Lemma 23.1 If $f$ is excessive, there exist functions $g_n\ge0$ such that $Ug_n$ increases up to $f$,
where $Ug_n$ is defined by (23.1).
Proof Let $g_n = n(f-P_{1/n}f)$. Since $f$ is excessive, $g_n\ge0$. We have
$$Ug_n = n\int_0^\infty P_sf\,ds-n\int_0^\infty P_{s+(1/n)}f\,ds = n\int_0^{1/n}P_sf\,ds,$$
which is less than or equal to $f$ and increases to $f$.
Next we have
Proposition 23.2 (1) If $f$ is excessive, $T$ is a finite stopping time, and $h(x) = E^xf(X_T)$,
then $h$ is excessive.
(2) If $f$ is excessive and $T$ is a finite stopping time, then $f(x)\ge E^xf(X_T)$.
(3) If $f$ is excessive, then $f(X_t)$ is a supermartingale.
Proof (1) First suppose $f = Ug$ for some non-negative function $g$. Then
$$h(x) = E^xUg(X_T) = E^xE^{X_T}\int_0^\infty g(X_s)\,ds = E^x\int_0^\infty g(X_{s+T})\,ds = E^x\int_T^\infty g(X_s)\,ds \qquad (23.2)$$
by the strong Markov property and a change of variables. The same argument shows that
$$P_th(x) = E^xh(X_t) = E^xE^{X_t}\int_T^\infty g(X_s)\,ds = E^x\int_{T+t}^\infty g(X_s)\,ds.$$
This is less than or equal to $E^x\int_T^\infty g(X_s)\,ds = h(x)$ and increases up to $h(x)$ as $t\downarrow0$.
Now let $f$ be excessive but not necessarily of the form $Ug$. In the paragraph above, replace
$g$ by the $g_n$ that were defined in Lemma 23.1, and let $h_n(x) = E^xUg_n(X_T)$; by monotone convergence, $h_n$ increases to $h$. We conclude
$$P_th(x) = \lim_{n\to\infty}P_th_n(x)\le\lim_{n\to\infty}h_n(x) = h(x).$$
That $P_th$ increases up to $h$ is proved similarly; there is no difficulty interchanging the limit
as $n$ tends to infinity and the limit as $t$ tends to 0, since $P_th_n$ increases both as $n$ increases
and as $t$ decreases.

(2) As in the proof of (1), it suffices to consider the case where $f = Ug$ and then take
limits. By (23.2),
$$E^xUg(X_T) = E^x\int_T^\infty g(X_s)\,ds\le E^x\int_0^\infty g(X_s)\,ds = Ug(x).$$
(3) By the Markov property,
$$E^x[f(X_t)\mid\mathcal F_s] = E^{X_s}f(X_{t-s}) = P_{t-s}f(X_s)\le f(X_s).$$
The proof is complete.
By Proposition 23.2, f (Xt ) is a supermartingale and therefore has left and right limits
along the dyadic rationals. We could take a version of f (Xt ) that is right continuous, but
there is the danger that doing so would result in a version of X that is not right continuous
with left limits. We want to have X fixed and then conclude that f (Xt ) is right continuous
with left limits without needing to take a version.
Proposition 23.3 Let (Xt, Px) be a strong Markov process. If f is excessive, then for each
x, f (Xt ) is right continuous with left limits Px almost surely.
For a proof, we refer the reader to Blumenthal and Getoor (1968), Theorem II.2.12 or to
Exercise 23.8.
Given a function g, the function G is an excessive majorant for g if G is excessive and
G ≥ g pointwise. G is the least excessive majorant for g if (1) G is an excessive majorant,
and (2) if G̃ is any other excessive majorant, then G ≤ G̃ pointwise.
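For a discrete-time chain the least excessive majorant can be computed by iterating $G\leftarrow\max(g, PG)$ from $G = g$, a one-step relative of the construction in Proposition 23.4 below. The sketch assumes a simple symmetric random walk on {0, ..., n} absorbed (stopped, rather than killed) at 0 and n, as a discrete stand-in for Brownian motion on an interval. For this chain "excessive" means discretely concave, so the result can be checked against the least concave majorant built from the upper hull of the points (x, g(x)); the sketch also checks the conclusion of Theorem 23.5 below, that stopping on first entry to {g = G} earns G. The reward vector and function names are hypothetical.

```python
def value_iteration(g, tol=1e-13, max_iter=1_000_000):
    """Least excessive majorant of g for the walk on {0, ..., n} absorbed at 0 and n."""
    n = len(g) - 1
    G = [float(v) for v in g]
    for _ in range(max_iter):
        PG = [G[0]] + [(G[x - 1] + G[x + 1]) / 2 for x in range(1, n)] + [G[n]]
        new = [max(a, b) for a, b in zip(g, PG)]
        done = max(abs(a - b) for a, b in zip(new, G)) < tol
        G = new
        if done:
            break
    return G


def concave_majorant(g):
    """Least concave majorant on the grid: upper hull of (x, g[x]), then interpolate."""
    hull = []
    for x, y in enumerate(g):
        # drop the last hull point while it lies on or below the chord to (x, y)
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (x - x1) <= (y - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append((x, y))
    out = []
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        out += [y1 + (y2 - y1) * (x - x1) / (x2 - x1) for x in range(x1, x2)]
    return out + [float(hull[-1][1])]


def reward_of_stopping_on(g, stop, tol=1e-13, max_iter=1_000_000):
    """E^x g(X_tau) for tau = first entry to `stop`, which must contain 0 and n."""
    n = len(g) - 1
    assert 0 in stop and n in stop
    V = [float(g[x]) if x in stop else 0.0 for x in range(n + 1)]
    for _ in range(max_iter):
        new = [float(g[x]) if x in stop else (V[x - 1] + V[x + 1]) / 2 for x in range(n + 1)]
        done = max(abs(a - b) for a, b in zip(new, V)) < tol
        V = new
        if done:
            break
    return V


g = [0, 2, 1, 0, 0, 4, 1, 0, 3, 0, 0]  # hypothetical reward function
G = value_iteration(g)
stop = {x for x in range(len(g)) if G[x] - g[x] < 1e-9}
print(sorted(stop))  # [0, 1, 5, 8, 10]
```

Between consecutive stopping points G is linear, which is the discrete form of the statement, made later in the text for Brownian motion on an interval, that the least excessive majorant is the smallest concave function above g.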
As we will prove below, an optimal stopping time is to stop the first
time $X_t$ leaves the set where $g(x)<G(x)$. Therefore it is important to be able to calculate the least excessive majorant of a function. Here is one method of constructing the least excessive majorant. We say a function $f : S\to\mathbb R$ is lower semicontinuous if $\{x : f(x)>a\}$ is an open set for every real number
$a$. See Exercise 23.9 for information about lower semicontinuous functions.
Proposition 23.4 Suppose that $g$ is non-negative, bounded, and continuous and that Assumption 20.1 holds. Let $g_0 = g$, let $T_n = \{k/2^n : 0\le k\le n2^n\}$, and define
$$g_n(x) = \max_{t\in T_n}P_tg_{n-1}(x)$$
for $n = 1,2,\dots$ Then $g_n(x)$ increases pointwise to $G(x)$, the least excessive majorant of $g$.
Proof Since $g_n(x)\ge P_0g_{n-1}(x) = E^xg_{n-1}(X_0) = g_{n-1}(x)$, the sequence $g_n(x)$ is increasing.
Call the limit $H(x)$.
We first show $H$ is lower semicontinuous. If $g_{n-1}$ is bounded and continuous, then $P_tg_{n-1}$
is bounded and continuous for each $t$ by Assumption 20.1. Since the maximum of a finite
number of continuous functions is continuous, $g_n$ is bounded and continuous. By an
induction argument, each $g_n$ is continuous. By Exercise 23.9, $H$ is lower semicontinuous.
We next show that $H$ is excessive. If $t\in T_m$ and $n\ge m$, then
$$H(x)\ge g_n(x)\ge P_tg_{n-1}(x) = E^xg_{n-1}(X_t).$$

Letting $n$ tend to infinity, $H(x)\ge E^xH(X_t)$ if $t\in T_m$ for some $m$. Now take $t_k\in\cup_mT_m$ with
$t_k\downarrow t$. Since $H$ is lower semicontinuous and the paths of $X$ are right continuous, using Exercise 23.9 and Fatou's lemma,
$$H(x)\ge\liminf_{k\to\infty}E^xH(X_{t_k})\ge E^x\Big[\liminf_{k\to\infty}H(X_{t_k})\Big]\ge E^xH(X_t).$$
If $a\in\mathbb R$, let $E_a = \{y : H(y)>a\}$, which is open. If $a<H(x)$, then $P_tH(x) = E^xH(X_t)\ge aP^x(X_t\in E_a)\to a$ as $t\to0$, since $E_a$ is an open set containing $x$ and the paths of $X$ are right continuous. Therefore $\liminf_{t\to0}P_tH(x)\ge a$ for all $a<H(x)$, hence $\liminf_{t\to0}P_tH(x)\ge H(x)$, and we conclude $P_tH(x)\to H(x)$ as $t\to0$. Thus $H$ is excessive.
Suppose now that $F$ is excessive and $F\ge g$ pointwise. If $F\ge g_{n-1}$, then $F(x)\ge P_tF(x)\ge P_tg_{n-1}(x)$ for every $t\in T_n$, hence $F(x)\ge g_n(x)$. By an induction argument, $F(x)\ge g_n(x)$ for all $n$, hence $F(x)\ge H(x)$. Therefore $H$ is the least excessive majorant of $g$.
In one case, at least, finding the least excessive majorant is easy. Suppose we have a one-dimensional Brownian motion killed on leaving an interval $[a,b]$ and a non-negative function $g$ defined on $[a,b]$. Then the least excessive majorant is the smallest concave function $G$ that is larger than or equal to $g$ everywhere. To see this, let $G$ be the smallest such concave function. By Jensen's inequality,
$$P_tG(x) = E^xG(X_t)\le G(E^xX_t)\le G(x).$$
Because $G$ is concave, it is continuous, and so $P_tG(x) = E^xG(X_t)\to G(x)$ as $t\to0$. Therefore $G$ is excessive. If $\widetilde G$ is another excessive function larger than $g$ and $a\le c<x<d\le b$, then by Proposition 23.2(2) we have $\widetilde G(x)\ge E^x\widetilde G(X_S)$, where $S$ is the first time the process leaves $[c,d]$. Since $X$ is equal to a Brownian motion up to time $S$, we know the exact distribution of $X_S$; see Proposition 3.16. Therefore
$$\widetilde G(x)\ge E^x\widetilde G(X_S) = \frac{d-x}{d-c}\,\widetilde G(c)+\frac{x-c}{d-c}\,\widetilde G(d).$$
Rearranging this inequality shows that $\widetilde G$ is concave. Recall that the minimum of two concave functions is concave, so $G\wedge\widetilde G$ is a concave function larger than $g$ that is less than or equal to $G$. But $G$ is the smallest concave function larger than or equal to $g$, hence $G = G\wedge\widetilde G$, or $G\le\widetilde G$. Thus $G$ is the least excessive majorant of $g$.

23.2 Solving the optimal stopping problem
Now let us turn to proving that an optimal stopping time can be given in terms of the least excessive majorant. For simplicity we will suppose that $g$ is non-negative, continuous, and bounded.
We will assume that our Markov process and $g$ are such that a least excessive majorant $G$ exists. Let $g^*$ be the optimal reward:
$$g^*(x) = \sup\{E^xg(X_T) : T \text{ a stopping time}\}.$$
Let $D = \{x : g(x)<G(x)\}$, the continuation region, and let $\tau_D = \inf\{t : X_t\notin D\}$.
Theorem 23.5 Let $(X_t, P^x)$ be a strong Markov process and $g$, $g^*$, $G$, and $D$ as above. If $\tau_D<\infty$, $P^x$-a.s., then
$$g^*(x) = G(x) = E^xg(X_{\tau_D}).$$
In other words, an optimal stopping time is to stop the first time the process hits $\{x : G(x) = g(x)\}$.
Proof Let $D_\varepsilon = \{x : g(x)<G(x)-\varepsilon\}$, and write $\tau_\varepsilon$ for $\tau_{D_\varepsilon}$. Let $H_\varepsilon(x) = E^x[G(X_{\tau_\varepsilon})]$, which is excessive by Proposition 23.2(1). The first step of the proof is to prove (23.3) below. Second, we prove $G(x)\le g^*(x)$. The third step is to prove that $G(x) = g^*(x)$ and the fourth that $g^*(x) = E^xg(X_{\tau_D})$.
Step 1. Let $\varepsilon>0$. We claim
$$g(x)\le H_\varepsilon(x)+\varepsilon, \qquad x\in D. \qquad (23.3)$$
To prove this, we suppose not, that is, we let
$$b = \sup_{x\in D}\,(g(x)-H_\varepsilon(x))$$
and suppose $b>\varepsilon$. Choose $0<\eta<\varepsilon$, and then choose $x_0$ such that
$$g(x_0)-H_\varepsilon(x_0)\ge b-\eta. \qquad (23.4)$$
Since $H_\varepsilon+b$ is an excessive majorant of $g$ by the definition of $b$, and $G$ is the least excessive majorant,
$$G(x_0)\le H_\varepsilon(x_0)+b. \qquad (23.5)$$
From (23.4) and (23.5) we conclude
$$G(x_0)\le g(x_0)+\eta. \qquad (23.6)$$
By the Blumenthal 0–1 law (Proposition 20.8), either $\tau_\varepsilon$ is strictly positive with $P^{x_0}$-probability one or else zero with $P^{x_0}$-probability one. In the first case, for each $t>0$,
$$g(x_0)+\eta\ge G(x_0)\ge E^{x_0}[G(X_{t\wedge\tau_\varepsilon})]\ge E^{x_0}[g(X_t)+\varepsilon; \tau_\varepsilon>t].$$
The first inequality is (23.6), the second is due to $G$ being excessive, and the third holds because
$G>g+\varepsilon$ up until the time $\tau_\varepsilon$. If we let $t\to0$ and use the fact that $g$ is continuous, we get
$g(x_0)+\eta\ge g(x_0)+\varepsilon$, a contradiction to the way we chose $\eta$.
In the second case, where $\tau_\varepsilon = 0$ with $P^{x_0}$-probability one, we have
$$H_\varepsilon(x_0) = E^{x_0}G(X_{\tau_\varepsilon}) = E^{x_0}G(X_0) = G(x_0)\ge g(x_0)\ge H_\varepsilon(x_0)+b-\eta,$$
a contradiction since we chose $\eta<b$. In either case we reach a contradiction, so (23.3) must hold.
Step 2. A conclusion we reach from (23.3) is that $H_\varepsilon+\varepsilon$ is an excessive majorant of $g$. Therefore
$$G(x)\le H_\varepsilon(x)+\varepsilon = E^x[G(X_{\tau_\varepsilon})]+\varepsilon\le E^x[g(X_{\tau_\varepsilon})+\varepsilon]+\varepsilon\le g^*(x)+2\varepsilon. \qquad (23.7)$$
The first inequality holds because $G$ is the least excessive majorant, the second because $G(X_{\tau_\varepsilon})\le g(X_{\tau_\varepsilon})+\varepsilon$ by the definition of $\tau_\varepsilon$, and the third by the definition of $g^*$. Since $\varepsilon$ is arbitrary, we see that $G(x)\le g^*(x)$.
Step 3. For any stopping time $T$, because $G$ is excessive and majorizes $g$,
$$G(x)\ge E^xG(X_T)\ge E^xg(X_T).$$
Taking the supremum over all stopping times $T$, $G(x)\ge g^*(x)$, and therefore $G(x) = g^*(x)$.
Step 4. Because $\tau_D$ is finite almost surely, the continuity of $g$ tells us that $E^xg(X_{\tau_\varepsilon})\to E^xg(X_{\tau_D})$ as $\varepsilon\to0$. By the definition of $g^*$, we know that $E^xg(X_{\tau_\varepsilon})\le g^*(x)$. On the other hand, by the definitions of $\tau_\varepsilon$ and $H_\varepsilon$,
$$E^xg(X_{\tau_\varepsilon})\ge E^xG(X_{\tau_\varepsilon})-\varepsilon = H_\varepsilon(x)-\varepsilon.$$
By the first inequality in (23.7), the right-hand side is greater than or equal to $G(x)-2\varepsilon = g^*(x)-2\varepsilon$. Letting $\varepsilon\to0$, we obtain $E^xg(X_{\tau_D})\ge g^*(x)$; since $E^xg(X_{\tau_D})\le g^*(x)$ by the definition of $g^*$, this completes the proof.
The following two corollaries are useful in applications.
Corollary 23.6 Suppose there exists a Borel set $A$ such that $h$ is an excessive majorant of $g$, where $h(x) = E^xg(X_{\tau_A})$ and $\tau_A = \inf\{t : X_t\notin A\}$. Then $g^*(x) = h(x)$.
Proof Let $G$ be the least excessive majorant of $g$. Then $h(x)\ge G(x)$. However,
$$h(x) = E^xg(X_{\tau_A})\le\sup_TE^xg(X_T) = g^*(x) = G(x)$$
by Theorem 23.5.
Corollary 23.7 Suppose $g$ is continuous and $G$, the least excessive majorant of $g$, is lower semicontinuous. Let $D$ be the continuation region, suppose $\tau_D<\infty$, a.s., and let $h(x) = E^xg(X_{\tau_D})$. If $h\ge g$, then $h = g^*$.
Proof Note
$$D = \{x : g(x)<G(x)\} = \bigcup_{a<b}\big[(g<a)\cap(G>b)\big],$$
where the union is over all pairs of real numbers $a<b$. Since $G$ is lower semicontinuous and $g$ is continuous, $D$ is open. This implies $X_{\tau_D}\notin D$, and so $g(X_{\tau_D})\ge G(X_{\tau_D})$, a.s. Since $g\le G$, we see that
$$h(x) = E^xg(X_{\tau_D}) = E^xG(X_{\tau_D}).$$
Since $G$ is excessive, $h$ is also excessive by Proposition 23.2(1). Therefore $h$ is an excessive majorant of $g$ and we can apply Corollary 23.6.
Exercises
23.1 Show that if $f$ is excessive, then $1-e^{-f}$ is excessive. Thus, for some purposes it is enough to look at bounded excessive functions.
23.2 Show that if $f$ and $g$ are excessive, then $f\wedge g$ is excessive.
23.3 Let $A_t$ be an additive functional (defined in (22.4)) and let $f(x) = E^xA_\infty$. Show that $f$ is excessive.
23.4 Let $f$ be an excessive function for a strong Markov process $(X_t, P^x)$. Let $\varepsilon>0$ and $S_1 = \inf\{t : |f(X_t)-f(X_0)|\ge\varepsilon\}$. Let $S_{i+1} = S_i+S_1\circ\theta_{S_i}$. Prove that $f(X_{S_i})$ is a supermartingale with
respect to the $\sigma$-fields $\mathcal F_{S_i}$ and with respect to $P^x$ for each $x$.
23.5 For each $n$, let $A^n_t$ be an additive functional with continuous paths and suppose that $f_n(x) = E^xA^n_\infty$
is finite for every $x$. Suppose $A_t$ is a continuous additive functional with $f(x) = E^xA_\infty$ also
finite for each $x$. Suppose $f_n$ converges to $f$ uniformly. Prove that for each $x$, with $P^x$-probability
one, $A^n_t$ converges to $A_t$, uniformly over $t\ge0$.
Hint: Use Proposition 9.11.
23.6 Suppose $f$ is bounded and excessive, $\lambda\ge0$, and $A = \{y : f(y)\le\lambda\}$. Prove that if $x\in A^r$ (i.e.,
$x$ is regular for $A$), then $f(x)\le\lambda$.
Hint: Use the optional section theorem (Theorem 16.12) to find stopping times $T_m$ whose
graphs are contained in $\{(t,\omega) : t\le1/m, f(X_t)\le\lambda\}$ with $P^x$-probability at least $1-(1/m)$.
If the $g_n$ are as in Lemma 23.1, write
$$\begin{aligned}Ug_n(x) &= E^x\int_0^{T_m}g_n(X_s)\,ds+E^xUg_n(X_{T_m})\\ &\le E^x\int_0^{T_m}g_n(X_s)\,ds+E^xf(X_{T_m})\\ &\le E^x\int_0^{T_m}g_n(X_s)\,ds+\lambda+\|f\|_\infty/m.\end{aligned}$$
Let $m\to\infty$, then $n\to\infty$.
23.7 Suppose $f$ is bounded and excessive, $\lambda\ge0$, and $B = \{y : f(y)\ge\lambda\}$. Prove that if $x\in B^r$, then
$f(x)\ge\lambda$.
Hint: Use the optional section theorem as in Exercise 23.6 to find stopping times $R_m$
analogous to the $T_m$. Write
$$f(x)\ge E^xf(X_{R_m})\ge\lambda-\|f\|_\infty/m,$$
and then let $m\to\infty$.
23.8 (1) Suppose $f$ is bounded and excessive, $x\in S$, $\varepsilon>0$, and $C = \{y : |f(y)-f(x)|\ge\varepsilon\}$. Use
Exercises 23.6 and 23.7 to show that if $z\in C^r$, then $|f(z)-f(x)|\ge\varepsilon$.
(2) Let $f$, $\varepsilon$, and $x$ be as in (1) and set $S = \inf\{t>0 : |f(X_t)-f(x)|\ge\varepsilon\}$. Use Exercise
20.9 to show that $|f(X_S)-f(x)|\ge\varepsilon$ with $P^x$-probability one.
(3) Let $f$, $\varepsilon$, $x$, and $S$ be as in (2). Define $S_0 = 0$ and $S_{i+1} = S_i+S\circ\theta_{S_i}$. By Exercise 23.4,
$f(X_{S_i})$ is a positive supermartingale. Use Corollary A.36 to show $S_i\to\infty$, $P^x$-a.s. Deduce that
with $P^x$-probability one, $f(X_t)$ has paths that are right continuous with left limits.
(4) Use Exercise 23.1 to show that if $f$ is excessive but not necessarily bounded, then $f(X_t)$
has paths that are right continuous with left limits.
23.9 (1) Show that every continuous function is lower semicontinuous.
(2) Show that if $f$ is lower semicontinuous and $x\in S$, then
$$\liminf_{y\to x}f(y)\ge f(x).$$
(3) Show that if fn is a sequence of continuous functions increasing to f , then f is lower
semicontinuous.

23.10 Suppose $g$ is non-negative, bounded, and continuous, and Assumption 20.1 holds. Let $g_0 = g$
and define $g_n(x) = \sup_{t\ge0}P_tg_{n-1}(x)$ for $n\ge1$. Prove that $g_n$ increases to the least excessive
majorant of $g$.
Notes
See Øksendal (2003) for further information on optimal stopping.
Exercise 23.3 shows that $E^xA_\infty$ is an excessive function if $A$ is an additive functional. To
a large extent the converse is true: given an excessive function $f$ and some mild conditions,
there exists an additive functional $A$ such that $f(x) = E^xA_\infty$ for all $x$. The proof is a
modification of the Doob–Meyer decomposition of $f(X_t)$ that takes into account the fact
that there is a family of probabilities instead of just one; see Blumenthal and Getoor (1968).
The optimal stopping problem involving American puts has a theoretical solution: look
at the least excessive majorant for a certain reward function. The reward function is not just
$(K-s)^+$ because the interest earned on the money obtained after the sale of a share of
stock needs to be taken into account. Moreover, the excessive functions here are relative to
the space-time process $(S_t, t)$, not those relative to $S_t$. Finding a satisfactory solution to this
optimal stopping problem remains open and is important.

24
Stochastic differential equations
Stochastic differential equations are used in modeling a wide variety of physical and economic
situations, and are one of the main reasons for the interest in stochastic integrals.
We consider stochastic differential equations (SDEs) of the form
$$dX_t = \sigma(X_t)\,dW_t+b(X_t)\,dt,$$
where σ and b are real-valued functions and W is a one-dimensional Brownian motion. We
also consider multidimensional analogs of this equation. If X represents the position of a
particle, the σ (Xt ) dWt term says that the particle X diffuses like a multiple of Brownian
motion, but how strong the diffusivity is depends on the location of the particle. The b(Xt ) dt
term represents a push in one direction or another, the size of the push depending on the
location of the particle.
24.1 Pathwise solutions of SDEs
Let $W_t$ be a one-dimensional Brownian motion with respect to a filtration $\{\mathcal F_t\}$ satisfying the
usual conditions; see Chapter 1. We want to consider SDEs of the form
$$dX_t = \sigma(X_t)\,dW_t+b(X_t)\,dt, \qquad X_0 = x_0. \qquad (24.1)$$
This means that $X_t$ satisfies the equation
$$X_t = x_0+\int_0^t\sigma(X_s)\,dW_s+\int_0^tb(X_s)\,ds, \qquad t\ge0. \qquad (24.2)$$
Here $\sigma$ and $b$ are Borel measurable functions, the first integral in (24.2) is a stochastic integral
with respect to the Brownian motion $W_t$, and (24.2) holds almost surely, that is, we can find
versions of $\int_0^t\sigma(X_s)\,dW_s$ such that for almost all $\omega$, (24.2) holds for all $t$. In order to be able
to define the stochastic integral, we require that any solution $X_t$ to (24.2) be adapted to the
filtration $\{\mathcal F_t\}$. If $X$ satisfies (24.2), then $X$ will automatically have continuous paths. We
want to consider existence and uniqueness of solutions to the equation (24.2).
Definition 24.1 A stochastic process $X$ will be a pathwise solution to (24.1) if $X$ is adapted
to the filtration $\{\mathcal F_t\}$ and (24.2) holds almost surely, where the null set does not depend on $t$.
We say the solution to (24.1) is pathwise unique if whenever $X'_t$ is another solution, then
$$P(X_t\neq X'_t \text{ for some } t\ge0) = 0. \qquad (24.3)$$
Sometimes pathwise uniqueness is used for a slightly stronger concept: one can let $W$ be
a Brownian motion with respect to each of two filtrations $\{\mathcal F_t\}$ and $\{\mathcal F'_t\}$, which are possibly
different, and one can let $X'_t$ be adapted to $\{\mathcal F'_t\}$. One then requires (24.3) to hold. We won't
need to use this modification of the definition, and in any case our proof of uniqueness will
be equally valid in this situation.
The function σ in (24.1) is called the diffusion coefficient and the function b is called the
drift coefficient. σ tells us the intensity of the noise at a point, and b tells us if there is a push
in any direction at a given point.
We will suppose that σ and b are Lipschitz functions: there exists a constant c such that
|σ (x) − σ (y)| ≤ c|x − y|, |b(x) − b(y)| ≤ c|x − y|. (24.4)
We also suppose for now that σ and b are bounded.
Theorem 24.2 Suppose $\sigma$ and $b$ are bounded Lipschitz functions. Then there exists a pathwise
solution to (24.1) and this solution is pathwise unique.
Proof Existence. Let $X_0(t) = x_0$ for all $t$ and define $X_i(t)$ recursively by
$$X_{i+1}(t) = x_0+\int_0^t\sigma(X_i(s))\,dW_s+\int_0^tb(X_i(s))\,ds. \qquad (24.5)$$
Note that X0(t) is trivially adapted to {Ft}, and an induction argument shows that Xi is adapted
to {Ft} for each i.
Fix t0. We will show existence (and uniqueness) up to time t0; since t0 is arbitrary, this will
achieve the theorem.
Since $(x+y)^2\le2x^2+2y^2$,
$$\begin{aligned}E\sup_{r\le t}|X_{i+1}(r)-X_i(r)|^2 &= E\Big[\sup_{r\le t}\Big(\int_0^r[\sigma(X_i(s))-\sigma(X_{i-1}(s))]\,dW_s+\int_0^r[b(X_i(s))-b(X_{i-1}(s))]\,ds\Big)^2\Big]\\ &\le2E\Big[\sup_{r\le t}\Big(\int_0^r[\sigma(X_i(s))-\sigma(X_{i-1}(s))]\,dW_s\Big)^2\Big]+2E\Big[\sup_{r\le t}\Big(\int_0^r[b(X_i(s))-b(X_{i-1}(s))]\,ds\Big)^2\Big].\end{aligned}$$
By Doob's inequalities (Theorem 3.6) and the fact that $\sigma$ is a Lipschitz function, the first
term after the inequality is bounded by
$$8E\Big[\Big(\int_0^t[\sigma(X_i(s))-\sigma(X_{i-1}(s))]\,dW_s\Big)^2\Big] = 8E\int_0^t[\sigma(X_i(s))-\sigma(X_{i-1}(s))]^2\,ds\le cE\int_0^t|X_i(s)-X_{i-1}(s)|^2\,ds.$$
By the Cauchy–Schwarz inequality, the fact that $t\le t_0$, and the fact that $b$ is a Lipschitz
function, the second term is bounded by
$$2E\Big(\int_0^t|b(X_i(s))-b(X_{i-1}(s))|\,ds\Big)^2\le2t_0E\int_0^t|b(X_i(s))-b(X_{i-1}(s))|^2\,ds\le cE\int_0^t|X_i(s)-X_{i-1}(s)|^2\,ds.$$

Therefore
$$E\sup_{r\le t}|X_{i+1}(r)-X_i(r)|^2\le cE\int_0^t|X_i(s)-X_{i-1}(s)|^2\,ds. \qquad (24.6)$$
Let $g_i(t) = E\sup_{r\le t}|X_i(r)-X_{i-1}(r)|^2$. Thus provided we choose $A$ big enough, $g_1(t)\le A$
for $t\le t_0$ and
$$g_{i+1}(t)\le A\int_0^tg_i(s)\,ds, \qquad t\le t_0.$$
(Clearly $|X_{i+1}(t)-X_i(t)|^2\le\sup_{r\le t}|X_{i+1}(r)-X_i(r)|^2$.) Thus
$$g_2(t)\le A\int_0^tg_1(s)\,ds\le A\int_0^tA\,ds = A^2t,$$
$$g_3(t)\le A\int_0^tg_2(s)\,ds\le A\int_0^tA^2s\,ds = A^3t^2/2,$$
and continuing by induction,
$$g_i(t)\le A^it^{i-1}/(i-1)!.$$
Exercise 24.1 asks you to show that if we define
$$\|Y\|_t = \Big(E\sup_{r\le t}|Y_r|^2\Big)^{1/2} \qquad (24.7)$$
when $Y$ is a stochastic process, then $\|Y\|_t$ is a norm and the corresponding metric is complete.
Hence
$$\Big(E\sup_{s\le t_0}|X_n(s)-X_m(s)|^2\Big)^{1/2} = \|X_n-X_m\|_{t_0}\le\sum_{i=m}^{n-1}\|X_{i+1}-X_i\|_{t_0}\le\sum_{i=m}^{n-1}(g_{i+1}(t_0))^{1/2}$$
can be made small by taking $m,n$ large. (We use the ratio test to show that the sum $\sum_i(A^it_0^{i-1}/(i-1)!)^{1/2}$ converges.) We have $E\,X_0(t)^2<\infty$. By the completeness of $\|\cdot\|_{t_0}$ there exists $X_t$ such that $E\sup_{s\le t_0}|X_n(s)-X_s|^2\to0$ as $n\to\infty$. This implies there exists a subsequence $\{n_j\}$ such that $\sup_{s\le t_0}|X_{n_j}(s)-X_s|^2\to0$ almost surely; since each $X_{n_j}$ is continuous in $t$, $X_t$ is also. Taking a limit in (24.5) as $n\to\infty$ shows $X_t$ satisfies (24.2).
Uniqueness. Suppose $X_t$ and $X'_t$ are two solutions to (24.2). Let
$$g(t) = E\sup_{r\le t}|X_r-X'_r|^2.$$
Very similarly to the existence proof, $E\sup_{r\le t}|X_r|^2<\infty$, the same with $X$ replaced by $X'$, and
$$E\sup_{r\le t}|X_r-X'_r|^2\le2E\Big[\sup_{r\le t}\Big(\int_0^r[\sigma(X_s)-\sigma(X'_s)]\,dW_s\Big)^2\Big]+2E\Big[\sup_{r\le t}\Big(\int_0^r[b(X_s)-b(X'_s)]\,ds\Big)^2\Big]\le cE\int_0^t|X_s-X'_s|^2\,ds.$$
Therefore there exists $A>0$ such that $g(t)$ is bounded by $A$ and $g(t)\le A\int_0^tg(s)\,ds$.
Then $g(t) \le A\int_0^t A\,ds = A^2t$, $g(t) \le A\int_0^t A^2s\,ds = A^3t^2/2$, etc. Thus we have $g(t) \le A^it^{i-1}/(i-1)!$ for all $i$, which is only possible if $g(t) = 0$. This implies that $X_t = X'_t$ for all $t \le t_0$, except for a null set.
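The successive approximations behind this proof can be run once the Brownian path is discretized. They have the form $X_{i+1}(t) = x_0 + \int_0^t\sigma(X_i(s))\,dW_s + \int_0^t b(X_i(s))\,ds$; compare (25.2). The sketch below makes a few assumptions:
- The coefficients $\sigma(x) = 1 + 0.3\sin x$ and $b(x) = \cos x$ are hypothetical. Both are bounded and Lipschitz.
- The integrals are approximated by left-endpoint (Itô) sums on a uniform grid.
- The seed and grid size are arbitrary.

The code records $\sup_k|X_{i+1}(t_k) - X_i(t_k)|$ along one simulated path. On the grid, the fixed point of the iteration is exactly the Euler–Maruyama recursion, and the sketch checks the limit against it.

```python
import math
import random


def sigma(x: float) -> float:
    return 1.0 + 0.3 * math.sin(x)  # hypothetical: bounded, Lipschitz with constant 0.3


def b(x: float) -> float:
    return math.cos(x)  # hypothetical: bounded, Lipschitz with constant 1


def brownian_increments(n: int, t0: float, seed: int) -> list[float]:
    rng = random.Random(seed)
    return [rng.gauss(0.0, math.sqrt(t0 / n)) for _ in range(n)]


def picard(x0: float, dW: list[float], t0: float, n_iter: int):
    """Discretized successive approximations, starting from X_0(t) = x0.

    Returns the last iterate on the grid and the sup-distances between consecutive iterates.
    """
    h = t0 / len(dW)
    X = [x0] * (len(dW) + 1)
    gaps = []
    for _ in range(n_iter):
        new, acc = [x0], x0
        for j, dw in enumerate(dW):
            acc += sigma(X[j]) * dw + b(X[j]) * h  # previous iterate at the left endpoint
            new.append(acc)
        gaps.append(max(abs(u - v) for u, v in zip(new, X)))
        X = new
    return X, gaps


def euler_maruyama(x0: float, dW: list[float], t0: float) -> list[float]:
    h = t0 / len(dW)
    X = [x0]
    for dw in dW:
        x = X[-1]
        X.append(x + (sigma(x) * dw + b(x) * h))
    return X


if __name__ == "__main__":
    dW = brownian_increments(200, 1.0, seed=1)
    X, gaps = picard(0.0, dW, 1.0, n_iter=60)
    print([f"{g:.1e}" for g in gaps[:8]], "...", f"{gaps[-1]:.1e}")
    E = euler_maruyama(0.0, dW, 1.0)
    print(max(abs(u - v) for u, v in zip(X, E)))
```

The printed gaps shrink roughly like $C^i/i!$. This is the pathwise analogue of the bound $A^it^{i-1}/(i-1)!$.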
We also want to consider the SDE (24.1) when $\sigma$ and $b$ are Lipschitz functions, but not necessarily bounded. Note $|\sigma(x)| \le |\sigma(0)| + c|x|$, so that $|\sigma(x)| \le c(1 + |x|)$, and the same holds for $b$.
Theorem 24.3 Suppose σ and b are Lipschitz functions, but not necessarily bounded. Then
there exists a pathwise solution to (24.1) and this solution is pathwise unique.
Proof Let $\sigma_n$ and $b_n$ be bounded Lipschitz functions that agree with $\sigma$ and $b$, respectively, on $[-n, n]$. Let $X_n$ be the unique pathwise solution to (24.1) with $\sigma$ and $b$ replaced by $\sigma_n$ and $b_n$, respectively. Let $T_n = \inf\{t : |X_n(t)| \ge n\}$. We claim $X_n(t) = X_m(t)$ if $t \le T_n \wedge T_m$. To prove this, let $g(t) = E\sup_{s\le t\wedge T_n\wedge T_m}|X_n(s) - X_m(s)|^2$, and proceed as in the uniqueness part of the proof of Theorem 24.2. We then have existence and uniqueness of the SDE for $t \le T_n$ for each $n$.
To complete the proof, it suffices to show $T_n \to \infty$. Let
$$h_n(t) = E\sup_{s\le t\wedge T_n}|X_n(s)|^2.$$
Then
$$\begin{aligned}
h_n(t) &\le c|x_0|^2 + cE\Big(\int_0^t \sigma_n(X_n(s))\,dW_s\Big)^2 + cE\int_0^t b_n(X_n(s))^2\,ds\\
&\le c|x_0|^2 + cE\int_0^t \sigma_n(X_n(s))^2\,ds + ct_0E\int_0^t b_n(X_n(s))^2\,ds\\
&\le c|x_0|^2 + c + cE\int_0^t |X_n(s)|^2\,ds\\
&\le c + c\int_0^t h_n(s)\,ds,
\end{aligned}$$
using estimates very similar to those of the proof of Theorem 24.2. By Exercise 24.2, $h_n(t) \le ce^{ct}$ if $t \le t_0$. Note the constant $c$ can be chosen to be independent of $n$. Then
$$P(T_n < t_0) \le P\Big(\sup_{s\le t_0\wedge T_n}|X_n(s)| \ge n\Big) \le \frac{E\sup_{s\le t_0\wedge T_n}|X_n(s)|^2}{n^2} = \frac{h_n(t_0)}{n^2} \to 0$$
as $n \to \infty$. Since $t_0$ is arbitrary, $T_n \to \infty$, a.s.
Although we considered one-dimensional SDEs for simplicity, the same arguments apply when we have higher-dimensional SDEs. Let $W = (W^1, \dots, W^d)$ be a $d$-dimensional Brownian motion. Let $\sigma_{ij}(x)$ be bounded Lipschitz functions for $i = 1, \dots, n$ and $j = 1, \dots, d$, and let $b_i(x)$ be bounded Lipschitz functions for $i = 1, \dots, n$. Consider the system of equations
$$dX^i_t = \sum_{j=1}^d \sigma_{ij}(X_t)\,dW^j_t + b_i(X_t)\,dt, \qquad i = 1, \dots, n. \tag{24.8}$$
This is frequently written in matrix form
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \tag{24.9}$$
where we view:
- $X = (X^1, \dots, X^n)$ as an $n \times 1$ matrix;
- $b = (b_1, \dots, b_n)$ as an $n \times 1$ matrix-valued function;
- $W$ as a $d \times 1$ matrix;
- $\sigma$ as an $n \times d$ matrix-valued function.

We have existence and uniqueness for the system (24.8). Exercise 24.5 asks you to prove this in the case when $n = d$, although there is nothing at all special about requiring $n = d$.
24.2 One-dimensional SDEs
Our proof of pathwise existence and uniqueness was for SDEs in one dimension, but, as is pointed out in Exercise 24.5, almost the same proof works in higher dimensions. In this section we look at a pathwise uniqueness result that is valid only for SDEs on $\mathbb{R}$. The equation we look at is the same as the one in the last section, namely,
$$X_t = x_0 + \int_0^t \sigma(X_s)\,dW_s + \int_0^t b(X_s)\,ds. \tag{24.10}$$
Theorem 24.4 Suppose $b$ is bounded and Lipschitz. Suppose there exists a continuous function $\rho : [0,\infty) \to [0,\infty)$ such that $\rho(0) = 0$,
$$\int_0^\varepsilon \rho^{-2}(u)\,du = \infty \tag{24.11}$$
for all $\varepsilon > 0$, and $\sigma$ is bounded and satisfies
$$|\sigma(x) - \sigma(y)| \le \rho(|x - y|)$$
for all $x$ and $y$. Then the solution to (24.10) is pathwise unique.
For an example, let $b(x) = 0$ for all $x$, and let $\sigma$ be Hölder continuous of order $\alpha$. That is, there exists $c$ such that $|\sigma(x) - \sigma(y)| \le c|x - y|^\alpha$. Then we take $\rho(x) = cx^\alpha$. The integral condition (24.11) is satisfied if and only if $\alpha \ge 1/2$, since $\int_0^\varepsilon u^{-2\alpha}\,du = \infty$ exactly when $2\alpha \ge 1$. If (24.11) holds for all $\varepsilon > 0$, we say the Yamada–Watanabe condition holds.
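For $\rho(u) = u^\alpha$ the integral in (24.11) has a closed form, which makes the threshold $\alpha = 1/2$ visible numerically. The small Python sketch below takes the Hölder constant to be 1 and the upper limit to be $\varepsilon = 1$; both are assumptions made for illustration. It evaluates $\int_\delta^\varepsilon u^{-2\alpha}\,du$ as $\delta \downarrow 0$. The integral stays bounded for $\alpha < 1/2$. For $\alpha \ge 1/2$ it grows like $\log(1/\delta)$ or faster.

```python
import math


def yw_integral(alpha: float, delta: float, eps: float = 1.0) -> float:
    """Integral of rho(u)**-2 = u**(-2*alpha) over [delta, eps], rho(u) = u**alpha, 0 < delta < eps."""
    p = 1.0 - 2.0 * alpha
    if p == 0.0:  # alpha = 1/2: the integrand is 1/u
        return math.log(eps / delta)
    return (eps**p - delta**p) / p


if __name__ == "__main__":
    for alpha in (0.25, 0.5, 0.75):
        print(alpha, [round(yw_integral(alpha, d), 3) for d in (1e-2, 1e-4, 1e-8)])
```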
Instead of proving this theorem right away and then essentially repeating the proof to give
a comparison theorem, we will state and prove a comparison theorem (Theorem 24.5) and
then obtain Theorem 24.4 as a corollary of Theorem 24.5.
We only prove the uniqueness of the solution to (24.10) here. The existence is a conse-
quence of some measure-theoretic magic; see Revuz and Yor (1999), Theorem IX.1.7.

Theorem 24.5 Suppose $\sigma$ satisfies the conditions in Theorem 24.4. Suppose $X_t$ satisfies (24.10) with $b$ a Lipschitz function. Suppose $Y_t$ is a continuous semimartingale satisfying
$$Y_t \ge Y_0 + \int_0^t \sigma(Y_s)\,dW_s + \int_0^t B(Y_s)\,ds,$$
where $B$ is a Borel measurable function and $B(z) \ge b(z)$ for all $z$. If $Y_0 \ge x_0$, a.s., then $Y_t \ge X_t$ almost surely for all $t$.
Proof Let $a_n \downarrow 0$ be selected so that
$$\int_{a_n}^{a_{n-1}} \rho(u)^{-2}\,du = n.$$
This can be done inductively. Choose $a_0$ arbitrarily. Since $\int_r^{a_0}\rho(x)^{-2}\,dx$ increases to infinity as $r \to 0$, we can choose $a_1$ such that $\int_{a_1}^{a_0}\rho(x)^{-2}\,dx = 1$. In a similar manner we choose $a_2, a_3, \dots$. Let $h_n$ be continuous and supported in $(a_n, a_{n-1})$, with $0 \le h_n(u) \le 2/(n\rho^2(u))$ and $\int_{a_n}^{a_{n-1}} h_n(u)\,du = 1$ for each $n$. The idea here is to start with the function $(1 + \varepsilon)1_{(a_n, a_{n-1})}(u)/(n\rho(u)^2)$ for some small $\varepsilon$. We then modify it near $a_n$ and $a_{n-1}$ to get a function that is continuous, is supported in $(a_n, a_{n-1})$, and integrates to 1. Let $f_n$ be such that $f_n(0) = f_n'(0) = 0$ and $f_n'' = h_n$. Note
$$f_n'(u) = \int_0^u f_n''(s)\,ds = \int_0^u h_n(s)\,ds \le 1$$
and $f_n'(u) \ge 0$, so $0 \le f_n'(u) \le 1$ and $f_n'(u) = 1$ if $u \ge a_{n-1}$. Also $f_n(u) = 0$ for $u \le 0$. Hence $f_n(u) \uparrow u$ as $n \to \infty$ for each $u \ge 0$.
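The construction of $a_n$ and $h_n$ can be made concrete in the boundary case $\rho(u) = \sqrt{u}$, which is $\alpha = 1/2$ in the example after Theorem 24.4. There $\int_{a_n}^{a_{n-1}} du/u = \log(a_{n-1}/a_n) = n$, so $a_n = a_{n-1}e^{-n}$. The Python sketch below computes the $a_n$. It then builds one hypothetical $h_n$ following the "$(1+\varepsilon)$ times an indicator, modified near the endpoints" idea. This $h_n$ is a trapezoid in the variable $v = \log u$, scaled so that it integrates to 1. The sketch also checks numerically that $\int_{a_n}^{a_{n-1}}\rho^{-2} = n$, that $\int h_n = 1$, and that $h_n \le 2/(n\rho^2)$. The ramp fraction and grid size are arbitrary choices.

```python
import math


def a_seq(a0: float, n_max: int) -> list[float]:
    """a_0, ..., a_{n_max} for rho(u) = sqrt(u): the integral of du/u over
    [a_n, a_{n-1}] is log(a_{n-1}/a_n), so setting it equal to n gives a_n = a_{n-1} e^{-n}."""
    a = [a0]
    for n in range(1, n_max + 1):
        a.append(a[-1] * math.exp(-n))
    return a


def h(n: int, u: float, a: list[float], frac: float = 0.1) -> float:
    """Hypothetical h_n for rho(u) = sqrt(u): a trapezoid in v = log u with ramps of
    width frac*n at each end, times c/(n u) with c = 1/(1 - frac), so that h_n integrates to 1."""
    if u <= 0.0:
        return 0.0
    lo, hi = math.log(a[n]), math.log(a[n - 1])  # hi - lo = n
    v = math.log(u)
    if v <= lo or v >= hi:
        return 0.0
    ramp = frac * (hi - lo)
    shape = min(1.0, (v - lo) / ramp, (hi - v) / ramp)  # continuous, vanishes at both ends
    return shape / ((1.0 - frac) * n * u)


def integrate_geometric(fn, lo: float, hi: float, m: int = 20000) -> float:
    """Midpoint rule on a geometric grid, suited to integrands that behave like 1/u."""
    r = (hi / lo) ** (1.0 / m)
    total, u = 0.0, lo
    for _ in range(m):
        u_next = u * r
        total += fn(0.5 * (u + u_next)) * (u_next - u)
        u = u_next
    return total


if __name__ == "__main__":
    a = a_seq(1.0, 4)
    for n in range(1, 5):
        mass = integrate_geometric(lambda u: 1.0 / u, a[n], a[n - 1])
        h_mass = integrate_geometric(lambda u: h(n, u, a), a[n], a[n - 1])
        print(n, f"{a[n]:.3e}", round(mass, 6), round(h_mass, 4))
```

In this case $a_n = a_0e^{-n(n+1)/2}$, so the intervals $(a_n, a_{n-1})$ shrink toward 0 very quickly.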
Since $X_0 = x_0 \le Y_0$, we have $f_n(X_0 - Y_0) = 0$, and by Itô's formula
$$f_n(X_t - Y_t) = \text{martingale} + \int_0^t f_n'(X_s - Y_s)[b(X_s) - B(Y_s)]\,ds + \frac12\int_0^t f_n''(X_s - Y_s)[\sigma(X_s) - \sigma(Y_s)]^2\,ds. \tag{24.12}$$
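We take expectations of both sides. The martingale term has 0 expectation. The final term on the right-hand side is bounded in expectation by
$$\frac12E\int_0^t \frac{2}{n\,\rho(|X_s - Y_s|)^2}\,\rho(|X_s - Y_s|)^2\,ds \le \frac{t}{n}$$
by the assumptions on $\sigma$ and the bound on $f_n'' = h_n$, and so goes to 0 as $n \to \infty$. The expectation of the second term on the right of (24.12) is bounded above by
$$E\int_0^t f_n'(X_s - Y_s)[b(X_s) - b(Y_s)]\,ds + E\int_0^t f_n'(X_s - Y_s)[b(Y_s) - B(Y_s)]\,ds \le cE\int_0^t 1_{[0,\infty)}(X_s - Y_s)\,|X_s - Y_s|\,ds = cE\int_0^t (X_s - Y_s)^+\,ds.$$
Here the second expectation is nonpositive because $f_n' \ge 0$ and $B \ge b$. In the first we used $0 \le f_n' \le 1_{[0,\infty)}$ and the Lipschitz property of $b$.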

Letting $n \to \infty$,
$$E(X_t - Y_t)^+ \le c\int_0^t E(X_s - Y_s)^+\,ds.$$
If we set $g(t) = E(X_t - Y_t)^+$, we have
$$g(t) \le c\int_0^t g(s)\,ds,$$
and by Exercise 24.2 (with $A = 0$) we conclude $g(t) = 0$ for each $t$. Using the continuity of the paths of $X_t$ and $Y_t$ completes the proof.
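The hint to Exercise 24.2 is to substitute the inequality into itself repeatedly. After $k$ substitutions, a bound $M$ on $g$ becomes $M(ct)^k/k!$. With $A = 0$, this forces $g \equiv 0$. The Python sketch below is a numerical version of that iteration. The grid size, iteration count, and constants are arbitrary choices for illustration. It applies the map $T[g](t) = A + B\int_0^t g(s)\,ds$ (trapezoid rule) repeatedly to a bounded starting function. For $A > 0$ the iterates approach $Ae^{Bt}$, the bound in Gronwall's lemma. For $A = 0$ they collapse to 0.

```python
import math


def iterate_gronwall(A: float, B: float, t_max: float, g0, n_grid: int = 1000, n_iter: int = 40):
    """Apply T[g](t) = A + B * integral_0^t g(s) ds n_iter times to the starting function g0,
    on a uniform grid with the trapezoid rule. Returns the grid and the final iterate."""
    h = t_max / n_grid
    ts = [k * h for k in range(n_grid + 1)]
    g = [g0(t) for t in ts]
    for _ in range(n_iter):
        new, integral = [A], 0.0
        for k in range(1, n_grid + 1):
            integral += 0.5 * h * (g[k - 1] + g[k])
            new.append(A + B * integral)
        g = new
    return ts, g


if __name__ == "__main__":
    ts, g = iterate_gronwall(1.0, 2.0, 1.0, g0=lambda t: 5.0)
    print(max(abs(gk - math.exp(2.0 * t)) for t, gk in zip(ts, g)))  # iterates approach A e^{Bt}
    ts, g = iterate_gronwall(0.0, 2.0, 1.0, g0=lambda t: 1.0)
    print(max(g))  # with A = 0 the iterates collapse to 0
```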
We now prove Theorem 24.4.
Proof of Theorem 24.4 Suppose $X$ and $X'$ are two solutions to (24.10). Then by Theorem 24.5 with $Y = X'$ and $B = b$, we have $X_t \le X'_t$ for all $t$. Applying this argument with $X$ and $X'$ reversed yields $X'_t \le X_t$ for all $t$, which completes the proof.
24.3 Examples of SDEs
Ornstein–Uhlenbeck process
The Ornstein–Uhlenbeck process is the solution to the SDE
$$dX_t = dW_t - \frac{X_t}{2}\,dt, \qquad X_0 = x. \tag{24.13}$$
The existence and uniqueness follow by Theorem 24.3. Note that the drift coefficient is not
bounded, so Theorem 24.2 is not sufficient. The process behaves like a Brownian motion,
with a drift that pushes the process towards the origin; the farther the process gets from the
origin, the stronger the push.
The equation (24.13) can be solved explicitly. Rearranging, multiplying by $e^{t/2}$, and using the product rule,
$$d[e^{t/2}X_t] = e^{t/2}\,dX_t + e^{t/2}\frac{X_t}{2}\,dt = e^{t/2}\,dW_t,$$
so
$$e^{t/2}X_t = X_0 + \int_0^t e^{s/2}\,dW_s,$$
or
$$X_t = e^{-t/2}x + e^{-t/2}\int_0^t e^{s/2}\,dW_s. \tag{24.14}$$
We used here the fact that the martingale part of the semimartingale $Z_t = e^{t/2}$ is zero, and therefore $\langle Z, W\rangle_t = 0$. By Exercise 24.6, $X_t$ is a Gaussian process. The distribution of $X_t$ is that of a normal random variable with mean $e^{-t/2}x$ and variance equal to $e^{-t}\int_0^t (e^{s/2})^2\,ds = 1 - e^{-t}$.
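The mean $e^{-t/2}x$ and variance $1 - e^{-t}$ can be checked by simulation. The sketch below discretizes (24.13) with the Euler–Maruyama scheme. This is a standard discretization, not something taken from the text. It uses Python's standard-library random generator. The step count, sample size, and seed are arbitrary, so agreement is only up to Monte Carlo and discretization error. For $x = 2$ and $t = 1$ the exact values are $2e^{-1/2} \approx 1.213$ and $1 - e^{-1} \approx 0.632$.

```python
import math
import random


def ou_mean_var(x: float, t: float) -> tuple[float, float]:
    """Mean and variance of X_t for (24.13), read off from (24.14)."""
    return math.exp(-t / 2) * x, 1.0 - math.exp(-t)


def ou_euler(x: float, t: float, n_steps: int, n_paths: int, seed: int = 0) -> list[float]:
    """Euler–Maruyama samples of X_t for dX = dW - (X/2) dt, X_0 = x."""
    rng = random.Random(seed)
    h = t / n_steps
    sqrt_h = math.sqrt(h)
    samples = []
    for _ in range(n_paths):
        X = x
        for _ in range(n_steps):
            X += sqrt_h * rng.gauss(0.0, 1.0) - 0.5 * X * h
        samples.append(X)
    return samples


if __name__ == "__main__":
    xs = ou_euler(2.0, 1.0, n_steps=100, n_paths=10_000)
    m = sum(xs) / len(xs)
    v = sum((y - m) ** 2 for y in xs) / (len(xs) - 1)
    print("simulated mean, variance:", round(m, 3), round(v, 3))
    print("exact mean, variance:    ", ou_mean_var(2.0, 1.0))
```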
If we let $Y_t = \int_0^t e^{s/2}\,dW_s$ and $V_t = Y_{\log(t+1)}$, then $Y_t$ is a mean-zero continuous Gaussian process with independent increments, and hence so is $V_t$. Since
$$\operatorname{Var}(V_u - V_t) = \int_{\log(t+1)}^{\log(u+1)} e^s\,ds = u - t,$$
$V_t$ is a Brownian motion. Hence
$$X_t = e^{-t/2}x + e^{-t/2}V(e^t - 1).$$
This representation of an Ornstein–Uhlenbeck process in terms of a Brownian motion is
useful for, among other things, calculating the exit probabilities of a square root boundary.
Linear equations
We consider the linear equation
$$dX_t = AX_t\,dW_t + BX_t\,dt, \qquad X_0 = x_0, \tag{24.15}$$
where A and B are constants. One place this comes up is in models of stock prices in financial
mathematics; see Chapter 28. We have pathwise existence and uniqueness by Theorem 24.3;
here both the diffusion and drift coefficients are unbounded.
We will give a candidate for the solution, and verify that it solves (24.15). By the pathwise
uniqueness, this will then be the only solution. Our candidate is
$$X_t = x_0e^{AW_t + (B - A^2/2)t}.$$
To verify that this is a solution, we use Itô's formula with the process $AW_t + (B - A^2/2)t$ and the function $x_0e^x$:
$$\begin{aligned}
X_t &= x_0 + \int_0^t x_0e^{AW_s + (B - A^2/2)s}A\,dW_s + \int_0^t x_0e^{AW_s + (B - A^2/2)s}(B - A^2/2)\,ds\\
&\qquad + \frac12\int_0^t x_0e^{AW_s + (B - A^2/2)s}A^2\,ds\\
&= x_0 + \int_0^t x_0e^{AW_s + (B - A^2/2)s}A\,dW_s + \int_0^t x_0e^{AW_s + (B - A^2/2)s}B\,ds\\
&= x_0 + \int_0^t AX_s\,dW_s + \int_0^t BX_s\,ds.
\end{aligned}$$
Let us summarize our discussion.
Proposition 24.6 The unique pathwise solution to
$$dX_t = AX_t\,dW_t + BX_t\,dt$$
is
$$X_t = X_0e^{AW_t + (B - A^2/2)t}.$$
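Proposition 24.6 can be sanity-checked pathwise. Drive an Euler–Maruyama discretization of (24.15) and the closed form $x_0e^{AW_t + (B - A^2/2)t}$ with the same simulated Brownian increments, and compare them at a final time. In the Python sketch below, the constants and seed are arbitrary choices. The Euler scheme is a standard approximation, not part of the text, and its pathwise error shrinks roughly like the square root of the step size. The sketch also returns a "naive" value with the Itô correction $-A^2/2$ dropped. Its discrepancy, a factor $e^{A^2T/2}$, does not shrink as the step size goes to zero.

```python
import math
import random


def gbm_compare(x0: float, A: float, B: float, T: float, n_steps: int, seed: int = 0):
    """Return (euler, exact, naive) values at time T for dX = A X dW + B X dt, all using
    one simulated Brownian path. 'naive' omits the Ito correction -A**2/2 on purpose."""
    rng = random.Random(seed)
    h = T / n_steps
    W, X = 0.0, x0
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(h))
        X += A * X * dW + B * X * h  # Euler–Maruyama step
        W += dW
    exact = x0 * math.exp(A * W + (B - A * A / 2) * T)  # Proposition 24.6
    naive = x0 * math.exp(A * W + B * T)
    return X, exact, naive


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        euler, exact, naive = gbm_compare(1.0, 0.3, 0.1, 1.0, n, seed=3)
        print(n, abs(euler - exact), abs(euler - naive))
```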
If we write $Z_t = AW_t + Bt$, then (24.15) becomes
$$dX_t = X_t\,dZ_t, \qquad X_0 = x_0. \tag{24.16}$$
The equation (24.16) makes sense for arbitrary continuous semimartingales $Z$. Using Itô's formula as above, one can see that a solution is $X_t = x_0e^{Z_t - \langle Z\rangle_t/2}$.

Bessel processes
We consider Bessel processes and the squares of Bessel processes. The reason for the name
is that these processes turn out to be Markov processes and the infinitesimal generator of the
semigroup (see Chapter 37) is related to Bessel’s equation, a type of differential equation.
A Bessel process of order $\nu \ge 2$ is defined to be a solution of the SDE
$$dX_t = dW_t + \frac{\nu - 1}{2X_t}\,dt, \qquad X_0 = x. \tag{24.17}$$
Bessel processes of order $0 \le \nu < 2$ can also be defined using (24.17), but only up until the first time the process $X$ reaches 0. Some extra information needs to be given as to what the process does at 0. The square of a Bessel process of order $\nu \ge 0$ is defined to be the solution to the SDE
$$dY_t = 2\sqrt{|Y_t|}\,dW_t + \nu\,dt, \qquad Y_0 = y. \tag{24.18}$$
There is no difficulty defining the square of a Bessel process for $0 \le \nu < 2$. By Theorem 24.4 we have pathwise uniqueness for the solution to (24.18), because $\big||y|^{1/2} - |x|^{1/2}\big| \le |y - x|^{1/2}$, and we can thus take $\rho(u) = 2u^{1/2}$ in Theorem 24.4. The solution to (24.18) when $\nu = 0$ and $y = 0$ is clearly $Y_t = 0$ for all $t$. Apply Theorem 24.5 with $X \equiv 0$, $b(x) = 0$, and $B(x) = \nu$: the solution to (24.18) with $y \ge 0$ is greater than or equal to 0 for all $t$. We may thus omit the absolute value in (24.18) and rewrite it as
$$dY_t = 2\sqrt{Y_t}\,dW_t + \nu\,dt, \qquad Y_0 = y. \tag{24.19}$$
Apply Itô's formula to the solution $Y_t$ of (24.19) with the function $\sqrt{x}$. This shows that $X_t = \sqrt{Y_t}$ solves (24.17) for $t$ up until the first time $Y$ reaches 0; the function $\sqrt{x}$ is twice continuously differentiable as long as we stay away from 0. We will see shortly that the square of a Bessel process started away from 0 never hits 0 if and only if $\nu \ge 2$. Using Itô's formula with a $d$-dimensional Brownian motion $W_t$ and the function $|x|^2$ shows that the square of the modulus of a $d$-dimensional Brownian motion is the square of a Bessel process of order $d$; this is Exercise 24.7.
Bessel processes have the same scaling properties as Brownian motion. That is, if $X_t$ is a Bessel process of order $\nu$ started at $x$, then $aX_{a^{-2}t}$ is a Bessel process of order $\nu$ started at $ax$. In fact, from (24.17),
$$d(aX_{a^{-2}t}) = a\,dW_{a^{-2}t} + a^2\frac{\nu - 1}{2aX_{a^{-2}t}}\,d(a^{-2}t).$$
The assertion follows from the uniqueness of the solution to (24.17) and the fact that $aW(a^{-2}t)$ is again a Brownian motion. Bessel processes are useful for comparison purposes, and so the following is worthwhile.
Proposition 24.7 Suppose $Y_t$ is the square of a Bessel process of order $\nu$. Suppose $Y_0 = y$. The following hold with probability one.
(1) If $\nu > 2$ and $y > 0$, $Y_t$ never hits 0.
(2) If $\nu = 2$ and $y > 0$, $Y_t$ hits every neighborhood of 0, but never hits the point 0.
(3) If $0 < \nu < 2$, $Y_t$ hits 0.
(4) If $\nu = 0$, then $Y_t$ hits 0. If started at 0, then $Y_t$ remains at 0 forever.
When we say that $Y_t$ hits 0, we consider only times $t > 0$. We define $T_0 = \inf\{t > 0 : Y_t = 0\}$ and say that $Y_t$ hits 0 if $T_0 < \infty$.
Proof We prove (2). Apply Itô's formula with the process being the square of a Bessel process of order 2 and the function being $\log x$. This shows that $\log Y_t$ is a martingale up until the first hitting time of 0; cf. Exercise 21.1. Indeed, $d\log Y_t = 2Y_t^{-1/2}\,dW_t$, so the quadratic variation of $\log Y_t$ is $4\int_0^t Y_s^{-1}\,ds$ for $t$ less than the hitting time of 0. Suppose $0 < a < y < b$. We claim that $Y_t$ leaves the interval $[a, b]$, a.s. If not, $\langle\log Y\rangle_t \ge 4t/b \to \infty$ as $t \to \infty$. Since $\log Y_t$ is a martingale, it is a time change of Brownian motion. Brownian motion leaves $[\log a, \log b]$ with probability one, a contradiction. Then by Corollary 3.17,
$$P(Y_t \text{ hits } a \text{ before } b) = \frac{\log b - \log y}{\log b - \log a}. \tag{24.20}$$
Letting $b \to \infty$, we see that $P(Y_t \text{ hits } a) = 1$, and since $a$ is arbitrary, $Y_t$ hits every neighborhood of 0. If in (24.20) we hold $b$ fixed instead and let $a \to 0$, we see $P(Y_t \text{ hits } 0 \text{ before } b) = 0$. Since $b$ is arbitrary, this proves that $Y_t$ never hits the point 0. Parts (1), (3), and (4) are similar, but instead of $\log|x|$ we use $|x|^{(2-\nu)/2}$. The details are left as Exercise 24.8.
Exercises
24.1 Show that $\|\cdot\|_t$ defined by (24.7) gives rise to a complete normed linear space.
24.2 Suppose $g(t)$ is non-negative and bounded on each finite subinterval of $[0,\infty)$. Suppose there exist constants $A$ and $B$ such that
$$g(t) \le A + B\int_0^t g(s)\,ds \tag{24.21}$$
for each $t \ge 0$. Prove that $g(t) \le Ae^{Bt}$ for all $t \ge 0$. This result is known as Gronwall's lemma.
Hint: Write $g(t) \le A + B\int_0^t \big[A + B\int_0^s g(r)\,dr\big]\,ds$, use (24.21) to substitute for $g(r)$, and iterate.
24.3 The starting point in (24.1) can be random. Suppose $Y$ is a random variable that is measurable with respect to $\mathcal{F}_0$, $Y$ is square integrable, and $\sigma$ and $b$ are bounded and Lipschitz. Prove pathwise existence and uniqueness for the equation
$$X_t = Y + \int_0^t \sigma(X_s)\,dW_s + \int_0^t b(X_s)\,ds.$$
24.4 The functions $\sigma$ and $b$ in (24.1) can depend on time as well as space.
Suppose $\sigma : [0,\infty)\times\mathbb{R} \to \mathbb{R}$ and $b : [0,\infty)\times\mathbb{R} \to \mathbb{R}$ are bounded and uniformly Lipschitz in the second variable. That is, there exists $c$ independent of $s$ such that $|\sigma(s, x) - \sigma(s, y)| \le c|x - y|$, and similarly for $b$. Prove pathwise existence and uniqueness for the equation
$$X_t = x_0 + \int_0^t \sigma(s, X_s)\,dW_s + \int_0^t b(s, X_s)\,ds.$$
24.5 Here is a multidimensional analog of (24.1). Suppose that:
- the functions $\sigma_{ij} : \mathbb{R}^d \to \mathbb{R}$, $1 \le i, j \le d$, are bounded and Lipschitz;
- the functions $b_i : \mathbb{R}^d \to \mathbb{R}$, $i = 1, \dots, d$, are bounded and Lipschitz;
- $W^j$ are independent one-dimensional Brownian motions;
- $x_0 = (x_0^{(1)}, \dots, x_0^{(d)})$, and $X_t = (X_t^{(1)}, \dots, X_t^{(d)})$ satisfies
$$X_t^{(i)} = x_0^{(i)} + \int_0^t \sum_{j=1}^d \sigma_{ij}(X_s)\,dW_s^j + \int_0^t b_i(X_s)\,ds \tag{24.22}$$
for $i = 1, \dots, d$.

Prove pathwise existence and uniqueness for this system of equations.
24.6 Suppose $f$ and $g$ map $[0,\infty) \to \mathbb{R}$ with $\int_0^\infty f(t)^2\,dt < \infty$ and $\int_0^\infty g(t)^2\,dt < \infty$. Show that $\int_0^\infty f(t)\,dW_t$ is a mean zero Gaussian random variable, that the same holds with $f$ replaced by $g$, and that
$$\operatorname{Cov}\Big(\int_0^\infty f(t)\,dW_t, \int_0^\infty g(t)\,dW_t\Big) = \int_0^\infty f(t)g(t)\,dt.$$
Hint: Approximate $f$ and $g$ by piecewise constant deterministic functions.
24.7 Show that if $W_t$ is a $d$-dimensional Brownian motion, then $|W_t|^2$ is the square of a Bessel process of order $d$.
24.8 Prove (1), (3), and (4) of Proposition 24.7.
24.9 Let $X$ be the solution to $dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt$, where $W$ is a one-dimensional Brownian motion, $\sigma$ and $b$ are Lipschitz continuous real-valued functions, and $|\sigma(x)| \le c(1 + |x|)$ and $|b(x)| \le c(1 + |x|)$. Let $t_0 > 0$. Prove that if $p \ge 2$, then
$$E\Big[\sup_{s\le t_0}|X_s|^p\Big] \le c(1 + |x_0|^p).$$
24.10 Let $W$ be a one-dimensional Brownian motion and let $X_t^x$ be the solution to
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad X_0 = x.$$
Suppose $\sigma$ and $b$ are $C^\infty$ functions and that $\sigma$ and $b$ and all their derivatives are bounded. Show that for each $t$ the map $x \mapsto X_t^x$ is continuous in $x$ with probability one. Show that the map is differentiable in $x$.
24.11 Suppose $A(t)$ and $B(t)$ are deterministic functions of $t$. Find an explicit solution to the one-dimensional SDE
$$dX_t = A(t)\,dW_t + B(t)\,dt, \qquad X_0 = x.$$
Notes
If one wants to have a stochastic differential equation with jumps, one integrates with respect to a Poisson point process (defined in Chapter 18) in addition to a Brownian motion. Using the notation of that chapter, one considers the stochastic differential equation
$$dX_t = \sigma(X_{t-})\,dW_t + b(X_{t-})\,dt + \int_S F(X_{t-}, z)\,(\mu(dt, dz) - \nu(dt, dz)), \qquad X_0 = x_0,$$
which means that we want a solution to
$$X_t = x_0 + \int_0^t \sigma(X_{s-})\,dW_s + \int_0^t b(X_{s-})\,ds + \int_0^t\int_S F(X_{s-}, z)\,(\mu(ds, dz) - \nu(ds, dz)).$$
There is pathwise existence and uniqueness to this SDE provided $F$ satisfies a suitable Lipschitz-like condition; see Skorokhod (1965).

25
Weak solutions of SDEs
In Chapter 24 we considered SDEs of the form
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \tag{25.1}$$
where $W$ is a Brownian motion and $\sigma$ and $b$ are Lipschitz functions. In one dimension we also allowed $\sigma$ to have a modulus of continuity satisfying an integral condition. When the coefficients $\sigma$ and $b$ fail to be sufficiently smooth, (25.1) may have no pathwise solution at all, or the pathwise solution may fail to be unique. We define another notion of existence and uniqueness that is useful.
Definition 25.1 A weak solution $(X, W, P)$ to (25.1) exists if there exists a probability measure $P$ and a pair of processes $(X_t, W_t)$ such that $W_t$ is a Brownian motion under $P$ and (25.1) holds. Weak uniqueness holds for (25.1) if, whenever $(X, W, P)$ and $(X', W', P')$ are two weak solutions, the joint law of $(X, W)$ under $P$ and the joint law of $(X', W')$ under $P'$ are equal. When this happens, we also say that the solution to (25.1) is unique in law.
Let us discuss the relationship between weak solutions and pathwise solutions. If the solution to (25.1) is pathwise unique, then weak uniqueness holds. For a proof of this result under very general hypotheses, see Revuz and Yor (1999), Theorem IX.1.7. In the case that $\sigma$ and $b$ are Lipschitz functions, the proof is much simpler.
Proposition 25.2 Suppose σ and b are bounded Lipschitz functions and x0 ∈ Rd . Then weak
uniqueness holds for (25.1).
Proof For notational simplicity we consider the case of dimension one. Suppose $(X, W, P)$ and $(X', W', P')$ are two weak solutions to (25.1). Let $X_0(t) = x_0$ and define $X_{i+1}(t)$ by
$$X_{i+1}(t) = x_0 + \int_0^t \sigma(X_i(s))\,dW_s + \int_0^t b(X_i(s))\,ds. \tag{25.2}$$
We saw in the proof of Theorem 24.2 that the limit of the $X_i$ exists, uniformly over finite time intervals, and solves (25.1), and that the solution is pathwise unique. Since $X$ also solves (25.1), we conclude that $X_i$ converges (uniformly over finite time intervals) to $X$, a.s., with respect to $P$. Similarly, if we let $X'_0(t) = x_0$ and define $X'_{i+1}(t)$ by
$$X'_{i+1}(t) = x_0 + \int_0^t \sigma(X'_i(s))\,dW'_s + \int_0^t b(X'_i(s))\,ds, \tag{25.3}$$
then $X'_i$ converges, uniformly over finite time intervals, to $X'$.
Now since $W$ is a Brownian motion under $P$ and $W'$ is a Brownian motion under $P'$, the law of $(X_0, W)$ under $P$ equals the law of $(X'_0, W')$ under $P'$. By (25.2) and (25.3), the law of $(X_1, W)$ under $P$ equals the law of $(X'_1, W')$ under $P'$. Iterating, the law of $(X_i, W)$ under $P$ equals the law of $(X'_i, W')$ under $P'$. Passing to the limit, the law of $(X, W)$ under $P$ equals the law of $(X', W')$ under $P'$.
We now give an example to show that weak uniqueness might hold even if pathwise uniqueness does not. Let $\sigma(x)$ be equal to 1 if $x \ge 0$ and $-1$ otherwise. We take $b$ to be identically 0. We consider solutions to
$$X_t = \int_0^t \sigma(X_s)\,dW_s. \tag{25.4}$$
Weak uniqueness holds for the following reason. If $W$ is a Brownian motion under $P$, then $X_t$ must be a martingale, and the quadratic variation of $X$ is $d\langle X\rangle_t = \sigma(X_t)^2\,dt = dt$. By Lévy's theorem (Theorem 12.1), $X_t$ is a Brownian motion. Given a Brownian motion $X_t$, let
$$W_t = \int_0^t \frac{1}{\sigma(X_s)}\,dX_s.$$
Then, again by Lévy's theorem, $W_t$ is a Brownian motion and $X_t = \int_0^t \sigma(X_s)\,dW_s$; thus weak solutions exist.
On the other hand, pathwise uniqueness does not hold. To see this, let $Y_t = -X_t$. We have
$$Y_t = \int_0^t \sigma(Y_s)\,dW_s - 2\int_0^t 1_{\{0\}}(X_s)\,dW_s. \tag{25.5}$$
The second term on the right has quadratic variation $4\int_0^t 1_{\{0\}}(X_s)\,ds$. This is 0 almost surely, because we showed in Exercise 11.1 that the amount of time Brownian motion spends at 0 has Lebesgue measure 0. Therefore $Y$ is another pathwise solution to (25.4).
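The role of the set $\{X = 0\}$ in (25.5) is easy to see on a grid. In the Python sketch below, the step count and seed are arbitrary choices:
- $X$ is a simulated Brownian path started at 0.
- $W$ is built from left-endpoint sums for $\int \sigma(X_s)^{-1}\,dX_s$ (here $1/\sigma = \sigma$).
- The stochastic integrals in (25.4) are approximated the same way.

The discrete integral for $X$ reproduces $X$. For $Y = -X$, the only discrepancy comes from the one grid time at which $X = 0$, namely $t = 0$ (almost surely no later grid value is exactly 0). This discrepancy equals $2\Delta X_0$, a discrete version of the last term in (25.5), and it vanishes as the grid is refined.

```python
import math
import random


def sigma(x: float) -> float:
    """sigma(x) = 1 if x >= 0 and -1 otherwise, as in the example; note sigma(-0.0) = 1 too."""
    return 1.0 if x >= 0 else -1.0


def tanaka_on_grid(n: int, t: float, seed: int = 0):
    rng = random.Random(seed)
    X = [0.0]
    for _ in range(n):
        X.append(X[-1] + rng.gauss(0.0, math.sqrt(t / n)))  # Brownian path, X_0 = 0
    dX = [X[k + 1] - X[k] for k in range(n)]
    dW = [sigma(X[k]) * dX[k] for k in range(n)]  # W = int (1/sigma(X)) dX, and 1/sigma = sigma
    Y = [-x for x in X]
    int_X = sum(sigma(X[k]) * dW[k] for k in range(n))  # left-point sum for int sigma(X) dW
    int_Y = sum(sigma(Y[k]) * dW[k] for k in range(n))  # left-point sum for int sigma(Y) dW
    return X[-1], Y[-1], int_X, int_Y, dX[0]


if __name__ == "__main__":
    for n in (100, 10_000):
        x_end, y_end, int_x, int_y, dx0 = tanaka_on_grid(n, 1.0, seed=0)
        print(n, int_x - x_end, int_y - y_end, 2 * dx0)
```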
This example is not satisfying because one would like $\sigma$ to be positive and even continuous if possible. Such examples exist, however. For each $\beta < 1/2$, Barlow (1982) has constructed functions $\sigma$ that are Hölder continuous of order $\beta$ and bounded above and below by positive constants and for which
$$dX_t = \sigma(X_t)\,dW_t, \qquad X_0 = x_0, \tag{25.6}$$
has a unique weak solution but no pathwise solution exists.
Let us show how the technique of time change can be used to study weak uniqueness. We consider the SDE (25.6).
Proposition 25.3 If $\sigma$ is Borel measurable and there exist $c_2 > c_1 > 0$ such that $c_1 \le \sigma(x) \le c_2$ for all $x$, then weak existence and weak uniqueness hold for (25.6).
Proof We consider only uniqueness, leaving existence as Exercise 25.1. Suppose $(X, W, P)$ and $(X', W', P')$ are two weak solutions. Then $X_t$ is a martingale, and, as in Section 12.2, we set
$$A_t = \int_0^t \sigma(X_s)^2\,ds, \qquad \tau_t = \inf\{s : A_s \ge t\}.$$
Then $M_t = X_{\tau_t}$ is a Brownian motion under $P$. Define $A'$, $\tau'$, and $M'$ analogously. The law of $M$ under $P$ is that of a Brownian motion, as is that of $M'$ under $P'$.

Now let
$$B_t = \int_0^t \frac{1}{\sigma(M_s)^2}\,ds, \qquad \rho_t = \inf\{s : B_s \ge t\}. \tag{25.7}$$
Since $M_t$ is a Brownian motion and $\sigma$ is bounded above and below by positive constants, $B_t$ is continuous, strictly increasing, and increases to infinity as $t \to \infty$. The same is therefore true of $\rho_t$. By a change of variables,
$$B_t = \int_0^t \frac{1}{\sigma(X_{\tau_s})^2}\,ds = \int_0^{\tau_t}\frac{1}{\sigma(X_u)^2}\,dA_u = \int_0^{\tau_t}\frac{1}{\sigma(X_u)^2}\,\sigma(X_u)^2\,du = \tau_t.$$
Therefore $M_{\rho_t} = X_{\tau(\rho_t)} = X_t$. We have the analogous formulas with primes.
The law of $M$ under $P$ equals the law of $M'$ under $P'$, since both are Brownian motions. So by (25.7) the law of $(M, B)$ under $P$ equals the law of $(M', B')$ under $P'$, and consequently the law of $(M, \rho)$ under $P$ equals the law of $(M', \rho')$ under $P'$. Since $X_t = M_{\rho_t}$, and similarly for $X'$, we conclude the law of $X$ under $P$ equals the law of $X'$ under $P'$. Finally, since
$$W_t = \int_0^t \frac{1}{\sigma(X_s)}\,dX_s$$
and similarly for $W'$, the joint law of $(X, W)$ under $P$ equals the joint law of $(X', W')$ under $P'$.
We point out that in the above proof it is essential that one can reconstruct X from M in a
measurable way.
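Read in reverse, the proof suggests how to build a solution of (25.6) from a Brownian motion $M$:
1. Form $B_t = \int_0^t \sigma(M_s)^{-2}\,ds$.
2. Invert it to get $\rho_t$.
3. Set $X_t = M_{\rho_t}$.

Making this rigorous is essentially Exercise 25.1. The Python sketch below carries out these steps on a grid, with $x_0 = 0$. It uses a hypothetical discontinuous coefficient, $\sigma = 2$ on $[0,\infty)$ and $\sigma = 1$ on $(-\infty, 0)$, which satisfies the hypotheses of Proposition 25.3 with $c_1 = 1$ and $c_2 = 2$. Grid sizes and the seed are arbitrary. As a rough check, the realized quadratic variation of $X$ on $[0, 1]$ should be close to $\int_0^1 \sigma(X_t)^2\,dt$.

```python
import bisect
import math
import random


def sigma(x: float) -> float:
    """Hypothetical Borel coefficient with 1 <= sigma <= 2, discontinuous at 0."""
    return 2.0 if x >= 0 else 1.0


def time_changed_solution(T: float = 1.0, n_out: int = 1000, dm: float = 1e-4, seed: int = 0):
    """Grid version of X_t = M_{rho_t}, rho_t = inf{s : B_s >= t}, B_s = int_0^s sigma(M_u)^-2 du."""
    c2 = 2.0
    n_m = int(1.25 * c2**2 * T / dm)  # rho_T <= c2^2 T, so M is needed up to about that time
    rng = random.Random(seed)
    M = [0.0]
    for _ in range(n_m):
        M.append(M[-1] + rng.gauss(0.0, math.sqrt(dm)))
    B = [0.0]
    for k in range(n_m):
        B.append(B[-1] + dm / sigma(M[k]) ** 2)  # left-point rule for B
    ts = [T * j / n_out for j in range(n_out + 1)]
    X = [M[bisect.bisect_left(B, t)] for t in ts]  # first grid index with B >= t
    return ts, X


if __name__ == "__main__":
    ts, X = time_changed_solution()
    qv = sum((X[j + 1] - X[j]) ** 2 for j in range(len(X) - 1))
    target = sum(sigma(X[j]) ** 2 * (ts[j + 1] - ts[j]) for j in range(len(X) - 1))
    print(round(qv, 3), round(target, 3))
```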
We now use the Girsanov theorem to prove weak uniqueness for (25.1).
Proposition 25.4 Suppose $\sigma$ and $b$ are measurable and bounded above and $\sigma$ is bounded below by a positive constant. Then weak existence and uniqueness hold for (25.1).
Proof We prove the weak uniqueness, leaving it as Exercise 25.2 to prove existence. Define $\{\mathcal{F}_t\}$ to be the minimal augmented filtration generated by $X$, let
$$M_t = \exp\Big(-\int_0^t \frac{b}{\sigma}(X_s)\,dW_s - \frac12\int_0^t\Big(\frac{b}{\sigma}(X_s)\Big)^2ds\Big),$$
and let $Q$ be the probability measure defined by $Q(A) = E_P[M_t; A]$ if $A \in \mathcal{F}_t$. By Theorem 13.3, under $Q$ the process $\widetilde W_t = W_t + \int_0^t (b/\sigma)(X_s)\,ds$ is a Brownian motion, and
$$dX_t = \sigma(X_t)\Big(dW_t + \frac{b}{\sigma}(X_t)\,dt\Big) = \sigma(X_t)\,d\widetilde W_t.$$
Define $M'$, $Q'$, and $\widetilde W'$ analogously. By Proposition 25.3 the law of $(X, \widetilde W)$ under $Q$ is equal to the law of $(X', \widetilde W')$ under $Q'$. Let $n \ge 1$ and $t_1 < \cdots < t_n$, and let $A_1, \dots, A_n$ be Borel subsets of $\mathbb{R}$. Set $B = \{X_{t_1} \in A_1, \dots, X_{t_n} \in A_n\}$ and define $B'$ analogously. Taking $t = t_n$, so that $B \in \mathcal{F}_t$, we have
$$\begin{aligned}
P(B) &= \int_B \frac{dP}{dQ}\,dQ = \int_B \exp\Big(\int_0^t \frac{b}{\sigma}(X_s)\,dW_s + \frac12\int_0^t\Big(\frac{b}{\sigma}(X_s)\Big)^2ds\Big)\,dQ\\
&= \int_B \exp\Big(\int_0^t \frac{b}{\sigma}(X_s)\,d\widetilde W_s - \frac12\int_0^t\Big(\frac{b}{\sigma}(X_s)\Big)^2ds\Big)\,dQ.
\end{aligned}$$
Use the analogous formula for $P'(B')$ and the fact that the law of $(X, \widetilde W)$ under $Q$ is the same as that of $(X', \widetilde W')$ under $Q'$. We see that $P(B) = P'(B')$. Thus the finite-dimensional distributions of $X$ under $P$ and of $X'$ under $P'$ are the same. Since both $X$ and $X'$ are continuous processes, we conclude from Theorem 2.6 that the law of $X$ under $P$ equals the law of $X'$ under $P'$. Define $Y_t = X_t - \int_0^t b(X_s)\,ds$, and similarly for $Y'$. Then the joint law of $(X, Y)$ under $P$ equals the joint law of $(X', Y')$ under $P'$. Finally, $W_t = \int_0^t \frac{1}{\sigma(X_s)}\,dY_s$, and similarly for $W'$, so we obtain our conclusion.
The procedure of using the Girsanov theorem to get rid of the drift also works in higher dimensions. However, the time change procedure of Proposition 25.3 is not nearly as useful in higher dimensions as in one dimension. The question of weak uniqueness for the system of equations in Exercise 24.5 is quite an interesting one; see Bass (1997) and Stroock and Varadhan (1977).
Exercises
25.1 Show weak existence holds under the hypotheses of Proposition 25.3.
25.2 Show weak existence holds under the hypotheses of Proposition 25.4.
25.3 Here is an example of an SDE where weak uniqueness does not hold. Suppose $W$ is a one-dimensional Brownian motion and $\alpha \in (0, \frac12)$. Let $\sigma(x) = 1 \wedge |x|^\alpha$. Find two solutions to $dX_t = \sigma(X_t)\,dW_t$ that are not equal in law.
Hint: One is the solution that is identically zero. The other can be constructed by time changing a Brownian motion by the inverse of the increasing process $2\int_0^t (1 \wedge |X_s|^{2\alpha})^{-1}\,ds$.
25.4 (1) Suppose $a_s$ and $b_s$ are bounded predictable processes with $a_s$ bounded below by a positive constant.
Let $W$ be a one-dimensional Brownian motion. Suppose $Y$ is a one-dimensional semimartingale such that $dY_t = a_tY_t\,dW_t + b_t\,dt$, $Y_0 = 0$. Prove that if $t_0 > 0$ and $\varepsilon > 0$, there exists a constant $c > 0$, depending only on $t_0$, $\varepsilon$, and the bounds on $a_s$ and $b_s$, such that
$$P\Big(\sup_{s\le t_0}|Y_s| < \varepsilon\Big) > c.$$
(2) Now let $W$ be $d$-dimensional Brownian motion and let $x \in \mathbb{R}^d$. Let $\sigma$ be a $d \times d$ matrix-valued function that is bounded and such that $\sigma\sigma^T(x)$ is positive definite, uniformly in $x$. That is, there exists $\lambda > 0$ such that for all $x$,
$$\sum_{i,j=1}^d y_iy_j(\sigma(x)\sigma^T(x))_{ij} \ge \lambda\sum_{i=1}^d y_i^2, \qquad (y_1, \dots, y_d) \in \mathbb{R}^d.$$
Let $b$ be a $d \times 1$ matrix-valued function that is bounded. Let $X$ be the solution to
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad X_0 = x.$$

Use Itô's formula to find an equivalent expression for $|X_t - x|^2$. Then use (1) to prove the following: if $t_0 > 0$ and $\varepsilon > 0$, there exists a constant $c > 0$ not depending on $x$ such that
$$P^x\Big(\sup_{s\le t_0}|X_s - x| < \varepsilon\Big) > c.$$
25.5 This is the support theorem for solutions to SDEs. Let $X$, $x$, $\varepsilon$, and $t_0$ be as in (2) of Exercise 25.4. Suppose $\psi : [0, t_0] \to \mathbb{R}^d$ is a continuous function with $\psi(0) = x$. Use the Girsanov theorem to prove that there exists $c > 0$ such that
$$P^x\Big(\sup_{s\le t_0}|X_s - \psi(s)| < \varepsilon\Big) > c.$$
25.6 Suppose weak uniqueness holds for the one-dimensional stochastic differential equation
$$dX_t = \sigma(X_t)\,dW_t, \qquad X_0 = x, \tag{25.8}$$
where $W$ is a one-dimensional Brownian motion. Suppose also that there exists a process $X'$ that is adapted to the minimal augmented filtration of $W$ with $X'_0 = x$ and $dX'_t = \sigma(X'_t)\,dW_t$. Prove that pathwise uniqueness holds for (25.8).
Hint: Show there exists a measurable map $F$ from $C[0,\infty) \to C[0,\infty)$ such that $X' = F(W)$. If $X''$ is another solution to (25.8), then weak uniqueness shows that the laws of $(X'', W)$ and $(X', W)$ are equal, hence $X'' = F(W) = X'$.

26
The Ray–Knight theorems
The local time of Brownian motion, $L^x_t$, is parameterized by space and time: $x$ and $t$. Ray and Knight independently discovered that at certain stopping times $T$, the process $x \mapsto L^x_T$ is a Markov process.
Times that work are:
(1) the first time local time at 0 reaches a level $r$;
(2) an exponential random variable $T$ that is independent of the Brownian motion;
(3) the first time $T$ that Brownian motion reaches the level one.
We will prove the version of the Ray–Knight theorems in the last case. We will show the following: if $W$ is a Brownian motion with local times $L^x_t$ and
$$T = \inf\{t > 0 : W_t = 1\},$$
then the process $L^{1-x}_T$ indexed by $x$ has the same law as the square of a Bessel process of order 2. We will see in Chapter 39 that the square of a Bessel process is a Markov process.
We will use the following lemma.
Lemma 26.1 Suppose $X^{(j)}_t$, $j = 1, 2$, are two continuous processes such that
$$E\exp\Big(-\int_0^1 f(s)X^{(1)}_s\,ds\Big) = E\exp\Big(-\int_0^1 f(s)X^{(2)}_s\,ds\Big)$$
whenever $f$ is a non-negative continuous function with support in $(0, 1)$. Then the laws of $\{X^{(j)}_t; 0 \le t \le 1\}$, $j = 1, 2$, are equal.
Proof Let $\varphi$ be a non-negative continuous function with support in $[0, 1]$ such that $\int_0^1 \varphi(x)\,dx = 1$. Let $\varphi_\varepsilon(x) = \varepsilon^{-1}\varphi(x/\varepsilon)$, so that the sequence $\{\varphi_\varepsilon\}$ is an approximation to the identity. If $g$ is a continuous function and $t \ne 0$, then $\int g(s)\varphi_\varepsilon(s - t)\,ds \to g(t)$. Now let $t_1, \dots, t_n \in (0, 1)$ and $a_1, \dots, a_n > 0$, and set $f_\varepsilon(x) = \sum_{i=1}^n a_i\varphi_\varepsilon(x - t_i)$. Using the hypothesis and letting $\varepsilon \to 0$, we obtain
$$E\exp\Big(-\sum_{i=1}^n a_iX^{(1)}_{t_i}\Big) = E\exp\Big(-\sum_{i=1}^n a_iX^{(2)}_{t_i}\Big).$$
The left-hand side is the joint Laplace transform of $(X^{(1)}_{t_1}, \dots, X^{(1)}_{t_n})$, and the right-hand side is the same for $X^{(2)}$. By the uniqueness of the Laplace transform, the finite-dimensional distributions of $X^{(1)}$ and $X^{(2)}$ are equal. Both processes have continuous paths, and the conclusion now follows from Theorem 2.6.
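The approximation-to-the-identity step is easy to see numerically. The Python sketch below takes the hypothetical choice $\varphi(x) = 6x(1-x)$ on $[0,1]$, which is continuous, non-negative, and has integral 1. It evaluates $\int g(s)\varphi_\varepsilon(s - t)\,ds$ by the midpoint rule and shows the error against $g(t)$ shrinking as $\varepsilon \to 0$. Since $\varphi_\varepsilon(\cdot - t)$ lives on $[t, t + \varepsilon]$, the error is of order $\varepsilon$ for smooth $g$.

```python
import math


def phi(x: float) -> float:
    """A continuous density supported in [0, 1] (hypothetical choice)."""
    return 6.0 * x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0


def smoothed(g, t: float, eps: float, m: int = 2000) -> float:
    """Midpoint rule for the integral of g(s) * phi_eps(s - t), phi_eps(x) = phi(x / eps) / eps.

    phi_eps(s - t) vanishes unless t <= s <= t + eps, so only that window is summed.
    """
    step = eps / m
    total = 0.0
    for k in range(m):
        s = t + (k + 0.5) * step
        total += g(s) * phi((s - t) / eps) / eps * step
    return total


if __name__ == "__main__":
    t = 0.4
    for eps in (0.1, 0.01, 0.001):
        print(eps, abs(smoothed(math.cos, t, eps) - math.cos(t)))
```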
Let $B_t$ be a Brownian motion, not necessarily the same as $W_t$, and let $Z_t$ be the non-negative solution to
$$dZ_t = 2\sqrt{Z_t}\,dB_t + 2\,dt, \qquad Z_0 = 0, \quad 0 \le t \le 1. \tag{26.1}$$
The solution to this equation is unique by Theorem 24.4, and $Z_t$ is the square of a Bessel process of order 2.
Theorem 26.2 The processes $\{L^{1-x}_T; 0 \le x \le 1\}$ and $\{Z_x; 0 \le x \le 1\}$ have the same law.
Proof Let $f \ge 0$ be a continuous function whose support $[a, b]$ is a subset of $(0, 1)$. Let $F$ be the solution to
$$F''(x) = 2F(x)f(x), \qquad F(1) = 1, \quad F'(1) = 0;$$
see Exercise 26.1. Define $g(x) = f(1 - x)$ and $G(x) = F(1 - x)$, so that $G'' = 2Gg$, $G'(0) = 0$, and $G(0) = 1$. We will show
$$E\exp\Big(-\int_0^1 f(x)L^{1-x}_T\,dx\Big) = E\exp\Big(-\int_0^1 f(t)Z_t\,dt\Big), \tag{26.2}$$
and then apply Lemma 26.1.
The left-hand side of (26.2) is equal to
$$E\exp\Big(-\int_0^1 f(1 - x)L^x_T\,dx\Big) = E\exp\Big(-\int_0^1 g(x)L^x_T\,dx\Big) = E\exp\Big(-\int_0^T g(W_s)\,ds\Big),$$
where the last equality follows from the occupation time formula (Theorem 14.4). Let
$$M_t = G(W_t)e^{-\int_0^t g(W_s)\,ds}.$$
By Itô's formula and the product formula,
$$\begin{aligned}
dM_t &= -G(W_t)g(W_t)e^{-\int_0^t g(W_s)\,ds}\,dt + G'(W_t)e^{-\int_0^t g(W_s)\,ds}\,dW_t + \tfrac12G''(W_t)e^{-\int_0^t g(W_s)\,ds}\,dt\\
&= G'(W_t)e^{-\int_0^t g(W_s)\,ds}\,dW_t,
\end{aligned}$$
since $\frac12G'' - Gg = 0$. Therefore $M_t$ is a martingale. Since $G$ is bounded on $(-\infty, 1]$, $M_{t\wedge T}$ is bounded and we then have
$$1 = G(0) = E M_0 = E M_T = E\,G(1)e^{-\int_0^T g(W_s)\,ds},$$
so
$$E\exp\Big(-\int_0^T g(W_s)\,ds\Big) = \frac{1}{G(1)}. \tag{26.3}$$
Now look at the right-hand side of (26.2). Let
$$N_t = \frac{1}{F(t)}\exp\Big(Z_t\frac{F'(t)}{2F(t)} - \int_0^t f(s)Z_s\,ds\Big).$$

Let
$$Y_t = Z_t\frac{F'(t)}{2F(t)},$$
so using (26.1),
$$dY_t = Z_t\frac{2F(t)F''(t) - 2F'(t)^2}{4F(t)^2}\,dt + 2\frac{F'(t)}{2F(t)}\sqrt{Z_t}\,dB_t + 2\frac{F'(t)}{2F(t)}\,dt.$$
If
$$X_t = Y_t - \int_0^t f(s)Z_s\,ds,$$
then the martingale part of $X$ is $\int_0^t \frac{F'(s)}{F(s)}\sqrt{Z_s}\,dB_s$, and hence
$$d\langle X\rangle_t = \Big(\frac{F'(t)}{F(t)}\Big)^2Z_t\,dt.$$
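By Itô's formula and the product formula and using $F'' = 2Ff$,
$$\begin{aligned}
dN_t &= -\frac{F'(t)}{F(t)^2}e^{X_t}\,dt + \frac{1}{F(t)}e^{X_t}\Big\{Z_t\frac{F''(t)}{2F(t)}\,dt - Z_t\frac{F'(t)^2}{2F(t)^2}\,dt + \frac{F'(t)}{F(t)}\sqrt{Z_t}\,dB_t + \frac{F'(t)}{F(t)}\,dt - f(t)Z_t\,dt\Big\}\\
&\qquad + \frac12\,\frac{1}{F(t)}e^{X_t}\frac{F'(t)^2}{F(t)^2}Z_t\,dt\\
&= \frac{e^{X_t}F'(t)}{F(t)^2}\sqrt{Z_t}\,dB_t.
\end{aligned}$$
Observe that $F$ is continuous and positive on $[0, 1]$, hence bounded below on $[0, 1]$ by a positive constant. Moreover, $F'' = 2Ff \ge 0$, so $F'$ is nondecreasing. Since $F'(1) = 0$, we have $F' \le 0$ on $[0, 1]$. Hence $X_t \le 0$ and $0 < N_t \le 1/F(t)$, which is bounded for $t \le 1$. A bounded local martingale is a martingale, so $N_{t\wedge 1}$ is a martingale. Then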
$E N_0 = 1/F(0) = 1/G(1)$, while
$$E N_1 = \frac{1}{F(1)}E\exp\Big(Z_1\frac{F'(1)}{2F(1)} - \int_0^1 f(s)Z_s\,ds\Big) = E e^{-\int_0^1 f(s)Z_s\,ds}.$$
Therefore
$$E\exp\Big(-\int_0^1 f(s)Z_s\,ds\Big) = E N_1 = E N_0 = \frac{1}{G(1)}.$$
Combining with (26.3), we conclude the two sides of (26.2) are equal.
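The right-hand half of the argument, $E\exp(-\int_0^1 f(s)Z_s\,ds) = 1/F(0)$, can be checked numerically because everything in it is explicit. The Python sketch below makes the following choices, all arbitrary, so agreement is only up to Monte Carlo error:
- a hypothetical test function $f(s) = 2\sin^2(2\pi(s - \tfrac14))$ on $[\tfrac14, \tfrac34]$, zero elsewhere;
- a fourth-order Runge–Kutta solve of $F'' = 2Ff$, $F(1) = 1$, $F'(1) = 0$, run backward from $x = 1$;
- a Monte Carlo estimate of the expectation, with sample sizes, step sizes, and seed chosen for illustration.

It generates $Z$ as $|W_t|^2$ for a two-dimensional Brownian motion started at 0. By Exercise 24.7 this is a squared Bessel process of order 2 started at 0.

```python
import math
import random


def f(s: float) -> float:
    """Hypothetical test function: continuous, non-negative, support [1/4, 3/4] inside (0, 1)."""
    if 0.25 <= s <= 0.75:
        return 2.0 * math.sin(2.0 * math.pi * (s - 0.25)) ** 2
    return 0.0


def solve_F0(n: int = 4000) -> float:
    """RK4 for F'' = 2 F f with F(1) = 1, F'(1) = 0, stepping from x = 1 down to x = 0; returns F(0)."""
    def rhs(x, F, dF):
        return dF, 2.0 * F * f(x)

    h = -1.0 / n
    x, F, dF = 1.0, 1.0, 0.0
    for _ in range(n):
        k1 = rhs(x, F, dF)
        k2 = rhs(x + h / 2, F + h / 2 * k1[0], dF + h / 2 * k1[1])
        k3 = rhs(x + h / 2, F + h / 2 * k2[0], dF + h / 2 * k2[1])
        k4 = rhs(x + h, F + h * k3[0], dF + h * k3[1])
        F += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        dF += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        x += h
    return F


def mc_laplace(n_paths: int = 3000, n_steps: int = 100, seed: int = 0) -> float:
    """Monte Carlo estimate of E exp(-int_0^1 f(s) Z_s ds), Z_t = |W_t|^2, W a 2-d Brownian motion from 0."""
    rng = random.Random(seed)
    h = 1.0 / n_steps
    sqrt_h = math.sqrt(h)
    total = 0.0
    for _ in range(n_paths):
        w1 = w2 = 0.0
        integral = 0.0
        for k in range(n_steps):
            z_left = w1 * w1 + w2 * w2
            w1 += sqrt_h * rng.gauss(0.0, 1.0)
            w2 += sqrt_h * rng.gauss(0.0, 1.0)
            z_right = w1 * w1 + w2 * w2
            integral += 0.5 * h * (f(k * h) * z_left + f((k + 1) * h) * z_right)  # trapezoid rule
        total += math.exp(-integral)
    return total / n_paths


if __name__ == "__main__":
    print("1/F(0) from the ODE: ", 1.0 / solve_F0())
    print("Monte Carlo estimate:", mc_laplace())
```

Since $F' \le 0$ on $[0, 1]$ and $F(1) = 1$, the computed $F(0)$ exceeds 1, consistent with the left side being an expectation of a quantity in $(0, 1]$.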

You may wonder how the function F was arrived at. Exercises 26.2 and 26.3 may shed
some light on this.
Exercises
26.1 Suppose $f$ is a non-negative continuous function whose support $[a, b]$ is a subset of $(0, 1)$. Show that there is a unique solution to the ordinary differential equation $F''(x) = 2F(x)f(x)$, $F(1) = 1$, $F'(1) = 0$, that $F$ is everywhere positive, and that $F$ is bounded on $[0,\infty)$.
Hint: Since $f$ is zero in $(b,\infty)$, $F''$ is zero there, and hence $F$ is of the form $F(x) = Ax + B$ for some $A$ and $B$ for $x \ge b$. Since $F'(1) = 0$, conclude that $A$ is 0.
26.2 Suppose \(X_t\) is a solution to the one-dimensional SDE
\[ dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt. \]
Suppose \(\sigma\) and \(b\) are bounded and continuous and \(f\) is a bounded and continuous function. What ordinary differential equation must \(H(x)\) satisfy (in terms of \(\sigma\), \(b\), and \(f\)) in order that
\[ M_t = H(X_t)e^{\int_0^t f(X_s)\,ds} \]
be a martingale?
26.3 Suppose \(X_t\) is a solution to the one-dimensional SDE
\[ dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt. \]
Suppose \(\sigma\) and \(b\) are bounded and continuous and \(f\) is a bounded and continuous function. What partial differential equation must \(K(x, t)\) satisfy (in terms of \(\sigma\), \(b\), and \(f\)) in order that
\[ N_t = K(X_t, t)e^{\int_0^t f(s)X_s\,ds} \]
be a martingale?
26.4 Let \(W\) be a Brownian motion and \(L^y_t\) the local times at level \(y\). Prove that local times at a fixed time \(t\) are not a Markov process. That is, let \(t > 0\) be fixed and show that \((L^y_t, y \ge 0)\) is not a Markov process in the variable \(y\).
26.5 Let \(S\) be the first time two-dimensional Brownian motion exits the unit ball and let \(\psi(\lambda) = \mathbb{P}^0(S > \lambda)\). If \(W\) is a one-dimensional Brownian motion with local times \(L^x_t\) and \(T = \inf\{t > 0 : W_t = 1\}\), find the distribution of \(Y = \sup_{0\le x\le 1} L^x_T\) in terms of \(\psi\), i.e., write \(\mathbb{P}(Y \le \lambda)\) in terms of the function \(\psi\).
26.6 Suppose \(x \in (0, 1)\). With \(W\) and \(T\) as in Exercise 26.5, find the distribution of \(L^x_T\).
26.7 Let \(W\) be a one-dimensional Brownian motion with local times \(L^x_t\). Let \(T_r = \inf\{t > 0 : L^0_t = r\}\). The law of the process \(x \mapsto L^x_{T_r}\) can be described as follows:
(1) The law of \(\{L^x_{T_r}, x \ge 0\}\) is the same as the law of \(\{X_x, x \ge 0\}\) started at \(r\), where \(X\) is the square of a Bessel process of order 0.
(2) The law of \(\{L^{-x}_{T_r}, x \ge 0\}\) is also the same as the law of \(\{X_x, x \ge 0\}\) started at \(r\), where \(X\) is the square of a Bessel process of order 0.
(3) The processes \(\{L^x_{T_r}, x \ge 0\}\) and \(\{L^{-x}_{T_r}, x \ge 0\}\) are independent of each other.
This is proved in Revuz and Yor (1999), Section XI.2, or for a challenge, try to prove (1) for yourself using the techniques of this chapter. Using this description of \(L^x_{T_r}\), find the distribution of \(L^*_{T_r} = \sup_x L^x_{T_r}\).
Notes
There are several other proofs of the Ray–Knight theorems. One by Walsh (Rogers and
Williams, 2000b; Walsh, 1978) uses excursion theory. In the next chapter we will indicate
some ideas used in that proof.

27
Brownian excursions
The paths of a Brownian motion \(W_t\) are continuous, so the zero set \(Z(\omega) = \{t : W_t(\omega) = 0\}\) is a closed set. The complement of \(Z(\omega)\) is an open subset of the reals, hence is the countable union of disjoint open intervals. If \((a, b)\) is one of those intervals (depending on \(\omega\), of course), then \(\{W_t(\omega) : a \le t \le b\}\) is a continuous function of \(t\) that is zero at \(t = a\) and \(t = b\) but is never 0 for any \(t \in (a, b)\). We call this piece of the path of \(W_t(\omega)\) an excursion.
To be more formal, let \(\mathcal{E}\) be the collection of continuous functions \(f\) with domain \([0, \infty)\) such that the following hold: there exists a positive real \(\sigma_f\) such that \(f(0) = 0\), \(f(\sigma_f) = 0\), \(f(t) \ne 0\) if \(t \in (0, \sigma_f)\), and \(f(t) = 0\) if \(t > \sigma_f\). We make \(\mathcal{E}\) into a metric space by furnishing it with the supremum norm. Given a Borel subset \(A\) of \(\mathcal{E}\), we say that the Brownian motion \(W\) has had an excursion in \(A\) by time \(t\) if there exist a time \(u\) and a function \(f \in A\) such that \(u + \sigma_f \le t\) and \(W_{u+s}(\omega) = f(s)\) for all \(s \le \sigma_f\). Let \(K_t(A)\) be the number of excursions of \(W\) in \(A\) by time \(t\). Let \(L^0_t\) be Brownian local time at 0, and let
\[ T_r = \inf\{t > 0 : L^0_t \ge r\} \tag{27.1} \]
be the inverse of Brownian local time at zero.
Set
\[ N_r(A) = K_{T_r}(A). \]
Although \(N_t(A)\) might be identically infinite for some sets \(A\), it will be finite for others. For example, let \(\delta > 0\) and suppose that every function in \(A\) has a supremum greater than \(\delta\). The continuity of the paths of \(W\) implies that \(N_t(A)\) is finite for every \(t\).
The main result of this section is the following.
Theorem 27.1 \(N_t(\cdot)\) is a Poisson point process.
Proof If \(N_t(B)\) is not infinite, then it has right-continuous paths that increase at most 1 at any given time. The main step will be to show that \(N_t(B)\) has stationary increments and \(N_t(B) - N_s(B)\) is independent of the σ-field generated by the random variables
\[ \{N_r(A) : r \le s,\ A \text{ a Borel subset of } \mathcal{E}\}. \]
If \(r_1 \le \cdots \le r_n \le s < t\), \(k \ge 0\), \(j_1, \ldots, j_n \ge 0\), and \(B\) and \(A_1, \ldots, A_n\) are Borel subsets of \(\mathcal{E}\), then
\[ \begin{aligned} \mathbb{P}(N_t(B) - N_s(B) = k;\ &N_{r_1}(A_1) = j_1, \ldots, N_{r_n}(A_n) = j_n) \\ &= \mathbb{P}(K_{T_t}(B) - K_{T_s}(B) = k;\ K_{T_{r_1}}(A_1) = j_1, \ldots, K_{T_{r_n}}(A_n) = j_n) \\ &= \mathbb{E}\big[\mathbb{P}^{W_{T_s}}(K_{T_{t-s}}(B) - K_{T_0}(B) = k);\ K_{T_{r_1}}(A_1) = j_1, \ldots, K_{T_{r_n}}(A_n) = j_n\big], \end{aligned} \tag{27.2} \]
where we used the strong Markov property at time \(T_s\). Since \(T_s\) is the first time that the local time of Brownian motion at 0 reaches \(s\), and \(L^0_t\) increases only when \(W\) is at 0, the process \(W\) is at 0 at time \(T_s\), so \(W_{T_s} = 0\). Therefore the last expression in (27.2) equals
\[ \mathbb{P}^0(K_{T_{t-s}}(B) - K_0(B) = k)\,\mathbb{P}(K_{T_{r_1}}(A_1) = j_1, \ldots, K_{T_{r_n}}(A_n) = j_n), \]
which can be rewritten as
\[ \mathbb{P}^0(N_{t-s}(B) - N_0(B) = k)\,\mathbb{P}(N_{r_1}(A_1) = j_1, \ldots, N_{r_n}(A_n) = j_n). \]
This shows that the law of \(N_t(B) - N_s(B)\) is the same as the law of \(N_{t-s}(B) - N_0(B)\) and that \(N_t(B) - N_s(B)\) is independent of \(\sigma(N_r(A) : r \le s, A \subset \mathcal{E})\), which is what we wanted. Observe that \(N_t(B)\) is constant except for jumps of size one. By Proposition 5.4, \(N_t(B)\) is a Poisson process. It is clear that \(N_t(B)\) is a measure in \(B\), which completes the proof.
Let \(m(A) = \mathbb{E}^0 N_1(A)\). The measure \(m\) is called the excursion measure. We can say a few things about \(m\).
Proposition 27.2 If \(A = \{f \in \mathcal{E} : \sup_t |f(t)| > a\}\), then \(m(A) = 1/a\).
Proof Let \(U = \inf\{t : |W_t| = a\}\) and \(V = \inf\{t > U : W_t = 0\}\). Since \(|W_t| - L^0_t\) is a martingale by Theorem 14.1, \(\mathbb{E}^0|W_{t\wedge U}| = \mathbb{E}^0 L^0_{t\wedge U}\). Letting \(t \to \infty\) and using dominated convergence on the left and monotone convergence on the right,
\[ \mathbb{E}^0 L^0_U = \mathbb{E}^0|W_U| = a. \]
Set \(R = \inf\{r : N_r(A) = 1\}\). Because \(N_r(A)\) is a Poisson process, \(R\) is an exponential random variable with parameter \(\mathbb{E}\,N_1(A) = m(A)\). It therefore suffices to show \(\mathbb{E}^0 R = a\); see (A.9).
We have \(R = \inf\{r : K_{T_r}(A) = 1\}\), and because \(K\) can only increase at times when \(W_t = 0\),
\[ R = \inf\{L^0_t : K_t(A) = 1\}. \]
Now \(K_t(A)\) will first equal one when \(t = V\). But because local time at 0 does not increase when \(W\) is not at 0, \(L^0_V = L^0_U\). Therefore
\[ \mathbb{E}^0 R = \mathbb{E}^0 L^0_V = \mathbb{E}^0 L^0_U = a. \]
We conclude that \(m(A) = 1/a\).
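The key identity \(\mathbb{E}^0 L^0_U = a\) in this proof has an exact random-walk analogue that is easy to simulate. Take a simple random walk with step size \(h = a/n\), started at 0 and run until it first hits \(\pm a\), and let \(V_k\) count its visits to 0 before step \(k\). A visit to 0 is the only place where \(|S|\) increases deterministically, by exactly \(h\), so \(|S_k| - hV_k\) is a martingale and \(h\) times the expected number of visits equals \(a\). The number of visits is even geometric with mean \(n\), a discrete counterpart of the exponential law of \(R\). The sketch below checks this by Monte Carlo; treating \(h \times\) (visits to 0) as a stand-in for local time is an assumption of the illustration, not a statement from the text, and the function names are hypothetical.

```python
import random


def visits_before_exit(n, rng):
    """Simple random walk on the integers from 0, stopped at the first hit of +n or -n.

    Returns the number of times the walk is at 0 before it exits.
    """
    pos, visits = 0, 0
    while abs(pos) < n:
        if pos == 0:
            visits += 1
        pos += 1 if rng.random() < 0.5 else -1
    return visits


def mean_scaled_local_time(a, n, trials, seed=0):
    """Monte Carlo estimate of E[h * V] with h = a / n; the exact value is a."""
    rng = random.Random(seed)
    h = a / n
    total = sum(visits_before_exit(n, rng) for _ in range(trials))
    return h * total / trials


est = mean_scaled_local_time(a=1.0, n=10, trials=4000)
print(f"estimate of E[h * visits] = {est:.3f} (exact value: 1.0)")
```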

By symmetry, if \(B = \{f \in \mathcal{E} : \sup_t f(t) > a\}\), then \(m(B) = 1/(2a)\). One can say
more about m. Consider those excursions whose maximum is some fixed value b. Starting
at any point other than 0, the excursion can be viewed as a Brownian motion killed at 0
and conditioned to have maximum b. Such a path can be decomposed into the part before
the maximum, which is a Brownian motion conditioned to hit b before 0, and the part after
the maximum, which is Brownian motion conditioned to hit 0 before b. The former can
be shown to have the same law as a three-dimensional Bessel process, up until it hits the
level b (see the example in Section 22.2), and the latter the same law as b − Xt , where Xt is
also a three-dimensional Bessel process up until it hits the level b. Moreover, the part of the
path before the maximum can be taken to be independent of the part of the path after the
maximum. See Rogers and Williams (2000b) for details.
Let us briefly revisit the Ray–Knight theorems and indicate how Brownian excursions can be used to obtain information about local times at different levels. Fix \(r\) and let \(T_r = \inf\{t > 0 : L^0_t \ge r\}\). If \(x > 0\) and \(y_1, \ldots, y_n < 0\), then the local time at \(x\) is a function of the excursions from 0 that hit \(x\), and the local times at \(y_1, \ldots, y_n\) are functions of the excursions that go below zero. Since the excursions that take positive values are independent of those that take negative values, \(L^x_{T_r}\) should be independent of \(L^{y_1}_{T_r}, \ldots, L^{y_n}_{T_r}\).
To find the distribution of \(L^x_{T_r}\), note that there is a Poisson number of excursions that reach the level \(x\). Each excursion that reaches \(x\) contributes an amount to the local time at \(x\) that is an exponential random variable; see Exercise 27.1. After proving some additional independence, namely, that the amount each excursion contributes to local time at \(x\) is independent of the amount any other excursion contributes and that the amount contributed by an excursion is independent of the number of excursions reaching \(x\), we see that \(L^x_{T_r}\) should have the same distribution as the sum of a Poisson number of independent exponential random variables.
Exercises
27.1 Let \(W\) be a Brownian motion, \(x > 0\), and \(T = \inf\{t > 0 : W_t = x\}\). If \(L^x_t\) is the local time at \(x\), show that \(L^x_T\) has an exponential distribution. Determine the parameter of this exponential random variable.
27.2 Let \(W\) be a one-dimensional Brownian motion. This exercise asks you to prove that the normalized number of downcrossings by time \(t\) converges to local time at 0. If \(a > 0\), let \(S_0 = 0\), \(T_0 = \inf\{t : W_t = a\}\), and for \(i \ge 1\),
\[ S_i = \inf\{t > T_{i-1} : W_t = 0\}, \qquad T_i = \inf\{t > S_i : W_t = a\}. \]
Then \(D_t(a)\), the number of downcrossings up to time \(t\), is defined to be \(\sup\{k : S_k \le t\}\). Prove that there exists a constant \(c\) such that
\[ \lim_{a\to 0} aD_t(a) = cL^0_t, \quad \text{a.s.}, \]
where \(L^0_t\) is local time at 0 of \(W\). Determine \(c\).
Hint: Use Exercise 18.5.
27.3 Let \((X_t, \mathbb{P}^x)\) be a Brownian motion.
(1) Use the reflection principle to find
\[ \mathbb{P}^0(X_s > -a \text{ for all } s \le r). \]
This is the same as \(\mathbb{P}^a(T_0 > r)\), where \(T_0\) is the first time the Brownian motion hits 0.
(2) Let
\[ A(a, r) = \{f \in \mathcal{E} : \sup f(t) > a,\ \sigma_f > r\}, \qquad B(r) = \{f \in \mathcal{E} : \sigma_f > r,\ \sup f(t) > 0\}, \]
and
\[ C(a) = \{f \in \mathcal{E} : \sup f(t) > a\}. \]
Prove that
\[ m(B(r)) = \lim_{a\to 0} m(A(a, r)) = \lim_{a\to 0}\big[m(C(a)) \times \mathbb{P}^a(T_0 > r)\big] \]
and use this and (1) to compute \(m(B(r))\). By symmetry, \(m(\{f \in \mathcal{E} : \sigma_f > r\})\) will be twice the value of \(m(B(r))\).
27.4 Let \(W\) be a Brownian motion. Let \(E_t(r)\) be the number of excursions of length larger than \(r\) that have been completed by time \(t\). An excursion of length larger than \(r\) means that \(\sigma_f > r\). Show that there exists a constant \(c\) such that
\[ \lim_{r\to 0}\sqrt{r}\,E_t(r) = cL^0_t, \quad \text{a.s.} \]
Determine \(c\).
One interesting point here is that this shows that \(L^0_t\) is determined entirely by the zero set \(Z(\omega) = \{t : W_t(\omega) = 0\}\).
27.5 Let \(\delta > 0\) and \(A_\delta = \{f \in \mathcal{E} : \sup_t |f(t)| > \delta\}\). Let \(S_1 = \inf\{t : K_t(A_\delta) = 1\}\) and \(S_2 = \inf\{t > S_1 : K_t(A_\delta) = 2\}\). Thus \(S_1\) and \(S_2\) are the times the first and second excursions in \(A_\delta\) have been completed. Let \(Y_1(t)\) be the excursion completed at time \(S_1\) and define \(Y_2(t)\) similarly. To be more precise, if \(R_1 = \sup\{t < S_1 : W_t = 0\}\), then \(Y_1(s) = W_{R_1+s}\) if \(s \le S_1 - R_1\) and \(Y_1(s)\) is equal to 0 for all \(s \ge S_1 - R_1\). Prove that \(Y_1\) and \(Y_2\) are independent.
Hint: Use the strong Markov property at time \(S_1\).
Notes
Besides its use in the Ray–Knight theorems (Rogers and Williams, 2000b), excursion theory is useful in many other contexts. See Rogers and Williams (2000b) for applications to Skorokhod embedding and to the arc sine law.

28
Financial mathematics
A European call option is the option to buy a share of stock at a given price at some particular time in the future. For example, I might buy a call option to purchase one share of Company X for $40 three months from today. When the three months is up, I check the price of Company X. If, say, it is $35, then my option is worthless, because why would I buy a share for $40 using the option when I could buy it on the open market for $35? But if three months from now, the share price is, say, $45, then I can exercise my option, which means I buy a share for $40, and I can then turn around immediately and sell that share for $45 and make a profit of $5. Thus, today, there is a potential for a profit if I have a call option, and so I should pay something to purchase that option. A significant part of financial mathematics is devoted to the question of what is the fair price I should pay for a call option.
Options originated in the commodities market, where farmers wanted to hedge their risks. Since then many types of options have been developed (options are also known as derivatives), and the amount of money invested in options has for the past several years exceeded the amount of money invested in stocks. In 1973 Black and Scholes, using the reasonable principle that you can’t get something for nothing, came up with a convincing formula for the price of an option.
This chapter gives two derivations of the Black–Scholes formula, proves the fundamental theorem of finance, and finishes by considering a stochastic control problem. The Black–Scholes formula is a beautiful example of applied stochastic processes.
28.1 Finance models
Let \(W_t\) be a Brownian motion. We assume that \(S_t\) is the price of a stock or other risky security. If we have $2,000 and we buy 100 shares in a stock that sells for $20 per share and it goes up $2, or if we buy 10 shares in a stock selling for $200 per share that goes up $20, we are equally happy; it is the percentage increase that matters. With this in mind, we assume that \(S_t\) satisfies
\[ dS_t = \mu S_t\,dt + \sigma S_t\,dW_t. \tag{28.1} \]
This is plausible, since then \(dS_t/S_t = \mu\,dt + \sigma\,dW_t\), that is, we are assuming the relative change in price is a Brownian motion with drift. The quantity \(\mu\) is known as the mean rate of return and \(\sigma\) is called the volatility. The solution to this SDE is
\[ S_t = S_0e^{\sigma W_t + (\mu - \sigma^2/2)t} \tag{28.2} \]
by Proposition 24.6.
We also assume the existence of a bond with price \(B_t\), which is assumed to be riskless, and the equation for \(B_t\) is \(dB_t = rB_t\,dt\), which implies \(B_t = B_0e^{rt}\).
Suppose at time \(t\) one buys \(A\) shares of stock. The cost is \(AS_t\). If one sells the shares at time \(t + h\), one receives \(AS_{t+h}\), and the net gain is \(A(S_{t+h} - S_t)\). One can also sell short, i.e., let \(A\) be negative. The formula for the gain is the same. Suppose at time \(t_i\) one holds \(A_i\) shares, up until time \(t_{i+1}\). The total net gain over the whole period \(t_0\) to \(t_n\) is
\[ \sum_{i=0}^{n-1} A_i(S_{t_{i+1}} - S_{t_i}). \]
This is the same as the stochastic integral \(\int_{t_0}^{t_n} a_s\,dS_s\) if \(a_s\) equals \(A_i\) when \(t_i \le s < t_{i+1}\). One should allow \(A_i\) to depend on the entire past \(\mathcal{F}_{t_i}\). Idealizing, one allows continuous trading, and if \(a_s\) is the number of shares held at time \(s\), the net gain through trading the stock is \(\int_0^t a_s\,dS_s\). One has a similar net gain of \(\int_0^t b_s\,dB_s\) when trading bonds if \(b_s\) is the number of bonds held at time \(s\).
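As a sanity check on (28.2) and on the discrete gain formula, here is a minimal sketch under illustrative assumptions. It builds one Brownian path on a grid and evaluates the exact solution (28.2) along it. It then compares that with an Euler–Maruyama discretization of (28.1) driven by the same increments. Finally, it computes the trading gain \(\sum A_i(S_{t_{i+1}} - S_{t_i})\) for a simple buy-and-hold strategy. The parameter values and function names are hypothetical, and Euler–Maruyama is only a numerical stand-in for the SDE; it is not something the text uses.

```python
import math
import random


def gbm_paths(s0, mu, sigma, t_end, n, seed=0):
    """Return (exact, euler) price paths on an n-step grid over [0, t_end].

    Both paths are driven by the same Brownian increments dW ~ N(0, dt).
    exact: S_t = S_0 exp(sigma W_t + (mu - sigma^2/2) t), as in (28.2)
    euler: S_{k+1} = S_k (1 + mu dt + sigma dW_k), a discretization of (28.1)
    """
    rng = random.Random(seed)
    dt = t_end / n
    w, exact, euler = 0.0, [s0], [s0]
    for k in range(1, n + 1):
        dw = rng.gauss(0.0, math.sqrt(dt))
        w += dw
        exact.append(s0 * math.exp(sigma * w + (mu - sigma**2 / 2) * k * dt))
        euler.append(euler[-1] * (1.0 + mu * dt + sigma * dw))
    return exact, euler


def trading_gain(holdings, prices):
    """Net gain sum_i A_i (S_{t_{i+1}} - S_{t_i}) for holdings A_i on [t_i, t_{i+1})."""
    return sum(a * (prices[i + 1] - prices[i]) for i, a in enumerate(holdings))


exact, euler = gbm_paths(s0=20.0, mu=0.1, sigma=0.3, t_end=1.0, n=100_000)
print(f"S_1 exact = {exact[-1]:.4f}, Euler = {euler[-1]:.4f}")
print(f"gain from holding 100 shares = {trading_gain([100] * (len(exact) - 1), exact):.2f}")
```

For constant holdings the sum telescopes to \(A(S_{t_n} - S_{t_0})\), which gives a quick check of `trading_gain`.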
Although \(a_t\) can depend on the entire past \(\mathcal{F}_t\), one does not want to let \(a_t\) depend on the future. This helps explain why the class of predictable integrands is the appropriate one to use. The pair \((a, b)\) is called a trading strategy. Set
\[ V_t = a_tS_t + b_tB_t, \tag{28.3} \]
the amount of wealth one has at time \(t\). The strategy is self-financing if
\[ V_t = V_0 + \int_0^t a_s\,dS_s + \int_0^t b_s\,dB_s \tag{28.4} \]
for all \(t\). The first integral represents the net gain from trading in the stock, the second integral the net gain from trading in the bond, and (28.4) says that one’s wealth at time \(t\) is equal to what one starts with plus what one has realized through trading in the stock and bond. We assume throughout that there are no transaction costs (i.e., no brokerage fees).
A European call gives the buyer the option of buying a share of the stock at a fixed time \(t_E\) at price \(K\). The time \(t_E\) is called the exercise time. After time \(t_E\), the option has expired and is worthless. What is the option worth? At time \(t_E\), if \(S_{t_E} \le K\), the option is worth nothing, for who would pay \(K\) dollars for a share of stock when it sells for \(S_{t_E}\) dollars? If \(S_{t_E} > K\), one can
use the option to buy a share of the stock at price \(K\) and immediately sell it at price \(S_{t_E}\), to make a profit of \(S_{t_E} - K\). Thus the value of the option at time \(t_E\) is \((S_{t_E} - K)^+\). An important question is: how much should the option sell for? What is a fair price for the option at time 0?
There are a myriad of types of options. The American call is almost the same as the European call, except that one is allowed to buy a share of the stock at price \(K\) at any time in the interval \([0, t_E]\). The European put gives the buyer the option to sell a share of the stock at price \(K\) at time \(t_E\), while the American put gives the buyer the option to sell a share at price \(K\) at any time up to time \(t_E\).
28.2 Black–Scholes formula
In 1973 Black and Scholes came up with their formula for the price of a European call. We
will give two derivations of this formula.
Derivation 1. First of all, the interest rate \(r\) on the bond may be considered to be the same as the rate of inflation. Thus the value of the option \((S_{t_E} - K)^+\) in today’s dollars is
\[ C = e^{-rt_E}(S_{t_E} - K)^+. \tag{28.5} \]
In this first derivation we work in today’s dollars. Therefore the present-day value of the stock is \(P_t = e^{-rt}S_t\). Note \(P_0 = S_0\) and the present-day value of our option at time \(t_E\) is then
\[ C = e^{-rt_E}(S_{t_E} - K)^+ = (P_{t_E} - e^{-rt_E}K)^+. \tag{28.6} \]
By the product formula,
\[ \begin{aligned} dP_t &= e^{-rt}\,dS_t - re^{-rt}S_t\,dt \\ &= e^{-rt}\sigma S_t\,dW_t + e^{-rt}\mu S_t\,dt - re^{-rt}S_t\,dt \\ &= \sigma P_t\,dW_t + (\mu - r)P_t\,dt. \end{aligned} \]
The solution to this stochastic differential equation (see Proposition 24.6) is
\[ P_t = P_0e^{\sigma W_t + (\mu - r - \sigma^2/2)t}. \]
Also, the net gain or loss in present-day dollars when holding \(a_s\) shares of stock at time \(s\) is \(\int_0^t a_s\,dP_s\).
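Define \(Q\) on \(\mathcal{F}_{t_E}\) by
\[ \frac{dQ}{d\mathbb{P}} = M_{t_E} = \exp\Big(-\frac{\mu - r}{\sigma}W_{t_E} - \frac{(\mu - r)^2}{2\sigma^2}t_E\Big). \]
Under \(Q\), \(\widetilde W_t = W_t + \frac{\mu - r}{\sigma}t\) is a Brownian motion by the Girsanov theorem.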
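The change of measure can be checked numerically with a minimal Monte Carlo sketch, assuming the illustrative parameters below. Sample \(W_{t_E} \sim N(0, t_E)\) under \(\mathbb{P}\), form \(S_{t_E}\) from (28.2) with the real-world drift \(\mu\), and average the discounted payoff \(C\) of (28.5) weighted by \(M_{t_E} = dQ/d\mathbb{P}\). By the Girsanov argument this estimates \(\mathbb{E}_Q C\), so the answer should not depend on \(\mu\). The reference value 10.4506 used in the check is the standard Black–Scholes call price for \(S_0 = K = 100\), \(r = 0.05\), \(\sigma = 0.2\), \(t_E = 1\). The function name is hypothetical.

```python
import math
import random


def price_under_p_with_weights(s0, k, r, mu, sigma, t_e, samples, seed=0):
    """Estimate E_P[M_{t_E} C] for the European call, with C as in (28.5).

    W_{t_E} ~ N(0, t_E) under P; S_{t_E} uses the real-world drift mu (28.2);
    M_{t_E} = exp(-theta W - theta^2 t_E / 2) with theta = (mu - r) / sigma.
    """
    rng = random.Random(seed)
    theta = (mu - r) / sigma
    total = 0.0
    for _ in range(samples):
        w = rng.gauss(0.0, math.sqrt(t_e))
        s_te = s0 * math.exp(sigma * w + (mu - sigma**2 / 2) * t_e)
        payoff = math.exp(-r * t_e) * max(s_te - k, 0.0)   # C in today's dollars
        weight = math.exp(-theta * w - theta**2 * t_e / 2)   # dQ/dP on F_{t_E}
        total += weight * payoff
    return total / samples


for mu in (0.05, 0.10, 0.20):
    est = price_under_p_with_weights(100.0, 100.0, 0.05, mu, 0.2, 1.0, 100_000)
    print(f"mu = {mu:.2f}: estimated price {est:.3f}")
```

Larger \(|\mu - r|\) makes the weights more variable, so the estimate gets noisier, but its mean stays at \(\mathbb{E}_Q C\).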
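Now
\[ dP_t = \sigma P_t\,dW_t + (\mu - r)P_t\,dt = \sigma P_t\Big(dW_t + \frac{\mu - r}{\sigma}\,dt\Big) = \sigma P_t\,d\widetilde W_t. \]
Therefore under \(Q\), \(P_t\) is a martingale: it is a stochastic integral with respect to the \(Q\)-Brownian motion \(\widetilde W\), hence a local martingale, and the explicit formula (28.7) below shows that it is a true martingale. The solution to the SDE
\[ dP_t = \sigma P_t\,d\widetilde W_t \]
is
\[ P_t = P_0e^{\sigma\widetilde W_t - (\sigma^2/2)t}, \tag{28.7} \]
so \(P_t\) and \(\widetilde W_t\) have the same filtration.
\(C\) is \(\mathcal{F}_{t_E}\) measurable. By the martingale representation theorem (Theorem 12.3), there exists an adapted process \(A_s\) such that
\[ C = \mathbb{E}_Q C + \int_0^{t_E} A_s\,d\widetilde W_s = \mathbb{E}_Q C + \int_0^{t_E} D_s\,dP_s, \]
where \(D_s = A_s/(\sigma P_s)\).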
Therefore, if one follows the trading strategy of buying and selling the stock \(S_t\), where one holds \(D_s\) shares of stock at time \(s\), one can obtain \(C - \mathbb{E}_Q C\) dollars at time \(t_E\). Or, starting with \(\mathbb{E}_Q C\) dollars and buying and selling stock, one can get the identical output as \(C\), almost surely. A standard assumption in finance is that of no arbitrage, which means you cannot make a profit without taking some risk. To avoid riskless profits, \(C\) must sell for \(\mathbb{E}_Q C\).
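Since \(\widetilde W_{t_E}\) is normal with mean 0 and variance \(t_E\) under \(Q\), using (28.6) and (28.7) the price \(\mathbb{E}_Q C = \mathbb{E}_Q[(S_0e^{\sigma\widetilde W_{t_E} - \sigma^2t_E/2} - e^{-rt_E}K)^+]\) is a one-dimensional Gaussian integral. It can be evaluated numerically without any closed form. Below is a minimal sketch using composite Simpson’s rule. The truncation to \(|y| \le 10\sqrt{t_E}\), the grid size, and the function name are assumptions of this illustration. Note that \(\mu\) is not an input.

```python
import math


def call_price_quadrature(s0, k, r, sigma, t_e, n=20_000, width=10.0):
    """Approximate E_Q[C] for a European call by composite Simpson's rule.

    Integrand: (S0 exp(sigma y - sigma^2 t_E / 2) - exp(-r t_E) K)^+
               times the N(0, t_E) density of y, truncated to |y| <= width * sqrt(t_E).
    n must be even.
    """
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    lo, hi = -width * math.sqrt(t_e), width * math.sqrt(t_e)
    h = (hi - lo) / n
    disc_k = math.exp(-r * t_e) * k          # strike in today's dollars

    def integrand(y):
        payoff = max(s0 * math.exp(sigma * y - sigma**2 * t_e / 2) - disc_k, 0.0)
        density = math.exp(-y * y / (2 * t_e)) / math.sqrt(2 * math.pi * t_e)
        return payoff * density

    total = integrand(lo) + integrand(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3


print(f"{call_price_quadrature(100.0, 100.0, 0.05, 0.2, 1.0):.4f}")
```

The kink of the payoff makes Simpson’s rule locally less accurate than usual, but with this grid the error is far below a cent for the parameters shown.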
To explain this in more detail, suppose you could sell the European call for \(C'\) dollars. If \(C' > \mathbb{E}_Q C\), you could sell a call for \(C'\) dollars, use the money and invest in the trading strategy of holding \(D_s\) shares of stock at time \(s\), and at time \(t_E\) have \(C' + C - \mathbb{E}_Q C\) worth of stocks and options. The buyer of the option decides whether to exercise the option, and it costs you \(C\) dollars to meet that obligation. With probability one, you have gained \(C' - \mathbb{E}_Q C\)
dollars, a riskless profit. If \(C' < \mathbb{E}_Q C\), simply reverse the roles of buying and selling. The only way to avoid making a riskless profit is if \(C' = \mathbb{E}_Q C\).
To find \(\mathbb{E}_Q C\), using (28.6) and (28.7) we write
\[ \begin{aligned} \mathbb{E}_Q C &= \mathbb{E}_Q\big[(S_0e^{\sigma\widetilde W_{t_E} - \sigma^2t_E/2} - e^{-rt_E}K)^+\big] \\ &= \frac{1}{\sqrt{2\pi t_E}}\int_{-\infty}^{\infty}(S_0e^{\sigma y - \sigma^2t_E/2} - e^{-rt_E}K)^+e^{-y^2/(2t_E)}\,dy, \end{aligned} \tag{28.8} \]
which is the Black–Scholes formula. One can, if one wishes, perform some calculations to find alternate expressions for the right-hand side.
It is noteworthy that \(\mu\) does not appear in (28.8)! You and I might have different opinions as to what \(\mu\), the mean rate of return, is equal to, but we should agree on the price of the call. This was a shock to economists when it was first discovered. The value of \(\sigma\), the volatility, does enter into the formula.
Until we evaluated \(\mathbb{E}_Q C\) in (28.8), the actual form of \(C\) was unimportant. For any type of option expiring at time \(t_E\), Derivation 1 tells us that its price at time zero should be its expectation under \(Q\).
Derivation 2. In this approach, which is the one used by Black and Scholes, we use the actual values of the securities, not the present-day values. Let \(V_t\) be the value of the option at time \(t\) and assume
\[ V_t = f(S_t, t_E - t) \tag{28.9} \]
for all \(t\), where \(f\) is some function that is sufficiently smooth. We also want \(V_{t_E} = (S_{t_E} - K)^+\).
Recall the multivariate version of Itô’s formula (Theorem 11.2). We apply this with \(d = 2\) and \(X_t = (S_t, t_E - t)\). From (28.1), \(d\langle S\rangle_t = \sigma^2S_t^2\,dt\); also \(\langle t_E - t\rangle_t = 0\), since \(t_E - t\) is of bounded variation and hence has no martingale part, and \(\langle S, t_E - t\rangle_t = 0\). Also, \(d(t_E - t) = -dt\). Then
\[ \begin{aligned} V_t - V_0 &= f(S_t, t_E - t) - f(S_0, t_E) \\ &= \int_0^t f_x(S_u, t_E - u)\,dS_u - \int_0^t f_t(S_u, t_E - u)\,du + \frac12\int_0^t \sigma^2S_u^2f_{xx}(S_u, t_E - u)\,du. \end{aligned} \tag{28.10} \]
Here \(f_x\) is the partial derivative with respect to \(x\), the first variable, \(f_{xx}\) is the second partial derivative with respect to \(x\), and \(f_t\) is the partial derivative with respect to \(t\), the second variable. On the other hand,
\[ V_t - V_0 = \int_0^t a_u\,dS_u + \int_0^t b_u\,dB_u. \tag{28.11} \]
By (28.3) and (28.9),
\[ b_t = \frac{V_t - a_tS_t}{B_t} = \frac{f(S_t, t_E - t) - a_tS_t}{B_t}. \tag{28.12} \]
Also, recall \(B_t = B_0e^{rt}\). Comparing (28.10) with (28.11), we must therefore have
\[ a_t = f_x(S_t, t_E - t) \tag{28.13} \]
and
\[ -f_t(S_t, t_E - t) + \frac12\sigma^2S_t^2f_{xx}(S_t, t_E - t) = b_tB_0re^{rt}. \tag{28.14} \]
Substituting for \(b_t\) using (28.12),
\[ r[f(S_t, t_E - t) - S_tf_x(S_t, t_E - t)] = -f_t(S_t, t_E - t) + \frac12\sigma^2S_t^2f_{xx}(S_t, t_E - t) \tag{28.15} \]
for almost all \(t\) and all \(S_t\). Since \(S_t\) is a continuous process, (28.15) leads to the parabolic partial differential equation (PDE)
\[ f_t = \frac12\sigma^2x^2f_{xx} + rxf_x - rf, \qquad (x, s) \in (0, \infty)\times[0, t_E), \]
and \(f(x, 0) = (x - K)^+\). Solving this equation for \(f\), the value \(f(S_0, t_E)\) tells us what \(V_0\) should be, i.e., the cost of setting up the equivalent portfolio. This partial differential equation can be solved and the solution is the Black–Scholes formula. Equation (28.13) shows what the trading strategy should be.
Let us now briefly discuss American calls. Recall that these are ones where the holder can buy the security at price \(K\) at any time up to time \(t_E\). Since the holder of an American call can always wait up to time \(t_E\), which is equivalent to having a European call, the value of an American call should always be at least as large as the value of the corresponding European call.
Suppose one exercises an American call early, at time \(t < t_E\). If \(S_{t_E} > K\) and one exercised early, at time
\(t_E\) one has one share of stock, for which one paid \(K\), and one has a profit of \(S_{t_E} - K\). However, because one purchased the stock before time \(t_E\), one lost the interest \(K(e^{r(t_E - t)} - 1)\) that would have accrued by waiting to exercise the option. (We are supposing \(r \ge 0\).) Thus in this case it would have been better to wait until time \(t_E\) to exercise the option.
On the other hand, if \(S_{t_E} < K\), exercising the option early would mean that one has lost \(K - S_{t_E}\), whereas for the European option, one would not have exercised at all, and lost nothing (other than the price of the option). In either case, exercising early gains nothing, hence the price of an American call should be the same as that of a European call.
One can equally well price the European put, the option to sell a share of stock at price \(K\) at time \(t_E\), by either Derivation 1 or Derivation 2 of the Black–Scholes formula. However, this analysis breaks down for American puts (sell a share of stock at any time up to time \(t_E\)), because in this case one gains by selling early: one can earn interest on the money received.
28.3 The fundamental theorem of finance
In the preceding section, we showed there was a probability measure \(Q\) under which \(P_t\) was a martingale. This is true very generally. Let \(S_t\) be the price of a security in present-day dollars. We will suppose \(S_t\) is a continuous semimartingale, and can be written \(S_t = M_t + A_t\).
The NFLVR condition (“no free lunch with vanishing risk”) is that one cannot find fixed positive real numbers \(t_0, \varepsilon, b > 0\), and predictable processes \(H_n\) with
\[ \int_0^{t_0}|H_n(s)|\,|dA_s| + \int_0^{t_0}H_n(s)^2\,d\langle M\rangle_s < \infty, \quad \text{a.s.}, \]
for each \(n\), such that
\[ \int_0^{t_0}H_n(s)\,dS_s > -\frac1n, \quad \text{a.s.}, \]
for all \(n\) and
\[ \mathbb{P}\Big(\int_0^{t_0}H_n(s)\,dS_s > b\Big) > \varepsilon. \]
Here \(t_0, b, \varepsilon\) do not depend on \(n\). The condition says that, for every \(n\), one cannot make a profit of more than \(b\) with probability greater than \(\varepsilon\) while risking a loss no larger than \(1/n\). \(Q\) is an equivalent martingale measure if \(Q\) is a probability measure, \(Q\) is equivalent to \(\mathbb{P}\) (which means they have the same null sets), and \(S_t\) is a local martingale under \(Q\).
Theorem 28.1 If St is a continuous semimartingale and the NFLVR condition holds, then
there exists an equivalent martingale measure Q.
Proof Let us prove first of all that \(dA_t\) is absolutely continuous with respect to \(d\langle M\rangle_t\). We suppose not and obtain a contradiction. Consider the measures \(\mu_A\) and \(\mu_{\langle M\rangle}\) on the predictable σ-field defined by
\[ \mu_A(D) = \mathbb{E}\int_0^\infty 1_D\,dA_t, \qquad \mu_{\langle M\rangle}(D) = \mathbb{E}\int_0^\infty 1_D\,d\langle M\rangle_t. \tag{28.16} \]
Since \(A\) is of bounded variation and continuous, it is a predictable process, and we can write \(A_t = B_t - C_t\), where \(B\) and \(C\) are continuous increasing processes and where \(\mu_B\) and \(\mu_C\) are mutually singular measures on the predictable σ-field; we define \(\mu_B\) and \(\mu_C\) analogously to (28.16). To give a few more details on how to do this, we write \(A_t = B'_t - C'_t\), where \(B'\) and \(C'\) are continuous increasing processes, we find non-negative predictable processes \(b_s\) and \(c_s\) such that \(B'_t = \int_0^t b_s\,d(B'_s + C'_s)\) and \(C'_t = \int_0^t c_s\,d(B'_s + C'_s)\), and then let
\[ B_t = \int_0^t (b_s - (b_s\wedge c_s))\,d(B'_s + C'_s), \qquad C_t = \int_0^t (c_s - (b_s\wedge c_s))\,d(B'_s + C'_s). \]
We leave it to the reader to check that \(B\) and \(C\) are the desired processes. Since \(\mu_B\) and \(\mu_C\) are mutually singular, there exists a set \(E\) in the predictable σ-field such that \(\mu_B(D) = \mu_B(D\cap E)\) and \(\mu_C(D) = \mu_C(D\cap E^c)\) for all sets \(D\) in the predictable σ-field.
If \(\mu_A\) is not absolutely continuous with respect to \(\mu_{\langle M\rangle}\), then at least one of \(\mu_B\) and \(\mu_C\) is not absolutely continuous. We assume that \(\mu_B\) is not, for otherwise we can look at \(-S_t\) instead of \(S_t\). Therefore there exists a predictable set \(F\) and a fixed time \(t_0\) such that \(\int_0^{t_0} 1_F\,dB_s\) is almost
surely non-negative, is strictly positive with positive probability, and \(\int_0^{t_0} 1_F\,d\langle M\rangle_s = 0\). We can replace \(F\) by \(F\cap E\) and so assume that \(F \subset E\), and hence \(\mu_C(F) = \mu_C(F\cap E^c) = 0\). Then, since \(S = M + B - C\),
\[ \int_0^{t_0} 1_F\,dS_s = \int_0^{t_0} 1_F\,dM_s + \int_0^{t_0} 1_F\,dB_s - \int_0^{t_0} 1_F\,dC_s. \]
The stochastic integral term is 0 because \(\int_0^{t_0}(1_F)^2\,d\langle M\rangle_s = 0\). The integral with respect to \(C_s\) is zero because \(\mu_C(F) = 0\). We then have the NFLVR condition violated with \(H_n = 1_F\) for all \(n\). Hence absolute continuity is established, and by the Radon–Nikodym theorem, \(A_t = \int_0^t h_s\,d\langle M\rangle_s\) for some predictable process \(h_s\).
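Our next goal is to show \(\int_0^t h_s^2\,d\langle M\rangle_s < \infty\) for all \(t\). Let
\[ U = \inf\Big\{t : \int_0^t h_s^2\,d\langle M\rangle_s = \infty\Big\}. \]
On \((U < \infty)\) there are two possibilities: (1) \(\int_0^t h_s^2\,d\langle M\rangle_s < \infty\) if \(t < U\) but \(\int_0^U h_s^2\,d\langle M\rangle_s = \infty\); or (2) \(\int_0^U h_s^2\,d\langle M\rangle_s < \infty\) but \(\int_0^{U+\delta} h_s^2\,d\langle M\rangle_s = \infty\) for every \(\delta > 0\). (For an example of the latter, consider \(\int_0^t 1_{(x>0)}\frac1x\,dx\) at \(t = 0\).)
Let us investigate case (1) and show that it cannot happen. Choose a fixed time \(t_0\) such that \(\mathbb{P}(U < t_0) > 0\). Let
\[ R_1 = R_1(n) = \inf\Big\{t : \int_0^t h_s^2\,d\langle M\rangle_s \ge n^4\Big\}\wedge U\wedge t_0. \]
We suppose
\[ \inf_n \mathbb{P}(R_1(n) < U < t_0) > 0 \tag{28.17} \]
and obtain a contradiction. Let \(H_t = h_t1_{[0,R_1]}/n^4\). Then
\[ \int_0^{t_0}H_s\,dA_s = \int_0^{R_1}\frac{h_s^2}{n^4}\,d\langle M\rangle_s \ge 1 \]
on \((R_1 < U < t_0)\). Since \(\int_0^{t_0}H_s^2\,d\langle M\rangle_s = n^{-8}\int_0^{R_1}h_s^2\,d\langle M\rangle_s \le n^{-4}\), Chebyshev’s inequality and Doob’s inequality give
\[ \mathbb{P}\Big(\sup_{t\le t_0}\Big|\int_0^t H_s\,dM_s\Big| \ge \frac1n\Big) \le \frac{\mathbb{E}\sup_{t\le t_0}\big|\int_0^t H_s\,dM_s\big|^2}{n^{-2}} \le \frac{4/n^4}{n^{-2}} = \frac{4}{n^2}. \]
Let
\[ R_2 = R_2(n) = \inf\Big\{t : \Big|\int_0^t H_s\,dM_s\Big| \ge 1/n\Big\} \]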
and let \(\widetilde H_t = H_t1_{[0,R_2]}\). We then have
\[ \mathbb{P}(R_2 < R_1) \le \mathbb{P}(R_2 \le t_0) \le 4/n^2, \]
\[ \int_0^{t_0}\widetilde H_s\,dS_s = \int_0^{R_2}\widetilde H_s\,dM_s + \int_0^{R_2}\widetilde H_s\,dA_s \ge -\frac1n + \int_0^{R_2}\frac{h_s^2}{n^4}\,d\langle M\rangle_s \ge -\frac1n \]
almost surely, and
\[ \mathbb{P}\Big(\int_0^{t_0}\widetilde H_s\,dS_s \ge \frac12\Big) \ge \mathbb{P}(R_1 < U < t_0) - \mathbb{P}(R_2 < R_1) \ge \mathbb{P}(R_1 < U < t_0) - \frac{4}{n^2}. \]
We do this for each \(n\), and thus obtain a contradiction to the NFLVR condition, so (28.17) cannot hold.
Case (2) is similar: choose \(\delta_n\) such that \(\int_{U+\delta_n}^{U+1}h_s^2\,d\langle M\rangle_s \ge n^4\) with positive probability, let \(H_t = h_t1_{[U+\delta_n, U+1]}/n^4\), and proceed as above. We leave the details as Exercise 28.3.
We thus have \(\int_0^t h_s^2\,d\langle M\rangle_s < \infty\), a.s., for each \(t\). Consequently the quantity \(\sup_{s\le t}|\int_0^s h_r\,dM_r|\) is also finite. Let
\[ V_n = \inf\Big\{t : \Big|\int_0^t h_s\,dM_s\Big| \ge n \text{ or } \int_0^t h_s^2\,d\langle M\rangle_s \ge n\Big\}. \]
We conclude \(V_n \uparrow \infty\). Define \(Q\) on \(\mathcal{F}_{V_n}\) by
\[ \frac{dQ}{d\mathbb{P}} = \exp\Big(-\int_0^{V_n}h_s\,dM_s - \frac12\int_0^{V_n}h_s^2\,d\langle M\rangle_s\Big). \]
The exponent is bounded, so \(Q\) is well defined. Under \(Q\), if \(t \le V_n\), then
\[ M_t - \Big\langle -\int_0^\cdot h_s\,dM_s, M\Big\rangle_t = M_t + \int_0^t h_s\,d\langle M\rangle_s = M_t + A_t \]
is a martingale by the Girsanov theorem (Exercise 13.5). Therefore \(S_t = M_t + A_t\) is a local martingale. Finally, \(e^{-\int_0^t h_s\,dM_s - \frac12\int_0^t h_s^2\,d\langle M\rangle_s}\) is never zero nor infinite, so \(Q\) is equivalent to \(\mathbb{P}\).
Let us give two examples to clarify the proof. Let \(C\) be the standard Cantor set and let \(g(t)\) be the Cantor function. Suppose \(S_t = W_t + g(t)\), where \(W\) is a Brownian motion. We then let \(H_t = 1_C(t)\). Since the Cantor function increases only on the Cantor set, \(\int_0^1 H_s\,dg(s) = 1\). Since the Cantor set has Lebesgue measure 0, \(\int_0^1 H_s^2\,ds = 0\). But this is the quadratic variation of \(\int_0^1 H_s\,dW_s\), so this stochastic integral is also 0. It follows that
\[ \int_0^1 H_s\,dS_s = \int_0^1 H_s\,dW_s + \int_0^1 H_s\,dg(s) = 1, \]
which says that with the trading strategy \(H\) we make a profit of 1 almost surely, that is, without any risk. Therefore the NFLVR condition is violated. This example indicates why we must have \(dA_t\) absolutely continuous with respect to \(d\langle M\rangle_t\).
Suppose now that \(W\) is a Brownian motion and \(S_t = W_t + \int_0^t H_s\,ds\) with \(H_s\) bounded.
Let \(M_t = e^{-\int_0^t H_s\,dW_s - \frac12\int_0^t H_s^2\,ds}\), and define \(Q\) on \(\mathcal{F}_1\) by \(dQ/d\mathbb{P} = M_1\). By the Girsanov theorem, \(S_t = W_t + \int_0^t H_s\,ds\), \(t \le 1\), is a martingale under \(Q\). This example shows that if the Radon–Nikodym derivative of \(dA_t\) with respect to \(d\langle M\rangle_t\) is not too bad, we can apply the Girsanov theorem.
28.4 Stochastic control
The theory of stochastic control, which includes a study of the Hamilton–Jacobi–Bellman (HJB) equation and requires some knowledge of partial differential equations, is beyond the scope of this book. However, we can consider one simple useful example. Suppose we have available to us a stock which satisfies the SDE
\[ dS_t = \sigma S_t\,dW_t + \mu S_t\,dt, \]
where \(W_t\) is a Brownian motion, and a risk-free asset which satisfies the equation \(dB_t = rB_t\,dt\). We want to put a proportion \(u\) of our wealth \(Z_t\) into the stock and the remainder into the risk-free asset. We will restrict \(0 \le u \le 1\), so that we neither borrow nor sell short. Also, we take \(\mu > r\), for if the mean rate of return on the stock is less than the risk-free rate,
we simply put all our money in the risk-free asset. How do we choose u in order to maximize
our return?
First of all, what do we mean by maximizing our return? Typically one chooses ahead of time a deterministic function \(U\), called the utility function, and one wants to maximize \(\mathbb{E}\,U(Z_{t_0})\) at some fixed time \(t_0\). Usually utility functions are taken to be increasing and concave. The function is chosen to be increasing because more money is considered better. It is chosen concave because one assumes that twice the amount of money will give increased pleasure, but not twice as much pleasure.
Let us work out the optimal control problem when \(U(x) = x^p\) for some \(p \in (0, 1)\). If \(Z_t\) (depending on \(u\)) is our wealth, we have \(Z_t = S_t + B_t\) and \(S_t = uZ_t\), \(B_t = (1 - u)Z_t\). We will allow \(u\) to depend on \(t\) and \(\omega\), but our answer will turn out to be deterministic and independent of \(t\), i.e., \(u\) is a constant.
We have seen (Proposition 24.6) that
\[ S_t = S_0e^{\sigma W_t - \sigma^2t/2 + \mu t} \]
and \(d\langle S\rangle_t = \sigma^2S_t^2\,dt\), and that the equation for \(B_t\) has the solution
\[ B_t = B_0e^{rt}. \]
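Therefore neither \(S_t\) nor \(B_t\) can ever be 0 or negative, and so \(Z_t > 0\) for all \(t\). Applying Itô’s formula to \(Z_t^p\) and noting that \(\langle Z\rangle_t = \langle S\rangle_t\), we have
\[ \begin{aligned} dZ_t^p &= pZ_t^{p-1}\,dZ_t + \tfrac12p(p - 1)Z_t^{p-2}\,d\langle Z\rangle_t \\ &= pZ_t^{p-1}\sigma S_t\,dW_t + pZ_t^{p-1}\mu S_t\,dt + pZ_t^{p-1}rB_t\,dt + \tfrac12p(p - 1)Z_t^{p-2}\sigma^2S_t^2\,dt \\ &= puZ_t^p\sigma\,dW_t + puZ_t^p\mu\,dt + p(1 - u)rZ_t^p\,dt + \tfrac12p(p - 1)Z_t^p\sigma^2u^2\,dt. \end{aligned} \]
Therefore
\[ \mathbb{E}\,Z_{t_0}^p = \mathbb{E}\,Z_0^p + p\,\mathbb{E}\int_0^{t_0}Z_t^p\big[u\mu + (1 - u)r + \tfrac12(p - 1)\sigma^2u^2\big]\,dt. \]
This will be largest if the expression
\[ F(u) = u\mu + (1 - u)r + \tfrac12(p - 1)\sigma^2u^2 \]
is largest, which by elementary calculus is largest when
\[ u = \frac{\mu - r}{(1 - p)\sigma^2}. \]
(If this value exceeds 1, then, since \(F\) is concave, the constraint \(0 \le u \le 1\) makes \(u = 1\) optimal.)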
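The calculus step can be double-checked numerically. The following minimal sketch uses illustrative parameter values and hypothetical function names. It maximizes \(F(u)\) over a fine grid on \([0, 1]\) and compares the result with \((\mu - r)/((1 - p)\sigma^2)\) clipped to \([0, 1]\). Because \(F\) is a concave quadratic when \(p < 1\), clipping the unconstrained maximizer gives the constrained one.

```python
def growth_rate(u, mu, r, sigma, p):
    """F(u) = u mu + (1 - u) r + (p - 1) sigma^2 u^2 / 2."""
    return u * mu + (1 - u) * r + 0.5 * (p - 1) * sigma**2 * u**2


def optimal_fraction(mu, r, sigma, p):
    """Maximizer of F over [0, 1]: (mu - r) / ((1 - p) sigma^2), clipped to [0, 1]."""
    u_star = (mu - r) / ((1 - p) * sigma**2)
    return min(max(u_star, 0.0), 1.0)


def grid_argmax(mu, r, sigma, p, n=20_000):
    """Brute-force maximizer of F over the grid {0, 1/n, ..., 1}."""
    return max((i / n for i in range(n + 1)),
               key=lambda u: growth_rate(u, mu, r, sigma, p))


for params in [(0.08, 0.03, 0.4, 0.5), (0.15, 0.03, 0.2, 0.5)]:
    print(params, optimal_fraction(*params), grid_argmax(*params))
```

In the second parameter set the unconstrained maximizer is 6, so both the formula and the grid search return the boundary value \(u = 1\).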
Exercises
28.1 Let
\[ \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-y^2/2}\,dy, \]
the cumulative normal distribution function. Rewrite the Black–Scholes formula for the value of a European call in terms of \(\Phi\). This is the way the Black–Scholes formula is written in finance books.
28.2 A European put that gives one the option to sell a share of stock at price \(K\) at time \(t_E\) has value \((K - S_{t_E})^+\) at time \(t_E\). Find the present-day value of the European put at time 0.
28.3 Carry out the details of the proof of Theorem 28.1 for Case 2.
28.4 If the utility function in Section 28.4 is \(U(x) = \log x\) instead of \(U(x) = x^p\), what is the optimal choice for \(u\)?
28.5 Let $a, b > 0$, let $Y_i$ be i.i.d. random variables that take only the values $b$ and $-a$, and let $S_n = \sum_{i=1}^n Y_i$. Show that if $\mathbb P(Y_1 = b) > 0$ and $\mathbb P(Y_1 = -a) > 0$, there exists a probability measure $\mathbb Q$ equivalent to $\mathbb P$ under which $S_n$ is a martingale. Describe the Radon–Nikodym derivative of $\mathbb Q$ with respect to $\mathbb P$.
28.6 Suppose the interest rate $r$ is equal to 0 and an option $V$ has payoff
$$\sup_{s\le t_e} S_s$$
at time $t_e$. What is the price of $V$ at time 0?

28.7 Suppose the interest rate $r$ is equal to 0. Let $U$ be the option that pays off $-\inf_{s\le t_e} S_s$ at time $t_e$. What is the price of $U$ at time 0?
If V is as in Exercise 28.6, then U + V is the option that pays off the maximum of the stock
price minus the minimum of the stock price, in other words, “buy low, sell high.” Naturally
such an option would be expensive. It is remarkable that there exists a trading strategy that can
duplicate this payoff, even though the times when the maximum and minimum occur are not
stopping times.

29
Filtering
Stochastic filtering is a nice example of nontrivial and interesting mathematics that is also extremely useful. For example, it has been used extensively in NASA's space program.
The method we use is called the innovations approach to filtering, and uses Lévy’s theorem,
the martingale representation theorem, and other results from stochastic calculus.
We will start with a fairly general model, except for simplicity we will assume our
observation process is one-dimensional. The extension to the d-dimensional case is mostly
routine. Later on we will look at a specific model, the linear model, where one can give fairly
explicit solutions to the filtering equation for real-life problems.
29.1 The basic model
We start with a probability space $(\Omega, \mathcal F, \mathbb P)$, together with a filtration $\{\mathcal F_t\}$ satisfying the usual conditions. In filtering theory there are a number of filtrations present, and we will need to be careful about which ones are which.
We have a signal process $X_t$ taking values in a complete separable metric space $\mathcal S$, and we let $\{\mathcal F^X_t\}$ be the minimal augmented filtration generated by $X$. We have a function $f$ mapping $\mathcal S$ to the reals, we suppose $\mathbb E\,|f(X_t)|^2 < \infty$ for all $t$, and we suppose that there exists a process $A_s$ adapted to the filtration $\{\mathcal F^X_t\}$ such that
$$M_t = f(X_t) - f(X_0) - \int_0^t A_s\,ds$$
is a martingale with respect to the filtration $\{\mathcal F^X_t\}$.

Next we discuss the observation process. Let $W_t$ be a one-dimensional Brownian motion with respect to the filtration $\{\mathcal F^X_t\}$, let $h_t$ be a real-valued process adapted to $\{\mathcal F^X_t\}$, and suppose
$$Z_t = W_t + \int_0^t h_s\,ds. \tag{29.1}$$
The process $Z_t$ is called the observation process and is what we observe. Let $\{\mathcal F^Z_t\}$ be the filtration generated by the process $Z$. In practice one does not necessarily want to assume that $\{\mathcal F^Z_t\}$ is right continuous, but let us assume that it is for simplicity. Requiring the filtration to be complete is not a serious issue.

For an example, suppose that $dX_t = \sigma(X_t)\,d\overline W_t + b(X_t)\,dt$ as in Chapter 24, where $\overline W_t$ is a $d$-dimensional Brownian motion and $\sigma$ and $b$ are matrix valued, and suppose $f \in C^2(\mathbb R^d)$ is bounded or has linear growth. Then Itô's formula shows that such an $f$ will satisfy our assumptions. In this case $h_s$ in (29.1) is of the form $g(X_s)$ for a particular function $g$; see Section 39.3.

The goal of filtering is to get the best estimate of $f(X_t)$ from the observations $\{Z_t\}$, in the following sense. We want to minimize the mean square error $\mathbb E\,|f(X_t) - Y|^2$ over all random variables $Y$ that are $\mathcal F^Z_t$ measurable, i.e., over all random variables that can be determined by the observations up to time $t$. The rationale is that since $\mathcal F^Z_t$ is the information we have observed up to time $t$, we want our estimate to be $\mathcal F^Z_t$ measurable, and among all such random variables we want the one closest to $f(X_t)$ in $L^2$ norm, which means we minimize the mean square error.

Lemma 29.1 The best mean square error estimate of $f(X_t)$ over the class of $\mathcal F^Z_t$ measurable random variables is $Y = \mathbb E[f(X_t) \mid \mathcal F^Z_t]$.
Proof By our assumptions on $f$, the random variable $V = f(X_t)$ is in $L^2(\mathbb P)$. Let $Y$ be the best mean square estimator. The collection $\mathcal M$ of $L^2$ random variables that are $\mathcal F^Z_t$ measurable is a closed linear subspace of $L^2$, and the element of $\mathcal M$ closest to $V$ is the projection of $V$ onto $\mathcal M$. Therefore $Y$ is the projection of $V$ onto $\mathcal M$, and hence $V - Y$ is orthogonal (in the $L^2$ sense) to every element of $\mathcal M$. In particular, if $E \in \mathcal F^Z_t$, then $\mathbb E[(V - Y)1_E] = 0$, which implies $\mathbb E[V; E] = \mathbb E[Y; E]$. This holds for every $E \in \mathcal F^Z_t$ and $Y$ is $\mathcal F^Z_t$ measurable, hence $Y = \mathbb E[V \mid \mathcal F^Z_t]$.

Given any process $H_t$ that is $\{\mathcal F_t\}$ adapted, we use the notation $\widehat H_t = \mathbb E[H_t \mid \mathcal F^Z_t]$. We will look at expressions like $\int_0^t \widehat H_s\,ds$, and you might wonder about the joint measurability of $\widehat H$ in $\omega$ and $t$, since $\widehat H_t$ is only defined almost surely for each $t$. The way to deal with this is to let $\widehat H_t$ be the optional projection of $H$ with respect to the optional $\sigma$-field generated by $\{\mathcal F^Z_t\}$; see (16.8) in Chapter 16.

29.2 The innovation process

We next define the innovation process
$$N_t = Z_t - \int_0^t \widehat h_s\,ds. \tag{29.2}$$
(Following our convention on notation, $\widehat h_s = \mathbb E[h_s \mid \mathcal F^Z_s]$.) Note that although $N_t$ is $\mathcal F^Z_t$ measurable, we cannot determine it from (29.2) because the right-hand side contains the unknown $\widehat h_s$.

Proposition 29.2 $N_t$ is a Brownian motion with respect to the filtration $\{\mathcal F^Z_t\}$.

Proof We will show that $N_t$ is a continuous martingale with respect to the filtration $\{\mathcal F^Z_t\}$ whose quadratic variation is $t$, and then our result follows from Lévy's theorem (Theorem 12.1). That $N_t$ is continuous is obvious, and $\langle N\rangle_t = \langle Z\rangle_t = \langle W\rangle_t = t$ from the definitions of $Z$ and $W$. Thus we need to show that $N$ is a martingale with respect to $\{\mathcal F^Z_t\}$. If $r \ge s$, we have
$$\mathbb E[\widehat h_r \mid \mathcal F^Z_s] = \mathbb E\big[\mathbb E[h_r \mid \mathcal F^Z_r] \mid \mathcal F^Z_s\big] = \mathbb E[h_r \mid \mathcal F^Z_s]. \tag{29.3}$$
Then using Exercise 29.1,
$$\begin{aligned} \mathbb E[N_t - N_s \mid \mathcal F^Z_s] &= \mathbb E[Z_t - Z_s \mid \mathcal F^Z_s] - \int_s^t \mathbb E[\widehat h_r \mid \mathcal F^Z_s]\,dr \\ &= \mathbb E[W_t - W_s \mid \mathcal F^Z_s] + \int_s^t \mathbb E[h_r - \widehat h_r \mid \mathcal F^Z_s]\,dr \\ &= \mathbb E\big[\mathbb E[W_t - W_s \mid \mathcal F^X_s] \mid \mathcal F^Z_s\big] = 0, \end{aligned} \tag{29.4}$$
since the integral vanishes by (29.3) and $\mathcal F^Z_s \subset \mathcal F^X_s$.

29.3 Representation of $\mathcal F^Z$-martingales

In this section we prove that if $Y_t$ is a martingale with respect to $\{\mathcal F^Z_t\}$, then $Y$ can be represented as a stochastic integral with respect to $N$. This is not an immediate consequence of Theorem 12.3 because we do not know that $N_t$ generates $\{\mathcal F^Z_t\}$; the filtration generated by $N$ could conceivably be strictly smaller than the one generated by $Z$.

Theorem 29.3 Suppose $Y_t$ is a square integrable martingale with respect to $\{\mathcal F^Z_t\}$. Let $\mathcal P^Z$ be the predictable $\sigma$-field defined on $[0,\infty) \times \Omega$ in terms of $\{\mathcal F^Z_t\}$. Then there exists $H_s$ which is $\mathcal P^Z$ measurable and with $\mathbb E\int_0^\infty H_s^2\,ds < \infty$ such that
$$Y_t = Y_0 + \int_0^t H_s\,dN_s \tag{29.5}$$
for all $t$.

To clarify, $\mathcal P^Z$ is the $\sigma$-field generated by all bounded left-continuous processes that are adapted to $\{\mathcal F^Z_t\}$.

Proof First let us treat the case where $\int_0^t \widehat h_s\,dN_s$, $\int_0^t |\widehat h_s|^2\,ds$, and $Y_t$ are each bounded. Define $\mathbb Q$ on $\mathcal F^Z_t$ by $d\mathbb Q/d\mathbb P|_{\mathcal F^Z_t} = M_t$, where
$$M_t = \exp\Big(-\int_0^t \widehat h_s\,dN_s - \frac12\int_0^t |\widehat h_s|^2\,ds\Big).$$
Then by the Girsanov theorem (Theorem 13.3), $Z_t = N_t + \int_0^t \widehat h_s\,ds$ is a martingale under $\mathbb Q$ with respect to $\{\mathcal F^Z_t\}$. Since $\langle Z\rangle_t = \langle N\rangle_t = \langle W\rangle_t = t$, $Z$ is a Brownian motion under $\mathbb Q$ with respect to $\{\mathcal F^Z_t\}$. Let $\widetilde Y_t = M_t^{-1}Y_t$. If $s \le t$ and $A \in \mathcal F^Z_s$ (so also $A \in \mathcal F^X_s$), then
$$\mathbb E_{\mathbb Q}[\widetilde Y_t; A] = \mathbb E_{\mathbb P}[M_t(M_t^{-1}Y_t); A] = \mathbb E_{\mathbb P}[Y_t; A] = \mathbb E_{\mathbb P}[Y_s; A] = \mathbb E_{\mathbb P}[M_s(M_s^{-1}Y_s); A] = \mathbb E_{\mathbb Q}[\widetilde Y_s; A].$$
Therefore $\widetilde Y_t$ is a martingale under $\mathbb Q$ with respect to $\{\mathcal F^Z_t\}$. By the martingale representation theorem (Theorem 12.3) there exists $K_s \in \mathcal P^Z$ such that
$$\widetilde Y_t = \widetilde Y_0 + \int_0^t K_s\,dZ_s = \widetilde Y_0 + \int_0^t K_s\,dN_s + \int_0^t K_s\widehat h_s\,ds.$$
On the other hand, $dM_t = -M_t\widehat h_t\,dN_t$ and $Y_t = M_t\widetilde Y_t$. We have $d\langle M, \widetilde Y\rangle_t = -M_t\widehat h_tK_t\,dt$.
By the product formula,
$$\begin{aligned} Y_t &= M_0\widetilde Y_0 + \int_0^t \widetilde Y_s\,dM_s + \int_0^t M_s\,d\widetilde Y_s + \langle M, \widetilde Y\rangle_t \\ &= Y_0 - \int_0^t \widetilde Y_sM_s\widehat h_s\,dN_s + \int_0^t K_sM_s\,dN_s + \int_0^t K_s\widehat h_sM_s\,ds - \int_0^t M_s\widehat h_sK_s\,ds, \end{aligned}$$
which is of the desired form if we set $H_s = K_sM_s - \widetilde Y_sM_s\widehat h_s$.

In the general case, let
$$T_K = \inf\Big\{t : \Big|\int_0^t \widehat h_s\,dN_s\Big| + \int_0^t |\widehat h_s|^2\,ds + |Y_t| \ge K\Big\}.$$
We apply the above argument to $Y_{t\wedge T_K}$ and use Exercise 29.3 to get
$$Y_{t\wedge T_K} = Y_0 + \int_0^t H^K_s\,dN_s,$$
where $H^K_s$ is predictable with respect to the $\sigma$-fields $\{\mathcal F^Z_{t\wedge T_K}\}$ and is 0 from time $T_K$ on. Since $Y_t$ is square integrable, $Y_{T_K} \to Y_\infty$ almost surely and in $L^2(\mathbb P)$ as $K \to \infty$, and
$$\mathbb E\Big[\int_0^\infty |H^K_s - H^L_s|^2\,ds\Big] = \mathbb E\big[|Y_{T_K} - Y_{T_L}|^2\big] \to 0$$
as $K, L \to \infty$. Using the completeness of $L^2$, there exists $H_s$ such that $\mathbb E\int_0^\infty H_s^2\,ds < \infty$ and $\mathbb E\int_0^\infty |H_s - H^K_s|^2\,ds \to 0$ as $K \to \infty$. It is routine to check that $H_s$ is $\mathcal P^Z$ measurable and that (29.5) holds.

29.4 The filtering equation

We now derive the general filtering equation. First we need a lemma.

Lemma 29.4 If $Y_t - \int_0^t H_s\,ds$ is a martingale with respect to $\{\mathcal F^X_t\}$, then $\widehat Y_t - \int_0^t \widehat H_s\,ds$ is a martingale with respect to $\{\mathcal F^Z_t\}$.

Proof Since $\mathcal F^Z_s \subset \mathcal F^X_s$,
$$\begin{aligned} \mathbb E\Big[\widehat Y_t - \widehat Y_s - \int_s^t \widehat H_r\,dr \,\Big|\, \mathcal F^Z_s\Big] &= \mathbb E\Big[\mathbb E[Y_t \mid \mathcal F^Z_t] - \mathbb E[Y_s \mid \mathcal F^Z_s] - \int_s^t \mathbb E[H_r \mid \mathcal F^Z_r]\,dr \,\Big|\, \mathcal F^Z_s\Big] \\ &= \mathbb E\Big[Y_t - Y_s - \int_s^t H_r\,dr \,\Big|\, \mathcal F^Z_s\Big] \\ &= \mathbb E\Big[\mathbb E\Big[Y_t - Y_s - \int_s^t H_r\,dr \,\Big|\, \mathcal F^X_s\Big] \,\Big|\, \mathcal F^Z_s\Big] = 0. \end{aligned}$$
The second equality is proved in a fashion similar to the one you were asked to prove in Exercise 29.1.

Here is the filtering equation.

Theorem 29.5 Let $M_t = f(X_t) - f(X_0) - \int_0^t A_s\,ds$ be a martingale with respect to $\{\mathcal F^X_t\}$ and write $F_s$ for $f(X_s)$. Suppose $\langle M, W\rangle_t = \int_0^t D_s\,ds$. Then
$$\widehat F_t = \widehat F_0 + \int_0^t \widehat A_s\,ds + \int_0^t \big(\widehat{F_sh_s} - \widehat F_s\widehat h_s + \widehat D_s\big)\,dN_s. \tag{29.6}$$

Proof By Lemma 29.4,
$$L_t = \widehat F_t - \widehat F_0 - \int_0^t \widehat A_s\,ds \tag{29.7}$$
is a martingale with respect to $\{\mathcal F^Z_t\}$, and by Theorem 29.3 there exists $H_s$ such that
$$L_t = \int_0^t H_s\,dN_s. \tag{29.8}$$
By the product formula (note $Z_0 = 0$),
$$\begin{aligned} F_tZ_t &= \int_0^t F_s\,dZ_s + \int_0^t Z_s\,dF_s + \int_0^t D_s\,ds \\ &= \int_0^t F_s\,dW_s + \int_0^t F_sh_s\,ds + \int_0^t Z_s\,dM_s + \int_0^t Z_sA_s\,ds + \int_0^t D_s\,ds \\ &= \mathcal F^X\text{-martingale} + \int_0^t \big[F_sh_s + Z_sA_s + D_s\big]\,ds. \end{aligned}$$
By Lemma 29.4 and the obvious fact that $Z$ is adapted to $\{\mathcal F^Z_t\}$,
$$\widehat{F_tZ_t} = \widehat F_tZ_t = \mathcal F^Z\text{-martingale} + \int_0^t \big(\widehat{F_sh_s} + Z_s\widehat A_s + \widehat D_s\big)\,ds.$$
Again using the product formula,
$$\widehat F_tZ_t = \int_0^t \widehat F_s\,dZ_s + \int_0^t Z_s\,d\widehat F_s + \int_0^t H_s\,ds = \mathcal F^Z\text{-martingale} + \int_0^t \big[\widehat F_s\widehat h_s + Z_s\widehat A_s + H_s\big]\,ds.$$
Therefore
$$\int_0^t \big(\widehat{F_sh_s} + Z_s\widehat A_s + \widehat D_s - \widehat F_s\widehat h_s - Z_s\widehat A_s - H_s\big)\,ds$$
is a continuous $\mathcal F^Z$-martingale that has paths that are locally of bounded variation and which is zero at time zero, hence is identically zero by Theorem 9.7. Hence with probability one, $H_s = \widehat{F_sh_s} - \widehat F_s\widehat h_s + \widehat D_s$ for almost every $s$. Substituting this in (29.8) and combining with (29.7) gives our result.

29.5 Linear models

The filtering equation (29.6) is difficult to apply in most cases. However, in the linear model we can get a much simpler representation. To define the linear model in $d$ dimensions, let $X_t$ solve
$$dX_t = A(t)\,d\overline W_t + B(t)X_t\,dt, \tag{29.9}$$
where $\overline W_t$ is a $d$-dimensional Brownian motion and $A(t)$ and $B(t)$ are deterministic $d \times d$ matrices that are continuous in $t$. Let
$$dZ_t = dW_t + C(t)X_t\,dt, \tag{29.10}$$
where $C$ is a deterministic $d \times d$ matrix-valued function that is continuous in $t$ and $W_t$ is a $d$-dimensional Brownian motion independent of $\overline W$ and $X$.

Why is this model useful? Suppose $X_t$ is two-dimensional, with $X^{(1)}_t$ being the position of a particle and $X^{(2)}_t$ its velocity. Suppose the position and the velocity have some randomness and that our observations of the position and velocity are noisy. This fits into the model (29.9)–(29.10) if we take
$$A(t) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad B(t) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad C(t) = \begin{pmatrix} c_1 & 0 \\ 0 & c_2 \end{pmatrix}.$$
For another example, suppose a particle has a fixed unknown velocity and the position is observed, but obscured by noise.
Let $X^{(1)}_t$ and $X^{(2)}_t$ be the position and velocity, and let $A(t)$ be the zero matrix,
$$B(t) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad C(t) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$

The solution of the filtering problem modeled by (29.9)–(29.10) is known as the Kalman–Bucy filter. For simplicity we will consider the special case where the dimension $d$ is 1 and $A$, $B$, $C$ are constant in $t$. We also take $A = 1$, which is what the formulas below use; for a general constant $A$, the constant term 1 in (29.11) becomes $A^2$. The general case is done in exactly the same way, but the notation becomes much more complicated (see Kallianpur, 1980). We will further assume that $X_0$ is normal with known mean $\mathbb E X_0$ and variance $\operatorname{Var} X_0$.

29.6 Kalman–Bucy filter

Let $V_t = \widehat{X_t^2} - (\widehat X_t)^2$, the conditional variance of $X_t$ given $\mathcal F^Z_t$.

Theorem 29.6 $V_t$ solves the deterministic ordinary differential equation
$$\frac{dV_t}{dt} = 1 + 2BV_t - C^2V_t^2, \qquad V_0 = \operatorname{Var} X_0. \tag{29.11}$$
In particular, $V_t$ is deterministic. $\widehat X_t$ solves
$$d\widehat X_t = CV_t\,dZ_t + (B - C^2V_t)\widehat X_t\,dt, \qquad \widehat X_0 = \mathbb E X_0. \tag{29.12}$$

The equation (29.11) is an example of what is known as a Riccati equation. We get a similar equation when $d > 1$ or when $A$, $B$, and $C$ depend on $t$, but in general one cannot
solve the Riccati equation explicitly. However, when $d = 1$ and $A$, $B$, $C$ do not depend on $t$, one can solve (29.11) by separation of variables. Write
$$\frac{dV}{1 + 2BV - C^2V^2} = dt,$$
and integrate both sides.
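To see the separation-of-variables solution concretely, the following Python sketch writes the integrated form explicitly. It is an illustration under stated assumptions and is not part of the text. Factor $1 + 2BV - C^2V^2 = -C^2(V - V_+)(V - V_-)$ with $V_\pm = (B \pm \sqrt{B^2 + C^2})/C^2$. Partial fractions then give $\frac{V_t - V_+}{V_t - V_-} = \frac{V_0 - V_+}{V_0 - V_-}\,e^{-2\sqrt{B^2+C^2}\,t}$, so $V_t \to V_+$ as $t \to \infty$. The code compares this closed form with a fourth-order Runge–Kutta integration of (29.11). It assumes $C \ne 0$ and $V_0 \ge 0$, and the function names are hypothetical.

```python
import math


def riccati_closed_form(t, B, C, V0):
    """Solution of dV/dt = 1 + 2BV - C^2 V^2, V(0) = V0 >= 0 (C != 0), by partial fractions."""
    s = math.sqrt(B * B + C * C)
    v_plus = (B + s) / (C * C)   # positive root: the limit of V_t as t -> infinity
    v_minus = (B - s) / (C * C)  # negative root
    R = (V0 - v_plus) / (V0 - v_minus) * math.exp(-2.0 * s * t)
    return (v_plus - R * v_minus) / (1.0 - R)


def riccati_rk4(T, B, C, V0, n_steps=10_000):
    """Classical fourth-order Runge-Kutta integration of the same ODE on [0, T]."""
    def g(v):
        return 1.0 + 2.0 * B * v - C * C * v * v

    h = T / n_steps
    v = V0
    for _ in range(n_steps):
        k1 = g(v)
        k2 = g(v + 0.5 * h * k1)
        k3 = g(v + 0.5 * h * k2)
        k4 = g(v + h * k3)
        v += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return v


if __name__ == "__main__":
    B, C, V0 = -1.0, 1.0, 1.0   # illustrative constants (assumptions)
    for T in (0.5, 2.0, 10.0):
        print(T, riccati_closed_form(T, B, C, V0), riccati_rk4(T, B, C, V0))
```

With $B = -1$ and $C = 1$, both columns approach $V_+ = \sqrt 2 - 1$. The steady state does not depend on $V_0$.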
When $d = 1$ (and even if $A$, $B$, and $C$ depend on time), we can solve (29.12). Let $G_t = B - C^2V_t$, so that we have
$$d\widehat X_t = CV_t\,dZ_t + G_t\widehat X_t\,dt,$$
or by the product formula
$$d\Big[e^{-\int_0^t G_r\,dr}\,\widehat X_t\Big] = e^{-\int_0^t G_r\,dr}\,CV_t\,dZ_t,$$
and hence
$$\widehat X_t = e^{\int_0^t G_r\,dr}\,\mathbb E X_0 + \int_0^t e^{\int_s^t G_r\,dr}\,CV_s\,dZ_s.$$
(Cf. the solution of (24.15).)
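The filter can also be checked by simulation. The sketch below is a minimal Euler–Maruyama discretization in plain Python. It relies on assumptions that are not in the text: $A = 1$, illustrative constants $B = -1$ and $C = 1$, a normal initial law $X_0 \sim N(m_0, v_0)$, step size $0.01$, and 1000 independent paths. It integrates the signal (29.9), the observation increments from (29.10), and the Riccati equation (29.11). The filter is written as $d\widehat X_t = CV_t\,dZ_t + (B - C^2V_t)\widehat X_t\,dt$, which is what one gets by substituting $dN_t = dZ_t - C\widehat X_t\,dt$ into (29.13). Since $V_t$ is the conditional variance, the empirical mean square error of $\widehat X_T - X_T$ should be close to $V_T$, up to Monte Carlo and discretization error. The function name is hypothetical.

```python
import math
import random


def simulate_kalman_bucy(B, C, m0, v0, T=3.0, dt=0.01, n_paths=1000, seed=0):
    """Euler-Maruyama sketch of the one-dimensional Kalman-Bucy filter with A = 1.

    Signal:      dX = dWbar + B X dt
    Observation: dZ = dW + C X dt
    Filter:      dXhat = C V dZ + (B - C^2 V) Xhat dt
    Riccati:     dV/dt = 1 + 2 B V - C^2 V^2

    Returns (empirical mean square error of Xhat_T, V_T).
    """
    rng = random.Random(seed)
    n_steps = int(round(T / dt))
    sqrt_dt = math.sqrt(dt)

    # V is deterministic, so it is computed once, by Euler steps of the Riccati ODE.
    V = [v0]
    for _ in range(n_steps):
        v = V[-1]
        V.append(v + (1.0 + 2.0 * B * v - C * C * v * v) * dt)

    total_sq_err = 0.0
    for _ in range(n_paths):
        x = rng.gauss(m0, math.sqrt(v0))  # X_0 ~ N(m0, v0)
        xhat = m0                         # filter starts at E X_0
        for n in range(n_steps):
            dZ = sqrt_dt * rng.gauss(0.0, 1.0) + C * x * dt         # noisy observation increment
            xhat += C * V[n] * dZ + (B - C * C * V[n]) * xhat * dt   # filter update
            x += sqrt_dt * rng.gauss(0.0, 1.0) + B * x * dt         # signal update (A = 1)
        total_sq_err += (xhat - x) ** 2
    return total_sq_err / n_paths, V[-1]


if __name__ == "__main__":
    mse, v_T = simulate_kalman_bucy(B=-1.0, C=1.0, m0=0.0, v0=1.0)
    print(f"empirical MSE = {mse:.3f}, V_T = {v_T:.3f}")
```

The seed is fixed so that runs are reproducible. Increasing `n_paths` shrinks the Monte Carlo error, and decreasing `dt` shrinks the discretization error.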
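Proof of Theorem 29.6 By Itô's formula, if $f \in C^2$,
$$f(X_t) - f(X_0) = \mathcal F^X\text{-martingale} + \int_0^t \big[\tfrac12 f''(X_s) + BX_sf'(X_s)\big]\,ds.$$
Here $h_s = CX_s$, and since $W$ is independent of $\overline W$ and $X$, the process $D_s$ in Theorem 29.5 is 0. By the filtering equation applied with $f(x) = x$ (for which $\widehat{X_sh_s} - \widehat X_s\widehat h_s = CV_s$),
$$\widehat X_t = \mathbb E X_0 + B\int_0^t \widehat X_s\,ds + C\int_0^t V_s\,dN_s. \tag{29.13}$$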
By Exercises 29.4(2) and 29.5(3),
$$\widehat{X_t^3} - \widehat X_t\widehat{X_t^2} = 2\widehat X_tV_t. \tag{29.14}$$
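With the filtering equation applied with $f(x) = x^2$ and (29.14),
$$\begin{aligned} \widehat{X_t^2} &= \mathbb E X_0^2 + \int_0^t \big(1 + 2B\widehat{X_s^2}\big)\,ds + C\int_0^t \big(\widehat{X_s^3} - \widehat X_s\widehat{X_s^2}\big)\,dN_s \\ &= \mathbb E X_0^2 + \int_0^t \big(1 + 2B\widehat{X_s^2}\big)\,ds + 2C\int_0^t V_s\widehat X_s\,dN_s. \end{aligned}$$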

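Therefore, using (29.13) and $d\langle \widehat X\rangle_t = C^2V_t^2\,dt$,
$$\begin{aligned} dV_t &= d\big(\widehat{X_t^2} - (\widehat X_t)^2\big) \\ &= 2CV_t\widehat X_t\,dN_t + \big(1 + 2B\widehat{X_t^2}\big)\,dt - 2\widehat X_t\big(CV_t\,dN_t + B\widehat X_t\,dt\big) - C^2V_t^2\,dt \\ &= \big(1 + 2BV_t - C^2V_t^2\big)\,dt. \end{aligned} \tag{29.15}$$
This shows that $V_t$ solves the deterministic ordinary differential equation (29.11). This equation has a unique solution (cf. Theorem 15.1), so $V_t$ is deterministic. We obtain (29.12) from (29.2), (29.10), and (29.13): since $\widehat h_t = C\widehat X_t$, we have $dN_t = dZ_t - C\widehat X_t\,dt$.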
Exercises
29.1 Justify the first equality in (29.4).
29.2 Show that if $M_t$ is a martingale with respect to $\{\mathcal F^X_t\}$, then $\widehat M_t$ is a martingale with respect to $\{\mathcal F^Z_t\}$.
29.3 Suppose $W$ is a Brownian motion and $\{\mathcal F_t\}$ is its minimal augmented filtration. Let $T$ be a bounded stopping time with respect to $\{\mathcal F_t\}$. Suppose $Y$ is an $\mathcal F_T$ measurable random variable with $\mathbb E Y^2 < \infty$. Show that there exists a predictable process $H_s$ with $\mathbb E\int_0^T H_s^2\,ds < \infty$ such that $Y = \mathbb E Y + \int_0^T H_s\,dW_s$, a.s.
29.4 (1) Show that the solution to (29.9) is a Gaussian process.
(2) Show that the solutions $(X_t, Z_t)$ to (29.9)–(29.10) form a Gaussian process.
29.5 (1) Show that if $X$ is a normal random variable with mean $\mu$ and variance $\sigma^2$, then $\mathbb E X^3 = \mu(\mu^2 + 3\sigma^2)$.
(2) Show that if $X, Y_1, \ldots, Y_n$ are jointly normal random variables, then
$$\mathbb E[X^3 \mid Y_1, \ldots, Y_n] = \mathbb E[X \mid Y_1, \ldots, Y_n]\big(\mathbb E[X \mid Y_1, \ldots, Y_n]^2 + 3\operatorname{Var}[X \mid Y_1, \ldots, Y_n]\big),$$
where $\operatorname{Var}[X \mid Y_1, \ldots, Y_n] = \mathbb E\big[(X - \mathbb E[X \mid Y_1, \ldots, Y_n])^2 \mid Y_1, \ldots, Y_n\big]$.
(3) Show that $\widehat{X_t^3} = \widehat X_t\big((\widehat X_t)^2 + 3\operatorname{Var}(X_t \mid \mathcal F^Z_t)\big)$, where $\operatorname{Var}(X_t \mid \mathcal F^Z_t) = \mathbb E[(X_t - \widehat X_t)^2 \mid \mathcal F^Z_t] = \widehat{X_t^2} - (\widehat X_t)^2$.
Notes
For more on filtering, see Kallianpur (1980) and Øksendal (2003).

30
Convergence of probability measures
Suppose we have a sequence of probabilities on a metric space $\mathcal S$ and we want to define what it means for the sequence to converge weakly. Alternatively, we may have a sequence of random variables and want to say what it means for the random variables to converge weakly. We will apply the results we obtain here in later chapters to the case where $\mathcal S$ is a function space such as $C[0,1]$ and obtain theorems on the convergence of stochastic processes.

For now our state space is assumed to be an arbitrary metric space, although we will soon add additional assumptions on $\mathcal S$. We use the Borel $\sigma$-field on $\mathcal S$, which is the $\sigma$-field generated by the open sets in $\mathcal S$. We write $A^\circ$, $\overline A$, and $\partial A$ for the interior, closure, and boundary of $A$, respectively.

30.1 The portmanteau theorem
Clearly the definition of weak convergence of real-valued random variables in terms of distribution functions (see Section A.12) has no obvious analog. The appropriate generalization is the following; cf. Proposition A.41.
Definition 30.1 A sequence of probabilities $\{P_n\}$ on a metric space $\mathcal S$ furnished with the Borel $\sigma$-field is said to converge weakly to $P$ if $\int f\,dP_n \to \int f\,dP$ for every bounded and continuous function $f$ on $\mathcal S$. A sequence of random variables $\{X_n\}$ taking values in $\mathcal S$ converges weakly to a random variable $X$ taking values in $\mathcal S$ if $\mathbb E f(X_n) \to \mathbb E f(X)$ whenever $f$ is a bounded and continuous function.

Saying $X_n$ converges weakly to $X$ is the same as saying that the laws of $X_n$ converge weakly to the law of $X$. To see this, if $P_n$ is the law of $X_n$, that is, $P_n(A) = \mathbb P(X_n \in A)$ for each Borel subset $A$ of $\mathcal S$, then $\mathbb E f(X_n) = \int f\,dP_n$ and $\mathbb E f(X) = \int f\,dP$. (This holds when $f$ is an indicator by the definition of the law of $X_n$ and $X$, then for simple functions by linearity, then for non-negative measurable functions by monotone convergence, and then for arbitrary bounded and Borel measurable $f$ by linearity.)

What might cause a bit of confusion is that weak convergence in probability is not the same as weak convergence in functional analysis, but rather is equivalent to what is known as weak-$*$ convergence in functional analysis. Feel free to skip the remainder of this paragraph, where we explain this. Recall that if $\mathcal B$ is a Banach space and $\mathcal B^*$ is its dual, then $x_n \in \mathcal B$ converges weakly to $x \in \mathcal B$ if $f(x_n) \to f(x)$ for all $f \in \mathcal B^*$, while $f_n \in \mathcal B^*$ converges with respect to the weak-$*$ topology to $f \in \mathcal B^*$ if $f_n(x) \to f(x)$ for all $x \in \mathcal B$. By the Riesz representation theorem, there is a one-to-one correspondence between positive bounded linear functionals on $\mathcal B = C(\mathcal X)$, the continuous functions on $\mathcal X$, where $\mathcal X$ is compact, and the set $\mathcal M$ of finite measures on $\mathcal X$. When $\mathcal B = C(\mathcal X)$, $\mathcal B^*$ can be identified with the finite signed measures on $\mathcal X$, and measures $P_n$ with mass 1 in $\mathcal M$ converge to $P \in \mathcal M$ with respect to the weak-$*$ topology if $P_n(g) \to P(g)$ for every $g \in \mathcal B = C(\mathcal X)$. Interpreting $P_n(g)$ as $\int g\,dP_n$ shows the connection.
Returning to weak convergence in the probability sense, the following theorem, known as the portmanteau theorem, gives some other characterizations. For this chapter we let
$$F^\delta = \{x : d(x, F) < \delta\} \tag{30.1}$$
for closed sets $F$, the set of points within $\delta$ of $F$, where $d(x, F) = \inf\{d(x, y) : y \in F\}$.

Theorem 30.2 Suppose $\{P_n, n = 1, 2, \ldots\}$ and $P$ are probabilities on a metric space. The following are equivalent.
(1) $P_n$ converges weakly to $P$.
(2) $\limsup_n P_n(F) \le P(F)$ for all closed sets $F$.
(3) $\liminf_n P_n(G) \ge P(G)$ for all open sets $G$.
(4) $\lim_n P_n(A) = P(A)$ for all Borel sets $A$ such that $P(\partial A) = 0$.

Proof The equivalence of (2) and (3) is easy because if $F$ is closed, then $G = F^c$ is open and $P_n(G) = 1 - P_n(F)$.

To see that (2) and (3) imply (4), suppose $P(\partial A) = 0$. Then
$$\limsup_n P_n(A) \le \limsup_n P_n(\overline A) \le P(\overline A) = P(A^\circ) \le \liminf_n P_n(A^\circ) \le \liminf_n P_n(A).$$

Next, let us show (4) implies (2). Let $F$ be closed. If $y \in \partial F^\delta$, then $d(y, F) = \delta$, so the sets $\partial F^\delta$ are disjoint for different $\delta$. At most countably many of them can have positive $P$-measure, hence there exists a sequence $\delta_k \downarrow 0$ such that $P(\partial F^{\delta_k}) = 0$ for each $k$. Then
$$\limsup_n P_n(F) \le \lim_n P_n(F^{\delta_k}) = P(F^{\delta_k})$$
for each $k$. Since $P(F^{\delta_k}) \downarrow P(F)$ as $\delta_k \to 0$, this gives (2).

We show now that (1) implies (2). Suppose $F$ is closed. Let $\varepsilon > 0$. Take $\delta > 0$ small
enough so that $P(F^\delta) - P(F) < \varepsilon$. Then take $f$ continuous, equal to 1 on $F$, with support in $F^\delta$, and bounded between 0 and 1. For example, $f(x) = 1 - (1 \wedge \delta^{-1}d(x, F))$ would do. Then
$$\limsup_n P_n(F) \le \limsup_n \int f\,dP_n = \int f\,dP \le P(F^\delta) \le P(F) + \varepsilon.$$
Since this is true for all $\varepsilon$, (2) follows.

Finally, let us show (2) implies (1). Let $f$ be bounded and continuous. If we show
$$\limsup_n \int f\,dP_n \le \int f\,dP \tag{30.2}$$
for every such $f$, then applying this inequality to both $f$ and $-f$ will give (1). By adding a sufficiently large positive constant to $f$ and then multiplying by a suitable constant, without loss of generality we may assume $f$ is bounded and takes values in $(0, 1)$. Fix $k$ and define $F_i = \{x : f(x) \ge i/k\}$, which is closed. Then
$$\begin{aligned} \int f\,dP_n &\le \sum_{i=1}^k \frac ik\,P_n\Big(\frac{i-1}{k} \le f(x) < \frac ik\Big) = \sum_{i=1}^k \frac ik\big[P_n(F_{i-1}) - P_n(F_i)\big] \\ &= \sum_{i=0}^{k-1} \frac{i+1}{k}\,P_n(F_i) - \sum_{i=1}^k \frac ik\,P_n(F_i) \le \frac1k + \frac1k\sum_{i=1}^k P_n(F_i). \end{aligned}$$
Similarly,
$$\int f\,dP \ge \frac1k\sum_{i=1}^k P(F_i).$$
Then
$$\limsup_n \int f\,dP_n \le \frac1k + \frac1k\sum_{i=1}^k \limsup_n P_n(F_i) \le \frac1k + \frac1k\sum_{i=1}^k P(F_i) \le \frac1k + \int f\,dP.$$
Since $k$ is arbitrary, this gives (30.2).

If $x_n \to x$ with $x_n \ne x$, $P_n = \delta_{x_n}$, and $P = \delta_x$, it is easy to see that $P_n$ converges weakly to $P$. Letting $F = \{x\}$ shows that one cannot, in general, have $\lim_n P_n(F) = P(F)$ for all closed sets $F$.

30.2 The Prohorov theorem

It turns out there is a simple condition that ensures that a sequence of probability measures has a weakly convergent subsequence.

Definition 30.3 A sequence of probabilities $P_n$ on a metric space $\mathcal S$ is tight if for every $\varepsilon > 0$ there exists a compact set $K$ (depending on $\varepsilon$) such that $\sup_n P_n(K^c) \le \varepsilon$.

The important result here is Prohorov's theorem.

Theorem 30.4 If a sequence of probability measures on a metric space $\mathcal S$ is tight, there is a subsequence that converges weakly to a probability measure on $\mathcal S$.

Proof Suppose first that the metric space $\mathcal S$ is compact.
Then $C(\mathcal S)$, the collection of continuous functions on $\mathcal S$, is a separable metric space when furnished with the supremum norm; this is Exercise 30.1. Let $\{f_i\}$ be a countable collection of non-negative elements of $C(\mathcal S)$ whose linear span is dense in $C(\mathcal S)$. For each $i$, $\int f_i\,dP_n$ is a bounded sequence, so we have a convergent subsequence. By a diagonalization procedure, we can find a subsequence $n'$ such that $\int f_i\,dP_{n'}$ converges for all $i$. By the term "diagonalization procedure" we are referring to the well-known method of proof of the Ascoli–Arzelà theorem; see any book on real analysis for a detailed explanation. Call the limit $Lf_i$. Clearly $0 \le Lf_i \le \|f_i\|_\infty$ and $L$ is linear, so we can extend $L$ to a bounded linear functional on $C(\mathcal S)$. By the Riesz representation theorem (Rudin, 1987), there exists a measure $P$ such that $Lf = \int f\,dP$. Since $\int f_i\,dP_{n'} \to \int f_i\,dP$ for all $f_i$, it is not hard to see, since each $P_{n'}$ has total mass 1, that $\int f\,dP_{n'} \to \int f\,dP$ for all $f \in C(\mathcal S)$. Therefore $P_{n'}$ converges weakly to $P$. Since $Lf \ge 0$ if $f \ge 0$, $P$ is a positive measure. The function that is identically equal to 1 is bounded and continuous, so $1 = P_{n'}(\mathcal S) = \int 1\,dP_{n'} \to \int 1\,dP$, or $P(\mathcal S) = 1$.

Next suppose that $\mathcal S$ is a Borel subset of a compact metric space $\mathcal S'$. Extend each $P_n$, initially defined on $\mathcal S$, to $\mathcal S'$ by setting $P_n(\mathcal S' \setminus \mathcal S) = 0$. By the first paragraph of the proof, there is a subsequence $P_{n'}$ that converges weakly to a probability $P$ on $\mathcal S'$ (the definition of weak convergence here is relative to the topology on $\mathcal S'$). Since the $P_n$ are tight, there exist compact subsets $K_m$ of $\mathcal S$ such that $P_n(K_m) \ge 1 - 1/m$ for all $n$. The $K_m$ will also be compact relative to the topology on $\mathcal S'$, so by Theorem 30.2,
$$P(K_m) \ge \limsup_{n'} P_{n'}(K_m) \ge 1 - 1/m.$$
Since $\bigcup_m K_m \subset \mathcal S$, we conclude $P(\mathcal S) = 1$. If $G$ is open in $\mathcal S$, then $G = H \cap \mathcal S$ for some $H$ open in $\mathcal S'$. Then
$$\liminf_{n'} P_{n'}(G) = \liminf_{n'} P_{n'}(H) \ge P(H) = P(H \cap \mathcal S) = P(G).$$
Thus by Theorem 30.2, $P_{n'}$ converges weakly to $P$ relative to the topology on $\mathcal S$.

Now let $\mathcal S$ be an arbitrary metric space. Since all the $P_n$'s are supported on $\bigcup_m K_m$, we can replace $\mathcal S$ by $\bigcup_m K_m$; that is, we may as well assume that $\mathcal S$ is $\sigma$-compact, and hence separable. It remains to embed the separable metric space $\mathcal S$ into a compact metric space $\mathcal S'$. If $d$ is the metric on $\mathcal S$, $d \wedge 1$ will also be an equivalent metric, that is, one that generates the same collection of open sets, so we may assume $d$ is bounded by 1. Now $\mathcal S$ can be embedded in $\mathcal S' = [0,1]^{\mathbb N}$ as follows. We define a metric on $\mathcal S'$ by
$$d'(a, b) = \sum_{i=1}^\infty 2^{-i}(|a_i - b_i| \wedge 1), \qquad a = (a_1, a_2, \ldots),\ b = (b_1, b_2, \ldots). \tag{30.3}$$
Being the product of compact spaces, $\mathcal S'$ is itself compact. If $\{z_j\}$ is a countable dense subset of $\mathcal S$, let $I : \mathcal S \to [0,1]^{\mathbb N}$ be defined by
$$I(x) = (d(x, z_1), d(x, z_2), \ldots).$$
We leave it to the reader to check that $I$ is a one-to-one continuous open map of $\mathcal S$ to a subset of $\mathcal S'$. Since $\mathcal S$ is $\sigma$-compact and the continuous image of a compact set is compact, $I(\mathcal S)$ is a Borel set.

Clearly, Prohorov's theorem is easily modified to handle the case of finite measures on $\mathcal S$.

30.3 Metrics for weak convergence

Since we have defined a notion of convergence of probability measures, one might wonder if one can make the set $\mathcal M$ of probability measures on $\mathcal S$ into a metric space so that weak convergence is equivalent to convergence in $\mathcal M$. This is indeed possible, and in fact there are a number of metrics on the space of probability measures that work. We will focus on the Prohorov metric.

Definition 30.5 If $P$ and $Q$ are probability measures on a separable metric space $\mathcal S$, define
$$d_{\mathcal M}(P, Q) = \inf\{\varepsilon > 0 : P(F) \le Q(F^\varepsilon) + \varepsilon \text{ for all closed } F\}. \tag{30.4}$$
It is not immediately obvious that $d_{\mathcal M}$ is even a metric, so the first task is to show that it is.

Proposition 30.6 $d_{\mathcal M}$ is a metric on $\mathcal M$.

Proof We start with symmetry, that is, that $d_{\mathcal M}(Q, P) = d_{\mathcal M}(P, Q)$. Let $\alpha$ be any real number larger than $d_{\mathcal M}(P, Q)$.
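If $H$ is closed, then $H^\alpha = \{x : d(x, H) < \alpha\}$ is open and $K = \mathcal S \setminus H^\alpha$ is closed. Note that $H \subset \mathcal S \setminus K^\alpha$, where $K^\alpha = \{x : d(x, K) < \alpha\}$, because if $x \in H$, then $d(x, K) \ge \alpha$, so $x \notin K^\alpha$ and hence $x \in \mathcal S \setminus K^\alpha$. Since $K$ is closed, by the definition of $d_{\mathcal M}(P, Q)$,
$$P(H^\alpha) = 1 - P(K) \ge 1 - Q(K^\alpha) - \alpha = Q(\mathcal S \setminus K^\alpha) - \alpha \ge Q(H) - \alpha,$$
or $Q(H) \le P(H^\alpha) + \alpha$. Since $H$ was an arbitrary closed set, $d_{\mathcal M}(Q, P) \le \alpha$, and it follows that $d_{\mathcal M}(Q, P) \le d_{\mathcal M}(P, Q)$. Reversing the roles of $P$ and $Q$ shows symmetry.

Clearly $d_{\mathcal M}(P, Q) \ge 0$. If $d_{\mathcal M}(P, Q) = 0$, then $P(F) \le Q(F^\varepsilon) + \varepsilon$ for every $\varepsilon > 0$ and every closed $F$; letting $\varepsilon \downarrow 0$ gives $P(F) \le Q(F)$, and by symmetry $P(F) = Q(F)$ for all closed sets $F$. Since the collection of closed sets generates the Borel $\sigma$-field and is closed under finite intersections, it is not hard to see that $P(A) = Q(A)$ for all Borel subsets $A$, and hence $P = Q$.

Finally we prove the triangle inequality. Suppose $P, Q, R \in \mathcal M$. If $\alpha$ is any real larger than $d_{\mathcal M}(P, Q)$ and $\beta$ any real larger than $d_{\mathcal M}(Q, R)$, then for any $\varepsilon > 0$ and any closed set $F$,
$$\begin{aligned} P(F) &\le Q(F^\alpha) + \alpha \le Q(\overline{F^\alpha}) + \alpha \\ &\le R\big((\overline{F^\alpha})^\beta\big) + \alpha + \beta \\ &\le R(F^{\alpha+\beta+\varepsilon}) + (\alpha + \beta + \varepsilon). \end{aligned}$$
Therefore $d_{\mathcal M}(P, R) \le \alpha + \beta + \varepsilon$, and since $\varepsilon$ is arbitrary, the triangle inequality follows.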
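For measures with finitely many atoms on the real line, the infimum in (30.4) can be computed by brute force, which may help build intuition for the metric. The Python sketch below rests on a reduction that we add here; it is not in the text. If $P$ has finite support, then for any closed $F$ the set $F' = F \cap \operatorname{supp} P$ satisfies $P(F') = P(F)$ and $Q(F'^\varepsilon) \le Q(F^\varepsilon)$, so it suffices to check the finitely many subsets of the support of $P$. The condition is monotone in $\varepsilon$, and $\varepsilon = 1$ always works, so bisection on $[0, 1]$ finds $d_{\mathcal M}$. Measures are represented as dictionaries from points to masses. The function names are hypothetical, and the exhaustive subset search is only practical for a handful of atoms.

```python
from itertools import combinations


def enlarged_mass(Q, F, eps):
    """Q(F^eps), where F^eps = {y : d(y, F) < eps} for a finite set F of reals."""
    return sum(mass for y, mass in Q.items() if any(abs(y - x) < eps for x in F))


def condition_holds(P, Q, eps):
    """Check P(F) <= Q(F^eps) + eps for every nonempty subset F of the support of P."""
    points = list(P)
    for size in range(1, len(points) + 1):
        for F in combinations(points, size):
            if sum(P[x] for x in F) > enlarged_mass(Q, F, eps) + eps:
                return False
    return True


def prohorov_distance(P, Q, tol=1e-9):
    """Bisection for d_M(P, Q) = inf{eps > 0 : P(F) <= Q(F^eps) + eps for all closed F}.

    P and Q map finitely many real atoms to their masses. The condition is monotone
    in eps and always holds at eps = 1, so the infimum lies in [0, 1].
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if condition_holds(P, Q, mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    P = {0.0: 0.5, 1.0: 0.5}
    Q = {0.0: 1.0}
    print(prohorov_distance(P, Q), prohorov_distance(Q, P))
```

For point masses one finds $d_{\mathcal M}(\delta_0, \delta_a) = \min(|a|, 1)$. For $P = \tfrac12\delta_0 + \tfrac12\delta_1$ and $Q = \delta_0$, both orders give $\tfrac12$, which is consistent with the symmetry proved above.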
Now we show that weak convergence is equivalent to convergence in the topology generated by $d_{\mathcal M}$, at least if $\mathcal S$ is separable. ($L^\infty[0,1]$ is an example of a nonseparable metric space.)
Proposition 30.7 Suppose $\mathcal S$ is a separable metric space. A sequence of probability measures $P_n$ on $\mathcal S$ converges weakly to a probability $P$ if and only if $d_{\mathcal M}(P_n, P) \to 0$.

Proof We first suppose $d_{\mathcal M}(P_n, P) \to 0$ and show that $P_n$ converges weakly to $P$. Separability is not used in this part of the proof. Suppose $F$ is closed and set $\varepsilon_n = d_{\mathcal M}(P_n, P) + 1/n$. Since $P_n(F) \le P(F^{\varepsilon_n}) + \varepsilon_n$,
$$\limsup_n P_n(F) \le \limsup_n P(F^{\varepsilon_n}) = P(F),$$
and we now apply Theorem 30.2(2).

We now suppose $P_n$ converges weakly to $P$. Let $\varepsilon > 0$. Cover $\mathcal S$ with countably many balls $\{B_i\}$ of diameter less than $\varepsilon/2$ (separability is used here) and let $A_1 = B_1$, $A_2 = B_2 \setminus B_1$, $A_3 = B_3 \setminus (B_1 \cup B_2)$, $A_4 = B_4 \setminus (B_1 \cup B_2 \cup B_3)$, and so on. Hence the $A_i$ form a collection of disjoint sets which cover $\mathcal S$, and each $A_i$ has diameter less than $\varepsilon/2$. Choose $N$ large enough so that $P(\bigcup_{i=1}^N A_i) > 1 - \varepsilon/2$. Let $\mathcal G$ be the collection of open sets of the form $(A_{i_1} \cup \cdots \cup A_{i_j})^{\varepsilon/2}$ with $i_1, \ldots, i_j \le N$. That is, we look at all finite unions of $A_1, \ldots, A_N$ and then take their $(\varepsilon/2)$-enlargements. The collection $\mathcal G$ is finite. This fact and Theorem 30.2(3) imply that we can find $n_0$ such that $P(G) \le P_n(G) + \varepsilon/2$ if $n \ge n_0$ and $G \in \mathcal G$.

Suppose $F$ is closed. Let $G = \big(\bigcup\{A_i : i \le N,\ A_i \cap F \ne \emptyset\}\big)^{\varepsilon/2}$. Then $G \in \mathcal G$, and if $n \ge n_0$,
$$P(F) \le P(G) + P\Big(\bigcup_{i=N+1}^\infty A_i\Big) \le P(G) + \varepsilon/2 \le P_n(G) + \varepsilon \le P_n(F^\varepsilon) + \varepsilon.$$
In the last inequality we used the definition of $G$ and the fact that the $A_i$ have diameters less than $\varepsilon/2$, so that $G \subset F^\varepsilon$. This shows $d_{\mathcal M}(P, P_n) \le \varepsilon$ if $n \ge n_0$, which in turn implies $d_{\mathcal M}(P, P_n) \to 0$.
Exercises
30.1 If $\mathcal S$ is a metric space, then it is well known that $C(\mathcal S)$, the collection of continuous functions, with the metric
$$d(f, g) = \sup_{x\in\mathcal S}|f(x) - g(x)|$$
is a metric space. Show that if $\mathcal S$ is compact, then $C(\mathcal S)$ is separable.
30.2 Suppose $X_n$ converges weakly to $X$ and the random variables $Z_n$ are such that $d(X_n, Z_n)$ converges to 0 in probability. Prove that $Z_n$ converges weakly to $X$. This is known as Slutsky's theorem.
Hint: Start with $\mathbb P(Z_n \in F) \le \mathbb P(X_n \in F^\delta) + \mathbb P(d(X_n, Z_n) \ge \delta)$.
30.3 Suppose $X_n$ take values in a normed linear space and converge weakly to $X$. Suppose $c_n$ are scalars converging to $c$. Show $c_nX_n$ converges weakly to $cX$.
30.4 Give an example of a sequence $P_n$ converging weakly to $P$ and a function $f$ that is continuous but not bounded such that $\int f\,dP_n$ does not converge to $\int f\,dP$.
30.5 Give an example of a sequence $P_n$ converging weakly to $P$ and a function $f$ that is bounded but not continuous such that $\int f\,dP_n$ does not converge to $\int f\,dP$.
30.6 Show that if $X_n$ converges weakly to $X$ and $Y_n$ converges in probability to 0, then $X_nY_n$ converges in probability to 0.
30.7 This exercise considers a sequence of probability measures that have densities. Suppose $\mathcal S$ is furnished with the Borel $\sigma$-field and $\mu$ is a measure on $\mathcal S$. Suppose that $f_n : \mathcal S \to [0,\infty)$ and $f : \mathcal S \to [0,\infty)$ are measurable functions, each of whose integral over $\mathcal S$ is one, and define
$$P_n(A) = \int_A f_n(x)\,\mu(dx) \text{ for each } n \qquad\text{and}\qquad P(A) = \int_A f(x)\,\mu(dx).$$
(1) Show that if $f_n \to f$, $\mu$-a.e., then $P_n$ converges weakly to $P$.
(2) Give an example where $P_n$ and $P$ are as above, $P_n$ converges weakly to $P$, but $f_n$ does not converge almost everywhere to $f$.

30.8 Give an example of continuous processes $X_n$ and $X$ such that all the finite-dimensional distributions of $X_n$ converge weakly to the corresponding finite-dimensional distributions of $X$, but where $X_n$ does not converge weakly to $X$ with respect to the topology of $C[0,1]$.
30.9 Suppose $X$ is a random variable taking values in a complete separable metric space. If $\varepsilon > 0$, show there exists a compact set $K$ such that $\mathbb P(X \notin K) < \varepsilon$.
Hint: For each $n$ choose closed balls $\{B_{ni}, i = 1, \ldots, N_n\}$ of radius $1/n$ such that $\mathbb P(X \notin \bigcup_{i=1}^{N_n} B_{ni}) < \varepsilon/2^{n+1}$. Then $K = \bigcap_{n=1}^\infty \bigcup_{i=1}^{N_n} B_{ni}$ is closed and totally bounded, hence compact.
30.10 Suppose $X_n$ converges weakly to $X$ and the metric space $\mathcal S$ is complete and separable. Prove that the sequence $\{X_n\}$ is tight.
30.11 Let $\mathcal L$ be the collection of continuous functions $f$ on $\mathcal S$ such that
(1) $\sup_{x\in\mathcal S}|f(x)| \le 1$;
(2) $|f(x) - f(y)| \le d(x, y)$ for all $x, y \in \mathcal S$.
Define
$$d_{\mathcal L}(P, Q) = \sup_{f\in\mathcal L}\Big|\int f\,dP - \int f\,dQ\Big|.$$
Show that $d_{\mathcal L}$ is a metric on the collection of probability measures on the Borel $\sigma$-field of $\mathcal S$. Prove that a sequence of probability measures $P_n$ converges weakly to $P$ if and only if $d_{\mathcal L}(P_n, P) \to 0$.
30.12 Suppose $\mathcal S$ is a separable metric space. Show that $\mathcal M$ is separable.
Notes
For more information, see Billingsley (1968) and Ethier and Kurtz (1986).

31
Skorokhod representation
Suppose $\mathcal S$ is a complete separable metric space furnished with the Borel $\sigma$-field. We are going to show that if $X_n$ are random variables taking values in $\mathcal S$ converging weakly to a random variable $X$, then we can find another probability space and other random variables $X'_n$, $X'$ such that the law of $X'_n$ equals the law of $X_n$ for each $n$, the law of $X'$ equals the law of $X$, and $X'_n$ converges to $X'$ almost surely.

Let $\Omega' = [0,1]$, $\mathcal F'$ the Borel $\sigma$-field on $[0,1]$, and $P'$ Lebesgue measure. We first prove the following.

Theorem 31.1 Let $P$ be a probability measure on $\mathcal S$. Then there exists a random variable $X$ mapping $\Omega'$ to $\mathcal S$ such that the law of $X$ under $P'$ is equal to $P$.

Proof For each $k \ge 1$, let $\{A_{ki}\}$ be a countable disjoint covering of $\mathcal S$ by Borel sets of diameter less than $1/k$ such that $P(\partial A_{ki}) = 0$ and $\{A_{ki}\}$ is a refinement of $\{A_{k-1,i}\}$. We can construct these families inductively. To start, cover $\mathcal S$ with countably many balls of radius less than $1/2$.
Since for each $x_0$, $P(\{x : d(x, x_0) = r\})$ can be nonzero for at most countably many values of $r$, we can arrange matters so that the $P$-measure of the boundary of each of these balls is 0. We order the balls $B_1, B_2, \ldots$, and then let $A_{11} = B_1$, $A_{12} = B_2 \setminus B_1$, $A_{13} = B_3 \setminus (B_1 \cup B_2)$, and so on. To construct $\{A_{2i}\}$, we first find a similar covering of $\mathcal S$ by sets $\{A'_{2i}\}$ of diameter less than $1/2$, and then take all intersections of sets in $\{A'_{2i}\}$ with sets in $\{A_{1j}\}$.

We inductively define closed subintervals of $[0,1]$ by choosing $I_{11}$ to have left endpoint at 0 and length equal to $P(A_{11})$, then $I_{12}$ to have left endpoint equal to the right endpoint of $I_{11}$ and length equal to $P(A_{12})$, and so forth. We then decompose $I_{11}$ into subintervals $\{I_{2i}\}$ in an analogous way so that the lengths of the subintervals match the probabilities of the $A_{2i}$'s contained in $A_{11}$. We then subdivide $I_{12}$, and so on. We observe that $\{I_{ki}\}$ is a refinement of $\{I_{k-1,i}\}$ for all $k \ge 2$ and $P'(I_{ki}) = P(A_{ki})$ for all $k$ and $i$.

Pick a point $x_{ki} \in A_{ki}$ for each $k$ and $i$. We define $X^k$ by setting $X^k(\omega')$ equal to $x_{ki}$ if $\omega' \in I_{ki}$. (The set of endpoints of the $I_{ki}$ is countable, hence has Lebesgue measure 0, and it doesn't matter how we define $X^k$ at those points.) For each $\omega'$ except those that are endpoints of some $I_{ki}$, if $n \ge m$, then $X^n(\omega')$ and $X^m(\omega')$ are in the same $A_{mi}$ for some $i$. Since the diameter of $A_{mi}$ is less than $1/m$, we see that $d(X^n(\omega'), X^m(\omega')) \le 1/m$. That is, $X^n(\omega')$ is a Cauchy sequence. The space $\mathcal S$ is complete, so we can define $X(\omega')$ to be the limit of the $X^n(\omega')$. The collection of endpoints of the $I_{mi}$ is countable, so the limit exists for almost every $\omega'$.

It remains to show that the law of $X$ under $P'$ is $P$. Let $F$ be a closed set, let $F_k = \{x : d(x, F) < 1/k\}$, and let $J_k = \{i : A_{ki} \cap F \ne \emptyset\}$. We have
$$P'(X^k \in F) \le P'\Big(X^k \in \bigcup_{i\in J_k} A_{ki}\Big) \le \sum_{i\in J_k} P'(X^k \in A_{ki}) = \sum_{i\in J_k} P'(I_{ki}) = \sum_{i\in J_k} P(A_{ki}) \le P(F_k).$$
We used the fact that each $A_{ki}$ has diameter less than $1/k$.
Hence $\limsup_k P'(X^k \in F) \le P(F)$, since $F_k \downarrow F$ for closed F. Therefore the laws of $X^k$ under P′ converge weakly to P. But we know $d(X^k(\omega'), X(\omega')) \le 1/k$, so $X^k$ converges to X, a.s., with respect to P′. If f is continuous and bounded, $E' f(X^k) \to E' f(X)$ by dominated convergence, so $X^k \to X$ weakly. Therefore the law of X under P′ is equal to P.
We did not need the fact that the $A_{ki}$ were continuity sets, i.e., that the probability of the boundary of $A_{ki}$ is zero, but this will be used in the next theorem, which is known as the Skorokhod representation.
Theorem 31.2 Suppose $P_n$ are probability measures on S converging weakly to P. Then there exist random variables $X_n$ mapping Ω′ to S with laws $P_n$ and a random variable X mapping Ω′ to S with law P such that $X_n \to X$, a.s. Equivalently, if $X'_n$ converges to X′ weakly, there exist random variables $X_n$ and X mapping Ω′ to S with laws equal to those of $X'_n$ and X′, respectively, such that $X_n \to X$, a.s.
Proof Let the $A_{ki}$ be as in the proof of the previous theorem, and for each $P_n$ define intervals $I^n_{ki}$ and random variables $X^k_n$ as was done above, and let $X_n$ be the limit of the $X^k_n$'s. Let $K_{kn} = \{i : P(A_{ki}) > P_n(A_{ki})\}$ and $K^c_{kn} = \{i : P(A_{ki}) \le P_n(A_{ki})\}$. Since
$$\sum_i [P(A_{ki}) - P_n(A_{ki})] = 1 - 1 = 0,$$
we have
$$\sum_{i\in K^c_{kn}} [P(A_{ki}) - P_n(A_{ki})] = -\sum_{i\in K_{kn}} [P(A_{ki}) - P_n(A_{ki})].$$
Hence
$$\begin{aligned}
\sum_i |P'(I_{ki}) - P'(I^n_{ki})| &= \sum_i |P(A_{ki}) - P_n(A_{ki})| \\
&= \sum_{i\in K_{kn}} [P(A_{ki}) - P_n(A_{ki})] - \sum_{i\in K^c_{kn}} [P(A_{ki}) - P_n(A_{ki})] \\
&= 2\sum_{i\in K_{kn}} [P(A_{ki}) - P_n(A_{ki})] \\
&= 2\sum_i [P(A_{ki}) - P_n(A_{ki})]^+.
\end{aligned} \tag{31.1}$$
Each term in the sum on the last line of (31.1) goes to 0 as n → ∞ by Theorem 30.2 because the $A_{ki}$ are P-continuity sets, that is, $P(\partial A_{ki}) = 0$; also each term is dominated by $P(A_{ki})$, and $\sum_i P(A_{ki}) = 1$. Therefore by dominated convergence the sum on the last line of (31.1) goes to 0.
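The identity in (31.1), that the ℓ¹ distance between two probability vectors equals twice the sum of the positive parts of their difference, is easy to check numerically. The following Python sketch uses exact rational arithmetic; the cell probabilities are made-up values standing in for $P(A_{ki})$ and $P_n(A_{ki})$, and the function names are illustrative.

```python
from fractions import Fraction

def l1_distance(p, q):
    """sum_i |p_i - q_i|, the left-hand side of (31.1)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def twice_positive_part(p, q):
    """2 * sum_i (p_i - q_i)^+, the last line of (31.1)."""
    return 2 * sum(max(a - b, 0) for a, b in zip(p, q))

# Hypothetical cell probabilities; each vector sums to 1.
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
q = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
print(l1_distance(p, q), twice_positive_part(p, q))
```

The identity only uses that both vectors sum to the same total, which is exactly the step "1 − 1 = 0" in the proof.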
Fix k and j and let α, $\alpha_n$ be the left-hand endpoints of $I_{kj}$, $I^n_{kj}$, respectively. Since the sum in (31.1) tends to 0,
$$\alpha = \sum_{i\in J} P'(I_{ki}) = \lim_{n\to\infty} \sum_{i\in J} P'(I^n_{ki}) = \lim_{n\to\infty} \alpha_n,$$
where J consists of those i such that $I_{ki}$ is to the left of $I_{kj}$; note that for i ∈ J we have that $I^n_{ki}$ is to the left of $I^n_{kj}$ and conversely, if $I^n_{ki}$ is to the left of $I^n_{kj}$, then i ∈ J. Similarly the right-hand endpoint of $I^n_{kj}$ converges to the right-hand endpoint of $I_{kj}$.
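If ω′ is in the interior of $I_{kj}$, then it will be in the interior of $I^n_{kj}$ for all sufficiently large n. For such n we have $X^k(\omega') = X^k_n(\omega') = x_{kj}$, while $d(X(\omega'), X^k(\omega')) \le 1/k$ and $d(X_n(\omega'), X^k_n(\omega')) \le 1/k$, so
$$d(X(\omega'), X_n(\omega')) \le 2/k.$$
Every ω′ outside a countable set (the endpoints of all the intervals $I_{kj}$ and $I^n_{kj}$) lies, for each k, in the interior of some $I_{kj}$. Hence $\limsup_n d(X(\omega'), X_n(\omega')) \le 2/k$ for every k, that is, $X_n \to X$, a.s.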
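When S = R there is a more explicit way to realize Theorem 31.2 on the same space Ω′ = [0, 1]. This is a different technique from the partition construction in the proof: take $X_n(u) = F_n^{-1}(u)$ and $X(u) = F^{-1}(u)$, where $F_n^{-1}$ and $F^{-1}$ are the quantile functions of $P_n$ and P. The sketch below assumes normal laws, chosen only so that Python's `statistics.NormalDist.inv_cdf` supplies the quantile function; for each fixed u in (0, 1) the values $X_n(u)$ approach $X(u)$.

```python
from statistics import NormalDist

def quantile_variable(dist):
    """u -> F^{-1}(u): a random variable on ([0, 1], Lebesgue) whose law is `dist`."""
    return dist.inv_cdf

# Illustrative choice: P_n = N(1/n, (1 + 1/n)^2) converges weakly to P = N(0, 1).
X = quantile_variable(NormalDist(0.0, 1.0))

def X_n(n):
    return quantile_variable(NormalDist(1.0 / n, 1.0 + 1.0 / n))

for u in (0.1, 0.5, 0.9):  # a few fixed sample points omega' = u
    gaps = [abs(X_n(n)(u) - X(u)) for n in (1, 10, 100, 10_000)]
    print(u, [round(g, 5) for g in gaps])
```

The quantile construction works on R because the real line is ordered; the partition argument of the proof is what replaces it on a general complete separable metric space.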
Exercises
31.1 Suppose f is bounded, $X_n$ converges to X weakly, and also that $P(X \in D_f) = 0$, where $D_f = \{x : f \text{ is not continuous at } x\}$. Show that $f(X_n)$ converges weakly to $f(X)$.
31.2 Suppose a sequence {Xn} is uniformly integrable and Xn converges to X weakly. Show E Xn →
E X .
31.3 Give an example of a sequence of random variables Xn converging weakly to X and where each
Xn is integrable, but X is not integrable.
31.4 Suppose $X_n$ converges weakly to X and each $X_n$ is non-negative. Prove that
$$E X \le \liminf_{n\to\infty} E X_n.$$
31.5 Suppose $X_n$ converges weakly to X and each $X_n$ has the property that with probability one,
$$|X_n(t) - X_n(s)| \le |t - s|, \qquad s, t \le 1.$$
(This might arise, for example, if each $X_n$ is of the form $X_n(t) = \int_0^t Y_n(s)\,ds$ and each $Y_n$ is bounded by 1.) Prove that X has this same property, that is, with probability one,
$$|X(t) - X(s)| \le |t - s|, \qquad s, t \le 1.$$
31.6 Here is a way to prove one direction of Lebesgue’s theorem on Riemann integrable functions.
(1) For each n ≥ 1 and each i ≤ n, let $x_{in}$ be a point in $[(i-1)/n, i/n)$. Let $P_n$ be the probability measure that assigns mass 1/n to each point $x_{in}$, i = 1, 2, ..., n. Show that $P_n$ converges weakly to P, where P is Lebesgue measure on [0, 1].
(2) Suppose f is a bounded function which is continuous at almost every point of [0, 1]. Show that $\int f\,dP_n \to \int f\,dP$. Note that $\int f\,dP_n$ is a Riemann sum approximation to $\int_0^1 f(x)\,dx$.

32
The space C[0, 1]
We examine weak convergence for the space C[0, 1], the set of continuous real-valued
functions on [0, 1]. We give a criterion for the laws of a sequence of continuous stochastic
processes to be tight. We apply these results to show that a simple symmetric random walk
converges weakly to a Brownian motion, which in particular gives another construction of
Brownian motion.
32.1 Tightness
Let C[0, 1] be the collection of continuous real-valued functions from [0, 1] into R. We make C[0, 1] into a metric space by defining
$$d(f, g) = \sup_{t\in[0,1]} |f(t) - g(t)|,$$
and it is well known that C[0, 1] is separable and complete. We recall the Ascoli–Arzelà theorem: if a family F of functions on a compact interval is equicontinuous and uniformly bounded at one point, then every sequence in F has a subsequence that converges uniformly. Rephrased another way, if the family F is equicontinuous and uniformly bounded at one point, then the closure of F is compact. We furnish C[0, 1] with the Borel σ-field.
Given a continuous function f on [0, 1], we define $\omega_f$, the modulus of continuity of f, by
$$\omega_f(\delta) = \sup_{s,t\in[0,1],\,|t-s|<\delta} |f(t) - f(s)|.$$
We have the following criterion for a sequence of continuous processes to be tight.
Theorem 32.1 Suppose the $X_n$ are continuous real-valued processes. Suppose for each ε > 0 and η > 0 there exist $n_0$, A, and δ (depending on ε and η) such that if $n \ge n_0$, then
$$P(\omega_{X_n}(\delta) \ge \varepsilon) \le \eta \tag{32.1}$$
and
$$P(|X_n(0)| \ge A) \le \eta. \tag{32.2}$$
Then the $X_n$ are tight.
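Proof Since each $X_i$ is a continuous process, for each i we have $P(\omega_{X_i}(\delta) \ge \varepsilon) \to 0$ as δ → 0 by dominated convergence, and $P(|X_i(0)| \ge A) \to 0$ as A → ∞. Only the finitely many $i < n_0$ need to be handled, so given ε and η we can, by taking δ smaller and A larger if necessary, assume that (32.1) and (32.2) hold for all n.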
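Choose $\varepsilon_m = \eta_m = 2^{-m}$ and choose $\delta_m$ and $A_m$ so that
$$\sup_n P(\omega_{X_n}(\delta_m) \ge 2^{-m}) \le 2^{-m}$$
and
$$\sup_n P(|X_n(0)| \ge A_m) \le 2^{-m}.$$
Let
$$K_{m_0} = \Bigl\{ f \in C[0,1] : \sup_{s,t\in[0,1],\,|t-s|\le\delta_m} |f(t) - f(s)| \le 2^{-m} \text{ for all } m \ge m_0,\ |f(0)| \le A_{m_0} \Bigr\}.$$
Each $K_{m_0}$ is a closed, equicontinuous family that is uniformly bounded at 0, so by the Ascoli–Arzelà theorem each $K_{m_0}$ is a compact subset of C[0, 1]. We have
$$P(X_n \notin K_{m_0}) \le P(|X_n(0)| \ge A_{m_0}) + \sum_{m=m_0}^\infty P(\omega_{X_n}(\delta_m) \ge \varepsilon_m) \le 2^{-m_0} + \sum_{m=m_0}^\infty 2^{-m} = 3\cdot 2^{-m_0}.$$
This proves tightness.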
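Condition (32.1) is phrased in terms of the modulus of continuity $\omega_f(\delta)$. For a path known only at grid points, a rough approximation compares all pairs of grid points closer than δ. The NumPy sketch below (the function name is illustrative) returns that grid value; since it only looks at a subset of the pairs allowed in the definition, it is a lower bound for the true $\omega_f(\delta)$.

```python
import numpy as np

def grid_modulus(times, values, delta):
    """Grid approximation to omega_f(delta) = sup_{|t-s|<delta} |f(t) - f(s)|.

    `times` must be sorted; only pairs of grid points with
    times[j] - times[i] < delta are compared, so this is a lower bound.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    best = 0.0
    for i in range(len(times)):
        # first index j with times[j] >= times[i] + delta
        j_end = np.searchsorted(times, times[i] + delta, side="left")
        if j_end > i + 1:
            diffs = np.abs(values[i + 1:j_end] - values[i])
            best = max(best, float(diffs.max()))
    return best

t = np.linspace(0.0, 1.0, 1001)
print(grid_modulus(t, np.sqrt(t), 0.04))  # close to sqrt(0.04) = 0.2
print(grid_modulus(t, t, 0.1))            # close to 0.1
```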
We have given one criterion for a process to have continuous paths, namely, Theorem 8.1.
In the case of Markov processes, we have given another: Theorem 21.5.
32.2 A construction of Brownian motion
We will now use the results of Section 32.1 to give a construction of Brownian motion, quite
different from that of Chapter 6.
Let $Y_i$ be i.i.d. random variables with $P(Y_i = 1) = P(Y_i = -1) = \frac12$. Then $S_n = \sum_{i=1}^n Y_i$ is a simple symmetric random walk. Let $Z_n(t) = S_{nt}/\sqrt n$ for t a multiple of 1/n and define $Z_n(t)$ by linear interpolation for other t. That is, if k/n ≤ t ≤ (k + 1)/n, then
$$Z_n(t) = \frac{(k+1) - nt}{\sqrt n}\, S_k + \frac{nt - k}{\sqrt n}\, S_{k+1}. \tag{32.3}$$
The Zn are continuous processes. Let Pn be the law of Zn, which will be a probability measure
on C[0, 1].
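The interpolated walk (32.3) is straightforward to simulate: compute the partial sums $S_0, \dots, S_n$, divide by $\sqrt n$, and interpolate linearly between the nodes k/n. A minimal NumPy sketch, with illustrative function names; `np.interp` performs exactly the linear interpolation in (32.3).

```python
import numpy as np

def walk_nodes(n, rng):
    """Nodes (k/n, S_k/sqrt(n)), k = 0..n, of the scaled simple random walk."""
    steps = rng.choice([-1, 1], size=n)                     # Y_1, ..., Y_n
    partial_sums = np.concatenate(([0], np.cumsum(steps)))  # S_0, ..., S_n
    times = np.arange(n + 1) / n
    return times, partial_sums / np.sqrt(n)

def Z(nodes, t):
    """Z_n(t): linear interpolation between the nodes, as in (32.3)."""
    times, values = nodes
    return np.interp(t, times, values)

rng = np.random.default_rng(0)
n = 1000
nodes = walk_nodes(n, rng)
print(Z(nodes, 0.5), Z(nodes, 0.2503))
```

Each increment between adjacent nodes has size $1/\sqrt n$, which is the bound behind (32.6).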
Theorem 32.2 The sequence Pn converges weakly to a probability measure P∞ on C[0, 1],
and P∞ is the law of a Brownian motion.
Proof The main step is to prove that the Pn are tight. We then show that any subsequential
limit point is a Wiener measure, that is, the law of a Brownian motion. We can then appeal
to Theorem 31.1 to obtain the process X , which will be a Brownian motion.
A computation shows that
$$E S_n^4 = \sum_{i=1}^n E Y_i^4 + 3\sum_{i\ne j} (E Y_i^2)(E Y_j^2) = n + 3n(n-1) \le 3n^2, \tag{32.4}$$
since $E Y_i$ and $E Y_i^3$ are both 0, the $Y_i$'s are independent, and the second sum has $n(n-1) \le n^2$ terms.
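The fourth-moment bound (32.4) can be checked exactly for small n by computing the law of $S_n$ through repeated convolution with the step law. This Python sketch uses exact fractions (function names are illustrative); it confirms $E S_n^4 = 3n^2 - 2n$, which is $n + 3n(n-1)$.

```python
from fractions import Fraction

def law_of_walk(n):
    """Exact law of S_n as a dict {value: probability}."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for x, p in dist.items():
            for step in (-1, 1):
                new[x + step] = new.get(x + step, Fraction(0)) + p / 2
        dist = new
    return dist

def fourth_moment(n):
    """E[S_n^4] computed from the exact law."""
    return sum(p * x**4 for x, p in law_of_walk(n).items())

for n in (1, 2, 5, 20):
    print(n, fourth_moment(n), 3 * n * n - 2 * n)
```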
If s and t are multiples of 1/n, then
$$E|Z_n(t) - Z_n(s)|^4 = \frac{1}{n^2} E\Bigl(\sum_{i=ns+1}^{nt} Y_i\Bigr)^4 = \frac{1}{n^2} E\Bigl(\sum_{i=1}^{nt-ns} Y_i\Bigr)^4 \le \frac{c}{n^2}\, n^2 |t-s|^2 \le c|t-s|^2. \tag{32.5}$$
If we tried to get by with only the second moment, we would only end up with c|t − s|, which is not good enough for Theorem 8.1.
At this point we would like to apply Theorem 32.1, but we have the technical nuisance that s and t might not be multiples of 1/n. If |t − s| ≤ 2/n, then by the construction of $Z_n$ using linear interpolation and the fact that the $Y_i$'s are bounded by one in absolute value, we have $|Z_n(t) - Z_n(s)| \le c|t-s|\sqrt n$ and then
$$E|Z_n(t) - Z_n(s)|^4 \le c|t-s|^4 n^2 \le c|t-s|^2. \tag{32.6}$$
Suppose |t − s| > 2/n, with s < t. Let s′ be the largest multiple of 1/n less than or equal to s and t′ the smallest multiple of 1/n greater than or equal to t. Using (32.5) and (32.6),
$$\begin{aligned}
E|Z_n(t) - Z_n(s)|^4 &\le cE|Z_n(t) - Z_n(t')|^4 + cE|Z_n(t') - Z_n(s')|^4 + cE|Z_n(s') - Z_n(s)|^4 \\
&\le c|t-t'|^2 + c|t'-s'|^2 + c|s'-s|^2 \\
&\le c|t-s|^2,
\end{aligned}$$
since |t − t′|, |t′ − s′|, and |s − s′| are all less than c|t − s|. Note $Z_n(0) = 0$ for all n. We now apply Theorems 8.1 and 32.1 to obtain the tightness.
Any subsequential limit point is a probability measure on C[0, 1], so to show that the limit is a Brownian motion, it is enough by Theorem 2.6 to show that the finite-dimensional distributions under the limit law $P_\infty$ agree with those of Brownian motion. Fix t ∈ (0, 1]. Then $Z_n(t)$ differs from $S_{[nt]}/\sqrt n$ by at most $1/\sqrt n$, where [nt] is the largest integer less than or equal to nt. By the central limit theorem (Theorem A.51), $S_{[nt]}/\sqrt{[nt]}$ converges weakly (with respect to the topology of R) to a mean zero normal random variable with variance one. By Exercise 30.3, $S_{[nt]}/\sqrt n$ converges weakly to a mean zero normal random variable with variance t, and by Exercise 30.2, $Z_n(t)$ converges weakly to a mean zero normal random variable with variance t. This shows that the one-dimensional distributions of $Z_n$ converge weakly to the one-dimensional distributions of a Brownian motion. We leave the analogous argument for the higher-dimensional distributions to the reader.
One can also use Doob's inequalities to obtain the necessary tightness estimate. If s and t are multiples of 1/n, we have
$$P\Bigl(\max_{ns\le k\le nt} |S_k - S_{ns}| > \lambda\sqrt n\Bigr) \le \frac{cE|S_{nt} - S_{ns}|^4}{\lambda^4 n^2} \le \frac{c|t-s|^2}{\lambda^4}. \tag{32.7}$$
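The maximal bound (32.7) can be compared with a simulation. The sketch below takes s = 0 and t = 1, estimates $P(\max_{k\le n}|S_k| \ge \lambda\sqrt n)$ by Monte Carlo, and evaluates Doob's fourth-moment bound with constant 1, namely $E S_n^4/(\lambda^4 n^2)$ with $E S_n^4 = 3n^2 - 2n$. The sample sizes, seeds, and function names are arbitrary illustrative choices.

```python
import numpy as np

def max_excursion_prob(n, lam, trials, rng):
    """Monte Carlo estimate of P(max_{k <= n} |S_k| >= lam * sqrt(n))."""
    steps = rng.choice([-1, 1], size=(trials, n))
    running_max = np.abs(np.cumsum(steps, axis=1)).max(axis=1)
    return float(np.mean(running_max >= lam * np.sqrt(n)))

def doob_bound(n, lam):
    """Doob's L^4 maximal bound E[S_n^4] / (lam^4 n^2), with E[S_n^4] = 3n^2 - 2n."""
    return (3 * n * n - 2 * n) / (lam**4 * n * n)

rng = np.random.default_rng(1)
for lam in (2.0, 3.0):
    print(lam, max_excursion_prob(400, lam, 5000, rng), doob_bound(400, lam))
```

The bound is far from sharp (the true probability decays like a Gaussian tail in λ), but the $|t - s|^2$ dependence is what the tightness argument needs.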

Exercises
32.1 The support of a measure λ is the smallest closed set F such that $\lambda(F^c) = 0$. Let P be a Wiener measure on C[0, 1], i.e., the law of a Brownian motion on [0, 1]. Use Exercise 13.4 to prove that the support of P is $\{f \in C[0, 1] : f(0) = 0\}$.
32.2 Let (S, d) be a complete separable metric space and let R be a subset of S. Then (R, d) is also
a metric space. If Xn converges weakly to X with respect to the topology of (S, d) and each Xn
and X take values in R, does Xn converge weakly to X with respect to the topology of (R, d)?
Does the answer change if R is a closed subset of S?
If Xn and X take values in R and Xn converges weakly to X with respect to the topology of
(R, d), does Xn converge weakly to X with respect to the topology of (S, d)? What if R is a
closed subset of S?
32.3 Give a proof of Theorem 32.2 using (32.7) in place of Theorem 8.1.
32.4 Suppose (X, W, P) is a weak solution to
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad X_0 = x, \tag{32.8}$$
where W is a one-dimensional Brownian motion and σ and b are bounded and continuous, but we do not assume that σ is bounded below by a positive constant. Suppose the solution to (32.8) is unique in law.
Suppose $\sigma_n$ and $b_n$ are Lipschitz functions which are uniformly bounded and which converge uniformly to σ and b, respectively. Let $X_t(n)$ be the unique pathwise solution to
$$dY_t = \sigma_n(Y_t)\,dW_t + b_n(Y_t)\,dt, \qquad Y_0 = x;$$
the probability measure here is P. Prove that X(n) converges weakly to X with respect to C[0, 1].
32.5 Let W be a d-dimensional Brownian motion and let $\{X_t, t\in[0,1]\}$ be the solution to (24.22). If $x \in \mathbb R^d$, prove that the support of $P^x$ is the set of continuous functions f from [0, 1] to $\mathbb R^d$ with f(0) = x.

33
Gaussian processes
A Gaussian process is a stochastic process where each of the finite-dimensional distributions
is jointly normal. We will primarily, but not exclusively, be concerned with Gaussian processes
that have continuous paths. For much of what we consider, the index set of times need not be [0, ∞); in fact it can be almost any set. We will thus consider $\{X_t : t \in T\}$ for some index set T, where for every finite subset S of T, the collection $\{X_s : s \in S\}$ is jointly normal.
33.1 Reproducing kernel Hilbert spaces
We define the covariance function Γ by
$$\Gamma(s, t) = E[(X_s - EX_s)(X_t - EX_t)], \qquad s, t \in T. \tag{33.1}$$
For our purposes, having a non-zero mean just complicates formulas without adding anything interesting, so in this chapter we will assume $EX_t = 0$ for all t ∈ T, and (33.1) becomes
$$\Gamma(s, t) = E[X_s X_t], \qquad s, t \in T. \tag{33.2}$$
We first show how Γ can be used to construct a Hilbert space called the reproducing kernel Hilbert space (RKHS).
When we write Γ(s, ·), we mean that we fix an element s ∈ T and then consider the function g : T → R defined by g(t) = Γ(s, t) for t ∈ T. Let K be the collection of finite linear combinations of the functions Γ(s, ·), s ∈ T. Thus each element of K has the form
$$\sum_{j=1}^m a_j \Gamma(s_j, \cdot),$$
where m ≥ 1, the $a_j$'s are real, and each $s_j$, j = 1, ..., m, is an element of T. If $f = \sum_{j=1}^m a_j \Gamma(s_j, \cdot)$ and $g = \sum_{k=1}^n b_k \Gamma(t_k, \cdot)$, define
$$\langle f, g\rangle_{RKHS} = \sum_{j=1}^m \sum_{k=1}^n a_j b_k \Gamma(s_j, t_k).$$
We define H to be the closure of K with respect to the norm induced by the inner product $\langle\cdot,\cdot\rangle_{RKHS}$.
We need to show that this bilinear form is indeed an inner product, that what is known as
the reproducing property holds, and that H is a Hilbert space.
We start with the reproducing property. If $f = \sum_{j=1}^m a_j \Gamma(s_j, \cdot)$, then the reproducing property applied to f is the formula
$$\langle f, \Gamma(t, \cdot)\rangle_{RKHS} = f(t). \tag{33.3}$$
This follows from
$$\langle f, \Gamma(t, \cdot)\rangle_{RKHS} = \sum_{j=1}^m a_j \Gamma(s_j, t) = f(t).$$
By taking limits, (33.3) holds for all f ∈ H.
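To show that $\langle\cdot,\cdot\rangle_{RKHS}$ is an inner product, notice that when $f = \sum_{j=1}^m a_j \Gamma(s_j, \cdot) \in K$, then
$$\langle f, f\rangle_{RKHS} = \sum_{j=1}^m \sum_{k=1}^m a_j a_k \Gamma(s_j, s_k) = \sum_{j,k=1}^m a_j a_k E[X_{s_j} X_{s_k}] = E\Bigl(\sum_{j=1}^m a_j X_{s_j}\Bigr)^2 \ge 0.$$
The Cauchy–Schwarz inequality holds for $\langle\cdot,\cdot\rangle_{RKHS}$ (the standard proof of the Cauchy–Schwarz inequality applies), and so if $\langle f, f\rangle_{RKHS} = 0$, then
$$|f(t)|^2 = \langle f, \Gamma(t, \cdot)\rangle_{RKHS}^2 \le \langle f, f\rangle_{RKHS}\, \langle \Gamma(t, \cdot), \Gamma(t, \cdot)\rangle_{RKHS} = 0,$$
and thus f is zero.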
If $f_n$ is a Cauchy sequence with respect to the norm $\|g\|_{RKHS} = \langle g, g\rangle_{RKHS}^{1/2}$, then
$$|f_n(t) - f_m(t)|^2 = \langle f_n - f_m, \Gamma(t, \cdot)\rangle_{RKHS}^2 \le \langle f_n - f_m, f_n - f_m\rangle_{RKHS}\, \langle \Gamma(t, \cdot), \Gamma(t, \cdot)\rangle_{RKHS},$$
which tends to 0 as n, m → ∞. Thus $f_n$ converges pointwise. This is enough to prove H is complete; this is Exercise 33.1.
We summarize.
Proposition 33.1 H with the inner product $\langle\cdot,\cdot\rangle_{RKHS}$ is a Hilbert space. Moreover, if f ∈ H and t ∈ T, then
$$\langle f, \Gamma(t, \cdot)\rangle_{RKHS} = f(t).$$
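For finite combinations $f = \sum_j a_j \Gamma(s_j, \cdot)$ the RKHS inner product is just a quadratic form in the Gram matrix $(\Gamma(s_j, t_k))$. The NumPy sketch below uses the Brownian kernel Γ(s, t) = s ∧ t as an illustrative choice; the function names and the coefficients are hypothetical. It checks the reproducing property (33.3) and the nonnegativity of ⟨f, f⟩.

```python
import numpy as np

def gamma_bm(s, t):
    """Brownian covariance kernel Gamma(s, t) = min(s, t) (illustrative choice)."""
    return np.minimum(s, t)

def rkhs_inner(a, s, b, t, kernel=gamma_bm):
    """<f, g>_RKHS for f = sum_j a_j Gamma(s_j, .) and g = sum_k b_k Gamma(t_k, .)."""
    s, t = np.asarray(s, float), np.asarray(t, float)
    gram = kernel(s[:, None], t[None, :])  # entries Gamma(s_j, t_k)
    return float(np.asarray(a, float) @ gram @ np.asarray(b, float))

def evaluate(a, s, x, kernel=gamma_bm):
    """f(x) for f = sum_j a_j Gamma(s_j, .)."""
    return float(np.sum(np.asarray(a, float) * kernel(np.asarray(s, float), x)))

a, s = [1.0, -2.0, 0.5], [0.2, 0.5, 0.9]
x = 0.7
print(rkhs_inner(a, s, [1.0], [x]), evaluate(a, s, x))  # reproducing property
print(rkhs_inner(a, s, a, s))                            # <f, f> = E(sum a_j X_{s_j})^2
```

For this kernel, ⟨f, f⟩ here equals 0.825, which is also $\int_0^1 f'(r)^2\,dr$ for the piecewise linear f.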
We consider another Hilbert space M, the closure of the linear span of $\{X_t : t \in T\}$ with respect to $L^2(P)$. We define
$$\langle Y, Z\rangle_M = E[YZ]$$
if Y and Z are both finite linear combinations of the $X_t$'s. Thus if m, n ≥ 1 and $a_j, b_k \in \mathbb R$, we set
$$\Bigl\langle \sum_{j=1}^m a_j X_{s_j}, \sum_{k=1}^n b_k X_{t_k} \Bigr\rangle_M = \sum_{j=1}^m \sum_{k=1}^n a_j b_k E[X_{s_j} X_{t_k}], \tag{33.4}$$
and we let M be the closure of the collection of random variables of the form $\sum_{j=1}^m a_j X_{s_j}$ with respect to $\langle\cdot,\cdot\rangle_M$. Since $\Gamma(s_j, t_k) = E[X_{s_j} X_{t_k}]$, from (33.4) we see that H and M are isomorphic, where we have a one-to-one correspondence between $\sum_{j=1}^m a_j \Gamma(s_j, \cdot)$ and $\sum_{j=1}^m a_j X_{s_j}$.
Let $\{e_n\}$ be a complete orthonormal system for H. Let $Y_n$ be the element of M corresponding to $e_n$. Then
$$E[Y_n Y_m] = \langle Y_n, Y_m\rangle_M = \langle e_n, e_m\rangle_{RKHS} = \delta_{nm},$$
where $\delta_{nm}$ is 0 if n ≠ m and 1 if n = m. The $Y_n$ are $L^2$ limits of linear combinations of the jointly normal $X_t$'s, hence jointly normal, so this implies that the $Y_n$ are independent normal random variables with mean zero and variance one; see Proposition A.55. (Recall that we are assuming that all the $X_t$'s have mean zero.)
Since Γ(s, ·) is an element of H, we can write
$$\Gamma(s, \cdot) = \sum_{n=1}^\infty \langle \Gamma(s, \cdot), e_n\rangle_{RKHS}\, e_n(\cdot) = \sum_{n=1}^\infty e_n(s) e_n(\cdot).$$
Using the correspondence between H and M, we have
$$X_s = \sum_{n=1}^\infty e_n(s) Y_n,$$
where the $Y_n$ are i.i.d. standard normal random variables. This is known as the Karhunen–Loève expansion of a Gaussian process.
Example 33.2 Let’s see what this expansion is in the case of Brownian motion. If we define
$$\langle f, g\rangle_{CM} = \int_0^1 f'(r) g'(r)\,dr \tag{33.5}$$
for absolutely continuous f and g whose first derivatives are in $L^2([0,1])$ and such that f(0) = g(0) = 0, then because Γ(s, t) = s ∧ t, the derivative of r ↦ Γ(s, r) is $1_{[0,s)}(r)$, and so
$$\langle \Gamma(s, \cdot), \Gamma(t, \cdot)\rangle_{CM} = \int_0^1 1_{[0,s)}(r) 1_{[0,t)}(r)\,dr = s \wedge t = \Gamma(s, t),$$
and we see that we have identified the reproducing kernel Hilbert space for Brownian motion on [0, 1]. The notation $\langle\cdot,\cdot\rangle_{CM}$ is used because the Hilbert space with this inner product is called the Cameron–Martin space, a space that has many connections with Brownian motion. If $e_0(s) = s$ and $e_n(s) = \sqrt2 \sin(n\pi s)/(n\pi)$ for n ≥ 1, then $\{e_n, n \ge 0\}$ is a complete orthonormal sequence for the Cameron–Martin space. (The derivatives $e_0' = 1$ and $e_n'(s) = \sqrt2\cos(n\pi s)$ form the cosine basis of $L^2([0,1])$; without $e_0$ the sequence would not be complete.) The Karhunen–Loève expansion is equivalent to the formula (6.2) that we used in our first construction of Brownian motion.
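The Brownian case can be checked numerically. With $e_0(s) = s$ and $e_n(s) = \sqrt2\sin(n\pi s)/(n\pi)$ for n ≥ 1, the partial sums of $\sum_n e_n(s)e_n(t)$ should approach s ∧ t; dropping $e_0$ leaves s ∧ t − st, the Brownian bridge covariance. The NumPy sketch below also draws one path from the truncated expansion $X_s = \sum_n e_n(s)Y_n$; the truncation levels and function names are arbitrary illustrative choices.

```python
import numpy as np

def kl_kernel_partial_sum(s, t, n_terms, include_e0=True):
    """Partial sum of sum_n e_n(s) e_n(t).

    e_n(s) = sqrt(2) sin(n pi s) / (n pi) for n >= 1, and optionally e_0(s) = s.
    """
    n = np.arange(1, n_terms + 1)
    total = np.sum(2.0 * np.sin(n * np.pi * s) * np.sin(n * np.pi * t) / (n * np.pi) ** 2)
    return float(total + (s * t if include_e0 else 0.0))

def sample_bm_kl(times, n_terms, rng):
    """One path of the truncated expansion X_s = s Y_0 + sum_{n>=1} e_n(s) Y_n."""
    times = np.asarray(times, float)
    n = np.arange(1, n_terms + 1)
    basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(times, n)) / (n * np.pi)
    y0 = rng.standard_normal()
    y = rng.standard_normal(n_terms)
    return times * y0 + basis @ y

print(kl_kernel_partial_sum(0.3, 0.7, 2000))                    # close to min(s, t) = 0.3
print(kl_kernel_partial_sum(0.3, 0.7, 2000, include_e0=False))  # close to 0.3 - 0.21
path = sample_bm_kl(np.linspace(0.0, 1.0, 101), 500, np.random.default_rng(0))
```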

33.2 Continuous Gaussian processes
We now turn to the construction of Gaussian processes with continuous paths. Suppose we
have an index set T and a non-negative definite kernel Γ(·, ·). Saying Γ is non-negative definite means that for each n and each $t_1, \dots, t_n \in T$, the matrix whose (i, j) entry is $\Gamma(t_i, t_j)$ is a non-negative definite matrix. We define a metric on T by
$$d(s, t) = (\mathrm{Var}\,(X_t - X_s))^{1/2} = \bigl(\Gamma(t, t) - 2\Gamma(s, t) + \Gamma(s, s)\bigr)^{1/2};$$
the second expression makes sense before any process X has been constructed. Actually, d is a pseudo-metric because d(s, t) = 0 does not necessarily imply t = s. An ε-ball is a set of the form $\{t \in T : d(t, t_0) < \varepsilon\}$ for some $t_0$. Let N(ε) be the minimum number of ε-balls needed to cover T.
Theorem 33.3 Let Γ : T × T → R be continuous with respect to the pseudo-metric d, symmetric, and non-negative definite. If for some β < 1 and some constant c we have
$$\log N(\varepsilon) \le c\varepsilon^{-\beta}, \qquad \varepsilon \in (0, 1), \tag{33.6}$$
then there exists a continuous Gaussian process $\{X_t : t \in T\}$ with covariance kernel Γ.
One can in fact be more precise than (33.6) and give an integral condition that N(x) must satisfy for x small.
Before proving Theorem 33.3, let us look at a number of examples.
Example 33.4 In the case of Brownian motion, $\mathrm{Var}\,(X_t - X_s) = |t - s|$, so that $d(s, t) = |s - t|^{1/2}$. If T is the interval [0, 1], then the ε-balls centered at the points $k\varepsilon^2/4$, $0 \le k \le 4/\varepsilon^2$, cover [0, 1]. Therefore $N(\varepsilon) \le c/\varepsilon^2$, implying $\log N(\varepsilon) \le c\log(1/\varepsilon)$, which satisfies (33.6). This and Theorem 2.4 give a construction of Brownian motion.
Example 33.5 We look at fractional Brownian motion. Let H ∈ (0, 2). H is known as the Hurst index, where H = 1 corresponds to Brownian motion. Define
$$\Gamma(s, t) = \tfrac12\bigl(|s|^H + |t|^H - |s - t|^H\bigr).$$
This leads to $d(s, t) = |t - s|^{H/2}$. Intervals of length $\varepsilon^{2/H}$ are contained in ε-balls, and it takes $c\varepsilon^{-2/H}$ of them to cover [0, 1]. Therefore again $\log N(\varepsilon) \le c\log(1/\varepsilon)$, and (33.6) applies. One use of fractional Brownian motion is to model stock prices where there is more or less memory of the past than a Brownian motion has.
Example 33.6 Here is our first example of a Gaussian process where T is not a subset of [0, ∞). We construct a Brownian sheet, $X(t_1, t_2)$, where the points $(t_1, t_2) \in [0, 1]^2$. More generally we can consider X(t), where $t \in [0, 1]^d$. This is no harder, but for simplicity of notation we consider only the case d = 2. If $s = (s_1, s_2)$ and $t = (t_1, t_2)$, define
$$\Gamma(s, t) = (s_1 \wedge t_1)(s_2 \wedge t_2).$$
One motivation for this formula is to identify the point $(t_1, t_2)$ with the rectangle $R_t$ whose lower left corner is at the origin and whose upper right corner is at $(t_1, t_2)$. Then the covariance of $X_s$ and $X_t$ is the area of $R_s \cap R_t$. Some simple geometry shows that if we put ε-balls centered at the points $(c_1 j\varepsilon^2, c_1 k\varepsilon^2)$ for an appropriate $c_1$ and with $j, k \le c_2\varepsilon^{-2}$, we cover T. Therefore $N(\varepsilon) \le c\varepsilon^{-4}$, and so $\log N(\varepsilon) \le c\log(1/\varepsilon)$.
Example 33.7 We can generalize the last example. For every Borel subset A of $[0, 1]^d$, let $X_A$ be a Gaussian random variable. We want the covariance of $X_A$ and $X_B$ to be the Lebesgue measure of A ∩ B. This is known as a set-indexed process. If we let T be the collection of all Borel subsets of $[0, 1]^d$, one cannot get a continuous Gaussian process. In order to get a continuous process X one must restrict T to be a subcollection of sets whose boundaries are sufficiently smooth; see Dudley (1973).
Example 33.8 Our last example has a more complicated index set. Let W be a one-dimensional Brownian motion. If $f \in L^2[0, 1]$, define
$$X_f = \int_0^1 f(s)\,dW_s.$$
By Exercise 24.6, $X_f$ is a Gaussian random variable with mean 0 and variance $\int_0^1 f(s)^2\,ds$, and the covariance of $X_f$ and $X_g$ is $\int_0^1 f(s)g(s)\,ds$. It follows that
$$d(f, g)^2 = \int_0^1 (f(s) - g(s))^2\,ds.$$
The process $X_f$ is known as a Gaussian field. For what subsets T of $L^2([0, 1])$ can one define a process $X_f$ that has continuous paths with respect to d? This means that the map $f \mapsto X_f(\omega)$ is continuous for almost all ω, where we use the pseudo-metric d to define open sets in T. It turns out $T = \{f \in L^2([0, 1]) : \|f\|_2 \le 1\}$ is too large to obtain a continuous Gaussian process, but, for example,
$$T = \{f \in C^2([0, 1]) : \|f\|_\infty \le 1,\ \|f'\|_\infty \le 1,\ \|f''\|_\infty \le 1\}$$
is small enough to apply Theorem 33.3.
We now proceed to the proof of Theorem 33.3.
Proof of Theorem 33.3 Since T can be covered by finitely many ε-balls for each ε, it follows that if A(ε) is the collection of centers for the cover by ε-balls, then $A = \cup_{n=1}^\infty A(2^{-n})$ is a countable dense subset of T. We first label the elements of A by $t_1, t_2, \dots$. For each n, we construct the law of $(X_{t_1}, \dots, X_{t_n})$. We then use the Kolmogorov extension theorem to construct the law of $\{X_t : t \in A\}$. Next we prove that $t \mapsto X_t$ is uniformly continuous on A, almost surely. Finally we define $X_t$ for all t ∈ T by continuity.
Step 1. We construct the law of $(X_{t_1}, \dots, X_{t_n})$. Let n be fixed, and let B be an n × n matrix whose (i, j) entry is $\Gamma(t_i, t_j)$. The matrix B is symmetric, and non-negative definite by hypothesis. Let $Y_1, \dots, Y_n$ be independent normal random variables with mean zero and variance one. If we let C be the non-negative definite square root of B and X = CY (viewed as vectors), or equivalently,
$$X_{t_i} = \sum_{j=1}^n C_{ij} Y_j,$$
a simple calculation shows that $E[X_{t_k} X_{t_m}] = (CC^T)_{km} = B_{km} = \Gamma(t_k, t_m)$. The $X_{t_j}$'s are jointly normal and this gives the first step of the construction.
Step 2. We apply the Kolmogorov extension theorem. Let $P_n$ be the law of $(X_{t_1}, \dots, X_{t_n})$. It is easy to see the consistency property holds for the $P_n$, so by the Kolmogorov extension theorem, there exists a probability P on $\mathbb R^{\mathbb N}$ such that if we define $X_{t_i}(\omega) = \omega(i)$ for $t_i \in A$, the law of $(X_{t_1}, \dots, X_{t_n})$ is $P_n$ for each n.
Step 3. We show that, except on a null set, the map $t \mapsto X_t(\omega)$ is uniformly continuous on A. To prove the uniform continuity, we proceed similarly to Theorem 8.1. For each point t ∈ A, let $t_j$ be the element of $A(2^{-j})$ closest to t, with some convention for breaking ties. We will fix J in a moment, and write
$$X_t = X_{t_J} + (X_{t_{J+1}} - X_{t_J}) + (X_{t_{J+2}} - X_{t_{J+1}}) + \cdots,$$
where the sum is finite because t ∈ A. Let λ > 0. If $|X_t - X_s| > \lambda$ for some s, t ∈ A with $d(s, t) < 2^{-J}$, then ω is in one or more of the following events:
(a) the event $E_J = \{|X_{t_J} - X_{s_J}| > \lambda/2 \text{ for some } s_J, t_J \in A(2^{-J}) \text{ with } d(s_J, t_J) \le 3\cdot 2^{-J}\}$;
(b) the event $F_j = \{|X_{t_{j+1}} - X_{t_j}| > \lambda/(8j^2) \text{ for some } t_j \in A(2^{-j}),\ t_{j+1} \in A(2^{-(j+1)}) \text{ with } d(t_j, t_{j+1}) < 3\cdot 2^{-j+1}\}$ for some j ≥ J;
(c) the event $G_j = \{|X_{s_{j+1}} - X_{s_j}| > \lambda/(8j^2) \text{ for some } s_j \in A(2^{-j}),\ s_{j+1} \in A(2^{-(j+1)}) \text{ with } d(s_j, s_{j+1}) < 3\cdot 2^{-j+1}\}$ for some j ≥ J.
(This works because $\lambda/2 + 2\sum_{j\ge1} \lambda/(8j^2) = \lambda/2 + \lambda\pi^2/24 < \lambda$.)
First we bound the probability of $E_J$. There are $N(2^{-J})$ elements of $A(2^{-J})$, so there are at most $\exp(2c(2^J)^\beta)$ pairs $(s_J, t_J)$. If $d(t_J, s_J) < 3\cdot 2^{-J}$, then
$$P(|X_{s_J} - X_{t_J}| > \lambda/2) \le 2\exp\Bigl(-\frac{(\lambda/2)^2}{2\cdot 3\cdot 2^{-J}}\Bigr).$$
Therefore the probability of $E_J$ is bounded by
$$P(E_J) \le e^{c2^{\beta J}} e^{-c\lambda^2 2^J}.$$
Since β < 1, this can be made as small as we like by taking J large enough. For any $t_j$ and $t_{j+1}$ with $d(t_j, t_{j+1}) < 3\cdot 2^{-j+1}$,
$$P(|X_{t_j} - X_{t_{j+1}}| > \lambda/(8j^2)) \le 2\exp\Bigl(-\frac{\lambda^2/(64 j^4)}{6\cdot 2^{-j+1}}\Bigr).$$
There are fewer than $e^{c2^{\beta j}}$ points in $A(2^{-j})$ and fewer than $e^{c2^{\beta(j+1)}}$ points in $A(2^{-(j+1)})$, so fewer than $e^{c2^{\beta j}}$ pairs (with a different constant c). Thus the probability of $F_j$ is bounded by
$$P(F_j) \le c\,e^{c2^{\beta j}} e^{-c\lambda^2 2^j/j^4}.$$
Since β < 1, this is summable in j, and $\sum_{j=J}^\infty P(F_j)$ can be made as small as we like if we take J large enough. We handle the bound for $G_j$ similarly. Thus, given ε, we have
$$P\Bigl(\sup_{s,t\in A,\ d(s,t)<2^{-J}} |X_t - X_s| > \lambda\Bigr) \le \varepsilon$$
if we take J large enough, where J depends on ε and λ. This suffices to prove the uniform continuity.
Step 4. We use continuity to complete the proof. Define $X_t = \lim_{s\in A,\ s\to t} X_s$. The limit exists and will be a continuous function of t by virtue of the uniform continuity. By Remark A.56, $X_t$ will have the desired covariance function.
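Step 1 of the proof is also how one simulates a Gaussian vector with a prescribed covariance. The NumPy sketch below forms B from a kernel, takes its symmetric non-negative definite square root through an eigendecomposition (a Cholesky factor would serve equally well when B is positive definite), and returns X = CY. The kernel is the fractional Brownian kernel ½(|s|^H + |t|^H − |s − t|^H) with H ∈ (0, 2), which reduces to s ∧ t when H = 1; the grid, H, and function names are illustrative.

```python
import numpy as np

def fbm_kernel(s, t, H):
    """Gamma(s, t) = (|s|^H + |t|^H - |s - t|^H) / 2 for H in (0, 2)."""
    return 0.5 * (np.abs(s) ** H + np.abs(t) ** H - np.abs(s - t) ** H)

def psd_sqrt(B):
    """Symmetric non-negative definite square root of a symmetric matrix B."""
    w, V = np.linalg.eigh(B)
    w = np.clip(w, 0.0, None)       # drop tiny negative eigenvalues caused by round-off
    return (V * np.sqrt(w)) @ V.T   # V diag(sqrt(w)) V^T

def sample_gaussian_vector(times, H, rng):
    """Step 1: X = C Y with C = B^{1/2}, B_ij = Gamma(t_i, t_j), Y i.i.d. N(0, 1)."""
    times = np.asarray(times, float)
    B = fbm_kernel(times[:, None], times[None, :], H)
    C = psd_sqrt(B)
    Y = rng.standard_normal(len(times))
    return C @ Y, B, C

times = np.linspace(0.01, 1.0, 50)
X, B, C = sample_gaussian_vector(times, 1.4, np.random.default_rng(0))
```

Since C is symmetric, $CC^T = C^2 = B$, which is the "simple calculation" of Step 1.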
We have been considering Gaussian processes taking values in R, but it is also of interest
to look at Brownian motion taking values in a Hilbert space or a Banach space. There are
three steps to constructing such a process:
(1) constructing Gaussian measures on Banach (or Hilbert) spaces;
(2) getting a suitable estimate on ‖Xt − Xs‖;
(3) constructing a Brownian motion.
Of these three steps, the third follows along the lines we used for real-valued processes.
Steps (1) and (2) require considerable work, and we refer the reader to Bogachev (1998) or Kuo (1975). A measure μ on a Banach space is called Gaussian if $\mu \circ L^{-1}$ is a Gaussian measure on R for every continuous linear functional L on the Banach space.
Exercises
33.1 Finish the proof that H as defined in Section 33.1 is complete.
33.2 Show that if in Example 33.8 we let
$$T = \{f \in C^1([0, 1]) : \|f\|_\infty \le 1,\ \|f'\|_\infty \le 1\},$$
then log N(ε) is bounded above by $c_1\varepsilon^{-1}$ and bounded below by $c_2\varepsilon^{-1}$.
33.3 Suppose $X^i$ and $Y^i$ are two sequences of Brownian motions with all of the Brownian motions independent of each other. Let
$$Z_n(s, t) = \frac{1}{\sqrt n} \sum_{i=1}^n X^i_s Y^i_t.$$
Prove that $Z_n$ converges weakly with respect to the topology of $C([0, 1]^2)$ as n → ∞ to a Brownian sheet.

33.4 Let X be a Brownian bridge. (This will be studied further in Section 35.2.) This means that X
is a mean zero Gaussian process with
Cov (Xs, Xt ) = s ∧ t − st, 0 ≤ s, t ≤ 1.
Identify the reproducing kernel Hilbert space for X .
33.5 Let X be the Ornstein–Uhlenbeck process started at 0. This was defined in Exercise 19.5.
Identify the reproducing kernel Hilbert space for X .

34
The space D[0, 1]
We define the space D[0, 1] to be the collection of real-valued functions on [0, 1] which
are right continuous with left limits. We will introduce a topology on D = D[0, 1], the
Skorokhod topology, which makes D into a complete separable metric space. We will give
a criterion for a subset of D to be compact, which will lead to some criteria for a family of
probability measures on D to be tight.
34.1 Metrics for D[0, 1]
We write f(t−) for $\lim_{s\uparrow t} f(s)$, the left-hand limit of f at t. Given f ∈ D and ε > 0, let $t_0 = 0$, and for i ≥ 0 let $t_{i+1} = \inf\{t > t_i : |f(t) - f(t_i)| > \varepsilon\} \wedge 1$. Because f is right continuous with left limits, $t_i = 1$ for all sufficiently large i.
Our first try at a metric, ρ, makes D into a separable metric space, but one that is not
complete. Let’s start with ρ anyway, since we need it on the way to the metric d we end up
with.
Let Λ be the set of functions λ from [0, 1] to [0, 1] that are continuous, strictly increasing, and such that λ(0) = 0, λ(1) = 1. Define
$$\rho(f, g) = \inf\Bigl\{\varepsilon > 0 : \exists\lambda\in\Lambda \text{ such that } \sup_{t\in[0,1]} |\lambda(t) - t| < \varepsilon,\ \sup_{t\in[0,1]} |f(t) - g(\lambda(t))| < \varepsilon\Bigr\}.$$
Since the function λ(t) = t is in Λ and elements of D are bounded, ρ(f, g) is finite if f, g ∈ D. Clearly ρ(f, g) ≥ 0. If ρ(f, g) = 0, then either f(t) = g(t) or else f(t) = g(t−) for each t; since elements of D are right continuous with left limits, it follows that f = g. If λ ∈ Λ, then so is $\lambda^{-1}$ and we have, setting $s = \lambda^{-1}(t)$ and noting both s and t range over [0, 1],
$$\sup_{t\in[0,1]} |\lambda^{-1}(t) - t| = \sup_{s\in[0,1]} |s - \lambda(s)|$$
and
$$\sup_{t\in[0,1]} |f(\lambda^{-1}(t)) - g(t)| = \sup_{s\in[0,1]} |f(s) - g(\lambda(s))|,$$
and we conclude ρ(f, g) = ρ(g, f). The triangle inequality follows from
$$\sup_{t\in[0,1]} |\lambda_2\circ\lambda_1(t) - t| \le \sup_{t\in[0,1]} |\lambda_1(t) - t| + \sup_{s\in[0,1]} |\lambda_2(s) - s|$$
and
$$\sup_{t\in[0,1]} |f(t) - h(\lambda_2\circ\lambda_1(t))| \le \sup_{t\in[0,1]} |f(t) - g(\lambda_1(t))| + \sup_{s\in[0,1]} |g(s) - h(\lambda_2(s))|.$$
Look at the set of f in D for which there exists an integer k such that f is constant and equal to a rational on each interval [(i − 1)/k, i/k) and f(1) is rational. It is not hard to check (Exercise 34.1) that the collection of such f's is dense in D with respect to ρ, which shows (D, ρ) is separable.
The space D with the metric ρ is not, however, complete; see Exercise 34.2. We therefore introduce a slightly different metric d. Define
$$\|\lambda\| = \sup_{s\ne t,\ s,t\in[0,1]} \Bigl| \log\frac{\lambda(t) - \lambda(s)}{t - s} \Bigr|$$
and let
$$d(f, g) = \inf\Bigl\{\varepsilon > 0 : \exists\lambda\in\Lambda \text{ such that } \|\lambda\| \le \varepsilon,\ \sup_{t\in[0,1]} |f(t) - g(\lambda(t))| \le \varepsilon\Bigr\}.$$
Note $\|\lambda^{-1}\| = \|\lambda\|$ and $\|\lambda_2\circ\lambda_1\| \le \|\lambda_1\| + \|\lambda_2\|$. The symmetry of d and the triangle inequality follow easily from this, and we conclude d is a metric.
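A standard example shows why D is not given the supremum metric. For a, b ∈ (0, 1), the functions $f = 1_{[a,1]}$ and $g = 1_{[b,1]}$ are at sup-distance 1 whenever a ≠ b, but the piecewise linear λ ∈ Λ with λ(a) = b satisfies g∘λ = f, so d(f, g) ≤ ‖λ‖. For a piecewise linear λ, each difference quotient is an average of the slopes of its pieces, so ‖λ‖ is the largest |log slope|. The NumPy sketch below (function names are illustrative) computes both quantities.

```python
import numpy as np

def piecewise_linear_norm(xs, ys):
    """||lambda|| for the piecewise linear lambda through the points (xs[i], ys[i]).

    Difference quotients of lambda are averages of the slopes of its pieces,
    so the supremum of |log quotient| is the largest |log slope|.
    """
    slopes = np.diff(ys) / np.diff(xs)
    return float(np.max(np.abs(np.log(slopes))))

def jump_d_bound(a, b):
    """Upper bound for d(1_[a,1], 1_[b,1]) using lambda(0)=0, lambda(a)=b, lambda(1)=1."""
    return piecewise_linear_norm([0.0, a, 1.0], [0.0, b, 1.0])

def jump_sup_distance(a, b, grid_size=10_001):
    """sup_t |1_[a,1](t) - 1_[b,1](t)| evaluated on a grid."""
    t = np.linspace(0.0, 1.0, grid_size)
    return float(np.max(np.abs((t >= a).astype(float) - (t >= b).astype(float))))

print(jump_d_bound(0.5, 0.51), jump_sup_distance(0.5, 0.51))
```

So jumps at nearby times are close in d even though they are at distance 1 in the supremum norm.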
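Lemma 34.1 There exists $\varepsilon_0$ such that
$$\rho(f, g) \le 2d(f, g)$$
if $d(f, g) < \varepsilon_0$. (It turns out $\varepsilon_0 = 1/4$ will do.)
Proof Since log(1 + 2x)/(2x) → 1 as x → 0, we have
$$\log(1 - 2\varepsilon) < -\varepsilon < \varepsilon < \log(1 + 2\varepsilon)$$
if ε is small enough. Suppose d(f, g) < ε and λ is an element of Λ such that ‖λ‖ < ε and $\sup_{t\in[0,1]} |f(t) - g(\lambda(t))| < \varepsilon$. Since λ(0) = 0, taking s = 0 in the definition of ‖λ‖ gives
$$\log(1 - 2\varepsilon) < -\varepsilon < \log\frac{\lambda(t)}{t} < \varepsilon < \log(1 + 2\varepsilon), \tag{34.1}$$
or
$$1 - 2\varepsilon < \frac{\lambda(t)}{t} < 1 + 2\varepsilon, \tag{34.2}$$
which implies |λ(t) − t| < 2εt ≤ 2ε. Hence ρ(f, g) ≤ 2ε, and since this holds for every sufficiently small ε > d(f, g), we get ρ(f, g) ≤ 2d(f, g).
We define the analog $\xi_f$ of the modulus of continuity for a function in D as follows. Define $\theta_f[a, b) = \sup_{s,t\in[a,b)} |f(t) - f(s)|$ and
$$\xi_f(\delta) = \inf\Bigl\{\max_{1\le i\le n} \theta_f[t_{i-1}, t_i) : \exists n \ge 1,\ 0 = t_0 < t_1 < \cdots < t_n = 1 \text{ such that } t_i - t_{i-1} > \delta \text{ for all } i \le n\Bigr\}.$$
Observe that if f ∈ D, then $\xi_f(\delta) \downarrow 0$ as δ ↓ 0.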

Lemma 34.2 Suppose δ < 1/4. Let f ∈ D. If $\rho(f, g) < \delta^2$, then $d(f, g) \le 4\delta + \xi_f(\delta)$.
Proof Choose $t_i$'s such that $t_i - t_{i-1} > \delta$ and $\theta_f[t_{i-1}, t_i) < \xi_f(\delta) + \delta$ for each i. Pick μ ∈ Λ such that $\sup_t |f(t) - g(\mu(t))| < \delta^2$ and $\sup_t |\mu(t) - t| < \delta^2$. Then $\sup_t |f(\mu^{-1}(t)) - g(t)| < \delta^2$. Set $\lambda(t_i) = \mu(t_i)$ and let λ be linear in between. Since $\mu^{-1}(\lambda(t_i)) = t_i$ for all i, t and $\mu^{-1}\circ\lambda(t)$ always lie in the same subinterval $[t_{i-1}, t_i)$. Consequently
$$\begin{aligned}
|f(t) - g(\lambda(t))| &\le |f(t) - f(\mu^{-1}(\lambda(t)))| + |f(\mu^{-1}(\lambda(t))) - g(\lambda(t))| \\
&\le \max_i \theta_f[t_{i-1}, t_i) + \delta^2 \le \xi_f(\delta) + \delta + \delta^2 < \xi_f(\delta) + 4\delta.
\end{aligned}$$
We have
$$|\lambda(t_i) - \lambda(t_{i-1}) - (t_i - t_{i-1})| = |\mu(t_i) - \mu(t_{i-1}) - (t_i - t_{i-1})| \le 2\delta^2 < 2\delta(t_i - t_{i-1}).$$
Since λ is defined by linear interpolation,
$$|(\lambda(t) - \lambda(s)) - (t - s)| \le 2\delta|t - s|, \qquad s, t \in [0, 1],$$
which leads to
$$\Bigl| \frac{\lambda(t) - \lambda(s)}{t - s} - 1 \Bigr| \le 2\delta,$$
or
$$\log(1 - 2\delta) \le \log\Bigl(\frac{\lambda(t) - \lambda(s)}{t - s}\Bigr) \le \log(1 + 2\delta).$$
Since δ < 1/4, we have ‖λ‖ ≤ 4δ. Hence $d(f, g) \le 4\delta + \xi_f(\delta)$.
Proposition 34.3 The metrics d and ρ are equivalent, i.e., they generate the same topology. In particular, (D, d) is separable.
Proof Let $B_\rho(f, r)$ denote the ball with center f and radius r with respect to the metric ρ and define $B_d(f, r)$ analogously. Let ε > 0 and let f ∈ D. If d(f, g) < ε/2 and ε is small enough, then ρ(f, g) ≤ 2d(f, g) < ε, and so $B_d(f, \varepsilon/2) \subset B_\rho(f, \varepsilon)$. To go the other direction, what we must show is that given f and ε, there exists δ such that $B_\rho(f, \delta) \subset B_d(f, \varepsilon)$. Here δ may depend on f; in fact, it has to in general, for otherwise every Cauchy sequence with respect to ρ would be a Cauchy sequence with respect to d, and the completeness of (D, d) (Theorem 34.4) would contradict the incompleteness of (D, ρ) (Exercise 34.2). Choose δ small enough that $\delta^{1/2} < 1/4$ and $4\delta^{1/2} + \xi_f(\delta^{1/2}) < \varepsilon$. By Lemma 34.2, if ρ(f, g) < δ, then d(f, g) < ε, which is what we want.
Finally, suppose G is open with respect to the topology generated by ρ. For each f ∈ G, let $r_f$ be chosen so that $B_\rho(f, r_f) \subset G$. Hence $G = \cup_{f\in G} B_\rho(f, r_f)$. Let $s_f$ be chosen so that $B_d(f, s_f) \subset B_\rho(f, r_f)$.
Then $\cup_{f\in G} B_d(f, s_f) \subset G$, and in fact the sets are equal because if f ∈ G, then $f \in B_d(f, s_f)$. Since G can be written as the union of balls which are open with respect to d, G is open with respect to d. The same argument with d and ρ interchanged shows that a set that is open with respect to d is open with respect to ρ.
34.2 Compactness and completeness
We now show completeness for (D, d).
Theorem 34.4 The space D with the metric d is complete.
Proof Let $f_n$ be a Cauchy sequence with respect to the metric d. If we can find a subsequence $n_j$ such that $f_{n_j}$ converges, say, to f, then it is standard that the whole sequence converges to f. Choose $n_j$ such that $d(f_{n_j}, f_{n_{j+1}}) < 2^{-j}$. For each j there exists $\lambda_j$ such that
$$\sup_t |f_{n_j}(t) - f_{n_{j+1}}(\lambda_j(t))| \le 2^{-j}, \qquad \|\lambda_j\| \le 2^{-j}.$$
As in (34.1) and (34.2), $|\lambda_j(t) - t| \le 2^{-j+1}$. Then
$$\sup_t |\lambda_{n+m+1}\circ\lambda_{n+m}\circ\cdots\circ\lambda_n(t) - \lambda_{n+m}\circ\cdots\circ\lambda_n(t)| = \sup_s |\lambda_{n+m+1}(s) - s| \le 2^{-(n+m)}$$
for each n. Hence for each n, the sequence $\lambda_{n+m}\circ\cdots\circ\lambda_n$ (indexed by m) is a Cauchy sequence of functions on [0, 1] with respect to the supremum norm on [0, 1]. Let $\nu_n$ be the limit. Clearly $\nu_n(0) = 0$, $\nu_n(1) = 1$, and $\nu_n$ is continuous and nondecreasing. We also have
$$\Bigl| \log\frac{\lambda_{n+m}\circ\cdots\circ\lambda_n(t) - \lambda_{n+m}\circ\cdots\circ\lambda_n(s)}{t - s} \Bigr| \le \|\lambda_{n+m}\circ\cdots\circ\lambda_n\| \le \|\lambda_{n+m}\| + \cdots + \|\lambda_n\| \le \frac{1}{2^{n-1}}.$$
If we then let m → ∞, we obtain
$$\Bigl| \log\frac{\nu_n(t) - \nu_n(s)}{t - s} \Bigr| \le \frac{1}{2^{n-1}},$$
which implies $\nu_n \in \Lambda$ with $\|\nu_n\| \le 2^{1-n}$. We see that $\nu_n = \nu_{n+1}\circ\lambda_n$. Consequently
$$\sup_t |f_{n_j}(\nu_j^{-1}(t)) - f_{n_{j+1}}(\nu_{j+1}^{-1}(t))| = \sup_s |f_{n_j}(s) - f_{n_{j+1}}(\lambda_j(s))| \le 2^{-j}.$$
Therefore $f_{n_j}\circ\nu_j^{-1}$ is a Cauchy sequence on [0, 1] with respect to the supremum norm. Let f be the limit; as a uniform limit of functions in D, f is in D. Since $\sup_t |f_{n_j}(\nu_j^{-1}(t)) - f(t)| \to 0$ and $\|\nu_j\| \to 0$ as j → ∞, $d(f_{n_j}, f) \to 0$.
We next show that if $f_n \to f$ with respect to d and f ∈ C[0, 1], the convergence is in fact uniform.
Proposition 34.5 Suppose $f_n \to f$ in the topology of $D[0, 1]$ with respect to $d$ and $f \in C[0, 1]$. Then $\sup_{t \in [0,1]} |f_n(t) - f(t)| \to 0$.

Proof Let $\varepsilon > 0$. Since $f$ is uniformly continuous on $[0, 1]$, there exists $\delta$ such that $|f(t) - f(s)| < \varepsilon/2$ if $|t - s| < \delta$. For $n$ sufficiently large there exists $\lambda_n \in \Lambda$ such that $\sup_t |f_n(t) - f(\lambda_n(t))| < \varepsilon/2$ and $\sup_t |\lambda_n(t) - t| < \delta$. Therefore $|f(\lambda_n(t)) - f(t)| < \varepsilon/2$, and so $|f_n(t) - f(t)| < \varepsilon$ for all $t$.

We turn to compactness.

Theorem 34.6 A set $A$ has compact closure in $D[0, 1]$ if
$$\sup_{f \in A} \sup_t |f(t)| < \infty \quad\text{and}\quad \lim_{\delta \to 0} \sup_{f \in A} \xi_f(\delta) = 0.$$
The converse of this theorem is also true, but we won't need it. See Billingsley (1968) or Exercise 34.9.

Proof A complete and totally bounded set in a metric space is compact, and $D[0, 1]$ is a complete metric space. Hence it suffices to show that $A$ is totally bounded: for each $\varepsilon > 0$ there exist finitely many balls of radius $\varepsilon$ that cover $A$.
Let $\eta > 0$ and choose $k$ large such that $1/k < \eta$ and $\xi_f(1/k) < \eta$ for each $f \in A$. Let $M = \sup_{f \in A} \sup_t |f(t)|$ and let $H = \{-M + j/k : j \text{ an integer}, 0 \le j \le 2kM\}$, so that $H$ is an $\eta$-net for $[-M, M]$. Let $B$ be the set of functions $f \in D[0, 1]$ that are constant on each interval $[(i-1)/k, i/k)$ and that take values only in the set $H$. In particular, $f(1) \in H$. Note that $B$ is a finite set.

We first prove that $B$ is a $2\eta$-net for $A$ with respect to $\rho$. If $f \in A$, there exist $t_0, \ldots, t_n$ such that $t_0 = 0$, $t_n = 1$, $t_i - t_{i-1} > 1/k$ for each $i$, and $\theta_f[t_{i-1}, t_i) < \eta$ for each $i$. Note we must have $n \le k$. For each $i$ choose integers $j_i$ such that $j_i/k \le t_i < (j_i + 1)/k$. The $j_i$ are distinct since the $t_i$ are more than $1/k$ apart. Define $\lambda$ so that $\lambda(j_i/k) = t_i$ and $\lambda$ is linear on each interval $[j_i/k, j_{i+1}/k]$. Choose $g \in B$ such that $|g(m/k) - f(\lambda(m/k))| < \eta$ for each $m \le k$. Observe that each $[m/k, (m+1)/k)$ lies inside some interval of the form $[j_i/k, j_{i+1}/k)$. Since $\lambda$ is increasing, $[\lambda(m/k), \lambda((m+1)/k))$ is contained in $[\lambda(j_i/k), \lambda(j_{i+1}/k)) = [t_i, t_{i+1})$. The function $f$ does not vary more than $\eta$ over each interval $[t_i, t_{i+1})$, so $f(\lambda(t))$ does not vary more than $\eta$ over each interval $[m/k, (m+1)/k)$. Since $g$ is constant on each such interval, $\sup_t |g(t) - f(\lambda(t))| < 2\eta$. We have $|\lambda(j_i/k) - j_i/k| = |t_i - j_i/k| < 1/k < \eta$ for each $i$. By the piecewise linearity of $\lambda$, $\sup_t |\lambda(t) - t| < \eta$. Thus $\rho(f, g) < 2\eta$. We have proved that given $f \in A$, there exists $g \in B$ such that $\rho(f, g) < 2\eta$; that is, $B$ is a $2\eta$-net for $A$ with respect to $\rho$.

Now let $\varepsilon > 0$ and choose $\delta \in (0, 1/4)$ small so that $4\delta + \xi_f(\delta) < \varepsilon$ for each $f \in A$. Set $\eta = \delta^2/4$. Choose $B$ as above to be a $2\eta$-net for $A$ with respect to $\rho$. By Lemma 34.2, if $\rho(f, g) < 2\eta < \delta^2$, then $d(f, g) \le 4\delta + \xi_f(\delta) < \varepsilon$. Therefore $B$ is an $\varepsilon$-net for $A$ with respect to $d$.

The following corollary is proved in exactly the same way as Theorem 32.1.

Corollary 34.7 Suppose $X_n$ are processes whose paths are right continuous with left limits.
Suppose for each $\varepsilon$ and $\eta$ there exist $n_0$, $R$, and $\delta$ such that
$$P(\xi_{X_n}(\delta) \ge \varepsilon) \le \eta \quad\text{and}\quad P\Big(\sup_{t \in [0,1]} |X_n(t)| \ge R\Big) \le \eta$$
for all $n \ge n_0$. Then the $X_n$ are tight with respect to the topology of $D[0, 1]$.

34.3 The Aldous criterion

A very useful criterion for tightness is the following one, due to Aldous (1978).

Theorem 34.8 Let $\{X_n\}$ be a sequence in $D[0, 1]$. Suppose
$$\lim_{R \to \infty} \sup_n P(|X_n(t)| \ge R) = 0 \tag{34.3}$$
for each $t \in [0, 1]$ and that whenever $\tau_n$ are stopping times for $X_n$ and $\delta_n \to 0$ are reals,
$$|X_n(\tau_n + \delta_n) - X_n(\tau_n)| \tag{34.4}$$
converges to 0 in probability as $n \to \infty$. Then the $X_n$ are tight with respect to the topology of $D[0, 1]$.

Proof We will set $X_n(t) = X_n(1)$ for $t \in [1, 2]$ to simplify notation. The proof of this theorem comprises four steps.

Step 1. We claim that (34.4) implies the following: given $\varepsilon$ there exist $n_0$ and $\delta$ such that
$$P(|X_n(\tau_n + s) - X_n(\tau_n)| \ge \varepsilon) \le \varepsilon \tag{34.5}$$
for each $n \ge n_0$, $s \le 2\delta$, and $\tau_n$ a stopping time for $X_n$. For if not, we choose an increasing subsequence $n_k$, stopping times $\tau_{n_k}$, and $s_{n_k} \le 1/k$ for which (34.5) does not hold. Taking $\delta_{n_k} = s_{n_k}$ gives a contradiction to (34.4).

Step 2. Let $\varepsilon > 0$, fix $n \ge n_0$, and let $T \le U \le 1$ be two stopping times for $X_n$. We will prove
$$P(U \le T + \delta,\ |X_n(U) - X_n(T)| \ge 2\varepsilon) \le 16\varepsilon. \tag{34.6}$$
To prove this, we start by letting $\lambda$ be Lebesgue measure. If
$$A_T = \{(\omega, s) \in \Omega \times [0, 2\delta] : |X_n(T + s) - X_n(T)| \ge \varepsilon\},$$
then for each $s \le 2\delta$ we have $P(\omega : (\omega, s) \in A_T) \le \varepsilon$ by (34.5) with $\tau_n$ replaced by $T$. Writing $P \times \lambda$ for the product measure, we then have
$$P \times \lambda(A_T) \le 2\delta\varepsilon. \tag{34.7}$$

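Set $B_T(\omega) = \{s : (\omega, s) \in A_T\}$ and $C_T = \{\omega : \lambda(B_T(\omega)) \ge \delta/4\}$. From (34.7) and the Fubini theorem,
$$\int \lambda(B_T(\omega))\, P(d\omega) \le 2\delta\varepsilon,$$
so, by Chebyshev's inequality,
$$P(C_T) \le 8\varepsilon.$$
We similarly define $B_U$ and $C_U$, and obtain $P(C_T \cup C_U) \le 16\varepsilon$.

If $\omega \notin C_T \cup C_U$, then $\lambda(B_T(\omega)) \le \delta/4$ and $\lambda(B_U(\omega)) \le \delta/4$. Suppose $U \le T + \delta$. Then
$$\lambda\{t \in [T, T + 2\delta] : |X_n(t) - X_n(T)| \ge \varepsilon\} \le \delta/4$$
and
$$\lambda\{t \in [U, U + \delta] : |X_n(t) - X_n(U)| \ge \varepsilon\} \le \delta/4.$$
Since $T \le U \le T + \delta$, the interval $[U, U + \delta]$, which has length $\delta$, is contained in $[T, T + 2\delta]$. Hence there exists $t \in [U, U + \delta]$ such that $|X_n(t) - X_n(T)| < \varepsilon$ and $|X_n(t) - X_n(U)| < \varepsilon$; this implies $|X_n(U) - X_n(T)| < 2\varepsilon$, which proves (34.6).

Step 3. We obtain a bound on $\xi_{X_n}$. Let $T_{n0} = 0$ and
$$T_{n,i+1} = \inf\{t > T_{ni} : |X_n(t) - X_n(T_{ni})| \ge 2\varepsilon\} \wedge 2.$$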
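By right continuity, $|X_n(T_{n,i+1}) - X_n(T_{ni})| \ge 2\varepsilon$ if $T_{n,i+1} < 2$. We choose $n_0$, $\delta$ as in Step 1. By Step 2 with $T = T_{ni}$ and $U = T_{n,i+1}$,
$$P(T_{n,i+1} - T_{ni} < \delta,\ T_{ni} < 2) \le 16\varepsilon. \tag{34.8}$$
Let $K = [2/\delta] + 1$ and apply (34.5) with $\varepsilon$ replaced by $\varepsilon/K$ to see that there exist $n_1 \ge n_0$ and $\zeta \le \delta \wedge \varepsilon$ such that if $n \ge n_1$, $s \le 2\zeta$, and $\tau_n$ is a stopping time, then
$$P(|X_n(\tau_n + s) - X_n(\tau_n)| > \varepsilon/K) \le \varepsilon/K. \tag{34.9}$$
By (34.6) with $T = T_{ni}$ and $U = T_{n,i+1}$, and with $\delta$ replaced by $\zeta$ and $\varepsilon$ replaced by $\varepsilon/K$,
$$P(T_{n,i+1} \le T_{ni} + \zeta) \le 16\varepsilon/K \tag{34.10}$$
for each $i$ and hence
$$P(\exists i \le K : T_{n,i+1} \le T_{ni} + \zeta) \le 16\varepsilon. \tag{34.11}$$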
We have
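$$\begin{aligned} E[T_{ni} - T_{n,i-1}; T_{nK} < 1] &\ge \delta P(T_{ni} - T_{n,i-1} \ge \delta,\ T_{nK} < 1) \\ &\ge \delta[P(T_{nK} < 1) - P(T_{ni} - T_{n,i-1} < \delta,\ T_{nK} < 1)] \\ &\ge \delta[P(T_{nK} < 1) - 16\varepsilon], \end{aligned}$$
where we used (34.8) in the last step. Summing over $i$ from 1 to $K$,
$$P(T_{nK} < 1) \ge E[T_{nK}; T_{nK} < 1] = \sum_{i=1}^K E[T_{ni} - T_{n,i-1}; T_{nK} < 1] \ge K\delta[P(T_{nK} < 1) - 16\varepsilon] \ge 2[P(T_{nK} < 1) - 16\varepsilon],$$
or $P(T_{nK} < 1) \le 32\varepsilon$. Combining this with (34.11), we see that except for an event of probability at most $48\varepsilon$, we have $\xi_{X_n}(\zeta) \le 4\varepsilon$.

Step 4. The last step is to obtain a bound on $\sup_t |X_n(t)|$. Let $\varepsilon > 0$ and choose $\delta$ and $n_0$ as in Step 1. Define
$$D_n^R = \{(\omega, s) \in \Omega \times [0, 1] : |X_n(s)(\omega)| > R\}$$
for $R > 0$. The measurability of $D_n^R$ with respect to the product $\sigma$-field $\mathcal{F} \times \mathcal{B}[0, 1]$, where $\mathcal{B}[0, 1]$ is the Borel $\sigma$-field on $[0, 1]$, follows from the fact that $X_n$ is right continuous with left limits. Let
$$G(R, s) = \sup_n P(|X_n(s)| > R).$$
By (34.3), $G(R, s) \to 0$ as $R \to \infty$ for each $s$. Pick $R$ large so that $\lambda(\{s : G(R, s) > \varepsilon\delta\}) < \varepsilon\delta$; this is possible by dominated convergence, since $G(R, s)$ decreases to 0 as $R \to \infty$ for each $s$. Then
$$\int 1_{D_n^R}(\omega, s)\, P(d\omega) = P(|X_n(s)| > R) \le \begin{cases} 1, & G(R, s) > \varepsilon\delta, \\ \varepsilon\delta, & \text{otherwise.} \end{cases}$$
Integrating over $s \in [0, 1]$, $P \times \lambda(D_n^R) < 2\varepsilon\delta$. If $E_n^R(\omega) = \{s : (\omega, s) \in D_n^R\}$ and $F_n^R = \{\omega : \lambda(E_n^R(\omega)) > \delta/4\}$, we have
$$\frac{\delta}{4} P(F_n^R) = \int_{F_n^R} \frac{\delta}{4}\, P(d\omega) \le \int \int_0^1 1_{D_n^R}(\omega, s)\, \lambda(ds)\, P(d\omega) \le 2\varepsilon\delta,$$
so $P(F_n^R) \le 8\varepsilon$.

Define $T = \inf\{t : |X_n(t)| \ge R + 2\varepsilon\} \wedge 2$ and define $A_T$, $B_T$, and $C_T$ as in Step 2. We have
$$P(C_T \cup F_n^R) \le 16\varepsilon.$$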
If $\omega \notin C_T \cup F_n^R$ and $T < 2$, then $\lambda(E_n^R(\omega)) \le \delta/4$ and $\lambda(B_T(\omega)) \le \delta/4$. Hence there exists $t \in [T, T + 2\delta]$ such that $|X_n(t)| \le R$ and $|X_n(t) - X_n(T)| \le \varepsilon$. Therefore $|X_n(T)| \le R + \varepsilon$, which contradicts the definition of $T$ (by right continuity, $|X_n(T)| \ge R + 2\varepsilon$ when $T < 2$). We conclude that $T$ must equal 2 on the complement of $C_T \cup F_n^R$, or in other words, except for an event of probability at most $16\varepsilon$, we have $\sup_t |X_n(t)| \le R + 2\varepsilon$, provided, of course, that $n \ge n_0$. An application of Corollary 34.7 completes the proof.

Aldous's criterion is particularly well suited for strong Markov processes.

Proposition 34.9 Suppose $X_n$ is a sequence of real-valued strong Markov processes and there exist $c$, $p$, and $\gamma > 0$ such that
$$E^x |X_n(t) - X_n(0)|^p \le c t^\gamma, \qquad x \in \mathbb{R},\ t \in [0, 1]. \tag{34.12}$$
Then for each $x \in \mathbb{R}$, the sequence of $P^x$-laws of $\{X_n\}$ is tight with respect to the space $D[0, 1]$.

Unlike the Kolmogorov continuity criterion, we do not require γ > 1.
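Proof Fix $x$. For each $t$,
$$P^x(|X_n(t)| \ge R + |x|) \le P^x(|X_n(t) - X_n(0)| \ge R) \le \frac{E^x |X_n(t) - X_n(0)|^p}{R^p} \le \frac{c t^\gamma}{R^p},$$
which tends to 0 as $R \to \infty$, uniformly in $n$. We used Chebyshev's inequality here.

Suppose $\tau_n$ are stopping times for $X_n$ and $\delta_n \to 0$. By the strong Markov property, for each $\varepsilon > 0$
$$P^x(|X_n(\tau_n + \delta_n) - X_n(\tau_n)| > \varepsilon) \le \frac{E^x |X_n(\tau_n + \delta_n) - X_n(\tau_n)|^p}{\varepsilon^p} = \varepsilon^{-p} E^x\big[E^{X_n(\tau_n)} |X_n(\delta_n) - X_n(0)|^p\big] \le c\varepsilon^{-p} \delta_n^\gamma,$$
which tends to 0 as $n \to \infty$. Now apply Theorem 34.8.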
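A concrete process for which (34.12) holds with $\gamma = 1$ but with no exponent $\gamma > 1$ is the Poisson process. The sketch below is an illustration of our own, not part of the text: it assumes a Poisson process with a hypothetical rate `rate`, so that $N_t - N_0$ is Poisson with mean $\mu = \text{rate}\cdot t$, and evaluates $E|N_t - N_0|^p$ by summing the (truncated) Poisson series. For $p = 4$ the ratio $E|N_t - N_0|^4 / t$ stays bounded as $t \to 0$, so (34.12) holds with $\gamma = 1$, while $E|N_t - N_0|^4 / t^{3/2}$ blows up, so the Kolmogorov-type requirement $\gamma > 1$ fails.

```python
import math


def poisson_abs_moment(rate: float, t: float, p: float, terms: int = 200) -> float:
    """E|N_t - N_0|^p for a Poisson process N with the given rate (assumes t > 0).

    N_t - N_0 is Poisson with mean mu = rate * t, so
    E|N_t - N_0|^p = sum_{k >= 1} k^p e^{-mu} mu^k / k!.
    The series is truncated after `terms` terms.
    """
    mu = rate * t
    log_mu = math.log(mu)
    total = 0.0
    for k in range(1, terms):
        # work on the log scale to avoid overflow in mu^k / k!
        total += math.exp(p * math.log(k) + k * log_mu - mu - math.lgamma(k + 1))
    return total


if __name__ == "__main__":
    rate = 2.0
    for t in (1e-2, 1e-4, 1e-6):
        m4 = poisson_abs_moment(rate, t, 4)
        # m4 / t stays close to `rate`; m4 / t**1.5 grows like rate / sqrt(t)
        print(f"t={t:g}  m4/t={m4 / t:.4f}  m4/t^1.5={m4 / t ** 1.5:.1f}")
```

Working on the log scale keeps every term finite; terms far in the tail simply underflow to zero.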
Exercises
34.1 Show that the space D with the metric ρ is separable.
34.2 Let $f_n = 1_{[1/2, 1/2 + 1/n)}$. Show that this is a Cauchy sequence with respect to $\rho$, but does not converge to an element of $D$. Show $\{f_n\}$ is not a Cauchy sequence with respect to $d$.
34.3 Show that (with respect to the topology on $D$) the subset $C[0, 1]$ of $D$ is nowhere dense.
34.4 Consider $D$ with the metric $d_{\sup}(f, g) = \sup_{t \in [0,1]} |f(t) - g(t)|$. Show that $D$ is not separable with respect to the metric $d_{\sup}$.
34.5 Suppose $P$ and $P'$ are measures supported on $D[0, 1]$ that agree on all cylindrical subsets of $D[0, 1]$. In other words, all the finite-dimensional distributions agree. Prove that $P = P'$ on $D[0, 1]$.
34.6 Show that the following are continuous functions on the space $D[0, 1]$.
(1) $f(x) = \sup_{t \le 1} x(t)$.
(2) $f(x) = \int_0^1 x(t)\, dt$.
(3) $f(x) = \sup_{t \le 1} (x(t) - x(t-))$.
34.7 Let $P$ be a Poisson process with parameter $\lambda$. Prove that
$$\frac{P_{nt} - n\lambda t}{\sqrt{n\lambda}}$$
converges weakly with respect to the topology of $D[0, 1]$ as $n \to \infty$ to a Brownian motion.
34.8 Suppose $X_n$ converges weakly to $X$ with respect to the topology of $C[0, 1]$. Prove that $X_n$ converges weakly to $X$ with respect to the topology of $D[0, 1]$.

34.9 This is the converse to Theorem 34.6. Let $A$ be an index set, and suppose the collection of functions $\{f_\alpha, \alpha \in A\}$ is precompact in $D[0, 1]$, i.e., its closure is compact.
(1) Prove $\sup_{\alpha \in A} \sup_{0 \le t \le 1} |f_\alpha(t)| < \infty$.
(2) Prove $\lim_{\delta \to 0} \sup_{\alpha \in A} \xi_{f_\alpha}(\delta) = 0$.

Notes
See Billingsley (1968) for more information.

35
Applications of weak convergence

In Chapter 32 we showed how weak convergence of stochastic processes could be used to give another construction of Brownian motion by showing that a simple symmetric random walk converges to a Brownian motion. In the first section of this chapter, we show that the normalized partial sums of independent, identically distributed mean zero random variables with variance one also converge to a Brownian motion; this is known as the Donsker invariance principle. We then consider a Brownian bridge, which is a Brownian motion conditioned to return to zero at time one. We prove in Section 35.3 that a Brownian bridge is the limit process for a sequence of normalized empirical processes.

35.1 Donsker invariance principle

Suppose the $Y_i$ are i.i.d. real-valued random variables with mean zero and variance one, $S_n = \sum_{i=1}^n Y_i$, and $Z_n(t)$ is defined to be equal to $S_{nt}/\sqrt{n}$ if $nt$ is an integer and defined by linear interpolation for other values of $t$. The Donsker invariance principle says that the $Z_n$ converge weakly with respect to the space $C[0, 1]$ to a Brownian motion. This is a bit more delicate than in Section 32.2 because here our $Y_i$ only have second moments. The statement of the Donsker invariance principle is the following.

Theorem 35.1 Let the $Y_i$ and $Z_n$ be as above. Then $Z_n$ converges weakly to the law of Brownian motion on $[0, 1]$ with respect to the metric of $C[0, 1]$.

Before we prove this, we give an application and explain the name “invariance principle.” An example of how the Donsker invariance principle can be used is the following.

Corollary 35.2 Let $M = \sup_{s \le 1} W_s$ and $M_n = \sup_{s \le 1} Z_n(s)$, where $W$ is a Brownian motion. Then $M_n$ converges weakly to $M$.

Proof Let $g$ be a bounded and continuous function on the reals and define a function $F$ on $C[0, 1]$ by $F(f) = g(\sup_{s \le 1} f(s))$.
Notice $|\sup_{s \le 1} f_2(s) - \sup_{s \le 1} f_1(s)| \le \sup_{s \le 1} |f_2(s) - f_1(s)|$, and therefore $F : C[0, 1] \to \mathbb{R}$ is bounded and continuous. Since $Z_n$ converges weakly to $W$ with respect to the topology on $C[0, 1]$, then $E F(Z_n) \to E F(W)$. This is equivalent to $E g(M_n) \to E g(M)$. Because $g$ is an arbitrary bounded continuous function on the reals, we conclude $M_n \to M$ weakly.

This corollary says that the distribution of $\max_{i \le n} S_i/\sqrt{n}$ converges to the distribution of the supremum of a Brownian motion over $[0, 1]$. We can actually use this to derive the distribution of the maximum of a Brownian motion: first determine the distribution of the maximum of $S_n$ when the $Y_i$'s are particularly simple, such as when they are a simple symmetric random walk. (That is, $P(Y_i = 1) = P(Y_i = -1) = 1/2$.) Then take the limit as $n \to \infty$. In the case of a simple symmetric random walk, we can find the distribution of the maximum using the reflection principle, and there are no technical difficulties with the proof, unlike using the reflection principle with Brownian motion.

Another useful example is where $I_n = \int_0^1 |Z_n(t)|^2\, dt$ and $I = \int_0^1 |W_t|^2\, dt$. Here the distribution of $I$ can be found by an eigenvalue argument (Kuo, 1975), and this then gives an approximation to the distribution of $I_n$.

If $f$ is a continuous function from $C[0, 1]$ to $\mathbb{R}$, an argument similar to the proof of Corollary 35.2 shows that $f(Z_n)$ converges weakly to $f(W)$. We get the same limit process, regardless of the distribution of the $Y_i$'s, provided only that they are i.i.d. with mean zero and variance one. This is where the name “invariance principle” comes from: the limit is invariant with respect to changing the distribution of the $Y_i$'s.

Lemma 35.3 Suppose we have a sequence $Y_i$ of i.i.d. random variables with mean zero and variance one and $S_n = \sum_{i=1}^n Y_i$. Suppose $\lambda > 4$. Then
$$P\Big(\max_{i \le n} |S_i| \ge \lambda\sqrt{n}\Big) \le \frac{4}{3} P(|S_n| \ge \lambda\sqrt{n}/2).$$
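Proof Let $N = \min\{i : |S_i| \ge \lambda\sqrt{n}\}$, the first time $|S_i|$ is at least $\lambda\sqrt{n}$. $N$ is a stopping time and $(N = i)$ is in the $\sigma$-field generated by $Y_1, \ldots, Y_i$. We have
$$\begin{aligned} P\Big(\max_{i \le n} |S_i| \ge \lambda\sqrt{n}\Big) &\le P(|S_n| \ge \lambda\sqrt{n}/2) + P(N < n,\ |S_n| < \lambda\sqrt{n}/2) \\ &\le P(|S_n| \ge \lambda\sqrt{n}/2) + \sum_{i=1}^{n-1} P(N = i,\ |S_n| < \lambda\sqrt{n}/2). \end{aligned} \tag{35.1}$$
If $N = i$ with $i < n$ and $|S_n| < \lambda\sqrt{n}/2$, then $|S_n - S_i| \ge \lambda\sqrt{n}/2$, and moreover the event $\{|S_n - S_i| \ge \lambda\sqrt{n}/2\}$ is in the $\sigma$-field generated by $Y_{i+1}, \ldots, Y_n$, and hence is independent of the event $\{N = i\}$. Using Chebyshev's inequality, the sum on the last line of (35.1) is bounded by
$$\sum_{i=1}^{n-1} P(N = i) P(|S_n - S_i| \ge \lambda\sqrt{n}/2) \le \sum_{i=1}^{n-1} P(N = i) \frac{E|S_n - S_i|^2}{\lambda^2 n/4} = \sum_{i=1}^{n-1} P(N = i) \frac{n - i}{\lambda^2 n/4} \le \frac14 P(N < n) \le \frac14 P\Big(\max_{i \le n} |S_i| \ge \lambda\sqrt{n}\Big),$$
since $\lambda > 4$. Therefore
$$P\Big(\max_{i \le n} |S_i| \ge \lambda\sqrt{n}\Big) \le P(|S_n| \ge \lambda\sqrt{n}/2) + \frac14 P\Big(\max_{i \le n} |S_i| \ge \lambda\sqrt{n}\Big).$$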

Subtracting the second term on the right from both sides and multiplying by 4/3 proves the
lemma.
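For the simple symmetric random walk (one admissible choice of the $Y_i$, with $P(Y_i = \pm 1) = 1/2$), both sides of Lemma 35.3 can be computed exactly, which gives a quick sanity check on one instance of the inequality; it is not a proof. In the sketch below the function names are hypothetical. `max_abs_tail` runs a dynamic program over the positions of the walk, moving probability mass into an absorbing "hit" bucket the first time $|S_i| \ge \lambda\sqrt{n}$ (that is, at time $N$ in the proof), and `end_abs_tail` uses $S_n = 2K - n$ with $K$ binomial.

```python
import math


def max_abs_tail(n: int, level: float) -> float:
    """P(max_{i<=n} |S_i| >= level) for a simple symmetric random walk S with S_0 = 0.

    `alive` holds the distribution of S_i on paths that have not yet reached
    |S| >= level; mass that reaches the level is accumulated in `hit`.
    """
    alive = {0: 1.0}
    hit = 0.0
    for _ in range(n):
        nxt = {}
        for x, prob in alive.items():
            for y in (x - 1, x + 1):
                if abs(y) >= level:
                    hit += prob / 2
                else:
                    nxt[y] = nxt.get(y, 0.0) + prob / 2
        alive = nxt
    return hit


def end_abs_tail(n: int, level: float) -> float:
    """P(|S_n| >= level), using S_n = 2K - n with K ~ Binomial(n, 1/2)."""
    favourable = sum(math.comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= level)
    return favourable / 2 ** n


if __name__ == "__main__":
    n, lam = 64, 4.5
    lhs = max_abs_tail(n, lam * math.sqrt(n))
    rhs = end_abs_tail(n, lam * math.sqrt(n) / 2)
    print(f"P(max|S_i| >= lam sqrt n) = {lhs:.3e} <= (4/3) P(|S_n| >= lam sqrt n / 2) = {4 / 3 * rhs:.3e}")
```

With $n = 64$ and $\lambda = 4.5$ the left side is tiny and the bound holds with a wide margin, as one expects from the crude Chebyshev step in the proof.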
Note that the central limit theorem tells us that for any $\beta > 0$
$$P(|S_n| \ge \beta\sqrt{n}) \to P(|Z| \ge \beta) \le e^{-\beta^2/2},$$
where $Z$ is a mean zero normal random variable with variance one, and hence for $n$ large (depending on $\beta$),
$$P(|S_n| \ge \beta\sqrt{n}) \le 2e^{-\beta^2/2}. \tag{35.2}$$
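The Gaussian tail bound behind (35.2), $P(|Z| \ge \beta) \le e^{-\beta^2/2}$, is easy to check numerically because $P(|Z| \ge \beta) = \operatorname{erfc}(\beta/\sqrt{2})$. The following is only an illustrative check on a grid of $\beta$ values, not a proof; the helper name is our own.

```python
import math


def normal_two_sided_tail(beta: float) -> float:
    """P(|Z| >= beta) for a standard normal Z, via the complementary error function."""
    return math.erfc(beta / math.sqrt(2.0))


if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0, 3.0, 4.0):
        tail = normal_two_sided_tail(beta)
        bound = math.exp(-beta ** 2 / 2)
        print(f"beta={beta}: P(|Z|>=beta)={tail:.3e} <= exp(-beta^2/2)={bound:.3e}")
```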
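Lemma 35.4 For each $\varepsilon, \eta > 0$, there exist $n_0$ and $\delta$ such that if $n \ge n_0$ and $s \in [0, 1 - \delta]$, then
$$P\Big(\sup_{s \le t \le s + \delta} |Z_n(t) - Z_n(s)| > \varepsilon\Big) \le \eta\delta.$$

Proof Let $\varepsilon, \eta > 0$, and choose $\delta$ small enough that $2e^{-\varepsilon^2/128\delta} \le \delta\eta/2$ and $\varepsilon/(4\sqrt{\delta}) > 4$. Then choose $j_0$ large enough so that, using (35.2),
$$P\Big(|S_j| > \frac{\varepsilon\sqrt{j}}{8\sqrt{\delta}}\Big) \le 2e^{-\varepsilon^2/128\delta} \le \delta\eta/2$$
if $j \ge j_0$. Finally, choose $n_0 \ge j_0/\delta + 2$, so that if $n \ge n_0$, then $[n\delta] + 2 \ge j_0$ and $n\delta \ge ([n\delta] + 2)/2$, where $[x]$ is the largest integer less than or equal to $x$.

Let $n \ge n_0$ and set $J = [n\delta] + 2$. Suppose there exists $t \in [s, s + \delta]$ such that $|Z_n(t) - Z_n(s)| > \varepsilon$. Then there exists $j \le n$ such that for some $i$ between $j$ and $j + J$ we have $|S_i - S_j| > \varepsilon\sqrt{n}/2$. Since $n \ge J/(2\delta)$, we have $\sqrt{n}\,\varepsilon/2 \ge \sqrt{J}\,\varepsilon/(4\sqrt{\delta})$, and so by Lemma 35.3 (applied with $\lambda = \varepsilon/(4\sqrt{\delta}) > 4$)
$$\begin{aligned} P\Big(\sup_{s \le t \le s + \delta} |Z_n(t) - Z_n(s)| > \varepsilon\Big) &\le P\Big(\max_{j \le i \le j + J} |S_i - S_j| > \sqrt{n}\,\varepsilon/2\Big) \\ &\le P\Big(\max_{j \le i \le j + J} |S_i - S_j| > \frac{\sqrt{J}\,\varepsilon}{4\sqrt{\delta}}\Big) \\ &\le \frac43 P\Big(|S_{j+J} - S_j| > \frac{\sqrt{J}\,\varepsilon}{8\sqrt{\delta}}\Big) \\ &\le \frac43 P\Big(|S_J| > \frac{\sqrt{J}\,\varepsilon}{8\sqrt{\delta}}\Big) \\ &\le \delta\eta. \end{aligned}$$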
The proof is complete.
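The choice of $\delta$ at the start of the proof of Lemma 35.4 is purely quantitative. The sketch below, with hypothetical sample values of $\varepsilon$ and $\eta$, halves $\delta$ until both $2e^{-\varepsilon^2/128\delta} \le \delta\eta/2$ and $\varepsilon/(4\sqrt{\delta}) > 4$ hold; the second condition is what allows Lemma 35.3 to be applied with $\lambda = \varepsilon/(4\sqrt{\delta})$. The loop terminates because $e^{-\varepsilon^2/128\delta}$ decays faster than any power of $\delta$ as $\delta \to 0$.

```python
import math


def choose_delta(eps: float, eta: float, start: float = 0.25) -> float:
    """Halve delta until the two smallness conditions used in Lemma 35.4 hold."""
    delta = start
    while True:
        tail_ok = 2 * math.exp(-eps ** 2 / (128 * delta)) <= delta * eta / 2
        lam_ok = eps / (4 * math.sqrt(delta)) > 4  # Lemma 35.3 needs lambda > 4
        if tail_ok and lam_ok:
            return delta
        delta /= 2


if __name__ == "__main__":
    print(choose_delta(0.5, 0.1))
```

Even for moderate $\varepsilon$ the resulting $\delta$ is very small, which is harmless since only its existence matters.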
Lemma 35.5 For each $\varepsilon, \eta > 0$ there exist $n_0$ and $\delta$ such that if $n \ge n_0$,
$$P(\omega_{Z_n}(\delta) \ge \varepsilon) \le 2\eta.$$

Proof We will take $\delta = 1/K$ for some large $K$. If $|t - s| \le 1/K$, then either both $s, t$ are in the same interval $[(i-1)/K, i/K]$ or they are in adjoining intervals. Thus they both lie in some interval of the form $[(i-2)/K, i/K]$. Since
$$|Z_n(t) - Z_n(s)| \le |Z_n(t) - Z_n((i-2)/K)| + |Z_n(s) - Z_n((i-2)/K)|,$$
then using Lemma 35.4 with $\delta = 2/K$,
$$\begin{aligned} P(\exists s, t \in [0, 1] : |Z_n(t) - Z_n(s)| \ge \varepsilon,\ |t - s| < \delta) &\le P\Big(\exists i \le K : \sup_{(i-2)/K \le s \le i/K} |Z_n(s) - Z_n((i-2)/K)| \ge \varepsilon/2\Big) \\ &\le K \sup_i P\Big(\sup_{(i-2)/K \le s \le i/K} |Z_n(s) - Z_n((i-2)/K)| \ge \varepsilon/2\Big) \\ &\le K\eta(2/K) = 2\eta, \end{aligned}$$
which proves the lemma.

We can now prove the Donsker invariance principle.

Proof of Theorem 35.1 By Lemma 35.5, Theorem 32.1, and the fact that $Z_n(0) = 0$ for all $n$, the laws of the $Z_n$ are tight. Therefore by Prohorov's theorem (Theorem 30.4), every subsequence has a further subsequence which converges weakly with respect to the topology on $C[0, 1]$. We therefore only need to show that every subsequential limit point of the $Z_n$ with respect to weak convergence is a Brownian motion. Since our processes lie in $C[0, 1]$, the paths of any subsequential limit point are continuous, so it suffices by Theorem 2.6 to show that the finite-dimensional distributions of $Z_n$ converge weakly to the corresponding finite-dimensional distributions of a Brownian motion $W$. We will show the one-dimensional distributions converge, and leave the analogous argument for the higher-dimensional distributions to the reader.

We have
$$P\Big(\max_{i \le n} |Y_i|/\sqrt{n} \ge \varepsilon\Big) \le nP(|Y_1| \ge \sqrt{n}\,\varepsilon) = nP(|Y_1|^2/\varepsilon^2 \ge n). \tag{35.3}$$
For any integrable non-negative random variable $X$,
$$nP(X \ge n) = E[n; X \ge n] \le E[X; X \ge n],$$
which tends to zero by dominated convergence. Applying this with $X = |Y_1|^2/\varepsilon^2$, which is integrable because $Y_1$ has variance one, we obtain
$$P\Big(\max_{i \le n} |Y_i|/\sqrt{n} \ge \varepsilon\Big) \to 0. \tag{35.4}$$
Fix $t \in (0, 1]$. By the central limit theorem, $S_{[nt]}/\sqrt{[nt]}$ converges weakly on $\mathbb{R}$ to a mean zero normal random variable with variance one, and by Exercise 30.3, we see that $S_{[nt]}/\sqrt{n}$ converges weakly to a mean zero normal random variable with variance $t$. Since $|Z_n(t) - S_{[nt]}/\sqrt{n}| \le \max_{i \le n} |Y_i|/\sqrt{n}$, we conclude from (35.4) that for each $t$, $|Z_n(t) - S_{[nt]}/\sqrt{n}|$ converges to zero in probability. By Exercise 30.2, $Z_n(t)$ has the same weak limit as $S_{[nt]}/\sqrt{n}$, namely, a mean zero normal random variable with variance $t$, which is the distribution of $W_t$.

There is an elegant proof of the Donsker invariance principle using Skorokhod embedding.
Unlike the proof above, however, this second proof does not extend to random variables taking values in $\mathbb{R}^d$. By Theorem 15.6 we can find a Brownian motion $W$ and a random walk $S_n$ such that
$$\sup_{i \le n} \frac{|S_i - W_i|}{\sqrt{n}} \to 0$$
in probability. By the continuity of paths of $W$,
$$P\Big(\sup_{|t - s| \le 1/n,\ s, t \le 1} |W_t - W_s| > \varepsilon\Big) \to 0.$$
If we let $W_n(t) = W_{nt}/\sqrt{n}$, we then have that $\sup_{i \le n} |Z_n(i/n) - W_n(i/n)|$ tends to zero in probability as $n \to \infty$ and also, because $W_n$ is again a Brownian motion,
$$P\Big(\sup_{|t - s| \le 1/n,\ s, t \le 1} |W_n(t) - W_n(s)| > \varepsilon\Big) \to 0.$$
We conclude that
$$\sup_{t \le 1} |Z_n(t) - W_n(t)| \to 0$$
in probability. The law of $W_n$ is that of a Brownian motion and does not depend on $n$. By Exercise 30.2 we obtain that $Z_n$ converges weakly to the law of a Brownian motion.
If the above proof seems too simple, remember that we used Theorem 15.6, which in turn
relies on Skorokhod embedding.
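Corollary 35.2 and the invariance principle can also be illustrated by simulation. The sketch below is an illustration under stated assumptions, not part of either proof: it takes the $Y_i$ uniform on $[-\sqrt{3}, \sqrt{3}]$ (mean zero, variance one, and far from Gaussian), estimates $P(\max_{i \le n} S_i/\sqrt{n} \ge m)$ by Monte Carlo with a fixed seed, and compares it with the Brownian value $P(\sup_{s \le 1} W_s \ge m) = 2P(W_1 \ge m) = \operatorname{erfc}(m/\sqrt{2})$ from the reflection principle. The agreement is only approximate: there is Monte Carlo error, and for finite $n$ the discrete maximum slightly undershoots the continuous one.

```python
import math
import random


def walk_max_tail(n_steps: int, n_paths: int, level: float, seed: int = 0) -> float:
    """Monte Carlo estimate of P(max_{i<=n} S_i / sqrt(n) >= level).

    Steps are uniform on [-sqrt(3), sqrt(3)], which has mean 0 and variance 1.
    Since Z_n is piecewise linear, sup_{s<=1} Z_n(s) = max_{i<=n} S_i / sqrt(n).
    """
    rng = random.Random(seed)
    half_width = math.sqrt(3.0)
    threshold = level * math.sqrt(n_steps)
    hits = 0
    for _ in range(n_paths):
        s = 0.0
        for _ in range(n_steps):
            s += rng.uniform(-half_width, half_width)
            if s >= threshold:  # the running maximum has reached the level
                hits += 1
                break
    return hits / n_paths


def brownian_max_tail(level: float) -> float:
    """P(sup_{s<=1} W_s >= level) = erfc(level / sqrt(2)) for level >= 0."""
    return math.erfc(level / math.sqrt(2.0))


if __name__ == "__main__":
    est = walk_max_tail(n_steps=400, n_paths=4000, level=1.0)
    print(f"random walk: {est:.3f}   Brownian motion: {brownian_max_tail(1.0):.3f}")
```

Replacing the uniform steps by any other mean-zero, variance-one distribution should leave the estimate essentially unchanged, which is exactly the "invariance" in the name.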
One might ask about the weak convergence of $\tilde Z_n(t) = S_{[nt]}/\sqrt{n}$; these are the normalized partial sums without the linear interpolation. Rather than being continuous and piecewise linear like the $Z_n(t)$, the $\tilde Z_n(t)$ are piecewise constant and have jumps.

Proposition 35.6 Suppose the $Y_i$ are i.i.d. with mean zero and variance one. The $\tilde Z_n$ converge weakly with respect to the topology of $D[0, 1]$ to Brownian motion.

Proof The $Z_n$ converge weakly with respect to the topology of $C[0, 1]$ to a Brownian motion. By the Skorokhod representation (Theorem 31.2), we can find a probability space and random variables $Z_n'$ having the same law as $Z_n$ that converge almost surely with respect to the supremum norm. Therefore the $Z_n'$ converge almost surely with respect to the metric of $D[0, 1]$, and hence the $Z_n$ converge weakly to a Brownian motion with respect to the topology of $D[0, 1]$. If we show that $\sup_{t \le 1} |Z_n(t) - \tilde Z_n(t)|$ converges to zero in probability, then our result will follow by Exercise 30.2.

Now $Z_n(t)$ and $\tilde Z_n(t)$ will differ by more than $\varepsilon$ for some $t$ only if some $Y_i$ is larger than $\sqrt{n}\,\varepsilon$ in absolute value. But by (35.4), the probability of this tends to zero as $n \to \infty$.
35.2 Brownian bridge
A Brownian bridge $W_t^0$ is the process defined by
$$W_t^0 = W_t - tW_1, \qquad 0 \le t \le 1,$$
where $W$ is a Brownian motion. $W^0$ has continuous paths, is jointly normal, is zero at time 0 and at time 1, has mean zero, and we calculate its covariance by
$$\operatorname{Cov}(W_s^0, W_t^0) = \operatorname{Cov}(W_s, W_t) - s\operatorname{Cov}(W_1, W_t) - t\operatorname{Cov}(W_s, W_1) + st\operatorname{Var}(W_1) = s \wedge t - st - st + st = s \wedge t - st,$$
recalling (2.1).
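A short simulation makes the covariance formula $\operatorname{Cov}(W_s^0, W_t^0) = s \wedge t - st$ concrete. The sketch below is illustrative (the function name, seed, and sample size are our own choices): it builds $W_s$, $W_t$, $W_1$ from independent Gaussian increments, forms $W^0 = W - tW_1$ at the two times, and averages the products.

```python
import math
import random


def bridge_covariance(s: float, t: float, n_samples: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of Cov(W_s - s W_1, W_t - t W_1) for 0 <= s <= t <= 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        w_s = rng.gauss(0.0, math.sqrt(s))
        w_t = w_s + rng.gauss(0.0, math.sqrt(t - s))
        w_1 = w_t + rng.gauss(0.0, math.sqrt(1.0 - t))
        total += (w_s - s * w_1) * (w_t - t * w_1)
    # both bridge values have mean zero, so the mean of the product estimates the covariance
    return total / n_samples


if __name__ == "__main__":
    print(bridge_covariance(0.3, 0.6), "vs exact", 0.3 - 0.3 * 0.6)
```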

A Brownian bridge can be characterized as a Brownian motion conditioned to be zero at time 1. To make this precise, let $W$ be a Brownian motion started at zero under $P$, and for $A$ a Borel subset of $C[0, 1]$, define
$$P_\varepsilon(A) = P(W \in A \mid |W_1| \le \varepsilon);$$
cf. (A.13). Set $P_0(A) = P(W^0 \in A)$, the law of $W^0$.

Proposition 35.7 $P_\varepsilon$ converges weakly to $P_0$ with respect to the topology of $C[0, 1]$ as $\varepsilon \to 0$.

Proof Since $W$ is a jointly normal process and
$$\operatorname{Cov}(W_t - tW_1, W_1) = \operatorname{Cov}(W_t, W_1) - t\operatorname{Var}(W_1) = 0,$$
the process $W_t^0 = W_t - tW_1$ and the random variable $W_1$ are independent by Proposition A.55. Let $F$ be any closed subset of $C[0, 1]$ and let $F^\delta = \{g \in C[0, 1] : d(g, F) < \delta\}$, where $d(g, F) = \inf\{d(g, f) : f \in F\}$ and $d$ here is the supremum norm. Note $\sup_{t \le 1} |W_t - W_t^0| \le \varepsilon$ on the event $\{|W_1| \le \varepsilon\}$. If $\delta > \varepsilon$,
$$P_\varepsilon(F) = P(W \in F \mid |W_1| \le \varepsilon) \le P(W^0 \in F^\delta \mid |W_1| \le \varepsilon) = P(W^0 \in F^\delta) = P_0(F^\delta).$$
Thus $\limsup_{\varepsilon \to 0} P_\varepsilon(F) \le P_0(F^\delta)$. Since $F$ is closed, $P_0(F^\delta) \to P_0(F)$ as $\delta \to 0$, so $\limsup_{\varepsilon \to 0} P_\varepsilon(F) \le P_0(F)$. An application of Theorem 30.2 completes the proof.
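Proposition 35.7 can be seen at a single time point: conditionally on $|W_1| \le \varepsilon$, the variance of $W_{1/2}$ should be close to $\operatorname{Var}(W_{1/2}^0) = 1/2 - 1/4 = 1/4$, whereas unconditionally $\operatorname{Var}(W_{1/2}) = 1/2$. The sketch below is an illustration of our own using rejection sampling (simulate $(W_{1/2}, W_1)$ and keep only the draws with $|W_1| \le \varepsilon$); the function name, seed, and sample sizes are hypothetical choices.

```python
import math
import random


def conditioned_midpoint_variance(eps: float, n_proposals: int = 200000, seed: int = 2) -> float:
    """Estimate Var(W_{1/2} | |W_1| <= eps) by rejection sampling.

    Only W_{1/2} and W_1 are simulated; proposals with |W_1| > eps are discarded.
    """
    rng = random.Random(seed)
    half = math.sqrt(0.5)
    kept = []
    for _ in range(n_proposals):
        w_half = rng.gauss(0.0, half)
        w_one = w_half + rng.gauss(0.0, half)
        if abs(w_one) <= eps:
            kept.append(w_half)
    mean = sum(kept) / len(kept)
    return sum((x - mean) ** 2 for x in kept) / (len(kept) - 1)


if __name__ == "__main__":
    print("conditioned:", conditioned_midpoint_variance(0.05))  # close to 1/4
    print("unconditioned:", conditioned_midpoint_variance(100.0, n_proposals=20000))  # close to 1/2
```

Rejection sampling is the most literal reading of $P(\cdot \mid |W_1| \le \varepsilon)$; it becomes wasteful as $\varepsilon \to 0$, since the acceptance rate is about $2\varepsilon/\sqrt{2\pi}$.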
We show that a Brownian bridge can also be represented as the solution $X$ of the stochastic differential equation
$$dX_t = dW_t - \frac{X_t}{1 - t}\, dt, \qquad X_0 = 0, \tag{35.5}$$
where $W$ is a Brownian motion. This is plausible: $X$ behaves much like a Brownian motion until $t$ is close to 1, when there is a strong push toward the origin. The existence and uniqueness theory of Chapter 24 shows uniqueness and existence for the solution of (35.5) for $s \le t$ for any $t < 1$; see Exercise 24.4. We can solve (35.5) explicitly. We have
$$dW_t = dX_t + \frac{X_t}{1 - t}\, dt = (1 - t)\, d\Big[\frac{X_t}{1 - t}\Big],$$
or
$$X_t = (1 - t) \int_0^t \frac{dW_s}{1 - s}.$$
Thus $X_t$ is a continuous Gaussian process with mean zero. The variance of $X_t$ is
$$(1 - t)^2 \int_0^t (1 - s)^{-2}\, ds = (1 - t)^2 \Big(\frac{1}{1 - t} - 1\Big) = t - t^2,$$
the same as the variance of a Brownian bridge. A similar calculation shows that the covariance of $X_t$ and $X_s$ is the same as the covariance of $W_t - tW_1$ and $W_s - sW_1$; see Exercise 24.6. Hence the finite-dimensional distributions of $X_t$ and a Brownian bridge are the same. We now appeal to Theorem 2.6.

35.3 Empirical processes

In this section we will consider empirical processes, which are useful in statistics in estimating distribution functions. Let $X_i$, $i = 1, \ldots, n$, be i.i.d. random variables that are uniformly distributed on the interval $[0, 1]$. Define the empirical process
$$F_n(t) = \frac1n \sum_{i=1}^n 1_{[0,t]}(X_i). \tag{35.6}$$
The Glivenko–Cantelli theorem (Theorem A.40) says that
$$\sup_{t \in [0,1]} |F_n(t) - t| \to 0, \qquad \text{a.s.}$$
Our goal in this section is to obtain the corresponding weak limit theorem. Let
$$Z_n(t) = \sqrt{n}(F_n(t) - t) = \frac{1}{\sqrt{n}} \sum_{i=1}^n (1_{[0,t]}(X_i) - t). \tag{35.7}$$
We will show that $Z_n$ converges weakly with respect to $D[0, 1]$ to a Brownian bridge. Let
$$\omega_{Z_n}(\delta) = \sup_{s, t \in [0,1],\ |t - s| < \delta} |Z_n(t) - Z_n(s)|.$$
The paths of $Z_n$ are not continuous: they have a jump of size $1/\sqrt{n}$ at every time $X_i$. Thus, for fixed $n$, $\omega_{Z_n}(\delta)$ does not tend to zero as $\delta \to 0$. Nevertheless we can get reasonable estimates on $\omega_{Z_n}(\delta)$. We need an elementary lemma on binomial random variables, the proof of which is Exercise 35.1.

Lemma 35.8 Suppose $S_n$ is a binomial random variable with parameters $n$ and $p$. Then there exists a constant $c$ not depending on $n$ or $p$ such that
$$E|S_n - ES_n|^4 \le cnp + cn^2p^2 \tag{35.8}$$
and
$$E|S_n|^4 \le cnp + cn^4p^4. \tag{35.9}$$

Proposition 35.9 Let $\varepsilon, \eta > 0$. There exist $\delta$ and $n_0$ such that if $n \ge n_0$, then
$$P(\omega_{Z_n}(\delta) > \varepsilon) \le \eta.$$

The idea of the proof is to use Corollary 8.4 to estimate $Z_n(t) - Z_n(s)$ when $|t - s|$ is large compared with $1/n$, and to use estimates on binomials when $|t - s|$ is small.
Proof Let $\varepsilon, \eta > 0$. We will choose $n_0$, $\delta$ later. Assuming that they have been chosen, suppose $n \ge n_0$ and choose $k$ such that $n \le 2^k < 2n$. If $t \in [0, 1]$, let $t(k)$ be the largest multiple of $2^{-k}$ less than or equal to $t$ and similarly define $s(k)$. Let $D_k = \{i/2^k : 0 \le i \le 2^k\}$. We will show there exists $\delta > 0$ such that
$$P\Big(\sup_{s, t \in D_k,\ |t - s| < 2\delta} |Z_n(t) - Z_n(s)| > \varepsilon/3\Big) < \eta/3 \tag{35.10}$$
and
$$P\Big(\sup_{s \in [0,1]} |Z_n(s) - Z_n(s(k))| > \varepsilon/3\Big) < \eta/3. \tag{35.11}$$

Step 1. We first prove (35.10) by using Corollary 8.4. Suppose $s, t \in D_k$ with $|t - s| < 2\delta$. Then either $s = t$, in which case $Z_n(t) - Z_n(s) = 0$, or else $|t - s| \ge 2^{-k} \ge 1/(2n)$. Suppose $s < t$, take $p = t - s$, and note that $1_{(s,t]}(X_i)$ is a Bernoulli random variable with parameter $p$. Using (35.7) and Lemma 35.8,
$$E|Z_n(t) - Z_n(s)|^4 \le \frac{c}{n^2}(np + n^2p^2) = c\Big(\frac{p}{n} + p^2\Big) \le c|t - s|^2,$$
where in the last step we used $1/n \le 2|t - s|$. By Corollary 8.4,
$$P\Big(\sup_{s, t \in D_k,\ |t - s| < 2\delta} |Z_n(t) - Z_n(s)| > \varepsilon/3\Big) \le P\Big(\sup_{s, t \in D_k,\ |t - s| < 2\delta} \frac{|Z_n(t) - Z_n(s)|}{|t - s|^{1/8}} > c\frac{\varepsilon}{\delta^{1/8}}\Big) \le c(\varepsilon/\delta^{1/8})^{-4} = c\delta^{1/2}/\varepsilon^4.$$
We choose $\delta$ small enough so that the last term is less than $\eta/3$.
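Step 2. We now prove (35.11). Let
$$T_n(t) = \sum_{i=1}^n 1_{[0,t]}(X_i).$$
Observe that $T_n(t)$ is nondecreasing in $t$. If there exists $s \in [0, 1]$ such that $T_n(s) - T_n(s(k)) > \varepsilon\sqrt{n}/3$, then there exists $j \le 2^k - 1$ such that $T_n((j+1)/2^k) - T_n(j/2^k) > \varepsilon\sqrt{n}/3$. Therefore, using (35.9),
$$\begin{aligned} P\Big(\sup_{s \in [0,1]} \frac{T_n(s) - T_n(s(k))}{\sqrt{n}} > \varepsilon/3\Big) &\le P(\exists j \le 2^k - 1 : T_n((j+1)/2^k) - T_n(j/2^k) > \varepsilon\sqrt{n}/3) \\ &\le 2^k \sup_{j \le 2^k - 1} P(T_n((j+1)/2^k) - T_n(j/2^k) > \varepsilon\sqrt{n}/3) \\ &\le c2^k \sup_j \frac{E|T_n((j+1)/2^k) - T_n(j/2^k)|^4}{\varepsilon^4 n^2} \\ &\le c2^k \frac{n2^{-k} + (n2^{-k})^4}{\varepsilon^4 n^2}. \end{aligned}$$
Since $n2^{-k} \le 2$, the last line is less than or equal to
$$c2^k n2^{-k}/(\varepsilon^4 n^2) = c_1/(\varepsilon^4 n).$$
We choose $n_0 > 1/\delta$ large enough so that if $n \ge n_0$, then $c_1/(\varepsilon^4 n)$ is less than $\eta/3$.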
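Also,
$$E[T_n(s) - T_n(s(k))] \le n(s - s(k)) \le n2^{-k} \le 2,$$
which will be less than $\varepsilon\sqrt{n}/3$ if $n > 36/\varepsilon^2$; we choose $n_0$ larger if necessary so that $n_0 > 36/\varepsilon^2$. Since
$$Z_n(t) - Z_n(s) = \frac{T_n(t) - T_n(s)}{\sqrt{n}} - \frac{E[T_n(t) - T_n(s)]}{\sqrt{n}},$$
and both terms on the right are non-negative when $s \le t$, (35.11) follows.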
Step 3. Now that we have (35.10) and (35.11), we write
$$|Z_n(t) - Z_n(s)| \le |Z_n(t) - Z_n(t(k))| + |Z_n(t(k)) - Z_n(s(k))| + |Z_n(s(k)) - Z_n(s)|.$$
If $|t - s| < \delta$, then $|t(k) - s(k)| \le \delta + 2^{-k} \le \delta + 1/n$. Provided $n \ge n_0 > 1/\delta$, we have $|t(k) - s(k)| < 2\delta$, and combining (35.10) and (35.11) gives
$$P\Big(\sup_{s, t \in [0,1],\ |t - s| < \delta} |Z_n(t) - Z_n(s)| > \varepsilon\Big) < \eta$$
as required.

Theorem 35.10 The $Z_n$ converge weakly to a Brownian bridge with respect to the topology of $D[0, 1]$.

Proof We smooth $Z_n$ to get a continuous process $V_n$. Set $Z_n(t) = Z_n(1)$ for $t \in [1, 2]$ and set
$$V_n(t) = n \int_0^{1/n} Z_n(u + t)\, du.$$
We have
$$|V_n(t_2) - V_n(t_1)| \le n \int_0^{1/n} |Z_n(t_2 + u) - Z_n(t_1 + u)|\, du \le n \int_0^{1/n} \omega_{Z_n}(|t_2 - t_1|)\, du = \omega_{Z_n}(|t_2 - t_1|).$$
Note also that by (35.8) with $p = u$, and using Jensen's inequality with the probability measure $n1_{[0,1/n]}(u)\, du$,
$$E|V_n(0)|^4 \le n \int_0^{1/n} E|Z_n(u)|^4\, du \le c.$$
Hence
$$P(|V_n(0)| \ge A) \le \frac{E|V_n(0)|^4}{A^4} \le \frac{c}{A^4}.$$
Therefore by Theorem 8.1, the $V_n$ are tight with respect to weak convergence on $C[0, 1]$. If the $V_{n_j}$ converge weakly (with respect to $C[0, 1]$), by the Skorokhod representation we may find $V_{n_j}'$ with the same law as $V_{n_j}$ that converge almost surely. Then the $V_{n_j}'$ will also converge almost surely in the space $D[0, 1]$. This proves that the $V_n$ are tight in $D[0, 1]$ by Exercise 30.10.

Given $\varepsilon$ and $\eta$, choose $\delta$ and $n_0$ such that $P(\omega_{Z_n}(\delta) > \varepsilon) < \eta$ if $n \ge n_0$. We have
$$|V_n(t) - Z_n(t)| \le n \int_0^{1/n} |Z_n(u + t) - Z_n(t)|\, du \le \omega_{Z_n}(1/n).$$
If $n$ is large enough so that $1/n < \delta$, then
$$P\Big(\sup_t |V_n(t) - Z_n(t)| > \varepsilon\Big) \le P(\omega_{Z_n}(1/n) > \varepsilon) \le P(\omega_{Z_n}(\delta) > \varepsilon) < \eta.$$
Therefore $V_n - Z_n$ converges to 0 in probability, and by Exercise 30.2 the subsequential limit points of $V_n$ are the same as those of $Z_n$. It remains to show that any subsequential limit point of the $Z_n$ is a Brownian bridge. This follows from the multidimensional central limit theorem for multinomials (see Remark A.57) and is left as Exercise 35.2.

Exercises
35.1 Prove Lemma 35.8.
35.2 Prove that the finite-dimensional distributions of $Z_n$ in Theorem 35.10 converge to those of a Brownian bridge.
35.3 If $W_t^0$ is a Brownian bridge, prove that $Y_t = W_{1-t}^0$ is also a Brownian bridge.
35.4 Let $t_0 < 1$. The SDE (35.5) has a unique solution when $X_0 = 0$ is replaced by $X_0 = x$.
Let $P^x$ be the law of the solution when $X_0 = x$ and let $Z_t$ be the canonical process. Show that $(Z_t, P^x)$ is not a Markov process.
35.5 Let $N_t(A)$ be a Poisson point process with respect to the measure space $(S, m)$ and let $A_s$, $s > 0$, be an increasing sequence of subsets of $S$ with $m(A_s) \to \infty$ as $s \to \infty$. Does
$$\frac{N_t(A_s) - m(A_s)}{\sqrt{m(A_s)}}$$
converge weakly with respect to $D[0, 1]$ as $s \to \infty$? What is the limit?
This can be applied to get central limit theorems for the number of downcrossings of a
Brownian motion, for example.
35.6 This exercise asks you to prove that the Poisson process conditioned to be equal to $n$ at time 1 has the same law as $n$ times the empirical process. Here is the precise statement. Suppose $P_t$ is a Poisson process with parameter $\lambda > 0$. Let $Q$ be the law of $\{P_t, t \in [0, 1]\}$ conditioned so that $P_1 = n$. Thus $Q$ is a probability on $D[0, 1]$ with
$$Q(P \in A) = P(P \in A \mid P_1 = n).$$
Since $(P_1 = n)$ is an event with positive probability, there is no difficulty defining these conditional probabilities. Prove that $Q$ is also the law of $\{nF_n(t), t \in [0, 1]\}$, where $F_n$ is defined in Section 35.3.

36
Semigroups
In this chapter we suppose we have a semigroup of positive contraction operators $\{P_t\}$, and we show how to construct a Markov process $X$ corresponding to this semigroup. In Chapters 37 and 38, we will show how such semigroups might arise.

We suppose that the state space $S$ is a separable locally compact metric space. Let $C_0$ be the set of continuous functions on $S$ that vanish at infinity. Recall that $f \in C_0$ if $f$ is continuous, and given $\varepsilon$, there exists a compact set $K$ depending on $\varepsilon$ and $f$ such that $|f(x)| < \varepsilon$ for $x \notin K$. We use the usual supremum norm on $C_0$. We assume we have a semigroup $\{P_t\}$ of positive contractions mapping $C_0$ to $C_0$. More precisely, we assume the following.

Assumption 36.1 There exists a family $\{P_t\}$, $t \ge 0$, of operators on $C_0$ such that
(1) If $f \in C_0$, then $P_t(P_s f)(x) = P_{t+s} f(x)$, $x \in S$, $s, t \ge 0$.
(2) If $f(x) \ge 0$ for all $x$ and if $t \ge 0$, then $P_t f(x) \ge 0$ for all $x$.
(3) For all $t$, $\|P_t f\| \le \|f\|$.
(4) If $f \in C_0$, then $P_t f \to f$ uniformly as $t \to 0$.

Our goal in this section is to construct a process $X$ corresponding to the semigroup $P_t$. The steps we use are the following.
(1) We temporarily assume each $P_t$ maps the function 1 into itself. We define $X_t$ for $t$ in the dyadic rationals and define $P^x$ using the Kolmogorov extension theorem.
(2) We verify a preliminary version of the Markov property.
(3) We use the regularity theorem for supermartingales to show that $X$ has left and right limits along the dyadic rationals, and then define $X_t$ for all $t$.
(4) We verify that our process $(X_t, P^x)$ corresponds to the semigroup $P_t$.
(5) We remove the assumption that $P_t 1 = 1$.

36.1 Constructing the process

Let us assume the following for now. We will remove this assumption at the end of this section.

Assumption 36.2 $P_t 1(x) = 1$ for all $x$ and all $t \ge 0$.

We now begin the construction of $(X_t, P^x)$.

Step 1. Let $D_n = \{k/2^n : k \ge 0\}$ and let $D = \cup_n D_n$, the dyadic rationals. Let $\Omega$ be the set of functions from $D$ to $S$. Define $X_t(\omega) = \omega(t)$, $t \in D$, $\omega \in \Omega$. We let $\mathcal{F}$ be the $\sigma$-field on $\Omega$ generated by the collection of cylindrical subsets of $\Omega$.

By the Riesz representation theorem (see Rudin (1987)), for each $t > 0$ there exists a measure $P_t(x, dy)$ such that
$$P_t f(x) = \int f(y)\, P_t(x, dy), \qquad f \in C_0. \tag{36.1}$$
(The Riesz representation theorem is most often phrased for continuous functions on compact
spaces; since we are working with C0, we can let the state space satisfy slightly weaker
hypotheses; see Folland (1999), p. 223.) We can use (36.1) to define Pt f for all bounded
Borel measurable functions f . Since Pt maps C0 to C0, and continuous functions are Borel
measurable, a limit argument shows that Pt f is Borel measurable whenever f is bounded
and Borel measurable.
Our main task in this step is to define Px. D is countable and we fix a labeling D =
{t1, t2, . . .}. Let En = {t1, . . . , tn}. Let s1 ≤ · · · ≤ sn be the ordering of En according to the
usual ordering of the reals, so that s1 is the smallest element of the set {t1, . . . , tn}, s2 is the
next smallest, and so on. Define
$$\mathbb{P}^x_n(X_{s_1} \in A_1, \ldots, X_{s_n} \in A_n) = \int_{A_n} \cdots \int_{A_1} P_{s_1}(x, dx_1)\, P_{s_2 - s_1}(x_1, dx_2) \cdots P_{s_n - s_{n-1}}(x_{n-1}, dx_n) \tag{36.2}$$
for $A_1, \ldots, A_n$ Borel subsets of $S$. The $\mathbb{P}^x_n$ are consistent in the sense of Appendix D. The key to checking this is to observe that if $s_1, \ldots, s_n$ is the ordering of $E_n$ and we temporarily write $s_1, \ldots, s_i, s, s_{i+1}, \ldots, s_n$ for the ordering of $E_{n+1}$, then
$$\int_S P_{s - s_i}(x_i, dx)\, P_{s_{i+1} - s}(x, dx_{i+1}) = P_{s_{i+1} - s_i}(x_i, dx_{i+1})$$
by the semigroup property; cf. (19.10).
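A small numerical sketch may make the consistency check concrete. It assumes, for illustration only, a finite state space $S = \{0, 1\}$, so that the integrals in (36.2) become finite sums, and a two-state chain with hypothetical jump rates $a$ (from 0 to 1) and $b$ (from 1 to 0), whose transition matrix has a closed form. Inserting an extra time and summing over the state visited there should reproduce the original finite-dimensional probability, which is the consistency the Kolmogorov extension theorem needs.

```python
import math

def kernel(t, a=1.0, b=2.0):
    """P_t for a two-state chain on S = {0, 1} jumping 0 -> 1 at rate a and 1 -> 0 at rate b."""
    r = a + b
    e = math.exp(-r * t)
    return [[b / r + a / r * e, a / r * (1 - e)],
            [b / r * (1 - e), a / r + b / r * e]]

def fdd(x, times, states):
    """P^x_n(X_{s_1} = x_1, ..., X_{s_n} = x_n) from (36.2); `times` must be increasing."""
    prob, prev_t, prev_x = 1.0, 0.0, x
    for t, y in zip(times, states):
        prob *= kernel(t - prev_t)[prev_x][y]
        prev_t, prev_x = t, y
    return prob

def with_extra_time(x, times, states, s, pos):
    """Insert time s at index `pos` (keeping times sorted) and sum out the state there."""
    new_times = times[:pos] + [s] + times[pos:]
    return sum(fdd(x, new_times, states[:pos] + [z] + states[pos:]) for z in (0, 1))

times, states = [0.3, 1.0, 2.5], [1, 0, 1]
print(fdd(0, times, states), with_extra_time(0, times, states, 0.7, 1))
```

The two printed numbers agree up to rounding; inserting the extra time before the first time or after the last one behaves the same way.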
By the Kolmogorov extension theorem (Theorem D.1), for each $x$ there exists a probability $\mathbb{P}^x$ such that
$$\mathbb{P}^x(X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n) = \mathbb{P}^x_n(X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n)$$
for each $n$ whenever $A_1, \ldots, A_n$ are Borel subsets of $S$.
If $\mathbb{E}^x$ is the expectation corresponding to $\mathbb{P}^x$, (36.2) can be rewritten as
$$\mathbb{E}^x[f_1(X_{s_1}) \cdots f_n(X_{s_n})] = \int \cdots \int f_1(x_1) \cdots f_n(x_n)\, P_{s_1}(x, dx_1)\, P_{s_2 - s_1}(x_1, dx_2) \cdots P_{s_n - s_{n-1}}(x_{n-1}, dx_n) \tag{36.3}$$
when $f_i = 1_{A_i}$ for each $i$. From this, by linearity we have (36.3) when the functions $f_i$ are simple functions; by a limit argument we have (36.3) when the $f_i$ are Borel measurable and non-negative; and by linearity again, (36.3) holds when the $f_i$ are bounded and Borel measurable.
By (36.2) we have
$$\mathbb{P}^x(X_t \in A) = \mathbb{E}^x 1_A(X_t) = \int_A P_t(x, dy) = P_t 1_A(x).$$
Using linearity and a limit argument, we have $\mathbb{E}^x f(X_t) = P_t f(x)$ when $f$ is bounded and Borel measurable.
Proposition 36.3 If $f$ is bounded and Borel measurable, $s, t > 0$, and $x \in S$, then
$$\mathbb{E}^x\big[\mathbb{E}^{X_t} f(X_s)\big] = \mathbb{E}^x f(X_{s+t}). \tag{36.4}$$
Proof The proof of (36.4) is mainly a matter of sorting out notation. Let $\varphi(x) = \mathbb{E}^x f(X_s) = P_s f(x)$. Hence $\mathbb{E}^{X_t} f(X_s) = \varphi(X_t) = P_s f(X_t)$. Then the left-hand side of (36.4) is $\mathbb{E}^x (P_s f)(X_t) = P_t(P_s f)(x)$. The right-hand side of (36.4) is $P_{s+t} f(x)$, and so the two sides agree by the semigroup property.
Step 2. So far we have only constructed $X_t$ for $t \in D$. To extend the definition to all $t$, we want to let $X_t = \lim_{u > t,\, u \in D,\, u \to t} X_u$. But before we can make that definition, we need to know that the limits exist. We will use the regularity of supermartingales to show this, so we need to look at conditional expectations. Let
$$\mathcal{F}'_s = \sigma(X_r;\ r \le s,\ r \in D).$$
Proposition 36.4 If $s < t$ with $s, t \in D$ and $f$ is bounded and Borel measurable, then
$$\mathbb{E}^x[f(X_t) \mid \mathcal{F}'_s] = \mathbb{E}^{X_s} f(X_{t-s}), \qquad \mathbb{P}^x\text{-a.s.} \tag{36.5}$$
Proof Take $n \ge 1$, $r_1 \le r_2 \le \cdots \le r_n \le s$ with each $r_j$ in $D$, and $A_1, \ldots, A_n$ Borel subsets of $S$. Since the events $(X_{r_1} \in A_1, \ldots, X_{r_n} \in A_n)$ generate $\mathcal{F}'_s$, it suffices to show that
$$\mathbb{E}^x[f(X_t)\, 1_{A_1}(X_{r_1}) \cdots 1_{A_n}(X_{r_n})] = \mathbb{E}^x[(\mathbb{E}^{X_s} f(X_{t-s}))\, 1_{A_1}(X_{r_1}) \cdots 1_{A_n}(X_{r_n})]. \tag{36.6}$$
The right-hand side of (36.6) is equal to
$$\mathbb{E}^x[P_{t-s} f(X_s)\, 1_{A_1}(X_{r_1}) \cdots 1_{A_n}(X_{r_n})]. \tag{36.7}$$
From (36.3),
$$\mathbb{E}^x[P_{t-s} f(X_s)\, 1_{A_1}(X_{r_1}) \cdots 1_{A_n}(X_{r_n})] = \int \cdots \int P_{t-s} f(y)\, 1_{A_1}(x_1) \cdots 1_{A_n}(x_n)\, P_{r_1}(x, dx_1) \cdots P_{r_n - r_{n-1}}(x_{n-1}, dx_n)\, P_{s - r_n}(x_n, dy). \tag{36.8}$$
But $P_{t-s} f(y) = \int f(z)\, P_{t-s}(y, dz)$. Substituting this in (36.8) and using (36.3) again, we obtain the left-hand side of (36.6).

Step 3. We define $R_\lambda$, the resolvent or $\lambda$-resolvent of $P_t$, by
$$R_\lambda f(x) = \int_0^\infty e^{-\lambda t} P_t f(x)\, dt. \tag{36.9}$$

Lemma 36.5 If $f \ge 0$ is bounded and Borel measurable and $x \in S$, then $M_t = e^{-\lambda t} R_\lambda f(X_t)$, $t \in D$, is a supermartingale with respect to the filtration $\{\mathcal{F}'_t;\ t \in D\}$ and the probability measure $\mathbb{P}^x$.

Proof What we need to show is that if $s < t$ with $s, t \in D$, then
$$\mathbb{E}^x[e^{-\lambda t} R_\lambda f(X_t) \mid \mathcal{F}'_s] \le e^{-\lambda s} R_\lambda f(X_s), \qquad \mathbb{P}^x\text{-a.s.}$$
By Proposition 36.4 the left-hand side is $e^{-\lambda t}\, \mathbb{E}^{X_s} R_\lambda f(X_{t-s})$, so what we need to show is that
$$\mathbb{E}^y R_\lambda f(X_{t-s}) \le e^{\lambda(t-s)} R_\lambda f(y) \tag{36.10}$$
for all $y$. The left-hand side of (36.10) is
$$\begin{aligned} P_{t-s} R_\lambda f(y) &= \int_0^\infty e^{-\lambda r} P_{t-s} P_r f(y)\, dr = \int_0^\infty e^{-\lambda r} P_{r+t-s} f(y)\, dr = e^{\lambda(t-s)} \int_{t-s}^\infty e^{-\lambda r} P_r f(y)\, dr \\ &\le e^{\lambda(t-s)} \int_0^\infty e^{-\lambda r} P_r f(y)\, dr = e^{\lambda(t-s)} R_\lambda f(y). \end{aligned}$$
The first equality is the Fubini theorem, the second the semigroup property, and the third a change of variables; the inequality holds because $P_r f \ge 0$ when $f \ge 0$.

Next, if $f$ is non-negative and bounded, by Theorem 3.12 with $\mathbb{P}$ replaced by $\mathbb{P}^x$, we see that $e^{-\lambda t} R_\lambda f(X_t)$ has left and right limits along $t \in D$. Therefore the same is true for $R_\lambda f(X_t)$.
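As a sanity check on (36.10), and on the convergence of $\lambda R_\lambda f$ to $f$ used next, here is a sketch for a two-state chain (the rates are hypothetical, chosen only for illustration). For such a chain $P_t = \Pi + e^{-(a+b)t}(I - \Pi)$, where $\Pi$ is the matrix whose rows are the stationary distribution, so (36.9) has the closed form $R_\lambda = \Pi/\lambda + (I - \Pi)/(\lambda + a + b)$. The code checks that $e^{\lambda u} R_\lambda f - P_u R_\lambda f \ge 0$ for a non-negative $f$ and that $\sup_x |\lambda R_\lambda f(x) - f(x)|$ decreases to 0.

```python
import math

A_RATE, B_RATE = 1.0, 2.0      # hypothetical jump rates 0 -> 1 and 1 -> 0
R = A_RATE + B_RATE
PI = (B_RATE / R, A_RATE / R)  # stationary distribution (each row of Pi)

def mean(f):
    """(Pi f)(x) = sum_y pi(y) f(y); the same value for every x."""
    return PI[0] * f[0] + PI[1] * f[1]

def semigroup(t, f):
    """P_t f = Pi f + e^{-R t} (f - Pi f) for the two-state chain."""
    m = mean(f)
    return [m + math.exp(-R * t) * (f[x] - m) for x in (0, 1)]

def resolvent(lam, f):
    """Closed form of (36.9): R_lam f = Pi f / lam + (f - Pi f) / (lam + R)."""
    m = mean(f)
    return [m / lam + (f[x] - m) / (lam + R) for x in (0, 1)]

def supermartingale_gap(lam, u, f):
    """e^{lam u} R_lam f - P_u R_lam f; (36.10) says this is >= 0 when f >= 0."""
    rf = resolvent(lam, f)
    pu = semigroup(u, rf)
    return [math.exp(lam * u) * rf[x] - pu[x] for x in (0, 1)]

def approx_identity_error(lam, f):
    """sup_x |lam R_lam f(x) - f(x)|, which should tend to 0 as lam -> infinity."""
    rf = resolvent(lam, f)
    return max(abs(lam * rf[x] - f[x]) for x in (0, 1))

f = [1.0, 0.2]   # a non-negative function on {0, 1}
gaps = [g for lam in (0.5, 1.0, 3.0) for u in (0.1, 1.0, 5.0)
        for g in supermartingale_gap(lam, u, f)]
errors = [approx_identity_error(lam, f) for lam in (1.0, 10.0, 100.0, 1e4)]
print(min(gaps), errors)
```

Here the error is exactly $|f - \Pi f|\,R/(\lambda + R)$, so it decays like $1/\lambda$; a general semigroup need not converge at any particular rate.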
By Assumption 36.1 and dominated convergence, if $f \in C_0$, then
$$\lambda R_\lambda f(x) - f(x) = \int_0^\infty \lambda e^{-\lambda t} (P_t f(x) - f(x))\, dt = \int_0^\infty e^{-t} (P_{t/\lambda} f(x) - f(x))\, dt$$
tends to zero uniformly in $x$ as $\lambda \to \infty$. Take a countable dense subset $\{f_i\}$ of $C_0$ and look at $jR_j f_i(X_t)$ for all positive integers $j$. Since $jR_j f_i(X_t)$ has left and right limits along $D$, a.s., letting $j \to \infty$ we see that $f_i(X_t)$ does also. We conclude that $X_t$ has left and right limits along $D$. Now define $X_t = \lim_{u > t,\, u \in D,\, u \to t} X_u$. Then $X_t$ is right continuous with left limits. We check that
$$\mathbb{P}^x(X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n) = \int_{A_1} \cdots \int_{A_n} P_{t_1}(x, dx_1) \cdots P_{t_n - t_{n-1}}(x_{n-1}, dx_n).$$
To see this, note that it holds when the $t_i$ are in $D$. By linearity and a limit argument, we conclude
$$\mathbb{E}^x[f_1(X_{t_1}) \cdots f_n(X_{t_n})] = \int \cdots \int f_1(x_1) \cdots f_n(x_n)\, P_{t_1}(x, dx_1) \cdots P_{t_n - t_{n-1}}(x_{n-1}, dx_n) \tag{36.11}$$
when the $f_i$ are bounded and continuous and the $t_i$ are in $D$. Using a limit argument based on the right continuity of the paths, we know (36.11) holds when the $t_i$ are arbitrary non-negative real numbers. Using a limit argument again, (36.11) holds for all bounded and measurable $f_i$, in particular when $f_i = 1_{A_i}$.
Step 4. It remains to show that $(X_t, \mathbb{P}^x)$ satisfies Definition 19.1 and that $P_t$ is the semigroup of this process. Let $\mathcal{F}^{00}_t = \sigma(X_s;\ s \le t)$. Then we have already shown that $(X_t, \mathbb{P}^x)$ is a Markov process with respect to the filtration $\{\mathcal{F}^{00}_t\}$, except for showing that
$$\mathbb{P}^x(X_{s+t} \in A \mid \mathcal{F}^{00}_s) = \mathbb{P}^{X_s}(X_t \in A).$$
However, this can be proved almost identically to the way we proved Proposition 36.4.
Step 5. Sometimes the semigroup is a contraction semigroup and satisfies Assumption 36.1 but not Assumption 36.2. In this case the $P_t(x, A)$ are called sub-Markov transition probability kernels. The missing probability is due to the process being killed, and we can handle this situation as follows. Let $S_\Delta = S \cup \{\Delta\}$, where we introduce an isolated point $\Delta$. The topology on $S_\Delta$ is the one generated by the open sets of $S$ together with the set $\{\Delta\}$. Given a function $f$ on $S$, we extend it to $S_\Delta$ by setting $f(\Delta) = 0$. We replace $P_t(x, A)$ by $\overline{P}_t(x, A)$, where
$$\begin{cases} \overline{P}_t(x, A) = P_t(x, A), & x \in S,\ A \subset S, \\ \overline{P}_t(x, \{\Delta\}) = 1 - P_t(x, S), & x \in S, \\ \overline{P}_t(\Delta, \{\Delta\}) = 1. \end{cases} \tag{36.12}$$
One can go through the above construction with $\overline{P}_t$ and obtain a strong Markov process $X_t$ whose state space is $S_\Delta$. It is not hard to show that, starting at $\Delta$, the process stays at $\Delta$ forever; see Exercise 36.1.
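The cemetery construction (36.12) can be sketched on a finite state space. The setup is illustrative only: $S = \{0, 1\}$, and the sub-Markov kernel is a two-state chain killed at a constant hypothetical rate. Appending a column for the missing mass and an absorbing row for $\Delta$ should give stochastic matrices, and the extended kernels should still satisfy the semigroup property.

```python
import math

def sub_markov(t, a=1.0, b=2.0, kill=0.5):
    """Sub-Markov kernel on S = {0, 1}: a two-state chain killed at constant rate `kill`."""
    r = a + b
    e = math.exp(-r * t)
    q = [[b / r + a / r * e, a / r * (1 - e)],
         [b / r * (1 - e), a / r + b / r * e]]
    survive = math.exp(-kill * t)          # P_t(x, S), the same for every x here
    return [[survive * v for v in row] for row in q]

def with_cemetery(p):
    """Extend a sub-Markov matrix on S to S ∪ {Δ} as in (36.12); Δ is the last index."""
    n = len(p)
    rows = [row + [1.0 - sum(row)] for row in p]   # the missing mass goes to Δ
    rows.append([0.0] * n + [1.0])                 # Δ is absorbing
    return rows

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

s, t = 0.4, 1.3
lhs = matmul(with_cemetery(sub_markov(s)), with_cemetery(sub_markov(t)))
rhs = with_cemetery(sub_markov(s + t))
print(rhs)
```

Every row of `rhs` sums to 1, the last row is $(0, 0, 1)$, and `lhs` agrees with `rhs` up to rounding.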
We remark that by the results of Chapter 20 and also Exercise 20.1, we can expand the filtration from $\{\mathcal{F}^{00}_t\}$ to $\{\mathcal{F}_t\}$, where $\{\mathcal{F}_t\}$ is right continuous and each $\mathcal{F}_t$ contains all the sets that are null with respect to each $\mathbb{P}^x$. In addition, the strong Markov property will hold for $(X_t, \mathbb{P}^x)$.
36.2 Examples
Example 36.6 Our first example is a Brownian motion. Let
$$p(t, x, y) = (2\pi t)^{-d/2} e^{-|x-y|^2/2t},$$
and set
$$P_t(x, A) = \int_A p(t, x, y)\, dy.$$
We know
$$\int p(t, x, z)\, p(s, z, y)\, dz = p(t + s, x, y)$$
by Proposition 19.5, and so Pt satisfies the semigroup property. We showed in Section 19.4
that Assumption 36.1 is satisfied, except for the fact that Pt maps C0 to C0; this is Exercise
36.2. Therefore we have a strong Markov process associated with Pt . By Proposition 21.5,
the paths of the strong Markov process can be taken to be continuous. This gives yet another
construction of a Brownian motion.
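A quick numerical illustration (not a proof of Proposition 19.5) of the identity behind the semigroup property. It takes $d = 1$ and approximates $\int p(t, x, z)\, p(s, z, y)\, dz$ with the trapezoid rule on a truncated interval. The interval $[-30, 30]$ and the grid size are assumptions; they are ample for the parameters used here because the integrand decays like a Gaussian.

```python
import math

def p(t, x, y):
    """One-dimensional Gaussian transition density (2 pi t)^{-1/2} exp(-(x - y)^2 / 2t)."""
    return math.exp(-(x - y) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t)

def chapman_kolmogorov(t, s, x, y, lo=-30.0, hi=30.0, n=20000):
    """Trapezoid-rule approximation of int p(t, x, z) p(s, z, y) dz over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (p(t, x, lo) * p(s, lo, y) + p(t, x, hi) * p(s, hi, y))
    for k in range(1, n):
        z = lo + k * h
        total += p(t, x, z) * p(s, z, y)
    return total * h

approx = chapman_kolmogorov(0.5, 1.5, 0.3, -0.8)
exact = p(2.0, 0.3, -0.8)
print(approx, exact)
```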
Example 36.7 We now use the machinery we have developed in this chapter to construct the Poisson process. Define transition probabilities by
$$P_t(x, A) = e^{-\lambda t} \sum_{k=0}^\infty \frac{(\lambda t)^k}{k!} 1_A(x + k),$$
where $\lambda$ is some fixed parameter. If $p(t, k) = e^{-\lambda t} (\lambda t)^k / k!$, then
$$P_t f(x) = \sum_{k=0}^\infty f(x + k)\, p(t, k). \tag{36.13}$$
Thus
$$P_s(P_t f)(x) = \sum_{j=0}^\infty P_t f(x + j)\, p(s, j) = \sum_{j=0}^\infty \sum_{k=0}^\infty f(x + j + k)\, p(t, k)\, p(s, j).$$
This is equal to
$$\sum_{m=0}^\infty f(x + m) \sum_{k=0}^m p(t, m - k)\, p(s, k), \tag{36.14}$$
which by Exercise 36.3 is equal to
$$\sum_{m=0}^\infty f(x + m)\, p(s + t, m) = P_{s+t} f(x). \tag{36.15}$$
Therefore the semigroup property holds.
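The identity used between (36.14) and (36.15), $\sum_{k=0}^m p(t, m-k)\, p(s, k) = p(s+t, m)$, can be checked numerically. This is only a numerical check, not a solution of Exercise 36.3. The rate $\lambda = 1.7$ is a hypothetical choice, and truncating (36.13) after 80 terms is an assumption that is harmless for the small values of $\lambda t$ used here. The last lines compare $P_s(P_t f)$ with $P_{s+t} f$ for one bounded test function.

```python
import math

LAM = 1.7   # hypothetical Poisson rate

def p(t, k):
    """Poisson weights p(t, k) = e^{-LAM t} (LAM t)^k / k!."""
    return math.exp(-LAM * t) * (LAM * t) ** k / math.factorial(k)

def convolution(s, t, m):
    """The inner sum in (36.14): sum_{k=0}^m p(t, m - k) p(s, k)."""
    return sum(p(t, m - k) * p(s, k) for k in range(m + 1))

def P(t, f, x, terms=80):
    """Truncation of (36.13): P_t f(x) = sum_k f(x + k) p(t, k)."""
    return sum(f(x + k) * p(t, k) for k in range(terms))

def g(y):
    return 1.0 / (1.0 + y * y)   # a bounded test function

conv_err = max(abs(convolution(0.4, 1.1, m) - p(1.5, m)) for m in range(15))
semigroup_err = abs(P(0.4, lambda y: P(1.1, g, y), 2) - P(1.5, g, 2))
print(conv_err, semigroup_err)
```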
We therefore have a strong Markov process $X$ whose paths are right continuous with left limits. We want to show that the process $X_t$ under the probability measure $\mathbb{P}^0$ is a Poisson process. That $\mathbb{P}^0(X_0 = 0) = 1$ is obvious. We need to show that Definition 5.1(3) and (4) hold. For the former, using (36.11),
$$\mathbb{P}^0(X_t - X_s = k) = \sum_{j=0}^\infty \mathbb{P}^0(X_t = j + k, X_s = j) \tag{36.16}$$
$$= \sum_{j=0}^\infty p(s, j)\, p(t - s, k) = p(t - s, k), \tag{36.17}$$
as desired. For Definition 5.1(4), suppose $r_1 \le r_2 \le \cdots \le r_n \le s < t$, let $a_1, \ldots, a_n$ be integers, and let $A = (X_{r_1} = a_1, \ldots, X_{r_n} = a_n)$. We will be done if we show
$$\mathbb{P}^0(X_t - X_s = k, A) = \mathbb{P}^0(X_t - X_s = k)\, \mathbb{P}^0(A). \tag{36.18}$$
The left-hand side of (36.18) is equal to
$$\begin{aligned} \sum_{j=0}^\infty \mathbb{P}^0(X_t = j + k, X_s = j, A) &= \sum_{j=0}^\infty \mathbb{E}^0[\mathbb{P}^0(X_t = j + k \mid \mathcal{F}_s);\ X_s = j, A] \\ &= \sum_{j=0}^\infty \mathbb{E}^0[\mathbb{P}^{X_s}(X_{t-s} = j + k);\ X_s = j, A] \\ &= \sum_{j=0}^\infty \mathbb{E}^0[\mathbb{P}^j(X_{t-s} = j + k);\ X_s = j, A] \\ &= \sum_{j=0}^\infty \mathbb{E}^0[p(t - s, k);\ X_s = j, A] = p(t - s, k)\, \mathbb{P}^0(A). \end{aligned}$$
Together with (36.16) this proves (36.18).

Exercises
36.1 Suppose $P_t$ is a family of sub-Markov transition probabilities and we define $\overline{P}_t$ by (36.12). Show that $\overline{P}_t$ is a family of Markov transition probabilities. Show that $\mathbb{P}^\Delta(X_t \ne \Delta \text{ for some } t > 0) = 0$, i.e., starting at $\Delta$, the process stays there forever.
36.2 Show that if $P_t(x, A)$ is defined by (19.17) and $P_t f(x) = \int f(y)\, P_t(x, dy)$, then $P_t$ maps $C_0$ into $C_0$.
36.3 Show that (36.14) equals (36.15).
36.4 Show that Pt defined by (36.13) satisfies all the parts of Assumption 36.1.
36.5 Suppose $\{\mu_t, t \ge 0\}$ is a tight family of probability measures on the real line. Suppose there exists a function $\psi: \mathbb{R} \to \mathbb{C}$ such that the Fourier transforms of the $\mu_t$ have the following form:
$$\int e^{iux}\, \mu_t(dx) = e^{t\psi(u)}, \qquad t \ge 0,\ u \in \mathbb{R}.$$
(1) Prove that μt converges weakly to μ0 as t → 0. Note that μ0 is the same as point
mass at 0.
(2) Define the operators Pt by
Pt f (x) =
∫
f (x − y) μt (dy).
Prove that the Pt form a strongly continuous semigroup of contraction operators mapping C0
into C0. Conclude that there exists a strong Markov process whose semigroup is given by
the Pt .
This semigroup is called a convolution semigroup because $\mu_{t+s} = \mu_t * \mu_s$, in the sense of convolution of measures. We will see later that these are associated with Lévy processes.
Notes
See Blumenthal and Getoor (1968) for further information.

37
Infinitesimal generators
Often a Markov process is specified in terms of its behavior at each point, and one wants to form a global picture of the process. This means one is given the infinitesimal generator, which is a linear operator that is, in general, unbounded, and one wants to construct the semigroup for the Markov process.
We will begin by looking further at semigroups and resolvents, and then define the
infinitesimal generator of a semigroup. We will prove the Hille–Yosida theorem, which is the
primary tool for constructing semigroups from infinitesimal generators. Then we will look
at two important examples: elliptic operators in nondivergence form and Lévy processes.
37.1 Semigroup properties
Let S be a locally compact separable metric space. We will take B to be a separable Banach
space of real-valued functions on S . For the most part, we will take B to be the continuous
functions on S that vanish at infinity (with the supremum norm), although another common
example is to let B be the set of functions on S that are in L2 with respect to some measure.
We use ‖ · ‖ for the norm on B.
For the duration of this chapter we will make the following assumption.
Assumption 37.1 Suppose that Pt , t ≥ 0, are operators acting on B such that
(1) the Pt are contractions: ‖Pt f ‖ ≤ ‖ f ‖ for all t ≥ 0 and all f ∈ B,
(2) the Pt form a semigroup: PsPt = Pt+s for all s, t ≥ 0, and
(3) the Pt are strongly continuous: if f ∈ B, then Pt f → f as t → 0.
Note that the semigroup property implies in particular that Ps and Pt commute. For a bounded
operator A on B, ‖A‖ = sup{‖A f ‖ : ‖ f ‖ ≤ 1}, so saying Pt is a contraction is the same as
saying ‖Pt‖ ≤ 1.
Define the resolvent or $\lambda$-resolvent operator of a semigroup $P_t$ by
$$R_\lambda f(x) = \int_0^\infty e^{-\lambda t} P_t f(x)\, dt. \tag{37.1}$$
The resolvent equation is
$$R_\lambda - R_\mu = (\mu - \lambda) R_\lambda R_\mu. \tag{37.2}$$
We show that the semigroup property implies the resolvent equation.
Proposition 37.2 The resolvent equation (37.2) holds.
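Proof We write
$$\begin{aligned} R_\lambda(R_\mu f)(x) &= \int_0^\infty e^{-\lambda t} P_t(R_\mu f)(x)\, dt \\ &= \int_0^\infty e^{-\lambda t} \int_0^\infty e^{-\mu s} P_t(P_s f)(x)\, ds\, dt \\ &= \int_0^\infty e^{-\lambda t} \int_0^\infty e^{-\mu s} P_{t+s} f(x)\, ds\, dt \\ &= \int_0^\infty e^{-\lambda t} e^{\mu t} \int_t^\infty e^{-\mu s} P_s f(x)\, ds\, dt \\ &= \int_0^\infty \int_0^s e^{-(\lambda - \mu) t} e^{-\mu s} P_s f(x)\, dt\, ds \\ &= \int_0^\infty \frac{1 - e^{-(\lambda - \mu) s}}{\lambda - \mu} e^{-\mu s} P_s f(x)\, ds \\ &= \frac{1}{\mu - \lambda} \Big[ \int_0^\infty e^{-\lambda s} P_s f(x)\, ds - \int_0^\infty e^{-\mu s} P_s f(x)\, ds \Big] \\ &= \frac{1}{\mu - \lambda} [R_\lambda f(x) - R_\mu f(x)]. \end{aligned}$$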
The second equality uses Exercise 37.2, the fourth a change of variables, and the fifth the
Fubini theorem.
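For a finite-state chain with generator matrix $Q$ and $P_t = e^{tQ}$, a standard fact (used here as background, not taken from the text) is that $R_\lambda = (\lambda I - Q)^{-1}$ for $\lambda > 0$. The sketch below uses a two-state chain with hypothetical rates, inverts $\lambda I - Q$ explicitly, and measures how far $R_\lambda - R_\mu - (\mu - \lambda) R_\lambda R_\mu$ is from zero.

```python
def inv2(m):
    """Inverse of a 2x2 matrix [[p, q], [r, s]]."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def resolvent(lam, a=1.0, b=2.0):
    """R_lam = (lam I - Q)^{-1} for the two-state generator Q = [[-a, a], [b, -b]]."""
    return inv2([[lam + a, -a], [-b, lam + b]])

def resolvent_gap(lam, mu):
    """Largest entry of |R_lam - R_mu - (mu - lam) R_lam R_mu|, zero by (37.2)."""
    rl, rm = resolvent(lam), resolvent(mu)
    prod = matmul(rl, rm)
    return max(abs(rl[i][j] - rm[i][j] - (mu - lam) * prod[i][j])
               for i in range(2) for j in range(2))

print(resolvent_gap(0.7, 2.3))
```

As a further check, each row of $\lambda R_\lambda$ sums to 1, reflecting $P_t 1 = 1$.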
We have the following corollary to Proposition 37.2.
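Corollary 37.3 If $\mu, \lambda > 0$ and $|\mu - \lambda| < \lambda$, then
$$R_\mu f = R_\lambda f + \sum_{i=1}^\infty (\lambda - \mu)^i R_\lambda^{i+1} f. \tag{37.3}$$
Here $R_\lambda^2 f = R_\lambda(R_\lambda f)$, and similarly for $R_\lambda^i f$.

Proof By Proposition 37.2, we have
$$R_\mu f = R_\lambda f + (\lambda - \mu) R_\lambda R_\mu f. \tag{37.4}$$
If we substitute for $R_\mu f$ in the last term on the right-hand side of (37.4), we have
$$R_\mu f = R_\lambda f + (\lambda - \mu) R_\lambda R_\lambda f + (\lambda - \mu)^2 R_\lambda R_\lambda R_\mu f.$$
We again substitute for $R_\mu f$, and repeat. Since $\|(\lambda - \mu) R_\lambda\| \le |\lambda - \mu|/\lambda$, which is less than one, the remainder term $(\lambda - \mu)^i R_\lambda^i R_\mu f$ converges to zero as $i \to \infty$ and the series converges.

Remark 37.4 In particular, if $R_\lambda$ and $S_\lambda$ are two resolvents that agree at one value of $\lambda$, say $\lambda_0$, then Corollary 37.3 applied once with $R_\lambda$ and once with $S_\lambda$ implies that if $\lambda < 2\lambda_0$, then
$$R_\lambda f = R_{\lambda_0} f + \sum_{i=1}^\infty (\lambda_0 - \lambda)^i (R_{\lambda_0})^{i+1} f = S_{\lambda_0} f + \sum_{i=1}^\infty (\lambda_0 - \lambda)^i (S_{\lambda_0})^{i+1} f = S_\lambda f,$$
so $R_\lambda$ and $S_\lambda$ agree for $\lambda < 2\lambda_0$. Applying this observation again with $\lambda_0$ replaced by $3\lambda_0/2$, we see that $R_\lambda$ and $S_\lambda$ agree for $\lambda < 3\lambda_0$. Continuing this argument, we see that $R_\lambda$ and $S_\lambda$ must agree for each positive value of $\lambda$.

If for some $f \in B$ there is $g \in B$ with
$$\Big\| \frac{P_h f - f}{h} - g \Big\| \to 0 \quad \text{as } h \to 0,$$
we say that $f$ is in the domain of the infinitesimal generator of the semigroup, we write $g = L f$, and we write $f \in \mathcal{D} = \mathcal{D}(L)$. Generally $\mathcal{D}(L)$ is a proper subset of $B$. If $f \in \mathcal{D}$ and $t > 0$, then
$$\frac{P_h P_t f - P_t f}{h} = \frac{P_t P_h f - P_t f}{h} = P_t\Big(\frac{P_h f - f}{h}\Big) \to P_t L f, \tag{37.5}$$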
since Pt is a contraction. Therefore Pt f ∈ D when f ∈ D and L(Pt f ) = Pt (L f ).
Proposition 37.5 Fix λ > 0 and let C = {Rλ f : f ∈ B}. Then C = D(L) and for f ∈ B,
LRλ f = λRλ f − f .
Proof Suppose that g ∈ C, so that g = Rλ f for some f ∈ B. Then
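$$P_h R_\lambda f = \int_0^\infty e^{-\lambda t} P_{h+t} f\, dt = e^{\lambda h} \int_h^\infty e^{-\lambda t} P_t f\, dt, \tag{37.6}$$
and so
$$P_h g - g = P_h R_\lambda f - R_\lambda f = (e^{\lambda h} - 1) \int_h^\infty e^{-\lambda t} P_t f\, dt - \int_0^h e^{-\lambda t} P_t f\, dt. \tag{37.7}$$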
Dividing by $h$ and letting $h \to 0$, the first term on the right of (37.7) converges (use Exercise 37.2) to
$$\lambda \int_0^\infty e^{-\lambda t} P_t f\, dt = \lambda R_\lambda f.$$
Since $f \in B$, $P_t f \to f$ as $t \to 0$, so after dividing by $h$, the integral in the second term on the right-hand side of (37.7) converges to $f$. Thus
$$L(R_\lambda f) = \lambda R_\lambda f - f, \tag{37.8}$$
as required.
We have shown that C ⊂ D(L), and we now show the opposite inclusion. Suppose
f ∈ D(L). Let g = λ f − L f , which is in B. Since Pt and L commute, then Rλ and L
commute, and by (37.8),
f = λRλ f − (λRλ f − f ) = λRλ f − RλL f
= Rλg,
which is in C.
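Both the definition of the generator and Proposition 37.5 can be seen in a finite example. The sketch assumes a two-state chain with hypothetical rates $a$ and $b$, for which $P_t f = \Pi f + e^{-(a+b)t}(f - \Pi f)$ and the generator is $L f(x) = (\text{rate out of } x)\,(f(\text{other state}) - f(x))$. It compares the difference quotient $(P_h f - f)/h$ with $L f$ for small $h$, and checks $L R_\lambda f = \lambda R_\lambda f - f$.

```python
import math

A_RATE, B_RATE = 1.0, 2.0      # hypothetical jump rates 0 -> 1 and 1 -> 0
R = A_RATE + B_RATE
PI = (B_RATE / R, A_RATE / R)  # stationary distribution

def P(t, f):
    """P_t f for the two-state chain: Pi f + e^{-R t} (f - Pi f)."""
    m = PI[0] * f[0] + PI[1] * f[1]
    return [m + math.exp(-R * t) * (f[x] - m) for x in (0, 1)]

def L(f):
    """Generator: L f(0) = a (f(1) - f(0)), L f(1) = b (f(0) - f(1))."""
    return [A_RATE * (f[1] - f[0]), B_RATE * (f[0] - f[1])]

def resolvent(lam, f):
    """R_lam f = Pi f / lam + (f - Pi f) / (lam + R)."""
    m = PI[0] * f[0] + PI[1] * f[1]
    return [m / lam + (f[x] - m) / (lam + R) for x in (0, 1)]

f = [1.0, -0.5]
h = 1e-6
quotient = [(P(h, f)[x] - f[x]) / h for x in (0, 1)]
generator_err = max(abs(quotient[x] - L(f)[x]) for x in (0, 1))

lam = 0.9
rf = resolvent(lam, f)
prop_37_5_err = max(abs(L(rf)[x] - (lam * rf[x] - f[x])) for x in (0, 1))
print(generator_err, prop_37_5_err)
```

On a finite state space every function is in the domain, so this only illustrates the identities; the interesting domain questions in the examples that follow do not arise here.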
Example 37.6 Let us compute the infinitesimal generator when (Xt, Px) is a one-
dimensional Brownian motion. For our space B we take the continuous functions on R
that vanish at infinity. Suppose f ∈ C2 with compact support. By a Taylor series expansion,
$$P_h f(x) = \mathbb{E}^x f(X_h) = f(x) + f'(x)\, \mathbb{E}^x(X_h - x) + \tfrac12 f''(x)\, \mathbb{E}^x(X_h - x)^2 + R_h,$$
where $R_h$ is the remainder term. We know $R_h$ is bounded by
$$\|f''\|_\infty\, \mathbb{E}^x[\varphi(X_h - x)],$$
where $\varphi$ is bounded and $|\varphi(y)/y^2| \to 0$ as $y \to 0$. Since $X_h$ started at $x$ has mean $x$ and variance $h$, we have
$$P_h f(x) = f(x) + \tfrac12 f''(x)\, h + R_h,$$
where $|R_h/h|$ tends to zero as $h \to 0$. Therefore
$$\frac{P_h f - f}{h} \to \tfrac12 f'',$$
the convergence being with respect to the supremum norm. Exactly the same argument holds in higher dimensions to show that $L f = \tfrac12 \Delta f$. We have shown that $\mathcal{D}(L)$ contains the $C^2$ functions with compact support, but have not actually identified the domain of the infinitesimal generator. We refer the reader to Knight (1981) for a detailed discussion.
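The convergence $(P_h f - f)/h \to \frac12 f''$ can be watched numerically in one dimension. For convenience the sketch uses $f(x) = e^{-x^2}$. This function is not compactly supported, but it is $C^2$ and vanishes at infinity, which is enough for an illustration. The Gaussian identity $\mathbb{E}\, e^{-(x + \sqrt{h} Z)^2} = (1 + 2h)^{-1/2} e^{-x^2/(1+2h)}$, with $Z$ standard normal, gives $P_h f$ in closed form. The supremum is taken over a finite grid in $[-5, 5]$, which is an assumption.

```python
import math

GRID = [k / 10 for k in range(-50, 51)]   # points x in [-5, 5]

def f(x):
    return math.exp(-x * x)

def half_f2(x):
    """(1/2) f''(x) for f(x) = exp(-x^2)."""
    return (2 * x * x - 1) * math.exp(-x * x)

def P(h, x):
    """P_h f(x) = E^x f(X_h) for Brownian motion, via the Gaussian identity
    E exp(-(x + sqrt(h) Z)^2) = (1 + 2h)^{-1/2} exp(-x^2 / (1 + 2h))."""
    return math.exp(-x * x / (1 + 2 * h)) / math.sqrt(1 + 2 * h)

def sup_error(h):
    """Max over GRID of |(P_h f(x) - f(x)) / h - f''(x) / 2|."""
    return max(abs((P(h, x) - f(x)) / h - half_f2(x)) for x in GRID)

for h in (1e-2, 1e-3, 1e-4):
    print(h, sup_error(h))
```

The error shrinks roughly in proportion to $h$, as the Taylor expansion above suggests for such a smooth $f$.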
The domain of an infinitesimal generator is nearly as important as the operator itself.
We will briefly discuss aspects of the domains of the infinitesimal generator for absorbing
Brownian motion and for reflecting Brownian motion on [0, ∞). Both have the same operator
$L f = \tfrac12 f''$ but different domains.
Absorbing Brownian motion on $[0, \infty)$ is Brownian motion killed on first hitting $(-\infty, 0)$. Let $W_t$ be standard Brownian motion on $\mathbb{R}$ and let $X_t$ be $W_t$ killed on first hitting $(-\infty, 0)$. Suppose $f \in C^2[0, \infty)$ with $f$ and its first and second derivatives bounded and uniformly continuous, and let $x > 0$. Then $(\mathbb{E}^x f(X_t) - f(x))/t$ differs from $(\mathbb{E}^x f(W_t) - f(x))/t$ by at most $\|f\|_\infty \mathbb{P}^x(T_0 < t)/t$, where $T_0$ is the first time $W_t$ hits $(-\infty, 0)$. If $x > 0$,
$$\frac{\mathbb{P}^x(T_0 < t)}{t} \le \frac{\mathbb{P}^x(\sup_{s \le t} |W_s - W_0| \ge x)}{t} \le \frac{2}{t} e^{-x^2/2t} \to 0$$
as $t \to 0$. Therefore for $x > 0$, the infinitesimal generator of absorbing Brownian motion is the same as the infinitesimal generator of standard Brownian motion, namely, $\tfrac12 f''(x)$.

If $f = R_\lambda g$ for $g$ bounded and continuous, we have
$$f(0) = R_\lambda g(0) = \mathbb{E}^0 \int_0^{T_0} e^{-\lambda t} g(X_t)\, dt = 0.$$
We use the fact that, starting at 0, $T_0 = 0$ a.s., by Theorem 7.2. Using Proposition 37.5, every function in the domain of the infinitesimal generator of absorbing Brownian motion must satisfy $f(0) = 0$.

We can define reflecting Brownian motion on $[0, \infty)$ by $X_t = |W_t|$, where $W$ is a one-dimensional Brownian motion on $\mathbb{R}$. As in the preceding paragraph, the infinitesimal generator for $X$ agrees with $\tfrac12 f''(x)$ if $x > 0$. For $x = 0$, an application of Taylor's theorem gives
$$\mathbb{E}^0 f(|W_t|) = f(0) + f'(0)\, \mathbb{E}^0|W_t| + \tfrac12 f''(0)\, \mathbb{E}^0|W_t|^2 + \mathbb{E}^0 R_t,$$
where $R_t$ is a remainder term. Subtract $f(0)$ from both sides and divide by $t$. Since $\mathbb{E}^0|W_t|/t = c_1\sqrt{t}/t \to \infty$ as $t \to 0$ while $\mathbb{E}^0|W_t|^2/t = 1$, the only way we can get convergence is if $f'(0) = 0$. Thus every function in the domain of the infinitesimal generator of reflecting Brownian motion must satisfy $f'(0) = 0$.

In higher dimensions, the analogous restriction for reflecting Brownian motion is that the normal derivative $\partial f/\partial n$ must equal zero on the boundary of the domain, where $n$ is the inward-pointing unit normal vector. In the partial differential equations literature, this is known as the Neumann boundary condition, and it models situations where there is no heat flow across the boundary. For absorbing Brownian motion the analogous restriction is that $f = 0$ on the boundary of the domain, and this is called the Dirichlet boundary condition.

Example 37.7 Next we compute the generator for a Poisson process with parameter $\lambda$. We can let $B$ be as in Example 37.6. We have
$$P_h f(x) = \sum_{i=0}^\infty e^{-\lambda h} \frac{(\lambda h)^i}{i!} f(x + i) = e^{-\lambda h} f(x) + e^{-\lambda h} \lambda h f(x + 1) + \sum_{i=2}^\infty e^{-\lambda h} \frac{(\lambda h)^i}{i!} f(x + i).$$
Subtracting $f(x)$ from both sides, dividing by $h$, and letting $h \to 0$, we obtain
$$L f(x) = -\lambda f(x) + \lambda f(x + 1) = \lambda[f(x + 1) - f(x)].$$
In this case the domain of $L$ is all of $B$.

A very useful result is Dynkin's formula.

Theorem 37.8 Suppose $P_t$ operating on the space $B$ of continuous functions vanishing at infinity is the semigroup of a Markov process $(X_t, \mathbb{P}^x)$, $f \in \mathcal{D}(L)$, and $f$ and $L f$ are bounded. If $x \in S$ and $T$ is a stopping time with $\mathbb{E}^x T < \infty$, then
$$\mathbb{E}^x f(X_T) - f(x) = \mathbb{E}^x \int_0^T L f(X_r)\, dr.$$

Proof If $f \in \mathcal{D}(L)$, then $L f \in B$, and so $P_t L f$ is continuous in $t$. Moreover, as we saw in (37.5), $\frac{\partial}{\partial t} P_t f(y) = P_t L f(y)$. By the fundamental theorem of calculus,
$$P_t f(y) - f(y) = \int_0^t P_r L f(y)\, dr,$$
which can be rewritten as
$$\mathbb{E}^y f(X_t) - f(y) = \mathbb{E}^y \int_0^t L f(X_r)\, dr; \tag{37.9}$$
we used the Fubini theorem here as well. This holds for each $y \in S$ and each $t > 0$.
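Set $M_t = f(X_t) - f(X_0) - \int_0^t L f(X_r)\, dr$. What (37.9) says is that $\mathbb{E}^y M_t = 0$ for all $y$ and all $t$. By the Markov property,
$$\begin{aligned} \mathbb{E}^x[M_t - M_s \mid \mathcal{F}_s] &= \mathbb{E}^x\Big[f(X_t) - f(X_s) - \int_s^t L f(X_r)\, dr \;\Big|\; \mathcal{F}_s\Big] \\ &= \mathbb{E}^x\Big[\Big(f(X_{t-s}) - f(X_0) - \int_0^{t-s} L f(X_r)\, dr\Big) \circ \theta_s \;\Big|\; \mathcal{F}_s\Big] \\ &= \mathbb{E}^{X_s} M_{t-s} = 0. \end{aligned}$$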
Therefore Mt is a martingale with respect to Px for each x. If T is a bounded stopping time,
then by optional stopping, E xMT = 0. If T is instead only integrable with respect to Px,
we have E xMT ∧n = 0 for each n. We then let n → ∞ and use the fact that f and L f are
bounded to conclude E xMT = 0, which is what we want.
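A Monte Carlo sketch of Dynkin's formula, under stated assumptions. For Brownian motion started at 0, let $T$ be the exit time of $(-1, 1)$ and $f(x) = x^2$, so $L f = \frac12 f'' = 1$ and $f(W_T) = 1$. Dynkin's formula then predicts $\mathbb{E}^0 T = 1$. Strictly, $f(x) = x^2$ is not in $C_0$. The assumption here is that $f$ can be modified outside $[-1, 1]$ without affecting anything before $T$. The simulation replaces Brownian motion by a simple random walk with steps $\pm 1/n$ every $1/n^2$ time units. For that walk the discrete analogue (since $S_k^2 - k$ is a martingale) gives mean exit time exactly 1 after rescaling. The parameters `n`, `samples`, and `seed` are arbitrary choices.

```python
import random

def exit_time(n, rng):
    """Exit time from (-1, 1) of a walk with steps +-1/n every 1/n^2 time units, started at 0."""
    pos, steps = 0, 0            # pos counts steps of size 1/n
    while abs(pos) < n:
        pos += 1 if rng.random() < 0.5 else -1
        steps += 1
    return steps / n ** 2

def mean_exit_time(n=10, samples=10000, seed=1):
    """Sample mean of the rescaled exit time; Dynkin's formula predicts 1."""
    rng = random.Random(seed)
    return sum(exit_time(n, rng) for _ in range(samples)) / samples

estimate = mean_exit_time()
print(estimate)
```

The estimate should be close to 1; the sampling error is roughly $\sqrt{2/3}/\sqrt{\text{samples}}$, since the exit time of Brownian motion from $(-1, 1)$ started at 0 has variance $2/3$.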
We say a few words about the Kolmogorov backward and forward equations. Suppose the semigroup $P_t$ can be written
$$P_t f(x) = \int f(y)\, p(t, x, y)\, dy$$
for functions $p(t, x, y)$, which are called transition densities. Provided there are no difficulties interchanging integration and differentiation, the equation
$$\frac{\partial}{\partial t} P_t f(x) = L P_t f(x)$$
can be rewritten as
$$\int f(y) \frac{\partial}{\partial t} p(t, x, y)\, dy = \int f(y)\, L p(t, x, y)\, dy,$$
which leads to the Kolmogorov backward equation
$$\frac{\partial}{\partial t} p(t, x, y) = L p(t, x, y),$$
where $L$ operates on the $x$ variable and $y$ is held fixed.
If $L$ has an adjoint operator $L^*$, which means $\int f (L g) = \int (L^* f) g$ for $f$ and $g$ in the domains of $L^*$ and $L$, respectively, the equation
$$\frac{\partial}{\partial t} P_t f(x) = P_t L f(x)$$
can be rewritten as
$$\int f(y) \frac{\partial}{\partial t} p(t, x, y)\, dy = \int L f(y)\, p(t, x, y)\, dy = \int f(y)\, L^* p(t, x, y)\, dy,$$
which leads to the Kolmogorov forward equation
$$\frac{\partial}{\partial t} p(t, x, y) = L^* p(t, x, y),$$
where $L^*$ operates on the $y$ variable and $x$ is held fixed.
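For one-dimensional Brownian motion, $L = \frac12 \frac{d^2}{dx^2}$ is formally its own adjoint with respect to Lebesgue measure. So both the backward equation (derivatives in $x$) and the forward equation (derivatives in $y$) should hold for the Gaussian density of Example 36.6 with $d = 1$. The sketch estimates the residuals $\partial_t p - \frac12 \partial^2 p$ with central differences; the step size `h` and the evaluation point are arbitrary choices.

```python
import math

def p(t, x, y):
    """Gaussian transition density in one dimension."""
    return math.exp(-(x - y) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t)

def forward_residual(t, x, y, h=1e-3):
    """Central-difference estimate of dp/dt - (1/2) d^2 p / dy^2 (forward equation)."""
    dt = (p(t + h, x, y) - p(t - h, x, y)) / (2 * h)
    dyy = (p(t, x, y + h) - 2 * p(t, x, y) + p(t, x, y - h)) / h ** 2
    return dt - 0.5 * dyy

def backward_residual(t, x, y, h=1e-3):
    """Central-difference estimate of dp/dt - (1/2) d^2 p / dx^2 (backward equation)."""
    dt = (p(t + h, x, y) - p(t - h, x, y)) / (2 * h)
    dxx = (p(t, x + h, y) - 2 * p(t, x, y) + p(t, x - h, y)) / h ** 2
    return dt - 0.5 * dxx

print(forward_residual(0.8, 0.2, 1.1), backward_residual(0.8, 0.2, 1.1))
```

For a diffusion with a non-constant drift, $L^*$ would differ from $L$ and the two residuals would have to be computed with different operators.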
37.2 The Hille–Yosida theorem
We now show how to construct a semigroup given the infinitesimal generator. We start with a few preliminary observations. If $A$ is a bounded operator, we can define
$$e^A = I + A + A^2/2! + \cdots = \sum_{i=0}^\infty A^i/i!.$$
To see that the series converges, note that
$$\Big\| \sum_{i=m}^n A^i/i! \Big\| \le \sum_{i=m}^n \|A^i\|/i! \le \sum_{i=m}^\infty \|A\|^i/i!,$$
which will be small if $m$ is large since $\|A\|$ is a finite number. Similarly,
$$\|e^A\| \le \sum_{i=0}^\infty \|A\|^i/i! = e^{\|A\|}.$$
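A sketch of the series definition for a $2 \times 2$ matrix, with the supremum norm on $\mathbb{R}^2$. For that norm the induced operator norm is the largest absolute row sum. The example exponentiates $tQ$ for a hypothetical two-state generator $Q$ and compares the result with the known closed form of $e^{tQ}$. It then checks the bound $\|e^A\| \le e^{\|A\|}$. Truncating after 60 terms is an assumption that is more than enough when $\|A\|$ is of order 1.

```python
import math

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm_series(a, terms=60):
    """e^A = sum_{i >= 0} A^i / i!, truncated after `terms` terms."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]            # holds A^i / i!
    for i in range(1, terms):
        term = [[v / i for v in row] for row in matmul(term, a)]
        result = [[result[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return result

def sup_norm(a):
    """Operator norm induced by the supremum norm: the largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)

Q = [[-1.0, 1.0], [2.0, -2.0]]   # hypothetical generator: rates 1 (0 -> 1) and 2 (1 -> 0)
t = 0.7
tQ = [[t * v for v in row] for row in Q]
E = expm_series(tQ)
closed_form_00 = 2 / 3 + math.exp(-3 * t) / 3   # P_t(0, 0) for this chain
print(E, closed_form_00)
```

Since $Q$ generates a Markov chain, $e^{tQ}$ is a stochastic matrix, so here $\|e^{tQ}\| = 1$, far below the crude bound $e^{\|tQ\|}$.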
Proposition 37.9 Suppose {Rλ} is a family of bounded operators defined on B such that
(1) the resolvent equation holds,
(2) ‖Rλ‖ ≤ 1/λ for each λ > 0, and
(3) ‖λRλ f − f ‖ → 0 as λ → ∞ for each f ∈ B.
Then there exists a strongly continuous semigroup Pt whose resolvent is Rλ.
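Proof Let $D_\lambda = \lambda(\lambda R_\lambda - I)$ and $Q^\lambda_t = e^{t D_\lambda}$. Note that the resolvent equation implies that $D_\lambda$ and $D_\mu$ commute and therefore all the operators $D_\lambda$, $Q^\lambda_t$, $D_\mu$, and $Q^\mu_t$ commute. Since $\|\lambda R_\lambda\| \le 1$,
$$\|Q^\lambda_t\| = e^{-\lambda t} \|e^{t\lambda^2 R_\lambda}\| \le e^{-\lambda t} e^{\|t\lambda^2 R_\lambda\|} \le e^{-\lambda t} e^{\lambda t} = 1.$$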
We first show that the set of $f$ such that $D_\lambda f$ converges as $\lambda \to \infty$ is a dense subset of $B$. If $f = R_a g$ for some $a > 0$ and some $g \in B$, then by the resolvent equation
$$D_\lambda f = \lambda(\lambda R_\lambda - I)(R_a g) = \lambda^2 R_\lambda R_a g - \lambda R_a g = \frac{\lambda^2}{\lambda - a}(R_a g - R_\lambda g) - \lambda R_a g.$$
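We have
$$\frac{\lambda^2}{\lambda - a} R_\lambda g = \frac{\lambda}{\lambda - a}\, \lambda R_\lambda g \to g$$
as $\lambda \to \infty$ by hypothesis (3) and
$$\frac{\lambda^2}{\lambda - a} R_a g - \lambda R_a g = \frac{\lambda}{\lambda - a}\, a R_a g \to a R_a g$$
as $\lambda \to \infty$. Therefore
$$D_\lambda R_a g \to a R_a g - g. \tag{37.10}$$
Thus $D_\lambda$ converges on $E = \bigcup_{a > 0} \{R_a f : f \in B\}$. But for any $f \in B$, $a R_a f \to f$ as $a \to \infty$ and $a R_a f = R_a(a f) \in E$, which proves that $E$ is a dense subset of $B$.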
Next we show that if $D_\lambda f$ converges, then $Q^\lambda_t f$ converges. Suppose $D_\lambda f$ converges and $\varepsilon > 0$. Choose $M$ such that if $\lambda, \mu \ge M$, then $\|D_\lambda f - D_\mu f\| < \varepsilon$. Since $\frac{d}{dt} Q^\lambda_t f = D_\lambda Q^\lambda_t f$ and $Q^\lambda_0$, $Q^\mu_0$ are both the identity operator, we have
$$\begin{aligned} Q^\lambda_t f - Q^\mu_t f &= \int_0^t \frac{\partial}{\partial s}(Q^\lambda_s Q^\mu_{t-s} f)\, ds = \int_0^t [Q^\lambda_s D_\lambda Q^\mu_{t-s} f - Q^\lambda_s D_\mu Q^\mu_{t-s} f]\, ds \\ &= \int_0^t Q^\lambda_s Q^\mu_{t-s}(D_\lambda f - D_\mu f)\, ds, \end{aligned}$$
so
$$\|Q^\lambda_t f - Q^\mu_t f\| \le t\, \|D_\lambda f - D_\mu f\| < \varepsilon t,$$
using that $Q^\lambda_s$ and $Q^\mu_{t-s}$ are contractions. Since $\varepsilon$ is arbitrary, $Q^\lambda_t f$ is Cauchy in $B$ as $\lambda \to \infty$, uniformly for $t$ in bounded intervals, and hence converges. Call the limit $P_t f$. We can easily check that $Q^\lambda_t$ is a semigroup for each $\lambda > 0$, and we saw that $Q^\lambda_t$ is a contraction for each $t$ and $\lambda$. It follows that $P_t$ is a semigroup and that the norm of each $P_t$ is bounded by 1. Each $Q^\lambda_t$ is strongly continuous, and by the uniform convergence, it follows that $P_t f \to f$ as $t \to 0$ for $f \in E$. Since each $P_t$ is a contraction and $E$ is dense in $B$, we can extend each $P_t$ so as to have domain $B$ and so that the $P_t$ will be a strongly continuous semigroup on $B$.
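The Yosida approximations $D_\lambda = \lambda(\lambda R_\lambda - I)$ can be watched converging in a finite example. The sketch takes a hypothetical two-state generator $Q$ and builds $R_\lambda = (\lambda I - Q)^{-1}$ (a standard fact for finite chains, used here as an assumption). For this particular chain one can check by hand that $D_\lambda = \frac{\lambda}{\lambda + 3} Q$. The code checks that $D_\lambda \to Q$, that $Q^\lambda_t = e^{t D_\lambda}$ approaches $e^{tQ}$, and that each $Q^\lambda_t$ is a contraction in the supremum norm.

```python
Q = [[-1.0, 1.0], [2.0, -2.0]]   # hypothetical generator of a two-state chain

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def scaled(m, c):
    return [[c * v for v in row] for row in m]

def expm(a, terms=80):
    """e^A by its power series, truncated after `terms` terms."""
    out = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for i in range(1, terms):
        term = [[v / i for v in row] for row in matmul(term, a)]
        out = [[out[r][c] + term[r][c] for c in range(2)] for r in range(2)]
    return out

def resolvent(lam):
    """R_lam = (lam I - Q)^{-1}, via the explicit 2x2 inverse."""
    (p, q), (r, s) = [[lam - Q[0][0], -Q[0][1]], [-Q[1][0], lam - Q[1][1]]]
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def yosida(lam):
    """D_lam = lam (lam R_lam - I), a bounded approximation to the generator."""
    r = resolvent(lam)
    return [[lam * (lam * r[i][j] - (1.0 if i == j else 0.0)) for j in range(2)]
            for i in range(2)]

def max_diff(x, y):
    return max(abs(x[i][j] - y[i][j]) for i in range(2) for j in range(2))

t = 1.0
target = expm(scaled(Q, t))
errors = [max_diff(expm(scaled(yosida(lam), t)), target) for lam in (10.0, 100.0, 1000.0)]
row_norm = max(sum(abs(v) for v in row) for row in expm(scaled(yosida(10.0), t)))
print(max_diff(yosida(1e4), Q), errors, row_norm)
```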
Let $S_\lambda$ be the resolvent for $P_t$. It remains to prove that $S_\lambda = R_\lambda$. Fix $a$ and let $f = R_a g$. We saw in (37.10) that $D_\lambda R_a g \to a R_a g - g$. Now $Q^\lambda_t$ is a semigroup for each $\lambda$ and, by Exercise 37.4, the infinitesimal generator of $Q^\lambda_t$ is $D_\lambda$. By the fundamental theorem of calculus,
$$Q^\lambda_t(R_a g) - R_a g = \int_0^t \frac{\partial}{\partial s}(Q^\lambda_s R_a g)\, ds = \int_0^t Q^\lambda_s(D_\lambda R_a g)\, ds.$$
Letting $\lambda \to \infty$,
$$P_t(R_a g) - R_a g = \int_0^t P_s(a R_a g - g)\, ds.$$
Let $b < a$. Multiply the above equation by $e^{-bt}$ and integrate over $t$ from 0 to $\infty$. Then
$$\begin{aligned} S_b(R_a g) - \frac{1}{b} R_a g &= \int_0^\infty e^{-bt} \int_0^t P_s(a R_a g - g)\, ds\, dt = \int_0^\infty \int_s^\infty e^{-bt} P_s(a R_a g - g)\, dt\, ds \\ &= \int_0^\infty \frac{1}{b} e^{-bs} P_s(a R_a g - g)\, ds = \frac{1}{b} S_b(a R_a g - g). \end{aligned}$$
Therefore $S_b g = R_a g + (a - b) S_b R_a g$. Applying this with $g$ replaced by $R_a g$, iterating, and using Corollary 37.3, we obtain
$$S_b g = R_a g + (a - b) R_a^2 g + (a - b)^2 R_a^3 g + \cdots = R_b g.$$
By Remark 37.4, this proves $S_b = R_b$ for all $b$.

We now show that under appropriate hypotheses on $L$, there exists a semigroup whose infinitesimal generator is $L$. This is known as the Hille–Yosida theorem. We say that an operator $L$ is dissipative if, for every $\lambda > 0$,
$$\|(\lambda - L) f\| \ge \lambda \|f\|, \qquad f \in \mathcal{D}(L). \tag{37.11}$$

Theorem 37.10 Suppose $L$ is an operator such that
(1) the domain of $L$ is a dense subset of $B$,
(2) the range of $\lambda - L$ is $B$ for each $\lambda > 0$, and
(3) $L$ is dissipative.
Then there exists a semigroup on $B$ which has $L$ as its infinitesimal generator.

Proof If $(\lambda - L) f = (\lambda - L) g$, then $\lambda \|f - g\| \le \|(\lambda - L)(f - g)\| = 0$, so $f = g$. Thus $\lambda - L$ is a one-to-one map, hence is invertible because the range of $\lambda - L$ is $B$. We let $R_\lambda$ be the inverse, and thus the domain of $R_\lambda$ is all of $B$.

We first show that the resolvent equation holds. We observe
$$(\mu - L) \frac{1}{\lambda - \mu} R_\mu f = \frac{1}{\lambda - \mu} f$$
and
$$(\mu - L) \frac{1}{\lambda - \mu} R_\lambda f = (\mu - \lambda) \frac{1}{\lambda - \mu} R_\lambda f + (\lambda - L) \frac{1}{\lambda - \mu} R_\lambda f = -R_\lambda f + \frac{1}{\lambda - \mu} f.$$
Combining,
$$(\mu - L) R_\mu R_\lambda f = R_\lambda f = (\mu - L) \frac{1}{\lambda - \mu}(R_\mu - R_\lambda) f.$$
Applying $R_\mu$ to both sides yields the resolvent equation. The hypothesis that $\|(\lambda - L) f\| \ge \lambda \|f\|$ immediately implies $\|R_\lambda f\| \le \|f\|/\lambda$.

We next show $\lambda R_\lambda f \to f$ as $\lambda \to \infty$. If $f \in \mathcal{D}(L)$, then $R_\lambda L f = L R_\lambda f = \lambda R_\lambda f - f$, and so
$$\|\lambda R_\lambda f - f\| \le \frac{1}{\lambda} \|L f\| \to 0$$
as $\lambda \to \infty$. Since $\|\lambda R_\lambda\| \le 1$ and the domain of $L$ is dense in $B$, we conclude $\lambda R_\lambda f \to f$ for all $f \in B$.

We use Proposition 37.9 to construct $P_t$. By Proposition 37.9, $R_\lambda$ is the resolvent for $P_t$. If $M$ is the infinitesimal generator for $P_t$, then by Proposition 37.5, the domain of $M$ is $\{R_\lambda f : f \in B\}$.
Since we know $L(R_\lambda f) = \lambda R_\lambda f - f \in B$, the domain of $L$ contains $\{R_\lambda f : f \in B\}$. Since $M$ is the infinitesimal generator of $P_t$, by Proposition 37.5, $M(R_\lambda f) = \lambda R_\lambda f - f$. Therefore $L$ is an extension of $M$. If $f \in \mathcal{D}(L)$, then $g = (\lambda - L) f \in B$, and thus $(\lambda - M)^{-1} g \in \mathcal{D}(M) \subset \mathcal{D}(L)$. Hence
$$(\lambda - L) f = g = (\lambda - M)(\lambda - M)^{-1} g = (\lambda - L)(\lambda - M)^{-1} g.$$
Since $\lambda - L$ is one-to-one, $f = (\lambda - M)^{-1} g$, which implies $f \in \mathcal{D}(M)$. Therefore $M = L$, and so $L$ is the generator of $P_t$.

When applying the Hille–Yosida theorem, it is quite often easier to show that the range of $\lambda - L$ is only dense in $B$, rather than all of $B$. When that occurs, one needs to look at a closed extension $\overline{L}$ of $L$. An operator $L$ is closed if whenever $f_n \in \mathcal{D}(L)$, $f_n \to f$, and $L f_n \to g$, then $f \in \mathcal{D}(L)$ and $L f = g$. To construct the closed extension of $L$, where we assume that $L$ is dissipative (defined by (37.11)), let $R_\lambda g = f$ if $(\lambda - L) f = g$. That $L$ is dissipative is equivalent to the norm of $R_\lambda$ being bounded by $1/\lambda$ on the range of $\lambda - L$; since this range is dense, we can extend the domain of $R_\lambda$ uniquely to all of $B$. Now define $\mathcal{D}(\overline{L})$ to be the range of $R_\lambda$ and set
$$\overline{L} R_\lambda g = \lambda R_\lambda g - g. \tag{37.12}$$

We will soon give two examples where infinitesimal generators can be used to construct very useful processes. The first is where the infinitesimal generator is an elliptic operator of second order in nondivergence form. The second case studies the infinitesimal generators of Lévy processes.

We should mention that there is another important example where infinitesimal generators are useful in constructing a process, that of infinite particle systems. The name “infinite particle systems” refers to a class of models with discrete space and continuous time that are useful in mathematical biology and in statistical mechanics. One of the simplest examples is the voter model. Suppose at every point in $\mathbb{Z}^2$, the integer lattice in the plane, there is a voter, who is leaning either toward the Democratic candidate or the Republican candidate.
At each point, the voter waits a length of time that is exponential with parameter one, chooses one of his four nearest neighbors at random, and then changes his view to agree with that neighbor. Other infinite particle systems include the contact process (modeling the spread of infection), the Ising model (modeling ferromagnetism), and the exclusion model (used in solid state physics). See Liggett (2010) for how to construct these processes using infinitesimal generators, and for much more.

37.3 Nondivergence form elliptic operators

Let us consider the operator $L$ defined on $C^2$ functions on $\mathbb{R}^d$ by
$$L f(x) = \sum_{i,j=1}^d a_{ij}(x) \frac{\partial^2 f}{\partial x_i \partial x_j}(x) + \sum_{i=1}^d b_i(x) \frac{\partial f}{\partial x_i}(x).$$
We suppose $a_{ij}(x) = a_{ji}(x)$ for all $x$. We assume the $a_{ij}$ and $b_i$ are bounded and Hölder continuous of order $\alpha \in (0, 1)$: there exists $c$ such that
$$|a_{ij}(x) - a_{ij}(y)| \le c|x - y|^\alpha, \qquad |b_i(x) - b_i(y)| \le c|x - y|^\alpha$$
for $i, j = 1, \ldots, d$. We also assume a uniform ellipticity condition on the $a_{ij}$: there exists $\Lambda > 0$ such that
$$\sum_{i,j=1}^d a_{ij}(x)\, y_i y_j \ge \Lambda \sum_{i=1}^d y_i^2, \qquad (y_1, \ldots, y_d) \in \mathbb{R}^d.$$
Uniform ellipticity says that the matrix whose (i, j)th element is ai j(x) is positive definite,
uniformly in x.
If the $a_{ij}$ and $b_i$ are Lipschitz continuous, we can construct the Markov process with infinitesimal generator $L$ using stochastic differential equations (see Chapter 39), which is a more probabilistic way of doing it. Even when the $a_{ij}$ are continuous and the $b_i$ only measurable, it is possible to construct the Markov process via SDEs, although this is much harder. Here we illustrate how the Hille–Yosida theorem can be used in constructing these processes.
Let $B$ be the space of continuous functions that vanish at infinity. We will want the domain of $L$ to include the class $C$ of functions $f$ such that $f$ and its first and second partial derivatives are continuous and vanish at infinity. Then $C$ is dense in $B$ and $L$ maps $C$ into $B$.
We show that $L$ is dissipative. Let $f \in C$. There is nothing to prove if $f$ is identically zero, so suppose it is not, and let $x_0$ be a point where $|f(x_0)| = \|f\|$; such a point exists because $f$ is continuous and vanishes at infinity. If $f(x_0) < 0$, we can look at $-f$, so let us suppose $f(x_0) > 0$. It suffices to show that $L f(x_0) \le 0$, since then
$$\lambda \|f\| = \lambda f(x_0) \le (\lambda - L) f(x_0) \le \|(\lambda - L) f\|.$$
Let $A$ be the matrix whose $(i, j)$ element is $a_{ij}(x_0)$ and let $H$ be the Hessian at $x_0$, so that
$$H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0).$$
Let $y \in \mathbb{R}^d$ and consider the function $t \mapsto f(x_0 + ty)$, $t \in \mathbb{R}$. Since $t = 0$ is the location of a local maximum for this function, its second derivative at $t = 0$, which is $\sum_{i,j=1}^d y_i y_j H_{ij}$, will be less than or equal to 0. The first derivative of this function will be zero at $t = 0$.
Since $A$ is positive definite, there exist an orthogonal matrix $P$ and a diagonal matrix $D$ with positive entries such that $A = P^T D P$. Recall the trace of a square matrix is defined by $\operatorname{Trace}(C) = \sum_{i=1}^d C_{ii}$, and that $\operatorname{Trace}(AB) = \operatorname{Trace}(BA)$. Note
$$\sum_{i,j=1}^d a_{ij}(x_0) \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0) = \operatorname{Trace}(AH).$$
We have
$$\operatorname{Trace}(AH) = \operatorname{Trace}(P^T D P H) = \operatorname{Trace}(P H P^T D) = \sum_{i=1}^d (P H P^T)_{ii} D_{ii},$$
since $D$ is a diagonal matrix. Thus to show that $\operatorname{Trace}(AH) \le 0$, it suffices to show that $(P H P^T)_{ii} \le 0$ for each $i$. If we let $e_i$ be the unit vector in the $x_i$ direction and $y = P^T e_i$, we have
$$(P H P^T)_{ii} = e_i^T P H P^T e_i = y^T H y = \sum_{k,l=1}^d y_k y_l H_{kl} \le 0.$$
Since $x_0$ is the location of a local maximum, $\frac{\partial f}{\partial x_i}(x_0) = 0$ for each $i$, and we conclude $L f(x_0) \le 0$.
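The linear-algebra fact behind this step, $\operatorname{Trace}(AH) \le 0$ when $A$ is positive definite and $H$ is negative semidefinite, can be spot-checked with random matrices. In the sketch, $A = M M^T + I$ plays the role of the coefficient matrix, and $H = -N N^T$ stands in for the Hessian at a maximum. The dimension, number of draws, and seed are arbitrary.

```python
import random

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transpose(x):
    return [list(col) for col in zip(*x)]

def trace(x):
    return sum(x[i][i] for i in range(len(x)))

def random_pair(d, rng):
    """A = M M^T + I is positive definite; H = -N N^T is negative semidefinite."""
    m = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]
    n = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]
    a = matmul(m, transpose(m))
    for i in range(d):
        a[i][i] += 1.0
    h = [[-v for v in row] for row in matmul(n, transpose(n))]
    return a, h

rng = random.Random(0)
traces = [trace(matmul(*random_pair(4, rng))) for _ in range(200)]
print(max(traces))
```

The diagonalization argument above is one proof. Another is $\operatorname{Trace}(AH) = -\operatorname{Trace}(N^T A N) \le 0$, which is the identity the random test exercises.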
Since L1 = 0, then Pt1 = 1 for all t. This and Exercise 37.1 imply that the Pt are
non-negative operators.
To apply the Hille–Yosida theorem, it remains to show that the range of λ −L is dense in
B. For this we refer the reader to the PDE literature, e.g., Bass (1997), Chapter 3 or Gilbarg
and Trudinger (1983), Chapters 5,6.
37.4 Generators of Lévy processes
Let n be a measure on R \ {0} satisfying ∫ (h² ∧ 1) n(dh) < ∞. Consider the operator L defined on C² functions by
$$Lf(x) = \int [f(x+h) - f(x) - 1_{(|h|\le 1)} f'(x) h]\, n(dh).$$
We will show that L is the infinitesimal generator of a Markov semigroup. We construct these processes, the Lévy processes, probabilistically in Chapter 42. We confine ourselves to the one-dimensional case, although the argument for higher dimensions is completely analogous.
We let B be the continuous functions vanishing at infinity. We let C be the class of Schwartz functions, which is the class of C^∞ functions all of whose kth partial derivatives go to zero faster than |x|^{−m} as |x| → ∞ for every k = 0, 1, . . . and every m = 1, 2, . . .; see Section B.2.
First we show that L maps C into B, so that the domain of L contains C, and hence is dense in B. Given M > 1 and f ∈ C, by Taylor's theorem
$$
\begin{aligned}
|Lf(x)| &\le \int |f(x+h) - f(x) - 1_{(|h|\le 1)} f'(x) h|\, n(dh)\\
&\le \Big(\sup_{|y-x|\le 1} |f''(y)|\Big) \int_{0<|h|\le 1} h^2\, n(dh) + 2\Big(\sup_{|y-x|\le M} |f(y)|\Big) \int_{1<|h|\le M} n(dh) + 2\int_{|h|>M} \|f\|_\infty\, n(dh).
\end{aligned}
\tag{37.13}
$$
This shows |Lf(x)| is finite. Given ε > 0 and f ∈ C not identically zero, choose M large so that ∫_{|h|>M} n(dh) < ε/‖f‖∞. The first two terms on the right-hand side of (37.13) tend to zero as |x| → ∞, so lim sup_{|x|→∞} |Lf(x)| ≤ 2ε. Since ε is arbitrary, Lf vanishes at infinity. Moreover, Lf is continuous by dominated convergence, because the integrand in the definition of Lf is continuous in x and bounded by c(h² ∧ 1). Hence L : C → B.
To show L is dissipative, let f ∈ C. There is nothing to prove if ‖f‖ = 0, so assume ‖f‖ > 0. Because f is continuous and vanishes at infinity, there is a point x0 with |f(x0)| = ‖f‖. By looking at −f if necessary, we may suppose f(x0) > 0. Since x0 is the location of a maximum of f, f′(x0) = 0 and f(x0 + h) − f(x0) ≤ 0 for each h, hence Lf(x0) ≤ 0. Then
λ‖ f ‖ = λ f (x0) ≤ (λ − L) f (x0) ≤ ‖(λ − L) f ‖.
Taking limits, this holds for every f in the domain of L.
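The dissipativity argument can be seen concretely in a small example. The Python sketch below uses a hypothetical Lévy measure n with three atoms. This is a finite measure, so it certainly satisfies ∫(h² ∧ 1) n(dh) < ∞. It takes f(x) = e^{−x²}, which attains ‖f‖ = 1 at x0 = 0. The sketch evaluates Lf directly from the definition, checks that Lf(x0) ≤ 0, and checks that λ‖f‖ ≤ ‖(λ − L)f‖ on a grid. The atoms, weights, λ = 1, and grid are illustrative choices only.

```python
import math

# Hypothetical Lévy measure: atoms h with weights w (a finite measure on R \ {0}).
ATOMS = [(-0.5, 1.0), (0.3, 2.0), (2.0, 0.5)]

def f(x):
    return math.exp(-x * x)

def f_prime(x):
    return -2.0 * x * math.exp(-x * x)

def levy_generator(x):
    """Lf(x) = ∫ [f(x+h) - f(x) - 1(|h|<=1) f'(x) h] n(dh) for the discrete n above."""
    total = 0.0
    for h, w in ATOMS:
        compensator = f_prime(x) * h if abs(h) <= 1 else 0.0
        total += w * (f(x + h) - f(x) - compensator)
    return total

lam = 1.0
x0 = 0.0                                   # f(x0) = ‖f‖ = 1 and f'(x0) = 0
grid = [k / 100.0 for k in range(-1000, 1001)]
Lf_x0 = levy_generator(x0)
sup_norm = max(abs(lam * f(x) - levy_generator(x)) for x in grid)  # grid value of ‖(λ - L) f‖

print(Lf_x0 <= 0)                # True: every term f(x0 + h) - f(x0) is <= 0
print(lam * f(x0) <= sup_norm)   # True: λ‖f‖ <= ‖(λ - L) f‖
```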
Finally we need to show that the range of λ−L is dense in B. This is the most complicated
part and we break the argument into steps.
Step 1. We start by computing the Fourier transform of L f if f ∈ C. Let nδ(dh) =
1(|h|≥δ)n(dh) and let
$$L_\delta f(x) = \int [f(x+h) - f(x) - 1_{(|h|\le 1)} f'(x) h]\, n_\delta(dh).$$
Then n_δ is a finite measure. We use the convention f̂(u) = ∫ e^{iux} f(x) dx. By the Fubini theorem, together with the facts that the Fourier transform of x ↦ f(x + h) is e^{−iuh} f̂(u) and the Fourier transform of f′ is −iu f̂(u), we have
$$
\begin{aligned}
\widehat{L_\delta f}(u) &= \iint [e^{iux} f(x+h) - e^{iux} f(x) - 1_{(|h|\le 1)} e^{iux} f'(x) h]\, dx\, n_\delta(dh)\\
&= \hat f(u) \int [e^{-iuh} - 1 + 1_{(|h|\le 1)} iuh]\, n_\delta(dh)\\
&= \hat f(u) \int [e^{-iuh} - 1 + 1_{(|h|\le 1)} iuh]\, 1_{(|h|\ge\delta)}\, n(dh).
\end{aligned}
\tag{37.14}
$$
The expression in brackets on the last line is bounded by c(h² ∧ 1), with c depending on u. By dominated convergence, the last line converges to f̂(u)ψ(u) as δ → 0, where
$$\psi(u) = \int [e^{-iuh} - 1 + 1_{(|h|\le 1)} iuh]\, n(dh). \tag{37.15}$$
Since
$$|\widehat{Lf}(u) - \widehat{L_\delta f}(u)| = \Big| \int e^{iux} \int_{|h|<\delta} [f(x+h) - f(x) - 1_{(|h|\le 1)} f'(x) h]\, n(dh)\, dx \Big| \le \int \Big(\sup_{|y-x|<\delta} |f''(y)|\Big) dx \int_{|h|<\delta} h^2\, n(dh),$$
which tends to zero as δ → 0 (the x-integral is finite because f ∈ C, and ∫_{|h|<δ} h² n(dh) → 0), we conclude
$$\widehat{Lf}(u) = \hat f(u) \psi(u). \tag{37.16}$$
Step 2. Now let g ∈ C and ε > 0. Choose K > 1 such that ∫_{|h|≥K} n(dh) < ε, let m_K(dh) = 1_{(|h|≤K)} n(dh), and define L_K and ψ_K in terms of m_K. We show there exists f ∈ C such that (λ − L_K) f = g. We have
$$\psi_K(u) = \int_{|h|\le K} [e^{-iuh} - 1 + iuh 1_{(|h|\le 1)}]\, n(dh),$$
so using dominated convergence,
$$\psi_K'(u) = \int_{|h|\le K} [-ih e^{-iuh} + ih 1_{(|h|\le 1)}]\, n(dh), \qquad \psi_K''(u) = \int_{|h|\le K} [-h^2 e^{-iuh}]\, n(dh),$$
with similar formulas for the higher derivatives. Thus ψ_K grows at most quadratically, ψ_K′ grows at most linearly, and all derivatives of ψ_K of order two and higher are bounded. Moreover, the real part of ψ_K(u) is ∫_{|h|≤K} [cos(uh) − 1] n(dh), which is less than or equal to 0, so |λ − ψ_K(u)| ≥ λ. Since g ∈ C, by Section B.2, ĝ ∈ C. If we define f by
$$\hat f(u) = \frac{1}{\lambda - \psi_K(u)}\, \hat g(u), \tag{37.17}$$
we see that f̂ and all its derivatives are continuous and tend to zero faster than |u|^{−m} for every m. Hence f̂ ∈ C, which implies f ∈ C by Section B.2. Notice that (λ − L_K) f = g because
$$\lambda \hat f(u) - \widehat{L_K f}(u) = \frac{\lambda - \psi_K(u)}{\lambda - \psi_K(u)}\, \hat g(u) = \hat g(u).$$
Step 3. We prove that ‖Lf − L_K f‖ ≤ cε‖ĝ‖_{L¹}, where the constants c may depend on λ. Since g ∈ C, ĝ ∈ L¹. From (37.17) we have |f̂(u)| ≤ |ĝ(u)|/λ. Then ‖f‖∞ ≤ c‖f̂‖_{L¹} ≤ c‖ĝ‖_{L¹} and
$$|Lf(x) - L_K f(x)| \le \int_{|h|>K} |f(x+h) - f(x)|\, n(dh) \le 2\|f\|_\infty \int_{|h|\ge K} n(dh) \le c\varepsilon \|\hat g\|_{L^1}.$$
Step 4. We complete the proof that the range of λ − L is dense in B. Since ‖Lf − L_K f‖ ≤ cε‖ĝ‖_{L¹} by Step 3 and (λ − L_K) f = g, we have ‖(λ − L) f − g‖ ≤ cε‖ĝ‖_{L¹}. Because f ∈ C ⊂ D(L) and ε is arbitrary, every g ∈ C lies in the closure of the range of λ − L. Since C is dense in B, the range of λ − L is dense in B.
We thus have L satisfying all the hypotheses of the Hille–Yosida theorem, and hence there exists a semigroup Pt mapping B into B. We again note that L1 = 0, hence Pt1 = 1 for all t, and so by Exercise 37.1, the Pt are non-negative operators.
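The identity (37.16) and the bound used in Step 2 can be checked numerically. The Python sketch below uses the convention f̂(u) = ∫ e^{iux} f(x) dx that appears in (37.14). It takes a hypothetical three-atom measure as a stand-in for n, and f(x) = e^{−x²}, whose transform is √π e^{−u²/4}. The transform of Lf is approximated by the trapezoid rule on [−15, 15] with 600 panels; this truncation and step size are our assumptions, chosen so that the error is negligible for a Gaussian. The sketch compares the result with f̂(u)ψ(u), and also checks that Re ψ ≤ 0 and |λ − ψ(u)| ≥ λ on a grid of u.

```python
import cmath
import math

# Hypothetical discrete Lévy measure: atoms h with weights w.
ATOMS = [(-0.5, 1.0), (0.3, 2.0), (2.0, 0.5)]

def f(x):
    return math.exp(-x * x)

def f_prime(x):
    return -2.0 * x * math.exp(-x * x)

def f_hat(u):
    """∫ e^{iux} e^{-x^2} dx = sqrt(pi) e^{-u^2/4}."""
    return math.sqrt(math.pi) * math.exp(-u * u / 4.0)

def Lf(x):
    """Lf(x) = ∫ [f(x+h) - f(x) - 1(|h|<=1) f'(x) h] n(dh)."""
    return sum(w * (f(x + h) - f(x) - (f_prime(x) * h if abs(h) <= 1 else 0.0))
               for h, w in ATOMS)

def psi(u):
    """Symbol (37.15): ψ(u) = ∫ [e^{-iuh} - 1 + 1(|h|<=1) iuh] n(dh)."""
    return sum(w * (cmath.exp(-1j * u * h) - 1 + (1j * u * h if abs(h) <= 1 else 0))
               for h, w in ATOMS)

def fourier(g, u, a=-15.0, b=15.0, n=600):
    """Trapezoid approximation of ∫ e^{iux} g(x) dx over [a, b]."""
    dx = (b - a) / n
    total = 0j
    for k in range(n + 1):
        x = a + k * dx
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * cmath.exp(1j * u * x) * g(x)
    return total * dx

u, lam = 0.7, 1.0
err = abs(fourier(Lf, u) - f_hat(u) * psi(u))        # size of the gap in (37.16)
us = [k / 10.0 for k in range(-100, 101)]
re_ok = all(psi(v).real <= 0 for v in us)             # Re ψ <= 0
gap_ok = all(abs(lam - psi(v)) >= lam - 1e-12 for v in us)  # |λ - ψ| >= λ
print(err, re_ok, gap_ok)
```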
Exercises
37.1 Let B be either the space L² with respect to a finite measure or else the continuous functions vanishing at infinity for some locally compact separable metric space S. In the former case, we say f ≥ 0 if f(x) ≥ 0 for almost every x; in the latter case, if f(x) ≥ 0 for all x. A semigroup is non-negative if f ≥ 0 implies Pt f ≥ 0 for all t ≥ 0. Suppose that Pt is a semigroup, the space B contains the constant functions, and Pt1 = 1 for all t. Show that Pt is a contraction if and only if Pt is non-negative.
37.2 Show that Pt and Rλ commute and that
$$P_t R_\lambda f = \int_0^\infty e^{-\lambda s} P_{s+t} f\, ds.$$
Show that for any a < b we have
$$\Big\| \int_a^b e^{-\lambda t} P_t f\, dt \Big\| \le \int_a^b e^{-\lambda t} \|P_t f\|\, dt.$$
Hint: Approximate Rλ f by a Riemann sum.
37.3 Show that if Pt is a contraction semigroup and Rλ is the resolvent, then
$$\|R_\lambda\| \le 1/\lambda. \tag{37.18}$$
37.4 Show that if A is a bounded operator and T_t = e^{tA}, then T_t is a strongly continuous semigroup of operators with infinitesimal generator A. (We cannot assert that the T_t are contractions.)
37.5 Prove that if L is dissipative, the domain of L is dense in B, and the range of λ − L is dense in B, then L̄ defined in (37.12) is a closed extension of L that is dissipative and the range of λ − L̄ is equal to B. Show there is only one such closed extension of L.
37.6 If the range of λ − L equals B for a single value of λ, then the range of λ − L equals B for every value of λ. Hint: Define Rλ as the inverse of λ − L, then use (37.3) to define Ra for other values of a.
37.7 Let (Xt, Px) be a Markov process with transition probabilities given by Pt f(x) = f(x + t). Determine L and D(L).
37.8 Let Pt be a strongly continuous semigroup of contraction operators and let L be the infinitesimal generator. Show that D(L^n) is dense in B for every positive integer n.
37.9 This is a continuation of Exercise 36.5. Prove that if f ∈ C² with compact support, Pt is the semigroup given in Exercise 36.5, and L is the infinitesimal generator, then the Fourier transform of Lf is f̂(u)ψ(u).
37.10 Suppose that Pt is a strongly continuous semigroup, but not necessarily of contractions. Thus P_{t+s} = P_t P_s and P_t f → f in norm if f ∈ B, but we do not assume ‖P_t‖ ≤ 1. Prove that there exist constants K, b > 0 such that ‖P_t‖ ≤ K e^{bt} for all t ≥ 0.
Hint: Use the uniform boundedness principle from functional analysis to prove there exists
c, t0 such that ‖Pt‖ ≤ c if t ≤ t0. Then use the semigroup property.

38
Dirichlet forms
When constructing semigroups, it is sometimes easier to start with a bilinear form, called the
Dirichlet form, than to work with the infinitesimal generator, and to construct the semigroup
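from the form. For example, let Δ be the Laplacian. If f, g ∈ C² with compact support, then integration by parts shows
$$\int_{\mathbb{R}^d} f(x) (\tfrac12 \Delta g)(x)\, dx = \frac12 \int_{\mathbb{R}^d} f(x) \sum_{i=1}^d \frac{\partial^2 g}{\partial x_i^2}(x)\, dx = -\frac12 \int_{\mathbb{R}^d} \sum_{i=1}^d \frac{\partial f}{\partial x_i}(x) \frac{\partial g}{\partial x_i}(x)\, dx.$$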
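If we write
$$\mathcal{E}(f,g) = \frac12 \int \sum_{i=1}^d \frac{\partial f}{\partial x_i}(x) \frac{\partial g}{\partial x_i}(x)\, dx,$$
we thus have
$$\int_{\mathbb{R}^d} f (\tfrac12 \Delta g) = -\mathcal{E}(f,g). \tag{38.1}$$
Clearly E(f, g) is symmetric in f and g, so
$$\int_{\mathbb{R}^d} f (\tfrac12 \Delta g) = -\mathcal{E}(f,g) = -\mathcal{E}(g,f) = \int_{\mathbb{R}^d} g (\tfrac12 \Delta f)\, dx.$$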
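Here is a one-dimensional numerical check of (38.1) and of the symmetry of E, written in Python. The sketch replaces compactly supported C² functions with Gaussians, which is an assumption on our part; their tails are negligible on the interval used. Integrals are approximated by the trapezoid rule, and the choices of f, g, and the grid are illustrative.

```python
import math

def f(x):
    return math.exp(-x * x)

def df(x):
    return -2.0 * x * math.exp(-x * x)

def d2f(x):
    return (4.0 * x * x - 2.0) * math.exp(-x * x)

def g(x):
    return math.exp(-(x - 1.0) ** 2)

def dg(x):
    return -2.0 * (x - 1.0) * math.exp(-(x - 1.0) ** 2)

def d2g(x):
    return (4.0 * (x - 1.0) ** 2 - 2.0) * math.exp(-(x - 1.0) ** 2)

def integrate(h, a=-12.0, b=12.0, n=4000):
    """Trapezoid rule for ∫_a^b h(x) dx."""
    dx = (b - a) / n
    total = 0.5 * (h(a) + h(b)) + sum(h(a + k * dx) for k in range(1, n))
    return total * dx

energy_fg = 0.5 * integrate(lambda x: df(x) * dg(x))    # E(f, g) for d = 1
lhs = integrate(lambda x: f(x) * 0.5 * d2g(x))          # ∫ f (½ Δg)
rhs_sym = integrate(lambda x: g(x) * 0.5 * d2f(x))      # ∫ g (½ Δf)

print(abs(lhs + energy_fg), abs(lhs - rhs_sym))          # both should be near zero
```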
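If R_λ is the resolvent for Brownian motion, (38.1) and the fact that ½ΔR_λ f = λR_λ f − f tell us that
$$\mathcal{E}(R_\lambda f, g) + \lambda \int (R_\lambda f) g = -\int (\tfrac12 \Delta R_\lambda f) g + \lambda \int (R_\lambda f) g = -\int (\lambda R_\lambda f - f) g + \lambda \int (R_\lambda f) g = \int f g. \tag{38.2}$$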
The bilinear form E(f, g) makes sense even if f, g are only in C¹ with compact support, which is one major advantage of the Dirichlet form. Since E is symmetric and linear in each variable, we have
$$\mathcal{E}(f,g) = \tfrac12 [\mathcal{E}(f+g, f+g) - \mathcal{E}(f,f) - \mathcal{E}(g,g)],$$
so to specify the Dirichlet form, it is only necessary to know E ( f , f ), a number, rather than
L f , a function. One disadvantage of Dirichlet forms is that one needs a self-adjoint operator,
and not every infinitesimal generator is self-adjoint. Another disadvantage is that when
working with Dirichlet forms, L2 is the natural space to work with, which means there are
null sets one has to worry about. In particular, the construction of Chapter 36 is not directly
applicable, because there we required our Banach space to be the set of continuous functions
vanishing at infinity. (Modifications of the methods in Chapter 36 do work, however.)
38.1 Framework
Let us now suppose S is a locally compact separable metric space together with a σ-finite
measure m defined on the Borel subsets of S . We want to give a definition of the Dirichlet
form in this more general context. We suppose there exists a dense subset D = D(E ) of
L²(S, m) and a non-negative bilinear symmetric form E defined on D × D, which means
$$\mathcal{E}(f,g) = \mathcal{E}(g,f), \quad \mathcal{E}(f+g,h) = \mathcal{E}(f,h) + \mathcal{E}(g,h), \quad \mathcal{E}(af,g) = a\,\mathcal{E}(f,g), \quad \mathcal{E}(f,f) \ge 0$$
for f, g, h ∈ D, a ∈ R.
We will frequently write ⟨f, g⟩ for ∫ f(x)g(x) m(dx). For a > 0 define
$$\mathcal{E}_a(f,g) = \mathcal{E}(f,g) + a\langle f, g\rangle;$$
in particular, E_a(f, f) = E(f, f) + a⟨f, f⟩.
We can define a norm on D using the inner product E_a: the norm of f equals (E_a(f, f))^{1/2}, and we call this the norm induced by E_a. Since a⟨f, f⟩ ≤ E_a(f, f),
$$\mathcal{E}_a(f,f) \le \mathcal{E}_b(f,f) = \mathcal{E}_a(f,f) + (b-a)\langle f, f\rangle \le \Big(1 + \frac{b-a}{a}\Big) \mathcal{E}_a(f,f)$$
if a < b, so the norms induced by different a's are all equivalent.
We say E is closed if D is complete with respect to the norm induced by E_a for some a. Equivalently, E is closed if, whenever u_n ∈ D satisfies E_1(u_n − u_m, u_n − u_m) → 0 as n, m → ∞, there exists u ∈ D such that E_1(u_n − u, u_n − u) → 0 as n → ∞. We say E is Markovian if, whenever u ∈ D, v = 0 ∨ (u ∧ 1) ∈ D and E(v, v) ≤ E(u, u). (A slightly weaker definition of Markovian is sometimes used.) A Dirichlet form is a non-negative bilinear symmetric form that is closed and Markovian.
Absorbing Brownian motion on [0, ∞) is a symmetric process. The corresponding Dirichlet form is
$$\mathcal{E}(f,f) = \frac12 \int_0^\infty |f'(x)|^2\, dx,$$
and the appropriate domain turns out to be the completion of the set of C¹ functions with compact support contained in (0, ∞) with respect to the norm induced by E_1. In particular, any function with compact support contained in (0, ∞) will be zero in a neighborhood of 0. In a domain D in higher dimensions, the Dirichlet form for absorbing Brownian motion becomes
$$\mathcal{E}(f,f) = \frac12 \int |\nabla f(x)|^2\, dx, \tag{38.3}$$
with the domain of E being the completion with respect to E_1 of the C¹ functions whose support is contained in the interior of D.
Reflecting Brownian motion is also a symmetric process. For a domain D, the Dirichlet form is given by (38.3), and the domain D(E) of the form is the completion, with respect to the norm induced by E_1, of the C¹ functions on D̄ with compact support, where D̄ is the closure of D. One might expect some restriction on the normal derivative ∂f/∂n on the boundary of D, but in fact there is no such restriction. To examine this further, consider the case of D = (0, ∞). Take the class of functions f which are C¹ with compact support and with f′(0) = 0, and take its closure with respect to the norm induced by E_1. One gets the same class as D(E); this is Exercise 38.1.
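For intuition about the Markovian property, here is a toy example that is not from the text. Take a finite set with counting measure m and the form E(u, u) = ½ Σ_{x,y} c(x, y)(u(x) − u(y))², where the weights c are symmetric and non-negative. This is a standard example of a Dirichlet form, and closedness is automatic in finite dimensions. The map t ↦ 0 ∨ (t ∧ 1) is a contraction, so each term can only shrink. The Python sketch checks E(v, v) ≤ E(u, u) for v = 0 ∨ (u ∧ 1) on random u. The weights and the helper names are hypothetical.

```python
import random

def energy(u, c):
    """E(u, u) = 1/2 * sum_{x,y} c[x][y] (u[x] - u[y])^2 on a finite set."""
    n = len(u)
    return 0.5 * sum(c[x][y] * (u[x] - u[y]) ** 2 for x in range(n) for y in range(n))

def unit_contraction(u):
    """v = 0 ∨ (u ∧ 1), applied pointwise."""
    return [max(0.0, min(t, 1.0)) for t in u]

rng = random.Random(1)
n = 6
c = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in range(x + 1, n):
        c[x][y] = c[y][x] = rng.random()   # symmetric, non-negative weights

ok = True
for _ in range(500):
    u = [rng.uniform(-2.0, 3.0) for _ in range(n)]
    v = unit_contraction(u)
    ok = ok and energy(v, c) <= energy(u, c) + 1e-12
print(ok)   # True
```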
One nice consequence of not needing a restriction on the normal derivative in the domain of E for reflecting Brownian motion is that we can define reflecting Brownian motion in any domain, even when the boundary is not smooth enough for the notion of a normal derivative to be defined.
38.2 Construction of the semigroup
We now want to construct the resolvent corresponding to a Dirichlet form. The motivation given in (38.2) shows we should expect
$$\mathcal{E}_a(R_a f, g) = \langle f, g\rangle \tag{38.4}$$
for all a > 0 and all f, g such that R_a f, g ∈ D. Our Banach space B will be L²(S, m).
Theorem 38.1 If E is a Dirichlet form, there exists a family of resolvent operators {Rλ} such
that
(1) the Rλ satisfy the resolvent equation,
(2) ‖λRλ‖ ≤ 1 for all λ > 0,
(3) λRλ f → f as λ → ∞ for f ∈ B, and
(4) Ea(Ra f , g) = 〈 f , g〉 if a > 0, Ra f , g ∈ D.
Moreover, if f ∈ B satisfies 0 ≤ f (x) ≤ 1, m-a.e., then for all a > 0
0 ≤ aRa f ≤ 1, m-a.e. (38.5)
Proof Fix f ∈ B and define a linear functional on B by I(g) = 〈 f , g〉. This functional is
also a bounded linear functional on D with respect to the norm induced by Ea, that is, there
exists c such that |I(g)| ≤ c E_a(g, g)^{1/2}. This follows because
$$|I(g)| = \Big| \int f g \Big| \le \langle f, f\rangle^{1/2} \langle g, g\rangle^{1/2} \le \langle f, f\rangle^{1/2} \Big(\frac1a \mathcal{E}_a(g,g)\Big)^{1/2}$$
by the Cauchy–Schwarz inequality. Since E is closed, D is a Hilbert space with respect to
the norm induced by Ea. By the Riesz representation theorem for Hilbert spaces (see, e.g.,
Folland (1999), Theorem 5.25), there exists a unique element u ∈ D such that I(g) = Ea(u, g)
for all g ∈ D. We set Ra f = u. In particular, (38.4) holds, and Ra f ∈ D.

We show the resolvent equation holds. If g ∈ D,
Ea(Ra f − Rb f , g) = Ea(Ra f , g) − E (Rb f , g) − a〈Rb f , g〉
= 〈 f , g〉 − E (Rb f , g) − b〈Rb f , g〉 + (b − a)〈Rb f , g〉
= 〈 f , g〉 − Eb(Rb f , g) + (b − a)〈Rb f , g〉
= (b − a)〈Rb f , g〉
= Ea((b − a)RaRb f , g).
Since this holds for all g ∈ D and D is dense in B, then Ra f − Rb f = (b − a)RaRb f .
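A finite toy setting, which is hypothetical and not from the text, makes the construction concrete. There E(u, v) = ⟨Ku, v⟩, where (Ku)(x) = Σ_y c(x, y)(u(x) − u(y)), so the defining relation E_a(R_a f, g) = ⟨f, g⟩ for all g becomes the linear system (aI + K)R_a f = f. The Python sketch solves this system by Gaussian elimination and checks (38.4) for one g. It also checks the resolvent equation R_a f − R_b f = (b − a)R_aR_b f numerically. The weights, a, b, and helper names are illustrative.

```python
import random

def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= factor * A[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (A[i][n] - sum(A[i][k] * z[k] for k in range(i + 1, n))) / A[i][i]
    return z

def resolvent(c, a, f):
    """R_a f: the solution u of (aI + K) u = f, where (K u)(x) = sum_y c(x,y)(u(x) - u(y))."""
    n = len(f)
    M = [[(a + sum(c[x]) if x == y else 0.0) - c[x][y] for y in range(n)] for x in range(n)]
    return solve(M, f)

def form(c, u, v):
    """E(u, v) = 1/2 sum_{x,y} c(x,y) (u(x) - u(y)) (v(x) - v(y))."""
    n = len(u)
    return 0.5 * sum(c[x][y] * (u[x] - u[y]) * (v[x] - v[y]) for x in range(n) for y in range(n))

rng = random.Random(2)
n, a, b = 5, 0.5, 2.0
c = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in range(x + 1, n):
        c[x][y] = c[y][x] = rng.random()       # hypothetical symmetric weights
f = [rng.uniform(-1.0, 1.0) for _ in range(n)]
g = [rng.uniform(-1.0, 1.0) for _ in range(n)]

u = resolvent(c, a, f)
defect = abs(form(c, u, g) + a * sum(p * q for p, q in zip(u, g)) - sum(p * q for p, q in zip(f, g)))
lhs = [p - q for p, q in zip(u, resolvent(c, b, f))]
rhs = [(b - a) * v for v in resolvent(c, a, resolvent(c, b, f))]
gap = max(abs(p - q) for p, q in zip(lhs, rhs))
print(defect, gap)   # both should be at rounding level
```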
Next we show that ‖aRa f ‖ ≤ ‖ f ‖, or equivalently,
〈aRa f , aRa f 〉 ≤ 〈 f , f 〉. (38.6)
If 〈Ra f , Ra f 〉 is zero, then (38.6) trivially holds, so suppose it is positive. We have
a〈Ra f , Ra f 〉 ≤ Ea(Ra f , Ra f ) = 〈 f , Ra f 〉 ≤ 〈 f , f 〉1/2〈Ra f , Ra f 〉1/2
by (38.4) and the Cauchy–Schwarz inequality. If we now divide both sides by 〈Ra f , Ra f 〉1/2
and then square both sides, we obtain (38.6).
We show that bRb f → f as b → ∞ when f ∈ B. If f ∈ D, then by the Cauchy–Schwarz
inequality and (38.6)
〈bRb f , f 〉 ≤ 〈bRb f , bRb f 〉1/2〈 f , f 〉1/2
≤ 〈 f , f 〉.
Using this,
b〈bRb f − f , bRb f − f 〉 ≤ Eb(bRb f − f , bRb f − f )
= b2Eb(Rb f , Rb f ) − 2bEb(Rb f , f ) + Eb( f , f )
= b2〈Rb f , f 〉 − 2b〈 f , f 〉 + E ( f , f ) + b〈 f , f 〉
≤ E ( f , f ).
Now divide both sides by b to get ‖bRb f − f ‖2 ≤ E ( f , f )/b → 0 as b → ∞. Since D is
dense in B and ‖bRb‖ ≤ 1 for all b, we conclude bRb f → f for all f ∈ B.
It remains to show 0 ≤ bRb f ≤ 1, m-a.e., if 0 ≤ f ≤ 1, m-a.e. Fix f ∈ B with 0 ≤ f ≤ 1,
m-a.e., and let a > 0. Define a functional ψ on D by
$$\psi(v) = \mathcal{E}(v,v) + a\Big\langle v - \frac{f}{a}, v - \frac{f}{a}\Big\rangle.$$
We claim
ψ(Ra f ) + Ea(Ra f − v, Ra f − v) = ψ(v), v ∈ D. (38.7)

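To see this, start with the left-hand side. Expanding and using (38.4), we find that it is equal to
$$
\begin{aligned}
&\mathcal{E}(R_a f, R_a f) + a\Big\langle R_a f - \frac1a f, R_a f - \frac1a f\Big\rangle + \mathcal{E}_a(R_a f - v, R_a f - v)\\
&\quad = \mathcal{E}_a(R_a f, R_a f) - 2\langle R_a f, f\rangle + \frac1a \langle f, f\rangle + \mathcal{E}_a(R_a f, R_a f) - 2\mathcal{E}_a(R_a f, v) + \mathcal{E}_a(v, v)\\
&\quad = \frac1a \langle f, f\rangle - 2\langle f, v\rangle + \mathcal{E}(v,v) + a\langle v, v\rangle = \psi(v).
\end{aligned}
$$
It follows from (38.7) and the fact that E_a(g, g) is non-negative for any g ∈ D that R_a f is the function that minimizes ψ.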
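Set φ(x) = 0 ∨ (x ∧ (1/a)) and let w = φ(R_a f). Observe that |φ(t) − s| ≤ |t − s| for t ∈ R and s ∈ [0, 1/a]. Applying this with t = R_a f(x) and s = f(x)/a, which lies in [0, 1/a] for m-almost every x, gives
$$\Big| w(x) - \frac{f(x)}{a} \Big| \le \Big| R_a f(x) - \frac{f(x)}{a} \Big|,$$
and therefore
$$\Big\langle w - \frac{f}{a}, w - \frac{f}{a}\Big\rangle \le \Big\langle R_a f - \frac{f}{a}, R_a f - \frac{f}{a}\Big\rangle. \tag{38.8}$$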
Since aw = 0 ∨ ((aR_a f) ∧ 1) and E is Markovian, we have w ∈ D and
$$\mathcal{E}(w, w) = \frac{1}{a^2} \mathcal{E}(aw, aw) \le \frac{1}{a^2} \mathcal{E}(aR_a f, aR_a f) = \mathcal{E}(R_a f, R_a f). \tag{38.9}$$
Adding (38.9) and a times (38.8), we conclude ψ(w) ≤ ψ(R_a f). By (38.7) with v = w, E_a(R_a f − w, R_a f − w) = ψ(w) − ψ(R_a f) ≤ 0, so w = R_a f, m-a.e. But 0 ≤ w ≤ 1/a, and hence aR_a f takes values in [0, 1], m-a.e.
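Continuing the hypothetical finite toy model, where R_a f solves (aI + K)u = f with (Ku)(x) = Σ_y c(x, y)(u(x) − u(y)), the Python sketch below checks (38.5): if 0 ≤ f ≤ 1, then 0 ≤ aR_a f ≤ 1. It also checks the contraction bound (38.6). In this model, (38.5) reflects the fact that (aI + K)^{−1} has non-negative entries and maps the constant 1/a to 1/a. The weights, the value of a, and the sample sizes are illustrative.

```python
import random

def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= factor * A[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (A[i][n] - sum(A[i][k] * z[k] for k in range(i + 1, n))) / A[i][i]
    return z

def resolvent(c, a, f):
    """R_a f for the toy form: solves (aI + K) u = f."""
    n = len(f)
    M = [[(a + sum(c[x]) if x == y else 0.0) - c[x][y] for y in range(n)] for x in range(n)]
    return solve(M, f)

rng = random.Random(5)
n, a = 6, 0.3
c = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in range(x + 1, n):
        c[x][y] = c[y][x] = rng.random()     # hypothetical symmetric weights

bounds_ok, contraction_ok = True, True
for _ in range(200):
    f = [rng.random() for _ in range(n)]           # 0 <= f <= 1
    af = [a * v for v in resolvent(c, a, f)]       # a R_a f
    bounds_ok = bounds_ok and all(-1e-12 <= v <= 1 + 1e-12 for v in af)          # (38.5)
    contraction_ok = contraction_ok and sum(v * v for v in af) <= sum(v * v for v in f) + 1e-12  # (38.6)
print(bounds_ok, contraction_ok)   # True True
```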
If we combine Proposition 37.9 and Theorem 38.1, we obtain a semigroup Pt whose
resolvent satisfies (38.4). We would like to know that the analog of (38.5) holds for Pt .
Corollary 38.2 If 0 ≤ f ≤ 1, m-a.e., then 0 ≤ Pt f ≤ 1, m-a.e.
Proof If 0 ≤ f ≤ 1, m-a.e., then 0 ≤ bR_b f ≤ 1, m-a.e., by Theorem 38.1, and iterating, 0 ≤ (bR_b)^i f ≤ 1, m-a.e., for every i. Using the notation of the proof of Proposition 37.9,
$$Q^b_t f(x) = e^{-bt} \sum_{i=0}^\infty \frac{(bt)^i}{i!} (bR_b)^i f(x),$$
which will be non-negative, m-a.e., and bounded by $e^{-bt} \sum_{i=0}^\infty (bt)^i / i! = 1$, m-a.e. Passing to the limit as b → ∞, we see that P_t f takes values in [0, 1], m-a.e.
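The proof of Corollary 38.2 can be followed numerically in the same hypothetical finite toy model. The Python sketch evaluates Q^b_t f = e^{−bt} Σ_i (bt)^i (bR_b)^i f / i! by truncating the series well past i ≈ bt; this truncation rule is our own choice. It confirms that the values stay in [0, 1] when 0 ≤ f ≤ 1 and that Q^b_t 1 = 1 up to rounding. It also prints how much the result changes when b goes from 50 to 200, as a rough look at the limit b → ∞. That change is printed but not asserted.

```python
import math
import random

def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= factor * A[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (A[i][n] - sum(A[i][k] * z[k] for k in range(i + 1, n))) / A[i][i]
    return z

def b_times_resolvent(c, b, f):
    """b R_b f, where R_b f solves (bI + K) u = f and (K u)(x) = sum_y c(x,y)(u(x) - u(y))."""
    n = len(f)
    M = [[(b + sum(c[x]) if x == y else 0.0) - c[x][y] for y in range(n)] for x in range(n)]
    return [b * v for v in solve(M, f)]

def yosida_semigroup(c, b, t, f):
    """Truncated series Q^b_t f = e^{-bt} sum_i (bt)^i (bR_b)^i f / i!."""
    term = [math.exp(-b * t) * v for v in f]            # i = 0 term
    total = term[:]
    n_terms = int(b * t + 10 * math.sqrt(b * t) + 20)   # far into the Poisson tail
    for i in range(1, n_terms + 1):
        term = [(b * t / i) * v for v in b_times_resolvent(c, b, term)]
        total = [s + v for s, v in zip(total, term)]
    return total

rng = random.Random(6)
n, t = 5, 0.5
c = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in range(x + 1, n):
        c[x][y] = c[y][x] = 0.5 * rng.random()      # hypothetical symmetric weights

samples = [[rng.random() for _ in range(n)] for _ in range(10)]   # 0 <= f <= 1
values = [yosida_semigroup(c, 50.0, t, f) for f in samples]
ones = yosida_semigroup(c, 50.0, t, [1.0] * n)
change = max(abs(p - q) for p, q in zip(values[0], yosida_semigroup(c, 200.0, t, samples[0])))
print(change)
```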
When it comes to using the semigroup Pt derived from a Dirichlet form to construct a
Markov process X , there is a difficulty that we did not have before. Since Pt is constructed
using an L2 procedure, Pt f is defined only up to almost everywhere equivalence. Without
some continuity properties of Pt f for enough f ’s, we must neglect some null sets. If the only
null sets we could work with were sets of m-measure 0, we would be in trouble. For example,
when S is the plane and m is two-dimensional Lebesgue measure, the x axis has measure zero, but a continuous process will (in general) hit the x axis. Fortunately there is a notion of sets of capacity zero. Every set of capacity zero has m-measure zero, but not conversely, so these form a smaller class of null sets. It is possible to construct a process X starting from all points x in S except for those in a set N
of capacity zero and to show that starting from any point not in N , the process never hits N .
There is another difficulty when working with Dirichlet forms. In general, one must look
at S̃ , a certain compactification of S , which is a compact set containing S . Even when our
state space is a domain in R^d, S̃ is not necessarily equal to S̄, the Euclidean closure of S, and
one must work with S̃ instead of S . It can be shown that this problem will not occur if the
Dirichlet form is regular. Let CK be the set of continuous functions with compact support. A
Dirichlet form E is regular if D ∩ CK is dense in D with respect to the norm induced by E1
and D ∩ CK also is dense in CK with respect to the supremum norm.
38.3 Divergence form elliptic operators
We want to show how to construct the Markov process corresponding to the operator
$$Lf(x) = \sum_{i,j=1}^d \frac{\partial}{\partial x_i}\Big( a_{ij}(\cdot) \frac{\partial f}{\partial x_j}(\cdot) \Big)(x). \tag{38.10}$$
If the ai j’s are smooth in x, this can be interpreted as first calculating the partial derivative of f
with respect to x j, multiplying the result by ai j(x), taking the partial derivative of the product
with respect to xi, and then summing over i and j. If, however, the ai j’s are only bounded
and measurable, one cannot even in general give any nontrivial examples of functions in
the domain of L. Here is where Dirichlet forms are the perfect tool. Operators of the form
(38.10) are known as elliptic operators in divergence form or in variational form, and the
study of their properties has a long history in PDE.
We assume ai j(x) = aji(x) for each i and j and each x. We suppose the ai j(x) are
measurable functions and are uniformly bounded in x for each i and j. We also require
uniform ellipticity: there exists Λ > 0 such that
$$\sum_{i,j=1}^d a_{ij}(x) y_i y_j \ge \Lambda \sum_{i=1}^d y_i^2, \qquad (y_1, \dots, y_d) \in \mathbb{R}^d,\ x \in \mathbb{R}^d.$$
Just as in the nondivergence elliptic operator case, the matrix whose (i, j)th element is ai j(x)
is positive definite, uniformly in x.
We will shortly define a Dirichlet form, but let us first specify a domain. Let C¹_K be the collection of C¹ functions with compact support, and define H¹ to be the completion of C¹_K with respect to the norm
$$\|f\|_{H^1} = \Big( \int (|f(x)|^2 + |\nabla f(x)|^2)\, dx \Big)^{1/2}. \tag{38.11}$$
One can show that H¹ with this norm is a Hilbert space; this is Exercise 38.2.
Now for f ∈ C¹_K define
$$\mathcal{E}(f,f) = \int_{\mathbb{R}^d} \sum_{i,j=1}^d a_{ij}(x) \frac{\partial f}{\partial x_i}(x) \frac{\partial f}{\partial x_j}(x)\, dx. \tag{38.12}$$
We can use the fact that C¹_K is dense in H¹ to extend the definition of E to all of H¹ × H¹. The connection with the operator L is that when the a_ij are smooth, integration by parts yields
$$\int (Lf) g\, dx = -\mathcal{E}(f, g)$$
if g is C¹ with compact support; cf. (38.1).
Because of the boundedness and uniform ellipticity, there exist positive constants c1 and c2 not depending on f such that
$$c_1 \int |\nabla f(x)|^2\, dx \le \mathcal{E}(f,f) \le c_2 \int |\nabla f(x)|^2\, dx.$$
Therefore the norm induced by E1 and the norm in H 1 are equivalent. This implies E is
closed. By the definition of H 1, E is regular, and clearly E is symmetric. Thus we need only
to show that E is Markovian.
Let φ(x) = (0 ∨ x) ∧ 1. For each ε > 0, let φ_ε be C^∞ and bounded, agree with φ on [0, 1], and satisfy ‖φ′_ε‖∞ ≤ 1. We also require that φ_ε(x) → φ(x) uniformly in x and φ′_ε(x) → 1_{[0,1]}(x) pointwise as ε → 0. Note ∇φ_ε(f) = φ′_ε(f)∇f, so if f ∈ C¹_K,
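$$\mathcal{E}(\varphi_\varepsilon(f), \varphi_\varepsilon(f)) = \sum_{i,j=1}^d \int (\varphi_\varepsilon'(f(x)))^2 a_{ij}(x) \frac{\partial f}{\partial x_i}(x) \frac{\partial f}{\partial x_j}(x)\, dx. \tag{38.13}$$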
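Since
$$\sum_{i,j=1}^d a_{ij}(x) \frac{\partial f}{\partial x_i}(x) \frac{\partial f}{\partial x_j}(x) \ge \Lambda |\nabla f(x)|^2 \ge 0$$
and |φ′_ε(f(x))| ≤ 1, we see that
$$\mathcal{E}(\varphi_\varepsilon(f), \varphi_\varepsilon(f)) \le \mathcal{E}(f,f).$$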
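A discretized one-dimensional illustration may help with the inequality just obtained. It is our own example, with a hypothetical discontinuous coefficient a taking the values 1/2 and 3. The Python sketch replaces derivatives by forward differences on a uniform grid. Because φ(x) = (0 ∨ x) ∧ 1 is 1-Lipschitz, each difference quotient of φ(f) is at most that of f in absolute value. The discrete energy therefore cannot increase, even though a is merely measurable. The test function, grid, and coefficient are illustrative.

```python
import math

def energy(values, a, dx):
    """Forward-difference analogue of ∫ a(x) f'(x)^2 dx on a uniform grid."""
    return sum(a[k] * ((values[k + 1] - values[k]) / dx) ** 2 * dx
               for k in range(len(values) - 1))

def phi(t):
    """φ(x) = (0 ∨ x) ∧ 1."""
    return min(max(t, 0.0), 1.0)

n, half_width = 2000, 10.0
dx = 2 * half_width / n
xs = [-half_width + k * dx for k in range(n + 1)]
# Hypothetical bounded measurable coefficient with a jump at x = 1; 1/2 <= a <= 3.
a = [3.0 if x > 1.0 else 0.5 for x in xs[:-1]]
f = [2.5 * math.sin(2.0 * x) * math.exp(-x * x / 8.0) for x in xs]
E_f = energy(f, a, dx)
E_phi_f = energy([phi(v) for v in f], a, dx)
print(E_phi_f <= E_f)   # True, since |φ(s) - φ(t)| <= |s - t|
```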
Taking the limit as ε → 0 in (38.13), we obtain
$$\mathcal{E}(\varphi(f), \varphi(f)) \le \mathcal{E}(f,f) < \infty. \tag{38.14}$$
In particular, φ(f) ∈ H¹ = D(E). We now pass to the limit to show that (38.14) holds for all f ∈ H¹, which says that E is Markovian.
We can therefore apply Theorem 38.1 to obtain a semigroup corresponding to the Dirichlet form E. As mentioned earlier, there is potentially a problem, because the semigroup is only defined for points not in a certain null set. However, a famous result of Nash and of De Giorgi shows that the semigroup P_t can be written as
$$P_t f(x) = \int f(y)\, p(t,x,y)\, dy$$
with p(t, x, y) Hölder continuous in x and y; see Bass (1997), Chapter VII, for a presentation of this result. This allows us to take the null set to be empty and to see that our semigroup satisfies the assumptions of Chapter 36. Therefore there exists a strong Markov process having P_t as its semigroup.
Exercises
38.1 Let F_1 = {f ∈ C¹[0, ∞) : f has compact support} and F_2 = {f ∈ F_1 : f′(0) = 0}. Show that the closures of F_1 and F_2 with respect to the norm
$$\Big( \int (|f(x)|^2 + |f'(x)|^2)\, dx \Big)^{1/2}$$
are the same.
38.2 If H¹ is the completion of C¹_K, the C¹ functions on R^d with compact support, relative to the norm given by (38.11), show H¹ is a Hilbert space.
38.3 Show that the resolvent operator R_λ defined in Theorem 38.1 is a symmetric operator, that is, if f, g ∈ B, then ⟨R_λ f, g⟩ = ⟨f, R_λ g⟩.
38.4 Show that if the resolvent operator R_λ is a symmetric operator, then the transition operators P_t are also symmetric: if f, g ∈ B, then ⟨P_t f, g⟩ = ⟨f, P_t g⟩.
38.5 To do the next few exercises, you will have to know some functional analysis, specifically, the spectral theorem for self-adjoint operators. See Lax (2002). Let E be a Dirichlet form with domain D(E) and let L be the infinitesimal generator of the semigroup P_t that corresponds to E. Let E(dλ) be a spectral resolution of the identity for −L. (The operator L is a negative operator, so −L is a positive one.) Then a consequence of the spectral theorem is that
$$P_t f = \int_0^\infty e^{-\lambda t}\, E(d\lambda) f \quad\text{and}\quad R_a f = \int_0^\infty \frac{1}{a + \lambda}\, E(d\lambda) f.$$
Also ⟨f, g⟩ = ∫_0^∞ ⟨E(dλ) f, g⟩. Show that if f, g ∈ D, then
$$\mathcal{E}(f, g) = \int_0^\infty \lambda\, \langle E(d\lambda) f, g\rangle.$$
Hint: First prove it for f = R_a h. Write
$$\mathcal{E}(R_a h, g) = \langle h, g\rangle - a\langle R_a h, g\rangle = \int_0^\infty \Big(1 - \frac{a}{a+\lambda}\Big) \langle E(d\lambda) h, g\rangle = \int_0^\infty \frac{\lambda}{a+\lambda} \langle E(d\lambda) h, g\rangle = \int_0^\infty \lambda\, \langle E(d\lambda)(R_a h), g\rangle.$$
To extend this to all f in the domain of E, use the fact that E is closed.
38.6 If L is the infinitesimal generator of the semigroup associated with the Dirichlet form E, show that D(√−L) = D(E).
38.7 Show that if f ∈ D(E), then aR_a f converges to f with respect to the norm induced by E_1.
38.8 Show that if b > 0, then {R_b f : f ∈ L²} is a dense subset of D(E) with respect to the norm induced by E_1.
38.9 Show that {Pt f : f ∈ L2, t > 0} is a dense subset of D(E ) with respect to the norm induced by
E1.

38.10 This exercise shows how to approximate E by forms whose domain is all of B. Let
$$\mathcal{E}^{(t)}(f, g) = \frac1t \langle f - P_t f, g\rangle.$$
Show that if f ∈ D(E), then E^{(t)}(f, f) increases to E(f, f) as t ↓ 0. Show that if f, g ∈ D(E), then E^{(t)}(f, g) converges to E(f, g) as t ↓ 0.
38.11 Show that if u ∈ D(E ), then |u| ∈ D(E ) and E(|u|, |u|) ≤ E(u, u).
Hint: Use Exercise 38.10.
38.12 Use Exercise 38.11 to show that if u ∈ D(E), then E(u⁺, u⁻) ≤ 0.
38.13 Suppose {Pt} are the transition probabilities corresponding to a Dirichlet form E . Suppose there
exist functions p_t(x, y) such that for each t,
$$P_t f(x) = \int p_t(x, y) f(y)\, m(dy)$$
for almost every x. Prove that for almost every pair (x, y) with respect to the product measure
m × m, pt (x, y) = pt (y, x).
38.14 Let f ∈ L2(m) and define the functional
ψ(u) = E(u, u) + λ〈u, u〉 − 2〈 f , u〉
for u in the domain of E . Prove that ψ is minimized by u = Rλ f , and that this function is the
unique minimizer.
38.15 Let Pt be the semigroup associated with a Dirichlet form and define
$$J(dx, dy) = P_t(x, dy)\, m(dx).$$
(1) Prove that if f, g are continuous with compact support, then
$$\iint f(x) g(y)\, J(dx, dy) = \iint g(x) f(y)\, J(dx, dy).$$
(2) With f and g continuous with compact support, prove that
$$\iint f(x) g(y)\, J(dx, dy) = \langle f, P_t g\rangle \quad\text{and}\quad \iint f(x) g(x)\, J(dx, dy) = \langle fg, P_t 1\rangle.$$
(3) Let k(x) = 1 − P_t 1(x). Prove that if E^{(t)} is defined as in Exercise 38.10, then
$$2t\, \mathcal{E}^{(t)}(f, g) = \iint (f(x) - f(y))(g(x) - g(y))\, J(dx, dy) + 2\int f(x) g(x) k(x)\, m(dx).$$
(4) Is E (t) a Dirichlet form? A regular Dirichlet form?

38.16 This is a continuation of the previous exercise. If f is a function on the state space, we say that
g is a normal contraction of f if |g(x)| ≤ | f (x)| for all x and |g(x) − g(y)| ≤ | f (x) − f (y)| for
all x and y. As an example, note that if g(x) = −1 ∨ ( f (x) ∧ 1), then g is a normal contraction
of f . Prove that if f ∈ D(E ), where E is a Dirichlet form and g is a normal contraction of f ,
then for each t > 0,
E (t)(g, g) ≤ E (t)( f , f ) ≤ E( f , f ).
Notes
See Fukushima et al. (1994) for further information.

39
Markov processes and SDEs
One common way of constructing Markov processes is via stochastic differential equations.
Roughly speaking, if there is uniqueness for every starting point, then one can create a
strong Markov process. After proving this, we establish a connection between stochastic
differential equations and partial differential equations, and then we describe what is known
as the martingale problem.
39.1 Markov properties
Let P be a probability and suppose W is a d-dimensional Brownian motion with respect to
P. Consider the SDE
dXt = σ (Xt ) dWt + b(Xt ) dt. (39.1)
Here σ is a d × d matrix-valued function and b is a vector-valued function, both Borel
measurable and bounded. This can be written in terms of components as
$$dX^i_t = \sum_{j=1}^d \sigma_{ij}(X_t)\, dW^j_t + b_i(X_t)\, dt, \qquad i = 1, \dots, d,$$
where W = (W^1, . . . , W^d). Let X^x_t be the solution to (39.1) when X_0 = x. Let P^x be the law of X^x.
Let Ω = C([0, ∞); R^d) be the continuous functions from [0, ∞) to R^d, let F be the σ-field generated by the cylindrical subsets of Ω, and define Z_t(ω) = ω(t). The main result of this section is that if weak existence and weak uniqueness hold for (39.1) for every starting point x, then the solutions (Z_t, P^x) form a strong Markov process.
We begin by considering regular conditional probabilities.
Definition 39.1 Let (Ω, F, P) be a probability space, and let E be a σ-field contained in F. A regular conditional probability for E[ · | E] is a kernel Q(ω, dω′) such that
(1) Q(ω, ·) is a probability measure on (Ω, F) for each ω;
(2) for each A ∈ F, Q(·, A) is a random variable that is measurable with respect to E;
(3) for each A ∈ F and each B ∈ E,
$$\int_B Q(\omega, A)\, P(d\omega) = P(A \cap B).$$
Regular conditional probabilities need not always exist, but if the probability space has
sufficient structure, then they do. We provide a proof in the appendix; see Theorem C.1.
Q(ω, A) can be thought of as P(A | E )(ω), regularized so as to have some joint measurability.
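On a finite probability space, a regular conditional probability can be written down explicitly. Suppose E is generated by a partition into blocks of positive probability. Then Q(ω, A) = P(A ∩ B_ω)/P(B_ω), where B_ω is the block containing ω. The Python sketch below uses a hypothetical four-point space and checks properties (1) to (3) of Definition 39.1 with exact rational arithmetic. The sample space, probabilities, and partition are illustrative only.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite probability space and a sub-σ-field E generated by a partition.
P = {"a": Fraction(1, 8), "b": Fraction(3, 8), "c": Fraction(1, 4), "d": Fraction(1, 4)}
BLOCKS = [frozenset({"a", "b"}), frozenset({"c", "d"})]
OMEGA = frozenset(P)

def prob(A):
    return sum((P[w] for w in A), Fraction(0))

def block_of(w):
    return next(B for B in BLOCKS if w in B)

def Q(w, A):
    """Q(w, A) = P(A ∩ B_w) / P(B_w), with B_w the block containing w."""
    B = block_of(w)
    return prob(A & B) / prob(B)

events = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(OMEGA), r) for r in range(len(OMEGA) + 1))]   # all of F
sub_sigma_field = [frozenset(), BLOCKS[0], BLOCKS[1], OMEGA]            # E

is_probability = all(Q(w, OMEGA) == 1 for w in OMEGA)                   # property (1)
is_E_measurable = all(Q(w, A) == Q(v, A)                                 # property (2)
                      for A in events for B in BLOCKS for w in B for v in B)
disintegrates = all(sum((Q(w, A) * P[w] for w in B), Fraction(0)) == prob(A & B)
                    for A in events for B in sub_sigma_field)           # property (3)
print(is_probability, is_E_measurable, disintegrates)   # True True True
```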
Recall that the definition of minimal augmented filtration for a Markov process was given in
Section 20.1.
Theorem 39.2 Suppose weak existence and weak uniqueness hold for the SDE (39.1) whenever X_0 is a random variable that is in L² and is measurable with respect to F_0. Suppose the matrix σ(y) is invertible for each y. Let (Ω, F, P) be defined as above. Let P^x be the law of the weak solution when X_0 is identically equal to x. Let {F_t} be the minimal augmented filtration generated by Z. Then (P^x, Z_t) is a strong Markov process.
Proof We will prove that if T is a bounded stopping time and f is a bounded and Borel
measurable function on Rd , then
$$E^x[f(Z_{T+t}) \mid \mathcal{F}_T] = E^{Z_T} f(Z_t), \quad \text{a.s.} \tag{39.2}$$
As in Section 20.3, this is sufficient to get the strong Markov property.
Fix x. Let
$$Y_t = Z_t - \int_0^t b(Z_r)\, dr \tag{39.3}$$
and
$$W'_t = \int_0^t \sigma^{-1}(Z_r)\, dY_r. \tag{39.4}$$
Since the P^x law of Z is the same as the P law of X^x, the P^x law of W′ is the same as the P law of W; in other words, W′ is a Brownian motion under P^x. Rearranging (39.3) and (39.4), we have the equation
$$Z_t = Z_0 + \int_0^t \sigma(Z_r)\, dW'_r + \int_0^t b(Z_r)\, dr. \tag{39.5}$$
Let Q be a regular conditional probability for E^x[ · | F_T]. Let Z̃_t = Z_{T+t} and W̃_t = W′_{T+t} − W′_T. Using (39.5) with t replaced by T + t and then with t replaced by T, and taking the difference, we obtain
$$Z_{T+t} - Z_T = \int_T^{T+t} \sigma(Z_r)\, dW'_r + \int_T^{T+t} b(Z_r)\, dr,$$
and hence
$$\tilde Z_t = \tilde Z_0 + \int_0^t \sigma(\tilde Z_r)\, d\tilde W_r + \int_0^t b(\tilde Z_r)\, dr. \tag{39.6}$$
We will show in a moment that W̃ is a Brownian motion with respect to Q(ω, ·) for P^x-almost all ω. Thus, except for ω in a P^x-null set, (39.6) implies that under Q(ω, ·), Z̃ is a solution to (39.1) with starting point Z̃_0 = Z_T(ω). If E^Q denotes the expectation with respect to Q, weak uniqueness tells us that
$$E^Q f(\tilde Z_t) = E^{Z_T} f(Z_t), \quad P^x(d\omega)\text{-a.s.} \tag{39.7}$$
On the other hand,
$$E^Q f(\tilde Z_t) = E^Q f(Z_{T+t}) = E^x[f(Z_{T+t}) \mid \mathcal{F}_T], \quad P^x(d\omega)\text{-a.s.} \tag{39.8}$$
Combining (39.7) and (39.8) proves (39.2).

It remains to prove that under Q the process W̃ is a Brownian motion. Q(ω, ·) is a probability measure on Ω, so t ↦ W̃_t is continuous for every ω′. Let t_1 < · · · < t_n, let u_2, . . . , u_n ∈ R^d, and let
$$N(u_2, \dots, u_n, t_1, \dots, t_n) = \Big\{ \omega : E^Q \exp\Big( i \sum_{j=2}^n u_j \cdot (W'_{T+t_j} - W'_{T+t_{j-1}}) \Big) \ne \exp\Big( -\sum_{j=2}^n |u_j|^2 (t_j - t_{j-1})/2 \Big) \Big\}.$$
By the strong Markov property of the Brownian motion W′ and the definition of Q,
$$
\begin{aligned}
E^Q \exp\Big( i \sum_{j=2}^n u_j \cdot (W'_{T+t_j} - W'_{T+t_{j-1}}) \Big) &= E^x\Big[ \exp\Big( i \sum_{j=2}^n u_j \cdot (W'_{T+t_j} - W'_{T+t_{j-1}}) \Big) \Bigm| \mathcal{F}_T \Big]\\
&= E^{W'_T} \exp\Big( i \sum_{j=2}^n u_j \cdot (W'_{t_j} - W'_{t_{j-1}}) \Big) = \exp\Big( -\sum_{j=2}^n |u_j|^2 (t_j - t_{j-1})/2 \Big),
\end{aligned}
$$
where the second equality holds almost surely, that is, except for a P^x-null set of ω's. This shows that N(u_2, . . . , u_n, t_1, . . . , t_n) is a null set with respect to P^x. Let N be the union of all such N(u_2, . . . , u_n, t_1, . . . , t_n) for n ≥ 2, u_2, . . . , u_n with rational coordinates, and t_1 < · · · < t_n rational. Then N is a P^x-null set. Suppose ω ∉ N. By the continuity of the paths of W′ and dominated convergence,
$$E^Q \exp\Big( i \sum_{j=2}^n u_j \cdot (W'_{T+t_j} - W'_{T+t_{j-1}}) \Big) = \exp\Big( -\sum_{j=2}^n |u_j|^2 (t_j - t_{j-1})/2 \Big)$$
for all t_1, . . . , t_n ∈ [0, ∞) and u_2, . . . , u_n ∈ R^d. Thus the finite-dimensional distributions of W̃ under Q(ω, ·) are those of a Brownian motion. By the continuity of W̃ and Theorem 2.6, W̃ is a Brownian motion under Q, except for a null set of ω's.
By a slight abuse of notation, we will say (X_t, P^x) is a strong Markov family when (Z_t, P^x) is a strong Markov family.
39.2 SDEs and PDEs
The connection between stochastic differential equations and partial differential equations comes about through the following theorem, which is simply an application of Itô's formula. Let L be the operator on functions in C² defined by
$$Lf(x) = \frac12 \sum_{i,j=1}^d a_{ij}(x) \frac{\partial^2 f}{\partial x_i \partial x_j}(x) + \sum_{i=1}^d b_i(x) \frac{\partial f}{\partial x_i}(x). \tag{39.9}$$
Theorem 39.3 Suppose X_t is a solution to (39.1), σ and b are bounded and Borel measurable, and a = σσ^T. Suppose f ∈ C².
Then
$$f(X_t) = f(X_0) + M_t + \int_0^t Lf(X_s)\, ds, \tag{39.10}$$
where
$$M_t = \int_0^t \sum_{i,j=1}^d \frac{\partial f}{\partial x_i}(X_s) \sigma_{ij}(X_s)\, dW^j_s \tag{39.11}$$
is a local martingale.
Proof Since the components of the Brownian motion W_t are independent, we have d⟨W^k, W^ℓ⟩_t = 0 if k ≠ ℓ; see Exercise 9.4. Therefore
$$d\langle X^i, X^j\rangle_t = \sum_k \sum_\ell \sigma_{ik}(X_t) \sigma_{j\ell}(X_t)\, d\langle W^k, W^\ell\rangle_t = \sum_k \sigma_{ik}(X_t) \sigma^T_{kj}(X_t)\, dt = a_{ij}(X_t)\, dt.$$
We now apply Itô's formula:
$$
\begin{aligned}
f(X_t) &= f(X_0) + \sum_i \int_0^t \frac{\partial f}{\partial x_i}(X_s)\, dX^i_s + \frac12 \int_0^t \sum_{i,j} \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s)\, d\langle X^i, X^j\rangle_s\\
&= f(X_0) + M_t + \sum_i \int_0^t \frac{\partial f}{\partial x_i}(X_s) b_i(X_s)\, ds + \frac12 \int_0^t \sum_{i,j} \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) a_{ij}(X_s)\, ds\\
&= f(X_0) + M_t + \int_0^t Lf(X_s)\, ds,
\end{aligned}
$$
and we are finished.
39.3 Martingale problems
In this section we consider operators in nondivergence form, that is, operators of the form given by (39.9). We assume throughout this section that the coefficients a_ij and b_i are bounded and measurable and that a_ij(x) = a_ji(x) for all i, j = 1, . . . , d and all x ∈ R^d. The coefficients a_ij are called the diffusion coefficients and the b_i are called the drift coefficients. We also assume that the operator L is uniformly elliptic, which means that there exists Λ > 0 such that
$$\sum_{i,j=1}^d y_i a_{ij}(x) y_j \ge \Lambda |y|^2, \qquad y \in \mathbb{R}^d,\ x \in \mathbb{R}^d. \tag{39.12}$$
This says that the matrix ai j(x) is positive definite, uniformly in x.
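The operator (39.9) is easy to evaluate numerically. The Python sketch below is our own illustration and does not come from the text. It approximates $Lf(x)$ by central finite differences for user-supplied coefficient functions `a` and `b`. The function name `generator` and the default step `h` are hypothetical choices. One use is to check hand computations. For example, for $f(x) = x_k x_m$ one gets $Lf(x) = a_{km}(x) + x_k b_m(x) + x_m b_k(x)$: the second-order part is $a_{km}$ and the rest comes from the drift.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def generator(f: Callable[[List[float]], float],
              a: Callable[[Point], Sequence[Sequence[float]]],
              b: Callable[[Point], Sequence[float]],
              x: Point, h: float = 1e-4) -> float:
    """Approximate Lf(x) = 1/2 sum_ij a_ij f_ij(x) + sum_i b_i f_i(x), as in (39.9).

    Derivatives are replaced by central finite differences with step h
    (truncation error O(h^2), rounding error roughly O(machine eps / h^2)).
    """
    d = len(x)
    ax, bx = a(x), b(x)

    def f_at(shifts: dict) -> float:
        # Evaluate f at x + sum_k shifts[k] * e_k.
        y = list(x)
        for k, s in shifts.items():
            y[k] += s
        return f(y)

    f0 = f_at({})
    total = 0.0
    for i in range(d):
        # Drift term b_i * df/dx_i via a central difference.
        total += bx[i] * (f_at({i: h}) - f_at({i: -h})) / (2 * h)
        for j in range(d):
            if i == j:
                fij = (f_at({i: h}) - 2 * f0 + f_at({i: -h})) / h ** 2
            else:
                fij = (f_at({i: h, j: h}) - f_at({i: h, j: -h})
                       - f_at({i: -h, j: h}) + f_at({i: -h, j: -h})) / (4 * h * h)
            total += 0.5 * ax[i][j] * fij
    return total


if __name__ == "__main__":
    a_const = lambda x: [[2.0, 0.5], [0.5, 1.0]]
    b_const = lambda x: [0.3, -0.7]
    # f(x) = x_0 x_1 at x = (1, 2): a_01 + x_0 b_1 + x_1 b_0 = 0.5 - 0.7 + 0.6
    print(generator(lambda y: y[0] * y[1], a_const, b_const, [1.0, 2.0]))  # approximately 0.4
```

Central differences are exact for quadratic $f$ up to rounding. This makes quadratic test functions such as $x_k x_m$ convenient for checking the formula.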
We saw in the previous section that if $X_t$ is the solution to (39.1), $a = \sigma\sigma^T$, and $f \in C^2$, then
$$f(X_t) - f(X_0) - \int_0^t Lf(X_s)\,ds \qquad (39.13)$$
is a local martingale under $P$. A very fruitful idea of Stroock and Varadhan is to phrase the association of $X_t$ with $L$ in terms which use (39.13) as a key element. Let $\Omega$ consist of all continuous functions $\omega$ mapping $[0,\infty)$ to $\mathbb R^d$. Let $X_t(\omega) = \omega(t)$ and, given a probability $P$, let $\{\mathcal F_t\}$ be the minimal augmented filtration generated by $X$. A probability measure $P$ is a solution to the martingale problem for $L$ started at $x_0$ if
$$P(X_0 = x_0) = 1 \qquad (39.14)$$
and
$$f(X_t) - f(X_0) - \int_0^t Lf(X_s)\,ds \qquad (39.15)$$
is a local martingale under $P$ whenever $f \in C^2(\mathbb R^d)$. The martingale problem is well posed if there exists a solution $P$ and this solution is unique.
Uniqueness of the martingale problem for L is closely connected to weak uniqueness or,
equivalently, uniqueness in law of (39.1).
Theorem 39.4 Suppose $a = \sigma\sigma^T$ and suppose the matrix $\sigma(x)$ is invertible for each $x$. Weak uniqueness for (39.1) holds if and only if the solution to the martingale problem for $L$ started at $x$ is unique. Weak existence for (39.1) holds if and only if there exists a solution to the martingale problem for $L$ started at $x$.
Proof We prove the uniqueness assertion. Let $\Omega$ be the continuous functions on $[0,\infty)$ and $Z_t$ the coordinate process: $Z_t(\omega) = \omega(t)$. First suppose the solution to the martingale problem is unique. If $(X^1_t, W^1_t, P_1)$ and $(X^2_t, W^2_t, P_2)$ are two weak solutions to (39.1), define $P^x_i$ on $\Omega$ to be the law of $X^i$ under $P_i$, $i = 1, 2$. Clearly $P^x_i(Z_0 = x) = P_i(X^i_0 = x) = 1$. By Theorem 39.3, the expression in (39.13), with $X$ replaced by $Z$, is a local martingale under $P^x_i$ for each $i$ and each $f \in C^2$. By the uniqueness for the solution of the martingale problem, $P^x_1 = P^x_2$. This implies that the laws of $X^1_t$ and $X^2_t$ are the same, that is, weak uniqueness holds.
Now suppose weak uniqueness holds for (39.1). Let
$$Y_t = Z_t - \int_0^t b(Z_s)\,ds.$$
Let $P^x_1$ and $P^x_2$ be solutions to the martingale problem. If $f(x) = x_k$, the $k$th coordinate of $x$, then $\partial f/\partial x_i(x) = \delta_{ik}$ and $\partial^2 f/\partial x_i \partial x_j(x) = 0$, where $\delta_{ik}$ is 1 if $i = k$ and 0 otherwise, and so $Lf(Z_s) = b_k(Z_s)$. We see from (39.15) that the $k$th coordinate of $Y_t$ is a local martingale under $P^x_i$.

Now let $f(x) = x_k x_m$. A simple computation shows that $Lf(x) = a_{km}(x) + x_k b_m(x) + x_m b_k(x)$, and from this one deduces that $Y^k_t Y^m_t - \int_0^t a_{km}(Z_s)\,ds$ is a local martingale. We set
$$W_t = \int_0^t \sigma^{-1}(Z_s)\,dY_s.$$
The stochastic integral is well defined, since for each $i$
$$E\int_0^t \sum_{j=1}^d (\sigma^{-1})_{ij}(Z_s) \sum_{k=1}^d (\sigma^{-1})_{ik}(Z_s)\,d\langle Y^j, Y^k\rangle_s = E\int_0^t \big(\sigma^{-1} a (\sigma^{-1})^T\big)_{ii}(Z_s)\,ds = t < \infty, \qquad (39.16)$$
because $d\langle Y^j, Y^k\rangle_s = a_{jk}(Z_s)\,ds$ and $\sigma^{-1} a (\sigma^{-1})^T = \sigma^{-1}\sigma\sigma^T(\sigma^T)^{-1}$ is the identity matrix. Since $Y_t$ is a local martingale, it follows that $W_t$ is a local martingale, and a calculation similar to (39.16) shows that $W^k_t W^m_t - \delta_{km} t$ is also a local martingale under $P^x_i$. By Lévy's theorem (Exercise 12.4), $W_t$ is a Brownian motion under both $P^x_1$ and $P^x_2$, and $(Z_t, W_t, P^x_i)$ is a weak solution to (39.1). By the weak uniqueness hypothesis, the laws of $Z_t$ under $P^x_1$ and $P^x_2$ agree, which is what we wanted to prove.

Exercise 39.1 asks you to prove that the existence of a weak solution to (39.1) is equivalent to the existence of a solution to the martingale problem.

If the $\sigma_{ij}$ and $b_i$ are Lipschitz functions, the solution to (39.1) is pathwise unique; see Exercise 24.5. By Proposition 25.2, weak existence and uniqueness hold, and then the martingale problem for $L$ is well posed for every starting point.

A process that can be described in terms of a martingale problem (as well as in other ways) is super-Brownian motion. Super-Brownian motion, also known as a measure-valued branching diffusion process, is a process whose state space is the set $\mathcal M$ of finite positive measures on $\mathbb R^d$. The intuitive picture is as follows. Given an initial finite measure $\mu$ as a starting point, let $X^n_t$ be the process that starts with $[n\mu(\mathbb R^d)]$ particles, each with mass $1/n$, each distributed according to $\mu(dx)/\mu(\mathbb R^d)$, where $[\cdot]$ denotes the integer part. Each particle moves as an independent Brownian motion for a time $1/n$, at which time each particle splits into two or dies, independently of the other particles. The particles that are now alive move as independent Brownian motions for time $1/n$, at which time each particle splits into two or dies, and so on. $X^n_t$ is the measure that assigns mass $1/n$ to each point at which there is a particle alive at time $t$. We take the right-continuous version of $X^n_t$.
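The particle picture behind $X^n_t$ can be simulated directly. The Python sketch below works in dimension $d = 1$ and rests on one assumption the text does not state: each particle splits or dies with probability $1/2$ each, which is critical branching. The function names `branching_particles` and `measure_of` are hypothetical. The sketch returns the particle positions at time $t$. Each position carries mass $1/n$, so $X^n_t(f)$ is $1/n$ times the sum of $f$ over the positions.

```python
import math
import random
from typing import Callable, List


def branching_particles(n: int, total_mass: float,
                        sample_initial: Callable[[random.Random], float],
                        t: float, rng: random.Random) -> List[float]:
    """Positions at time t of the particles behind X^n_t, in dimension d = 1.

    Start with [n * total_mass] particles, each placed independently according to
    the normalized initial measure. Particles move as independent Brownian motions;
    at each time k/n <= t every particle independently splits into two or dies.
    Assumption: split and death each have probability 1/2 (critical branching).
    """
    positions = [sample_initial(rng) for _ in range(math.floor(n * total_mass))]
    dt = 1.0 / n
    k = math.floor(t * n + 1e-9)  # number of branching times j/n with 0 < j/n <= t
    for _ in range(k):
        positions = [p + rng.gauss(0.0, math.sqrt(dt)) for p in positions]
        offspring: List[float] = []
        for p in positions:
            if rng.random() < 0.5:  # split: two particles at the same place
                offspring.extend((p, p))
            # otherwise the particle dies and contributes nothing
        positions = offspring
    # Move the survivors for the remaining time t - k/n (right-continuous version).
    rest = max(0.0, t - k * dt)
    return [p + rng.gauss(0.0, math.sqrt(rest)) for p in positions]


def measure_of(positions: List[float], n: int, f: Callable[[float], float]) -> float:
    """X^n_t(f): each particle carries mass 1/n."""
    return sum(f(p) for p in positions) / n


if __name__ == "__main__":
    rng = random.Random(0)
    pts = branching_particles(100, 1.5, lambda r: r.uniform(0.0, 1.0), 0.5, rng)
    print(len(pts), measure_of(pts, 100, lambda p: 1.0))  # particle count and total mass
```

With probability $1/2$ for each outcome, the expected number of particles is unchanged at each branching time. This is why that choice is the natural one for a nondegenerate limit.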
It turns out that the sequence converges weakly with respect to the topology of $D[0,1]$, but where the state space is the set of right-continuous functions with left limits taking values in $\mathcal M$ (rather than the set of real-valued functions), and the limit law can be characterized as the unique solution to a martingale problem. A solution to this martingale problem started at $\mu \in \mathcal M$ is a probability measure $P$ on the space of continuous processes taking values in $\mathcal M$ such that
(1) $P(X_0 = \mu) = 1$;
(2) if $f \in C^\infty$ has compact support and we write $\nu(f)$ for $\int f\,d\nu$, then
$$M^f_t = X_t(f) - \int_0^t X_r\big(\tfrac12 \Delta f\big)\,dr$$
is a continuous martingale with quadratic variation process given by $\langle M^f\rangle_t = \int_0^t X_r(f^2)\,dr$.
See Dawson (1993) and Perkins (2002) for more on these processes.

Exercises
39.1 Show that the existence of a weak solution to (39.1) is equivalent to the existence of a solution to the martingale problem for $L$.
39.2 Suppose the $a_{ij}$ are Lipschitz functions in $x$ and the matrices $a(x)$ are positive definite, uniformly in $x$; see Exercise 25.4. Show that we can find matrices $\sigma(x)$ so that each $\sigma_{ij}$ is a Lipschitz function of $x$ and $a(x) = \sigma(x)\sigma^T(x)$ for each $x$.
39.3 If $X$ is a solution to (39.1), give formulas for $A_t$ and $M_t$ in terms of $\sigma$ and $b$, where $M_t$ is a local martingale, $A_t$ is a process whose paths are locally of bounded variation, and $|X_t| = M_t + A_t$.
39.4 Let $A \in (-1, \infty)$ and let $X$ be a solution to (39.1), where all the $b_i$'s are equal to 0, $a = \sigma\sigma^T$, and
$$a_{ij}(x) = \frac{\delta_{ij} + A x_i x_j / |x|^2}{1 + A}, \qquad x \neq 0,$$
where $\delta_{ij}$ is equal to 1 if $i = j$ and 0 otherwise. Let $a(0)$ be the identity matrix.
(1) Prove that the matrices $a(x)$ are uniformly elliptic.
(2) Show that $|X_t|$ has the same law as a Bessel process of order $\dfrac{d + A}{1 + A}$. Conclude that if $A$ is sufficiently close to $-1$, then $X$ is transient, i.e., $\lim_{t\to\infty} |X_t| = \infty$, a.s., while if $A$ is sufficiently large, there exist arbitrarily large times $t$ such that $X_t = 0$.
39.5 Suppose for each $n \ge 1$, $a^n_{ij}(x)$ is symmetric in $i$ and $j$, is continuous in $x$, and the matrix whose $(i,j)$th entry is $a^n_{ij}(x)$ is positive definite, uniformly in $x$ and $n$. Let
$$L_n f(x) = \sum_{i,j=1}^d a^n_{ij}(x) \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \qquad (39.17)$$
for $f \in C^2$. Suppose $a^n_{ij}(x)$ converges to $a_{ij}(x)$ uniformly in $x$ as $n \to \infty$, and define $L$ analogously to (39.17). Fix $x_0$ and let $P_n$ be a solution to the martingale problem for $L_n$ started at $x_0$.
(1) Prove that $P_n$ converges weakly to a solution $P$ to the martingale problem for $L$ started at $x_0$.
(2) Prove that if the $a_{ij}$ are continuously differentiable functions of $x$ whose first partial derivatives are bounded, then there exists a solution to the martingale problem for $L$ started at $x_0$.
(3) Prove that if the $a_{ij}$ are continuous functions of $x$, then there exists a solution to the martingale problem for $L$ started at $x_0$.
39.6 Suppose $X$ is a solution to $dX_t = \sigma(X_t)\,dW_t$, where $W$ is a $d$-dimensional Brownian motion, $\sigma(x)$ is a $d \times d$ matrix-valued function that is bounded, and $\sigma^T\sigma$ is positive definite, uniformly in $x$. Prove the following estimate for the time to leave a ball: there exist constants $c_1$ and $c_2$ not depending on $x_0$ such that
$$c_1 r^2 \le E^{x_0} \tau_{B(x_0,r)} \le c_2 r^2, \qquad r > 0,$$
where $\tau_{B(x_0,r)} = \inf\{t > 0 : X_t \notin B(x_0, r)\}$.
Notes
See Bass (1997) for more information.

40
Solving partial differential equations
We will be concerned with giving probabilistic representations of the solutions to certain
PDEs. Throughout we will be assuming that the given PDE has a solution, the solution is
unique, and the solution is sufficiently smooth. We will consider Poisson’s equation, the
Dirichlet problem, the Cauchy problem (with an application to Brownian passage times), and
Schrödinger’s equation.
We let $X_t$ be the solution to
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt. \qquad (40.1)$$
Here $W$ is a $d$-dimensional Brownian motion, $\sigma$ is a bounded Lipschitz continuous $d \times d$ matrix-valued function, $b$ is a bounded Lipschitz continuous $d \times 1$ matrix-valued function, and $X$ takes values in $\mathbb R^d$. We let $a = \sigma\sigma^T$ and we consider the operator on $C^2$ functions given by
$$Lf(x) = \frac12 \sum_{i,j=1}^d a_{ij}(x) \frac{\partial^2 f}{\partial x_i \partial x_j}(x) + \sum_{i=1}^d b_i(x) \frac{\partial f}{\partial x_i}(x). \qquad (40.2)$$
We suppose the operator $L$ is uniformly elliptic: there exists $\Lambda > 0$ such that
$$\sum_{i,j=1}^d a_{ij}(x) y_i y_j \ge \Lambda \sum_{i=1}^d y_i^2, \qquad y = (y_1, \dots, y_d) \in \mathbb R^d,\ x \in \mathbb R^d.$$
In fact, the uniform ellipticity of L will be used only to guarantee that the exit times
of bounded domains are finite, a.s.; see Exercise 40.1. For many non-uniformly elliptic
operators, it is often the case that the finiteness of the exit times is known for other reasons,
and the results then apply to equations involving these operators.
Let $X^x_t$ be the solution to (40.1) when $X_0 = x$ and let $P^x$ be the law of $X^x_t$. As in Chapter 39, we slightly abuse notation and say that $(X_t, P^x)$ is a strong Markov process.
40.1 Poisson’s equation
We consider first Poisson's equation in $\mathbb R^d$. Suppose $\lambda > 0$ and $f$ is a $C^1$ function with compact support. Poisson's equation is
$$Lu(x) - \lambda u(x) = -f(x), \qquad x \in \mathbb R^d. \qquad (40.3)$$
Theorem 40.1 Suppose $u$ is a $C^2$ solution to (40.3) such that $u$ and its first and second partial derivatives are bounded. Then
$$u(x) = E^x \int_0^\infty e^{-\lambda t} f(X_t)\,dt.$$
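Proof Let $u$ be the solution to (40.3). By Theorem 39.3,
$$u(X_t) - u(X_0) = M_t + \int_0^t Lu(X_s)\,ds,$$
where $M_t$ is a martingale (the integrand in (39.11) is bounded because the first derivatives of $u$ and $\sigma$ are bounded). By the product formula,
$$e^{-\lambda t} u(X_t) - u(X_0) = \int_0^t e^{-\lambda s}\,dM_s + \int_0^t e^{-\lambda s} Lu(X_s)\,ds - \lambda \int_0^t e^{-\lambda s} u(X_s)\,ds.$$
Taking the expectation with respect to $P^x$ and letting $t \to \infty$,
$$-u(x) = E^x \int_0^\infty e^{-\lambda s} (Lu - \lambda u)(X_s)\,ds.$$
Since $Lu - \lambda u = -f$, the result follows.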
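Theorem 40.1 suggests a Monte Carlo method, sketched below in Python for one-dimensional Brownian motion. Here $\sigma = 1$, $b = 0$, and $L = \frac12\, d^2/dx^2$. Instead of discretizing the time integral, the sketch uses a trick of our own, which the text does not use: by Fubini, $\int_0^\infty e^{-\lambda t} g(t)\,dt = \lambda^{-1} E\,g(T)$, where $T$ is exponential with rate $\lambda$ and independent of $W$. The name `resolvent_mc` is hypothetical. The check uses $u = \cos$, for which $Lu - \lambda u = -(\frac12 + \lambda)\cos$, so $f = (\frac12 + \lambda)\cos$. This $f$ is bounded but not compactly supported, so the check illustrates the formula rather than falling under the exact hypotheses of the theorem.

```python
import math
import random
from typing import Callable


def resolvent_mc(f: Callable[[float], float], x: float, lam: float,
                 n_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of u(x) = E^x int_0^inf e^{-lam t} f(W_t) dt, W a 1-d BM.

    By Fubini, int_0^inf e^{-lam t} g(t) dt = (1/lam) E[g(T)] with T ~ Exp(lam)
    independent of W, so it suffices to sample W_T = x + sqrt(T) * Z.
    """
    total = 0.0
    for _ in range(n_samples):
        T = rng.expovariate(lam)  # exponential time with rate lam
        total += f(x + math.sqrt(T) * rng.gauss(0.0, 1.0))
    return total / (lam * n_samples)


if __name__ == "__main__":
    lam = 1.0
    est = resolvent_mc(lambda y: (0.5 + lam) * math.cos(y), 0.7, lam, 100_000, random.Random(1))
    print(est, math.cos(0.7))  # the two numbers should agree to about two decimals
```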
Now let $D$ be a nice bounded domain, e.g., a ball. Poisson's equation in $D$ requires one to find a function $u$ such that $Lu - \lambda u = -f$ in $D$ and $u = 0$ on $\partial D$, where $f \in C^2(D)$ and $\lambda \ge 0$. Here we can allow $\lambda$ to be equal to 0.
Theorem 40.2 Suppose $u$ is a solution to Poisson's equation in a bounded domain $D$ that is $C^2$ in $D$ and continuous on $\overline D$. Then
$$u(x) = E^x \int_0^{\tau_D} e^{-\lambda s} f(X_s)\,ds.$$
Proof The proof is nearly identical to that of the previous theorem. We already mentioned that $\tau_D < \infty$, a.s.; see Exercise 40.1. Let $S_n = \inf\{t : \operatorname{dist}(X_t, \partial D) < 1/n\}$. By Theorem 39.3,
$$u(X_{t\wedge S_n}) - u(X_0) = \text{martingale} + \int_0^{t\wedge S_n} Lu(X_s)\,ds.$$
By the product formula,
$$E^x e^{-\lambda(t\wedge S_n)} u(X_{t\wedge S_n}) - u(x) = E^x \int_0^{t\wedge S_n} e^{-\lambda s} Lu(X_s)\,ds - \lambda E^x \int_0^{t\wedge S_n} e^{-\lambda s} u(X_s)\,ds = -E^x \int_0^{t\wedge S_n} e^{-\lambda s} f(X_s)\,ds.$$
Now let $n \to \infty$ and then $t \to \infty$ and use the fact that $u$ is zero on $\partial D$.

40.2 Dirichlet problem

Let $D$ be a ball (or other nice bounded domain) and let us consider the solution to the Dirichlet problem: given a continuous function $f$ on $\partial D$, find $u \in C(\overline D)$ such that $u$ is $C^2$ in $D$ and
$$Lu = 0 \text{ in } D, \qquad u = f \text{ on } \partial D. \qquad (40.4)$$
We considered the Dirichlet problem in the special case when $L$ is the Laplacian in Section 21.4.

Theorem 40.3 Suppose $u$ is a solution to the Dirichlet problem specified by (40.4). Then $u$ satisfies
$$u(x) = E^x f(X_{\tau_D}).$$

Proof As we mentioned above, $\tau_D < \infty$, a.s. Let $S_n = \inf\{t : \operatorname{dist}(X_t, \partial D) < 1/n\}$. By Theorem 39.3,
$$u(X_{t\wedge S_n}) = u(X_0) + \text{martingale} + \int_0^{t\wedge S_n} Lu(X_s)\,ds.$$
Since $Lu = 0$ inside $D$, taking expectations shows $u(x) = E^x u(X_{t\wedge S_n})$. We let $t \to \infty$ and then $n \to \infty$. By dominated convergence, we obtain $u(x) = E^x u(X_{\tau_D})$. This is what we want since $u = f$ on $\partial D$.

If $v \in C^2$ and $Lv = 0$ in $D$, we say $v$ is $L$-harmonic in $D$.

40.3 Cauchy problem

The related parabolic partial differential equation
$$\frac{\partial u}{\partial t} = Lu$$
is often of interest. Here $u$ is a function of $x \in \mathbb R^d$ and $t \in [0,\infty)$. When we write $Lu$, we mean
$$Lu(x,t) = \frac12 \sum_{i,j=1}^d a_{ij}(x) \frac{\partial^2 u}{\partial x_i \partial x_j}(x,t) + \sum_{i=1}^d b_i(x) \frac{\partial u}{\partial x_i}(x,t).$$
We will sometimes write $u_t$ for $\partial u/\partial t$. Suppose for simplicity that the function $f$ is a continuous function with compact support. The Cauchy problem is to find $u$ such that $u$ is bounded, $u$ is $C^2$ with bounded first and second partial derivatives in $x$, $u$ is $C^1$ in $t$ for $t > 0$, and
$$\begin{aligned} u_t(x,t) &= Lu(x,t), && t > 0,\ x \in \mathbb R^d, \\ u(x,0) &= f(x), && x \in \mathbb R^d. \end{aligned} \qquad (40.5)$$
Theorem 40.4 Suppose there exists a solution to (40.5) that is $C^2$ in $x$ and $C^1$ in $t$ for $t > 0$. Then $u$ satisfies
$$u(x,t) = E^x f(X_t).$$
Proof Fix $t_0$ and let $M_t = u(X_t, t_0 - t)$. Note
$$\frac{\partial}{\partial t} u(x, t_0 - t) = -u_t(x, t_0 - t).$$
Similarly to the proof of Theorem 39.3 (see Exercise 40.2), but now using the multivariate version of Itô's formula,
$$u(X_t, t_0 - t) = u(X_0, t_0) + \text{martingale} + \int_0^t Lu(X_s, t_0 - s)\,ds - \int_0^t u_t(X_s, t_0 - s)\,ds. \qquad (40.6)$$
Since $u_t = Lu$, $M_t$ is a martingale, and $E^x M_0 = E^x M_{t_0}$. On the one hand,
$$E^x M_{t_0} = E^x u(X_{t_0}, 0) = E^x f(X_{t_0}),$$
while on the other hand,
$$E^x M_0 = E^x u(X_0, t_0) = u(x, t_0).$$
Since $t_0$ is arbitrary, the result follows.
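The representation $u(x,t) = E^x f(X_t)$ can be checked numerically. The Python sketch below takes $d = 1$ and generates paths of (40.1) with the Euler–Maruyama scheme. That is a standard discretization which the text does not discuss. The function name `cauchy_mc` is hypothetical. For the check we take constant coefficients $\sigma = 1$, $b = \beta$, and $f = \cos$. Then $u(x,t) = \cos(x + \beta t)\,e^{-t/2}$ solves $u_t = \frac12 u_{xx} + \beta u_x$ with $u(x,0) = \cos x$. Again $f$ is not compactly supported, so this is an illustration rather than an instance of the exact hypotheses.

```python
import math
import random
from typing import Callable


def cauchy_mc(f: Callable[[float], float],
              sigma: Callable[[float], float],
              b: Callable[[float], float],
              x: float, t: float, n_steps: int, n_paths: int,
              rng: random.Random) -> float:
    """Estimate u(x, t) = E^x f(X_t) for dX = sigma(X) dW + b(X) dt in d = 1.

    Paths are generated with the Euler-Maruyama scheme on a uniform time grid.
    """
    dt = t / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        y = x
        for _ in range(n_steps):
            y += sigma(y) * sqrt_dt * rng.gauss(0.0, 1.0) + b(y) * dt
        total += f(y)
    return total / n_paths


if __name__ == "__main__":
    beta = 0.5
    est = cauchy_mc(math.cos, lambda y: 1.0, lambda y: beta, 0.3, 1.0, 10, 20_000, random.Random(2))
    print(est, math.cos(0.3 + beta) * math.exp(-0.5))  # should agree to about two decimals
```

With constant coefficients, the Euler–Maruyama increments have exactly the right law, so the only error here is Monte Carlo error. For variable $\sigma$ and $b$ there is also a discretization bias that shrinks as `n_steps` grows.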
A very similar proof allows one to represent the solution to the Cauchy problem in a bounded domain. Suppose $u(x,t)$ is $C^2$ in the $x$ variable, $C^1$ in the $t$ variable, and satisfies
$$\frac{\partial u}{\partial t}(x,t) = Lu(x,t)$$
for $(x,t) \in D \times (0, t_1]$, where $D$ is a bounded domain in $\mathbb R^d$ and $t_1 > 0$. Suppose $u(x,0) = f(x)$ and $u(x,t) = 0$ for all $x \in \partial D$. Exercise 40.3 asks you to show that in this case
$$u(x,t) = E^x f(X_{t\wedge\tau_D}),$$
where again $\tau_D$ is the first exit time of $X$ from the domain $D$.
The Cauchy problem has an application to the passage times of Brownian motion. Suppose we look at the equation
$$u_t(x,t) = \tfrac12 u_{xx}(x,t), \qquad 0 < x < b,\ t > 0,$$
with
$$u(x,0) = f(x) \text{ for all } x, \qquad u(0,t) = u(b,t) = 0 \text{ for all } t,$$
where $f$ is a bounded function on $[0,b]$. This is a partial differential equation (the heat equation) that is sometimes solved in undergraduate classes; see, e.g., Boyce and DiPrima (2009), Section 10.5. Using a combination of the technique of separation of variables and Fourier series expansions, the solution can then be shown to be
$$u(x,t) = \int_0^b f(y)\,p_0(t,x,y)\,dy,$$
where
$$p_0(t,x,y) = \frac2b \sum_{n=1}^\infty e^{-n^2\pi^2 t/(2b^2)} \sin(n\pi x/b) \sin(n\pi y/b).$$
See also Knight (1981), p. 62. Since $u(x,t)$ is also equal to $E^x f(X_{t\wedge\tau_D})$, where $D$ is the interval $(0,b)$, it follows that the $p_0(t,x,y)$ are the transition densities for Brownian motion killed on exiting $(0,b)$.
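The series for $p_0$ is easy to evaluate. Integrating it term by term against $f \equiv 1$ gives the survival probability $P^x(t < \tau_D)$. Since $(2/b)\int_0^b \sin(n\pi y/b)\,dy$ equals $4/(n\pi)$ for odd $n$ and $0$ for even $n$, the survival probability is $\sum_{n \text{ odd}} \frac{4}{n\pi} e^{-n^2\pi^2 t/(2b^2)} \sin(n\pi x/b)$. The Python sketch below, with hypothetical function names, truncates both series. For large $t$ only the $n = 1$ term matters, which identifies the constant in the asymptotics as $c = \frac4\pi \sin(\pi x/b)$.

```python
import math


def p0(t: float, x: float, y: float, b: float, n_terms: int = 2000) -> float:
    """Truncated series for the transition density of BM killed on exiting (0, b)."""
    s = 0.0
    for n in range(1, n_terms + 1):
        s += (math.exp(-n * n * math.pi ** 2 * t / (2 * b * b))
              * math.sin(n * math.pi * x / b) * math.sin(n * math.pi * y / b))
    return 2.0 / b * s


def survival(t: float, x: float, b: float, n_terms: int = 2000) -> float:
    """P^x(t < tau_D) = int_0^b p0(t, x, y) dy, integrated term by term.

    (2/b) int_0^b sin(n pi y / b) dy equals 4/(n pi) for odd n and 0 for even n.
    """
    s = 0.0
    for n in range(1, n_terms + 1, 2):
        s += (4.0 / (n * math.pi) * math.exp(-n * n * math.pi ** 2 * t / (2 * b * b))
              * math.sin(n * math.pi * x / b))
    return s


if __name__ == "__main__":
    # For large t, survival(t, x, b) / exp(-pi^2 t / (2 b^2)) approaches (4/pi) sin(pi x / b).
    print(survival(5.0, 1.0, 2.0) / math.exp(-math.pi ** 2 * 5.0 / 8), 4 / math.pi)
```

For small $t$ the series converges slowly, because the Gaussian damping factor is close to 1. This is why the default truncation is fairly long.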
In particular, if we take $f$ identically equal to 1 on $(0,b)$, we see that starting at $x$ inside $(0,b)$, $P^x(t < \tau_D)$ is asymptotically equal to $c e^{-\pi^2 t/(2b^2)}$. If $b$ is 2, this becomes $c e^{-\pi^2 t/8}$. Since the time for a Brownian motion started at 0 to leave $(-1,1)$ has the same law as the time for a Brownian motion started at 1 to leave $(0,2)$, we obtain the estimate that was used in Exercise 7.2.

40.4 Schrödinger operators

Finally we look at what happens when one adds a potential term, that is, when one considers the operator
$$Lu(x) + q(x)u(x). \qquad (40.7)$$
This is known as the Schrödinger operator, and $q(x)$ is known as the potential. Equations involving the operator in (40.7) are considerably simpler than the quantum mechanics Schrödinger equation because here all terms are real-valued.

If $X_t$ is the diffusion corresponding to $L$, then solutions to PDEs involving the operator in (40.7) can be expressed in terms of $X_t$ by means of the Feynman–Kac formula. To illustrate, let $D$ be a nice bounded domain, e.g., a ball, $q$ a $C^2$ function on $D$, and $f$ a continuous function on $\partial D$; $q^+$ denotes the positive part of $q$.

Theorem 40.5 Let $D$, $q$, $f$ be as above. Let $u$ be a $C^2$ function on $D$ that agrees with $f$ on $\partial D$ and satisfies $Lu + qu = 0$ in $D$. If
$$E^x \exp\Big(\int_0^{\tau_D} q^+(X_s)\,ds\Big) < \infty,$$
then
$$u(x) = E^x\Big[f(X_{\tau_D})\, e^{\int_0^{\tau_D} q(X_s)\,ds}\Big]. \qquad (40.8)$$

Proof Let $B_t = \int_0^{t\wedge\tau_D} q(X_s)\,ds$. By Itô's formula and the product formula,
$$e^{B_{t\wedge\tau_D}} u(X_{t\wedge\tau_D}) = u(X_0) + \text{martingale} + \int_0^{t\wedge\tau_D} u(X_r) e^{B_r}\,dB_r + \int_0^{t\wedge\tau_D} e^{B_r} Lu(X_r)\,dr.$$
Taking the expectation with respect to $P^x$ and using Theorem 39.3,
$$E^x e^{B_{t\wedge\tau_D}} u(X_{t\wedge\tau_D}) = u(x) + E^x \int_0^{t\wedge\tau_D} e^{B_r} u(X_r) q(X_r)\,dr + E^x \int_0^{t\wedge\tau_D} e^{B_r} Lu(X_r)\,dr.$$
Since $Lu + qu = 0$, $E^x e^{B_{t\wedge\tau_D}} u(X_{t\wedge\tau_D}) = u(x)$. If we let $t \to \infty$ and use the exponential integrability of $q^+$, the result follows.

The existence of a solution to $Lu + qu = 0$ in $D$ depends on the finiteness of $E^x \exp\big(\int_0^{\tau_D} q^+(X_s)\,ds\big)$, an expression that is sometimes known as the gauge. Even in one dimension with $D = (0,1)$ and $q$ a constant function, the gauge need not be finite.
With $x = 1/2$, $P^x(\tau_D > t)$ is asymptotically equal to $c e^{-\pi^2 t/2}$ as $t \to \infty$ by Section 40.3. Hence
$$E^x \exp\Big(\int_0^{\tau_D} q\,ds\Big) = E^x e^{q\tau_D} = 1 + \int_0^\infty q e^{qt} P^x(\tau_D > t)\,dt;$$
this is infinite if $q \ge \pi^2/2$.
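For constant $q \ge 0$ the gauge at $x = 1/2$ can be computed two ways, and the Python sketch below compares them. The function names are hypothetical.

The first way inserts the survival series from Section 40.3 into $1 + \int_0^\infty q e^{qt} P^{1/2}(\tau_D > t)\,dt$ and integrates term by term. This gives $1 + \sum_{n \text{ odd}} \frac{4}{n\pi} \sin(n\pi/2)\, \frac{q}{n^2\pi^2/2 - q}$.

The second way uses Theorem 40.5 with $f \equiv 1$. The function $u(x) = \cos(\sqrt{2q}(x - \frac12))/\cos(\sqrt{2q}/2)$ solves $\frac12 u'' + qu = 0$ on $(0,1)$ with $u(0) = u(1) = 1$.

Both are finite exactly when $q < \pi^2/2$.

```python
import math


def gauge_series(q: float, n_terms: int = 200001) -> float:
    """E^{1/2} exp(q tau_D) for BM on D = (0, 1) and a constant q >= 0.

    Uses E e^{q tau} = 1 + int_0^inf q e^{qt} P(tau > t) dt with the survival series
    P^{1/2}(tau > t) = sum over odd n of (4/(n pi)) sin(n pi/2) e^{-n^2 pi^2 t/2},
    integrated term by term.
    """
    if q >= math.pi ** 2 / 2:
        return math.inf  # P(tau > t) ~ c e^{-pi^2 t/2}, so the integrand does not decay
    s = 1.0
    for n in range(1, n_terms + 1, 2):
        sign = 1.0 if (n // 2) % 2 == 0 else -1.0  # sin(n pi / 2) for odd n
        s += sign * 4.0 / (n * math.pi) * q / (n * n * math.pi ** 2 / 2 - q)
    return s


def gauge_closed_form(q: float) -> float:
    """u(1/2) where (1/2) u'' + q u = 0 on (0, 1), u(0) = u(1) = 1 (Theorem 40.5, f = 1)."""
    if q < 0:
        raise ValueError("this sketch assumes q >= 0")
    if q >= math.pi ** 2 / 2:
        return math.inf
    return 1.0 / math.cos(math.sqrt(2 * q) / 2)


if __name__ == "__main__":
    for q in (1.0, 4.0, 4.9):
        print(q, gauge_series(q), gauge_closed_form(q))  # the gauge blows up as q -> pi^2/2
```

The terms of the series decay like $n^{-3}$, so a long but finite truncation already gives the closed form to many digits.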
Exercises
40.1 This (lengthy) exercise is designed to guide you through a proof that solutions to (40.1) exit
bounded sets in finite time, a.s.
(1) Suppose
$$X_t = W_t + \int_0^t a_s\,ds,$$
where $W$ is a one-dimensional Brownian motion, and $a_s$ is an adapted process bounded by $K$. Let $L > K > 0$ and $t_0 > 0$. Show that there exists $\varepsilon > 0$, depending only on $L$, $K$, and $t_0$, such that $P(|X_{t_0}| > 3L) > \varepsilon$.
(2) Suppose $X_t = M_t + \int_0^t a_s\,ds$, where $a_s$ is as in (1) and $M$ is a continuous martingale with $K^{-1} \le d\langle M\rangle_t/dt \le K$, a.s. Use a time change argument to show that there exist $L, \varepsilon > 0$ such that
$$P\big(\sup_{s\le 1} |X_s| \le L\big) \le 1 - \varepsilon.$$
(3) If now $X$ is a solution to (40.1), $a = \sigma\sigma^T$, and $L$ given by (40.2) is uniformly elliptic, show by looking at the first coordinate of $X$ that there exist $L, \varepsilon$ such that
$$P^x\big(\sup_{s\le 1} |X_s| \le L\big) \le 1 - \varepsilon, \qquad x \in B(0,L).$$
(4) What you have proved in (3) can be rephrased as saying that if $(X_t, P^x)$ is a strong Markov process that solves (40.1) for every starting point and $\tau = \inf\{t : X_t \notin B(0,L)\}$, then $P^x(\tau > 1) \le 1 - \varepsilon$, where $\varepsilon$ does not depend on $x$. Now use the strong Markov property (cf. the proof of Proposition 21.2) to show $P^x(\tau > k) \le (1-\varepsilon)^k$. Conclude that $\tau < \infty$, $P^x$-a.s., for each starting point $x$.
40.2 Prove (40.6).
40.3 Let $D$ be a ball in $\mathbb R^d$ and suppose $u$ is the solution to the Cauchy problem in the domain $D \times [0, t_1]$ as described in Section 40.3. Show that $u(x,t) = E^x f(X_{t\wedge\tau_D})$.
40.4 Suppose $f$ is such that the solution $u$ to
$$u_t(x,t) = Lu(x,t) + q(x)u(x,t), \qquad u(x,0) = f(x),$$
is $C^2$ in $x$ and $t$ and $X$ is the diffusion associated with $L$. Prove that
$$u(x,t) = E^x\Big[f(X_t)\, e^{\int_0^t q(X_s)\,ds}\Big].$$
40.5 Suppose $(X_t, P^x)$ is a Brownian motion on $[0,b]$ with reflection at 0 and $b$. Find a series expansion for $p(t,x,y)$, the transition densities for $X$. Hint: Imitate the argument for absorbing Brownian motion in Section 40.3, but now use the boundary conditions $u_x(0,t) = u_x(b,t) = 0$.

Notes
See Bass (1997) for more on the connection between probability and PDEs.

41
One-dimensional diffusions
Under very mild regularity conditions, every one-dimensional diffusion arises from first time-changing a one-dimensional Brownian motion and then making a transformation of the state space. We will prove this fact in this chapter.

41.1 Regularity
Throughout this chapter we suppose that we have a continuous process $(X_t, P^x)$ defined on an interval $I$ contained in $\mathbb R$. For almost all of the chapter, we suppose for simplicity that the interval is in fact all of $\mathbb R$. We further suppose that $(X_t, P^x)$ is a strong Markov process with respect to a right-continuous filtration $\{\mathcal F_t\}$ such that each $\mathcal F_t$ contains all the sets that are $P^x$-null for every $x$. We call such a process a one-dimensional diffusion. Write
$$T_y = \inf\{t : X_t = y\}, \qquad (41.1)$$
the first time the process $X$ hits the point $y$. We will also assume that every point can be hit from every other point: for all $x, y$,
$$P^x(T_y < \infty) = 1. \qquad (41.2)$$
When (41.2) holds, we say the diffusion is regular. For any interval $J$, define $\tau_J = \inf\{t : X_t \notin J\}$, the first time the process leaves $J$.
When $X_t$ is a Brownian motion, we know (Proposition 3.16) that the distribution of $X_t$ upon exiting $[a,b]$ is
$$P^x(X_{\tau_{[a,b]}} = a) = \frac{b-x}{b-a}, \qquad P^x(X_{\tau_{[a,b]}} = b) = \frac{x-a}{b-a}. \qquad (41.3)$$
We say that a regular diffusion $X_t$ is on natural scale if (41.3) holds for every interval $[a,b]$ and every $x \in (a,b)$. We also say a regular diffusion $X$ defined on an interval $I$ properly contained in $\mathbb R$ is on natural scale if (41.3) holds whenever $[a,b] \subset I$ and $x \in (a,b)$.

If $X_t$ is regular, then the process started at $x$ must leave $x$ immediately. That is, if $S = \inf\{t > 0 : X_t \neq x\}$, then $P^x(S = 0) = 1$. To see this, let $\varepsilon > 0$ and $U = \inf\{t : |X_t - x| \ge \varepsilon\}$. By the regularity of $X$, $E^x e^{-U} > 0$. Observe that $U = S + U \circ \theta_S$, where $\theta_t$ is the shift operator. By the strong Markov property at time $S$,
$$E^x e^{-U} = E^x\big[e^{-S} E^x[e^{-U\circ\theta_S} \mid \mathcal F_S]\big] = E^x\big[e^{-S} E^{X_S} e^{-U}\big] = E^x\big[e^{-S}\big]\, E^x e^{-U},$$
since $X_S = x$ by the continuity of the paths of $X$. Dividing by $E^x e^{-U} > 0$, we get $E^x e^{-S} = 1$, which implies $S = 0$, $P^x$-a.s.
41.2 Scale functions
We will show that given a regular diffusion, there exists a scale function that is continuous, strictly increasing, and such that $s(X_t)$ is on natural scale.
We first look at a special case, when the diffusion is given as the solution to an SDE. Suppose $X_t$ is given as the solution to
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad (41.4)$$
where we assume $\sigma$ and $b$ are real-valued, continuous and bounded above and $\sigma$ is bounded below by a positive constant. Let $a(x) = \sigma^2(x)$. In this case we can give a formula for the scale function.
Theorem 41.1 The scale function $s(x)$ is the solution to
$$\tfrac12 a(x) s''(x) + b(x) s'(x) = 0,$$
and for some constants $c_1$, $c_2$ (with $c_2 > 0$), and $x_0$ is given by
$$s(x) = c_1 + c_2 \int_{x_0}^x \exp\Big(-\int_{x_0}^y \frac{2b(w)}{a(w)}\,dw\Big)\,dy. \qquad (41.5)$$
Proof To solve the differential equation, we write
$$\frac{s''(x)}{s'(x)} = -\frac{2b(x)}{a(x)},$$
or $(\log s'(x))' = -2b(x)/a(x)$, from which (41.5) follows. Since we assumed that $\sigma$ and $b$ are continuous, $s(x)$ given by (41.5) is $C^2$. Since $\sigma$ is bounded below by a positive constant and $b$ and $\sigma$ are bounded, $s$ given by (41.5) is strictly increasing. Applying Itô's formula,
$$s(X_t) - s(X_0) = \int_0^t s'(X_r)\sigma(X_r)\,dW_r \qquad (41.6)$$
because
$$\int_0^t \Big[\tfrac12 s''(X_r)\sigma(X_r)^2 + s'(X_r) b(X_r)\Big]\,dr = 0.$$
This implies that $s(X_t) - s(X_0)$ is a local martingale, hence a time change of Brownian motion. Therefore the exit probabilities of $s(X_t)$ for an interval $[a,b]$ are the same as those of a Brownian motion, namely, those given by (41.3).

From (41.6), if $Y_t = s(X_t)$, then
$$dY_t = (s'\sigma)(s^{-1}(Y_t))\,dW_t. \qquad (41.7)$$
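Formula (41.5) can be evaluated by nested quadrature. The Python sketch below takes $c_1 = 0$ and $c_2 = 1$ and uses the composite trapezoid rule. It then uses the fact that $s(X)$ is on natural scale to compute exit probabilities: $P^x(X \text{ hits } h \text{ before } \ell) = (s(x) - s(\ell))/(s(h) - s(\ell))$. The function names and the grid size `n` are hypothetical choices for illustration. As a check, with $a \equiv 1$ and $b \equiv \beta$ the formula gives $s(x) = (1 - e^{-2\beta x})/(2\beta)$.

```python
import math
from typing import Callable


def scale_function(a: Callable[[float], float], b: Callable[[float], float],
                   x: float, x0: float = 0.0, n: int = 2000) -> float:
    """s(x) from (41.5) with c1 = 0, c2 = 1:
    s(x) = int_{x0}^x exp(-int_{x0}^y 2 b(w)/a(w) dw) dy.

    Both integrals use the composite trapezoid rule on n equal subintervals
    (x < x0 is allowed; the signed step h then handles the orientation).
    """
    h = (x - x0) / n
    g_prev = 2.0 * b(x0) / a(x0)
    inner = 0.0   # running value of int_{x0}^{y} 2b/a
    e_prev = 1.0  # exp(-inner) at y = x0
    s = 0.0
    for k in range(1, n + 1):
        y = x0 + k * h
        g = 2.0 * b(y) / a(y)
        inner += 0.5 * h * (g_prev + g)
        e = math.exp(-inner)
        s += 0.5 * h * (e_prev + e)
        g_prev, e_prev = g, e
    return s


def hit_upper_first(a, b, x: float, lo: float, hi: float) -> float:
    """P^x(X hits hi before lo) = (s(x) - s(lo)) / (s(hi) - s(lo)), since s(X) is on natural scale."""
    s_lo, s_hi = scale_function(a, b, lo), scale_function(a, b, hi)
    return (scale_function(a, b, x) - s_lo) / (s_hi - s_lo)


if __name__ == "__main__":
    # Unit diffusion with drift 1/2: s(x) = 1 - e^{-x}, so the upward exit is favoured.
    print(hit_upper_first(lambda y: 1.0, lambda y: 0.5, 0.0, -1.0, 1.0))
```

With $b \equiv 0$ the scale function is the identity, and the exit probabilities reduce to the Brownian formula (41.3).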
Now we show there exists a scale function for general regular diffusions on $\mathbb R$. Let $J$ be an interval $[a,b]$. We define
$$p(x) = p_J(x) = P^x(X_{\tau_J} = b). \qquad (41.8)$$
Proposition 41.2 Let $J = [a,b]$ be a finite interval. Then $p(X_{t\wedge\tau_J})$ is a regular diffusion on $[0,1]$ on natural scale.
Proof First we show that $p$ is increasing. To get to the point $b$ starting from $x$, the process must first hit every point between $x$ and $b$ because $X$ has continuous paths. If $a < x < y < b$, then by the strong Markov property at time $T_y$, $p(x) \le p(y)$. We claim there is a positive probability that the process starting from $x$ hits $a$ before $y$, that is,
$$P^x(T_a < T_y) > 0. \qquad (41.9)$$
If (41.9) did not hold, then the process started at $x$ must hit $y$ before hitting $a$, then by the continuity of paths must hit $x$ before hitting $a$, and once the process is again at $x$, it again hits $y$ with probability one before $a$, and so on. Therefore the process never hits $a$, a contradiction to the regularity; Exercise 41.2 asks you to make this argument precise. Therefore (41.9) does hold, and by the strong Markov property at $T_y$,
$$p(x) = P^x(T_y < T_a)\,p(y).$$
Since $P^x(T_y < T_a) = 1 - P^x(T_a < T_y)$ is strictly less than 1, $p$ is strictly increasing.

Next we show that $p$ is continuous. We show continuity from the right; the proof of continuity from the left is similar. Suppose $x_n \downarrow x$. The process $X_t$ has continuous paths, so given $\varepsilon$ we can find $t$ small enough so that $P^x(T_a < t) < \varepsilon$. By the Blumenthal 0–1 law (Proposition 20.8), $P^x(T_{(x,b]} = 0)$ is zero or one, where $T_{(x,b]}$ is the first time the process hits the interval $(x,b]$. If it is zero, the process immediately moves to the left from $x$, a.s., and by the strong Markov property at $T_x$, it never hits $b$, a contradiction. The probability must therefore be one. Thus by the continuity of paths, for $n$ large enough, $P^x(T_{x_n} < t) \ge 1 - \varepsilon$. Hence with probability at least $1 - 2\varepsilon$, $X_t$ hits $x_n$ before $a$. Since
$$p(x) = P^x(T_{x_n} < T_a)\,p(x_n) \ge (1 - 2\varepsilon)\,p(x_n)$$
and $\varepsilon$ is arbitrary, we see that $p(x) \ge \liminf_{n\to\infty} p(x_n)$. Since $p$ is strictly increasing, $p(x_n)$ decreases, and therefore $p(x) = \lim p(x_n)$.

Finally, we show $p(X_t)$ is on natural scale. Let $[e,f] \subset (0,1)$ and let
$$r(y) = P^y\big(X_t \text{ hits } p^{-1}(f) \text{ before hitting } p^{-1}(e)\big).$$
Note that
$$P^x\big(p(X_t) \text{ hits } f \text{ before } e\big) = P^{p^{-1}(x)}\big(X_t \text{ hits } p^{-1}(f) \text{ before } p^{-1}(e)\big) = r(p^{-1}(x)). \qquad (41.10)$$
For $y \in [p^{-1}(e), p^{-1}(f)]$, the strong Markov property tells us that
$$\begin{aligned} p(y) &= P^y\big(X_t \text{ hits } p^{-1}(f) \text{ before } p^{-1}(e)\big)\,p(p^{-1}(f)) + P^y\big(X_t \text{ hits } p^{-1}(e) \text{ before } p^{-1}(f)\big)\,p(p^{-1}(e)) \\ &= r(y) f + (1 - r(y)) e. \end{aligned} \qquad (41.11)$$
Solving for $r(y)$, we obtain $r(y) = (p(y) - e)/(f - e)$. Substituting in (41.10),
$$P^x\big(p(X_t) \text{ hits } f \text{ before } e\big) = r(p^{-1}(x)) = \frac{p(p^{-1}(x)) - e}{f - e} = \frac{x - e}{f - e},$$
which is the formula we wanted.

Note that if $X_t$ is on natural scale, then so is $c_1 X_t + c_2$ for any constants $c_1 > 0$, $c_2 \in \mathbb R$.
Theorem 41.3 There exists a continuous strictly increasing function s such that s(Xt ) is on
natural scale on s(R).
Proof Let $J_n$ be closed intervals increasing up to $\mathbb R$. Pick two points in $J_1$; label them $a$ and $b$ with $a < b$. Choose $A_n$ and $B_n$ so that if $s_n(x) = A_n p_{J_n}(x) + B_n$, then $s_n(a) = 0$ and $s_n(b) = 1$. We will show that if $n \ge m$, then $s_n = s_m$ on $J_m$. Once we have that, we can set $s(x) = s_n(x)$ on $J_n$, and the theorem will be proved.

Suppose $J_m = [e,f]$. By Proposition 41.2, both $s_m(X_t)$ and $s_n(X_t)$ are on natural scale. For all $x \in J_m$,
$$\frac{s_m(x) - s_m(e)}{s_m(f) - s_m(e)} = P^{s_m(x)}\big(s_m(X_t) \text{ hits } s_m(f) \text{ before } s_m(e)\big) = P^x(X_t \text{ hits } f \text{ before } e).$$
We have a similar equation with $s_m$ replaced everywhere by $s_n$. It follows that
$$\frac{s_m(x) - s_m(e)}{s_m(f) - s_m(e)} = \frac{s_n(x) - s_n(e)}{s_n(f) - s_n(e)}$$
for all $x \in J_m$, which implies that $s_n(x) = C s_m(x) + D$ for some constants $C$ and $D$. Since $s_n$ and $s_m$ are equal at both $x = a$ and $x = b$, $C$ must be 1 and $D$ must be 0.

41.3 Speed measures

Suppose that $(P^x, X_t)$ is a regular diffusion on $\mathbb R$ on natural scale. For each finite interval $(a,b)$, define
$$G_{ab}(x,y) = \begin{cases} \dfrac{2(x-a)(b-y)}{b-a}, & a < x \le y < b, \\[1ex] \dfrac{2(y-a)(b-x)}{b-a}, & a < y \le x < b, \end{cases} \qquad (41.12)$$
and set $G_{ab}(x,y) = 0$ if $x$ or $y$ is not in $(a,b)$. A measure $m(dx)$ is the speed measure for the diffusion $(X_t, P^x)$ if
$$E^x \tau_{(a,b)} = \int G_{ab}(x,y)\,m(dy) \qquad (41.13)$$
for each finite interval $(a,b)$ and each $x \in (a,b)$. As (41.13) indicates, the speed measure governs how quickly the diffusion moves through intervals.

As an example, let us argue that the speed measure for Brownian motion is Lebesgue measure. By Proposition 3.16, if $(X_t, P^x)$ is a Brownian motion, $E^x \tau_{(a,b)} = (x-a)(b-x)$. On the other hand, a calculation shows that
$$\int G_{ab}(x,y)\,dy = \frac{(b-x)(x-a)^2}{b-a} + \frac{(x-a)(b-x)^2}{b-a} = (x-a)(b-x).$$
Since $E^x \tau_{(a,b)} = \int G_{ab}(x,y)\,dy$ and Brownian motion is on natural scale, we see that the speed measure $m(dy)$ of Brownian motion is equal to Lebesgue measure.

We will show that (1) a regular diffusion on natural scale has one and only one speed measure, (2) the law of the diffusion is determined by the speed measure, and (3) there exists a diffusion with a given speed measure.
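The Green function (41.12) and the right-hand side of (41.13) are simple to compute. The Python sketch below, with hypothetical function names, handles a speed measure with a density, $m(dy) = \rho(y)\,dy$, using the midpoint rule. With $\rho \equiv 1$ it reproduces the Brownian computation $\int G_{ab}(x,y)\,dy = (x-a)(b-x)$.

```python
from typing import Callable


def green(a: float, b: float, x: float, y: float) -> float:
    """G_ab(x, y) from (41.12); zero unless both x and y lie in (a, b)."""
    if not (a < x < b and a < y < b):
        return 0.0
    if x <= y:
        return 2.0 * (x - a) * (b - y) / (b - a)
    return 2.0 * (y - a) * (b - x) / (b - a)


def expected_exit_time(a: float, b: float, x: float,
                       density: Callable[[float], float], n: int = 4000) -> float:
    """Right side of (41.13) for a speed measure m(dy) = density(y) dy (midpoint rule)."""
    h = (b - a) / n
    return h * sum(green(a, b, x, a + (k + 0.5) * h) * density(a + (k + 0.5) * h)
                   for k in range(n))


if __name__ == "__main__":
    # Lebesgue speed measure (Brownian motion): E^1 tau_(0,3) = (1 - 0)(3 - 1) = 2
    print(expected_exit_time(0.0, 3.0, 1.0, lambda y: 1.0))
```

Since $G_{ab}(x,\cdot)$ is piecewise linear, the midpoint rule is exact except on the single cell containing the kink at $y = x$. So the error is of order $h^2$.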
We first want to show that any speed measure must satisfy $0 < m(a,b) < \infty$ for any finite interval $[a,b]$. To start we have the following lemma.

Lemma 41.4 If $[a,b]$ is a finite interval, then $\sup_x E^x \tau_{(a,b)}^k < \infty$ for each positive integer $k$.

Proof Pick $y \in (a,b)$. Since $X_t$ is a regular diffusion, $P^y(T_a < \infty) = 1$, and hence there exists $t_0$ such that $P^y(T_a > t_0) < 1/2$. Similarly, taking $t_0$ larger if necessary, $P^y(T_b > t_0) \le 1/2$. If $a < x \le y$, then
$$P^x(\tau_{(a,b)} > t_0) \le P^x(T_a > t_0) \le P^y(T_a > t_0) \le 1/2,$$
and similarly, $P^x(\tau_{(a,b)} > t_0) \le 1/2$ if $y \le x < b$. By the Markov property,
$$P^x(\tau_{(a,b)} > (n+1)t_0) = E^x\big[P^{X(nt_0)}(\tau_{(a,b)} > t_0);\ \tau_{(a,b)} > nt_0\big] \le \tfrac12 P^x(\tau_{(a,b)} > nt_0),$$
and by induction, $P^x(\tau_{(a,b)} > nt_0) \le 2^{-n}$. The lemma is now immediate.
Lemma 41.5 If $(X_t, P^x)$ has a speed measure $m$ and $[a,b]$ is a non-empty finite interval, then $0 < m(a,b) < \infty$.

Proof If $m(a,b) = 0$, then for $x \in (a,b)$, we have
$$E^x \tau_{(a,b)} = \int G_{ab}(x,y)\,m(dy) = 0,$$
which implies $\tau_{(a,b)} = 0$, $P^x$-a.s., a contradiction to the continuity of the paths of $X_t$.

Next we show the finiteness of $m(a,b)$. Pick $(e,f)$ such that $[a,b] \subset (e,f)$ and fix $x \in (a,b)$. There exists a constant $c > 0$ such that $G_{ef}(x,y) \ge c$ for $x, y \in (a,b)$, so
$$m(a,b) \le c^{-1} \int_e^f G_{ef}(x,y)\,m(dy) = c^{-1} E^x \tau_{(e,f)} < \infty$$
by Lemma 41.4. This completes the proof.

Theorem 41.6 A regular diffusion on natural scale on $\mathbb R$ has one and only one speed measure.

Proof First let $I = (e,f)$ be a finite open interval. For $n > 1$ let $x_i = e + i(f-e)/2^n$, $i = 0, 1, 2, \dots, 2^n$. Let $D_n = \{x_i : 0 \le i \le 2^n\}$. Let
$$m_n(dx) = 2^n \sum_{i=1}^{2^n - 1} B(x_i)\,\delta_{x_i}, \qquad (41.14)$$
where $B(x_i) = E^{x_i}\tau_{(x_{i-1}, x_{i+1})}$. We first want to show that if $[a,b]$ is a subinterval of $I$ with $a$, $b$ each in $D_n$ and $x$ is also in $D_n$, then
$$E^x \tau_{(a,b)} = \int G_{ab}(x,y)\,m_n(dy). \qquad (41.15)$$
To see this, let $S_0 = 0$ and $S_{j+1} = \inf\{t > S_j : |X_t - X_{S_j}| = 2^{-n}\} \wedge \tau_{(a,b)}$. The $S_j$'s are the successive times that $X$ moves $2^{-n}$, up until the time of leaving $(a,b)$. Because $X$ is on natural scale, $X_{S_{j+1}}$ is equal to $X_{S_j} + 2^{-n}$ with probability $\frac12$ and equal to $X_{S_j} - 2^{-n}$ with probability $\frac12$, until leaving $(a,b)$. Therefore $X_{S_j}$ is a simple symmetric random walk on the lattice with step size $2^{-n}$, stopped on leaving $(a,b)$.
Let $J(x_i) = (x_i - 2^{-n}, x_i + 2^{-n})$ for $x_i \neq a, b$. Let $J(a) = J(b) = \emptyset$. By repeated use of the strong Markov property,
$$E^x \tau_{(a,b)} = \sum_{j=0}^\infty E^x(S_{j+1} - S_j) = E^x \sum_{j=0}^\infty E^{X(S_j)}\big[\tau_{J(X_0)}\big] = E^x \sum_{j=0}^\infty B(X_{S_j})\,1_{(a,b)}(X_{S_j}).$$
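Let $N_i = \sum_{j=0}^\infty 1_{\{x_i\}}(X_{S_j})$, the number of visits to $x_i$ before exiting $(a,b)$. Then
$$E^x \tau_{(a,b)} = E^x \sum_{j=0}^\infty B(X_{S_j})\,1_{(a,b)}(X_{S_j}) = E^x \sum_{j=0}^\infty \sum_{i=1}^{2^n-1} B(X_{S_j})\,1_{\{x_i\}}(X_{S_j}) = E^x \sum_{i=1}^{2^n-1} B(x_i) N_i. \qquad (41.16)$$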
$E^x N_i$ must equal 0 when $x = a$ or $x = b$ and satisfies the equation
$$E^{x_j} N_i = \delta_{ij} + \tfrac12\big(E^{x_{j+1}} N_i + E^{x_{j-1}} N_i\big), \qquad (41.17)$$
where $\delta_{ij}$ is 1 if $i = j$ and 0 otherwise. This holds because for $j \neq i$, the process goes left or right, each with probability $1/2$, while if $j = i$, we add one to $N_i$ before going left or right. The function $x \to E^x N_i$ is hence piecewise linear on $(a, x_i)$ and on $(x_i, b)$. Some algebra shows that we must have
$$E^x N_i = 2^n G_{ab}(x, x_i). \qquad (41.18)$$
Combining (41.16) and (41.18),
$$E^x \tau_{(a,b)} = \sum_{i=1}^{2^n-1} B(x_i)\, 2^n G_{ab}(x, x_i) = \int G_{ab}(x,y)\,m_n(dy),$$
which is (41.15).
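The "some algebra" step behind (41.18) can be checked by solving (41.17) directly. The Python sketch below has hypothetical function names. It takes the lattice $x_j = a + jh$, $j = 0, \dots, K$, with spacing $h = 2^{-n}$, so that $2^n G_{ab} = G_{ab}/h$. It solves the tridiagonal system $-\frac12 v_{j-1} + v_j - \frac12 v_{j+1} = \delta_{ij}$ with $v_0 = v_K = 0$ using the Thomas algorithm, which is our choice of solver. It then compares the result with $G_{ab}(x_j, x_i)/h$.

```python
from typing import List


def green(a: float, b: float, x: float, y: float) -> float:
    """G_ab(x, y) from (41.12); zero unless both x and y lie in (a, b)."""
    if not (a < x < b and a < y < b):
        return 0.0
    if x <= y:
        return 2.0 * (x - a) * (b - y) / (b - a)
    return 2.0 * (y - a) * (b - x) / (b - a)


def expected_visits(K: int, i: int) -> List[float]:
    """v_j = E^{x_j} N_i for j = 0..K, solving (41.17) with v_0 = v_K = 0.

    Interior equations: -v_{j-1}/2 + v_j - v_{j+1}/2 = delta_ij, j = 1..K-1,
    solved by the Thomas algorithm (forward elimination, back substitution).
    Requires K >= 2 and 1 <= i <= K - 1.
    """
    m = K - 1
    rhs = [1.0 if j + 1 == i else 0.0 for j in range(m)]
    c = [0.0] * m  # modified super-diagonal coefficients
    d = [0.0] * m  # modified right-hand side
    c[0] = -0.5
    d[0] = rhs[0]
    for j in range(1, m):
        denom = 1.0 + 0.5 * c[j - 1]  # diagonal 1 minus (lower -1/2) * c[j-1]
        c[j] = -0.5 / denom
        d[j] = (rhs[j] + 0.5 * d[j - 1]) / denom
    v = [0.0] * m
    v[-1] = d[-1]
    for j in range(m - 2, -1, -1):
        v[j] = d[j] - c[j] * v[j + 1]
    return [0.0] + v + [0.0]


if __name__ == "__main__":
    K = 16  # (a, b) = (0, 1) with h = 2^{-4}
    v = expected_visits(K, 5)
    print([round(val, 6) for val in v])
    print([round(K * green(0.0, 1.0, j / K, 5 / K), 6) for j in range(K + 1)])  # same values
```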

Using (41.15) and the same proof as that of Lemma 41.5, $m_n(a,b)$ is bounded above by a constant independent of $n$. By a diagonalization procedure, there exists a subsequence $n_k$ such that $m_{n_k}$ converges weakly to $m$, where $m$ is a measure that is finite on every subinterval $(a,b)$ such that $[a,b] \subset I$. By the continuity of $G_{ab}$,
$$E^x \tau_{(a,b)} = \int G_{ab}(x,y)\,m(dy) \qquad (41.19)$$
whenever $a$, $b$, and $x$ are in $D_n$ for some $n$.
We now remove this last restriction. If $a$, $b$ are not of this form, take $a_r$, $b_r$ to be in $\cup_n D_n$ such that $(a_r, b_r) \uparrow (a,b)$. Then $\tau_{(a_r,b_r)} \uparrow \tau_{(a,b)}$, and by the continuity of $G_{ab}$ in $a$, $b$, $x$, and $y$, we have (41.19) for all $a$ and $b$ (with $x \in \cup_n D_n$). Take $y_r \uparrow x$, $z_r \downarrow x$ such that $y_r$ and $z_r$ are in $D_n$ for some $n$. By the strong Markov property,
$$E^x \tau_{(a,b)} = E^x \tau_{(y_r,z_r)} + E^{y_r}\tau_{(a,b)}\,P^x(X_{\tau_{(y_r,z_r)}} = y_r) + E^{z_r}\tau_{(a,b)}\,P^x(X_{\tau_{(y_r,z_r)}} = z_r).$$
By the continuity of $G_{ab}$ in $x$, and the fact that $E^x \tau_{(y_r,z_r)} \to 0$ as $r \to \infty$, we obtain (41.19) for all $x$.
We leave the uniqueness as Exercise 41.3.
Finally, let $I_k$ be finite subintervals increasing up to $\mathbb R$. Let $m_k$ be the speed measure for $X_t$ on the interval $I_k$. By the uniqueness result, $m_k$ agrees with $m_\ell$ on $I_\ell$ if $I_\ell \subset I_k$. Setting $m$ to be the measure whose restriction to $I_k$ is $m_k$ gives us the speed measure.
The speed measure completely characterizes occupation times.

Corollary 41.7 Suppose $X_t$ is a diffusion on natural scale on $\mathbb R$. If $f$ is bounded and measurable, then for each $a < b$,
$$E^x \int_0^{\tau_{(a,b)}} f(X_s)\,ds = \int G_{ab}(x,y) f(y)\,m(dy). \qquad (41.20)$$

Proof Suppose that $f$ is continuous and bounded on $[a,b]$. Let $x_i$, $S_j$, $B(x_i)$, $N_i$, and $m_n$ be as in the proof of Theorem 41.6. Let $\varepsilon_n = \sup\{|f(x) - f(y)| : |x - y| \le 2^{-n}\}$. Note that if $(x-a)/(b-a)$ is a multiple of $2^{-n}$,
$$E^x \int_0^{\tau_{(a,b)}} f(X_s)\,ds = \sum_{j=0}^\infty E^x \int_{S_j}^{S_{j+1}} f(X_s)\,ds \qquad (41.21)$$
and
$$E^x \sum_{j=0}^\infty f(X_{S_j})(S_{j+1} - S_j) = E^x \sum_{j=0}^\infty f(X_{S_j})\,1_{(a,b)}(X_{S_j})\,E^{X_{S_j}} S_1 = \sum_{i=1}^{2^n-1} f(x_i) B(x_i)\,1_{(a,b)}(x_i)\,E^x N_i. \qquad (41.22)$$
Moreover, the right-hand side of (41.21) differs from the left-hand side of (41.22) by at most $\varepsilon_n E^x \tau_{(a,b)}$. By (41.18) the right-hand side of (41.22) is equal to
$$\sum_{i=1}^{2^n-1} 2^n f(x_i) B(x_i)\,1_{(a,b)}(x_i)\,G_{ab}(x, x_i) = \int G_{ab}(x,y) f(y)\,m_n(dy).$$
By weak convergence along an appropriate subsequence, the left-hand side and the right-hand side of (41.20) differ by at most $\limsup_n \varepsilon_n E^x \tau_{(a,b)}$, which is zero. A limit argument then shows that (41.20) holds for all $x \in [a,b]$, and another limit argument shows that (41.20) holds for all bounded $f$.

41.4 The uniqueness theorem

We next turn to showing that the speed measure characterizes the law of a diffusion.

Theorem 41.8 If $(X_t, P^x_i)$, $i = 1, 2$, are two diffusions on natural scale with the same speed measure $m$, then $P^x_1 = P^x_2$.

Proof We start by letting $(a,b) \subset \mathbb R$ and considering the operator
$$R^i_\lambda f(x) = E^x_i \int_0^{\tau_{(a,b)}} e^{-\lambda t} f(X_t)\,dt, \qquad \lambda \ge 0, \qquad (41.23)$$
for $i = 1, 2$. We show first that $R^1_0 = R^2_0$, that is, that
$$E^x_1 \int_0^{\tau_{(a,b)}} f(X_t)\,dt = E^x_2 \int_0^{\tau_{(a,b)}} f(X_t)\,dt$$
if $f$ is bounded and Borel measurable. This is easy, because by Corollary 41.7, both sides are equal to $\int_a^b G_{ab}(x,y) f(y)\,m(dy)$. Since $(\widehat X_t, P^x_i)$ is a Markov process, where $\widehat X$ is the process $X$ killed on exiting $(a,b)$, the resolvent equation (37.2) holds. We have
$$\|R^i_0 f\|_\infty \le \|f\|_\infty \sup_x E^x \tau_{(a,b)} = \|f\|_\infty \sup_x \int G_{ab}(x,y)\,m(dy) \le c\|f\|_\infty\, m(a,b) < \infty.$$
Since $\|R^i_0\|_\infty < \infty$, we can let $\mu$ go to zero in (37.2).
We can repeat the proof of Corollary 37.3 with $\lambda=0$ to see that
$$R^i_\mu f=R^i_0 f+\sum_{j=1}^{\infty}(-\mu)^j\,(R^i_0)^{j+1}f$$
provided $\mu<1/\|R^i_0\|_\infty$, so that the series converges geometrically. We can then use Remark 37.4 to obtain that $R^1_\lambda=R^2_\lambda$ for all $\lambda>0$. We now take open intervals $I_n$ increasing up to $\mathbb{R}$. Applying the above to $I_n$ and letting $n\to\infty$,
we have
$$E^x_1\int_0^{\infty}e^{-\lambda t}f(X_t)\,dt=E^x_2\int_0^{\infty}e^{-\lambda t}f(X_t)\,dt$$
whenever f is bounded and Borel measurable and x ∈ R.
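The expansion of $R^i_\mu$ in powers of $R^i_0$ used above can be sanity-checked in a finite setting. The sketch below assumes a hypothetical two-state Markov chain killed at rate 0.5 in each state, with generator `Q`; there the resolvent is the matrix $(\mu I-Q)^{-1}$, the norm $\|\cdot\|_\infty$ of an operator on bounded functions is the maximal absolute row sum, and the partial sums of $\sum_{j\ge0}(-\mu)^jR_0^{j+1}$ should reproduce $R_\mu$ whenever $\mu\|R_0\|_\infty<1$.

```python
# Sketch: Neumann-series form of the resolvent identity for a hypothetical
# two-state killed Markov chain with generator Q (row sums <= 0).

def mat_mul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_inv(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def resolvent(Q, mu):
    """R_mu = (mu I - Q)^{-1}, the finite-state analogue of the resolvent."""
    return mat_inv([[mu - Q[0][0], -Q[0][1]],
                    [-Q[1][0], mu - Q[1][1]]])

def sup_norm(A):
    """Operator norm on bounded functions: maximal absolute row sum."""
    return max(abs(A[i][0]) + abs(A[i][1]) for i in range(2))

def neumann_resolvent(R0, mu, terms=200):
    """Partial sum of sum_{j>=0} (-mu)^j R0^{j+1}."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    power = R0  # holds R0^{j+1} at step j
    for j in range(terms):
        coef = (-mu) ** j
        total = [[total[i][k] + coef * power[i][k] for k in range(2)]
                 for i in range(2)]
        power = mat_mul(power, R0)
    return total

# Jump rates 1.0 and 2.0 between the states, killing at rate 0.5 in each state.
Q = [[-1.5, 1.0],
     [2.0, -2.5]]
R0 = resolvent(Q, 0.0)
mu = 0.5 / sup_norm(R0)   # mu * ||R0|| = 1/2, inside the radius of convergence
direct = resolvent(Q, mu)
series = neumann_resolvent(R0, mu)
```

Because the killing rate is the same in both states, $R_\mu 1=1/(\mu+0.5)$, which gives an independent check on the computed matrices.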
Suppose $f$ is continuous as well. By the uniqueness of the Laplace transform, we see that $E^x_1 f(X_t)=E^x_2 f(X_t)$ for almost every $t$, and since both terms are continuous in $t$, this equality holds for all $t$. By a limit argument, this equality holds for all bounded and Borel measurable $f$. Therefore the one-dimensional distributions of $X$ under $P^x_1$ and $P^x_2$ agree.
If $s<t$ and $f$ and $g$ are bounded and Borel measurable,
$$\begin{aligned}E^x_1[f(X_s)g(X_t)]&=E^x_1[f(X_s)P^1_{t-s}g(X_s)]=E^x_1[f(X_s)P^2_{t-s}g(X_s)]=E^x_1[(fP^2_{t-s}g)(X_s)]\\&=E^x_2[(fP^2_{t-s}g)(X_s)]=E^x_2[f(X_s)P^2_{t-s}g(X_s)]=E^x_2[f(X_s)g(X_t)].\end{aligned}$$
Here $P^i_{t-s}$ is the semigroup for $(X_t,P^x_i)$; since the one-dimensional distributions agree, $P^1_{t-s}=P^2_{t-s}$. We have thus shown that the two-dimensional distributions of $X$ under $P^x_1$ and $P^x_2$ agree. Continuing, we see that all the finite-dimensional distributions under $P^x_1$ and $P^x_2$ agree. By the continuity of the paths of $X$ and Theorem 2.6, that is enough to show equality of $P^x_1$ and $P^x_2$.

41.5 Time change

We now want to show that if $m$ is a measure such that $0<m(a,b)<\infty$ for all intervals $[a,b]$, then there exists a regular diffusion on natural scale on $\mathbb{R}$ having $m$ as a speed measure. If $m(dx)$ had a density, say $m(dx)=r(x)\,dx$, we would proceed as follows. Let $W_t$ be a one-dimensional Brownian motion and let
$$A_t=\int_0^t r(W_s)\,ds,\qquad B_t=\inf\{u:A_u>t\},\qquad X_t=W_{B_t}.$$
In other words, we let $X_t$ be a certain time change of Brownian motion. In general, where $m(dx)$ does not have a density, we make use of the local times $L^x_t$ of Brownian motion; see Chapter 14.
Let
$$A_t=\int L^x_t\,m(dx),\qquad B_t=\inf\{u:A_u>t\},\qquad X_t=W_{B_t}. \tag{41.24}$$
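When $m(dx)=r(x)\,dx$, the clock $A_t=\int_0^t r(W_s)\,ds$, its inverse $B_t=\inf\{u:A_u>t\}$, and $X_t=W_{B_t}$ can be imitated on a time grid. The sketch below is a crude discretization, assuming a hypothetical density `r(x) = 1 + x**2`, a step size `dt`, and Gaussian increments for the Brownian path; it is meant only to show how $B$ is read off from $A$, not to be an accurate simulation scheme.

```python
import bisect
import math
import random

def brownian_path(n_steps, dt, x0=0.0, seed=0):
    """Discretized Brownian path W_0, W_dt, ..., W_{n_steps * dt}."""
    rng = random.Random(seed)
    w = [x0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w

def additive_functional(w, dt, r):
    """Left-endpoint Riemann sums for A_t = int_0^t r(W_s) ds at the grid times."""
    a = [0.0]
    for x in w[:-1]:
        a.append(a[-1] + r(x) * dt)
    return a

def inverse_clock_index(a, t):
    """Grid index k with a[k-1] <= t < a[k]; B_t is approximately k * dt."""
    k = bisect.bisect_right(a, t)   # first index whose value exceeds t
    if k == len(a):
        raise ValueError("path too short: A never exceeds t")
    return k

def time_changed(w, a, t):
    """Approximate X_t = W_{B_t} by reading W at the grid time chosen above."""
    return w[inverse_clock_index(a, t)]

# Hypothetical speed density; r >= 1 keeps A strictly increasing and A_t >= t.
r = lambda x: 1.0 + x * x
dt = 1e-3
w = brownian_path(20_000, dt, seed=42)
a = additive_functional(w, dt, r)
clock = [inverse_clock_index(a, t) * dt for t in (0.1, 0.2, 0.5)]
x_half = time_changed(w, a, 0.5)
```

Since $r\ge1$ here, $A_u\ge u$ and therefore $B_t\le t$ (up to one grid step), which is a quick consistency check on the inverse clock.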
Theorem 41.9 Let $(W_t,P^x)$ be a Brownian motion and $m$ a measure on $\mathbb{R}$ such that $0<m(a,b)<\infty$ for every finite interval $(a,b)$. Then, under $P^x$, $X_t$ as defined by (41.24) is a regular diffusion on natural scale with speed measure $m$.

Proof First we show that $X_t$ is a continuous process. Fix $\omega$. If we choose $a<\inf_{s\le t}W_s$ and $b>\sup_{s\le t}W_s$, then
$$A_t=\int L^x_t\,m(dx)=\int L^x_t\,1_{[a,b]}(x)\,m(dx)$$
since $L^x_t$ increases only for those times $s$ when $W_s=x$. By the continuity of $L^x_t$ and dominated convergence, we conclude that $A_t(\omega)$ is continuous at time $t$. Next we show that $A_t$ is strictly increasing. Fix $\omega$. If $s<u$, pick $t\in(s,u)$. Set $x=W_t$. Because the support of the measure $dL^x_t$ is the set $\{r:W_r=x\}$, then $L^x_u-L^x_s>0$. By the continuity of local times, $L^y_u-L^y_s>0$ for all $y$ in a neighborhood of $x$, say $(x-\delta,x+\delta)$. Since $m(x-\delta,x+\delta)>0$, then $A_u-A_s>0$. Hence $A_t$ is strictly increasing. This and the continuity of $A_t$ imply that $B_t$ is continuous, and therefore $X_t$ is continuous.
Next we show that $X_t$ is a regular diffusion on natural scale. By monotone convergence and the fact that $L^x_t\to\infty$, a.s., for each $x$, $A_t\uparrow\infty$, hence $B_t\uparrow\infty$, so $\tau^X_{(a,b)}<\infty$, $P^x$-a.s., where $\tau^X_{(a,b)}$ denotes the exit time of $(a,b)$ by $X_t$ and $\tau^W_{(a,b)}$ denotes the corresponding exit time of $W_t$. Moreover,
$$P^x\big(X(\tau^X_{(a,b)})=b\big)=P^x\big(W(\tau^W_{(a,b)})=b\big)=\frac{x-a}{b-a},$$
since $X_t$ is a time change of $W_t$. To verify the strong Markov property, we repeat the argument of Section 22.3. Let $\mathcal{F}'_t=\mathcal{F}_{B_t}$. Then if $T$ is a stopping time for $\mathcal{F}'_t$, we have
$$E^x[f(X_{T+t})\mid\mathcal{F}'_T]=E^x[f(W(B_{T+t}))\mid\mathcal{F}_{B_T}].$$
$B_T$ can be seen to be a stopping time for $\mathcal{F}_t$, and $B_{T+t}=B_T+B_t\circ\theta_{B_T}$, where $\theta_t$ are the shift operators, so this is $E^{W(B_T)}f(W_{B_t})=E^{X_T}f(X_t)$. As in Section 20.3, this suffices to show that $X_t$ is a strong Markov process.

It remains to determine the speed measure of $X_t$. Fix $(a,b)$ and write $\tau^X$ for $\tau^X_{(a,b)}$ and $\tau^W$ for $\tau^W_{(a,b)}$. We have
$$\begin{aligned}E^x\tau^X&=E^x\int_0^\infty 1_{(a,b)}(X_{s\wedge\tau^X})\,ds=E^x\int_0^\infty 1_{(a,b)}(W_{B_{s\wedge\tau^X}})\,ds=E^x\int_0^\infty 1_{(a,b)}(W_{t\wedge\tau^W})\,dA_t\\&=E^x\int\int_0^\infty 1_{(a,b)}(W_{t\wedge\tau^W})\,dL^y_t\,m(dy)=E^x\int\int_0^{\tau^W}dL^y_t\,m(dy)=\int E^xL^y_{\tau^W}\,m(dy).\end{aligned}$$
We also have $E^xL^y_{\tau^W}=E^x|W_{\tau^W}-y|-|x-y|$ by (14.5). This is equal to
$$|a-y|\,P^x(W_{\tau^W}=a)+|b-y|\,P^x(W_{\tau^W}=b)-|x-y|=|a-y|\frac{b-x}{b-a}+|b-y|\frac{x-a}{b-a}-|x-y|=G_{ab}(x,y).$$
We thus have $E^x\tau^X=\int G_{ab}(x,y)\,m(dy)$, as required.

As a corollary to the proof, we see that a regular diffusion on natural scale is a local martingale, since it is a time change of Brownian motion.

41.6 Examples

Let us calculate the scale function and the speed measure for some examples of diffusions. First we need to connect the speed measure with the coefficients of an SDE. Let us look at the solutions to the SDE (41.4), but now suppose $b$ is identically zero, that is, $dX_t=\sigma(X_t)\,dW_t$. We again set $a(x)=\sigma(x)^2$.

Theorem 41.10 Suppose $c_1<\sigma(x)<c_2$ for all $x$ and $\sigma$ is continuous. The speed measure of $X_t$ is given by
$$m(dx)=\frac{1}{a(x)}\,dx.$$

Proof Since $dX_t=\sigma(X_t)\,dW_t$, then $\langle X\rangle_t=\int_0^t a(X_s)\,ds$.
To obtain a Brownian motion $\overline W_t$ by time-changing the martingale $X_t$, we must time-change by the inverse of $\langle X\rangle_t$. On the other hand, from Theorem 41.9, $X_t$ is the time change of a Brownian motion by $B_t$, where $B_t$ is given by (41.24). Hence
$$B_t=\langle X\rangle_t=\int_0^t a(X_s)\,ds.$$
The inverse of $B_t$, namely $A_t$, must then satisfy
$$\frac{dA_t}{dt}=\frac{1}{a(X_{A_t})}=\frac{1}{a(W_t)},\quad\text{or}\quad A_t=\int_0^t\frac{1}{a(W_s)}\,ds=\int L^y_t\,\frac{1}{a(y)}\,dy$$
for all $t$, using Theorem 14.4. However, $A_t=\int L^y_t\,m(dy)$ by (41.24). Hence
$$\int L^y_t\,\frac{1}{a(y)}\,dy=\int L^y_t\,m(dy).$$
We know $E^xL^y_{\tau_{(c,d)}}=G_{cd}(x,y)$. Therefore
$$\int G_{cd}(x,y)\,m(dy)=\int E^xL^y_{\tau_{(c,d)}}\,m(dy)=\int E^xL^y_{\tau_{(c,d)}}\,\frac{1}{a(y)}\,dy=\int G_{cd}(x,y)\,\frac{1}{a(y)}\,dy$$
for all $c$, $d$, and $x$, which implies $m(dy)=(1/a(y))\,dy$.

Now we can look at some examples and do calculations.

Brownian motion with constant drift. This process is the solution to the SDE $dX_t=dW_t+b\,dt$. From Theorem 41.1, $s(x)=\exp(-2bx)$ is the scale function. If $Y_t=s(X_t)$, then $(s'\sigma)(s^{-1}(y))=-2by$, so $Y_t$ corresponds to the operator $2b^2y^2f''$, and the speed measure is $(4b^2y^2)^{-1}\,dy$.

Bessel processes. The process is only defined on the state space $[0,\infty)$ instead of all of $\mathbb{R}$, and there is a boundary condition at 0. We ignore this here and consider a Bessel process of order $\nu$ up until the first hit of 0. Then $X$ solves the SDE
$$dX_t=dW_t+\frac{\nu-1}{2X_t}\,dt.$$
If $\nu\ne2$, a calculation using Theorem 41.1 shows that $s(x)=x^{2-\nu}$. Then $Y_t=s(X_t)$ satisfies
$$dY_t=(2-\nu)\,Y_t^{(1-\nu)/(2-\nu)}\,dW_t,$$
and the speed measure is $m(dx)=(2-\nu)^{-2}x^{(2\nu-2)/(2-\nu)}\,dx$, $x>0$.
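Two formulas from this chapter lend themselves to a quick numerical check: the Green function $G_{ab}(x,y)=|a-y|\frac{b-x}{b-a}+|b-y|\frac{x-a}{b-a}-|x-y|$ from the proof of Theorem 41.9, and the speed-measure density $1/\big((s'\sigma)(s^{-1}(y))\big)^2$ of $Y_t=s(X_t)$, obtained by applying Theorem 41.10 (formally, as in the examples) to $dY_t=(s'\sigma)(X_t)\,dW_t$. The sketch assumes midpoint-rule quadrature, takes Brownian motion itself ($m(dy)=dy$, for which $E^x\tau_{(a,b)}=(x-a)(b-x)$) as the test case for $G_{ab}$, and uses arbitrary parameter values `b0 = 0.4` and `nu = 3.0`.

```python
import math

def green(a, b, x, y):
    """G_ab(x, y) as computed in the proof of Theorem 41.9."""
    return (abs(a - y) * (b - x) + abs(b - y) * (x - a)) / (b - a) - abs(x - y)

def expected_exit_time(a, b, x, speed_density, n=20_000):
    """Midpoint rule for int_a^b G_ab(x, y) m(dy) when m(dy) = speed_density(y) dy."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        y = a + (k + 0.5) * h
        total += green(a, b, x, y) * speed_density(y)
    return total * h

def speed_density_of_scaled(s_prime, sigma, s_inverse, y):
    """Density 1 / ((s' sigma)(s^{-1}(y)))^2 of the speed measure of Y = s(X)."""
    x = s_inverse(y)
    return 1.0 / (s_prime(x) * sigma(x)) ** 2

# Brownian motion itself: a(x) = 1, so m(dy) = dy and E^x tau = (x - a)(b - x).
bm_exit = expected_exit_time(0.0, 1.0, 0.3, lambda y: 1.0)

# Brownian motion with drift b0, s(x) = exp(-2 b0 x); text: (4 b0^2 y^2)^{-1}.
b0 = 0.4
drift_numeric = speed_density_of_scaled(
    s_prime=lambda x: -2 * b0 * math.exp(-2 * b0 * x),
    sigma=lambda x: 1.0,
    s_inverse=lambda y: -math.log(y) / (2 * b0),
    y=1.5,
)
drift_closed_form = 1.0 / (4 * b0 ** 2 * 1.5 ** 2)

# Bessel process of order nu, s(x) = x^(2 - nu); text: (2-nu)^{-2} y^{(2nu-2)/(2-nu)}.
nu = 3.0
bessel_numeric = speed_density_of_scaled(
    s_prime=lambda x: (2 - nu) * x ** (1 - nu),
    sigma=lambda x: 1.0,
    s_inverse=lambda y: y ** (1 / (2 - nu)),
    y=0.7,
)
bessel_closed_form = (2 - nu) ** -2 * 0.7 ** ((2 * nu - 2) / (2 - nu))
```

Since $G_{ab}(x,\cdot)$ is linear on each side of $x$, the midpoint rule is essentially exact here when $x$ falls on a cell boundary.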
Exercises
41.1 In the proof of Proposition 41.2 we used the strong Markov property numerous times. Write
out carefully in terms of shift operators and conditional expectations how the strong Markov
property is applied in each case.
41.2 Give a rigorous proof of (41.9).
41.3 Show that if
$$\int G_{ab}(x,y)\,m_1(dy)=\int G_{ab}(x,y)\,m_2(dy)$$
for all $x$, $a$, and $b$, then $m_1=m_2$.
41.4 Show that if $X$ is a Bessel process of order 2, then the scale function is given by $s(x)=\log x$, $Y_t=s(X_t)$ satisfies $dY_t=e^{-Y_t}\,dW_t$, and the speed measure is $m(dx)=e^{2x}\,dx$.
41.5 Suppose X is a regular diffusion whose state space is R. Prove that X is on natural scale if and
only if
$$P^{(a+b)/2}(T_a<T_b)=\tfrac12$$
whenever $a<b$.
41.6 Let $a>0$ and let $m(dx)=dx+a\,\delta_0(dx)$, where $\delta_0$ is the point mass at 0. Let $(X_t,P^x)$ be the diffusion on the line on natural scale whose speed measure is given by $m$. Show that under $P^0$,
$$\int_0^t 1_{\{0\}}(X_s)\,ds>0$$
with probability one for each $t>0$. Prove that for each $t>0$, $Z_t=\{s\le t:X_s=0\}$ contains no intervals. Thus the process $X$ spends an amount of time at 0 that has positive Lebesgue measure, but its zero set contains no intervals.
41.7 Define
$$m_a(dx)=\begin{cases}dx, & x\ge0,\\ a\,dx, & x<0.\end{cases}$$
Let $(X_t,P^x_a)$ be the diffusion on natural scale on the line whose speed measure is given by $m_a$. Suppose $x>0$.
Prove that if a → ∞, then Pxa converges weakly to the law of Brownian motion absorbed (i.e.,
killed) at 0, started at x. What do you think happens when a → 0?
Notes
We have considered diffusions on R but most of what we discussed goes through for diffusions
whose state space is an interval properly contained in R. In this case, one must specify what
the process does when it hits the boundary. Being absorbed (i.e., killed) or reflected are two
options, but much more complicated behavior is possible. See Itô and McKean (1965) and
Knight (1981) for the complete story.

42
Lévy processes
A Lévy process is a process with stationary and independent increments whose paths are right continuous with left limits. Having stationary increments means that the law of $X_t-X_s$ is the same as the law of $X_{t-s}-X_0$ whenever $s<t$. Saying that $X$ has independent increments means that $X_t-X_s$ is independent of $\sigma(X_r;r\le s)$ whenever $s<t$. We want to examine the structure of Lévy processes. We have three examples already: the Poisson process, Brownian motion, and the deterministic process $X_t=t$. It turns out that all Lévy processes can be built up out of these building blocks. We will show how to construct Lévy processes and give a representation of an arbitrary Lévy process. Recall that we use $X_{t-}=\lim_{s\uparrow t}X_s$ […] 0 such that $\sup_t|\Delta X_t|\le K$, a.s.
Lemma 42.3 If $X_t$ is a Lévy process with bounded jumps and with $X_0=0$, then $X_t$ has moments of all orders, that is, $E|X_t|^p<\infty$ for all positive integers $p$.

Proof Suppose the jumps of $X_t$ are bounded in absolute value by $K$. Since $X_t$ is right continuous with left limits, there exists $M>K$ such that $P(\sup_{s\le t}|X_s|\ge2M)\le1/2$.

Let $T_1=\inf\{t:|X_t|\ge M\}$ and $T_{i+1}=\inf\{t>T_i:|X_t-X_{T_i}|>M\}$. For $s<T_1$, $|X_s|\le M$, and then $|X_{T_1}|\le|X_{T_1-}|+|\Delta X_{T_1}|\le M+K\le2M$. We have
$$\begin{aligned}P\Big(\sup_{s\le t}|X_s|\ge2(i+1)M\Big)&\le P(T_{i+1}\le t)\le P(T_i\le t,\;T_{i+1}-T_i\le t)\\&=P\Big(\sup_{s\le t}|X_{T_i+s}-X_{T_i}|\ge2M,\;T_i\le t\Big)=P\Big(\sup_{s\le t}|X_s|\ge2M\Big)P(T_i\le t)\le\tfrac12P(T_i\le t),\end{aligned}$$
using Lemma 42.2 in the last equality. By induction, $P(\sup_{s\le t}|X_s|\ge2iM)\le2^{-i}$, and the lemma now follows, since
$$E\sup_{s\le t}|X_s|^p\le\sum_{i=0}^\infty(2(i+1)M)^p\,P\Big(\sup_{s\le t}|X_s|\ge2iM\Big)\le\sum_{i=0}^\infty(2(i+1)M)^p\,2^{-i}<\infty.$$

A key lemma is the following.

Lemma 42.4 Suppose $I$ is a finite interval of the form $(a,b)$, $[a,b)$, $(a,b]$, or $[a,b]$ with $a>0$ and $m$ is a finite measure on $\mathbb{R}$ giving no mass to $I^c$. Then there exists a Lévy process $X_t$ satisfying (42.3).
Proof First let us consider the case where $I=[a,b)$. We approximate $m$ by a discrete measure. If $n\ge1$, let $z_j=a+j(b-a)/n$, $j=0,\dots,n$, and let
$$m_n(dx)=\sum_{j=0}^{n-1}m([z_j,z_{j+1}))\,\delta_{z_j}(dx),$$
where $\delta_{z_j}$ is the point mass at $z_j$. The measures $m_n$ converge weakly to $m$ as $n\to\infty$ in the sense that
$$\int f(x)\,m_n(dx)\to\int f(x)\,m(dx)$$
whenever $f$ is a bounded continuous function on $\mathbb{R}$. For each $n$, let $P^{n,j}_t$, $j=0,\dots,n-1$, be independent Poisson processes with parameters $m([z_j,z_{j+1}))$ and let
$$X^n_t=\sum_{j=0}^{n-1}z_jP^{n,j}_t.$$
Then $X^n$ is a Lévy process with jumps bounded by $b$.
By Lemma 42.2, if $T_n$ is a stopping time for $X^n$, $\varepsilon>0$, and $\delta>0$, then
$$P(|X^n_{T_n+\delta}-X^n_{T_n}|>\varepsilon)=P(|X^n_\delta|>\varepsilon)\le P(X^n_\delta\ne0)\le P\Big(\sum_{j=0}^{n-1}P^{n,j}_\delta\ne0\Big). \tag{42.5}$$
Since the sum of independent Poisson processes is a Poisson process, $\sum_{j=0}^{n-1}P^{n,j}_t$ is a Poisson process with parameter
$$\sum_{j=0}^{n-1}m([z_j,z_{j+1}))=m(I).$$
The last line of (42.5) is then bounded by
$$1-e^{-\delta m(I)}\le\delta m(I),$$
which tends to zero uniformly in $n$ as $\delta\to0$. Note $X^n_0=0$, a.s. We can therefore apply the Aldous criterion (Theorem 34.8) to see that the $X^n$ are tight with respect to weak convergence on the space $D[0,t_0)$ for any $t_0$.
Any subsequential weak limit $X$ will have paths that are right continuous with left limits. For any continuous bounded function $f$ on $\mathbb{R}$,
$$Ef(X^n_t-X^n_s)=Ef(X^n_{t-s}-X^n_0).$$
Passing to the limit along an appropriate subsequence,
$$Ef(X_t-X_s)=Ef(X_{t-s}-X_0).$$
Since $f$ is an arbitrary bounded continuous function, we see that the laws of $X_t-X_s$ and $X_{t-s}-X_0$ are the same. Similarly we prove the increments are independent.
Since $x\mapsto e^{iux}$ is a bounded continuous function and $m_n$ converges weakly to $m$, starting with
$$E\exp(iuX^n_t)=\exp\Big(t\int\big[e^{iux}-1\big]\,m_n(dx)\Big),$$
and passing to the limit, we obtain that the characteristic function of $X$ under $P$ is given by (42.3).
If now the interval $I$ contains the point $b$, we follow the above proof, except we let $P^{n,n-1}_t$ be a Poisson process with parameter $m([z_{n-1},b])$. Similarly, if $I$ does not contain the point $a$, we change $P^{n,0}_t$ to be a Poisson process with parameter $m((a,z_1))$. With these changes, the proof works for intervals $I$, whether or not they contain either of their endpoints.
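The approximating processes $X^n_t=\sum_jz_jP^{n,j}_t$ in the proof of Lemma 42.4 are easy to simulate. The sketch below assumes, purely for illustration, that $m$ is 3 times Lebesgue measure on $I=[1,2)$, takes $n=8$, generates each Poisson process by summing exponential interarrival times, and compares the empirical characteristic function of $X^n_t$ with $\exp\big(t\int[e^{iux}-1]\,m_n(dx)\big)$.

```python
import cmath
import random

def discretize(a, b, n, mass_of):
    """Atoms z_j = a + j(b - a)/n carrying mass m([z_j, z_{j+1})), j = 0, ..., n-1."""
    z = [a + j * (b - a) / n for j in range(n + 1)]
    return [(z[j], mass_of(z[j], z[j + 1])) for j in range(n)]

def poisson_at(rate, t, rng):
    """Value at time t of a Poisson process with the given rate (exponential gaps)."""
    count, clock = 0, rng.expovariate(rate)
    while clock <= t:
        count += 1
        clock += rng.expovariate(rate)
    return count

def sample_xn(atoms, t, rng):
    """One draw of X^n_t = sum_j z_j P^{n,j}_t with independent Poisson processes."""
    return sum(z * poisson_at(w, t, rng) for z, w in atoms if w > 0)

def char_fn(atoms, t, u):
    """exp(t * int [e^{iux} - 1] m_n(dx)) for the discrete measure m_n."""
    return cmath.exp(t * sum(w * (cmath.exp(1j * u * z) - 1) for z, w in atoms))

# Hypothetical m: 3 times Lebesgue measure on [1, 2), so m([c, d)) = 3 (d - c).
atoms = discretize(1.0, 2.0, 8, lambda c, d: 3.0 * (d - c))
rng = random.Random(2024)
t, u, n_samples = 0.5, 1.3, 20_000
draws = [sample_xn(atoms, t, rng) for _ in range(n_samples)]
empirical = sum(cmath.exp(1j * u * x) for x in draws) / n_samples
exact = char_fn(atoms, t, u)
```

With 20,000 draws the Monte Carlo error of `empirical` is of order $1/\sqrt{20000}\approx0.007$, so agreement to a few hundredths is what one should expect.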
Remark 42.5 If $X$ is the Lévy process constructed in Lemma 42.4, then $Y_t=X_t-EX_t$ will be a Lévy process satisfying (42.4).
Here is the main theorem of this section.
Theorem 42.6 Suppose $m$ is a measure on $\mathbb{R}$ with $m(\{0\})=0$ and $\int(1\wedge x^2)\,m(dx)<\infty$. Suppose $b\in\mathbb{R}$ and $\sigma\ge0$. There exists a Lévy process $X_t$ such that
$$Ee^{iuX_t}=\exp\Big(t\Big\{iub-\sigma^2u^2/2+\int_{\mathbb{R}}\big[e^{iux}-1-iux1_{(|x|\le1)}\big]\,m(dx)\Big\}\Big). \tag{42.6}$$
The above equation is called the Lévy–Khintchine formula. The measure $m$ is called the Lévy measure. If we let
$$m(dx)=\frac{1+x^2}{x^2}\,m'(dx)$$
and
$$b=b'+\int_{(|x|\le1)}\frac{x^3}{1+x^2}\,m(dx)-\int_{(|x|>1)}\frac{x}{1+x^2}\,m(dx),$$
then we can also write
$$Ee^{iuX_t}=\exp\Big(t\Big\{iub'-\sigma^2u^2/2+\int_{\mathbb{R}}\Big[e^{iux}-1-\frac{iux}{1+x^2}\Big]\frac{1+x^2}{x^2}\,m'(dx)\Big\}\Big).$$
Both expressions for the Lévy–Khintchine formula are in common use.
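The change of parameters between the two forms can be checked numerically. The sketch below assumes a hypothetical asymmetric Lévy density $\rho(x)=e^{-|x|}(2\cdot1_{x>0}+1_{x\le0})$ (asymmetric so that $b\ne b'$), truncates the integrals at $|x|=40$, uses composite Simpson's rule on pieces split at $-1,0,1$, and evaluates both exponents with $b$ computed from $b'$ by the formula above.

```python
import cmath
import math

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule (n even) for a real- or complex-valued integrand."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

CUT = 40.0  # truncation point for |x| > 1; the density below is about e^{-40} there

def rho(x):
    """Hypothetical asymmetric Levy density of m (twice as heavy on the right)."""
    return math.exp(-abs(x)) * (2.0 if x > 0 else 1.0)

def over_small(f):
    """Integral over [-1, 1], split at 0 where rho is not smooth."""
    return simpson(f, -1.0, 0.0) + simpson(f, 0.0, 1.0)

def over_large(f):
    """Integral over 1 < |x| <= CUT."""
    return simpson(f, -CUT, -1.0) + simpson(f, 1.0, CUT)

def exponent_first_form(u, b, sigma):
    """Exponent inside the braces of (42.6)."""
    small = over_small(lambda x: (cmath.exp(1j * u * x) - 1 - 1j * u * x) * rho(x))
    large = over_large(lambda x: (cmath.exp(1j * u * x) - 1) * rho(x))
    return 1j * u * b - sigma ** 2 * u ** 2 / 2 + small + large

def exponent_second_form(u, b_prime, sigma):
    """Exponent of the second form, written literally with m'(dx) = x^2/(1+x^2) m(dx)."""
    def integrand(x):
        if x == 0.0:
            return 0.0  # the integrand tends to 0 as x -> 0
        m_prime = x * x / (1 + x * x) * rho(x)
        bracket = cmath.exp(1j * u * x) - 1 - 1j * u * x / (1 + x * x)
        return bracket * (1 + x * x) / (x * x) * m_prime
    total = over_small(integrand) + over_large(integrand)
    return 1j * u * b_prime - sigma ** 2 * u ** 2 / 2 + total

def b_from_b_prime(b_prime):
    """b = b' + int_{|x|<=1} x^3/(1+x^2) m(dx) - int_{|x|>1} x/(1+x^2) m(dx)."""
    return (b_prime
            + over_small(lambda x: x ** 3 / (1 + x * x) * rho(x))
            - over_large(lambda x: x / (1 + x * x) * rho(x)))

b_prime, sigma, u = 0.3, 0.8, 1.7
b = b_from_b_prime(b_prime)
first = exponent_first_form(u, b, sigma)
second = exponent_second_form(u, b_prime, sigma)
```

Because both forms are integrated on the same grids, the quadrature errors cancel and the two exponents should agree to rounding error, while `b` and `b_prime` differ noticeably.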
Proof Let $m(dx)$ be a measure supported on $(0,1]$ with $\int x^2\,m(dx)<\infty$. Let $m_n(dx)$ be the measure $m$ restricted to $(2^{-n},2^{-n+1}]$, $n\ge1$. Let $Y^n_t$ be independent Lévy processes whose characteristic functions are given by (42.4) with $m$ replaced by $m_n$; see Remark 42.5. Note $EY^n_t=0$ for all $n$ by Remark 42.1. By the independence of the $Y^n$'s, if $M<N$,
$$E\Big(\sum_{n=M}^NY^n_t\Big)^2=\sum_{n=M}^NE(Y^n_t)^2=\sum_{n=M}^Nt\int x^2\,m_n(dx)=t\int_{(2^{-N},\,2^{-M+1}]}x^2\,m(dx).$$
By our assumption on $m$, this goes to zero as $M,N\to\infty$, and we conclude that $\sum_{n=1}^NY^n_t$ converges in $L^2$ for each $t$. Call the limit $Y_t$. It is routine to check that $Y_t$ has independent and stationary increments. Each $Y^n_t$ has independent increments and is mean zero, so $E[Y^n_t-Y^n_s\mid\mathcal{F}_s]=E[Y^n_t-Y^n_s]=0$, that is, $Y^n$ is a martingale. By Doob's inequalities and the $L^2$ convergence,
$$E\sup_{s\le t}\Big|\sum_{n=M}^NY^n_s\Big|^2\to0$$
as $M,N\to\infty$, and hence there exists a subsequence $M_k$ such that $\sum_{n=1}^{M_k}Y^n_s$ converges uniformly over $s\le t$, a.s. Therefore the limit $Y_t$ will have paths that are right continuous with left limits.

If $m$ is a measure supported in $(1,\infty)$ with $m(\mathbb{R})<\infty$, we do a similar procedure starting with Lévy processes whose characteristic functions are of the form (42.3). We let $m_n(dx)$ be the restriction of $m$ to $(2^n,2^{n+1}]$, let $X^n_t$ be independent Lévy processes corresponding to $m_n$, and form $X_t=\sum_{n=0}^\infty X^n_t$. Since $m(\mathbb{R})<\infty$, for each $t_0$ the number of times $t$ less than $t_0$ at which any one of the $X^n_t$ jumps is finite. This shows that $X_t$ has paths that are right continuous with left limits, and it is easy to then see that $X_t$ is a Lévy process.

Finally, suppose $\int(x^2\wedge1)\,m(dx)<\infty$. Let $X^1_t$, $X^2_t$ be Lévy processes with characteristic functions given by (42.3) with $m$ replaced by the restriction of $m$ to $(1,\infty)$ and $(-\infty,-1)$, respectively, let $X^3_t$, $X^4_t$ be Lévy processes with characteristic functions given by (42.4) with $m$ replaced by the restriction of $m$ to $(0,1]$ and $[-1,0)$, respectively, let $X^5_t=bt$, and let $X^6_t$ be $\sigma$ times a Brownian motion. Suppose the $X^i$'s are all independent.
Then their sum will be a Lévy process whose characteristic function is given by (42.6).

A key step in the construction was the centering of the Poisson processes to get Lévy processes with characteristic functions given by (42.4). Without the centering one is forced to work only with characteristic functions given by (42.3).

42.3 Representation of Lévy processes

We now work toward showing that every Lévy process has a characteristic function of the form given by (42.6).

Lemma 42.7 If $X_t$ is a Lévy process and $A$ is a Borel subset of $\mathbb{R}$ that is a positive distance from 0, then
$$N_t(A)=\sum_{s\le t}1_A(\Delta X_s)$$
is a Poisson process.

Saying that $A$ is a positive distance from 0 means that $\inf\{|x|:x\in A\}>0$.
Proof Since $X_t$ has paths that are right continuous with left limits and $A$ is a positive distance from 0, there can only be finitely many jumps of $X$ that lie in $A$ in any finite time interval, and so $N_t(A)$ is finite and has paths that are right continuous with left limits. It follows from the fact that $X_t$ has stationary and independent increments that $N_t(A)$ also has stationary and independent increments. We now apply Proposition 5.4.
Theorem 42.8 Let $X_t$ be a Lévy process with $X_0=0$ and let $A_1,\dots,A_n$ be disjoint bounded Borel subsets of $(0,\infty)$, each a positive distance from 0. Set
$$N_t(A_k)=\sum_{s\le t}1_{A_k}(\Delta X_s)$$
and
$$Y_t=X_t-\sum_{k=1}^nN_t(A_k).$$
Then the processes $N_t(A_1),\dots,N_t(A_n)$, and $Y_t$ are mutually independent.
Proof Define $\lambda(A)=EN_1(A)$. The previous lemma shows that if $\lambda(A)<\infty$, then $N_t(A)$ is a Poisson process, and clearly its parameter is $\lambda(A)$. The result now follows from Theorem 18.3.

Here is the representation theorem for Lévy processes.

Theorem 42.9 Suppose $X_t$ is a Lévy process with $X_0=0$. Then there exists a measure $m$ on $\mathbb{R}\setminus\{0\}$ with $\int(1\wedge x^2)\,m(dx)<\infty$ and real numbers $b$ and $\sigma$ such that the characteristic function of $X_t$ is given by (42.6).

Proof Define $m(A)=EN_1(A)$ if $A$ is a bounded Borel subset of $(0,\infty)$ that is a positive distance from 0. Since $N_1(\cup_{k=1}^\infty A_k)=\sum_{k=1}^\infty N_1(A_k)$ if the $A_k$ are pairwise disjoint and each is a positive distance from 0, we see that $m$ is a measure on $[a,b]$ for each $0<a$ […] 1) is a Lévy process with characteristic function
$$\exp\Big(t\int_1^\infty\big[e^{iux}-1\big]\,m(dx)\Big).$$
Since the characteristic function of the sum of independent random variables is equal to the product of the characteristic functions, it suffices to suppose $0<a$ […] 1 and $z_j=a+j(b-a)/n$. By Lemma 42.7, $N_t((z_j,z_{j+1}])$ is a Poisson process with parameter
$$\ell_j=EN_1((z_j,z_{j+1}])=m((z_j,z_{j+1}]).$$
Thus $\sum_{j=0}^{n-1}z_jN_t((z_j,z_{j+1}])$ has characteristic function
$$\prod_{j=0}^{n-1}\exp\big(t\ell_j(e^{iuz_j}-1)\big)=\exp\Big(t\sum_{j=0}^{n-1}(e^{iuz_j}-1)\ell_j\Big),$$
which is equal to
$$\exp\Big(t\int(e^{iux}-1)\,m_n(dx)\Big), \tag{42.7}$$
where $m_n(dx)=\sum_{j=0}^{n-1}\ell_j\delta_{z_j}(dx)$. Since $Z^n_t$ converges to $Z_t$ as $n\to\infty$, passing to the limit shows that $Z_t$ has a characteristic function of the form (42.6).
Next we show that $m(1,\infty)<\infty$. (We write $m(1,\infty)$ instead of $m((1,\infty))$ for esthetic reasons.) If not, $m(1,K)\to\infty$ as $K\to\infty$. Then for each fixed $L$ and each fixed $t$, since $N_t(1,K)$ is a Poisson random variable with parameter $t\,m(1,K)$,
$$\limsup_{K\to\infty}P(N_t(1,K)\le L)=\limsup_{K\to\infty}\sum_{j=0}^Le^{-t\,m(1,K)}\frac{(t\,m(1,K))^j}{j!}=0.$$
This implies that $N_t(1,\infty)=\infty$ for each $t$. However, this contradicts the fact that $X_t$ has paths that are right continuous with left limits. We define $m$ on $(-\infty,0)$ similarly. We now look at
$$Y_t=X_t-\sum_{s\le t}\Delta X_s1_{(|\Delta X_s|>1)}.$$
This is again a Lévy process, and we need to examine its structure. This process has bounded jumps, hence has moments of all orders. By subtracting $c_1t$ for an appropriate constant $c_1$, we may suppose $Y_t$ has mean 0. Let $I_1,I_2,\dots$ be an ordering of the intervals $\{[2^{-(m+1)},2^{-m}),\,(-2^{-m},-2^{-(m+1)}]:m\ge0\}$. Let
$$\widetilde X^k_t=\sum_{s\le t}\Delta X_s1_{(\Delta X_s\in I_k)}$$
and let $X^k_t=\widetilde X^k_t-E\widetilde X^k_t$. By Corollary 18.3 and the fact that all the $X^k$ have mean zero,
$$\sum_{k=1}^\infty E(X^k_t)^2\le E\Big[\Big(Y_t-\sum_{k=1}^\infty X^k_t\Big)^2\Big]+E\Big[\Big(\sum_{k=1}^\infty X^k_t\Big)^2\Big]=E(Y_t)^2<\infty.$$
Hence
$$E\Big[\sum_{k=M}^NX^k_t\Big]^2=\sum_{k=M}^NE(X^k_t)^2$$
tends to zero as $M,N\to\infty$, and thus $Y_t-\sum_{k=1}^NX^k_t$ converges in $L^2$. The limit, $X^c_t$, say, will be a Lévy process independent of all the $X^k_t$. Moreover, $X^c$ has no jumps, i.e., it is continuous. Since all the $X^k$ have mean zero, $EX^c_t=0$. By the independence of the increments, $E[X^c_t-X^c_s\mid\mathcal{F}_s]=E[X^c_t-X^c_s]=0$, and we see that $X^c$ is a continuous martingale. Using the stationarity and independence of the increments,
$$E[(X^c_{s+t})^2]=E[(X^c_s)^2]+2E[X^c_s(X^c_{s+t}-X^c_s)]+E[(X^c_{s+t}-X^c_s)^2]=E[(X^c_s)^2]+E[(X^c_t)^2],$$
which implies that there exists a constant $c_2$ such that $E(X^c_t)^2=c_2t$. We then have
$$\begin{aligned}E[(X^c_t)^2-c_2t\mid\mathcal{F}_s]&=(X^c_s)^2-c_2s+E[(X^c_t-X^c_s)^2\mid\mathcal{F}_s]-c_2(t-s)\\&=(X^c_s)^2-c_2s+E[(X^c_t-X^c_s)^2]-c_2(t-s)=(X^c_s)^2-c_2s.\end{aligned}$$
The quadratic variation process of $X^c$ is therefore $c_2t$, and by Lévy's theorem (Theorem 12.1), if $c_2>0$ then $X^c_t/\sqrt{c_2}$ is a Brownian motion; thus $X^c$ is a constant multiple of Brownian motion. To complete the proof, it remains to show that $\int_{-1}^1x^2\,m(dx)<\infty$. But by Remark 42.1, $\int_{I_k}x^2\,m(dx)=E(X^k_1)^2$, and we have seen that $\sum_kE(X^k_1)^2\le EY_1^2<\infty$. Combining gives the finiteness of $\int_{-1}^1x^2\,m(dx)$.

Exercises

42.1 Let $\alpha\in(0,2)$ and let $X$ be a Lévy process where $b=\sigma=0$ in the Lévy–Khintchine formula and the Lévy measure is $m(dx)=c|x|^{-1-\alpha}\,dx$. Show that if $a>0$ and $Y_t=a^{-1/\alpha}X_{at}$, then $Y$ has the same law as $X$. The process $X$ is known as a symmetric stable process of index $\alpha$.
42.2 Suppose $W_t=(W^1_t,W^2_t)$ is a two-dimensional Brownian motion started at 0. Let $\tau_s=\inf\{t>0:W^1_t>s\}$. Prove that $W^2_{\tau_t}$ is a Lévy process and determine the Lévy measure.
Hint: Use scaling to make a guess.
42.3 Let $W$ be a one-dimensional Brownian motion and let $L^0$ be the local time at 0. Let $T_t$ be the inverse of $L^0$, that is, $T_t=\inf\{s:L^0_s\ge t\}$. Show $T_t$ is a Lévy process and determine the Lévy measure.
Hint: Use scaling to get started.
42.4 Let $W_t$ be a one-dimensional Brownian motion, $L^y_t$ the local time at level $y$, and $T_t$ the inverse local time at 0, that is, $T_t=\inf\{s:L^0_s\ge t\}$. Let $x>0$ be fixed. Prove that $L^x_{T_t}$ is a Lévy process.
42.5 Let $X$ be a Lévy process with Lévy measure $m$. Prove that if $A$ and $B$ are disjoint closed sets, then
$$E^x\sum_{s\le t}1_A(X_{s-})1_B(X_s)=E^x\int_0^t1_A(X_s)\,m(B-X_s)\,ds$$
for each $x$, where $B-y=\{z-y:z\in B\}$. This is the Lévy system formula in the case of Lévy processes. There is an analogous formula for Hunt processes.
42.6 A stable subordinator $X$ of order $\alpha\in(0,1)$ is a Lévy process whose characteristic function is given by (42.6), where $b=\sigma^2=0$ and $m(dx)=c1_{(x>0)}|x|^{-\alpha-1}\,dx$. Suppose $X$ is a stable subordinator of index $\alpha$ and $W$ is a Brownian motion. Show that, up to a deterministic time change, the process $Z_t=W_{X_t}$ is a symmetric stable process of index $2\alpha$.
Hint: Start by using scaling.
42.7 Let $Z_t$ be a symmetric stable process of order $\alpha\in(0,2)$. Show that if $\varepsilon>0$, then
$$\lim_{t\to\infty}\frac{|Z_t|}{t^{1/\alpha+\varepsilon}}=0,\qquad\text{a.s.}$$

Appendix A
Basic probability
This appendix covers the facts from basic probability that we will need. The presentation
here is not precisely what I use when I teach such a course. For example, in a course I
prove the strong law of large numbers without using martingales, I present the inversion
theorem for characteristic functions, I make use of Lévy’s continuity theorem, and so on.
Nevertheless, proofs of all the facts from probability needed in the main part of the text are
given.
A.1 First notions
A probability or probability measure is a measure whose total mass is one. Instead of denoting a measure space by $(X,\mathcal{A},\mu)$, probabilists use $(\Omega,\mathcal{F},P)$. Here $\Omega$ is a set, $\mathcal{F}$ is called a $\sigma$-field (which is the same thing as a $\sigma$-algebra), and $P$ is a measure with $P(\Omega)=1$. Elements of $\mathcal{F}$ are called events. A typical element of $\Omega$ is denoted $\omega$.
Instead of saying a property occurs almost everywhere, we talk about properties occurring almost surely, written a.s. Real-valued measurable functions from $\Omega$ to $\mathbb{R}$ are called random variables and are usually denoted by $X$ or $Y$ or other capital letters.
Integration (in the sense of Lebesgue) with respect to $P$ is called expectation or expected value, and we write $EX$ for $\int X\,dP$. The notation $E[X;A]$ is often used for $\int_AX\,dP$.
The random variable $1_A$ is the function that is one if $\omega\in A$ and zero otherwise. It is called the indicator of $A$ (the name "characteristic function" in probability refers to the Fourier transform). Events such as $\{\omega:X(\omega)>a\}$ are almost always abbreviated by $(X>a)$ or $\{X>a\}$.
Given a random variable $X$, we can define a probability on the Borel $\sigma$-field of $\mathbb{R}$ by
$$P_X(A)=P(X\in A),\qquad A\subset\mathbb{R}. \tag{A.1}$$
The probability $P_X$ is called the law of $X$ or the distribution of $X$. We define $F_X:\mathbb{R}\to[0,1]$ by
$$F_X(x)=P_X((-\infty,x])=P(X\le x). \tag{A.2}$$
The function $F_X$ is called the distribution function of $X$.
Proposition A.1 The distribution function $F_X$ of a random variable $X$ satisfies:
(1) $F_X$ is increasing;
(2) $F_X$ is right continuous with left limits;
(3) $\lim_{x\to\infty}F_X(x)=1$ and $\lim_{x\to-\infty}F_X(x)=0$.
Proof We prove the right continuity of $F_X$ and leave the rest of the proof to the reader. If $x_n\downarrow x$, then $(X\le x_n)\downarrow(X\le x)$, and so $P(X\le x_n)\downarrow P(X\le x)$ since $P$ is a finite measure.
Note that if $x_n\uparrow x$, then $(X\le x_n)\uparrow(X<x)$, and so $F_X(x_n)\uparrow P(X<x)$.

Any function $F:\mathbb{R}\to[0,1]$ satisfying (1)–(3) of Proposition A.1 is called a distribution function, whether or not it comes from a random variable.

Proposition A.2 Suppose $F$ is a distribution function. There exists a random variable $X$ such that $F=F_X$.

Proof Let $\Omega=[0,1]$, $\mathcal{F}$ the Borel $\sigma$-field, and $P$ Lebesgue measure. Define $X(\omega)=\sup\{x:F(x)<\omega\}$. It is routine to check that $F_X=F$.

In the above proof, essentially $X=F^{-1}$. However, $F$ may have jumps or be constant over some intervals, so some care is needed in defining $X$.

Certain distributions or laws are very common. We list some of them.
(1) Bernoulli. A random variable is Bernoulli if $P(X=1)=p$, $P(X=0)=1-p$ for some $p\in[0,1]$.
(2) Binomial. This is defined by $P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$, where $n$ is a positive integer, $0\le k\le n$, $p\in[0,1]$, and $\binom{n}{k}=\frac{n!}{k!\,(n-k)!}$.
(3) Point mass at $a$. Here $P(X=a)=1$.
(4) Poisson. For $\lambda>0$ we set $P(X=k)=e^{-\lambda}\lambda^k/k!$. Here $k$ is a non-negative integer.
If $F$ is absolutely continuous, we call $f=F'$ the density of $F$. If such an $F$ is the distribution function of a random variable $X$, then
$$P(X\in A)=\int_Af(x)\,dx.$$
Some examples of distributions characterized by densities are the following.
(5) Uniform on $[a,b]$. Define $f(x)=(b-a)^{-1}1_{[a,b]}(x)$.
(6) Exponential. For $x\ge0$ let $f(x)=\lambda e^{-\lambda x}$ and set $f(x)=0$ for $x<0$.
(7) Standard normal. Define $f(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$ for $x\in\mathbb{R}$. Let us verify that the integral of $f$ is one. To do that, let
$$I=\int_0^\infty e^{-x^2/2}\,dx,$$
and it suffices to show $I=\sqrt{\pi/2}$. Using the Fubini theorem, the monotone convergence theorem, and a change of variables to polar coordinates, we write
$$\begin{aligned}I^2&=\Big(\int_0^\infty e^{-x^2/2}\,dx\Big)\Big(\int_0^\infty e^{-y^2/2}\,dy\Big)=\int_0^\infty\!\!\int_0^\infty e^{-(x^2+y^2)/2}\,dx\,dy\\&=\lim_{R\to\infty}\iint_{x,y\ge0,\;x^2+y^2\le R^2}e^{-(x^2+y^2)/2}\,dx\,dy=\lim_{R\to\infty}\int_0^{\pi/2}\!\!\int_0^Re^{-r^2/2}\,r\,dr\,d\theta\\&=\lim_{R\to\infty}\frac{\pi}{2}\big(1-e^{-R^2/2}\big)=\frac{\pi}{2},\end{aligned}$$
as desired. We shall see later ((A.4) and (A.5)) that a standard normal random variable $Z$ has mean zero and variance one, which means that $EZ=0$ and $EZ^2=1$.
(8) Normal random variables with mean $\mu$ and variance $\sigma^2$. If $Z$ is a standard normal random variable, then a normal random variable $X$ with mean $\mu$ and variance $\sigma^2$ has the same distribution as $\mu+\sigma Z$. It is an exercise in calculus to check that such a random variable has density
$$f(x)=\frac{1}{\sqrt{2\pi}\,\sigma}e^{-(x-\mu)^2/2\sigma^2}. \tag{A.3}$$
(9) Gamma. A random variable $X$ has a gamma distribution with parameters $r$ and $\lambda$ (both $r$ and $\lambda$ must be positive) if it has density
$$f(x)=\lambda e^{-\lambda x}(\lambda x)^{r-1}/\Gamma(r)$$
for $x\ge0$ and $f(x)=0$ if $x<0$, where $\Gamma(r)=\int_0^\infty e^{-y}y^{r-1}\,dy$ is the Gamma function. Recall $\Gamma(k)=(k-1)!$ for $k$ a positive integer.

We can use the law of a random variable to calculate expectations.

Proposition A.3 Let $X$ be a random variable. If $g$ is bounded or non-negative, then
$$Eg(X)=\int g(x)\,P_X(dx).$$
Proof If $g$ is the indicator of a Borel set $A$, this is just the definition of $P_X$. By linearity, the result holds for simple functions. By the monotone convergence theorem, the result holds for non-negative functions, and by linearity again, it holds for bounded $g$.

If $F_X$ has a density $f$, then $P_X(dx)=f(x)\,dx$. In this case $EX=\int xf(x)\,dx$ and $EX^2=\int x^2f(x)\,dx$. (We need $E|X|$ finite to justify the first equality if $X$ is not necessarily non-negative.)
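The construction in the proof of Proposition A.2, $X(\omega)=\sup\{x:F(x)<\omega\}$ with $\omega$ uniform on $[0,1]$, is the familiar inverse-transform method. The sketch below approximates the supremum by bisection, assuming only that $F$ is a distribution function and that the answer lies in a hypothetical search window `[lo, hi]`; jumps and flat stretches of $F$ need no special treatment. It checks the exponential law (6) against the mean (A.9) and the Bernoulli law (1) against $P(X=1)=p$, with arbitrary parameter values.

```python
import math
import random

def generalized_inverse(F, omega, lo=-1e6, hi=1e6, iters=100):
    """Approximate sup{x : F(x) < omega} by bisection on [lo, hi].

    Assumes F(lo) < omega <= F(hi). Because F is increasing and right continuous,
    {x : F(x) < omega} is an open half-line, so the loop keeps
    lo < sup <= hi and squeezes the supremum from both sides.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if F(mid) < omega:
            lo = mid
        else:
            hi = mid
    return hi

lam, p = 2.0, 0.4
F_exp = lambda x: 1 - math.exp(-lam * x) if x >= 0 else 0.0
F_bern = lambda x: 0.0 if x < 0 else (1 - p if x < 1 else 1.0)

rng = random.Random(7)
omegas = [1.0 - rng.random() for _ in range(5000)]      # uniform on (0, 1]
exp_draws = [generalized_inverse(F_exp, w) for w in omegas]
bern_draws = [round(generalized_inverse(F_bern, w)) for w in omegas]
```

For the Bernoulli law the supremum is exactly 0 when $\omega\le1-p$ and exactly 1 otherwise, which is why rounding the bisection output recovers the two atoms.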
We define the mean of a random variable to be its expectation, and the variance of a random variable is defined by $\operatorname{Var}X=E(X-EX)^2$. The $p$th moment of $X$ is $EX^p$ if $p$ is a positive integer.

Note
$$\operatorname{Var}X=E[X^2-2X(EX)+(EX)^2]=EX^2-(EX)^2.$$
Let us calculate a few examples. Since $xe^{-x^2/2}$ is an odd function, if $Z$ is a standard normal random variable, then
$$EZ=\int x\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx=0. \tag{A.4}$$
Using integration by parts,
$$\begin{aligned}EZ^2&=\int x^2\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx=\lim_{N\to\infty}\int_{-N}^Nx^2\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx\\&=\lim_{N\to\infty}\Big(-\frac{2N}{\sqrt{2\pi}}e^{-N^2/2}+\int_{-N}^N\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx\Big)=\frac{1}{\sqrt{2\pi}}\int e^{-x^2/2}\,dx=1,\end{aligned}\tag{A.5}$$
and so $\operatorname{Var}Z=1$. By completing the square and a change of variables, we calculate
$$Ee^{aZ}=\frac{1}{\sqrt{2\pi}}\int e^{ax}e^{-x^2/2}\,dx=\frac{1}{\sqrt{2\pi}}e^{a^2/2}\int e^{-(x-a)^2/2}\,dx=e^{a^2/2}.$$
If $X$ is a normal random variable with mean $\mu$ and variance $\sigma^2$, we can write $X=\mu+\sigma Z$ for $Z$ a standard normal random variable, and obtain
$$Ee^{aX}=e^{a\mu}Ee^{a\sigma Z}=e^{a\mu+a^2\sigma^2/2}. \tag{A.6}$$
If $X$ is a Poisson random variable with parameter $\lambda$, then
$$EX=\sum_{k=0}^\infty ke^{-\lambda}\frac{\lambda^k}{k!}=\sum_{k=1}^\infty ke^{-\lambda}\frac{\lambda^k}{k!}=\lambda e^{-\lambda}\sum_{k=1}^\infty\frac{\lambda^{k-1}}{(k-1)!}=\lambda. \tag{A.7}$$
A similar calculation shows that $E[X(X-1)]=\lambda^2$, so
$$\operatorname{Var}X=E[X(X-1)]+EX-(EX)^2=\lambda. \tag{A.8}$$
A straightforward application of integration by parts shows that if $X$ is an exponential random variable with parameter $\lambda$, then
$$EX=\int_0^\infty\lambda xe^{-\lambda x}\,dx=\frac{1}{\lambda}. \tag{A.9}$$
Another equality that is useful is the following.

Proposition A.4 If $X\ge0$, a.s., and $p>0$, then
$$EX^p=\int_0^\infty p\lambda^{p-1}P(X>\lambda)\,d\lambda.$$
The proof will show that this equality is also valid if we replace P(X > λ) by P(X ≥ λ).
Proof Using the Fubini theorem and writing
$$\int_0^\infty p\lambda^{p-1}P(X>\lambda)\,d\lambda=E\int_0^\infty p\lambda^{p-1}1_{(\lambda,\infty)}(X)\,d\lambda=E\int_0^Xp\lambda^{p-1}\,d\lambda=EX^p$$
gives the proof.
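Proposition A.4 is easy to test on an exponential random variable with parameter $\lambda$, for which $P(X>t)=e^{-\lambda t}$ and $EX^p=\Gamma(p+1)/\lambda^p$ (a standard fact, taken here as the benchmark). The sketch evaluates the right-hand side of Proposition A.4 with the midpoint rule on a truncated range; the rate `lam = 1.5`, the truncation point, and the grid size are arbitrary choices.

```python
import math

def moment_from_tail(p, tail, upper, n=100_000):
    """Midpoint rule for int_0^upper p t^(p-1) P(X > t) dt."""
    h = upper / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += p * t ** (p - 1) * tail(t)
    return total * h

lam = 1.5
tail = lambda t: math.exp(-lam * t)        # P(X > t) for an exponential(lam) variable
powers = (1.0, 2.0, 3.5)
from_tail = {p: moment_from_tail(p, tail, upper=40.0) for p in powers}
exact = {p: math.gamma(p + 1) / lam ** p for p in powers}
```

The case $p=1$ reproduces (A.9); values of $p<1$ also work but converge slowly here because $t^{p-1}$ is singular at 0.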
We need two elementary inequalities. The first is known as Chebyshev’s inequality.
Proposition A.5 If $X\ge0$ and $a>0$, then
$$P(X\ge a)\le\frac{EX}{a}.$$
Proof We write
$$P(X\ge a)=E\big[1_{[a,\infty)}(X)\big]\le E\Big[\frac{X}{a}\,1_{[a,\infty)}(X)\Big]\le EX/a,$$
since $X/a$ is bigger than or equal to 1 when $X\in[a,\infty)$.
If we apply this to $X=(Y-EY)^2$, with $a^2$ in place of $a$, we obtain
$$P(|Y-EY|\ge a)=P((Y-EY)^2\ge a^2)\le\operatorname{Var}Y/a^2. \tag{A.10}$$
This special case of Chebyshev’s inequality is sometimes itself referred to as Chebyshev’s
inequality, while Proposition A.5 is sometimes called the Markov inequality.
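To see how crude these bounds can be, the sketch below compares exact Poisson tail probabilities with the Markov bound $EX/a$ and the Chebyshev bound (A.10), using $EX=\operatorname{Var}X=\lambda$ from (A.7) and (A.8). The value `lam = 4.0`, the thresholds, and the summation cutoff `kmax` are arbitrary choices.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = e^{-lam} lam^k / k! for a Poisson(lam) variable."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def tail_at_least(a, lam):
    """Exact P(X >= a) for an integer a >= 0."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(a))

def two_sided(a, lam, kmax=60):
    """P(|X - lam| >= a), summing the pmf up to kmax (remaining mass is negligible here)."""
    return sum(poisson_pmf(k, lam) for k in range(kmax) if abs(k - lam) >= a)

lam = 4.0
rows = []
for a in (2, 4, 8, 12):
    markov = lam / a          # Proposition A.5 with EX = lam
    chebyshev = lam / a ** 2  # (A.10) with Var X = lam
    rows.append((a, tail_at_least(a, lam), markov, two_sided(a, lam), chebyshev))
```

Printing `rows` shows, for instance, that at $a=8$ the Markov bound is $0.5$ while the true tail is only about $0.05$.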
The second inequality we need is Jensen’s inequality, not to be confused with Jensen’s
formula of complex analysis.
Proposition A.6 Suppose g is convex and X and g(X ) are both integrable. Then
g(E X ) ≤ E g(X ).
Proof One property of convex functions is that they lie above their tangent lines, and more
generally, their support lines. Thus if x0 ∈ R, we have
g(x) ≥ g(x0) + c(x − x0)
for some constant c. Letting x = X (ω) and taking expectations, we obtain
E g(X ) ≥ g(x0) + c(E X − x0).
Now set x0 equal to E X .
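Jensen's inequality can be checked exactly on a random variable with finitely many values. The sketch below uses Python's `fractions` module for exact rational arithmetic, a hypothetical law putting mass $1/4,1/2,1/4$ on $-1,0,3$, and a few convex functions chosen for illustration; the affine function gives equality, since its support line at $EX$ is the function itself.

```python
from fractions import Fraction

def expectation(dist, g=lambda x: x):
    """E g(X) for a finitely supported law given as {value: probability}."""
    return sum(prob * g(x) for x, prob in dist.items())

# Hypothetical law of X: values -1, 0, 3 with probabilities 1/4, 1/2, 1/4.
dist = {Fraction(-1): Fraction(1, 4), Fraction(0): Fraction(1, 2), Fraction(3): Fraction(1, 4)}
mean = expectation(dist)   # E X = -1/4 + 3/4 = 1/2

convex = {
    "square": lambda x: x * x,
    "abs": abs,
    "positive part": lambda x: max(x, Fraction(0)),
    "affine": lambda x: 2 * x + 1,
}
# Jensen gap E g(X) - g(E X), which should be >= 0 for every convex g.
gaps = {name: expectation(dist, g) - g(mean) for name, g in convex.items()}
```

For the square the gap is exactly $\operatorname{Var}X=9/4$, matching the identity $\operatorname{Var}X=EX^2-(EX)^2$.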
If $A_n$ is a sequence of sets, define $(A_n\text{ i.o.})$, read "$A_n$ infinitely often," by
$$(A_n\text{ i.o.})=\bigcap_{n=1}^\infty\bigcup_{i=n}^\infty A_i.$$
This set consists of those $\omega$ that are in infinitely many of the $A_n$.
A simple but very important proposition is the Borel–Cantelli lemma. It has two parts, and we prove the first part here, leaving the second part to the next section.

Proposition A.7 Let $A_1,A_2,\dots$ be a sequence of events. If $\sum_nP(A_n)<\infty$, then $P(A_n\text{ i.o.})=0$.

Proof We write
$$P(A_n\text{ i.o.})=\lim_{n\to\infty}P\Big(\bigcup_{i=n}^\infty A_i\Big)\le\limsup_{n\to\infty}\sum_{i=n}^\infty P(A_i)=0,$$
and we are done.

A.2 Independence

We say two events $A$ and $B$ are independent if $P(A\cap B)=P(A)P(B)$. The events $A_1,\dots,A_n$ are independent if
$$P(A_{i_1}\cap A_{i_2}\cap\cdots\cap A_{i_j})=P(A_{i_1})P(A_{i_2})\cdots P(A_{i_j})$$
for each subset $\{i_1,\dots,i_j\}$ of $\{1,\dots,n\}$ with $1\le i_1<\cdots<i_j\le n$.

Proposition A.8 If $A$ and $B$ are independent, then $A^c$ and $B$ are independent.

Proof We write
$$P(A^c\cap B)=P(B)-P(A\cap B)=P(B)-P(A)P(B)=P(B)(1-P(A))=P(B)P(A^c).$$
This is all there is to the proof.

We say two $\sigma$-fields $\mathcal{F}$ and $\mathcal{G}$ are independent if $A$ and $B$ are independent whenever $A\in\mathcal{F}$ and $B\in\mathcal{G}$. Two random variables $X$ and $Y$ are independent if $\sigma(X)$, the $\sigma$-field generated by $X$, and $\sigma(Y)$, the $\sigma$-field generated by $Y$, are independent. (Recall that the $\sigma$-field generated by a random variable $X$ is given by $\{(X\in A):A\text{ a Borel subset of }\mathbb{R}\}$.) We define the independence of $n$ $\sigma$-fields or $n$ random variables in a similar way.

Remark A.9 If $f$ and $g$ are Borel functions and $X$ and $Y$ are independent, then $f(X)$ and $g(Y)$ are independent. This follows because the $\sigma$-field generated by $f(X)$ is a sub-$\sigma$-field of the one generated by $X$, and similarly for $g(Y)$.

To construct independent random variables, we can use the following.

Proposition A.10 If $F_1,\dots,F_n$ are distribution functions, there exist independent random variables $X_1,\dots,X_n$ such that $F_{X_i}=F_i$, $i=1,\dots,n$.

Proof Let $\Omega=[0,1]^n$, $\mathcal{F}$ the Borel $\sigma$-field on $\Omega$, and $P$ $n$-dimensional Lebesgue measure on $\Omega$. If $\omega=(\omega_1,\dots,\omega_n)$, define $X_i(\omega)=\sup\{x:F_i(x)<\omega_i\}$. As in Proposition A.2, $F_{X_i}=F_i$. We deduce the independence from the fact that $P$ is a product measure, in fact, the $n$-fold product of one-dimensional Lebesgue measure on $[0,1]$.

Let $F_{X,Y}(x,y)=P(X\le x,Y\le y)$ denote the joint distribution function of two random variables $X$ and $Y$.
In $P(X \le x, Y \le y)$ the comma means “and”; this is a standard convention in probability.
Proposition A.11 $F_{X,Y}(x,y) = F_X(x)F_Y(y)$ for all $x, y$ if and only if $X$ and $Y$ are independent.
Proof If $X$ and $Y$ are independent, then
$$F_{X,Y}(x,y) = P(X \le x, Y \le y) = P(X \le x)P(Y \le y) = F_X(x)F_Y(y).$$
Conversely, if the equality holds, fix $y$ and let $\mathcal M_y$ denote the collection of sets $A$ for which $P(X \in A, Y \le y) = P(X \in A)P(Y \le y)$. $\mathcal M_y$ contains all sets of the form $(-\infty, x]$. It follows by linearity (taking differences) that $\mathcal M_y$ contains all sets of the form $(x, z]$, and then by linearity again, all sets that are finite unions of such half-open, half-closed intervals. Note that the collection $\mathcal A$ of finite unions of such intervals is an algebra generating the Borel σ-field. It is clear that $\mathcal M_y$ is a monotone class, so by the monotone class theorem (Theorem B.2), $\mathcal M_y$ contains the Borel σ-field.
For a fixed Borel set $A$, let $\mathcal M_A$ denote the collection of sets $B$ for which $P(X \in A, Y \in B) = P(X \in A)P(Y \in B)$. Again, $\mathcal M_A$ is a monotone class and by the preceding paragraph contains the σ-field generated by the collection of finite unions of intervals of the form $(x, z]$, and hence contains the Borel sets. Therefore $X$ and $Y$ are independent.
The following is known as the multiplication theorem.
Proposition A.12 If $X$, $Y$, and $XY$ are integrable and $X$ and $Y$ are independent, then $E[XY] = (EX)(EY)$.
Proof Consider the pairs $(Z_X, Z_Y)$ with $Z_X$ being $\sigma(X)$ measurable and $Z_Y$ being $\sigma(Y)$ measurable for which the multiplication theorem is true. It holds for $Z_X = 1_A(X)$ and $Z_Y = 1_B(Y)$ with $A$ and $B$ Borel subsets of $\mathbb R$ by the definition of $X$ and $Y$ being independent. It holds for simple random variables $(Z_X, Z_Y)$, that is, linear combinations of indicators, by the linearity of both sides. It holds for non-negative random variables by monotone convergence. And it holds for integrable random variables by linearity again.
If $X_1, \ldots, X_n$ are independent, then so are $X_1 - EX_1, \ldots, X_n - EX_n$.
Assuming everything is square integrable,
$$E\big[(X_1 - EX_1) + \cdots + (X_n - EX_n)\big]^2 = E(X_1 - EX_1)^2 + \cdots + E(X_n - EX_n)^2,$$
using the multiplication theorem to show that the expectations of the cross-product terms are zero. We have thus shown
$$\operatorname{Var}(X_1 + \cdots + X_n) = \operatorname{Var} X_1 + \cdots + \operatorname{Var} X_n. \qquad (A.11)$$
We finish up this section by proving the second half of the Borel–Cantelli lemma.
Proposition A.13 Suppose $A_n$ is a sequence of independent events. If $\sum_{n=1}^\infty P(A_n) = \infty$, then $P(A_n \text{ i.o.}) = 1$.
Note that here the $A_n$ are independent, while in the first half of the Borel–Cantelli lemma no such assumption was necessary.
Proof Note
$$P\Big(\bigcup_{i=n}^N A_i\Big) = 1 - P\Big(\bigcap_{i=n}^N A_i^c\Big) = 1 - \prod_{i=n}^N P(A_i^c) = 1 - \prod_{i=n}^N (1 - P(A_i)) \ge 1 - \exp\Big(-\sum_{i=n}^N P(A_i)\Big),$$
using the inequality $1 - x \le e^{-x}$ for $x \ge 0$. As $N \to \infty$, the right-hand side tends to one because $\sum_{i=n}^\infty P(A_i) = \infty$, so $P(\cup_{i=n}^\infty A_i) = 1$. This holds for all $n$, and $P(A_n \text{ i.o.}) = \lim_{n\to\infty} P(\cup_{i=n}^\infty A_i)$, which proves the result.
A.3 Convergence
In this section we consider three ways a sequence of random variables Xn can converge.
We say $X_n$ converges to $X$ almost surely if the event $(X_n \not\to X)$ has probability zero. $X_n$
converges to $X$ in probability if for each $\varepsilon > 0$, $P(|X_n - X| > \varepsilon) \to 0$ as $n \to \infty$. For $p \ge 1$, $X_n$
converges to X in Lp if E |Xn − X |p → 0 as n → ∞.
The following proposition shows some relationships among the types of convergence.
Proposition A.14 (1) If $X_n \to X$ almost surely, then $X_n \to X$ in probability.
(2) If $X_n \to X$ in $L^p$, then $X_n \to X$ in probability.
(3) If $X_n \to X$ in probability, there exists a subsequence $n_j$ such that $X_{n_j}$ converges to $X$
almost surely.
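Proof To prove (1), note $X_n - X$ tends to zero almost surely, so $1_{(-\varepsilon,\varepsilon)^c}(X_n - X)$ also
converges to zero almost surely. Since $P(|X_n - X| > \varepsilon) \le E\,1_{(-\varepsilon,\varepsilon)^c}(X_n - X)$ and these indicators are bounded by 1, the dominated convergence theorem gives the result.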
(2) comes from Chebyshev’s inequality:
$$P(|X_n - X| > \varepsilon) = P(|X_n - X|^p > \varepsilon^p) \le \frac{E|X_n - X|^p}{\varepsilon^p} \to 0$$
as n → ∞.
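To prove (3), choose $n_j$ larger than $n_{j-1}$ such that $P(|X_n - X| > 2^{-j}) < 2^{-j}$ whenever $n \ge n_j$. Thus if we let $A_i = (|X_{n_j} - X| > 2^{-j} \text{ for some } j \ge i)$, then $P(A_i) \le \sum_{j \ge i} 2^{-j} = 2^{-i+1}$.
By the Borel–Cantelli lemma $P(A_i \text{ i.o.}) = 0$. If $\omega$ is not in $(A_i \text{ i.o.})$, then $\omega \notin A_{i_0}$ for some $i_0$, so $|X_{n_j}(\omega) - X(\omega)| \le 2^{-j}$ for all $j \ge i_0$. Hence $X_{n_j} \to X$ almost surely on the
complement of $(A_i \text{ i.o.})$.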
Let us give some examples to show there need not be any other implications among the
three types of convergence.
Let $\Omega = [0,1]$, $\mathcal F$ the Borel σ-field, and $P$ Lebesgue measure. Let $X_n = n^2 1_{(0,1/n)}$. Then
clearly $X_n$ converges to zero almost surely and in probability, but $EX_n^p = n^{2p}/n \to \infty$ for
any $p \ge 1$.
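The quantities in this first example can be checked with exact rational arithmetic. The sketch below assumes an integer exponent $p$ so that `Fraction` powers stay exact; the function names are ours.

```python
from fractions import Fraction


def X(n, omega):
    """X_n(omega) = n^2 * 1_{(0, 1/n)}(omega) on Omega = [0, 1]."""
    return n * n if 0 < omega < Fraction(1, n) else 0


def prob_exceeds(n, eps):
    # P(|X_n| > eps) = P((0, 1/n)) = 1/n whenever 0 < eps < n^2.
    return Fraction(1, n) if eps < n * n else Fraction(0)


def pth_moment(n, p):
    # E X_n^p = (n^2)^p * P((0, 1/n)) = n^(2p - 1); p is assumed to be an integer >= 1.
    return Fraction(n) ** (2 * p - 1)
```

For a fixed $\omega \in (0,1]$, $X_n(\omega) = 0$ as soon as $n \ge 1/\omega$, which is the almost sure convergence; the probabilities $1/n$ give convergence in probability, while the moments $n^{2p-1}$ blow up.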
Let $\Omega$ be the unit circle, and let $P$ be Lebesgue measure on the circle normalized to have
total mass 1. We use $\theta$ to denote the angle that the ray from 0 through a point on the circle
makes with the $x$ axis. Let $t_n = \sum_{i=1}^n i^{-1}$ (with $t_0 = 0$), and let $A_n = \{e^{i\theta} : t_{n-1} \le \theta < t_n\}$. Let $X_n = 1_{A_n}$. Since $t_n \to \infty$, the arcs $A_n$ wind around the circle infinitely many times, so any point on the unit circle will be in infinitely many $A_n$, and $X_n$ does not converge almost surely to zero. But $P(A_n) = 1/(2\pi n) \to 0$, so $X_n \to 0$ in probability and in $L^p$.
A.4 Uniform integrability
A sequence $\{X_i\}$ of random variables is uniformly integrable if
$$\sup_i \int_{(|X_i| > M)} |X_i| \, dP \to 0$$
as $M \to \infty$. This can be rephrased by saying: given $\varepsilon > 0$ there exists $M > 0$ such that
$E[|X_i|; |X_i| > M] < \varepsilon$ for all $i$. Here $M$ can be chosen independently of $i$.
Lemma A.15 If $\{X_i\}$ is a uniformly integrable sequence of random variables, then $\sup_i E|X_i| < \infty$.
Proof There exists $M$ such that $E[|X_i|; |X_i| > M] \le 1$ for all $i$. Then
$$E|X_i| \le E[|X_i|; |X_i| \le M] + E[|X_i|; |X_i| > M] \le M + 1,$$
and we are done.
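Lemma A.15 has no converse: a uniform bound on $E|X_i|$ does not give uniform integrability. A standard example (our choice, not from the text) is $X_i = i\,1_{(0,1/i)}$ on $[0,1]$ with Lebesgue measure, for which $E|X_i| = 1$ for every $i$ but $E[|X_i|; |X_i| > M] = 1$ whenever $i > M$. A small exact check:

```python
from fractions import Fraction


def tail_expectation(i, M):
    """E[|X_i|; |X_i| > M] for X_i = i * 1_{(0, 1/i)} under Lebesgue measure on [0, 1].

    |X_i| takes the value i on a set of probability 1/i and is 0 elsewhere.
    """
    return Fraction(i) * Fraction(1, i) if i > M else Fraction(0)


def sup_tail(M, imax):
    # Supremum over i = 1, ..., imax; any i > M already contributes the value 1.
    return max(tail_expectation(i, M) for i in range(1, imax + 1))
```

Taking $M = 0$ gives $E|X_i| = 1$, while `sup_tail(M, imax)` equals 1 for every $M$ once `imax` exceeds $M$, so the supremum in the definition of uniform integrability does not tend to zero.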
We say a sequence of random variables $\{X_i\}$ is uniformly absolutely continuous if given $\varepsilon > 0$
there exists $\delta > 0$ such that $\sup_i E[|X_i|; A] \le \varepsilon$ whenever $P(A) < \delta$.
Proposition A.16 The following are equivalent.
(1) The sequence $\{X_i\}$ is uniformly integrable.
(2) The sequence $\{X_i\}$ is uniformly absolutely continuous and $\sup_i E|X_i| < \infty$.
Proof If (1) holds, we showed in Lemma A.15 that the expectations are uniformly bounded. Let $\varepsilon > 0$ and choose $M$ such that $\sup_i E[|X_i|; |X_i| > M] < \varepsilon/2$. Then if $\delta = \varepsilon/(2M)$ and $P(A) < \delta$, we have
$$E[|X_i|; A] \le E[|X_i|; |X_i| > M] + E[|X_i|; |X_i| \le M, A] < \frac{\varepsilon}{2} + MP(A) \le \varepsilon.$$
Now suppose (2) holds. Let $\varepsilon > 0$ and choose $\delta$ such that $E[|X_i|; A] < \varepsilon$ for all $i$ if $P(A) \le \delta$. Let $M = \sup_i E|X_i|/\delta$. Then by the Chebyshev inequality
$$P(|X_i| > M) \le \frac{E|X_i|}{M} \le \delta,$$
so $E[|X_i|; |X_i| > M] < \varepsilon$ for all $i$.
Proposition A.17 Suppose $\{X_i\}$ and $\{Y_i\}$ are each uniformly integrable sequences of random variables. Then $\{X_i + Y_i\}$ is also a uniformly integrable sequence.
Proof By Proposition A.16,
$$\sup_i E|X_i + Y_i| \le \sup_i E|X_i| + \sup_i E|Y_i| < \infty.$$
Using Proposition A.16 again, given $\varepsilon$ there exists $\delta$ such that $E[|X_i|; A] < \varepsilon/2$ and $E[|Y_i|; A] < \varepsilon/2$ if $P(A) < \delta$. But then $E[|X_i + Y_i|; A] < \varepsilon$, and a third use of Proposition A.16 yields our result.
Proposition A.18 Suppose there exists $\varphi : [0,\infty) \to [0,\infty)$ such that $\varphi$ is increasing, $\varphi(x)/x \to \infty$ as $x \to \infty$, and $\sup_i E\varphi(|X_i|) < \infty$. Then the sequence $\{X_i\}$ is uniformly integrable.
Proof Let $\varepsilon > 0$ and choose $x_0$ such that $x/\varphi(x) < \varepsilon$ if $x \ge x_0$. If $M \ge x_0$,
$$\int_{(|X_i| > M)} |X_i| \, dP = \int \frac{|X_i|}{\varphi(|X_i|)} \varphi(|X_i|) 1_{(|X_i| > M)} \, dP \le \varepsilon \int \varphi(|X_i|) \, dP \le \varepsilon \sup_i E\varphi(|X_i|).$$
Since $\varepsilon$ is arbitrary, we are done.
The main result we need in this section is the Vitali convergence theorem.
Theorem A.19 If Xn → X almost surely and the sequence {Xn} is uniformly integrable, then
E |Xn − X | → 0.
Proof By Fatou's lemma and Lemma A.15, $E|X| \le \liminf_n E|X_n| \le \sup_n E|X_n| < \infty$, so $X$ is integrable and the constant sequence $-X$ is uniformly integrable. By Proposition A.17 with $Y_i = -X$ for each $i$, the sequence $X_i - X$ is uniformly integrable. Let $\varepsilon > 0$ and choose $M$ such that $\int_{(|X_i - X| > M)} |X_i - X| \, dP < \varepsilon$ for all $i$. By dominated convergence,
$$\limsup_{i\to\infty} E|X_i - X| \le \limsup_{i\to\infty} E[|X_i - X|; |X_i - X| \le M] + \varepsilon = \varepsilon.$$
Since $\varepsilon$ is arbitrary, $E|X_i - X| \to 0$.
A.5 Conditional expectation
If $\mathcal F \subset \mathcal G$ are two σ-fields and $X$ is an integrable $\mathcal G$ measurable random variable, the conditional expectation of $X$ given $\mathcal F$, written $E[X \mid \mathcal F]$ and read as “the expectation (or expected value) of $X$ given $\mathcal F$,” is any $\mathcal F$ measurable random variable $Y$ such that $E[Y; A] = E[X; A]$ for every $A \in \mathcal F$. The conditional probability of $A \in \mathcal G$ given $\mathcal F$ is defined by $P(A \mid \mathcal F) = E[1_A \mid \mathcal F]$.
If $Y_1, Y_2$ are two $\mathcal F$ measurable random variables with $E[Y_1; A] = E[Y_2; A]$ for all $A \in \mathcal F$, then $Y_1 = Y_2$, a.s., and so conditional expectation is unique up to almost sure equivalence.
In the case $X$ is already $\mathcal F$ measurable, $E[X \mid \mathcal F] = X$. If $X$ is independent of $\mathcal F$, $E[X \mid \mathcal F] = EX$. Both of these facts follow immediately from the definition. For another example, if $\{A_i\}$ is a finite collection of pairwise disjoint sets whose union is $\Omega$, $P(A_i) > 0$ for all $i$, and $\mathcal F$ is the σ-field generated by the $A_i$'s, then
$$P(A \mid \mathcal F) = \sum_i \frac{P(A \cap A_i)}{P(A_i)} 1_{A_i}. \qquad (A.12)$$
This follows since the right-hand side is F measurable and its expectation over any set Ai is
P(A ∩ Ai). Equation (A.12) provides the link with the definition of conditional probability
from elementary probability: if $P(B) \ne 0$, then
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}. \qquad (A.13)$$
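Formula (A.12) is easy to evaluate on a finite sample space. The sketch below uses one roll of a fair die, the event “the roll is even”, and the partition $\{1,2,3\}, \{4,5,6\}$; all of these choices, and the helper names, are illustrative assumptions. It builds $P(A \mid \mathcal F)$ block by block and then checks the defining property $E[P(A \mid \mathcal F); A_i] = P(A \cap A_i)$.

```python
from fractions import Fraction

# One roll of a fair die (an illustrative finite probability space).
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}


def P(event):
    return sum(prob[w] for w in event)


def cond_prob_given_partition(A, partition):
    """Return P(A | F) as a dict omega -> value, F generated by the partition.

    Implements (A.12): on each block A_i the value is P(A ∩ A_i) / P(A_i).
    """
    result = {}
    for block in partition:
        value = P(A & block) / P(block)
        for w in block:
            result[w] = value
    return result


A = {2, 4, 6}                        # "the roll is even"
partition = [{1, 2, 3}, {4, 5, 6}]   # F is generated by "the roll is at most 3"
Y = cond_prob_given_partition(A, partition)

# Defining property: E[Y; A_i] = P(A ∩ A_i) for each block A_i.
for block in partition:
    assert sum(Y[w] * prob[w] for w in block) == P(A & block)
```

Here $Y$ equals $1/3$ on $\{1,2,3\}$ and $2/3$ on $\{4,5,6\}$, which is exactly what (A.13) gives for $B = \{1,2,3\}$ and $B = \{4,5,6\}$.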
We have
E [E [X | F ] ] = E X (A.14)
because $E[E[X \mid \mathcal F]] = E[E[X \mid \mathcal F]; \Omega] = E[X; \Omega] = EX$.
The following is easy to establish.
Proposition A.20 (1) If X ≥ Y are both integrable, then
E [X | F ] ≥ E [Y | F ], a.s.
(2) If X and Y are integrable and a ∈ R, then
E [aX + Y | F ] = aE [X | F ] + E [Y | F ].
It is easy to check that limit theorems such as monotone convergence and dominated
convergence have conditional expectation versions, as do inequalities like Jensen’s and
Chebyshev’s inequalities. Thus, for example, we have Jensen’s inequality for conditional
expectations.
Proposition A.21 If g is convex and X and g(X ) are integrable,
E [g(X ) | F ] ≥ g(E [X | F ]), a.s.
A key fact is the following.
Proposition A.22 If X and X Y are integrable and Y is measurable with respect to F , then
E [X Y | F ] = Y E [X | F ]. (A.15)
Proof If A ∈ F , then for any B ∈ F ,
$$E\big[1_A E[X \mid \mathcal F]; B\big] = E[E[X \mid \mathcal F]; A \cap B] = E[X; A \cap B] = E[1_A X; B].$$
Since $1_A E[X \mid \mathcal F]$ is $\mathcal F$ measurable, this shows that (A.15) holds when $Y = 1_A$ and $A \in \mathcal F$.
Using linearity shows that (A.15) holds whenever Y is a simple F measurable random
variable. Taking limits, (A.15) holds whenever Y ≥ 0 is F measurable and X and X Y are
integrable. Using linearity again completes the proof.
Two other equalities are contained in the following.
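Proposition A.23 If $\mathcal E \subset \mathcal F \subset \mathcal G$ are σ-fields, then
$$E\big[E[X \mid \mathcal F] \mid \mathcal E\big] = E[X \mid \mathcal E] = E\big[E[X \mid \mathcal E] \mid \mathcal F\big].$$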
Proof The right equality holds because $E[X \mid \mathcal E]$ is $\mathcal E$ measurable, hence $\mathcal F$ measurable.
We then use the fact that if $Y$ is $\mathcal F$ measurable, $E[Y \mid \mathcal F] = Y$.
To show the left equality, let $A \in \mathcal E$. Then since $A$ is also in $\mathcal F$,
$$E\big[E[E[X \mid \mathcal F] \mid \mathcal E]; A\big] = E[E[X \mid \mathcal F]; A] = E[X; A] = E[E[X \mid \mathcal E]; A].$$
Since both sides are $\mathcal E$ measurable, the equality follows.
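When σ-fields are generated by finite partitions, Proposition A.23 can be checked mechanically. The sketch below is an illustration under our own choices: three fair coin tosses, $X$ the square of the number of heads, $\mathcal E$ generated by the first toss and $\mathcal F$ by the first two tosses, so that $\mathcal E \subset \mathcal F$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Three fair coin tosses; each outcome such as "HTH" has probability 1/8.
omega = ["".join(t) for t in product("HT", repeat=3)]
prob = {w: Fraction(1, 8) for w in omega}


def partition_by(key):
    # Blocks of the partition of omega generated by the function `key`.
    blocks = {}
    for w in omega:
        blocks.setdefault(key(w), set()).add(w)
    return list(blocks.values())


def cond_exp(X, partition):
    """E[X | sigma(partition)] as a dict; on each block it is E[X; block] / P(block)."""
    out = {}
    for block in partition:
        p_block = sum(prob[w] for w in block)
        value = sum(X[w] * prob[w] for w in block) / p_block
        for w in block:
            out[w] = value
    return out


X = {w: w.count("H") ** 2 for w in omega}   # square of the number of heads
E_part = partition_by(lambda w: w[:1])       # generates the sigma-field of the first toss
F_part = partition_by(lambda w: w[:2])       # generates the sigma-field of the first two tosses

lhs = cond_exp(cond_exp(X, F_part), E_part)  # E[E[X | F] | E]
mid = cond_exp(X, E_part)                    # E[X | E]
rhs = cond_exp(mid, F_part)                  # E[E[X | E] | F]
assert lhs == mid == rhs
```

On the outcomes whose first toss is H, all three equal $9/2$; on those whose first toss is T, they equal $3/2$.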
To show the existence of E [X | F ], we proceed as follows.
Proposition A.24 If X is integrable, then E [X | F ] exists.
Proof Using linearity, we need only consider X ≥ 0. Define a finite measure Q on F by
Q(A) = E [X ; A] for A ∈ F . This is trivially absolutely continuous with respect to P|F , the
restriction of P to F . Let E [X | F ] be the Radon–Nikodym derivative of Q with respect to
P|F . Since Q and P|F are measures on F , the Radon–Nikodym derivative is F measurable,
and so provides the desired random variable.
When $\mathcal F = \sigma(Y)$, one usually writes $E[X \mid Y]$ for $E[X \mid \mathcal F]$. The notation $E[X \mid Y = y]$ is also
commonly used; it is defined as follows. If $A \in \sigma(Y)$, then $A = (Y \in B)$ for
some Borel set B by the definition of σ (Y ), or 1A = 1B(Y ). By linearity and taking limits, it
follows that if Z is σ (Y ) measurable, then Z = f (Y ) for some Borel measurable function f .
Set Z = E [X | Y ] and choose f Borel measurable so that Z = f (Y ). Then E [X | Y = y] is
defined to be f (y).
If $X \in L^2$ and $\mathcal M = \{Y \in L^2 : Y \text{ is } \mathcal F \text{ measurable}\}$, one can show that $E[X \mid \mathcal F]$ is equal
to the orthogonal projection of $X$ onto the closed subspace $\mathcal M$ of $L^2$.
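On a finite space the projection statement reduces to elementary algebra, and a short check makes it concrete. In the sketch below (our own weights, random variable, and partition; names are hypothetical), $Y_0 = E[X \mid \mathcal F]$ is computed block by block, the residual $X - Y_0$ is shown to be orthogonal to the indicator of each block, and $Y_0$ is compared in mean square with other $\mathcal F$ measurable candidates.

```python
from fractions import Fraction

# A four-point space with non-uniform weights (illustrative choice).
prob = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
X = {"a": 3, "b": -1, "c": 5, "d": 1}
partition = [{"a", "b"}, {"c", "d"}]   # F is generated by this partition


def E(Z):
    return sum(Z[w] * prob[w] for w in prob)


def cond_exp(Z):
    # Constant on each block, equal to the weighted block average of Z.
    out = {}
    for block in partition:
        value = sum(Z[w] * prob[w] for w in block) / sum(prob[w] for w in block)
        out.update({w: value for w in block})
    return out


Y0 = cond_exp(X)
residual = {w: X[w] - Y0[w] for w in prob}

# X - E[X | F] is orthogonal to each indicator 1_B of a block, hence to every
# F measurable random variable, since these are linear combinations of indicators.
for block in partition:
    assert E({w: residual[w] * (w in block) for w in prob}) == 0


def mse(c1, c2):
    # E[(X - Y)^2] for the F measurable Y equal to c1 on {a, b} and c2 on {c, d}.
    Y = {"a": c1, "b": c1, "c": c2, "d": c2}
    return E({w: (X[w] - Y[w]) ** 2 for w in prob})
```

Here $Y_0$ is $5/3$ on $\{a,b\}$ and $3$ on $\{c,d\}$, and `mse` is smallest at exactly these values, as the projection characterization predicts.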
A.6 Stopping times
We next want to talk about stopping times. Suppose we have a sequence of σ -fields Fi such
that Fi ⊂ Fi+1 for each i. An example would be if Fi = σ (X1, . . . , Xi). A random mapping
$N$ from $\Omega$ to $\{0, 1, 2, \ldots\}$ is called a stopping time if for each $n$, $(N \le n) \in \mathcal F_n$.
The proof of the following is immediate from the definitions.
Proposition A.25 (1) Fixed times n are stopping times.
(2) If N1 and N2 are stopping times, then so are N1 ∧ N2 and N1 ∨ N2.
(3) If Nn is an increasing sequence of stopping times, then so is N = supn Nn.
(4) If Nn is a decreasing sequence of stopping times, then so is N = inf n Nn.
(5) If N is a stopping time, then so is N + n.
We define
FN = {A : A ∩ (N ≤ n) ∈ Fn for all n}. (A.16)
A.7 Martingales
In this section we consider martingales. Let Fn be an increasing sequence of σ -fields. A
sequence of random variables Mn is adapted to Fn if for each n, Mn is Fn measurable.
Mn is a martingale if Mn is adapted to Fn, Mn is integrable for all n, and
$$E[M_n \mid \mathcal F_{n-1}] = M_{n-1}, \quad \text{a.s.}, \quad n = 1, 2, \ldots \qquad (A.17)$$
If we have E [Mn | Fn−1] ≥ Mn−1, a.s., for every n, then Mn is a submartingale. If we have
E [Mn | Fn−1] ≤ Mn−1, we have a supermartingale.
Let us look at some examples. If $X_i$ is a sequence of mean zero independent random
variables, $S_n = \sum_{i=1}^n X_i$, and $\mathcal F_n = \sigma(X_1, \ldots, X_n)$, then $M_n = S_n$ is a martingale, since
E [Mn | Fn−1] = Mn−1 + E [Mn − Mn−1 | Fn−1]
= Mn−1 + E [Mn − Mn−1] = Mn−1,
using independence.
Another example is the following. If the $X_i$'s are independent and have mean zero and
variance one, $S_n$ is as in the previous example, and $M_n = S_n^2 - n$, then
$$E[S_n^2 \mid \mathcal F_{n-1}] = E[(S_n - S_{n-1})^2 \mid \mathcal F_{n-1}] + 2S_{n-1}E[S_n \mid \mathcal F_{n-1}] - S_{n-1}^2 = 1 + S_{n-1}^2,$$
using independence. It follows that Mn is a martingale.
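For the second example the martingale identity can be verified exactly when the $X_i$ are fair $\pm 1$ coin flips (our choice of a mean zero, variance one law). Conditioning on $\mathcal F_{n-1}$ then amounts to averaging over the two possible values of the $n$th step, for every possible history of the first $n-1$ steps.

```python
from fractions import Fraction
from itertools import product


def check_square_martingale(n):
    """Check E[M_n | F_{n-1}] = M_{n-1} for M_k = S_k^2 - k, simple random walk.

    X_i = +1 or -1 with probability 1/2 each (mean zero, variance one).
    F_{n-1} is generated by the first n-1 steps, so conditioning on it means
    averaging over the two equally likely values of the n-th step.
    """
    for prefix in product((1, -1), repeat=n - 1):
        s_prev = sum(prefix)
        m_prev = s_prev ** 2 - (n - 1)
        cond = Fraction(sum((s_prev + x) ** 2 - n for x in (1, -1)), 2)
        if cond != m_prev:
            return False
    return True
```

Algebraically, $\tfrac12[(s+1)^2 + (s-1)^2] - n = s^2 - (n-1)$, which is the computation displayed above specialized to $\pm 1$ steps.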
A third example is the following: if X ∈ L1 and Mn = E [X | Fn], then Mn is a martingale.
The proof of this is simple:
E [Mn+1 | Fn] = E [E [X | Fn+1] | Fn] = E [X | Fn] = Mn.
If Mn is a martingale, g is convex, and g(Mn) is integrable for each n, then by Jensen’s
inequality for conditional expectations,
E [g(Mn+1) | Fn] ≥ g(E [Mn+1 | Fn]) = g(Mn), (A.18)
or g(Mn) is a submartingale. Similarly if g is convex and increasing on [0, ∞) and Mn is a
positive submartingale, then g(Mn) is a submartingale because
E [g(Mn+1) | Fn] ≥ g(E [Mn+1 | Fn]) ≥ g(Mn).
A.8 Optional stopping
Note that if one takes expectations in (A.17), one has E Mn = E Mn−1, and by induction
E Mn = E M0. The theorem about martingales that lies at the basis of all other results is
Doob’s optional stopping theorem, which says that the same is true if we replace n by a
stopping time N . There are various versions, depending on what conditions one puts on the
stopping times.
Theorem A.26 If $N$ is a stopping time with respect to $\mathcal F_n$ that is bounded by a positive integer
$K$ and $M_n$ is a martingale, then $EM_N = EM_0$.
Proof We write
$$EM_N = \sum_{k=0}^K E[M_N; N = k] = \sum_{k=0}^K E[M_k; N = k].$$
Note $(N = k)$ is $\mathcal F_j$ measurable if $j \ge k$, so by the martingale property (A.17)
$$E[M_k; N = k] = E[M_{k+1}; N = k] = E[M_{k+2}; N = k] = \cdots = E[M_K; N = k].$$
Hence
$$EM_N = \sum_{k=0}^K E[M_K; N = k] = EM_K = EM_0.$$
This completes the proof.
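Theorem A.26 can be confirmed exactly for a simple random walk by enumerating all $2^K$ sign sequences. The sketch assumes $\pm 1$ steps with probability $1/2$ each and the stopping time $N = \min\{k : |S_k| = 2\} \wedge K$ (our choices); the theorem predicts $ES_N = ES_0 = 0$.

```python
from fractions import Fraction
from itertools import product


def expected_stopped_value(K, level=2):
    """E[S_N] for a simple random walk stopped at N = min{k : |S_k| = level} ∧ K.

    N is a stopping time bounded by K, so Theorem A.26 predicts E S_N = E S_0 = 0.
    Each of the 2^K sign sequences has probability 2^-K; steps after N are ignored.
    """
    total = Fraction(0)
    for steps in product((1, -1), repeat=K):
        s = 0
        for x in steps:
            if abs(s) == level:   # the walk has already stopped
                break
            s += x
        total += Fraction(s, 2 ** K)
    return total
```

The bound on $N$ matters: for the rule “stop when $S_k = 1$”, which is finite almost surely but unbounded, $S_N = 1$ and so $ES_N = 1 \ne ES_0$.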
The same proof as that in Theorem A.26 gives the following corollary.
Corollary A.27 If N is a stopping time bounded by K and Mn is a submartingale, then
E MN ≤ E MK.
The same proof also gives
Corollary A.28 If N is a stopping time bounded by K, A ∈ FN , and Mn is a submartingale,
then E [MN ; A] ≤ E [MK; A].
Proposition A.29 If $N_1 \le N_2$ are stopping times bounded by $K$ and $M$ is a martingale, then
$E[M_{N_2} \mid \mathcal F_{N_1}] = M_{N_1}$, a.s.
Proof Suppose $A \in \mathcal F_{N_1}$. We need to show $E[M_{N_1}; A] = E[M_{N_2}; A]$. Define a new stopping
time $N_3$ by
$$N_3(\omega) = \begin{cases} N_1(\omega), & \omega \in A, \\ N_2(\omega), & \omega \notin A. \end{cases}$$
It is easy to check that $N_3$ is a stopping time, so $EM_{N_3} = EM_K = EM_{N_2}$ implies
$$E[M_{N_1}; A] + E[M_{N_2}; A^c] = E[M_{N_2}].$$
Subtracting $E[M_{N_2}; A^c]$ from each side completes the proof.
The following is known as the Doob decomposition for discrete time martingales.
Proposition A.30 Suppose $X_k$ is a submartingale with respect to an increasing sequence of
σ-fields $\mathcal F_k$. Then we can write $X_k = M_k + A_k$ such that $M_k$ is a martingale adapted to the $\mathcal F_k$
and $A_k$ is a sequence of random variables with $A_k$ being $\mathcal F_{k-1}$ measurable and $0 = A_0 \le A_1 \le \cdots$.
Proof Let $a_k = E[X_k \mid \mathcal F_{k-1}] - X_{k-1}$ for $k = 1, 2, \ldots$ Since $X_k$ is a submartingale, each
$a_k \ge 0$. Let $A_k = \sum_{i=1}^k a_i$. The fact that the $A_k$ are increasing and measurable with respect to
$\mathcal F_{k-1}$ is clear. Set $M_k = X_k - A_k$. Then
E [Mk+1 − Mk | Fk] = E [Xk+1 − Xk | Fk] − ak+1 = 0,
or Mk is a martingale.
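A concrete Doob decomposition (our example, not the text's): for a simple random walk $S_k$, the submartingale $X_k = |S_k|$ has $a_k = 1$ when $S_{k-1} = 0$ and $a_k = 0$ otherwise, so $A_k$ counts the visits to 0 before time $k$. The sketch below computes $X$, $A$, and $M = X - A$ along a given path of $\pm 1$ steps, following the formulas in the proof.

```python
from fractions import Fraction


def compensator_increment(prefix):
    """a_k = E[X_k | F_{k-1}] - X_{k-1} for X_k = |S_k|, simple random walk.

    `prefix` holds the first k-1 steps (each +1 or -1 with probability 1/2).
    """
    s = sum(prefix)
    cond = Fraction(abs(s + 1) + abs(s - 1), 2)   # average over the k-th step
    return cond - abs(s)


def doob_decomposition(steps):
    """Return (X, A, M) along one path, with A_0 = 0 and M_k = X_k - A_k."""
    X, A, M = [0], [Fraction(0)], [Fraction(0)]
    s = 0
    for k, x in enumerate(steps, start=1):
        A.append(A[-1] + compensator_increment(steps[:k - 1]))
        s += x
        X.append(abs(s))
        M.append(X[-1] - A[-1])
    return X, A, M
```

Note that $A_k$ depends only on the first $k-1$ steps, as the proposition requires.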
Combining Propositions A.29 and A.30 we have
Corollary A.31 Suppose $X_k$ is a submartingale, and $N_1 \le N_2$ are bounded stopping times.
Then
$$E[X_{N_2} \mid \mathcal F_{N_1}] \ge X_{N_1}, \quad \text{a.s.}$$
A.9 Doob’s inequalities
The first interesting consequences of the optional stopping theorems are Doob’s inequalities.
If $M_n$ is a martingale, set $M^*_n = \max_{i \le n} |M_i|$.
Theorem A.32 If $M_n$ is a martingale or a positive submartingale and $a > 0$,
$$P(M^*_n \ge a) \le \frac{1}{a} E[|M_n|; M^*_n \ge a] \le \frac{1}{a} E|M_n|.$$
Proof Fix $n$. Set $M_{n+1} = M_n$. Let $N = \min\{j : |M_j| \ge a\} \wedge (n+1)$. Since the function
$f(x) = |x|$ is convex, $|M_n|$ is a submartingale (for a positive submartingale, $|M_n| = M_n$). If $A = (M^*_n \ge a)$, then $A \in \mathcal F_N$ and we have
$$aP(M^*_n \ge a) \le E[|M_N|; A] \le E[|M_n|; A] \le E|M_n|,$$
the first inequality by the definition of $N$, the second by Corollary A.28.
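For a simple random walk (a martingale, with $\pm 1$ steps of probability $1/2$; our choice) every quantity in Theorem A.32 can be computed exactly by enumerating paths. The sketch below returns the three members of the chain of inequalities so that they can be compared directly.

```python
from fractions import Fraction
from itertools import product


def maximal_inequality_terms(n, a):
    """Exact P(M*_n >= a), E[|M_n|; M*_n >= a] / a and E|M_n| / a.

    M_k = S_k is a simple random walk started at 0 (+1/-1 steps with
    probability 1/2 each) and M*_n = max_{i <= n} |M_i|.
    """
    p_max = Fraction(0)
    restricted = Fraction(0)
    full = Fraction(0)
    weight = Fraction(1, 2 ** n)
    for steps in product((1, -1), repeat=n):
        s, running_max = 0, 0
        for x in steps:
            s += x
            running_max = max(running_max, abs(s))
        if running_max >= a:
            p_max += weight
            restricted += weight * abs(s)
        full += weight * abs(s)
    return p_max, restricted / a, full / a
```

Enumerating all $2^n$ paths keeps the check exact, at the cost of being practical only for small $n$.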
For p > 1, we have the following inequality.
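Theorem A.33 If $p > 1$, $M$ is a martingale or positive submartingale, and $E|M_i|^p < \infty$ for $i \le n$, then
$$E(M^*_n)^p \le \Big(\frac{p}{p-1}\Big)^p E|M_n|^p.$$
Proof Note $M^*_n \le \sum_{i=1}^n |M_i|$, hence $M^*_n \in L^p$. We write, using Theorem A.32,
$$\begin{aligned} E(M^*_n)^p &= \int_0^\infty pa^{p-1} P(M^*_n > a)\, da \le \int_0^\infty pa^{p-1} E\big[|M_n| 1_{(M^*_n \ge a)}/a\big]\, da \\ &= E\int_0^{M^*_n} pa^{p-2} |M_n|\, da = \frac{p}{p-1} E\big[(M^*_n)^{p-1} |M_n|\big] \\ &\le \frac{p}{p-1} \big(E(M^*_n)^p\big)^{(p-1)/p} \big(E|M_n|^p\big)^{1/p}. \end{aligned}$$
The last inequality follows by Hölder's inequality. Now divide both sides by the quantity
$(E(M^*_n)^p)^{(p-1)/p}$ (if it is zero there is nothing to prove) and raise both sides to the $p$th power.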
A.10 Martingale convergence theorem
The martingale convergence theorem is another important consequence of optional stopping.
The main step is the upcrossing lemma. The number of upcrossings of an interval [a, b] is
the number of times a process M crosses from below a to above b.
To be more exact, let
S1 = min{k : Mk ≤ a}, T1 = min{k > S1 : Mk ≥ b},
and
Si+1 = min{k > Ti : Mk ≤ a}, Ti+1 = min{k > Si+1 : Mk ≥ b}.
The number of upcrossings Un before time n is Un = max{ j : Tj ≤ n}.
Theorem A.34 (Upcrossing lemma) If Mk is a submartingale,
$$EU_n \le \frac{1}{b-a} E[(M_n - a)^+].$$
Proof The number of upcrossings of [a, b] by Mk is the same as the number of upcrossings
of [0, b − a] by Yk = (Mk − a)+, where x+ = x ∨ 0. Moreover Yk is still a submartingale.
If we obtain the inequality for the number of upcrossings of the interval [0, b − a] by the
process Yk , we will have the desired inequality for upcrossings of M .
Thus we may assume a = 0. Fix n and define Yn+1 = Yn. This will still be a submartingale.
Define Si, Ti as above, and let S′i = Si ∧ (n + 1), T ′i = Ti ∧ (n + 1). Since Ti+1 > Si+1 > Ti,
then T ′n+1 = n + 1.
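We write
$$EY_{n+1} = EY_{S'_1} + \sum_{i=1}^{n+1} E[Y_{T'_i} - Y_{S'_i}] + \sum_{i=1}^{n+1} E[Y_{S'_{i+1}} - Y_{T'_i}].$$
All the summands in the third term on the right are non-negative since $Y_k$ is a submartingale (apply Corollary A.31 to the bounded stopping times $T'_i \le S'_{i+1}$ and take expectations).
The first term on the right will be non-negative since $Y$ is non-negative. For the $j$th upcrossing,
$Y_{T'_j} - Y_{S'_j} \ge b - a$, while $Y_{T'_j} - Y_{S'_j}$ is always greater than or equal to 0. Thus
$$\sum_{i=1}^{n+1} (Y_{T'_i} - Y_{S'_i}) \ge (b - a)U_n.$$
Hence
$$EU_n \le \frac{1}{b-a} EY_{n+1}. \qquad (A.19)$$
Since $Y_{n+1} = Y_n = (M_n - a)^+$, this is the desired inequality.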
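The definitions of $S_i$ and $T_i$ translate into a single scan of the path. The sketch below counts completed upcrossings that way and then, assuming a simple random walk started at 0 (our choice of submartingale), compares $EU_n$ with $E[(M_n - a)^+]/(b - a)$ exactly.

```python
from fractions import Fraction
from itertools import product


def count_upcrossings(path, a, b):
    """Number of completed upcrossings of [a, b] by path[0], ..., path[n].

    Follows the definitions in the text: S_1 is the first k with M_k <= a,
    T_1 the first k > S_1 with M_k >= b, S_2 the first k > T_1 with M_k <= a,
    and so on; the count is U_n = max{j : T_j <= n}.
    """
    count = 0
    waiting_for_low = True          # currently searching for the next S_i
    for value in path:
        if waiting_for_low:
            if value <= a:
                waiting_for_low = False
        elif value >= b:
            count += 1              # this time is T_i
            waiting_for_low = True
    return count


def upcrossing_sides(n, a, b):
    # Exact E U_n and E[(M_n - a)^+] / (b - a) for a simple random walk with M_0 = 0.
    lhs = rhs = Fraction(0)
    weight = Fraction(1, 2 ** n)
    for steps in product((1, -1), repeat=n):
        path = [0]
        for x in steps:
            path.append(path[-1] + x)
        lhs += weight * count_upcrossings(path, a, b)
        rhs += weight * max(path[-1] - a, 0)
    return lhs, rhs / (b - a)
```

For example, the path $0, -1, 1, 2, 0, -1, 3$ completes two upcrossings of $[0, 2]$.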
This leads to the martingale convergence theorem.
Theorem A.35 If $M_n$ is a submartingale such that $\sup_n EM_n^+ < \infty$, then $M_n$ converges almost surely as $n \to \infty$.
Proof For each $a < b$, let $U_n(a,b)$ be the number of upcrossings of $[a,b]$ by $M$ up to time $n$, and let $U(a,b) = \lim_{n\to\infty} U_n(a,b)$. For each pair $a < b$ of rational numbers, by monotone convergence,
$$EU(a,b) \le \frac{1}{b-a} \sup_n E(M_n - a)^+ < \infty.$$
Thus $U(a,b) < \infty$, a.s. If $N_{a,b}$ is the set of $\omega$'s where $U(a,b) = \infty$ and $N = \bigcup_{a<b,\ a,b \in \mathbb Q} N_{a,b}$, then $P(N) = 0$. If $\omega \notin N$, there is no pair of rationals $a < b$ such that $\limsup_{n\to\infty} M_n(\omega) > b > a > \liminf_{n\to\infty} M_n(\omega)$, for otherwise $M_n(\omega)$ would upcross $[a,b]$ infinitely often. Therefore $M_n$ converges almost surely, although we still have to rule out the possibility of the limit
being infinite. Since $M_n$ is a submartingale, $EM_n \ge EM_0$, and thus
$$E|M_n| = EM_n^+ + EM_n^- = 2EM_n^+ - EM_n \le 2EM_n^+ - EM_0.$$
By Fatou's lemma,
$$E\lim_n |M_n| \le \sup_n E|M_n| \le 2\sup_n EM_n^+ - EM_0 < \infty,$$
so $M_n$ converges almost surely to a finite limit.
Corollary A.36 If $X_n$ is a positive supermartingale or a martingale bounded above or below, $X_n$ converges almost surely.
Proof If $X_n$ is a positive supermartingale, $-X_n$ is a submartingale bounded above by 0. Now apply Theorem A.35. If $X_n$ is a martingale bounded above, by considering $-X_n$, we may assume $X_n$ is bounded below. Looking at $X_n + M$ for fixed $M$ will not affect the convergence, so we may assume $X_n$ is bounded below by 0. Now apply the first assertion of the corollary.
$M_n$ is a uniformly integrable martingale if the collection of random variables $\{M_n\}$ is uniformly integrable.
Proposition A.37 (1) If $M_n$ is a martingale with $\sup_n E|M_n|^p < \infty$ for some $p > 1$, then the
convergence is in $L^p$ as well as almost surely. This is also true when $M_n$ is a submartingale.
(2) If $M_n$ is a uniformly integrable martingale, then the convergence is in $L^1$.
(3) If $M_n \to M_\infty$ in $L^1$, then $M_n = E[M_\infty \mid \mathcal F_n]$.
Proof (1) If $\sup_n E|M_n|^p < \infty$, then $\sup_n EM_n^+ < \infty$ and $M_n$ converges almost surely. Let $M_\infty$ be the limit. Then $|M_n - M_\infty| \to 0$, a.s., and
$$E\sup_n |M_n - M_\infty|^p \le cE\sup_n |M_n|^p + cE|M_\infty|^p \le cE\sup_n |M_n|^p \le c\sup_n E|M_n|^p < \infty.$$
The second inequality is by Fatou's lemma and the last by Doob's inequalities, Theorem A.33. The $L^p$ convergence assertion now follows by dominated convergence.
(2) The $L^1$ convergence assertion follows since almost sure convergence together with uniform integrability implies $L^1$ convergence by the Vitali convergence theorem, Theorem A.19.
(3) Finally, if $j < n$, we have $M_j = E[M_n \mid \mathcal F_j]$. If $A \in \mathcal F_j$,
$$E[M_j; A] = E[M_n; A] \to E[M_\infty; A]$$
by the $L^1$ convergence of $M_n$ to $M_\infty$. Since this is true for all $A \in \mathcal F_j$, $M_j = E[M_\infty \mid \mathcal F_j]$.
A.11 Strong law of large numbers
Suppose we have a sequence $X_1, X_2, \ldots$ of independent and identically distributed random variables. This means that the $X_i$ are independent and each has the same law as $X_1$. This situation is very common, and we abbreviate this by saying the $X_i$ are i.i.d. Define
$$S_n = \sum_{i=1}^n X_i.$$
The $S_n$ are called partial sums. In this section we suppose $E|X_1| < \infty$. The strong law of large numbers is the precise version of the law of averages.
Theorem A.38 If $X_i$ is an i.i.d. sequence and $E|X_1| < \infty$, then
$$\frac{S_n}{n} \to EX_1, \quad \text{a.s.}$$
The proof we give is a mixture of the standard one and some martingale techniques. The standard proof (see, e.g., Chung (2001)) uses no martingale methods, while there is a proof (see Durrett (1996)) that is entirely martingale based.
Proof We may assume $EX_i = 0$, for otherwise we replace $X_i$ by $X_i - EX_i$. Let
$$Y_n = X_n 1_{(|X_n| \le n)}, \qquad Z_n = Y_n - EY_n, \qquad M_n = \sum_{i=1}^n \frac{Z_i}{i}.$$
Let $\mathcal F_n = \sigma(X_1, \ldots, X_n)$. Note that the $Z_i$ are independent but not identically distributed. Using the independence, $M_n$ is a martingale:
$$E[M_{n+1} \mid \mathcal F_n] = M_n + \frac{1}{n+1} E[Z_{n+1} \mid \mathcal F_n] = M_n + \frac{1}{n+1} E[Z_{n+1}] = M_n.$$
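We will need the estimate
$$\sum_{i=1}^\infty P(|X_1| \ge i) = \sum_{i=1}^\infty \int_{i-1}^i P(|X_1| \ge i)\, dx \le \int_0^\infty P(|X_1| \ge x)\, dx = E|X_1| < \infty, \qquad (A.20)$$
using Proposition A.4.
We show that $E|M_n|$ is bounded by a constant not depending on $n$. In fact, again using Proposition A.4,
$$\begin{aligned} EM_n^2 = \operatorname{Var} M_n &= \sum_{i=1}^n \frac{\operatorname{Var} Z_i}{i^2} = \sum_{i=1}^n \frac{1}{i^2} \operatorname{Var} Y_i \le \sum_{i=1}^n \frac{1}{i^2} EY_i^2 \\ &\le \sum_{i=1}^\infty \frac{1}{i^2} \int_0^i 2yP(|X_1| \ge y)\, dy = 2\sum_{i=1}^\infty \frac{1}{i^2} \int_0^\infty 1_{(y \le i)}\, yP(|X_1| \ge y)\, dy \\ &= 2\int_0^\infty \sum_{i=1}^\infty \frac{1}{i^2} 1_{(y \le i)}\, yP(|X_1| \ge y)\, dy \le c\int_0^\infty \frac{1}{y} \cdot yP(|X_1| \ge y)\, dy \\ &= c\int_0^\infty P(|X_1| \ge y)\, dy = cE|X_1| < \infty, \end{aligned}$$
where for the last inequality we used $\sum_{i \ge y} i^{-2} \le 2/y$ for $y > 0$, so one can take $c = 4$. The uniform bound on $E|M_n|$ follows by Jensen's inequality. By the martingale convergence theorem, $M_n$ converges almost surely; let $M_\infty$ be the limit. Some elementary calculus shows that $\frac{1}{n}\sum_{i=1}^n M_i$ also converges to $M_\infty$, a.s.
We now use summation by parts as follows. Since $i(M_i - M_{i-1}) = Z_i$ and $M_0 = 0$, then
$$\frac{1}{n}\sum_{i=1}^n Z_i = \frac{1}{n}\sum_{i=1}^n (iM_i - iM_{i-1}) = \frac{1}{n}\Big(\sum_{i=1}^n iM_i - \sum_{i=1}^{n-1} (i+1)M_i\Big) = M_n - \frac{n-1}{n}\Big(\frac{1}{n-1}\sum_{i=1}^{n-1} M_i\Big) \to M_\infty - M_\infty = 0.$$
By dominated convergence and the fact that the $X_i$ are identically distributed,
$$EY_n = E[X_n 1_{(|X_n| \le n)}] = E[X_1 1_{(|X_1| \le n)}] \to EX_1 = 0$$
as $n \to \infty$, and this implies $\frac{1}{n}\sum_{i=1}^n EY_i \to 0$. Since $Y_i = Z_i + EY_i$, we conclude
$$\frac{1}{n}\sum_{i=1}^n Y_i \to 0, \quad \text{a.s.}$$
Finally,
$$\sum_{i=1}^\infty P(X_i \ne Y_i) \le \sum_{i=1}^\infty P(|X_i| \ge i) = \sum_{i=1}^\infty P(|X_1| \ge i) < \infty,$$
so by the Borel–Cantelli lemma, except for a set of probability zero, $X_i = Y_i$ for all $i$ greater than some positive integer $I$ ($I$ depends on $\omega$). Hence
$$\Big|\frac{1}{n}\sum_{i=1}^n X_i - \frac{1}{n}\sum_{i=1}^n Y_i\Big| \le \frac{1}{n}\sum_{i=1}^I |X_i - Y_i| \to 0, \quad \text{a.s.}$$
This completes the proof.
The following extension of the strong law will be needed when comparing a random walk and a Brownian motion.
Proposition A.39 Suppose $X_i$ is an i.i.d. sequence and $E|X_1| < \infty$. Then
$$\frac{\max_{k \le n} |S_k - ES_k|}{n} \to 0, \quad \text{a.s.}$$
Proof By looking at $X_i - EX_i$, we may assume $EX_i = 0$. Let $j(n)$ be (one of) the value(s) of $j \le n$ such that $|S_j| = \max_{k \le n} |S_k|$. By Theorem A.38, $S_n(\omega)/n \to 0$ for almost every $\omega$; fix such an $\omega$. It suffices to show $|S_{j(n)}(\omega)|/n \to 0$.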
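If not, for this $\omega$, either
(1) there is a subsequence $n_k \to \infty$ and $\varepsilon > 0$ such that $j(n_k) \to \infty$ and $|S_{j(n_k)}|/n_k \ge \varepsilon$ for all $k$; or
(2) there exists a subsequence $n_k \to \infty$, $\varepsilon > 0$, and $N > 1$ such that $j(n_k) \le N$ and $|S_{j(n_k)}|/n_k \ge \varepsilon$ for all $k$.
In case (1), since $j(n_k) \to \infty$,
$$\frac{|S_{j(n_k)}|}{n_k} = \frac{|S_{j(n_k)}|}{j(n_k)} \cdot \frac{j(n_k)}{n_k} \le \frac{|S_{j(n_k)}|}{j(n_k)} \to 0,$$
a contradiction. In case (2),
$$\frac{|S_{j(n_k)}|}{n_k} \le \frac{\max_{m \le N} |S_m|}{n_k} \to 0,$$
also a contradiction.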
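A simulation can make Theorem A.38 and Proposition A.39 concrete, although of course it proves nothing. The sketch below assumes i.i.d. exponential steps with mean 1 (our choice; any law with $E|X_1| < \infty$ is covered) and tracks both $S_n/n$ and $\max_{k \le n} |S_k - ES_k|/n$, which should settle near 1 and 0, respectively, as $n$ grows.

```python
import random


def running_stats(n, rng):
    """Simulate S_k for i.i.d. Exponential(1) steps and return
    (S_n / n, max_{k <= n} |S_k - E S_k| / n).

    E X_1 = 1, so E S_k = k. Exponential(1) is an illustrative choice.
    """
    s = 0.0
    worst = 0.0
    for k in range(1, n + 1):
        s += rng.expovariate(1.0)
        worst = max(worst, abs(s - k))
    return s / n, worst / n


rng = random.Random(12345)
for n in (10, 1_000, 100_000):
    mean, max_dev = running_stats(n, rng)
    print(n, round(mean, 4), round(max_dev, 4))
```

The maximal deviation is typically of order $\sqrt n$, so dividing by $n$ sends it to zero, in line with Proposition A.39.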
Another application of the strong law of large numbers is the Glivenko–Cantelli
theorem. Let Xi be i.i.d. random variables which have a uniform distribution on [0, 1],
that is, P(X1 ≤ t) = t if 0 ≤ t ≤ 1. Let
$$F_n(t) = \frac{1}{n}\sum_{i=1}^n 1_{[0,t]}(X_i), \qquad 0 \le t \le 1.$$
By the strong law, Fn(t) → t, a.s., for each t. The Glivenko–Cantelli theorem says that the
convergence is uniform over t.
Theorem A.40 With Fn as above,
$$\sup_{0 \le t \le 1} |F_n(t) - t| \to 0, \quad \text{a.s.}$$
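Proof For each $t \in [0,1]$, $1_{[0,t]}(X_i)$ is a sequence of i.i.d. random variables with expectation
$P(X_i \le t) = t$. By the strong law of large numbers, for each $t$, $F_n(t) \to t$, a.s. Let $N_t$ be the
set of $\omega$ such that $F_n(t)(\omega)$ does not converge to $t$, and let $N = \bigcup_{t \in \mathbb Q \cap [0,1]} N_t$. Then $P(N) = 0$, since $N$ is a countable union of null sets.
Let $\varepsilon > 0$ and take $\omega \notin N$. Take an integer $m > 2/\varepsilon$ and choose $n_0$ large enough (depending on $\omega$)
such that
$$|F_n(k/m)(\omega) - k/m| < \varepsilon/2, \qquad k = 0, 1, 2, \ldots, m, \quad \text{if } n \ge n_0.$$
Then if $n \ge n_0$ and $k/m \le t < (k+1)/m$, since $F_n$ is increasing and $1/m < \varepsilon/2$,
$$F_n(t) - t \le F_n((k+1)/m) - k/m \le F_n((k+1)/m) - (k+1)/m + \varepsilon/2 < \varepsilon,$$
and similarly $F_n(t) - t > -\varepsilon$. Hence for $n \ge n_0$,
$$\sup_{t \in [0,1]} |F_n(t) - t| \le \varepsilon.$$
Since $\varepsilon$ is arbitrary, this proves the uniform convergence.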
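The supremum in Theorem A.40 can be computed exactly from a sample, because $F_n$ is a step function and $t$ is linear: the largest gap occurs at, or just before, one of the jumps. The sketch below uses the order statistics $U_{(1)} \le \cdots \le U_{(n)}$ and assumes no ties (true almost surely); the function name is ours.

```python
import random


def sup_deviation(sample):
    """sup_{0 <= t <= 1} |F_n(t) - t| for the empirical distribution function F_n.

    With order statistics U_(1) <= ... <= U_(n), F_n jumps from (i-1)/n to i/n
    at U_(i), so the supremum is max_i max(i/n - U_(i), U_(i) - (i-1)/n).
    """
    u = sorted(sample)
    n = len(u)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(u))


rng = random.Random(0)
for n in (10, 100, 10_000):
    print(n, sup_deviation([rng.random() for _ in range(n)]))
```

Running it shows the deviation shrinking with $n$, roughly like $1/\sqrt n$, as the Glivenko–Cantelli theorem leads one to expect.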
A.12 Weak convergence
We will see soon that if the $X_i$ are i.i.d. with mean zero and variance one, then $S_n/\sqrt n$
converges in the sense that
$$P(S_n/\sqrt n \in [a,b]) \to P(Z \in [a,b]),$$
where $Z$ is a standard normal. We want to generalize the above type of convergence.
We say Fn converges weakly to F if Fn(x) → F (x) for all x at which F is continuous. Here
Fn and F are distribution functions. We say Xn converges weakly to X if FXn converges weakly
to FX . We also say Xn converges in distribution or converges in law to X . Probabilities μn
converge weakly if their corresponding distribution functions converge, that is, if Fμn (x) =
μn(−∞, x] converges weakly.
An example that illustrates why we restrict the convergence to continuity points of $F$ is
the following. Let $X_n = 1/n$ with probability one, and $X = 0$ with probability one. $F_{X_n}(x)$ is
0 if $x < 1/n$ and 1 otherwise. Note $F_{X_n}(x)$ converges to $F_X(x)$ for all $x$ except $x = 0$, where $F_{X_n}(0) = 0$ for every $n$ but $F_X(0) = 1$.
Proposition A.41 $X_n$ converges weakly to $X$ if and only if $Eg(X_n) \to Eg(X)$ for all $g$ bounded and continuous.
Proof Suppose $Eg(X_n) \to Eg(X)$ whenever $g$ is bounded and continuous. Let $\varepsilon > 0$
and suppose $x$ is a continuity point of $F_X$. Choose $\delta$ such that
$$F_X(x) - \varepsilon < F_X(x - \delta) \le F_X(x + \delta) < F_X(x) + \varepsilon.$$
Let $g$ be a continuous function taking values in $[0,1]$ such that $g$ equals 1 on $(-\infty, x]$ and equals 0 on $[x + \delta, \infty)$. Then
$$\limsup_{n\to\infty} F_{X_n}(x) \le \limsup_{n\to\infty} Eg(X_n) = Eg(X) \le F_X(x + \delta) < F_X(x) + \varepsilon.$$
A similar argument shows that $\liminf_{n\to\infty} F_{X_n}(x) > F_X(x) - \varepsilon$. Since $\varepsilon$ is arbitrary,
$\lim_{n\to\infty} F_{X_n}(x) = F_X(x)$.
Now suppose $X_n \to X$ weakly. Let $\varepsilon > 0$ and choose $M > 0$ such that $M$ and $-M$ are
continuity points for $F_X$ and also continuity points for each of the $F_{X_n}$, $F_X(-M) < \varepsilon$, and $F_X(M) > 1 - \varepsilon$. Suppose $g$ is bounded and continuous on $\mathbb R$ and without loss of generality
suppose $g$ is bounded by 1. Then
$$\begin{aligned} \limsup_{n\to\infty} \big|E[g(X_n); X_n \notin [-M, M)]\big| &\le \limsup_{n\to\infty} P(|X_n| \ge M) \\ &\le \limsup_{n\to\infty} F_{X_n}(-M) + \limsup_{n\to\infty} \big(1 - F_{X_n}(M)\big) \\ &\le 2\varepsilon. \end{aligned} \qquad (A.21)$$
Similarly,
$$\big|E[g(X); X \notin [-M, M)]\big| \le 2\varepsilon. \qquad (A.22)$$
Take $f$ to be a step function of the form $\sum_{i=1}^m c_i 1_{(a_i, b_i]}$ such that $|f(x) - g(x)| < \varepsilon$ for $x \in [-M, M)$ and each $a_i$ and $b_i$ is a continuity point for $F_X$ and also for each of the $F_{X_n}$. Then
$$Ef(X_n) = \sum_{i=1}^m c_i\big(F_{X_n}(b_i) - F_{X_n}(a_i)\big) \to \sum_{i=1}^m c_i\big(F_X(b_i) - F_X(a_i)\big) = Ef(X). \qquad (A.23)$$
Finally, since $f$ differs from $g$ by at most $\varepsilon$ on $[-M, M)$,
$$\big|Ef(X_n) - E[g(X_n); X_n \in [-M, M)]\big| \le \varepsilon \qquad (A.24)$$
and similarly when $X_n$ is replaced by $X$. Combining (A.21), (A.22), (A.23), and (A.24) and using the fact that $\varepsilon$ is arbitrary shows that $Eg(X_n) \to Eg(X)$.
Let us examine the relationship between weak convergence and convergence in probability. If $X_i$ is an i.i.d. sequence, then $X_i$ converges weakly, in fact, to $X_1$, since all the $X_i$'s have the same distribution. But from the independence it is not hard to see that the sequence $X_i$ does not converge in probability unless the $X_i$'s are almost surely equal to a constant. Therefore one can have weak convergence without convergence in probability.
Proposition A.42 (1) If $X_n$ converges to $X$ in probability, then it converges weakly.
(2) If $X_n$ converges weakly to a constant, it converges in probability.
(3) (Slutsky's theorem) If $X_n$ converges weakly to $X$ and $Y_n$ converges weakly to a constant $b$, then $X_n + Y_n$ converges weakly to $X + b$ and $X_nY_n$ converges weakly to $bX$.
Proof To prove (1), let $g$ be a bounded and continuous function. If $n_j$ is any subsequence, then by Proposition A.14(3) there exists a further subsequence $n_{j_k}$ such that $X_{n_{j_k}}$ converges almost surely to $X$. Then by dominated convergence, $Eg(X_{n_{j_k}}) \to Eg(X)$. Since every subsequence of the real sequence $Eg(X_n)$ has a further subsequence converging to $Eg(X)$, the whole sequence converges to $Eg(X)$, and Proposition A.41 shows that $X_n$ converges weakly to $X$.
For (2), if $X_n$ converges weakly to $b$,
$$P(X_n - b > \varepsilon) = P(X_n > b + \varepsilon) = 1 - P(X_n \le b + \varepsilon) \to 1 - P(b \le b + \varepsilon) = 0.$$
We use the fact that if Y is identically equal to b, then b + ε is a point of continuity for FY .
A similar equation shows P(Xn − b ≤ −ε) → 0, so P(|Xn − b| > ε) → 0.
We now prove the first part of (3), leaving the second part for the reader. Let x be a point
such that x − b is a continuity point of FX . Choose ε so that x − b + ε is again a continuity
point. Then
P(Xn + Yn ≤ x) ≤ P(Xn + b ≤ x + ε) + P(|Yn − b| > ε) → P(X ≤ x − b + ε).
Hence lim sup P(Xn + Yn ≤ x) ≤ P(X + b ≤ x + ε). Since ε can be arbitrarily small and
x − b is a continuity point of FX , then lim sup P(Xn + Yn ≤ x) ≤ P(X + b ≤ x). The lim inf
is done similarly.
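We say a sequence of distribution functions $\{F_n\}$ is tight if for each $\varepsilon > 0$ there exists $M$ such
that $F_n(M) \ge 1 - \varepsilon$ and $F_n(-M) \le \varepsilon$ for all $n$. A sequence of random variables $\{X_n\}$ is tight
if the corresponding distribution functions are tight; this is equivalent to requiring that for each $\varepsilon > 0$ there exists $M$ such that $P(|X_n| \ge M) \le \varepsilon$ for all $n$.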
Theorem A.43 (Helly’s theorem) Let Fn be a sequence of distribution functions that is
tight. There exists a subsequence n j and a distribution function F such that Fnj converges
weakly to F.
What could conceivably happen is that Xn is identically equal to n, so that FXn → 0, but the
function F that is identically equal to 0 is not a distribution function; the tightness precludes
this.
Proof Let qk be an enumeration of the rationals. Since Fn(qk ) ∈ [0, 1], any subsequence
has a further subsequence that converges. Use a diagonalization argument (as in the proof
of the Ascoli–Arzelà theorem; see Rudin (1976)) so that Fnj (qk ) converges for each qk and
call the limit F (qk ). F is increasing, and define F (x) = inf qk≥x F (qk ). Hence F is right
continuous and increasing.
If x is a point of continuity of F and ε > 0, then there exist r and s rational such that
r < x < s and F (s) − ε < F (x) < F (r) + ε. Then Fnj (x) ≥ Fnj (r) → F (r) > F (x) − ε
and
Fnj (x) ≤ Fnj (s) → F (s) < F (x) + ε. Since ε is arbitrary, Fnj (x) → F (x). 370 Basic probability Since the Fn are tight, there exists M such that Fn(−M ) < ε. Then F (−M ) ≤ ε, which implies limx→−∞ F (x) = 0. Showing limx→∞ F (x) = 1 is similar. Therefore F is in fact a distribution function. We conclude by giving an easily checked criterion for tightness. Proposition A.44 Suppose there exists ϕ : [0, ∞) → [0, ∞) that is increasing and ϕ(x) → ∞ as x → ∞. If a = supn E ϕ(|Xn|) < ∞, then the sequence {Xn} is tight. Proof Let ε > 0. Choose M such that ϕ(x) ≥ a/ε if x > M . Then
$$P(|X_n| > M) \le \int \frac{\varphi(|X_n|)}{a/\varepsilon}\,1_{(|X_n| > M)}\,dP \le \frac{\varepsilon}{a}\,E\,\varphi(|X_n|) \le \varepsilon.$$
The conclusion follows.
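In the special case ϕ(x) = x², the proof shows that M = √(a/ε) works. The sketch below checks this numerically. It uses a hypothetical family of centered normal random variables Xn with standard deviations σn = 1 − 1/(n + 1) and computes P(|Xn| > M) exactly with `math.erfc`. The family and the value ε = 0.05 are arbitrary illustrative choices, not taken from the text.

```python
import math


def two_sided_normal_tail(m, sigma):
    """P(|X| > m) for X ~ N(0, sigma^2); this equals erfc(m / (sigma * sqrt(2)))."""
    return math.erfc(m / (sigma * math.sqrt(2.0)))


def tightness_level(a, eps):
    """With phi(x) = x**2, phi(x) >= a/eps exactly when x >= sqrt(a/eps)."""
    return math.sqrt(a / eps)


# Hypothetical family: X_n ~ N(0, sigma_n^2) with sigma_n = 1 - 1/(n + 1).
sigmas = [1.0 - 1.0 / (n + 1) for n in range(1, 200)]
a = max(s * s for s in sigmas)        # a = sup_n E phi(|X_n|) = sup_n sigma_n^2
eps = 0.05
M = tightness_level(a, eps)
worst = max(two_sided_normal_tail(M, s) for s in sigmas)

print(f"a = {a:.6f}, M = {M:.4f}")
print(f"sup_n P(|X_n| > M) = {worst:.3e}  (bound from the proof: eps = {eps})")
```

The bound from the proof is usually far from sharp. For normal variables the actual tail at M is many orders of magnitude below ε, but the argument only needs the moment bound.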
In particular, if supn E |Xn|² < ∞, the sequence {Xn} is tight.

A.13 Characteristic functions

We define the characteristic function of a random variable X by $\varphi_X(t) = E\,e^{itX}$ for $t \in \mathbb{R}$. Note that $\varphi_X(t) = \int e^{itx}\,P_X(dx)$. Thus if X and Y have the same law, they have the same characteristic function. Also, if the law of X has a density, that is, $P_X(dx) = f_X(x)\,dx$, then $\varphi_X(t) = \int e^{itx} f_X(x)\,dx$, so in this case the characteristic function is the Fourier transform of $f_X$ as defined in Section B.2.

Proposition A.45 $\varphi(0) = 1$, $|\varphi(t)| \le 1$, $\varphi(-t) = \overline{\varphi(t)}$, and φ is uniformly continuous.

Proof Since $|e^{itx}| = 1$, everything follows immediately from the definitions except the uniform continuity. For that we write
$$|\varphi(t+h) - \varphi(t)| = |E\,e^{i(t+h)X} - E\,e^{itX}| \le E\,|e^{itX}(e^{ihX} - 1)| = E\,|e^{ihX} - 1|.$$
Since $|e^{ihX} - 1|$ is bounded by 2 and tends to zero almost surely as $h \to 0$, the right-hand side tends to zero by dominated convergence. Note that the right-hand side does not depend on t, which gives uniform continuity.

Proposition A.46 $\varphi_{aX}(t) = \varphi_X(at)$ and $\varphi_{X+b}(t) = e^{itb}\varphi_X(t)$.

Proof The first follows from $E\,e^{it(aX)} = E\,e^{i(at)X}$, and the second from $E\,e^{it(X+b)} = e^{itb}\,E\,e^{itX}$.

Proposition A.47 If X and Y are independent, then $\varphi_{X+Y}(t) = \varphi_X(t)\varphi_Y(t)$.

Proof From the multiplication theorem, $E\,e^{it(X+Y)} = E[e^{itX}e^{itY}] = E\,e^{itX}\,E\,e^{itY}$, and we are done.

Let us look at some examples of characteristic functions.

(1) Bernoulli with parameter p: by direct computation, $\varphi_X(t) = pe^{it} + (1 - p) = 1 - p(1 - e^{it})$.

(2) Binomial with parameters n and p: write X as the sum of n independent Bernoulli random variables $B_i$ with parameter p. Thus
$$\varphi_X(t) = \prod_{i=1}^n \varphi_{B_i}(t) = [\varphi_{B_1}(t)]^n = [1 - p(1 - e^{it})]^n.$$

(3) Point mass at a: $E\,e^{itX} = e^{ita}$. Note that when a = 0, φ is identically equal to 1.

(4) Poisson with parameter λ:
$$E\,e^{itX} = \sum_{k=0}^\infty e^{itk}e^{-\lambda}\frac{\lambda^k}{k!} = e^{-\lambda}\sum_{k=0}^\infty \frac{(\lambda e^{it})^k}{k!} = e^{-\lambda}e^{\lambda e^{it}} = e^{\lambda(e^{it} - 1)}.$$

(5) Uniform on [a, b]:
$$\varphi(t) = \frac{1}{b-a}\int_a^b e^{itx}\,dx = \frac{e^{itb} - e^{ita}}{(b-a)it}.$$
Note that when a = −b this reduces to $\sin(bt)/(bt)$.

(6) Exponential with parameter λ:
$$\varphi(t) = \int_0^\infty \lambda e^{itx}e^{-\lambda x}\,dx = \lambda\int_0^\infty e^{(it - \lambda)x}\,dx = \frac{\lambda}{\lambda - it}.$$
(7) Standard normal:
$$\varphi(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{itx}e^{-x^2/2}\,dx.$$
This can be done by completing the square and then doing a contour integration. Alternatively,
$$\varphi'(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty ix\,e^{itx}e^{-x^2/2}\,dx.$$
(Do the real and imaginary parts separately, and use the dominated convergence theorem to justify taking the derivative inside.) Integrating by parts (again treating the real and imaginary parts separately), $\varphi'(t) = -t\varphi(t)$. The only solution to this differential equation with $\varphi(0) = 1$ is $\varphi(t) = e^{-t^2/2}$.

(8) Normal with mean μ and variance σ²: writing X = σZ + μ, where Z is a standard normal, Proposition A.46 gives
$$\varphi_X(t) = e^{i\mu t}\varphi_Z(\sigma t) = e^{i\mu t - \sigma^2t^2/2}. \qquad (A.25)$$

(9) Gamma: if X has a gamma distribution with parameters λ and r, then its characteristic function is
$$E\,e^{itX} = \Big(\frac{\lambda}{\lambda - it}\Big)^r.$$
Formally, this comes from writing
$$\varphi(t) = \frac{1}{\Gamma(r)}\int_0^\infty e^{itx}\lambda e^{-\lambda x}(\lambda x)^{r-1}\,dx = \frac{\lambda^r}{\Gamma(r)}\int_0^\infty e^{-(\lambda - it)x}x^{r-1}\,dx$$
and performing a change of variables. To do it properly requires a contour integration around the boundary of the region in the complex plane that is bounded by the positive x axis, the ray $\{(\lambda - it)s : s > 0\}$, $\partial B(0, \varepsilon)$, and $\partial B(0, R)$, and then letting ε → 0 and R → ∞.
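Several of the closed forms above can be checked numerically against their defining sums or integrals. The sketch below does this for examples (2), (4), (5) and (6) at one arbitrary value of t. The helper names are illustrative only, and the uniform case uses a simple midpoint rule rather than exact integration.

```python
import cmath
import math


def cf_binomial_sum(t, n, p):
    """E e^{itX} for X ~ Binomial(n, p), summed directly over the probability mass function."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) * cmath.exp(1j * t * k)
               for k in range(n + 1))


def cf_binomial_formula(t, n, p):
    return (1 - p * (1 - cmath.exp(1j * t))) ** n


def cf_poisson_sum(t, lam, kmax=200):
    """Truncated series sum_{k < kmax} e^{itk} e^{-lam} lam^k / k!."""
    total, pmf = 0j, math.exp(-lam)      # pmf starts at P(X = 0)
    for k in range(kmax):
        total += pmf * cmath.exp(1j * t * k)
        pmf *= lam / (k + 1)             # P(X = k + 1) from P(X = k)
    return total


def cf_poisson_formula(t, lam):
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))


def cf_uniform_formula(t, a, b):
    return (cmath.exp(1j * t * b) - cmath.exp(1j * t * a)) / ((b - a) * 1j * t)


def cf_uniform_midpoint(t, a, b, steps=20_000):
    """Midpoint-rule approximation of (1/(b - a)) * integral_a^b e^{itx} dx."""
    h = (b - a) / steps
    return sum(cmath.exp(1j * t * (a + (j + 0.5) * h)) for j in range(steps)) * h / (b - a)


def cf_exponential_formula(t, lam):
    return lam / (lam - 1j * t)


t = 0.7
print(abs(cf_binomial_sum(t, 10, 0.3) - cf_binomial_formula(t, 10, 0.3)))
print(abs(cf_poisson_sum(t, 2.5) - cf_poisson_formula(t, 2.5)))
print(abs(cf_uniform_midpoint(t, -1.0, 2.0) - cf_uniform_formula(t, -1.0, 2.0)))
# Symmetric case a = -b: the formula reduces to sin(bt)/(bt), real up to rounding.
print(cf_uniform_formula(t, -2.0, 2.0), math.sin(2.0 * t) / (2.0 * t))
# Conjugate symmetry phi(-t) = conj(phi(t)) from Proposition A.45.
print(cf_exponential_formula(-t, 1.5), cf_exponential_formula(t, 1.5).conjugate())
```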

A.14 Uniqueness and characteristic functions
Theorem A.48 If ϕX = ϕY , then PX = PY .
Proof If f is in the Schwartz class, then so is f̂ ; see Section B.2. We use the Fubini theorem
and the Fourier inversion theorem to write
$$E f(X) = (2\pi)^{-1}E\Big[\int \hat f(u)e^{-iuX}\,du\Big] = (2\pi)^{-1}\int \hat f(u)\,\varphi_X(-u)\,du,$$
and similarly for E f (Y ). Since ϕX = ϕY , we conclude E f (X ) = E f (Y ). By a limit
procedure, we have this equality for all bounded and measurable f , in particular, when f is
the indicator of a set.
The same proof works in higher dimensions: if
$$E\,e^{i\sum_{j=1}^n u_jX_j} = E\,e^{i\sum_{j=1}^n u_jY_j}$$
for all $(u_1, \ldots, u_n) \in \mathbb{R}^n$, then the joint laws of $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$ are equal. The expression $E\,e^{i\sum_{j=1}^n u_jX_j}$ is called the joint characteristic function of $(X_1, \ldots, X_n)$.
The following proposition can be proved directly, but the proof using characteristic functions
is much easier.
Proposition A.49 (1) If X and Y are independent, X is a normal random variable with
mean a and variance b², and Y is a normal random variable with mean c and variance d²,
then X + Y is a normal random variable with mean a + c and variance b² + d².
(2) If X and Y are independent, X is a Poisson random variable with parameter λ1, and Y
is a Poisson random variable with parameter λ2, then X + Y is a Poisson random variable
with parameter λ1 + λ2.
(3) If X and Y are independent random variables, where X has a gamma distribution with
parameters λ and r1 and Y has a gamma distribution with parameters λ and r2, then X + Y
has a gamma distribution with parameters λ and r1 + r2.
Proof For (1),
$$\varphi_{X+Y}(t) = \varphi_X(t)\varphi_Y(t) = e^{iat - b^2t^2/2}e^{ict - d^2t^2/2} = e^{i(a+c)t - (b^2+d^2)t^2/2}.$$
Now use the uniqueness theorem.
Parts (2) and (3) are proved similarly.
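Part (2) can also be checked without characteristic functions. The convolution
Σj e^{−λ1}λ1^j/j! · e^{−λ2}λ2^{k−j}/(k−j)! collapses, by the binomial theorem, to e^{−(λ1+λ2)}(λ1+λ2)^k/k!. The sketch below confirms this numerically for one arbitrary choice of parameters. The helper names are illustrative only.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def convolve(p, q, kmax):
    """pmf of X + Y at 0..kmax for independent X, Y on {0, 1, 2, ...} with pmfs p and q."""
    return [sum(p(j) * q(k - j) for j in range(k + 1)) for k in range(kmax + 1)]


lam1, lam2 = 1.5, 2.25   # arbitrary illustrative parameters
conv = convolve(lambda k: poisson_pmf(k, lam1), lambda k: poisson_pmf(k, lam2), 30)
target = [poisson_pmf(k, lam1 + lam2) for k in range(31)]
print(max(abs(x - y) for x, y in zip(conv, target)))   # rounding-level difference
```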
A.15 The central limit theorem
We need the following fact relating the moments of X to the derivatives of ϕX.

Proposition A.50 If $E|X|^k < \infty$ for a positive integer k, then $\varphi_X$ has a continuous derivative of order k and
$$\varphi_X^{(k)}(t) = \int (ix)^k e^{itx}\,P_X(dx).$$
In particular, $\varphi_X^{(k)}(0) = i^kE X^k$.

Proof Write
$$\frac{\varphi_X(t+h) - \varphi_X(t)}{h} = \int \frac{e^{i(t+h)x} - e^{itx}}{h}\,P_X(dx).$$
Since $|e^{ihx} - 1| \le |h|\,|x|$, the integrand is bounded by |x|. Thus if $\int |x|\,P_X(dx) < \infty$, we can use dominated convergence to obtain the desired formula for $\varphi_X'(t)$. As in the proof of Proposition A.45, we see that $\varphi_X'$ is continuous. We do the case of general k by induction. Evaluating $\varphi_X^{(k)}$ at 0 shows $\varphi_X^{(k)}(0) = i^kE X^k$.

By the above,
$$E X^2 = -\varphi_X''(0). \qquad (A.26)$$

The simplest case of the central limit theorem (CLT) is the case when the $X_i$'s are i.i.d. with mean zero and variance one and $S_n = \sum_{i=1}^n X_i$. The CLT then says that $S_n/\sqrt n$ converges weakly to a standard normal. This is the case we prove. We need the fact that if $w_n$ are complex numbers converging to w, then $(1 + w_n/n)^n \to e^w$. We leave the proof of this to the reader, with the warning that any proof using logarithms needs to be done with some care, since log z is a multivalued function when z is complex.

Theorem A.51 Suppose the $X_i$'s are i.i.d. random variables with mean zero and variance one. Then $S_n/\sqrt n$ converges weakly to a standard normal.

Proof Since $X_1$ has finite second moment, $\varphi_{X_1}$ has a continuous second derivative by Proposition A.50. By Taylor's theorem,
$$\varphi_{X_1}(t) = \varphi_{X_1}(0) + \varphi_{X_1}'(0)t + \varphi_{X_1}''(0)t^2/2 + R(t),$$
where $|R(t)|/t^2 \to 0$ as $|t| \to 0$. Since $\varphi_{X_1}(0) = 1$, $\varphi_{X_1}'(0) = iE X_1 = 0$, and $\varphi_{X_1}''(0) = -E X_1^2 = -1$, we have $\varphi_{X_1}(t) = 1 - t^2/2 + R(t)$. Then
$$\varphi_{S_n/\sqrt n}(t) = \varphi_{S_n}(t/\sqrt n) = \big(\varphi_{X_1}(t/\sqrt n)\big)^n = \Big[1 - \frac{t^2}{2n} + R(t/\sqrt n)\Big]^n.$$
Fix t ≠ 0 (the case t = 0 is trivial). Since $t/\sqrt n \to 0$ as $n \to \infty$, we have $nR(t/\sqrt n) = t^2\,R(t/\sqrt n)/(t/\sqrt n)^2 \to 0$. Applying the fact above with $w_n = -t^2/2 + nR(t/\sqrt n) \to -t^2/2$ gives $\varphi_{S_n/\sqrt n}(t) \to e^{-t^2/2}$.

Since $E S_n^2/n = 1$ for all n, Proposition A.44 (with ϕ(x) = x²) tells us that the random variables $S_n/\sqrt n$ are tight. By Theorem A.43, subsequential weak limit points exist. Suppose $S_{n_j}/\sqrt{n_j}$ converges weakly to Z. Because $x \mapsto e^{itx}$ is bounded and continuous, $\varphi_{S_{n_j}/\sqrt{n_j}}(t) \to \varphi_Z(t)$. Hence $\varphi_Z(t) = e^{-t^2/2}$, and by Theorem A.48, Z is a normal random variable with mean zero and variance one.
Therefore every subsequence of $S_n/\sqrt n$ has a further subsequence converging weakly to a standard normal, and hence the entire sequence converges weakly to a normal random variable with mean zero and variance one.

A.16 Gaussian random variables

A normal random variable is also known as a Gaussian random variable.

Proposition A.52 If Z is a mean zero normal random variable with variance one and x ≥ 1, then
$$\frac{1}{2\sqrt{2\pi}\,x}\,e^{-x^2/2} \le P(Z \ge x) \le e^{-x^2/2}.$$
In particular, if ε > 0, there exists x0 such that
$$P(Z \ge x) \ge e^{-(1+\varepsilon)x^2/2}$$
if x ≥ x0.
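Proof For the right-hand inequality,
$$P(Z \ge x) = \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-y^2/2}\,dy \le \int_x^\infty \frac{y}{x}\,e^{-y^2/2}\,dy = \frac{1}{x}\,e^{-x^2/2} \le e^{-x^2/2},$$
using y/x ≥ 1 for y ≥ x, and x ≥ 1 in the last step.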
The left-hand inequality is left as an exercise.
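The sketch below evaluates P(Z ≥ x) exactly with `math.erfc` and compares it with the two upper bounds in the displayed chain of the proof. It then prints the ratio −2 log P(Z ≥ x)/x², which tends to 1; this is what the "in particular" statement expresses. The grid of x values and the choice ε = 0.1 are arbitrary.

```python
import math


def normal_upper_tail(x):
    """P(Z >= x) for a standard normal Z, computed as erfc(x / sqrt(2)) / 2."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


xs = [1.0, 1.5, 2.0, 3.0, 5.0, 8.0]
for x in xs:
    p = normal_upper_tail(x)
    mid = math.exp(-x * x / 2) / x   # intermediate bound from the proof
    top = math.exp(-x * x / 2)       # right-hand inequality (uses x >= 1)
    print(f"x = {x:3.1f}   P(Z >= x) = {p:.3e}   e^(-x^2/2)/x = {mid:.3e}   e^(-x^2/2) = {top:.3e}")

# "In particular" with eps = 0.1: -2 log P(Z >= x) / x^2 eventually stays below 1 + eps.
eps = 0.1
for x in [2.0, 4.0, 8.0, 16.0]:
    print(x, -2 * math.log(normal_upper_tail(x)) / (x * x))
```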
Proposition A.53 If Xn is a normal random variable with mean an and variance bn², Xn
converges to X weakly, an → a, and bn → b ≠ 0, then X is a normal random variable with
mean a and variance b².
Proof Since
$$E X_n^2 = \mathrm{Var}\,X_n + (E X_n)^2 = b_n^2 + a_n^2,$$
then $\sup_n E X_n^2 < \infty$, and the $X_n$ are tight. For each t, the characteristic functions converge:
$$\varphi_X(t) = \lim_{n\to\infty}\varphi_{X_n}(t) = \lim_{n\to\infty} e^{ita_n - t^2b_n^2/2} = e^{ita - t^2b^2/2},$$
and the last term is the characteristic function of a normal random variable with mean a and variance b². Therefore, by the uniqueness theorem, any weak subsequential limit point of the sequence Xn, and in particular X, is a normal random variable with mean a and variance b².

We next prove

Proposition A.54 If
$$E\,e^{i(uX + vY)} = E\,e^{iuX}\,E\,e^{ivY} \qquad (A.27)$$
for all u and v, then X and Y are independent random variables.

Proof Let X′ be a random variable with the same law as X and Y′ one with the same law as Y, chosen so that X′ is independent of Y′. (We let Ω = [0, 1]², P Lebesgue measure, X′ a function of the first variable, and Y′ a function of the second variable, defined as in Proposition A.2.) Then since $e^{iuX'}$ and $e^{ivY'}$ are independent,
$$E\,e^{i(uX' + vY')} = E\,e^{iuX'}\,E\,e^{ivY'}. \qquad (A.28)$$
Since X and X′ have the same law, $E\,e^{iuX} = E\,e^{iuX'}$, and similarly for Y and Y′. Therefore, using (A.27) and (A.28), (X′, Y′) has the same joint characteristic function as (X, Y). By the uniqueness theorem for characteristic functions, (X′, Y′) has the same joint law as (X, Y), which implies that X and Y are independent.

A finite collection of random variables X1, . . . , Xn is said to be jointly normal if there exist i.i.d. normal random variables Z1, . . . , Zm with mean zero and variance one and constants bij and ai such that
$$X_i = \sum_{j=1}^m b_{ij}Z_j + a_i, \qquad i = 1, \ldots, n. \qquad (A.29)$$
In matrix notation, X = BZ + A. For simplicity, in what follows let us take A = 0; the modifications for the general case are easy. The covariance of two random variables X and Y is defined to be E[(X − EX)(Y − EY)]. Since we are assuming our normal random variables are mean zero, we can omit the centering at expectations.
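To make the definition (A.29) concrete, the sketch below draws samples of X = BZ (the case A = 0). It uses an arbitrary illustrative 3 × 2 matrix B and compares the empirical second-moment matrix of the samples with BBᵀ, which is what E XXᵀ equals because E ZZᵀ = I. This is a Monte Carlo check only; the sample size and random seed are arbitrary choices.

```python
import random


def matmul_transpose(B):
    """Return B B^T for a matrix given as a list of rows."""
    return [[sum(bi * bj for bi, bj in zip(row_i, row_j)) for row_j in B] for row_i in B]


def sample_jointly_normal(B, rng):
    """One draw of X = B Z, where Z has i.i.d. N(0, 1) entries (the case A = 0 of (A.29))."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(B[0]))]
    return [sum(b * zj for b, zj in zip(row, z)) for row in B]


def empirical_second_moments(samples):
    """Average of X X^T over the samples; no centering, since the X_i have mean zero."""
    n, N = len(samples[0]), len(samples)
    return [[sum(x[i] * x[j] for x in samples) / N for j in range(n)] for i in range(n)]


B = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]    # hypothetical 3 x 2 matrix
rng = random.Random(12345)
samples = [sample_jointly_normal(B, rng) for _ in range(100_000)]
theory = matmul_transpose(B)
emp = empirical_second_moments(samples)
for i in range(3):
    print([f"{emp[i][j]:.3f} vs {theory[i][j]:.3f}" for j in range(3)])
```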
Given a finite collection of mean zero random variables X = (X1, . . . , Xn), we can talk about the covariance matrix, which is $\mathrm{Cov}(X) = E\,XX^T$, where $X^T$ denotes the transpose of the vector X. In the above case, we see
$$\mathrm{Cov}(X) = E[(BZ)(BZ)^T] = E[BZZ^TB^T] = BB^T,$$
since $E\,ZZ^T = I$, the identity.

Let us compute the joint characteristic function $E\,e^{iu^TX}$ of the vector X, where u is an n-dimensional vector. First, if v is an m-dimensional vector,
$$E\,e^{iv^TZ} = E\prod_{j=1}^m e^{iv_jZ_j} = \prod_{j=1}^m E\,e^{iv_jZ_j} = \prod_{j=1}^m e^{-v_j^2/2} = e^{-v^Tv/2},$$
using the independence of the Zj's. Thus, taking $v = B^Tu$,
$$E\,e^{iu^TX} = E\,e^{iu^TBZ} = e^{-u^TBB^Tu/2}.$$
By taking u = (0, . . . , 0, a, 0, . . . , 0) to be a constant times the unit vector in the jth coordinate direction, we deduce that Xj is indeed normal, and this is true for each j.

Note that the joint characteristic function of a jointly normal collection of random variables X = (X1, . . . , Xn) is completely determined by $BB^T$, which is the covariance matrix of X. When the Xi's are not mean zero, we can readily check that the joint characteristic function is determined by the covariance matrix together with the vector of means E X. Therefore the joint distribution of a jointly normal collection of random variables is determined by the covariance matrix and the means.

Proposition A.55 If the Xi are jointly normal and Cov(Xi, Xj) = 0 for i ≠ j, then the Xi are independent.

Proof If $\mathrm{Cov}(X) = BB^T$ is a diagonal matrix, then the joint characteristic function of the Xi's factors into the product of the characteristic functions of the Xi's. By Proposition A.54, whose proof extends without change to n random variables, the Xi's are then independent.

Remark A.56 We note that the analog of Proposition A.53 holds for jointly normal random vectors. That is, suppose $(X_j^1, \ldots, X_j^n)$ is a jointly normal collection of random variables for each j, each $X_j^i$ converges in probability to $X^i$ as j → ∞, and each $X^i$ is nonconstant. Then $(X^1, \ldots, X^n)$
is a jointly normal collection of random variables. This follows by looking at the joint characteristic functions as in the proof of Proposition A.53.

We present the multidimensional central limit theorem.

Theorem A.57 Let $X_j = (X_j^1, \ldots, X_j^d)$ be random vectors taking values in $\mathbb{R}^d$ and suppose $X_1, X_2, \ldots$ are independent and identically distributed. Suppose $E X_1^k = 0$ and $E(X_1^k)^2 < \infty$ for k = 1, . . . , d, and let $C_{k\ell} = E[X_1^kX_1^\ell]$. If $S_n = \sum_{j=1}^n X_j$, then $S_n/\sqrt n$ converges weakly to a jointly normal random vector Z = (Z1, . . . , Zd) where each Zk has mean zero and the covariance of Zk and Zℓ is Ckℓ.

Proof By independence and the mean zero assumption,
$$E|S_n|^2/n = \sum_{j=1}^n\sum_{k=1}^d E|X_j^k|^2/n = \sum_{k=1}^d E|X_1^k|^2,$$
which is bounded independently of n. Hence the random vectors $S_n/\sqrt n$ are tight, and therefore weak subsequential limit points exist. We need to show that any subsequential limit point is a jointly normal random vector with mean zero and covariance matrix C. If $u_1, \ldots, u_d \in \mathbb{R}$, then $\sum_{k=1}^d u_kX_j^k$, j = 1, 2, . . . , is a sequence of i.i.d. random variables with mean zero and variance $\sum_{k,\ell=1}^d u_ku_\ell C_{k\ell}$. By Theorem A.51 (after rescaling to variance one; the case of variance zero is trivial),
$$\frac{\sum_{j=1}^n\sum_{k=1}^d u_kX_j^k}{\sqrt n}$$
converges weakly to a mean zero normal random variable with variance $\sum_{k,\ell=1}^d u_ku_\ell C_{k\ell}$. If we write $S_n = (S_n^1, \ldots, S_n^d)$, then
$$E\exp\Big(i\sum_{k=1}^d u_kS_n^k/\sqrt n\Big) \to \exp\Big(-\sum_{k,\ell=1}^d u_ku_\ell C_{k\ell}/2\Big).$$
This shows that any subsequential limit point of the sequence $S_n/\sqrt n$ has the required law.

If $(X, Y_1, \ldots, Y_n)$ are jointly normal random variables, then the law of X given $Y_1, \ldots, Y_n$ is also Gaussian.

Proposition A.58 Suppose $X, Y_1, \ldots, Y_n$ are jointly normal random variables with mean zero. Let A be the n × 1 matrix whose ith entry is $\mathrm{Cov}(X, Y_i)$, B the n × n matrix whose (i, j)th entry is $\mathrm{Cov}(Y_i, Y_j)$, and Y the n × 1 matrix whose ith entry is $Y_i$. Suppose B is invertible and let $D = B^{-1}A$. Then for u ∈ ℝ,
$$E[e^{iuX} \mid Y_1, \ldots, Y_n] = e^{iuD^TY}e^{-u^2(\mathrm{Var}\,X - A^TB^{-1}A)/2}.$$
In particular, the law of X given $Y_1, \ldots, Y_n$ is that of a normal random variable with mean $D^TY$ and variance $\mathrm{Var}\,X - A^TB^{-1}A$.
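The conditional mean and variance in Proposition A.58 can be computed directly once the variables are written in the form (A.29). The sketch below takes the coefficient vectors of X and the Yi with respect to i.i.d. standard normals Z. It forms A and B as covariances, solves BD = A by Gaussian elimination, and returns D and Var X − AᵀB⁻¹A. The small solver and the example are illustrative only. In the example, X − DᵀY = Z3 is independent of (Y1, Y2), as in the proof.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(M, rhs):
    """Solve M d = rhs by Gaussian elimination with partial pivoting (M small and invertible)."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    d = [0.0] * n
    for r in range(n - 1, -1, -1):
        d[r] = (aug[r][n] - sum(aug[r][c] * d[c] for c in range(r + 1, n))) / aug[r][r]
    return d


def gaussian_conditional(x_coeffs, y_coeffs):
    """X = x_coeffs . Z and Y_i = y_coeffs[i] . Z with Z i.i.d. N(0, 1) (zero means, as in (A.29)).
    Returns (D, conditional variance) with D = B^{-1} A and variance Var X - A^T B^{-1} A."""
    A = [dot(x_coeffs, y) for y in y_coeffs]                     # A_i = Cov(X, Y_i)
    B = [[dot(yi, yj) for yj in y_coeffs] for yi in y_coeffs]    # B_ij = Cov(Y_i, Y_j)
    D = solve(B, A)
    return D, dot(x_coeffs, x_coeffs) - dot(A, D)


# Hypothetical example: Y1 = Z1, Y2 = Z1 + Z2, X = Z1 + Z2 + Z3.
D, cond_var = gaussian_conditional([1, 1, 1], [[1, 0, 0], [1, 1, 0]])
print(D, cond_var)   # [0.0, 1.0] 1.0, i.e. X given Y is normal with mean Y2 and variance 1
```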
Proof Note
$$\mathrm{Cov}(X - D^TY, Y_j) = \mathrm{Cov}(X, Y_j) - \sum_{i=1}^n D_i\,\mathrm{Cov}(Y_i, Y_j) = A_j - \sum_{i=1}^n D_iB_{ij} = 0,$$
since B is symmetric and BD = A. Because $(X - D^TY, Y_1, \ldots, Y_n)$ is jointly normal, Proposition A.55 shows that $X - D^TY$ is independent of $(Y_1, \ldots, Y_n)$. Then
$$E[e^{iuX} \mid Y_1, \ldots, Y_n] = e^{iuD^TY}E[e^{iu(X - D^TY)} \mid Y_1, \ldots, Y_n] = e^{iuD^TY}E[e^{iu(X - D^TY)}] = e^{iuD^TY}e^{-u^2\,\mathrm{Var}(X - D^TY)/2}.$$
To complete the proof, we calculate
$$\mathrm{Var}(X - D^TY) = \mathrm{Var}\,X - 2\sum_i D_iA_i + \sum_{i,j}D_iB_{ij}D_j = \mathrm{Var}\,X - A^TB^{-1}A,$$
and we are done.

Appendix B
Some results from analysis

B.1 The monotone class theorem

The monotone class theorem is a result from measure theory used in the proof of the Fubini theorem.

Definition B.1 M is a monotone class if M is a collection of subsets of X such that
(1) if A1 ⊂ A2 ⊂ · · ·, A = ∪iAi, and each Ai ∈ M, then A ∈ M;
(2) if A1 ⊃ A2 ⊃ · · ·, A = ∩iAi, and each Ai ∈ M, then A ∈ M.

Recall that an algebra of sets is a collection A of sets such that if A1, . . . , An ∈ A, then A1 ∪ · · · ∪ An and A1 ∩ · · · ∩ An are also in A, and if A ∈ A, then Aᶜ ∈ A.

The intersection of monotone classes is a monotone class, and the intersection of all monotone classes containing a given collection of sets is the smallest monotone class containing that collection.

Theorem B.2 Suppose A0 is an algebra of sets, A is the smallest σ-field containing A0, and M is the smallest monotone class containing A0. Then M = A.

Proof A σ-algebra is clearly a monotone class, so M ⊂ A. We must show A ⊂ M.
Let N1 = {A ∈ M : Aᶜ ∈ M}. Note N1 is contained in M, contains A0, and is a monotone class. Since M is the smallest monotone class containing A0, then N1 = M, and therefore M is closed under the operation of taking complements.
Let N2 = {A ∈ M : A ∩ B ∈ M for all B ∈ A0}. N2 is contained in M; N2 contains A0 because A0 is an algebra; N2 is a monotone class because (∪∞i=1Ai) ∩ B = ∪∞i=1(Ai ∩ B), and similarly for intersections. Therefore N2 = M; in other words, if B ∈ A0 and A ∈ M, then A ∩ B ∈ M.
Let N3 = {A ∈ M : A ∩ B ∈ M for all B ∈ M}.
As in the preceding paragraph, N3 is a monotone class contained in M. By the last sentence of the preceding paragraph, N3 contains A0. Hence N3 = M.
We thus have that M is a monotone class closed under the operations of taking complements and taking finite intersections, so M is an algebra. A countable union ∪iAi of sets in M is the increasing union of the finite unions A1 ∪ · · · ∪ An, which lie in M, so it lies in M because M is a monotone class. This shows M is a σ-algebra, and so A ⊂ M.

B.2 The Schwartz class

A function f : ℝ^d → ℝ is in the Schwartz class if f is C^∞ and for each m, k ≥ 0 and each i1, i2, . . . , ik ∈ {1, 2, . . . , d},
$$|x|^m\,\Big|\frac{\partial^k f}{\partial x_{i_1}\cdots\partial x_{i_k}}(x)\Big| \to 0 \quad \text{as } |x| \to \infty.$$
(Here i1, . . . , ik need not be distinct.)

Suppose that f is in the Schwartz class. Suppose k, n ≥ 0, i1, . . . , ik and j1, . . . , jn are each integers between 1 and d inclusive, and m1, . . . , mk are even positive integers. Let f̂ be the Fourier transform of f:
$$\hat f(u) = \int_{\mathbb{R}^d} e^{iu\cdot x}f(x)\,dx.$$
Then
$$u_{i_1}^{m_1}\cdots u_{i_k}^{m_k}\,\frac{\partial^n \hat f}{\partial u_{j_1}\cdots\partial u_{j_n}}(u)$$
is bounded as a function of u because it is a constant times the Fourier transform of
$$\frac{\partial^{m_1+\cdots+m_k}}{\partial x_{i_1}^{m_1}\cdots\partial x_{i_k}^{m_k}}\big(x_{j_1}\cdots x_{j_n}f(x)\big),$$
which is in L¹(ℝ^d) since f is in the Schwartz class. We conclude that f̂ is also in the Schwartz class.

Appendix C
Regular conditional probabilities

Let E ⊂ F be σ-fields, where (Ω, F, P) is a probability space. A regular conditional probability for P(· | E) is a map Q : Ω × F → [0, 1] such that
(1) Q(ω, ·) is a probability measure on (Ω, F) for each ω;
(2) for each A ∈ F, Q(·, A) is an E measurable random variable;
(3) for each A ∈ F and each B ∈ E,
$$\int_B Q(\omega, A)\,P(d\omega) = P(A \cap B).$$
Q(ω, A) can be thought of as P(A | E)(ω).

Theorem C.1 Suppose (Ω, F, P) is a probability space, E ⊂ F, and Ω is in addition a complete and separable metric space with F its Borel σ-field. Then a regular conditional probability for P(· | E) exists.

Proof Since Ω is a complete and separable metric space, we can embed Ω as a subset of the compact set I = [0, 1]^ℕ, where we furnish I with the product topology.
Let {fj} be a countable collection of uniformly continuous functions on Ω such that every finite subset of distinct elements is linearly independent and such that L0, the set of finite linear combinations of the fj's, is dense (in the supremum norm) in the class of uniformly continuous functions on Ω; let us assume f1 is identically equal to 1. For each j, let gj = E[fj | E]. (The random variables gj are only defined up to almost sure equivalence. For each j we select an element gj from the equivalence class and keep it fixed.)

If r1, . . . , rn are rationals with r1f1(ω) + · · · + rnfn(ω) ≥ 0 for all ω, let N(r1, . . . , rn) = {ω : r1g1(ω) + · · · + rngn(ω) < 0}. Since conditional expectation preserves nonnegativity, P(N(r1, . . . , rn)) = 0. Let N1 be the union of all such N(r1, . . . , rn) with n ≥ 1 and the rj rational. This is a countable union, so N1 ∈ E and P(N1) = 0.

Fix ω ∈ Ω \ N1. Define a functional Lω on L0 by Lω(f) = t1g1(ω) + · · · + tngn(ω) if f = t1f1 + · · · + tnfn; this is well defined because the fj's are linearly independent.

We claim Lω is a positive linear functional. If f = t1f1 + · · · + tnfn ≥ 0 and ε > 0 is rational,
then, since each fj is bounded, we can choose rationals r1, . . . , rn with |ti − ri| ≤ ε,
i = 1, . . . , n, close enough to the ti that r1f1 + · · · + rnfn ≥ −ε. Since f1 ≡ 1, this says
(r1 + ε)f1 + r2f2 + · · · + rnfn ≥ 0.
Since ω ∉ N1, then
(r1 + ε)g1(ω) + r2g2(ω) + · · · + rngn(ω) ≥ 0.
Letting ε → 0 (so that each ri → ti), it follows that t1g1(ω) + · · · + tngn(ω) ≥ 0. This proves that Lω is positive.
Since Lω is positive and Lω(f1) = 1, we have |Lω(f)| ≤ sup |f| for f ∈ L0. So Lω is a bounded
linear functional, and by the Hahn–Banach theorem Lω can be extended to a positive linear
functional on the closure of L0. Any uniformly continuous function on Ω can be extended uniquely
to Ω̄, the closure of Ω in I, so Lω can be considered as a positive linear functional on C(Ω̄).
By the Riesz representation theorem, there exists a probability measure Q(ω, ·) on Ω̄ such that
$$L_\omega(f) = \int f(\omega')\,Q(\omega, d\omega').$$
The mapping ω ↦ Lω(f) is measurable with respect to E for each f ∈ L0, hence for all
uniformly continuous functions on Ω by a limit argument. If B ∈ E and f = t1f1 + · · · + tnfn,
$$\int_B\Big[\int f(\omega')\,Q(\omega, d\omega')\Big]P(d\omega) = \int_B L_\omega f\,P(d\omega) = \int_B (t_1g_1 + \cdots + t_ng_n)(\omega)\,P(d\omega) = \int_B E[t_1f_1 + \cdots + t_nf_n \mid \mathcal{E}](\omega)\,P(d\omega) = \int_B f(\omega)\,P(d\omega),$$
so ∫ f(ω′)Q(ω, dω′) is a version of E[f | E] if f ∈ L0. By a limit argument, the same is true
for all f of the form f = 1A with A ∈ F.
For each n, let Gn1, Gn2, . . . be a sequence of balls of radius 1/n (with respect to the metric on Ω) covering Ω. Choose in such that $P(\cup_{i \le i_n} G_{ni}) > 1 - 1/(n2^n)$. The set
$$H_n = \bigcap_{m \ge n}\ \bigcup_{i \le i_m} G_{mi}$$
is totally bounded; let Kn be the closure of Hn in Ω. Since Ω is complete, Kn is complete and totally bounded, and hence compact. Moreover, $P(K_n) \ge P(H_n) \ge 1 - \sum_{m \ge n} 1/(m2^m) \ge 1 - 1/n$. Hence
$$E[Q(\cdot, \cup_{i=1}^\infty K_i); \Omega \setminus N_1] \ge E[Q(\cdot, K_n); \Omega \setminus N_1] = P(K_n) \ge 1 - 1/n$$
for each n, so $Q(\omega, \cup_{i=1}^\infty K_i) = 1$ a.s. Let N2 be the null set for which this fails. Thus for ω ∈ Ω \ (N1 ∪ N2), we see that Q(ω, dω′) is a probability measure on Ω. For ω ∈ N1 ∪ N2, set Q(ω, ·) = P(·). This Q is the desired regular conditional probability.
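In the simplest case, where Ω is finite and E is generated by a finite partition into blocks of positive probability, no topology is needed. A regular conditional probability can be written down explicitly as Q(ω, A) = P(A ∩ Eω)/P(Eω), where Eω is the block containing ω. The sketch below uses a hypothetical six-point space with uniform P and a hypothetical partition, and checks properties (1)–(3) of the definition by brute force in exact rational arithmetic.

```python
from fractions import Fraction
from itertools import chain, combinations


def subsets(xs):
    """All subsets of a finite collection, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for c in chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]


omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}                               # hypothetical uniform P
blocks = [frozenset({1, 2}), frozenset({3, 4, 5}), frozenset({6})]  # partition generating E


def prob(A):
    return sum((P[w] for w in A), Fraction(0))


def Q(w, A):
    """Q(w, A) = P(A ∩ E_w) / P(E_w), where E_w is the block containing w."""
    block = next(b for b in blocks if w in b)
    return prob(A & block) / prob(block)


E_sets = [frozenset().union(*bs) for r in range(len(blocks) + 1) for bs in combinations(blocks, r)]
F_sets = subsets(omega)

# (1) each Q(w, .) is a probability measure (total mass 1, additive over points)
prop1 = all(Q(w, frozenset(omega)) == 1 and
            all(Q(w, A) == sum(Q(w, frozenset({x})) for x in A) for A in F_sets)
            for w in omega)
# (2) Q(., A) is E measurable, i.e. constant on each block
prop2 = all(len({Q(w, A) for w in b}) == 1 for A in F_sets for b in blocks)
# (3) integral over B in E of Q(., A) dP equals P(A ∩ B)
prop3 = all(sum((Q(w, A) * P[w] for w in B), Fraction(0)) == prob(A & B)
            for A in F_sets for B in E_sets)
print(prop1, prop2, prop3)   # True True True
```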

Appendix D
Kolmogorov extension theorem
Suppose S is a metric space. We use $S^{\mathbb N}$ for the product space S × S × · · · furnished with the
product topology. We may view $S^{\mathbb N}$ as the set of sequences (x1, x2, . . .) of elements of S. We
use the σ-field on $S^{\mathbb N}$ generated by the cylindrical sets. Given an element x = (x1, x2, . . .) of
$S^{\mathbb N}$, we define πn(x) = (x1, . . . , xn) ∈ $S^n$.
We suppose we have a Radon probability measure μn defined on $S^n$ for each n. (Being
a Radon measure means that we can approximate μn(A) from below by compact sets; see
Folland (1999) for details.) The μn are consistent if μn+1(A × S) = μn(A) whenever A is a
Borel subset of $S^n$. The Kolmogorov extension theorem is the following.
Theorem D.1 Suppose for each n we have a Radon probability measure μn on $S^n$, and suppose
the μn's are consistent. Then there exists a probability measure μ on $S^{\mathbb N}$ such that
$\mu(A \times S^{\mathbb N}) = \mu_n(A)$ for all Borel sets A ⊂ $S^n$.
Proof Define μ on cylindrical sets by $\mu(A \times S^{\mathbb N}) = \mu_n(A)$ if A is a Borel subset of $S^n$. By the
consistency assumption, μ is well defined. By the Carathéodory extension theorem, we can extend μ
to the σ-field generated by the cylindrical sets provided we show that whenever An are
cylindrical sets decreasing to ∅, then μ(An) → 0.
Suppose An are cylindrical sets decreasing to ∅ but μ(An) does not tend to 0. Since the μ(An)
are decreasing, there exists ε > 0 such that μ(An) ≥ ε for all n. We will obtain a contradiction.
We first want to arrange things so that each $A_n = \pi_n(A_n) \times S^{\mathbb N}$. Suppose An is of the form
$$A_n = \{(x_1, x_2, \ldots) : (x_1, \ldots, x_{j_n}) \in B_n\},$$
where Bn is a Borel subset of $S^{j_n}$. We choose $m_n = n + \max(j_1, \ldots, j_n)$. Let $A_0 = S^{\mathbb N}$. We then replace our original sequence A1, A2, . . . by the sequence
A0, . . . , A0, A1, . . . , A1, A2, . . . , A2, A3, . . . ,
where we have m1 occurrences of A0, m2 − m1 occurrences of A1, m3 − m2 occurrences of A2, and so on. Therefore we may without loss of generality suppose jn ≤ n. We then have
$$A_n = \{(x_1, x_2, \ldots) : (x_1, \ldots, x_n) \in B_n \times S^{n - j_n}\}.$$
Replacing Bn by $B_n \times S^{n - j_n}$, we may without loss of generality suppose $A_n = \pi_n(A_n) \times S^{\mathbb N}$.
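We set $\tilde A_n = \pi_n(A_n)$. For each n, choose a compact set $\tilde B_n \subset \tilde A_n$ such that $\mu_n(\tilde A_n \setminus \tilde B_n) \le \varepsilon/2^{n+1}$; this is possible because μn is a Radon measure. Let $B_n = \tilde B_n \times S^{\mathbb N}$ and let $C_n = B_1 \cap \cdots \cap B_n$. Hence $C_n \subset B_n \subset A_n$ and $C_n \downarrow \emptyset$, but
$$\mu(C_n) \ge \mu(A_n) - \sum_{i=1}^n \mu(A_i \setminus B_i) \ge \varepsilon/2,$$
and $\tilde C_n = \pi_n(C_n)$, the projection of Cn onto $S^n$, is compact.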
We will find x = (x1, . . . , xn, . . .) ∈ ∩nCn and obtain our contradiction. For each n choose
a point y(n) ∈ Cn; this is possible since μ(Cn) ≥ ε/2 > 0. The first coordinates of {y(n)},
namely {y1(n)}, form a sequence contained in $\tilde C_1$, which is compact, hence there is a convergent
subsequence $\{y_1(n_k)\}$. Let x1 be the limit point. The first and second coordinates of $\{y(n_k)\}$
form a sequence contained in the compact set $\tilde C_2$, so a further subsequence $\{(y_1(n_{k_j}), y_2(n_{k_j}))\}$
converges to a point in $\tilde C_2$. Since $\{n_{k_j}\}$ is a subsequence of $\{n_k\}$, the first coordinate of the
limit is x1. Therefore the limit point of $\{(y_1(n_{k_j}), y_2(n_{k_j}))\}$ is of the form (x1, x2), and this point
is in $\tilde C_2$. We continue this procedure to obtain x = (x1, x2, . . . , xn, . . .). By our construction,
$(x_1, \ldots, x_n) \in \tilde C_n$ for each n, hence x ∈ Cn for each n, or x ∈ ∩nCn, a contradiction.
A typical application of this theorem is to construct a countable sequence of independent
random variables. We construct X1, . . . , Xn as in Proposition A.10. Here S = [0, 1]. Let μn
be the law of (X1, . . . , Xn); it is easy to check that the μn form a consistent family. We use
Theorem D.1 to obtain a probability measure μ on $[0, 1]^{\mathbb N}$. To get random variables out of
this, we let Xi(ω) = ωi if ω = (ω1, ω2, . . .).
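When S is finite, the consistency condition μn+1(A × S) = μn(A) can be checked exactly. Finiteness is a simplification made here only for illustration; the application above takes S = [0, 1]. The sketch below builds the laws μn of n independent coordinates with a common law p on S = {0, 1} and verifies consistency. It also shows a hypothetical family that fails the condition because the coordinate law changes with n.

```python
import itertools
import math
from fractions import Fraction


def iid_law(p, n):
    """mu_n on S^n: the law of n i.i.d. coordinates, each with law p (a dict S -> probability)."""
    return {xs: math.prod(p[x] for x in xs) for xs in itertools.product(p, repeat=n)}


def is_consistent(mu_n, mu_next, S):
    """Check mu_{n+1}(A x S) = mu_n(A); on a finite S, additivity reduces this to one-point A."""
    return all(sum(mu_next[xs + (s,)] for s in S) == mu_n[xs] for xs in mu_n)


p = {0: Fraction(1, 3), 1: Fraction(2, 3)}     # illustrative law on S = {0, 1}
laws = {n: iid_law(p, n) for n in range(1, 6)}
print(all(is_consistent(laws[n], laws[n + 1], p) for n in range(1, 5)))   # True

# A hypothetical inconsistent family: the coordinate law depends on n.
q = {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(is_consistent(iid_law(p, 1), iid_law(q, 2), p))                        # False
```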

References
Aldous, D. 1978. Stopping times and tightness. Ann. Probab. 6, 335–40.
Barlow, M. T. 1982. One-dimensional stochastic differential equations with no strong solution. J. London
Math. Soc. 26, 335–47.
Bass, R. F. 1983. Skorokhod imbedding via stochastic integrals. Séminaire de Probabilités XVII. New York:
Springer-Verlag; 221–4.
Bass, R. F. 1995. Probabilistic Techniques in Analysis. New York: Springer-Verlag.
Bass, R. F. 1996. The Doob–Meyer decomposition revisited. Can. Math. Bull. 39, 138–50.
Bass, R. F. 1997. Diffusions and Elliptic Operators. New York: Springer-Verlag.
Billingsley, P. 1968. Convergence of Probability Measures. New York: John Wiley & Sons, Ltd.
Billingsley, P. 1971. Weak Convergence of Measures: Applications in Probability. Philadelphia: SIAM.
Blumenthal, R. M. and Getoor, R. K. 1968. Markov Processes and Potential Theory. New York: Academic
Press.
Bogachev, V. I. 1998. Gaussian Measures. Providence, RI: American Mathematical Society.
Boyce, W. E. and DiPrima, R. C. 2009. Elementary Differential Equations and Boundary Value Problems,
9th edn. New York: John Wiley & Sons, Ltd.
Chung, K. L. 2001. A Course in Probability Theory, 3rd edn. San Diego: Academic Press.
Chung, K. L. and Walsh, J. B. 1969. To reverse a Markov process. Acta Math. 123, 225–51.
Dawson, D. A. 1993. Measure-valued Markov processes. Ecole d’Eté de Probabilités de Saint-Flour XXI–
1991. Berlin: Springer-Verlag.
Dellacherie, C. and Meyer, P.-A. 1978. Probability and Potential. Amsterdam: North-Holland.
Dudley, R. M. 1973. Sample functions of the Gaussian process. Ann. Probab. 1, 66–103.
Durrett, R. 1996. Probability: Theory and Examples. Belmont, CA: Duxbury Press.
Ethier, S. N. and Kurtz, T. G. 1986. Markov Processes: Characterization and Convergence. New York: John
Wiley & Sons, Ltd.
Feller, W. 1971. An Introduction to Probability Theory and its Applications, 2nd edn. New York: John
Wiley & Sons, Ltd.
Folland, G. B. 1999. Real Analysis: Modern Techniques and their Applications, 2nd edn. New York: John
Wiley & Sons, Ltd.
Fukushima, M., Oshima, Y. and Takeda, M. 1994. Dirichlet Forms and Symmetric Markov Processes. Berlin:
de Gruyter.
Gilbarg, D. and Trudinger, N. S. 1983. Elliptic Partial Differential Equations of Second Order, 2nd edn.
New York: Springer-Verlag.
Itô, K. and McKean, Jr, H. P. 1965. Diffusion Processes and their Sample Paths. Berlin: Springer-Verlag.
Kallianpur, G. 1980. Stochastic Filtering Theory. Berlin: Springer-Verlag.
Karatzas, I. and Shreve, S. E. 1991. Brownian Motion and Stochastic Calculus, 2nd edn. New York: Springer-
Verlag.
Knight, F. B. 1981. Essentials of Brownian Motion and Diffusion. Providence, RI: American Mathematical
Society.
Kuo, H. H. 1975. Gaussian Measures in Banach Spaces. New York: Springer-Verlag.
Lax, P. 2002. Functional Analysis. New York: John Wiley & Sons, Ltd.
Liggett, T. M. 2010. Continuous Time Markov Processes: An Introduction. Providence, RI: American
Mathematical Society.
Meyer, P.-A., Smythe, R. T. and Walsh, J. B. 1972. Birth and death of Markov processes. Proceedings of the
Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. III. Berkeley, CA: University
of California Press; 295–305.
Obłój, J. 2004. The Skorokhod embedding problem and its offspring. Probab. Surv. 1, 321–90.
Øksendal, B. 2003. Stochastic Differential Equations: An Introduction with Applications, 6th edn. Berlin:
Springer-Verlag.
Perkins, E. A. 2002. Dawson–Watanabe superprocesses and measure-valued diffusions. Lectures on
Probability Theory and Statistics (Saint-Flour, 1999). Berlin: Springer-Verlag; 125–324.
Revuz, D. and Yor, M. 1999. Continuous Martingales and Brownian Motion, 3rd edn. Berlin: Springer-
Verlag.
Rogers, L. C. G. and Williams, D. 2000a. Diffusions, Markov Processes, and Martingales, Vol. 1. Cambridge:
Cambridge University Press.
Rogers, L. C. G. and Williams, D. 2000b. Diffusions, Markov Processes, and Martingales, Vol. 2. Cambridge:
Cambridge University Press.
Rudin, W. 1976. Principles of Mathematical Analysis, 3rd edn. New York: McGraw-Hill.
Rudin, W. 1987. Real and Complex Analysis, 3rd edn. New York: McGraw-Hill.
Skorokhod, A. V. 1965. Studies in the Theory of Random Processes. Reading, MA: Addison-Wesley.
Stroock, D. W. 2003. Markov Processes from K. Itô’s Perspective. Princeton, NJ: Princeton University Press.
Stroock, D. W. and Varadhan, S. R. S. 1977. Multidimensional Diffusion Processes. Berlin: Springer-Verlag.
Walsh, J. B. 1978. Excursions and local time. Astérisque 52–53, 159–92.

Index
adapted, 1, 359
additive functional, 169, 180
  classical, 180
Aldous criterion, 264
almost surely, 348
announce, 112
Bessel processes, 200
binomial, 349, 371
Black–Scholes formula, 220
Blumenthal 0–1 law, 164
BMO, 129
Borel–Cantelli lemma, 353, 354
Brownian bridge, 273
Brownian motion, 6, 153
  covariance, 8
  fractional, 254
  integrated, 41
  maximum, 27
  standard, 6
  with drift, 24
  zero set, 30, 48, 99, 214, 217
Brownian sheet, 254
Burkholder–Davis–Gundy inequalities, 82
cadlag, 2
Cameron–Martin space, 253
canonical process, 158
Cauchy problem, 321
cemetery, 156, 177
central limit theorem, 373
chaining, 51
change of variables formula, 71
Chapman–Kolmogorov equations, 155
characteristic function, 370
Chebyshev’s inequality, 352
Chung’s law of the iterated logarithm, 47
class D, 57, 124
class DL, 126
closed form, 303
closed operator, 295
compensator, 124, 130
complete filtration, 1
conditional expectation, 357
conditional probability, 357
conditioned processes, 178
consistent, 382
construction of Brownian motion, 36, 248, 254, 284
continuation region, 187
continuous process, 2
convergence
  almost surely, 355
  in Lᵖ, 355
  in distribution, 367
  in law, 367
  in probability, 355
  weak, 367
convolution semigroup, 285
covariance, 375
covariance matrix, 375
covariation, 58
cumulative normal distribution function, 227
cylindrical set, 3
D[0, 1]
  compactness, 263
  completeness, 262
  metrics, 259
debut, 117
debut theorem, 117
density, 349
diffusion coefficient, 193, 315
Dirichlet boundary condition, 290
Dirichlet form, 303
Dirichlet problem, 320
dissipative, 294
distribution, 348
distribution function, 348
divergence form elliptic operators, 307
Donsker invariance principle, 269
Doob decomposition, 361
Doob’s h-path transform, 178
Doob’s inequalities, 14, 361
Doob–Meyer decomposition, 60, 124
drift coefficient, 193, 315
dual optional projection, 124
dual predictable projection, 124
dyadic rationals, 49
empirical process, 275
entry time, 115
equivalent martingale measure, 223
events, 348
excessive, 184
excessive majorant, 186
exercise time, 219
expected value, 348
exponential, 349, 371
  martingale, 89
  semimartingale, 144
exponential random variables, 33
Feller process, 161
Feynman–Kac formula, 323
filtration, 1, 2
finite-dimensional distributions, 3
Fourier series, 36
gamma, 350, 371
gauge, 323
Gaussian, 7, 374
Gaussian field, 255
Girsanov theorem, 89, 93, 144
Glivenko–Cantelli theorem, 366
good-λ inequality, 86
Green’s function, 175
Gronwall’s lemma, 201
Hölder continuous, 43, 47
harmonic, 173, 321
Hausdorff dimension, 48, 99
Hausdorff measure, 48
heat equation, 322
Helly’s theorem, 369
Hille–Yosida theorem, 292
hitting time, 115
Hunt process, 165
Hurst index, 254
i.i.d., 364
increasing process, 54, 121
independent, 353
independent increments, 6, 339
indicator, 348
indistinguishable, 2
infinite particle systems, 295
infinitesimal generator, 288
innovation process, 230
innovations approach, 229
integration by parts formula, 74
invariance principle, 108
invariant, 178
Itô’s formula, 71
  multivariate, 74
Jensen’s inequality, 352, 358
John–Nirenberg inequality, 129
joint characteristic function, 372
jointly normal, 7, 375
Kalman–Bucy filter, 234
Karhunen–Loève expansion, 253
kernel, 154
killed process, 177
Kolmogorov backward equation, 291
Kolmogorov continuity criterion, 49
Kolmogorov extension theorem, 382
Kolmogorov forward equation, 292
Kunita–Watanabe inequality, 70
Lévy measure, 342
Lévy process, 32, 297, 339
Lévy system formula, 347
Lévy’s theorem, 77
Lévy–Khintchine formula, 342
last exit, 181
law, 3, 10, 348
law of the iterated logarithm, 44
least excessive majorant, 186
left continuous process, 2
lifetime, 156, 177
LIL, 44
linear equations, 199
linear model, 234
Lipschitz function, 100, 193
local time, 94, 209
  joint continuity, 96
locally bounded, 141
lower semicontinuous, 186
Markov property, 25
Markov transition probabilities, 154
Markovian, 303
martingale, 13, 359
  continuous, 54
  convergence theorem, 363
  local, 54, 139
  locally square integrable, 139
  problem, 316
  representation theorem, 80, 81
  uniformly integrable, 54, 364
maximum principle, 176
mean, 350
mean rate of return, 218
measure-valued branching diffusion process, 317
metric entropy, 51
minimal augmented filtration, 2, 160
modulus of continuity, 247, 260
moment, 350
monotone class theorem, 378
multiplication theorem, 354
natural scale, 326
Neumann boundary condition, 290
Newtonian potential density, 175
NFLVR condition, 223
no free lunch, 223
nondivergence form, 296, 315
non-negative definite, 254
normal, 349, 371
nowhere differentiable, 46
null set, 1, 111
observation process, 229
occupation time density, 175
occupation times, 97
one-dimensional diffusion, 326
optimal reward, 187
optimal stopping problem, 184
optional σ-field, 111
optional projection, 119
optional stopping theorem, 17, 360
optional time, 15
Ornstein–Uhlenbeck process, 159, 198
orthogonality lemma, 131
outer probability, 111
p-variation, 30, 48
partial sums, 364
paths, 2
paths locally of bounded variation, 54
Picard iteration, 101
Poincaré cone condition, 174
Poisson, 349, 371
  point process, 147
  process, 32, 171
Poisson’s equation, 319
portmanteau theorem, 237
potential, 155, 323
prévisible, 111
predict, 112
predictable, 64, 130
predictable σ-field, 64, 111
predictable projection, 120
probability, 348
process, 1
product formula, 74, 85
progressively measurable, 4
Prohorov metric, 241
Prohorov theorem, 239
purely discontinuous, 143
quadratic variation, 57, 79
quasi-left continuous, 165
random variables, 348
Ray–Knight theorem, 209
recurrence, 167
reduce, 139
reflection principle, 27
regular, 173, 326
regular conditional probability, 312, 380
regular Dirichlet form, 307
reproducing property, 252
resolvent, 155, 286
reward function, 184
right continuous filtration, 1
right continuous process, 2
right continuous with left limits, 2
scale function, 327
scaling, 7
Schrödinger operator, 323
Schwartz class, 379
section theorem
  optional, 117
  predictable, 117
self-financing, 219
semigroup, 155
semigroup property, 155
semimartingale, 54, 141
set-indexed process, 255
shift operators, 158
signal process, 229
simple symmetric random walk, 109, 248
Skorokhod embedding, 100
Skorokhod representation, 245
Slutsky’s theorem, 242, 369
space-time process, 182
spectral theorem, 309
speed measure, 329
square integrable martingale, 55
stable subordinator, 347
stationary increments, 6, 339
stochastic integral, 64, 134, 150
  local martingales, 69
  multiple, 88
  semimartingales, 69
stochastic process, 1
stopping time, 15, 359
Stratonovich integral, 84
strong Feller process, 161
strong law of large numbers, 364
strong Markov process, 165
strong Markov property, 25
strongly reduce, 139
sub-Markov transition probability kernels, 283
submartingale, 359
super-Brownian motion, 317
supermartingale, 359
support theorem, 93, 208
symmetric difference, 12
symmetric stable process, 346
Tanaka formula, 94, 95
terminal time, 177
tight, 369
time change, 78, 105, 180
time inversion, 11
totally inaccessible, 112, 130
trading strategy, 219
trajectories, 2
transience, 167
transition densities, 291
transition probabilities, 154
uniform ellipticity, 296, 307, 315
uniformly absolutely continuous, 356
uniformly integrable, 356
unique in law, 204
upcrossings, 18, 362
usual conditions, 1
variance, 350
versions, 2
Vitali convergence theorem, 357
volatility, 218
weak convergence, 367
weak Feller process, 161
weak solution, 204
weak uniqueness, 204
well posed, 316
well measurable, 111
Wiener measure, 6
Yamada–Watanabe condition, 196

