http://www.cambridge.org/9781107008007

Stochastic Processes

This comprehensive guide to stochastic processes gives a complete overview of the

theory and addresses the most important applications. Pitched at a level accessible

to beginning graduate students and researchers from applied disciplines, it is both

a course book and a rich resource for individual readers. Subjects covered include

Brownian motion, stochastic calculus, stochastic differential equations, Markov processes,

weak convergence of processes, and semigroup theory. Applications include

the Black–Scholes formula for the pricing of derivatives in financial mathematics, the

Kalman–Bucy filter used in the US space program, and also theoretical applications

to partial differential equations and analysis. Short, readable chapters aim for clarity

rather than for full generality. More than 350 exercises are included to help readers put

their new-found knowledge to the test and to prepare them for tackling the research

literature.

Richard F. Bass is Board of Trustees Distinguished Professor in the Department

of Mathematics at the University of Connecticut.

CAMBRIDGE SERIES IN STATISTICAL AND PROBABILISTIC MATHEMATICS

Editorial Board

Z. Ghahramani (Department of Engineering, University of Cambridge)

R. Gill (Mathematical Institute, Leiden University)

F. P. Kelly (Department of Pure Mathematics and Mathematical Statistics,

University of Cambridge)

B. D. Ripley (Department of Statistics, University of Oxford)

S. Ross (Department of Industrial and Systems Engineering,

University of Southern California)

M. Stein (Department of Statistics, University of Chicago)

This series of high-quality upper-division textbooks and expository monographs covers all

aspects of stochastic applicable mathematics. The topics range from pure and applied statistics

to probability theory, operations research, optimization, and mathematical programming. The

books contain clear presentations of new developments in the field and also of the state of

the art in classical methods. While emphasizing rigorous treatment of theoretical methods, the

books also contain applications and discussions of new techniques made possible by advances

in computational practice.

A complete list of books in the series can be found at http://www.cambridge.org/statistics.

Recent titles include the following:

11. Statistical Models, by A. C. Davison

12. Semiparametric Regression, by David Ruppert, M. P. Wand and R. J. Carroll

13. Exercises in Probability, by Loïc Chaumont and Marc Yor

14. Statistical Analysis of Stochastic Processes in Time, by J. K. Lindsey

15. Measure Theory and Filtering, by Lakhdar Aggoun and Robert Elliott

16. Essentials of Statistical Inference, by G. A. Young and R. L. Smith

17. Elements of Distribution Theory, by Thomas A. Severini

18. Statistical Mechanics of Disordered Systems, by Anton Bovier

19. The Coordinate-Free Approach to Linear Models, by Michael J. Wichura

20. Random Graph Dynamics, by Rick Durrett

21. Networks, by Peter Whittle

22. Saddlepoint Approximations with Applications, by Ronald W. Butler

23. Applied Asymptotics, by A. R. Brazzale, A. C. Davison and N. Reid

24. Random Networks for Communication, by Massimo Franceschetti and Ronald Meester

25. Design of Comparative Experiments, by R. A. Bailey

26. Symmetry Studies, by Marlos A. G. Viana

27. Model Selection and Model Averaging, by Gerda Claeskens and Nils Lid Hjort

28. Bayesian Nonparametrics, edited by Nils Lid Hjort et al.

29. From Finite Sample to Asymptotic Methods in Statistics, by Pranab K. Sen,

Julio M. Singer and Antonio C. Pedrosa de Lima

30. Brownian Motion, by Peter Mörters and Yuval Peres

31. Probability, by Rick Durrett

33. Stochastic Processes, by Richard F. Bass

34. Structured Regression for Categorical Data, by Gerhard Tutz

Stochastic Processes

Richard F. Bass

University of Connecticut

CAMBRIDGE UNIVERSITY PRESS

Cambridge, New York, Melbourne, Madrid, Cape Town,

Singapore, São Paulo, Delhi, Tokyo, Mexico City

Cambridge University Press

The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org

Information on this title: www.cambridge.org/9781107008007

© R. F. Bass 2011

This publication is in copyright. Subject to statutory exception

and to the provisions of relevant collective licensing agreements,

no reproduction of any part may take place without the written

permission of Cambridge University Press.

First published 2011

Printed in the United Kingdom at the University Press, Cambridge

A catalogue record for this publication is available from the British Library

Library of Congress Cataloguing in Publication data

Bass, Richard F.

Stochastic processes / Richard F. Bass.

p. cm. – (Cambridge series in statistical and probabilistic mathematics ; 33)

Includes index.

ISBN 978-1-107-00800-7 (hardback)

1. Stochastic analysis. I. Title.

QA274.2.B375 2011

519.2′32 – dc23 2011023024

ISBN 978-1-107-00800-7 Hardback

Cambridge University Press has no responsibility for the persistence or

accuracy of URLs for external or third-party internet websites referred to

in this publication, and does not guarantee that any content on such

websites is, or will remain, accurate or appropriate.

To Meredith, as always

Contents

Preface
Frequently used notation
1 Basic notions
1.1 Processes and σ-fields
1.2 Laws and state spaces
2 Brownian motion
2.1 Definition and basic properties
3 Martingales
3.1 Definition and examples
3.2 Doob’s inequalities
3.3 Stopping times
3.4 The optional stopping theorem
3.5 Convergence and regularity
3.6 Some applications of martingales
4 Markov properties of Brownian motion
4.1 Markov properties
4.2 Applications
5 The Poisson process
6 Construction of Brownian motion
6.1 Wiener’s construction
6.2 Martingale methods
7 Path properties of Brownian motion
8 The continuity of paths

9 Continuous semimartingales
9.1 Definitions
9.2 Square integrable martingales
9.3 Quadratic variation
9.4 The Doob–Meyer decomposition
10 Stochastic integrals
10.1 Construction
10.2 Extensions
11 Itô’s formula
12 Some applications of Itô’s formula
12.1 Lévy’s theorem
12.2 Time changes of martingales
12.3 Quadratic variation
12.4 Martingale representation
12.5 The Burkholder–Davis–Gundy inequalities
12.6 Stratonovich integrals
13 The Girsanov theorem
13.1 The Brownian motion case
13.2 An example
14 Local times
14.1 Basic properties
14.2 Joint continuity of local times
14.3 Occupation times
15 Skorokhod embedding
15.1 Preliminaries
15.2 Construction of the embedding
15.3 Embedding random walks
16 The general theory of processes
16.1 Predictable and optional processes
16.2 Hitting times
16.3 The debut and section theorems
16.4 Projection theorems
16.5 More on predictability
16.6 Dual projection theorems
16.7 The Doob–Meyer decomposition
16.8 Two inequalities

17 Processes with jumps
17.1 Decomposition of martingales
17.2 Stochastic integrals
17.3 Itô’s formula
17.4 The reduction theorem
17.5 Semimartingales
17.6 Exponential of a semimartingale
17.7 The Girsanov theorem
18 Poisson point processes
19 Framework for Markov processes
19.1 Introduction
19.2 Definition of a Markov process
19.3 Transition probabilities
19.4 An example
19.5 The canonical process and shift operators
20 Markov properties
20.1 Enlarging the filtration
20.2 The Markov property
20.3 Strong Markov property
21 Applications of the Markov properties
21.1 Recurrence and transience
21.2 Additive functionals
21.3 Continuity
21.4 Harmonic functions
22 Transformations of Markov processes
22.1 Killed processes
22.2 Conditioned processes
22.3 Time change
22.4 Last exit decompositions
23 Optimal stopping
23.1 Excessive functions
23.2 Solving the optimal stopping problem
24 Stochastic differential equations
24.1 Pathwise solutions of SDEs
24.2 One-dimensional SDEs
24.3 Examples of SDEs

25 Weak solutions of SDEs
26 The Ray–Knight theorems
27 Brownian excursions
28 Financial mathematics
28.1 Finance models
28.2 Black–Scholes formula
28.3 The fundamental theorem of finance
28.4 Stochastic control
29 Filtering
29.1 The basic model
29.2 The innovation process
29.3 Representation of $\mathcal{F}^Z$-martingales
29.4 The filtering equation
29.5 Linear models
29.6 Kalman–Bucy filter
30 Convergence of probability measures
30.1 The portmanteau theorem
30.2 The Prohorov theorem
30.3 Metrics for weak convergence
31 Skorokhod representation
32 The space C[0, 1]
32.1 Tightness
32.2 A construction of Brownian motion
33 Gaussian processes
33.1 Reproducing kernel Hilbert spaces
33.2 Continuous Gaussian processes
34 The space D[0, 1]
34.1 Metrics for D[0, 1]
34.2 Compactness and completeness
34.3 The Aldous criterion
35 Applications of weak convergence
35.1 Donsker invariance principle
35.2 Brownian bridge
35.3 Empirical processes

36 Semigroups
36.1 Constructing the process
36.2 Examples
37 Infinitesimal generators
37.1 Semigroup properties
37.2 The Hille–Yosida theorem
37.3 Nondivergence form elliptic operators
37.4 Generators of Lévy processes
38 Dirichlet forms
38.1 Framework
38.2 Construction of the semigroup
38.3 Divergence form elliptic operators
39 Markov processes and SDEs
39.1 Markov properties
39.2 SDEs and PDEs
39.3 Martingale problems
40 Solving partial differential equations
40.1 Poisson’s equation
40.2 Dirichlet problem
40.3 Cauchy problem
40.4 Schrödinger operators
41 One-dimensional diffusions
41.1 Regularity
41.2 Scale functions
41.3 Speed measures
41.4 The uniqueness theorem
41.5 Time change
41.6 Examples
42 Lévy processes
42.1 Examples
42.2 Construction of Lévy processes
42.3 Representation of Lévy processes
Appendices
A Basic probability
A.1 First notions
A.2 Independence
A.3 Convergence
A.4 Uniform integrability

A.5 Conditional expectation
A.6 Stopping times
A.7 Martingales
A.8 Optional stopping
A.9 Doob’s inequalities
A.10 Martingale convergence theorem
A.11 Strong law of large numbers
A.12 Weak convergence
A.13 Characteristic functions
A.14 Uniqueness and characteristic functions
A.15 The central limit theorem
A.16 Gaussian random variables
B Some results from analysis
B.1 The monotone class theorem
B.2 The Schwartz class
C Regular conditional probabilities
D Kolmogorov extension theorem
References
Index

Preface

Why study stochastic processes? This branch of probability theory offers sophisticated theorems
and proofs, such as the existence of Brownian motion, the Doob–Meyer decomposition,
and the Kolmogorov continuity criterion. At the same time stochastic processes also have
far-reaching applications: the explosive growth in options and derivatives in financial markets
throughout the world derives from the Black–Scholes formula, while NASA relies on
the Kalman–Bucy method to filter signals from satellites and probes sent into outer space.

A graduate student taking a year-long course in probability theory first learns about

sequences of random variables and topics such as laws of large numbers, central limit

theorems, and discrete time martingales. In the second half of the course, the student will

then turn to stochastic processes, which is the subject of this text. Topics covered here are

Brownian motion, stochastic integrals, stochastic differential equations, Markov processes,

the Black–Scholes formula of financial mathematics, the Kalman–Bucy filter, as well as

many more.

The 42 chapters of this book can be grouped into seven parts. The first part consists

of Chapters 1–8, where some of the basic processes and ideas are introduced, including

Brownian motion. The next group of chapters, Chapters 9–15, introduces the theory of

stochastic calculus, including stochastic integrals and Itô’s formula. Chapters 16–18 explore

jump processes. This requires a study of the foundations of stochastic processes, which

is also known as the general theory of processes. Next we take up Markov processes in

Chapters 19–23. A formidable obstacle to the study of Markov processes is the notation, and

I have attempted to make this as accessible as possible. Chapters 24–29 involve stochastic

differential equations. Two very important applications, to financial mathematics and to

filtering, appear in Chapters 28 and 29, respectively. Probability measures on metric spaces

and the weak convergence of random variables taking values in a metric space prove to

be relevant to the study of stochastic processes. These and related topics are treated in

Chapters 30–35. We then return to Markov processes, namely, their construction and some

important examples, in Chapters 36–42. Tools used in the construction include infinitesimal

generators, Dirichlet forms, and solutions to stochastic differential equations, while two

important examples that we consider are diffusions on the real line and Lévy processes.

The prerequisites to this book are a sound knowledge of basic measure theory and a

course in the classical aspects of probability. The probability topics needed are provided

(with proofs) in an appendix.

There is far too much material in this book to cover in a single semester, and even too

much for a full year. I recommend that as a minimum the following chapters be studied:

Chapters 1–5, Chapters 9–13, Chapters 19–21, and Chapter 24. If possible, include either

Chapter 28 or Chapter 29. In Chapter 11, the statement and corollaries of Itô’s formula are

very important, but the proof of Itô’s formula may be omitted.

I would like to thank the many students who patiently sat through my lectures, pointed out

errors, and made suggestions. I especially would like to thank my colleague Sasha Teplyaev

who taught a course from a preliminary version of this book and made a great number of

useful suggestions.

Frequently used notation

Here are some notational conventions we will use. We use the letter c, either with or without

subscripts, to denote a finite positive constant whose exact value is unimportant and which

may change from line to line. We use B(x, r) to denote the open Euclidean ball centered at

x with radius r. $a \wedge b$ is the minimum of $a$ and $b$, while $a \vee b$ is the maximum of $a$ and $b$.
$x^+ = x \vee 0$ and $x^- = (-x) \vee 0$. The symbol $\exists$ is used in a few formulas and means “there
exists.” $\mathbb{Q}$, $\mathbb{Q}_+$, $\mathbb{N}$, and $\mathbb{Z}$ denote the rationals, the positive rationals, the natural numbers,
and the integers, respectively. If $C$ is a matrix, $C^T$ is the transpose of $C$.

For a set $A$, we use $A^c$ for the complement of $A$. If $A$ is a subset of a topological space, $\overline{A}$,
$A^0$, and $\partial A$ denote the closure, interior, and boundary of $A$, respectively.

Given a topological space $S$, we use $C(S)$ for the space of continuous functions on $S$,
where we use the supremum norm. If $S$ is a domain in $\mathbb{R}^d$, $C^k(S)$ refers to the set of
continuous functions with domain $S$ whose partial derivatives up to order $k$ are continuous.
$C^\infty$ functions are those that are infinitely differentiable.

We will on a few occasions use the Fourier transform, which we define by

\[ \hat{f}(u) = \int e^{iu \cdot x} f(x)\, dx \]

for f integrable. This agrees with the convention in Rudin (1987).

If $X$ is a stochastic process whose paths are right continuous with left limits, then
$X_{t-} = \lim_{s<t,\, s\to t} X_s$ denotes the left limit of $X$ at time $t$.

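The σ-field $\mathcal{F}_{t+} = \bigcap_{\varepsilon > 0} \mathcal{F}_{t+\varepsilon}$ is supposed to represent what one knows if one looks ahead an infinitesimal
amount; a filtration $\{\mathcal{F}_t\}$ is right continuous if $\mathcal{F}_{t+} = \mathcal{F}_t$ for each $t \ge 0$. Most of the
filtrations we will come across will be right continuous, but see Exercise 1.1.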

A null set $N$ is one that has outer probability 0. This means that

\[ \inf\{\mathbb{P}(A) : N \subset A,\ A \in \mathcal{F}\} = 0. \]

A filtration is complete if each $\mathcal{F}_t$ contains every null set. A filtration that is right continuous
and complete is said to satisfy the usual conditions.

Given a filtration $\{\mathcal{F}_t\}$, whether or not it satisfies the usual conditions, we define $\mathcal{F}_\infty$ to be
the σ-field generated by $\bigcup_{t \ge 0} \mathcal{F}_t$, that is, the smallest σ-field containing $\bigcup_{t \ge 0} \mathcal{F}_t$, and we write

\[ \mathcal{F}_\infty = \bigvee_{t \ge 0} \mathcal{F}_t. \]

Recall that the arbitrary intersection of σ -fields is a σ -field, but the union of even two σ -fields

need not be a σ -field.

We say that a stochastic process $X$ is adapted to a filtration $\{\mathcal{F}_t\}$ if $X_t$ is $\mathcal{F}_t$ measurable
for each $t$. Often one starts with a stochastic process $X$ and wants to define a filtration with
respect to which $X$ is adapted.
The simplest way to do this is to let $\mathcal{F}_t$ be the σ-field generated by the random variables
$\{X_s, s \le t\}$. More often one wants to have a slightly larger filtration than the one generated
by $X$.

We define the minimal augmented filtration generated by $X$ to be the smallest filtration that
is right continuous and complete and with respect to which the process $X$ is adapted. For each
$t$, $\mathcal{F}_t$ is in general strictly larger than the smallest σ-field with respect to which $\{X_s : s \le t\}$ is
measurable because of the inclusion of the null sets. It is important to include the null sets;
see Exercise 1.5. There is no widely accepted name for what we call the minimal augmented
filtration; I like this nomenclature because it is descriptive and sufficiently different from
“filtration generated by $X$” to avoid confusion.

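The minimal augmented filtration generated by the process $X_t$ can be constructed in three
steps. First, let $\{\mathcal{F}^{00}_t\}$ be the smallest filtration with respect to which $X$ is adapted, that is,

\[ \mathcal{F}^{00}_t = \sigma(X_s;\ s \le t). \tag{1.1} \]

Let $\mathbb{P}^*$ be the outer probability corresponding to $\mathbb{P}$: for $A \subset \Omega$,

\[ \mathbb{P}^*(A) = \inf\{\mathbb{P}(B) : B \in \mathcal{F},\ A \subset B\}. \]

Let $\mathcal{N}$ be the collection of null sets, so that $\mathcal{N} = \{A \subset \Omega : \mathbb{P}^*(A) = 0\}$. The second step is
to let $\mathcal{F}^0_t$ be the smallest σ-field containing $\mathcal{F}^{00}_t$ and $\mathcal{N}$, or

\[ \mathcal{F}^0_t = \sigma(\mathcal{F}^{00}_t \cup \mathcal{N}). \tag{1.2} \]

The third step is to let

\[ \mathcal{F}_t = \bigcap_{\varepsilon > 0} \mathcal{F}^0_{t+\varepsilon}. \tag{1.3} \]

Exercise 1.2 asks you to check that $\{\mathcal{F}_t\}$ is the minimal augmented filtration generated by $X$.
We will refer to $\{\mathcal{F}^{00}_t\}$ as the filtration generated by $X$.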
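
On a finite sample space the first step can be carried out by hand: the σ-field generated by finitely many random variables consists of all unions of the blocks on which those variables are jointly constant. The following is a minimal sketch under that assumption, not a general construction. The helper name `generated_partition`, the two-point sample space, and the finite lists of observation times are illustrative choices that mirror the example of Exercise 1.1.

```python
def generated_partition(sample_space, process, times):
    """Partition of a finite sample space induced by {X_s : s in times}.

    Two outcomes share a block exactly when the process takes the same value
    on both at every listed time. On a finite space, the sigma-field generated
    by these random variables is the collection of all unions of blocks.
    """
    blocks = {}
    for outcome in sample_space:
        signature = tuple(process(s, outcome) for s in times)
        blocks.setdefault(signature, set()).add(outcome)
    return {frozenset(block) for block in blocks.values()}


def X(t, outcome):
    # Hypothetical process on the two-point space {a, b}: identically zero up
    # to time 1, after which it grows linearly on outcome b only.
    if t <= 1 or outcome == "a":
        return 0.0
    return t - 1.0


SAMPLE_SPACE = ["a", "b"]

# Observing X at times up to 1 cannot separate a from b: a single block.
print(generated_partition(SAMPLE_SPACE, X, [0.0, 0.5, 1.0]))

# Observing X at any time strictly after 1 separates them: two blocks.
for eps in (0.5, 0.1, 0.001):
    print(eps, generated_partition(SAMPLE_SPACE, X, [0.0, 1.0, 1.0 + eps]))
```

In this example the partition at time 1 + ε is the finest possible for every ε > 0 while the partition at time 1 is trivial, so the intersection over ε > 0 is strictly larger than the σ-field at time 1. Null sets play no role here, since the only null subset of the two-point space is the empty set.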

Two stochastic processes $X$ and $Y$ are said to be indistinguishable if $\mathbb{P}(X_t \ne Y_t \text{ for some } t \ge 0) = 0$.
$X$ and $Y$ are versions of each other if for each $t \ge 0$, we have $\mathbb{P}(X_t \ne Y_t) = 0$. An
example of two processes that are versions of each other but are not indistinguishable is to
let $\Omega = [0, 1]$, $\mathcal{F}$ the Borel σ-field on $[0, 1]$, $\mathbb{P}$ Lebesgue measure on $[0, 1]$, $X(t, \omega) = 0$
for all $t$ and $\omega$, and $Y(t, \omega)$ equal to 1 if $t = \omega$ and 0 otherwise. Note that the functions
$t \to X(t, \omega)$ are continuous for each $\omega$, but the functions $t \to Y(t, \omega)$ are not continuous
for any $\omega$.
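
The difference between the two notions can be seen numerically. The sketch below is only an illustration, assuming that pseudo-random draws from `random.random()` stand in for Lebesgue measure on [0, 1]; the helper names `fraction_differing_at` and `paths_differ` are made up for this example.

```python
import random


def X(t, omega):
    # X is identically zero.
    return 0.0


def Y(t, omega):
    # Y equals 1 only at the single time t = omega.
    return 1.0 if t == omega else 0.0


def fraction_differing_at(t, omegas):
    """Fraction of sampled outcomes with X_t != Y_t at one fixed time t."""
    return sum(X(t, w) != Y(t, w) for w in omegas) / len(omegas)


def paths_differ(omega):
    """True if the paths t -> X(t, omega) and t -> Y(t, omega) differ somewhere."""
    # The only candidate time is t = omega, so it suffices to look there.
    return X(omega, omega) != Y(omega, omega)


rng = random.Random(0)  # fixed seed so the run is reproducible
omegas = [rng.random() for _ in range(10_000)]  # stand-in for Lebesgue measure

# At a fixed time, essentially no sampled outcome distinguishes X from Y ...
print(fraction_differing_at(0.5, omegas))
# ... yet for every sampled outcome the two paths differ at some time.
print(all(paths_differ(w) for w in omegas))
```

The first quantity estimates $\mathbb{P}(X_t \ne Y_t)$ for one fixed $t$, which is what the definition of a version controls; the second checks the event $\{X_t \ne Y_t \text{ for some } t\}$, which is what indistinguishability controls.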

If X is a stochastic process, the functions t → X (t, ω) are called the paths or trajectories

of X . There will be one path for each ω. If the paths of X are continuous functions, except

for a set of ω’s in a null set, then X is called a continuous process, or is said to be continuous.

We similarly define right continuous process, left continuous process, etc.

A function $f(t)$ is right continuous with left limits if $\lim_{h>0,\,h\downarrow 0} f(t+h) = f(t)$ for all
$t$ and $\lim_{h<0,\,h\uparrow 0} f(t+h)$ exists for all $t > 0$. Almost all our stochastic processes will have

the property that except for a null set of ω’s the function t → X (t, ω) is right continuous

and has left limits. One often sees cadlag to refer to paths that are right continuous with left

limits; this abbreviates the French “continue à droite, limite à gauche.”

1.2 Laws and state spaces

Let $S$ be a topological space. The Borel σ-field on $S$ is defined to be the σ-field generated
by the open sets of $S$. A function $f : S \to \mathbb{R}$ is Borel measurable if $f^{-1}(G)$ is in the Borel
σ-field of $S$ whenever $G$ is an open subset of $\mathbb{R}$. A random variable $Y : \Omega \to S$ is measurable
with respect to a σ-field $\mathcal{F}$ of subsets of $\Omega$ if $\{\omega \in \Omega : Y(\omega) \in A\}$ is in $\mathcal{F}$ whenever $A$ is in
the Borel σ-field on $S$.

A stochastic process taking values in a topological space $S$ is a map $X : [0, \infty) \times \Omega \to S$,
where for each $t$, the random variable $X_t$ is measurable with respect to $\mathcal{F}$.

Recall that if we have a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and $Y : \Omega \to \mathbb{R}$ is a random variable,
then the law of $Y$ is the probability measure $\mathbb{P}_Y$ on the Borel subsets of $\mathbb{R}$ defined by
$\mathbb{P}_Y(A) = \mathbb{P}(Y \in A)$. Similarly, if $Y : \Omega \to \mathbb{R}^d$ is a $d$-dimensional random vector, then the law
of $Y$ is the probability measure $\mathbb{P}_Y$ on the Borel subsets of $\mathbb{R}^d$ defined by $\mathbb{P}_Y(A) = \mathbb{P}(Y \in A)$.
We extend this definition to random variables $Y$ taking values in a topological space $S$. In
this case $\mathbb{P}_Y$ is a probability measure on the Borel subsets of $S$ with the same definition:
$\mathbb{P}_Y(A) = \mathbb{P}(Y \in A)$. In particular, if $Y$ and $Z$ are two random variables with the same state
space $S$, then $Y$ and $Z$ will have the same law if $\mathbb{P}(Y \in A) = \mathbb{P}(Z \in A)$ for all Borel subsets
$A$ of $S$.

The relevance of the preceding paragraph to stochastic processes is this. Suppose X and

Y are stochastic processes with continuous paths. Let S = C[0, ∞) be the collection of

real-valued continuous functions on [0, ∞) together with the usual metric defined in terms

of the supremum norm:

\[ d(f, g) = \sup_{0 \le t} |f(t) - g(t)|. \]

(Strictly speaking, we should write C([0, ∞)), but we follow the usual convention and drop

the outside parentheses.) Let the random variable X taking values in S be defined by setting

$X(\omega)$ to be the continuous function $t \to X(t, \omega)$, and define $Y$ similarly. More precisely,
$X : \Omega \to S$ with

\[ X(\omega)(t) = X(t, \omega), \qquad t \ge 0. \]

Then X and Y are random variables taking values in the metric space S , and saying that X

and Y have the same law means that P(X ∈ A) = P(Y ∈ A) for all Borel subsets A of S .

When this happens, we also say that the stochastic processes X and Y have the same law.

Two stochastic processes $X$ and $Y$ have the same finite-dimensional distributions if for
every $n \ge 1$ and every $t_1 < \cdots < t_n$, the laws of $(X_{t_1}, \ldots, X_{t_n})$ and $(Y_{t_1}, \ldots, Y_{t_n})$ are equal.

Most often the topological spaces we will consider will also be metric spaces, but there
will be a few occasions when we want to consider topological spaces that are not metric
spaces. Suppose $S = \mathbb{R}^{[0,\infty)}$. We furnish $S$ with the product topology. $S$ can be identified
with the collection of real-valued functions on $[0, \infty)$, but the topology is not given by the
supremum norm nor by any other metric. We use $f$ for elements of $S$, where $f(t)$ is the $t$th
coordinate of $f$. We call a subset $A$ of $S$ a cylindrical set if there exist $n \ge 1$, non-negative
reals $t_1, t_2, \ldots, t_n$, and a Borel subset $B$ of $\mathbb{R}^n$ such that

\[ A = \{ f \in S : (f(t_1), \ldots, f(t_n)) \in B \}. \]

The appropriate σ-field to use on $S$ is the one generated by the collection of cylindrical
sets.

We want to generalize this notion slightly by allowing more general index sets and by
allowing for the possibility of considering only a subset of the product space.

Definition 1.1 Let $U$ be a topological space, $T$ an arbitrary index set, and $B$ a subset of $U^T$,
the collection of functions from $T$ into $U$. We say a set $C$ is a cylindrical subset of $B$ if there
exist $n \ge 1$, $t_1, \ldots, t_n \in T$, and a Borel subset $A$ of $U^n$ such that

\[ C = \{ f \in B : (f(t_1), \ldots, f(t_n)) \in A \}. \]

Exercises
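1.1 This exercise gives an example where $\{\mathcal{F}^{00}_t\}$ defined by (1.1) is not right continuous. Let
$\Omega = \{a, b\}$, let $\mathcal{F}$ be the collection of all subsets of $\Omega$, and let $\mathbb{P}(\{a\}) = \mathbb{P}(\{b\}) = \tfrac{1}{2}$. Define

\[ X_t(\omega) = \begin{cases} 0, & t \le 1; \\ 0, & t > 1 \text{ and } \omega = a; \\ t - 1, & t > 1 \text{ and } \omega = b. \end{cases} \]

Calculate $\mathcal{F}^{00}_t = \sigma(X_s;\ s \le t)$ and show $\{\mathcal{F}^{00}_t\}$ is not right continuous.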

1.2 If $X$ is a stochastic process, let $\mathcal{F}^{00}_t$, $\mathcal{F}^0_t$, and $\mathcal{F}_t$ be defined by (1.1), (1.2), and (1.3), respectively.
Show that $\{\mathcal{F}_t\}$ is the minimal augmented filtration generated by $X$.

1.3 Let $\{\mathcal{F}_t\}$ be a filtration satisfying the usual conditions and let $\mathcal{B}[0, t]$ be the Borel σ-field on
$[0, t]$. A real-valued stochastic process $X$ is progressively measurable if for each $t \ge 0$, the
map $(s, \omega) \to X(s, \omega)$ from $[0, t] \times \Omega$ to $\mathbb{R}$ is measurable with respect to the product σ-field
$\mathcal{B}[0, t] \times \mathcal{F}_t$.

(1) If $X$ is adapted to $\{\mathcal{F}_t\}$ and we define

\[ X^{(n)}_t(\omega) = \sum_{k=0}^{\infty} X_{k/2^n}(\omega)\, 1_{[k/2^n,\,(k+1)/2^n)}(t), \]

show that $X^{(n)}$ is progressively measurable for each $n \ge 1$.

(2) Use (1) to show that if $X$ is adapted to $\{\mathcal{F}_t\}$ and has left continuous paths, then $X$ is
progressively measurable.

(3) If $X$ is adapted to $\{\mathcal{F}_t\}$ and we define

\[ Y^{(n)}_t(\omega) = \sum_{k=0}^{\infty} X_{(k+1)/2^n}(\omega)\, 1_{[k/2^n,\,(k+1)/2^n)}(t), \]

show that for each $t \ge 0$, the map $(s, \omega) \to Y^{(n)}(s, \omega)$ from $[0, t] \times \Omega$ to $\mathbb{R}$ is measurable with
respect to $\mathcal{B}[0, t] \times \mathcal{F}_{t+2^{-n}}$.

(4) Show that if $X$ is adapted to $\{\mathcal{F}_t\}$ and has right continuous paths, then $X$ is progressively
measurable.

1.4 Let $S = \mathbb{R}^{[0,1]}$, the set of functions from $[0, 1]$ to $\mathbb{R}$, and let $\mathcal{F}$ be the σ-field generated by the
cylindrical sets. The purpose of this exercise is to show that the elements of $\mathcal{F}$ depend on only
countably many coordinates.

Let $S_0 = \{(x_1, x_2, \ldots)\}$, the set of sequences taking values in $\mathbb{R}$. Let $\mathcal{F}_0$ be the σ-field
generated by the cylindrical subsets of $\mathbb{R}^{\mathbb{N}}$, where $\mathbb{N} = \{1, 2, \ldots\}$.

Show that $B \in \mathcal{F}$ if and only if there exist $t_1, t_2, \ldots$ in $[0, 1]$ and a set $C \in \mathcal{F}_0$ such that

\[ B = \{ f \in S : (f(t_1), f(t_2), \ldots) \in C \}. \]

1.5 Null sets are sometimes important! Let $S$ and $\mathcal{F}$ be as in Exercise 1.4. Show that $D \notin \mathcal{F}$, where

\[ D = \{ f \in S : f \text{ is a continuous function on } [0, 1] \}. \]

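1.6 Suppose $X$ is a stochastic process, $\{\mathcal{F}_t\}$ its minimal augmented filtration, and $\mathcal{F}_\infty = \bigvee_{t \ge 0} \mathcal{F}_t$.
Suppose with probability one, the paths of $X$ are right continuous with left limits. Let
$X_{t-} = \lim_{s<t,\, s\to t} X_s$ [...] prove $A \in \mathcal{F}_\infty$.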

1.7 Suppose $X$ is a stochastic process, $\{\mathcal{F}_t\}$ is the minimal augmented filtration for $X$, and
$\mathcal{F}_\infty = \bigvee_{t \ge 0} \mathcal{F}_t$. If the paths of $X$ are right continuous with left limits with probability one, show that
the event

\[ A = \{X \text{ has continuous paths}\} \]

is in $\mathcal{F}_\infty$.

Notes

The older literature sometimes uses the notion of a separable stochastic process, but this is

rarely seen nowadays. For much more on measurability, see Chapter 16. For the complete

story on the foundations of stochastic processes, see Dellacherie and Meyer (1978).

2 Brownian motion

Brownian motion is by far the most important stochastic process. It is the archetype of

Gaussian processes, of continuous time martingales, and of Markov processes. It is basic to

the study of stochastic differential equations, financial mathematics, and filtering, to name

only a few of its applications.

In this chapter we define Brownian motion and consider some of its elementary aspects.

Later chapters will take up the construction of Brownian motion and properties of Brownian

motion paths.

2.1 Definition and basic properties

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and let $\{\mathcal{F}_t\}$ be a filtration, not necessarily satisfying
the usual conditions.

Definition 2.1 $W_t = W_t(\omega)$ is a one-dimensional Brownian motion with respect to $\{\mathcal{F}_t\}$ and
the probability measure $\mathbb{P}$, started at 0, if

(1) $W_t$ is $\mathcal{F}_t$ measurable for each $t \ge 0$.
(2) $W_0 = 0$, a.s.
(3) $W_t - W_s$ is a normal random variable with mean 0 and variance $t - s$ whenever $s < t$.
(4) $W_t - W_s$ is independent of $\mathcal{F}_s$ whenever $s < t$.
(5) $W_t$ has continuous paths.

If instead of (2) we have $W_0 = x$, we say we have a Brownian motion started at $x$. Definition 2.1(4)
is referred to as the independent increments property of Brownian motion. The
fact that $W_t - W_s$ has the same law as $W_{t-s}$, which follows from Definition 2.1(3), is called
the stationary increments property. When no filtration is specified, we assume the filtration
is the filtration generated by $W$, i.e., $\mathcal{F}_t = \sigma(W_s;\ s \le t)$. Sometimes a one-dimensional
Brownian motion started at 0 is called a standard Brownian motion.

Figure 2.1 is a simulation of a typical Brownian motion path.

We define $d$-dimensional Brownian motion with respect to a filtration $\{\mathcal{F}_t\}$ and started at
$x = (x_1, \ldots, x_d)$ to be $(W^{(1)}_t, \ldots, W^{(d)}_t)$, where the $W^{(i)}$ are each one-dimensional Brownian
motions with respect to $\{\mathcal{F}_t\}$ started at $x_i$, respectively, and $W^{(1)}, \ldots, W^{(d)}$ are all
independent.

The law of a Brownian motion is called Wiener measure. More precisely, given a
Brownian motion $W$, we can view it as a random variable taking values in $C[0, \infty)$, the
space of real-valued continuous functions on $[0, \infty)$. The law of $W$ is the measure $\mathbb{P}_W$ on
$C[0, \infty)$ defined by $\mathbb{P}_W(A) = \mathbb{P}(W \in A)$ for all Borel subsets $A$ of $C[0, \infty)$. The measure
$\mathbb{P}_W$ is Wiener measure.

Figure 2.1 Simulation of a typical Brownian motion path.

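
A path like the one in Figure 2.1 can be approximated on a finite time grid by summing independent normal increments, as Definition 2.1(3) and (4) suggest. The following is a rough sketch only: it is not necessarily how the figure was produced, the function name `simulate_brownian_path` and its parameters are assumptions made for this illustration, and the result is a discrete skeleton of a path rather than a continuous function.

```python
import math
import random


def simulate_brownian_path(n_steps, t_max=1.0, seed=None):
    """Values of an approximate Brownian path at the times k * t_max / n_steps.

    Successive increments are independent N(0, dt) draws, and the path
    starts at 0.
    """
    rng = random.Random(seed)
    dt = t_max / n_steps
    times = [0.0]
    values = [0.0]
    for k in range(1, n_steps + 1):
        increment = rng.gauss(0.0, math.sqrt(dt))  # W_{k dt} - W_{(k-1) dt}
        times.append(k * dt)
        values.append(values[-1] + increment)
    return times, values


times, values = simulate_brownian_path(n_steps=1000, t_max=1.0, seed=1)
print(times[-1], values[-1])
```

Plotting `values` against `times` with any plotting tool, and joining consecutive points by line segments, gives a picture of the same general character as the figure.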
There are a number of transformations one can perform on a Brownian motion that yield
a new Brownian motion. The first one is called the scaling property of Brownian motion, or
simply scaling.
Proposition 2.2 If $W$ is a Brownian motion started at 0, $a > 0$, and $Y_t = aW_{t/a^2}$, then $Y_t$ is a
Brownian motion started at 0.

Proof We use $\mathcal{G}_t = \mathcal{F}_{t/a^2}$ for the filtration for $Y$. Clearly $Y_t$ has continuous paths, $Y_0 = 0$,
a.s., and $Y_t$ is $\mathcal{G}_t$ measurable. If $s < t$,

\[ Y_t - Y_s = a(W_{t/a^2} - W_{s/a^2}) \]

is independent of $\mathcal{F}_{s/a^2}$, hence is independent of $\mathcal{G}_s$. Finally, if $s < t$, then $Y_t - Y_s$
will be a normal random variable with mean zero and

\[ \operatorname{Var}(Y_t - Y_s) = a^2 \operatorname{Var}(W_{t/a^2} - W_{s/a^2}) = a^2 \Big( \frac{t}{a^2} - \frac{s}{a^2} \Big) = t - s. \]

This suffices to give our result.

For some other transformations, see Exercises 2.3 and 2.5.
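
The variance computation in the proof of Proposition 2.2 can be checked by simulation. This is only a sanity check under stated assumptions, not a proof: the values of $a$, $s$, $t$, the sample size, and the helper names `sample_brownian_pair`, `sample_scaled_increment`, and `sample_variance` are all choices made for the illustration.

```python
import math
import random


def sample_brownian_pair(u, v, rng):
    """One draw of (W_u, W_v) for 0 <= u <= v, built from independent increments."""
    w_u = rng.gauss(0.0, math.sqrt(u))
    w_v = w_u + rng.gauss(0.0, math.sqrt(v - u))
    return w_u, w_v


def sample_scaled_increment(a, s, t, rng):
    """One draw of Y_t - Y_s, where Y_r = a * W_{r / a^2} and s <= t."""
    w_u, w_v = sample_brownian_pair(s / a**2, t / a**2, rng)
    return a * w_v - a * w_u


def sample_variance(xs):
    """Unbiased sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


rng = random.Random(0)
a, s, t = 3.0, 0.5, 2.0
draws = [sample_scaled_increment(a, s, t, rng) for _ in range(20_000)]
print(sample_variance(draws), "should be close to", t - s)
```

The estimate should be near $t - s$ regardless of the value chosen for $a$, which is the content of the scaling property at the level of variances.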
Recall what it means for a finite collection of random variables to be jointly normal; see
(A.29). A stochastic process $X$ is Gaussian or jointly normal if all its finite-dimensional
distributions are jointly normal, that is, if for each $n \ge 1$ and $t_1 < \cdots < t_n$, the collection of
random variables $X_{t_1}, \ldots, X_{t_n}$ is a jointly normal collection.

Proposition 2.3 If $W$ is a Brownian motion, then $W$ is a Gaussian process.

Proof Suppose $W$ is a Brownian motion and let $0 = t_0 < t_1 < \cdots < t_n$. Define

\[ Z_i = \frac{W_{t_i} - W_{t_{i-1}}}{\sqrt{t_i - t_{i-1}}}, \qquad i = 1, 2, \ldots, n. \]

By Definition 2.1(4), $Z_i$ is independent of $\mathcal{F}_{t_{i-1}}$, and hence independent of $Z_1, \ldots, Z_{i-1}$. By
Definition 2.1(3), $Z_i$ is a mean-zero normal random variable with variance one. We can write

\[ W_{t_j} = \sum_{i=1}^{j} (t_i - t_{i-1})^{1/2} Z_i, \qquad j = 1, \ldots, n, \]

and so $(W_{t_1}, \ldots, W_{t_n})$ is jointly normal. It follows that Brownian motion is a Gaussian
process.

Since the law of a finite collection of jointly normal random variables is determined by
their means and covariances, let’s calculate the covariance of $W_s$ and $W_t$ when $W$ is a Brownian
motion. If $s \le t$, then

\[ t - s = \operatorname{Var}(W_t - W_s) = \operatorname{Var} W_t + \operatorname{Var} W_s - 2 \operatorname{Cov}(W_s, W_t) = t + s - 2 \operatorname{Cov}(W_s, W_t) \]

from Definition 2.1(2) and (3). Hence $\operatorname{Cov}(W_s, W_t) = s$ if $s \le t$. This is frequently written as

\[ \operatorname{Cov}(W_s, W_t) = s \wedge t. \tag{2.1} \]
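
Formula (2.1) can be compared with a simulation that builds $(W_{t_1}, \ldots, W_{t_n})$ from independent standard normals as a sum of scaled increments, in the spirit of the proof of Proposition 2.3. The sketch below is illustrative only: the time points, the sample size, and the names `sample_brownian_vector` and `estimate_covariance_matrix` are assumptions, and the agreement is statistical rather than exact.

```python
import math
import random


def sample_brownian_vector(times, rng):
    """One draw of (W_{t_1}, ..., W_{t_n}) for increasing times, with t_0 = 0."""
    values = []
    previous_time, current = 0.0, 0.0
    for t in times:
        z = rng.gauss(0.0, 1.0)                      # Z_i, standard normal
        current += math.sqrt(t - previous_time) * z  # add (t_i - t_{i-1})^{1/2} Z_i
        values.append(current)
        previous_time = t
    return values


def estimate_covariance_matrix(times, n_samples, seed=0):
    """Sample covariance matrix of (W_{t_1}, ..., W_{t_n})."""
    rng = random.Random(seed)
    draws = [sample_brownian_vector(times, rng) for _ in range(n_samples)]
    n = len(times)
    means = [sum(d[i] for d in draws) / n_samples for i in range(n)]
    return [
        [
            sum((d[i] - means[i]) * (d[j] - means[j]) for d in draws) / (n_samples - 1)
            for j in range(n)
        ]
        for i in range(n)
    ]


times = [0.5, 1.0, 2.0]
estimate = estimate_covariance_matrix(times, n_samples=20_000)
for i, s in enumerate(times):
    for j, t in enumerate(times):
        print(s, t, round(estimate[i][j], 3), "vs", min(s, t))
```

Each printed estimate should lie close to the minimum of the two times.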
We have the following converse.

Theorem 2.4 If $W$ is a process such that all the finite-dimensional distributions are jointly
normal, $\mathbb{E} W_s = 0$ for all $s$, $\operatorname{Cov}(W_s, W_t) = s$ when $s \le t$, and the paths of $W_t$ are continuous,
then $W$ is a Brownian motion.

Proof For $\mathcal{F}_t$ we take the filtration generated by $W$. If we take $s = t$, then $\operatorname{Var} W_t =
\operatorname{Cov}(W_t, W_t) = t$. In particular, $\operatorname{Var} W_0 = 0$, and since $\mathbb{E} W_0 = 0$, then $W_0 = 0$, a.s. We have

\[ \operatorname{Var}(W_t - W_s) = \operatorname{Var} W_t - 2 \operatorname{Cov}(W_s, W_t) + \operatorname{Var} W_s = t - 2s + s = t - s. \]

We have thus established all the parts of Definition 2.1 except for the independence of $W_t - W_s$
from $\mathcal{F}_s$.

If $r \le s < t$, then

\[ \operatorname{Cov}(W_t - W_s, W_r) = \operatorname{Cov}(W_t, W_r) - \operatorname{Cov}(W_s, W_r) = r - r = 0, \]

and so $W_t - W_s$ is independent of $W_r$ by Proposition A.55. This shows that $W_t - W_s$ is
independent of $\mathcal{F}_s$.
We now look at two results that are more technical. These should only be skimmed on the
first reading of the book: read the statements, but not the proofs. The first result says that
if W is a Brownian motion with respect to the filtration generated by W , then it is also a
Brownian motion with respect to the minimal augmented filtration.
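Proposition 2.5 Let $W_t$ be a Brownian motion with respect to $\{\mathcal{F}^{00}_t\}$, where $\mathcal{F}^{00}_t = \sigma(W_s; s \le t)$. Let $\mathcal{N}$ be the collection of null sets, $\mathcal{F}^0_t = \sigma(\mathcal{F}^{00}_t \cup \mathcal{N})$, and $\mathcal{F}_t = \cap_{\varepsilon > 0} \mathcal{F}^0_{t+\varepsilon}$.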

(1) W is a Brownian motion with respect to the filtration {Ft}.

(2) Ft = F 0t for each t.

Proof (1) The only property we need to check is Definition 2.1(4). If f is a continuous

bounded function on R, A ∈ F 00s , and s < t, then because W is a Brownian motion with
respect to {F 00t }, the independent increments property shows that
E [ f (Wt − Ws); A] = E [ f (Wt − Ws)] P(A). (2.2)
If A is such that A\B and B\A are null sets for some B ∈ F 00s , it is easy to see that (2.2)
continues to hold. By linearity, it also holds if A is a finite disjoint union of such sets. If C1
is the collection of subsets of F 0s that are finite disjoint unions of such sets, then C1 is an
algebra of subsets of F 0s . Let M1 be the collection of subsets of F 0s for which (2.2) holds. It
is readily checked that M1 is a monotone class. By the monotone class theorem (Theorem
B.2), M1 is equal to the smallest σ -field containing C1, which is F 0s . Therefore (2.2) holds
for all A ∈ F 0s .
Now suppose A ∈ Fs = F 0s+. Then for each ε > 0, A ∈ F 0s+ε, and so using (2.2) with s

replaced by s + ε and t replaced by t + ε, we have

E [ f (Wt+ε − Ws+ε ); A] = E [ f (Wt+ε − Ws+ε )] P(A). (2.3)

Letting ε → 0 and using the facts that f is bounded and continuous and W has continuous

paths, the dominated convergence theorem implies that

E [ f (Wt − Ws); A] = E [ f (Wt − Ws)] P(A). (2.4)

This equation holds whenever f is continuous and A ∈ Fs. By a limit argument, (2.4) holds

whenever f is the indicator of a Borel subset of R. That says that Wt − Ws and Fs are

independent.

(2) Fix t and choose t0 > t. Let M2 be the collection of subsets of F 00t0 whose conditional

expectation with respect to Ft is F 0t measurable, that is, A ∈ M2 if A ∈ F 00t0 and E [1A | Ft]

is F 0t measurable. Let C2 be the collection of events A for which there exist n ≥ 1, 0 ≤ s0 <
s1 < · · · < sn ≤ t0 with t equal to one of the si, and Borel subsets B1, . . . , Bn of R such
that
A = (Ws1 − Ws0 ∈ B1, . . . ,Wsn − Wsn−1 ∈ Bn).
Suppose A is of this form, and suppose t = si. Then by the independence result that we
proved in (1),
E [1A | Ft] = 1(Ws1 −Ws0 ∈B1,...,Wsi −Wsi−1 ∈Bi )
× P(Wsi+1 − Wsi ∈ Bi+1, . . . ,Wsn − Wsn−1 ∈ Bn),
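which is $\mathcal{F}^0_t$ measurable. Thus $\mathcal{C}_2 \subset \mathcal{M}_2$. Finite unions of sets in $\mathcal{C}_2$ form an algebra of
subsets of $\mathcal{F}^{00}_{t_0}$ that generate $\mathcal{F}^{00}_{t_0}$. It is easy to check that $\mathcal{M}_2$ is a monotone class, so by the
monotone class theorem, $\mathcal{M}_2$ equals $\mathcal{F}^{00}_{t_0}$. By linearity and taking monotone limits, if Y is
non-negative and $\mathcal{F}^{00}_{t_0}$ measurable, then $E[Y \mid \mathcal{F}_t]$ is $\mathcal{F}^0_t$ measurable.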
To finish, suppose A ∈ Ft . Then since t < t0, we see that A ∈ F 0t0 . By Exercise 2.7, there
exists Y ∈ F 00t0 such that 1A = Y , a.s. Then E [Y | Ft] is F 0t measurable. Since F 0t contains
all the null sets, 1A = E [1A | Ft] is also F 0t measurable, or A ∈ F 0t . This proves (2).
The final item we consider in this chapter is a subtle one. The question is this: if W and
W ′ are both Brownian motions, do they have all the same properties? To illustrate this issue,
let’s revisit the example of Chapter 1 where Ω = [0, 1], F is the Borel σ -field on [0, 1],
P is Lebesgue measure on [0, 1], X (t, ω) = 0 for all t and ω, and Y (t, ω) is 1 if t = ω
and 0 otherwise. For each t, P(Xt = Yt ) = 1, so X and Y have the same finite-dimensional
distributions. However, if
A = { f : f is not a continuous function on [0, 1]},
then (X ∈ A) is a null set but (Y ∈ A) is not. Even though X and Y have the same
finite-dimensional distributions, X has continuous paths but Y does not.
To rephrase our question, is it true that P(W ∈ A) = P(W ′ ∈ A) for every Borel subset
A of C[0, ∞)? We know W and W ′ have the same finite-dimensional distributions because
each is jointly normal with zero means and Cov (Ws,Wt ) = s ∧ t = Cov (W ′s ,W ′t ). The fact
that the answer to our question is yes then comes from the following theorem. We look at
C[0, t0] instead of C[0, ∞) for the sake of simplicity.
Theorem 2.6 Let t0 > 0 and let X ,Y be random variables taking values in C[0, t0] which

have the same finite-dimensional distributions. Then the laws of X and Y are equal.

Proof Let M be the collection of Borel subsets A of C[0, t0] for which P(X ∈ A) equals

P(Y ∈ A). We will show that M is a monotone class and then use the monotone class

theorem to show that M is equal to the Borel σ -field on C[0, t0].

First, let C be the collection of all cylindrical subsets of C[0, t0] (defined by Definition 1.1).

Since the finite-dimensional distributions of X and Y are equal, then M contains

C. It is easy to check that C is an algebra of subsets of C[0, t0]. If A1 ⊃ A2 ⊃ · · · are elements

of M, then

\[ P(X \in \cap_n A_n) = \lim_{n} P(X \in A_n) = \lim_{n} P(Y \in A_n) = P(Y \in \cap_n A_n) \]

since P is a finite measure. Therefore ∩nAn ∈ M. A very similar argument shows that if

A1 ⊂ A2 ⊂ · · · are elements of M, then ∪nAn ∈ M. Therefore M is a monotone class. By

the monotone class theorem, M contains the smallest σ -field containing C. We will show

that M contains all the open sets; then M will contain the smallest σ -field containing the

open sets, and we will be done.

Since C[0, t0] is separable, every open set is the countable union of open balls. Because

M is a σ -field, it suffices to show that M contains the open balls in C[0, t0], that is, all sets

of the form

\[ B(f_0, r) = \{ f \in C[0, t_0] : \sup_{0 \le t \le t_0} |f(t) - f_0(t)| < r \} \]
where r > 0 and f0 ∈ C[0, t0]. For each m and n,

\[ \{ f \in C[0, t_0] : \sup_{0 \le k \le 2^n t_0} |f(k/2^n) - f_0(k/2^n)| \le r - (1/m) \} \]


is a set in C, and so is in M. As n → ∞, these sets decrease to

\[ D_m = \{ f \in C[0, t_0] : \sup_{0 \le t \le t_0} |f(t) - f_0(t)| \le r - (1/m) \}, \]

since all the functions we are considering are continuous. Finally, Dm increases to B( f0, r)

as m → ∞, so B( f0, r) is in M as desired.

Exercises

2.1 Suppose W is a Brownian motion on [0, 1]. Let

Yt = W1−t − W1.

Show that Yt is a Brownian motion on [0, 1].

2.2 This exercise shows that the projection of a d-dimensional Brownian motion onto a hyperplane yields a one-dimensional Brownian motion. Suppose $(W^{(1)}_t, \ldots, W^{(d)}_t)$ is a d-dimensional Brownian motion started from 0 and $\lambda_1, \ldots, \lambda_d \in \mathbb{R}$ with $\sum_{i=1}^d \lambda_i^2 = 1$. Show that $X_t = \sum_{i=1}^d \lambda_i W^{(i)}_t$ is a one-dimensional Brownian motion started from 0.

2.3 This exercise shows that rotating a Brownian motion about the origin yields another Brownian

motion. Let W be a d-dimensional Brownian motion started at 0 and let A be a d × d orthogonal

matrix, that is, $A^{-1} = A^T$. Show that $Y_t = AW_t$ is again a d-dimensional Brownian motion.

2.4 Here is a converse to Exercise 2.2: roughly speaking, if all the projections of a d-dimensional

process X onto hyperplanes are one-dimensional Brownian motions, then X is a d-dimensional

Brownian motion.

Suppose $(X^1_t, \ldots, X^d_t)$ is a d-dimensional continuous process, i.e., one taking values in $\mathbb{R}^d$. Let $\{\mathcal{F}_t\}$ be the minimal augmented filtration generated by X. Suppose that whenever $\lambda_1, \ldots, \lambda_d \in \mathbb{R}$ with $\sum_{i=1}^d \lambda_i^2 = 1$, then $\sum_{i=1}^d \lambda_i X^i_t$ is a one-dimensional Brownian motion started at 0 with respect to the filtration $\{\mathcal{F}_t\}$.

(1) If $u = (u_1, \ldots, u_d)$, let $\|u\| = (\sum_j u_j^2)^{1/2}$ and let $\lambda_j = u_j/\|u\|$. Calculate
\[ E \exp\Big( i \sum_{j=1}^{d} u_j X^j_t \Big) = E \exp\Big( i \|u\| \sum_{j=1}^{d} \lambda_j X^j_t \Big), \]
the joint characteristic function of $X_t$.

(2) If $t_0 < t_1 < \cdots < t_n$, use independence and (1) to calculate
\[ E \exp\Big( i \sum_{k=0}^{n-1} \sum_{j=1}^{d} u^k_j \big( X^j_{t_{k+1}} - X^j_{t_k} \big) \Big). \]
(3) Prove that $(X^1_t, \ldots, X^d_t)$ is a d-dimensional Brownian motion started from 0.
(Some care is needed with the filtrations. If we only know that $Y^\lambda = \sum_i \lambda_i X^i$ is a Brownian
motion with respect to the filtration generated by Y λ for each λ = (λ1, . . . , λd ), the assertion is
not true. See Revuz and Yor (1999), Exercise I.1.19.)
2.5 Let Wt be a Brownian motion and suppose
\[ \lim_{t \to \infty} W_t/t = 0, \quad \text{a.s.} \tag{2.5} \]
Let $Z_t = tW_{1/t}$ if t > 0 and set $Z_0 = 0$. (This is called time inversion.) Show that Z is a Brownian

motion. (We will see later that the assumption (2.5) is superfluous; see Theorem 7.2.)


2.6 Let X and Y be two independent Brownian motions started at 0 and let t0 > 0. Let

\[ Z_t = \begin{cases} X_t, & t \le t_0, \\ X_{t_0} + Y_{t - t_0}, & t > t_0. \end{cases} \]

Prove that Z is also a Brownian motion.

2.7 Let $\mathcal{F}^{00}_t$ and $\mathcal{F}^0_t$ be defined as in (1.1) and (1.2). Prove that if X is $\mathcal{F}^0_t$ measurable, there exists Z such that Z is $\mathcal{F}^{00}_t$ measurable and X = Z, a.s.

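2.8 Let $\mathcal{F}^{00}_t$ and $\mathcal{F}^0_t$ be defined as in (1.1) and (1.2). The symmetric difference of two sets A and B is defined by $A \,\triangle\, B = (A \setminus B) \cup (B \setminus A)$. Prove that
\[ \mathcal{F}^0_t = \{ A \subset \Omega : A \,\triangle\, B \in \mathcal{N} \text{ for some } B \in \mathcal{F}^{00}_t \}. \]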

Notes

Brownian motion is named for Robert Brown, a botanist who observed the erratic motion

of colloidal particles in suspension in the 1820s. Brownian motion was used by Bachelier

in 1900 in his PhD thesis to model stock prices and was the subject of an important paper

by Einstein in 1905. The rigorous mathematical foundations for Brownian motion were first

given by Wiener in 1923.

3

Martingales

Although discrete-time martingales are useful in a first course on probability, they are nowhere

near as useful as continuous-time martingales are in the study of stochastic processes.

The whole theory of stochastic integrals and stochastic differential equations is based on

martingales indexed by times t ∈ [0, ∞). After giving the definition and some examples, we

extend Doob’s inequalities, the optional stopping theorem, and the martingale convergence

theorem to continuous-time martingales. We then derive some estimates for Brownian motion

using martingale techniques.

3.1 Definition and examples

We define continuous-time martingales. Let {Ft} be a filtration, not necessarily satisfying

the usual conditions.

Definition 3.1 Mt is a continuous-time martingale with respect to the filtration {Ft} and the

probability measure P if

(1) E |Mt | < ∞ for each t;
(2) Mt is Ft measurable for each t;
(3) E [Mt | Fs] = Ms, a.s., if s < t.
Part (2) of the definition can be rephrased as saying Mt is adapted to Ft . If in part (3) “=”
is replaced by “≥,” then Mt is a submartingale, and if it is replaced by “≤,” then we have a
supermartingale.
Taking expectations in Definition 3.1(3), we see that if s < t, then E Ms ≤ E Mt if M is
a submartingale and E Ms ≥ E Mt if M is a supermartingale. Thus submartingales tend to
increase, on average, and supermartingales tend to decrease, on average.
There are many martingales associated with Brownian motion. Here are three examples.
Example 3.2 Let Mt = Wt , where Wt is a Brownian motion. Then Mt is a martingale. To
verify Definition 3.1(3), we write
E [Mt | Fs] = Ms + E [Wt − Ws | Fs] = Ms + E [Wt − Ws] = Ms,
using the independent increments property of Brownian motion and the fact that
E [Wt − Ws] = 0.
Example 3.3 Let $M_t = W_t^2 - t$, where $W_t$ is a Brownian motion. To show $M_t$ is a martingale,
we write
\[ \begin{aligned}
E[M_t \mid \mathcal{F}_s] &= E[(W_t - W_s + W_s)^2 \mid \mathcal{F}_s] - t \\
&= W_s^2 + E[(W_t - W_s)^2 \mid \mathcal{F}_s] + 2E[W_s(W_t - W_s) \mid \mathcal{F}_s] - t \\
&= W_s^2 + E[(W_t - W_s)^2] + 2W_s E[W_t - W_s \mid \mathcal{F}_s] - t \\
&= W_s^2 + E[(W_t - W_s)^2] + 2W_s E[W_t - W_s] - t \\
&= W_s^2 + (t - s) - t = M_s.
\end{aligned} \]
We used the facts that $W_s$ is $\mathcal{F}_s$ measurable and that $W_t - W_s$ is independent of $\mathcal{F}_s$.
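Example 3.4 Again let $W_t$ be a Brownian motion, let $a \in \mathbb{R}$, and let $M_t = e^{aW_t - a^2 t/2}$. Since
$W_t - W_s$ is normal with mean zero and variance t − s, we know $E e^{a(W_t - W_s)} = e^{a^2(t-s)/2}$; see
(A.6). Then
\[ \begin{aligned}
E[M_t \mid \mathcal{F}_s] &= e^{-a^2 t/2} e^{aW_s} E[e^{a(W_t - W_s)} \mid \mathcal{F}_s] \\
&= e^{-a^2 t/2} e^{aW_s} E[e^{a(W_t - W_s)}] \\
&= e^{-a^2 t/2} e^{aW_s} e^{a^2(t-s)/2} = M_s.
\end{aligned} \]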
We give one more example of a martingale, although not one derived from Brownian
motion.
Example 3.5 Recall that given a filtration {Ft}, each Ft is contained in F , where (Ω,F , P)
is our probability space. Let X be an integrable F measurable random variable, and let
Mt = E [X | Ft]. Then
E [Mt | Fs] = E [E [X | Ft] | Fs] = E [X | Fs] = Ms,
and M is a martingale.
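A martingale has constant expectation, so the processes in Examples 3.2, 3.3, and 3.4 must satisfy E Mt = M0 for every t. The sketch below checks only this necessary condition by Monte Carlo; it is an illustration under assumed parameters (the function name, the value of a, the sample size, and the seed are not from the text), and it does not verify the full conditional expectation identity.

```python
import math
import random


def mean_of_martingales(t, a=0.5, n_samples=200_000, seed=1):
    """Monte Carlo estimates of E[W_t], E[W_t^2 - t], E[exp(a W_t - a^2 t / 2)]."""
    rng = random.Random(seed)
    sd = math.sqrt(t)
    m1 = m2 = m3 = 0.0
    for _ in range(n_samples):
        w = rng.gauss(0.0, sd)                    # W_t is normal with variance t
        m1 += w                                   # Example 3.2
        m2 += w * w - t                           # Example 3.3
        m3 += math.exp(a * w - a * a * t / 2.0)   # Example 3.4
    return m1 / n_samples, m2 / n_samples, m3 / n_samples
```

For any fixed t the three estimates should be close to the time-zero values 0, 0, and 1, up to sampling error.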
3.2 Doob’s inequalities
We derive the analogs of Doob’s inequalities in the stochastic process context.
Theorem 3.6 Suppose Mt is a martingale or non-negative submartingale with paths that
are right continuous with left limits. Then
(1)
\[ P(\sup_{s \le t} |M_s| \ge \lambda) \le E|M_t|/\lambda. \]
(2) If 1 < p < ∞, then
\[ E\big[\sup_{s \le t} |M_s|^p\big] \le \Big(\frac{p}{p-1}\Big)^p E|M_t|^p. \]
Proof We will do the case where Mt is a martingale, the submartingale case being nearly
identical. Let $D_n = \{kt/2^n : 0 \le k \le 2^n\}$. If we set $N^{(n)}_k = M_{kt/2^n}$ and $\mathcal{G}^{(n)}_k = \mathcal{F}_{kt/2^n}$, it is
clear that $\{N^{(n)}_k\}$ is a discrete-time martingale with respect to $\{\mathcal{G}^{(n)}_k\}$. Let
\[ A_n = \{ \sup_{s \le t,\, s \in D_n} |M_s| > \lambda \}. \]


By Doob’s inequality for discrete-time martingales (see Theorem A.32),

\[ P(A_n) = P(\max_{k \le 2^n} |N^{(n)}_k| > \lambda) \le \frac{E|N^{(n)}_{2^n}|}{\lambda} = \frac{E|M_t|}{\lambda}. \]

Note that the An are increasing, and since Mt is right continuous,

\[ \cup_n A_n = \{ \sup_{s \le t} |M_s| > \lambda \}. \]

Then

\[ P(\sup_{s \le t} |M_s| > \lambda) = P(\cup_n A_n) = \lim_{n \to \infty} P(A_n) \le E|M_t|/\lambda. \]

If we apply this with λ replaced by λ − ε and let ε → 0, we obtain (1).

The proof of (2) is similar. By Doob’s inequality for discrete-time martingales (see

Theorem A.33),

\[ E\big[\sup_{k \le 2^n} |N^{(n)}_k|^p\big] \le \Big(\frac{p}{p-1}\Big)^p E|N^{(n)}_{2^n}|^p = \Big(\frac{p}{p-1}\Big)^p E|M_t|^p. \]

Since $\sup_{k \le 2^n} |N^{(n)}_k|^p$ increases to $\sup_{s \le t} |M_s|^p$ by the right continuity of M, (2) follows by

Fatou’s lemma.
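As a numerical illustration of Theorem 3.6(2) with p = 2, the sketch below compares Monte Carlo estimates of the two sides for Brownian motion. It is a rough check under stated assumptions: the path is sampled on a finite grid (so the supremum is slightly underestimated), and the function name, grid size, path count, and seed are illustrative choices rather than anything from the text.

```python
import math
import random


def doob_l2_sides(n_steps=100, n_paths=2_000, t=1.0, seed=2):
    """Return estimates of (E[sup_{s<=t} |W_s|^2], 4 * E[|W_t|^2]) on a time grid."""
    rng = random.Random(seed)
    step_sd = math.sqrt(t / n_steps)   # standard deviation of one grid increment
    lhs = rhs = 0.0
    for _ in range(n_paths):
        w, running_max = 0.0, 0.0
        for _ in range(n_steps):
            w += rng.gauss(0.0, step_sd)
            running_max = max(running_max, abs(w))
        lhs += running_max ** 2        # sup over the grid of |W_s|^2
        rhs += 4.0 * w * w             # (p / (p - 1))^p = 4 when p = 2
    return lhs / n_paths, rhs / n_paths
```

The first returned value should not exceed the second; the constant 4 is not sharp for Brownian motion, so the gap is typically visible even with few paths.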

3.3 Stopping times

Throughout this section we suppose we have a filtration {Ft} satisfying the usual conditions.

Definition 3.7 A random variable T : Ω → [0, ∞] is a stopping time if for all t, (T < t) ∈
Ft . We say T is a finite stopping time if T < ∞, a.s. We say T is a bounded stopping time if
there exists K ∈ [0, ∞) such that T ≤ K, a.s.
Note that T can take the value infinity. Stopping times are also known as optional times.
Given a stochastic process X , we define XT (ω) to be equal to X (T (ω), ω); that is, for
each ω we evaluate t = T (ω) and then look at X (·, ω) at this time.
Proposition 3.8 Suppose Ft satisfies the usual conditions. Then
(1) T is a stopping time if and only if (T ≤ t) ∈ Ft for all t.
(2) If T = t, a.s., then T is a stopping time.
(3) If S and T are stopping times, then so are S ∨ T and S ∧ T .
(4) If Tn, n = 1, 2, . . . , are stopping times with T1 ≤ T2 ≤ · · · , then so is supn Tn.
(5) If Tn, n = 1, 2, . . . , are stopping times with T1 ≥ T2 ≥ · · · , then so is inf n Tn.
(6) If s ≥ 0 and S is a stopping time, then so is S + s.
Proof We will just prove part of (1), leaving the rest as Exercise 3.4. Note (T ≤ t) =
∩n≥N (T < t + 1/n) ∈ Ft+1/N for each N . Thus (T ≤ t) ∈ ∩NFt+1/N ⊂ Ft+ = Ft .
For a Borel measurable set A, let
TA = inf{t > 0 : Xt ∈ A}. (3.1)


Proposition 3.9 Suppose Ft satisfies the usual conditions and Xt has continuous paths.

(1) If A is open, then TA is a stopping time.

(2) If A is closed, then TA is a stopping time.

Proof (1) (TA < t) = ∩q∈Q+,q

This definition of FT , which is supposed to be the collection of events that are “known” by

time T , is not very intuitive. But it turns out that this definition works well in applications.

Exercise 3.6 gives an equivalent definition that is more appealing but not as useful.

Proposition 3.10 Suppose {Ft} is a filtration satisfying the usual conditions.

(1) FT is a σ -field.

(2) If S ≤ T , then FS ⊂ FT .

(3) If FT+ = ∩ε>0FT+ε, then FT + = FT .

(4) If Xt has right-continuous paths, then XT is FT measurable.

Proof If A ∈ FT , then Ac ∩ (T ≤ t) = (T ≤ t) \ [A ∩ (T ≤ t)] ∈ Ft , so Ac ∈ FT . The rest

of the proof of (1) is easy.

Suppose A ∈ FS and S ≤ T . Then A ∩ (T ≤ t) = [A ∩ (S ≤ t)] ∩ (T ≤ t). We have

A ∩ (S ≤ t) ∈ Ft because A ∈ FS , while (T ≤ t) ∈ Ft because T is a stopping time.

Therefore A ∩ (T ≤ t) ∈ Ft , which proves (2).

For (3), if A ∈ FT+, then A ∈ FT+ε for every ε, and so A ∩ (T + ε ≤ t) ∈ Ft for all t.

Hence A ∩ (T ≤ t − ε) ∈ Ft for all t, or equivalently A ∩ (T ≤ t) ∈ Ft+ε for all t. This is

true for all ε, so A ∩ (T ≤ t) ∈ Ft+ = Ft . This says A ∈ FT .

(4) Define Tn by (3.2). Note

(XTn ∈ B) ∩ (Tn = k/2n) = (Xk/2n ∈ B) ∩ (Tn = k/2n) ∈ Fk/2n .

Since Tn only takes values in {k/2n : k ≥ 0}, we conclude (XTn ∈ B) ∩ (Tn ≤ t) ∈ Ft and so

(XTn ∈ B) ∈ FTn ⊂ FT +1/2n .


Hence XTn is FT +1/2n measurable. If n ≥ m, then XTn is measurable with respect to FT +1/2n ⊂

FT+1/2m . Since XTn → XT , then XT is FT +1/2m measurable for each m. Therefore XT is

measurable with respect to FT+ = FT .

3.4 The optional stopping theorem

We will need Doob’s optional stopping theorem for continuous-time martingales. An example

to keep in mind is Mt = Wt∧t0 , where W is a Brownian motion and t0 is some fixed time.

Exercise 3.12 is a version of the optional stopping theorem with slightly weaker hypotheses that

is often useful.

Theorem 3.11 Let {Ft} be a filtration satisfying the usual conditions. If Mt is a martingale
or non-negative submartingale whose paths are right continuous, $\sup_{t \ge 0} E M_t^2 < \infty$, and
T is a finite stopping time, then $E M_T \ge E M_0$.
Proof We do the submartingale case, the martingale case being very similar. By Doob’s
inequality (Theorem 3.6(2) with p = 2),
\[ E\big[\sup_{s \le t} M_s^2\big] \le 4 E M_t^2. \]
Letting t → ∞, we have E [supt≥0 M2t ] < ∞ by Fatou’s lemma.
Let us first suppose that T < K, a.s., for some real number K. Define $T_n$ by (3.2). Let
$N^{(n)}_k = M_{k/2^n}$, $\mathcal{G}^{(n)}_k = \mathcal{F}_{k/2^n}$, and $S_n = 2^n T_n$. By Doob’s optional stopping theorem applied to
the submartingale $N^{(n)}_k$, we have
\[ E M_0 = E N^{(n)}_0 \le E N^{(n)}_{S_n} = E M_{T_n}. \]
Since M is right continuous, $M_{T_n} \to M_T$, a.s. The random variables $|M_{T_n}|$ are bounded by
$1 + \sup_{t \ge 0} M_t^2$, so by dominated convergence, $E M_{T_n} \to E M_T$.
We apply the above to the stopping time T ∧K to get E MT∧K ≥ E M0. The random variables
MT∧K are bounded by 1 + supt≥0 M2t , so by dominated convergence, we get E MT ≥ E M0
when we let K → ∞.
3.5 Convergence and regularity
We present the continuous-time version of Doob’s martingale convergence theorem. We will
see that not only do we get limits as t → ∞, but also a regularity result.
Let Dn = {k/2n : k ≥ 0}, D = ∪nDn.
Theorem 3.12 Let {Mt : t ∈ D} be either a martingale, a submartingale, or a supermartin-
gale with respect to {Ft : t ∈ D} and suppose supt∈D E |Mt | < ∞. Then
(1) limt→∞ Mt exists, a.s.
(2) With probability one Mt has left and right limits along D.
The second conclusion says that except for a null set, if t0 ∈ [0, ∞), then both limt∈D,t↑t0 Mt
and limt∈D,t↓t0 Mt exist and are finite. The null set does not depend on t0.
Proof Martingales are also submartingales and if Mt is a supermartingale, then −Mt is a
submartingale, so we may without loss of generality restrict our attention to submartingales.
By Doob’s inequality (Theorem 3.6(1)),
\[ P(\sup_{t \in D_n,\, t \le n} |M_t| > \lambda) \le \frac{1}{\lambda} E|M_n|. \]

Letting n → ∞ and using Fatou’s lemma,

\[ P(\sup_{t \in D} |M_t| > \lambda) \le \frac{1}{\lambda} \sup_{t} E|M_t|. \]

This is true for all λ, so with probability one, {|Mt | : t ∈ D} is a bounded set.

Therefore the only way either (1) or (2) can fail is if for some pair of rationals a < b the
number of upcrossings of [a, b] by {Mt : t ∈ D} is infinite. Recall that we define upcrossings
as follows.
Given an interval [a, b] and a submartingale M , if S1 = inf{t : Mt ≤ a}, Ti = inf{t > Si :

Mt ≥ b}, and Si+1 = inf{t > Ti : Mt ≤ a}, then the number of upcrossings up to time u is

sup{k : Tk ≤ u}.
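For a process observed at finitely many times, the definition above reduces to a single pass over the values. The following sketch counts completed upcrossings of [a, b] by a finite sequence; the function name and the list-based representation of the path are assumptions made for illustration.

```python
def count_upcrossings(path, a, b):
    """Number of completed upcrossings of [a, b] by a finite sequence of values."""
    if not a < b:
        raise ValueError("need a < b")
    count = 0
    below = False   # True once the path has been <= a since the last upcrossing
    for x in path:
        if not below:
            if x <= a:
                below = True    # a time S_i has occurred
        elif x >= b:
            count += 1          # a time T_i has occurred: one more upcrossing
            below = False
    return count
```

For example, the sequence 0, 2, 0, 2 completes two upcrossings of [0.5, 1.5], while a sequence that starts above b and never returns to a completes none.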

Doob’s upcrossing lemma (Theorem A.34) tells us that if Vn is the number of upcrossings

by {Mt : t ∈ Dn ∩ [0, n]}, then

\[ E V_n \le \frac{E|M_n|}{b - a}. \]

Letting n → ∞ and using Fatou’s lemma, the number of upcrossings of [a, b] by {Mt : t ∈ D}

has finite expectation, hence is finite, a.s. If Na,b is the null set where the number of upcrossings

of [a, b] by {Mt : t ∈ D} is infinite and N = ∪a**t,u→t
Mu.
It is clear that M̃ has paths that are right continuous with left limits. Since Ft+ = Ft and M̃t
is Ft+ measurable, then M̃t is Ft measurable.
Let N be fixed. We will show {Mt; t ≤ N} is a uniformly integrable family of random
variables; see Section A.4. Let ε > 0. Since MN is integrable, there exists δ such that
if P(A) < δ, then E [|MN |; A] < ε. If L is large enough, P(|Mt | > L) ≤ E |Mt |/L ≤
E |MN |/L < δ. Then
E [|Mt |; |Mt | > L] ≤ E [|MN |; |Mt | > L] < ε,
since |Mt | is a submartingale and (|Mt | > L) ∈ Ft . Uniform integrability is proved.**


Now let t < N. If $B \in \mathcal{F}_t$,
\[ E[\widetilde{M}_t; B] = \lim_{u \in D,\, u > t,\, u \to t} E[M_u; B] = E[M_t; B]. \]

Here we used the Vitali convergence theorem (Theorem A.19) and the fact that Mt is a

martingale. Since M̃t is Ft measurable, this proves that M̃t = Mt , a.s. Since N was arbitrary,

we have this for all t. We thus have found a version of M that has paths that are right

continuous with left limits. That M̃t is a martingale is easy.

The following technical result will be used several times in this book. A function f is

increasing if s < t implies f (s) ≤ f (t). A process At has increasing paths if the function
t → At (ω) is increasing for almost every ω.
Proposition 3.14 Suppose {Ft} is a filtration satisfying the usual conditions and suppose At
is an adapted process with paths that are increasing, are right continuous with left limits,
and A∞ = limt→∞ At exists, a.s. Suppose X is a non-negative integrable random variable,
and Mt is a version of the martingale E [X | Ft] which has paths that are right continuous
with left limits. Suppose E [X A∞] < ∞. Then
\[ E \int_0^\infty X \, dA_s = E \int_0^\infty M_s \, dA_s. \tag{3.4} \]
Proof First suppose X and A are bounded. Let n > 1 and write $E \int_0^\infty X \, dA_s$ as

\[ \sum_{k=1}^{\infty} E[X (A_{k/2^n} - A_{(k-1)/2^n})]. \]

Conditioning the kth summand on $\mathcal{F}_{k/2^n}$, this is equal to

\[ E\Big[ \sum_{k=1}^{\infty} E[X \mid \mathcal{F}_{k/2^n}] (A_{k/2^n} - A_{(k-1)/2^n}) \Big]. \]

Given s and n, define $s_n$ to be that value of $k/2^n$ such that $(k-1)/2^n < s \le k/2^n$. We then
have
\[ E \int_0^\infty X \, dA_s = E \int_0^\infty M_{s_n} \, dA_s. \tag{3.5} \]
For any value of s, sn ↓s as n → ∞, and since M has right-continuous paths, Msn → Ms. Since
X is bounded, so is M. By dominated convergence, the right-hand side of (3.5) converges to
\[ E \int_0^\infty M_s \, dA_s. \]
This completes the proof when X and A are bounded. We apply this to X ∧ N and A ∧ N , let
N → ∞, and use monotone convergence for the general case.
The only reason we assume X is non-negative is so that the integrals make sense. The
equation (3.4) can be rewritten as
\[ E \int_0^\infty X \, dA_s = E \int_0^\infty E[X \mid \mathcal{F}_s] \, dA_s. \tag{3.6} \]
We also have
\[ E \int_0^t X \, dA_s = E \int_0^t E[X \mid \mathcal{F}_s] \, dA_s \tag{3.7} \]
for each t. This follows either by following the above proof or by applying Proposition 3.14
to As∧t .
3.6 Some applications of martingales
The following estimates are very useful.
Proposition 3.15 If Wt is a Brownian motion, then
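\[ P(\sup_{s \le t} W_s \ge \lambda) \le e^{-\lambda^2/2t}, \qquad \lambda > 0, \tag{3.8} \]

and

\[ P(\sup_{s \le t} |W_s| \ge \lambda) \le 2e^{-\lambda^2/2t}, \qquad \lambda > 0. \tag{3.9} \]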

Proof For any a the process {eaWt } is a submartingale. To see this, since x → eax is convex,

the conditional expectation form of Jensen’s inequality (Proposition A.21) implies

\[ E[e^{aW_t} \mid \mathcal{F}_s] \ge e^{aE[W_t \mid \mathcal{F}_s]} = e^{aW_s}. \]

By Doob’s inequality (Theorem 3.6(1)),

\[ P(\sup_{s \le t} W_s \ge \lambda) = P(\sup_{s \le t} e^{aW_s} \ge e^{a\lambda}) \le \frac{E e^{aW_t}}{e^{a\lambda}}. \tag{3.10} \]

Since $E e^{aY} = e^{a^2 \operatorname{Var} Y/2}$ if Y is Gaussian with mean 0 by (A.6), it follows that the right side of (3.10) is bounded by $e^{-a\lambda} e^{a^2 t/2}$. If we now set a = λ/t, we obtain (3.8). Inequality (3.9) follows by applying (3.8) to W and to −W and adding.
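The bound (3.8) can be compared with a simulated estimate of the left-hand side. The sketch below is illustrative only: it samples Brownian paths on a finite grid (which slightly underestimates the supremum), and the function names, grid size, path count, and seed are assumptions rather than part of the text.

```python
import math
import random


def sup_tail_estimate(lam, t=1.0, n_steps=100, n_paths=2_000, seed=3):
    """Monte Carlo estimate of P(sup_{s<=t} W_s >= lam) on a time grid."""
    rng = random.Random(seed)
    step_sd = math.sqrt(t / n_steps)
    hits = 0
    for _ in range(n_paths):
        w = 0.0
        for _ in range(n_steps):
            w += rng.gauss(0.0, step_sd)
            if w >= lam:       # the path has reached level lam before time t
                hits += 1
                break
    return hits / n_paths


def exponential_bound(lam, t=1.0):
    """Right-hand side of (3.8): exp(-lam^2 / (2 t))."""
    return math.exp(-lam * lam / (2.0 * t))
```

For moderate values of λ the estimate should sit below `exponential_bound(lam)`, which is consistent with (3.8) being an upper bound rather than an exact formula.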

Let us use martingales to calculate some probabilities. Let us suppose a, b > 0 and set

T = inf{t > 0 : Wt = −a or Wt = b}, the first time Brownian motion exits the interval

[−a, b]. By Proposition 3.9, T is a stopping time.

We have

Proposition 3.16 Let W be a Brownian motion, let $T = \inf\{t > 0 : W_t \notin [-a, b]\}$, and let a, b > 0. Then

\[ P(W_T = -a) = \frac{b}{a + b}, \qquad P(W_T = b) = \frac{a}{a + b}, \tag{3.11} \]

and

\[ E\,T = ab. \tag{3.12} \]

Proof Since $W_t^2 - t$ is a martingale with $W_0 = 0$, it is easy to check that for each u, $W_{t \wedge u}^2 - (t \wedge u)$ is also a martingale. Applying Theorem 3.11, we see that $E W_{u \wedge T}^2 = E[u \wedge T]$.

As u → ∞, the right-hand side tends to E T by monotone convergence. $|W_{u \wedge T}|^2$ is bounded


by $(a + b)^2$, so by dominated convergence the left-hand side tends to $E W_T^2 \le (a + b)^2$ as u → ∞. Therefore

\[ E\,T = E W_T^2. \tag{3.13} \]

In particular, E T < ∞, so we know T < ∞, a.s.
We use that T is finite, a.s., to conclude that P(WT ∈ {−a, b}) = 1, or
1 = P(WT = −a) + P(WT = b). (3.14)
Since Wt is a martingale, then so is Wt∧u for each u, and therefore EWu∧T = 0. Letting
u → ∞ and using dominated convergence (noting |Wu∧T | is bounded by a + b), we have
EWT = 0, or
0 = (−a)P(WT = −a) + bP(WT = b). (3.15)
We get (3.11) by solving (3.14) and (3.15) for the unknowns P(WT = −a) and P(WT = b).
We get (3.12) by (3.13), writing
\[ E\,T = E W_T^2 = (-a)^2 P(W_T = -a) + b^2 P(W_T = b), \]
and substituting the values from (3.11).
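The formulas (3.11) and (3.12) have an exact discrete analogue that is easy to compute. The sketch below is not the Brownian motion argument above: it assumes a simple symmetric random walk on the integers, started at 0 and stopped at −a or b for positive integers a and b, and solves the one-step equations for the exit probabilities and the expected exit time with exact rational arithmetic. The function name and the shooting approach are illustrative choices.

```python
from fractions import Fraction


def exit_quantities(a, b):
    """Simple symmetric random walk from 0, stopped on reaching -a or b.

    Returns (P(exit at -a), P(exit at b), E[exit time]) as exact fractions.
    """
    n = a + b   # number of unit steps between the two barriers

    def shoot(first, source):
        # Values g(-a), ..., g(b) of the solution of
        # g(x+1) = 2 g(x) - g(x-1) - source with g(-a) = 0, g(-a+1) = first.
        values = [Fraction(0), Fraction(first)]
        for _ in range(n - 1):
            values.append(2 * values[-1] - values[-2] - source)
        return values

    # h(x) = P_x(exit at b): h(x) = (h(x-1) + h(x+1)) / 2, h(-a) = 0, h(b) = 1.
    h = shoot(1, 0)
    h = [v / h[-1] for v in h]

    # g(x) = E_x[exit time]: g(x) = 1 + (g(x-1) + g(x+1)) / 2, g(-a) = g(b) = 0.
    g0, g1 = shoot(0, 2), shoot(1, 2)
    slope = -g0[-1] / (g1[-1] - g0[-1])   # pick g(-a+1) so that g(b) = 0
    g = [u + slope * (v - u) for u, v in zip(g0, g1)]

    start = a   # list index of the starting point 0
    return 1 - h[start], h[start], g[start]
```

With a = 2 and b = 3 the walk exits at −2 with probability 3/5, at 3 with probability 2/5, and takes 6 steps on average, matching b/(a + b), a/(a + b), and ab.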
In proving Proposition 3.16, we used the fact that Wt∧T is a martingale and P(T < ∞) = 1.
The same proof shows
Corollary 3.17 Suppose Mt is a martingale with continuous paths and with M0 = 0, a.s.,
T = inf{t ≥ 0 : Mt /∈ [−a, b]}, and T < ∞, a.s. Then
\[ P(M_T = -a) = \frac{b}{a + b}, \qquad P(M_T = b) = \frac{a}{a + b}. \]
We can also use martingales to get more subtle results. Suppose r > 0. Since $e^{rW_t - r^2 t/2}$ is a martingale, as above

\[ E e^{rW_{T \wedge t} - r^2 (T \wedge t)/2} = 1. \]

The exponent is bounded by rb if r > 0, so we can let t → ∞ and use dominated convergence to get

\[ E e^{rW_T - r^2 T/2} = 1. \]

This can be written as

\[ e^{-ra} E[e^{-r^2 T/2}; W_T = -a] + e^{rb} E[e^{-r^2 T/2}; W_T = b] = 1. \]

Since $e^{-rW_t - r^2 t/2}$ is also a martingale, similar reasoning gives us

\[ e^{ra} E[e^{-r^2 T/2}; W_T = -a] + e^{-rb} E[e^{-r^2 T/2}; W_T = b] = 1. \]

We can solve those two equations to obtain

\[ E\big[ e^{-r^2 T/2}; W_T = -a \big] = \frac{e^{rb} - e^{-rb}}{e^{r(a+b)} - e^{-r(a+b)}} \tag{3.16} \]

and

\[ E\big[ e^{-r^2 T/2}; W_T = b \big] = \frac{e^{ra} - e^{-ra}}{e^{r(a+b)} - e^{-r(a+b)}}. \tag{3.17} \]


The left-hand sides of (3.16) and (3.17) are the Laplace transforms of the quantities P(T ∈

dt;WT = −a)/dt and P(T ∈ dt;WT = b)/dt, respectively, and finding the inverse Laplace

transforms of the right-hand sides of (3.16) and (3.17) gives us formulas for P(T ∈ dt;WT =

−a)/dt and P(T ∈ dt;WT = b)/dt. If we add the two formulas, we get an expression for

P(T ∈ dt)/dt, and integrating over t from 0 to t0 gives an expression for P(T ≤ t0).

We sketch how to invert the Laplace transform and leave the detailed calculations and

justification for inverting a Laplace transform term by term to the interested reader. See also

Karatzas and Shreve (1991), Section 2.8. The right-hand side of (3.16) is equal to

\[ \frac{e^{-ra} - e^{-ra - 2rb}}{1 - e^{-2r(a+b)}}. \]

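Since $e^{-2r(a+b)} < 1$, we can use
\[ (1 - x)^{-1} = \sum_{n=0}^{\infty} x^n \]
to expand the denominator as a power series; if we set $\lambda = r^2/2$, then
\[ E\big[ e^{-\lambda T}; W_T = -a \big] = \sum_{n=0}^{\infty} \Big( e^{-(2n+1)\sqrt{2\lambda}\,a - 2n\sqrt{2\lambda}\,b} - e^{-(2n+1)\sqrt{2\lambda}\,a - (2n+2)\sqrt{2\lambda}\,b} \Big). \tag{3.18} \]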
We then use the fact that the Laplace transform of
\[ \frac{k}{2\sqrt{\pi t^3}} e^{-k^2/4t} \]
is $e^{-k\sqrt{\lambda}}$ to find the inverse Laplace transform of the right-hand side of (3.18) by inverting
term by term.
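The agreement between the closed form on the right-hand side of (3.16) and the series in (3.18) can be checked numerically for particular values. The sketch below truncates the series after a fixed number of terms; the function names and the truncation length are illustrative assumptions.

```python
import math


def closed_form(r, a, b):
    """Right-hand side of (3.16)."""
    return (math.exp(r * b) - math.exp(-r * b)) / (
        math.exp(r * (a + b)) - math.exp(-r * (a + b))
    )


def series_form(lam, a, b, n_terms=50):
    """Partial sum of the series in (3.18), with sqrt(2 * lam) playing the role of r."""
    root = math.sqrt(2.0 * lam)
    total = 0.0
    for n in range(n_terms):
        total += math.exp(-(2 * n + 1) * root * a - 2 * n * root * b)
        total -= math.exp(-(2 * n + 1) * root * a - (2 * n + 2) * root * b)
    return total
```

With r = 0.8 (so λ = 0.32), a = 1, and b = 2, the two functions should agree to within floating-point rounding, since successive terms of the series shrink by the factor $e^{-2r(a+b)}$.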
Similarly (see Exercises 3.15 and 3.16), if b > 0, W is a Brownian motion, and $S = \inf\{t > 0 : W_t = b\}$, then $E e^{-\lambda S} = e^{-\sqrt{2\lambda}\,b}$. Inverting the Laplace transform,

\[ P(S \in dt) = \frac{b}{\sqrt{2\pi t^3}} e^{-b^2/2t}, \qquad t \ge 0. \tag{3.19} \]
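One way to gain confidence in (3.19) is to integrate the density against $e^{-\lambda t}$ numerically and compare with $e^{-\sqrt{2\lambda}\,b}$. The sketch below uses a plain midpoint rule on a truncated interval; the function names, the truncation point, and the step size are assumptions chosen for illustration, not a recommended quadrature scheme.

```python
import math


def first_passage_density(t, b):
    """Density in (3.19): b / sqrt(2 pi t^3) * exp(-b^2 / (2 t)), for t > 0."""
    return b / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-b * b / (2.0 * t))


def laplace_transform_numeric(lam, b, t_max=60.0, h=1e-3):
    """Midpoint-rule approximation of the integral over (0, t_max) of exp(-lam t) * density."""
    n = int(t_max / h)
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h   # midpoint of the k-th subinterval, never zero
        total += math.exp(-lam * t) * first_passage_density(t, b)
    return total * h
```

For λ = 1 and b = 1 the numerical value should be close to `math.exp(-math.sqrt(2.0))`; the neglected tail beyond `t_max` is of order $e^{-\lambda t_{\max}}$.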

Exercises

3.1 If W is a Brownian motion, show that

\[ W_t^3 - 3 \int_0^t W_s \, ds \]

is a martingale.

3.2 Suppose {Ft} is a filtration satisfying the usual conditions. Show that if Mt is a submartingale

and E Mt = E M0 for all t, then M is a martingale.

3.3 Let X be a submartingale. Show that $\sup_{t \ge 0} E|X_t| < \infty$ if and only if $\sup_{t \ge 0} E X_t^+ < \infty$.
3.4 Prove all parts of Proposition 3.8.
3.5 If Tn is defined by (3.2), show Tn is a stopping time for each n and Tn ↓ T .
3.6 This exercise gives an alternate definition of FT which is more appealing, but not as useful.
Suppose that {Ft} satisfies the usual conditions. Show that FT is equal to the σ -field generated
by the collection of random variables YT such that Y is a bounded process with paths that are
right continuous with left limits and Y is adapted to the filtration {Ft}.
3.7 Suppose {Ft} is a filtration satisfying the usual conditions. Show that if T is a stopping time,
then T is FT measurable.
3.8 Suppose {Ft} is a filtration satisfying the usual conditions and T is a stopping time. Show that
if S is a FT measurable random variable with S ≥ T , then S is a stopping time.
3.9 This exercise demonstrates that the conclusion of Corollary 3.13 cannot be extended to sub-
martingales. Find a filtration {Ft} satisfying the usual conditions and a submartingale X with
respect to {Ft} such that X does not have a version with paths that are right continuous with left
limits.
3.10 Suppose {Ft} is a filtration satisfying the usual conditions. Show that if S and T are stopping
times and X is a bounded F∞ measurable random variable, then
E [E [X | FS] | FT ] = E [X | FS∧T ].
Hint: Let Yt = E [X | Ft ] and Zt = Yt∧S . Show the left-hand side is equal to YS∧T .
3.11 A martingale or submartingale Mt is uniformly integrable if the family {Mt : t ≥ 0} is a uniformly
integrable family of random variables. Show that if Mt is a uniformly integrable martingale with
paths that are right continuous with left limits, then {MT ; T a finite stopping time} is a uniformly
integrable family of random variables. Show this also holds if Mt is a non-negative submartingale
with paths that are right continuous with left limits.
3.12 This exercise weakens the conditions on the optional stopping theorem. Show that if Mt is a
uniformly integrable martingale that is right continuous with left limits and T is a finite stopping
time, then E MT = E M0.
3.13 Let W be a Brownian motion and let T be a stopping time with E T < ∞. Prove that EWT = 0
and EW 2T = E T . This is not an easy application of the optional stopping theorem because we
do not know that Wt∧T is necessarily a uniformly integrable martingale.
3.14 Suppose that $(W^1_t, \ldots, W^d_t)$ is a d-dimensional Brownian motion. Show that if $i \ne j$, then
$W^i_t W^j_t$ is a martingale.
3.15 Let Wt be a Brownian motion, b > 0, and T = inf{t > 0 : Wt = b}. Show T < ∞, a.s. Show
E T = ∞.
Hint: Take a limit in (3.11).
3.16 Suppose W is a Brownian motion and b > 0. If S = inf{t > 0 : Wt = b}, show that the Laplace

transform of the density of S is given by

\[ E e^{-\lambda S} = e^{-\sqrt{2\lambda}\,b}. \]

3.17 Let Wt be a Brownian motion. Show that if α > 1/2, then

\[ \lim_{t \to \infty} \frac{W_t}{t^{\alpha}} = 0, \quad \text{a.s.} \]


Hint: Let α0 ∈ (1/2, α), estimate

\[ P\big( \sup_{2^n \le s \le 2^{n+1}} |W_s| \ge (2^n)^{\alpha_0} \big) \]

using (3.9), and then use the Borel–Cantelli lemma.

3.18 Let Wt be a one-dimensional Brownian motion and α ∈ (0, 1/2]. Prove that

\[ \limsup_{t \to \infty} \frac{|W_t|}{t^{\alpha}} > 0, \quad \text{a.s.} \]

3.19 If W is a Brownian motion and b is a constant, then the process Xt = Wt + bt is a Brownian

motion with drift. Prove that if b > 0, then

\[ \lim_{t \to \infty} X_t = \infty, \quad \text{a.s.} \]

4

Markov properties of Brownian motion

In later chapters we will discuss extensively the Markov property and strong Markov property.

The Brownian motion case is much simpler, and we do that now.

4.1 Markov properties

Let us begin with the Markov property.

Theorem 4.1 Let {Ft} be a filtration, not necessarily satisfying the usual conditions, and

let W be a Brownian motion with respect to {Ft}. If u is a fixed time, then Yt = Wt+u − Wu is

a Brownian motion independent of Fu.

Proof Let Gt = Ft+u. It is clear that Y has continuous paths, is zero at time 0, and is

adapted to {Gt}. Since Yt − Ys = Wt+u − Ws+u, then Yt − Ys is a mean zero normal random

variable with variance (t + u) − (s + u) = t − s that is independent of Fs+u = Gs.

The strong Markov property is the Markov property extended by replacing fixed times u

by finite stopping times.

Theorem 4.2 Let $\{\mathcal{F}_t\}$ be a filtration, not necessarily satisfying the usual conditions, and let
W be a Brownian motion adapted to $\{\mathcal{F}_t\}$. If T is a finite stopping time, then $Y_t = W_{T+t} - W_T$
is a Brownian motion independent of $\mathcal{F}_T$.

Proof We will first show that whenever $m \ge 1$, $t_1 < \cdots < t_m$, f is a bounded continuous
function on $\mathbb{R}^m$, and $A \in \mathcal{F}_T$, then
$$\mathbb{E}[f(Y_{t_1}, \ldots, Y_{t_m}); A] = \mathbb{E}[f(W_{t_1}, \ldots, W_{t_m})]\, \mathbb{P}(A). \tag{4.1}$$
Once we have done this, we will then show how (4.1) implies our theorem.
To prove (4.1), define $T_n$ by (3.2). We have
$$\begin{aligned}
\mathbb{E}[f(W_{T_n+t_1} - W_{T_n}, \ldots, W_{T_n+t_m} - W_{T_n}); A]
&= \sum_{k=1}^{\infty} \mathbb{E}[f(W_{T_n+t_1} - W_{T_n}, \ldots, W_{T_n+t_m} - W_{T_n}); A, T_n = k/2^n] \\
&= \sum_{k=1}^{\infty} \mathbb{E}[f(W_{t_1+k/2^n} - W_{k/2^n}, \ldots, W_{t_m+k/2^n} - W_{k/2^n}); A, T_n = k/2^n].
\end{aligned} \tag{4.2}$$
Following the usual practice in probability that “,” means “and,” we use the notation
“E [· · · ; A, Tn = k/2n]” as an abbreviation for “E [· · · ; A ∩ (Tn = k/2n)].” Since A ∈ FT ,
then $A \cap (T_n = k/2^n) = A \cap ((T < k/2^n) \setminus (T < (k-1)/2^n)) \in \mathcal{F}_{k/2^n}$. We use the
independent increments property of Brownian motion and the fact that $W_t - W_s$ has the same
law as $W_{t-s}$ to see that the sum in the last line of (4.2) is equal to
$$\begin{aligned}
\sum_{k=1}^{\infty} \mathbb{E}[f(W_{t_1+k/2^n} - W_{k/2^n}, \ldots, W_{t_m+k/2^n} - W_{k/2^n})]\, \mathbb{P}(A, T_n = k/2^n)
&= \sum_{k=1}^{\infty} \mathbb{E}[f(W_{t_1}, \ldots, W_{t_m})]\, \mathbb{P}(A, T_n = k/2^n) \\
&= \mathbb{E}[f(W_{t_1}, \ldots, W_{t_m})]\, \mathbb{P}(A),
\end{aligned}$$
which is the right-hand side of (4.1). Thus
$$\mathbb{E}[f(W_{T_n+t_1} - W_{T_n}, \ldots, W_{T_n+t_m} - W_{T_n}); A] = \mathbb{E}[f(W_{t_1}, \ldots, W_{t_m})]\, \mathbb{P}(A). \tag{4.3}$$
Now let n → ∞. By the right continuity of the paths of W, the boundedness and continuity
of f , and the dominated convergence theorem, the left-hand side of (4.3) converges to the
left-hand side of (4.1).
If we take $A = \Omega$ in (4.1), we obtain
$$\mathbb{E}[f(Y_{t_1}, \ldots, Y_{t_m})] = \mathbb{E}[f(W_{t_1}, \ldots, W_{t_m})]$$
whenever $m \ge 1$, $t_1, \ldots, t_m \in [0, \infty)$, and f is a bounded continuous function on $\mathbb{R}^m$.
This implies that the finite-dimensional distributions of Y and W are the same. Since Y has
continuous paths, Y is a Brownian motion.
Next take $A \in \mathcal{F}_T$. By using a limit argument, (4.1) holds whenever f is the indicator of
a Borel subset B of $\mathbb{R}^m$, or in other words,
$$\mathbb{P}(Y \in B, A) = \mathbb{P}(Y \in B)\, \mathbb{P}(A) \tag{4.4}$$
whenever B is a cylindrical set. Let M be the collection of all Borel subsets B of C[0, ∞)
for which (4.4) holds. Let C be the collection of all cylindrical subsets of C[0, ∞). Then we
observe that M is a monotone class containing C and C is an algebra of subsets of C[0, ∞)
generating the Borel σ -field of C[0, ∞). By the monotone class theorem (Theorem B.2),
M is equal to the Borel σ -field on C[0, ∞), and since (4.4) holds for all sets B ∈ M, this
establishes the independence of Y and FT .
In the future, we will not put in the details for the arguments using the monotone class
theorem.
Observe that what is needed for the above proof to work is not that W be a Brownian
motion, but that the process W have right continuous paths and that Wt −Ws be independent
of Fs and have the same distribution as Wt−s. We therefore have the following corollary.
Corollary 4.3 Let $\{\mathcal{F}_t\}$ be a filtration, not necessarily satisfying the usual conditions, and
let X be a process adapted to $\{\mathcal{F}_t\}$. Suppose X has paths that are right continuous with left
limits and suppose $X_t - X_s$ is independent of $\mathcal{F}_s$ and has the same law as $X_{t-s}$ whenever $s < t$.
If T is a finite stopping time, then $Y_t = X_{T+t} - X_T$ is a process that is independent of $\mathcal{F}_T$ and
X and Y have the same law.
Figure 4.1 The reflection principle.
4.2 Applications
The first application is known as the reflection principle and allows us to get control of the
maximum of a Brownian motion. The idea is the following. Suppose that Wt is a Brownian
motion and for some path, the Brownian motion goes above a level b before time t but that
at time t the value of Wt is less than x, where x < b. We could take the graph of this path
and reflect it across the horizontal line at level b the first time the path crosses the level b
(Figure 4.1). This will give us a new path that ends up above 2b−x. Thus there is a one-to-one
correspondence between paths where the maximum up to time t is above b and Wt is below
x and the paths where Wt is above 2b − x.
More precisely, we have the following.
Theorem 4.4 Let $W_t$ be a Brownian motion, $b > 0$, $T = \inf\{t : W_t \ge b\}$, and $x < b$.
Then
$$\mathbb{P}\Big(\sup_{s \le t} W_s \ge b,\ W_t < x\Big) = \mathbb{P}(W_t > 2b - x). \tag{4.5}$$

Proof Let $T_n$ be defined by (3.2). We first show that
$$\mathbb{P}(T_n \le t,\ W_t - W_{T_n} < x - b) = \mathbb{P}(T_n \le t,\ W_t - W_{T_n} > b - x). \tag{4.6}$$

Writing [x] for the integer part of x, the left-hand side of (4.6) is equal to
$$\begin{aligned}
\sum_{k=0}^{[2^n t]} \mathbb{P}(T_n = k/2^n,\ W_t - W_{T_n} < x - b)
&= \sum_{k=0}^{[2^n t]} \mathbb{P}(T_n = k/2^n,\ W_t - W_{k/2^n} < x - b) \\
&= \sum_{k=0}^{[2^n t]} \mathbb{P}(T_n = k/2^n)\, \mathbb{P}(W_t - W_{k/2^n} < x - b),
\end{aligned}$$
using the independent increments property of Brownian motion and the fact that we have
$(T_n = k/2^n) \in \mathcal{F}_{k/2^n}$. Using the symmetry of the normal distribution, that is, that $W_t - W_s$
and $W_s - W_t$ have the same law, this is the same as
$$\sum_{k=0}^{[2^n t]} \mathbb{P}(T_n = k/2^n)\, \mathbb{P}(W_t - W_{k/2^n} > b - x),$$
and reversing the steps above, this equals the right-hand side of (4.6).

Since W has continuous paths, $W_T = b$, so $(T = t) \subset (W_t = b)$. Because $W_t$ is a normal
random variable, then $\mathbb{P}(T = t) = 0$. Also, $\mathbb{P}(W_t - W_T = b - x)$ and $\mathbb{P}(W_t - W_T = x - b)$
are both zero. If we now let $n \to \infty$ in (4.6), we obtain
$$\mathbb{P}(T \le t,\ W_t - W_T < x - b) = \mathbb{P}(T \le t,\ W_t - W_T > b - x).$$
Since $W_T = b$, this is the same as
$$\mathbb{P}(T \le t,\ W_t < x) = \mathbb{P}(T \le t,\ W_t > 2b - x). \tag{4.7}$$
By the definition of T and the continuity of the paths of W, the left-hand side is equal to the
left-hand side of (4.5). If $W_t > 2b - x$, then automatically $T \le t$, so the right-hand side of
(4.7) is equal to the right-hand side of (4.5).
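
The reflection argument has an exact discrete counterpart that can be checked by brute force. The sketch below is an illustration only and is not part of the proof: it replaces Brownian motion by a simple symmetric random walk (an assumption made here so that every path can be enumerated), takes b and x to be integers with x < b, and uses the hypothetical helper name `reflection_counts`. It counts the paths on each side of the random walk version of (4.5).

```python
from itertools import product


def reflection_counts(n, b, x):
    """Count n-step +/-1 paths for both sides of the discrete analogue of (4.5)."""
    left = 0   # paths whose running maximum is >= b and whose endpoint is < x
    right = 0  # paths whose endpoint is > 2b - x
    for steps in product((-1, 1), repeat=n):
        position = 0
        maximum = 0
        for step in steps:
            position += step
            maximum = max(maximum, position)
        if maximum >= b and position < x:
            left += 1
        if position > 2 * b - x:
            right += 1
    return left, right


left, right = reflection_counts(n=12, b=3, x=1)
print(left, right, left == right)
```

Reflecting the portion of a path after its first visit to level b maps the first family of paths one-to-one onto the second, which is why the two counts agree for the walk.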

Our second application will be useful when studying local time in Chapter 14.

Proposition 4.5 Let $W_t$ be a Brownian motion with respect to a filtration $\{\mathcal{F}_t\}$ satisfying the
usual conditions. Let T be a finite stopping time and $s > 0$. If $a < b$, then
$$\mathbb{P}(W_{T+s} \in [a, b] \mid \mathcal{F}_T) \le \frac{|b - a|}{\sqrt{2\pi s}}.$$
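Proof If $A \in \mathcal{F}_T$, let $k > 0$ and write
$$\begin{aligned}
\mathbb{P}(W_{T+s} \in [a, b], A)
&= \sum_{j=-\infty}^{\infty} \mathbb{P}(W_{T+s} \in [a, b],\ A,\ j/k \le W_T < (j+1)/k) \\
&\le \sum_{j=-\infty}^{\infty} \mathbb{P}(W_{T+s} - W_T \in [a - (j+1)/k,\ b - j/k],\ A,\ j/k \le W_T \le (j+1)/k).
\end{aligned}$$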
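Using the fact that $W_{T+s} - W_T$ is a Brownian motion independent of $\mathcal{F}_T$, this is less than or
equal to
$$\begin{aligned}
\sum_{j=-\infty}^{\infty} &\mathbb{P}(W_s \in [a - (j+1)/k,\ b - j/k])\, \mathbb{P}(A,\ j/k \le W_T \le (j+1)/k) \\
&\le \sum_{j=-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, \frac{b - a + 1/k}{\sqrt{s}}\, \mathbb{P}(A,\ j/k \le W_T \le (j+1)/k) \\
&\le \frac{1}{\sqrt{2\pi}}\, \frac{b - a + 1/k}{\sqrt{s}}\, \mathbb{P}(A).
\end{aligned}$$
We used here the formula for the density of a normal random variable with mean zero and
variance s. This is true for all k, so letting $k \to \infty$ yields our result.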
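
The density estimate used in the proof, namely that a mean zero normal random variable with variance s puts mass at most $(b-a)/\sqrt{2\pi s}$ on $[a, b]$, can be checked numerically. The following is a minimal sketch; the function names and the sample values of a, b, and s are arbitrary choices made here for illustration.

```python
import math


def normal_interval_prob(a, b, s):
    """P(W_s in [a, b]) for a mean zero normal random variable W_s with variance s."""
    def cdf(z):
        # Standard normal distribution function, written with the error function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf(b / math.sqrt(s)) - cdf(a / math.sqrt(s))


def density_bound(a, b, s):
    """Interval length times the maximum 1/sqrt(2 pi s) of the normal density."""
    return (b - a) / math.sqrt(2.0 * math.pi * s)


for a, b, s in [(-0.1, 0.1, 1.0), (0.5, 0.7, 0.25), (-2.0, 3.0, 4.0)]:
    print(a, b, s, normal_interval_prob(a, b, s), density_bound(a, b, s))
```

The bound is nearly attained for short intervals centred at 0 and is crude for long ones, which is all the proof needs.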
Exercises
4.1 If W is a Brownian motion, let $S_t = \sup_{s \le t} W_s$. Find the density for $S_t$.
4.2 With W and S as in Exercise 4.1, find the joint density of $(S_t, W_t)$.
4.3 Let W be a Brownian motion started at $a > 0$ and let $T_0$ be the first time W hits 0. Find the law
of $\sup_{t \le T_0} W_t$.

4.4 Use the reflection principle to prove that if W is a Brownian motion and T = inf{t > 0 : Wt ∈

(0,∞)}, then

P(T = 0) = 1.

In other words, Brownian motion enters the interval (0,∞) immediately. By symmetry it enters

the interval (−∞, 0) immediately. Conclude that Brownian motion hits 0 infinitely often in

every time interval [0, t].

4.5 Let $W_t$ be a Brownian motion and $\{\mathcal{F}_t\}$ be the minimal augmented filtration generated by W. Let
$$T = \inf\Big\{t > 0 : W_t = \sup_{0 \le s \le 1} W_s\Big\}.$$
Show that T is not a stopping time with respect to $\{\mathcal{F}_t\}$.

4.6 Let W and S be as in Exercise 4.1.

(1) Let $0 < s < t < u$ and let $a < b$ with $b - a \le 1$. Show that there exists a constant c,
depending on s, t, and u, but not a or b, such that
$$\mathbb{P}\Big(S_s \in [a, b],\ \sup_{t \le r \le u} W_r \in [a, b]\Big) \le c\,(b - a)^2.$$
(2) Show that the path of a Brownian motion does not take on the same value as a
local maximum twice. That is, if S and T are times when W has a local maximum, then
$W_S \ne W_T$, a.s.
4.7 Let $V_t$ be the number of upcrossings of [0, 1] by a Brownian motion W up to time t. This means
we let $S_1 = 0$, $T_i = \inf\{t > S_i : W_t \ge 1\}$, and $S_{i+1} = \inf\{t > T_i : W_t \le 0\}$ for $i = 1, 2, \ldots$,
and we set $V_t = \sup\{k : T_k \le t\}$. Show that $V_t \to \infty$, a.s., as $t \to \infty$.

4.8 Let W be a Brownian motion. The zero set of Brownian motion is the random set

Z(ω) = {t ∈ [0, 1] : Wt (ω) = 0}.

(1) Show that Z(ω) is a closed set for each ω.

(2) Show that with probability one, every point of Z(ω) is a limit point of Z(ω). Conclude

that Z(ω) is an uncountable set.

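4.9 Let W be a one-dimensional Brownian motion and $\delta > 0$.

(1) Prove that there exists $\gamma$ such that if $t \le \gamma$, then
$$\mathbb{P}(0 \le W_t \le \delta/2) \ge 1/4 \quad \text{and} \quad \mathbb{P}(-\delta/2 \le W_t \le 0) \ge 1/4.$$

(2) Prove there exists $\gamma$ such that
$$\mathbb{P}\Big(\sup_{s \le \gamma} |W_s| > \delta/2\Big) \le 1/8.$$

(3) Prove that if $m \ge 1$, then
$$\begin{aligned}
\mathbb{P}\Big(&\sup_{m\gamma \le s \le (m+1)\gamma} |W_s - W_{m\gamma}| \le \delta/2,\ W_{m\gamma} \in [0, \delta/2],\ |W_{(m+1)\gamma}| \le \delta/2 \,\Big|\, \mathcal{F}_{m\gamma}\Big) \\
&\ge \tfrac{1}{8}\, \mathbb{P}\Big(\sup_{m\gamma \le s \le (m+1)\gamma} |W_s - W_{m\gamma}| \le \delta/2,\ W_{m\gamma} \in [0, \delta/2]\Big)
\end{aligned}$$
and the same with $W_{m\gamma} \in [-\delta/2, 0]$ in place of $W_{m\gamma} \in [0, \delta/2]$. Conclude that
$$\begin{aligned}
\mathbb{P}\Big(&\sup_{m\gamma \le s \le (m+1)\gamma} |W_s - W_{m\gamma}| \le \delta/2,\ |W_{m\gamma}| \le \delta/2,\ |W_{(m+1)\gamma}| \le \delta/2 \,\Big|\, \mathcal{F}_{m\gamma}\Big) \\
&\ge \tfrac{1}{8}\, \mathbb{P}\Big(\sup_{m\gamma \le s \le (m+1)\gamma} |W_s - W_{m\gamma}| \le \delta/2,\ |W_{m\gamma}| \le \delta/2\Big).
\end{aligned}$$

(4) Use induction to prove that if $t_0 > 0$, there exists $c_1 > 0$ such that
$$\mathbb{P}\Big(\sup_{s \le t_0} |W_s| \le \delta\Big) > c_1.$$

(5) Prove that if W is a d-dimensional Brownian motion, $t_0 > 0$, and $\delta > 0$, there exists $c_2$
such that
$$\mathbb{P}\Big(\sup_{s \le t_0} |W_s| \le \delta\Big) > c_2.$$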

4.10 The p-variation of a function f on the interval [0, 1] is defined by
$$V^p(f) = \sup\Big\{ \sum_{i=0}^{n-1} |f(t_{i+1}) - f(t_i)|^p : n \ge 1,\ 0 = t_0 < t_1 < \cdots < t_n = 1 \Big\};$$
the supremum is over all partitions P of [0, 1]. In this exercise we will prove that if $p < 2$ and
W is a Brownian motion, then $V^p(W) = \infty$, a.s.
(1) Let $X_i$ be an i.i.d. sequence of random variables with finite mean. Use the strong law of
large numbers to prove that if $K > \mathbb{E} X_1$, then
$$\mathbb{P}\Big(\sum_{i=1}^{n} X_i > Kn\Big) \to 0$$
as $n \to \infty$.

(2) If $p < 2$, take $r \in (p, 2)$, and let $\varepsilon_n = n^{-1/r}$. Let $S_0 = 0$ and for $i \ge 0$, set
$S_{i+1} = \inf\{t > S_i : |W_t - W_{S_i}| > \varepsilon_n\}$. Set $X_i = \varepsilon_n^{-2}(S_i - S_{i-1})$. Prove that the $X_i$ are i.i.d.
with finite mean.

(3) Use (1) to show that
$$\mathbb{P}(S_n > 1) = \mathbb{P}\Big(\sum_{i=1}^{n} X_i > \varepsilon_n^{-2}\Big) \to 0$$
as $n \to \infty$.

(4) Using the partition $\{S_0, S_1, \ldots, S_n\}$, show that $V^p(W) \ge n \varepsilon_n^p$ on the event $(S_n \le 1)$.

(5) Conclude $V^p(W) = \infty$, a.s.

5

The Poisson process

At the opposite extreme from Brownian motion is the Poisson process. This is a process

that only changes value by means of jumps, and even then, the jumps are nicely spaced. The

Poisson process is the prototype of a pure jump process, and later we will see that it is the

building block for an important class of stochastic processes known as Lévy processes.

Definition 5.1 Let $\{\mathcal{F}_t\}$ be a filtration, not necessarily satisfying the usual conditions. A
Poisson process with parameter $\lambda > 0$ is a stochastic process X satisfying the following
properties:

(1) $X_0 = 0$, a.s.
(2) The paths of $X_t$ are right continuous with left limits.
(3) If $s < t$, then $X_t - X_s$ is a Poisson random variable with parameter $\lambda(t - s)$.
(4) If $s < t$, then $X_t - X_s$ is independent of $\mathcal{F}_s$.
Define Xt− = lims→t,s

size 2 or larger at some time t strictly less than t0, then for each n sufficiently large there

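exists $0 \le k_n \le 2^n$ such that $X_{(k_n+1)t_0/2^n} - X_{k_n t_0/2^n} \ge 2$. Therefore
$$\begin{aligned}
\mathbb{P}(\exists\, s < t_0 : \Delta X_s \ge 2)
&\le \mathbb{P}(\exists\, k \le 2^n : X_{(k+1)t_0/2^n} - X_{k t_0/2^n} \ge 2) \\
&\le 2^n \sup_{k \le 2^n} \mathbb{P}(X_{(k+1)t_0/2^n} - X_{k t_0/2^n} \ge 2) \\
&= 2^n\, \mathbb{P}(X_{t_0/2^n} \ge 2) \\
&\le 2^n \big(1 - \mathbb{P}(X_{t_0/2^n} = 0) - \mathbb{P}(X_{t_0/2^n} = 1)\big) \\
&= 2^n \big(1 - e^{-\lambda t_0/2^n} - (\lambda t_0/2^n)\, e^{-\lambda t_0/2^n}\big).
\end{aligned} \tag{5.1}$$
We used property 5.1(3) for the two equalities. By l’Hôpital’s rule, $(1 - e^{-x} - x e^{-x})/x \to 0$
as $x \to 0$. We apply this with $x = \lambda t_0/2^n$, and see that the last line of (5.1) tends to 0 as
$n \to \infty$. Since the left-hand side of (5.1) does not depend on n, it must be 0. This holds for
each $t_0$.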
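
To see how quickly the last line of (5.1) decays, one can simply evaluate it. The sketch below is illustrative only; the values λ = 3 and t0 = 2 and the function name `union_bound` are arbitrary choices made here. Since $1 - e^{-x} - xe^{-x}$ behaves like $x^2/2$ for small x, the bound is roughly $(\lambda t_0)^2 / 2^{n+1}$.

```python
import math


def union_bound(lam, t0, n):
    """Last line of (5.1): 2^n * (1 - e^{-x} - x e^{-x}) with x = lam * t0 / 2^n."""
    x = lam * t0 / 2 ** n
    # -expm1(-x) equals 1 - e^{-x} and avoids cancellation when x is small.
    return 2 ** n * (-math.expm1(-x) - x * math.exp(-x))


for n in (1, 5, 10, 20):
    print(n, union_bound(lam=3.0, t0=2.0, n=n))
```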
Another characterization of the Poisson process is as follows. Let $T_1 = \inf\{t : \Delta X_t = 1\}$,
the time of the first jump. Define $T_{i+1} = \inf\{t > T_i : \Delta X_t = 1\}$, so that $T_i$ is the time of the
ith jump.

Proposition 5.3 The random variables $T_1, T_2 - T_1, \ldots, T_{i+1} - T_i, \ldots$ are independent
exponential random variables with parameter $\lambda$.

Proof In view of Corollary 4.3 it suffices to show that $T_1$ is an exponential random variable
with parameter $\lambda$. If $T_1 > t$, then the first jump has not occurred by time t, so $X_t$ is still zero.
Hence
$$\mathbb{P}(T_1 > t) = \mathbb{P}(X_t = 0) = e^{-\lambda t},$$
using the fact that $X_t$ is a Poisson random variable with parameter $\lambda t$.

We can reverse the characterization in Proposition 5.3 to construct a Poisson process. We

do one step of the construction, leaving the rest as Exercise 5.4.

Let $U_1, U_2, \ldots$ be independent exponential random variables with parameter $\lambda$ and let
$T_j = \sum_{i=1}^{j} U_i$. Define
$$X_t(\omega) = k \quad \text{if } T_k(\omega) \le t < T_{k+1}(\omega). \tag{5.2}$$
An examination of the densities shows that an exponential random variable has a gamma
distribution with parameters $\lambda$ and $r = 1$, so by Proposition A.49, $T_j$ is a gamma random
variable with parameters $\lambda$ and j. Thus
$$\mathbb{P}(X_t < k) = \mathbb{P}(T_k > t) = \int_t^{\infty} \frac{\lambda e^{-\lambda x} (\lambda x)^{k-1}}{\Gamma(k)}\, dx.$$
Performing the integration by parts repeatedly shows that
$$\mathbb{P}(X_t < k) = \sum_{i=0}^{k-1} e^{-\lambda t}\, \frac{(\lambda t)^i}{i!},$$
and so $X_t$ is a Poisson random variable with parameter $\lambda t$.
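
The construction (5.2) translates directly into a simulation. The sketch below is an illustration under assumptions made here (the parameter values, the seed, and the function names are arbitrary): it builds $X_t$ by adding exponential waiting times until they exceed t, and compares the empirical value of $P(X_t < 3)$ with the Poisson sum above.

```python
import math
import random


def poisson_value(t, lam, rng):
    """X_t from (5.2): the number of partial sums T_k = U_1 + ... + U_k that are <= t."""
    k = 0
    arrival = rng.expovariate(lam)       # T_1 = U_1
    while arrival <= t:
        k += 1
        arrival += rng.expovariate(lam)  # T_{k+1} = T_k + U_{k+1}
    return k


def poisson_cdf_below(k, lam, t):
    """P(X_t < k) = sum over i < k of e^{-lam t} (lam t)^i / i!."""
    return sum(math.exp(-lam * t) * (lam * t) ** i / math.factorial(i) for i in range(k))


rng = random.Random(0)  # fixed seed so the run is reproducible
lam, t, trials = 2.0, 1.5, 20000
samples = [poisson_value(t, lam, rng) for _ in range(trials)]
empirical = sum(1 for v in samples if v < 3) / trials
print(empirical, poisson_cdf_below(3, lam, t))
```

The two printed numbers should agree up to Monte Carlo error, which is of order $1/\sqrt{\text{trials}}$.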
We will use the following proposition later.
Proposition 5.4 Let $\{\mathcal{F}_t\}$ be a filtration satisfying the usual conditions. Suppose $X_0 = 0$,
a.s., X has paths that are right continuous with left limits, $X_t - X_s$ is independent of $\mathcal{F}_s$ if
$s < t$, and $X_t - X_s$ has the same law as $X_{t-s}$ whenever $s < t$. If the paths of X are piecewise
constant, increasing, all the jumps of X are of size 1, and X is not identically 0, then X is a
Poisson process.
Proof Let $T_0 = 0$ and $T_{i+1} = \inf\{t > T_i : \Delta X_t = 1\}$, $i = 0, 1, 2, \ldots$ We will show that if
we set $U_i = T_i - T_{i-1}$, then the $U_i$ are i.i.d. exponential random variables and then appeal to
Exercise 5.4.

By Corollary 4.3, the $U_i$ are independent and have the same law. Hence it suffices to show
$U_1$ is an exponential random variable. We observe
$$\begin{aligned}
\mathbb{P}(U_1 > s + t) &= \mathbb{P}(X_{s+t} = 0) = \mathbb{P}(X_{s+t} - X_s = 0,\ X_s = 0) \\
&= \mathbb{P}(X_{t+s} - X_s = 0)\, \mathbb{P}(X_s = 0) = \mathbb{P}(X_t = 0)\, \mathbb{P}(X_s = 0) \\
&= \mathbb{P}(U_1 > t)\, \mathbb{P}(U_1 > s).
\end{aligned}$$
Setting $f(t) = \mathbb{P}(U_1 > t)$, we thus have $f(t + s) = f(t) f(s)$. Since $f(t)$ is decreasing
and $0 < f(t) < 1$, we conclude $\mathbb{P}(U_1 > t) = f(t) = e^{-\lambda t}$ for some $\lambda > 0$, or $U_1$ is an
exponential random variable.

Exercises

5.1 Suppose $P_t$ is a Poisson process and we write $X_t = P_{t-}$. Is $P_1 - X_{1-t}$ a Poisson process on [0, 1]?
Why or why not?

5.2 Let P be a Poisson process with parameter $\lambda$. Show that
$$\lim_{n \to \infty} \sup_{t \le 1} \Big| \frac{P_{nt}}{n} - \lambda t \Big| = 0, \quad \text{a.s.}$$

5.3 Show that if $P^{(1)}$ and $P^{(2)}$ are independent Poisson processes with parameters $\lambda_1$ and $\lambda_2$,
respectively, then $P^{(1)}_t + P^{(2)}_t$ is a Poisson process with parameter $\lambda_1 + \lambda_2$.

5.4 If X is defined by (5.2), show that X is a Poisson process.

5.5 Let $X_t$ be a stochastic process and let $\{\mathcal{F}^{00}_t\}$ be the filtration generated by X. Suppose X is
a Poisson process with respect to the filtration $\{\mathcal{F}^{00}_t\}$. Show that X is a Poisson process with
respect to the minimal augmented filtration generated by X.

Hint: Imitate the proof of Proposition 2.5.

5.6 Suppose $P_t$ is a Poisson process and f and g are non-negative bounded deterministic functions
with compact support. Find necessary and sufficient conditions on f and g so that
$\int_0^{\infty} f(s)\, dP_s$ and $\int_0^{\infty} g(s)\, dP_s$ are independent.

Hint: First show that the characteristic function of $F = \int_0^{\infty} f(s)\, dP_s$ is
$$\mathbb{E}\, e^{iuF} = \exp\Big( \int_0^{\infty} \big(e^{iu f(s)} - 1\big)\, ds \Big).$$

5.7 We will talk about weak convergence in general metric spaces in Chapters 30–35. This
exercise is concerned with the weak convergence of real-valued random variables as defined in
Section A.12.

Suppose for each n, $P_n$ is a Poisson random variable with parameter $\lambda_n$ and $\lambda_n \to \infty$ as
$n \to \infty$. Prove that
$$\frac{P_n - \lambda_n}{\sqrt{\lambda_n}}$$
converges weakly to a normal random variable with mean zero and variance one.

Hint: Imitate the proof of Theorem A.51.

6

Construction of Brownian motion

There are several ways of constructing Brownian motion, none of them easy. Here we give

two constructions. The first is the one that Wiener used, which is based on Fourier series.

The second uses martingale techniques. A method due to Lévy can be found in Bass (1995);

see also Exercises 6.4 and 6.5. We will see several other constructions in later chapters.

6.1 Wiener’s construction

For any of the constructions of Brownian motion, the main step is to construct Wt for

t ∈ [0, 1]. Once we have done this, we get Brownian motion for all t rather easily. More

specifically, suppose we have a Brownian motion $Y^{(0)}$ started at 0 on the time interval [0, 1].
Take independent copies $Y^{(1)}, Y^{(2)}, \ldots$, each on [0, 1]. We have $Y^{(i)}_0 = 0$ for each i, and now
to get Brownian motion started at 0, define $W_t$ to be equal to $Y^{(0)}_t$ if $t \le 1$, equal to $Y^{(0)}_1 + Y^{(1)}_{t-1}$
if $1 < t \le 2$, and more generally
$$W_t = \Big( \sum_{i=0}^{[t]-1} Y^{(i)}_1 \Big) + Y^{([t])}_{t-[t]}$$
if $t \ge 1$, where [t] is the largest integer less than or equal to t. This will give Brownian motion
started at 0 on the time interval $[0, \infty)$.
Therefore the crux of the problem is to construct Brownian motion on [0, 1]. Because we
are working with Fourier series, it is more convenient to look at Brownian motion on [0, π];
we can just disregard times between 1 and π when we are done.
Throughout this chapter we make the supposition that we can find a countable sequence
$Z_1, Z_2, \ldots$ of independent and identically distributed mean zero normal random variables
with variance one that are $\mathcal{F}$ measurable, where $(\Omega, \mathcal{F}, \mathbb{P})$ is our probability space. This is
an extremely mild condition.
Theorem 6.1 There exists a process $\{W_t;\ 0 \le t \le 1\}$ that is Brownian motion.
Proof If we fix $t \in [0, \pi]$ and compute the Fourier series for the function $f(s) = s \wedge t$, it
is an exercise in calculus to get the Fourier coefficients. We end up with
$$s \wedge t = \frac{st}{\pi} + \frac{2}{\pi} \sum_{k=1}^{\infty} \frac{\sin ks\, \sin kt}{k^2}. \tag{6.1}$$
This suggests letting $Z_0, Z_1, \ldots$ be i.i.d. normal random variables with mean 0 and variance
1 and setting
$$W_t = \frac{t}{\sqrt{\pi}}\, Z_0 + \sum_{k=1}^{\infty} \Big( \sqrt{\frac{2}{\pi}}\, \frac{\sin kt}{k} \Big) Z_k. \tag{6.2}$$
Assuming there is no problem with convergence, we see that $W_t$ has mean zero, since each
of the $Z_i$ does, and that
$$\mathbb{E}[W_s W_t] = \frac{st}{\pi} + \sum_{k=1}^{\infty} \frac{2}{\pi}\, \frac{\sin ks\, \sin kt}{k^2} = s \wedge t \tag{6.3}$$
as required. We used the independence of the $Z_i$ here to show that $\mathbb{E}[Z_i Z_j] = 0$ if $i \ne j$.
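
The Fourier identity (6.1), which drives the covariance computation (6.3), is easy to test numerically. The following is a rough sketch rather than a proof; the truncation level of 20000 terms and the function name `fourier_min` are choices made here. The tail of the series is bounded by $(2/\pi)\sum_{k > K} k^{-2}$, which is about $2/(\pi K)$, so the partial sum should match $s \wedge t$ to roughly four decimal places.

```python
import math


def fourier_min(s, t, terms=20000):
    """Partial sum of the right-hand side of (6.1) for s, t in [0, pi]."""
    series = sum(math.sin(k * s) * math.sin(k * t) / k ** 2 for k in range(1, terms + 1))
    return s * t / math.pi + 2.0 / math.pi * series


for s, t in [(0.3, 1.0), (1.0, 2.5), (2.0, 2.0), (0.0, 3.0)]:
    print(s, t, min(s, t), fourier_min(s, t))
```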
We argue that there is in fact no difficulty with the convergence. Note that
$\sum_{k=1}^{m} \frac{\sin^2 kt}{k^2}$ increases as m increases to a finite limit. Therefore
$$\mathbb{E}\Big[ \Big( \sum_{k=m}^{n} Z_k\, \frac{\sin kt}{k} \Big)^2 \Big] = \sum_{k=m}^{n} \frac{\sin^2 kt}{k^2} \to 0$$
as $m, n \to \infty$. This means that the sum on the right of (6.2) is a Cauchy sequence in
$L^2$. By the completeness of $L^2$, the sum on the right of (6.2) converges in $L^2$. A use of the
Cauchy–Schwarz inequality allows us to justify the formula for the expectation of $W_s W_t$.
If we let
$$W^j_t = \frac{t}{\sqrt{\pi}}\, Z_0 + \sum_{k=1}^{j} \Big( \sqrt{\frac{2}{\pi}}\, \frac{\sin kt}{k} \Big) Z_k,$$
then $(W^j_{t_1}, \ldots, W^j_{t_m})$ is a jointly normal collection of random variables for each j whenever
$t_1, \ldots, t_m \in [0, \pi]$. By Remark A.56, it follows that $(W_{t_1}, \ldots, W_{t_m})$ is a jointly normal
collection of random variables. Therefore $W_t$ is a Gaussian process. Since each $W_t$ has mean
zero and $\mathrm{Cov}(W_s, W_t) = s \wedge t$, then $W_t$ has the correct finite-dimensional distributions to be
a Brownian motion.
The only part remaining to the construction is to show that Wt as constructed above has
continuous paths, for we can then use Theorem 2.4. In what follows, pay attention to where
the absolute values are placed. If one is cavalier about placing them, one will very likely run
into trouble.
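Define
$$S_m(t) = \sum_{k=m}^{2m-1} \frac{\sin kt}{k}\, Z_k$$
and let $T_m = \sup_{0 \le t \le \pi} |S_m(t)|$. We write
$$W_t = \frac{t}{\sqrt{\pi}}\, Z_0 + \sqrt{\frac{2}{\pi}} \sum_{n=0}^{\infty} S_{2^n}(t).$$
We will show
$$\mathbb{E}\, T_m^2 \le \frac{c}{m^{1/2}}. \tag{6.4}$$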
Once we have this, then by the Fubini theorem and then Jensen’s inequality,
$$\mathbb{E} \sum_{n=0}^{\infty} T_{2^n} = \sum_{n=0}^{\infty} \mathbb{E}\, T_{2^n} \le \sum_{n=0}^{\infty} \big( \mathbb{E}[T_{2^n}^2] \big)^{1/2} < \infty.$$
Therefore $\sum_{n=0}^{\infty} T_{2^n} < \infty$, a.s., and by the Weierstrass M-test (see, e.g., Rudin, 1976), we
have that with probability 1, $\sum_{n=0}^{\infty} S_{2^n}(t)$ converges uniformly in t. Since each $S_{2^n}(t)$ is a
continuous function of t, we see that the uniform limit is also continuous and we are done.
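We therefore have to prove (6.4). Using $\big|\sum_k a_k\big|^2 = \sum_{j,k} a_k \overline{a_j}$ for $a_k$ complex valued, we
have
$$\begin{aligned}
T_m^2 &\le \sup_{0 \le t \le \pi} \Big| \sum_{k=m}^{2m-1} \frac{e^{ikt}}{k}\, Z_k \Big|^2 \\
&\le \sup_{0 \le t \le \pi} \Big| \sum_{j,k=m}^{2m-1} \frac{e^{ikt} e^{-ijt}}{jk}\, Z_j Z_k \Big| \\
&\le \sum_{k=m}^{2m-1} \frac{1}{k^2}\, Z_k^2 + 2 \sup_{0 \le t \le \pi} \Big| \sum_{\ell=1}^{m-1} \sum_{j=m}^{2m-\ell-1} \frac{e^{i\ell t}}{j(j+\ell)}\, Z_j Z_{j+\ell} \Big| \\
&\le \sum_{k=m}^{2m-1} \frac{1}{k^2}\, Z_k^2 + 2 \sum_{\ell=1}^{m-1} \Big| \sum_{j=m}^{2m-\ell-1} \frac{1}{j(j+\ell)}\, Z_j Z_{j+\ell} \Big|.
\end{aligned} \tag{6.5}$$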
In the third inequality we wrote
2m−1∑
j,k=m
=
∑
m≤ j=k≤2m−1
+ 2
∑
m≤ j

as u → t, then W ′t = Wt , a.s., or W ′ is a version of W with paths that are right continuous

with left limits. We now drop the primes. Set Wt = W1 if t ≥ 1.

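For any $t_0 \in [0, 1]$, $W_{t+t_0} - W_{t_0}$ is also a martingale, and by Jensen’s inequality for
conditional expectations (Proposition A.21), $|W_{t+t_0} - W_{t_0}|^4$ is a submartingale. Using Doob’s
inequalities (Theorem 3.6), if $\lambda > 0$ and $t_0, \delta \in [0, 1]$,
$$\mathbb{P}\Big( \sup_{t_0 \le t \le t_0+\delta} |W_t - W_{t_0}| \ge \lambda \Big) = \mathbb{P}\Big( \sup_{t_0 \le t \le t_0+\delta} |W_t - W_{t_0}|^4 \ge \lambda^4 \Big) \le c\, \frac{\mathbb{E}\, |W_{t_0+\delta} - W_{t_0}|^4}{\lambda^4}.$$
Since $W_{t_0+\delta} - W_{t_0}$ is a mean zero normal random variable with variance $\delta$ if $t_0 + \delta \le 1$, we
have
$$\mathbb{P}\Big( \sup_{t_0 \le t \le t_0+\delta} |W_t - W_{t_0}| \ge \lambda \Big) \le c\, \frac{\delta^2}{\lambda^4}. \tag{6.7}$$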

Let
$$A_n = \Big\{ \exists\, k \le 2^n : \sup_{k/2^n \le t \le (k+2)/2^n} |W_t - W_{k/2^n}| > 2^{-n/8} \Big\}.$$
From (6.7) with $\delta = 2^{-n+1}$ and $\lambda = 2^{-n/8}$,
$$\mathbb{P}(A_n) \le 2^n \max_{k \le 2^n} \mathbb{P}\Big( \sup_{k/2^n \le t \le (k+2)/2^n} |W_t - W_{k/2^n}| > 2^{-n/8} \Big) \le c\, \frac{2^n\, 2^{-2n}}{2^{-n/2}} = c\, 2^{-n/2},$$
which is summable. By the Borel–Cantelli lemma, $\mathbb{P}(A_n \text{ i.o.}) = 0$. (The event $(A_n \text{ i.o.})$ is the
event where $\omega$ is in infinitely many of the $A_n$.)

Except for a set of $\omega$’s in a null set, there exists a positive integer N (which will depend
on $\omega$) such that if $n \ge N$, then $\omega \notin A_n$. Given $\varepsilon > 0$, take $n \ge N$ such that $2^{-n/8} < \varepsilon/2$. If
$|t - s| \le 2^{-n}$ with $s, t \in [0, 1]$, then $s, t \in [k/2^n, (k+2)/2^n]$ for some $k \le 2^n$. Since $\omega \notin A_n$,
$$|W_t - W_s| \le |W_t - W_{k/2^n}| + |W_s - W_{k/2^n}| \le 2 \cdot 2^{-n/8} < \varepsilon.$$
This proves the continuity of $W_t$.
There is nothing special about the trigonometric polynomials in this second construction.
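Let $\langle f, g \rangle = \int_0^1 f(r) g(r)\, dr$ be the inner product for the Hilbert space $L^2[0, 1]$; we consider
only real-valued functions for simplicity. Let $\{\varphi_n\}$ be a complete orthonormal system for
$L^2[0, 1]$: we have $\langle \varphi_m, \varphi_n \rangle = 0$ if $m \ne n$, $\langle \varphi_n, \varphi_n \rangle = 1$ for each n, and $f = 0$, a.e., if
$\langle f, \varphi_n \rangle = 0$ for all n. One property of a complete orthonormal system is Parseval’s identity,
which says that
$$\langle f, f \rangle = \sum_{n=1}^{\infty} |\langle f, \varphi_n \rangle|^2;$$
see Folland (1999). If we replace f by g and then by $f + g$ and use
$$\langle f, g \rangle = \tfrac{1}{2} \big[ \langle f + g, f + g \rangle - \langle f, f \rangle - \langle g, g \rangle \big],$$
we obtain
$$\langle f, g \rangle = \sum_{n=1}^{\infty} \langle f, \varphi_n \rangle \langle g, \varphi_n \rangle.$$
Now let
$$a_n(t) = \langle 1_{[0,t]}, \varphi_n \rangle = \int_0^t \varphi_n(r)\, dr.$$
If $Z_1, Z_2, \ldots$ are independent mean zero normal random variables with variance one, let
$$W_t = \sum_{n=1}^{\infty} a_n(t)\, Z_n. \tag{6.8}$$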
Assuming there is no difficulty with the convergence, we have
$$\mathrm{Cov}(W_s, W_t) = \sum_{n=1}^{\infty} a_n(s)\, a_n(t) = \sum_{n=1}^{\infty} \langle 1_{[0,s]}, \varphi_n \rangle \langle 1_{[0,t]}, \varphi_n \rangle = \langle 1_{[0,s]}, 1_{[0,t]} \rangle = s \wedge t.$$
Exercise 6.2 asks you to verify that the process W defined by (6.8) is a mean zero Gaussian
process on [0, 1] with the same covariances as a Brownian motion.
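
As a concrete instance of the covariance identity, one can take the Haar system described in Exercise 6.4 as the orthonormal system and evaluate $\sum_n a_n(s) a_n(t)$ directly. The sketch below is only an illustration; the function names and the cutoff of 12 levels are choices made here. For the Haar function with level $i \ge 1$ and index j, the integral $a(t)$ is a tent function supported on $[(2j-2)/2^i, 2j/2^i]$, and the constant function 1 contributes the term $st$.

```python
def schauder(i, j, t):
    """Integral over [0, t] of the Haar function with level i >= 1 and index j."""
    left = (2 * j - 2) / 2 ** i
    mid = (2 * j - 1) / 2 ** i
    right = 2 * j / 2 ** i
    height = 2 ** ((i - 1) / 2)
    if left <= t <= mid:
        return height * (t - left)   # rising edge of the tent
    if mid < t <= right:
        return height * (right - t)  # falling edge of the tent
    return 0.0


def covariance_partial_sum(s, t, levels):
    """Sum of a_n(s) a_n(t) over the constant function and Haar levels 1..levels."""
    total = s * t  # the constant function 1 has a(t) = t
    for i in range(1, levels + 1):
        for j in range(1, 2 ** (i - 1) + 1):
            total += schauder(i, j, s) * schauder(i, j, t)
    return total


for s, t in [(0.25, 0.75), (0.5, 0.5), (0.125, 0.875), (0.3, 0.6)]:
    print(s, t, min(s, t), covariance_partial_sum(s, t, levels=12))
```

At dyadic points with denominator at most $2^{12}$ all higher-level terms vanish, so the truncated sum already equals $s \wedge t$ up to rounding.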
Exercises
6.1 Let $Z_0, Z_1, Z_2, \ldots$ be a sequence of independent identically distributed mean zero normal random
variables with variance one. Define
$$X_t = \frac{t^2}{2\sqrt{\pi}}\, Z_0 + \sum_{k=1}^{\infty} \Big( \sqrt{\frac{2}{\pi}}\, \frac{\cos kt}{k^2} \Big) Z_k. \tag{6.9}$$
(1) Show that the convergence in (6.9) is absolute and uniform over $t \in [0, 1]$.
(2) Show that $X_t$ is a Gaussian process.
(3) If $W_t$ is a Brownian motion and
$$Y_t = \int_0^t W_r\, dr, \quad t \in [0, 1],$$
show that X and Y have the same finite-dimensional distributions. Show that X and Y have
the same law when viewed as random variables taking values in C[0, 1]. (The process X is
sometimes known as integrated Brownian motion.)
(4) Find $\mathrm{Cov}(X_s, X_t)$.
6.2 Let $\{\varphi_n\}$ be a complete orthonormal system for $L^2[0, 1]$. Show that the sum (6.8) converges in
$L^2$ and give the details of the proof that the resulting process W is a mean zero Gaussian process
with $\mathrm{Cov}(W_s, W_t) = s \wedge t$ if $s, t \in [0, 1]$.
6.3 Let $D = \{k/2^n : n \ge 1,\ k = 0, 1, \ldots, 2^n\}$ be the dyadic rationals. Suppose the collection of
random variables $\{V_t : t \in D\}$ is jointly normal, each $V_t$ has mean zero, and $\mathrm{Cov}(V_s, V_t) = s \wedge t$.
(1) Prove that the paths of V are uniformly continuous over $t \in D$.
(2) If we define $W_t = \lim_{s \in D,\, s \to t} V_s$, prove that W is a Brownian motion.
6.4 In this and the next exercise we give the Haar function construction of Brownian motion. Let
$\varphi_{00} = 1$ on [0, 1] and for $i = 1, 2, \ldots$, and $1 \le j \le 2^{i-1}$, set
$$\varphi_{ij}(x) = \begin{cases}
2^{(i-1)/2}, & (2j-2)/2^i \le x < (2j-1)/2^i, \\
-2^{(i-1)/2}, & (2j-1)/2^i \le x < 2j/2^i, \\
0, & \text{otherwise.}
\end{cases}$$
It is a well-known and easily proved result from analysis (see, e.g., Bass (1995), Section I.2)
that the collection $\{\varphi_{ij}\}$ is a complete orthonormal system for $L^2[0, 1]$.
For each i, j, define
$$\psi_{ij}(t) = \int_0^t \varphi_{ij}(s)\, ds,$$
for each i and j, let $Y_{ij}$ be independent mean zero normal random variables with variance one,
and let
$$V_i(t) = \sum_{j=1}^{2^{i-1}} Y_{ij}\, \psi_{ij}(t)$$
for $i \ge 1$. Set $V_0 = Y_{00}\, \psi_{00}$.
(1) Fix $i \ge 1$. Prove that each $\psi_{ij}$ is bounded by $2^{(-i-1)/2}$. Prove that the sets $\{t : \psi_{ij}(t) > 0\}$,
$j = 1, \ldots, 2^{i-1}$, are disjoint.

(2) Fix $i \ge 1$. Write
$$\mathbb{P}(\exists\, t \in [0, 1] : |V_i(t)| > i^{-2}) \le \mathbb{P}(\exists\, j \le 2^{i-1} : |Y_{ij}|\, 2^{(-i-1)/2} > i^{-2}),$$
use Proposition A.52 to estimate this, and conclude that
$$\sum_{i=1}^{\infty} \mathbb{P}\Big( \sup_{0 \le t \le 1} |V_i(t)| > i^{-2} \Big) < \infty. \tag{6.10}$$
6.5 This is a continuation of Exercise 6.4. With $\varphi_{ij}$, $\psi_{ij}$, $Y_{ij}$, and $V_i$ as in that problem, let
$$W_t = \sum_{i=0}^{\infty} V_i(t).$$
(1) Prove that W is a jointly normal Gaussian process with mean zero and $\mathrm{Cov}(W_s, W_t) = s \wedge t$.
(2) Use (6.10) and the Borel–Cantelli lemma to show that $\sum_{i=1}^{n} |V_i(t)|$ converges uniformly
over [0, 1]. Conclude that W is a Brownian motion.
7
Path properties of Brownian motion
The paths of Brownian motion are continuous, but we will see that they are not differentiable.
How continuous are they? We will see that the paths satisfy what is known as a Hölder
continuity condition. A precise description of the oscillatory behavior of Brownian motion
will be given by the law of the iterated logarithm.
A function $f : [0, 1] \to \mathbb{R}$ is said to be Hölder continuous of order $\alpha$ if there exists a
constant M such that
$$|f(t) - f(s)| \le M |t - s|^{\alpha}, \quad s, t \in [0, 1]. \tag{7.1}$$
We show that the paths of Brownian motion are Hölder continuous of order $\alpha$ if $\alpha < \tfrac{1}{2}$.
(They are also not Hölder continuous of order $\alpha$ if $\alpha \ge \tfrac{1}{2}$; we will see this from the law of
the iterated logarithm.)
Theorem 7.1 If $\alpha < \tfrac{1}{2}$, the paths of Brownian motion are Hölder continuous of order $\alpha$
on [0, 1].
Proof Step 1. First we apply the Borel–Cantelli lemma to a certain sequence of sets. Let W
be a Brownian motion and set
$$A_n = \Big\{ \exists\, k \le 2^n - 1 : \sup_{k/2^n \le t \le (k+1)/2^n} |W_t - W_{k/2^n}| > 2^{-n\alpha} \Big\}.$$

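Since $W_{t+k/2^n} - W_{k/2^n}$ is a Brownian motion,
$$\begin{aligned}
\mathbb{P}(A_n) &\le 2^n \sup_{k \le 2^n} \mathbb{P}\Big( \sup_{t \le 1/2^n} |W_{t+k/2^n} - W_{k/2^n}| > 2^{-n\alpha} \Big) \\
&\le 2^n\, \mathbb{P}\Big( \sup_{t \le 1/2^n} |W_t| > 2^{-n\alpha} \Big) \\
&\le 2 \cdot 2^n \exp\big( -2^{-2n\alpha} / (2 \cdot 2^{-n}) \big).
\end{aligned} \tag{7.2}$$
Here we used Proposition 3.15. Since $\alpha < \tfrac{1}{2}$, then $2^{n(1-2\alpha)} > 2n$ for n large, and the last line
of (7.2) is less than
$$2^{n+1} \exp\big( -2^{n(1-2\alpha)}/2 \big) \le 2^{n+1} e^{-n}$$
if n is large. Hence $\sum \mathbb{P}(A_n) < \infty$, and $\mathbb{P}(A_n \text{ i.o.}) = 0$ by the Borel–Cantelli lemma.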
Step 2. Next we show that this implies the Hölder continuity. For almost every $\omega$
there exists N (depending on $\omega$) such that if $n \ge N$, then $\omega \notin A_n$. Let $s \le t$ be two points
in [0, 1]. If $2^{-(n+2)} \le t - s \le 2^{-(n+1)}$ for some $n \ge N$ and k is the largest integer such that
$k/2^{n+2} \le s$, then
$$\begin{aligned}
|W_t - W_s| &\le |W_t - W_{t \wedge ((k+1)/2^{n+2})}| + |W_{t \wedge ((k+1)/2^{n+2})} - W_{k/2^{n+2}}| + |W_s - W_{k/2^{n+2}}| \\
&\le 3 \cdot 2^{-n\alpha} \le 3 \cdot 4^{\alpha} |t - s|^{\alpha}.
\end{aligned}$$
We know $|W_t(\omega)|$ is bounded on [0, 1] since the paths are continuous; let K (depending
on $\omega$) be the bound. If $|t - s| \ge 2^{-(N+1)}$, then
$$|W_t - W_s| \le 2K \le (2K)(2^{N+1}) |t - s| \le (2K)(2^{N+1}) |t - s|^{\alpha}.$$
Thus, no matter whether $|t - s|$ is small or large, there exists L (depending on $\omega$) such that
$|W_t(\omega) - W_s(\omega)| \le L |t - s|^{\alpha}$ for all $s, t \in [0, 1]$.
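
The summability claimed in Step 1 can be made tangible by evaluating the last line of (7.2). The sketch below is purely illustrative; the value α = 0.25 and the function name `step1_bound` are arbitrary choices made here, and any α < 1/2 would do. The terms first grow with n and then collapse doubly exponentially, so the partial sums stabilise quickly.

```python
import math


def step1_bound(n, alpha):
    """Last line of (7.2): 2 * 2^n * exp(-2^{n(1 - 2 alpha)} / 2)."""
    return 2.0 * 2 ** n * math.exp(-(2.0 ** (n * (1 - 2 * alpha))) / 2.0)


alpha = 0.25  # any alpha < 1/2 works; this value is an arbitrary choice
terms = [step1_bound(n, alpha) for n in range(1, 61)]
print(terms[0], terms[9], terms[19])
print(sum(terms[:40]), sum(terms))
```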
One of the most beautiful theorems in probability theory is the law of the iterated logarithm
(LIL). It describes precisely how Brownian motion oscillates.
Theorem 7.2 Let W be a Brownian motion. We have
$$\limsup_{t \to \infty} \frac{|W_t|}{\sqrt{2t \log\log t}} = 1, \quad \text{a.s.}$$
and
$$\limsup_{t \to 0} \frac{|W_t|}{\sqrt{2t \log\log(1/t)}} = 1, \quad \text{a.s.}$$
Proof The second assertion follows from the first by time inversion; see Exercise 2.5. Thus
we only need to prove the first assertion.
Proof of upper bound: We use the Borel–Cantelli lemma. Let $\varepsilon > 0$ and then choose q
larger than 1 but close enough to 1 so that $(1 + \varepsilon)^2/q > 1$. Let
$$A_n = \Big( \sup_{s \le q^n} |W_s| > (1 + \varepsilon) \sqrt{2 q^{n-1} \log\log q^{n-1}} \Big).$$
By Proposition 3.15,
$$\begin{aligned}
\mathbb{P}(A_n) &\le 2 \exp\Big( -\frac{(1 + \varepsilon)^2\, 2 q^{n-1} \log\log q^{n-1}}{2 q^n} \Big) \\
&= 2 \exp\Big( -\frac{(1 + \varepsilon)^2}{q} \big( \log(n-1) + \log\log q \big) \Big) \\
&= \frac{c}{(n-1)^{(1+\varepsilon)^2/q}},
\end{aligned}$$
where we are using our convention that the letter c denotes a constant whose exact value is
unimportant. This is summable in n, so $\sum \mathbb{P}(A_n) < \infty$.
By the Borel–Cantelli lemma, $\mathbb{P}(A_n \text{ i.o.}) = 0$. Hence, except for a null set, there exists
$N = N(\omega)$ such that $\omega \notin A_n$ if $n \ge N(\omega)$. If $t \ge q^N$, then for some $n \ge N + 1$ we have
$q^{n-1} \le t \le q^n$, and
$$|W_t| \le \sup_{s \le q^n} |W_s| \le (1 + \varepsilon) \sqrt{2 q^{n-1} \log\log q^{n-1}} \le (1 + \varepsilon) \sqrt{2t \log\log t}.$$
Therefore
$$\limsup_{t \to \infty} \frac{|W_t|}{\sqrt{2t \log\log t}} \le 1 + \varepsilon, \quad \text{a.s.} \tag{7.3}$$
Since $\varepsilon > 0$ is arbitrary, the upper bound is proved.

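Proof of lower bound: We start with the second half of the Borel–Cantelli lemma. Let
$\varepsilon > 0$ and then take $q > 1$ very large so that
$$\frac{(1 - \varepsilon)^2 (1 + \varepsilon)}{1 - q^{-1}} < 1$$
and $2/\sqrt{q} < \varepsilon/2$. This is possible because $(1 - \varepsilon)^2 (1 + \varepsilon) = (1 - \varepsilon^2)(1 - \varepsilon) < 1$. Let
$$B_n = \Big( W_{q^{n+1}} - W_{q^n} > (1 - \varepsilon) \sqrt{2 q^{n+1} \log\log q^{n+1}} \Big).$$
Since Brownian motion has independent increments, the events $B_n$ are independent. Let
$$Z = \frac{W_{q^{n+1}} - W_{q^n}}{\sqrt{q^{n+1} - q^n}}.$$
Then Z is a mean zero normal random variable with variance one. By Proposition A.52, we
see that
$$\begin{aligned}
\mathbb{P}(B_n) &= \mathbb{P}\Big( Z > (1 - \varepsilon) \sqrt{2 q^{n+1} \log\log q^{n+1}} \Big/ \sqrt{q^{n+1} - q^n} \Big) \\
&\ge \exp\Big( -\frac{(1 - \varepsilon)^2 (1 + \varepsilon)\, 2 q^{n+1} \log\log q^{n+1}}{2 (q^{n+1} - q^n)} \Big) \\
&= c \exp\Big( -\frac{(1 - \varepsilon)^2 (1 + \varepsilon) \big( \log(n+1) + \log\log q \big)}{1 - q^{-1}} \Big)
\end{aligned}$$
for n large. Hence
$$\sum_n \mathbb{P}(B_n) \ge c \sum_n \frac{1}{(n+1)^{(1-\varepsilon)^2 (1+\varepsilon)/(1 - q^{-1})}} = \infty.$$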

By the Borel–Cantelli lemma, with probability one, $\omega$ is in infinitely many $B_n$. Consequently,
with probability one, infinitely often
$$W_{q^{n+1}} - W_{q^n} > (1 - \varepsilon) \sqrt{2 q^{n+1} \log\log q^{n+1}}. \tag{7.4}$$
The inequality (7.4) is not exactly what we want, as we want a lower bound for $W_{q^{n+1}}$, but
we can derive the desired lower bound by using the upper bound we proved in Step 1. We
know from (7.3) that for n large enough,
$$|W_{q^n}| \le 2 \sqrt{2 q^n \log\log q^n} \le \frac{2}{\sqrt{q}} \sqrt{2 q^{n+1} \log\log q^{n+1}} < \frac{\varepsilon}{2} \sqrt{2 q^{n+1} \log\log q^{n+1}}.$$
Thus infinitely often
$$W_{q^{n+1}} > (1 - 3\varepsilon/2) \sqrt{2 q^{n+1} \log\log q^{n+1}}.$$

This proves
$$\limsup_{n \to \infty} \frac{W_{q^{n+1}}}{\sqrt{2 q^{n+1} \log\log q^{n+1}}} \ge 1 - \frac{3\varepsilon}{2}, \quad \text{a.s.}$$
Since $\varepsilon$ is arbitrary, the lower bound follows.

The law of the iterated logarithm shows that the paths of $W_t$ are not differentiable at time 0,
a.s. Applying this to $W_{s+t} - W_t$, we see that for each t, W is not differentiable at time t, a.s.
But the null set $N_t$ might depend on t, and it is even conceivable that $\cup_{t \in [0,1]} N_t$ is not a null
set. We have the following stronger result, which says that except for a set of $\omega$’s that form a
null set, $t \to W_t(\omega)$ is a function that does not have a derivative at any time $t \in [0, 1]$.

Theorem 7.3 With probability one, the paths of Brownian motion are nowhere differentiable.

Proof Note that if Z is a normal random variable with mean 0 and variance 1, then
$$\mathbb{P}(|Z| \le r) = \frac{1}{\sqrt{2\pi}} \int_{-r}^{r} e^{-x^2/2}\, dx \le 2r. \tag{7.5}$$
Let $M, h > 0$ and let
$$\begin{aligned}
A_{M,h} &= \{ \exists\, s \in [0, 1] : |W_t - W_s| \le M |t - s| \text{ if } |t - s| \le h \}, \\
B_n &= \{ \exists\, k \le 2n : |W_{k/n} - W_{(k-1)/n}| \le 4M/n,\ |W_{(k+1)/n} - W_{k/n}| \le 4M/n,\ |W_{(k+2)/n} - W_{(k+1)/n}| \le 4M/n \}.
\end{aligned}$$
We check that $A_{M,h} \subset B_n$ if $n \ge 2/h$. To see this, if $\omega \in A_{M,h}$, there exists an s such that
$|W_t - W_s| \le M |t - s|$ if $|t - s| \le 2/n$; let $k/n$ be the largest multiple of $1/n$ less than or equal
to s. Then
$$|(k+2)/n - s| \le 2/n \quad \text{and} \quad |(k+1)/n - s| \le 2/n,$$
and therefore
$$|W_{(k+2)/n} - W_{(k+1)/n}| \le |W_{(k+2)/n} - W_s| + |W_s - W_{(k+1)/n}| \le 2M/n + 2M/n < 4M/n.$$
Similarly |W(k+1)/n − Wk/n| and |Wk/n − W(k−1)/n| are less than 4M/n.
Using the independent increments property, the stationary increments property, and (7.5),

\[
\begin{aligned}
P(B_n) &\le 2n \sup_{k\le 2n} P\bigl(|W_{k/n} - W_{(k-1)/n}| < 4M/n,\ |W_{(k+1)/n} - W_{k/n}| < 4M/n,\ |W_{(k+2)/n} - W_{(k+1)/n}| < 4M/n\bigr) \\
&\le 2n\,P\bigl(|W_{1/n}| < 4M/n,\ |W_{2/n} - W_{1/n}| < 4M/n,\ |W_{3/n} - W_{2/n}| < 4M/n\bigr) \\
&= 2n\,P(|W_{1/n}| < 4M/n)\,P(|W_{2/n} - W_{1/n}| < 4M/n)\,P(|W_{3/n} - W_{2/n}| < 4M/n) \\
&= 2n\,\bigl(P(|W_{1/n}| < 4M/n)\bigr)^3 \\
&\le cn\Bigl(\frac{4M}{\sqrt{n}}\Bigr)^3,
\end{aligned}
\]

which tends to 0 as n → ∞. Hence for each M and h,

\[ P(A_{M,h}) \le \limsup_{n\to\infty} P(B_n) = 0. \]
This implies that the probability that there exists s ≤ 1 such that
\[ \limsup_{h\to 0} \frac{|W_{s+h} - W_s|}{|h|} \le M \]
is zero. Since M is arbitrary, this proves the theorem.
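To get a feel for how quickly the bound on P(B_n) decays, here is a minimal Python sketch. It relies on the fact that $W_{1/n}$ is normal with variance 1/n, so $P(|W_{1/n}| < 4M/n) = \operatorname{erf}(4M/\sqrt{2n})$. The function name `bn_upper_bound` and the choice M = 1 are illustrative assumptions, not part of the text.

```python
import math


def bn_upper_bound(n: int, M: float) -> float:
    """Evaluate 2n * P(|W_{1/n}| < 4M/n)**3, the bound on P(B_n) in the proof.

    W_{1/n} has the law of Z / sqrt(n) with Z standard normal, hence
    P(|W_{1/n}| < 4M/n) = P(|Z| < 4M/sqrt(n)) = erf(4M / sqrt(2n)).
    """
    p_single = math.erf(4.0 * M / math.sqrt(2.0 * n))
    return 2.0 * n * p_single ** 3


if __name__ == "__main__":
    # Illustration with the assumed value M = 1.
    for n in (10**2, 10**4, 10**6, 10**8):
        # Cruder bound obtained from (7.5) with r = 4M/sqrt(n).
        crude = 2.0 * n * (8.0 / math.sqrt(n)) ** 3
        print(n, bn_upper_bound(n, 1.0), crude)
```

Both columns decay like $n^{-1/2}$, which is the reason three consecutive increments are used: with only two increments the corresponding bound would be of order 1 and would not tend to 0.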
Exercises
7.1 Here you are asked to find a more precise description of the modulus of continuity of Brownian
paths. Prove that
\[ \lim_{\delta\to 0}\ \sup_{s,t\in[0,1],\,0<|t-s|<\delta} \frac{|W_t - W_s|}{\sqrt{\delta\log(1/\delta)}} < \infty, \quad \text{a.s.} \]
Hint: Imitate the proof of Theorem 7.1.
7.2 The following is part of what is known as Chung’s law of the iterated logarithm. We will see in
Section 40.3 that there exists c1 such that
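\[ P\bigl(\sup_{s\le t}|W_s| \le \lambda\bigr) \le c_1 e^{-\pi^2 t/(8\lambda^2)} \]
for $t/\lambda^2$ sufficiently large. Prove that
\[ \liminf_{t\to\infty} \frac{\sup_{s\le t}|W_s|}{\sqrt{t/\log\log t}} < \infty, \quad \text{a.s.} \]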
7.3 Let Wt be a one-dimensional Brownian motion. We will see in Section 40.3 that there exists c2
such that
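\[ P\bigl(\sup_{s\le t}|W_s| \le \lambda\bigr) \ge c_2 e^{-\pi^2 t/(8\lambda^2)} \]
if $t/\lambda^2$ is sufficiently large. Prove that
\[ \liminf_{t\to\infty} \frac{\sup_{s\le t}|W_s|}{\sqrt{t/\log\log t}} > 0, \quad \text{a.s.} \]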

This is the other half of Chung’s law of the iterated logarithm. In fact,

\[ \liminf_{t\to\infty} \frac{\sup_{s\le t}|W_s|}{\sqrt{t/\log\log t}} = c, \quad \text{a.s.} \tag{7.6} \]

Identify c and prove (7.6).

7.4 A function f is Hölder continuous of order α at a point t if there exists c such that $|f(u) - f(t)| \le c|u-t|^{\alpha}$ for all u. Suppose α > 1/2 and $W_t$ is a Brownian motion. Show that the event

\[ A = \{\exists\, t \in [0,1] : W \text{ is Hölder continuous of order } \alpha \text{ at } t\} \]

has probability 0.

Hint: Imitate the proof of nowhere differentiability, but use more than three time intervals.


7.5 Let W be a one-dimensional Brownian motion and let Mt = sups≤t Ws (with no absolute value

signs). Prove that if ε > 0, then

\[ \liminf_{t\to\infty} \frac{M_t}{\sqrt{t}/(\log t)^{1+\varepsilon}} > 0, \quad \text{a.s.} \]

7.6 This is a complement to Exercise 4.10. Prove that if p > 2 and W is a Brownian motion, then

the p-variation of W , defined in Exercise 4.10, is finite, a.s.

Hint: Use the fact that the paths of Brownian motion are Hölder continuous of order α

if α < 1/2.
7.7 Let W be a Brownian motion and let Z be the zero set: $Z = \{t \in [0,1] : W_t = 0\}$.
(1) Show there exists a constant c not depending on x or δ such that
\[ P(\exists s \le \delta : W_s = -x) \le P\bigl(\sup_{s\le\delta}|W_s| \ge |x|\bigr) \le c e^{-x^2/2\delta}. \]
(2) Use the Markov property of Brownian motion to show that there exists a constant c not depending on s or t such that
\[ P(Z \cap [s,t] \ne \emptyset) \le c\Bigl(1 \wedge \sqrt{\frac{t-s}{s}}\Bigr). \]
7.8 Given a Borel measurable subset A of [0, 1], define
\[ H_\gamma(A) = \limsup_{\delta\to 0}\Bigl[\inf\Bigl\{\sum_{i=1}^{\infty} [b_i - a_i]^{\gamma} : A \subset \cup_{i=1}^{\infty}[a_i,b_i],\ \sup_i |b_i - a_i| \le \delta\Bigr\}\Bigr]. \]
In other words, cover A by the union of intervals $[a_i, b_i]$ and define the analog of Lebesgue measure. The differences are that we look at $|b_i - a_i|^{\gamma}$ but do not require that γ be one, and we require that none of the intervals be longer than δ. The quantity $H_\gamma(A)$ is called the Hausdorff measure of A with respect to the function $x^{\gamma}$. The Hausdorff dimension of a set A is defined to be
\[ \inf\{\gamma : H_\gamma(A) = 0\} = \sup\{\gamma : H_\gamma(A) = \infty\}. \]

(For subsets of Rd , we replace the intervals [ai, bi] by balls of radius ri.) As a warm-up to this

exercise, prove that the Hausdorff dimension of the standard Cantor set in [0, 1] is log 2/ log 3.

The purpose of this exercise is to show that if W is a Brownian motion and Z = {t ∈ [0, 1] :

Wt = 0} is the zero set, then the Hausdorff dimension of Z is no more than 1/2.

(1) For each n, let $C_n$ be the collection of intervals $[i/2^n, (i+1)/2^n]$ contained in [0, 1] that intersect Z. ($C_n$ is random.) If $\#C_n$ is the cardinality of $C_n$, use Exercise 7.7 to show

\[ E[\#C_n] \le \sum_{i=0}^{2^n-1} P\bigl(Z \cap [i/2^n, (i+1)/2^n] \ne \emptyset\bigr) \le c2^{n/2}. \]

(2) Write

\[ \sum_{[i/2^n,(i+1)/2^n]\in C_n} |2^{-n}|^{\gamma} = 2^{-n\gamma}\,\#C_n. \]

Use the Chebyshev inequality and (1) to conclude that the Hausdorff dimension of Z is less than

or equal to 1/2, a.s. (We will show that it is at least 1/2 in Exercise 14.10.)

8

The continuity of paths

It is often important to know whether a stochastic process has continuous paths. An important sufficient condition is the Kolmogorov continuity criterion. This criterion is also useful in showing the continuity of a family of random variables $X^a$ in the variable a, where a is a parameter other than time. Kolmogorov’s continuity criterion is part (2) of Theorem 8.1.

Let $D_n = \{k/2^n : k \le 2^n\}$ and let $D = \cup_n D_n$. The set D is known as the set of dyadic rationals in [0, 1]. We will use

\[ \sum_{i=1}^{\infty} i^{-2} \le 1 + \int_1^{\infty} x^{-2}\,dx = 2. \]

(In fact, by a standard exercise using Parseval’s identity in the theory of Fourier series, $\sum_{i=1}^{\infty} i^{-2}$ is actually equal to $\pi^2/6$.)

We will be considering at first a real-valued process $\{X_t : t \in D\}$. To show continuity by considering $X_t - X_s$ for all pairs (s, t) doesn’t work – there are too many pairs. Kolmogorov’s proof circumvents this problem by considering only a restricted collection of pairs. To bound $X_{15/32} - X_{11/32}$, for example, we compare $X_{15/32}$ to $X_{7/16}$, compare $X_{7/16}$ to $X_{3/8}$, and compare $X_{3/8}$ to $X_{1/4}$, and we also compare $X_{11/32}$ to $X_{5/16}$ and compare $X_{5/16}$ to $X_{1/4}$. The advantage of this complicated way of matching pairs is that each comparison, say, for example $X_{3/8}$ to $X_{1/4}$, is used for a great many of the possible pairs (s, t).
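The chain of comparisons in this example can be generated mechanically. The sketch below is a minimal illustration, assuming that each point is rounded down to the next coarser dyadic level; this reproduces the chains quoted above, whereas the proof of Theorem 8.1 rounds to the closest multiple instead. The function name `dyadic_chain` is hypothetical.

```python
import math
from fractions import Fraction
from typing import List


def dyadic_chain(t: Fraction, n: int, m: int) -> List[Fraction]:
    """Round t (a point of D_m) down to levels j = m, m-1, ..., n.

    Consecutive equal values are dropped, so each adjacent pair in the
    returned list is one comparison X_u versus X_v in the chaining argument.
    """
    chain: List[Fraction] = []
    for j in range(m, n - 1, -1):
        # Largest multiple of 2^-j that is less than or equal to t.
        approx = Fraction(math.floor(t * 2**j), 2**j)
        if not chain or chain[-1] != approx:
            chain.append(approx)
    return chain


if __name__ == "__main__":
    print(dyadic_chain(Fraction(15, 32), 2, 5))  # 15/32, 7/16, 3/8, 1/4
    print(dyadic_chain(Fraction(11, 32), 2, 5))  # 11/32, 5/16, 1/4
```

Both chains end at 1/4, so the difference $X_{15/32} - X_{11/32}$ is a sum of the five comparisons listed in the text.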

The proof of Theorem 8.1 has three main steps. Step 1 is to reduce the problem to proving

the bound (8.3). The second step is to set up the comparisons that we need, and the third is

to obtain estimates on all the comparisons.

Theorem 8.1 Suppose $\{X_t : t \in D\}$ is a real-valued process and there exist $c_1$, ε, and p > 0 such that

\[ E[|X_t - X_s|^p] \le c_1|t-s|^{1+\varepsilon}, \quad s, t \in D. \tag{8.1} \]

Then the following hold.

(1) There exists $c_2$ depending only on $c_1$, p, and ε such that for M > 0,

\[ P\Bigl(\sup_{s,t\in D,\,s\ne t} \frac{|X_t - X_s|}{|t-s|^{\varepsilon/4p}} \ge M\Bigr) \le c_2/M^p. \tag{8.2} \]

(2) With probability one, $X_t$ is uniformly continuous on D.

Proof Step 1. Let $\lambda_n = M2^{-(n+1)\varepsilon/4p}$ and

\[ A_n = \{|X_t - X_s| \ge \lambda_n \text{ for some } s, t \in D \text{ with } |t-s| \le 2^{-n}\}. \]


Recall our convention that the letter c denotes unimportant constants which can change from

line to line. We will show

\[ P(A_n) \le c2^{-n\varepsilon/4}M^{-p}. \tag{8.3} \]

This implies (1) and (2) as follows. If $|X_t - X_s| \ge M|t-s|^{\varepsilon/4p}$ for some s, t ∈ D with s ≠ t, choose n such that $2^{-(n+1)} < |t-s| \le 2^{-n}$, and then $A_n$ holds. The event on the left-hand side of (8.2) is contained in $\cup_n A_n$, and using (8.3) shows that
\[ P(\cup_n A_n) \le cM^{-p}\sum_{n=1}^{\infty} 2^{-n\varepsilon/4} = cM^{-p}, \]
which implies (1). Let
\[ B_M = \Bigl\{\sup_{s,t\in D,\,s\ne t} |X_t - X_s|/|t-s|^{\varepsilon/4p} \ge M\Bigr\}. \]
Note $B_M$ decreases as M increases and from (1) we have $P(\cap_{M=1}^{\infty} B_M) = 0$. Thus except for an event of probability zero, each ω is in $B_M^c$ for some M (where M depends on ω), and this implies (2). Thus we must show (8.3).
Step 2. Define a(j, t) to be the integer multiple of $2^{-j}$ that is closest to t (if there are two different multiples that are equally close, we use some convention to break the tie). If $t \in D_m$, then a(m, t) = t. If $|t-s| \le 2^{-n}$, then $|a(n,t) - a(n,s)| \le 2^{-n+2}$.
Now if $s, t \in D_m$ and m ≥ n, we use the triangle inequality to write
\[
\begin{aligned}
|X_t - X_s| &= |X_{a(m,t)} - X_{a(m,s)}| \\
&\le |X_{a(n,t)} - X_{a(n,s)}| \\
&\quad + |X_{a(n+1,t)} - X_{a(n,t)}| + \cdots + |X_{a(m,t)} - X_{a(m-1,t)}| \\
&\quad + |X_{a(n+1,s)} - X_{a(n,s)}| + \cdots + |X_{a(m,s)} - X_{a(m-1,s)}|.
\end{aligned}
\tag{8.4}
\]
If $|X_{a(n,t)} - X_{a(n,s)}| < \lambda_n/2$ and for each i
\[ |X_{a(n+i+1,t)} - X_{a(n+i,t)}| < \frac{\lambda_n}{8(i+1)^2} \]
and the same with t replaced by s, then by (8.4)
\[ |X_t - X_s| < \frac{\lambda_n}{2} + 2\sum_{i=0}^{\infty} \frac{\lambda_n}{8(i+1)^2} \le \lambda_n. \]
Hence if $|X_t - X_s| \ge \lambda_n$ for some $s, t \in D_m$, then at least one of the events E, $F_i$, or $G_i$, i ≥ 0, must hold, where
\[
\begin{aligned}
E &= \{|X_{a(n,t)} - X_{a(n,s)}| \ge \lambda_n/2 \text{ for some } s, t \in D_n \text{ with } |s-t| \le 2^{-n}\}, \\
F_i &= \{|X_{a(n+i+1,t)} - X_{a(n+i,t)}| \ge \lambda_n/(8(i+1)^2) \text{ for some } t\}, \\
G_i &= \{|X_{a(n+i+1,s)} - X_{a(n+i,s)}| \ge \lambda_n/(8(i+1)^2) \text{ for some } s\}.
\end{aligned}
\]
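Step 3. For the event E to hold, we must have $|X_r - X_q| \ge \lambda_n/2$ for some $q, r \in D_n$ with $|q-r| \le 2^{-n+2}$. There are at most $c2^n$ such pairs (q, r), so the probability of E is bounded, using Chebyshev’s inequality and (8.1), by
\[
\begin{aligned}
(c2^n) \sup_{q\in D_n,\,r\in D_{n+1},\,|r-q|\le 2^{-n+2}} P(|X_r - X_q| \ge \lambda_n/2)
&\le \frac{c2^n \sup_{q\in D_n,\,r\in D_{n+1},\,|r-q|\le 2^{-n+2}} E[|X_r - X_q|^p]}{(\lambda_n/2)^p} \\
&\le \frac{c2^n}{\lambda_n^p}\,(2^{-n+2})^{1+\varepsilon} \\
&\le \frac{c2^{-n\varepsilon}}{\lambda_n^p}.
\end{aligned}
\]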
For $F_i$ to hold, that is, for $|X_{a(n+i+1,t)} - X_{a(n+i,t)}|$ to be greater than $\lambda_n/(8(i+1)^2)$ for some t, we must have $|X_r - X_q| \ge \lambda_n/(8(i+1)^2)$ for some $r \in D_{n+i}$, $q \in D_{n+i+1}$ with $|r-q| \le 2^{-n-i+2}$. There are at most $c2^{n+i}$ such pairs, and so the probability of $F_i$ is bounded by
\[
\begin{aligned}
(c2^{n+i}) \sup_{r\in D_{n+i},\,q\in D_{n+i+1},\,|r-q|\le 2^{-n-i+2}} P\Bigl(|X_r - X_q| \ge \frac{\lambda_n}{8(i+1)^2}\Bigr)
&\le \frac{c2^{n+i}\,2^{(-n-i+2)(1+\varepsilon)}\,(8(i+1)^2)^p}{\lambda_n^p} \\
&\le \frac{c2^{-n\varepsilon}2^{-i\varepsilon/2}}{\lambda_n^p}.
\end{aligned}
\]
Here we used the fact that $2^{-i\varepsilon}(i+1)^{2p} \le c2^{-i\varepsilon/2}$ for some constant c depending on p and ε but not i. We have the same bound for $G_i$. Therefore
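\[ P\bigl(\cup_i (F_i \cup G_i) \cup E\bigr) \le \sum_{i=0}^{\infty} \frac{c2^{-n\varepsilon/2}2^{-i\varepsilon/2}}{\lambda_n^p} + \frac{c2^{-n\varepsilon/2}}{\lambda_n^p} \le c2^{-n\varepsilon/2}\lambda_n^{-p}. \]
Letting m → ∞ we have
\[ P(A_n) \le c2^{-n\varepsilon/2}\lambda_n^{-p} = c2^{-n\varepsilon/4}M^{-p} \]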
as required.
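The last equality absorbs a constant into c. As a check, substituting $\lambda_n = M2^{-(n+1)\varepsilon/4p}$ (so that $\lambda_n^{-p} = M^{-p}2^{(n+1)\varepsilon/4}$) gives the following; this worked step is an added illustration, not part of the original proof.

```latex
\[
c\,2^{-n\varepsilon/2}\,\lambda_n^{-p}
  = c\,2^{-n\varepsilon/2}\,M^{-p}\,2^{(n+1)\varepsilon/4}
  = \bigl(c\,2^{\varepsilon/4}\bigr)\,2^{-n\varepsilon/4}\,M^{-p}.
\]
```

The factor $c\,2^{\varepsilon/4}$ does not depend on n or M, so under the convention that c may change from line to line it is again written c.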
The proof of Theorem 8.1 is an example of what is known as a metric entropy or chaining
argument.
In the above, the only place we relied on the fact that we were using real-valued processes
was in using the triangle inequality. Therefore with only slight changes in notation, we have
the following theorem.
Theorem 8.2 Suppose X takes values in some metric space S with metric $d_S$ and there exist $c_1$, ε, and p > 0 such that

\[ E[d_S(X_s, X_t)^p] \le c_1|t-s|^{1+\varepsilon}, \quad s, t \in D. \tag{8.5} \]

Then the following hold.

(1) There exists $c_2$ depending only on $c_1$, p, and ε such that for M > 0,

\[ P\Bigl(\sup_{s,t\in D,\,s\ne t} \frac{d_S(X_s, X_t)}{|t-s|^{\varepsilon/2p}} \ge M\Bigr) \le c_2/M^p. \]

(2) With probability one, $X_t$ is uniformly continuous on D.

Remark 8.3 Theorem 8.2 holds for random variables indexed by time, but the analogous result holds for the continuity in a of random variables $X^a$ indexed by some parameter a running through D. We may also let the parameter a run instead through the dyadic rationals in $[b_1, b_2]$ for any $b_1 < b_2$.
The proof of the following corollary is an adaptation of the proof of Theorem 8.1 and is
left as Exercise 8.1.
Corollary 8.4 Suppose there exist $c_1$, ε, N, and p > 0 such that if n ≤ N,

\[ E[d_S(X_s, X_t)^p] \le c_1|t-s|^{1+\varepsilon}, \quad s, t \in D_n. \]

Then there exists $c_2$ depending on $c_1$, ε, and p but not N such that for M > 0 and n ≤ N we have

\[ P\Bigl(\sup_{s,t\in D_n,\,s\ne t} \frac{d_S(X_s, X_t)}{|t-s|^{\varepsilon/2p}} \ge M\Bigr) < c_2M^{-p}. \]
Recall the definition of Hölder continuity from (7.1).
Proposition 8.5 If α < 1/2, then the paths of a one-dimensional Brownian motion $\{W_t; 0 \le t \le 1\}$ are Hölder continuous of order α with probability one.
Proof By the stationary increments property and scaling,
\[ E|W_t - W_s|^p = E|W_{t-s}|^p = |t-s|^{p/2}\,E|W_1|^p. \]
If α < 1/2, choose p large enough so that ((p/2) − 1)/p > α and then take ε = (p/2) − 1.

(Here ε is large!) Take γ sufficiently small that (ε/p) − γ > α. Then by Exercise 8.2 the

paths of Wt are Hölder continuous of order α, with probability one, provided we restrict

t to D. But the paths of Brownian motion are continuous, so we see that we have Hölder

continuity of order α when t ∈ [0, 1].
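As an illustration of how the parameters in this proof can be chosen, take the hypothetical target value α = 0.4. The numbers below are one admissible choice, added here as a worked example.

```latex
\[
\alpha = 0.4:\qquad \frac{(p/2)-1}{p} = \frac12 - \frac1p > 0.4 \iff p > 10 .
\]
\[
p = 20:\qquad \varepsilon = \frac p2 - 1 = 9,\qquad \frac{\varepsilon}{p} = 0.45,\qquad
\gamma = 0.04:\qquad \frac{\varepsilon}{p} - \gamma = 0.41 > 0.4 .
\]
```

With these values the moment bound reads $E|W_t - W_s|^{20} = |t-s|^{10}E|W_1|^{20}$, which is (8.1) with $1 + \varepsilon = 10$. Letting p grow pushes $\varepsilon/p = 1/2 - 1/p$ toward 1/2, which is why every α < 1/2 can be reached.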

Exercises

8.1 Prove Corollary 8.4.

8.2 If the hypothesis of Theorem 8.1 holds and γ < ε/p, show that there exists $c_2$ depending only on $c_1$, ε, γ, and p such that for M > 0

\[ P\Bigl(\sup_{s,t\in D,\,s\ne t} \frac{d_S(X_s, X_t)}{|t-s|^{(\varepsilon/p)-\gamma}} \ge M\Bigr) \le c_2M^{-p}. \]


8.3 Suppose X is a real-valued process and there exist constants c1, c2 such that

\[ P(|X_t - X_s| > \lambda) \le c_1 e^{-c_2\lambda\log^4(1/|t-s|)}, \quad s, t \in [0,1]. \]

Prove that with probability one, X has a version which is uniformly continuous on the dyadic

rationals in [0, 1].

8.4 Suppose (Xt , t ∈ [0, 1]) is a mean zero Gaussian process and there exist c and ε such that

\[ \operatorname{Var}(X_t - X_s) \le c|t-s|^{\varepsilon}, \quad s, t \in [0,1]. \]

Prove that there is a version of X that has continuous paths on [0, 1].

8.5 Let X be as in Exercise 8.4. For what values α will X have paths that are Hölder continuous of

order α? (α will depend on ε.)

8.6 Let $\{X_{s,t} : s, t \in [0,1]\}$ be a collection of random variables. Suppose there exist c, p, and ε > 0 such that

\[ E|X_{s',t'} - X_{s,t}|^p \le c(|t'-t| + |s'-s|)^{2+\varepsilon}. \]

Prove that with probability one, the map $(s,t) \to X_{s,t}(\omega)$ is uniformly continuous on $D \times D = \{(s,t) : s, t \in D\}$.

9

Continuous semimartingales

Roughly speaking, a semimartingale is the sum of a martingale and a process whose paths

are of bounded variation. In this chapter we consider semimartingales whose paths are con-

tinuous. We will give definitions, and then investigate in more detail the class of martingales

that are square integrable. Finally we present a proof of the Doob–Meyer decomposition for

continuous supermartingales. The Doob–Meyer decomposition used to be considered a very

hard theorem, but at least in the continuous case, an elementary proof is possible. For a proof

for the general case, see Chapter 16.

9.1 Definitions

Let $\{\mathcal{F}_t\}$ be a filtration satisfying the usual conditions and let

\[ \mathcal{F}_\infty = \bigvee_{t\ge 0} \mathcal{F}_t = \sigma\Bigl(\bigcup_{t\ge 0} \mathcal{F}_t\Bigr). \]

We say a process X has increasing paths or that X is an increasing process if the functions

t → Xt (ω) are increasing with probability one. Throughout this book saying f is “increasing”

means that s < t implies f (s) ≤ f (t), while saying f is “strictly increasing” means that
s < t implies f (s) < f (t). A process X with paths of bounded variation is just what one
would expect: with probability one, the functions t → Xt (ω) are of bounded variation. We
say X has paths locally of bounded variation if there exist stopping times $R_n \to \infty$ such that the process $X_{t\wedge R_n}$ has paths of bounded variation for each n.
We turn to martingales. A martingale M is a uniformly integrable martingale if the family of random variables $\{M_t\}$ is uniformly integrable. A process X is a local martingale if there exist stopping times $R_n \to \infty$ such that $M^n_t = X_{t\wedge R_n}$ is a uniformly integrable martingale for each n. A martingale whose paths are continuous is called a continuous martingale and we similarly define a right-continuous martingale.
A semimartingale is a process X of the form Xt = Mt + At , where Mt is a local martingale
and At is a process whose paths are locally of bounded variation. As a consequence of
the Doob–Meyer decomposition we will see that submartingales and supermartingales are
semimartingales.
As an example, a Brownian motion Wt is a martingale and is a local martingale (let Rn
be identically equal to n), but is not a uniformly integrable martingale. We will define what
it means to be a square integrable martingale in the next section; Brownian motion is not a
square integrable martingale.
9.2 Square integrable martingales
Definition 9.1 A martingale is a square integrable martingale if there exists an $\mathcal{F}_\infty$ measurable random variable $M_\infty$ such that $E M_\infty^2 < \infty$ and $M_t = E[M_\infty \mid \mathcal{F}_t]$ for all t.
An example of a square integrable martingale would be $M_t = W_{t\wedge t_0}$, where $W_t$ is a Brownian motion and $t_0$ is a fixed time; in this case $M_\infty = W_{t_0}$.
Proposition 9.2 Let $\{\mathcal{F}_t\}$ be a filtration satisfying the usual conditions and M a right continuous process. The following are equivalent:
(1) $M_t$ is a square integrable martingale.
(2) M is a martingale with $\sup_{t\ge 0} E M_t^2 < \infty$.
(3) M is a martingale with $E[\sup_{t\ge 0} M_t^2] < \infty$.
Proof To show (1) implies (2), suppose M is a square integrable martingale. Then by Jensen’s inequality for conditional expectations (Proposition A.21),
\[ E M_t^2 = E[(E[M_\infty \mid \mathcal{F}_t])^2] \le E[E[M_\infty^2 \mid \mathcal{F}_t]] = E M_\infty^2. \]
To show (2) implies (3), for each N,
\[ E\bigl[\sup_{0\le t\le N} M_t^2\bigr] \le 4E M_N^2 \]
by Doob’s inequalities. That (2) implies (3) follows by letting N → ∞ and using Fatou’s lemma.
Now suppose (3) holds, and we will show (1) holds. Since $E M_n^2$ is uniformly bounded in n, the martingale convergence theorem (Theorem A.35) implies that $M_n$ converges almost surely and in $L^2$. Let us call the limit $M_\infty$; we have $E M_\infty^2 < \infty$ by the $L^2$ convergence. Since $E M_n^2$ is uniformly bounded, then $M_n$ is a uniformly integrable martingale, and by Proposition A.37, $M_n = E[M_\infty \mid \mathcal{F}_n]$. If n − 1 ≤ t ≤ n, we have
\[ M_t = E[M_n \mid \mathcal{F}_t] = E[E[M_\infty \mid \mathcal{F}_n] \mid \mathcal{F}_t] = E[M_\infty \mid \mathcal{F}_t], \]
as required.
For the remainder of this section all our martingales will have paths that are right continuous
with left limits.
Proposition 9.3 If M is a square integrable martingale and S ≤ T are finite stopping times, then $E[M_T \mid \mathcal{F}_S] = M_S$.
Proof Let $A \in \mathcal{F}_S$ and define $U(\omega) = S(\omega)1_A(\omega) + T(\omega)1_{A^c}(\omega)$. Thus U is equal to S if ω ∈ A and otherwise is equal to T. Since $A \in \mathcal{F}_S \subset \mathcal{F}_T$, then we have $(U \le t) = [(S \le t) \cap A] \cup [(T \le t) \cap A^c]$ is in $\mathcal{F}_t$, and therefore U is a stopping time. By Proposition 3.11,
\[ E M_0 = E M_U = E[M_S; A] + E[M_T; A^c] \]
and
\[ E M_0 = E M_T = E[M_T; A] + E[M_T; A^c]. \]
These two equations imply that $E[M_S; A] = E[M_T; A]$, which is what we needed to prove.
By Exercise 3.11, the conclusion is valid if M is a uniformly integrable martingale.
As an immediate corollary we have
Corollary 9.4 Suppose M is a square integrable martingale and T is a stopping time. Then $X_t = M_{t\wedge T}$ is a martingale with respect to $\{\mathcal{F}_{t\wedge T}\}$.
The proof of the following proposition is similar to that of Proposition 9.3. It may be
viewed as a converse of the optional stopping theorem.
Proposition 9.5 Suppose $\{\mathcal{F}_t\}$ is a filtration satisfying the usual conditions and M is a process that is adapted to $\{\mathcal{F}_t\}$ such that $M_t$ is integrable for each t. If $E M_T = 0$ for every bounded stopping time T, then $M_t$ is a martingale.
Proof Suppose s < t and $A \in \mathcal{F}_s$. Define T to be equal to s if ω ∈ A and equal to t if ω ∉ A. As in the proof of Proposition 9.3, but even more simply, T is a stopping time, so
\[ 0 = E M_T = E[M_s; A] + E[M_t; A^c]. \]
The fixed time t is a stopping time, hence
\[ 0 = E M_t = E[M_t; A] + E[M_t; A^c]. \]
Comparing, $E[M_t; A] = E[M_s; A]$, which proves M is a martingale.
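Proposition 9.6 Suppose $M_t$ is a square integrable martingale. Then
\[ E[(M_T - M_S)^2 \mid \mathcal{F}_S] = E[M_T^2 - M_S^2 \mid \mathcal{F}_S]. \tag{9.1} \]
Proof By Proposition 9.3
\[
\begin{aligned}
E[(M_T - M_S)^2 \mid \mathcal{F}_S] &= E[M_T^2 \mid \mathcal{F}_S] - 2M_S E[M_T \mid \mathcal{F}_S] + M_S^2 \\
&= E[M_T^2 \mid \mathcal{F}_S] - M_S^2 \\
&= E[M_T^2 - M_S^2 \mid \mathcal{F}_S]
\end{aligned}
\]
and we are done.
If we take expectations in (9.1), we obtain
\[ E[(M_T - M_S)^2] = E M_T^2 - E M_S^2. \tag{9.2} \]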
Theorem 9.7 Suppose $M_0 = 0$, $M_t$ is a continuous local martingale, and the paths of $M_t$ are locally of bounded variation. Then M is identically 0, a.s., that is, $P(M_t = 0 \text{ for all } t) = 1$.
Proof Using the definition of local martingale, it suffices to suppose M is a continuous uniformly integrable martingale. Let $t_0$ be fixed and let $A_t$ denote the total variation of the paths of M up to time t. If $T_N = \inf\{t : A_t \ge N\}$, we look at $M^N_t = M_{T_N\wedge t\wedge t_0}$. Using Proposition 9.3 and the remark following it, we see that $M^N$ is also a continuous martingale with paths of bounded variation, and if $M^N$ is identically zero, then letting N → ∞ and $t_0 \to \infty$, we obtain our result. Therefore it suffices to suppose the total variation of $M_t$ is bounded by N, a.s. In particular, $M_t$ is bounded by N.
Let n ≥ 1 and set
\[ V_n = \sup_{k\le 2^n-1} |M_{(k+1)t_0/2^n} - M_{kt_0/2^n}|. \]
Note $V_n \le 2N$, a.s., and $V_n \to 0$, a.s., as n → ∞ by the uniform continuity of the paths of M on $[0, t_0]$. By dominated convergence, $E V_n \to 0$ as n → ∞. We write
\[
\begin{aligned}
E M_{t_0}^2 &= E\Bigl[\sum_{k=0}^{2^n-1} \bigl(M_{(k+1)t_0/2^n}^2 - M_{kt_0/2^n}^2\bigr)\Bigr] \\
&= E\Bigl[\sum_{k=0}^{2^n-1} \bigl(M_{(k+1)t_0/2^n} - M_{kt_0/2^n}\bigr)^2\Bigr] \\
&\le E\Bigl[V_n \sum_{k=0}^{2^n-1} |M_{(k+1)t_0/2^n} - M_{kt_0/2^n}|\Bigr] \\
&\le N E V_n.
\end{aligned}
\]
The second equality follows by (9.2). Since n is arbitrary and $E V_n \to 0$, then $E M_{t_0}^2 = 0$. By Doob’s inequalities, $E[\sup_{s\le t_0} M_s^2] = 0$. Hence M is identically 0 up to time $t_0$.
9.3 Quadratic variation
Definition 9.8 A continuous square integrable martingale $M_t$ has quadratic variation $\langle M\rangle_t$ (sometimes written $\langle M, M\rangle_t$) if $M_t^2 - \langle M\rangle_t$ is a martingale, where $\langle M\rangle_t$ is a continuous adapted increasing process with $\langle M\rangle_0 = 0$.
In the case where W is a Brownian motion, $t_0$ is fixed, and $M_t = W_{t\wedge t_0}$, the quadratic variation of M is just $\langle M\rangle_t = t \wedge t_0$ by Example 3.3. Brownian motion itself does not fit
perfectly into the framework of stochastic integration because it is not a square integrable
martingale, although it is a martingale; we will be dealing with this point several times in
what follows.
We will show existence and uniqueness of 〈M〉t by means of the Doob–Meyer decompo-
sition, Theorem 9.12, below. However we defer the proof of the Doob–Meyer decomposition
until the next section. A process Z is of class D if {ZT : T a finite stopping time} is a
uniformly integrable family of random variables.
Theorem 9.9 Let $M_t$ be a continuous square integrable martingale. There exists a continuous adapted increasing process $\langle M\rangle_t$ with $\langle M\rangle_0 = 0$ and with increasing paths such that $M_t^2 - \langle M\rangle_t$ is a martingale.
If $A_t$ is a continuous adapted increasing process such that $M_t^2 - A_t$ is a martingale, then $P(A_t \ne \langle M\rangle_t \text{ for some } t) = 0$.
Proof By Jensen’s inequality for conditional expectations,
\[ E[M_t^2 \mid \mathcal{F}_s] \ge (E[M_t \mid \mathcal{F}_s])^2 = M_s^2 \]
if s < t, and so $M_t^2$ is a submartingale. Since $M_\infty$ is square integrable, given ε there exists δ such that $E[M_\infty^2; A] < \varepsilon$ if P(A) < δ. Since $M_t^2$ is a submartingale, if $K > E M_\infty^2/\delta$, then
\[ P(M_t^2 > K) \le E M_t^2/K \le E M_\infty^2/K < \delta, \]
and consequently
\[ E[M_t^2; M_t^2 > K] \le E[M_\infty^2; M_t^2 > K] < \varepsilon. \]
By Exercise 3.11, $M_t^2$ is of class D. Applying the Doob–Meyer decomposition (Theorem 9.12) to $-M_t^2$, we write $-M_t^2 = N_t - B_t$, where $N_t$ is a martingale and $B_t$ has increasing paths. We then set $\langle M\rangle_t = B_t$. The uniqueness follows from the uniqueness part of the Doob–Meyer decomposition.
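In view of Proposition 9.3 and the definition of $\langle M\rangle$, we have
\[ E[(M_T - M_S)^2 - (\langle M\rangle_T - \langle M\rangle_S) \mid \mathcal{F}_S] = E[M_T^2 - M_S^2 - (\langle M\rangle_T - \langle M\rangle_S) \mid \mathcal{F}_S] = 0 \tag{9.3} \]
if S and T are finite stopping times and M is a continuous square integrable martingale.
If M and N are two square integrable martingales, we define $\langle M, N\rangle_t$ by
\[ \langle M, N\rangle_t = \tfrac12\bigl[\langle M+N\rangle_t - \langle M\rangle_t - \langle N\rangle_t\bigr]. \tag{9.4} \]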
This is sometimes called the covariation of M and N .
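As a consistency check on (9.4), added here as a worked step, take N = M for a continuous square integrable martingale M. Since $(2M_t)^2 - 4\langle M\rangle_t = 4(M_t^2 - \langle M\rangle_t)$ is a martingale, the uniqueness part of Theorem 9.9 gives $\langle 2M\rangle_t = 4\langle M\rangle_t$, and therefore

```latex
\[
\langle M, M\rangle_t
  = \tfrac12\bigl[\langle 2M\rangle_t - \langle M\rangle_t - \langle M\rangle_t\bigr]
  = \tfrac12\bigl[4\langle M\rangle_t - 2\langle M\rangle_t\bigr]
  = \langle M\rangle_t .
\]
```

This agrees with the alternative notation $\langle M, M\rangle_t$ for the quadratic variation mentioned in Definition 9.8.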
An alternative representation of 〈M〉t is the following. A proof could be given now, but it
is a bit messy. After we have Itô’s formula this will be easier.
Theorem 9.10 Let M be a square integrable martingale and let $t_0 > 0$. Then $\langle M\rangle_{t_0}$ is the limit in probability of

\[ \sum_{k=0}^{[2^n t_0]} (M_{(k+1)/2^n} - M_{k/2^n})^2, \]

where $[2^n t_0]$ is the largest integer less than or equal to $2^n t_0$.
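For the special case of Brownian motion stopped at a fixed time, where the quadratic variation is known to be $t \wedge t_0$, the convergence of the dyadic sums can be observed numerically. The following is a minimal simulation sketch, assuming independent Gaussian increments generated with Python's standard `random` module; the function name, the seed, and the mesh are illustrative choices, and the sum runs over the complete dyadic intervals in $[0, t_0]$, which matches the sum above when $t_0$ is a dyadic rational.

```python
import math
import random


def realized_quadratic_variation(t0: float, n: int, seed: int = 0) -> float:
    """Sum of squared increments of a simulated Brownian path on [0, t0].

    The mesh is 2^-n, mirroring the dyadic partition in Theorem 9.10.
    """
    rng = random.Random(seed)
    dt = 2.0 ** (-n)
    steps = int(t0 * 2**n)  # number of complete dyadic intervals in [0, t0]
    total = 0.0
    for _ in range(steps):
        # Increment W_{(k+1)/2^n} - W_{k/2^n} is normal with variance dt.
        dW = rng.gauss(0.0, math.sqrt(dt))
        total += dW * dW
    return total


if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, realized_quadratic_variation(1.0, n, seed=1))
```

The printed values should settle near $t_0 = 1$ as n grows; the sum has mean $t_0$ and variance $2t_0 2^{-n}$ when $t_0$ is dyadic, so the fluctuations shrink like $2^{-n/2}$.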

9.4 The Doob–Meyer decomposition

In this section we give a proof of the Doob–Meyer decomposition for continuous super-

martingales. First we need the following inequality, which has many other uses as well.

Proposition 9.11 Suppose $A^1$ and $A^2$ are two increasing adapted continuous processes starting at zero with $A^i_\infty = \lim_{t\to\infty} A^i_t < \infty$, a.s., i = 1, 2, and suppose there exists a positive real K such that for all t,
\[ E[A^i_\infty - A^i_t \mid \mathcal{F}_t] \le K, \quad \text{a.s.}, \quad i = 1, 2. \tag{9.5} \]
Let $B_t = A^1_t - A^2_t$. Suppose there exists a non-negative random variable V with $E V^2 < \infty$ such that for all t,
\[ |E[B_\infty - B_t \mid \mathcal{F}_t]| \le E[V \mid \mathcal{F}_t], \quad \text{a.s.} \tag{9.6} \]
Then
\[ E \sup_{t\ge 0} B_t^2 \le 8E V^2 + 8\sqrt{2}\,K(E V^2)^{1/2}. \tag{9.7} \]
Proof We start by showing
\[ E(A^i_\infty)^2 \le 2K^2, \quad i = 1, 2. \tag{9.8} \]
First suppose $A^i_\infty$ is bounded by a positive real number L. Note that we have $E A^i_\infty = E[E[A^i_\infty - A^i_0 \mid \mathcal{F}_0]] \le K$. A simple calculation shows that
\[ (A^i_\infty)^2 = 2\int_0^\infty (A^i_\infty - A^i_t)\,dA^i_t. \]
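The calculation can be spelled out as follows; this is an added worked step that uses only the continuity of $A^i$, the fact that it is increasing with $A^i_0 = 0$, and the boundedness of $A^i_\infty$. For a continuous increasing function the change of variables formula gives $\int_0^\infty A^i_t\,dA^i_t = \tfrac12 (A^i_\infty)^2$, hence

```latex
\[
2\int_0^\infty (A^i_\infty - A^i_t)\,dA^i_t
  = 2A^i_\infty \int_0^\infty dA^i_t - 2\int_0^\infty A^i_t\,dA^i_t
  = 2(A^i_\infty)^2 - (A^i_\infty)^2
  = (A^i_\infty)^2 .
\]
```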
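We then have, using Proposition 3.14,
\[
\begin{aligned}
E(A^i_\infty)^2 &= 2E\int_0^\infty (A^i_\infty - A^i_t)\,dA^i_t \\
&= 2E\int_0^\infty (E[A^i_\infty \mid \mathcal{F}_t] - A^i_t)\,dA^i_t \\
&= 2E\int_0^\infty E[A^i_\infty - A^i_t \mid \mathcal{F}_t]\,dA^i_t \\
&\le 2KE\int_0^\infty dA^i_t = 2KE A^i_\infty \le 2K^2.
\end{aligned}
\]
If we let $T_L = \inf\{t : A^1_t + A^2_t \ge L\}$ and $A^{i,L}_t = A^i_{t\wedge T_L}$, then (9.5) still holds if we replace $A^i_t$ by $A^{i,L}_t$. We obtain $E(A^{i,L}_\infty)^2 \le 2K^2$, and then letting L → ∞ and using Fatou’s lemma proves (9.8).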
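We next write
\[ B_\infty^2 = 2\int_0^\infty (B_\infty - B_t)\,dB_t, \]
and hence
\[
\begin{aligned}
E B_\infty^2 &= 2E\int_0^\infty E[B_\infty - B_t \mid \mathcal{F}_t]\,dB_t \\
&\le E\int_0^\infty E[V \mid \mathcal{F}_t]\,d(A^1_t + A^2_t) \\
&= E\int_0^\infty V\,d(A^1_t + A^2_t) \\
&= E[V(A^1_\infty + A^2_\infty)].
\end{aligned}
\]
The bound (9.8) takes care of the integrability concerns. By the Cauchy–Schwarz inequality we obtain
\[ E B_\infty^2 \le \bigl(E[(A^1_\infty + A^2_\infty)^2]\bigr)^{1/2}(E V^2)^{1/2} \le 2\sqrt{2}\,K(E V^2)^{1/2}. \]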
Now let $M_t = E[B_\infty \mid \mathcal{F}_t]$, $N_t = E[V \mid \mathcal{F}_t]$, where we take the right-continuous versions (see Corollary 3.13), and let $X_t = M_t - B_t$. We have
\[ |X_t| = |E[B_\infty - B_t \mid \mathcal{F}_t]| \le N_t, \]
and using Doob’s inequalities,
\[ E \sup_{t\ge 0} X_t^2 \le E \sup_{t\ge 0} N_t^2 \le 4E N_\infty^2 = 4E V^2. \]
Also by Doob’s inequalities,
\[ E \sup_{t\ge 0} M_t^2 \le 4E M_\infty^2 = 4E B_\infty^2. \]
Since $\sup_{t\ge 0}|B_t| \le \sup_{t\ge 0}|X_t| + \sup_{t\ge 0}|M_t|$, our result follows.
We now prove the Doob–Meyer decomposition for continuous supermartingales. In view of the proof of Proposition A.30, we would like to let
\[ A_t = \int_0^t E\Bigl[\frac{dZ_s}{ds} \Bigm| \mathcal{F}_s\Bigr]\,ds, \]
but this doesn’t make sense. We instead define an approximation $A^h_t$ by (9.9) and show that $A^h_t$ converges to what we want as h → 0.
Theorem 9.12 Suppose $Z_t$ is a continuous adapted supermartingale of class D. Then there exists an increasing adapted continuous process $A_t$ with paths locally of bounded variation started at 0 and a continuous local martingale $M_t$ such that
\[ Z_t = M_t - A_t. \]
If $M'$ and $A'$ are two other such processes with $Z_t = M'_t - A'_t$, then $M_t = M'_t$ and $A_t = A'_t$ for all t, a.s.
Proof Let us prove the second assertion first. Let $S_N$ be the first time that $|M_t| + |M'_t|$ exceeds N. If
\[ Z_t = M_t - A_t = M'_t - A'_t, \]
then $M_{t\wedge S_N} - M'_{t\wedge S_N} = A_{t\wedge S_N} - A'_{t\wedge S_N}$ is a martingale whose paths are locally of bounded variation. By Theorem 9.7, $M_{t\wedge S_N} = M'_{t\wedge S_N}$, a.s. Since this is true for all N, then $M_t = M'_t$.
Now let us prove the existence of M and A. Let $T_N = \inf\{t : |Z_t| \ge N\} \wedge N$ and $Z^N_t = Z_{t\wedge T_N}$. By Exercise 9.2, $Z^N$ is a supermartingale. If we prove the decomposition $Z^N_t = M^N_t - A^N_t$ for each N, then by the uniqueness assertion, if $N_1 < N_2$, we have $A^{N_1}_t$ and $M^{N_1}_t$ agreeing with $A^{N_2}_t$ and $M^{N_2}_t$, respectively, for $t \le T_{N_1}$. Hence given t, we can choose N large enough so that $t \le T_N$ and then define $M_t = M^N_t$, $A_t = A^N_t$. Clearly this gives the desired decomposition. Thus we may suppose that $Z_t$ is bounded by some N and that $Z_t$ is constant for t ≥ N.
Let $V_\delta = \sup_{|t-s|\le\delta} |Z_t - Z_s|$. Since Z has continuous paths,
\[ V_\delta = \sup_{s,t\in\mathbb{Q}_+,\,|t-s|\le\delta} |Z_t - Z_s|, \]
and therefore $V_\delta$ is measurable with respect to $\mathcal{F}_\infty$. Since the paths of Z are uniformly continuous, $V_\delta \to 0$, a.s., as δ → 0, and since $|V_\delta| \le 2N$, we have by dominated convergence that $E V_\delta^2 \to 0$ as δ → 0.
We define
\[ A^h_t = \frac{1}{h}\int_0^t (Z_s - E[Z_{s+h} \mid \mathcal{F}_s])\,ds. \tag{9.9} \]
At this point we do not know even that $E[Z_{s+h} \mid \mathcal{F}_s]$ has any nice measurability properties (it is not a martingale, for example); let us assume that it has a version that has continuous paths, is adapted, and is jointly measurable in t and ω, and prove this fact a bit later on. Because Z is a supermartingale, $A^h$ is increasing. We have (note Exercise 9.6)
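\[
\begin{aligned}
E[A^h_\infty - A^h_t \mid \mathcal{F}_t] &= \frac{1}{h} E\Bigl[\int_t^\infty E[Z_s - Z_{s+h} \mid \mathcal{F}_s]\,ds \Bigm| \mathcal{F}_t\Bigr] \\
&= \frac{1}{h}\int_t^\infty E[Z_s - Z_{s+h} \mid \mathcal{F}_t]\,ds \\
&= \frac{1}{h} E\Bigl[\int_t^\infty Z_s\,ds - \int_{t+h}^\infty Z_s\,ds \Bigm| \mathcal{F}_t\Bigr] \\
&= \frac{1}{h} E\Bigl[\int_t^{t+h} Z_s\,ds \Bigm| \mathcal{F}_t\Bigr] \\
&= E\Bigl[\int_0^1 Z_{t+uh}\,du \Bigm| \mathcal{F}_t\Bigr].
\end{aligned}
\]
Since Z is bounded by N, it follows that $A^h$ satisfies (9.5). If k < h, then
\[ \bigl|E[(A^h_\infty - A^h_t) - (A^k_\infty - A^k_t) \mid \mathcal{F}_t]\bigr| = \Bigl|E\Bigl[\int_0^1 (Z_{t+uh} - Z_{t+uk})\,du \Bigm| \mathcal{F}_t\Bigr]\Bigr| \le E[V_h \mid \mathcal{F}_t]. \]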
Now apply Proposition 9.11 to see that $E \sup_{t\ge 0}(A^h_t - A^k_t)^2 \to 0$ as k, h → 0. This shows that whenever $h_n$ decreases to 0, then $A^{h_n}$ is a Cauchy sequence in a normed linear space, where the norm is given by
\[ \|X\| = \bigl(E \sup_{t\ge 0} |X_t|^2\bigr)^{1/2}, \tag{9.10} \]
which is complete by Exercise 9.5. Therefore there exists a limit A. Since
\[ E \sup_{t\ge 0} (A^h_t - A_t)^2 \to 0 \]
as h → 0, there exists a subsequence $h_n \to 0$ such that $\sup_{t\ge 0}(A^{h_n}_t - A_t)^2 \to 0$, a.s., which proves that $A_t$ is continuous and increasing.
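We calculate
\[
\begin{aligned}
E[A_\infty - A_t \mid \mathcal{F}_t] &= \lim_{h\to 0} E[A^h_\infty - A^h_t \mid \mathcal{F}_t] \\
&= \lim_{h\to 0} E\Bigl[\int_0^1 Z_{t+uh}\,du \Bigm| \mathcal{F}_t\Bigr] \\
&= E\Bigl[\int_0^1 Z_t\,du \Bigm| \mathcal{F}_t\Bigr] = Z_t.
\end{aligned}
\]
Therefore
\[ Z_t = E[A_\infty \mid \mathcal{F}_t] - A_t, \]
which is the decomposition of Z into a martingale minus an increasing process.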
Fix h. It remains to show that there is a version of $E[Z_{s+h} \mid \mathcal{F}_s]$ that is a continuous jointly measurable adapted process. Define $Y_t = Z_{t+h}$ and define $Y^n_t$ to be equal to $Y_{k/2^n}$ if $k/2^n \le t < (k+1)/2^n$. Take the right-continuous version $\tilde{Y}^{k,n}_t$ of the martingale $E[Y_{k/2^n} \mid \mathcal{F}_t]$ (see Corollary 3.13) and let
\[ \tilde{Y}^n_t(\omega) = \sum_{k=0}^{\infty} 1_{[k/2^n,(k+1)/2^n)}(t)\,\tilde{Y}^{k,n}_t(\omega). \]
Note that $\tilde{Y}^n_t = E[Y^n_t \mid \mathcal{F}_t]$, a.s., for all t. Moreover, $\tilde{Y}^n_t$ is right continuous, so we see that it is jointly measurable in t and ω. Now for n > m,

\[ \sup_{t\ge 0} |\tilde{Y}^n_t - \tilde{Y}^m_t| \le \sup_{t\ge 0} E[V_{2^{-m}} \mid \mathcal{F}_t]. \tag{9.11} \]

We have already seen that there exists a subsequence such that the right-hand side of (9.11) converges to 0 almost surely. Hence along the appropriate subsequence, $\tilde{Y}^n_t$ converges uniformly. If we call the limit $\tilde{Y}$, we see that $\tilde{Y}_t$ is right continuous, adapted, and jointly measurable. If $k/2^n \le t \le (k+1)/2^n$, then $|Y^n_t - Y^n_{k/2^n}| \le V_{2^{-n}}$, so

\[ |\tilde{Y}^n_t - \tilde{Y}^n_{k/2^n}| = |E[Y^n_t - Y^n_{k/2^n} \mid \mathcal{F}_t]| \le E[V_{2^{-n}} \mid \mathcal{F}_t]. \]

By the triangle inequality,

\[ |\tilde{Y}^n_t - \tilde{Y}^n_s| \le 2\sup_{t\ge 0} E[V_{2^{-n}} \mid \mathcal{F}_t] \]

if $k/2^n \le s, t \le (k+1)/2^n$. Therefore the largest jump of $\tilde{Y}^n_t$ is bounded by $2\sup_{t\ge 0} E[V_{2^{-n}} \mid \mathcal{F}_t]$, and we conclude the limit $\tilde{Y}$ has continuous paths. Finally, $Y^n_t$ differs from $Y_t$ by at most $V_{2^{-n}}$, so we see by passing to the limit that $\tilde{Y}_t$ is a version of $E[Z_{t+h} \mid \mathcal{F}_t]$.

Exercises

9.1 Let Wt be a Brownian motion started at 1 and T0 = inf{t > 0 : Wt = 0}. Is Mt = Wt∧T0 a

square integrable martingale? A locally square integrable martingale? A uniformly integrable

martingale? A martingale? A local martingale? A semimartingale?

9.2 Prove that if M is a submartingale such that the paths of M are continuous, supt |Mt | is integrable,

and S ≤ T are finite stopping times, then E [MT | FS] ≥ MS . Note that the last part of the proof

of Proposition 9.3 breaks down here.

9.3 Suppose Mt is a local martingale with continuous paths. Show that if N > 0, TN = inf{t :

|Mt | ≥ N}, and MNt = Mt∧TN , then MN is a uniformly integrable martingale.

9.4 Suppose $W^1_t$ and $W^2_t$ are two independent Brownian motions, $t_0 > 0$, and $M^i_t = W^i_{t\wedge t_0}$, i = 1, 2. Show $\langle M^1, M^2\rangle_t = 0$.

9.5 Show that the norm defined in (9.10) is complete.

9.6 Let Zt be a bounded supermartingale with continuous paths that is constant from some time t0

on. Show that for each t

\[ E\Bigl[\int_t^\infty E[Z_s - Z_{s+h} \mid \mathcal{F}_s]\,ds \Bigm| \mathcal{F}_t\Bigr] = \int_t^\infty E[Z_s - Z_{s+h} \mid \mathcal{F}_t]\,ds, \quad \text{a.s.} \]


9.7 We mentioned that one can prove the existence of 〈M〉 without using the Doob–Meyer theorem.

Here is how that argument starts. Let M be a bounded continuous martingale and for each n,

define

\[ I_n(t) = \sum_{i=0}^{[t2^n]} (M_{(i+1)/2^n} - M_{i/2^n})^2. \]

Here [x] is the integer part of x. Prove that for each t > 0, $E|I_n(t) - I_m(t)|^2 \to 0$ as n, m → ∞. One can then define $\langle M\rangle_t$ as the $L^2$ limit of $I_n(t)$.

Hint: If n > m, note that

\[ M_{(i+1)/2^m} - M_{i/2^m} = \sum_{j=2^{n-m}i}^{2^{n-m}(i+1)-1} (M_{(j+1)/2^n} - M_{j/2^n}). \]

Notes

The first proof of the Doob–Meyer decomposition was by Meyer in the early 1960s and was

a major breakthrough. There are now a number of alternate proofs. The proof we give here

for continuous supermartingales is new.

10

Stochastic integrals

This chapter is devoted to the construction of stochastic integrals, primarily with respect to continuous square integrable martingales. The motivating example is $\int_0^t H_s\,dW_s$, where W is a Brownian motion and H is an adapted process satisfying certain conditions. We cannot define this integral as a Lebesgue–Stieltjes integral because the paths of Brownian motion are nowhere differentiable (Theorem 7.3), and hence are not of bounded variation on any interval.

One way to visualize a stochastic integral is to think of $dW_s$ as “white noise” on a radio

and Hs as the volume control which increases or decreases the white noise by a factor. For

another model, if Ws is supposed to represent a stock price at time s (of course, stock prices

can’t be negative, while Brownian motion can!) and Hs is the number of shares held at time

s, then the stochastic integral represents the net profit.
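The net-profit interpretation has a simple discrete-time analog: holding $H_{t_k}$ shares over $[t_k, t_{k+1}]$ earns $H_{t_k}(W_{t_{k+1}} - W_{t_k})$, and the total profit is the sum of these terms. The sketch below is a hedged illustration of that sum, not the construction carried out in this chapter; the function names, the seed, and the choice of holdings $H_{t_k} = W_{t_k}$ are all assumptions made for the example.

```python
import math
import random
from typing import List, Sequence


def discrete_stochastic_integral(holdings: Sequence[float],
                                 prices: Sequence[float]) -> float:
    """Left-endpoint sum  sum_k H_{t_k} * (W_{t_{k+1}} - W_{t_k}).

    holdings[k] must be decided at time t_k, so len(holdings) == len(prices) - 1.
    """
    return sum(h * (prices[k + 1] - prices[k]) for k, h in enumerate(holdings))


def simulate_brownian_path(steps: int, dt: float, seed: int = 0) -> List[float]:
    """Simulated values W_0 = 0, W_dt, W_2dt, ... using Gaussian increments."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


if __name__ == "__main__":
    W = simulate_brownian_path(1000, 0.001, seed=3)
    profit = discrete_stochastic_integral(W[:-1], W)  # hold W_{t_k} shares
    qv = sum((W[k + 1] - W[k]) ** 2 for k in range(len(W) - 1))
    # Algebraic identity: sum W_k dW_k = (W_T^2 - sum dW_k^2) / 2.
    print(profit, 0.5 * (W[-1] ** 2 - qv))
```

The two printed numbers agree up to rounding. Evaluating the holdings at the left endpoint of each interval is the discrete counterpart of requiring the integrand to be adapted: the number of shares is fixed before the price moves.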

10.1 Construction

Let $M_t$ be a continuous square integrable martingale with respect to a filtration $\{\mathcal{F}_t\}$ satisfying the usual conditions, and suppose $H_t$ is an adapted process. Under appropriate additional assumptions on H, we want to define

\[ N_t = \int_0^t H_s\,dM_s, \tag{10.1} \]

the stochastic integral of H with respect to M.

We impose two conditions on the integrand $H_t$, a measurability one and an integrability one. First we define the predictable σ-field $\mathcal P$ on $[0,\infty)\times\Omega$. This is the smallest σ-field of subsets of $[0,\infty)\times\Omega$ with respect to which all left continuous, bounded, and adapted processes are measurable. In symbols,
$$ \mathcal P = \sigma(X : X \text{ is left continuous, bounded, and adapted to } \{\mathcal F_t\}). $$
This can be rephrased by saying $\mathcal P$ is the σ-field on $[0,\infty)\times\Omega$ generated by the collection of all sets of the form
$$ \{(t,\omega) \in [0,\infty)\times\Omega : X_t(\omega) > a\}, $$
where $a \in \mathbb R$ and $X$ is a bounded, adapted, left continuous process. We require $H : [0,\infty)\times\Omega \to \mathbb R$ to be measurable with respect to $\mathcal P$. When this happens, we say $H$ is predictable.

The integrability is easier to state: we require
$$ E\int_0^\infty H_s^2\,d\langle M\rangle_s < \infty. \tag{10.2} $$
Observe that H will meet both requirements if H is bounded, adapted, and has continuous
paths.
We define $\int_0^t H_s\,dM_s$ in three steps:
Step 1. When $H_s(\omega) = K(\omega)1_{(a,b]}(s)$, where $K$ is bounded and $\mathcal F_a$ measurable.
Step 2. When $H_s$ is a sum of processes of the form in Step 1.
Step 3. When $H$ is predictable and satisfies (10.2).
If Mt = Wt∧t0 , where W is a Brownian motion and t0 is a fixed time, then 〈M〉t = t ∧ t0,
and it might help the reader to work through the proofs in this special case. Even in this
situation, all the elements of the general construction are present.
We will need the following easy lemma.
Lemma 10.1 The predictable σ-field $\mathcal P$ is generated by the collection $\mathcal C$ of processes of the form $X_t(\omega) = \sum_{i=1}^n K_i(\omega)1_{(a_i,b_i]}(t)$, where for each $i$, $K_i$ is a bounded $\mathcal F_{a_i}$ measurable random variable.
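Proof If $X \in \mathcal C$, then $X$ is bounded, adapted, and left continuous, hence $X$ is a predictable process. Thus the σ-field generated by $\mathcal C$ is contained in $\mathcal P$.
On the other hand, if $Y$ is a bounded, adapted, left-continuous process, we can use left continuity to approximate $Y$ pointwise by the processes
$$ Y^n_t(\omega) = \sum_{i=0}^{n2^n} Y_{i/2^n}(\omega)\,1_{(i/2^n,(i+1)/2^n]}(t). $$
Each such $Y^n$ is in $\mathcal C$, so $Y$, being a pointwise limit of the $Y^n$, is measurable with respect to the σ-field generated by $\mathcal C$. Therefore the σ-field generated by $\mathcal C$ contains $\mathcal P$.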
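Proposition 10.2 Suppose $H$ is as in Step 1 above. Then
$$ N_t = K(M_{t\wedge b} - M_{t\wedge a}) $$
is a continuous martingale,
$$ E N_\infty^2 = E\int_0^\infty K^2 1_{(a,b]}(s)\,d\langle M\rangle_s = E[K^2(\langle M\rangle_b - \langle M\rangle_a)], $$
and
$$ \langle N\rangle_t = \int_0^t K^2 1_{(a,b]}(s)\,d\langle M\rangle_s. $$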
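Proof The continuity of the paths of $N$ is clear. Set $N_\infty = K(M_b - M_a)$. Since $K$ is bounded and $M$ is square integrable, $E N_\infty^2 < \infty$. We will show $N_t = E[N_\infty \mid \mathcal F_t]$, which will prove that $N_t$ is a martingale.
If $t \ge b$, then since $K$, $M_b$, and $M_a$ are $\mathcal F_t$ measurable,
$$ E[N_\infty \mid \mathcal F_t] = K(M_b - M_a) = N_t. $$
If $a \le t \le b$, $K$ is $\mathcal F_t$ measurable, and
$$ E[K(M_b - M_a) \mid \mathcal F_t] = K\,E[M_b - M_a \mid \mathcal F_t] = K(M_t - M_a) = N_t. $$
In particular, $N_a = E[N_\infty \mid \mathcal F_a] = 0$. Finally, if $t \le a$,
$$ E[N_\infty \mid \mathcal F_t] = E[\,E[N_\infty \mid \mathcal F_a] \mid \mathcal F_t] = 0 = N_t. $$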
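For $E N_\infty^2$, we have by (9.2) with $S = a$ and $T = b$,
$$ \begin{aligned} E N_\infty^2 = E[K^2(M_b - M_a)^2] &= E[K^2 E[(M_b - M_a)^2 \mid \mathcal F_a]] \\ &= E[K^2 E[\langle M\rangle_b - \langle M\rangle_a \mid \mathcal F_a]] = E[K^2(\langle M\rangle_b - \langle M\rangle_a)]. \end{aligned} $$
To verify the formula for $\langle N\rangle_t$, let
$$ \begin{aligned} L_\infty &= K^2(M_b - M_a)^2 - K^2(\langle M\rangle_b - \langle M\rangle_a), \\ L_t &= K^2(M_{b\wedge t} - M_{a\wedge t})^2 - K^2(\langle M\rangle_{b\wedge t} - \langle M\rangle_{a\wedge t}). \end{aligned} $$
Then
$$ L_t = N_t^2 - \int_0^t K^2 1_{(a,b]}(s)\,d\langle M\rangle_s, $$
and we must show that $L_t$ is a martingale. To do this, it suffices to show $L_t = E[L_\infty \mid \mathcal F_t]$.
If $t \ge b$, then $L_\infty$ is $\mathcal F_t$ measurable, so $E[L_\infty \mid \mathcal F_t] = L_\infty = L_t$. If $a \le t \le b$, then, since $M_a$ is $\mathcal F_t$ measurable,
$$ \begin{aligned} E[L_\infty \mid \mathcal F_t] &= K^2 E[(M_b - M_a)^2 - (\langle M\rangle_b - \langle M\rangle_a) \mid \mathcal F_t] \\ &= K^2 E[M_b^2 - 2M_aM_b + M_a^2 - (\langle M\rangle_b - \langle M\rangle_a) \mid \mathcal F_t] \\ &= K^2 E[M_t^2 - 2M_aM_t + M_a^2 - (\langle M\rangle_t - \langle M\rangle_a) \mid \mathcal F_t] \\ &= K^2 E[(M_t - M_a)^2 - (\langle M\rangle_t - \langle M\rangle_a) \mid \mathcal F_t] \\ &= L_t, \end{aligned} $$
using (9.1) and (9.3) with the stopping times there being fixed positive real numbers. In particular, $E[L_\infty \mid \mathcal F_a] = L_a = 0$. Finally, if $t \le a$,
$$ E[L_\infty \mid \mathcal F_t] = E[\,E[L_\infty \mid \mathcal F_a] \mid \mathcal F_t] = 0 = L_t $$
as required.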
Next suppose
$$ H_s(\omega) = \sum_{j=1}^J K_j 1_{(a_j,b_j]}(s), \tag{10.3} $$
where each $K_j$ is $\mathcal F_{a_j}$ measurable and bounded. We may rewrite $H$ so that the intervals $(a_j, b_j]$ satisfy $a_1 < b_1 \le a_2 < b_2 \le \cdots \le a_J < b_J$. For example, if $H_s = K_1 1_{(a_1,b_1]} + K_2 1_{(a_2,b_2]}$ with $a_1 < a_2 < b_1 < b_2$, we may rewrite $H_s$ as
$$ K_1 1_{(a_1,a_2]} + (K_1 + K_2)1_{(a_2,b_1]} + K_2 1_{(b_1,b_2]}. $$
Define
$$ N_t = \sum_{j=1}^J K_j(M_{t\wedge b_j} - M_{t\wedge a_j}). \tag{10.4} $$
We need to check that rewriting $H_s$ so that $a_1 < b_1 \le a_2 < \cdots < b_J$ does not affect the value of $N_t$, but this is routine.
Proposition 10.3 With $H$ as in (10.3) and $N$ defined by (10.4), $N_t$ is a continuous martingale,
$$ E N_\infty^2 = E\int_0^\infty H_s^2\,d\langle M\rangle_s, $$
and
$$ \langle N\rangle_t = \int_0^t H_s^2\,d\langle M\rangle_s. \tag{10.5} $$
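Proof By linearity, $N_t$ is a continuous martingale. With $H$ written so that the intervals $(a_j, b_j]$ are disjoint and ordered, we have
$$ E N_\infty^2 = E\Big[\sum_j K_j^2 (M_{b_j} - M_{a_j})^2\Big] + 2E\Big[\sum_{i<j} K_iK_j(M_{b_i} - M_{a_i})(M_{b_j} - M_{a_j})\Big]. \tag{10.6} $$
The cross terms vanish, because when $i < j$ and we condition on $\mathcal F_{a_j}$, we have
$$ E[K_iK_j(M_{b_i} - M_{a_i})\,E[(M_{b_j} - M_{a_j}) \mid \mathcal F_{a_j}]] = 0. $$
For the terms in the first sum in (10.6), by (9.3)
$$ \begin{aligned} E[K_j^2 (M_{b_j} - M_{a_j})^2] &= E[K_j^2 E[(M_{b_j} - M_{a_j})^2 \mid \mathcal F_{a_j}]] \\ &= E[K_j^2 E[\langle M\rangle_{b_j} - \langle M\rangle_{a_j} \mid \mathcal F_{a_j}]] \\ &= E[K_j^2 (\langle M\rangle_{b_j} - \langle M\rangle_{a_j})]. \end{aligned} $$
Therefore
$$ E N_\infty^2 = E\int_0^\infty H_s^2\,d\langle M\rangle_s. \tag{10.7} $$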
The argument for 〈N〉t is similar.
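As a numerical sanity check of Proposition 10.3, here is a minimal Python sketch that is not part of the original construction. It assumes $M$ is Brownian motion stopped at $t_0 = 1$ (so $d\langle M\rangle_s = ds$ on $[0,1]$), takes the simple integrand with $K_j = \max(-1, \min(1, W_{a_j}))$ on four equal intervals, and compares Monte Carlo estimates of $E N_\infty^2$ and $E\int_0^\infty H_s^2\,d\langle M\rangle_s$. The function name `simple_integral_terminal`, the grid, and the choice of $K_j$ are illustrative assumptions.

```python
import numpy as np

def simple_integral_terminal(K, a, b, times, paths):
    """N_infinity = sum_j K_j (M_{b_j} - M_{a_j}) for each simulated path.

    K     : array (n_paths, J); column j must depend only on the path up to a[j]
    a, b  : interval endpoints, assumed to lie on the grid `times`
    paths : array (n_paths, len(times)) of samples of M on the grid
    """
    idx_a = np.searchsorted(times, a)
    idx_b = np.searchsorted(times, b)
    increments = paths[:, idx_b] - paths[:, idx_a]   # M_{b_j} - M_{a_j}
    return np.sum(K * increments, axis=1)

rng = np.random.default_rng(0)
n_paths, n_steps, t0 = 100_000, 4, 1.0
times = np.linspace(0.0, t0, n_steps + 1)
dW = rng.normal(0.0, np.sqrt(t0 / n_steps), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

a, b = times[:-1], times[1:]
K = np.clip(W[:, :-1], -1.0, 1.0)        # bounded and F_{a_j} measurable

N_inf = simple_integral_terminal(K, a, b, times, W)
lhs = np.mean(N_inf**2)                          # estimate of E N_infinity^2
rhs = np.mean(np.sum(K**2 * (b - a), axis=1))    # estimate of E int H^2 d<M>
mean_N = np.mean(N_inf)                          # should be near 0 (martingale)
print(lhs, rhs, mean_N)
```

The two estimates should agree up to Monte Carlo error, and the sample mean of $N_\infty$ should be close to zero, consistent with $N$ being a martingale started at 0.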
Now suppose $H_s$ is predictable and (10.2) holds. Choose $H^n_s$ of the form given in (10.3) above such that
$$ E\int_0^\infty (H^n_s - H_s)^2\,d\langle M\rangle_s \to 0. $$
To see that this can be done, define
$$ \|Y\|_2 = \Big(E\int_0^\infty Y_t^2\,d\langle M\rangle_t\Big)^{1/2} $$
for $Y$ predictable. Then $\|Y\|_2$ is an $L^2$ norm on functions on $[0,\infty)\times\Omega$, so by Lemma 10.1 we can approximate $H$ in this norm by processes of the form given in (10.3). (When $H$ is bounded, adapted, and has continuous paths, taking $H^n_s$ equal to $H_{k/2^n}$ if $k/2^n < s \le (k+1)/2^n$ for $s < n$ and $H^n_s = 0$ if $s \ge n$ will work.)
By Doob’s inequalities we have
$$ E\Big[\sup_{t\ge 0}\Big(\int_0^t (H^n_s - H^m_s)\,dM_s\Big)^2\Big] \le 4E\Big(\int_0^\infty (H^n_s - H^m_s)\,dM_s\Big)^2 = 4E\int_0^\infty (H^n_s - H^m_s)^2\,d\langle M\rangle_s \to 0. $$
The norm
$$ \|Y\|_\infty = \big(E[\sup_t |Y_t|^2]\big)^{1/2} \tag{10.8} $$
is complete; this was shown in Exercise 9.5. Thus there exists a process $N_t$ such that $\sup_{t\ge 0}|N_t - \int_0^t H^n_s\,dM_s| \to 0$ in $L^2$.
If $H^n_s$ and $\widetilde H^n_s$ are two sequences converging in the $\|\cdot\|_2$ norm to $H$, then
$$ E\Big(\int_0^t (H^n_s - \widetilde H^n_s)\,dM_s\Big)^2 = E\int_0^t (H^n_s - \widetilde H^n_s)^2\,d\langle M\rangle_s \to 0, $$
so the limit is independent of which sequence $H^n$ we choose.
It is easy to see, because of the $L^2$ convergence, that $N_t$ is a martingale: if $A \in \mathcal F_s$, then
$$ E\Big[\int_0^t H^n_r\,dM_r; A\Big] = E\Big[\int_0^s H^n_r\,dM_r; A\Big] $$
by Proposition 10.3. Now use that
$$ \Big|E\Big[\int_0^t H^n_r\,dM_r - N_t; A\Big]\Big| \le E\Big|\int_0^t H^n_r\,dM_r - N_t\Big| \le \Big(E\Big(\int_0^t H^n_r\,dM_r - N_t\Big)^2\Big)^{1/2} \to 0 $$
and similarly with $t$ replaced by $s$.
Similar arguments using the $L^2$ convergence show that
$$ E N_t^2 = E\int_0^t H_s^2\,d\langle M\rangle_s, \tag{10.9} $$
and
$$ \langle N\rangle_t = \int_0^t H_s^2\,d\langle M\rangle_s. \tag{10.10} $$
Because $\sup_{t\ge 0}|N_t - \int_0^t H^n_s\,dM_s| \to 0$ in $L^2$, there exists a subsequence $\{n_k\}$ such that the convergence takes place almost surely, that is
$$ \sup_{t\ge 0}\Big|\int_0^t H^{n_k}_s\,dM_s - N_t\Big| \to 0, \quad \text{a.s.} $$
Since each $\int_0^t H^n_s\,dM_s$ has continuous paths, with probability one, $N_t$ has continuous paths. We write $N_t = \int_0^t H_s\,dM_s$ and call $N_t$ the stochastic integral of $H$ with respect to $M$.
We summarize our construction as follows.
Theorem 10.4 Suppose the filtration $\{\mathcal F_t\}$ satisfies the usual conditions and $M_t$ is a square integrable martingale with continuous paths. Suppose $H$ is of the form
$$ \sum_{j=1}^J K_j(\omega)1_{(a_j,b_j]}(s), \tag{10.11} $$
where each $K_j$ is bounded and $\mathcal F_{a_j}$ measurable. In this case define
$$ \int_0^t H_s\,dM_s = \sum_{j=1}^J K_j(M_{t\wedge b_j} - M_{t\wedge a_j}). $$
If $H$ is predictable and
$$ E\int_0^\infty H_s^2\,d\langle M\rangle_s < \infty, $$
choose $H^n$ of the form given in (10.11) with $E\int_0^\infty (H^n_s - H_s)^2\,d\langle M\rangle_s \to 0$, and define
$$ N_t = \int_0^t H_s\,dM_s $$
to be the limit with respect to the norm (10.8) of $\int_0^t H^n_s\,dM_s$. Then $N_t$ is a continuous martingale,
$$ E N_\infty^2 = E\int_0^\infty H_s^2\,d\langle M\rangle_s, $$
and
$$ \langle N\rangle_t = \int_0^t H_s^2\,d\langle M\rangle_s. $$
Moreover the definition of $N_t$ is independent of the particular choice of the $H^n$.
10.2 Extensions
There are some extensions of the definition that are fairly routine.
Extension 1. If
$$ \int_0^\infty H_s^2\,d\langle M\rangle_s < \infty, \quad \text{a.s.,} $$
but without the expectation being finite, let
$$ T_N = \inf\Big\{t : \int_0^t H_s^2\,d\langle M\rangle_s > N\Big\}. $$
Then $M'_t = M_{t\wedge T_N}$ is a square integrable martingale with $\langle M'\rangle_t = \langle M\rangle_{t\wedge T_N}$, so $\int_0^t H_s^2\,d\langle M'\rangle_s \le N$. Define $\int_0^t H_s\,dM_s$ to be the quantity $\int_0^t H_s\,dM_{s\wedge T_N}$ if $t \le T_N$. If $t \le T_K \le T_N$, we need to check that $\int_0^t H_s\,dM_{s\wedge T_K} = \int_0^t H_s\,dM_{s\wedge T_N}$, so that our definition is consistent. This is part of Exercise 10.2.

Extension 2. If $M_t$ is a continuous local martingale (see Section 9.1 for the definition), let $S_n = \inf\{t : |M_t| \ge n\}$. By Exercise 9.3, $M_{t\wedge S_n}$ will be a uniformly integrable martingale, and in fact, since $M_{t\wedge S_n}$ is bounded, it is square integrable. For $t \le S_n$ we set
$$ \int_0^t H_s\,dM_s = \int_0^t H_s\,dM_{s\wedge S_n} $$
and $\langle M\rangle_t = \langle M\rangle_{t\wedge S_n}$. Again there is consistency to check, which is also part of Exercise 10.2.


Extension 3. Suppose that $X_t = M_t + A_t$ is a semimartingale with continuous paths, so that $M$ is a local martingale and $A$ is a process with paths locally of bounded variation. If $\int_0^\infty H_s^2\,d\langle M\rangle_s + \int_0^\infty |H_s|\,|dA_s| < \infty$, we define
$$ \int_0^t H_s\,dX_s = \int_0^t H_s\,dM_s + \int_0^t H_s\,dA_s, $$
where the first integral on the right is a stochastic integral and the second is a Lebesgue–Stieltjes integral.
For a semimartingale, we define
$$ \langle X\rangle_t = \langle M\rangle_t. \tag{10.12} $$
Given two semimartingales $X$ and $Y$ we define $\langle X, Y\rangle_t$ by:
$$ \langle X, Y\rangle_t = \tfrac12\big[\langle X + Y\rangle_t - \langle X\rangle_t - \langle Y\rangle_t\big]. $$
Exercises
10.1 Prove (10.5) in Proposition 10.3.
10.2 Check the consistency of the first two extensions of the definition of stochastic integrals.
10.3 Show that if $M$ is a continuous square integrable martingale, and $T$ a finite stopping time, then
$$ \int_0^\infty 1_{[0,T]}\,dM_s = M_T. $$
10.4 Show that if $N_t = \int_0^t H_s\,dM_s$ where $M$ is a continuous square integrable martingale, $H$ is predictable, and $E\int_0^\infty H_s^2\,d\langle M\rangle_s < \infty$, and $L_t = \int_0^t K_s\,dN_s$, where $K$ is predictable and $E\int_0^\infty K_s^2\,d\langle N\rangle_s < \infty$, then
$$ L_t = \int_0^t H_sK_s\,dM_s. $$
10.5 Show that if $M$, $H$, and $N$ are as in Exercise 10.4, then $\langle M, N\rangle_t = \int_0^t H_s\,d\langle M\rangle_s$.
Hint: Derive a formula for $\langle N + M\rangle_t$ from the fact that
$$ N_t + M_t = \int_0^t (1 + H_s)\,dM_s. $$
10.6 Suppose that $M$ and $L$ are square integrable martingales, $H$ is predictable and satisfies (10.2), and $N_t = \int_0^t H_s\,dM_s$. Show that
$$ \langle N, L\rangle_t = \int_0^t H_s\,d\langle M, L\rangle_s. \tag{10.13} $$
Sometimes the stochastic integral of H with respect to M is defined to be the square integrable
martingale N for which (10.13) holds for all square integrable martingales L.
10.7 Show that if $M$ and $N$ are square integrable martingales with continuous paths, then
$$ \langle M, N\rangle_t \le (\langle M\rangle_t)^{1/2}(\langle N\rangle_t)^{1/2}. $$
Hint: Imitate an appropriate proof of the Cauchy–Schwarz inequality. This result is a special
case of the inequality of Kunita–Watanabe.
11
Itô’s formula
The most important result in the theory of stochastic integration is Itô’s formula. This is also
known as the change of variables formula.
Let $C^k$ be the functions that are $k$ times continuously differentiable and $C^k_b$ those functions in $C^k$ such that the function and its $i$th-order derivatives are bounded for $i \le k$.
Theorem 11.1 Let $X_t$ be a semimartingale with continuous paths and suppose $f \in C^2$. Then for almost every $\omega$
$$ f(X_t) = f(X_0) + \int_0^t f'(X_s)\,dX_s + \tfrac12\int_0^t f''(X_s)\,d\langle X\rangle_s, \qquad t \ge 0. \tag{11.1} $$
Step 1 will be to reduce to the case when $f \in C^3_b$ and $X$ has appropriate boundedness conditions. Step 2 is a use of Taylor’s formula; see (11.2). Step 3 shows that each term converges to the appropriate quantity, and Step 4 removes the restriction that $f$ be in $C^3_b$.
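Before turning to the proof, it may help to see the formula in the simplest nontrivial case. The following worked example is an illustration added here, assuming $X = W$ is a Brownian motion (so $\langle W\rangle_t = t$) and $f(x) = x^2$, so that $f'(x) = 2x$ and $f''(x) = 2$.

```latex
\begin{aligned}
W_t^2 &= W_0^2 + \int_0^t 2W_s\,dW_s + \tfrac12\int_0^t 2\,d\langle W\rangle_s
       = 2\int_0^t W_s\,dW_s + t, \\
\text{hence}\quad \int_0^t W_s\,dW_s &= \tfrac12\,(W_t^2 - t).
\end{aligned}
```

The extra term $-t/2$, compared with the classical calculus answer $W_t^2/2$, is exactly the contribution of the $f''$ term.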
Proof Step 1. If $X_t = M_t + A_t$ is the decomposition of $X$ into a local martingale $M$ and a process $A$ that has paths locally of bounded variation, let $V_t$ be the total variation of $A$ up to time $t$: $V_t = \int_0^t |dA_s|$. Let
$$ T_N = \inf\{t : |M_t| > N \text{ or } \langle M\rangle_t > N \text{ or } V_t > N\}. $$

By the continuity of paths, TN → ∞, a.s., as N → ∞, so for almost every ω and for each

t, t ∧ TN = t for N large enough. Since Itô’s formula is a path-by-path result, it suffices to

prove Itô’s formula for Xt∧TN for each N , or what amounts to the same thing, we may take N

arbitrary and assume Mt , 〈M〉t , At , and Vt are all bounded by N . In this case, Xt is bounded

by 2N .

Since X is bounded, we may modify f , f ′, and f ′′ outside of [−2N, 2N] without affecting

the validity of Itô’s formula. Therefore we will also assume f ∈ C2 with compact support.

Let us temporarily assume in addition that f ′′′ exists and is continuous; we will remove this

last assumption later on.

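Let $t_0 > 0$, $\varepsilon > 0$, $S_0 = 0$, and define
$$ S_{i+1} = S_{i+1}(\varepsilon) = \inf\{t > S_i : |M_t - M_{S_i}| > \varepsilon \text{ or } \langle M\rangle_t - \langle M\rangle_{S_i} > \varepsilon \text{ or } V_t - V_{S_i} > \varepsilon\} \wedge t_0. $$
Note $S_i = t_0$ for $i$ sufficiently large (how large depends on $\omega$) by the continuity of the paths.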


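Step 2. The key idea to proving Itô’s formula is Taylor’s theorem. We write
$$ \begin{aligned} f(X_{t_0}) - f(X_0) &= \sum_{i=0}^\infty [f(X_{S_{i+1}}) - f(X_{S_i})] \\ &= \sum_{i=0}^\infty f'(X_{S_i})(X_{S_{i+1}} - X_{S_i}) + \tfrac12\sum_{i=0}^\infty f''(X_{S_i})(X_{S_{i+1}} - X_{S_i})^2 + \sum_{i=0}^\infty R_i, \end{aligned} \tag{11.2} $$
where $R_i$ is the remainder term. We have $|R_i| \le c\|f'''\|_\infty |X_{S_{i+1}} - X_{S_i}|^3$.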

Step 3. Let us first look at the terms with $f'$ in them. Let $H^\varepsilon_s = f'(X_{S_i})$ if $S_i \le s < S_{i+1}$. By the continuity of $f'$ and $X_s$, we see that $H^\varepsilon_s$ converges boundedly and pointwise to $f'(X_s)$. In particular, $\int_0^{t_0} |H^\varepsilon_s - f'(X_s)|\,dV_s \to 0$ boundedly, hence
$$ E\int_0^{t_0} |H^\varepsilon_s - f'(X_s)|\,dV_s \to 0. $$
Also,
$$ E\Big(\int_0^{t_0} (H^\varepsilon_s - f'(X_s))\,dM_s\Big)^2 = E\int_0^{t_0} |H^\varepsilon_s - f'(X_s)|^2\,d\langle M\rangle_s \to 0 $$
as $\varepsilon \to 0$. We then have
$$ \sum_i f'(X_{S_i})(X_{S_{i+1}} - X_{S_i}) = \int_0^{t_0} H^\varepsilon_s\,(dM_s + dA_s) \to \int_0^{t_0} f'(X_s)\,(dM_s + dA_s), $$
which leads to the $f'$ term in Itô’s formula.
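Next let us look at the $f''$ terms. We can write
$$ (X_{S_{i+1}} - X_{S_i})^2 = (M_{S_{i+1}} - M_{S_i})^2 + 2(M_{S_{i+1}} - M_{S_i})(A_{S_{i+1}} - A_{S_i}) + (A_{S_{i+1}} - A_{S_i})^2. $$
Note $\sum_i f''(X_{S_i})(M_{S_{i+1}} - M_{S_i})(A_{S_{i+1}} - A_{S_i})$ is bounded in absolute value by
$$ \sum_i \varepsilon\|f''\|_\infty |A_{S_{i+1}} - A_{S_i}| \le \varepsilon\|f''\|_\infty \int_0^{t_0} dV_s \le \varepsilon\|f''\|_\infty N, $$
which goes to 0 as $\varepsilon \to 0$; this follows from the definition of $S_i$. Similarly the expression $\sum_i f''(X_{S_i})(A_{S_{i+1}} - A_{S_i})^2$ also goes to 0. Therefore we need to show
$$ \sum_i f''(X_{S_i})(M_{S_{i+1}} - M_{S_i})^2 \to \int_0^{t_0} f''(X_s)\,d\langle X\rangle_s. $$
By an argument very similar to the one for the $f'$ terms,
$$ \tfrac12\sum_i f''(X_{S_i})(\langle M\rangle_{S_{i+1}} - \langle M\rangle_{S_i}) \to \tfrac12\int_0^{t_0} f''(X_s)\,d\langle M\rangle_s, \tag{11.3} $$
and since $\langle X\rangle_t = \langle M\rangle_t$ for semimartingales (see (10.12)), the right-hand side of (11.3) is the correct $f''$ term. We thus need to show that
$$ \sum_i f''(X_{S_i})\big[(M_{S_{i+1}} - M_{S_i})^2 - (\langle M\rangle_{S_{i+1}} - \langle M\rangle_{S_i})\big] \to 0 \tag{11.4} $$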
as ε → 0.
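We will show
$$ E\Big(\sum_{i=0}^\infty B_i\Big)^2 \to 0, \tag{11.5} $$
where
$$ B_i = f''(X_{S_i})\big[(M_{S_{i+1}} - M_{S_i})^2 - (\langle M\rangle_{S_{i+1}} - \langle M\rangle_{S_i})\big]. $$
We have
$$ E\Big(\sum_i B_i\Big)^2 = E\sum_i B_i^2 + 2E\sum_{i<j} B_iB_j. $$
If $i < j$, then
$$ E[B_iB_j] = E[B_i\,E[B_j \mid \mathcal F_{S_{i+1}}]]. $$
By (9.2), the fact that $S_{i+1} \le S_j$, and the fact that $f''(X_{S_j})$ is $\mathcal F_{S_j}$ measurable, we see that
$$ E[B_j \mid \mathcal F_{S_{i+1}}] = E\big[f''(X_{S_j})\,E[(M_{S_{j+1}} - M_{S_j})^2 - (\langle M\rangle_{S_{j+1}} - \langle M\rangle_{S_j}) \mid \mathcal F_{S_j}] \,\big|\, \mathcal F_{S_{i+1}}\big] = 0, $$
so the cross-products vanish.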
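Therefore to prove (11.5) it remains to show $E\sum_i B_i^2 \to 0$ as $\varepsilon \to 0$. We use the easy inequality $(x + y)^2 \le 2x^2 + 2y^2$. Since $f''$ is bounded,
$$ E\sum_i B_i^2 \le 2\|f''\|_\infty^2 \sum_i E[(M_{S_{i+1}} - M_{S_i})^4] + 2\|f''\|_\infty^2 \sum_i E[(\langle M\rangle_{S_{i+1}} - \langle M\rangle_{S_i})^2]. \tag{11.6} $$
The first sum on the right-hand side of (11.6) is bounded by
$$ 2\varepsilon^2\|f''\|_\infty^2 \sum_i E[(M_{S_{i+1}} - M_{S_i})^2] = 2\varepsilon^2\|f''\|_\infty^2 E[M_{t_0}^2 - M_0^2] \le 8\varepsilon^2\|f''\|_\infty^2 N^2. $$
The second sum on the right-hand side of (11.6) is bounded by
$$ 2\varepsilon\|f''\|_\infty^2 \sum_i E[\langle M\rangle_{S_{i+1}} - \langle M\rangle_{S_i}] \le 2\varepsilon\|f''\|_\infty^2 E\langle M\rangle_{t_0} \le 2\varepsilon\|f''\|_\infty^2 N. $$
Both of these tend to 0 as $\varepsilon \to 0$. Therefore $E\sum_i B_i^2 \to 0$, and the proof of the convergence for the $f''$ term is complete.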
The final terms to examine are the remainder terms. We have shown that $E\sum_i (X_{S_{i+1}} - X_{S_i})^2$ remains bounded as $\varepsilon \to 0$. Since
$$ |R_i| \le c\varepsilon\|f'''\|_\infty (X_{S_{i+1}} - X_{S_i})^2, $$
we see $E\sum_i |R_i| \to 0$ as $\varepsilon \to 0$.
Step 4. To finish up, we remove the assumption that $f \in C^3$. (We still assume that $f \in C^2$ with compact support.) Take a sequence $\{f_m\}$ of $C^3$ functions such that $f_m$, $f'_m$, and $f''_m$ converge uniformly to $f$, $f'$, and $f''$, respectively. Apply Itô’s formula with $f_m$ and then let $m \to \infty$. The terms $f_m(X_t)$ and $f_m(X_0)$ clearly converge to $f(X_t)$ and $f(X_0)$. The $f'_m$ terms converge because
$$ E\Big(\int_0^{t_0} (f'_m(X_s) - f'(X_s))\,dM_s\Big)^2 = E\int_0^{t_0} |f'_m(X_s) - f'(X_s)|^2\,d\langle M\rangle_s \to 0 $$
and
$$ E\Big|\int_0^{t_0} (f'_m(X_s) - f'(X_s))\,dA_s\Big| \le E\int_0^{t_0} |f'_m(X_s) - f'(X_s)|\,dV_s \to 0 $$
as $m \to \infty$. The $f''_m$ terms converge by dominated convergence. This shows that (11.1) holds for each $t_0$, except for a null set $N_{t_0}$ depending on $t_0$. Let $N = \cup_{t\in\mathbb Q_+} N_t$, where $\mathbb Q_+$ denotes the non-negative rationals. If $\omega \notin N$, then (11.1) holds for every $t_0$ rational. Each term in (11.1) is continuous, a.s. (with a null set $N'$ independent of $t_0$). Therefore if $\omega \notin N \cup N'$, (11.1) holds for all $t_0$.
There is a multivariate version of Itô’s formula, which is proved in a very similar way:
Theorem 11.2 Suppose $X^1_t, \ldots, X^d_t$ are continuous semimartingales, $X_t = (X^1_t, \ldots, X^d_t)$, and $f$ is a $C^2$ function on $\mathbb R^d$. Then with probability one,
$$ f(X_t) = f(X_0) + \int_0^t \sum_{i=1}^d \frac{\partial f}{\partial x_i}(X_s)\,dX^i_s + \tfrac12\int_0^t \sum_{i,j=1}^d \frac{\partial^2 f}{\partial x_i\,\partial x_j}(X_s)\,d\langle X^i, X^j\rangle_s \tag{11.7} $$
for all $t \ge 0$.
The following is known as the integration by parts formula or Itô’s product formula, and
is very useful.
Corollary 11.3 If $X$ and $Y$ are semimartingales with continuous paths, then
$$ X_tY_t = X_0Y_0 + \int_0^t X_s\,dY_s + \int_0^t Y_s\,dX_s + \langle X, Y\rangle_t. $$
Proof By Itô’s formula,
$$ X_t^2 = X_0^2 + 2\int_0^t X_s\,dX_s + \langle X\rangle_t. $$
The analogous formula holds when $X$ is replaced by $Y$ and when $X$ is replaced by $X + Y$. We then use
$$ X_tY_t = \tfrac12\big[(X_t + Y_t)^2 - X_t^2 - Y_t^2\big]; $$
substituting the formulas for $X_t^2$, $Y_t^2$, and $(X_t + Y_t)^2$ that we obtained from Itô’s formula and doing some algebra yields our result.
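For readers who want the algebra spelled out, here is one way to organize it; this is a sketch of the omitted computation using only Itô’s formula for squares and the definition of $\langle X, Y\rangle_t$ from Section 10.2.

```latex
\begin{aligned}
(X_t+Y_t)^2 &= (X_0+Y_0)^2 + 2\int_0^t (X_s+Y_s)\,d(X_s+Y_s) + \langle X+Y\rangle_t, \\
X_t^2 &= X_0^2 + 2\int_0^t X_s\,dX_s + \langle X\rangle_t, \qquad
Y_t^2 = Y_0^2 + 2\int_0^t Y_s\,dY_s + \langle Y\rangle_t. \\
\text{Subtracting,}\quad
2X_tY_t &= 2X_0Y_0 + 2\int_0^t X_s\,dY_s + 2\int_0^t Y_s\,dX_s
          + \big[\langle X+Y\rangle_t - \langle X\rangle_t - \langle Y\rangle_t\big], \\
\text{so}\quad
X_tY_t &= X_0Y_0 + \int_0^t X_s\,dY_s + \int_0^t Y_s\,dX_s + \langle X,Y\rangle_t .
\end{aligned}
```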
Exercises
11.1 Suppose $W_t$ is a Brownian motion and $a \in \mathbb R$. Show that the amount of time Brownian motion spends at the point $a$ is zero, i.e., that
$$ \int_0^t 1_{\{a\}}(W_s)\,ds = 0, \quad \text{a.s.} $$
for all $t > 0$.

11.2 Let $a < b$ and let $f_{a,b}$ be the $C^1$ function such that $f_{a,b}(0) = f'_{a,b}(0) = 0$ and
$$ f'_{a,b}(x) = \int_0^x 1_{[a,b]}(y)\,dy, \qquad x \in \mathbb R. $$
In other words, $f_{a,b}$ is the function whose second derivative is $1_{[a,b]}$, except that the second derivative is not defined at $a$ and $b$. Show Itô’s formula holds for $f_{a,b}$:
$$ f_{a,b}(W_t) = \int_0^t f'_{a,b}(W_s)\,dW_s + \tfrac12\int_0^t 1_{[a,b]}(W_s)\,ds. $$
11.3 If $W_t$ is a Brownian motion, $a > 0$, and $T = \inf\{t > 0 : |W_t| = a\}$, calculate $E\int_0^T (W_s)^k\,ds$ for each non-negative integer $k$. Also calculate
$$ E\int_0^T 1_{[b_1,b_2]}(W_s)\,ds $$
if $[b_1, b_2] \subset [-a, a]$.

11.4 Let $W$ be a Brownian motion, let $t_0 < t_1 < \cdots < t_n = 1$, and let
$$ B_i = (W_{t_i} - W_{t_{i-1}})^2 - (t_i - t_{i-1}). $$
Show there exists a constant $c_1$ not depending on $\{t_0, \ldots, t_n\}$ such that
$$ E\Big(\sum_{i=1}^n B_i\Big)^2 \le c_1 \max_{1\le i\le n} |t_i - t_{i-1}|. $$
11.5 Use Exercise 11.4 and the Borel–Cantelli lemma to prove that if $W$ is a Brownian motion, then
$$ \lim_{n\to\infty} \sum_{k=1}^{2^n} (W_{k/2^n} - W_{(k-1)/2^n})^2 = 1, \quad \text{a.s.} $$
11.6 In our proof of Itô’s formula, the use of stopping times simplifies the proof considerably. This
exercise considers a proof of Itô’s formula using fixed times. Suppose M is a bounded continuous
martingale, A is a continuous process whose paths have total variation bounded by N > 0, a.s.,

and Xt = Mt + At .

(1) Writing $[x]$ for the integer part of $x$, prove that for each $t$,
$$ \sum_{i=1}^{[2^nt]+1} (X_{(i+1)/2^n} - X_{i/2^n})^2 $$
converges in probability to $\langle X\rangle_t$.
(2) Prove that if $f$ is a $C^2$ function whose second derivative is bounded, then
$$ \sum_{i=1}^{[2^nt]+1} f''(X_{i/2^n})(X_{(i+1)/2^n} - X_{i/2^n})^2 $$
converges in probability to
$$ \int_0^t f''(X_s)\,d\langle X\rangle_s. $$
Since the increments of $M$ and $A$ are not uniformly bounded by something small, this is much harder than the proof of Theorem 11.1 given in this chapter.

11.7 Here is an alternate way to prove Itô’s formula.

(1) Suppose $X = M + A$, where $M$ and $A$ are as in Exercise 11.6. Write
$$ \begin{aligned} X_t^2 - X_0^2 &= \sum_{i=0}^{[t2^n]-1} (X_{(i+1)/2^n}^2 - X_{i/2^n}^2) \\ &= \sum_{i=0}^{[t2^n]-1} 2X_{i/2^n}(X_{(i+1)/2^n} - X_{i/2^n}) + \sum_{i=0}^{[t2^n]-1} (X_{(i+1)/2^n} - X_{i/2^n})^2. \end{aligned} $$
Use Exercise 11.6 to show that Itô’s formula holds when $f(x) = x^2$.
(2) Derive the Itô product formula. Then use induction to show that Itô’s formula holds when $f(x) = x^n$, $n$ a positive integer.
(3) Given $f \in C^2$, find polynomials $P_m$ such that $P_m$, $P'_m$, $P''_m$ converge uniformly to $f$, $f'$, $f''$, respectively, on a compact interval as $m \to \infty$. Apply Itô’s formula for $P_m$ and show that one can take limits to derive Itô’s formula for $f$.

12

Some applications of Itô’s formula

We will be using Itô’s formula throughout the book. In this chapter we give some applications,

each of which will turn out to be quite useful.

12.1 Lévy’s theorem

The following is known as Lévy’s theorem. Recall that if $M$ is a local martingale with continuous paths and $T_N = \inf\{t : |M_t| \ge N\}$, we defined $\langle M\rangle_t$ to be equal to $\langle M\rangle_{t\wedge T_N}$ if $t \le T_N$; see Section 10.2. Moreover, by Exercise 9.3, $M_{t\wedge T_N}$ is a square integrable martingale for each $N$.

Theorem 12.1 Let $M_t$ be a continuous local martingale with respect to a filtration $\{\mathcal F_t\}$ satisfying the usual conditions such that $M_0 = 0$ and $\langle M\rangle_t = t$. Then $M_t$ is a Brownian motion with respect to $\{\mathcal F_t\}$.

Proof Fix $t_0$ and let $N_t = M_{t+t_0} - M_{t_0}$, $\mathcal F'_t = \mathcal F_{t+t_0}$. It is routine to check that $N_t$ is a martingale with respect to $\mathcal F'_t$ and that $\langle N\rangle_t = t$. Note $\mathcal F'_0$ will not be the trivial σ-field in general. We see that
$$ E N_t^2 = E M_{t+t_0}^2 - E M_{t_0}^2 = t < \infty. $$
If $f$ is a function mapping the reals to the complex numbers, we may still use Itô’s formula: just apply Itô’s formula to the real and imaginary parts of $f$. Doing this for $f(x) = e^{iux}$, where $u$ and $x$ are real, we have
$$ e^{iuN_t} = 1 + iu\int_0^t e^{iuN_s}\,dN_s - \frac{u^2}{2}\int_0^t e^{iuN_s}\,ds. \tag{12.1} $$
If we take $T_K = \inf\{t : |N_t| \ge K\}$, then
$$ e^{iuN_{t\wedge T_K}} = 1 + iu\int_0^{t\wedge T_K} e^{iuN_s}\,dN_s - \frac{u^2}{2}\int_0^{t\wedge T_K} e^{iuN_s}\,ds. \tag{12.2} $$
Take $A \in \mathcal F'_0$, multiply (12.2) by $1_A$, and take expectations. The stochastic integral is a martingale, so this term will have 0 expectation. Then let $K \to \infty$, and we are left with
$$ E[e^{iuN_t}; A] = P(A) - \frac{u^2}{2}\int_0^t E[e^{iuN_s}; A]\,ds. \tag{12.3} $$
We used the Fubini theorem here. (The reason we introduced the stopping time $T_K$ is that $N_{t\wedge T_K}$ is a square integrable martingale, and hence the stochastic integral is a martingale. We might run into integrability problems if we worked with (12.1) instead of (12.2).)
Write $J(t) = E[e^{iuN_t}; A]$, so we have
$$ J(t) = P(A) - \frac{u^2}{2}\int_0^t J(s)\,ds. \tag{12.4} $$
Since $J$ is bounded, (12.4) shows that $J$ is continuous. Since $J$ is continuous, using (12.4) again shows that $J$ is differentiable. Hence $J'(t) = -\frac{u^2}{2}J(t)$ with $J(0) = P(A)$. The only solution to this ordinary differential equation is
$$ J(t) = P(A)e^{-u^2t/2}. \tag{12.5} $$
If we set $A = \Omega$, this tells us that $E\,e^{iuN_t} = e^{-u^2t/2}$, and by the uniqueness theorem for characteristic functions (Theorem A.48), $M_{t+t_0} - M_{t_0}$ is a mean zero normal random variable with variance $t$. Equation (12.5) also tells us that
$$ E[e^{iuN_t}; A] = E[e^{iuN_t}]\,P(A) \tag{12.6} $$
when $A \in \mathcal F'_0$. Let $f$ be a $C^\infty$ function with compact support. The Fourier transform $\hat f(u)$ will be in the Schwartz class; see Section B.2. Replacing $u$ by $-u$ in (12.6), multiplying the resulting equation by $\hat f(u)$, and integrating over $u \in \mathbb R$, we have
$$ \int \hat f(u)\,E[e^{-iuN_t}; A]\,du = \int \hat f(u)\,E[e^{-iuN_t}]\,du\;P(A). $$
Using the Fubini theorem and the Fourier inversion theorem, and dividing by a constant, we conclude
$$ E[f(N_t); A] = E[f(N_t)]\,P(A). $$
Since $\hat f$ is in the Schwartz class, integrability is not a problem when applying the Fubini theorem. A limit argument shows that this equation holds with $f$ equal to $1_B$, where $B$ is a Borel subset of $\mathbb R$, hence
$$ P(M_{t+t_0} - M_{t_0} \in B, A) = P(M_{t+t_0} - M_{t_0} \in B)\,P(A). $$
This shows the independence of $M_{t+t_0} - M_{t_0}$ and $\mathcal F_{t_0}$. We thus see that $M_t$ is a continuous process starting at 0 with $M_{t+t_0} - M_{t_0}$ being a mean zero normal random variable with variance $t$ independent of $\mathcal F_{t_0}$, and therefore $M$ is a Brownian motion.
12.2 Time changes of martingales
The next theorem says that most continuous martingales arise from Brownian motion via a
time change. That is, the paths are the same, but the rate at which one moves along the paths
varies. In fact, it is possible to show that all continuous martingales arise from a time change
of a Brownian motion that is possibly stopped at a random time.
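Theorem 12.2 Suppose $M_t$ is a continuous local martingale, $M_0 = 0$, $\langle M\rangle_t$ is strictly increasing, and $\lim_{t\to\infty} \langle M\rangle_t = \infty$, a.s. Let
$$ \tau(t) = \inf\{u : \langle M\rangle_u \ge t\}. $$
Then $W_t = M_{\tau(t)}$ is a Brownian motion with respect to $\mathcal F'_t = \mathcal F_{\tau(t)}$.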
Proof Let us first suppose that $W_t^2$ is integrable. We have by Proposition 9.3 that
$$ E[W_t \mid \mathcal F'_s] = E[M_{\tau(t)} \mid \mathcal F_{\tau(s)}] = M_{\tau(s)} = W_s, $$
or $W_t$ is a continuous martingale. Similarly, $W_t^2 - t$ is a martingale. Now apply Lévy’s theorem, Theorem 12.1. Removing the assumption that $W_t^2$ is integrable is left as
Exercise 12.1.
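To see the time change at work numerically, here is a small Python sketch, offered only as an illustration under stated assumptions. It takes $M_t = \int_0^t \sigma_s\,dB_s$ with the hypothetical integrand $\sigma_s = 1 + |B_s|$ for a Brownian motion $B$, approximates $M$ and $\langle M\rangle_t = \int_0^t \sigma_s^2\,ds$ by Euler sums on a grid, and evaluates $M$ at the first grid time where the approximate $\langle M\rangle$ reaches $t$. Since $\sigma \ge 1$, we have $\tau(t) \le t$, so simulating on $[0,1]$ suffices for $t \le 1$. If Theorem 12.2 applies, $M_{\tau(t)}$ should have mean near 0 and variance near $t$, up to discretization and Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, horizon = 4000, 500, 1.0
dt = horizon / n_steps

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B_left = np.cumsum(dB, axis=1) - dB        # B at the left endpoint of each step
sigma = 1.0 + np.abs(B_left)               # adapted integrand with sigma >= 1
M = np.cumsum(sigma * dB, axis=1)          # Euler approximation of M (right endpoints)
qv = np.cumsum(sigma**2 * dt, axis=1)      # approximation of <M> (right endpoints)

def time_changed_value(t):
    """M evaluated at the first grid time where the approximate <M> is >= t."""
    first = np.argmax(qv >= t, axis=1)     # valid for t <= horizon since sigma >= 1
    return M[np.arange(n_paths), first]

W_half = time_changed_value(0.5)
W_one = time_changed_value(1.0)
print(W_half.mean(), W_half.var(), W_one.mean(), W_one.var())
```

The grid-based stopping rule overshoots $\tau(t)$ slightly, so the sample variances are biased upward by an amount of order `dt`.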
12.3 Quadratic variation
Itô’s formula allows us to prove Theorem 9.10 fairly simply.
Proof of Theorem 9.10 If $T_K = \inf\{t : |M_t| \ge K\}$, we will show that
$$ \sum_{k=0}^{[t_02^n]} (M_{T_K\wedge(k+1)/2^n} - M_{T_K\wedge k/2^n})^2 $$
converges to $\langle M\rangle_{t_0\wedge T_K}$. Since $T_K \to \infty$ as $K \to \infty$, this will prove the theorem. Thus we may assume $M$ is bounded by $K$.
If $s > 0$ and we let $N_t = M_{s+t} - M_s$, then $N_t$ is a martingale with respect to the filtration $\mathcal F'_t = \mathcal F_{s+t}$ and we can check that $\langle N\rangle_t = \langle M\rangle_{t+s} - \langle M\rangle_s$. By Itô’s formula applied to the process $N$, we obtain
$$ (M_{t+s} - M_s)^2 = 2\int_0^t (M_{r+s} - M_s)\,dM_{r+s} + (\langle M\rangle_{t+s} - \langle M\rangle_s). $$
Applying this with $t = 1/2^n$ and $s = k/2^n$ and summing, we see that
$$ \sum_{k=0}^{[t_02^n]} (M_{(k+1)/2^n} - M_{k/2^n})^2 - \langle M\rangle_{t_0} = 2\int_0^{t_0} L^n_r\,dM_r + R, \tag{12.7} $$
where $L^n_r = M_r - M_{k/2^n}$ for $k/2^n \le r < (k+1)/2^n$ and
$$ R = \langle M\rangle_{([t_02^n]+1)/2^n} - \langle M\rangle_{t_0}. $$
Note
$$ E\Big(2\int_0^{t_0} L^n_r\,dM_r\Big)^2 = 4E\int_0^{t_0} (L^n_r)^2\,d\langle M\rangle_r. \tag{12.8} $$
The integrand $(L^n_r)^2$ is bounded by $4K^2$, $E\langle M\rangle_t = E M_t^2 \le K^2$ is finite, and $L^n_r$ tends to 0 as $n \to \infty$. By dominated convergence, the right-hand side of (12.8) tends to 0 as $n \to \infty$. As for the remainder term, $R$ goes to 0 by the continuity of the paths of $\langle M\rangle_t$. The reason we
only have convergence in probability rather than in L2 is due to the stopping time argument
involving TK .
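The convergence of dyadic sums of squared increments can be observed directly for Brownian motion, where $\langle W\rangle_{t_0} = t_0$. The Python sketch below is an illustration only: it simulates a single path at the dyadic times $k/2^{14}$ on $[0,1]$ and computes the sum of squared increments over coarser dyadic partitions. The helper name `dyadic_sum` and the chosen levels are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
t0, n_max = 1.0, 14
n_fine = 2**n_max

# One Brownian path sampled at the dyadic times k / 2^n_max, k = 0, ..., 2^n_max
increments = rng.normal(0.0, np.sqrt(t0 / n_fine), size=n_fine)
W = np.concatenate([[0.0], np.cumsum(increments)])

def dyadic_sum(n):
    """Sum of squared increments of W over the level-n dyadic partition of [0, t0]."""
    coarse = W[:: 2 ** (n_max - n)]        # W at the times k / 2^n
    return float(np.sum(np.diff(coarse) ** 2))

sums = {n: dyadic_sum(n) for n in (4, 8, 12, 14)}
for n, value in sums.items():
    print(n, value)                        # values should settle near t0 = 1
```

For Brownian motion the variance of the level-$n$ sum is $2t_0^2/2^n$, so the fluctuations around $t_0$ shrink geometrically in $n$.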
12.4 Martingale representation
The next theorem says that every martingale adapted to the filtration of a Brownian motion can be expressed as a stochastic integral with respect to the Brownian motion. This used to be a rather arcane result that was of interest only to probabilists specializing in martingales. But then it turned out that this theorem is the basis for showing the completeness of the market in the theory of financial mathematics; see Chapter 28. The martingale representation theorem is also key to the innovations approach to stochastic filtering; see Chapter 29.
Theorem 12.3 Let $\mathcal F_t$ be the minimal augmented filtration generated by a one-dimensional Brownian motion $W_t$, let $t_0 > 0$, and let $Y$ be $\mathcal F_{t_0}$ measurable with $EY^2 < \infty$. There exists a predictable process $H_s$ with $E\int_0^{t_0} H_s^2\,ds < \infty$ such that
$$ Y = EY + \int_0^{t_0} H_s\,dW_s, \quad \text{a.s.} \tag{12.9} $$
The proof consists of showing (12.9) holds for successively larger classes of random variables. Step 1 of the proof shows that the equation holds for random variables of the form $e^{iu(W_t - W_s)}$ and Step 2 shows that (12.9) holds for products of such random variables. In Step 3, it is shown that if the equation holds for a set of random variables, it holds for the closure of that set with respect to the $L^2$ norm.
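A concrete instance may make the statement of (12.9) more tangible. The following example is an illustration added here: take $Y = W_{t_0}^2$ and apply Itô’s formula with $f(x) = x^2$.

```latex
\begin{aligned}
W_{t_0}^2 &= t_0 + \int_0^{t_0} 2W_s\,dW_s, \qquad\text{so}\qquad
EY = t_0, \quad H_s = 2W_s, \\
E\int_0^{t_0} H_s^2\,ds &= \int_0^{t_0} 4\,E W_s^2\,ds = \int_0^{t_0} 4s\,ds = 2t_0^2 < \infty .
\end{aligned}
```

Here $H$ is adapted with continuous paths, hence predictable, and the integrability condition in Theorem 12.3 holds.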
Proof Step 1. Let $X_t = iuW_t + u^2t/2$. Note $\langle X\rangle_t = (iu)^2\langle W\rangle_t = -u^2t$. By Itô’s formula applied with $f(x) = e^x$,
$$ \begin{aligned} e^{iuW_t + u^2t/2} &= 1 + \int_0^t e^{X_r}\,d(iuW_r + u^2r/2) + \tfrac12\int_0^t (-u^2)e^{X_r}\,dr \\ &= 1 + \int_0^t iu\,e^{iuW_r + u^2r/2}\,dW_r. \end{aligned} $$
Therefore
$$ e^{iuW_t} = e^{-u^2t/2} + \int_0^t iu\,e^{iuW_r + u^2r/2 - u^2t/2}\,dW_r. \tag{12.10} $$
The integrand in the stochastic integral in (12.10) is $e^{iuW_r}$ times a deterministic function, hence is predictable. Therefore (12.9) holds when $Y = e^{iuW_t}$ and moreover, the support of $H$ in this case is contained in $[0, t]$, that is, $H_r = 0$ if $r \notin [0, t]$. Similarly, (12.9) holds when $Y = e^{iu(W_t - W_s)}$, and in this case the support of the corresponding $H$ is $[s, t]$.
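Identity (12.10) can be checked numerically path by path. The sketch below is a rough illustration, not part of the proof: it assumes $u = 1$ and $t = 1$, replaces the stochastic integral by a left-endpoint Riemann sum on a uniform grid, and reports the root-mean-square discrepancy between the two sides over the simulated paths. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
u, t, n_paths, n_steps = 1.0, 1.0, 500, 2000
dr = t / n_steps
r = np.arange(n_steps) * dr                       # left endpoints of the grid

dW = rng.normal(0.0, np.sqrt(dr), size=(n_paths, n_steps))
W_left = np.cumsum(dW, axis=1) - dW               # W at the left endpoints
W_t = W_left[:, -1] + dW[:, -1]                   # W at time t

# Integrand from (12.10): H_r = i u exp(i u W_r + u^2 r / 2 - u^2 t / 2)
H = 1j * u * np.exp(1j * u * W_left + u**2 * r / 2 - u**2 * t / 2)

rhs = np.exp(-u**2 * t / 2) + np.sum(H * dW, axis=1)   # E Y + discretized integral
lhs = np.exp(1j * u * W_t)                             # Y = exp(i u W_t)
rms_error = np.sqrt(np.mean(np.abs(lhs - rhs) ** 2))
print(rms_error)
```

The discrepancy comes only from discretizing the integral and should shrink roughly like the square root of the step size as `n_steps` grows.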
Step 2. Suppose now that $Y_1$ and $Y_2$ are two random variables for which (12.9) holds with the supports of the corresponding $H_1$ and $H_2$ overlapping by at most finitely many points. To be more precise, if $Y_i = EY_i + \int_0^{t_0} H_i(s)\,dW_s$, $i = 1, 2$, then we suppose that, with probability one, $H_1(s)H_2(s) = 0$ except for finitely many points $s$. This implies
$$ \int_0^{t_0} H_1(s)H_2(s)\,ds = 0. $$
Let $Z_i(t) = EY_i + \int_0^t H_i(s)\,dW_s$, $i = 1, 2$. Note $Z_i(0) = EY_i$ and $Z_i(t_0) = Y_i$. Then by the product formula (Corollary 11.3),
$$ \begin{aligned} Y_1Y_2 &= (EY_1)(EY_2) + \int_0^{t_0} Z_1(s)\,dZ_2(s) + \int_0^{t_0} Z_2(s)\,dZ_1(s) + \langle Z_1, Z_2\rangle_{t_0} \\ &= (EY_1)(EY_2) + \int_0^{t_0} Z_1(s)H_2(s)\,dW_s + \int_0^{t_0} Z_2(s)H_1(s)\,dW_s + \int_0^{t_0} H_1(s)H_2(s)\,ds \\ &= (EY_1)(EY_2) + \int_0^{t_0} K_s\,dW_s, \end{aligned} \tag{12.11} $$
where $K_s = Z_1(s)H_2(s) + Z_2(s)H_1(s)$, and so the support of $K_s$ is contained in the union of the supports of $H_1(s)$ and $H_2(s)$. Taking an expectation in (12.11), $E[Y_1Y_2] = (EY_1)(EY_2)$. Thus (12.9) holds for $Y_1Y_2$. Using induction, (12.9) will hold for the product of $n$ random variables $Y_i$, $i = 1, \ldots, n$, provided the supports of any two of the corresponding $H_i$ overlap by at most finitely many values of $s$. Combining this with Step 1, we see that if $s_1 < s_2 < \cdots < s_{n+1} \le t_0$, then the random variables of the form
$$ Y = \exp\Big(i\sum_{j=1}^n u_j(W_{s_{j+1}} - W_{s_j})\Big) \tag{12.12} $$
satisfy (12.9).
Step 3. We claim that random variables of the form (12.12) generate $\sigma(W_s; s \le t_0)$. To see this, we proceed as in the last paragraph of the proof of Theorem 12.1, namely, we replace each $u_j$ by $-u_j$, multiply by $\hat f(u_1, \ldots, u_n)$, the Fourier transform of a $C^\infty$ function $f$ with compact support, integrate over $(u_1, \ldots, u_n) \in \mathbb R^n$, use the Fubini theorem and the Fourier inversion theorem, and we obtain random variables of the form
$$ f(W_{s_2} - W_{s_1}, \ldots, W_{s_{n+1}} - W_{s_n}) $$
for $f$ in $C^\infty$ with compact support. By a limit argument, such random variables generate $\sigma(W_s; s \le t_0)$. We will prove that whenever $Y_n$ satisfies (12.9) and $Y_n \to Y$ in $L^2$, then $Y$ satisfies (12.9). By Exercise 2.7 and Proposition 2.5, this will prove our theorem.
Suppose each $Y_n$ satisfies (12.9) with integrand $H_n(s)$ and suppose $Y_n \to Y$ in $L^2$. Then $EY_n \to EY$, and $Y_n - EY_n$ converges in $L^2$ to $Y - EY$. Since
$$ E\int_0^{t_0} (H_n(s) - H_m(s))^2\,ds = E\big((Y_n - EY_n) - (Y_m - EY_m)\big)^2 \to 0, $$
the sequence $H_n$ is a Cauchy sequence with respect to the norm $\|X\| = (E\int_0^{t_0} X_s^2\,ds)^{1/2}$, which is an $L^2$ norm and hence complete. Therefore there exists $H_s$ (which is predictable because each $H_n(s)$ is predictable) such that $E\int_0^{t_0} H_s^2\,ds < \infty$ and $E\int_0^{t_0} (H_n(s) - H_s)^2\,ds \to 0$. Hence
$$ E\Big((Y_n - EY_n) - \int_0^{t_0} H_s\,dW_s\Big)^2 = E\int_0^{t_0} (H_n(s) - H_s)^2\,ds \to 0. $$
Since $Y_n - EY_n$ converges in $L^2$ to $Y - EY$, it follows that $Y - EY = \int_0^{t_0} H_s\,dW_s$, a.s.
Corollary 12.4 Suppose $M_t$ is a right-continuous square integrable martingale with respect to the minimal augmented filtration $\{\mathcal F_t\}$ generated by a one-dimensional Brownian motion and suppose $M_0 = 0$. Let $t_0 > 0$. Then there exists a predictable process $H_s$ with $E\int_0^{t_0} H_s^2\,ds < \infty$ such that with probability one
$$ M_t = \int_0^t H_s\,dW_s $$
for all $t \le t_0$.
Proof Since $M_t$ is a martingale, $E[M_{t_0} \mid \mathcal F_0] = M_0$, and taking expectations, $E M_{t_0} = E M_0 = 0$. By Theorem 12.3, there exists a predictable process $H$ with $E\int_0^{t_0} H_s^2\,ds < \infty$ such that $M_{t_0} = \int_0^{t_0} H_s\,dW_s$.
Taking conditional expectations with respect to $\mathcal F_t$, we obtain $M_t = \int_0^t H_s\,dW_s$. This holds almost surely for each $t \le t_0$. Thus except for a null set of $\omega$’s, it holds for all rational $t \le t_0$. Since $M_t$ is right continuous and the stochastic integral has continuous paths, it holds for all $t \le t_0$.
Corollary 12.5 If $M_t$ is a square integrable martingale with respect to the minimal augmented filtration of a one-dimensional Brownian motion $W$, then $M_t$ has a version with continuous paths.
Proof By Corollary 3.13, M has a version with right continuous paths. By Corollary 12.4,
M can be written as a stochastic integral with respect to W . But such stochastic integrals
have continuous paths by Theorem 10.4.
It is important for the martingale representation theorem that $M_t$ be a martingale with respect to the minimal augmented filtration of $W$ and not a larger filtration. For example, let $(X, Y)$ be a two-dimensional Brownian motion and let $\{\mathcal F_t\}$ be the minimal augmented filtration generated by $(X, Y)$. We show that we cannot write $Y_1$ as a stochastic integral with respect to $X_t$. If it were possible to do so, since $Y_1$ has mean zero, we would have
$$ Y_1 = \int_0^1 H_s\,dX_s. $$
Taking conditional expectations, $Y_t = \int_0^t H_s\,dX_s$. Then $\langle X, Y\rangle_t = \int_0^t H_s\,ds$ by Exercise 10.5. But if $(X, Y)$ is two-dimensional Brownian motion, then $X$ and $Y$ are independent, and so $\langle X, Y\rangle_t = 0$ by Exercise 9.4, a contradiction. (However, it is true, by a proof similar to that of Theorem 12.3, that if $\{\mathcal F_t\}$ is the minimal augmented filtration of a $d$-dimensional Brownian motion $(W^1, \ldots, W^d)$ and $Y$ is square integrable and $\mathcal F_{t_0}$ measurable, then there exist suitable processes $H^i_s$ such that $Y = EY + \sum_{i=1}^d \int_0^{t_0} H^i_s\,dW^i_s$.)
12.5 The Burkholder–Davis–Gundy inequalities
Next we turn to a pair of basic inequalities, those of Burkholder, Davis, and Gundy. In both
of the following theorems, the constant depends on p, the exponent. As stated and proved
below, we require p ≥ 2 for Theorems 12.6 and 12.7; in fact, the two theorems are true (with
a different proof) as long as p > 0; see Bass (1995), pp. 62–4, or Exercise 12.12. The proof

we present here is a nice application of Itô’s formula.

Define
\[ M^*_t = \sup_{s \le t} |M_s|. \]
Theorem 12.6 Let $M_t$ be a continuous local martingale with $M_0 = 0$, a.s., and suppose
$2 \le p < \infty$. There exists a constant $c_1$ depending on $p$ such that for any finite stopping
time $T$,
\[ E (M^*_T)^p \le c_1 E \langle M \rangle_T^{p/2}. \]
Proof There is nothing to prove if the left-hand side is zero, so we may assume it is positive.
First suppose $M^*_T$ is bounded by a positive constant $K$. Note for $p \ge 2$ the function
$x \mapsto |x|^p$ is $C^2$. By Doob’s inequalities and then Itô’s formula (and the fact that
$|M_s| \ge 0$), we have
\[ E |M^*_T|^p \le c E |M_T|^p \]
\[ = c E \int_0^T p |M_s|^{p-1} \operatorname{sgn}(M_s) \, dM_s + \tfrac12 c E \int_0^T p(p-1) |M_s|^{p-2} \, d\langle M \rangle_s \]
\[ \le c E \int_0^T (M^*_T)^{p-2} \, d\langle M \rangle_s \]
\[ = c E [(M^*_T)^{p-2} \langle M \rangle_T]. \]
The stochastic integral term has expectation zero because $M$ is bounded up to time $T$.
(Recall our convention about constants and the letter c.) Using Hölder’s inequality with
exponents $p/(p-2)$ and $p/2$, we obtain
\[ E (M^*_T)^p \le c \big( E (M^*_T)^p \big)^{(p-2)/p} \big( E \langle M \rangle_T^{p/2} \big)^{2/p}. \]
Dividing both sides by $(E (M^*_T)^p)^{(p-2)/p}$ and then taking both sides to the power $p/2$ gives
our result.
We then apply the above to $T \wedge U_K$, where $U_K = \inf\{t : |M_t| \ge K\}$, let $K \to \infty$, and use
Fatou’s lemma.
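As a rough numerical illustration of the case p = 2 with M a Brownian motion W and T = 1, the sketch below estimates E (W*_1)^2 and E W_1^2 by simulation. It is only a sanity check under stated assumptions: the path is a discretized random walk, the names `simulate_sup_and_end`, `n_paths`, and `n_steps` are illustrative and not from the text, and the constant 4 used in the comparison is the one from Doob’s L^2 inequality.

```python
import random

def simulate_sup_and_end(n_paths, n_steps, t=1.0, seed=1):
    """Estimate E[(sup_{s<=t} |W_s|)^2] and E[W_t^2] from discretized Brownian paths."""
    rng = random.Random(seed)
    sd = (t / n_steps) ** 0.5          # standard deviation of one increment
    sup_sq_total = 0.0
    end_sq_total = 0.0
    for _ in range(n_paths):
        w = 0.0
        running_max = 0.0
        for _ in range(n_steps):
            w += rng.gauss(0.0, sd)
            running_max = max(running_max, abs(w))
        sup_sq_total += running_max ** 2
        end_sq_total += w ** 2
    return sup_sq_total / n_paths, end_sq_total / n_paths

mean_sup_sq, mean_end_sq = simulate_sup_and_end(n_paths=2000, n_steps=200)
# Since <W>_1 = 1, Theorems 12.6 and 12.7 with p = 2 say that E (W*_1)^2 is
# comparable to 1; Doob's inequality gives the explicit upper constant 4.
print(mean_end_sq, mean_sup_sq)
```

The lower comparison holds path by path, because the running maximum of |W| dominates |W_1|; the upper comparison is only checked on average.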
Theorem 12.7 Let $M_t$ be a continuous local martingale with $M_0 = 0$, a.s., and suppose
$2 \le p < \infty$. There exists a constant $c_2$ depending on $p$ such that for any finite stopping
time $T$,
\[ E \langle M \rangle_T^{p/2} \le c_2 E (M^*_T)^p. \]
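Proof As in the previous theorem, we may assume the left-hand side is positive. Set $r = p/2$.
Let us first suppose $\langle M \rangle_T$ and $M^*_T$ are bounded by a positive constant $K$.
Let $N_t = M_{t \wedge T}$, so that $\langle N \rangle_\infty = \langle M \rangle_T$, and let
$A_t = \langle M \rangle_{t \wedge T}^{r-1}$. Using integration by parts,
\[ \int_0^\infty \langle N \rangle_s \, dA_s = \langle N \rangle_\infty A_\infty - \int_0^\infty A_s \, d\langle N \rangle_s
   = \langle N \rangle_\infty^r - \frac{1}{r} \langle N \rangle_\infty^r. \]
Since
\[ \int_0^\infty \langle N \rangle_\infty \, dA_s = \langle N \rangle_\infty^r, \]
we then have
\[ \langle N \rangle_\infty^r = r \int_0^\infty (\langle N \rangle_\infty - \langle N \rangle_s) \, dA_s. \]
Using Propositions 3.14 and 9.6,
\[ E \langle M \rangle_T^r = E \langle N \rangle_\infty^r = r E \int_0^\infty (\langle N \rangle_\infty - \langle N \rangle_s) \, dA_s \]
\[ = r E \int_0^\infty (E[\langle N \rangle_\infty \mid \mathcal{F}_s] - \langle N \rangle_s) \, dA_s \]
\[ = r E \int_0^\infty E[\langle N \rangle_\infty - \langle N \rangle_s \mid \mathcal{F}_s] \, dA_s \]
\[ = r E \int_0^\infty E[N_\infty^2 - N_s^2 \mid \mathcal{F}_s] \, dA_s \]
\[ \le c E \int_0^\infty E[(N^*_\infty)^2 \mid \mathcal{F}_s] \, dA_s \]
\[ = c E[(N^*_\infty)^2 A_\infty] \]
\[ = c E[(M^*_T)^2 \langle M \rangle_T^{r-1}]. \]
We use Hölder’s inequality with exponents $r$ and $r/(r-1)$, divide both sides by the quantity
$(E \langle M \rangle_T^r)^{(r-1)/r}$, and then take both sides to the $r$th power. We then get
\[ E \langle M \rangle_T^r \le c E (M^*_T)^{2r}, \]
which is what we wanted.
To remove the restriction that $\langle M \rangle$ and $M^*$ are bounded, we apply the above to
$T \wedge V_K$ in place of $T$, where $V_K = \inf\{t : \langle M \rangle_t + M^*_t \ge K\}$, let
$K \to \infty$, and use Fatou’s lemma.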
12.6 Stratonovich integrals
For stochastic differential geometry and also many other purposes, the Stratonovich integral
is more convenient than the Itô integral. If $X$ and $Y$ are continuous semimartingales, the
Stratonovich integral, denoted $\int_0^t X_s \circ dY_s$, is defined by
\[ \int_0^t X_s \circ dY_s = \int_0^t X_s \, dY_s + \tfrac12 \langle X, Y \rangle_t. \]
Both the beauty and the difficulty of Itô’s formula are due to the quadratic variation term.
The change of variables formula for the Stratonovich integral avoids this.
Theorem 12.8 Suppose $f \in C^3$ and $X$ is a continuous semimartingale. Then
\[ f(X_t) = f(X_0) + \int_0^t f'(X_s) \circ dX_s. \]
Proof By Itô’s formula applied to the function $f$ and the definition of the Stratonovich
integral, it suffices to show that
\[ \langle f'(X), X \rangle_t = \int_0^t f''(X_s) \, d\langle X \rangle_s. \qquad (12.13) \]
Applying Itô’s formula to the function $f'$, which is in $C^2$,
\[ f'(X_t) = f'(X_0) + \int_0^t f''(X_s) \, dX_s + \tfrac12 \int_0^t f'''(X_s) \, d\langle X \rangle_s, \]
from which (12.13) follows.
If $X$ and $Y$ are continuous semimartingales and we apply the change of variables formula
with $f(x) = x^2$ to $X + Y$ and $X - Y$, we obtain
\[ (X_t + Y_t)^2 = (X_0 + Y_0)^2 + 2 \int_0^t (X_s + Y_s) \circ d(X_s + Y_s) \]
and
\[ (X_t - Y_t)^2 = (X_0 - Y_0)^2 + 2 \int_0^t (X_s - Y_s) \circ d(X_s - Y_s). \]
Taking the difference and then dividing by 4, we have the product formula for Stratonovich
integrals
\[ X_t Y_t = X_0 Y_0 + \int_0^t X_s \circ dY_s + \int_0^t Y_s \circ dX_s. \qquad (12.14) \]
The Stratonovich integral $\int H_s \circ dX_s$ can be represented as a limit of Riemann sums.
Proposition 12.9 Suppose $H$ and $X$ are continuous semimartingales and $t_0 > 0$. Then
$\int_0^{t_0} H_s \circ dX_s$ is the limit in probability as $n \to \infty$ of
\[ \sum_{k=0}^{2^n - 1} \frac{H_{k t_0/2^n} + H_{(k+1) t_0/2^n}}{2} \, (X_{(k+1) t_0/2^n} - X_{k t_0/2^n}). \]
Proof We write the sum as
\[ \sum_{k=0}^{2^n - 1} H_{k t_0/2^n} (X_{(k+1) t_0/2^n} - X_{k t_0/2^n})
   + \tfrac12 \sum_{k=0}^{2^n - 1} (H_{(k+1) t_0/2^n} - H_{k t_0/2^n})(X_{(k+1) t_0/2^n} - X_{k t_0/2^n}). \]
The first sum tends to $\int_0^{t_0} H_s \, dX_s$ while by Exercise 12.10 the second sum tends to
$\tfrac12 \langle H, X \rangle_{t_0}$.
This proves the proposition.
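To see the two kinds of Riemann sums side by side, here is a small sketch for the special case H = X = W, computed along one simulated path. The helper names `brownian_path` and `riemann_sums` and the step count are illustrative assumptions, not part of the text. For this choice the averaged-endpoint sum telescopes to W_{t0}^2 / 2 exactly, and it differs from the left-endpoint (Itô) sum by half the sum of squared increments, which mirrors the decomposition used in the proof.

```python
import random

def brownian_path(t0, n_steps, seed=0):
    """Return a discretized Brownian path on [0, t0] as a list of n_steps + 1 values."""
    rng = random.Random(seed)
    sd = (t0 / n_steps) ** 0.5
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path

def riemann_sums(path):
    """Return (Ito sum, Stratonovich sum, sum of squared increments) for H = X = path."""
    ito = 0.0
    strat = 0.0
    qv = 0.0
    for left, right in zip(path, path[1:]):
        dx = right - left
        ito += left * dx                     # left-endpoint sum
        strat += 0.5 * (left + right) * dx   # averaged-endpoint sum of Proposition 12.9
        qv += dx * dx                        # approximates <W, W>_{t0} = t0
    return ito, strat, qv

path = brownian_path(t0=1.0, n_steps=2 ** 12)
ito, strat, qv = riemann_sums(path)
w_end = path[-1]
print(strat, 0.5 * w_end ** 2)     # these agree up to rounding
print(strat - ito, 0.5 * qv)       # the gap is half the discrete quadratic variation
```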

Exercises

12.1 Show that $W_t$ and $W_t^2 - t$ are local martingales, where $W$ is defined in the statement of
Theorem 12.2.
12.2 Suppose $\{\mathcal{F}_t\}$ is a filtration satisfying the usual conditions, $X$ is a Brownian motion
with respect to $\{\mathcal{F}_t\}$, and $T$ is a finite stopping time with respect to this same filtration.
Let $Y$ be another Brownian motion that is independent of $\{\mathcal{F}_t\}$ and define
\[ Z_t = \begin{cases} X_t, & t < T, \\ X_T + Y_{t-T}, & t \ge T. \end{cases} \]
Show that $Z$ is a Brownian motion (although not necessarily with respect to $\{\mathcal{F}_t\}$).
12.3 Suppose Mt is a continuous local martingale with respect to a filtration {Ft} satisfying the usual
conditions, T is a stopping time with respect to {Ft}, and 〈M〉t = t ∧ T . Prove that Mt∧T has
the same law as a Brownian motion stopped at time T .
12.4 Here is a multidimensional version of Lévy’s theorem. Let $\{\mathcal{F}_t\}$ be a filtration
satisfying the usual conditions. Suppose $(M^1_t, \ldots, M^d_t)$ is a d-dimensional process such that
each component $M^i_t$ is a continuous martingale with respect to $\{\mathcal{F}_t\}$ with
$\langle M^i \rangle_t = t$. Suppose that $\langle M^i, M^j \rangle_t = 0$ if $i \ne j$. Prove that
$(M^1_t, \ldots, M^d_t)$ is a d-dimensional Brownian motion.
12.5 Let $\{\mathcal{F}_t\}$ be a filtration satisfying the usual conditions. Let $A_t$ be a strictly
increasing continuous process adapted to $\{\mathcal{F}_t\}$ with $\lim_{t \to \infty} A_t = \infty$, a.s.
Suppose $(M^1_t, \ldots, M^d_t)$ is a d-dimensional process such that each component $M^i_t$ is a
continuous martingale with respect to $\{\mathcal{F}_t\}$ and $\langle M^i \rangle_t = A_t$. Suppose that
$\langle M^i, M^j \rangle_t = 0$ if $i \ne j$. Prove that $(M^1_t, \ldots, M^d_t)$ is a time change of
d-dimensional Brownian motion.
12.6 Suppose M is a continuous local martingale such that 〈M〉t is deterministic. Prove that M is a
Gaussian process.
12.7 Suppose M is a continuous local martingale with M0 = 0, a.s. Show that there exists a Brownian
motion W , an increasing process τt , and a stopping time T such that Mt = Wτt∧T for all t.
12.8 Let $M_t$ be a continuous local martingale. Show that the events $(M^*_\infty < \infty)$ and
$(\langle M \rangle_\infty < \infty)$ differ by at most a null set.
12.9 Let $M_t$ be a continuous local martingale. Prove that
\[ P\big( \sup_{t \ge 0} |M_t| > x, \ \langle M \rangle_\infty < y \big) \le 2 e^{-x^2/2y}. \]
12.10 Suppose $X$ and $Y$ are continuous semimartingales and $t_0 > 0$. Prove that
\[ \sum_{k=0}^{2^n - 1} (X_{(k+1) t_0/2^n} - X_{k t_0/2^n})(Y_{(k+1) t_0/2^n} - Y_{k t_0/2^n}) \]
converges to $\langle X, Y \rangle_{t_0}$ in probability.

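12.11 Let $p > 0$. Suppose $X$ and $Y$ are non-negative random variables, $\beta > 1$, $\delta \in (0, 1)$,
and $\varepsilon \in (0, \beta^{-p}/2)$ such that
\[ P(X > \beta\lambda, \ Y < \delta\lambda) \le \varepsilon P(X \ge \lambda) \]
for all $\lambda > 0$. This inequality is known as a good-λ inequality. Prove that there exists a constant
$c$ (depending on $\beta$, $\delta$, $\varepsilon$, and $p$ but not $X$ or $Y$) such that
\[ E X^p \le c E Y^p. \]
Hint: First assume $X$ is bounded. Write
\[ P(X/\beta > \lambda) \le P(X > \beta\lambda, \ Y < \delta\lambda) + P(Y \ge \delta\lambda)
   \le \varepsilon P(X \ge \lambda) + P(Y/\delta \ge \lambda). \]
Multiply by $p \lambda^{p-1}$, integrate over $\lambda$, and use the fact that $\varepsilon < \beta^{-p}/2$.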
12.12 Use Exercise 12.11 to prove that the Burkholder–Davis–Gundy inequalities hold for all $p > 0$.
Hint: Use time change to reduce to the case of a Brownian motion $W$. If $T$ is a stopping time
and $U = \inf\{t : W^*_T > \lambda\}$, write
\[ P(W^*_T > \beta\lambda, \ T^{1/2} < \delta\lambda) = P(W^*_T > \beta\lambda, \ T < \delta^2\lambda^2, \ U < \infty) \]
\[ \le P\big( \sup_{U \le t \le U + \delta^2\lambda^2} |W_t - W_U| > (\beta - 1)\lambda, \ U < \infty \big). \]
Condition on $\mathcal{F}_U$, use Theorem 4.2, and notice that $P(U < \infty) = P(W^*_T > \lambda)$.

12.13 Define the $H^1$ norm of a martingale by
\[ \|M\|_{H^1} = E\big[ \sup_{t \ge 0} |M_t| \big]. \]
Prove that this is a norm. Does there exist a uniformly integrable continuous martingale that is
not in $H^1$?

12.14 Let $W$ be a Brownian motion and let $T$ be a stopping time. Prove that if $E T^{1/2} < \infty$, then
$E W_T = 0$.
12.15 Suppose $W = (W^1, \ldots, W^d)$ is a d-dimensional Brownian motion started at 0, and let
$\{\mathcal{F}_t\}$ be the minimal augmented filtration of $W$. Suppose $Y$ is an $\mathcal{F}_1$
measurable random variable with mean zero and finite variance. Prove there exist predictable processes
$H^1, \ldots, H^d$ such that $E \int_0^1 (H^i_s)^2 \, ds < \infty$ for each $i$ and
\[ Y = \sum_{i=1}^d \int_0^1 H^i_s \, dW^i_s. \]
12.16 Suppose $W$ is a Brownian motion and $H$ is adapted, bounded, and right continuous. Let $t \ge 0$.
Show that, as $h \to 0$,
\[ \frac{1}{W_{t+h} - W_t} \int_t^{t+h} H_s \, dW_s \]
converges in probability to $H_t$.
12.17 Let $W$ be a Brownian motion and $\alpha > 0$. Show that
\[ \int_0^t \frac{1}{|W_s|^\alpha} \, ds \]
is infinite almost surely if $\alpha \ge 1$ but finite almost surely if $\alpha < 1$.
12.18 Here is a useful inequality. Suppose $A$ is an increasing process with $A_0 = 0$, a.s., and suppose
there exists a non-negative random variable $B$ such that for each $t$,
\[ E[A_\infty - A_t \mid \mathcal{F}_t] \le E[B \mid \mathcal{F}_t], \quad \text{a.s.} \]
Prove that for each integer $p \ge 1$, there exists a constant $c_p$ depending only on $p$ such that
\[ E A_\infty^p \le c_p E B^p. \]
Hint: Write
A∞ = p!
∫ ∞
0
(A∞ − At ) dAt ,
take expectations, and use Proposition 3.14.
12.19 Let $W$ be a one-dimensional Brownian motion with filtration $\{\mathcal{F}_t\}$ and let $f(r, s)$ be
a deterministic function. Define the multiple stochastic integral by
\[ \int_0^t \int_0^s f(r, s) \, dW_r \, dW_s = \int_0^t \Big( \int_0^s f(r, s) \, dW_r \Big) dW_s, \]
provided
\[ \int_0^t \int_0^s f(r, s)^2 \, dr \, ds < \infty, \]
and similarly for higher-order multiple stochastic integrals.
(1) If $f : \mathbb{R}^m \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}$ are bounded and deterministic, $n \ne m$,
\[ M^f_t = \int_0^t \cdots \int_0^{r_{m-1}} f \, dW_{r_1} \cdots dW_{r_m}, \]
and $M^g_t$ is defined similarly, show that $E[M^f_t M^g_t] = 0$ for all $t$.
(2) Show that the collection of random variables
\[ \{ M^f_1 : f \text{ has domain } \mathbb{R}^m \text{ for some } m \text{ and is bounded and deterministic} \} \]
is dense in the set of mean zero $\mathcal{F}_1$ measurable random variables with respect to the $L^2(P)$ norm.
13
The Girsanov theorem
We look at what happens to a Brownian motion when we change P to another probability
measure Q. This may seem strange, but there are many applications of this, including to
financial mathematics and to filtering; see Chapters 28 and 29. Another application we will
give (at the end of this chapter in Section 13.2) is to determine the probability a Brownian
motion Ws crosses a line a + bs before time t.
13.1 The Brownian motion case
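We start with an observation. Suppose $Y_t$ is a continuous local martingale with $Y_0 = 0$ and
let $Z_t = e^{Y_t - \langle Y \rangle_t/2}$. Applying Itô’s formula to $X_t = Y_t - \tfrac12 \langle Y \rangle_t$
with the function $e^x$ yields
\[ Z_t = e^{Y_t - \langle Y \rangle_t/2} = 1 + \int_0^t e^{X_s} \, d\big( Y_s - \tfrac12 \langle Y \rangle_s \big) + \tfrac12 \int_0^t e^{X_s} \, d\langle Y \rangle_s \]
\[ = 1 + \int_0^t Z_s \, dY_s. \qquad (13.1) \]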
This can be abbreviated by dZt = Zt dYt . Zt is called the exponential of the martingale Y ,
and since Z is the stochastic integral with respect to a local martingale, it is itself a local
martingale.
Before stating the Girsanov theorem, we need two technical lemmas.
Lemma 13.1 Suppose $Y$ is a continuous local martingale with $Y_0 = 0$ and $Z_t = e^{Y_t - \langle Y \rangle_t/2}$.
If $\langle Y \rangle_t$ is a bounded random variable for each $t$, then $E |Z_t|^p < \infty$ for each
$p > 1$ and each $t$.
Proof Let us first suppose $Y$ is bounded in absolute value by $N$. Since $Z_t \ge 0$, we have by
the Cauchy–Schwarz inequality
\[ E Z_t^p = E e^{pY_t - p\langle Y \rangle_t/2} \qquad (13.2) \]
\[ = E\big[ e^{pY_t - p^2 \langle Y \rangle_t} \, e^{(p^2 - (p/2)) \langle Y \rangle_t} \big] \]
\[ \le \big( E e^{2pY_t - 2p^2 \langle Y \rangle_t} \big)^{1/2} \big( E e^{(2p^2 - p) \langle Y \rangle_t} \big)^{1/2}. \]
By the exact same calculation as in (13.1) but with $Y$ replaced by $2pY$, we see
$e^{2pY_t - 2p^2 \langle Y \rangle_t}$ is a stochastic integral of a bounded integrand with respect to a
bounded martingale, and hence is a martingale. This shows that the first factor on the last line of
(13.2) is 1. By our assumption that $\langle Y \rangle_t$ is bounded, the second factor on this line is
finite and does not depend on $N$.
If $Y$ is not bounded, let $T_N = \inf\{s : |Y_s| \ge N\}$, apply the above argument to $Y_{t \wedge T_N}$,
and let $N \to \infty$.
The second lemma is the following.

The second lemma is the following.

Lemma 13.2 Suppose $A_t$ is a continuous increasing process adapted to a filtration $\{\mathcal{F}_t\}$
satisfying the usual conditions. Let $X$ be a bounded random variable, $H$ a bounded adapted
process, $s < t$, and $B \in \mathcal{F}_s$. Then
\[ E\Big[ \int_s^t X H_r \, dA_r; B \Big] = E\Big[ \int_s^t E[X \mid \mathcal{F}_r] \, H_r \, dA_r; B \Big]. \]
Proof By linearity, it suffices to suppose $X$ and $H$ are non-negative. Let $A'_r = A_{r+s}$,
$H'_r = H_{r+s}$, and $\mathcal{F}'_r = \mathcal{F}_{r+s}$. Let $C_r = \int_0^r H'_u 1_B \, dA'_u$, and so we
must show
\[ E \int_0^{t-s} X \, dC_r = E \int_0^{t-s} E[X \mid \mathcal{F}'_r] \, dC_r. \]
This follows by Proposition 3.14.
Let $M_t$ be a non-negative continuous martingale with $M_0 = 1$, a.s. Define a new probability
measure $Q$ by $Q(A) = E[M_t; A]$ if $A \in \mathcal{F}_t$. Note $Q$ is a probability measure because
$Q(\Omega) = E M_t = E M_0 = 1$. $Q$ is well-defined because if $A \in \mathcal{F}_s \subset \mathcal{F}_t$,
then since $M$ is a martingale, we have $E[M_t; A] = E[M_s; A]$.
A more general version of the Girsanov theorem is possible (see Exercise 13.5), but the
Girsanov theorem is most frequently used with Brownian motion.
Theorem 13.3 Suppose $W_t$ is a Brownian motion with respect to $P$, $H$ is bounded and
predictable,
\[ M_t = \exp\Big( \int_0^t H_r \, dW_r - \tfrac12 \int_0^t H_r^2 \, dr \Big), \qquad (13.3) \]
and
\[ Q(B) = E_P[M_t; B] \quad \text{if } B \in \mathcal{F}_t. \qquad (13.4) \]
Then $W_t - \int_0^t H_r \, dr$ is a Brownian motion with respect to $Q$.
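For a concrete feel for the statement, take the simplest case of a constant integrand H_r = θ, so that M_t = exp(θ W_t − θ² t/2). The sketch below uses a plain midpoint rule against the N(0, t) density to check that, under the reweighted measure, W_t − θt has total mass 1, mean 0, and variance t at a fixed time t. This only checks one-dimensional marginals, not the full theorem; the function name `expect_under_q` and the values of θ and t are illustrative assumptions.

```python
import math

def expect_under_q(g, theta, t, n=20000, width=12.0):
    """Approximate E_P[M_t g(W_t)] for M_t = exp(theta*W_t - theta^2 t / 2) by the midpoint rule."""
    sd = math.sqrt(t)
    lo, hi = -width * sd, width * sd
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        density = math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)  # law of W_t under P
        weight = math.exp(theta * x - 0.5 * theta ** 2 * t)                # value of M_t on {W_t = x}
        total += weight * g(x) * density * h
    return total

theta, t = 0.7, 2.0
mass = expect_under_q(lambda x: 1.0, theta, t)                     # Q(Omega)
mean = expect_under_q(lambda x: x - theta * t, theta, t)           # E_Q[W_t - theta t]
var = expect_under_q(lambda x: (x - theta * t) ** 2, theta, t)     # E_Q[(W_t - theta t)^2]
print(mass, mean, var)
```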
Proof We prove the theorem by showing $W_t - \int_0^t H_r \, dr$ satisfies the hypotheses of Lévy’s
theorem (Theorem 12.1). We first show $W_t - \int_0^t H_r \, dr$ is a martingale with respect to $Q$. By
(13.1) with $Y_t = \int_0^t H_r \, dW_r$ and $Z_t = M_t$,
\[ M_t = 1 + \int_0^t M_r H_r \, dW_r. \]
By Exercise 10.5,
\[ \langle M, W \rangle_t = \int_0^t M_r H_r \, dr. \qquad (13.5) \]
We want to show that if $B \in \mathcal{F}_s$, then
\[ E_Q\Big[ W_t - \int_0^t H_r \, dr; B \Big] = E_Q\Big[ W_s - \int_0^s H_r \, dr; B \Big]. \qquad (13.6) \]
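If $B \in \mathcal{F}_s$, then using the definition of $Q$ and the product formula (Corollary 11.3),
\[ E_Q[W_t; B] = E_P[M_t W_t; B] \qquad (13.7) \]
\[ = E_P\Big[ \int_0^t M_r \, dW_r; B \Big] + E_P\Big[ \int_0^t W_r \, dM_r; B \Big] + E_P[\langle M, W \rangle_t; B] \]
and
\[ E_Q[W_s; B] = E_P[M_s W_s; B] \qquad (13.8) \]
\[ = E_P\Big[ \int_0^s M_r \, dW_r; B \Big] + E_P\Big[ \int_0^s W_r \, dM_r; B \Big] + E_P[\langle M, W \rangle_s; B]. \]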
Since $H$ is bounded, $\langle \int_0^\cdot H_r \, dW_r \rangle_t \le ct$. By Lemma 13.1, $M_t$ is a martingale
and $E |M_t|^p < \infty$ for each $t$ and each $p \ge 1$. Since stochastic integrals with respect to
martingales are martingales,
\[ E_P\Big[ \int_0^t M_r \, dW_r; B \Big] = E_P\Big[ \int_0^s M_r \, dW_r; B \Big] \qquad (13.9) \]
and
\[ E_P\Big[ \int_0^t W_r \, dM_r; B \Big] = E_P\Big[ \int_0^s W_r \, dM_r; B \Big]. \qquad (13.10) \]
Combining (13.7), (13.8), (13.9), and (13.10), we see that (13.6) will follow if we show
\[ E_P[\langle M, W \rangle_t - \langle M, W \rangle_s; B] = E_Q\Big[ \int_s^t H_r \, dr; B \Big]. \qquad (13.11) \]
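Using Lemma 13.2 and (13.5), we have
\[ E_Q\Big[ \int_s^t H_r \, dr; B \Big] = E_P\Big[ M_t \int_s^t H_r \, dr; B \Big] \]
\[ = E_P\Big[ \int_s^t M_t H_r \, dr; B \Big] \]
\[ = E_P\Big[ \int_s^t E[M_t \mid \mathcal{F}_r] \, H_r \, dr; B \Big] \]
\[ = E_P\Big[ \int_s^t M_r H_r \, dr; B \Big] \]
\[ = E_P[\langle M, W \rangle_t - \langle M, W \rangle_s; B], \]
which proves (13.11).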
A similar proof shows that $(W_t - \int_0^t H_r \, dr)^2 - t$ is a martingale with respect to $Q$, and hence
the quadratic variation of $W_t - \int_0^t H_r \, dr$ under $Q$ is still $t$ (or see Exercise 13.2). Since the
process $W_t - \int_0^t H_r \, dr$ has continuous paths, by Lévy’s theorem, $W_t - \int_0^t H_r \, dr$ is a
Brownian motion under $Q$.
The assumption that H be bounded can be weakened, but in practice it is more common
to use a stopping time argument; for an example, see the proof of Theorem 29.3.
13.2 An example
Let us give an example of the use of the Girsanov theorem, namely, to compute the probability
that Brownian motion crosses a line $a + bt$ by time $t_0$, $a > 0$. We want to find an exact
expression for $P(\exists t \le t_0 : W_t = a + bt)$, where $W$ is a Brownian motion.
Let $W_t$ be a Brownian motion under $P$. Define $Q$ on $\mathcal{F}_{t_0}$ by
\[ dQ/dP = M_{t_0}, \quad \text{where } M_t = e^{-bW_t - b^2 t/2}. \]
By the Girsanov theorem, under $Q$, $\widetilde{W}_t = W_t + bt$ is a Brownian motion, and
$W_t = \widetilde{W}_t - bt$.
Let $A = (\sup_{s \le t_0} W_s \ge a)$. If we set $S = \inf\{t > 0 : W_t = a\}$, then $A = (S \le t_0)$ and
$A \in \mathcal{F}_{S \wedge t_0}$. We write
\[ P(\exists t \le t_0 : W_t = a + bt) = P(\exists t \le t_0 : W_t - bt = a) \qquad (13.12) \]
\[ = P\big( \sup_{s \le t_0} (W_s - bs) \ge a \big). \]
$W_t$ is a Brownian motion under $P$ while $\widetilde{W}_t$ is a Brownian motion under $Q$. Therefore the
last line of (13.12) is equal to
\[ Q\big( \sup_{s \le t_0} (\widetilde{W}_s - bs) \ge a \big). \]
This in turn is equal to
\[ Q\big( \sup_{s \le t_0} W_s \ge a \big) = Q(A). \]

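To evaluate $Q(A)$, note $M_S = e^{-ab - b^2 S/2}$ and by (3.19) with $b$ replaced by $a$,
\[ P(S \in ds) = \frac{a}{\sqrt{2\pi s^3}} \, e^{-a^2/2s} \, ds. \]
Now we use optional stopping to obtain
\[ P(\exists t \le t_0 : W_t = a + bt) = Q(A) = E_P[M_{t_0}; A] \qquad (13.13) \]
\[ = E_P[M_{S \wedge t_0}; S \le t_0] \]
\[ = E_P[M_S; S \le t_0] \]
\[ = \int_0^{t_0} e^{-ab - b^2 s/2} \, \frac{a}{\sqrt{2\pi s^3}} \, e^{-a^2/2s} \, ds. \]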
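The integral in (13.13) is easy to evaluate numerically. The sketch below does so with a midpoint rule and compares the result with a closed form that is not taken from this text: the standard expression 1 − Φ((a + b t0)/√t0) + e^{−2ab} Φ((b t0 − a)/√t0) for the probability that Brownian motion with drift −b reaches level a by time t0, where Φ is the standard normal distribution function. The function names and the sample values of a, b, t0 are illustrative assumptions.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def crossing_prob_integral(a, b, t0, n=20000):
    """Midpoint-rule approximation of the integral on the last line of (13.13)."""
    h = t0 / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        hitting_density = a / math.sqrt(2 * math.pi * s ** 3) * math.exp(-a * a / (2 * s))
        total += math.exp(-a * b - 0.5 * b * b * s) * hitting_density * h
    return total

def crossing_prob_closed_form(a, b, t0):
    """Assumed closed form for P(sup_{s <= t0} (W_s - b s) >= a); not stated in the text."""
    r = math.sqrt(t0)
    return (1.0 - std_normal_cdf((a + b * t0) / r)
            + math.exp(-2 * a * b) * std_normal_cdf((b * t0 - a) / r))

a, b, t0 = 1.0, 0.5, 2.0
p_integral = crossing_prob_integral(a, b, t0)
p_closed = crossing_prob_closed_form(a, b, t0)
print(p_integral, p_closed)
```

With b = 0 both expressions reduce to 2(1 − Φ(a/√t0)), the reflection principle value, which is a quick consistency check.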

Exercises

13.1 Whether a filtration satisfies the usual conditions depends on the class of null sets and hence the

probability measure involved matters. Suppose {Ft} satisfies the usual conditions with respect

to P, H is a bounded predictable process, W a Brownian motion with respect to P, M defined

by (13.3), and Q defined by (13.4). If t0 > 0 and A ∈ σ (Ws; s ≤ t0), show P(A) = 0 if and only

if Q(A) = 0.


13.2 Theorem 9.10 allows us to avoid some calculations in the last paragraph of the proof of Theorem

13.3. Suppose X is a continuous semimartingale under P and Q is a probability measure

equivalent to P. That is, a set is a null set for P if and only if it is a null set for Q. Show X is a

semimartingale under Q and the quadratic variation of X under P equals the quadratic variation

of X under Q.

13.3 Let $W = (W^1, \ldots, W^d)$ be a d-dimensional Brownian motion with minimal augmented filtration
$\{\mathcal{F}_t\}$ and let $H_1, \ldots, H_d$ be bounded predictable processes. Let
\[ M_t = \exp\Big( \sum_{i=1}^d \int_0^t H_i(s) \, dW^i_s - \tfrac12 \sum_{i=1}^d \int_0^t |H_i(s)|^2 \, ds \Big). \]
Define a probability measure $Q$ by setting $Q(A) = E_P[M_t; A]$ if $A \in \mathcal{F}_t$. Let
$\widetilde{W}^i_t = W^i_t - \int_0^t H_i(s) \, ds$ for each $i$. Prove that
$\widetilde{W} = (\widetilde{W}^1, \ldots, \widetilde{W}^d)$ is a d-dimensional Brownian motion
under $Q$.

13.4 Let $W_t$ be a d-dimensional Brownian motion and let $\delta, t_0 > 0$. Let $f : [0, t_0] \to \mathbb{R}^d$
be a continuous function. Prove that there exists a constant $c > 0$ such that
\[ P\big( \sup_{s \le t_0} |W_s - f(s)| < \delta \big) > c. \]

This is known as the support theorem for Brownian motion.

Hint: First assume that f has a bounded derivative. Use Exercise 4.9 and the Girsanov theorem.

13.5 Here is a more general form of the Girsanov theorem. Suppose $L_t$ is a bounded continuous
martingale under $P$, $M_t = e^{L_t - \langle L \rangle_t/2}$, and $Q$ is a probability measure defined by
$Q(A) = E_P[M_{t_0}; A]$ if $A \in \mathcal{F}_{t_0}$. Suppose $\{\mathcal{F}_t\}$ is a filtration satisfying the
usual conditions with respect to both $P$ and $Q$. Show that if $X$ is a martingale under $P$, then
$X_t - \langle X, L \rangle_t$ is a martingale under $Q$.

14

Local times

Let Wt be a one-dimensional Brownian motion. Although the Lebesgue measure of the

random set {t : Wt = 0} is 0, a.s., nevertheless there is an increasing continuous process

which grows only when the Brownian motion is at 0. This increasing process is known as

local time at 0. We want to derive some of its properties.

14.1 Basic properties

Let W be a Brownian motion. By Jensen’s inequality for conditional expectations (Proposition

A.21), |Wt | is a submartingale, and by the Doob–Meyer decomposition (Theorem 9.12), it

can be written as a martingale plus an increasing process. Since Wt is itself a martingale, the

increasing process grows only at times when the Brownian motion is at 0.

Rather than appealing to the Doob–Meyer decomposition, we give the explicit decompo-

sition of |Wt |. We define

\[ \operatorname{sgn}(x) = \begin{cases} 1, & x > 0; \\ 0, & x = 0; \\ -1, & x < 0. \end{cases} \]
Theorem 14.1 Let $W_t$ be a one-dimensional Brownian motion.
(1) There exists a non-negative increasing continuous adapted process $L^0_t$ such that
\[ |W_t| = \int_0^t \operatorname{sgn}(W_s) \, dW_s + L^0_t. \qquad (14.1) \]
(2) $L^0_t$ increases only when $W$ is at 0. More precisely, if $W_s(\omega) \ne 0$ for $r \le s \le t$, then
$L^0_r(\omega) = L^0_t(\omega)$.
$L^0_t$ is called the local time at 0. The equation (14.1) is called the Tanaka formula.
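Proof Define
\[ f_\varepsilon(x) = \begin{cases} x^2/2\varepsilon, & |x| < \varepsilon; \\ |x| - (\varepsilon/2), & |x| \ge \varepsilon. \end{cases} \]
The function $f_\varepsilon$ is an approximation to the function $|\cdot|$, and note that
$f_\varepsilon(0) = f'_\varepsilon(0) = 0$, while
$f''_\varepsilon(x) = \varepsilon^{-1} 1_{[-\varepsilon, \varepsilon]}(x)$, except at $x = \pm\varepsilon$.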
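The following sketch tabulates the smoothing function on a grid to make its two key features visible: it matches |x| up to a uniform error of ε/2, and its derivative is continuous across x = ±ε and equals sgn(x) outside [−ε, ε]. The function names and the grid parameters are illustrative choices, not part of the text.

```python
def f_eps(x, eps):
    """Smoothed absolute value: quadratic on (-eps, eps), |x| - eps/2 outside."""
    return x * x / (2 * eps) if abs(x) < eps else abs(x) - eps / 2

def f_eps_prime(x, eps):
    """Derivative of f_eps: x/eps on (-eps, eps), sgn(x) outside."""
    if abs(x) < eps:
        return x / eps
    return 1.0 if x > 0 else -1.0

def sup_gap(eps, n=4001, span=2.0):
    """Largest value of |f_eps(x) - |x|| over an evenly spaced grid on [-span, span]."""
    xs = [-span + 2 * span * k / (n - 1) for k in range(n)]
    return max(abs(f_eps(x, eps) - abs(x)) for x in xs)

eps = 0.1
gap = sup_gap(eps)                                   # equals eps/2 up to rounding
jump = abs(f_eps_prime(eps - 1e-12, eps) - f_eps_prime(eps, eps))
print(gap, jump)
```

Since the gap is ε/2, the convergence of f_ε to |x| is uniform as ε → 0.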
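We apply the extension of Itô’s formula given in Exercise 11.2 to $f_\varepsilon(W_t)$ and obtain
\[ f_\varepsilon(W_t) = \int_0^t f'_\varepsilon(W_s) \, dW_s + \tfrac12 \int_0^t f''_\varepsilon(W_s) \, ds. \]
As we let $\varepsilon \to 0$, we see that $f_\varepsilon(x) \to |x|$ uniformly, and
$f'_\varepsilon(x) \to \operatorname{sgn}(x)$ boundedly. By Doob’s inequalities, if $t_0 > 0$,
\[ E \sup_{t \le t_0} \Big| \int_0^t f'_\varepsilon(W_s) \, dW_s - \int_0^t \operatorname{sgn}(W_s) \, dW_s \Big|^2 \to 0, \qquad (14.2) \]
while $\sup_{t \le t_0} | f_\varepsilon(W_t) - |W_t| | \to 0$, a.s. Therefore there exists an increasing process
$L^0_t$ and a subsequence $\varepsilon_n \to 0$ such that
\[ \sup_{t \le t_0} \Big| \frac{1}{2\varepsilon_n} \int_0^t 1_{[-\varepsilon_n, \varepsilon_n]}(W_s) \, ds - L^0_t \Big| \to 0, \quad \text{a.s.} \qquad (14.3) \]
Hence for almost every ω there is convergence uniformly over $t$ in finite intervals, so $L^0_t$ is
continuous in $t$. Since $\frac{1}{2\varepsilon_n} \int_0^t 1_{[-\varepsilon_n, \varepsilon_n]}(W_s) \, ds$
increases only for those times $t$ where $|W_t| \le \varepsilon_n$,
then $L^0_t$ increases only on the set of times when $W_t = 0$.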

In the Tanaka formula, the stochastic integral term is a martingale, say Nt . Note 〈N〉t = t,

since sgn (x)2 = 1 unless x = 0, and we have seen that Brownian motion spends 0 time at

0 (Exercise 11.1). Hence we have exhibited reflecting Brownian motion, namely |Wt |, as the

sum of another Brownian motion, Nt , and a continuous process that increases only when W

is at zero.

Let $M_t$ denote $\sup_{s \le t} W_s$. Note we do not have an absolute value here. The following, due
to Lévy, is often useful.
Theorem 14.2 The two-dimensional processes $(|W|, L^0)$ and $(M - W, M)$ have the same
law.
Proof Let $V_t = -N_t$ in the Tanaka formula, so that
\[ |W_t| = -V_t + L^0_t. \qquad (14.4) \]
Let $S_t = \sup_{s \le t} V_s$. We will show $S_t = L^0_t$. This will prove the result, since $V$ is a Brownian
motion, and hence $(M - W, M)$ is equal in law to $(S - V, S) = (|W|, L^0)$.
From (14.4), $V_t = L^0_t - |W_t|$, or $V_t \le L^0_t$ for all $t$, hence $S_t \le L^0_t$, since $L^0$ is
increasing. $L^0_t$ increases only when $W_t = 0$ and at those times
\[ L^0_t = V_t + |W_t| = V_t \le S_t. \]
Given two increasing functions with $f \le g$, if $f(t) = g(t)$ at those times when $g$ increases,
a little thought shows that $f$ and $g$ are equal for all $t$. Hence $L^0_t = S_t$ for all $t$.

Just as we defined $L^0_t$ via the Tanaka formula, we can construct local time at the level $a$ by
the formula
\[ |W_t - a| - |W_0 - a| = \int_0^t \operatorname{sgn}(W_s - a) \, dW_s + L^a_t, \qquad (14.5) \]
and the same proof as above shows that $L^a_t$ is the limit in $L^2$ of
\[ \frac{1}{2\varepsilon} \int_0^t 1_{[a - \varepsilon, a + \varepsilon]}(W_s) \, ds. \]

14.2 Joint continuity of local times

Next we will prove that Lat can be taken to be jointly continuous in both t and a.

Theorem 14.3 Let $W$ be a one-dimensional Brownian motion and let $L^a_t$ be the local time
of $W$ at level $a$. For each $a \in \mathbb{R}$ there exists a version $\widetilde{L}^a_t$ of $L^a_t$ so that
with probability one, $\widetilde{L}^a_t$ is jointly continuous in $t$ and $a$.
Recall that two processes $X$ and $Y$ are versions of each other if for each $t$, $X_t = Y_t$, a.s.
We will use the Kolmogorov continuity criterion, Corollary 8.2, together with Remark 8.3.
We will obtain an estimate on $\widetilde{N}^a_t - \widetilde{N}^b_t$, where
$\widetilde{N}^a_t = \int_0^t \operatorname{sgn}(W_s - a) \, dW_s$, by means of the
Burkholder–Davis–Gundy inequalities.

Proof Let $M > 0$ be arbitrary. It suffices to show the joint continuity for times less than or
equal to $M$ and for $|a| \le M$. Let
\[ N^a_t = \int_0^{M \wedge t} \operatorname{sgn}(W_s - a) \, dW_s. \]
Since $|W_t - a|$ is uniformly continuous in $t$ and $a$ for $|t| \le M$, $|a| \le M$, by the Tanaka formula
(14.5) it suffices to establish the same fact for $N^a_t$.

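Let $T$ be a stopping time bounded by $M$ and $a < b$. Since $(N^a_t - N^b_t)^2 - \langle N^a - N^b \rangle_t$ is a
martingale,
\[ E\big[ ((N^a_M - N^b_M) - (N^a_T - N^b_T))^2 \mid \mathcal{F}_T \big]
   = E\Big[ \int_T^M (\operatorname{sgn}(W_s - a) - \operatorname{sgn}(W_s - b))^2 \, ds \mid \mathcal{F}_T \Big] \]
\[ = 4 E\Big[ \int_T^M 1_{[a,b]}(W_s) \, ds \mid \mathcal{F}_T \Big] \]
\[ \le 4 E\Big[ \int_T^{M+T} 1_{[a,b]}(W_s) \, ds \mid \mathcal{F}_T \Big] \]
\[ = 4 E\Big[ \int_0^M 1_{[a,b]}(W_{s+T}) \, ds \mid \mathcal{F}_T \Big]; \]
recall Exercise 11.1. From Proposition 4.5 we deduce
\[ E\Big[ \int_0^M 1_{[a,b]}(W_{s+T}) \, ds \mid \mathcal{F}_T \Big] \le \int_0^M \frac{c(b - a)}{\sqrt{s}} \, ds \le c(b - a). \]
Thus
\[ E\big[ ((N^a_M - N^b_M) - (N^a_T - N^b_T))^2 \mid \mathcal{F}_T \big] \le c|b - a|, \]
and so by (9.3)
\[ E[\langle N^a - N^b \rangle_M - \langle N^a - N^b \rangle_T \mid \mathcal{F}_T] \le c|b - a|. \]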
If we write $A_t = \langle N^a - N^b \rangle_t$, then we have by Proposition 3.14
\[ E A_M^2 = 2 E \int_0^M (A_M - A_t) \, dA_t \]
\[ = 2 E\Big[ \int_0^M (E[A_M \mid \mathcal{F}_t] - A_t) \, dA_t \Big] \]
\[ = 2 E\Big[ \int_0^M E[A_M - A_t \mid \mathcal{F}_t] \, dA_t \Big] \]
\[ \le c|b - a| \, E \int_0^M dA_t \le c|b - a|^2. \]
Applying the Burkholder–Davis–Gundy inequalities,
\[ E\big[ \sup_{t \le M} |N^a_t - N^b_t|^4 \big] \le c|b - a|^2. \qquad (14.6) \]
By the Kolmogorov continuity criterion applied on the Banach space of continuous functions
with the metric $d(f, g) = \sup_{t \le M} |f(t) - g(t)|$, we see $N^a_t$ is continuous as a function of $a$ for
$a$ in the dyadic rationals in $[-M, M]$, uniformly over $t \le M$. Therefore $L^a_t$ is continuous over
$a$ in the dyadic rationals in $[-M, M]$, uniformly for $t \le M$. Also, (14.5) and (14.6) imply
\[ E\big[ \sup_{t \le M} |L^a_t - L^b_t|^4 \big] \le c (|a - b| \wedge 1)^2. \qquad (14.7) \]
Note that if we define $\widetilde{L}^a_t = \lim L^{b_n}_t$ where the limit is as $b_n \to a$ and $b_n$ is in the
dyadic rationals, then (14.7) implies that $\widetilde{L}^a_t = L^a_t$, a.s. The uniform continuity of $L^a_t$
over $a$ in the dyadic rationals and $t \le M$ implies the joint continuity of $\widetilde{L}^a_t$.
14.3 Occupation times
If we integrate local times over a set, we obtain occupation times. More precisely, we have
the following.
Theorem 14.4 Let $W_t$ be a Brownian motion and $L^y_t$ the local time at the level $y$, where we
take $L^y_t$ to be jointly continuous in $t$ and $y$. If $f$ is non-negative and Borel measurable,
\[ \int f(y) L^y_t \, dy = \int_0^t f(W_s) \, ds, \quad \text{a.s.} \qquad (14.8) \]
with the null set independent of $f$ and $t$.
Proof Suppose we prove the above equality for each C2 function f with compact support
and denote the null set by Nf . Taking a countable collection { fi} of non-negative C2 functions
with compact support that are dense in the set of non-negative continuous functions on R
with compact support and letting N = ∪iNfi , then if ω /∈ N we have the above equality for
all fi. By taking limits, we have (14.8) for all bounded and continuous f . A further limiting
procedure implies our result.
Suppose $f$ is bounded and $C^2$ with compact support. Notice that the process $\int f(y) L^y_t \, dy$
is increasing and continuous. Define
\[ g(x) = \int f(y) |x - y| \, dy. \qquad (14.9) \]
By Exercise 14.1, $g$ is $C^2$ with $\tfrac12 g'' = f$. If we take the Tanaka formula (14.5), replace $a$ by
$y$, multiply by $f(y)$, and integrate over $\mathbb{R}$ with respect to $y$, we see that
\[ g(W_t) - g(W_0) = \text{martingale} + \int f(y) L^y_t \, dy. \]
Using Itô’s formula,
\[ g(W_t) - g(W_0) = \text{martingale} + \tfrac12 \int_0^t g''(W_s) \, ds
   = \text{martingale} + \int_0^t f(W_s) \, ds. \]
Thus
\[ \int f(y) L^y_t \, dy - \int_0^t f(W_s) \, ds \]
is a continuous martingale with paths locally of bounded variation, hence by Theorem 9.7 it
is identically 0.
Exercises
14.1 Suppose f is C2 with compact support and
g(x) =
∫
f (y)|x − y| dy.
Show that g is C2 and g′′ = 2 f .
14.2 Let $L^y_t$ be the jointly continuous local times of a Brownian motion $W$. Show
\[ \frac{1}{2\varepsilon} \int_0^t 1_{[y - \varepsilon, y + \varepsilon]}(W_s) \, ds \to L^y_t, \quad \text{a.s.} \]
Show the null set can be taken to be independent of $y$. Thus there is no need to take a subsequence
$\varepsilon_n$ to get almost sure convergence to $L^y_t$.
14.3 Let $W$ be a Brownian motion and fix $t$. Show that the function
$x \mapsto \int_0^t 1_{(-\infty, x]}(W_s) \, ds$ is continuous, a.s., but that the function
$x \mapsto 1_{(-\infty, x]}(W_t)$ is not continuous.
14.4 Let {Ft} be a filtration satisfying the usual conditions. Suppose Wt is a Brownian motion and
Xt = Wt + At , where Xt ≥ 0 for all t, a.s., and At is an increasing continuous adapted process
such that A increases only at those times when Xt = 0. Suppose also that X ′t = Wt + A′t , where
X ′t ≥ 0 for all t, a.s., and A′t is an increasing continuous adapted process that increases only
when X ′t = 0. Show that X ′t = Xt and At = A′t , a.s., for all t ≥ 0.
14.5 Let $W$ be a Brownian motion and $L^0_t$ the local time at 0. Since $L^0_t$ is increasing, for each ω
there is a Lebesgue–Stieltjes measure $dL^0_t$. Show that the support of $dL^0_t$ is equal to
$\{t : W_t = 0\}$.
Since Theorem 14.1(2) states that $L^0_t$ does not increase when $W_t$ is not equal to 0, what you need
to show is that with probability one, if $W_u(\omega) = 0$ and $t < u < v$, then $L^0_v(\omega) > L^0_t(\omega)$.

14.6 Use Tanaka’s formula to show that if $L^y_t$ is the local time of Brownian motion at level $y$,
$a \le x \le y \le b$, and $T = \inf\{t > 0 : W_t \notin [a, b]\}$, then
\[ E^x L^y_T = \frac{2(x - a)(b - y)}{b - a}. \]
14.7 If $L^0_t$ is the local time of a Brownian motion at 0, show that $L^0_{at}$ has the same law as
$\sqrt{a} \, L^0_t$.
14.8 Let $W$ be a Brownian motion with local times $L^y_t$. Set $L^*_t = \sup_y L^y_t$. Let $p > 0$. Prove that
there exist constants $c_1, c_2$ such that if $T$ is any finite stopping time,
\[ c_1 E T^{p/2} \le E (L^*_T)^p \le c_2 E T^{p/2}. \]
The constants $c_1, c_2$ can depend on $p$, but not on $T$.
Hint: Use Exercise 12.11.

14.9 This exercise defines the local time of a continuous martingale. If M is a continuous martingale,

then M2t is a submartingale and so equals a martingale plus an increasing process. The increasing

process L0t is called the local time of M at 0.

(1) Prove the analog of Tanaka’s formula.

(2) Define the local time $L^a_t$ of $M$ at $a$. Prove that $L^a_t$ is jointly continuous in $t$ and $a$.
(3) Prove that
\[ \int_0^t f(M_s) \, d\langle M \rangle_s = \int_{\mathbb{R}} L^a_t f(a) \, da, \quad \text{a.s.} \]
if $f$ is non-negative and measurable.

14.10 This exercise is a complement to Exercise 7.8. Let $W$ be a Brownian motion and let us define
$Z = \{t \in [0, 1] : W_t = 0\}$, the zero set. Let $\varepsilon \in (0, 1/2)$ and let $\delta > 0$. Fix ω and let
$\{B_i\}$ be any countable covering of $Z(\omega)$ by closed intervals such that the interiors of the $B_i$’s
are pairwise disjoint and the length of each $B_i$ is less than or equal to $\delta$. We write $B_i = [a_i, b_i]$.
Since $L^0$ has the same law as the maximum of Brownian motion, there exists a $c$
(depending on ω) such that
\[ L^0_t - L^0_s \le c (t - s)^{\frac12 - \frac{\varepsilon}{2}} \]
for each $0 \le s \le t \le 1$. Write
\[ \sum_i |b_i - a_i|^{\frac12 - \varepsilon} \ge \frac{\delta^{-\varepsilon/2}}{c} \, c \sum_i |b_i - a_i|^{\frac12 - \frac{\varepsilon}{2}} \]
\[ \ge \frac{\delta^{-\varepsilon/2}}{c} \sum_i (L^0_{b_i} - L^0_{a_i}) \]
\[ = \frac{\delta^{-\varepsilon/2}}{c} [L^0_1 - L^0_0]. \]
Show that this implies that the Hausdorff dimension of $Z$ is at least 1/2.

15

Skorokhod embedding

Suppose Y is a random variable with mean zero and finite variance. Skorokhod proved the

remarkable fact that if W is a Brownian motion, there exists a stopping time T such that

WT has the same law as Y . Without any restrictions on T , there is a trivial solution (see

Exercise 15.1), so one wants to require that E T < ∞. Skorokhod’s construction required
an additional random variable that is independent of the Brownian motion, but since that
time there have been 15 or 20 other constructions, most of which don’t require the extra
randomization, that is, T is a stopping time for the minimal augmented filtration generated
by W .
Although conceptually some constructions are easier than others, none is easy from the
point of view of technical details. We will give a construction that doesn’t have any optimality
properties, but is a nice example of stochastic calculus. Then we will use this to prove an
embedding for random walks.
15.1 Preliminaries
A function f : R → R is a Lipschitz function if there exists a constant k such that
| f (y) − f (x)| ≤ k|y − x|, x, y ∈ R. (15.1)
By the mean value theorem, if f has a bounded derivative, then f is a Lipschitz function.
We will need the following well-known theorem from the theory of ordinary differential
equations.
Theorem 15.1 Suppose $F : [0, \infty) \times \mathbb{R} \to \mathbb{R}$ is a bounded function and there exists a
positive real $k$ such that
\[ |F(t, x) - F(t, y)| \le k|x - y| \]
for all $t \ge 0$ and all $x, y \in \mathbb{R}$. Let $y_0 \in \mathbb{R}$, define the function $y_0$ by
$y_0(t) = y_0$ for all $t \ge 0$, and define the function $y_i$ inductively by
\[ y_{i+1}(t) = y_0 + \int_0^t F(s, y_i(s)) \, ds, \quad t \ge 0. \qquad (15.2) \]
Then the functions $y_i$ converge uniformly on bounded intervals to a function $y$ that satisfies
\[ y(t) = y_0 + \int_0^t F(s, y(s)) \, ds. \qquad (15.3) \]
For any $s$ such that $F(s, y(s))$ is continuous at $s$, $y$ satisfies
\[ \frac{dy}{ds} = F(s, y(s)). \qquad (15.4) \]
The solution to (15.3) is unique.
This inductive procedure for obtaining the solution to (15.4) is known as Picard iteration.
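A discretized version of the iteration (15.2) can be sketched as follows, purely as an illustration. The integral is replaced by the trapezoid rule on a fixed grid, and the example right-hand side F(s, y) = sin(y) with y(0) = π/2 is an assumption chosen because it is bounded, Lipschitz with k = 1, and has the explicit solution y(t) = 2 arctan(e^t). The names `picard_iterates`, `n_grid`, and `n_iter` are hypothetical.

```python
import math

def picard_iterates(F, y0, t_end, n_grid=2000, n_iter=30):
    """Run n_iter Picard steps on a uniform grid; return (grid, last iterate, sup-gaps)."""
    h = t_end / n_grid
    ts = [k * h for k in range(n_grid + 1)]
    y = [y0] * (n_grid + 1)            # the constant starting function
    gaps = []                          # sup over the grid of |new iterate - old iterate|
    for _ in range(n_iter):
        values = [F(s, ys) for s, ys in zip(ts, y)]
        new_y = [y0]
        for k in range(n_grid):
            # trapezoid rule for the integral of F(s, y_i(s)) over one grid cell
            new_y.append(new_y[-1] + 0.5 * h * (values[k] + values[k + 1]))
        gaps.append(max(abs(u - v) for u, v in zip(new_y, y)))
        y = new_y
    return ts, y, gaps

ts, y, gaps = picard_iterates(lambda s, x: math.sin(x), y0=math.pi / 2, t_end=2.0)
exact = [2 * math.atan(math.exp(t)) for t in ts]
max_error = max(abs(u - v) for u, v in zip(y, exact))
print(gaps[0], gaps[-1], max_error)
```

The successive gaps shrink factorially fast, which is the numerical counterpart of the bound on sup |y_{i+1} − y_i| obtained in the proof.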
Proof Note each $y_i(t)$ is bounded in absolute value by $|y_0| + t \sup |F|$. Let $g_i(t) =
\sup_{s \le t} |y_{i+1}(s) - y_i(s)|$. If $s \le t$, then
\[
\begin{aligned}
|y_{i+1}(s) - y_i(s)| &= \Big| \int_0^s [F(r, y_i(r)) - F(r, y_{i-1}(r))]\,dr \Big| \\
&\le \int_0^t |F(r, y_i(r)) - F(r, y_{i-1}(r))|\,dr \\
&\le k \int_0^t |y_i(r) - y_{i-1}(r)|\,dr \\
&\le k \int_0^t g_{i-1}(r)\,dr.
\end{aligned}
\]
Taking the supremum over $s \le t$, we have
\[ g_i(t) \le k \int_0^t g_{i-1}(r)\,dr. \]
Fix $t_0$. Now $g_1(t)$ is bounded for $t \le t_0$, say by $L$. Then $g_2(t) \le k \int_0^t L\,dr = kLt$ for each
$t \le t_0$, and then $g_3(t) \le k \int_0^t (kLr)\,dr = k^2 L t^2/2$ and $g_4(t) \le k \int_0^t (k^2 L r^2/2)\,dr = k^3 L t^3/3!$.
By induction $g_i(t) \le k^{i-1} L t^{i-1}/(i-1)!$. We conclude $\sum_{i=1}^{\infty} g_i(t_0) < \infty$.
Then
\[ \sup_{s \le t_0} |y_n(s) - y_m(s)| \le \sum_{i=m}^{n-1} g_i(t_0), \]
which tends to zero as $m$ and $n$ tend to infinity. By the completeness of the space $C[0, t_0]$,
there exists a continuous function $y$ such that $\sup_{s \le t_0} |y_n(s) - y(s)| \to 0$ as $n \to \infty$.
F is continuous in the x variable, so taking the limit in (15.2) shows that y solves (15.3).
If F is continuous at a particular value of s, then (15.4) holds by the fundamental theorem of
calculus.
To prove uniqueness, suppose $x$ and $y$ are solutions to (15.3) and let us set $g(t) =
\sup_{s \le t} |x(s) - y(s)|$. If $s \le t$, then
\[
\begin{aligned}
|x(s) - y(s)| &\le \int_0^s |F(r, x(r)) - F(r, y(r))|\,dr \\
&\le k \int_0^t |x(r) - y(r)|\,dr \\
&\le k \int_0^t g(r)\,dr.
\end{aligned}
\]
Taking the supremum over $s \le t$, we obtain
\[ g(t) \le k \int_0^t g(r)\,dr. \]
For $t \le t_0$, we have $|x(t)|$ and $|y(t)|$ bounded by a constant, say $L/2$, so $g(t) \le L$ for $t \le t_0$.
We then have $g(t) \le k \int_0^t L\,dr = kLt$ for each $t \le t_0$ and then $g(t) \le k \int_0^t kLr\,dr = k^2 L t^2/2$.
Iterating, we have $g(t) \le k^i t^i L/i!$ for each $i$, and hence $g(t) = 0$. This is true for each $t \le t_0$,
hence $x(s) = y(s)$ for all $s \le t_0$.
If the random variable Y that we are considering is equal to 0, a.s., we can just let our
stopping time T equal 0, a.s., and then WT = 0 = Y if W is a Brownian motion. In the
remainder of this section and the next we assume $EY = 0$, $EY^2 < \infty$, but that $Y$ is not
identically zero.
Define
\[ p_s(y) = \frac{1}{\sqrt{2\pi s}}\, e^{-y^2/2s}, \]
the density of a mean zero normal random variable with variance s. Use p′s(x) to denote the
derivative of ps with respect to x.
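Lemma 15.2 Suppose $W$ is a Brownian motion and $g : \mathbb{R} \to \mathbb{R}$ such that $E[g(W_1)^2] < \infty$.
For $0 < s < 1$, let
\[ a(s, x) = -\int p'_{1-s}(z - x)\, g(z)\,dz \tag{15.5} \]
and
\[ b(s, x) = \int p_{1-s}(z - x)\, g(z)\,dz. \tag{15.6} \]
We have
\[ g(W_1) = E\,g(W_1) + \int_0^1 a(s, W_s)\,dW_s, \quad \text{a.s.} \tag{15.7} \]
and
\[ E[g(W_1) \mid \mathcal{F}_s] = b(s, W_s), \quad \text{a.s.} \tag{15.8} \]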
Proof We will first prove (15.7), and we will first look at the case when $g(x) = e^{iux}$.
By Itô's formula with the function $f(x) = e^x$ applied to the semimartingale $X_t = iuW_t + u^2 t/2$,
\[
\begin{aligned}
e^{iuW_t + u^2 t/2} &= 1 + \int_0^t e^{X_s}\, d(iuW_s + u^2 s/2) + \tfrac{1}{2} \int_0^t (-u^2) e^{X_s}\,ds \\
&= 1 + iu \int_0^t e^{iuW_s + u^2 s/2}\,dW_s,
\end{aligned}
\]
so
\[ e^{iuW_1} = e^{-u^2/2} + \int_0^1 iu\, e^{iuW_s} e^{u^2(s-1)/2}\,dW_s. \]
We need to check that
\[ iu\, e^{iux} e^{u^2(s-1)/2} = a(s, x). \]
Using integration by parts,
\[
\begin{aligned}
a(s, x) &= -\int p'_{1-s}(z - x)\, g(z)\,dz = \int p_{1-s}(z - x)\, g'(z)\,dz \\
&= iu \int \frac{1}{\sqrt{2\pi(1-s)}}\, e^{-(z-x)^2/2(1-s)} e^{iuz}\,dz.
\end{aligned}
\]
This is $iu$ times the characteristic function of a normal random variable with mean $x$ and
variance $1 - s$, and so by (A.25) equals
\[ iu\, e^{iux} e^{-u^2(1-s)/2}, \]
as desired. We therefore have
\[ e^{iuW_1} = E\,e^{iuW_1} - \int_0^1 \int p'_{1-s}(z - W_s)\, e^{iuz}\,dz\,dW_s. \tag{15.9} \]
Now suppose $g$ is in the Schwartz class (see Section B.2), replace $u$ by $-u$ in (15.9),
multiply by the Fourier transform of $g$, and integrate over $u \in \mathbb{R}$. We then obtain
\[ (2\pi)^{-1} g(W_1) = (2\pi)^{-1} E\,g(W_1) - \int \int_0^1 \int p'_{1-s}(z - W_s)\, e^{-iuz}\, \hat{g}(u)\,dz\,dW_s\,du, \tag{15.10} \]
where $\hat{g}$ is the Fourier transform of $g$. Using the Fubini theorem (check that there is no
trouble with the stochastic integral; see Exercise 15.2) and the inversion formula for Fourier
transforms, the triple integral on the right-hand side of (15.10) is equal to
\[ -(2\pi)^{-1} \int_0^1 \int p'_{1-s}(z - W_s)\, g(z)\,dz\,dW_s, \tag{15.11} \]
which gives us (15.7) when $g$ is in the Schwartz class. A limit argument gives us (15.7) for all
$g$ that we are interested in.
To prove (15.8) we again start with the case $g(x) = e^{iux}$. We have
\[
\begin{aligned}
E[e^{iuW_1} \mid \mathcal{F}_s] &= e^{iuW_s} E[e^{iu(W_1 - W_s)} \mid \mathcal{F}_s] = e^{iuW_s} E[e^{iu(W_1 - W_s)}] \\
&= e^{iuW_s} e^{-u^2(1-s)/2},
\end{aligned}
\]
using the independent increments property of Brownian motion and (A.25). On the other
hand, the definition of $b(s, x)$ shows that when $g(x) = e^{iux}$, $b(s, x)$ is the characteristic
function of a normal random variable with mean $x$ and variance $1 - s$, so
\[ b(s, x) = e^{iux} e^{-u^2(1-s)/2}. \]
Replacing $x$ by $W_s$ proves (15.8) in the case $g(x) = e^{iux}$. We extend this to general $g$ in the
same way as in the proof of (15.7).
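The closed forms used in this proof can be checked numerically. The sketch below takes the real part of the exponential case, $g(z) = \cos(uz)$, for which (15.5) and (15.6) give $a(s, x) = -u \sin(ux) e^{-u^2(1-s)/2}$ and $b(s, x) = \cos(ux) e^{-u^2(1-s)/2}$, and compares these with a trapezoid-rule evaluation of the integrals. The helper names, the truncation of the integrals to a window of half-width 12 around $x$, and the sample values of $u$, $s$, $x$ are assumptions made for this illustration.

```python
import math

def normal_density(s, y):
    """p_s(y): density of a mean-zero normal with variance s."""
    return math.exp(-y * y / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)

def normal_density_prime(s, y):
    """Derivative of p_s(y) with respect to y."""
    return -y / s * normal_density(s, y)

def integrate(h, lo, hi, n=20000):
    """Composite trapezoid rule for the integral of h over [lo, hi]."""
    step = (hi - lo) / n
    total = 0.5 * (h(lo) + h(hi))
    for j in range(1, n):
        total += h(lo + j * step)
    return total * step

def a(s, x, g, half_width=12.0):
    """Numerical version of (15.5), with the integral truncated to a window around x."""
    return -integrate(lambda z: normal_density_prime(1.0 - s, z - x) * g(z),
                      x - half_width, x + half_width)

def b(s, x, g, half_width=12.0):
    """Numerical version of (15.6), with the integral truncated to a window around x."""
    return integrate(lambda z: normal_density(1.0 - s, z - x) * g(z),
                     x - half_width, x + half_width)

u, s, x = 1.3, 0.4, 0.7                      # arbitrary sample values
g = lambda z: math.cos(u * z)                # real part of exp(iuz)
damp = math.exp(-u * u * (1.0 - s) / 2.0)
a_num, a_exact = a(s, x, g), -u * math.sin(u * x) * damp
b_num, b_exact = b(s, x, g), math.cos(u * x) * damp
print(a_num - a_exact, b_num - b_exact)
```

Both differences are at the level of rounding error, which is consistent with $\partial b/\partial x = a$ for this choice of $g$.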
Next, we want to find a reasonable function $g$ such that $g(W_1)$ is equal in law to $Y$, where
again $W$ is a Brownian motion. Let $F_Y(x) = P(Y \le x)$, the distribution function of $Y$, and let
$\Phi(x) = P(W_1 \le x)$. Then
\[ P(\Phi(W_1) \le x) = P(W_1 \le \Phi^{-1}(x)) = \Phi(\Phi^{-1}(x)) = x \]
for $x \in [0, 1]$, so $\Phi(W_1)$ is a uniform random variable on $[0, 1]$. Define
\[ g(x) = F_Y^{-1}(\Phi(x)). \tag{15.12} \]
We use the right-continuous version of $F_Y^{-1}$ if $F_Y^{-1}$ is not continuous. Then
\[ P(g(W_1) \le x) = P(\Phi(W_1) \le F_Y(x)) = F_Y(x), \]
or $Y$ is equal in law to $g(W_1)$ as desired. Note $g$ is an increasing function.
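To make (15.12) concrete, here is a minimal sketch for a discrete $Y$, using only the Python standard library. The helper name `make_g` and the three-point distribution are hypothetical choices for this example. The right-continuous inverse is taken to be $F_Y^{-1}(u) = \inf\{y : F_Y(y) > u\}$, which is one common convention and is an assumption here.

```python
import bisect
from statistics import NormalDist

def make_g(values, probs):
    """Return g(x) = F_Y^{-1}(Phi(x)) for a discrete Y with the given atoms."""
    pairs = sorted(zip(values, probs))
    ys = [v for v, _ in pairs]
    cum, total = [], 0.0
    for _, p in pairs:
        total += p
        cum.append(total)                  # cum[j] = F_Y(ys[j])
    phi = NormalDist().cdf                 # standard normal distribution function
    def g(x):
        # right-continuous inverse: smallest atom y with F_Y(y) > Phi(x)
        idx = bisect.bisect_right(cum, phi(x))
        return ys[min(idx, len(ys) - 1)]   # guard against Phi(x) rounding to 1.0
    return g

# hypothetical example: P(Y = -1) = P(Y = 1) = 1/4, P(Y = 0) = 1/2, so EY = 0
g = make_g([-1.0, 0.0, 1.0], [0.25, 0.5, 0.25])
grid = [j / 10.0 for j in range(-40, 41)]
g_values = [g(x) for x in grid]
print(g(-1.0), g(0.0), g(1.0))
```

Here $g$ is a nondecreasing step function that jumps at the quartiles of the standard normal law, so $g(W_1)$ takes the values $-1, 0, 1$ with probabilities $1/4, 1/2, 1/4$.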
We will need the following estimates.
Proposition 15.3 Let g be defined by (15.12) and define a and b by (15.5) and (15.6).
(1) For each L > 0 and s0 < 1, a is continuously differentiable on [0, s0] × [−L, L]. Also,
for each L > 0 and s0 < 1, a is bounded below by a positive constant on [0, s0] × [−L, L].
(2) For each L > 0 and s0 < 1, b is continuously differentiable on [0, s0] × [−L, L].
(3) For each s ∈ [0, s0], the function x → b(s, x) is strictly increasing. For each fixed s,
let B(s, x) be the inverse of b(s, x) (so that B(s, b(s, x)) = x and b(s, B(s, x)) = x). For each
L > 0 and s0 < 1, B is continuously differentiable on [0, s0] × [−L, L].
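Proof To start, we observe that for every $r > 0$,
\[ E\,e^{r|W_1|} \le E\,e^{rW_1} + E\,e^{-rW_1} < \infty. \]
Since $|z|^m \le m!\,e^{|z|}$ if $m$ is a non-negative integer, then by the Cauchy–Schwarz inequality
and the fact that $EY^2 < \infty$,
\[
\begin{aligned}
\int |z|^m e^{r|z|} e^{-z^2/2} |g(z)|\,dz &\le m! \int e^{(r+1)|z|} e^{-z^2/2} |g(z)|\,dz \\
&= m!\,\sqrt{2\pi}\; E\big[ e^{(r+1)|W_1|} |g(W_1)| \big] \\
&\le m!\,\sqrt{2\pi}\, \big( E\,e^{2(r+1)|W_1|} \big)^{1/2} \big( E\,|g(W_1)|^2 \big)^{1/2} \\
&\le m!\,\sqrt{2\pi}\, \big( E\,e^{2(r+1)|W_1|} \big)^{1/2} (EY^2)^{1/2} < \infty.
\end{aligned}
\tag{15.13}
\]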
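We now turn to (1). If $s \le s_0$ and $|x| \le L$, then with constants $c$, $c'$ that may change from line to line,
\[
\begin{aligned}
|p'_{1-s}(z - x)| &\le c\, \frac{|z - x|}{(1 - s)^{3/2}}\, e^{-(z-x)^2/2(1-s)} \\
&\le c\, |z - x|\, e^{-x^2/2(1-s)} e^{zx/(1-s)} e^{-z^2/2(1-s)} \\
&\le c\, (|z| + L)\, e^{|z|L/(1-s_0)} e^{-z^2/2} \\
&\le c\, |z|\, e^{c'|z|} e^{-z^2/2} + c\, e^{c'|z|} e^{-z^2/2}.
\end{aligned}
\]
Therefore
\[ |a(s, x)| \le \int c\, |z|\, e^{c'|z|} e^{-z^2/2} |g(z)|\,dz + \int c\, e^{c'|z|} e^{-z^2/2} |g(z)|\,dz, \]
which is bounded by (15.13). This gives an upper bound for $a$.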
By the mean value theorem,
\[ |p'_{1-s}(z - x) - p'_{1-s}(z - (x + h))| \le c\,|h|\,(1 + |z|^2 + L^2)\, e^{-(z-x)^2/2(1-s)} \]
if $s \le s_0$, $|x| \le L$, and $|h| \le 1$, so
\[ \Big| \frac{1}{h} \big( p'_{1-s}(z - x) - p'_{1-s}(z - (x + h)) \big) \Big| \le c\,(1 + |z|^2)\, e^{c'|z|} e^{-z^2/2}. \]
In view of (15.13), we can use dominated convergence to conclude that
\[ \frac{\partial a}{\partial x}(s, x) = \int p''_{1-s}(z - x)\, g(z)\,dz \]
and that $|\partial a(s, x)/\partial x|$ is bounded above on $[0, s_0] \times [-L, L]$.
By a similar argument we obtain that $|\partial a(s, x)/\partial s|$ is also bounded above on $[0, s_0] \times
[-L, L]$. The same argument shows that the second partial derivatives of $a$ are bounded, and
hence the first partial derivatives are continuous.
Using integration by parts,
\[ a(s, x) = \int p_{1-s}(z - x)\,dg(z), \]
where the integral is a Lebesgue–Stieltjes integral; recall that $g$ is an increasing function.
Since we are working under the assumption that $Y$ is not identically zero, then $g$ is not
identically zero, which implies that $a$ is bounded below for $s \le s_0$ and $|x| \le L$.
The proof of (2) is quite similar. To prove (3), as above, we can use a dominated convergence
argument to prove
\[ \frac{\partial b(s, x)}{\partial x} = a(s, x). \]
Since $a(s, x) > 0$ for each $x$ and for each $s < s_0$, we conclude that $x \to b(s, x)$ is
strictly increasing. The estimates for $B$ follow from the implicit function theorem applied to
$f(s, x, y) = 0$, where $f(s, x, y) = b(s, x) - y$.
15.2 Construction of the embedding
Theorem 15.4 Suppose $Y$ is a random variable with $EY = 0$ and $EY^2 < \infty$. There exists a
Brownian motion $N$ and a stopping time $T$ with respect to the minimal augmented filtration
of $N$ such that $N_T$ is equal in law to $Y$. Moreover $E\,T = EY^2$.
Proof The idea is to define M by (15.14) below and do a time change so that NT = M1 =
g(W1). To show that T is a stopping time relative to the minimal augmented filtration for
N , we set up an ordinary differential equation that the time change solves and use Picard
iteration to show that the solution can be obtained in a constructive way.
The case where Y is identically zero is trivial for we take T = 0, so we suppose Y is
not identically zero. Let Wt be a Brownian motion and let {Ft} be its minimal augmented
filtration. Define the function g by (15.12) and define a and b for s < 1 by (15.5) and (15.6).
Define a(s, x) = 1 and b(s, x) = x if s ≥ 1.
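Now let
\[ M_t = \int_0^t a(s, W_s)\,dW_s, \tag{15.14} \]
and hence
\[ \langle M \rangle_t = \int_0^t a(s, W_s)^2\,ds. \]
Note $\langle M \rangle_t \to \infty$, a.s., as $t \to \infty$. Since $EY = 0$, then $E\,g(W_1) = 0$, so $M_1 = g(W_1)$ by
(15.7). Let
\[ \tau_t = \inf\{s : \langle M \rangle_s \ge t\}, \]
the inverse of $\langle M \rangle$. By Theorem 12.2, if we set $N_t = M_{\tau_t}$, then $N$ is a Brownian motion. Let
$\{\mathcal{G}_t\}$ be the minimal augmented filtration generated by $N$.
We let $T = \langle M \rangle_1$. Then
\[ N_T = N_{\langle M \rangle_1} = M_{\tau_{\langle M \rangle_1}} = M_1 = g(W_1), \]
and $N_T$ has the same law as $Y$.
For the integrability of $T$ we have
\[ E\,T = E\,\langle M \rangle_1 = E\,M_1^2 = E[g(W_1)^2] = EY^2 = \operatorname{Var} Y < \infty. \tag{15.15} \]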
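It remains to show that $T$ is a stopping time with respect to $\{\mathcal{G}_t\}$. Since $T = \lim_{s \uparrow 1} \langle M \rangle_s$,
it suffices to show that $\langle M \rangle_s$ is a stopping time with respect to $\{\mathcal{G}_t\}$ for each $s < 1$. Fix $K$. We
will show
\[ \Big( \tau_t \le s,\ \sup_{u \le t} |N_u| \le K \Big) \in \mathcal{G}_t, \qquad s < 1. \tag{15.16} \]
Letting $K \to \infty$ will then show $(\langle M \rangle_s \ge t) = (\tau_t \le s) \in \mathcal{G}_t$ for $s < 1$.
Since $\tau$ is the inverse of $\langle M \rangle$, then
\[ \frac{d\tau_t}{dt} = \frac{1}{d\langle M \rangle_{\tau_t}/d\tau_t} = \frac{1}{a(\tau_t, W_{\tau_t})^2} \]
with $\tau_0 = 0$, a.s. With $B(s, x)$ being the inverse of $b(s, x)$ in the $x$ variable,
\[ M_s = E[M_1 \mid \mathcal{F}_s] = E[g(W_1) \mid \mathcal{F}_s] = b(s, W_s), \]
or
\[ W_s = B(s, M_s), \qquad s < 1. \]
Therefore
\[ W_{\tau_t} = B(\tau_t, M_{\tau_t}) = B(\tau_t, N_t) \]
on the event $(\tau_t \le s)$ if $s < 1$. Thus $\tau_t$ solves the equation
\[ \frac{d\tau_t}{dt} = \frac{1}{a(\tau_t, B(\tau_t, N_t))^2}, \qquad \tau_0 = 0, \]
or
\[ \tau_t = \int_0^t \frac{1}{a(\tau_u, B(\tau_u, N_u))^2}\,du. \]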
Fix $s$ and $t$ and choose $s_0 \in (s, 1)$. Let $S_K = \inf\{t : |N_t| \ge K\}$ and let $N^K_t = N_{t \wedge S_K}$. Define
\[ \Psi(q, r) = \frac{1}{\big( a(r, B(r, N^K_q(\omega))) \big)^2} \]
if $r \le s_0$. Observe that $\Psi$ depends on $\omega$. Define $\Psi(q, r) = 1$ for $r \ge 1$ and define $\Psi(q, r)$ by
linear interpolation for $r \in (s_0, 1)$. Note that by Proposition 15.3, $\Psi$ is continuous, bounded,
and there exists $k > 0$ such that
\[ |\Psi(q, r) - \Psi(q, r')| \le k\,|r - r'|, \qquad r, r' \in \mathbb{R},\ q \in [0, \infty). \]
$\tau_t$ solves the equation
\[ \tau_t = \int_0^t \Psi(u, \tau_u)\,du. \]
We solve the differential equation
\[ y(t) = \int_0^t \Psi(u, y(u))\,du \tag{15.17} \]
using Theorem 15.1. The function $y_0(t)$ in the statement of Theorem 15.1 is identically zero,
and the function $y_1(t) = \int_0^t \Psi(u, y_0(u))\,du$ (which depends on $\omega$ because $\Psi$ does) will be $\mathcal{G}_t$
measurable, and by induction, the functions $y_i(t)$ will be $\mathcal{G}_t$ measurable. Therefore the limit,
$y(t)$, will be $\mathcal{G}_t$ measurable. Since $|N^K_q(\omega)| \le K$ for all $q$ and we are only interested in the
solution to (15.17) for $y(t) \le s$, then $\tau_t = y(t)$ as long as $\tau_t \le s$; therefore (15.16) holds and
the proof is complete.

In the above theorem, we started with a Brownian motion $W$, constructed a new Brownian
motion $N$, and then defined our stopping time $T$ in terms of $N$. We can actually start with a
Brownian motion $W$ and define a stopping time that is a stopping time with respect to the
minimal augmented filtration of $W$.
Corollary 15.5 Let $W$ be a Brownian motion and let $\{\mathcal{F}_t\}$ be the minimal augmented filtration
for $W$. Let $Y$ be a random variable with $EY = 0$ and $\operatorname{Var} Y < \infty$. There exists a stopping
time $V$ with respect to $\{\mathcal{F}_t\}$ such that $W_V$ has the same law as $Y$.
Proof We sketch the proof and ask you to give the details in Exercise 15.3. Define
\[ \Psi(q, r) = \frac{1}{\big( a(r, B(r, W_q(\omega))) \big)^2} \]
and solve the equation
\[ \frac{d\overline{\tau}_t}{dt} = \Psi(t, \overline{\tau}_t), \qquad \overline{\tau}_0 = 0 \]
by Picard iteration. The proof of Theorem 15.4 shows that the solution $\overline{\tau}_t$ will satisfy
$(\overline{\tau}_t \le s) \in \mathcal{F}_t$ for every $t$ as long as $s < 1$. Let $A$ be the inverse of $\overline{\tau}$, and define $V = \lim_{s \uparrow 1} A_s$.
Then $V$ will be the desired stopping time.
15.3 Embedding random walks
Let us give an application of Skorokhod embedding to show that we can find a Brownian
motion that is relatively close to a random walk. Suppose $Y_1, Y_2, \ldots$ is an i.i.d. sequence of
real-valued random variables with mean zero and variance one. Given a Brownian motion
$W_t$ we can find a stopping time $T_1$ such that $W_{T_1}$ has the same distribution as $Y_1$. We use
the strong Markov property at time $T_1$ and find a stopping time $T_2$ for $W_{T_1 + t} - W_{T_1}$ so that
$W_{T_1 + T_2} - W_{T_1}$ has the same distribution as $Y_2$ and is independent of $\mathcal{F}_{T_1}$. We continue. We see
that the $T_i$ are i.i.d. and by Theorem 15.4, $E\,T_i = EY_i^2 = 1$. Let $U_k = \sum_{i=1}^k T_i$. Then for each
$n$, $S_n = \sum_{i=1}^n Y_i$ has the same distribution as $W_{U_n}$.
Theorem 15.6
\[ \sup_{i \le n} |W_{U_i} - W_i| / \sqrt{n} \]
tends to 0 in probability as $n \to \infty$.
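Proof We will show that for each $\varepsilon > 0$
\[ \limsup_{n \to \infty} P\Big( \sup_{k \le n} |W_{U_k} - W_k| > \varepsilon \sqrt{n} \Big) \le \varepsilon. \tag{15.18} \]
Since the paths of Brownian motion are continuous, we can find $\delta \le 1$ small such that
\[ P\Big( \sup_{s, t \le 2,\ |t - s| \le \delta} |W_t - W_s| > \varepsilon \Big) < \varepsilon/2. \]
By scaling,
\[ P\Big( \sup_{s, t \le 2n,\ |t - s| \le \delta n} |W_t - W_s| > \varepsilon \sqrt{n} \Big) < \varepsilon/2. \tag{15.19} \]
The strong law of large numbers (Theorem A.38) says that $U_n/n \to E\,T_1 = 1$, a.s., and in
fact, by Proposition A.39, we even have
\[ \frac{\max_{k \le n} |U_k - k|}{n} \to 0, \quad \text{a.s.} \tag{15.20} \]
Therefore
\[
\begin{aligned}
P\Big( \max_{k \le n} |W_{U_k} - W_k| > \varepsilon \sqrt{n} \Big)
&\le P\Big( \max_{k \le n} |U_k - k| > \delta n \Big) + P\Big( \sup_{s, t \le 2n,\ |t - s| \le \delta n} |W_t - W_s| > \varepsilon \sqrt{n} \Big) \\
&\le P\Big( \frac{\max_{k \le n} |U_k - k|}{n} > \delta \Big) + \frac{\varepsilon}{2}.
\end{aligned}
\]
By (15.20) this will be less than $\varepsilon$ if we take $n$ sufficiently large.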
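A rough simulation can make the sequence $U_k$ tangible in the simplest case $P(Y_i = 1) = P(Y_i = -1) = 1/2$. The sketch below does not use the construction of Section 15.2; as a stand-in it uses the first exit time of the Brownian motion from an interval of half-width one around its current level, which also embeds this particular walk and has mean one. The Brownian path is replaced by Gaussian increments on a grid of mesh `dt`, so the stopping times carry a small discretization bias. The function name and all parameter values are assumptions made for this illustration.

```python
import math
import random

def embed_simple_walk(n_steps, dt=1e-3, seed=0):
    """Embed n_steps of a +/-1 random walk in a discretized Brownian path.

    Returns (U, S): U[k-1] approximates U_k = T_1 + ... + T_k, and S[k-1] is the
    embedded walk value S_k, i.e. the integer level reached at time U_k.
    """
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    t, w = 0.0, 0.0
    level = 0                        # integer level at the last stopping time
    U, S = [], []
    while len(U) < n_steps:
        w += rng.gauss(0.0, sd)      # Brownian increment over one grid step
        t += dt
        if abs(w - level) >= 1.0:    # first exit from (level - 1, level + 1)
            level += 1 if w > level else -1
            U.append(t)
            S.append(level)
    return U, S

U, S = embed_simple_walk(200)
increments = [S[0]] + [S[k] - S[k - 1] for k in range(1, len(S))]
mean_T = U[-1] / len(U)              # sample mean of the T_i; the theory gives E T_i = 1
print(mean_T, max(abs(U[k] - (k + 1)) for k in range(len(U))) / len(U))
```

The second printed number is the simulated analogue of the quantity in (15.20) for $n = 200$.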

Exercises

15.1 Without some supplemental conditions on $T$, the problem of Skorokhod embedding is trivial.
Suppose $W$ is a Brownian motion with respect to a filtration $\{\mathcal{F}_t\}$ satisfying the usual conditions.
Suppose $Y$ is a finite random variable and suppose $h$ is a real-valued function such that $h(W_1)$
has the same law as $Y$.
(1) Show that if $T = \inf\{t > 1 : W_t = h(W_1)\}$, then $W_T$ and $Y$ have the same law.
(2) Give an example of a mean zero random variable $Y$ with finite variance such that if $T$ is
defined as in (1), then $E\,T = \infty$.

15.2 Show that the triple integral on the right-hand side of (15.10) is equal to the expression in
(15.11).

15.3 A sketch was given for the proof of Corollary 15.5. Provide a detailed proof.

15.4 Here is another approach to proving Corollary 15.5. Let $Y$, $N$, $T$, and $\{\mathcal{G}_t\}$ be as in the proof of
Theorem 15.4.
(1) Show that there is a random variable $U$ that is measurable with respect to $\sigma(N_s : 0 \le s < \infty)$
such that $U = T$, a.s.
(2) Show there is a Borel measurable map $H : C[0, \infty) \to [0, \infty)$ such that $U = H(N)$.
(3) If $W$ is a Brownian motion, define $V = H(W)$. Show $V$ is a stopping time with respect to
the minimal augmented filtration generated by $W$ such that $W_V$ has the same law as $Y$.

15.5 Suppose $p \in (0, 1/2)$ and $Y$ is a random variable such that $P(Y = 1) = P(Y = -1) = p$ and
$P(Y = 0) = 1 - 2p$. Let $W$ be a Brownian motion. Let $S_x = \inf\{t > 0 : W_t = x\}$ and let
$T = \inf\{t > S_x \wedge S_{-x} : W_t \in \{-1, 0, 1\}\}$. Determine $x$ such that $W_T$ and $Y$ have the same law.

15.6 Suppose $Y$ is a mean zero random variable and there exists a real number $K > 0$ such that
$|Y| \le K$, a.s. Let $W$ be a Brownian motion and let $T$ be a stopping time with $E\,T < \infty$ such
that $W_T$ and $Y$ have the same law. (We do not necessarily assume that $T$ was constructed by the
method of Section 15.2.) Let $S_K = \inf\{t : |W_t| \ge K\}$. Prove that $T \le S_K$, a.s.
15.7 Let $Y_i$ be a sequence of i.i.d. random variables with $P(Y_i = 1) = P(Y_i = -1) = \tfrac{1}{2}$, and let
$S_n = \sum_{i=1}^n Y_i$. $S_n$ is called a simple symmetric random walk. Let $T_1, T_2, \ldots$ and $U_1, U_2, \ldots$ be
as in Section 15.3.
(1) Prove that $E\,T_1^p < \infty$ for all $p \ge 1$.
(2) Prove that if $\varepsilon > 0$,
\[ \lim_{n \to \infty} \frac{\sup_{k \le n} |U_k - k|}{n^{(1/2) + \varepsilon}} = 0, \quad \text{a.s.} \]
Hint: Use Doob's inequalities to estimate
\[ P\Big( \sup_{k \le n} |U_k - k| \ge \delta n^{(1/2) + \varepsilon} \Big). \]
(3) Show that
\[ \sup_{i \le n} |W_{U_i} - W_i| / n^{(1/4) + (\varepsilon/2)} \]
tends to zero in probability as $n \to \infty$.

15.8 Let $S_n$, $T_i$, and $U_i$ be as in Exercise 15.7. Prove that
\[ \lim_{n \to \infty} \frac{\sup_{i \le n} |W_{U_i} - W_i|}{\sqrt{n}} = 0, \quad \text{a.s.} \]

15.9 Let $S_n$ be a simple symmetric random walk; see Exercise 15.7. Let $Y$ be a bounded symmetric
random variable that takes values only in $\mathbb{Z}$. ($Y$ being symmetric means that $Y$ and $-Y$ have the
same law.) Does there necessarily exist a stopping time $N$ such that $S_N$ and $Y$ have the same
law? Why or why not?


Notes

The survey article Obłój (2004) summarizes many different methods of Skorokhod embedding.
The embedding presented here is from Bass (1983); see also Stroock (2003),
pp. 213–17.

16

The general theory of processes

The name “general theory of processes” refers to the foundations of stochastic processes.

Specific topics include measurability issues and classifications of stopping times. This chapter

is fairly technical and abstract and should only be skimmed on the first reading of this book:

read the definitions and statements of theorems, propositions, and lemmas, but not the

proofs.

The two main results we discuss are the measurability of hitting times, and the

Doob–Meyer decomposition of submartingales, Theorem 16.29.

16.1 Predictable and optional processes

Suppose $(\Omega, \mathcal{F}, P)$ is a probability space. The outer probability $P^*$ associated with $P$ is given
by
\[ P^*(A) = \inf\{P(B) : A \subset B,\ B \in \mathcal{F}\}. \tag{16.1} \]
A set $A$ is a $P$-null set if $P^*(A) = 0$. We suppose throughout this chapter that $\{\mathcal{F}_t\}$ is a
filtration satisfying the usual conditions; recall from Chapter 1 that this means that each $\mathcal{F}_t$
contains all the $P$-null sets and that $\cap_{\varepsilon > 0} \mathcal{F}_{t+\varepsilon} = \mathcal{F}_t$ for each $t$. Let $\pi : [0, \infty) \times \Omega \to \Omega$ be
defined by
\[ \pi(t, \omega) = \omega. \tag{16.2} \]

We define the predictable $\sigma$-field $\mathcal{P}$ to be the $\sigma$-field on $[0, \infty) \times \Omega$ generated by the
collection of all bounded left-continuous processes adapted to $\mathcal{F}_t$. That is, $\mathcal{P}$ is the $\sigma$-field
on $[0, \infty) \times \Omega$ generated by the collection of all sets of the form
\[ \{(t, \omega) \in [0, \infty) \times \Omega : X_t(\omega) > a\}, \]
where $a \in \mathbb{R}$ and $X$ is a bounded, adapted, left-continuous process. The optional $\sigma$-field $\mathcal{O}$
is the $\sigma$-field on $[0, \infty) \times \Omega$ generated by the collection of all bounded right-continuous
processes adapted to $\mathcal{F}_t$. The word for predictable in French is "prévisible." The older
literature uses "well measurable" in place of the word "optional."

If $S$ and $T$ are random variables taking values in $[0, \infty]$, let $[S, T) = \{(t, \omega) \in [0, \infty) \times \Omega :
S(\omega) \le t < T(\omega)\}$, and define $(S, T]$, $(S, T)$, etc. similarly. With this notation, $[T, T]$, the
graph of $T$, is equal to $\{(t, \omega) \in [0, \infty) \times \Omega : T(\omega) = t < \infty\}$. Note that $[T, T]$ is a subset
of $[0, \infty) \times \Omega$, so $\pi([T, T]) = (T < \infty)$.
Recall that a stopping time can take the value ∞. A stopping time T is predictable if there
exists a sequence of stopping times Tn such that for all ω
(1) T1(ω) ≤ T2(ω) ≤ · · · ,
(2) limn→∞ Tn(ω) = T (ω), and
(3) if T (ω) > 0, then Tn(ω) < T (ω) for each n.
In this case, the stopping times Tn predict T or announce T . If T is a stopping time satisfying
(1)–(3) above and S = T , a.s., then we call S a predictable stopping time as well. A stopping
time T is totally inaccessible if P(T = S < ∞) = 0 for every predictable stopping time S.
For an example of a predictable stopping time, let Wt be a Brownian motion started at 0
and let T = inf{t > 0 : Wt = 1}. The stopping time T is predicted by the stopping times

Tn = inf{t > 0 : Wt = 1 − (1/n)}.
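The announcing sequence can be illustrated on a sampled path. The sketch below computes, for a finite list of path values, the first index at which each level $1 - 1/n$ is reached and the first index at which level 1 is reached. The function names and the short hand-made path are hypothetical. On a sampled path the hitting indices are only nondecreasing and bounded by the hitting index of level 1; the strict inequality $T_n < T$ relies on the continuity of Brownian paths and can fail on a grid.

```python
def first_hit_index(path, level):
    """Smallest index j >= 1 with path[j] >= level, or None if the level is never reached."""
    for j in range(1, len(path)):
        if path[j] >= level:
            return j
    return None

def announcing_indices(path, n_max):
    """Hitting indices of the levels 1 - 1/n for n = 2, ..., n_max."""
    return [first_hit_index(path, 1.0 - 1.0 / n) for n in range(2, n_max + 1)]

# hypothetical sampled path started at 0 that first reaches level 1 at index 7
path = [0.0, 0.31, 0.21, 0.613, 0.55, 0.813, 0.957, 1.02]
T = first_hit_index(path, 1.0)
T_n = announcing_indices(path, 30)
print(T, T_n)
```

The list `T_n` increases to `T`, which is the discrete shadow of properties (1) and (2) in the definition of a predictable stopping time.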

For an example of a totally inaccessible stopping time, let $P_t$ be a Poisson process with
parameter 1 and let $T = \inf\{t : P_t = 1\}$, the first time the Poisson process jumps. Since
$P_t$ has independent increments, $P_t - t$ is a martingale, just as in Example 3.2. By (A.8),
$E[(P_t - t)^2] < \infty$. If $S$ is a bounded predictable stopping time, by the optional stopping
theorem, $E\,P_S = E\,S$. If $S_n$ are stopping times predicting $S$, then by monotone convergence
\[ E\,P_{S-} = \lim_{n \to \infty} E\,P_{S_n} = \lim_{n \to \infty} E\,S_n = E\,S. \]
Therefore E [PS − PS−] = 0, and since Pt is an increasing process, this says that P does not
jump at time S. Applying this to S ∧ M and letting M → ∞, we see that P does not jump at
any predictable time S, whether or not S is bounded. Therefore P(T = S < ∞) = 0, so T is
totally inaccessible.
The proof of the following proposition is reminiscent of that of the Vitali covering theorem
from measure theory.
Proposition 16.1 Let $T$ be a stopping time. There exist predictable stopping times $S_1, S_2, \ldots$
and a totally inaccessible stopping time $U$ such that $[T, T] = [U, U] \cup (\cup_{i=1}^{\infty} [S_i, S_i])$.
Proof Let
\[ a_1 = \sup\{P(S = T < \infty) : S \text{ is a predictable stopping time}\} \]
and choose $S_1$ to be a predictable stopping time such that $P(S_1 = T < \infty) \ge \tfrac{1}{2} a_1$. Given
$S_1, \ldots, S_n$, let
\[ a_{n+1} = \sup\{P(S = T < \infty,\ S \ne S_1, \ldots, S \ne S_n) : S \text{ is a predictable stopping time}\} \]
and choose $S_{n+1}$ such that $P(S_{n+1} = T < \infty,\ S_{n+1} \ne S_1, \ldots, S_{n+1} \ne S_n) \ge \tfrac{1}{2} a_{n+1}$.
If this procedure stops after $n$ steps, set $U(\omega)$ equal to $T(\omega)$ if $T(\omega)$ is not equal to any
of $S_1(\omega), \ldots, S_n(\omega)$ and equal to infinity otherwise. It is easy to check that $U$ is a stopping
time that is totally inaccessible.
The other alternative is that this procedure continues indefinitely. In this case define
\[ U(\omega) = \begin{cases} T(\omega), & T(\omega) \ne S_1(\omega), S_2(\omega), \ldots, \\ \infty, & \text{otherwise.} \end{cases} \]
There is no problem checking that U is a stopping time, but we need to show that U is
totally inaccessible. Since probabilities are bounded by one, we have an → 0. If there exists
a predictable stopping time S such that b = P(S = U < ∞) > 0, then b > 2an for some

n, and in our construction we would have then chosen S in place of the Sn we did choose.

Therefore such a stopping time S cannot exist.

Proposition 16.2 (1) The optional σ -field O is generated by the collection of sets

{[S, T ) : S, T stopping times}.

(2) O is generated by the collection of sets of the form [a, b)×C, where a < b and C ∈ Fa.
(3) The predictable σ -field P is generated by the collection of sets
{(S, T ] : S, T stopping times}.
(4) P is generated by the collection of sets
{[S, T ) : S, T predictable stopping times}.
(5) P is generated by the collection of sets of the form [b, c) × C, where a < b < c and
C ∈ Fa.
Proof (1) Since 1[S,T ) is a bounded right-continuous process that is adapted to {Ft}, sets
of the form [S, T ) are optional. Now suppose X is a bounded adapted process with right-
continuous paths. Let ε > 0, let U0 = 0, a.s., and let

Ui+1 = inf{t > Ui : |Xt − XUi | > ε}, i ≥ 0. (16.3)

Since X has right-continuous paths,

(U1 < t) = ∩q∈Q+,q

where Q+ denotes the positive rationals, and it follows that U1 is a stopping time. Similarly

$U_i$ is a stopping time for each $i$; Exercise 16.4 asks you to prove this. If we set
\[ X^{\varepsilon}_t(\omega) = \sum_{i=0}^{\infty} X_{U_i}(\omega)\, 1_{[U_i(\omega), U_{i+1}(\omega))}(t), \]
then $\sup_t |X_t - X^{\varepsilon}_t| \le \varepsilon$. Therefore it suffices to show that each process $X^{\varepsilon}$ is measurable
with respect to the $\sigma$-field $\widehat{\mathcal{O}}$ generated by the collection of sets of the form $[S, T)$.
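The approximation of $X$ by a process that is constant on the stochastic intervals $[U_i, U_{i+1})$ has a direct discrete-time analogue. The sketch below works with a finite list of samples in place of a right-continuous path; the function name `epsilon_skeleton` and the sample values are assumptions made for this illustration.

```python
def epsilon_skeleton(path, eps):
    """Return (stops, approx) for a sampled path.

    stops  : indices U_0 = 0 < U_1 < ... at which the path has moved more than eps
             away from its value at the previous stopping index
    approx : the step function equal to path[U_i] on [U_i, U_{i+1})
    """
    stops = [0]
    approx = []
    ref = path[0]                      # value of the path at the current stopping index
    for j, x in enumerate(path):
        if abs(x - ref) > eps:         # U_{i+1} = first index after U_i with |X - X_{U_i}| > eps
            stops.append(j)
            ref = x
        approx.append(ref)
    return stops, approx

# hypothetical sample values
path = [0.0, 0.4, 0.9, 1.2, 1.1, 0.2, 0.25, -0.9]
eps = 0.5
stops, approx = epsilon_skeleton(path, eps)
sup_error = max(abs(x - y) for x, y in zip(path, approx))
print(stops, approx, sup_error)
```

By construction the step function never differs from the sampled path by more than `eps`, mirroring the bound $\sup_t |X_t - X^{\varepsilon}_t| \le \varepsilon$.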

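To do that, it suffices to show that processes of the form
\[ Y_t(\omega) = 1_A(\omega)\, 1_{[U_i(\omega), U_{i+1}(\omega))}(t), \]
where $A \in \mathcal{F}_{U_i}$, are measurable with respect to $\widehat{\mathcal{O}}$. If we set $S(\omega)$ equal to $U_i(\omega)$ if $\omega \in A$
and equal to $\infty$ otherwise and we set $T(\omega)$ equal to $U_{i+1}(\omega)$ if $\omega \in A$ and $\infty$ otherwise, then
$Y_t(\omega) = 1_{[S(\omega), T(\omega))}(t)$.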

(2) If C ∈ Fa, then 1C(ω)1[a,b)(t) is a bounded right-continuous adapted process, so it is

optional. By (1), every bounded right-continuous adapted process can be approximated by

linear combinations of processes of the form 1[S,T ). Now 1[S,T ) = 1[S,∞) − 1[T,∞), and 1[S,∞)


is the limit of $1_{[S_n, \infty)}$, where $S_n = k/2^n$ if $(k - 1)/2^n \le S < k/2^n$, and we can similarly
approximate 1[T,∞). Note
1[Sn(ω),∞)(t) =
∞∑
k=1
1((k−1)/2n≤S(ω)

$(S, T] = \cup_k \{\cap_m [S + \tfrac{1}{k}, T + \tfrac{1}{m})\}$.
On the other hand, if $S$ and $T$ are predictable and are predicted by sequences $S_n$ and $T_m$,
respectively, then
\[ [S, T) = \cap_n \{\cup_m (S_n, T_m]\}. \]
(4) now follows by using (3).

(5) As long as $a + (1/n) < b$, the processes $1_C(\omega)\, 1_{(b - (1/n),\, c - (1/n)]}(t)$ are left continuous,
bounded, and adapted, hence predictable. The process $1_C(\omega)\, 1_{[b, c)}(t)$ is the limit of these
processes as $n \to \infty$, so is predictable. On the other hand, if $X_t$ is a bounded adapted
left-continuous process, it can be approximated by
\[ \sum_{k=1}^{n 2^n - 1} X_{(k-1)/2^n}(\omega)\, 1_{(k/2^n,\, (k+1)/2^n]}(t). \]
Each summand can be approximated by linear combinations of processes of the form
$1_C(\omega)\, 1_{(b, c]}(t)$, where $C \in \mathcal{F}_a$ and $a < b < c$. Finally, $1_C(\omega)\, 1_{(b, c]}(t)$ is the limit of
$1_C(\omega)\, 1_{[b + (1/n),\, c + (1/n))}(t)$ as $n \to \infty$.
A consequence of Proposition 16.2(1) and (4) is that P ⊂ O.
16.2 Hitting times
Let S be a separable metric space. Suppose {Ft} is a filtration satisfying the usual conditions
and X is a stochastic process taking values in S whose paths are right continuous and such that
the jump times are totally inaccessible. Saying the jump times are totally inaccessible means
that if T is a predictable stopping time, then XT − = XT , a.s., where XT− = lims

TB is known as the first hitting time of B and UB as the first entry time of B.

Proposition 16.3 (1) If A is an open set, then TA and UA are stopping times.

(2) If A is a compact set, then TA and UA are stopping times.

Proof (1) Since the paths of Xt are right continuous and A is open, for each t,

(TA < t) = ∪q∈Q+,q

large, in which case XT (ω) ∈ Am, or else TAn (ω) < T (ω) for all n. In the latter case,
XT (ω) = limn→∞ XTAn (ω) ∈ Am except for ω’s in a null set since the jump times of X are
totally inaccessible. In either case, XT ∈ Am. This is true for all m, so XT ∈ ∩mAm = A, and
therefore TA ≤ T .
We conclude TA is a stopping time. To prove UA is a stopping time, we argue using (16.4)
as above.
For the proof of the following, which uses Choquet’s capacity theorem, we refer the reader
to Blumenthal and Getoor (1968), Section I.10. Fix t and define
Rt (A) = {ω : Xs(ω) ∈ A for some s ∈ [0, t]} = (UA ≤ t). (16.5)
Theorem 16.4 If A is a Borel subset of S , then Rt (A) ∈ Ft and there exists an increasing
sequence of compact sets Kn contained in A such that P(Rt (Kn)) ↑ P(Rt (A)).
Since (UA ≤ t) = Rt (A), we have the following as an immediate corollary.
Theorem 16.5 For all Borel sets A, UA is a stopping time.
Here is the main theorem of this section.
Theorem 16.6 Suppose {Ft} is a filtration satisfying the usual conditions and X is a right
continuous process whose jump times are totally inaccessible. If B is a Borel subset of S ,
then TB is a stopping time.
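Proof If we let $Y^{\delta}_t = X_{t+\delta}$ and $U^{\delta}_B = \inf\{t \ge 0 : Y^{\delta}_t \in B\}$, then by the above, $U^{\delta}_B$ is a
stopping time with respect to the filtration $\{\mathcal{F}^{\delta}_t\}$, where $\mathcal{F}^{\delta}_t = \mathcal{F}_{t+\delta}$. It follows that $\delta + U^{\delta}_B$
is a stopping time with respect to the filtration $\{\mathcal{F}_t\}$. Since $(1/m) + U^{1/m}_B \downarrow T_B$, then $T_B$ is a
stopping time with respect to $\{\mathcal{F}_t\}$.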
We now show that the hitting times of Borel sets can be approximated by the hitting times
of compact sets.
Proposition 16.7 There exists an increasing sequence of compact sets Kn contained in B
such that UKn ↓ UB on (UB < ∞), P-a.s.
Proof For each $t$ we can find an increasing sequence of compact sets $L^t_n$ contained in $B$
with $P(R_t(L^t_n)) \uparrow P(R_t(B))$. Let $q_j$ be an enumeration of the non-negative rationals. Let
$K_n = L^{q_1}_n \cup \cdots \cup L^{q_n}_n$. Then the $K_n$ are compact, form an increasing sequence, and are all
contained in $B$. Thus $U_{K_n}$ decreases, say to $S$, and since $U_{K_n} \ge U_B$ for all $n$, then $S \ge U_B$. If
we prove $S \le U_B$, $P$-a.s., then $S = U_B$, and we have our result.
If $U_B < S$, there exists a rational $q_j$ with $U_B < q_j < S$. Hence it suffices to
prove $P(U_B < q_j < S) = 0$ for all $j$. If $U_B < q_j$, then $\omega \in R_{q_j}(B)$. Since
$R_{q_j}(L^{q_j}_n) \uparrow R_{q_j}(B)$, a.s., then except for a null set, $\omega$ will be in $R_{q_j}(L^{q_j}_n)$ for all $n$ large
enough, hence in $R_{q_j}(K_n)$ if $n$ is large enough. Then $U_{K_n}(\omega) \le q_j$, and so $S(\omega) \le q_j$.
Therefore $P(U_B < q_j < S) = 0$.
Theorem 16.8 There exists an increasing sequence of compacts Kn contained in B such that
TKn ↓ TB.
Proof Let $Y^{\delta}_t = X_{t+\delta}$ and $U^{\delta}_B = \inf\{t \ge 0 : Y^{\delta}_t \in B\}$. Applying the above proposition to
$Y^{1/m}_t$, for each $m$ there exist compact sets $L^m_n$, increasing in $n$ and contained in $B$, such that
$U^{1/m}_{L^m_n} \downarrow U^{1/m}_B$. Let $K_n = L^1_n \cup \cdots \cup L^n_n$. Then $K_n$ is an increasing sequence of compact sets
contained in $B$, and $U^{1/m}_{K_n} \downarrow U^{1/m}_B$. Also, for each $n$, $1/m + U^{1/m}_{K_n} \downarrow T_{K_n}$ and $1/m + U^{1/m}_B \downarrow T_B$.
We write
\[
\begin{aligned}
T_B &= \lim_m (1/m + U^{1/m}_B) = \lim_m \lim_n (1/m + U^{1/m}_{K_n}) \\
&= \lim_n \lim_m (1/m + U^{1/m}_{K_n}) = \lim_n T_{K_n}.
\end{aligned}
\]
Since $1/m + U^{1/m}_{K_n}$ is decreasing in both $m$ and $n$, the change in the order of taking limits is
justified. Since $T_{K_n}$ is decreasing, this completes the proof.
16.3 The debut and section theorems
If $E \subset [0, \infty) \times \Omega$, let $D_E = \inf\{t \ge 0 : (t, \omega) \in E\}$, the debut of $E$. An important
generalization of Theorem 16.6 is the following, known as the debut theorem.
Theorem 16.9 If E ∈ O, then DE is a stopping time.
The proof of this theorem is beyond the scope of this book, and we refer the reader to
Dellacherie and Meyer (1978) for a proof.
Using Theorem 16.9, we can weaken the assumptions on X in Theorem 16.6.
Theorem 16.10 If X is an optional process taking values in S and B is a Borel subset of S ,
then UB and TB are stopping times.
Proof Since B is a Borel subset of S and X is an optional process, then 1B(Xt ) is also an
optional process. UB is then the debut of the set E = {(s, ω) : 1B(Xs(ω)) = 1}, and therefore
is a stopping time.
To prove that TB is a stopping time, we argue exactly as in the proof of Theorem 16.6.
Remark 16.11 In the theory of Markov processes, the notion of completion of a σ -field is a
bit different. However it is still the case that the hitting times of Borel sets by right continuous
processes are stopping times. See Remark 20.4.
The optional section theorem is the following.
Theorem 16.12 If $E$ is an optional set and $\varepsilon > 0$, there exists a stopping time $T$ such that
$[T, T] \subset E$ and $P(\pi(E)) \le P(T < \infty) + \varepsilon$.
The statement of the predictable section theorem is very similar.
Theorem 16.13 If $E$ is a predictable set and $\varepsilon > 0$, there exists a predictable stopping time
$T$ such that $[T, T] \subset E$ and $P(\pi(E)) \le P(T < \infty) + \varepsilon$.
Again we refer to Dellacherie and Meyer (1978) for proofs. We note that Proposition
16.7 is a precursor of the optional section theorem. To see this, let A be a Borel set
and let E = {(t, ω) : Xt ∈ A}. Then DE = UA. If the process is right continuous, then
XUKn ∈ Kn ⊂ A, where the Kn are as in Proposition 16.7, and the graphs of the UKn are
contained in E.
Here is a corollary of Theorems 16.12 and 16.13.
Corollary 16.14 (1) If X and Y are optional processes such that P(XT = YT ) = 1 for every
finite stopping time T , then X and Y are indistinguishable: P(Xt = Yt for all t) = 1.
(2) If X and Y are predictable processes with P(XT = YT ) = 1 for every finite predictable
stopping time T , then X and Y are indistinguishable.
Proof We prove (1), the proof of (2) being similar. Let $F = \{(t, \omega) : X_t(\omega) \ne Y_t(\omega)\}$. Then
$F$ is an optional set, and if $P(\pi(F)) > 0$, there exists a stopping time $U$ with $[U, U] \subset F$
and $P(U < \infty) > 0$. By looking at $T = U \wedge N$ for sufficiently large $N$, we obtain a
contradiction.

Another application of the section theorems is the following.

Proposition 16.15 Suppose [T, T ] is a predictable set. Then T is a predictable stopping

time.

Proof Since T is the debut of [T, T ], then T is a stopping time. By the predictable section

theorem, Theorem 16.13, for each n there exists a predictable stopping time Sn such that

[Sn, Sn] ⊂ [T, T ] and

P(π([Sn, Sn])) ≥ P(π([T, T ])) − 2−n.

Saying [Sn, Sn] ⊂ [T, T ] implies that for each ω, either Sn(ω) = T (ω) or else Sn(ω) = ∞.

The set of ω’s for which T (ω) < ∞ but Sn(ω) = ∞ has probability at most 2−n.
Let Qn = S1 ∧· · ·∧Sn. Then the Qn’s are predictable stopping times by Exercise 16.1, they
decrease, [Qn, Qn] ⊂ [T, T ], and P(π([Qn, Qn])) ≥ P(π([T, T ]))−2−n+1. Let Q = limn Qn.
If Q(ω) < ∞, then Qn(ω) < ∞ for all n sufficiently large (how large depends on ω); since
Qn(ω) is either equal to T (ω) or to ∞, Qn(ω) = Q(ω) for all n sufficiently large, and hence
Q(ω) = T (ω). If T (ω) < ∞, then except for a set of ω’s of probability zero, Qn(ω) = T (ω)
for n sufficiently large. Therefore Q = T , a.s.
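Choose $R_{nm}$ predicting $Q_n$ as $m \to \infty$. Choose $m_n$ large enough such that
\[ P(R_{n m_n} + 2^{-n} < Q_n < \infty) < 2^{-n} \quad\text{and}\quad P(R_{n m_n} < n,\ Q_n = \infty) < 2^{-n}. \]
Let $U_n = n \wedge R_{n m_n} \wedge R_{n+1, m_{n+1}} \wedge \cdots$. Fix n for the moment. If $0 < Q(\omega) < \infty$, then
$R_{j m_j}(\omega) < Q_j(\omega) = Q(\omega)$ for all j sufficiently large. Choosing $j > n$ sufficiently large,
\[ U_n(\omega) \le R_{j m_j}(\omega) < Q(\omega). \]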
The Un increase; let T be the limit. By the Borel–Cantelli lemma, if Q(ω) < ∞, then
$R_{n m_n}(\omega) \ge Q_n(\omega) - 2^{-n} = Q(\omega) - 2^{-n}$ for all n sufficiently large, except for a set of ω’s of
probability zero. Therefore $U_n(\omega) \ge Q(\omega) - 2^{-n+1}$ for n sufficiently large, and we conclude
that $U_n(\omega) \uparrow Q(\omega)$, except for a set of ω’s of probability zero.
If $Q(\omega) = \infty$, then $Q_n(\omega) = \infty$ for all n. By the Borel–Cantelli lemma, except for a set
of probability zero, $R_{n m_n} \ge n$ for n sufficiently large. Hence $U_n(\omega) = n$ for n sufficiently
large, so $U_n(\omega) < Q(\omega)$ and $U_n(\omega) \uparrow Q(\omega)$. Thus Q is predictable and $T = Q$, a.s. (We
leave consideration of those ω for which Q(ω) = 0 to the reader.)
Proposition 16.16 Let Xt be a predictable process with paths that are right continuous with
left limits. If a ∈ R and T = inf{t > 0 : Xt ≥ a}, then T is a predictable stopping time.

Proof The set A = {(t, ω) : Xt (ω) ≥ a} is a predictable set. Since Xt is right continuous,

[T, ∞) = A ∪ (T, ∞) ∈ P by Proposition 16.2, and so [T, T ] = [T, ∞) \ (T, ∞) ∈ P .

Now apply Proposition 16.15.

16.4 Projection theorems

Let B[0, ∞) be the Borel σ -field on [0, ∞), let F∞ = ∨t≥0Ft , and let H be the product

σ -field

H = B[0, ∞) × F∞. (16.6)

The following is the optional projection theorem.

Theorem 16.17 Let X be a bounded process that is H measurable. There exists a unique

optional process ${}^oX$ such that

\[ {}^oX_T 1_{(T<\infty)} = E[X_T 1_{(T<\infty)} \mid \mathcal{F}_T] \tag{16.7} \]
for all stopping times T, including those taking infinite values. If $X \ge 0$, then ${}^oX \ge 0$.
${}^oX$ is called the optional projection of X. If X is already optional, then by the uniqueness
result, Corollary 16.14, ${}^oX = X$.
If we take our stopping time T in (16.7) equal to a fixed time t, we have
\[ {}^oX_t = E[X_t \mid \mathcal{F}_t], \quad \text{a.s.} \tag{16.8} \]
This observation is sometimes useful when X is not an adapted process and one wants a
version of E [Xt | Ft] that is jointly measurable in t and ω.
If (16.7) holds, then taking expectations shows that
\[ E[{}^oX_T; T < \infty] = E[X_T; T < \infty] \tag{16.9} \]
for all stopping times T. Conversely, suppose (16.9) holds for all stopping times T. If S is a
stopping time and $A \in \mathcal{F}_S$, let $S_A$ be defined by
\[ S_A(\omega) = \begin{cases} S(\omega), & \omega \in A; \\ \infty, & \omega \notin A. \end{cases} \tag{16.10} \]
Then (16.9) with T replaced by $S_A$ implies that
\[ E[{}^oX_S 1_{(S<\infty)}; A] = E[X_S 1_{(S<\infty)}; A]. \]
Since oXS1(S<∞) is FS measurable, this implies (16.7) holds for the stopping time S. Conse-
quently (16.7) holding for all stopping times T is equivalent to (16.9) holding for all stopping
times T .
Proof of Theorem 16.17 The uniqueness is immediate from Corollary 16.14. We look at
existence. If Xt (ω) = 1F (ω)1[a,b)(t) where F ∈ F∞, we set oXt equal to E [1F | Ft]1[a,b)(t),
where we use Corollary 3.13 to take the right continuous version of the martingale E [1F | Ft].
We check:
\begin{align*}
E[{}^oX_T; T < \infty] &= E[E[1_F \mid \mathcal{F}_T]\, 1_{[a,b)}(T); T < \infty] \\
&= E[1_F 1_{[a,b)}(T); T < \infty] \\
&= E[X_T; T < \infty]
\end{align*}
since (T < ∞) and 1[a,b)(T ) are both FT measurable. We then use linearity and limits to
define oX for bounded measurable X . The positivity of oX when X ≥ 0 is clear from the
construction.
Almost the same proof gives
Theorem 16.18 Let X be a bounded measurable process. There exists a unique predictable
process pX , called the predictable projection of X , such that
E [pXT ; T < ∞] = E [XT ; T < ∞]
for every predictable stopping time T . If X ≥ 0, then pX ≥ 0.
Proof Uniqueness is as before. If Xt = 1F (ω)1(a,b](t), we let pXt = 1(a,b](t)Zt−(ω), where
Zt− denotes the left-hand limit of Zt at time t and Zt is the right-continuous version of the
martingale E [1F | Ft]. We use linearity and limits to define pX for bounded measurable X .
The positivity of pX when X ≥ 0 is clear.
16.5 More on predictability
If U is a random time, i.e., an F∞ measurable map from Ω to [0, ∞], define
FU− = σ {XU : X is bounded and predictable}.
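Lemma 16.19 Suppose T is a predictable stopping time predicted by stopping times $T_n$.
Then $\mathcal{F}_{T-} = \bigvee_{n=1}^\infty \mathcal{F}_{T_n}$.
Proof If X is left continuous, adapted, and bounded, then $X_T = \lim_m X_{T_m}$ and $X_{T_m} \in \mathcal{F}_{T_m} \subset \bigvee_n \mathcal{F}_{T_n}$, so $X_T \in \bigvee_n \mathcal{F}_{T_n}$. An argument using the monotone class theorem shows $\mathcal{F}_{T-} \subset \bigvee_n \mathcal{F}_{T_n}$.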
On the other hand, suppose A ∈ FTn for some n. Define X = 1(Un,∞), where Un = Tn if
ω ∈ A and ∞ otherwise. Since Tn < T on (T > 0), then XT = 1A. (We leave consideration of

what happens on the event (T = 0) to the reader.) X is predictable since it is left continuous,

adapted, and bounded, so A is $\mathcal{F}_{T-}$ measurable. Therefore $\mathcal{F}_{T_n} \subset \mathcal{F}_{T-}$ for all n, and we
conclude $\bigvee_n \mathcal{F}_{T_n} \subset \mathcal{F}_{T-}$.

Corollary 16.20 Suppose T is a predictable stopping time. If M is a uniformly integrable

martingale with right-continuous paths, then

E [MT | FT−] = MT −.

Proof If Xt = Mt−, then X is left continuous, hence predictable, so MT− = XT is FT−

measurable by the definition of FT − and a limit argument. Suppose the sequence Tn predicts

T . If A ∈ FTm and n > m, then A ∈ FTm ⊂ FTn , and by optional stopping (see Exercise 3.12),

E [MT ; A] = E [MTn; A] → E [MT−; A]

as $n \to \infty$. Since $\mathcal{F}_{T-} = \bigvee_m \mathcal{F}_{T_m}$, we have $E[M_T; A] = E[M_{T-}; A]$ for all $A \in \mathcal{F}_{T-}$. Now

use the definition of conditional expectation.

Corollary 16.21 Let S be a predictable stopping time, M a square integrable martingale,

and $N_t = \Delta M_S 1_{(t \ge S)}$. Then $N_t$ is a square integrable martingale.

Proof Since |Nt | ≤ 2 sups≥0 |Ms|, N is square integrable. We will show N is a mar-

tingale by showing E NT = 0 for all bounded stopping times T , and then appealing to

Proposition 9.5.

If T is a bounded stopping time, then (T ≥ S) ∈ FS−; to see this, if Sm is a sequence of

stopping times predicting S, then (T ≥ S) = ∩m(T ≥ Sm) ∈ ∨mFSm . Using Corollary 16.20,

\[ E N_T = E[\Delta M_S 1_{(T \ge S)}] = E[M_S; T \ge S] - E[M_{S-}; T \ge S] = 0, \]

and we are done.

We now show that every stopping time for Brownian motion is predictable.

Proposition 16.22 Let {Ft} be the minimal augmented filtration of a Brownian motion. If T

is a stopping time with respect to {Ft}, then T is a predictable stopping time.

Proof Let T be a stopping time for Brownian motion. Let g be a continuous strictly

increasing function from [0, ∞] to [0, 1], e.g., g(s) = (2/π ) arctan s. Let Mt be the right-

continuous modification of the martingale E [g(T ) | Ft]. The property of Brownian motion

that is key here is that every martingale adapted to the filtration of a Brownian motion is

continuous; see Corollary 12.5. Hence Mt can be taken to be continuous.

Let Vt = Mt − g(T ∧ t). Then Vt has continuous paths and since g(T ∧ t) increases with

t, V is a supermartingale. We have

Vt = E [g(T ) − g(T ∧ t) | Ft],

so V is non-negative. Clearly VT = 0. If S is the first time that Vt is 0, then S ≤ T . Also,

0 = EVS = E [g(T ) − g(T ∧ S)],

so S ≥ T.

We let Tn = inf{t : Vt = 1/n}. By the continuity of V , it is clear that each Tn is strictly less

than T if T > 0 and the Tn increase up to T . Hence T is predictable.

Now let us suppose that At is a right-continuous adapted process whose paths are increasing. We call such a process an increasing process. ΔAt denotes the jump of A at time t, that is, ΔAt = At − At−.

Proposition 16.23 Suppose At is an increasing process such that

(1) ΔAT = 0 whenever T is a totally inaccessible stopping time, and

(2) ΔAT is FT− measurable whenever T is a predictable stopping time.

Then A is predictable.

Proof Let $U_{mi}$ be the ith time $|\Delta A_t| \in (2^{-m}, 2^{-m+1}]$. The $U_{mi}$ are predictable stopping times

by Exercise 16.5. We decompose each Umi as in Proposition 16.1. Since A does not jump at

totally inaccessible times, none of the Umi has a totally inaccessible part.

We do this for each m and i and obtain a countable collection of predictable stopping

times, the union of whose graphs contains all the jump times of A. We order them in some

way as R1, R2, . . . Define T1 = R1, define T2 by setting T2(ω) = R2(ω) if R2(ω) ≠ R1(ω)

and infinity otherwise. Set Tn(ω) = Rn(ω) if Rn(ω) ≠ R1(ω), . . . , Rn−1(ω) and Tn(ω) = ∞

otherwise. We thus get a sequence of predictable stopping times Tn with disjoint graphs and

∪n[Tn, Tn] includes all the jumps of A, except for the set of ω’s of probability zero. The Tn

are predictable stopping times by Exercise 16.6.

Since A jumps only at the predictable stopping times $T_n$, we see that we can write
\[ A_t = A^c_t + \sum_n (\Delta A_{T_n}) 1_{[T_n,\infty)}(t), \]
where $A^c$ is a continuous increasing process. By hypothesis, $\Delta A_{T_n}$
is $\mathcal{F}_{T_n-}$ measurable. Therefore the proof will be complete once we show $(\Delta A_{T_n}) 1_{[T_n,\infty)}$ is a
predictable process.

It therefore suffices to show that the process Yt = 1B(ω)1[T,∞)(t) is predictable if T is

a predictable stopping time and B ∈ FT−. Since Yt = 1[TB,∞)(t), where TB is equal to T if

ω ∈ B and equal to infinity otherwise, the predictability of Y follows by Exercise 16.3.

16.6 Dual projection theorems

In this section At is a right-continuous increasing process with A0 = 0, a.s. We do not

necessarily assume that At is adapted, only that A is measurable with respect to H defined by

(16.6). Define $\mu_A$ on elements of $\mathcal{H}$ by
\[ \mu_A(B) = E \int_0^\infty 1_B(t, \omega)\, dA_t(\omega). \]
We define $\mu_A(X)$ by $E \int_0^\infty X_t\, dA_t$ if X is bounded and $\mathcal{H}$ measurable. Note that if $X = 0$,
then $\mu_A(X) = 0$.

Theorem 16.24 Suppose μ is a bounded positive measure on H such that μ(X ) = 0

whenever X = 0. Then there exists a unique right-continuous increasing process A with

A0 = 0, a.s., such that μ = μA.

Proof First, uniqueness. If μ = μA = μB, let t > 0 and let C be the set of ω’s where

At (ω) > Bt (ω)+ε. Then μA([0, t]×C) ≥ μB([0, t]×C)+εP(C), which implies P(C) = 0.

Since ε is arbitrary, then At = Bt , a.s. Since A and B are right continuous, we conclude A = B.

To prove existence, for each rational q, define νq(C) = μ([0, q] × C). Clearly νq is

absolutely continuous with respect to P. Let $\tilde{A}_q$ be the Radon–Nikodym derivative of $\nu_q$ with
respect to P. Since μ is positive, $\tilde{A}_q$ is increasing in q. Let $A_t = \limsup_{q \to t,\, q > t} \tilde{A}_q$. It is easy to

check that μA = μ.

Theorem 16.25 Suppose A is right continuous, A0 = 0, a.s., and μA(X ) = μA(oX ) for

every bounded H measurable process X . Then At is optional.

Proof Since At is right continuous, we need only show that At is adapted. Fix t and let Y be

a bounded F∞ measurable random variable,

Z = Y − E [Y | Ft],

and Xs(ω) = 1[0,t](s)Z(ω). If T is a stopping time, then (T ≤ t) ∈ Ft , and so by the

definitions of X and Z,

E [oXT ; T < ∞] = E [XT ; T < ∞] = E [Z; T ≤ t] = 0.
This implies oX = 0 by the definition of oX . Hence
\[ E[A_t Z] = E\Big[ \int_0^\infty X_s\, dA_s \Big] = \mu_A(X) = \mu_A({}^oX) = 0. \]
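Thus $E[A_t Y] = E[A_t E[Y \mid \mathcal{F}_t]]$. We write
\begin{align*}
E[A_t Y] &= E[A_t E[Y \mid \mathcal{F}_t]] = E[E[(A_t E[Y \mid \mathcal{F}_t]) \mid \mathcal{F}_t]] \\
&= E[E[A_t \mid \mathcal{F}_t]\, E[Y \mid \mathcal{F}_t]] = E[E[(Y E[A_t \mid \mathcal{F}_t]) \mid \mathcal{F}_t]] \\
&= E[Y E[A_t \mid \mathcal{F}_t]].
\end{align*}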
Hence E [AtY ] = E [Y E [At | Ft] ] for all bounded Y , or At = E [At | Ft], a.s., which says
that At is Ft measurable.
Theorem 16.26 If μA(X ) = μA(pX ) for all bounded X , then A is predictable and can be
taken to be right continuous.
Proof By hypothesis, together with Exercise 16.8,
\[ \mu_A({}^oX) = \mu_A({}^p({}^oX)) = \mu_A({}^pX) = \mu_A(X). \]
By Theorem 16.25, At is right continuous and optional. We need to show that A does not
jump at totally inaccessible times and that ΔAT is FT− measurable at predictable times T ;
we then use Proposition 16.23.
Let T be a totally inaccessible stopping time and let B = (ΔAT > 0). Set TB equal to T on

B and equal to infinity otherwise. It is easy to check that TB is also totally inaccessible. Let

X = 1[TB,TB]. If U is a predictable stopping time, E [XU ;U < ∞] = P(TB = U < ∞) = 0.
By the definition of predictable projection, pX = 0. Hence
\[ E[\Delta A_T; \Delta A_T > 0] = E[\Delta A_{T_B}] = \mu_A(X) = \mu_A({}^pX) = 0. \]

Now suppose T is a predictable stopping time. Let Y be a bounded H measurable random

variable, set

Z = Y − E [Y | FT−],

and X = Z1[T,T ]. Let S be any predictable stopping time. Then if W = 1[S,S], W =

limn→∞ 1[S,S+(1/n)) is a predictable process by Proposition 16.2(4). By the definition of FT −,

WT is FT− measurable. This is the same as saying (S = T < ∞) ∈ FT−. Therefore
E [XS; S < ∞] = E [Z; S = T < ∞] = 0.
This implies pX = 0, and then
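\[ 0 = \mu_A({}^pX) = \mu_A(X) = E[Z \Delta A_T]. \]
Similarly to the proof of Theorem 16.25,
\begin{align*}
E[\Delta A_T Y] &= E[\Delta A_T E[Y \mid \mathcal{F}_{T-}]] \\
&= E[E[\Delta A_T \mid \mathcal{F}_{T-}]\, E[Y \mid \mathcal{F}_{T-}]] \\
&= E[Y E[\Delta A_T \mid \mathcal{F}_{T-}]].
\end{align*}
Since this holds for all Y, then $\Delta A_T = E[\Delta A_T \mid \mathcal{F}_{T-}]$ is $\mathcal{F}_{T-}$ measurable.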
We now define the dual optional projection and the dual predictable projection of an
increasing process. Given a right-continuous increasing, not necessarily adapted process At
with A0 = 0, a.s., define μo by
μo(X ) = μA(oX ) (16.11)
for bounded H measurable X . Exercise 16.11 asks you to prove that μo is a measure. Clearly
μo(oX ) = μA(o(oX )) = μA(oX ) = μo(X ). By Theorem 16.17, we see that oX ≥ 0 if X ≥ 0,
hence μo is a positive measure. If X = 0, then oX = 0, so μo(X ) = μA(oX ) = 0. Therefore
by Theorems 16.24 and 16.25, μo corresponds to an optional increasing process Ao, called
the dual optional projection of A.
The dual optional projection is used in excursion theory. More commonly used is the dual
predictable projection, which is defined in a very similar way. Define μp(X ) = μA(pX ), and
let Ap be the predictable increasing process associated with μp. We often denote Ap by Ã and
call it the compensator of A. The reason for this terminology is the following proposition.
Proposition 16.27 Let At be an adapted increasing process with A0 = 0, a.s. Then At − Ãt
is a martingale.
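Proof Let $s < t$, let $B \in \mathcal{F}_s$, define
\[ S(\omega) = \begin{cases} s, & \omega \in B, \\ \infty, & \omega \notin B, \end{cases} \quad\text{and}\quad T(\omega) = \begin{cases} t, & \omega \in B, \\ \infty, & \omega \notin B. \end{cases} \]
Let $X = 1_{(S,T]}$. Then
\[ E[A_t - A_s; B] = \mu_A(X) = \mu_A({}^pX) = \mu_{A^p}(X) = E[A^p_t - A^p_s; B], \]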
which does it.
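
As a concrete illustration (our example, not one worked in the text), take A to be a Poisson process with parameter λ stopped at a fixed horizon t_0 < ∞, with the filtration generated by the process. The horizon t_0 is introduced only so that E A∞ < ∞. Assuming the standard fact that a Poisson process has independent increments with E[P_t − P_s] = λ(t − s), the defining identity from the proof can be checked directly:

```latex
% Hypothetical example: A_t = P_{t \wedge t_0}, P a Poisson process with parameter \lambda.
% For s < t \le t_0 and B \in \mathcal{F}_s, independence of increments gives
\[
  E[A_t - A_s;\, B] = E[P_t - P_s]\, P(B) = \lambda (t - s)\, P(B)
  = E[\tilde{A}_t - \tilde{A}_s;\, B],
  \qquad \tilde{A}_t = \lambda\, (t \wedge t_0).
\]
% \tilde{A} is continuous and adapted, hence predictable, and
% A_t - \tilde{A}_t = P_{t \wedge t_0} - \lambda (t \wedge t_0)
% is the martingale of Proposition 16.27.
```

In this example the compensator is deterministic, which is special to processes with independent increments.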
16.7 The Doob–Meyer decomposition
Proposition 16.28 If M is a predictable uniformly integrable martingale with paths that are
right continuous with left limits, then M is continuous.
Proof Let ε > 0 and let T = inf{t : |ΔMt | > ε}. T is a predictable stopping time by

Exercise 16.2. By Corollary 16.20, E [MT | FT−] = MT−. By the definition of FT− and a

limit argument, MT is FT− measurable, and thus E [MT | FT−] = MT . Hence MT = MT−

at all predictable stopping times, and in particular at time T . But ε is arbitrary, so M has no

jumps.

We say a process X is of class D if the family {XT : T a stopping time} is uniformly

integrable. The Doob–Meyer decomposition is the following. If Zt is a supermartingale,

then −Zt is a submartingale, and it is a matter only of convenience whether we state the

Doob–Meyer decomposition in terms of submartingales or supermartingales.

Theorem 16.29 Suppose Zt is a submartingale of class D with paths that are right continuous

with left limits and such that Z0 = 0, a.s. Then Zt = Mt+At, where Mt is a uniformly integrable

right-continuous martingale with M0 = 0, a.s., and At is a predictable increasing process

with A0 = 0, a.s. The decomposition is unique.

The existence is the hard part. We define a measure μ by μ((S, T ]) = E[ZT − ZS] for

stopping times S ≤ T , and then let A be the increasing process such that μA(X ) = μ(pX ).

Proof We start with uniqueness. If Zt = Mt + At = Nt + Bt , then Mt − Nt = Bt − At , and so

Mt − Nt is a predictable uniformly integrable martingale. By Proposition 16.28, Mt − Nt is

a continuous martingale. Since Mt − Nt = Bt − At , then Mt − Nt is a continuous martingale

whose paths are of bounded variation on each finite time interval, hence Mt − Nt = 0 by

Theorem 9.7. This proves uniqueness.

We turn to existence. By the martingale convergence theorem (Theorem 3.12), Z∞ =

limt→∞ Zt exists, a.s. By Fatou’s lemma, E |Z∞| < ∞.
Let I denote the collection of finite unions of subsets of [0, ∞) × Ω of the form (S, T ],
where S ≤ T are stopping times. Define μ((S, T ]) = E [ZT −ZS]. Since Z is a submartingale,
then μ is non-negative. We note that I is an algebra and that μ is finitely additive on I .
If $K = (S_1, T_1] \cup \cdots \cup (S_n, T_n]$ with $S_1 \le T_1 \le S_2 \le \cdots \le T_n$, set $\bar{K} = [S_1, T_1] \cup \cdots \cup [S_n, T_n]$.
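If $H = (S, T]$ and $\varepsilon > 0$, let
\[ S_n(\omega) = \begin{cases} S(\omega) + (1/n), & S(\omega) + (1/n) < T(\omega), \\ \infty, & \text{otherwise}, \end{cases} \]
and
\[ T_n(\omega) = \begin{cases} T(\omega), & S(\omega) + (1/n) < T(\omega), \\ \infty, & \text{otherwise}. \end{cases} \]
Then $[S_n, T_n] \subset (S, T]$ and $S_n \downarrow S$, $T_n \downarrow T$. Since Z is right continuous and of class D, then
$\mu(S_n, T_n] = E[Z_{T_n} - Z_{S_n}] \to E[Z_T - Z_S] = \mu(H)$. Thus if n is sufficiently large and we
take $K = (S_n, T_n]$, then $\bar{K} \subset H$ and $\mu(K) > \mu(H) - \varepsilon$.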

We now prove that μ is countably additive on I . Suppose Hn ∈ I with Hn ↓ ∅. We need

to show that μ(Hn) ↓ 0.

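Let $\varepsilon > 0$ and choose $K_n \in \mathcal{I}$ such that $\bar{K}_n \subset H_n$ with $\mu(K_n) > \mu(H_n) - \varepsilon/2^n$. Let
$L_n = K_1 \cap \cdots \cap K_n$. Then for each n we have $\mu(H_n) \le \mu(L_n) + \varepsilon$. Since $\bar{L}_n \subset \bar{K}_n \subset H_n$, we
have $\bar{L}_n \downarrow \emptyset$.

Let $D_{\bar{L}_n}$ be the debut of $\bar{L}_n$. The stopping times $D_{\bar{L}_n}$ increase; let R be the limit. Let
$F_n = F_n(\omega) = \{t : (t, \omega) \in \bar{L}_n\}$. This is a closed subset of $[0, \infty)$, and $D_{\bar{L}_n}(\omega) \in F_n \subset F_m$
whenever $n \ge m$ and $D_{\bar{L}_n}(\omega) < \infty$. If $R(\omega) < \infty$, then $R(\omega) \in F_m$ for each m, which
contradicts $\cap_m \bar{L}_m = \emptyset$. Therefore $R = \infty$. Since Z is of class D, then $Z_{D_{\bar{L}_n}}$ converges almost
surely and in $L^1$ to $Z_\infty$. Thus $\mu(L_n) \le E[Z_\infty - Z_{D_{\bar{L}_n}}] \to 0$. Hence $\limsup \mu(H_n) < \varepsilon$, and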
since ε is arbitrary, μ(Hn) → 0.
This proves that μ is countably additive on I . By the Carathéodory extension theorem, μ
may be extended to a measure on P .
Define μ̃(X ) = μ(pX ). Then μ̃(pX ) = μ(p(pX )) = μ(pX ) = μ̃(X ), and so there exists
a predictable right-continuous increasing process At such that μ̃ = μA. Since
E A∞ = μA(1(0,∞)) = μ(p1(0,∞)) = μ(1(0,∞)) = E [Z∞ − Z0] < ∞,
A∞ is integrable, and since At is an increasing process, the collection of random variables
{At} is uniformly integrable.
If S is any stopping time, then by Proposition 16.2, (S, ∞) is a predictable set, hence
p1(S,∞) = 1(S,∞). We thus have
E [A∞ − AS] = μ̃((S, ∞)) = μ(p1(S,∞)) = μ(1(S,∞)) = E [Z∞ − ZS].
Letting t > 0 and B ∈ Ft , define S = t if ω ∈ B and equal to infinity otherwise. Then

E [A∞ − At; B] = E [A∞ − AS] = E [Z∞ − ZS] = E [Z∞ − Zt; B],

or Mt = Zt − At is a martingale. Proposition A.17 tells us that M is a uniformly integrable

martingale.

A process X is of class DL if there exist stopping times Vn → ∞ such that Xt∧Vn is of

class D for each n. It is clear that there is a version of the Doob–Meyer decomposition for

submartingales of class DL.

Proposition 16.30 The process A is continuous if and only if E ZTn → E ZT whenever Tn ↑ T

and Tn < T on (T > 0).

Proof Let T be a predictable stopping time predicted by the sequence Tn. Since we know

E [A∞ − ATn ] = E [Z∞ − ZTn ], then taking limits,

E [A∞ − AT−; T < ∞] = E [Z∞ − ZT−; T < ∞],
using the fact that Z is of class D. Also E [A∞ − AT ] = E [Z∞ − ZT ]. Thus E [AT − AT−] =
E [ZT − ZT −]. Then E [AT − AT −] = 0 if and only if E ZT = E ZT−.
Corollary 16.31 Let S be a totally inaccessible stopping time, Y a non-negative bounded
random variable that is FS measurable, and At = Y 1(t≥S). Let Ã be the compensator of A.
Then Ã has continuous paths.
Proof Let T be a stopping time and let Tn be stopping times increasing to T . If we have
P(T = S) = 0, then limn→∞ ATn = AT , a.s., since A jumps only at time S. If P(T = S) >

0, then [T, T ] cannot contain the graph of a predictable stopping time since S is totally

inaccessible. Therefore we cannot have Tn < T for all n with positive probability, hence
Tn(ω) = T (ω) for all n sufficiently large (depending on ω). Thus again limn→∞ ATn = AT ,
a.s. By Proposition 16.30, Ã is continuous.
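
For a concrete instance (an illustration resting on standard facts about the Poisson process that are assumed here rather than proved in this section), let S be the first jump time of a Poisson process P with parameter λ, which is totally inaccessible in the filtration of P, and take Y = 1.

```latex
% Hypothetical example: S = first jump time of a Poisson process P with parameter \lambda,
% Y = 1, so that A_t = 1_{(t \ge S)} = P_{t \wedge S}.
\[
  \tilde{A}_t = \lambda\, (t \wedge S),
  \qquad
  A_t - \tilde{A}_t = P_{t \wedge S} - \lambda\, (t \wedge S).
\]
% The right-hand side is the martingale P_t - \lambda t stopped at S, and
% \tilde{A} is increasing and continuous (hence predictable),
% in agreement with Corollary 16.31.
```

Note that A itself jumps at time S while its compensator does not; the jump is entirely carried by the martingale part.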
16.8 Two inequalities
Proposition 16.32 Suppose Zt = Mt − At, where Mt is a uniformly integrable martingale
and At is an increasing predictable process with A0 = 0, a.s. Suppose Z is bounded, that is,
there exists K > 0 such that P(|Zt | > K for some t) = 0. If p is any positive integer,

\[ E A_\infty^p < \infty. \]
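Proof Let $\lambda > 0$ and let $M = 4K$. Let $T = \inf\{t : A_t \ge \lambda\}$. Because $A_{T-} \le \lambda$,
\begin{align*}
P(A_\infty \ge \lambda + M) &= P(A_\infty \ge \lambda + M,\ T < \infty) \\
&\le P(A_\infty - A_{T-} \ge M,\ T < \infty) \\
&\le E\Big[ \frac{A_\infty - A_{T-}}{M};\ A_\infty - A_{T-} \ge M,\ T < \infty \Big] \\
&\le \frac{1}{M} E[A_\infty - A_{T-};\ T < \infty].
\end{align*}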
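We will show
\[ \frac{1}{M} E[A_\infty - A_{T-};\ T < \infty] \le \tfrac{1}{2} P(T < \infty), \tag{16.12} \]
which, since $P(T < \infty) = P(A_\infty \ge \lambda)$, implies
\[ P(A_\infty \ge \lambda + M) \le \tfrac{1}{2} P(A_\infty \ge \lambda). \tag{16.13} \]
Taking $\lambda = kM$ in (16.13) yields
\[ P(A_\infty \ge (k+1)M) \le \tfrac{1}{2} P(A_\infty \ge kM). \]
Since $P(A_\infty \ge M) \le 1$, induction tells us
\[ P(A_\infty \ge kM) \le \frac{1}{2^{k-1}}, \]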
which implies our conclusion.
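
The passage from the geometric tail bound to finite moments is left implicit. One way to fill it in, offered as a sketch rather than as the author's argument, is to split according to which interval [kM, (k+1)M) contains A∞:

```latex
% Sketch: geometric tails imply finite moments of every order p.
% On \{kM \le A_\infty < (k+1)M\} we have A_\infty^p \le ((k+1)M)^p, and
% P(A_\infty \ge kM) \le 2^{-(k-1)} for k \ge 1.
\[
  E A_\infty^p
  \le \sum_{k=0}^{\infty} \big((k+1)M\big)^p\, P\big(kM \le A_\infty < (k+1)M\big)
  \le M^p \Big( 1 + \sum_{k=1}^{\infty} \frac{(k+1)^p}{2^{\,k-1}} \Big) < \infty .
\]
```

The series converges because the geometric factor dominates any polynomial in k.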
Therefore we need to prove (16.12). T is a predictable stopping time by Proposition 16.16.
Let Tn be stopping times with Tn ↑ T and Tn < T on (T > 0). Let n be fixed for the moment

and let N > 0. If j > n,

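\begin{align*}
E[A_\infty - A_{T_j};\ T_n < N] &= E[E[A_\infty - A_{T_j} \mid \mathcal{F}_{T_j}];\ T_n < N] \\
&= -E[E[Z_\infty - Z_{T_j} \mid \mathcal{F}_{T_j}];\ T_n < N] \\
&\le 2K P(T_n < N)
\end{align*}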
since Zt + At is a martingale, (Tn < N ) ∈ FTn ⊂ FTj , and |Z| is bounded by K. Letting
j → ∞ and using Fatou’s lemma, we get
E [A∞ − AT −; Tn < N] ≤ 2KP(Tn < N ).
Letting n → ∞, by Fatou’s lemma again,
E [A∞ − AT−; T < N] ≤ 2KP(T ≤ N ).
Finally, letting N → ∞, by monotone convergence,
E [A∞ − AT −; T < ∞] ≤ 2KP(T < ∞).
By our choice of M , this gives (16.12).
For use in the reduction theorem in Chapter 17, we will need a variation of the preceding
proposition.
Proposition 16.33 Let U be a stopping time, Y a non-negative integrable random variable
that is FU measurable. Let Nt be the right-continuous version of E [Y | Ft]. Suppose there
exists K > 0 such that Nt ≤ K if t < U . Let Zt = Y 1(t≥U ), which is an increasing process,
and let $A_t$ be its compensator. If p is a positive integer, then $E A_\infty^p < \infty$.
Proof As in the proof of Proposition 16.32, it suffices to show
E [A∞ − AT −; T < ∞] ≤ KP(T < ∞), (16.14)
where λ > 0 and T = inf{t : At ≥ λ}. Since A is a predictable process, then T is a predictable

stopping time by Proposition 16.16. Let Tn be stopping times predicting T .

Let N, n ≥ 1. If j > n, then (Tn < N ) ∈ FTn ⊂ FTj and
E [A∞ − ATj ; Tn < N] = E [Z∞ − ZTj ; Tn < N]. (16.15)
We observe that Z∞ − ZTj = 0 on the event (Tj ≥ U ), while Z∞ − ZTj = Y on the event
(Tj < U ). Therefore
\begin{align*}
E[Z_\infty - Z_{T_j};\ T_n < N] &= E[Y;\ T_j < U,\ T_n < N] \\
&= E[E[Y \mid \mathcal{F}_{T_j}];\ T_j < U,\ T_n < N] \\
&= E[N_{T_j};\ T_j < U,\ T_n < N] \\
&\le K P(T_j < U,\ T_n < N) \\
&\le K P(T_n < N).
\end{align*}
With this and (16.15), we can now proceed as in the proof of Proposition 16.32 to obtain
(16.14).
Exercises
16.1 Show that if S1, . . . , Sn are predictable stopping times, then so are S1 ∧· · ·∧Sn and S1 ∨· · ·∨Sn.
16.2 If At is a predictable process with paths that are right continuous with left limits and a > 0,

show T = inf{t > 0 : ΔAt > a} is a predictable stopping time.

16.3 Show that if T is a predictable stopping time, B ∈ FT−, and TB(ω) is defined to be equal to

T (ω) if ω ∈ B and equal to ∞ otherwise, then TB is a predictable stopping time.

16.4 Let X be a bounded adapted right-continuous process, let ε > 0, let U0 = 0, a.s., and define Ui

by (16.3) for i ≥ 1. Show each Ui is a stopping time.

16.5 Let A be a predictable increasing process and let Sk be the kth time A jumps more than ε. Thus

S0 = 0, a.s., and Sk+1 = inf{t > Sk : ΔAt > ε}. Show each Sk is a predictable stopping time.

16.6 Show that the stopping times Tn defined in the proof of Proposition 16.23 are predictable.

16.7 Show that if Pt is a Poisson process, then (pP)t = Pt−.

16.8 Show that if X is bounded and measurable with respect to the product σ -field B[0,∞) × F∞,

then p(oX ) = pX .

16.9 Suppose T is a totally inaccessible stopping time. Show that if X = 1[T,T ], then pX = 0.

16.10 If P is a Poisson process with parameter λ, determine $P^o_t$ and $P^p_t$.

16.11 Show that μo defined in (16.11) is a measure.

16.12 Let Xt be a continuous process and suppose there exists K > 0 such that for all t,

E [ |X∞ − Xt | |Ft ] ≤ K, a.s.

Let $X^*_\infty = \sup_{t \ge 0} |X_t|$. Prove that there exists a depending only on K such that
\[ E e^{a X^*_\infty} < \infty. \]
This is sometimes called the John–Nirenberg inequality after the inequality of the same name
in analysis.
Hint: Imitate the proof of Proposition 16.32. This exercise is somewhat easier than the proof
of that proposition because X has continuous paths.
16.13 A martingale M is said to be in the space BMO if
\[ \sup_{t \ge 0} E[M_\infty^2 - M_t^2 \mid \mathcal{F}_t] < \infty, \quad \text{a.s.} \]
Let $M^*_t = \sup_{s \le t} |M_s|$. Show that if M is in BMO, then there exists $a > 0$ such that
\[ E e^{a M^*_\infty} < \infty. \]
The name BMO comes from the “bounded mean oscillation” spaces of harmonic analysis.
Hint: Use Exercise 16.12.
Notes
A progressively measurable set is one whose indicator is a progressively measurable process,
which is defined in Exercise 1.3. In fact, the debut of a progressively measurable set is a
stopping time; see Dellacherie and Meyer (1978).
An elementary proof of the general Doob–Meyer theorem along the lines of the proof
given in Chapter 9 can be found in Bass (1996).
See Dellacherie and Meyer (1978) for more on the general theory of processes.
17
Processes with jumps
In this chapter we investigate the stochastic calculus for processes which may have jumps
as well as a continuous component. If X is not a continuous process, it is no longer true
that Xt∧TN is a bounded process when TN = inf{t : |Xt | ≥ N}, since there could be a large
jump at time TN . We investigate stochastic integrals with respect to square integrable (not
necessarily continuous) martingales, Itô’s formula, and the Girsanov transformation. We
prove the reduction theorem that allows us to look at semimartingales that are not necessarily
bounded.
Since I encouraged you to skim Chapter 16 on the first reading of this book, it is only fair
that I tell you the facts that we will need from that chapter. We will need the Doob–Meyer
decomposition (Theorem 16.29), Proposition 16.1, Corollaries 16.21 and 16.31, and the two
inequalities in Propositions 16.32 and 16.33.
17.1 Decomposition of martingales
We assume throughout this chapter that {F t} is a filtration satisfying the usual conditions.
This means that each Ft contains every P-null set and ∩ε>0Ft+ε = Ft for each t.

Let us begin by recalling a few definitions and facts. The predictable σ -field is the σ -field

of subsets of [0, ∞) × Ω generated by the collection of bounded, left-continuous processes

that are adapted to {Ft}; see Section 10.1. A stopping time T is predictable and predicted

by the sequence of stopping times Tn if Tn ↑ T , and Tn < T on the event (T > 0). A

stopping time T is totally inaccessible if P(T = S) = 0 for every predictable stopping time

S. The graph of a stopping time T is [T, T ] = {(t, ω) : t = T (ω) < ∞}; see Section 16.1.
If Xt is a process that is right continuous with left limits, we set $X_{t-} = \lim_{s \to t,\, s < t} X_s$ and ΔXt = Xt − Xt−.

is right continuous with left limits, for each i, Ti j → ∞ as j → ∞. We conclude that Mt

has at most countably many jumps. Next we decompose each Ti j into predictable and totally

inaccessible parts by Proposition 16.1. We relabel the jump times as S1, S2, . . . so that each

Sk is either predictable or totally inaccessible, the graphs of the Sk are disjoint, M has a jump

at each time Sk and only at these times, and |ΔMSk | is bounded for each k; cf. the proof of

Proposition 16.23. We do not assume that Sk1 ≤ Sk2 if k1 ≤ k2, and in general it would not be

possible to arrange this.

If $S_i$ is a totally inaccessible stopping time, let
\[ A_i(t) = \Delta M_{S_i} 1_{(t \ge S_i)} \tag{17.2} \]
and
\[ M_i(t) = A_i(t) - \tilde{A}_i(t), \tag{17.3} \]
where $\tilde{A}_i$ is the compensator of $A_i$. $A_i(t)$ is the process that is 0 up to time $S_i$ and then jumps
an amount $\Delta M_{S_i}$; thereafter it is constant. By Corollary 16.31, $\tilde{A}_i$ is continuous. If $S_i$ is a
predictable stopping time, let
\[ M_i(t) = \Delta M_{S_i} 1_{(t \ge S_i)}. \tag{17.4} \]

By Corollary 16.21, Mi is a martingale. Note that in either case, M − Mi has no jump at

time Si.

Theorem 17.3 Suppose M is a square integrable martingale and we define Mi as in (17.3)

and (17.4).

(1) Each $M_i$ is square integrable.

(2) $\sum_{i=1}^\infty M_i(\infty)$ converges in $L^2$.

(3) If $M^c_t = M_t - \sum_{i=1}^\infty M_i(t)$, then $M^c$ is square integrable and we can find a version that has continuous paths.

(4) For each i and each stopping time T, $E[M^c_T M_i(T)] = 0$.

Proof (1) If $S_i$ is a totally inaccessible stopping time and we let $B_t = (\Delta M_{S_i})^+ 1_{(t \ge S_i)}$ and
$C_t = (\Delta M_{S_i})^- 1_{(t \ge S_i)}$, then (1) follows by Lemma 17.1. If $S_i$ is predictable, (1) follows by

Corollary 16.21.

(2) Let $V_n(t) = \sum_{i=1}^n M_i(t)$. By the orthogonality lemma (Lemma 17.2),
$E[M_i(\infty) M_j(\infty)] = 0$ if $i \ne j$ and $E[M_i(\infty)(M_\infty - V_n(\infty))] = 0$ if $i \le n$. We thus

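have
\begin{align*}
\sum_{i=1}^n E M_i(\infty)^2 &= E V_n(\infty)^2 \\
&\le E\big[ M_\infty - V_n(\infty) \big]^2 + E V_n(\infty)^2 \\
&= E\big[ M_\infty - V_n(\infty) + V_n(\infty) \big]^2 \\
&= E M_\infty^2 < \infty.
\end{align*}
Therefore the series $\sum_i E M_i(\infty)^2$ converges. If $n > m$,
\[ E\big[ V_n(\infty) - V_m(\infty) \big]^2 = E\Big[ \sum_{i=m+1}^n M_i(\infty) \Big]^2 = \sum_{i=m+1}^n E M_i(\infty)^2. \]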

This tends to 0 as n, m → ∞, so Vn(∞) is a Cauchy sequence in L2, and hence converges.

(3) From (2), Doob’s inequalities, and the completeness of L2, the random variables

$\sup_{t \ge 0} [M_t - V_n(t)]$ converge in $L^2$ as $n \to \infty$. Let $M^c_t = \lim_{n \to \infty} [M_t - V_n(t)]$. There is a
sequence $n_k$ such that
\[ \sup_{t \ge 0} |(M_t - V_{n_k}(t)) - M^c_t| \to 0, \quad \text{a.s.} \]
We conclude that the paths of $M^c_t$ are right continuous with left limits. By the construction
of the $M_i$, $M - V_{n_k}$ has jumps only at times $S_i$ for $i > n_k$. We therefore see that $M^c$ has no
jumps, i.e., it is continuous.

(4) By the orthogonality lemma and (17.1),

\[ E[M_i(T)(M_T - V_n(T))] = 0 \]

if T is a stopping time and i ≤ n. Letting n tend to infinity proves (4).

17.2 Stochastic integrals

If $M_t$ is a square integrable martingale, then $M_t^2$ is a submartingale by Jensen’s inequality
for conditional expectations. Just as in the case of continuous martingales, we can use the
Doob–Meyer decomposition (this time, we use Theorem 16.29 instead of Theorem 9.12) to
find a predictable increasing process starting at 0, denoted $\langle M \rangle_t$, such that $M_t^2 - \langle M \rangle_t$ is a
martingale.

Let us define
\[ [M]_t = \langle M^c \rangle_t + \sum_{s \le t} |\Delta M_s|^2. \tag{17.5} \]
Here $M^c$ is the continuous part of the martingale M as defined in Theorem 17.3. As an
example, if $M_t = P_t - t$, where $P_t$ is a Poisson process with parameter 1, then $M^c_t = 0$ and
\[ [M]_t = \sum_{s \le t} \Delta P_s^2 = \sum_{s \le t} \Delta P_s = P_t, \]
because all the jumps of $P_t$ are of size one. In this case $\langle M \rangle_t = t$; this follows from Proposition
17.4 below.
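
A quick consistency check for this example, added here as a sketch and using only the standard facts E P_t = Var P_t = t for a Poisson process with parameter 1:

```latex
% Consistency check for M_t = P_t - t (our elaboration, not from the text).
\[
  E M_t^2 = \operatorname{Var}(P_t) = t,
  \qquad E\, [M]_t = E P_t = t,
  \qquad E\, \langle M \rangle_t = t .
\]
% Moreover [M]_t - \langle M \rangle_t = P_t - t = M_t is itself a martingale,
% so the two brackets differ by a martingale even though their paths are very different.
```

The example shows the typical picture: [M] is a pure jump process, while ⟨M⟩ is its predictable (here deterministic) counterpart.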

In defining stochastic integrals, one could work with 〈M〉t , but the process [M]t is the one

that shows up naturally in many formulas, such as the product formula.

Proposition 17.4 $M_t^2 - [M]_t$ is a martingale.

Proof By the orthogonality lemma and (17.1) it is easy to see that

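\[ \langle M \rangle_t = \langle M^c \rangle_t + \sum_i \langle M_i \rangle_t. \]
Since $M_t^2 - \langle M \rangle_t$ is a martingale, we need only show $[M]_t - \langle M \rangle_t$ is a martingale. Since
\[ [M]_t - \langle M \rangle_t = \Big( \langle M^c \rangle_t + \sum_{s \le t} |\Delta M_s|^2 \Big) - \Big( \langle M^c \rangle_t + \sum_i \langle M_i \rangle_t \Big), \]
it suffices to show that $\sum_i \langle M_i \rangle_t - \sum_i \sum_{s \le t} |\Delta M_i(s)|^2$ is a martingale.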

By Exercise 17.1
\[ M_i(t)^2 = 2 \int_0^t M_i(s-)\, dM_i(s) + \sum_{s \le t} |\Delta M_i(s)|^2, \tag{17.6} \]
where the first term on the right-hand side is a Lebesgue–Stieltjes integral. If we approximate
this integral by a Riemann sum and use the fact that $M_i$ is a martingale, we see that the first
term on the right in (17.6) is a martingale. Thus $M_i^2(t) - \sum_{s \le t} |\Delta M_i(s)|^2$ is a martingale.
Since $M_i^2(t) - \langle M_i \rangle_t$ is a martingale, summing over i completes the proof.

If $H_s$ is of the form
\[ H_s(\omega) = \sum_{i=1}^n K_i(\omega) 1_{(a_i, b_i]}(s), \tag{17.7} \]
where each $K_i$ is bounded and $\mathcal{F}_{a_i}$ measurable, define the stochastic integral by
\[ N_t = \int_0^t H_s\, dM_s = \sum_{i=1}^n K_i [M_{b_i \wedge t} - M_{a_i \wedge t}]. \]
Very similar proofs to those in Chapter 10 show that the left-hand side will be a martingale
and (with $[\cdot]$ instead of $\langle \cdot \rangle$), $N_t^2 - [N]_t$ is a martingale.

If H is $\mathcal{P}$ measurable and $E \int_0^\infty H_s^2\, d[M]_s < \infty$, approximate H by integrands $H^n_s$ of the
form (17.7) so that
\[ E \int_0^\infty (H_s - H^n_s)^2\, d[M]_s \to 0 \]
and define $N^n_t$ as the stochastic integral of $H^n$ with respect to $M_t$. By almost the same proof
as that of Theorem 10.4, the martingales $N^n_t$ converge in $L^2$. We call the limit $N_t = \int_0^t H_s\, dM_s$
the stochastic integral of H with respect to M. A subsequence of the $N^n$ converges uniformly
over $t \ge 0$, a.s., and therefore the limit has paths that are right continuous with left limits.
The same arguments as those of Theorem 10.4 apply to prove that the stochastic integral is
a martingale and
\[ [N]_t = \int_0^t H_s^2\, d[M]_s. \]
A consequence of this last equation is that
$$E\Big(\int_0^t H_s\, dM_s\Big)^2 = E\int_0^t H_s^2\, d[M]_s. \tag{17.8}$$
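The identity (17.8) can be checked by simulation for a simple integrand of the form (17.7). The sketch below assumes $M_t = P_t - \lambda t$ for a Poisson process $P$, and takes $H_s = K 1_{(a,b]}(s)$ with $K$ a truncation of $M_a$; these choices, the parameter values, and the names are hypothetical and serve only as an example.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, a, b, n_paths = 1.5, 1.0, 2.5, 200_000   # illustrative values

# Compensated Poisson martingale M_t = P_t - lam * t observed at times a and b.
P_a = rng.poisson(lam * a, size=n_paths)
P_incr = rng.poisson(lam * (b - a), size=n_paths)   # P_b - P_a, independent of F_a
M_a = P_a - lam * a
M_incr = P_incr - lam * (b - a)

# Simple integrand H_s = K * 1_{(a, b]}(s), with K bounded and F_a measurable.
K = np.clip(M_a, -1.0, 1.0)

N = K * M_incr               # stochastic integral of H with respect to M over [0, b]
bracket_N = K**2 * P_incr    # integral of H^2 d[M]; [M] increases by 1 at each jump

lhs = np.mean(N**2)          # estimate of E (int H dM)^2
rhs = np.mean(bracket_N)     # estimate of E int H^2 d[M]
print(lhs, rhs)
```

Both estimates should agree up to Monte Carlo error, and the sample mean of `N` should be close to zero, consistent with the stochastic integral being a martingale.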
17.3 Itô’s formula
We will first prove Itô's formula for a special case, namely, we suppose $X_t = M_t + A_t$, where $M_t$ is a square integrable martingale and $A_t$ is a process of bounded variation whose total variation is integrable. The extension to semimartingales without the integrability conditions will be done later in the chapter (in Section 17.5) and is easy. Define $\langle X^c\rangle_t$ to be $\langle M^c\rangle_t$.
Theorem 17.5 Suppose $X_t = M_t + A_t$, where $M_t$ is a square integrable martingale and $A_t$ is a process with paths of bounded variation whose total variation is integrable. Suppose $f$ is $C^2$ on $\mathbb{R}$ with bounded first and second derivatives. Then
$$f(X_t) = f(X_0) + \int_0^t f'(X_{s-})\, dX_s + \tfrac12 \int_0^t f''(X_{s-})\, d\langle X^c\rangle_s + \sum_{s\le t} [f(X_s) - f(X_{s-}) - f'(X_{s-})\Delta X_s]. \tag{17.9}$$
Proof The proof will be given in several steps. Set
$$S(t) = \int_0^t f'(X_{s-})\, dX_s, \qquad Q(t) = \tfrac12 \int_0^t f''(X_{s-})\, d\langle X^c\rangle_s,$$
and
$$J(t) = \sum_{s\le t} [f(X_s) - f(X_{s-}) - f'(X_{s-})\Delta X_s].$$
We use these letters as mnemonics for "stochastic integral term," "quadratic variation term," and "jump term," respectively.
Step 1. Suppose $X_t$ has a single jump at time $T$ which is either a predictable stopping time or a totally inaccessible stopping time and there exists $N > 0$ such that $|\Delta M_T| + |\Delta A_T| \le N$ a.s.

If $T$ is totally inaccessible, let $C_t = \Delta M_T 1_{(t\ge T)}$ and let $\tilde{C}_t$ be the compensator. If we replace $M_t$ by $M_t - C_t + \tilde{C}_t$ and $A_t$ by $A_t + C_t - \tilde{C}_t$, we may assume that $M_t$ is continuous. If $T$ is a predictable stopping time, replace $M_t$ by $M_t - \Delta M_T 1_{(t\ge T)}$ and $A_t$ by $A_t + \Delta M_T 1_{(t\ge T)}$, and again we may assume $M$ is continuous.

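Let $B_t = \Delta X_T 1_{(t\ge T)}$. Set $\hat{X}_t = X_t - B_t$ and $\hat{A}_t = A_t - B_t$. Then $\hat{X}_t = M_t + \hat{A}_t$ and $\hat{X}_t$ is a continuous process that agrees with $X_t$ up to but not including time $T$. We have $\hat{X}_{s-} = \hat{X}_s$ and $\Delta\hat{X}_s = 0$ if $s \le T$. By Theorem 11.1

$$\begin{aligned}
f(\hat{X}_t) &= f(\hat{X}_0) + \int_0^t f'(\hat{X}_s)\, d\hat{X}_s + \tfrac12 \int_0^t f''(\hat{X}_s)\, d\langle M\rangle_s \\
&= f(\hat{X}_0) + \int_0^t f'(\hat{X}_{s-})\, d\hat{X}_s + \tfrac12 \int_0^t f''(\hat{X}_{s-})\, d\langle \hat{X}^c\rangle_s \\
&\qquad + \sum_{s\le t} [f(\hat{X}_s) - f(\hat{X}_{s-}) - f'(\hat{X}_{s-})\Delta\hat{X}_s],
\end{aligned}$$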


since the sum on the last line is zero. For $t < T$, $\hat{X}_t$ agrees with $X_t$. At time $T$, $f(X_t)$ has a jump of size $f(X_T) - f(X_{T-})$. The integral with respect to $X$, $S(t)$, will jump $f'(X_{T-})\Delta X_T$, $Q(t)$ does not jump at all, and $J(t)$ jumps $f(X_T) - f(X_{T-}) - f'(X_{T-})\Delta X_T$. Therefore both sides of (17.9) jump the same amount at time $T$, and hence in this case we have (17.9) holding for $t \le T$.
Step 2. Suppose there exist times $T_1 < T_2 < \cdots$ with $T_n \to \infty$, each $T_i$ is either a totally inaccessible stopping time or a predictable stopping time, for each $i$ there exists $N_i > 0$ such that $|\Delta M_{T_i}|$ and $|\Delta A_{T_i}|$ are bounded by $N_i$, and $X_t$ is continuous except at the times $T_1, T_2, \ldots$ Let $T_0 = 0$.

Fix i for the moment. Define X ′t = X(t−Ti)+ , define A′t and M ′t similarly, and apply Step 1 to

X ′ at time Ti + t. We have for Ti ≤ t ≤ Ti+1

f (Xt ) = f (XTi ) +

∫ t

Ti

f ′(Xs−) dXs + 12

∫ t

Ti

f ′′(Xs−) d〈X c〉s

+

∑

Ti~~ 0 such that |�MSi |+ |�ASi | ≤ Ni. Moreover each Si is either~~

a predictable stopping time or a totally inaccessible stopping time. Let M be decomposed

into $M^c$ and $M_i$ as in Theorem 17.3 and let

$$A^c_t = A_t - \sum_{i=1}^\infty \Delta A_{S_i} 1_{(t\ge S_i)}.$$

Since $A_t$ is of bounded variation, then $A^c$ will be finite and continuous. Define

$$M^n_t = M^c_t + \sum_{i=1}^n M_i(t)$$

and

$$A^n_t = A^c_t + \sum_{i=1}^n \Delta A_{S_i} 1_{(t\ge S_i)},$$

and let $X^n_t = M^n_t + A^n_t$. We already know that $M^n$ converges uniformly over $t \ge 0$ to $M$ in $L^2$. If we let $B^n_t = \sum_{i=1}^n (\Delta A_{S_i})^+ 1_{(t\ge S_i)}$ and $C^n_t = \sum_{i=1}^n (\Delta A_{S_i})^- 1_{(t\ge S_i)}$ and let $B_t = \sup_n B^n_t$, $C_t = \sup_n C^n_t$, then the fact that $A$ has paths of bounded variation implies that with probability one, $B^n_t \to B_t$ and $C^n_t \to C_t$ uniformly over $t \ge 0$ and $A_t = B_t - C_t$. In particular, we have convergence in total variation norm:

$$E\int_0^\infty |d(A^n_t - A_t)| \to 0.$$

We define $S_n(t)$, $Q_n(t)$, and $J_n(t)$ analogously to $S(t)$, $Q(t)$, and $J(t)$, respectively. By applying Step 2 to $X^n$, we have

$$f(X^n_t) = f(X^n_0) + S_n(t) + Q_n(t) + J_n(t),$$

and we need to show convergence of each term. We now examine the various terms.

Uniformly in $t$, $X^n_t$ converges to $X_t$ in probability, that is,

$$P\Big(\sup_{t\ge 0} |X^n_t - X_t| > \varepsilon\Big) \to 0$$

as $n \to \infty$ for each $\varepsilon > 0$. Since $\int_0^t d\langle M^c\rangle_s < \infty$, by dominated convergence
$$\int_0^t f''(X^n_{s-})\, d\langle M^c\rangle_s \to \int_0^t f''(X_{s-})\, d\langle M^c\rangle_s$$
in probability. Therefore $Q_n(t) \to Q(t)$ in probability. Also, $f(X^n_t) \to f(X_t)$ and $f(X^n_0) \to f(X_0)$, both in probability.
We now show $S_n(t) \to S(t)$. Write
$$\begin{aligned}
\int_0^t f'(X^n_{s-})\, dA^n_s - \int_0^t f'(X_{s-})\, dA_s
&= \Big[\int_0^t f'(X^n_{s-})\, dA^n_s - \int_0^t f'(X^n_{s-})\, dA_s\Big] \\
&\qquad + \Big[\int_0^t f'(X^n_{s-})\, dA_s - \int_0^t f'(X_{s-})\, dA_s\Big] \\
&= I^n_1 + I^n_2.
\end{aligned}$$
We see that
$$|I^n_1| \le \|f'\|_\infty \int_0^t |dA^n_s - dA_s| \to 0$$
as $n \to \infty$, while by dominated convergence, $|I^n_2|$ also tends to 0.
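We next look at the stochastic integral part of $S_n(t)$.
$$\begin{aligned}
\int_0^t f'(X^n_{s-})\, dM^n_s - \int_0^t f'(X_{s-})\, dM_s
&= \Big[\int_0^t f'(X^n_{s-})\, dM^n_s - \int_0^t f'(X_{s-})\, dM^n_s\Big] \\
&\qquad + \Big[\int_0^t f'(X_{s-})\, dM^n_s - \int_0^t f'(X_{s-})\, dM_s\Big] \\
&= I^n_3 + I^n_4.
\end{aligned}$$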
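The square of the $L^2$ norm of $I^n_3$ is bounded by
$$E\int_0^t |f'(X^n_{s-}) - f'(X_{s-})|^2\, d[M^n]_s \le E\int_0^t |f'(X^n_{s-}) - f'(X_{s-})|^2\, d[M]_s,$$
which goes to zero by dominated convergence. Also, since $M - M^n = \sum_{i=n+1}^\infty M_i$,
$$I^n_4 = -\int_0^t f'(X_{s-}) \sum_{i=n+1}^\infty dM_i(s),$$
so using the orthogonality lemma (Lemma 17.2), the square of the $L^2$ norm of $I^n_4$ is less than
$$\|f'\|_\infty^2 \sum_{i=n+1}^\infty E[M_i]_\infty \le \|f'\|_\infty^2 \sum_{i=n+1}^\infty E M_i(\infty)^2,$$
which goes to zero as $n \to \infty$.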
Finally, we look at the convergence of Jn. The idea here is to break both J (t) and Jn(t)
into two parts, the jumps that might be relatively large (jumps at times Si for i ≤ N where N
will be chosen appropriately) and the remaining jumps. Let N > 1 be chosen later.

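$$\begin{aligned}
J(t) - J_n(t) &= \sum_{s\le t} [f(X_s) - f(X_{s-}) - f'(X_{s-})\Delta X_s] \\
&\qquad - \sum_{s\le t} [f(X^n_s) - f(X^n_{s-}) - f'(X^n_{s-})\Delta X^n_s] \\
&= \sum_{\{i : S_i\le t\}} [f(X_{S_i}) - f(X_{S_i-}) - f'(X_{S_i-})\Delta X_{S_i}] \\
&\qquad - \sum_{\{i : S_i\le t\}} [f(X^n_{S_i}) - f(X^n_{S_i-}) - f'(X^n_{S_i-})\Delta X^n_{S_i}] \\
&= \sum_{\{i>N : S_i\le t\}} [f(X_{S_i}) - f(X_{S_i-}) - f'(X_{S_i-})\Delta X_{S_i}] \\
&\qquad - \sum_{\{i>N : S_i\le t\}} [f(X^n_{S_i}) - f(X^n_{S_i-}) - f'(X^n_{S_i-})\Delta X^n_{S_i}] \\
&\qquad + \sum_{\{i\le N,\, S_i\le t\}} \Big\{ [f(X_{S_i}) - f(X_{S_i-}) - f'(X_{S_i-})\Delta X_{S_i}] \\
&\qquad\qquad - [f(X^n_{S_i}) - f(X^n_{S_i-}) - f'(X^n_{S_i-})\Delta X^n_{S_i}] \Big\} \\
&= I^N_5 - I^{n,N}_6 + I^{n,N}_7.
\end{aligned}$$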

By the fact that $M$ and $A$ are right continuous with left limits, $|\Delta M_{S_i}| \le 1/2$ and $|\Delta A_{S_i}| \le 1/2$ if $i$ is large enough (depending on $\omega$), and then $|\Delta X_{S_i}| \le 1$, and also

$$|\Delta X_{S_i}|^2 \le 2|\Delta M_{S_i}|^2 + 2|\Delta A_{S_i}|^2 \le 2|\Delta M_{S_i}|^2 + |\Delta A_{S_i}|.$$


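We have

$$|I^N_5| \le \|f''\|_\infty \sum_{i>N,\, S_i\le t} (\Delta X_{S_i})^2$$

and

$$|I^{n,N}_6| \le \|f''\|_\infty \sum_{n\ge i>N,\, S_i\le t} (\Delta X_{S_i})^2.$$

Since $\sum_{i=1}^\infty |\Delta M_{S_i}|^2 \le [M]_\infty < \infty$ and $\sum_{i=1}^\infty |\Delta A_{S_i}| < \infty$, then given $\varepsilon > 0$, we can choose $N$ large such that

$$P(|I^N_5| + |I^{n,N}_6| > \varepsilon) < \varepsilon.$$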
Once we choose N , we then see that In,N7 tends to zero in probability as n → ∞, since X nt
converges in probability to Xt uniformly over t ≥ 0. We conclude that Jn(t) converges to
J (t) in probability as n → ∞.
This completes the proof.
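For a process with no martingale part the formula (17.9) can be verified path by path. The Python sketch below assumes $X_t = x_0 + bt + \sum_{S_i \le t} \xi_i$, a linear drift plus finitely many jumps at the times of a Poisson process, and takes $f = \sin$; here $Q(t) = 0$, $S(t)$ consists of a drift integral plus $\sum f'(X_{s-})\Delta X_s$, and $J(t)$ is the jump sum. The process, the parameters, and the trapezoid-rule integration are assumptions chosen for the illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
f, fprime = np.sin, np.cos       # f is C^2 with bounded first and second derivatives

x0, drift, rate, t_end = 0.3, 0.7, 2.0, 5.0   # illustrative values

# Jump times of a Poisson process with the given rate on [0, t_end], and jump sizes.
n_jumps = rng.poisson(rate * t_end)
jump_times = np.sort(rng.uniform(0.0, t_end, size=n_jumps))
jump_sizes = rng.normal(0.0, 1.0, size=n_jumps)

S = 0.0   # integral of f'(X_{s-}) dX_s
J = 0.0   # sum of f(X_s) - f(X_{s-}) - f'(X_{s-}) * (jump of X at s)
x, t_prev = x0, 0.0
# A final pseudo-jump of size 0 at t_end closes the last interval.
for T, dx in zip(np.append(jump_times, t_end), np.append(jump_sizes, 0.0)):
    # Between jumps X moves linearly; integrate f'(X_s) * drift ds by the trapezoid rule.
    grid = np.linspace(t_prev, T, 2001)
    vals = fprime(x + drift * (grid - t_prev)) * drift
    S += np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0
    x_left = x + drift * (T - t_prev)        # X_{T-}
    S += fprime(x_left) * dx                 # jump contribution to the dX integral
    J += f(x_left + dx) - f(x_left) - fprime(x_left) * dx
    x, t_prev = x_left + dx, T

lhs = f(x)                 # f(X_t) at t = t_end
rhs = f(x0) + S + J        # Q(t) = 0 because X has no continuous martingale part
print(lhs, rhs)
```

The two printed numbers should agree up to the quadrature error. The cancellation of the $f'(X_{s-})\Delta X_s$ terms between $S$ and $J$ is the same mechanism used in Step 1 of the proof.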
17.4 The reduction theorem
Let M be a process adapted to {Ft}. If there exist stopping times Tn increasing to ∞ such
that each process Mt∧Tn is a uniformly integrable martingale, we say M is a local martingale.
If each Mt∧Tn is a square integrable martingale, we say M is a locally square integrable
martingale. We say a stopping time T reduces a process M if Mt∧T is a uniformly integrable
martingale.
Lemma 17.6 (1) The sum of two local martingales is a local martingale.
(2) If S and T both reduce M, then so does S ∨ T .
(3) If there exist times Tn → ∞ such that Mt∧Tn is a local martingale for each n, then M
is a local martingale.
Proof (1) If the sequence Sn reduces M and the sequence Tn reduces N , then Sn ∧ Tn will
reduce M + N .
(2) Mt∧(S∨T ) is bounded in absolute value by |Mt∧T | + |Mt∧S|. Both {|Mt∧T |} and {|Mt∧S|}
are uniformly integrable families of random variables. Now use Proposition A.17.
(3) Let $S_{nm}$ be a family of stopping times reducing $M_{t\wedge T_n}$ and let $S'_{nm} = S_{nm} \wedge T_n$. Renumber
the stopping times into a single sequence R1, R2, . . . and let Hk = R1∨· · ·∨Rk . Note Hk ↑ ∞.
To show that Hk reduces M , we need to show that Ri reduces M and use (2). But Ri = S′nm
for some m, n, so Mt∧Ri = Mt∧Snm∧Tn is a uniformly integrable martingale.
Let M be a local martingale with M0 = 0. We say that a stopping time T strongly reduces
M if T reduces M and the martingale E [ |MT | | Fs] is bounded on [0, T ), that is, there exists
K > 0 such that

sup

0≤s

The first term is bounded since T strongly reduces M . For the second term, if t < T ,
1(t

a.s. Observe that MT is the Radon–Nikodym derivative of Q with respect to P on FT .

Let $L_t$ be the local martingale defined by

$$L_t = \int_0^t \frac{1}{M_{s-}}\, dM_s,$$

so that

$$dM_t = M_{t-}\, dL_t,$$

or $M$ is the exponential of $L$.

Theorem 17.14 Suppose $X$ is a local martingale with respect to $P$. Then $X_t - D_t$ is a local martingale with respect to $Q$, where

$$D_t = \int_0^t \frac{1}{M_s}\, d[X, M]_s = \int_0^t \frac{M_{s-}}{M_s}\, d[X, L]_s.$$

Note that in the formula for $D$, we are using a Lebesgue–Stieltjes integral.

Proof Exercise 17.6 tells us that it suffices to show that $M_t(X_t - D_t)$ is a local martingale with respect to $P$. By Corollary 17.12,

$$d(M(X - D))_t = (X - D)_{t-}\, dM_t + M_{t-}\, dX_t - M_{t-}\, dD_t + d[M, X - D]_t.$$


The first two terms on the right are local martingales with respect to $P$. Since $D$ is of bounded variation, the continuous part of $D$ is zero, hence

$$[M, D]_t = \sum_{s\le t} \Delta M_s \Delta D_s = \int_0^t \Delta M_s\, dD_s.$$

Thus

$$M_t(X_t - D_t) = \text{local martingale} + [M, X]_t - \int_0^t M_s\, dD_s.$$

Using the definition of $D$ shows that $M_t(X_t - D_t)$ is a local martingale.

Exercises

17.1 Suppose $a(t)$ is a deterministic right-continuous nondecreasing function of $t$ with $a(0) = 0$. Prove the following formulas:

$$a(t)^2 = \int_0^t [(a(t) - a(s)) + (a(t) - a(s-))]\, da(s), \tag{17.14}$$

and

$$a(t)^2 = \int_0^t (2a(s-) + \Delta a(s))\, da(s) = 2\int_0^t a(s-)\, da(s) + \sum_{s\le t} (\Delta a(s))^2. \tag{17.15}$$

Hint: First do the case where a has only finitely many discontinuities.

17.2 If At is an increasing process and Ãt is its compensator, show that Ã jumps only when A does.

17.3 Let $P^j_t$, $j \in \mathbb{Z}$, be independent Poisson processes with parameter $\lambda_j$. Suppose $\lambda_j = \lambda_{-j}$ for each $j \ne 0$. Suppose $\lambda_j$ decreases as $j$ increases for $j \ge 1$. Let

$$X_t = \sum_{j\in\mathbb{Z}} P^j_t.$$

Determine reasonable conditions on the sequence λ j so that X is a semimartingale. A local

martingale. A martingale. A locally square integrable martingale.

17.4 Show that if f (t) is a purely discontinuous function, then e f (t) is also.

17.5 Suppose M is a non-negative right-continuous martingale and T = inf{t > 0 : Mt = 0}. Show

that Mt = 0 on (t > T ).

17.6 Suppose P and Q are two equivalent probability measures, M∞ is the Radon–Nikodym derivative

of Q with respect to P, and Mt = E [M∞ | Ft ]. Show that Yt is a local martingale with respect

to Q if and only if YtMt is a local martingale with respect to P.

17.7 Suppose Xt is an increasing process with paths that are right continuous with left limits, X0 = 0,

a.s., X is purely discontinuous, and all jumps are of size +1 only. Suppose Xt − t is a martingale.

Prove that X is a Poisson process.

Hint: Imitate the proof of Theorem 12.1. When using Itô’s formula, it is important to use the

fact that �Xt is always 0 or 1.


17.8 Suppose Xt is an increasing process with paths that are right continuous with left limits, X0 = 0,

a.s., X is purely discontinuous, and all jumps are of size +1 only. Suppose limt→∞ Xt = ∞, a.s.

Prove that X is a time change of a Poisson process.

17.9 Suppose Pt is a Poisson process with parameter λ, {Ft} is the minimal augmented filtration for

P, and Mt = Pt − λt. Suppose Y is a F1 measurable random variable with finite mean and

variance. Prove that there exists a predictable process H such that

Y = EY +

∫ 1

0

Hs dMs.

17.10 Let P1 and P2 be two independent Poisson processes with the same parameter. Let Xt = P1t −P2t

and let {Ft} be the minimal augmented filtration for X . Find a bounded mean zero random

variable Y that is F1 measurable which does not satisfy

Y =

∫ 1

0

Hs dXs

for any predictable process H .

18

Poisson point processes

Poisson point processes are random measures that are related to Poisson processes. We will

use them when we study Lévy processes in Chapter 42. Poisson point processes are also

useful in the study of excursions, even excursions of a continuous process such as Brownian

motion (see Chapter 27), and they arise when studying stochastic differential equations with

jumps.

Let $S$ be a metric space, $\mathcal{G}$ the collection of Borel subsets of $S$, and $\lambda$ a measure on $(S, \mathcal{G})$.

Definition 18.1 We say a map

$$N : \Omega \times [0, \infty) \times \mathcal{G} \to \{0, 1, 2, \ldots\}$$

(writing $N_t(A)$ for $N(\omega, t, A)$) is a Poisson point process if

(1) for each Borel subset $A$ of $S$ with $\lambda(A) < \infty$, the process $N_t(A)$ is a Poisson process with parameter $\lambda(A)$, and
(2) for each $t$ and $\omega$, $N(t, \cdot)$ is a measure on $\mathcal{G}$.
A model to keep in mind is where $S = \mathbb{R}$ and $\lambda$ is Lebesgue measure. For each $\omega$ there is a collection of points $\{(s, z)\}$ (where the collection depends on $\omega$). The number of points in this collection with $s \le t$ and $z$ in a subset $A$ is $N_t(A)(\omega)$. Since $\lambda(\mathbb{R}) = \infty$, there are infinitely many points in every time interval.
A consequence of the definition is that since $\lambda(\emptyset) = 0$, then $N_t(\emptyset)$ is a Poisson process with parameter 0; in other words, $N_t(\emptyset)$ is identically zero.
Our main result is that Nt (A) and Nt (B) are independent if A and B are disjoint.
Theorem 18.2 Let {Ft} be a filtration satisfying the usual conditions. Let S be a metric
space furnished with a positive measure λ. Suppose that Nt (A) is a Poisson point process
with respect to the measure λ. If A1, . . . , An are pairwise disjoint measurable subsets of
S with λ(Ak ) < ∞ for k = 1, . . . , n, then the processes Nt (A1), . . . , Nt (An) are mutually
independent.
Proof We first make the observation that because $N(t, \cdot)$ is a measure and the $A_1, A_2, \ldots, A_n$ are disjoint, then $\sum_{k=1}^n N_t(A_k) = N_t(\cup_{k=1}^n A_k)$ is a Poisson process with finite parameter. A Poisson process has jumps of size one only, hence no two of the $N_t(A_k)$ have jumps at the same time.
To prove the theorem, it suffices to let 0 = r0 < r1 < · · · < rm and show that the random
variables
{Nrj (Ak ) − Nrj−1 (Ak ) : 1 ≤ j ≤ m, 1 ≤ k ≤ n}
are independent. Since for each $j$ and each $k$, $N_{r_j}(A_k) - N_{r_{j-1}}(A_k)$ is independent of $\mathcal{F}_{r_{j-1}}$, it suffices to show that for each $j \le m$, the random variables
$$\{N_{r_j}(A_k) - N_{r_{j-1}}(A_k) : 1 \le k \le n\}$$
are independent. We will do the case $j = m = 1$ and write $r$ for $r_j$ for simplicity; the case when $j, m > 1$ differs only in notation.

We will prove this using induction. We start with the case $n = 2$ and show the independence of $N_r(A_1)$ and $N_r(A_2)$. Each $N_t(A_k)$ is a Poisson process, and so $N_t(A_k)$ has moments of all orders. Let $u_1, u_2 \in \mathbb{R}$ and set

$$\varphi_k = \lambda(A_k)(e^{iu_k} - 1), \qquad k = 1, 2.$$

Let

$$M^k_t = e^{iu_k N_t(A_k) - t\varphi_k}.$$

We see that $M^k_t$ is a martingale because $E e^{iu_k N_t(A_k)} = e^{t\varphi_k}$, and therefore

$$\begin{aligned}
E[M^k_t \mid \mathcal{F}_s] &= M^k_s\, E[e^{iu_k(N_t(A_k) - N_s(A_k)) - (t-s)\varphi_k} \mid \mathcal{F}_s] \\
&= M^k_s\, e^{-(t-s)\varphi_k}\, E[e^{iu_k(N_t(A_k) - N_s(A_k))}] = M^k_s,
\end{aligned}$$

using the independence and stationarity of the increments of a Poisson process.

Since we have argued that no two of the $N_t(A_k)$ jump at the same time, the same is true for the $M^k_t$ and so $[M^j, M^k]_t = 0$ if $j \ne k$. By the product formula (Corollary 17.12) and Itô's formula (Theorem 17.10)

$$\begin{aligned}
M^k_t &= 1 - \varphi_k \int_0^t e^{iu_k N_{s-}(A_k) - s\varphi_k}\, ds + iu_k \int_0^t e^{iu_k N_{s-}(A_k) - s\varphi_k}\, dN_s(A_k) \\
&\qquad + \sum_{s\le t} e^{iu_k N_{s-}(A_k) - s\varphi_k} [e^{iu_k \Delta N_s(A_k)} - 1 - iu_k \Delta N_s(A_k)] \\
&= 1 - \varphi_k \int_0^t e^{iu_k N_{s-}(A_k) - s\varphi_k}\, ds + \sum_{s\le t} e^{iu_k N_{s-}(A_k) - s\varphi_k} [e^{iu_k \Delta N_s(A_k)} - 1] \\
&= 1 - \tilde{B}^k_t + B^k_t.
\end{aligned}$$

We see therefore that $M^k_t - 1$ is of the form $B^k_t - \tilde{B}^k_t$, where $B^k_t$ is a complex-valued process whose paths are locally of bounded variation, and $\tilde{B}^k_t$ is the compensator of $B^k_t$.

Let $\bar{M}^k_t = M^k_{t\wedge r} - 1$. Since the $M^k_t$ do not jump at the same time, by the orthogonality lemma (Lemma 17.2), $E \bar{M}^1_\infty \bar{M}^2_\infty = 0$, which translates to

$$E M^1_r M^2_r = 1.$$

This implies

$$E\big[e^{i(u_1 N_r(A_1) + u_2 N_r(A_2))}\big] = e^{r\varphi_1} e^{r\varphi_2} = E\big[e^{iu_1 N_r(A_1)}\big]\, E\big[e^{iu_2 N_r(A_2)}\big].$$

Since this holds for all $u_1, u_2$, then $N_r(A_1)$ and $N_r(A_2)$ are independent. We conclude that the processes $N_t(A_1)$ and $N_t(A_2)$ are independent.


To handle the case $n = 3$, we first show that $M^1_t M^2_t$ is a martingale. We write

$$\begin{aligned}
E[M^1_t M^2_t \mid \mathcal{F}_s]
&= M^1_s M^2_s\, e^{-(t-s)(\varphi_1+\varphi_2)}\, E[e^{i(u_1(N_t(A_1) - N_s(A_1)) + u_2(N_t(A_2) - N_s(A_2)))} \mid \mathcal{F}_s] \\
&= M^1_s M^2_s\, e^{-(t-s)(\varphi_1+\varphi_2)}\, E[e^{i(u_1(N_t(A_1) - N_s(A_1)) + u_2(N_t(A_2) - N_s(A_2)))}] \\
&= M^1_s M^2_s,
\end{aligned}$$

using the fact that $N_t(A_1)$ and $N_t(A_2)$ are independent of each other and each have stationary and independent increments.

Note that M3t = eiu3Nt (A3 )−tφ3 has no jumps in common with M1t or M2t . Therefore if

M

3

t = M3t∧r, then

E [M

3

∞(M

1

∞M

2

∞)] = 0,

and as before this leads to

E [M3r (M

1

r M

2

r )] = 1.

As above this implies that Nr(A1), Nr(A2), and Nr(A3) are independent. To prove the general

induction step is similar.
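The conclusion of Theorem 18.2 can be illustrated by simulation. The Python sketch below assumes $S = [0, 1]$ and $\lambda = c \cdot$ Lebesgue measure, and uses a standard construction that is an assumption of the example rather than something taken from the text: the total number of points up to time $t$ is Poisson with mean $t\lambda(S)$, and given that number the marks $z$ are i.i.d. uniform on $S$. The sets $A_1$, $A_2$ and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
c, t, n_samples = 4.0, 1.5, 50_000   # lambda = c * Lebesgue measure on S = [0, 1]

def counts(n_points):
    # Given the number of points in [0, t] x S, the marks z are i.i.d. uniform on S.
    z = rng.uniform(0.0, 1.0, size=n_points)
    in_A1 = np.count_nonzero(z < 0.25)                  # A1 = [0, 0.25)
    in_A2 = np.count_nonzero((z >= 0.5) & (z < 0.9))    # A2 = [0.5, 0.9), disjoint from A1
    return in_A1, in_A2

totals = rng.poisson(c * t, size=n_samples)             # samples of N_t(S)
samples = np.array([counts(n) for n in totals])
n1, n2 = samples[:, 0], samples[:, 1]

mean1, mean2 = n1.mean(), n2.mean()     # near t * lambda(A1) = 1.5 and t * lambda(A2) = 2.4
cov12 = np.cov(n1, n2)[0, 1]            # near 0 for disjoint A1, A2
print(mean1, mean2, cov12)
```

A vanishing covariance is of course weaker than independence; the sketch only checks one visible consequence of the theorem at a single time $t$.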

We will also need the following corollary.

Corollary 18.3 Let Ft and Nt (Ak ) be as in Theorem 18.2. Suppose Yt is a process with paths

that are right continuous with left limits such that Yt − Ys is independent of Fs whenever

s < t and Yt − Ys has the same law as Yt−s for each s < t. Suppose moreover that Y has no
jumps in common with any of the Nt (Ak ). Then the processes Nt (A1), . . . , Nt (An), and Yt are
independent.
Proof The law of $Y_0$ is the same as that of $Y_t - Y_t$, so $Y_0 = 0$, a.s. By the fact that $Y$ has stationary and independent increments,
$$E e^{iuY_{s+t}} = E e^{iuY_s}\, E e^{iu(Y_{s+t} - Y_s)} = E e^{iuY_s}\, E e^{iuY_t},$$
which implies that the characteristic function of $Y$ is of the form $E e^{iuY_t} = e^{t\psi(u)}$ for some function $\psi(u)$.
We fix $u \in \mathbb{R}$ and define
$$M^Y_t = e^{iuY_t - t\psi(u)}.$$
As in the proof of Theorem 18.2, we see that MYt is a martingale. Since M
Y has no jumps in
common with any of the Mkt , if M
Y
t = MYt∧r, we see by Lemma 17.2 that
E [M
Y
∞(M
1
∞ · · · M
n
∞)] = 1,
or
E [MYr M
1
r · · · Mnr ] = 1.
This leads as above to the independence of Y from all the Nt (Ak )’s.
We now turn to stochastic integrals with respect to Poisson point processes. In the same
way that a nondecreasing function on the reals gives rise to a measure, so Nt (A) gives rise
to a random measure μ(dt, dz) on the product σ -field B[0, ∞) × G, where B[0, ∞) is the
Borel σ -field on [0, ∞); μ is determined by
μ([0, t] × A)(ω) = Nt (A)(ω).
Define a nonrandom measure ν on B[0, ∞) × G by ν([0, t] × A) = tλ(A) for A ∈ G. If
λ(A) < ∞, then μ([0, t] × A) − ν([0, t] × A) is the same as a Poisson process minus its
mean, hence is locally a square integrable martingale.
We can define a stochastic integral with respect to the random measure μ − ν as follows.
Suppose $H(\omega, s, z)$ is of the form
$$H(\omega, s, z) = \sum_{i=1}^n K_i(\omega) 1_{(a_i, b_i]}(s) 1_{A_i}(z), \tag{18.1}$$
where for each $i$ the random variable $K_i$ is bounded and $\mathcal{F}_{a_i}$ measurable and $A_i \in \mathcal{G}$ with $\lambda(A_i) < \infty$. For such $H$ we define
$$N_t = \int_0^t \int H(\omega, s, z)\, d(\mu - \nu)(ds, dz) = \sum_{i=1}^n K_i (\mu - \nu)(((a_i, b_i] \cap [0, t]) \times A_i). \tag{18.2}$$
Let us assume without loss of generality that the $A_i$ are disjoint. It is not hard to see (Exercise 18.3) that $N_t$ is a martingale, that $N^c = 0$, and that
$$[N]_t = \int_0^t \int H(\omega, s, z)^2\, \mu(ds, dz). \tag{18.3}$$
Since $\langle N\rangle_t$ must be predictable and all the jumps of $N$ are totally inaccessible, it follows from Proposition 16.30 that $\langle N\rangle_t$ is continuous. Since $[N]_t - \langle N\rangle_t$ is a martingale, we conclude
$$\langle N\rangle_t = \int_0^t \int H(\omega, s, z)^2\, \nu(ds, dz). \tag{18.4}$$
Suppose $H(s, z)$ is a predictable process in the following sense: $H$ is measurable with respect to the $\sigma$-field generated by all processes of the form (18.1). Suppose also that
$$E\int_0^\infty \int_S H(s, z)^2\, \nu(ds, dz) < \infty.$$
Take processes $H^n$ of the form (18.1) converging to $H$ in the space $L^2$ with norm $(E\int_0^\infty \int_S H^2\, d\nu)^{1/2}$. The corresponding $N^n_t = \int_0^t \int H^n(s, z)\, d(\mu - \nu)$ are easily seen to be a Cauchy sequence in $L^2$, and the limit $N_t$ we call the stochastic integral of $H$ with respect to $\mu - \nu$. As in the continuous case, we may prove that $E N_t^2 = E[N]_t = E\langle N\rangle_t$, and it follows from this, (18.3), and (18.4) that
$$[N]_t = \int_0^t \int_S H(s, z)^2\, \mu(ds, dz), \qquad \langle N\rangle_t = \int_0^t \int_S H(s, z)^2\, \nu(ds, dz). \tag{18.5}$$
One may think of the stochastic integral as follows: if $\mu$ gives unit mass to a point at time $t$ with value $z$, then $N_t$ jumps at this time $t$ and the size of the jump is $H(t, z)$.
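The description of the stochastic integral as a sum of jumps of size $H(t, z)$, minus the integral against $\nu$, can be turned into a small simulation. The Python sketch below assumes $S = [0, 1]$, $\lambda = c \cdot$ Lebesgue measure, and the deterministic integrand $H(s, z) = z\cos s$; it checks numerically that $E N_t = 0$ and $E N_t^2 = E[N]_t = \int_0^t\int_S H^2\, d\nu$. The integrand, the parameters, and the sampling construction of the points of $\mu$ are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
c, t, n_samples = 3.0, 2.0, 50_000   # lambda = c * Lebesgue measure on S = [0, 1]

def H(s, z):
    # Deterministic (hence predictable) integrand; an illustrative choice.
    return z * np.cos(s)

# Closed forms for this H: integral of H d(nu) and of H^2 d(nu) over [0, t] x S,
# where nu(ds, dz) = c ds dz.
comp = c * np.sin(t) / 2.0
var_theory = c * (t / 2.0 + np.sin(2.0 * t) / 4.0) / 3.0

N_vals = np.empty(n_samples)
QV_vals = np.empty(n_samples)
for k in range(n_samples):
    n_pts = rng.poisson(c * t)                  # number of points of mu in [0, t] x S
    s = rng.uniform(0.0, t, size=n_pts)         # their times
    z = rng.uniform(0.0, 1.0, size=n_pts)       # their marks
    h = H(s, z)
    N_vals[k] = h.sum() - comp                  # integral of H d(mu - nu)
    QV_vals[k] = (h**2).sum()                   # [N]_t = integral of H^2 d(mu)

mean_N = N_vals.mean()
second_moment = np.mean(N_vals**2)
mean_QV = QV_vals.mean()
print(mean_N, second_moment, mean_QV, var_theory)
```

The last three printed numbers should be close to each other, in line with $E N_t^2 = E[N]_t = E\langle N\rangle_t$.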
Exercises
18.1 Suppose {Ft} is a filtration satisfying the usual conditions and P1t and P2t are Poisson processes
with respect to {Ft} with parameters λ1, λ2, respectively. Suppose P1t + P2t is a Poisson process
with parameter λ1 + λ2. Prove that P1 and P2 are independent processes.
18.2 Suppose {Ft} is a filtration satisfying the usual conditions, Pt is a Poisson process with respect
to {Ft}, and Wt is a Brownian motion with respect to {Ft}. Show that if Wt + Pt has stationary
and independent increments, then P and W are independent processes.
18.3 If H is as in (18.1) and N is defined by (18.2), show that N is a martingale, Nc = 0, and [N]t is
given by (18.3).
18.4 Suppose {As, 0 < s < ∞} is a collection of subsets of S such that λ(As) → ∞ as s → ∞.
Show that Nt (As)/λ(As) converges to t uniformly over finite intervals, where the convergence
is in probability.
18.5 Suppose $\{A_s, 0 < s < \infty\}$ is a collection of subsets of $S$ such that $A_r \subset A_s$ if $r \le s$ and $\lambda(A_s) \to \infty$ as $s \to \infty$. Show that for each $t$,
$$\sup_{u\le t} \Big| \frac{N_u(A_s)}{\lambda(A_s)} - u \Big|$$
tends to zero almost surely as $s \to \infty$.
18.6 Let S be a metric space and λ a σ -finite measure on S. Construct a Poisson point process which
has λ as the corresponding measure.
18.7 Let $P^j_t$, $j = 1, 2, \ldots$ be independent Poisson processes with parameter $\beta_j$. Let $X_t = \sum_{j=1}^\infty a_j P^j_t$, where $a_j$ is a sequence such that $X_t$ is finite, a.s. For $A \subset \mathbb{R} \setminus \{0\}$, define $N_t(A)$ to be the number of times before time $t$ that $X$ has a jump whose size is in $A$:
$$N_t(A) = \sum_{s\le t} 1_A(X_s - X_{s-}).$$
Prove that $N_t$ is a Poisson point process and determine $\lambda$.
19
Framework for Markov processes
It is not uncommon for a Markov process to be defined as a sextuple $(\Omega, \mathcal{F}, \mathcal{F}_t, X_t, \theta_t, P^x)$, and for additional notation (e.g., $\zeta$, $\Delta$, $\mathcal{S}$, $P_t$, $R_\lambda$, etc.) to be introduced rather rapidly. This can be intimidating for the beginner. We will explain all this notation in as gentle a manner as possible. We will consider a Markov process to be a pair $(X_t, P^x)$ (rather than a sextuple), where $X_t$ is a single stochastic process and $\{P^x\}$ is a family of probability measures, one probability measure $P^x$ corresponding to each element $x$ of the state space.
19.1 Introduction
The idea that a Markov process consists of one process and many probabilities is one that takes some getting used to. To explain this, let us first look at an example. Suppose $X_0, X_1, \ldots$ is a Markov chain with stationary transition probabilities with $K$ states: $1, 2, \ldots, K$. Everything we want to know about $X$ can be determined if we know $p(i, j) = P(X_1 = j \mid X_0 = i)$ for each $i$ and $j$ and $\mu(i) = P(X_0 = i)$ for each $i$. We sometimes think of having a different Markov chain for every choice of starting distribution $\mu = (\mu(1), \ldots, \mu(K))$. But instead let us define a new probability space by taking $\Omega'$ to be the collection of all sequences $\omega = (\omega_0, \omega_1, \ldots)$ such that each $\omega_n$ takes one of the values $1, \ldots, K$. Define $X_n(\omega) = \omega_n$. Define $\mathcal{F}_n$ to be the $\sigma$-field generated by $X_0, \ldots, X_n$; this is the same as the $\sigma$-field generated by sets of the form $\{\omega : \omega_0 = a_0, \ldots, \omega_n = a_n\}$, where $a_0, \ldots, a_n \in \{1, 2, \ldots, K\}$. For each $x = 1, 2, \ldots, K$, define a probability measure $P^x$ on $\Omega'$ by
$$P^x(X_0 = x_0, X_1 = x_1, \ldots, X_n = x_n) = 1_{\{x\}}(x_0)\, p(x_0, x_1) \cdots p(x_{n-1}, x_n). \tag{19.1}$$
We have $K$ different probability measures, one for each of $x = 1, 2, \ldots, K$, and we can start with an arbitrary probability distribution $\mu$ if we define $P^\mu(A) = \sum_{i=1}^K P^i(A)\mu(i)$. We have lost no information by this redefinition, and it turns out this works much better when doing technical details.
The value of X0(ω) = ω0 can be any of 1, 2, . . . , K; the notion of starting at x is captured
by Px, not by X0. The probability measure Px is concentrated on those ω’s for which ω0 = x
and Px gives no mass to any other ω.
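The following Python sketch makes the "one process, many probabilities" point concrete for a small chain. The transition matrix is a made-up example, and the states are labelled 0, 1, 2 rather than $1, \ldots, K$ for indexing convenience; `prob_path` and `prob_path_mu` are hypothetical helper names implementing (19.1) and the mixture $P^\mu$.

```python
import numpy as np

# Hypothetical 3-state chain (states labelled 0, 1, 2 here rather than 1, ..., K).
p = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

def prob_path(x, path):
    """P^x(X_0 = path[0], ..., X_n = path[n]) as in (19.1)."""
    prob = 1.0 if path[0] == x else 0.0      # the indicator 1_{x}(x_0)
    for a, b in zip(path[:-1], path[1:]):
        prob *= p[a, b]
    return prob

def prob_path_mu(mu, path):
    """P^mu(A) = sum_i P^i(A) mu(i) for the cylinder set A determined by path."""
    return sum(mu[i] * prob_path(i, path) for i in range(len(mu)))

print(prob_path(0, [0, 1, 2]))                    # p(0,1) * p(1,2)
print(prob_path(1, [0, 1, 2]))                    # P^1 gives no mass to paths with omega_0 = 0
print(prob_path_mu([0.5, 0.5, 0.0], [0, 1, 2]))   # mixture over starting points
```

The same function `prob_path` and the same coordinate process serve every starting point; only the first argument, which selects the measure, changes.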
Let us now look at Brownian motion, and see how this framework plays out there. Let $P$ be a probability measure and let $W_t$ be a one-dimensional Brownian motion with respect to $P$ started at 0. Then $W^x_t = x + W_t$ is a one-dimensional Brownian motion started at $x$. Let $\Omega' = C[0, \infty)$ be the set of continuous functions from $[0, \infty)$ to $\mathbb{R}$, so that each element $\omega$ in $\Omega'$ is a continuous function. (We do not require that $\omega(0) = 0$ or that $\omega(0)$ take any particular value of $x$.) Define
$$X_t(\omega) = \omega(t). \tag{19.2}$$
This will be our process. Let $\mathcal{F}$ be the $\sigma$-field on $\Omega' = C[0, \infty)$ generated by the cylindrical subsets of $C[0, \infty)$; see Definition 1.1. Now define $P^x$ to be the law of $W^x$. This means that $P^x$ is the probability measure on $(\Omega', \mathcal{F})$ defined by
$$P^x(X \in A) = P(W^x \in A), \qquad x \in \mathbb{R},\ A \in \mathcal{F}. \tag{19.3}$$
The probability measure $P^x$ is determined by the fact that if $n \ge 1$, $t_1 \le \cdots \le t_n$, and $B_1, \ldots, B_n$ are Borel subsets of $\mathbb{R}$, then
$$P^x(X_{t_1} \in B_1, \ldots, X_{t_n} \in B_n) = P(W^x_{t_1} \in B_1, \ldots, W^x_{t_n} \in B_n).$$
We call the pair $(X_t, P^x)$, $x \in \mathbb{R}$, $t \ge 0$, a Brownian motion.
19.2 Definition of a Markov process
We want to allow our Markov processes to take values in spaces other than the Euclidean
ones. For now, we take our state space S to be a separable metric space, furnished with the
Borel σ -field. For the beginner, just think of R in place of S .
To define a Markov process, we start with a measurable space $(\Omega, \mathcal{F})$ and suppose we have a filtration $\{\mathcal{F}_t\}$ (not necessarily satisfying the usual conditions).
Definition 19.1 A Markov process $(X_t, P^x)$ is a stochastic process
$$X : [0, \infty) \times \Omega \to \mathcal{S}$$
and a family of probability measures $\{P^x : x \in \mathcal{S}\}$ on $(\Omega, \mathcal{F})$ satisfying the following.
(1) For each $t$, $X_t$ is $\mathcal{F}_t$ measurable.
(2) For each $t$ and each Borel subset $A$ of $\mathcal{S}$, the map $x \to P^x(X_t \in A)$ is Borel measurable.
(3) For each $s, t \ge 0$, each Borel subset $A$ of $\mathcal{S}$, and each $x \in \mathcal{S}$, we have
$$P^x(X_{s+t} \in A \mid \mathcal{F}_s) = P^{X_s}(X_t \in A), \qquad P^x\text{-a.s.} \tag{19.4}$$
Some explanation is definitely in order. Let
$$\varphi(x) = P^x(X_t \in A), \tag{19.5}$$
so that $\varphi$ is a function mapping $\mathcal{S}$ to $\mathbb{R}$. Part of the definition of filtration given in Chapter 1 is that each $\mathcal{F}_t \subset \mathcal{F}$. Since we are requiring $X_t$ to be $\mathcal{F}_t$ measurable, that means $(X_t \in A)$ is in $\mathcal{F}$ and it makes sense to talk about $P^x(X_t \in A)$. Definition 19.1(2) says that the function $\varphi$ is Borel measurable. This is a very mild assumption, and will be satisfied in the examples we look at.
The expression $P^{X_s}(X_t \in A)$ on the right-hand side of (19.4) is a random variable and its value at $\omega \in \Omega$ is defined to be $\varphi(X_s(\omega))$, with $\varphi$ given by (19.5). Note that the randomness in $P^{X_s}(X_t \in A)$ is thus all due to the $X_s$ term and not the $X_t$ term. Definition 19.1(3) can be rephrased as saying that for each $s, t$, each $A$, and each $x$, there is a set $N_{s,t,x,A} \subset \Omega$ that is a null set with respect to $P^x$ and for $\omega \notin N_{s,t,x,A}$, the conditional expectation $P^x(X_{s+t} \in A \mid \mathcal{F}_s)$ is equal to $\varphi(X_s)$.
We have now explained all the terms in the sextuple $(\Omega, \mathcal{F}, \mathcal{F}_t, X_t, \theta_t, P^x)$ except for $\theta_t$. These are called shift operators and are maps from $\Omega \to \Omega$ such that $X_s \circ \theta_t = X_{s+t}$. We defer the precise meaning of the $\theta_t$ and the rationale for them until Section 19.5, where they will appear in a natural way.
In the remainder of the section and in Section 19.3 we define some of the additional notation commonly used for Markov processes. The first one is almost self-explanatory. We use $E^x$ for expectation with respect to $P^x$. As with $P^{X_s}(X_t \in A)$, the notation $E^{X_s} f(X_t)$, where $f$ is bounded and Borel measurable, is to be taken to mean $\psi(X_s)$ with $\psi(y) = E^y f(X_t)$.
If we want to talk about our Markov process started with distribution $\mu$, we define
$$P^\mu(B) = \int P^x(B)\, \mu(dx),$$
and similarly for $E^\mu$; here $\mu$ is a probability on $\mathcal{S}$.
19.3 Transition probabilities
If $\mathcal{B}$ is the Borel $\sigma$-field on a metric space $\mathcal{S}$, a kernel $Q(x, A)$ on $\mathcal{S}$ is a map from $\mathcal{S} \times \mathcal{B} \to \mathbb{R}$ satisfying the following.
(1) For each $x \in \mathcal{S}$, $Q(x, \cdot)$ is a measure on $(\mathcal{S}, \mathcal{B})$.
(2) For each $A \in \mathcal{B}$, the function $x \to Q(x, A)$ is Borel measurable.
The definition of Markov transition probabilities (or simply transition probabilities) is the following.
Definition 19.2 A collection of kernels $\{P_t(x, A); t \ge 0\}$ are Markov transition probabilities for a Markov process $(X_t, P^x)$ if
(1) $P_t(x, \mathcal{S}) = 1$ for each $t \ge 0$ and each $x \in \mathcal{S}$.
(2) For each $x \in \mathcal{S}$, each Borel subset $A$ of $\mathcal{S}$, and each $s, t \ge 0$,
$$P_{t+s}(x, A) = \int_{\mathcal{S}} P_t(y, A)\, P_s(x, dy). \tag{19.6}$$
(3) For each $x \in \mathcal{S}$, each Borel subset $A$ of $\mathcal{S}$, and each $t \ge 0$,
$$P_t(x, A) = P^x(X_t \in A). \tag{19.7}$$
Definition 19.2(3) can be rephrased as saying that for each $x$, the measures $P_t(x, dy)$ and $P^x(X_t \in dy)$ are the same. We define
$$P_t f(x) = \int f(y)\, P_t(x, dy) \tag{19.8}$$
when $f : \mathcal{S} \to \mathbb{R}$ is Borel measurable and either bounded or non-negative.
Lemma 19.3 Suppose $P_t$ are Markov transition probabilities. If $f$ is Borel measurable and either non-negative or bounded, then $P_t f$ is non-negative (respectively, bounded) and Borel measurable and
$$P_t f(x) = E^x f(X_t), \qquad x \in \mathcal{S}. \tag{19.9}$$
Proof Using (19.7) and Definition 19.1(2), the Borel measurability and (19.9) hold when $f$ is the indicator of a set $A$. By linearity they hold for simple functions, and then using monotone convergence they hold for non-negative functions. Using linearity again, we have measurability and (19.9) holding for $f$ bounded and Borel measurable. The non-negativity (respectively, the boundedness) of $P_t f$ follows from (19.9).
The equations (19.6) are known as the Chapman–Kolmogorov equations. They can be rephrased in terms of equality of measures: for each $x$
$$P_{s+t}(x, dz) = \int_{y\in\mathcal{S}} P_t(y, dz)\, P_s(x, dy). \tag{19.10}$$
Multiplying (19.10) by a bounded Borel measurable function $f(z)$ and integrating gives
$$P_{s+t} f(x) = \int P_t f(y)\, P_s(x, dy). \tag{19.11}$$
The right-hand side is the same as $P_s(P_t f)(x)$, so we have
$$P_{s+t} f(x) = P_s P_t f(x), \tag{19.12}$$
i.e., the functions $P_{s+t} f$ and $P_s P_t f$ are the same. The equation (19.12) is known as the semigroup property.
By Lemma 19.3, $P_t$ is a linear operator on the space of bounded Borel measurable functions on $\mathcal{S}$. We can then rephrase (19.12) simply as
$$P_{s+t} = P_s P_t. \tag{19.13}$$
Operators satisfying (19.13) are called a semigroup, and are much studied in functional
analysis. We will show in Chapter 36 how to construct the Markov process corresponding to
a given semigroup Pt . More about semigroups can also be found in Chapter 37.
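The semigroup property (19.12) can be checked numerically for the Brownian transition probabilities, for which $P_t f(x) = E f(x + W_t)$. The Python sketch below evaluates this expectation by Gauss–Hermite quadrature and compares $P_{s+t} f$ with $P_s P_t f$ for $f(y) = \cos(uy)$, where the closed form $P_t f(x) = e^{-u^2 t/2}\cos(ux)$ is available. The quadrature order, the test function, and the helper name `P` are assumptions of the example.

```python
import numpy as np

# Nodes and weights for integrals of the form int e^{-y^2} g(y) dy.
nodes, weights = np.polynomial.hermite.hermgauss(60)

def P(t, f):
    """Return the function x -> P_t f(x) = E f(x + W_t) for Brownian motion."""
    def Ptf(x):
        x = np.asarray(x, dtype=float)
        # E f(x + sqrt(t) Z) = pi^{-1/2} * sum_i w_i f(x + sqrt(2 t) y_i)
        vals = f(x[..., None] + np.sqrt(2.0 * t) * nodes)
        return (weights * vals).sum(axis=-1) / np.sqrt(np.pi)
    return Ptf

u = 1.3
f = lambda y: np.cos(u * y)
s, t = 0.4, 0.7
x = np.array([-1.0, 0.0, 2.5])

lhs = P(s + t, f)(x)               # P_{s+t} f
rhs = P(s, P(t, f))(x)             # P_s (P_t f)
exact = np.exp(-u**2 * (s + t) / 2.0) * np.cos(u * x)
print(lhs, rhs, exact)
```

Because `P(t, f)` returns a function, the composition `P(s, P(t, f))` mirrors the operator notation $P_s P_t f$ directly.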
One more observation about semigroups: if we take expectations in (19.4), we obtain
$$P^x(X_{s+t} \in A) = E^x\big[P^{X_s}(X_t \in A)\big].$$
The left-hand side is $P_{s+t} 1_A(x)$ and the right-hand side is
$$E^x[P_t 1_A(X_s)] = P_s P_t 1_A(x),$$
and so (19.4) encodes the semigroup property.
The resolvent or $\lambda$-potential of a semigroup $P_t$ is defined by
$$R_\lambda f(x) = \int_0^\infty e^{-\lambda t} P_t f(x)\, dt, \qquad \lambda \ge 0,\ x \in \mathcal{S}.$$
This can be recognized as the Laplace transform of $P_t$. By Lemma 19.3 and the Fubini theorem, we see that
$$R_\lambda f(x) = E^x \int_0^\infty e^{-\lambda t} f(X_t)\, dt.$$
Resolvents are useful because they are typically easier to work with than semigroups.
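As a small worked example of the resolvent, take Brownian motion and $f(y) = \cos(uy)$. Then $P_t f(x) = e^{-u^2 t/2}\cos(ux)$, so the Laplace transform gives $R_\lambda f(x) = \cos(ux)/(\lambda + u^2/2)$ for $\lambda > 0$. The Python sketch below checks this numerically, using Gauss–Hermite quadrature for $P_t f$ and Gauss–Laguerre quadrature for the integral in $t$; the quadrature orders, the parameter values, and the helper name `Pt_f` are assumptions of the example.

```python
import numpy as np

y, w = np.polynomial.hermite.hermgauss(60)      # for expectations over a normal variable
tau, v = np.polynomial.laguerre.laggauss(40)    # for integrals against e^{-tau} on [0, inf)

u, lam, x = 1.3, 0.8, 0.5                       # illustrative values, lam > 0
f = lambda z: np.cos(u * z)

def Pt_f(t):
    # P_t f(x) = E f(x + W_t), evaluated by Gauss-Hermite quadrature
    return np.sum(w * f(x + np.sqrt(2.0 * t) * y)) / np.sqrt(np.pi)

# R_lam f(x) = int_0^inf e^{-lam t} P_t f(x) dt
#            = (1 / lam) * int_0^inf e^{-tau} P_{tau / lam} f(x) dtau   (substituting tau = lam t)
resolvent = sum(vk * Pt_f(tk / lam) for tk, vk in zip(tau, v)) / lam
closed_form = np.cos(u * x) / (lam + u**2 / 2.0)
print(resolvent, closed_form)
```

The agreement of the two numbers reflects the fact that the resolvent turns the exponential decay rate $u^2/2$ of $P_t f$ into the algebraic factor $1/(\lambda + u^2/2)$.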
When practitioners of stochastic calculus tire of a martingale, they "stop" it. Markov process theorists are a harsher lot and they "kill" their processes. To be precise, attach an isolated point $\Delta$ to $\mathcal{S}$. Thus one looks at $\hat{\mathcal{S}} = \mathcal{S} \cup \{\Delta\}$, and the topology on $\hat{\mathcal{S}}$ is the one generated by the open sets of $\mathcal{S}$ and $\{\Delta\}$. $\Delta$ is called the cemetery point. All functions on $\mathcal{S}$ are extended to $\hat{\mathcal{S}}$ by defining them to be 0 at $\Delta$. At some random time $\zeta$ the Markov process is killed, which means that $X_t = \Delta$ for all $t \ge \zeta$. The time $\zeta$ is called the lifetime of the Markov process.
19.4 An example
Let us give an example, that of Brownian motion, of course. Let Xt and Px be defined by
(19.2) and (19.3). Define Ft = σ (Xr; r ≤ t). Clearly Definition 19.1(1) holds. Observe that
since, under P, Wt is a mean zero normal random variable with variance t,
\[ P^x(X_t \in A) = P(W^x_t \in A) = P(x + W_t \in A) = \frac{1}{\sqrt{2\pi t}} \int_A e^{-(y-x)^2/2t}\,dy. \tag{19.14} \]
By dominated convergence, x → Px(Xt ∈ A) is continuous, therefore measurable. This
proves Definition 19.1(2). It remains to prove Definition 19.1(3), which is the following
proposition.
Proposition 19.4 Let W be a Brownian motion as defined by Definition 2.1, let W xt = x+Wt,
and let (Xt, Px) be defined by (19.2) and (19.3). If f is bounded and Borel measurable,
\[ E^x[f(X_{t+s}) \mid \mathcal{F}_s] = E^{X_s} f(X_t), \qquad P^x\text{-a.s.} \tag{19.15} \]
Proof We will first prove
\[ E^x[f(X_{t+s}) \mid \mathcal{F}_s] = E^{X_s} f(X_t) \tag{19.16} \]
when f (x) = e^{iux}. Using independent increments and the fact that Wt+s − Ws has the same
law as Wt , we see that under each Px, Xt+s − Xs is independent of Fs and has the same law as
a mean zero normal random variable with variance t. We conclude that
\[ E^x e^{iu(X_{t+s}-X_s)} = e^{-u^2 t/2}; \]
see (A.25). We then write
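\begin{align*}
E^x\big[ e^{iuX_{t+s}} \mid \mathcal{F}_s \big]
  &= E^x\big[ e^{iu(X_{t+s}-X_s)} \mid \mathcal{F}_s \big]\, e^{iuX_s} \\
  &= E^x\big[ e^{iu(X_{t+s}-X_s)} \big]\, e^{iuX_s} \\
  &= e^{-u^2 t/2}\, e^{iuX_s}.
\end{align*}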
On the other hand, for any y,
\[ E^y e^{iuX_t} = E e^{iuW^y_t} = E e^{iuW_t}\, e^{iuy} = e^{-u^2 t/2}\, e^{iuy}. \]
Replacing y by Xs proves (19.16) for this f .
Now suppose that f ∈ C∞ with compact support and let f̂ be the Fourier transform of f .
In (19.16) we replace u by −u, multiply both sides by f̂ (u), and integrate over u ∈ R. Using
the Fourier inversion formula, we then have
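\begin{align*}
E^x[f(X_{t+s}) \mid \mathcal{F}_s]
  &= (2\pi)^{-1} E^x\Big[ \int e^{-iuX_{t+s}} \hat f(u)\,du \,\Big|\, \mathcal{F}_s \Big] \\
  &= (2\pi)^{-1} E^{X_s} \int e^{-iuX_t} \hat f(u)\,du \\
  &= E^{X_s} f(X_t).
\end{align*}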
We used the Fubini theorem several times to interchange expectation and integration; this
is justified because f in C∞ with compact support implies f̂ is in the Schwartz class; see
Section B.2. This proves the proposition for f in C∞ with compact support, and a limit
argument gives it for all bounded and measurable f .
The same proof works for d-dimensional Brownian motion.
Set
\[ P_t(x, A) = P^x(X_t \in A) = P(W_t + x \in A) = \frac{1}{\sqrt{2\pi t}} \int_A e^{-(y-x)^2/2t}\,dy. \tag{19.17} \]
Clearly for each x and t, Pt (x, ·) is a measure with total mass 1. As we mentioned earlier, the
function x → Pt (x, A) is continuous, hence Borel measurable. We will show the Chapman–
Kolmogorov equations. These follow from the next proposition.
Proposition 19.5 If s, t > 0 and x, z ∈ R, then
\[ \int_{y \in \mathbb{R}} \frac{1}{\sqrt{2\pi t}}\, e^{-(y-x)^2/2t}\, \frac{1}{\sqrt{2\pi s}}\, e^{-(z-y)^2/2s}\,dy
   = \frac{1}{\sqrt{2\pi(s+t)}}\, e^{-(z-x)^2/2(s+t)}. \tag{19.18} \]

Proof This is a well-known property of the Gaussian density, but we can derive (19.18)

from Proposition 19.4. Let f be continuous with compact support. Taking expectations in

(19.15),

\[ E^x f(X_{t+s}) = E^x[E^{X_s} f(X_t)], \]
or
\[ P_{t+s} f(x) = P_s P_t f(x). \]

Using Lemma 19.3 and (19.17),
\[ \int f(x)\, \frac{1}{\sqrt{2\pi(s+t)}}\, e^{-(z-x)^2/2(s+t)}\,dx
   = \int f(x) \int \frac{1}{\sqrt{2\pi t}}\, e^{-(y-x)^2/2t}\, \frac{1}{\sqrt{2\pi s}}\, e^{-(z-y)^2/2s}\,dy\,dx. \]

Since this holds for all continuous f with compact support, (19.18) holds for almost every

x. Since both sides of (19.18) are continuous in x, then (19.18) holds for all x.
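The identity (19.18) can also be checked numerically. The following is a minimal sketch, not part of the original text: it approximates the integral over y by a midpoint rule, and the function names, the integration window `half_width`, and the number of nodes `n` are our own illustrative choices.

```python
import math


def gauss_kernel(t, x, y):
    """Brownian transition density from (19.17): exp(-(y-x)^2/2t)/sqrt(2 pi t)."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def chapman_kolmogorov_lhs(s, t, x, z, half_width=12.0, n=20000):
    """Midpoint-rule approximation of the integral over y in (19.18)."""
    # Integrate over a window centred between x and z; for moderate s and t
    # the Gaussian mass outside this window is negligible.
    centre = 0.5 * (x + z)
    a = centre - half_width
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n):
        y = a + (k + 0.5) * h
        total += gauss_kernel(t, x, y) * gauss_kernel(s, y, z)
    return total * h


def chapman_kolmogorov_rhs(s, t, x, z):
    """Right-hand side of (19.18): the density at time s + t."""
    return gauss_kernel(s + t, x, z)


if __name__ == "__main__":
    s, t, x, z = 0.7, 1.3, 0.2, -0.5
    print(chapman_kolmogorov_lhs(s, t, x, z), chapman_kolmogorov_rhs(s, t, x, z))
```

The two printed numbers should agree to many digits; this is a sanity check on the formula, not a substitute for the proof.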


19.5 The canonical process and shift operators

Suppose we have a Markov process (Xt, Px) where Ft = σ (Xs; s ≤ t). Suppose for the

moment that Xt has continuous paths. For this to even make sense, it is necessary that the
set {t → Xt is not continuous} be in F , and then we require this event to be Px-null for
each x. Define Ω̃ to be the set of continuous functions on [0, ∞). If ω̃ ∈ Ω̃, set X̃t = ω̃(t).
Define F̃t = σ (X̃s; s ≤ t) and F̃∞ = ∨t≥0F̃t . Finally define P̃x on (Ω̃, F̃∞) by P̃x(X̃ ∈ ·) =
Px(X ∈ ·). Thus P̃x is specified uniquely by

P̃x(X̃t1 ∈ A1, . . . , X̃tn ∈ An) = Px(Xt1 ∈ A1, . . . , Xtn ∈ An)

for n ≥ 1, A1, . . . , An Borel subsets of S , and t1 < · · · < tn. Clearly there is so far no loss
(or gain) by looking at the Markov process (X̃t, P̃x), which is called the canonical process.
Let us now suppose we are working with the canonical process, and we drop the tildes
everywhere. We define the shift operators θt : Ω → Ω as follows. θt (ω) will be an element
of Ω and therefore is a continuous function from [0, ∞) to S . Define
θt (ω)(s) = ω(t + s).
Then
Xs ◦ θt (ω) = Xs(θt (ω)) = θt (ω)(s) = ω(t + s) = Xt+s(ω).
The shift operator θt takes the path of X and chops off and discards the part of the path before
time t.
We will use expressions like f (Xs) ◦ θt . If we apply this to ω ∈ Ω, then
( f (Xs) ◦ θt )(ω) = f (Xs(θt (ω))) = f (Xs+t (ω)),
or f (Xs) ◦ θt = f (Xs+t ).
If the paths of X are not continuous, but instead only right continuous with left limits,
we can follow exactly the above procedure, except we start with Ω̃ being the collection of
functions from [0, ∞) to S that are right continuous with left limits.
Even if we are not in this canonical setup, from now on we will suppose there exist shift
operators mapping Ω into itself so that
Xs ◦ θt = Xs+t .
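The canonical setup translates directly into code if a path ω is modelled as a function of time. The sketch below is only an illustration under that assumption (real-valued paths, and the names `X`, `theta`, `compose_with_shift` are ours); it shows that composing the coordinate map with the shift gives Xs ◦ θt = Xs+t.

```python
from typing import Callable

# A path omega: [0, infinity) -> S, with S taken to be the real line here.
Path = Callable[[float], float]


def X(s: float) -> Callable[[Path], float]:
    """Coordinate map of the canonical process: X_s(omega) = omega(s)."""
    return lambda omega: omega(s)


def theta(t: float) -> Callable[[Path], Path]:
    """Shift operator: theta_t(omega)(s) = omega(t + s)."""
    return lambda omega: (lambda s: omega(t + s))


def compose_with_shift(Y: Callable[[Path], float], t: float) -> Callable[[Path], float]:
    """Return the random variable Y o theta_t, i.e. omega -> Y(theta_t(omega))."""
    return lambda omega: Y(theta(t)(omega))


if __name__ == "__main__":
    omega = lambda u: u * u - 3.0 * u          # an arbitrary continuous path
    lhs = compose_with_shift(X(2.0), 1.5)(omega)  # (X_2 o theta_1.5)(omega)
    rhs = X(3.5)(omega)                           # X_3.5(omega)
    print(lhs, rhs)
```

The same helper handles expressions such as f (Xs) ◦ θt : pass `lambda w: f(X(s)(w))` as `Y`.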
Exercises
19.1 Suppose (Xt , Px) is a Brownian motion and St = sups≤t Xs. Show that ((Xt , St ), Px) is a Markov
process and determine the transition probabilities.
19.2 Suppose (Xt , Px) is a Brownian motion, f a non-negative, bounded, Borel measurable function,
and At = ∫_0^t f (Xs) ds. Show that ((Xt , At ), Px) is a Markov process.
19.3 Suppose Pt is a Poisson process with parameter λ. Let Ω′ be the collection of functions on [0,∞)
which are right continuous and which have left limits, let F be the σ -field on Ω′ generated by
the cylindrical subsets of Ω′, let Pxt = x + Pt , and let Px be the law of x + P. Show that (Xt , Px)
is a Markov process and determine the transition probabilities.
19.4 Suppose m is a measure on the Borel subsets B of a metric space S. Suppose for each t > 0 there

exist jointly measurable non-negative functions pt : S × S → R such that
\[ \int p_t(x, y)\, m(dy) = 1 \]
for each x and t, and define
\[ P_t(x, A) = \int_A p_t(x, y)\, m(dy). \]
Show that the kernels Pt satisfy the Chapman–Kolmogorov equations if and only if
\[ \int p_s(x, y)\, p_t(y, z)\, m(dy) = p_{s+t}(x, z) \]
for every s, t ≥ 0, every x ∈ S, and m-almost every z.

19.5 The Ornstein–Uhlenbeck process Y started at x is a continuous Gaussian process with
E Yt = e^{−t/2} x and covariance
\[ \mathrm{Cov}(Y_s, Y_t) = e^{-(s+t)/2}\big(e^{s \wedge t} - 1\big). \]

If X is the canonical process and Px is the law of an Ornstein–Uhlenbeck process started at x,

show that (Xt , Px) is a Markov process and determine the transition probabilities.

Notes

For more, see Blumenthal and Getoor (1968).

20

Markov properties

We want to accomplish three things in this chapter. First, we want to talk about what it means

in the Markov process context for a filtration to satisfy the usual conditions. This is now

more complicated than in Chapter 1 because we have more than one probability measure.

Second, we want to extend the Markov property to expressions that are more complicated

than E x[ f (Xs+t ) | F s]. Third, we want to look at the strong Markov property, which means

we look at expressions like E x[ f (XT+t ) | FT ], where T is a stopping time.

Throughout this chapter we assume that X has paths that are right continuous with left

limits. To be more precise, if

N = {ω : the function t → Xt (ω) is not right continuous with left limits},

then we assume N ∈ F and N is Px-null for every x ∈ S .

20.1 Enlarging the filtration

Let us first introduce some notation. Define

F 00t = σ (Xs; s ≤ t), t ≥ 0. (20.1)

This is the smallest σ -field with respect to which each Xs is measurable for s ≤ t. We let

F 0t be the completion of F 00t , but we need to be careful what we mean by completion here,

because we have more than one probability measure present. Let N be the collection of sets

that are Px-null for every x ∈ S . Thus N ∈ N if (Px)∗(N ) = 0 for each x ∈ S , where (Px)∗

is the outer probability corresponding to Px. The outer probability (Px)∗ is defined by

\[ (P^x)^*(A) = \inf\{P^x(B) : A \subset B,\ B \in \mathcal{F}\}. \]

Let

F 0t = σ (F 00t ∪ N ). (20.2)

Finally, let

Ft = F 0t+ = ∩ε>0F 0t+ε. (20.3)

We call {Ft} the minimal augmented filtration generated by X . Ultimately, we will work

only with {Ft}, but we need the other two filtrations at intermediate stages. The reason for

worrying about which filtrations to use is that {F 00t } is too small to include many interesting

sets (such as those arising in the law of the iterated logarithm, for example), while if the

filtration is too large, the Markov property will not hold for that filtration.


The filtration matters when defining a Markov process; see Definition 19.1(3). We will

assume throughout this section that (Xt, Px) is a Markov process with respect to the filtration

{F 00t }, that is,

Px(Xs+t ∈ A | F 00s ) = PXs (Xt ∈ A), Px-a.s. (20.4)

whenever A is a Borel subset of S and s, t ≥ 0.

We will also make the following assumption, which will be needed here and also in

Section 20.3.

Assumption 20.1 Suppose Pt f is continuous on S whenever f is bounded and continuous

on S .

Markov processes satisfying Assumption 20.1 are called Feller processes or weak Feller

processes. If Pt f is continuous whenever f is bounded and Borel measurable, then the

Markov process is said to be a strong Feller process.

We show that we can replace F 00t in (20.4) by F 0t .

Proposition 20.2 Let (Xt, Px) be a Markov process and suppose that (20.4) holds. If A is a

Borel subset of S , x ∈ S , and s, t ≥ 0, then

Px(Xs+t ∈ A | F 0s ) = PXs (Xt ∈ A), Px-a.s. (20.5)

Proof Since the right-hand side is a function of Xs and hence F 0s measurable, we need to

show that if B ∈ F 0s , then

\[ P^x(X_{s+t} \in A,\, B) = E^x\big[ P^{X_s}(X_t \in A);\, B \big]. \tag{20.6} \]

This holds for B ∈ F 00s by (20.4). It holds for sets B ∈ N , the class of null sets, since both

sides are 0. Therefore it holds for sets B such that there exists B1 ∈ F 00s with B △ B1 being

a null set. By linearity it holds for finite disjoint unions of sets of the form just described.

The class of such finite disjoint unions is a monotone class that generates F 0s , and our result

follows by the monotone class theorem, Theorem B.2.

The next step is to go from F 0s to Fs.

Proposition 20.3 Let (Xt, Px) be a Markov process and suppose that (20.4) holds. If
Assumption 20.1 holds and f is a bounded Borel measurable function, then
\[ E^x[f(X_{s+t}) \mid \mathcal{F}_s] = E^{X_s} f(X_t), \qquad P^x\text{-a.s.} \tag{20.7} \]

It will turn out (see Proposition 20.7 below) that F 0s is equal to Fs, but we do not know

this yet.

Proof We start with (20.5). By linearity, we have

\[ E^x[f(X_{s+t}) \mid \mathcal{F}^0_s] = E^{X_s} f(X_t), \qquad P^x\text{-a.s.,} \tag{20.8} \]

when f is a simple random variable, then by monotone convergence when f is non-negative,

and then by linearity again, when f is bounded and Borel measurable. In particular, we have

this when f is bounded and continuous.


If B ∈ Fs = F 0s+, then B ∈ F 0s+ε for every ε > 0. Hence by (20.8) with s replaced by

s + ε, if f is bounded and continuous,

\[ E^x[f(X_{s+t+\varepsilon});\, B] = E^x\big[ E^{X_{s+\varepsilon}} f(X_t);\, B \big]. \tag{20.9} \]

The right-hand side is equal to

\[ E^x[P_t f(X_{s+\varepsilon});\, B]; \]
since Pt f is continuous and Xt has paths that are right continuous with left limits, this
converges to
\[ E^x[P_t f(X_s);\, B] = E^x\big[ E^{X_s} f(X_t);\, B \big] \]
by dominated convergence. The left-hand side of (20.9) converges, using dominated convergence,
the continuity of f , and the fact that X has paths that are right continuous with left
limits, to
\[ E^x[f(X_{s+t});\, B]. \]

We therefore have

\[ E^x[f(X_{s+t});\, B] = E^x\big[ E^{X_s} f(X_t);\, B \big]. \tag{20.10} \]

A limit argument shows this holds whenever f is bounded and measurable. Since B is an

arbitrary event in Fs, that completes the proof.

Remark 20.4 In Chapter 16, we discussed the fact that the first time a right continuous

process whose jump times are totally inaccessible hits a Borel set is a stopping time, provided

the filtration satisfies the usual conditions. Even though the notion of completion of a filtration

is a bit different in the context of Markov processes, the result is still true. See Blumenthal

and Getoor (1968).

20.2 The Markov property

We start with the Markov property given by Proposition 20.3:

\[ E^x[f(X_{s+t}) \mid \mathcal{F}_s] = E^{X_s}[f(X_t)], \qquad P^x\text{-a.s.} \tag{20.11} \]
Since f (Xs+t ) = f (Xt ) ◦ θs, if we write Y for the random variable f (Xt ), we have
\[ E^x[Y \circ \theta_s \mid \mathcal{F}_s] = E^{X_s} Y, \qquad P^x\text{-a.s.} \tag{20.12} \]

We wish to generalize this to other random variables Y .

Proposition 20.5 Let (Xt, Px) be a Markov process and suppose (20.11) holds. Suppose

Y = ∏_{i=1}^{n} f_i(X_{t_i − s}), where the fi are bounded, Borel measurable, and s ≤ t1 ≤ · · · ≤ tn. Then

(20.12) holds.


Proof We will prove this by induction on n. The case n = 1 is (20.11), so we suppose the

equality holds for n and prove it for n + 1.

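Let V = ∏_{j=2}^{n+1} f_j(X_{t_j − t_1}) and h(y) = E^y V . By the induction hypothesis,
\begin{align*}
E^x\Big[ \prod_{j=1}^{n+1} f_j(X_{t_j}) \,\Big|\, \mathcal{F}_s \Big]
  &= E^x\big[ E^x[V \circ \theta_{t_1} \mid \mathcal{F}_{t_1}]\, f_1(X_{t_1}) \mid \mathcal{F}_s \big] \\
  &= E^x\big[ (E^{X_{t_1}} V)\, f_1(X_{t_1}) \mid \mathcal{F}_s \big] \\
  &= E^x[(h f_1)(X_{t_1}) \mid \mathcal{F}_s].
\end{align*}
By (20.11) this is E^{X_s}[(h f1)(X_{t_1 − s})]. For any y,
\begin{align*}
E^y[(h f_1)(X_{t_1-s})]
  &= E^y\big[ (E^{X_{t_1-s}} V)\, f_1(X_{t_1-s}) \big] \\
  &= E^y\big[ E^y[V \circ \theta_{t_1-s} \mid \mathcal{F}_{t_1-s}]\, f_1(X_{t_1-s}) \big] \\
  &= E^y[(V \circ \theta_{t_1-s})\, f_1(X_{t_1-s})].
\end{align*}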

If we replace V by its definition, replace y by Xs, and use the definition of θt1−s, we get the

desired equality for n + 1 and hence the induction step.

We now come to the general version of the Markov property. As usual, F∞ = ∨t≥0Ft .

The expression Y ◦ θt for general Y may seem puzzling at first. We will give some examples

when we get to applications of the strong Markov property in Chapter 21.

Theorem 20.6 Let (Xt, Px) be a Markov process and suppose (20.11) holds. Suppose Y is

bounded and measurable with respect to F∞. Then

E x[Y ◦ θs | Fs] = E XsY, Px-a.s. (20.13)

Proof If in Proposition 20.5 we take f j(x) = 1Aj (x) for Borel measurable Aj, we have

\[ E^x[1_B \circ \theta_s \mid \mathcal{F}_s] = E^{X_s} 1_B \tag{20.14} \]

when B = {ω : ω(t1) ∈ A1, . . . , ω(tn) ∈ An}. It is easy to see that the set of B’s for which

(20.14) holds is a monotone class. By an argument using the monotone class theorem, (20.14)

holds for all B that are measurable with respect to F∞. Taking linear combinations, (20.13)

holds for Y ’s that are simple random variables. Using monotone convergence, (20.13) holds

for non-negative Y ’s, and then by linearity for bounded Y ’s.

Proposition 20.7 Let (Xt, Px) be a Markov process with respect to {Ft}. Let F 0t and Ft be

defined by (20.2) and (20.3). Then Ft = F 0t for each t ≥ 0.

Proof Let Y1 = ∏_{i=1}^{n} f_i(X_{t_i}) and Y2 = ∏_{j=1}^{m} g_j(X_{u_j}), where t1 < · · · < tn ≤ s and
0 ≤ u1 < · · · < um and the fi and gj are bounded Borel measurable functions. Then by
Proposition 20.5,
\[ E^x[(Y_1)(Y_2 \circ \theta_s) \mid \mathcal{F}_s] = Y_1\, E^{X_s} Y_2. \]
Since E^{X_s}Y2 is a function of Xs, then (Y1)(E^{X_s}Y2) is F 0s measurable. Using a monotone class
argument, we conclude that if Y is bounded and F∞ measurable, then E x[Y | Fs] is F 0s
measurable. Now apply this to Y = 1A for A ∈ Fs to obtain that 1A = E x[1A | Fs] is F 0s
measurable.
The following is known as the Blumenthal 0–1 law.
Proposition 20.8 Let (Xt, Px) be a Markov process with respect to {Ft}. If A ∈ F0, then for
each x, Px(A) is equal to 0 or 1.
Proof Suppose A ∈ F0. Under Px, X0 = x, a.s., and then
Px(A) = E X0 1A = E x[1A ◦ θ0 | F0] = 1A ◦ θ0 = 1A ∈ {0, 1}, Px-a.s.
since 1A ◦ θ0 is F0 measurable. Our result follows because Px(A) is a real number and not
random.
20.3 Strong Markov property
Given a stopping time T , recall that the σ -field of events known up to time T is defined to be
\[ \mathcal{F}_T = \big\{ A \in \mathcal{F}_\infty : A \cap (T \le t) \in \mathcal{F}_t \text{ for all } t > 0 \big\}. \]

We define θT by θT (ω)(t) = ω(T (ω) + t). Thus, for example, Xt ◦ θT (ω) = XT (ω)+t (ω) and

XT (ω) = XT (ω)(ω).

Now we can state the strong Markov property. The notation and definition are admittedly

a bit opaque at this stage – be patient until we reach the examples in the next chapter.

Theorem 20.9 Suppose (Xt, Px) is a Markov process with respect to {Ft}, that Assumption
20.1 holds, and that T is a finite stopping time. If Y is bounded and measurable with respect
to F∞, then
\[ E^x[Y \circ \theta_T \mid \mathcal{F}_T] = E^{X_T} Y, \qquad P^x\text{-a.s.} \]

Proof Following the proofs of Section 20.2, it is enough to prove

\[ E^x[f(X_{T+t}) \mid \mathcal{F}_T] = E^{X_T} f(X_t) \tag{20.15} \]

for f bounded. We can obtain this by a limit argument if we have (20.15) for f bounded and

continuous. Define Tn to be equal to (k + 1)/2^n on the event (k/2^n ≤ T < (k + 1)/2^n).
If A ∈ FT , then A ∈ FTn . Therefore A ∩ (Tn = k/2^n) ∈ F_{k/2^n} and we have by the Markov
property, Theorem 20.6,
\begin{align*}
E^x[f(X_{T_n+t});\, A,\, T_n = k/2^n]
  &= E^x[f(X_{t+k/2^n});\, A,\, T_n = k/2^n] \\
  &= E^x[E^{X_{k/2^n}} f(X_t);\, A,\, T_n = k/2^n] \\
  &= E^x[E^{X_{T_n}} f(X_t);\, A,\, T_n = k/2^n].
\end{align*}
Then
\begin{align*}
E^x[f(X_{T_n+t});\, A]
  &= \sum_{k=1}^{\infty} E^x[f(X_{T_n+t});\, A,\, T_n = k/2^n] \\
  &= \sum_{k=1}^{\infty} E^x\big[ E^{X_{T_n}} f(X_t);\, A,\, T_n = k/2^n \big] \\
  &= E^x[E^{X_{T_n}} f(X_t);\, A].
\end{align*}
Now let n → ∞. E x[ f (XTn+t ); A] → E x[ f (XT+t ); A] by dominated convergence and
the continuity of f and the right continuity of Xt . On the other hand, using the continuity of
Pt f , E^{X_{T_n}} f (Xt ) = Pt f (XTn ) → Pt f (XT ) = E^{X_T} f (Xt ). Therefore
\[ E^x[f(X_{T+t});\, A] = E^x[E^{X_T} f(X_t);\, A] \]
for all A ∈ FT , and hence (20.15) holds.
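The dyadic approximation Tn used in the proof is easy to compute for a given value of T . The sketch below is illustrative only (the function name `dyadic_upper` is ours); it returns (k + 1)/2^n on the event k/2^n ≤ T < (k + 1)/2^n, so that Tn takes values in the dyadic rationals, is strictly larger than T , and decreases to T as n grows.

```python
import math


def dyadic_upper(T: float, n: int) -> float:
    """Return T_n = (k + 1)/2^n, where k is the integer with k/2^n <= T < (k + 1)/2^n."""
    k = math.floor(T * 2 ** n)
    return (k + 1) / 2 ** n


if __name__ == "__main__":
    T = 0.3
    for n in range(1, 8):
        # Each T_n lies in (T, T + 2^-n], and the sequence is non-increasing in n.
        print(n, dyadic_upper(T, n))
```

Strict inequality Tn > T is what makes A ∈ FT imply A ∩ (Tn = k/2^n) ∈ F_{k/2^n}, and Tn ↓ T is what allows right continuity of the paths to be used in the limit.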
Recall that we are restricting our attention to Markov processes whose paths are right
continuous with left limits. If we have a Markov process (Xt, Px) whose paths are right
continuous with left limits, which has shift operators {θt}, and which satisfies the conclusion
of Theorem 20.9, whether or not Assumption 20.1 holds, then we say that (Xt, Px) is a strong
Markov process. A strong Markov process is said to be quasi-left continuous if XTn → XT ,
a.s., on {T < ∞} whenever Tn are stopping times increasing up to T . Unlike in the definition
of predictable stopping times given in Chapter 16, we are not requiring the Tn to be strictly less
than T . A Hunt process is a strong Markov process that is quasi-left continuous. Quasi-left
continuity does not imply left continuity; consider the Poisson process.
Proposition 20.10 If (Xt, Px) is a strong Markov process and Assumption 20.1 holds, then
Xt is quasi-left continuous.
Proof First suppose T is bounded, Tn increases to T , Y = limn→∞ XTn , and f and g are
bounded and continuous. If Tn = T for some n, then limn→∞ g(XTn+t ) = g(XT+t ), and if
Tn < T for all n, then limn→∞ g(XTn+t ) = g(X(T+t)−), where Xs− is the left-hand limit at time
s. In either case,
\[ \lim_{t \to 0} \lim_{n \to \infty} g(X_{T_n+t}) = g(X_T). \]
Then
\begin{align*}
E^x[f(Y) g(X_T)]
  &= \lim_{t \to 0} \lim_{n \to \infty} E^x[f(X_{T_n})\, g(X_{T_n+t})] \\
  &= \lim_{t \to 0} \lim_{n \to \infty} E^x[f(X_{T_n})\, P_t g(X_{T_n})] \\
  &= \lim_{t \to 0} E^x[f(Y)\, P_t g(Y)] = E^x[f(Y) g(Y)].
\end{align*}
By a limit argument we have
\[ E^x[h(Y, X_T)] = E^x[h(Y, Y)] \tag{20.16} \]
for all bounded measurable functions h on S × S . Now take h(x, y) to be zero if x = y and
one otherwise. The right-hand side of (20.16) is 0, so the left-hand side is also.
If T is not bounded, apply the argument in the preceding paragraph to the stopping time
T ∧ M , where M is a positive real, and then let M → ∞.
Exercises
20.1 Suppose that S is a locally compact separable metric space and C0 is the set of continuous
functions on S that vanish at infinity. To say a continuous function f vanishes at infinity means
that given ε > 0 there exists a compact set K such that | f (x)| < ε if x /∈ K. Show that if
Assumption 20.1 is replaced by the assumptions that Pt f ∈ C0 whenever f ∈ C0 and Pt f → f
uniformly as t → 0 whenever f ∈ C0, then the conclusion of Theorem 20.9 still holds.
20.2 Suppose (Xt , Px) is a Markov process with respect to a filtration {Ft}. Suppose that Et ⊂ Ft
for each t and that Xt is Et measurable for each t. Show that (Xt , Px) is a Markov process with
respect to the filtration {Et}.
20.3 Give an example of a Markov process that is not a strong Markov process.
Hint: Let the state space be [0,∞) and starting from x ∈ (0,∞), let X move deterministically
at constant speed to the right. Starting at 0, let X wait an exponential length of time, and then
begin moving at constant speed to the right.
20.4 Let (Xt , Px) be Brownian motion and let {Ft} be the minimal augmented filtration. Suppose
B ∈ ∨t≥0Ft and for some s > 0 is of the form 1B = 1A ◦ θs. Show that if B is a Px-null set for

some x, then it is a Px-null set for every x.

20.5 Let Pt be transition probabilities for a Poisson process with parameter λ. These are defined in

Exercise 19.3. Show that Assumption 20.1 holds.

20.6 Suppose (Xt , Px) is a Markov process with transition probabilities Pt , f is a bounded Borel

measurable function, t0 > 0, and we define Mt = Pt0−t f (Xt ) for t ≤ t0. Show that (Mt , t ≤ t0)

is a Px-martingale for each x.

20.7 Use the Blumenthal 0–1 law to show that if W is a one-dimensional Brownian motion and

T = inf{t > 0 : Wt > 0} is the first time Brownian motion hits (0,∞), then P(T = 0) = 1.

20.8 Let A be a Borel subset of a metric space S. Let TA = inf{t : Xt ∈ A}, where (Xt , Px) is a strong

Markov process. Show that Px(TA = 0) is either 0 or 1 for each x.

20.9 Let (Xt , Px) be a strong Markov process and let A be a Borel subset of S. We define Ar by setting

Ar = {x : Px(TA = 0) = 1}, where TA is the first hitting time of A. Thus Ar is the set of points

that are regular for A. Prove that for each x,

Px(XTA ∈ A ∪ Ar) = 1.

21

Applications of the Markov properties

We give some applications of the Markov property and the strong Markov property. In the

first application, we show that d-dimensional Brownian motion is transient if d ≥ 3. Next

we consider estimates on additive functionals. (An example of an additive functional is
At = ∫_0^t f (Xs) ds, where f is a non-negative function on the state space of the Markov
process X .) Third is a sufficient criterion for a Markov process to have continuous paths.

Finally, we discuss harmonic functions and show how to solve the classical Dirichlet problem

of analysis and partial differential equations.

21.1 Recurrence and transience

Let Wt = (W1(t), . . . ,Wd (t)) be a d-dimensional Brownian motion started at 0 with d ≥ 3

and let W^x_t = x + Wt be Brownian motion started at x. Let h(y) = |y|^{2−d}. A direct calculation
of derivatives shows that
\[ \Delta h(x) = \sum_{i=1}^{d} \frac{\partial^2 h}{\partial x_i^2}(x) = 0, \qquad x \ne 0. \]
(Noting that
\[ \frac{\partial}{\partial y_i} |y| = \frac{\partial}{\partial y_i} (y_1^2 + \cdots + y_d^2)^{1/2} = \frac{y_i}{|y|} \]
helps with the calculation.) By Exercise 9.4, 〈Wi,Wj〉t equals 0 if i ≠ j and we saw in Section

9.3 that it equals t if i = j. Suppose r < |x| < R, and let
S = inf{t : |W xt | ≤ r or |W xt | ≥ R}.
S is finite, a.s., because |W xt | ≥ |W1(t)| − |x| and W1(t) exits [−2R, 2R] in finite time by
Theorem 7.2. By Itô’s formula,
\begin{align*}
h(W^x_{t \wedge S}) &= h(W^x_0) + \text{martingale} + \tfrac{1}{2} \int_0^{t \wedge S} \sum_{i=1}^{d} \frac{\partial^2 h}{\partial x_i^2}(W^x_s)\,ds \\
  &= h(x) + \text{martingale}.
\end{align*}
Therefore h(W^x_{t∧S}) − h(x) is a martingale started at 0. The function h is equal to r^{2−d} on
∂B(0, r), the boundary of B(0, r), and equal to R^{2−d} on ∂B(0, R), the boundary of B(0, R).
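The harmonicity of h away from the origin, which is what makes the drift term in Itô's formula vanish, can be sanity-checked with finite differences. This is a rough numerical sketch, not part of the original argument; the step size and the helper names `h` and `laplacian_fd` are our own choices.

```python
import math


def h(y):
    """h(y) = |y|^(2 - d) for a point y in R^d with y != 0; d is len(y)."""
    d = len(y)
    return math.sqrt(sum(c * c for c in y)) ** (2 - d)


def laplacian_fd(func, y, step=1e-3):
    """Second-order central-difference approximation of the Laplacian of func at y."""
    centre = func(y)
    total = 0.0
    for i in range(len(y)):
        up = list(y)
        down = list(y)
        up[i] += step
        down[i] -= step
        total += (func(up) - 2.0 * centre + func(down)) / step ** 2
    return total


if __name__ == "__main__":
    print(laplacian_fd(h, [1.0, 2.0, 2.0]))        # d = 3, close to 0
    print(laplacian_fd(h, [1.0, 1.0, 1.0, 1.0]))   # d = 4, close to 0
```

For comparison, applying `laplacian_fd` to the non-harmonic function y ↦ |y|^2 in three dimensions returns a value close to 6.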
By Corollary 3.17, we deduce
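\begin{align*}
P(W^x_t \text{ hits } B(0, r) \text{ before } B(0, R))
  &= P\big(h(W^x_t) - h(x) \text{ hits } r^{2-d} - |x|^{2-d} \text{ before } R^{2-d} - |x|^{2-d}\big) \\
  &= \frac{|x|^{2-d} - R^{2-d}}{r^{2-d} - R^{2-d}}.
\end{align*}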
If we let R → ∞ and recall that 2 − d < 0, we see that
\[ P(W^x_t \text{ ever hits } \partial B(0, r)) = \Big( \frac{r}{|x|} \Big)^{d-2}. \tag{21.1} \]
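The two hitting probabilities above are simple enough to evaluate directly. The following sketch (function names are ours, and the inputs are assumed to satisfy r < |x| < R and d ≥ 3) computes the two-sided exit probability and its R → ∞ limit (21.1).

```python
def hit_inner_first(x_norm: float, r: float, R: float, d: int) -> float:
    """Probability that |W^x| reaches r before R, for r < |x| < R and d >= 3."""
    p = 2 - d
    return (x_norm ** p - R ** p) / (r ** p - R ** p)


def ever_hit(x_norm: float, r: float, d: int) -> float:
    """Limit as R -> infinity, formula (21.1): (r / |x|)^(d - 2)."""
    return (r / x_norm) ** (d - 2)


if __name__ == "__main__":
    # d = 3, |x| = 2, r = 1: the two-sided probability increases to 1/2 as R grows.
    for R in (4.0, 40.0, 4000.0):
        print(R, hit_inner_first(2.0, 1.0, R, 3))
    print(ever_hit(2.0, 1.0, 3))
```

Because the limit is strictly less than 1 when |x| > r, there is positive probability of never returning to the ball, which is the starting point for the transience argument.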
We want to use the strong Markov property to go from (21.1) to
\[ \lim_{t \to \infty} |W^x_t| = \infty. \]
(There are other ways besides the strong Markov property of showing this.) The first step in
doing this is to convert to the Markov process notation. Let (Xt, Px) be a Brownian motion.
What we have shown is that
\[ P^x(X_t \text{ ever hits } \partial B(0, r)) = \Big( \frac{r}{|x|} \Big)^{d-2}. \tag{21.2} \]
Let M > 0 and let

S1 = inf{t : |Xt | ≥ 2M},

T1 = inf{t > S1 : |Xt | ≤ M},

S2 = inf{t > T1 : |Xt | ≥ 2M},

T2 = inf{t > S2 : |Xt | ≤ M},

and so on. Another way of writing this is to define

S = inf{t > 0 : |Xt | ≥ 2M}, T = inf{t > 0 : |Xt | ≤ M},

and then to let S1 = S, and for each i ≥ 1,

Ti = Si + T ◦ θSi, Si+1 = Ti + S ◦ θTi .

Let us explain what is going on. Given a path ω, which is a continuous function from [0, ∞)

to Rd , T ◦ θSi means to proceed along the path until time Si, disregard this piece, and then

see how long it takes after time Si to first enter B(0, M ). If we add the quantity Si to T ◦ θSi ,

we then get the amount of time for Xt to first enter B(0, M ) after time Si. Thus Ti with the

shift notation is the same as inf{t > Si : Xt ∈ B(0, M )}. The shift notation interpretation of

Si+1 is similar.
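To see the shift notation at work, here is a discrete-time sketch in which a path is a finite list of real values and θt simply drops the first t entries. This is only an analogue of the continuous-time definitions (the names `first_time`, `shift`, and the two `crossing_times_*` functions are ours, and S1 is taken in the "t > 0" form); it checks that the recursive definition with shifts agrees with the direct definition of the Si and Ti.

```python
import math


def first_time(path, condition, after=0):
    """Smallest index k > after with condition(path[k]); math.inf if there is none."""
    for k in range(after + 1, len(path)):
        if condition(path[k]):
            return k
    return math.inf


def shift(path, t):
    """Discrete analogue of theta_t: discard the part of the path before index t."""
    return path[t:]


def crossing_times_shift(path, M, count):
    """(S_i, T_i) via T_i = S_i + T o theta_{S_i} and S_{i+1} = T_i + S o theta_{T_i}."""
    S = lambda p: first_time(p, lambda v: abs(v) >= 2 * M)
    T = lambda p: first_time(p, lambda v: abs(v) <= M)
    times = []
    s = S(path)
    while len(times) < count and s != math.inf:
        t = s + T(shift(path, s))
        times.append((s, t))
        if t == math.inf:
            break
        s = t + S(shift(path, t))
    return times


def crossing_times_direct(path, M, count):
    """(S_i, T_i) via T_i = inf{k > S_i : |x_k| <= M}, S_{i+1} = inf{k > T_i : |x_k| >= 2M}."""
    far = lambda v: abs(v) >= 2 * M
    near = lambda v: abs(v) <= M
    times = []
    s = first_time(path, far)
    while len(times) < count and s != math.inf:
        t = first_time(path, near, after=s)
        times.append((s, t))
        if t == math.inf:
            break
        s = first_time(path, far, after=t)
    return times


if __name__ == "__main__":
    path = [0, 1, 3, 2, 0.5, 1, 2.5, 4, 1, 0, 3]
    print(crossing_times_shift(path, 1, 5))
    print(crossing_times_direct(path, 1, 5))
```

On a finite list the last Ti may be infinite, which mirrors the event {Ti = ∞} appearing in the transience argument.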

Now we can apply the strong Markov property. Since Ti+1 = Si+1 + T ◦ θSi+1 , we can write

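\begin{align*}
P^x(T_{i+1} < \infty) &= P^x(S_{i+1} < \infty,\ T \circ \theta_{S_{i+1}} < \infty) \\
  &= E^x\big[ P^x(T \circ \theta_{S_{i+1}} < \infty \mid \mathcal{F}_{S_{i+1}});\ S_{i+1} < \infty \big] \\
  &= E^x\big[ P^{X_{S_{i+1}}}(T < \infty);\ S_{i+1} < \infty \big].
\end{align*}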
At time S_{i+1}, we have |X_{S_{i+1}}| = 2M , and by (21.1)
\[ P^{X_{S_{i+1}}}(T < \infty) = \big( \tfrac{1}{2} \big)^{d-2}. \]
Therefore
\[ P^x(T_{i+1} < \infty) \le 2^{2-d} P^x(S_{i+1} < \infty) \le 2^{2-d} P^x(T_i < \infty). \]
The last inequality is simply the fact that S_{i+1} ≥ T_i. Since Px(T1 < ∞) ≤ 1, induction tells
us that
\[ P^x(T_i < \infty) \le 2^{(2-d)(i-1)} \to 0 \]
as i → ∞. Hence Px(Ti < ∞ for all i) = 0. Since Ti increases as i increases, for almost all
ω, Ti will be infinite for i sufficiently large (how large will depend on ω). Hence Xt returns to
B(0, M ) for a last time, a.s. Since M is arbitrary, this proves that Xt tends to ∞ as t → ∞.
We have thus proved
Proposition 21.1 If (Xt, Px) is a d-dimensional Brownian motion and d ≥ 3, then |Xt | → ∞
as t → ∞ with Px-probability one for each x.
21.2 Additive functionals
Let D be a closed subset of S , let f : D → [0, ∞), let S = τD, and let
\[ A = \sup_{x \in D} E^x \int_0^S f(X_s)\,ds, \]
where τD = inf{t > 0 : Xt ∉ D} is the first time X exits D.

Proposition 21.2 If A < ∞, then
\[ \sup_{x \in D} P^x\Big( \int_0^S f(X_s)\,ds \ge 2kA \Big) \le 2^{-k}. \tag{21.3} \]
This is rather remarkable: as soon as one gets a bound on the expectation, although it
must be uniform in x, one gets exponential tails for the distribution. A use of Chebyshev’s
inequality would only give the bound (2k)^{−1}.
Proof Let Bt = ∫_0^{t∧S} f (Xs) ds. This is a special case of what is known as an additive
functional; see Section 22.3. Let U1 = inf{t : Bt ≥ 2A}, and let U_{i+1} = U_i + U_1 ◦ θ_{U_i}. To
explain this formula, composing ω with θ_{U_i} means we disregard the path before time Ui.
Thus U1 ◦ θ_{U_i} is the length of time after time Ui until Bt has increased an amount 2A over its
value at Ui. Therefore Ui + U1 ◦ θ_{U_i} is the (i + 1)st time B has increased by 2A. The probability
Px(BS ≥ 2kA) is bounded by
\begin{align*}
P^x(U_k \le S) &= P^x(U_{k-1} \le S,\ U_1 \circ \theta_{U_{k-1}} \le S \circ \theta_{U_{k-1}}) \\
  &= E^x\big[ P^x(U_1 \circ \theta_{U_{k-1}} \le S \circ \theta_{U_{k-1}} \mid \mathcal{F}_{U_{k-1}});\ U_{k-1} \le S \big] \\
  &= E^x\big[ P^{X_{U_{k-1}}}(U_1 \le S);\ U_{k-1} \le S \big].
\end{align*}
If Uk−1 ≤ S, then XUk−1 ∈ D. If y ∈ D,
\[ P^y(U_1 \le S) \le P^y\Big( \int_0^S f(X_s)\,ds \ge 2A \Big) \le \frac{E^y \int_0^S f(X_s)\,ds}{2A} \le \tfrac{1}{2} \]
by Chebyshev’s inequality. Then
\[ P^x(U_k \le S) \le \tfrac{1}{2} P^x(U_{k-1} \le S) \]
and (21.3) follows by induction.
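To make the comparison between the two tail bounds concrete, the short sketch below tabulates the bound from a single use of Chebyshev's inequality at level 2kA against the bound 2^{−k} of Proposition 21.2. The function names are ours and the table is purely illustrative arithmetic.

```python
def chebyshev_bound(k: int) -> float:
    """Bound A / (2kA) = 1/(2k) from one application of Chebyshev's inequality."""
    return 1.0 / (2 * k)


def iterated_bound(k: int) -> float:
    """Bound 2^(-k) from (21.3), obtained by iterating with the strong Markov property."""
    return 2.0 ** (-k)


if __name__ == "__main__":
    for k in range(1, 11):
        print(k, chebyshev_bound(k), iterated_bound(k))
```

The two bounds coincide for k = 1 and k = 2; from k = 3 onward the iterated bound is strictly smaller and decays geometrically.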
We give another proof of Proposition 4.5.
Proposition 21.3 Let W be a one-dimensional Brownian motion. If T is a finite stopping
time and a < b, then
\[ P(W_{T+t} \in [a, b] \mid \mathcal{F}_T) \le \frac{b - a}{\sqrt{2\pi t}}, \qquad \text{a.s.} \]
Proof Let (Xt, Px) be a one-dimensional Brownian motion. If y ∈ R, then
\[ P^y(X_t \in [a, b]) = P^0(X_t \in [a - y, b - y]) = \frac{1}{\sqrt{2\pi t}} \int_{a-y}^{b-y} e^{-z^2/2t}\,dz \le \frac{b - a}{\sqrt{2\pi t}}. \tag{21.4} \]
By the strong Markov property,
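\begin{align*}
P(W_{T+t} \in [a, b] \mid \mathcal{F}_T) &= P^0(X_{T+t} \in [a, b] \mid \mathcal{F}_T) = E^0[1_{[a,b]}(X_t) \circ \theta_T \mid \mathcal{F}_T] \\
  &= E^{X_T}[1_{[a,b]}(X_t)] = P^{X_T}(X_t \in [a, b]).
\end{align*}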
Now use (21.4) with y replaced by XT .
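The bound (21.4) is uniform in the starting point y, which is exactly what the strong Markov step needs. A small numerical sketch (our own helper names, using the error function from the standard library) confirms this for a few starting points and times.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_in_interval(y: float, t: float, a: float, b: float) -> float:
    """P^y(X_t in [a, b]) for one-dimensional Brownian motion started at y."""
    sd = math.sqrt(t)
    return normal_cdf((b - y) / sd) - normal_cdf((a - y) / sd)


def density_bound(t: float, a: float, b: float) -> float:
    """Right-hand side of (21.4): (b - a) / sqrt(2 pi t)."""
    return (b - a) / math.sqrt(2.0 * math.pi * t)


if __name__ == "__main__":
    a, b = 0.0, 1.0
    for t in (0.1, 1.0, 10.0):
        for y in (-3.0, 0.0, 0.5, 2.0):
            print(t, y, prob_in_interval(y, t, a, b), density_bound(t, a, b))
```

For small t the bound exceeds 1 and is therefore uninformative; it becomes useful once t is large compared with (b − a)^2.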
21.3 Continuity
Let us now come up with a criterion for a Markov process to have continuous paths. We
assume we have a strong Markov process (Xt, Px) whose paths are right continuous with left
limits. Let d(·, ·) be the metric for the state space S .
Lemma 21.4 Let (Xt, Px) be a strong Markov process with state space S . For all x ∈ S and
all λ ≥ 0,
\[ P^x\big( \sup_{s \le t} d(X_s, x) \ge \lambda \big) \le 2 \sup_{s \le t} \sup_{y \in S} P^y\big( d(X_s, X_0) \ge \lambda/2 \big). \]
Note that the left-hand side has the supremum inside while the right-hand side has the
suprema outside the probability.
Proof Let us use the notation
\[ F(t, \lambda) = \sup_{s \le t} \sup_{y \in S} P^y\big( d(X_s, X_0) \ge \lambda \big). \tag{21.5} \]
21.4 Harmonic functions 171
Write S = inf{t : d(Xt, X0) ≥ λ}. Then by the strong Markov property,
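\begin{align*}
P^x\big( \sup_{s \le t} d(X_s, x) \ge \lambda \big)
  &\le P^x\big( d(X_t, x) \ge \lambda/2 \big) + P^x\big( S < t,\ d(X_t, X_0) \le \lambda/2 \big) \\
  &\le F(t, \lambda/2) + E^x\big[ P^{X_S}\big( d(X_{t-S}, X_0) \ge \lambda/2 \big) \big] \\
  &\le 2F(t, \lambda/2); \tag{21.6}
\end{align*}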
see Exercise 21.2.
Proposition 21.5 Let (Xt, Px) be a strong Markov process. With F (t, λ) defined as in (21.5),
suppose
\[ \frac{F(t, \lambda)}{t} \to 0 \tag{21.7} \]
as t → 0 for each λ > 0. Then Xt has continuous paths with Px-probability one for each x.

For X a Brownian motion, F (t, λ) ≤ 2e^{−λ²/8t} by Proposition 3.15, and hence F (t, λ)/t → 0

as t → 0. Thus Brownian motion satisfies (21.7). On the other hand, (21.7) is not satisfied

for the Poisson process; see Exercise 21.3.
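The contrast between the two examples can be seen numerically. The sketch below is ours and rests on two labelled assumptions: for Brownian motion it uses only the upper bound 2e^{−λ²/8t} quoted above, and for the Poisson process it assumes jump size 1 and a level λ ≤ 1, so that F (t, λ) is the probability 1 − e^{−ρt} of at least one jump by time t, where `rate` (ρ) denotes the Poisson parameter.

```python
import math


def brownian_ratio(t: float, lam: float) -> float:
    """Upper bound 2 exp(-lam^2 / (8t)) / t for F(t, lam)/t in the Brownian case."""
    return 2.0 * math.exp(-lam ** 2 / (8.0 * t)) / t


def poisson_ratio(t: float, rate: float) -> float:
    """F(t, lam)/t for a Poisson process and 0 < lam <= 1: (1 - exp(-rate * t)) / t."""
    return -math.expm1(-rate * t) / t


if __name__ == "__main__":
    for t in (1e-1, 1e-2, 1e-3):
        # The Brownian bound collapses to 0; the Poisson ratio approaches the rate.
        print(t, brownian_ratio(t, 1.0), poisson_ratio(t, 2.0))
```

The Poisson ratio converges to the jump rate rather than to 0, which is consistent with the process having jumps.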

Proof Suppose λ, t0 > 0 and X has a jump of size larger than 4λ at some time before t0

with positive probability, that is,

\[ P^x\big( \sup_{t \le t_0} d(X_{t-}, X_t) \ge 4\lambda \big) > 0, \]

where Xt− = lims↑t,s

motion enters A immediately. For example, a consequence of Theorem 7.2 is that the point

0 is regular for the set A = (0, ∞) when we have a one-dimensional Brownian motion.

Theorem 21.7 Suppose D is a bounded open domain in Rd and f is a function on ∂D that

is continuous on ∂D. Let (Xt, Px) be a d-dimensional Brownian motion and τD = inf{t :

Xt ∈ Dc}. If each point of ∂D is regular for Dc, then h(x) = E x f (XτD ) is a solution to the

Dirichlet problem.

The regularity condition says that starting at any point of ∂D, Brownian motion enters

Dc immediately. Uniqueness of the solution to the Dirichlet problem is easy, and we do not

address this here.

Proof We have already seen in Proposition 21.6 and the remarks immediately following

the proof of that proposition that h is harmonic in D. This implies that h is continuous in D.

Thus we only need to show that h agrees with f on ∂D.

Our first step is to fix t and ε and to show that the set

{x : Px(τD ≤ t) > 1 − ε}

is an open set. Let s < t, define ϕs(x) = Px(τD ≤ t − s), and let
ws(x) = Px(Xu ∈ Dc for some u ∈ [s, t]).
By the Markov property at time s,
\begin{align*}
w_s(x) &= E^x P^{X_s}(X_u \in D^c \text{ for some } u \in [0, t - s]) = E^x[P^{X_s}(\tau_D \le t - s)] \\
  &= E^x \varphi_s(X_s) = (2\pi s)^{-d/2} \int \varphi_s(y)\, e^{-|x-y|^2/2s}\,dy.
\end{align*}
By dominated convergence, the last integral is a continuous function of x. If
w0(x) = Px(Xu ∈ Dc for some u ∈ [0, t]),
then ws(x) ↑ w0(x), so {x : w0(x) > 1 − ε} = ∪s∈(0,t){x : ws(x) > 1 − ε} is open.

Let z ∈ ∂D. Let ε > 0 and choose η such that | f (w) − f (z)| < ε if |w − z| < η and
w ∈ ∂D. Pick t small so that P0(sups≤t |Xs| > η/2) < ε; this is possible because Brownian
motion has continuous paths. Because z ∈ ∂D and every point of ∂D is regular for Dc,
Pz(τD ≤ t) = 1. Finally choose δ < (η/2) ∧ ε so that if |w − z| < δ and w ∈ D, then
Pw(τD ≤ t) > 1 − ε.

Now if |w − z| < δ and w ∈ D, then
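\begin{align*}
P^w(|X_{\tau_D} - z| < \eta) &\ge P^w\big( \tau_D \le t,\ \sup_{s \le t} |X_s - w| \le \eta/2 \big) \\
  &\ge P^w(\tau_D \le t) - P^0\big( \sup_{s \le t} |X_s| > \eta/2 \big) \\
  &\ge (1 - \varepsilon) - \varepsilon.
\end{align*}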

The set ∂D is a bounded and closed subset of Rd , hence compact, and since f is continuous

on ∂D, there exists M such that | f | is bounded by M . If |w − z| < δ and w ∈ D,
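\begin{align*}
|h(w) - f(z)| &= |E^w f(X_{\tau_D}) - f(z)| \\
  &\le \big| E^w[f(X_{\tau_D});\ |X_{\tau_D} - z| < \eta] - f(z) P^w(|X_{\tau_D} - z| < \eta) \big| + 2M P^w(|X_{\tau_D} - z| \ge \eta) \\
  &\le \varepsilon P^w(|X_{\tau_D} - z| < \eta) + 4M\varepsilon \le (1 + 4M)\varepsilon.
\end{align*}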
We used the fact that | f (XτD ) − f (z)| < ε if |XτD − z| < η. Since ε is arbitrary, this proves
that h(w) → f (z) as w → z inside D.
Let us give a sufficient condition for a point to be regular for a domain D. Let Ṽa =
{(x1, . . . , xd ) : x1 > 0, x_2² + · · · + x_d² < a² x_1²}. The vertex of Ṽa is the origin. A cone V in
Rd is a translation and rotation of Ṽa for some a.
The following is known as the Poincaré cone condition.
Proposition 21.8 Suppose there exists a cone V with vertex y ∈ ∂D such that V ∩ B(y, r) ⊂
Dc for some r > 0. Then y is regular for Dc.

Proof By translation and rotation of the coordinates, we may suppose y = 0 and V = Ṽa

for some a. Then for each t,

\begin{align*}
P^0(\tau_D \le t) \ge P^0(X_t \in D^c) &\ge P^0(X_t \in V \cap B(0, r)) \\
  &\ge P^0(X_t \in V) - P^0(X_t \notin B(0, r)).
\end{align*}
By scaling, the last term is P0(X1 ∈ V ) − P0(X1 ∉ B(0, r/√t)), which converges to
\[ P^0(X_1 \in V) = (2\pi)^{-d/2} \int_V e^{-|z|^2/2}\,dz > 0 \]

as t → 0. Observe P0(τD ≤ t) converges to P0(τD = 0). By the Blumenthal 0–1 law

(Proposition 20.8), P0(τD = 0) = 1.

Continue to suppose (Xt, Px) is a d-dimensional Brownian motion and D is a bounded
domain, but now we suppose d ≥ 3. Define
\[
U(x, A) = E^x \int_0^\infty 1_A(X_s)\, ds, \qquad x \in D.
\]
This is the same as the λ-resolvent of 1A with λ = 0. We write
\[
\begin{aligned}
U(x, A) &= E^x \int_0^\infty 1_A(X_s)\, ds = \int_0^\infty P^x(X_s \in A)\, ds \\
&= \int_0^\infty \int_A \frac{1}{(2\pi s)^{d/2}}\, e^{-|y-x|^2/2s}\, dy\, ds \\
&= \int_A \int_0^\infty \frac{1}{(2\pi s)^{d/2}}\, e^{-|y-x|^2/2s}\, ds\, dy.
\end{aligned}
\]
Some calculus shows that the inside integral is equal to $c|x - y|^{2-d}$. If we denote
$c|x - y|^{2-d}$ by u(x, y), we then have that
\[
U(x, A) = \int_A u(x, y)\, dy. \tag{21.9}
\]

The expression u(x, y) is called the Newtonian potential density. Note that u(x, y) is a

function only of |x − y|, it blows up as |x − y| → 0, and tends to 0 as |x − y| → ∞.
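
The calculus step can be sketched as follows. The substitution variable v and the explicit value of the constant c are not given in the text; they are worked out here only as an aid. Writing r = |x − y| and substituting v = r²/(2s):

```latex
\[
\int_0^\infty \frac{1}{(2\pi s)^{d/2}}\, e^{-r^2/2s}\, ds
  = \frac{r^{2-d}}{2\pi^{d/2}} \int_0^\infty v^{d/2-2} e^{-v}\, dv
  = \frac{\Gamma(d/2 - 1)}{2\pi^{d/2}}\, r^{2-d}.
\]
```

The gamma integral is finite exactly when d/2 − 1 > 0, which for integer d is the assumption d ≥ 3. For d = 3 the formula gives c = 1/(2π).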

If x is in the interior of D, then u(x, ·) will be bounded on ∂D. Define
$h_x(z) = E^z u(x, X_{\tau_D})$; we saw above that hx is harmonic. Now define
gD(x, y) = u(x, y) − hx(y); this function of two variables is called the Green’s function or
Green function for D with pole at x. This is a well-known object in analysis; let us give a
probabilistic interpretation. Since u(x, y) is symmetric in x and y, if A ⊂ D we have
\[
\begin{aligned}
\int_A g_D(x, y)\, dx &= \int_A u(x, y)\, dx - \int_A E^y u(x, X_{\tau_D})\, dx \\
&= E^y \int_0^\infty 1_A(X_s)\, ds - E^y \int_A u(x, X_{\tau_D})\, dx \\
&= E^y \int_0^\infty 1_A(X_s)\, ds - E^y \Big[ E^{X_{\tau_D}} \int_0^\infty 1_A(X_s)\, ds \Big].
\end{aligned}
\tag{21.10}
\]

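Using the strong Markov property and then a change of variables,
\[
\begin{aligned}
E^y \Big[ E^{X_{\tau_D}} \int_0^\infty 1_A(X_s)\, ds \Big]
&= E^y \Big[ E^y \Big[ \int_0^\infty 1_A(X_s) \circ \theta_{\tau_D}\, ds \,\Big|\, \mathcal{F}_{\tau_D} \Big] \Big] \\
&= E^y \int_0^\infty 1_A(X_s) \circ \theta_{\tau_D}\, ds \\
&= E^y \int_0^\infty 1_A(X_{\tau_D + s})\, ds \\
&= E^y \int_{\tau_D}^\infty 1_A(X_s)\, ds.
\end{aligned}
\]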

Substituting this in (21.10) we have
\[
\int_A g_D(x, y)\, dx = E^y \int_0^{\tau_D} 1_A(X_s)\, ds.
\]

For this reason gD is sometimes called the occupation time density for D.


Exercises

21.1 Suppose d = 2, (Xt , Px) is a two-dimensional Brownian motion, and r > 0. Imitate the argument

of Proposition 21.1 but with h(x) = log(|x|) to show that Px(Xt hits B(0, r)) = 1 when |x| > r.

Then use the strong Markov property to show that there are times Ti → ∞ with XTi ∈ B(0, r).

That is, two-dimensional Brownian motion is neighborhood recurrent.

21.2 In the proof of Lemma 21.4, justify each inequality in (21.6).

21.3 Let (Xt , Px) be a Poisson process with parameter a and let F be defined by (21.5). Show

F (t, 1/2)/t does not converge to 0 as t → 0.

21.4 Suppose d ≥ 3, (Xt , Px) is a d-dimensional Brownian motion, and

\[
U f(x) = E^x \int_0^\infty f(X_s)\, ds.
\]
Show that if f is bounded and measurable with compact support, then U f is continuous and
|U f (x)| → 0 as |x| → ∞. Show that if f ∈ C2 with compact support, then U f is C2. Show
that $\tfrac12 \Delta U f = -f$.

21.5 Let Wt be a Brownian motion and f a continuous function. Prove that if f (Wt ) is a submartingale,

then f must be convex.

21.6 Prove the maximum principle for harmonic functions. This says that if h is harmonic in a bounded

domain D, then

\[
\sup_{x \in D} |h(x)| \le \sup_{x \in \partial D} |h(x)|.
\]

21.7 If W is a d-dimensional Brownian motion started at 0, find E T , where T is the first time W exits

the ball of radius r centered at the origin.

Hint: Use the fact that |Wt |2 − dt is a martingale.

21.8 Let f : R → R be a bounded function with | f (x) − f (y)| ≤ |x − y| for all x, y ∈ R. Let

Dε = {(x, y) ∈ R2 : f (x) < y < f (x) + ε} for ε ∈ (0, 1). Let (Xt , Px) be a two-dimensional
Brownian motion and let τε = inf{t : Xt /∈ Dε}. Prove that there exists a constant c not depending
on ε such that $E^0 \tau_\varepsilon \le c\varepsilon^2$.
Hint: By Exercise 21.7 the expected time for two-dimensional Brownian motion to leave
a ball of radius 2ε is less than $c\varepsilon^2$. Then use the strong Markov property repeatedly at the
times Si, where Si is the first time after time Si−1 that Brownian motion has moved at least 2ε
from XSi−1 .
22
Transformations of Markov processes
There are a number of interesting transformations that make new Markov processes out of
old. We will look at four: killing, conditioning, changing time, and stopping at a last exit
time. These are only a few of the possible transformations.
22.1 Killed processes
One sometimes wants to consider a Markov process up until a stopping time ζ , called the
lifetime of the process. We affix to our state space S an isolated point Δ, called the cemetery
state, and the topology on $S_\Delta = S \cup \{\Delta\}$ is the one generated by the collection of open sets
of S together with the set {Δ}. We define the killed process X̂ by
\[
\widehat{X}_t =
\begin{cases}
X_t, & t < \zeta; \\
\Delta, & t \ge \zeta,
\end{cases}
\tag{22.1}
\]
and we say we kill the process X at time ζ . Every function f on S is defined to be 0 at Δ.
One example of this situation would be to let ζ = τD, where D is a subset of S and
τD = inf{t > 0 : Xt ∉ D}, the first exit from the set D. Another common occurrence is to
let ζ = S, where S is a random variable with an exponential distribution with parameter λ,
i.e., $P(S > t) = e^{-\lambda t}$, such that S is independent of X . A third possibility would be to let
$\zeta = \inf\{t : \int_0^t f(X_s)\, ds \ge 1\}$, where f is a non-negative function. The crucial property of ζ
is that it be a terminal time:
\[
\zeta = s + \zeta \circ \theta_s \quad \text{if } s < \zeta. \tag{22.2}
\]
Proposition 22.1 If (Xt, Px) is a strong Markov process and (22.2) holds, then (X̂t, Px)
satisfies the Markov and strong Markov properties.
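Proof As in Section 20.2, we need to show
\[
E^x[f(\widehat{X}_t) \circ \theta_T \mid \mathcal{F}_T] = E^{\widehat{X}_T} f(\widehat{X}_t), \qquad P^x\text{-a.s.}
\]
If A ∈ FT ,
\[
E^x[f(\widehat{X}_t) \circ \theta_T; A] = E^x[f(X_{t+T}); A, T + t < \zeta].
\]
On the other hand,
\[
\begin{aligned}
E^{\widehat{X}_T} f(\widehat{X}_t) &= E^{X_T}[f(X_t); t < \zeta]\, 1_{(T < \zeta)} \\
&= E^x[f(X_t) \circ \theta_T; t \circ \theta_T < \zeta \circ \theta_T \mid \mathcal{F}_T]\, 1_{(T < \zeta)} \\
&= E^x[f(X_{t+T}); T + t \circ \theta_T < T + \zeta \circ \theta_T, T < \zeta \mid \mathcal{F}_T] \\
&= E^x[f(X_{t+T}); T + t < \zeta \mid \mathcal{F}_T],
\end{aligned}
\]
since T + t ◦ θT = T + t and T + ζ ◦ θT = ζ on (T < ζ ). Hence
\[
E^x[E^{\widehat{X}_T} f(\widehat{X}_t); A] = E^x[f(X_{t+T}); T + t < \zeta, A],
\]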
as required.
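
To make the definition (22.1) concrete, here is a minimal discrete-time sketch of killing a sampled path at the first exit from a set D. It is only an illustration: the names `CEMETERY`, `kill_at_exit`, and `evaluate` are hypothetical, the cemetery state is represented by `None`, and the path is a plain list of positions rather than a continuous-time process.

```python
from typing import Callable, List, Optional, Sequence

CEMETERY = None  # hypothetical stand-in for the cemetery state


def kill_at_exit(
    path: Sequence[float], in_domain: Callable[[float], bool]
) -> List[Optional[float]]:
    """Discrete-time killed path: X_k for k < tau_D, CEMETERY for k >= tau_D."""
    killed: List[Optional[float]] = []
    alive = True
    for k, x in enumerate(path):
        # tau_D = inf{t > 0 : X_t not in D}, so the position at time 0 is not tested.
        if alive and k > 0 and not in_domain(x):
            alive = False
        killed.append(x if alive else CEMETERY)
    return killed


def evaluate(f: Callable[[float], float], state: Optional[float]) -> float:
    """Extend f to the cemetery state by setting it to 0 there."""
    return 0.0 if state is CEMETERY else f(state)


# Usage: kill a path on leaving D = (0, 1); it exits at index 2 and stays dead.
path = [0.5, 0.8, 1.2, 0.9]
killed = kill_at_exit(path, lambda x: 0.0 < x < 1.0)
values = [evaluate(lambda x: x * x, s) for s in killed]
```

Note that the path re-enters (0, 1) at index 3, but the killed path remains at the cemetery state: once killed, the process does not come back.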
22.2 Conditioned processes
Another type of transformation of a Markov process is by conditioning, also known as Doob’s
h-path transform. To motivate this, let D be a domain in Rd and let Xt be a Brownian motion
killed on exiting the domain. One would like to give a precise meaning to the intuitive notion
of Brownian motion conditioned to exit the domain at a certain point. Let h be a positive
harmonic function in D (i.e., h is C2 in D, and Δh = 0 there) and suppose that h is 0
everywhere on the boundary of D except at one point z. The Poisson kernel for the ball or
for the half-space gives examples of such harmonic functions. Then, heuristically, we have
by the Markov property at time t,
\[
P^x(X_t \in dy \mid X_{\tau_D} = z)
= \frac{P^x(X_t \in dy,\ X_{\tau_D} = z)}{P^x(X_{\tau_D} = z)}
= \frac{P^x(X_t \in dy)\, P^y(X_{\tau_D} = z)}{P^x(X_{\tau_D} = z)}.
\]
If p0(t, x, dy) represents the probability that Brownian motion started at x and killed on
leaving D is in dy at time t, we then expect that the analogous probability for Brownian
motion conditioned to exit D at z ought to be h(y)p0(t, x, dy)/h(x). We now make this
precise.
Let us look at a strong Markov process X . We say a function h is invariant with respect
to X if Pth(x) = h(x) for all t and x, where Pt is the semigroup associated with X . If h is
invariant, by the Markov property,
\[
E^x[h(X_t) \mid \mathcal{F}_s] = E^x[h(X_{t-s}) \circ \theta_s \mid \mathcal{F}_s] = E^{X_s} h(X_{t-s})
= P_{t-s} h(X_s) = h(X_s),
\]
and so for each x, h(Xt ) is a martingale with respect to Px. Conversely, if h(Xt ) is a martingale
with respect to Px for all x,
Pth(x) = E xh(Xt ) = h(x)
by the definition of martingale, and so h is invariant. In the case of Brownian motion killed
on leaving a domain, the invariant functions are thus the harmonic ones.
Now let h be a non-negative invariant function for a strong Markov process X . Letting
Mt = h(Xt )/h(X0), Mt is a non-negative continuous martingale with M0 = 1, Px-a.s., as long
as h(x) > 0.

We define the h-path transform of the Markov process X by setting
\[
P^x_h(A) = E^x[M_t; A], \qquad A \in \mathcal{F}_t. \tag{22.3}
\]
Since M0 = 1, $P^x_h(\Omega) = 1$. Observe that $P^x_h$ gives more mass to paths where h(Xt ) is big and
less to where it is small. Note the similarity to the Girsanov theorem.

We have the following.

Proposition 22.2 Suppose (Xt, Px) is a strong Markov process and that h is non-negative
and invariant. Then $(X_t, P^x_h)$ forms a strong Markov process.

Proof Suppose A ∈ Fs and h(x) ≠ 0. (We leave consideration of the case where h(x) = 0
to the reader.) Then
\[
\begin{aligned}
E^x_h[f(X_{t+s}); A] &= \frac{E^x[f(X_{t+s}) h(X_{t+s}); A]}{h(x)} \\
&= \frac{E^x[E^{X_s}[f(X_t) h(X_t)]; A]}{h(x)} \\
&= \frac{1}{h(x)}\, E^x\Big[ \frac{1}{h(X_s)}\, E^{X_s}[f(X_t) h(X_t)]\, h(X_s); A \Big]
\end{aligned}
\]
by the Markov property for X . This is equal to
\[
E^x\big[ E^{X_s}_h[f(X_t)]\, h(X_s); A \big] / h(x) = E^x_h\big[ E^{X_s}_h f(X_t); A \big].
\]

The Markov property follows from this. The strong Markov property is proved in almost

identical fashion.

Let us consider an example. Let (Xt, Px) be a Brownian motion on the non-negative axis

killed on first hitting 0. This is the same as a Brownian motion killed on exiting (0, ∞). This

will be a strong Markov process. Since the second derivative of the function h(x) = x is 0,

then h is harmonic on (0, ∞), and so is invariant for the killed Brownian motion. Let us

now condition using the function h to get Brownian motion conditioned to hit infinity before

hitting zero.

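To identify the resulting process, we argue as follows. Fix x and let Tε = inf{t > 0 :
Xt < ε}. The Radon–Nikodym derivative of the law of $P^x_h$ with respect to Px on $\mathcal{F}_{t \wedge T_\varepsilon}$ is
$M_{t \wedge T_\varepsilon} = h(X_{t \wedge T_\varepsilon})/h(x)$. We can rewrite $M_{t \wedge T_\varepsilon}$ as
\[
M_{t \wedge T_\varepsilon} = \exp(\log X_{t \wedge T_\varepsilon} - \log x)
= \exp\Big( \int_0^{t \wedge T_\varepsilon} \frac{1}{X_s}\, dX_s - \frac12 \int_0^{t \wedge T_\varepsilon} \Big( \frac{1}{X_s} \Big)^2 ds \Big),
\]
using Itô’s formula. By the Girsanov theorem, under $P^x_h$,
\[
W_{t \wedge T_\varepsilon} = X_{t \wedge T_\varepsilon} - x - \int_0^{t \wedge T_\varepsilon} \frac{1}{X_s}\, ds
\]
is a martingale. By Exercise 13.2, its quadratic variation is t ∧ Tε, and so by Exercise 12.3,
$W_{t \wedge T_\varepsilon}$ is a Brownian motion stopped at time Tε. We have
\[
X_{t \wedge T_\varepsilon} = x + W_{t \wedge T_\varepsilon} + \int_0^{t \wedge T_\varepsilon} \frac{1}{X_s}\, ds,
\]
or X satisfies the stochastic differential equation
\[
dX_t = dW_t + \frac{1}{X_t}\, dt
\]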
for t ≤ Tε. We will see later (Section 24.3) that this is the stochastic differential equation
defining the Bessel process of order 3. The same argument shows that Brownian motion
killed on exiting (0, a) and then conditioned to hit a before 0 is also a Bessel process of
order 3 up until the time of first hitting a.
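
A discrete analogue may help fix ideas. The sketch below is not from the text: it replaces Brownian motion by simple random walk on {1, 2, ...} killed on hitting 0, checks that h(x) = x is invariant for the killed walk, and forms the transformed transition probabilities p(x, y) h(y)/h(x). The function names are hypothetical. The transformed walk never reaches 0 and has mean displacement 1/x from state x, mirroring the drift term 1/Xt in the stochastic differential equation above.

```python
from fractions import Fraction


def p(x: int, y: int) -> Fraction:
    """Simple random walk on {1, 2, ...} killed on hitting 0 (mass is lost at 0)."""
    if x >= 1 and y >= 1 and abs(x - y) == 1:
        return Fraction(1, 2)
    return Fraction(0)


def h(x: int) -> Fraction:
    """h(x) = x, the discrete counterpart of the harmonic function in the text."""
    return Fraction(x)


def p_h(x: int, y: int) -> Fraction:
    """Transition probability of the h-transformed walk: p(x, y) h(y) / h(x)."""
    return p(x, y) * h(y) / h(x)


def drift(x: int) -> Fraction:
    """Mean one-step displacement of the transformed walk started at x >= 1."""
    return sum((y - x) * p_h(x, y) for y in (x - 1, x + 1))


# Usage: from x = 3 the transformed walk steps up with probability 2/3
# and down with probability 1/3.
up, down = p_h(3, 4), p_h(3, 2)
```

Exact rational arithmetic is used so that invariance and the total mass of the transformed kernel can be checked without rounding error.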
22.3 Time change
An additive functional is an increasing adapted process with A0 = 0, a.s., such that
\[
A_t = A_s + A_{t-s} \circ \theta_s \tag{22.4}
\]
if s < t. The simplest examples are what are known as classical additive functionals:
$A_t = \int_0^t f(X_r)\, dr$, where f is a non-negative measurable function. We have
\[
A_t - A_s = \int_s^t f(X_r)\, dr = \Big( \int_0^{t-s} f(X_r)\, dr \Big) \circ \theta_s = A_{t-s} \circ \theta_s.
\]
If we have the uniform limit of additive functionals, we again get an additive functional, and
thus, for example, the local times $L^x_t$ of a one-dimensional Brownian motion are also additive
functionals.
Given a Markov process X and an additive functional A, let
\[
B_t = \inf\{u : A_u > t\} \qquad \text{and} \qquad X'_t = X_{B_t}.
\]
Let $\mathcal{F}'_t = \mathcal{F}_{B_t}$. Thus X′ is a time change of X .

Proposition 22.3 Let (Xt, Px) be a strong Markov process and At an additive functional.
With B defined as above, $(X'_t, P^x)$ is also a strong Markov process.

Proof We verify the strong Markov property. Let $\mathcal{F}'_t = \mathcal{F}_{B_t}$. Then if T is a stopping time
for $\mathcal{F}'_t$, we have
\[
E^x[f(X'_{T+t}) \mid \mathcal{F}'_T] = E^x[f(X(B_{T+t})) \mid \mathcal{F}_{B_T}].
\]
$B_T$ can be seen to be a stopping time with respect to {Ft} and
$B_{T+t} = B_T + B_t \circ \theta_{B_T}$, where the θt are the shift operators, so this is
\[
E^x E^{X(B_T)} f(X_{B_t}) = E^x E^{X'_T} f(X'_t).
\]

This suffices to show that X ′t is a strong Markov process.
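
The construction of B and X′ can be sketched for a sampled path as follows. This is an illustration under stated assumptions, not the text's construction: the path is treated as piecewise constant, held at `path[k]` on the interval [k·dt, (k+1)·dt), and the names `additive_functional` and `time_changed` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Callable, List, Sequence


def additive_functional(
    path: Sequence[float], f: Callable[[float], float], dt: float
) -> List[float]:
    """A[k] = integral of f(X_r) over [0, k*dt] for the piecewise-constant path; A[0] = 0."""
    return [0.0] + list(accumulate(f(x) * dt for x in path[:-1]))


def time_changed(
    path: Sequence[float],
    f: Callable[[float], float],
    dt: float,
    times: Sequence[float],
) -> List[float]:
    """Values X'_t = X_{B_t}, B_t = inf{u : A_u > t}, for each t >= 0 in `times`.

    Times t for which A never exceeds t on the sampled horizon are skipped.
    Requires f >= 0 so that A is non-decreasing (needed by bisect).
    """
    A = additive_functional(path, f, dt)
    out = []
    for t in times:
        k = bisect_right(A, t)  # first grid index with A[k] > t
        if k == len(A):
            continue  # B_t lies beyond the sampled horizon
        # B_t lies in [(k-1)*dt, k*dt), where the path sits at path[k-1].
        out.append(path[k - 1])
    return out


# Usage: run the clock only while the path is at level >= 1.
path = [0, 1, 1, 2, 3]
clock = lambda x: 1.0 if x >= 1 else 0.0
new_path = time_changed(path, clock, 1.0, [0, 1, 2, 3])
```

With this choice of f the time-changed path skips the initial stretch spent at 0, since the clock A does not advance there.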


22.4 Last exit decompositions

Let A be a Borel set, and let L be the last visit to A:

L = sup{s : Xs ∈ A}.

We define L to be 0 if X never hits A. The random time L is not a stopping time, but we can

nevertheless kill the process X at time L. It turns out the resulting process Y is the process

X conditioned by the function h(x) = Px(TA < ∞). The intuitive meaning of this is that Y
is X conditioned to hit the set A.
Let T = inf{t : Xt ∈ A}, and set
\[
Y_t =
\begin{cases}
X_t, & t < L, \\
\Delta, & t \ge L.
\end{cases}
\]
Let Ht = σ (Ys; s ≤ t).
Proposition 22.4 If (Xt, Px) is a strong Markov process, then (Yt, Px) is a Markov process
with respect to {Ht}.
Proof If B ⊂ S (so that Δ ∉ B), then
\[
(Y_t \in B) = (X_t \in B, L > t) = (X_t \in B, T \circ \theta_t < \infty),
\]
since L, the last time that X is in A, will be larger than t if and only if X hits A at some time
after time t. We conclude that the function x → Px(Yt ∈ B) is Borel measurable. Since
\[
P^x(Y_t = \Delta) = P^x(L \le t) = 1 - P^x(L > t) = 1 - P^x(T \circ \theta_t < \infty),
\]
then the function x → Px(Yt = Δ) is also Borel measurable.
We need to show that if C ∈ Hs,
\[
E^x[f(Y_t); C] = E^x[Q_{t-s} f(Y_s); C], \tag{22.5}
\]
where f is bounded and measurable, h(x) = Px(L > 0), and
\[
Q_t g(x) = \frac{1}{h(x)}\, P_t(gh)(x)
\]
when h(x) ≠ 0. (Set Qtg(x) = 0 if h(x) = 0.)

It suffices to show (22.5) when C = (Yr1 ∈ B1, . . . ,Yrn ∈ Bn) for r1 ≤ · · · ≤ rn ≤ s and

the B1, . . . , Bn are Borel sets. If we set

Cs = (Xr1 ∈ B1, . . . , Xrn ∈ Bn),

then Cs ∈ Fs, C ∩ (L > s) = Cs ∩ (L > s), and C ∩ (L > t) = Cs ∩ (L > t).

We start with

\[
E^x[f(Y_t); C] = E^x[f(X_t); C, L > t] = E^x[f(X_t); C_s, L > t]
= E^x[f(X_t); C_s, L \circ \theta_t > 0].
\]
Conditioning on Ft , this is equal to
\[
E^x[f(X_t) P^{X_t}(L > 0); C_s] = E^x[f(X_t) h(X_t); C_s].
\]
Conditioning on Fs, this in turn is equal to
\[
\begin{aligned}
E^x[P_{t-s}(fh)(X_s); C_s] &= E^x[h(X_s) Q_{t-s} f(X_s); C_s] \\
&= E^x[P^{X_s}(L > 0)\, Q_{t-s} f(X_s); C_s] \\
&= E^x[Q_{t-s} f(X_s); C_s, L \circ \theta_s > 0],
\end{aligned}
\tag{22.6}
\]
where we used the Markov property for the last equality. Continuing, we have that the last
line of (22.6) is equal to
\[
E^x[Q_{t-s} f(X_s); C_s, L > s] = E^x[Q_{t-s} f(X_s); C, L > s] = E^x[Q_{t-s} f(Y_s); C],
\]

as desired.

We can also look at XL+t , where L is as above. This new process is again a strong Markov

process, and this time is the process X conditioned by the function h(x) = Px(TA = ∞). The

intuitive meaning of this is that XL+t is X conditioned never to hit A. Since we are looking

at the process after the last visit to A, this is entirely plausible. For a proof of the Markov

property of XL+t , see Meyer et al. (1972).
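
The remark that L is not a stopping time can be seen on sampled paths. The sketch below is only illustrative: the names `last_visit` and `kill_at_last_visit` are hypothetical, time is discrete, and `None` stands in for the cemetery state. Two paths that agree up to a given time can have different last visits to A, so whether L has already occurred cannot be decided from the past alone.

```python
from typing import Callable, List, Optional, Sequence


def last_visit(path: Sequence[int], in_A: Callable[[int], bool]) -> int:
    """L = sup{k : X_k in A}, with L = 0 if the path never visits A."""
    L = 0
    for k, x in enumerate(path):
        if in_A(x):
            L = k
    return L


def kill_at_last_visit(
    path: Sequence[int], in_A: Callable[[int], bool]
) -> List[Optional[int]]:
    """Y_k = X_k for k < L and the cemetery state (None) for k >= L."""
    L = last_visit(path, in_A)
    return [x if k < L else None for k, x in enumerate(path)]


# Usage: both paths agree on times 0..3, yet their last visits to A = {0} differ.
in_A = lambda x: x == 0
path_one = [3, 1, 0, 2, 0, 4, 5]
path_two = [3, 1, 0, 2, 1, 4, 5]
L_one = last_visit(path_one, in_A)
L_two = last_visit(path_two, in_A)
```

Computing L requires scanning the whole path, which is the discrete counterpart of needing information from the future.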

Exercises

22.1 Let (Xt , Px) be a one-dimensional Brownian motion, $L^x_t$ the local time of Brownian motion at x,
and m a positive finite measure on R. Show that $A_t = \int L^x_t\, m(dx)$ is an additive functional.

22.2 We consider the space-time process. Let Vt = V0 + t. The process Vt is simply the process that

increases deterministically at unit speed. Thus Vt can represent time. If (Xt , Px) is a Markov

process, show that ((Xt ,Vt ), P(x,v)) is also a Markov process. Is ((Xt ,Vt ), P(x,v)) necessarily a

strong Markov process if (Xt , Px) is a strong Markov process?

For some applications, one lets Vt = V0 − t, and one thinks of time running backwards.

Space-time processes are useful when considering parabolic partial differential equations.

22.3 Suppose (Xt , Px) is a strong Markov process and f is a non-negative invariant function for
(Xt , Px). Write $Q^x$ for $P^x_f$. Suppose g is a non-negative invariant function for $(X_t, Q^x)$. Show that
f g is a non-negative invariant function for (Xt , Px) and that $Q^x_g = P^x_{fg}$.

22.4 Suppose A and B are additive functionals for a Markov process and A and B have continuous

paths. Prove that if E xAt = E xBt for all x and t, then

Px(At ≠ Bt for some t ≥ 0) = 0

for all x.

Hint: Show At − Bt is a martingale.

22.5 Suppose A and B are additive functionals with continuous paths and suppose
$E^x A_\infty = E^x B_\infty < \infty$ for each x. Show
\[
P^x(A_t \ne B_t \text{ for some } t \ge 0) = 0
\]
for each x.
Hint: If $f(x) = E^x A_\infty$, then
\[
E^x[A_\infty \mid \mathcal{F}_t] - A_t = E^{X_t} A_\infty = f(X_t),
\]
and similarly with B in place of A. Then A − B is a Px martingale for each x.

22.6 Let A be an additive functional with continuous paths. Suppose there exists K > 0 such that
$E^x A_\infty \le K$ for each x. Prove that there exists a constant c depending only on K such that
\[
E^x e^{cA_\infty} < \infty, \qquad x \in S.
\]
22.7 Here is an argument that the law of a Brownian motion conditioned to have a maximum at a
certain level is a Bessel process of order 3.
Let W be a one-dimensional Brownian motion killed on hitting 0. Let St = sups≤t Ws be
the maximum. By Exercise 19.1, X = (W, S) is a Markov process. Determine the law of
X for t ≤ L, where L is the last time X hits the diagonal. To define L more precisely, let
D = {(w, s) : w = s, w > 0} and L = sup{t ≥ 0 : Xt ∈ D}. L is finite, a.s., because W will hit 0

in finite time with probability one.

Notes

Markov processes are in some sense supposed to have the property that the past and the

future are independent given the present. From this point of view, one might hope that a

Markov process run backwards is again a Markov process. This is, more or less, the case;

see Chung and Walsh (1969) or Rogers and Williams (2000a).

23

Optimal stopping

A nice application of Markov process theory is optimal stopping. Suppose we have a reward

function g ≥ 0 and we want to find the stopping time T that maximizes the value of E xg(XT )

and we also want to find the value of this expectation. This is the optimal stopping problem.

An important example of an optimal stopping problem is pricing the American put. (See

Chapter 28 for more on options.) A European put is an option to sell a share of stock at a

fixed price K at a certain time t0. If at time t0 the price St0 of the stock is lower than K, one

can make a profit by buying a share of stock on the stock exchange for St0 dollars, exercising

the put (which means selling a share of stock for K dollars), and taking home a profit of

K − St0 . If the price of the stock is above K at time t0, it would be silly to exercise the put,

and thus the put is worthless. An American put is almost the same, but one has the option to

sell a share of stock at price K at any time before time t0. An American put is more valuable

than a European put because if one exercises the option early, that is, sells the share of stock

before time t0, then one can put the money in a risk-free asset such as a bond or in the bank

and earn interest on the money. When should one exercise an American put to maximize the

expected return? One cannot look into the future, so the time should be a stopping time. The

stopping time should depend on the stock price, the exercise price, and also the time until

time t0. Thus one is in the optimal stopping context with Xt = (t, St ), where St is the stock

price, and one wants to find a stopping time T that maximizes a certain reward function.

23.1 Excessive functions

A solution to the optimal stopping problem can be given in the Markov case through the use

of excessive functions. A non-negative function f is excessive for a Markov process X if

Pt f (x) ≤ f (x) for all t and x and Pt f (x) increases up to f (x) pointwise as t → 0. Here Pt is

the semigroup associated with the Markov process X . If g ≥ 0, define

\[
U g(x) = \int_0^\infty P_s g(x)\, ds = E^x \int_0^\infty g(X_s)\, ds. \tag{23.1}
\]
When g ≥ 0, U g is excessive. To see this, write f = U g; using the semigroup property and a
change of variables,
\[
P_t f(x) = P_t \Big( \int_0^\infty P_s g(x)\, ds \Big) = \int_0^\infty P_{s+t} g(x)\, ds = \int_t^\infty P_s g(x)\, ds.
\]
This is certainly less than the integral from 0 to ∞, hence is less than f (x), and Pt f (x)
increases up to f (x) as t → 0 by monotone convergence. (It is possible that f is infinite for
some or all x.)

The theory of excessive functions is an important part of Markov process theory and we

refer the reader to Blumenthal and Getoor (1968), a book which has inspired a generation of

Markov process theorists.

We have the following.

Lemma 23.1 If f is excessive, there exist functions gn ≥ 0 such that U gn increases up to f ,

where U gn is defined by (23.1).

Proof Let $g_n = n(f - P_{1/n} f)$. Since f is excessive, then gn ≥ 0. We have
\[
U g_n = n \int_0^\infty P_s f\, ds - n \int_0^\infty P_{s+(1/n)} f\, ds = n \int_0^{1/n} P_s f\, ds,
\]
which is less than f and increases to f .

Next we have

Proposition 23.2 (1) If f is excessive, T is a finite stopping time, and $h(x) = E^x f(X_T)$,
then h is excessive.
(2) If f is excessive and T is a finite stopping time, then $f(x) \ge E^x f(X_T)$.
(3) If f is excessive, then f (Xt ) is a supermartingale.

Proof (1) First suppose f = U g for some non-negative function g. Then
\[
\begin{aligned}
h(x) = E^x U g(X_T) &= E^x E^{X_T} \int_0^\infty g(X_s)\, ds \\
&= E^x \int_0^\infty g(X_{s+T})\, ds = E^x \int_T^\infty g(X_s)\, ds
\end{aligned}
\tag{23.2}
\]
by the strong Markov property and a change of variables. The same argument shows that
\[
P_t h(x) = E^x h(X_t) = E^x E^{X_t} \int_T^\infty g(X_s)\, ds = E^x \int_{T+t}^\infty g(X_s)\, ds.
\]
This is less than $E^x \int_T^\infty g(X_s)\, ds = h(x)$ and increases up to h(x) as t ↓ 0.

Now let f be excessive but not necessarily of the form U g. In the paragraph above, replace

g by the gn that were defined in Lemma 23.1 to conclude

\[
P_t h(x) = \lim_{n \to \infty} P_t U g_n(x) \le \lim_{n \to \infty} U g_n(x) = h(x).
\]

That Pth increases up to h is proved similarly; there is no difficulty interchanging the limit

as n tends to infinity and the limit as t tends to 0 since PtU gn increases both as n increases

and as t decreases.


(2) As in the proof of (1), it suffices to consider the case where f = U g and then take

limits. By (23.2),

\[
E^x U g(X_T) = E^x \int_T^\infty g(X_s)\, ds \le E^x \int_0^\infty g(X_s)\, ds = U g(x).
\]
(3) By the Markov property,
\[
E^x[f(X_t) \mid \mathcal{F}_s] = E^{X_s} f(X_{t-s}) = P_{t-s} f(X_s) \le f(X_s).
\]

The proof is complete.

By Proposition 23.2, f (Xt ) is a supermartingale and therefore has left and right limits

along the dyadic rationals. We could take a version of f (Xt ) that is right continuous, but

there is the danger that doing so would result in a version of X that is not right continuous

with left limits. We want to have X fixed and then conclude that f (Xt ) is right continuous

with left limits without needing to take a version.

Proposition 23.3 Let (Xt, Px) be a strong Markov process. If f is excessive, then for each

x, f (Xt ) is right continuous with left limits Px almost surely.

For a proof, we refer the reader to Blumenthal and Getoor (1968), Theorem II.2.12 or to

Exercise 23.8.

Given a function g, the function G is an excessive majorant for g if G is excessive and

G ≥ g pointwise. G is the least excessive majorant for g if (1) G is an excessive majorant,

and (2) if G̃ is any other excessive majorant, then G ≤ G̃ pointwise.

It turns out, which we will prove below, that an optimal stopping time is to stop the first

time Xt leaves the set where g(x) < G(x). Therefore it is important to be able to calculate
the least excessive majorant of a function.
Here is one method of constructing the least excessive majorant. We say a function
f : S → R is lower semicontinuous if {x : f (x) > a} is an open set for every real number

a. See Exercise 23.9 for information about lower semicontinuous functions.

Proposition 23.4 Suppose that g is non-negative, bounded, and continuous and that
Assumption 20.1 holds. Let g0 = g, let $T_n = \{k/2^n : 0 \le k \le n2^n\}$, and define
\[
g_n(x) = \max_{t \in T_n} P_t g_{n-1}(x)
\]
for n = 1, 2, . . . Then gn(x) increases pointwise to G(x), the least excessive majorant of g.

Proof Since gn(x) ≥ P0gn−1(x) = E xgn−1(X0) = gn−1(x), the sequence gn(x) is increasing.

Call the limit H (x).

We first show H is lower semicontinuous. If gn−1 is bounded and continuous, then Ptgn−1

is bounded and continuous for each t by Assumption 20.1. Since the maximum of a finite

number of continuous functions is continuous, then gn is bounded and continuous. By an

induction argument, each gn is continuous. By Exercise 23.9, H is lower semicontinuous.

We next show that H is excessive. If t ∈ Tm and n ≥ m, then

\[
H(x) \ge g_n(x) \ge P_t g_{n-1}(x) = E^x g_{n-1}(X_t).
\]
Letting n tend to infinity, $H(x) \ge E^x H(X_t)$ if t ∈ Tm for some m. Now take $t_k \in \cup_m T_m$ with
tk → t. Since H is lower semicontinuous, then using Exercise 23.9 and Fatou’s lemma,
\[
H(x) \ge \liminf_{k \to \infty} E^x H(X_{t_k}) \ge E^x\big[ \liminf_{k \to \infty} H(X_{t_k}) \big] \ge E^x H(X_t).
\]

If a ∈ R, let Ea = {y : H (y) > a}, which is open. If a < H (x), then
PtH (x) = E xH (Xt ) ≥ aPx(Xt ∈ Ea) → a
as t → 0. Therefore lim inf t→0 PtH (x) ≥ a for all a < H (x), hence
\[
\liminf_{t \to 0} P_t H(x) \ge H(x),
\]
and we conclude PtH (x) → H (x) as t → 0. Thus H is excessive.
Suppose now that F is excessive and F ≥ g pointwise. If F ≥ gn−1, then
F (x) ≥ PtF (x) ≥ Ptgn−1(x) for every t ∈ Tn, hence F (x) ≥ gn(x). By an induction argument,
F (x) ≥ gn(x) for all n, hence F (x) ≥ H (x). Therefore H is the least excessive majorant of g.
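
The iteration in Proposition 23.4 has a simple discrete-time analogue, sketched below. This is an assumption-laden illustration rather than the construction in the text: the process is simple random walk on {0, ..., N} absorbed at both endpoints, the maximum over t ∈ Tn is replaced by a maximum over zero or one step of the walk, and the function names are hypothetical.

```python
from typing import List


def step(g: List[float]) -> List[float]:
    """(P g)(x) for simple random walk on {0, ..., N} absorbed at both endpoints."""
    Pg = list(g)
    for x in range(1, len(g) - 1):
        Pg[x] = 0.5 * (g[x - 1] + g[x + 1])
    return Pg


def least_excessive_majorant(
    g: List[float], tol: float = 1e-12, max_iter: int = 100_000
) -> List[float]:
    """Iterate g_n = max(g_{n-1}, P g_{n-1}) until successive iterates agree within tol."""
    current = list(g)
    for _ in range(max_iter):
        Pg = step(current)
        nxt = [max(a, b) for a, b in zip(current, Pg)]
        if max(abs(a - b) for a, b in zip(nxt, current)) <= tol:
            return nxt
        current = nxt
    return current


# Usage: a reward concentrated at the midpoint of {0, ..., 4}.
G = least_excessive_majorant([0.0, 0.0, 3.0, 0.0, 0.0])
```

As in the proof, the iterates are increasing and every excessive majorant of g dominates each of them, so the limit is the least one. In this example the limit is reached after a single step.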
In one case, at least, finding the least excessive majorant is easy. Suppose we have a one-
dimensional Brownian motion killed on leaving an interval [a, b] and a non-negative function
g defined on [a, b]. Then the least excessive majorant is the smallest concave function G that
is larger than or equal to g everywhere. To see this, if G is the smallest concave function, by
Jensen’s inequality
PtG(x) = E xG(Xt ) ≤ G(E xXt ) ≤ G(x).
Because G is concave, it is continuous, and so PtG(x) = E xG(Xt ) → G(x) as t → 0.
Therefore G is excessive. If G̃ is another excessive function larger than g and a ≤ c < x <
d ≤ b, we have $\widetilde{G}(x) \ge E^x \widetilde{G}(X_S)$, where S is the first time the process leaves [c, d], by
Proposition 23.2(2). Since X is equal to a Brownian motion up to time S, we know the exact
distribution of XS; see Proposition 3.16. Therefore
\[
\widetilde{G}(x) \ge E^x \widetilde{G}(X_S) = \frac{d - x}{d - c}\, \widetilde{G}(c) + \frac{x - c}{d - c}\, \widetilde{G}(d).
\]
Rearranging this inequality shows that G̃ is concave. Recall that the minimum of two concave
functions is concave, so G ∧ G̃ is a concave function larger than g that is less than or equal
to G. But G is the smallest concave function larger than or equal to g, hence G = G ∧ G̃, or
G ≤ G̃. Thus G is the least excessive majorant of g.
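
On a grid, the smallest concave function lying above g can be computed as the upper convex hull of the points (x_i, g(x_i)). The sketch below assumes g is known only at finitely many increasing grid points and interpolates linearly between hull vertices; the function name is hypothetical, and the result is the least concave majorant of the sampled values, not of g itself.

```python
from typing import List, Sequence


def least_concave_majorant(xs: Sequence[float], gs: Sequence[float]) -> List[float]:
    """Values at xs of the smallest concave function dominating the points (xs[i], gs[i]).

    xs must be strictly increasing.
    """
    hull: List[int] = []  # indices of the upper-hull vertices, left to right
    for i in range(len(xs)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # cross >= 0 means vertex b lies on or below the chord from a to i.
            cross = (xs[b] - xs[a]) * (gs[i] - gs[a]) - (gs[b] - gs[a]) * (xs[i] - xs[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)

    G = [float(v) for v in gs]
    for a, b in zip(hull, hull[1:]):
        for i in range(a, b + 1):
            w = (xs[i] - xs[a]) / (xs[b] - xs[a])  # linear interpolation weight
            G[i] = (1 - w) * gs[a] + w * gs[b]
    return G


# Usage: the same reward as a tent-shaped example on {0, ..., 4}.
G = least_concave_majorant([0, 1, 2, 3, 4], [0.0, 0.0, 3.0, 0.0, 0.0])
```

For the reward in the usage line the result is the tent function with peak 3 at the midpoint, in line with the description of the least excessive majorant as the smallest concave function above g.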
23.2 Solving the optimal stopping problem
Now let us turn to proving that an optimal stopping time can be given in terms of the least
excessive majorant. For simplicity we will suppose that g is non-negative, continuous, and
bounded. We will assume that our Markov process and g are such that a least excessive
majorant G exists. Let g∗ be the optimal reward:
g∗(x) = sup{E xg(XT ) : T a stopping time}.
Let D = {x : g(x) < G(x)}, the continuation region and let τD = inf{t : Xt /∈ D}.
Theorem 23.5 Let (Xt, Px) be a strong Markov process and g, g∗, G, and D as above. If
τD < ∞, Px-a.s., then g∗(x) = G(x) = E xg(XτD ).
In other words, an optimal stopping time is to stop the first time the process hits {x :
G(x) = g(x)}.
Proof Let Dε = {x : g(x) < G(x) − ε}, and write τε for $\tau_{D_\varepsilon}$. Let $H_\varepsilon(x) = E^x[G(X_{\tau_\varepsilon})]$,
which is excessive by Proposition 23.2(1).
The first step of the proof is to prove (23.3) below. Second, we prove G(x) ≤ g∗(x). The
third step is to prove that G(x) = g∗(x) and the fourth that g∗(x) = E xg(XτD ).
Step 1. Let ε > 0. We claim

g(x) ≤ Hε(x) + ε, x ∈ D. (23.3)

To prove this, we suppose not, that is, we let

\[
b = \sup_{x \in D}\, (g(x) - H_\varepsilon(x))
\]

and suppose b > ε. Choose η < ε, and then choose x0 such that
g(x0) − Hε(x0) ≥ b − η. (23.4)
Since Hε + b is an excessive majorant of g by the definition of b, and G is the least excessive
majorant, then
G(x0) ≤ Hε(x0) + b. (23.5)
From (23.4) and (23.5) we conclude
G(x0) ≤ g(x0) + η. (23.6)
By the Blumenthal 0–1 law (Proposition 20.8), either τε is strictly positive with Px0
probability one or else zero with Px0 probability one. In the first case, for each t > 0,

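\[
\begin{aligned}
g(x_0) + \eta &\ge G(x_0) \\
&\ge E^{x_0}[G(X_{t \wedge \tau_\varepsilon})] \\
&\ge E^{x_0}[g(X_t) + \varepsilon; \tau_\varepsilon > t].
\end{aligned}
\]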

The first inequality is (23.6), the second is due to G being excessive, and the third because

G > g + ε up until the time τε. If we let t → 0 and use the fact that g is continuous, we get

g(x0) + η ≥ g(x0) + ε, a contradiction to the way we chose η.

In the second case, where τε = 0 with Px0 -probability one, we have

Hε(x0) = E x0 G(Xτε ) = E x0 G(X0) = G(x0) ≥ g(x0) ≥ Hε(x0) + b − η,

a contradiction since we chose η < b.
In either case we reach a contradiction, so (23.3) must hold.
Step 2. A conclusion we reach from (23.3) is that Hε + ε is an excessive majorant of g.
Therefore
G(x) ≤ Hε(x) + ε (23.7)
= E x[G(Xτε )] + ε
≤ E x[g(Xτε ) + ε] + ε
≤ g∗(x) + 2ε.
The first inequality holds because G is the least excessive majorant, the second inequality
because g(Xτε ) + ε = G(Xτε ) by the definition of τε, and the third by the definition of g∗.
Since ε is arbitrary, we see that G(x) ≤ g∗(x).
Step 3. For any stopping time T , because G is excessive and majorizes g,
G(x) ≥ E xG(XT ) ≥ E xg(XT ).
Taking the supremum over all stopping times T , G(x) ≥ g∗(x), and therefore G(x) = g∗(x).
Step 4. Because τD is finite almost surely, the continuity of g tells us that
$E^x g(X_{\tau_\varepsilon}) \to E^x g(X_{\tau_D})$ as ε → 0. By the definition of g∗, we know that
$E^x g(X_{\tau_\varepsilon}) \le g^*(x)$.
On the other hand, by the definitions of τε and Hε,
\[
E^x g(X_{\tau_\varepsilon}) = E^x G(X_{\tau_\varepsilon}) - \varepsilon = H_\varepsilon(x) - \varepsilon.
\]
By the first inequality in (23.7), the right-hand side is greater than or equal to G(x) − 2ε =
g∗(x) − 2ε. Letting ε → 0 we obtain
\[
E^x g(X_{\tau_D}) \ge g^*(x)
\]
as desired.
The following two corollaries are useful in applications.
Corollary 23.6 Suppose there exists a Borel set A such that h is an excessive majorant of g,
where h(x) = E xg(XτA ) and τA = inf{t : Xt /∈ A}. Then g∗(x) = h(x).
Proof Let G be the least excessive majorant of g. Then h(x) ≥ G(x). However,
\[
h(x) = E^x g(X_{\tau_A}) \le \sup_T E^x g(X_T) = g^*(x) = G(x)
\]
by Theorem 23.5.
Corollary 23.7 Suppose g is continuous and G, the least excessive majorant of g, is lower
semicontinuous. Let D be the continuation region, suppose τD < ∞, a.s., and let
$h(x) = E^x g(X_{\tau_D})$. If h ≥ g, then h = g∗.
Proof Note $D = \{x : g(x) < G(x)\} = \cup_{a<b} [(g < a) \cap (G > b)]$, where the union is
over all pairs of real numbers a < b. Since G is lower semicontinuous and g is continuous,
then D is open. This implies $X_{\tau_D} \notin D$, and so $g(X_{\tau_D}) \ge G(X_{\tau_D})$, a.s. Since g ≤ G, we see that
\[
h(x) = E^x g(X_{\tau_D}) = E^x G(X_{\tau_D}).
\]
Since G is excessive, then h is also excessive by Proposition 23.2. Therefore h is an excessive
majorant of g and we can apply Corollary 23.6.
Exercises
23.1 Show that if f is excessive, then 1 − e− f is excessive. Thus, for some purposes it is enough to
look at bounded excessive functions.
23.2 Show that if f and g are excessive, then f ∧ g is excessive.
23.3 Let At be an additive functional (defined in (22.4)) and let f (x) = E xA∞. Show that f is
excessive.
23.4 Let f be an excessive function for a strong Markov process (Xt , Px). Let ε > 0 and S1 = inf{t :
| f (Xt ) − f (X0)| ≥ ε}. Let Si+1 = Si + S1 ◦ θSi . Prove that f (XSi ) is a supermartingale with
respect to the σ -fields FSi and with respect to Px for each x.
23.5 For each n, let $A^n_t$ be an additive functional with continuous paths and suppose that
$f_n(x) = E^x A^n_\infty$ is finite for every x. Suppose At is a continuous additive functional with
$f(x) = E^x A_\infty$ also finite for each x. Suppose fn converges to f uniformly. Prove that for each x,
with Px-probability one, $A^n_t$ converges to At , uniformly over t ≥ 0.
Hint: Use Proposition 9.11.
23.6 Suppose f is bounded and excessive, λ ≥ 0, and A = {y : f (y) ≤ λ}. Prove that if x ∈ Ar (i.e.,
x is regular for A), then f (x) ≤ λ.
Hint: Use the optional section theorem (Theorem 16.12) to find stopping times Tm whose
graphs are contained in {(t, ω) : t ≤ 1/m, f (Xt ) ≤ λ} with Px-probability at least 1 − (1/m).
If the gn are as in Lemma 23.1, write
\[
\begin{aligned}
U g_n(x) &= E^x \int_0^{T_m} g_n(X_s)\, ds + E^x U g_n(X_{T_m}) \\
&\le E^x \int_0^{T_m} g_n(X_s)\, ds + E^x f(X_{T_m}) \\
&\le E^x \int_0^{T_m} g_n(X_s)\, ds + \lambda + \|f\|_\infty/m.
\end{aligned}
\]
Let m → ∞, then n → ∞.
23.7 Suppose f is bounded and excessive, λ ≥ 0, and B = {y : f (y) ≥ λ}. Prove that if $x \in B^r$, then
f (x) ≥ λ.
Hint: Use the optional section theorem as in Exercise 23.6 to find stopping times Rm
analogous to the Tm. Write
\[
f(x) \ge E^x f(X_{R_m}) \ge \lambda - \|f\|_\infty/m,
\]
and then let m → ∞.
23.8 (1) Suppose f is bounded and excessive, x ∈ S, ε > 0, and C = {y : | f (y) − f (x)| ≥ ε}. Use
Exercises 23.6 and 23.7 to show that if z ∈ Cr, then | f (z) − f (x)| ≥ ε.
(2) Let f , ε, and x be as in (1) and set S = inf{t > 0 : | f (Xt ) − f (x)| ≥ ε}. Use Exercise
20.9 to show that | f (XS ) − f (x)| ≥ ε with Px-probability one.
(3) Let f , ε, x, and S be as in (2). Define $S_0 = 0$ and $S_{i+1} = S_i + S \circ \theta_{S_i}$. By Exercise 23.4,
f (XSi ) is a positive supermartingale. Use Corollary A.36 to show Si → ∞, Px-a.s. Deduce that
with Px-probability one, f (Xt ) has paths that are right continuous with left limits.
(4) Use Exercise 23.1 to show that if f is excessive but not necessarily bounded, then f (Xt )
has paths that are right continuous with left limits.
23.9 (1) Show that every continuous function is lower semicontinuous.
(2) Show that if f is lower semicontinuous and x ∈ S, then
\[ \liminf_{y \to x} f(y) \ge f(x). \]
(3) Show that if fn is a sequence of continuous functions increasing to f , then f is lower
semicontinuous.


23.10 Suppose g is non-negative, bounded, and continuous, and Assumption 20.1 holds. Let g0 = g

and define gn(x) = supt≥0 Ptgn−1(x) for n ≥ 1. Prove that gn increases to the least excessive

majorant of g.

Notes

See Øksendal (2003) for further information on optimal stopping.

Exercise 23.3 shows that E xA∞ is an excessive function if A is an additive functional. To

a large extent the converse is true: given an excessive function f and some mild conditions,

there exists an additive functional A such that f (x) = E xA∞ for all x. The proof is a

modification of the Doob–Meyer decomposition of f (Xt ) that takes into account the fact

there is a family of probabilities instead of just one; see Blumenthal and Getoor (1968).

The optimal stopping problem involving American puts has a theoretical solution: look

at the least excessive majorant for a certain reward function. The reward function is not just

(K − s)+ because the interest earned on the money obtained after the sale of a share of

stock needs to be taken into account. Moreover, the excessive functions here are relative to

the space-time process (St, t), not those relative to St . Finding a satisfactory solution to this

optimal stopping problem is still open and is important.

24

Stochastic differential equations

Stochastic differential equations are used in modeling a wide variety of physical and economic

situations, and are one of the main reasons for the interest in stochastic integrals.

We consider stochastic differential equations (SDEs) of the form

dXt = σ (Xt ) dWt + b(Xt ) dt,

where σ and b are real-valued functions and W is a one-dimensional Brownian motion. We

also consider multidimensional analogs of this equation. If X represents the position of a

particle, the σ (Xt ) dWt term says that the particle X diffuses like a multiple of Brownian

motion, but how strong the diffusivity is depends on the location of the particle. The b(Xt ) dt

term represents a push in one direction or another, the size of the push depending on the

location of the particle.
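To make the roles of the two terms concrete, here is a minimal simulation sketch. It uses the standard Euler-Maruyama discretization, which is not introduced in the text; the function name `euler_maruyama`, the step count, and the particular coefficients in the usage lines are assumptions chosen only for illustration.

```python
import math
import random

def euler_maruyama(sigma, b, x0, t_end, n_steps, rng):
    """Approximate one path of dX = sigma(X) dW + b(X) dt on [0, t_end]."""
    h = t_end / n_steps
    x = x0
    path = [x0]
    for _ in range(n_steps):
        # Brownian increment over a time step of length h: N(0, h).
        dw = rng.gauss(0.0, math.sqrt(h))
        # Diffusion term scaled by sigma at the current location, plus the push b.
        x = x + sigma(x) * dw + b(x) * h
        path.append(x)
    return path

rng = random.Random(0)
# Hypothetical coefficients: location-dependent diffusivity, drift toward 0.
path = euler_maruyama(lambda x: 1.0 + 0.5 * math.sin(x), lambda x: -x,
                      1.0, 1.0, 1000, rng)
```

Setting the diffusion coefficient to zero reduces the scheme to the ordinary Euler method for dx = b(x) dt, which gives a simple sanity check.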

24.1 Pathwise solutions of SDEs

Let Wt be a one-dimensional Brownian motion with respect to a filtration {Ft} satisfying the

usual conditions; see Chapter 1. We want to consider SDEs of the form

dXt = σ (Xt ) dWt + b(Xt ) dt, X0 = x0. (24.1)

This means that Xt satisfies the equation

\[ X_t = x_0 + \int_0^t \sigma(X_s)\,dW_s + \int_0^t b(X_s)\,ds, \qquad t \ge 0. \tag{24.2} \]

Here σ and b are Borel measurable functions, the first integral in (24.2) is a stochastic integral

with respect to the Brownian motion Wt , and (24.2) holds almost surely, that is, we can find

versions of $\int_0^t \sigma(X_s)\,dW_s$ such that for almost all ω, (24.2) holds for all t. In order to be able

to define the stochastic integral, we require that any solution Xt to (24.2) be adapted to the

filtration {Ft}. If X satisfies (24.2), then X will automatically have continuous paths. We

want to consider existence and uniqueness of solutions to the equation (24.2).

Definition 24.1 A stochastic process X will be a pathwise solution to (24.1) if X is adapted

to the filtration {Ft} and (24.2) holds almost surely, where the null set does not depend on t.

We say the solution to (24.1) is pathwise unique if whenever X ′t is another solution, then

P(Xt ≠ X ′t for some t ≥ 0) = 0. (24.3)

Sometimes pathwise uniqueness is used for a slightly stronger concept: one can let W be

a Brownian motion with respect to each of two filtrations {Ft} and {F ′t }, which are possibly


different, and one can let X ′t be adapted to {F ′t }. One then requires (24.3) to hold. We won’t

need to use this modification of the definition, and in any case our proof of uniqueness will

be equally valid in this situation.

The function σ in (24.1) is called the diffusion coefficient and the function b is called the

drift coefficient. σ tells us the intensity of the noise at a point, and b tells us if there is a push

in any direction at a given point.

We will suppose that σ and b are Lipschitz functions: there exists a constant c such that

|σ (x) − σ (y)| ≤ c|x − y|, |b(x) − b(y)| ≤ c|x − y|. (24.4)

We also suppose for now that σ and b are bounded.

Theorem 24.2 Suppose σ and b are bounded Lipschitz functions. Then there exists a path-

wise solution to (24.1) and this solution is pathwise unique.

Proof Existence. Let X0(t) = x0 for all t and define Xi(t) recursively by

\[ X_{i+1}(t) = x_0 + \int_0^t \sigma(X_i(s))\,dW_s + \int_0^t b(X_i(s))\,ds. \tag{24.5} \]

Note that X0(t) is trivially adapted to {Ft}, and an induction argument shows that Xi is adapted

to {Ft} for each i.

Fix t0. We will show existence (and uniqueness) up to time t0; since t0 is arbitrary, this will

achieve the theorem.

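Since $(x + y)^2 \le 2x^2 + 2y^2$, then

\[
\begin{aligned}
E \sup_{r\le t} |X_{i+1}(r) - X_i(r)|^2
&= E\Big[\sup_{r\le t}\Big(\int_0^r [\sigma(X_i(s)) - \sigma(X_{i-1}(s))]\,dW_s + \int_0^r [b(X_i(s)) - b(X_{i-1}(s))]\,ds\Big)^2\Big] \\
&\le 2E\Big[\sup_{r\le t}\Big(\int_0^r [\sigma(X_i(s)) - \sigma(X_{i-1}(s))]\,dW_s\Big)^2\Big] \\
&\qquad + 2E\Big[\sup_{r\le t}\Big(\int_0^r [b(X_i(s)) - b(X_{i-1}(s))]\,ds\Big)^2\Big].
\end{aligned}
\]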

By Doob’s inequalities (Theorem 3.6) and the fact that σ is a Lipschitz function, the first

term after the inequality is bounded by

\[
\begin{aligned}
8E\Big[\Big(\int_0^t [\sigma(X_i(s)) - \sigma(X_{i-1}(s))]\,dW_s\Big)^2\Big]
&= 8E\int_0^t [\sigma(X_i(s)) - \sigma(X_{i-1}(s))]^2\,ds \\
&\le cE\int_0^t |X_i(s) - X_{i-1}(s)|^2\,ds.
\end{aligned}
\]

By the Cauchy–Schwarz inequality, the fact that t ≤ t0, and the fact that b is a Lipschitz

function, the second term is bounded by

\[
\begin{aligned}
2E\Big(\int_0^t |b(X_i(s)) - b(X_{i-1}(s))|\,ds\Big)^2
&\le 2t_0 E\int_0^t |b(X_i(s)) - b(X_{i-1}(s))|^2\,ds \\
&\le cE\int_0^t |X_i(s) - X_{i-1}(s)|^2\,ds.
\end{aligned}
\]


Therefore

\[ E\sup_{r\le t}|X_{i+1}(r) - X_i(r)|^2 \le cE\int_0^t |X_i(s) - X_{i-1}(s)|^2\,ds. \tag{24.6} \]

Let $g_i(t) = E \sup_{r\le t} |X_i(r) - X_{i-1}(r)|^2$. Provided we choose A large enough, $g_1(t) \le A$

for t ≤ t0 and, by (24.6),

\[ g_{i+1}(t) \le A\int_0^t g_i(s)\,ds, \qquad t \le t_0. \]

(Clearly $|X_{i+1}(t) - X_i(t)|^2 \le \sup_{r\le t}|X_{i+1}(r) - X_i(r)|^2$.) Thus

\[
\begin{aligned}
g_2(t) &\le A\int_0^t g_1(s)\,ds \le A\int_0^t A\,ds = A^2 t, \\
g_3(t) &\le A\int_0^t g_2(s)\,ds \le A\int_0^t A^2 s\,ds = A^3 t^2/2,
\end{aligned}
\]

and continuing by induction,

\[ g_i(t) \le A^i t^{i-1}/(i-1)!. \]

Exercise 24.1 asks you to show that if we define

\[ \|Y\|_t = \Big(E\sup_{r\le t}|Y_r|^2\Big)^{1/2} \tag{24.7} \]

when Y is a stochastic process, then ‖Y ‖t is a norm and the corresponding metric is complete.

Hence

\[
\Big(E\sup_{s\le t_0}|X_n(s) - X_m(s)|^2\Big)^{1/2} = \|X_n - X_m\|_{t_0}
\le \sum_{i=m}^{n-1} \|X_{i+1} - X_i\|_{t_0}
\le \sum_{i=m}^{n-1} (g_{i+1}(t_0))^{1/2}
\]

can be made small by taking m, n large. (We use the ratio test to show that the sum $\sum_i (A^i t_0^{i-1}/(i-1)!)^{1/2}$ converges.) We have $E X_0(t)^2 < \infty$. By the completeness of $\|\cdot\|_{t_0}$
there exists Xt such that $E\sup_{s\le t_0}|X_n(s) - X_s|^2 \to 0$ as n → ∞. This implies there exists
a subsequence {nj} such that $\sup_{s\le t_0}|X_{n_j}(s) - X_s|^2 \to 0$ almost surely; since each $X_{n_j}$ is
continuous in t, then Xt is also. Taking a limit in (24.5) as n → ∞ shows Xt satisfies (24.2).
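The ratio test used in the existence argument can be spelled out in one line. The notation $a_i$ below is introduced only for this check and does not appear in the text.

```latex
\[
a_i = \Big(\frac{A^i t_0^{\,i-1}}{(i-1)!}\Big)^{1/2},
\qquad
\frac{a_{i+1}}{a_i} = \Big(\frac{A\,t_0}{i}\Big)^{1/2} \longrightarrow 0
\quad (i \to \infty),
\]
```

so $\sum_i a_i < \infty$, and the tail sums $\sum_{i \ge m} a_i$ tend to 0 as m → ∞, which is what makes the sequence $X_n$ Cauchy in the norm (24.7).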
Uniqueness. Suppose Xt and X ′t are two solutions to (24.2). Let
\[ g(t) = E\sup_{r\le t}|X_r - X'_r|^2. \]
Very similarly to the existence proof, E supr≤t |Xr|2 < ∞, the same with X replaced by X ′,
and
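\[
\begin{aligned}
E\sup_{r\le t}|X_r - X'_r|^2
&\le 2E\Big[\sup_{r\le t}\Big(\int_0^r [\sigma(X_s) - \sigma(X'_s)]\,dW_s\Big)^2\Big]
 + 2E\Big[\sup_{r\le t}\Big(\int_0^r [b(X_s) - b(X'_s)]\,ds\Big)^2\Big] \\
&\le cE\int_0^t |X_s - X'_s|^2\,ds.
\end{aligned}
\]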
Therefore there exists A > 0 such that g(t) is bounded by A and $g(t) \le A\int_0^t g(s)\,ds$.

Then $g(t) \le A\int_0^t A\,ds = A^2 t$, $g(t) \le A\int_0^t A^2 s\,ds = A^3 t^2/2$, etc. Thus we have

$g(t) \le A^i t^{i-1}/(i-1)!$ for all i, which is only possible if g(t) = 0. This implies that

Xt = X ′t for all t ≤ t0, except for a null set.

We also want to consider the SDE (24.1) when σ and b are Lipschitz functions, but not

necessarily bounded. Note |σ (x)| ≤ |σ (0)| + c|x|, so that |σ (x)| is less than or equal to

c(1 + |x|), and the same for b.

Theorem 24.3 Suppose σ and b are Lipschitz functions, but not necessarily bounded. Then

there exists a pathwise solution to (24.1) and this solution is pathwise unique.

Proof Let σn and bn be bounded Lipschitz functions that agree with σ and b, respectively,

on [−n, n]. Let Xn be the unique pathwise solution to (24.1) with σ and b replaced by σn and

bn, respectively. Let Tn = inf{t : |Xn(t)| ≥ n}. We claim Xn(t) = Xm(t) if t ≤ Tn ∧ Tm; to

prove this, let $g(t) = E\sup_{s\le t\wedge T_n\wedge T_m}|X_n(s) - X_m(s)|^2$, and proceed as in the uniqueness part

of the proof of Theorem 24.2. We then have existence and uniqueness of the SDE for t ≤ Tn

for each n.

To complete the proof, it suffices to show Tn → ∞. Let

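\[ h_n(t) = E\sup_{s\le t\wedge T_n}|X_n(s)|^2. \]

Then

\[
\begin{aligned}
h_n(t) &\le c|x_0|^2 + cE\Big(\int_0^t \sigma_n(X_n(s))\,dW_s\Big)^2 + cE\int_0^t b_n(X_n(s))^2\,ds \\
&\le c|x_0|^2 + cE\int_0^t \sigma_n(X_n(s))^2\,ds + ct_0E\int_0^t b_n(X_n(s))^2\,ds \\
&\le c|x_0|^2 + c + cE\int_0^t |X_n(s)|^2\,ds \\
&\le c + c\int_0^t h_n(s)\,ds,
\end{aligned}
\]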

using estimates very similar to those of the proof of Theorem 24.2. By Exercise 24.2,

$h_n(t) \le ce^{ct}$ if t ≤ t0. Note the constant c can be chosen to be independent of n. Then

\[
P(T_n < t_0) = P\Big(\sup_{s\le t_0}|X_n(s)| \ge n\Big) \le \frac{E\sup_{s\le t_0}|X_n(s)|^2}{n^2} \le \frac{h_n(t_0)}{n^2} \to 0
\]
as n → ∞. Since t0 is arbitrary, Tn → ∞, a.s.
Although we considered one-dimensional SDEs for simplicity, the same arguments apply
when we have higher-dimensional SDEs. Let
\[ W = (W^1, \dots, W^d) \]
be a d-dimensional Brownian motion, let σi j(x) be bounded Lipschitz functions for i =
1, . . . , n and j = 1, . . . , d, and let bi(x) be bounded Lipschitz functions for i = 1, . . . , n.
Consider the system of equations
\[ dX^i_t = \sum_{j=1}^d \sigma_{ij}(X_t)\,dW^j_t + b_i(X_t)\,dt, \qquad i = 1, \dots, n. \tag{24.8} \]
This is frequently written in matrix form
\[ dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \tag{24.9} \]
where we view $X = (X^1, \dots, X^n)$ as an n × 1 matrix, $b = (b_1, \dots, b_n)$ as an n × 1 matrix-valued
function, W as a d × 1 matrix, and σ as an n × d matrix-valued function. We have
existence and uniqueness to the system (24.8). Exercise 24.5 asks you to prove this in the
case when n = d, although there is nothing at all special about requiring n = d.
24.2 One-dimensional SDEs
Although our proof of pathwise existence and uniqueness was for SDEs in one dimension,
as is pointed out in Exercise 24.5, almost the same proof works in higher dimensions. In
this section we look at a pathwise uniqueness result that is valid only for SDEs on R. The
equation we look at is the same as the one in the last section, namely,
\[ X_t = x_0 + \int_0^t \sigma(X_s)\,dW_s + \int_0^t b(X_s)\,ds. \tag{24.10} \]
Theorem 24.4 Suppose b is bounded and Lipschitz. Suppose there exists a continuous
function ρ : [0, ∞) → [0, ∞) such that ρ(0) = 0,
\[ \int_0^{\varepsilon} \rho^{-2}(u)\,du = \infty \tag{24.11} \]
for all ε > 0, and σ is bounded and satisfies

|σ (x) − σ (y)| ≤ ρ(|x − y|)

for all x and y. Then the solution to (24.10) is pathwise unique.

For an example, let b(x) = 0 for all x, and let σ be Hölder continuous of order α, that is,

there exists c such that $|\sigma(x) - \sigma(y)| \le c|x - y|^{\alpha}$. Then we take $\rho(x) = cx^{\alpha}$, and the integral

condition in the theorem is satisfied if and only if α ≥ 1/2. If (24.11) holds for all ε > 0,

we say the Yamada–Watanabe condition holds.
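The threshold α = 1/2 in this example can be checked by computing the integral in (24.11) directly. The display below is a worked computation under the assumption $\rho(u) = c\,u^{\alpha}$ with c > 0 and 0 < α ≤ 1; the constant c does not affect whether the integral diverges.

```latex
\[
\int_0^{\varepsilon} \rho(u)^{-2}\,du
  = c^{-2}\int_0^{\varepsilon} u^{-2\alpha}\,du
  = \begin{cases}
      \dfrac{c^{-2}\,\varepsilon^{1-2\alpha}}{1-2\alpha} < \infty, & 0 < \alpha < 1/2, \\[1.5ex]
      \infty, & \alpha \ge 1/2.
    \end{cases}
\]
```

At α = 1/2 the integrand is $c^{-2}/u$, whose integral near 0 diverges logarithmically, so the borderline case is included.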

Instead of proving this theorem right away and then essentially repeating the proof to give

a comparison theorem, we will state and prove a comparison theorem (Theorem 24.5) and

then obtain Theorem 24.4 as a corollary of Theorem 24.5.

We only prove the uniqueness of the solution to (24.10) here. The existence is a conse-

quence of some measure-theoretic magic; see Revuz and Yor (1999), Theorem IX.1.7.


Theorem 24.5 Suppose σ satisfies the conditions in Theorem 24.4. Suppose Xt satisfies

(24.10) with b a Lipschitz function. Suppose Yt is a continuous semimartingale satisfying

\[ Y_t \ge Y_0 + \int_0^t \sigma(Y_s)\,dW_s + \int_0^t B(Y_s)\,ds, \]

where B is a Borel measurable function and B(z) ≥ b(z) for all z. If Y0 ≥ x0, a.s., then Yt ≥ Xt

almost surely for all t.

Proof Let an ↓ 0 be selected so that

\[ \int_{a_n}^{a_{n-1}} (\rho(u))^{-2}\,du = n. \]

This can be done inductively. Choose a0 arbitrarily. Since $\int_r^{a_0} \rho(x)^{-2}\,dx$ increases to infinity as r → 0, we can choose a1 such that $\int_{a_1}^{a_0} \rho(x)^{-2}\,dx = 1$; in a similar manner we choose a2, a3, . . .. Let hn be continuous, supported in (an, an−1), with $0 \le h_n(u) \le 2/(n\rho^2(u))$ and $\int_{a_n}^{a_{n-1}} h_n(u)\,du = 1$ for each n. The idea here is to start with the function $(1 + \varepsilon)1_{(a_n, a_{n-1})}(u)/(n\rho(u)^2)$ for some small ε, and then modify this near an and an−1 to get a function that is continuous, is supported in (an, an−1), and integrates to 1. Let fn be such that $f_n(0) = f_n'(0) = 0$ and $f_n'' = h_n$. Note

\[ f_n'(u) = \int_0^u f_n''(s)\,ds = \int_0^u h_n(s)\,ds \le 1 \]

and f ′n(u) ≥ 0, so 0 ≤ f ′n(u) ≤ 1 and f ′n(u) = 1 if u ≥ an−1. Hence fn(u) ↑ u as n → ∞ for

each u ≥ 0.

Since x0 ≤ Y0, then fn(X0 − Y0) = 0, and we have by Itô’s formula

\[
\begin{aligned}
f_n(X_t - Y_t) = {} & \text{martingale} + \int_0^t f_n'(X_s - Y_s)[b(X_s) - B(Y_s)]\,ds \\
& + \tfrac12\int_0^t f_n''(X_s - Y_s)[\sigma(X_s) - \sigma(Y_s)]^2\,ds.
\end{aligned}
\tag{24.12}
\]

We take expectations of both sides. The martingale term has 0 expectation. The final term

on the right-hand side is bounded in expectation by

\[ \tfrac12 E\int_0^t \frac{2}{n(\rho(|X_s - Y_s|))^2}\,(\rho(|X_s - Y_s|))^2\,ds \le \frac{t}{n} \]

by the assumptions on σ and the bound on f ′′n = hn, and so goes to 0 as n → ∞. The

expectation of the second term on the right of (24.12) is bounded above by

\[
\begin{aligned}
E\int_0^t f_n'(X_s - Y_s)[b(X_s) - b(Y_s)]\,ds &+ E\int_0^t f_n'(X_s - Y_s)[b(Y_s) - B(Y_s)]\,ds \\
&\le cE\int_0^t 1_{[0,\infty)}(X_s - Y_s)\,|X_s - Y_s|\,ds \\
&= cE\int_0^t (X_s - Y_s)^+\,ds.
\end{aligned}
\]


Letting n → ∞,

\[ E(X_t - Y_t)^+ \le c\int_0^t E(X_s - Y_s)^+\,ds. \]

If we set $g(t) = E(X_t - Y_t)^+$, we have

\[ g(t) \le c\int_0^t g(s)\,ds, \]

and by Exercise 24.2 we conclude g(t) = 0 for each t. Using the continuity of the paths of

Xt and Yt completes the proof.

We now prove Theorem 24.4.

Proof of Theorem 24.4 Suppose X and X ′ are two solutions to (24.10). Then by Theorem

24.5 with Y = X ′ and B = b, we have Xt ≤ X ′t for all t. Applying this argument with X and

X ′ reversed yields X ′t ≤ Xt for all t, which completes the proof.

24.3 Examples of SDEs

Ornstein–Uhlenbeck process

The Ornstein–Uhlenbeck process is the solution to the SDE

\[ dX_t = dW_t - \frac{X_t}{2}\,dt, \qquad X_0 = x. \tag{24.13} \]

The existence and uniqueness follow by Theorem 24.3. Note that the drift coefficient is not

bounded, so Theorem 24.2 is not sufficient. The process behaves like a Brownian motion,

with a drift that pushes the process towards the origin; the farther the process gets from the

origin, the stronger the push.

The equation (24.13) can be solved explicitly. Rearranging, multiplying by $e^{t/2}$, and using

the product rule,

\[ d[e^{t/2}X_t] = e^{t/2}\,dX_t + e^{t/2}\frac{X_t}{2}\,dt = e^{t/2}\,dW_t, \]

so

\[ e^{t/2}X_t = X_0 + \int_0^t e^{s/2}\,dW_s, \]

or

\[ X_t = e^{-t/2}x + e^{-t/2}\int_0^t e^{s/2}\,dW_s. \tag{24.14} \]

We used here the fact that the martingale part of the semimartingale $Z_t = e^{t/2}$ is zero, and

therefore $\langle Z, W\rangle_t = 0$. By Exercise 24.6, Xt is a Gaussian process and the distribution of Xt

is that of a normal random variable with mean $e^{-t/2}x$ and variance equal to $e^{-t}\int_0^t (e^{s/2})^2\,ds = 1 - e^{-t}$.

If we let $Y_t = \int_0^t e^{s/2}\,dW_s$ and $V_t = Y_{\log(t+1)}$, then Yt is a mean-zero continuous Gaussian

process with independent increments, and hence so is Vt . Since

\[ \mathrm{Var}(V_u - V_t) = \int_{\log(t+1)}^{\log(u+1)} e^{s}\,ds = u - t, \]


then Vt is a Brownian motion. Hence

\[ X_t = e^{-t/2}x + e^{-t/2}\,V(e^{t} - 1). \]

This representation of an Ornstein–Uhlenbeck process in terms of a Brownian motion is

useful for, among other things, calculating the exit probabilities of a square root boundary.
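The Gaussian law read off from (24.14) can be turned into an exact sampler. The sketch below is illustrative only: the function names are hypothetical, and composing several transitions assumes that the equation can be restarted from the current value at an intermediate time, so that each step has the same Gaussian form with x replaced by the current state.

```python
import math
import random

def ou_mean_var(x, t):
    """Mean and variance of X_t for (24.13) started at X_0 = x, from (24.14)."""
    return math.exp(-t / 2.0) * x, 1.0 - math.exp(-t)

def ou_step(x, h, rng):
    """Draw the value a time h later, given the current value x."""
    mean, var = ou_mean_var(x, h)
    return mean + math.sqrt(var) * rng.gauss(0.0, 1.0)

def ou_sample(x, t, n_steps, rng):
    """Sample X_t by composing n_steps exact transitions of length t / n_steps."""
    for _ in range(n_steps):
        x = ou_step(x, t / n_steps, rng)
    return x

rng = random.Random(0)
samples = [ou_sample(2.0, 1.0, 4, rng) for _ in range(20000)]
emp_mean = sum(samples) / len(samples)
emp_var = sum((s - emp_mean) ** 2 for s in samples) / len(samples)
```

The empirical mean and variance of `samples` should be close to the values returned by `ou_mean_var(2.0, 1.0)`, whichever number of intermediate steps is used.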

Linear equations

We consider the linear equation

dXt = AXt dWt + BXt dt, X0 = x0, (24.15)

where A and B are constants. One place this comes up is in models of stock prices in financial

mathematics; see Chapter 28. We have pathwise existence and uniqueness by Theorem 24.3;

here both the diffusion and drift coefficients are unbounded.

We will give a candidate for the solution, and verify that it solves (24.15). By the pathwise

uniqueness, this will then be the only solution. Our candidate is

\[ X_t = x_0 e^{AW_t + (B - A^2/2)t}. \]

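To verify that this is a solution, we use Itô’s formula with the process $AW_t + (B - A^2/2)t$ and

the function $x_0 e^{x}$:

\[
\begin{aligned}
X_t &= x_0 + \int_0^t x_0 e^{AW_s + (B - A^2/2)s} A\,dW_s + \int_0^t x_0 e^{AW_s + (B - A^2/2)s}(B - A^2/2)\,ds \\
&\qquad + \tfrac12\int_0^t x_0 e^{AW_s + (B - A^2/2)s} A^2\,ds \\
&= x_0 + \int_0^t x_0 e^{AW_s + (B - A^2/2)s} A\,dW_s + \int_0^t x_0 e^{AW_s + (B - A^2/2)s} B\,ds \\
&= x_0 + \int_0^t AX_s\,dW_s + \int_0^t BX_s\,ds.
\end{aligned}
\]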

Let us summarize our discussion.

Proposition 24.6 The unique pathwise solution to

dXt = AXt dWt + BXt dt

is

\[ X_t = X_0 e^{AW_t + (B - A^2/2)t}. \]

If we write Zt = AWt + Bt, then (24.15) becomes

dXt = Xt dZt, X0 = x0. (24.16)

The equation (24.16) makes sense for arbitrary continuous semimartingales Z, and by using

Itô’s formula as above, one can see that a solution is $X_t = x_0 e^{Z_t - \langle Z\rangle_t/2}$.
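As a numerical sanity check of the closed form for the linear equation (24.15), one can run a simple time-stepping scheme and evaluate the candidate solution on the same simulated Brownian path. This is a sketch, not a proof: the Euler-type scheme, the function names, and the parameter values are assumptions made for illustration.

```python
import math
import random

def gbm_exact(x0, a, b, t, w_t):
    """Candidate solution x0 * exp(A W_t + (B - A^2/2) t) of (24.15)."""
    return x0 * math.exp(a * w_t + (b - a * a / 2.0) * t)

def gbm_euler_vs_exact(x0, a, b, t_end, n_steps, rng):
    """Step dX = A X dW + B X dt forward and return the result together with
    the closed form evaluated at the same terminal Brownian value."""
    h = t_end / n_steps
    x, w = x0, 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))  # Brownian increment over one step
        x += a * x * dw + b * x * h
        w += dw                            # accumulate W along the same path
    return x, gbm_exact(x0, a, b, t_end, w)

approx, exact = gbm_euler_vs_exact(1.0, 0.5, 0.1, 1.0, 10000, random.Random(0))
```

With a fine time step the two returned numbers should agree closely. Note that the correction term −A²/2 in the exponent is essential: dropping it would make the closed form drift away from the stepped path.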


Bessel processes

We consider Bessel processes and the squares of Bessel processes. The reason for the name

is that these processes turn out to be Markov processes and the infinitesimal generator of the

semigroup (see Chapter 37) is related to Bessel’s equation, a type of differential equation.

A Bessel process of order ν ≥ 2 is defined to be a solution of the SDE

\[ dX_t = dW_t + \frac{\nu - 1}{2X_t}\,dt, \qquad X_0 = x. \tag{24.17} \]

Bessel processes of order 0 ≤ ν < 2 can also be defined using (24.17), but only up until the
first time the process X reaches 0; some extra information needs to be given as to what the
process does at 0. The square of a Bessel process of order ν ≥ 0 is defined to be the solution
to the SDE
\[ dY_t = 2\sqrt{|Y_t|}\,dW_t + \nu\,dt, \qquad Y_0 = y. \tag{24.18} \]
There is no difficulty defining the square of a Bessel process for 0 ≤ ν < 2.
By Theorem 24.4 we have pathwise uniqueness for the solution to (24.18), because
$\big|\,|y|^{1/2} - |x|^{1/2}\big| \le |y - x|^{1/2}$, and we can thus take $\rho(u) = 2u^{1/2}$ in Theorem 24.4. The
solution to (24.18) when ν = 0 and y = 0 is clearly Yt = 0 for all t. By Theorem 24.5 with
b(x) = 0 and B(x) = ν, we see that the solution to (24.18) is greater than or equal to 0 for
all t. We may thus omit the absolute value in (24.18) and rewrite it as
\[ dY_t = 2\sqrt{Y_t}\,dW_t + \nu\,dt, \qquad Y_0 = y. \tag{24.19} \]
If we apply Itô’s formula to the solution Yt of (24.19) with the function $\sqrt{x}$, we see that
$X_t = \sqrt{Y_t}$ solves (24.17) for t up until the first time Y reaches 0; the function $\sqrt{x}$ is twice
continuously differentiable as long as we stay away from 0. We will see shortly that the square
of a Bessel process started away from 0 never hits 0 if and only if ν ≥ 2.
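The Itô computation behind the passage from the squared process to the Bessel process can be written out as follows. This is a sketch valid only before Y reaches 0; it uses $d\langle Y\rangle_t = 4Y_t\,dt$, read off from the diffusion coefficient $2\sqrt{Y_t}$, together with the first two derivatives of $\sqrt{x}$.

```latex
\[
\begin{aligned}
d\sqrt{Y_t}
  &= \frac{1}{2\sqrt{Y_t}}\,dY_t - \frac{1}{8}\,Y_t^{-3/2}\,d\langle Y\rangle_t \\
  &= \frac{1}{2\sqrt{Y_t}}\bigl(2\sqrt{Y_t}\,dW_t + \nu\,dt\bigr)
     - \frac{1}{8}\,Y_t^{-3/2}\cdot 4Y_t\,dt \\
  &= dW_t + \frac{\nu - 1}{2\sqrt{Y_t}}\,dt .
\end{aligned}
\]
```

Writing $X_t = \sqrt{Y_t}$, the last line is exactly the drift and diffusion of (24.17).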
Using Itô’s formula with a d-dimensional process Wt and the function |x|2 shows that the
square of the modulus of a d-dimensional Brownian motion is the square of a Bessel process
of order d; this is Exercise 24.7.
Bessel processes have the same scaling properties as Brownian motion. That is, if Xt is a
Bessel process of order ν started at x, then $aX_{a^{-2}t}$ is a Bessel process of order ν started at ax.
In fact, from (24.17),
\[ d(aX_{a^{-2}t}) = a\,dW_{a^{-2}t} + a^2\,\frac{\nu - 1}{2aX_{a^{-2}t}}\,d(a^{-2}t), \]
and the assertion follows from the uniqueness of the solution to (24.17) and the fact that
aW (a−2t) is again a Brownian motion.
Bessel processes are useful for comparison purposes, and so the following is worthwhile.
Proposition 24.7 Suppose Yt is the square of a Bessel process of order ν. Suppose Y0 = y.
The following hold with probability one.
(1) If ν > 2 and y > 0, Yt never hits 0.

(2) If ν = 2 and y > 0, Yt hits every neighborhood of 0, but never hits the point 0.

(3) If 0 < ν < 2, Yt hits 0.
(4) If ν = 0, then Yt hits 0. If started at 0, then Yt remains at 0 forever.
When we say that Yt hits 0, we consider only times t > 0. We define T0 = inf{t > 0 : Yt = 0}

and say that Yt hits 0 if T0 < ∞.
Proof We prove (2). An application of Itô’s formula with the process being the square of
a Bessel process of order 2 and the function being log x shows that logYt is a martingale up
until the first hitting time of 0; cf. Exercise 21.1. Since $d(\log Y_t) = 2Y_t^{-1/2}\,dW_t$, the quadratic variation of log Yt is $4\int_0^t Y_s^{-1}\,ds$
for t less than the hitting time of 0. Suppose 0 < a < y < b.
We claim that Yt leaves the interval [a, b], a.s. If not, $\langle \log Y\rangle_t \ge 4b^{-1}t \to \infty$ as t → ∞.
Since logYt is a martingale, it is a time change of Brownian motion, and Brownian motion
leaves [log a, log b] with probability one, a contradiction.
Then by Corollary 3.17,
\[ P(Y_t \text{ hits } a \text{ before } b) = \frac{\log b - \log y}{\log b - \log a}. \tag{24.20} \]
Letting b → ∞, we see that P(Yt hits a) = 1, and since a is arbitrary, Yt hits every neighbor-
hood of 0. If in (24.20) we hold b fixed instead and let a → 0, we see P(Yt hits 0 before b) = 0;
since b is arbitrary, this proves that Yt never hits the point 0.
Parts (1), (3), and (4) are similar, but instead of log |x| we use $|x|^{(2-\nu)/2}$. The details are
left as Exercise 24.8.
Exercises
24.1 Show that ‖ · ‖t defined by (24.7) gives rise to a complete normed linear space.
24.2 Suppose g(t) is non-negative and bounded on each finite subinterval of [0,∞). Suppose there
exist constants A and B such that
\[ g(t) \le A + B\int_0^t g(s)\,ds \tag{24.21} \]
for each t ≥ 0. Prove that $g(t) \le Ae^{Bt}$ for all t ≥ 0. This result is known as Gronwall’s lemma.
Hint: Write
\[ g(t) \le A + B\int_0^t \Big[A + B\int_0^s g(r)\,dr\Big]\,ds, \]
use (24.21) to substitute for g(r), and iterate.
24.3 The starting point in (24.1) can be random. Suppose Y is a random variable that is measurable
with respect to F0, Y is square integrable, and σ and b are bounded and Lipschitz. Prove pathwise
existence and uniqueness for the equation
\[ X_t = Y + \int_0^t \sigma(X_s)\,dW_s + \int_0^t b(X_s)\,ds. \]
24.4 The functions σ and b in (24.1) can depend on time as well as space. Suppose σ : [0,∞)×R →
R, b : [0,∞)×R → R are bounded and uniformly Lipschitz in the second variable: there exists
c independent of s such that |σ (s, x) − σ (s, y)| ≤ c|x − y| and similarly for b. Prove pathwise
existence and uniqueness for the equation
\[ X_t = x_0 + \int_0^t \sigma(s, X_s)\,dW_s + \int_0^t b(s, X_s)\,ds. \]
24.5 Here is a multidimensional analog of (24.1). Suppose the functions σi j : Rd → R, 1 ≤
i, j ≤ d, are bounded and Lipschitz, and bi : Rd → R, i = 1, . . . , d, are bounded and
Lipschitz, $W^j$ are independent one-dimensional Brownian motions, $x_0 = (x_0^{(1)}, \dots, x_0^{(d)})$, and
$X_t = (X_t^{(1)}, \dots, X_t^{(d)})$ satisfies
\[ X_t^{(i)} = x_0^{(i)} + \int_0^t \sum_{j=1}^d \sigma_{ij}(X_s)\,dW_s^j + \int_0^t b_i(X_s)\,ds \tag{24.22} \]
for i = 1, . . . , d. Prove pathwise existence and uniqueness for this system of equations.
24.6 Suppose f and g map [0,∞) → R with $\int_0^\infty f(t)^2\,dt < \infty$ and $\int_0^\infty g(t)^2\,dt < \infty$. Show that
$\int_0^\infty f(t)\,dW_t$ is a mean zero Gaussian random variable, the same with f replaced by g, and
\[ \mathrm{Cov}\Big(\int_0^\infty f(t)\,dW_t,\ \int_0^\infty g(t)\,dW_t\Big) = \int_0^\infty f(t)g(t)\,dt. \]
Hint: Approximate f and g by piecewise constant deterministic functions.
24.7 Show that if Wt is a d-dimensional Brownian motion, then |Wt |2 is the square of a Bessel process
of order d.
24.8 Prove (1), (3), and (4) of Proposition 24.7.
24.9 Let X be the solution to dXt = σ (Xt ) dWt + b(Xt ) dt, where W is a one-dimensional Brownian
motion, σ and b are Lipschitz continuous real-valued functions, and |σ (x)| ≤ c(1 + |x|) and
|b(x)| ≤ c(1 + |x|). Let t0 > 0. Prove that if p ≥ 2, then

\[ E\Big[\sup_{s\le t_0}|X_s|^p\Big] \le c(1 + |x_0|^p). \]

24.10 Let W be a one-dimensional Brownian motion and let $X^x_t$ be the solution to

dXt = σ (Xt ) dWt + b(Xt ) dt, X0 = x.

Suppose σ and b are C∞ functions and that σ and b and all their derivatives are bounded. Show

that for each t the map $x \to X^x_t$ is continuous in x with probability one. Show that the map is

differentiable in x.

24.11 Suppose A(t) and B(t) are deterministic functions of t. Find an explicit solution to the one-

dimensional SDE

dXt = A(t) dWt + B(t) dt, X0 = x.

Notes

If one wants to have a stochastic differential equation with jumps, besides a Brownian motion,

one integrates with respect to a Poisson point process, which is defined in Chapter 18. Using

the notation of that chapter, one considers the stochastic differential equation

\[
dX_t = \sigma(X_{t-})\,dW_t + b(X_{t-})\,dt + \int_S F(X_{t-}, z)\,(\mu(dt, dz) - \nu(dt, dz)), \qquad X_0 = x_0,
\]


which means that we want a solution to

\[
X_t = x_0 + \int_0^t \sigma(X_{s-})\,dW_s + \int_0^t b(X_{s-})\,ds + \int_0^t\!\int_S F(X_{s-}, z)\,(\mu(ds, dz) - \nu(ds, dz)).
\]

There is pathwise existence and uniqueness to this SDE provided F satisfies a suitable

Lipschitz-like condition; see Skorokhod (1965).

25

Weak solutions of SDEs

In Chapter 24 we considered SDEs of the form

dXt = σ (Xt ) dWt + b(Xt ) dt, (25.1)

where W is a Brownian motion and σ and b are Lipschitz functions, or in one dimension,

where σ has a modulus of continuity satisfying an integral condition. When the coefficients

σ and b fail to be sufficiently smooth, it is sometimes the case that (25.1) may not have a

pathwise solution at all, or it may not be unique. We define another notion of existence and

uniqueness that is useful.

Definition 25.1 A weak solution (X ,W, P) to (25.1) exists if there exists a probability

measure P and a pair of processes (Xt,Wt ) such that Wt is a Brownian motion under P

and (25.1) holds. There is weak uniqueness holding for (25.1) if whenever (X ,W, P) and

(X ′,W ′, P′) are two weak solutions, then the joint law of (X ,W ) under P and the joint law

of (X ′,W ′) under P′ are equal. When this happens, we also say that the solution to (25.1) is

unique in law.

Let us discuss the relationship between weak solutions and pathwise solutions. If the

solution to (25.1) is pathwise unique, then weak uniqueness holds. For a proof of this result

under very general hypotheses, see Revuz and Yor (1999), Theorem IX.1.7. In the case that

σ and b are Lipschitz functions, the proof is much simpler.

Proposition 25.2 Suppose σ and b are bounded Lipschitz functions and x0 ∈ Rd . Then weak

uniqueness holds for (25.1).

Proof For notational simplicity we consider the case of dimension one. Suppose (X ,W, P)

and (X ′,W ′, P′) are two weak solutions to (25.1). Let X0(t) = x0 and define Xi+1(t) by

\[ X_{i+1}(t) = x_0 + \int_0^t \sigma(X_i(s))\,dW_s + \int_0^t b(X_i(s))\,ds. \tag{25.2} \]

We saw by the proof of Theorem 24.2 that the limit of the Xi exists, uniformly over finite

time intervals, and solves (25.1), and the solution is pathwise unique. Since X also solves

(25.1), we conclude that Xi converges (uniformly over finite time intervals) to X , a.s., with

respect to P. Similarly, if we let $X'_0(t) = x_0$ and define $X'_{i+1}(t)$ by

\[ X'_{i+1}(t) = x_0 + \int_0^t \sigma(X'_i(s))\,dW'_s + \int_0^t b(X'_i(s))\,ds, \tag{25.3} \]

then $X'_i$ converges, uniformly over finite time intervals, to $X'$.


Now since W is a Brownian motion under P and W ′ is a Brownian motion under P′, then

the law of $(X_0, W)$ under P equals the law of $(X'_0, W')$ under P′. By (25.2) and (25.3), the

law of $(X_1, W)$ under P equals the law of $(X'_1, W')$ under P′, and iterating, the law of $(X_i, W)$

under P equals the law of $(X'_i, W')$ under P′. Passing to the limit, the law of (X ,W ) under P

equals the law of (X ′,W ′) under P′.

We now give an example to show that weak uniqueness might hold even if pathwise

uniqueness does not. Let σ (x) be equal to 1 if x ≥ 0 and −1 otherwise. We take b to be

identically 0. We consider solutions to

\[ X_t = \int_0^t \sigma(X_s)\,dW_s. \tag{25.4} \]

Weak uniqueness holds since if W is a Brownian motion under P, then Xt must be a martingale,

and the quadratic variation of X is $d\langle X\rangle_t = \sigma(X_t)^2\,dt = dt$; by Lévy’s theorem (Theorem

12.1), Xt is a Brownian motion. Given a Brownian motion Xt and letting $W_t = \int_0^t \frac{1}{\sigma(X_s)}\,dX_s$,

then again by Lévy’s theorem, Wt is a Brownian motion and $X_t = \int_0^t \sigma(X_s)\,dW_s$; thus weak

solutions exist.

On the other hand, pathwise uniqueness does not hold. To see this, let Yt = −Xt . We have

\[ Y_t = \int_0^t \sigma(Y_s)\,dW_s - 2\int_0^t 1_{\{0\}}(X_s)\,dW_s. \tag{25.5} \]

The second term on the right has quadratic variation $4\int_0^t 1_{\{0\}}(X_s)\,ds$; this is 0 almost surely

because we showed in Exercise 11.1 that the amount of time Brownian motion spends at 0

has Lebesgue measure 0. Therefore Y is another pathwise solution to (25.4).
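A discretized version of this example shows where the correction term in (25.5) comes from. The sketch below is an illustration under stated assumptions, not part of the argument: stochastic integrals are replaced by left-endpoint sums on a uniform grid, and the names `sgn` and `tanaka_check` are hypothetical.

```python
import math
import random

def sgn(x):
    """The coefficient sigma of the example: 1 if x >= 0 and -1 otherwise."""
    return 1.0 if x >= 0 else -1.0

def tanaka_check(n_steps, t_end, rng):
    h = t_end / n_steps
    x = [0.0]
    for _ in range(n_steps):
        x.append(x[-1] + rng.gauss(0.0, math.sqrt(h)))  # X is a Brownian path
    dx = [x[k + 1] - x[k] for k in range(n_steps)]
    # Discrete analogue of W_t = int_0^t (1 / sigma(X_s)) dX_s; here 1/sgn = sgn.
    dw = [sgn(x[k]) * dx[k] for k in range(n_steps)]
    # Left-endpoint sums approximating int sigma(X) dW and int sigma(Y) dW, Y = -X.
    int_x = sum(sgn(x[k]) * dw[k] for k in range(n_steps))
    int_y = sum(sgn(-x[k]) * dw[k] for k in range(n_steps))
    return x[-1], int_x, int_y, dx[0]

x_end, int_x, int_y, first_inc = tanaka_check(10000, 1.0, random.Random(0))
```

In this discretization the sum for X reproduces the endpoint of X, while the sum for Y = −X differs from −X only through the grid points where X equals 0, which here is just the starting point. The discrepancy is twice the first increment, the discrete counterpart of the second term in (25.5), and it shrinks as the grid is refined.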

This example is not satisfying because one would like σ to be positive and even continuous

if possible. Such examples exist, however. For each β < 1/2, Barlow (1982) has constructed
functions σ that are Hölder continuous of order β and bounded above and below by positive
constants and for which
dXt = σ (Xt ) dWt, X0 = x0, (25.6)
has a unique weak solution but no pathwise solution exists.
Let us show how the technique of time change can be used to study weak uniqueness. We
consider the SDE (25.6).
Proposition 25.3 If σ is Borel measurable and there exist c2 > c1 > 0 such that c1 ≤

σ (x) ≤ c2 for all x, then weak existence and weak uniqueness hold for (25.6).

Proof We consider only uniqueness, leaving existence as Exercise 25.1. Suppose (X ,W, P)

and (X ′,W ′, P′) are two weak solutions. Then Xt is a martingale, and as in Section 12.2, if

we set

\[ A_t = \int_0^t \sigma(X_s)^2\,ds, \qquad \tau_t = \inf\{s : A_s \ge t\}, \]

then Mt = Xτt is a Brownian motion under P. Define A′, τ ′, and M ′ analogously. The law of

M under P is that of a Brownian motion, as is that of M ′ under P′.


Now let

\[ B_t = \int_0^t \frac{1}{\sigma(M_s)^2}\,ds, \qquad \rho_t = \inf\{s : B_s \ge t\}. \tag{25.7} \]

Since Mt is a Brownian motion and σ is bounded above and below by positive constants,

then Bt is continuous, strictly increasing, and increases to infinity as t → ∞, and the same

is therefore true of ρt . By a change of variables,

\[
B_t = \int_0^t \frac{1}{\sigma(X_{\tau_s})^2}\,ds = \int_0^{\tau_t} \frac{1}{\sigma(X_u)^2}\,dA_u
= \int_0^{\tau_t} \frac{1}{\sigma(X_u)^2}\,\sigma(X_u)^2\,du = \tau_t.
\]

Therefore $M_{\rho_t} = X_{\tau(\rho_t)} = X_t$. We have the analogous formulas with primes.

The law of M under P equals the law of M ′ under P′ since both are Brownian motions, so

by (25.7) the law of (M, B) under P equals the law of (M ′, B′) under P′, and consequently

the law of (M, ρ) under P equals the law of (M ′, ρ ′) under P′. Since $X_t = M_{\rho_t}$ and similarly

for X ′, we conclude the law of X under P equals the law of X ′ under P′. Finally, since

$W_t = \int_0^t \frac{1}{\sigma(X_s)}\,dX_s$ and similarly for W ′, the joint law of (X ,W ) under P equals the joint law

of (X ′,W ′) under P′.

We point out that in the above proof it is essential that one can reconstruct X from M in a

measurable way.

We now use the Girsanov theorem to prove weak uniqueness for (25.1).

Proposition 25.4 Suppose σ and b are measurable and bounded above and σ is bounded

below by a positive constant. Then weak existence and uniqueness holds for (25.1).

Proof We prove the weak uniqueness, leaving it as Exercise 25.2 to prove existence. Define

{Ft} to be the minimal augmented filtration generated by X ,

\[ M_t = \exp\Big(-\int_0^t \frac{b}{\sigma}(X_s)\,dW_s - \frac12\int_0^t \Big(\frac{b}{\sigma}(X_s)\Big)^2 ds\Big), \]

and Q the probability measure defined by $Q(A) = E_P[M_t; A]$ if A ∈ Ft . By Theorem 13.3,

under Q, the process $\widetilde W_t = W_t + \int_0^t (b/\sigma)(X_s)\,ds$ is a Brownian motion, and

\[ dX_t = \sigma(X_t)\Big(dW_t + \frac{b}{\sigma}(X_t)\,dt\Big) = \sigma(X_t)\,d\widetilde W_t. \]

Define M ′, Q′, and W̃ ′ analogously. By Proposition 25.3 the law of (X ,W̃ ) under Q is

equal to the law of (X ′,W̃ ′) under Q′. Let n ≥ 1, t1 < · · · < tn, and let A1, . . . , An be Borel
subsets of R. Set B = {Xt1 ∈ A1, . . . , Xtn ∈ An} and define B′ analogously. We have
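\[
\begin{aligned}
P(B) = \int_B \frac{dP}{dQ}\,dQ
&= \int_B \exp\Big(\int_0^t \frac{b}{\sigma}(X_s)\,dW_s + \frac12\int_0^t \Big(\frac{b}{\sigma}(X_s)\Big)^2 ds\Big)\,dQ \\
&= \int_B \exp\Big(\int_0^t \frac{b}{\sigma}(X_s)\,d\widetilde W_s - \frac12\int_0^t \Big(\frac{b}{\sigma}(X_s)\Big)^2 ds\Big)\,dQ.
\end{aligned}
\]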
Using the analogous formula for P′(B′) and the fact that the law of (X ,W̃ ) under Q is the
same as that of (X ′,W̃ ′) under Q′, we see that P(B) = P′(B′); thus the finite-dimensional
distributions of X under P and of X ′ under P′ are the same. Since both X and X ′ are
continuous processes, we conclude from Theorem 2.6 that the law of X under P equals the
law of $X'$ under $P'$. Defining $Y_t = X_t - \int_0^t b(X_s)\,ds$ and similarly for $Y'$, the joint law of $(X, Y)$ under $P$ equals the joint law of $(X', Y')$ under $P'$. Finally, $W_t = \int_0^t \frac{1}{\sigma(X_s)}\,dY_s$ and similarly for $W'$, so we obtain our conclusion.
The procedure of using the Girsanov theorem to get rid of the drift also works in higher
dimensions. However the time change procedure of Proposition 25.3 is not nearly as useful
in higher dimensions as in one dimension. The question of weak uniqueness for the system
of equations in Exercise 24.5 is quite an interesting one; see Bass (1997) and Stroock and
Varadhan (1977).
Exercises
25.1 Show weak existence holds under the hypotheses of Proposition 25.3.
25.2 Show weak existence holds under the hypotheses of Proposition 25.4.
25.3 Here is an example of an SDE where weak uniqueness does not hold. Suppose $W$ is a one-dimensional Brownian motion and $\alpha \in (0, \frac12)$. Let $\sigma(x) = 1 \wedge |x|^\alpha$. Find two solutions to
$$dX_t = \sigma(X_t)\,dW_t$$
that are not equal in law.
Hint: One is the solution that is identically zero. The other can be constructed by time changing a Brownian motion by the inverse of the increasing process
$$2\int_0^t (1 \wedge |X_s|^{2\alpha})^{-1}\,ds.$$
25.4 (1) Suppose $a_s$ and $b_s$ are bounded predictable processes with $a_s$ bounded below by a positive constant. Let $W$ be a one-dimensional Brownian motion. Suppose $Y$ is a one-dimensional semimartingale such that
$$dY_t = a_tY_t\,dW_t + b_t\,dt, \qquad Y_0 = 0.$$
Prove that if $t_0 > 0$ and $\varepsilon > 0$, there exists a constant $c > 0$ depending only on $t_0$, $\varepsilon$, and the bounds on $a_s$ and $b_s$ such that
$$P\Big(\sup_{s \le t_0} |Y_s| < \varepsilon\Big) > c.$$

(2) Now let $W$ be $d$-dimensional Brownian motion, let $x \in \mathbb{R}^d$, and let $\sigma$ be a $d \times d$ matrix-valued function that is bounded and such that $\sigma\sigma^T(x)$ is positive definite, uniformly in $x$. That is, there exists $\Lambda > 0$ such that for all $x$,
$$\sum_{i,j=1}^d y_iy_j(\sigma(x)\sigma^T(x))_{ij} \ge \Lambda \sum_{i=1}^d y_i^2, \qquad (y_1, \ldots, y_d) \in \mathbb{R}^d.$$
Let $b$ be a $d \times 1$ matrix-valued function that is bounded. Let $X$ be the solution to
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad X_0 = x.$$


Use Itô’s formula to find an equivalent expression for $|X_t - x|^2$. Then use (1) to prove that if $t_0 > 0$ and $\varepsilon > 0$, there exists a constant $c > 0$ not depending on $x$ such that
$$P^x\Big(\sup_{s \le t_0} |X_s - x| < \varepsilon\Big) > c.$$

25.5 This is the support theorem for solutions to SDEs. Let $X$, $x$, $\varepsilon$, and $t_0$ be as in (2) of Exercise 25.4. Suppose $\psi : [0, t_0] \to \mathbb{R}^d$ is a continuous function with $\psi(0) = x$. Use the Girsanov theorem to prove that there exists $c > 0$ such that
$$P^x\Big(\sup_{s \le t_0} |X_s - \psi(s)| < \varepsilon\Big) > c.$$

25.6 Suppose weak uniqueness holds for the one-dimensional stochastic differential equation
$$dX_t = \sigma(X_t)\,dW_t, \qquad X_0 = x, \tag{25.8}$$
where $W$ is a one-dimensional Brownian motion. Suppose also that there exists a process $X'$ that is adapted to the minimal augmented filtration of $W$ with $X'_0 = x$ and $dX'_t = \sigma(X'_t)\,dW_t$. Prove that pathwise uniqueness holds for (25.8).

Hint: Show there exists a measurable map $F$ from $C[0,\infty)$ to $C[0,\infty)$ such that $X' = F(W)$. If $X''$ is another solution to (25.8), then weak uniqueness shows that the laws of $(X'', W)$ and $(X', W)$ are equal, hence $X'' = F(W) = X'$.

26

The Ray–Knight theorems

The local time of Brownian motion, $L^x_t$, is parameterized by space and time: $x$ and $t$. Ray and Knight independently discovered that at certain stopping times $T$, the process $x \to L^x_T$ is a Markov process.

Times that work are (1) the first time local time at 0 reaches a level $r$; (2) an exponential random variable $T$ that is independent of the Brownian motion; and (3) the first time $T$ that Brownian motion reaches the level one. We will prove the version of the Ray–Knight theorems in the last case. We will show that if $W$ is a Brownian motion with local times $L^x_t$ and
$$T = \inf\{t > 0 : W_t = 1\},$$
then the process $L^{1-x}_T$ indexed by $x$ has the same law as the square of a Bessel process of order 2. We will see in Chapter 39 that the square of a Bessel process is a Markov process.

We will use the following lemma.

Lemma 26.1 Suppose $X^{(j)}_t$, $j = 1, 2$, are two continuous processes such that
$$E\exp\Big(-\int_0^1 f(s)X^{(1)}_s\,ds\Big) = E\exp\Big(-\int_0^1 f(s)X^{(2)}_s\,ds\Big)$$
whenever $f$ is a non-negative continuous function with support in $(0, 1)$. Then the laws of $\{X^{(j)}_t; 0 \le t \le 1\}$, $j = 1, 2$, are equal.

Proof Let $\varphi$ be a non-negative continuous function with support in $[0, 1]$ such that $\int_0^1 \varphi(x)\,dx = 1$, and let $\varphi_\varepsilon(x) = \varepsilon^{-1}\varphi(x/\varepsilon)$, so that the sequence $\{\varphi_\varepsilon\}$ is an approximation to the identity. If $g$ is a continuous function and $t \ne 0$, then $\int g(s)\varphi_\varepsilon(s - t)\,ds \to g(t)$. Now let $t_1, \ldots, t_n \in (0, 1)$, $a_1, \ldots, a_n > 0$, and set $f_\varepsilon(x) = \sum_{i=1}^n a_i\varphi_\varepsilon(x - t_i)$. Using the hypothesis and letting $\varepsilon \to 0$, we obtain
$$E\exp\Big(-\sum_{i=1}^n a_iX^{(1)}_{t_i}\Big) = E\exp\Big(-\sum_{i=1}^n a_iX^{(2)}_{t_i}\Big).$$
The left-hand side is the joint Laplace transform of $(X^{(1)}_{t_1}, \ldots, X^{(1)}_{t_n})$ and the right-hand side is the same for $X^{(2)}$. By the uniqueness of the Laplace transform, the finite-dimensional distributions of $X^{(1)}$ and $X^{(2)}$ are equal. Both processes have continuous paths, and the conclusion now follows from Theorem 2.6.


Let $B_t$ be a Brownian motion, not necessarily the same as $W_t$, and let $Z_t$ be the non-negative solution to
$$dZ_t = 2\sqrt{Z_t}\,dB_t + 2\,dt, \qquad Z_0 = 0, \quad 0 \le t \le 1. \tag{26.1}$$
The solution to this equation is unique by Theorem 24.4, and $Z_t$ is the square of a Bessel process of order 2.

Theorem 26.2 The processes $\{L^{1-x}_T; 0 \le x \le 1\}$ and $\{Z_x; 0 \le x \le 1\}$ have the same law.

Proof Let $f \ge 0$ be a continuous function whose support $[a, b]$ is a subset of $(0, 1)$. Let $F$ be the solution to
$$F''(x) = 2F(x)f(x), \qquad F(1) = 1, \quad F'(1) = 0;$$
see Exercise 26.1. Define $g(x) = f(1 - x)$ and $G(x) = F(1 - x)$, so that $G'' = 2Gg$, $G'(0) = 0$, and $G(0) = 1$. We will show
$$E\exp\Big(-\int_0^1 f(x)L^{1-x}_T\,dx\Big) = E\exp\Big(-\int_0^1 f(t)Z_t\,dt\Big), \tag{26.2}$$
and then apply Lemma 26.1.

The left-hand side of (26.2) is equal to
$$E\exp\Big(-\int_0^1 f(1 - x)L^x_T\,dx\Big) = E\exp\Big(-\int_0^1 g(x)L^x_T\,dx\Big) = E\exp\Big(-\int_0^T g(W_s)\,ds\Big),$$
where the last equality follows from the occupation time formula (Theorem 14.4). Let

$$M_t = G(W_t)\,e^{-\int_0^t g(W_s)\,ds}.$$
By Itô’s formula and the product formula,
$$\begin{aligned}
dM_t &= -G(W_t)g(W_t)e^{-\int_0^t g(W_s)\,ds}\,dt + G'(W_t)e^{-\int_0^t g(W_s)\,ds}\,dW_t + \tfrac12 G''(W_t)e^{-\int_0^t g(W_s)\,ds}\,dt \\
&= G'(W_t)e^{-\int_0^t g(W_s)\,ds}\,dW_t,
\end{aligned}$$
since $\frac12 G'' - Gg = 0$. Therefore $M_t$ is a martingale. Since $G$ is bounded on $(-\infty, 1]$, then $M_{t \wedge T}$ is bounded and we then have
$$1 = G(0) = E M_0 = E M_T = E\,G(1)e^{-\int_0^T g(W_s)\,ds},$$
so
$$E\exp\Big(-\int_0^T g(W_s)\,ds\Big) = \frac{1}{G(1)}. \tag{26.3}$$

Now look at the right-hand side of (26.2). Let
$$N_t = \frac{1}{F(t)}\exp\Big(Z_t\frac{F'(t)}{2F(t)} - \int_0^t f(s)Z_s\,ds\Big).$$


Let
$$Y_t = Z_t\frac{F'(t)}{2F(t)},$$
so using (26.1),
$$dY_t = Z_t\frac{2F(t)F''(t) - 2F'(t)^2}{4F(t)^2}\,dt + 2\frac{F'(t)}{2F(t)}\sqrt{Z_t}\,dB_t + 2\frac{F'(t)}{2F(t)}\,dt.$$
If
$$X_t = Y_t - \int_0^t f(s)Z_s\,ds,$$
then the martingale part of $X$ is
$$\int_0^t \frac{F'(s)}{F(s)}\sqrt{Z_s}\,dB_s,$$
and hence
$$d\langle X\rangle_t = \Big(\frac{F'(t)}{F(t)}\Big)^2 Z_t\,dt.$$

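By Itô’s formula and the product formula and using $F'' = 2Ff$,
$$\begin{aligned}
dN_t = {}& -\frac{F'(t)}{F(t)^2}e^{X_t}\,dt + \frac{1}{F(t)}e^{X_t}\Big\{Z_t\frac{F''(t)}{2F(t)}\,dt - Z_t\frac{F'(t)^2}{2F(t)^2}\,dt + \frac{F'(t)}{F(t)}\sqrt{Z_t}\,dB_t + \frac{F'(t)}{F(t)}\,dt - f(t)Z_t\,dt\Big\} \\
& + \frac12\,\frac{1}{F(t)}e^{X_t}\frac{F'(t)^2}{F(t)^2}Z_t\,dt \\
= {}& \frac{1}{F(t)}e^{X_t}\frac{F'(t)}{F(t)}\sqrt{Z_t}\,dB_t.
\end{aligned}$$
All the $dt$ terms cancel: the first term cancels against the $\frac{F'(t)}{F(t)}\,dt$ term in the braces, the $F''$ term cancels against $-f(t)Z_t\,dt$ because $F'' = 2Ff$, and the two terms involving $F'(t)^2$ cancel each other.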

Observe that $F$ is continuous and positive on $[0, 1]$, hence bounded below on $[0, 1]$ by a positive constant. Also $F'$ is bounded above on $[0, 1]$. We see that $N_{t \wedge 1}$ is a martingale. Then $E N_0 = 1/F(0) = 1/G(1)$, while
$$E N_1 = \frac{1}{F(1)}\,E\exp\Big(Z_1\frac{F'(1)}{2F(1)} - \int_0^1 f(s)Z_s\,ds\Big) = E\,e^{-\int_0^1 f(s)Z_s\,ds}.$$
Therefore
$$E\exp\Big(-\int_0^1 f(s)Z_s\,ds\Big) = E N_1 = E N_0 = \frac{1}{G(1)}.$$
Combining with (26.3), we conclude the two sides of (26.2) are equal.
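As a numerical sanity check on the identity $E\exp(-\int_0^1 f(s)Z_s\,ds) = 1/F(0)$ obtained in the proof, the following sketch integrates $F'' = 2Ff$ backward from $x = 1$ with a classical Runge–Kutta scheme. The test function (a narrow triangular bump of total mass `a` centred at `t0`), the step count, and all function names are illustrative assumptions, not part of the text. The comparison value relies on the standard fact that the square of a Bessel process of order 2 started at 0 is exponentially distributed at time $t$ with mean $2t$, so $E e^{-aZ_t} = 1/(1 + 2at)$; as the bump narrows, $1/F(0)$ should approach this.

```python
def triangular_bump(a, t0, w):
    """Return f(x) = (a / w) * max(0, 1 - |x - t0| / w), a bump of total mass a."""
    def f(x):
        return (a / w) * max(0.0, 1.0 - abs(x - t0) / w)
    return f


def solve_F_at_zero(f, n_steps=20000):
    """Integrate F'' = 2 F f backward from F(1) = 1, F'(1) = 0 using RK4; return F(0)."""
    h = 1.0 / n_steps
    x, F, dF = 1.0, 1.0, 0.0

    def rhs(x, F, dF):
        # First-order system: (F, F')' = (F', 2 F f(x)).
        return dF, 2.0 * F * f(x)

    for _ in range(n_steps):
        # Each step moves from x to x - h, hence the minus signs.
        k1 = rhs(x, F, dF)
        k2 = rhs(x - h / 2, F - h / 2 * k1[0], dF - h / 2 * k1[1])
        k3 = rhs(x - h / 2, F - h / 2 * k2[0], dF - h / 2 * k2[1])
        k4 = rhs(x - h, F - h * k3[0], dF - h * k3[1])
        F -= h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        dF -= h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        x -= h
    return F


# Hypothetical parameters: mass a concentrated near t0, bump half-width w.
a, t0, w = 1.0, 0.5, 0.005
F0 = solve_F_at_zero(triangular_bump(a, t0, w))
laplace_from_ode = 1.0 / F0
laplace_exact = 1.0 / (1.0 + 2.0 * a * t0)
print(laplace_from_ode, laplace_exact)
```

The two printed numbers should agree up to an error of the order of the bump width, which is only a consistency check on the formula and not a substitute for the proof.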


You may wonder how the function F was arrived at. Exercises 26.2 and 26.3 may shed

some light on this.

Exercises

26.1 Suppose $f$ is a non-negative continuous function whose support $[a, b]$ is a subset of $(0, 1)$. Show that there is a unique solution to the ordinary differential equation $F''(x) = 2F(x)f(x)$, $F(1) = 1$, $F'(1) = 0$, that $F$ is everywhere positive, and $F$ is bounded on $[0, \infty)$.

Hint: Since $f$ is zero in $(b, \infty)$, then $F''$ is zero there, and hence $F$ is of the form $F(x) = Ax + B$ for some $A$ and $B$ for $x \ge b$. Since $F'(1) = 0$, conclude that $A$ is 0.

26.2 Suppose Xt is a solution to the one-dimensional SDE

dXt = σ (Xt ) dWt + b(Xt ) dt.

Suppose σ and b are bounded and continuous and f is a bounded and continuous function. What

ordinary differential equation must H (x) satisfy (in terms of σ, b, and f ) in order that

$$M_t = H(X_t)\,e^{\int_0^t f(X_s)\,ds}$$

be a martingale?

26.3 Suppose Xt is a solution to the one-dimensional SDE

dXt = σ (Xt ) dWt + b(Xt ) dt.

Suppose σ and b are bounded and continuous and f is a bounded and continuous function. What

partial differential equation must $K(x, t)$ satisfy (in terms of $\sigma$, $b$, and $f$) in order that
$$N_t = K(X_t, t)\,e^{\int_0^t f(s)X_s\,ds}$$

be a martingale?

26.4 Let $W$ be a Brownian motion and $L^y_t$ the local times at level $y$. Prove that local times at a fixed time $t$ are not a Markov process. That is, let $t > 0$ be fixed and show that $(L^y_t, y \ge 0)$ is not a Markov process in the variable $y$.

26.5 Let $S$ be the first time two-dimensional Brownian motion exits the unit ball and let $\psi(\lambda) = P^0(S > \lambda)$. If $W$ is a one-dimensional Brownian motion with local times $L^x_t$ and $T = \inf\{t > 0 : W_t = 1\}$, find the distribution of $Y = \sup_{0 \le x \le 1} L^x_T$ in terms of $\psi$, i.e., write $P(Y \le \lambda)$ in terms of the function $\psi$.

26.6 Suppose $x \in (0, 1)$. With $W$ and $T$ as in Exercise 26.5, find the distribution of $L^x_T$.

26.7 Let $W$ be a one-dimensional Brownian motion with local times $L^x_t$. Let $T_r = \inf\{t > 0 : L^0_t = r\}$. The law of the process $x \to L^x_{T_r}$ can be described as follows:

(1) The law of $\{L^x_{T_r}, x \ge 0\}$ is the same as the law of $\{X_x, x \ge 0\}$ started at $r$, where $X$ is the square of a Bessel process of order 0.

(2) The law of $\{L^{-x}_{T_r}, x \ge 0\}$ is also the same as the law of $\{X_x, x \ge 0\}$ started at $r$, where $X$ is the square of a Bessel process of order 0.

(3) The processes $\{L^x_{T_r}, x \ge 0\}$ and $\{L^{-x}_{T_r}, x \ge 0\}$ are independent of each other.


This is proved in Revuz and Yor (1999), Section XI.2, or for a challenge, try to prove (1) for yourself using the techniques of this chapter. Using this description of $L^x_{T_r}$, find the distribution of $L^*_{T_r} = \sup_x L^x_{T_r}$.

Notes

There are several other proofs of the Ray–Knight theorems. One by Walsh (Rogers and

Williams, 2000b; Walsh, 1978) uses excursion theory. In the next chapter we will indicate

some ideas used in that proof.

27

Brownian excursions

The paths of a Brownian motion Wt are continuous, so the zero set Z(ω) = {t : Wt (ω) = 0}

is a closed set. The complement of Z(ω) is an open subset of the reals, hence is the countable

union of disjoint open intervals. If (a, b) is one of those intervals (depending on ω, of course),

then {Wt (ω) : a ≤ t ≤ b} is a continuous function of t that is zero at t = a and t = b but is

never 0 for any t ∈ (a, b). We call this piece of the path of Wt (ω) an excursion.

To be more formal, let $E$ be the collection of continuous functions $f$ with domain $[0, \infty)$ such that the following hold: there exists a positive real $\sigma_f$ such that $f(0) = 0$, $f(\sigma_f) = 0$, $f(t) \ne 0$ if $t \in (0, \sigma_f)$, and $f(t) = 0$ if $t > \sigma_f$. We make $E$ into a metric space by furnishing it with the supremum norm. Given a Borel subset $A$ of $E$, we say that the Brownian motion $W$ has had an excursion in $A$ by time $t$ if there exists a time $u$ and a function $f \in A$ such that $u + \sigma_f \le t$ and $W_{u+s}(\omega) = f(s)$ for all $s \le \sigma_f$. Let $K_t(A)$ be the number of excursions of $W$ in $A$ by time $t$. Let $L^0_t$ be Brownian local time at 0, and let
$$T_r = \inf\{t > 0 : L^0_t \ge r\} \tag{27.1}$$
be the inverse of Brownian local time at zero.

Set
$$N_r(A) = K_{T_r}(A).$$
Although $N_t(A)$ might be identically infinite for some sets $A$, it will be finite for others. For example, let $\delta > 0$ and suppose that every function in $A$ has a supremum greater than $\delta$. The continuity of the paths of $W$ implies that $N_t(A)$ is finite for every $t$.

The main result of this section is the following.

Theorem 27.1 Nt (·) is a Poisson point process.

Proof If Nt (B) is not infinite, then it has right-continuous paths that increase at most 1

at any given time. The main step will be to show that Nt (B) has stationary increments and

Nt (B) − Ns(B) is independent of the σ -field generated by the random variables

{Nr(A) : r ≤ s, A a Borel subset of E}.


If $r_1 \le \cdots \le r_n \le s < t$, $k \ge 0$, $j_1, \ldots, j_n \ge 0$, and $B$ and $A_1, \ldots, A_n$ are Borel subsets of $E$, then
$$\begin{aligned}
P(N_t(B) - N_s(B) = k;\ & N_{r_1}(A_1) = j_1, \ldots, N_{r_n}(A_n) = j_n) \\
&= P(K_{T_t}(B) - K_{T_s}(B) = k;\ K_{T_{r_1}}(A_1) = j_1, \ldots, K_{T_{r_n}}(A_n) = j_n) \\
&= E\big[P^{W_{T_s}}(K_{T_{t-s}}(B) - K_{T_0}(B) = k);\ K_{T_{r_1}}(A_1) = j_1, \ldots, K_{T_{r_n}}(A_n) = j_n\big],
\end{aligned} \tag{27.2}$$
where we used the strong Markov property at time $T_s$. Since $T_s$ is the first time that local time of Brownian motion at 0 exceeds $s$ and $L^0_t$ increases only when $W$ is at 0, then at time $T_s$ the process $W$ is at 0, so $W_{T_s} = 0$. Therefore the last expression in (27.2) equals
$$P^0(K_{T_{t-s}}(B) - K_0(B) = k)\,P(K_{T_{r_1}}(A_1) = j_1, \ldots, K_{T_{r_n}}(A_n) = j_n),$$
which can be rewritten as
$$P^0(N_{t-s}(B) - N_0(B) = k)\,P(N_{r_1}(A_1) = j_1, \ldots, N_{r_n}(A_n) = j_n).$$
This shows that the law of $N_t(B) - N_s(B)$ is the same as the law of $N_{t-s}(B) - N_0(B)$ and is independent of $\sigma(N_r(A) : r \le s, A \subset E)$, which is what we wanted.
Observe that $N_t(B)$ is constant except for jumps of size one. By Proposition 5.4, $N_t(B)$ is a Poisson process. It is clear that $N_t(B)$ is a measure in $B$, which completes the proof.
Let $m(A) = E^0 N_1(A)$. The measure $m$ is called the excursion measure. We can say a few things about $m$.
Proposition 27.2 If
$$A = \{f \in E : \sup_t |f(t)| > a\},$$
then $m(A) = 1/a$.

Proof Let $U = \inf\{t : |W_t| = a\}$ and $V = \inf\{t > U : W_t = 0\}$. Since $|W_t| - L^0_t$ is a martingale by Theorem 14.1, then $E^0|W_{t \wedge U}| = E^0 L^0_{t \wedge U}$. Letting $t \to \infty$ and using dominated convergence on the left and monotone convergence on the right,
$$E^0 L^0_U = E^0|W_U| = a.$$
Set $R = \inf\{r : N_r(A) = 1\}$. Because $N_r(A)$ is a Poisson process, then $R$ is an exponential random variable with parameter $E^0 N_1(A) = m(A)$. It therefore suffices to show $E^0 R = a$; see (A.9).

We have $R = \inf\{r : K_{T_r}(A) = 1\}$, and because $K$ can only increase at times when $W_t = 0$, then
$$R = \inf\{L^0_t : K_t(A) = 1\}.$$
Now $K_t(A)$ will first equal one when $t = V$. But because local time at 0 does not increase when $W$ is not at 0, $L^0_V = L^0_U$. Therefore
$$E^0 R = E^0 L^0_V = E^0 L^0_U = a.$$
We conclude that $m(A) = 1/a$.


By symmetry, if $B = \{f \in E : \sup_t f(t) > a\}$, then $m(B) = 1/(2a)$. One can say

more about m. Consider those excursions whose maximum is some fixed value b. Starting

at any point other than 0, the excursion can be viewed as a Brownian motion killed at 0

and conditioned to have maximum b. Such a path can be decomposed into the part before

the maximum, which is a Brownian motion conditioned to hit b before 0, and the part after

the maximum, which is Brownian motion conditioned to hit 0 before b. The former can

be shown to have the same law as a three-dimensional Bessel process, up until it hits the

level b (see the example in Section 22.2), and the latter the same law as b − Xt , where Xt is

also a three-dimensional Bessel process up until it hits the level b. Moreover, the part of the

path before the maximum can be taken to be independent of the part of the path after the

maximum. See Rogers and Williams (2000b) for details.

Let us briefly revisit the Ray–Knight theorems and indicate how Brownian excursions can be used to obtain information about local times at different levels. Fix $r$ and let $T_r = \inf\{t > 0 : L^0_t \ge r\}$. If $x > 0$ and $y_1, \ldots, y_n < 0$, then the local time at $x$ is a function of the excursions from 0 that hit $x$ and the local times at $y_1, \ldots, y_n$ are functions of the excursions that go below zero. Since the set of excursions that take positive values and those that take negative values are independent, then $L^x_{T_r}$ should be independent of $L^{y_1}_{T_r}, \ldots, L^{y_n}_{T_r}$. To find the distribution of $L^x_{T_r}$, note that there are a Poisson number of excursions that reach the level $x$. Each excursion that reaches $x$ contributes an amount to the local time at $x$ that is an exponential random variable; see Exercise 27.1. After proving some additional independence, namely, that the amount each excursion contributes to local time at $x$ is independent of the amount any other excursion contributes and that the amount contributed by an excursion is independent of the number of excursions reaching $x$, we see that $L^x_{T_r}$ should have the same distribution as the sum of a Poisson number of independent exponential random variables.
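To make the phrase "a Poisson number of independent exponential random variables" concrete, here is a small sketch of the Laplace transform of such a compound Poisson sum. The Poisson mean `lam` and the exponential rate `beta` are hypothetical placeholders (their actual values in terms of $r$ and $x$ are what the exercises ask for), and the function names are not from the text. The sketch compares the series obtained by conditioning on the number of summands with the closed form $\exp(-\lambda\theta/(\beta + \theta))$.

```python
import math


def laplace_compound_poisson_series(theta, lam, beta, n_terms=200):
    """E exp(-theta * (xi_1 + ... + xi_N)) with N ~ Poisson(lam), xi_i ~ Exp(rate beta).

    Computed by conditioning on N = n and summing the resulting series.
    """
    per_jump = beta / (beta + theta)  # Laplace transform of a single Exp(beta) variable
    total = 0.0
    poisson_weight = math.exp(-lam)   # P(N = 0)
    for n in range(n_terms):
        total += poisson_weight * per_jump ** n
        poisson_weight *= lam / (n + 1)  # P(N = n + 1) from P(N = n)
    return total


def laplace_compound_poisson_closed(theta, lam, beta):
    """Closed form exp(-lam * theta / (beta + theta)) of the same Laplace transform."""
    return math.exp(-lam * theta / (beta + theta))


# Hypothetical parameter values, chosen only for illustration.
series_value = laplace_compound_poisson_series(theta=0.7, lam=3.0, beta=2.0)
closed_value = laplace_compound_poisson_closed(theta=0.7, lam=3.0, beta=2.0)
print(series_value, closed_value)
```

The independence statements listed in the paragraph above are exactly what justify multiplying the per-excursion transforms inside the series.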
Exercises
27.1 Let $W$ be a Brownian motion, $x > 0$, and $T = \inf\{t > 0 : W_t = x\}$. If $L^x_t$ is the local time at $x$, show that the distribution of $L^x_T$ is an exponential random variable. Determine the parameter of this exponential random variable.

27.2 Let $W$ be a one-dimensional Brownian motion. This exercise asks you to prove that the normalized number of downcrossings by time $t$ converges to local time at 0. If $a > 0$, let $S_0 = 0$, $T_0 = \inf\{t : W_t = a\}$, and for $i \ge 1$,
$$S_i = \inf\{t > T_{i-1} : W_t = 0\}, \qquad T_i = \inf\{t > S_i : W_t = a\}.$$
Then $D_t(a)$, the number of downcrossings up to time $t$, is defined to be $\sup\{k : S_k \le t\}$. Prove that there exists a constant $c$ such that
$$\lim_{a \to 0} aD_t(a) = cL^0_t, \quad \text{a.s.},$$
where $L^0_t$ is local time at 0 of $W$. Determine $c$.

Hint: Use Exercise 18.5.

27.3 Let $(X_t, P^x)$ be a Brownian motion.

(1) Use the reflection principle to find
$$P^0(X_s > -a \text{ for all } s \le r).$$


This is the same as $P^a(T_0 > r)$, where $T_0$ is the first time the Brownian motion hits 0.

(2) Let
$$A(a, r) = \{f \in E : \sup_t f(t) > a,\ \sigma_f > r\},$$
$$B(r) = \{f \in E : \sigma_f > r,\ \sup_t f(t) > 0\},$$
and
$$C(a) = \{f \in E : \sup_t f(t) > a\}.$$
Prove that
$$m(B(r)) = \lim_{a \to 0} m(A(a, r)) = \lim_{a \to 0} [m(C(a)) \times P^a(T_0 > r)]$$
and use this and (1) to compute $m(B(r))$. By symmetry, $m(\{f \in E : \sigma_f > r\})$ will be twice the value of $m(B(r))$.

27.4 Let $W$ be a Brownian motion. Let $E_t(r)$ be the number of excursions of length larger than $r$ that have been completed by time $t$. An excursion of length larger than $r$ means that $\sigma_f > r$. Show that there exists a constant $c$ such that
$$\lim_{r \to 0} \sqrt{r}\,E_t(r) = cL^0_t, \quad \text{a.s.}$$
Determine $c$.

One interesting point here is that this shows that $L^0_t$ is determined entirely by the zero set $Z(\omega) = \{t : W_t(\omega) = 0\}$.

27.5 Let $\delta > 0$ and $A_\delta = \{f \in E : \sup_t |f(t)| > \delta\}$. Let $S_1 = \inf\{t : K_t(A_\delta) = 1\}$ and $S_2 = \inf\{t > S_1 : K_t(A_\delta) = 2\}$. Thus $S_1$ and $S_2$ are the times the first and second excursions in $A_\delta$ have been completed. Let $Y_1(t)$ be the excursion completed at time $S_1$ and define $Y_2(t)$ similarly. To be more precise, if $R_1 = \sup\{t < S_1 : W_t = 0\}$, then $Y_1(s) = W_{R_1 + s}$ if $s \le S_1 - R_1$ and $Y_1(s)$ is equal to 0 for all $s \ge S_1 - R_1$.
Prove that $Y_1$ and $Y_2$ are independent.
Hint: Use the strong Markov property at time $S_1$.
Notes
Besides its use in the Ray–Knight theorems (Rogers and Williams, 2000b), excursion theory
is useful in many other contexts. See Rogers and Williams (2000b) for applications to
Skorokhod embedding and to the arc sine law.
28
Financial mathematics
A European call option is the option to buy a share of stock at a given price at some particular
time in the future. For example, I might buy a call option to purchase one share of Company X
for $40 three months from today. When the three months is up, I check the price of Company
X. If, say, it is $35, then my option is worthless, because why would I buy a share for $40
using the option when I could buy it on the open market for $35? But if three months from
now, the share price is, say, $45, then I can exercise my option, which means I buy a share
for $40, and I can then turn around immediately and sell that share for $45 and make a profit
of $5. Thus, today, there is a potential for a profit if I have a call option, and so I should pay
something to purchase that option. A significant part of financial mathematics is devoted to
the question of what is the fair price I should pay for a call option.
Options originated in the commodities market, where farmers wanted to hedge their risks.
Since then many types of options have been developed (options are also known as derivatives),
and the amount of money invested in options has for the past several years exceeded the
amount of money invested in stocks.
In 1973 Black and Scholes, using the reasonable principle that you can’t get something
for nothing, came up with a convincing formula for the price of an option. This chapter gives
two derivations of the Black–Scholes formula, proves the fundamental theorem of finance,
and finishes by considering a stochastic control problem. The Black–Scholes formula is a
beautiful example of applied stochastic processes.
28.1 Finance models
Let Wt be a Brownian motion. We assume that St is the price of a stock or other risky security.
If we have $2,000 and we buy 100 shares in a stock that sells for $20 per share and it goes
up $2, or if we buy 10 shares in a stock selling for $200 per share that goes up $20, we are
equally happy; it is the percentage increase that matters. With this in mind, we assume that
St satisfies
dSt = μSt dt + σSt dWt . (28.1)
This is plausible, since then dSt/St = μ dt + σ dWt , that is, we are assuming the relative
change in price is a multiple of Brownian motion with drift. The quantity μ is known as the
mean rate of return and σ is called the volatility. The solution to this SDE is
$$S_t = S_0 e^{\sigma W_t + (\mu - \sigma^2/2)t} \tag{28.2}$$
by Proposition 24.6.
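The closed form (28.2) can be compared with a direct discretization of (28.1). The sketch below is only an illustration under stated assumptions: it uses the Euler–Maruyama scheme (a standard numerical method that the text does not discuss), a fixed random seed, and hypothetical parameter values, and it drives both the scheme and formula (28.2) with the same simulated Brownian increments.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, t_end, n_steps, seed=0):
    """Return (Euler-Maruyama value, value from (28.2)) at time t_end along one Brownian path."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    s_euler = s0
    w = 0.0  # running value of the Brownian motion W_t
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # increment of W over a step of length dt
        s_euler += mu * s_euler * dt + sigma * s_euler * dw  # discretized (28.1)
        w += dw
    s_exact = s0 * math.exp(sigma * w + (mu - 0.5 * sigma ** 2) * t_end)  # formula (28.2)
    return s_euler, s_exact


# Hypothetical values: a $20 stock, mean rate of return 5%, volatility 20%, one year.
s_euler, s_exact = simulate_gbm(s0=20.0, mu=0.05, sigma=0.2, t_end=1.0, n_steps=20000)
print(s_euler, s_exact)
```

With a small step size the two values should be close, which reflects the correction term $-\sigma^2/2$ in the exponent of (28.2).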
We also assume the existence of a bond with price Bt , which is assumed to be riskless,
and the equation for Bt is
$$dB_t = rB_t\,dt,$$
which implies
$$B_t = B_0 e^{rt}.$$
Suppose at time t one buys A shares of stock. The cost is ASt . If one sells the shares at
time t + h, one receives ASt+h, and the net gain is A(St+h − St ). One can also sell short, i.e.,
let A be negative. The formula for the gain is the same.
Suppose at time $t_i$ one holds $A_i$ shares, up until time $t_{i+1}$. The total net gain over the whole period $t_0$ to $t_n$ is $\sum_{i=0}^{n-1} A_i(S_{t_{i+1}} - S_{t_i})$. This is the same as the stochastic integral $\int_{t_0}^{t_n} a_s\,dS_s$ if $a_s$ equals $A_i$ when $t_i \le s < t_{i+1}$.
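The discrete net gain $\sum_i A_i(S_{t_{i+1}} - S_{t_i})$ is easy to compute directly. The following minimal sketch uses hypothetical prices and holdings, and the function name is an assumption made for illustration.

```python
def discrete_trading_gain(holdings, prices):
    """Net gain sum_i A_i * (S_{t_{i+1}} - S_{t_i}).

    holdings[i] is the number of shares A_i held on [t_i, t_{i+1});
    prices[i] is the stock price S_{t_i}, so prices has one more entry than holdings.
    Negative holdings correspond to selling short.
    """
    if len(prices) != len(holdings) + 1:
        raise ValueError("need exactly one more price than holdings")
    return sum(a * (prices[i + 1] - prices[i]) for i, a in enumerate(holdings))


# Hypothetical data: hold 100 shares, then none, then short 10 shares.
prices = [20.0, 22.0, 21.0, 25.0]
gain = discrete_trading_gain([100, 0, -10], prices)
print(gain)  # 100 * 2 + 0 * (-1) + (-10) * 4 = 160.0

# With constant holdings the sum telescopes to A * (S_{t_n} - S_{t_0}).
buy_and_hold = discrete_trading_gain([10, 10, 10], prices)
```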
One should allow $A_i$ to depend on the entire past $\mathcal{F}_{t_i}$. Idealizing, one allows continuous trading, and if $a_s$ is the number of shares held at time $s$, the net gain through trading the stock is $\int_0^t a_s\,dS_s$. One has a similar net gain of $\int_0^t b_s\,dB_s$ when trading bonds if $b_s$ is the number of bonds held at time $s$.
Although at can depend on the entire past Ft , one does not want to let at depend on
the future. This helps explain why the class of predictable integrands is the appropriate one
to use.
The pair (a, b) is called a trading strategy. Set
$$V_t = a_tS_t + b_tB_t, \tag{28.3}$$
the amount of wealth one has at time $t$. The strategy is self-financing if
$$V_t = V_0 + \int_0^t a_s\,dS_s + \int_0^t b_s\,dB_s \tag{28.4}$$
for all t. The first integral represents the net gain from trading in the stock, the second integral
the net gain from trading in the bond, and (28.4) says that one’s wealth at time t is equal to
what one starts with plus what one has realized through trading in the stock and bond. We
assume throughout that there are no transaction costs (i.e., no brokerage fees).
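A discrete-time analogue may help in reading (28.3) and (28.4). In the sketch below, which is an illustration under assumed data and not a construction from the text, the stock holdings are prescribed, the bond holdings are chosen at each trading date so that all wealth is invested as in (28.3), and the wealth is updated by the trading gains as in (28.4). All names and numbers are hypothetical.

```python
def self_financing_wealth(v0, stock_holdings, stock_prices, bond_prices):
    """Return (bond_holdings, wealth) for a discrete self-financing strategy.

    stock_holdings[i] = a_i shares held on [t_i, t_{i+1}); prices are given at t_0, ..., t_n.
    """
    wealth = [v0]
    bond_holdings = []
    for i, a in enumerate(stock_holdings):
        # Discrete (28.3): whatever is not in stock is held in bonds.
        b = (wealth[i] - a * stock_prices[i]) / bond_prices[i]
        bond_holdings.append(b)
        # Discrete (28.4): wealth changes only through gains on stock and bond.
        gain = a * (stock_prices[i + 1] - stock_prices[i]) + b * (bond_prices[i + 1] - bond_prices[i])
        wealth.append(wealth[i] + gain)
    return bond_holdings, wealth


# Hypothetical data; the bond grows by 1% per period.
stock_prices = [20.0, 22.0, 21.0, 25.0]
bond_prices = [1.0, 1.01, 1.0201, 1.030301]
stock_holdings = [10.0, 20.0, 5.0]
bond_holdings, wealth = self_financing_wealth(1000.0, stock_holdings, stock_prices, bond_prices)
print(bond_holdings, wealth)
```

At each date the wealth computed from the gains agrees with the value of the portfolio just before rebalancing, so no money is added or withdrawn along the way.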
A European call gives the buyer the option of buying a share of the stock at a fixed time
tE at price K. The time tE is called the exercise time. After time tE , the option has expired
and is worthless.
What is the option worth? At time $t_E$, if $S_{t_E} \le K$, the option is worth nothing, for who would pay $K$ dollars for a share of stock when it sells for $S_{t_E}$ dollars? If $S_{t_E} > K$, one can use the option to buy a share of the stock at price $K$ and immediately sell it at price $S_{t_E}$, to make a profit of $S_{t_E} - K$. Thus the value of the option at time $t_E$ is $(S_{t_E} - K)^+$. An important question is: how much should the option sell for? What is a fair price for the option at time 0?

There are a myriad of types of options. The American call is almost the same as the

European call, except that one is allowed to buy a share of the stock at price K at any time in

the interval [0, tE]. The European put gives the buyer the option to sell a share of the stock at

price K at time tE , while the American put gives the buyer the option to sell a share at price

K anytime before time tE .
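The value of a European option at the exercise time is a simple function of the stock price. The sketch below encodes the call value $(S_{t_E} - K)^+$ from the text and, as an assumption made by the same reasoning rather than a formula stated above, the put value $(K - S_{t_E})^+$. The numbers reuse the \$40 strike example from the start of the chapter.

```python
def call_payoff(s, k):
    """Value (s - k)^+ at the exercise time of a call with strike k when the stock is at s."""
    return max(s - k, 0.0)


def put_payoff(s, k):
    """Value (k - s)^+ at the exercise time of a put with strike k (assumed by analogy)."""
    return max(k - s, 0.0)


# Strike $40; the stock ends at $35 (call worthless) or at $45 (call worth $5).
for s in (35.0, 45.0):
    print(s, call_payoff(s, 40.0), put_payoff(s, 40.0))
```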


28.2 Black–Scholes formula

In 1973 Black and Scholes came up with their formula for the price of a European call. We

will give two derivations of this formula.

Derivation 1. First of all, the interest rate $r$ on the bond may be considered to be the same as the rate of inflation. Thus the value of the option $(S_{t_E} - K)^+$ in today’s dollars is
$$C = e^{-rt_E}(S_{t_E} - K)^+. \tag{28.5}$$
In this first derivation we work in today’s dollars. Therefore the present-day value of the stock is $P_t = e^{-rt}S_t$. Note $P_0 = S_0$ and the present-day value of our option at time $t_E$ is then
$$C = e^{-rt_E}(S_{t_E} - K)^+ = (P_{t_E} - e^{-rt_E}K)^+. \tag{28.6}$$
By the product formula,
$$\begin{aligned}
dP_t &= e^{-rt}\,dS_t - re^{-rt}S_t\,dt \\
&= e^{-rt}\sigma S_t\,dW_t + e^{-rt}\mu S_t\,dt - re^{-rt}S_t\,dt \\
&= \sigma P_t\,dW_t + (\mu - r)P_t\,dt.
\end{aligned}$$
The solution to this stochastic differential equation (see Proposition 24.6) is
$$P_t = P_0 e^{\sigma W_t + (\mu - r - \sigma^2/2)t}.$$
Also, the net gain or loss in present-day dollars when holding $a_s$ shares of stock at time $s$ is $\int_0^t a_s\,dP_s$.

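Define $Q$ on $\mathcal{F}_{t_E}$ by
$$dQ/dP = M_{t_E} = \exp\Big(-\frac{\mu - r}{\sigma}W_{t_E} - \frac{(\mu - r)^2}{2\sigma^2}t_E\Big).$$
Under $Q$, $\widetilde W_t = W_t + \frac{\mu - r}{\sigma}t$ is a Brownian motion by the Girsanov theorem.

Now
$$dP_t = \sigma P_t\,dW_t + (\mu - r)P_t\,dt = \sigma P_t\Big(dW_t + \frac{\mu - r}{\sigma}\,dt\Big) = \sigma P_t\,d\widetilde W_t.$$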

Therefore under $Q$, $P_t$ is a martingale since stochastic integrals with respect to martingales are martingales. The solution to the SDE
$$dP_t = \sigma P_t\,d\widetilde W_t$$
is
$$P_t = P_0 e^{\sigma\widetilde W_t - (\sigma^2/2)t}, \tag{28.7}$$
so $P_t$ and $\widetilde W_t$ have the same filtration.

$C$ is $\mathcal{F}_{t_E}$ measurable. By the martingale representation theorem (Theorem 12.3), there exists an adapted process $A_s$ such that
$$C = E_Q C + \int_0^{t_E} A_s\,d\widetilde W_s = E_Q C + \int_0^{t_E} D_s\,dP_s,$$
where $D_s = A_s/(\sigma P_s)$.


Therefore, if one follows the trading strategy of buying and selling the stock $S_t$, where one holds $D_s$ shares of stock at time $s$, one can obtain $C - E_Q C$ dollars at time $t_E$. Or, starting with $E_Q C$ dollars and buying and selling stock, one can get the identical output as $C$, almost surely. A standard assumption in finance is that of no arbitrage, which means you cannot make a profit without taking some risk. To avoid riskless profits, $C$ must sell for $E_Q C$.

To explain this in more detail, suppose you could sell the European call for $C'$ dollars. If $C' > E_Q C$, you could sell a call for $C'$ dollars, use the money and invest in the trading strategy of holding $D_s$ shares of stock at time $s$, and at time $t_E$ have $C' + C - E_Q C$ worth of stocks and options. The buyer of the option decides whether to exercise the option, and it costs you $C$ dollars to meet that obligation. With probability one, you have gained $C' - E_Q C$ dollars, a riskless profit. If $C' < E_Q C$, simply reverse the roles of buying and selling. The only way to avoid making a riskless profit is if $C' = E_Q C$.
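To find $E_Q C$, using (28.6) and (28.7) we write
$$\begin{aligned}
E_Q C &= E_Q\big[(S_0 e^{\sigma\widetilde W_{t_E} - \sigma^2 t_E/2} - e^{-rt_E}K)^+\big] \\
&= \frac{1}{\sqrt{2\pi t_E}}\int (S_0 e^{\sigma y - \sigma^2 t_E/2} - e^{-rt_E}K)^+ e^{-y^2/(2t_E)}\,dy,
\end{aligned} \tag{28.8}$$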
which is the Black–Scholes formula. One can, if one wishes, perform some calculations to
find alternate expressions for the right-hand side.
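One such alternate expression is the familiar closed form $S_0\Phi(d_1) - Ke^{-rt_E}\Phi(d_2)$, where $\Phi$ is the standard normal distribution function. It is stated here without derivation and is not taken from the text above. The sketch below compares it with a plain midpoint-rule evaluation of the integral in (28.8); the function names, the truncation width, and the parameter values are illustrative assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_price_closed_form(s0, k, r, sigma, t_e):
    """Standard closed form S0 * Phi(d1) - K * exp(-r tE) * Phi(d2)."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t_e) / (sigma * math.sqrt(t_e))
    d2 = d1 - sigma * math.sqrt(t_e)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t_e) * norm_cdf(d2)


def call_price_by_quadrature(s0, k, r, sigma, t_e, n=20000, width=10.0):
    """Midpoint-rule approximation of the integral in (28.8), truncated at +-width standard deviations."""
    sd = math.sqrt(t_e)
    lo = -width * sd
    h = 2.0 * width * sd / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        payoff = max(s0 * math.exp(sigma * y - 0.5 * sigma ** 2 * t_e) - math.exp(-r * t_e) * k, 0.0)
        total += payoff * math.exp(-y * y / (2.0 * t_e))
    return total * h / math.sqrt(2.0 * math.pi * t_e)


# Hypothetical inputs: $40 stock, $40 strike, 5% interest, 20% volatility, three months.
closed = call_price_closed_form(40.0, 40.0, 0.05, 0.2, 0.25)
numeric = call_price_by_quadrature(40.0, 40.0, 0.05, 0.2, 0.25)
print(closed, numeric)
```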
It is noteworthy that μ does not appear in (28.8)! You and I might have different opinions
as to what μ, the mean rate of return, is equal to, but we should agree on the price of the call.
This was a shock to economists when this was first discovered. The value of σ , the volatility,
does enter into the formula.
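The absence of $\mu$ can also be seen numerically. The sketch below, which uses hypothetical parameter values and function names, computes $E_Q C = E_P[M_{t_E} C]$ by integrating against the law of $W_{t_E}$ under $P$, where $\mu$ enters both $C$ and the density $M_{t_E}$, and evaluates the result for several values of $\mu$. The midpoint rule is an arbitrary choice of quadrature.

```python
import math


def price_under_P(s0, k, r, sigma, mu, t_e, n=20000, width=12.0):
    """Midpoint-rule approximation of E_P[M_{tE} * C], with W_{tE} ~ N(0, tE) under P."""
    theta = (mu - r) / sigma
    sd = math.sqrt(t_e)
    lo = -width * sd
    h = 2.0 * width * sd / n
    total = 0.0
    for i in range(n):
        w = lo + (i + 0.5) * h
        # Present-day stock value P_{tE} under P, which depends on mu.
        p_te = s0 * math.exp(sigma * w + (mu - r - 0.5 * sigma ** 2) * t_e)
        c = max(p_te - math.exp(-r * t_e) * k, 0.0)          # C from (28.6)
        m = math.exp(-theta * w - 0.5 * theta ** 2 * t_e)    # density M_{tE} = dQ/dP
        total += m * c * math.exp(-w * w / (2.0 * t_e))
    return total * h / math.sqrt(2.0 * math.pi * t_e)


# Three quite different opinions about the mean rate of return mu.
prices = [price_under_P(40.0, 40.0, 0.05, 0.2, mu, 0.25) for mu in (-0.10, 0.05, 0.30)]
print(prices)
```

The three values should coincide up to quadrature error, in line with the remark that two people who disagree about $\mu$ should still agree on the price of the call.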
Until we evaluated $E_Q C$ in (28.8), the actual form of $C$ was unimportant. For any type
of option expiring at time tE , Derivation 1 tells us that its price at time zero should be its
expectation under Q.
Derivation 2. In this approach, which is the one used by Black and Scholes, we use the actual values of the securities, not the present-day values. Let $V_t$ be the value of the option at time $t$ and assume
$$V_t = f(S_t, t_E - t) \tag{28.9}$$
for all $t$, where $f$ is some function that is sufficiently smooth. We also want $V_{t_E} = (S_{t_E} - K)^+$.
Recall the multivariate version of Itô’s formula (Theorem 11.2). We apply this with $d = 2$ and $X_t = (S_t, t_E - t)$. From (28.1),
$$d\langle S\rangle_t = \sigma^2 S_t^2\,dt,$$
$\langle t_E - t\rangle_t = 0$ since $t_E - t$ is of bounded variation and hence has no martingale part, and $\langle S, t_E - t\rangle_t = 0$. Also, $d(t_E - t) = -dt$. Then
$$\begin{aligned}
V_t - V_0 &= f(S_t, t_E - t) - f(S_0, t_E) \\
&= \int_0^t f_x(S_u, t_E - u)\,dS_u - \int_0^t f_t(S_u, t_E - u)\,du + \frac12\int_0^t \sigma^2 S_u^2 f_{xx}(S_u, t_E - u)\,du.
\end{aligned} \tag{28.10}$$
Here fx is the partial derivative with respect to x, the first variable, fxx is the second partial
derivative with respect to x, and ft is the partial derivative with respect to t, the second
variable. On the other hand,
$$V_t - V_0 = \int_0^t a_u\,dS_u + \int_0^t b_u\,dB_u. \tag{28.11}$$
By (28.3) and (28.9),
$$b_t = \frac{V_t - a_tS_t}{B_t} = \frac{f(S_t, t_E - t) - a_tS_t}{B_t}. \tag{28.12}$$
Also, recall $B_t = B_0 e^{rt}$. Comparing (28.10) with (28.11), we must therefore have
$$a_t = f_x(S_t, t_E - t) \tag{28.13}$$
and
$$-f_t(S_t, t_E - t) + \tfrac12\sigma^2 S_t^2 f_{xx}(S_t, t_E - t) = b_tB_0 re^{rt}. \tag{28.14}$$
Substituting for $b_t$ using (28.12),
$$r[f(S_t, t_E - t) - S_t f_x(S_t, t_E - t)] = -f_t(S_t, t_E - t) + \tfrac12\sigma^2 S_t^2 f_{xx}(S_t, t_E - t) \tag{28.15}$$
for almost all $t$ and all $S_t$. Since $S_t$ is a continuous process, (28.15) leads to the parabolic partial differential equation (PDE)
$$f_t = \tfrac12\sigma^2 x^2 f_{xx} + rx f_x - rf, \qquad (x, t) \in (0, \infty) \times [0, t_E),$$
and
$$f(x, 0) = (x - K)^+.$$
Solving this equation for $f$, $f(x, t_E)$ tells us what $V_0$ should be, i.e., the cost of setting up the
equivalent portfolio. This partial differential equation can be solved and the solution is the
Black–Scholes formula. Equation (28.13) shows what the trading strategy should be.
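As a check that the PDE and the closed-form price are compatible, the sketch below evaluates both sides of $f_t = \frac12\sigma^2x^2f_{xx} + rxf_x - rf$ by central finite differences, taking for $f(x, t)$ the standard closed form with time to expiry $t$ (stated without derivation, not taken from the text). The evaluation point, the step sizes, and the function names are illustrative assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def f(x, t, k=40.0, r=0.05, sigma=0.2):
    """Closed-form call value at stock price x with time t > 0 remaining until expiry."""
    d1 = (math.log(x / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return x * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def pde_residual(x, t, r=0.05, sigma=0.2, hx=1e-3, ht=1e-4):
    """f_t - (0.5 sigma^2 x^2 f_xx + r x f_x - r f), using central differences."""
    f_t = (f(x, t + ht) - f(x, t - ht)) / (2.0 * ht)
    f_x = (f(x + hx, t) - f(x - hx, t)) / (2.0 * hx)
    f_xx = (f(x + hx, t) - 2.0 * f(x, t) + f(x - hx, t)) / hx ** 2
    return f_t - (0.5 * sigma ** 2 * x ** 2 * f_xx + r * x * f_x - r * f(x, t))


residual = pde_residual(42.0, 0.5)
near_expiry_in = f(45.0, 1e-10)   # should be close to (45 - 40)^+ = 5
near_expiry_out = f(35.0, 1e-10)  # should be close to (35 - 40)^+ = 0
print(residual, near_expiry_in, near_expiry_out)
```

The residual should be zero up to discretization error, and the values just before expiry should be close to $(x - K)^+$, matching the condition $f(x, 0) = (x - K)^+$.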
Let us now briefly discuss American calls. Recall that these are ones where the holder
can buy the security at price K at any time up to time tE . Since the holder of an American
call can always wait up to time tE , which is equivalent to having a European call, the value
of an American call should always be at least as large as the value of the corresponding
European call.
Suppose one exercises an American call early. If $S_{t_E} > K$ and one exercised early, at time $t_E$ one has one share of stock, for which one paid $K$, and one has a profit of $(S_{t_E} - K)$. However, because one purchased the stock before time $t_E$, one lost the interest $Ke^{r(t_E - t)}$ that would have accrued by waiting to exercise the option. (We are supposing $r \ge 0$.) Thus in this case it would have been better to wait until time $t_E$ to exercise the option.
On the other hand, if $S_{t_E} < K$, exercising the option early would mean that one has lost $|S_{t_E} - K|$, whereas for the European option, one would have not exercised at all, and lost nothing (other than the price of the option).
In either case, exercising early gains nothing, hence the price of an American call should
be the same as that of a European call.
One can equally well price the European put, the option to sell a share of stock at price
K at time tE , by either Derivation 1 or Derivation 2 of the Black–Scholes formula. However
this analysis breaks down for American puts (sell a share of stock anytime up to time tE),
because in this case one gains by selling early: one can earn interest on the money received.
28.3 The fundamental theorem of finance
In the preceding section, we showed there was a probability measure Q under which Pt was a
martingale. This is true very generally. Let St be the price of a security in present-day dollars.
We will suppose St is a continuous semimartingale, and can be written St = Mt + At .
The NFLVR condition ("no free lunch with vanishing risk") is that one cannot find fixed
positive real numbers $t_0, \varepsilon, b > 0$ and predictable processes $H_n$ with
\[ \int_0^{t_0} |H_n(s)|\,|dA_s| + \int_0^{t_0} H_n(s)^2 \, d\langle M\rangle_s < \infty, \quad \text{a.s.,} \]
for each $n$ such that
\[ \int_0^{t_0} H_n(s)\, dS_s > -\frac{1}{n}, \quad \text{a.s.,} \]
for all $n$ and
\[ P\Big( \int_0^{t_0} H_n(s)\, dS_s > b \Big) > \varepsilon. \]
Here $t_0$, $b$, $\varepsilon$ do not depend on $n$. The condition rules out strategies that, with probability
at least $\varepsilon$, make a profit of $b$ while risking a loss no larger than $1/n$. $Q$ is an equivalent martingale measure
if $Q$ is a probability measure, $Q$ is equivalent to $P$ (which means they have the same null
sets), and $S_t$ is a local martingale under $Q$.

Theorem 28.1 If St is a continuous semimartingale and the NFLVR condition holds, then

there exists an equivalent martingale measure Q.

Proof Let us prove first of all that $dA_t$ is absolutely continuous with respect to $d\langle M\rangle_t$. We
suppose not and obtain a contradiction. Consider the measures $\mu_A$ and $\mu_{\langle M\rangle}$ on the predictable
σ-field defined by
\[ \mu_A(D) = E \int_0^\infty 1_D \, dA_t, \qquad \mu_{\langle M\rangle}(D) = E \int_0^\infty 1_D \, d\langle M\rangle_t. \tag{28.16} \]

Since $A$ is of bounded variation and continuous, it is a predictable process, and we can
write $A_t = B_t - C_t$, where $B$ and $C$ are continuous increasing processes and where $\mu_B$
and $\mu_C$ are mutually singular measures on the predictable σ-field; we define $\mu_B$ and $\mu_C$
analogously to (28.16). To give a few more details on how to do this, we write $A_t = B'_t - C'_t$,
where $B'$ and $C'$ are continuous increasing processes, we find non-negative predictable
processes $b_t$ and $c_t$ such that $B'_t = \int_0^t b_s \, d(B'_s + C'_s)$ and $C'_t = \int_0^t c_s \, d(B'_s + C'_s)$, and then let
$B_t = \int_0^t (b_s - (b_s \wedge c_s)) \, d(B'_s + C'_s)$ and $C_t = \int_0^t (c_s - (b_s \wedge c_s)) \, d(B'_s + C'_s)$. We leave it to
the reader to check that $B$ and $C$ are the desired processes. Since $\mu_B$ and $\mu_C$ are mutually
singular, there exists a set $E$ in the predictable σ-field such that $\mu_B(D) = \mu_B(D \cap E)$ and
$\mu_C(D) = \mu_C(D \cap E^c)$ for all sets $D$ in the predictable σ-field.

If $\mu_A$ is not absolutely continuous with respect to $\mu_{\langle M\rangle}$, then at least one of $\mu_B$ and $\mu_C$ is not
absolutely continuous. We assume that $\mu_B$ is not, for otherwise we can look at $-S_t$ instead of
$S_t$. Therefore there exists a predictable set $F$ and a fixed time $t_0$ such that $\int_0^{t_0} 1_F \, dB_s$ is almost
surely non-negative, is strictly positive with positive probability, and $\int_0^{t_0} 1_F \, d\langle M\rangle_s = 0$. We
can replace $F$ by $F \cap E$ and so assume that $F \subset E$, and hence $\mu_C(F) = \mu_C(F \cap E^c) = 0$.
Then, since $A = B - C$,
\[ \int_0^{t_0} 1_F \, dS_s = \int_0^{t_0} 1_F \, dM_s + \int_0^{t_0} 1_F \, dB_s - \int_0^{t_0} 1_F \, dC_s. \]
The stochastic integral term is 0 because $\int_0^{t_0} (1_F)^2 \, d\langle M\rangle_s = 0$. The integral with respect to
$C_s$ is zero because $\mu_C(F) = 0$. We then have the NFLVR condition violated with $H_n = 1_F$
for all $n$. Hence absolute continuity is established, and by the Radon–Nikodym theorem,
$A_t = \int_0^t h_s \, d\langle M\rangle_s$ for some predictable process $h_s$.

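Our next goal is to show $\int_0^t h_s^2 \, d\langle M\rangle_s < \infty$ for all $t$. Let
\[ U = \inf\Big\{ t : \int_0^t h_s^2 \, d\langle M\rangle_s = \infty \Big\}. \]
On $(U < \infty)$ there are two possibilities:
(1) $\int_0^t h_s^2 \, d\langle M\rangle_s < \infty$ if $t < U$ but $\int_0^U h_s^2 \, d\langle M\rangle_s = \infty$, and
(2) $\int_0^U h_s^2 \, d\langle M\rangle_s < \infty$ but $\int_U^{U+\varepsilon} h_s^2 \, d\langle M\rangle_s = \infty$ for all $\varepsilon$.
(For a real variable analog, consider the two functions $f_1(t) = \int_{-1}^t \frac{1}{|x|} \, dx$ and
$f_2(t) = \int_{-1}^t 1_{(x>0)} \frac{1}{x} \, dx$ at $t = 0$.)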

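Let us investigate case (1) and show that it cannot happen. Choose a fixed time $t_0$ such
that $P(U < t_0) > 0$. Let
\[ R_1 = R_1(n) = \inf\Big\{ t : \int_0^t h_s^2 \, d\langle M\rangle_s \ge n^4 \Big\} \wedge U \wedge t_0. \]
We suppose
\[ \inf_n P(R_1(n) < U \wedge t_0) > 0 \tag{28.17} \]
and obtain a contradiction. Let $H_t = h_t 1_{[0,R_1]}(t)/n^4$. Then
\[ \int_0^{t_0} H_s \, dA_s = \int_0^{R_1} \frac{h_s^2}{n^4} \, d\langle M\rangle_s \ge 1 \]
on $(R_1 < U < t_0)$. On the other hand,
\[ E\Big( \sup_{t \le t_0} \Big| \int_0^t H_s \, dM_s \Big| \Big)^2 \le 4 E \int_0^{t_0} H_s^2 \, d\langle M\rangle_s \le \frac{4}{n^8} \, n^4 = \frac{4}{n^4} \]
by Doob's inequalities. Therefore, by Chebyshev's inequality,
\[ P\Big( \sup_{t \le t_0} \Big| \int_0^t H_s \, dM_s \Big| > \frac{1}{n} \Big) \le \frac{E \sup_{t \le t_0} \big| \int_0^t H_s \, dM_s \big|^2}{n^{-2}} \le \frac{4/n^4}{n^{-2}} = \frac{4}{n^2}. \]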

Let
\[ R_2 = R_2(n) = \inf\Big\{ t : \Big| \int_0^t H_s \, dM_s \Big| \ge 1/n \Big\} \]
and let $\tilde H_t = H_t 1_{[0,R_2]}(t)$. We then have
\[ P(R_2 < R_1) \le P(R_2 \le t_0) \le 4/n^2, \]
\[ \int_0^{t_0} \tilde H_s \, dS_s = \int_0^{R_2} \tilde H_s \, dM_s + \int_0^{R_2} \tilde H_s \, dA_s \ge -\frac{1}{n} + \int_0^{R_2} \frac{h_s^2}{n^4} \, d\langle M\rangle_s \ge -1/n \]
almost surely, and
\[ P\Big( \int_0^{t_0} \tilde H_s \, dS_s \ge \frac12 \Big) \ge P(R_1 < U < t_0) - P(R_2 < R_1) \ge P(R_1 < U < t_0) - \frac{4}{n^2}. \]
We do this for each $n$, and thus obtain a contradiction to the NFLVR condition, so (28.17)
cannot hold.
Case (2) is similar: choose $\delta_n$ such that $\int_{U+\delta_n}^{U+1} h_s^2 \, d\langle M\rangle_s \ge n^4$ with positive probability, let
$H_t = h_t 1_{[U+\delta_n, U+1]}(t)/n^4$, and proceed as above. We leave the details as Exercise 28.3.
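We thus have $\int_0^t h_s^2 \, d\langle M\rangle_s < \infty$, a.s., for each $t$. Consequently the quantity
$\sup_{s \le t} |\int_0^s h_r \, dM_r|$ is also finite. Let
\[ V_n = \inf\Big\{ t : \Big| \int_0^t h_s \, dM_s \Big| \ge n \text{ or } \int_0^t h_s^2 \, d\langle M\rangle_s \ge n \Big\}. \]
We conclude $V_n \uparrow \infty$.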
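Define $Q$ on $\mathcal{F}_{V_n}$ by
\[ dQ/dP = \exp\Big( -\int_0^{V_n} h_s \, dM_s - \tfrac12 \int_0^{V_n} h_s^2 \, d\langle M\rangle_s \Big). \]
The exponent is bounded, so $Q$ is well defined. Under $Q$, if $t \le V_n$, then
\[ M_t - \Big\langle -\int_0^{\cdot} h_s \, dM_s, M \Big\rangle_t = M_t + \int_0^t h_s \, d\langle M\rangle_s = M_t + A_t \]
is a martingale by the Girsanov theorem (Exercise 13.5). Therefore $S_t = M_t + A_t$ is a local
martingale.
Finally, $\exp\big( -\int_0^t h_s \, dM_s - \tfrac12 \int_0^t h_s^2 \, d\langle M\rangle_s \big)$ is never zero nor infinite, so $Q$ is equivalent to $P$.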
Let us give two examples to clarify the proof. Let $C$ be the standard Cantor set and let $g(t)$
be the Cantor function. Suppose $S_t = W_t + g(t)$, where $W$ is a Brownian motion. We then
let $H_t = 1_C(t)$. Since the Cantor function increases only on the Cantor set, $\int_0^1 H_s \, dg(s) = 1$.
Since the Cantor set has Lebesgue measure 0, then $\int_0^1 H_s^2 \, ds = 0$. But this is the quadratic
variation of $\int_0^1 H_s \, dW_s$, so this stochastic integral is also 0. It follows that
\[ \int_0^1 H_s \, dS_s = \int_0^1 H_s \, dW_s + \int_0^1 H_s \, dg(s) = 1, \]
which says that with the trading strategy $H$ we make a profit of 1 almost surely, that
is, without any risk. Therefore the NFLVR condition is violated. This example indicates why
we must have $dA_t$ absolutely continuous with respect to $d\langle M\rangle_t$.
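The two facts used in the Cantor example can be checked on the finite stages of the Cantor construction. The following is a minimal sketch, not part of the original argument: stage n consists of 2^n intervals of length 3^(-n), so its Lebesgue measure is (2/3)^n, while the Cantor function still increases by a total of 1 across those intervals. The helper names `cantor_intervals` and `cantor_function` are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def cantor_intervals(n):
    """Return the 2**n closed intervals that make up stage n of the Cantor construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))   # keep the left third
            next_intervals.append((b - third, b))   # keep the right third
        intervals = next_intervals
    return intervals


def cantor_function(x, depth=40):
    """Evaluate the Cantor function g at a rational x in [0, 1] from its ternary digits."""
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)            # next ternary digit of x
        x -= digit
        if digit == 1:            # g is constant on the removed middle third
            return value + scale
        value += scale * (digit // 2)
        scale /= 2
    return value


n = 6
stage = cantor_intervals(n)
lebesgue_measure = sum(b - a for a, b in stage)
g_increase = sum(cantor_function(b) - cantor_function(a) for a, b in stage)
print("Lebesgue measure of stage", n, "=", lebesgue_measure)
print("total increase of g over stage", n, "=", g_increase)
```

As n grows the Lebesgue measure of the stage tends to 0 while the increase of g over it stays equal to 1, which is the numerical counterpart of the drift integral being 1 and the quadratic variation integral being 0.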
Suppose now that $W$ is a Brownian motion and $S_t = W_t + \int_0^t H_s \, ds$ with $H_s$ bounded. Let
\[ M_t = \exp\Big( -\int_0^t H_s \, dW_s - \tfrac12 \int_0^t H_s^2 \, ds \Big), \]
and define $Q$ on $\mathcal{F}_1$ by $dQ/dP = M_1$. By the Girsanov theorem, $S_t = W_t + \int_0^t H_s \, ds$ is a
martingale under $Q$. This example shows that if the Radon–Nikodym derivative of $dA_t$ with
respect to $d\langle M\rangle_t$ is not too bad, we can apply the Girsanov theorem.
28.4 Stochastic control
The theory of stochastic control, which includes a study of the Hamilton–Jacobi–Bellman
(HJB) equation and requires some knowledge of partial differential equations, is beyond the
scope of this book. However, we can consider one simple useful example. Suppose we have
available to us a stock which satisfies the SDE
dSt = σSt dWt + μSt dt,
where Wt is a Brownian motion, and a risk-free asset which satisfies the equation
dBt = rBt dt.
We want to put a proportion u of our wealth Zt into the stock and the remainder into the
risk-free asset. We will restrict 0 ≤ u ≤ 1, so that we do not borrow nor have short selling.
Also, we take μ > r, for if the mean rate of return on the stock is less than the risk-free rate,

we simply put all our money in the risk-free asset. How do we choose u in order to maximize

our return?

First of all, what do we mean by maximizing our return? Typically one chooses ahead

of time a deterministic function U , called the utility function, and one wants to maximize

EU (Zt0 ) at some fixed time t0. Usually utility functions are taken to be increasing and

concave. The function is chosen to be increasing because more money is considered better. It

is chosen concave because one assumes that twice the amount of money will give increased

pleasure, but not twice as much pleasure.

Let us work out the optimal control problem when $U(x) = x^p$ for some $p \in (0, 1)$. If
$Z_t$ (depending on $u$) is our wealth, we have $Z_t = S_t + B_t$ and $S_t = uZ_t$, $B_t = (1 - u)Z_t$.
We will allow $u$ to depend on $t$ and $\omega$, but our answer will turn out to be deterministic and
independent of $t$, i.e., $u$ is a constant.

We have seen (Proposition 24.6) that
\[ S_t = S_0 e^{\sigma W_t - \sigma^2 t/2 + \mu t} \]
and $d\langle S\rangle_t = \sigma^2 S_t^2 \, dt$ and that the equation for $B_t$ has the solution
\[ B_t = B_0 e^{rt}. \]
Therefore neither $S_t$ nor $B_t$ can ever be 0 or negative, and so $Z_t > 0$ for all $t$. Applying Itô's
formula to $Z_t^p$ and noting that $\langle Z\rangle_t = \langle S\rangle_t$, we have
\begin{align*}
dZ_t^p &= pZ_t^{p-1} \, dZ_t + \tfrac12 p(p-1) Z_t^{p-2} \, d\langle Z\rangle_t \\
&= pZ_t^{p-1} \sigma S_t \, dW_t + pZ_t^{p-1} \mu S_t \, dt + pZ_t^{p-1} r B_t \, dt + \tfrac12 p(p-1) Z_t^{p-2} \sigma^2 S_t^2 \, dt \\
&= puZ_t^p \sigma \, dW_t + puZ_t^p \mu \, dt + p(1-u) r Z_t^p \, dt + \tfrac12 p(p-1) Z_t^p \sigma^2 u^2 \, dt.
\end{align*}
Therefore
\[ E Z_{t_0}^p = E Z_0^p + pE \int_0^{t_0} Z_t^p \big[ u\mu + (1-u)r + \tfrac12 (p-1)\sigma^2 u^2 \big] \, dt. \]
This will be largest if the expression
\[ F(u) = u\mu + (1-u)r + \tfrac12 (p-1)\sigma^2 u^2 \]
is largest. Since $F$ is concave in $u$ and $F'(u) = \mu - r - (1-p)\sigma^2 u$, elementary calculus shows the maximum occurs when
\[ u = \frac{\mu - r}{(1-p)\sigma^2}. \]
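The maximization of F can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the parameter values are made up, the function names are hypothetical, and the clipping of the maximizer to [0, 1] is an addition motivated by the restriction 0 ≤ u ≤ 1 imposed earlier (for a concave F the constrained maximizer is the unconstrained one projected onto the interval).

```python
def drift_rate(u, mu, r, sigma, p):
    """F(u) = u*mu + (1-u)*r + 0.5*(p-1)*sigma^2*u^2, the bracketed expression in the text."""
    return u * mu + (1 - u) * r + 0.5 * (p - 1) * sigma**2 * u**2


def optimal_fraction(mu, r, sigma, p):
    """Closed-form maximizer (mu - r)/((1-p)*sigma^2), clipped to [0, 1] (clipping is an assumption)."""
    u_star = (mu - r) / ((1 - p) * sigma**2)
    return min(max(u_star, 0.0), 1.0)


# Illustrative (made-up) parameters: stock drift, risk-free rate, volatility, utility exponent.
mu, r, sigma, p = 0.08, 0.03, 0.4, 0.5

u_closed_form = optimal_fraction(mu, r, sigma, p)

# Brute-force check on a grid of admissible proportions in [0, 1].
grid = [i / 1000 for i in range(1001)]
u_grid = max(grid, key=lambda u: drift_rate(u, mu, r, sigma, p))

print("closed form:", u_closed_form)
print("grid search:", u_grid)
```

With these values the unconstrained maximizer already lies inside [0, 1], so the grid search and the formula agree; with a larger excess return or a smaller volatility the formula would exceed 1 and the clipped value would be used instead.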

Exercises

28.1 Let
\[ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-y^2/2} \, dy, \]
the cumulative normal distribution function. Rewrite the Black–Scholes formula for the value
of a European call in terms of $\Phi$. This is the way the Black–Scholes formula is written in finance
books.

28.2 A European put that gives one the option to sell a share of stock at price K at time tE has value

(K − StE )+ at time tE . Find the present-day value of the European put at time 0.

28.3 Carry out the details of the proof of Theorem 28.1 for Case 2.

28.4 If the utility function in Section 28.4 is U (x) = log x instead of U (x) = xp, what is the optimal

choice for u?

28.5 Let $a, b > 0$, let $Y_i$ be i.i.d. random variables that take only the values $b$ and $-a$, and let
$S_n = \sum_{i=1}^n Y_i$. Show that if $P(Y_1 = b) > 0$ and $P(Y_1 = -a) > 0$, there exists a probability
measure $Q$ equivalent to $P$ under which $S_n$ is a martingale. Describe the Radon–Nikodym
derivative of $Q$ with respect to $P$.

28.6 Suppose the interest rate $r$ is equal to 0 and an option $V$ has payoff $\sup_{s \le t_e} S_s$
at time $t_e$. What is the price of $V$ at time 0?

28.7 Suppose the interest rate $r$ is equal to 0. Let $U$ be the option that pays off $-\inf_{s \le t_e} S_s$ at time $t_e$.
What is the price of $U$ at time 0?

If V is as in Exercise 28.6, then U + V is the option that pays off the maximum of the stock

price minus the minimum of the stock price, in other words, “buy low, sell high.” Naturally

such an option would be expensive. It is remarkable that there exists a trading strategy that can

duplicate this payoff, even though the times when the maximum and minimum occur are not

stopping times.

29

Filtering

Stochastic filtering is a nice example of nontrivial interesting mathematics that is extremely

useful. For example, it has been used extensively in NASA’s space program.

The method we use is called the innovations approach to filtering, and uses Lévy’s theorem,

the martingale representation theorem, and other results from stochastic calculus.

We will start with a fairly general model, except for simplicity we will assume our

observation process is one-dimensional. The extension to the d-dimensional case is mostly

routine. Later on we will look at a specific model, the linear model, where one can give fairly

explicit solutions to the filtering equation for real-life problems.

29.1 The basic model

We start with a probability space $(\Omega, \mathcal{F}, P)$, together with a filtration $\{\mathcal{F}_t\}$ satisfying the
usual conditions. In filtering theory, there are a number of filtrations present, and we will
need to be careful about which ones are which.

We have a signal process $X_t$ taking values in a complete separable metric space $S$ and we
let $\{\mathcal{F}^X_t\}$ be the minimal augmented filtration generated by $X$. We have a function $f$ mapping
$S$ to the reals, we suppose $E|f(X_t)|^2 < \infty$ for all $t$, and we suppose that there exists a
process $A_s$ adapted to the filtration $\{\mathcal{F}^X_t\}$ such that
\[ M_t = f(X_t) - f(X_0) - \int_0^t A_s \, ds \]
is a martingale with respect to the filtration $\{\mathcal{F}^X_t\}$. Next we discuss the observation process.
Let $W_t$ be a one-dimensional Brownian motion with respect to the filtration $\{\mathcal{F}^X_t\}$, let $h_t$ be a
real-valued process adapted to $\{\mathcal{F}^X_t\}$, and suppose
\[ Z_t = W_t + \int_0^t h_s \, ds. \tag{29.1} \]
The process $Z_t$ is called the observation process and is what we observe. Let $\{\mathcal{F}^Z_t\}$ be the
filtration generated by the process $Z$. In practice one does not necessarily want to assume that
$\{\mathcal{F}^Z_t\}$ is right continuous, but let us assume that it is for simplicity. Requiring the filtration to
be complete is not a serious issue.
For an example, suppose that $dX_t = \sigma(X_t) \, d\overline{W}_t + b(X_t) \, dt$ as in Chapter 24, where $\overline{W}_t$ is
a $d$-dimensional Brownian motion and $\sigma$ and $b$ are matrix valued, and suppose $f \in C^2(\mathbb{R}^d)$
is bounded or has linear growth. Then Itô's formula shows that such an $f$ will satisfy our
assumptions. In this case $h_s$ in (29.1) is of the form $g(X_s)$ for a particular function $g$; see
Section 39.3.
The goal of filtering is to get the best estimate of $f(X_t)$ from the observations $\{Z_t\}$. We
want to find the best estimate for $f(X_t)$ in the following sense. We want to minimize the mean
square error $E|f(X_t) - Y|^2$ over all random variables $Y$ that are $\mathcal{F}^Z_t$ measurable, i.e., over all
random variables that can be determined by the observations up to time $t$. The rationale is
that since $\mathcal{F}^Z_t$ is the information we have observed up to time $t$, we want our estimate to be
$\mathcal{F}^Z_t$ measurable, and among all random variables that are $\mathcal{F}^Z_t$ measurable, we want the one
closest to $f(X_t)$ in $L^2$ norm, which means we minimize the mean square error.
Lemma 29.1 The best mean square error estimate of $f(X_t)$ over the class of $\mathcal{F}^Z_t$ measurable
random variables is
\[ Y = E[f(X_t) \mid \mathcal{F}^Z_t]. \]
Proof By our assumptions on $f$, the random variable $V = f(X_t)$ is in $L^2(P)$. Let $Y$ be
the best mean square estimator. The collection $\mathcal{M}$ of $L^2$ random variables which are $\mathcal{F}^Z_t$
measurable is a closed linear subspace of $L^2$, and the element of a Hilbert space that minimizes the
distance from $V$ to this subspace $\mathcal{M}$ is the projection onto $\mathcal{M}$. Therefore $Y$ is the projection
of $V$ onto $\mathcal{M}$. Hence $V - Y$ is orthogonal (in the $L^2$ sense) to every element of $\mathcal{M}$. In
particular, if $E \in \mathcal{F}^Z_t$,
\[ E[(V - Y) 1_E] = 0, \]
which implies $E[V; E] = E[Y; E]$. This holds for every $E \in \mathcal{F}^Z_t$ and $Y$ is $\mathcal{F}^Z_t$ measurable,
hence $Y = E[V \mid \mathcal{F}^Z_t]$.
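The projection argument can be seen concretely on a finite probability space, where a sub-σ-field is generated by a partition and the conditional expectation is a block average. The sketch below is an illustration only: the six-point space, the weights, the values of V, and the partition are all made up, and the function names are hypothetical. It checks the orthogonality relation E[(V − Y)1_E] = 0 on each block and that shifting Y on a block increases the mean square error.

```python
from fractions import Fraction

# A made-up six-point probability space with weights summing to 1.
prob = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10),
        Fraction(1, 10), Fraction(2, 10), Fraction(1, 10)]
V = [Fraction(4), Fraction(-1), Fraction(2), Fraction(0), Fraction(5), Fraction(3)]
# Partition generating the sub-sigma-field that plays the role of the observations.
blocks = [[0, 1], [2, 3, 4], [5]]


def conditional_expectation(values, prob, blocks):
    """Block-wise weighted average: E[values | sigma-field generated by the partition]."""
    result = [None] * len(values)
    for block in blocks:
        mass = sum(prob[i] for i in block)
        mean = sum(prob[i] * values[i] for i in block) / mass
        for i in block:
            result[i] = mean
    return result


def mean_square_error(values, estimate, prob):
    """E|values - estimate|^2 under the given weights."""
    return sum(p * (v - y) ** 2 for p, v, y in zip(prob, values, estimate))


Y = conditional_expectation(V, prob, blocks)

# Orthogonality: E[(V - Y) 1_E] = 0 for each generating block E.
residuals = [sum(prob[i] * (V[i] - Y[i]) for i in block) for block in blocks]

# Any other estimate measurable with respect to the partition does worse.
shifted = [y + Fraction(1, 2) if i in blocks[1] else y for i, y in enumerate(Y)]
print(mean_square_error(V, Y, prob), mean_square_error(V, shifted, prob))
```

Shifting Y by a constant c on a block of mass m raises the error by exactly c²m, which is the Pythagorean identity behind the Hilbert space proof.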
Given any process $H_t$ that is $\{\mathcal{F}_t\}$ adapted, we use the notation $\hat H_t = E[H_t \mid \mathcal{F}^Z_t]$. We will
look at expressions like $\int_0^t \hat H_s \, ds$, and you might wonder about the joint measurability of $\hat H$
in $\omega$ and $t$, since $\hat H_t$ is only defined almost surely for each $t$. The way to deal with this is to
let $\hat H_t$ be the optional projection of $H$ with respect to the optional σ-field generated by $\{\mathcal{F}^Z_t\}$;
see (16.8) in Chapter 16.
29.2 The innovation process
We next define the innovation process
\[ N_t = Z_t - \int_0^t \hat h_s \, ds. \tag{29.2} \]
(Following our convention on notation, $\hat h_s = E[h_s \mid \mathcal{F}^Z_s]$.) Note that although $N_t$ is $\mathcal{F}^Z_t$
measurable, we cannot determine it from (29.2) because it contains the unknown $\hat h_s$ on the
right-hand side.
Proposition 29.2 $N_t$ is a Brownian motion with respect to the filtration $\{\mathcal{F}^Z_t\}$.
Proof We will show that $N_t$ is a continuous martingale with respect to the filtration
$\{\mathcal{F}^Z_t\}$ whose quadratic variation is $t$, and then our result follows from Lévy's theorem
(Theorem 12.1). That $N_t$ is continuous is obvious, and $\langle N\rangle_t = \langle Z\rangle_t = \langle W\rangle_t = t$ from
the definitions of $Z$ and $W$. Thus we need to show that $N$ is a martingale with respect to $\{\mathcal{F}^Z_t\}$.
If $r \ge s$, we have
\[ E[\hat h_r \mid \mathcal{F}^Z_s] = E[E[h_r \mid \mathcal{F}^Z_r] \mid \mathcal{F}^Z_s] = E[h_r \mid \mathcal{F}^Z_s]. \tag{29.3} \]
Then using Exercise 29.1,
\begin{align*}
E[N_t - N_s \mid \mathcal{F}^Z_s] &= E[Z_t - Z_s \mid \mathcal{F}^Z_s] - \int_s^t E[\hat h_r \mid \mathcal{F}^Z_s] \, dr \tag{29.4} \\
&= E[W_t - W_s \mid \mathcal{F}^Z_s] + \int_s^t E[h_r - \hat h_r \mid \mathcal{F}^Z_s] \, dr \\
&= E[E[W_t - W_s \mid \mathcal{F}^X_s] \mid \mathcal{F}^Z_s] = 0,
\end{align*}
since $\mathcal{F}^Z_s \subset \mathcal{F}^X_s$.
29.3 Representation of FZ-martingales
In this section we prove that if $Y_t$ is a martingale with respect to $\{\mathcal{F}^Z_t\}$, then $Y$ can be
represented as a stochastic integral with respect to $N$. This is not an immediate consequence
of Theorem 12.3 because we do not know that $N_t$ generates $\{\mathcal{F}^Z_t\}$; the filtration generated by
$N$ could conceivably be strictly smaller than the one generated by $Z$.
Theorem 29.3 Suppose $Y_t$ is a square integrable martingale with respect to $\{\mathcal{F}^Z_t\}$. Let $\mathcal{P}^Z$ be
the predictable σ-field defined on $[0, \infty) \times \Omega$ in terms of $\{\mathcal{F}^Z_t\}$. Then there exists $H_s$ which
is $\mathcal{P}^Z$ measurable and with $E \int_0^\infty H_s^2 \, ds < \infty$ such that
\[ Y_t = Y_0 + \int_0^t H_s \, dN_s \tag{29.5} \]
for all $t$.
To clarify, $\mathcal{P}^Z$ is the σ-field generated by all bounded left-continuous processes that are
adapted to $\{\mathcal{F}^Z_t\}$.
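Proof First let us treat the case where $\int_0^t \hat h_s \, dN_s$, $\int_0^t |\hat h_s|^2 \, ds$, and $Y_t$ are each bounded. Define
$Q$ on $\mathcal{F}^Z_t$ by $dQ/dP|_{\mathcal{F}^Z_t} = M_t$, where
\[ M_t = \exp\Big( -\int_0^t \hat h_s \, dN_s - \tfrac12 \int_0^t |\hat h_s|^2 \, ds \Big). \]
Then by the Girsanov theorem (Theorem 13.3)
\[ Z_t = N_t + \int_0^t \hat h_s \, ds \]
is a martingale under $Q$ with respect to $\{\mathcal{F}^Z_t\}$. Since $\langle Z\rangle_t = \langle N\rangle_t = \langle W\rangle_t = t$, then $Z$ is a
Brownian motion under $Q$ with respect to $\{\mathcal{F}^Z_t\}$.
Let $\tilde Y_t = M_t^{-1} Y_t$. If $A \in \mathcal{F}^Z_s$, then $A \in \mathcal{F}^X_s$ and
\begin{align*}
E_Q[\tilde Y_t; A] &= E_P[M_t (M_t^{-1} Y_t); A] = E_P[Y_t; A] = E_P[Y_s; A] \\
&= E_P[M_s (M_s^{-1} Y_s); A] = E_Q[\tilde Y_s; A].
\end{align*}
Therefore $\tilde Y_t$ is a martingale under $Q$ with respect to $\{\mathcal{F}^Z_t\}$. By the martingale representation
theorem (Theorem 12.3) there exists $K_s \in \mathcal{P}^Z$ such that
\[ \tilde Y_t = \tilde Y_0 + \int_0^t K_s \, dZ_s = \tilde Y_0 + \int_0^t K_s \, dN_s + \int_0^t K_s \hat h_s \, ds. \]
On the other hand, $dM_t = -M_t \hat h_t \, dN_t$ and $Y_t = M_t \tilde Y_t$. We have $d\langle M, \tilde Y\rangle_t = -M_t \hat h_t K_t \, dt$.
By the product formula,
\begin{align*}
Y_t &= M_0 \tilde Y_0 + \int_0^t \tilde Y_s \, dM_s + \int_0^t M_s \, d\tilde Y_s + \langle M, \tilde Y\rangle_t \\
&= Y_0 - \int_0^t \tilde Y_s M_s \hat h_s \, dN_s + \int_0^t K_s M_s \, dN_s + \int_0^t K_s \hat h_s M_s \, ds - \int_0^t M_s \hat h_s K_s \, ds,
\end{align*}
which is of the desired form if we set $H_s = K_s M_s - \tilde Y_s M_s \hat h_s$.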
In the general case, let
\[ T_K = \inf\Big\{ t : \Big| \int_0^t \hat h_s \, dN_s \Big| + \int_0^t |\hat h_s|^2 \, ds + |Y_t| \ge K \Big\}. \]
We apply the above argument to $Y_{t \wedge T_K}$ and use Exercise 29.3 to get
\[ Y_{t \wedge T_K} = Y_0 + \int_0^t H^K_s \, dN_s, \]
where $H^K_s$ is predictable with respect to the σ-fields $\{\mathcal{F}^Z_{t \wedge T_K}\}$ and is 0 from time $T_K$ on. Since
$Y_t$ is square integrable, $Y_{T_K} \to Y_\infty$ almost surely and in $L^2(P)$ as $K \to \infty$, and
\[ E\Big[ \int_0^\infty |H^K_s - H^L_s|^2 \, ds \Big] = E[\,|Y_{T_K} - Y_{T_L}|^2] \to 0 \]
as $K, L \to \infty$. Using the completeness of $L^2$, there exists $H_s$ such that $E \int_0^\infty H_s^2 \, ds < \infty$
and $E \int_0^\infty |H_s - H^K_s|^2 \, ds \to 0$ as $K \to \infty$. It is routine to check that $H_s$ is $\mathcal{P}^Z$ measurable
and that (29.5) holds.
29.4 The filtering equation
We now derive the general filtering equation. First we need a lemma.
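Lemma 29.4 If $Y_t - \int_0^t H_s \, ds$ is a martingale with respect to $\{\mathcal{F}^X_t\}$, then $\hat Y_t - \int_0^t \hat H_s \, ds$ is a
martingale with respect to $\{\mathcal{F}^Z_t\}$.
Proof Since $\mathcal{F}^Z_s \subset \mathcal{F}^X_s$,
\begin{align*}
E\Big[ \hat Y_t - \hat Y_s - \int_s^t \hat H_r \, dr \,\Big|\, \mathcal{F}^Z_s \Big]
&= E\Big[ E[Y_t \mid \mathcal{F}^Z_t] - E[Y_s \mid \mathcal{F}^Z_s] - \int_s^t E[H_r \mid \mathcal{F}^Z_r] \, dr \,\Big|\, \mathcal{F}^Z_s \Big] \\
&= E\Big[ Y_t - Y_s - \int_s^t H_r \, dr \,\Big|\, \mathcal{F}^Z_s \Big] \\
&= E\Big[ E\Big[ Y_t - Y_s - \int_s^t H_r \, dr \,\Big|\, \mathcal{F}^X_s \Big] \,\Big|\, \mathcal{F}^Z_s \Big] = 0.
\end{align*}
The first equality is proved in a fashion similar to the one you were asked to prove in
Exercise 29.1.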
Here is the filtering equation.
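Theorem 29.5 Let $M_t = f(X_t) - f(X_0) - \int_0^t A_s \, ds$ be a martingale with respect to $\{\mathcal{F}^X_t\}$
and write $F_s$ for $f(X_s)$. Suppose $\langle M, W\rangle_t = \int_0^t D_s \, ds$. Then
\[ \hat F_t = \hat F_0 + \int_0^t \hat A_s \, ds + \int_0^t \big( \widehat{Fh}_s - \hat F_s \hat h_s + \hat D_s \big) \, dN_s. \tag{29.6} \]
Here $\widehat{Fh}_s$ denotes $E[F_s h_s \mid \mathcal{F}^Z_s]$.
Proof By Lemma 29.4,
\[ L_t = \hat F_t - \hat F_0 - \int_0^t \hat A_s \, ds \tag{29.7} \]
is a martingale with respect to $\{\mathcal{F}^Z_t\}$ and by Theorem 29.3, there exists $H_s$ such that
\[ L_t = \int_0^t H_s \, dN_s. \tag{29.8} \]
By the product formula and $dZ_s = dW_s + h_s \, ds$,
\begin{align*}
F_t Z_t &= \int_0^t F_s \, dZ_s + \int_0^t Z_s \, dF_s + \int_0^t D_s \, ds \\
&= \int_0^t F_s \, dW_s + \int_0^t F_s h_s \, ds + \int_0^t Z_s \, dM_s + \int_0^t Z_s A_s \, ds + \int_0^t D_s \, ds \\
&= \mathcal{F}^X\text{-martingale} + \int_0^t [F_s h_s + Z_s A_s + D_s] \, ds.
\end{align*}
By Lemma 29.4 and the obvious fact that $Z$ is adapted to $\{\mathcal{F}^Z_t\}$,
\[ \widehat{F_t Z_t} = \hat F_t Z_t = \mathcal{F}^Z\text{-martingale} + \int_0^t \big( \widehat{Fh}_s + Z_s \hat A_s + \hat D_s \big) \, ds. \]
Again using the product formula,
\begin{align*}
\hat F_t Z_t &= \int_0^t \hat F_s \, dZ_s + \int_0^t Z_s \, d\hat F_s + \int_0^t H_s \, ds \\
&= \mathcal{F}^Z\text{-martingale} + \int_0^t [\hat F_s \hat h_s + Z_s \hat A_s + H_s] \, ds.
\end{align*}
Therefore
\[ \int_0^t \big( \widehat{Fh}_s + Z_s \hat A_s + \hat D_s - \hat F_s \hat h_s - Z_s \hat A_s - H_s \big) \, ds \]
is a continuous $\mathcal{F}^Z$-martingale that has paths that are locally of bounded variation and which
is zero at time zero, hence is identically zero by Theorem 9.7. Hence with probability one,
$H_s = \widehat{Fh}_s - \hat F_s \hat h_s + \hat D_s$ for almost every $s$. Substituting this in (29.8) and combining with
(29.7) gives our result.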
29.5 Linear models
The filtering equation (29.6) is difficult to apply in most cases. However, in the linear model,
we can get a much simpler representation. To define the linear model in $d$ dimensions, let $X_t$
solve
\[ dX_t = A(t) \, d\overline{W}_t + B(t) X_t \, dt, \tag{29.9} \]
where $\overline{W}_t$ is a $d$-dimensional Brownian motion and $A(t)$ and $B(t)$ are deterministic $d \times d$
matrices that are continuous in $t$. Let
\[ dZ_t = dW_t + C(t) X_t \, dt, \tag{29.10} \]
where $C$ is a deterministic $d \times d$ matrix-valued function that is continuous in $t$ and $W_t$ is a
$d$-dimensional Brownian motion independent of $\overline{W}$ and $X$.
Why is this model useful? Suppose $X_t$ is two-dimensional with $X^{(1)}_t$ being the position of
a particle and $X^{(2)}_t$ its velocity. Suppose the position and the velocity have some randomness
and that our observations of the position and velocity are noisy. This fits into the model
(29.9)–(29.10) if we take
\[ A(t) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad B(t) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad C(t) = \begin{pmatrix} c_1 & 0 \\ 0 & c_2 \end{pmatrix}. \]
For another example, suppose a particle has a fixed unknown velocity and the position is
observed, but obscured by noise. Let $X^{(1)}_t$ and $X^{(2)}_t$ be the position and velocity and let $A(t)$
be the zero matrix,
\[ B(t) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad C(t) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}. \]
The solution of the filtering problem modeled by (29.9)–(29.10) is known as the Kalman–Bucy
filter. For simplicity we will consider the special case where the dimension $d$ is 1 and
$A, B, C$ are constant in $t$; the general case is done in exactly the same way, but the notation
becomes much more complicated (see Kallianpur, 1980). We will further assume $E X_0$ and
$\operatorname{Var} X_0$ are known.
29.6 Kalman–Bucy filter
Let
\[ V_t = \widehat{X^2}_t - (\hat X_t)^2, \]
the conditional variance of $X_t$ given $\mathcal{F}^Z_t$.
Theorem 29.6 $V_t$ solves the deterministic ordinary differential equation
\[ \frac{dV_t}{dt} = 1 + 2BV_t - C^2 V_t^2, \qquad V_0 = \operatorname{Var} X_0. \tag{29.11} \]
In particular, $V_t$ is deterministic. $\hat X_t$ solves
\[ d\hat X_t = CV_t \, dZ_t + (B - C^2 V_t) \hat X_t \, dt, \qquad \hat X_0 = E X_0. \tag{29.12} \]
The equation (29.11) is an example of what is known as a Riccati equation. We get a
similar equation when $d > 1$ or when $A$, $B$, and $C$ depend on $t$, but in general one cannot
solve the Riccati equation explicitly. However, when $d = 1$ and $A, B, C$ do not depend on $t$,
one can solve (29.11) by separation of variables. Write
\[ \frac{dV}{1 + 2BV - C^2 V^2} = dt, \]
and integrate both sides.

When $d = 1$ (and even if $A$, $B$, and $C$ depend on time), we can solve (29.12). Let
$G_t = B - C^2 V_t$ so that we have
\[ d\hat X_t = CV_t \, dZ_t + G_t \hat X_t \, dt, \]
or by the product formula
\[ d\Big[ e^{-\int_0^t G_r \, dr} \hat X_t \Big] = e^{-\int_0^t G_r \, dr} CV_t \, dZ_t, \]
and hence
\[ \hat X_t = e^{\int_0^t G_r \, dr} E X_0 + \int_0^t e^{\int_s^t G_r \, dr} CV_s \, dZ_s. \]
(Cf. the solution of (24.15).)
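The behavior of the Riccati equation (29.11) is easy to explore numerically. The following is a sketch under stated assumptions, not the method of the text: it integrates dV/dt = 1 + 2BV − C²V² with a classical fourth-order Runge–Kutta scheme and compares the result at a large time with the positive root of C²V² − 2BV − 1 = 0, which is the limiting conditional variance when V₀ ≥ 0. The values of B, C, V₀ and the function names are illustrative choices.

```python
import math


def riccati_rhs(v, b, c):
    """Right-hand side of (29.11): dV/dt = 1 + 2*B*V - C^2*V^2."""
    return 1.0 + 2.0 * b * v - c * c * v * v


def solve_riccati(v0, b, c, t_end, steps):
    """Integrate the Riccati equation on [0, t_end] with classical RK4; returns the whole path."""
    dt = t_end / steps
    v = v0
    path = [v]
    for _ in range(steps):
        k1 = riccati_rhs(v, b, c)
        k2 = riccati_rhs(v + 0.5 * dt * k1, b, c)
        k3 = riccati_rhs(v + 0.5 * dt * k2, b, c)
        k4 = riccati_rhs(v + dt * k3, b, c)
        v += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        path.append(v)
    return path


def steady_state_variance(b, c):
    """Positive root of C^2 V^2 - 2 B V - 1 = 0, where the right-hand side of (29.11) vanishes."""
    return (b + math.sqrt(b * b + c * c)) / (c * c)


# Illustrative (made-up) coefficients and initial variance.
B, C, V0 = -0.5, 2.0, 1.0
path = solve_riccati(V0, B, C, t_end=10.0, steps=10000)
v_infinity = steady_state_variance(B, C)

print("V at t = 10:", path[-1])
print("steady state:", v_infinity)
```

The quantity C·V_t is the gain multiplying the observation increment in the filter equation, so once V_t has settled near its limit the filter behaves like one with a constant gain.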

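Proof of Theorem 29.6 By Itô's formula, if $f \in C^2$,
\[ f(X_t) - f(X_0) = \mathcal{F}^X\text{-martingale} + \int_0^t \big[ \tfrac12 f''(X_s) + BX_s f'(X_s) \big] \, ds. \]
By the filtering equation applied with $f(x) = x$,
\[ \hat X_t = E X_0 + B \int_0^t \hat X_s \, ds + C \int_0^t V_s \, dN_s. \tag{29.13} \]
By Exercises 29.4(2) and 29.5(3),
\[ \widehat{X^3}_t - \hat X_t \widehat{X^2}_t = 2 \hat X_t V_t. \tag{29.14} \]
With the filtering equation applied with $f(x) = x^2$ and (29.14),
\begin{align*}
\widehat{X^2}_t &= E X_0^2 + \int_0^t (1 + 2B \widehat{X^2}_s) \, ds + C \int_0^t (\widehat{X^3}_s - \hat X_s \widehat{X^2}_s) \, dN_s \\
&= E X_0^2 + \int_0^t (1 + 2B \widehat{X^2}_s) \, ds + 2C \int_0^t V_s \hat X_s \, dN_s.
\end{align*}
Therefore
\begin{align*}
dV_t &= d\big( \widehat{X^2}_t - (\hat X_t)^2 \big) \tag{29.15} \\
&= 2CV_t \hat X_t \, dN_t + (1 + 2B \widehat{X^2}_t) \, dt - 2 \hat X_t (CV_t \, dN_t + B \hat X_t \, dt) - C^2 V_t^2 \, dt \\
&= (1 + 2BV_t - C^2 V_t^2) \, dt.
\end{align*}
This shows that $V_t$ solves the deterministic ordinary differential equation (29.15). This
equation has a unique solution (cf. Theorem 15.1), so $V_t$ is deterministic. We obtain (29.12)
from (29.2), (29.10), and (29.13): by (29.10) we have $\hat h_t = C \hat X_t$, so (29.2) gives
$dN_t = dZ_t - C \hat X_t \, dt$, and we substitute this into (29.13).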

Exercises

29.1 Justify the first equality in (29.4).

29.2 Show that if $M_t$ is a martingale with respect to $\{\mathcal{F}^X_t\}$, then $\hat M_t$ is a martingale with respect
to $\{\mathcal{F}^Z_t\}$.

29.3 Suppose $W$ is a Brownian motion and $\{\mathcal{F}_t\}$ is its minimal augmented filtration. Let $T$ be a
bounded stopping time with respect to $\{\mathcal{F}_t\}$. Suppose $Y$ is a $\mathcal{F}_T$ measurable random variable
with $EY^2 < \infty$. Show that there exists a predictable process $H_s$ with $E \int_0^T H_s^2 \, ds < \infty$ such
that $Y = EY + \int_0^T H_s \, dW_s$, a.s.

29.4 (1) Show that the solution to (29.9) is a Gaussian process.
(2) Show that the solutions $(X_t, Z_t)$ to (29.9)–(29.10) form a Gaussian process.

29.5 (1) Show that if $X$ is a normal random variable with mean $\mu$ and variance $\sigma^2$, then
$E X^3 = \mu(\mu^2 + 3\sigma^2)$.
(2) Show that if $X, Y_1, \ldots, Y_n$ are jointly normal random variables, then
\[ E[X^3 \mid Y_1, \ldots, Y_n] = E[X \mid Y_1, \ldots, Y_n] \big( E[X \mid Y_1, \ldots, Y_n]^2 + 3 \operatorname{Var}[X \mid Y_1, \ldots, Y_n] \big), \]
where
\[ \operatorname{Var}[X \mid Y_1, \ldots, Y_n] = E[(X - E[X \mid Y_1, \ldots, Y_n])^2 \mid Y_1, \ldots, Y_n]. \]
(3) Show that
\[ \widehat{X^3}_t = \hat X_t \big( (\hat X_t)^2 + 3 \operatorname{Var}(X_t \mid \mathcal{F}^Z_t) \big), \]
where
\[ \operatorname{Var}(X_t \mid \mathcal{F}^Z_t) = E[(X_t - \hat X_t)^2 \mid \mathcal{F}^Z_t] = \widehat{X^2}_t - (\hat X_t)^2. \]
Notes
For more on filtering, see Kallianpur (1980) and Øksendal (2003).
30
Convergence of probability measures
Suppose we have a sequence of probabilities on a metric space S and we want to define
what it means for the sequence to converge weakly. Alternately, we may have a sequence of
random variables and want to say what it means for the random variables to converge weakly.
We will apply the results we obtain here in later chapters to the case where S is a function
space such as C[0, 1] and obtain theorems on the convergence of stochastic processes.
For now our state space is assumed to be an arbitrary metric space, although we will
soon add additional assumptions on $S$. We use the Borel σ-field on $S$, which is the σ-field
generated by the open sets in $S$. We write $A^0$, $\overline{A}$, and $\partial A$ for the interior, closure, and boundary
of $A$, respectively.
30.1 The portmanteau theorem
Clearly the definition of weak convergence of real-valued random variables in terms of dis-
tribution functions (see Section A.12) has no obvious analog. The appropriate generalization
is the following; cf. Proposition A.41.
Definition 30.1 A sequence of probabilities $\{P_n\}$ on a metric space $S$ furnished with the
Borel σ-field is said to converge weakly to $P$ if $\int f \, dP_n \to \int f \, dP$ for every bounded
and continuous function $f$ on $S$. A sequence of random variables $\{X_n\}$ taking values in $S$
converges weakly to a random variable $X$ taking values in $S$ if $E f(X_n) \to E f(X)$ whenever
$f$ is a bounded and continuous function.
Saying $X_n$ converges weakly to $X$ is the same as saying that the laws of $X_n$ converge weakly
to the law of $X$. To see this, if $P_n$ is the law of $X_n$, that is, $P_n(A) = P(X_n \in A)$ for each
Borel subset $A$ of $S$, then $E f(X_n) = \int f \, dP_n$ and $E f(X) = \int f \, dP$. (This holds when $f$ is
an indicator by the definition of the law of $X_n$ and $X$, then for simple functions by linearity,
then for non-negative measurable functions by monotone convergence, and then for arbitrary
bounded and Borel measurable $f$ by linearity.)
What might cause a bit of confusion is that weak convergence in probability is not the
same as weak convergence in functional analysis, but rather is equivalent to what is known as
weak-∗ convergence in functional analysis. Feel free to skip the remainder of this paragraph
where we explain this. Recall that if $B$ is a Banach space and $B^*$ is its dual, then $x_n \in B$
converges weakly to $x \in B$ if $f(x_n) \to f(x)$ for all $f \in B^*$. $f_n \in B^*$ converges with respect
to the weak-∗ topology to $f \in B^*$ if $f_n(x) \to f(x)$ for all $x \in B$. By the Riesz representation
theorem, there is a one-to-one correspondence between positive bounded linear functionals
on $B = C(X)$, the continuous functions on $X$, where $X$ is compact, and the set $\mathcal{M}$ of finite
measures on $X$. When $B = C(X)$, $B^*$ can be identified with $\mathcal{M}$, and measures $P_n$ with mass
1 in $\mathcal{M}$ converge to $P \in \mathcal{M}$ with respect to the weak-∗ topology if $P_n(g) \to P(g)$ for every
$g \in B = C(X)$. Interpreting $P_n(g)$ as $\int g \, dP_n$ shows the connection.
Returning to weak convergence in the probability sense, the following theorem, known as
the portmanteau theorem, gives some other characterizations. For this chapter we let
\[ F_\delta = \{x : d(x, F) < \delta\} \tag{30.1} \]
for closed sets $F$, the set of points within $\delta$ of $F$, where $d(x, F) = \inf\{d(x, y) : y \in F\}$.
Theorem 30.2 Suppose $\{P_n, n = 1, 2, \ldots\}$ and $P$ are probabilities on a metric space. The
following are equivalent.
(1) $P_n$ converges weakly to $P$.
(2) $\limsup_n P_n(F) \le P(F)$ for all closed sets $F$.
(3) $\liminf_n P_n(G) \ge P(G)$ for all open sets $G$.
(4) $\lim_n P_n(A) = P(A)$ for all Borel sets $A$ such that $P(\partial A) = 0$.
Proof The equivalence of (2) and (3) is easy because if $F$ is closed, then $G = F^c$ is open
and $P_n(G) = 1 - P_n(F)$.
To see that (2) and (3) imply (4), suppose $P(\partial A) = 0$. Then
\begin{align*}
\limsup_n P_n(A) &\le \limsup_n P_n(\overline{A}) \le P(\overline{A}) \\
&= P(A^0) \le \liminf_n P_n(A^0) \le \liminf_n P_n(A).
\end{align*}
Next, let us show (4) implies (2). Let $F$ be closed. If $y \in \partial F_\delta$, then $d(y, F) = \delta$. The sets
$\partial F_\delta$ are disjoint for different $\delta$. At most countably many of them can have positive $P$-measure,
hence there exists a sequence $\delta_k \downarrow 0$ such that $P(\partial F_{\delta_k}) = 0$ for each $k$. Then
\[ \limsup_n P_n(F) \le \limsup_n P_n(F_{\delta_k}) = P(F_{\delta_k}) = P(\overline{F_{\delta_k}}) \]
for each $k$. Since $P(F_{\delta_k}) \downarrow P(F)$ as $\delta_k \to 0$, this gives (2).
We show now that (1) implies (2). Suppose $F$ is closed. Let $\varepsilon > 0$. Take $\delta > 0$ small
enough so that $P(F_\delta) - P(F) < \varepsilon$. Then take $f$ continuous, to be equal to 1 on $F$, to have
support in $F_\delta$, and to be bounded between 0 and 1. For example, $f(x) = 1 - (1 \wedge \delta^{-1} d(x, F))$
would do. Then
\[ \limsup_n P_n(F) \le \limsup_n \int f \, dP_n = \int f \, dP \le P(F_\delta) \le P(F) + \varepsilon. \]
Since this is true for all $\varepsilon$, (2) follows.
Finally, let us show (2) implies (1). Let $f$ be bounded and continuous. If we show
\[ \limsup_n \int f \, dP_n \le \int f \, dP, \tag{30.2} \]
for every such $f$, then applying this inequality to both $f$ and $-f$ will give (1). By adding a
sufficiently large positive constant to $f$ and then multiplying by a suitable constant, without
loss of generality we may assume $f$ is bounded and takes values in $(0, 1)$. We define
$F_i = \{x : f(x) \ge i/k\}$, which is closed. Then
\begin{align*}
\int f \, dP_n &\le \sum_{i=1}^k \frac{i}{k} P_n\Big( \frac{i-1}{k} \le f(x) < \frac{i}{k} \Big) \\
&= \sum_{i=1}^k \frac{i}{k} [P_n(F_{i-1}) - P_n(F_i)] \\
&= \sum_{i=0}^{k-1} \frac{i+1}{k} P_n(F_i) - \sum_{i=1}^k \frac{i}{k} P_n(F_i) \\
&\le \frac{1}{k} + \frac{1}{k} \sum_{i=1}^k P_n(F_i).
\end{align*}
Similarly,
\[ \int f \, dP \ge \frac{1}{k} \sum_{i=1}^k P(F_i). \]
Then
\begin{align*}
\limsup_n \int f \, dP_n &\le \frac{1}{k} + \frac{1}{k} \sum_{i=1}^k \limsup_n P_n(F_i) \\
&\le \frac{1}{k} + \frac{1}{k} \sum_{i=1}^k P(F_i) \le \frac{1}{k} + \int f \, dP.
\end{align*}
Since $k$ is arbitrary, this gives (30.2).
If $x_n \to x$, $P_n = \delta_{x_n}$, and $P = \delta_x$, it is easy to see $P_n$ converges weakly to $P$. Letting
$F = \{x\}$ with $x_n \ne x$ for all $n$ shows that one cannot, in general, have $\lim_n P_n(F) = P(F)$ for all closed sets $F$.
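The point-mass example can be made concrete in a few lines. The sketch below is an illustration under assumptions: it takes x_n = 1/n and x = 0 on the real line and uses the arctangent as one particular bounded continuous test function; the helper names are hypothetical. It shows the integrals against the point masses converging while the mass of the closed set {0} stays at 0 along the sequence and equals 1 in the limit.

```python
import math


def dirac_integral(f, point):
    """Integral of f with respect to the point mass at `point`, which is just f(point)."""
    return f(point)


def dirac_mass_of_singleton(point, singleton):
    """Mass that the point mass at `point` assigns to the closed set {singleton}."""
    return 1.0 if point == singleton else 0.0


x = 0.0
xs = [1.0 / n for n in range(1, 2001)]   # x_n = 1/n converges to x = 0
f = math.atan                            # one bounded continuous test function

integrals = [dirac_integral(f, xn) for xn in xs]
masses = [dirac_mass_of_singleton(xn, x) for xn in xs]

print("integral against P_n for the last n:", integrals[-1], " limit:", dirac_integral(f, x))
print("largest P_n({0}):", max(masses), " P({0}):", dirac_mass_of_singleton(x, x))
```

This is consistent with part (2) of the portmanteau theorem, which only asserts an inequality for closed sets, and with part (4), since the boundary of {0} is {0} itself and has positive measure under the limit.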
30.2 The Prohorov theorem
It turns out there is a simple condition that ensures that a sequence of probability measures
has a weakly convergent subsequence.
Definition 30.3 A sequence of probabilities $P_n$ on a metric space $S$ is tight if for every $\varepsilon > 0$
there exists a compact set $K$ (depending on $\varepsilon$) such that $\sup_n P_n(K^c) \le \varepsilon$.
The important result here is Prohorov’s theorem.
Theorem 30.4 If a sequence of probability measures on a metric space S is tight, there is a
subsequence that converges weakly to a probability measure on S .
Proof Suppose first that the metric space $S$ is compact. Then $C(S)$, the collection of
continuous functions on $S$, is a separable metric space when furnished with the supremum
norm; this is Exercise 30.1. Let $\{f_i\}$ be a countable collection of non-negative elements of
$C(S)$ whose linear span is dense in $C(S)$. For each $i$, $\int f_i \, dP_n$ is a bounded sequence, so we
have a convergent subsequence. By a diagonalization procedure, we can find a subsequence
$n'$ such that $\int f_i \, dP_{n'}$ converges for all $i$. By the term "diagonalization procedure" we are
referring to the well-known method of proof of the Ascoli–Arzelà theorem; see any book
on real analysis for a detailed explanation. Call the limit $L f_i$. Clearly $0 \le L f_i \le \|f_i\|_\infty$,
$L$ is linear, and so we can extend $L$ to a bounded linear functional on $C(S)$. By the Riesz
representation theorem (Rudin, 1987), there exists a measure $P$ such that $L f = \int f \, dP$.
Since $\int f_i \, dP_{n'} \to \int f_i \, dP$ for all $f_i$, it is not hard to see, since each $P_{n'}$ has total mass 1, that
$\int f \, dP_{n'} \to \int f \, dP$ for all $f \in C(S)$. Therefore $P_{n'}$ converges weakly to $P$. Since $L f \ge 0$ if
$f \ge 0$, then $P$ is a positive measure. The function that is identically equal to 1 is bounded
and continuous, so $1 = P_{n'}(S) = \int 1 \, dP_{n'} \to \int 1 \, dP$, or $P(S) = 1$.
Next suppose that S is a Borel subset of a compact metric space S ′. Extend each Pn,
initially defined on S , to S ′ by setting Pn(S ′ \ S ) = 0. By the first paragraph of the proof,
there is a subsequence Pn′ that converges weakly to a probability P on S ′ (the definition of
weak convergence here is relative to the topology on S ′). Since the Pn are tight, there exist
compact subsets Km of S such that Pn(Km) ≥ 1 − 1/m for all n. The Km will also be compact
relative to the topology on S ′, so by Theorem 30.2,
$$P(K_m) \ge \limsup_{n'} P_{n'}(K_m) \ge 1 - 1/m.$$
Since ∪mKm ⊂ S , we conclude P(S ) = 1.
If G is open in S , then G = H ∩ S for some H open in S ′. Then
$$\liminf_{n'} P_{n'}(G) = \liminf_{n'} P_{n'}(H) \ge P(H) = P(H \cap S) = P(G).$$
Thus by Theorem 30.2, Pn′ converges weakly to P relative to the topology on S .
Now let S be an arbitrary metric space. Since all the Pn’s are supported on ∪mKm, we can
replace S by ∪mKm, or we may as well assume that S is σ -compact, and hence separable. It
remains to embed the separable metric space S into a compact metric space S ′. If d is the
metric on S , d ∧ 1 will also be an equivalent metric, that is, one that generates the same
collection of open sets, so we may assume d is bounded by 1. Now S can be embedded in
$S' = [0, 1]^{\mathbb{N}}$ as follows. We define a metric on $S'$ by
$$d'(a, b) = \sum_{i=1}^{\infty} 2^{-i}(|a_i - b_i| \wedge 1), \qquad a = (a_1, a_2, \ldots),\ b = (b_1, b_2, \ldots). \tag{30.3}$$
Being the product of compact spaces, $S'$ is itself compact. If $\{z_j\}$ is a countable dense subset
of $S$, let $I : S \to [0, 1]^{\mathbb{N}}$ be defined by
$$I(x) = (d(x, z_1), d(x, z_2), \ldots).$$
We leave it to the reader to check that I is a one-to-one continuous open map of S to a subset
of S ′. Since S is σ -compact, and the continuous image of compact sets is compact, then
I(S ) is a Borel set.
Clearly, Prohorov’s theorem is easily modified to handle the case of finite measures
on S .
30.3 Metrics for weak convergence
Since we have defined a notion of convergence of probability measures, one might wonder
if one can make the set of probability measures M on S into a metric space so that weak
convergence is equivalent to convergence in M. This is indeed possible and in fact there are
a number of metrics on the space of probability measures that work. We will focus on the
Prohorov metric.
Definition 30.5 If P and Q are probability measures on a separable metric space S , define
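$$d_M(P, Q) = \inf\{\varepsilon : P(F) \le Q(F^\varepsilon) + \varepsilon \text{ for all } F \text{ closed}\}, \tag{30.4}$$
where $F^\varepsilon = \{x : d(x, F) < \varepsilon\}$ is the open $\varepsilon$-enlargement of $F$.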
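To make the definition concrete, here is a hedged sketch that evaluates the infimum in (30.4) for two finitely supported probability measures on the real line. The function name `prohorov_one_sided`, the dictionary representation of a measure, and the bisection tolerance are assumptions of this example. It uses the observation that for finitely supported P it is enough to test closed sets F that are nonempty subsets of the support of P, since replacing F by its intersection with the support keeps P(F) and can only shrink the enlargement.

```python
from itertools import combinations

def prohorov_one_sided(P: dict, Q: dict, tol: float = 1e-9) -> float:
    """inf{eps : P(F) <= Q(F^eps) + eps for all closed F}, for finitely supported P, Q on R.

    P and Q map support points to their masses. Only subsets of supp(P) are tested.
    """
    support = list(P)
    subsets = [c for r in range(1, len(support) + 1) for c in combinations(support, r)]

    def holds(eps: float) -> bool:
        for F in subsets:
            p_mass = sum(P[x] for x in F)
            # Q-mass of the open enlargement F^eps = {y : d(y, F) < eps}
            q_mass = sum(w for y, w in Q.items() if min(abs(y - x) for x in F) < eps)
            if p_mass > q_mass + eps + 1e-12:   # small slack for rounding
                return False
        return True

    lo, hi = 0.0, 1.0            # the condition always holds at eps = 1
    while hi - lo > tol:         # the condition is monotone in eps, so bisect
        mid = 0.5 * (lo + hi)
        if holds(mid):
            hi = mid
        else:
            lo = mid
    return hi

d_close = prohorov_one_sided({0.0: 1.0}, {0.3: 1.0})   # two nearby point masses
d_far = prohorov_one_sided({0.0: 1.0}, {5.0: 1.0})     # two distant point masses
d_pq = prohorov_one_sided({0.0: 0.5, 1.0: 0.5}, {0.0: 1.0})
d_qp = prohorov_one_sided({0.0: 1.0}, {0.0: 0.5, 1.0: 0.5})
```

For two point masses at distance a the value is min(a, 1), and swapping the arguments in the third example returns the same number, in line with the symmetry established in Proposition 30.6.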
It is not immediately obvious that dM is even a metric, so the first task is to show that it is.
Proposition 30.6 dM is a metric on M.
Proof We start with symmetry, that is, that $d_M(Q, P) = d_M(P, Q)$. Let $\alpha$ be any real
number larger than $d_M(P, Q)$. If $H$ is closed, then $H^\alpha = \{x : d(x, H) < \alpha\}$ is open and
$K = S \setminus H^\alpha$ is closed. Note that $H \subset S \setminus K^\alpha$, where $K^\alpha = \{x : d(x, K) < \alpha\}$, because
if $x \in H$, then $d(x, K) \ge \alpha$, so $x \notin K^\alpha$ and hence $x \in S \setminus K^\alpha$. Since $K$ is closed, by the
definition of $d_M(P, Q)$,
$$P(H^\alpha) = 1 - P(K) \ge 1 - Q(K^\alpha) - \alpha = Q(S \setminus K^\alpha) - \alpha \ge Q(H) - \alpha,$$
or $Q(H) \le P(H^\alpha) + \alpha$. Since $H$ was an arbitrary closed set, $d_M(Q, P) \le \alpha$, and it follows
that $d_M(Q, P) \le d_M(P, Q)$. Reversing the roles of $P$ and $Q$ shows symmetry.
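Clearly $d_M(P, Q) \ge 0$. If $d_M(P, Q) = 0$, then $P(F) \le Q(F^\varepsilon) + \varepsilon$ for every $\varepsilon > 0$ and every
closed set $F$. Since $F$ is closed, $F^\varepsilon$ decreases to $F$ as $\varepsilon \to 0$, so $P(F) \le Q(F)$, and by symmetry
$P(F) = Q(F)$ for all closed sets $F$. Since the collection of closed sets generates the Borel σ-field, it is
not hard to see that $P(A) = Q(A)$ for all Borel subsets $A$, and hence $P = Q$.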
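Finally we prove the triangle inequality. Suppose $P, Q, R \in M$. If $\alpha$ is any real larger than
$d_M(P, Q)$ and $\beta$ any real larger than $d_M(Q, R)$, then for any $\varepsilon > 0$ and any closed set $F$
$$P(F) \le Q(F^\alpha) + \alpha \le Q(\overline{F^\alpha}) + \alpha \le R((\overline{F^\alpha})^\beta) + \alpha + \beta \le R(F^{\alpha+\beta+\varepsilon}) + (\alpha + \beta + \varepsilon).$$
Therefore $d_M(P, R) \le \alpha + \beta + \varepsilon$, and since $\varepsilon$ is arbitrary, the triangle inequality follows.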

Now we show that weak convergence is equivalent to convergence in the topology generated by $d_M$, at least if $S$ is separable. ($L^\infty[0, 1]$ is an example of a nonseparable metric space.)

Proposition 30.7 Suppose $S$ is a separable metric space. A sequence of probability measures $P_n$ on $S$ converges weakly to a probability $P$ if and only if $d_M(P_n, P) \to 0$.

Proof We first suppose $d_M(P_n, P) \to 0$ and show that $P_n$ converges weakly to $P$. Separability is not used in this part of the proof. Suppose $F$ is closed and set $\varepsilon_n = d_M(P_n, P) + 1/n$. Since $P_n(F) \le P(F^{\varepsilon_n}) + \varepsilon_n$, then
$$\limsup_n P_n(F) \le \limsup_n P(F^{\varepsilon_n}) = P(F),$$
and we now apply Theorem 30.2(2).

We now suppose $P_n$ converges weakly to $P$. Let $\varepsilon > 0$. Cover $S$ with countably many balls $\{B_i\}$ of diameter less than $\varepsilon/2$ (separability is used here) and let $A_1 = B_1$, $A_2 = B_2 \setminus B_1$, $A_3 = B_3 \setminus (B_1 \cup B_2)$, $A_4 = B_4 \setminus (B_1 \cup B_2 \cup B_3)$, and so on. Hence the $A_n$ form a collection of disjoint sets which cover $S$ and each $A_n$ has diameter less than $\varepsilon/2$. Choose $N$ large enough so that $P(\cup_{i=1}^N A_i) > 1 - \varepsilon/2$. Let $\mathcal{G}$ be the collection of open sets of the form $(A_{i_1} \cup \cdots \cup A_{i_j})^{\varepsilon/2}$ such that $i_1, \ldots, i_j \le N$. That is, we look at all finite unions of $A_1, \ldots, A_N$, and then take the $(\varepsilon/2)$-enlargements. The collection $\mathcal{G}$ is finite. This fact and Theorem 30.2(3) imply that we can find $n_0$ such that $P(G) \le P_n(G) + \varepsilon/2$ if $n \ge n_0$ and $G \in \mathcal{G}$.

Suppose $F$ is closed. Let $G = (\cup\{A_i : i \le N, A_i \cap F \ne \emptyset\})^{\varepsilon/2}$. Then $G \in \mathcal{G}$ and if $n \ge n_0$
$$P(F) \le P(G) + P(\cup_{i=N+1}^{\infty} A_i) \le P(G) + \varepsilon/2 \le P_n(G) + \varepsilon \le P_n(F^\varepsilon) + \varepsilon.$$
In the last inequality we used the definition of $G$ and the fact that the $A_i$ have diameters less than $\varepsilon/2$. This shows $d_M(P, P_n) \le \varepsilon$ if $n \ge n_0$, which in turn implies $d_M(P, P_n) \to 0$.

Exercises

30.1 If $S$ is a metric space, then it is well known that $C(S)$, the collection of continuous functions with the metric
$$d(f, g) = \sup_{x \in S} |f(x) - g(x)|$$
is a metric space. Show that if $S$ is compact, then $C(S)$ is separable.

30.2 Suppose Xn converges weakly to X and the random variables Zn are such that d(Xn, Zn)

converges to 0 in probability. Prove that Zn converges weakly to X . This is known as Slutsky’s

theorem.

Hint: Start with P(Zn ∈ F ) ≤ P(Xn ∈ Fδ ) + P(d(Xn, Zn) ≥ δ).

30.3 Suppose Xn take values in a normed linear space and converge weakly to X . Suppose cn are

scalars converging to c. Show cnXn converges weakly to cX .

30.4 Give an example of a sequence $P_n$ converging weakly to $P$ and a function $f$ that is continuous but not bounded such that $\int f\,dP_n$ does not converge to $\int f\,dP$.

30.5 Give an example of a sequence $P_n$ converging weakly to $P$ and a function $f$ that is bounded but not continuous such that $\int f\,dP_n$ does not converge to $\int f\,dP$.

30.6 Show that if Xn converges weakly to X and Yn converges in probability to 0, then XnYn converges

in probability to 0.

30.7 This exercise considers a sequence of probability measures that have densities. Suppose $S$ is furnished with the Borel σ-field and $\mu$ is a measure on $S$. Suppose that $f_n : S \to [0, \infty)$ and $f : S \to [0, \infty)$ are measurable functions, each of whose integral over $S$ is one, and define $P_n(A) = \int_A f_n(x)\,\mu(dx)$ for each $n$ and $P(A) = \int_A f(x)\,\mu(dx)$.

(1) Show that if $f_n \to f$, $\mu$-a.e., then $P_n$ converges weakly to $P$.

(2) Give an example where $P_n$ and $P$ are as above, $P_n$ converges weakly to $P$, but $f_n$ does not converge almost everywhere to $f$.

30.8 Give an example of continuous processes $X_n$ and $X$ such that all the finite-dimensional distributions of $X_n$ converge weakly to the corresponding finite-dimensional distributions of $X$, but where $X_n$ does not converge weakly to $X$ with respect to the topology of C[0, 1].

30.9 Suppose X is a random variable taking values in a complete separable metric space. If ε > 0,

show there exists a compact set K such that P(X /∈ K) < ε.
Hint: For each $n$ choose closed balls $\{B_{ni}, i = 1, \ldots, N_n\}$ such that
$$P(X \notin \cup_{i=1}^{N_n} B_{ni}) < \varepsilon/2^{n+1}.$$
Then $K = \cap_{n=1}^{\infty} \cup_{i=1}^{N_n} B_{ni}$ is totally bounded, hence compact.
30.10 Suppose Xn converges weakly to X and the metric space S is complete and separable. Prove that
the sequence {Xn} is tight.
30.11 Let L be the collection of continuous functions on S such that
(1) supx∈S | f (x)| ≤ 1.
(2) | f (x) − f (y)| ≤ d(x, y) for all x, y ∈ S.
Define
$$d_L(P, Q) = \sup_{f \in L} \Big| \int f\,dP - \int f\,dQ \Big|.$$
Show that $d_L$ is a metric on the collection of probability measures on the Borel σ-field of $S$. Prove
that a sequence of probability measures $P_n$ converges weakly to $P$ if and only if $d_L(P_n, P) \to 0$.
30.12 Suppose S is a separable metric space. Show that M is separable.
Notes
For more information, see Billingsley (1968) and Ethier and Kurtz (1986).
31
Skorokhod representation
Suppose $S$ is a complete separable metric space furnished with the Borel σ-field. We are
going to show that if $X_n$ are random variables taking values in $S$ converging weakly to a
random variable $X$, then we can find another probability space and other random variables
$X'_n$, $X'$ such that the law of $X'_n$ equals the law of $X_n$ for each $n$, the law of $X'$ equals the law
of $X$, and $X'_n$ converges to $X'$ almost surely.
Let $\Omega' = [0, 1]$, $\mathcal{F}'$ the Borel σ-field on [0, 1], and $P'$ Lebesgue measure. We first
prove
Theorem 31.1 Let $P$ be a probability measure on $S$. Then there exists a random variable $X$
mapping $\Omega'$ to $S$ such that the law of $X$ under $P'$ is equal to $P$.
Proof For each k ≥ 1, let {Aki} be a countable disjoint covering of S by Borel sets of
diameter less than 1/k, such that P(∂Aki) = 0, and {Aki} is a refinement of {Ak−1,i}. We
can construct these families inductively. To start, cover S with countably many balls of
radius less than 1. Since for each x0, P({x : d(x, x0) = r}) can be nonzero for at most
countably many values of r, we can arrange matters so that the P-measure of the boundary
of these balls is 0. We order the balls B1, B2, . . . , and then let A11 = B1, A12 = B2 \ B1,
A13 = B3 \ (B1 ∪ B2), and so on. To construct {A2i}, we first find a similar covering of S by
sets {A′2i} of diameter less than 1/2, and then take all intersections of sets in {A′2i} with sets
in {A1 j}.
We inductively define closed subintervals of [0, 1] by choosing $I_{11}$ to have left endpoint
at 0 and length equal to $P(A_{11})$, then $I_{12}$ to have left endpoint equal to the right endpoint of
$I_{11}$ and length equal to $P(A_{12})$, and so forth. We then decompose $I_{11}$ into subintervals $\{I_{2i}\}$ in
an analogous way so that the lengths of the subintervals match the probabilities of the $A_{2i}$'s
contained in $A_{11}$. We then subdivide $I_{12}$, and so on. We observe that $\{I_{ki}\}$ is a refinement of
$\{I_{k-1,i}\}$ for all $k \ge 2$ and $P'(I_{ki}) = P(A_{ki})$ for all $k$ and $i$.
Pick a point $x_{ki} \in A_{ki}$ for each $k$ and $i$. We define $X^k$ by setting $X^k(\omega')$ equal to $x_{ki}$ if
$\omega' \in I_{ki}$. (The set of endpoints of the $I_{ki}$ is countable, hence has Lebesgue measure 0, and it
doesn't matter how we define $X^k$ at those points.) For each $\omega'$ except those that are endpoints
of some $I_{ki}$, if $n \ge m$, then $X^n(\omega')$ and $X^m(\omega')$ are in the same $A_{mi}$ for some $i$. Since the
diameter of $A_{mi}$ is less than $1/m$, we see that $d(X^n(\omega'), X^m(\omega')) \le 1/m$. That is, $X^n(\omega')$
is a Cauchy sequence. The space $S$ is complete, so we can define $X(\omega')$ to be the limit of
the $X^n(\omega')$. The collection of endpoints of the $I_{mi}$ is countable, so the limit exists for almost
every $\omega'$.
It remains to show that the law of $X$ under $P'$ is $P$. Let $F$ be a closed set, let $F_k = \{x : d(x, F) < 1/k\}$, and let $J_k = \{i : A_{ki} \cap F \ne \emptyset\}$. We have
$$P'(X^k \in F) \le P'(X^k \in \cup_{i \in J_k} A_{ki}) \le \sum_{i \in J_k} P'(X^k \in A_{ki}) = \sum_{i \in J_k} P'(I_{ki}) = \sum_{i \in J_k} P(A_{ki}) \le P(F_k).$$
We used the fact that each $A_{ki}$ has diameter less than $1/k$. Hence
$$\limsup_k P'(X^k \in F) \le P(F).$$
Therefore the laws of $X^k$ under $P'$ converge weakly to $P$. But we know $d(X^k(\omega'), X(\omega')) \le 1/k$, so $X^k$ converges to $X$, a.s., with respect to $P'$. If $f$ is continuous and bounded,
$E' f(X^k) \to E' f(X)$ by dominated convergence, so $X^k \to X$ weakly. Therefore the law of
$X$ under $P'$ is equal to $P$.
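The interval construction in the proof can be sketched in the simplest case, where P is supported on finitely many points. Each point then serves as its own cell, so only the first-level partition is needed and no refinement takes place. The names `interval_representation`, `points` and `probs`, and the grid used to approximate Lebesgue measure, are assumptions of this example.

```python
from bisect import bisect_right
from itertools import accumulate

def interval_representation(points, probs):
    """Return X : [0, 1) -> points whose law under Lebesgue measure is (points, probs).

    The i-th interval has left endpoint sum(probs[:i]) and length probs[i],
    mirroring the intervals built from the cells of the first partition.
    """
    right_endpoints = list(accumulate(probs))

    def X(omega: float):
        i = bisect_right(right_endpoints, omega)
        return points[min(i, len(points) - 1)]   # guard against rounding at the far right

    return X

points = ["a", "b", "c"]
probs = [0.2, 0.5, 0.3]
X = interval_representation(points, probs)

# Approximate the Lebesgue measure of {omega : X(omega) = x} with a fine midpoint grid.
m = 100_000
grid = [(j + 0.5) / m for j in range(m)]
law = {x: sum(1 for w in grid if X(w) == x) / m for x in points}
```

The dictionary `law` should reproduce `probs` up to the grid resolution, which is the statement that the law of X under Lebesgue measure is P.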
We did not need the fact that the Aki were continuity sets, i.e., that the probability of the
boundary of Aki is zero, but this will be used in the next theorem, which is known as the
Skorokhod representation.
Theorem 31.2 Suppose $P_n$ are probability measures on $S$ converging weakly to $P$. Then
there exist random variables $X_n$ mapping $\Omega'$ to $S$ with laws $P_n$ and a random variable $X$
mapping $\Omega'$ to $S$ with law $P$ such that $X_n \to X$, a.s.
Equivalently, if $X'_n$ converges to $X'$ weakly, there exist random variables $X_n$ and $X$ mapping
$\Omega'$ to $S$ with laws equal to those of $X'_n$ and $X'$, respectively, such that $X_n \to X$, a.s.
Proof Let the $A_{ki}$ be as in the proof of the previous theorem, and for each $P_n$ define
intervals $I^n_{ki}$ and random variables $X^k_n$ as was done above, and let $X_n$ be the limit of the $X^k_n$'s.
Let $K_{kn} = \{i : P(A_{ki}) > P_n(A_{ki})\}$ and $K^c_{kn} = \{i : P(A_{ki}) \le P_n(A_{ki})\}$. Since
$$\sum_i [P(A_{ki}) - P_n(A_{ki})] = 1 - 1 = 0,$$
we have
$$\sum_{K^c_{kn}} [P(A_{ki}) - P_n(A_{ki})] = -\sum_{K_{kn}} [P(A_{ki}) - P_n(A_{ki})].$$
Hence
$$\begin{aligned}
\sum_i |P'(I_{ki}) - P'(I^n_{ki})| &= \sum_i |P(A_{ki}) - P_n(A_{ki})| \\
&= \sum_{K_{kn}} [P(A_{ki}) - P_n(A_{ki})] - \sum_{K^c_{kn}} [P(A_{ki}) - P_n(A_{ki})] \\
&= 2 \sum_{K_{kn}} [P(A_{ki}) - P_n(A_{ki})] \\
&= 2 \sum_i [P(A_{ki}) - P_n(A_{ki})]^+.
\end{aligned} \tag{31.1}$$
Each term in the sum on the last line goes to 0 as $n \to \infty$ by Theorem 30.2 because the $A_{ki}$ are $P$-continuity sets, that is, $P(\partial A_{ki}) = 0$; also each term is dominated by $P(A_{ki})$, and $\sum_i P(A_{ki}) = 1$. Therefore by dominated convergence the sum on the last line of (31.1) goes to 0.

Fix $k$ and $j$ and let $\alpha$, $\alpha_n$ be the left-hand endpoints of $I_{kj}$, $I^n_{kj}$, respectively. Then (31.1) allows us to use dominated convergence to conclude that
$$\alpha = \sum_{i \in J} P'(I_{ki}) = \lim_{n \to \infty} \sum_{i \in J} P'(I^n_{ki}) = \lim_{n \to \infty} \alpha_n,$$
where $J$ consists of those $i$ such that $I_{ki}$ is to the left of $I_{kj}$; note that for $i \in J$ we have that $I^n_{ki}$ is to the left of $I^n_{kj}$ and conversely, if $I^n_{ki}$ is to the left of $I^n_{kj}$, then $i \in J$. Similarly the right-hand endpoint of $I^n_{kj}$ converges to the right-hand endpoint of $I_{kj}$.

If $\omega'$ is in the interior of $I_{kj}$, then it will be in the interior of $I^n_{kj}$ for all sufficiently large $n$. This means that for $n$ sufficiently large,
$$d(X(\omega'), X_n(\omega')) \le 2/k.$$
This implies our result.
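On the real line the same conclusion can be illustrated with a simpler device than the nested-interval construction of the proof, namely the quantile (inverse distribution function) transform on ([0, 1], Lebesgue). The sketch below swaps in that technique and uses exponential laws with rates 1 + 1/n converging to rate 1; the choice of laws, the function names and the sample points are assumptions of this example.

```python
import math

def exponential_quantile(rate: float, omega: float) -> float:
    """Inverse distribution function of the exponential law with the given rate, omega in (0, 1)."""
    return -math.log(1.0 - omega) / rate

def X_n(n: int, omega: float) -> float:
    """Representative of P_n = Exp(1 + 1/n) defined on ([0, 1], Lebesgue)."""
    return exponential_quantile(1.0 + 1.0 / n, omega)

def X(omega: float) -> float:
    """Representative of the limit law P = Exp(1) on the same space."""
    return exponential_quantile(1.0, omega)

omegas = [0.1, 0.5, 0.9, 0.999]
# Largest pointwise gap |X_n(omega) - X(omega)| over the sample points, for growing n.
gaps = {n: max(abs(X_n(n, w) - X(w)) for w in omegas) for n in (1, 10, 100, 1000)}
```

Each X_n has law P_n and X has law P, and the gaps shrink as n grows, so the weakly convergent laws are realized by random variables that converge at every sampled point of the common probability space.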

Exercises

31.1 Suppose f is bounded, Xn converges to X weakly, and also that P(X ∈ D f ) = 0, where

D f = {x : f is not continuous at x}. Show that f (Xn) converges weakly to f (X ).

31.2 Suppose a sequence $\{X_n\}$ is uniformly integrable and $X_n$ converges to $X$ weakly. Show $E X_n \to E X$.

31.3 Give an example of a sequence of random variables $X_n$ converging weakly to $X$ and where each $X_n$ is integrable, but $X$ is not integrable.

31.4 Suppose $X_n$ converges weakly to $X$ and each $X_n$ is non-negative. Prove that
$$E X \le \liminf_{n \to \infty} E X_n.$$

31.5 Suppose $X_n$ converges weakly to $X$ and each $X_n$ has the property that with probability one,
$$|X_n(t) - X_n(s)| \le |t - s|, \qquad s, t \le 1.$$
(This might arise, for example, if each $X_n$ is of the form $X_n(t) = \int_0^t Y_n(s)\,ds$ and each $Y_n$ is bounded by 1.) Prove that $X$ has this same property, that is, with probability one,
$$|X(t) - X(s)| \le |t - s|, \qquad s, t \le 1.$$

31.6 Here is a way to prove one direction of Lebesgue's theorem on Riemann integrable functions.

(1) For each $n \ge 1$ and each $i \le n$, let $x_{in}$ be a point in $[(i-1)/n, i/n)$. Let $P_n$ be the probability measure that assigns mass $1/n$ to each point $x_{in}$, $i = 1, 2, \ldots, n$. Show that $P_n$ converges weakly to $P$, where $P$ is Lebesgue measure on [0, 1].

(2) Suppose $f$ is a bounded function which is continuous at almost every point of [0, 1]. Show that $\int f\,dP_n \to \int f\,dP$. Note that $\int f\,dP_n$ is a Riemann sum approximation to $\int_0^1 f(x)\,dx$.

32

The space C[0, 1]

We examine weak convergence for the space C[0, 1], the set of continuous real-valued

functions on [0, 1]. We give a criterion for the laws of a sequence of continuous stochastic

processes to be tight. We apply these results to show that a simple symmetric random walk

converges weakly to a Brownian motion, which in particular gives another construction of

Brownian motion.

32.1 Tightness

Let C[0, 1] be the collection of continuous real-valued functions from [0, 1] into R. We make

C[0, 1] into a metric space by defining

$$d(f, g) = \sup_{t \in [0,1]} |f(t) - g(t)|,$$

and it is well known that C[0, 1] is separable and complete. We recall the Ascoli–Arzelà

theorem: if a family F of functions on a compact set is equicontinuous and uniformly

bounded at one point, then every subsequence in F has a further subsequence in F that

converges. Rephrased another way, if the family F is equicontinuous and uniformly bounded

at one point, then the closure of F is compact. We furnish C[0, 1] with the Borel σ -field.

Given a continuous function $f$ on [0, 1], we define $\omega_f$, the modulus of continuity of $f$, by
$$\omega_f(\delta) = \sup_{s,t \in [0,1],\ |t-s| < \delta} |f(t) - f(s)|.$$
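The modulus of continuity can be approximated numerically by restricting s and t to a finite grid. The following is a minimal sketch under that assumption; the function name, the grid size and the two test functions are choices made for this example, and a grid only gives a lower approximation of the true supremum.

```python
import math

def modulus_of_continuity(f, delta: float, grid_size: int = 400) -> float:
    """Grid approximation of sup{|f(t) - f(s)| : s, t in [0, 1], |t - s| < delta}."""
    ts = [i / grid_size for i in range(grid_size + 1)]
    values = [f(t) for t in ts]
    best = 0.0
    for i in range(len(ts)):
        for j in range(i + 1, len(ts)):
            if ts[j] - ts[i] >= delta:
                break            # grid is sorted, so later j are even farther from ts[i]
            best = max(best, abs(values[j] - values[i]))
    return best

w_linear = modulus_of_continuity(lambda t: t, 0.1)   # Lipschitz function: about delta
w_sqrt = modulus_of_continuity(math.sqrt, 0.01)      # square root: about sqrt(delta)
```

The square root has a much larger modulus at small scales than a Lipschitz function, which is the kind of behaviour that conditions such as (32.1) are designed to control uniformly in n.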
We have the following criterion for a sequence of continuous processes to be tight.
Theorem 32.1 Suppose the $X_n$ are continuous real-valued processes. Suppose for each $\varepsilon$
and $\eta > 0$ there exist $n_0$, $A$, and $\delta$ (depending on $\varepsilon$ and $\eta$) such that if $n \ge n_0$, then
$$P(\omega_{X_n}(\delta) \ge \varepsilon) \le \eta \tag{32.1}$$
and
$$P(|X_n(0)| \ge A) \le \eta. \tag{32.2}$$
Then the $X_n$ are tight.

Proof Since each $X_i$ is a continuous process, then for each $i$, $P(\omega_{X_i}(\delta) \ge \varepsilon) \to 0$ as $\delta \to 0$ by dominated convergence. Hence, given $\varepsilon$ and $\eta$ we can, by taking $\delta$ smaller if necessary, assume that (32.1) holds for all $n$.

Choose $\varepsilon_m = \eta_m = 2^{-m}$ and consider the $\delta_m$ and $A_m$ so that
$$\sup_n P(\omega_{X_n}(\delta_m) \ge 2^{-m}) \le 2^{-m}$$
and
$$\sup_n P(|X_n(0)| \ge A_m) \le 2^{-m}.$$
Let
$$K_{m_0} = \Big\{ f \in C[0, 1] : \sup_{s,t \in [0,1],\ |t-s| \le \delta_m} |f(t) - f(s)| \le 2^{-m} \text{ for all } m \ge m_0,\ |f(0)| \le A_{m_0} \Big\}.$$
Each $K_{m_0}$ is an equicontinuous family, and by the Ascoli–Arzelà theorem, each $K_{m_0}$ is a compact subset of C[0, 1]. We have
$$P(X_n \notin K_{m_0}) \le P(|X_n(0)| \ge A_{m_0}) + \sum_{m=m_0}^{\infty} P(\omega_{X_n}(\delta_m) \ge \varepsilon_m) \le 2^{-m_0} + \sum_{m=m_0}^{\infty} 2^{-m} = 3 \cdot 2^{-m_0}.$$
This proves tightness.

We have given one criterion for a process to have continuous paths, namely, Theorem 8.1.

In the case of Markov processes, we have given another: Theorem 21.5.

32.2 A construction of Brownian motion

We will now use the results of Section 32.1 to give a construction of Brownian motion, quite

different from that of Chapter 6.

Let $Y_i$ be i.i.d. random variables with $P(Y_i = 1) = P(Y_i = -1) = \tfrac12$. Then $S_n = \sum_{i=1}^n Y_i$ is a simple symmetric random walk. Let $Z_n(t) = S_{nt}/\sqrt{n}$ for $t$ a multiple of $1/n$ and define $Z_n(t)$ by linear interpolation for other $t$. That is, if $k/n \le t \le (k+1)/n$, then
$$Z_n(t) = \frac{(k+1) - nt}{\sqrt{n}}\, S_k + \frac{nt - k}{\sqrt{n}}\, S_{k+1}. \tag{32.3}$$
The $Z_n$ are continuous processes. Let $P_n$ be the law of $Z_n$, which will be a probability measure on C[0, 1].
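The rescaled, interpolated walk of (32.3) is straightforward to sketch in code. The helper names `random_walk` and `Z`, the walk length 100, and the fixed seed are assumptions of this example and are used only to make the illustration reproducible.

```python
import math
import random

def random_walk(n: int, rng: random.Random):
    """Partial sums S_0 = 0, S_1, ..., S_n of i.i.d. +/-1 steps."""
    S = [0]
    for _ in range(n):
        S.append(S[-1] + rng.choice((-1, 1)))
    return S

def Z(S, t: float) -> float:
    """Rescaled, linearly interpolated walk Z_n(t) of (32.3), with n = len(S) - 1."""
    n = len(S) - 1
    k = min(int(n * t), n - 1)          # index with k/n <= t <= (k+1)/n
    return ((k + 1 - n * t) * S[k] + (n * t - k) * S[k + 1]) / math.sqrt(n)

rng = random.Random(0)                   # fixed seed, chosen only for reproducibility
S = random_walk(100, rng)
path = [Z(S, j / 1000) for j in range(1001)]   # Z_n sampled on a fine grid of [0, 1]
```

At multiples of 1/n the function returns the rescaled walk itself, between them it interpolates linearly, and the slope of each linear piece is plus or minus the square root of n, which is the bound used later for times closer together than 2/n.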

Theorem 32.2 The sequence Pn converges weakly to a probability measure P∞ on C[0, 1],

and P∞ is the law of a Brownian motion.

Proof The main step is to prove that the Pn are tight. We then show that any subsequential

limit point is a Wiener measure, that is, the law of a Brownian motion. We can then appeal

to Theorem 31.1 to obtain the process X , which will be a Brownian motion.

A computation shows that
$$E S_n^4 = \sum_{i=1}^{n} E Y_i^4 + 3 \sum_{i \ne j} (E Y_i^2)(E Y_j^2) \le c n^2, \tag{32.4}$$
since $E Y_i$ and $E Y_i^3$ are both 0, the $Y_i$'s are independent, and the second sum has $n(n-1) \le n^2$ terms.
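The fourth-moment bound can be checked exactly for small n by averaging over all sign patterns. In the sketch below, the function name and the range of n are assumptions of the example; the closed form 3n² − 2n that the values are compared against comes from expanding the fourth power, which gives n diagonal terms and three times the n(n−1) ordered pairs i ≠ j.

```python
from itertools import product

def fourth_moment_exact(n: int) -> float:
    """E[S_n^4] for the simple symmetric walk, by averaging over all 2^n sign patterns."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return total / 2 ** n

moments = {n: fourth_moment_exact(n) for n in range(1, 11)}
```

Every value agrees with 3n² − 2n, so the bound in (32.4) holds with c = 3 for these n.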

If $s$ and $t$ are multiples of $1/n$, then
$$E|Z_n(t) - Z_n(s)|^4 = \frac{1}{n^2}\, E\Big(\sum_{i=ns+1}^{nt} Y_i\Big)^4 = \frac{1}{n^2}\, E\Big(\sum_{i=1}^{nt-ns} Y_i\Big)^4 \le \frac{c}{n^2}\, n^2 |t - s|^2 \le c|t - s|^2. \tag{32.5}$$

If we tried to get by with only the second moment, we would only end up with c|t − s|, which

is not good enough for Theorem 8.1.

At this point we would like to apply Theorem 32.1, but we have the technical nuisance that $s$ and $t$ might not be multiples of $1/n$. If $|t - s| \le 2/n$, then by the construction of $Z_n$ using linear interpolation and the fact that the $Y_i$'s are bounded by one in absolute value, we have $|Z_n(t) - Z_n(s)| \le c|t - s|\sqrt{n}$ and then
$$E|Z_n(t) - Z_n(s)|^4 \le c|t - s|^4 n^2 \le c|t - s|^2. \tag{32.6}$$
Suppose $|t - s| > 2/n$ and $s < t$. Let $s'$ be the largest multiple of $1/n$ less than or equal to $s$ and $t'$ the smallest multiple of $1/n$ larger than or equal to $t$. Using (32.5) and (32.6),
$$\begin{aligned}
E|Z_n(t) - Z_n(s)|^4 &\le cE|Z_n(t) - Z_n(t')|^4 + cE|Z_n(t') - Z_n(s')|^4 + cE|Z_n(s') - Z_n(s)|^4 \\
&\le c|t - t'|^2 + c|t' - s'|^2 + c|s' - s|^2 \\
&\le c|t - s|^2,
\end{aligned}$$
since $|t - t'|$, $|t' - s'|$, and $|s - s'|$ are all less than $c|t - s|$. Note $Z_n(0) = 0$ for all $n$. We now apply Theorems 8.1 and 32.1 to obtain the tightness.

Any subsequential limit point is a probability measure on C[0, 1], so to show that the limit is a Brownian motion, it is enough by Theorem 2.6 to show that the finite-dimensional distributions under the limit law $P_\infty$ agree with those of Brownian motion. Fix $t$. Then $Z_n(t)$ differs from $S_{[nt]}/\sqrt{n}$ by at most $1/\sqrt{n}$, where $[nt]$ is the largest integer less than or equal to $nt$. By the central limit theorem (Theorem A.51), $S_{[nt]}/\sqrt{[nt]}$ converges weakly (with respect to the topology of $\mathbb{R}$) to a mean zero normal random variable with variance one. By Exercise 30.3, $S_{[nt]}/\sqrt{n}$ converges weakly to a mean zero normal random variable with variance $t$, and by Exercise 30.2, $Z_n(t)$ converges weakly to a mean zero normal random variable with variance $t$. This shows that the one-dimensional distributions of $Z_n$ converge weakly to the one-dimensional distributions of a Brownian motion. We leave the analogous argument for the higher-dimensional distributions to the reader.

One can also use Doob's inequalities to obtain the necessary tightness estimate. If $s$ and $t$ are multiples of $1/n$, we have
$$P\Big(\max_{ns \le k \le nt} |S_k - S_{ns}| > \lambda\sqrt{n}\Big) \le c\, \frac{E|S_{nt} - S_{ns}|^4}{\lambda^4 n^2} \le c\, \frac{|t - s|^2}{\lambda^4}. \tag{32.7}$$

Exercises

32.1 The support of a measure $\lambda$ is the smallest closed set $F$ such that $\lambda(F^c) = 0$. Let $P$ be a Wiener measure on C[0, 1], i.e., the law of a Brownian motion on [0, 1]. Use Exercise 13.4 to prove that the support of $P$ is all of C[0, 1].

32.2 Let (S, d) be a complete separable metric space and let R be a subset of S. Then (R, d) is also

a metric space. If Xn converges weakly to X with respect to the topology of (S, d) and each Xn

and X take values in R, does Xn converge weakly to X with respect to the topology of (R, d)?

Does the answer change if R is a closed subset of S?

If Xn and X take values in R and Xn converges weakly to X with respect to the topology of

(R, d), does Xn converge weakly to X with respect to the topology of (S, d)? What if R is a

closed subset of S?

32.3 Give a proof of Theorem 32.2 using (32.7) in place of Theorem 8.1.

32.4 Suppose (X ,W, P) is a weak solution to

dXt = σ (Xt ) dWt + b(Xt ) dt, X0 = x, (32.8)

where W is a one-dimensional Brownian motion and σ and b are bounded and continuous, but

we do not assume that σ is bounded below by a positive constant. Suppose the solution to (32.8)

is unique in law.

Suppose σn and bn are Lipschitz functions which are uniformly bounded and which converge

uniformly to σ and b, respectively. Let Xt (n) be the unique pathwise solution to

dYt = σn(Yt ) dWt + bn(Yt ) dt, Y0 = x;

the probability measure here is P. Prove that X (n) converges weakly to X with respect to C[0, 1].

32.5 Let W be a d-dimensional Brownian motion and let {Xt , t ∈ [0, 1]} be the solution to (24.22). If

x ∈ Rd , prove that the support of Px is all of C[0, 1].

33

Gaussian processes

A Gaussian process is a stochastic process where each of the finite-dimensional distributions

is jointly normal. We will primarily, but not exclusively, be concerned with Gaussian processes

that have continuous paths. For much of what we consider, it is not essential that the index

set of times be [0, ∞), and can in fact be almost any set. We will thus consider {Xt : t ∈ T }

for some index set T , and where for every finite subset S of T , the collection {Xs : s ∈ S} is

jointly normal.

33.1 Reproducing kernel Hilbert spaces

We define the covariance function $\Gamma$ by
$$\Gamma(s, t) = E[(X_s - E X_s)(X_t - E X_t)], \qquad s, t \in T. \tag{33.1}$$
For our purposes, having a non-zero mean just complicates formulas without adding anything interesting, so in this chapter we will assume $E X_t = 0$ for all $t \in T$, and (33.1) becomes
$$\Gamma(s, t) = E[X_s X_t], \qquad s, t \in T. \tag{33.2}$$
We first show how $\Gamma$ can be used to construct a Hilbert space called the reproducing kernel Hilbert space (RKHS).

When we write $\Gamma(s, \cdot)$, we mean that we fix an element $s \in T$ and then consider the function $g : T \to \mathbb{R}$ defined by $g(t) = \Gamma(s, t)$ for $t \in T$. Let $K$ be the collection of finite linear combinations of the functions $\Gamma(s, \cdot)$, $s \in T$. Thus each element of $K$ has the form
$$\sum_{j=1}^{m} a_j \Gamma(s_j, \cdot),$$
where $m \ge 1$, the $a_j$'s are real, and each $s_j$, $j = 1, \ldots, m$, is an element of $T$. If $f = \sum_{j=1}^m a_j \Gamma(s_j, \cdot)$ and $g = \sum_{k=1}^n b_k \Gamma(t_k, \cdot)$, define
$$\langle f, g \rangle_{\mathrm{RKHS}} = \sum_{j=1}^{m} \sum_{k=1}^{n} a_j b_k \Gamma(s_j, t_k).$$
We define $H$ to be the closure of $K$ with respect to the norm induced by the inner product $\langle \cdot, \cdot \rangle_{\mathrm{RKHS}}$.

We need to show that this bilinear form is indeed an inner product, that what is known as

the reproducing property holds, and that H is a Hilbert space.

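We start with the reproducing property. If $f = \sum_{j=1}^m a_j \Gamma(s_j, \cdot)$, then the reproducing property applied to $f$ is the formula
$$\langle f, \Gamma(t, \cdot) \rangle_{\mathrm{RKHS}} = f(t). \tag{33.3}$$
This follows from
$$\langle f, \Gamma(t, \cdot) \rangle_{\mathrm{RKHS}} = \sum_{j=1}^{m} a_j \Gamma(s_j, t) = f(t).$$
By taking limits, (33.3) holds for all $f \in H$.

To show that $\langle \cdot, \cdot \rangle_{\mathrm{RKHS}}$ is an inner product, notice that when $f = \sum a_j \Gamma(s_j, \cdot) \in K$, then
$$\langle f, f \rangle_{\mathrm{RKHS}} = \sum_{j=1}^{m} \sum_{k=1}^{m} a_j a_k \Gamma(s_j, s_k) = \sum_{j,k=1}^{m} a_j a_k E[X_{s_j} X_{s_k}] = E\Big(\sum_{j=1}^{m} a_j X_{s_j}\Big)^2 \ge 0.$$
The Cauchy–Schwarz inequality holds for $\langle \cdot, \cdot \rangle_{\mathrm{RKHS}}$ (the standard proof of the Cauchy–Schwarz inequality applies), and so if $\langle f, f \rangle_{\mathrm{RKHS}} = 0$, then
$$|f(t)|^2 = \langle f, \Gamma(t, \cdot) \rangle^2_{\mathrm{RKHS}} \le \langle f, f \rangle_{\mathrm{RKHS}}\, \langle \Gamma(t, \cdot), \Gamma(t, \cdot) \rangle_{\mathrm{RKHS}} = 0,$$
and thus $f$ is zero.

If $f_n$ is a Cauchy sequence with respect to the norm
$$\|g\|_{\mathrm{RKHS}} = \langle g, g \rangle^{1/2}_{\mathrm{RKHS}},$$
then
$$|f_n(t) - f_m(t)|^2 = \langle f_n - f_m, \Gamma(t, \cdot) \rangle^2_{\mathrm{RKHS}} \le \langle f_n - f_m, f_n - f_m \rangle_{\mathrm{RKHS}}\, \langle \Gamma(t, \cdot), \Gamma(t, \cdot) \rangle_{\mathrm{RKHS}},$$
which tends to 0 as $n, m \to \infty$. Thus $f_n$ converges pointwise. This is enough to prove $H$ is complete; this is Exercise 33.1.

We summarize.

Proposition 33.1 $H$ with the inner product $\langle \cdot, \cdot \rangle_{\mathrm{RKHS}}$ is a Hilbert space. Moreover, if $f \in H$ and $t \in T$, then
$$\langle f, \Gamma(t, \cdot) \rangle_{\mathrm{RKHS}} = f(t).$$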
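For finite linear combinations the inner product and the reproducing property are finite sums, so they can be checked directly. The sketch below uses the Brownian covariance min(s, t) as a concrete kernel; the kernel choice, the function names `gamma`, `evaluate` and `rkhs_inner`, and the particular coefficients are assumptions of this example.

```python
def gamma(s: float, t: float) -> float:
    """Brownian covariance kernel min(s, t), used here as a concrete example of a covariance."""
    return min(s, t)

def evaluate(coeffs, nodes, t: float) -> float:
    """f(t) for f = sum_j coeffs[j] * gamma(nodes[j], .)."""
    return sum(a * gamma(s, t) for a, s in zip(coeffs, nodes))

def rkhs_inner(a, s_nodes, b, t_nodes) -> float:
    """<f, g>_RKHS = sum_j sum_k a_j b_k gamma(s_j, t_k) for finite combinations f and g."""
    return sum(aj * bk * gamma(sj, tk)
               for aj, sj in zip(a, s_nodes)
               for bk, tk in zip(b, t_nodes))

a, s_nodes = [2.0, -1.0, 0.5], [0.2, 0.5, 0.9]
t = 0.6
reproduced = rkhs_inner(a, s_nodes, [1.0], [t])   # <f, gamma(t, .)>
direct = evaluate(a, s_nodes, t)                  # f(t)
norm_sq = rkhs_inner(a, s_nodes, a, s_nodes)      # <f, f>
```

The two numbers `reproduced` and `direct` coincide, which is (33.3) for this f, and `norm_sq` is non-negative because it equals the second moment of the corresponding linear combination of the process.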

We consider another Hilbert space $M$, the closure of the linear span of $\{X_t : t \in T\}$ with respect to $L^2(P)$. We define
$$\langle Y, Z \rangle_M = E[Y Z]$$
if $Y$ and $Z$ are both finite linear combinations of the $X_t$'s. Thus if $m, n \ge 1$, $a_j, b_k \in \mathbb{R}$, we set
$$\Big\langle \sum_{j=1}^{m} a_j X_{s_j}, \sum_{k=1}^{n} b_k X_{t_k} \Big\rangle_M = \sum_{j=1}^{m} \sum_{k=1}^{n} a_j b_k E[X_{s_j} X_{t_k}], \tag{33.4}$$
and we let $M$ be the closure of the collection of random variables of the form $\sum_{j=1}^m a_j X_{s_j}$ with respect to $\langle \cdot, \cdot \rangle_M$. Since $\Gamma(s_j, t_k) = E[X_{s_j} X_{t_k}]$, from (33.4) we see that $H$ and $M$ are isomorphic, where we have a one-to-one correspondence between $\sum_{j=1}^m a_j \Gamma(s_j, \cdot)$ and $\sum_{j=1}^m a_j X_{s_j}$.

Let $\{e_n\}$ be a complete orthonormal system for $H$. Let $Y_n$ be the element of $M$ corresponding to $e_n$. Then
$$E[Y_n Y_m] = \langle Y_n, Y_m \rangle_M = \langle e_n, e_m \rangle_{\mathrm{RKHS}} = \delta_{nm},$$
where $\delta_{nm}$ is 0 if $n \ne m$ and 1 if $n = m$. This implies that the $Y_n$ are independent normal random variables with mean zero and variance one; see Proposition A.55. (Recall that we are assuming that all the $X_t$'s have mean zero.)

Since $\Gamma(s, \cdot)$ is an element of $H$, we can write
$$\Gamma(s, \cdot) = \sum_{n=1}^{\infty} \langle \Gamma(s, \cdot), e_n \rangle_{\mathrm{RKHS}}\, e_n(\cdot) = \sum_{n=1}^{\infty} e_n(s) e_n(\cdot).$$
Using the correspondence between $H$ and $M$, we have
$$X_s = \sum_{n=1}^{\infty} e_n(s) Y_n,$$
where the $Y_n$ are i.i.d. standard normal variables. This is known as the Karhunen–Loève expansion of a Gaussian process.

Example 33.2 Let's see what this expansion is in the case of Brownian motion. If we define
$$\langle f, g \rangle_{\mathrm{CM}} = \int_0^1 f'(r) g'(r)\,dr \tag{33.5}$$
for $f$ and $g$ whose first derivatives are in $L^2([0, 1])$ and such that $f(0) = g(0) = 0$, then because $\Gamma(s, t) = s \wedge t$,
$$\langle \Gamma(s, \cdot), \Gamma(t, \cdot) \rangle_{\mathrm{CM}} = \int_0^1 1_{[0,s)}(r) 1_{[0,t)}(r)\,dr = s \wedge t = \Gamma(s, t),$$
and we see that we have identified the reproducing kernel Hilbert space for Brownian motion on [0, 1]. The notation $\langle \cdot, \cdot \rangle_{\mathrm{CM}}$ is used because the Hilbert space with this inner product is called the Cameron–Martin space, a space that has many connections with Brownian motion.

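If $e_n(s) = \sqrt{2}\sin(n\pi s)/(n\pi)$ for $n \ge 1$ and $e_0(s) = s$, then the sequence $\{e_n\}_{n \ge 0}$ is a complete orthonormal sequence for the Cameron–Martin space: the derivatives $e_0'(r) = 1$ and $e_n'(r) = \sqrt{2}\cos(n\pi r)$ form the cosine basis of $L^2([0, 1])$, and the linear term $e_0$ is needed because the constant function is orthogonal to every $\cos(n\pi r)$. The Karhunen–Loève expansion is equivalent to the formula (6.2) that we used in our first construction of Brownian motion.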
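The kernel expansion behind the Karhunen–Loève formula can be checked numerically for Brownian motion on [0, 1]. The sketch below sums the products of the sine functions $\sqrt{2}\sin(n\pi u)/(n\pi)$ and adds the contribution s·t of the linear function u ↦ u, which has unit Cameron–Martin norm and is orthogonal to all of the sine functions. The function name, the number of terms and the test points are assumptions of this example.

```python
import math

def kl_partial_sum(s: float, t: float, terms: int) -> float:
    """s*t + sum_{n=1}^{terms} e_n(s) e_n(t) with e_n(u) = sqrt(2) sin(n pi u) / (n pi)."""
    total = s * t                                   # contribution of the linear function u -> u
    for n in range(1, terms + 1):
        total += (2.0 * math.sin(n * math.pi * s) * math.sin(n * math.pi * t)
                  / (n * math.pi) ** 2)
    return total

pairs = [(0.3, 0.7), (0.5, 0.5), (0.9, 0.1), (1.0, 0.4)]
errors = [abs(kl_partial_sum(s, t, 20000) - min(s, t)) for s, t in pairs]
bridge_gap = abs((kl_partial_sum(0.3, 0.7, 20000) - 0.3 * 0.7) - (0.3 - 0.3 * 0.7))
```

The partial sums approach min(s, t), the Brownian covariance. Dropping the s·t term leaves min(s, t) − st, the covariance of a Brownian bridge, which is what `bridge_gap` measures.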

33.2 Continuous Gaussian processes

We now turn to the construction of Gaussian processes with continuous paths. Suppose we have an index set $T$ and a non-negative definite kernel $\Gamma(\cdot, \cdot)$. Saying $\Gamma$ is non-negative definite means that for each $n$ and each $t_1, \ldots, t_n \in T$, the matrix whose $(i, j)$ entry is $\Gamma(t_i, t_j)$ is a non-negative definite matrix. We define a metric on $T$ by defining
$$d(s, t) = (\mathrm{Var}(X_t - X_s))^{1/2}.$$

Actually, d is a pseudo-metric because d(s, t) = 0 does not necessarily imply t = s. An

ε-ball is a set of the form {t ∈ T : d(t, t0) < ε} for some t0. Let N (ε) be the minimum
number of ε-balls needed to cover T .
Theorem 33.3 Let $\Gamma : T \times T \to \mathbb{R}$ be continuous with respect to the pseudo-metric $d$,
symmetric, and non-negative definite. If for some $\beta < 1$ and some constant $c$ we have
$$\log N(\varepsilon) \le c\varepsilon^{-\beta}, \qquad \varepsilon \in (0, 1), \tag{33.6}$$
then there exists a continuous Gaussian process $\{X_t : t \in T\}$ with covariance kernel $\Gamma$.
One can in fact be more precise than (33.6) and give an integral condition that N (x) must
satisfy for x small.
Before proving Theorem 33.3, let us look at a number of examples.
Example 33.4 In the case of Brownian motion, $\mathrm{Var}(X_t - X_s) = |t - s|$, so that $d(s, t) = |s - t|^{1/2}$. If $T$ is the interval [0, 1], then the set of intervals of length $\varepsilon^2$ and centers
$k\varepsilon^2/4$, $k = 0, 1, \ldots, 4/\varepsilon^2$, is a collection of $\varepsilon$-balls covering [0, 1]. Therefore $N(\varepsilon) \le c/\varepsilon^2$,
implying $\log N(\varepsilon) \le c \log(1/\varepsilon)$, which satisfies (33.6). This and Theorem 2.4 give a
construction of Brownian motion.
Example 33.5 We look at fractional Brownian motion. Let $H \in (0, 2)$. $H$ is known as the
Hurst index, where $H = 1$ corresponds to Brownian motion. Define
$$\Gamma(s, t) = |s|^H + |t|^H - |s - t|^H.$$
This leads to $d(s, t) = c|t - s|^{H/2}$. Open intervals of length $\varepsilon^{2/H}$ are $\varepsilon$-balls, and it takes
$c\varepsilon^{-2/H}$ of them to cover [0, 1]. Therefore $N(\varepsilon) \le c\varepsilon^{-2/H}$, so again $\log N(\varepsilon) \le c \log(1/\varepsilon)$, and (33.6) applies. One
use of fractional Brownian motion is to model stock prices where there is more or less
memory of the past than a Brownian motion has.
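The covering counts in the last two examples can be tabulated directly. The sketch below takes the metric to be $d(s, t) = |t - s|^{H/2}$ with the constant set to 1 (an assumption), so that an ε-ball is an open interval of Euclidean radius $\varepsilon^{2/H}$, and it places centers at multiples of that radius. This gives an upper bound on N(ε), not the minimal covering number. The function names are hypothetical.

```python
import math

def covering_upper_bound(eps: float, hurst: float) -> int:
    """Number of open eps-balls for d(s, t) = |t - s|**(hurst / 2) used to cover [0, 1].

    Balls have Euclidean radius r = eps**(2 / hurst); centres 0, r, 2r, ... are used
    until the last ball reaches past 1. This is an upper bound for N(eps).
    """
    r = eps ** (2.0 / hurst)
    return math.floor(1.0 / r) + 1

def log_covering_ratio(eps: float, hurst: float) -> float:
    """log of the covering bound divided by log(1 / eps)."""
    return math.log(covering_upper_bound(eps, hurst)) / math.log(1.0 / eps)

# Brownian motion corresponds to hurst = 1 in the notation of the text.
ratios_bm = [log_covering_ratio(2.0 ** -k, 1.0) for k in range(1, 15)]
ratios_rough = [log_covering_ratio(2.0 ** -k, 0.5) for k in range(1, 8)]
```

The ratios stay bounded (near 2 for H = 1 and near 4 for H = 1/2), which is the statement log N(ε) ≤ c log(1/ε). This grows far more slowly than any negative power of ε, so (33.6) holds.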
Example 33.6 Here is our first example of a Gaussian process where $T$ is not a subset of
$[0, \infty)$. We construct a Brownian sheet, $X(t_1, t_2)$, where the points $(t_1, t_2) \in [0, 1]^2$. More
generally we can consider $X(t)$, where $t \in [0, 1]^d$. This is no harder, but for simplicity of
notation we consider only the case $d = 2$. If $s = (s_1, s_2)$ and $t = (t_1, t_2)$, define
$$\Gamma(s, t) = (s_1 \wedge t_1)(s_2 \wedge t_2).$$
One motivation for this formula is to identify the point $(t_1, t_2)$ with the rectangle $R_t$ whose
lower left corner is at the origin and whose upper right corner is at $(t_1, t_2)$. Then the covariance
of $X_s$ and $X_t$ is the area of $R_s \cap R_t$.
Some simple geometry shows that if we put $\varepsilon$-balls centered at the points $(c_1 j \varepsilon^2, c_1 k \varepsilon^2)$
for an appropriate $c_1$ and with $j, k \le c_2 \varepsilon^{-2}$, we cover $T$. Therefore $N(\varepsilon) \le c\varepsilon^{-4}$, and so
$\log N(\varepsilon) \le c \log(1/\varepsilon)$.
Example 33.7 We can generalize the last example. For every Borel subset $A$ of $[0, 1]^d$, let
$X_A$ be a Gaussian random variable. We want the covariance of $X_A$ and $X_B$ to be the Lebesgue
measure of $A \cap B$. This is known as a set-indexed process. If we let $T$ be the collection of
all Borel subsets of $[0, 1]^d$, one cannot get a continuous Gaussian process. In order to get a
continuous process $X$ one must restrict $T$ to be a subcollection of sets whose boundaries are
sufficiently smooth; see Dudley (1973).
Example 33.8 Our last example has a more complicated index set. Let $W$ be a one-dimensional Brownian motion. If $f \in L^2[0, 1]$, define
$$X_f = \int_0^1 f(s)\,dW_s.$$
By Exercise 24.6, $X_f$ is a Gaussian random variable with mean 0 and variance $\int_0^1 f(s)^2\,ds$
and the covariance of $X_f$ and $X_g$ is $\int_0^1 f(s)g(s)\,ds$. It follows that
$$d(f, g)^2 = \int_0^1 (f(s) - g(s))^2\,ds.$$
The process $X_f$ is known as a Gaussian field.
For what subsets $T$ of $L^2([0, 1])$ can one define a process $X_f$ that has continuous paths with
respect to $d$? This means that the map $f \to X_f(\omega)$ is continuous for almost all $\omega$, where we
use the pseudo-metric $d$ to define open sets in $T$. It turns out $T = \{f \in L^2([0, 1]) : \|f\|_2 \le 1\}$
is too large to obtain a continuous Gaussian process, but, for example, $T = \{f \in C^2([0, 1]) :
\|f\|_\infty \le 1, \|f'\|_\infty \le 1, \|f''\|_\infty \le 1\}$ is small enough to apply Theorem 33.3.
We now proceed to the proof of Theorem 33.3.
Proof of Theorem 33.3 Since T can be covered by finitely many ε-balls for each ε, it follows
that if A(ε) is the collection of centers for the cover by ε-balls, then $A = \bigcup_{n=1}^\infty A(2^{-n})$ is
a countable dense subset of T . We first label the elements of A by t1, t2, . . . For each n,
we construct the law of (Xt1, . . . , Xtn ). We then use the Kolmogorov extension theorem to
construct the law of {Xt : t ∈ A}. Next we prove that t → Xt is uniformly continuous on A,
almost surely. Finally we define Xt for all t ∈ T by continuity.
Step 1. We construct the law of $(X_{t_1}, \ldots, X_{t_n})$. Let n be fixed, and let B be an n×n matrix whose
(i, j) entry is $\Gamma(t_i, t_j)$. The matrix B is symmetric, and non-negative definite by hypothesis.
Let Y1, . . . ,Yn be independent normal random variables with mean zero and variance one. If
we let C be the non-negative definite square root of B and
X = CY
(viewed as vectors), or equivalently,
\[ X_{t_i} = \sum_{j=1}^n C_{ij} Y_j, \]
a simple calculation shows that $E[X_{t_k} X_{t_m}] = B_{km} = \Gamma(t_k, t_m)$. The $X_{t_j}$'s are jointly normal
and this gives the first step of the construction.
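The finite-dimensional construction in Step 1 can be sketched in a few lines. This is a hedged illustration, not the construction of the text: it swaps in a lower-triangular Cholesky factor L with L Lᵀ = B in place of the symmetric non-negative definite square root C (either factor gives a vector with covariance matrix B), it assumes B is strictly positive definite, and it uses the Brownian covariance min(s, t) purely as an example covariance function. All function names are hypothetical.

```python
import math
import random


def example_cov(s, t):
    # Illustrative choice of covariance function (Brownian motion); an assumption.
    return min(s, t)


def cholesky(B):
    """Lower-triangular L with L L^T = B, assuming B is strictly positive definite."""
    n = len(B)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(B[i][i] - acc)
            else:
                L[i][j] = (B[i][j] - acc) / L[j][j]
    return L


def sample(L, rng):
    """Return X = L Y, where Y has independent standard normal entries."""
    n = len(L)
    Y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(L[i][j] * Y[j] for j in range(n)) for i in range(n)]


times = [0.25, 0.5, 1.0]
B = [[example_cov(s, t) for t in times] for s in times]
L = cholesky(B)
X = sample(L, random.Random(0))  # one draw of (X_{t_1}, X_{t_2}, X_{t_3})
```

Since E[X Xᵀ] = L E[Y Yᵀ] Lᵀ = L Lᵀ = B, the sampled vector has the prescribed covariances, which is the "simple calculation" referred to above.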
Step 2. We apply the Kolmogorov extension theorem. Let Pn be the law of (Xt1, . . . , Xtn ). It
is easy to see the consistency property holds for the Pn, so by the Kolmogorov extension
theorem, there exists a probability P on RN such that if we define Xt (ω) by ω(t) for t ∈ A,
the law of (Xt1, . . . , Xtn ) is Pn for each n.
Step 3. We show that, except on an event of probability zero, the map $t \to X_t(\omega)$ is uniformly
continuous on A.
To prove the uniform continuity, we proceed similarly to Theorem 8.1. For each point
t ∈ A, let $t_j$ be the element of $A(2^{-j})$ closest to t, with some convention for breaking ties.
We will fix J in a moment, and write
\[ X_t = X_{t_J} + (X_{t_{J+1}} - X_{t_J}) + (X_{t_{J+2}} - X_{t_{J+1}}) + \cdots, \]
where the sum is finite because t ∈ A. Let λ > 0. If $|X_t - X_s| > \lambda$ for some s, t ∈ A with
$d(s, t) < 2^{-J}$, then ω is in one or more of the following events:
(a) the event
\[ E_J = \{ |X_{t_J} - X_{s_J}| > \lambda/2 \text{ for some } s_J, t_J \in A(2^{-J}) \text{ with } d(s_J, t_J) \le 3 \cdot 2^{-J} \}; \]
(b) the event
\[ F_j = \Big\{ |X_{t_{j+1}} - X_{t_j}| > \frac{\lambda}{8j^2} \text{ for some } t_j \in A(2^{-j}),\ t_{j+1} \in A(2^{-(j+1)}) \text{ with } d(t_j, t_{j+1}) < 3 \cdot 2^{-j+1} \Big\} \]
for some j ≥ J;
(c) the event
\[ G_j = \Big\{ |X_{s_{j+1}} - X_{s_j}| > \frac{\lambda}{8j^2} \text{ for some } s_j \in A(2^{-j}),\ s_{j+1} \in A(2^{-(j+1)}) \text{ with } d(s_j, s_{j+1}) < 3 \cdot 2^{-j+1} \Big\} \]
for some j ≥ J.
First we bound the probability of $E_J$. There are $N(2^{-J})$ elements of $A(2^{-J})$, so there are
at most $\exp(2c(2^J)^\beta)$ pairs $(s_J, t_J)$. If $d(t_J, s_J) < 3 \cdot 2^{-J}$, then
\[ P(|X_{s_J} - X_{t_J}| > \lambda/2) \le 2\exp\Big( -\frac{(\lambda/2)^2}{2 \cdot 3 \cdot 2^{-J}} \Big). \]
Therefore the probability of $E_J$ is bounded by
\[ P(E_J) \le e^{c2^{\beta J}} e^{-c\lambda^2 2^J}. \]
Since β < 1, this can be made as small as we like by taking J large enough.
For any $t_j$ and $t_{j+1}$ with $d(t_j, t_{j+1}) < 3 \cdot 2^{-j+1}$,
\[ P(|X_{t_j} - X_{t_{j+1}}| > \lambda/(8j^2)) \le 2\exp\Big( -\frac{\lambda^2/(64j^4)}{6 \cdot 2^{-j+1}} \Big). \]

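There are fewer than $e^{c2^{\beta j}}$ points in $A(2^{-j})$ and $e^{c2^{\beta(j+1)}}$ points in $A(2^{-(j+1)})$, so fewer than $e^{c2^{\beta j}}$
pairs (with a possibly different constant c). Thus the probability of $F_j$ is bounded by
\[ P(F_j) \le ce^{c2^{\beta j}} e^{-c\lambda^2 2^j/j^4}. \]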

Since β < 1, this is summable in j, and $\sum_{j=J}^\infty P(F_j)$ can be made as small as we like if we
take J large enough. We handle the bound for $G_j$ similarly.
Thus, given ε, we have
\[ P\Big( \sup_{s,t\in A,\ d(s,t)<2^{-J}} |X_t - X_s| > \lambda \Big) \le \varepsilon \]

if we take J large enough, where J depends on ε and λ. This suffices to prove the uniform

continuity.

Step 4. We use continuity to complete the proof. Define $X_t = \lim_{s\in A,\, s\to t} X_s$. The limit exists

and will be a continuous function of t by virtue of the uniform continuity. By Remark A.56,

Xt will have the desired covariance function.

We have been considering Gaussian processes taking values in R, but it is also of interest

to look at Brownian motion taking values in a Hilbert space or a Banach space. There are

three steps to constructing such a process:

(1) constructing Gaussian measures on Banach (or Hilbert) spaces;

(2) getting a suitable estimate on ‖Xt − Xs‖;

(3) constructing a Brownian motion.

Of these three steps, the third follows along the lines we used for real-valued processes.

Steps (1) and (2) require considerable work, and we refer the reader to Bogachev (1998) or

Kuo (1975). A measure μ on a Banach space is called Gaussian if $\mu \circ L^{-1}$ is a Gaussian

measure on R for every linear functional L on the Banach space.

Exercises

33.1 Finish the proof that H as defined in Section 33.1 is complete.

33.2 Show that if in Example 33.8 we let

\[ T = \{ f \in C^1([0, 1]) : \|f\|_\infty \le 1,\ \|f'\|_\infty \le 1 \}, \]

then N(ε) is bounded above by $c_1\varepsilon^{-1}$ and bounded below by $c_2\varepsilon^{-1}$.

33.3 Suppose $X^i$ and $Y^i$ are two sequences of Brownian motions with all of the Brownian motions
independent of each other. Let
\[ Z_n(s, t) = \frac{1}{\sqrt{n}} \sum_{i=1}^n X^i_s Y^i_t. \]

Prove that Zn converges weakly with respect to the topology of C([0, 1]2) as n → ∞ to a

Brownian sheet.


33.4 Let X be a Brownian bridge. (This will be studied further in Section 35.2.) This means that X

is a mean zero Gaussian process with

Cov (Xs, Xt ) = s ∧ t − st, 0 ≤ s, t ≤ 1.

Identify the reproducing kernel Hilbert space for X .

33.5 Let X be the Ornstein–Uhlenbeck process started at 0. This was defined in Exercise 19.5.

Identify the reproducing kernel Hilbert space for X .

34

The space D[0, 1]

We define the space D[0, 1] to be the collection of real-valued functions on [0, 1] which

are right continuous with left limits. We will introduce a topology on D = D[0, 1], the

Skorokhod topology, which makes D into a complete separable metric space. We will give

a criterion for a subset of D to be compact, which will lead to some criteria for a family of

probability measures on D to be tight.

34.1 Metrics for D[0, 1]

We write f (t−) for lims

right continuous with left limits, then from some i on, ti must be equal to 1.

Our first try at a metric, ρ, makes D into a separable metric space, but one that is not

complete. Let’s start with ρ anyway, since we need it on the way to the metric d we end up

with.

Let Λ be the set of functions λ from [0, 1] to [0, 1] that are continuous, strictly increasing,
and such that λ(0) = 0, λ(1) = 1. Define
\[ \rho(f, g) = \inf\Big\{ \varepsilon > 0 : \exists \lambda \in \Lambda \text{ such that } \sup_{t\in[0,1]} |\lambda(t) - t| < \varepsilon,\ \sup_{t\in[0,1]} |f(t) - g(\lambda(t))| < \varepsilon \Big\}. \]
Since the function λ(t) = t is in Λ, then ρ(f, g) is finite if f, g ∈ D. Clearly ρ(f, g) ≥ 0.
If ρ(f, g) = 0, then either f(t) = g(t) or else f(t) = g(t−) for each t; since elements of D
are right continuous with left limits, it follows that f = g. If λ ∈ Λ, then so is $\lambda^{-1}$ and we
have, setting $s = \lambda^{-1}(t)$ and noting both s and t range over [0, 1],
\[ \sup_{t\in[0,1]} |\lambda^{-1}(t) - t| = \sup_{s\in[0,1]} |s - \lambda(s)| \]
and
\[ \sup_{t\in[0,1]} |f(\lambda^{-1}(t)) - g(t)| = \sup_{s\in[0,1]} |f(s) - g(\lambda(s))|, \]
and we conclude ρ(f, g) = ρ(g, f). The triangle inequality follows from
\[ \sup_{t\in[0,1]} |\lambda_2 \circ \lambda_1(t) - t| \le \sup_{t\in[0,1]} |\lambda_1(t) - t| + \sup_{s\in[0,1]} |\lambda_2(s) - s| \]
and
\[ \sup_{t\in[0,1]} |f(t) - h(\lambda_2 \circ \lambda_1(t))| \le \sup_{t\in[0,1]} |f(t) - g(\lambda_1(t))| + \sup_{s\in[0,1]} |g(s) - h(\lambda_2(s))|. \]
Look at the set of f in D for which there exists an integer k such that f is constant and
equal to a rational on each interval [(i−1)/k, i/k). It is not hard to check (Exercise 34.1) that
the collection of such f ’s is dense in D with respect to ρ, which shows (D, ρ) is separable.
The space D with the metric ρ is not, however, complete; see Exercise 34.2. We therefore
introduce a slightly different metric d. Define
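\[ \|\lambda\| = \sup_{s\ne t,\ s,t\in[0,1]} \Big| \log\frac{\lambda(t) - \lambda(s)}{t - s} \Big| \]
and let
\[ d(f, g) = \inf\Big\{ \varepsilon > 0 : \exists \lambda \in \Lambda \text{ such that } \|\lambda\| \le \varepsilon,\ \sup_{t\in[0,1]} |f(t) - g(\lambda(t))| \le \varepsilon \Big\}. \]

Note $\|\lambda^{-1}\| = \|\lambda\|$ and $\|\lambda_2 \circ \lambda_1\| \le \|\lambda_1\| + \|\lambda_2\|$. The symmetry of d and the triangle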

inequality follow easily from this, and we conclude d is a metric.
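For a piecewise-linear time change the quantity ‖λ‖ is easy to compute, which may help in getting a feel for the definition. The sketch below assumes λ is given by a list of knots (t, λ(t)) running from (0, 0) to (1, 1), strictly increasing in both coordinates; the function names are illustrative assumptions. It uses the fact that every chord slope of such a λ is a weighted average of its segment slopes, so the supremum of |log slope| is attained on a single segment.

```python
import math


def time_change_norm(knots):
    """Return ||lambda|| = sup |log((lambda(t) - lambda(s)) / (t - s))| for piecewise-linear lambda."""
    slopes = [
        (y1 - y0) / (t1 - t0)
        for (t0, y0), (t1, y1) in zip(knots, knots[1:])
    ]
    return max(abs(math.log(m)) for m in slopes)


def sup_displacement(knots):
    """Return sup |lambda(t) - t|; for piecewise-linear lambda the maximum is at a knot."""
    return max(abs(y - t) for t, y in knots)


knots = [(0.0, 0.0), (0.5, 0.6), (1.0, 1.0)]
norm = time_change_norm(knots)   # segment slopes are 1.2 and 0.8
disp = sup_displacement(knots)
print(norm, disp)
```

In this example the norm is |log 0.8| (about 0.223) while the displacement is 0.1, so a small value of ‖λ‖ does force λ to be uniformly close to the identity; the norm additionally controls the slopes of λ, which the displacement alone does not.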

Lemma 34.1 There exists ε0 such that

ρ( f , g) ≤ 2d( f , g)

if d( f , g) < ε0.
(It turns out ε0 = 1/4 will do.)
Proof Since log(1 + 2x)/(2x) → 1 as x → 0, we have
\[ \log(1 - 2\varepsilon) < -\varepsilon < \varepsilon < \log(1 + 2\varepsilon) \]
if ε is small enough. Suppose d(f, g) < ε and λ is the element of Λ such that d(f, g) <
‖λ‖ < ε and $\sup_{t\in[0,1]} |f(t) - g(\lambda(t))| < \varepsilon$. Since λ(0) = 0, we have
\[ \log(1 - 2\varepsilon) < -\varepsilon < \log\frac{\lambda(t)}{t} < \varepsilon < \log(1 + 2\varepsilon), \tag{34.1} \]
or
\[ 1 - 2\varepsilon < \frac{\lambda(t)}{t} < 1 + 2\varepsilon, \tag{34.2} \]
which implies |λ(t) − t| < 2ε, and hence ρ(f, g) ≤ 2d(f, g).
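The elementary inequality chain used in this proof can be spot-checked numerically for ε up to 1/4, the value of ε0 mentioned after the lemma. This is only a sanity check on a finite grid, not a proof; the helper name `chain_holds` and the grid are illustrative assumptions.

```python
import math


def chain_holds(eps):
    """Check log(1 - 2*eps) < -eps < eps < log(1 + 2*eps) for a given eps in (0, 1/2)."""
    return math.log(1 - 2 * eps) < -eps < eps < math.log(1 + 2 * eps)


grid = [k / 1000 for k in range(1, 251)]  # eps = 0.001, 0.002, ..., 0.25
all_ok = all(chain_holds(e) for e in grid)
print(all_ok)
```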
We define the analog $\xi_f$ of the modulus of continuity for a function in D as follows. Define
$\theta_f[a, b) = \sup_{s,t\in[a,b)} |f(t) - f(s)|$ and
\[ \xi_f(\delta) = \inf\Big\{ \max_{1\le i\le n} \theta_f[t_{i-1}, t_i) : \exists n \ge 1,\ 0 = t_0 < t_1 < \cdots < t_n = 1 \text{ such that } t_i - t_{i-1} > \delta \text{ for all } i \le n \Big\}. \]

Observe that if f ∈ D, then $\xi_f(\delta) \downarrow 0$ as δ ↓ 0.

Lemma 34.2 Suppose δ < 1/4. Let f ∈ D. If $\rho(f, g) \le \delta^2$, then $d(f, g) \le 4\delta + \xi_f(\delta)$.
Proof Choose $t_i$'s such that $t_i - t_{i-1} > \delta$ and $\theta_f[t_{i-1}, t_i) < \xi_f(\delta) + \delta$ for each i. Pick μ ∈ Λ
such that $\sup_t |f(t) - g(\mu(t))| < \delta^2$ and $\sup_t |\mu(t) - t| < \delta^2$. Then $\sup_t |f(\mu^{-1}(t)) - g(t)| < \delta^2$.
Set $\lambda(t_i) = \mu(t_i)$ and let λ be linear in between. Since $\mu^{-1}(\lambda(t_i)) = t_i$ for all i, then t and
$\mu^{-1} \circ \lambda(t)$ always lie in the same subinterval $[t_{i-1}, t_i)$. Consequently
\begin{align*}
|f(t) - g(\lambda(t))| &\le |f(t) - f(\mu^{-1}(\lambda(t)))| + |f(\mu^{-1}(\lambda(t))) - g(\lambda(t))| \\
&\le \theta_f[t_{i-1}, t_i) + \delta^2 \\
&\le \xi_f(\delta) + \delta + \delta^2 < \xi_f(\delta) + 4\delta.
\end{align*}
We have
\[ |\lambda(t_i) - \lambda(t_{i-1}) - (t_i - t_{i-1})| = |\mu(t_i) - \mu(t_{i-1}) - (t_i - t_{i-1})| \le 2\delta^2 < 2\delta(t_i - t_{i-1}). \]
Since λ is defined by linear interpolation,
\[ |(\lambda(t) - \lambda(s)) - (t - s)| \le 2\delta|t - s|, \qquad s, t \in [0, 1], \]
which leads to
\[ \Big| \frac{\lambda(t) - \lambda(s)}{t - s} - 1 \Big| \le 2\delta, \]
or
\[ \log(1 - 2\delta) \le \log\Big( \frac{\lambda(t) - \lambda(s)}{t - s} \Big) \le \log(1 + 2\delta). \]
Since δ < 1/4, we have ‖λ‖ ≤ 4δ.
Proposition 34.3 The metrics d and ρ are equivalent, i.e., they generate the same topology.
In particular, (D, d) is separable.
Proof Let Bρ ( f , r) denote the ball with center f and radius r with respect to the metric ρ
and define Bd ( f , r) analogously. Let ε > 0 and let f ∈ D. If d( f , g) < ε/2 and ε is small
enough, then ρ( f , g) ≤ 2d( f , g) < ε, and so Bd ( f , ε/2) ⊂ Bρ ( f , ε).
To go the other direction, what we must show is that given f and ε, there exists δ such
that Bρ ( f , δ) ⊂ Bd ( f , ε). δ may depend on f ; in fact, it has to in general, for otherwise a
Cauchy sequence with respect to d would be a Cauchy sequence with respect to ρ, and vice
versa. Choose δ small enough that $4\delta^{1/2} + \xi_f(\delta^{1/2}) < \varepsilon$. By Lemma 34.2, if ρ(f, g) < δ,
then d( f , g) < ε, which is what we want.
Finally, suppose G is open with respect to the topology generated by ρ. For each f ∈ G,
let r f be chosen so that Bρ ( f , r f ) ⊂ G. Hence G = ∪ f ∈GBρ ( f , r f ). Let s f be chosen so that
Bd ( f , s f ) ⊂ Bρ ( f , r f ). Then ∪ f ∈GBd ( f , s f ) ⊂ G, and in fact the sets are equal because if
f ∈ G, then f ∈ Bd ( f , s f ). Since G can be written as the union of balls which are open with
respect to d, then G is open with respect to d. The same argument with d and ρ interchanged
shows that a set that is open with respect to d is open with respect to ρ.
34.2 Compactness and completeness
We now show completeness for (D, d).
Theorem 34.4 The space D with the metric d is complete.
Proof Let $f_n$ be a Cauchy sequence with respect to the metric d. If we can find a subsequence
$n_j$ such that $f_{n_j}$ converges, say, to f, then it is standard that the whole sequence converges to
f. Choose $n_j$ such that $d(f_{n_j}, f_{n_{j+1}}) < 2^{-j}$. For each j there exists $\lambda_j$ such that
\[ \sup_t |f_{n_j}(t) - f_{n_{j+1}}(\lambda_j(t))| \le 2^{-j}, \qquad \|\lambda_j\| \le 2^{-j}. \]
As in (34.1) and (34.2),
\[ |\lambda_j(t) - t| \le 2^{-j+1}. \]
Then
\begin{align*}
\sup_t |\lambda_{n+m+1} \circ \lambda_{m+n} \circ \cdots \circ \lambda_n(t) - \lambda_{n+m} \circ \cdots \circ \lambda_n(t)|
&= \sup_s |\lambda_{n+m+1}(s) - s| \\
&\le 2^{-(n+m)}
\end{align*}
for each n. Hence for each n, the sequence $\lambda_{m+n} \circ \cdots \circ \lambda_n$ (indexed by m) is a Cauchy
sequence of functions on [0, 1] with respect to the supremum norm on [0, 1]. Let $\nu_n$ be the
limit. Clearly $\nu_n(0) = 0$, $\nu_n(1) = 1$, $\nu_n$ is continuous, and nondecreasing. We also have
\begin{align*}
\Big| \log\frac{\lambda_{n+m} \circ \cdots \circ \lambda_n(t) - \lambda_{n+m} \circ \cdots \circ \lambda_n(s)}{t - s} \Big|
&\le \|\lambda_{n+m} \circ \cdots \circ \lambda_n\| \\
&\le \|\lambda_{n+m}\| + \cdots + \|\lambda_n\| \\
&\le \frac{1}{2^{n-1}}.
\end{align*}
If we then let m → ∞, we obtain
\[ \Big| \log\frac{\nu_n(t) - \nu_n(s)}{t - s} \Big| \le \frac{1}{2^{n-1}}, \]
which implies $\nu_n \in \Lambda$ with $\|\nu_n\| \le 2^{1-n}$.
We see that $\nu_n = \nu_{n+1} \circ \lambda_n$. Consequently
\[ \sup_t |f_{n_j}(\nu_j^{-1}(t)) - f_{n_{j+1}}(\nu_{j+1}^{-1}(t))| = \sup_s |f_{n_j}(s) - f_{n_{j+1}}(\lambda_j(s))| \le 2^{-j}. \]
Therefore $f_{n_j} \circ \nu_j^{-1}$ is a Cauchy sequence on [0, 1] with respect to the supremum norm. Let
f be the limit. Since
\[ \sup_t |f_{n_j}(\nu_j^{-1}(t)) - f(t)| \to 0 \]
and $\|\nu_j\| \to 0$ as j → ∞, then $d(f_{n_j}, f) \to 0$.
We next show that if fn → f with respect to d and f ∈ C[0, 1], the convergence is in fact
uniform.
Proposition 34.5 Suppose $f_n \to f$ in the topology of D[0, 1] with respect to d and f ∈
C[0, 1]. Then $\sup_{t\in[0,1]} |f_n(t) - f(t)| \to 0$.
Proof Let ε > 0. Since f is uniformly continuous on [0, 1], there exists δ such that
|f(t) − f(s)| < ε/2 if |t − s| < δ. For n sufficiently large there exists $\lambda_n \in \Lambda$ such that
$\sup_t |f_n(t) - f(\lambda_n(t))| < \varepsilon/2$ and $\sup_t |\lambda_n(t) - t| < \delta$. Therefore $|f(\lambda_n(t)) - f(t)| < \varepsilon/2$,
and so $|f_n(t) - f(t)| < \varepsilon$.
We turn to compactness.
Theorem 34.6 A set A has compact closure in D[0, 1] if
\[ \sup_{f\in A} \sup_t |f(t)| < \infty \]
and
\[ \lim_{\delta\to 0} \sup_{f\in A} \xi_f(\delta) = 0. \]
The converse of this theorem is also true, but we won’t need this. See Billingsley (1968) or
Exercise 34.9.
Proof A complete and totally bounded set in a metric space is compact, and D[0, 1] is a
complete metric space. Hence it suffices to show that A is totally bounded: for each ε > 0

there exist finitely many balls of radius ε that cover A.

Let η > 0 and choose k large such that 1/k < η and $\xi_f(1/k) < \eta$ for each f ∈ A.
Let $M = \sup_{f\in A} \sup_t |f(t)|$ and let H = {−M + j/k : j ≤ 2kM}, so that H is an η-net
for [−M, M]. Let B be the set of functions f ∈ D[0, 1] that are constant on each interval
[(i − 1)/k, i/k) and that take values only in the set H. In particular, f(1) ∈ H.
We first prove that B is a 2η-net for A with respect to ρ. If f ∈ A, there exist $t_0, \ldots, t_n$ such
that $t_0 = 0$, $t_n = 1$, $t_i - t_{i-1} > 1/k$ for each i, and $\theta_f[t_{i-1}, t_i) < \eta$ for each i. Note we must
have n ≤ k. For each i choose integers $j_i$ such that $j_i/k \le t_i < (j_i + 1)/k$. The $j_i$ are distinct
since the $t_i$ are at least 1/k apart. Define λ so that $\lambda(j_i/k) = t_i$ and λ is linear on each interval
$[j_i/k, j_{i+1}/k]$. Choose g ∈ B such that |g(m/k) − f(λ(m/k))| < η for each m ≤ k. Observe
that each [m/k, (m + 1)/k) lies inside some interval of the form $[j_i/k, j_{i+1}/k)$. Since λ is
increasing, [λ(m/k), λ((m + 1)/k)) is contained in $[\lambda(j_i/k), \lambda(j_{i+1}/k)) = [t_i, t_{i+1})$. The
function f does not vary more than η over each interval $[t_i, t_{i+1})$, so f(λ(t)) does not vary
more than η over each interval [m/k, (m + 1)/k). g is constant on each such interval, and
hence
\[ \sup_t |g(t) - f(\lambda(t))| < 2\eta. \]
We have
\[ |\lambda(j_i/k) - j_i/k| = |t_i - j_i/k| < 1/k < \eta \]
for each i. By the piecewise linearity of λ, $\sup_t |\lambda(t) - t| < \eta$. Thus ρ(f, g) < 2η. We have
proved that given f ∈ A, there exists g ∈ B such that ρ(f, g) < 2η, or B is a 2η-net for A
with respect to ρ.
Now let ε > 0 and choose δ > 0 small so that $4\delta + \xi_f(\delta) < \varepsilon$ for each f ∈ A. Set
$\eta = \delta^2/4$. Choose B as above to be a 2η-net for A with respect to ρ. By Lemma 34.2, if
$\rho(f, g) < 2\eta < \delta^2$, then $d(f, g) \le 4\delta + \xi_f(\delta) < \varepsilon$. Therefore B is an ε-net for A with
respect to d.
The following corollary is proved in exactly the same way as Theorem 32.1.
Corollary 34.7 Suppose $X_n$ are processes whose paths are right continuous with left limits.
Suppose for each ε and η there exist $n_0$, R, and δ such that
\[ P(\xi_{X_n}(\delta) \ge \varepsilon) \le \eta \]
and
\[ P\Big( \sup_{t\in[0,1]} |X_n(t)| \ge R \Big) \le \eta. \]
Then the $X_n$ are tight with respect to the topology of D[0, 1].
34.3 The Aldous criterion
A very useful criterion for tightness is the following one due to Aldous (1978).
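Theorem 34.8 Let $\{X_n\}$ be a sequence in D[0, 1]. Suppose
\[ \lim_{R\to\infty} \sup_n P(|X_n(t)| \ge R) = 0 \tag{34.3} \]
for each t ∈ [0, 1] and that whenever $\tau_n$ are stopping times for $X_n$ and $\delta_n \to 0$ are reals,
\[ |X_n(\tau_n + \delta_n) - X_n(\tau_n)| \tag{34.4} \]
converges to 0 in probability as n → ∞. Then the $X_n$ are tight with respect to the topology
of D[0, 1].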
Proof We will set Xn(t) = Xn(1) for t ∈ [1, 2] to simplify notation. The proof of this
theorem comprises four steps.
Step 1. We claim that (34.4) implies the following: given ε there exist $n_0$ and δ such that
\[ P(|X_n(\tau_n + s) - X_n(\tau_n)| \ge \varepsilon) \le \varepsilon \tag{34.5} \]
for each $n \ge n_0$, s ≤ 2δ, and $\tau_n$ a stopping time for $X_n$. For if not, we choose an increasing
subsequence $n_k$, stopping times $\tau_{n_k}$, and $s_{n_k} \le 1/k$ for which (34.5) does not hold. Taking
$\delta_{n_k} = s_{n_k}$ gives a contradiction to (34.4).
Step 2. Let ε > 0, fix $n \ge n_0$, and let T ≤ U ≤ 1 be two stopping times for $X_n$. We will prove
\[ P(U \le T + \delta,\ |X_n(U) - X_n(T)| \ge 2\varepsilon) \le 16\varepsilon. \tag{34.6} \]
To prove this, we start by letting λ be Lebesgue measure. If
\[ A_T = \{ (\omega, s) \in \Omega \times [0, 2\delta] : |X_n(T + s) - X_n(T)| \ge \varepsilon \}, \]
then for each s ≤ 2δ we have $P(\omega : (\omega, s) \in A_T) \le \varepsilon$ by (34.5) with $\tau_n$ replaced by T.
Writing P × λ for the product measure, we then have
\[ P \times \lambda(A_T) \le 2\delta\varepsilon. \tag{34.7} \]
Set $B_T(\omega) = \{s : (\omega, s) \in A_T\}$ and $C_T = \{\omega : \lambda(B_T(\omega)) \ge \tfrac14\delta\}$. From (34.7) and the
Fubini theorem,
\[ \int \lambda(B_T(\omega))\, P(d\omega) \le 2\delta\varepsilon, \]
so
\[ P(C_T) \le 8\varepsilon. \]
We similarly define $B_U$ and $C_U$, and obtain $P(C_T \cup C_U) \le 16\varepsilon$.
If $\omega \notin C_T \cup C_U$, then $\lambda(B_T(\omega)) \le \tfrac14\delta$ and $\lambda(B_U(\omega)) \le \tfrac14\delta$. Suppose U ≤ T + δ. Then
\[ \lambda\{ t \in [T, T + 2\delta] : |X_n(t) - X_n(T)| \ge \varepsilon \} \le \tfrac14\delta, \]
and
\[ \lambda\{ t \in [U, U + \delta] : |X_n(t) - X_n(U)| \ge \varepsilon \} \le \tfrac14\delta. \]
Hence there exists t ∈ [T, T + 2δ] ∩ [U, U + δ] such that $|X_n(t) - X_n(T)| < \varepsilon$ and
$|X_n(t) - X_n(U)| < \varepsilon$; this implies $|X_n(U) - X_n(T)| < 2\varepsilon$, which proves (34.6).
Step 3. We obtain a bound on $\xi_{X_n}$. Let $T_{n0} = 0$ and
\[ T_{n,i+1} = \inf\{ t > T_{ni} : |X_n(t) - X_n(T_{ni})| \ge 2\varepsilon \} \wedge 2. \]
Note we have $|X_n(T_{n,i+1}) - X_n(T_{ni})| \ge 2\varepsilon$ if $T_{ni} < 2$. We choose $n_0$, δ as in Step 1. By Step 2
with $T = T_{ni}$ and $U = T_{n,i+1}$,
\[ P(T_{n,i+1} - T_{ni} < \delta,\ T_{ni} < 2) \le 16\varepsilon. \tag{34.8} \]
Let K = [2/δ] + 1 and apply (34.5) with ε replaced by ε/K to see that there exist $n_1 \ge n_0$
and ζ ≤ δ ∧ ε such that if $n \ge n_1$, s ≤ 2ζ, and $\tau_n$ is a stopping time, then
\[ P(|X_n(\tau_n + s) - X_n(\tau_n)| > \varepsilon/K) \le \varepsilon/K. \tag{34.9} \]
By (34.6) with $T = T_{ni}$ and $U = T_{n,i+1}$ and δ replaced by ζ,
\[ P(T_{n,i+1} \le T_{ni} + \zeta) \le 16\varepsilon/K \tag{34.10} \]
for each i and hence
\[ P(\exists i \le K : T_{n,i+1} \le T_{ni} + \zeta) \le 16\varepsilon. \tag{34.11} \]
We have
\begin{align*}
E[T_{ni} - T_{n,i-1}; T_{nK} < 1] &\ge \delta P(T_{ni} - T_{n,i-1} \ge \delta,\ T_{nK} < 1) \\
&\ge \delta[P(T_{nK} < 1) - P(T_{ni} - T_{n,i-1} < \delta,\ T_{nK} < 1)] \\
&\ge \delta[P(T_{nK} < 1) - 16\varepsilon],
\end{align*}
where we used (34.8) in the last step. Summing over i from 1 to K,
\begin{align*}
P(T_{nK} < 1) \ge E[T_{nK}; T_{nK} < 1] &= \sum_{i=1}^K E[T_{ni} - T_{n,i-1}; T_{nK} < 1] \\
&\ge K\delta[P(T_{nK} < 1) - 16\varepsilon] \ge 2[P(T_{nK} < 1) - 16\varepsilon],
\end{align*}
or $P(T_{nK} < 1) \le 32\varepsilon$. Hence except for an event of probability at most 32ε, we have
$\xi_{X_n}(\zeta) \le 4\varepsilon$.
Step 4. The last step is to obtain a bound on $\sup_t |X_n(t)|$. Let ε > 0 and choose δ and $n_0$ as
in Step 1. Define
\[ D_{Rn} = \{ (\omega, s) \in \Omega \times [0, 1] : |X_n(s)(\omega)| > R \} \]
for R > 0. The measurability of $D_{Rn}$ with respect to the product σ-field F × B[0, 1], where
B[0, 1] is the Borel σ-field on [0, 1], follows by the fact that $X_n$ is right continuous with left
limits. Let
\[ G(R, s) = \sup_n P(|X_n(s)| > R). \]
By (34.3), G(R, s) → 0 as R → ∞ for each s. Pick R large so that
\[ \lambda(\{ s : G(R, s) > \varepsilon\delta \}) < \varepsilon\delta. \]
Then
\[ \int 1_{D_{Rn}}(\omega, s)\, P(d\omega) = P(|X_n(s)| > R) \le \begin{cases} 1, & G(R, s) > \varepsilon\delta, \\ \varepsilon\delta, & \text{otherwise.} \end{cases} \]
Integrating over s ∈ [0, 1],
\[ P \times \lambda(D_{Rn}) < 2\varepsilon\delta. \]
If $E_{Rn}(\omega) = \{s : (\omega, s) \in D_{Rn}\}$ and $F_{Rn} = \{\omega : \lambda(E_{Rn}(\omega)) > \delta/4\}$, we have
\[ \tfrac14\delta P(F_{Rn}) = \int_{F_{Rn}} \tfrac14\delta\, P(d\omega) \le \int\!\!\int_0^1 1_{D_{Rn}}(\omega, s)\, \lambda(ds)\, P(d\omega) \le 2\varepsilon\delta, \]
so $P(F_{Rn}) \le 8\varepsilon$.
Define $T = \inf\{t : |X_n(t)| \ge R + 2\varepsilon\} \wedge 2$ and define $A_T$, $B_T$, and $C_T$ as in Step 2. We have
\[ P(C_T \cup F_{Rn}) \le 16\varepsilon. \]
If $\omega \notin C_T \cup F_{Rn}$ and T < 2, then $\lambda(E_{Rn}(\omega)) \le \delta/4$. Hence there exists t ∈ [T, T + 2δ] such
that $|X_n(t)| \le R$ and $|X_n(t) - X_n(T)| \le \varepsilon$. Therefore $|X_n(T)| \le R + \varepsilon$, which contradicts
the definition of T. We conclude that T must equal 2 on the complement of $C_T \cup F_{Rn}$, or in
other words, except for an event of probability at most 16ε, we have $\sup_t |X_n(t)| \le R + 2\varepsilon$,
provided, of course, that $n \ge n_0$.
An application of Corollary 34.7 completes the proof.
Aldous’s criterion is particularly well suited for strong Markov processes.
Proposition 34.9 Suppose $X_n$ is a sequence of real-valued strong Markov processes and
there exist c, p, and γ > 0 such that
\[ E^x |X_n(t) - X_n(0)|^p \le ct^\gamma, \qquad x \in R,\ t \in [0, 1]. \tag{34.12} \]
Then for each x ∈ R, the sequence of $P^x$-laws of $\{X_n\}$ is tight with respect to the space
D[0, 1].


Unlike the Kolmogorov continuity criterion, we do not require γ > 1.

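Proof Fix x. For each t,
\[ P^x(|X_n(t)| \ge R + |x|) \le P^x(|X_n(t) - X_n(0)| \ge R) \le \frac{E^x |X_n(t) - X_n(0)|^p}{R^p} \le \frac{ct^\gamma}{R^p}, \]
which tends to 0 as R → ∞. We used Chebyshev's inequality here.

Suppose $\tau_n$ are stopping times for $X_n$ and $\delta_n \to 0$. By the strong Markov property, for each
ε > 0
\begin{align*}
P^x(|X_n(\tau_n + \delta_n) - X_n(\tau_n)| > \varepsilon) &\le \frac{E^x |X_n(\tau_n + \delta_n) - X_n(\tau_n)|^p}{\varepsilon^p} \\
&= \varepsilon^{-p} E^x\big[ E^{X_n(\tau_n)} |X_n(\delta_n) - X_n(0)|^p \big] \\
&\le c\varepsilon^{-p}\delta_n^\gamma,
\end{align*}
which tends to 0 as n → ∞. Now apply Theorem 34.8.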

Exercises

34.1 Show that the space D with the metric ρ is separable.

34.2 Let fn = 1[1/2,1/2+1/n). Show that this is a Cauchy sequence with respect to ρ, but does not

converge to an element of D. Show { fn} is not a Cauchy sequence with respect to d.

34.3 Show that (with respect to the topology on D) the subset C[0, 1] of D is nowhere dense.

34.4 Consider D with the metric dsup( f , g) = supt∈[0,1] | f (t) − g(t)|. Show that D is not separable

with respect to the metric dsup.

34.5 Suppose P and P′ are measures supported on D[0, 1] that agree on all cylindrical subsets of

D[0, 1]. In other words, all the finite-dimensional distributions agree. Prove that P = P′ on

D[0, 1].

34.6 Show that the following are continuous functions on the space D[0, 1].

(1) $f(x) = \sup_{t\le 1} x(t)$.

(2) $f(x) = \int_0^1 x(t)\, dt$.

(3) $f(x) = \sup_{t\le 1} (x(t) - x(t-))$.

34.7 Let P be a Poisson process with parameter λ. Prove that
\[ \frac{P_{nt} - n\lambda t}{\sqrt{n\lambda}} \]

converges weakly with respect to the topology of D[0, 1] as n → ∞ to a Brownian motion.

34.8 Suppose Xn converges weakly to X with respect to the topology of C[0, 1]. Prove that Xn

converges weakly to X with respect to the topology of D[0, 1].


34.9 This is the converse to Theorem 34.6. Let A be an index set, and suppose the collection of
functions $\{f_\alpha : \alpha \in A\}$ is precompact in D[0, 1], i.e., its closure is compact.

(1) Prove $\sup_{\alpha\in A} \sup_{0\le t\le 1} |f_\alpha(t)| < \infty$.
(2) Prove
\[ \lim_{\delta\to 0} \sup_{\alpha\in A} \xi_{f_\alpha}(\delta) = 0. \]
Notes
See Billingsley (1968) for more information.
35
Applications of weak convergence
In Chapter 32 we showed how weak convergence of stochastic processes could be used to
give another construction of Brownian motion by showing that a simple symmetric random
walk converges to a Brownian motion. In the first section of this chapter, we show that the
sum of independent, identically distributed mean zero random variables with variance one
also converges to a Brownian motion, which is known as the Donsker invariance principle.
We then consider a Brownian bridge, which is a Brownian motion conditioned to return
to zero at time one. We prove in Section 35.3 that a Brownian bridge is the limit process for
a sequence of normalized empirical processes.
35.1 Donsker invariance principle
Suppose the $Y_i$ are i.i.d. real-valued random variables with mean zero and variance one,
$S_n = \sum_{i=1}^n Y_i$, and $Z_n(t)$ is defined to be equal to $S_{nt}/\sqrt{n}$ if nt is an integer and defined by
linear interpolation for other values of t. The Donsker invariance principle says that the $Z_n$
converge weakly with respect to the space C[0, 1] to a Brownian motion. This is a bit more
delicate than in Section 32.2 because here our Yi only have second moments.
The statement of the Donsker invariance principle is the following.
Theorem 35.1 Let the Yi and Zn be as above. Then Zn converges weakly to the law of
Brownian motion on [0, 1] with respect to the metric of C[0, 1].
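The definition of $Z_n$ by linear interpolation can be made concrete with a small sketch. The function name `donsker_path` and the example steps are illustrative assumptions; any sequence of mean zero, variance one steps could be supplied.

```python
import math


def donsker_path(steps):
    """Return the function t -> Z_n(t) on [0, 1] built from the steps Y_1, ..., Y_n.

    Z_n(k/n) = S_k / sqrt(n), and Z_n is linear between consecutive grid points.
    """
    n = len(steps)
    partial = [0.0]                      # partial[k] = S_k, with S_0 = 0
    for y in steps:
        partial.append(partial[-1] + y)

    def Z(t):
        x = n * t
        k = min(int(math.floor(x)), n - 1)   # left grid index; clamped so that t = 1 works
        frac = x - k
        value = partial[k] + frac * (partial[k + 1] - partial[k])
        return value / math.sqrt(n)

    return Z


Z = donsker_path([1, -1, 1, 1])          # n = 4 steps of a simple symmetric random walk
print(Z(0.0), Z(0.125), Z(0.25), Z(0.5), Z(1.0))
```

With these four steps the partial sums are 1, 0, 1, 2, so the path takes the values 0.5, 0, 0.5, 1 at the grid points 1/4, 1/2, 3/4, 1 and is linear in between.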
Before we prove this, we give an application and explain the name “invariance principle.”
An example of how the Donsker invariance principle can be used is the following.
Corollary 35.2 Let $M = \sup_{s\le 1} W_s$ and $M_n = \sup_{s\le 1} Z_n(s)$, where W is a Brownian motion.
Then $M_n$ converges weakly to M.
Proof Let g be a bounded and continuous function on the reals and define a function F on
C[0, 1] by
\[ F(f) = g\Big( \sup_{s\le 1} f(s) \Big). \]
Notice $|\sup_{s\le 1} f_2(s) - \sup_{s\le 1} f_1(s)| \le \sup_{s\le 1} |f_2(s) - f_1(s)|$ and therefore F : C[0, 1] → R
is bounded and continuous. Since Zn converges weakly to W with respect to the topology on
C[0, 1], then E F (Zn) → E F (W ). This is equivalent to E g(Mn) → E g(M ). Because g is
an arbitrary bounded continuous function on the reals, we conclude Mn → M weakly.
This corollary says that the distribution of $\max_{i\le n} S_i/\sqrt{n}$ converges to that of the supremum of
a Brownian motion. We can actually use this to derive the distribution of the maximum of
a Brownian motion: first determine the distribution of the maximum of Sn when the Yi’s
are particularly simple, such as when they are a simple symmetric random walk. (That is,
$P(Y_i = 1) = P(Y_i = -1) = \tfrac12$.) Then take the limit as n → ∞. In the case of a simple
symmetric random walk, we can find the distribution of the maximum using the reflection
principle, and there are no technical difficulties with the proof, unlike using the reflection
principle with Brownian motion.
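For the simple symmetric random walk the reflection principle gives, for integers m ≥ 1, the standard identity P(max_{k≤n} S_k ≥ m) = P(S_n ≥ m) + P(S_n > m). The sketch below evaluates this with binomial coefficients and compares it with brute-force enumeration of all paths for a small n; the function names are illustrative assumptions.

```python
from itertools import product
from math import comb


def prob_sn_at_least(n, m):
    """P(S_n >= m) for the simple symmetric random walk; S_n = 2H - n with H ~ Binomial(n, 1/2)."""
    return sum(comb(n, h) for h in range(n + 1) if 2 * h - n >= m) / 2 ** n


def prob_max_at_least(n, m):
    """P(max_{k<=n} S_k >= m) for an integer m >= 1, via the reflection principle."""
    # S_n is integer valued, so P(S_n > m) = P(S_n >= m + 1).
    return prob_sn_at_least(n, m) + prob_sn_at_least(n, m + 1)


def prob_max_brute(n, m):
    """The same probability, by enumerating all 2^n equally likely paths."""
    hits = 0
    for steps in product((1, -1), repeat=n):
        s, best = 0, 0
        for y in steps:
            s += y
            best = max(best, s)
        hits += best >= m
    return hits / 2 ** n


n = 10
for m in range(1, n + 1):
    print(m, prob_max_at_least(n, m), prob_max_brute(n, m))
```

Taking m of order x√n and letting n → ∞ in the closed form is the route to the distribution of the maximum of Brownian motion described above.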
Another useful example is where $I_n = \int_0^1 |Z_n(t)|^2\, dt$ and $I = \int_0^1 |W_t|^2\, dt$. Here the
distribution of I can be found by an eigenvalue argument (Kuo, 1975), and this is then an
approximation to the distribution of $I_n$.
If f is a continuous function from C[0, 1] to R, an argument similar to the proof of
Corollary 35.2 shows that f (Zn) converges weakly to f (W ). We get the same limit process,
regardless of the distribution of the Yi’s, provided only that they are i.i.d. with mean zero
and variance one. This is where the name “invariance principle” comes from – the limit is
invariant with respect to changing the distribution of the Yi’s.
Lemma 35.3 Suppose we have a sequence $Y_i$ of i.i.d. random variables with mean zero and
variance one and $S_n = \sum_{i=1}^n Y_i$. Suppose λ > 4. Then
\[ P\Big( \max_{i\le n} |S_i| \ge \lambda\sqrt{n} \Big) \le \tfrac43 P(|S_n| \ge \lambda\sqrt{n}/2). \]

Proof Let $N = \min\{i : |S_i| \ge \lambda\sqrt{n}\}$, the first time $|S_i|$ is at least $\lambda\sqrt{n}$. N is a stopping
time and (N = i) is in the σ-field generated by $Y_1, \ldots, Y_i$. We have
\begin{align*}
P\Big( \max_{i\le n} |S_i| \ge \lambda\sqrt{n} \Big) &\le P(|S_n| \ge \lambda\sqrt{n}/2) + P(N < n,\ |S_n| < \lambda\sqrt{n}/2) \tag{35.1} \\
&\le P(|S_n| \ge \lambda\sqrt{n}/2) + \sum_{i=1}^{n-1} P(N = i,\ |S_n| < \lambda\sqrt{n}/2).
\end{align*}
If N = i with i < n and $|S_n| < \lambda\sqrt{n}/2$, then $|S_n - S_i| \ge \lambda\sqrt{n}/2$, and moreover the event
$\{|S_n - S_i| \ge \lambda\sqrt{n}/2\}$ is in the σ-field generated by $Y_{i+1}, \ldots, Y_n$, and hence is independent
of the event {N = i}. Using Chebyshev's inequality, the sum on the last line of (35.1) is
bounded by
\begin{align*}
\sum_{i=1}^{n-1} P(N = i) P(|S_n - S_i| \ge \lambda\sqrt{n}/2) &\le \sum_{i=1}^{n-1} P(N = i) \frac{E|S_n - S_i|^2}{\lambda^2 n/4} \\
&= \sum_{i=1}^{n-1} P(N = i) \frac{n - i}{\lambda^2 n/4} \\
&\le \tfrac14 P(N < n) \\
&\le \tfrac14 P\Big( \max_{i\le n} |S_i| \ge \lambda\sqrt{n} \Big),
\end{align*}
since λ > 4. Therefore
\[ P\Big( \max_{i\le n} |S_i| \ge \lambda\sqrt{n} \Big) \le P(|S_n| \ge \lambda\sqrt{n}/2) + \tfrac14 P\Big( \max_{i\le n} |S_i| \ge \lambda\sqrt{n} \Big). \]
Subtracting the second term on the right from both sides and multiplying by 4/3 proves the
lemma.

Note that the central limit theorem tells us that for any β > 0
\[ P(|S_n| \ge \beta\sqrt{n}) \to P(|Z| \ge \beta) \le e^{-\beta^2/2}, \]
where Z is a mean zero normal random variable with variance one, and hence for n large
(depending on β),
\[ P(|S_n| \ge \beta\sqrt{n}) \le 2e^{-\beta^2/2}. \tag{35.2} \]

Lemma 35.4 For each ε, η > 0, there exist $n_0$ and δ such that if $n \ge n_0$ and s ∈ [0, 1 − δ],
then
\[ P\Big( \sup_{s\le t\le s+\delta} |Z_n(t) - Z_n(s)| > \varepsilon \Big) \le \eta\delta. \]

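Proof Let ε, η > 0, and choose δ small enough that $2e^{-\varepsilon^2/(128\delta)} \le \delta\eta/2$. Then choose $j_0$
large enough so that, using (35.2),
\[ P\Big( |S_j| > \frac{\varepsilon\sqrt{j}}{8\sqrt{\delta}} \Big) \le 2e^{-\varepsilon^2/(128\delta)} \le \delta\eta/2 \]
if $j \ge j_0$. Finally, choose $n_0 \ge j_0/\delta + 2$, so that if $n \ge n_0$, then $[n\delta] + 2 \ge j_0$ and
$n\delta \ge ([n\delta] + 2)/2$, where [x] is the largest integer less than or equal to x.

Let $n \ge n_0$ and set J = [nδ] + 2. Suppose there exists s such that for some t ∈ [s, s + δ]
we have $|Z_n(t) - Z_n(s)| > \varepsilon$. Then there exists j ≤ n such that for some i between j and
j + J we have $|S_i - S_j| \ge \varepsilon\sqrt{n}/2$. Since $n \ge J/(2\delta)$, by Lemma 35.3
\begin{align*}
P\Big( \sup_{s\le t\le s+\delta} |Z_n(t) - Z_n(s)| > \varepsilon \Big) &\le P\Big( \max_{j\le i\le j+J} |S_i - S_j| > \sqrt{n}\,\varepsilon/2 \Big) \\
&\le P\Big( \max_{j\le i\le j+J} |S_i - S_j| > \frac{\sqrt{J}\,\varepsilon}{4\sqrt{\delta}} \Big) \\
&\le \tfrac43 P\Big( |S_{j+J} - S_j| > \frac{\sqrt{J}\,\varepsilon}{8\sqrt{\delta}} \Big) \\
&\le \tfrac43 P\Big( |S_J| > \frac{\sqrt{J}\,\varepsilon}{8\sqrt{\delta}} \Big) \\
&\le \delta\eta.
\end{align*}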

The proof is complete.

Lemma 35.5 For each ε, η > 0 there exist n0 and δ such that if n ≥ n0,

\[ P(\omega_{Z_n}(\delta) \ge \varepsilon) \le 2\eta. \]

Proof We will take δ = 1/K for some large K. If |t − s| ≤ 1/K, then either both s, t are

in the same interval [(i − 1)/K, i/K] or they are in adjoining intervals. Thus they both lie in

some interval of the form [(i − 2)/K, i/K]. Since

|Zn(t) − Zn(s)| ≤ |Zn(t) − Zn((i − 2)/K)| + |Zn(s) − Zn((i − 2)/K)|,

then using Lemma 35.4 with δ = 2/K
\begin{align*}
P(\exists s, t \in [0, 1] : |Z_n(t) - Z_n(s)| \ge \varepsilon,\ |t - s| < \delta)
&\le P\Big( \exists i \le K : \sup_{(i-2)/K \le s \le i/K} |Z_n(s) - Z_n((i-2)/K)| \ge \varepsilon/2 \Big) \\
&\le K \sup_i P\Big( \sup_{(i-2)/K \le s \le i/K} |Z_n(s) - Z_n((i-2)/K)| \ge \varepsilon/2 \Big) \\
&\le K\eta(2/K) = 2\eta,
\end{align*}
which proves the lemma.
We can now prove the Donsker invariance principle.
Proof of Theorem 35.1 By Lemma 35.5, Theorem 32.1, and the fact that Zn(0) = 0 for all n,
the laws of the Zn are tight. Therefore by Prohorov’s theorem (Theorem 30.4), every sub-
sequence has a further subsequence which converges weakly with respect to the topology
on C[0, 1]. We therefore only need to show that every subsequential limit point of the Zn
with respect to weak convergence is a Brownian motion. Since our processes lie in C[0, 1],
the paths of any subsequential limit point are continuous, so it suffices by Theorem 2.6 to
show that the finite-dimensional distributions of Zn converge weakly to the corresponding
finite-dimensional distributions of a Brownian motion W . We will show the one-dimensional
distributions converge, and leave the analogous argument for the higher-dimensional distri-
butions to the reader.
We have
\[ P\Big( \max_{i\le n} |Y_i|/\sqrt{n} \ge \varepsilon \Big) \le nP(|Y_1| \ge \sqrt{n}\,\varepsilon) \le nP(|Y_1|^2/\varepsilon^2 \ge n). \tag{35.3} \]
For any integrable non-negative random variable X,
\[ nP(X \ge n) = E[n; X \ge n] \le E[X; X \ge n], \]
which tends to zero by dominated convergence. Therefore
\[ P\Big( \max_{i\le n} |Y_i|/\sqrt{n} \ge \varepsilon \Big) \to 0. \tag{35.4} \]
Fix t ∈ [0, 1]. By the central limit theorem, $S_{[nt]}/\sqrt{[nt]}$ converges weakly on R to a mean
zero normal random variable with variance one, and by Exercise 30.3, we see that $S_{[nt]}/\sqrt{n}$
converges weakly to a mean zero normal random variable with variance t. From the preceding
paragraph we conclude that for each t, $|Z_n(t) - S_{[nt]}/\sqrt{n}|$ converges to zero in probability.
By Exercise 30.2, $Z_n(t)$ has the same weak limit as $S_{[nt]}/\sqrt{n}$, namely, a mean zero normal
random variable with variance t, which is the distribution of $W_t$.
There is an elegant proof of the Donsker invariance principle using Skorokhod embedding.
Unlike the proof above, however, this second proof does not extend to random variables taking
values in Rd .
By Theorem 15.6 we can find a Brownian motion W and a random walk $S_n$ such that
\[ \sup_{i\le n} \frac{|S_i - W_i|}{\sqrt{n}} \to 0 \]
in probability. By the continuity of paths of W,
\[ P\Big( \sup_{|t-s|\le 1/n,\ s,t\le 1} |W_t - W_s| > \varepsilon \Big) \to 0. \]
If we let $W_n(t) = W_{nt}/\sqrt{n}$, we then have that $\sup_{i\le n} |Z_n(i/n) - W_n(i/n)|$ tends to zero in
probability as n → ∞ and also, because $W_n$ is again a Brownian motion,
\[ P\Big( \sup_{|t-s|\le 1/n,\ s,t\le 1} |W_n(t) - W_n(s)| > \varepsilon \Big) \to 0. \]
We conclude that
\[ \sup_{t\le 1} |Z_n(t) - W_n(t)| \to 0. \]
The law of $W_n$ is that of a Brownian motion and does not depend on n. By Exercise 30.2 we
obtain that $Z_n$ converges weakly to the law of a Brownian motion.

If the above proof seems too simple, remember that we used Theorem 15.6, which in turn

relies on Skorokhod embedding.

One might ask about the weak convergence of $\tilde{Z}_n(t) = S_{[nt]}/\sqrt{n}$; these are the normalized
partial sums without the linear interpolation. Rather than being continuous and piecewise
linear like the $Z_n(t)$, the $\tilde{Z}_n(t)$ are piecewise constant and have jumps.

Proposition 35.6 Suppose the Yi are independent with mean zero and variance one. The Z̃n

converge weakly with respect to the topology of D[0, 1] to Brownian motion.

Proof The Zn converge weakly with respect to the topology of C[0, 1] to a Brownian

motion. By the Skorokhod representation (Theorem 31.2), we can find a probability space

and random variables Z ′n having the same law as Zn that converge almost surely with respect

to the supremum norm. Therefore the Z ′n converge almost surely with respect to the metric of

D[0, 1], and hence the Zn converge weakly to a Brownian motion with respect to the topology

of D[0, 1]. If we show that supt≤1 |Zn(t) − Z̃n(t)| converges to zero in probability, then our

result will follow by Exercise 30.2.

Now $Z_n(t)$ and $\tilde Z_n(t)$ will differ by more than ε for some t only if some $Y_i$ is larger than
$\sqrt{n}\,\varepsilon$ in absolute value. But by (35.4), the probability of this tends to zero as n → ∞.

35.2 Brownian bridge

A Brownian bridge $W^0_t$ is the process defined by
\[ W^0_t = W_t - tW_1, \qquad 0 \le t \le 1, \]
where W is a Brownian motion. $W^0$ has continuous paths, is jointly normal, is zero at time
0 and at time 1, has mean zero, and we calculate its covariance by
\[ \mathrm{Cov}(W^0_s, W^0_t) = \mathrm{Cov}(W_s, W_t) - s\,\mathrm{Cov}(W_1, W_t) - t\,\mathrm{Cov}(W_s, W_1) + st\,\mathrm{Var}(W_1) = s \wedge t - st, \]
recalling (2.1).
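The covariance formula s ∧ t − st can be checked by simulation. The following is a minimal sketch, not part of the text: the function name `bridge_covariance`, the number of paths and the seed are illustrative assumptions. It samples W at the three times min(s,t), max(s,t), 1 from independent Gaussian increments and averages the product of the two bridge values (the bridge has mean zero, so this average estimates the covariance).

```python
import random

def bridge_covariance(s, t, n_paths=20000, seed=0):
    """Monte Carlo estimate of Cov(W_s - s W_1, W_t - t W_1) (illustrative helper)."""
    rng = random.Random(seed)
    a, b = min(s, t), max(s, t)
    total = 0.0
    for _ in range(n_paths):
        # Brownian motion at times a <= b <= 1 built from independent increments.
        w_a = rng.gauss(0.0, a ** 0.5)
        w_b = w_a + rng.gauss(0.0, (b - a) ** 0.5)
        w_1 = w_b + rng.gauss(0.0, (1.0 - b) ** 0.5)
        # Both bridge values have mean zero, so the mean product is the covariance.
        total += (w_a - a * w_1) * (w_b - b * w_1)
    return total / n_paths

s, t = 0.3, 0.7
estimate = bridge_covariance(s, t)
exact = min(s, t) - s * t   # the formula derived above
print(estimate, exact)
```

The estimate should agree with s ∧ t − st up to Monte Carlo error, which for this sample size is of order 10^{-3}.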


A Brownian bridge can be characterized as a Brownian motion conditioned to be zero at

time 1. To make this precise, let W be a Brownian motion started at zero under P, and for A

a Borel subset of C[0, 1], define

Pε(A) = P(W ∈ A | |W1| ≤ ε);

cf. (A.13). Set P0(A) = P(W 0 ∈ A), the law of W 0.

Proposition 35.7 Pε converges weakly to P0 with respect to the topology of C[0, 1] as ε → 0.

Proof Since W is a jointly normal process and

Cov (Wt − tW1,W1) = Cov (Wt,W1) − tVar (W1) = 0,

then the process $W^0_t = W_t - tW_1$ and the random variable $W_1$ are independent by Proposition
A.55. Let F be any closed subset of C[0, 1] and let $F_\delta = \{g \in C[0, 1] : d(g, F) < \delta\}$, where
$d(g, F) = \inf\{d(g, f) : f \in F\}$ and d here is the supremum norm. Note $\sup_{t\le 1} |W_t - W^0_t| \le \varepsilon$
on the event $\{|W_1| \le \varepsilon\}$. If δ > ε,
\[ P_\varepsilon(F) = P(W \in F \mid |W_1| \le \varepsilon) \le P(W^0 \in F_\delta \mid |W_1| \le \varepsilon) = P(W^0 \in F_\delta) = P_0(F_\delta). \]

Thus lim supε→0 Pε(F ) ≤ P0(Fδ ). Since F is closed, P0(Fδ ) → P0(F ) as δ → 0, so

lim sup Pε(F ) ≤ P0(F ). An application of Theorem 30.2 completes the proof.

We show that a Brownian bridge can also be represented as the solution X of the stochastic

differential equation

\[ dX_t = dW_t - \frac{X_t}{1 - t}\,dt, \qquad X_0 = 0, \tag{35.5} \]

where W is a Brownian motion. This is plausible: X behaves much like a Brownian motion

until t is close to 1, when there is a strong push toward the origin. The existence and

uniqueness theory of Chapter 24 shows uniqueness and existence for the solution of (35.5)

for s ≤ t for any t < 1; see Exercise 24.4. We can solve (35.5) explicitly. We have
\[ dW_t = dX_t + \frac{X_t}{1 - t}\,dt = (1 - t)\, d\Big[\frac{X_t}{1 - t}\Big], \]
or
\[ X_t = (1 - t) \int_0^t \frac{dW_s}{1 - s}. \]
Thus $X_t$ is a continuous Gaussian process with mean zero. The variance of $X_t$ is
\[ (1 - t)^2 \int_0^t (1 - s)^{-2}\,ds = t - t^2, \]
the same as the variance of a Brownian bridge. A similar calculation shows that the covariance
of Xt and Xs is the same as the covariance of Wt − tW1 and Ws − sW1; see Exercise 24.6.
Hence the finite-dimensional distributions of Xt and a Brownian bridge are the same. We
now appeal to Theorem 2.6.
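As a numerical illustration of (35.5), the sketch below discretizes the equation with the Euler–Maruyama scheme and compares the sample variance of X at time 1/2 with t − t². The scheme, the step count, the number of paths and the name `euler_bridge_sde` are assumptions chosen for illustration; the text itself solves the equation exactly rather than numerically.

```python
import random

def euler_bridge_sde(t_end, n_steps, rng):
    """One Euler-Maruyama path of dX = dW - X/(1-t) dt, X_0 = 0, run up to t_end < 1."""
    dt = t_end / n_steps
    x, t = 0.0, 0.0
    for _ in range(n_steps):
        # Brownian increment minus the drift that pushes X toward the origin.
        x += rng.gauss(0.0, dt ** 0.5) - x / (1.0 - t) * dt
        t += dt
    return x

rng = random.Random(1)
t_end = 0.5
samples = [euler_bridge_sde(t_end, 100, rng) for _ in range(8000)]
sample_var = sum(x * x for x in samples) / len(samples)  # X has mean zero
exact_var = t_end - t_end ** 2                           # variance of the bridge at t_end
print(sample_var, exact_var)
```

The stopping time is kept well below 1 because the drift coefficient 1/(1 − t) blows up there, which is also why the existence theory is applied on [0, t] for t < 1.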
35.3 Empirical processes
In this section we will consider empirical processes, which are useful in statistics in estimating
distribution functions. Let Xi, i = 1, . . . , n, be i.i.d. random variables that are uniformly
distributed on the interval [0, 1]. Define the empirical process
\[ F_n(t) = \frac{1}{n} \sum_{i=1}^n 1_{[0,t]}(X_i). \tag{35.6} \]
The Glivenko–Cantelli theorem (Theorem A.40) says that
\[ \sup_{t\in[0,1]} |F_n(t) - t| \to 0, \quad \text{a.s.} \]
Our goal in this section is to obtain the corresponding weak limit theorem. Let
\[ Z_n(t) = \sqrt{n}\,(F_n(t) - t) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \big(1_{[0,t]}(X_i) - t\big). \tag{35.7} \]
We will show that $Z_n$ converges weakly with respect to D[0, 1] to a Brownian bridge.
Let
\[ \omega_{Z_n}(\delta) = \sup_{s,t\in[0,1],\ |t-s|<\delta} |Z_n(t) - Z_n(s)|. \]
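The paths of $Z_n$ are not continuous: they have a jump of size $1/\sqrt{n}$ at every time $X_i$ (the jump of $F_n$ is 1/n, and $Z_n$ multiplies it by $\sqrt{n}$). Thus $\omega_{Z_n}(\delta)$
does not tend to zero as δ → 0. Nevertheless we can get reasonable estimates on $\omega_{Z_n}(\delta)$.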
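To get a feel for (35.7), the following sketch evaluates $Z_n(t)$ for simulated uniform samples and compares its sample variance with t(1 − t), the variance of a Brownian bridge at time t. Since $nF_n(t)$ is binomial with parameters n and t, the variance of $Z_n(t)$ equals t(1 − t) for every n. The helper name `empirical_process`, the sample sizes and the seed are illustrative assumptions.

```python
import random

def empirical_process(sample, t):
    """Z_n(t) = sqrt(n) * (F_n(t) - t) for a sample assumed uniform on [0, 1]."""
    n = len(sample)
    f_n = sum(1 for x in sample if x <= t) / n   # empirical distribution function at t
    return n ** 0.5 * (f_n - t)

rng = random.Random(2)
n, t, reps = 50, 0.3, 5000
values = [empirical_process([rng.random() for _ in range(n)], t) for _ in range(reps)]
sample_var = sum(z * z for z in values) / reps   # Z_n(t) has mean zero
bridge_var = t * (1 - t)
print(sample_var, bridge_var)
```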
We need an elementary lemma on binomial random variables, the proof of which is
Exercise 35.1.
Lemma 35.8 Suppose $S_n$ is a binomial random variable with parameters n and p. Then
there exists a constant c not depending on n or p such that
\[ E|S_n - E S_n|^4 \le cnp + cn^2p^2 \tag{35.8} \]
and
\[ E|S_n|^4 \le cnp + cn^4p^4. \tag{35.9} \]
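The two bounds can be sanity-checked numerically before attempting the proof. The sketch below computes both fourth moments exactly from the binomial probability mass function and tests them against (35.8) and (35.9) on a small grid of (n, p). The constants 3 and 15 are illustrative choices that happen to work, not constants taken from the text, and a finite grid is of course no substitute for the proof asked for in Exercise 35.1.

```python
from math import comb

def binomial_fourth_moments(n, p):
    """Return (E|S - ES|^4, E|S|^4) for S ~ Binomial(n, p), summing over the pmf."""
    mean = n * p
    central4 = raw4 = 0.0
    for k in range(n + 1):
        pk = comb(n, k) * p ** k * (1 - p) ** (n - k)
        central4 += pk * (k - mean) ** 4
        raw4 += pk * k ** 4
    return central4, raw4

C_CENTRAL, C_RAW = 3.0, 15.0   # assumed constants, chosen for illustration only
all_ok = True
for n in (1, 5, 20, 100):
    for p in (0.001, 0.05, 0.3, 0.5, 0.9, 1.0):
        central4, raw4 = binomial_fourth_moments(n, p)
        x = n * p
        all_ok &= central4 <= C_CENTRAL * (x + x ** 2) + 1e-9   # shape of (35.8)
        all_ok &= raw4 <= C_RAW * (x + x ** 4) + 1e-9           # shape of (35.9)
print(all_ok)
```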
Proposition 35.9 Let ε, η > 0. There exist δ and n0 such that if n ≥ n0, then

\[ P(\omega_{Z_n}(\delta) > \varepsilon) \le \eta. \]

The idea of the proof is to use Corollary 8.4 to estimate Zn(t)− Zn(s) when |t − s| is small

and use estimates on binomials when |t − s| is large.

Proof Let ε, η > 0. We will choose n0, δ later. Assuming that they have been chosen,
suppose n ≥ n0 and choose k such that $n \le 2^k < 2n$. If t ∈ [0, 1], let t(k) be the largest
multiple of $2^{-k}$ less than or equal to t and similarly define s(k). Let $D_k = \{i/2^k : 0 \le i \le 2^k\}$.
We will show there exists δ > 0 such that
\[ P\Big(\sup_{s,t\in D_k,\ |t-s|<2\delta} |Z_n(t) - Z_n(s)| > \varepsilon/3\Big) < \eta/3 \tag{35.10} \]
and
\[ P\Big(\sup_{s\in[0,1]} |Z_n(s) - Z_n(s(k))| > \varepsilon/3\Big) < \eta/3. \tag{35.11} \]
Step 1. We first prove (35.10) by using Corollary 8.4. Suppose s, t ∈ $D_k$ with |t − s| < 2δ.
Then either s = t, in which case $Z_n(t) - Z_n(s) = 0$, or else $|t - s| \ge 2^{-k} \ge 1/(2n)$. Take
p = t − s and note that $1_{(s,t]}(X_i)$ is a Bernoulli random variable with parameter p. Using
(35.7) and Lemma 35.8,
\[ E|Z_n(t) - Z_n(s)|^4 \le \frac{c}{n^2}\,(np + n^2p^2) = c\Big(\frac{p}{n} + p^2\Big) \le c|t - s|^2, \]
where in the last line we used 1/n ≤ 2|t − s|. By Corollary 8.4,
\[ P\Big(\sup_{s,t\in D_k,\ |t-s|<2\delta} |Z_n(t) - Z_n(s)| > \varepsilon/3\Big) \le P\Big(\sup_{s,t\in D_k,\ |t-s|<2\delta} \frac{|Z_n(t) - Z_n(s)|}{|t - s|^{1/8}} > c\,\frac{\varepsilon}{\delta^{1/8}}\Big) \le c\,(\varepsilon/\delta^{1/8})^{-4} = c\,\delta^{1/2}/\varepsilon^4. \]

We choose δ small enough so that the last term is less than η/3.

Step 2. We now prove (35.11). Let

\[ T_n(t) = \sum_{i=1}^n 1_{[0,t]}(X_i). \]

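Observe that $T_n(t)$ is nondecreasing in t. If there exists s ∈ [0, 1] such that $T_n(s) - T_n(s(k)) > \varepsilon\sqrt{n}/3$, then there exists $j \le 2^k - 1$ such that $T_n((j+1)/2^k) - T_n(j/2^k) > \varepsilon\sqrt{n}/3$. Therefore,
using (35.9),
\begin{align*}
P\Big(\sup_{s\in[0,1]} \frac{T_n(s) - T_n(s(k))}{\sqrt{n}} > \varepsilon/3\Big)
&\le P\big(\exists\, j \le 2^k - 1 : T_n((j+1)/2^k) - T_n(j/2^k) > \varepsilon\sqrt{n}/3\big) \\
&\le 2^k \sup_{j\le 2^k-1} P\big(T_n((j+1)/2^k) - T_n(j/2^k) > \varepsilon\sqrt{n}/3\big) \\
&\le c\,2^k \sup_j \frac{E|T_n((j+1)/2^k) - T_n(j/2^k)|^4}{\varepsilon^4 n^2} \\
&\le c\,2^k\, \frac{n2^{-k} + (n2^{-k})^4}{\varepsilon^4 n^2}.
\end{align*}
Since $n2^{-k} \le 2$, the last line is less than or equal to
\[ c\,2^k\, n2^{-k}/(\varepsilon^4 n^2) = c_1/(\varepsilon^4 n). \]
We choose n0 > 1/δ large enough so that if n ≥ n0, then $c_1/(\varepsilon^4 n)$ is less than η/3.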

Also,
\[ E[T_n(s) - T_n(s(k))] \le n(s - s(k)) \le n2^{-k} \le 2 \]
will be less than $\varepsilon\sqrt{n}/3$ if $n > 36/\varepsilon^2$ and we choose n0 larger if necessary so that $n_0 > 36/\varepsilon^2$.
Since
\[ Z_n(t) - Z_n(s) = \frac{T_n(t) - T_n(s)}{\sqrt{n}} - \frac{E[T_n(t) - T_n(s)]}{\sqrt{n}}, \]
(35.11) follows.

Step 3. Now that we have (35.10) and (35.11), we write
\[ |Z_n(t) - Z_n(s)| \le |Z_n(t) - Z_n(t(k))| + |Z_n(t(k)) - Z_n(s(k))| + |Z_n(s(k)) - Z_n(s)|. \]
If |t − s| < δ, then $|t(k) - s(k)| \le \delta + 2^{-k} \le \delta + 1/n$. Provided n ≥ n0 > 1/δ, combining
(35.10) and (35.11) gives
\[ P\Big(\sup_{s,t\in[0,1],\ |t-s|<\delta} |Z_n(t) - Z_n(s)| > \varepsilon\Big) < \eta \]
as required.
Theorem 35.10 The Zn converge weakly to a Brownian bridge with respect to the topology
of D[0, 1].
Proof We smooth $Z_n$ to get a continuous process $V_n$. Set $Z_n(t) = Z_n(1)$ for t ∈ [1, 2] and
set
\[ V_n(t) = n \int_0^{n^{-1}} Z_n(u + t)\,du. \]
We have
\[ |V_n(t_2) - V_n(t_1)| \le n \int_0^{n^{-1}} |Z_n(t_2 + u) - Z_n(t_1 + u)|\,du \le n \int_0^{n^{-1}} \omega_{Z_n}(|t_2 - t_1|)\,du = \omega_{Z_n}(|t_2 - t_1|). \]
Note also that by (35.8) with p = t − s and using Jensen’s inequality with the measure
$n1_{[0,n^{-1}]}(u)\,du$,
\[ E|V_n(0)|^4 \le n \int_0^{n^{-1}} E|Z_n(u)|^4\,du \le c. \]
Hence
\[ P(|V_n(0)| \ge A) \le \frac{E|V_n(0)|^4}{A^4} \le \frac{c}{A^4}. \]
Therefore by Theorem 8.1, the $V_n$ are tight with respect to weak convergence on C[0, 1]. If
a subsequence $V_{n_j}$ converges weakly (with respect to C[0, 1]), by the Skorokhod representation we may
find $V'_{n_j}$ with the same law as $V_{n_j}$ that converge almost surely. Then the $V'_{n_j}$ will also converge
almost surely in the space D[0, 1]. This proves that the $V_n$ are tight in D[0, 1] by Exercise
30.10.
Given ε and η, choose δ and n0 such that $P(\omega_{Z_n}(\delta) > \varepsilon) < \eta$ if n ≥ n0. We have
\[ |V_n(t) - Z_n(t)| \le n \int_0^{n^{-1}} |Z_n(u + t) - Z_n(t)|\,du \le \omega_{Z_n}(n^{-1}). \]
If n is large enough so that $n^{-1} < \delta$, then
\[ P\big(\sup_t |V_n(t) - Z_n(t)| > \varepsilon\big) \le P(\omega_{Z_n}(n^{-1}) > \varepsilon) \le P(\omega_{Z_n}(\delta) > \varepsilon) < \eta. \]
Therefore Vn − Zn converges to 0 in probability, and by Exercise 30.2 the subsequential limit
points of Vn are the same as those of Zn.
It remains to show that any subsequential limit point of the Zn is a Brownian bridge.
This follows from the multidimensional central limit theorem for multinomials (see
Remark A.57) and is left as Exercise 35.2.
Exercises
35.1 Prove Lemma 35.8.
35.2 Prove that the finite-dimensional distributions of Zn in Theorem 35.10 converge to those of a
Brownian bridge.
35.3 If $W^0_t$ is a Brownian bridge, prove that $Y_t = W^0_{1-t}$ is also a Brownian bridge.
35.4 Let t0 < 1. The SDE (35.5) has a unique solution when X0 = 0 is replaced by X0 = x. Let Px
be the law of the solution when X0 = x and let Zt be the canonical process. Show that (Zt , Px)
is not a Markov process.
35.5 Let $N_t(A)$ be a Poisson point process with respect to the measure space (S, m) and let $A_s$, s > 0,
be an increasing sequence of subsets of S with $m(A_s) \to \infty$ as s → ∞. Does
\[ \frac{N_t(A_s) - m(A_s)}{\sqrt{m(A_s)}} \]
converge weakly with respect to D[0, 1] as s → ∞? What is the limit?

This can be applied to get central limit theorems for the number of downcrossings of a

Brownian motion, for example.

35.6 This exercise asks you to prove that the Poisson process conditioned to be equal to n at time 1

has the same law as n times the empirical process. Here is the precise statement. Suppose Pt is

a Poisson process with parameter λ > 0. Let Q be the law of {Pt , t ∈ [0, 1]} conditioned so that

P1 = n. Thus Q is a probability on D[0, 1] with

Q(P ∈ A) = P(P ∈ A | P1 = n).

Since (P1 = n) is an event with positive probability, there is no difficulty defining these

conditional probabilities. Prove that Q is also the law of {nFn(t), t ∈ [0, 1]}, where Fn is defined

in Section 35.3.

36

Semigroups

In this chapter we suppose we have a semigroup of positive contraction operators {Pt},

and we show how to construct a Markov process X corresponding to this semigroup. In

Chapters 37 and 38, we will show how such semigroups might arise.

We suppose that our state space S is a separable locally compact metric space.
Let C0 be the set of continuous functions on S that vanish at infinity. Recall that f ∈ C0
if f is continuous, and given ε, there exists a compact set K depending on ε and f such that
\[ |f(x)| < \varepsilon, \qquad x \notin K. \]
We use the usual supremum norm on C0. We assume we have a semigroup {Pt} of positive
contractions mapping C0 to C0. More precisely, we assume
Assumption 36.1 There exists a family {Pt}, t ≥ 0, of operators on C0 such that
(1) If f ∈ C0, then
Pt (Ps f )(x) = Pt+s f (x), x ∈ S, s, t ≥ 0.
(2) If f (x) ≥ 0 for all x and if t ≥ 0, then Pt f (x) ≥ 0 for all x.
(3) For all t, ‖Pt f ‖ ≤ ‖ f ‖.
(4) If f ∈ C0, then Pt f → f uniformly as t → 0.
Our goal in this section is to construct a process X corresponding to the semigroup Pt .
The steps we use are the following.
(1) We temporarily assume each Pt maps the function 1 into itself. We define Xt for t in
the dyadic rationals and define Px using the Kolmogorov extension theorem.
(2) We verify a preliminary version of the Markov property.
(3) We use the regularity theorem for supermartingales to show that X has left and right
limits along the dyadic rationals, and then define Xt for all t.
(4) We verify that our process (Xt, Px) corresponds to the semigroup Pt .
(5) We remove the assumption that Pt1 = 1.
36.1 Constructing the process
Let us assume the following for now. We will remove this assumption at the end of this
section.
Assumption 36.2 Pt1(x) = 1 for all x and all t ≥ 0.
We now begin the construction of (Xt, Px).
Step 1. Let $D_n = \{k/2^n : k \ge 0\}$ and let $D = \cup_n D_n$, the dyadic rationals. Let Ω be the set of
functions from D to S. Define
\[ X_t(\omega) = \omega(t), \qquad t \in D,\ \omega \in \Omega. \]
We let F be the σ-field on Ω generated by the collection of cylindrical subsets of Ω.
By the Riesz representation theorem (see Rudin (1987)), for each t > 0 there exists a

measure Pt (x, dy) such that

Pt f (x) =

∫

f (y) Pt (x, dy), f ∈ C0. (36.1)

(The Riesz representation theorem is most often phrased for continuous functions on compact

spaces; since we are working with C0, we can let the state space satisfy slightly weaker

hypotheses; see Folland (1999), p. 223.) We can use (36.1) to define Pt f for all bounded

Borel measurable functions f . Since Pt maps C0 to C0, and continuous functions are Borel

measurable, a limit argument shows that Pt f is Borel measurable whenever f is bounded

and Borel measurable.

Our main task in this step is to define Px. D is countable and we fix a labeling D =

{t1, t2, . . .}. Let En = {t1, . . . , tn}. Let s1 ≤ · · · ≤ sn be the ordering of En according to the

usual ordering of the reals, so that s1 is the smallest element of the set {t1, . . . , tn}, s2 is the

next smallest, and so on. Define

\[ P^x_n(X_{s_1} \in A_1, \ldots, X_{s_n} \in A_n) = \int_{A_n} \cdots \int_{A_1} P_{s_1}(x, dx_1)\,P_{s_2-s_1}(x_1, dx_2) \cdots P_{s_n-s_{n-1}}(x_{n-1}, dx_n) \tag{36.2} \]
for $A_1, \ldots, A_n$ Borel subsets of S. The $P^x_n$ are consistent in the sense of Appendix D. The
key to checking this is to observe that if $s_1, \ldots, s_n$ is the ordering of $E_n$ and we temporarily
write $s_1, \ldots, s_i, s, s_{i+1}, \ldots, s_n$ for the ordering of $E_{n+1}$, then
\[ \int_S P_{s-s_i}(x_{i-1}, dx)\,P_{s_{i+1}-s}(x, dx_i) = P_{s_{i+1}-s_i}(x_{i-1}, dx_i) \]
by the semigroup property; cf. (19.10).

By the Kolmogorov extension theorem (Theorem D.1), for each x there exists a probability

Px such that

Px(Xt1 ∈ A1, . . . , Xtn ∈ An) = Pxn(Xt1 ∈ A1, . . . , Xtn ∈ An)

for each n whenever A1, . . . , An are Borel subsets of S .

If $E^x$ is the expectation corresponding to $P^x$, (36.2) can be rewritten as
\[ E^x[f_1(X_{s_1}) \cdots f_n(X_{s_n})] = \int \cdots \int f_1(x_1) \cdots f_n(x_n)\,P_{s_1}(x, dx_1)\,P_{s_2-s_1}(x_1, dx_2) \cdots P_{s_n-s_{n-1}}(x_{n-1}, dx_n) \tag{36.3} \]
when $f_i = 1_{A_i}$ for each i. To see this, by linearity we have (36.3) when the functions $f_i$ are
simple functions; by a limit argument we have (36.3) when the $f_i$ are Borel measurable and
non-negative, and by linearity, (36.3) holds when the $f_i$ are bounded and Borel measurable.

By (36.2) we have
\[ P^x(X_t \in A) = E^x 1_A(X_t) = \int_A P_t(x, dy) = P_t 1_A(x). \]

Using linearity and a limit argument, we have E x f (Xt ) = Pt f (x) when f is bounded and

Borel measurable.

Proposition 36.3 If f is bounded and Borel measurable, s, t > 0, and x ∈ S, then
\[ E^x\big[E^{X_t} f(X_s)\big] = E^x f(X_{s+t}). \tag{36.4} \]

Proof The proof of (36.4) is mainly a matter of sorting out notation. Let $\varphi(x) = E^x f(X_s) = P_s f(x)$. Hence $E^{X_t} f(X_s) = \varphi(X_t) = P_s f(X_t)$. Then the left-hand side is $E^x (P_s f)(X_t) = P_t(P_s f)(x)$. The right-hand side of (36.4) is $P_{s+t} f(x)$, and so the two sides agree by the
semigroup property.

Step 2. We so far only have $X_t$ constructed for t ∈ D. To extend the definition to all t, we
want to let $X_t = \lim_{u>t,\ u\in D,\ u\to t} X_u$. But before we can make that definition, we need to know
that the limits exist. We will use the regularity of supermartingales to show this, so we need
to look at conditional expectations. Let
\[ \mathcal{F}'_s = \sigma(X_r;\ r \le s,\ r \in D). \]

Proposition 36.4 If s < t with s, t ∈ D and f is bounded and Borel measurable, then
\[ E^x[f(X_t) \mid \mathcal{F}'_s] = E^{X_s} f(X_{t-s}), \qquad P^x\text{-a.s.} \tag{36.5} \]
Proof Take n ≥ 1, $r_1 \le r_2 \le \cdots \le r_n \le s$ with each $r_j$ in D, and $A_1, \ldots, A_n$ Borel subsets
of S. It suffices to show that
\[ E^x[f(X_t)1_{A_1}(X_{r_1}) \cdots 1_{A_n}(X_{r_n})] = E^x[(E^{X_s} f(X_{t-s}))1_{A_1}(X_{r_1}) \cdots 1_{A_n}(X_{r_n})], \tag{36.6} \]
since the events $(X_{r_1} \in A_1, \ldots, X_{r_n} \in A_n)$ generate $\mathcal{F}'_s$. The right-hand side of (36.6) is equal
to
\[ E^x[P_{t-s} f(X_s)1_{A_1}(X_{r_1}) \cdots 1_{A_n}(X_{r_n})]. \tag{36.7} \]
From (36.3)
\[ E^x[P_{t-s} f(X_s)1_{A_1}(X_{r_1}) \cdots 1_{A_n}(X_{r_n})] = \int \cdots \int P_{t-s} f(y)1_{A_1}(x_1) \cdots 1_{A_n}(x_n)\,P_{r_1}(x, dx_1) \cdots P_{r_n-r_{n-1}}(x_{n-1}, dx_n)\,P_{s-r_n}(x_n, dy). \tag{36.8} \]
But $P_{t-s} f(y) = \int f(z)\,P_{t-s}(y, dz)$. Substituting this in (36.8) and using (36.3) again, we
obtain the left-hand side of (36.6).
Step 3. We define $R_\lambda$, the resolvent or λ-resolvent of $P_t$, by
\[ R_\lambda f(x) = \int_0^\infty e^{-\lambda t} P_t f(x)\,dt. \tag{36.9} \]
Lemma 36.5 If f ≥ 0 is bounded and Borel measurable and x ∈ S, then $M_t = e^{-\lambda t} R_\lambda f(X_t)$,
t ∈ D, is a supermartingale with respect to the filtration $\{\mathcal{F}'_t;\ t \in D\}$ and the probability
measure $P^x$.
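Proof What we need to show is that if s < t ∈ D, then
\[ E^x[e^{-\lambda t} R_\lambda f(X_t) \mid \mathcal{F}'_s] \le e^{-\lambda s} R_\lambda f(X_s), \qquad P^x\text{-a.s.} \]
By Proposition 36.4 the left-hand side is
\[ e^{-\lambda t} E^{X_s} R_\lambda f(X_{t-s}), \]
so what we need to show is that
\[ E^y R_\lambda f(X_{t-s}) \le e^{\lambda(t-s)} R_\lambda f(y) \tag{36.10} \]
for all y. The left-hand side of (36.10) is
\begin{align*}
P_{t-s} R_\lambda f(y) &= \int_0^\infty e^{-\lambda r} P_{t-s} P_r f(y)\,dr \\
&= \int_0^\infty e^{-\lambda r} P_{r+t-s} f(y)\,dr \\
&= e^{\lambda(t-s)} \int_{t-s}^\infty e^{-\lambda r} P_r f(y)\,dr \\
&\le e^{\lambda(t-s)} \int_0^\infty e^{-\lambda r} P_r f(y)\,dr \\
&= e^{\lambda(t-s)} R_\lambda f(y).
\end{align*}
The first equality is the Fubini theorem, the second the semigroup property, and the third
equality comes from a change of variables. The inequality uses f ≥ 0.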
Next, if f is non-negative and bounded, by Theorem 3.12 with P replaced by $P^x$, we see
that $e^{-\lambda t} R_\lambda f(X_t)$ has left and right limits along t ∈ D. Therefore the same is true for $R_\lambda f(X_t)$.
By Assumption 36.1 and dominated convergence, we have that if f ∈ C0,
\begin{align*}
\lambda R_\lambda f(x) - f(x) &= \int_0^\infty \lambda e^{-\lambda t} (P_t f(x) - f(x))\,dt \\
&= \int_0^\infty e^{-t} (P_{t/\lambda} f(x) - f(x))\,dt
\end{align*}
tends to zero uniformly in x as λ → ∞. Take a countable dense subset $\{f_i\}$ of C0 and look at
$jR_j f_i(X_t)$ for all positive integers j. Since $jR_j f_i(X_t)$ has left and right limits along D, a.s.,
letting j → ∞, we see that $f_i(X_t)$ does also. We conclude that $X_t$ has left and right limits
along D.
Now define $X_t = \lim_{u>t,\ u\in D,\ u\to t} X_u$. Then $X_t$ is right continuous with left limits. We check
that
\[ P^x(X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n) = \int_{A_1} \cdots \int_{A_n} P_{t_1}(x, dx_1) \cdots P_{t_n-t_{n-1}}(x_{n-1}, dx_n). \]

To see this, we know this holds when the $t_i$ are in D. By linearity and a limit argument, we
conclude
\[ E^x[f_1(X_{t_1}) \cdots f_n(X_{t_n})] = \int \cdots \int f_1(x_1) \cdots f_n(x_n)\,P_{t_1}(x, dx_1) \cdots P_{t_n-t_{n-1}}(x_{n-1}, dx_n) \tag{36.11} \]

when the fi are bounded and continuous. Using a limit argument, we know (36.11) holds

when the ti are arbitrary non-negative real numbers. Using a limit argument again, (36.11)

holds for all bounded and measurable f , in particular, when fi = 1Ai .

Step 4. It remains to show that $(X_t, P^x)$ satisfies Definition 19.1 and that $P_t$ is the semigroup
of this process. Let $\mathcal{F}^{00}_t = \sigma(X_s;\ s \le t)$. Then we have already shown that $(X_t, P^x)$ is a
Markov process with respect to the filtration $\{\mathcal{F}^{00}_t\}$, except for showing that
\[ P^x(X_{s+t} \in A \mid \mathcal{F}^{00}_s) = P^{X_s}(X_t \in A). \]

However, this can be proved almost identically to the way we proved Proposition 36.4.

Step 5. Sometimes the semigroup is a contraction semigroup and satisfies Assumption 36.1
but not Assumption 36.2. In this case the $P_t(x, A)$ are called sub-Markov transition probability
kernels. The missing probability is due to the process being killed, and we can handle this
situation as follows. Let $S_\Delta = S \cup \{\Delta\}$, where we introduce an isolated point Δ. The
topology on $S_\Delta$ is the one generated by the open sets on S together with the set {Δ}. Given
a function f on S, we extend it to $S_\Delta$ by setting f(Δ) = 0. We replace $P_t(x, A)$ by $\overline{P}_t(x, A)$,
where
\[ \begin{cases} \overline{P}_t(x, A) = P_t(x, A), & x \in S,\ A \subset S, \\ \overline{P}_t(x, \{\Delta\}) = 1 - P_t(x, S), & x \in S, \\ \overline{P}_t(\Delta, \{\Delta\}) = 1. \end{cases} \tag{36.12} \]
One can go through the above construction with $\overline{P}_t$ and obtain a strong Markov process $X_t$
whose state space is $S_\Delta$. It is not hard to show that starting at Δ, the process stays at Δ
forever; see Exercise 36.1.

We remark that by the results of Chapter 20 and also Exercise 20.1, we can expand the
filtration from $\{\mathcal{F}^{00}_t\}$ to $\{\mathcal{F}_t\}$, where $\{\mathcal{F}_t\}$ is right continuous and each $\mathcal{F}_t$ contains all the sets
that are null with respect to each $P^x$. In addition, the strong Markov property will hold for
$(X_t, P^x)$.

36.2 Examples

Example 36.6 Our first example is a Brownian motion. Let

\[ p(t, x, y) = (2\pi t)^{-d/2} e^{-|x-y|^2/2t}, \]
and set
\[ P_t(x, A) = \int_A p(t, x, y)\,dy. \]
We know
\[ \int p(t, x, z)\,p(s, z, y)\,dz = p(t + s, x, y) \]

by Proposition 19.5, and so Pt satisfies the semigroup property. We showed in Section 19.4

that Assumption 36.1 is satisfied, except for the fact that Pt maps C0 to C0; this is Exercise

36.2. Therefore we have a strong Markov process associated with Pt . By Proposition 21.5,

the paths of the strong Markov process can be taken to be continuous. This gives yet another

construction of a Brownian motion.

Example 36.7 We now use the machinery we have developed in this chapter to construct

the Poisson process. Define transition probabilities by

\[ P_t(x, A) = e^{-\lambda t} \sum_{k=0}^\infty \frac{(\lambda t)^k}{k!}\,1_A(x + k), \]
where λ is some fixed parameter. If $p(t, k) = e^{-\lambda t}(\lambda t)^k/k!$, then
\[ P_t f(x) = \sum_{k=0}^\infty f(x + k)\,p(t, k). \tag{36.13} \]

Thus

\[ P_s(P_t f)(x) = \sum_{j=0}^\infty P_t f(x + j)\,p(s, j) = \sum_{j=0}^\infty \sum_{k=0}^\infty f(x + j + k)\,p(t, k)\,p(s, j). \]
This is equal to
\[ \sum_{m=0}^\infty f(x + m) \sum_{k=0}^m p(t, m - k)\,p(s, k), \tag{36.14} \]
which by Exercise 36.3 is equal to
\[ \sum_{m=0}^\infty f(x + m)\,p(s + t, m) = P_{s+t} f(x). \tag{36.15} \]

Therefore the semigroup property holds.
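The step from (36.14) to (36.15) rests on the identity $\sum_{k=0}^m p(t, m-k)\,p(s, k) = p(s+t, m)$. A quick numerical check (which does not replace the proof requested in Exercise 36.3) is sketched below; the parameter value λ = 2 and the times s, t are arbitrary illustrative choices.

```python
from math import exp, factorial

def p(t, k, lam=2.0):
    """Poisson weight p(t, k) = e^{-lam t} (lam t)^k / k!  (lam = 2.0 is an arbitrary choice)."""
    return exp(-lam * t) * (lam * t) ** k / factorial(k)

def convolution(s, t, m):
    """Inner sum of (36.14): sum over k of p(t, m - k) p(s, k)."""
    return sum(p(t, m - k) * p(s, k) for k in range(m + 1))

s, t = 0.4, 1.1
# Largest discrepancy between the inner sum of (36.14) and p(s + t, m) over m = 0, ..., 24.
max_gap = max(abs(convolution(s, t, m) - p(s + t, m)) for m in range(25))
print(max_gap)
```

The discrepancy is at the level of floating-point rounding, consistent with the identity being exact.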

We therefore have a strong Markov process X whose paths are right continuous with left

limits. We want to show that the process Xt under the probability measure P0 is a Poisson

process. That P0(X0 = 0) = 1 is obvious. We need to show that Definition 5.1(3) and (4)

hold. For the former,

\begin{align}
P^0(X_t - X_s = k) &= \sum_{j=0}^\infty P^0(X_t = j + k,\ X_s = j) \tag{36.16} \\
&= \sum_{j=0}^\infty p(s, j)\,p(t - s, k) = p(t - s, k), \tag{36.17}
\end{align}

as desired. For Definition 5.1(4), suppose $r_1 \le r_2 \le \cdots \le r_n \le s < t$, $a_1, \ldots, a_n$ are
integers, and let $A = (X_{r_1} = a_1, \ldots, X_{r_n} = a_n)$. We will be done if we show
\[ P^0(X_t - X_s = k,\ A) = P^0(X_t - X_s = k)\,P^0(A). \tag{36.18} \]
The left-hand side of (36.18) is equal to
\begin{align*}
\sum_{j=0}^\infty P^0(X_t = j + k,\ X_s = j,\ A) &= \sum_{j=0}^\infty E^0[P^0(X_t = j + k \mid \mathcal{F}_s);\ X_s = j,\ A] \\
&= \sum_{j=0}^\infty E^0\big[P^{X_s}(X_{t-s} = j + k);\ X_s = j,\ A\big] \\
&= \sum_{j=0}^\infty E^0\big[P^j(X_{t-s} = j + k);\ X_s = j,\ A\big] \\
&= \sum_{j=0}^\infty E^0[p(t - s, k);\ X_s = j,\ A] \\
&= p(t - s, k)\,P^0(A).
\end{align*}
Together with (36.16) this proves (36.18).
Exercises
36.1 Suppose $P_t$ is a family of sub-Markov transition probabilities and we define $\overline{P}_t$ by (36.12). Show
that $\overline{P}_t$ is a family of Markov transition probabilities. Show that $P^\Delta(X_t \ne \Delta \text{ for some } t > 0) = 0$,
i.e., starting at Δ, the process stays there forever.

36.2 Show that if $P_t(x, A)$ is defined by (19.17), and $P_t f(x) = \int f(y)\,P_t(x, dy)$, then $P_t$ maps C0 into
C0.

36.3 Show that (36.14) equals (36.15).

36.4 Show that Pt defined by (36.13) satisfies all the parts of Assumption 36.1.

36.5 Suppose {μt, t ≥ 0} is a tight family of probability measures on the real line. Suppose there
exists a function ψ : R → C such that the Fourier transforms of the μt have the following form:
\[ \int e^{iux}\,\mu_t(dx) = e^{t\psi(u)}, \qquad t \ge 0,\ u \in \mathbb{R}. \]

(1) Prove that μt converges weakly to μ0 as t → 0. Note that μ0 is the same as point
mass at 0.

(2) Define the operators $P_t$ by
\[ P_t f(x) = \int f(x - y)\,\mu_t(dy). \]

Prove that the Pt form a strongly continuous semigroup of contraction operators mapping C0

into C0. Conclude that there exists a strong Markov process whose semigroup is given by

the Pt .

This semigroup is called a convolution semigroup because μt+s = μt ∗ μs, in the sense of

convolution of measures. We will see later that these are associated with Lévy processes.

Notes

See Blumenthal and Getoor (1968) for further information.

37

Infinitesimal generators

Often a Markov process is specified in terms of its behavior at each point, and one wants

to form a global picture of the process. This means one is given the infinitesimal generator,

which is a linear operator that is an unbounded operator in general, and one wants to come

up with the semigroup for the Markov process.

We will begin by looking further at semigroups and resolvents, and then define the

infinitesimal generator of a semigroup. We will prove the Hille–Yosida theorem, which is the

primary tool for constructing semigroups from infinitesimal generators. Then we will look

at two important examples: elliptic operators in nondivergence form and Lévy processes.

37.1 Semigroup properties

Let S be a locally compact separable metric space. We will take B to be a separable Banach

space of real-valued functions on S . For the most part, we will take B to be the continuous

functions on S that vanish at infinity (with the supremum norm), although another common

example is to let B be the set of functions on S that are in L2 with respect to some measure.

We use ‖ · ‖ for the norm on B.

For the duration of this chapter we will make the following assumption.

Assumption 37.1 Suppose that Pt , t ≥ 0, are operators acting on B such that

(1) the Pt are contractions: ‖Pt f ‖ ≤ ‖ f ‖ for all t ≥ 0 and all f ∈ B,

(2) the Pt form a semigroup: PsPt = Pt+s for all s, t ≥ 0, and

(3) the Pt are strongly continuous: if f ∈ B, then Pt f → f as t → 0.

Note that the semigroup property implies in particular that Ps and Pt commute. For a bounded

operator A on B, ‖A‖ = sup{‖A f ‖ : ‖ f ‖ ≤ 1}, so saying Pt is a contraction is the same as

saying ‖Pt‖ ≤ 1.

Define the resolvent or λ-resolvent operator of a semigroup Pt by

\[ R_\lambda f(x) = \int_0^\infty e^{-\lambda t} P_t f(x)\,dt. \tag{37.1} \]

The resolvent equation is

Rλ − Rμ = (μ − λ)RλRμ. (37.2)

We show that the semigroup property implies the resolvent equation.


Proposition 37.2 The resolvent equation (37.2) holds.

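Proof We write
\begin{align*}
R_\lambda(R_\mu f)(x) &= \int_0^\infty e^{-\lambda t} P_t(R_\mu f)(x)\,dt \\
&= \int_0^\infty e^{-\lambda t} \int_0^\infty e^{-\mu s} P_t(P_s f)(x)\,ds\,dt \\
&= \int_0^\infty e^{-\lambda t} \int_0^\infty e^{-\mu s} P_{t+s} f(x)\,ds\,dt \\
&= \int_0^\infty e^{-\lambda t} e^{\mu t} \int_t^\infty e^{-\mu s} P_s f(x)\,ds\,dt \\
&= \int_0^\infty \int_0^s e^{-(\lambda-\mu)t} e^{-\mu s} P_s f(x)\,dt\,ds \\
&= \int_0^\infty \frac{1 - e^{-(\lambda-\mu)s}}{\lambda - \mu}\, e^{-\mu s} P_s f(x)\,ds \\
&= \frac{1}{\mu - \lambda} \Big[ \int_0^\infty e^{-\lambda s} P_s f(x)\,ds - \int_0^\infty e^{-\mu s} P_s f(x)\,ds \Big] \\
&= \frac{1}{\mu - \lambda}\,[R_\lambda f(x) - R_\mu f(x)].
\end{align*}
The second equality uses Exercise 37.2, the fourth a change of variables, and the fifth the
Fubini theorem.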
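The resolvent equation can be checked concretely for the Poisson semigroup (36.13). Integrating $e^{-\lambda t} p(t, k)$ over t gives $R_\lambda f(x) = \sum_{k \ge 0} f(x+k)\, r^k/(\lambda + r)^{k+1}$, where r denotes the Poisson intensity (called `rate` below to keep it apart from the resolvent parameter). On functions supported on {0, ..., N − 1} this operator is an upper triangular matrix, so (37.2) becomes a matrix identity. The sketch below is an illustration under these assumptions; the helper names and parameter values are hypothetical.

```python
def poisson_resolvent(lam, rate, size):
    """Matrix of R_lam for the Poisson semigroup, acting on functions supported on
    {0, ..., size-1}: entry (x, y) is rate^(y-x) / (lam + rate)^(y-x+1) for y >= x."""
    return [[rate ** (y - x) / (lam + rate) ** (y - x + 1) if y >= x else 0.0
             for y in range(size)] for x in range(size)]

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

lam, mu, rate, size = 0.7, 1.9, 2.0, 12
r_lam = poisson_resolvent(lam, rate, size)
r_mu = poisson_resolvent(mu, rate, size)
product = matmul(r_lam, r_mu)

# Largest entry of R_lam - R_mu - (mu - lam) R_lam R_mu; (37.2) says this is zero.
max_gap = max(abs(r_lam[i][j] - r_mu[i][j] - (mu - lam) * product[i][j])
              for i in range(size) for j in range(size))
print(max_gap)
```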

We have the following corollary to Proposition 37.2.

Corollary 37.3 If μ, λ > 0 and |μ − λ| < λ, then
\[ R_\mu f = R_\lambda f + \sum_{i=1}^\infty (\lambda - \mu)^i R_\lambda^{i+1} f. \tag{37.3} \]
Here $R_\lambda^2 f = R_\lambda(R_\lambda f)$, and similarly for $R_\lambda^i f$.
Proof By Proposition 37.2, we have
\[ R_\mu f = R_\lambda f + (\lambda - \mu) R_\lambda R_\mu f. \tag{37.4} \]
If we substitute for $R_\mu f$ in the last term on the right-hand side of (37.4), we have
\[ R_\mu f = R_\lambda f + (\lambda - \mu) R_\lambda R_\lambda f + (\lambda - \mu)^2 R_\lambda R_\lambda R_\mu f. \]
We again substitute for $R_\mu f$, and repeat. Since
\[ \|(\lambda - \mu) R_\lambda\| \le \frac{|\lambda - \mu|}{\lambda}, \]
which is less than one in absolute value, $(\lambda - \mu)^i R_\lambda^{i+1} R_\mu f$ converges to zero as i → ∞ and
the series converges.
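Remark 37.4 In particular, if $R_\lambda$ and $S_\lambda$ are two resolvents that agree at one value of λ, say
$\lambda_0$, then Corollary 37.3 applied once with $R_\lambda$ and once with $S_\lambda$ implies that if $\lambda < 2\lambda_0$, then
\[ R_\lambda f = R_{\lambda_0} f + \sum_{i=1}^\infty (\lambda_0 - \lambda)^i (R_{\lambda_0})^{i+1} f = S_{\lambda_0} f + \sum_{i=1}^\infty (\lambda_0 - \lambda)^i (S_{\lambda_0})^{i+1} f = S_\lambda f, \]
or $R_\lambda$ and $S_\lambda$ agree for $\lambda < 2\lambda_0$. Applying this observation again with $\lambda_0$ replaced by $3\lambda_0/2$,
then $R_\lambda$ and $S_\lambda$ agree for $\lambda < 3\lambda_0$. Continuing this argument, we see that $R_\lambda$ and $S_\lambda$ must
agree for each positive value of λ.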
If for some f ∈ B,
\[ \Big\| \frac{P_h f - f}{h} - g \Big\| \to 0 \]
as h → 0, we say that f is in the domain of the infinitesimal generator of the semigroup, we
write g = Lf and write f ∈ D = D(L). Generally D(L) is a proper subset of B. If f ∈ D
and t > 0, then
\[ \frac{P_h P_t f - P_t f}{h} = \frac{P_t P_h f - P_t f}{h} = P_t\Big(\frac{P_h f - f}{h}\Big) \to P_t L f, \tag{37.5} \]
since $P_t$ is a contraction. Therefore $P_t f \in D$ when f ∈ D and $L(P_t f) = P_t(L f)$.

Proposition 37.5 Fix λ > 0 and let C = {Rλ f : f ∈ B}. Then C = D(L) and for f ∈ B,

LRλ f = λRλ f − f .

Proof Suppose that g ∈ C, so that g = Rλ f for some f ∈ B. Then

\[ P_h R_\lambda f = \int_0^\infty e^{-\lambda t} P_{h+t} f\,dt = e^{\lambda h} \int_h^\infty e^{-\lambda t} P_t f\,dt, \tag{37.6} \]
and so
\[ P_h g - g = P_h R_\lambda f - R_\lambda f = (e^{\lambda h} - 1) \int_h^\infty e^{-\lambda t} P_t f\,dt - \int_0^h e^{-\lambda t} P_t f\,dt. \tag{37.7} \]
Dividing by h and letting h → 0, the first term on the right of (37.7) converges (use Exercise
37.2) to
\[ \lambda \int_0^\infty e^{-\lambda t} P_t f\,dt = \lambda R_\lambda f. \]
Since f ∈ B, then $P_t f \to f$ as t → 0. After dividing by h, the integral in the second term on the right-hand
side of (37.7) converges to f. Thus
\[ L(R_\lambda f) = \lambda R_\lambda f - f, \tag{37.8} \]
as required.

We have shown that C ⊂ D(L), and we now show the opposite inclusion. Suppose

f ∈ D(L). Let g = λ f − L f , which is in B. Since Pt and L commute, then Rλ and L

commute, and by (37.8),

f = λRλ f − (λRλ f − f ) = λRλ f − RλL f

= Rλg,

which is in C.

Example 37.6 Let us compute the infinitesimal generator when (Xt, Px) is a one-

dimensional Brownian motion. For our space B we take the continuous functions on R

that vanish at infinity. Suppose f ∈ C2 with compact support. By a Taylor series expansion,

\[ P_h f(x) = E^x f(X_h) = f(x) + f'(x)\,E^x(X_h - x) + \tfrac12 f''(x)\,E^x(X_h - x)^2 + R_h, \]
where $R_h$ is the remainder term. We know $R_h$ is bounded by
\[ \|f''\|_\infty\, E^x[\varphi(X_h - x)], \]
where φ is bounded and $|\varphi(y)/y^2| \to 0$ as y → 0. Since $W_h$ started at x has mean x and
variance h, we have
\[ P_h f(x) = f(x) + \tfrac12 f''(x)\,h + R_h, \]
where $|R_h/h|$ tends to zero as h → 0. Therefore
\[ \frac{P_h f - f}{h} \to \tfrac12 f'', \]
the convergence being with respect to the supremum norm. Exactly the same argument
holds in higher dimensions to show that $L f = \tfrac12 \Delta f$. We have shown that D(L) contains

the C2 functions with compact support, but have not actually identified the domain of the

infinitesimal generator. We refer the reader to Knight (1981) for a detailed discussion.
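The limit $(P_h f - f)/h \to \tfrac12 f''$ can be watched numerically for a function where $P_h f$ is available in closed form. The sketch below takes $f(y) = e^{-y^2/2}$, an assumption made for convenience: it is smooth and vanishes at infinity but is not compactly supported, so it lies outside the class treated in the example. For this f, a Gaussian integral gives $P_h f(x) = (1+h)^{-1/2} e^{-x^2/(2(1+h))}$, and $f''(x) = (x^2 - 1)e^{-x^2/2}$.

```python
from math import exp, sqrt

def heat_semigroup_gaussian(h, x):
    """P_h f(x) = E f(x + sqrt(h) Z) for f(y) = exp(-y^2 / 2), in closed form."""
    return exp(-x * x / (2.0 * (1.0 + h))) / sqrt(1.0 + h)

def half_second_derivative(x):
    """(1/2) f''(x) for f(y) = exp(-y^2 / 2)."""
    return 0.5 * (x * x - 1.0) * exp(-x * x / 2.0)

x = 0.7
f_x = heat_semigroup_gaussian(0.0, x)   # P_0 f = f
gaps = []
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    quotient = (heat_semigroup_gaussian(h, x) - f_x) / h
    gaps.append((h, abs(quotient - half_second_derivative(x))))
print(gaps)
```

The gap shrinks roughly in proportion to h, as one expects from the next term of the Taylor expansion.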

The domain of an infinitesimal generator is nearly as important as the operator itself.

We will briefly discuss aspects of the domains of the infinitesimal generator for absorbing

Brownian motion and for reflecting Brownian motion on [0, ∞). Both have the same operator

$L f = \tfrac12 f''$ but different domains.

Absorbing Brownian motion on [0, ∞) is Brownian motion killed on first hitting (−∞, 0).

Let Wt be standard Brownian motion on R and let Xt be Wt killed on first hitting (−∞, 0).

If $f \in C^2[0, \infty)$ with f and its first and second derivatives being bounded and uniformly
continuous and x ≠ 0, $(E^x f(X_t) - f(x))/t$ differs from $(E^x f(W_t) - f(x))/t$ by at most
\[ \|f\|_\infty\, P^x(T_0 < t)/t, \]
where $T_0$ is the first time $W_t$ hits (−∞, 0). If x ≠ 0,
\[ \frac{P^x(T_0 < t)}{t} \le \frac{P^x(\sup_{s\le t} |W_s - W_0| \ge x)}{t} \le \frac{2}{t}\, e^{-x^2/2t} \to 0 \]
as t → 0. Therefore for x ≠ 0, the infinitesimal generator of absorbing Brownian motion is
the same as the infinitesimal generator of standard Brownian motion, namely, $\tfrac12 f''(x)$.
If $f = R_\lambda g$ for g bounded and continuous, we have
\[ f(0) = R_\lambda g(0) = E^0 \int_0^{T_0} e^{-\lambda t} g(X_t)\,dt = 0. \]
We use the fact that starting at 0, T0 = 0, a.s., by Theorem 7.2. Using Proposition 37.5, every
function in the domain of the infinitesimal generator of absorbing Brownian motion must
satisfy f (0) = 0.
We can define reflecting Brownian motion on [0, ∞) by $X_t = |W_t|$, where W is a one-dimensional Brownian motion on R. As in the preceding paragraph, the infinitesimal generator for X agrees with $\tfrac12 f''(x)$ if x ≠ 0. For x = 0, an application of Taylor’s theorem
gives
\[ E^0 f(|W_t|) = f(0) + f'(0)\,E^0|W_t| + \tfrac12 f''(0)\,E^0|W_t|^2 + E^0 R_t, \]
where $R_t$ is a remainder term. Subtracting f(0) from both sides and dividing by t, and noting
$E^0|W_t|/t = c_1\sqrt{t}/t \to \infty$ as t → 0, the only way we can get convergence is if f′(0) = 0.
Thus every function in the domain of the infinitesimal generator of reflecting Brownian
motion must satisfy f′(0) = 0.
In higher dimensions, the analogous restriction for reflecting Brownian motion is that
the normal derivative ∂ f /∂n must equal zero on the boundary of the domain, where n is
the inward-pointing unit normal vector. In the partial differential equations literature, this is
known as the Neumann boundary condition, and models situations where there is no heat
flow across the boundary. For absorbing Brownian motion the analogous restriction is that
f = 0 on the boundary of the domain, and this is called the Dirichlet boundary condition.
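Example 37.7 Next we compute the generator for a Poisson process with parameter $\lambda$. We can let $\mathcal{B}$ be as in Example 37.6. We have
\begin{align*}
P_h f(x) &= \sum_{i=0}^\infty e^{-\lambda h} \frac{(\lambda h)^i}{i!} f(x + i) \\
&= e^{-\lambda h} f(x) + e^{-\lambda h} \lambda h f(x + 1) + \sum_{i=2}^\infty e^{-\lambda h} \frac{(\lambda h)^i}{i!} f(x + i).
\end{align*}
Subtracting $f(x)$ from both sides, dividing by $h$, and letting $h \to 0$, we obtain
\[ \mathcal{L} f(x) = -\lambda f(x) + \lambda f(x + 1) = \lambda [f(x + 1) - f(x)]. \]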
In this case the domain of L is all of B.
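The limit in Example 37.7 can be checked numerically. The sketch below truncates the Poisson series for $P_h f(x)$ and compares the difference quotient with $\lambda[f(x+1) - f(x)]$; the test function, the truncation level, and the parameter values are arbitrary choices made for illustration.

```python
import math

def poisson_semigroup(f, x, h, lam, terms=60):
    """Truncated series P_h f(x) = sum_i e^{-lam h} (lam h)^i / i! * f(x + i)."""
    total = 0.0
    weight = math.exp(-lam * h)      # Poisson weight for i = 0
    for i in range(terms):
        total += weight * f(x + i)
        weight *= lam * h / (i + 1)  # update to the weight for i + 1
    return total

def generator_formula(f, x, lam):
    """L f(x) = lam * (f(x + 1) - f(x))."""
    return lam * (f(x + 1) - f(x))

def f(x):
    """Illustrative bounded test function that vanishes at infinity."""
    return 1.0 / (1.0 + x * x)

lam, x = 2.0, 0.5
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    quotient = (poisson_semigroup(f, x, h, lam) - f(x)) / h
    print(h, quotient, generator_formula(f, x, lam))
```

The printed quotients approach the value of the generator formula at a rate proportional to $h$, in line with the terms of order $h^2$ that were discarded in the series.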
A very useful result is Dynkin’s formula.
Theorem 37.8 Suppose $P_t$ operating on the space $\mathcal{B}$ of continuous functions vanishing at infinity is the semigroup of a Markov process $(X_t, P^x)$, $f \in \mathcal{D}(\mathcal{L})$, and $f$ and $\mathcal{L} f$ are bounded. If $x \in \mathcal{S}$ and $T$ is a stopping time with $E^x T < \infty$, then
\[ E^x f(X_T) - f(x) = E^x \int_0^T \mathcal{L} f(X_r)\, dr. \]
Proof If $f \in \mathcal{D}(\mathcal{L})$, then $\mathcal{L} f \in \mathcal{B}$, and so $P_t \mathcal{L} f$ is continuous in $t$. Moreover, as we saw in (37.5),
\[ \frac{\partial}{\partial t} P_t f(y) = P_t \mathcal{L} f(y). \]
By the fundamental theorem of calculus,
\[ P_t f(y) - f(y) = \int_0^t P_r \mathcal{L} f(y)\, dr, \]
which can be rewritten as
\[ E^y f(X_t) - f(y) = E^y \int_0^t \mathcal{L} f(X_r)\, dr; \tag{37.9} \]
we used the Fubini theorem here as well. This holds for each $y \in \mathcal{S}$ and each $t > 0$.

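Set $M_t = f(X_t) - f(X_0) - \int_0^t \mathcal{L} f(X_r)\, dr$. What (37.9) says is that $E^y M_t = 0$ for all $y$ and all $t$. By the Markov property,
\begin{align*}
E^x[M_t - M_s \mid \mathcal{F}_s] &= E^x\Big[ f(X_t) - f(X_s) - \int_s^t \mathcal{L} f(X_r)\, dr \,\Big|\, \mathcal{F}_s \Big] \\
&= E^x\Big[ \Big( f(X_{t-s}) - f(X_0) - \int_0^{t-s} \mathcal{L} f(X_r)\, dr \Big) \circ \theta_s \,\Big|\, \mathcal{F}_s \Big] \\
&= E^{X_s} M_{t-s} = 0.
\end{align*}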

Therefore Mt is a martingale with respect to Px for each x. If T is a bounded stopping time,

then by optional stopping, E xMT = 0. If T is instead only integrable with respect to Px,

we have E xMT ∧n = 0 for each n. We then let n → ∞ and use the fact that f and L f are

bounded to conclude E xMT = 0, which is what we want.
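As an illustration of how Dynkin's formula is typically applied, here is a sketch for one-dimensional Brownian motion, whose generator acts as $\frac12 f''$ on suitable $f$. The interval $(-a, a)$, the exit time $T$, and the choice of $f$ are assumptions made for this example and do not come from the text; the sketch also takes for granted that $E^x T < \infty$ and that $f$ can be chosen in $\mathcal{D}(\mathcal{L})$, bounded, with $f(y) = y^2$ on $[-a, a]$.

```latex
% Assumed setup: X is Brownian motion started at x with |x| < a,
% T = \inf\{t : |X_t| = a\}, and f(y) = y^2 for |y| \le a,
% so that \mathcal{L} f = \tfrac12 f'' = 1 on [-a, a] and f(X_T) = a^2.
\begin{align*}
E^x f(X_T) - f(x) &= E^x \int_0^T \mathcal{L} f(X_r)\, dr, \\
a^2 - x^2 &= E^x \int_0^T 1 \, dr = E^x T.
\end{align*}
```

Under these assumptions the expected exit time from $(-a, a)$ is $a^2 - x^2$.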

We say a few words about the Kolmogorov backward and forward equations. Suppose the

semigroup $P_t$ can be written
\[ P_t f(x) = \int f(y) p(t, x, y)\, dy, \]
for functions $p(t, x, y)$, which are called transition densities. Provided there are no difficulties interchanging integration and differentiation, the equation
\[ \frac{\partial}{\partial t} P_t f(x) = \mathcal{L} P_t f(x) \]
can be rewritten as
\[ \int f(y) \frac{\partial}{\partial t} p(t, x, y)\, dy = \int f(y) \mathcal{L} p(t, x, y)\, dy, \]
which leads to the Kolmogorov backward equation
\[ \frac{\partial}{\partial t} p(t, x, y) = \mathcal{L} p(t, x, y), \]

where L operates on the x variable and y is held fixed.


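If $\mathcal{L}$ has an adjoint operator $\mathcal{L}^*$, which means $\int f (\mathcal{L} g) = \int (\mathcal{L}^* f) g$ for $f$ and $g$ in the domains of $\mathcal{L}^*$ and $\mathcal{L}$, respectively, the equation
\[ \frac{\partial}{\partial t} P_t f(x) = P_t \mathcal{L} f(x) \]
can be rewritten as
\[ \int f(y) \frac{\partial}{\partial t} p(t, x, y)\, dy = \int \mathcal{L} f(y)\, p(t, x, y)\, dy = \int f(y) \mathcal{L}^* p(t, x, y)\, dy, \]
which leads to the Kolmogorov forward equation
\[ \frac{\partial}{\partial t} p(t, x, y) = \mathcal{L}^* p(t, x, y), \]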

where L∗ operates on the y variable and x is held fixed.
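For standard Brownian motion the transition density is the Gaussian kernel and $\mathcal{L} = \mathcal{L}^* = \frac12\, d^2/dx^2$, so both Kolmogorov equations reduce to the heat equation. The sketch below checks this with central finite differences; the step sizes and evaluation point are arbitrary, and the residuals are only expected to be small, not zero.

```python
import math

def p(t, x, y):
    """Transition density of standard one-dimensional Brownian motion."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def backward_residual(t, x, y, dt=1e-5, dx=1e-3):
    """d/dt p - (1/2) d^2/dx^2 p, with derivatives taken in the x variable."""
    dp_dt = (p(t + dt, x, y) - p(t - dt, x, y)) / (2.0 * dt)
    d2p_dx2 = (p(t, x + dx, y) - 2.0 * p(t, x, y) + p(t, x - dx, y)) / dx ** 2
    return dp_dt - 0.5 * d2p_dx2

def forward_residual(t, x, y, dt=1e-5, dy=1e-3):
    """d/dt p - (1/2) d^2/dy^2 p, with derivatives taken in the y variable."""
    dp_dt = (p(t + dt, x, y) - p(t - dt, x, y)) / (2.0 * dt)
    d2p_dy2 = (p(t, x, y + dy) - 2.0 * p(t, x, y) + p(t, x, y - dy)) / dy ** 2
    return dp_dt - 0.5 * d2p_dy2

print(backward_residual(1.0, 0.3, -0.4), forward_residual(1.0, 0.3, -0.4))
```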

37.2 The Hille–Yosida theorem

We now show how to construct a semigroup given the infinitesimal generator. We start with

a few preliminary observations. If $A$ is a bounded operator, we can define
\[ e^A = I + A + A^2/2! + \cdots = \sum_{i=0}^\infty A^i/i!. \]
To see that the series converges, note that
\[ \Big\| \sum_{i=m}^n A^i/i! \Big\| \le \sum_{i=m}^n \|A^i\|/i! \le \sum_{i=m}^\infty \|A\|^i/i!, \]
which will be small if $m$ is large since $\|A\|$ is a finite number. Similarly,
\[ \|e^A\| \le \sum_{i=0}^\infty \|A\|^i/i! = e^{\|A\|}. \]
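In finite dimensions the series can be summed directly. The following sketch uses plain Python lists for a $2 \times 2$ matrix and the operator norm induced by the sup norm (maximum absolute row sum); the matrix $A$ and the number of terms are illustrative choices, not taken from the text.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def op_norm(a):
    """Operator norm induced by the sup norm on vectors: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)

def exp_series(a, terms=40):
    """Partial sum I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # current term A^i / i!, starting at i = 0
    for i in range(1, terms):
        term = [[v / i for v in row] for row in mat_mul(term, a)]
        result = [[result[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return result

A = [[0.0, 1.0], [-1.0, 0.0]]   # generator of rotations; e^A is rotation by one radian
E = exp_series(A)
print(E, op_norm(E), math.exp(op_norm(A)))
```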

Proposition 37.9 Suppose {Rλ} is a family of bounded operators defined on B such that

(1) the resolvent equation holds,

(2) ‖Rλ‖ ≤ 1/λ for each λ > 0, and

(3) ‖λRλ f − f ‖ → 0 as λ → ∞ for each f ∈ B.

Then there exists a strongly continuous semigroup Pt whose resolvent is Rλ.

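Proof Let $D_\lambda = \lambda(\lambda R_\lambda - I)$ and $Q_t^\lambda = e^{t D_\lambda}$. Note that the resolvent equation implies that $D_\lambda$ and $D_\mu$ commute and therefore all the operators $D_\lambda$, $Q_t^\lambda$, $D_\mu$, and $Q_t^\mu$ commute. Since $\|\lambda R_\lambda\| \le 1$, then
\[ \|Q_t^\lambda\| = e^{-\lambda t} \|e^{t \lambda^2 R_\lambda}\| \le e^{-\lambda t} e^{\|t \lambda^2 R_\lambda\|} \le e^{-\lambda t} e^{\lambda t} = 1. \]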

We first show that the set of f such that Dλ f converges as λ → ∞ is a dense subset of B.

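If $f = R_a g$ for some $a > 0$ and some $g \in \mathcal{B}$, then by the resolvent equation
\begin{align*}
D_\lambda f &= \lambda(\lambda R_\lambda - I)(R_a g) = \lambda^2 R_\lambda R_a g - \lambda R_a g \\
&= \frac{\lambda^2}{\lambda - a} (R_a g - R_\lambda g) - \lambda R_a g.
\end{align*}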


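We have
\[ \frac{\lambda^2}{\lambda - a} R_\lambda g = \frac{\lambda}{\lambda - a} \lambda R_\lambda g \to g \]
as $\lambda \to \infty$ by hypothesis (3) and
\[ \frac{\lambda^2}{\lambda - a} R_a g - \lambda R_a g = \frac{\lambda}{\lambda - a}\, a R_a g \to a R_a g \]
as $\lambda \to \infty$. Therefore
\[ D_\lambda R_a g \to a R_a g - g. \tag{37.10} \]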

Thus Dλ converges on E = ∪a>0{Ra f : f ∈ B}. But for any f ∈ B, aRa f → f as a → ∞

and aRa f = Ra(a f ) ∈ E, which proves that E is a dense subset of B.

Next we show that if Dλ f converges, then Qλt f converges. Suppose Dλ f converges and

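$\varepsilon > 0$. Choose $M$ such that if $\lambda, \mu \ge M$, then $\|D_\lambda f - D_\mu f\| < \varepsilon$. Since $\partial Q_t^\lambda f/\partial t = D_\lambda Q_t^\lambda f$ and $Q_0^\lambda$, $Q_0^\mu$ are both the identity operator, we have
\begin{align*}
Q_t^\lambda f - Q_t^\mu f &= \int_0^t \frac{\partial}{\partial s} (Q_s^\lambda Q_{t-s}^\mu f)\, ds \\
&= \int_0^t [Q_s^\lambda D_\lambda Q_{t-s}^\mu f - Q_s^\lambda D_\mu Q_{t-s}^\mu f]\, ds \\
&= \int_0^t [Q_s^\lambda Q_{t-s}^\mu (D_\lambda f - D_\mu f)]\, ds,
\end{align*}
so
\[ \|Q_t^\lambda f - Q_t^\mu f\| \le t \|D_\lambda f - D_\mu f\| < \varepsilon t, \]
using that $Q_s^\lambda$ and $Q_{t-s}^\mu$ are contractions.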
Since ε is arbitrary, this proves that Qλt f is a Cauchy sequence in B and hence converges.
Call the limit Pt f . We can easily check that Qλt is a semigroup for each λ > 0 and we saw

that Qλt is a contraction for each t and λ. It follows that Pt is a semigroup and that the norm of

each Pt is bounded by 1. Each Qλt is strongly continuous, and by the uniform convergence, it

follows that Pt f → f as t → 0 for f ∈ E. Since each Pt is a contraction and E is dense in B,

we can extend each Pt so as to have domain B and so that the Pt will be a strongly continuous

semigroup on B.

Let Sλ be the resolvent for Pt . It remains to prove that Sλ = Rλ. Fix a and let f = Rag. We

saw in (37.10) that DλRag → aRag − g. Now Qλt is a semigroup for each λ and by Exercise

37.4, the infinitesimal generator of Qλt is Dλ. By the fundamental theorem of calculus,

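\[ Q_t^\lambda (R_a g) - R_a g = \int_0^t \frac{\partial}{\partial s} (Q_s^\lambda R_a g)\, ds = \int_0^t Q_s^\lambda (D_\lambda R_a g)\, ds. \]
Letting $\lambda \to \infty$,
\[ P_t (R_a g) - R_a g = \int_0^t P_s (a R_a g - g)\, ds. \]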


Let $b < a$. Multiply the above equation by $e^{-bt}$ and integrate over $t$ from 0 to $\infty$. Then
\begin{align*}
S_b(R_a g) - \frac{1}{b} R_a g &= \int_0^\infty e^{-bt} \int_0^t P_s(a R_a g - g)\, ds\, dt \\
&= \int_0^\infty \int_s^\infty e^{-bt} P_s(a R_a g - g)\, dt\, ds \\
&= \int_0^\infty \frac{1}{b} e^{-bs} P_s(a R_a g - g)\, ds \\
&= \frac{1}{b} S_b(a R_a g - g).
\end{align*}
Therefore
\[ S_b g = R_a g + (a - b) S_b R_a g. \]
Applying this with $g$ replaced by $R_a g$, iterating, and using Corollary 37.3, we obtain
\[ S_b g = R_a g + (a - b) R_a^2 g + (a - b)^2 R_a^3 g + \cdots = R_b g. \]
By Remark 37.4, this proves $S_b = R_b$ for all $b$.
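The operators $D_\lambda = \lambda(\lambda R_\lambda - I)$ used in the proof can be computed explicitly when the generator is a matrix. The sketch below is a finite-dimensional illustration only: it assumes a bounded generator $A$ (a two-state Markov chain rate matrix chosen for the example), sets $R_\lambda = (\lambda - A)^{-1}$, and checks that $D_\lambda$ approaches $A$ as $\lambda$ grows.

```python
def mat_inv_2x2(m):
    """Inverse of a 2x2 matrix stored as a list of rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def resolvent(A, lam):
    """R_lam = (lam I - A)^{-1} for a 2x2 matrix A."""
    return mat_inv_2x2([[lam - A[0][0], -A[0][1]], [-A[1][0], lam - A[1][1]]])

def yosida(A, lam):
    """D_lam = lam * (lam * R_lam - I)."""
    R = resolvent(A, lam)
    return [[lam * (lam * R[i][j] - (1.0 if i == j else 0.0)) for j in range(2)]
            for i in range(2)]

def max_abs_diff(X, Y):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))

A = [[-1.0, 1.0], [2.0, -2.0]]   # illustrative rate matrix of a two-state chain
for lam in (1e1, 1e3, 1e5):
    print(lam, max_abs_diff(yosida(A, lam), A))
```

In this example each row of $\lambda R_\lambda$ sums to one, which is the matrix counterpart of hypothesis (2), and the printed errors decay roughly like $1/\lambda$.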
We now show that under appropriate hypotheses on L, there exists a semigroup whose
infinitesimal generator is L. This is known as the Hille–Yosida theorem. We say that an
operator L is dissipative if
‖(λ − L) f ‖ ≥ λ‖ f ‖, f ∈ D(L). (37.11)
Theorem 37.10 Suppose L is an operator such that
(1) the domain of L is a dense subset of B,
(2) the range of λ − L is B for each λ, and
(3) L is dissipative.
Then there exists a semigroup on B which has L as its infinitesimal generator.
Proof If (λ − L) f = (λ − L)g, then
λ‖ f − g‖ ≤ ‖(λ − L)( f − g)‖ = 0,
or f = g. Thus λ −L is a one-to-one map, hence is invertible because the range of λ −L is
B. We let Rλ be the inverse, and thus the domain of Rλ is all of B.
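We first show that the resolvent equation holds. We observe
\[ (\mu - \mathcal{L}) \frac{1}{\lambda - \mu} R_\mu f = \frac{1}{\lambda - \mu} f \]
and
\begin{align*}
(\mu - \mathcal{L}) \frac{1}{\lambda - \mu} R_\lambda f &= (\mu - \lambda) \frac{1}{\lambda - \mu} R_\lambda f + (\lambda - \mathcal{L}) \frac{1}{\lambda - \mu} R_\lambda f \\
&= -R_\lambda f + \frac{1}{\lambda - \mu} f.
\end{align*}
Combining,
\[ (\mu - \mathcal{L}) R_\mu R_\lambda f = R_\lambda f = (\mu - \mathcal{L}) \frac{1}{\lambda - \mu} (R_\mu - R_\lambda) f. \]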
Applying Rμ to both sides yields the resolvent equation.
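The hypothesis that $\|(\lambda - \mathcal{L}) f\| \ge \lambda \|f\|$ immediately implies $\|R_\lambda f\| \le \|f\|/\lambda$.
We next show $\lambda R_\lambda f \to f$ as $\lambda \to \infty$. If $f \in \mathcal{D}(\mathcal{L})$, then
\[ R_\lambda \mathcal{L} f = \mathcal{L} R_\lambda f = \lambda R_\lambda f - f, \]
and so
\[ \|\lambda R_\lambda f - f\| = \|R_\lambda \mathcal{L} f\| \le \frac{1}{\lambda} \|\mathcal{L} f\| \to 0 \]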
as λ → ∞. Since ‖λRλ‖ ≤ 1 and the domain of L is dense in B, we conclude λRλ f → f
for all f ∈ B.
We use Proposition 37.9 to construct Pt . By Proposition 37.9, Rλ is the resolvent for
Pt . If M is the infinitesimal generator for Pt , then by Proposition 37.5, the domain of M
is {Rλ f : f ∈ B}. Since we know L(Rλ f ) = λRλ f − f ∈ B, then the domain of L
contains {Rλ f : f ∈ B}. Since M is the infinitesimal generator of Pt , by Proposition 37.5,
M(Rλ f ) = λRλ f − f . Therefore L is an extension of M.
If f ∈ D(L), then g = (λ − L) f ∈ B, and thus
(λ − M)−1g ∈ D(M) ⊂ D(L).
Hence
(λ − L) f = g = (λ − M)(λ − M)−1g = (λ − L)(λ − M)−1g.
Since λ − L is one-to-one, then f = (λ − M)−1g, which implies f ∈ D(M). Therefore
M = L and so L is the generator of Pt .
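When applying the Hille–Yosida theorem, it is quite often the case that it is easier to show that the range of $\lambda - \mathcal{L}$ is only dense in $\mathcal{B}$, rather than being all of $\mathcal{B}$. When that occurs, one needs to look at a closed extension $\overline{\mathcal{L}}$ of $\mathcal{L}$. An operator $\mathcal{L}$ is closed if whenever $f_n \to f$ and $\mathcal{L} f_n \to g$, then $f \in \mathcal{D}(\mathcal{L})$ and $\mathcal{L} f = g$. To construct the closed extension of $\mathcal{L}$, where we assume that $\mathcal{L}$ is dissipative (defined by (37.11)), let $R_\lambda g = f$ if $(\lambda - \mathcal{L}) f = g$. $\mathcal{L}$ being dissipative is equivalent to the norm of $R_\lambda$ being bounded by $1/\lambda$ on the range of $\lambda - \mathcal{L}$, and so we can extend the domain of $R_\lambda$ uniquely to all of $\mathcal{B}$. Now define $\mathcal{D}(\overline{\mathcal{L}})$ to be the range of $R_\lambda$ and set
\[ \overline{\mathcal{L}} R_\lambda g = \lambda R_\lambda g - g. \tag{37.12} \]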
We will soon give two examples where infinitesimal generators can be used to construct
very useful processes. The first is where the infinitesimal generator is an elliptic operator of
second order in non-divergence form. The second case studies the infinitesimal generators
of Lévy processes.
We should mention that there is another important example where infinitesimal generators
are useful in constructing a process, that of infinite particle systems. The name “infinite
particle systems” refers to a class of models with discrete space and continuous time that are
useful in mathematical biology and in statistical mechanics. One of the simplest examples
is the voter model. Suppose at every point in Z2, the integer lattice in the plane, there is a
voter, who is leaning either toward the Democrat candidate or the Republican candidate. At
each point, the voter waits a length of time that is exponential with parameter one, chooses
one of his four nearest neighbors at random, and then changes his view to agree with that
neighbor. Other infinite particle systems include the contact process (modeling the spread of
infection), Ising model (modeling ferromagnetism), and the exclusion model (used in solid
state physics). See Liggett (2010) for how to construct these processes using infinitesimal
generators, and for much more.
37.3 Nondivergence form elliptic operators
Let us consider the operator $\mathcal{L}$ defined on $C^2$ functions on $\mathbb{R}^d$ by
\[ \mathcal{L} f(x) = \sum_{i,j=1}^d a_{ij}(x) \frac{\partial^2 f}{\partial x_i \partial x_j}(x) + \sum_{i=1}^d b_i(x) \frac{\partial f}{\partial x_i}(x). \]
We suppose $a_{ij}(x) = a_{ji}(x)$ for all $x$. We assume the $a_{ij}$ and $b_i$ are bounded and Hölder continuous of order $\alpha \in (0, 1)$: there exists $c$ such that
\[ |a_{ij}(x) - a_{ij}(y)| \le c|x - y|^\alpha, \qquad |b_i(x) - b_i(y)| \le c|x - y|^\alpha, \]
for $i, j = 1, \ldots, d$. We also assume a uniform ellipticity condition on the $a_{ij}$: there exists $\Lambda > 0$ such that
\[ \sum_{i,j=1}^d a_{ij}(x) y_i y_j \ge \Lambda \sum_{i=1}^d y_i^2, \qquad (y_1, \ldots, y_d) \in \mathbb{R}^d. \]

Uniform ellipticity says that the matrix whose (i, j)th element is ai j(x) is positive definite,

uniformly in x.

If the ai j and bi are Lipschitz continuous, we can construct the Markov process with

infinitesimal generator L using stochastic differential equations (see Chapter 39), which

is a more probabilistic way of doing it. Even when the ai j are continuous and the bi only

measurable, it is possible to construct the Markov process via SDEs, although this is much

harder. Here we illustrate how the Hille–Yosida theorem can be used in constructing these

processes.

Let B be the space of continuous functions that vanish at infinity. We will want the domain

of L to include the class C of functions f such that f and its first and second partial derivatives

are continuous and vanish at infinity. Then C is dense in B and L maps C into B.

We show that L is dissipative. Let f ∈ C and let x0 be a point where | f (x0)| = ‖ f ‖. There

is nothing to prove if f is identically zero. If f (x0) < 0, we can look at − f , so let us suppose
f (x0) > 0. Such a point x0 exists because f is continuous and vanishes at infinity. It suffices

to show that L f (x0) ≤ 0, since then

λ‖ f ‖ = λ f (x0) ≤ (λ − L) f (x0) ≤ ‖(λ − L) f ‖.

Let $A$ be the matrix whose $(i, j)$ element is $a_{ij}(x_0)$ and let $H$ be the Hessian at $x_0$ so that
\[ H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0). \]


Let $y \in \mathbb{R}^d$ and consider the function $t \mapsto f(x_0 + ty)$, $t \in \mathbb{R}$. Since $t = 0$ is the location of a local maximum for this function, its second derivative at $t = 0$, which is $\sum_{i,j=1}^d y_i y_j H_{ij}$, will be less than or equal to 0. The first derivative of this function will be zero at $t = 0$.

Since $A$ is positive definite, there exists an orthogonal matrix $P$ and a diagonal matrix $D$ with positive entries such that $A = P^T D P$. Recall the trace of a square matrix is defined by $\mathrm{Trace}(C) = \sum_{i=1}^d C_{ii}$ and $\mathrm{Trace}(AB) = \mathrm{Trace}(BA)$. Note
\[ \sum_{i,j=1}^d a_{ij}(x_0) \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0) = \mathrm{Trace}(AH). \]
We have
\[ \mathrm{Trace}(AH) = \mathrm{Trace}(P^T D P H) = \mathrm{Trace}(P H P^T D) = \sum_{i=1}^d (P H P^T)_{ii} D_{ii}, \]
since $D$ is a diagonal matrix. Thus to show that $\mathrm{Trace}(AH) \le 0$, it suffices to show that $(P H P^T)_{ii} \le 0$ for each $i$. If we let $e_i$ be the unit vector in the $x_i$ direction and $y = P^T e_i$, we have
\[ (P H P^T)_{ii} = e_i^T P H P^T e_i = y^T H y = \sum_{j,k=1}^d y_j y_k H_{jk} \le 0. \]
Since $x_0$ is the location of a local maximum, then $\frac{\partial f}{\partial x_i}(x_0) = 0$, and we conclude $\mathcal{L} f(x_0) \le 0$.
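The key inequality, $\mathrm{Trace}(AH) \le 0$ when $A$ is positive definite and $H$ is negative semidefinite, is easy to test on a small case. The two matrices below are illustrative choices (not from the text) whose definiteness can be read off from trace and determinant.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

def det_2x2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

A = [[2.0, 1.0], [1.0, 2.0]]     # symmetric with trace > 0 and det > 0: positive definite
H = [[-1.0, 0.5], [0.5, -3.0]]   # symmetric with trace < 0 and det > 0: negative definite
value = trace(mat_mul(A, H))
print(value)
```

Note that the product $AH$ itself need not be symmetric or negative definite; only its trace is controlled, which is all the dissipativity argument uses.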

Since L1 = 0, then Pt1 = 1 for all t. This and Exercise 37.1 imply that the Pt are

non-negative operators.

To apply the Hille–Yosida theorem, it remains to show that the range of λ −L is dense in

B. For this we refer the reader to the PDE literature, e.g., Bass (1997), Chapter 3 or Gilbarg

and Trudinger (1983), Chapters 5,6.

37.4 Generators of Lévy processes

Let $n$ be a measure on $\mathbb{R} \setminus \{0\}$ satisfying
\[ \int (h^2 \wedge 1)\, n(dh) < \infty. \]
Consider the operator $\mathcal{L}$ defined on $C^2$ functions by
\[ \mathcal{L} f(x) = \int [f(x + h) - f(x) - 1_{(|h| \le 1)} f'(x) h]\, n(dh). \]
We will show that L is the infinitesimal generator of a Markov semigroup. We construct
these processes, the Lévy processes, probabilistically in Chapter 42. We confine ourselves
to the one-dimensional case, although the argument for higher dimensions is completely
analogous.
We let B be the continuous functions vanishing at infinity. We let C be the class of
Schwartz functions, which is the class of C∞ functions, all of whose kth partial derivatives
go to zero faster than |x|−m as |x| → ∞ for every k = 0, 1, . . . and every m = 1, 2, . . .; see
Section B.2.
First we show that L maps C into B, so that the domain of L contains C, and hence is
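dense in $\mathcal{B}$. Given $M > 1$ and $f \in \mathcal{C}$, by Taylor's theorem
\begin{align*}
|\mathcal{L} f(x)| &\le \int |f(x + h) - f(x) - 1_{(|h| \le 1)} f'(x) h|\, n(dh) \tag{37.13} \\
&\le \Big( \sup_{|y - x| \le 1} |f''(y)| \Big) \int_{0 < |h| \le 1} h^2\, n(dh) + 2 \Big( \sup_{|y - x| \le M} |f(y)| \Big) \int_{1 < |h| \le M} n(dh) \\
&\qquad + 2 \int_{|h| > M} \|f\|_\infty\, n(dh).
\end{align*}
This shows $|\mathcal{L} f(x)|$ is finite. Given $\varepsilon > 0$ and $f \in \mathcal{C}$, choose $M$ large so that
\[ \int_{|h| > M} n(dh) < \varepsilon / \|f\|_\infty. \]
Since the first two terms on the right-hand side of (37.13) tend to zero as $|x| \to \infty$ and the third is at most $2\varepsilon$, and $\varepsilon$ is arbitrary, then $\mathcal{L} : \mathcal{C} \to \mathcal{B}$.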
To show L is dissipative, let f ∈ C and choose x0 such that | f (x0)| = ‖ f ‖. There is nothing
to prove if ‖ f ‖ = 0, so assume ‖ f ‖ > 0. Because f is in the Schwartz class, it takes on

its maximum and its minimum. By looking at − f if necessary, we may suppose f (x0) > 0.

Since x0 is the location of a global maximum, f ′(x0) = 0 and f (x0 + h) − f (x0) ≤ 0 for each

h, hence L f (x0) ≤ 0. Then

λ‖ f ‖ = λ f (x0) ≤ (λ − L) f (x0) ≤ ‖(λ − L) f ‖.

Taking limits, this holds for every f in the domain of L.

Finally we need to show that the range of λ−L is dense in B. This is the most complicated

part and we break the argument into steps.

Step 1. We start by computing the Fourier transform of $\mathcal{L} f$ if $f \in \mathcal{C}$. Let $n_\delta(dh) = 1_{(|h| \ge \delta)} n(dh)$ and let
\[ \mathcal{L}_\delta f(x) = \int [f(x + h) - f(x) - 1_{(|h| \le 1)} f'(x) h]\, n_\delta(dh). \]
Then $n_\delta$ is a finite measure. Using the Fubini theorem and the fact that the Fourier transform of the function $x \mapsto f(x + h)$ is $e^{-iuh} \hat{f}(u)$ and the Fourier transform of $f'(x)$ is $-iu \hat{f}(u)$,
\begin{align*}
\widehat{\mathcal{L}_\delta f}(u) &= \int \int [e^{iux} f(x + h) - e^{iux} f(x) - 1_{(|h| \le 1)} e^{iux} f'(x) h]\, dx\, n_\delta(dh) \\
&= \hat{f}(u) \int [e^{-iuh} - 1 + 1_{(|h| \le 1)} iuh]\, n_\delta(dh) \\
&= \hat{f}(u) \int [e^{-iuh} - 1 + 1_{(|h| \le 1)} iuh]\, 1_{(|h| \ge \delta)}\, n(dh). \tag{37.14}
\end{align*}
The expression in brackets on the last line is bounded by $c(h^2 \wedge 1)$, where $c$ may depend on $u$, and by dominated convergence the last line converges to $\hat{f}(u) \psi(u)$ as $\delta \to 0$, where
\[ \psi(u) = \int [e^{-iuh} - 1 + 1_{(|h| \le 1)} iuh]\, n(dh). \tag{37.15} \]


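Since
\begin{align*}
|\widehat{\mathcal{L} f}(u) - \widehat{\mathcal{L}_\delta f}(u)| &= \Big| \int e^{iux} \int_{|h| < \delta} [f(x + h) - f(x) - 1_{(|h| \le 1)} f'(x) h]\, n(dh)\, dx \Big| \\
&\le \int \Big( \sup_{|y - x| < \delta} |f''(y)| \Big) \int_{|h| < \delta} h^2\, n(dh)\, dx,
\end{align*}
which tends to zero as $\delta \to 0$ because $f \in \mathcal{C}$, we conclude
\[ \widehat{\mathcal{L} f}(u) = \hat{f}(u) \psi(u). \tag{37.16} \]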
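When $n$ is a finite sum of point masses, the symbol $\psi$ in (37.15) is a finite sum and can be evaluated directly. The sketch below assumes such a purely atomic measure, given as hypothetical (jump size, mass) pairs, and illustrates that $\psi(0) = 0$ and that the real part of $\psi$ is never positive.

```python
import cmath

def levy_symbol(u, atoms):
    """psi(u) = sum_k w_k (exp(-i u h_k) - 1 + i u h_k 1(|h_k| <= 1)) for n = sum_k w_k delta_{h_k}."""
    total = 0j
    for h, w in atoms:
        compensator = 1j * u * h if abs(h) <= 1 else 0j   # only small jumps are compensated
        total += w * (cmath.exp(-1j * u * h) - 1 + compensator)
    return total

atoms = [(1.0, 1.0), (-0.5, 2.0), (3.0, 0.25)]   # hypothetical (jump size, mass) pairs
for u in (0.0, 0.5, 1.0, 2.0):
    print(u, levy_symbol(u, atoms))
```

The real part equals $\sum_k w_k[\cos(u h_k) - 1]$, which is why $\lambda - \psi(u)$ stays away from zero for $\lambda > 0$.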
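Step 2. Now let $g \in \mathcal{C}$, let $\varepsilon > 0$, choose $K > 1$ such that $\int_{|h| \ge K} n(dh) < \varepsilon$, let $m_K(dh) = 1_{(|h| \le K)} n(dh)$, and define $\mathcal{L}_K$ and $\psi_K$ in terms of $m_K$. We show there exists $f \in \mathcal{C}$ such that $(\lambda - \mathcal{L}_K) f = g$.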
We have
\[ \psi_K(u) = \int_{|h| \le K} [e^{-iuh} - 1 + iuh 1_{(|h| \le 1)}]\, n(dh), \]
so using dominated convergence,
\begin{align*}
\psi_K'(u) &= \int_{|h| \le K} [-ih e^{-iuh} + ih 1_{(|h| \le 1)}]\, n(dh), \\
\psi_K''(u) &= \int_{|h| \le K} [-h^2 e^{-iuh}]\, n(dh),
\end{align*}
with similar formulas for the higher derivatives. Thus all the derivatives of $\psi_K$ are bounded. Moreover the real part of $\psi_K(u)$ is $\int_{|h| \le K} [\cos(uh) - 1]\, n(dh)$, which is less than or equal to 0. Since $g \in \mathcal{C}$, by Section B.2, $\hat{g} \in \mathcal{C}$. If we define $f$ by
\[ \hat{f}(u) = \frac{1}{\lambda - \psi_K(u)} \hat{g}(u), \tag{37.17} \]
we see that f̂ and all its derivatives are continuous and tend to zero faster than |u|−m for
every m. Hence f̂ ∈ C, which implies f ∈ C by Section B.2.
Notice $(\lambda - \mathcal{L}_K) f = g$ because
\[ \lambda \hat{f}(u) - \widehat{\mathcal{L}_K f}(u) = \frac{\lambda - \psi_K(u)}{\lambda - \psi_K(u)} \hat{g}(u) = \hat{g}(u). \]
Step 3. We prove that ‖L f − LK f ‖ ≤ cε‖g‖.
Since $g \in \mathcal{C}$, then $\hat{g} \in L^1$. From (37.17) we have $|\hat{f}(u)| \le |\hat{g}(u)|/\lambda$. Then
\[ \|f\|_\infty \le c \|\hat{f}\|_{L^1} \le c \|\hat{g}\|_{L^1} \]
and
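\begin{align*}
|\mathcal{L} f(x) - \mathcal{L}_K f(x)| &\le \int_{|h| \ge K} |f(x + h) - f(x)|\, n(dh) \\
&\le 2 \|f\|_\infty \int_{|h| \ge K} n(dh) \\
&\le c \varepsilon \|\hat{g}\|_{L^1}.
\end{align*}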
Step 4. We complete the proof that the range of λ − L is dense in B. Since ‖L f − LK f ‖ ≤
cε‖g‖ by Step 3 and (λ − LK ) f = g, then
‖(λ − L) f − g‖ ≤ cε‖g‖.
Because f ∈ C ⊂ D(L) and ε is arbitrary, this proves the range of λ − L is dense in C,
hence in B.
We thus have L satisfying all the hypotheses of the Hille–Yosida theorem, and hence there
exists a semigroup Pt mapping B into B. We again note that L1 = 0, hence Pt 1 = 1 for all t,
and so by Exercise 37.1, the Pt are non-negative operators.
Exercises
37.1 Let B be either the space L2 with respect to a finite measure or else the continuous functions
vanishing at infinity for some locally compact separable metric space S. In the former case, we
say f ≥ 0 if f (x) ≥ 0 for almost every x, in the latter case if f (x) ≥ 0 for all x. A semigroup is
non-negative if f ≥ 0 implies Pt f ≥ 0 for all t ≥ 0. Suppose that Pt is a semigroup, the space
B contains the constant functions, and Pt1 = 1 for all t. Show that Pt is a contraction if and only
if Pt is non-negative.
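37.2 Show that $P_t$ and $R_\lambda$ commute and that
\[ P_t R_\lambda f = \int_0^\infty e^{-\lambda s} P_{s+t} f\, ds. \]
Show that for any $a < b$ we have
\[ \Big\| \int_a^b e^{-\lambda t} P_t f\, dt \Big\| \le \int_a^b e^{-\lambda t} \|P_t f\|\, dt. \]
Hint: Approximate $R_\lambda f$ by a Riemann sum.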
37.3 Show that if Pt is a contraction semigroup and Rλ is the resolvent, then
‖Rλ‖ ≤ 1/λ. (37.18)
37.4 Show that if A is a bounded operator and Tt = etA, then Tt is a strongly continuous semigroup
of operators with infinitesimal generator A. (We cannot assert that the Tt are contractions.)
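37.5 Prove that if $\mathcal{L}$ is dissipative, the domain of $\mathcal{L}$ is dense in $\mathcal{B}$, and the range of $\lambda - \mathcal{L}$ is dense in $\mathcal{B}$, then $\overline{\mathcal{L}}$ defined in (37.12) is a closed extension of $\mathcal{L}$ that is dissipative and the range of $\lambda - \overline{\mathcal{L}}$ is equal to $\mathcal{B}$. Show there is only one such closed extension of $\mathcal{L}$.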
37.6 If the range of λ−L equals B for a single value of λ, then the range of λ−L equals B for every
value of λ.
Hint: Define Rλ as the inverse of λ − L, then use (37.3) to define Ra for other values of a.
37.7 Let (Xt , Px) be a Markov process with transition probabilities given by Pt f (x) = f (x + t).
Determine L and D(L).
37.8 Let Pt be a strongly continuous semigroup of contraction operators and let L be the infinitesimal
generator. Show that D(Ln) is dense in B for every positive integer n.
37.9 This is a continuation of Exercise 36.5. Prove that if $f \in C^2$ with compact support, $P_t$ is the semigroup given in Exercise 36.5, and $\mathcal{L}$ is the infinitesimal generator, then the Fourier transform of $\mathcal{L} f$ is $\hat{f}(u) \psi(u)$.
37.10 Suppose that Pt is a strongly continuous semigroup, but not necessarily of contractions. Thus
Pt+s = PtPs and Pt f → f in norm if f ∈ B, but we do not assume ‖Pt‖ ≤ 1. Prove that there
exist constants K, b > 0 such that ‖Pt‖ ≤ Kebt for all t ≥ 0.

Hint: Use the uniform boundedness principle from functional analysis to prove there exists

c, t0 such that ‖Pt‖ ≤ c if t ≤ t0. Then use the semigroup property.

38

Dirichlet forms

When constructing semigroups, it is sometimes easier to start with a bilinear form, called the

Dirichlet form, than to work with the infinitesimal generator, and to construct the semigroup

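from the form. For example, let $\Delta$ be the Laplacian. If $f, g \in C^2$ with compact support, then integration by parts shows
\begin{align*}
\int_{\mathbb{R}^d} f(x) (\tfrac12 \Delta g)(x)\, dx &= \tfrac12 \int_{\mathbb{R}^d} f(x) \sum_{i=1}^d \frac{\partial^2 g}{\partial x_i^2}(x)\, dx \\
&= -\tfrac12 \int_{\mathbb{R}^d} \sum_{i=1}^d \frac{\partial f}{\partial x_i}(x) \frac{\partial g}{\partial x_i}(x)\, dx.
\end{align*}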

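If we write
\[ \mathcal{E}(f, g) = \tfrac12 \int \sum_{i=1}^d \frac{\partial f}{\partial x_i}(x) \frac{\partial g}{\partial x_i}(x)\, dx, \]
we thus have
\[ \int_{\mathbb{R}^d} f (\tfrac12 \Delta g) = -\mathcal{E}(f, g). \tag{38.1} \]
Clearly $\mathcal{E}(f, g)$ is symmetric in $f$ and $g$, so
\[ \int_{\mathbb{R}^d} f (\tfrac12 \Delta g) = -\mathcal{E}(f, g) = -\mathcal{E}(g, f) = \int_{\mathbb{R}^d} g (\tfrac12 \Delta f)\, dx. \]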

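If $R_\lambda$ is the resolvent for Brownian motion, (38.1) and the fact that $\tfrac12 \Delta R_\lambda f = \lambda R_\lambda f - f$ tells us that
\begin{align*}
\mathcal{E}(R_\lambda f, g) + \lambda \int (R_\lambda f) g &= -\int (\tfrac12 \Delta R_\lambda f) g + \lambda \int (R_\lambda f) g \tag{38.2} \\
&= -\int (\lambda R_\lambda f - f) g + \lambda \int (R_\lambda f) g \\
&= \int f g.
\end{align*}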

The bilinear form E ( f , g) makes sense even if f , g are only in C1 with compact support,

which is one major advantage of the Dirichlet form. Since E is clearly linear in each variable,

we have

\[ \mathcal{E}(f, g) = \tfrac12 [\mathcal{E}(f + g, f + g) - \mathcal{E}(f, f) - \mathcal{E}(g, g)], \]


so to specify the Dirichlet form, it is only necessary to know E ( f , f ), a number, rather than

L f , a function. One disadvantage of Dirichlet forms is that one needs a self-adjoint operator,

and not every infinitesimal generator is self-adjoint. Another disadvantage is that when

working with Dirichlet forms, L2 is the natural space to work with, which means there are

null sets one has to worry about. In particular, the construction of Chapter 36 is not directly

applicable, because there we required our Banach space to be the set of continuous functions

vanishing at infinity. (Modifications of the methods in Chapter 36 do work, however.)
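A one-dimensional grid version of the form makes both (38.1) and the polarization identity concrete. The sketch below is a discrete analogue only: it assumes a uniform grid on $[-1, 1]$, forward differences in place of derivatives, and two illustrative functions that vanish at the endpoints.

```python
def dirichlet_form(f, g, h):
    """Discrete analogue of E(f, g) = (1/2) * integral of f' g' dx on a grid of spacing h."""
    return 0.5 * sum((f[i + 1] - f[i]) * (g[i + 1] - g[i]) for i in range(len(f) - 1)) / h

def half_laplacian(g, h):
    """Discrete (1/2) g'' at interior grid points; boundary entries are left at zero."""
    out = [0.0] * len(g)
    for i in range(1, len(g) - 1):
        out[i] = 0.5 * (g[i + 1] - 2.0 * g[i] + g[i - 1]) / h ** 2
    return out

n = 201
h = 2.0 / (n - 1)
xs = [-1.0 + i * h for i in range(n)]
f = [(1.0 - x * x) ** 2 for x in xs]
g = [(1.0 - x * x) * (1.0 + x) for x in xs]
f[0] = f[-1] = g[0] = g[-1] = 0.0          # enforce exact vanishing at the endpoints

lap_g = half_laplacian(g, h)
lhs = sum(f[i] * lap_g[i] for i in range(n)) * h           # approximates integral f (1/2) g''
polar = 0.5 * (dirichlet_form([a + b for a, b in zip(f, g)],
                              [a + b for a, b in zip(f, g)], h)
               - dirichlet_form(f, f, h) - dirichlet_form(g, g, h))
print(lhs, -dirichlet_form(f, g, h), polar)
```

On the grid, summation by parts plays the role of integration by parts, so the first two printed numbers agree up to rounding, and the third reproduces $\mathcal{E}(f, g)$ from values of the quadratic form alone.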

38.1 Framework

Let us now suppose S is a locally compact separable metric space together with a σ -finite

measure m defined on the Borel subsets of S . We want to give a definition of the Dirichlet

form in this more general context. We suppose there exists a dense subset D = D(E ) of

L2(S, m) and a non-negative bilinear symmetric form E defined on D × D, which means

E ( f , g) = E (g, f ), E ( f + g, h) = E ( f , h) + E (g, h)

E (a f , g) = aE ( f , g), E ( f , f ) ≥ 0

for f , g, h ∈ D, a ∈ R.

We will frequently write 〈 f , g〉 for ∫ f (x)g(x) m(dx). For a > 0 define

Ea( f , f ) = E ( f , f ) + a〈 f , f 〉.

We can define a norm on D using the inner product Ea: the norm of f equals (Ea( f , f ))1/2;

we call this the norm induced by Ea. Since a〈 f , f 〉 ≤ Ea( f , f ), then

\[ \mathcal{E}_a(f, f) \le \mathcal{E}_b(f, f) = \mathcal{E}_a(f, f) + (b - a) \langle f, f \rangle \le \Big( 1 + \frac{b - a}{a} \Big) \mathcal{E}_a(f, f) \]

if a < b, so the norms induced by different a’s are all equivalent. We say E is closed if D
is complete with respect to the norm induced by Ea for some a. Equivalently, E is closed if
whenever un ∈ D satisfies E1(un − um, un − um) → 0 as n, m → ∞, then there exists u ∈ D
such that E1(un − u, un − u) → 0 as n → ∞.
We say E is Markovian if whenever u ∈ D, then v = 0∨(u∧1) ∈ D and E (v, v) ≤ E (u, u).
(A slightly weaker definition of Markovian is sometimes used.) A Dirichlet form is a non-
negative bilinear symmetric form that is closed and Markovian.
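The Markovian property can be seen on a discrete analogue of the Brownian form: truncating a function to $[0, 1]$ never increases any increment, so it cannot increase the energy. The sketch below assumes the grid energy $\frac12 \sum_i (u_{i+1} - u_i)^2 / h$ and an arbitrary illustrative vector $u$.

```python
def energy(u, h=1.0):
    """Discrete analogue of E(u, u) = (1/2) * integral of |u'|^2 dx on a grid of spacing h."""
    return 0.5 * sum((u[i + 1] - u[i]) ** 2 for i in range(len(u) - 1)) / h

def unit_contraction(u):
    """v = 0 v (u ^ 1): truncate each value to the interval [0, 1]."""
    return [max(0.0, min(x, 1.0)) for x in u]

u = [-0.5, 0.2, 1.7, 0.9, 2.4, -1.0, 0.3]   # illustrative values
v = unit_contraction(u)
print(v, energy(v), energy(u))
```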
Absorbing Brownian motion on $[0, \infty)$ is a symmetric process. The corresponding Dirichlet form is
\[ \mathcal{E}(f, f) = \tfrac12 \int_0^\infty |f'(x)|^2\, dx, \]
and the appropriate domain turns out to be the completion of the set of $C^1$ functions with compact support contained in $(0, \infty)$ with respect to the norm induced by $\mathcal{E}_1$. In particular, any function with compact support contained in $(0, \infty)$ will be zero in a neighborhood of 0. In a domain $D$ in higher dimensions, the Dirichlet form for absorbing Brownian motion becomes
\[ \mathcal{E}(f, f) = \tfrac12 \int |\nabla f(x)|^2\, dx, \tag{38.3} \]
with the domain of E being the completion with respect to E1 of the C1 functions whose
support is contained in the interior of D.
Reflecting Brownian motion is also a symmetric process. For a domain $D$, the Dirichlet form is given by (38.3) and the domain $\mathcal{D}(\mathcal{E})$ of the form is given by the completion with respect to the norm induced by $\mathcal{E}_1$ of the $C^1$ functions on $\overline{D}$ with compact support, where $\overline{D}$ is the closure of $D$. One might expect there to be some restriction on the normal derivative $\partial f/\partial n$ on the boundary of $D$, but in fact there is no such restriction. To examine this further, consider the case of $D = (0, \infty)$. If one takes the class of functions $f$ which are $C^1$ with compact support and with $f'(0) = 0$ and takes the closure with respect to the norm induced by $\mathcal{E}_1$, one gets the same class as $\mathcal{D}(\mathcal{E})$; this is Exercise 38.1.
One nice consequence of the fact that we don’t need to impose a restriction on the normal
derivative in the domain of E for reflecting Brownian motion is that this allows us to define
reflecting Brownian motion in any domain, even when the boundary is not smooth enough
for the notion of a normal derivative to be defined.
38.2 Construction of the semigroup
We now want to construct the resolvent corresponding to a Dirichlet form. The motivation
given in (38.2) shows we should expect
Ea(Ra f , g) = 〈 f , g〉 (38.4)
for all a > 0 and all f , g such that Ra f , g ∈ D. Our Banach space B will be L2(S, m).

Theorem 38.1 If E is a Dirichlet form, there exists a family of resolvent operators {Rλ} such

that

(1) the Rλ satisfy the resolvent equation,

(2) ‖λRλ‖ ≤ 1 for all λ > 0,

(3) λRλ f → f as λ → ∞ for f ∈ B, and

(4) Ea(Ra f , g) = 〈 f , g〉 if a > 0, Ra f , g ∈ D.

Moreover, if f ∈ B satisfies 0 ≤ f (x) ≤ 1, m-a.e., then for all a > 0

0 ≤ aRa f ≤ 1, m-a.e. (38.5)

Proof Fix f ∈ B and define a linear functional on B by I(g) = 〈 f , g〉. This functional is

also a bounded linear functional on D with respect to the norm induced by Ea, that is, there

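exists $c$ such that $|I(g)| \le c\, \mathcal{E}_a(g, g)^{1/2}$. This follows because
\[ |I(g)| = \Big| \int f g \Big| \le \langle f, f \rangle^{1/2} \langle g, g \rangle^{1/2} \le \langle f, f \rangle^{1/2} \Big( \frac{1}{a} \mathcal{E}_a(g, g) \Big)^{1/2} \]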

by the Cauchy–Schwarz inequality. Since E is closed, D is a Hilbert space with respect to

the norm induced by Ea. By the Riesz representation theorem for Hilbert spaces (see, e.g.,

Folland (1999), Theorem 5.25), there exists a unique element u ∈ D such that I(g) = Ea(u, g)

for all g ∈ D. We set Ra f = u. In particular, (38.4) holds, and Ra f ∈ D.


We show the resolvent equation holds. If g ∈ D,

Ea(Ra f − Rb f , g) = Ea(Ra f , g) − E (Rb f , g) − a〈Rb f , g〉

= 〈 f , g〉 − E (Rb f , g) − b〈Rb f , g〉 + (b − a)〈Rb f , g〉

= 〈 f , g〉 − Eb(Rb f , g) + (b − a)〈Rb f , g〉

= (b − a)〈Rb f , g〉

= Ea((b − a)RaRb f , g).

Since this holds for all g ∈ D and D is dense in B, then Ra f − Rb f = (b − a)RaRb f .

Next we show that ‖aRa f ‖ ≤ ‖ f ‖, or equivalently,

〈aRa f , aRa f 〉 ≤ 〈 f , f 〉. (38.6)

If 〈Ra f , Ra f 〉 is zero, then (38.6) trivially holds, so suppose it is positive. We have

a〈Ra f , Ra f 〉 ≤ Ea(Ra f , Ra f ) = 〈 f , Ra f 〉 ≤ 〈 f , f 〉1/2〈Ra f , Ra f 〉1/2

by (38.4) and the Cauchy–Schwarz inequality. If we now divide both sides by 〈Ra f , Ra f 〉1/2

and then square both sides, we obtain (38.6).

We show that bRb f → f as b → ∞ when f ∈ B. If f ∈ D, then by the Cauchy–Schwarz

inequality and (38.6)

〈bRb f , f 〉 ≤ 〈bRb f , bRb f 〉1/2〈 f , f 〉1/2

≤ 〈 f , f 〉.

Using this,

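\begin{align*}
b \langle b R_b f - f, b R_b f - f \rangle &\le \mathcal{E}_b(b R_b f - f, b R_b f - f) \\
&= b^2 \mathcal{E}_b(R_b f, R_b f) - 2b\, \mathcal{E}_b(R_b f, f) + \mathcal{E}_b(f, f) \\
&= b^2 \langle R_b f, f \rangle - 2b \langle f, f \rangle + \mathcal{E}(f, f) + b \langle f, f \rangle \\
&\le \mathcal{E}(f, f).
\end{align*}
Now divide both sides by $b$ to get $\|b R_b f - f\|^2 \le \mathcal{E}(f, f)/b \to 0$ as $b \to \infty$. Since $\mathcal{D}$ is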

dense in B and ‖bRb‖ ≤ 1 for all b, we conclude bRb f → f for all f ∈ B.

It remains to show 0 ≤ bRb f ≤ 1, m-a.e., if 0 ≤ f ≤ 1, m-a.e. Fix f ∈ B with 0 ≤ f ≤ 1,

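m-a.e., and let $a > 0$. Define a functional $\psi$ on $\mathcal{D}$ by
\[ \psi(v) = \mathcal{E}(v, v) + a \Big\langle v - \frac{f}{a}, v - \frac{f}{a} \Big\rangle. \]
We claim
\[ \psi(R_a f) + \mathcal{E}_a(R_a f - v, R_a f - v) = \psi(v), \qquad v \in \mathcal{D}. \tag{38.7} \]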


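To see this, start with the left-hand side, which is equal to
\begin{align*}
& \mathcal{E}(R_a f, R_a f) + a \Big\langle R_a f - \frac{1}{a} f, R_a f - \frac{1}{a} f \Big\rangle + \mathcal{E}_a(R_a f - v, R_a f - v) \\
&\quad = \mathcal{E}_a(R_a f, R_a f) - 2 \langle R_a f, f \rangle + \frac{1}{a} \langle f, f \rangle + \mathcal{E}_a(R_a f, R_a f) - 2 \mathcal{E}_a(R_a f, v) + \mathcal{E}_a(v, v) \\
&\quad = \frac{1}{a} \langle f, f \rangle - 2 \langle f, v \rangle + \mathcal{E}(v, v) + a \langle v, v \rangle \\
&\quad = \psi(v).
\end{align*}
It follows from (38.7) and the fact that $\mathcal{E}_a(g, g)$ is non-negative for any $g \in \mathcal{D}$ that $R_a f$ is the function that minimizes $\psi$.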

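Set $\varphi(x) = 0 \vee (x \wedge (1/a))$ and let $w = \varphi(R_a f)$. Observe that $|\varphi(t) - s| \le |t - s|$ for $t \in \mathbb{R}$ and $s \in [0, 1/a]$, so
\[ \Big| w(x) - \frac{f(x)}{a} \Big| \le \Big| R_a f(x) - \frac{f(x)}{a} \Big|, \]
and therefore
\[ \Big\langle w - \frac{f}{a}, w - \frac{f}{a} \Big\rangle \le \Big\langle R_a f - \frac{f}{a}, R_a f - \frac{f}{a} \Big\rangle. \tag{38.8} \]
Since $aw = 0 \vee ((a R_a f) \wedge 1)$ and $\mathcal{E}$ is Markovian, we have
\[ \mathcal{E}(w, w) = \frac{1}{a^2} \mathcal{E}(aw, aw) \le \frac{1}{a^2} \mathcal{E}(a R_a f, a R_a f) = \mathcal{E}(R_a f, R_a f). \tag{38.9} \]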

Adding (38.8) and (38.9), we conclude ψ(w) ≤ ψ(Ra f ). Since Ra f is the minimizer for ψ ,

then w = Ra f , m-a.e. But 0 ≤ w ≤ 1/a, and hence aRa f takes values in [0, 1], m-a.e.

If we combine Proposition 37.9 and Theorem 38.1, we obtain a semigroup Pt whose

resolvent satisfies (38.4). We would like to know that the analog of (38.5) holds for Pt .

Corollary 38.2 If 0 ≤ f ≤ 1, m-a.e., then 0 ≤ Pt f ≤ 1, m-a.e.

Proof If 0 ≤ f ≤ 1, m-a.e., then 0 ≤ bRb f ≤ 1, m-a.e, by Theorem 38.1, and iterating,

$0 \le (b R_b)^i f \le 1$, m-a.e., for every $i$. Using the notation of the proof of Proposition 37.9,
\[ Q_t^b f(x) = e^{-bt} \sum_{i=0}^\infty (bt)^i (b R_b)^i f(x)/i!, \]
which will be non-negative, m-a.e., and bounded by $e^{-bt} \sum_{i=0}^\infty (bt)^i/i! = 1$, m-a.e. Passing to the limit as $b \to \infty$, we see that $P_t f$ takes values in $[0, 1]$, m-a.e.
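The series used in this proof follows from the definition $D_b = b(b R_b - I)$ in the proof of Proposition 37.9. A short derivation, written out under the sole assumption that $R_b$ is a bounded operator so that the exponential series converges, is:

```latex
% Since the identity I commutes with R_b, the exponential factors.
\begin{align*}
Q_t^b = e^{t D_b} = e^{-bt I + bt (b R_b)}
      = e^{-bt}\, e^{bt (b R_b)}
      = e^{-bt} \sum_{i=0}^\infty \frac{(bt)^i}{i!} (b R_b)^i .
\end{align*}
% The weights e^{-bt} (bt)^i / i! are the probabilities of a Poisson random
% variable with mean bt, so Q_t^b f is an average of the functions (b R_b)^i f.
```

Read this way, $Q_t^b f$ inherits the bounds $0 \le (b R_b)^i f \le 1$ because it is a convex combination of functions satisfying them.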

When it comes to using the semigroup Pt derived from a Dirichlet form to construct a

Markov process X , there is a difficulty that we did not have before. Since Pt is constructed

using an L2 procedure, Pt f is defined only up to almost everywhere equivalence. Without

some continuity properties of Pt f for enough f ’s, we must neglect some null sets. If the only

null sets we could work with were sets of m-measure 0, we would be in trouble. For example,

when S is the plane and m is a two-dimensional Lebesgue measure, the x axis has measure

zero, but a continuous process will (in general) hit the x axis. Fortunately there is a notion

of sets of capacity zero, which are null sets that are smaller than sets of measure zero. It is


possible to construct a process X starting from all points x in S except for those in a set N

of capacity zero and to show that starting from any point not in N , the process never hits N .

There is another difficulty when working with Dirichlet forms. In general, one must look

at S̃ , a certain compactification of S , which is a compact set containing S . Even when our

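state space is a domain in $\mathbb{R}^d$, $\tilde{\mathcal{S}}$ is not necessarily equal to $\overline{\mathcal{S}}$, the Euclidean closure of $\mathcal{S}$, and

one must work with $\tilde{\mathcal{S}}$ instead of $\overline{\mathcal{S}}$. It can be shown that this problem will not occur if the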

Dirichlet form is regular. Let CK be the set of continuous functions with compact support. A

Dirichlet form E is regular if D ∩ CK is dense in D with respect to the norm induced by E1

and D ∩ CK also is dense in CK with respect to the supremum norm.

38.3 Divergence form elliptic operators

We want to show how to construct the Markov process corresponding to the operator

\[ \mathcal{L} f(x) = \sum_{i,j=1}^d \frac{\partial}{\partial x_i}\Big( a_{ij}(\cdot) \frac{\partial f}{\partial x_j}(\cdot) \Big)(x). \tag{38.10} \]

If the ai j’s are smooth in x, this can be interpreted as first calculating the partial derivative of f

with respect to x j, multiplying the result by ai j(x), taking the partial derivative of the product

with respect to xi, and then summing over i and j. If, however, the ai j’s are only bounded

and measurable, one cannot even in general give any nontrivial examples of functions in

the domain of L. Here is where Dirichlet forms are the perfect tool. Operators of the form

(38.10) are known as elliptic operators in divergence form or in variational form, and the

study of their properties has a long history in PDE.

We assume ai j(x) = aji(x) for each i and j and each x. We suppose the ai j(x) are

measurable functions and are uniformly bounded in x for each i and j. We also require

uniform ellipticity: there exists $\Lambda > 0$ such that

\[ \sum_{i,j=1}^d a_{ij}(x) y_i y_j \ge \Lambda \sum_{i=1}^d y_i^2, \qquad (y_1, \ldots, y_d) \in \mathbb{R}^d. \]

Just as in the nondivergence elliptic operator case, the matrix whose (i, j)th element is ai j(x)

is positive definite, uniformly in x.

We will shortly define a Dirichlet form, but let us first specify a domain. Let C1K be the

collection of C1 functions with compact support, and define H 1 to be the completion of C1K

with respect to the norm

\[ \|f\|_{H^1} = \Big( \int \big(|f(x)|^2 + |\nabla f(x)|^2\big)\, dx \Big)^{1/2}. \tag{38.11} \]

One can show that $H^1$ with this norm is a Hilbert space; this is Exercise 38.2.

Now for f ∈ C1K define

\[ \mathcal{E}(f, f) = \int_{\mathbb{R}^d} \sum_{i,j=1}^d a_{ij}(x) \frac{\partial f}{\partial x_i}(x) \frac{\partial f}{\partial x_j}(x)\, dx. \tag{38.12} \]

We can use the fact that $C^1_K$ is dense in $H^1$ to extend the definition of $\mathcal{E}$ to all of $H^1 \times H^1$. The
connection with the operator $\mathcal{L}$ is that when the $a_{ij}$ are smooth, integration by parts yields

\[ \int (\mathcal{L} f) g\, dx = -\mathcal{E}(f, g) \]

if $g$ is $C^1$ with compact support; cf. (38.1).

Because of the boundedness and uniform ellipticity, there exist positive constants c1 and

c2 not depending on f such that

\[ c_1 \int |\nabla f(x)|^2\, dx \le \mathcal{E}(f, f) \le c_2 \int |\nabla f(x)|^2\, dx. \]

Therefore the norm induced by E1 and the norm in H 1 are equivalent. This implies E is

closed. By the definition of H 1, E is regular, and clearly E is symmetric. Thus we need only

to show that E is Markovian.

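Let $\varphi(x) = (0 \vee x) \wedge 1$. For each $\varepsilon > 0$ let $\varphi_\varepsilon$ be $C^\infty$, bounded, agreeing with $\varphi$ on $[0, 1]$,
with $\|\varphi'_\varepsilon\|_\infty \le 1$, and such that $\varphi_\varepsilon(x) \to \varphi(x)$ uniformly in $x$ as $\varepsilon \to 0$ and $\varphi'_\varepsilon(x) \to 1_{[0,1]}(x)$
pointwise as $\varepsilon \to 0$. Note $\nabla \varphi_\varepsilon(f) = \varphi'_\varepsilon(f) \nabla f$, so if $f \in C^1_K$,

\[ \mathcal{E}(\varphi_\varepsilon(f), \varphi_\varepsilon(f)) = \sum_{i,j=1}^d \int \big(\varphi'_\varepsilon(f(x))\big)^2 a_{ij}(x) \frac{\partial f}{\partial x_i}(x) \frac{\partial f}{\partial x_j}(x)\, dx. \tag{38.13} \]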

Since

\[ \sum_{i,j=1}^d a_{ij}(x) \frac{\partial f}{\partial x_i}(x) \frac{\partial f}{\partial x_j}(x) \ge \Lambda |\nabla f(x)|^2 \ge 0 \]

and |φ′ε( f )(x)| ≤ 1, we see that

E (φε( f ), φε( f )) ≤ E ( f , f ).

Taking the limit as ε → 0 in (38.13) we obtain

E (φ( f ), φ( f )) ≤ E ( f , f ) < ∞. (38.14)
In particular, φ( f ) ∈ H 1 = D(E ). We now pass to the limit to show that (38.14) holds for
all f ∈ H 1, which says that E is Markovian.
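
The Markovian inequality (38.14) has a transparent discrete analogue, since truncation to $[0, 1]$ never increases the size of an increment. The following is a minimal sketch in Python, assuming NumPy, one space dimension, and a forward-difference discretisation of $\mathcal{E}$. The coefficient `a_mid`, the test function `f_vals`, and the function names are illustrative choices, not taken from the text.

```python
import numpy as np

def dirichlet_energy(f_vals, a_mid, h):
    """Discrete analogue of E(f, f) = integral of a(x) f'(x)^2 on a uniform grid of spacing h."""
    slopes = np.diff(f_vals) / h                 # forward-difference approximation of f'
    return float(np.sum(a_mid * slopes ** 2) * h)

def unit_contraction(f_vals):
    """phi(f) = (0 v f) ^ 1, applied pointwise."""
    return np.clip(f_vals, 0.0, 1.0)

x = np.linspace(-3.0, 3.0, 601)
h = x[1] - x[0]
x_mid = 0.5 * (x[:-1] + x[1:])
a_mid = 1.0 + 0.5 * np.sign(np.sin(5.0 * x_mid))  # bounded, discontinuous, at least 0.5
f_vals = 1.5 * np.exp(-x ** 2) * np.cos(3.0 * x)  # exceeds 1 and also goes negative

energy_f = dirichlet_energy(f_vals, a_mid, h)
energy_phi_f = dirichlet_energy(unit_contraction(f_vals), a_mid, h)
print(energy_phi_f <= energy_f)
```

No smoothness of the coefficient is used, which is the point of working with the form rather than with the operator.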
We can therefore apply Theorem 38.1 to obtain a semigroup corresponding to the Dirichlet
form E . As mentioned earlier, there is potentially a problem in that the semigroup is only
defined for points not in a certain null set. However, a famous result of Nash and of DeGiorgi
shows that the semigroup $P_t$ can be written as $P_t f(x) = \int f(y) p(t, x, y)\, dy$ with $p(t, x, y)$
Hölder continuous in x and y; see Bass (1997), Chapter VII for a presentation of this result.
This allows us to take the null set to be empty and to see that our semigroup satisfies the
assumptions of Chapter 36. Therefore there exists a strong Markov process having Pt as its
semigroup.
Exercises
38.1 Let F1 = { f ∈ C1[0,∞) : f has compact support} and F2 = F1 ∩ { f ∈ C1[0,∞) :
f has compact support, f ′(0) = 0}. Show that the closures of F1 and F2 with respect to the
norm $\big( \int (|f(x)|^2 + |f'(x)|^2)\, dx \big)^{1/2}$ are the same.
38.2 If $H^1$ is the completion of $C^1_K$, the $C^1$ functions on $\mathbb{R}^d$ with compact support, relative to the
norm given by (38.11), show $H^1$ is a Hilbert space.
38.3 Show that the resolvent operator Rλ defined in Theorem 38.1 is a symmetric operator, that is, if
f , g ∈ B, then 〈Rλ f , g〉 = 〈 f , Rλg〉.
38.4 Show that if the resolvent operator Rλ is a symmetric operator, then the transition operators Pt
are also symmetric: if f , g ∈ B, then 〈Pt f , g〉 = 〈 f , Ptg〉.
38.5 To do the next few exercises, you will have to know some functional analysis, specifically, the
spectral theorem for self-adjoint operators. See Lax (2002).
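Let $\mathcal{E}$ be a Dirichlet form with domain $D(\mathcal{E})$ and let $\mathcal{L}$ be the infinitesimal generator of the
semigroup $P_t$ that corresponds to $\mathcal{E}$. Let $E(d\lambda)$ be a spectral resolution of the identity for $-\mathcal{L}$.
(The operator $\mathcal{L}$ is a negative operator, so $-\mathcal{L}$ is a positive one.) Then a consequence of the
spectral theorem is that
\[ P_t f = \int_0^\infty e^{-\lambda t}\, E(d\lambda) f \]
and
\[ R_a f = \int_0^\infty \frac{1}{a + \lambda}\, E(d\lambda) f. \]
Also
\[ \langle f, g \rangle = \int_0^\infty \langle E(d\lambda) f, g \rangle. \]
Show that if $f, g \in D$, then
\[ \mathcal{E}(f, g) = \int_0^\infty \lambda\, \langle E(d\lambda) f, g \rangle. \]
Hint: First prove it for $f = R_a h$. Write
\begin{align*}
\mathcal{E}(R_a h, g) &= \langle h, g \rangle - a \langle R_a h, g \rangle = \int_0^\infty \Big( 1 - \frac{a}{a + \lambda} \Big) \langle E(d\lambda) h, g \rangle \\
&= \int_0^\infty \frac{\lambda}{a + \lambda}\, \langle E(d\lambda) h, g \rangle = \int_0^\infty \lambda\, \langle E(d\lambda)(R_a h), g \rangle.
\end{align*}
To extend this to all $f$ in the domain of $\mathcal{E}$, use the fact that $\mathcal{E}$ is closed.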
38.6 If L is the infinitesimal generator of the semigroup associated with the Dirichlet form E , show
that $D(\sqrt{-\mathcal{L}}) = D(\mathcal{E})$.
38.7 Show that if f ∈ D(E ), then aRa f converges to f with respect to the norm induced by E1.
38.8 Show that if b > 0, then {Rb f : f ∈ L2} is a dense subset of D(E ) with respect to the norm

induced by E1.

38.9 Show that {Pt f : f ∈ L2, t > 0} is a dense subset of D(E ) with respect to the norm induced by

E1.

38.10 This exercise shows how to approximate E by forms whose domain is all of B. Let

\[ \mathcal{E}^{(t)}(f, g) = \frac{1}{t} \langle f - P_t f, g \rangle. \]

Show that if f ∈ D(E ), then E (t)( f , f ) increases to E( f , f ). Show that if f , g ∈ D(E ), then

E (t)( f , g) converges to E( f , g).

38.11 Show that if u ∈ D(E ), then |u| ∈ D(E ) and E(|u|, |u|) ≤ E(u, u).

Hint: Use Exercise 38.10.

38.12 Use Exercise 38.11 to show that if u ∈ D(E ), then E(u+, u−) ≤ 0.

38.13 Suppose {Pt} are the transition probabilities corresponding to a Dirichlet form E . Suppose there

exist functions pt (x, y) such that for each t,

\[ P_t f(x) = \int f(y)\, p_t(x, y)\, m(dy) \]

for almost every x. Prove that for almost every pair (x, y) with respect to the product measure

m × m, pt (x, y) = pt (y, x).

38.14 Let f ∈ L2(m) and define the functional

ψ(u) = E(u, u) + λ〈u, u〉 − 2〈 f , u〉

for u in the domain of E . Prove that ψ is minimized by u = Rλ f , and that this function is the

unique minimizer.

38.15 Let Pt be the semigroup associated with a Dirichlet form and define

J (dx, dy) = Pt (x, dy) m(dx).

(1) Prove that if $f, g$ are continuous with compact support, then

\[ \int\!\!\int f(x) g(y)\, J(dx, dy) = \int\!\!\int g(x) f(y)\, J(dx, dy). \]

(2) With $f$ and $g$ continuous with compact support, prove that

\[ \int\!\!\int f(x) g(y)\, J(dx, dy) = \langle f, P_t g \rangle \]

and

\[ \int\!\!\int f(x) g(x)\, J(dx, dy) = \langle f g, P_t 1 \rangle. \]

(3) Let $k(x) = 1 - P_t 1(x)$. Prove that if $\mathcal{E}^{(t)}$ is defined as in Exercise 38.10, then

\[ 2t\, \mathcal{E}^{(t)}(f, g) = \int\!\!\int (f(x) - f(y))(g(x) - g(y))\, J(dx, dy) + 2 \int f(x) g(x) k(x)\, m(dx). \]

(4) Is E (t) a Dirichlet form? A regular Dirichlet form?

38.16 This is a continuation of the previous exercise. If f is a function on the state space, we say that

g is a normal contraction of f if |g(x)| ≤ | f (x)| for all x and |g(x) − g(y)| ≤ | f (x) − f (y)| for

all x and y. As an example, note that if g(x) = −1 ∨ ( f (x) ∧ 1), then g is a normal contraction

of f . Prove that if f ∈ D(E ), where E is a Dirichlet form and g is a normal contraction of f ,

then for each t > 0,

E (t)(g, g) ≤ E (t)( f , f ) ≤ E( f , f ).

Notes

See Fukushima et al. (1994) for further information.

39

Markov processes and SDEs

One common way of constructing Markov processes is via stochastic differential equations.

Roughly speaking, if there is uniqueness for every starting point, then one can create a

strong Markov process. After proving this, we establish a connection between stochastic

differential equations and partial differential equations, and then we describe what is known

as the martingale problem.

39.1 Markov properties

Let P be a probability and suppose W is a d-dimensional Brownian motion with respect to

P. Consider the SDE

dXt = σ (Xt ) dWt + b(Xt ) dt. (39.1)

Here σ is a d × d matrix-valued function and b is a vector-valued function, both Borel

measurable and bounded. This can be written in terms of components as

\[ dX_t^i = \sum_{j=1}^d \sigma_{ij}(X_t)\, dW_t^j + b_i(X_t)\, dt, \qquad i = 1, \ldots, d, \]

where $W = (W^1, \ldots, W^d)$. Let $X_t^x$ be the solution to (39.1) when $X_0 = x$. Let $\mathbb{P}^x$ be the law
of $X_t^x$.
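
To get a feel for (39.1), one can simulate approximate paths with the Euler-Maruyama scheme. The following is a minimal sketch in Python, assuming NumPy. The scheme itself is a standard discretisation that the text does not discuss, and the function name `euler_maruyama` and its parameters are illustrative.

```python
import numpy as np

def euler_maruyama(sigma, b, x0, t_end, n_steps, rng):
    """Simulate one approximate path of dX = sigma(X) dW + b(X) dt on [0, t_end]."""
    x0 = np.asarray(x0, dtype=float)
    d = x0.size
    dt = t_end / n_steps
    path = np.empty((n_steps + 1, d))
    path[0] = x0
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=d)   # Brownian increment over one step
        x = path[k]
        path[k + 1] = x + sigma(x) @ dW + b(x) * dt
    return path

rng = np.random.default_rng(0)
sigma = lambda x: np.array([[1.0, 0.0], [0.5, 1.0]])   # bounded d x d matrix-valued function
drift = lambda x: np.array([np.cos(x[0]), -np.sin(x[1])])
sample_path = euler_maruyama(sigma, drift, [0.0, 0.0], t_end=1.0, n_steps=1000, rng=rng)
```

Each row of `sample_path` approximates $X_t^x$ at one grid time. The scheme only approximates the law $\mathbb{P}^x$, and nothing is claimed here about its rate of convergence.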

Let Ω = C[0, ∞), let F be the cylindrical subsets of Ω, and define Zt (ω) = ω(t). The

main result of this section is that if weak existence and weak uniqueness hold for (39.1) for

every starting point x, then the solutions (Zt, Px) form a strong Markov process.

We begin by considering regular conditional probabilities.

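Definition 39.1 Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $\mathcal{E}$ be a σ-field contained in $\mathcal{F}$.

A regular conditional probability for $\mathbb{E}[\,\cdot \mid \mathcal{E}]$ is a kernel $Q(\omega, d\omega')$ such that

(1) $Q(\omega, \cdot)$ is a probability measure on $(\Omega, \mathcal{F})$ for each $\omega$;

(2) for each $A \in \mathcal{F}$, $Q(\cdot, A)$ is a random variable that is measurable with respect to $\mathcal{E}$;

(3) for each $A \in \mathcal{F}$ and each $B \in \mathcal{E}$,

\[ \int_B Q(\omega, A)\, \mathbb{P}(d\omega) = \mathbb{P}(A \cap B). \]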

Regular conditional probabilities need not always exist, but if the probability space has

sufficient structure, then they do. We provide a proof in the appendix; see Theorem C.1.

Q(ω, A) can be thought of as P(A | E )(ω), regularized so as to have some joint measurability.

Recall that the definition of minimal augmented filtration for a Markov process was given in

Section 20.1.

Theorem 39.2 Suppose weak existence and weak uniqueness hold for the SDE (39.1) when-

ever X0 is a random variable that is in L2 and is measurable with respect to F0. Suppose the

matrix σ (y) is invertible for each y. Let (Ω, F , P) be defined as above. Let Px be the law

of the weak solution when X0 is identically equal to x. Let {Ft} be the minimal augmented

filtration generated by Z. Then (Px, Zt ) is a strong Markov process.

Proof We will prove that if T is a bounded stopping time and f is a bounded and Borel

measurable function on Rd , then

\[ \mathbb{E}^x[ f(Z_{T+t}) \mid \mathcal{F}_T ] = \mathbb{E}^{Z_T} f(Z_t), \quad \text{a.s.} \tag{39.2} \]

As in Section 20.3, this is sufficient to get the strong Markov property.

Fix x. Let

\[ Y_t = Z_t - \int_0^t b(Z_r)\, dr \tag{39.3} \]

and

\[ W'_t = \int_0^t \sigma^{-1}(Z_r)\, dY_r. \tag{39.4} \]

Since the $\mathbb{P}^x$ law of $Z_t$ is the same as the $\mathbb{P}$ law of $X_t^x$, then the $\mathbb{P}^x$ law of $W'$ is the same as
the $\mathbb{P}$ law of $W$, or in other words, $W'$ is a Brownian motion under $\mathbb{P}^x$. Rearranging (39.3)
and (39.4), we have the equation

\[ Z_t = Z_0 + \int_0^t \sigma(Z_r)\, dW'_r + \int_0^t b(Z_r)\, dr. \tag{39.5} \]

Let $Q$ be a regular conditional probability for $\mathbb{E}^x[\,\cdot \mid \mathcal{F}_T]$. Let $\tilde Z_t = Z_{T+t}$ and $\tilde W_t = W'_{T+t} - W'_T$.
Using (39.5) with $t$ replaced by $T + t$ and then with $t$ replaced by $T$, and taking the difference,
we obtain

\[ Z_{T+t} - Z_T = \int_T^{T+t} \sigma(Z_r)\, dW'_r + \int_T^{T+t} b(Z_r)\, dr, \]

and hence

\[ \tilde Z_t = \tilde Z_0 + \int_0^t \sigma(\tilde Z_r)\, d\tilde W_r + \int_0^t b(\tilde Z_r)\, dr. \tag{39.6} \]

We will show in a moment that W̃ is a Brownian motion with respect to Q(ω, ·) for

Px-almost all ω. Thus except for ω in a Px-null set, (39.6) implies that under Q(ω, ·), Z̃ is a

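solution to (39.1) with starting point $\tilde Z_0 = Z_T(\omega)$. If $\mathbb{E}_Q$ denotes the expectation with respect
to $Q$, the weak uniqueness tells us that

\[ \mathbb{E}_Q f(\tilde Z_t) = \mathbb{E}^{Z_T} f(Z_t), \qquad \mathbb{P}^x(d\omega)\text{-a.s.} \tag{39.7} \]

On the other hand,

\[ \mathbb{E}_Q f(\tilde Z_t) = \mathbb{E}_Q f(Z_{T+t}) = \mathbb{E}^x[ f(Z_{T+t}) \mid \mathcal{F}_T ], \qquad \mathbb{P}^x(d\omega)\text{-a.s.} \tag{39.8} \]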

Combining (39.7) and (39.8) proves (39.2).

It remains to prove that under $Q$ the process $\tilde W$ is a Brownian motion. $Q(\omega, \cdot)$ is a
probability measure on $\Omega$, so $t \to \tilde W_t$ is continuous for every $\omega'$. Let $t_1 < \cdots < t_n$ and
\[ N(u_2, \ldots, u_n, t_1, \ldots, t_n) = \Big\{ \omega : \mathbb{E}_Q \exp\Big( i \sum_{j=2}^n u_j \cdot (W'_{T+t_j} - W'_{T+t_{j-1}}) \Big) \ne \exp\Big( -\sum_{j=2}^n |u_j|^2 (t_j - t_{j-1})/2 \Big) \Big\}. \]
By the strong Markov property of the Brownian motion $W'$ and the definition of $Q$,
\begin{align*}
\mathbb{E}_Q \exp\Big( i \sum_{j=2}^n u_j \cdot (W'_{T+t_j} - W'_{T+t_{j-1}}) \Big)
&= \mathbb{E}^x\Big[ \exp\Big( i \sum_{j=2}^n u_j \cdot (W'_{T+t_j} - W'_{T+t_{j-1}}) \Big) \,\Big|\, \mathcal{F}_T \Big] \\
&= \mathbb{E}^{W'_T} \exp\Big( i \sum_{j=2}^n u_j \cdot (W'_{T+t_j} - W'_{T+t_{j-1}}) \Big) \\
&= \exp\Big( -\sum_{j=2}^n |u_j|^2 (t_j - t_{j-1})/2 \Big),
\end{align*}
where the second equality holds almost surely, that is, except for a $\mathbb{P}^x$-null set of $\omega$'s. This
shows that $N(u_2, \ldots, u_n, t_1, \ldots, t_n)$ is a null set with respect to $\mathbb{P}^x$.
Let $N$ be the union of all such $N(u_2, \ldots, u_n, t_1, \ldots, t_n)$ for $n \ge 2$, $u_2, \ldots, u_n$ rational, and
$t_1 < \cdots < t_n$ rational. This is a countable union, and therefore $N$ is a $\mathbb{P}^x$-null set.
Suppose $\omega \notin N$. By the continuity of the paths of $W'$,
\[ \mathbb{E}_Q \exp\Big( i \sum_{j=2}^n u_j \cdot (W'_{T+t_j} - W'_{T+t_{j-1}}) \Big) = \exp\Big( -\sum_{j=2}^n |u_j|^2 (t_j - t_{j-1})/2 \Big) \]
for all $t_1 < \cdots < t_n$ in $[0, \infty)$ and $u_2, \ldots, u_n \in \mathbb{R}^d$. Thus the finite-dimensional distributions of $\tilde W$
under $Q(\omega, \cdot)$ are those of a Brownian motion. By the continuity of $\tilde W$ and Theorem 2.6,
under $Q$, $\tilde W$ is a Brownian motion, except for a null set of $\omega$'s.
By a slight abuse of notation, we will say (Xt, Px) is a strong Markov family when (Zt, Px)
is a strong Markov family.
39.2 SDEs and PDEs
The connection between stochastic differential equations and partial differential equations
comes about through the following theorem, which is simply an application of Itô’s formula.
Let L be the operator on functions in C2 defined by
\[ \mathcal{L} f(x) = \frac12 \sum_{i,j=1}^d a_{ij}(x) \frac{\partial^2 f}{\partial x_i \partial x_j}(x) + \sum_{i=1}^d b_i(x) \frac{\partial f}{\partial x_i}(x). \tag{39.9} \]
Theorem 39.3 Suppose Xt is a solution to (39.1), σ and b are bounded and Borel measurable,
and a = σσ T . Suppose f ∈ C2. Then
\[ f(X_t) = f(X_0) + M_t + \int_0^t \mathcal{L} f(X_s)\, ds, \tag{39.10} \]
where
\[ M_t = \int_0^t \sum_{i,j=1}^d \frac{\partial f}{\partial x_i}(X_s)\, \sigma_{ij}(X_s)\, dW_s^j \tag{39.11} \]
is a local martingale.
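Proof Since the components of the Brownian motion $W_t$ are independent, we have
$d\langle W^k, W^\ell \rangle_t = 0$ if $k \ne \ell$; see Exercise 9.4. Therefore
\[ d\langle X^i, X^j \rangle_t = \sum_k \sum_\ell \sigma_{ik}(X_t) \sigma_{j\ell}(X_t)\, d\langle W^k, W^\ell \rangle_t = \sum_k \sigma_{ik}(X_t) \sigma^T_{kj}(X_t)\, dt = a_{ij}(X_t)\, dt. \]
We now apply Itô's formula:
\begin{align*}
f(X_t) &= f(X_0) + \sum_i \int_0^t \frac{\partial f}{\partial x_i}(X_s)\, dX_s^i + \frac12 \int_0^t \sum_{i,j} \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s)\, d\langle X^i, X^j \rangle_s \\
&= f(X_0) + M_t + \sum_i \int_0^t \frac{\partial f}{\partial x_i}(X_s) b_i(X_s)\, ds + \frac12 \int_0^t \sum_{i,j} \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) a_{ij}(X_s)\, ds \\
&= f(X_0) + M_t + \int_0^t \mathcal{L} f(X_s)\, ds,
\end{align*}
and we are finished.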
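
The operator in (39.9) is easy to evaluate numerically, which helps when checking candidate martingales by hand. The following is a minimal sketch in Python, assuming NumPy and central finite differences. The function name `apply_generator` and the step size `h` are illustrative choices, not from the text.

```python
import numpy as np

def apply_generator(f, a, b, x, h=1e-3):
    """Central-difference approximation of L f(x) = 1/2 sum a_ij d_ij f + sum b_i d_i f."""
    x = np.asarray(x, dtype=float)
    d = x.size
    e = np.eye(d)
    grad = np.array([(f(x + h * e[i]) - f(x - h * e[i])) / (2 * h) for i in range(d)])
    hess = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            hess[i, j] = (f(x + h * (e[i] + e[j])) - f(x + h * (e[i] - e[j]))
                          - f(x - h * (e[i] - e[j])) + f(x - h * (e[i] + e[j]))) / (4 * h * h)
    return 0.5 * np.sum(a(x) * hess) + b(x) @ grad

# Brownian motion in d = 3: a is the identity, b = 0, and f(x) = |x|^2 gives L f = 3.
value = apply_generator(lambda x: float(x @ x), lambda x: np.eye(3), lambda x: np.zeros(3),
                        [0.3, -0.2, 1.0])
```

In the Brownian case this recovers the familiar fact that $|W_t|^2 - dt$ is a martingale, which is (39.10) with $f(x) = |x|^2$.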
39.3 Martingale problems
In this section we consider operators in nondivergence form, that is, operators of the form
given by (39.9). We assume throughout this section that the coefficients ai j and bi are
bounded and measurable and that ai j(x) = aji(x) for all i, j = 1, . . . , d and all x ∈ Rd . The
coefficients ai j are called the diffusion coefficients and the bi are called the drift coefficients.
We also assume that the operator $\mathcal{L}$ is uniformly elliptic, which means that there exists $\Lambda > 0$

such that

\[ \sum_{i,j=1}^d y_i a_{ij}(x) y_j \ge \Lambda |y|^2, \qquad y \in \mathbb{R}^d,\ x \in \mathbb{R}^d. \tag{39.12} \]

This says that the matrix ai j(x) is positive definite, uniformly in x.

We saw in the previous section that if Xt is the solution to (39.1), a = σσ T , and f ∈ C2,

then

\[ f(X_t) - f(X_0) - \int_0^t \mathcal{L} f(X_s)\, ds \tag{39.13} \]

is a local martingale under P. A very fruitful idea of Stroock and Varadhan is to phrase the

association of Xt with L in terms which use (39.13) as a key element. Let Ω consist of all

continuous functions ω mapping [0, ∞) to Rd . Let Xt (ω) = ω(t) and given a probability P,

let {Ft} be the minimal augmented filtration generated by X . A probability measure P is a

solution to the martingale problem for L started at x0 if

P(X0 = x0) = 1 (39.14)

and

\[ f(X_t) - f(X_0) - \int_0^t \mathcal{L} f(X_s)\, ds \tag{39.15} \]

is a local martingale under P whenever f ∈ C2(Rd ). The martingale problem is well posed

if there exists a solution P and this solution is unique.

Uniqueness of the martingale problem for L is closely connected to weak uniqueness or,

equivalently, uniqueness in law of (39.1).

Theorem 39.4 Suppose a = σσ T and suppose the matrix σ (x) is invertible for each x.

Weak uniqueness for (39.1) holds if and only if the solution for the martingale problem for L

started at x is unique. Weak existence for (39.1) holds if and only if there exists a solution to

the martingale problem for L started at x.

Proof We prove the uniqueness assertion. Let $\Omega$ be the continuous functions on $[0, \infty)$
and $Z_t$ the coordinate process: $Z_t(\omega) = \omega(t)$. First suppose the solution to the martingale
problem is unique. If $(X_t^1, W_t^1, \mathbb{P}_1)$ and $(X_t^2, W_t^2, \mathbb{P}_2)$ are two weak solutions to (39.1), define
$\mathbb{P}^x_i$ on $\Omega$ to be the law of $X^i$ under $\mathbb{P}_i$, $i = 1, 2$. Clearly $\mathbb{P}^x_i(Z_0 = x) = \mathbb{P}_i(X_0^i = x) = 1$.
The expression in (39.13) is a local martingale under $\mathbb{P}^x_i$ for each $i$ and each $f \in C^2$. By the
uniqueness for the solution of the martingale problem, $\mathbb{P}^x_1 = \mathbb{P}^x_2$. This implies that the laws
of $X_t^1$ and $X_t^2$ are the same, or weak uniqueness holds.

Now suppose weak uniqueness holds for (39.1). Let

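\[ Y_t = Z_t - \int_0^t b(Z_s)\, ds. \]

Let $\mathbb{P}^x_1$ and $\mathbb{P}^x_2$ be solutions to the martingale problem. If $f(x) = x_k$, the $k$th coordinate of $x$,
then $\partial f/\partial x_i(x) = \delta_{ik}$ and $\partial^2 f/\partial x_i \partial x_j(x) = 0$, where $\delta_{ik}$ is 1 if $i = k$ and 0 otherwise, and
so $\mathcal{L} f(Z_s) = b_k(Z_s)$. We see from (39.13) that the $k$th coordinate of $Y_t$ is a local martingale
under $\mathbb{P}^x_i$.

Now let $f(x) = x_k x_m$. A simple computation shows that $\mathcal{L} f(x) = a_{km}(x) + x_k b_m(x) + x_m b_k(x)$.
Combining this with (39.13) and the product formula, one finds that $Y_t^k Y_t^m - \int_0^t a_{km}(Z_s)\, ds$
is a local martingale. We set

\[ W_t = \int_0^t \sigma^{-1}(Z_s)\, dY_s. \]

The stochastic integral is finite since, for each $i$,

\[ \mathbb{E} \int_0^t \sum_{j=1}^d (\sigma^{-1})_{ij}(Z_s) \sum_{k=1}^d (\sigma^{-1})_{ik}(Z_s)\, d\langle Y^j, Y^k \rangle_s = \mathbb{E} \int_0^t \sum_{j,k=1}^d (\sigma^{-1})_{ij}(Z_s)\, a_{jk}(Z_s)\, (\sigma^{-1})_{ik}(Z_s)\, ds = t < \infty. \tag{39.16} \]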
Since $Y_t$ is a local martingale, it follows that $W_t$ is a local martingale, and a calculation
similar to (39.16) shows that $W_t^k W_t^m - \delta_{km} t$ is also a martingale under $\mathbb{P}^x_i$. By Lévy's theorem
(Exercise 12.4), $W_t$ is a Brownian motion under both $\mathbb{P}^x_1$ and $\mathbb{P}^x_2$, and $(Z_t, W_t, \mathbb{P}^x_i)$ is a weak
solution to (39.1). By the weak uniqueness hypothesis, the laws of $Z_t$ under $\mathbb{P}^x_1$ and $\mathbb{P}^x_2$ agree,
which is what we wanted to prove.
Exercise 39.1 asks you to prove that the existence of a weak solution to (39.1) is equivalent
to the existence of a solution to the martingale problem.
If the σi j and bi are Lipschitz functions, the solution to (39.1) is pathwise unique; see
Exercise 24.5. By Proposition 25.2, weak existence and uniqueness hold, and then the
martingale problem for L is well posed for every starting point.
A process that can be described in terms of a martingale problem (as well as other ways) is
super-Brownian motion. Super-Brownian motion, also known as a measure-valued branching
diffusion process, is a process whose state space is the set M of finite positive measures
on Rd . The intuitive picture is as follows. Given an initial finite measure μ as a starting
point, let $X^n_t$ be the process that starts with $[n\mu(\mathbb{R}^d)]$ particles, each with mass $1/n$, each
distributed according to $\mu(dx)/\mu(\mathbb{R}^d)$, where $[\cdot]$ denotes the integer part. Each particle
moves as an independent Brownian motion for a time 1/n, at which time each particle splits
into two or dies, independently of the other particles. The particles that are now alive move
as independent Brownian motions for time 1/n, at which time each particle splits into two
or dies, and so on. X nt is the measure that assigns mass 1/n at each point at which there is
a particle alive at time t. We take the right-continuous version of X nt . It turns out that the
sequence converges weakly with respect to the topology of D[0, 1], but where the state space
is the set of right-continuous functions with left limits taking values in M (rather than the
set of real-valued functions) and the limit law can be characterized as the unique solution to a
martingale problem. A solution to this martingale problem started at μ ∈ M is a probability
measure on the space of continuous processes taking values in M such that
(1) P(X0 = μ) = 1;
(2) if $f \in C^\infty$ has compact support and we write $\nu(f)$ for $\int f\, d\nu$, then
\[ M_t^f = X_t(f) - \int_0^t X_r\big(\tfrac12 \Delta f\big)\, dr \]
is a continuous martingale with quadratic variation process given by
\[ \langle M^f \rangle_t = \int_0^t X_r(f^2)\, dr. \]
See Dawson (1993) and Perkins (2002) for more on these processes.
Exercises
39.1 Show that the existence of a weak solution to (39.1) is equivalent to the existence of a solution
to the martingale problem for L.
39.2 Suppose the ai j are Lipschitz functions in x and the matrices a(x) are positive definite, uniformly
in x; see Exercise 25.4. Show that we can find matrices σ (x) so that each σi j is a Lipschitz function
of x and a(x) = σ (x)σ T (x) for each x.
39.3 If X is a solution to (39.1), give formulas for At and Mt in terms of σ and b, where Mt is a local
martingale, At is a process whose paths are locally of bounded variation, and |Xt | = Mt + At .
39.4 Let A ∈ (−1,∞) and let X be a solution to (39.1), where all the bi’s are equal to 0, a = σσ T ,
and
\[ a_{ij}(x) = \frac{\delta_{ij} + A x_i x_j/|x|^2}{1 + A} \]
for $x \ne 0$, where $\delta_{ij}$ is equal to 1 if $i = j$ and 0 otherwise. Let $a(0)$ be the identity matrix.
(1) Prove that the matrices $a(x)$ are uniformly elliptic.
(2) Show that $|X_t|$ has the same law as a Bessel process of order
\[ \frac{d + A}{1 + A}. \]
Conclude that if A is sufficiently close to −1, then X is transient, i.e, limt→∞ |Xt | = ∞, a.s.,
while if A is sufficiently large, there exist arbitrarily large times t such that Xt = 0.
39.5 Suppose for each $n \ge 1$, $a^n_{ij}(x)$ is symmetric in $i$ and $j$, is continuous in $x$, and the matrix whose
$(i, j)$th entry is $a^n_{ij}(x)$ is positive definite, uniformly in $x$ and $n$. Let
\[ \mathcal{L}_n f(x) = \sum_{i,j=1}^d a^n_{ij}(x) \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \tag{39.17} \]
for $f \in C^2$. Suppose $a^n_{ij}(x)$ converges to $a_{ij}(x)$ uniformly in $x$ as $n \to \infty$, and define
$\mathcal{L}$ analogously to (39.17). Fix $x_0$ and let $\mathbb{P}_n$ be a solution to the martingale problem for $\mathcal{L}_n$
started at $x_0$.
(1) Prove that Pn converges weakly to a solution P to the martingale problem for L started
at x0.
(2) Prove that if the ai j are continuously differentiable functions of x whose first partial
derivatives are bounded, then there exists a solution to the martingale problem for L started
at x0.
(3) Prove that if the ai j are continuous functions of x, then there exists a solution to the
martingale problem for L started at x0.
39.6 Suppose X is a solution to dXt = σ (Xt ) dWt , where W is a d-dimensional Brownian motion,
σ (x) is a d × d matrix-valued function that is bounded, and σ T σ is positive definite, uniformly
in x. Prove the following estimate for the time to leave a ball: there exist constants c1 and c2 not
depending on x0 such that
\[ c_1 r^2 \le \mathbb{E}^{x_0} \tau_{B(x_0, r)} \le c_2 r^2, \qquad r > 0, \]

where $\tau_{B(x_0, r)} = \inf\{t > 0 : X_t \notin B(x_0, r)\}$.

Notes

See Bass (1997) for more information.

40

Solving partial differential equations

We will be concerned with giving probabilistic representations of the solutions to certain

PDEs. Throughout we will be assuming that the given PDE has a solution, the solution is

unique, and the solution is sufficiently smooth. We will consider Poisson’s equation, the

Dirichlet problem, the Cauchy problem (with an application to Brownian passage times), and

Schrödinger’s equation.

We let Xt be the solution to

dXt = σ (Xt ) dWt + b(Xt ) dt. (40.1)

Here W is a d-dimensional Brownian motion, σ is a bounded Lipschitz continuous d × d

matrix-valued function, b is a bounded Lipschitz continuous d × 1 matrix-valued function,

and X takes values in Rd . We let a = σσ T and we consider the operator on C2 functions

given by

\[ \mathcal{L} f(x) = \frac12 \sum_{i,j=1}^d a_{ij}(x) \frac{\partial^2 f}{\partial x_i \partial x_j}(x) + \sum_{i=1}^d b_i(x) \frac{\partial f}{\partial x_i}(x). \tag{40.2} \]

We suppose the operator $\mathcal{L}$ is uniformly elliptic: there exists $\Lambda > 0$ such that

\[ \sum_{i,j=1}^d a_{ij}(x) y_i y_j \ge \Lambda \sum_{i=1}^d y_i^2, \qquad (y_1, \ldots, y_d) \in \mathbb{R}^d. \]

In fact, the uniform ellipticity of L will be used only to guarantee that the exit times

of bounded domains are finite, a.s.; see Exercise 40.1. For many non-uniformly elliptic

operators, it is often the case that the finiteness of the exit times is known for other reasons,

and the results then apply to equations involving these operators.

Let X xt be the solution to (40.1) when X0 = x and let Px be the law of X xt . As in

Chapter 39, we slightly abuse notation and say that (Xt, Px) is a strong Markov process.

40.1 Poisson’s equation

We consider first Poisson’s equation in Rd . Suppose λ > 0 and f is a C1 function with

compact support. Poisson’s equation is

Lu(x) − λu(x) = − f (x), x ∈ Rd . (40.3)

Theorem 40.1 Suppose u is a C2 solution to (40.3) such that u and its first and second

partial derivatives are bounded. Then

\[ u(x) = \mathbb{E}^x \int_0^\infty e^{-\lambda t} f(X_t)\, dt. \]

Proof Let u be the solution to (40.3). By Theorem 39.3,

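\[ u(X_t) - u(X_0) = M_t + \int_0^t \mathcal{L} u(X_s)\, ds, \]

where $M_t$ is a martingale. By the product formula,

\[ e^{-\lambda t} u(X_t) - u(X_0) = \int_0^t e^{-\lambda s}\, dM_s + \int_0^t e^{-\lambda s} \mathcal{L} u(X_s)\, ds - \lambda \int_0^t e^{-\lambda s} u(X_s)\, ds. \]

Taking the expectation with respect to $\mathbb{P}^x$ and letting $t \to \infty$,

\[ -u(x) = \mathbb{E}^x \int_0^\infty e^{-\lambda s} (\mathcal{L} u - \lambda u)(X_s)\, ds. \]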

Since Lu − λu = − f , the result follows.
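
Theorem 40.1 can be checked numerically in the simplest case. The following is a minimal sketch in Python, assuming NumPy, $d = 1$, $\sigma = 1$ and $b = 0$ (so $X$ is Brownian motion and $\mathcal{L} = \frac12\, d^2/dx^2$), and using the standard fact that the $\lambda$-resolvent density of one-dimensional Brownian motion is $e^{-\sqrt{2\lambda}|x - y|}/\sqrt{2\lambda}$. The Gaussian choice of $f$ is an illustrative assumption; it is smooth but not compactly supported, and the function names are hypothetical.

```python
import numpy as np

def resolvent_solution(f, lam, x, half_width=10.0, n=10001):
    """u(x) = integral of G(x, y) f(y) dy, with G the resolvent density of 1-d Brownian motion."""
    y = np.linspace(x - half_width, x + half_width, n)    # grid centred at x
    r = np.sqrt(2.0 * lam)
    vals = np.exp(-r * np.abs(x - y)) / r * f(y)
    h = y[1] - y[0]
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))  # trapezoid rule

def poisson_residual(f, lam, x, k=0.05):
    """Finite-difference value of (1/2) u'' - lam * u + f at x; should be close to 0."""
    u0 = resolvent_solution(f, lam, x)
    up = resolvent_solution(f, lam, x + k)
    um = resolvent_solution(f, lam, x - k)
    return 0.5 * (up - 2.0 * u0 + um) / k ** 2 - lam * u0 + f(x)

gaussian = lambda y: np.exp(-y ** 2)
residual = poisson_residual(gaussian, lam=1.0, x=0.3)
```

The residual is small but not zero, since both the quadrature and the second difference introduce discretisation error.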

Let us now let D be a nice bounded domain, e.g., a ball. Poisson’s equation in D requires

one to find a function u such that Lu − λu = − f in D and u = 0 on ∂D, where f ∈ C2(D)

and λ ≥ 0. Here we can allow λ to be equal to 0.

Theorem 40.2 Suppose $u$ is a solution to Poisson's equation in a bounded domain $D$ that is

$C^2$ in $D$ and continuous on $\bar D$. Then

\[ u(x) = \mathbb{E}^x \int_0^{\tau_D} e^{-\lambda s} f(X_s)\, ds. \]

Proof The proof is nearly identical to that of the previous theorem. We already men-

tioned that τD < ∞, a.s.; see Exercise 40.1. Let Sn = inf{t : dist (Xt, ∂D) < 1/n}. By
Theorem 39.3,
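\[ u(X_{t \wedge S_n}) - u(X_0) = \text{martingale} + \int_0^{t \wedge S_n} \mathcal{L} u(X_s)\, ds. \]
By the product formula,
\begin{align*}
\mathbb{E}^x e^{-\lambda (t \wedge S_n)} u(X_{t \wedge S_n}) - u(x) &= \mathbb{E}^x \int_0^{t \wedge S_n} e^{-\lambda s} \mathcal{L} u(X_s)\, ds - \lambda\, \mathbb{E}^x \int_0^{t \wedge S_n} e^{-\lambda s} u(X_s)\, ds \\
&= -\mathbb{E}^x \int_0^{t \wedge S_n} e^{-\lambda s} f(X_s)\, ds.
\end{align*}
Now let $n \to \infty$ and then $t \to \infty$ and use the fact that $u$ is zero on $\partial D$.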
40.2 Dirichlet problem
Let D be a ball (or other nice bounded domain) and let us consider the solution to the Dirichlet
problem: given a continuous function $f$ on $\partial D$, find $u \in C(\bar D)$ such that $u$ is $C^2$ in $D$ and
\[ \mathcal{L} u = 0 \ \text{in } D, \qquad u = f \ \text{on } \partial D. \tag{40.4} \]
We considered the Dirichlet problem in the special case when L is the Laplacian in
Section 21.4.
Theorem 40.3 Suppose u is a solution to the Dirichlet problem specified by (40.4). Then u
satisfies
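\[ u(x) = \mathbb{E}^x f(X_{\tau_D}). \]
Proof As we mentioned above, $\tau_D < \infty$, a.s. Let $S_n = \inf\{t : \operatorname{dist}(X_t, \partial D) < 1/n\}$. By
Theorem 39.3,
\[ u(X_{t \wedge S_n}) = u(X_0) + \text{martingale} + \int_0^{t \wedge S_n} \mathcal{L} u(X_s)\, ds. \]
Since $\mathcal{L} u = 0$ inside $D$, taking expectations shows
\[ u(x) = \mathbb{E}^x u(X_{t \wedge S_n}). \]
We let $t \to \infty$ and then $n \to \infty$. By dominated convergence, we obtain $u(x) = \mathbb{E}^x u(X_{\tau_D})$.
This is what we want since $u = f$ on $\partial D$.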
If v ∈ C2 and Lv = 0 in D, we say v is L-harmonic in D.
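
A discrete analogue makes the representation $u(x) = \mathbb{E}^x f(X_{\tau_D})$ concrete. The following is a minimal sketch in Python, assuming NumPy and replacing the diffusion by a simple symmetric random walk on $\{0, 1, \ldots, N\}$, for which the averaging property $u(k) = (u(k-1) + u(k+1))/2$ plays the role of $\mathcal{L} u = 0$. The function name `discrete_dirichlet` is illustrative and this substitution is an assumption, not the setting of the theorem.

```python
import numpy as np

def discrete_dirichlet(n_interior, left_value, right_value):
    """Solve u(k) = (u(k-1) + u(k+1)) / 2 for k = 1..n_interior with given boundary values."""
    A = np.eye(n_interior)
    rhs = np.zeros(n_interior)
    for k in range(n_interior):
        if k > 0:
            A[k, k - 1] = -0.5
        else:
            rhs[k] += 0.5 * left_value       # neighbour on the left is the boundary point 0
        if k < n_interior - 1:
            A[k, k + 1] = -0.5
        else:
            rhs[k] += 0.5 * right_value      # neighbour on the right is the boundary point N
    return np.linalg.solve(A, rhs)

u = discrete_dirichlet(9, left_value=2.0, right_value=-1.0)
```

The solution is linear in $k$, which matches the gambler's ruin formula for the exit distribution of the walk: starting from $k$, the walk exits at $N$ with probability $k/N$.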
40.3 Cauchy problem
The related parabolic partial differential equation
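\[ \frac{\partial u}{\partial t} = \mathcal{L} u \]
is often of interest. Here $u$ is a function of $x \in \mathbb{R}^d$ and $t \in [0, \infty)$. When we write $\mathcal{L} u$, we
mean, in agreement with (40.2),
\[ \mathcal{L} u(x, t) = \frac12 \sum_{i,j=1}^d a_{ij}(x) \frac{\partial^2 u}{\partial x_i \partial x_j}(x, t) + \sum_{i=1}^d b_i(x) \frac{\partial u}{\partial x_i}(x, t). \]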
We will sometimes write ut for ∂u/∂t.
Suppose for simplicity that the function f is a continuous function with compact support.
The Cauchy problem is to find u such that u is bounded, u is C2 with bounded first and second
partial derivatives in x, u is C1 in t for t > 0, and

ut (x, t) = Lu(x, t), t > 0, x ∈ Rd,

u(x, 0) = f (x), x ∈ Rd . (40.5)

Theorem 40.4 Suppose there exists a solution to (40.5) that is C2 in x and C1 in t for t > 0.

Then u satisfies

u(x, t) = E x f (Xt ).

Proof Fix t0 and let Mt = u(Xt, t0 − t). Note

\[ \frac{\partial}{\partial t} u(x, t_0 - t) = -u_t(x, t_0 - t). \]

Similarly to the proof of Theorem 39.3 (see Exercise 40.2) but using now the multivariate

version of Itô’s formula,

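\[ u(X_t, t_0 - t) = \text{martingale} + \int_0^t \mathcal{L} u(X_s, t_0 - s)\, ds - \int_0^t u_t(X_s, t_0 - s)\, ds. \tag{40.6} \]

Since $u_t = \mathcal{L} u$, $M_t$ is a martingale, and $\mathbb{E}^x M_0 = \mathbb{E}^x M_{t_0}$. On the one hand,

\[ \mathbb{E}^x M_{t_0} = \mathbb{E}^x u(X_{t_0}, 0) = \mathbb{E}^x f(X_{t_0}), \]

while on the other hand,

\[ \mathbb{E}^x M_0 = \mathbb{E}^x u(X_0, t_0) = u(x, t_0). \]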

Since t0 is arbitrary, the result follows.
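
The representation $u(x, t) = \mathbb{E}^x f(X_t)$ can be tested in a case with a closed form. The following is a minimal sketch in Python, assuming NumPy, $d = 1$, and $X$ a Brownian motion, so that $X_t = x + \sqrt{t}\, Z$ with $Z$ standard normal. The expectation is computed by Gauss-Hermite quadrature, and the Gaussian initial condition $f(y) = e^{-y^2/2}$ is an illustrative assumption (it is not compactly supported), for which $u(x, t) = (1 + t)^{-1/2} e^{-x^2/(2(1 + t))}$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def heat_solution(f, x, t, n_nodes=60):
    """u(x, t) = E f(x + W_t) for 1-d Brownian motion, via Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_nodes)          # nodes and weights for weight exp(-z^2 / 2)
    return float(np.sum(weights * f(x + np.sqrt(t) * nodes)) / np.sqrt(2.0 * np.pi))

f0 = lambda y: np.exp(-0.5 * y ** 2)
x0, t0 = 0.7, 0.5
numeric = heat_solution(f0, x0, t0)
closed_form = (1.0 + t0) ** -0.5 * np.exp(-x0 ** 2 / (2.0 * (1.0 + t0)))
```

Differencing `heat_solution` in $t$ and in $x$ gives a quick check that $u_t = \frac12 u_{xx}$ holds approximately.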

A very similar proof allows one to represent the solution to the Cauchy problem in a

bounded domain. Suppose u(x, t) is C2 in the x variable, C1 in the t variable, and satisfies

∂u

∂t

(x, t) = Lu(x, t)

for (x, t) ∈ D × (0, t1], where D is a bounded domain in Rd and t1 > 0. Suppose u(x, 0) =

f (x) and u(x, t) = 0 for all x ∈ ∂D. Exercise 40.3 asks you to show that in this case

u(x, t) = E x f (Xt∧τD ),

where again τD is the first exit time of X from the domain D.

The Cauchy problem has an application to the passage times of Brownian motion. Suppose

we look at the equation

\[ u_t(x, t) = \tfrac12 u_{xx}(x, t), \qquad 0 < x < b,\ t > 0, \]

with

\[ u(x, 0) = f(x) \ \text{for all } x, \qquad u(0, t) = u(b, t) = 0 \ \text{for all } t, \]

where f is a bounded function on [0, b]. This is a partial differential equation (the heat

equation) that is sometimes solved in undergraduate classes; see, e.g., Boyce and DiPrima

(2009), Section 10.5. Using a combination of the technique of separation of variables and

Fourier series expansions, the solution can then be shown to be

\[ u(x, t) = \int f(y)\, p^0(t, x, y)\, dy, \]

where

\[ p^0(t, x, y) = \frac{2}{b} \sum_{n=1}^\infty e^{-n^2 \pi^2 t/2b^2} \sin(n\pi x/b) \sin(n\pi y/b). \]

See also Knight (1981), p. 62. Since u(x, t) is also equal to E x f (Xt∧τD ), where D is the

interval (0, b), then the p0(t, x, y) are the transition densities for Brownian motion killed on

exiting (0, b).

In particular, if we take f identically equal to 1 on (0, b), we see that starting at x inside

$(0, b)$, $\mathbb{P}^x(t < \tau_D)$ is asymptotically equal to $c e^{-\pi^2 t/2b^2}$. If $b$ is 2, this becomes $c e^{-\pi^2 t/8}$.
Since the time for a Brownian motion started at 0 to leave (−1, 1) is the same as the time
for a Brownian motion started at 1 to leave (0, 2), we obtain the estimate that was used in
Exercise 7.2.
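
The asymptotics of $\mathbb{P}^x(t < \tau_D)$ can be read off by integrating the series for the killed transition density term by term in $y$: only odd $n$ contribute, each with factor $4/(n\pi)$, so the constant is $c = (4/\pi)\sin(\pi x/b)$. The following is a minimal sketch in Python, assuming NumPy and truncating the series; the function names and the truncation level are illustrative choices.

```python
import numpy as np

def killed_density(t, x, y, b, n_terms=200):
    """Truncated series for the transition density of Brownian motion killed on exiting (0, b)."""
    n = np.arange(1, n_terms + 1)
    decay = np.exp(-(n * np.pi) ** 2 * t / (2.0 * b * b))
    return float((2.0 / b) * np.sum(decay * np.sin(n * np.pi * x / b) * np.sin(n * np.pi * y / b)))

def survival_probability(t, x, b, n_terms=200):
    """P^x(t < tau_D), from integrating the series over y in (0, b) term by term."""
    n = np.arange(1, n_terms + 1, 2)              # even n integrate to zero
    decay = np.exp(-(n * np.pi) ** 2 * t / (2.0 * b * b))
    return float(np.sum(4.0 / (n * np.pi) * decay * np.sin(n * np.pi * x / b)))

b, x, t = 2.0, 1.0, 3.0
leading_term = (4.0 / np.pi) * np.exp(-np.pi ** 2 * t / 8.0) * np.sin(np.pi * x / b)
ratio = survival_probability(t, x, b) / leading_term
```

For moderately large $t$ the ratio is numerically indistinguishable from 1, since the next term decays like $e^{-9\pi^2 t/8}$.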
40.4 Schrödinger operators
Finally we look at what happens when one adds a potential term, that is, when one considers
the operator
Lu(x) + q(x)u(x). (40.7)
This is known as the Schrödinger operator, and q(x) is known as the potential. Equa-
tions involving the operator in (40.7) are considerably simpler than the quantum mechanics
Schrödinger equation because here all terms are real-valued.
If Xt is the diffusion corresponding to L, then solutions to PDEs involving the operator in
(40.7) can be expressed in terms of Xt by means of the Feynman–Kac formula. To illustrate,
let D be a nice bounded domain, e.g., a ball, q a C2 function on D, and f a continuous
function on ∂D; q+ denotes the positive part of q.
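Theorem 40.5 Let $D$, $q$, $f$ be as above. Let $u$ be a $C^2$ function on $\bar D$ that agrees with $f$ on
$\partial D$ and satisfies $\mathcal{L} u + qu = 0$ in $D$. If
\[ \mathbb{E}^x \exp\Big( \int_0^{\tau_D} q^+(X_s)\, ds \Big) < \infty, \]
then
\[ u(x) = \mathbb{E}^x \Big[ f(X_{\tau_D})\, e^{\int_0^{\tau_D} q(X_s)\, ds} \Big]. \tag{40.8} \]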
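Proof Let $B_t = \int_0^{t \wedge \tau_D} q(X_s)\, ds$. By Itô's formula and the product formula,
\[ e^{B(t \wedge \tau_D)} u(X_{t \wedge \tau_D}) = u(X_0) + \text{martingale} + \int_0^{t \wedge \tau_D} u(X_r) e^{B_r}\, dB_r + \int_0^{t \wedge \tau_D} e^{B_r} \mathcal{L} u(X_r)\, dr. \]
Taking the expectation with respect to $\mathbb{P}^x$ and using Theorem 39.3,
\[ \mathbb{E}^x e^{B(t \wedge \tau_D)} u(X_{t \wedge \tau_D}) = u(x) + \mathbb{E}^x \int_0^{t \wedge \tau_D} e^{B_r} u(X_r) q(X_r)\, dr + \mathbb{E}^x \int_0^{t \wedge \tau_D} e^{B_r} \mathcal{L} u(X_r)\, dr. \]
Since $\mathcal{L} u + qu = 0$,
\[ \mathbb{E}^x e^{B(t \wedge \tau_D)} u(X_{t \wedge \tau_D}) = u(x). \]
If we let $t \to \infty$ and use the exponential integrability of $q^+$, the result follows.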
The existence of a solution to $\mathcal{L} u + qu = 0$ in $D$ depends on the finiteness of
$\mathbb{E}^x \exp\big( \int_0^{\tau_D} q^+(X_s)\, ds \big)$, an expression that is sometimes known as the gauge.
Even in one dimension with $D = (0, 1)$ and $q$ a constant function, the gauge need not
be finite. With $x = 1/2$, $\mathbb{P}^x(\tau_D > t)$ is asymptotically equal to $c e^{-\pi^2 t/2}$ as $t \to \infty$ by

Section 40.3. Hence

E

x exp

( ∫ τD

0

q ds

)

= E xeqτD

=

∫ ∞

0

qeqtPx(τD > t) dt;

this is infinite if q ≥ π2/2.
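The blow-up at $q = \pi^2/2$ can be checked numerically. The sketch below is an illustration, not part of the text: it assumes the standard sine-series expansion $P^x(\tau_D > t) = \sum_{k\ \mathrm{odd}} \frac{4}{k\pi}\sin(k\pi x)e^{-k^2\pi^2 t/2}$ for Brownian motion on (0, 1), integrates it term by term against $qe^{qt}$, and compares the result with the closed form obtained by solving $\tfrac12 u'' + qu = 0$, $u(0) = u(1) = 1$. The function names are hypothetical.

```python
import math

def gauge_series(x, q, terms=2000):
    """E^x exp(q * tau_D) for Brownian motion on (0, 1), 0 < q < pi^2 / 2.

    Uses 1 + int_0^inf q e^{qt} P^x(tau_D > t) dt with the sine series
    for the survival probability integrated term by term.
    """
    if q >= math.pi ** 2 / 2:
        return math.inf  # the first term of the series already diverges
    total = 1.0
    for j in range(terms):
        k = 2 * j + 1                      # only odd modes contribute
        rate = (k * math.pi) ** 2 / 2      # decay rate of the k-th mode
        total += 4.0 / (k * math.pi) * math.sin(k * math.pi * x) * q / (rate - q)
    return total

def gauge_closed_form(x, q):
    """Solution of u''/2 + q u = 0 on (0, 1) with u(0) = u(1) = 1."""
    w = math.sqrt(2 * q)
    return math.cos(w * (x - 0.5)) / math.cos(w / 2)

print(gauge_series(0.5, 1.0), gauge_closed_form(0.5, 1.0))
```

As q increases toward $\pi^2/2$ the denominator $\cos(\sqrt{2q}/2)$ tends to zero, which matches the threshold stated above.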

Exercises

40.1 This (lengthy) exercise is designed to guide you through a proof that solutions to (40.1) exit bounded sets in finite time, a.s.
(1) Suppose
\[ X_t = W_t + \int_0^t a_s\,ds, \]
where W is a one-dimensional Brownian motion, and $a_s$ is an adapted process bounded by K. Let L > K > 0 and $t_0 > 0$. Show that there exists ε > 0, depending only on L, K, and $t_0$, such that $P(|X_{t_0}| > 3L) > \varepsilon$.
(2) Suppose $X_t = M_t + \int_0^t a_s\,ds$, where $a_s$ is as in (1) and M is a continuous martingale with $K^{-1} \le d\langle M\rangle_t/dt \le K$, a.s. Use a time change argument to show that there exist L, ε > 0 such that
\[ P\big(\sup_{s\le 1} |X_s| \le L\big) \le 1 - \varepsilon. \]
(3) If now X is a solution to (40.1), $a = \sigma\sigma^T$, and L given by (40.2) is uniformly elliptic, show by looking at the first coordinate of X that there exist L, ε such that
\[ P^x\big(\sup_{s\le 1} |X_s| \le L\big) \le 1 - \varepsilon, \qquad x \in B(0, L). \]
(4) What you have proved in (3) can be rephrased as saying that if $(X_t, P^x)$ is a strong Markov process that solves (40.1) for every starting point and $\tau = \inf\{t : X_t \notin B(0, L)\}$, then $P^x(\tau > 1) \le 1 - \varepsilon$, where ε does not depend on x. Now use the strong Markov property (cf. the proof of Proposition 21.2) to show $P^x(\tau > k) \le (1 - \varepsilon)^k$. Conclude that τ < ∞, $P^x$-a.s., for each starting point x.
40.2 Prove (40.6).
40.3 Let D be a ball in Rd and suppose u is the solution to the Cauchy problem in the domain
D × [0, t1] as described in Section 40.3. Show that u(x, t) = E x f (Xt∧τD ).
40.4 Suppose f is such that the solution u to
\[ u_t(x, t) = Lu(x, t) + q(x)u(x, t), \qquad u(x, 0) = f(x), \]
is $C^2$ in x and t and X is the diffusion associated with L. Prove that
\[ u(x, t) = E^x\Big[ f(X_t)\, e^{\int_0^t q(X_s)\,ds} \Big]. \]
40.5 Suppose (Xt , Px) is a Brownian motion on [0, b] with reflection at 0 and b. Find a series expansion
for p(t, x, y), the transition densities for X .
Hint: Imitate the argument for absorbing Brownian motion in Section 40.3, but now use the
boundary conditions ux(0, t) = ux(b, t) = 0.
Notes
See Bass (1997) for more on the connection between probability and PDEs.
41
One-dimensional diffusions
Under very mild regularity conditions, every one-dimensional diffusion arises from first
time-changing a one-dimensional Brownian motion and then making a transformation of the
state space. We will prove this fact in this chapter.
41.1 Regularity
Throughout this chapter we suppose that we have a continuous process (Xt, Px) defined on
an interval I contained in R. For almost all of the chapter, we suppose for simplicity that the
interval is in fact all of R. We further suppose that (Xt, Px) is a strong Markov process with
respect to a right-continuous filtration {F t} such that each Ft contains all the sets that are
Px-null for every x. We call such a process a one-dimensional diffusion.
Write
Ty = inf{t : Xt = y}, (41.1)
the first time the process X hits the point y. We will also assume that every point can be hit
from every other point: for all x, y,
Px(Ty < ∞) = 1. (41.2)
When (41.2) holds, we say the diffusion is regular.
For any interval J, define $\tau_J = \inf\{t : X_t \notin J\}$, the first time the process leaves J. When $X_t$ is a Brownian motion, we know (Proposition 3.16) that the distribution of $X_t$ upon exiting [a, b] is
\[ P^x(X(\tau_{[a,b]}) = a) = \frac{b - x}{b - a}, \qquad P^x(X(\tau_{[a,b]}) = b) = \frac{x - a}{b - a}. \tag{41.3} \]
We say that a regular diffusion Xt is on natural scale if (41.3) holds for every interval [a, b].
We also say a regular diffusion X defined on an interval I properly contained in R is on
natural scale if (41.3) holds whenever [a, b] ⊂ I and x ∈ (a, b).
If $X_t$ is regular, then the process started at x must leave x immediately. That is, if $S = \inf\{t > 0 : X_t \ne x\}$, then $P^x(S = 0) = 1$. To see this, let ε > 0 and $U = \inf\{t : |X_t - x| \ge \varepsilon\}$. By the regularity of X, $E^x e^{-U} > 0$. Observe that $U = S + U \circ \theta_S$, where $\theta_t$ is the shift operator. By the strong Markov property at time S,
\[ E^x e^{-U} = E^x\big[e^{-S} E^x[e^{-U} \circ \theta_S \mid \mathcal{F}_S]\big] = E^x\big[e^{-S} E^{X_S}[e^{-U}]\big] = E^x\big[e^{-S} E^x e^{-U}\big], \]
since $X_S = x$ by the continuity of the paths of X. The only way this can happen is if $E^x e^{-S} = 1$, which implies S = 0, $P^x$-a.s.


41.2 Scale functions

We will show that given a regular diffusion, there exists a scale function that is continuous,

strictly increasing, and such that s(Xt ) is on natural scale.

We first look at a special case, when the diffusion is given as the solution to an SDE.

Suppose Xt is given as the solution to

dXt = σ (Xt ) dWt + b(Xt ) dt, (41.4)

where we assume σ and b are real-valued, continuous and bounded above and σ is bounded

below by a positive constant. Let a(x) = σ 2(x). In this case we can give a formula for the

scale function.

Theorem 41.1 The scale function s(x) is the solution to
\[ \tfrac12 a(x) s''(x) + b(x) s'(x) = 0, \]
and for some constants $c_1$, $c_2$, and $x_0$ is given by
\[ s(x) = c_1 + c_2 \int_{x_0}^x \exp\Big( -\int_{x_0}^y \frac{2b(w)}{a(w)}\,dw \Big)\,dy. \tag{41.5} \]

Proof To solve the differential equation, we write
\[ \frac{s''(x)}{s'(x)} = -\frac{2b(x)}{a(x)}, \]
or $(\log s'(x))' = -2b(x)/a(x)$, from which (41.5) follows. Since we assumed that σ and b are continuous, s(x) given by (41.5) is $C^2$. Since σ is bounded below by a positive constant and b and σ are bounded, s given by (41.5) is strictly increasing. Applying Itô's formula,
\[ s(X_t) - s(X_0) = \int_0^t s'(X_r)\sigma(X_r)\,dW_r \tag{41.6} \]
because
\[ \int_0^t \big[ \tfrac12 s''(X_r)\sigma(X_r)^2 + s'(X_r)b(X_r) \big]\,dr = 0. \]

This implies that s(Xt ) − s(X0) is a martingale, hence a time change of Brownian motion.

Therefore the exit probabilities of s(Xt ) for an interval [a, b] are the same as those of a

Brownian motion, namely, those given by (41.3).
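As an illustration of Theorem 41.1, formula (41.5) can be evaluated by numerical quadrature and turned into exit probabilities via (41.3) applied to $s(X_t)$. The following is a minimal sketch under stated assumptions: the normalization $c_1 = 0$, $c_2 = 1$, the trapezoid rule, and the function names are all choices made here for illustration and are not from the text.

```python
import math

def scale_function(b, a, x0, x, n=2000):
    """Approximate s(x) = int_{x0}^x exp(-int_{x0}^y 2b(w)/a(w) dw) dy.

    Uses c1 = 0, c2 = 1 in (41.5) and the trapezoid rule for both integrals.
    Works for x < x0 as well, since the step h is then negative.
    """
    h = (x - x0) / n
    inner = 0.0                      # running value of int_{x0}^{y} 2b/a dw
    total = 0.0                      # running value of the outer integral
    prev_rate = 2 * b(x0) / a(x0)
    prev_val = 1.0                   # exp(-inner) at y = x0
    for i in range(1, n + 1):
        y = x0 + i * h
        rate = 2 * b(y) / a(y)
        inner += 0.5 * (prev_rate + rate) * h
        val = math.exp(-inner)
        total += 0.5 * (prev_val + val) * h
        prev_rate, prev_val = rate, val
    return total

def exit_prob_upper(b, a, lo, hi, x):
    """P^x(X exits [lo, hi] at hi), using that s(X) is on natural scale."""
    s_lo = scale_function(b, a, x, lo)   # s is normalized so that s(x) = 0
    s_hi = scale_function(b, a, x, hi)
    return (0.0 - s_lo) / (s_hi - s_lo)

# Brownian motion with drift 1/2: dX = dW + 0.5 dt
print(exit_prob_upper(lambda w: 0.5, lambda w: 1.0, 0.0, 1.0, 0.5))
```

For zero drift the function reduces to $(x - lo)/(hi - lo)$, the Brownian exit probability in (41.3).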

From (41.6), if Yt = s(Xt ), then

dYt = (s′σ )(s−1(Yt )) dWt . (41.7)

Now we show there exists a scale function for general regular diffusions on R. Let J be

an interval [a, b]. We define

p(x) = pJ (x) = Px(XτJ = b). (41.8)

Proposition 41.2 Let J = [a, b] be a finite interval. Then p(Xt∧τJ ) is a regular diffusion on

[0, 1] on natural scale.


Proof First we show that p is increasing. To get to the point b starting from x, the process

must first hit every point between x and b because X has continuous paths. If a < x < y < b,
by the strong Markov property at time Ty, p(x) ≤ p(y). We claim there is a positive probability
that the process starting from x hits a before y, that is,
Px(Ta < Ty) > 0. (41.9)

If (41.9) did not hold, then the process started at x must hit y before hitting a, then by the

continuity of paths must hit x before hitting a, and once the process is again at x, it again hits

y with probability one before a and so on. Therefore the process never hits a, a contradiction

to the regularity; Exercise 41.2 asks you to make this argument precise. Therefore (41.9)

does hold, and by the strong Markov property at Ty,

p(x) = Px(Ty < Ta)p(y).
Since Px(Ty < Ta) = 1 − Px(Ta < Ty) is strictly less than 1, p is strictly increasing.
Next we show that p is continuous. We show continuity from the right; the proof of
continuity from the left is similar. Suppose xn ↓ x. The process Xt has continuous paths, so
given ε we can find t small enough so that Px(Ta < t) < ε. By the Blumenthal 0–1 law
(Proposition 20.8), Px(T(x,b] = 0) is zero or one, where T(x,b] is the first time the process hits
the interval (x, b]. If it is zero, the process immediately moves to the left from x, a.s., and
by the strong Markov property at Tx, it never hits b, a contradiction. The probability must
therefore be one. Thus by the continuity of paths, for n large enough, Px(Txn < t) ≥ 1 − ε.
Hence with probability at least 1 − 2ε, Xt hits xn before a. Since
p(x) = Px(Txn < Ta) p(xn) ≥ (1 − 2ε)p(xn)
and ε is arbitrary, we see that p(x) ≥ lim inf n→∞ p(xn). Since p is strictly increasing, p(xn)
decreases, and therefore p(x) = lim p(xn).
Finally, we show $p(X_t)$ is on natural scale. Let [e, f] ⊂ (0, 1) and let
\[ r(y) = P^y(X_t \text{ hits } p^{-1}(f) \text{ before hitting } p^{-1}(e)). \]
Note that
\[ P^x(p(X_t) \text{ hits } f \text{ before } e) = P^{p^{-1}(x)}(X_t \text{ hits } p^{-1}(f) \text{ before } p^{-1}(e)) = r(p^{-1}(x)). \tag{41.10} \]
For $y \in [p^{-1}(e), p^{-1}(f)]$, the strong Markov property tells us that
\[ p(y) = P^y(X_t \text{ hits } p^{-1}(f) \text{ before } p^{-1}(e))\,p(p^{-1}(f)) + P^y(X_t \text{ hits } p^{-1}(e) \text{ before } p^{-1}(f))\,p(p^{-1}(e)) = r(y) f + (1 - r(y)) e. \tag{41.11} \]
Solving for r(y), we obtain r(y) = (p(y) − e)/(f − e). Substituting in (41.10),
\[ P^x(p(X_t) \text{ hits } f \text{ before } e) = r(p^{-1}(x)) = \frac{p(p^{-1}(x)) - e}{f - e} = \frac{x - e}{f - e}, \]
which is the formula we wanted.
Note that if Xt is on natural scale, then so is c1Xt + c2 for any constants
c1 > 0, c2 ∈ R.

Theorem 41.3 There exists a continuous strictly increasing function s such that s(Xt ) is on

natural scale on s(R).

Proof Let Jn be closed intervals increasing up to R. Pick two points in J1; label them a

and b with a < b. Choose An and Bn so that if sn(x) = An pJn (x) + Bn, then sn(a) = 0 and
sn(b) = 1.
We will show that if n ≥ m, then sn = sm on Jm. Once we have that, we can set s(x) = sn(x)
on Jn, and the theorem will be proved.
Suppose $J_m = [e, f]$. By Proposition 41.2, both $s_m(X_t)$ and $s_n(X_t)$ are on natural scale. For all $x \in J_m$,
\[ \frac{s_m(x) - s_m(e)}{s_m(f) - s_m(e)} = P^{s_m(x)}\big( s_m(X_t) \text{ hits } s_m(f) \text{ before } s_m(e) \big) = P^x(X_t \text{ hits } f \text{ before } e). \]
We have a similar equation with $s_m$ replaced everywhere by $s_n$. It follows that
\[ \frac{s_m(x) - s_m(e)}{s_m(f) - s_m(e)} = \frac{s_n(x) - s_n(e)}{s_n(f) - s_n(e)} \]
for all x, which implies that $s_n(x) = Cs_m(x) + D$ for some constants C and D. Since $s_n$ and $s_m$ are equal at both x = a and x = b, then C must be 1 and D must be 0.
41.3 Speed measures
Suppose that (Px, Xt ) is a regular diffusion on R on natural scale. For each finite interval
(a, b), define
\[ G_{ab}(x, y) = \begin{cases} \dfrac{2(x-a)(b-y)}{b-a}, & a < x \le y < b, \\[1ex] \dfrac{2(y-a)(b-x)}{b-a}, & a < y \le x < b, \end{cases} \tag{41.12} \]
and set Gab(x, y) = 0 if x or y is not in (a, b). A measure m(dx) is the speed measure for the
diffusion (Xt, Px) if
\[ E^x \tau_{(a,b)} = \int G_{ab}(x, y)\,m(dy) \tag{41.13} \]
for each finite interval (a, b) and each x ∈ (a, b). As (41.13) indicates, the speed measure
governs how quickly the diffusion moves through intervals.
As an example, let us argue that the speed measure for Brownian motion is Lebesgue measure. By Proposition 3.16, if $(X_t, P^x)$ is a Brownian motion,
\[ E^x \tau_{(a,b)} = (x - a)(b - x). \]
On the other hand, a calculation shows that
\[ \int G_{ab}(x, y)\,dy = (x - a)(b - x). \]
Since
\[ E^x \tau_{(a,b)} = \int G_{ab}(x, y)\,dy \]
and Brownian motion is on natural scale, we see that the speed measure m(dy) of Brownian motion is equal to Lebesgue measure.
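The calculation $\int G_{ab}(x,y)\,dy = (x-a)(b-x)$ is easy to confirm numerically. The sketch below is illustrative only: it codes (41.12) directly and integrates against a density with the midpoint rule. The helper names and the optional `density` argument (for a speed measure of the form $m(dy) = \text{density}(y)\,dy$) are assumptions made here.

```python
def green(a, b, x, y):
    """The function G_ab(x, y) of (41.12); zero unless both x and y lie in (a, b)."""
    if not (a < x < b and a < y < b):
        return 0.0
    lo, hi = (x, y) if x <= y else (y, x)
    return 2.0 * (lo - a) * (b - hi) / (b - a)

def mean_exit_time(a, b, x, density=lambda y: 1.0, n=20000):
    """Midpoint-rule approximation of int G_ab(x, y) density(y) dy, as in (41.13)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        y = a + (i + 0.5) * h
        total += green(a, b, x, y) * density(y)
    return total * h

# Lebesgue measure: should reproduce (x - a)(b - x) for Brownian motion
print(mean_exit_time(0.0, 1.0, 0.3), (0.3 - 0.0) * (1.0 - 0.3))
```

Passing a constant density c simply scales the mean exit time by c, which is one way to read the remark that the speed measure governs how quickly the diffusion moves.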
We will show that
(1) a regular diffusion on natural scale has one and only one speed measure,
(2) the law of the diffusion is determined by the speed measure, and
(3) there exists a diffusion with a given speed measure.
We first want to show that any speed measure must satisfy 0 < m(a, b) < ∞ for any finite
interval [a, b]. To start we have the following lemma.
Lemma 41.4 If [a, b] is a finite interval, then $\sup_x E^x \tau^k_{(a,b)} < \infty$ for each positive integer k.
Proof Pick y ∈ (a, b). Since $X_t$ is a regular diffusion, $P^y(T_a < \infty) = 1$, and hence there exists $t_0$ such that $P^y(T_a > t_0) < 1/2$. Similarly, taking $t_0$ larger if necessary, $P^y(T_b > t_0) \le 1/2$. If a < x ≤ y, then
\[ P^x(\tau_{(a,b)} > t_0) \le P^x(T_a > t_0) \le P^y(T_a > t_0) \le 1/2, \]
and similarly, $P^x(\tau_{(a,b)} > t_0) \le 1/2$ if y ≤ x < b. By the Markov property,
\[ P^x(\tau_{(a,b)} > (n+1)t_0) = E^x\big[P^{X(nt_0)}(\tau_{(a,b)} > t_0);\ \tau_{(a,b)} > nt_0\big] \le \tfrac12 P^x(\tau_{(a,b)} > nt_0), \]
and by induction, $P^x(\tau_{(a,b)} > nt_0) \le 2^{-n}$. The lemma is now immediate.

Lemma 41.5 If (Xt, Px) has a speed measure m and [a, b] is a non-empty finite interval,

then 0 < m(a, b) < ∞.
Proof If m(a, b) = 0, then for x ∈ (a, b), we have
\[ E^x \tau_{(a,b)} = \int G_{ab}(x, y)\,m(dy) = 0, \]
which implies $\tau_{(a,b)} = 0$, $P^x$-a.s., a contradiction to the continuity of the paths of $X_t$.
Next we show the finiteness of m(a, b). Pick (e, f) such that [a, b] ⊂ (e, f). There exists a constant c such that for x, y ∈ (a, b), $G_{ef}(x, y)$ is bounded below by c, so
\[ m(a, b) \le c^{-1} \int_e^f G_{ef}(x, y)\,m(dy) = c^{-1} E^x \tau_{(e,f)} < \infty. \]
This completes the proof.
Theorem 41.6 A regular diffusion on natural scale on R has one and only one speed
measure.
Proof First let I = (e, f) be a finite open interval. For n > 1 let $x_i = e + i(f - e)/2^n$, $i = 0, 1, 2, \ldots, 2^n$. Let $D_n = \{x_i : 0 \le i \le 2^n\}$. Let
\[ m_n(dx) = 2^n \sum_{i=1}^{2^n - 1} B(x_i)\,\delta_{x_i}, \tag{41.14} \]
where $B(x_i) = E^{x_i} \tau_{(x_{i-1}, x_{i+1})}$. We first want to show that if [a, b] is a subinterval of I with a, b each in $D_n$ and x is also in $D_n$, then
\[ E^x \tau_{(a,b)} = \int G_{ab}(x, y)\,m_n(dy). \tag{41.15} \]

To see this, let $S_0 = 0$ and $S_{j+1} = \inf\{t > S_j : |X_t - X_{S_j}| = 2^{-n}\} \wedge \tau_{(a,b)}$. The $S_j$'s are the successive times that X moves $2^{-n}$, up until the time of leaving (a, b). Because X is on natural scale, $X_{S_{j+1}}$ is equal to $X_{S_j} + 2^{-n}$ with probability 1/2 and equal to $X_{S_j} - 2^{-n}$ with probability 1/2, until leaving (a, b). Therefore $X_{S_j}$ is a simple symmetric random walk on the lattice with step size $2^{-n}$, stopped on leaving (a, b).

Let $J(x_i) = (x_i - 2^{-n}, x_i + 2^{-n})$ for $x_i \ne a, b$. Let J(a) = J(b) = ∅. By repeated use of the strong Markov property,
\[ E^x \tau_{(a,b)} = \sum_{j=0}^\infty E^x(S_{j+1} - S_j) = E^x \sum_{j=0}^\infty E^{X(S_j)}[\tau_{J(X_0)}] = E^x \sum_{j=0}^\infty B(X_{S_j}) 1_{(a,b)}(X_{S_j}). \]

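Let $N_i = \sum_{j=0}^\infty 1_{\{x_i\}}(X_{S_j})$, the number of visits to $x_i$ before exiting (a, b). Then
\begin{align*}
E^x \tau_{(a,b)} &= E^x \sum_{j=0}^\infty B(X_{S_j}) 1_{(a,b)}(X_{S_j}) \tag{41.16} \\
&= E^x \sum_{j=0}^\infty \sum_{i=1}^{2^n - 1} B(X_{S_j}) 1_{\{x_i\}}(X_{S_j}) \\
&= E^x \sum_{i=1}^{2^n - 1} B(x_i) N_i.
\end{align*}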

$E^x N_i$ must equal 0 when x = a or x = b and satisfies the equation
\[ E^{x_j} N_i = \delta_{ij} + \tfrac12 \big( E^{x_{j+1}} N_i + E^{x_{j-1}} N_i \big), \tag{41.17} \]
where $\delta_{ij}$ is 1 if i = j and 0 otherwise. This holds because for j ≠ i, the process goes left or right, each with probability 1/2, while if j = i, we add one to $N_i$ before going left or right. The function $x \to E^x N_i$ is hence piecewise linear on $(a, x_i)$ and on $(x_i, b)$. Some algebra shows that we must have
\[ E^x N_i = 2^n G_{ab}(x, x_i). \tag{41.18} \]
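The "some algebra" behind (41.18) can be double-checked with exact rational arithmetic. The sketch below takes (a, b) = (0, 1) and lattice points $x_j = j/2^n$ (a simplifying assumption made here, so that the lattice spacing is exactly $2^{-n}$) and verifies that $u_j = 2^n G_{ab}(x_j, x_i)$ vanishes at the endpoints and satisfies the recursion (41.17). The function names are hypothetical.

```python
from fractions import Fraction

def green(a, b, x, y):
    """G_ab(x, y) from (41.12), evaluated exactly on Fractions."""
    if not (a < x < b and a < y < b):
        return Fraction(0)
    lo, hi = min(x, y), max(x, y)
    return 2 * (lo - a) * (b - hi) / (b - a)

def expected_visits(n, i):
    """Candidate values of E^{x_j} N_i, j = 0..2^n, given by 2^n G_01(x_j, x_i)."""
    size = 2 ** n
    xs = [Fraction(j, size) for j in range(size + 1)]
    return [size * green(Fraction(0), Fraction(1), xs[j], xs[i]) for j in range(size + 1)]

def satisfies_recursion(n, i):
    """Check the boundary values and the recursion (41.17) at every interior point."""
    u = expected_visits(n, i)
    size = 2 ** n
    if u[0] != 0 or u[size] != 0:
        return False
    return all(
        u[j] == (1 if j == i else 0) + (u[j + 1] + u[j - 1]) / 2
        for j in range(1, size)
    )

print(all(satisfies_recursion(3, i) for i in range(1, 8)))
```

Since a bounded solution of (41.17) with zero boundary values is unique, passing this check identifies the formula in (41.18) on this lattice.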

Combining (41.16) and (41.18),
\[ E^x \tau_{(a,b)} = \sum_{i=1}^{2^n - 1} B(x_i)\,2^n G_{ab}(x, x_i) = \int G_{ab}(x, y)\,m_n(dy), \]
which is (41.15).


Using (41.15) and the same proof as that of Lemma 41.5, $m_n(a, b)$ is bounded above by a constant independent of n. By a diagonalization procedure, there exists a subsequence $n_k$ such that $m_{n_k}$ converges weakly to m, where m is a measure that is finite on every subinterval (a, b) such that [a, b] ⊂ I. By the continuity of $G_{ab}$,
\[ E^x \tau_{(a,b)} = \int G_{ab}(x, y)\,m(dy) \tag{41.19} \]
whenever a, b, and x are in $D_n$ for some n.

We now remove this last restriction. If a, b are not of this form, take $a_r$, $b_r$ to be in $\cup_n D_n$ such that $(a_r, b_r) \uparrow (a, b)$. Then $\tau_{(a_r, b_r)} \uparrow \tau_{(a,b)}$, and by the continuity of $G_{ab}$ in a, b, x, and y, we have (41.19) for all a and b. Take $y_r \uparrow x$, $z_r \downarrow x$ such that $y_r$ and $z_r$ are in $D_n$ for some n. By the strong Markov property,
\[ E^x \tau_{(a,b)} = E^x \tau_{(y_r, z_r)} + E^{y_r} \tau_{(a,b)}\,P^x(X_{\tau_{(y_r, z_r)}} = y_r) + E^{z_r} \tau_{(a,b)}\,P^x(X_{\tau_{(y_r, z_r)}} = z_r). \]
By the continuity of $G_{ab}$ in x, and the fact that $E^x \tau_{(y_r, z_r)} \to 0$ as r → ∞, we obtain (41.19) for all x.

We leave the uniqueness as Exercise 41.3.

Finally, let $I_k$ be finite subintervals increasing up to R. Let $m_k$ be the speed measure for $X_t$ on the interval $I_k$. By the uniqueness result, $m_k$ agrees with $m_\ell$ on $I_\ell$ if $I_\ell \subset I_k$. Setting m to be the measure whose restriction to $I_k$ is $m_k$ gives us the speed measure.

The speed measure completely characterizes occupation times.

Corollary 41.7 Suppose $X_t$ is a diffusion on natural scale on R. If f is bounded and measurable, for each a < b,
\[ E^x \int_0^{\tau_{(a,b)}} f(X_s)\,ds = \int G_{ab}(x, y) f(y)\,m(dy). \tag{41.20} \]
Proof Suppose that f is continuous and bounded on [a, b]. Let $x_i$, $S_j$, $B(x_i)$, $N_i$, and $m_n$ be as in the proof of Theorem 41.6. Let
\[ \varepsilon_n = \sup\{ |f(x) - f(y)| : |x - y| \le 2^{-n} \}. \]
Note that if (x − a)/(b − a) is a multiple of $2^{-n}$,
\[ E^x \int_0^{\tau_{(a,b)}} f(X_s)\,ds = \sum_{j=0}^\infty E^x \int_{S_j}^{S_{j+1}} f(X_s)\,ds \tag{41.21} \]
and
\[ E^x \sum_{j=0}^\infty f(X_{S_j})(S_{j+1} - S_j) = E^x \sum_{j=0}^\infty f(X_{S_j}) 1_{(a,b)}(X_{S_j})\,E^{X_{S_j}} S_1 = \sum_{i=1}^{2^n - 1} f(x_i) B(x_i) 1_{(a,b)}(x_i)\,E^x N_i. \tag{41.22} \]
Moreover, the right-hand side of (41.21) differs from the left-hand side of (41.22) by at most $\varepsilon_n E^x \tau_{(a,b)}$. By (41.18) the right-hand side of (41.22) is equal to
\[ \sum_{i=1}^{2^n - 1} 2^n f(x_i) B(x_i) 1_{(a,b)}(x_i) G_{ab}(x, x_i) = \int G_{ab}(x, y) f(y)\,m_n(dy). \]
By weak convergence along an appropriate subsequence, the left-hand side and the right-hand side of (41.20) differ by at most $\limsup_n \varepsilon_n E^x \tau_{(a,b)}$, which is zero. A limit argument then shows that (41.20) holds for all x ∈ [a, b], and another limit argument shows that (41.20) holds for all bounded f.
41.4 The uniqueness theorem
We next turn to showing that the speed measure characterizes the law of a diffusion.
Theorem 41.8 If $(X_t, P^x_i)$, i = 1, 2, are two diffusions on natural scale with the same speed measure m, then $P^x_1 = P^x_2$.
Proof We start by letting (a, b) ⊂ R and considering the operator
\[ R^i_\lambda f(x) = E^x_i \int_0^{\tau_{(a,b)}} e^{-\lambda t} f(X_t)\,dt, \qquad \lambda \ge 0, \tag{41.23} \]
for i = 1, 2. We show first that $R^1_0 = R^2_0$, that is, that
\[ E^x_1 \int_0^{\tau_{(a,b)}} f(X_t)\,dt = E^x_2 \int_0^{\tau_{(a,b)}} f(X_t)\,dt \]
if f is bounded and Borel measurable. This is easy, because by Corollary 41.7, both sides are equal to
\[ \int_a^b G_{ab}(x, y) f(y)\,m(dy). \]
Since $(\widehat{X}_t, P^x_i)$ is a Markov process, where $\widehat{X}$ is the process X killed on exiting (a, b), the resolvent equation (37.2) holds. We have
\[ \|R^i_0 f\|_\infty \le \|f\|_\infty \sup_x E^x \tau_{(a,b)} = \|f\|_\infty \sup_x \int G_{ab}(x, y)\,m(dy) \le c\|f\|_\infty\,m(a, b) < \infty. \]
Since $\|R^i_0\|_\infty < \infty$, we can let μ go to zero in (37.2). We can repeat the proof of Corollary 37.3 with λ = 0 to see that
\[ R^i_\mu f = R^i_0 f + \sum_{j=1}^\infty (-\mu)^j (R^i_0)^{j+1} f \]
provided $\mu < \|R^i_0\|_\infty^{-1}$, so that the series converges. We can then use Remark 37.4 to obtain that $R^1_\lambda = R^2_\lambda$ for all λ > 0. We

now take open intervals $I_n$ increasing up to R. Applying the above to $I_n$ and letting n → ∞, we have
\[ E^x_1 \int_0^\infty e^{-\lambda t} f(X_t)\,dt = E^x_2 \int_0^\infty e^{-\lambda t} f(X_t)\,dt \]
whenever f is bounded and Borel measurable and x ∈ R.

Suppose f is continuous as well. By the uniqueness of the Laplace transform, we see that $E^x_1 f(X_t) = E^x_2 f(X_t)$ for almost every t, and since both terms are continuous in t, this equality holds for all t. By a limit argument, this equality holds for all bounded and Borel measurable f. Therefore the one-dimensional distributions of X under $P^x_1$ and $P^x_2$ agree.
If s < t and f and g are bounded and Borel measurable,
\begin{align*}
E^x_1[f(X_s)g(X_t)] &= E^x_1[f(X_s)P^1_{t-s}g(X_s)] = E^x_1[f(X_s)P^2_{t-s}g(X_s)] \\
&= E^x_1[(f P^2_{t-s}g)(X_s)] = E^x_2[(f P^2_{t-s}g)(X_s)] \\
&= E^x_2[f(X_s)P^2_{t-s}g(X_s)] = E^x_2[f(X_s)g(X_t)].
\end{align*}
Here $P^i_{t-s}$ is the semigroup for $(X_t, P^x_i)$; since the one-dimensional distributions agree, $P^1_{t-s} = P^2_{t-s}$. We have thus shown the two-dimensional distributions of X under $P^x_1$ and $P^x_2$ agree. Continuing, we see that all the finite-dimensional distributions under $P^x_1$ and $P^x_2$ agree. By the continuity of the paths of X and Theorem 2.6, that is enough to show equality of $P^x_1$ and $P^x_2$.
41.5 Time change
We now want to show that if m is a measure such that 0 < m(a, b) < ∞ for all intervals [a, b], then there exists a regular diffusion on natural scale on R having m as a speed measure. If m(dx) had a density, say m(dx) = r(x) dx, we would proceed as follows. Let $W_t$ be a one-dimensional Brownian motion and let
\[ A_t = \int_0^t r(W_s)\,ds, \qquad B_t = \inf\{u : A_u > t\}, \qquad X_t = W_{B_t}. \]

In other words, we let $X_t$ be a certain time change of Brownian motion. In general, where m(dx) does not have a density, we make use of the local times $L^x_t$ of Brownian motion; see Chapter 14.
Let
\[ A_t = \int L^x_t\,m(dx), \qquad B_t = \inf\{u : A_u > t\}, \qquad X_t = W_{B_t}. \tag{41.24} \]
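When m has a density r, the time change can be sketched on a discretized path. The code below is only an illustration under stated assumptions: the Brownian path is replaced by a Gaussian random walk on a grid of mesh `dt`, the additive functional $A_t = \int_0^t r(W_s)\,ds$ is approximated by left Riemann sums, and $B_t$ is taken to be the first grid time at which A exceeds t. None of the function names come from the text.

```python
import bisect
import math
import random

def brownian_path(n_steps, dt, seed=0):
    """Gaussian random walk approximating Brownian motion on a grid of mesh dt."""
    rng = random.Random(seed)
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w

def time_change(w, dt, r, t_values):
    """Return approximate values X_t = W_{B_t} for each t in t_values.

    A is accumulated on the grid by left Riemann sums of r(W_s); B_t is the
    first grid index k with A[k] > t, mirroring B_t = inf{u : A_u > t}.
    """
    a = [0.0]
    for k in range(len(w) - 1):
        a.append(a[-1] + r(w[k]) * dt)
    out = []
    for t in t_values:
        k = bisect.bisect_right(a, t)      # first index with a[k] > t
        if k >= len(a):
            raise ValueError("path too short for the requested time")
        out.append(w[k])
    return out

w = brownian_path(1000, 0.01, seed=1)
# r = 2 everywhere: the clock A runs twice as fast, so X_t is W at time about t / 2
print(time_change(w, 0.01, lambda x: 2.0, [0.51, 1.01]))
```

A large value of r near a point makes A grow quickly there, so the time-changed process spends more time in that region; this is the sense in which m acts as a speed measure.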

Theorem 41.9 Let (Wt, Px) be a Brownian motion and m a measure on R such that 0 <
m(a, b) < ∞ for every finite interval (a, b). Then, under Px, Xt as defined by (41.24) is a
regular diffusion on natural scale with speed measure m.
Proof First we show that $X_t$ is a continuous process. Fix ω. If we choose $a < \inf_{s \le t} W_s$ and $b > \sup_{s \le t} W_s$, then
\[ A_t = \int L^x_t\,m(dx) = \int L^x_t 1_{[a,b]}(x)\,m(dx) \]
since $L^x_t$ increases only for those times s when $W_s = x$. By the continuity of $L^x_t$ and dominated convergence, we conclude that $A_t(\omega)$ is continuous at time t. Next we show that $A_t$ is strictly increasing. Fix ω. If s < u, pick t ∈ (s, u). Set $x = W_t$. Because the support of the measure $dL^x_t$ is the set $\{r : W_r = x\}$, then $L^x_u - L^x_s > 0$. By the continuity of local times, $L^y_u - L^y_s > 0$ for all y in a neighborhood of x, say (x − δ, x + δ). Since m(x − δ, x + δ) > 0, then $A_u - A_s > 0$. Hence $A_t$ is strictly increasing. This and the continuity of $A_t$ imply that $B_t$ is continuous, and therefore $X_t$ is continuous.

Next we show that $X_t$ is a regular diffusion on natural scale. By monotone convergence and the fact that $L^x_t \to \infty$, a.s., for each x, $A_t \uparrow \infty$, hence $B_t \uparrow \infty$, so $\tau^X_{(a,b)} < \infty$, $P^x$-a.s., where $\tau^X_{(a,b)}$ denotes the exit time of (a, b) by $X_t$ and $\tau^W_{(a,b)}$ denotes the corresponding exit time of $W_t$. Moreover,
\[ P^x(X(\tau^X_{(a,b)}) = b) = P^x(W(\tau^W_{(a,b)}) = b) = \frac{x - a}{b - a}, \]
since $X_t$ is a time change of $W_t$.
To verify the strong Markov property, we repeat the argument of Section 22.3. Let $\mathcal{F}'_t = \mathcal{F}_{B_t}$. Then if T is a stopping time for $\mathcal{F}'_t$, we have
\[ E^x[f(X_{T+t}) \mid \mathcal{F}'_T] = E^x[f(W(B_{T+t})) \mid \mathcal{F}_{B_T}]. \]
$B_T$ can be seen to be a stopping time for $\mathcal{F}_t$ and $B_{T+t} = B_t \circ \theta_{B_T}$ where $\theta_t$ are the shift operators, so this is
\[ E^x E^{W(B_T)} f(W_{B_t}) = E^x E^{X_T} f(X_t). \]
As in Section 20.3, this suffices to show that $X_t$ is a strong Markov process.
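It remains to determine the speed measure of $X_t$. Fix (a, b) and write $\tau_X$ for $\tau^X_{(a,b)}$ and $\tau_W$ for $\tau^W_{(a,b)}$. We have
\begin{align*}
E^x \tau_X &= E^x \int_0^\infty 1_{(a,b)}(X_{s \wedge \tau_X})\,ds \\
&= E^x \int_0^\infty 1_{(a,b)}(W_{B_{s \wedge \tau_X}})\,ds \\
&= E^x \int_0^\infty 1_{(a,b)}(W_{t \wedge \tau_W})\,dA_t \\
&= E^x \int \int_0^\infty 1_{(a,b)}(W_{t \wedge \tau_W})\,dL^y_t\,m(dy) \\
&= E^x \int \int_0^{\tau_W} dL^y_t\,m(dy) = \int E^x L^y_{\tau_W}\,m(dy).
\end{align*}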
We also have
\[ E^x L^y_{\tau_W} = E^x |W_{\tau_W} - y| - |x - y| \]
by (14.5). This is equal to
\[ |a - y|\,P^x(W_{\tau_W} = a) + |b - y|\,P^x(W_{\tau_W} = b) - |x - y| = |a - y|\frac{b - x}{b - a} + |b - y|\frac{x - a}{b - a} - |x - y| = G_{ab}(x, y). \]
We thus have
\[ E^x \tau_X = \int G_{ab}(x, y)\,m(dy), \]
as required.
As a corollary to the proof, we see that a regular diffusion on natural scale is a local
martingale, since it is a time change of Brownian motion.
41.6 Examples
Let us calculate the scale function and the speed measure for some examples of diffusions.
First we need to connect the speed measure with the coefficients of an SDE.
Let us look at the solutions to the SDE (41.4), but now suppose b is identically zero, or
dXt = σ (Xt ) dWt . We again set a(x) = σ (x)2.
Theorem 41.10 Suppose $c_1 < \sigma(x) < c_2$ for all x and σ is continuous. The speed measure of $X_t$ is given by
\[ m(dx) = \frac{1}{a(x)}\,dx. \]
Proof Since $dX_t = \sigma(X_t)\,dW_t$, then $\langle X\rangle_t = \int_0^t a(X_s)\,ds$. To obtain a Brownian motion $W_t$ by time-changing the martingale $X_t$, we must time-change by the inverse of $\langle X\rangle_t$. On the other hand, from Theorem 41.9, $X_t$ is the time-change of a Brownian motion by $B_t$, where $B_t$ is given by (41.24). Hence
\[ B_t = \langle X\rangle_t = \int_0^t a(X_s)\,ds. \]
The inverse of $B_t$, namely, $A_t$, must then satisfy
\[ \frac{dA_t}{dt} = \frac{1}{a(X_{A_t})} = \frac{1}{a(W_t)}, \]
or
\[ A_t = \int_0^t \frac{1}{a(W_s)}\,ds = \int L^y_t \frac{1}{a(y)}\,dy \]
for all t, using Theorem 14.4. However, $A_t = \int L^y_t\,m(dy)$ by (41.24). Hence
\[ \int L^y_t \frac{1}{a(y)}\,dy = \int L^y_t\,m(dy). \]
We know $E^x L^y_{\tau_{(c,d)}} = G_{cd}(x, y)$. Therefore
\[ \int G_{cd}(x, y)\,m(dy) = \int E^x L^y_{\tau_{(c,d)}}\,m(dy) = \int E^x L^y_{\tau_{(c,d)}} \frac{1}{a(y)}\,dy = \int G_{cd}(x, y) \frac{1}{a(y)}\,dy \]
for all c, d, and x, which implies m(dy) = (1/a(y)) dy.
Now we can look at some examples and do calculations.
Brownian motion with constant drift. This process is the solution to the SDE $dX_t = dW_t + b\,dt$. From Theorem 41.1, $s(x) = \exp(-2bx)$ is the scale function. If $Y_t = s(X_t)$, then $(s'\sigma)(s^{-1}(y)) = -2by$, or $Y_t$ corresponds to the operator $2b^2y^2 f''$, and the speed measure is $(4b^2y^2)^{-1}\,dy$.
Bessel processes. The process is only defined on the state space [0, ∞) instead of all of R and there is a boundary condition at 0. We ignore this here and consider a Bessel process of order ν up until the first hit of 0. Then X solves the SDE
\[ dX_t = dW_t + \frac{\nu - 1}{2X_t}\,dt. \]
If ν ≠ 2, a calculation using Theorem 41.1 shows that $s(x) = x^{2-\nu}$. Then $Y_t = s(X_t)$ satisfies
\[ dY_t = (2 - \nu)\,Y_t^{(1-\nu)/(2-\nu)}\,dW_t, \]
and the speed measure is
\[ m(dx) = (2 - \nu)^{-2} x^{(2\nu-2)/(2-\nu)}\,dx, \qquad x > 0. \]
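The calculation for the Bessel process can be written out as follows. This is a sketch of one way to carry it out, taking $x_0 = 1$ in (41.5) as a convenient choice; it uses $a \equiv 1$, $b(w) = (\nu - 1)/(2w)$, and Theorem 41.10 applied to $Y_t$.

```latex
\begin{align*}
\int_1^y \frac{2b(w)}{a(w)}\,dw &= (\nu - 1)\log y,
\qquad \exp\Big(-\int_1^y \frac{2b(w)}{a(w)}\,dw\Big) = y^{1-\nu}, \\
s(x) &= c_1 + c_2 \int_1^x y^{1-\nu}\,dy = c_1 + c_2\,\frac{x^{2-\nu} - 1}{2 - \nu}
      \;=\; x^{2-\nu} \quad\text{for } c_1 = 1,\ c_2 = 2 - \nu, \\
s'(x)\sigma(x) &= (2 - \nu)x^{1-\nu}, \qquad x = s^{-1}(y) = y^{1/(2-\nu)}, \\
(s'\sigma)(s^{-1}(y)) &= (2 - \nu)\,y^{(1-\nu)/(2-\nu)}, \\
\frac{1}{\big[(s'\sigma)(s^{-1}(y))\big]^2} &= (2 - \nu)^{-2}\,y^{(2\nu-2)/(2-\nu)}.
\end{align*}
```

Note that $c_2 = 2 - \nu$ is negative when ν > 2, in which case $x^{2-\nu}$ is decreasing and $-x^{2-\nu}$ is the increasing version; the diffusion coefficient squared, and hence the speed measure density, is unaffected by this sign.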

Exercises

41.1 In the proof of Proposition 41.2 we used the strong Markov property numerous times. Write

out carefully in terms of shift operators and conditional expectations how the strong Markov

property is applied in each case.

41.2 Give a rigorous proof of (41.9).

41.3 Show that if
\[ \int G_{ab}(x, y)\,m_1(dy) = \int G_{ab}(x, y)\,m_2(dy) \]
for all x, a, and b, then $m_1 = m_2$.

41.4 Show that if X is a Bessel process of order 2, then the scale function is given by s(x) = log x, $Y_t = s(X_t)$ satisfies $dY_t = e^{-Y_t}\,dW_t$, and the speed measure is $m(dx) = e^{2x}\,dx$.

41.5 Suppose X is a regular diffusion whose state space is R. Prove that X is on natural scale if and only if
\[ P^{(a+b)/2}(T_a < T_b) = \tfrac12 \]
whenever a < b.
41.6 Let a > 0 and let $m(dx) = dx + a\,\delta_0(dx)$, where $\delta_0$ is the point mass at 0. Let $(X_t, P^x)$ be the diffusion on the line on natural scale whose speed measure is given by m. Show that under $P^0$,
\[ \int_0^t 1_{\{0\}}(X_s)\,ds > 0 \]
with probability one for each t > 0. Prove that for each t > 0, $Z_t = \{s \le t : X_s = 0\}$ contains no intervals. Thus the process X spends an amount of time at 0 that has positive Lebesgue measure, but the zero set contains no intervals.

41.7 Define
\[ m_a(dx) = \begin{cases} dx, & x \ge 0, \\ a\,dx, & x < 0. \end{cases} \]
Let $(X_t, P^x_a)$ be the diffusion on natural scale on the line whose speed measure is given by $m_a$. Suppose x > 0.
Prove that if a → ∞, then $P^x_a$ converges weakly to the law of Brownian motion absorbed (i.e., killed) at 0, started at x. What do you think happens when a → 0?

Notes

We have considered diffusions on R but most of what we discussed goes through for diffusions

whose state space is an interval properly contained in R. In this case, one must specify what

the process does when it hits the boundary. Being absorbed (i.e., killed) or reflected are two

options, but much more complicated behavior is possible. See Itô and McKean (1965) and

Knight (1981) for the complete story.

42

Lévy processes

A Lévy process is a process with stationary and independent increments whose paths are

right continuous with left limits. Having stationary increments means that the law of Xt − Xs

is the same as the law of Xt−s −X0 whenever s < t. Saying that X has independent increments
means that Xt − Xs is independent of σ (Xr; r ≤ s) whenever s < t.
We want to examine the structure of Lévy processes. We have three examples already: the
Poisson process, Brownian motion, and the deterministic process Xt = t. It turns out that all
Lévy processes can be built up out of these building blocks. We will show how to construct
Lévy processes and give a representation of an arbitrary Lévy process.
Recall that we use Xt− = lims

K, a.s.

Lemma 42.3 If $X_t$ is a Lévy process with bounded jumps and with $X_0 = 0$, then $X_t$ has moments of all orders, that is, $E|X_t|^p < \infty$ for all positive integers p.
Proof Suppose the jumps of $X_t$ are bounded in absolute value by K. Since $X_t$ is right continuous with left limits, there exists M > K such that $P(\sup_{s \le t} |X_s| \ge 2M) \le 1/2$.
Let $T_1 = \inf\{t : |X_t| \ge M\}$ and $T_{i+1} = \inf\{t > T_i : |X_t - X_{T_i}| > M\}$. For $s < T_1$, $|X_s| \le M$, and then $|X_{T_1}| \le |X_{T_1-}| + |\Delta X_{T_1}| \le M + K \le 2M$. We have
\begin{align*}
P\big(\sup_{s \le t} |X_s| \ge 2(i+1)M\big) &\le P(T_{i+1} \le t) \le P(T_i \le t,\ T_{i+1} - T_i \le t) \\
&= P\big(\sup_{s \le t} |X_{T_i + s} - X_{T_i}| \ge 2M,\ T_i \le t\big) \\
&= P\big(\sup_{s \le t} |X_s| \ge 2M\big)\,P(T_i \le t) \\
&\le \tfrac12 P(T_i \le t),
\end{align*}
using Lemma 42.2 in the last equality. By induction, $P(\sup_{s \le t} |X_s| \ge 2iM) \le 2^{-i}$, and the lemma now follows immediately.
A key lemma is the following.
Lemma 42.4 Suppose I is a finite interval of the form (a, b), [a, b), (a, b], or [a, b] with a > 0 and m is a finite measure on R giving no mass to $I^c$. Then there exists a Lévy process $X_t$ satisfying (42.3).

Proof First let us consider the case where I = [a, b). We approximate m by a discrete measure. If n ≥ 1, let $z_j = a + j(b - a)/n$, $j = 0, \ldots, n - 1$, and let
\[ m_n(dx) = \sum_{j=0}^{n-1} m([z_j, z_{j+1}))\,\delta_{z_j}(dx), \]
where $\delta_{z_j}$ is the point mass at $z_j$. The measures $m_n$ converge weakly to m as n → ∞ in the sense that
\[ \int f(x)\,m_n(dx) \to \int f(x)\,m(dx) \]
whenever f is a bounded continuous function on R. For each n, let $P^{n,j}_t$, $j = 0, \ldots, n - 1$, be independent Poisson processes with parameters $m([z_j, z_{j+1}))$ and let
\[ X^n_t = \sum_{j=0}^{n-1} z_j P^{n,j}_t. \]
Then $X^n$ is a Lévy process with jumps bounded by b.

By Lemma 42.2, if $T_n$ is a stopping time for $X^n$, ε > 0, and δ > 0, then
\[ P(|X^n_{T_n + \delta} - X^n_{T_n}| > \varepsilon) = P(|X^n_\delta| > \varepsilon) \le P(X^n_\delta \ne 0) \le P\Big( \sum_{j=0}^{n-1} P^{n,j}_\delta \ne 0 \Big). \tag{42.5} \]
Since the sum of independent Poisson processes is a Poisson process, then $\sum_{j=0}^{n-1} P^{n,j}_t$ is a Poisson process with parameter
\[ \sum_{j=0}^{n-1} m([z_j, z_{j+1})) = m(I). \]
The last line of (42.5) is then bounded by
\[ 1 - e^{-\delta m(I)} \le \delta m(I), \]
which tends to zero uniformly in n as δ → 0. Note $X^n_0 = 0$, a.s. We can therefore apply the Aldous criterion (Theorem 34.8) to see that the $X^n$ are tight with respect to weak convergence on the space $D[0, t_0)$ for any $t_0$.

Any subsequential weak limit X will have paths that are right continuous with left limits. For any continuous bounded function f on R,
\[ E f(X^n_t - X^n_s) = E f(X^n_{t-s} - X^n_0). \]
Passing to the limit along an appropriate subsequence,
\[ E f(X_t - X_s) = E f(X_{t-s} - X_0). \]
Since f is an arbitrary bounded continuous function, we see that the laws of $X_t - X_s$ and $X_{t-s} - X_0$ are the same. Similarly we prove the increments are independent.
Since $x \to e^{iux}$ is a bounded continuous function and $m_n$ converges weakly to m, starting with
\[ E \exp(iuX^n_t) = \exp\Big( t \int [e^{iux} - 1]\,m_n(dx) \Big), \]
and passing to the limit, we obtain that the characteristic function of X under P is given by (42.3).
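The passage to the limit in the exponent can be illustrated numerically. The sketch below is an illustration only: it takes m to be Lebesgue measure on [1, 2) (an arbitrary choice made here), builds the atoms of $m_n$ as in the proof, and compares $\int [e^{iux} - 1]\,m_n(dx)$ with the exact value of $\int [e^{iux} - 1]\,m(dx)$. The function names and the `mass_of` callback are hypothetical.

```python
import cmath

def discretize(mass_of, a, b, n):
    """Atoms (z_j, m([z_j, z_{j+1}))) of the approximating measure m_n on [a, b)."""
    step = (b - a) / n
    return [(a + j * step, mass_of(a + j * step, a + (j + 1) * step)) for j in range(n)]

def exponent(atoms, u):
    """Integral of (e^{iux} - 1) against the discrete measure given by atoms."""
    return sum(w * (cmath.exp(1j * u * z) - 1) for z, w in atoms)

def char_fn(atoms, u, t):
    """E exp(iu X^n_t) for X^n_t = sum_j z_j P^{n,j}_t with independent Poisson P^{n,j}."""
    return cmath.exp(t * exponent(atoms, u))

# m = Lebesgue measure on [1, 2): the mass of [lo, hi) is hi - lo
atoms = discretize(lambda lo, hi: hi - lo, 1.0, 2.0, 1000)
u = 1.0
exact = (cmath.exp(2j * u) - cmath.exp(1j * u)) / (1j * u) - 1.0
print(abs(exponent(atoms, u) - exact))
```

The discrepancy is of order 1/n here because each atom sits at the left endpoint of its cell.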

If now the interval I contains the point b, we follow the above proof, except we let $P^{n,n-1}_t$ be a Poisson process with parameter $m([z_{n-1}, b])$. Similarly, if I does not contain the point a, we change $P^{n,0}_t$ to be a Poisson process with parameter $m((a, z_1))$. With these changes, the proof works for intervals I, whether or not they contain either of their endpoints.
Remark 42.5 If X is the Lévy process constructed in Lemma 42.4, then $Y_t = X_t - E X_t$ will be a Lévy process satisfying (42.4).

Here is the main theorem of this section.

Theorem 42.6 Suppose m is a measure on R with m({0}) = 0 and
\[ \int (1 \wedge x^2)\,m(dx) < \infty. \]
Suppose b ∈ R and σ ≥ 0. There exists a Lévy process $X_t$ such that
\[ E e^{iuX_t} = \exp\Big( t \Big\{ iub - \sigma^2u^2/2 + \int_{\mathbb{R}} [e^{iux} - 1 - iux 1_{(|x| \le 1)}]\,m(dx) \Big\} \Big). \tag{42.6} \]
The above equation is called the Lévy–Khintchine formula. The measure m is called the Lévy measure. If we let
\[ m(dx) = \frac{1 + x^2}{x^2}\,m'(dx) \]
and
\[ b = b' + \int_{(|x| \le 1)} \frac{x^3}{1 + x^2}\,m(dx) - \int_{(|x| > 1)} \frac{x}{1 + x^2}\,m(dx), \]
then we can also write
\[ E e^{iuX_t} = \exp\Big( t \Big\{ iub' - \sigma^2u^2/2 + \int_{\mathbb{R}} \Big[ e^{iux} - 1 - \frac{iux}{1 + x^2} \Big] \frac{1 + x^2}{x^2}\,m'(dx) \Big\} \Big). \]
Both expressions for the Lévy–Khintchine formula are in common use.
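The equivalence of the two forms can be checked numerically for a Lévy measure with finitely many atoms. The sketch below is illustrative: the atoms, drift, and σ are arbitrary test values chosen here, and the second exponent is evaluated against m directly, using $\frac{1+x^2}{x^2}\,m'(dx) = m(dx)$. The function names are hypothetical.

```python
import cmath

def exponent_truncated(u, b, sigma, atoms):
    """Exponent in (42.6): compensator iux 1_{|x| <= 1}; atoms are (x, mass) pairs of m."""
    jump = sum(
        w * (cmath.exp(1j * u * x) - 1 - (1j * u * x if abs(x) <= 1 else 0))
        for x, w in atoms
    )
    return 1j * u * b - sigma ** 2 * u ** 2 / 2 + jump

def drift_prime(b, atoms):
    """Solve b = b' + int_{|x|<=1} x^3/(1+x^2) dm - int_{|x|>1} x/(1+x^2) dm for b'."""
    small = sum(w * x ** 3 / (1 + x ** 2) for x, w in atoms if abs(x) <= 1)
    large = sum(w * x / (1 + x ** 2) for x, w in atoms if abs(x) > 1)
    return b - small + large

def exponent_smooth(u, b_prime, sigma, atoms):
    """Exponent in the second form: compensator iux / (1 + x^2), integrated against m."""
    jump = sum(
        w * (cmath.exp(1j * u * x) - 1 - 1j * u * x / (1 + x ** 2))
        for x, w in atoms
    )
    return 1j * u * b_prime - sigma ** 2 * u ** 2 / 2 + jump

atoms = [(-2.0, 0.3), (-0.5, 1.2), (0.25, 4.0), (3.0, 0.1)]
b, sigma = 0.7, 1.3
bp = drift_prime(b, atoms)
for u in (-2.0, 0.5, 3.0):
    print(abs(exponent_truncated(u, b, sigma, atoms) - exponent_smooth(u, bp, sigma, atoms)))
```

The printed differences should be at the level of floating-point rounding, reflecting that the change of compensator is absorbed entirely by the drift.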

Proof Let m(dx) be a measure supported on (0, 1] with $\int x^2\,m(dx) < \infty$. Let $m_n(dx)$ be the measure m restricted to $(2^{-n}, 2^{-n+1}]$. Let $Y^n_t$ be independent Lévy processes whose characteristic functions are given by (42.4) with m replaced by $m_n$; see Remark 42.5. Note $E Y^n_t = 0$ for all n by Remark 42.1. By the independence of the $Y^n$'s, if M < N,
\[ E\Big( \sum_{n=M}^N Y^n_t \Big)^2 = \sum_{n=M}^N E(Y^n_t)^2 = \sum_{n=M}^N t \int x^2\,m_n(dx) = t \int_{2^{-N}}^{2^{-M}} x^2\,m(dx). \]
By our assumption on m, this goes to zero as M, N → ∞, and we conclude that $\sum_{n=0}^N Y^n_t$ converges in $L^2$ for each t. Call the limit $Y_t$. It is routine to check that $Y_t$ has independent and stationary increments. Each $Y^n_t$ has independent increments and is mean zero, so
\[ E[Y^n_t - Y^n_s \mid \mathcal{F}_s] = E[Y^n_t - Y^n_s] = 0, \]
or $Y^n$ is a martingale. By Doob's inequalities and the $L^2$ convergence,
\[ E \sup_{s \le t} \Big| \sum_{n=M}^N Y^n_s \Big|^2 \to 0 \]
as M, N → ∞, and hence there exists a subsequence $M_k$ such that $\sum_{n=1}^{M_k} Y^n_s$ converges uniformly over s ≤ t, a.s. Therefore the limit $Y_t$ will have paths that are right continuous with left limits.
If m is a measure supported in (1, ∞) with m(R) < ∞, we do a similar procedure starting with Lévy processes whose characteristic functions are of the form (42.3). We let $m_n(dx)$ be the restriction of m to $(2^n, 2^{n+1}]$, let $X^n_t$ be independent Lévy processes corresponding to $m_n$, and form $X_t = \sum_{n=0}^\infty X^n_t$. Since m(R) < ∞, for each $t_0$, the number of times t less than $t_0$ at which any one of the $X^n_t$ jumps is finite. This shows $X_t$ has paths that are right continuous with left limits, and it is easy to then see that $X_t$ is a Lévy process.
Finally, suppose $\int (x^2 \wedge 1)\, m(dx) < \infty$. Let $X^1_t$, $X^2_t$ be Lévy processes with characteristic functions given by (42.3) with m replaced by the restriction of m to (1, ∞) and (−∞, −1), respectively, let $X^3_t$, $X^4_t$ be Lévy processes with characteristic functions given by (42.4) with m replaced by the restriction of m to (0, 1] and [−1, 0), respectively, let $X^5_t = bt$, and let $X^6_t$ be σ times a Brownian motion. Suppose the $X^i$'s are all independent. Then their sum will be a Lévy process whose characteristic function is given by (42.6).
A key step in the construction was the centering of the Poisson processes to get Lévy
processes with characteristic functions given by (42.4). Without the centering one is forced
to work only with characteristic functions given by (42.3).
42.3 Representation of Lévy processes
We now work toward showing that every Lévy process has a characteristic function of the
form given by (42.6).
Lemma 42.7 If $X_t$ is a Lévy process and A is a Borel subset of R that is a positive distance from 0, then
\[ N_t(A) = \sum_{s \le t} 1_A(\Delta X_s) \]
is a Poisson process.
Saying that A is a positive distance from 0 means that inf{|x| : x ∈ A} > 0.

Proof Since Xt has paths that are right continuous with left limits and A is a positive distance

from 0, then there can only be finitely many jumps of X that lie in A in any finite time interval,

and so Nt (A) is finite and has paths that are right continuous with left limits. It follows from

the fact that Xt has stationary and independent increments that Nt (A) also has stationary and

independent increments. We now apply Proposition 5.4.

Theorem 42.8 Let $X_t$ be a Lévy process with $X_0 = 0$ and let $A_1, \ldots, A_n$ be disjoint bounded Borel subsets of (0, ∞), each a positive distance from 0. Set
\[ N_t(A_k) = \sum_{s \le t} 1_{A_k}(\Delta X_s) \]
and
\[ Y_t = X_t - \sum_{k=1}^n N_t(A_k). \]
Then the processes $N_t(A_1), \ldots, N_t(A_n)$, and $Y_t$ are mutually independent.

Proof Define $\lambda(A) = E N_1(A)$. The previous lemma shows that if λ(A) < ∞, then $N_t(A)$ is a Poisson process, and clearly its parameter is λ(A). The result now follows from Theorem 18.3.
Here is the representation theorem for Lévy processes.
Theorem 42.9 Suppose $X_t$ is a Lévy process with $X_0 = 0$. Then there exists a measure m on R − {0} with
\[ \int (1 \wedge x^2)\, m(dx) < \infty \]
and real numbers b and σ such that the characteristic function of $X_t$ is given by (42.6).
Proof Define $m(A) = E N_1(A)$ if A is a bounded Borel subset of (0, ∞) that is a positive distance from 0. Since $N_1(\cup_{k=1}^\infty A_k) = \sum_{k=1}^\infty N_1(A_k)$ if the $A_k$ are pairwise disjoint and each is a positive distance from 0, we see that m is a measure on [a, b] for each 0 < a < b < ∞, and m extends uniquely to a measure on (0, ∞).
First we want to show that $\sum_{s \le t} \Delta X_s 1_{(\Delta X_s > 1)}$ is a Lévy process with characteristic function
\[ \exp\Big( t \int_1^\infty [e^{iux} - 1]\, m(dx) \Big). \]
Since the characteristic function of the sum of independent random variables is equal to the product of the characteristic functions, it suffices to suppose 0 < a < b and to show that
\[ E e^{iuZ_t} = \exp\Big( t \int_{(a,b]} [e^{iux} - 1]\, m(dx) \Big), \]
where
\[ Z_t = \sum_{s \le t} \Delta X_s 1_{(a,b]}(\Delta X_s). \]
Let n > 1 and $z_j = a + j(b - a)/n$. By Lemma 42.7, $N_t((z_j, z_{j+1}])$ is a Poisson process with parameter
\[ \lambda_j = E N_1((z_j, z_{j+1}]) = m((z_j, z_{j+1}]). \]
Thus $Z^n_t = \sum_{j=0}^{n-1} z_j N_t((z_j, z_{j+1}])$ has characteristic function
\[ \prod_{j=0}^{n-1} \exp\big( t \lambda_j (e^{iuz_j} - 1) \big) = \exp\Big( t \sum_{j=0}^{n-1} (e^{iuz_j} - 1) \lambda_j \Big), \]
which is equal to
\[ \exp\Big( t \int (e^{iux} - 1)\, m_n(dx) \Big), \tag{42.7} \]
where $m_n(dx) = \sum_{j=0}^{n-1} \lambda_j \delta_{z_j}(dx)$. Since $Z^n_t$ converges to $Z_t$ as n → ∞, passing to the limit shows that $Z_t$ has a characteristic function of the form (42.6).
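
The product formula leading to (42.7) can be checked numerically. The Python sketch below is an illustration under stated assumptions: the list `atoms` of pairs $(z_j, \lambda_j)$ is hypothetical data, and the expectation is computed by truncating the Poisson series for each independent process rather than by simulation.

```python
import cmath
import math

def char_fn_series(u, t, atoms, terms=150):
    """E exp(iu * sum_j z_j N_j(t)) for independent Poisson processes N_j with rates lam_j.

    The expectation factors over j; each factor is the series
    sum_n P(N_j(t) = n) * exp(i u z_j n), truncated after `terms` terms.
    """
    value = 1.0 + 0.0j
    for z, lam in atoms:
        pmf = math.exp(-lam * t)          # P(N_j(t) = 0)
        factor = 0.0j
        for n in range(terms):
            factor += pmf * cmath.exp(1j * u * z * n)
            pmf *= lam * t / (n + 1)      # P(N_j(t) = n + 1) from P(N_j(t) = n)
        value *= factor
    return value

def char_fn_closed_form(u, t, atoms):
    """exp(t * integral of (e^{iux} - 1) m_n(dx)) with m_n = sum_j lam_j * delta_{z_j}."""
    exponent = sum(lam * (cmath.exp(1j * u * z) - 1) for z, lam in atoms)
    return cmath.exp(t * exponent)

atoms = [(0.5, 2.0), (1.25, 0.7)]  # hypothetical (z_j, lam_j) pairs
gap = abs(char_fn_series(1.3, 1.5, atoms) - char_fn_closed_form(1.3, 1.5, atoms))
print(gap)  # should be at the level of floating-point rounding
```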

Next we show that m(1, ∞) < ∞. (We write m(1, ∞) instead of m((1, ∞)) for esthetic reasons.) If not, m(1, K) → ∞ as K → ∞. Then for each fixed L and each fixed t,
\[ \limsup_{K \to \infty} P(N_t(1, K) \le L) = \limsup_{K \to \infty} \sum_{j=0}^{L} e^{-t m(1,K)} \frac{(t\, m(1, K))^j}{j!} = 0. \]
This implies that $N_t(1, \infty) = \infty$ for each t. However, this contradicts the fact that $X_t$ has paths that are right continuous with left limits.
We define m on (−∞, 0) similarly.
We now look at
\[ Y_t = X_t - \sum_{s \le t} \Delta X_s 1_{(|\Delta X_s| > 1)}. \]

This is again a Lévy process, and we need to examine its structure. This process has bounded jumps, hence has moments of all orders. By subtracting $c_1 t$ for an appropriate constant $c_1$, we may suppose $Y_t$ has mean 0. Let $I_1, I_2, \ldots$ be an ordering of the intervals $\{[2^{-(m+1)}, 2^{-m}), (-2^{-m}, -2^{-(m+1)}] : m \ge 0\}$. Let
\[ \tilde X^k_t = \sum_{s \le t} \Delta X_s 1_{(\Delta X_s \in I_k)} \]
and let $X^k_t = \tilde X^k_t - E \tilde X^k_t$. By Corollary 18.3 and the fact that all the $X^k$ have mean zero,
\[ \sum_{k=1}^\infty E (X^k_t)^2 \le E\Big[ \Big( Y_t - \sum_{k=1}^\infty X^k_t \Big)^2 \Big] + E\Big[ \Big( \sum_{k=1}^\infty X^k_t \Big)^2 \Big] = E (Y_t)^2 < \infty. \]
Hence
\[ E\Big[ \sum_{k=M}^N X^k_t \Big]^2 = \sum_{k=M}^N E (X^k_t)^2 \]
tends to zero as M, N → ∞, and thus $Y_t - \sum_{k=1}^N X^k_t$ converges in $L^2$. The limit, $X^c_t$, say, will be a Lévy process independent of all the $X^k_t$. Moreover, $X^c$ has no jumps, i.e., it is continuous. Since all the $X^k$ have mean zero, then $E X^c_t = 0$. By the independence of the increments,
\[ E[X^c_t - X^c_s \mid \mathcal{F}_s] = E[X^c_t - X^c_s] = 0, \]
and we see $X^c$ is a continuous martingale. Using the stationarity and independence of the increments,
\begin{align*}
E[(X^c_{s+t})^2] &= E[(X^c_s)^2] + 2E[X^c_s (X^c_{s+t} - X^c_s)] + E[(X^c_{s+t} - X^c_s)^2] \\
&= E[(X^c_s)^2] + E[(X^c_t)^2],
\end{align*}
which implies that there exists a constant $c_2$ such that $E (X^c_t)^2 = c_2 t$. We then have
\begin{align*}
E[(X^c_t)^2 - c_2 t \mid \mathcal{F}_s] &= (X^c_s)^2 - c_2 s + E[(X^c_t - X^c_s)^2 \mid \mathcal{F}_s] - c_2 (t - s) \\
&= (X^c_s)^2 - c_2 s + E[(X^c_t - X^c_s)^2] - c_2 (t - s) \\
&= (X^c_s)^2 - c_2 s.
\end{align*}
The quadratic variation process of $X^c$ is therefore $c_2 t$, and by Lévy's theorem (Theorem 12.1), $X^c_t / \sqrt{c_2}$ is a Brownian motion, so $X^c$ is a constant multiple of Brownian motion.
To complete the proof, it remains to show that $\int_{-1}^{1} x^2\, m(dx) < \infty$. But by Remark 42.1,
\[ \int_{I_k} x^2\, m(dx) = E (X^k_1)^2, \]
and we have seen that
\[ \sum_k E (X^k_1)^2 \le E Y_1^2 < \infty. \]
Combining gives the finiteness of $\int_{-1}^{1} x^2\, m(dx)$.
Exercises
42.1 Let α ∈ (0, 2) and let X be a Lévy process where b = σ = 0 in the Lévy–Khintchine formula
and the Lévy measure is m(dx) = c|x|−1−α dx. Show that if a > 0 and Yt = a1/αXat , then Y has

the same law as X . The process X is known as a symmetric stable process of index α.

42.2 Suppose $W_t = (W^1_t, W^2_t)$ is a two-dimensional Brownian motion started at 0. Let $\tau_s = \inf\{t > 0 : W^1_t > s\}$. Prove that $W^2_{\tau_t}$ is a Lévy process and determine the Lévy measure.
Hint: Use scaling to make a guess.

42.3 Let W be a one-dimensional Brownian motion and let $L^0$ be the local time at 0. Let $T_t$ be the inverse of $L^0$, that is, $T_t = \inf\{s : L^0_s \ge t\}$. Show $T_t$ is a Lévy process and determine the Lévy measure.
Hint: Use scaling to get started.

42.4 Let $W_t$ be a one-dimensional Brownian motion, $L^y_t$ the local time at level y, and $T_t$ the inverse local time at 0, that is, $T_t = \inf\{s : L^0_s \ge t\}$. Let x > 0 be fixed. Prove that $L^x_{T_t}$ is a Lévy process.

42.5 Let X be a Lévy process with Lévy measure m. Prove that if A and B are disjoint closed sets, then
\[ E^x \sum_{s \le t} 1_A(X_{s-}) 1_B(X_s) = E^x \int_0^t 1_A(X_s)\, m(B - X_s)\, ds \]
for each x, where B − y = {z − y : z ∈ B}. This is the Lévy system formula in the case of Lévy processes. There is an analogous formula for Hunt processes.

42.6 A stable subordinator X of order α ∈ (0, 1) is a Lévy process whose characteristic function is given by (42.6), where $b = \sigma^2 = 0$ and $m(dx) = c 1_{(x>0)} |x|^{-\alpha-1}\, dx$. Suppose X is a stable subordinator of index α and W is a Brownian motion. Show that, up to a deterministic time change, the process $Z_t = W_{X_t}$ is a symmetric stable process of index 2α.
Hint: Start by using scaling.

42.7 Let Zt be a symmetric stable process of order α ∈ (0, 2). Show that if ε > 0, then

lim

t→∞

|Zt |

tα+ε

= 0, a.s.

Appendix A

Basic probability

This appendix covers the facts from basic probability that we will need. The presentation

here is not precisely what I use when I teach such a course. For example, in a course I

prove the strong law of large numbers without using martingales, I present the inversion

theorem for characteristic functions, I make use of Lévy’s continuity theorem, and so on.

Nevertheless, proofs of all the facts from probability needed in the main part of the text are

given.

A.1 First notions

A probability or probability measure is a measure whose total mass is one. Instead of denoting a measure space by $(X, \mathcal{A}, \mu)$, probabilists use $(\Omega, \mathcal{F}, P)$. Here Ω is a set, $\mathcal{F}$ is called a σ-field (which is the same thing as a σ-algebra), and P is a measure with P(Ω) = 1. Elements of $\mathcal{F}$ are called events. A typical element of Ω is denoted ω.

Instead of saying a property occurs almost everywhere, we talk about properties occurring almost surely, written a.s. Real-valued measurable functions from Ω to R are called random variables and are usually denoted by X or Y or other capital letters.

Integration (in the sense of Lebesgue) with respect to P is called expectation or expected value, and we write E X for $\int X\, dP$. The notation E[X; A] is often used for $\int_A X\, dP$.

The random variable 1A is the function that is one if ω ∈ A and zero otherwise. It is called

the indicator of A (the name “characteristic function” in probability refers to the Fourier

transform). Events such as {ω : X (ω) > a} are almost always abbreviated by (X > a) or

{X > a}.

Given a random variable X, we can define a probability on the Borel σ-field of R by
\[ P_X(A) = P(X \in A), \qquad A \subset R. \tag{A.1} \]
The probability $P_X$ is called the law of X or the distribution of X. We define $F_X : R \to [0, 1]$ by
\[ F_X(x) = P_X((-\infty, x]) = P(X \le x). \tag{A.2} \]
The function $F_X$ is called the distribution function of X.


Proposition A.1 The distribution function $F_X$ of a random variable X satisfies:
(1) $F_X$ is increasing;
(2) $F_X$ is right continuous with left limits;
(3) $\lim_{x \to \infty} F_X(x) = 1$ and $\lim_{x \to -\infty} F_X(x) = 0$.

Proof We prove the right continuity of $F_X$ and leave the rest of the proof to the reader. If $x_n \downarrow x$, then $(X \le x_n) \downarrow (X \le x)$, and so $P(X \le x_n) \downarrow P(X \le x)$ since P is a finite measure.

Note that if $x_n \uparrow x$, then $(X \le x_n) \uparrow (X < x)$, and so $F_X(x_n) \uparrow P(X < x)$.
Any function F : R → [0, 1] satisfying (1)–(3) of Proposition A.1 is called a distribution function, whether or not it comes from a random variable.
Proposition A.2 Suppose F is a distribution function. There exists a random variable X such that $F = F_X$.
Proof Let Ω = [0, 1], $\mathcal{F}$ the Borel σ-field, and P Lebesgue measure. Define X(ω) = sup{x : F(x) < ω}. It is routine to check that $F_X = F$.
In the above proof, essentially $X = F^{-1}$. However F may have jumps or be constant over some intervals, so some care is needed in defining X.
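
To make the construction X(ω) = sup{x : F(x) < ω} concrete, here is a small Python sketch for a law with finitely many atoms, where F is a step function. The function name and the argument layout are assumptions made for this illustration; exact rational arithmetic is used so that the behavior at the jumps of F is unambiguous.

```python
from bisect import bisect_left
from fractions import Fraction

def generalized_inverse(values, probs, omega):
    """Return X(omega) = sup{x : F(x) < omega} for a discrete law, 0 < omega <= 1.

    `values` must be sorted increasingly and `probs` are the (positive)
    point masses, given as Fractions summing to one.
    """
    # Cumulative distribution at each atom: F(values[k]) = probs[0] + ... + probs[k].
    cdf = []
    total = Fraction(0)
    for p in probs:
        total += p
        cdf.append(total)
    # sup{x : F(x) < omega} is the first atom whose cumulative mass reaches omega.
    k = bisect_left(cdf, omega)
    return values[k]

# Bernoulli with p = 1/4: X = 0 on (0, 3/4] and X = 1 on (3/4, 1].
values = [0, 1]
probs = [Fraction(3, 4), Fraction(1, 4)]
grid = [Fraction(k, 1000) for k in range(1, 1001)]
share_of_ones = Fraction(sum(generalized_inverse(values, probs, w) for w in grid), 1000)
print(share_of_ones)  # Lebesgue measure of {omega : X(omega) = 1}, computed on a grid
```

Where F jumps, a whole interval of ω is mapped to the same atom; where F is flat, no ω is mapped into that interval. These are the two situations the remark above warns about.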
Certain distributions or laws are very common. We list some of them.
(1) Bernoulli. A random variable is Bernoulli if P(X = 1) = p, P(X = 0) = 1 − p for some p ∈ [0, 1].
(2) Binomial. This is defined by $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$, where n is a positive integer, 0 ≤ k ≤ n, p ∈ [0, 1], and $\binom{n}{k} = \frac{n!}{k!(n-k)!}$.
(3) Point mass at a. Here P(X = a) = 1.
(4) Poisson. For λ > 0 we set $P(X = k) = e^{-\lambda} \lambda^k / k!$. Again k is a non-negative integer.

If F is absolutely continuous, we call f = F′ the density of F. If such an F is the distribution function of a random variable X, then
\[ P(X \in A) = \int_A f(x)\, dx. \]
Some examples of distributions characterized by densities are the following.
(5) Uniform on [a, b]. Define $f(x) = (b - a)^{-1} 1_{[a,b]}(x)$.
(6) Exponential. For x ≥ 0 let $f(x) = \lambda e^{-\lambda x}$ and set f(x) = 0 for x < 0.
(7) Standard normal. Define $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ for x ∈ R.
Let us verify that the integral of f is one. To do that, let
\[ I = \int_0^\infty e^{-x^2/2}\, dx, \]
and it suffices to show $I = \sqrt{\pi/2}$. Using the Fubini theorem, the monotone convergence theorem, and a change of variables to polar coordinates, we write
\begin{align*}
I^2 &= \Big( \int_0^\infty e^{-x^2/2}\, dx \Big) \Big( \int_0^\infty e^{-y^2/2}\, dy \Big) = \int_0^\infty \int_0^\infty e^{-(x^2+y^2)/2}\, dx\, dy \\
&= \lim_{R \to \infty} \iint_{x, y \ge 0,\ x^2 + y^2 \le R^2} e^{-(x^2+y^2)/2}\, dx\, dy \\
&= \lim_{R \to \infty} \int_0^{\pi/2} \int_0^R e^{-r^2/2}\, r\, dr\, d\theta \\
&= \lim_{R \to \infty} \frac{\pi}{2} (1 - e^{-R^2/2}) = \frac{\pi}{2}
\end{align*}
as desired.
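
As a numerical sanity check of this computation, the following Python sketch approximates I with a midpoint rule and compares it with $\sqrt{\pi/2}$. It also illustrates why the limit over quarter discs works: the square $[0, R]^2$ sits between the quarter discs of radius R and $R\sqrt{2}$. The function names and step count are choices made for this example only.

```python
import math

def half_line_integral(R, steps=100_000):
    """Midpoint-rule approximation of the integral of exp(-x^2/2) over [0, R]."""
    h = R / steps
    return h * sum(math.exp(-((i + 0.5) * h) ** 2 / 2) for i in range(steps))

def quarter_disc_integral(R):
    """Exact value of the polar-coordinate integral over the quarter disc of radius R."""
    return (math.pi / 2) * (1 - math.exp(-R ** 2 / 2))

I_approx = half_line_integral(10.0)        # the tail beyond 10 is negligible
error = abs(I_approx - math.sqrt(math.pi / 2))

# Sandwich for a finite R: quarter disc of radius R, square, quarter disc of radius R*sqrt(2).
R = 2.0
lower = quarter_disc_integral(R)
middle = half_line_integral(R) ** 2
upper = quarter_disc_integral(R * math.sqrt(2))
print(error, lower, middle, upper)
```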
We shall see later ((A.4) and (A.5)) that a standard normal random variable Z has mean zero and variance one, which means that $E Z = 0$ and $E Z^2 = 1$.
(8) Normal random variables with mean μ and variance $\sigma^2$. If Z is a standard normal random variable, then a normal random variable X with mean μ and variance $\sigma^2$ has the same distribution as μ + σZ. It is an exercise in calculus to check that such a random variable has density
\[ f(x) = \frac{1}{\sqrt{2\pi}\, \sigma} e^{-(x-\mu)^2 / 2\sigma^2}. \tag{A.3} \]
(9) Gamma. A random variable X has a gamma distribution with parameters r and λ (both r and λ must be positive) if it has density
\[ f(x) = \lambda e^{-\lambda x} (\lambda x)^{r-1} / \Gamma(r) \]
for x ≥ 0 and f(x) = 0 if x < 0, where $\Gamma(r) = \int_0^\infty e^{-y} y^{r-1}\, dy$ is the Gamma function. Recall Γ(k) = (k − 1)! for k a positive integer.
We can use the law of a random variable to calculate expectations.
Proposition A.3 Let X be a random variable. If g is bounded or non-negative, then
\[ E g(X) = \int g(x)\, P_X(dx). \]
Proof If g is the indicator of an event A, this is just the definition of $P_X$. By linearity, the result holds for simple functions. By the monotone convergence theorem, the result holds for non-negative functions, and by linearity again, it holds for bounded g.
If $F_X$ has a density f, then $P_X(dx) = f(x)\, dx$. In this case $E X = \int x f(x)\, dx$ and $E X^2 = \int x^2 f(x)\, dx$. (We need E|X| finite to justify the first equality if X is not necessarily non-negative.) We define the mean of a random variable to be its expectation, and the variance of a random variable is defined by
\[ \mathrm{Var}\, X = E (X - E X)^2. \]
The pth moment of X is $E X^p$ if p is a positive integer.
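Note
\[ \mathrm{Var}\, X = E[X^2 - 2X(E X) + (E X)^2] = E X^2 - (E X)^2. \]
Let us calculate a few examples. Since $x e^{-x^2/2}$ is an odd function, if Z is a standard normal random variable, then
\[ E Z = \int x \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx = 0. \tag{A.4} \]
Using integration by parts,
\begin{align*}
E Z^2 &= \int x^2 \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx \tag{A.5} \\
&= \lim_{N \to \infty} \int_{-N}^{N} x^2 \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx \\
&= \lim_{N \to \infty} \Big[ -\frac{2N}{\sqrt{2\pi}} e^{-N^2/2} + \int_{-N}^{N} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx \Big] \\
&= \frac{1}{\sqrt{2\pi}} \int e^{-x^2/2}\, dx = 1,
\end{align*}
and so Var Z = 1.
By completing the square and a change of variables, we calculate
\begin{align*}
E e^{aZ} &= \frac{1}{\sqrt{2\pi}} \int e^{ax} e^{-x^2/2}\, dx \\
&= \frac{1}{\sqrt{2\pi}} e^{a^2/2} \int e^{-(x-a)^2/2}\, dx = e^{a^2/2}.
\end{align*}
If X is a normal random variable with mean μ and variance $\sigma^2$, we can write X = μ + σZ for Z a standard normal random variable, and obtain
\[ E e^{aX} = e^{a\mu} E e^{a\sigma Z} = e^{a\mu + a^2 \sigma^2 / 2}. \tag{A.6} \]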
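If X is a Poisson random variable with parameter λ, then
\begin{align*}
E X &= \sum_{k=0}^\infty k e^{-\lambda} \frac{\lambda^k}{k!} = \sum_{k=1}^\infty k e^{-\lambda} \frac{\lambda^k}{k!} \tag{A.7} \\
&= \lambda e^{-\lambda} \sum_{k=1}^\infty \frac{\lambda^{k-1}}{(k-1)!} = \lambda.
\end{align*}
A similar calculation shows that $E[X(X - 1)] = \lambda^2$, so
\[ \mathrm{Var}\, X = E[X(X - 1)] + E X - (E X)^2 = \lambda. \tag{A.8} \]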
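
The identities (A.7) and (A.8) are easy to confirm numerically by truncating the series. The Python sketch below is illustrative only; the function name and the truncation level are assumptions of this example.

```python
import math

def poisson_moments(lam, terms=200):
    """Approximate E X and Var X for a Poisson(lam) law by truncating the series."""
    mean = 0.0
    factorial_moment = 0.0      # E[X (X - 1)]
    pmf = math.exp(-lam)        # P(X = 0)
    for k in range(terms):
        mean += k * pmf
        factorial_moment += k * (k - 1) * pmf
        pmf *= lam / (k + 1)    # P(X = k + 1) from P(X = k)
    # Var X = E[X (X - 1)] + E X - (E X)^2, as in (A.8).
    variance = factorial_moment + mean - mean ** 2
    return mean, variance

mean, variance = poisson_moments(3.5)
print(mean, variance)  # both should be close to 3.5
```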
A straightforward application of integration by parts shows that if X is an exponential random variable with parameter λ, then
\[ E X = \int_0^\infty \lambda x e^{-\lambda x}\, dx = \frac{1}{\lambda}. \tag{A.9} \]
Another equality that is useful is the following.
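Proposition A.4 If X ≥ 0, a.s., and p > 0, then
\[ E X^p = \int_0^\infty p \lambda^{p-1} P(X > \lambda)\, d\lambda. \]
The proof will show that this equality is also valid if we replace P(X > λ) by P(X ≥ λ).

Proof Using the Fubini theorem and writing
\begin{align*}
\int_0^\infty p \lambda^{p-1} P(X > \lambda)\, d\lambda &= E \int_0^\infty p \lambda^{p-1} 1_{(\lambda, \infty)}(X)\, d\lambda \\
&= E \int_0^X p \lambda^{p-1}\, d\lambda = E X^p
\end{align*}
gives the proof.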
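
For a random variable with finitely many non-negative atoms, the tail P(X > λ) is piecewise constant, so the integral in Proposition A.4 can be evaluated exactly. The Python sketch below does this with rational arithmetic; the function names and the sample law are assumptions made for the illustration, and p is taken to be a positive integer so that powers stay rational.

```python
from fractions import Fraction

def moment_direct(values, probs, p):
    """E X^p computed directly from the point masses."""
    return sum(q * v ** p for v, q in zip(values, probs))

def moment_from_tail(values, probs, p):
    """E X^p computed as the integral of p * lam^(p-1) * P(X > lam) over lam >= 0.

    `values` must be sorted increasingly and non-negative. Between consecutive
    atoms lo < hi the tail is constant, so that piece of the integral equals
    P(X > lo) * (hi^p - lo^p).
    """
    total = Fraction(0)
    lo = Fraction(0)
    tail = Fraction(1)          # P(X > lam) for lam in [lo, next atom)
    for v, q in zip(values, probs):
        total += tail * (v ** p - lo ** p)
        tail -= q               # the atom at v no longer exceeds lam once lam >= v
        lo = v
    return total

values = [Fraction(1), Fraction(2), Fraction(4)]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
print(moment_direct(values, probs, 2), moment_from_tail(values, probs, 2))
```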

We need two elementary inequalities. The first is known as Chebyshev's inequality.

Proposition A.5 If X ≥ 0 and a > 0,
\[ P(X \ge a) \le \frac{E X}{a}. \]

Proof We write
\[ P(X \ge a) = E\big[ 1_{[a,\infty)}(X) \big] \le E\Big[ \frac{X}{a} 1_{[a,\infty)}(X) \Big] \le E X / a, \]
since X/a is bigger than or equal to 1 when X ∈ [a, ∞).

If we apply this to $X = (Y - E Y)^2$, we obtain
\[ P(|Y - E Y| \ge a) = P((Y - E Y)^2 \ge a^2) \le \mathrm{Var}\, Y / a^2. \tag{A.10} \]
This special case of Chebyshev's inequality is sometimes itself referred to as Chebyshev's inequality, while Proposition A.5 is sometimes called the Markov inequality.

The second inequality we need is Jensen’s inequality, not to be confused with Jensen’s

formula of complex analysis.

Proposition A.6 Suppose g is convex and X and g(X ) are both integrable. Then

g(E X ) ≤ E g(X ).

Proof One property of convex functions is that they lie above their tangent lines, and more

generally, their support lines. Thus if x0 ∈ R, we have

g(x) ≥ g(x0) + c(x − x0)

for some constant c. Letting x = X (ω) and taking expectations, we obtain

E g(X ) ≥ g(x0) + c(E X − x0).

Now set x0 equal to E X .

If $A_n$ is a sequence of sets, define ($A_n$ i.o.), read “$A_n$ infinitely often,” by
\[ (A_n \text{ i.o.}) = \cap_{n=1}^\infty \cup_{i=n}^\infty A_i. \]
This set consists of those ω that are in infinitely many of the $A_n$.

A simple but very important proposition is the Borel–Cantelli lemma. It has two parts,

and we prove the first part here, leaving the second part to the next section.

Proposition A.7 Let $A_1, A_2, \ldots$ be a sequence of events. If $\sum_n P(A_n) < \infty$, then $P(A_n \text{ i.o.}) = 0$.
Proof We write
\[ P(A_n \text{ i.o.}) = \lim_{n \to \infty} P(\cup_{i=n}^\infty A_i) \le \limsup_{n \to \infty} \sum_{i=n}^\infty P(A_i) = 0, \]
and we are done.
A.2 Independence
We say two events A and B are independent if P(A ∩ B) = P(A)P(B). The events $A_1, \ldots, A_n$ are independent if
\[ P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_j}) = P(A_{i_1}) P(A_{i_2}) \cdots P(A_{i_j}) \]
for each subset $\{i_1, \ldots, i_j\}$ of {1, . . . , n} with $1 \le i_1 < \cdots < i_j \le n$.
Proposition A.8 If A and B are independent, then $A^c$ and B are independent.
Proof We write
\begin{align*}
P(A^c \cap B) &= P(B) - P(A \cap B) = P(B) - P(A)P(B) \\
&= P(B)(1 - P(A)) = P(B)P(A^c).
\end{align*}
This is all there is to the proof.
We say two σ -fields F and G are independent if A and B are independent whenever A ∈ F
and B ∈ G. Two random variables X and Y are independent if σ (X ), the σ -field generated by
X , and σ (Y ), the σ -field generated by Y , are independent. (Recall that the σ -field generated
by a random variable X is given by {(X ∈ A) : A a Borel subset of R}.) We define the
independence of n σ -fields or n random variables in a similar way.
Remark A.9 If f and g are Borel functions and X and Y are independent, then f (X ) and
g(Y ) are independent. This follows because the σ -field generated by f (X ) is a sub-σ -field
of the one generated by X , and similarly for g(Y ).
To construct independent random variables, we can use the following.
Proposition A.10 If $F_1, \ldots, F_n$ are distribution functions, there exist independent random variables $X_1, \ldots, X_n$ such that $F_{X_i} = F_i$, i = 1, . . . , n.
Proof Let $\Omega = [0, 1]^n$, $\mathcal{F}$ the Borel σ-field on Ω, and P n-dimensional Lebesgue measure on Ω. If $\omega = (\omega_1, \ldots, \omega_n)$, define $X_i(\omega) = \sup\{x : F_i(x) < \omega_i\}$. As in Proposition A.2, $F_{X_i} = F_i$. We deduce the independence from the fact that P is a product measure, in fact, the n-fold product of one-dimensional Lebesgue measure on [0, 1].
Let $F_{X,Y}(x, y) = P(X \le x, Y \le y)$ denote the joint distribution function of two random variables X and Y. (The comma inside the set means “and”; this is a standard convention in probability.)
Proposition A.11 $F_{X,Y}(x, y) = F_X(x) F_Y(y)$ for all x and y if and only if X and Y are independent.
Proof If X and Y are independent, then
\[ F_{X,Y}(x, y) = P(X \le x, Y \le y) = P(X \le x) P(Y \le y) = F_X(x) F_Y(y). \]
Conversely, if the equality holds, fix y and let $\mathcal{M}_y$ denote the collection of sets A for which P(X ∈ A, Y ≤ y) = P(X ∈ A)P(Y ≤ y). $\mathcal{M}_y$ contains all sets of the form (−∞, x]. It follows by linearity that $\mathcal{M}_y$ contains all sets of the form (x, z], and then by linearity again, all sets that are the finite union of such half-open, half-closed intervals. Note that the collection of finite unions of such intervals, $\mathcal{A}$, is an algebra generating the Borel σ-field. It is clear that $\mathcal{M}_y$ is a monotone class, so by the monotone class theorem (Theorem B.2), $\mathcal{M}_y$ contains the Borel σ-field.
For a fixed set A, let $\mathcal{M}_A$ denote the collection of sets B for which P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B). Again, $\mathcal{M}_A$ is a monotone class and by the preceding paragraph contains the σ-field generated by the collection of finite unions of intervals of the form (x, z], and hence contains the Borel sets. Therefore X and Y are independent.
The following is known as the multiplication theorem.
Proposition A.12 If X, Y, and XY are integrable and X and Y are independent, then E[XY] = (E X)(E Y).
Proof Consider the pairs $(Z_X, Z_Y)$ with $Z_X$ being σ(X) measurable and $Z_Y$ being σ(Y) measurable for which the multiplication theorem is true. It holds for $Z_X = 1_A(X)$ and $Z_Y = 1_B(Y)$ with A and B Borel subsets of R by the definition of X and Y being independent. It holds for simple random variables $(Z_X, Z_Y)$, that is, linear combinations of indicators, by the linearity of both sides. It holds for non-negative random variables by monotone convergence. And it holds for integrable random variables by linearity again.
If $X_1, \ldots, X_n$ are independent, then so are $X_1 - E X_1, \ldots, X_n - E X_n$. Assuming everything is integrable,
\[ E[(X_1 - E X_1) + \cdots + (X_n - E X_n)]^2 = E (X_1 - E X_1)^2 + \cdots + E (X_n - E X_n)^2, \]
using the multiplication theorem to show that the expectations of the cross-product terms are zero. We have thus shown
\[ \mathrm{Var}\,(X_1 + \cdots + X_n) = \mathrm{Var}\, X_1 + \cdots + \mathrm{Var}\, X_n. \tag{A.11} \]
We finish up this section by proving the second half of the Borel–Cantelli lemma.
Proposition A.13 Suppose $A_n$ is a sequence of independent events. If
\[ \sum_{n=1}^\infty P(A_n) = \infty, \]
then $P(A_n \text{ i.o.}) = 1$.
Note that here the $A_n$ are independent, while in the first half of the Borel–Cantelli lemma no such assumption was necessary.
Proof Note
\begin{align*}
P(\cup_{i=n}^N A_i) &= 1 - P(\cap_{i=n}^N A_i^c) = 1 - \prod_{i=n}^N P(A_i^c) \\
&= 1 - \prod_{i=n}^N (1 - P(A_i)) \ge 1 - \exp\Big( -\sum_{i=n}^N P(A_i) \Big),
\end{align*}
using the inequality $1 - x \le e^{-x}$ for x ≥ 0. As N → ∞, the right-hand side tends to one, so $P(\cup_{i=n}^\infty A_i) = 1$. This holds for all n, which proves the result.
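
The key inequality in this proof can be watched numerically. In the Python sketch below the events are assumed (for illustration only) to be independent with $P(A_i) = 1/i$, so that the product telescopes to 1/N when n = 2; the function names are likewise choices of this example.

```python
import math

def union_probability(probs):
    """P(union of independent events) = 1 - prod(1 - P(A_i))."""
    prod = 1.0
    for p in probs:
        prod *= 1.0 - p
    return 1.0 - prod

def exponential_lower_bound(probs):
    """The bound 1 - exp(-sum P(A_i)) used in the proof."""
    return 1.0 - math.exp(-sum(probs))

for N in (10, 100, 1000):
    probs = [1.0 / i for i in range(2, N + 1)]   # hypothetical P(A_i) = 1/i, i = 2..N
    print(N, exponential_lower_bound(probs), union_probability(probs))
```

Both columns increase to one as N grows because the harmonic series diverges, which is exactly the hypothesis of the proposition.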

A.3 Convergence

In this section we consider three ways a sequence of random variables $X_n$ can converge. We say $X_n$ converges to X almost surely if the event $(X_n \not\to X)$ has probability zero. $X_n$ converges to X in probability if for each ε > 0, $P(|X_n - X| > \varepsilon) \to 0$ as n → ∞. For p ≥ 1, $X_n$ converges to X in $L^p$ if $E |X_n - X|^p \to 0$ as n → ∞.

The following proposition shows some relationships among the types of convergence.

Proposition A.14 (1) If $X_n \to X$ almost surely, then $X_n \to X$ in probability.
(2) If $X_n \to X$ in $L^p$, then $X_n \to X$ in probability.
(3) If $X_n \to X$ in probability, there exists a subsequence $n_j$ such that $X_{n_j}$ converges to X almost surely.

Proof To prove (1), note $X_n - X$ tends to zero almost surely, so $1_{(-\varepsilon, \varepsilon)^c}(X_n - X)$ also converges to zero almost surely. Now apply the dominated convergence theorem.

(2) comes from Chebyshev's inequality:
\[ P(|X_n - X| > \varepsilon) = P(|X_n - X|^p > \varepsilon^p) \le E |X_n - X|^p / \varepsilon^p \to 0 \]
as n → ∞.

To prove (3), choose $n_j$ larger than $n_{j-1}$ such that $P(|X_n - X| > 2^{-j}) < 2^{-j}$ whenever $n \ge n_j$. Thus if we let $A_i = (|X_{n_j} - X| > 2^{-i} \text{ for some } j \ge i)$, then $P(A_i) \le 2^{-i+1}$. By the Borel–Cantelli lemma $P(A_i \text{ i.o.}) = 0$. This implies $X_{n_j} \to X$ almost surely on the complement of $(A_i \text{ i.o.})$.

Let us give some examples to show there need not be any other implications among the three types of convergence.

Let Ω = [0, 1], $\mathcal{F}$ the Borel σ-field, and P Lebesgue measure. Let $X_n = n^2 1_{(0,1/n)}$. Then clearly $X_n$ converges to zero almost surely and in probability, but $E X_n^p = n^{2p}/n \to \infty$ for any p ≥ 1.

Let Ω be the unit circle, and let P be Lebesgue measure on the circle normalized to have total mass 1. We use θ to denote the angle that the ray from 0 through a point on the circle makes with the x axis. Let $t_n = \sum_{i=1}^n i^{-1}$, and let $A_n = \{e^{i\theta} : t_{n-1} \le \theta < t_n\}$. Let $X_n = 1_{A_n}$. Since $t_n \to \infty$, the arcs $A_n$ wrap around the circle infinitely often, so any point on the unit circle will be in infinitely many $A_n$, and $X_n$ does not converge almost surely to zero. But $P(A_n) = 1/(2\pi n) \to 0$, so $X_n \to 0$ in probability and in $L^p$.
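
The wrapping of the arcs $A_n$ around the circle can be traced with a short computation. The Python sketch below lists the indices n up to a cutoff for which a fixed point lies in $A_n$; the function name, the chosen angle, and the cutoff are assumptions made for this illustration.

```python
import math

def hits(theta, n_max):
    """Indices n <= n_max for which the point e^{i*theta} lies in A_n.

    A_n is the arc of angles [t_{n-1}, t_n) taken modulo 2*pi,
    with t_n = 1 + 1/2 + ... + 1/n and t_0 = 0.
    """
    found = []
    t_prev = 0.0
    for n in range(1, n_max + 1):
        t_next = t_prev + 1.0 / n
        # Smallest angle congruent to theta modulo 2*pi that is >= t_prev.
        k = math.ceil((t_prev - theta) / (2 * math.pi))
        if theta + 2 * math.pi * k < t_next:
            found.append(n)
        t_prev = t_next
    return found

visits = hits(1.0, 2000)
print(visits)  # the point is revisited once per trip around the circle
```

Each further visit requires the harmonic sum to grow by another 2π, so the visits become very sparse, which matches $P(A_n) \to 0$, yet they never stop because the sum diverges.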
A.4 Uniform integrability
A sequence $\{X_i\}$ of random variables is uniformly integrable if
\[ \sup_i \int_{(|X_i| > M)} |X_i|\, dP \to 0 \]
as M → ∞. This can be rephrased by saying: given ε > 0 there exists M > 0 such that $E[\,|X_i|; |X_i| > M] < \varepsilon$ for all i. Here M can be chosen independently of i.
Lemma A.15 If {Xi} is a uniformly integrable sequence of random variables, then
supi E |Xi| < ∞.
Proof There exists M such that E [ |Xi|; |Xi| > M] ≤ 1. Then

E |Xi| ≤ E [ |Xi|; |Xi| ≤ M] + E [ |Xi|; |Xi| > M] ≤ M + 1,

and we are done.

We say a sequence of random variables {Xi} is uniformly absolutely continuous if given ε

there exists δ such that supi E [ |Xi|; A] ≤ ε whenever P(A) < δ.
Proposition A.16 The following are equivalent.
(1) The sequence {Xi} is uniformly integrable.
(2) The sequence {Xi} is uniformly absolutely continuous and supi E |Xi| < ∞.
Proof If (1) holds, we showed in Lemma A.15 that the expectations are uniformly bounded. Let ε > 0 and choose M such that $\sup_i E[\,|X_i|; |X_i| > M] < \varepsilon/2$. Then if δ = ε/(2M) and P(A) < δ, we have
\[ E[\,|X_i|; A] \le E[\,|X_i|; |X_i| > M] + E[\,|X_i|; |X_i| \le M, A] < \frac{\varepsilon}{2} + M P(A) \le \varepsilon. \]
Now suppose (2) holds. Let ε > 0 and choose δ such that $E[\,|X_i|; A] < \varepsilon$ for all i if P(A) ≤ δ. Let $M = \sup_i E |X_i| / \delta$. Then by the Chebyshev inequality
\[ P(|X_i| > M) \le \frac{E |X_i|}{M} \le \delta, \]
so $E[\,|X_i|; |X_i| > M] < \varepsilon$.
Proposition A.17 Suppose {Xi} and {Yi} are each uniformly integrable sequences of random
variables. Then {Xi + Yi} is also a uniformly integrable sequence.
Proof By Proposition A.16,
\[ \sup_i E |X_i + Y_i| \le \sup_i E |X_i| + \sup_i E |Y_i| < \infty. \]
Using Proposition A.16 again, given ε there exists δ such that $E[\,|X_i|; A] < \varepsilon/2$ and $E[\,|Y_i|; A] < \varepsilon/2$ if P(A) < δ. But then $E[\,|X_i + Y_i|; A] < \varepsilon$ and a third use of Proposition A.16 yields our result.
Proposition A.18 Suppose there exists ϕ : [0, ∞) → [0, ∞) such that ϕ is increasing, ϕ(x)/x → ∞ as x → ∞, and $\sup_i E \varphi(|X_i|) < \infty$. Then the sequence $\{X_i\}$ is uniformly integrable.
Proof Let ε > 0 and choose $x_0$ such that x/ϕ(x) < ε if $x \ge x_0$. If $M \ge x_0$,
\[ \int_{(|X_i| > M)} |X_i| = \int \frac{|X_i|}{\varphi(|X_i|)} \varphi(|X_i|) 1_{(|X_i| > M)} \le \varepsilon \int \varphi(|X_i|) \le \varepsilon \sup_i E \varphi(|X_i|). \]
Since ε is arbitrary, we are done.

The main result we need in this section is the Vitali convergence theorem.

Theorem A.19 If $X_n \to X$ almost surely and the sequence $\{X_n\}$ is uniformly integrable, then $E |X_n - X| \to 0$.

Proof By Proposition A.17 with $Y_i = -X$ for each i, the sequence $X_i - X$ is uniformly integrable. Let ε > 0 and choose M such that
\[ \int_{(|X_i - X| > M)} |X_i - X| < \varepsilon \]
for all i. By dominated convergence,
\[ \limsup_{i \to \infty} E |X_i - X| \le \limsup_{i \to \infty} E[\,|X_i - X|; |X_i - X| \le M] + \varepsilon = \varepsilon. \]
Since ε is arbitrary, then $E |X_i - X| \to 0$.
A.5 Conditional expectation
If F ⊂ G are two σ -fields and X is an integrable G measurable random variable, the
conditional expectation of X given F , written E [X | F ] and read as “the expectation
(or expected value) of X given F ,” is any F measurable random variable Y such that
E [Y ; A] = E [X ; A] for every A ∈ F . The conditional probability of A ∈ G given F is
defined by P(A | F ) = E [1A | F ].
If Y1,Y2 are two F measurable random variables with E [Y1; A] = E [Y2; A] for all A ∈ F ,
then Y1 = Y2, a.s., and so conditional expectation is unique up to almost sure equivalence.
In the case X is already F measurable, E [X | F ] = X . If X is independent from F ,
E [X | F ] = E X . Both of these facts follow immediately from the definition. For another
example, if $\{A_i\}$ is a finite collection of pairwise disjoint sets whose union is Ω, $P(A_i) > 0$ for all i, and $\mathcal{F}$ is the σ-field generated by the $A_i$'s, then
\[ P(A \mid \mathcal{F}) = \sum_i \frac{P(A \cap A_i)}{P(A_i)} 1_{A_i}. \tag{A.12} \]
This follows since the right-hand side is $\mathcal{F}$ measurable and its expectation over any set $A_i$ is $P(A \cap A_i)$. Equation (A.12) provides the link with the definition of conditional probability from elementary probability: if P(B) ≠ 0, then
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}. \tag{A.13} \]
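
On a finite sample space, conditional expectation given a partition is just a block-by-block average, as in (A.12). The following Python sketch illustrates this with a fair die; the function name, the dictionaries used to represent P and X, and the choice of partition are all assumptions made for this example.

```python
from fractions import Fraction

def conditional_expectation(prob, X, partition):
    """Return E[X | F] as a dict omega -> value, where F is generated by `partition`.

    `prob` maps each outcome to its probability, `X` maps each outcome to a number,
    and `partition` is a list of disjoint sets of outcomes, each of positive
    probability, whose union is the whole sample space.
    """
    result = {}
    for block in partition:
        mass = sum(prob[w] for w in block)
        average = sum(X[w] * prob[w] for w in block) / mass   # E[X; A_i] / P(A_i)
        for w in block:
            result[w] = average
    return result

omega = range(1, 7)                                  # a fair die
prob = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w) for w in omega}
odd_even = [{1, 3, 5}, {2, 4, 6}]

Y = conditional_expectation(prob, X, odd_even)       # E[X | F]
indicator = {w: Fraction(1 if w in {1, 2} else 0) for w in omega}
P_A_given_F = conditional_expectation(prob, indicator, odd_even)   # P(A | F) for A = {1, 2}
print(Y, P_A_given_F)
```

The defining property E[Y; A] = E[X; A] only needs to be checked on the blocks of the partition, since every set in the generated σ-field is a union of blocks.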

We have
\[ E[E[X \mid \mathcal{F}]\,] = E X \tag{A.14} \]
because $E[E[X \mid \mathcal{F}]] = E[E[X \mid \mathcal{F}]; \Omega] = E[X; \Omega] = E X$.

The following is easy to establish.

Proposition A.20 (1) If X ≥ Y are both integrable, then

E [X | F ] ≥ E [Y | F ], a.s.

(2) If X and Y are integrable and a ∈ R, then

E [aX + Y | F ] = aE [X | F ] + E [Y | F ].

It is easy to check that limit theorems such as monotone convergence and dominated

convergence have conditional expectation versions, as do inequalities like Jensen’s and

Chebyshev’s inequalities. Thus, for example, we have Jensen’s inequality for conditional

expectations.

Proposition A.21 If g is convex and X and g(X ) are integrable,

E [g(X ) | F ] ≥ g(E [X | F ]), a.s.

A key fact is the following.

Proposition A.22 If X and XY are integrable and Y is measurable with respect to $\mathcal{F}$, then
\[ E[XY \mid \mathcal{F}] = Y E[X \mid \mathcal{F}]. \tag{A.15} \]

Proof If $A \in \mathcal{F}$, then for any $B \in \mathcal{F}$,
\[ E\big[ 1_A E[X \mid \mathcal{F}]; B \big] = E[E[X \mid \mathcal{F}]; A \cap B] = E[X; A \cap B] = E[1_A X; B]. \]

Since 1AE [X | F ] is F measurable, this shows that (A.15) holds when Y = 1A and A ∈ F .

Using linearity shows that (A.15) holds whenever Y is a simple F measurable random

variable. Taking limits, (A.15) holds whenever Y ≥ 0 is F measurable and X and X Y are

integrable. Using linearity again completes the proof.

Two other equalities are contained in the following.

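Proposition A.23 If $\mathcal{E} \subset \mathcal{F} \subset \mathcal{G}$ are σ-fields, then
\[ E\big[ E[X \mid \mathcal{F}] \mid \mathcal{E} \big] = E[X \mid \mathcal{E}] = E\big[ E[X \mid \mathcal{E}] \mid \mathcal{F} \big]. \]

Proof The right equality holds because $E[X \mid \mathcal{E}]$ is $\mathcal{E}$ measurable, hence $\mathcal{F}$ measurable. We then use the fact that if Y is $\mathcal{F}$ measurable, $E[Y \mid \mathcal{F}] = Y$.

To show the left equality, let $A \in \mathcal{E}$. Then since A is also in $\mathcal{F}$,
\[ E\big[ E\big[ E[X \mid \mathcal{F}] \mid \mathcal{E} \big]; A \big] = E[E[X \mid \mathcal{F}]; A] = E[X; A] = E[E[X \mid \mathcal{E}]; A]. \]
Since both sides are $\mathcal{E}$ measurable, the equality follows.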

To show the existence of E [X | F ], we proceed as follows.

Proposition A.24 If X is integrable, then E [X | F ] exists.


Proof Using linearity, we need only consider X ≥ 0. Define a finite measure Q on F by

Q(A) = E [X ; A] for A ∈ F . This is trivially absolutely continuous with respect to P|F , the

restriction of P to F . Let E [X | F ] be the Radon–Nikodym derivative of Q with respect to

P|F . Since Q and P|F are measures on F , the Radon–Nikodym derivative is F measurable,

and so provides the desired random variable.

When F = σ (Y ), one usually writes E [X | Y ] for E [X | F ]. Notation that is commonly

used is E [X | Y = y]. The definition is as follows. If A ∈ σ (Y ), then A = (Y ∈ B) for

some Borel set B by the definition of σ (Y ), or 1A = 1B(Y ). By linearity and taking limits, it

follows that if Z is σ (Y ) measurable, then Z = f (Y ) for some Borel measurable function f .

Set Z = E [X | Y ] and choose f Borel measurable so that Z = f (Y ). Then E [X | Y = y] is

defined to be f (y).

If X ∈ L2 and M = {Y ∈ L2 : Y is F measurable}, one can show that E [X | F ] is equal

to the projection of X onto the subspace M.

A.6 Stopping times

We next want to talk about stopping times. Suppose we have a sequence of σ-fields $\mathcal{F}_i$ such that $\mathcal{F}_i \subset \mathcal{F}_{i+1}$ for each i. An example would be if $\mathcal{F}_i = \sigma(X_1, \ldots, X_i)$. A random mapping N from Ω to {0, 1, 2, . . .} is called a stopping time if for each n, $(N \le n) \in \mathcal{F}_n$.

The proof of the following is immediate from the definitions.

Proposition A.25 (1) Fixed times n are stopping times.
(2) If $N_1$ and $N_2$ are stopping times, then so are $N_1 \wedge N_2$ and $N_1 \vee N_2$.
(3) If $N_n$ is an increasing sequence of stopping times, then so is $N = \sup_n N_n$.
(4) If $N_n$ is a decreasing sequence of stopping times, then so is $N = \inf_n N_n$.
(5) If N is a stopping time, then so is N + n.

We define
\[ \mathcal{F}_N = \{A : A \cap (N \le n) \in \mathcal{F}_n \text{ for all } n\}. \tag{A.16} \]

A.7 Martingales

In this section we consider martingales. Let Fn be an increasing sequence of σ -fields. A

sequence of random variables Mn is adapted to Fn if for each n, Mn is Fn measurable.

Mn is a martingale if Mn is adapted to Fn, Mn is integrable for all n, and

E [Mn | Fn−1] = Mn−1, a.s., n = 2, 3, . . . (A.17)

If we have E [Mn | Fn−1] ≥ Mn−1, a.s., for every n, then Mn is a submartingale. If we have

E [Mn | Fn−1] ≤ Mn−1, we have a supermartingale.

Let us look at some examples. If Xi is a sequence of mean zero independent random

variables and $S_n = \sum_{i=1}^n X_i$, then Mn = Sn is a martingale, since

E [Mn | Fn−1] = Mn−1 + E [Mn − Mn−1 | Fn−1]

= Mn−1 + E [Mn − Mn−1] = Mn−1,

using independence.


Another example is the following. If the Xi’s are independent and have mean zero and

variance one, Sn is as in the previous example, and $M_n = S_n^2 - n$, then

\[ E[S_n^2 \mid \mathcal{F}_{n-1}] = E[(S_n - S_{n-1})^2 \mid \mathcal{F}_{n-1}] + 2S_{n-1}E[S_n \mid \mathcal{F}_{n-1}] - S_{n-1}^2 = 1 + S_{n-1}^2, \]

using independence. It follows that Mn is a martingale.
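
The second example can be checked by exact enumeration. The sketch below assumes, purely for illustration, that the Xi are fair ±1 coin flips (one mean zero, variance one choice that the text does not single out); the function names are hypothetical. It verifies E [Mn | Fn−1] = Mn−1 for $M_n = S_n^2 - n$ on every possible history.

```python
from fractions import Fraction
from itertools import product

def conditional_mean(prefix, n):
    """E[S_n^2 - n | X_1..X_{n-1} = prefix], assuming fair +/-1 steps."""
    s_prev = sum(prefix)
    # Average over the two equally likely values of the n-th step.
    return sum(Fraction(1, 2) * ((s_prev + x) ** 2 - n) for x in (-1, 1))

def check_martingale(n_max):
    """True if E[M_n | F_{n-1}] = M_{n-1} for every history with 2 <= n <= n_max."""
    for n in range(2, n_max + 1):
        for prefix in product((-1, 1), repeat=n - 1):
            m_prev = sum(prefix) ** 2 - (n - 1)
            if conditional_mean(prefix, n) != m_prev:
                return False
    return True
```

Exact rational arithmetic is used so that the identity is tested as an equality rather than up to rounding.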

A third example is the following: if X ∈ L1 and Mn = E [X | Fn], then Mn is a martingale.

The proof of this is simple:

E [Mn+1 | Fn] = E [E [X | Fn+1] | Fn] = E [X | Fn] = Mn.

If Mn is a martingale, g is convex, and g(Mn) is integrable for each n, then by Jensen’s

inequality for conditional expectations,

E [g(Mn+1) | Fn] ≥ g(E [Mn+1 | Fn]) = g(Mn), (A.18)

or g(Mn) is a submartingale. Similarly if g is convex and increasing on [0, ∞) and Mn is a

positive submartingale, then g(Mn) is a submartingale because

E [g(Mn+1) | Fn] ≥ g(E [Mn+1 | Fn]) ≥ g(Mn).

A.8 Optional stopping

Note that if one takes expectations in (A.17), one has E Mn = E Mn−1, and by induction

E Mn = E M0. The theorem about martingales that lies at the basis of all other results is

Doob’s optional stopping theorem, which says that the same is true if we replace n by a

stopping time N . There are various versions, depending on what conditions one puts on the

stopping times.

Theorem A.26 If N is a stopping time with respect to Fn that is bounded by a positive integer K and Mn is a martingale, then E MN = E M0.

Proof We write

\[ E M_N = \sum_{k=0}^{K} E[M_N; N = k] = \sum_{k=0}^{K} E[M_k; N = k]. \]

Note (N = k) is Fj measurable if j ≥ k, so

\[ E[M_k; N = k] = E[M_{k+1}; N = k] = E[M_{k+2}; N = k] = \cdots = E[M_K; N = k]. \]

Hence

\[ E M_N = \sum_{k=0}^{K} E[M_K; N = k] = E M_K = E M_0. \]

This completes the proof.
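
Theorem A.26 can be illustrated by exact computation. The sketch below assumes Mn = Sn is the fair ±1 random walk started at 0 and takes N to be the first time the walk reaches a given level, capped at K; both the walk and the function names are illustrative choices, not taken from the text.

```python
from fractions import Fraction
from itertools import product

def stopped_value(path, level):
    """S_N along one +/-1 path, where N = min(first time S_k == level, len(path))."""
    s = 0
    for x in path:
        s += x
        if s == level:
            break
    return s

def expected_stopped_value(K, level):
    """E S_N computed exactly over all 2^K equally likely paths of length K."""
    weight = Fraction(1, 2 ** K)
    return sum((weight * stopped_value(p, level)
                for p in product((-1, 1), repeat=K)), Fraction(0))
```

The stopping rule is one-sided, so the answer is not forced by symmetry; the expectation is still E M0 = 0, as the theorem asserts.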

The same proof as that in Theorem A.26 gives the following corollary.


Corollary A.27 If N is a stopping time bounded by K and Mn is a submartingale, then

E MN ≤ E MK.

The same proof also gives

Corollary A.28 If N is a stopping time bounded by K, A ∈ FN , and Mn is a submartingale,

then E [MN ; A] ≤ E [MK; A].

Proposition A.29 If N1 ≤ N2 are stopping times bounded by K and M is a martingale, then

E [MN2 | FN1 ] = MN1 , a.s.

Proof Suppose A ∈ FN1 . We need to show E [MN1; A] = E [MN2; A]. Define a new stopping

time N3 by

\[ N_3(\omega) = \begin{cases} N_1(\omega), & \omega \in A, \\ N_2(\omega), & \omega \notin A. \end{cases} \]

It is easy to check that N3 is a stopping time, so E MN3 = E MK = E MN2 implies

E [MN1; A] + E [MN2; Ac] = E [MN2 ].

Subtracting E [MN2; Ac] from each side completes the proof.

The following is known as the Doob decomposition for discrete time martingales.

Proposition A.30 Suppose Xk is a submartingale with respect to an increasing sequence of

σ -fields Fk. Then we can write Xk = Mk + Ak such that Mk is a martingale adapted to the Fk

and Ak is a sequence of random variables with Ak being Fk−1 measurable and A0 ≤ A1 ≤ · · · .

Proof Let ak = E [Xk | Fk−1] − Xk−1 for k = 1, 2, . . . Since Xk is a submartingale, each

ak ≥ 0. Let $A_k = \sum_{i=1}^k a_i$. The fact that the Ak are increasing and measurable with respect to

Fk−1 is clear. Set Mk = Xk − Ak . Then

E [Mk+1 − Mk | Fk] = E [Xk+1 − Xk | Fk] − ak+1 = 0,

or Mk is a martingale.
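
The construction in the proof can be carried out along a single path. The sketch below assumes Xk = f(Sk), where Sk is the fair ±1 random walk started at 0 and f is a convex function such as the absolute value (so that Xk is a submartingale); these choices and the function name are illustrative only.

```python
from fractions import Fraction

def doob_decomposition(path, f):
    """Return (M, A) with f(S_k) = M_k + A_k along one +/-1 path, k = 0..len(path)."""
    s = 0
    A = [Fraction(0)]
    M = [Fraction(f(0))]
    for x in path:
        # a_k = E[f(S_k) | F_{k-1}] - f(S_{k-1}); it depends only on S_{k-1}.
        a = Fraction(f(s + 1) + f(s - 1), 2) - f(s)
        s += x
        A.append(A[-1] + a)
        M.append(Fraction(f(s)) - A[-1])
    return M, A
```

For f = abs the increment ak equals 1 when Sk−1 = 0 and 0 otherwise, so Ak counts the visits of the walk to 0 before time k.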

Combining Propositions A.29 and A.30 we have

Corollary A.31 Suppose Xk is a submartingale, and N1 ≤ N2 are bounded stopping times.

Then

E [XN2 | FN1 ] ≥ XN1 .

A.9 Doob’s inequalities

The first interesting consequences of the optional stopping theorems are Doob’s inequalities.

If Mn is a martingale, set $M_n^* = \max_{i \le n} |M_i|$.

Theorem A.32 If Mn is a martingale or a positive submartingale,

\[ P(M_n^* \ge a) \le \frac{1}{a} E[\,|M_n|; M_n^* \ge a] \le \frac{1}{a} E|M_n|. \]


Proof Fix n. Set Mn+1 = Mn. Let N = min{ j : |Mj| ≥ a} ∧ (n + 1). Since the function

f (x) = |x| is convex, |Mn| is a submartingale. If A = (M∗n ≥ a), then A ∈ FN and we have

aP(M∗n ≥ a) ≤ E [ |MN |; A] ≤ E [ |Mn|; A] ≤ E |Mn|,

the first inequality by the definition of N , the second by Corollary A.28.
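
The three quantities in Theorem A.32 can be computed exactly in a small case. The sketch below assumes Mn is the fair ±1 random walk started at 0 (an illustrative choice) and uses a hypothetical function name.

```python
from fractions import Fraction
from itertools import product

def doob_maximal_terms(n, a):
    """Return (P(M*_n >= a), E[|M_n|; M*_n >= a]/a, E|M_n|/a) for the fair +/-1 walk."""
    w = Fraction(1, 2 ** n)  # probability of each path
    p = restricted = full = Fraction(0)
    for path in product((-1, 1), repeat=n):
        s, running_max = 0, 0
        for x in path:
            s += x
            running_max = max(running_max, abs(s))
        full += w * abs(s)
        if running_max >= a:
            p += w
            restricted += w * abs(s)
    return p, restricted / a, full / a
```

The returned triple should be nondecreasing from left to right, which is the content of the theorem.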

For p > 1, we have the following inequality.

Theorem A.33 If p > 1, M is a martingale or positive submartingale, and $E|M_i|^p < \infty$ for i ≤ n, then

\[ E (M_n^*)^p \le \Big(\frac{p}{p-1}\Big)^p E|M_n|^p. \]

Proof Note $M_n^* \le \sum_{i=1}^n |M_i|$, hence $M_n^* \in L^p$. We write, using Theorem A.32,

\[
\begin{aligned}
E (M_n^*)^p &= \int_0^\infty p a^{p-1} P(M_n^* > a)\,da \le \int_0^\infty p a^{p-1} E[\,|M_n| 1_{(M_n^* \ge a)}/a]\,da \\
&= E \int_0^{M_n^*} p a^{p-2} |M_n|\,da = \frac{p}{p-1} E[(M_n^*)^{p-1} |M_n|] \\
&\le \frac{p}{p-1} (E (M_n^*)^p)^{(p-1)/p} (E|M_n|^p)^{1/p}.
\end{aligned}
\]

The last inequality follows by Hölder’s inequality. Now divide both sides by the quantity $(E (M_n^*)^p)^{(p-1)/p}$.

A.10 Martingale convergence theorem

The martingale convergence theorem is another important consequence of optional stopping.

The main step is the upcrossing lemma. The number of upcrossings of an interval [a, b] is

the number of times a process M crosses from below a to above b.

To be more exact, let

S1 = min{k : Mk ≤ a}, T1 = min{k > S1 : Mk ≥ b},

and

Si+1 = min{k > Ti : Mk ≤ a}, Ti+1 = min{k > Si+1 : Mk ≥ b}.

The number of upcrossings Un before time n is Un = max{ j : Tj ≤ n}.
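
The definition of Un translates directly into a single pass over a finite path. The following is a minimal sketch; the function name is hypothetical and a < b is assumed.

```python
def count_upcrossings(path, a, b):
    """Number of completed upcrossings of [a, b] (a < b) by the finite sequence path."""
    count = 0
    below = False  # True once the path has been <= a since the last completed upcrossing
    for m in path:
        if not below:
            if m <= a:       # this index plays the role of S_i
                below = True
        elif m >= b:         # this index plays the role of T_i
            count += 1
            below = False
    return count
```

For example, the path 0, 2, −1, 3, 1, 0, 5 completes three upcrossings of [0, 2].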

Theorem A.34 (Upcrossing lemma) If Mk is a submartingale,

\[ E U_n \le \frac{1}{b-a} E[(M_n - a)^+]. \]

Proof The number of upcrossings of [a, b] by Mk is the same as the number of upcrossings

of [0, b − a] by Yk = (Mk − a)+, where x+ = x ∨ 0. Moreover Yk is still a submartingale.

If we obtain the inequality for the number of upcrossings of the interval [0, b − a] by the

process Yk , we will have the desired inequality for upcrossings of M .

Thus we may assume a = 0. Fix n and define Yn+1 = Yn. This will still be a submartingale.

Define Si, Ti as above, and let S′i = Si ∧ (n + 1), T ′i = Ti ∧ (n + 1). Since Ti+1 > Si+1 > Ti,

then T ′n+1 = n + 1.


We write

\[ E Y_{n+1} = E Y_{S'_1} + \sum_{i=1}^{n+1} E[Y_{T'_i} - Y_{S'_i}] + \sum_{i=1}^{n+1} E[Y_{S'_{i+1}} - Y_{T'_i}]. \]

All the summands in the third term on the right are non-negative since Yk is a submartingale.

The first term on the right will be non-negative since Y is non-negative. For the jth upcrossing, $Y_{T'_j} - Y_{S'_j} \ge b - a$, while $Y_{T'_j} - Y_{S'_j}$ is always greater than or equal to 0. Thus

\[ \sum_{i=1}^{n+1} (Y_{T'_i} - Y_{S'_i}) \ge (b-a) U_n. \]

Hence

\[ E U_n \le \frac{1}{b-a} E Y_{n+1}. \tag{A.19} \]

This leads to the martingale convergence theorem.

Theorem A.35 If Mn is a submartingale such that $\sup_n E M_n^+ < \infty$, then Mn converges
almost surely as n → ∞.
Proof For each a < b, let Un(a, b) be the number of upcrossings of [a, b] by M up to time
n, and let $U(a, b) = \lim_{n\to\infty} U_n(a, b)$. For each pair a < b of rational numbers, by monotone
convergence,
\[ E U(a, b) \le \frac{1}{b-a} \sup_n E (M_n - a)^+ < \infty. \]
Thus U (a, b) < ∞, a.s. If $N_{a,b}$ is the set of ω’s where U (a, b) = ∞ and $N = \bigcup_{a<b,\ a,b \in \mathbb{Q}} N_{a,b}$, then P(N) = 0. If ω ∉ N, then the sequence Mn(ω) cannot have $\limsup_{n\to\infty} M_n(\omega) > \liminf_{n\to\infty} M_n(\omega)$. Therefore
Mn converges almost surely, although we still have to rule out the possibility of the limit
being infinite. Since Mn is a submartingale, E Mn ≥ E M0, and thus
\[ E|M_n| = E M_n^+ + E M_n^- = 2E M_n^+ - E M_n \le 2E M_n^+ - E M_0. \]
By Fatou’s lemma,
\[ E \lim_n |M_n| \le \sup_n E|M_n| \le 2 \sup_n E M_n^+ - E M_0 < \infty, \]
or Mn converges almost surely to a finite limit.
Corollary A.36 If Xn is a positive supermartingale or a martingale bounded above or below,
Xn converges almost surely.
Proof If Xn is a positive supermartingale, −Xn is a submartingale bounded above by 0. Now
apply Theorem A.35.
If Xn is a martingale bounded above, by considering −Xn, we may assume Xn is bounded
below. Looking at Xn + M for fixed M will not affect the convergence, so we may assume Xn
is bounded below by 0. Now apply the first assertion of the corollary.
Mn is a uniformly integrable martingale if the collection of random variables {Mn} is
uniformly integrable.
Proposition A.37 (1) If Mn is a martingale with supn E |Mn|p < ∞ for some p > 1, then the
convergence is in Lp as well as almost surely. This is also true when Mn is a submartingale.
(2) If Mn is a uniformly integrable martingale, then the convergence is in L1.
(3) If Mn → M∞ in L1, then Mn = E [M∞ | Fn].
Proof (1) If $\sup_n E|M_n|^p < \infty$, then $\sup_n E M_n^+ < \infty$ and Mn converges almost surely. Let
M∞ be the limit. Then |Mn − M∞| → 0, a.s., and
\[
\begin{aligned}
E \sup_n |M_n - M_\infty|^p &\le cE \sup_n |M_n|^p + cE|M_\infty|^p \\
&\le cE \sup_n |M_n|^p \\
&\le c \sup_n E|M_n|^p < \infty.
\end{aligned}
\]
The second inequality is by Fatou’s lemma and the last by Doob’s inequalities,
Theorem A.33. The Lp convergence assertion now follows by dominated convergence.
(2) The L1 convergence assertion follows since almost sure convergence together
with uniform integrability implies L1 convergence by the Vitali convergence theorem,
Theorem A.19.
(3) Finally, if j < n, we have Mj = E [Mn | F j]. If A ∈ F j,
E [Mj; A] = E [Mn; A] → E [M∞; A]
by the L1 convergence of Mn to M∞. Since this is true for all A ∈ F j, Mj = E [M∞ | F j].
A.11 Strong law of large numbers
Suppose we have a sequence X1, X2, . . . of independent and identically distributed random
variables. This means that the Xi are independent and each has the same law as X1. This
situation is very common, and we abbreviate this by saying the Xi are i.i.d.
Define
\[ S_n = \sum_{i=1}^n X_i. \]
The Sn are called partial sums. In this section we suppose E |X1| < ∞. The strong law of
large numbers is the precise version of the law of averages.
Theorem A.38 If Xi is an i.i.d. sequence and E |X1| < ∞, then
\[ \frac{S_n}{n} \to E X_1, \quad \text{a.s.} \]
The proof we give is a mixture of the standard one and some martingale techniques. The
standard proof (see, e.g., Chung (2001)) uses no martingale methods, while there is a proof
(see Durrett (1996)) that is entirely martingale based.
Proof We may assume E Xi = 0, for otherwise we replace Xi by Xi − E Xi. Let $Y_n = X_n 1_{(|X_n| \le n)}$, $Z_n = Y_n - E Y_n$, and
\[ M_n = \sum_{i=1}^n \frac{Z_i}{i}. \]
Let Fn = σ (X1, . . . , Xn). Note that the Zi are independent but not identically distributed.
Using the independence, Mn is a martingale:
\[ E[M_{n+1} \mid \mathcal{F}_n] = M_n + \frac{1}{n+1} E[Z_{n+1} \mid \mathcal{F}_n] = M_n + \frac{1}{n+1} E[Z_{n+1}] = M_n. \]
We will need the estimate
\[ \sum_{i=1}^\infty P(|X_1| \ge i) = \sum_{i=1}^\infty \int_{i-1}^{i} P(|X_1| \ge i)\,dx \le \int_0^\infty P(|X_1| \ge x)\,dx = E|X_1| < \infty, \tag{A.20} \]
using Proposition A.4.
We show that E |Mn| is bounded by a constant not depending on n. In fact, again using
Proposition A.4,
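\[
\begin{aligned}
E M_n^2 = \operatorname{Var} M_n &= \sum_{i=1}^n \frac{\operatorname{Var} Z_i}{i^2} = \sum_{i=1}^n \frac{1}{i^2} \operatorname{Var} Y_i \\
&\le \sum_{i=1}^n \frac{1}{i^2} E Y_i^2 \le \sum_{i=1}^\infty \frac{1}{i^2} \int_0^i 2y P(|X_i| \ge y)\,dy \\
&= 2 \sum_{i=1}^\infty \frac{1}{i^2} \int_0^\infty 1_{(y \le i)}\, y P(|X_1| \ge y)\,dy \\
&= 2 \int_0^\infty \sum_{i=1}^\infty \frac{1}{i^2} 1_{(y \le i)}\, y P(|X_1| \ge y)\,dy \\
&\le c \int_0^\infty \frac{1}{y} \cdot y P(|X_1| \ge y)\,dy \\
&= c \int_0^\infty P(|X_1| \ge y)\,dy = cE|X_1| < \infty.
\end{aligned}
\]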
The uniform bound on E |Mn| follows by Jensen’s inequality.
By the martingale convergence theorem, Mn converges almost surely; let M∞ be the limit.
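Some elementary calculus shows that $\frac{1}{n} \sum_{i=1}^n M_i$ also converges to M∞, a.s. We now use
summation by parts as follows. Since i(Mi − Mi−1) = Zi and M0 = 0, then
\[
\begin{aligned}
\frac{1}{n} \sum_{i=1}^n Z_i &= \frac{1}{n} \sum_{i=1}^n (iM_i - iM_{i-1}) = \frac{1}{n} \Big( \sum_{i=1}^n iM_i - \sum_{i=1}^{n-1} (i+1)M_i \Big) \\
&= M_n - \frac{n-1}{n} \Big( \frac{1}{n-1} \sum_{i=1}^{n-1} M_i \Big) \to M_\infty - M_\infty = 0.
\end{aligned}
\]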
By dominated convergence and the fact that the Xi are identically distributed,
\[ E Y_n = E[X_n 1_{(|X_n| \le n)}] = E[X_1 1_{(|X_1| \le n)}] \to E X_1 = 0 \]
as n → ∞, and this implies $\frac{1}{n} \sum_{i=1}^n E Y_i \to 0$. Since Yi = Zi + EYi, we conclude
\[ \frac{1}{n} \sum_{i=1}^n Y_i \to 0, \quad \text{a.s.} \]
Finally,
\[ \sum_{i=1}^\infty P(X_i \ne Y_i) \le \sum_{i=1}^\infty P(|X_i| \ge i) = \sum_{i=1}^\infty P(|X_1| \ge i) < \infty, \]
so by the Borel–Cantelli lemma, except for a set of probability zero, Xi = Yi for all i greater
than some positive integer I (I depends on ω). Hence
\[ \Big| \frac{1}{n} \sum_{i=1}^n X_i - \frac{1}{n} \sum_{i=1}^n Y_i \Big| \le \frac{1}{n} \sum_{i=1}^{I} |X_i - Y_i| \to 0, \quad \text{a.s.} \]
This completes the proof.
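
The summation-by-parts step in the proof is a purely algebraic identity, so it can be checked on any finite list of numbers. The sketch below uses exact rationals; the function name and the sample values are illustrative only.

```python
from fractions import Fraction

def summation_by_parts_sides(z):
    """For integers z_1..z_n, return (lhs, rhs) with
    lhs = (1/n) sum z_i and rhs = M_n - (1/n) sum_{i<n} M_i, where M_k = sum_{i<=k} z_i / i."""
    n = len(z)
    M = []
    running = Fraction(0)
    for i, zi in enumerate(z, start=1):
        running += Fraction(zi, i)
        M.append(running)
    lhs = Fraction(sum(z), n)
    rhs = M[-1] - sum(M[:-1], Fraction(0)) / n
    return lhs, rhs
```

The right-hand side is the same expression as $M_n - \frac{n-1}{n}\big(\frac{1}{n-1}\sum_{i=1}^{n-1} M_i\big)$, written without the factor n − 1 so that n = 1 needs no special case.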
The following extension of the strong law will be needed when comparing a random walk
and a Brownian motion.
Proposition A.39 Suppose Xi is an i.i.d. sequence and E |X1| < ∞. Then
\[ \frac{\max_{k \le n} |S_k - E S_k|}{n} \to 0, \quad \text{a.s.} \]
Proof By looking at Xi − E Xi, we may assume E Xi = 0. Let j(n) be (one of) the value(s) of
j such that |Sj| = maxk≤n |Sk|. Suppose Sn(ω)/n → 0. It suffices to show |Sj(n)(ω)|/n → 0,
a.s.
If not, for this ω, either
(1) there is a subsequence nk → ∞ and ε > 0 such that j(nk ) → ∞ and |Sj(nk )|/nk ≥ ε
for all k; or
(2) there exists a subsequence nk → ∞, ε > 0, and N > 1 such that j(nk ) ≤ N and
|Sj(nk )|/nk ≥ ε for all k.
In case (1), since j(nk ) → ∞,
\[ \frac{|S_{j(n_k)}|}{n_k} = \frac{|S_{j(n_k)}|}{j(n_k)} \cdot \frac{j(n_k)}{n_k} \le \frac{|S_{j(n_k)}|}{j(n_k)} \to 0, \]
a contradiction. In case (2),
\[ \frac{|S_{j(n_k)}|}{n_k} \le \frac{\max_{m \le N} |S_m|}{n_k} \to 0, \]
also a contradiction.
Another application of the strong law of large numbers is the Glivenko–Cantelli
theorem. Let Xi be i.i.d. random variables which have a uniform distribution on [0, 1],
that is, P(X1 ≤ t) = t if 0 ≤ t ≤ 1. Let

\[ F_n(t) = \frac{1}{n} \sum_{i=1}^n 1_{[0,t]}(X_i), \quad 0 \le t \le 1. \]

By the strong law, Fn(t) → t, a.s., for each t. The Glivenko–Cantelli theorem says that the

convergence is uniform over t.

Theorem A.40 With Fn as above,

\[ \sup_{0 \le t \le 1} |F_n(t) - t| \to 0, \quad \text{a.s.} \]

Proof For each t ∈ [0, 1], 1[0,t](Xi) is a sequence of i.i.d. random variables with expectation

P(Xi ≤ t) = t. By the strong law of large numbers, for each t, Fn(t) → t, a.s. Let Nt be the

set of ω such that Fn(t)(ω) does not converge to t, and let $N = \bigcup_{t \in \mathbb{Q}_+} N_t$. Then P(N ) = 0.

Let ε > 0 and take ω /∈ N . Take m > 2/ε and choose n0 large enough (depending on ω)

such that

|Fn(k/m)(ω) − (k/m)| < ε/2, k = 0, 1, 2, . . . , m,
if n ≥ n0. Then if n ≥ n0 and k/m ≤ t < (k + 1)/m,
Fn(t) − t ≤ Fn((k + 1)/m) − k/m ≤ Fn((k + 1)/m) − (k + 1)/m + ε/2 < ε,
and similarly Fn(t) − t > −ε. Hence for n ≥ n0,

\[ \sup_{t \in [0,1]} |F_n(t) - t| \le \varepsilon. \]

Since ε is arbitrary, this proves the uniform convergence.
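
The supremum in Theorem A.40 can be computed exactly for a given sample, because Fn is a step function and t is increasing, so the supremum is approached at the jump points. The sketch below is illustrative: the function name, the seed, and the sample size are arbitrary choices, and the sample is assumed to lie in [0, 1].

```python
import random

def sup_distance_from_uniform(sample):
    """sup over t in [0, 1] of |F_n(t) - t| for the empirical distribution function F_n."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # F_n rises from (i-1)/n to i/n at x, so compare t = x with both levels.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d

# Usage: the distance shrinks as the sample grows.
rng = random.Random(0)
small = sup_distance_from_uniform([rng.random() for _ in range(100)])
large = sup_distance_from_uniform([rng.random() for _ in range(10000)])
```

With 10000 points the distance is typically of order 0.01, consistent with the uniform convergence in the theorem.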

A.12 Weak convergence

We will see soon that if the Xi are i.i.d. with mean zero and variance one, then $S_n/\sqrt{n}$ converges in the sense that

\[ P(S_n/\sqrt{n} \in [a, b]) \to P(Z \in [a, b]), \]

where Z is a standard normal. We want to generalize the above type of convergence.

We say Fn converges weakly to F if Fn(x) → F (x) for all x at which F is continuous. Here

Fn and F are distribution functions. We say Xn converges weakly to X if FXn converges weakly

to FX . We also say Xn converges in distribution or converges in law to X . Probabilities μn

converge weakly if their corresponding distribution functions converge, that is, if Fμn (x) =

μn(−∞, x] converges weakly.

An example that illustrates why we restrict the convergence to continuity points of F is

the following. Let Xn = 1/n with probability one, and X = 0 with probability one. FXn (x) is

0 if x < 1/n and 1 otherwise. Note FXn (x) converges to FX (x) for all x except x = 0.
Proposition A.41 Xn converges weakly to X if and only if E g(Xn) → E g(X ) for all g
bounded and continuous.
Proof Suppose E g(Xn) → E g(X ) whenever g is bounded and continuous. Let ε > 0

and suppose x is a continuity point of FX . Choose δ such that FX (x) − ε < FX (x − δ) ≤
FX (x + δ) < FX (x) + ε. Let g be a continuous function taking values in [0, 1] such that g
equals 1 on (−∞, x] and equals 0 on [x + δ, ∞). Then
\[ \limsup_{n\to\infty} F_{X_n}(x) \le \limsup_{n\to\infty} E g(X_n) = E g(X) \le F_X(x + \delta) < F_X(x) + \varepsilon. \]
A similar argument shows that $\liminf_{n\to\infty} F_{X_n}(x) > F_X(x) - \varepsilon$. Since ε is arbitrary,

limn→∞ FXn (x) = FX (x).

Now suppose Xn → X weakly. Let ε > 0 and choose M > 0 such that M and −M are

continuity points for FX and also continuity points for each of the FXn , FX (−M ) < ε, and
FX (M ) > 1 − ε. Suppose g is bounded and continuous on R and without loss of generality

suppose g is bounded by 1. Then

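\[
\begin{aligned}
\limsup_{n\to\infty} |E[g(X_n); X_n \notin [-M, M)]| &\le \limsup_{n\to\infty} P(|X_n| \ge M) \\
&= \limsup_{n\to\infty} F_{X_n}(-M) + \limsup_{n\to\infty} (1 - F_{X_n}(M)) \\
&\le 2\varepsilon.
\end{aligned}
\tag{A.21}
\]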

Similarly,

\[ |E[g(X); X \notin [-M, M)]| \le 2\varepsilon. \tag{A.22} \]

Take f to be a step function of the form $\sum_{i=1}^m c_i 1_{(a_i, b_i]}$ such that | f (x) − g(x)| < ε for
x ∈ [−M, M ) and each ai and bi is a continuity point for FX and also a continuity point for
each of the FXn . Then
\[ E f(X_n) = \sum_{i=1}^m c_i (F_{X_n}(b_i) - F_{X_n}(a_i)) \to \sum_{i=1}^m c_i (F_X(b_i) - F_X(a_i)) = E f(X). \tag{A.23} \]
Finally, since f differs from g by at most ε on [−M, M), then
|E f (Xn) − E [g(Xn); Xn ∈ [−M, M )] | ≤ ε (A.24)
and similarly when Xn is replaced by X . Combining (A.21), (A.22), (A.23), and (A.24) and
using the fact that ε is arbitrary shows that E g(Xn) → E g(X ).
Let us examine the relationship between weak convergence and convergence in probability.
If Xi is an i.i.d. sequence, then Xi converges weakly, in fact, to X1, since all the Xi’s have the
same distribution. But from the independence it is not hard to see that the sequence Xi does
not converge in probability unless the Xi’s are identically constant. Therefore one can have
weak convergence without convergence in probability.
Proposition A.42 (1) If Xn converges to X in probability, then it converges weakly.
(2) If Xn converges weakly to a constant, it converges in probability.
(3) (Slutsky’s theorem) If Xn converges weakly to X and Yn converges weakly to a constant
b, then Xn + Yn converges weakly to X + b and XnYn converges weakly to bX .
Proof To prove (1), let g be a bounded and continuous function. If nj is any subsequence,
then there exists a further subsequence such that X (njk ) converges almost surely to X . Then
by dominated convergence, E g(X (njk )) → E g(X ). That suffices to show E g(Xn) converges
to E g(X ).
For (2), if Xn converges weakly to b,
P(Xn − b > ε) = P(Xn > b + ε) = 1 − P(Xn ≤ b + ε) → 1 − P(b ≤ b + ε) = 0.

We use the fact that if Y is identically equal to b, then b + ε is a point of continuity for FY .

A similar equation shows P(Xn − b ≤ −ε) → 0, so P(|Xn − b| > ε) → 0.

We now prove the first part of (3), leaving the second part for the reader. Let x be a point

such that x − b is a continuity point of FX . Choose ε so that x − b + ε is again a continuity

point. Then

P(Xn + Yn ≤ x) ≤ P(Xn + b ≤ x + ε) + P(|Yn − b| > ε) → P(X ≤ x − b + ε).

Hence lim sup P(Xn + Yn ≤ x) ≤ P(X + b ≤ x + ε). Since ε can be arbitrarily small and

x − b is a continuity point of FX , then lim sup P(Xn + Yn ≤ x) ≤ P(X + b ≤ x). The lim inf

is done similarly.

We say a sequence of distribution functions {Fn} is tight if for each ε > 0 there exists M such

that Fn(M ) ≥ 1 − ε and Fn(−M ) ≤ ε for all n. A sequence of random variables {Xn} is tight

if the corresponding distribution functions are tight; this is equivalent to P(|Xn| ≥ M ) ≤ ε.

Theorem A.43 (Helly’s theorem) Let Fn be a sequence of distribution functions that is

tight. There exists a subsequence n j and a distribution function F such that Fnj converges

weakly to F.

What could conceivably happen is that Xn is identically equal to n, so that FXn → 0, but the

function F that is identically equal to 0 is not a distribution function; the tightness precludes

this.

Proof Let qk be an enumeration of the rationals. Since Fn(qk ) ∈ [0, 1], any subsequence

has a further subsequence that converges. Use a diagonalization argument (as in the proof

of the Ascoli–Arzelà theorem; see Rudin (1976)) so that Fnj (qk ) converges for each qk and

call the limit F (qk ). F is increasing, and define F (x) = inf qk≥x F (qk ). Hence F is right

continuous and increasing.

If x is a point of continuity of F and ε > 0, then there exist r and s rational such that

r < x < s and F (s) − ε < F (x) < F (r) + ε. Then
Fnj (x) ≥ Fnj (r) → F (r) > F (x) − ε

and

Fnj (x) ≤ Fnj (s) → F (s) < F (x) + ε.
Since ε is arbitrary, Fnj (x) → F (x).
Since the Fn are tight, there exists M such that Fn(−M ) < ε. Then F (−M ) ≤ ε, which
implies limx→−∞ F (x) = 0. Showing limx→∞ F (x) = 1 is similar. Therefore F is in fact a
distribution function.
We conclude by giving an easily checked criterion for tightness.
Proposition A.44 Suppose there exists ϕ : [0, ∞) → [0, ∞) that is increasing and ϕ(x) →
∞ as x → ∞. If a = supn E ϕ(|Xn|) < ∞, then the sequence {Xn} is tight.
Proof Let ε > 0. Choose M such that ϕ(x) ≥ a/ε if x > M . Then

\[ P(|X_n| > M) \le \int \frac{\varphi(|X_n|)}{a/\varepsilon} 1_{(|X_n| > M)}\,dP \le \frac{\varepsilon}{a} E \varphi(|X_n|) \le \varepsilon. \]

The conclusion follows.

In particular, if supn E |Xn|2 < ∞, the sequence {Xn} is tight.
A.13 Characteristic functions
We define the characteristic function of a random variable X by $\varphi_X(t) = E e^{itX}$ for t ∈ R.
Note that $\varphi_X(t) = \int e^{itx} P_X(dx)$. Thus if X and Y have the same law, they have the same
characteristic function. Also, if the law of X has a density, that is, PX (dx) = fX (x) dx, then
$\varphi_X(t) = \int e^{itx} f_X(x)\,dx$, so in this case the characteristic function is the same as the definition
of the Fourier transform of fX .
Proposition A.45 ϕ(0) = 1, |ϕ(t)| ≤ 1, $\varphi(-t) = \overline{\varphi(t)}$, and ϕ is uniformly continuous.
Proof Since $|e^{itx}| \le 1$, everything follows immediately from the definitions except the
uniform continuity. For that we write
\[ |\varphi(t + h) - \varphi(t)| = |E e^{i(t+h)X} - E e^{itX}| \le E|e^{itX}(e^{ihX} - 1)| = E|e^{ihX} - 1|. \]
Since |eihX − 1| tends to zero almost surely as h → 0, the right-hand side tends to zero by
dominated convergence. Note that the right-hand side is independent of t.
Proposition A.46 $\varphi_{aX}(t) = \varphi_X(at)$ and $\varphi_{X+b}(t) = e^{itb}\varphi_X(t)$.
Proof The first follows from $E e^{it(aX)} = E e^{i(at)X}$, and the second is similar.
Proposition A.47 If X and Y are independent, then
\[ \varphi_{X+Y}(t) = \varphi_X(t)\varphi_Y(t). \]
Proof From the multiplication theorem,
\[ E e^{it(X+Y)} = E e^{itX} e^{itY} = E e^{itX} E e^{itY}, \]
and we are done.
Let us look at some examples of characteristic functions.
(1) Bernoulli: By direct computation,
\[ \varphi_X(t) = pe^{it} + (1 - p) = 1 - p(1 - e^{it}). \]
(2) Binomial: Write X as the sum of n independent Bernoulli random variables Bi with
parameter p. Thus
\[ \varphi_X(t) = \prod_{i=1}^n \varphi_{B_i}(t) = [\varphi_{B_i}(t)]^n = [1 - p(1 - e^{it})]^n. \]
(3) Point mass at a: $E e^{itX} = e^{ita}$. Note that when a = 0, then ϕ is identically equal to 1.
(4) Poisson:
\[ E e^{itX} = \sum_{k=0}^\infty e^{itk} e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^\infty \frac{(\lambda e^{it})^k}{k!} = e^{-\lambda} e^{\lambda e^{it}} = e^{\lambda(e^{it} - 1)}. \]
(5) Uniform on [a, b]:
\[ \varphi(t) = \frac{1}{b-a} \int_a^b e^{itx}\,dx = \frac{e^{itb} - e^{ita}}{(b-a)it}. \]
Note that when a = −b this reduces to sin(bt)/(bt).
(6) Exponential:
\[ \varphi(t) = \int_0^\infty \lambda e^{itx} e^{-\lambda x}\,dx = \lambda \int_0^\infty e^{(it-\lambda)x}\,dx = \frac{\lambda}{\lambda - it}. \]
(7) Standard normal:
\[ \varphi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{itx} e^{-x^2/2}\,dx. \]
This can be done by completing the square and then doing a contour integration. Alternately,
$\varphi'(t) = (1/\sqrt{2\pi}) \int_{-\infty}^\infty ix e^{itx} e^{-x^2/2}\,dx$. (Do the real and imaginary parts separately, and use
the dominated convergence theorem to justify taking the derivative inside.) Integrating by
parts (do the real and imaginary parts separately), ϕ′(t) = −tϕ(t). The only solution to this
differential equation with ϕ(0) = 1 is $\varphi(t) = e^{-t^2/2}$.
(8) Normal with mean μ and variance $\sigma^2$: Writing X = σZ + μ, where Z is a standard
normal, then
\[ \varphi_X(t) = e^{i\mu t} \varphi_Z(\sigma t) = e^{i\mu t - \sigma^2 t^2/2}. \tag{A.25} \]
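(9) Gamma. If X has a gamma distribution with parameters λ and r, then its characteristic
function is
\[ E e^{itX} = \Big( \frac{\lambda}{\lambda - it} \Big)^r. \]
Formally, this comes from writing
\[ \varphi(t) = \frac{1}{\Gamma(r)} \int_0^\infty e^{itx} \lambda e^{-\lambda x} (\lambda x)^{r-1}\,dx = \frac{\lambda^r}{\Gamma(r)} \int_0^\infty e^{-(\lambda - it)x} x^{r-1}\,dx \]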
and performing a change of variables. To do it properly requires a contour integration around
the boundary of the region in the complex plane that is bounded by the positive x axis, the
ray {(λ − it)r : r > 0}, ∂B(0, ε), and ∂B(0, R), and then letting ε → 0 and R → ∞.
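
Two of the closed forms above, the binomial and the Poisson, can be compared numerically with the defining sums. This is a minimal sketch; the function names and the truncation at 200 terms are illustrative choices.

```python
import cmath
import math

def poisson_cf_series(t, lam, terms=200):
    """Partial sum of sum_k e^{itk} e^{-lam} lam^k / k!."""
    total = 0j
    weight = math.exp(-lam)  # e^{-lam} lam^k / k! at k = 0
    for k in range(terms):
        total += cmath.exp(1j * t * k) * weight
        weight *= lam / (k + 1)
    return total

def poisson_cf_closed(t, lam):
    """Closed form e^{lam (e^{it} - 1)}."""
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))

def binomial_cf_sum(t, n, p):
    """E e^{itX} computed from the binomial probabilities."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) * cmath.exp(1j * t * k)
               for k in range(n + 1))

def binomial_cf_closed(t, n, p):
    """Closed form [1 - p(1 - e^{it})]^n."""
    return (1 - p * (1 - cmath.exp(1j * t))) ** n
```

In both cases the sum and the closed form should agree up to floating-point rounding.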


A.14 Uniqueness and characteristic functions

Theorem A.48 If ϕX = ϕY , then PX = PY .

Proof If f is in the Schwartz class, then so is f̂ ; see Section B.2. We use the Fubini theorem

and the Fourier inversion theorem to write

\[ E f(X) = (2\pi)^{-1} E\Big[ \int \hat{f}(u) e^{-iuX}\,du \Big] = (2\pi)^{-1} \int \hat{f}(u) \varphi_X(-u)\,du, \]

and similarly for E f (Y ). Since ϕX = ϕY , we conclude E f (X ) = E f (Y ). By a limit

procedure, we have this equality for all bounded and measurable f , in particular, when f is

the indicator of a set.

The same proof works in higher dimensions: if

\[ E e^{i \sum_{j=1}^n u_j X_j} = E e^{i \sum_{j=1}^n u_j Y_j} \]

for all (u1, . . . , un) ∈ Rn, then the joint laws of (X1, . . . , Xn) and (Y1, . . . ,Yn) are equal. The

expression $E e^{i \sum_{j=1}^n u_j X_j}$ is called the joint characteristic function of (X1, . . . , Xn).

The following proposition can be proved directly, but the proof using characteristic functions is much easier.

Proposition A.49 (1) If X and Y are independent, X is a normal random variable with

mean a and variance $b^2$, and Y is a normal random variable with mean c and variance $d^2$,

then X + Y is a normal random variable with mean a + c and variance $b^2 + d^2$.

(2) If X and Y are independent, X is a Poisson random variable with parameter λ1, and Y

is a Poisson random variable with parameter λ2, then X + Y is a Poisson random variable

with parameter λ1 + λ2.

(3) If X and Y are independent random variables, where X has a gamma distribution with

parameters λ and r1 and Y has a gamma distribution with parameters λ and r2, then X + Y

has a gamma distribution with parameters λ and r1 + r2.

Proof For (1),

\[ \varphi_{X+Y}(t) = \varphi_X(t)\varphi_Y(t) = e^{iat - b^2t^2/2} e^{ict - d^2t^2/2} = e^{i(a+c)t - (b^2+d^2)t^2/2}. \]

Now use the uniqueness theorem.

Parts (2) and (3) are proved similarly.
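
Part (2) can also be checked directly from the probability mass functions, since the law of X + Y is the convolution of the two laws. The sketch below is illustrative; the function names and parameter values are arbitrary.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X Poisson with parameter lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def convolved_pmf(k, lam1, lam2):
    """P(X + Y = k) for independent X ~ Poisson(lam1) and Y ~ Poisson(lam2)."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))
```

The values of `convolved_pmf(k, lam1, lam2)` should match `poisson_pmf(k, lam1 + lam2)` up to rounding, in agreement with the characteristic function argument.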

A.15 The central limit theorem

We need the following estimate on moments.

Proposition A.50 If $E|X|^k < \infty$ for an integer k, then ϕX has a continuous derivative of
order k and
\[ \varphi_X^{(k)}(t) = \int (ix)^k e^{itx} P_X(dx). \]
In particular, $\varphi_X^{(k)}(0) = i^k E X^k$.
Proof Write
\[ \frac{\varphi_X(t + h) - \varphi_X(t)}{h} = \int \frac{e^{i(t+h)x} - e^{itx}}{h} P_X(dx). \]
Since $|e^{ihx} - 1| \le |h|\,|x|$, the integrand is bounded by |x|. Thus if $\int |x| P_X(dx) < \infty$, we
can use dominated convergence to obtain the desired formula for ϕ′X (t). As in the proof of
Proposition A.45, we see ϕ′X (t) is continuous. We do the case of general k by induction.
Evaluating $\varphi_X^{(k)}$ at 0 shows $\varphi_X^{(k)}(0) = i^k E X^k$.
By the above,
\[ E X^2 = -\varphi_X''(0). \tag{A.26} \]
The simplest case of the central limit theorem (CLT) is the case when the Xi’s are i.i.d.,
with mean zero and variance one, $S_n = \sum_{i=1}^n X_i$, and then the CLT says that $S_n/\sqrt{n}$ converges
weakly to a standard normal. This is the case we prove.
We need the fact that if wn are complex numbers converging to w, then $(1 + (w_n/n))^n \to e^w$.
We leave the proof of this to the reader, with the warning that any proof using logarithms
needs to be done with some care, since log z is a multivalued function when z is
complex.
Theorem A.51 Suppose the Xi’s are i.i.d. random variables with mean zero and variance
one. Then $S_n/\sqrt{n}$ converges weakly to a standard normal.
Proof Since X1 has finite second moment, then ϕX1 has a continuous second derivative by
Proposition A.50. By Taylor’s theorem,
\[ \varphi_{X_1}(t) = \varphi_{X_1}(0) + \varphi'_{X_1}(0)t + \varphi''_{X_1}(0)t^2/2 + R(t), \]
where $|R(t)|/t^2 \to 0$ as |t| → 0. Thus
\[ \varphi_{X_1}(t) = 1 - t^2/2 + R(t). \]
Then
\[ \varphi_{S_n/\sqrt{n}}(t) = \varphi_{S_n}(t/\sqrt{n}) = (\varphi_{X_1}(t/\sqrt{n}))^n = \Big[ 1 - \frac{t^2}{2n} + R(t/\sqrt{n}) \Big]^n. \]
Since $t/\sqrt{n}$ converges to zero as n → ∞, we have
\[ \varphi_{S_n/\sqrt{n}}(t) \to e^{-t^2/2}. \]
Since $E S_n^2/n = 1$ for all n, Proposition A.44 tells us that the random variables $S_n/\sqrt{n}$
are tight, and from Theorem A.43, subsequential weak limit points exist. By the preceding
paragraph, any weak limit of a subsequence is a normal random variable with mean zero and
variance one. Therefore the entire sequence converges weakly to a normal random variable
with mean zero and variance one.
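
The convergence of characteristic functions used in the proof can be observed numerically in one concrete case. The sketch below assumes the Xi take the values ±1 with probability 1/2 each (an illustrative mean zero, variance one choice), for which ϕX1(t) = cos t; the function names are hypothetical.

```python
import math

def cf_normalized_sum(t, n):
    """Characteristic function of S_n / sqrt(n) at t when the X_i are fair +/-1 variables."""
    return math.cos(t / math.sqrt(n)) ** n

def cf_standard_normal(t):
    """Characteristic function e^{-t^2/2} of a standard normal."""
    return math.exp(-t * t / 2)

# Usage: the gap at t = 1 shrinks as n grows.
gaps = [abs(cf_normalized_sum(1.0, n) - cf_standard_normal(1.0)) for n in (10, 1000, 10**6)]
```

The gaps decrease roughly like 1/n, reflecting the next term in the Taylor expansion of log cos.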
A.16 Gaussian random variables
A normal random variable is also known as a Gaussian random variable.
Proposition A.52 If Z is a mean zero normal random variable with variance one and x ≥ 1,
then
\[ \frac{1}{x} e^{-x^2/2} \le P(Z \ge x) \le e^{-x^2/2}. \]
In particular, if ε > 0, there exists x0 such that

\[ P(Z \ge x) \ge e^{-(1+\varepsilon)x^2/2} \]

if x ≥ x0.

Proof For the right-hand inequality,

\[ P(Z \ge x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-y^2/2}\,dy \le \int_x^\infty \frac{y}{x} e^{-y^2/2}\,dy = \frac{1}{x} e^{-x^2/2}. \]

The left-hand inequality is left as an exercise.

Proposition A.53 If Xn is a normal random variable with mean an and variance $b_n^2$, Xn

converges to X weakly, an → a, and bn → b ≠ 0, then X is a normal random variable with

mean a and variance $b^2$.

Proof Since

\[ E X_n^2 = \operatorname{Var} X_n + (E X_n)^2 = b_n^2 + a_n^2, \]

then $\sup_n E X_n^2 < \infty$, and the Xn are tight. For each t, the characteristic functions converge:
\[ \varphi_X(t) = \lim_{n\to\infty} \varphi_{X_n}(t) = \lim_{n\to\infty} e^{ita_n - t^2b_n^2/2} = e^{ita - t^2b^2/2}, \]
and the last term is the characteristic function of a normal random variable with mean a and
variance b2. Therefore any weak subsequential limit point of the sequence Xn is a normal
random variable with mean a and variance b2.
We next prove
Proposition A.54 If
\[ E e^{i(uX + vY)} = E e^{iuX} E e^{ivY} \tag{A.27} \]
for all u and v, then X and Y are independent random variables.
Proof Let X ′ be a random variable with the same law as X , Y ′ one with the same law as Y ,
and so that X′ is independent of Y′. (We let Ω = $[0, 1]^2$, P Lebesgue measure, X′ a function
of the first variable, and Y′ a function of the second variable defined as in Proposition A.2.)
Then since $e^{iuX'}$ and $e^{ivY'}$ are independent,
\[ E e^{i(uX' + vY')} = E e^{iuX'} E e^{ivY'}. \tag{A.28} \]
Since X, X′ have the same law, $E e^{iuX} = E e^{iuX'}$, and similarly for Y, Y′. Therefore, using
(A.27) and (A.28), (X′, Y′) has the same joint characteristic function as (X, Y). By the
uniqueness theorem for characteristic functions, (X ′,Y ′) has the same joint law as (X ,Y ),
which implies that X and Y are independent.
A sequence of random variables X1, . . . , Xn is said to be jointly normal if there exists a
sequence of i.i.d. normal random variables Z1, . . . , Zm with mean zero and variance one and
constants $b_{ij}$ and $a_i$ such that
\[ X_i = \sum_{j=1}^m b_{ij} Z_j + a_i, \quad i = 1, \ldots, n. \tag{A.29} \]
In matrix notation, X = BZ + A. For simplicity, in what follows let us take A = 0; the
modifications for the general case are easy. The covariance of two random variables X and
Y is defined to be E [(X − E X )(Y − EY )]. Since we are assuming our normal random
variables are mean zero, we can omit the centering at expectations. Given a sequence of
mean zero random variables, we can talk about the covariance matrix, which is
\[ \operatorname{Cov}(X) = E XX^T, \]
where $X^T$ denotes the transpose of the vector X. In the above case, we see $\operatorname{Cov}(X) = E[(BZ)(BZ)^T] = E[BZZ^TB^T] = BB^T$, since $E ZZ^T = I$, the identity.
Let us compute the joint characteristic function $E e^{iu^TX}$ of the vector X, where u is an
n-dimensional vector. First, if v is an m-dimensional vector,
\[ E e^{iv^TZ} = E \prod_{j=1}^m e^{iv_jZ_j} = \prod_{j=1}^m E e^{iv_jZ_j} = \prod_{j=1}^m e^{-v_j^2/2} = e^{-v^Tv/2} \]
using the independence of the Zj’s. Thus
\[ E e^{iu^TX} = E e^{iu^TBZ} = e^{-u^TBB^Tu/2}. \]
By taking u = (0, . . . , 0, a, 0, . . . , 0) to be a constant times the unit vector in the jth
coordinate direction, we deduce that Xj is indeed normal, and this is true for each j.
Note that the joint characteristic function of a jointly normal collection of random variables
X = (X1, . . . , Xn) is completely determined by $BB^T$, which is the covariance matrix
of X . In the case when the Xi’s are not mean zero, we can readily check that the joint characteristic
function is determined by the covariance matrix together with the vector of means
E X . Therefore the joint distribution of a jointly normal collection of random variables is
determined by the covariance matrix and the means.
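
The relation between the mixing matrix B, the covariance matrix, and the exponent of the joint characteristic function can be made concrete in a few lines. The sketch below treats the mean zero case X = BZ with B given as a list of rows; the function names and the example matrix are hypothetical.

```python
def covariance_from_mixing(B):
    """Return B B^T for a matrix B given as a list of rows."""
    return [[sum(x * y for x, y in zip(row_i, row_j)) for row_j in B] for row_i in B]

def log_joint_cf(u, B):
    """Return -u^T B B^T u / 2, the exponent of E exp(i u^T X) when X = BZ."""
    C = covariance_from_mixing(B)
    n = len(u)
    quad = sum(u[i] * C[i][j] * u[j] for i in range(n) for j in range(n))
    return -quad / 2

# Usage: X_1 = Z_1 and X_2 = Z_1 + Z_2.
B = [[1, 0], [1, 1]]
C = covariance_from_mixing(B)
```

Here C has entries Var X1 = 1, Var X2 = 2, and Cov (X1, X2) = 1, and the exponent at u = (1, −1) is −1/2, the value for the standard normal X1 − X2 = −Z2.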
Proposition A.55 If the Xi are jointly normal and Cov (Xi, Xj) = 0 for i ≠ j, then the Xi
are independent.
Proof If Cov (X ) = BBT is a diagonal matrix, then the joint characteristic function of the
Xi’s factors into the product of the characteristic functions of the Xi’s, and so by Proposition
A.54, the Xi’s will in this case be independent.
Remark A.56 We note that the analog of Proposition A.53 holds for jointly normal random
vectors. That is, if $(X_j^1, \ldots, X_j^n)$ is a jointly normal collection of random variables for each
j and each $X_j^i$ converges in probability to $X^i$ and each $X^i$ is nonconstant, then $(X^1, \ldots, X^n)$
is a jointly normal collection of random variables. This follows by looking at the joint
characteristic functions as in the proof of Proposition A.53.
We present the multidimensional central limit theorem.
Theorem A.57 Let $X_j = (X_j^1, \ldots, X_j^d)$ be random vectors taking values in $\mathbb{R}^d$ and suppose
the X1, X2, . . . are independent and identically distributed. Suppose $E X_1^k = 0$ and $E (X_1^k)^2 < \infty$
for k = 1, . . . , d and let $C_{k\ell} = E[X_1^k X_1^\ell]$. If $S_n = \sum_{j=1}^n X_j$, then $S_n/\sqrt{n}$ converges weakly
to a jointly normal random vector $Z = (Z^1, \ldots, Z^d)$ where each $Z^k$ has mean zero and the
covariance of $Z^k$ and $Z^\ell$ is $C_{k\ell}$.
Proof Since
\[
E|S_n|^2/n = \sum_{j=1}^n \sum_{k=1}^d E|X_j^k|^2/n
\]
is bounded independently of $n$, the random vectors $S_n/\sqrt{n}$ are tight, and therefore weak
subsequential limit points exist. We need to show that any subsequential limit point is a
jointly normal random vector with mean zero and covariance matrix $C$.
If $u_1, \dots, u_d \in \mathbb{R}$, then $\sum_{k=1}^d u_k X_j^k$, $j = 1, 2, \dots$, will be a sequence of i.i.d. random
variables with mean zero and variance $\sum_{k,\ell=1}^d u_k u_\ell C_{k\ell}$. By Theorem A.51,
\[
\frac{\sum_{j=1}^n \sum_{k=1}^d u_k X_j^k}{\sqrt{n}}
\]
converges weakly to a mean zero normal random variable with variance equal to
$\sum_{k,\ell=1}^d u_k u_\ell C_{k\ell}$. If we write $S_n = (S_n^1, \dots, S_n^d)$, then
\[
E \exp\Big( i \sum_{k=1}^d u_k S_n^k/\sqrt{n} \Big) \to \exp\Big( -\sum_{k,\ell=1}^d u_k u_\ell C_{k\ell}/2 \Big).
\]
This shows that any subsequential limit point of the sequence $S_n/\sqrt{n}$ has the required law.
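The statement of Theorem A.57 can be explored numerically. The sketch below simulates $S_n/\sqrt{n}$ for a toy mean-zero law on $\mathbb{R}^2$, namely $X = (U, U + V)$ with $U, V$ independent and uniform on $[-1, 1]$; this law, the function name, and the seed are assumptions chosen only for illustration. For this law $C_{11} = 1/3$, $C_{12} = 1/3$, and $C_{22} = 2/3$.

```python
import numpy as np

def normalized_sums(n, trials, seed=0):
    """Return `trials` independent copies of S_n / sqrt(n) for the toy law X = (U, U + V)."""
    rng = np.random.default_rng(seed)
    U = rng.uniform(-1.0, 1.0, size=(trials, n))
    V = rng.uniform(-1.0, 1.0, size=(trials, n))
    X = np.stack([U, U + V], axis=-1)  # shape (trials, n, 2), mean zero
    return X.sum(axis=1) / np.sqrt(n)  # shape (trials, 2)

# Covariance matrix C_{kl} = E[X_1^k X_1^l] of the toy law (Var U = Var V = 1/3).
C = np.array([[1 / 3, 1 / 3], [1 / 3, 2 / 3]])
```

The empirical second-moment matrix of the simulated vectors should be close to `C`. Note that the covariance of $S_n/\sqrt{n}$ equals $C$ for every $n$; what the theorem adds is that the law itself approaches the jointly normal law with that covariance, which one could inspect with a histogram of a linear combination $u_1 S_n^1/\sqrt{n} + u_2 S_n^2/\sqrt{n}$.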
If (X ,Y1, . . . ,Yn) are jointly normal random variables, then the law of X given Y1, . . . ,Yn
is also Gaussian.
Proposition A.58 Suppose $X, Y_1, \dots, Y_n$ are jointly normal random variables with mean
zero. Let $A$ be the $n \times 1$ matrix whose $i$th entry is $\mathrm{Cov}(X, Y_i)$, $B$ the $n \times n$ matrix whose
$(i, j)$th entry is $\mathrm{Cov}(Y_i, Y_j)$, and $Y$ the $n \times 1$ matrix whose $i$th entry is $Y_i$. Suppose $B$ is
invertible and let $D = B^{-1}A$. Then for $u \in \mathbb{R}$,
\[
E[e^{iuX} \mid Y_1, \dots, Y_n] = e^{iuD^T Y} e^{-u^2(\mathrm{Var}\, X - A^T B^{-1} A)/2}.
\]
In particular, the law of $X$ given $Y_1, \dots, Y_n$ is that of a normal random variable with mean
$D^T Y$ and variance equal to $\mathrm{Var}\, X - A^T B^{-1} A$.
Proof Note
\[
\mathrm{Cov}(X - D^T Y, Y_j) = \mathrm{Cov}(X, Y_j) - \sum_{i=1}^n D_i \mathrm{Cov}(Y_i, Y_j) = A_j - \sum_{i=1}^n D_i B_{ij} = 0,
\]
so $X - D^T Y$ is independent of each $Y_j$ and, by joint normality, of the vector $(Y_1, \dots, Y_n)$. Then
\[
\begin{aligned}
E[e^{iuX} \mid Y_1, \dots, Y_n] &= e^{iuD^T Y} E[e^{iu(X - D^T Y)} \mid Y_1, \dots, Y_n] \\
&= e^{iuD^T Y} E[e^{iu(X - D^T Y)}] \\
&= e^{iuD^T Y} e^{-u^2 \mathrm{Var}(X - D^T Y)/2}.
\end{aligned}
\]
To complete the proof, we calculate
\[
\mathrm{Var}(X - D^T Y) = \mathrm{Var}\, X - 2\sum_i D_i A_i + \sum_{i,j} D_i B_{ij} D_j = \mathrm{Var}\, X - A^T B^{-1} A,
\]
and we are done.
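The quantities in Proposition A.58 are straightforward to compute. The following is a minimal sketch, assuming NumPy is available; the function name and argument names are hypothetical. It returns $D = B^{-1}A$ and the conditional variance $\mathrm{Var}\, X - A^T B^{-1} A$, using a linear solve rather than an explicit inverse.

```python
import numpy as np

def conditional_normal(var_x, A, B):
    """Return (D, cond_var) with D = B^{-1} A and cond_var = Var X - A^T B^{-1} A.

    var_x : variance of X
    A     : vector with entries Cov(X, Y_i)
    B     : matrix with entries Cov(Y_i, Y_j), assumed invertible
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    D = np.linalg.solve(B, A)        # solves B D = A
    cond_var = float(var_x - A @ D)  # Var X - A^T B^{-1} A
    return D, cond_var
```

For example, if $X = Y_1 + Y_2 + W$ with $Y_1, Y_2, W$ independent standard normals, then $\mathrm{Var}\, X = 3$, $A = (1, 1)^T$, and $B$ is the identity, so the conditional mean is $Y_1 + Y_2$ and the conditional variance is 1, as one expects.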
Appendix B
Some results from analysis
B.1 The monotone class theorem
The monotone class theorem is a result from measure theory used in the proof of the Fubini
theorem.
Definition B.1 M is a monotone class if M is a collection of subsets of X such that
(1) if A1 ⊂ A2 ⊂ · · · , A = ∪iAi, and each Ai ∈ M, then A ∈ M;
(2) if A1 ⊃ A2 ⊃ · · · , A = ∩iAi, and each Ai ∈ M, then A ∈ M.
Recall that an algebra of sets is a collection A of sets such that if A1, . . . , An ∈ A, then
A1 ∪ · · · ∪ An and A1 ∩ · · · ∩ An are also in A, and if A ∈ A, then Ac ∈ A.
The intersection of monotone classes is a monotone class, and the intersection of all mono-
tone classes containing a given collection of sets is the smallest monotone class containing
that collection.
Theorem B.2 Suppose A0 is an algebra of sets, A is the smallest σ -field containing A0,
and M is the smallest monotone class containing A0. Then M = A.
Proof A σ -algebra is clearly a monotone class, so M ⊂ A. We must show A ⊂ M.
Let N1 = {A ∈ M : Ac ∈ M}. Note N1 is contained in M, contains A0, and is a
monotone class. Since M is the smallest monotone class containing A0, then N1 = M, and
therefore M is closed under the operation of taking complements.
Let N2 = {A ∈ M : A ∩ B ∈ M for all B ∈ A0}. N2 is contained in M; N2 contains A0
because A0 is an algebra; N2 is a monotone class because $(\bigcup_{i=1}^\infty A_i) \cap B = \bigcup_{i=1}^\infty (A_i \cap B)$, and
similarly for intersections. Therefore N2 = M; in other words, if B ∈ A0 and A ∈ M, then
A ∩ B ∈ M.
Let N3 = {A ∈ M : A ∩ B ∈ M for all B ∈ M}. As in the preceding paragraph, N3
is a monotone class contained in M. By the last sentence of the preceding paragraph, N3
contains A0. Hence N3 = M.
We thus have that M is a monotone class closed under the operations of tak-
ing complements and taking intersections. This shows M is a σ -algebra, and so
A ⊂ M.
B.2 The Schwartz class
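A function $f : \mathbb{R}^d \to \mathbb{R}$ is in the Schwartz class if $f$ is $C^\infty$ and for each $m, k \ge 0$ and each
$i_1, i_2, \dots, i_k \in \{1, 2, \dots, d\}$,
\[
|x|^m \left| \frac{\partial^k f}{\partial x_{i_1} \cdots \partial x_{i_k}}(x) \right| \to 0
\]
as $|x| \to \infty$. (Here $i_1, \dots, i_k$ need not be distinct.)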
Suppose that $f$ is in the Schwartz class. Suppose $m, k \ge 0$ and $i_1, \dots, i_k$ and $j_1, \dots, j_n$
are each integers between 1 and $d$ inclusive, and $m_1, \dots, m_k$ are even positive integers. Let
$\hat{f}$ be the Fourier transform of $f$:
\[
\hat{f}(u) = \int_{\mathbb{R}^d} e^{iu \cdot x} f(x)\, dx.
\]
Then
\[
u_{i_1}^{m_1} \cdots u_{i_k}^{m_k} \frac{\partial^{n} \hat{f}}{\partial u_{j_1} \cdots \partial u_{j_n}}(u)
\]
is bounded as a function of $u$ because it is a constant times the Fourier transform of
\[
x_{j_1} \cdots x_{j_n} \frac{\partial^{m_1 + \cdots + m_k} f}{\partial x_{i_1}^{m_1} \cdots \partial x_{i_k}^{m_k}},
\]
which is in $L^1(\mathbb{R}^d)$ since $f$ is in the Schwartz class. We conclude that $\hat{f}$ is also in the Schwartz
class.
Appendix C
Regular conditional probabilities
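Let $\mathcal{E} \subset \mathcal{F}$ be σ-fields, where $(\Omega, \mathcal{F}, P)$ is a probability space. A regular conditional
probability for $E[\,\cdot \mid \mathcal{E}]$ is a map $Q : \Omega \times \mathcal{F} \to [0, 1]$ such that
(1) $Q(\omega, \cdot)$ is a probability measure on $(\Omega, \mathcal{F})$ for each $\omega$;
(2) for each $A \in \mathcal{F}$, $Q(\cdot, A)$ is an $\mathcal{E}$ measurable random variable;
(3) for each $A \in \mathcal{F}$ and each $B \in \mathcal{E}$,
\[
\int_B Q(\omega, A)\, P(d\omega) = P(A \cap B).
\]
$Q(\omega, A)$ can be thought of as $P(A \mid \mathcal{E})$.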
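On a finite sample space the definition can be checked by hand. The sketch below is an illustration under simplifying assumptions that are not in the text: the sample space is finite, the σ-field $\mathcal{E}$ is generated by a partition into cells of positive probability, and the names `regular_conditional_probability`, `P`, and `partition` are hypothetical. In that setting $Q(\omega, A)$ is the conditional probability of $A$ given the cell containing $\omega$.

```python
from fractions import Fraction

def regular_conditional_probability(P, partition):
    """Build Q(omega, A) for the sigma-field generated by `partition`.

    P         : dict mapping each sample point to its probability (a Fraction)
    partition : list of disjoint sets covering the sample space,
                each assumed to have positive probability
    """
    cell_of = {w: frozenset(cell) for cell in partition for w in cell}

    def Q(omega, A):
        cell = cell_of[omega]
        mass = sum(P[w] for w in cell)                   # P(cell) > 0 by assumption
        return sum(P[w] for w in cell if w in A) / mass  # P(A ∩ cell) / P(cell)

    return Q

def integral_over(B, Q, A, P):
    """Compute the integral of Q(omega, A) over B with respect to P."""
    return sum(Q(w, A) * P[w] for w in B)
```

With the uniform measure on {1, 2, 3, 4}, the partition {1, 2}, {3, 4}, the event A = {1, 3}, and B = {1, 2}, property (3) reads 1/4 = P(A ∩ B), which the helper `integral_over` reproduces exactly because the arithmetic uses `Fraction`.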
Theorem C.1 Suppose $(\Omega, \mathcal{F}, P)$ is a probability space, $\mathcal{E} \subset \mathcal{F}$, and $\Omega$ is in addition a
complete and separable metric space. Then a regular conditional probability for $P(\cdot \mid \mathcal{E})$
exists.
Proof Since $\Omega$ is a complete and separable metric space, we can embed $\Omega$ as a subset of
the compact set $I = [0, 1]^{\mathbb{N}}$, where we furnish $I$ with the product topology. Let $\{f_j\}$ be a
countable collection of uniformly continuous functions on $\Omega$ such that every finite subset of
distinct elements is linearly independent and such that $L_0$, the set of finite linear combinations
of the $f_j$'s, is dense in the class of uniformly continuous functions on $\Omega$; let us assume $f_1$ is
identically equal to 1.
For each $j$, let $g_j = E[f_j \mid \mathcal{E}]$. (The random variables $g_j$ are only defined up to almost
sure equivalence. For each $j$ we select an element $g_j$ from the equivalence class and keep it
fixed.) If $r_1, \dots, r_n$ are rationals with
\[
r_1 f_1(\omega) + \cdots + r_n f_n(\omega) \ge 0
\]
for all $\omega$, let
\[
N(r_1, \dots, r_n) = \{\omega : r_1 g_1(\omega) + \cdots + r_n g_n(\omega) < 0\}.
\]
By the definition of $g_j$, $P(N(r_1, \dots, r_n)) = 0$. Let $N_1$ be the union of all such $N(r_1, \dots, r_n)$
with $n \ge 1$, the $r_j$ rational. Then $N_1 \in \mathcal{E}$ and $P(N_1) = 0$.
Fix $\omega \in \Omega \setminus N_1$. Define a functional $L_\omega$ on $L_0$ by
\[
L_\omega(f) = t_1 g_1(\omega) + \cdots + t_n g_n(\omega)
\]
if
\[
f = t_1 f_1 + \cdots + t_n f_n.
\]
We claim $L_\omega$ is a positive linear functional. If $f = t_1 f_1 + \cdots + t_n f_n \ge 0$ and $\varepsilon > 0$ is rational,
then there exist rationals $r_1, \dots, r_n$ such that $r_1 f_1 + \cdots + r_n f_n \ge -\varepsilon$ and $|t_i - r_i| \le \varepsilon$,
$i = 1, \dots, n$, or
\[
(r_1 + \varepsilon) f_1 + r_2 f_2 + \cdots + r_n f_n \ge 0.
\]
Since $\omega \notin N_1$, then
\[
(r_1 + \varepsilon) g_1 + r_2 g_2 + \cdots + r_n g_n \ge 0.
\]
Letting $\varepsilon \to 0$, it follows that $t_1 g_1 + \cdots + t_n g_n \ge 0$. This proves that $L_\omega$ is positive.

Since $L_\omega(f_1) = 1$, this implies that $L_\omega$ is a bounded linear functional, and by the Hahn–Banach
theorem $L_\omega$ can be extended to a positive linear functional on the closure of $L_0$. Any
uniformly continuous function on $\Omega$ can be extended uniquely to $\overline{\Omega}$, the closure of $\Omega$ in $I$,
so $L_\omega$ can be considered as a positive linear functional on $C(\overline{\Omega})$. By the Riesz representation
theorem, there exists a probability measure $Q(\omega, \cdot)$ such that
\[
L_\omega(f) = \int f(\omega')\, Q(\omega, d\omega').
\]

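The mapping $\omega \to L_\omega(f)$ is measurable with respect to $\mathcal{E}$ for each $f \in L_0$, hence for all
uniformly continuous functions on $\Omega$ by a limit argument. If $B \in \mathcal{E}$ and $f = t_1 f_1 + \cdots + t_n f_n$,
\[
\begin{aligned}
\int_B \Big[ \int f(\omega')\, Q(\omega, d\omega') \Big] P(d\omega) &= \int_B L_\omega(f)\, P(d\omega) \\
&= \int_B (t_1 g_1 + \cdots + t_n g_n)(\omega)\, P(d\omega) \\
&= \int_B E[t_1 f_1 + \cdots + t_n f_n \mid \mathcal{E}](\omega)\, P(d\omega) \\
&= \int_B f(\omega)\, P(d\omega)
\end{aligned}
\]
or $\int f(\omega')\, Q(\omega, d\omega')$ is a version of $E[f \mid \mathcal{E}]$ if $f \in L_0$. By a limit argument, the same is true
for all $f$ that are of the form $f = 1_A$ with $A \in \mathcal{F}$.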

Let $G_{ni}$ be a sequence of balls of radius $1/n$ (with respect to the metric on $\Omega$) contained in $\Omega$
and covering $\Omega$. Choose $i_n$ such that $P(\bigcup_{i \le i_n} G_{ni}) > 1 - 1/(n2^n)$. The set
$H_n = \bigcap_{m \ge n} \bigcup_{i \le i_m} G_{mi}$ is totally bounded; let $K_n$ be the closure of $H_n$ in $\Omega$. Since $\Omega$ is
complete, $K_n$ is complete and totally bounded, and hence compact, and $P(K_n) \ge 1 - 1/n$. Hence
\[
E[Q(\cdot, \textstyle\bigcup_{i=1}^\infty K_i); \Omega \setminus N_1] \ge E[Q(\cdot, K_n); \Omega \setminus N_1] = P(K_n) \ge 1 - (1/n)
\]
for each $n$, or $Q(\omega, \bigcup_{i=1}^\infty K_i) = 1$, a.s. Let $N_2$ be the null set for which this fails. Thus for
$\omega \in \Omega \setminus (N_1 \cup N_2)$, we see that $Q(\omega, d\omega')$ is a probability measure on $\Omega$. For $\omega \in N_1 \cup N_2$,
set $Q(\omega, \cdot) = P(\cdot)$. This $Q$ is the desired regular conditional probability.

Appendix D

Kolmogorov extension theorem

Suppose $S$ is a metric space. We use $S^{\mathbb{N}}$ for the product space $S \times S \times \cdots$ furnished with the
product topology. We may view $S^{\mathbb{N}}$ as the set of sequences $(x_1, x_2, \dots)$ of elements of $S$. We
use the σ-field on $S^{\mathbb{N}}$ generated by the cylindrical sets. Given an element $x = (x_1, x_2, \dots)$ of
$S^{\mathbb{N}}$, we define $\pi_n(x) = (x_1, \dots, x_n) \in S^n$.

We suppose we have a Radon probability measure $\mu_n$ defined on $S^n$ for each $n$. (Being
a Radon measure means that we can approximate $\mu_n(A)$ from below by compact sets; see
Folland (1999) for details.) The $\mu_n$ are consistent if $\mu_{n+1}(A \times S) = \mu_n(A)$ whenever $A$ is a
Borel subset of $S^n$. The Kolmogorov extension theorem is the following.

Theorem D.1 Suppose for each $n$ we have a probability measure $\mu_n$ on $S^n$. Suppose the $\mu_n$'s
are consistent. Then there exists a probability measure $\mu$ on $S^{\mathbb{N}}$ such that $\mu(A \times S^{\mathbb{N}}) = \mu_n(A)$
for all $A \subset S^n$.

Proof Define $\mu$ on cylindrical sets by $\mu(A \times S^{\mathbb{N}}) = \mu_n(A)$ if $A \subset S^n$. By the consistency
assumption, $\mu$ is well defined. By the Carathéodory extension theorem, we can extend $\mu$
to the σ-field generated by the cylindrical sets provided we show that whenever $A_n$ are
cylindrical sets decreasing to $\emptyset$, then $\mu(A_n) \to 0$.

Suppose $A_n$ are cylindrical sets decreasing to $\emptyset$ but $\mu(A_n)$ does not tend to 0; by taking
a subsequence we may assume without loss of generality that there exists $\varepsilon > 0$ such that
$\mu(A_n) \ge \varepsilon$ for all $n$. We will obtain a contradiction.

We first want to arrange things so that each $A_n = \pi_n(A_n) \times S^{\mathbb{N}}$. Suppose $A_n$ is of the
form
\[
A_n = \{(x_1, x_2, \dots) : (x_1, \dots, x_{j_n}) \in B_n\},
\]
where $B_n$ is a Borel subset of $S^{j_n}$. We choose $m_n = n + \max(j_1, \dots, j_n)$. Let
$A_0 = S^{\mathbb{N}}$. We then replace our original sequence $A_1, A_2, \dots$ by the sequence
$A_0, \dots, A_0, A_1, \dots, A_1, A_2, \dots, A_2, A_3, \dots$, where we have $m_1$ occurrences of $A_0$, $m_2 - m_1$
occurrences of $A_1$, $m_3 - m_2$ occurrences of $A_2$, and so on. Therefore we may without loss of
generality suppose $j_n \le n$. We then have
\[
A_n = \{(x_1, x_2, \dots) : (x_1, \dots, x_n) \in B_n \times S^{n - j_n}\}.
\]
Replacing $B_n$ by $B_n \times S^{n - j_n}$, we may without loss of generality suppose
$A_n = \pi_n(A_n) \times S^{\mathbb{N}}$.

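We set $\tilde{A}_n = \pi_n(A_n)$. For each $n$, choose $\tilde{B}_n \subset \tilde{A}_n$ so that $\tilde{B}_n$ is compact and
$\mu(\tilde{A}_n \setminus \tilde{B}_n) \le \varepsilon/2^{n+1}$. Let $B_n = \tilde{B}_n \times S^{\mathbb{N}}$ and let $C_n = B_1 \cap \cdots \cap B_n$. Hence
$C_n \subset B_n \subset A_n$, and $C_n \downarrow \emptyset$, but
\[
\mu(C_n) \ge \mu(A_n) - \sum_{i=1}^n \mu(A_i \setminus B_i) \ge \varepsilon/2,
\]
and $\tilde{C}_n = \pi_n(C_n)$, the projection of $C_n$ onto $S^n$, is compact.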

We will find $x = (x_1, \dots, x_n, \dots) \in \bigcap_n C_n$ and obtain our contradiction. For each $n$ choose
a point $y(n) \in C_n$. The first coordinates of $\{y(n)\}$, namely, $\{y_1(n)\}$, form a sequence contained
in $\tilde{C}_1$, which is compact, hence there is a convergent subsequence $\{y_1(n_k)\}$. Let $x_1$ be the limit
point. The first and second coordinates of $\{y(n_k)\}$ form a sequence contained in the compact
set $\tilde{C}_2$, so a further subsequence $\{(y_1(n_{k_j}), y_2(n_{k_j}))\}$ converges to a point in $\tilde{C}_2$. Since $\{n_{k_j}\}$
is a subsequence of $\{n_k\}$, the first coordinate of the limit is $x_1$. Therefore the limit point of
$\{(y_1(n_{k_j}), y_2(n_{k_j}))\}$ is of the form $(x_1, x_2)$, and this point is in $\tilde{C}_2$. We continue this procedure
to obtain $x = (x_1, x_2, \dots, x_n, \dots)$. By our construction, $(x_1, \dots, x_n) \in \tilde{C}_n$ for each $n$, hence
$x \in C_n$ for each $n$, or $x \in \bigcap_n C_n$, a contradiction.

A typical application of this theorem is to construct a countable sequence of independent
random variables. We construct $X_1, \dots, X_n$ as in Proposition A.10. Here $S = [0, 1]$. Let $\mu_n$
be the law of $(X_1, \dots, X_n)$; it is easy to check that the $\mu_n$ form a consistent family. We use
Theorem D.1 to obtain a probability measure $\mu$ on $[0, 1]^{\mathbb{N}}$. To get random variables out of
this, we let $X_i(\omega) = \omega_i$ if $\omega = (\omega_1, \omega_2, \dots)$.
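The consistency condition $\mu_{n+1}(A \times S) = \mu_n(A)$ can be verified mechanically in a toy setting. The sketch below replaces $S = [0, 1]$ by a finite set $S$ (an assumption made only so that everything is computable) and takes $\mu_n$ to be the law of $n$ independent coordinates with a common distribution `p`; the function names are hypothetical. On a finite space it suffices to check the condition on single points, since every subset is a finite disjoint union of points.

```python
from fractions import Fraction
from itertools import product

def product_measure(p, n):
    """Law mu_n of n independent coordinates with common law p (dict: state -> probability)."""
    mu = {}
    for point in product(p.keys(), repeat=n):
        weight = Fraction(1)
        for s in point:
            weight *= p[s]
        mu[point] = weight
    return mu

def is_consistent(mu_n, mu_next, states):
    """Check mu_{n+1}({x} x S) == mu_n({x}) for every point x of S^n."""
    return all(
        sum(mu_next[x + (s,)] for s in states) == mass
        for x, mass in mu_n.items()
    )
```

For instance, with `p = {0: Fraction(1, 3), 1: Fraction(2, 3)}` the measures `product_measure(p, 2)` and `product_measure(p, 3)` are consistent, and the coordinate maps on sequences then play the role of the independent random variables $X_i(\omega) = \omega_i$.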

References

Aldous, D. 1978. Stopping times and tightness. Ann. Probab. 6, 335–40.

Barlow, M. T. 1982. One-dimensional stochastic differential equations with no strong solution. J. London

Math. Soc. 26, 335–47.

Bass, R. F. 1983. Skorokhod imbedding via stochastic integrals. Séminaire de Probabilités XVII. New York:

Springer-Verlag; 221–4.

Bass, R. F. 1995. Probabilistic Techniques in Analysis. New York: Springer-Verlag.

Bass, R. F. 1996. The Doob–Meyer decomposition revisited. Can. Math. Bull. 39, 138–50.

Bass, R. F. 1997. Diffusions and Elliptic Operators. New York: Springer-Verlag.

Billingsley, P. 1968. Convergence of Probability Measures. New York: John Wiley & Sons, Ltd.

Billingsley, P. 1971. Weak Convergence of Measures: Applications in Probability. Philadelphia: SIAM.

Blumenthal, R. M. and Getoor, R. K. 1968. Markov Processes and Potential Theory. New York: Academic

Press.

Bogachev, V. I. 1998. Gaussian Measures. Providence, RI: American Mathematical Society.

Boyce, W. E. and DiPrima, R. C. 2009. Elementary Differential Equations and Boundary Value Problems,

9th edn. New York: John Wiley & Sons, Ltd.

Chung, K. L. 2001. A Course in Probability Theory, 3rd edn. San Diego: Academic Press.

Chung, K. L. and Walsh, J. B. 1969. To reverse a Markov process. Acta Math. 123, 225–51.

Dawson, D. A. 1993. Measure-valued Markov processes. Ecole d’Eté de Probabilités de Saint-Flour XXI–

1991. Berlin: Springer-Verlag.

Dellacherie, C. and Meyer, P.-A. 1978. Probability and Potential. Amsterdam: North-Holland.

Dudley, R. M. 1973. Sample functions of the Gaussian process. Ann. Probab. 1, 66–103.

Durrett, R. 1996. Probability: Theory and Examples. Belmont, CA: Duxbury Press.

Ethier, S. N. and Kurtz, T. G. 1986. Markov Processes: Characterization and Convergence. New York: John

Wiley & Sons, Ltd.

Feller, W. 1971. An Introduction to Probability Theory and its Applications, 2nd edn. New York: John

Wiley & Sons, Ltd.

Folland, G. B. 1999. Real Analysis: Modern Techniques and their Applications, 2nd edn. New York: John

Wiley & Sons, Ltd.

Fukushima, M., Oshima, Y. and Takeda, M. 1994. Dirichlet Forms and Symmetric Markov Processes. Berlin:

de Gruyter.

Gilbarg, D. and Trudinger, N. S. 1983. Elliptic Partial Differential Equations of Second Order, 2nd edn.

New York: Springer-Verlag.

Itô, K. and McKean, Jr, H. P. 1965. Diffusion Processes and their Sample Paths. Berlin: Springer-Verlag.

Kallianpur, G. 1980. Stochastic Filtering Theory. Berlin: Springer-Verlag.

Karatzas, I. and Shreve, S. E. 1991. Brownian Motion and Stochastic Calculus, 2nd edn. New York: Springer-

Verlag.

Knight, F. B. 1981. Essentials of Brownian Motion and Diffusion. Providence, RI: American Mathematical

Society.

Kuo, H. H. 1975. Gaussian Measures in Banach Spaces. New York: Springer-Verlag.

Lax, P. 2002. Functional Analysis. New York: John Wiley & Sons, Ltd.

Liggett, T. M. 2010. Continuous Time Markov Processes: An Introduction. Providence, RI: American Math-

ematical Society.

Meyer, P.-A., Smythe, R. T. and Walsh, J. B. 1972. Birth and death of Markov processes. Proceedings of the

Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. III. Berkeley, CA: University

of California Press; 295–305.

Obłój, J. 2004. The Skorokhod embedding problem and its offspring. Probab. Surv. 1, 321–90.

Øksendal, B. 2003. Stochastic Differential Equations: An Introduction with Applications, 6th edn. Berlin:

Springer-Verlag.

Perkins, E. A. 2002. Dawson–Watanabe superprocesses and measure-valued diffusions. Lectures on Proba-

bility Theory and Statistics (Saint-Flour, 1999). Berlin: Springer-Verlag; 125–324.

Revuz, D. and Yor, M. 1999. Continuous Martingales and Brownian Motion, 3rd edn. Berlin: Springer-

Verlag.

Rogers, L. C. G. and Williams, D. 2000a. Diffusions, Markov Processes, and Martingales, Vol. 1. Cambridge:

Cambridge University Press.

Rogers, L. C. G. and Williams, D. 2000b. Diffusions, Markov Processes, and Martingales, Vol. 2. Cambridge:

Cambridge University Press.

Rudin, W. 1976. Principles of Mathematical Analysis, 3rd edn. New York: McGraw-Hill.

Rudin, W. 1987. Real and Complex Analysis, 3rd edn. New York: McGraw-Hill.

Skorokhod, A. V. 1965. Studies in the Theory of Random Processes. Reading, MA: Addison-Wesley.

Stroock, D. W. 2003. Markov Processes from K. Itô’s Perspective. Princeton, NJ: Princeton University Press.

Stroock, D. W. and Varadhan, S. R. S. 1977. Multidimensional Diffusion Processes. Berlin: Springer-Verlag.

Walsh, J. B. 1978. Excursions and local time. Astérisque 52–53, 159–92.

Index

adapted, 1, 359

additive functional, 169, 180

classical, 180

Aldous criterion, 264

almost surely, 348

announce, 112

Bessel processes, 200

binomial, 349, 371

Black–Scholes formula, 220

Blumenthal 0–1 law, 164

BMO, 129

Borel–Cantelli lemma, 353, 354

Brownian bridge, 273

Brownian motion, 6, 153

covariance, 8

fractional, 254

integrated, 41

maximum, 27

standard, 6

with drift, 24

zero set, 30, 48, 99, 214, 217

Brownian sheet, 254

Burkholder–Davis–Gundy

inequalities, 82

cadlag, 2

Cameron–Martin space, 253

canonical process, 158

Cauchy problem, 321

cemetery, 156, 177

central limit theorem, 373

chaining, 51

change of variables formula, 71

Chapman–Kolmogorov

equations, 155

characteristic function, 370

Chebyshev’s inequality, 352

Chung’s law of the iterated

logarithm, 47

class D, 57, 124

class DL, 126

closed form, 303

closed operator, 295

compensator, 124, 130

complete filtration, 1

conditional expectation, 357

conditional probability, 357

conditioned processes, 178

consistent, 382

construction of Brownian motion, 36, 248, 254, 284

continuation region, 187

continuous process, 2

convergence

almost surely, 355

in Lp, 355

in distribution, 367

in law, 367

in probability, 355

weak, 367

convolution semigroup, 285

covariance, 375

covariance matrix, 375

covariation, 58

cumulative normal distribution function, 227

cylindrical set, 3

D[0, 1]

compactness, 263

completeness, 262

metrics, 259

debut, 117

debut theorem, 117

density, 349

diffusion coefficient, 193, 315

Dirichlet boundary condition, 290

Dirichlet form, 303

Dirichlet problem, 320

dissipative, 294

distribution, 348

distribution function, 348

divergence form elliptic

operators, 307

Donsker invariance principle, 269

Doob decomposition, 361

Doob’s h-path transform, 178

Doob’s inequalities, 14, 361

Doob–Meyer decomposition, 60, 124

drift coefficient, 193, 315

dual optional projection, 124

dual predictable projection, 124

dyadic rationals, 49

empirical process, 275

entry time, 115

equivalent martingale measure, 223

events, 348

excessive, 184

excessive majorant, 186

exercise time, 219

expected value, 348

exponential, 349, 371

martingale, 89

semimartingale, 144

exponential random variables, 33

Feller process, 161

Feynman–Kac formula, 323

filtration, 1, 2

finite-dimensional distributions, 3

Fourier series, 36

gamma, 350, 371

gauge, 323

Gaussian, 7, 374

Gaussian field, 255

Girsanov theorem, 89, 93, 144

Glivenko–Cantelli theorem, 366

good-λ inequality, 86

Green’s function, 175

Gronwall’s lemma, 201

Hölder continuous, 43, 47

harmonic, 173, 321

Hausdorff dimension, 48, 99

Hausdorff measure, 48

heat equation, 322

Helly’s theorem, 369

Hille–Yosida theorem, 292

hitting time, 115

Hunt process, 165

Hurst index, 254

i.i.d., 364

increasing process, 54, 121

independent, 353

independent increments, 6, 339

indicator, 348

indistinguishable, 2

infinite particle systems, 295

infinitesimal generator, 288

innovation process, 230

innovations approach, 229

integration by parts formula, 74

invariance principle, 108

invariant, 178

Itô’s formula, 71

multivariate, 74

Jensen’s inequality, 352, 358

John–Nirenberg inequality, 129

joint characteristic function, 372

jointly normal, 7, 375

Kalman–Bucy filter, 234

Karhunen–Loève expansion, 253

kernel, 154

killed process, 177

Kolmogorov backward equation, 291

Kolmogorov continuity criterion, 49

Kolmogorov extension theorem , 382

Kolmogorov forward equation, 292

Kunita–Watanabe inequality, 70

Lévy measure, 342

Lévy process, 32, 297, 339

Lévy system formula, 347

Lévy’s theorem, 77

Lévy–Khintchine formula, 342

last exit, 181

law, 3, 10, 348

law of the iterated logarithm, 44

least excessive majorant, 186

left continuous process, 2

lifetime, 156, 177

LIL, 44

linear equations, 199

linear model, 234

Lipschitz function, 100, 193

local time, 94, 209

joint continuity, 96

locally bounded, 141

lower semicontinuous, 186

Markov property, 25

Markov transition probabilities, 154

Markovian, 303

martingale, 13, 359

continuous, 54

convergence theorem, 363

local, 54, 139

locally square integrable, 139

problem, 316

representation theorem, 80, 81

uniformly integrable, 54, 364

maximum principle, 176

mean, 350

mean rate of return, 218

measure-valued branching diffusion

process, 317

metric entropy, 51

minimal augmented filtration, 2,

160

modulus of continuity, 247, 260

moment, 350

monotone class theorem, 378

multiplication theorem, 354

natural scale, 326

Neumann boundary condition, 290

Newtonian potential density, 175

NFLVR condition, 223

no free lunch, 223

nondivergence form, 296, 315

non-negative definite, 254

normal, 349, 371

nowhere differentiable, 46

null set, 1, 111

observation process, 229

occupation time density, 175

occupation times, 97

one-dimensional diffusion, 326

optimal reward, 187

optimal stopping problem, 184

optional σ -field, 111

optional projection, 119

optional stopping theorem, 17, 360

optional time, 15

Ornstein–Uhlenbeck process, 159, 198

orthogonality lemma, 131

outer probability, 111

p-variation, 30, 48

partial sums, 364

paths, 2

paths locally of bounded variation, 54

Picard iteration, 101

Poincaré cone condition, 174

Poisson, 349, 371

point process, 147

process, 32, 171

Poisson’s equation, 319

portmanteau theorem, 237

potential, 155, 323

prévisible, 111

predict, 112

predictable, 64, 130

predictable σ -field, 64, 111

predictable projection, 120

probability, 348

process, 1

product formula, 74, 85

progressively measurable, 4

Prohorov metric, 241

Prohorov theorem, 239

purely discontinuous, 143

quadratic variation, 57, 79

quasi-left continuous, 165

random variables, 348

Ray–Knight theorem, 209

recurrence, 167

reduce, 139

reflection principle, 27

regular, 173, 326

regular conditional probability, 312, 380

regular Dirichlet form, 307

reproducing property, 252

resolvent, 155, 286

reward function, 184

right continuous filtration, 1

right continuous process, 2

right continuous with left limits, 2

scale function, 327

scaling, 7

Schrödinger operator, 323

Schwartz class, 379

section theorem

optional, 117

predictable, 117

self-financing, 219

semigroup, 155

semigroup property, 155

semimartingale, 54, 141

set-indexed process, 255

shift operators, 158

signal process, 229

simple symmetric random walk, 109, 248

Skorokhod embedding, 100

Skorokhod representation, 245

Slutsky’s theorem, 242, 369

space-time process, 182

spectral theorem, 309

speed measure, 329

square integrable martingale, 55

stable subordinator, 347

stationary increments, 6, 339

stochastic integral, 64, 134, 150

local martingales, 69

multiple, 88

semimartingales, 69

stochastic process, 1

stopping time, 15, 359

Stratonovich integral, 84

strong Feller process, 161

strong law of large numbers, 364

strong Markov process, 165

strong Markov property, 25

strongly reduce, 139

sub-Markov transition probability

kernels, 283

submartingale, 359

super-Brownian motion, 317

supermartingale, 359

support theorem, 93, 208

symmetric difference, 12

symmetric stable process, 346

Tanaka formula, 94, 95

terminal time, 177

tight, 369

time change, 78, 105, 180

time inversion, 11

totally inaccessible, 112, 130

trading strategy, 219

trajectories, 2

transience, 167

transition densities, 291

transition probabilities, 154

uniform ellipticity, 296, 307, 315

uniformly absolutely continuous, 356

uniformly integrable, 356

unique in law, 204

upcrossings, 18, 362

usual conditions, 1

variance, 350

versions, 2

Vitali convergence theorem,

357

volatility, 218

weak convergence, 367

weak Feller process, 161

weak solution, 204

weak uniqueness, 204

well posed, 316

well measurable, 111

Wiener measure, 6

Yamada–Watanabe condition, 196

