Use R!
Advisors:
Robert Gentleman
· Kurt Hornik · Giovanni Parmigiani
For other titles published in this series, go to
http://www.springer.com/series/6991
Christian P. Robert · George Casella
Introducing Monte Carlo
Methods with R
Christian P. Robert
Université Paris Dauphine
UMR CNRS 7534
CEREMADE
place du Maréchal de Lattre
de Tassigny
75775 Paris cedex 16
France
xian@ceremade.dauphine.fr
George Casella
Department of Statistics
University of Florida
103 Griffin-Floyd Hall
Gainesville FL 32611-8545
USA
casella@stat.ufl.edu
Series Editors
Robert Gentleman
Department of Bioinformatics
and Computational Biology
Genentech South San Francisco
CA 94080
USA
Kurt Hornik
Department of Statistics and Mathematics
Wirtschaftsuniversität Wien
Augasse 2-6
A-1090 Wien
Austria
Giovanni Parmigiani
Department of Biostatistics
and Computational Biology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115
USA
ISBN 978-1-4419-1575-7
DOI 10.1007/978-1-4419-1576-4
e-ISBN 978-1-4419-1576-4
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2009941076
© Springer Science+Business Media, LLC 2010
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the
publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief
excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and
retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter
developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified
as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
To our parents, who taught us much in many ways.
“What if you haven’t the data?”
“Then we shall proceed directly to the brandy and cigars.”
Lyndsay Faye
The Case of Colonel Warburton's Madness
Preface
“After that, it was down to attitude.”
Ian Rankin
Black & Blue
The purpose of this book is to provide a self-contained entry into Monte Carlo
computational techniques. First and foremost, it must not be confused with
a programming addendum to our earlier book Monte Carlo Statistical Methods whose second edition came out in 2004. The current book has a different
purpose, namely to make a general audience familiar with the programming
aspects of Monte Carlo methodology through practical implementation. Not
only have we introduced R at the core of this book, but the emphasis and
contents have changed drastically from Monte Carlo Statistical Methods, even
though the overall vision remains the same. Theoretical foundations are intentionally avoided in the current book.
Indeed, the emphasis on practice is a major feature of this book in that
its primary audience consists of graduate students in statistics, biostatistics,
engineering, etc., who need to learn how to use simulation methods as a tool
to analyze their experiments and/or datasets. The book should appeal to
scientists in all fields, given the versatility of these Monte Carlo tools. It can
also be used for a more classical statistics audience when aimed at teaching a
quick entry into modern computational methods based on R, at the end of an
undergraduate program for example, even though this may prove challenging
for some students.
The choice of the programming language R, as opposed to faster alternatives such as Matlab or C and more structured constructs such as BUGS, is
due to its pedagogical simplicity and its versatility. Readers can easily conduct experiments in their own programming language by translating the examples provided in this book. (Obviously, the book can also supplement other
textbooks on Bayesian modeling at the graduate level, including our books
Bayesian Choice (Robert, 2001) and Monte Carlo Statistical Methods (Robert
and Casella, 2004).) This book can also be viewed as a companion to, rather
than a competitor of, Jim Albert’s Use R! book Bayesian Computation with
R (Albert, 2009). Indeed, taken as a pair, these two books can provide a fairly
thorough introduction to Monte Carlo methods and Bayesian modeling.
We stress that, at a production level (that is, when using advanced Monte
Carlo techniques or analyzing large datasets), R cannot be recommended as
the default language, but the expertise gained from this book should make
the switch to another language seamless.
Contrary to usual practice, many exercises are interspersed within the
chapters rather than postponed until the end of each chapter. There are two
reasons for this stylistic choice. First, the results or developments contained in
those exercises are often relevant for upcoming points in the chapter. Second,
they signal to the student (or to any reader) that some pondering over the
previous pages may be useful before moving to the following topic and so may
act as self-checking gateways. Additional exercises are found at the end of
each chapter, with abridged solutions of the odd-numbered exercises provided
on our Webpages as well as Springer’s.
Thanks
We are immensely grateful to colleagues and friends for their help with this
book, in particular to the following people: Ed George for his comments on the
general purpose of the book and some exercises in particular; Jim Hobert and
Fernando Quintana for helpful discussions on the Monte Carlo EM; Alessandra
Iacobucci for signaling in due time a fatal blunder; Jean-Michel Marin for
letting us recycle the first chapter of Bayesian Core (Marin and Robert, 2007)
into our introduction to R and for numerous pieces of R advice, as well as
pedagogical suggestions; Antonietta Mira for pointing out mistakes during
a session of an MCMC conference in Warwick; François Perron for inviting
CPR to Montréal and thus providing him with a time window to complete
Chapter 8 (only shortened by an ice-climbing afternoon in Québec!), and also
François Perron and Clémentine Trimont for testing the whole book from
the perspectives of a professor and a student, respectively; Martyn Plummer
for answering queries about coda; Jeff Rosenthal for very helpful exchanges
on amcmc; Dimitris Rizopoulos for providing Exercise 7.21; and Phil Spector
from Berkeley for making available his detailed and helpful notes and slides
on R and C, now partly published as Spector (2009). Comments from both
reviewers were especially useful in finalizing the book. We are also grateful to
John Kimmel of Springer for his advice and efficiency, as well as for creating
the visionary Use R! series and supporting the development of the R language
that way. From a distance, we also wish to thank Professors Gentleman and
Ihaka for creating the R language and for doing it in open-source, as well as
the entire R community for contributing endlessly to its development.
Sceaux and Gainesville
October 18, 2009
Christian P. Robert and George Casella
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
List of Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
1 Basic R Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Getting started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 R objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 The vector class . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.2 The matrix, array, and factor classes . . . . . . . . . . . . . . . . . 9
1.3.3 The list and data.frame classes . . . . . . . . . . . . . . . . . . . . 12
1.4 Probability distributions in R . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Basic and not-so-basic statistics . . . . . . . . . . . . . . . . . . . . . 14
1.6 Graphical facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.7 Writing new R functions . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.8 Input and output in R . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.9 Administration of R objects . . . . . . . . . . . . . . . . . . . . . . . . 36
1.10 The mcsm package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.11 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2 Random Variable Generation . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.1.1 Uniform simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.1.2 The inverse transform . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.2 General transformation methods . . . . . . . . . . . . . . . . . . . . . . 46
2.2.1 A normal generator . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.2.2 Discrete distributions . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.2.3 Mixture representations . . . . . . . . . . . . . . . . . . . . . . . . 50
2.3 Accept–reject methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.4 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3 Monte Carlo Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.2 Classical Monte Carlo integration . . . . . . . . . . . . . . . . . . . . . 65
3.3 Importance sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.3.1 An arbitrary change of reference measure . . . . . . . . . . . . . . . . 69
3.3.2 Sampling importance resampling . . . . . . . . . . . . . . . . . . . . . 75
3.3.3 Selection of the importance function . . . . . . . . . . . . . . . . . . 78
3.4 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4 Controlling and Accelerating Convergence . . . . . . . . . . . . . . . . . . 89
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.2 Monitoring variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.3 Asymptotic variance of importance sampling estimators . . . . . . 92
4.4 Effective sample size and perplexity . . . . . . . . . . . . . . . . . . . . . . . . 98
4.5 Simultaneous monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.6 Rao–Blackwellization and deconditioning . . . . . . . . . . . . . . . . . . . 107
4.7 Acceleration methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.7.1 Correlated simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.7.2 Antithetic variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.7.3 Control variates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.8 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5 Monte Carlo Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 125
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.2 Numerical optimization methods . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.3 Stochastic search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.3.1 A basic solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.3.2 Stochastic gradient methods . . . . . . . . . . . . . . . . . . . . . . . . 136
5.3.3 Simulated annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.4 Stochastic approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
5.4.1 Optimizing Monte Carlo approximations . . . . . . . . . . . . . 146
5.4.2 Missing-data models and demarginalization . . . . . . . . . . . 150
5.4.3 The EM algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
5.4.4 Monte Carlo EM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.5 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
6 Metropolis–Hastings Algorithms . . . . . . . . . . . . . . . . . . . . . . . 167
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
6.2 A peek at Markov chain theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
6.3 Basic Metropolis–Hastings algorithms . . . . . . . . . . . . . . . . . . . . . . 170
6.3.1 A generic Markov chain Monte Carlo algorithm . . . . . . . 171
6.3.2 The independent Metropolis–Hastings algorithm . . . . . . 175
6.4 A selection of candidates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
6.4.1 Random walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
6.4.2 Alternative candidates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
6.5 Acceptance rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
6.6 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
7 Gibbs Samplers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
7.2 The two-stage Gibbs sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
7.3 The multistage Gibbs sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
7.4 Missing data and latent variables . . . . . . . . . . . . . . . . . . . . . . . . . . 209
7.5 Hierarchical structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
7.6 Other considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
7.6.1 Reparameterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
7.6.2 Rao–Blackwellization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
7.6.3 Metropolis within Gibbs and hybrid strategies . . . . . . . . 230
7.6.4 Improper priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
7.7 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
8 Monitoring and Adaptation for MCMC Algorithms . . . . . . . . . . . . . . . 237
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
8.2 Monitoring what and why . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
8.2.1 Convergence to the stationary distribution . . . . . . . . . . . . 238
8.2.2 Convergence of averages . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
8.2.3 Approximating iid sampling . . . . . . . . . . . . . . . . . . . . . . . . 240
8.2.4 The coda package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
8.3 Monitoring convergence to stationarity . . . . . . . . . . . . . . . . . . . . . 242
8.3.1 Graphical diagnoses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
8.3.2 Nonparametric tests of stationarity . . . . . . . . . . . . . . . . . . 243
8.3.3 Spectral analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
8.4 Monitoring convergence of averages . . . . . . . . . . . . . . . . . . . . . . . . 250
8.4.1 Graphical diagnoses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
8.4.2 Within and between variances . . . . . . . . . . . . . . . . . . . . . . 253
8.4.3 Effective sample size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
8.4.4 Fixed-width batch means . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
8.5 Adaptive MCMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
8.5.1 Cautions about adaptation . . . . . . . . . . . . . . . . . . . . . . . . . 258
8.5.2 The amcmc package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
8.6 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
Index of R Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
Index of Subjects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
List of Figures
1.1 Illustrations of the processing of vectors in R . . . . . . . . . . . . . 8
1.2 Illustrations of the processing of matrices in R . . . . . . . . . . . . . 11
1.3 Illustrations of the factor class . . . . . . . . . . . . . . . . . . . . . 12
1.4 Chosen features of the list class . . . . . . . . . . . . . . . . . . . . . 13
1.5 Definition of a data frame . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6 Loess and natural splines . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.7 Autocorrelation and partial autocorrelation plots . . . . . . . . . . . . . 23
1.8 Simple illustration of bootstrap . . . . . . . . . . . . . . . . . . . . . 24
1.9 Bootstrap linear regression . . . . . . . . . . . . . . . . . . . . . . . . 26
1.10 Spline approximation of monthly deaths . . . . . . . . . . . . . . . . . . 29
1.11 Cumsum illustration for an AR(1) . . . . . . . . . . . . . . . . . . . . . 30
1.12 Range of Brownian motions with confidence band . . . . . . . . . . . . . . 32
1.13 Some artificial loops in R . . . . . . . . . . . . . . . . . . . . . . . . 34
2.1 Representation of a uniform random sample . . . . . . . . . . . . . . . . . 43
2.2 Representations of an exponential random sample . . . . . . . . . . . . . . 45
2.3 Representation of a binomial random sample . . . . . . . . . . . . . . . . 51
2.4 Generation of beta random variables . . . . . . . . . . . . . . . . . . . . 54
3.1 Evaluation of integrate . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2 Comparison of integrate and area . . . . . . . . . . . . . . . . . . . . . 64
3.3 One-dimensional Monte Carlo integration . . . . . . . . . . . . . . . . . . 67
3.4 Importance sampling approximation of a normal tail . . . . . . . . . . . . 72
3.5 Representation of the posterior π(α, β|x) . . . . . . . . . . . . . . . . . 74
3.6 Analysis of a sample from π(α, β|x) . . . . . . . . . . . . . . . . . . . . 78
3.7 Infinite variance importance sampler . . . . . . . . . . . . . . . . . . . 80
3.8 Convergence of two estimators of the integral (3.9) . . . . . . . . . . . . 84
3.9 Posterior of the regression parameter (β1, β2) . . . . . . . . . . . . . . 86
4.1 Confidence bands for a simple example . . . . . . . . . . . . . . . . . . . 92
4.2 Range and confidence for the Cauchy-normal problem (1) . . . . . . . . . . 97
4.3 Range and confidence for the Cauchy-Normal problem (2) . . . . . 98
4.4 ESS and perplexity for the Cauchy-Normal problem . . . . . . . . . . 101
4.5 ESS and perplexity for the Cauchy-Normal problem . . . . . . . . . . 104
4.6 Brownian confidence band for the Cauchy-Normal problem . . . . 106
4.7 Convergence of estimators of E{exp(−X 2 )} . . . . . . . . . . . . . . . . . . 109
4.8 Approximate risks of truncated James–Stein estimators . . . . . . . 114
4.9 Impact of dyadic average . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.10 Impact of control variates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.11 Impact of control variates in logistic regression . . . . . . . . . . . . . . . 121
5.1 Sequences of MLEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.2 Newton–Raphson sequences for a mixture likelihood . . . . . . . . . . 129
5.3 Simple Monte Carlo maximization . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.4 Two Cauchy likelihood maximizations . . . . . . . . . . . . . . . . . . . . . . 133
5.5 A Cauchy likelihood approximation . . . . . . . . . . . . . . . . . . . . . . . . . 134
5.6 Representation of the function of Example 5.6 . . . . . . . . . . . . . . . 136
5.7 Stochastic gradient paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.8 Simulated annealing paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
5.9 Simulated annealing sequence with two modes . . . . . . . . . . . . . . . 146
5.10 Simulated annealing sequence for four schedules . . . . . . . . . . . . . . 147
5.11 Monte Carlo approximations of a probit marginal . . . . . . . . . . . . 149
5.12 EM sequences for a normal censored likelihood . . . . . . . . . . . . . . . 155
5.13 Multiple-mode EM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.14 MCEM on logit model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
6.1 Metropolis output from a beta target . . . . . . . . . . . . . . . . . . . . . . . 173
6.2 Metropolis simulations from a beta target . . . . . . . . . . . . . . . . . . . 174
6.3 Output of a gamma accept–reject algorithm . . . . . . . . . . . . . . . . . 177
6.4 Metropolis–Hastings schemes for a Cauchy target . . . . . . . . . . . . . 179
6.5 Cumulative coverage for a Cauchy target . . . . . . . . . . . . . . . . . . . . 180
6.6 Fitting the braking data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.7 Random walk proposals with different scales . . . . . . . . . . . . . . . . . 184
6.8 Scale impact on mixture exploration . . . . . . . . . . . . . . . . . . . . . . . . 185
6.9 Langevin samples for probit posterior . . . . . . . . . . . . . . . . . . . . . . . 187
6.10 Langevin samples for mixture posterior . . . . . . . . . . . . . . . . . . . . . 189
6.11 Cumulative mean plots with different scales . . . . . . . . . . . . . . . . . . 193
7.1 Histograms from the Gibbs sampler of Example 7.2 . . . . . . . . . . . . . 203
7.2 Histograms from the Gibbs sampler of Example 7.3 . . . . . . . . . . . . . 205
7.3 Histograms from the Gibbs sampler of Example 7.5 . . . . . . . . . . . . . 208
7.4 Histograms of the posterior distributions from Example 7.6 . . . . . . . . 211
7.5 Histograms from the Gibbs sampler of Example 7.7 . . . . . . . . . . . . . 212
7.6 Histograms from the Gibbs sampler of Example 7.8 . . . . . . . . . . . . . 214
7.7 Gibbs output for mixture posterior . . . . . . . . . . . . . . . . . . . . 217
7.8 Simple slice sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
7.9 Logistic slice sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
7.10 Histograms from the pump failure data of Example 7.12 . . . . . . . 223
7.11 Autocorrelations from a Gibbs sampler . . . . . . . . . . . . . . . . . . . . . . 225
7.12 Autocovariance plots for the Gibbs sampler of model (7.7) . . . . . 226
7.13 Histograms of λ in Example 7.16 . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
7.14 Histograms from the Gibbs sampler of (7.12) . . . . . . . . . . . . . . . . . 233
8.1
8.2
8.3
8.4
8.5
8.6
8.7
8.8
Raw coda output for the random effect logit model . . . . . . . . . . . 244
Empirical cdfs for the random effect logit parameters . . . . . . . . . 244
Plot of successive Kolmogorov–Smirnov statistics . . . . . . . . . . . . . 246
Comparison of two MCMC scales for the noisy AR model . . . . . 251
Multiple MCMC runs for the noisy AR model . . . . . . . . . . . . . . . . 252
Gelman and Rubin’s evaluation for the noisy AR model . . . . . . . 254
Gelman and Rubin’s evaluation for the pump failure model . . . . 255
Fixed-width batch sampling variance estimation for the pump
failure model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
8.9 Degenerating MCMC adaptation for the pump failure model . . . 261
8.10 Nonconverging non-parametric MCMC adaptation for the
noisy AR model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
8.11 Mode-recovering non-parametric MCMC adaptation for the
noisy AR model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
List of Examples
1.1 Bootstrapping simple linear regression . . . . . . . . . . . . . . . . . . 25
2.1 Exponential variable generation . . . . . . . . . . . . . . . . . . . . . . 44
2.2 Transformations of exponentials . . . . . . . . . . . . . . . . . . . . . . 46
2.3 Normal variable generation . . . . . . . . . . . . . . . . . . . . . . . . 47
2.4 Discrete random variables . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.5 Poisson random variables . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.6 Negative binomial random variables as mixtures . . . . . . . . . . . . . . 50
2.7 Accept–reject for beta variables . . . . . . . . . . . . . . . . . . . . . 53
2.8 Continuation of Example 2.7 . . . . . . . . . . . . . . . . . . . . . . . . 55
3.1 Precision of integrate . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.2 integrate versus area . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.3 Monte Carlo convergence . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4 Precision of a normal cdf approximation . . . . . . . . . . . . . . . . . . 67
3.5 Tail probability approximation . . . . . . . . . . . . . . . . . . . . . . 70
3.6 Beta posterior importance approximation . . . . . . . . . . . . . . . . . . 71
3.7 Continuation of Example 3.6 . . . . . . . . . . . . . . . . . . . . . . . . 77
3.8 Importance sampling with infinite variance . . . . . . . . . . . . . . . . 79
3.9 Selection of the importance sampling function . . . . . . . . . . . . . . . 82
3.10 Probit posterior importance sampling approximation . . . . . . . . . . . . 83
4.1 Monitoring with the CLT . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.2 Cauchy prior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.3 Continuation of Example 4.2 . . . . . . . . . . . . . . . . . . . . . . . . 96
4.4 Continuation of Example 4.3 . . . . . . . . . . . . . . . . . . . . . . . . 100
4.5 Continuation of Example 4.4 . . . . . . . . . . . . . . . . . . . . . . . . 105
4.6 Student's t expectation . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.7 James–Stein estimation . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.8 Continuation of Example 4.1 . . . . . . . . . . . . . . . . . . . . . . . . 114
4.9 Cauchy posterior with antithetic variables . . . . . . . . . . . . . . . . 115
4.10 Continuation of Example 4.9 . . . . . . . . . . . . . . . . . . . . . . . 118
4.11 Logistic regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.1 Maximizing a Cauchy likelihood . . . . . . . . . . . . . . . . . . . . . . 127
5.2 Mixture model likelihood . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.3 A first Monte Carlo maximization . . . . . . . . . . . . . . . . . . . . . 131
5.4 Continuation of Example 5.1 . . . . . . . . . . . . . . . . . . . . . . . . 131
5.5 Continuation of Example 5.4 . . . . . . . . . . . . . . . . . . . . . . . . 133
5.6 Minimization of a complex function . . . . . . . . . . . . . . . . . . . . 135
5.7 Continuation of Example 5.6 . . . . . . . . . . . . . . . . . . . . . . . . 138
5.8 Continuation of Example 5.3 . . . . . . . . . . . . . . . . . . . . . . . . 142
5.9 Simulated annealing for a normal mixture . . . . . . . . . . . . . . . . . 144
5.10 Continuation of Example 5.6 . . . . . . . . . . . . . . . . . . . . . . . 145
5.11 Bayesian analysis of a simple probit model . . . . . . . . . . . . . . . . 147
5.12 Missing-data mixture model . . . . . . . . . . . . . . . . . . . . . . . . 150
5.13 Censored-data likelihood . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.14 Continuation of Example 5.13 . . . . . . . . . . . . . . . . . . . . . . . 153
5.15 EM for a normal mixture . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.16 Missing-data multinomial model . . . . . . . . . . . . . . . . . . . . . . 158
5.17 Random effect logit model . . . . . . . . . . . . . . . . . . . . . . . . 159
6.1 Metropolis–Hastings algorithm for beta variables . . . . . . . . . . . . . 172
6.2 Cauchys from normals . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
6.3 Metropolis–Hastings for regression . . . . . . . . . . . . . . . . . . . . 179
6.4 Normals from uniforms . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
6.5 Metropolis–Hastings for mixtures . . . . . . . . . . . . . . . . . . . . . 183
6.6 Probit regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
6.7 Continuation of Example 6.5 . . . . . . . . . . . . . . . . . . . . . . . . 187
6.8 Model selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.9 Acceptance rates: normals from double exponentials . . . . . . . . . . . . 192
6.10 Continuation of Example 6.4 . . . . . . . . . . . . . . . . . . . . . . . 195
7.1 Normal bivariate Gibbs . . . . . . . . . . . . . . . . . . . . . . . . . . 201
7.2 Generating beta-binomial random variables . . . . . . . . . . . . . . . . . 202
7.3 Fitting a one-way hierarchical model . . . . . . . . . . . . . . . . . . . 202
7.4 Normal multivariate Gibbs . . . . . . . . . . . . . . . . . . . . . . . . . 206
7.5 Extension of Example 7.3 . . . . . . . . . . . . . . . . . . . . . . . . . 207
7.6 Censored-data Gibbs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
7.7 Grouped multinomial data . . . . . . . . . . . . . . . . . . . . . . . . . 211
7.8 More grouped multinomial data . . . . . . . . . . . . . . . . . . . . . . . 213
7.9 Gibbs for normal mixtures . . . . . . . . . . . . . . . . . . . . . . . . . 215
7.10 A first slice sampler . . . . . . . . . . . . . . . . . . . . . . . . . . 218
7.11 Logistic regression with the slice sampler . . . . . . . . . . . . . . . . 219
7.12 A Poisson hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
7.13 Correlation in a bivariate normal . . . . . . . . . . . . . . . . . . . . 224
7.14 Continuation of Example 7.5 . . . . . . . . . . . . . . . . . . . . . . . 225
7.15 Normal bivariate Gibbs revisited . . . . . . . . . . . . . . . . . . . . . 227
7.16 Poisson counts with missing data . . . . . . . . . . . . . . . . . . . . . 228
7.17 Metropolis within Gibbs illustration . . . . . . . . . . . . . . . . . . . 230
7.18 Conditional exponential distributions—nonconvergence . . . . . . . . . . . 232
7.19 Improper random effects posterior . . . . . . . . . . . . . . . . . . . . 233
8.1 Random effect logit model . . . . . . . . . . . . . . . . . . . . . . . . . 242
8.2 Poisson hierarchical model . . . . . . . . . . . . . . . . . . . . . . . . 245
8.3 Metropolis–Hastings random walk on AR(1) model . . . . . . . . . . . . . . 249
8.4 Continuation of Example 8.3 . . . . . . . . . . . . . . . . . . . . . . . . 250
8.5 Continuation of Example 8.2 . . . . . . . . . . . . . . . . . . . . . . . . 254
8.6 Continuation of Example 8.5 . . . . . . . . . . . . . . . . . . . . . . . . 258
8.7 Another approach to Example 8.2 . . . . . . . . . . . . . . . . . . . . . . 259
8.8 Continuation of Example 8.4 . . . . . . . . . . . . . . . . . . . . . . . . 261
8.9 Adaptive MCMC for anova . . . . . . . . . . . . . . . . . . . . . . . . . . 264
8.10 Continuation of Example 8.7 . . . . . . . . . . . . . . . . . . . . . . . 265
8.11 Continuation of Example 8.8 . . . . . . . . . . . . . . . . . . . . . . . 266
1 Basic R Programming
“You’re missing the big picture,” he told her. “A good album should
be more than the sum of its parts.”
Ian Rankin
Exit Music
Reader’s guide
The Reader’s guide is a section that will start each chapter by providing comments on its contents. It also usually contains indications of the purpose of the
chapter and its links with other chapters.
This first chapter is where we introduce the programming language R, which we
use to implement and illustrate our algorithms. We discuss here input and output,
data structures, and basic programming commands for this language. It is thus
a crucial chapter for those new to R, but it will unavoidably feel unsatisfactory
because the coverage of those notions will most likely be too sketchy for most
readers. For those already familiar with R, or simply previously exposed to another
introduction to R, this chapter will undoubtedly feel mostly familiar and act as
a refresher, maybe prodding them to delve deeper into the R language using a
reference book. The similarity with the introductory chapter of Bayesian Core is
not coincidental, as we used the same skeleton as in Marin and Robert (2007).
1.1 Introduction
This chapter only attempts at introducing R to newcomers in a few pages
and, as such, it should not be considered as a proper introduction to R. Entire
volumes, such as the monumental R Book by Crawley (2007), the introduction
by Dalgaard (2002), and the focused R data manipulation by Spector (2009),
are dedicated to the practice of this language, and therefore additional efforts
(besides reading this chapter) will be required from the reader to sufficiently
master the language.1 However, before discouraging anyone, let us comfort
you with the fact that:
a. The syntax of R is simple and logical enough to quickly allow for a basic
understanding of simple R programs, as should become obvious in a few
paragraphs.
b. The best, and in a sense the only, way to learn R is through trial-and-error on simple and then more complex examples. Reading the book with
a computer available nearby is therefore the best way of implementing
this recommendation.
In particular, the embedded help commands help() and help.search() are
very good starting points to gather information about a specific function or
a general issue, even though more detailed manuals are available both locally
and on-line. Note that help.start() opens a Web browser linked to the local
manual pages.
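As a brief illustration (the functions and keyword queried here are arbitrary choices), the different ways of accessing the documentation are
> help(mean)                # manual page of the function mean
> ?var                      # shorthand for help(var)
> help.search("bootstrap")  # list the help pages matching a keyword
> help.start()              # open the HTML manual pages in a Web browser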
One may first wonder why we support using R as the programming interface for this introduction to Monte Carlo methods, since there exist other
languages, most (all?) of them faster than R, like Matlab, and even free, like C
or Python. We obviously have no partisan or commercial involvement in this
language. Rather, besides the ease of presentation, our main reason for this
choice is that the language combines a sufficiently high power (for an interpreted language) with a very clear syntax both for statistical computation and
graphics. R is a flexible language that is object-oriented and thus allows the
manipulation of complex data structures in a condensed and efficient manner.
Its graphical abilities are also remarkable, with possible interfacing with a
text processor such as LaTeX with the package Sweave. R offers the additional
advantages of being a free and open-source system under the GNU General
Public Licence principle, with constant upgrades and improvements from the
statistics community,2 as well as numerous (free) Web-based tutorials and
user’s manuals,3 and running on all platforms, including both Apple’s Mac
and Microsoft Windows (and, obviously, under the Linux and Unix operating
systems). R provides a powerful interface that can integrate programs written
in other languages such as C, C++, Fortran, Perl, Python, and Java. Not only
can you keep programming in your usual language, but integrating programs
written by others in an alien language then becomes (mostly) seamless, as
seen in Chapter 8 with the package amcmc. At last, it is increasingly common
to see people who develop new methodology simultaneously producing an R
package in support of their approach and to back up introductory statistics
courses with illustrations in R, as shown by the expanding Use R! Springer
series in which this book is published.
1 If you decide to skip this chapter, be sure to at least print the handy R Reference Card available at http://cran.r-project.org/doc/contrib/Short-refcard.pdf that summarizes, in four pages, the major commands of R.
2 There is even an R newsletter, R-News, which is available on cran.r-project.org/doc/Rnews.
3 This means it is unlikely that you will need to acquire an R programming book in order to get a proper understanding of the R language, even though it may prove useful at a later stage. See http://cran.r-project.org/manuals.html for manuals available on-line.
One choice we have not addressed above is “why R and not BUGS?” BUGS
(which stands for Bayesian inference using Gibbs sampling) is a Bayesian analysis software developed since the early 1990’s, mostly by researchers from the
Medical Research Council (MRC) at Cambridge University. The most common
version is WinBugs, working under Windows, but there also exists an open-source version called OpenBugs. So, to return to the initial question, we are
not addressing the possible links and advantages of BUGS simply because the
purpose is different. Our goal is to give a deep but intuitive understanding of
simulation tools for statistical inference, not (directly) help in the construction
of Bayesian models and the Bayesian estimation of their parameters. While
access to Monte Carlo specifications is possible in BUGS, most computing operations are handled by the software itself, with the possible outcome that the
user does not bother about this side of the problem and instead concentrates
on Bayesian modeling. Thus, while R can be easily linked with BUGS and simulation can be done via BUGS, we think that a lower-level language such as R
is more effective in bringing you in touch with simulation imperatives. Note
that this perspective would not hold so tightly for a book on computational
statistics, as Albert (2009).
1.2 Getting started
The R language is straightforward to install: It can be downloaded (obviously
free) from one of the numerous CRAN (Comprehensive R Archive Network)
mirror Websites around the world.4 (Note that it is resident in most current
Linux kernels.)
4 The main CRAN Website is http://cran.r-project.org/.
At this stage, we do not cover installation of the R package and thus assume
that (a) R is installed on the machine you want to work with and (b) that
you have managed to launch it (in most cases, you simply have to click on
the proper icon). You should then obtain a terminal window whose first lines
resemble the following, most likely with a more recent version:
R version 2.5.1 (2007-06-27)
Copyright (C) 2007 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type ’license()’ or ’licence()’ for distribution details.
R is a collaborative project with many contributors.
Type ’contributors()’ for more information and
’citation()’ on how to cite R or R packages in publications.
Type ’demo()’ for some demos, ’help()’ for on-line help, or
’help.start()’ for an HTML browser interface to help.
Type ’q()’ to quit R.
>
Neither this austere beginning nor the prospect of using a line editor should
put you off, though, as there are many other ways of inputting and outputting
commands and data, as we shall soon see! The final line above with the symbol
> is not a typo but rather means that the R software is waiting for a command
from the user (i.e., you). This character > at the beginning of each line in the
executable window is called the prompt and precedes the line command, which
is terminated by pressing the RETURN key. At this early stage, all commands
will be passed as line commands, and you should thus spot commands thanks
to this symbol.
Make sure to remember that exiting R can be done by typing q() after
the prompt, as in
> q()
Save workspace image? [y/n/c]: n
the options being proposed in the line right after the prompt having to
do with the storage of the command history and the produced objects,
something you can ignore at this stage. Commands and programs that
need to be stopped during their execution, for instance because they take
too long or too much memory to complete or because they involve a
programming mistake such as an infinite loop, can be stopped by the
Control-C double-key action without exiting the R session.
Exercise 1.1 To start with a limited challenge, using the final lines above the
prompt, you could embark on an on-line visit of the main features of R by typing
demo() after the prompt (make sure to test demo(image) and demo(graphics)
to get an idea of the great graphical abilities of R). More statistical demos are
also available, as listed by demo().
For memory and efficiency reasons, R does not install all the available
functions and programs when launched but only the basic packages that it
requires to run properly. Those basic packages are base, stats, graphics, nlme,
and lattice. Additional packages can be loaded via the library command, as
in
> library(combinat) # combinatorics utilities
> library(datasets) # The R Datasets Package
and the entire list of available packages is provided by library(). (The symbol
# in the prompt lines above indicates a comment: All characters following #
until the end of the command line are ignored. Comments are recommended to
improve the readability of your programs.) There exist hundreds of packages
available on the Web.5 Installing a new package such as the package mcsm
that is associated with this book is done by downloading the file from the
Web depository and calling
> install.packages("mcsm")
or
> download.packages("mcsm")
For a given package, the install command obviously needs to be executed
only once, while the library call is required each time R is launched (as the
corresponding package is not kept as part of the .RData file, whose role is
explained below in Section 1.9). Thus, it is good practice to include calls to
required libraries within your R programs in order to avoid error messages
when launching them.
5 Packages that have been validated and tested by the R core team are listed at http://cran.r-project.org/src/contrib/PACKAGES.html.
1.3 R objects
As with many advanced programming languages, R distinguishes between several types of objects. Those types include scalar, vector, matrix, time series,
data frames, functions, or graphics. An R object is mostly characterized by a
mode that describes its contents and a class that describes its structure. The
R function str applied to any R object, including R functions, will show its
structure. The different modes are
– null (empty object),
– logical (TRUE or FALSE),
– numeric (such as 3, 0.14159, or 2+sqrt(3)),
– complex (such as 3-2i or complex(1,4,-2)), and
– character (such as "Blue", "binomial", "male", or "y=a+bx"),
and the main classes are vector, matrix, array, factor, time-series, data.frame, and
list. Heterogeneous objects such as those of the list class can include elements
with various modes. Manual entries about those classes can be obtained via
the help commands help(data.frame) or ?matrix for instance.
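For instance, with an arbitrary toy object, the distinction between mode and class can be checked directly in the console:
> x=c(3,0.14159,2+sqrt(3))   # a numeric vector
> mode(x)                    # returns "numeric"
> y=matrix(x,nrow=3,ncol=2)  # the same values stored as a matrix
> mode(y)                    # still "numeric": the contents have not changed
> class(y)                   # but the class is now "matrix"
> str(y)                     # compact summary of the structure of y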
R can operate on most of those types as a regular function would operate
on a scalar, which is a feature that should be exploited as much as possible for
compact and efficient programming. The fact that R is interpreted rather than
compiled involves many subtle differences, but a major issue is that all variables in the system are evaluated and stored at every step of R programs. This
means that loops in R are enormously time-consuming and should be avoided
at all costs! Therefore, using the shortcuts offered by R in the manipulation
of vectors, matrices, and other structures is a must.
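A small experiment along those lines (the vector size is an arbitrary choice) makes the point: the vectorized version below runs orders of magnitude faster than the explicit loop.
> x=rnorm(10^6)
> system.time({y=rep(0,10^6); for (i in 1:10^6) y[i]=x[i]^2})   # explicit loop
> system.time({y=x^2})                                          # vectorized version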
1.3.1 The vector class
As indicated logically by its name, the vector object corresponds to a mathematical vector of elements of the same type, such as (TRUE,TRUE,FALSE,TRUE)
or (1,2,3,5,7,11). Creating small vectors can be done using the R command
c() as in
> a=c(2,6,-4,9,18)
This fundamental function combines or concatenates terms together. For instance,
> d=c(a,b)
concatenates the two vectors a and b into a new vector d. Note that decimal
numbers should be encoded with a dot, character strings in quotes ” “, and
logical values with the character strings TRUE and FALSE or with their respective abbreviations T and F. Missing values are encoded with the character
string NA. The option recursive=TRUE in c() allows breaking down a list into
its individual components.
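For instance, on a small artificial list (the names and values are arbitrary),
> x=list(a=1:3,b=c(4.5,6))
> c(x,recursive=TRUE)   # flattens the list into a single named vector
> unlist(x)             # an equivalent alternative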
Being able to use abbreviations in R is quite handy, but this may lead to
confusion! In particular, the use of T instead of TRUE is only valid if T is not
defined otherwise in the current R session. Since T is a standard symbol
in Monte Carlo simulation, often denoting the number of iterations, this
may create unsuspected problems. For instance, using
1.3 R objects
7
> sort(weit,dec=T)
Error in sort(weit/sum(weit), dec = T) :
’decreasing’ must be a length-1 logical vector.
Did you intend to set ’partial’?
resulted in an error message because T was already defined in the R
program as 10^3.
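The following toy session (with an arbitrary value assigned to T) reproduces the difficulty and its obvious fix:
> T=10^3                          # T now masks the abbreviation of TRUE
> sort(c(2,1,3),decreasing=T)     # fails, as in the error message above
> sort(c(2,1,3),decreasing=TRUE)  # spelling TRUE in full always works
> rm(T)                           # removing T restores the abbreviation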
In Figure 1.1, we give a few illustrations of the use of vectors in R. The
character + indicates that the console is waiting for a supplementary instruction, which is useful when typing long expressions. The assignment operator
is =, not to be confused with ==, which is the Boolean operator for equality.
An older assignment operator is <-.
> a=c(5,5.6,1,4,-5)     build the numeric vector a of dimension 5
                        with elements 5, 5.6, 1, 4, -5
> b=a[2:4]              build the numeric vector b of dimension 3
                        with elements 5.6, 1, 4
> d=a[c(1,3,5)]         build the numeric vector d of dimension 3
                        with elements 5, 1, -5
> 2*a                   multiply each element of a by 2
                        and display the result
> b%%3                  provides each element of b modulo 3
> d%/%2.4               computes the integer division of each element of d by 2.4
> e=3/d                 build the numeric vector e of dimension 3
                        and elements 3/5, 3, -3/5
> log(d*e)              multiply the vectors d and e term by term
                        and transform each term into its natural logarithm
> sum(d)                calculate the sum of d
> length(d)             display the length of d
> t(d)                  transpose d, the result is a row vector
> t(d)%*%e              scalar product between the row vector t(d) and
                        the column vector e with identical length
> t(d)*e                elementwise product between two vectors
                        with identical lengths
> g=c(sqrt(2),log(10))  build the numeric vector g of dimension 2
                        and elements sqrt(2), log(10)
> e[d==5]               build the subvector of e that contains the
                        components e[i] such that d[i]=5
> a[-3]                 create the subvector of a that contains
                        all components of a but the third.6
> is.vector(d)          display the logical expression TRUE if
                        a vector and FALSE else
Fig. 1.1. Illustrations of the processing of vectors in R.
Note the convenient use of Boolean expressions to extract subvectors from
a vector without having to resort to a component-by-component test (and
hence a loop). The quantity d==5 is itself a vector of Booleans, while the number of components satisfying the constraint can be computed by sum(d==5).
The ability to apply scalar functions to vectors as a whole is also a major advantage of R. In the event the function depends on a parameter or an option,
this quantity can be entered as in
> e=gamma(e^2,log=T)
which returns the vector with components log Γ(e_i^2). Functions that are specially designed for vectors include, for instance, sample, permn, order, sort,
and rank, which all have to do with manipulating the order in which the
components of the vector occur. (Note that permn is part of the combinat
library, as indicated when typing help.search("permn"), which returns a
permn(combinat) among its matching entries.)
6 Positive and negative indices cannot be used simultaneously.
Exercise 1.3 Test the help command on the functions seq, sample, and
order. (Hint: Start with help(help).)
Exercise 1.4 Explain the difference between the functions order and rank. For
the function rep. , explain the difference between the options times, length.out,
and each.
Besides their numeric and logical indexes, the components of a vector can
also be identified by names. For a given vector x, names(x) is a vector of
characters of the same length as x. This additional attribute is most useful
when dealing with real data, where the components have a meaning such
as “unemployed” or “democrat”. Those names can also be erased by the
command
> names(x)=NULL
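As a toy illustration (the labels and values are arbitrary),
> x=c(0.64,0.36)
> names(x)=c("democrat","republican")  # attach labels to the components
> x["democrat"]                        # components can then be selected by name
> names(x)=NULL                        # and the labels erased again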
The : operator found in Figure 1.1 is a very useful device that defines
a consecutive sequence, but it is also fragile in that reverse sequences do
not always produce what is expected.7 For one thing, 1:n-1 is interpreted
as (1:n)-1 rather than 1:(n-1). For another, while 3:1 returns the vector
c(3,2,1), the command 1:0 returns c(1,0), which may or may not be okay
depending on the circumstances. For instance, a[1:0] will only return a[1],
and this may not be the limiting case the programmer had in mind. Note also
that a[0] does not produce an error message but a vector with length zero.
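A short check in the console (with an arbitrary value of n) shows the difference:
> n=5
> 1:n-1        # read as (1:n)-1
[1] 0 1 2 3 4
> 1:(n-1)
[1] 1 2 3 4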
Exercise 1.5 Show that the alternative seq(1,n-1,by=1) does not suffer from
the same drawbacks as 1:(n-1). Find a modification of by=1 that handles the
case where n ≤ 1.
7 This difficulty was pointed out by Radford Neal.
1.3.2 The matrix, array, and factor classes
The matrix class provides the R representation of matrices. A typical entry is,
for instance,
> x=matrix(vec,nrow=n,ncol=p)
which creates an n × p matrix whose elements are those of the vector vec,
assuming this vector is of dimension np. An important feature of this entry
is that, in a somewhat unusual way, the components of vec are stored by
column, which means that x[1,1] is equal to vec[1], x[2,1] is equal to
vec[2], and so on, except if the option byrow=T is used in matrix. (Because
of this choice of storage convention, working on R matrices columnwise is
faster than working rowwise.) Note also that, if vec is of dimension n × p, it is
not necessary to specify both the nrow=n and ncol=p options in matrix. One
of those two parameters is sufficient to define the matrix. On the other hand,
if vec is not of dimension n × p, matrix(vec,nrow=n,ncol=p) will create an
n × p matrix with the components of vec repeated the appropriate number of
times. For instance,
> matrix(1:4,ncol=3)
     [,1] [,2] [,3]
[1,]    1    3    1
[2,]    2    4    2
Warning message:
data length [4] is not a submultiple or multiple of the number
of columns [3] in matrix in: matrix(1:4, ncol = 3)
produces a 2 × 3 matrix along with a warning message that something may
be missing in the call to matrix. Note again that 1, 2, 3, 4 are entered consecutively when following the column (or lexicographic) order. Names can be
given to the rows and columns of a matrix using the rownames and colnames
functions.
In some situations, it is useful to remember that an R matrix can also be
used as a vector. If x is an n×p matrix, x[i+n*(j-1)] is equal
to x[i,j], i.e., x can also be manipulated as a vector made of the columns
of vec piled on top of one another. For instance, x[x>5] is a vector, while
x[x>5]=0 modifies the right entries in the matrix x. Conversely, vectors can
be turned into p × 1 matrices by the command as.matrix. Note that x[1,]
produces the first row of x as a vector rather than as a p × 1 matrix.
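A brief illustration with an arbitrary 4 × 5 matrix:
> x=matrix(1:20,nrow=4)
> x[3,2]            # entry in row 3, column 2
[1] 7
> x[3+4*(2-1)]      # the same entry, through the vector representation
[1] 7
> x[x>15]=0         # modify all entries larger than 15 at once
> x[1,]             # the first row, returned as a vector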
R allows for a wide range of manipulations on matrices, both termwise and
in the classical matrix algebra perspective. For instance, the standard matrix
product is denoted by %*%, while * represents the term-by-term product. (Note
that taking the product a%*%b when the number of columns of a differs from
the number of rows of b produces an error message.) Figure 1.2 gives a few
examples of matrix-related commands. The apply function is particularly easy
to use for functions operating on matrices by row or column.
The function diag can be used to extract the vector of the diagonal elements of a matrix, as in diag(a), or to create a diagonal matrix with a
given diagonal, as in diag(1:10). Since matrix algebra is central to good
programming in R, as matrix programming allows for the elimination of time-consuming loops, it is important to be familiar with matrix manipulation. For
instance, the function crossprod replaces the product t(x)%*%y on either
vectors or matrices by crossprod(x,y) more efficiently:
> system.time(crossprod(1:10^6,1:10^6))
> x1=matrix(1:20,nrow=5)          build the numeric matrix x1 of dimension
                                  5 × 4 with first row 1, 6, 11, 16
> x2=matrix(1:20,nrow=5,byrow=T)  build the numeric matrix x2 of dimension
                                  5 × 4 with first row 1, 2, 3, 4
> a=x3%*%x2                       matrix summation of x2 and x3
> x3=t(x2)                        transpose the matrix x2
> b=x3%*%x2                       matrix product between x2 and x3,
                                  with a check of the dimension compatibility
> c=x1*x2                         term-by-term product between x1 and x2
> dim(x1)                         display the dimensions of x1
> b[,2]                           select the second column of b
> b[c(3,4),]                      select the third and fourth rows of b
> b[-2,]                          delete the second row of b
> rbind(x1,x2)                    vertical merging of x1 and x2
> cbind(x1,x2)                    horizontal merging of x1 and x2
> apply(x1,1,sum)                 calculate the sum of each row of x1
> as.matrix(1:10)                 turn the vector 1:10 into a 10 × 1 matrix
Fig. 1.2. Illustrations of the processing of matrices in R.
   user  system elapsed
  0.016   0.048   0.066
> system.time(t(1:10^6)%*%(1:10^6))
   user  system elapsed
  0.084   0.036   0.121
(You can also check the symmetric function tcrossprod.)
Eigenanalysis of square matrices is also included in the base package. For
instance, chol(m) returns the upper triangular factor of the Choleski decomposition of m; that is, the matrix R such that R^T R is equal to m. Similarly,
eigen(m) returns a list (see Section 1.3.3) that contains the eigenvalues of
m (some of which can be complex numbers) as well as the corresponding
eigenvectors (some of which are complex if there are complex eigenvalues).
Related functions are svd and qr, which provide the singular values and the
QR decomposition of their argument, respectively. Note that the inverse M^-1
of a matrix M can be found either by solve(M) (recommended) or ginv(M),
which requires downloading the library MASS and also produces generalized
inverses (which may be a mixed blessing since the fact that a matrix is not invertible is not signaled by ginv). Special versions of solve are backsolve and
forwardsolve, which are restricted to upper and lower diagonal triangular
systems, respectively. Note also the alternative of using chol2inv which returns the inverse of a matrix m when provided by the Choleski decomposition
chol(m).
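As a minimal check of these functions (the small symmetric positive-definite matrix below is only an illustration, not taken from the text), one can verify that the Choleski factor reproduces m and that chol2inv agrees with solve:

> m=matrix(c(4,1,1,3),nrow=2)              # a symmetric positive-definite matrix
> R=chol(m)                                # upper triangular Choleski factor
> all.equal(t(R)%*%R, m)                   # recovers m
> eigen(m)$values                          # both eigenvalues are positive
> all.equal(chol2inv(chol(m)), solve(m))   # two ways of computing the inverse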
Structures with more than two indices are represented by arrays and can
also be processed by R commands, for instance x=array(1:50,c(2,5,5)),
which gives a three-entry table of 50 terms. Once again, they can also be
interpreted as vectors.
The apply function used in Figure 1.2 is a very powerful device that operates on arrays and, in particular, matrices. Since it can return arrays, it
bypasses calls to multiple loops and makes for (sometimes) quicker and (always) cleaner programs. It should not be considered as a panacea, however,
as apply hides calls to loops inside a single command. For instance, a comparison of apply(A, 1, mean) with rowMeans(A) shows the second version
is about 200 times faster. Using linear algebra whenever possible is therefore
a more efficient solution. Spector (2009, Section 8.7) gives a detailed analysis
of the limitations of apply and the advantages of vectorization in R.
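A quick timing comparison illustrates the point (the matrix size is arbitrary and the exact speedup obviously depends on the machine):

> A=matrix(rnorm(10^6),nrow=10^3)
> system.time(apply(A,1,mean))   # loops over the rows inside apply
> system.time(rowMeans(A))       # vectorized alternative, much faster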
A factor is a vector of characters or integers used to specify a discrete
classification of the components of other vectors with the same length. Its
main difference from a standard vector is that it comes with a level attribute
used to specify the possible values of the factor. This structure is therefore
appropriate to represent qualitative variables. R provides both ordered and unordered factors, whose major appeal lies within model formulas, as illustrated
in Figure 1.3. Note the subtle difference between apply and tapply.
> state=c("tas","tas","sa","sa","wa")   create a vector with five values
> statef=factor(state)                  distinguish entries by group
> levels(statef)                        give the groups
> incomes=c(60,59,40,42,23)             create a vector of incomes
> tapply(incomes,statef,mean)           average the incomes for each group
> statef=factor(state,                  define a new level with one more
+   levels=c("tas","sa","wa","yo"))     group than observed
> table(statef)                         return statistics for all levels

Fig. 1.3. Illustrations of the factor class.
1.3.3 The list and data.frame classes
A list in R is a rather loose object made of a collection of other arbitrary
objects known as its components.⁸ For instance, a list can be derived from n
existing objects using the function list:
a=list(name_1=object_1,…,name_n=object_n)
This command creates a list with n arguments using object_1,…,object_n
for the components, each being associated with the argument’s name, name_i.
For instance, a$name_1 will be equal to object_1. (It can also be represented
as a[[1]], but this is less practical, as it requires some bookkeeping of the
order of the objects contained in the list.) Lists are very useful in preserving
information about the values of variables used within R functions in the sense
⁸ Lists can contain lists as elements.
that all relevant values can be put within a list that is the output of the
corresponding function (see Section 1.7 for details about the construction of
functions in R). Most standard functions in R, for instance eigen in Figure 1.4,
return a list as their output. Note the use of the abbreviations vec and val in
the last line of Figure 1.4. Such abbreviations are acceptable as long as they
do not induce confusion. (Using res$v would not work!)
> li=list(num=1:5,y="color",a=T)            create a list with three arguments
> a=matrix(c(6,2,0,2,6,0,0,0,36),nrow=3)    create a (3, 3) matrix
> res=eigen(a,symmetric=T)                  diagonalize a and
> names(res)                                produce a list with two
                                            arguments: vectors and values
> res$vectors                               the vectors argument of res
> diag(res$values)                          create the diagonal matrix
                                            of eigenvalues
> res$vec%*%diag(res$val)%*%t(res$vec)      recover a

Fig. 1.4. Chosen features of the list class.
The local version of apply is lapply, which computes a function for each
argument of the list
> x = list(a = 1:10, beta = exp(-3:3),
+ logic = c(TRUE,FALSE,FALSE,TRUE))
> lapply(x,mean) #compute the empirical means
$a
[1] 5.5
$beta
[1] 4.535125
$logic
[1] 0.5
provided each argument is of a mode that is compatible with the function
argument (i.e., is numeric in this case). A “user-friendly” version of lapply is
sapply, as in
> sapply(x,mean)
       a     beta    logic
5.500000 4.535125 0.500000
The last class we briefly mention here is the data frame. A data frame
is a list whose elements are possibly made of differing modes and attributes
but have the same length, as in the example provided in Figure 1.5. A data
frame can be displayed in matrix form, and its rows and columns can be
extracted using matrix indexing conventions. A list whose components satisfy
the restrictions imposed on a data frame can be coerced into a data frame
using the function as.data.frame. The main purpose of this object is to
import data from an external file by using the read.table function.
> v1=sample(1:12,30,rep=T)            simulate 30 independent uniform
                                      random variables on {1, 2, . . . , 12}
> v2=sample(LETTERS[1:10],30,rep=T)   simulate 30 independent uniform
                                      random variables on {a, b, . . . , j}
> v3=runif(30)                        simulate 30 independent uniform
                                      random variables on [0, 1]
> v4=rnorm(30)                        simulate 30 independent realizations
                                      from a standard normal distribution
> xx=data.frame(v1,v2,v3,v4)          create a data frame

Fig. 1.5. Definition of a data frame.
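A minimal sketch of the round trip between a data frame and an external text file (the file name mydata.txt is a hypothetical choice, not one used in the text):

> write.table(xx,file="mydata.txt",row.names=FALSE)   # export the data frame
> yy=read.table("mydata.txt",header=TRUE)             # re-import it as a data frame
> summary(yy)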
1.4 Probability distributions in R
R is primarily a statistical language. It is therefore well-equipped with probability distributions. As described in Table 1.1, all standard distributions are
available, with a clever programming shortcut: A “core” name, such as norm,
is associated with each distribution, and the four basic associated functions,
namely the cdf, the pdf, the quantile function, and the simulation procedure,
are defined by appending the prefixes d, p, q, and r to the core name, such
as dnorm, pnorm, qnorm, and rnorm. Obviously, each function requires additional entries, as in pnorm(1.96) or rnorm(10,mean=3,sd=3). Recall that
pnorm and qnorm are inverses of one another.
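For instance, the four functions attached to the normal core name can be called as follows (the values quoted in the comments are approximate):

> dnorm(0)                # density at 0, about 0.3989
> pnorm(1.96)             # cdf at 1.96, about 0.975
> qnorm(0.975)            # quantile function, the inverse of pnorm, about 1.96
> rnorm(10,mean=3,sd=3)   # ten random draws from a normal with mean 3 and sd 3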
Exercise 1.6 Study the properties of the R function lm using simulated data as
in
> x=rnorm(20)
> y=3*x+5+rnorm(20,sd=0.3)
> reslm=lm(y~x)
> summary(reslm)
The simulation aspects related to the normal distribution (and to these other
standard distributions) will be discussed in detail in Chapter 2.
1.5 Basic and not-so-basic statistics
R is designed by statisticians, as the logical continuation of the former S-plus
language, and, as such, it offers a very wide range of statistical packages that
cover the entire spectrum of statistics. The battery of these (classical) statistical tools, ranging from descriptive statistics to non-parametric density
estimation and generalized linear models, cannot be provided in this book,
but we refer you to, for instance, Dalgaard (2002) or Venables and Ripley
(1999) for a detailed introduction.

Table 1.1. Standard distributions with R core name.

Distribution     Core      Parameters        Default Values
Beta             beta      shape1, shape2
Binomial         binom     size, prob
Cauchy           cauchy    location, scale   0, 1
Chi-square       chisq     df
Exponential      exp       1/mean            1
F                f         df1, df2
Gamma            gamma     shape, 1/scale    NA, 1
Geometric        geom      prob
Hypergeometric   hyper     m, n, k
Log-normal       lnorm     mean, sd          0, 1
Logistic         logis     location, scale   0, 1
Normal           norm      mean, sd          0, 1
Poisson          pois      lambda
Student          t         df
Uniform          unif      min, max          0, 1
Weibull          weibull   shape
At the most basic level, descriptive statistics can be obtained for any object
with numerical entries. For instance, mean (possibly trimmed), var, sd, median,
quantile, and summary produce standard estimates for the samples on which
they are called. Note that, due to the choice of an unbiased version of this
estimator that involves dividing the sum of squares by n−1 instead of dividing
by n, the variance estimator of a single observation is NA rather than 0.
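A one-line check of this convention:

> var(4.2)    # a single observation: the n-1 denominator is 0, hence NA
[1] NA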
When applied to a matrix x, the output of var(x) differs from the output
of sd(x)^2
> b=matrix(1:9,ncol=3)
> var(b)
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1
[3,]    1    1    1
> sd(b)^2
[1] 1 1 1
because the former returns an estimate of the covariance between the columns
of x, while the latter produces an estimate of the variances of the columns.
Note that the definition of b only specifies the number of columns, 3 in this
case, and thus assumes that the length of the vector is a multiple of 3. (If
it is not, R produces a warning that the data length is not a submultiple or
multiple of the number of columns.)
Classical hypothesis tests, such as the equality of two means or the equality of two variances, can be conducted using standard functions. Typing
help.search("test") will produce a list of tests that most likely contains
more tests than you have previously heard of. For example, checking that the
mean of a normal sample with unknown variance is zero can be conducted
using the t test (Casella and Berger, 2001) as
> x=rnorm(25) #produces a N(0,1) sample of size 25
> t.test(x)
One Sample t-test
data: x
t = -0.8168, df = 24, p-value = 0.4220
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.4915103 0.2127705
sample estimates:
mean of x
-0.1393699
whose outcome can be interpreted as providing a p-value of 0.4220 (i.e., a fairly
large probability of observing, under the null hypothesis, an empirical average x̄
at least as far from zero as the one just observed, −0.139) and hence as concluding
that the data do not contradict the null hypothesis.
As pointed out previously, all but the most basic R functions return lists
as their output (or value). For instance, when running t.test above, the
output involves nine arguments:
> out=t.test(x)
> names(out)
[1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"
[6] "null.value"  "alternative" "method"      "data.name"
which can be handled separately, as for instance in as.numeric(out$est)^2.
Similarly, the presence of correlation between two variables can be tested
by cor.test, as in the example
> attach(faithful) #resident dataset
> cor.test(faithful[,1],faithful[,2])
Pearson’s product-moment correlation
data: faithful[, 1] and faithful[, 2]
t = 34.089, df = 270, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8756964 0.9210652
sample estimates:
cor
0.9008112
which concludes that the data, faithful, made of eruptions and waiting,
which correspond to the eruption times and the waiting times of the Old Faithful geyser in Yellowstone National Park, has its two variables most certainly
correlated.
Non-parametric tests such as the one-sample and two-sample Kolmogorov–
Smirnov goodness-of-fit tests (ks.test), the Shapiro–Wilk normality test (shapiro.test),
the Kruskal–Wallis homogeneity test (kruskal.test), and the Wilcoxon rank tests
(wilcox.test) are available. For instance, testing for normality on the faithful
dataset leads to
> ks.test(jitter(faithful[,1]),pnorm)
One-sample Kolmogorov-Smirnov test
data: jitter(faithful[, 1])
D = 0.9486, p-value < 2.2e-16
alternative hypothesis: two-sided
> shapiro.test(faithful[,2])
Shapiro-Wilk normality test
data: faithful[, 2]
W = 0.9221, p-value = 1.016e-10
> wilcox.test(faithful[,1])
Wilcoxon signed rank test with continuity correction
data: faithful[, 1]
V = 37128, p-value < 2.2e-16
alternative hypothesis: true location is not equal to 0
In the first command line above, the function jitter is used to perturb each
entry in the dataset in order to remove the ties within it. Otherwise, the
p-value cannot be computed:
Warning message:
cannot compute correct p-values with ties in:
ks.test(faithful[, 1], pnorm)
This function is also quite handy when plotting datasets with ties.
Most R functions require arguments, and most of them have default values
for at least some arguments. For instance, the Wilcoxon test wilcox.test has
mu=0 as its default location to be tested. Those default values are indicated
on the help page of the functions.
Non-parametric kernel density estimates can similarly be constructed via
the function density and are quite amenable to calibration, from the choice of
the kernel to the choice of the bandwidth (see Venables and Ripley, 1999).
In our case, they will be handy as sample-based proposals when designing
MCMC algorithms in Chapters 6 and 8. Spline modeling also is available via
the functions spline and splinefun. Non-parametric regression can be performed
via the loess function or using natural splines.
For instance, the data constructed as
> Nit = c(0,0,0,1,1,1,2,2,2,3,3,3,4,4,4,6,6,6)
> AOB =c(4.26,4.15,4.68,6.08,5.87,6.92,6.87,6.25,
+ 6.84,6.34,6.56,6.52,7.39,7.38,7.74,7.76,8.14,7.22)
reports on the relationship between nitrogen level in soil (coded 0,1,2,3,4,6)
and abundance of a bacteria called AOB, reproduced on the left-hand side of
Figure 1.6. The loess and natural spline fits are obtained via the R code
> AOBm=tapply(AOB,Nit,mean)                           #means of AOB
> Nitm=tapply(Nit,Nit,mean)                           #means of Nit
> plot(Nit,AOB,xlim=c(0,6),ylim=c(min(AOB),max(AOB)),pch=19)
> fitAOB=lm(AOBm~ns(Nitm,df=2))                       #natural spline
> xmin=min(Nit);xmax=max(Nit)
> lines(seq(xmin,xmax,.5),                            #fit to means
+   predict(fitAOB,data.frame(Nitm=seq(xmin,xmax,.5))))
> fitAOB2=loess(AOBm~Nitm,span=1.25)                  #loess
> lines(seq(xmin,xmax,.5),                            #fit to means
+   predict(fitAOB2,data.frame(Nitm=seq(xmin,xmax,.5))))
where the function ns requires the splines library. The loess fit will vary
with the choice of span, as the natural spline fit will vary with the choice of
the degrees of freedom df in ns.
Covariates can be used as well for more advanced statistical modeling.
Indeed, linear and generalized linear (regression) models are similarly well-developed in R. The syntax is slightly unusual, though, since a standard linear
regression is launched as follows:
> x=seq(-3,3,le=5) # equidispersed regressor
Fig. 1.6. Scatterplot of bacteria abundance (AOB) versus nitrogen levels (left
panel). The right panel shows both the natural spline fit (dark) with ns=2 and loess
fit (light) with span=1.25.
> y=2+4*x+rnorm(5) # simulated variable
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
      1.820        4.238

> summary(lm(y~x))

Call:
lm(formula = y ~ x)

Residuals:
       1        2        3        4        5
 0.25219 -0.07421  0.07080 -0.92773  0.67895

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.8203     0.3050   5.967  0.00942 **
x             4.2381     0.1438  29.472 8.58e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6821 on 3 degrees of freedom
Multiple R-Squared: 0.9966,     Adjusted R-squared: 0.9954
F-statistic: 868.6 on 1 and 3 DF,  p-value: 8.58e-05
The core idea is to introduce the model formula y~x as the argument to the
function. This model means that y is regressed on x. If no intercept is involved,
the model is modified as y~x-1. Introducing interactions in the regression can
be specified via the colon (:) symbol, following the syntax of McCullagh and
Nelder (1989) for generalized linear models.
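For illustration (with simulated toy data whose coefficients are arbitrary), a few common formula variants are:

> x1=rnorm(20); x2=rnorm(20); y=1+2*x1-x2+rnorm(20)   # toy data
> lm(y ~ x1 + x2)          # two covariates plus an intercept
> lm(y ~ x1 - 1)           # regression through the origin (no intercept)
> lm(y ~ x1 + x2 + x1:x2)  # main effects and their interaction (same as x1*x2)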
The function lm produces a list, and the estimates of the regression coefficients can be recovered as lm(y~x)$coeff. Surprisingly, the estimated
standard error (0.6821 above) is not an argument of this list and needs to be
computed by
> out=lm(y~x)
> sqrt(sum(out$res^2)/out$df)
[1] 0.6821
rather than via var(out$res), which uses the “wrong” number of degrees
of freedom. Note that the package arm (Gelman and Hill, 2006) provides a
cleaner output than summary via its display function.
An analysis of variance can be done by recycling output from lm, as in
this analysis on the impact of food type on chicken weights:
> summary(lm(weight ~ feed, data = chickwts))
Call:
lm(formula = weight ~ feed, data = chickwts)

Residuals:
     Min       1Q   Median       3Q      Max
-123.909  -34.413    1.571   38.170  103.091

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    323.583     15.834  20.436  < 2e-16 ***
feedhorsebean -163.383     23.485  -6.957 2.07e-09 ***
feedlinseed   -104.833     22.393  -4.682 1.49e-05 ***
feedmeatmeal   -46.674     22.896  -2.039 0.045567 *
feedsoybean    -77.155     21.578  -3.576 0.000665 ***
feedsunflower    5.333     22.393   0.238 0.812495
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 54.85 on 65 degrees of freedom
Multiple R-Squared: 0.5417,     Adjusted R-squared: 0.5064
F-statistic: 15.36 on 5 and 65 DF,  p-value: 5.936e-10

> anova(lm(weight ~ feed, data = chickwts))
Analysis of Variance Table

Response: weight
          Df Sum Sq Mean Sq F value    Pr(>F)
feed       5 231129   46226  15.365 5.936e-10 ***
Residuals 65 195556    3009
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
where the first command produces the regression coefficients for each type
of food, while the second command evaluates the relevance of the regression
model (and concludes positively). When using factor variables, more specific
analyses can be conducted by splitting the degrees of freedom in aov using
the option split.
Generalized linear models can be equally well-estimated thanks to the
polymorphic function glm. For instance, fitting a binomial generalized linear
model to the probability of suffering from diabetes for a woman within the
Pima Indian population is done by
> glm(formula = type ~ bmi + age, family = "binomial",
+     data = Pima.tr)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7935  -0.8368  -0.5033   1.0211   2.2531

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.49870    1.17459  -5.533 3.15e-08 ***
bmi          0.10519    0.02956   3.558 0.000373 ***
age          0.07104    0.01538   4.620 3.84e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 256.41  on 199  degrees of freedom
Residual deviance: 215.93  on 197  degrees of freedom
AIC: 221.93

Number of Fisher Scoring iterations: 4
concluding that both the body mass index bmi and the age are significant.
Other generalized linear models can be defined by using a different family
value whose range is provided by the function family. Note also that link
functions different from the canonical (and default) link functions (McCullagh
and Nelder, 1989) can be specified, as well as a scale factor, as in
> glm(y ~ x, family=quasi(var="mu^2", link="log"))
where the model corresponds to a quasi-likelihood with link equal to the log
function.
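As another illustration (simulated data, with arbitrary coefficient choices), a Poisson log-linear model is fitted with the poisson family and its canonical log link:

> x=rnorm(50)
> counts=rpois(50,exp(1+0.3*x))                 # simulated Poisson responses
> glm(counts ~ x, family=poisson(link="log"))   # Poisson regression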
Unidimensional and multidimensional time series (xₜ) can be handled
directly by the arima function, following a Box–Jenkins-like analysis,
> arima(diff(EuStockMarkets[,1]),order=c(0,0,5))

Call:
arima(x = diff(EuStockMarkets[, 1]), order = c(0, 0, 5))

Coefficients:
         ma1      ma2      ma3      ma4      ma5  intercept
      0.0054  -0.0130  -0.0110  -0.0041  -0.0486     2.0692
s.e.  0.0234   0.0233   0.0221   0.0236   0.0235     0.6990

sigma^2 estimated as 1053:  log likelihood = -9106.23,  aic = 18226.45
while more advanced models can be fitted thanks to the function StructTS.
Simpler time-series functions can also be used, such as
> acf(ldeaths, plot=F) #monthly deaths from bronchitis,
#emphysema and asthma in the UK, 1974-1979
Autocorrelations of series ‘ldeaths’, by lag
0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833
1.000 0.755 0.397 0.019 -0.356 -0.609 -0.681 -0.608
0.6667 0.7500 0.8333 0.9167 1.0000 1.0833 1.1667 1.2500
-0.378 -0.013 0.383 0.650 0.723 0.638 0.372 0.009
1.3333 1.4167 1.5000
-0.294 -0.497 -0.586
which is less straightforward to analyze than its graphical alternatives
Fig. 1.7. Autocorrelation and partial autocorrelation plots for the ldeaths series,
representing monthly deaths from bronchitis, emphysema, and asthma in the UK
over the period 1974–1979. The dashed horizontal lines correspond to the significance boundaries for the non-zero terms.
> acf(ldeaths)
> acf(ldeaths,type="partial")
represented in Figure 1.7 for both the standard and the partial autocorrelations. The standard autocorrelation exhibits quite clearly the seasonal pattern
of the deaths.
Lastly, we mention the bootstrap. This procedure has many uses (see Efron
and Tibshirani, 1993), but here we will mainly illustrate its use as a means of
attaching standard errors.
For readers unfamiliar with the notion of the bootstrap, we briefly recall
here that this statistical method is based upon the notion that the empirical
distribution of a sample X₁, . . . , Xₙ converges, as n grows, to the true distribution.
(The empirical distribution is a discrete distribution that puts a probability
1/n on every point Xᵢ of the sample and 0 everywhere else.) The bootstrap
procedure then uses the empirical distribution as a substitute for the true
distribution to construct variance estimates and confidence intervals. Given
that the empirical distribution has a finite but large support made of nⁿ
points, Monte Carlo approximations are almost always necessary to evaluate
those quantities, as illustrated in the next example.

Fig. 1.8. Histogram of 2500 bootstrap means of a sample of size 8 generated from
a gamma G(4, 1) distribution, along with the normal approximation.
For example, if we have a data vector y, we can create a bootstrap sample
y* using the code
> ystar=sample(y,replace=T)
Figure 1.8 shows a histogram of 2500 bootstrap means mean(ystar) based on
the sample
y = c(4.313, 4.513, 5.489, 4.265, 3.641, 5.106, 8.006, 5.087),
along with the normal approximation based on the original sample (i.e., using
the empirical mean and empirical variance of y). The sample was in fact drawn
from a gamma G(4, 1) distribution, and we can see on this graph that the
bootstrap approximation does capture some of the skewness in the distribution
of ȳ (indeed, the sample size n = 8 is rather small for the Central Limit
Theorem to apply). The estimated standard deviation of the sample mean, s/√n, is 0.4699, while the
standard deviation of the 2500 bootstrap means is 0.4368, showing reasonable agreement.
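A minimal sketch of how such a histogram could be reproduced (the number of replicates, 2500, matches the figure; the plotting choices are ours):

> y=c(4.313,4.513,5.489,4.265,3.641,5.106,8.006,5.087)
> ymeans=rep(0,2500)
> for (i in 1:2500) ymeans[i]=mean(sample(y,replace=TRUE))
> hist(ymeans,nclass=30,prob=TRUE,main="",xlab="bootstrap means")
> curve(dnorm(x,mean=mean(y),sd=sd(y)/sqrt(length(y))),add=TRUE)   # normal approximation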
One difficulty in implementing the bootstrap technique is that it is not
always clear which quantity should be bootstrapped. A treatment of this important topic is outside the scope of this book, and we caution the reader to
verify the proper bootstrapping technique in any particular application.
As an example, we apply the bootstrap technique to the simple linear
regression example seen previously in this chapter.
Example 1.1. Recall the linear regression
> x=seq(-3,3,le=5)        # equidispersed regressor
> y=2+4*x+rnorm(5)        # simulated dependent variable
> lm(y~x)
This corresponds to the regression model

Y_ij = α + β x_i + ε_ij,

where α and β are the unknown intercept and slope, and the ε_ij are the iid normal
errors. We fit the model with least squares and get α̂ = 1.820 and β̂ = 4.238.⁹
The residuals from the least squares fit are given by

ε̂_ij = y_ij − α̂ − β̂ x_i,

and these are the random variables making the sample that we bootstrap. That
is, we create bootstrap samples by resampling the ε̂_ij's, producing a new sample
(ε̂*_ij) by sampling with replacement from the ε̂_ij's. The bootstrap data are then
y*_ij = y_ij + ε̂*_ij. This can be implemented with the R code
> fit=lm(y~x)                  #fit the linear model
> Rdata=fit$residuals          #get the residuals
> nBoot=2000                   #number of bootstrap samples
> B=array(0,dim=c(nBoot,2))    #bootstrap array
> for(i in 1:nBoot){           #bootstrap loop
+   ystar=y+sample(Rdata,replace=T)
+   Bfit=lm(ystar~x)
+   B[i,]=Bfit$coefficients
+   }
The results of this bootstrap inference are summarized in Figure 1.9, where
we provide the histograms of the 2000 bootstrap replicates of both regression
coefficients. We can also derive from those replicates confidence intervals on
both coefficients (by taking, for example, the 5th and 95th percentiles for a 90% interval), as well as
confidence intervals on predicted values (i.e., for new values of x). For instance,
based on the bootstrap sample, the derived 90% confidence intervals are (in our
case) (2.350, 3.416) for the intercept α and (4.099, 4.592) for the slope β.
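A hedged sketch of how such intervals can be read off the bootstrap array B constructed above (the 90% level matches the intervals quoted in the text):

> quantile(B[,1],c(.05,.95))   # 90% bootstrap interval for the intercept
> quantile(B[,2],c(.05,.95))   # 90% bootstrap interval for the slope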
⁹ When running the same experiment, you will obviously get different numerical
values due to the use of a different random seed (see Section 2.1.1).
Fig. 1.9. Histogram of 2000 bootstrap intercepts (left) and slopes (right) for the
linear regression of Example 1.1. The least squares estimate of the intercept is 2.900,
and the slope estimate is 4.35.
Exercise 1.7 For the data associated with Figure 1.8:
a. Bootstrap the data and obtain a similar figure based on 1000 bootstrap replications. If the inference is about the 95% quantile of the distribution of ȳ, q.95(ȳ),
give a bootstrap estimate of this quantity, q̂.95(ȳ).
b. Construct a bootstrap experiment that provides a 95% confidence interval on
q̂.95 (ȳ). (Hint: You have to use two levels of bootstrapping to achieve this
goal.)
Exercise 1.8 For a dataset simulated as in Example 1.1, compare the bootstrap confidence intervals on both coefficients to the usual ones based on the
t-distribution. Comment on the differences.
1.6 Graphical facilities
Another clear advantage of using the R language is that it allows a very rich
range of graphical possibilities. Functions such as plot and image can be
customized to a large extent, as described in Venables and Ripley (1999) or
Murrell (2005) (the latter being entirely dedicated to the R graphic abilities).
Even though the default output of plot as for instance in
> plot(faithful)
is not the most enticing, plot is incredibly flexible: To see the number of
parameters involved, you can type par(), which delivers the default values of
all those parameters.
The wealth of graphical possibilities offered by R should be taken advantage of cautiously! That is, good design avoids clutter, small fonts,
unreadable scale, etc. The recommendations found in Tufte (1990, 2001)
are thus worth following to avoid horrid outputs like those often found
in some periodicals! In addition, graphs produced by R usually tend to
look nicer on the current device than when printed or included in a slide
presentation. Colors may change, font sizes may turn awkward, separate
curves may end up overlapping, and so on. In the early stages of working
with R, and even later, you should thus check that the different outputs
corresponding to the same graph are satisfactory before closing your R
session and losing hours of work!!!
Before covering the most standard graphic commands, we start by describing the notion of device that is at the core of those graphic commands. Each
graphical operation sends its outcome to a device, which can be a graphical
window (like the one that automatically appears when calling a graphical command for the first time as in the example above) or a file where the graphical
outcome is stored for printing or other uses. Under Unix and Linux OS, launching a new graphical window can be done via X11(), with many possibilities
for customization (such as size, positions, color, etc.). Once a graphical window is created, it is given a device number and can be managed by functions
that start with dev., such as dev.list, dev.set, and others. An important
command is dev.off, which closes the current graphical window. When the
device is a file, it is created by a function that is named after its driver. There
are therefore a postscript, a pdf, a jpeg, and a png function. The complete
list is given by capabilities(). When printing to a file, as in the following
example,
> jpeg(file="faith.jpg")
> par(mfrow=c(1,2),mar=c(4,2,2,1))
> hist(faithful[,1],nclass=21,col="grey",main="",
+ xlab=names(faithful)[1])
> hist(faithful[,2],nclass=21,col="wheat",main="",
+ xlab=names(faithful)[2])
> dev.off()
closing the sequence with dev.off() is recommended since it completes the
file, which is then saved. If the command jpeg(file="faith.jpg") is repeated, the earlier version of the jpeg file is erased.
Using a command-line interface for controlling graphics may seem antiquated, but this is the consequence of the R object-oriented philosophy. In
addition, current graphs can be saved to a postscript file using the dev.copy
and dev.print functions. Note that R-produced graphs tend to be large objects, in part because the graphs are not pictures of the current state but
instead preserve every action ever taken. For this reason, long series should
be thinned down to a few thousand points, images should work with a few
hundred pixels, contours should be preferred to images, and jpeg preferred to
pdf.
One of the most frustrating features of R is that the graphical device
is not refreshed while a program is executed in the main window. This
implies that, if you switch from one terminal to another or if the screen
saver starts, the whole or parts of the graph currently on the graphical
device will not be visible until the completion of the program. Conversely,
refreshing very large graphs will delay the activation of the prompt >.
As already stressed above, plot is a highly versatile tool that can be used
to represent functional curves and two-dimensional datasets. Colors (chosen
by colors() or colours() out of 650 hues), widths, and types can be calibrated at will, and LaTeX-like formulas can be included within the graphs
using expression; see the plotmath help page in grDevices for a detailed list of the mathematical symbols. Text and legends can be included at a specific point with
locator (see also identify) and legend. An example of (relatively simple)
output is
> plot(as.vector(time(mdeaths)),as.vector(mdeaths),cex=.6,
+ pch=19,xlab="",ylab="Monthly deaths from bronchitis")
> lines(spline(mdeaths),lwd=2,col="chocolate",lty=3)
> ar=arima(mdeaths,order=c(1,0,0))$coef
> lines(as.vector(time(mdeaths))[-1],ar[2]+ar[1]*
+ (mdeaths[-length(mdeaths)]-ar[2]),col="grey",lwd=2,lty=2)
> title("Splines versus AR(1) predictor")
> ari=arima(mdeaths,order=c(1,0,0),seasonal=list(order=c(1,
+ 0,0),period=12))$coef
> lines(as.vector(time(mdeaths))[-(1:13)],ari[3]+ari[1]*
+ (mdeaths[-c(1:12,72)]-ari[3])+ari[2]*(mdeaths[-(60:72)]-
+ ari[3]),lwd=2,col="steelblue",lty=2)
> title("\n\nand SAR(1,12) predictor")
> legend(1974,2800,legend=c("spline","AR(1)","SAR(1,12)"),
+ col=c("chocolate","grey","steelblue"),
+ lty=c(3,2,2),lwd=rep(2,3),cex=.5)
represented in Figure 1.10, which compares spline fitting to an AR(1) predictor and to an SAR(1,12) predictor. Note that the seasonal model is doing
worse.
Fig. 1.10. Monthly deaths from bronchitis in the UK over the period 1974–1980
and fits by a spline approximation and two AR predictors.
Another example illustrates the use of the command cumsum, which is
particularly handy when checking Monte Carlo convergence, as discussed in
the remark box of page 66.
> x=rnorm(1)
> for (t in 2:10^3)
+   x=c(x,.9*x[t-1]+rnorm(1))
> plot(x,type="l",xlab="time",ylab="x",lwd=2,lty=2,
+ col="steelblue",ylim=range(cumsum(x)))
> lines(cumsum(x),lwd=2,col="orange3")
This four-line program generates a simple AR(1) sequence and plots the original sequence (xₜ) along with the cumulated sum sequence x₁ + · · · + xₜ.
Note that, due to the high correlation factor (0.9), the cumulated sum
behaves much more like a random walk.
Fig. 1.11. Simulated AR(1) sequence (dotted) along with its corresponding cumulated sum.
Useful graphical functions include hist, for constructing and optionally
plotting histograms of datasets; points, for adding points on an existing
graph; lines, for linking points together on an existing graph, as in the above
example; polygon, for filling the area between two sets of points; barplot, for
creating barplots; and boxplot, for creating boxplots. The two-dimensional
representations offered by image and contour are quite handy when providing likelihood or posterior surfaces, as in Figures 3.5 and 5.2. An instance of
using polygon is provided by
> par(mar=c(2,2,2,2))
> x=matrix(0,ncol=100,nrow=10^4)
> for (t in 2:10^4)
+   x[t,]=x[t-1,]+rnorm(100)*10^(-2)
> plot(seq(0,1,le=10^4),x[,1],ty="n",
+   ylim=range(x),xlab="",ylab="")
> polygon(c(1:10^4,10^4:1)/10^4,c(apply(x,1,max),
+   rev(apply(x,1,min))),col="gold",bor=F)
> polygon(c(1:10^4,10^4:1)/10^4,c(apply(x,1,quantile,.95),
+   rev(apply(x,1,quantile,.05))),col="brown",bor=F)
which approximates the range of 100 Brownian motions, as well as a 90%
confidence band, represented in Figure 1.12 (see Kendall et al., 2007, and
Section 4.5).
The command points is used to add one or several points on a two-dimensional plot. It suffers from a drawback, however, in that the entry
is by default a time series. Therefore, calling points(x) when x is a
two-dimensional vector will plot both points (1, x₁) and (2, x₂) rather
than the single point (x₁, x₂). The result will be as expected if x is a
two-column matrix, resulting in the points (xᵢ₁, xᵢ₂) being plotted.
These comments are only here to provide an introduction to the capacities
of R. Specific references such as Murrell (2005) need to be consulted to get a
complete picture of those capacities!
1.7 Writing new R functions
One of the strengths of R is that new functions and libraries can be created
by anyone and then added to Web repositories to continuously enrich the
language. These new functions are not distinguishable from the core functions
of R, such as median or var, because those are also written in R. This means
their code can be accessed and potentially modified, although it is safer to
define new functions. (A few functions are written in C, however, for efficiency.)
Learning how to write functions designed for one’s own problems is paramount
for their resolution, even though the huge collection of available R functions
may often contain a function already written for that purpose.
Exercise 1.9 Among the R functions you have met so far, check which ones are
written in R by simply typing their name without parentheses, as in mean or var.
Fig. 1.12. Range of 100 simulated Brownian motions (lighter hue) and 90% confidence band (darker hue).
A function is defined in R by an assignment of the form
name=function(arg1[=expr1],arg2[=expr2],…) {
expression
…
expression
value
}
where expression denotes an R command that uses some of the arguments
arg1, arg2, … to calculate a value, value, that is the outcome of the
function. The braces indicate the beginning and the end of the function and
the brackets indicate possible default values for the arguments. Note that producing a value at the end of a function is essential because anything done
within a function is local and temporary, and is therefore lost once the function has been exited unless saved in value (hence, again, the appeal of list).
For instance, the following function, named sqrnt, implements a version of
Newton’s method for calculating the square root of y:
sqrnt=function(y){
x=y/2
while (abs(x*x-y) > 1e-10) x=(x+y/x)/2
x
}
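For instance, one gets (the value is approximate and shown to the default seven significant digits):

> sqrnt(10)
[1] 3.162278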
When designing a new R function, it is more convenient to use an external
text editor and to store the function under development in an external file,
say myfunction.R, which can be executed in R as source(“myfunction.R”).
Note also that some external commands can be launched within an R function
via the very handy command system. This is, for instance, the easiest (if not
the most efficient) way to incorporate programs written in other languages
(e.g., Fortran, C, Matlab) within R programs.
The fact that R is an interpreted language obviously helps in debugging
programs. Indeed, whenever a program is interrupted or aborted, the variables
that have been previously defined keep their latest value and can thus be
used to assess the cause of the error. This is not always sufficient though,
in particular because variables defined within functions are not stored, and a
useful tool is the pause command browser, which temporarily stops a program
to allow the programmer to check the values of all variables at a given point.
Further debugging commands are debug and trace.
Using an external text editor and an external program file is important
for two reasons. First, using the line editor of R is inefficient and close to
impossible for complex problems, even though functions can be internally
edited by the Unix vi editor as myfun=vi(myfun). Cut-and-paste is just
much easier. Second, this forces you to save your functions rather than
relying on .RData and .Rhistory, which are not 100% secure.
The expressions used in a function rely on a syntax that is quite similar to
those of other programming languages, with conditional statements such as
if (expres1) expres2 else expres3
where expres1 is a logical value, and loops such as
for (name in expres1) expres2
and
while (expres4) expres2
where expres1 is a collection of values, as illustrated in Figure 1.13, and
expres4 is a Boolean expression. In particular, Boolean operators can be
used within those expressions, including == for testing equality, != for testing
inequality, & for the logical and, | for the logical or, and ! for the logical
contradiction.
> bool=T;i=0                              separate commands by semicolons
> while(bool==T) {i=i+1; bool=(i<10)}     stop at i = 10
> s=0;x=rnorm(10000)
> system.time(for (i in 1:length(x)){     output sum(x) and
+   s=s+x[i]})[3]                         provide computing time
> system.time(t(rep(1,10000))%*%x)[3]     compare with vector product
> system.time(sum(x))[3]                  compare with sum efficiency

Fig. 1.13. Some artificial loops in R.
Exercise 1.10 Explain the difference between the logical operators & and &&,
|, ||, and xor.
The operator if (and the associated operator ifelse) is one of the
rare occurrences where R does not operate on whole vectors. For vector-valued
tests, logical vectors like (abs(x)>1.96) can be used as indices of the output
vector, as in the allocation commands
> y[(abs(x)<1.96)]=x[(abs(x)<1.96)]
> y[(abs(x)>1.96)]=x[(x>1.96)]
Since R is an interpreted language, avoiding loops by vectorial programming is generally a good idea, but this may render programs much harder to
read. It is therefore extremely useful to include comments within the programs
by using the symbol #.
As noted previously, R is fundamentally slower than other languages.
Checking both the speed of a program and the reasons for its poor speed
can be done using the system.time command or the more advanced profiling
commands Rprof and Rprofmem described in the manual. There are, however, ways of speeding up the execution of your programs. First, using faster
functions (for example, those already programmed in C; see below) obviously
brings improvement. Second, preallocating memory as in x=double(10^5)
also increases speed. Third (and this is getting way beyond the scope of this
introduction!), it is possible to recompile parts of the R package with libraries
that are designed for your machine. An example is the Blas (basic linear algebra subprogram), which can be optimized using the free library Atlas (and
lead to improvements by factors from two to five). Details can be found in the
R administration manual. Fourth, and this again requires some programming
expertise, you can take advantage of multiple processors, using for instance
netWorkSpace (NWS), Rmpi, or snow, developed by Luke Tierney.
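A small illustration of the second point (the vector length 10^5 is arbitrary): growing a vector inside a loop forces repeated copying, while preallocating it with double does not.

> system.time({y=NULL; for (i in 1:10^5) y=c(y,rnorm(1))})        # growing: slow
> system.time({y=double(10^5); for (i in 1:10^5) y[i]=rnorm(1)})  # preallocated: faster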
While we cannot delve much here into the issue of interfacing R with other
languages, we do nonetheless stress that this is an important feature of R that
you should investigate, simply because there are problems R is just too slow to
handle! Using some routines written in C or Fortran then becomes paramount,
without losing the main advantages of R altogether. The easiest way to connect R with external subroutines such as the C executable mycprog.o is to
design the corresponding C program to take its input from a file such as
mycinput and write its output in another file such as mycoutput. In this case,
calling
> system("mycprog.o")
within the R program will be sufficient. Obviously, this is a rudimentary type
of interfacing, and it suffers from two drawbacks: first, repeated access to files is
time-consuming; second, the C program cannot call R functions this way. A more advanced approach is
based on the function .C, which can call C functions with arguments, and the
C subroutine call_R, as described for instance in Crawley (2007). The main
difficulty with these more advanced techniques is to ensure the compatibility
between data types. Section 8.5.2 provides an illustration of a C program being
called by an R program in an efficient manner.
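A minimal sketch of the rudimentary file-based interfacing just described (the program and file names mycprog.o, mycinput, and mycoutput are the examples used in the text; the numeric input is hypothetical):

> x=rnorm(10)                   # whatever input the C program expects
> write(x,file="mycinput")      # export the input for the external program
> system("./mycprog.o")         # the executable reads mycinput and writes mycoutput
> out=scan("mycoutput")         # read the results back into R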
1.8 Input and output in R
Large data objects need to be read as values from ex…