Statistics 3021 Homework 6


Due: Friday, April 14

Chapter 6.1-6.4


6.1 (page 185) (Hint: use the definitions of E(X) and V(X).)

6.2 (page 185)

6.4 (page 185) (a), (b)

Additional question (c): What is the probability that the individual waits more than 7 minutes today and less than 7 minutes tomorrow? Assume that the waiting times on different days are independent.
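Because the two days are independent, the probability in part (c) is the product of the two single-day probabilities. The sketch below illustrates this under an assumption that is not restated in this handout: that the waiting time is uniform on the interval from 0 to 10 minutes. The endpoints `A` and `B` and the helper name `uniform_cdf` are illustrative only; substitute the distribution given in exercise 6.4.

```python
def uniform_cdf(x: float, a: float, b: float) -> float:
    """Return P(X <= x) for X uniformly distributed on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Assumed interval for the waiting time, in minutes (not given in this handout).
A, B = 0.0, 10.0

p_less_than_7 = uniform_cdf(7.0, A, B)   # P(wait < 7) on a single day
p_more_than_7 = 1.0 - p_less_than_7      # P(wait > 7) on a single day

# Independence across days: the joint probability is the product.
p_joint = p_more_than_7 * p_less_than_7

print(f"P(>7 today and <7 tomorrow) = {p_joint:.2f}")
```

For a continuous distribution, P(X = 7) = 0, so it does not matter whether the inequalities are strict.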


Problem 1. A certain type of storage battery lasts, on average, 3.0 years with a standard deviation of 0.5 year. Assuming that battery life is normally distributed, find the following probabilities using the 68-95-99.7 rule. Draw a normal curve to identify the corresponding areas.

(a) A randomly selected battery will last more than 2 years but less than 2.5 years.

(b) A randomly selected battery will last less than 3.5 years.
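The 68-95-99.7 rule gives approximate areas within 1, 2, and 3 standard deviations of the mean, so any probability whose endpoints fall on whole-number z-scores can be built from those three values. The sketch below is one way to do that and to compare the result with the exact normal area; the function names `empirical_cdf` and `exact_cdf` are hypothetical helpers, not part of the assignment.

```python
import math

# Approximate area within k standard deviations of the mean (68-95-99.7 rule).
WITHIN = {0: 0.0, 1: 0.68, 2: 0.95, 3: 0.997}


def empirical_cdf(z: int) -> float:
    """Approximate P(Z <= z) for an integer z in [-3, 3] using the rule."""
    half = WITHIN[abs(z)] / 2  # area between the mean and z
    return 0.5 + half if z >= 0 else 0.5 - half


def exact_cdf(z: float) -> float:
    """Exact standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mu, sigma = 3.0, 0.5  # mean and standard deviation from Problem 1


def z_score(x: float) -> int:
    """Standardize x; rounded because the rule only covers whole z-scores."""
    return round((x - mu) / sigma)


# (a) P(2 < X < 2.5): z runs from -2 to -1.
p_a = empirical_cdf(z_score(2.5)) - empirical_cdf(z_score(2.0))
# (b) P(X < 3.5): z = 1.
p_b = empirical_cdf(z_score(3.5))

print(f"(a) rule: {p_a:.3f}  exact: {exact_cdf(-1) - exact_cdf(-2):.4f}")
print(f"(b) rule: {p_b:.3f}  exact: {exact_cdf(1):.4f}")
```

The rule-based values differ slightly from the exact areas because 68, 95, and 99.7 are rounded percentages.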


6.5 (page 186) Additional questions: Draw a normal curve and identify/shade the corresponding areas for each of (a) through (f).

6.6 (page 186) Additional questions: Draw a normal curve and identify the given areas. Locate the ‘z’ for each of (a) through (d).

6.9 (page 186)

6.15 (page 186)

6.23


Probability & Statistics

for Engineers & Scientists


NINTH EDITION

Ronald E. Walpole

Roanoke College

Raymond H. Myers

Virginia Tech

Sharon L. Myers

Radford University

Keying Ye

University of Texas at San Antonio

Prentice Hall

Editor in Chief: Deirdre Lynch

Acquisitions Editor: Christopher Cummings

Executive Content Editor: Christine O’Brien

Associate Editor: Christina Lepre

Senior Managing Editor: Karen Wernholm

Senior Production Project Manager: Tracy Patruno

Design Manager: Andrea Nix

Cover Designer: Heather Scott

Digital Assets Manager: Marianne Groth

Associate Media Producer: Vicki Dreyfus

Marketing Manager: Alex Gay

Marketing Assistant: Kathleen DeChavez

Senior Author Support/Technology Specialist: Joe Vetere

Rights and Permissions Advisor: Michael Joyce

Senior Manufacturing Buyer: Carol Melville

Production Coordination: Lifland et al. Bookmakers

Composition: Keying Ye

Cover photo: Marjory Dressler/Dressler Photo-Graphics

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Pearson was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Library of Congress Cataloging-in-Publication Data

Probability & statistics for engineers & scientists/Ronald E. Walpole . . . [et al.] — 9th ed.

p. cm.

ISBN 978-0-321-62911-1

1. Engineering—Statistical methods. 2. Probabilities. I. Walpole, Ronald E.

TA340.P738 2011

519.02’462–dc22

2010004857

Copyright © 2012, 2007, 2002 Pearson Education, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America. For information on obtaining permission for use of material in this work, please submit a written request to Pearson Education, Inc., Rights and Contracts Department, 501 Boylston Street, Suite 900, Boston, MA 02116, fax your request to 617-671-3447, or e-mail at http://www.pearsoned.com/legal/permissions.htm.

1 2 3 4 5 6 7 8 9 10—EB—14 13 12 11 10

ISBN 10: 0-321-62911-6

ISBN 13: 978-0-321-62911-1

This book is dedicated to

Billy and Julie

R.H.M. and S.L.M.

Limin, Carolyn and Emily

K.Y.


Contents

Preface

1 Introduction to Statistics and Data Analysis
  1.1 Overview: Statistical Inference, Samples, Populations, and the Role of Probability
  1.2 Sampling Procedures; Collection of Data
  1.3 Measures of Location: The Sample Mean and Median
  Exercises
  1.4 Measures of Variability
  Exercises
  1.5 Discrete and Continuous Data
  1.6 Statistical Modeling, Scientific Inspection, and Graphical Diagnostics
  1.7 General Types of Statistical Studies: Designed Experiment, Observational Study, and Retrospective Study
  Exercises

2 Probability
  2.1 Sample Space
  2.2 Events
  Exercises
  2.3 Counting Sample Points
  Exercises
  2.4 Probability of an Event
  2.5 Additive Rules
  Exercises
  2.6 Conditional Probability, Independence, and the Product Rule
  Exercises
  2.7 Bayes’ Rule
  Exercises
  Review Exercises
  2.8 Potential Misconceptions and Hazards; Relationship to Material in Other Chapters

3 Random Variables and Probability Distributions
  3.1 Concept of a Random Variable
  3.2 Discrete Probability Distributions
  3.3 Continuous Probability Distributions
  Exercises
  3.4 Joint Probability Distributions
  Exercises
  Review Exercises
  3.5 Potential Misconceptions and Hazards; Relationship to Material in Other Chapters

4 Mathematical Expectation
  4.1 Mean of a Random Variable
  Exercises
  4.2 Variance and Covariance of Random Variables
  Exercises
  4.3 Means and Variances of Linear Combinations of Random Variables
  4.4 Chebyshev’s Theorem
  Exercises
  Review Exercises
  4.5 Potential Misconceptions and Hazards; Relationship to Material in Other Chapters

5 Some Discrete Probability Distributions
  5.1 Introduction and Motivation
  5.2 Binomial and Multinomial Distributions
  Exercises
  5.3 Hypergeometric Distribution
  Exercises
  5.4 Negative Binomial and Geometric Distributions
  5.5 Poisson Distribution and the Poisson Process
  Exercises
  Review Exercises
  5.6 Potential Misconceptions and Hazards; Relationship to Material in Other Chapters

6 Some Continuous Probability Distributions
  6.1 Continuous Uniform Distribution
  6.2 Normal Distribution
  6.3 Areas under the Normal Curve
  6.4 Applications of the Normal Distribution
  Exercises
  6.5 Normal Approximation to the Binomial
  Exercises
  6.6 Gamma and Exponential Distributions
  6.7 Chi-Squared Distribution
  6.8 Beta Distribution
  6.9 Lognormal Distribution
  6.10 Weibull Distribution (Optional)
  Exercises
  Review Exercises
  6.11 Potential Misconceptions and Hazards; Relationship to Material in Other Chapters

7 Functions of Random Variables (Optional)
  7.1 Introduction
  7.2 Transformations of Variables
  7.3 Moments and Moment-Generating Functions
  Exercises

8 Fundamental Sampling Distributions and Data Descriptions
  8.1 Random Sampling
  8.2 Some Important Statistics
  Exercises
  8.3 Sampling Distributions
  8.4 Sampling Distribution of Means and the Central Limit Theorem
  Exercises
  8.5 Sampling Distribution of S²
  8.6 t-Distribution
  8.7 F-Distribution
  8.8 Quantile and Probability Plots
  Exercises
  Review Exercises
  8.9 Potential Misconceptions and Hazards; Relationship to Material in Other Chapters

9 One- and Two-Sample Estimation Problems
  9.1 Introduction
  9.2 Statistical Inference
  9.3 Classical Methods of Estimation
  9.4 Single Sample: Estimating the Mean
  9.5 Standard Error of a Point Estimate
  9.6 Prediction Intervals
  9.7 Tolerance Limits
  Exercises
  9.8 Two Samples: Estimating the Difference between Two Means
  9.9 Paired Observations
  Exercises
  9.10 Single Sample: Estimating a Proportion
  9.11 Two Samples: Estimating the Difference between Two Proportions
  Exercises
  9.12 Single Sample: Estimating the Variance
  9.13 Two Samples: Estimating the Ratio of Two Variances
  Exercises
  9.14 Maximum Likelihood Estimation (Optional)
  Exercises
  Review Exercises
  9.15 Potential Misconceptions and Hazards; Relationship to Material in Other Chapters

10 One- and Two-Sample Tests of Hypotheses
  10.1 Statistical Hypotheses: General Concepts
  10.2 Testing a Statistical Hypothesis
  10.3 The Use of P-Values for Decision Making in Testing Hypotheses
  Exercises
  10.4 Single Sample: Tests Concerning a Single Mean
  10.5 Two Samples: Tests on Two Means
  10.6 Choice of Sample Size for Testing Means
  10.7 Graphical Methods for Comparing Means
  Exercises
  10.8 One Sample: Test on a Single Proportion
  10.9 Two Samples: Tests on Two Proportions
  Exercises
  10.10 One- and Two-Sample Tests Concerning Variances
  Exercises
  10.11 Goodness-of-Fit Test
  10.12 Test for Independence (Categorical Data)
  10.13 Test for Homogeneity
  10.14 Two-Sample Case Study
  Exercises
  Review Exercises
  10.15 Potential Misconceptions and Hazards; Relationship to Material in Other Chapters

11 Simple Linear Regression and Correlation
  11.1 Introduction to Linear Regression
  11.2 The Simple Linear Regression Model
  11.3 Least Squares and the Fitted Model
  Exercises
  11.4 Properties of the Least Squares Estimators
  11.5 Inferences Concerning the Regression Coefficients
  11.6 Prediction
  Exercises
  11.7 Choice of a Regression Model
  11.8 Analysis-of-Variance Approach
  11.9 Test for Linearity of Regression: Data with Repeated Observations
  Exercises
  11.10 Data Plots and Transformations
  11.11 Simple Linear Regression Case Study
  11.12 Correlation
  Exercises
  Review Exercises
  11.13 Potential Misconceptions and Hazards; Relationship to Material in Other Chapters

12 Multiple Linear Regression and Certain Nonlinear Regression Models
  12.1 Introduction
  12.2 Estimating the Coefficients
  12.3 Linear Regression Model Using Matrices
  Exercises
  12.4 Properties of the Least Squares Estimators
  12.5 Inferences in Multiple Linear Regression
  Exercises
  12.6 Choice of a Fitted Model through Hypothesis Testing
  12.7 Special Case of Orthogonality (Optional)
  Exercises
  12.8 Categorical or Indicator Variables
  Exercises
  12.9 Sequential Methods for Model Selection
  12.10 Study of Residuals and Violation of Assumptions (Model Checking)
  12.11 Cross Validation, C_p, and Other Criteria for Model Selection
  Exercises
  12.12 Special Nonlinear Models for Nonideal Conditions
  Exercises
  Review Exercises
  12.13 Potential Misconceptions and Hazards; Relationship to Material in Other Chapters

13 One-Factor Experiments: General
  13.1 Analysis-of-Variance Technique
  13.2 The Strategy of Experimental Design
  13.3 One-Way Analysis of Variance: Completely Randomized Design (One-Way ANOVA)
  13.4 Tests for the Equality of Several Variances
  Exercises
  13.5 Single-Degree-of-Freedom Comparisons
  13.6 Multiple Comparisons
  Exercises
  13.7 Comparing a Set of Treatments in Blocks
  13.8 Randomized Complete Block Designs
  13.9 Graphical Methods and Model Checking
  13.10 Data Transformations in Analysis of Variance
  Exercises
  13.11 Random Effects Models
  13.12 Case Study
  Exercises
  Review Exercises
  13.13 Potential Misconceptions and Hazards; Relationship to Material in Other Chapters

14 Factorial Experiments (Two or More Factors)
  14.1 Introduction
  14.2 Interaction in the Two-Factor Experiment
  14.3 Two-Factor Analysis of Variance
  Exercises
  14.4 Three-Factor Experiments
  Exercises
  14.5 Factorial Experiments for Random Effects and Mixed Models
  Exercises
  Review Exercises
  14.6 Potential Misconceptions and Hazards; Relationship to Material in Other Chapters

15 2^k Factorial Experiments and Fractions
  15.1 Introduction
  15.2 The 2^k Factorial: Calculation of Effects and Analysis of Variance
  15.3 Nonreplicated 2^k Factorial Experiment
  Exercises
  15.4 Factorial Experiments in a Regression Setting
  15.5 The Orthogonal Design
  Exercises
  15.6 Fractional Factorial Experiments
  15.7 Analysis of Fractional Factorial Experiments
  Exercises
  15.8 Higher Fractions and Screening Designs
  15.9 Construction of Resolution III and IV Designs with 8, 16, and 32 Design Points
  15.10 Other Two-Level Resolution III Designs; The Plackett-Burman Designs
  15.11 Introduction to Response Surface Methodology
  15.12 Robust Parameter Design
  Exercises
  Review Exercises
  15.13 Potential Misconceptions and Hazards; Relationship to Material in Other Chapters

16 Nonparametric Statistics
  16.1 Nonparametric Tests
  16.2 Signed-Rank Test
  Exercises
  16.3 Wilcoxon Rank-Sum Test
  16.4 Kruskal-Wallis Test
  Exercises
  16.5 Runs Test
  16.6 Tolerance Limits
  16.7 Rank Correlation Coefficient
  Exercises
  Review Exercises

17 Statistical Quality Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681

17.1

17.2

17.3

17.4

17.5

17.6

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Nature of the Control Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Purposes of the Control Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Control Charts for Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Control Charts for Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Cusum Control Charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Review Exercises. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

681

683

683

684

697

705

706

18 Bayesian Statistics . . . . . 709
    18.1 Bayesian Concepts . . . . . 709
    18.2 Bayesian Inferences . . . . . 710
    18.3 Bayes Estimates Using Decision Theory Framework . . . . . 717
    Exercises . . . . . 718

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721

Appendix A: Statistical Tables and Proofs . . . . . . . . . . . . . . . . . . 725

Appendix B: Answers to Odd-Numbered Non-Review Exercises . . . . . 769

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 785

Preface

General Approach and Mathematical Level

Our emphasis in creating the ninth edition is less on adding new material and more

on providing clarity and deeper understanding. This objective was accomplished in

part by including new end-of-chapter material that adds connective tissue between

chapters. We aﬀectionately call these comments at the end of the chapter “Pot

Holes.” They are very useful to remind students of the big picture and how each

chapter ﬁts into that picture, and they aid the student in learning about limitations

and pitfalls that may result if procedures are misused. A deeper understanding

of real-world use of statistics is made available through class projects, which were

added in several chapters. These projects provide the opportunity for students

alone, or in groups, to gather their own experimental data and draw inferences. In

some cases, the work involves a problem whose solution will illustrate the meaning

of a concept or provide an empirical understanding of an important statistical

result. Some existing examples were expanded and new ones were introduced to

create “case studies,” in which commentary is provided to give the student a clear

understanding of a statistical concept in the context of a practical situation.

In this edition, we continue to emphasize a balance between theory and applications. Calculus and other types of mathematical support (e.g., linear algebra)

are used at about the same level as in previous editions. The coverage of analytical tools in statistics is enhanced with the use of calculus when discussion

centers on rules and concepts in probability. Probability distributions and statistical inference are highlighted in Chapters 2 through 10. Linear algebra and

matrices are very lightly applied in Chapters 11 through 15, where linear regression and analysis of variance are covered. Students using this text should have

had the equivalent of one semester of diﬀerential and integral calculus. Linear

algebra is helpful but not necessary so long as the section in Chapter 12 on multiple linear regression using matrix algebra is not covered by the instructor. As

in previous editions, a large number of exercises that deal with real-life scientiﬁc

and engineering applications are available to challenge the student. The many

data sets associated with the exercises are available for download from the website

http://www.pearsonhighered.com/datasets.

Summary of the Changes in the Ninth Edition

• Class projects were added in several chapters to provide a deeper understanding of the real-world use of statistics. Students are asked to produce or gather

their own experimental data and draw inferences from these data.

• More case studies were added and others expanded to help students understand the statistical methods being presented in the context of a real-life situation. For example, the interpretation of conﬁdence limits, prediction limits,

and tolerance limits is given using a real-life situation.

• “Pot Holes” were added at the end of some chapters and expanded in others.

These comments are intended to present each chapter in the context of the

big picture and discuss how the chapters relate to one another. They also

provide cautions about the possible misuse of statistical techniques presented

in the chapter.

• Chapter 1 has been enhanced to include more on single-number statistics as

well as graphical techniques. New fundamental material on sampling and

experimental design is presented.

• Examples added to Chapter 8 on sampling distributions are intended to motivate P-values and hypothesis testing. This prepares the student for the more

challenging material on these topics that will be presented in Chapter 10.

• Chapter 12 contains additional development regarding the eﬀect of a single

regression variable in a model in which collinearity with other variables is

severe.

• Chapter 15 now introduces material on the important topic of response surface

methodology (RSM). The use of noise variables in RSM allows the illustration

of mean and variance (dual response surface) modeling.

• The central composite design (CCD) is introduced in Chapter 15.

• More examples are given in Chapter 18, and the discussion of using Bayesian

methods for statistical decision making has been enhanced.

Content and Course Planning

This text is designed for either a one- or a two-semester course. A reasonable

plan for a one-semester course might include Chapters 1 through 10. This would

result in a curriculum that concluded with the fundamentals of both estimation

and hypothesis testing. Instructors who desire that students be exposed to simple

linear regression may wish to include a portion of Chapter 11. For instructors

who desire to have analysis of variance included rather than regression, the one-semester course may include Chapter 13 rather than Chapters 11 and 12. Chapter

13 features one-factor analysis of variance. Another option is to eliminate portions

of Chapters 5 and/or 6 as well as Chapter 7. With this option, one or more of

the discrete or continuous distributions in Chapters 5 and 6 may be eliminated.

These distributions include the negative binomial, geometric, gamma, Weibull,

beta, and log normal distributions. Other features that one might consider removing from a one-semester curriculum include maximum likelihood estimation,

prediction, and/or tolerance limits in Chapter 9. A one-semester curriculum has

built-in ﬂexibility, depending on the relative interest of the instructor in regression,

analysis of variance, experimental design, and response surface methods (Chapter

15). There are several discrete and continuous distributions (Chapters 5 and 6)

that have applications in a variety of engineering and scientiﬁc areas.

Chapters 11 through 18 contain substantial material that can be added for the

second semester of a two-semester course. The material on simple and multiple

linear regression is in Chapters 11 and 12, respectively. Chapter 12 alone oﬀers a

substantial amount of ﬂexibility. Multiple linear regression includes such “special

topics” as categorical or indicator variables, sequential methods of model selection

such as stepwise regression, the study of residuals for the detection of violations

of assumptions, cross validation and the use of the PRESS statistic as well as

$C_p$, and logistic regression. The use of orthogonal regressors, a precursor to the

experimental design in Chapter 15, is highlighted. Chapters 13 and 14 oﬀer a

relatively large amount of material on analysis of variance (ANOVA) with ﬁxed,

random, and mixed models. Chapter 15 highlights the application of two-level

designs in the context of full and fractional factorial experiments ($2^k$). Special

screening designs are illustrated. Chapter 15 also features a new section on response

surface methodology (RSM) to illustrate the use of experimental design for ﬁnding

optimal process conditions. The ﬁtting of a second order model through the use of

a central composite design is discussed. RSM is expanded to cover the analysis of

robust parameter design type problems. Noise variables are used to accommodate

dual response surface models. Chapters 16, 17, and 18 contain a moderate amount

of material on nonparametric statistics, quality control, and Bayesian inference.

Chapter 1 is an overview of statistical inference presented on a mathematically

simple level. It has been expanded from the eighth edition to more thoroughly

cover single-number statistics and graphical techniques. It is designed to give

students a preliminary presentation of elementary concepts that will allow them to

understand more involved details that follow. Elementary concepts in sampling,

data collection, and experimental design are presented, and rudimentary aspects

of graphical tools are introduced, as well as a sense of what is garnered from a

data set. Stem-and-leaf plots and box-and-whisker plots have been added. Graphs

are better organized and labeled. The discussion of uncertainty and variation in

a system is thorough and well illustrated. There are examples of how to sort

out the important characteristics of a scientiﬁc process or system, and these ideas

are illustrated in practical settings such as manufacturing processes, biomedical

studies, and studies of biological and other scientiﬁc systems. A contrast is made

between the use of discrete and continuous data. Emphasis is placed on the use

of models and the information concerning statistical models that can be obtained

from graphical tools.

Chapters 2, 3, and 4 deal with basic probability as well as discrete and continuous random variables. Chapters 5 and 6 focus on speciﬁc discrete and continuous

distributions as well as relationships among them. These chapters also highlight

examples of applications of the distributions in real-life scientiﬁc and engineering

studies. Examples, case studies, and a large number of exercises edify the student

concerning the use of these distributions. Projects bring the practical use of these

distributions to life through group work. Chapter 7 is the most theoretical chapter

in the text. It deals with transformation of random variables and will likely not be

used unless the instructor wishes to teach a relatively theoretical course. Chapter

8 contains graphical material, expanding on the more elementary set of graphical tools presented and illustrated in Chapter 1. Probability plotting is discussed

and illustrated with examples. The very important concept of sampling distributions is presented thoroughly, and illustrations are given that involve the central

limit theorem and the distribution of a sample variance under normal, independent

(i.i.d.) sampling. The t and F distributions are introduced to motivate their use

in chapters to follow. New material in Chapter 8 helps the student to visualize the

importance of hypothesis testing, motivating the concept of a P-value.

Chapter 9 contains material on one- and two-sample point and interval estimation. A thorough discussion with examples points out the contrast between the

diﬀerent types of intervals—conﬁdence intervals, prediction intervals, and tolerance intervals. A case study illustrates the three types of statistical intervals in the

context of a manufacturing situation. This case study highlights the diﬀerences

among the intervals, their sources, and the assumptions made in their development, as well as what type of scientiﬁc study or question requires the use of each

one. A new approximation method has been added for the inference concerning a

proportion. Chapter 10 begins with a basic presentation on the pragmatic meaning of hypothesis testing, with emphasis on such fundamental concepts as null and

alternative hypotheses, the role of probability and the P-value, and the power of

a test. Following this, illustrations are given of tests concerning one and two samples under standard conditions. The two-sample t-test with paired observations

is also described. A case study helps the student to develop a clear picture of

what interaction among factors really means as well as the dangers that can arise

when interaction between treatments and experimental units exists. At the end of

Chapter 10 is a very important section that relates Chapters 9 and 10 (estimation

and hypothesis testing) to Chapters 11 through 16, where statistical modeling is

prominent. It is important that the student be aware of the strong connection.

Chapters 11 and 12 contain material on simple and multiple linear regression,

respectively. Considerably more attention is given in this edition to the eﬀect that

collinearity among the regression variables plays. A situation is presented that

shows how the role of a single regression variable can depend in large part on what

regressors are in the model with it. The sequential model selection procedures (forward, backward, stepwise, etc.) are then revisited in regard to this concept, and

the rationale for using certain P-values with these procedures is provided. Chapter 12 offers material on nonlinear modeling with a special presentation of logistic

regression, which has applications in engineering and the biological sciences. The

material on multiple regression is quite extensive and thus provides considerable

ﬂexibility for the instructor, as indicated earlier. At the end of Chapter 12 is commentary relating that chapter to Chapters 14 and 15. Several features were added

that provide a better understanding of the material in general. For example, the

end-of-chapter material deals with cautions and diﬃculties one might encounter.

It is pointed out that there are types of responses that occur naturally in practice

(e.g., proportion responses, count responses, and several others) with which standard least squares regression should not be used because standard assumptions do

not hold and violation of assumptions may induce serious errors. The suggestion is

made that data transformation on the response may alleviate the problem in some

cases. Flexibility is again available in Chapters 13 and 14, on the topic of analysis

of variance. Chapter 13 covers one-factor ANOVA in the context of a completely

randomized design. Complementary topics include tests on variances and multiple

comparisons. Comparisons of treatments in blocks are highlighted, along with the

topic of randomized complete blocks. Graphical methods are extended to ANOVA

to aid the student in supplementing the formal inference with a pictorial type of inference that can aid scientists and engineers in presenting material. A new project

is given in which students incorporate the appropriate randomization into each

plan and use graphical techniques and P-values in reporting the results. Chapter

14 extends the material in Chapter 13 to accommodate two or more factors that

are in a factorial structure. The ANOVA presentation in Chapter 14 includes work

in both random and ﬁxed eﬀects models. Chapter 15 oﬀers material associated

with $2^k$ factorial designs; examples and case studies present the use of screening

designs and special higher fractions of the $2^k$. Two new and special features are

the presentations of response surface methodology (RSM) and robust parameter

design. These topics are linked in a case study that describes and illustrates a

dual response surface design and analysis featuring the use of process mean and

variance response surfaces.

Computer Software

Case studies, beginning in Chapter 8, feature computer printout and graphical

material generated using both SAS and MINITAB. The inclusion of the computer

reﬂects our belief that students should have the experience of reading and interpreting computer printout and graphics, even if the software in the text is not that

which is used by the instructor. Exposure to more than one type of software can

broaden the experience base for the student. There is no reason to believe that

the software used in the course will be that which the student will be called upon

to use in practice following graduation. Examples and case studies in the text are

supplemented, where appropriate, by various types of residual plots, quantile plots,

normal probability plots, and other plots. Such plots are particularly prevalent in

Chapters 11 through 15.

Supplements

Instructor’s Solutions Manual. This resource contains worked-out solutions to all

text exercises and is available for download from Pearson Education’s Instructor

Resource Center.

Student Solutions Manual ISBN-10: 0-321-64013-6; ISBN-13: 978-0-321-64013-0.

Featuring complete solutions to selected exercises, this is a great tool for students

as they study and work through the problem material.

PowerPoint® Lecture Slides ISBN-10: 0-321-73731-8; ISBN-13: 978-0-321-73731-1. These slides include most of the figures and tables from the text. Slides are

available to download from Pearson Education’s Instructor Resource Center.

StatCrunch eText. This interactive, online textbook includes StatCrunch, a powerful, web-based statistical software package. Embedded StatCrunch buttons allow users

to open all data sets and tables from the book with the click of a button and

immediately perform an analysis using StatCrunch.

StatCrunch™. StatCrunch is web-based statistical software that allows users to

perform complex analyses, share data sets, and generate compelling reports of

their data. Users can upload their own data to StatCrunch or search the library

of over twelve thousand publicly shared data sets, covering almost any topic of

interest. Interactive graphical outputs help users understand statistical concepts

and are available for export to enrich reports with visual representations of data.

Additional features include

• A full range of numerical and graphical methods that allow users to analyze

and gain insights from any data set.

• Reporting options that help users create a wide variety of visually appealing

representations of their data.

• An online survey tool that allows users to quickly build and administer surveys

via a web form.

StatCrunch is available to qualiﬁed adopters. For more information, visit our

website at www.statcrunch.com or contact your Pearson representative.

Acknowledgments

We are indebted to those colleagues who reviewed the previous editions of this book

and provided many helpful suggestions for this edition. They are David Groggel,

Miami University; Lance Hemlow, Raritan Valley Community College; Ying Ji,

University of Texas at San Antonio; Thomas Kline, University of Northern Iowa;

Sheila Lawrence, Rutgers University; Luis Moreno, Broome County Community

College; Donald Waldman, University of Colorado—Boulder; and Marlene Will,

Spalding University. We would also like to thank Delray Schulz, Millersville University; Roxane Burrows, Hocking College; and Frank Chmely for ensuring the

accuracy of this text.

We are grateful for the editorial and production services provided by numerous people from Pearson/Prentice Hall, especially the editor in chief Deirdre

Lynch, acquisitions editor Christopher Cummings, executive content editor Christine O’Brien, production editor Tracy Patruno, and copyeditor Sally Liﬂand. Many

useful comments and suggestions by proofreader Gail Magin are greatly appreciated. We thank the Virginia Tech Statistical Consulting Center, which was the

source of many real-life data sets.

R.H.M.

S.L.M.

K.Y.

Chapter 1

Introduction to Statistics and Data Analysis

1.1 Overview: Statistical Inference, Samples, Populations, and the Role of Probability

Beginning in the 1980s and continuing into the 21st century, an inordinate amount

of attention has been focused on improvement of quality in American industry.

Much has been said and written about the Japanese “industrial miracle,” which

began in the middle of the 20th century. The Japanese were able to succeed where

we and other countries had failed, namely, to create an atmosphere that allows

the production of high-quality products. Much of the success of the Japanese has

been attributed to the use of statistical methods and statistical thinking among

management personnel.

Use of Scientiﬁc Data

The use of statistical methods in manufacturing, development of food products,

computer software, energy sources, pharmaceuticals, and many other areas involves

the gathering of information or scientiﬁc data. Of course, the gathering of data

is nothing new. It has been done for well over a thousand years. Data have

been collected, summarized, reported, and stored for perusal. However, there is a

profound distinction between collection of scientiﬁc information and inferential

statistics. It is the latter that has received rightful attention in recent decades.

The oﬀspring of inferential statistics has been a large “toolbox” of statistical

methods employed by statistical practitioners. These statistical methods are designed to contribute to the process of making scientiﬁc judgments in the face of

uncertainty and variation. The product density of a particular material from a

manufacturing process will not always be the same. Indeed, if the process involved

is a batch process rather than continuous, there will be not only variation in material density among the batches that come oﬀ the line (batch-to-batch variation),

but also within-batch variation. Statistical methods are used to analyze data from

a process such as this one in order to gain more sense of where in the process

changes may be made to improve the quality of the process. In this process, quality

may well be defined in relation to closeness to a target density value in harmony

with what portion of the time this closeness criterion is met. An engineer may be

concerned with a speciﬁc instrument that is used to measure sulfur monoxide in

the air during pollution studies. If the engineer has doubts about the eﬀectiveness

of the instrument, there are two sources of variation that must be dealt with.

The ﬁrst is the variation in sulfur monoxide values that are found at the same

locale on the same day. The second is the variation between values observed and

the true amount of sulfur monoxide that is in the air at the time. If either of these

two sources of variation is exceedingly large (according to some standard set by

the engineer), the instrument may need to be replaced. In a biomedical study of a

new drug that reduces hypertension, 85% of patients experienced relief, while it is

generally recognized that the current drug, or “old” drug, brings relief to 80% of patients that have chronic hypertension. However, the new drug is more expensive to

make and may result in certain side eﬀects. Should the new drug be adopted? This

is a problem that is encountered (often with much more complexity) frequently by

pharmaceutical firms in conjunction with the FDA (Food and Drug Administration).

Again, the consideration of variation needs to be taken into account. The “85%”

value is based on a certain number of patients chosen for the study. Perhaps if the

study were repeated with new patients, the observed proportion of “successes” would

be 75%! It is the natural variation from study to study that must be taken into

account in the decision process. Clearly this variation is important, since variation

from patient to patient is endemic to the problem.
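
To make the idea of study-to-study variation concrete, the following minimal sketch simulates repeated studies. The study size of 50 patients, the assumed true relief rate of 0.85, and the function name simulate_relief_rates are illustrative assumptions; none of them comes from the study described above.

```python
import random


def simulate_relief_rates(true_rate, n_patients, n_studies, seed=0):
    """Simulate the observed relief rate in repeated, independent studies.

    Each patient is assumed to experience relief independently with
    probability true_rate (a simplifying assumption for illustration).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rates = []
    for _ in range(n_studies):
        successes = sum(rng.random() < true_rate for _ in range(n_patients))
        rates.append(successes / n_patients)
    return rates


# Hypothetical settings: 1000 repeated studies of 50 patients each.
rates = simulate_relief_rates(true_rate=0.85, n_patients=50, n_studies=1000)
print(f"lowest observed rate:  {min(rates):.2f}")
print(f"highest observed rate: {max(rates):.2f}")
print(f"average observed rate: {sum(rates) / len(rates):.3f}")
```

With a small number of patients per study, the observed rates can spread several percentage points on either side of the assumed true rate, which is the natural variation the decision process must take into account.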

Variability in Scientiﬁc Data

In the problems discussed above the statistical methods used involve dealing with

variability, and in each case the variability to be studied is that encountered in

scientiﬁc data. If the observed product density in the process were always the

same and were always on target, there would be no need for statistical methods.

If the device for measuring sulfur monoxide always gives the same value and the

value is accurate (i.e., it is correct), no statistical analysis is needed. If there

were no patient-to-patient variability inherent in the response to the drug (i.e.,

it either always brings relief or not), life would be simple for scientists in the

pharmaceutical ﬁrms and FDA and no statistician would be needed in the decision

process. Statistics researchers have produced an enormous number of analytical

methods that allow for analysis of data from systems like those described above.

This reﬂects the true nature of the science that we call inferential statistics, namely,

using techniques that allow us to go beyond merely reporting data to drawing

conclusions (or inferences) about the scientiﬁc system. Statisticians make use of

fundamental laws of probability and statistical inference to draw conclusions about

scientiﬁc systems. Information is gathered in the form of samples, or collections

of observations. The process of sampling is introduced in Chapter 2, and the

discussion continues throughout the entire book.

Samples are collected from populations, which are collections of all individuals or individual items of a particular type. At times a population signiﬁes a

scientiﬁc system. For example, a manufacturer of computer boards may wish to

eliminate defects. A sampling process may involve collecting information on 50

computer boards sampled randomly from the process. Here, the population is all

computer boards manufactured by the ﬁrm over a speciﬁc period of time. If an

improvement is made in the computer board process and a second sample of boards

is collected, any conclusions drawn regarding the eﬀectiveness of the change in process should extend to the entire population of computer boards produced under

the “improved process.” In a drug experiment, a sample of patients is taken and

each is given a speciﬁc drug to reduce blood pressure. The interest is focused on

drawing conclusions about the population of those who suﬀer from hypertension.

Often, it is very important to collect scientiﬁc data in a systematic way, with

planning being high on the agenda. At times the planning is, by necessity, quite

limited. We often focus only on certain properties or characteristics of the items or

objects in the population. Each characteristic has particular engineering or, say,

biological importance to the “customer,” the scientist or engineer who seeks to learn

about the population. For example, in one of the illustrations above the quality

of the process had to do with the product density of the output of a process. An

engineer may need to study the eﬀect of process conditions, temperature, humidity,

amount of a particular ingredient, and so on. He or she can systematically move

these factors to whatever levels are suggested according to whatever prescription

or experimental design is desired. However, a forest scientist who is interested

in a study of factors that inﬂuence wood density in a certain kind of tree cannot

necessarily design an experiment. This case may require an observational study

in which data are collected in the field but factor levels cannot be preselected.

Both of these types of studies lend themselves to methods of statistical inference.

In the former, the quality of the inferences will depend on proper planning of the

experiment. In the latter, the scientist is at the mercy of what can be gathered.

For example, it is sad if an agronomist is interested in studying the eﬀect of rainfall

on plant yield and the data are gathered during a drought.

The importance of statistical thinking by managers and the use of statistical

inference by scientiﬁc personnel is widely acknowledged. Research scientists gain

much from scientiﬁc data. Data provide understanding of scientiﬁc phenomena.

Product and process engineers learn a great deal in their oﬀ-line eﬀorts to improve

the process. They also gain valuable insight by gathering production data (online monitoring) on a regular basis. This allows them to determine necessary

modiﬁcations in order to keep the process at a desired level of quality.

There are times when a scientiﬁc practitioner wishes only to gain some sort of

summary of a set of data represented in the sample. In other words, inferential

statistics is not required. Rather, a set of single-number statistics or descriptive

statistics is helpful. These numbers give a sense of the center, or location, of

the data, variability in the data, and the general nature of the distribution of

observations in the sample. Though no speciﬁc statistical methods leading to

statistical inference are incorporated, much can be learned. At times, descriptive

statistics are accompanied by graphics. Modern statistical software packages allow

for computation of means, medians, standard deviations, and other single-number statistics as well as production of graphs that show a “footprint” of the

nature of the sample. Deﬁnitions and illustrations of the single-number statistics

and graphs, including histograms, stem-and-leaf plots, scatter plots, dot plots, and

box plots, will be given in sections that follow.

The Role of Probability

In this book, Chapters 2 to 6 deal with fundamental notions of probability. A

thorough grounding in these concepts allows the reader to have a better understanding of statistical inference. Without some formalism of probability theory,

the student cannot appreciate the true interpretation from data analysis through

modern statistical methods. It is quite natural to study probability prior to studying statistical inference. Elements of probability allow us to quantify the strength

or “conﬁdence” in our conclusions. In this sense, concepts in probability form a

major component that supplements statistical methods and helps us gauge the

strength of the statistical inference. The discipline of probability, then, provides

the transition between descriptive statistics and inferential methods. Elements of

probability allow the conclusion to be put into the language that the science or

engineering practitioners require. An example follows that will enable the reader

to understand the notion of a P-value, which often provides the “bottom line” in

the interpretation of results from the use of statistical methods.

Example 1.1: Suppose that an engineer encounters data from a manufacturing process in which

100 items are sampled and 10 are found to be defective. It is expected and anticipated that occasionally there will be defective items. Obviously these 100 items

represent the sample. However, it has been determined that in the long run, the

company can only tolerate 5% defective in the process. Now, the elements of probability allow the engineer to determine how conclusive the sample information is

regarding the nature of the process. In this case, the population conceptually

represents all possible items from the process. Suppose we learn that if the process

is acceptable, that is, if it does produce items no more than 5% of which are defective, there is a probability of 0.0282 of obtaining 10 or more defective items in

a random sample of 100 items from the process. This small probability suggests

that the process does, indeed, have a long-run rate of defective items that exceeds

5%. In other words, under the condition of an acceptable process, the sample information obtained would rarely occur. However, it did occur! Clearly, though, it

would occur with a much higher probability if the process defective rate exceeded

5% by a signiﬁcant amount.
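
The probability of 0.0282 quoted in the example can be reproduced with a short calculation. The sketch below assumes that the number of defective items in the sample follows a binomial distribution with n = 100 independent items and defective probability p = 0.05; the example itself does not name the model, and binomial_tail is a hypothetical helper written for this illustration.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Example 1.1: 100 items sampled, 10 found defective, tolerated rate 5%.
p_value = binomial_tail(n=100, k=10, p=0.05)
print(f"P(10 or more defectives | rate = 5%)  = {p_value:.4f}")

# For contrast, the same event under a hypothetical defective rate of 10%.
higher_rate = binomial_tail(n=100, k=10, p=0.10)
print(f"P(10 or more defectives | rate = 10%) = {higher_rate:.4f}")
```

Under these assumptions the first probability agrees with the 0.0282 stated in the example, and the second is much larger, in line with the remark that the observed sample would be far less surprising if the defective rate exceeded 5%.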

From this example it becomes clear that the elements of probability aid in the

translation of sample information into something conclusive or inconclusive about

the scientiﬁc system. In fact, what was learned likely is alarming information to

the engineer or manager. Statistical methods, which we will actually detail in

Chapter 10, produced a P-value of 0.0282. The result suggests that the process

very likely is not acceptable. The concept of a P-value is dealt with at length

in succeeding chapters. The example that follows provides a second illustration.

Example 1.2: Often the nature of the scientiﬁc study will dictate the role that probability and

deductive reasoning play in statistical inference. Exercise 9.40 on page 294 provides

data associated with a study conducted at the Virginia Polytechnic Institute and

State University on the development of a relationship between the roots of trees and

the action of a fungus. Minerals are transferred from the fungus to the trees and

sugars from the trees to the fungus. Two samples of 10 northern red oak seedlings

were planted in a greenhouse, one containing seedlings treated with nitrogen and


the other containing seedlings with no nitrogen. All other environmental conditions

were held constant. All seedlings contained the fungus Pisolithus tinctorius. More

details are supplied in Chapter 9. The stem weights in grams were recorded after

the end of 140 days. The data are given in Table 1.1.

Table 1.1: Data Set for Example 1.2 (stem weights in grams)

No Nitrogen    Nitrogen
   0.32          0.26
   0.53          0.43
   0.28          0.47
   0.37          0.49
   0.47          0.52
   0.43          0.75
   0.36          0.79
   0.42          0.86
   0.38          0.62
   0.43          0.46

[Dot plot of both samples on a common stem-weight axis running from 0.25 to 0.90.]
Figure 1.1: A dot plot of stem weight data.

In this example there are two samples from two separate populations. The

purpose of the experiment is to determine if the use of nitrogen has an inﬂuence

on the growth of the roots. The study is a comparative study (i.e., we seek to

compare the two populations with regard to a certain important characteristic). It

is instructive to plot the data as shown in the dot plot of Figure 1.1. The ◦ values

represent the “nitrogen” data and the × values represent the “no-nitrogen” data.

Notice that the general appearance of the data might suggest to the reader

that, on average, the use of nitrogen increases the stem weight. Four nitrogen observations are considerably larger than any of the no-nitrogen observations. Most

of the no-nitrogen observations appear to be below the center of the data. The

appearance of the data set would seem to indicate that nitrogen is eﬀective. But

how can this be quantiﬁed? How can all of the apparent visual evidence be summarized in some sense? As in the preceding example, the fundamentals of probability

can be used. The conclusions may be summarized in a probability statement or

P-value. We will not show here the statistical inference that produces the summary

probability. As in Example 1.1, these methods will be discussed in Chapter 10.

The issue revolves around the “probability that data like these could be observed”

given that nitrogen has no eﬀect, in other words, given that both samples were

generated from the same population. Suppose that this probability is small, say

0.03. That would certainly be strong evidence that the use of nitrogen does indeed

inﬂuence (apparently increases) average stem weight of the red oak seedlings.
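
To make the phrase “probability that data like these could be observed” concrete, one can enumerate every way of relabeling 10 of the 20 seedlings as “nitrogen” and count how often the relabeled nitrogen group is at least as heavy as the one actually observed. This is a permutation test, swapped in here only because it needs no distribution theory; it is not the method developed later in the text, and its P-value need not equal the illustrative 0.03 mentioned above. The weights are stored in hundredths of a gram so that ties are handled by exact integer comparison.

```python
from itertools import combinations

# Stem weights from Table 1.1, in hundredths of a gram.
no_nitrogen = [32, 53, 28, 37, 47, 43, 36, 42, 38, 43]
nitrogen = [26, 43, 47, 49, 52, 75, 79, 86, 62, 46]

pooled = no_nitrogen + nitrogen
# With fixed group sizes, a larger group sum is equivalent to a larger
# difference in sample means, so sums can be compared directly.
observed_sum = sum(nitrogen)

n_splits = 0
n_as_extreme = 0
for group in combinations(pooled, len(nitrogen)):
    n_splits += 1
    if sum(group) >= observed_sum:
        n_as_extreme += 1

p_one_sided = n_as_extreme / n_splits
print(f"{n_as_extreme} of {n_splits} relabelings are at least as extreme")
print(f"one-sided permutation P-value = {p_one_sided:.4f}")
```

A small value says that a difference this large would be unusual if both samples came from the same population, which is the reasoning described in the example.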


How Do Probability and Statistical Inference Work Together?

It is important for the reader to understand the clear distinction between the

discipline of probability, a science in its own right, and the discipline of inferential statistics. As we have already indicated, the use or application of concepts in

probability allows real-life interpretation of the results of statistical inference. As a

result, it can be said that statistical inference makes use of concepts in probability.

One can glean from the two examples above that the sample information is made

available to the analyst and, with the aid of statistical methods and elements of

probability, conclusions are drawn about some feature of the population (the process does not appear to be acceptable in Example 1.1, and nitrogen does appear

to inﬂuence average stem weights in Example 1.2). Thus for a statistical problem,

the sample along with inferential statistics allows us to draw conclusions about the population, with inferential statistics making clear use

of elements of probability. This reasoning is inductive in nature. Now as we

move into Chapter 2 and beyond, the reader will note that, unlike what we do in

our two examples here, we will not focus on solving statistical problems. Many

examples will be given in which no sample is involved. There will be a population

clearly described with all features of the population known. Then questions of importance will focus on the nature of data that might hypothetically be drawn from

the population. Thus, one can say that elements in probability allow us to

draw conclusions about characteristics of hypothetical data taken from

the population, based on known features of the population. This type of

reasoning is deductive in nature. Figure 1.2 shows the fundamental relationship

between probability and inferential statistics.

[Diagram: Probability leads from Population to Sample; Statistical Inference leads from Sample back to Population.]
Figure 1.2: Fundamental relationship between probability and inferential statistics.

Now, in the grand scheme of things, which is more important, the ﬁeld of

probability or the ﬁeld of statistics? They are both very important and clearly are

complementary. The only certainty concerning the pedagogy of the two disciplines

lies in the fact that if statistics is to be taught at more than merely a “cookbook”

level, then the discipline of probability must be taught ﬁrst. This rule stems from

the fact that nothing can be learned about a population from a sample until the

analyst learns the rudiments of uncertainty in that sample. For example, consider

Example 1.1. The question centers around whether or not the population, deﬁned

by the process, is no more than 5% defective. In other words, the conjecture is that

on the average 5 out of 100 items are defective. Now, the sample contains 100

items and 10 are defective. Does this support the conjecture or refute it? On the


surface it would appear to be a refutation of the conjecture because 10 out of 100

seem to be “a bit much.” But without elements of probability, how do we know?

Only through the study of material in future chapters will we learn that, under the
condition that the process is acceptable (5% defective), the probability of obtaining
10 or more defective items in a sample of 100 is 0.0282.

We have given two examples where the elements of probability provide a summary that the scientist or engineer can use as evidence on which to build a decision.

The bridge between the data and the conclusion is, of course, based on foundations

of statistical inference, distribution theory, and sampling distributions discussed in

future chapters.

1.2 Sampling Procedures; Collection of Data

In Section 1.1 we discussed very brieﬂy the notion of sampling and the sampling

process. While sampling appears to be a simple concept, the complexity of the

questions that must be answered about the population or populations necessitates

that the sampling process be very complex at times. While the notion of sampling

is discussed in a technical way in Chapter 8, we shall endeavor here to give some

common-sense notions of sampling. This is a natural transition to a discussion of

the concept of variability.

Simple Random Sampling

The importance of proper sampling revolves around the degree of conﬁdence with

which the analyst is able to answer the questions being asked. Let us assume that

only a single population exists in the problem. Recall that in Example 1.2 two

populations were involved. Simple random sampling implies that any particular

sample of a speciﬁed sample size has the same chance of being selected as any

other sample of the same size. The term sample size simply means the number of

elements in the sample. Obviously, a table of random numbers can be utilized in

sample selection in many instances. The virtue of simple random sampling is that

it aids in the elimination of the problem of having the sample reﬂect a diﬀerent

(possibly more conﬁned) population than the one about which inferences need to be

made. For example, a sample is to be chosen to answer certain questions regarding

political preferences in a certain state in the United States. The sample involves

the choice of, say, 1000 families, and a survey is to be conducted. Now, suppose it

turns out that random sampling is not used. Rather, all or nearly all of the 1000

families chosen live in an urban setting. It is believed that political preferences

in rural areas diﬀer from those in urban areas. In other words, the sample drawn

actually conﬁned the population and thus the inferences need to be conﬁned to the

“limited population,” and in this case conﬁning may be undesirable. If, indeed,

the inferences need to be made about the state as a whole, the sample of size 1000

described here is often referred to as a biased sample.
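
As a rough illustration of simple random sampling, the sketch below draws 1000 families from a list in such a way that every subset of size 1000 is equally likely. The sampling frame, its size, the family labels, and the seed are all hypothetical; a pseudorandom generator stands in for the table of random numbers mentioned above.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every size-n subset is equally likely."""
    rng = random.Random(seed)  # a seed is used only to make the draw repeatable
    return rng.sample(population, n)


# Hypothetical sampling frame: every family in the state, urban and rural alike.
families = [f"family_{i:05d}" for i in range(50_000)]
sample = simple_random_sample(families, 1000, seed=2024)
print(len(sample), "families selected")
```

Drawing only from a list of urban families would reproduce the biased sample described in the text, no matter how the draw itself is randomized.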

As we hinted earlier, simple random sampling is not always appropriate. Which

alternative approach is used depends on the complexity of the problem. Often, for

example, the sampling units are not homogeneous and naturally divide themselves

into nonoverlapping groups that are homogeneous. These groups are called strata,


and a procedure called stratiﬁed random sampling involves random selection of a

sample within each stratum. The purpose is to be sure that each of the strata

is neither over- nor underrepresented. For example, suppose a sample survey is

conducted in order to gather preliminary opinions regarding a bond referendum

that is being considered in a certain city. The city is subdivided into several ethnic

groups which represent natural strata. In order not to disregard or overrepresent

any group, separate random samples of families could be chosen from each group.
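
Stratified random sampling can be sketched as a simple random sample taken separately inside each stratum. The group names, stratum sizes, and the choice of sampling the same fraction from every stratum (proportional allocation) are assumptions made for illustration; the text does not prescribe an allocation rule.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Take a simple random sample of the given fraction within each stratum."""
    rng = random.Random(seed)
    chosen = {}
    for name, units in strata.items():
        k = max(1, round(fraction * len(units)))  # at least one unit per stratum
        chosen[name] = rng.sample(units, k)
    return chosen


# Hypothetical strata: families in a city, grouped into three communities.
strata = {
    "group_A": [f"A{i}" for i in range(600)],
    "group_B": [f"B{i}" for i in range(300)],
    "group_C": [f"C{i}" for i in range(100)],
}
chosen = stratified_sample(strata, fraction=0.10, seed=1)
for name, units in chosen.items():
    print(name, len(units))
```

Because each stratum is sampled on its own, no group can be left out or overrepresented by the luck of the draw.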

Experimental Design

The concept of randomness or random assignment plays a huge role in the area of

experimental design, which was introduced very brieﬂy in Section 1.1 and is an

important staple in almost any area of engineering or experimental science. This

will be discussed at length in Chapters 13 through 15. However, it is instructive to

give a brief presentation here in the context of random sampling. A set of so-called

treatments or treatment combinations becomes the populations to be studied

or compared in some sense. An example is the nitrogen versus no-nitrogen treatments in Example 1.2. Another simple example would be “placebo” versus “active

drug,” or in a corrosion fatigue study we might have treatment combinations that

involve specimens that are coated or uncoated as well as conditions of low or high

humidity to which the specimens are exposed. In fact, there are four treatment

or factor combinations (i.e., 4 populations), and many scientiﬁc questions may be

asked and answered through statistical and inferential methods. Consider ﬁrst the

situation in Example 1.2. There are 20 diseased seedlings involved in the experiment. It is easy to see from the data themselves that the seedlings are diﬀerent

from each other. Within the nitrogen group (or the no-nitrogen group) there is

considerable variability in the stem weights. This variability is due to what is

generally called the experimental unit. This is a very important concept in inferential statistics, in fact one whose description will not end in this chapter. The

nature of the variability is very important. If it is too large, stemming from a

condition of excessive nonhomogeneity in experimental units, the variability will

“wash out” any detectable diﬀerence between the two populations. Recall that in

this case that did not occur.

The dot plot in Figure 1.1 and P-value indicated a clear distinction between

these two conditions. What role do those experimental units play in the data-taking process itself? The common-sense and, indeed, quite standard approach is

to assign the 20 seedlings or experimental units randomly to the two treatments or conditions. In the drug study, we may decide to use a total of 200

available patients, patients that clearly will be diﬀerent in some sense. They are

the experimental units. However, they all may have the same chronic condition

for which the drug is a potential treatment. Then in a so-called completely randomized design, 100 patients are assigned randomly to the placebo and 100 to

the active drug. Again, it is these experimental units within a group or treatment

that produce the variability in data results (i.e., variability in the measured result),

say blood pressure, or whatever drug eﬃcacy value is important. In the corrosion

fatigue study, the experimental units are the specimens that are the subjects of

the corrosion.
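
The random assignment used in a completely randomized design can be sketched by shuffling the experimental units and splitting the shuffled list into equal groups. The patient identifiers and the seed below are hypothetical.

```python
import random


def completely_randomized_design(units, treatments, seed=None):
    """Randomly assign units to treatments in equal-sized groups."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among the treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # every ordering of the units is equally likely
    size = len(shuffled) // len(treatments)
    return {
        treatment: shuffled[i * size:(i + 1) * size]
        for i, treatment in enumerate(treatments)
    }


# Hypothetical drug study: 200 patients, two treatments.
patients = [f"patient_{i:03d}" for i in range(1, 201)]
assignment = completely_randomized_design(patients, ["placebo", "active drug"], seed=7)
for treatment, group in assignment.items():
    print(treatment, len(group))
```

The same function would assign the 20 seedlings of Example 1.2 to the nitrogen and no-nitrogen conditions.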


Why Assign Experimental Units Randomly?

What is the possible negative impact of not randomly assigning experimental units

to the treatments or treatment combinations? This is seen most clearly in the

case of the drug study. Among the characteristics of the patients that produce

variability in the results are age, gender, and weight. Suppose merely by chance

the placebo group contains a sample of people that are predominately heavier than

those in the treatment group. Perhaps heavier individuals have a tendency to have

a higher blood pressure. This clearly biases the result, and indeed, any result

obtained through the application of statistical inference may have little to do with

the drug and more to do with diﬀerences in weights among the two samples of

patients.

We should emphasize the attachment of importance to the term variability.

Excessive variability among experimental units “camouﬂages” scientiﬁc ﬁndings.

In future sections, we attempt to characterize and quantify measures of variability.

In sections that follow, we introduce and discuss speciﬁc quantities that can be

computed in samples; the quantities give a sense of the nature of the sample with

respect to center of location of the data and variability in the data. A discussion

of several of these single-number measures serves to provide a preview of what

statistical information will be important components of the statistical methods

that are used in future chapters. These measures that help characterize the nature

of the data set fall into the category of descriptive statistics. This material is

a prelude to a brief presentation of pictorial and graphical methods that go even

further in characterization of the data set. The reader should understand that the

statistical methods illustrated here will be used throughout the text. In order to

oﬀer the reader a clearer picture of what is involved in experimental design studies,

we oﬀer Example 1.3.

Example 1.3: A corrosion study was made in order to determine whether coating an aluminum

metal with a corrosion retardation substance reduced the amount of corrosion.

The coating is a protectant that is advertised to minimize fatigue damage in this

type of material. Also of interest is the inﬂuence of humidity on the amount of

corrosion. A corrosion measurement can be expressed in thousands of cycles to

failure. Two levels of coating, no coating and chemical corrosion coating, were

used. In addition, the two relative humidity levels are 20% relative humidity and

80% relative humidity.

The experiment involves four treatment combinations that are listed in the table

that follows. There are eight experimental units used, and they are aluminum

specimens prepared; two are assigned randomly to each of the four treatment

combinations. The data are presented in Table 1.2.

The corrosion data are averages of two specimens. A plot of the averages is

pictured in Figure 1.3. A relatively large value of cycles to failure represents a

small amount of corrosion. As one might expect, an increase in humidity appears

to make the corrosion worse. The use of the chemical corrosion coating procedure

appears to reduce corrosion.
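
The two statements above can be checked against the four averages of Table 1.2 by averaging over the other factor. This is only a descriptive summary of the reported averages; it says nothing about variability, and the helper name is ours.

```python
# Average thousands of cycles to failure from Table 1.2.
cycles = {
    ("uncoated", "20%"): 975,
    ("uncoated", "80%"): 350,
    ("chemical corrosion", "20%"): 1750,
    ("chemical corrosion", "80%"): 1550,
}


def marginal_mean(factor_index, level):
    """Average the table entries that share one level of one factor."""
    values = [v for key, v in cycles.items() if key[factor_index] == level]
    return sum(values) / len(values)


coating_means = {c: marginal_mean(0, c) for c in ("uncoated", "chemical corrosion")}
humidity_means = {h: marginal_mean(1, h) for h in ("20%", "80%")}
print("by coating: ", coating_means)
print("by humidity:", humidity_means)
```

More cycles to failure means less corrosion, so the higher average for coated specimens and the lower average at 80% humidity match the description in the text.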

In this experimental design illustration, the engineer has systematically selected

the four treatment combinations. In order to connect this situation to concepts

to which the reader has been exposed up to this point, it should be assumed that the


Table 1.2: Data for Example 1.3

                                Average Corrosion in
Coating              Humidity   Thousands of Cycles to Failure
Uncoated             20%          975
Uncoated             80%          350
Chemical Corrosion   20%         1750
Chemical Corrosion   80%         1550

[Plot of average corrosion (vertical axis, 0 to 2000) against humidity (20%, 80%), with the uncoated and chemical corrosion coating results labeled separately.]
Figure 1.3: Corrosion results for Example 1.3.

conditions representing the four treatment combinations are four separate populations and that the two corrosion values observed for each population are important

pieces of information. The importance of the average in capturing and summarizing certain features in the population will be highlighted in Section 1.3. While we

might draw conclusions about the role of humidity and the impact of coating the

specimens from the ﬁgure, we cannot truly evaluate the results from an analytical point of view without taking into account the variability around the average.

Again, as we indicated earlier, if the two corrosion values for each treatment combination are close together, the picture in Figure 1.3 may be an accurate depiction.

But if each corrosion value in the ﬁgure is an average of two values that are widely

dispersed, then this variability may, indeed, truly “wash away” any information

that appears to come through when one observes averages only. The foregoing

example illustrates these concepts:

(1) random assignment of treatment combinations (coating, humidity) to experimental units (specimens)

(2) the use of sample averages (average corrosion values) in summarizing sample

information

(3) the need for consideration of measures of variability in the analysis of any

sample or sets of samples


This example suggests the need for what follows in Sections 1.3 and 1.4, namely,

descriptive statistics that indicate measures of center of location in a set of data,

and those that measure variability.

1.3 Measures of Location: The Sample Mean and Median

Measures of location are designed to provide the analyst with some quantitative

values of where the center, or some other location, of data is located. In Example

1.2, it appears as if the center of the nitrogen sample clearly exceeds that of the

no-nitrogen sample. One obvious and very useful measure is the sample mean.

The mean is simply a numerical average.

Definition 1.1: Suppose that the observations in a sample are $x_1, x_2, \ldots, x_n$. The sample mean,
denoted by $\bar{x}$, is
\[
\bar{x} = \sum_{i=1}^{n} \frac{x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}.
\]

There are other measures of central tendency that are discussed in detail in

future chapters. One important measure is the sample median. The purpose of

the sample median is to reﬂect the central tendency of the sample in such a way

that it is uninﬂuenced by extreme values or outliers.

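Definition 1.2: Given that the observations in a sample are $x_1, x_2, \ldots, x_n$, arranged in increasing
order of magnitude, the sample median is
\[
\tilde{x} =
\begin{cases}
x_{(n+1)/2}, & \text{if } n \text{ is odd,} \\
\frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right), & \text{if } n \text{ is even.}
\end{cases}
\]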

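As an example, suppose the data set is the following: 1.7, 2.2, 3.9, 3.11, and
14.7. Arranged in increasing order, the values are 1.7, 2.2, 3.11, 3.9, and 14.7, so
the sample mean and median are, respectively,
\[
\bar{x} = 5.12, \qquad \tilde{x} = 3.11.
\]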

Clearly, the mean is inﬂuenced considerably by the presence of the extreme observation, 14.7, whereas the median places emphasis on the true “center” of the data

set. In the case of the two-sample data set of Example 1.2, the two measures of

central tendency for the individual samples are

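\[
\begin{aligned}
\bar{x}\ (\text{no nitrogen}) &= 0.399 \text{ gram}, \\
\tilde{x}\ (\text{no nitrogen}) &= \frac{0.38 + 0.42}{2} = 0.400 \text{ gram}, \\
\bar{x}\ (\text{nitrogen}) &= 0.565 \text{ gram}, \\
\tilde{x}\ (\text{nitrogen}) &= \frac{0.49 + 0.52}{2} = 0.505 \text{ gram}.
\end{aligned}
\]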
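
Definitions 1.1 and 1.2 translate almost directly into code. The sketch below is one possible rendering (the function names are ours); note that the definitions index observations from 1, whereas Python lists index from 0, which is why the positions are shifted by one.

```python
def sample_mean(xs):
    """Definition 1.1: the sum of the observations divided by n."""
    return sum(xs) / len(xs)


def sample_median(xs):
    """Definition 1.2: middle value, or average of the two middle values."""
    ordered = sorted(xs)  # the definition assumes increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # x_{(n+1)/2} in 1-based indexing
    return 0.5 * (ordered[mid - 1] + ordered[mid])  # x_{n/2} and x_{n/2+1}


# Stem weights in grams from Table 1.1.
no_nitrogen = [0.32, 0.53, 0.28, 0.37, 0.47, 0.43, 0.36, 0.42, 0.38, 0.43]
nitrogen = [0.26, 0.43, 0.47, 0.49, 0.52, 0.75, 0.79, 0.86, 0.62, 0.46]

for label, data in (("no nitrogen", no_nitrogen), ("nitrogen", nitrogen)):
    print(f"{label}: mean = {sample_mean(data):.3f}, median = {sample_median(data):.3f}")
```

Sorting inside the median function matters: the data in Table 1.1 are not listed in increasing order.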

Clearly there is a diﬀerence in concept between the mean and median. It may

be of interest to the reader with an engineering background that the sample mean


is the centroid of the data in a sample. In a sense, it is the point at which a

fulcrum can be placed to balance a system of “weights” which are the locations of

the individual data. This is shown in Figure 1.4 with regard to the with-nitrogen

sample.

[Number line from 0.25 to 0.90 with the fulcrum placed at $\bar{x} = 0.565$.]
Figure 1.4: Sample mean as a centroid of the with-nitrogen stem weight.

In future chapters, the basis for the computation of x̄ is that of an estimate

of the population mean. As we indicated earlier, the purpose of statistical inference is to draw conclusions about population characteristics or parameters and

estimation is a very important feature of statistical inference.

The median and mean can be quite diﬀerent from each other. Note, however,

that in the case of the stem weight data the sample mean value for no-nitrogen is

quite similar to the median value.

Other Measures of Locations

There are several other methods of quantifying the center of location of the data

in the sample. We will not deal with them at this point. For the most part,

alternatives to the sample mean are designed to produce values that represent

compromises between the mean and the median. Rarely do we make use of these

other measures. However, it is instructive to discuss one class of estimators, namely

the class of trimmed means. A trimmed mean is computed by “trimming away”

a certain percent of both the largest and the smallest set of values. For example,

the 10% trimmed mean is found by eliminating the largest 10% and smallest 10%

and computing the average of the remaining values. For example, in the case of

the stem weight data, we would eliminate the largest and smallest since the sample

size is 10 for each sample. So for the without-nitrogen group the 10% trimmed

mean is given by

\[
\bar{x}_{\mathrm{tr}(10)} = \frac{0.32 + 0.37 + 0.47 + 0.43 + 0.36 + 0.42 + 0.38 + 0.43}{8} = 0.39750,
\]
and for the 10% trimmed mean for the with-nitrogen group we have
\[
\bar{x}_{\mathrm{tr}(10)} = \frac{0.43 + 0.47 + 0.49 + 0.52 + 0.75 + 0.79 + 0.62 + 0.46}{8} = 0.56625.
\]

Note that in this case, as expected, the trimmed means are close to both the mean

and the median for the individual samples. The trimmed mean is, of course, less
sensitive to outliers than the sample mean but not as insensitive as the median.

On the other hand, the trimmed mean approach makes use of more information

than the sample median. Note that the sample median is, indeed, a special case of

the trimmed mean in which all of the sample data are eliminated apart from the

middle one or two observations.
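
The trimming procedure can be sketched as follows. The function name is ours, and rounding the number of trimmed observations down when the percentage does not give a whole number is an assumption; the text only treats the case where 10% of 10 observations is exactly one value at each end.

```python
def trimmed_mean(xs, percent):
    """Average what remains after dropping `percent`% of values from each end."""
    ordered = sorted(xs)
    n = len(ordered)
    k = int(n * percent / 100)  # observations removed from each end (rounded down)
    if 2 * k >= n:
        raise ValueError("trimming would remove every observation")
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


# Stem weights in grams from Table 1.1.
no_nitrogen = [0.32, 0.53, 0.28, 0.37, 0.47, 0.43, 0.36, 0.42, 0.38, 0.43]
nitrogen = [0.26, 0.43, 0.47, 0.49, 0.52, 0.75, 0.79, 0.86, 0.62, 0.46]

print(f"no nitrogen: {trimmed_mean(no_nitrogen, 10):.5f}")
print(f"nitrogen:    {trimmed_mean(nitrogen, 10):.5f}")
```

With `percent` set to 0 the function returns the ordinary sample mean, which makes the "compromise between the mean and the median" interpretation easy to explore.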


Exercises

1.1 The following measurements were recorded for

the drying time, in hours, of a certain brand of latex

paint.

3.4 2.5 4.8 2.9 3.6

2.8 3.3 5.6 3.7 2.8

4.4 4.0 5.2 3.0 4.8

Assume that the measurements are a simple random

sample.

(a) What is the sample size for the above sample?

(b) Calculate the sample mean for these data.

(c) Calculate the sample median.

(d) Plot the data by way of a dot plot.

(e) Compute the 20% trimmed mean for the above

data set.

(f) Is the sample mean for these data more or less descriptive as a center of location than the trimmed

mean?

1.2 According to the journal Chemical Engineering,

an important property of a ﬁber is its water absorbency. A random sample of 20 pieces of cotton ﬁber

was taken and the absorbency on each piece was measured. The following are the absorbency values:

18.71 21.41 20.72 21.81 19.29 22.43 20.17

23.71 19.44 20.50 18.92 20.33 23.00 22.85

19.25 21.77 22.11 19.77 18.04 21.12

(a) Calculate the sample mean and median for the

above sample values.

(b) Compute the 10% trimmed mean.

(c) Do a dot plot of the absorbency data.

(d) Using only the values of the mean, median, and

trimmed mean, do you have evidence of outliers in

the data?

1.3 A certain polymer is used for evacuation systems

for aircraft. It is important that the polymer be resistant to the aging process. Twenty specimens of the

polymer were used in an experiment. Ten were assigned randomly to be exposed to an accelerated batch

aging process that involved exposure to high temperatures for 10 days. Measurements of tensile strength of

the specimens were made, and the following data were

recorded on tensile strength in psi:

No aging: 227 222 218 217 225
          218 216 229 228 221
Aging:    219 214 215 211 209
          218 203 204 201 205

(a) Do a dot plot of the data.

(b) From your plot, does it appear as if the aging process has had an eﬀect on the tensile strength of this

polymer? Explain.

(c) Calculate the sample mean tensile strength of the

two samples.

(d) Calculate the median for both. Discuss the similarity or lack of similarity between the mean and

median of each group.

1.4 In a study conducted by the Department of Mechanical Engineering at Virginia Tech, the steel rods

supplied by two diﬀerent companies were compared.

Ten sample springs were made out of the steel rods

supplied by each company, and a measure of ﬂexibility

was recorded for each. The data are as follows:

Company A: 9.3 8.8 6.8 8.7 8.5

6.7 8.0 6.5 9.2 7.0

Company B: 11.0 9.8 9.9 10.2 10.1

9.7 11.0 11.1 10.2 9.6

(a) Calculate the sample mean and median for the data

for the two companies.

(b) Plot the data for the two companies on the same

line and give your impression regarding any apparent diﬀerences between the two companies.

1.5 Twenty adult males between the ages of 30 and

40 participated in a study to evaluate the eﬀect of a

speciﬁc health regimen involving diet and exercise on

the blood cholesterol. Ten were randomly selected to

be a control group, and ten others were assigned to

take part in the regimen as the treatment group for a

period of 6 months. The following data show the reduction in cholesterol experienced for the time period

for the 20 subjects:

Control group:     7   3  −4  14   2
                   5  22  −7   9   5
Treatment group:  −6   5   9   4   4
                  12  37   5   3   3

(a) Do a dot plot of the data for both groups on the

same graph.

(b) Compute the mean, median, and 10% trimmed

mean for both groups.

(c) Explain why the diﬀerence in means suggests one

conclusion about the eﬀect of the regimen, while

the diﬀerence in medians or trimmed means suggests a diﬀerent conclusion.

1.6 The tensile strength of silicone rubber is thought

to be a function of curing temperature. A study was

carried out in which samples of 12 specimens of the rubber were prepared using curing temperatures of 20°C
and 45°C. The data below show the tensile strength

values in megapascals.


20°C: 2.07 2.14 2.22 2.03 2.21 2.03
      2.05 2.18 2.09 2.14 2.11 2.02
45°C: 2.52 2.15 2.49 2.03 2.37 2.05
      1.99 2.42 2.08 2.42 2.29 2.01

(a) Show a dot plot of the data with both low and high
temperature tensile strength values.
(b) Compute sample mean tensile strength for both
samples.
(c) Does it appear as if curing temperature has an
influence on tensile strength, based on the plot?
Comment further.
(d) Does anything else appear to be influenced by an
increase in curing temperature? Explain.

1.4 Measures of Variability

Sample variability plays an important role in data analysis. Process and product

variability is a fact of life in engineering and scientiﬁc systems: The control or

reduction of process variability is often a source of major diﬃculty. More and

more process engineers and managers are learning that product quality and, as

a result, proﬁts derived from manufactured products are very much a function

of process variability. As a result, much of Chapters 9 through 15 deals with

data analysis and modeling procedures in which sample variability plays a major

role. Even in small data analysis problems, the success of a particular statistical

method may depend on the magnitude of the variability among the observations in

the sample. Measures of location in a sample do not provide a proper summary of

the nature of a data set. For instance, in Example 1.2 we cannot conclude that the

use of nitrogen enhances growth without taking sample variability into account.

While the details of the analysis of this type of data set are deferred to Chapter 9, it should be clear from Figure 1.1 that variability among the no-nitrogen

observations and variability among the nitrogen observations are certainly of some

consequence. In fact, it appears that the variability within the nitrogen sample

is larger than that of the no-nitrogen sample. Perhaps there is something about

the inclusion of nitrogen that not only increases the stem weight (x̄ of 0.565 gram
compared to an x̄ of 0.399 gram for the no-nitrogen sample) but also increases the
variability in stem weight (i.e., renders the stem weight more inconsistent).

As another example, contrast the two data sets below. Each contains two

samples and the diﬀerence in the means is roughly the same for the two samples, but

data set B seems to provide a much sharper contrast between the two populations

from which the samples were taken. If the purpose of such an experiment is to

detect diﬀerences between the two populations, the task is accomplished in the case

of data set B. However, in data set A the large variability within the two samples

creates diﬃculty. In fact, it is not clear that there is a distinction between the two

populations.

[Dot diagrams of data set A and data set B. Each shows one sample plotted as X and one plotted as 0 along a common axis, with the sample means $\bar{x}_X$ and $\bar{x}_0$ marked.]


Sample Range and Sample Standard Deviation

Just as there are many measures of central tendency or location, there are many

measures of spread or variability. Perhaps the simplest one is the sample range

Xmax − Xmin . The range can be very useful and is discussed at length in Chapter

17 on statistical quality control. The sample measure of spread that is used most

often is the sample standard deviation. We again let x1 , x2 , . . . , xn denote

sample values.

Definition 1.3: The sample variance, denoted by $s^2$, is given by
\[
s^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n-1}.
\]
The sample standard deviation, denoted by $s$, is the positive square root of
$s^2$, that is,
\[
s = \sqrt{s^2}.
\]

It should be clear to the reader that the sample standard deviation is, in fact,
a measure of variability. Large variability in a data set produces relatively large
values of $(x - \bar{x})^2$ and thus a large sample variance. The quantity $n - 1$ is often
called the degrees of freedom associated with the variance estimate. In this
simple example, the degrees of freedom depict the number of independent pieces
of information available for computing variability. For example, suppose that we
wish to compute the sample variance and standard deviation of the data set (5,
17, 6, 4). The sample average is $\bar{x} = 8$. The computation of the variance involves
\[
(5 - 8)^2 + (17 - 8)^2 + (6 - 8)^2 + (4 - 8)^2 = (-3)^2 + 9^2 + (-2)^2 + (-4)^2.
\]
The quantities inside parentheses sum to zero. In general, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ (see
Exercise 1.16 on page 31). Then the computation of a sample variance does not
involve $n$ independent squared deviations from the mean $\bar{x}$. In fact, since the
last value of $x - \bar{x}$ is determined by the initial $n - 1$ of them, we say that these
are $n - 1$ “pieces of information” that produce $s^2$. Thus, there are $n - 1$ degrees
of freedom rather than $n$ degrees of freedom for computing a sample variance.
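
The degrees-of-freedom argument can be traced numerically for the data set (5, 17, 6, 4). The sketch below, with variable names of our choosing, shows that the deviations sum to zero, that the last deviation can be recovered from the first n − 1, and that dividing the sum of squared deviations by n − 1 gives the sample variance.

```python
data = [5, 17, 6, 4]
n = len(data)

mean = sum(data) / n
deviations = [x - mean for x in data]

# The deviations always sum to zero, so the last one carries no new information:
# it equals minus the sum of the first n - 1 deviations.
last_from_others = -sum(deviations[:-1])

sum_of_squares = sum(d**2 for d in deviations)
variance = sum_of_squares / (n - 1)  # n - 1 = 3 degrees of freedom
std_dev = variance**0.5

print("mean:", mean)
print("deviations:", deviations, "sum:", sum(deviations))
print("last deviation recovered from the others:", last_from_others)
print(f"s^2 = {variance:.4f}, s = {std_dev:.4f}")
```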

Example 1.4: In an example discussed extensively in Chapter 10, an engineer is interested in

testing the “bias” in a pH meter. Data are collected on the meter by measuring

the pH of a neutral substance (pH = 7.0). A sample of size 10 is taken, with results

given by

7.07 7.00 7.10 6.97 7.00 7.03 7.01 7.01 6.98 7.08.

The sample mean $\bar{x}$ is given by

$$\bar{x} = \frac{7.07 + 7.00 + 7.10 + \cdots + 7.08}{10} = 7.0250.$$


The sample variance $s^2$ is given by

$$s^2 = \frac{1}{9}\left[(7.07 - 7.025)^2 + (7.00 - 7.025)^2 + (7.10 - 7.025)^2 + \cdots + (7.08 - 7.025)^2\right] = 0.001939.$$

As a result, the sample standard deviation is given by

$$s = \sqrt{0.001939} = 0.044.$$

So the sample standard deviation is 0.0440 with $n - 1 = 9$ degrees of freedom.
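The arithmetic of Example 1.4 can be reproduced with a short Python sketch. It assumes the standard-library `statistics` module, whose `variance` and `stdev` functions use the $n - 1$ divisor of Definition 1.3; the variable names are illustrative.

```python
from statistics import mean, stdev, variance

# pH readings from Example 1.4
ph = [7.07, 7.00, 7.10, 6.97, 7.00, 7.03, 7.01, 7.01, 6.98, 7.08]

x_bar = mean(ph)     # sample mean
s2 = variance(ph)    # sample variance, divisor n - 1 = 9
s = stdev(ph)        # sample standard deviation, sqrt(s2)

print(f"mean     = {x_bar:.4f}")   # 7.0250
print(f"variance = {s2:.6f}")      # 0.001939
print(f"std dev  = {s:.4f}")       # 0.0440
```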

Units for Standard Deviation and Variance

It should be apparent from Definition 1.3 that the variance is a measure of the average squared deviation from the mean $\bar{x}$. We use the term average squared deviation even though the definition makes use of a division by degrees of freedom $n - 1$ rather than $n$. Of course, if $n$ is large, the difference in the denominator is inconsequential. As a result, the sample variance possesses units that are the square of the units in the observed data whereas the sample standard deviation is found in linear units. As an example, consider the data of Example 1.2. The stem weights are measured in grams. As a result, the sample standard deviations are in grams and the variances are measured in grams$^2$. In fact, the individual standard deviations are 0.0728 gram for the no-nitrogen case and 0.1867 gram for the nitrogen group. Note that the standard deviation does indicate considerably larger variability in the nitrogen sample. This condition was displayed in Figure 1.1.
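The statements about units and about the divisor can be illustrated with a small Python sketch. The weights below are hypothetical values made up for this illustration (they are not the Example 1.2 data), and the variable names are assumptions. Converting grams to milligrams should scale the standard deviation by 1000 and the variance by $1000^2$, and replacing the divisor $n - 1$ by $n$ should change the variance by the factor $(n - 1)/n$.

```python
from statistics import pvariance, stdev, variance

# Hypothetical weights in grams, made up for this sketch.
weights_g = [0.30, 0.45, 0.25, 0.40, 0.50]
weights_mg = [1000 * w for w in weights_g]   # same data in milligrams

# Standard deviation is in linear units: it scales by 1000.
sd_ratio = stdev(weights_mg) / stdev(weights_g)

# Variance is in squared units: it scales by 1000 ** 2.
var_ratio = variance(weights_mg) / variance(weights_g)

# pvariance divides by n, variance divides by n - 1,
# so their ratio is (n - 1) / n, which approaches 1 as n grows.
n = len(weights_g)
divisor_ratio = pvariance(weights_g) / variance(weights_g)

print(sd_ratio, var_ratio, divisor_ratio)
```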

Which Variability Measure Is More Important?

As we indicated earlier, the sample range has applications in the area of statistical

quality control. It may appear to the reader that the use of both the sample

variance and the sample standard deviation is redundant. Both measures reflect the

same concept in measuring variability, but the sample standard deviation measures

variability in linear units whereas the sample variance is measured in squared

units. Both play huge roles in the use of statistical methods. Much of what is

accomplished in the context of statistical inference involves drawing conclusions

about characteristics of populations. Among these characteristics are constants

which are called population parameters. Two important parameters are the

population mean and the population variance. The sample variance plays an

explicit role in the statistical methods used to draw inferences about the population

variance. The sample standard deviation has an important role along with the

sample mean in inferences that are made about the population mean. In general,

the variance is considered more in inferential theory, while the standard deviation

is used more in applications.


Exercises

1.7 Consider the drying time data for Exercise 1.1

on page 13. Compute the sample variance and sample

standard deviation.

1.8 Compute the sample variance and standard deviation for the water absorbency data of Exercise 1.2 on

page 13.

1.9 Exercise 1.3 on page 13 showed tensile strength

data for two samples, one in which specimens were exposed to an aging process and one in which there was

no aging of the specimens.

(a) Calculate the sample variance as well as standard

deviation in tensile strength for both samples.

(b) Does there appear to be any evidence that aging

affects the variability in tensile strength? (See also

the plot for Exercise 1.3 on page 13.)

1.10 For the data of Exercise 1.4 on page 13, compute both the mean and the variance in “flexibility” for both company A and company B. Does there appear to be a difference in flexibility between company A and company B?

1.11 Consider the data in Exercise 1.5 on page 13. Compute the sample variance and the sample standard deviation for both control and treatment groups.

1.12 For Exercise 1.6 on page 13, compute the sample standard deviation in tensile strength for the samples separately for the two temperatures. Does it appear as if an increase in temperature influences the variability in tensile strength? Explain.

1.5 Discrete and Continuous Data

Statistical inference through the analysis of observational studies or designed experiments is used in many scientific areas. The data gathered may be discrete

or continuous, depending on the area of application. For example, a chemical

engineer may be interested in conducting an experiment that will lead to conditions where yield is maximized. Here, o…
