For this assignment, you will implement a project involving statistical procedures. The topic may be something that is related to your work, a hobby, or something you found interesting. If you choose, you may use the example described below.

The project report must include

- name of project and your name
- purpose of project
- data (provide the raw data used, and cite the source)—the sample size must be at least 10
- median, sample mean, range, sample variance, and sample standard deviation (show work)
- frequency distribution
- histogram
- percentage of data within one standard deviation of the mean, percentage of data within two standard deviations of the mean, percentage of data within three standard deviations of the mean (include explanation and interpretation — do your percentages imply that the histogram is approximately bell-shaped?)
- conclusion (several paragraphs interpreting your statistics and graphs; relate to the purpose of the project)

If you choose, you may use the following example for your data.

- Purpose: Compare the amount of sugar in a standard serving size of different brands of cereal. (You may instead choose to compare the amount of fat, protein, salt, or any other category in cereal or some other food.)
- Procedure: Go to the grocery store (or your pantry) and pick at least 10 different brands of cereal. (Instead of choosing a random sample, you might purposely pick from both the “healthy” cereal types and the “sugary” ones.)

From the cereal box, record the suggested serving size and the amount of sugar per serving. The raw data is the serving size and amount of sugar per serving for each of the 10 boxes of cereal. Before calculating the statistics on the amount of sugar in each cereal, be sure you are comparing the same serving size.

If you use a serving size of 50 grams, you must calculate how much sugar is in 50 grams of each cereal. For example, if the box states that there are 9 grams of sugar in 43 grams of cereal, there would be 50 times 9 divided by 43, or 10.5 grams in 50 grams of cereal. The result of this simple calculation (for each of 10 boxes) is the data you will use in the project statistics and charts.
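The serving-size adjustment described above can be sketched in a few lines of Python. This is only an illustration: the function name `sugar_per_standard_serving` and the default 50-gram standard are assumptions chosen to match the worked example, not part of the assignment.

```python
def sugar_per_standard_serving(sugar_g, serving_g, standard_g=50.0):
    """Scale the sugar listed for a box's own serving size to a common serving size.

    sugar_g    -- grams of sugar per serving, as printed on the box
    serving_g  -- grams of cereal in that serving, as printed on the box
    standard_g -- the common serving size used for comparison (assumed 50 g here)
    """
    return standard_g * sugar_g / serving_g


# Worked example from the text: 9 g of sugar in a 43 g serving.
adjusted = sugar_per_standard_serving(9, 43)
print(round(adjusted, 1))  # 10.5
```

Applying the same function to each of the 10 boxes would give the 10 data values used in the rest of the project.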


Statistics Project

Temperatures in January, 2006 in Purcellville, Virginia

Submitted by Suzanne Sands

Purpose: Analyze temperatures for January, 2006 in my local region, Purcellville, Virginia.

Most people are interested in the local weather, including me! I focused on the weather in January, 2006. News reports indicated a much warmer January than usual. I was interested in compiling descriptive summaries in the form of charts and numerical measures to get a sense of the typical temperature for January, 2006, and how the temperatures have varied over the course of the month. (This particular project example is an adaptation of similar project examples I have used in statistics classes I have taught in the past.)

Data: Random Sample of 30 Temperatures in January, 2006 in Purcellville, Virginia

Data Collection: An excellent website, www.weatherunderground.com, provides temperature readings from thousands of weather stations. Toward the middle of the screen, I typed “Purcellville” in the “Location” box and arrived at the Purcellville forecast. At the bottom of that page, there are links for personal weather stations. I clicked on the “Top of Tranquility, Purcellville, VA” link and arrived at

http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KVAPURCE1

You can search for weather readings for any day you like in the recent past. This particular weather station recorded 12 temperatures every hour back in 2006, so there were 12 readings/hr x 24 hours x 31 days = 8,928 temperature readings for January, 2006! I decided to select a simple random sample of 30 temperatures from this large collection of data.

I collected 30 temperatures at random times in January, 2006. (Random sampling is NOT a requirement for your project. For instance, you could record the high temperature for each day.)

FYI: Here is how I chose the random sample: Since there are 31 days in January, I generated 30 random numbers between 1 and 31 (with possible repetition). (You will see below that many days are repeated.) Next I determined the sampling times. Since there were 288 temperature readings each day, I generated 30 random numbers between 1 and 288, representing the reading numbers. Since there were 12 readings per hour, I divided the reading random number by 12 to get the hour and used the remainder to figure out which reading to choose during that hour. I looked up the temperatures for each randomly selected day and time, and recorded the appropriate temperature.
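The selection procedure just described could be sketched as follows. This is a hedged illustration, not the method actually used for the report: the function name `pick_sampling_times`, the `seed` parameter, the choice to subtract 1 before dividing (so hours run from 0 to 23), and the assumption that the 12 readings in an hour are spaced 5 minutes apart are all mine.

```python
import random

READINGS_PER_HOUR = 12
READINGS_PER_DAY = 24 * READINGS_PER_HOUR  # 288 readings each day


def pick_sampling_times(n=30, days_in_month=31, seed=None):
    """Return n randomly chosen (day, hour, minute) tuples, repetition allowed."""
    rng = random.Random(seed)
    picks = []
    for _ in range(n):
        day = rng.randint(1, days_in_month)          # random day, 1..31
        reading = rng.randint(1, READINGS_PER_DAY)   # random reading number, 1..288
        # Quotient gives the hour; remainder gives the reading within that hour.
        hour, slot = divmod(reading - 1, READINGS_PER_HOUR)
        minute = slot * 5  # assumption: readings are about 5 minutes apart
        picks.append((day, hour, minute))
    return sorted(picks)


sample_times = pick_sampling_times(seed=2006)
for day, hour, minute in sample_times[:5]:
    print(f"January {day}, {hour}:{minute:02d}")
```

Each selected day and time would then be looked up by hand on the weather station's history page to record the temperature.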

| Count | Date (January, 2006) | Time | Temperature (degrees) |
|---|---|---|---|
| 1 | 1 | 2:41 | 44.1 |
| 2 | 2 | 6:35 | 31.3 |
| 3 | 2 | 16:21 | 39.7 |
| 4 | 2 | 18:11 | 40.6 |
| 5 | 3 | 11:45 | 39.7 |
| 6 | 4 | 8:05 | 35.2 |
| 7 | 4 | 11:55 | 42.4 |
| 8 | 4 | 20:35 | 39.9 |
| 9 | 4 | 23:52 | 40.3 |
| 10 | 5 | 5:21 | 39.2 |
| 11 | 9 | 11:31 | 52.5 |
| 12 | 10 | 2:45 | 43.7 |
| 13 | 10 | 4:21 | 43.0 |
| 14 | 12 | 2:31 | 37.9 |
| 15 | 12 | 16:55 | 54.9 |
| 16 | 13 | 11:55 | 48.4 |
| 17 | 13 | 16:01 | 57.9 |
| 18 | 15 | 17:35 | 34.9 |
| 19 | 16 | 4:55 | 32.4 |
| 20 | 16 | 11:11 | 37.6 |
| 21 | 16 | 18:35 | 36.3 |
| 22 | 18 | 9:01 | 44.6 |
| 23 | 18 | 13:45 | 40.5 |
| 24 | 23 | 5:41 | 34.2 |
| 25 | 25 | 20:31 | 32.2 |
| 26 | 25 | 21:31 | 31.9 |
| 27 | 27 | 2:11 | 27.7 |
| 28 | 28 | 14:35 | 61.7 |
| 29 | 28 | 19:21 | 48.7 |
| 30 | 31 | 1:01 | 48.7 |


Temperature Data, in ascending order:

27.7 31.3 31.9 32.2 32.4 34.2 34.9 35.2 36.3 37.6 37.9 39.2 39.7 39.7 39.9

40.3 40.5 40.6 42.4 43.0 43.7 44.1 44.6 48.4 48.7 48.7 52.5 54.9 57.9 61.7

Notes: To construct a frequency distribution, typically we need to group the data into about four to eight intervals. In looking over the sorted data, ranging from 27.7 to 61.7, it seems reasonable to use intervals of width 5 or 10 degrees.

Frequency Distribution:

Grouped in intervals of 10 degrees

30 Random Temperatures in January, 2006, Purcellville, VA

| Temperature (degrees) | Frequency | Relative Frequency |
|---|---|---|
| 19.95 – 29.95 | 1 | 0.033 |
| 29.95 – 39.95 | 14 | 0.467 |
| 39.95 – 49.95 | 11 | 0.367 |
| 49.95 – 59.95 | 3 | 0.100 |
| 59.95 – 69.95 | 1 | 0.033 |
| Total | 30 | 1.000 |

Grouped in intervals of 5 degrees

30 Random Temperatures in January, 2006, Purcellville, VA

| Temperature (degrees) | Frequency | Relative Frequency |
|---|---|---|
| 24.95 – 29.95 | 1 | 0.033 |
| 29.95 – 34.95 | 6 | 0.200 |
| 34.95 – 39.95 | 8 | 0.267 |
| 39.95 – 44.95 | 8 | 0.267 |
| 44.95 – 49.95 | 3 | 0.100 |
| 49.95 – 54.95 | 2 | 0.067 |
| 54.95 – 59.95 | 1 | 0.033 |
| 59.95 – 64.95 | 1 | 0.033 |
| Total | 30 | 1.000 |

REMARKS: Both tables show that the temperatures are principally clustered in the 30’s and 40’s. Which table is better? It’s really a toss-up; either one is fine. It’s not necessary to make more than one table. I am showing two tables, just for illustration purposes.

If a table has very low frequencies for all of the intervals (say a frequency of 1-2 for each interval), or if there are more than 10 intervals, that would be an indication that the interval width is too small. For example, if each interval consisted of just one degree, then the frequency table for this temperature data would have over 30 rows and that table would not be very informative, in terms of helping to see where the data are clustered.
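For readers who want to check a frequency table by computer, here is a minimal Python sketch of the counting step. The function name `frequency_table` is a hypothetical helper of mine; the interval boundaries (ending in .95 so that no data value can land exactly on a boundary) follow the tables in this report.

```python
temps = [27.7, 31.3, 31.9, 32.2, 32.4, 34.2, 34.9, 35.2, 36.3, 37.6,
         37.9, 39.2, 39.7, 39.7, 39.9, 40.3, 40.5, 40.6, 42.4, 43.0,
         43.7, 44.1, 44.6, 48.4, 48.7, 48.7, 52.5, 54.9, 57.9, 61.7]


def frequency_table(data, start, width, n_bins):
    """Return a list of (low, high, frequency, relative frequency) rows."""
    rows = []
    for i in range(n_bins):
        low = start + i * width
        high = low + width
        count = sum(1 for x in data if low < x < high)
        rows.append((low, high, count, count / len(data)))
    return rows


five_degree = frequency_table(temps, start=24.95, width=5, n_bins=8)
ten_degree = frequency_table(temps, start=19.95, width=10, n_bins=5)

for low, high, count, rel in five_degree:
    print(f"{low:.2f} to {high:.2f}  {count:2d}  {rel:.3f}")
```

The frequencies in each table should add up to the sample size (30 here), which is a quick check that no value was missed or counted twice.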


Histogram

The histogram is a visual representation of the frequency distribution above, with the temperatures grouped in intervals of 5 degrees.

The majority of temperatures fall between 34.95 and 44.95 degrees.

The histogram was generated with spreadsheet software. Your histogram does not have to be fancy. It can be hand-drawn or typed in plain text form. It is important that the scales and the labeling are clear and accurate.

Plain text histogram:

        Temperatures in January, 2006 in Purcellville, Virginia

Frequency
 9-|
   |                   8       8
 8-|               |XXXXXXX|XXXXXXX|
   |               |XXXXXXX|XXXXXXX|
 7-|               |XXXXXXX|XXXXXXX|
   |           6   |XXXXXXX|XXXXXXX|
 6-|       |XXXXXXX|XXXXXXX|XXXXXXX|
   |       |XXXXXXX|XXXXXXX|XXXXXXX|
 5-|       |XXXXXXX|XXXXXXX|XXXXXXX|
   |       |XXXXXXX|XXXXXXX|XXXXXXX|
 4-|       |XXXXXXX|XXXXXXX|XXXXXXX|
   |       |XXXXXXX|XXXXXXX|XXXXXXX|   3
 3-|       |XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|
   |       |XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|   2
 2-|       |XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|
   |   1   |XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|   1       1
 1-|XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|
   |XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|XXXXXXX|
 0-+-------+-------+-------+-------+-------+-------+-------+-------+
 24.95   29.95   34.95   39.95   44.95   49.95   54.95   59.95   64.95

                  Temperatures (Degrees Fahrenheit)

(NOTE: If typing in plain text, use a fixed width font, such as Courier New)
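A plain text histogram can also be produced with a short script. The sketch below is one possible approach, not the way the chart above was made: it draws the bars horizontally (one row per interval) because that is simpler to type, and the function name `text_histogram` is hypothetical.

```python
# Interval boundaries and frequencies from the 5-degree frequency distribution.
edges = [24.95 + 5 * i for i in range(9)]
counts = [1, 6, 8, 8, 3, 2, 1, 1]


def text_histogram(edges, counts, mark="X"):
    """Return a horizontal text histogram: one row per interval, one mark per value."""
    lines = []
    for low, high, count in zip(edges, edges[1:], counts):
        lines.append(f"{low:5.2f} - {high:5.2f} | {mark * count} {count}")
    return "\n".join(lines)


chart = text_histogram(edges, counts)
print("Temperature (degrees Fahrenheit) | Frequency")
print(chart)
```

As with the typed chart, the output lines up properly only in a fixed width font.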

[Figure: spreadsheet histogram titled “30 Random Temperatures in January, 2006, Purcellville, VA”. Horizontal axis: Temperature (degrees Fahrenheit), in 5-degree intervals from 24.95 to 64.95. Vertical axis: Frequency, from 0 to 9. Bar heights, left to right: 1, 6, 8, 8, 3, 2, 1, 1.]


MEDIAN:

When the 30 data values are sorted, since 30 is even, the median is the average of the two observations in the middle, that is, the average of the values in positions 15 and 16 in the sorted list.

27.7 31.3 31.9 32.2 32.4 34.2 34.9 35.2 36.3 37.6 37.9 39.2 39.7 39.7 39.9

40.3 40.5 40.6 42.4 43.0 43.7 44.1 44.6 48.4 48.7 48.7 52.5 54.9 57.9 61.7

Median = (39.9 + 40.3)/2 = 40.1 degrees.

SAMPLE MEAN = $\bar{x}$ = (sum of the temperatures)/(sample size) = 1242.1/30 = 41.40 degrees

Note that the mean is larger than the median. The histogram has a longer right “tail” compared to the left end, due to a few relatively high temperatures. The mean is affected by the size of the highest temperatures, but the median is not, so the mean is larger than the median.

RANGE = maximum - minimum = 61.7 - 27.7 = 34.0 degrees

SAMPLE VARIANCE = $s^2$ = 66.1417 (calculations shown below; I used a spreadsheet and pasted it into the document)

SAMPLE STANDARD DEVIATION = $s$ = 8.13 degrees (calculation shown below)
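As a cross-check on the numerical summaries above, the same five measures can be computed with Python's standard `statistics` module. This is only a verification sketch (the variable names are mine); it does not replace the requirement to show work.

```python
import statistics

temps = [27.7, 31.3, 31.9, 32.2, 32.4, 34.2, 34.9, 35.2, 36.3, 37.6,
         37.9, 39.2, 39.7, 39.7, 39.9, 40.3, 40.5, 40.6, 42.4, 43.0,
         43.7, 44.1, 44.6, 48.4, 48.7, 48.7, 52.5, 54.9, 57.9, 61.7]

median = statistics.median(temps)       # average of the 15th and 16th sorted values
mean = statistics.mean(temps)           # sum divided by the sample size
data_range = max(temps) - min(temps)    # maximum minus minimum
variance = statistics.variance(temps)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(temps)       # square root of the sample variance

print(f"median   = {median:.1f}")
print(f"mean     = {mean:.2f}")
print(f"range    = {data_range:.1f}")
print(f"variance = {variance:.4f}")
print(f"std dev  = {std_dev:.2f}")
```

Note that `statistics.variance` and `statistics.stdev` are the sample versions; `statistics.pvariance` and `statistics.pstdev` divide by n instead and would give slightly smaller values.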

Data within one standard deviation of the mean must fall in the interval

$(\bar{x} - s,\ \bar{x} + s) = (41.40 - 8.13,\ 41.40 + 8.13) = (33.27,\ 49.53)$

Data within two standard deviations of the mean must fall in the interval

$(\bar{x} - 2s,\ \bar{x} + 2s) = (41.40 - 2(8.13),\ 41.40 + 2(8.13)) = (25.14,\ 57.66)$

Data within three standard deviations of the mean must fall in the interval

$(\bar{x} - 3s,\ \bar{x} + 3s) = (41.40 - 3(8.13),\ 41.40 + 3(8.13)) = (17.01,\ 65.79)$

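In the sorted data below, [ ] encloses the values within one standard deviation of the mean and { } encloses the values within two standard deviations of the mean. All 30 values lie within three standard deviations of the mean.

{ 27.7 31.3 31.9 32.2 32.4 [ 34.2 34.9 35.2 36.3 37.6 37.9 39.2 39.7 39.7 39.9

40.3 40.5 40.6 42.4 43.0 43.7 44.1 44.6 48.4 48.7 48.7 ] 52.5 54.9 } 57.9 61.7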

In the interval $(33.27,\ 49.53)$, there are 21 temperatures, and 21/30 = 70.0%

In the interval $(25.14,\ 57.66)$, there are 28 temperatures, and 28/30 = 93.3%

In the interval $(17.01,\ 65.79)$, there are 30 temperatures, and 30/30 = 100.0%

So, 70.0% of the temperatures fall within one standard deviation of the mean, 93.3% of the temperatures fall within two standard deviations of the mean, and 100% of the temperatures fall within three standard deviations of the mean. For a bell-shaped distribution, the respective percentages are approximately 68%, 95%, and 100%. For the temperature data, the percentages are reasonably close to the bell-shaped model, so yes, the data distribution is approximately bell-shaped.
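The counting behind these percentages can be sketched in Python as follows. The helper name `percent_within` is hypothetical, and the sketch uses the unrounded mean and standard deviation, which for this data gives the same counts as the rounded interval endpoints.

```python
import statistics

temps = [27.7, 31.3, 31.9, 32.2, 32.4, 34.2, 34.9, 35.2, 36.3, 37.6,
         37.9, 39.2, 39.7, 39.7, 39.9, 40.3, 40.5, 40.6, 42.4, 43.0,
         43.7, 44.1, 44.6, 48.4, 48.7, 48.7, 52.5, 54.9, 57.9, 61.7]

mean = statistics.mean(temps)
std_dev = statistics.stdev(temps)


def percent_within(data, center, spread, k):
    """Percentage of data strictly inside (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    inside = sum(1 for x in data if low < x < high)
    return 100 * inside / len(data)


percentages = [percent_within(temps, mean, std_dev, k) for k in (1, 2, 3)]
for k, pct in zip((1, 2, 3), percentages):
    print(f"within {k} standard deviation(s): {pct:.1f}%")
```

Comparing the three printed values with 68%, 95%, and 100% is the same informal bell-shape check made in the paragraph above.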


Calculation of sample variance and sample standard deviation:

| Col 1: Count | Col 2: Temperature x | Col 3: x – Mean | Col 4: (x – Mean)^2 = [Col 3]^2 |
|---|---|---|---|
| 1 | 44.1 | 2.6967 | 7.2720 |
| 2 | 31.3 | -10.1033 | 102.0773 |
| 3 | 39.7 | -1.7033 | 2.9013 |
| 4 | 40.6 | -0.8033 | 0.6453 |
| 5 | 39.7 | -1.7033 | 2.9013 |
| 6 | 35.2 | -6.2033 | 38.4813 |
| 7 | 42.4 | 0.9967 | 0.9933 |
| 8 | 39.9 | -1.5033 | 2.2600 |
| 9 | 40.3 | -1.1033 | 1.2173 |
| 10 | 39.2 | -2.2033 | 4.8547 |
| 11 | 52.5 | 11.0967 | 123.1360 |
| 12 | 43.7 | 2.2967 | 5.2747 |
| 13 | 43.0 | 1.5967 | 2.5493 |
| 14 | 37.9 | -3.5033 | 12.2733 |
| 15 | 54.9 | 13.4967 | 182.1600 |
| 16 | 48.4 | 6.9967 | 48.9533 |
| 17 | 57.9 | 16.4967 | 272.1400 |
| 18 | 34.9 | -6.5033 | 42.2933 |
| 19 | 32.4 | -9.0033 | 81.0600 |
| 20 | 37.6 | -3.8033 | 14.4653 |
| 21 | 36.3 | -5.1033 | 26.0440 |
| 22 | 44.6 | 3.1967 | 10.2187 |
| 23 | 40.5 | -0.9033 | 0.8160 |
| 24 | 34.2 | -7.2033 | 51.8880 |
| 25 | 32.2 | -9.2033 | 84.7013 |
| 26 | 31.9 | -9.5033 | 90.3133 |
| 27 | 27.7 | -13.7033 | 187.7813 |
| 28 | 61.7 | 20.2967 | 411.9547 |
| 29 | 48.7 | 7.2967 | 53.2413 |
| 30 | 48.7 | 7.2967 | 53.2413 |
| Sum | 1242.1 | | 1918.1097 |

Mean = 1242.1/30 = 41.40333333 (divide the Col 2 sum by 30)

Sample Variance = 1918.1097/29 = 66.14171264 (divide the Col 4 sum by 29, one less than the sample size)

Sample Standard Deviation = sqrt(66.14171264) = 8.132755538 (square root of the variance)

Note: The results of the calculations can be checked by using the spreadsheet functions var( ) and stdev( ) in Excel. However, for the purposes of demonstrating understanding of the calculations, you must show work similar to the table above.
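The column-by-column calculation in the table can be mirrored in Python, which may help in checking hand or spreadsheet work. This is a sketch under my own naming (`deviations`, `squared_deviations`, and so on); the data are listed in the original sampling order, as in the table.

```python
import math
import statistics

# Col 2: the 30 temperatures, in the order they were sampled.
temps = [44.1, 31.3, 39.7, 40.6, 39.7, 35.2, 42.4, 39.9, 40.3, 39.2,
         52.5, 43.7, 43.0, 37.9, 54.9, 48.4, 57.9, 34.9, 32.4, 37.6,
         36.3, 44.6, 40.5, 34.2, 32.2, 31.9, 27.7, 61.7, 48.7, 48.7]

n = len(temps)
mean = sum(temps) / n                                 # Col 2 sum divided by 30

deviations = [x - mean for x in temps]                # Col 3: x - Mean
squared_deviations = [d ** 2 for d in deviations]     # Col 4: (x - Mean)^2

sum_of_squares = sum(squared_deviations)              # Col 4 sum
sample_variance = sum_of_squares / (n - 1)            # divide by 29, not 30
sample_std_dev = math.sqrt(sample_variance)

print(f"Sum of Col 4     = {sum_of_squares:.4f}")
print(f"Sample variance  = {sample_variance:.4f}")
print(f"Sample std. dev. = {sample_std_dev:.4f}")

# Cross-check against the library's sample variance.
print(math.isclose(sample_variance, statistics.variance(temps)))
```

Dividing by n - 1 rather than n is what makes this the sample variance; using n would give the smaller population variance.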


CONCLUSION

In January, 2006 in Purcellville, Virginia, the 30 sampled temperatures fell between 27.7 and 61.7 degrees, for a range of 34 degrees. Temperatures tended to be concentrated in the upper 30’s and low 40’s, as shown in the histogram.

The median temperature is 40.1° and the mean temperature is 41.4°, with standard deviation 8.13°. The temperature data distribution is approximately bell-shaped.

As mentioned at the beginning of this report, January of 2006 seemed to be unusually warm. The analysis in this project agrees with this conjecture. In looking at the website www.weather.com, I found that the average daily HIGH temperature for January (in any year) in Purcellville is 42 degrees. My analysis found the average of ALL sampled temperatures (not merely the daily highs) to be 41.4 degrees, not much below the typical daily high.

[Remark: The average of the data, 41.4, is a statistic: it is the average temperature for the sample. It is possible that the average of all January temperature readings is somewhat different. If we were familiar with the techniques of inferential statistics, we could assess whether we can take this statistic and use it in making a statistical inference.]

FINAL REMARKS: This sample project could have been done without the use of a spreadsheet or fancy software, if the frequency distribution and histogram were carefully hand-drawn or typed. I have added considerable commentary to the project items, to indicate what I was thinking about when completing the tasks. You can be less “wordy,” but be sure that your work and summary are detailed and informative, and that you show calculations as requested.

Statistics Project (10% of course grade, due ___)

For this assignment, you will implement a project involving statistical procedures. The topic may be something that is related to your work, a hobby, or something you found interesting. If you choose, you may use the example described below.

The project report must include

• name of project and your name
• purpose of project
• data (provide the raw data used, and cite the source); the sample size must be at least 10. (The project example uses a random sample. Your sample does not have to be random. You could be collecting personal data, such as your own bowling scores, and in that case, the source is just your personal records.) Post a summary of your topic and your data here in the Statistics Project class conference (as a new topic). Include a brief informative description in the title of your posting.
• frequency distribution
• histogram
• median, sample mean, range, sample variance, and sample standard deviation (show work)
• percentage of data within one standard deviation of the mean, percentage of data within two standard deviations of the mean, percentage of data within three standard deviations of the mean (include explanation and interpretation: do your percentages imply that the histogram is approximately bell-shaped?)
• conclusion (several paragraphs interpreting your statistics and graphs; relate to the purpose of the project)


Here is a Statistics Project Checklist:

* Title

* Your Name

* Purpose

* Data (A set of at least 10 data values of a numeric quantity, such as sugar per serving. Cite the source. Label data as appropriate. For instance, Brand X: 10.5g per serving.)

* Summary of Topic, and Data (posted in the Statistics Project conference.) The idea here is two-fold: (1) To share your interesting project idea with your classmates, and (2) To give me a chance to give you a brief thumbs-up or thumbs-down before you finish the project. Sometimes students get off on the wrong foot or misunderstand the intent of the project, and your posting provides an opportunity for some feedback. Remark: Students may use similar topics, but must have different data sets. For example, several students may be interested in points scored by a particular team, and that is fine, but they must collect different data, perhaps from different years.

* Frequency Distribution (used for histogram; can do by hand, or by using statistical software)

* Histogram (can do by hand, or by using statistical software)

* Median (SHOW WORK/EXPLANATION)

* Sample Mean (SHOW WORK/EXPLANATION; result can be confirmed by using statistical software)

* Range (SHOW WORK/EXPLANATION)

* Sample Variance (SHOW WORK/EXPLANATION; result can be confirmed by using statistical software) Make sure you are using the correct sample variance formula when showing your calculations.

* Sample Standard Deviation (SHOW WORK/EXPLANATION; result can be confirmed by using statistical software)

* Distribution of data: Calculate the percentage of data within one standard deviation of the mean, percentage of data within two standard deviations of the mean, percentage of data within three standard deviations of the mean (include explanation and interpretation). For a bell-shaped distribution, the respective percentages are approximately 68%, 95%, and 100%. Do your percentages imply that your data distribution is approximately bell-shaped? Note that the answer could be Yes or No, depending on your data. You can also look at the shape of your histogram (is it roughly bell-shaped?) as well as the percentages when making your judgment.

* Conclusion (a short narrative summary): Interpret your results in a narrative summary consisting of several paragraphs. Be sure to describe features of the graphs and measurements that you find to be important or interesting.

You may submit all of your project in one document or a combination of documents, which may consist of word processing documents or spreadsheets or scanned handwritten work, provided it is clearly labeled where each checklist item can be found. Projects are graded on the basis of completeness, correctness, ease in locating all of the checklist items, and strength of the narrative portions.

Please see additional topics for Statistics Project Suggestions and Tips, and for a completely-worked Statistics Project Example.

———————————————

Project Suggestions and Tips

Suggestions for Topics: Look to your own experience and interests for sample topics. Here are some popular variables of interest, for which data are relatively easy to collect:

• Temperatures (ex: set of high temperatures for a particular location)
• Sports-related data or hobby-related data (ex: your most recent 12 bowling scores, points/runs scored by a particular team for games in a specified time period)
• Gas prices in different geographic locations
• Breakfast cereal attribute (ex: calories per serving, or cost per serving, carbohydrates per serving, etc.)
• Milk prices
• Stock price or Dow-Jones index for selected days
• Mileage rating (miles per gallon) for various vehicles
• Unemployment rates in various locales
• Salaries for a particular type of job (ex: teacher salary) in various geographic locations
• Data compiled for some feature of an item (example: prices of 4&5-megapixel digital cameras reviewed in Consumer Reports)

Tips:

• Remember that a sample size of at least 10 is required.
• If all of the data are virtually the same (example: 10 gas prices and 8 of them nearly identical), then there is little interest in carrying out descriptive statistics techniques, because there isn’t anything interesting to learn from the analysis! So, your data should have some variability.
• Make sure you are analyzing data related to a single variable. For example, if your data consists of 15 unemployment rates for males and 15 unemployment rates for females in 15 selected cities, then you are working with two variables (male unemployment rate and female unemployment rate), NOT one variable as specified.
• For this project, it is NOT necessary that the data be drawn as a random sample. (In the Statistics Project example, the data do happen to be randomly selected, to show you how it can be done, but random sampling is not a requirement for this project.)
• Be sure that your graphs/charts are labeled appropriately, with the axes, scales, and title easy to interpret.
• Use the Statistics Project Checklist as an aid in making sure your project meets all of the specifications.

**MATH 106 QUIZ 6** **NAME: _______________**

Professor: Dr. Katiraie

**INSTRUCTIONS**

· The quiz is worth 100 points. There are 10 problems (each worth 10 points).

· This quiz is **open book** and open notes, with unlimited time; you may not consult anyone.

· **You must show your work to receive full credit. If you do not show your work, you may earn only partial or no credit at the discretion of the professor.** Please type your work in your copy of the quiz, or if you prefer, create a document containing your work. Scanned work is acceptable also. Be sure to include your name in the document.

Consult the **Additional Information** portion of the online Syllabus for options regarding the submission of your quiz. If you have any questions, please contact me by e-mail (farajollah.katiraie@faculty.umuc.edu).

**MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.**

**Refer to the histogram to answer the question.**

1) The telephone company kept track of the calls for the correct time during a 24-hour period for two weeks. The results are shown in the histogram below.

**What is the probability that a person will make 62 calls?**

1) _______

A) 9/14

B) 1/2

C) 1/14

D) 2/7

E) none of the above

2) The telephone company kept track of the calls for the correct time during a 24-hour period for two weeks. The results are shown in the histogram below.

**What percentage of people will make fewer than 61 calls?**

2) _______

A) 21.43%

B) 92.86%

C) 28.43%

D) 35.71%

E) None of the above

**Prepare a frequency distribution with a column for intervals and frequencies.**

3) The following is the number of hours students studied per week on average. Use five intervals, starting with 0 – 4.

2 8 13 18 23 24 16 13 7 2

9 12 19 23 19 12 4 9 15 20

3) _________

A)

B)

C)

D)

E) None of the above

4) Construct the cumulative frequency distribution that corresponds to the given frequency distribution.

4) _________

A)

B)

C)

D)

E) None of the above

5) The ages of the voters at a poll during a 20-minute period are listed below.

Use five intervals starting with

5) _______

35 29 48 63 64 38 21 23 41 68

61 42 43 47 33 37 46 27 23 30

A)

B)

C)

D)

E) None of the above

6) Given the following information:

6) _______

**Find the cumulative frequency distribution using the above frequencies.**

A)

B)

C)

D)

E) None of the above

7) The students in Mrs. Logan’s Spanish class received the following grades on a test. Use four intervals starting with 60 – 69.

7) _______

75 94 87 83 78 72 65 75 82 78 97

72 87 94 72 83 87 95 85 97 69

**Arrange the above data into a histogram table.**

A)

B)

C)

D)

E) None of the above

8) The United Way had six large donations: $2657.89, $8237.45, $1682.17, $6271.34, $3918.54, and $5817.57. Find the mean donation.

8) _______

A) $ 5716.99

B) $ 5704.99

C) $ 7146.24

D) $ 4764.16

E) None of the above

9) A mean score of 70 for 6 exams is needed for a final grade of C. Jane’s first 5 exam scores are 65, 70, 55, 82 and 87.

Determine the score needed on the sixth exam to get a C in the course.

9) _______

A) 60

B) 72

C) 61

D) 85

E) None of the above

10) **Find the mean and standard deviation for the following data. (Round the mean to the nearest hundredth and the standard deviation to the nearest thousandth.)**

**8.25, 9.85, 10.89, 9.95, 7.85, 2.05**

10) ______

A) Mean = 8.39, Standard Deviation = 3.236

B) Mean = 8.59, Standard Deviation = 0.319

C) Mean = 8.19, Standard Deviation = 2.999

D) Mean = 8.14, Standard Deviation = 3.192

E) None of the above
