#Name:
#Student ID:
rm(list=ls())
source('Rallfun-v33.txt')
#PART 1
#A company claims that, when exposed to their toothpaste, 45% of all bacteria related to gingivitis are killed, on average. You run 10 tests and find that the percentages of bacteria killed in each test were:
# 38%, 44%, 62%, 72%, 43%, 40%, 43%, 42%, 39%, 41%
# Assuming normality, you will test the hypothesis that the average percentage of bacteria killed was 45% at alpha=0.05.
#1.1) Write out the Null and Alternative hypotheses
#1.2) Calculate the T-statistic and use Method 1 (we saw in class) to determine if the average bacteria killed was 45%. Do it by “hand”.
#Hint: Method 1 is to compare T to a critical value “c”.
#1.3) Do you reject or fail to reject the null?
################################################
#PART 2
#Now, let’s not assume normality
#2.1) Using the same data as in Part 1, test the hypothesis that the 20% trimmed mean is 45%?
#2.2) Do you reject or fail to reject the null?
#2.3) Assuming your test in 2.1 is the truth, what type of error did you make in #1.3?
################################################
#PART 3
#In a study of court administration, the following times to disposition (in minutes) were determined for twenty cases and found to be:
# 42, 90, 84, 87, 116, 95, 86, 99, 93, 92, 121, 71, 66, 98, 79, 102, 60, 112, 105, 98
#Assuming normality, you will test the hypothesis that the average time to disposition was 99 minutes at alpha=0.05.
#3.1) Write out the Null and Alternative hypotheses
#3.2) Calculate the T-statistic and use Method 2 (we saw in class) to determine if the average time to disposition was 99? Do it by “hand”.
#Hint: Method 2 is to evaluate the confidence interval.
#3.3) Do you reject or fail to reject the null?
################################################
#PART 4
#Now, let’s not assume normality
#4.1) Using the same data as in Part 3, test the hypothesis that the 20% trimmed mean is 99?
#4.2) Do you reject or fail to reject the null?
#4.3) Assuming your test in 4.1 is the truth, what type of error did you make in #3.3?
################################################
#PART 5
#Suppose you run an experiment, and observe the following values:
# 12, 20, 34, 45, 34, 36, 37, 50, 11, 32, 29
#You will test the hypothesis that the average was 25 at alpha=0.05.
#5.1) Write out the Null and Alternative hypotheses. Conduct the hypothesis test assuming normality. Use the “t.test” function. Do you reject or fail to reject the null?
#5.2) Conduct the hypothesis test without assuming normality. Do you reject or fail to reject the null?
#5.3) Assuming the answer in #5.2 is the truth, what type of error (if any) did you make in #5.1 by assuming normality?
#------------------------------------------------------------
#Lab 7- Lecture Notes (FOR YOUR REFERENCE)
#Lab 7-Contents
#1. Formulating Hypotheses
#2. T-statistics by Hand
#3. Alpha Level
#4. Evaluating Our Results
#5. Using the t.test function
#6. T-tests with Trimmed Means (trimci function)
#7. Type 1 and Type 2 Errors
# Last week we talked about computations for when the Population
#Variance is known and unknown.
# Given that we rarely know the population variance,
#we will use the T-distribution for all of today’s lab.
#We will primarily work with the dataset brfss09_lab7.txt:
#########################################################################################################################
#Behavioral Risk Factor Surveillance System 2009 (BRFSS09) Data Dictionary:
#------------------------------------------------------------
#id:        "Subject ID"                             Values [1,998]
#physhlth:  "# Days past month physical health poor" Values [1,30]
#menthlth:  "# Days past month mental health poor"   Values [1,30]
#hlthplan:  "Have healthcare coverage?"              Values 1=Yes, 2=No
#age:       "Age in Years"                           Values [18,99]
#sex:       "Biologic Sex"                           Values 0=Female, 1=Male
#fruit_day: "# of servings of fruit per day"         Values [0,20]
#alcgrp:    "Alcohol Consumption Groups"             Values 1=None, 2=1-2 drinks/day, 3=3 or more drinks/day
#smoke:     "Smoking Status"                         Values 0=Never, 1=Current EveryDay, 2=Current SomeDays, 3=Former
#bmi:       "Body Mass Index"                        Values [14,70]
#mi:        "Myocardial Infarction (heart attack)"   Values 0=No, 1=Yes
#————————————————————————————————————————
# For today’s lab, let’s start by reading in our datafile
# ‘brfss09_lab7.txt’ into an object called mydata
mydata=read.table('brfss09_lab7.txt', header=TRUE)
#This file contains:
dim(mydata)#100 Subjects, 11 variables
#With the following variables:
names(mydata)
# We have collected this data and would like to know
#if the values we have found in our sample are different
#from the reported values in the literature.
# For example, it has been reported that the average BMI
# in the population is 27.5. We would like to know if the
#values in our sample are somehow different than this value.
#———————————————————————————
# 1. Formulating Hypotheses
#———————————————————————————
#Step 1 of determining if our BMI values differ from the
#national average of 27.5 is to formulate our hypotheses
#We have TWO hypotheses
#1) The Null Hypothesis: H0: mu = 27.5
#2) The Alternative Hypothesis: HA: mu != 27.5
#NOTE: mu=Population Mean
#The above hypotheses are Two-Sided.
#By this I mean that we are looking to see if our sample values of
#BMI are greater than (>) OR less than (<) 27.5.
# A one-sided hypothesis test would look like:
#H0: mu <= 27.5
#HA: mu > 27.5
#OR
#H0: mu >= 27.5
#HA: mu < 27.5
#We will always use two-sided tests in this class,
#and similarly in the real world two-sided tests dominate.
#Once we have our hypotheses we will evaluate them
#and determine one of two outcomes:
# A) Reject the Null Hypothesis
# B) Fail to Reject the Null Hypothesis
#———————————————————————————
# 2. T-statistics by Hand (well..with help from the computer)
#———————————————————————————
#Recall from the last lab, that the formula for a T-statistic is:
# T = (SampleMean - PopMean) / (SampleSD/sqrt(N))
#Another way to write this would be:
# T = (xbar - mu) / (s/sqrt(N))
#In this instance PopMean (mu) is the NULL hypothesis
#value we are testing against.
#We can solve for the other values that we don’t yet know:
mu=27.5
xbar=mean(mydata$bmi) #28.22
s=sd(mydata$bmi) #6.32
N=100
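#(Careful: naming this object T overwrites R's shorthand T for TRUE,
#so write TRUE out in full from here on.)
T = (xbar - mu) / (s/sqrt(N))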
T #1.14
#We end up with a T value of ~ 1.14
#But how does this tell us if our mean is different from 27.5 ???!!!
#Before we move on, I want us to think about why we need
#to evaluate if our mean of 28.22 is different from 27.5.
#Certainly we can see that these are different numbers,
#so what are we really asking here?
#One way to think about it is that we are asking if our
#sample mean of 28.22 is different from 27.5 simply due to chance.
#Think of a coin tossing example:
#Your friend tosses a coin in the air and it lands on heads
#3 times in a row!
#While kinda cool, that seems like it is probably random chance.
#What about if it landed on heads 100 times in a row?!
#You would probably think she was cheating somehow!
#Though it is possible to have 100 heads in a row
#by chance alone, it is very unlikely
#The point at which we say that something is random vs not
#is determined by our alpha level.
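#A rough sketch of the coin-toss intuition in numbers (illustrative only;
#the object names p3, p5 and p100 are made up for this example).
#For a fair coin, the chance of k heads in a row is 0.5^k:
p3 = 0.5^3 #3 heads in a row: 0.125
p5 = 0.5^5 #5 heads in a row: 0.03125
p100 = 0.5^100 #100 heads in a row: about 7.9e-31
c(p3, p5, p100)
#Three heads in a row happens about 1 time in 8 with a fair coin, so it is
#not surprising. One hundred in a row is so unlikely that chance alone is
#a poor explanation.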
#———————————————————————————
# 3. Alpha Level
#———————————————————————————
# The alpha level is determined a priori (ahead of time)
#and used to set the threshold by which we consider something
#to be random chance
# A common alpha level is 0.05.
# We typically reject the null (think something is not chance)
#when a result at least as extreme as ours (eg. 28.22) would occur
#< 5% of the time by chance alone if the null were true.
#Recall from Lab 6, that we use the alpha level
#to help figure out critical values (c)
# c = qt(1-(alpha/2), df)
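#A small sketch of how the critical value depends on alpha and on the
#degrees of freedom. The names ex_alphas and ex_crit are made up for this
#illustration; 99 degrees of freedom assumes a sample of 100, as in
#today's data.
ex_alphas = c(0.10, 0.05, 0.01)
ex_crit = qt(1-(ex_alphas/2), 99)
ex_crit #a smaller alpha gives a larger critical value (harder to reject)
#With fewer observations the critical value grows:
qt(1-(0.05/2), 9) #9 degrees of freedom
qt(1-(0.05/2), 99) #99 degrees of freedom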
#———————————————————————————
# 4. Evaluating our Results
#———————————————————————————
# There are 3 ways to evaluate if our mean of 28.22
# is different from the null of 27.5
# All three ways will yield the same conclusion.
#1) Compare T to a critical value (c)
#2) Evaluate the Confidence interval
#3) Compare the p value to our alpha level
###########################################################
#1) Compare T to a critical value (c)
#In order to compute the critical value (c),
#we must know the alpha level.
#We will choose a value of 0.05 (which is standard)
alpha=0.05
df=100-1
c=qt(1-(alpha/2), df)
#We can then compare the absolute value of T (|T|)
#to the critical value c
#A) If |T| > c, then Reject the Null Hypothesis
#B) If |T| < c, then Fail to Reject the Null Hypothesis
#Let's look at T and c
abs(T)
c
#What decision do we make about the Null Hypothesis????
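#A minimal sketch of Method 1 written as code, using made-up values so
#that it does not answer the question above. Everything prefixed ex_ is
#hypothetical (names and numbers), not part of the lab data.
ex_x = c(24.1, 31.5, 27.8, 22.9, 35.2, 29.4, 26.3, 30.8, 25.7, 33.0)
ex_mu0 = 27.5 #null value
ex_n = length(ex_x)
ex_T = (mean(ex_x) - ex_mu0) / (sd(ex_x)/sqrt(ex_n))
ex_c = qt(1-(0.05/2), ex_n-1)
if (abs(ex_T) > ex_c) {
  print("Reject the Null Hypothesis")
} else {
  print("Fail to Reject the Null Hypothesis")
}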
###########################################################
#2) Evaluate the Confidence interval
#Rather than compare T to c,
#we could instead compute the confidence interval.
#Recall the formula for the Confidence interval is:
# LB = xbar - c*(s/sqrt(N))
# UB = xbar + c*(s/sqrt(N))
LB = xbar - c*(s/sqrt(N))
UB = xbar + c*(s/sqrt(N))
#A) If mu is not within the Confidence Interval,
#then Reject the Null Hypothesis
#B) If mu is within the Confidence Interval,
#then Fail to Reject the Null Hypothesis
#Let’s look at LB and UB
LB
UB
mu
#What decision do we make about the Null Hypothesis????
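#A sketch of Method 2 on made-up numbers, checked against the interval
#that the built-in t.test function reports. Everything prefixed ex_ is
#hypothetical and is not part of the lab data.
ex_x = c(24.1, 31.5, 27.8, 22.9, 35.2, 29.4, 26.3, 30.8, 25.7, 33.0)
ex_mu0 = 27.5 #null value
ex_n = length(ex_x)
ex_c = qt(1-(0.05/2), ex_n-1)
ex_LB = mean(ex_x) - ex_c*(sd(ex_x)/sqrt(ex_n))
ex_UB = mean(ex_x) + ex_c*(sd(ex_x)/sqrt(ex_n))
c(ex_LB, ex_UB)
#TRUE here means the null value is inside the interval (Fail to Reject)
ex_LB <= ex_mu0 & ex_mu0 <= ex_UB
#The interval from t.test should match the one built by hand
t.test(ex_x, mu=ex_mu0)$conf.int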
###########################################################
#3) Compare the p value to our alpha level
#Lastly, we could find the probability value (or p-value)
#for the T statistic we created.
#We can do this by using the pt() function we learned
#about last week in lab 6.
#There is a formula for computing p-values from T-statistics:
# pval = 2*(1-pt(abs(T), df))
pval = 2*(1-pt(abs(T), df))
#We then compare the p-value to our alpha level
#A) If pval < alpha, then Reject the Null Hypothesis
#B) If pval > alpha, then Fail to Reject the Null Hypothesis
#Let’s look at our p-value.
pval
alpha
#What decision do we make about the Null Hypothesis????
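#A sketch showing that the three methods give the same decision. It uses
#made-up numbers; everything prefixed ex_ is hypothetical and is not part
#of the lab data.
ex_x = c(24.1, 31.5, 27.8, 22.9, 35.2, 29.4, 26.3, 30.8, 25.7, 33.0)
ex_mu0 = 27.5 #null value
ex_alpha = 0.05
ex_n = length(ex_x)
ex_se = sd(ex_x)/sqrt(ex_n)
ex_T = (mean(ex_x) - ex_mu0) / ex_se
ex_c = qt(1-(ex_alpha/2), ex_n-1)
ex_pval = 2*(1-pt(abs(ex_T), ex_n-1))
#Each object is TRUE when that method says "Reject the Null Hypothesis"
ex_m1 = abs(ex_T) > ex_c #Method 1: |T| vs c
ex_m2 = ex_mu0 < mean(ex_x) - ex_c*ex_se | ex_mu0 > mean(ex_x) + ex_c*ex_se #Method 2: mu outside CI
ex_m3 = ex_pval < ex_alpha #Method 3: p-value vs alpha
c(ex_m1, ex_m2, ex_m3) #all three entries should be the same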
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
#Exercise 4-1:
#Evaluate if the mean age from our sample (mydata) is different
#than the population mean age of 56
# A) Write down the Null and Alternative Hypotheses
# B) Calculate the T-statistic by hand
# C) Evaluate the Null hypothesis by using ALL 3 methods that
# we just discussed
# D) Based on the results in C, do you Reject or Fail to Reject
# the Null Hypothesis?
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
#A)
#B)
#C)
#Method 1: Compare T to a critical value (c)
#Method 2: Evaluate the Confidence interval
#Method 3: Compare the p value to our alpha level
#D)
#———————————————————————————
# 5. Using the t.test function
#———————————————————————————
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#
# One Sample T-Test : t.test(data$variable, mu)
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#
#It was really awesome that we figured out T by hand!
#And then figured out the confidence intervals and P values!
#From now on, let’s just use a program to do all this for us.
#The function t.test uses a 95% confidence level (conf.level=0.95) by default,
#which corresponds to an alpha level of 0.05.
t.test(mydata$age, mu=56)
# t.test(mydata$bmi, mu=27.5)
#Much simpler!
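#t.test returns a list, so individual pieces can be pulled out by name
#instead of being read off the printout. A short sketch on made-up values
#(ex_x and ex_res are hypothetical names, and the numbers are not lab data):
ex_x = c(24.1, 31.5, 27.8, 22.9, 35.2, 29.4, 26.3, 30.8, 25.7, 33.0)
ex_res = t.test(ex_x, mu=27.5)
ex_res$statistic #the T statistic
ex_res$parameter #degrees of freedom
ex_res$p.value #p-value
ex_res$conf.int #95% confidence interval
#A different alpha is set through conf.level, e.g. alpha=0.01:
t.test(ex_x, mu=27.5, conf.level=0.99)$conf.int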
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
#Exercise 5-1: Use the t.test function to evaluate if
#A) the mean days of physical health (physhlth) is different
# than the population mean of 10? Reject the Null?
#B) the mean fruits per day (fruit_day) is different than
# the population mean of 4? Reject the Null?
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
#A)
#B)
#------------------------------------------------------------
# 6. T-test with Trimmed Means
#———————————————————————————
#To use the T-test with trimmed means,
#we will need to load in the source code ‘Rallfun-v33.txt’
#The trimmed mean T-test is beneficial in that it does not
#presume a perfect Normal Distribution
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#
# Trimmed Mean T-Test:
# trimci(data$variable, tr=0.2, alpha=0.05, null.value=0)
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#
#For example, if I wanted to test if the age was equal to 56
#using Trimmed Means I could do:
trimci(mydata$age, null.value=56)
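#For intuition, here is a rough base-R sketch of the kind of calculation
#a trimmed-mean T-test involves (the Tukey-McLaughlin approach). This is
#an assumption about the method, not a copy of the trimci source, so use
#trimci itself for the lab. All ex_ names and the data are made up.
ex_x = c(24.1, 31.5, 27.8, 22.9, 35.2, 29.4, 26.3, 30.8, 25.7, 58.0) #58.0 is an outlier
ex_tr = 0.2 #proportion trimmed from each tail
ex_null = 27.5 #null value
ex_n = length(ex_x)
ex_g = floor(ex_tr*ex_n) #number of values trimmed from each tail
ex_tmean = mean(ex_x, trim=ex_tr) #20% trimmed mean
#Winsorize: pull the g smallest and g largest values in to their neighbors
ex_sorted = sort(ex_x)
ex_win = pmin(pmax(ex_x, ex_sorted[ex_g+1]), ex_sorted[ex_n-ex_g])
ex_se = sd(ex_win) / ((1-2*ex_tr)*sqrt(ex_n)) #standard error of the trimmed mean
ex_df = ex_n - 2*ex_g - 1
ex_test = (ex_tmean - ex_null) / ex_se
ex_p = 2*(1-pt(abs(ex_test), ex_df))
c(trimmed.mean=ex_tmean, test.stat=ex_test, p.value=ex_p)
#The ordinary mean is pulled up by the outlier; the trimmed mean is not.
c(mean(ex_x), ex_tmean)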
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
#Exercise 6-1: Use the trimci function to evaluate if
#A) the 20% trimmed mean of days of physical health (physhlth) is
# different than the population mean of 10? Reject the Null?
#B) the 20% trimmed mean fruits per day (fruit_day) is different
#than the population mean of 4? Reject the Null?
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
#A)
#B)
#------------------------------------------------------------
# 7. Type 1 and Type 2 Errors
#———————————————————————————
#Notice that we had very different answers to the same
#questions in Ex. 5-1 and 6-1,
#depending upon the method that we used.
#This brings us to discussing Type 1 and Type 2 Error
#A Type 1 error is when our test tells us to reject the null,
#but in truth we should not have
#A Type 2 error is when our test tells us to fail to reject the
#null, but in truth we should have rejected the null
#The following 2×2 square might make this easier to see.
#                 |       Truth        |
#                 |   H0    |    HA    |
#-----------------|---------|----------|
# My Test: H0     |   H0    |  Type 2  |
#-----------------|---------|----------|
# My Test: HA     | Type 1  |    HA    |
#---------------------------------------
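#A small simulation sketch of what the two error rates mean. The data are
#simulated (not real), and the ex_ names and the chosen means, sd and
#sample size are made up for illustration.
#When the null is true and the data really are normal, a test at
#alpha=0.05 should wrongly reject in roughly 5% of repeated samples:
set.seed(1)
ex_reject = replicate(2000, t.test(rnorm(20, mean=27.5, sd=6), mu=27.5)$p.value < 0.05)
mean(ex_reject) #proportion of Type 1 errors, expected to be near 0.05
#When the null is false (true mean 31, not 27.5), every failure to
#reject is a Type 2 error:
ex_miss = replicate(2000, t.test(rnorm(20, mean=31, sd=6), mu=27.5)$p.value >= 0.05)
mean(ex_miss) #proportion of Type 2 errors for this setup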
#For the next exercise, let’s presume that our test of the trimmed mean is the Truth
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
#Exercise 7-1:
#A) What type of error did we make when evaluating the mean
#of physhlth in exercise 5-1?
#B) What type of error did we make when evaluating the mean
#of fruit_day in exercise 5-1?
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
#A)
#B)