Guidelines: Homeworks should be clear and legible, with answers clearly indicated and work shown. Homeworks will be given a minus, check, or check plus based on completion and correctness. You are welcome to work with others, but please submit your own work. Your homework must be produced in an R Markdown (.rmd) file submitted via GitHub. If you are having trouble accomplishing this, please refer to the guide. This homework adapts materials from the work of Michael Lynch (//spia.uga.edu/faculty_pages/mlynch/), Tyler McCormick (http://www.stat.washington.edu/~tylermc/), and Open Intro (https://www.openintro.org/stat/textbook.php).
Topics covered in this homework include:
This is fairly straightforward; we can use the pnorm command to compute the probability of there being fewer than 19 bashings and the probability of there being more than 40 bashings, and then add them together:
#this is the lower tail (less than 19)
lower.pval <- pnorm(q=19,mean=35,sd=8,lower.tail=TRUE);lower.pval
## [1] 0.02275013
upper.pval <- pnorm(q=40,mean=35,sd=8,lower.tail=FALSE);upper.pval
## [1] 0.2659855
total.pval <- lower.pval + upper.pval; total.pval
## [1] 0.2887357
The probability that a WWE program will have fewer than 19 bashings OR more than 40 bashings is about 0.289.
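As a cross-check, the same two-tail probability can be obtained from the complement of the middle region. This is only a sketch; the names middle.prob and total.check are introduced here for illustration and are not part of the original solution.

```r
# Probability of landing between 19 and 40 bashings (the middle region)
middle.prob <- pnorm(q=40, mean=35, sd=8) - pnorm(q=19, mean=35, sd=8)

# The two tails together are whatever probability is left over
total.check <- 1 - middle.prob
round(total.check, 3)
```

Both routes should agree up to floating-point error, since the lower tail, the middle region, and the upper tail partition the whole distribution.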
Days | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|
Prob. | 0.06 | 0.21 | 0.37 | 0.20 | 0.13 | 0.03 |
Let A be the event “It will be more than 6 days before the machinery becomes operational,” and let B be the event “It will be less than 8 days before the machinery becomes available.”
We can calculate this by summing the probabilities for each day more than 6:
event.A <- 0.37 + 0.20 + 0.13 + 0.03; print(event.A)
## [1] 0.73
event.B <- 0.06 + 0.21 + 0.37; print(event.B)
## [1] 0.64
1 - event.A
## [1] 0.27
The complement of event A is the set of outcomes not in event A, so we find the probability using 1 - p(A).
The intersection of events A and B is the set of outcomes that satisfy both event A and event B. Since A = {7, 8, 9, 10} and B = {5, 6, 7}, the intersection is \(A \cap B = \{7\}\), so \(p(A \cap B) = 0.37\).
The union of events A and B is the set of outcomes that satisfy event A, event B, or both. Since A = {7, 8, 9, 10} and B = {5, 6, 7}, the union is \(A \cup B = \{5, 6, 7, 8, 9, 10\}\), so \(p(A \cup B) = 1\), since this includes every possible outcome.
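The complement, intersection, and union can also be read off the table with logical indexing. This is a minimal sketch; the vector and variable names (days, prob, in.A, in.B, and so on) are chosen here for illustration.

```r
# Distribution from the table above, indexed by number of days
days <- 5:10
prob <- c(0.06, 0.21, 0.37, 0.20, 0.13, 0.03)

in.A <- days > 6   # event A: more than 6 days
in.B <- days < 8   # event B: less than 8 days

p.not.A   <- sum(prob[!in.A])        # complement of A
p.A.and.B <- sum(prob[in.A & in.B])  # intersection of A and B
p.A.or.B  <- sum(prob[in.A | in.B])  # union of A and B

c(p.not.A, p.A.and.B, p.A.or.B)
```

Working from the logical vectors avoids retyping the individual probabilities for each event.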
The probability of a first-time entrant finishing the race is 0.68, and the probability of a repeat racer finishing is 0.87. The probability of being a repeat racer is 0.64. Thus, the probability of being both a repeat racer AND a finisher is 0.64 * 0.87 = 0.56.
The probability of a randomly chosen entrant finishing (regardless of whether it is their first time or not) is calculated by weighting the finishing probability of each group by that group’s share of entrants and adding the two together: 0.64 * 0.87 + 0.36 * 0.68 = 0.80.
To calculate this, we need to find the probability of the union of being a repeat racer and being a finisher (\(p(Repeat \cup Finisher)\)). Remember that we can’t double-count repeat racers who finish, so we add the probability of being a repeater to the probability of being a finisher and then subtract the probability of being a repeat finisher: \(p(Repeat) + p(Finisher) - p(Repeat \cap Finisher)\) = 0.64 + 0.80 - 0.56 = 0.88.
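The three race calculations chain together, so it can help to carry unrounded values through in R. This is a sketch using variable names invented here; the inputs are the probabilities given in the problem.

```r
p.repeat              <- 0.64  # P(repeat racer)
p.finish.given.repeat <- 0.87  # P(finish | repeat racer)
p.finish.given.first  <- 0.68  # P(finish | first-time entrant)

# Multiplication rule: P(repeat AND finish)
p.repeat.and.finish <- p.repeat * p.finish.given.repeat

# Law of total probability: P(finish)
p.finish <- p.repeat.and.finish + (1 - p.repeat) * p.finish.given.first

# Addition rule: P(repeat OR finish)
p.repeat.or.finish <- p.repeat + p.finish - p.repeat.and.finish

round(c(p.repeat.and.finish, p.finish, p.repeat.or.finish), 2)
```

Rounding only at the end gives the same two-decimal answers as the hand calculation.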
We know that 35% of students signed up for art, and 20% of those students also signed up for chess club. So, the proportion of students who signed up for both is 0.35 * 0.20 = 0.07.
We know that 0.07 of all students are in both the art club and the chess club. Since 40% of all students are in the chess club, 0.33 are ONLY in the chess club. Essentially, this means that 7 out of every 40 chess club members are also in the art club. We can divide 7/40 = 0.175 to get the probability that a chess club member is in the art club as well.
We know that 35% of students signed up for art, and 40% for chess, but we don’t want to double count the 7% of students who signed up for both. Thus, we add 0.35 + 0.40 - 0.07 = 0.68 to get the probability that a randomly chosen student signed up for at least one club.
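The club answers follow the same multiplication, conditional-probability, and addition rules, which can be sketched in R as follows. The variable names are made up for this example.

```r
p.art             <- 0.35  # P(art)
p.chess           <- 0.40  # P(chess)
p.chess.given.art <- 0.20  # P(chess | art)

p.both            <- p.art * p.chess.given.art  # P(art AND chess)
p.art.given.chess <- p.both / p.chess           # P(art | chess)
p.either          <- p.art + p.chess - p.both   # P(art OR chess)

c(p.both, p.art.given.chess, p.either)
```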
The trick here is that the probability of being absent is independent of vaccination status (at least in this problem). That is, p(absent) = p(absent|unvaccinated). Thus, we can find the expected number of unvaccinated students who miss the doctor visit by multiplying the number of unvaccinated students by the probability of being absent: 0.08 * 84 = 6.72.
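One way to sanity-check the expected count is to treat the number of absent students as binomial. That modelling choice is an assumption made here (each of the 84 students absent independently with probability 0.08), not something the problem states. The variable names are also illustrative.

```r
n.unvaccinated <- 84
p.absent       <- 0.08

# Shortcut: mean of a Binomial(n, p) count is n * p
expected.absent <- n.unvaccinated * p.absent

# Same value from the definition of expectation: sum of k * P(K = k)
k <- 0:n.unvaccinated
expected.check <- sum(k * dbinom(k, size=n.unvaccinated, prob=p.absent))

c(expected.absent, expected.check)
```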
(the pnorm/qnorm/dnorm family will be your friend here)
pnorm(210,180,25,lower.tail=FALSE)
## [1] 0.1150697
The probability that a random draw from N(180,25^2) will be greater than 210 is p(X > 210) = 0.1150697.
pnorm(182,180,25,lower.tail=TRUE)
## [1] 0.5318814
The probability that a random draw from N(180,25^2) will be less than 182 is p(Y < 182) = 0.5318814.
1- pnorm(160,180,25,lower.tail=TRUE) - pnorm(192,180,25,lower.tail=FALSE)
## [1] 0.4725309
The probability that a random draw from N(180,25^2) will be less than 192 and greater than 160 is p(160 < Z < 192) = 0.4725309. (Remember to set your tails correctly so you find 1 - area_left_of_160 - area_right_of_192.)
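An equivalent way to get the middle area is to subtract two lower-tail probabilities, which avoids juggling the lower.tail argument. This is a short sketch; the names p.between and p.tails are introduced here.

```r
# Area to the left of 192 minus area to the left of 160
p.between <- pnorm(192, mean=180, sd=25) - pnorm(160, mean=180, sd=25)

# The tail-subtraction form used above, for comparison
p.tails <- 1 - pnorm(160, 180, 25, lower.tail=TRUE) - pnorm(192, 180, 25, lower.tail=FALSE)

c(p.between, p.tails)
```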
x | -2 | -1 | 0 | 1 | 2 |
---|---|---|---|---|---|
p(x) | 0.15 | 0.15 | 0.35 | 0.25 | 0.10 |
The probability that X is less than or equal to 0 is \(P(X \le 0)\) = 0.15 + 0.15 + 0.35 = 0.65.
The probability that X is greater than or equal to -1 is \(P(X \ge -1)\) = 0.15 + 0.35 + 0.25 + 0.10 = 0.85.
The probability that X is greater than or equal to -1 and less than or equal to 1 is \(P(-1 \le X \le 1)\) = 0.15 + 0.35 + 0.25 = 0.75.
The probability that X is less than 2 is \(P(X < 2)\) = 1 - 0.10 = 0.90.
The probability that X is greater than -1 and less than 2 is \(P(-1 < X < 2)\) = 0.35 + 0.25 = 0.60.
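These five sums can be reproduced by storing the table as two vectors and indexing with logical conditions. This is a sketch with names chosen here. Note the spaces around the comparison operators: in R, `x<-1` would be parsed as an assignment rather than a comparison with -1.

```r
x   <- -2:2
p.x <- c(0.15, 0.15, 0.35, 0.25, 0.10)

p.le.0       <- sum(p.x[x <= 0])            # P(X <= 0)
p.ge.m1      <- sum(p.x[x >= -1])           # P(X >= -1)
p.m1.to.1    <- sum(p.x[x >= -1 & x <= 1])  # P(-1 <= X <= 1)
p.lt.2       <- sum(p.x[x < 2])             # P(X < 2)
p.m1.to.2.ex <- sum(p.x[x > -1 & x < 2])    # P(-1 < X < 2)

c(p.le.0, p.ge.m1, p.m1.to.1, p.lt.2, p.m1.to.2.ex)
```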
SKIP THIS PROBLEM
cflip=rbinom(20,1,.55). Use ?rbinom to make sure you understand what each part of the command stands for.
cflip=rbinom(20,1,.55)
cflipsamp=sample(cflip,20,replace=T)
#loop version
resamp.mean=rep(NA,500)
for(i in 1:500){ resamp.mean[i]<-mean(sample(cflip,20,replace=T)) }
#equivalent version using the replicate function
cflip.samp.500 <- replicate(500,mean(sample(cflip,20,replace=T)))
hist(cflip.samp.500)
Yes, this is about what we would expect. The resampled means are approximately normally distributed around the sample mean of cflip, which should itself be close to the true probability of 0.55.
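To see where the resampled means are centered, it may help to fix the random seed and print the relevant averages side by side. This is a sketch only: the seed value and the name boot.means are arbitrary choices made here, and the numbers will differ from any unseeded run.

```r
set.seed(2015)  # arbitrary seed, used only to make the sketch reproducible

# 20 simulated flips of a coin with P(heads) = 0.55
cflip <- rbinom(20, 1, 0.55)

# 500 resamples of size 20, drawn with replacement from the observed flips
boot.means <- replicate(500, mean(sample(cflip, 20, replace=TRUE)))

# Compare the observed proportion, the average resampled mean, and the true p
c(sample.mean=mean(cflip), mean.of.resamples=mean(boot.means), true.p=0.55)
```

The average of the resampled means tracks mean(cflip) rather than 0.55 itself, because the resamples are drawn from the 20 observed flips and not from the underlying coin.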
I am going to run four different experiments: (1) 20 draws, repeated 20 times; (2) 20 draws, repeated 500 times; (3) 500 draws, repeated 20 times; and (4) 500 draws, repeated 500 times:
par(mfrow=c(2,2))
hist(replicate(20,mean(sample(cflip,20,replace=T))),main='20d, 20s',xlab='')
hist(replicate(500,mean(sample(cflip,20,replace=T))),main='20d, 500s',xlab='')
hist(replicate(20,mean(sample(cflip,500,replace=T))),main='500d, 20s',xlab='')
hist(replicate(500,mean(sample(cflip,500,replace=T))),main='500d, 500s',xlab='')
As shown, what makes the largest difference to the shape is the number of samples taken. Even when we take 500 draws, with only 20 samples the sampling distribution does not look very normal. The plots comparing 500 samples of 20 draws vs. 500 samples of 500 draws are fairly similar to one another in shape, but they clearly differ in spread. The 500 draw, 500 sample histogram is not only approximately normal but also tightly concentrated around the mean of cflip, whereas the histogram for the 20 draw, 500 sample experiment has a much wider spread.
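The difference in spread can be put into numbers by comparing the standard deviations of the resampled means. This is a sketch with an arbitrary seed and illustrative names. For means of n draws, the spread should shrink roughly like 1/sqrt(n), so the ratio here should be near sqrt(500/20) = 5.

```r
set.seed(2015)  # arbitrary seed, used only to make the sketch reproducible
cflip <- rbinom(20, 1, 0.55)

# Standard deviation of 500 resampled means, for 20 draws and for 500 draws
sd.20.draws  <- sd(replicate(500, mean(sample(cflip, 20, replace=TRUE))))
sd.500.draws <- sd(replicate(500, mean(sample(cflip, 500, replace=TRUE))))

c(sd.20.draws=sd.20.draws, sd.500.draws=sd.500.draws,
  ratio=sd.20.draws / sd.500.draws)
```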
SKIP THIS PROBLEM
Greater than 3 days?
Less than 1.5 days?
Between 2.25 and 3.75 days?
Null: New Yorkers sleep 8 hours a night on average. Alternative: New Yorkers sleep less than 8 hours a night on average.
\[H_0: \mu = 8\] \[H_A: \mu < 8\]
Null: Employees waste 15 minutes of time per day in March. Alternative: Employees waste more than 15 minutes of time per day in March.
\[H_0: \mu = 15\] \[H_A: \mu > 15\]
Null: Calorie consumption after the new requirement is the same as before. Alternative: Calorie consumption after the new requirement is different than before.
\[H_0: \mu = 1100\] \[H_A: \mu \neq 1100\]
Null: The average GRE verbal reasoning score has not changed since 2004. Alternative: The average GRE verbal reasoning score has changed since 2004.
\[H_0: \mu = 462\] \[H_A: \mu \neq 462\]
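If these hypotheses were tested with a one-sample t-test (an assumption made here, since the homework only asks for the hypotheses), the direction of the alternative maps onto the alternative argument of t.test. The data below are simulated purely for illustration and are not from any of the studies described.

```r
set.seed(42)  # arbitrary seed, used only to make the sketch reproducible

# Made-up nightly sleep hours for 25 hypothetical respondents
sleep.hours <- rnorm(25, mean=7.5, sd=1)

# H_A: mu < 8 corresponds to a one-sided test
one.sided <- t.test(sleep.hours, mu=8, alternative="less")

# H_A: mu != 8 corresponds to a two-sided test
# (the form of the calorie and GRE alternatives)
two.sided <- t.test(sleep.hours, mu=8, alternative="two.sided")

c(one.sided$alternative, two.sided$alternative)
```

For an alternative of the form mu > 15, as in the wasted-time question, the corresponding setting would be alternative="greater".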
Null: The restaurant is not in gross violation. Alternative: The restaurant is in gross violation.
Concluding that the restaurant is in gross violation when it is not.
Failing to shut down the restaurant when it is in fact in gross violation.
For the restaurant owner, a Type I error is more problematic, since it means that the restaurant will be shut down when in fact it should not be.
For diners, Type II error is more problematic, since it means they could be exposed to an unsafe restaurant that should be shut down.
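A small simulation can make the idea of a Type I error concrete: when the null hypothesis is true, a test at level alpha should still reject about alpha of the time. The sketch below uses a generic one-sample t-test on simulated data rather than anything from the restaurant example. The seed, sample size, and number of repetitions are arbitrary choices made here.

```r
set.seed(1)    # arbitrary seed, used only to make the sketch reproducible
alpha <- 0.05  # significance level

# Simulate 2000 studies in which the null hypothesis (mu = 0) is actually true
p.values <- replicate(2000, t.test(rnorm(30, mean=0, sd=1), mu=0)$p.value)

# Share of studies that reject anyway: the empirical Type I error rate
type1.rate <- mean(p.values < alpha)
type1.rate
```

Lowering alpha would reduce this rate, at the cost of making Type II errors more likely for a fixed sample size.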
You’re encouraged to reflect on what was hard/easy, problems you solved, helpful tutorials you read, etc. Give credit to your sources, whether it’s a blog post, a fellow student, an online tutorial, etc.
Minus: Didn’t tackle at least 3 tasks. Didn’t interpret anything but left it all to the “reader”. Or more than one technical problem that is relatively easy to fix. It’s hard to find the report in our repo.
Check: Completed, but not fully accurate and/or readable. Requires a bit of detective work on my part to see what you did.
Check plus: Hits all the elements. No obvious mistakes. Pleasant to read. No heroic detective work required. Solid.
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.5 (Yosemite)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] magrittr_1.5 formatR_1.2 tools_3.2.2 htmltools_0.2.6
## [5] yaml_2.1.13 stringi_0.5-5 rmarkdown_0.7 knitr_1.10.5
## [9] stringr_1.0.0 digest_0.6.8 evaluate_0.7