Stat 7 Final, Winter 2020 (March 14, 2020). This midterm contains 100 points.
This midterm exam covers hypothesis testing, correlation, regression analysis, probability distributions, and statistical inference, with problems drawn from public health, gambling, medical testing, and accident modeling. Students may use open notes but may not collaborate. The problems cover hypothesis tests for proportions, calculating and testing a correlation coefficient, estimating a regression line and testing its slope, a chi-square goodness-of-fit test, a comparison of proportions, probability calculations for binomial and Poisson distributions, and Bayesian probability in medical diagnosis. The exam emphasizes applying statistical formulas and interpreting p-values, chi-square critical values, regression coefficients, the coefficient of determination, and conditional probabilities.
The analysis begins with hypothesis tests on proportions in a public health setting: lung cancer mortality in two counties. The samples are 100 patients in Hartford County and 200 in New Haven County, with 10 and 5 observed deaths respectively, giving sample death proportions of 10/100 = 0.10 and 5/200 = 0.025. Three tests follow: whether the death proportion in New Haven differs from 0.1 (two-sided), whether Hartford's death proportion exceeds 0.05 (one-sided), and whether the death proportions of the two counties differ.
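The three tests above can be sketched with large-sample z statistics. This is a minimal sketch, not necessarily the method the exam expects: it assumes the normal approximation is acceptable, uses the null standard error for the one-sample tests and the pooled proportion for the two-sample test, and the function names are illustrative.

```python
from math import erfc, sqrt

def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * erfc(z / sqrt(2))

def one_sample_z(deaths, n, p0):
    """z statistic for H0: p = p0, using the standard error under H0."""
    p_hat = deaths / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def two_sample_z(d1, n1, d2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion."""
    pooled = (d1 + d2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (d1 / n1 - d2 / n2) / se

# Counts as stated in the text: Hartford 10 of 100, New Haven 5 of 200.
z_new_haven = one_sample_z(5, 200, 0.10)
p_new_haven = 2 * normal_upper_tail(abs(z_new_haven))  # two-sided

z_hartford = one_sample_z(10, 100, 0.05)
p_hartford = normal_upper_tail(z_hartford)             # one-sided, H1: p > 0.05

z_diff = two_sample_z(10, 100, 5, 200)
p_diff = 2 * normal_upper_tail(abs(z_diff))            # two-sided

for label, z, p in [("New Haven vs 0.10", z_new_haven, p_new_haven),
                    ("Hartford vs 0.05", z_hartford, p_hartford),
                    ("Hartford vs New Haven", z_diff, p_diff)]:
    print(f"{label}: z = {z:.3f}, p = {p:.4f}")
```

For the Hartford test the expected number of deaths under the null is only 100 × 0.05 = 5, which is borderline for the normal approximation, so an exact binomial test may be a reasonable alternative there.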
The second problem examines the association between gestational age and birth weight in a small sample of ten infants. The Pearson correlation coefficient r measures the strength and direction of the linear association between the two variables, and a hypothesis test on r assesses whether that association is statistically significant. A simple linear regression of birth weight on gestational age is then fitted, estimating the intercept (β0) and slope (β1), followed by a test of whether the slope differs significantly from zero. The coefficient of determination (R²), which equals r² in simple linear regression, gives the proportion of variability in birth weight explained by gestational age.
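The quantities in this problem can be computed from sums of squares, as in the sketch below. The exam's ten data points are not reproduced in this summary, so the `x` and `y` values here are hypothetical placeholders chosen only to exercise the function, and `simple_linear_fit` is an illustrative name. In simple linear regression the t statistic for the slope equals the t statistic for the correlation, t = r√(n − 2)/√(1 − r²), with n − 2 degrees of freedom.

```python
from math import sqrt

def simple_linear_fit(x, y):
    """Least-squares line y = b0 + b1*x plus r, R^2 and the t statistic for H0: slope = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    # Undefined when |r| = 1 (a perfect fit); not handled in this sketch.
    t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)
    return {"intercept": intercept, "slope": slope, "r": r,
            "r_squared": r ** 2, "t": t_stat, "df": n - 2}

# Hypothetical placeholder data, NOT the exam's gestational age / birth weight values.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = simple_linear_fit(x, y)
print(fit)
```

With the exam's ten infants the test has 8 degrees of freedom, for which the two-sided 5% critical value of t is about 2.306.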
The third problem analyzes a casino dice game. The number of sixes rolled is recorded over many trials, and a chi-square goodness-of-fit test compares the observed frequencies with the frequencies expected if the dice are fair. The null hypothesis that the dice are fair is tested at the 5% significance level.
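A goodness-of-fit statistic of this kind can be sketched as follows. The summary does not give the number of dice per trial or the observed counts, so both are assumptions here: the sketch supposes three dice per trial, which makes the number of sixes binomial with n = 3 and p = 1/6 under fairness, and the observed counts are hypothetical.

```python
from math import comb

def sixes_probabilities(dice_per_trial):
    """P(k sixes) for k = 0..dice_per_trial when every die is fair."""
    p = 1 / 6
    return [comb(dice_per_trial, k) * p ** k * (1 - p) ** (dice_per_trial - k)
            for k in range(dice_per_trial + 1)]

def chi_square_gof(observed, probabilities):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    total = sum(observed)
    expected = [total * q for q in probabilities]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts of trials showing 0, 1, 2, 3 sixes (2160 trials in total).
observed = [1240, 760, 148, 12]
probabilities = sixes_probabilities(3)  # assumption: three dice per trial
statistic = chi_square_gof(observed, probabilities)

df = len(observed) - 1        # 4 categories, no parameters estimated from the data
CRITICAL_5PCT_DF3 = 7.815     # chi-square critical value for df = 3 at the 5% level
print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {statistic > CRITICAL_5PCT_DF3}")
```

If any expected count is small (a common rule of thumb is below 5), adjacent categories are usually pooled before computing the statistic, which also reduces the degrees of freedom.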
The fourth problem compares two drugs for curing a disease, using a contingency table of drug by outcome. A chi-square test assesses whether the cure rates of Drug 1 and Drug 2 differ; the null hypothesis of no difference is tested at the 5% significance level.
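The chi-square test for a contingency table compares each observed cell with its expected count, row total × column total / grand total. The sketch below assumes a 2×2 layout (drug by cured or not cured) and uses hypothetical counts, since the exam's table is not reproduced here; it applies no continuity correction.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Hypothetical counts. Rows: Drug 1, Drug 2. Columns: cured, not cured.
table = [[30, 20],
         [20, 30]]
statistic, df = chi_square_independence(table)

CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value for df = 1 at the 5% level
print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {statistic > CRITICAL_5PCT_DF1}")
```

For a 2×2 table this statistic is the square of the pooled two-proportion z statistic, so the two approaches give the same two-sided conclusion.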
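The fifth and sixth problems involve probability distributions. In the fifth, each person infected with a virus is cured with probability 0.90 and therefore dies with probability p = 0.10. Assuming patients are independent, the number of deaths among n infected patients follows a binomial distribution with mean np, variance np(1 − p), and standard deviation √(np(1 − p)), and the probability of exactly five deaths follows from the binomial probability formula. In the sixth, the number of accidents per week is modeled as a Poisson variable. The stated average is 12 accidents per month, so the monthly rate must first be converted to a weekly rate λ; the mean and variance both equal λ, the standard deviation is √λ, and the probability of exactly five accidents in a week is e^(−λ) λ^5 / 5!.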
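Both calculations can be sketched with the standard probability mass functions. Two values below are assumptions not given in this summary: the number of infected patients (`n_patients = 20` is hypothetical) and the conversion of 12 accidents per month to a weekly rate, which here treats a month as four weeks, giving λ = 3.

```python
from math import comb, exp, factorial, sqrt

def binomial_summary(n, p, k):
    """Mean, variance, standard deviation and P(X = k) for X ~ Binomial(n, p)."""
    mean = n * p
    variance = n * p * (1 - p)
    pmf = comb(n, k) * p ** k * (1 - p) ** (n - k)
    return mean, variance, sqrt(variance), pmf

def poisson_summary(lam, k):
    """Mean, variance, standard deviation and P(X = k) for X ~ Poisson(lam)."""
    pmf = exp(-lam) * lam ** k / factorial(k)
    return lam, lam, sqrt(lam), pmf

# Binomial: death probability is 1 - 0.90 = 0.10; n_patients is a hypothetical value.
n_patients = 20
b_mean, b_var, b_sd, b_p5 = binomial_summary(n_patients, 0.10, 5)

# Poisson: 12 accidents per month; assumption: 4 weeks per month, so 3 per week.
weekly_rate = 12 / 4
p_mean, p_var, p_sd, p_p5 = poisson_summary(weekly_rate, 5)

print(f"Binomial: mean={b_mean:.2f} var={b_var:.2f} sd={b_sd:.3f} P(X=5)={b_p5:.4f}")
print(f"Poisson:  mean={p_mean:.2f} var={p_var:.2f} sd={p_sd:.3f} P(X=5)={p_p5:.4f}")
```

A different conversion, such as 12 × 12 / 52 accidents per week, changes λ and therefore every Poisson result, so the conversion used should be stated alongside the answer.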
The seventh problem applies conditional probability to smokers in Orange County. Given the smoking rates among males and among females, the law of total probability gives the overall probability that a randomly selected person smokes, and Bayes' theorem then gives the probability that the person is male given that the person is a smoker.
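The two steps, total probability followed by Bayes' theorem, can be sketched as below. The summary does not state the smoking rates or the share of males in the population, so every number in the example call is hypothetical.

```python
def prob_male_given_smoker(p_male, p_smoke_given_male, p_smoke_given_female):
    """P(male | smoker) via the law of total probability and Bayes' theorem."""
    p_female = 1 - p_male
    # Law of total probability: P(S) = P(S|M)P(M) + P(S|F)P(F)
    p_smoker = p_smoke_given_male * p_male + p_smoke_given_female * p_female
    # Bayes' theorem: P(M|S) = P(S|M)P(M) / P(S)
    return p_smoke_given_male * p_male / p_smoker

# Hypothetical inputs: half the population is male, 20% of males and 10% of females smoke.
posterior_male = prob_male_given_smoker(0.5, 0.20, 0.10)
print(f"P(male | smoker) = {posterior_male:.4f}")
```

With these placeholder rates, males are twice as likely as females to smoke and equally common, so two thirds of smokers are male.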
The eighth problem applies Bayes' theorem to medical testing. Given the test's sensitivity, its false-positive rate, and the prevalence of the disease, it asks for two posterior probabilities: the probability that a person who tests positive actually has the disease, and the probability that a person who tests negative does not have the disease.
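These two posterior probabilities are often called the positive and negative predictive values. The sketch below computes both from prevalence, sensitivity, and false-positive rate; the numeric inputs are hypothetical, since the exam's values are not given in this summary.

```python
def predictive_values(prevalence, sensitivity, false_positive_rate):
    """Return (P(disease | positive), P(no disease | negative)) by Bayes' theorem."""
    specificity = 1 - false_positive_rate
    # P(positive) = P(+|D)P(D) + P(+|not D)P(not D)
    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    p_negative = 1 - p_positive
    ppv = sensitivity * prevalence / p_positive
    npv = specificity * (1 - prevalence) / p_negative
    return ppv, npv

# Hypothetical inputs: 1% prevalence, 95% sensitivity, 5% false-positive rate.
ppv, npv = predictive_values(prevalence=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(disease | positive) = {ppv:.4f}")
print(f"P(no disease | negative) = {npv:.4f}")
```

With a rare disease the positive predictive value can be low even for an accurate test, because false positives among the many healthy people outnumber true positives among the few who are ill.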