Stat 3300 Homework 5, Due Friday 05/22/2020

Stat 3300 Homework 5 due Friday, 05/22/2020. Answer these questions on a separate piece of paper, including your name, SMU ID, and course number at the top right. Provide a clear title for the assignment. Submit before class if unable to attend the class on the due date.

Question 1: Testing Hypotheses for Slope Coefficients

In this question, we examine three different linear regression contexts to test whether the slope coefficient equals zero against a two-sided alternative at a significance level of α = 0.05. For each scenario, we compute the test statistic, report the p-value, and state our decision to reject or fail to reject the null hypothesis.

a) For the first case, with n = 20, the estimated regression line ŷ = 28.5 + 1.4x, and a standard error of the slope estimate SEb1 = 0.65, the test statistic is t = (b1 - 0) / SEb1 = 1.4 / 0.65 ≈ 2.154. With degrees of freedom df = n - 2 = 18, the two-sided p-value corresponding to t = 2.154 is approximately 0.045 (using t-distribution tables or R). Since p < 0.05, we reject the null hypothesis and conclude that the slope differs from zero.

b) For the second scenario with n=30, ŷ=30.8 + 2.1x, and SEb1=1.05, the t-statistic is 2.1 / 1.05 ≈ 2.0. Degrees of freedom df = 28, and the p-value is approximately 0.055. Because p > 0.05, we fail to reject the null hypothesis, suggesting insufficient evidence to conclude the slope differs from zero.

c) For the third setting, with n=100, ŷ=29.3 + 2.1x, and SEb1=1.05, the t-statistic is again approximately 2.0, but with df=98, the p-value is approximately 0.048, leading us to reject the null hypothesis at the 5% significance level.
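The three tests above can be reproduced from the summary statistics alone. The following is a minimal sketch; `slope_t_test()` is an illustrative helper name introduced here, not something supplied with the assignment.

```R
# Two-sided t-test of H0: beta1 = 0 from summary statistics.
# slope_t_test() is a hypothetical helper written for this illustration.
slope_t_test <- function(b1, se_b1, n, alpha = 0.05) {
  t_stat  <- (b1 - 0) / se_b1            # test statistic
  df      <- n - 2                       # simple linear regression
  p_value <- 2 * pt(-abs(t_stat), df)    # two-sided p-value
  data.frame(t = t_stat, df = df, p = p_value, reject = p_value < alpha)
}

results <- rbind(
  slope_t_test(1.4, 0.65, 20),    # case a
  slope_t_test(2.1, 1.05, 30),    # case b
  slope_t_test(2.1, 1.05, 100)    # case c
)
rownames(results) <- c("a", "b", "c")
print(results, digits = 3)
```

Cases b and c share the same t-statistic; only the degrees of freedom differ, which is enough to move the p-value across the 0.05 threshold.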

Question 2: Constructing 95% Confidence Intervals for Slope

For each of the previous scenarios, we construct the 95% confidence interval (CI) for the slope as b1 ± t* × SEb1, where t* is the upper 0.025 critical value of the t-distribution with n - 2 degrees of freedom.

a) First case: t* ≈ 2.101 (df=18), CI: 1.4 ± 2.101 × 0.65 → (1.4 - 1.366, 1.4 + 1.366) ≈ (0.034, 2.766). With 95% confidence, the true slope lies between approximately 0.03 and 2.77. The interval excludes zero, indicating a positive association.

b) Second case: t* ≈ 2.048 (df=28), CI: 2.1 ± 2.048 × 1.05 → (2.1 - 2.15, 2.1 + 2.15) ≈ (-0.05, 4.25). Since zero is included, the data are consistent with a slope of zero, which agrees with failing to reject the null hypothesis at the 5% level.

c) Third case: t* ≈ 1.984 (df=98), CI: 2.1 ± 1.984 × 1.05 → (2.1 - 2.084, 2.1 + 2.084) ≈ (0.016, 4.184). This interval excludes zero, supporting a significant positive slope.
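As a check on the arithmetic, the intervals can be computed with `qt()` instead of a table. This is a sketch under the same summary statistics; `slope_ci()` is a hypothetical helper name.

```R
# 95% confidence interval for the slope: b1 +/- t* x SE(b1).
# slope_ci() is a hypothetical helper written for this illustration.
slope_ci <- function(b1, se_b1, n, level = 0.95) {
  df     <- n - 2
  t_star <- qt(1 - (1 - level) / 2, df)   # upper 0.025 critical value
  c(t_star = t_star,
    lower  = b1 - t_star * se_b1,
    upper  = b1 + t_star * se_b1)
}

ci <- rbind(
  a = slope_ci(1.4, 0.65, 20),
  b = slope_ci(2.1, 1.05, 30),
  c = slope_ci(2.1, 1.05, 100)
)
print(round(ci, 3))
```

An interval excludes zero exactly when the corresponding two-sided test rejects at the same level, so the results should agree with the decisions in Question 1.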

Question 3: Analyzing Tornado Data with R

a) Plotting the total tornadoes over years from 1953 to 2014 reveals a generally increasing trend, suggesting a possible linear progression. Inspecting for outliers or unusual patterns, some years, such as 2011, show spikes, potentially due to extreme weather events. Outliers are identified through points deviating markedly from the trend line, possibly reflecting data anomalies or reporting inconsistencies.

b) Running a simple linear regression with R:

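```R
# Read the tornado data. The file name is a placeholder (the original
# assignment line was lost); substitute the course data file.
tornado_data <- read.csv("tornadoes.csv")

# Scatterplot of annual tornado counts against year.
plot(tornado_data$Year, tornado_data$Tornadoes, main='Annual Tornadoes', xlab='Year', ylab='Number of Tornadoes')

# Least-squares fit of tornado count on year.
model <- lm(Tornadoes ~ Year, data = tornado_data)

summary(model)
```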

The regression output provides the least-squares line: Tornadoes = intercept + slope × Year.

c) The fitted regression line's intercept is large and negative, but this is not an error. The intercept is the predicted number of tornadoes at Year = 0, which lies far outside the observed range (1953 to 2014), so it has no practical interpretation.

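d) A plot of residuals versus year shows whether the linear model fits adequately. A funnel shape indicates heteroscedasticity (non-constant variance), runs of consecutive residuals with the same sign suggest autocorrelation, and curvature suggests that a straight line is not the right form. Any clustering or systematic deviation points to model inadequacy or unaccounted factors.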
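A residual plot of this kind might be drawn as follows. The data frame `sim` is simulated purely for illustration and does not contain the real tornado counts; only the column names `Year` and `Tornadoes` follow the code above.

```R
set.seed(1)

# Simulated stand-in for the tornado data (NOT the real counts).
sim <- data.frame(Year = 1953:2014)
sim$Tornadoes <- 500 + 12 * (sim$Year - 1953) + rnorm(nrow(sim), sd = 120)

fit <- lm(Tornadoes ~ Year, data = sim)
res <- resid(fit)

# Residuals against the explanatory variable, with a reference line at zero.
plot(sim$Year, res, xlab = "Year", ylab = "Residual",
     main = "Residuals vs. Year")
abline(h = 0, lty = 2)
```

With the real data, replace `sim` with `tornado_data` and look for the patterns described in part d.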

Question 4: Inference on Tornado Trend

a) The t-test for the slope coefficient assesses whether a significant linear trend exists. If the p-value is small (less than 0.05), there is evidence supporting a trend. The confidence interval for the slope also helps assess this; if it excludes zero, the trend is statistically significant.

b) A 95% CI for the slope provides bounds within which the true mean annual change in tornado counts plausibly falls. For example, a hypothetical interval of (20, 50) would indicate that the mean count increases by between 20 and 50 tornadoes per year; because such an interval excludes zero, it would support an increasing trend.

c) Using the fitted regression model, the predicted number of tornadoes in 2015 is computed as the intercept plus the estimated slope times 2015.

d) An interval for actual tornado counts in 2015, accounting for prediction variability, is calculated as the predicted value ± t* × standard error of forecast, giving the range within which future observations are expected to fall with 95% confidence.
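In R, `confint()` and `predict()` carry out parts b through d. The sketch below again uses simulated data in place of the real tornado counts, so the printed numbers are illustrative only.

```R
set.seed(2)

# Simulated stand-in for the tornado data (NOT the real counts).
sim <- data.frame(Year = 1953:2014)
sim$Tornadoes <- 500 + 12 * (sim$Year - 1953) + rnorm(nrow(sim), sd = 120)
fit <- lm(Tornadoes ~ Year, data = sim)

# Part b: 95% confidence interval for the slope.
slope_ci <- confint(fit, "Year", level = 0.95)

# Part c: point prediction for 2015, computed by hand from the coefficients.
manual <- unname(coef(fit)[1] + coef(fit)[2] * 2015)

# Part d: 95% prediction interval for the 2015 count.
new_year <- data.frame(Year = 2015)
pred <- predict(fit, newdata = new_year, interval = "prediction", level = 0.95)

# For comparison: interval for the MEAN response in 2015 (narrower).
mean_ci <- predict(fit, newdata = new_year, interval = "confidence", level = 0.95)

print(slope_ci)
print(pred)
print(mean_ci)
```

The prediction interval is wider than the confidence interval for the mean because it also accounts for the year-to-year scatter of individual observations around the line.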

Question 5: Tuition Data Analysis from 2008 to 2014

a) Plotting 2008 versus 2014 tuition, we observe a positive linear relationship. Outliers or unusual points could appear, possibly due to data entry errors or unique institutions. The linear trend appears reasonable as tuition generally increased over the period.

b) Fitting a simple linear regression:

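```R
# Read the tuition data. The file name is a placeholder (the original
# assignment line was lost); substitute the course data file.
tuition_data <- read.csv("tuition.csv")

# Scatterplot of 2014 tuition against 2008 tuition.
plot(tuition_data$TUITION_2008, tuition_data$TUITION_2014, main='Tuition in 2008 vs 2014', xlab='2008 Tuition', ylab='2014 Tuition')

# Least-squares fit of 2014 tuition on 2008 tuition.
model_tuition <- lm(TUITION_2014 ~ TUITION_2008, data = tuition_data)

summary(model_tuition)
```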

The least-squares line gives the predicted 2014 tuition as a linear function of 2008 tuition.

c) Residual plot indicates whether the model is appropriate. Unusual patterns or heteroscedasticity suggest the need for more complex models or transformations.

d) Removing the five California schools and refitting the model may change the estimates. For example, the slope might decrease if those schools had higher-than-predicted increases at the high end of 2008 tuition. The size of any change shows how much those schools influence the fitted line.
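One way to do the refit is to drop the rows with `subset()` and compare coefficients. In this sketch the data are simulated, and the `STATE` column is an assumption: the real data set may identify the California schools differently.

```R
set.seed(3)

# Simulated stand-in for the tuition data (NOT the real values).
# STATE is a hypothetical column used to mark the California schools.
sim <- data.frame(
  STATE        = rep(c("CA", "TX", "OH", "NY"), times = c(5, 10, 10, 8)),
  TUITION_2008 = runif(33, 4000, 14000)
)
sim$TUITION_2014 <- 1000 + 1.1 * sim$TUITION_2008 + rnorm(33, sd = 600)

# Give the simulated CA rows an extra increase so their influence is visible.
is_ca <- sim$STATE == "CA"
sim$TUITION_2014[is_ca] <- sim$TUITION_2014[is_ca] + 3000

fit_all   <- lm(TUITION_2014 ~ TUITION_2008, data = sim)
fit_no_ca <- lm(TUITION_2014 ~ TUITION_2008, data = subset(sim, STATE != "CA"))

# Side-by-side comparison of intercept and slope.
print(rbind(all = coef(fit_all), no_CA = coef(fit_no_ca)))
```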

Question 6: Inference with California Schools Removed

a) Hypotheses: H0: β1 = 0 (no linear relationship); Ha: β1 ≠ 0 (linear relationship exists).

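b) The test statistic is t = (b1 - 0) / SEb1, and the p-value is obtained from the t-distribution with n - 2 degrees of freedom. If p < 0.05, we reject H0 and conclude that there is a linear relationship between 2008 and 2014 tuition.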

c) The 95% confidence interval for β1 is b1 ± t* × SEb1. It gives the range of plausible values for the change in mean 2014 tuition associated with a one-dollar increase in 2008 tuition. If the interval excludes zero, it supports a significant linear relationship.

d) R-squared values quantify the proportion of variation explained. For the model, the R-squared indicates how well 2008 tuition predicts 2014 tuition.

e) Inference on β0 (the intercept) is typically not of interest because it represents the predicted 2014 tuition for a school whose 2008 tuition was $0, a value outside the data's practical range.
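The quantities in parts b through d can all be read off a fitted `lm` object. The sketch below defines a hypothetical helper, `slope_inference()`, and demonstrates it on R's built-in `cars` data set as a stand-in, since the tuition file is not reproduced here.

```R
# Collect the slope test, its confidence interval, and R-squared from a fit.
# slope_inference() is a hypothetical helper written for this illustration.
slope_inference <- function(fit, term, level = 0.95) {
  s   <- summary(fit)
  row <- s$coefficients[term, ]
  list(
    estimate  = unname(row["Estimate"]),
    se        = unname(row["Std. Error"]),
    t         = unname(row["t value"]),
    p         = unname(row["Pr(>|t|)"]),
    ci        = confint(fit, term, level = level),
    r_squared = s$r.squared
  )
}

# Stand-in example using the built-in cars data (stopping distance vs. speed).
fit <- lm(dist ~ speed, data = cars)
out <- slope_inference(fit, "speed")
str(out)
```

For the tuition model, the call would presumably be `slope_inference(model_tuition, "TUITION_2008")` after refitting without the California schools.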

Question 7: Predictions Based on the Model

a) For a hypothetical tuition of $8,800 at Skinflint U in 2008, the predicted 2014 tuition is calculated using the regression equation.

b) Similarly, for I.O.U. with $15,700 tuition in 2008, the predicted 2014 tuition is obtained.

c) Whether these predictions are appropriate depends on how well the model fits and on whether the 2008 tuition values fall within the observed data range. Extrapolating beyond that range, or predicting for institutions with unique characteristics, reduces reliability.
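The two predictions and the range check in part c might be carried out as follows. The data are simulated, so the range of 2008 tuition used here is an assumption; with the real data, whether either school counts as an extrapolation depends on the actual observed range.

```R
set.seed(4)

# Simulated stand-in for the tuition data (NOT the real values).
sim <- data.frame(TUITION_2008 = runif(30, 4000, 14000))
sim$TUITION_2014 <- 1000 + 1.1 * sim$TUITION_2008 + rnorm(30, sd = 600)
fit <- lm(TUITION_2014 ~ TUITION_2008, data = sim)

# The two schools from parts a and b.
new_schools <- data.frame(
  TUITION_2008 = c(8800, 15700),
  row.names    = c("Skinflint U", "I.O.U.")
)

# Point predictions with 95% prediction intervals.
pred <- predict(fit, newdata = new_schools, interval = "prediction")

# Flag predictions that fall outside the observed range of 2008 tuition.
rng      <- range(sim$TUITION_2008)
in_range <- new_schools$TUITION_2008 >= rng[1] &
            new_schools$TUITION_2008 <= rng[2]

print(cbind(as.data.frame(pred), in_range))
```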
