Midterm: Statistical Inference II, Dr. Jinwook Lee
Problem 1. A sample of 36 sales receipts from an Apple store has a sample mean of $1200. Assume that the population standard deviation is known to be $240. Use these values to test whether or not the mean of the sales is different from $1260. Use the significance level α = 0.05.
(a) Find the 95% confidence interval for the mean.
(b) Hypothesis testing:
- State the null and alternative hypotheses.
- State decision rules for both the p-value and critical value methods.
- Find the test statistic.
- Conclusion.
Problem 2. Consider two securities, the first having µ1 = 1 and σ1 = 0.1, and the second having µ2 = 0.8 and σ2 = 0.12. Suppose that they are negatively correlated, with ρ = −0.8.
(a) If you could invest in one security, which one would you choose, and why?
(b) Suppose you invest 50% of your money in each of the two. What is your expected return and what is your risk?
(c) Suppose you invest 80% of your money in security 1 and 20% in security 2. What is your expected return and what is your risk?
Repeat the above three parts with ρ = 0.5.
(d) What are those values? What do you think when comparing these two different correlated cases?
Problem 3. The Starbucks manager at LeBow wants to find:
(a) the probability that the total average cost is less than or equal to $1000 per day. He is only concerned about the cost of customers' waiting time. Customers spend time at the checkout register with a mean of 4 minutes and a standard deviation of 4 minutes. On average, he gets 100 customers per day. One minute of waiting time is considered a cost of $0.5.
(b) the probability that the total expected revenue is greater than or equal to $1500 per day. On average, each customer spends $4 at a time with a standard deviation of $1.
Problem 4. The exponential p.d.f. is f(x|λ) = λ e^(−λx) for x ≥ 0 and λ > 0, and E(X) = λ^(−1). The c.d.f. is F(x) = P(X ≤ x) = 1 − e^(−λx). Four observations are made by an instrument that reports x1 = 5, x2 = 3, but x3 and x4 are too large for the instrument to measure and it reports only that x3 > 10 and x4 > 10. (The largest value the instrument can measure is 10.0.)
(a) What is the likelihood function?
(b) Find the MLE of λ.
Problem 5. Assume that we fit a Poisson distribution to the following accident-frequency data:
Columns: Number of accidents (k), Number of weeks (n_k), log n_k, log(k!), log(n_k k!/N)
[Table values garbled in the source: 0 38 3.....08 0.69 -1..69 1.79 -1..18 -1..99 15.66]
What is the MLE of λ (i.e., the Poisson parameter)? N is the sum of the n_k (i.e., N = 76) and (as you know) the Poisson p.m.f. is p(i) = e^(−λ) λ^i / i! for i = 0, 1, . . . , where λ > 0.
Problem 6. Suppose that an i.i.d. sample of size 15 from a normal distribution gives X̄ = 10 and s² = 25. Find the 90% confidence interval for µ.
The problems cover core topics in statistical inference: confidence intervals, hypothesis testing, portfolio risk under correlation, normal-approximation probabilities, and maximum likelihood estimation (MLE). Each is worked below with the supporting calculations.
Problem 1: Confidence Interval and Hypothesis Testing for Mean Sales
Given a sample size of 36 receipts with a sample mean X̄ = $1200 and population standard deviation σ = $240, we aim to determine whether the average sales differ from $1260 at a significance level α = 0.05. The known standard deviation justifies using the z-test for hypothesis testing.
First, we compute the 95% confidence interval (CI) for the population mean:
- Standard Error (SE) = σ / √n = 240 / √36 = 240 / 6 = 40
- Critical z-value for 95% CI: z₀.975 ≈ 1.96
- CI: X̄ ± z × SE = 1200 ± 1.96 × 40 = 1200 ± 78.4
- Thus, the 95% CI is approximately ($1121.6, $1278.4).
This interval contains the hypothesized mean of $1260, suggesting no strong evidence to reject the null hypothesis based on the CI alone.
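The interval arithmetic above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name `z_confidence_interval` is a hypothetical helper, not something from the exam or a library.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sigma, n, level=0.95):
    """Two-sided z-interval for a mean when the population sigma is known."""
    # Quantile that leaves (1 - level) / 2 in each tail, about 1.96 for 95%.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin


# Values from Problem 1: n = 36, sample mean 1200, sigma 240.
low, high = z_confidence_interval(1200, 240, 36)
print(f"95% CI: ({low:.1f}, {high:.1f})")
print("contains 1260:", low < 1260 < high)
```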
Next, hypothesis testing involves setting up:
- Null hypothesis, H₀: μ = 1260
- Alternative hypothesis, H₁: μ ≠ 1260
The test statistic (z) is:
\[ z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} = \frac{1200 - 1260}{240 / 6} = \frac{-60}{40} = -1.5 \]
The p-value for this two-tailed test is:
\[ p = 2 \times P(Z \leq -1.5) \approx 2 \times 0.0668 = 0.1336 \]
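The p-value decision rule is to reject H₀ if p < α = 0.05. Since p ≈ 0.1336 > 0.05, we fail to reject H₀. The critical value decision rule is:
\[ \text{Reject H}_0 \text{ if } |z| > 1.96 \]
Here, |z| = 1.5 < 1.96, so the critical value method also fails to reject H₀. Conclusion: at α = 0.05 there is insufficient evidence that the mean sale differs from $1260.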
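Both decision rules can be checked side by side. The sketch below assumes the same inputs as the text; `two_sided_z_test` is a hypothetical helper name chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Return (z, p_value, z_crit, reject) for H0: mu = mu0 vs H1: mu != mu0."""
    std_normal = NormalDist()
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2 * std_normal.cdf(-abs(z))        # two-tailed p-value
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # critical value, about 1.96
    reject = abs(z) > z_crit                     # equivalent to p_value < alpha
    return z, p_value, z_crit, reject


z, p_value, z_crit, reject = two_sided_z_test(1200, 1260, 240, 36)
print(f"z = {z:.2f}, p = {p_value:.4f}, critical value = {z_crit:.2f}")
print("reject H0:", reject)
```

The two rules always agree for a two-sided z-test, because p < α exactly when |z| exceeds the critical value.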
Problem 2: Portfolio Return and Risk with Correlation
Two securities with known means, standard deviations, and correlation coefficients are analyzed concerning investment choices and portfolio risk.
(a) Choosing between securities
Security 1 has the higher expected return (μ1 = 1 versus μ2 = 0.8) and the lower risk (σ1 = 0.1 versus σ2 = 0.12). It is better on both counts, so if only one security can be held, security 1 is the choice. The correlation plays no role in this decision, so the answer is the same for ρ = 0.5.
(b) Equal Investment in Both Securities (50% each)
The expected portfolio return:
\[ E[R] = w_1 \mu_1 + w_2 \mu_2 = 0.5 \times 1 + 0.5 \times 0.8 = 0.9 \]
The portfolio variance with correlation ρ = -0.8:
\[ \text{Var}(R) = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \sigma_1 \sigma_2 \rho \]
\[ = 0.25 \times 0.01 + 0.25 \times 0.0144 + 2 \times 0.5 \times 0.5 \times 0.1 \times 0.12 \times (-0.8) \]
\[ = 0.0025 + 0.0036 - 0.0048 = 0.0013 \]
Therefore, risk (standard deviation):
\[ \sqrt{0.0013} \approx 0.036 \]
When ρ = 0.5, the expected return is unchanged at 0.9, and the variance becomes:
\[ \text{Var}(R) = 0.0025 + 0.0036 + 2 \times 0.5 \times 0.5 \times 0.1 \times 0.12 \times 0.5 = 0.0025 + 0.0036 + 0.003 = 0.0091 \]
Risk:
\[ \sqrt{0.0091} \approx 0.095 \]
The change in correlation affects portfolio risk significantly, illustrating that negative correlation reduces risk, enhancing diversification benefits more than positive correlation does.
(c) Portfolio Composition (80% in security 1, 20% in security 2)
With ρ = -0.8:
\[ E[R] = 0.8 \times 1 + 0.2 \times 0.8 = 0.8 + 0.16 = 0.96 \]
\[ \text{Var}(R) = 0.64 \times 0.01 + 0.04 \times 0.0144 + 2 \times 0.8 \times 0.2 \times 0.1 \times 0.12 \times (-0.8) \]
\[ = 0.0064 + 0.000576 - 0.003072 = 0.003904 \]
Risk:
\[ \sqrt{0.003904} \approx 0.0625 \]
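Similarly, with ρ = 0.5 the expected return is again 0.96, and:
\[ \text{Var}(R) = 0.0064 + 0.000576 + 2 \times 0.8 \times 0.2 \times 0.1 \times 0.12 \times 0.5 \]
\[ = 0.0064 + 0.000576 + 0.00192 = 0.008896 \]
Risk:
\[ \sqrt{0.008896} \approx 0.0943 \]
(d) Comparing the two correlation cases
Expected returns do not depend on ρ: 0.9 for the 50/50 portfolio and 0.96 for the 80/20 portfolio in both cases. Risk does: the 50/50 portfolio has risk of about 0.036 at ρ = −0.8 versus 0.095 at ρ = 0.5, and the 80/20 portfolio about 0.062 versus 0.094. Negative correlation therefore lowers portfolio risk sharply through diversification, while with ρ = 0.5 the portfolio risk stays only slightly below that of security 1 alone (σ1 = 0.1).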
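The two-asset formulas for expected return and variance used in parts (b) and (c) can be evaluated for every weight and correlation combination at once. This is a minimal sketch; `portfolio_stats` is a hypothetical helper name, and the second weight is assumed to be 1 − w1.

```python
from math import sqrt


def portfolio_stats(w1, mu1, mu2, sigma1, sigma2, rho):
    """Expected return and standard deviation of a two-asset portfolio."""
    w2 = 1 - w1  # assumption: the two weights sum to one
    expected = w1 * mu1 + w2 * mu2
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2 * w1 * w2 * sigma1 * sigma2 * rho
    )
    return expected, sqrt(variance)


# Security parameters from Problem 2.
results = {}
for rho in (-0.8, 0.5):
    for w1 in (0.5, 0.8):
        results[(rho, w1)] = portfolio_stats(w1, 1.0, 0.8, 0.1, 0.12, rho)
        expected, risk = results[(rho, w1)]
        print(f"rho={rho:+.1f}, w1={w1:.1f}: E[R]={expected:.2f}, risk={risk:.4f}")
```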
Problem 3: Cost and Revenue Probabilities in a Retail Environment
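The Starbucks manager seeks probabilities for the daily totals of cost and revenue. Treating the number of customers as fixed at n = 100 and the customers as independent (both assumptions, since the problem only gives an average of 100 per day), the central limit theorem makes each daily total approximately normal.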
(a) Probability total cost ≤ $1000
Waiting time per customer has a mean of 4 minutes and a standard deviation of 4 minutes. For 100 customers, total waiting time T has:
Mean: μ_T = 100 × 4 = 400 minutes
Standard deviation: σ_T = √100 × 4 = 10 × 4 = 40 minutes
The cost per minute is $0.5, so total cost C = 0.5 × T
Hence, μ_C = 0.5 × 400 = $200, and σ_C = 0.5 × 40 = $20
Calculate P(C ≤ 1000):
Convert to standard normal variable:
\[ Z = \frac{1000 - 200}{20} = \frac{800}{20} = 40 \]
This Z-value is effectively infinite, so the probability is approximately 1. In practical terms, the probability the total cost exceeds $1000 is negligible.
(b) Probability total revenue ≥ $1500
Each customer spends $4 on average with a standard deviation of $1. For 100 customers:
Expected total revenue: μ_R = 100 × 4 = $400
Standard deviation: σ_R = √100 × 1 = $10
By the central limit theorem, total revenue R is approximately normal with these parameters.
Calculate P(R ≥ 1500):
\[ Z = \frac{1500 - 400}{10} = \frac{1100}{10} = 110 \]
Again, this Z-value is astronomically high, so the probability is practically zero that total revenue reaches $1500.
Under these figures, the daily waiting cost is almost certain to stay at or below $1000, and daily revenue of $1500 or more is practically impossible.
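The normal approximation for both daily totals can be sketched with the standard library. The helper `total_distribution` is a hypothetical name, and the code assumes exactly 100 independent customers per day, as in the calculation above.

```python
from math import sqrt
from statistics import NormalDist


def total_distribution(n, mean, sd, scale=1.0):
    """Normal (CLT) approximation for scale * (sum of n i.i.d. terms)."""
    return NormalDist(mu=scale * n * mean, sigma=scale * sqrt(n) * sd)


# (a) Waiting cost: 100 customers, 4 +/- 4 minutes each, $0.5 per minute.
cost = total_distribution(100, 4, 4, scale=0.5)
p_cost = cost.cdf(1000)             # P(C <= 1000)

# (b) Revenue: 100 customers, $4 +/- $1 each.
revenue = total_distribution(100, 4, 1)
p_revenue = 1 - revenue.cdf(1500)   # P(R >= 1500)

print(f"cost ~ N({cost.mean:.0f}, {cost.stdev:.0f}^2), P(C <= 1000) = {p_cost}")
print(f"revenue ~ N({revenue.mean:.0f}, {revenue.stdev:.0f}^2), P(R >= 1500) = {p_revenue}")
```

With thresholds 40 and 110 standard deviations above the means, floating-point arithmetic cannot distinguish these probabilities from 1 and 0.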
Problem 4: Likelihood and MLE for Exponential Distribution
The exponential probability density function (pdf) is f(x|λ) = λ e^(-λx) for x ≥ 0, with E[X] = 1/λ. When observations x1 = 5, x2 = 3, and x3, x4 exceed 10 but are not measured, the likelihood function incorporates the fact that x3 and x4 are censored—i.e., only known to exceed 10.
(a) Likelihood Function
The likelihood function with two exact observations and two censored observations is:
\[ L(\lambda) = f(x_1|\lambda) \times f(x_2|\lambda) \times P(X > 10|\lambda)^2 \]
\[ = \lambda e^{-\lambda \times 5} \times \lambda e^{-\lambda \times 3} \times [e^{-\lambda \times 10}]^2 \]
\[ = \lambda^2 e^{-\lambda(5 + 3 + 20)} = \lambda^2 e^{-\lambda \times 28} \]
(b) MLE of λ
Maximize the likelihood function with respect to λ. Taking the natural log:
\[ \ell(\lambda) = 2 \ln \lambda - 28 \lambda \]
Differentiate and set to zero:
\[ \frac{d\ell}{d\lambda} = \frac{2}{\lambda} - 28 = 0 \]
\[ \Rightarrow \frac{2}{\lambda} = 28 \Rightarrow \lambda = \frac{2}{28} = \frac{1}{14} \]
Since d²ℓ/dλ² = −2/λ² < 0, this critical point is a maximum, and the maximum likelihood estimate is λ̂ = 1/14 ≈ 0.0714.
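The derivation generalizes: with right-censored exponential data, the estimate is the number of exactly observed values divided by the total of all observed values and censoring limits. The sketch below checks this numerically for the four observations; both function names are hypothetical helpers.

```python
from math import log


def censored_exp_loglik(lam, observed, censored_at):
    """Exact values contribute log f(x); censored ones contribute log P(X > c)."""
    exact_part = sum(log(lam) - lam * x for x in observed)
    censored_part = -lam * sum(censored_at)
    return exact_part + censored_part


def censored_exp_mle(observed, censored_at):
    """Closed-form maximizer of the log-likelihood above."""
    return len(observed) / (sum(observed) + sum(censored_at))


observed = [5, 3]        # x1, x2 measured exactly
censored_at = [10, 10]   # x3, x4 known only to exceed 10

lam_hat = censored_exp_mle(observed, censored_at)
print(f"lambda_hat = {lam_hat:.4f}")

# Sanity check: the log-likelihood is lower on either side of lambda_hat.
at_peak = censored_exp_loglik(lam_hat, observed, censored_at)
below = censored_exp_loglik(0.9 * lam_hat, observed, censored_at)
above = censored_exp_loglik(1.1 * lam_hat, observed, censored_at)
print("is local maximum:", at_peak > below and at_peak > above)
```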
Problem 5: Poisson Distribution MLE
The data involves accident counts over weeks, with N = 76 total weeks. The Poisson probability mass function is p(i) = e^(-λ) λ^i / i!
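The MLE of λ for Poisson data is the sample mean of the weekly counts:
\[ \hat{\lambda} = \frac{1}{N} \sum_{k} k \, n_k \]
The numerator is the total number of accidents across all weeks, which has to be read from the frequency table. The table did not survive intact in the problem statement, so the n_k cannot be checked here. The value 15.66 is taken from the table as an assumed total, although a genuine total of counts would be a whole number:
\[ \sum_{k} k \, n_k \approx 15.66 \quad \text{(assumed)} \]
Therefore:
\[ \hat{\lambda} = \frac{15.66}{76} \approx 0.206 \]
Under that assumption, the estimated Poisson parameter is about 0.206 accidents per week. The figure should be recomputed from the actual n_k once they are available.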
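The sample-mean formula is easy to apply once a frequency table is available. The sketch below uses a small made-up table purely to show the mechanics; `toy_freq` is hypothetical and is NOT the exam's accident data, and `poisson_mle` is an illustrative helper name.

```python
def poisson_mle(freq):
    """MLE of the Poisson rate from a frequency table.

    freq maps a count k to n_k, the number of periods with exactly k events.
    """
    n_total = sum(freq.values())                          # N = sum of n_k
    total_events = sum(k * n_k for k, n_k in freq.items())
    return total_events / n_total                         # the sample mean


# Hypothetical toy table (not the exam's data): 10 periods in total.
toy_freq = {0: 5, 1: 3, 2: 2}
lam_hat = poisson_mle(toy_freq)   # (0*5 + 1*3 + 2*2) / 10
print(f"lambda_hat = {lam_hat}")
```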
Problem 6: Confidence Interval for the Mean in Normal Distribution
Given a sample of size 15 with sample mean X̄ = 10 and sample variance s^2 = 25, the standard error (SE) is:
\[ SE = \frac{\sqrt{s^2}}{\sqrt{n}} = \frac{5}{\sqrt{15}} \approx \frac{5}{3.873} \approx 1.29 \]
Using the t-distribution for 90% confidence and df = 14, t_{0.95,14} ≈ 1.761.
The confidence interval for μ is:
\[ \text{CI} = \bar{X} \pm t \times SE = 10 \pm 1.761 \times 1.291 \]
\[ \approx 10 \pm 2.273 \]
Thus, the 90% CI for μ is approximately (7.727, 12.273).
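The t-interval can be reproduced as follows. This sketch takes the critical value t ≈ 1.761 from the text as an input instead of computing it, since the Python standard library has no t-distribution quantile function; `t_confidence_interval` is a hypothetical helper name.

```python
from math import sqrt


def t_confidence_interval(mean, sample_var, n, t_crit):
    """Two-sided t-interval for a normal mean with unknown variance.

    t_crit is the t quantile for n - 1 degrees of freedom, supplied by the caller.
    """
    se = sqrt(sample_var / n)   # s / sqrt(n)
    margin = t_crit * se
    return mean - margin, mean + margin


# Values from Problem 6: n = 15, sample mean 10, s^2 = 25, t_{0.95,14} ~ 1.761.
low, high = t_confidence_interval(10, 25, 15, t_crit=1.761)
print(f"90% CI: ({low:.3f}, {high:.3f})")
```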