Assessment 1 - Assignment Unit: STA101 – Statistics for Business
This assignment covers four questions related to statistical analysis and interpretation. Students are required to answer all questions thoroughly, supporting responses with appropriate Harvard style references where necessary. Answers should include clear explanations, and while no specific word limit is imposed, comprehensive and detailed responses are expected to demonstrate a strong understanding of the topics.
Sample Paper for the Above Instructions
Question 1: Analysis of Quiz Data from Prof. Hardtack's Class
Prof. Hardtack administered four quizzes last semester to a senior tax accounting class comprising ten students. The quiz scores are as follows:
- Quiz 1: 60, 60, 60, 60, 71, 73, 74, 75, 88, 99
- Quiz 2: 65, 65, 65, 65, 70, 74, 79, 79, 79, 79
- Quiz 3: 66, 67, 70, 71, 72, 72, 74, 74, 95, 99
- Quiz 4: 10, 49, 70, 80, 85, 88, 90, 93, 97, 98
(a) Calculation of Mean, Median, and Mode for Each Quiz
To analyze student performance, we first compute the measures of central tendency for each quiz. The mean is calculated as the sum of scores divided by the number of observations (n=10). The median involves arranging scores in order and identifying the middle value(s). The mode reflects the score that appears most frequently.
- Quiz 1: Scores: 60, 60, 60, 60, 71, 73, 74, 75, 88, 99
- Mean: (60+60+60+60+71+73+74+75+88+99) / 10 = 720 / 10 = 72
- Median: Middle two scores (since n=10): average of 5th and 6th scores: (71 + 73) / 2 = 72
- Mode: 60 (appears four times)
- Quiz 2: Scores: 65, 65, 65, 65, 70, 74, 79, 79, 79, 79
- Mean: (65×4 + 70 + 74 + 79×4) / 10 = (260 + 70 + 74 + 316) / 10 = 720 / 10 = 72
- Median: average of 5th and 6th scores: (70 + 74)/2 = 72
- Mode: 65 and 79 (both appear four times—bimodal)
- Quiz 3: Scores: 66, 67, 70, 71, 72, 72, 74, 74, 95, 99
- Mean: Sum = 66+67+70+71+72+72+74+74+95+99 = 760; 760 / 10 = 76
- Median: average of 5th and 6th scores: (72 + 72)/2 = 72
- Mode: 72 and 74 (each twice—bimodal)
- Quiz 4: Scores: 10, 49, 70, 80, 85, 88, 90, 93, 97, 98
- Mean: (10+49+70+80+85+88+90+93+97+98)/10 = 760 / 10 = 76
- Median: average of 5th and 6th scores: (85 + 88)/2= 86.5
- Mode: No repeated scores; thus, no mode
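The hand calculations above can be checked with Python's standard `statistics` module. This is a minimal sketch: reporting "no mode" when every score is equally frequent is an assumed textbook convention added here to match the Quiz 4 answer; `multimode` itself would return every score in that case.

```python
from statistics import mean, median, multimode

# Quiz scores for Prof. Hardtack's ten students, copied from the question
quizzes = {
    "Quiz 1": [60, 60, 60, 60, 71, 73, 74, 75, 88, 99],
    "Quiz 2": [65, 65, 65, 65, 70, 74, 79, 79, 79, 79],
    "Quiz 3": [66, 67, 70, 71, 72, 72, 74, 74, 95, 99],
    "Quiz 4": [10, 49, 70, 80, 85, 88, 90, 93, 97, 98],
}

def describe(scores):
    """Return (mean, median, modes); modes is empty when there is no mode."""
    modes = multimode(scores)
    # Assumed convention: if every distinct score is equally frequent, report no mode
    if len(modes) == len(set(scores)):
        modes = []
    return mean(scores), median(scores), sorted(modes)

for name, scores in quizzes.items():
    m, md, modes = describe(scores)
    print(f"{name}: mean={m}, median={md}, mode={modes or 'none'}")
```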
(b) Do these measures of center agree? Explain.
For Quizzes 1 and 2, the mean and median are identical (72), so these measures of center agree, although Quiz 2 has two modes (65 and 79) at opposite ends of the data rather than a single central peak. For Quiz 3, the mean (76) exceeds the median (72) because the two high scores (95 and 99) pull the mean upward. For Quiz 4, the mean (76) is well below the median (86.5) because the low scores of 10 and 49 pull the mean downward. The mode agrees least with the other measures: in Quiz 1 it sits at the lowest score (60), and Quiz 4 has no mode at all.
(c) Strengths and Weaknesses of Each Measure of Center
- Mean: Reflects the average; sensitive to outliers, which can distort the central tendency, especially evident in Quiz 4's outliers.
- Median: The middle value; less affected by outliers, making it a better indicator for skewed data, such as Quiz 4.
- Mode: Indicates the most frequent score; useful for understanding common student performance but can be ambiguous in bimodal or multimodal distributions.
(d) Symmetry or Skewness of the Data
Quizzes 1 and 2 appear roughly symmetric because their mean and median are equal (72). Quiz 3 is skewed to the right (positive skewness): the high scores of 95 and 99 pull the mean (76) above the median (72). Quiz 4 is skewed to the left (negative skewness): the outlying low score of 10 (and, to a lesser extent, 49) pulls the mean (76) below the median (86.5), producing a long tail on the lower side.
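The mean-versus-median comparison used above can be written as a simple rule of thumb. This is a sketch only: the `tolerance` of 1 score point is a hypothetical cutoff chosen for illustration, not a standard threshold.

```python
from statistics import mean, median

def skew_direction(scores, tolerance=1.0):
    """Label skewness by comparing the mean with the median (rule of thumb)."""
    # tolerance is a hypothetical cutoff in score points, not a standard value
    gap = mean(scores) - median(scores)
    if gap > tolerance:
        return "right (positive) skew"
    if gap < -tolerance:
        return "left (negative) skew"
    return "roughly symmetric"

quizzes = {
    "Quiz 1": [60, 60, 60, 60, 71, 73, 74, 75, 88, 99],
    "Quiz 2": [65, 65, 65, 65, 70, 74, 79, 79, 79, 79],
    "Quiz 3": [66, 67, 70, 71, 72, 72, 74, 74, 95, 99],
    "Quiz 4": [10, 49, 70, 80, 85, 88, 90, 93, 97, 98],
}

for name, scores in quizzes.items():
    print(name, skew_direction(scores))
```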
(e) Student Performance Comparison Across Quizzes
Overall, students performed similarly on Quizzes 1 and 2, with a mean and median of 72 on both, indicating consistent performance. Quiz 3 has a higher mean of 76, but its median of 72 shows that the typical student did no better; the increase comes mainly from two high scores (95 and 99). Quiz 4 has the same mean of 76 but a much higher median of 86.5, indicating that most students scored around or above this level, while a few very low scores (10 and 49) pulled the average down. These low outliers also indicate greater variability and inconsistency in performance at the lower end on Quiz 4.
Question 2: Confidence Interval for Proportion of Almonds in Nuts Sample
A sample of 100 Planter's Mixed Nuts revealed 19 almonds.
(a) Constructing a 90% Confidence Interval
The sample proportion (p̂) is 19/100 = 0.19. For a confidence level of 90%, the z-score (z*) is approximately 1.645. The standard error (SE) is:
SE = √[p̂(1 - p̂)/n] = √[0.19 * 0.81 / 100] ≈ √[0.1539 / 100] ≈ 0.0392
The margin of error (ME):
ME = z* × SE ≈ 1.645 × 0.0392 ≈ 0.0645
Thus, the 90% confidence interval is:
[p̂ - ME, p̂ + ME] = [0.19 - 0.0645, 0.19 + 0.0645] ≈ [0.1255, 0.2545]
Or approximately (12.55%, 25.45%).
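The interval can be reproduced with the standard library. This sketch assumes the normal-approximation (Wald) interval used above and obtains z* from `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.90):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value; about 1.645 for 90% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    me = z * se
    return p_hat - me, p_hat + me

low, high = proportion_ci(19, 100)
print(f"90% CI: ({low:.4f}, {high:.4f})")
```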
(b) Normality Assumption
Because the sample size is large (n = 100), the sampling distribution of p̂ is approximately normal by the Central Limit Theorem, provided that both np̂ and n(1 - p̂) are sufficiently large (common rules of thumb require at least 5 or at least 10). Here, np̂ = 19 and n(1 - p̂) = 81, which satisfy either rule, so the normality assumption is justified.
(c) Sample Size for Specified Confidence and Margin of Error
Using the formula for the required sample size:
n = (z² p̂(1 - p̂)) / E²
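Using the sample estimate p̂ = 0.19 as the planning value, z* = 1.645, and E = 0.03:
n = (1.645² × 0.19 × 0.81) / 0.03² ≈ (2.706 × 0.1539) / 0.0009 ≈ 0.4165 / 0.0009 ≈ 462.7
Because a sample size must be rounded up to the next whole number, approximately 463 nuts need to be sampled to achieve this margin of error at 90% confidence. If no prior estimate were available, the conservative worst case p = 0.5 would require n = (1.645² × 0.25) / 0.0009 ≈ 751.7, or 752 nuts.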
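The sample-size formula can be sketched the same way. This assumes the same normal-approximation margin of error; `math.ceil` performs the rounding-up step, and the second call shows the conservative planning value p = 0.5.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p_planning, margin, confidence=0.90):
    """Smallest n with z * sqrt(p(1 - p) / n) <= margin, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p_planning * (1 - p_planning) / margin**2)

n_sample_estimate = sample_size_for_proportion(0.19, 0.03)  # uses p-hat from the sample
n_worst_case = sample_size_for_proportion(0.50, 0.03)       # conservative planning value
print(n_sample_estimate, n_worst_case)
```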
(d) Importance of Sampling Knowledge for Planter's Quality Control
Understanding sampling principles enables effective quality control by allowing the company to estimate the true proportion of almonds in larger batches reliably. Proper sampling ensures accurate, cost-effective quality assessments and helps in making informed decisions about batch acceptance or rejection, thereby maintaining product standards and consumer trust.
Question 3: Regression Analysis of Sound Quality Versus Price
The regression output for 27 stereo speakers models perceived sound quality as a function of price. The key statistics are R² = 0.01104, a standard error of 4.02545, and the estimated coefficients with their standard errors for the intercept and price.
(a) Significance of the Price Coefficient
The coefficient for Price is -0.00453 with a standard error of approximately 0.006019. The t-statistic is calculated as:
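t = coefficient / standard error = -0.00453 / 0.006019 ≈ -0.753
Using the t-distribution with 25 degrees of freedom (n - 2), the two-tailed critical t-value at α = 0.05 is approximately 2.060. Since |t| ≈ 0.753 < 2.060, we fail to reject the null hypothesis that the price coefficient is zero: price is not a statistically significant predictor of perceived sound quality at the 5% level.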
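The slope test can be reproduced in a few lines. The two-tailed 5% critical value for 25 degrees of freedom (about 2.060) is hard-coded from a standard t table because Python's standard library has no t-distribution; with SciPy installed, `scipy.stats.t.ppf(0.975, 25)` gives the same value.

```python
# Values read from the regression output described above
slope = -0.00453
slope_se = 0.006019
n = 27
df = n - 2

# Two-tailed critical value at alpha = 0.05 for df = 25, from a standard t table
t_crit = 2.060

t_stat = slope / slope_se
reject_h0 = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```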
(b) Interpretation of R2
The R² value of 0.01104 means that only about 1.1% of the variability in perceived sound quality is explained by price. This indicates a very weak linear relationship, so price is not a useful predictor of perceived sound quality in this dataset.
(c) Conclusion on Price and Sound Quality
Given the insignificance of the Price coefficient and the low R², there is no statistical evidence to support the claim that higher prices imply higher sound quality in this sample. Consumers should exercise caution and consider other factors beyond price when evaluating sound quality.
Question 4: Hypothesis Testing on Delivery Time
A delivery company claims that their packages arrive within two days on average. The objective is to test if the true average delivery time exceeds this claim.
(a) Null and Alternative Hypotheses
Null hypothesis (H0): μ ≤ 2 days
Alternative hypothesis (H1): μ > 2 days
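The question gives no delivery data, so the right-tailed test implied by these hypotheses can only be sketched with hypothetical numbers. The sample below (36 deliveries, mean 2.3 days, standard deviation 0.8 days) is invented for illustration, and the critical value 1.690 is the one-tailed 5% value for 35 degrees of freedom from a standard t table.

```python
from math import sqrt

def one_sided_t(sample_mean, sample_sd, n, mu0=2.0):
    """Test statistic for H0: mu <= mu0 versus H1: mu > mu0."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))

# Hypothetical sample: 36 deliveries averaging 2.3 days with s = 0.8 days
t_stat = one_sided_t(2.3, 0.8, 36)

# Right-tail critical value at alpha = 0.05 for df = 35, from a standard t table
t_crit = 1.690
print(f"t = {t_stat:.2f}; reject H0 (mean exceeds 2 days): {t_stat > t_crit}")
```

Rejecting H0 here when the true mean is really 2 days or less would be a Type I error; failing to reject when deliveries are actually slower would be a Type II error.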
(b) Type of Error and Impact of Wrongly Concluding No Delay
Failing to reject H0 when it is false results in a Type II error. The impact is that the company might continue to assert an average of two days or less when, in reality, the delivery time exceeds this, leading to consumer dissatisfaction and potential reputational damage.
(c) Error and Impact of Wrongly Concluding a Delay
Rejecting H0 when it is true corresponds to a Type I error. The consequence is that the company might be unfairly judged as having longer delivery times, which could lead to unnecessary operational changes, loss of customer trust, and possible financial costs.
(d) Which Error Is Worse from the Company's Perspective? Why?
The company would likely prioritize avoiding a Type I error, as falsely indicating longer delivery times could damage their reputation and customer trust. Thus, from the company's perspective, a Type I error is more detrimental.
(e) Which Error Is Worse from a Consumer's Standpoint? Why?
Consumers would be more concerned about a Type II error, as it implies the company’s actual delays are being overlooked, leading to unmet expectations, inconvenience, and dissatisfaction. Therefore, failing to detect delays (Type II error) is more damaging from the customer's point of view.