Introduction to Statistics: Variables, Descriptive Statistics, and Probability
When Lance Armstrong won a record seventh Tour de France, there was a lot of interest in the race. The following is a list of variables based on data given on the history of the race. Determine whether each item is a categorical or a numeric variable. If it is numeric, also state whether it is discrete or continuous.
Variable Name | Data Type | Discrete or Continuous
a. Year of Victory | Categorical | N/A
b. Winner | Categorical | N/A
c. Winning Rider’s Country | Categorical | N/A
d. Total time (Hr/Min/Sec) | Numeric | Continuous
e. Average Speed (km/hr) | Numeric | Continuous
f. Number of Stages | Numeric | Discrete
g. Total Distance Ridden (km) | Numeric | Continuous
h. Number of Starting Riders | Numeric | Discrete
i. Number of Finishing Riders | Numeric | Discrete
j. Rider’s Sponsor | Categorical | N/A
The following data give the grade point averages of the Junior Class of Economics majors at a State University: 3.2, 2.5, 2.1, 3.7, 2.8, 2.0, 1.9, 3.8, 2.4, 3.9, 2.8, 2.5, 3.3, 3.1, 2.8.
Constructing and Analyzing GPA Data
a. Construct an ordered stem-and-leaf plot of the GPA data.
To organize the GPA data, we first list the values in increasing order: 1.9, 2.0, 2.1, 2.4, 2.5, 2.5, 2.8, 2.8, 2.8, 3.1, 3.2, 3.3, 3.7, 3.8, 3.9.
Stem-and-leaf plot:
- 1 | 9
- 2 | 0 1 4 5 5 8 8 8
- 3 | 1 2 3 7 8 9
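The stem-and-leaf construction can also be sketched in a few lines of Python. This is only an illustration: the helper name `stem_and_leaf` is hypothetical, and it assumes every value has exactly one decimal place, so the integer part is the stem and the tenths digit is the leaf.

```python
from collections import defaultdict

gpas = [3.2, 2.5, 2.1, 3.7, 2.8, 2.0, 1.9, 3.8, 2.4, 3.9, 2.8, 2.5, 3.3, 3.1, 2.8]

def stem_and_leaf(values):
    """Group one-decimal values by integer part (stem) and tenths digit (leaf)."""
    plot = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)           # 2.8 -> 28; rounding avoids float truncation
        stem, leaf = divmod(tenths, 10)  # 28 -> stem 2, leaf 8
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(gpas)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

Because the values are sorted before grouping, the leaves in each row come out in increasing order, which is what makes the plot "ordered".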
b. Find the mean GPA for the Junior Economics majors, the median, and the mode for their GPAs.
- Mean GPA:
Sum of GPAs: 3.2 + 2.5 + 2.1 + 3.7 + 2.8 + 2.0 + 1.9 + 3.8 + 2.4 + 3.9 + 2.8 + 2.5 + 3.3 + 3.1 + 2.8 = 42.8
Number of data points: 15
Mean = 42.8 / 15 ≈ 2.85
- Median GPA:
Middle value (8th of the 15 values in the ordered list): 2.8
- Mode GPA:
Most frequently occurring GPA: 2.8 appears 3 times
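The three measures of center can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative only.

```python
import statistics

gpas = [3.2, 2.5, 2.1, 3.7, 2.8, 2.0, 1.9, 3.8, 2.4, 3.9, 2.8, 2.5, 3.3, 3.1, 2.8]

total = sum(gpas)
mean_gpa = statistics.mean(gpas)      # sum divided by the number of values
median_gpa = statistics.median(gpas)  # middle value of the sorted data (n is odd here)
mode_gpa = statistics.mode(gpas)      # most frequently occurring value

print(f"sum    = {total:.1f}")
print(f"mean   = {mean_gpa:.2f}")
print(f"median = {median_gpa}")
print(f"mode   = {mode_gpa}")
```

Summing by machine is a useful guard against slips when adding fifteen numbers by hand.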
c. Find the range, Q1, Q3, and the IQR for the GPAs.
- Range:
Maximum - Minimum = 3.9 - 1.9 = 2.0
- Q1 (first quartile):
Position for Q1: (n+1)/4 = (15+1)/4 = 4, so Q1 is the 4th value in the ordered list, which is 2.4.
- Q3 (third quartile):
Position for Q3: 3(n+1)/4 = 3(15+1)/4 = 12, so Q3 is the 12th value in the ordered list, which is 3.3.
- IQR (Q3 - Q1):
3.3 - 2.4 = 0.9
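The (n+1)/4 position rule used here is one of several quartile conventions, and software packages may default to a different one. The sketch below implements this particular rule directly. The helper `value_at_position` is a hypothetical name, and its linear interpolation for fractional positions is an assumption that is not needed for n = 15, where both positions are whole numbers.

```python
gpas = [3.2, 2.5, 2.1, 3.7, 2.8, 2.0, 1.9, 3.8, 2.4, 3.9, 2.8, 2.5, 3.3, 3.1, 2.8]

def value_at_position(sorted_values, position):
    """Return the value at a 1-based position, interpolating if it is fractional.

    Assumes 1 <= position <= len(sorted_values).
    """
    lower = int(position)
    frac = position - lower
    if frac == 0:
        return sorted_values[lower - 1]
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + frac * (above - below)

data = sorted(gpas)
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)      # position 4
q3 = value_at_position(data, 3 * (n + 1) / 4)  # position 12
iqr = q3 - q1
data_range = data[-1] - data[0]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr:.1f}, range = {data_range:.1f}")
```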
d. Determine the Five Number Summary: Minimum, Q1, Median, Q3, Maximum.
- Minimum: 1.9
- Q1: 2.4
- Median: 2.8
- Q3: 3.3
- Maximum: 3.9
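e. Describe the shape of the GPA distribution: The distribution is roughly symmetric, with at most a slight right skew. The mean (about 2.85) sits just above the median (2.8), and the upper half of the data (2.8 to 3.9) is a little more spread out than the lower half (1.9 to 2.8).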
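One rough way to back up a statement about shape is to compare the mean with the median and to compute a moment-based skewness coefficient. The sketch below uses the population form m3 / m2^(3/2). That choice of formula is an assumption made for illustration, not something the problem specifies, and with only 15 values the result should be read loosely.

```python
import statistics

gpas = [3.2, 2.5, 2.1, 3.7, 2.8, 2.0, 1.9, 3.8, 2.4, 3.9, 2.8, 2.5, 3.3, 3.1, 2.8]

mean_gpa = statistics.mean(gpas)
median_gpa = statistics.median(gpas)
n = len(gpas)

# Second and third central moments (population form, dividing by n).
m2 = sum((x - mean_gpa) ** 2 for x in gpas) / n
m3 = sum((x - mean_gpa) ** 3 for x in gpas) / n
skewness = m3 / m2 ** 1.5

print(f"mean - median = {mean_gpa - median_gpa:.3f}")
print(f"skewness      = {skewness:.2f}")
```

A skewness near zero suggests approximate symmetry; a small positive value points to a mild right skew.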
Probability and Statistics for Fuel Economy
The EPA estimates that the mean highway fuel economy for autos is 24.8 mpg with a standard deviation of 6.2 mpg, assuming a normal distribution. Using the 68-95-99.7 Rule (Empirical Rule), we analyze the distribution:
a. Expected interval for middle 68%:
Mean ± 1 standard deviation: 24.8 ± 6.2 = [18.6, 31.0] mpg.
b. Percentage of autos expected to get more than 31 mpg:
Calculate the z-score: (31 - 24.8)/6.2 ≈ 1.0.
From the empirical rule, approximately 16% of data lies beyond 1 standard deviation above the mean, so about 16% get more than 31 mpg.
c. Autos with 31 to 37.2 mpg:
At 37.2 mpg, z = (37.2 - 24.8)/6.2 ≈ 2.0.
Under the Empirical Rule, about 95% of values lie within 2 standard deviations of the mean and about 68% lie within 1, so the two bands between 1 and 2 standard deviations together hold 95% - 68% = 27%. By symmetry, the upper band (z = 1 to z = 2) holds half of that, so about 13.5% of autos get between 31 and 37.2 mpg.
d. Percentage of autos with less than 18.6 mpg:
z = (18.6 - 24.8)/6.2 ≈ -1.0, so about 16%.
e. Autos with no more than 12.4 mpg:
z = (12.4 - 24.8)/6.2 ≈ -2.0, corresponding to about 2.5%.
f. Gas mileage of the worst 2.5% of autos:
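Under the Empirical Rule, the lowest 2.5% lies more than 2 standard deviations below the mean: 24.8 - 2(6.2) = 12.4 mpg or less. Using the more precise z ≈ -1.96 gives 24.8 + (-1.96)(6.2) ≈ 24.8 - 12.15 ≈ 12.65 mpg.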
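The Empirical Rule percentages are rounded. As a rough cross-check, the sketch below computes the corresponding exact normal probabilities with `statistics.NormalDist` from the Python standard library (3.8+), assuming, as the problem does, that fuel economy is normal with mean 24.8 and standard deviation 6.2. The variable names are illustrative.

```python
from statistics import NormalDist

mpg = NormalDist(mu=24.8, sigma=6.2)

within_1sd = mpg.cdf(31.0) - mpg.cdf(18.6)           # rule of thumb: 68%
above_31 = 1 - mpg.cdf(31.0)                         # rule of thumb: 16%
between_31_and_37_2 = mpg.cdf(37.2) - mpg.cdf(31.0)  # rule of thumb: 13.5%
below_18_6 = mpg.cdf(18.6)                           # rule of thumb: 16%
at_most_12_4 = mpg.cdf(12.4)                         # rule of thumb: 2.5%
worst_cutoff = mpg.inv_cdf(0.025)                    # mpg below which 2.5% of autos fall

for label, p in [
    ("within 1 SD", within_1sd),
    ("above 31", above_31),
    ("31 to 37.2", between_31_and_37_2),
    ("below 18.6", below_18_6),
    ("at most 12.4", at_most_12_4),
]:
    print(f"{label:>13}: {p:.1%}")
print(f"2.5% cutoff: {worst_cutoff:.2f} mpg")
```

The exact values differ slightly from the rule-of-thumb figures, which is expected: the rule rounds 68.27% to 68% and uses 2 standard deviations in place of 1.96.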
Health Insurance Premiums and Probabilities
The average annual premium paid by workers is $2080 with a standard deviation of $300, assuming a normal distribution. We analyze the probabilities:
a. Probability a worker pays less than $1730:
Z = (1730 - 2080)/300 ≈ -1.17; from standard normal tables, P(Z < -1.17) ≈ 0.1210, or about 12.1%.
b. Percentage paying between $1800 and $2400:
Z for $1800: (1800 - 2080)/300 ≈ -0.93
Z for $2400: (2400 - 2080)/300 ≈ 1.07
Percent between these z-scores: P(-0.93 < Z < 1.07) = P(Z < 1.07) - P(Z < -0.93) ≈ 0.8577 - 0.1762 = 0.6815, or about 68.2%.
c. Probability paying more than $2500:
Z = (2500 - 2080)/300 ≈ 1.4; P(Z > 1.4) ≈ 0.0808 or 8.1%.
d. Percentage paying less than $2080:
Z = 0, so 50%.
e. Percentage paying more than $2080:
By the symmetry of the normal distribution about its mean, also 50%.
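The table lookups above can be reproduced with `statistics.NormalDist`. In this sketch the z-scores are rounded to two decimals to mimic reading a printed standard normal table; that rounding step, and the helper name `z_score`, are assumptions made for illustration. Working with unrounded z-scores would shift the answers slightly.

```python
from statistics import NormalDist

MEAN, SD = 2080, 300
std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_score(x):
    """Standardize x and round to two decimals, as when using a z-table."""
    return round((x - MEAN) / SD, 2)

p_below_1730 = std_normal.cdf(z_score(1730))
p_1800_to_2400 = std_normal.cdf(z_score(2400)) - std_normal.cdf(z_score(1800))
p_above_2500 = 1 - std_normal.cdf(z_score(2500))
p_below_mean = std_normal.cdf(z_score(2080))

print(f"P(X < 1730)        = {p_below_1730:.4f}")
print(f"P(1800 < X < 2400) = {p_1800_to_2400:.4f}")
print(f"P(X > 2500)        = {p_above_2500:.4f}")
print(f"P(X < 2080)        = {p_below_mean:.4f}")
```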