When To Use A Mean And When To Use A Median

This video focuses more on when to use a mean and when to use a median. House prices are used to show that when data are asymmetric, especially when there are extreme outliers, the median describes a typical value better than the mean. Specifically, the prices of properties on two blocks are compared. On one block, all the houses are similar, and the median and mean differ little. On the other, there is one large, expensive apartment block, so the mean is nearly twice the median and far from the price of any individual property. But we want to move away from the idea that the data, and only the data, drive the choice of descriptive statistic. For example, if you wanted to buy every house in Brooklyn and budgeted by multiplying the median price by the number of houses, you would not have enough cash.
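The two-block comparison can be sketched with Python's standard `statistics` module. The prices below are hypothetical values (in thousands of dollars) chosen only to mimic the pattern described, one block of similar houses and one block containing a single very expensive apartment building; they are not the figures used in the video.

```python
from statistics import mean, median

# Hypothetical asking prices in thousands of dollars (illustration only).
block_a = [410, 425, 440, 450, 465, 480, 495]   # similar houses
block_b = [410, 425, 440, 450, 465, 480, 3400]  # one large, expensive apartment block

for name, prices in [("Block A", block_a), ("Block B", block_b)]:
    m, md = mean(prices), median(prices)
    # On Block B the mean lands far from every individual price.
    print(f"{name}: mean={m:.1f}, median={md}, mean/median={m / md:.2f}")
```

With these made-up numbers, Block A's mean and median are nearly equal, while Block B's mean is close to twice its median, even though six of the seven prices are unchanged.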

So the median is a useful descriptive statistic, but the mean is essential for planning and making decisions. Respond to one of the following questions in your initial post: Should you use the median or the mean to describe a data set if the data are not skewed? Are the standard deviation or the interquartile range factors in that choice?

The appropriate measure of central tendency, the mean or the median, depends heavily on the distribution of the data set under consideration. When the data are not skewed, or are roughly symmetric, the mean usually provides a reliable and informative summary. The mean uses every value in the data set and is the balance point of the distribution: the deviations above it exactly offset the deviations below it. In a symmetric distribution the mean and the median coincide, or nearly so, so the mean loses nothing as a description of the typical value. For instance, if house prices in a neighborhood are roughly symmetric, the mean gives an accurate measure of the central price level, helping buyers, sellers, and policymakers understand the general market.
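The balance-point property can be checked directly. This is a minimal sketch using a hypothetical helper, `balance_check`, and made-up, perfectly symmetric prices (thousands of dollars); integer data are used so that the sum of deviations is exactly zero rather than merely close to zero after floating-point rounding.

```python
from statistics import mean, median

def balance_check(values):
    """Hypothetical helper: return (mean, median, sum of deviations from the mean)."""
    m = mean(values)
    return m, median(values), sum(v - m for v in values)

# Hypothetical, perfectly symmetric prices in thousands of dollars.
symmetric = [400, 425, 450, 475, 500]

# For symmetric data the mean equals the median, and the deviations sum to zero.
print(balance_check(symmetric))

# The deviations sum to zero for any data set, but the median can differ from the mean.
print(balance_check([1, 2, 3, 10]))
```

The second call shows that the mean is always the balance point; what symmetry adds is that the balance point and the middle value agree.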

In symmetric distributions the mean is a natural choice: because it incorporates every data point, it fluctuates less from sample to sample than the median does, particularly when the data are approximately normal. However, when the distribution is skewed (asymmetric) or includes significant outliers, the mean can be misleading. Extreme values pull the mean toward the long tail of the distribution, so it no longer reflects the value typical of most observations. In such cases the median is more appropriate: as the middle value of the ordered data, it is resistant to outliers and skewness and better represents the typical case.
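A quick way to see this resistance is to keep a small set of hypothetical prices (thousands of dollars) fixed and push only the largest value further and further out. The helper `with_largest` is a hypothetical name introduced for this sketch.

```python
from statistics import mean, median

# Hypothetical prices in thousands of dollars (illustration only).
base = [410, 425, 440, 450, 465, 480, 495]

def with_largest(values, new_max):
    """Hypothetical helper: sorted copy of values with the largest element replaced."""
    data = sorted(values)
    data[-1] = new_max
    return data

for top in (495, 1_000, 5_000, 50_000):
    data = with_largest(base, top)
    # The mean chases the outlier; the median (the middle value) does not move.
    print(f"largest={top:>6}: mean={mean(data):9.1f}  median={median(data)}")
```

Only one of the seven values changes, yet the mean grows without bound while the median stays at the middle price.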

The choice between mean and median depends not only on the shape of the distribution but also on the purpose of the analysis. For planning, budgeting, or resource allocation, the mean is indispensable because multiplying it by the number of observations recovers the total. This holds whether or not the data are symmetric, which is exactly why a budget based on the median price of Brooklyn houses falls short. On the other hand, when communicating a typical value to stakeholders, or when the data are heavily skewed, such as property prices with a few extremely expensive properties, the median offers a more realistic snapshot of the typical value.
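The budgeting argument is simple arithmetic: the mean times the count equals the total, while the median times the count does not. A minimal sketch with hypothetical prices (thousands of dollars) for one skewed block:

```python
from statistics import mean, median

# Hypothetical prices in thousands of dollars for one skewed block (illustration only).
prices = [410, 425, 440, 450, 465, 480, 3400]
n = len(prices)

total_cost = sum(prices)                # what buying every property actually costs
budget_from_mean = mean(prices) * n     # equals the total (up to float rounding)
budget_from_median = median(prices) * n

print(f"actual total:       {total_cost:,}")
print(f"mean x count:       {budget_from_mean:,.0f}")
print(f"median x count:     {budget_from_median:,}")
print(f"median shortfall:   {total_cost - budget_from_median:,}")
```

Formatting the mean-based budget with `.0f` hides the tiny floating-point error that can appear when the mean is not a whole number.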

Whether to use the standard deviation or the interquartile range (IQR) as a measure of variability also depends on the data's distribution. The standard deviation measures dispersion around the mean and, because it squares each deviation, is strongly inflated by outliers; it is most interpretable when the data are roughly symmetric, ideally approximately normal. Conversely, the IQR, which measures the spread of the middle 50% of the data (the third quartile minus the first quartile), is robust to skewness and outliers, and it pairs naturally with the median as a measure of central tendency.
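The contrast between the two spread measures can be sketched with `statistics.stdev` and `statistics.quantiles`. The prices are hypothetical (thousands of dollars), and `iqr` is a helper written for this sketch; note that quartile conventions differ between tools, and this one relies on the module's default `'exclusive'` method.

```python
from statistics import quantiles, stdev

def iqr(values):
    """Hypothetical helper: Q3 - Q1 using statistics.quantiles' default 'exclusive' method."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

# Hypothetical prices in thousands of dollars (illustration only).
similar = [410, 425, 440, 450, 465, 480, 495]
skewed = [410, 425, 440, 450, 465, 480, 3400]  # same block, one extreme property

for name, data in [("similar", similar), ("skewed", skewed)]:
    # The single outlier inflates the SD dramatically but leaves the IQR unchanged.
    print(f"{name:>7}: SD={stdev(data):8.1f}  IQR={iqr(data):6.1f}")
```

Because the IQR depends only on the middle half of the ordered data, replacing the top value leaves it untouched, which is why it suits the same skewed data for which the median is preferred.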

In summary, if the data are not skewed, the mean together with the standard deviation describes the data's center and spread well. When the data are skewed or contain outliers, the median paired with the interquartile range offers a more accurate and resistant summary. Understanding the data's distribution therefore shapes the choice of descriptive statistics, and keeping the purpose of the analysis in view, such as whether a total must be planned for, guides analysts toward more meaningful insights and better-informed decisions.
