Activity I: Suppose Y Is a Dichotomous Dependent Variable

Activity I: Suppose Y is a dichotomous dependent variable, and the data-generating process for Y can be expressed as: Yi = 0.1 + 0.03X1i − 0.02X2i + Ui. Interpret the coefficients on X1 and X2.

Activity II: Suppose you are trying to learn the relationship between the price you charge for your product and the likelihood of purchase by individuals offered that price. You offer your product online and for one month have randomly posted prices between $10 and $30. Using data on purchases and prices, you get the following estimates for a linear probability model: Purchasei = 1.7 − 0.06 × Pricei. You are interested in the effect of a $20 price increase (i.e., moving from the lowest price to the highest) on the likelihood of Purchase. Why is answering this question problematic using this model?

Paper for the Above Instructions

A dichotomous dependent variable such as Y takes only the values 0 and 1, so its conditional mean is a probability: E(Y | X1, X2) = P(Y = 1 | X1, X2). When the data-generating process is linear, as in the first activity's model Yi = 0.1 + 0.03X1i − 0.02X2i + Ui, it is a linear probability model (LPM), and, assuming E(U | X1, X2) = 0, each coefficient measures the change in the probability that Y = 1 per one-unit change in its regressor. The coefficient 0.03 on X1 means that, holding X2 constant, a one-unit increase in X1 raises the probability that Y = 1 by 0.03, or 3 percentage points (Angrist & Pischke, 2009). Similarly, the coefficient −0.02 on X2 means that, holding X1 constant, a one-unit increase in X2 lowers this probability by 0.02, or 2 percentage points. Because the model is linear, these marginal effects are the same at every value of X1 and X2; the intercept, 0.1, is the probability that Y = 1 when both regressors equal zero.
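The interpretation can be checked numerically. The Python sketch below simply evaluates the stated data-generating process, assuming E(U | X1, X2) = 0 so that its systematic part equals P(Y = 1). The helper names `prob_y1` and `marginal_effect` and the evaluation points are hypothetical choices for illustration; the point is that a one-unit change in X1 or X2 shifts the probability by the same amount wherever it starts.

```python
# Activity I: linear probability model (LPM).
# Assumption: E[U | X1, X2] = 0, so P(Y = 1 | X1, X2) = 0.1 + 0.03*X1 - 0.02*X2.


def prob_y1(x1: float, x2: float) -> float:
    """Hypothetical helper: P(Y = 1) implied by the linear DGP."""
    return 0.1 + 0.03 * x1 - 0.02 * x2


def marginal_effect(x1: float, x2: float, which: str, step: float = 1.0) -> float:
    """Hypothetical helper: change in P(Y = 1) when one regressor rises by `step`."""
    if which == "x1":
        return prob_y1(x1 + step, x2) - prob_y1(x1, x2)
    if which == "x2":
        return prob_y1(x1, x2 + step) - prob_y1(x1, x2)
    raise ValueError("which must be 'x1' or 'x2'")


# Illustrative starting points: the effect does not depend on where we start.
for x1, x2 in [(0, 0), (5, 2), (10, 10)]:
    me_x1 = marginal_effect(x1, x2, "x1")  # 0.03, i.e. +3 percentage points
    me_x2 = marginal_effect(x1, x2, "x2")  # -0.02, i.e. -2 percentage points
    print(f"start (X1={x1}, X2={x2}): dP/dX1 = {me_x1:.2f}, dP/dX2 = {me_x2:.2f}")
```

Evaluating at other points gives the same two numbers, which is exactly what "constant marginal effect" means in an LPM.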

In the second activity, the linear probability model estimates that each additional dollar of price lowers the probability of purchase by 0.06, that is, by 6 percentage points. This is easy to read, but it breaks down when used to evaluate a large price change such as the $20 increase from the lowest to the highest posted price. Multiplying the slope by 20 implies that the probability of purchase falls by 1.2, or 120 percentage points, which is impossible because a probability can change by at most 1. The fitted values show the same problem: at a price of $10 the model predicts 1.7 − 0.06 × 10 = 1.1, and at $30 it predicts 1.7 − 0.06 × 30 = −0.1, so both endpoints of the observed price range yield "probabilities" outside [0, 1] (Wooldridge, 2010). Because the fitted line stays within [0, 1] only for prices between roughly $11.67 and $28.33, the model cannot deliver a meaningful estimate of the effect of moving across the full range of prices.
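A minimal sketch of the arithmetic behind this argument, using only the estimated coefficients given in the activity (the function name `predicted_purchase_prob` is a hypothetical label):

```python
# Activity II: estimated LPM, Purchase_i = 1.7 - 0.06 * Price_i
INTERCEPT = 1.7
SLOPE = -0.06


def predicted_purchase_prob(price: float) -> float:
    """Hypothetical helper: LPM fitted value; nothing forces it into [0, 1]."""
    return INTERCEPT + SLOPE * price


low, high = 10.0, 30.0                   # range of randomly posted prices
p_low = predicted_purchase_prob(low)     # 1.7 - 0.6 = 1.1, above 1
p_high = predicted_purchase_prob(high)   # 1.7 - 1.8 = -0.1, below 0
implied_change = p_high - p_low          # -0.06 * 20 = -1.2

# A probability can move by at most 1 in absolute value,
# so an implied change of -1.2 cannot be a valid effect.
print(f"P(purchase | $10) = {p_low:.2f}")
print(f"P(purchase | $30) = {p_high:.2f}")
print(f"Implied effect of +$20 = {implied_change:.2f}")

# Prices at which the fitted value stays inside [0, 1]:
#   1.7 - 0.06p <= 1  =>  p >= 0.7 / 0.06
#   1.7 - 0.06p >= 0  =>  p <= 1.7 / 0.06
lower_valid = (INTERCEPT - 1) / -SLOPE
upper_valid = INTERCEPT / -SLOPE
print(f"Fitted values lie in [0, 1] only for prices in "
      f"[{lower_valid:.2f}, {upper_valid:.2f}]")
```

Both endpoints of the experiment's price range fall outside the interval where the fitted line is a valid probability.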

Furthermore, the linear probability model assumes that the marginal effect of price is the same at every price level, which is implausible for real consumer behavior. When the price is very low and purchase is already nearly certain, or very high and purchase is already unlikely, a further one-dollar change should move the probability less than it does at intermediate prices, implying a non-linear, S-shaped relationship. Logit and probit models capture this pattern by passing a linear index through a cumulative distribution function, which keeps predicted probabilities inside [0, 1] and lets the marginal effect vary with price; the linear model cannot do either (Greene, 2012). Consequently, using the linear probability model to answer questions about substantial price changes can produce inaccurate and misleading conclusions, and in this case the problem arises even within the range of prices actually observed, not only when extrapolating beyond it.
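To illustrate the contrast, the sketch below swaps in a logit model with hypothetical coefficients (A = 6.0, B = −0.3, chosen so that the purchase probability is 0.5 at $20). They are not estimated from the pricing data and serve only to show the shape of the curve. Unlike the constant LPM slope of −0.06, the logit marginal effect B·p·(1 − p) is largest in magnitude in the middle of the price range and shrinks toward the ends, and the implied effect of moving from $10 to $30 cannot exceed 1 in absolute value.

```python
import math

# HYPOTHETICAL logit coefficients for illustration only (not fitted to data):
# P(purchase | price) = 1 / (1 + exp(-(A + B * price)))
A = 6.0
B = -0.3          # implies P = 0.5 at price = -A / B = $20
LPM_SLOPE = -0.06  # constant marginal effect from the estimated LPM


def logit_prob(price: float) -> float:
    """Logistic CDF of the linear index; always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(A + B * price)))


def logit_marginal_effect(price: float) -> float:
    """dP/dPrice = B * p * (1 - p); unlike the LPM, it varies with price."""
    p = logit_prob(price)
    return B * p * (1.0 - p)


for price in (10, 15, 20, 25, 30):
    print(f"${price}: P = {logit_prob(price):.3f}, "
          f"logit dP/dPrice = {logit_marginal_effect(price):.4f}, "
          f"LPM dP/dPrice = {LPM_SLOPE:.4f}")

# Under the logit, the discrete $10 -> $30 effect is bounded by construction.
print(f"Logit effect of $10 -> $30: {logit_prob(30) - logit_prob(10):.3f}")
```

With real data, A and B would come from maximum likelihood estimation rather than being chosen by hand; the design point is only that the logistic link bounds both the predictions and the effect.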

Overall, the linear probability model is a convenient tool for preliminary analysis, but its limitations, particularly its capacity to generate invalid probability predictions and its assumption of constant marginal effects, must be acknowledged. For policy or business decisions involving large changes in an explanatory variable, probit or logit models are more appropriate because their predicted probabilities, and therefore the implied effects, stay within valid bounds (Long & Freese, 2014).

References

  • Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.
  • Greene, W. H. (2012). Econometric Analysis (7th ed.). Pearson Education.
  • Long, J. S., & Freese, J. (2014). Regression Models for Categorical Dependent Variables Using Stata (3rd ed.). Stata Press.
  • Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.