Econ 103: Final Project

For this final project we will use a dataset consisting of various counties in the U.S. with the outcome variable being "mobility," which measures the likelihood of individuals attaining top 25% income levels based on the county where they grew up. The dataset includes demographic variables such as race percentages, education levels, employment, income, inequality, rent, poverty, and population metrics. The goal is to explore and analyze the relationships between these demographic factors and county mobility rates using statistical methods including scatter plots, descriptive statistics, and regression analyses.

The analysis begins with an exploration of the relationship between rent as a percentage of income and county mobility rates. A scatter plot is generated with rent percentage on the x-axis and mobility on the y-axis to visually assess the nature and direction of the association. Based on the plot, the analyst evaluates whether a positive or negative relationship exists and whether the form of the relationship appears linear or nonlinear.

Next, descriptive statistics (mean, median, and variance) are computed for both variables to summarize their central tendency and variability. A gap between the mean and the median points to skewness, and the variance shows how widely counties differ on each measure.
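As a minimal sketch of this step, the snippet below computes the three summary statistics with NumPy. The arrays `rent` and `mobility` are simulated stand-ins with hypothetical names and made-up parameters; they are not the project dataset.

```python
import numpy as np

# Simulated stand-in data (assumption): the real county dataset is not reproduced here.
rng = np.random.default_rng(0)
rent = rng.normal(30.0, 5.0, size=500)                            # rent as % of income
mobility = 0.15 - 0.002 * rent + rng.normal(0.0, 0.03, size=500)  # mobility rate

def describe(x):
    """Return the mean, median, and sample variance (n - 1 denominator)."""
    x = np.asarray(x, dtype=float)
    return {
        "mean": float(x.mean()),
        "median": float(np.median(x)),
        "variance": float(x.var(ddof=1)),
    }

for name, series in [("rent", rent), ("mobility", mobility)]:
    summary = describe(series)
    print(name, {key: round(value, 4) for key, value in summary.items()})
```

The `ddof=1` argument gives the sample variance; NumPy's default (`ddof=0`) divides by n instead.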

Following the descriptive analysis, a simple linear regression model is fitted to quantify the relationship between rent percentage and mobility rate. The estimated slope coefficient is interpreted in context as the change in the predicted likelihood of attaining high income associated with a one-percentage-point increase in rent as a share of income. A one-sided hypothesis test is performed at a significance level of 0.01, with a null hypothesis that the slope is zero or positive against the alternative that it is negative, to determine whether the data support a negative association. The R-squared value is examined to assess the proportion of variation in mobility explained by rent percentage, and the model's predicted mobility for a county with a 27% rent-to-income ratio is calculated.
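The regression steps described here can be sketched as follows. The data are simulated and the names (`rent`, `mobility`, `simple_ols`) are hypothetical. The critical value comes from the standard normal distribution as a large-sample stand-in for the t distribution, which is an assumption of this sketch rather than something the project specifies.

```python
import numpy as np
from statistics import NormalDist

# Simulated stand-in data (assumption): not the project dataset.
rng = np.random.default_rng(1)
n = 500
rent = rng.normal(30.0, 5.0, size=n)
mobility = 0.15 - 0.002 * rent + rng.normal(0.0, 0.03, size=n)

def simple_ols(x, y):
    """Return intercept, slope, slope standard error, and R-squared."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxx = np.sum((x - x.mean()) ** 2)
    slope = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - intercept - slope * x
    sigma2 = np.sum(resid ** 2) / (len(x) - 2)      # residual variance
    se_slope = np.sqrt(sigma2 / sxx)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return intercept, slope, se_slope, r2

b0, b1, se1, r2 = simple_ols(rent, mobility)

# One-sided test of H0: slope >= 0 against H1: slope < 0 at the 0.01 level.
t_stat = b1 / se1
crit = NormalDist().inv_cdf(0.01)   # about -2.33 (normal approximation)
reject = t_stat < crit

pred_27 = b0 + b1 * 27.0            # predicted mobility at a 27% rent share

print(f"slope={b1:.5f}, t={t_stat:.2f}, reject H0: {reject}")
print(f"R-squared={r2:.3f}, prediction at 27%: {pred_27:.4f}")
```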

The regression line is added to the scatter plot for visualization. The plot is analyzed to consider whether homoskedasticity appears to be violated, which would imply non-constant variance of residuals across levels of the independent variable. The interpretation also extends to the causal implications of the findings, noting that correlation does not necessarily imply causation.
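The visual check for heteroskedasticity can be complemented by a numerical one. The sketch below uses a swapped-in technique that the project text does not name: the n·R² (Koenker) form of the Breusch-Pagan test, which regresses the squared residuals on the explanatory variable. The data are simulated so that the error spread grows with rent by construction.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients and residuals; X must include a constant column."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta, y - X @ beta

def r_squared(X, y):
    """R-squared from regressing y on X."""
    _, resid = ols_fit(X, y)
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Simulated stand-in data (assumption) with heteroskedastic errors.
rng = np.random.default_rng(2)
n = 500
rent = rng.uniform(20.0, 40.0, size=n)
noise_sd = 0.002 * (rent - 15.0)          # error spread rises with rent
mobility = 0.15 - 0.002 * rent + noise_sd * rng.normal(size=n)

X = np.column_stack([np.ones(n), rent])
_, resid = ols_fit(X, mobility)

# Auxiliary regression of squared residuals on rent; LM = n * R-squared.
lm_stat = n * r_squared(X, resid ** 2)

# Under homoskedasticity LM is approximately chi-square with 1 degree of freedom;
# the 5% critical value is 3.841.
print(f"LM statistic = {lm_stat:.2f}, exceeds 3.841: {lm_stat > 3.841}")
```

A large statistic suggests that the residual variance changes with rent, in which case heteroskedasticity-robust standard errors would be the usual remedy.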

Potential confounding variables are identified, both within the dataset (e.g., education or income inequality) and outside it (e.g., regional economic factors). The effect of omitting such variables on the estimated rent coefficient is discussed: the direction of the bias depends on the sign of the omitted variable's effect on mobility and the sign of its correlation with rent percentage. When the two signs agree the estimate is biased upward, and when they differ it is biased downward.
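The direction of omitted variable bias can be illustrated with a small simulation. Everything here is an assumption made for illustration: `educ` plays the role of the omitted variable, and the coefficients are invented. The sketch relies on the in-sample identity that the short-regression slope equals the long-regression slope plus the omitted variable's coefficient times the slope from regressing the omitted variable on rent.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients; X must already include a constant column."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Simulated stand-in data (assumption): education raises mobility and is
# negatively correlated with rent share.
rng = np.random.default_rng(3)
n = 2000
educ = rng.normal(0.0, 1.0, size=n)
rent = 30.0 - 2.0 * educ + rng.normal(0.0, 4.0, size=n)
mobility = 0.10 - 0.001 * rent + 0.02 * educ + rng.normal(0.0, 0.03, size=n)

const = np.ones(n)
b_long = ols(np.column_stack([const, rent, educ]), mobility)   # [const, rent, educ]
b_short = ols(np.column_stack([const, rent]), mobility)        # [const, rent]
delta = ols(np.column_stack([const, rent]), educ)[1]           # educ regressed on rent

bias = b_long[2] * delta   # omitted-variable term
print(f"long slope on rent:  {b_long[1]:.5f}")
print(f"short slope on rent: {b_short[1]:.5f}")
print(f"bias term:           {bias:.5f}")
```

In this setup the omitted variable has a positive effect on mobility and a negative correlation with rent, so the signs differ and the short regression overstates how negative the rent coefficient is.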

Advancing to a multiple linear regression framework, three relevant explanatory variables are selected for their theoretical and empirical importance in explaining mobility rates—these could include race composition, education levels, and income inequality. Justifications for their inclusion are provided based on existing literature and data patterns.

Scatter plots between each chosen variable and mobility are generated, with all plots including descriptive labels. These visualizations reveal whether each relationship is increasing or decreasing and whether it appears linear or nonlinear. Because each plot is bivariate, it does not control for the other variables, so the patterns may differ from the partial effects estimated in the multiple regression.

Multicollinearity among the predictors is assessed via correlation coefficients. High correlations suggest multicollinearity, which can inflate the variance of coefficient estimates and impair interpretability. The multiple regression model is then estimated with these variables, and the estimated slopes are interpreted within the context of their expected signs and statistical significance.
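A minimal sketch of the multicollinearity check follows. The three predictors (`college`, `gini`, `pct_black`) are hypothetical names with simulated values. In addition to the pairwise correlations mentioned above, the sketch reports variance inflation factors, taken as the diagonal of the inverse correlation matrix; the VIF is an addition here, not something the project text asks for.

```python
import numpy as np

# Simulated stand-in predictors (assumption): not the project dataset.
rng = np.random.default_rng(4)
n = 500
college = rng.normal(25.0, 6.0, size=n)                 # % with a college degree
gini = 0.45 - 0.004 * (college - 25.0) + rng.normal(0.0, 0.03, size=n)
pct_black = rng.uniform(0.0, 40.0, size=n)

names = ["college", "gini", "pct_black"]
X = np.column_stack([college, gini, pct_black])

# Pairwise correlations between predictors (columns are variables).
corr = np.corrcoef(X, rowvar=False)

# Variance inflation factors: diagonal of the inverse correlation matrix.
vif = np.diag(np.linalg.inv(corr))

print(np.round(corr, 3))
for name, value in zip(names, vif):
    print(f"VIF({name}) = {value:.2f}")
```

A VIF near 1 indicates little overlap with the other predictors; larger values mean the coefficient's variance is inflated by that factor relative to the uncorrelated case.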

A specific explanatory variable is chosen to visualize its relationship with mobility, fixing other variables at their mean values. The fitted regression model is used to plot the estimated relationship, with an assessment of heteroskedasticity based on residual patterns.
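One way to construct the line described here is to build a grid for the chosen variable and hold every other column at its sample mean. The sketch below does this for a hypothetical `college` variable on simulated data; all names and coefficients are assumptions.

```python
import numpy as np

# Simulated stand-in data (assumption): not the project dataset.
rng = np.random.default_rng(5)
n = 500
college = rng.normal(25.0, 6.0, size=n)
gini = 0.45 - 0.004 * (college - 25.0) + rng.normal(0.0, 0.03, size=n)
pct_black = rng.uniform(0.0, 40.0, size=n)
mobility = (0.12 + 0.002 * college - 0.2 * gini - 0.001 * pct_black
            + rng.normal(0.0, 0.03, size=n))

X = np.column_stack([np.ones(n), college, gini, pct_black])
beta = np.linalg.lstsq(X, mobility, rcond=None)[0]

# Vary college over its observed range; fix the other columns at their means.
grid = np.linspace(college.min(), college.max(), 50)
X_grid = np.tile(X.mean(axis=0), (grid.size, 1))   # constant column stays at 1
X_grid[:, 1] = grid
fitted_line = X_grid @ beta

print(f"predicted mobility at min college: {fitted_line[0]:.4f}")
print(f"predicted mobility at max college: {fitted_line[-1]:.4f}")
```

The pairs `(grid, fitted_line)` can then be drawn over a scatter of the raw data. Because the model is linear, the line's slope equals the estimated coefficient on the chosen variable.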

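A hypothesis test is conducted on whether the sum of all slope coefficients equals one. Because the coefficient estimates are correlated with one another, the standard error of their sum requires the covariances as well as the variances, both of which are taken from the estimated variance-covariance matrix. Rejecting the null hypothesis indicates that the combined marginal effect of the predictors differs from one; failing to reject it means the data are consistent with a sum of one.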
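The test on the sum of slopes can be sketched as a single linear restriction. The data and variable names below are simulated assumptions, the variance-covariance matrix is the classical homoskedastic one, and the p-value uses a normal approximation to the t distribution.

```python
import numpy as np
from statistics import NormalDist

# Simulated stand-in data (assumption): not the project dataset.
rng = np.random.default_rng(6)
n = 500
college = rng.normal(25.0, 6.0, size=n)
gini = 0.45 - 0.004 * (college - 25.0) + rng.normal(0.0, 0.03, size=n)
pct_black = rng.uniform(0.0, 40.0, size=n)
mobility = (0.12 + 0.002 * college - 0.2 * gini - 0.001 * pct_black
            + rng.normal(0.0, 0.03, size=n))

X = np.column_stack([np.ones(n), college, gini, pct_black])
k = X.shape[1]

beta = np.linalg.lstsq(X, mobility, rcond=None)[0]
resid = mobility - X @ beta
sigma2 = resid @ resid / (n - k)
vcov = sigma2 * np.linalg.inv(X.T @ X)     # classical variance-covariance matrix

# H0: b1 + b2 + b3 = 1. The vector r picks out the three slopes.
r = np.array([0.0, 1.0, 1.0, 1.0])
est = r @ beta
se = np.sqrt(r @ vcov @ r)                 # includes the covariance terms
t_stat = (est - 1.0) / se
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))   # normal approximation

print(f"sum of slopes = {est:.4f}, se = {se:.4f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```

The quadratic form `r @ vcov @ r` is the sum of the three slope variances plus twice each pairwise covariance, which is why the full matrix is needed rather than the individual standard errors.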

Further model refinement involves adding three additional terms (such as transformations, interaction terms, or new variables) to improve explanatory power. Justifications are based on data patterns, economic theory, or potential diminishing returns. The enhanced model's R-squared is compared to the previous one; because R-squared cannot decrease when terms are added, an F-test assesses whether the additions significantly improve model fit. The null hypothesis is that the coefficients on all three added terms are zero, and the alternative is that at least one is nonzero. Conclusions are drawn based on the test results.
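The comparison of the restricted and enhanced models can be sketched with the usual F statistic for joint exclusion restrictions. The three added terms used here (a squared term, an interaction, and a hypothetical `poverty` variable) are illustrative choices on simulated data, not the terms the project requires.

```python
import numpy as np

def ssr_and_r2(X, y):
    """Return the sum of squared residuals and R-squared from OLS of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    ssr = float(resid @ resid)
    return ssr, 1.0 - ssr / float(np.sum((y - y.mean()) ** 2))

# Simulated stand-in data (assumption): not the project dataset.
rng = np.random.default_rng(7)
n = 500
college = rng.normal(25.0, 6.0, size=n)
gini = 0.45 - 0.004 * (college - 25.0) + rng.normal(0.0, 0.03, size=n)
pct_black = rng.uniform(0.0, 40.0, size=n)
poverty = rng.normal(15.0, 5.0, size=n)
mobility = (0.12 + 0.002 * college - 0.2 * gini - 0.001 * pct_black
            - 0.002 * poverty + rng.normal(0.0, 0.03, size=n))

ones = np.ones(n)
X_r = np.column_stack([ones, college, gini, pct_black])                    # restricted
X_u = np.column_stack([X_r, college ** 2, college * gini, poverty])        # enhanced

ssr_r, r2_r = ssr_and_r2(X_r, mobility)
ssr_u, r2_u = ssr_and_r2(X_u, mobility)

q = X_u.shape[1] - X_r.shape[1]          # number of restrictions (3)
df_u = n - X_u.shape[1]                  # residual degrees of freedom, enhanced model
f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / df_u)

print(f"R-squared: restricted={r2_r:.3f}, enhanced={r2_u:.3f}")
print(f"F statistic = {f_stat:.2f} with ({q}, {df_u}) degrees of freedom")
```

With three restrictions and a large sample, the 5% critical value is roughly 2.6, so an F statistic well above that would lead to rejecting the null that all three added coefficients are zero.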

Finally, the analyst reflects on the findings, identifying the most influential variables, any unexpected results, and whether the model explains more variability than anticipated. These insights inform understanding of the factors associated with intergenerational mobility across U.S. counties.
