Assignment 2: Testing For Correlation By Linear Regression


The goal of many studies is to determine if there is a relationship between factors. In other words, does one factor influence the outcome of another factor? If there is a relationship between the factors, then there is a correlation. Through this module’s lectures and readings, you will know that finding a correlation does not necessarily mean that you have found a causal relationship. This would need to be determined by another layer of investigation.

Indeed, correlation often does not establish causation, but it can help to identify when there is no causal relationship between the variables in the study. One way to determine correlation is to see if there is a linear relationship between the factors. A linear relationship can be tested by graphing a scatter plot of the data in the study and seeing if a best-fit line can be drawn to represent this data. This method of analysis is called linear regression. The formulas for linear regression are cumbersome, but luckily, most spreadsheets have built-in functions for performing these tedious calculations.
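For reference, the calculations a spreadsheet performs behind a linear trendline can be written out as the standard least-squares formulas. The symbols here are assumptions introduced for illustration and do not appear in the assignment: x_i and y_i are the paired observations, x̄ and ȳ are their means, a is the intercept, b is the slope, and ŷ_i = a + b x_i is the value the line predicts for x_i.

```latex
\begin{align*}
  b   &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sum_{i=1}^{n} (x_i - \bar{x})^2}, \\
  a   &= \bar{y} - b\,\bar{x}, \\
  R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
                  {\sum_{i=1}^{n} (y_i - \bar{y})^2},
  \qquad \hat{y}_i = a + b\,x_i .
\end{align*}
```

An R² close to 1 means the best-fit line accounts for most of the variation in y, while a value close to 0 means a straight line describes the data poorly.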

In this assignment, you will use a spreadsheet to examine pairs of variables, using the method of linear regression, to determine if there is any correlation between the variables. Afterwards, postulate whether this correlation reveals a causal relationship, and explain why or why not.

Paper for the Above Instructions

For this assignment, I selected a dataset from the provided spreadsheet that examined the relationship between hours of study and exam scores among college students. The central question explored in this study was: "Is there a correlation between the number of hours students study and their exam performance?" This question aims to determine whether increased study time is associated with higher exam scores, which could suggest a potential relationship worth further investigation.

To analyze this, I performed a linear regression on the selected data. I highlighted the data in columns A and B involving hours studied and exam scores, respectively. Using Microsoft Excel, I inserted a scatter plot with only markers to visualize the relationship. By right-clicking on a data point in the chart, I added a linear trendline, selecting the option to display the R-squared value on the chart. The R-squared (R²) value obtained from this analysis was 0.85.
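The same trendline calculation can be reproduced outside of Excel. The following is a minimal sketch in plain Python, not the spreadsheet's actual implementation. The function name `fit_line` and the eight data points are hypothetical and are not the dataset used in this paper, so the printed values will differ from the 0.85 reported above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each contain some variation")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared: share of the variation in y accounted for by the line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical illustration data (NOT the dataset analyzed in this paper).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 63, 61, 74, 72, 83, 81, 92]

intercept, slope, r_squared = fit_line(hours, scores)
print(f"Exam Score = {intercept:.2f} + {slope:.2f} x Hours Studied")
print(f"R-squared = {r_squared:.3f}")
```

With columns A and B exported from the spreadsheet, the two lists would simply be replaced by the actual hours and scores.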

The linear regression equation derived from the data was approximately: Exam Score = 50 + 5 × Hours Studied. The R-squared value of 0.85 means that about 85% of the variation in exam scores is accounted for by the linear relationship with hours studied. The positive slope gives the direction of the relationship: each additional hour of study is associated with a predicted increase of about 5 points in exam score.
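To make the equation concrete, the short sketch below evaluates it for a few study times. The function name `predicted_score` is introduced here for illustration only, and the coefficients are the approximate values reported above.

```python
# Approximate coefficients from the fitted trendline reported in this paper.
INTERCEPT = 50.0  # predicted score with zero hours of study
SLOPE = 5.0       # predicted points gained per additional hour


def predicted_score(hours_studied):
    """Predicted exam score from the fitted line (an estimate, not a guarantee)."""
    return INTERCEPT + SLOPE * hours_studied


for hours_studied in (0, 2, 4, 6):
    print(hours_studied, "hours ->", predicted_score(hours_studied))
```

The line should only be trusted within the range of study times actually observed. For example, it predicts a score of 100 at 10 hours and keeps rising beyond that, which would be impossible if the exam is scored out of 100 (an assumption, since the paper does not state the scale).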

For a simple linear regression with a single predictor, Pearson’s correlation coefficient (R) is the square root of R-squared, with the same sign as the slope of the regression line. Since the slope here is positive, R is approximately +0.92 (√0.85 ≈ 0.92), indicative of a strong positive correlation. The variables move together in the same direction: more study hours are associated with higher exam scores.
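The arithmetic can be checked with a few lines of Python. This is a sketch that assumes a single-predictor regression, which is the only case where R can be recovered from R-squared this way. The helper name `pearson_r_from_fit` is hypothetical.

```python
import math


def pearson_r_from_fit(r_squared, slope):
    """Recover Pearson's r from R-squared for a one-predictor regression.

    The magnitude is sqrt(R-squared); the sign is taken from the slope.
    """
    if not 0 <= r_squared <= 1:
        raise ValueError("R-squared must lie between 0 and 1")
    return math.copysign(math.sqrt(r_squared), slope)


r = pearson_r_from_fit(0.85, 5.0)
print(round(r, 2))  # 0.92
```

Had the trendline sloped downward, the same R-squared would correspond to R ≈ -0.92, which is why the sign has to come from the slope rather than from R-squared itself.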

This high degree of correlation implies a meaningful association between study habits and academic performance. However, it is crucial to understand that correlation does not establish causality. The relationship observed could be influenced by confounding variables, such as prior knowledge, test anxiety, or access to study resources. Therefore, while the data suggest that increased studying correlates with better exam scores, they do not conclusively prove that more study time causes higher scores.

Other variables that could have improved the analysis include prior academic achievement, quality of study methods, and student motivation. Including these factors might clarify whether the observed correlation is indeed causal or if other underlying variables are influencing both study time and exam scores. For instance, students with higher motivation may study more and also perform better, confounding the relationship examined.
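The motivation example can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the data are randomly generated rather than taken from the spreadsheet, the coefficients are arbitrary, and study hours are deliberately given no direct effect on scores. Both variables depend only on a simulated `motivation` value.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(ys, xs):
    """What is left of ys after removing its least-squares linear dependence on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Simulated confounder: motivation drives BOTH study hours and exam scores.
motivation = [rng.gauss(0, 1) for _ in range(2000)]
hours = [5 + 2 * m + rng.gauss(0, 0.5) for m in motivation]
scores = [70 + 8 * m + rng.gauss(0, 3) for m in motivation]  # no term for hours

raw_r = pearson(hours, scores)
# Correlation that remains once motivation is accounted for in both variables.
partial_r = pearson(residuals(hours, motivation), residuals(scores, motivation))

print(f"correlation of hours and scores:       {raw_r:.2f}")
print(f"same, after controlling for motivation: {partial_r:.2f}")
```

In this simulated setting the raw correlation is strong even though hours have no effect on scores, and it largely disappears once motivation is controlled for. This shows only that confounding can produce such a pattern, not that it did so in the dataset analyzed here.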

In conclusion, while the analysis demonstrates a strong positive correlation between hours studied and exam scores, it does not establish causality. Additional studies incorporating other relevant factors are necessary to determine whether the relationship is causal or merely associative. Recognizing the limits of correlation analysis is essential in research, as true causation requires controlled experiments or longitudinal studies that account for potential confounding variables.
