Types Of Logistic Regression | Robert Laukait
In this discussion, a fictitious researcher plans to perform a logistic regression analysis using the default (enter) method. According to Warner (2013) and George and Mallery (2013), the enter method enters all independent variables into the model simultaneously, so the researcher handles a single block of variables at once. This approach shows the effects of all variables together, with each predictor's effect estimated while controlling for the others.
Alternatively, the researcher’s friend Barbara recommends a sequential logistic regression. This method adds variables one at a time, in sequence, until the model fit no longer improves, as described by Sawtelle, Brewe, & Kramer (2011) and Stockburger (n.d.). One advantage attributed to a sequential approach concerns missing data: Raghunathan, Lepkowski, Van Hoewyk, & Solenberger (2001) describe a technique that multiply imputes missing values using a sequence of regression models. However, a potential disadvantage is that cases with missing data might be inappropriately retained or discarded, depending on the context, which could compromise data integrity.
Finally, the researcher’s friend Linda advocates a stepwise logistic regression. In software such as SPSS, this method adds or removes variables automatically, one at a time, based on the strength of each variable's statistical contribution to the model, as explained by George & Mallery (2013) and the School of Geography (n.d.). The primary benefit of stepwise regression is that it focuses on the variables that provide the best model fit, thereby simplifying the model. Nonetheless, it has substantial drawbacks, such as the risk of discarding variables that are theoretically important but do not fit well statistically, as Smith (2012) points out.
To determine the most appropriate approach, the researcher should begin with an initial review of the dataset. If missing data are present and imputation is necessary, the sequential method may be suitable, provided the sample is large enough to estimate missing values reliably. If the goal is to identify the variables most relevant to the research question, the stepwise approach could be advantageous. Conversely, if the dataset is complete and discarding cases would not threaten the reliability of the results, the enter method might be optimal because it is straightforward and shows the effects of all variables clearly.
Logistic regression is a commonly used statistical technique for modeling the relationship between a dichotomous dependent variable and one or more independent variables. The method used to enter predictors into the model substantially affects the interpretation, validity, and reliability of the results. This paper explores three primary types of logistic regression (enter, sequential, and stepwise), discusses their advantages and disadvantages, and offers guidance on selecting the most appropriate method based on data properties and research objectives.
The enter method, also known as the standard or forced entry method, involves entering all specified independent variables into the logistic regression model simultaneously. This approach allows researchers to observe the combined effect of all variables on the outcome, providing a comprehensive view of their relationships. Warner (2013) and George and Mallery (2013) emphasize that the enter method is straightforward and transparent, making it suitable when the researcher has theoretical reasons for including specific variables and when all variables are deemed equally important. Its primary advantage is that it assesses the unique contribution of each predictor while controlling for others. However, it may result in a less parsimonious model if many variables are included, potentially leading to issues such as multicollinearity or overfitting.
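To make the enter method concrete, the following is a minimal Python sketch, not a reproduction of SPSS or of any source cited here. It fits one logistic regression with every predictor entered in a single block and prints SPSS-style columns (B, S.E., Wald, Exp(B)), so each coefficient is read as that predictor's contribution while controlling for the others. Assumptions: estimation uses plain Newton-Raphson maximum likelihood written with the standard library only; the helpers `solve`, `fit_logit`, and `make_toy_data` and the predictors `x1`, `x2`, `x3` are hypothetical; and the 60-case dataset is constructed purely for illustration, not taken from any study.

```python
# Illustrative sketch only: hypothetical helper names and toy data, not SPSS's own code.
import math


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logit(rows, y, names, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Returns (coefficients, log-likelihood, information matrix); coefficient 0 is the constant.
    """
    X = [[1.0] + [float(r[name]) for name in names] for r in rows]
    k = len(X[0])

    def probs(beta):
        return [1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(beta, x)))) for x in X]

    def information(p):  # X' W X with W = diag(p * (1 - p))
        return [[sum(pi * (1.0 - pi) * x[i] * x[j] for x, pi in zip(X, p)) for j in range(k)]
                for i in range(k)]

    beta = [0.0] * k
    for _ in range(max_iter):
        p = probs(beta)
        score = [sum((yi - pi) * x[j] for x, yi, pi in zip(X, y, p)) for j in range(k)]
        step = solve(information(p), score)  # Newton step: (X'WX)^-1 X'(y - p)
        beta = [bj + s for bj, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = probs(beta)
    loglik = sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))
    return beta, loglik, information(p)


def make_toy_data():
    """Hypothetical data: 12 predictor patterns, each observed with both outcomes,
    so no separation occurs and the maximum-likelihood estimates exist."""
    rows, y = [], []
    for x1 in (0, 1, 2):
        for x2 in (0, 1):
            for x3 in (0, 1):
                n_yes = 1 + x1 + x2        # more 1s as x1 and x2 increase
                n_no = 1 + (2 - x1) + x3   # more 0s as x1 decreases or x3 increases
                for outcome, count in ((1, n_yes), (0, n_no)):
                    rows.extend({"x1": x1, "x2": x2, "x3": x3} for _ in range(count))
                    y.extend([outcome] * count)
    return rows, y


rows, y = make_toy_data()
predictors = ["x1", "x2", "x3"]  # enter method: every predictor in one block
beta, loglik, info = fit_logit(rows, y, predictors)
_, null_loglik, _ = fit_logit(rows, y, [])  # constant-only model for comparison

print(f"-2LL constant only: {-2 * null_loglik:.3f}   -2LL all predictors: {-2 * loglik:.3f}")
for j, name in enumerate(["Constant"] + predictors):
    unit = [1.0 if i == j else 0.0 for i in range(len(beta))]
    se = math.sqrt(solve(info, unit)[j])  # square root of j-th diagonal of the inverse information
    wald = (beta[j] / se) ** 2            # Wald chi-square with 1 df, as in SPSS output
    print(f"{name:>8}  B={beta[j]:7.3f}  S.E.={se:.3f}  Wald={wald:.3f}  Exp(B)={math.exp(beta[j]):.3f}")
```

In this output, each Exp(B) is the multiplicative change in the odds of the outcome for a one-unit increase in that predictor, holding the other predictors in the block constant.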
In contrast, sequential logistic regression builds the model in steps, adding variables one at a time (or in blocks) and judging each step by criteria such as significance levels or the change in model fit (Sawtelle, Brewe, & Kramer, 2011). This approach is often used in exploratory analyses, where the researcher aims to identify the most influential predictors. Its advantages include a connection to missing-data handling, since Raghunathan et al. (2001) describe imputing missing values through a sequence of regression models, and the ability to reduce the variables to a manageable subset that best explains the dependent variable. Nonetheless, sequential regression can be computationally intensive and may introduce bias if the included variables are not theoretically justified, leading to overfitting and poor generalizability of the model.
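The sequential approach can be sketched as a series of nested models, each compared with the one before it by a likelihood-ratio chi-square (the drop in -2 log-likelihood, with 1 df when one predictor is added). This is a hedged illustration under several assumptions: the entry order (`x1`, `x2`, `x3`) is fixed in advance by the researcher rather than chosen by the data, one predictor enters per step, entry stops at the first non-significant step at alpha = .05, and the data and helper names (`fit_logit`, `lr_test`, `sequential_entry`, `make_toy_data`) are hypothetical. It does not implement the imputation technique of Raghunathan et al. (2001).

```python
# Illustrative sketch only: hypothetical helpers, toy data, and an assumed entry order.
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logit(rows, y, names, max_iter=50, tol=1e-10):
    """Newton-Raphson logistic regression on the named predictors plus a constant.
    Returns (coefficients, log-likelihood)."""
    X = [[1.0] + [float(r[name]) for name in names] for r in rows]
    k = len(X[0])

    def probs(beta):
        return [1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(beta, x)))) for x in X]

    beta = [0.0] * k
    for _ in range(max_iter):
        p = probs(beta)
        score = [sum((yi - pi) * x[j] for x, yi, pi in zip(X, y, p)) for j in range(k)]
        info = [[sum(pi * (1.0 - pi) * x[i] * x[j] for x, pi in zip(X, p)) for j in range(k)]
                for i in range(k)]
        step = solve(info, score)
        beta = [bj + s for bj, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = probs(beta)
    return beta, sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def make_toy_data():
    """Hypothetical data in which every predictor pattern occurs with both outcomes."""
    rows, y = [], []
    for x1 in (0, 1, 2):
        for x2 in (0, 1):
            for x3 in (0, 1):
                for outcome, count in ((1, 1 + x1 + x2), (0, 1 + (2 - x1) + x3)):
                    rows.extend({"x1": x1, "x2": x2, "x3": x3} for _ in range(count))
                    y.extend([outcome] * count)
    return rows, y


def lr_test(ll_smaller, ll_larger):
    """Likelihood-ratio chi-square (drop in -2LL) for one added predictor, with its p-value."""
    chi2 = max(2.0 * (ll_larger - ll_smaller), 0.0)  # guard against tiny negative rounding
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))     # P(chi-square with 1 df > chi2)


def sequential_entry(rows, y, order, alpha=0.05):
    """Enter predictors one at a time in a researcher-specified order and stop at the
    first step whose likelihood-ratio test shows no significant improvement in fit."""
    retained, steps = [], []
    _, ll_current = fit_logit(rows, y, retained)  # step 0: constant-only model
    for name in order:
        _, ll_next = fit_logit(rows, y, retained + [name])
        chi2, p = lr_test(ll_current, ll_next)
        steps.append((name, chi2, p))
        if p >= alpha:
            break  # fit no longer improves significantly, so stop adding
        retained.append(name)
        ll_current = ll_next
    return retained, steps


rows, y = make_toy_data()
retained, steps = sequential_entry(rows, y, order=["x1", "x2", "x3"])  # assumed theory-driven order
for name, chi2, p in steps:
    print(f"add {name}: LR chi-square = {chi2:.3f}, p = {p:.4f}")
print("retained:", retained)
```

Changing `order` changes the question each step answers, which is why, in this sketch, the order is treated as a theoretical decision made before looking at the results.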
Stepwise logistic regression is a particular form of sequential regression in which variables are systematically added or removed based on their statistical significance. Using algorithms such as forward selection, backward elimination, or a combination of the two, this method aims to identify the model with the optimal subset of predictors (George & Mallery, 2013; School of Geography, n.d.). Its strength lies in simplifying the model: by retaining only the variables with the strongest associations, it improves interpretability and efficiency. However, Smith (2012) criticizes stepwise procedures because they can exclude variables that are theoretically important but do not meet strict statistical criteria, raising concerns about model stability and replicability.
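Forward selection, one of the stepwise algorithms named above, can be sketched in the same self-contained style. At each step the sketch tries every remaining candidate, keeps the one whose addition produces the largest likelihood-ratio chi-square, and stops when no candidate is significant at an assumed entry level of .05. This is a simplification under stated assumptions: SPSS's built-in stepwise options use their own entry and removal statistics, backward elimination and removal checks are not shown, and the helper names (`fit_logit`, `lr_test`, `forward_stepwise`, `make_toy_data`), candidate predictors, and toy data are hypothetical.

```python
# Illustrative sketch only: simplified forward selection with hypothetical helpers and toy data.
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logit(rows, y, names, max_iter=50, tol=1e-10):
    """Newton-Raphson logistic regression; returns (coefficients, log-likelihood)."""
    X = [[1.0] + [float(r[name]) for name in names] for r in rows]
    k = len(X[0])

    def probs(beta):
        return [1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(beta, x)))) for x in X]

    beta = [0.0] * k
    for _ in range(max_iter):
        p = probs(beta)
        score = [sum((yi - pi) * x[j] for x, yi, pi in zip(X, y, p)) for j in range(k)]
        info = [[sum(pi * (1.0 - pi) * x[i] * x[j] for x, pi in zip(X, p)) for j in range(k)]
                for i in range(k)]
        step = solve(info, score)
        beta = [bj + s for bj, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = probs(beta)
    return beta, sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def make_toy_data():
    """Hypothetical data in which every predictor pattern occurs with both outcomes."""
    rows, y = [], []
    for x1 in (0, 1, 2):
        for x2 in (0, 1):
            for x3 in (0, 1):
                for outcome, count in ((1, 1 + x1 + x2), (0, 1 + (2 - x1) + x3)):
                    rows.extend({"x1": x1, "x2": x2, "x3": x3} for _ in range(count))
                    y.extend([outcome] * count)
    return rows, y


def lr_test(ll_smaller, ll_larger):
    """Likelihood-ratio chi-square for one added predictor and its df = 1 p-value."""
    chi2 = max(2.0 * (ll_larger - ll_smaller), 0.0)
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def forward_stepwise(rows, y, candidates, alpha_enter=0.05):
    """Forward selection: repeatedly add the remaining candidate whose entry gives the
    largest likelihood-ratio chi-square; stop when that best candidate is not significant."""
    selected, steps = [], []
    _, ll_current = fit_logit(rows, y, selected)  # start from the constant-only model
    remaining = list(candidates)
    while remaining:
        trials = []
        for name in remaining:
            _, ll_try = fit_logit(rows, y, selected + [name])
            chi2, p = lr_test(ll_current, ll_try)
            trials.append((chi2, p, name, ll_try))
        chi2, p, best, ll_best = max(trials, key=lambda t: t[0])
        if p >= alpha_enter:
            break  # no remaining candidate improves fit enough to enter
        selected.append(best)
        remaining.remove(best)
        steps.append((best, chi2, p))
        ll_current = ll_best
    return selected, steps


rows, y = make_toy_data()
selected, steps = forward_stepwise(rows, y, ["x1", "x2", "x3"])
for name, chi2, p in steps:
    print(f"entered {name}: LR chi-square = {chi2:.3f}, p = {p:.4f}")
print("selected:", selected)
```

Because the data, not theory, decide which variable enters at each step, the same search can pick different predictors in a new sample, and the p-values printed for the selected variables do not account for the search that produced them; both points bear on the stability and replicability concerns noted above.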
Deciding among these methods depends largely on the nature of the dataset and the research goals. For datasets with missing data requiring imputation, the sequential approach may be advantageous, especially if there is a rationale for estimating missing values and enough data to support reliable imputation. When the aim is to identify the variables most relevant to the research question, the stepwise method can be efficient, provided the researcher is aware of its limitations related to model stability. Conversely, if the data are complete and the researcher wishes to measure the effects of all relevant variables simultaneously, the enter method remains the most straightforward and defensible approach.
In conclusion, understanding the strengths and limitations of each logistic regression approach is essential for appropriate model selection. Researchers must consider their dataset's characteristics, theoretical foundations, and specific research questions. A careful initial review of data properties—such as missingness, variable relevance, and sample size—can guide the choice of the most suitable method, ultimately yielding more accurate and meaningful results in logistic regression analyses.
References
- George, D., & Mallery, P. (2013). IBM SPSS statistics 21 step by step: A simple guide and reference (13th ed.). Pearson.
- Raghunathan, T. E., Lepkowski, J., Van Hoewyk, J., & Solenberger, P. (2001). A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology, 27(1), 85-95.
- Sawtelle, V., Brewe, E., & Kramer, L. H. (2011). Sequential logistic regression: A method to reveal subtlety in self-efficacy. In Proceedings of the tenth annual college of education & GSN research conference. Miami, FL.
- School of Geography. (n.d.). Stepwise linear regression. Retrieved February 7, 2017, from University of Leeds.
- Smith, M. K. (2012). Problems with stepwise model selection procedures. Retrieved from Common mistakes in using statistics: Spotting and avoiding them.
- Stockburger, D. W. (n.d.). Multiple regression with many predictor variables. Retrieved from psychstat.missouristate.edu.
- Warner, R. M. (2013). Applied statistics: From bivariate through multivariate techniques (2nd ed.). Thousand Oaks, CA: Sage.