Examine Data, Run Logistic Regressions, And Assess Weighting

Examine data, run logistic regressions, and assess weighting effects using the PUBH_8500_Week11_dataset.sav. Tasks:

1) Produce numeric descriptive statistics (frequencies) for variables: cregion; USR (community type); Sex; Q1 (quality of life); Q6a (internet use); Q16 (self-rated health); Q22a (looked online for disease information; coded 0 = yes, 1 = no); Receduc (LT HS, HS grad, Some coll, Coll+); Race/Ethnicity.

2) Run binary logistic regression with Q22a dependent and Sex independent. Report model summary, OR and 95% CI for sex. Then add Receduc as a covariate and assess how the relationship between Q22a and Sex changes; discuss ORs, CIs, model changes, whether Receduc is a confounder and whether it should be retained.

3) Weight cases using standwt and rerun the logistic regression with Q22a dependent, Sex independent, and Receduc covariate. Compare ORs and 95% CIs between unweighted and weighted analyses and explain how weighting changes outcomes and interpret implications for the relationships. Discuss why weighting is important for surveys with complex sampling schemes.

Paper For Above Instructions

Overview and analytic plan

This analysis uses the PUBH_8500_Week11_dataset.sav to (1) produce descriptive frequencies for key demographic and outcome variables, (2) estimate binary logistic regression models predicting Q22a (looked online for disease information; coded 0 = yes, 1 = no) from Sex and evaluate the effect of adding education (Receduc) as a covariate, and (3) re-estimate the model using the standardized sample weight (standwt) to assess the impact of weighting on odds ratios (ORs), confidence intervals (CIs), and inference. The dependent variable coding (0 = yes, 1 = no) is critical: an OR below 1 for a predictor category means lower odds of answering "no", that is, higher odds of having looked online for disease information, relative to the reference category.

Descriptive statistics

Produce numeric frequencies for these variables: cregion (census region), USR (community type), Sex, Q1 (overall quality of life), Q6a (internet use), Q16 (self-rated health), Q22a (looked online for disease information), Receduc (recoded education), and Race/Ethnicity. Frequencies identify missingness and distributional balance and inform whether stratified or weighted analyses are needed (Korn & Graubard, 1999). Tabulate valid percent and cumulative percent for each category, and note the prevalence of the outcome (Q22a yes vs no) and the distribution of Sex and education categories to contextualize regression results.
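The tabulation described above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure itself: the function name `frequency_table` and the toy Q22a responses are hypothetical, and missing answers are assumed to be stored as `None`.

```python
from collections import Counter


def frequency_table(values, missing=(None,)):
    """Tabulate one categorical variable.

    Returns (rows, n_missing). Each row is
    (category, count, percent, valid_percent, cumulative_percent),
    where percent uses all cases and valid_percent excludes missing ones.
    """
    valid = [v for v in values if v not in missing]
    n_total = len(values)
    n_valid = len(valid)
    counts = Counter(valid)

    rows = []
    cumulative = 0.0
    for category in sorted(counts):
        count = counts[category]
        percent = 100.0 * count / n_total
        valid_percent = 100.0 * count / n_valid
        cumulative += valid_percent
        rows.append((category, count, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows, n_total - n_valid


# Hypothetical toy responses for Q22a (0 = yes, 1 = no, None = missing).
toy_q22a = [0, 0, 1, 1, 1, None, 0, 1]
rows, n_missing = frequency_table(toy_q22a)
for row in rows:
    print(row)
print("missing:", n_missing)
```

Running the same function over each listed variable gives the count, percent, valid percent, and cumulative percent columns that the text asks for, plus the number of missing cases.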

Unweighted logistic regression: Sex predicting Q22a

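First, run a binary logistic regression with Q22a as the dependent variable and Sex as the sole predictor. In the provided unweighted output, Sex(1) has Exp(B) = 0.535 (95% CI: 0.444–0.645), with Cox & Snell R Square ≈ 0.021 and Nagelkerke R Square ≈ 0.029. The -2 Log Likelihood is quoted as ≈ 0.536; a value that small is inconsistent with R-squared measures this low, so it should be checked against the Model Summary table. Interpretation: individuals coded as Sex=1 have about 0.535 times the odds of answering "no" versus "yes" on Q22a compared with the reference group. Because the outcome is coded 1 = no, an OR below 1 means the Sex=1 group is less likely to answer "no", that is, more likely to have looked online for disease information. The 95% CI excludes 1, so the association is statistically significant at the 0.05 level.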
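The arithmetic behind this interpretation can be checked with a short sketch. It assumes the reported interval is a Wald interval, exp(B ± 1.96·SE), and the helper name `wald_summary` is hypothetical. The code recovers B and its standard error from the reported OR and CI, then inverts the OR so that it describes the odds of answering "yes".

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_summary(odds_ratio, ci_lower, ci_upper):
    """Back out B, SE, and the Wald z from an OR and its 95% CI.

    Assumes the interval was built as exp(B +/- Z_95 * SE).
    Also returns the inverted OR and CI (reference outcome swapped).
    """
    b = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z_95)
    return {
        "B": b,
        "SE": se,
        "z": b / se,
        "inverse_or": 1 / odds_ratio,
        # Inverting swaps the bounds: 1/upper becomes the new lower limit.
        "inverse_ci": (1 / ci_upper, 1 / ci_lower),
    }


# Values reported for Sex(1) in the sex-only unweighted model.
sex_only = wald_summary(0.535, 0.444, 0.645)
for name, value in sex_only.items():
    print(name, value)
```

The inverted OR is about 1.87, so the Sex=1 group has roughly 1.9 times the odds of having looked online, which is the same finding stated from the "yes" side.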

Adding Receduc as a covariate (unweighted)

Next, include Receduc (education categories) as a categorical covariate using the Enter method, so that it stays in the model regardless of its significance. In the unweighted multivariable model the sex coefficient changes slightly: Sex(1) Exp(B) ≈ 0.531 (95% CI: 0.435–0.648), and the Receduc dummy variables show strong associations with Q22a (very small Exp(B) values for certain education levels in the output). The R-squared measures rise (Cox & Snell ≈ 0.122, Nagelkerke ≈ 0.169), indicating better fit. The -2 Log Likelihood is quoted as ≈ 0.692, up from ≈ 0.536; adding a covariate to a model fitted on the same cases cannot raise -2 Log Likelihood, so these two figures should be checked against the output. Interpretation: adding education produces minimal change in the Sex OR (0.535 to 0.531), indicating little confounding by education for the Sex–Q22a association in this sample. However, the education dummies themselves are strongly associated with the outcome, suggesting independent effects of education on online information-seeking (Hosmer & Lemeshow, 2000).
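The change-in-estimate check can be made explicit with a small sketch. The 10% cutoff used here is a commonly cited convention rather than something taken from the output, and the function name `percent_change` is hypothetical. The change is computed both on the OR scale and on the log-odds (B) scale.

```python
import math


def percent_change(crude_or, adjusted_or, log_scale=False):
    """Absolute percent change from the crude to the adjusted estimate.

    With log_scale=True the comparison is made on B = ln(OR).
    """
    if log_scale:
        crude, adjusted = math.log(crude_or), math.log(adjusted_or)
    else:
        crude, adjusted = crude_or, adjusted_or
    return abs(100.0 * (adjusted - crude) / crude)


THRESHOLD = 10.0  # assumed conventional cutoff, in percent

change_or = percent_change(0.535, 0.531)
change_log = percent_change(0.535, 0.531, log_scale=True)

print(f"Change on OR scale:  {change_or:.2f}%")
print(f"Change on log scale: {change_log:.2f}%")
print("Exceeds threshold:", max(change_or, change_log) > THRESHOLD)
```

Both changes are well under the cutoff, which matches the conclusion that education does little to confound the Sex–Q22a association in this sample.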

Weighted logistic regression

Apply the standardized weight variable (standwt) and rerun the logistic regression with Sex and Receduc included. The weighted output shows Sex(1) Exp(B) = 0.464 (95% CI: 0.383–0.563), with the recoded education dummies showing even smaller Exp(B) estimates for higher levels. Compared with the unweighted adjusted results (0.531; 95% CI: 0.435–0.648), weighting lowered the point estimate for sex to 0.464, shifted the confidence interval downward, and narrowed it slightly. Sample weighting therefore made the estimated association between Sex and the outcome stronger (further from the null) and changed its precision, which affects substantive inference about the magnitude of the association.
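The comparison between the unweighted adjusted and weighted estimates can be quantified as follows. This sketch uses only the ORs and CIs reported above; the function name `compare_estimates` is hypothetical, and interval width is measured on the log scale because Wald intervals are symmetric there. Overlap between intervals is reported descriptively and is not a formal test of difference.

```python
import math


def compare_estimates(first, second):
    """Compare two (odds_ratio, ci_lower, ci_upper) tuples.

    Returns the ratio of ORs (second / first), each CI's width on the
    log scale, and whether the two intervals overlap.
    """
    or_1, lo_1, hi_1 = first
    or_2, lo_2, hi_2 = second
    return {
        "or_ratio": or_2 / or_1,
        "log_width_first": math.log(hi_1) - math.log(lo_1),
        "log_width_second": math.log(hi_2) - math.log(lo_2),
        "intervals_overlap": lo_1 <= hi_2 and lo_2 <= hi_1,
    }


unweighted_adjusted = (0.531, 0.435, 0.648)
weighted_adjusted = (0.464, 0.383, 0.563)

comparison = compare_estimates(unweighted_adjusted, weighted_adjusted)
for name, value in comparison.items():
    print(name, value)
```

The weighted OR is about 13% smaller than the unweighted adjusted OR, and its interval is slightly narrower on the log scale.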

How weighting changed the outcome and implications

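Comparing unweighted and weighted models: Sex OR moved from 0.535 (unadjusted) to 0.531 (adjusted for education, unweighted) and to 0.464 (weighted with education). The consistent OR below 1 across all three models indicates that the Sex=1 group had lower odds of answering "no" (higher odds of having looked online) than the reference group. Weighting widened that gap: the weighted OR implies about 54% lower odds of answering "no", compared with about 47% lower in the unweighted adjusted model.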
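A toy example shows how weights alone can move an odds ratio. It is a simplified sketch with Sex as the only predictor, where the logistic regression OR equals the cross-product ratio of the (weighted) 2×2 table. All counts and weights below are invented for illustration and are not taken from the dataset; the function names are hypothetical.

```python
def weighted_odds_ratio(records, use_weights=True):
    """Cross-product OR from (sex, answered_no, weight) records.

    sex and answered_no are 0/1 codes. With use_weights=False every
    record counts once, which gives the unweighted OR.
    """
    cells = {(1, 1): 0.0, (1, 0): 0.0, (0, 1): 0.0, (0, 0): 0.0}
    for sex, answered_no, weight in records:
        cells[(sex, answered_no)] += weight if use_weights else 1.0
    return (cells[(1, 1)] * cells[(0, 0)]) / (cells[(1, 0)] * cells[(0, 1)])


def make_group(weight, n11, n10, n01, n00):
    """Build records for one hypothetical group sharing a single weight."""
    return ([(1, 1, weight)] * n11 + [(1, 0, weight)] * n10
            + [(0, 1, weight)] * n01 + [(0, 0, weight)] * n00)


# Hypothetical sample: an under-represented group (weight 2.0) and an
# over-represented group (weight 0.5) with different Sex-outcome patterns.
sample = make_group(2.0, 5, 20, 10, 10) + make_group(0.5, 15, 20, 20, 20)

unweighted = weighted_odds_ratio(sample, use_weights=False)
weighted = weighted_odds_ratio(sample)
print("Unweighted OR:", unweighted)
print("Weighted OR:  ", weighted)
```

Because the under-represented group has a stronger association, giving it more weight pulls the OR further from 1, which is the same direction of change seen in the reported results.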

Confounding and model retention decisions

Receduc demonstrated strong independent associations with Q22a, and although it produced only a small change in the Sex OR in this sample, education is plausibly a confounder in many contexts (it is linked both to sex in some populations and to internet use). Even when the change in estimate is small, retaining Receduc can reduce residual confounding, improve model fit, and increase precision for estimates of interest (Maldonado & Greenland, 1993). Therefore, retaining education in models predicting online information-seeking is defensible for both substantive and statistical reasons.

Why weighting is important for complex surveys

Survey datasets collected with complex designs (stratification, clustering, unequal probabilities of selection) require weights to produce unbiased population estimates and design-based methods to produce correct variance estimates (Korn & Graubard, 1999; Heeringa et al., 2017). Without weights, estimates may be biased if the sample over- or under-represents groups tied to the outcome. Furthermore, variance estimation must account for design effects; otherwise confidence intervals and p-values may be incorrect (Lumley, 2010). The differences observed between weighted and unweighted models in this analysis illustrate how weighting can alter substantive conclusions, highlighting the importance of using survey weights for population inference.
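One way to see why unequal weights affect precision is Kish's approximation for the effective sample size, n_eff = (Σw)² / Σw². The sketch below uses hypothetical standardized weights (mean 1) and hypothetical function names. It captures only the effect of unequal weighting and does not account for clustering or stratification.

```python
def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def weighting_design_effect(weights):
    """Design effect due to unequal weighting: n / n_eff."""
    return len(weights) / kish_effective_n(weights)


# Hypothetical standardized weights that sum to the sample size (8).
toy_weights = [0.5] * 4 + [1.5] * 4

n_eff = kish_effective_n(toy_weights)
deff = weighting_design_effect(toy_weights)
print("Effective n:", n_eff)
print("Design effect from weighting:", deff)
```

In this toy case eight respondents carry the information of about 6.4 equally weighted ones, so standard errors that ignore the weights' variability would be too small.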

Conclusions and recommendations

Descriptive statistics are the foundation for model interpretation. Unweighted logistic regression showed a statistically significant association between Sex and looking up disease information online; adding Receduc slightly changed the OR but confirmed education as a strong predictor. Weighting by standwt changed the magnitude and precision of the Sex OR, increasing the observed disparity. Analysts should therefore report weighted results (and design-adjusted variances) for survey data, include plausible confounders like education, and discuss how weighting affects point estimates and inference.

References

  • Hosmer, D. W., & Lemeshow, S. (2000). Applied Logistic Regression (2nd ed.). Wiley.
  • Heeringa, S., West, B. T., & Berglund, P. A. (2017). Applied Survey Data Analysis (2nd ed.). Chapman & Hall/CRC.
  • Korn, E. L., & Graubard, B. I. (1999). Analysis of Health Surveys. Wiley.
  • Lumley, T. (2010). Complex Surveys: A Guide to Analysis Using R. Wiley.
  • Maldonado, G., & Greenland, S. (1993). Simulation study of confounder-selection strategies. American Journal of Epidemiology, 138(11), 923–936.
  • Groves, R. M., Fowler, F. J., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2009). Survey Methodology (2nd ed.). Wiley.
  • Kalton, G., & Flores-Cervantes, I. (2003). Weighting methods. Journal of Official Statistics, 19(2), 81–97.
  • Centers for Disease Control and Prevention. (2016). National Health Interview Survey: Survey Design and Estimation Guidelines. CDC.
  • Lohr, S. L. (2019). Sampling: Design and Analysis (3rd ed.). Chapman & Hall/CRC.
  • Little, R. J. A., & Vartivarian, S. (2005). Does weighting for nonresponse help? Statistical Science, 20(2), 219–236.