National Adult Tobacco Survey (NATS): Analysis and Applications
The National Adult Tobacco Survey (NATS) aims to assess the prevalence and determinants of tobacco use among adults in the United States, providing data at both national and state levels. The survey collects comprehensive information on demographic factors, tobacco product usage, attitudes, and behaviors, enabling researchers and policymakers to evaluate tobacco control programs and develop targeted interventions.
This project involves analyzing NATS data to explore various aspects of tobacco use, including demographic disparities, product preferences, economic factors, and behavioral patterns. The analyses combine descriptive statistics, clustering, and regression modeling to generate insights relevant both to tobacco control efforts and to industry marketing strategies.
Sample Data Import and Preparation
The process begins with importing the NATS dataset, which is available as a CSV file from the CDC. Using SAS software, the data are imported, sorted, and cleaned to handle missing or anomalous values. Negative values and nonsensical codes (e.g., 666 cigarettes per day) are recoded as missing so that they do not distort means, proportions, or model estimates.
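The workflow itself uses SAS. The minimal Python sketch below mirrors only the recoding step, under stated assumptions. The column name `cigs_per_day` is a hypothetical placeholder, and so is the set of reserved codes. Only 666 comes from the text above; 777, 888, and 999 are guesses. Check the NATS codebook for the real variable names and codes.

```python
import csv
import io

# Hypothetical reserved codes (only 666 is taken from the text);
# replace with the codes listed in the NATS codebook.
SPECIAL_CODES = {666, 777, 888, 999}


def to_count(raw):
    """Parse a cigarettes-per-day entry, returning None for anything unusable."""
    try:
        value = int(raw)
    except (TypeError, ValueError):  # blank or non-numeric entry
        return None
    if value < 0 or value in SPECIAL_CODES:  # negative or reserved code
        return None
    return value


def load_and_clean(csv_text, column="cigs_per_day"):
    """Read CSV text and recode one column; other columns stay as strings.

    The default column name is a hypothetical placeholder.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row[column] = to_count(row[column])
    return rows


sample = "id,cigs_per_day\n1,20\n2,-9\n3,666\n4,\n"
print([r["cigs_per_day"] for r in load_and_clean(sample)])  # [20, None, None, None]
```

For a file on disk, a handle from `open(path, newline="")` can be passed to `csv.DictReader` directly instead of wrapping text in `io.StringIO`.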
The data are then split into training (80%) and testing (20%) samples, stratified by key variables such as smoking status and cigarette quantity. Models are fitted on the training sample and evaluated on the held-out test sample. This exposes overfitting and gives a more realistic estimate of how well the models generalize.
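The stratified split described above can be sketched as follows. The sketch assumes each record is a dict and that the caller supplies a function returning its stratum; the `smoker` field in the usage example is hypothetical. Shuffling within each stratum keeps each stratum's share roughly the same in both samples.

```python
import random
from collections import defaultdict


def stratified_split(records, strata_key, test_frac=0.2, seed=42):
    """Split records into (train, test), drawing test_frac of each stratum.

    strata_key maps a record to its stratum label.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[strata_key(rec)].append(rec)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for members in groups.values():
        members = members[:]  # shuffle a copy, not the grouped list
        rng.shuffle(members)
        n_test = round(len(members) * test_frac)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Usage: 100 hypothetical respondents, half of them current smokers.
respondents = [{"id": i, "smoker": i % 2} for i in range(100)]
train, test = stratified_split(respondents, lambda r: r["smoker"])
print(len(train), len(test))  # 80 20
```

To stratify on several variables at once, the key function can return a tuple, such as `lambda r: (r["smoker"], r["cig_band"])`. Here `cig_band` would be a binned (also hypothetical) version of cigarette quantity, because a continuous count cannot serve as a stratum directly.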
Descriptive Analytics
Descriptive analysis involves calculating the prevalence of current smoking across demographic groups, such as race, gender, age, education, and geographic region, and visualizing these patterns with bar plots and pie charts. Identifying which groups are most likely to use tobacco products provides the foundation for the more targeted analyses that follow.
Specific analyses include:
- Estimating the proportion of smokers by demographic segments
- Identifying e-cigarette usage among current smokers
- Assessing smokeless tobacco use among smokers
- Calculating mean and median prices per pack by state and identifying the most and least expensive states
- Ranking popular cigarette brands among smokers
- Comparing perceptions of tobacco-related health risks across demographic groups
- Analyzing quit attempts and intentions by race, gender, age, and education
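Two of the calculations listed above, group prevalence and price per pack by state, can be sketched in plain Python. All field names are assumptions for illustration: `current_smoker`, `state`, `price_per_pack`, the grouping column, and an optional weight column. NATS is a weighted survey, so the optional weight argument shows where respondent weights would enter. The variance estimation that a full complex-survey analysis needs (for example, SAS PROC SURVEYFREQ) is not attempted here.

```python
from collections import defaultdict
from statistics import mean, median

# Field names (current_smoker, state, price_per_pack, weight column) are hypothetical.


def prevalence_by_group(rows, group_key, weight_key=None):
    """Share of respondents in each group who are current smokers.

    If weight_key is given, each respondent counts with that weight.
    """
    smokers, totals = defaultdict(float), defaultdict(float)
    for r in rows:
        w = r[weight_key] if weight_key else 1.0
        totals[r[group_key]] += w
        if r["current_smoker"]:
            smokers[r[group_key]] += w
    return {g: smokers[g] / totals[g] for g in totals}


def price_summary_by_state(rows):
    """Mean and median price per pack by state, plus the most and least
    expensive states ranked by median price."""
    prices = defaultdict(list)
    for r in rows:
        if r.get("price_per_pack") is not None:  # skip missing prices
            prices[r["state"]].append(r["price_per_pack"])
    summary = {s: (mean(p), median(p)) for s, p in prices.items()}
    most = max(summary, key=lambda s: summary[s][1])
    least = min(summary, key=lambda s: summary[s][1])
    return summary, most, least
```

Ranking states by the median rather than the mean keeps a few unusually high or low reported prices from deciding which state comes out on top.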
For promotional purposes, such as identifying target groups for Marlboro, descriptive statistics and graphical summaries help pinpoint the demographics most receptive to the brand.
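The brand ranking and the search for Marlboro's most receptive groups can be sketched in two steps. First, rank brands by how many smokers name them. Second, rank demographic groups by the share of their smokers who report the chosen brand. The `brand` and `age_group` fields are hypothetical. Share of current users is also only one possible working definition of "receptive".

```python
from collections import Counter

# Field names (brand, age_group) are hypothetical.


def brand_ranking(smokers):
    """Brands ordered from most to least commonly reported."""
    return Counter(s["brand"] for s in smokers if s.get("brand")).most_common()


def brand_share_by_group(smokers, brand, group_key):
    """Share of each group's smokers who report `brand`, highest first.

    Smokers with no reported brand still count in the denominator.
    """
    totals, hits = Counter(), Counter()
    for s in smokers:
        g = s[group_key]
        totals[g] += 1
        if s.get("brand") == brand:
            hits[g] += 1
    shares = {g: hits[g] / totals[g] for g in totals}
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
```

In practice, the ranking should be read alongside group sizes, because a very small group can show a high share by chance.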
Clustering Analysis
Hierarchical and non-hierarchical clustering are applied to segment the population into meaningful groups based on demographics, smoking behaviors, and socio-economic factors. Key steps include:
- Forming clusters based on age, income, and education level
- Testing whether the clusters differ significantly in cigarettes smoked per day and current smoking status
- Identifying the most influential clusters for tobacco marketing based on behavioral and demographic attributes
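The first two steps can be sketched under several assumptions. The clustering method is k-means, one common non-hierarchical method, chosen here for illustration; hierarchical clustering is not shown. It runs on z-scored age, income, and education, whose numeric encoding is hypothetical. The clusters are then compared on cigarettes per day with a one-way ANOVA F statistic. Turning F into a p-value requires the F distribution (for example, `scipy.stats.f.sf`), which is left out to keep the sketch dependency-free. Current smoking status is categorical, so it would call for a chi-square test instead.

```python
import math
import random
from statistics import mean, pstdev

# Rows are hypothetical (age, income, education) triples.


def standardize(rows):
    """Z-score each column so age, income, and education get equal weight.
    Also returns column means and SDs, needed later to scale test data."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by a zero SD
    scaled = [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]
    return scaled, mus, sds


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm from k randomly chosen points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # keep the old centroid if its cluster became empty
            new.append([mean(c) for c in zip(*members)] if members else centroids[j])
        if new == centroids:  # centroids stopped moving
            break
        centroids = new
    return centroids, [nearest(p, centroids) for p in points]


def anova_f(values, labels):
    """One-way ANOVA F statistic of a numeric outcome across cluster labels.
    Requires at least two clusters and some within-cluster variation."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ssw = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return (ssb / (k - 1)) / (ssw / (n - k))
```

K-means results depend on the starting centroids, so in practice it is worth rerunning with several seeds and keeping the solution with the smallest within-cluster spread.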
Cluster validation combines statistical tests with visualization to confirm that the segments are meaningful. Assigning test-sample respondents to the clusters derived from the training sample then checks whether the segmentation holds up out of sample and remains useful for targeting interventions or marketing campaigns.
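The out-of-sample check can be sketched in three steps. First, scale test respondents with the training means and SDs, never their own. Second, assign each respondent to the nearest training centroid. Third, compare each cluster's profile across the two samples. The means, SDs, and centroids in the usage example are made-up placeholders, not NATS estimates.

```python
import math
from statistics import mean


def assign_to_centroids(rows, mus, sds, centroids):
    """Scale rows with the TRAINING means/SDs, then label each with its nearest centroid."""
    labels = []
    for r in rows:
        z = [(x - m) / s for x, m, s in zip(r, mus, sds)]
        labels.append(min(range(len(centroids)), key=lambda j: math.dist(z, centroids[j])))
    return labels


def cluster_profile(values, labels):
    """Mean outcome (e.g., cigarettes per day) in each cluster."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    return {lab: mean(vs) for lab, vs in sorted(groups.items())}


# Placeholder training results for (age, income, education):
# column means, column SDs, and two centroids in z-score units.
mus, sds = [44.5, 57000, 14], [17.6, 35000, 2]
centroids = [[-1, -1, -1], [1, 1, 1]]
test_rows = [[30, 25000, 12], [58, 88000, 16]]
test_labels = assign_to_centroids(test_rows, mus, sds, centroids)
print(test_labels)  # [0, 1]
print(cluster_profile([6, 21], test_labels))  # {0: 6, 1: 21}
```

If a cluster's test-sample profile differs sharply from its training-sample profile, the segment is unlikely to be reliable for targeting.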
Regression Analysis
The core of predictive modeling involves developing regression models to estimate the number of cigarettes smoked per day (SMOKSOMDAY). Multiple models are constructed:
- Model 1: Incorporates selected demographic and behavioral variables
- Model 2: Adds nonlinear and interaction terms, such as age squared and interactions between predictors
- Model 3: Uses stepwise regression to select variables automatically (a heuristic search, not a guaranteed optimum)
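The difference between Models 1 and 2 can be sketched without external dependencies. The code builds both design matrices from two hypothetical predictors, `age` and a `female` indicator, and fits ordinary least squares by solving the normal equations. In practice, a SAS procedure such as PROC REG or PROC GLMSELECT would do the fitting, and PROC GLMSELECT also offers stepwise selection for Model 3, which is not reproduced here. With real ages, centering age before squaring it keeps the normal equations better conditioned.

```python
# Predictors age and female are hypothetical stand-ins for the real model inputs.


def design_row(rec, model):
    """Model 1: intercept, age, female. Model 2 adds age^2 and age x female."""
    age, female = rec["age"], rec["female"]
    row = [1.0, age, female]
    if model == 2:
        row += [age ** 2, age * female]
    return row


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(records, y, model):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    X = [design_row(r, model) for r in records]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(records, coefs, model):
    """Fitted values for records under the given model's design."""
    return [sum(c * x for c, x in zip(coefs, design_row(r, model))) for r in records]
```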
Model performance is assessed both in-sample and out-of-sample using Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Percentage Error (MPE), and Mean Absolute Error (MAE). The preferred model is the one with the lowest out-of-sample error, with the simpler model favored when the difference in error is small.
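The four metrics can be sketched as a single Python function. Sign conventions for MPE vary between texts. This one uses (actual minus predicted) / actual, so a negative MPE means the model over-predicts on average. Calling the function on the training and test samples gives the in-sample versus out-of-sample comparison described above.

```python
import math


def error_metrics(actual, predicted):
    """MSE, RMSE, MAE, and MPE for paired actual/predicted values.

    MPE is mean((actual - predicted) / actual) * 100. It is undefined
    when an actual value is 0, so such pairs are skipped for MPE only.
    """
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    pct = [e / a for e, a in zip(errors, actual) if a != 0]
    mpe = 100 * sum(pct) / len(pct) if pct else float("nan")
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MPE": mpe}
```

Because positive and negative errors cancel in MPE, it measures bias rather than accuracy. RMSE and MAE are the better guides when ranking the three models.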
Conclusion and Recommendations
The comprehensive analysis of NATS data offers valuable insights into tobacco use dynamics. Identifying at-risk demographic groups, behavioral patterns, and economic factors supports targeted intervention and policy development. Regression and clustering models inform both public health strategies and industry marketing approaches, aiding in prevention efforts or product promotion.