Will Give Login Info For Downloads2 The File P02 02.xlsx

The file P02_02.xlsx contains information on over 200 movies that were released in 2006 and 2007. Create two column charts of counts, one of the different genres and one of the different distributors. Recode the Genre column so that all genres with a count of 10 or less are lumped into a category called Other. Then create a column chart of counts for this recoded variable.

Repeat similarly for the Distributor variable.

The file P02_03.xlsx contains data from a survey of 399 people regarding a government environmental policy. Which of the variables in this data set are categorical? Which of these are nominal; which are ordinal?

For each categorical variable, create a column chart of counts. Recode the data into a new data set, making four transformations: (1) change Gender to list “Male” or “Female”; (2) change Children to list “No children” or “At least one child”; (3) change Salary to be categorical with categories “Less than $40K,” “Between $40K and $70K,” “Between $70K and $100K,” and “Greater than $100K” (where you can treat the breakpoints however you like); and (4) change Opinion to be a numerical code from 1 to 5 for Strongly Disagree to Strongly Agree. Then create a column chart of counts for the new Salary variable.

The file P02_04.xlsx contains salary data on all Major League Baseball players for each year from 2002 to 2011. (It is an older version of the data used for examples later in this chapter.) For any three selected years, create a table of counts of the various positions, expressed as percentages of all players for the year. Then create a column chart of these percentages for these years. Do they remain fairly constant from year to year?

Paper for the Above Instructions

The assignment covers three datasets and asks for column charts of counts and for recoded variables that are easier to interpret. The first part uses a dataset of over 200 movies released in 2006 and 2007. It calls for two column charts: one of counts by genre and one of counts by distributor. Genres with a count of 10 or less are then consolidated into an "Other" category, and a column chart is created for the recoded variable. The Distributor variable is recoded and charted in the same way.

The second dataset comes from a survey of 399 individuals about a government environmental policy. The first step is to identify which of its variables are categorical and to classify each of those as nominal or ordinal. For each categorical variable, a column chart of counts should be created. The data are then recoded into a new dataset with four transformations: Gender becomes "Male" or "Female"; Children becomes "No children" or "At least one child"; Salary becomes a categorical variable with four specified income ranges; and Opinion becomes a numerical code from 1 to 5. After these transformations, a column chart of counts should be created for the new Salary variable.

The third dataset contains salary information for Major League Baseball players for each year from 2002 to 2011. The task is to select any three years and create a table of counts of the players' positions, expressed as percentages of all players in each year. A column chart of these percentages then shows whether the distribution remains fairly constant from year to year.

Analysis and Discussion

In the first task, the column charts show how the movies are distributed across genres and distributors. They make the most common categories easy to identify and expose the small categories that are candidates for grouping into "Other." Recoding the low-count genres into a single category reduces the number of bars on the chart and makes it easier to read.
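The recoding rule for the first task can be sketched outside Excel as follows. This is a minimal illustration in plain Python, not the workbook procedure the assignment expects; the function name `lump_rare` and the sample genre values are hypothetical, and the real values would come from P02_02.xlsx.

```python
from collections import Counter


def lump_rare(values, threshold=10, other_label="Other"):
    """Replace every category whose count is <= threshold with other_label."""
    counts = Counter(values)
    return [v if counts[v] > threshold else other_label for v in values]


# Made-up genre column for illustration only (not data from P02_02.xlsx).
genres = ["Drama"] * 12 + ["Comedy"] * 11 + ["Horror"] * 10 + ["Western"] * 2

recoded = lump_rare(genres)
recoded_counts = Counter(recoded)

# The counts of the recoded variable are what the column chart would display.
for category, count in recoded_counts.most_common():
    print(f"{category}: {count}")
```

Note that the threshold is inclusive: a genre with exactly 10 movies goes into "Other," matching the "10 or less" wording of the assignment. The same function would apply unchanged to the Distributor column.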

In the second task, the key step is classifying the categorical variables. Nominal variables, such as gender, have categories with no inherent order, whereas ordinal variables, such as an opinion scale running from Strongly Disagree to Strongly Agree, have a natural ranking. Column charts of counts show how respondents are distributed across the categories. Recoding improves readability and comparability: salaries become four labeled ranges and opinions become numerical codes from 1 to 5. A column chart of the recoded Salary variable then shows how many respondents fall in each income range.
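Three of the four recodes can be sketched in plain Python as shown here. Several details are assumptions rather than facts from the source: that Salary is stored in dollars, that Children is stored as a count, that a salary exactly on a breakpoint goes into the higher range (the assignment leaves this choice open), and that the three middle opinion labels are "Disagree," "Neutral," and "Agree" (the assignment names only the two endpoints). The Gender recode is omitted because the original coding in P02_03.xlsx is not stated.

```python
def recode_salary(salary):
    """Map a salary in dollars to one of the four ranges.

    Assumption: a value exactly on a breakpoint falls in the higher range.
    """
    if salary < 40_000:
        return "Less than $40K"
    if salary < 70_000:
        return "Between $40K and $70K"
    if salary < 100_000:
        return "Between $70K and $100K"
    return "Greater than $100K"


def recode_children(n_children):
    """Assumption: the original column holds a number of children."""
    return "No children" if n_children == 0 else "At least one child"


# Assumption: the middle three labels are hypothetical; only the endpoints
# (Strongly Disagree = 1, Strongly Agree = 5) come from the assignment.
OPINION_CODES = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

# Made-up respondents for illustration only (not data from P02_03.xlsx).
respondents = [
    {"Children": 0, "Salary": 35_000, "Opinion": "Agree"},
    {"Children": 2, "Salary": 70_000, "Opinion": "Strongly Disagree"},
]

recoded_rows = [
    {
        "Children": recode_children(r["Children"]),
        "Salary": recode_salary(r["Salary"]),
        "Opinion": OPINION_CODES[r["Opinion"]],
    }
    for r in respondents
]
print(recoded_rows)
```

Coding Opinion as 1 to 5 preserves its ordering, which is why a numerical code is reasonable for an ordinal variable but would be misleading for a nominal one such as Gender.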

The third dataset supports a comparison across years of how players are distributed among positions. Selecting three years, calculating the percentage of players at each position in each year, and charting those percentages shows whether any position has become more or less common or whether the distribution is stable. Any change could reflect shifts in roster construction, player specialization, or the way positions were recorded in the data.
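The percentage table for the third task can be sketched as follows. This is an illustrative plain-Python version under stated assumptions: the function name `position_percentages`, the (year, position) record layout, and the sample rows are all hypothetical, and the real records would come from P02_04.xlsx.

```python
from collections import Counter, defaultdict


def position_percentages(records):
    """Return {year: {position: percent of that year's players}}.

    `records` is an iterable of (year, position) pairs, one per player-year.
    """
    counts = defaultdict(Counter)
    for year, position in records:
        counts[year][position] += 1

    table = {}
    for year, year_counts in counts.items():
        total = sum(year_counts.values())
        table[year] = {pos: 100 * n / total for pos, n in year_counts.items()}
    return table


# Made-up rows for illustration only (not data from P02_04.xlsx).
records = (
    [(2002, "Pitcher")] * 5
    + [(2002, "Catcher")] * 1
    + [(2002, "Outfielder")] * 4
    + [(2011, "Pitcher")] * 6
    + [(2011, "Catcher")] * 2
    + [(2011, "Outfielder")] * 2
)

table = position_percentages(records)
for year in sorted(table):
    print(year, table[year])
```

Dividing by each year's own total, rather than by the total across all years, is what makes the years comparable even when the number of players differs from year to year.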

Conclusion

In summary, the exercises involve data manipulation, recoding, and visualization to explore categorical data distributions across different contexts. These tasks highlight the importance of properly recoding variables for clearer presentation and the value of visual tools in identifying trends and patterns in data.
