Exploring COVID-19 data for Toronto, Canada STA303/1002
For this assessment you will be working with the most up-to-date COVID data for the City of Toronto. In tasks 1 and 2, create versions of the ‘Cases by Day’ and ‘Cases by Outbreak Type and Week’ graphs available on the Toronto COVID portal under ‘Daily Status of Cases’. In task 3, you will use data about Toronto’s neighbourhoods, from the Toronto COVID portal under ‘Neighbourhood Maps’, alongside neighbourhood profile data from the 2016 census. The graphs in your final submission will utilize the most current version of the data.
Task 1: Daily Cases
Data wrangling: prepare the raw daily case data for visualization according to the provided guidelines. Save the wrangled dataset as an object called 'reported', with all NA values in the recovered, active, and deceased columns replaced with 0. Check that the reported_date column is in date format and convert it if it is not. Data visualization: create a bar chart of active, recovered, and deceased cases by date, following the specified guidelines for the title, axis labels, and source information.
Task 2: Outbreak Type
For this task, begin with the raw outbreak data and process it into a dataset named 'outbreak'. As in Task 1, ensure that the episode_week column is correctly formatted and that the dataset is tidy. Then create a stacked bar chart showing cases by outbreak type and week, again following the specified title and labelling conventions.
Task 3: Neighbourhoods
This task focuses on preparing a dataset that gives the percentage of 18 to 64-year-olds classified as low income in each neighbourhood. Data wrangling involves filtering the census profile data, merging it with the neighbourhood COVID-19 data, and creating new variables that capture the socio-economic context of COVID-19 in Toronto.
Paper For Above Instructions
COVID-19 has had a profound impact on cities around the world, with infection rates and outcomes that vary with socio-economic factors. Using the COVID-19 data for Toronto, we can examine trends in cases over time and their relationship to demographic factors. This paper addresses the tasks outlined in the data exploration assessment for STA303/1002, focusing on visualizing COVID-19 cases, outbreak types, and socio-economic conditions within Toronto’s neighbourhoods.
Task 1: Daily Cases
In the first task, we wrangle the reported COVID-19 case data for Toronto in R, using the dplyr and tidyr packages. After loading the data, we use the mutate and replace functions to set NA values in the 'recovered', 'active', and 'deceased' columns to 0, and we confirm that the reported_date column is stored as a date. The wrangled dataset is saved as 'reported', so that every date has a complete set of counts to plot.
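The wrangling step described above can be sketched as follows. This is a minimal illustration, not the assignment's actual code: the toy values in reported_raw are invented, the dates are assumed to arrive as ISO-formatted text, and replace_na from tidyr is used as one reasonable choice of replacement function.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the raw daily-cases data (values invented for illustration)
reported_raw <- tibble(
  reported_date = c("2021-01-01", "2021-01-02", "2021-01-03"),
  recovered     = c(10, NA, 12),
  active        = c(5, 7, NA),
  deceased      = c(NA, 1, 0)
)

reported <- reported_raw %>%
  mutate(
    # Replace missing counts with 0 in each of the three status columns
    recovered = replace_na(recovered, 0),
    active    = replace_na(active, 0),
    deceased  = replace_na(deceased, 0),
    # Assumption: dates are stored as "YYYY-MM-DD" text and need converting
    reported_date = as.Date(reported_date)
  )

print(reported)
```

If the real data already stores reported_date as a date, the as.Date call leaves it unchanged, so the same pipeline is safe to run either way.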
To visualize the active, recovered, and deceased cases, a stacked bar chart is created using the ggplot2 package. The chart includes essential details such as title, subtitle, and the source of the data while ensuring that the visual aesthetics align with the specified guidelines.
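A hedged sketch of such a chart is shown below. It assumes a small, already-cleaned 'reported' table with invented values, and the title, subtitle, axis labels, and caption are placeholders, since the exact wording required by the guidelines is not reproduced here. The data is first reshaped to long format so that ggplot2 can stack the three statuses.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy cleaned data (values invented for illustration)
reported <- tibble(
  reported_date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-03")),
  recovered     = c(10, 0, 12),
  active        = c(5, 7, 0),
  deceased      = c(0, 1, 0)
)

# One row per date and status, which is the shape a stacked bar chart needs
reported_long <- reported %>%
  pivot_longer(
    cols      = c(active, recovered, deceased),
    names_to  = "status",
    values_to = "cases"
  ) %>%
  # Fix the stacking and legend order
  mutate(status = factor(status, levels = c("active", "recovered", "deceased")))

daily_plot <- ggplot(reported_long,
                     aes(x = reported_date, y = cases, fill = status)) +
  geom_col() +  # bars stack by default when a fill aesthetic is mapped
  labs(
    title    = "Cases reported by day",          # placeholder wording
    subtitle = "Active, recovered and deceased",  # placeholder wording
    x        = "Date",
    y        = "Case count",
    caption  = "Source: placeholder for the required source note"
  ) +
  theme_minimal()

print(daily_plot)
```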
Task 2: Outbreak Type
The second task applies the same approach to the outbreak type data. We clean the raw dataset, make sure the episode_week column is properly formatted, and save the tidy result as 'outbreak'. Following the visualization guidelines, we then create a stacked bar chart showing cases by outbreak type and week, using the specified titles, labels, and colour coding.
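The outbreak workflow might look like the sketch below. The column names outbreak_type and cases, the two category labels, and all values are hypothetical, and treating episode_week as a date is an assumption carried over from Task 1. The weekly total is added only to show how a helper variable can be derived in a grouped step.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the raw outbreak data (names and values are hypothetical)
outbreak_raw <- tibble(
  episode_week  = c("2021-01-03", "2021-01-03", "2021-01-10", "2021-01-10"),
  outbreak_type = c("Outbreak associated", "Sporadic",
                    "Outbreak associated", "Sporadic"),
  cases         = c(20, 80, 15, 60)
)

outbreak <- outbreak_raw %>%
  mutate(episode_week = as.Date(episode_week)) %>%  # assumed ISO text dates
  group_by(episode_week) %>%
  mutate(total_cases = sum(cases)) %>%              # total per week
  ungroup()

outbreak_plot <- ggplot(outbreak,
                        aes(x = episode_week, y = cases, fill = outbreak_type)) +
  geom_col() +  # one stacked bar per week
  labs(
    title   = "Cases by outbreak type and week",  # placeholder wording
    x       = "Date",
    y       = "Case count",
    fill    = "Outbreak type",
    caption = "Source: placeholder for the required source note"
  ) +
  theme_minimal()

print(outbreak_plot)
```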
Task 3: Neighbourhoods
The third task examines the socio-economic composition of Toronto's neighbourhoods in relation to COVID-19. We calculate the percentage of 18 to 64-year-olds classified as low income in each neighbourhood and combine it with the neighbourhood case data, which lets us explore how socio-economic factors might interact with infection rates.
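As a rough sketch of the merge step, the example below derives a low-income percentage and joins it to case rates by neighbourhood name. Everything here is illustrative: the table and column names are hypothetical, the values are invented, and the real census profile may report the percentage directly rather than as counts, in which case the division is unnecessary.

```r
library(dplyr)

# Hypothetical census extract: one row per neighbourhood (invented values)
income <- tibble(
  neighbourhood_name = c("A", "B", "C"),
  pop_18_64          = c(10000, 8000, 12000),
  low_income_18_64   = c(1500, 2400, 600)
)

# Hypothetical neighbourhood COVID-19 data (invented values)
nbhood_cases <- tibble(
  neighbourhood_name = c("A", "B", "C"),
  rate_per_100000    = c(2500, 4100, 1200)
)

nbhood <- income %>%
  # Percentage of 18 to 64-year-olds classified as low income
  mutate(pct_low_income = 100 * low_income_18_64 / pop_18_64) %>%
  select(neighbourhood_name, pct_low_income) %>%
  # Keep every census neighbourhood, attaching case rates where names match
  left_join(nbhood_cases, by = "neighbourhood_name")

print(nbhood)
```

In practice neighbourhood names often differ slightly between sources, so it is worth checking for rows with a missing rate after the join before going further.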
In our visualizations for this task, we generate three maps that illustrate the percentage of low-income individuals, the number of COVID-19 cases per 100,000 residents, and a combined visualization that classifies neighbourhoods based on their socio-economic and health metrics. Each map is crafted to provide clear insights into how different Toronto neighbourhoods are affected by COVID-19 and how low-income classifications correlate with case rates.
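One way to build the classification behind the combined map is sketched below. The paper does not state how neighbourhoods are grouped, so the median split used here is purely an assumption, and the names and values are invented. Drawing the maps themselves would also need neighbourhood boundary geometry, which is not shown.

```r
library(dplyr)

# Toy neighbourhood table (names and values invented for illustration)
nbhood <- tibble(
  neighbourhood_name = c("A", "B", "C", "D"),
  pct_low_income     = c(15, 30, 5, 22),
  rate_per_100000    = c(2500, 4100, 1200, 1800)
)

# Assumed cut-offs: the median of each measure across neighbourhoods
med_inc  <- median(nbhood$pct_low_income)
med_rate <- median(nbhood$rate_per_100000)

nbhood_class <- nbhood %>%
  mutate(
    nbhood_type = case_when(
      pct_low_income >= med_inc & rate_per_100000 >= med_rate ~
        "Higher low income rate, higher case rate",
      pct_low_income >= med_inc & rate_per_100000 <  med_rate ~
        "Higher low income rate, lower case rate",
      pct_low_income <  med_inc & rate_per_100000 >= med_rate ~
        "Lower low income rate, higher case rate",
      TRUE ~
        "Lower low income rate, lower case rate"
    )
  )

print(nbhood_class)
```

The resulting nbhood_type column can then be mapped to a fill colour, so that each of the four groups appears as a distinct shade on the combined map.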
Conclusion
The analysis of COVID-19 data for Toronto demonstrates the importance of rigorous data wrangling and insightful visualizations. The clear representation of trends and correlations in the data can serve as a valuable tool for both public health officials and scholars striving to understand the diverse impacts of the pandemic. By employing tools such as R and ggplot2, we can create meaningful interpretations of complex data patterns, ultimately contributing to the understanding of how socio-economic factors influence health outcomes in urban settings.