Exploring XML Data
In this problem we will read the XML data.
The task is to read, analyze, and manipulate XML data on olive oils: parse the XML, extract the variable names, and create data visualizations. The assignment also covers handling date-time data, creating an HTML page, working with Boston Hubway data, and bonus activities involving literary texts and expenditure surveys.
The assignment has several components, beginning with parsing and analyzing an XML dataset on olive oils. Using R, we load the XML data, identify the root element, and extract the paths to the categorical and real-valued variables, keeping the original variable names. These names are stored for the subsequent data extraction. The data itself is collected into a data frame named oliveDat, which should be displayed along with some sample records. Finally, a plot should visualize one feature of the dataset, specifically the fatty acid percentages of Italian olive oils across regions.
The key code snippets inspect the structure of the XML through its root node and attributes, using functions such as xmlRoot and xmlSApply. In the first line, r refers to the root element (as returned by xmlRoot). The second line, xmlSApply(r[[1]][[2]], xmlGetAttr, "name"), applies xmlGetAttr to every child of the node r[[1]][[2]] and returns the value of each child's name attribute.
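The steps above can be sketched as follows. This is a minimal illustration, not the assignment's solution: it assumes the XML package is installed, and the miniature document, its element names, and its values are hypothetical stand-ins for the olive oil file, whose actual layout may differ.

```r
library(XML)  # assumption: the XML package is installed

# Hypothetical miniature file; the real olive oil XML may be laid out differently.
xml_text <- paste0(
  '<ggobidata><data name="toy">',
  '<description>toy stand-in</description>',
  '<variables count="3">',
  '<categoricalvariable name="region"/>',
  '<realvariable name="palmitic"/>',
  '<realvariable name="oleic"/>',
  '</variables>',
  '<records count="2">',
  '<record>1 1075 7823</record>',
  '<record>2 1088 7709</record>',
  '</records>',
  '</data></ggobidata>'
)

doc <- xmlParse(xml_text, asText = TRUE)
r <- xmlRoot(doc)          # root element of the document
root_name <- xmlName(r)    # name of the root element

# In this toy layout r[[1]][[2]] is the <variables> node; collect each
# child's "name" attribute, keeping the original variable names.
var_names <- unname(xmlSApply(r[[1]][[2]], xmlGetAttr, "name"))

# The same names split by variable type, addressed by XPath.
cat_names  <- xpathSApply(doc, "//categoricalvariable", xmlGetAttr, "name")
real_names <- xpathSApply(doc, "//realvariable", xmlGetAttr, "name")

# Each <record> holds whitespace-separated values; split them into columns.
rec <- unname(xmlSApply(r[[1]][[3]], xmlValue))
vals <- do.call(rbind, strsplit(trimws(rec), "\\s+"))
oliveDat <- as.data.frame(vals, stringsAsFactors = FALSE)
names(oliveDat) <- var_names
oliveDat[] <- lapply(oliveDat, as.numeric)

head(oliveDat)
```

Indexing by position, as in r[[1]][[2]], depends on the exact order of child nodes, so the XPath form may be the more robust choice if the file layout is uncertain.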
Next, the assignment covers date-time handling with R's POSIXct class. Given a specific time string, the tasks are to convert it to a date-time object in the Chicago time zone, identify the weekday, compute the date exactly 100 years later, add one month to the date, and measure time gaps such as the minutes remaining before a homework deadline. It also involves generating a sequence of years, determining which of them are leap years, and examining the gaps between consecutive leap years, noting in particular the 4-year and 8-year intervals and the number of days between specific leap years.
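A short base R sketch of these date-time steps is given below. The timestamp, the deadline, and the range of years are hypothetical, chosen only for illustration, since the assignment's actual values are not reproduced here.

```r
# Hypothetical timestamp, interpreted in the Chicago time zone.
x <- as.POSIXct("2017-02-28 23:30:00", tz = "America/Chicago")

wday_name <- weekdays(x)   # weekday name; the spelling depends on the locale

# Exactly 100 years later and one month later, using calendar-aware steps.
x_100y <- seq(x, by = "100 years", length.out = 2)[2]
x_1m   <- seq(x, by = "month", length.out = 2)[2]

# Minutes remaining before a hypothetical deadline.
deadline  <- as.POSIXct("2017-03-01 09:00:00", tz = "America/Chicago")
mins_left <- as.numeric(difftime(deadline, x, units = "mins"))  # 570

# Gregorian leap-year rule: divisible by 4, except century years
# that are not divisible by 400.
is_leap <- function(y) (y %% 4 == 0 & y %% 100 != 0) | (y %% 400 == 0)

years      <- 1880:1920            # hypothetical span containing 1900
leap_years <- years[is_leap(years)]
gaps       <- diff(leap_years)     # 4 4 4 4 8 4 4 4 4

# Days between the leap days on either side of the 8-year gap.
days_between <- as.numeric(as.Date("1904-02-29") - as.Date("1896-02-29"))  # 2921
```

The 8-year gap appears because 1900 is divisible by 100 but not by 400, so it is not a leap year. Adding a month with seq can behave unexpectedly for dates near the end of a month (for example the 31st), which is worth checking against the actual time string.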
Further, students must craft a basic HTML page with headers, paragraphs, ordered lists, and CSS styling, including font changes, color adjustments, and interactive JavaScript actions responding to user clicks. This HTML page should be saved as hw4.html and uploaded.
The assignment also entails working with Boston Hubway bike rental data. Students will download a CSV dataset, display initial rows, aggregate data by date to analyze daily rental counts, and extract temporal features like weekday names and hour of rental. Summaries of rental counts per weekday and per hour, as well as comparative line plots illustrating rentals by hour across weekdays, are required for data exploration and visualization.
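The Hubway steps might look roughly like the base R sketch below. The four trips and the column name start_date are hypothetical; the downloaded CSV would be loaded with read.csv and may use different column names and time formats.

```r
# Hypothetical stand-in for the downloaded CSV.
trips <- data.frame(
  start_date = c("2012-07-02 08:15:00", "2012-07-02 17:40:00",
                 "2012-07-03 08:05:00", "2012-07-07 14:20:00"),
  stringsAsFactors = FALSE
)
head(trips)   # display the initial rows

# Parse the timestamps and derive the temporal features.
start <- as.POSIXct(trips$start_date, tz = "America/New_York")
trips$day     <- format(start, "%Y-%m-%d")
trips$weekday <- weekdays(start)                  # locale-dependent names
trips$hour    <- as.integer(format(start, "%H"))

# Daily rental counts, then summaries per weekday and per hour.
daily <- as.data.frame(table(day = trips$day), responseName = "rentals")
per_weekday <- table(trips$weekday)
per_hour    <- table(trips$hour)

# Rentals by hour, one line per weekday.
m <- unclass(table(trips$hour, trips$weekday))
matplot(as.integer(rownames(m)), m, type = "l", lty = 1,
        col = seq_len(ncol(m)), xlab = "Hour of day", ylab = "Rentals")
legend("topright", legend = colnames(m), lty = 1, col = seq_len(ncol(m)))
```

With real data the hour-by-weekday table would have up to 24 rows and 7 columns, and the weekday columns would likely need reordering from alphabetical to calendar order before plotting.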
Optional bonus tasks include analyzing the text of Shakespeare's Romeo and Juliet to produce a plot similar to the one demonstrated in the online lectures, and exploring U.S. Consumer Expenditure Survey microdata from 2016 to produce insightful summaries and visualizations. These activities build skills in advanced data handling, text analysis, and economic data exploration.
End of assignment