AY 2019-20 CIS 7031 Programming For Data Analysis, 20 Credit Hours

This assignment involves analyzing Welsh employment data from StatsWales, focusing on processing, analyzing, visualizing, and interpreting employment trends over 2009–2018 across various industries. It includes data cleaning, descriptive statistics, visualization, principal component analysis, correlation analysis, clustering, and a written discussion of findings.

Paper For Above Instruction

The Welsh employment landscape between 2009 and 2018 offers a snapshot of economic shifts across industries. This study analyzes and interprets these trends through data processing, statistical analysis, visualization, and clustering, in order to describe how employment within Wales changed over the period.

Data collection and preparation form the foundation of this analysis. The StatsWales dataset, which covers employment estimates across Wales from 2009 to 2018, was downloaded and filtered to keep only the all-Wales totals, excluding the regional breakdown. Data cleaning checked for missing values and outliers; where any were found, they were replaced with mean values to keep the data consistent. The industries were renamed for clarity as Agriculture, Production, Construction, Retail, ICT, Finance, Real Estate, Professional Service, Public Administration, and Other Service. The resulting dataframe has one column per industry and one row per year, which suits time-series analysis.
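The preparation steps above can be sketched in pandas as follows. This is a minimal illustration, not the code used in the study: the long source labels in `RENAME`, the helper name `prepare`, and all numbers are assumptions made up for the example, and only the mean-fill of missing values is shown (outlier handling is left out).

```python
import numpy as np
import pandas as pd

# Placeholder values for illustration only; these are NOT StatsWales figures.
raw = pd.DataFrame(
    {
        "Year": [2009, 2010, 2011, 2012],
        "Agriculture, forestry and fishing": [30.0, np.nan, 32.0, 31.0],
        "Information and communication": [20.0, 21.0, 22.0, 23.0],
    }
)

# Hypothetical mapping from long source labels to the short names in the text.
RENAME = {
    "Agriculture, forestry and fishing": "Agriculture",
    "Information and communication": "ICT",
}


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Index rows by year, shorten column names, fill gaps with the column mean."""
    out = df.set_index("Year").rename(columns=RENAME)
    # DataFrame.mean() skips NaN, so each gap gets the mean of the observed years.
    return out.fillna(out.mean())


employment = prepare(raw)
print(employment)
```

Filling with the column mean keeps every year in the table, at the cost of flattening any real movement in the years that were missing.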

Descriptive analysis reveals key employment trends across industries. Bar charts and line graphs were used to identify which industries employed the most and the fewest workers over the period. Notably, the Retail sector consistently employed the largest workforce, reflecting its pivotal role in the Welsh economy, while sectors such as Agriculture and Other Service had comparatively low employment figures. Each graph was accompanied by a short interpretation, emphasizing the prominent role of Retail and the relative decline or stability of the other sectors.
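A ranking of this kind can be computed directly from the year-by-industry table. The sketch below assumes a dataframe laid out as described earlier (years as rows, industries as columns) and uses made-up numbers; it ranks industries by their mean employment over the period, which is one reasonable choice among several.

```python
import pandas as pd

# Placeholder values for illustration only; these are NOT StatsWales figures.
employment = pd.DataFrame(
    {
        "Agriculture": [30.0, 31.0, 32.0],
        "Retail": [300.0, 310.0, 305.0],
        "ICT": [40.0, 44.0, 48.0],
    },
    index=pd.Index([2009, 2010, 2011], name="Year"),
)

# Average employment per industry across all years, largest first.
mean_by_industry = employment.mean().sort_values(ascending=False)

largest = mean_by_industry.idxmax()
smallest = mean_by_industry.idxmin()
print(mean_by_industry)
print("largest:", largest, "| smallest:", smallest)
```

Passing `mean_by_industry` to a bar-chart function gives the kind of figure the paragraph describes.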

The analysis then turned to growth trends. Percentage growth over the decade was calculated for each sector to identify those with the highest (e.g., ICT) and lowest (e.g., Agriculture) overall growth. Line plots highlighted these trajectories: ICT showed a marked upward trend, pointing to expanding opportunities in digital industries, whereas Agriculture remained relatively stagnant or declined. Total employment per year was plotted as bar charts to pinpoint the best and worst performing years. The year 2018 emerged as the peak employment year, whereas 2010 had the lowest employment level, possibly reflecting economic fluctuations following the 2008 financial crisis.
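Both calculations are short in pandas. The sketch below assumes growth is measured from the first to the last year in the table, as (last - first) / first * 100, and that the best and worst years are judged by total employment summed over industries; the numbers and the shortened year range are placeholders.

```python
import pandas as pd

# Placeholder values for illustration only; these are NOT StatsWales figures.
employment = pd.DataFrame(
    {
        "Agriculture": [30.0, 29.0, 30.0, 30.0],
        "Retail": [300.0, 290.0, 305.0, 315.0],
        "ICT": [40.0, 42.0, 46.0, 50.0],
    },
    index=pd.Index([2009, 2010, 2011, 2012], name="Year"),
)

# Percentage growth per industry between the first and last year.
first, last = employment.iloc[0], employment.iloc[-1]
growth_pct = ((last - first) / first * 100).sort_values(ascending=False)

# Total employment per year, used to find the best and worst years.
yearly_total = employment.sum(axis=1)
best_year = yearly_total.idxmax()
worst_year = yearly_total.idxmin()

print(growth_pct)
print("best year:", best_year, "| worst year:", worst_year)
```

Endpoint growth is sensitive to unusual first or last years, so it is worth reading alongside the line plots rather than on its own.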

To visualize the evolution of employment more dynamically, a bubble-style scatter plot was generated with Plotly Express. Each bubble represented one industry's employment in a given year, with its size proportional to the employment level. Across successive years, the plot showed the growth in ICT and the decline in certain traditional sectors, making the shifts in the workforce easy to follow. These visualizations complemented the descriptive results by showing how each industry moved over time.
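A chart like this needs the data in long format, one row per (year, industry) pair. The sketch below shows one way to reshape the table and hand it to `plotly.express.scatter`; the helper names `to_long` and `bubble_figure`, the axis choices, and the use of `animation_frame` to step through the years are assumptions, since the text does not specify how the plot was configured.

```python
import pandas as pd

# Placeholder values for illustration only; these are NOT StatsWales figures.
employment = pd.DataFrame(
    {"Retail": [300.0, 290.0, 305.0], "ICT": [40.0, 42.0, 46.0]},
    index=pd.Index([2009, 2010, 2011], name="Year"),
)


def to_long(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape the year-by-industry table to one row per (Year, Industry)."""
    return df.reset_index().melt(
        id_vars="Year", var_name="Industry", value_name="Employment"
    )


def bubble_figure(long_df: pd.DataFrame):
    """Build a bubble chart with one frame per year (requires plotly)."""
    import plotly.express as px  # imported lazily so the reshaping works without it

    return px.scatter(
        long_df,
        x="Industry",
        y="Employment",
        size="Employment",       # bubble area follows the employment level
        color="Industry",
        animation_frame="Year",  # one frame per year
        size_max=60,
    )


long_df = to_long(employment)
print(long_df)
# To display the chart: bubble_figure(long_df).show()
```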

Principal component analysis (PCA) was performed to explore broader patterns across industries and years. The variables were projected onto two principal components, PC1 and PC2, the two orthogonal directions that capture the most variance in the data. A scatter plot of these components revealed clustering tendencies: industries with similar employment behavior across the decade tended to lie close together. ICT and Professional Service appeared near each other, indicating shared growth patterns, while Agriculture and Construction formed separate groups, reflecting different employment trajectories. By reducing the dimensionality of the data, PCA made these patterns easier to see, and its results were consistent with the findings of the growth and correlation analyses.
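A two-component PCA of this kind can be sketched with scikit-learn. The example assumes that industries are treated as observations and years as features (the text does not state the orientation) and that the columns are standardized first; the data are placeholders.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder values for illustration only; these are NOT StatsWales figures.
employment = pd.DataFrame(
    {
        "Agriculture": [30.0, 29.0, 30.0, 30.0],
        "Retail": [300.0, 290.0, 305.0, 315.0],
        "ICT": [40.0, 42.0, 46.0, 50.0],
        "Construction": [90.0, 85.0, 88.0, 92.0],
    },
    index=pd.Index([2009, 2010, 2011, 2012], name="Year"),
)

# Assumption: each industry is one observation, each year one feature.
by_industry = employment.T
scaled = StandardScaler().fit_transform(by_industry.to_numpy())

pca = PCA(n_components=2)
scores = pd.DataFrame(
    pca.fit_transform(scaled), index=by_industry.index, columns=["PC1", "PC2"]
)
explained = pca.explained_variance_ratio_  # share of variance per component

print(scores)
print("explained variance ratio:", explained)
```

Plotting `scores["PC1"]` against `scores["PC2"]` with the industry names as labels gives the scatter plot described above; the explained variance ratio shows how much of the original variation the two axes retain.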

Correlation analysis was conducted on the yearly series of each pair of industries to examine whether their employment moved together over time. High correlation coefficients between ICT and Professional Service suggested that these sectors grew in tandem, whereas Agriculture showed weak correlations with most industries, pointing to its distinct employment profile. The correlation matrices confirmed that certain sectors shared common temporal trends, in line with the PCA and the visual analysis.
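With years as rows and industries as columns, the pairwise correlations come from a single call. The sketch below assumes Pearson correlation, which is the default of `DataFrame.corr`; the values are placeholders chosen so that two columns move together and one does not.

```python
import pandas as pd

# Placeholder values for illustration only; these are NOT StatsWales figures.
employment = pd.DataFrame(
    {
        "Agriculture": [30.0, 29.0, 30.0, 30.0],
        "ICT": [40.0, 42.0, 46.0, 50.0],
        "Professional Service": [100.0, 105.0, 115.0, 125.0],
    },
    index=pd.Index([2009, 2010, 2011, 2012], name="Year"),
)

# Pearson correlation between every pair of industry columns.
corr = employment.corr()
print(corr.round(2))
```

With only ten yearly observations per industry, coefficients from the real data should be read as indicative rather than conclusive.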

K-means clustering, with K=2 and K=3, was applied to the employment data from the best and worst years (2018 and 2010). The clusters revealed natural groupings of industries based on employment similarity. With K=2, the clusters separated traditional industries with declining or stagnant employment from growth industries like ICT. Increasing K to 3 gave finer distinctions, further isolating sectors with unusual employment behavior. The interpretation highlighted how industries grouped according to their employment trajectories, reflecting underlying economic shifts and sectoral resilience.
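The clustering step can be sketched with scikit-learn's `KMeans`. The example assumes each industry is a point described by two features, its employment in the worst year and in the best year, and that raw (unscaled) values are used; the column names `emp_2010` and `emp_2018`, the helper `cluster_industries`, and all numbers are made up for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Placeholder values for illustration only; these are NOT StatsWales figures.
features = pd.DataFrame(
    {
        "emp_2010": [29.0, 42.0, 85.0, 290.0, 280.0],
        "emp_2018": [30.0, 50.0, 92.0, 315.0, 300.0],
    },
    index=["Agriculture", "ICT", "Construction", "Retail", "Public Administration"],
)


def cluster_industries(data: pd.DataFrame, k: int) -> pd.Series:
    """Assign each industry to one of k clusters with K-means."""
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = model.fit_predict(data.to_numpy())
    return pd.Series(labels, index=data.index, name=f"k={k}")


labels_k2 = cluster_industries(features, 2)
labels_k3 = cluster_industries(features, 3)
print(pd.concat([labels_k2, labels_k3], axis=1))
```

The cluster numbers themselves are arbitrary; what matters is which industries share a label. On raw values the grouping is driven mostly by sector size, so standardizing or clustering on growth rates would be needed to group by trajectory instead.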

Hierarchical clustering was conducted on the same dataset, producing dendrograms that show how industries merge into progressively larger groups. The hierarchical and K-means results were consistent in grouping similar industries, but the dendrograms additionally showed how close industries were to one another and in what order the groups formed. Together, these clustering techniques identified industry groupings based on employment patterns and underlined how differently sectors responded to economic conditions.
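A dendrogram of this kind can be built with SciPy. The sketch below assumes Ward linkage on the same two-feature layout as a K-means run (employment in the worst and best year per industry); the linkage method, column names, and numbers are assumptions, since the text does not specify them.

```python
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Placeholder values for illustration only; these are NOT StatsWales figures.
features = pd.DataFrame(
    {
        "emp_2010": [29.0, 42.0, 85.0, 290.0, 280.0],
        "emp_2018": [30.0, 50.0, 92.0, 315.0, 300.0],
    },
    index=["Agriculture", "ICT", "Construction", "Retail", "Public Administration"],
)

# Agglomerative clustering; each row of Z records one merge.
Z = linkage(features.to_numpy(), method="ward")

# Cut the tree into two flat clusters so it can be compared with K-means (K=2).
flat = pd.Series(fcluster(Z, t=2, criterion="maxclust"), index=features.index)

# no_plot=True returns the leaf order without drawing; drop it to draw the
# dendrogram (drawing requires matplotlib).
leaf_order = dendrogram(Z, labels=list(features.index), no_plot=True)["ivl"]

print(flat)
print("leaf order:", leaf_order)
```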

In conclusion, the employment landscape of Wales from 2009 to 2018 shifted notably: ICT and Professional Service grew, while Agriculture and Construction stagnated or declined. The visual and statistical analyses together indicate that technological advancement and service-sector expansion drove employment trends over the period. The clustering analyses reinforced these findings, grouping industries in line with their growth patterns. These insights can inform policy and strategic decisions by highlighting potentially resilient sectors for sustainable economic development in Wales.
