University of La Sabana Winter School 2021 Business Intelligence Assessment: Individual Home Project

In the hotel industry, customers commonly cancel their bookings before they check in or fail to show up at check-in time. Both cases are usually recorded as cancellations in the hotel’s booking system. Predicting the likelihood that a booking will be cancelled can help the hotel manager allocate rooms effectively. In this assessment, you will predict hotel booking cancellations using data-driven models. The data for this assessment can be downloaded from Teams.

You need to write an analysis report that discusses how you completed the tasks and goes into sufficient depth to demonstrate knowledge and critical understanding of the relevant processes. 100% of the available marks are awarded for the written report.

Report Guidance

Your report must conform to the structure below and include the required content as described. You must supply a written report containing three distinct sections that provide a full and reflective account of the processes undertaken.

Section I: Data Loading and Preparation (15%)

As a first step, download the datasets from Teams. There are two datasets: hotel_bookings_01.csv and hotel_bookings_02.csv. Your task is to merge these datasets using R, providing screenshots of the key steps and reporting the dimensions of the merged dataset (number of rows and columns). Then analyze the merged dataset for missing values, identify any features with missing data, and decide how to handle them: by removing instances, imputing values, or dropping entire columns. Justify your chosen approach, include screenshots of the process, and document the reasoning. Finally, export the cleaned dataset to an Excel (.xlsx) file.
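The steps in this section could look roughly like the following in R. This is only a sketch under stated assumptions: two small toy data frames stand in for hotel_bookings_01.csv and hotel_bookings_02.csv, the column names (IsCanceled, LeadTime, Children) are hypothetical, and the two files are assumed to share the same columns so that they can be stacked with rbind(). If the files instead share a key column, merge() would be the more appropriate call. Median imputation and the writexl package are illustrative choices, not requirements of the brief.

```r
# Toy stand-ins for the two CSV files; in the real task these would come from
# read.csv("hotel_bookings_01.csv") and read.csv("hotel_bookings_02.csv").
bookings_01 <- data.frame(
  IsCanceled = c(0, 1, 0),
  LeadTime   = c(10, 200, 35),
  Children   = c(0, NA, 1)
)
bookings_02 <- data.frame(
  IsCanceled = c(1, 0),
  LeadTime   = c(90, 5),
  Children   = c(2, 0)
)

# Assumption: both files have identical columns, so they are stacked row-wise.
bookings <- rbind(bookings_01, bookings_02)
merged_dim <- dim(bookings)  # number of rows, number of columns

# Count missing values per column to find the features that need attention.
na_counts <- colSums(is.na(bookings))

# One possible treatment (illustrative only): impute a numeric column
# with its median.
cleaned <- bookings
cleaned$Children[is.na(cleaned$Children)] <-
  median(cleaned$Children, na.rm = TRUE)

# Export to .xlsx if the writexl package happens to be installed.
if (requireNamespace("writexl", quietly = TRUE)) {
  writexl::write_xlsx(cleaned,
                      file.path(tempdir(), "hotel_bookings_clean.xlsx"))
}
```

Whichever treatment is chosen, the report should compare na_counts before and after cleaning and explain why that treatment suits each affected feature.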

Section II: Descriptive Analytics (25%)

This section involves performing descriptive analysis on the prepared dataset. Use either Excel or R to identify the numeric features and summarize each one, reporting the mean, median, minimum, maximum, standard deviation, and number of unique values, and compute the correlation coefficients between variables. Additionally, visualize the distribution of customer types across city and resort hotels, compare the average daily rates for no-shows, cancellations, and check-outs across hotel types, and analyze whether repeated guests are more likely to check out. Create line plots showing how days in waiting list relates to average daily rate and to booking changes, with proper axes and annotations, and interpret whether the plots suggest a linear relationship.
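As an illustration of the summary statistics, correlations, and line plot described in this section, a minimal base R sketch might look like this. The data frame is toy data and the column names (Hotel, ADR, DaysInWaitingList, BookingChanges) are assumptions about how the real dataset is labelled; the describe() helper is hypothetical and defined here only for the example.

```r
# Toy data standing in for the cleaned bookings; column names are assumptions.
bookings <- data.frame(
  Hotel             = c("City", "City", "Resort", "Resort", "City", "Resort"),
  ADR               = c(100, 120, 80, 95, 110, 70),
  DaysInWaitingList = c(0, 0, 5, 5, 10, 10),
  BookingChanges    = c(0, 1, 0, 2, 1, 3)
)

# Keep only the numeric features and summarise each one.
numeric_cols <- bookings[sapply(bookings, is.numeric)]
describe <- function(x) {
  c(mean = mean(x), median = median(x), min = min(x), max = max(x),
    sd = sd(x), n_unique = length(unique(x)))
}
summary_table <- t(sapply(numeric_cols, describe))  # one row per feature

# Pairwise Pearson correlation coefficients between the numeric features.
cor_matrix <- cor(numeric_cols)

# Average ADR and booking changes for each waiting-list duration.
by_wait <- aggregate(cbind(ADR, BookingChanges) ~ DaysInWaitingList,
                     data = bookings, FUN = mean)

# Line plot of average daily rate against days in waiting list.
plot(by_wait$DaysInWaitingList, by_wait$ADR, type = "l",
     xlab = "Days in waiting list", ylab = "Average daily rate",
     main = "ADR by days in waiting list")
```

The same plot() call with by_wait$BookingChanges on the y-axis would give the second line plot. A roughly straight line in either plot would be consistent with a linear relationship, which the corresponding entry of cor_matrix can help confirm.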

Section III: Hotel Booking Cancellation Prediction (60%)

Use R to develop classification models that predict whether a booking will be cancelled. Justify the selection of IsCanceled as the response variable and discuss what modifications would be needed if ReservationStatus were used instead. Choose two classification models studied in your course and provide a concise description of each, detailing its principles and its suitability for this task. Implement model training and testing in R, clearly documenting the code and specifying the input variables, model parameters, and data splitting ratios, and use the seed number provided in BI_Random_Seed_2021.pdf. Evaluate the models based on accuracy, select the best-performing model, and explain why it is superior. Finally, interpret the business insights derived from this model, such as key predictive features, customer segments at higher risk of cancellation, and potential behavioral or market factors influencing cancellations. Support your conclusions with relevant literature and theories.
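The train/test workflow described in this section can be sketched in base R as follows. Everything specific here is an assumption made for illustration: the data are simulated, LeadTime and PreviousCancellations are hypothetical input variables, the seed 2021 is a placeholder for the value given in BI_Random_Seed_2021.pdf, the 70/30 split is one common choice, and logistic regression is just one example of a classification model, which may or may not be among those covered in the course.

```r
set.seed(2021)  # placeholder; use the seed from BI_Random_Seed_2021.pdf

# Simulated bookings; LeadTime and PreviousCancellations are assumed names.
n <- 500
bookings <- data.frame(
  LeadTime              = round(runif(n, 0, 300)),
  PreviousCancellations = rpois(n, 0.3)
)
# Simulated response: longer lead times and past cancellations make a
# cancellation more likely.
p_cancel <- plogis(-2 + 0.015 * bookings$LeadTime +
                     0.8 * bookings$PreviousCancellations)
bookings$IsCanceled <- rbinom(n, 1, p_cancel)

# 70/30 train/test split (the ratio is an illustrative choice).
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- bookings[train_idx, ]
test  <- bookings[-train_idx, ]

# Logistic regression as one example of a classification model.
fit <- glm(IsCanceled ~ LeadTime + PreviousCancellations,
           data = train, family = binomial)

# Predict cancellation probabilities and threshold them at 0.5.
prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)

# Accuracy and confusion matrix on the held-out test set.
accuracy  <- mean(pred == test$IsCanceled)
confusion <- table(Predicted = pred, Actual = test$IsCanceled)
```

Fitting the second model on the same train and test objects keeps the accuracy comparison fair, and summary(fit) shows which inputs carry the most weight in this particular model.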

The report must include your student number and course name, be single-spaced in 11 pt font, not exceed 15 pages (excluding cover and references), and follow proper citation and referencing standards according to university guidelines. If any issues arise, consult the course coordinator.
