Forecasting Hotel Room Demand 

A Mosaic Data Science Case Study



A prominent publicly traded hotel chain, operating thousands of properties worldwide across multiple brands, had been investing heavily in advanced analytics capabilities and capacity to deliver value to the business both now and in the future. Efforts were underway to bring data together in ways not previously explored, with a focus on enabling analytics across the enterprise.

The business had been using a demand forecasting model from an enterprise analytics software company but was dissatisfied with its accuracy, which hindered the planning and execution of resource allocation. The hotel chain needed an analytics consulting partner who could provide predictive analytical capabilities to improve the accuracy of future demand estimates.

Mosaic, a leading data science consultancy, was engaged by the hotel chain to assess the best way to predict future demand for hotel rooms across its various properties. The ultimate objective was to maximize revenue from a resource with constrained supply (i.e., a limited number of rooms) and demand that fluctuates over time (i.e., by night of stay). This is a critical analytics task for hotel chains: an unoccupied room on a given night earns zero revenue, while demand in excess of room capacity carries a cost in lost revenue.

Forecasts of future demand therefore help the hotel industry make key revenue management decisions. One can assume a generally negative correlation between price and quantity demanded, and determining how this relationship applies to a given hotel property can inform decisions on the room rates that property offers. The quality of such decisions depends on the accuracy of the demand forecast.


Mosaic needed to develop forecasts that outperformed the current analytics tool used by the hotel chain, providing the business with an accurate picture of demand. The first step was becoming familiar with traditional approaches to demand forecasting in the hotel industry. Typically, this type of problem is viewed from two angles: an historical time-series modeling approach and an advanced booking curve fitting approach. The time-series approach models future demand day-by-day by using historical data to fit a parameterized model, and then extrapolating the model into the future. The booking curve approach uses the specific booking data for a given day to generate a curve that can be adjusted to account for current bookings on-hand. After spinning up quickly on these approaches, the Mosaic data science consultants began to implement these analytical methods using an open-source toolset.
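To make the booking curve idea concrete, here is a minimal sketch of one common way to adjust a curve for bookings on hand: divide current bookings by the share of final demand that has historically been booked at the same lead time. The function name, the curve values, and the multiplicative adjustment itself are illustrative assumptions, not details taken from the Mosaic engagement.

```python
def project_final_demand(on_hand, days_out, booked_fraction):
    """Scale current bookings by the historical share booked at this lead time."""
    fraction = booked_fraction[days_out]
    if fraction <= 0:
        raise ValueError("no historical bookings at this lead time")
    return on_hand / fraction


# Hypothetical historical curve: share of final room nights already
# booked N days before the stay date.
booked_fraction = {30: 0.20, 14: 0.40, 7: 0.60, 1: 0.80, 0: 1.00}

# 90 rooms on the books 7 days out, when 60% of demand has usually arrived.
forecast = project_final_demand(on_hand=90, days_out=7, booked_fraction=booked_fraction)
print(round(forecast))
```

The time-series approach, by contrast, ignores bookings on hand and relies only on patterns in past final demand, which is why the two views are often used together.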

Mosaic attacked the historical booking model similarly to any other machine learning (ML) modeling problem: by testing various features and ML algorithms. The challenge in this case was that almost all the information came from time-series features (day of week, month, week of year, holidays, etc.), and the hotel chain only had two years of data for model training, with one year of data for testing and validation. Mosaic’s data science consultants were able to gain additional performance by including segmentation at the market category and room class (standard, premium, suite, etc.) levels, and then aggregating the results. The Mosaic data science team modeled each hotel property independently and compared three methods: generalized linear models, random forest, and XGBoost.
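As an illustration of the feature set and the segment-then-aggregate step described above, the following sketch derives calendar features from a stay date and sums per-segment forecasts into a property total. The holiday list, function names, and segment labels are hypothetical; the case study does not specify how the features were encoded.

```python
from datetime import date

# Hypothetical holiday list; a real project would use a maintained calendar.
HOLIDAYS = {date(2017, 12, 25), date(2018, 1, 1)}


def calendar_features(stay_date):
    """Build the time-series features named in the text for one stay date."""
    _, iso_week, _ = stay_date.isocalendar()
    return {
        "day_of_week": stay_date.weekday(),  # Monday = 0
        "month": stay_date.month,
        "week_of_year": iso_week,
        "is_holiday": int(stay_date in HOLIDAYS),
    }


def aggregate_segments(segment_forecasts):
    """Sum per-segment forecasts into property-level totals per stay date."""
    totals = {}
    for forecasts in segment_forecasts.values():
        for stay_date, rooms in forecasts.items():
            totals[stay_date] = totals.get(stay_date, 0.0) + rooms
    return totals


features = calendar_features(date(2017, 12, 25))

# Assumed output of separate models, keyed by (market category, room class).
segment_forecasts = {
    ("corporate", "standard"): {date(2018, 3, 5): 42.0},
    ("leisure", "suite"): {date(2018, 3, 5): 8.5},
}
property_forecast = aggregate_segments(segment_forecasts)
```

Any of the three model families compared in the text could consume a feature dictionary like this one once it is converted to a numeric array.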


Figure 1. Booking curves, a common forecasting strategy in the hotel industry.

After proper tuning, XGBoost was found to outperform the others significantly. However, XGBoost took a significant amount of time to train each model because of hyperparameter tuning. At roughly 12 minutes per property, this approach would not scale well to 5,000+ properties: a single training run would take on the order of 1,000 hours. So, in search of extra performance and reduced training time, the team turned to a cutting-edge modeling tool called Prophet, a time-series forecasting software package recently open-sourced by Facebook (and available for both R and Python). The data science consultants were able to get performance as good as or better than XGBoost from Prophet, while training time fell to under a minute per property.
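The scaling argument is simple arithmetic, sketched below. It assumes properties are trained one after another on a single machine, which the case study does not state; parallel training would reduce wall-clock time but not total compute.

```python
def total_training_hours(minutes_per_property, n_properties):
    """Total sequential training time in hours for one full run."""
    return minutes_per_property * n_properties / 60


# Figures from the text: about 12 minutes per property with XGBoost,
# under 1 minute with Prophet, across 5,000 properties.
xgboost_hours = total_training_hours(12, 5000)
prophet_hours_upper = total_training_hours(1, 5000)

print(f"XGBoost: about {xgboost_hours:.0f} hours per run")
print(f"Prophet: under {prophet_hours_upper:.0f} hours per run")
```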

For properties that were consistently booked close to the capacity of the hotel, the team performed a log transform of the data so that the clumping of values near the hotel’s capacity was spread out more evenly. Mosaic’s data scientists modeled the log-transformed data and then back-transformed it to obtain property forecasts. This increased accuracy significantly for those hotels that were usually operating near capacity.
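The case study does not give the exact form of the transform. One transform that has the described effect of spreading out values clumped near capacity is the log of the rooms remaining, sketched below as an assumption rather than as Mosaic's actual method. The function names and the +1 offset are illustrative.

```python
import math


def to_log_remaining(rooms_sold, capacity):
    """Transform occupancy so values near capacity are spread apart."""
    # The +1 keeps the log defined on sold-out nights.
    return math.log(capacity - rooms_sold + 1)


def from_log_remaining(value, capacity):
    """Back-transform a modeled value to rooms sold, clamped to [0, capacity]."""
    rooms = capacity + 1 - math.exp(value)
    return min(max(rooms, 0.0), float(capacity))


capacity = 200  # hypothetical property size

# A 4-room difference near capacity becomes a large gap after the transform,
# while the same difference at mid-occupancy stays small.
gap_near_capacity = to_log_remaining(195, capacity) - to_log_remaining(199, capacity)
gap_mid_occupancy = to_log_remaining(100, capacity) - to_log_remaining(104, capacity)

round_trip = from_log_remaining(to_log_remaining(195, capacity), capacity)
```

The clamp in the back-transform also guarantees that a forecast can never exceed the number of rooms the property has.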

Another method to increase accuracy was to selectively choose which market categories to model individually, or as a group, for each hotel based on the proportion of total bookings of each category. The idea was that some market categories are best modeled separately, if there are enough data, as they have somewhat different behaviors than others. Market categories that did not have enough data to be modeled separately were grouped into one category.
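A minimal sketch of this selection rule follows. The 10% threshold, the category names, and the booking counts are hypothetical; the case study says only that the decision was based on each category's proportion of total bookings.

```python
def plan_segments(bookings_by_category, min_share=0.10, other_label="other"):
    """Map each market category to its own model or to a pooled group."""
    total = sum(bookings_by_category.values())
    if total <= 0:
        raise ValueError("no bookings to segment")
    plan = {}
    for category, count in bookings_by_category.items():
        share = count / total
        # Categories with enough volume are modeled on their own.
        plan[category] = category if share >= min_share else other_label
    return plan


# Hypothetical booking counts for one property.
bookings = {
    "corporate": 500,
    "leisure": 330,
    "group": 120,
    "government": 30,
    "wholesale": 20,
}
plan = plan_segments(bookings)
modeled_separately = sorted(c for c, target in plan.items() if c == target)
```

Because the rule is applied per hotel, the same category can be modeled separately at one property and pooled at another.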

For the advanced booking model, Mosaic decided to fit a model to the booking curves themselves (rather than just using the past data in a lookup table) and obtained good results using log transforms of the data and piecewise estimation over separate time segments. As the left chart of the figure below shows, a single log-transformed model fit the booking curve fairly well except on the critical last day, when a large portion of the bookings occur. By using two separate models (right chart), the Mosaic team was able to obtain a much better fit and more accurate results in the few days leading up to the forecast date.

Figure 2. Examining the goodness of fit for the advanced booking model.
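The two-model idea can be sketched as a pair of straight-line fits to the log of cumulative bookings, one for the early part of the curve and one for the final days. The breakpoint of 3 days, the sample data, and the choice of ordinary least squares are all assumptions for illustration; the case study does not describe the exact model form.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_booking_curve(days_out, bookings, breakpoint):
    """Fit separate log-linear models before and after the breakpoint."""
    points = [(d, math.log(b)) for d, b in zip(days_out, bookings)]
    early = [p for p in points if p[0] >= breakpoint]
    late = [p for p in points if p[0] < breakpoint]
    return {"early": fit_line(*zip(*early)), "late": fit_line(*zip(*late))}


def predict(model, day, breakpoint):
    """Predict cumulative bookings at a given number of days before arrival."""
    intercept, slope = model["late" if day < breakpoint else "early"]
    return math.exp(intercept + slope * day)


# Hypothetical cumulative bookings, with a surge in the last days.
days_out = [28, 21, 14, 7, 3, 2, 1, 0]
bookings = [20, 30, 45, 68, 80, 90, 105, 160]

model = fit_booking_curve(days_out, bookings, breakpoint=3)
two_segment_final = predict(model, 0, breakpoint=3)

# Single log-linear fit over the whole curve, for comparison.
intercept, slope = fit_line(days_out, [math.log(b) for b in bookings])
single_final = math.exp(intercept)
```

On this sample data, the single fit underestimates the arrival-day total because the late surge is averaged away, while the late-segment model tracks it more closely.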


Mosaic was able to outperform the current analytical forecasting tool by 76.4%. Mosaic’s data scientists achieved this result using open-source software, which could save the hotel chain significant licensing costs. The hotel chain is now able to allocate resources more effectively, leading to a number of positive downstream effects on business metrics and bottom-line net income. Not only are the business decision makers using data more efficiently, but the analytics team at this hotel chain also gains another highly visible project ‘win’, inspiring more confidence in the team and more projects for it.