Towards Designing Ranking Systems for Hotels on Travel Search Engines: Combining Text Mining and Image Classification with Econometrics

In this paper, we empirically estimate the economic value of different hotel characteristics, especially the location-based and service-based characteristics given the associated local infrastructure. We build a random-coefficients structural model that takes into account the multiple levels of consumer heterogeneity introduced by different travel contexts and different hotel characteristics. We estimate this econometric model with a unique dataset of reservations for hotels located in the US over a 3-month period, together with user-generated content processed using techniques from text mining, image classification, and on-demand annotations. This enables us to infer the economic significance of various hotel characteristics. Based on the empirical estimates, we then propose a new hotel ranking system that takes into account the multi-dimensional preferences of customers and imputes the consumer surplus from transactions for a given hotel. By doing so, we are able to provide customers with the “best value for money” hotels. Using blind tests with users from Amazon Mechanical Turk, we compare our ranking system against several benchmark hotel ranking systems. We find that our system performs significantly better than the existing ones. This suggests that our interdisciplinary approach has the potential to improve the quality of hotel search.
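The ranking idea can be illustrated with a minimal sketch. It is not the estimation procedure or the surplus formula used in the paper: it assumes a linear utility, a handful of hypothetical taste draws standing in for the random coefficients, and a single hypothetical price coefficient. The function name `rank_by_surplus`, the feature values, and all numbers are made up for illustration. Each hotel is scored by its average utility net of price disutility across the draws, and hotels are sorted so that the highest "value for money" comes first.

```python
from typing import Dict, List, Sequence, Tuple


def rank_by_surplus(
    hotels: Dict[str, Tuple[Sequence[float], float]],
    taste_draws: Sequence[Sequence[float]],
    price_coef: float,
) -> List[str]:
    """Rank hotels by average net utility across simulated taste draws.

    hotels maps a name to (feature values, price). Every input here is a
    hypothetical placeholder, not an estimate from the paper.
    """
    scores: Dict[str, float] = {}
    for name, (features, price) in hotels.items():
        total = 0.0
        for draw in taste_draws:
            # Linear utility from characteristics under this taste draw.
            utility = sum(w * x for w, x in zip(draw, features))
            # Subtract the disutility of price to get a surplus-like score.
            total += utility - price_coef * price
        scores[name] = total / len(taste_draws)
    # Highest average score first.
    return sorted(scores, key=lambda n: scores[n], reverse=True)


# Hypothetical hotels: features are (near beach, has breakfast), then price.
hotels = {
    "A": ([1.0, 0.0], 100.0),
    "B": ([0.0, 1.0], 80.0),
    "C": ([1.0, 1.0], 150.0),
}
# Two hypothetical consumer types with different tastes for the two features.
taste_draws = [[30.0, 10.0], [10.0, 30.0]]
ranking = rank_by_surplus(hotels, taste_draws, price_coef=0.1)
print(ranking)
```

Averaging over taste draws is what lets a hotel that serves several consumer types reasonably well outrank one that is cheap but appeals to only one type.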