1. INTRODUCTION
The tourism sector is of great importance to today’s economy and will remain so in the coming decades (Kontogianni and Alepis, 2020). As hotels play an important role in the tourism sector, their performance is closely linked to the overall performance of tourism (Mucharreira et al., 2019). The hospitality industry is affected by the rapid growth of reviews and all types of user-generated content (UGC) on the Internet, and there is a need to implement various Big Data analytics methods to gain valuable insights (Mayer-Schoenberger and Cukier, 2013; Tsai et al., 2022).
Liu (2021) and Onuiri et al. (2016) point out that smart tourism involves the use of ICT methods to fully leverage the vast amounts of data in the tourism industry for decision making and management. Incorporating AI (Artificial Intelligence) methods into Big Data analytics means that these methods can continuously learn and improve from the input data analysed and predict customer behaviour.
UGC has a major impact on users’ purchasing decisions: about 35% of travellers change their hotel decisions after reading relevant content on social media, 53% say they would not book a hotel without reviews, and 87% say reviews increase their confidence when choosing accommodation (Nicoli and Papadopoulou, 2017). Electronic Word of Mouth (eWOM) messages written by consumers do not disappear immediately; other consumers can read them for a long time (Breazeale, 2009), and they become a reference point for buyers of goods or services (Rita et al., 2022; Mou et al., 2022; Meng et al., 2022; Zenggang et al., 2022). The findings of Schuckert et al. (2015), who examined 50 articles on eWOM in hospitality and tourism, show that “online reviews seem to be a strategic tool that plays an important role in hospitality and tourism management, especially in promotion, online sales, and reputation management” (Martin-Fuentes, Mateu, and Fernandez, 2018). Online customer reviews (OCRs), as a particular form of eWOM, have a great impact on consumer decision making. OCRs are any positive, negative, or neutral feedback about a product, service, brand, or person provided and shared online (e.g., Booking.com, TUI.com, Facebook, Google reviews, etc.) by a past buyer (Hennig-Thurau et al., 2004; Filieri and Mariani, 2021).
The development of modern technologies in the tourism industry (recommendation systems, online reservations, dynamic pricing, and interactive platforms for evaluating services) has changed the way tourism products are consumed and the way consumers share their experiences and make decisions about choosing a new accommodation (Sanchez-Franco et al., 2019). It is also changing the way hotels should monitor and analyse online reviews to manage and improve service quality, as hotel reviews influence customers’ booking intentions and reviewers’ sentiment (Casaló et al., 2015; Rita et al., 2022; Mellinas et al., 2015).
Online platforms for selling tourism products generate a huge amount of data related to the service experience and form an online reputation of the hotel (Velázquez et al., 2015). The best way for hoteliers to build a good reputation and attract new customers is to manage the hotel’s online reviews (Bridges, 2022). Therefore, it is crucial to identify the characteristics of customer satisfaction and dissatisfaction related to hotel quality.
The main objective of this research is to examine words and phrases appearing in positive and negative reviews of a sample of Croatian hotels in ten locations on the Adriatic coast, posted on the Booking.com platform in 2019 and 2021 (before and after the outbreak of the COVID-19 pandemic). Hotels were classified into four groups based on their overall rating: 7.0-8.0, 8.1-9.0, 9.1-9.4, 9.5-10.0.
In this information extraction process, several objectives are defined: to identify topics that appear in positive and negative guest reviews, to investigate whether guest perceptions differ before and after the outbreak of the COVID-19 pandemic, and to find a sufficiently accurate ML model for polar sentiment. The following research questions were formulated:
RQ1) Is it possible to identify the main topics influencing positive and negative sentiment for four hotel rating categories (7.0-8.0, 8.1-9.0, 9.1-9.4, 9.5-10.0) in the two years observed?
RQ2) Is there a difference in the topics of positive and negative reviews in 2019 and 2021? (Did the COVID-19 pandemic change the topics related to hotel service quality?)
RQ3) Is it possible to build an ML model to classify polar sentiment with acceptable performance (>= 70% precision and recall for each of the positive and negative classes of reviews)?
After the introduction, the Related Work section reviews research in the field of hotel review processing and identifies areas related to this research and the innovations presented in this paper. The Methodology section describes the methodology used to collect reviews on the Booking.com platform, the data sample, and the reason for choosing this platform. It also describes the software tool that was used to create text analysis processes and ML models for four different categories of reviews. The Results and Discussion section then describes the results of the text mining analysis of the reviews, which are divided into four categories by rating for 2019 and 2021, before and after the outbreak of COVID-19. An ML model for classifying reviews as positive or negative, both in the sample of all reviews and in each of the four categories, is described, along with the performance of the classifier. In the conclusion, answers to the research questions are provided, limitations of this research are described, and directions for further research are given.
This research contributes to service quality management in the hotel industry: a better understanding of guests’ experiences and their perception of quality, as well as the possibility of using AI methods to obtain rapid analysis and valuable information about guests’ feedback. In addition, the results show the need to create a sentiment dictionary specifically for the field of tourism reviews.
2. RELATED WORK
Sentiment analysis provides information for understanding public opinion and analysing various tweets and reviews (Tul et al., 2017). The essence of sentiment analysis is to classify the text and determine the contribution of different words to different classifications (Xu et al., 2019). There are many research papers on sentiment analysis in the tourism sector, most of them published in 2018 or later. Various authors have tried to discover word vector predictors of review polarity, develop a reliable ML model for aspect-based sentiment classification, and provide new insights into travellers’ opinions and sentiment in the tourism industry.
Martin et al. (2018) investigated different Deep Learning techniques (based on Convolutional Neural Networks, CNN, and Long Short-Term Memory, LSTM) for sentiment classification of online tourist comments from Booking.com and Tripadvisor. The LSTM recurrent neural network algorithm provided the most accurate results. Setiowati and Setyorini (2018) extracted service words and opinion words in their study and then identified sentiments from opinions on service quality indicators. They were then segmented by hotel department and function. Compared with k-Nearest Neighbour (KNN), Support Vector Machine (SVM), J48, and Naïve Bayes (NB), the rule-based method achieved the highest precision, recall, and F-measure. To investigate methods of measuring the online reputation of hotels, Pollak et al. (2018) applied a multifactorial analysis of online reputation (Google, Booking.com, Tripadvisor, and Facebook) and discovered a relationship between online reputation factors. Mishra et al. (2019) performed text processing of hotel reviews using TF-IDF and cosine similarity to extract similar values from the sentiment dataset. De Brito et al. (2020) presented the development of SentimentALL, a sentiment analysis tool that extracts and analyses user comments from an online booking platform for travel services. Mostafa (2020) proposed a Traveller Review Sentiment Classifier that analyses travellers’ reviews of Egyptian hotels and provides a classification of hotel characteristics by sentiment. Among SVM, NB, and Decision Tree, NB had the highest accuracy. Stefko et al. (2020) assessed perceptions of service quality (polarity of sentiment) based on indicators such as location, staff rating, cleanliness, amenities, comfort, value for money, and Wi-Fi using regression analysis techniques.
The results show that the cleanliness and amenities categories have the greatest impact on perceptions of a hotel’s service quality. Another research topic was the sentiment of online consumers towards the environmental discussion in hotel reviews in America and Europe, which increased over time (Mariani and Borghi, 2020). Oliveira Lima et al. (2021) compared hotel review classifiers using Latent Dirichlet Allocation (LDA), NB, logistic regression, SVM and LSTM, the last of which performed best. Mehta et al. (2021) evaluated customer satisfaction through sentiment analysis of customer reviews in the pandemic year 2020. The authors also conducted topic modelling to assess the topics most discussed by customers (the 12 most discussed topics and dissatisfaction with staff, service, room, cleanliness, slow booking, and hotel response to the pandemic). Sontayasara et al. (2021) created an ML sentiment analysis model using SVM with a classification accuracy of 71% for negative, positive, and neutral classes using Twitter data from the pandemic year 2020. The changes that the COVID-19 pandemic brought to the hotel industry led to changes in guest perceptions of service quality attributes. The study by Mušanović et al. (2021) reviewed Facebook comments on hotel brand posts and applied sentiment analysis to identify and compare guest attitudes toward hotel staff, services, and products. The results showed that sentiment was more positive than negative and that there was no significant difference between the content and sentiments of the different hotel categories. The research identified words associated with positive and negative posts.
Peres and Paladini (2022) examined the negative aspects affecting hotel service quality (a total of 13 aspects related to five hotel quality attributes) and found that room cleaning and check-in were the most negatively affected by the pandemic. Ghosal and Jain (2022) used Word2Vec and extended families of Ordered Weighted Average (OWA) operators in their sentiment aggregation research. Their model includes explicit and implicit aspect segmentation for ratings, semantics for slang words, and location-based rating analysis. Cendani et al. (2023) also used the LSTM model (with an attention mechanism) for aspect-based sentiment analysis in their research.
The literature review showed that sentiment analysis of online tourism reviews is mainly based on finding an ML model that is accurate enough to classify polar sentiments regardless of the rating category. In line with the state of the art, this study develops an ML model for all reviews, but also for each of the four review categories, to investigate the differences between positive and negative reviews in different hotel categories. It also explores the possibility that COVID-19 has changed the relevance of certain issues in relation to perceptions of hotel service quality. Finally, it discusses whether it is necessary to create a specific sentiment dictionary for all types of sentiment analysis of OCRs in the tourism sector.
3. METHODOLOGY
As mentioned in the introduction, it was necessary to collect reviews of hotels in the Republic of Croatia, and the Adriatic coast (and certain destinations) were chosen as hotel locations.
3.1 Resources for the research data
The website “Touropia” (Best places to visit in croatia, 2022) lists Pula, Rovinj, Zadar, Split, Hvar and Dubrovnik as the top destinations for Croatian tourism in 2021. Taylor Herperger, in her article “15 Best Destinations in Croatia to Visit” (Herperger, 2022), adds the island of Brač and many other places like Makarska for their beauty and pleasant beaches. LonelyPlanet also mentions a beautiful place in Croatia on its official website, namely the island of Krk, a place worth visiting to escape other cities that have many more tourists. Šibenik is also mentioned in several sources. The following hotel locations were selected, whose reviews are considered in this article: Rovinj, Pula, Krk, Zadar, Šibenik, Split, Brač, Hvar, Makarska and Dubrovnik (Figure 1). All reviews are from the most popular website for hotels in the Republic of Croatia, Booking.com (https://www.booking.com/). Booking.com was selected based on the authenticity of the reviews and the rating methods described below. The Booking.com platform guarantees the authenticity and relevance of the reviews, as it only allows reviews from people who have made a booking and completed a stay (at least one night in an accommodation). A review is then checked for inappropriate words and its authenticity is verified before publication. In addition, travellers can post positive and negative reviews separately on the Booking.com platform. This is important for determining customer satisfaction and dissatisfaction with the hotel’s quality attributes (Booking.com, 2022; Peres and Paladini, 2022). Booking.com rates the property from 1 to 10 in six specific areas (cleanliness, comfort, value for money, amenities, location, and staff), with optional open feedback. Starting in 2019, the overall rating is no longer the average of all six rating dimensions, but a separate rating given by guests for the overall experience. This is because guests may perceive other parameters not covered by the six specified (Booking.com, 2022).
3.2 Research data
Reviews were collected for hotels on the Adriatic coast before and after the COVID-19 outbreak (2019 and 2021). All reviews were from the most popular website for hotels in the Republic of Croatia, Booking.com (https://www.booking.com/). The hotels were divided into four groups by overall rating (1st hotel group: 7.0-8.0, 2nd hotel group: 8.1-9.0, 3rd hotel group: 9.1-9.4, and 4th hotel group: 9.5-10.0), and for each of these groups, the reviews from 2019 and 2021 were considered separately. Table 1 shows the number of downloaded reviews by year and by group: a total of 3117 reviews, of which 1600 are positive and 1517 negative (there are fewer negative reviews for the hotels with the highest overall rating). Of the total 1,600 hotel facilities, 546 (34%) were 3-star, 702 (44%) 4-star, and 352 (22%) 5-star.
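The grouping by overall rating can be expressed as a simple lookup. The following is a minimal Python sketch, not the procedure actually used in the study; the names `RATING_GROUPS` and `rating_group` are hypothetical, and the rounding to one decimal place is an assumption based on how ratings are displayed on the platform.

```python
from typing import Optional

# Group boundaries as stated in the text: (label, lower bound, upper bound), both inclusive.
RATING_GROUPS = [
    ("7.0-8.0", 7.0, 8.0),
    ("8.1-9.0", 8.1, 9.0),
    ("9.1-9.4", 9.1, 9.4),
    ("9.5-10.0", 9.5, 10.0),
]


def rating_group(overall_rating: float) -> Optional[str]:
    """Return the rating-group label for a hotel, or None if it falls outside all groups."""
    # Assumption: ratings carry one decimal place, so rounding removes float noise.
    rating = round(overall_rating, 1)
    for label, low, high in RATING_GROUPS:
        if low <= rating <= high:
            return label
    return None  # e.g. hotels rated below 7.0 are not part of the sample
```

For example, `rating_group(9.3)` returns the label of the third group, while a hotel rated 6.9 is not assigned to any group.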
Source: Authors
Table 2 shows the number of reviews for each year and grouping from the selected 10 locations and the proportion of the number of reviews observed by location.
Once hotel groups were selected, hotel guest ratings were extracted for each group (separated into three-, four-, and five-star hotels). Figure 2 shows an example of a hotel from the first hotel group, with the relevant data framed. Label 1 represents the hotel’s star rating, label 2 the hotel’s location, label 3 the hotel’s average rating, and label 4 the overall number of ratings.
Figure 3 shows an example of a hotel review used in data collection. Label 1 indicates when the review was written, a very important aspect since only reviews from 2019 and 2021 were collected. Label 2 contains a positive comment about the hotel, while label 3 contains a negative comment; the two had to be separated for later processing.
3.3 Data analysis
After the data was collected, it was processed in RapidMiner software using various algorithms for text processing, sentiment analysis, and ML.
The RapidMiner data science platform was chosen for several reasons: it is open source, contains a large number of algorithms, allows different packages to be added, has a simple user interface and an intuitive way of working, and is regularly ranked as one of the best tools in its category (Wolff, 2020; Hillier, 2022).
The first part of text processing was done using the Process Documents from Data operator (all reviews by category were structured in Excel spreadsheets, and the relevant attributes were the review text and the sentiment, positive or negative). Tokenization operators (for word extraction) were combined with mechanisms for cleaning and reducing word vectors: filtering stop words, eliminating words with fewer than 3 characters, converting to lowercase, and applying Porter’s stemming algorithm. The Term Frequency-Inverse Document Frequency method (TF-IDF) was used to obtain word vectors. In the Results and Discussion section, Tables 3, 4, 5 and 6 list the most frequently occurring words and phrases in positive and negative reviews for each hotel group in 2019 and 2021.
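The preprocessing steps above can be illustrated outside RapidMiner. The following Python sketch is an approximation under stated assumptions, not the RapidMiner process itself: the stop-word list is a tiny hypothetical sample, Porter stemming is left out to keep the example free of external libraries, and the weighting uses the plain form tf × ln(N/df), which may differ from the normalization RapidMiner applies.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "and", "was", "were", "very", "with", "but", "not", "for"}


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words and tokens shorter than 3 characters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) >= 3 and t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * ln(N / df)."""
    tokenized = [preprocess(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append(
            {term: (count / total) * math.log(n_docs / doc_freq[term])
             for term, count in counts.items()}
        )
    return vectors
```

With this weighting, a term that occurs in every review receives a weight of zero, so the terms that stand out are those that distinguish one review from the others.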
Then, using the RapidMiner operator Extract Sentiment and the VADER dictionary (Valence Aware Dictionary and sEntiment Reasoner), a sentiment analysis model was created that detects the tokens for which the dictionary has an individual score (scores from -4 to 0 are negative, 0 is neutral, and scores from 0 to 4 are positive). The scores of the individual tokens are then summed to determine the overall sentiment score of a text (a review in this case). The result of the sentiment analysis is described in the Results and Discussion section.
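The summation of token scores can be sketched as follows. This is a simplified illustration only: the lexicon `TOY_LEXICON` and its values are hypothetical and are not taken from the VADER dictionary, and the sketch ignores the additional rules (for example negation and intensifiers) that VADER itself applies.

```python
import re

# Hypothetical mini-lexicon on the -4..+4 scale described above.
# The values are illustrative and are NOT taken from the actual VADER dictionary.
TOY_LEXICON = {"great": 3.0, "friendly": 2.0, "clean": 1.5, "dirty": -2.0, "noisy": -1.5}


def lexicon_score(review):
    """Sum the lexicon scores of all tokens; tokens missing from the lexicon count as 0."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return sum(TOY_LEXICON.get(token, 0.0) for token in tokens)


def polarity(score):
    """Map an overall score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

In this toy setup, a review such as "The room was not clean" receives a positive score because only the word clean is found in the lexicon, which illustrates how a plain dictionary sum can miss domain-specific negative semantics.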
In the last part of the research, ML models were built using the RapidMiner operators Deep Learning (DL), Gradient Boosted Trees (GBT) and Linear Support Vector Machine (LSVM); the results are described in the next section.
4. RESULTS AND DISCUSSION
The results obtained are presented below: the frequent words (unigrams) from positive and negative reviews, divided by hotel groups; the frequent bigrams from positive and negative reviews, divided by hotel groups; and ML models for polar sentiment extraction.
4.1 Frequent unigrams
Tables 3 and 4 summarize the results by the four groups and the two years. The words with the highest frequencies were selected for display (the number of words in each category is not the same because only the most relevant frequencies were included). Because stemming was applied when creating the word vector, some words and phrases appear as stems rather than in their original form as lexemes. Table 3 shows the most frequent words in the groups 7.0-8.0 and 8.1-9.0 (which contain the most hotels with 3 and 4 stars). No major variations were observed in the topics of positive and negative reviews before and after the outbreak of COVID-19, although the word clean appeared among the frequent words of negative reviews in 2021 in both groups.
There are topics that appear mostly in positive reviews, such as: staff, friendly, view, location, help, comfort, beach, love, beauty (stem) and positive adjectives: nice, good, great.
Words/areas such as: recept (stem of reception), bed, food, check (check in and check out), park (parking), restaurant, service, bathroom, old, book (booking) are more common in negative reviews.
Areas that appear in both positive and negative reviews are hotel, room, staff, breakfast, pool, clean (where clean in negative reviews indicates a problem with cleanliness).
Source: Authors
In hotels with a higher overall rating, where the highest percentage of hotels with 4 and 5 stars are found, the word occurrence is similar to the previous two groups, except that pool appears more often in negative reviews and the topics of payment, price, noise and coffee appear only in negative reviews. Breakfast is one of the most common topics in negative reviews across all groups and observed years, along with hotel and room. The stem excel (derived from excellent, excels...) appears in positive reviews, where the word comfort also appears more often.
Source: Authors
4.2 Frequent bigrams
After analysing the occurrence of individual words, an analysis of bigram searches and their frequency was performed for all hotel groups and the polarity of reviews. A combination of expressions was observed (mostly in the form of adjective_nouns), and it was determined which expressions occur most frequently in positive reviews, in negative reviews, or in both types of reviews. In this way, areas that are important to guests and that influence their satisfaction or are reasons for dissatisfaction are revealed.
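A bigram analysis of this kind can be sketched in a few lines of Python. This is a minimal illustration, assuming the reviews have already been tokenized and stemmed; the function names are hypothetical, and the underscore notation follows the way bigrams are written in this paper.

```python
from collections import Counter


def bigrams(tokens):
    """Join each pair of adjacent tokens with an underscore, e.g. sea_view."""
    return [f"{first}_{second}" for first, second in zip(tokens, tokens[1:])]


def bigram_frequencies(token_lists):
    """Count bigram occurrences over a collection of tokenized reviews."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(bigrams(tokens))
    return counts


def split_by_polarity(positive_counts, negative_counts):
    """Return the bigrams found only in positive, only in negative, and in both review types."""
    positive, negative = set(positive_counts), set(negative_counts)
    return positive - negative, negative - positive, positive & negative
```

Applied to the positive and the negative reviews of one hotel group, `split_by_polarity` yields the three sets of expressions discussed in this section; in practice a minimum frequency would be applied first so that rare bigrams do not dominate the sets.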
Source: Authors
Positive reviews of all hotel groups (Tables 5 and 6) are dominated by areas related to friendly staff and help, bus_stop, and various positive adjectives (nice, excellent, great, good, delicious, comfort...) combined with room, beach, buffet, view, location, and breakfast. Expressions that appear exclusively in negative reviews are air_condition, reception, atm, air_town, hot_water, park_place, and negative adjectives combined with topics like book, bed, room, balcony, breakfast, and bathroom.
The terms beach_advertisement, citi_center, fridge, shower_gel, and wash_machine were also found in the negative reviews of hotels with ratings of 9.1-9.4 and 9.5-10.0.
The critical areas for both types of sentiment in the reviews turned out to be the following: old_town, breakfast, dinner, room_cleaning, room, sea_view, view, pool, booking, and value_money.
Source: Authors
4.3 Machine learning model for polar sentiment analysis
The research results can not only help in managing the quality of hotel services, but also serve as a basis for creating a sentiment dictionary that would include, in addition to standard words and their corresponding scores, the typical words for expressing sentiment in hotel reviews. For example, the word room would be paired with an adjective that refers to rooms and can be positive or negative. In this way, it would also be possible to create an aspect-based sentiment analysis that identifies sentiments related to an aspect, such as room. This is useful because when a sentiment analysis model was created using the VADER sentiment dictionary for all 3117 reviews, more than half of the negative reviews were not scored as negative by VADER. A look at how the total score is assigned to each text unit (individual review) shows that many negative semantics were not detected.
Since the existing sentiment dictionaries cannot detect the sentiment in a large number of reviews, the last part of the research was to build an ML model for detecting the sentiment of hotel reviews based on a specific ML algorithm and the number of reviews in the training phase. A training/testing partition with a ratio of 80:20 was created from a total of 3117 reviews from 2019 and 2021 (1600 positive and 1517 negative). Stratified sampling was used, which ensures that the class distribution is the same in both partitions. In the training phase, 10-fold cross-validation was used to reduce the risk of overfitting, and stratified sampling was also used for the folds. The following algorithms were investigated: Deep Learning (DL), Gradient Boosted Tree (GBT) and Linear Support Vector Machine (LSVM). The results are presented in Tables 7 and 8 and Figures 4 and 5. The results presented include the algorithms and parameters that provided the best results for the observed data partition.
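The partitioning scheme can be sketched in plain Python as follows. This is an illustrative approximation, not the RapidMiner implementation: the function names, the fixed random seed and the rounding of the 80% cut-off are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_ratio=0.8, seed=42):
    """Return (train_indices, test_indices), splitting each class at roughly train_ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_ratio)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


def stratified_folds(indices, labels, k=10, seed=42):
    """Deal the given indices into k folds class by class, so each fold keeps the class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index in indices:
        by_class[labels[index]].append(index)
    folds = [[] for _ in range(k)]
    for class_indices in by_class.values():
        rng.shuffle(class_indices)
        for position, index in enumerate(class_indices):
            folds[position % k].append(index)
    return folds
```

In k-fold cross-validation, each fold serves once as the validation set while the remaining folds are used for training; the 20% test partition is kept aside and used only for the final evaluation.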
Source: Authors
Source: Authors
It can be seen that LSVM achieved the best overall performance of 87.51% in the training phase and 86.84% in the testing phase.
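The acceptance criterion from RQ3 (precision and recall of at least 70% for both the positive and the negative class) can be checked directly from a confusion matrix. The sketch below is a generic illustration; the function names and the dictionary layout of the confusion matrix are assumptions, and any counts passed to it in the usage example are hypothetical rather than results of this study.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def meets_threshold(confusion, threshold=0.70):
    """Check that precision and recall of every class reach the threshold.

    confusion maps (actual_class, predicted_class) to a count.
    """
    classes = {actual for actual, _ in confusion} | {predicted for _, predicted in confusion}
    for cls in classes:
        tp = confusion.get((cls, cls), 0)
        fp = sum(n for (actual, predicted), n in confusion.items()
                 if predicted == cls and actual != cls)
        fn = sum(n for (actual, predicted), n in confusion.items()
                 if actual == cls and predicted != cls)
        precision, recall = precision_recall(tp, fp, fn)
        if precision < threshold or recall < threshold:
            return False
    return True
```

For instance, with hypothetical counts such as `{("pos", "pos"): 280, ("pos", "neg"): 40, ("neg", "neg"): 260, ("neg", "pos"): 43}`, both classes exceed the 70% threshold on precision and recall, so `meets_threshold` returns `True`.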
During the research, two further classification tasks were evaluated:
• polar classification within each of the four rating groups: all performance parameters were below 75%.
• the classifier’s ability to identify the hotel’s rating group from the review: performance was low (all performance parameters around 45% for both the DL and GBT algorithms).
5. CONCLUSION
The adoption of modern technologies and Big Data analytics in tourism is necessary to monitor customer satisfaction, provide quality services and maintain competitiveness. OCRs, as a form of eWOM, represent an important source of potentially valuable information and knowledge in the hospitality industry. The application of AI methods such as natural language processing and, in particular, sentiment analysis in the tourism sector makes it possible to gain important insights that contribute to the management of the hospitality industry. There is a lot of research on text analytics, text mining, NLP, and sentiment analysis related to the hospitality industry, but with the development of ICT, there is still plenty of room for new insights. The hospitality industry is likely to be the industry most affected by coronavirus disease 2019 (COVID-19). Therefore, the results of the various sentiment analysis studies should help hotel management to provide effective services to restore and maintain customer satisfaction.
This research was conducted on 3117 reviews of hotels on the Croatian Adriatic coast from 2019 and 2021. Different text processing techniques and ML models were applied to answer three research questions (RQs). The answers are as follows:
RQ1) Is it possible to identify the main topics influencing positive and negative sentiment for four hotel rating categories (7.0-8.0, 8.1-9.0, 9.1-9.4, 9.5-10.0) in the two years observed?
Answer to RQ1: the topics that mainly appear in positive and negative reviews were identified, as well as the areas that appear in both types of reviews. The words and bigrams were used to determine which hotel services had the greatest impact on guest satisfaction and which were the main topics of negative sentiment and dissatisfaction. There were no significant differences among the four groups of hotels, except for some topics that occurred only in the group of hotels with the highest ratings.
RQ2) Is there a difference in the topics of positive and negative reviews in 2019 and 2021? (Did the COVID-19 pandemic change the topics related to hotel service quality?)
Answer to RQ2: there is no indication of a significant change in the topics appearing in the 2019 and 2021 reviews, with the exception of some new topics such as coffee, coffee maker and washing machine.
RQ3) Is it possible to build an ML model to classify polar sentiment with acceptable performance (>= 70% precision and recall for each of the positive and negative classes of reviews)?
Answer to RQ3: for building classification models using ML, two main objectives were established: 1) to create a classifier that classifies a review as positive or negative, and 2) to create a classifier that can classify a review not only as negative or positive, but also as belonging to a particular hotel rating group. The results for 1) showed the performance of three ML algorithms, all of which achieved over 79% accuracy, with the LSVM algorithm achieving the best performance. The performance of the ML models for 2) was low, about 45% accuracy, for all observed algorithms.
The results of this study have highlighted the main strengths and weaknesses expressed in the positive and negative reviews and can be used to create action plans, eliminate problems, and maintain and improve the dimensions that are perceived as positive. The main limitations of this research are the relatively small number of reviews and the restriction to the Croatian Adriatic coast. Therefore, it is planned to expand the sample of reviews in the future and include more locations in different countries. Creating a sentiment dictionary for the tourism sector is also one of the goals of further research, as is exploring the extraction of aspect-based semantics from OCRs.