User Needs Mining Based on Topic Analysis of Online Reviews

Abstract: The purpose of this paper is to aggregate the topic information in online review text and clarify user needs. We studied online reviews of a women's clothing store on Taobao.com using semantic analysis and text mining. Online reviews were collected with a web crawler. Word frequency statistics were computed using a Chinese word segmentation tool and data analysis tools. Statistical software was used for clustering analysis and multidimensional scaling analysis of high-frequency keywords. The results show that the content of the online reviews mainly covers four topics: basic features of products, additional features of products, user experience and product display. The study reveals the potential user needs of the women's clothing store on Taobao.com, which can not only help consumers make rational decisions but also provide guidance to merchants and manufacturers.


INTRODUCTION
Online reviews are also called Online Consumer Reviews (OCRs) or user generated content (UGC) [1]. Online consumer reviews are product information created by users based on personal experience. They are an important form of online word-of-mouth and serve as free "sales assistants" that help consumers identify products that best meet their conditions of use [2]. Online reviews are experiences, evaluations or opinions expressed in natural language in the form of text. They are timely, numerous, unstructured and complex in content [3]. The volume of online review information has recently increased dramatically, to the point of information overload, which greatly reduces the value of the information [4]. Therefore, text mining of online reviews on online retail websites not only helps consumers make rational decisions, but can also guide merchants and manufacturers in design, production and operation by uncovering potential user needs.

RESEARCH STATUS
Research on online reviews spans multiple fields, such as online retail, electronic books and tourism. Hao et al. (2009) conducted an empirical study of online reviews based on movie panel data [5]. Wang and Zeng (2011) mined the online review information of e-book readers [6]. Yu (2014) constructed a motivation model relating online reviews to consumer purchase decisions [7]. Zhuo et al. (2015), based on 4366 hotel reviews from TripAdvisor.com, found that review length, review extremity, usefulness votes, reviewer acceptance, and personal information disclosure have a significant positive effect on the usefulness of online reviews [8]. Yue et al. (2017) used online reviews to mine users' dietary preferences [9]. Wu and Chen (2017) used UGC text content mining to study the honeymoon travel memories shared by honeymoon tourists in China, and proposed that five types of information play an important role in tourists' selection of honeymoon destination information [10]. Liu et al. (2018) focused on electronic word of mouth (eWOM) for e-books [11]. Ye et al. (2019) proposed a four-dimensional IT framework that takes expertise knowledge, online reviews, profile descriptions and service quality as signals that distinguish high-quality physicians [12]. The above research covers a variety of e-commerce products and focuses on the impact of reviews on users' purchase intention, but research on online reviews of clothing products is lacking.
Information mining for commercial applications of online reviews has also attracted researchers' attention, mainly focusing on the usefulness of online reviews and the influence of online reviews on the behavior of businesses and consumers [13]. Wang et al. (2018) took haodf.com, China's leading Internet medical community, as the research object and pointed out that a doctor's personal website can significantly increase the number of comments from patients [14]. Cao et al. (2011) used text mining methods to explore the impact of various features of online user reviews on the usefulness of reviews [15]. Yan et al. (2012) studied the impact of factors such as review depth, personal preferences and feelings, and review sentiment on the usefulness of reviews [16]. Liu and Zhao (2017), using the DEMATEL method, studied the factors influencing the usefulness of online reviews and identified six key factors [17]. Li (2012), based on online reviews of mobile phones, found that the number of online reviews, the degree of attention paid to products, the timeliness of reviews, and the usefulness of reviews as perceived by customers have a significant effect on online mobile phone sales [18]. Tu et al. (2015) constructed a user demand mining model for online review data and took Lushan tourism as an example to verify the feasibility of the model [19]. Li et al. (2016) constructed a dynamic user preference model based on online review information mining [20]. Li et al. (2018) used questionnaire data to study the impact of online reviews on customer value creation [21]. These studies indicate that the user needs expressed in online reviews of online retail websites still need further exploration from the perspective of text analysis.

RESEARCH METHODS AND RESEARCH PATHS

Research Methods
This paper adopts content analysis and topic analysis to carry out text mining on online reviews of a women's clothing store on Taobao.com. First, a web crawler was used to collect online reviews of the store, and the text data were pre-processed. Next, Chinese word segmentation and part-of-speech tagging were performed on the dataset using the ICTCLAS word segmentation tool of the Chinese Academy of Sciences, and word frequencies were computed using SPSS and Excel. Then, the high-frequency keyword co-occurrence matrix was constructed using database software and custom code; the co-occurrence matrix was transformed into a similarity matrix using the Ochiai similarity coefficient, from which the dissimilarity matrix was obtained. Finally, statistical software was used to carry out clustering analysis and multidimensional scaling analysis of the high-frequency keywords, producing a clustering tree and multidimensional scaling results. A comprehensive analysis of these outputs yielded the four topics of the online reviews.

Research Paths

Collection and Pre-Processing of Online Review Text
Taobao.com is one of the typical online retail platforms in China, and women's clothing is one of Taobao.com's most popular product categories. In this paper, one online shop was randomly selected as the research object from the women's clothing stores at the golden crown level, which are favoured by users on Taobao.com. On July 1, 2017, a web crawler was used to collect information on 144 products for sale in the store, including product name, product image, buyer name, buyer rating, online reviews, review time, etc. There were 9582 online reviews, dated from January 5, 2017 to June 30, 2017, and this text set was used as the research data.
The preprocessing of online review information includes three steps. Firstly, system default reviews were filtered. Taobao.com gives an automatic evaluation when the buyer fails to leave one in time, such as "Evaluation party did not make timely evaluation, the system default praise!", "The buyer did not make an evaluation within 15 days," and "This user did not fill in the evaluation." These items are not written by buyers, and their large number introduces noise. Therefore, such reviews need to be filtered.
Secondly, invalid reviews were filtered. Some reviews consist only of punctuation marks or emoticons with no practical meaning, such as "!!!" or "??". Such content carries very little useful information and does not contribute to the topic analysis of the review text, so it was deleted.
Finally, duplicate reviews were filtered. When a buyer purchases multiple pieces of the same product at a time, repeated reviews often appear. This occurs frequently in the online reviews of this store and has a considerable impact on word frequency statistics, so duplicate reviews were removed. After preprocessing, 3382 reviews were retained for further analysis.
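The three filtering steps can be sketched in Python as follows. This is a minimal illustration rather than the tooling used in the study: the record fields (`buyer`, `product`, `text`), the English marker phrases standing in for Taobao.com's default review texts, and the rule that a duplicate is the same buyer posting the same text on the same product are all assumptions.

```python
# Hypothetical marker phrases for system-generated default reviews.
# The real platform texts are in Chinese; these stand in for them.
SYSTEM_DEFAULT_MARKERS = [
    "did not make timely evaluation",
    "did not make an evaluation within 15 days",
    "did not fill in the evaluation",
]


def is_system_default(text):
    """Step 1: the review was generated by the platform, not by a buyer."""
    return any(marker in text for marker in SYSTEM_DEFAULT_MARKERS)


def is_invalid(text):
    """Step 2: the review contains no letter, digit, or CJK character."""
    return not any(ch.isalnum() for ch in text)


def preprocess(reviews):
    """Apply the three filters in order and return the retained reviews."""
    seen = set()
    kept = []
    for review in reviews:
        text = review["text"].strip()
        if is_system_default(text) or is_invalid(text):
            continue
        # Step 3 (assumed rule): same buyer, same product, same text.
        key = (review["buyer"], review["product"], text)
        if key in seen:
            continue
        seen.add(key)
        kept.append(review)
    return kept


# Toy data, invented for illustration only.
reviews = [
    {"buyer": "a", "product": "p1", "text": "The fabrics are soft and the style is nice"},
    {"buyer": "a", "product": "p1", "text": "The fabrics are soft and the style is nice"},
    {"buyer": "b", "product": "p1", "text": "!!!"},
    {"buyer": "c", "product": "p2", "text": "This user did not fill in the evaluation."},
    {"buyer": "d", "product": "p2", "text": "Obvious colour aberration"},
]
cleaned = preprocess(reviews)
```

In this toy run only the first review from buyer `a` and the review from buyer `d` are retained.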

Word Segmentation and Word Frequency Statistics
Chinese text has no explicit delimiter between words. Before automatic analysis, each sentence must be divided into words, that is, word segmentation. For word segmentation, the paper adopts the ICTCLAS word segmentation tool of the Chinese Academy of Sciences. The tool can perform Chinese word segmentation, part-of-speech tagging and keyword extraction. Its word segmentation accuracy is as high as 97.58%. For word frequency statistics, SPSS and Excel were used. Word segmentation and word frequency statistics of the online review text yielded 1952 keywords, of which 115 had a word frequency greater than 50. In topic analysis, nouns and verbs contribute most to topic expression, so only these two kinds of words were considered. Among nouns and verbs with a word frequency greater than 50, only the most frequent word in each group of similar words was kept. For example, "pattern" was removed and "style" was retained; "cloth" and "material" were removed and "fabrics" was retained. Words with no practical meaning, such as "up", "feel" and "hope", were then removed. Some single-character words that were insufficient for analysis were also removed. In addition, four words with strong semantic relevance were added from the vocabulary with word frequency greater than 40. Finally, 48 high-frequency keywords were obtained, as shown in Tab. 1. Among them, the term "XS" is rather special. Although it is not a Chinese word, it frequently appears in the online reviews of women's clothing stores, meaning "extra small". It reflects the size information of buyers who purchase or pay attention to the clothing.
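A minimal Python sketch of the keyword selection described above, assuming the reviews have already been segmented and part-of-speech tagged (the study used ICTCLAS, SPSS and Excel for these steps). The simplified tags `n` and `v`, the function name, and the toy data are hypothetical.

```python
from collections import Counter


def high_frequency_keywords(tagged_reviews, min_freq, synonym_groups=(), stopwords=()):
    """Count nouns and verbs, keep one word per synonym group, apply a threshold.

    tagged_reviews: list of reviews, each a list of (word, pos) pairs.
    min_freq: only words with frequency strictly greater than this are kept.
    """
    counts = Counter(
        word
        for review in tagged_reviews
        for word, pos in review
        if pos in ("n", "v") and word not in stopwords
    )
    # Within each group of similar words, keep only the most frequent one.
    for group in synonym_groups:
        present = [w for w in group if w in counts]
        if not present:
            continue
        best = max(present, key=lambda w: counts[w])
        for w in present:
            if w != best:
                del counts[w]
    return {w: c for w, c in counts.items() if c > min_freq}


# Toy data, invented for illustration only.
tagged_reviews = [
    [("style", "n"), ("feel", "v"), ("very", "d")],
    [("style", "n"), ("pattern", "n"), ("fabrics", "n")],
    [("fabrics", "n"), ("cloth", "n"), ("style", "n")],
]
keywords = high_frequency_keywords(
    tagged_reviews,
    min_freq=1,
    synonym_groups=[("style", "pattern"), ("fabrics", "cloth", "material")],
    stopwords={"up", "feel", "hope"},
)
```

Here "pattern" and "cloth" are dropped in favour of the more frequent "style" and "fabrics", mirroring the examples in the text; the threshold of 1 replaces the paper's threshold of 50 only because the toy corpus is tiny.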

Construct the Keyword Matrix
First, the co-occurrence matrix was constructed. Database software and custom code were used to count the co-occurrence frequency of the 48 high-frequency keywords in the online reviews and to build the co-occurrence matrix of high-frequency keywords (as shown in Tab. 2). The higher the co-occurrence frequency, the closer the connection between the two words. Secondly, the correlation matrix was constructed from the co-occurrence matrix: the co-occurrence matrix was transformed into a similarity matrix using the Ochiai similarity coefficient.
The larger the value in the similarity matrix, the closer the relationship between the two words and the greater their similarity. Conversely, the smaller the value, the more distant and less similar the two words are.
Finally, to suit the subsequent analysis and reduce error, the similarity matrix was converted into a dissimilarity matrix that represents the degree of difference between two words, as shown in Tab. 3. Specifically, each value in the similarity matrix is subtracted from 1, which can be done with a formula in Excel. The greater the value, the more distant the two words and the smaller their similarity. Conversely, the smaller the value, the closer and more similar the two words are.
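The three matrices can be illustrated with a short Python sketch. The paper does not state the exact formula it used; the sketch assumes the common form of the Ochiai coefficient, in which the number of reviews containing both keywords is divided by the square root of the product of the numbers of reviews containing each keyword. Function names and the toy reviews are hypothetical.

```python
import math
from itertools import combinations


def cooccurrence_matrix(reviews, keywords):
    """co[i][j] = number of reviews containing both keyword i and keyword j.

    The diagonal co[i][i] holds the number of reviews containing keyword i.
    """
    index = {w: i for i, w in enumerate(keywords)}
    n = len(keywords)
    co = [[0] * n for _ in range(n)]
    for tokens in reviews:
        present = sorted({index[t] for t in tokens if t in index})
        for i in present:
            co[i][i] += 1
        for i, j in combinations(present, 2):
            co[i][j] += 1
            co[j][i] += 1
    return co


def ochiai_similarity(co):
    """Assumed form: sim[i][j] = co[i][j] / sqrt(co[i][i] * co[j][j])."""
    n = len(co)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = math.sqrt(co[i][i] * co[j][j])
            sim[i][j] = co[i][j] / denom if denom else 0.0
    return sim


def dissimilarity(sim):
    """Subtract each similarity value from 1."""
    return [[1.0 - value for value in row] for row in sim]


# Toy segmented reviews, invented for illustration only.
reviews = [
    ["fabrics", "style"],
    ["fabrics", "colour"],
    ["fabrics", "style", "colour"],
    ["style"],
]
keywords = ["fabrics", "style", "colour"]
co = cooccurrence_matrix(reviews, keywords)
sim = ochiai_similarity(co)
dis = dissimilarity(sim)
```

In the toy data, "fabrics" and "colour" co-occur in two reviews and appear in three and two reviews respectively, so their similarity is 2 / sqrt(6) and their dissimilarity is one minus that value.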

Clustering Analysis of High-Frequency Keywords
The essence of clustering analysis is to divide the data into several categories according to distance, so as to minimize the differences within each category and maximize the differences between categories. For the clustering analysis of high-frequency keywords in the online review text, the high-frequency keyword dissimilarity matrix was imported into SPSS 23.0, with the interval measure set to cosine [22]. The between-groups linkage method was used to obtain the clustering tree of high-frequency keywords, as shown in Fig. 1.
The dendrogram is an important tool for interpreting the results of hierarchical clustering. The clustering results show that the high-frequency keywords fall into several categories; with the threshold set to 25, they are divided into four categories. The first category mainly includes keywords that represent product characteristics, such as "fabrics", "workmanship", "style", "accessories", "logistics", etc.; the second category mainly includes keywords indicating user experience, such as "match", "look slim", "look fat", "chest", "shoulder", etc.; the third category mainly includes keywords for product display, such as "picture", "real object", "colour aberration", and "colour"; the fourth category mainly includes keywords that indicate customer service, such as "customer service", "size", "effect" and so on. Of these, the first and second categories are the two largest and contain the most high-frequency keywords, implying that buyers pay more attention to content related to product features and user experience, and that related words appear more frequently in online reviews.
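Between-groups (average) linkage clustering can be sketched in plain Python as below. This is a simplified illustration, not a reproduction of the SPSS procedure: it treats the dissimilarity values directly as distances instead of applying the cosine measure, it cuts the tree at a raw distance threshold instead of SPSS's rescaled 0 to 25 axis, and the four-keyword matrix is invented.

```python
def average_linkage(dist, threshold):
    """Agglomerative clustering with between-groups (average) linkage.

    dist: symmetric matrix of pairwise dissimilarities.
    threshold: merging stops once the closest pair of clusters is
    farther apart than this value.
    Returns a list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def average_distance(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Toy dissimilarity matrix, invented for illustration only.
keywords = ["fabrics", "workmanship", "picture", "colour aberration"]
dist = [
    [0.00, 0.20, 0.90, 0.80],
    [0.20, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.10],
    [0.80, 0.90, 0.10, 0.00],
]
groups = average_linkage(dist, threshold=0.5)
named_groups = sorted([keywords[i] for i in group] for group in groups)
```

With the threshold at 0.5 the toy keywords split into a product-feature pair and a product-display pair; raising the threshold above the average between-group distance merges them into one cluster.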

Multidimensional Scaling Analysis
To better reveal the degree of similarity between words and to complement the clustering analysis, this paper also adopts multidimensional scaling analysis, as shown in Fig. 2. In the multidimensional scaling results, the keywords in the first and second quadrants are relatively tightly grouped, while those in the third and fourth quadrants are neither tight nor loose. The distribution of the keywords shows that multidimensional scaling analysis refines the "product characteristics" and "user experience" categories. For product characteristics, the first quadrant includes keywords such as "fabric", "workmanship" and "style" that represent the basic characteristics of the product, and the fourth quadrant includes keywords such as "accessory", "logistics", "customer service" and "evaluation". For user experience, the second quadrant mainly includes "black", "white", "chest", "shoulder", "effect" and other wearing experiences related to specific colours and body parts; the third quadrant mainly includes "tops", "shorts", "wasted", "exaggerated" and other clothing-related wearing experiences.
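To make the idea of placing keywords in a two-dimensional map concrete, the following sketch implements classical (Torgerson) multidimensional scaling in plain Python. This is a swapped-in textbook variant, not the algorithm of the statistical software used in the study, and it assumes the dominant eigenvalues of the double-centred matrix are positive, which holds for Euclidean-like distances. The function name and the four toy points are hypothetical.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Embed items in `dims` dimensions from a symmetric distance matrix."""
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean) for j in range(n)]
        for i in range(n)
    ]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant eigenvector of the current B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the matching eigenvalue.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Toy check: four corners of a 4 x 1 rectangle, invented for illustration.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)]
dist = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points]
coords = classical_mds(dist)
embedded = [[math.hypot(a[0] - c[0], a[1] - c[1]) for c in coords] for a in coords]
```

For exactly Euclidean input such as this rectangle, the embedded distances reproduce the original ones up to rotation and reflection of the map; for keyword dissimilarities the two-dimensional map is only an approximation, which is why quadrant positions are interpreted qualitatively.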

DATA ANALYSIS AND RESULTS
In this study, three experts in the field of e-commerce were invited to sort the results of the clustering tree and multidimensional scaling analysis independently, in a back-to-back way. The high-frequency keywords of the online review text were finally summarized into four categories, corresponding to four topics of online reviews: basic features of products, additional features of products, user experience and product display.

Basic Features of Products
Basic features of products are the characteristics of the product itself, which usually include its appearance, quality, function, trademark, and packaging. By analyzing the information about basic product characteristics contained in user reviews, it is possible to learn consumers' general concerns and needs regarding the physical product.
Through hierarchical clustering analysis and multidimensional scaling analysis, this paper finds that reviews on the basic features of products mainly focus on the colours, fabrics, workmanship, styles and other aspects of women's clothing. The analysis of review topics shows that consumers pay attention not only to the colours and styles of clothing, but also to its workmanship, fabrics, and textures. In particular, they often mention the "thickness" and "wrinkle" of fabrics, which indicates that consumers have certain requirements for the quality of women's clothing.

Additional Features of Products
Additional features of products are features other than the product's own characteristics, mainly service-related factors such as customer service, logistics, technical support and information provision.
In addition to the characteristics of the apparel products themselves, consumers also repeatedly evaluate the related services received when purchasing products. Feedback on customer service mainly concerns size recommendations and is generally a positive evaluation of the effect, such as "like" (a word that does not usually appear in combination with negative words). Feedback on logistics mainly concerns shipping and in-stock goods, which indicates that consumers have high requirements for delivery speed. The analysis of the sample also shows some concern with accessories, price and evaluation: clothing matching, price and the evaluations of previous buyers have also become hot topics among users.

User Experience
User experience refers to the purely subjective feeling that users form in the process of using products. As a typical experiential product, clothing gives rise to rich user experience. After purchasing and trying on a coat or dress, a consumer mainly describes the wearing effect in terms of clothing matching, body parts, specific colours, and size. In terms of clothing matching, buyers mainly comment on how tops and bottoms go together, reflecting the strong aesthetic consciousness and personality of modern women. "Chest", "shoulder", "waist" and "leg" are the body parts most frequently mentioned by buyers. "Look slim" and "look fat" are frequently used evaluation words, indicating that buyers are particularly concerned about the design and comfort of clothing at these four body parts. Moreover, slimness remains the prevailing aesthetic standard. All of these points require the store's special attention. Buyers at this women's clothing store prefer "black" and "white", which is probably related to the style of the store and its audience and merits further analysis. In terms of size, "XS" (extra small) is very prominent, indicating that some buyers are petite and that the needs of this group should not be ignored.

Product Display
Product display refers to the detailed presentation of products, including detailed information on specifications, styles, colours, etc. Its purpose is to let online consumers understand the products shown on the website intuitively as soon as they see them [23]. The analysis of the online review text shows that opinions on product display mainly concern the degree of conformity between the picture and the physical object. The most prominent issue is colour aberration, which has become a common problem for women's clothing stores.

CONCLUSIONS AND PROSPECTS

Conclusions
This paper mainly adopts content analysis and topic analysis to study the online reviews of a women's clothing store on Taobao.com. From the results of hierarchical clustering analysis and multidimensional scaling analysis, four topics of the review text were obtained, and the potential user needs of the store were further analyzed on that basis.

Product Characteristics are an Important Factor Influencing Consumers' Purchase Intentions
For example, consumers are very concerned about the colour of clothing. This shows in three ways: first, the word frequency of "colour" is higher than that of other basic product features such as fabrics, styles, and workmanship; second, reviews reflecting user experience discuss specific colours such as white and black, which shows a certain degree of preference; third, the problem of colour aberration is exposed in product display. These results indicate that colour has become an important factor influencing users' purchase intentions in similar women's clothing stores. Merchants should focus on analyzing the distribution of consumer demand across product features, and then provide targeted products and services to meet that demand.

User Experience is a Key Factor that Affects Consumers' Purchase Intentions
The perceived quality of experiential products is determined more by subjective or personal experience [24]. Consumers often find it difficult to obtain complete information when purchasing such merchandise, and perceptions of the same product vary from person to person. The actual effect of the product is likely to differ significantly from expectations, which is why a large amount of experiential evaluation content has emerged. At this stage, by combining multimedia technologies such as virtual reality and augmented reality, an "Internet fitting room" can be opened to let consumers enhance their experience through "try-on". The O2O model can also be used to set up offline experience shops that deliver garments to consumers.

Product Display is a Special Way of Influencing Users' Purchase Intention
Women's clothing sellers can display apparel products from all angles through pictures, text, short videos, and live broadcasts on social media such as Weitao, Weibo, and WeChat. In addition to displaying the basic characteristics of the product, information services such as clothing matching and fashion interpretation can be provided to increase the additional features of the product, thereby affecting the consumer's purchase decision. There is also a need to enhance social attributes in product display so that consumers can actively participate.

Prospects
This article does not study the pictures, symbols, and emotion-related content in online reviews; in-depth analysis in this area is needed in the future. In addition, the research sample is a single randomly selected women's clothing store on Taobao.com, which limits the sample, so follow-up studies should use larger samples. Finally, online reviews on social media sites could be chosen for comparative research.

Figure 1 Clustering tree of high-frequency keywords of online reviews

Figure 2 Multidimensional scaling analysis results of online reviews

Table 1 High-frequency keywords and frequency of product reviews

Table 2 Co-occurrence matrix of high-frequency keywords (part)

Table 3 Dissimilarity matrix of high-frequency keywords (part)