Original scientific paper

https://doi.org/10.20532/cit.2024.1005803

E-Commerce Fake Reviews Detection Using LSTM with Word2Vec Embedding

Mafas Raheem ; School of Computing, Asia Pacific University of Technology and Innovation, Kuala Lumpur, Malaysia
Yi Chien Chong ; School of Computing, Asia Pacific University of Technology and Innovation, Kuala Lumpur, Malaysia


page 65-80


Abstract

Customer reviews inform potential buyers' decisions, but fake reviews on e-commerce platforms can skew perceptions, since customers may feel pressured to leave positive feedback. Detecting fake reviews is a critical challenge because they distort online shopping and deceive customers. Effective detection strategies that employ deep learning architectures and word embeddings are essential to combat this issue. Specifically, the study presented in this paper employed a single-layer Simple LSTM model, a 1D convolutional model, and a combined CNN+LSTM model. These models were trained using different pre-trained word embeddings, including Word2Vec, GloVe, FastText, and Keras embeddings, to convert the text data into vector form. The models were evaluated on accuracy and F1-score to provide a comprehensive measure of their performance. The results indicated that the Simple LSTM model with Word2Vec embeddings achieved an accuracy of nearly 91% and an F1-score of 0.9024, outperforming all other model-embedding combinations. The 1D convolutional model performed best without any embeddings, suggesting that it can extract meaningful features from the raw text. The transformer-based models, BERT and DistilBERT, showed progressive learning but struggled with generalization, indicating the need for strategies such as early stopping, dropout, or regularization to prevent overfitting. Notably, the DistilBERT model consistently outperformed the LSTM model, achieving its best performance, an accuracy of 96% and an F1-score of 0.9639, with a batch size of 32 and a learning rate of 4e-5.
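
As an illustration of the two evaluation metrics named in the abstract, the following minimal sketch computes accuracy and F1-score for a binary fake/genuine classifier. It is not the authors' code: the function name `accuracy_and_f1`, the label convention (1 = fake, 0 = genuine), and the toy labels are all assumptions made for the example, and the numbers it produces are unrelated to the results reported in the paper.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels.

    Hypothetical helper for illustration; `positive` marks the class
    treated as "fake" when counting true/false positives.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall, which simplifies to
    # 2*TP / (2*TP + FP + FN); define it as 0.0 when there are no positives.
    denom = 2 * tp + fp + fn
    f1 = (2 * tp / denom) if denom else 0.0
    return accuracy, f1


# Toy labels (assumed, not from the paper): 1 = fake review, 0 = genuine.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.4f}, F1={f1:.4f}")
```

Reporting F1 alongside accuracy is useful because accuracy alone can look high on an imbalanced dataset even when the classifier misses most fake reviews.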

Keywords

fake detection; word embeddings; deep-learning model; transformer-based model

Hrčak ID:

321564

URI

https://hrcak.srce.hr/321564

Publication date:

30.9.2024.
