Fake Review Detection Using Ensemble Techniques by the Fusion of Chronology, Aspect and Sentiment of Reviews and Oversampling by SMOTE

Navin Kumar Goyal, Mukesh Kr. Gupta, Bright Keswani, Dinesh Goyal, Anil Pal

Abstract

This work proposes a system for classifying fraudulent product reviews by generating novel features from user-review behaviour, user-review chronology, and linguistic text features. The study uses a large, publicly accessible dataset from the Yelp.com review platform, from which we selected the data for hotels and restaurants in New York City. The work is divided into four parts: (1) pre-processing the raw data using NLTK libraries; (2) analysis and design of novel features, drawn from the combination of the user-review (UR) aspect, user-review-product (URP), and the review-review (RR) context; (3) building an ensemble supervised machine-learning model over the features generated in the previous steps to detect ham (genuine) and fabricated reviews; and (4) a comparison with previous research, presented in tabular form, together with a graph of the performance metrics precision, recall, F1-score, and accuracy. The Extra Tree ensemble classifier outperformed the other classifiers in accuracy (96%), precision (97%), and F1-score (96%).
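
The title names SMOTE oversampling, but the abstract does not describe how it was applied. As a rough illustration only, the idea behind SMOTE can be sketched in plain Python: each synthetic point is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote`, the toy feature vectors, and the assumption that fabricated reviews are the minority class are all hypothetical and are not taken from the article.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative sketch of the SMOTE idea, not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical 2-D feature vectors for the minority class
# (assumed here to be fabricated reviews).
fake_reviews = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1]]
genuine_count = 10  # assumed size of the majority class

new_points = smote(fake_reviews, n_new=genuine_count - len(fake_reviews), k=2)
balanced_fake = fake_reviews + new_points
print(len(balanced_fake))
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used to report accuracy, precision, recall, and F1-score.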
