Fake Review Detection Using Ensemble Techniques by the Fusion of Chronology, Aspect and Sentiment of Reviews and Oversampling by SMOTE
Abstract
We propose a system for classifying fraudulent product reviews by generating novel features from user-review behaviour, user-review chronology, and linguistic properties of the review text. The study uses a large, publicly accessible dataset obtained from the Yelp.com review platform, from which we selected the data for hotels and restaurants in New York City. The work is divided into four parts: (1) pre-processing the raw data using NLTK libraries; (2) analysis and design of novel features; (3) building an ensemble supervised machine-learning model over the features generated in the previous steps to separate genuine (ham) reviews from fabricated ones; and (4) comparison with previous research, presented in tabular form, together with a graph of the performance metrics precision, recall, F1-score, and accuracy. The new features were drawn by combining the user-review (UR) aspect, the user-review-product (URP) aspect, and the review-review (RR) context. The Extra Trees ensemble classifier outperformed the other classifiers in accuracy (96%), precision (97%), and F1-score (96%).
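The title names SMOTE oversampling, but the abstract does not say how it is applied. The following is a minimal sketch of the general SMOTE idea only, not the authors' implementation: each synthetic minority-class point is interpolated between a minority sample and one of its nearest minority neighbours. The function name `smote_oversample`, the toy feature vectors, and the choice of `k` are all hypothetical and chosen for illustration.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Hypothetical helper illustrating the SMOTE idea; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    # A point cannot have more neighbours than there are other points.
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Assumed toy data: each row is a two-dimensional feature vector for one review.
genuine = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.3, 0.25], [0.05, 0.1], [0.25, 0.15]]
fake = [[0.8, 0.9], [0.9, 0.85]]

# Generate enough synthetic fake reviews to match the size of the genuine class.
new_fake = smote_oversample(fake, n_synthetic=len(genuine) - len(fake), k=1)
balanced_fake = fake + new_fake
```

In practice, oversampling of this kind would normally be applied to the training split only, so that synthetic points do not leak into evaluation. A maintained library implementation would likely be preferable to a hand-written one.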