IoT Data De-Duplication and Removal: A Multi-Faceted Approach Integrating Data Analytics and Machine Learning for Enhanced Server Efficiency
Abstract
In the ever-expanding landscape of the Internet of Things (IoT), efficient data management is imperative for optimizing server resources. One prevalent challenge is the proliferation of redundant backup copies of shared data, which consumes resources unnecessarily. This paper presents a solution for de-duplicating IoT data and removing unwanted copies from server backups, addressing the inefficiencies associated with multiple backups. In the current scenario, multiple users generate backup copies of the same shared data on the same server. This practice creates significant resource allocation and storage challenges and degrades server efficiency. Users, unaware that a single copy could be shared, contribute to the accumulation of redundant data, which calls for a more streamlined and resource-efficient approach. Our proposed system applies a multi-faceted approach involving data analytics and machine learning to these challenges. The process begins with the systematic collection and analysis of all backup copies through data classification and clustering. An efficient machine learning algorithm then identifies and marks duplicate copies based on various factors. User backups are linked, allowing for versioning and seamless collaboration. When a user attempts to delete a copy, a dynamic copy removal mechanism keeps it available to the other linked users until the last of them disconnects. The sharing is transparent to users: individuals are not explicitly made aware that they share a single copy. The identified shared copy persists on the server until the last connected user decides to delete it. The proposed system thus offers a comprehensive solution to the challenges posed by redundant IoT data backups.
By combining data analytics and machine learning, our approach enhances resource utilization, minimizes storage redundancy, and facilitates seamless collaboration among users, ultimately optimizing server efficiency and ensuring data integrity.
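The linking and deletion behavior described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: exact-match SHA-256 hashing stands in for the machine learning based duplicate identification, the store is held in memory, and the names `SharedBackupStore`, `backup`, `delete`, and `stored_copies` are hypothetical. The sketch only shows how a single copy could be shared by several linked users and removed when the last of them deletes it.

```python
import hashlib


class SharedBackupStore:
    """Toy in-memory store: one physical copy per distinct content, shared by linked users."""

    def __init__(self):
        self._blobs = {}  # digest -> bytes (the single stored copy)
        self._links = {}  # digest -> set of user ids linked to that copy

    def backup(self, user, data: bytes) -> str:
        # Assumption: SHA-256 exact matching replaces the paper's ML-based duplicate detection.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            # First time this content is seen: store the only physical copy.
            self._blobs[digest] = data
            self._links[digest] = set()
        # Link the user to the shared copy instead of storing a duplicate.
        self._links[digest].add(user)
        return digest

    def delete(self, user, digest: str) -> bool:
        """Unlink the user; return True only if the physical copy was removed."""
        users = self._links.get(digest)
        if users is None or user not in users:
            return False  # nothing to delete for this user
        users.discard(user)
        if users:
            return False  # other linked users still need the copy
        # The last linked user has deleted: remove the physical copy.
        del self._links[digest]
        del self._blobs[digest]
        return True

    def stored_copies(self) -> int:
        return len(self._blobs)
```

In this sketch, two users backing up identical bytes receive the same key and `stored_copies()` stays at 1. The first `delete` call returns `False` because another user is still linked, and the copy is removed only when the second user deletes it.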