Solving the Non-Deterministic Nature of the DBSCAN Algorithm

Samarjit Das, Atowar ul Islam, Priyanka Sarma, Sangeeta Borkakoty, Angshuman Sinha

Published: Jul 9, 2023

Abstract

Clustering algorithms are important tools in data mining and an effective means of forming clusters from large datasets. One such algorithm is DBSCAN, which follows the density-based notion of clusters to identify both clusters and noise points in a dataset. However, the algorithm has certain disadvantages. In our previous work we addressed one major disadvantage: the absence of a valid formulation for the input parameters on which DBSCAN mainly relies. We introduced a simple algorithm to analyse the values of these parameters, based on simple frequency distributions, mean deviations and first nearest distances. DBSCAN also suffers from non-determinism, which arises when adjacent clusters share a common border point. Although this situation is rare and usually has little impact, it may degrade cluster quality and allow noise points to lie close to the clusters. Here, we present a simple algorithm that addresses this drawback. The idea is to observe the behaviour of the adjacent clusters in the absence of the common border point and determine which cluster has the greatest affinity towards it.
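The order dependence described in the abstract can be reproduced with a small sketch. The code below is a textbook-style DBSCAN on one-dimensional points, not the authors' proposed algorithm; the function names (`dbscan_1d`, `owner_of`), the toy data and the parameter values (`eps=1.0`, `min_pts=4`) are all assumptions chosen for illustration. The point 2.0 is a border point within reach of a core point on each side (1.0 and 3.0), so its cluster depends only on which side is expanded first.

```python
NOISE = -1


def dbscan_1d(points, eps, min_pts):
    """Textbook DBSCAN on 1-D points, visiting points in list order."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        # Indices within eps of point i (the point itself is included).
        return [j for j in range(n) if abs(points[i] - points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point claimed by this cluster
                continue
            if labels[j] is not None:
                continue  # already claimed; the first cluster to arrive keeps it
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:
                seeds.extend(reach)  # j is a core point, keep expanding
    return labels


def owner_of(points, labels, border, core_a, core_b):
    """Return whichever of core_a / core_b shares a cluster with border."""
    lab = labels[points.index(border)]
    return core_a if lab == labels[points.index(core_a)] else core_b


# Toy data (assumed): two dense groups with 2.0 sitting between them.
points = [0.0, 0.3, 0.6, 1.0, 2.0, 3.0, 3.4, 3.7, 4.0]
forward = dbscan_1d(points, eps=1.0, min_pts=4)

reverse_points = points[::-1]
backward = dbscan_1d(reverse_points, eps=1.0, min_pts=4)

print(owner_of(points, forward, 2.0, 1.0, 3.0))
print(owner_of(reverse_points, backward, 2.0, 1.0, 3.0))
```

With the data in its original order the border point joins the cluster containing 1.0; with the same data reversed it joins the cluster containing 3.0. The core points are assigned identically in both runs, which is why the effect is confined to shared border points.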

Issue

Vol. 44 No. 7 (2023): Issue 7

Section

Articles
