Effect of Subset on Classification Accuracy of Breast Cancer Detection
Abstract
Data mining is the process of discovering valid, novel, useful, and understandable patterns in data. It involves extracting information from large databases and plays a crucial role in many fields, including business, education, government, health care, and engineering. In health care, data mining is particularly useful for disease prediction. Commonly used techniques include classification, clustering, association rules, summarization, and regression.
Breast cancer is a serious illness that affects many women worldwide. Early detection significantly increases the chances of successful treatment, with success rates reaching up to 80%. Analyzing existing patient data to support early detection is therefore essential. In our study, we used breast cancer patient data from the Wisconsin dataset in the UCI Machine Learning Repository, which includes 35 different features.
We applied the Ant Colony Optimization (ACO) feature selection algorithm to reduce the number of features. The selected features were then used as input to several classification algorithms. We compared the accuracy of each algorithm before and after applying ACO to assess the improvement in performance. Our results showed that ACO significantly enhanced classification accuracy, with the Random Forest algorithm achieving the highest accuracy, 99.02%.
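The abstract does not specify which ACO variant, pheromone update rule, or fitness function was used, so the following is only a minimal sketch of how ACO-based feature selection can work in general. It assumes a simplified binary ACO: each feature carries an "include" and an "exclude" pheromone trail, ants sample feature subsets from those trails, and the best ant of each iteration reinforces its own choices after evaporation. The function names (`aco_feature_selection`, `toy_accuracy`), the `INFORMATIVE` set, and the parameters (`n_ants`, `rho`, `tau_min`) are hypothetical. `toy_accuracy` is a made-up stand-in for classifier accuracy, not the Wisconsin data or the authors' results.

```python
import random
from typing import Callable, List, Sequence, Tuple


def aco_feature_selection(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    n_ants: int = 20,
    n_iterations: int = 30,
    rho: float = 0.1,       # evaporation rate (assumed value)
    tau_min: float = 0.01,  # pheromone floor so no choice becomes impossible
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Simplified binary ACO (hypothetical sketch, not the paper's exact method)."""
    rng = random.Random(seed)
    tau_in = [1.0] * n_features   # pheromone for "include feature i"
    tau_out = [1.0] * n_features  # pheromone for "exclude feature i"
    best_subset: List[int] = []
    best_score = float("-inf")

    for _ in range(n_iterations):
        iter_subset: List[int] = []
        iter_score = float("-inf")
        for _ in range(n_ants):
            # Each ant includes feature i with probability tau_in / (tau_in + tau_out).
            subset = [
                i for i in range(n_features)
                if rng.random() < tau_in[i] / (tau_in[i] + tau_out[i])
            ]
            score = fitness(subset)
            if score > iter_score:
                iter_subset, iter_score = subset, score

        # Keep the best subset seen across all iterations.
        if iter_score > best_score:
            best_subset, best_score = iter_subset, iter_score

        # Evaporate every trail, then let the iteration-best ant deposit pheromone
        # on the include/exclude decisions it made.
        chosen = set(iter_subset)
        deposit = max(iter_score, 0.0)
        for i in range(n_features):
            tau_in[i] = max(tau_min, (1 - rho) * tau_in[i])
            tau_out[i] = max(tau_min, (1 - rho) * tau_out[i])
            if i in chosen:
                tau_in[i] += deposit
            else:
                tau_out[i] += deposit

    return best_subset, best_score


# Hypothetical toy fitness: pretend only features 0 and 3 carry signal,
# and every extra (noisy) feature costs a little accuracy.
INFORMATIVE = {0, 3}


def toy_accuracy(subset: Sequence[int]) -> float:
    if not subset:
        return 0.0
    hits = sum(1 for i in subset if i in INFORMATIVE)
    noise = len(subset) - hits
    return 0.5 + 0.2 * hits - 0.02 * noise


# Compare the "before" (all features) and "after" (ACO-selected) scores.
full_score = toy_accuracy(list(range(8)))
selected, selected_score = aco_feature_selection(8, toy_accuracy, seed=42)
print("all features:", full_score)
print("ACO subset:", selected, selected_score)
```

In a real experiment, `fitness` would presumably wrap a classifier's cross-validated accuracy on the selected columns (for example, scikit-learn's `cross_val_score` applied to `X[:, subset]`), which is far more expensive per call than this toy function. The before/after comparison at the end mirrors the evaluation described above in spirit only.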