Refining A Divisive Partitioning Algorithm for Unsupervised Clustering

Canasai Kruengkrai, Virach Sornlertlamvanich, and Hitoshi Isahara

Abstract

The Principal Direction Divisive Partitioning (PDDP) algorithm is a fast and scalable clustering algorithm [3]. The basic idea is to recursively split the data set into sub-clusters based on principal direction vectors. However, the PDDP algorithm can yield poor results, especially when the clusters are not well separated from one another. Its stopping criterion is based on a heuristic that tends to overestimate the number of clusters. In this paper, we propose simple and efficient solutions to these problems: refining the results of the splitting process, and applying the Bayesian Information Criterion (BIC) to estimate the true number of clusters. This motivates a novel algorithm for unsupervised clustering, whose experimental results on different data sets are very encouraging.
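
To make the basic splitting idea concrete, the following is a minimal sketch of a single PDDP-style split. It illustrates only the step described above (projecting onto the principal direction and dividing by sign); it is not the refined algorithm proposed in the paper, and it includes neither the refinement step nor the BIC-based stopping rule. The function name `pddp_split`, the rows-as-data-points layout, and the choice to assign zero projections to the first group are assumptions made for illustration.

```python
import numpy as np

def pddp_split(X):
    """Split the rows of X into two groups along the principal direction.

    Assumes each row of X is one data point (hypothetical layout).
    Returns the row indices of both groups and the direction used.
    """
    X = np.asarray(X, dtype=float)
    # Center the data so the split is relative to the cluster centroid.
    centered = X - X.mean(axis=0)
    # The leading right singular vector of the centered data is the
    # principal direction (direction of maximum variance).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    # Project each point onto the principal direction and split by sign.
    projections = centered @ direction
    left = np.where(projections <= 0)[0]
    right = np.where(projections > 0)[0]
    return left, right, direction

# Example: two small, well-separated groups of 2-D points.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
left, right, direction = pddp_split(X)
```

Because the sign of a singular vector is arbitrary, which group is labeled `left` and which `right` may differ between runs or libraries; only the partition itself is meaningful. Applying the function recursively to each resulting group would give the divisive hierarchy.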


