Le Thi Hoai An, Pham Dinh Tao: "Minimum Sum-of-Squares Clustering by DC programming and DCA".

Abstract: In this paper, we propose a new approach based on DC (Difference of Convex functions) programming and DCA (DC Algorithm) to perform clustering via minimum sum-of-squares Euclidean distance. The so called Minimum Sum-of-Squares Clustering (MSSC in short) is first formulated in the form of a hard combinatorial optimization problem. It is afterwards recast as a (continuous) DC program with the help of exact penalty in DC programming. A DCA scheme is then investigated. The related DCA is original and very inexpensive because it amounts to computing, at each iteration, the projection of points onto a simplex and/or onto a ball, that all are given in the explicit form. Numerical results on real word data sets show the efficiency of DCA and its great superiority with respect to K-means, a standard method of clustering.
Keywords: clustering, MSSC, Combinatorial optimization, DC programming, DCA, Nonsmooth nonconvex programming, Exact penalty.

Citation: Le Thi Hoai An and Pham Dinh Tao, Minimum Sum-of-Squares Clustering by DC programming and DCA,  in “Advanced Intelligent Computing Technology & Applications”,  Lecture Notes in Artificial Intelligence, volume 5755, pp. 327-340, Springer Verlag 2010.