Viet Anh Nguyen, Hoai An Le Thi, Hoai Minh Le, A DCA Based Algorithm for Feature Selection in Model-Based Clustering

Abstract: Gaussian Mixture Models (GMM) are a model-based clustering approach that has been used in many applications thanks to their flexibility and effectiveness. However, on high-dimensional data, GMM-based clustering loses its advantages due to over-parameterization and noise features. To deal with this issue, we incorporate feature selection into GMM clustering. For the first time, a non-convex sparsity-inducing regularization is considered for feature selection in GMM clustering. The resulting optimization problem is nonconvex, and we develop a DCA (Difference of Convex functions Algorithm) to solve it. Numerical experiments on several benchmark and synthetic datasets illustrate the efficiency of our algorithm and its superiority over an EM method for solving GMM clustering with l1 regularization.


Keywords: Model-based clustering, Gaussian Mixture Models, Variable selection, Non-convex regularization, DC programming, DCA.


Citation: Nguyen V.A., Le H.M., Le Thi H.A. (2020) A DCA Based Algorithm for Feature Selection in Model-Based Clustering. In: Nguyen N., Jearanaitanakij K., Selamat A., Trawiński B., Chittayasothorn S. (eds) Intelligent Information and Database Systems. ACIIDS 2020. Lecture Notes in Computer Science, vol 12033, pp. 404-415. Springer, Cham.

