T.N. Do, H.A. Le Thi, Massive Classification with Support Vector Machines.

Abstract: The new boosting versions of the Least-Squares SVM (LS-SVM), Proximal SVM (PSVM) and Newton SVM (NSVM) algorithms aim to classify very large datasets on standard personal computers (PCs). We extend PSVM, LS-SVM and NSVM in several ways to classify large datasets efficiently. We developed a row-incremental version for datasets with billions of data points. By adding a Tikhonov regularization term and using the Sherman-Morrison-Woodbury formula, we developed new algorithms to process datasets with a small number of data points but very high dimensionality. Finally, by applying boosting, including AdaBoost and Arcx4, to these algorithms, we developed classification algorithms for massive, very-high-dimensional datasets. Numerical test results on large datasets from the UCI repository showed that our algorithms are often significantly faster and/or more accurate than the state-of-the-art algorithms LibSVM, CB-SVM, SVM-perf and LIBLINEAR.
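The row-incremental idea mentioned in the abstract can be illustrated with a minimal sketch, assuming the standard linear PSVM formulation, in which the model is obtained by solving (I/nu + E^T E) z = E^T d with E = [A, -e], A the data rows and d the ±1 labels. This is not the paper's implementation and omits boosting; the function names `psvm_fit_incremental` and `psvm_predict` and the parameter `nu` are hypothetical, chosen only for illustration. Because E^T E and E^T d are sums over rows, they can be accumulated chunk by chunk, so only a small (n+1) x (n+1) matrix stays in memory.

```python
import numpy as np


def psvm_fit_incremental(chunks, n_features, nu=1.0):
    """Fit a linear PSVM by accumulating small matrices over row chunks.

    chunks yields (A, d) pairs: A has shape (m, n_features), d has shape (m,)
    with labels in {-1, +1}. Names and signature are illustrative assumptions.
    """
    k = n_features + 1
    EtE = np.zeros((k, k))  # running sum of E^T E
    Etd = np.zeros(k)       # running sum of E^T d
    for A, d in chunks:
        # E = [A, -e]: append a column of -1 for the offset gamma
        E = np.hstack([A, -np.ones((A.shape[0], 1))])
        EtE += E.T @ E
        Etd += E.T @ d
    # Solve the (n+1) x (n+1) regularized linear system once at the end
    z = np.linalg.solve(np.eye(k) / nu + EtE, Etd)
    return z[:-1], z[-1]  # w, gamma


def psvm_predict(A, w, gamma):
    """Classify rows of A with the separating plane x.w - gamma = 0."""
    return np.where(A @ w - gamma >= 0, 1.0, -1.0)
```

Passing a generator that reads blocks of rows from disk gives the same solution as a single-batch fit, up to floating-point rounding, since the accumulated sums are identical.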


Keywords: Support vector machine (SVM), Least-Squares SVM, Proximal SVM, Newton SVM, Boosting, Massive classification.


Citation: Thanh Nghi Do, Hoai An Le Thi, Massive Classification with Support Vector Machines. Transactions on Computational Collective Intelligence XVIII, Volume 9240, pp. 147-165, 2015.

