Thanh Nghi Do, Hoai An Le Thi: Massive Classification with Support Vector Machines.

Abstract: The new boosting algorithms for Least-Squares SVM (LS-SVM), Proximal SVM (PSVM) and Newton SVM (NSVM) aim to classify very large datasets on standard personal computers (PCs). We extend PSVM, LS-SVM and NSVM in several ways to classify large datasets efficiently. We developed a row-incremental version for datasets with billions of data points. By adding a Tikhonov regularization term and using the Sherman-Morrison-Woodbury formula, we developed new algorithms to process datasets with a small number of data points but very high dimensionality. Finally, by applying boosting techniques, including AdaBoost and Arcx4, to these algorithms, we developed classification algorithms for massive, very-high-dimensional datasets. Numerical test results on large datasets from the UCI repository showed that our algorithms are often significantly faster and/or more accurate than the state-of-the-art algorithms LibSVM, CB-SVM, SVM-perf and LIBLINEAR.
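To make the row-incremental and Sherman-Morrison-Woodbury ideas mentioned in the abstract concrete, the following is a minimal NumPy sketch based on the standard linear PSVM formulation, in which the weights solve (I/nu + E'E) z = E'y with E = [A, -e]. It is an illustration under stated assumptions, not the authors' implementation: the function names, the parameter `nu`, and the chunking interface are hypothetical, and the boosting step is not shown.

```python
import numpy as np

def psvm_row_incremental(chunks, nu=1.0):
    """Linear PSVM trained chunk by chunk (hypothetical helper).

    chunks yields (A_i, y_i): A_i has shape (m_i, n), y_i holds labels in {-1, +1}.
    Only the (n+1) x (n+1) matrix E'E and the vector E'y are kept in memory,
    so the number of rows can be far larger than what fits in RAM.
    """
    G, r = None, None
    for A, y in chunks:
        E = np.hstack([A, -np.ones((A.shape[0], 1))])  # E = [A, -e]
        if G is None:
            G = np.zeros((E.shape[1], E.shape[1]))
            r = np.zeros(E.shape[1])
        G += E.T @ E  # accumulate E'E over row blocks
        r += E.T @ y  # accumulate E'De, where De equals y
    z = np.linalg.solve(G + np.eye(G.shape[0]) / nu, r)
    return z[:-1], z[-1]  # (w, gamma)

def psvm_smw(A, y, nu=1.0):
    """Same solution via the Sherman-Morrison-Woodbury identity (hypothetical helper).

    (I/nu + E'E)^{-1} = nu * (I - E' (I/nu + E E')^{-1} E),
    so only an m x m system is solved; useful when m << n.
    """
    m = A.shape[0]
    E = np.hstack([A, -np.ones((m, 1))])
    v = E.T @ y
    inner = np.linalg.solve(E @ E.T + np.eye(m) / nu, E @ v)
    z = nu * (v - E.T @ inner)
    return z[:-1], z[-1]

def predict(A, w, gamma):
    """Classify rows of A by the sign of x'w - gamma."""
    return np.sign(A @ w - gamma)
```

Both routines return the same (w, gamma) up to rounding error. The first solves a system whose size depends only on the number of features, the second one whose size depends only on the number of data points, which is why the choice between them follows the shape of the dataset.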


Keywords: Support vector machine (SVM) · Least-Squares SVM · Proximal SVM · Newton SVM · Boosting · Massive classification


Citation: T.N. Do, H.A. Le Thi, Massive Classification with Support Vector Machines, T. Computational Collective Intelligence 18: 1-19 (2015).