Rémi Flamary

SVM and regularization

In this demo, we illustrate the effect of regularization and kernel parameter choice on the decision function of support vector machines (SVM).

Short introduction to SVM

The decision function of a support vector machine classifier is obtained by solving the following optimization problem:

$$\min_{f} \quad C \sum_{i=1}^{n} \max(0, 1 - y_i f(\mathbf{x}_i)) + \|f\|^2$$
where C is a regularization parameter that balances the data fitting term (left-hand term) and the regularization term (right-hand term). Note that the data fitting term uses a training set $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ that consists of a list of samples $\mathbf{x}_i$ and their associated classes $y_i \in \{-1, 1\}$.
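
To make the role of C concrete, here is a minimal sketch that evaluates such an objective for a fixed candidate function. It assumes the usual hinge-loss data fitting term and a squared-norm regularizer, and it restricts f to a linear function f(x) = w.x + b so that the norm of f is simply the norm of w. The names `hinge_objective`, `samples`, `labels`, `w`, and `b` are hypothetical and the three points are made up for illustration.

```python
def hinge_objective(w, b, samples, labels, C):
    """Evaluate C * sum_i max(0, 1 - y_i * f(x_i)) + ||w||^2 for f(x) = w.x + b.

    Assumption: linear f and hinge loss; this only evaluates the objective,
    it does not minimize it.
    """
    data_fit = 0.0
    for x, y in zip(samples, labels):
        f_x = sum(wj * xj for wj, xj in zip(w, x)) + b
        data_fit += max(0.0, 1.0 - y * f_x)  # zero when the sample is beyond the margin
    regularization = sum(wj * wj for wj in w)
    return C * data_fit + regularization


# Made-up 2D samples: the third one is on the wrong side of the boundary.
samples = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
labels = [1, -1, -1]
w, b = (1.0, 0.0), 0.0

for C in (0.1, 10.0):
    print(C, hinge_objective(w, b, samples, labels, C))
```

With a small C the misclassified sample contributes little and the regularization term dominates; with a large C the same error dominates the objective.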

One of the strengths of SVMs is their ability to choose a complex representation of the data through the use of a kernel function $k(\cdot, \cdot)$ that measures the similarity between samples. The decision function is of the form

$$f(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i k(\mathbf{x}, \mathbf{x}_i) + b$$
In practice, the Gaussian kernel (also known as RBF), defined as
$$k(\mathbf{x}, \mathbf{x}') = \exp(-\gamma \|\mathbf{x} - \mathbf{x}'\|^2)$$
is often used when the decision function has to handle non-linearities.
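
As an illustration, the sketch below computes a Gaussian kernel and a kernel decision function in plain Python. It assumes the common parameterization k(x, x') = exp(-gamma * ||x - x'||^2) and a decision function f(x) = sum_i alpha_i k(x, x_i) + b. The coefficients `alphas` and the offset `b` are hypothetical values chosen by hand; in a real SVM they come out of the training step.

```python
import math


def gaussian_kernel(x, x_prime, gamma):
    """Gaussian (RBF) similarity: 1 for identical points, decaying with distance."""
    sq_dist = sum((a - c) ** 2 for a, c in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)


def decision_function(x, train_samples, alphas, b, gamma):
    """Weighted sum of similarities to the training samples, plus an offset."""
    return sum(
        alpha * gaussian_kernel(x, xi, gamma)
        for alpha, xi in zip(alphas, train_samples)
    ) + b


# Hypothetical model: one sample per class, signed coefficients set by hand.
train_samples = [(0.0, 0.0), (1.0, 1.0)]
alphas = [1.0, -1.0]
b = 0.0

for point in [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]:
    print(point, decision_function(point, train_samples, alphas, b, gamma=1.0))
```

The sign of the returned value gives the predicted class: it is positive near the first sample, negative near the second, and zero at the midpoint between them.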

When using an SVM with a Gaussian kernel, one has to select two important parameters: C and $\gamma$. In this demo, we illustrate the effect of those parameters on the final decision function of the SVM.

Dataset used in the demo

In this demo, we illustrate SVM using a 2D non-linear toy dataset known as "Clown". The main advantage of a 2D example is that the samples of each class are easy to plot and visualize in a classical scatter plot, as shown below.

[Figure: training samples]

In this figure, we can see that a non-linear function has to be used for a correct classification, but that the complexity of the required function is limited. As illustrated in the next section, the parameters have to be chosen carefully.

Regularization demo

[Figure: classification result]

Regularization parameter C: current value C=10
Kernel parameter gamma: current value gamma=10
Performance (rec. rate): RR=0.9754

The C parameter balances the data fitting and regularization terms; a small C promotes a smooth decision function. The parameter $\gamma$ of the Gaussian kernel is also extremely important, as it defines the neighborhood of the samples: a large $\gamma$ leads to a more complex function. The accuracy of the classifier on a large test sample is also reported as RR (for recognition rate).
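
The effect of both parameters can be reproduced offline with a short scikit-learn sketch. This is not the code behind the demo: the "Clown" dataset is replaced here by `make_moons`, a different 2D non-linear toy problem, and the sample sizes, noise level, and parameter grid are arbitrary assumptions. The recognition rate is computed with `SVC.score` on held-out samples.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data (assumption): make_moons is NOT the "Clown" dataset,
# only a comparable 2D problem that needs a non-linear boundary.
X, y = make_moons(n_samples=2000, noise=0.2, random_state=0)

# Small training set, large test set for the recognition rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=200, random_state=0
)

results = {}
for C in (0.1, 10.0):
    for gamma in (0.1, 10.0):
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
        # Fraction of correctly classified test samples.
        results[(C, gamma)] = clf.score(X_test, y_test)

for (C, gamma), rr in sorted(results.items()):
    print(f"C={C:<5} gamma={gamma:<5} RR={rr:.4f}")
```

Comparing the printed rates across the grid gives a rough numerical counterpart to moving the two sliders of the demo.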

References

For more information about support vector machines, I strongly recommend [1], which is a classic introduction. A very good course by Stéphane Canu is also available online [2].

The figures have been generated using Python, NumPy, and scikit-learn. The code is available here.

[1] Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, B. Schölkopf, A. J. Smola, 2001, MIT Press.

[2] Understanding SVM, S. Canu.