Support Vector Machines Review

This blog is a review about support vector machines.

  • Maximize the distance
  • Hinge loss and slack variable
  • Primal and dual forms
  • Kernel trick
  • Transductive SVM

Maximize the distance

SVM was put forward by Cortes and Vapnik (1995)1.
For a set of labeled points, we want to find a decision boundary which separates the two classes to the furthest possible distances.
Consider two points from the hyperplanes and respectively: their distance could be denoted by:

To maximize this distance is essentially minimize .

Hinge loss and slack variable

The data are not always linearly separable. Introduce hinge loss

which is 0 when each data point is at correct side (a.k.a when or <-1 if -1 for the other class); and larger than 0 otherwise.
Now we have a quadratic programming problem:

subject to

Primal and dual form

Above mentioed is the primal formulation. Its dual form (as in quadratic programming sense?) is:

subject to , and
where are defined as

Kernel trick

We can replace the term with a higher order (kernel) function .

Transductive SVM

Transductive learning applies when there are very few labeled data. During training, look at those unlabeled data. Test the model on the unlabeled data. As a reminder, do not choose models based on any information related to unlabeled data.
Transductive SVM is proposed by (Joachims 1999)2. The algorithm could be written simplified as following.
(1) Train an SVM on the labeled data;
(2) Use it to label those test data;
(3) if there exists two newly labeled points such that their estimated labels are different, and their slack variables are both larger than 0, and that their slack variables sum up to be greater than 2 (a.k.a: this decision boundary is not very “confident” for these two points), then toggle the estimated labels of these two points.
(4) Repeat step (1)-(3) several times until some criteria is satisfied.

This version could be implemented fairly easily. There are other versions of transductive SVMs. e.g: this github repo contains some good examples.

References: