Support vector machines (SVMs) are a popular machine learning tool.
We will consider a binary classification problem with positive $y=+1$ and negative $y=-1$ classes.
The goal of a support vector machine is to separate the two classes using a line that maximizes the margin, i.e. the minimal distance from the data to the decision boundary.

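There are two types of large margin classification: hard margin, where no training point is allowed inside the margin, and soft margin, where some margin violations are tolerated.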


For our linear model the value
$$ z = w_0 + \vec x\cdot\vec w $$
is proportional to the signed distance to the $z=0$ line:
$$ d = \frac{z}{||\vec w||} $$
By rescaling $\vec w$ and $w_0$ by a common factor we keep the decision boundary where it is but change the relationship between the distance and the value of $z$.
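The effect of rescaling can be checked numerically. The following is a minimal sketch in plain Python; the helper names `decision_value` and `signed_distance` and the example numbers are illustrative assumptions, not part of the lecture material.

```python
import math

def decision_value(x, w, w0):
    """z = w0 + x . w for a linear model."""
    return w0 + sum(xi * wi for xi, wi in zip(x, w))

def signed_distance(x, w, w0):
    """d = z / ||w||, the signed distance of x to the z = 0 line."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return decision_value(x, w, w0) / norm

x = (3.0, 4.0)
w, w0 = (3.0, 4.0), -5.0          # ||w|| = 5

z = decision_value(x, w, w0)
d = signed_distance(x, w, w0)

# Rescale w and w0 by the same factor: the boundary z = 0 does not move,
# so the distance stays the same, but the value of z changes.
scale = 0.5
w_s = tuple(scale * wi for wi in w)
w0_s = scale * w0

z_s = decision_value(x, w_s, w0_s)
d_s = signed_distance(x, w_s, w0_s)

print(f"original: z = {z}, d = {d}")
print(f"rescaled: z = {z_s}, d = {d_s}")
```

Halving $\vec w$ and $w_0$ halves $z$ for the same point while $d$ is unchanged, which is why fixing the margin at $z=\pm 1$ ties the margin width to $||\vec w||$.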
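In an SVM we declare our margin to be the region between $z=1$ and $z=-1$, whose edges lie at a distance $1/||\vec w||$ from the decision boundary, and find the values of $w_0,\vec w$ that make the margin as wide as possible while leaving as few training points as possible inside the margin or on the wrong side of it.
The two goals are in conflict!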
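One common way to make this conflict concrete is the standard soft-margin objective $\frac{1}{2}||\vec w||^2 + C\sum_i \max(0, 1 - y_i z_i)$, which these notes do not state explicitly, so treat it as an assumption here. The sketch below evaluates it for two hand-picked candidate models on a made-up one-dimensional dataset; the function name, the data, and the values of `C` are illustrative only, and neither candidate is claimed to be the optimum.

```python
def soft_margin_objective(w, w0, xs, ys, C):
    """0.5 * w^2 + C * (sum of hinge losses) for one-dimensional features."""
    margin_term = 0.5 * w * w  # small w means a wide margin
    violations = sum(max(0.0, 1.0 - y * (w0 + w * x)) for x, y in zip(xs, ys))
    return margin_term + C * violations

# Toy data: negative class on the left, positive class on the right
xs = [-3.0, -2.0, -0.5, 0.5, 2.0, 3.0]
ys = [-1, -1, -1, 1, 1, 1]

narrow = (2.0, 0.0)  # margin edges at x = -0.5 and 0.5, no point inside
wide = (0.5, 0.0)    # margin edges at x = -2 and 2, two points inside

for C in (0.1, 10.0):
    j_narrow = soft_margin_objective(*narrow, xs, ys, C)
    j_wide = soft_margin_objective(*wide, xs, ys, C)
    preferred = "wide" if j_wide < j_narrow else "narrow"
    print(f"C = {C}: narrow = {j_narrow:.3f}, wide = {j_wide:.3f}, preferred: {preferred}")
```

With a small `C` the wide margin with two violations has the lower objective; with a large `C` the narrow, violation-free margin is preferred.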
Here we use the iris dataset again, with the features rescaled to have zero mean and unit standard deviation.
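A possible version of this setup is sketched below. It assumes scikit-learn, the two petal features, and "virginica vs. the rest" as the binary target; the notes do not specify which features, classes, or value of `C` were used, so these are placeholders.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
X = iris.data[:, 2:4]                    # petal length and petal width (assumed choice)
y = np.where(iris.target == 2, 1, -1)    # +1 for virginica, -1 otherwise (assumed choice)

# Rescale each feature to zero mean and unit standard deviation
X_scaled = StandardScaler().fit_transform(X)

# Linear SVM; C = 1.0 is an arbitrary illustrative value
clf = SVC(kernel="linear", C=1.0).fit(X_scaled, y)

w = clf.coef_[0]
w0 = clf.intercept_[0]
print("w =", w, "w0 =", w0)
print("distance from boundary to margin edge:", 1.0 / np.linalg.norm(w))
print("training accuracy:", clf.score(X_scaled, y))
```

Rescaling matters here because the margin is measured as a distance in feature space, so a feature with a much larger numerical range would otherwise dominate it.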



Here we use the cancer data set we used for previous lectures and exercises.



Adding data to the training set only affects the model if the additional point falls inside the margin or on the wrong side of it.
The model is completely defined by the data samples at the boundary of or inside the margin. This is where the name comes from: these data samples are the "support" vectors.
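This property can be checked on a small example. The sketch below assumes scikit-learn's `SVC` with a linear kernel and a large `C` to approximate a hard margin; the toy points, the helper `fit_linear_svm`, and the tolerance setting are illustrative choices, not taken from the lecture.

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data
X = np.array([[2.0, 2.0], [3.0, 3.0], [4.0, 1.0],
              [0.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

def fit_linear_svm(X, y):
    """Fit a (nearly) hard-margin linear SVM and return it with (w_1, w_2, w_0)."""
    clf = SVC(kernel="linear", C=1000.0, tol=1e-6).fit(X, y)
    return clf, np.append(clf.coef_[0], clf.intercept_[0])

clf_full, params_full = fit_linear_svm(X, y)
print("support vectors:\n", clf_full.support_vectors_)

# Refit using only the support vectors of the first fit
sv = clf_full.support_
_, params_sv = fit_linear_svm(X[sv], y[sv])

# Add a positive point far outside the margin
_, params_far = fit_linear_svm(np.vstack([X, [5.0, 5.0]]), np.append(y, 1))

# Add a positive point inside the margin of the first fit
_, params_in = fit_linear_svm(np.vstack([X, [1.5, 1.5]]), np.append(y, 1))

print("full data:         ", params_full)
print("support vecs only: ", params_sv)
print("extra far point:   ", params_far)
print("extra margin point:", params_in)
```

The first three parameter vectors should agree up to solver tolerance, while the point placed inside the margin shifts the solution.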
Note: Unlike in the logistic regression case, there is no probabilistic interpretation for an SVM.