Another ML algorithm worth learning is the support vector machine (SVM).

The math behind it is derived from logistic regression: the two log-loss terms are replaced with piecewise-linear costs, and the regularization weighting is rearranged, which gives the SVM cost function.
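For reference, one common way to write the resulting objective is sketched here. The symbols C (a weight on the data term), cost_1 and cost_0 (the piecewise-linear replacements for the two log-loss terms), m (number of examples) and n (number of features) are assumptions taken from the usual textbook presentation, not from the text above.

```latex
\min_{\theta} \; C \sum_{i=1}^{m} \left[ y^{(i)} \, \mathrm{cost}_1\!\left(\theta^{T} x^{(i)}\right) + \left(1 - y^{(i)}\right) \mathrm{cost}_0\!\left(\theta^{T} x^{(i)}\right) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^{2}
```

In this form C plays roughly the role of 1/lambda in regularized logistic regression: a large C puts more weight on fitting the training data, a small C puts more weight on keeping theta small.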

More concretely, its main purpose is to achieve a larger margin, that is, a greater distance between the data points and the decision boundary.

Next, kernels are introduced to create new features, so that the SVM can fit non-linear decision boundaries. The key is to pick landmarks l1, l2, l3 and calculate the similarity between x and each landmark l to come up with a new feature f. In this example, a Gaussian kernel is used: f = exp(-||x - l||^2 / (2 * sigma^2)). If x and l are very close, f will be close to 1; otherwise, f will be close to 0.
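The similarity computation can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the function name `gaussian_kernel`, the bandwidth parameter `sigma`, and the sample points are all assumptions made for the example.

```python
import math

def gaussian_kernel(x, landmark, sigma=1.0):
    """Gaussian similarity between a point x and a landmark.

    Returns exactly 1.0 when the two coincide and decays toward 0
    as the squared Euclidean distance grows.
    """
    sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Hypothetical points, chosen only to show the two regimes.
x = [1.0, 2.0]
near = [1.0, 2.1]    # almost on top of x
far = [8.0, -5.0]    # far away from x

print(gaussian_kernel(x, near))  # close to 1
print(gaussian_kernel(x, far))   # close to 0
```

A larger `sigma` makes the similarity fall off more slowly with distance, while a smaller one makes each landmark influence only its immediate neighbourhood.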

Then replace the original feature values x with the new features f and plug them into the hypothesis, so that theta0 + theta1*x1 + theta2*x2 + theta3*x3 becomes theta0 + theta1*f1 + theta2*f2 + theta3*f3.
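As a rough sketch of that substitution, the snippet below evaluates theta0 + theta1*f1 + theta2*f2 + theta3*f3 for a point. The landmark coordinates and theta values are hypothetical, picked by hand rather than learned, and reading a non-negative score as the positive class is the usual convention assumed here, not something stated above.

```python
import math

def gaussian_kernel(x, landmark, sigma=1.0):
    """Gaussian similarity between x and a landmark (1.0 when identical)."""
    sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def hypothesis(x, landmarks, theta, sigma=1.0):
    """Compute theta[0] + theta[1]*f1 + theta[2]*f2 + ...

    Each f_i is the similarity between x and the i-th landmark, so the
    original coordinates of x never enter the sum directly.
    """
    features = [gaussian_kernel(x, l, sigma) for l in landmarks]
    return theta[0] + sum(t * f for t, f in zip(theta[1:], features))

# Hypothetical landmarks l1, l2, l3 and hand-picked (not learned) parameters.
landmarks = [[1.0, 1.0], [5.0, 5.0], [9.0, 1.0]]
theta = [-0.5, 1.0, 1.0, 0.0]

score_near = hypothesis([1.0, 1.0], landmarks, theta)    # sits on l1
score_far = hypothesis([20.0, 20.0], landmarks, theta)   # far from every landmark

print(score_near >= 0)  # True: positive side of the boundary
print(score_far >= 0)   # False: all similarities are near 0, leaving only theta0
```

With these parameters, points near l1 or l2 land on the positive side, and points far from every landmark fall back to the sign of theta0. This is how a handful of landmarks can carve out a non-linear region.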

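Then the question is how to choose the landmarks. It turns out that placing one landmark at each training example is the best way to start. So if there are 10,000 training examples, 10,000 features f are created for every data point (the feature that compares a point with its own landmark is exactly 1, i.e. 100% similarity), and the hypothesis therefore contains 10,000 features plus the intercept term. Kernels go well with SVMs because SVM implementations use computational tricks to handle this intensive computation efficiently, such as replacing the regularization term Theta^T Theta with Theta^T M Theta, where M is a matrix that depends on the kernel. Those tricks do not carry over well to other algorithms such as plain linear regression or logistic regression, where kernels would run much more slowly.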
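To make the landmark choice concrete, here is a small sketch that uses every training example as a landmark and builds the resulting feature matrix. The helper name `kernel_features` and the four-point training set are assumptions for illustration; a real SVM library would not build this matrix with nested Python loops.

```python
import math

def gaussian_kernel(x, landmark, sigma=1.0):
    """Gaussian similarity between x and a landmark (1.0 when identical)."""
    sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_features(X, sigma=1.0):
    """Use every row of X as a landmark.

    Returns an m x m matrix F where F[i][j] is the similarity between
    example i and landmark j (which is example j itself).
    """
    return [[gaussian_kernel(xi, lj, sigma) for lj in X] for xi in X]

# Hypothetical tiny training set with m = 4 examples.
X_train = [[0.0, 0.0], [0.0, 1.0], [3.0, 4.0], [10.0, 10.0]]
F = kernel_features(X_train)

for row in F:
    print([round(v, 3) for v in row])
```

Each example gets m new features and the diagonal is all ones, since every point is identical to its own landmark. With 10,000 training examples the same matrix would hold 100 million entries, which is why the efficiency of the SVM solver matters.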