Systematic learning of Professor Ng’s machine learning course on YouTube.

First, a recap of the major topics he covered in this 112-video series:

Throughout the whole machine learning course, we need to grasp its essential concepts. The cost function is one of them, and linear regression is an easy way to understand it. For example, the squared error function, one type of cost function for a single-feature linear regression problem, is illustrated below:
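In the course's usual notation (with the hypothesis h_theta(x) = theta0 + theta1 * x and m training examples), the squared error cost function can be written as:

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2
```

The 1/(2m) factor averages the errors; the extra 1/2 simply cancels when taking derivatives later.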

The parameters theta0 and theta1 are adjusted to find the best-fit regression line, as shown below.

So the key here is to come up with an algorithm that automatically finds the (theta0, theta1) pair residing at the bottom, or point of convergence, of the bowl-shaped surface above.

This algorithm is gradient descent. It is actually just partial derivatives from calculus; note that theta0 and theta1 must be updated simultaneously, which is consistent with the concept of partial derivatives.
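The update rule, written with the squared error cost function J(theta0, theta1) defined earlier, takes a step in the direction of each partial derivative:

```latex
\begin{aligned}
\theta_0 &:= \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)
         = \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) \\
\theta_1 &:= \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)
         = \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x^{(i)}
\end{aligned}
```

"Simultaneously" means both right-hand sides are evaluated with the old values of theta0 and theta1 before either is overwritten.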

Alpha is the learning rate; in calculus terms, it is the stride you take at each step while descending toward convergence as quickly as possible. Intuitively, although setting alpha to a large value can speed things up, it could also prevent you from reaching the right minimum, since a large stride may step right over a narrow valley.
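The simultaneous update and the role of alpha can be sketched in a few lines of Python. This is a minimal illustration, not code from the course; the function name, the example data, and the alpha value are all made up for demonstration.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for single-feature linear regression.

    alpha is the learning rate (the stride of each step); a value too
    large can overshoot the minimum, a value too small converges slowly.
    """
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        h = theta0 + theta1 * x  # hypothesis h_theta(x) for all examples
        # Compute BOTH partial derivatives before updating either theta,
        # so the update is simultaneous.
        grad0 = (1.0 / m) * np.sum(h - y)
        grad1 = (1.0 / m) * np.sum((h - y) * x)
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Fit a line to points that lie exactly on y = 2x + 1 (hypothetical data).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
t0, t1 = gradient_descent(x, y)
```

With this well-conditioned toy data the iterates converge close to the true intercept 1 and slope 2; try a much larger alpha and the updates diverge instead.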

To sum up, we apply the gradient descent algorithm to minimize/optimize the cost function J(theta0, theta1) of the hypothesis function h_theta(x).