Having learned a great deal of ML theory, I now turn to the end goal: application.

Prof. Ng covered anomaly detection before moving on to two classic ML applications, recommender systems and photo OCR. In between, he also touched on techniques for dealing with large-scale data: map-reduce, data parallelism, stochastic gradient descent and online learning. So I will blend them all together into one blog post.
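To make the large-scale ideas a little more concrete, here is a minimal Python/NumPy sketch for linear regression, assuming a squared-error cost. It shows two of them: map-reduce, where the gradient sum is split across shards (simulated in one process here; `n_workers` is a hypothetical parameter) and the partial sums are added back together, and stochastic gradient descent, which updates θ after every single example instead of after a full pass over the data. The function names are my own, not from the course.

```python
import numpy as np

def batch_gradient(theta, X, y):
    """Gradient of the (1/2m) squared-error cost for linear regression."""
    m = len(y)
    return X.T @ (X @ theta - y) / m

def mapreduce_gradient(theta, X, y, n_workers=4):
    """Map: each (simulated) worker sums gradient terms over its own shard.
    Reduce: add the partial sums and divide by m once."""
    m = len(y)
    shards = np.array_split(np.arange(m), n_workers)
    partial_sums = [X[idx].T @ (X[idx] @ theta - y[idx]) for idx in shards]  # map step
    return sum(partial_sums) / m  # reduce step

def sgd(X, y, alpha=0.01, epochs=20, seed=0):
    """Stochastic gradient descent: update theta after every single example."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):  # visit examples in a fresh random order each pass
            error = X[i] @ theta - y[i]
            theta -= alpha * error * X[i]
    return theta
```

Because the per-example update in `sgd` never needs the whole dataset at once, the same loop shape is what online learning uses when examples arrive one at a time from a stream and are then discarded.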

The key takeaway throughout, for me, is how the problem solver comes up with an architectural structure to formulate the problem in ML math. Once you can describe the problem in the language of ML math, the solution surfaces almost automatically.

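First, anomaly detection. In my view, it is not necessarily part of ML; it fits more naturally in the statistical field, since it simply extends the one-dimensional Gaussian distribution to the multidimensional world: each feature x_j gets its own Gaussian with mean μ_j and variance σ_j², the density p(x) is the product of these per-feature densities, and an example is flagged as an anomaly when p(x) falls below a threshold ε.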
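A minimal sketch of that per-feature approach in Python with NumPy, assuming the data sit in an m × n array with one row per example. The helper names and the threshold `epsilon` are illustrative choices of mine; in practice ε would be tuned on a labeled validation set.

```python
import numpy as np

def fit_gaussian_per_feature(X):
    """Estimate the mean and variance of each feature independently."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # maximum-likelihood variance (divides by m)
    return mu, sigma2

def p_independent(X, mu, sigma2):
    """p(x) = product over features of the one-dimensional Gaussian density."""
    densities = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return densities.prod(axis=1)

def flag_anomalies(p, epsilon):
    """An example is flagged as an anomaly when its density falls below epsilon."""
    return p < epsilon
```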

In the real world, features are correlated with each other, so multiplying independent per-feature probability densities is not ideal. Instead, the covariance matrix of the features should be calculated and all features modeled at once with a multivariate Gaussian.
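The multivariate version can be sketched the same way: estimate the mean vector μ and the full covariance matrix Σ, then evaluate p(x) = exp(-½ (x − μ)^T Σ^(-1) (x − μ)) / ((2π)^(n/2) |Σ|^(1/2)). This is an assumed, minimal NumPy implementation with function names of my own; it requires Σ to be invertible, which in practice means having more examples than features and no redundant features.

```python
import numpy as np

def fit_multivariate_gaussian(X):
    """Mean vector and full n x n covariance matrix (maximum-likelihood estimate)."""
    mu = X.mean(axis=0)
    Sigma = (X - mu).T @ (X - mu) / len(X)
    return mu, Sigma

def p_multivariate(X, mu, Sigma):
    """Multivariate Gaussian density evaluated at every row of X."""
    n = mu.size
    diff = X - mu
    Sigma_inv = np.linalg.inv(Sigma)
    # Mahalanobis term (x - mu)^T Sigma^-1 (x - mu), one value per row
    mahal = np.einsum("ij,jk,ik->i", diff, Sigma_inv, diff)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * mahal) / norm
```

With two strongly correlated features, a point like (2, −2) looks only mildly unusual to the per-feature product, since each coordinate is just two standard deviations out, but the covariance term makes its density collapse toward zero because it breaks the correlation.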

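Moving on to recommender systems, here is how to describe the problem in a simplified case: each movie comes with given scores for how romantic and how action-packed it is, collected into a feature vector x, and each user j has a parameter vector θ^(j), so that the predicted rating is (θ^(j))^T x. Each user's θ^(j) is learned by linear regression on the ratings that user has already given.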
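A minimal sketch of that per-user fit, assuming each movie's feature vector is [1, romance, action] with a leading 1 for the bias term. It solves the regularized least-squares problem in closed form for brevity, rather than by gradient descent as in the course, and leaves the bias term unregularized; `fit_user_theta` and `predict` are hypothetical names.

```python
import numpy as np

def fit_user_theta(X, y, rated, lam=0.1):
    """Regularized least squares for one user.
    X: (n_movies, n_features) with a leading column of ones (bias term).
    y: (n_movies,) this user's ratings; rated: boolean mask of movies they rated."""
    Xr, yr = X[rated], y[rated]
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0  # do not regularize the bias term
    return np.linalg.solve(Xr.T @ Xr + lam * L, Xr.T @ yr)

def predict(X, theta):
    """Predicted rating theta^T x for every movie (rows of X)."""
    return X @ theta
```

Only the rows the user actually rated enter the fit; the learned θ then scores every movie, including the unrated ones, which is where the recommendations come from.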

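Beyond this simplified case there are more complex situations, for example when no romance or action scores are given for the movies in advance. If users instead label their own tastes (their θ), the algorithm can feed these θ parameters and the known ratings in to learn the movie features x. Alternating between learning θ from x and x from θ, or learning both simultaneously, to fine-tune the recommender system is what's called collaborative filtering.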
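Here is a minimal sketch of the "learn both simultaneously" variant, assuming the squared-error cost summed over rated entries only, with L2 regularization on both the movie features X and the user parameters Θ, minimized by plain gradient descent. The matrix R marks which ratings exist; the number of latent features `k`, the learning rate and the iteration count are illustrative choices, not values from the course.

```python
import numpy as np

def cofi_cost_and_grads(X, Theta, Y, R, lam):
    """X: (n_movies, k) movie features, Theta: (n_users, k) user parameters,
    Y: (n_movies, n_users) ratings, R: 1 where a rating exists, 0 otherwise."""
    E = (X @ Theta.T - Y) * R  # prediction errors, zeroed where there is no rating
    J = 0.5 * np.sum(E ** 2) + 0.5 * lam * (np.sum(X ** 2) + np.sum(Theta ** 2))
    X_grad = E @ Theta + lam * X
    Theta_grad = E.T @ X + lam * Theta
    return J, X_grad, Theta_grad

def collaborative_filtering(Y, R, k=2, lam=0.1, alpha=0.01, iters=5000, seed=0):
    """Gradient descent on X and Theta at the same time."""
    rng = np.random.default_rng(seed)
    n_movies, n_users = Y.shape
    # small random initialization breaks symmetry so the learned features can differ
    X = 0.1 * rng.standard_normal((n_movies, k))
    Theta = 0.1 * rng.standard_normal((n_users, k))
    for _ in range(iters):
        _, X_grad, Theta_grad = cofi_cost_and_grads(X, Theta, Y, R, lam)
        X -= alpha * X_grad
        Theta -= alpha * Theta_grad
    return X, Theta
```

After training, X @ Theta.T fills in a predicted rating for every (movie, user) pair, including the entries where R is 0.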

Next, we went a bit deeper into photo OCR (Optical Character Recognition), because it is both interesting and powerful. For example, the photo below shows an ordinary scene: you park the car and look up to find a banner sign reading "Lulab's Antique Mall". What if a machine could read it for you, so that even a blind person could get around like anyone else?

The first mind-boggling obstacle is how to approach this problem at all. The answer goes back to the digitized pixels the machine reads in: it scans the image row by row with a sliding window, running a classifier on each patch. Prof. Ng also talked about synthesizing data and ceiling analysis, but didn't go down to the math equations.
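The sliding-window scan can be sketched in a few lines of NumPy. This assumes a grayscale image stored as a 2-D array with values in [0, 1] (dark ink near 0) and a fixed window size and step. The `toy_classifier` is a hypothetical stand-in that just measures the fraction of dark pixels; a real OCR pipeline would use a trained text-versus-background classifier and typically scan at several window sizes.

```python
import numpy as np

def sliding_windows(image, win_h, win_w, step):
    """Yield (row, col, patch) for every window position, scanning row by row."""
    H, W = image.shape
    for r in range(0, H - win_h + 1, step):
        for c in range(0, W - win_w + 1, step):
            yield r, c, image[r:r + win_h, c:c + win_w]

def detect(image, classifier, win_h, win_w, step, threshold=0.5):
    """Run a patch classifier at every window; keep positions scoring above threshold."""
    return [(r, c) for r, c, patch in sliding_windows(image, win_h, win_w, step)
            if classifier(patch) > threshold]

def toy_classifier(patch):
    # hypothetical stand-in, not a trained model: fraction of dark ("ink") pixels
    return float(np.mean(patch < 0.5))
```

A smaller step finds text more reliably at the cost of evaluating many more patches, which is one reason the classifier inside the loop needs to be cheap.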