The concept of a neural network can be illustrated with the MNIST handwritten-digit recognition data set.

Each digit image can be broken up into a 28×28 grid of pixels, with each cell given a value indicating its brightness. The whole 28×28 grid is then flattened into a single row of 28 × 28 = 784 cells. Each cell feeds the network through a weight and a bias, initialized to random values. The concept originated with Rosenblatt's perceptron: fine-tuning knobs to figure out in which settings the bulb turns on.
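The flattening step can be sketched in a few lines of NumPy; the 28×28 array below is a hypothetical stand-in for one MNIST digit:

```python
import numpy as np

# A 28x28 array stands in for one MNIST digit (hypothetical data:
# each value is a brightness in [0, 1]).
image = np.random.rand(28, 28)

# Flatten the grid into a single row of 28 * 28 = 784 values --
# the input vector the network actually sees.
flat = image.reshape(-1)
print(flat.shape)  # (784,)
```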

Because there is training data, which is analogous to the experimental settings and results from adjusting the knobs until the light turns on, feedback can be constructed with mathematical tools: matrices and calculus.

To expand on the math, let's assume a very simple neural network composed of 4 layers, each layer with only one node (borrowed from 3Blue1Brown).

To measure the divergence from the actual value y (1.0 in this example), we define the cost C0, the cost function that we want to reduce.
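Written out in 3Blue1Brown's notation (a sketch for the one-node-per-layer chain, with σ the activation function), the cost of this single training example is:

```latex
C_0 = \left(a^{(L)} - y\right)^2,
\qquad
a^{(L)} = \sigma\!\left(z^{(L)}\right),
\qquad
z^{(L)} = w^{(L)} a^{(L-1)} + b^{(L)}
```

Here a^(L) is the activation of the last layer, produced from the previous activation a^(L−1) through the weight w^(L) and bias b^(L).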

To reduce the cost function, the calculus of limits and differentiation comes in handy. And because there are two variables, weight and bias, we need partial differentiation:
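By the chain rule (again following 3Blue1Brown's notation for the one-node chain), the sensitivity of the cost to the weight and to the bias factors as:

```latex
\frac{\partial C_0}{\partial w^{(L)}}
= \frac{\partial z^{(L)}}{\partial w^{(L)}}
  \frac{\partial a^{(L)}}{\partial z^{(L)}}
  \frac{\partial C_0}{\partial a^{(L)}},
\qquad
\frac{\partial C_0}{\partial b^{(L)}}
= \frac{\partial z^{(L)}}{\partial b^{(L)}}
  \frac{\partial a^{(L)}}{\partial z^{(L)}}
  \frac{\partial C_0}{\partial a^{(L)}}
```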

Taking the chain rule apart term by term, we get
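Each factor has a simple closed form, given the definitions of z^(L) and C0 above (a sketch; σ′ is the derivative of the activation function):

```latex
\frac{\partial z^{(L)}}{\partial w^{(L)}} = a^{(L-1)},
\qquad
\frac{\partial a^{(L)}}{\partial z^{(L)}} = \sigma'\!\left(z^{(L)}\right),
\qquad
\frac{\partial C_0}{\partial a^{(L)}} = 2\left(a^{(L)} - y\right)
```

so, multiplying the factors together:

```latex
\frac{\partial C_0}{\partial w^{(L)}}
= a^{(L-1)} \, \sigma'\!\left(z^{(L)}\right) \, 2\left(a^{(L)} - y\right),
\qquad
\frac{\partial C_0}{\partial b^{(L)}}
= \sigma'\!\left(z^{(L)}\right) \, 2\left(a^{(L)} - y\right)
```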

Once we grasp the math for this simple four-node network, the multiple-layer situation is just an expansion into matrix math:
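In matrix form (a sketch in 3Blue1Brown's notation, where j indexes neurons in layer L and k indexes neurons in layer L−1), each layer's pre-activation becomes a matrix-vector product, and the per-weight gradient picks up the corresponding indices:

```latex
z^{(L)} = W^{(L)} a^{(L-1)} + b^{(L)},
\qquad
\frac{\partial C_0}{\partial w_{jk}^{(L)}}
= a_k^{(L-1)} \, \sigma'\!\left(z_j^{(L)}\right) \, 2\left(a_j^{(L)} - y_j\right)
```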

Lastly, when the math is all laid out logically, what's left is to calculate. The calculation here is gradient descent: partial differentiation over multiple dimensions, following the steepest descent like sliding down a mountain into the valley.

    from numpy import array, genfromtxt

    # y = mx + b; m is slope, b is y-intercept

    def compute_error_for_line_given_points(b, m, points):
        totalError = 0
        for i in range(0, len(points)):
            x = points[i, 0]
            y = points[i, 1]
            totalError += (y - (m * x + b)) ** 2
        return totalError / float(len(points))

    def step_gradient(b_current, m_current, points, learningRate):
        b_gradient = 0
        m_gradient = 0
        N = float(len(points))
        for i in range(0, len(points)):
            x = points[i, 0]
            y = points[i, 1]
            b_gradient += -(2 / N) * (y - ((m_current * x) + b_current))
            m_gradient += -(2 / N) * x * (y - ((m_current * x) + b_current))
        new_b = b_current - (learningRate * b_gradient)
        new_m = m_current - (learningRate * m_gradient)
        return [new_b, new_m]

    def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iterations):
        b = starting_b
        m = starting_m
        for i in range(num_iterations):
            b, m = step_gradient(b, m, array(points), learning_rate)
        return [b, m]

    def run():
        points = genfromtxt("data.csv", delimiter=",")
        learning_rate = 0.0001
        initial_b = 0  # initial y-intercept guess
        initial_m = 0  # initial slope guess
        num_iterations = 1000
        print("Starting gradient descent at b = {0}, m = {1}, error = {2}".format(
            initial_b, initial_m,
            compute_error_for_line_given_points(initial_b, initial_m, points)))
        print("Running...")
        [b, m] = gradient_descent_runner(points, initial_b, initial_m, learning_rate, num_iterations)
        print("After {0} iterations b = {1}, m = {2}, error = {3}".format(
            num_iterations, b, m,
            compute_error_for_line_given_points(b, m, points)))

    if __name__ == '__main__':
        run()

The simple Python program above, adapted from Siraj Raval's GitHub, illustrates how to apply gradient descent to fit a linear regression model to a set of data (loaded from data.csv).