Backpropagation Calculus in Neural Networks

The concept of a neural network can be illustrated with the MNIST handwritten-digit recognition dataset.

Each digit image is a grid of 28×28 pixels, with each cell holding a value indicating its brightness. The whole 28×28 grid is flattened into one row of 28*28 = 784 input cells. Each cell feeds the network through a weight and a bias, both initialized to random values; the idea goes back to Rosenblatt's perceptron, where knobs were fine-tuned by hand to figure out in which state the bulb turns on.
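As a minimal sketch of that setup (the variable names and the random stand-in image here are illustrative, not from the original post):

import numpy as np

# a stand-in 28x28 grayscale image, each cell a brightness value in [0, 1]
image = np.random.rand(28, 28)

# flatten the 28x28 grid into one row of 28*28 = 784 input cells
inputs = image.flatten()            # shape (784,)

# one randomly initialized weight per input cell, plus a single bias
weights = np.random.randn(784)
bias = np.random.randn()

# weighted sum that a neuron in the next layer would pass through its activation
z = np.dot(weights, inputs) + bias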

Because there is training data – analogous to the experimental setup and results when adjusting the knobs to turn on the light – feedback can be constructed with mathematical tools: matrices and calculus.

To expand on the math, let's assume a very simple neural network composed of 4 layers, with only one node per layer (borrowed from 3Blue1Brown).

To measure the divergence of the network's output from the actual value y (1.0 in this example), we define C0, the cost function that we want to reduce.
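Written out (using the same notation as the 3Blue1Brown video, since the original diagram is not reproduced here), the last-layer activation and the cost for one training example are:

$$z^{(L)} = w^{(L)} a^{(L-1)} + b^{(L)}, \qquad a^{(L)} = \sigma\left(z^{(L)}\right), \qquad C_0 = \left(a^{(L)} - y\right)^2$$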

To reduce the cost function, the limits and differentiation of calculus come in handy. And because there are two kinds of variables, weights and biases, we need partial differentiation:
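The two sensitivities we are after, in the notation above, are:

$$\frac{\partial C_0}{\partial w^{(L)}} \quad \text{and} \quad \frac{\partial C_0}{\partial b^{(L)}}$$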

Taking it apart with the chain rule, we get:
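A sketch of the chain-rule expansion, under the definitions above:

$$\frac{\partial C_0}{\partial w^{(L)}} = \frac{\partial z^{(L)}}{\partial w^{(L)}}\,\frac{\partial a^{(L)}}{\partial z^{(L)}}\,\frac{\partial C_0}{\partial a^{(L)}} = a^{(L-1)}\,\sigma'\left(z^{(L)}\right)\,2\left(a^{(L)} - y\right)$$

$$\frac{\partial C_0}{\partial b^{(L)}} = \frac{\partial z^{(L)}}{\partial b^{(L)}}\,\frac{\partial a^{(L)}}{\partial z^{(L)}}\,\frac{\partial C_0}{\partial a^{(L)}} = \sigma'\left(z^{(L)}\right)\,2\left(a^{(L)} - y\right)$$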

Once we grasp the math for this simple 4-node network, the general multi-neuron, multi-layer case is just the same chain rule written out with matrix notation and extra indices:
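As a reconstruction in standard notation (not copied from the original figure), with j indexing neurons in layer L and k indexing neurons in layer L−1:

$$C_0 = \sum_j \left(a_j^{(L)} - y_j\right)^2, \qquad z_j^{(L)} = \sum_k w_{jk}^{(L)} a_k^{(L-1)} + b_j^{(L)}, \qquad a_j^{(L)} = \sigma\left(z_j^{(L)}\right)$$

$$\frac{\partial C_0}{\partial w_{jk}^{(L)}} = a_k^{(L-1)}\,\sigma'\left(z_j^{(L)}\right)\,2\left(a_j^{(L)} - y_j\right), \qquad \frac{\partial C_0}{\partial a_k^{(L-1)}} = \sum_j w_{jk}^{(L)}\,\sigma'\left(z_j^{(L)}\right)\,2\left(a_j^{(L)} - y_j\right)$$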

Lastly, when the math is all laid out logically, what's left is the computation itself: gradient descent. Taking partial derivatives across many dimensions and repeatedly stepping in the direction of steepest descent is like sliding from the mountains down into the valley.
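In symbols, every weight and bias is nudged against its own partial derivative, scaled by a learning rate η (this is the same update rule the Python code below applies to m and b):

$$w \leftarrow w - \eta\,\frac{\partial C}{\partial w}, \qquad b \leftarrow b - \eta\,\frac{\partial C}{\partial b}$$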

import numpy as np

# y = mx + b
# m is slope, b is y-intercept
def compute_error_for_line_given_points(b, m, points):
    # mean squared error of the line y = mx + b over all points
    totalError = 0
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        totalError += (y - (m * x + b)) ** 2
    return totalError / float(len(points))

def step_gradient(b_current, m_current, points, learningRate):
    # one gradient-descent step: accumulate the partial derivatives of the
    # mean squared error with respect to b and m, then move both parameters
    # a small step against their gradients
    b_gradient = 0
    m_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        b_gradient += -(2/N) * (y - ((m_current * x) + b_current))
        m_gradient += -(2/N) * x * (y - ((m_current * x) + b_current))
    new_b = b_current - (learningRate * b_gradient)
    new_m = m_current - (learningRate * m_gradient)
    return [new_b, new_m]

def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iterations):
    # repeat the update step num_iterations times
    b = starting_b
    m = starting_m
    for i in range(num_iterations):
        b, m = step_gradient(b, m, np.array(points), learning_rate)
    return [b, m]

def run():
    points = np.genfromtxt("data.csv", delimiter=",")
    learning_rate = 0.0001
    initial_b = 0  # initial y-intercept guess
    initial_m = 0  # initial slope guess
    num_iterations = 1000
    print("Starting gradient descent at b = {0}, m = {1}, error = {2}".format(
        initial_b, initial_m, compute_error_for_line_given_points(initial_b, initial_m, points)))
    print("Running...")
    [b, m] = gradient_descent_runner(points, initial_b, initial_m, learning_rate, num_iterations)
    print("After {0} iterations b = {1}, m = {2}, error = {3}".format(
        num_iterations, b, m, compute_error_for_line_given_points(b, m, points)))

if __name__ == '__main__':
    run()

This simple Python algorithm, cited from Siraj Raval's GitHub, illustrates how to apply gradient descent to fit a linear regression model to a set of data (read from a data.csv file saved alongside the script).
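The script expects each row of data.csv to hold one point as two comma-separated columns, x first and y second; a couple of made-up rows just to show the shape of the file:

1.0,2.3
2.0,4.1
3.0,5.8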
