- Writer: Sunjun Kweon
- Title: (cs231n) Lecture 4 : Introduction to Neural Networks
- Link: http://cs231n.stanford.edu/slides/2020/lecture_4.pdf
- Keywords: Neural Networks, Jacobians, Backpropagation
-
Motivation : Linear classifiers are not very powerful. It is hard to classify data that cannot be separated by a single line (hyperplane)
-
Stacking multiple layers with non-linearity lets the network express more complex functions
- Instead of just the linear score s=W1x, a 2-layer neural network's score is s=W2f(W1x) (see the sketch below)
- f, which introduces the non-linearity, is called the activation function. Popular choices for activation functions include sigmoid, tanh, and ReLU
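A minimal numpy sketch of this score function (my own illustration, not from the slides), using ReLU as the activation f and arbitrary toy shapes:

```python
import numpy as np

def two_layer_score(x, W1, W2):
    """Score of a 2-layer network: s = W2 f(W1 x), with f = ReLU."""
    h = np.maximum(0, W1 @ x)   # hidden activations after the non-linearity
    s = W2 @ h                  # class scores
    return s

# toy shapes: 4-dim input, 10 hidden units, 3 classes
rng = np.random.default_rng(0)
x  = rng.standard_normal(4)
W1 = rng.standard_normal((10, 4))
W2 = rng.standard_normal((3, 10))
print(two_layer_score(x, W1, W2).shape)  # (3,)
```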
- Consider a vector function f(x)=(f1(x), f2(x), ..., fm(x)) and a small change Δx.
f(x+Δx) = [f1(x+Δx), f2(x+Δx), ..., fm(x+Δx)] ≈ f(x) + [∇f1(x), ..., ∇fm(x)]^T Δx
The Jacobian of f at x is the matrix [∇f1(x), ..., ∇fm(x)]^T, whose (i, j) entry is ∂fi/∂xj.
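One way to sanity-check this definition is a finite-difference approximation, where row i of the result approximates ∇fi(x)^T. The helper below is my own illustrative sketch, not part of the lecture:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the Jacobian J of f at x, with J[i, j] = d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(f(x), dtype=float)
    J = np.zeros((y.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)   # central difference
    return J

# example: f(x) = (x0 * x1, sin(x0)) from R^2 to R^2
f = lambda x: np.array([x[0] * x[1], np.sin(x[0])])
print(numerical_jacobian(f, [1.0, 2.0]))
# roughly [[2, 1], [cos(1), 0]]
```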
- Example : R^m to R^n (y=Ax)
The Jacobian is the generalization of the derivative to a multidimensional mapping; here the Jacobian dy/dx is simply A (an n×m matrix).
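A quick numerical check of this fact (my own sketch; the 3×5 shape is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # A maps R^5 -> R^3 (m = 5, n = 3)
x = rng.standard_normal(5)

# finite-difference Jacobian of y = A x with respect to x
eps = 1e-6
J = np.zeros((3, 5))
for j in range(5):
    dx = np.zeros(5)
    dx[j] = eps
    J[:, j] = (A @ (x + dx) - A @ (x - dx)) / (2 * eps)

print(np.allclose(J, A))  # True: the Jacobian of y = Ax is A itself
```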
- Example : R^(m×n) to R (y=f(X))
The derivative dy/dX is itself an m×n matrix (the matrix derivative), with (i, j) entry ∂y/∂Xij.
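A simple illustration of my own (not the slide's example): for y equal to the sum of squared entries of X, the matrix derivative is 2X and has the same shape as X:

```python
import numpy as np

X = np.arange(6, dtype=float).reshape(2, 3)

# y = f(X) = sum of squared entries, a map from R^(2x3) to R
y = np.sum(X ** 2)

# the matrix derivative dy/dX has the same shape as X; for this f it is 2X
dydX = 2 * X
print(y, dydX.shape)  # 55.0 (2, 3)
```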
- Example : R^(mn) to R^k (y=Wx, where W is an m×n matrix and x is an n-dim vector, so k=m)
The Jacobian of y with respect to the flattened W has dimension k×(m·n); its i-th row is ∂yi/∂W, which is x^T in the columns corresponding to the i-th row of W and zero elsewhere.
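A sketch of this structure (my own illustration): the flattened Jacobian equals the Kronecker product of the m×m identity with x^T, which the code checks against finite differences:

```python
import numpy as np

m, n = 3, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# Jacobian of y = W x with respect to the row-major flattened W: shape (m, m*n).
# Row i holds x^T in the block belonging to W's i-th row and zeros elsewhere,
# i.e. the Kronecker product of the m-by-m identity with x^T.
J = np.kron(np.eye(m), x)                     # shape (m, m*n)

# finite-difference check over the flattened W
eps = 1e-6
J_num = np.zeros((m, m * n))
for k in range(m * n):
    dW = np.zeros(m * n)
    dW[k] = eps
    y_plus = (W.flatten() + dW).reshape(m, n) @ x
    y_minus = (W.flatten() - dW).reshape(m, n) @ x
    J_num[:, k] = (y_plus - y_minus) / (2 * eps)

print(J.shape, np.allclose(J, J_num))  # (3, 12) True
```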
- Directly optimizing the whole neural network by deriving its gradient in one shot is complicated. Therefore we use backpropagation.
- Backpropagation comes from the chain rule
When we want to calculate the gradient (Jacobian) of the loss with respect to a certain weight, we multiply the upstream gradient (which flows backward from the loss) by the local gradient. Then we use the gradient descent algorithm to optimize the loss, as in the sketch below. (Note: the derivative can be either a gradient or a Jacobian, but it must have the same dimensions as the variable being updated.)
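A toy sketch of that loop (my own example, with an arbitrary linear score, squared loss, and learning rate):

```python
import numpy as np

x = np.array([1.0, 2.0, -1.0, 0.5, 3.0])   # fixed input
t = 1.0                                    # regression target
w = np.zeros(5)                            # weights to optimize
lr = 0.01                                  # learning rate

for _ in range(200):
    s = w @ x                # forward: score
    L = (s - t) ** 2         # forward: squared loss
    dL_ds = 2 * (s - t)      # upstream gradient, flowing back from the loss
    ds_dw = x                # local gradient of the score w.r.t. w
    dL_dw = dL_ds * ds_dw    # chain rule: upstream * local, same shape as w
    w -= lr * dL_dw          # gradient descent update

print(L)  # close to 0 after the updates
```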
-Scalar example with computational graph
q=x+y and f=q*z
df/dz can be directly calculated from f=q*z (df/dz = q)
df/dx's upstream gradient is df/dq (= z) and its local gradient is dq/dx (= 1)
df/dy's upstream gradient is df/dq (= z) and its local gradient is dq/dy (= 1)
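Putting this graph's forward and backward passes into code (the particular input values are just for illustration):

```python
# forward pass through the graph q = x + y, f = q * z
x, y, z = -2.0, 5.0, -4.0
q = x + y            # q = 3
f = q * z            # f = -12

# backward pass: each gradient = upstream gradient * local gradient
df_df = 1.0          # gradient at the output
df_dq = z * df_df    # local gradient of f = q*z w.r.t. q is z
df_dz = q * df_df    # local gradient of f = q*z w.r.t. z is q
df_dx = 1.0 * df_dq  # local gradient of q = x+y w.r.t. x is 1
df_dy = 1.0 * df_dq  # local gradient of q = x+y w.r.t. y is 1

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```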
- Backpropagation in a neural network
We have to get the gradient of Wn for the weight update and the gradient of Xn to pass it backward to the previous layer; dL/dX(n+1) is the upstream gradient received from the next, i.e. (n+1)-th, layer (see the sketch below).
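A minimal sketch of one linear layer's backward pass, assuming Y = W @ X with one example per column of X (this layout is my assumption, not necessarily the slide's convention):

```python
import numpy as np

def linear_backward(W, X, dL_dY):
    """
    Backward pass of a linear layer Y = W @ X.
    W: (out, in) weights; X: (in, batch) inputs, one example per column;
    dL_dY: (out, batch) upstream gradient received from the next layer.
    """
    dL_dW = dL_dY @ X.T   # same shape as W -> used for the weight update
    dL_dX = W.T @ dL_dY   # same shape as X -> passed further backward
    return dL_dW, dL_dX

# toy usage
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 2))
dL_dY = rng.standard_normal((3, 2))
dL_dW, dL_dX = linear_backward(W, X, dL_dY)
print(dL_dW.shape, dL_dX.shape)  # (3, 4) (4, 2)
```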