CSE 5160 Machine Learning, Spring 2021, Assignment 4 (Due April 2)

Consider the neural network whose structure is shown below. Each neuron in the network uses the logistic (sigmoid) function g(z) = 1 / (1 + e^(-z)) as its activation function. z_j^[l] is the output of the linear part of the j-th neuron in layer l; a_j^[l] is the output of the activation part of the j-th neuron in layer l.

1. [20 points] (Forward propagation) Given a training example (x, y) with x ∈ R^n, what is the output ŷ of the neural network?

2. [50 points] (Backpropagation) The loss function is the logistic loss L(ŷ, y) = −[ y log(ŷ) + (1 − y) log(1 − ŷ) ]. Derive the partial derivatives of the loss function with respect to the parameters appearing in the stochastic gradient descent update rules, that is, derive ∂L/∂W^[l] and ∂L/∂b^[l] for l = 1, 2, 3.

[Figure: a three-layer network. The inputs x_1, x_2, ..., x_n feed layer 1 with units (z_1^[1], a_1^[1]), (z_2^[1], a_2^[1]), (z_3^[1], a_3^[1]); layer 1 feeds layer 2 with units (z_1^[2], a_1^[2]), (z_2^[2], a_2^[2]); layer 2 feeds the output layer with unit (z_1^[3], a_1^[3]), which produces ŷ.]

Solution

The problem presented involves two key aspects of neural network operations: forward propagation and backpropagation, specifically within the context of a neural network employing sigmoid activation functions and logistic loss. This analysis aims to provide a comprehensive understanding of how a neural network processes input data to produce an output and how it updates its parameters to minimize the loss function effectively.

Introduction

Neural networks are a cornerstone of modern machine learning, mimicking biological neural processes to recognize patterns and make decisions. The forward propagation process involves calculating the output of the network given an input, while backpropagation adjusts the network's parameters to improve its performance based on the error signal derived from a loss function. This paper explores these processes in detail, focusing on a network with sigmoid activations and logistic loss.

Forward Propagation in Neural Networks

Forward propagation involves passing the input features through successive layers of the network. Each neuron computes a linear combination of its inputs, followed by a non-linear activation. Specifically, for neuron i in layer l, the input sum (linear part) is given by:

z_i^[l] = Σ_j w_ij^[l] · a_j^[l-1] + b_i^[l],

where w_ij^[l] are the weights, b_i^[l] are the biases, and a_j^[l-1] are the activations from the previous layer.

The output of the neuron after applying the sigmoid activation function is:

a_i^[l] = σ(z_i^[l]) = 1 / (1 + e^(−z_i^[l])).

Given a training example (x, y), where x is the input vector, the forward pass computes the output by propagating x through each layer in turn, ultimately producing the network’s prediction ŷ.
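As a minimal sketch of this computation (assuming, for illustration only, 3-3-2-1 layer sizes and randomly initialized parameters; the names `forward`, `weights`, and `biases` are hypothetical and not part of the assignment):

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward pass: returns the cached linear parts z^[l], activations a^[l], and y_hat."""
    a = x
    zs, activations = [], [x]
    for W, b in zip(weights, biases):
        z = W @ a + b          # linear part:     z^[l] = W^[l] a^[l-1] + b^[l]
        a = sigmoid(z)         # activation part: a^[l] = sigma(z^[l])
        zs.append(z)
        activations.append(a)
    return zs, activations, a  # the final activation is the prediction y_hat

# Illustrative 3-3-2-1 network with random parameters (an assumption for this sketch).
rng = np.random.default_rng(0)
sizes = [3, 3, 2, 1]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

x = np.array([0.5, -1.0, 2.0])
_, _, y_hat = forward(x, weights, biases)
print(y_hat.item())  # a value in (0, 1)
```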

Backpropagation and Gradient Calculation

The training process involves updating network parameters to minimize the logistic loss:

L(ŷ, y) = −[ y log(ŷ) + (1 − y) log(1 − ŷ) ],

where ŷ is the predicted probability produced by the network (the sigmoid output of the final layer).

The derivatives with respect to weights and biases are derived using chain rule applications, propagating gradients backward from the output layer to earlier layers.

Derivatives of the Loss Function

Let us denote the output of the final layer after applying the sigmoid as p = ŷ. The derivative of the logistic loss with respect to the predicted probability is:

∂L / ∂p = -( y / p - (1 - y) / (1 - p) ).

Since p = σ(z^[L]), where z^[L] is the linear combination at the output layer, the chain rule leads to:

∂L / ∂z^[L] = p − y.
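This simplification follows from the chain rule together with σ'(z) = σ(z)(1 − σ(z)):

```latex
\frac{\partial L}{\partial z^{[L]}}
  = \frac{\partial L}{\partial p}\cdot\frac{\partial p}{\partial z^{[L]}}
  = -\left(\frac{y}{p}-\frac{1-y}{1-p}\right) p(1-p)
  = -y(1-p) + (1-y)\,p
  = p - y .
```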

For the weights w_i^[L] and the bias b^[L] of the output neuron:

  • ∂L / ∂w_i^[L] = a_i^[L-1] · (p − y)

  • ∂L / ∂b^[L] = p − y

where a^[L-1] are the activations from the previous layer, which serve as the inputs to the output layer.
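As a tiny worked example (the values p = 0.8, y = 1, and a^[L-1] = [0.3, 0.7] are made up for illustration), these formulas evaluate to:

```python
import numpy as np

# Hypothetical values for a single training example (illustration only).
p, y = 0.8, 1.0                   # predicted probability and label
a_prev = np.array([0.3, 0.7])     # activations a^[L-1] feeding the output neuron

grad_w = a_prev * (p - y)         # dL/dw_i^[L] = a_i^[L-1] * (p - y)
grad_b = p - y                    # dL/db^[L]   = p - y
print(grad_w, grad_b)             # approximately [-0.06 -0.14] and -0.2
```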

Backpropagation in Hidden Layers

For each hidden layer l, the gradient of the loss with respect to the weights and biases can be computed using the errors propagated from the next layer. Specifically, the error term at each neuron in layer l is:

δ_i^[l] = σ'(z_i^[l]) · Σ_k w_ki^[l+1] · δ_k^[l+1],

where σ'() is the derivative of the sigmoid function:

σ'(z) = σ(z)(1 - σ(z)).
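This identity can be sanity-checked numerically with a central finite difference (a small sketch; the grid of z values and the step size h are arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 11)
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central-difference approximation
analytic = sigmoid(z) * (1 - sigmoid(z))               # sigma'(z) = sigma(z)(1 - sigma(z))
print(np.max(np.abs(numeric - analytic)))              # round-off level, so the two agree
```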

The gradients for weights and biases in hidden layers are then:

  • ∂L / ∂w_ij^[l] = a_j^[l-1] · δ_i^[l],

  • ∂L / ∂b_i^[l] = δ_i^[l].

These derivatives enable the network to update parameters via stochastic gradient descent, reducing the loss iteratively.
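A minimal end-to-end sketch of one stochastic gradient descent step, tying together the forward pass, the backward recursion for δ, and the parameter updates, might look as follows (the 3-3-2-1 layer sizes, random initialization, learning rate, and all names such as `sgd_step` are illustrative assumptions, not prescribed by the assignment):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(x, y, weights, biases, lr=0.1):
    """One SGD update on a single example (x, y) for an all-sigmoid network with logistic loss."""
    # Forward pass, caching z^[l] and a^[l] for every layer.
    a, zs, activations = x, [], [x]
    for W, b in zip(weights, biases):
        z = W @ a + b
        a = sigmoid(z)
        zs.append(z)
        activations.append(a)
    p = a  # prediction y_hat

    # Output-layer error: dL/dz^[L] = p - y (logistic loss with a sigmoid output).
    delta = p - y
    for l in reversed(range(len(weights))):
        grad_W = np.outer(delta, activations[l])  # dL/dW^[l] = delta (a^[l-1])^T
        grad_b = delta                            # dL/db^[l] = delta
        if l > 0:
            # Propagate the error one layer back: delta_prev = sigma'(z_prev) * (W^T delta),
            # using the weights of the current layer before they are updated.
            s = sigmoid(zs[l - 1])
            delta = (weights[l].T @ delta) * s * (1 - s)
        weights[l] -= lr * grad_W                 # SGD parameter updates
        biases[l] -= lr * grad_b
    return weights, biases

# Illustrative 3-3-2-1 network with random parameters (assumption for this sketch).
rng = np.random.default_rng(0)
sizes = [3, 3, 2, 1]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

x, y = np.array([0.5, -1.0, 2.0]), 1.0
weights, biases = sgd_step(x, y, weights, biases)
```

In practice this step would be repeated over many shuffled training examples, which is what makes the procedure stochastic gradient descent; a single step already exercises every derivative derived above.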

Conclusion

Understanding the mechanics of forward propagation and backpropagation with sigmoid activation functions and logistic loss lays the foundation for designing effective neural networks. Precise calculation of derivatives ensures accurate weight updates, leading to improved learning outcomes. This process embodies the core principle of supervised learning in neural networks, where the model iteratively adjusts based on the error signal to better approximate the target outputs.
