18CSE484T - Deep Learning Unit 2 & 3 (4 MARKS)
4M:
Write short notes on LeNet
Developed by Yann LeCun in 1998 in the paper ‘Gradient-Based Learning Applied to Document Recognition’
Architecture primarily meant for OCR
LeNet-5 consists of 7 layers - 2 convolutional layers alternating with 2 pooling (subsampling) layers, followed by 2 fully connected layers and an output layer with a softmax activation function
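A minimal Keras sketch of a LeNet-5 style network for 32 x 32 grayscale digit images; the framework choice, tanh activations and exact hyperparameters are illustrative assumptions and not the original 1998 implementation:

import tensorflow as tf
from tensorflow.keras import layers, models

# LeNet-5 style stack: conv -> pool -> conv -> pool -> FC -> FC -> softmax
lenet5 = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, kernel_size=5, activation='tanh'),   # C1: 6 feature maps
    layers.AveragePooling2D(pool_size=2),                  # S2: subsampling
    layers.Conv2D(16, kernel_size=5, activation='tanh'),   # C3: 16 feature maps
    layers.AveragePooling2D(pool_size=2),                  # S4: subsampling
    layers.Flatten(),
    layers.Dense(120, activation='tanh'),                  # C5
    layers.Dense(84, activation='tanh'),                   # F6
    layers.Dense(10, activation='softmax'),                # output: 10 digit classes
])
lenet5.summary()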
Write short notes on Visual Geometry Group (VGG) Net
Proposed by Karen Simonyan and Andrew Zisserman in 2014
It is a standard deep CNN architecture with multiple layers
The ‘deep’ refers to the number of layers, with VGG-16 and VGG-19 consisting of 16 and 19 weight layers respectively
Basis of groundbreaking object recognition models
VGG-16, for example, has
13 convolutional layers with a 3 x 3 kernel in each layer
5 max pooling layers which reduce the volume size
2 fully connected layers each with 4096 nodes
1 fully connected softmax output layer with 1000 channels
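A short sketch showing how a pretrained VGG-16 can be loaded from Keras Applications and inspected; using the ImageNet weights this way is just one convenient illustration, not part of the original paper:

from tensorflow.keras.applications import VGG16

# 13 conv + 3 fully connected = 16 weight layers; softmax over 1000 ImageNet classes
model = VGG16(weights='imagenet', include_top=True)
model.summary()
print('Layers with weights:', sum(1 for l in model.layers if l.weights))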
List out the challenges of Recurrent neural network
The vanishing gradient problem
Each of the neural network's weights receives an update proportional to the partial derivative of the error function with respect to that weight
After combining using the chain rule, the final gradient can be vanishingly small, effectively preventing weight change
The exploding gradient problem
Large error gradients accumulate and result in large updates to the weights
Results in unstable network
Very long sequences and long-term temporal dependencies
Theoretically, an RNN can handle these, but in practice performance degrades as the sequence length and the gap between dependencies grow
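A small NumPy illustration (with assumed per-step values) of why backpropagating through many time steps makes the gradient vanish or explode: the gradient contains a product of per-step factors, so factors below 1 shrink it towards zero and factors above 1 blow it up:

import numpy as np

T = 100  # number of time steps backpropagated through

# Each step multiplies the gradient by roughly |w * f'(h)|; values assumed for illustration
small_factor = 0.9   # typical when activations saturate -> vanishing gradient
large_factor = 1.1   # typical when weights are large    -> exploding gradient

print('vanishing:', np.prod(np.full(T, small_factor)))  # about 2.7e-05
print('exploding:', np.prod(np.full(T, large_factor)))  # about 1.4e+04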
Give the block diagram for Long Short term Memory
Long short-term memory networks - usually called LSTMs - are a special kind of RNN capable of learning long-term dependencies
In LSTM, there are three gates
Input gate
Forget gate
Output gate
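In place of the block diagram, a minimal NumPy sketch of a single LSTM step showing how the three gates combine to update the cell state and hidden state; the weight shapes and random initialisation are illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [h_prev, x] to the 4 gate pre-activations."""
    z = W @ np.concatenate([h_prev, x]) + b
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])          # forget gate: what to erase from the cell state
    i = sigmoid(z[H:2*H])        # input gate: what new information to write
    o = sigmoid(z[2*H:3*H])      # output gate: what to expose as the hidden state
    g = np.tanh(z[3*H:4*H])      # candidate cell values
    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # new hidden state
    return h, c

rng = np.random.default_rng(0)
H, D = 4, 3                      # hidden size, input size (assumed)
W = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
print(h.shape, c.shape)          # (4,) (4,)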
Distinguish between RNN and LSTM
Give the drawback of rectified linear unit and state how it can be resolved
ReLU stands for Rectified linear unit
The main advantage is that it does not activate all the neurons at the same time
Gradient of the ReLU function: f'(x) = 1 if x > 0; f'(x) = 0 if x < 0
For negative inputs the gradient is zero, so during backpropagation the weights and biases of those neurons are not updated
This can create dead neurons which never get activated
This can be resolved by the Leaky ReLU function
Gradient of the Leaky ReLU function: f'(x) = 1 if x >= 0; f'(x) = a if x < 0, where a is a small positive constant (e.g. 0.01)
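A small NumPy sketch of ReLU and Leaky ReLU together with their gradients; the slope a = 0.01 is a common choice, assumed here for illustration:

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)           # 0 for negative inputs -> "dead" neurons

def leaky_relu(x, a=0.01):
    return np.where(x >= 0, x, a * x)

def leaky_relu_grad(x, a=0.01):
    return np.where(x >= 0, 1.0, a)        # small but non-zero gradient for x < 0

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), relu_grad(x))
print(leaky_relu(x), leaky_relu_grad(x))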
Identify the techniques to address the problem of overfitting in a neural network
The training data contains information about the regularities in the mapping from the input to the output. But it also contains sampling error
When we fit the model, it cannot tell which regularities are real and which are caused by sampling error
This means that the model will not generalize well to unseen data
Techniques to resolve this problem
L2 Regularization / ridge regularization
L1 Regularization / lasso regularization
Dropout
Early stopping
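A minimal Keras sketch showing where L1/L2 regularization plugs into a layer; the layer sizes and the regularization strength 0.01 are assumed values, and dropout and early stopping are illustrated under their own headings below:

from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    layers.Input(shape=(784,)),
    # L2 (ridge) penalty on this layer's weights
    layers.Dense(128, activation='relu',
                 kernel_regularizer=regularizers.l2(0.01)),
    # L1 (lasso) penalty encourages sparse weights
    layers.Dense(64, activation='relu',
                 kernel_regularizer=regularizers.l1(0.01)),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')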
Write short notes on dropout
One of the most interesting types of regularization techniques
Produces very good results and is consequently one of the most frequently used regularization techniques in deep learning
At every iteration, it randomly selects some nodes and removes them along with all of their incoming and outgoing connections
So each iteration has a different set of nodes and results in different sets of outputs
The probability with which each node is dropped (the dropout rate) is the hyperparameter of the dropout function
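A NumPy sketch of (inverted) dropout at training time: each node is kept with probability 1 - rate and the surviving activations are scaled so the expected output stays the same; the rate of 0.5 is an assumed example:

import numpy as np

def dropout_forward(activations, rate=0.5, training=True):
    """Randomly zero out nodes with probability `rate` during training."""
    if not training:
        return activations                       # dropout is disabled at test time
    keep_prob = 1.0 - rate
    mask = (np.random.rand(*activations.shape) < keep_prob)
    return activations * mask / keep_prob        # scale to preserve the expected value

a = np.ones((2, 6))
print(dropout_forward(a, rate=0.5))              # a different mask on every call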
Write short notes on early stopping
Early stopping is a kind of cross-validation strategy where we keep one part of the training set aside as the validation set
When we see that the performance in the validation set is getting worse, we immediately stop the training on the model. This is called early stopping
We stop training at the point where the validation error starts to rise (the dotted line in the usual training-vs-validation error plot), since after that the model will start overfitting
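With Keras, early stopping can be expressed as a callback that watches the validation loss; the patience value and the 80/20 validation split are assumed choices for illustration:

from tensorflow.keras.callbacks import EarlyStopping

# Stop when the validation loss has not improved for 5 consecutive epochs,
# and roll back to the weights from the best epoch
early_stop = EarlyStopping(monitor='val_loss', patience=5,
                           restore_best_weights=True)

# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=100, callbacks=[early_stop])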
Write short notes on stochastic gradient descent on convolutional neural networks
A gradient measures how much the output of a function changes if you change the inputs a little bit
Gradient descent is slow on huge datasets because every update requires computing the gradient over all training examples
SGD randomly picks one data point from the whole data set at each iteration to reduce the computations enormously
By iteratively updating the weights, the model aims to minimize the loss and improve its accuracy
Mini-batch gradient descent tries to strike a balance between the accuracy of full-batch gradient descent and the speed of SGD
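A NumPy sketch of mini-batch SGD on a simple least-squares problem; the learning rate, batch size and synthetic data are assumptions for illustration (batch_size = 1 would be pure SGD, batch_size = len(X) full-batch gradient descent):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=1000)

w = np.zeros(3)
lr, batch_size = 0.1, 32

for epoch in range(20):
    idx = rng.permutation(len(X))                        # shuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)     # gradient of mean squared error
        w -= lr * grad                                   # weight update

print(w)   # close to the true weights [2.0, -1.0, 0.5]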
Brief the importance of pooling layer in deep neural networks
The purpose of the pooling layer is to reduce the dimensions of the hidden layers by combining the outputs of neuron clusters at the previous layer into a single neuron in the next layer
Important features:
Dimensionality reduction
Translation invariance (regardless of small changes in the input, the model generates similar outputs)
Feature selection (selects dominant information and discards less relevant details)
Robustness to variations (helps prevent overfitting by generalizing features)
Computational efficiency (smaller feature maps help reduce computation)
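A small NumPy sketch of 2 x 2 max pooling with stride 2 on a single feature map, showing the dimensionality reduction; the input values are made up for illustration:

import numpy as np

def max_pool_2x2(fmap):
    """2 x 2 max pooling with stride 2 on a single 2-D feature map."""
    h, w = fmap.shape
    return fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 5, 7],
                 [1, 1, 8, 3]], dtype=float)
print(max_pool_2x2(fmap))      # 4x4 map reduced to 2x2: [[6, 2], [2, 8]]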
Elaborate the importance of ReLU activation function.
ReLU introduces nonlinearity to neural networks
This nonlinearity enables neural networks to capture intricate patterns in data
Traditional activation functions like sigmoid and hyperbolic tangent suffer from the vanishing gradient problem
ReLU mitigates this issue by maintaining a gradient of 1 for positive inputs
ReLU is computationally efficient and speeds up training and inference
ReLU encourages sparsity in activations, which helps prevent overfitting and improves generalization
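A quick NumPy comparison (with assumed input values) of why ReLU mitigates vanishing gradients: the sigmoid derivative is at most 0.25 and nearly zero for large inputs, while the ReLU derivative stays at 1 for every positive input:

import numpy as np

x = np.array([-10.0, -1.0, 0.5, 1.0, 10.0])

sigmoid = 1.0 / (1.0 + np.exp(-x))
sigmoid_grad = sigmoid * (1.0 - sigmoid)      # <= 0.25, ~0 for |x| large
relu_grad = (x > 0).astype(float)             # exactly 1 for every positive input

print(sigmoid_grad)   # roughly [4.5e-05, 0.197, 0.235, 0.197, 4.5e-05]
print(relu_grad)      # [0. 0. 1. 1. 1.]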
Illustrate Learning XOR problem with the inputs and outputs
Please refer to the PPT for the full illustration; a minimal sketch with the XOR inputs and outputs is given below
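The XOR truth table and a minimal Keras network that can learn it (a single-layer perceptron cannot, because XOR is not linearly separable); the hidden size, optimizer and epoch count are assumed choices:

import numpy as np
from tensorflow.keras import layers, models

# XOR truth table: the output is 1 exactly when the two inputs differ
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

model = models.Sequential([
    layers.Input(shape=(2,)),
    layers.Dense(4, activation='tanh'),       # hidden layer makes XOR learnable
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=2000, verbose=0)
print(model.predict(X, verbose=0).round().ravel())   # typically [0. 1. 1. 0.] after training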