18CSE484T - Deep Learning Unit 2 & 3 (4 MARKS)
4M:
Write short notes on LeNet
Developed by Yann LeCun in 1998 in the paper ‘Gradient-Based Learning Applied to Document Recognition’
Architecture primarily meant for OCR
LeNet-5 consists of 7 layers - 2 convolutional layers alternating with 2 pooling (subsampling) layers, followed by 2 fully connected layers and an output layer with a softmax activation function
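A minimal Keras sketch of a LeNet-5 style network for 32 x 32 grayscale digit images; the framework choice, tanh activations and exact hyperparameters are illustrative assumptions and not the original 1998 implementation:

import tensorflow as tf
from tensorflow.keras import layers, models

# LeNet-5 style stack: conv -> pool -> conv -> pool -> FC -> FC -> softmax
lenet5 = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, kernel_size=5, activation='tanh'),   # C1: 6 feature maps
    layers.AveragePooling2D(pool_size=2),                  # S2: subsampling
    layers.Conv2D(16, kernel_size=5, activation='tanh'),   # C3: 16 feature maps
    layers.AveragePooling2D(pool_size=2),                  # S4: subsampling
    layers.Flatten(),
    layers.Dense(120, activation='tanh'),                  # C5
    layers.Dense(84, activation='tanh'),                   # F6
    layers.Dense(10, activation='softmax'),                # output: 10 digit classes
])
lenet5.summary()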
Write short notes on Visual Geometry Group (VGG) Net
Proposed by Karen Simonyan and Andrew Zisserman in 2014
It is a standard deep CNN architecture with multiple layers
The ‘deep’ refers to the number of layers, with VGG-16 and VGG-19 consisting of 16 and 19 weight layers respectively
Basis of groundbreaking object recognition models
VGG-16, for example, has
13 convolutional layers with a 3 x 3 kernel in each layer
5 max pooling layers which reduce the volume size
2 fully connected layers each with 4096 nodes
1 fully connected softmax output layer with 1000 channels
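A short sketch showing how a pretrained VGG-16 can be loaded from Keras Applications and inspected; using the ImageNet weights this way is just one convenient illustration, not part of the original paper:

from tensorflow.keras.applications import VGG16

# 13 conv + 3 fully connected = 16 weight layers; softmax over 1000 ImageNet classes
model = VGG16(weights='imagenet', include_top=True)
model.summary()
print('Layers with weights:', sum(1 for l in model.layers if l.weights))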
List out the challenges of Recurrent neural network
The vanishing gradient problem
Each of the neural network's weights receives an update proportional to the partial derivative of the error function with respect to that weight
After combining using the chain rule, the final gradient can be vanishingly small, effectively preventing weight change
The exploding gradient problem
Large error gradients accumulate and result in large updates to the weights
Results in unstable network
Very long sequences and long-term temporal dependencies
Theoretically, an RNN can handle these, but in practice performance degrades as the sequence length and the gap between dependencies grow
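A small NumPy illustration (with assumed per-step values) of why backpropagating through many time steps makes the gradient vanish or explode: the gradient contains a product of per-step factors, so factors below 1 shrink it towards zero and factors above 1 blow it up:

import numpy as np

T = 100  # number of time steps backpropagated through

# Each step multiplies the gradient by roughly |w * f'(h)|; values assumed for illustration
small_factor = 0.9   # typical when activations saturate -> vanishing gradient
large_factor = 1.1   # typical when weights are large    -> exploding gradient

print('vanishing:', np.prod(np.full(T, small_factor)))  # about 2.7e-05
print('exploding:', np.prod(np.full(T, large_factor)))  # about 1.4e+04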
Give the block diagram for Long Short term Memory
Long short-term memory networks - usually called LSTMs - are a special kind of RNN capable of learning long-term dependencies
In LSTM, there are three gates
Input gate
Forget gate
Output gate
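In place of the block diagram, a minimal NumPy sketch of a single LSTM step showing how the three gates combine to update the cell state and hidden state; the weight shapes and random initialisation are illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [h_prev, x] to the 4 gate pre-activations."""
    z = W @ np.concatenate([h_prev, x]) + b
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])          # forget gate: what to erase from the cell state
    i = sigmoid(z[H:2*H])        # input gate: what new information to write
    o = sigmoid(z[2*H:3*H])      # output gate: what to expose as the hidden state
    g = np.tanh(z[3*H:4*H])      # candidate cell values
    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # new hidden state
    return h, c

rng = np.random.default_rng(0)
H, D = 4, 3                      # hidden size, input size (assumed)
W = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
print(h.shape, c.shape)          # (4,) (4,)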
Distinguish between RNN and LSTM
Give the drawback of rectified linear unit and state how it can be resolved
ReLU stands for Rectified linear unit
The main advantage is that it does not activate all the neurons at the same time
Gradient of the ReLU function: f'(x) = 1 if x > 0; f'(x) = 0 if x < 0
For negative inputs the gradient is zero, so during backpropagation the weights and biases of those neurons are not updated
This can create dead neurons which never get activated
This can be resolved by the Leaky ReLU function
Gradient of the Leaky ReLU function: f'(x) = 1 if x >= 0; f'(x) = a if x < 0, where a is a small positive constant (e.g. 0.01)
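A small NumPy sketch of ReLU and Leaky ReLU together with their gradients; the slope a = 0.01 is a common choice, assumed here for illustration:

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)           # 0 for negative inputs -> "dead" neurons

def leaky_relu(x, a=0.01):
    return np.where(x >= 0, x, a * x)

def leaky_relu_grad(x, a=0.01):
    return np.where(x >= 0, 1.0, a)        # small but non-zero gradient for x < 0

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), relu_grad(x))
print(leaky_relu(x), leaky_relu_grad(x))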
Identify the techniques to address the problem of overfitting in a neural network
The training data contains information about the regularities in the mapping from the input to the output. But it also contains sampling error
When we fit the model, it cannot tell which regularities are real and which are caused by sampling error
This means that the model will not generalize well to unseen data
Techniques to resolve this problem
L2 Regularization / ridge regularization
L1 Regularization / lasso regularization
Dropout
Early stopping
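A minimal Keras sketch showing where L1/L2 regularization plugs into a layer; the layer sizes and the regularization strength 0.01 are assumed values, and dropout and early stopping are illustrated under their own headings below:

from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    layers.Input(shape=(784,)),
    # L2 (ridge) penalty on this layer's weights
    layers.Dense(128, activation='relu',
                 kernel_regularizer=regularizers.l2(0.01)),
    # L1 (lasso) penalty encourages sparse weights
    layers.Dense(64, activation='relu',
                 kernel_regularizer=regularizers.l1(0.01)),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')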
Write short notes on dropout
One of the most interesting types of regularization techniques
Produces very good results and is consequently one of the most frequently used regularization techniques in deep learning
At every iteration, it randomly selects some nodes and removes them along with all of their incoming and outgoing connections
So each iteration has a different set of nodes and results in different sets of outputs
The probability with which each node is dropped (the dropout rate) is the hyperparameter of the dropout function
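A NumPy sketch of (inverted) dropout at training time: each node is kept with probability 1 - rate and the surviving activations are scaled so the expected output stays the same; the rate of 0.5 is an assumed example:

import numpy as np

def dropout_forward(activations, rate=0.5, training=True):
    """Randomly zero out nodes with probability `rate` during training."""
    if not training:
        return activations                       # dropout is disabled at test time
    keep_prob = 1.0 - rate
    mask = (np.random.rand(*activations.shape) < keep_prob)
    return activations * mask / keep_prob        # scale to preserve the expected value

a = np.ones((2, 6))
print(dropout_forward(a, rate=0.5))              # a different mask on every call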
Write short notes on early stopping
Early stopping is a kind of cross-validation strategy where we keep one part of the training set aside as the validation set
When we see that the performance in the validation set is getting worse, we immediately stop the training on the model. This is called early stopping
We stop training at the point where the validation error starts to rise (the dotted line in the usual training-vs-validation error plot), since after that the model will start overfitting
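With Keras, early stopping can be expressed as a callback that watches the validation loss; the patience value and the 80/20 validation split are assumed choices for illustration:

from tensorflow.keras.callbacks import EarlyStopping

# Stop when the validation loss has not improved for 5 consecutive epochs,
# and roll back to the weights from the best epoch
early_stop = EarlyStopping(monitor='val_loss', patience=5,
                           restore_best_weights=True)

# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=100, callbacks=[early_stop])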
Write short notes on stochastic gradient descent on convolutional neural networks
A gradient measures how much the output of a function changes if you change the inputs a little bit
Gradient descent is slow on huge datasets because every update requires computing the gradient over all training examples
SGD randomly picks one data point from the whole data set at each iteration to reduce the computations enormously
By iteratively updating the weights, the model aims to minimize the loss and improve its accuracy
Mini-batch gradient descent tries to strike a balance between the accuracy of full-batch gradient descent and the speed of SGD
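A NumPy sketch of mini-batch SGD on a simple least-squares problem; the learning rate, batch size and synthetic data are assumptions for illustration (batch_size = 1 would be pure SGD, batch_size = len(X) full-batch gradient descent):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=1000)

w = np.zeros(3)
lr, batch_size = 0.1, 32

for epoch in range(20):
    idx = rng.permutation(len(X))                        # shuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)     # gradient of mean squared error
        w -= lr * grad                                   # weight update

print(w)   # close to the true weights [2.0, -1.0, 0.5]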
Brief the importance of pooling layer in deep neural networks
The purpose of the pooling layer is to reduce the dimensions of the hidden layers by combining the outputs of neuron clusters at the previous layer into a single neuron in the next layer
Important features:
Dimensionality reduction
Translation invariance (regardless of small changes in the input, the model generates similar outputs)
Feature selection (selects dominant information and discards less relevant details)
Robustness to variations (helps prevent overfitting by generalizing features)
Computational efficiency (smaller feature maps help reduce computation)
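A small NumPy sketch of 2 x 2 max pooling with stride 2 on a single feature map, showing the dimensionality reduction; the input values are made up for illustration:

import numpy as np

def max_pool_2x2(fmap):
    """2 x 2 max pooling with stride 2 on a single 2-D feature map."""
    h, w = fmap.shape
    return fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 5, 7],
                 [1, 1, 8, 3]], dtype=float)
print(max_pool_2x2(fmap))      # 4x4 map reduced to 2x2: [[6, 2], [2, 8]]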
Elaborate the importance of ReLU activation function.
ReLU introduces nonlinearity to neural networks
This nonlinearity enables neural networks to capture intricate patterns in data
Traditional activation functions like sigmoid and hyperbolic tangent suffer from the vanishing gradient problem
ReLU mitigates this issue by maintaining a gradient of 1 for positive inputs
ReLU is computationally efficient and speeds up training and inference
ReLU encourages sparsity in activations, which helps prevent overfitting and improves generalization
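A quick NumPy comparison (with assumed input values) of why ReLU mitigates vanishing gradients: the sigmoid derivative is at most 0.25 and nearly zero for large inputs, while the ReLU derivative stays at 1 for every positive input:

import numpy as np

x = np.array([-10.0, -1.0, 0.5, 1.0, 10.0])

sigmoid = 1.0 / (1.0 + np.exp(-x))
sigmoid_grad = sigmoid * (1.0 - sigmoid)      # <= 0.25, ~0 for |x| large
relu_grad = (x > 0).astype(float)             # exactly 1 for every positive input

print(sigmoid_grad)   # roughly [4.5e-05, 0.197, 0.235, 0.197, 4.5e-05]
print(relu_grad)      # [0. 0. 1. 1. 1.]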
Illustrate Learning XOR problem with the inputs and outputs
Please refer to the PPT for the full illustration; a minimal sketch with the XOR inputs and outputs is given below
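The XOR truth table and a minimal Keras network that can learn it (a single-layer perceptron cannot, because XOR is not linearly separable); the hidden size, optimizer and epoch count are assumed choices:

import numpy as np
from tensorflow.keras import layers, models

# XOR truth table: the output is 1 exactly when the two inputs differ
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

model = models.Sequential([
    layers.Input(shape=(2,)),
    layers.Dense(4, activation='tanh'),       # hidden layer makes XOR learnable
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=2000, verbose=0)
print(model.predict(X, verbose=0).round().ravel())   # typically [0. 1. 1. 0.] after training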