18CSE484T - Deep Learning Unit 2 & 3 (4 MARKS)

 


4M:


Write short notes on LeNet 

  • Developed by Yann LeCun et al. in 1998 in the paper ‘Gradient-Based Learning Applied to Document Recognition’

  • Architecture primarily meant for optical character recognition (OCR)

  • LeNet-5 consists of 7 layers: 2 convolutional layers alternating with 2 pooling (subsampling) layers, followed by 2 fully connected layers and an output layer with a softmax activation function
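
A minimal sketch of a LeNet-5-style network, assuming PyTorch and the classic 32 x 32 grayscale input (the notes above use max pooling, so it is used here too; the original paper used average pooling/subsampling):

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """LeNet-5-style CNN: conv/pool pairs followed by fully connected layers."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32x1 -> 28x28x6
            nn.Tanh(),
            nn.MaxPool2d(2),                  # 28x28x6 -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14x6 -> 10x10x16
            nn.Tanh(),
            nn.MaxPool2d(2),                  # 10x10x16 -> 5x5x16
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),       # softmax is applied via the loss (e.g. CrossEntropyLoss)
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

# Example: a batch of four 32x32 grayscale digit images
logits = LeNet5()(torch.randn(4, 1, 32, 32))
print(logits.shape)  # torch.Size([4, 10])
```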


Write short notes on Visual Geometry Group (VGG) Net

  • Proposed by Karen Simonyan and Andrew Zisserman of the Visual Geometry Group, Oxford, in 2014

  • It is a standard deep CNN architecture with multiple layers

  • The ‘deep’ refers to the number of layers, with VGG-16 and VGG-19 consisting of 16 and 19 weight layers (convolutional + fully connected) respectively

  • Basis of groundbreaking object recognition models

  • VGG-16 has

  • 13 convolutional layers with kernel size 3 x 3 in each layer

  • 5 max pooling layers, which reduce the volume size

  • 2 fully connected layers, each with 4096 nodes

  • 1 fully connected softmax output layer with 1000 channels (one per ImageNet class)
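
A minimal sketch of VGG-16 built from its layer configuration (assuming PyTorch; 'M' marks a max pooling layer, the numbers are output channels of the 3 x 3 convolutions):

```python
import torch
import torch.nn as nn

# VGG-16 configuration: 13 conv layers (3x3) interleaved with 5 max pooling layers
VGG16_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']

def make_vgg16(num_classes=1000):
    layers, in_ch = [], 3
    for v in VGG16_CFG:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halves spatial size
        else:
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
            in_ch = v
    features = nn.Sequential(*layers)
    classifier = nn.Sequential(            # 2 hidden FC layers + 1000-way softmax output
        nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(),
        nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(),
        nn.Linear(4096, num_classes),
    )
    return nn.Sequential(features, nn.Flatten(), classifier)

model = make_vgg16()
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```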


List out the challenges of Recurrent neural network

  • The vanishing gradient problem

  • Each of the network’s weights receives an update proportional to the partial derivative of the error function with respect to that weight

  • When these derivatives are combined via the chain rule across many time steps, the final gradient can become vanishingly small, effectively preventing the weights from changing

  • The exploding gradient problem

  • Large error gradients accumulate and result in large updates to the weights

  • This results in an unstable network

  • Very long sequences and temporal dependencies

  • Theoretically, RNNs can handle such long-term dependencies, but in practice they are increasingly harmed by performance problems as sequences grow longer
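
Exploding gradients are commonly tamed by gradient clipping; below is a minimal sketch of one training step with clipping, assuming PyTorch and a toy many-to-one RNN (all sizes are arbitrary):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=32, batch_first=True)
head = nn.Linear(32, 1)
params = list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(16, 100, 8)   # batch of 16 sequences, 100 time steps each
y = torch.randn(16, 1)

out, _ = rnn(x)               # out: (16, 100, 32)
pred = head(out[:, -1, :])    # use the last time step for a single prediction
loss = loss_fn(pred, y)

optimizer.zero_grad()
loss.backward()
# Clip the gradient norm so large gradients over long sequences cannot blow up the update
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
optimizer.step()
```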



Give the block diagram for Long Short term Memory

  • Long Short-Term Memory networks, usually called LSTMs, are a special kind of RNN capable of learning long-term dependencies

  • In LSTM, there are three gates

  • Input gate

  • Forget gate

  • Output gate
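
In place of the block diagram (please see the slides), the sketch below shows how the three gates combine in a single LSTM step, assuming PyTorch's LSTMCell (which implements these gates internally); the equations in the comments are the standard LSTM update:

```python
import torch
import torch.nn as nn

# At each step the gates decide what to forget (f_t), what new information
# to write (i_t, g_t) and what to output (o_t):
#   f_t = sigmoid(W_f [h_{t-1}, x_t] + b_f)      forget gate
#   i_t = sigmoid(W_i [h_{t-1}, x_t] + b_i)      input gate
#   g_t = tanh   (W_g [h_{t-1}, x_t] + b_g)      candidate cell state
#   c_t = f_t * c_{t-1} + i_t * g_t              new cell state
#   o_t = sigmoid(W_o [h_{t-1}, x_t] + b_o)      output gate
#   h_t = o_t * tanh(c_t)                        new hidden state
cell = nn.LSTMCell(input_size=10, hidden_size=20)

x = torch.randn(5, 3, 10)                 # 5 time steps, batch of 3, 10 features
h = torch.zeros(3, 20)
c = torch.zeros(3, 20)
for t in range(x.size(0)):                # unroll over time
    h, c = cell(x[t], (h, c))
print(h.shape, c.shape)                   # torch.Size([3, 20]) torch.Size([3, 20])
```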


Distinguish between RNN and LSTM


Give the drawback of rectified linear unit and state how it can be resolved

  • ReLU stands for Rectified linear unit

  • The main advantage is that it does not activate all the neurons at the same time

  • Gradient of the ReLU function: f'(x) = 1 if x > 0, f'(x) = 0 if x < 0

  • For negative inputs the gradient is zero, so during backpropagation the weights and biases of those neurons are not updated

  • This can create dead neurons which never get activated

  • This can be resolved by the Leaky ReLU function

  • Gradient of the Leaky ReLU function: f'(x) = 1 if x >= 0, f'(x) = a if x < 0, where a is a small constant (e.g. 0.01)
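
A minimal NumPy sketch of ReLU, Leaky ReLU and their gradients (the leak coefficient a = 0.01 is a common but arbitrary choice):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # gradient is 1 for positive inputs, 0 otherwise -> "dead" neurons for x < 0
    return (x > 0).astype(float)

def leaky_relu(x, a=0.01):
    return np.where(x >= 0, x, a * x)

def leaky_relu_grad(x, a=0.01):
    # small non-zero slope a for negative inputs keeps the neuron trainable
    return np.where(x >= 0, 1.0, a)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), relu_grad(x))
print(leaky_relu(x), leaky_relu_grad(x))
```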



Identify the techniques to address the problem of overfitting in a neural network

  • The training data contains information about the regularities in the mapping from the input to the output. But it also contains sampling error

  • When we fit the model, it cannot tell which regularities are real and which are caused by sampling error

  • This means that the model will not generalize well to unseen data

  • Techniques to resolve this problem

  • L2 Regularization / ridge regularization

  • L1 Regularization / lasso regularization

  • Dropout

  • Early stopping
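
A minimal sketch of how two of these techniques are typically switched on in practice, assuming PyTorch (the layer sizes are arbitrary); dropout and early stopping are expanded in the next two answers:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),        # dropout: randomly zero 50% of activations while training
    nn.Linear(64, 10),
)

# L2 (ridge) regularization is applied through the optimizer's weight_decay term
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```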


Write short notes on dropout

  • One of the most interesting types of regularization techniques 

  • Produces very good results and is consequently one of the most frequently used regularization techniques in deep learning

  • At every iteration, it randomly selects some nodes and removes them along with all of their incoming and outgoing connections

  • So each iteration has a different set of nodes and results in different sets of outputs

  • The probability p of dropping a node (the dropout rate) is the hyperparameter of the dropout function
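
A minimal sketch showing that dropout is active only during training and disabled at inference, assuming PyTorch:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)      # p is the probability of dropping each node
x = torch.ones(1, 8)

drop.train()                  # training mode: roughly half the values are zeroed,
print(drop(x))                # survivors are scaled by 1/(1-p) = 2

drop.eval()                   # evaluation mode: dropout is a no-op
print(drop(x))                # tensor of ones, unchanged
```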


Write short notes on early stopping

  • Early stopping is a kind of cross validation strategy where we keep one part of the training set as the validation set

  • When we see that the performance in the validation set is getting worse, we immediately stop the training on the model. This is called early stopping

  • In the usual plot of training and validation error against epochs, we stop training at the point (the dotted line) where the validation error starts to rise, since after that the model will start overfitting
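
A minimal sketch of an early-stopping loop; train_one_epoch, validate, the data loaders and max_epochs are hypothetical placeholders:

```python
# Hypothetical helpers: train_one_epoch(model, loader) trains for one epoch,
# validate(model, loader) returns the validation loss.
best_val, patience, bad_epochs = float('inf'), 5, 0

for epoch in range(max_epochs):
    train_one_epoch(model, train_loader)
    val_loss = validate(model, val_loader)

    if val_loss < best_val:                       # validation improved: keep going
        best_val, bad_epochs = val_loss, 0
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                # no improvement for `patience` epochs
            break                                 # stop early

model.load_state_dict(best_state)                 # restore the best weights seen
```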


Write short notes on stochastic gradient descent on convolutional neural networks

  • A gradient measures how much the output of a function changes if you change the inputs a little bit

  • Batch gradient descent is slow on huge datasets, since every update requires a pass over the entire training set

  • SGD randomly picks one data point from the whole data set at each iteration to reduce the computations enormously

  • By iteratively updating the weights, the model aims to minimize the loss and improve its accuracy

  • Mini-batch gradient descent tries to strike a balance between the stable convergence of batch gradient descent and the speed of SGD
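
A minimal NumPy sketch of mini-batch updates on a linear least-squares model; setting batch_size to 1 gives SGD, and setting it to the full dataset gives batch gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                   # 1000 samples, 5 features
w_true = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
lr, batch_size = 0.01, 32

for epoch in range(20):
    idx = rng.permutation(len(X))                # shuffle every epoch
    for start in range(0, len(X), batch_size):   # batch_size=1 -> SGD, len(X) -> batch GD
        b = idx[start:start + batch_size]
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)   # gradient of MSE on the mini-batch
        w -= lr * grad                           # iterative weight update

print(np.round(w, 2))                            # close to w_true
```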


Brief the importance of pooling layer in deep neural networks

  • The purpose of the pooling layer is to reduce the dimensions of the hidden layers by combining the outputs of neuron clusters at the previous layer into a single neuron in the next layer

  • Important features:

  • Dimensionality reduction

  • Translation invariance (regardless of small changes in the input, the model generates similar outputs)

  • Feature selection (selects dominant information and discards less relevant details)

  • Robustness to variations (helps prevent overfitting by generalizing features)

  • Computational efficiency (smaller feature maps help reduce computation)
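
A minimal sketch of 2 x 2 max pooling halving a feature map while keeping the dominant values, assuming PyTorch:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.tensor([[[[1., 2., 0., 1.],
                    [3., 4., 1., 0.],
                    [0., 1., 5., 6.],
                    [1., 0., 7., 8.]]]])   # shape (1, 1, 4, 4)

print(pool(x))
# tensor([[[[4., 1.],
#           [1., 8.]]]])  -> dimensions reduced 4x4 -> 2x2, dominant values kept
```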


Elaborate the importance of ReLU activation function.

  • ReLU introduces nonlinearity to neural networks

  • This nonlinearity enables neural networks to capture intricate patterns in data

  • Traditional activation functions like sigmoid and hyperbolic tangent suffer from the vanishing gradient problem

  • ReLU mitigates this issue by maintaining a gradient of 1 for positive inputs

  • ReLU is computationally efficient and speeds up training and inference

  • ReLU encourages sparsity in activations, which helps prevent overfitting and improves generalization


Illustrate learning the XOR problem with the inputs and outputs

Please refer to the PPT; the XOR inputs and outputs and a minimal network that learns them are sketched below
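
The XOR truth table maps (0,0), (0,1), (1,0), (1,1) to 0, 1, 1, 0; it is not linearly separable, so a single-layer perceptron cannot learn it, but a network with one hidden layer can. A minimal sketch, assuming PyTorch (the hidden size and training settings are arbitrary):

```python
import torch
import torch.nn as nn

# XOR inputs and outputs
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# One hidden layer gives the nonlinearity needed for XOR
model = nn.Sequential(nn.Linear(2, 4), nn.Tanh(), nn.Linear(4, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.BCELoss()

for _ in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print(model(X).detach().round().squeeze())   # typically tensor([0., 1., 1., 0.])
```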

