18CSE479T - Statistical Machine Learning UNIT 2 & 3 (12 MARKS)

12M:

The two questions below are the Unit 2 theory 12-mark questions. The remaining Unit 2 12-mark questions are problems and programs on linear regression, which are more important and have a higher chance of being asked.


Explain in detail the terminologies involved in Logistic regression using suitable examples.

Information value(IV):

  • This is very useful in the preliminary filtering of variables before including them in the model

  • It eliminates most of the candidate variables in a first pass before fitting, since the final model would typically retain only about 10 variables

  • For each bin of a variable, the weight of evidence is WOE = ln(% of events / % of non events), and the bin's contribution to IV is (% of events - % of non events) x WOE; the variable's IV is the sum over all bins


RANGE   BIN NUMBER   EVENTS   NON EVENTS   % OF EVENTS   % OF NON EVENTS   [E] - [NE]   WOE     IV
0-50    1            40       394          5%            5%                0.003        0.062   0.0002
...     ...          ...      ...          ...           ...               ...          ...     ...
Total                744      7794                                                      IV total: 0.0356


  • The total IV of 0.0356 means this variable is a weak predictor for classifying events (an IV between 0.02 and 0.1 is conventionally read as weak)
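A minimal sketch of how the WOE and IV columns are computed, assuming a pandas DataFrame of per-bin counts; only the first bin and the totals come from the table above, and the middle bins are hypothetical fillers:

```python
import numpy as np
import pandas as pd

# Per-bin counts: bin 1 and the totals match the table above;
# the middle bins are hypothetical
bins = pd.DataFrame({"events":     [40, 120, 250, 334],
                     "non_events": [394, 1800, 3100, 2500]})

# Share of all events / non-events that falls in each bin
bins["pct_events"] = bins["events"] / bins["events"].sum()
bins["pct_non_events"] = bins["non_events"] / bins["non_events"].sum()

# Weight of evidence and per-bin information value
bins["woe"] = np.log(bins["pct_events"] / bins["pct_non_events"])
bins["iv"] = (bins["pct_events"] - bins["pct_non_events"]) * bins["woe"]

print(bins.round(4))
print("IV total:", round(bins["iv"].sum(), 4))
```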

Akaike Information Criteria(AIC):

  • This measures the relative quality of a statistical model for a given set of data

  • It formalizes the trade-off between goodness of fit and model complexity (bias versus variance): AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximized likelihood

  • When comparing two models, the one with the lower AIC is preferred
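A minimal sketch of the formula, with hypothetical log-likelihood values for two fitted models:

```python
def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln(L); lower is better."""
    return 2 * num_params - 2 * log_likelihood

# Hypothetical fits: model B gains little likelihood for 7 extra parameters
print(aic(log_likelihood=-520.3, num_params=5))   # 1050.6
print(aic(log_likelihood=-518.9, num_params=12))  # 1061.8 -> model A preferred
```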


Rank ordering:

  • After sorting observations in descending order by predicted probability, deciles are created

  • Summing the events within each decile gives the aggregated event count per decile; for a model that rank-orders well, these counts should be in descending order, as in the sketch below
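A minimal sketch of a rank-ordering check with pandas; the scored data here is simulated rather than taken from a real model:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
prob = rng.uniform(0, 1, 1000)     # simulated predicted probabilities
event = rng.binomial(1, prob)      # outcomes correlated with the score
df = pd.DataFrame({"pred_prob": prob, "event": event})

# Sort descending by predicted probability, then cut into 10 deciles
df = df.sort_values("pred_prob", ascending=False).reset_index(drop=True)
df["decile"] = pd.qcut(df.index, 10, labels=range(1, 11))

# Aggregated events per decile should decrease from decile 1 to decile 10
print(df.groupby("decile", observed=True)["event"].sum())
```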


Concordance/ c-statistic:

  • This is a measure of the quality of fit for a binary outcome in a logistic regression model: the proportion of event/non-event pairs in which the event received the higher predicted probability, which equals the area under the ROC curve
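A minimal sketch with scikit-learn, using hypothetical outcomes and predicted probabilities; for a binary outcome the c-statistic is the same number as the ROC AUC:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 0, 1, 1, 0, 1]                  # hypothetical outcomes
y_prob = [0.1, 0.3, 0.7, 0.4, 0.8, 0.6, 0.2, 0.9]  # hypothetical predictions

# Probability that a randomly chosen event is scored above a randomly
# chosen non-event: here every event outranks every non-event, so 1.0
print(roc_auc_score(y_true, y_prob))
```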


Population stability index(PSI):

  • This metric measures the drift between the population on which a credit scoring model was developed and the current population on which it is being used

  • PSI <= 0.1: no significant change

  • 0.1 < PSI <= 0.25: some change

  • PSI > 0.25: large change
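A minimal sketch of the computation, PSI = sum over bins of (actual% - expected%) x ln(actual% / expected%); the two bin distributions below are hypothetical:

```python
import numpy as np

def psi(expected_pct, actual_pct):
    """PSI over score bins; each argument is the share of its
    population falling in each bin and should sum to 1."""
    expected_pct, actual_pct = np.asarray(expected_pct), np.asarray(actual_pct)
    return float(np.sum((actual_pct - expected_pct)
                        * np.log(actual_pct / expected_pct)))

dev = [0.10, 0.20, 0.30, 0.25, 0.15]  # development-sample distribution
cur = [0.12, 0.18, 0.28, 0.26, 0.16]  # current-population distribution
print(psi(dev, cur))                  # about 0.008 -> no significant change
```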


Differences between Ridge and Lasso Regression

Ridge Regression:

  • The penalty term is the sum of the squares of the coefficients

  • This type of L2 regularization shrinks the coefficients towards zero but does not set them exactly to zero


Lasso Regression:

  • The penalty term is the sum of the absolute values (magnitudes) of the coefficients

  • This type of L1 regularization can shrink some coefficients exactly to zero

  • LASSO stands for Least Absolute Shrinkage and Selection Operator

Key difference:

  • Ridge includes all of the features in the model, since coefficients are shrunk but never set exactly to zero. Thus the major advantage of ridge regression is coefficient shrinkage and reduced model complexity

  • Along with shrinking coefficients, lasso also performs feature selection

  • Traditionally, stepwise regression was used to perform feature selection, but with advances in machine learning, ridge and lasso regression provide very good alternatives


Typical use cases:

  • Ridge is mainly used to prevent overfitting; since it retains all the features, it is not very useful when the number of features is extremely large

  • Since it produces sparse solutions, lasso is generally the model of choice when the features number in the millions or more


Presence of highly correlated features:

  • Ridge generally works well even in the presence of highly correlated features

  • Lasso arbitrarily selects one feature from a group of highly correlated features and reduces the coefficients of the rest to zero

  • Along with ridge and lasso, elastic net is another useful technique that combines both L1 and L2 regularization; it can be used to balance out the pros and cons of ridge and lasso regression, as in the sketch below
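A minimal sketch contrasting the three penalties with scikit-learn on simulated data; the feature counts and alpha values are purely illustrative:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# 20 features, only 5 of which actually drive the target
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)  # blends L1 and L2

# Ridge shrinks but keeps every coefficient; lasso zeroes most of them
print("ridge zero coefs:", int(np.sum(ridge.coef_ == 0)))
print("lasso zero coefs:", int(np.sum(lasso.coef_ == 0)))
print("enet zero coefs: ", int(np.sum(enet.coef_ == 0)))
```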


In Unit 3, the 12-mark questions are problems and programs on Naive Bayes and KNN.

