18CSE479T - Statistical Machine Learning UNIT 2 & 3 (12 MARKS)
12M:
The two questions below are Unit 2 theory 12-mark questions. The remaining Unit 2 12-mark questions are problems and programs on linear regression, which are more important and have a high chance of being asked.
Explain in detail the terminologies involved in Logistic regression using suitable examples.
Information value (IV):
This is very useful in the preliminary filtering of variables prior to including them in the model
It eliminates the bulk of the variables in the first step, before the model is fitted, since the final model typically contains only about 10 variables
A total IV of 0.0356, for example, marks the variable as a weak predictor for classifying events (as a rule of thumb: IV below 0.02 is not useful, 0.02-0.1 weak, 0.1-0.3 medium, 0.3-0.5 strong)
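A minimal sketch of how IV is computed from a binned variable, assuming the common weight-of-evidence (WOE) convention; the data and column names below are purely illustrative:

```python
import numpy as np
import pandas as pd

# Illustrative data: a binned predictor and a binary event flag
df = pd.DataFrame({
    "age_band": ["<25", "<25", "25-40", "25-40", "40+", "40+"],
    "event":    [1, 0, 0, 0, 1, 0],
})

grouped = df.groupby("age_band")["event"].agg(events="sum", total="count")
grouped["non_events"] = grouped["total"] - grouped["events"]

# Share of all events / non-events falling in each bin (epsilon avoids log(0))
eps = 1e-6
pct_events = grouped["events"] / grouped["events"].sum() + eps
pct_non_events = grouped["non_events"] / grouped["non_events"].sum() + eps

woe = np.log(pct_non_events / pct_events)          # weight of evidence per bin
iv = ((pct_non_events - pct_events) * woe).sum()   # total information value
print(f"IV = {iv:.4f}")
```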
Akaike Information Criterion (AIC):
This measures the relative quality of a statistical model for a given set of data
It is computed as AIC = 2k - 2 ln(L), where k is the number of parameters and L is the maximized likelihood, so it captures the trade-off between bias and variance (fit versus complexity)
When comparing two models, the one with the lower AIC is preferred
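A minimal sketch of an AIC comparison using statsmodels on simulated data (the models and data here are illustrative, not from the notes):

```python
import numpy as np
import statsmodels.api as sm

# Simulated binary-outcome data
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 3)))
y = (X @ np.array([0.5, 1.0, -1.0, 0.0]) + rng.normal(size=200) > 0).astype(int)

# AIC = 2k - 2 ln(L): fit a full and a reduced logistic model and compare
full = sm.Logit(y, X).fit(disp=0)
reduced = sm.Logit(y, X[:, :3]).fit(disp=0)   # drop the last predictor
print(full.aic, reduced.aic)                  # the lower AIC is preferred
```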
Rank ordering:
After sorting observations in descending order by predicted probabilities, deciles are created
Summing the events within each decile gives the aggregated event count per decile; for a model that rank-orders well, these counts should also be in descending order
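A minimal pandas sketch of decile-based rank ordering on simulated scores (data is illustrative):

```python
import numpy as np
import pandas as pd

# Simulated scored portfolio: predicted probabilities and actual outcomes
rng = np.random.default_rng(1)
scores = pd.DataFrame({"prob": rng.uniform(size=1000)})
scores["event"] = rng.binomial(1, scores["prob"])

# Sort descending by predicted probability, then cut into 10 deciles
scores = scores.sort_values("prob", ascending=False).reset_index(drop=True)
scores["decile"] = pd.qcut(scores.index, 10, labels=range(1, 11))

# Aggregated events per decile should decrease from decile 1 to decile 10
print(scores.groupby("decile", observed=True)["event"].sum())
```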
Concordance / c-statistic:
This is a measure of the quality of fit for a binary outcome in a logistic regression model
It is the proportion of concordant pairs (event/non-event pairs in which the event observation receives the higher predicted probability) out of all such pairs, and for a binary outcome it equals the area under the ROC curve (AUC)
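Because the c-statistic for a binary outcome equals the ROC AUC, it can be computed directly with scikit-learn; a minimal sketch with made-up values:

```python
from sklearn.metrics import roc_auc_score

# Made-up actual outcomes and predicted probabilities
y_true = [1, 0, 1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.6, 0.3]

# For a binary outcome the c-statistic equals the area under the ROC curve
print(roc_auc_score(y_true, y_prob))  # 1.0: every event outranks every non-event
```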
Population stability index(PSI):
This metric is used to measure the drift in the current population (on which the credit scoring model will be used) relative to the population the model was developed on
PSI <= 0.1 - no change
0.1 < PSI <= 0.25 - some change
PSI > 0.25 - large change
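A minimal sketch of the PSI computation, PSI = Σ (actual% - expected%) × ln(actual% / expected%), on made-up score-band distributions:

```python
import numpy as np

# Made-up score-band proportions: expected (development sample)
# versus actual (current population); each sums to 1
expected = np.array([0.10, 0.20, 0.30, 0.25, 0.15])
actual = np.array([0.12, 0.18, 0.28, 0.26, 0.16])

# PSI = sum over bands of (actual% - expected%) * ln(actual% / expected%)
psi = np.sum((actual - expected) * np.log(actual / expected))
print(f"PSI = {psi:.4f}")  # well under 0.1 here, so no significant shift
```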
Differences between Ridge and Lasso Regression
Ridge Regression:
The penalty term uses the squares of the coefficients: the model minimizes RSS + λ Σ βj²
This L2 regularization shrinks the coefficients towards zero but does not set any of them exactly to zero
Lasso Regression:
The penalty term uses the magnitudes (absolute values) of the coefficients: the model minimizes RSS + λ Σ |βj|
This L1 regularization can shrink some coefficients to exactly zero
LASSO stands for Least Absolute Shrinkage and Selection Operator
Key difference:
Ridge includes all (or none) of the features in the model; thus its major advantage is coefficient shrinkage and reduced model complexity
Along with shrinking coefficients, lasso also performs feature selection
Traditionally stepwise regression was used to perform feature selection. But with advancements in machine learning, ridge and lasso regression provide very good alternatives
Typical use cases:
Ridge is used mainly to prevent overfitting; since it keeps all the features, it is not very useful when the number of features is exorbitantly high
Since it produces a sparse solution, lasso is generally the model of choice when the features number in the millions or more
Presence of highly correlated features:
Ridge generally works well even in the presence of highly correlated features
Lasso arbitrarily selects any one feature among the highly correlated ones and reduces the coefficients of the rest to zero
Along with ridge and lasso, elastic net is another useful technique which combines both L1 and L2 regularization. It can be used to balance out the pros and cons of ridge and lasso regression
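A minimal scikit-learn sketch contrasting the three penalties on synthetic data (all parameter values are illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Synthetic data: 10 features, only 3 of them actually informative
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)                    # L2: shrinks all coefficients
lasso = Lasso(alpha=1.0).fit(X, y)                    # L1: zeroes some out entirely
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)  # blend of L1 and L2

print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))  # typically none
print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))  # several: feature selection
print("Elastic net zero coefficients:", np.sum(enet.coef_ == 0))
```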
In Unit 3, the 12-mark questions are problems and programs on Naive Bayes and KNN