18CSE479T - Statistical Machine Learning UNIT 2 & 3 (12 MARKS)

12M:

The two questions below are the Unit 2 theory 12-mark questions. The remaining Unit 2 12-mark questions are problems and programs on linear regression, which are more important and have a higher chance of being asked.


Explain in detail the terminologies involved in Logistic regression using suitable examples.

Information value(IV):

  • This is very useful in the preliminary filtering of variables before including them in the model

  • It eliminates most of the candidate variables in a first pass before fitting, since the final model would typically retain only about 10 variables

  • For each bin of a variable, the weight of evidence is WOE = ln(% of events / % of non events), and the bin's contribution to IV is (% of events - % of non events) x WOE; the variable's IV is the sum over all bins


RANGE   BIN NUMBER   EVENTS   NON EVENTS   % OF EVENTS   % OF NON EVENTS   [E] - [NE]   WOE     IV
0-50    1            40       394          5%            5%                0.003        0.062   0.0002
...     ...          ...      ...          ...           ...               ...          ...     ...
Total                744      7794                                                      IV total: 0.0356


  • The total IV of 0.0356 means this variable is a weak predictor for classifying events (an IV between 0.02 and 0.1 is conventionally read as weak)
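A minimal sketch of how the WOE and IV columns are computed, assuming a pandas DataFrame of per-bin counts; only the first bin and the totals come from the table above, and the middle bins are hypothetical fillers:

```python
import numpy as np
import pandas as pd

# Per-bin counts: bin 1 and the totals match the table above;
# the middle bins are hypothetical
bins = pd.DataFrame({"events":     [40, 120, 250, 334],
                     "non_events": [394, 1800, 3100, 2500]})

# Share of all events / non-events that falls in each bin
bins["pct_events"] = bins["events"] / bins["events"].sum()
bins["pct_non_events"] = bins["non_events"] / bins["non_events"].sum()

# Weight of evidence and per-bin information value
bins["woe"] = np.log(bins["pct_events"] / bins["pct_non_events"])
bins["iv"] = (bins["pct_events"] - bins["pct_non_events"]) * bins["woe"]

print(bins.round(4))
print("IV total:", round(bins["iv"].sum(), 4))
```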

Akaike Information Criteria(AIC):

  • This measures the relative quality of a statistical model for a given set of data

  • It formalizes the trade-off between goodness of fit and model complexity (bias versus variance): AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximized likelihood

  • When comparing two models, the one with the lower AIC is preferred
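A minimal sketch of the formula, with hypothetical log-likelihood values for two fitted models:

```python
def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln(L); lower is better."""
    return 2 * num_params - 2 * log_likelihood

# Hypothetical fits: model B gains little likelihood for 7 extra parameters
print(aic(log_likelihood=-520.3, num_params=5))   # 1050.6
print(aic(log_likelihood=-518.9, num_params=12))  # 1061.8 -> model A preferred
```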


Rank ordering:

  • After sorting observations in descending order by predicted probability, deciles are created

  • Summing the events within each decile gives the aggregated event count per decile; for a model that rank-orders well, these counts should be in descending order, as in the sketch below
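A minimal sketch of a rank-ordering check with pandas; the scored data here is simulated rather than taken from a real model:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
prob = rng.uniform(0, 1, 1000)     # simulated predicted probabilities
event = rng.binomial(1, prob)      # outcomes correlated with the score
df = pd.DataFrame({"pred_prob": prob, "event": event})

# Sort descending by predicted probability, then cut into 10 deciles
df = df.sort_values("pred_prob", ascending=False).reset_index(drop=True)
df["decile"] = pd.qcut(df.index, 10, labels=range(1, 11))

# Aggregated events per decile should decrease from decile 1 to decile 10
print(df.groupby("decile", observed=True)["event"].sum())
```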


Concordance/ c-statistic:

  • This is a measure of the quality of fit for a binary outcome in a logistic regression model: the proportion of event/non-event pairs in which the event received the higher predicted probability, which equals the area under the ROC curve
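A minimal sketch with scikit-learn, using hypothetical outcomes and predicted probabilities; for a binary outcome the c-statistic is the same number as the ROC AUC:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 0, 1, 1, 0, 1]                  # hypothetical outcomes
y_prob = [0.1, 0.3, 0.7, 0.4, 0.8, 0.6, 0.2, 0.9]  # hypothetical predictions

# Probability that a randomly chosen event is scored above a randomly
# chosen non-event: here every event outranks every non-event, so 1.0
print(roc_auc_score(y_true, y_prob))
```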


Population stability index(PSI):

  • This metric measures the drift between the population on which a credit scoring model was developed and the current population on which it is being used

  • PSI <= 0.1: no significant change

  • 0.1 < PSI <= 0.25: some change

  • PSI > 0.25: large change
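A minimal sketch of the computation, PSI = sum over bins of (actual% - expected%) x ln(actual% / expected%); the two bin distributions below are hypothetical:

```python
import numpy as np

def psi(expected_pct, actual_pct):
    """PSI over score bins; each argument is the share of its
    population falling in each bin and should sum to 1."""
    expected_pct, actual_pct = np.asarray(expected_pct), np.asarray(actual_pct)
    return float(np.sum((actual_pct - expected_pct)
                        * np.log(actual_pct / expected_pct)))

dev = [0.10, 0.20, 0.30, 0.25, 0.15]  # development-sample distribution
cur = [0.12, 0.18, 0.28, 0.26, 0.16]  # current-population distribution
print(psi(dev, cur))                  # about 0.008 -> no significant change
```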


Differences between Ridge and Lasso Regression

Ridge Regression:

  • The penalty term is the sum of the squares of the coefficients

  • This type of L2 regularization shrinks the coefficients towards zero but does not set them exactly to zero


Lasso Regression:

  • The penalty term is the sum of the absolute values (magnitudes) of the coefficients

  • This type of L1 regularization can shrink some coefficients exactly to zero

  • LASSO stands for Least Absolute Shrinkage and Selection Operator

Key difference:

  • Ridge includes all of the features in the model, since coefficients are shrunk but never set exactly to zero. Thus the major advantage of ridge regression is coefficient shrinkage and reduced model complexity

  • Along with shrinking coefficients, lasso also performs feature selection

  • Traditionally, stepwise regression was used to perform feature selection, but with advances in machine learning, ridge and lasso regression provide very good alternatives


Typical use cases:

  • Ridge is mainly used to prevent overfitting; since it retains all the features, it is not very useful when the number of features is extremely large

  • Since it produces sparse solutions, lasso is generally the model of choice when the features number in the millions or more


Presence of highly correlated features:

  • Ridge generally works well even in the presence of highly correlated features

  • Lasso arbitrarily selects one feature from a group of highly correlated features and reduces the coefficients of the rest to zero

  • Along with ridge and lasso, elastic net is another useful technique that combines both L1 and L2 regularization; it can be used to balance out the pros and cons of ridge and lasso regression, as in the sketch below
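A minimal sketch contrasting the three penalties with scikit-learn on simulated data; the feature counts and alpha values are purely illustrative:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# 20 features, only 5 of which actually drive the target
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)  # blends L1 and L2

# Ridge shrinks but keeps every coefficient; lasso zeroes most of them
print("ridge zero coefs:", int(np.sum(ridge.coef_ == 0)))
print("lasso zero coefs:", int(np.sum(lasso.coef_ == 0)))
print("enet zero coefs: ", int(np.sum(enet.coef_ == 0)))
```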


In Unit 3, the 12-mark questions are problems and programs on Naive Bayes and KNN.

