18CSE359T - NATURAL LANGUAGE PROCESSING UNIT 1

12M:

Using the different NLP methods, identify POS tagging

The POS tagging problem is to determine the POS tag for a particular istance of a word
POS - Parts of Speech

Methods to assign POS to words:

Rule based tagging
Learning based:

Stochastic models: HMM tagging, maximum entropy tagging
Rule learning: transformation based tagging

Rule based tagging:

Two stage architectureENGTWOL tagging

Stage 1 - Use dictionary to assign each word a list of potential POS
Stage 2 - Hand-written disambiguation rules to fine tune to single POS

Disambiguation rule:

Adverbial-THAT rule:-

Given input: ‘that’

(+1 A/ADV/QUANT);

(+2SENT-LIM);

(NOT -1 SVOC/A);

then eliminate non-ADV tags

else eliminate ADV tag

I consider that odd. NOT ADV

It isn’t that odd. ADV

Stochastic POS Tagging:

Pick the most likely tag for the word
Example:

She is expected to race tomorrow
The race for the cup begins!

Transformation based tagging:

Combination of rule based tagging and stochastic tagging
The TBL algorithm has a set of tagging rule

1 - Corpus is tagged using broadest rule
2 - specific rule
3 - narrower rule

Eg: She is expected to race(NN) tomorrow

Change NN to VB when the previous tag is TO

She is expected to race(VB) tomorrow

Explain in detail various approaches to NLP tasks

Approaches on NLP tasks:

Rule based approach
Statistical / Stochastic Approach
Machine Learning Approach

Rule Based Approach:

It is the oldest, tried and tested approach that has proved its efficiency through its results
It efficiently decodes the linguistic relationship between words to translate the sentence
It can be counted as the ‘fill in the blanks’ method
It is easily achieved through focus on pattern matching or parsing

Statistical approach:

A stochastic model is said to be probabilistic or statistical, if its representation is from the theories of probability and statistics respectively
Maximum entropy model, Hidden Markov Models, Expectation maximization algorithm, Support Vector Machines
Example:

Word sense disambiguation uses bayesian classifiers
Part of speech tagging using Hidden markov Models
Machine translation using probabilistic models
Syntactic parsing using probabilistic grammars

Machine learning approach:

Algorithm based system that aids the machine to learn and understand the language by using statistical methods
It does not require explicit programming
Training data or Annotated corpus - one that has a corpus with mark-up
A general ML system of a training set or training a model on defined parameters, followed by fitting on test data

Neural network and deep learning approach:

Streams of raw parameters without engineered features are fed into neural networks
It has a massive training corpus or training examples that assure better accuracy

Can text be directly used in the Deep Learning approach? If not, give a solution.

Machine learning algorithms cannot work with raw text directly
The text must be converted into number; specifically, vectors of numbers

Word vectors:

Word vectors represent words as multidimensional continuous floating point numbers where semantically similar words are mapped to proximate points in geometric points
This means that words such as wheel and engine should have similar word vectors to the word vector whereas the word banana should be quite distant
The beauty of representing words as vectors is that they lend themselves to mathematical operators
For example, we can add and subtract vectors
In this canonical example, by using word vectors we can determine that King - Man + Woman = Queen

Where do word vectors come from?

Counts of word/context co-occurrences
Predictions of context given word

Machine learning approach:

Algorithm based system that aids the machine to learn and understand the language by using statistical methods
It does not require explicit programming
Training data or Annotated corpus - one that has a corpus with mark-up
A general ML system of a training set or training a model on defined parameters, followed by fitting on test data

Neural network and deep learning approach:

Streams of raw parameters without engineered features are fed into neural networks
It has a massive training corpus or training examples that assure better accuracy

Explain an application where the statistical method can be applied in any of NLP tasks.

Not sure what to write for this answer…but you can write about POS tagging (Used by Google to improve search quality)

Search This Blog

loveandhate

18CSE359T - NATURAL LANGUAGE PROCESSING UNIT 1 - 12M

Comments

Post a Comment

Popular posts from this blog

18ECO124T - HUMAN ASSIST DEVICES UNIT 4 & 5 - 12M

18CSE359T - NATURAL LANGUAGE PROCESSING UNIT 2 & 3 - 12M