POS (Part-Of-Speech) Tagging & Chunking with NLTK

POS Tagging


Part-of-speech (POS) tagging reads text in a given language and assigns a grammatical tag (noun, verb, adjective, and so on) to each word.

e.g.

Input: Everything to permit us.

Output: [('Everything', 'NN'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP')]

Steps Involved:

Tokenize the text (word_tokenize).
Apply pos_tag to the tokens from the previous step, that is, nltk.pos_tag(tokenize_text).

Some common tags and their meanings are listed below:

Abbreviation	Meaning
CC	coordinating conjunction
CD	cardinal number
DT	determiner
EX	existential there
FW	foreign word
IN	preposition/subordinating conjunction
JJ	adjective (large)
JJR	adjective, comparative (larger)
JJS	adjective, superlative (largest)
LS	list marker
MD	modal (could, will)
NN	noun, singular (cat, tree)
NNS	noun, plural (desks)
NNP	proper noun, singular (Sarah)
NNPS	proper noun, plural (Indians, Americans)
PDT	predeterminer (all, both, half)
POS	possessive ending (parent's)
PRP	personal pronoun (hers, herself, him, himself)
PRP$	possessive pronoun (her, his, mine, my, our)
RB	adverb (occasionally, swiftly)
RBR	adverb, comparative (greater)
RBS	adverb, superlative (biggest)
RP	particle (about)
TO	infinitive marker (to)
UH	interjection (goodbye)
VB	verb (ask)
VBG	verb, gerund (judging)
VBD	verb, past tense (pleaded)
VBN	verb, past participle (reunified)
VBP	verb, present tense, not 3rd person singular (wrap)
VBZ	verb, present tense, 3rd person singular (bases)
WDT	wh-determiner (that, what)
WP	wh- pronoun (who)
WRB	wh- adverb (how)
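
Because related tags share a prefix (NN, NNS, NNP, NNPS all start with NN), tagged words can be grouped into coarse categories with plain Python. The sketch below uses the hand-written tags from the example sentence, so it needs no tagger model. The PREFIX_GROUPS mapping and the group_by_category helper are illustrative assumptions, not part of NLTK.

```python
from collections import defaultdict

# (word, tag) pairs written by hand, in the format that nltk.pos_tag returns.
tagged = [("Everything", "NN"), ("to", "TO"), ("permit", "VB"), ("us", "PRP")]

# Illustrative grouping by tag prefix; extend it as needed.
PREFIX_GROUPS = {"NN": "noun", "VB": "verb", "JJ": "adjective", "RB": "adverb"}


def group_by_category(tagged_tokens):
    """Collect words under the first category whose prefix matches their tag."""
    groups = defaultdict(list)
    for word, tag in tagged_tokens:
        category = next(
            (name for prefix, name in PREFIX_GROUPS.items() if tag.startswith(prefix)),
            "other",
        )
        groups[category].append(word)
    return dict(groups)


groups = group_by_category(tagged)
print(groups)
```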

A POS tagger assigns grammatical information to each word of a sentence. The examples that follow assume that NLTK is already installed and imported and that its data packages have been downloaded.

Chunking

Chunking adds more structure to a sentence on top of part-of-speech (POS) tagging by grouping tagged words into phrases. The resulting groups of words are called "chunks." Chunking is also known as shallow parsing or light parsing: in shallow parsing there is at most one level between the root and the leaves, whereas deep parsing produces more than one level.
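
The "at most one level" property can be checked on a small chunk tree. The sketch below uses hand-written tags (an assumption, so that no tagger model is required) and an illustrative noun phrase rule. Tree.height() counts the root, the chunk level, and the leaf level.

```python
from nltk import RegexpParser, Tree

# Hand-tagged sentence, in the format that nltk.pos_tag returns.
tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD")]

# Illustrative rule: optional determiner, any adjectives, then a noun.
chunker = RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tree = chunker.parse(tagged)
print(tree)

# Root (S) -> chunk (NP) -> leaves: a height of 3, so there is exactly one
# level between the root and the chunked leaves.
print(tree.height())

# Every chunk holds only leaves, never another chunk.
chunks_are_flat = all(
    not isinstance(leaf, Tree)
    for child in tree if isinstance(child, Tree)
    for leaf in child
)
print(chunks_are_flat)
```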

The primary use of chunking is to group words into "noun phrases." Chunks are defined by combining part-of-speech tags with regular expressions.

Rules for Chunking:

There are no predefined rules; you combine tag patterns according to your needs.

For example, suppose you need to chunk nouns, past-tense verbs, adjectives, and coordinating conjunctions in a sentence. You can use the rule below:

chunk:{<NN.?>*<VBD.?>*<JJ.?>*<CC>?}

The following table shows what each symbol means:

Symbol	Description
.	Any character except a newline
*	Match 0 or more repetitions
?	Match 0 or 1 repetition
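
The effect of these symbols can be tried with Python's re module. This is only an approximation: NLTK translates tag patterns into regular expressions in its own way, so the code below illustrates the symbols rather than NLTK's internals. The tag lists are made up for the demonstration.

```python
import re

# "NN.?" means "NN" followed by at most one more character.
tag_body = r"NN.?"
tags = ["NN", "NNS", "NNP", "NNPS", "JJ"]
matches = {tag: bool(re.fullmatch(tag_body, tag)) for tag in tags}
print(matches)
# NN, NNS and NNP match; NNPS has two extra characters, so it does not.

# "?" and "*" applied to whole tags: optional determiner, any number of
# adjectives, then one noun.
sequence = re.compile(r"(<DT>)?(<JJ>)*<NN>")
print(bool(sequence.fullmatch("<DT><JJ><JJ><NN>")))  # True
print(bool(sequence.fullmatch("<NN>")))              # True
print(bool(sequence.fullmatch("<DT><DT><NN>")))      # False
```

One consequence worth noticing is that <NN.?> does not cover NNPS. A pattern such as <NN.*> would.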

Now let us write some code to understand the rule better:

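from nltk import pos_tag
from nltk import RegexpParser

# Split the sentence into words on whitespace
text = "learn php from guru99 and make study easy".split()
print("After Split:", text)

# Tag each word with its part of speech
tokens_tag = pos_tag(text)
print("After Token:", tokens_tag)

# Rule: nouns, then past-tense verbs, then adjectives, then an optional conjunction
patterns = """mychunk:{<NN.?>*<VBD.?>*<JJ.?>*<CC>?}"""
chunker = RegexpParser(patterns)
print("After Regex:", chunker)

# Apply the rule to the tagged words
output = chunker.parse(tokens_tag)
print("After Chunking", output)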

Output

After Split: ['learn', 'php', 'from', 'guru99', 'and', 'make', 'study', 'easy']
After Token: [('learn', 'JJ'), ('php', 'NN'), ('from', 'IN'), ('guru99', 'NN'), ('and', 'CC'), ('make', 'VB'), ('study', 'NN'), ('easy', 'JJ')]
After Regex: chunk.RegexpParser with 1 stages:
RegexpChunkParser with 1 rules:
       <ChunkRule: '<NN.?>*<VBD.?>*<JJ.?>*<CC>?'>
After Chunking (S
  (mychunk learn/JJ)
  (mychunk php/NN)
  from/IN
  (mychunk guru99/NN and/CC)
  make/VB
  (mychunk study/NN easy/JJ))

Conclusion from the above example: "make" is tagged VB, which the rule does not cover (the rule only matches VBD), so it is not placed in a mychunk group. The same applies to "from", which is tagged IN.
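
To see how the rule decides what gets chunked, the sketch below compares the original rule with a widened one that uses <VB.?> instead of <VBD.?>. It reuses the tags from the output above as hand-written input (an assumption, so that no tagger model is needed). The chunk_phrases helper is hypothetical.

```python
from nltk import RegexpParser

# Tagged words copied from the "After Token" output above.
tagged = [("learn", "JJ"), ("php", "NN"), ("from", "IN"), ("guru99", "NN"),
          ("and", "CC"), ("make", "VB"), ("study", "NN"), ("easy", "JJ")]


def chunk_phrases(grammar, tagged_tokens, label="mychunk"):
    """Return the text of every chunk carrying the given label (hypothetical helper)."""
    tree = RegexpParser(grammar).parse(tagged_tokens)
    return [
        " ".join(word for word, _ in subtree.leaves())
        for subtree in tree.subtrees(lambda t: t.label() == label)
    ]


# Original rule: only past-tense verbs (VBD) are allowed in a chunk.
original = chunk_phrases("mychunk: {<NN.?>*<VBD.?>*<JJ.?>*<CC>?}", tagged)
# Widened rule: base-form verbs (VB) and the other VB* tags are allowed as well.
widened = chunk_phrases("mychunk: {<NN.?>*<VB.?>*<JJ.?>*<CC>?}", tagged)

print(original)
print(widened)
```

With the widened rule, "make" ends up in a chunk of its own, while "from" stays outside in both cases because neither rule mentions IN.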

Use Case of Chunking

Chunking is used for entity detection. An entity is the part of a sentence from which the machine gets the value for an intention.

Example: 
Temperature of New York. 
Here Temperature is the intention and New York is an entity.
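
A rough sketch of how chunking could surface these two pieces is shown below. The tags are written by hand (an assumption; a real tagger may label the words differently), and the rule simply groups consecutive noun tags. Deciding which phrase is the intention and which is the entity is left to the application.

```python
from nltk import RegexpParser

# Hand-assigned tags for the example phrase.
tagged = [("Temperature", "NN"), ("of", "IN"), ("New", "NNP"), ("York", "NNP")]

# Illustrative rule: one or more consecutive noun tags form a noun phrase.
tree = RegexpParser("NP: {<NN.*>+}").parse(tagged)
print(tree)

# Collect the text of each noun phrase chunk.
noun_phrases = [
    " ".join(word for word, _ in subtree.leaves())
    for subtree in tree.subtrees(lambda t: t.label() == "NP")
]
print(noun_phrases)
```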

In other words, chunking is used to select subsets of tokens. The code below shows how chunking selects tokens: it chunks the noun phrases in a sentence and draws the resulting tree as a graph.

Code to Demonstrate Use Case

import nltk

text = "learn php from guru99"
tokens = nltk.word_tokenize(text)
print(tokens)
tag = nltk.pos_tag(tokens)
print(tag)
# Noun phrase: optional determiner, any number of adjectives, then a noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(tag)
print(result)
result.draw()    # Opens a window that draws the noun phrase chunk tree

Output:

['learn', 'php', 'from', 'guru99']  -- These are the tokens
[('learn', 'JJ'), ('php', 'NN'), ('from', 'IN'), ('guru99', 'NN')]   -- These are the pos_tag
(S (NP learn/JJ php/NN) from/IN (NP guru99/NN))        -- Noun Phrase Chunking

Graph

[Figure: noun phrase chunking graph]

From the graph, we can conclude that "learn php" and "guru99" are two separate chunks, both categorized as Noun Phrase (NP), whereas the token "from" does not belong to any Noun Phrase.

Chunking groups different tokens into the same chunk. The result depends on the grammar that has been selected. Chunking is also used to tag patterns and to explore text corpora.
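
When a graphical window is not available, the same chunk membership can be read as per-token labels. The sketch below converts a chunk tree into (word, POS tag, chunk tag) triples with tree2conlltags from nltk.chunk. The input tags are copied by hand from the output above, so no tagger model is needed.

```python
from nltk import RegexpParser
from nltk.chunk import tree2conlltags

# Tagged words copied from the output above.
tagged = [("learn", "JJ"), ("php", "NN"), ("from", "IN"), ("guru99", "NN")]

tree = RegexpParser("NP: {<DT>?<JJ>*<NN>}").parse(tagged)

# B-NP marks the first token of a noun phrase, I-NP a token inside one,
# and O a token outside every chunk.
iob_tags = tree2conlltags(tree)
for word, pos, chunk_tag in iob_tags:
    print(word, pos, chunk_tag)
```

Here "learn" starts a noun phrase, "php" continues it, "from" is outside any chunk, and "guru99" starts a second noun phrase.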

Copyright © 2019 HackingKaGuru | Designed for r4 - r4i gold, r4 3ds, r4