Getting Started With NLP



As the name suggests, Natural Language Processing (NLP) is the processing of the languages we humans speak to communicate with each other. NLP is the practice of understanding and deriving knowledge from these languages in such a way that a computer can perform tasks just by understanding what we speak or write, as if we were speaking to another human being.

Let’s say we are given the sentence “A cop is chasing a man in the streets”. In NLP we generally perform the following steps to collect the varied meanings of this sentence:

  • Lexical Analysis: This is also called POS tagging. It entails marking each word in a sentence as a noun, verb, adjective, etc. (its part of speech).
Lexical Analysis Example
  • Syntactic Analysis: This step parses the sentence into a structure which gives a broader understanding of it.
Syntactic Analysis Example
  • Semantic Analysis: This step identifies the words that are central to understanding the sentence. It is also referred to as entity extraction.

In the above example sentence, the system identifies the important words and maps each to a variable:

cop → c1
man → m1
streets → s1

The sentence becomes “c1 is chasing m1 in s1”. We can see how this helps: c1 is now a variable that can accept different values, and similarly m1 and s1. This helps in understanding other, similar sentences.
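The entity-normalization step above can be sketched in a few lines of plain Python. This is a toy illustration only; the set of entity words is hard-coded here, whereas in a real system it would come from an earlier analysis step:

```python
from collections import Counter

def normalize_entities(sentence, entity_words):
    """Replace each entity word with a variable such as c1, m1, s1."""
    counts = Counter()   # how many variables already start with each letter
    mapping = {}         # entity word -> variable name
    out = []
    for tok in sentence.lower().split():
        if tok in entity_words:
            if tok not in mapping:
                counts[tok[0]] += 1
                mapping[tok] = f"{tok[0]}{counts[tok[0]]}"
            out.append(mapping[tok])
        else:
            out.append(tok)
    return " ".join(out), mapping

text, mapping = normalize_entities(
    "A cop is chasing a man in the streets", {"cop", "man", "streets"}
)
print(text)     # a c1 is chasing a m1 in the s1
print(mapping)  # {'cop': 'c1', 'man': 'm1', 'streets': 's1'}
```

Any new sentence mentioning a cop, a man, or streets would now map onto the same variables, which is exactly what makes similar sentences comparable.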

  • Intent or Goal Analysis: Based on all the data collected so far, we draw inferences. This gives us more insight into the meaning and context of a sentence.

For example, one inference could be,

The cop is chasing the man because he is a suspect.
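A drastically simplified sketch of such inference, with a single hand-written rule. The rule table is invented for illustration; real systems learn or encode many such rules:

```python
def infer_intent(subject, verb, obj):
    """Toy rule-based inference over an extracted (subject, verb, object) triple."""
    rules = {
        ("cop", "chasing"): "the object of the chase is likely a suspect",
    }
    # Fall back to a neutral answer when no rule matches
    return rules.get((subject, verb), "no inference available")

print(infer_intent("cop", "chasing", "man"))
# the object of the chase is likely a suspect
```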

  • Pragmatic Analysis: Finally, more knowledge can be extracted in this step. We extrapolate the data at hand to delve deeper into the meaning of a sentence. For example:


The man might have escaped the jail.

The man might have committed a crime, so he is dangerous and we need to stay away.

Roadblocks to a Perfect NLP System

A computer must go through all the steps mentioned above to build an understanding of a sentence, while we humans understand it effortlessly. There are a number of reasons for this; some of them are:

  • Languages like English and Chinese were designed for humans to communicate efficiently, not for computers. Computers have their own languages (C, Python, Java, etc.), which are in turn difficult for us to understand.
  • In a sentence we omit a lot of common-sense knowledge which we assume the listener or reader already knows.
  • We leave a lot of ambiguity in a sentence which we assume the reader or listener knows how to resolve (sentence-level ambiguity).

For example: “I saw a man on a hill with a telescope.” This sentence can have several interpretations:

There’s a man on a hill, and I’m watching him with my telescope.

There’s a man on a hill, who I’m seeing, and he has a telescope.

There’s a man, and he’s on a hill that also has a telescope on it.

Which one is correct? This is where background knowledge, reasoning, and common sense come in handy, and why it is easy for us to pick the right one.

  • We also use the same word with different meanings in different contexts (word-level ambiguity).

For example, the word “point” has a different meaning in each of the sentences below:

A point in 2-dimensional space.

I am making my point with this sentence.

Point of a ball pen.
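One classic way to resolve such word-level ambiguity is the Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the sentence’s context. A minimal sketch, where the three glosses are made up for illustration:

```python
def lesk_lite(context_words, senses):
    """Pick the sense whose gloss overlaps most with the context words."""
    context = {w.lower() for w in context_words}

    def overlap(gloss):
        return len(context & set(gloss.lower().split()))

    return max(senses, key=lambda name: overlap(senses[name]))

# invented mini-dictionary of senses for "point"
senses = {
    "geometry": "a location in space with coordinates and no dimension",
    "argument": "an idea or claim a speaker is making in a discussion or sentence",
    "tip": "the sharp narrow end of an object such as a pen",
}
context = "I am making my point with this sentence".split()
print(lesk_lite(context, senses))  # argument
```

Here “making” and “sentence” overlap with the gloss of the “argument” sense, so that sense wins.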

  • Background knowledge, which we build from our experiences growing up, is hard for a machine to attain in an open domain.

These are some of the major challenges we must overcome to build a perfect NLP system. Research in this direction is ongoing, and our state-of-the-art implementations are good, but we are still far from a system as capable as we are.

Best Practices

Until recently, almost all NLP techniques were based on statistical machine learning involving SVMs, logistic regression, decision trees, etc. These were built on sparse inputs (arrays having most of their elements as zeros and only a few non-zero elements).

Sparse inputs do not capture the semantics of a word; instead they represent a word from a statistical point of view, for example the frequency of the word with respect to a set of documents.
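A term-frequency bag-of-words vector is the classic example of such a sparse input. A quick sketch, with a vocabulary chosen arbitrarily for illustration:

```python
def bag_of_words(doc, vocabulary):
    """Term-frequency vector: one slot per vocabulary word, mostly zeros."""
    counts = {}
    for tok in doc.lower().split():
        counts[tok] = counts.get(tok, 0) + 1
    return [counts.get(word, 0) for word in vocabulary]

vocab = ["cop", "man", "chasing", "telescope", "hill", "point"]
print(bag_of_words("a cop is chasing a man in the streets", vocab))
# [1, 1, 1, 0, 0, 0]
```

Notice that the vector says nothing about what “cop” means, only how often it occurs; that is exactly the limitation dense embeddings address.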

Since 2012, NLP has seen a lot of success as it has switched from sparse inputs to dense inputs (called word embeddings) based on nonlinear neural networks. Lower-dimensional word embeddings improve the effectiveness of tasks such as entity recognition and part-of-speech tagging, whereas higher-dimensional word embeddings are better suited for tasks such as intent extraction.
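Dense embeddings let us compare words geometrically, typically via cosine similarity. A sketch with tiny made-up 4-dimensional vectors (real embeddings have 50 to 300+ learned dimensions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# made-up toy embeddings, for illustration only
emb = {
    "cop":    [0.9, 0.1, 0.3, 0.0],
    "police": [0.8, 0.2, 0.4, 0.1],
    "banana": [0.0, 0.9, 0.0, 0.7],
}

# related words should score higher than unrelated ones
print(cosine(emb["cop"], emb["police"]) > cosine(emb["cop"], emb["banana"]))  # True
```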

Advanced deep neural networks such as bidirectional LSTMs and attention-based LSTMs have become common in NLP tasks and are progressively being used in applications we rely on daily, like search engines and Google Translate. Most of these networks have grown deeper; Google’s NMT, for example, is 16 layers deep (an 8-layer encoder plus an 8-layer decoder). Contrary to popular belief, a deeper and wider network does not always perform better; in some complex cases shallow networks perform better.

With such a huge open-source machine learning community, training models from scratch is often not practical. Transfer learning helps shorten training time significantly. In NLP, however, the applicability of transfer learning is limited, as the use cases and problem domain vary from client to client. Nevertheless, pre-trained embeddings like GloVe and word2vec can still be used.
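Pre-trained GloVe vectors are distributed as plain text, one word per line followed by its vector components. A minimal loader sketch; the two-word sample is made up, while real files contain hundreds of thousands of words:

```python
import io

def load_embeddings(fileobj):
    """Parse GloVe's text format: 'word v1 v2 ... vn' per line."""
    vectors = {}
    for line in fileobj:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

# tiny made-up sample in the same format
sample = io.StringIO("cop 0.9 0.1 0.3\nman 0.2 0.8 0.5\n")
vecs = load_embeddings(sample)
print(vecs["cop"])  # [0.9, 0.1, 0.3]
```

In practice you would pass an open file handle to a downloaded file such as `glove.6B.100d.txt` instead of the in-memory sample.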

The ensemble approach is a proven way to improve model performance: it captures the diversity in a dataset more effectively than a single model can. Similarly, multiple DNN models can be combined for improved performance, and in some use cases a model from the linear machine learning world can be combined with a DNN model to improve performance further. Note that this approach comes at the cost of increased computation.
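The simplest such combination is soft voting: average the probability outputs of the individual models. A sketch with stand-in “models” that just return fixed probabilities; real ones would be trained classifiers:

```python
def ensemble_predict(models, x):
    """Soft voting: average the positive-class probability across models."""
    return sum(model(x) for model in models) / len(models)

# stand-in models, each returning a fixed probability for illustration
models = [lambda x: 0.9, lambda x: 0.7, lambda x: 0.8]
print(round(ensemble_predict(models, None), 3))  # 0.8
```

The averaged prediction is less sensitive to any single model’s mistake, which is where the performance gain comes from.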

To Summarize

Computers must perform several steps to understand a sentence, and today they are far from perfect at it. The statistical techniques used are by no means trivial; decades’ worth of research went into them, and they are still quite effective. Ever since the breakthrough in DNNs, NLP has been riding a wave of success, and only time will tell how far it takes us.




Sarwesh Suman

Senior SDE @Amazon