Natural language processing applies a variety of methods to extract patterns and build knowledge bases from text data. The n-gram model is one such language model: it uses the previous N-1 words (N being the number of words in each n-gram) to predict the next word.
Along with sequence prediction, n-gram models are used for spelling correction (as in Google search), language translation, and text summarization.
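To make the idea of an n-gram concrete, here is a minimal sketch that slides a window of N words over a sentence. The function name extract_ngrams and the example sentence are illustrative assumptions, not part of any particular library.

```python
def extract_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    # A sequence of length L contains L - n + 1 windows of size n
    # (and none at all when n is larger than L).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical example sentence, split on whitespace for simplicity.
tokens = "the cat sat on the mat".split()

bigrams = extract_ngrams(tokens, 2)   # N = 2: one previous word of context
trigrams = extract_ngrams(tokens, 3)  # N = 3: two previous words of context

print(bigrams)
print(trigrams)
```

In each n-gram, the first N-1 words act as the context and the last word is the one being predicted.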
The n-gram model is based on the idea of computing the probability of a sentence or sequence of words:
P(W) = P(w1, w2, w3, ..., wn)
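If we need to predict the upcoming word (w4) given the first three words, we need the conditional probability P(w4 | w1, w2, w3).
Here, we need to calculate the probability of several words occurring together, which can be represented as a joint probability and expanded using conditional probabilities.
Conditional probability can be written as: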
P(B | A) = P(A,B) / P(A)
=> P(A,B) = P(B | A) * P(A)
If we include more variables:
P(A,B,C,D,E) = P(A) P(B|A) P(C|A,B) P(D|A,B,C) P(E|A,B,C,D)
Therefore, we use the chain rule to compute the joint probability of the words in a sentence.
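The chain rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy corpus is made up, each conditional probability is estimated as count(history + word) / count(history) over sentence prefixes, and the helper names conditional_probability and chain_rule_probability are hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus; each string is one sentence.
corpus = [
    "i like green tea",
    "i like black tea",
    "i like green apples",
    "you like green tea",
]

# Count how often each sentence prefix (w1), (w1, w2), ... occurs.
prefix_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    for end in range(1, len(tokens) + 1):
        prefix_counts[tuple(tokens[:end])] += 1

total_sentences = len(corpus)


def conditional_probability(word, history):
    """Estimate P(word | history) as count(history + word) / count(history)."""
    history = tuple(history)
    # With an empty history, the denominator is the number of sentences.
    denominator = prefix_counts[history] if history else total_sentences
    if denominator == 0:
        return 0.0  # history never observed in the corpus
    return prefix_counts[history + (word,)] / denominator


def chain_rule_probability(tokens):
    """P(w1, ..., wn) = P(w1) * P(w2 | w1) * ... * P(wn | w1, ..., wn-1)."""
    probability = 1.0
    for i, word in enumerate(tokens):
        probability *= conditional_probability(word, tokens[:i])
    return probability


sentence_probability = chain_rule_probability("i like green tea".split())
print(sentence_probability)
```

For "i like green tea" the factors are 3/4, 3/3, 2/3 and 1/2, so the product is approximately 0.25, matching the fact that this sentence is one of the four in the toy corpus. Any sentence containing an unseen prefix gets probability 0 under this estimate.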