N-gram Models

N-grams are contiguous sequences of n items (words, characters) from a given text. N-gram models use these sequences to estimate the probability of the next word, forming the basis of statistical language modeling.

N-gram Probability

P(w_n | w_1, w_2, ..., w_{n-1}) \approx P(w_n | w_{n-N+1}, ..., w_{n-1})

Types of N-grams

N	Name	Example from "I love NLP"
1	Unigram	"I", "love", "NLP"
2	Bigram	"I love", "love NLP"
3	Trigram	"I love NLP"

from nltk.util import ngrams
from nltk.tokenize import word_tokenize

text = "Natural language processing is powerful"
tokens = word_tokenize(text)

unigrams = list(ngrams(tokens, 1))
bigrams = list(ngrams(tokens, 2))
trigrams = list(ngrams(tokens, 3))

print("Unigrams:", unigrams)
print("Bigrams:", bigrams)
print("Trigrams:", trigrams)

Building an N-gram Model

from collections import defaultdict, Counter

class NGramModel:
    def __init__(self, n=2):
        self.n = n
        self.ngram_counts = defaultdict(Counter)
        self.context_counts = defaultdict(int)

    def train(self, corpus):
        for sentence in corpus:
            tokens = ['<s>'] * (self.n - 1) + sentence + ['</s>']
            for i in range(self.n - 1, len(tokens)):
                context = tuple(tokens[i - self.n + 1:i])
                word = tokens[i]
                self.ngram_counts[context][word] += 1
                self.context_counts[context] += 1

    def probability(self, word, context):
        context = tuple(context) if not isinstance(context, tuple) else context
        count = self.ngram_counts[context][word]
        total = self.context_counts[context]
        if total == 0:
            return 0
        return count / total

    def sentence_probability(self, sentence):
        tokens = ['<s>'] * (self.n - 1) + sentence + ['</s>']
        prob = 1.0
        for i in range(self.n - 1, len(tokens)):
            context = tuple(tokens[i - self.n + 1:i])
            word = tokens[i]
            prob *= self.probability(word, context)
        return prob

# Train bigram model
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["the", "cat", "chased", "the", "dog"]
]

model = NGramModel(n=2)
model.train(corpus)

# Query probabilities
print(model.probability("cat", ("the",)))  # P(cat | the)
print(model.probability("sat", ("cat",)))  # P(sat | cat)

Bigram Probability

P(w_i | w_{i-1}) = \frac{C(w_{i-1}, w_i)}{C(w_{i-1})}

Smoothing Techniques

Raw n-gram counts can be zero for unseen combinations. Smoothing handles this by redistributing probability mass.

Laplace Smoothing

P_{Laplace}(w_i | w_{i-1}) = \frac{C(w_{i-1}, w_i) + 1}{C(w_{i-1}) + V}

def laplace_probability(self, word, context):
    context = tuple(context) if not isinstance(context, tuple) else context
    count = self.ngram_counts[context][word]
    total = self.context_counts[context]
    V = len(set(w for counts in self.ngram_counts.values() for w in counts))
    return (count + 1) / (total + V)

Perplexity

PPL = 2^{-\frac{1}{N}\sum_{i=1}^{N} \log_2 P(w_i | w_{<i})}

import math

def perplexity(model, test_corpus):
    total_log_prob = 0
    total_tokens = 0
    for sentence in test_corpus:
        tokens = ['<s>'] * (model.n - 1) + sentence + ['</s>']
        for i in range(model.n - 1, len(tokens)):
            context = tuple(tokens[i - model.n + 1:i])
            word = tokens[i]
            prob = model.probability(word, context)
            if prob > 0:
                total_log_prob += math.log2(prob)
            else:
                total_log_prob += -100  # Penalty for zero prob
            total_tokens += 1
    return 2 ** (-total_log_prob / total_tokens)

test = [["the", "cat", "sat"]]
print("Perplexity:", perplexity(model, test))

N-gram Model Comparison

Model	Context Window	Parameters	Pros	Cons
Unigram	None	V	Simple	No word order
Bigram	1 word	V²	Captures local order	Limited context
Trigram	2 words	V³	Better context	Sparse data
4-gram	3 words	V⁴	Rich context	Very sparse

N-gram Models

N-gram Models

N-gram Probability

Types of N-grams

Building an N-gram Model

Bigram Probability

Smoothing Techniques

Laplace Smoothing

Perplexity

Perplexity

N-gram Model Comparison

Premium Content

Need Expert NLP Help?