Named Entity Recognition

Named Entity Recognition (NER)

NER identifies and classifies named entities in text into predefined categories such as person names, organizations, locations, dates, and monetary values. It's a foundational task for information extraction and knowledge graph construction.

BIO Tagging Scheme

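BIO (Beginning-Inside-Outside) is the most widely used tagging scheme for NER. Every token receives exactly one tag: B-X marks the first token of an entity of type X, I-X marks a continuation of that entity, and O marks a token outside any entity: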

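| Tag   | Meaning                   | Example     |
|-------|---------------------------|-------------|
| B-PER | Beginning of person       | John        |
| I-PER | Inside (continuation)     | Smith       |
| B-ORG | Beginning of organization | Google      |
| I-ORG | Inside organization       | Inc         |
| B-LOC | Beginning of location     | New         |
| I-LOC | Inside location           | York        |
| O     | Outside any entity        | the, is, at |
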
# Example BIO tags
tokens = ["John", "Smith", "works", "at", "Google", "in", "New", "York"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "O", "B-LOC", "I-LOC"]
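
To make the scheme concrete, here is a minimal sketch of how BIO tags can be collapsed back into entity spans. The helper name `bio_to_spans` is hypothetical, and the handling of a stray `I-` tag (treating it as the start of a new entity) is an assumption; stricter decoders may reject such sequences instead.

```python
def bio_to_spans(tokens, tags):
    """Collapse parallel token/BIO-tag lists into (entity_text, label) pairs."""
    spans = []
    words, label = [], None  # tokens and type of the entity currently being built
    for token, tag in zip(tokens, tags):
        prefix, _, tag_label = tag.partition("-")
        # An I- tag of the same type extends the open entity.
        if prefix == "I" and words and tag_label == label:
            words.append(token)
            continue
        # Any other tag closes the open entity, if there is one.
        if words:
            spans.append((" ".join(words), label))
        if prefix in ("B", "I"):
            # B- starts a new entity; a stray I- is leniently treated the same way.
            words, label = [token], tag_label
        else:
            words, label = [], None
    if words:
        spans.append((" ".join(words), label))
    return spans


tokens = ["John", "Smith", "works", "at", "Google", "in", "New", "York"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "O", "B-LOC", "I-LOC"]
print(bio_to_spans(tokens, tags))
# [('John Smith', 'PER'), ('Google', 'ORG'), ('New York', 'LOC')]
```

The B- prefix is what lets two adjacent entities of the same type stay separate: `["B-PER", "B-PER"]` decodes to two people, whereas `["B-PER", "I-PER"]` decodes to one.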

NER with spaCy

import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

text = "Apple Inc. was founded by Steve Jobs in Cupertino, California in 1976."
doc = nlp(text)

for ent in doc.ents:
    print(f"{ent.text:20} {ent.label_:8} {spacy.explain(ent.label_)}")
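# Typical output (the entities found can differ between model versions):
# Apple Inc.           ORG      Companies, agencies, institutions, etc.
# Steve Jobs           PERSON   People, including fictional
# Cupertino            GPE      Countries, cities, states
# California           GPE      Countries, cities, states
# 1976                 DATE     Absolute or relative dates or periods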

Custom NER Training

import spacy
from spacy.training import Example

# Training data in spaCy format
# Each entity is (start_char, end_char, label); end_char is exclusive
TRAIN_DATA = [
    ("iPhone 15 Pro is amazing", {"entities": [(0, 13, "PRODUCT")]}),
    ("Samsung Galaxy S24 released", {"entities": [(0, 18, "PRODUCT")]}),
    ("Tim Cook announced the new MacBook", {"entities": [(0, 8, "PERSON"), (27, 34, "PRODUCT")]}),
]

# Initialize blank model
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")

# Add labels
for _, annotations in TRAIN_DATA:
    for ent in annotations.get("entities"):
        ner.add_label(ent[2])

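# Train
# In spaCy v3, nlp.initialize() replaces the deprecated begin_training() and returns the optimizer
optimizer = nlp.initialize()
for epoch in range(20):
    losses = {}
    for text, annotations in TRAIN_DATA:
        doc = nlp.make_doc(text)
        example = Example.from_dict(doc, annotations)
        nlp.update([example], sgd=optimizer, losses=losses)
    # Report the accumulated NER loss every fifth epoch
    if epoch % 5 == 0:
        print(f"Epoch {epoch}, Loss: {losses['ner']:.4f}")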
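
Character offsets in this training format are easy to get wrong by a character or two, and spans that do not line up with token boundaries generally cannot be used for training. The following is a minimal sketch of a sanity check to run before training. The helper `check_offsets` and the sample sentences are hypothetical, and the word-boundary test is a plain-Python heuristic rather than spaCy's own tokenizer.

```python
def check_offsets(text, entities):
    """Return a list of problems found in (start, end, label) character offsets."""
    problems = []
    for start, end, label in entities:
        if not 0 <= start < end <= len(text):
            problems.append(f"{label} ({start}, {end}): outside text of length {len(text)}")
            continue
        span = text[start:end]
        if span != span.strip():
            problems.append(f"{label} ({start}, {end}): {span!r} has surrounding whitespace")
        # Heuristic word-boundary test; the real tokenizer has the final say.
        if start > 0 and text[start - 1].isalnum() and text[start].isalnum():
            problems.append(f"{label} ({start}, {end}): {span!r} starts mid-word")
        if end < len(text) and text[end - 1].isalnum() and text[end].isalnum():
            problems.append(f"{label} ({start}, {end}): {span!r} ends mid-word")
    return problems


# Hypothetical samples: the first is correct, the other two are deliberately wrong.
samples = [
    ("Pixel 8 is out", [(0, 7, "PRODUCT")]),
    ("Pixel 8 is out", [(0, 8, "PRODUCT")]),   # includes the trailing space
    ("Pixel 8 is out", [(0, 40, "PRODUCT")]),  # runs past the end of the text
]
for text, entities in samples:
    print(entities, check_offsets(text, entities) or "ok")
```

A quick way to get offsets right in the first place is to compute them with `text.index(entity_text)` and `len(entity_text)` instead of counting by hand.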

CRF-Based NER

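Conditional Random Fields (CRFs) score a whole tag sequence jointly rather than one token at a time, which lets them model dependencies between neighboring tags, for example that I-PER should follow B-PER or I-PER but not O. Each token is represented by a dictionary of hand-crafted features such as the following: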

# Feature extraction for CRF
def word2features(sent, i):
    word = sent[i]
    features = {
        'bias': 1.0,
        'word.lower()': word.lower(),
        'word[-3:]': word[-3:],
        'word.isupper()': word.isupper(),
        'word.istitle()': word.istitle(),
        'word.isdigit()': word.isdigit(),
    }
    if i > 0:
        features['prev.word'] = sent[i-1].lower()
    if i < len(sent)-1:
        features['next.word'] = sent[i+1].lower()
    return features
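
The dependency between neighboring tags can be illustrated with a toy decoder. The sketch below is not a trained CRF: the three-tag set, the emission scores, and the transition penalty are all hypothetical values chosen by hand. In a real CRF the emission scores would come from learned weights over features like those above, and the transition scores would be learned from data as well. The sketch compares picking the best tag per token independently with Viterbi decoding over the whole sequence.

```python
TAGS = ["O", "B-PER", "I-PER"]
PENALTY = -10.0  # hypothetical score for a transition that BIO does not allow


def transition(prev_tag, tag):
    """Toy transition score: I-PER is only welcome after B-PER or I-PER."""
    if tag == "I-PER" and prev_tag not in ("B-PER", "I-PER"):
        return PENALTY
    return 0.0


def viterbi(emissions):
    """Return the highest-scoring tag sequence under emission + transition scores."""
    # Each column maps tag -> (best score of a path ending in tag, previous tag).
    best = [{tag: (emissions[0][tag] + transition("START", tag), None) for tag in TAGS}]
    for scores in emissions[1:]:
        column = {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p][0] + transition(p, tag))
            column[tag] = (best[-1][prev][0] + transition(prev, tag) + scores[tag], prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Hypothetical per-token scores for the lowercased sentence "john smith spoke".
emissions = [
    {"O": 1.0, "B-PER": 0.8, "I-PER": 0.1},  # john
    {"O": 0.2, "B-PER": 0.9, "I-PER": 1.5},  # smith
    {"O": 2.0, "B-PER": 0.0, "I-PER": 0.1},  # spoke
]

greedy = [max(TAGS, key=lambda t: scores[t]) for scores in emissions]
print(greedy)              # ['O', 'I-PER', 'O']
print(viterbi(emissions))  # ['B-PER', 'I-PER', 'O']
```

Tagging each token independently produces I-PER directly after O, which is not a valid BIO sequence. With the transition score included, the strong I-PER evidence on the second token pulls the first token to B-PER, giving a consistent two-token entity.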

Transformer-Based NER

from transformers import pipeline

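# No model is named here, so the library falls back to its default English NER checkpoint;
# pass model="..." to pin a specific one.
# aggregation_strategy="simple" merges sub-word tokens into whole entities.
ner_pipeline = pipeline("ner", aggregation_strategy="simple")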

text = "Elon Musk founded SpaceX in Hawthorne, California."
entities = ner_pipeline(text)
for ent in entities:
    print(f"{ent['word']:20} {ent['entity_group']:6} {ent['score']:.3f}")
# Elon Musk            PER    0.998
# SpaceX               ORG    0.995
# Hawthorne            LOC    0.989
# California           LOC    0.994
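
Transformer models predict one tag per sub-word piece, so the pieces have to be merged before whole entities like those above can be reported. The sketch below is a simplified illustration of that grouping idea, not the library's implementation. The function `group_tokens`, the WordPiece split, and the scores are hypothetical values made up for the example.

```python
def split_tag(tag):
    """Split 'B-PER' into ('B', 'PER'); the outside tag 'O' maps to ('O', 'O')."""
    if tag == "O":
        return "O", "O"
    prefix, label = tag.split("-", 1)
    return prefix, label


def group_tokens(token_preds):
    """Merge consecutive same-type token predictions and average their scores."""
    groups = []
    for pred in token_preds:
        prefix, label = split_tag(pred["entity"])
        piece = pred["word"]
        if groups and groups[-1]["entity_group"] == label and prefix != "B":
            last = groups[-1]
            # WordPiece continuation pieces start with "##" and attach without a space.
            last["word"] += piece[2:] if piece.startswith("##") else " " + piece
            last["scores"].append(pred["score"])
        else:
            groups.append({"entity_group": label, "word": piece, "scores": [pred["score"]]})
    # Drop the outside groups and report the mean score of each entity.
    return [
        {
            "entity_group": g["entity_group"],
            "word": g["word"],
            "score": sum(g["scores"]) / len(g["scores"]),
        }
        for g in groups
        if g["entity_group"] != "O"
    ]


# Hypothetical token-level predictions for "Elon Musk founded SpaceX".
token_preds = [
    {"word": "El", "entity": "B-PER", "score": 0.99},
    {"word": "##on", "entity": "I-PER", "score": 0.97},
    {"word": "Mu", "entity": "I-PER", "score": 0.98},
    {"word": "##sk", "entity": "I-PER", "score": 0.98},
    {"word": "founded", "entity": "O", "score": 0.99},
    {"word": "Space", "entity": "B-ORG", "score": 0.96},
    {"word": "##X", "entity": "I-ORG", "score": 0.94},
]
for ent in group_tokens(token_preds):
    print(f"{ent['word']:20} {ent['entity_group']:6} {ent['score']:.3f}")
```

With these inputs the seven token predictions collapse into two entities, "Elon Musk" (PER) and "SpaceX" (ORG), each carrying the mean of its token scores.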

NER Challenges

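| Challenge       | Example                           | Difficulty |
|-----------------|-----------------------------------|------------|
| Ambiguity       | "Washington" (person vs. state)   | High       |
| Nested entities | "New York City Police Department" | Medium     |
| Abbreviations   | "MIT", "NYC"                      | Medium     |
| Domain-specific | Gene names, drug names            | High       |
| Multi-language  | Varying entity formats            | Very High  |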
⭐

Premium Content

Named Entity Recognition

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert NLP Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement