
Text Classification


Text classification is the task of assigning predefined categories to text documents. It's one of the most common NLP applications, used for spam detection, topic labeling, intent recognition, and content moderation.

Classical ML Approaches

Naive Bayes

Naive Bayes applies Bayes' theorem under the "naive" assumption that features are conditionally independent given the class.

Naive Bayes Classification

$$P(\text{class} \mid \text{features}) = \frac{P(\text{features} \mid \text{class}) \times P(\text{class})}{P(\text{features})}$$

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# Sample data
train_texts = [
    "Win free iPhone now!!!",
    "Click here for prize money",
    "Meeting agenda for Monday",
    "Quarterly report attached",
    "Buy cheap medications",
    "Project deadline extended"
]
train_labels = [1, 1, 0, 0, 1, 0]  # 1=spam, 0=ham

# Build pipeline
nb_pipeline = Pipeline([
    ('vectorizer', CountVectorizer(ngram_range=(1, 2))),
    ('classifier', MultinomialNB(alpha=1.0))
])

nb_pipeline.fit(train_texts, train_labels)

# Predict
test = ["Free money click here", "Meeting at noon"]
print(nb_pipeline.predict(test))  # prints [1 0] (a NumPy array: spam, ham)
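
To make the formula above concrete, here is a minimal sketch that computes the Naive Bayes score by hand. It is not how scikit-learn implements `MultinomialNB` internally; the toy documents, the whitespace tokenization, and the helper names `train` and `log_posterior` are all assumptions made for illustration. The denominator P(features) is omitted because it is the same for every class and does not change which class wins.

```python
import math
from collections import Counter

# Toy training set (hypothetical, unigram tokens only): 1=spam, 0=ham
docs = [
    ("win free prize", 1),
    ("free money now", 1),
    ("meeting agenda monday", 0),
    ("project meeting notes", 0),
]

def train(docs, alpha=1.0):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in docs:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab, alpha

def log_posterior(text, label, model):
    """Unnormalized log P(class | features): log prior + sum of log likelihoods."""
    class_counts, word_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    score = math.log(class_counts[label] / total_docs)
    # Laplace (add-alpha) smoothing keeps unseen words from zeroing the product
    denom = sum(word_counts[label].values()) + alpha * len(vocab)
    for word in text.split():
        if word in vocab:  # words never seen in training are skipped
            score += math.log((word_counts[label][word] + alpha) / denom)
    return score

model = train(docs)
scores = {label: log_posterior("free meeting money", label, model) for label in (0, 1)}
prediction = max(scores, key=scores.get)
print(scores, prediction)  # prediction is 1 (spam)
```

In this toy example the spam score is higher because "free" and "money" occur only in the spam documents, which outweighs the single ham-leaning word "meeting".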

Support Vector Machine (SVM)

SVMs find the optimal hyperplane that maximizes the margin between classes.

from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

svm_pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(
        ngram_range=(1, 2),
        max_features=10000,
        sublinear_tf=True
    )),
    ('classifier', LinearSVC(C=1.0))
])

svm_pipeline.fit(train_texts, train_labels)
print(svm_pipeline.predict(test))
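
The margin idea can be inspected directly: a linear SVM scores each document by its signed distance-like value relative to the hyperplane, and the sign decides the class. The sketch below is a self-contained illustration that reuses the six sample messages from earlier; the variable names `margins` and `samples` are assumptions, and the exact score values will depend on the scikit-learn version and settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

texts = [
    "Win free iPhone now!!!",
    "Click here for prize money",
    "Meeting agenda for Monday",
    "Quarterly report attached",
    "Buy cheap medications",
    "Project deadline extended",
]
labels = [1, 1, 0, 0, 1, 0]  # 1=spam, 0=ham

svm = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("classifier", LinearSVC(C=1.0)),
])
svm.fit(texts, labels)

samples = ["Free money click here", "Meeting at noon"]
margins = svm.decision_function(samples)  # signed scores, one per sample
predictions = svm.predict(samples)

for text, margin, pred in zip(samples, margins, predictions):
    # A positive score falls on the class-1 side of the hyperplane
    print(f"{text!r}: score={margin:+.3f}, predicted={pred}")
```

Scores near zero indicate documents close to the decision boundary, which can be a useful signal for deciding which predictions to review manually.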

Logistic Regression

Logistic regression is a linear model for binary and multiclass classification that outputs class probabilities.

from sklearn.linear_model import LogisticRegression

lr_pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(ngram_range=(1, 2))),
    ('classifier', LogisticRegression(
        C=1.0,
        max_iter=1000,
        class_weight='balanced'
    ))
])

lr_pipeline.fit(train_texts, train_labels)
print(lr_pipeline.predict(test))
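
Because logistic regression produces probabilities, a decision threshold can be chosen instead of relying on the default. The following is a minimal sketch, assuming the same six sample messages; the `threshold` value of 0.5 and the names `probs` and `flags` are illustrative choices, not recommendations.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = [
    "Win free iPhone now!!!",
    "Click here for prize money",
    "Meeting agenda for Monday",
    "Quarterly report attached",
    "Buy cheap medications",
    "Project deadline extended",
]
labels = [1, 1, 0, 0, 1, 0]  # 1=spam, 0=ham

lr = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("classifier", LogisticRegression(max_iter=1000)),
])
lr.fit(texts, labels)

samples = ["Free money click here", "Meeting at noon"]
probs = lr.predict_proba(samples)  # shape: (n_samples, n_classes)

# Columns follow the order of classes_ on the fitted classifier
classes = list(lr.named_steps["classifier"].classes_)
spam_col = classes.index(1)

threshold = 0.5  # hypothetical cut-off; raise it to flag fewer messages as spam
flags = probs[:, spam_col] >= threshold

for text, p, flag in zip(samples, probs[:, spam_col], flags):
    print(f"{text!r}: P(spam)={p:.3f}, flagged={bool(flag)}")
```

With only six training messages the probabilities stay close to 0.5, so the values are mainly useful for seeing the mechanics rather than for judging the model.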

Model Comparison

| Model | Speed | Accuracy | Interpretability | Best For |
|---|---|---|---|---|
| Naive Bayes | Very Fast | Good | High | Baseline, small data |
| SVM | Fast | Very Good | Medium | High-dimensional text |
| Logistic Regression | Fast | Good | High | Probabilistic outputs |
| Random Forest | Medium | Good | Medium | Non-linear boundaries |
| Neural Networks | Slow | Excellent | Low | Large datasets |
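
The ratings in the table are qualitative, so it can help to measure the candidates on your own data. Below is a rough sketch of one way to do that with cross-validation; the twelve messages are a made-up dataset, and the choice of 3 folds and macro-averaged F1 is an assumption for illustration. On data this small the scores are too noisy to rank the models.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy dataset: 6 spam (1) and 6 ham (0) messages
texts = [
    "Win a free prize now",
    "Claim your free cash reward",
    "Cheap pills buy now",
    "You won the lottery click here",
    "Free gift card click now",
    "Exclusive offer win money today",
    "Meeting moved to Monday morning",
    "Please review the quarterly report",
    "Project deadline extended to Friday",
    "Lunch with the team at noon",
    "Agenda for the budget meeting",
    "Notes from the project review",
]
labels = [1] * 6 + [0] * 6

candidates = {
    "Naive Bayes": MultinomialNB(),
    "Linear SVM": LinearSVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, clf in candidates.items():
    # The vectorizer is inside the pipeline so it is refit on each training fold
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipe, texts, labels, cv=3, scoring="f1_macro")
    results[name] = scores.mean()
    print(f"{name}: mean macro-F1 = {scores.mean():.3f}")
```

Keeping the vectorizer inside the pipeline matters: fitting TF-IDF on the full dataset before splitting would leak vocabulary statistics from the validation folds into training.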

Evaluation Metrics

from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, classification_report, confusion_matrix
)

y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 1]

print(f"Accuracy: {accuracy_score(y_true, y_pred):.3f}")
print(f"Precision: {precision_score(y_true, y_pred):.3f}")
print(f"Recall: {recall_score(y_true, y_pred):.3f}")
print(f"F1 Score: {f1_score(y_true, y_pred):.3f}")
print(classification_report(y_true, y_pred))

F1 Score

$$F_1 = 2 \cdot \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$
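
As a worked check of the formula, the metrics can be derived by hand from the confusion-matrix counts. This sketch uses the same `y_true` and `y_pred` lists as the scikit-learn example in this section; the variable names `tp`, `fp`, `fn`, and `tn` are just conventional shorthand.

```python
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives  -> 3
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives -> 2
fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives -> 1
tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives  -> 2

accuracy = (tp + tn) / len(y_true)                  # 5 / 8 = 0.625
precision = tp / (tp + fp)                          # 3 / 5 = 0.600
recall = tp / (tp + fn)                             # 3 / 4 = 0.750
f1 = 2 * precision * recall / (precision + recall)  # 0.9 / 1.35 = 0.667

print(f"Accuracy: {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
print(f"F1 Score: {f1:.3f}")
```

F1 is the harmonic mean of precision and recall, so it sits closer to the lower of the two values (0.600 here) than a plain average would.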

Multi-Class Classification

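from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline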

# Load dataset
categories = ['sci.space', 'rec.sport.baseball', 'comp.graphics']
train = fetch_20newsgroups(subset='train', categories=categories)
test = fetch_20newsgroups(subset='test', categories=categories)

# Train classifier
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression(max_iter=1000))
])

pipeline.fit(train.data, train.target)
accuracy = pipeline.score(test.data, test.target)
print(f"Accuracy: {accuracy:.3f}")
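
With more than two classes, a single accuracy number can hide weak classes, so averaged F1 scores are often reported as well. The sketch below illustrates macro and micro averaging on a tiny made-up three-topic corpus (it does not download 20 Newsgroups); the texts, `target_names`, and variable names are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for a three-category dataset
target_names = ["baseball", "graphics", "space"]
train_texts = [
    "the pitcher threw a fastball in the ninth inning",
    "home run wins the baseball game",
    "rendering polygons with a shader on the gpu",
    "image pixels and 3d graphics rendering",
    "the rocket launch reached orbit",
    "nasa satellite orbit around the moon",
]
train_targets = [0, 0, 1, 1, 2, 2]

test_texts = [
    "baseball pitcher strikes out",
    "gpu shader rendering",
    "rocket orbit moon",
]
test_targets = [0, 1, 2]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),
])
clf.fit(train_texts, train_targets)
predicted = clf.predict(test_texts)

# Macro: unweighted mean of per-class F1, so every class counts equally
macro_f1 = f1_score(test_targets, predicted, average="macro")
# Micro: pools all decisions; for single-label multiclass it equals accuracy
micro_f1 = f1_score(test_targets, predicted, average="micro")

predicted_names = [target_names[i] for i in predicted]
print(predicted_names)
print(f"Macro F1: {macro_f1:.3f}, Micro F1: {micro_f1:.3f}")
```

Macro averaging is usually the more informative choice when classes are imbalanced, because a rare class that is always misclassified pulls the score down noticeably.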
