Sentiment Analysis
Sentiment analysis (opinion mining) automatically detects the emotional tone of text: whether it expresses positive, negative, or neutral sentiment. It's widely used for brand monitoring, customer feedback analysis, and social media analytics.
Approaches to Sentiment Analysis
| Approach | Pros | Cons | Typical accuracy (indicative, varies by dataset) |
|---|---|---|---|
| Lexicon-based | No training data needed, interpretable | Limited context understanding | 60-75% |
| Traditional ML | Good with domain-specific data | Needs labeled data | 80-88% |
| Deep Learning | Captures complex patterns | Needs large datasets, less interpretable | 88-95% |
| Transformer-based | State-of-the-art, contextual | Resource intensive | 92-97% |
1. Lexicon-Based Approach
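This approach uses predefined sentiment dictionaries in which each word carries a sentiment score; the scores of the words in a text are combined into an overall polarity. The example below uses VADER, a lexicon and rule set available through NLTK.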
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# Download the VADER lexicon (only needed once)
nltk.download('vader_lexicon')
sia = SentimentIntensityAnalyzer()

texts = [
    "This product is amazing! Best purchase ever!",
    "Terrible quality. Complete waste of money.",
    "It's okay, nothing special."
]

for text in texts:
    scores = sia.polarity_scores(text)  # keys: 'neg', 'neu', 'pos', 'compound'
    compound = scores['compound']
    # Above 0.05 counts as positive, below -0.05 as negative, otherwise neutral
    label = 'positive' if compound > 0.05 else 'negative' if compound < -0.05 else 'neutral'
    print(f"Text: {text}")
    print(f"  Scores: {scores}")
    print(f"  Label: {label}")
    print()
Custom Lexicon
import re

custom_lexicon = {
    "excellent": 3.0,
    "great": 2.0,
    "good": 1.5,
    "okay": 0.0,
    "bad": -1.5,
    "terrible": -3.0,
    "awful": -3.0
}

def lexicon_sentiment(text, lexicon):
    # Extract words with a regex so trailing punctuation ("terrible.") still matches
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(w, 0) for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
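A plain word sum cannot tell "good" from "not good", which is the limited context understanding noted in the comparison table. As a minimal sketch, the scorer below flips the score of a sentiment word when a negator appears directly before it. The `NEGATORS` set, the one-word window, and the `score_with_negation` name are illustrative assumptions, not part of any library.

```python
import re

# Small illustrative lexicon (assumed values on a hand-built scale)
LEXICON = {"excellent": 3.0, "great": 2.0, "good": 1.5,
           "bad": -1.5, "terrible": -3.0, "awful": -3.0}

# Hypothetical negator list; real systems use longer lists and wider windows
NEGATORS = {"not", "no", "never", "isn't", "wasn't", "don't"}

def score_with_negation(text, lexicon=LEXICON):
    """Sum word scores, flipping the sign of a word that directly follows a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, word in enumerate(words):
        value = lexicon.get(word, 0.0)
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
        total += value
    return total

def label(score):
    """Turn a numeric score into a three-way label."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

for sentence in ["The food was good.", "The food was not good.", "It isn't bad."]:
    s = score_with_negation(sentence)
    print(f"{sentence!r}: {s:+.1f} -> {label(s)}")
```

A one-word window is deliberately crude: it misses cases such as "not very good", which is one reason rule sets like VADER go beyond simple word sums.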
2. Machine Learning Approach
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Training data (six toy examples, far too few for a usable model)
texts = ["Great product!", "Love it!", "Amazing quality",
         "Terrible", "Worst ever", "Very bad"]
labels = [1, 1, 1, 0, 0, 0]  # 1=positive, 0=negative

# Build pipeline: unigram and bigram TF-IDF features, then logistic regression
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(ngram_range=(1, 2))),
    ('clf', LogisticRegression())
])

# Train
pipeline.fit(texts, labels)

# Predict
# Note: none of these test words occur in the toy training set, so the
# outputs only demonstrate the API, not meaningful predictions
test = ["This is wonderful", "I hate this"]
predictions = pipeline.predict(test)
for text, pred in zip(test, predictions):
    print(f"{text} -> {'positive' if pred == 1 else 'negative'}")
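To make the `tfidf` step less of a black box, here is a minimal pure-Python sketch of the weighting for unigrams. It follows the smoothed formula that scikit-learn documents as its default, idf = ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, but it is an illustration only: the tokenization is simplified and the `tfidf_vectors` helper is hypothetical.

```python
import math
import re
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: tf * idf[t] for t, tf in counts.items()}
        # Guard against empty documents, whose norm would be zero
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

docs = ["Great product!", "Great quality", "Very bad product"]
for doc, vec in zip(docs, tfidf_vectors(docs)):
    print(doc, {t: round(w, 3) for t, w in sorted(vec.items())})
```

Terms that occur in fewer documents, such as "quality" here, receive a larger weight than terms shared across documents, such as "great"; that is the signal the classifier then learns from.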
3. Deep Learning with Transformers
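from transformers import pipeline

# With no model argument, a default English checkpoint is downloaded on first use;
# it predicts only POSITIVE or NEGATIVE (there is no neutral class)
classifier = pipeline("sentiment-analysis")

reviews = [
    "Absolutely fantastic! Will buy again.",
    "Product broke after one day. Terrible.",
    "Average product, meets basic needs."
]

for review in reviews:
    result = classifier(review)[0]  # dict with 'label' and 'score'
    print(f"{result['label']}: {result['score']:.3f} | {review}")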
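The default checkpoint behind `pipeline("sentiment-analysis")` typically returns only POSITIVE or NEGATIVE, so a lukewarm review such as the third one still gets a polar label. One possible workaround, sketched here, treats low-confidence predictions as neutral. The `to_three_way` helper and the 0.75 cutoff are assumptions chosen for illustration; any cutoff should be tuned on labeled data.

```python
def to_three_way(result, threshold=0.75):
    """Map a {'label', 'score'} prediction to positive / negative / neutral.

    `threshold` is a hypothetical confidence cutoff, not a library default.
    """
    if result["score"] < threshold:
        return "neutral"
    return result["label"].lower()

# Hand-written stand-ins shaped like the pipeline's output (not real model scores)
sample_results = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.98},
    {"label": "POSITIVE", "score": 0.62},
]
for r in sample_results:
    print(r, "->", to_three_way(r))
```

Confidence is only a proxy for neutrality. When neutral text matters, a model trained with an explicit neutral class is usually the more reliable choice.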
Fine-Tuning for Domain-Specific Sentiment
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Load dataset (IMDB movie reviews with binary labels)
dataset = load_dataset("imdb")

# Load model and tokenizer
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize: truncate long reviews here and leave padding to the data collator
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

dataset = dataset.map(tokenize, batched=True)

# Training
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    eval_strategy="epoch",  # called evaluation_strategy in older transformers releases
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    data_collator=DataCollatorWithPadding(tokenizer),  # pads each batch to its longest item
)
trainer.train()
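With the configuration above, the per-epoch evaluation reports only the loss. A `compute_metrics` function can add accuracy. The sketch below assumes the usual (logits, labels) pair that `Trainer` hands to such a function, and it is written in plain Python so it can be checked without a model; it would be wired in with `Trainer(..., compute_metrics=compute_metrics)`.

```python
def compute_metrics(eval_pred):
    """Accuracy from raw logits; works on nested lists or NumPy arrays."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return {"accuracy": correct / len(preds)}

# Toy check with made-up logits for three examples and two classes
toy_logits = [[0.2, 1.5], [2.0, -1.0], [0.1, 0.3]]
toy_labels = [1, 0, 0]
print(compute_metrics((toy_logits, toy_labels)))
```

For large evaluation sets a vectorized `argmax` over the logits array would be faster; the loop form is used here only to keep the example dependency-free.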
Aspect-Based Sentiment Analysis
Aspect-based sentiment analysis assigns sentiment to specific aspects of a product or service instead of to the text as a whole.
# Example: "The camera quality is excellent but the battery life is terrible"
#   Camera quality: positive
#   Battery life: negative
# Typical tooling: PyABSA, or a custom NER + sentiment pipeline
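As a rough illustration of the idea, and not a substitute for PyABSA or a trained model, the sketch below splits a sentence into clauses at contrastive conjunctions, looks for known aspect terms in each clause, and scores the clause with a tiny lexicon. The `ASPECTS` list, the `LEXICON` values, and the `aspect_sentiment` function are all hypothetical.

```python
import re

# Hypothetical aspect terms and lexicon, chosen only to fit the example sentence
ASPECTS = ["camera quality", "battery life", "screen", "price"]
LEXICON = {"excellent": 3.0, "great": 2.0, "good": 1.5,
           "bad": -1.5, "terrible": -3.0, "awful": -3.0}

def aspect_sentiment(text):
    """Return {aspect: label} using clause splitting plus lexicon scoring."""
    results = {}
    # Split at contrastive conjunctions and clause punctuation
    clauses = re.split(r"\bbut\b|\bhowever\b|\balthough\b|[.;,]", text.lower())
    for clause in clauses:
        words = re.findall(r"[a-z']+", clause)
        score = sum(LEXICON.get(w, 0.0) for w in words)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        # Attach the clause-level label to every aspect mentioned in the clause
        for aspect in ASPECTS:
            if aspect in clause:
                results[aspect] = label
    return results

sentence = "The camera quality is excellent but the battery life is terrible"
print(aspect_sentiment(sentence))
# {'camera quality': 'positive', 'battery life': 'negative'}
```

A fixed aspect list and clause splitting break down quickly on real reviews (implicit aspects, several aspects in one clause), which is where NER-based extraction or a dedicated aspect-based model becomes necessary.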