Deep Learning for Sentiment Analysis
Sentiment analysis using deep learning goes far beyond simple lexicon-based approaches. Modern architectures capture contextual nuance, long-range dependencies, and aspect-level granularity that traditional methods cannot achieve.
Evolution of Sentiment Analysis
The field has progressed through several key stages, each addressing a limitation of its predecessor.
| Approach | Era | Key Technique | Limitation |
|---|---|---|---|
| Lexicon-based | 2000s | Word polarity lists | No context awareness |
| Bag-of-Words + ML | 2010s | Naive Bayes, SVM | Word order lost |
| Word Embeddings | 2013+ | Word2Vec, GloVe | Static representations |
| RNNs / LSTMs | 2015+ | Sequential modeling | Vanishing gradients |
| Transformers | 2018+ | BERT, RoBERTa | Computational cost |
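To make the first two limitations in the table concrete, here is a minimal sketch in plain Python. The tiny `POLARITY` lexicon and the helper names `lexicon_score` and `bag_of_words` are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy polarity lexicon (illustration only).
POLARITY = {"good": 1, "great": 1, "bad": -1, "terrible": -1}


def lexicon_score(text):
    """Sum word polarities; ignores context such as negation."""
    return sum(POLARITY.get(word, 0) for word in text.lower().split())


def bag_of_words(text):
    """Count words; discards word order."""
    return Counter(text.lower().split())


# Negation flips the meaning, but the lexicon score does not change.
print(lexicon_score("the movie was good"))      # 1
print(lexicon_score("the movie was not good"))  # 1

# Different meanings, identical bag-of-words representation.
same = bag_of_words("good plot but bad acting") == bag_of_words("bad plot but good acting")
print(same)  # True
```

Both failures come from discarding the relationship between words, which is the gap that embeddings and sequence models were later introduced to close.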
LSTM-Based Sentiment Analysis
Long Short-Term Memory networks (LSTMs) mitigate the vanishing gradient problem inherent in vanilla RNNs, making them better suited for capturing sentiment across long documents.
LSTM Cell Architecture
An LSTM cell maintains three gates (forget, input, and output) that control information flow:
LSTM Gate Equations
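For reference, the standard LSTM formulation is commonly written as follows. The notation is an assumption here: $x_t$ is the input at step $t$, $h_{t-1}$ the previous hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication, and $W_\ast$, $b_\ast$ learned parameters.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The additive update of $c_t$ is what lets gradients flow across many time steps when the forget gate stays close to 1.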
BiLSTM Implementation
Using bidirectional LSTMs captures both forward and backward context:
import torch
import torch.nn as nn


class BiLSTMSentiment(nn.Module):
    """Bidirectional LSTM with attention pooling for sentence-level sentiment."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes, num_layers=2, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(
            embed_dim, hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            bidirectional=True,
            dropout=dropout,  # applied between stacked LSTM layers
        )
        # One scalar attention score per time step
        self.attention = nn.Linear(hidden_dim * 2, 1)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        embedded = self.embedding(x)       # (batch, seq_len, embed_dim)
        lstm_out, _ = self.lstm(embedded)  # (batch, seq_len, hidden_dim * 2)
        # Attention weights over time steps (padding positions are not masked)
        attn_weights = torch.softmax(self.attention(lstm_out), dim=1)  # (batch, seq_len, 1)
        context = torch.sum(attn_weights * lstm_out, dim=1)            # (batch, hidden_dim * 2)
        return self.classifier(context)


model = BiLSTMSentiment(vocab_size=30000, embed_dim=300, hidden_dim=256, num_classes=3)
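The attention pooling step in `forward` can be traced by hand on a toy sequence. The sketch below uses plain Python with two-dimensional hidden states; the function name `attention_pool`, the scores, and the padding mask are assumptions for illustration (the model above does not mask padding positions).

```python
import math


def attention_pool(hidden, scores, mask=None):
    """Softmax-weighted sum of hidden states over time steps.

    hidden: list of per-step vectors (lists of floats)
    scores: one unnormalised attention score per step
    mask:   optional list of booleans, False marks padding steps
    """
    if mask is None:
        mask = [True] * len(scores)
    # Padding steps get zero weight by being excluded from the softmax.
    m = max(s for s, keep in zip(scores, mask) if keep)
    exps = [math.exp(s - m) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return weights, context


hidden = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]  # last step is padding
scores = [0.0, 0.0, 3.0]
weights, context = attention_pool(hidden, scores, mask=[True, True, False])
print(weights)  # [0.5, 0.5, 0.0]
print(context)  # [0.5, 0.5]
```

Without the mask, the padding step would receive the largest weight here, which is why masking is worth considering when batches contain sequences of different lengths.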
BERT for Sentiment Analysis
BERT (Bidirectional Encoder Representations from Transformers) revolutionized sentiment analysis by providing deep bidirectional context, learned through pre-training with a masked language modeling objective.
Fine-Tuning Architecture
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import TrainingArguments, Trainer

model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=3)


# Tokenize dataset (expects a "text" column)
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        padding="max_length",
        truncation=True,
        max_length=512,
    )


# Fine-tuning configuration
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    learning_rate=2e-5,
    evaluation_strategy="epoch",  # named eval_strategy in recent transformers releases
    save_strategy="epoch",        # must match the evaluation strategy for load_best_model_at_end
    load_best_model_at_end=True,
)

# train_dataset, eval_dataset, and compute_metrics are assumed to be defined
# beforehand, e.g. by mapping tokenize_function over a labeled dataset.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()
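`Trainer` calls `compute_metrics` with a pair of model predictions (logits) and labels. A minimal sketch of such a function is shown below, assuming single-label classification. It is written without external dependencies, so it works on nested lists as well as on the NumPy arrays that `Trainer` passes. The choice of accuracy and macro-F1 is an assumption, not something the configuration above specifies.

```python
def compute_metrics(eval_pred):
    """Return accuracy and macro-F1 from a (logits, labels) pair."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row.
    preds = [max(range(len(row)), key=lambda j: row[j]) for row in logits]
    labels = [int(y) for y in labels]

    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)

    f1_scores = []
    for c in sorted(set(labels) | set(preds)):
        tp = sum(p == c and y == c for p, y in zip(preds, labels))
        fp = sum(p == c and y != c for p, y in zip(preds, labels))
        fn = sum(p != c and y == c for p, y in zip(preds, labels))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return {"accuracy": accuracy, "macro_f1": sum(f1_scores) / len(f1_scores)}


# Toy check with three classes and four examples (made-up numbers).
logits = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.1, 0.2, 0.9], [1.2, 0.3, 0.4]]
labels = [0, 1, 2, 1]
print(compute_metrics((logits, labels)))
```

In practice a library such as scikit-learn would usually compute these scores; the explicit loops are only meant to show what the numbers represent.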
Aspect-Based Sentiment Analysis (ABSA)
ABSA identifies sentiment toward specific aspects or entities within text, providing granular opinion mining.
Task Decomposition
ABSA typically involves three subtasks:
ABSA Formulation
Given a sentence and an aspect term, the model predicts the sentiment polarity expressed toward that aspect, using an attention mechanism that links the context to the aspect.
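One common way to write such a model is sketched below. All notation is an assumption: $S = (w_1, \dots, w_n)$ is the sentence, $a$ the aspect term, $H = (h_1, \dots, h_n)$ the encoder's hidden states for $S$, $\operatorname{score}$ a compatibility function between a hidden state and the aspect, and $W$, $b$ the classifier parameters.

```latex
\hat{y} = \operatorname{softmax}\left(W \cdot \operatorname{Attn}(H, a) + b\right),
\qquad
\operatorname{Attn}(H, a) = \sum_{i=1}^{n} \alpha_i h_i,
\qquad
\alpha_i = \frac{\exp\left(\operatorname{score}(h_i, a)\right)}
                {\sum_{j=1}^{n} \exp\left(\operatorname{score}(h_j, a)\right)}
```

Setting $\alpha_i$ to a uniform weight over the aspect tokens and zero elsewhere reduces this to simple mean pooling over the aspect span.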
from transformers import BertTokenizer, BertModel
import torch
import torch.nn as nn


class AspectBasedSentiment(nn.Module):
    def __init__(self, bert_model_name="bert-base-uncased", num_classes=3):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_model_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask, aspect_positions):
        outputs = self.bert(input_ids, attention_mask=attention_mask)
        sequence_output = outputs.last_hidden_state  # (batch, seq_len, hidden)

        # Create aspect mask from positions (one list of token indices per example)
        aspect_mask = torch.zeros_like(input_ids, dtype=torch.bool)
        for i, pos in enumerate(aspect_positions):
            for p in pos:
                if 0 <= p < aspect_mask.size(1):
                    aspect_mask[i, p] = True

        # Mean-pool the hidden states of the aspect tokens
        # (a simple stand-in for a learned aspect attention)
        aspect_tokens = sequence_output * aspect_mask.unsqueeze(-1).float()
        aspect_repr = aspect_tokens.sum(dim=1) / aspect_mask.float().sum(dim=1, keepdim=True).clamp(min=1)

        logits = self.classifier(aspect_repr)
        return logits


# Example: "The food was great but the service was terrible"
# Aspect "food" → Positive
# Aspect "service" → Negative
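The `aspect_positions` argument has to be built before calling the model. The sketch below shows one way to do this, assuming simple whitespace tokenization; the helper `find_aspect_positions` and its `offset` parameter are hypothetical. With a real WordPiece tokenizer a word may split into several sub-tokens, so positions should be derived from the tokenizer's own output instead.

```python
def find_aspect_positions(tokens, aspect_tokens, offset=0):
    """Return the token indices of the first occurrence of the aspect term.

    offset shifts the indices, e.g. offset=1 if a [CLS] token is prepended.
    Returns an empty list when the aspect term is not found.
    """
    n, m = len(tokens), len(aspect_tokens)
    for start in range(n - m + 1):
        if tokens[start:start + m] == aspect_tokens:
            return [start + k + offset for k in range(m)]
    return []


sentence = "The food was great but the service was terrible"
tokens = sentence.lower().split()

print(find_aspect_positions(tokens, ["food"]))               # [1]
print(find_aspect_positions(tokens, ["service"], offset=1))  # [7]
print(find_aspect_positions(tokens, ["price"]))              # []
```

An empty result matters downstream: with no aspect tokens marked, the pooled aspect representation is a zero vector, so such examples are usually worth filtering or logging.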
Data Structures for ABSA
| Component | Example | Label |
|---|---|---|
| Sentence | "The food was great but the service was terrible" | - |
| Aspect Term | "food" | - |
| Aspect Category | "food quality" | - |
| Sentiment | - | Positive |
| Opinion Term | "great" | - |
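The components in the table map naturally onto a small record type. The sketch below is one possible layout, not a standard schema; the class name `AspectAnnotation` and its field names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AspectAnnotation:
    """One aspect-level opinion attached to a sentence."""
    sentence: str
    aspect_term: str
    sentiment: str                         # e.g. "Positive", "Negative", "Neutral"
    aspect_category: Optional[str] = None
    opinion_term: Optional[str] = None


sentence = "The food was great but the service was terrible"
annotations = [
    AspectAnnotation(sentence, "food", "Positive", "food quality", "great"),
    AspectAnnotation(sentence, "service", "Negative", opinion_term="terrible"),
]

for ann in annotations:
    print(f"{ann.aspect_term}: {ann.sentiment}")
```

Storing one record per aspect, rather than one per sentence, keeps sentences with mixed opinions (as in this example) from collapsing into a single label.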
Multi-Task Learning for Sentiment
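Training one shared encoder on several related sentiment objectives can improve generalization, because each task provides an additional supervision signal for the shared representation: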
import torch.nn as nn
from transformers import BertModel


class MultiTaskSentiment(nn.Module):
    def __init__(self, bert_model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_model_name)
        hidden_size = self.bert.config.hidden_size
        # Task-specific heads on top of the shared encoder
        self.sentiment_head = nn.Linear(hidden_size, 3)  # Pos/Neg/Neu
        self.emotion_head = nn.Linear(hidden_size, 6)    # Joy, Anger, etc.
        self.aspect_head = nn.Linear(hidden_size, 5)     # Food, Service, etc.

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids, attention_mask=attention_mask)
        pooled = outputs.pooler_output
        return {
            "sentiment": self.sentiment_head(pooled),
            "emotion": self.emotion_head(pooled),
            "aspect": self.aspect_head(pooled),
        }


# Multi-task loss: weights are matched to tasks in this fixed order,
# independent of the key order of the labels dict
TASKS = ("sentiment", "emotion", "aspect")


def multi_task_loss(outputs, labels, weights=(1.0, 0.5, 0.8)):
    loss_fct = nn.CrossEntropyLoss()
    total_loss = 0.0
    for task, weight in zip(TASKS, weights):
        total_loss = total_loss + weight * loss_fct(outputs[task], labels[task])
    return total_loss
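To see how the task weights shape the total loss, the weighted sum can be worked through in plain Python. The helper names `cross_entropy` and `weighted_multi_task_loss` and the example logits are assumptions; with uniform logits, each task's cross-entropy equals the log of its number of classes.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example: -log softmax(logits)[target]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def weighted_multi_task_loss(logits_by_task, targets, weights):
    """Weighted sum of per-task cross-entropy losses, keyed by task name."""
    return sum(
        weights[task] * cross_entropy(logits_by_task[task], targets[task])
        for task in weights
    )


# Uniform logits: each task's loss is log(number of classes).
logits_by_task = {"sentiment": [0.0] * 3, "emotion": [0.0] * 6, "aspect": [0.0] * 5}
targets = {"sentiment": 0, "emotion": 2, "aspect": 4}
weights = {"sentiment": 1.0, "emotion": 0.5, "aspect": 0.8}

total = weighted_multi_task_loss(logits_by_task, targets, weights)
expected = 1.0 * math.log(3) + 0.5 * math.log(6) + 0.8 * math.log(5)
print(math.isclose(total, expected))  # True
```

Because tasks with more classes start from a larger loss, the weights also act as a rough normalizer so that no single head dominates the shared encoder's gradients.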
Training Strategies
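Focal Loss for Imbalanced Sentiment
Standard cross-entropy struggles with class imbalance. Focal loss down-weights easy examples:
$$\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma} \log(p_t)$$
where $p_t$ is the predicted probability of the true class and $\gamma \ge 0$ is a focusing parameter that concentrates learning on hard, misclassified examples.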
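As a worked illustration, the sketch below evaluates focal loss for a single example in plain Python, taking the predicted probability of the true class as input and assuming the basic form without a class-balancing weight. The function name `focal_loss` and the default `gamma=2.0` are assumptions.

```python
import math


def focal_loss(p_true, gamma=2.0):
    """Focal loss for one example: -(1 - p)^gamma * log(p), p = prob. of the true class."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


# gamma = 0 recovers standard cross-entropy.
print(math.isclose(focal_loss(0.9, gamma=0.0), -math.log(0.9)))  # True

# An easy example (p = 0.9) is down-weighted far more than a hard one (p = 0.1).
easy_ratio = focal_loss(0.9) / -math.log(0.9)  # (1 - 0.9)^2
hard_ratio = focal_loss(0.1) / -math.log(0.1)  # (1 - 0.1)^2
print(round(easy_ratio, 4), round(hard_ratio, 4))  # 0.01 0.81
```

With `gamma=2.0`, a confidently correct prediction contributes about 1% of its cross-entropy loss, while a badly wrong one keeps about 81%, so gradient updates are driven mostly by the hard examples.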
| Loss Function | Use Case | Key Benefit |
|---|---|---|
| Cross-Entropy | Balanced datasets | Simple, effective |
| Focal Loss | Imbalanced classes | Hard-example focus |
| Label Smoothing | Noisy labels | Regularization |
| Contrastive Loss | Few-shot sentiment | Metric learning |
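Label smoothing from the table can be illustrated with a few lines of plain Python. The sketch assumes the common convention of mixing the one-hot target with a uniform distribution over all classes; `smooth_labels` and `epsilon=0.1` are hypothetical names and values.

```python
def smooth_labels(target, num_classes, epsilon=0.1):
    """Mix a one-hot target with a uniform distribution over all classes."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if c == target else uniform
        for c in range(num_classes)
    ]


smoothed = smooth_labels(target=2, num_classes=3)
print([round(p, 4) for p in smoothed])  # [0.0333, 0.0333, 0.9333]
```

In PyTorch, the same convention is available through the `label_smoothing` argument of `nn.CrossEntropyLoss`, so a hand-written version is rarely needed outside of illustration.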
Evaluation and Best Practices
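| Metric | Formula | When to Use |
|---|---|---|
| Accuracy | $\frac{TP + TN}{TP + TN + FP + FN}$ | Balanced classes |
| Macro-F1 | $\frac{1}{C} \sum_{c=1}^{C} F1_c$ | Equal class importance |
| Weighted-F1 | $\sum_{c=1}^{C} \frac{n_c}{N} F1_c$ | Proportional to support |
| MCC | $\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$ | Imbalanced binary |

Here $C$ is the number of classes, $F1_c$ the F1 score of class $c$, $n_c$ the number of true examples of class $c$, and $N$ the total number of examples.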
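These metrics can disagree sharply on imbalanced data. The sketch below computes accuracy and MCC from binary confusion-matrix counts in plain Python; the function names and the example counts are assumptions chosen for illustration.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# 95 negatives, 5 positives; the classifier predicts "negative" for everything.
print(accuracy(tp=0, tn=95, fp=0, fn=5))  # 0.95
print(mcc(tp=0, tn=95, fp=0, fn=5))       # 0.0
```

A trivial majority-class predictor looks strong by accuracy but scores zero on MCC, which is why the table recommends MCC for imbalanced binary problems.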
Key Takeaways
- LSTMs remain effective for resource-constrained environments and short texts
- Fine-tuned BERT-style models achieve strong, often state-of-the-art results on most sentiment benchmarks, at a higher computational cost
- ABSA provides actionable insights by linking sentiment to specific aspects
- Multi-task learning can improve generalization across sentiment sub-tasks
- Always consider class imbalance and choose appropriate loss functions