Deep Learning for Sentiment Analysis

Deep learning approaches to sentiment analysis go well beyond lexicon-based methods. Modern architectures capture contextual nuance, long-range dependencies, and aspect-level granularity that word-polarity lists and bag-of-words models largely miss.

Evolution of Sentiment Analysis

The field has progressed through several key stages, each addressing a limitation of its predecessor.

Approach	Era	Key Technique	Limitation
Lexicon-based	2000s	Word polarity lists	No context awareness
Bag-of-Words + ML	2010s	Naive Bayes, SVM	Word order lost
Word Embeddings	2013+	Word2Vec, GloVe	Static representations
RNNs / LSTMs	2015+	Sequential modeling	No parallelism across time steps
Transformers	2018+	BERT, RoBERTa	Computational cost

LSTM-Based Sentiment Analysis

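Long Short-Term Memory networks (LSTMs) mitigate the vanishing gradient problem of vanilla RNNs. Their additively updated cell state lets information and gradients flow across many time steps, so sentiment cues that appear far apart in a document can still interact.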

LSTM Cell Architecture

An LSTM cell maintains three gates that control information flow:

LSTM Gate Equations

\begin{aligned} f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) & \text{(forget gate)} \\ i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) & \text{(input gate)} \\ \tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) & \text{(candidate)} \\ C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t & \text{(cell state)} \\ o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) & \text{(output gate)} \\ h_t &= o_t \odot \tanh(C_t) & \text{(hidden state)} \end{aligned}
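The gate equations translate almost line for line into code. The NumPy sketch below runs a single LSTM step. It keeps one weight matrix per gate so that each line matches one equation. The toy sizes and random weights are illustrative assumptions. Library implementations such as PyTorch's `nn.LSTM` instead stack the four gates' weights into a single matrix.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step, term by term from the gate equations.

    W: dict of weight matrices keyed "f", "i", "C", "o",
       each of shape (hidden, hidden + input), applied to [h_{t-1}, x_t]
    b: dict of bias vectors with the same keys, each of shape (hidden,)
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    c_tilde = np.tanh(W["C"] @ z + b["C"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # cell state (element-wise products)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # hidden state
    return h_t, c_t

# Toy sizes and random weights, for illustration only
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
W = {k: rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim)) for k in "fiCo"}
b = {k: np.zeros(hidden_dim) for k in "fiCo"}

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):  # a sequence of 5 input vectors
    h, c = lstm_step(x_t, h, c, W, b)
print(h, c)
```

Setting the forget bias high and the input bias low drives $f_t \to 1$ and $i_t \to 0$, so $C_t \approx C_{t-1}$. This additive path through the cell state is what lets information survive many steps.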

BiLSTM Implementation

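A bidirectional LSTM (BiLSTM) reads the sequence in both directions, so the representation at each position reflects both preceding and following words. The model below pools the BiLSTM states with a learned attention layer before classification: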

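import torch
import torch.nn as nn

class BiLSTMSentiment(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes,
                 num_layers=2, dropout=0.5, pad_idx=0):
        super().__init__()
        # pad_idx: id of the padding token (assumed to be 0 here)
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(
            embed_dim, hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            bidirectional=True,
            # PyTorch applies this dropout only between stacked layers
            dropout=dropout if num_layers > 1 else 0.0,
        )
        # One attention score per time step over the concatenated fwd/bwd states
        self.attention = nn.Linear(hidden_dim * 2, 1)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x, mask=None):
        # x: (batch, seq_len) token ids
        # mask: optional (batch, seq_len) bool tensor, True for real tokens
        embedded = self.embedding(x)              # (batch, seq_len, embed_dim)
        lstm_out, _ = self.lstm(embedded)         # (batch, seq_len, hidden*2)

        # Attention pooling over time steps
        scores = self.attention(lstm_out)         # (batch, seq_len, 1)
        if mask is not None:
            # Padding positions receive zero attention weight
            scores = scores.masked_fill(~mask.unsqueeze(-1), float("-inf"))
        attn_weights = torch.softmax(scores, dim=1)
        context = torch.sum(attn_weights * lstm_out, dim=1)  # (batch, hidden*2)

        return self.classifier(context)

model = BiLSTMSentiment(vocab_size=30000, embed_dim=300, hidden_dim=256, num_classes=3)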
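The attention layer produces one score per time step. A softmax over time then turns those scores into weights for a weighted sum of the BiLSTM states. When a batch contains padded sequences, padded positions should be excluded before the softmax. Otherwise they dilute the pooled vector. The NumPy sketch below isolates that pooling step. The boolean mask convention (True for real tokens) is an assumption.

```python
import numpy as np

def masked_attention_pool(states, scores, mask):
    """Attention pooling over time steps that ignores padding.

    states: (batch, seq_len, dim) encoder outputs, e.g. BiLSTM states
    scores: (batch, seq_len) unnormalized attention scores
    mask:   (batch, seq_len) bool, True for real tokens, False for padding
    """
    scores = np.where(mask, scores, -np.inf)             # padding gets no weight
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    pooled = (weights[..., None] * states).sum(axis=1)   # (batch, dim)
    return pooled, weights

states = np.array([[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                   [[2.0, 0.0], [0.0, 2.0], [9.0, 9.0]]])  # row 2: last step is padding
scores = np.zeros((2, 3))  # equal scores -> uniform weights over the real tokens
mask = np.array([[True, True, True], [True, True, False]])
pooled, weights = masked_attention_pool(states, scores, mask)
print(weights)
print(pooled)
```

With equal scores, the second sequence averages only its two real states. The padding vector `[9, 9]` does not leak into the result.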

BERT for Sentiment Analysis

BERT (Bidirectional Encoder Representations from Transformers) revolutionized sentiment analysis by providing deep bidirectional context through pre-training on masked language modeling.
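The masked language modeling objective corrupts the input and trains the model to recover the original tokens. The pure-Python sketch below shows the masking rule described in the BERT paper. A subset of positions (about 15%) is selected. Of those, 80% become `[MASK]`, 10% become a random token, and 10% are left unchanged. Here each position is selected independently with probability `mask_prob`. The function name is hypothetical, and special tokens such as `[CLS]` are assumed to be excluded beforehand.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """BERT-style masking for masked language modeling.

    Returns (corrupted, labels): labels[i] is the original token at
    selected positions (the prediction targets) and None elsewhere.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                          # not selected: no prediction target
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return corrupted, labels

tokens = "the food was great but the service was terrible".split()
corrupted, labels = mask_tokens(tokens, vocab=tokens, rng=random.Random(42))
print(corrupted)
print(labels)
```

The model is trained to predict the original token only at positions where `labels` is not None. Because the surrounding words on both sides stay visible, each prediction conditions on left and right context.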

Fine-Tuning Architecture

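from transformers import BertTokenizer, BertForSequenceClassification
from transformers import TrainingArguments, Trainer
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Tokenize dataset (e.g. via datasets.Dataset.map(tokenize_function, batched=True))
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        padding="max_length",
        truncation=True,
        max_length=512,  # BERT's maximum; shorter lengths train much faster
    )

# Metrics reported at each evaluation
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "macro_f1": f1_score(labels, preds, average="macro"),
    }

# train_dataset and eval_dataset are assumed to be tokenized datasets
# with "input_ids", "attention_mask", and "labels" columns.

# Fine-tuning configuration
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    learning_rate=2e-5,
    eval_strategy="epoch",  # older transformers releases call this evaluation_strategy
    save_strategy="epoch",  # must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)

trainer.train()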
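With `warmup_steps=500` and `learning_rate=2e-5`, `Trainer`'s default linear scheduler raises the learning rate from 0 to its peak over the first 500 optimizer steps. It then decays the rate linearly to 0 by the end of training. The pure-Python sketch below reproduces that shape. The helper name is hypothetical. The 3,000 total steps are an assumed value, since the real count depends on dataset size and batch size.

```python
def linear_warmup_lr(step, base_lr=2e-5, warmup_steps=500, total_steps=3000):
    """Learning rate at an optimizer step under linear warmup + linear decay.

    total_steps is hypothetical here; in practice it equals
    num_train_epochs * (batches per epoch).
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)        # ramp up from 0
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)  # decay to 0

for step in (0, 250, 500, 1750, 3000):
    print(step, linear_warmup_lr(step))
```

Warmup keeps the early updates small while the randomly initialized classification head is still producing large, noisy gradients. This helps avoid disrupting the pre-trained weights.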

Aspect-Based Sentiment Analysis (ABSA)

ABSA identifies sentiment toward specific aspects or entities within text, providing granular opinion mining.

Task Decomposition

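ABSA typically involves three subtasks: extracting aspect terms, identifying the opinion terms that express sentiment about them, and classifying the sentiment toward each aspect.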

ABSA Formulation

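Given a sentence $s$ and an aspect term $a$, the model predicts:

P(y|s, a) = \text{softmax}(W_o \cdot \text{ATT}(h_s, h_a) + b_o)

where $h_s$ and $h_a$ are the encoder's hidden representations of the sentence and the aspect, and $\text{ATT}$ is an attention mechanism that weights the context by its relevance to the aspect. The resulting single vector is then mapped to class scores by $W_o$ and $b_o$.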
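The formulation leaves $\text{ATT}$ abstract. One common choice, assumed here, is scaled dot-product attention with the averaged aspect states as the query. Each sentence token is scored against the aspect, and the scores are normalized into weights. The weighted sum of token states then feeds the output layer. The NumPy sketch below uses random toy vectors in place of real encoder outputs, and the function name is hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def aspect_attention_classify(h_s, h_a, W_o, b_o):
    """P(y | s, a) with a scaled dot-product instance of ATT.

    h_s: (seq_len, d) hidden states of the sentence tokens
    h_a: (aspect_len, d) hidden states of the aspect tokens
    W_o: (num_classes, d), b_o: (num_classes,)
    """
    query = h_a.mean(axis=0)                      # summarize the aspect
    scores = h_s @ query / np.sqrt(h_s.shape[1])  # relevance of each context token
    alpha = softmax(scores)                       # attention weights over the sentence
    context = alpha @ h_s                         # aspect-specific sentence vector
    return softmax(W_o @ context + b_o)

rng = np.random.default_rng(0)
d, num_classes = 8, 3
h_s = rng.normal(size=(10, d))      # 10 sentence tokens (toy values)
h_a = h_s[1:2]                      # aspect = the token at position 1
W_o, b_o = rng.normal(size=(num_classes, d)), np.zeros(num_classes)
probs = aspect_attention_classify(h_s, h_a, W_o, b_o)
print(probs)
```

Because the query comes from the aspect, the same sentence yields different context vectors for "food" and "service". This lets one sentence carry opposite polarities.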

from transformers import BertModel
import torch
import torch.nn as nn

class AspectBasedSentiment(nn.Module):
    def __init__(self, bert_model_name="bert-base-uncased", num_classes=3):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_model_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask, aspect_positions):
        # aspect_positions: one list of aspect token indices per example
        outputs = self.bert(input_ids, attention_mask=attention_mask)
        sequence_output = outputs.last_hidden_state  # (batch, seq_len, hidden)

        # Create aspect mask from positions
        aspect_mask = torch.zeros_like(input_ids, dtype=torch.bool)
        for i, pos in enumerate(aspect_positions):
            for p in pos:
                if p < aspect_mask.size(1):
                    aspect_mask[i, p] = True

        # Mean-pool the aspect tokens' hidden states; BERT's self-attention
        # has already mixed the sentence context into these states
        aspect_tokens = sequence_output * aspect_mask.unsqueeze(-1).float()
        aspect_repr = aspect_tokens.sum(dim=1) / aspect_mask.float().sum(dim=1, keepdim=True).clamp(min=1)

        logits = self.classifier(aspect_repr)
        return logits

# Example: "The food was great but the service was terrible"
# Aspect "food" → Positive
# Aspect "service" → Negative
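The `aspect_positions` argument expects token indices for each example. The pure-Python sketch below shows one way to compute them: it locates an aspect's tokens inside the tokenized sentence. The helper name is hypothetical, and the token lists are written out by hand. A real WordPiece tokenizer may split a word into several pieces, which is why the aspect is matched as a token list.

```python
def find_aspect_positions(tokens, aspect_tokens):
    """Indices of every occurrence of aspect_tokens within tokens.

    tokens: the tokenized input, e.g. ["[CLS]", "the", "food", ...]
    aspect_tokens: the tokenized aspect term, e.g. ["food"]
    """
    n = len(aspect_tokens)
    positions = set()
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == aspect_tokens:
            positions.update(range(start, start + n))
    return sorted(positions)

tokens = ["[CLS]", "the", "food", "was", "great", "but",
          "the", "service", "was", "terrible", "[SEP]"]
print(find_aspect_positions(tokens, ["food"]))     # [2]
print(find_aspect_positions(tokens, ["service"]))  # [7]
```

One call per (sentence, aspect) pair produces the list-of-lists that the model's `forward` method consumes.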

Data Structures for ABSA

Component	Example (aspect "food")
Sentence	"The food was great but the service was terrible"
Aspect Term	"food"
Aspect Category	"food quality"
Opinion Term	"great"
Sentiment Label	Positive
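The annotation components above map naturally onto a small record type. This lets one sentence hold several aspects with different polarities. The sketch below uses Python dataclasses. The class and field names are hypothetical, and the category labels ("food quality", "service") are illustrative, since category inventories vary by dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AspectAnnotation:
    term: str                # aspect term as it appears in the sentence
    category: Optional[str]  # coarser aspect category (dataset-specific labels)
    opinion: Optional[str]   # opinion term that carries the sentiment
    sentiment: str           # e.g. "positive", "negative", "neutral"

@dataclass
class ABSAExample:
    sentence: str
    aspects: List[AspectAnnotation] = field(default_factory=list)

example = ABSAExample(
    sentence="The food was great but the service was terrible",
    aspects=[
        AspectAnnotation("food", "food quality", "great", "positive"),
        AspectAnnotation("service", "service", "terrible", "negative"),
    ],
)
for a in example.aspects:
    print(f"{a.term}: {a.sentiment} (opinion: {a.opinion})")
# food: positive (opinion: great)
# service: negative (opinion: terrible)
```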

Multi-Task Learning for Sentiment

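Training several sentiment-related objectives on a shared encoder can improve generalization, because each task supplies an additional learning signal for the shared representation: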

import torch.nn as nn
from transformers import BertModel

class MultiTaskSentiment(nn.Module):
    def __init__(self, bert_model_name="bert-base-uncased"):
        super().__init__()
        # Shared encoder used by every task
        self.bert = BertModel.from_pretrained(bert_model_name)
        hidden_size = self.bert.config.hidden_size

        # Task-specific heads
        self.sentiment_head = nn.Linear(hidden_size, 3)      # Pos/Neg/Neu
        self.emotion_head = nn.Linear(hidden_size, 6)         # Joy, Anger, etc.
        self.aspect_head = nn.Linear(hidden_size, 5)          # Food, Service, etc.

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids, attention_mask=attention_mask)
        pooled = outputs.pooler_output  # (batch, hidden), derived from [CLS]

        return {
            "sentiment": self.sentiment_head(pooled),
            "emotion": self.emotion_head(pooled),
            "aspect": self.aspect_head(pooled),
        }

# Multi-task loss: weighted sum of per-task cross-entropies.
# Weights are keyed by task name so they cannot be paired with the
# wrong task if the labels dict is built in a different order.
DEFAULT_TASK_WEIGHTS = {"sentiment": 1.0, "emotion": 0.5, "aspect": 0.8}

def multi_task_loss(outputs, labels, weights=None):
    weights = weights or DEFAULT_TASK_WEIGHTS
    loss_fct = nn.CrossEntropyLoss()
    total_loss = 0.0
    for task, label in labels.items():  # tasks absent from labels are skipped
        total_loss = total_loss + weights[task] * loss_fct(outputs[task], label)
    return total_loss

Training Strategies

Focal Loss for Imbalanced Sentiment

Standard cross-entropy struggles with class imbalance: the many easy, majority-class examples dominate the total loss. Focal loss down-weights easy examples:

\text{FL}(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)

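where $p_t$ is the model's predicted probability for the true class, $\alpha_t$ is an optional per-class weight, and the focusing parameter $\gamma \ge 0$ shrinks the loss of well-classified examples (large $p_t$). The common setting $\gamma = 2$ concentrates learning on hard, misclassified examples, while $\gamma = 0$ recovers (weighted) cross-entropy.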
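The formula can be checked numerically for a single example. The pure-Python sketch below computes $p_t$ as the softmax probability of the true class. It then applies the modulating factor $(1 - p_t)^\gamma$ and an optional per-class $\alpha$. The function name and the toy logits are illustrative assumptions.

```python
import math

def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Multi-class focal loss for one example.

    logits: list of unnormalized class scores
    target: index of the true class
    alpha:  optional list of per-class weights (alpha_t = alpha[target])
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # numerically stable softmax
    p_t = exps[target] / sum(exps)            # probability of the true class
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy, hard = [4.0, 0.0, 0.0], [0.0, 2.0, 0.0]  # the true class is 0 in both
for name, logits in (("easy", easy), ("hard", hard)):
    ce = focal_loss(logits, 0, gamma=0.0)      # gamma = 0 gives cross-entropy
    fl = focal_loss(logits, 0, gamma=2.0)
    print(f"{name}: CE={ce:.4f}  FL={fl:.4f}  ratio={fl / ce:.3f}")
```

With $\gamma = 2$, the confidently correct example's loss shrinks roughly 800-fold. The misclassified example keeps about 80% of its cross-entropy, so hard examples dominate the gradient.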

Loss Function	Use Case	Key Benefit
Cross-Entropy	Balanced datasets	Simple, effective
Focal Loss	Imbalanced classes	Hard-example focus
Label Smoothing	Noisy labels	Regularization
Contrastive Loss	Few-shot sentiment	Metric learning
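Label smoothing, listed above for noisy labels, replaces the one-hot target with a mixture. The true class receives $1 - \varepsilon + \varepsilon/K$, and every other class receives $\varepsilon/K$, where $K$ is the number of classes. The pure-Python sketch below computes cross-entropy against that smoothed target. The function name and the value $\varepsilon = 0.1$ are illustrative choices.

```python
import math

def label_smoothing_ce(logits, target, epsilon=0.1):
    """Cross-entropy against a label-smoothed target distribution."""
    k = len(logits)
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))  # log-sum-exp
    log_probs = [z - log_z for z in logits]
    targets = [epsilon / k] * k       # uniform share for every class
    targets[target] += 1.0 - epsilon  # remaining mass on the labeled class
    return -sum(t * lp for t, lp in zip(targets, log_probs))

confident = [10.0, 0.0, 0.0]
print(label_smoothing_ce(confident, 0, epsilon=0.0))  # plain cross-entropy
print(label_smoothing_ce(confident, 0, epsilon=0.1))
```

With smoothing, an already confident correct prediction still incurs a noticeable loss. This discourages the model from widening its logit gap further, which also limits how much a mislabeled example can push it toward overconfidence.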

Evaluation and Best Practices

Metric	Formula	When to Use
Accuracy	$\frac{TP + TN}{Total}$	Balanced classes
Macro-F1	$\frac{1}{C}\sum_{i=1}^{C} F1_i$	Equal class importance
Weighted-F1	$\sum_i \frac{n_i}{N} F1_i$	Proportional to support
MCC	$\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$	Imbalanced binary
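The metrics in the table can diverge sharply on imbalanced data. The pure-Python sketch below computes them from predicted and true labels. It uses $F1 = 2TP / (2TP + FP + FN)$ per class, and it reports MCC as 0.0 when its denominator is zero. Both function names and the toy labels are illustrative.

```python
import math

def classification_metrics(y_true, y_pred):
    """Accuracy, macro-F1, and weighted-F1 for any number of classes."""
    classes = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    acc = sum(t == p for t, p in pairs) / n
    f1, support = {}, {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom else 0.0
        support[c] = sum(t == c for t in y_true)
    macro = sum(f1.values()) / len(classes)
    weighted = sum(support[c] / n * f1[c] for c in classes)
    return acc, macro, weighted

def mcc_binary(y_true, y_pred, positive=1):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10  # always predict the majority class
print(classification_metrics(y_true, y_pred), mcc_binary(y_true, y_pred))
```

The majority-class predictor scores 0.8 accuracy here. Its macro-F1 is only about 0.44, and its MCC is 0, which exposes that it never detects the minority class.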

Key Takeaways

LSTMs remain effective for resource-constrained environments and short texts
Fine-tuned BERT-family models are strong performers on most sentiment benchmarks
ABSA provides actionable insights by linking sentiment to specific aspects
Multi-task learning can improve generalization across sentiment sub-tasks
Account for class imbalance when choosing loss functions and evaluation metrics
