Deep Learning for Sentiment Analysis

Deep-learning approaches to sentiment analysis go far beyond simple lexicon-based methods. Modern architectures capture contextual nuance (such as the negation in "not bad"), long-range dependencies, and aspect-level granularity that traditional methods largely miss.

Evolution of Sentiment Analysis

The field has progressed through several key stages, each addressing limitations of its predecessor.

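| Approach | Era | Key Technique | Limitation |
|---|---|---|---|
| Lexicon-based | 2000s | Word polarity lists | No context awareness |
| Bag-of-Words + ML | 2000s–2010s | Naive Bayes, SVM | Word order lost |
| Word Embeddings | 2013+ | Word2Vec, GloVe | Static, context-independent representations |
| RNNs / LSTMs | 2015+ | Sequential modeling | Vanishing gradients in vanilla RNNs; slow sequential training |
| Transformers | 2018+ | BERT, RoBERTa | Computational cost |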

LSTM-Based Sentiment Analysis

Long Short-Term Memory networks (LSTMs) mitigate the vanishing gradient problem of vanilla RNNs through a gated, additively updated cell state, which makes them better suited than plain RNNs to capturing sentiment that depends on context many tokens away.

LSTM Cell Architecture

An LSTM cell maintains three gates that control information flow:

Definition: LSTM Gate Equations

$$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) && \text{(input gate)} \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) && \text{(candidate)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state)} \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(C_t) && \text{(hidden state)}
\end{aligned}
$$
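As a sanity check on the equations, here is one LSTM time step transcribed directly into NumPy. It is an illustrative sketch: the `lstm_step` name and the dict-of-gates parameter layout are our own, with one weight matrix per gate acting on the concatenation $[h_{t-1}, x_t]$ as written above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the gate equations.

    params maps gate name -> (W, b); each W has shape (hidden, hidden + input)
    and multiplies the concatenation [h_{t-1}, x_t].
    """
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    W_f, b_f = params["f"]
    W_i, b_i = params["i"]
    W_c, b_c = params["c"]
    W_o, b_o = params["o"]

    f_t = sigmoid(W_f @ z + b_f)                  # forget gate
    i_t = sigmoid(W_i @ z + b_i)                  # input gate
    c_tilde = np.tanh(W_c @ z + b_c)              # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde            # new cell state
    o_t = sigmoid(W_o @ z + b_o)                  # output gate
    h_t = o_t * np.tanh(c_t)                      # new hidden state
    return h_t, c_t

# Tiny example: input size 3, hidden size 2, small random weights
rng = np.random.default_rng(0)
hidden, inp = 2, 3
params = {g: (rng.normal(scale=0.1, size=(hidden, hidden + inp)), np.zeros(hidden))
          for g in ("f", "i", "c", "o")}
h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(4, inp)):              # run over a length-4 sequence
    h, c = lstm_step(x, h, c, params)
```

In practice `nn.LSTM` computes the same recurrences, but stores the four gate matrices stacked into single tensors (for example, `weight_ih_l0` has shape `(4*hidden_size, input_size)`) and evaluates them as batched matrix multiplications.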

BiLSTM Implementation

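A bidirectional LSTM (BiLSTM) runs one LSTM left-to-right and another right-to-left and concatenates their hidden states, so every position sees both preceding and following context. The model below pools these states with a learned attention layer before classifying: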

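import torch
import torch.nn as nn

class BiLSTMSentiment(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes,
                 num_layers=2, dropout=0.5, pad_idx=0):
        super().__init__()
        self.pad_idx = pad_idx  # assumes token ID 0 is reserved for padding
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(
            embed_dim, hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            bidirectional=True,
            # nn.LSTM applies dropout only between stacked layers
            dropout=dropout if num_layers > 1 else 0.0
        )
        # Attention pooling: one relevance score per time step
        self.attention = nn.Linear(hidden_dim * 2, 1)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes)
        )

    def forward(self, x):
        mask = x != self.pad_idx                # (batch, seq_len), True for real tokens
        embedded = self.embedding(x)            # (batch, seq_len, embed_dim)
        lstm_out, _ = self.lstm(embedded)       # (batch, seq_len, hidden*2)

        # Score each time step, exclude padding, normalize over the sequence
        scores = self.attention(lstm_out).squeeze(-1)                      # (batch, seq_len)
        scores = scores.masked_fill(~mask, float("-inf"))
        attn_weights = torch.softmax(scores, dim=1)
        context = torch.sum(attn_weights.unsqueeze(-1) * lstm_out, dim=1)  # (batch, hidden*2)

        return self.classifier(context)

model = BiLSTMSentiment(vocab_size=30000, embed_dim=300, hidden_dim=256, num_classes=3)
logits = model(torch.randint(1, 30000, (4, 50)))  # 4 sequences of 50 token IDs -> (4, 3)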
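The BiLSTM consumes a batch of integer token IDs padded to a common length. A minimal, hypothetical preprocessing step might build a vocabulary and pad batches as below; the whitespace tokenizer and the convention that ID 0 is `<pad>` and ID 1 is `<unk>` are simplifying assumptions, and real pipelines would use a proper tokenizer.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"

def build_vocab(texts, max_size=30000, min_freq=1):
    """Map the most frequent whitespace tokens to IDs; 0 = <pad>, 1 = <unk>."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    vocab = {PAD: 0, UNK: 1}
    for tok, freq in counts.most_common(max_size - len(vocab)):
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

def encode_batch(texts, vocab, max_len=None):
    """Convert texts to equal-length ID lists, truncated and right-padded with <pad>."""
    ids = [[vocab.get(tok, vocab[UNK]) for tok in t.lower().split()] for t in texts]
    length = max_len or max(len(seq) for seq in ids)
    return [seq[:length] + [vocab[PAD]] * (length - len(seq[:length])) for seq in ids]

texts = ["The food was great", "Terrible service", "great food"]
vocab = build_vocab(texts)
batch = encode_batch(texts, vocab)
```

The resulting nested list can be converted with `torch.tensor(batch)` and passed to the model; unknown words map to the `<unk>` ID rather than raising an error.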

BERT for Sentiment Analysis

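BERT (Bidirectional Encoder Representations from Transformers) transformed sentiment analysis: it is pre-trained with masked language modeling (plus next-sentence prediction) to produce deeply bidirectional contextual representations, which are then fine-tuned on labeled sentiment data.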
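The masked-language-modeling objective can be illustrated with the corruption rule described in the BERT paper: about 15% of token positions are selected as prediction targets, and of those, 80% are replaced by `[MASK]`, 10% by a random token, and 10% left unchanged. The sketch below applies that rule to string tokens; the `mask_tokens` name and the toy vocabulary are hypothetical, and real implementations (such as `DataCollatorForLanguageModeling` in Hugging Face `transformers`) work on token IDs and skip special tokens.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Apply BERT-style MLM corruption to a list of tokens.

    Returns (corrupted_tokens, targets), where targets[i] is the original token
    at positions selected for prediction and None elsewhere.
    """
    rng = random.Random(seed)
    corrupted, targets = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                           # not selected: no loss at this position
        targets[i] = tok                       # model must recover the original token
        r = rng.random()
        if r < 0.8:
            corrupted[i] = "[MASK]"            # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, targets

sentence = "the movie was surprisingly good".split()
corrupted, targets = mask_tokens(sentence, vocab=["film", "bad", "plot"], mask_prob=0.5, seed=1)
```

Because each target is predicted from both its left and right context, the learned representations are bidirectional, which is what fine-tuning for sentiment then builds on.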

Fine-Tuning Architecture

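from transformers import BertTokenizer, BertForSequenceClassification
from transformers import TrainingArguments, Trainer
import numpy as np
from sklearn.metrics import f1_score

model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Tokenize dataset (expects a "text" column; applied with Dataset.map(..., batched=True))
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        padding="max_length",
        truncation=True,
        max_length=512  # BERT's maximum; shorter limits (e.g. 128) train much faster
    )

# Assumption: raw_train / raw_eval are Hugging Face `datasets.Dataset` objects
# with "text" and integer "label" columns (3 classes); substitute your own data.
train_dataset = raw_train.map(tokenize_function, batched=True)
eval_dataset = raw_eval.map(tokenize_function, batched=True)

# Report accuracy and macro-F1 at each evaluation
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": float((preds == labels).mean()),
        "macro_f1": f1_score(labels, preds, average="macro"),
    }

# Fine-tuning configuration
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    learning_rate=2e-5,
    eval_strategy="epoch",   # named `evaluation_strategy` in transformers < 4.41
    save_strategy="epoch",   # must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)

trainer.train()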
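The `warmup_steps=500` and `learning_rate=2e-5` settings interact with the learning-rate schedule. By default `TrainingArguments` uses `lr_scheduler_type="linear"`: the rate rises linearly from 0 to the peak over the warmup steps, then decays linearly to 0 by the last step. The pure-Python function below reproduces that shape for sanity-checking; the total of 3,000 steps is an illustrative assumption, since the real value depends on dataset size, batch size, and epochs.

```python
def linear_warmup_lr(step, peak_lr=2e-5, warmup_steps=500, total_steps=3000):
    """Learning rate at `step` for linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

schedule = [linear_warmup_lr(s) for s in (0, 250, 500, 1750, 3000)]
```

For example, 16,000 training examples with a per-device batch size of 16 on one device give 1,000 steps per epoch, so 3 epochs means 3,000 total steps; the 500 warmup steps then cover the first half of the first epoch.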

Aspect-Based Sentiment Analysis (ABSA)

Aspect-based sentiment analysis (ABSA) identifies the sentiment expressed toward specific aspects or entities in a text, giving finer-grained opinion mining than a single document-level label.

Task Decomposition

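ABSA is typically decomposed into subtasks such as aspect term extraction (finding "food" in "The food was great"), aspect category detection (mapping it to "food quality"), and aspect sentiment classification (labeling it positive). The last of these is formalized as follows.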

Definition: ABSA Formulation

Given a sentence $s$ and an aspect term $a$, the model predicts:

$$P(y \mid s, a) = \text{softmax}(W_o \cdot \text{ATT}(h_s, h_a) + b_o)$$

where $h_s$ and $h_a$ are the encoder representations of the sentence and the aspect, and $\text{ATT}$ is an attention mechanism that weights the context by its relevance to the aspect.
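To make the formula concrete, the NumPy sketch below instantiates $\text{ATT}$ as scaled dot-product attention in which the aspect vector $h_a$ acts as the query over the context vectors $h_s$. This is one common choice among several (additive attention is another); the random vectors stand in for encoder outputs, and all names and sizes are illustrative.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def att(h_s, h_a):
    """Aspect-conditioned attention: weight each context vector by its
    scaled dot-product similarity to the aspect vector, then sum."""
    d = h_a.shape[-1]
    scores = h_s @ h_a / np.sqrt(d)           # (seq_len,)
    weights = softmax(scores)                 # attention distribution over tokens
    return weights @ h_s, weights             # (d,), (seq_len,)

rng = np.random.default_rng(0)
seq_len, d, num_classes = 9, 8, 3
h_s = rng.normal(size=(seq_len, d))           # context token representations
h_a = h_s[1]                                  # e.g. the vector at the aspect token
W_o, b_o = rng.normal(size=(num_classes, d)), np.zeros(num_classes)

context, weights = att(h_s, h_a)
probs = softmax(W_o @ context + b_o)          # P(y | s, a) over 3 sentiment classes
```

Tokens whose representations resemble the aspect receive more weight, so for the aspect "food" the classifier sees a context vector dominated by the words around it rather than by "service was terrible".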

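from transformers import BertModel
import torch
import torch.nn as nn

class AspectBasedSentiment(nn.Module):
    def __init__(self, bert_model_name="bert-base-uncased", num_classes=3):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_model_name)
        hidden = self.bert.config.hidden_size
        # Projects the pooled aspect vector h_a into an attention query
        self.aspect_attention = nn.Linear(hidden, hidden)
        # Classifier sees [h_a ; ATT(h_s, h_a)]
        self.classifier = nn.Linear(hidden * 2, num_classes)

    def forward(self, input_ids, attention_mask, aspect_positions):
        outputs = self.bert(input_ids, attention_mask=attention_mask)
        sequence_output = outputs.last_hidden_state  # h_s: (batch, seq_len, hidden)

        # Boolean mask marking each example's aspect token positions
        aspect_mask = torch.zeros_like(input_ids, dtype=torch.bool)
        for i, pos in enumerate(aspect_positions):
            for p in pos:
                if p < aspect_mask.size(1):
                    aspect_mask[i, p] = True

        # h_a: mean of the aspect token vectors
        aspect_tokens = sequence_output * aspect_mask.unsqueeze(-1).float()
        h_a = aspect_tokens.sum(dim=1) / aspect_mask.float().sum(dim=1, keepdim=True).clamp(min=1)

        # ATT(h_s, h_a): scaled dot-product attention with the aspect as query,
        # restricted to non-padding tokens (one simple choice of attention)
        query = self.aspect_attention(h_a).unsqueeze(-1)                      # (batch, hidden, 1)
        scores = torch.bmm(sequence_output, query).squeeze(-1)                # (batch, seq_len)
        scores = scores / sequence_output.size(-1) ** 0.5
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=1)
        context = torch.bmm(weights.unsqueeze(1), sequence_output).squeeze(1)  # (batch, hidden)

        logits = self.classifier(torch.cat([h_a, context], dim=-1))
        return logits

# Example: "The food was great but the service was terrible"
# Aspect "food"    -> Positive
# Aspect "service" -> Negative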
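The model's `aspect_positions` argument holds, for each example, the indices of the aspect's tokens in `input_ids`, so those indices must account for how the tokenizer splits words and for the leading `[CLS]` token. A minimal, hypothetical helper that locates an aspect's token sequence in a tokenized sentence is sketched below; it takes already-tokenized lists so it runs without downloading a model, and the tokens in the example are written by hand rather than produced by a real tokenizer.

```python
def find_aspect_positions(tokens, aspect_tokens):
    """Return the indices of the first occurrence of `aspect_tokens` inside
    `tokens` (e.g. ["[CLS]", ..., "[SEP]"]), or [] if it does not occur."""
    n, m = len(tokens), len(aspect_tokens)
    for start in range(n - m + 1):
        if tokens[start:start + m] == aspect_tokens:
            return list(range(start, start + m))
    return []

# Hand-written tokens, with [CLS] at index 0
tokens = ["[CLS]", "the", "food", "was", "great", "but", "the",
          "service", "was", "terrible", "[SEP]"]
aspect_positions = [
    find_aspect_positions(tokens, ["food"]),      # -> [2]
    find_aspect_positions(tokens, ["service"]),   # -> [7]
]
```

Here the same sentence appears twice, once per aspect, which matches the usual ABSA setup of one (sentence, aspect) pair per example. With a Hugging Face fast tokenizer, `return_offsets_mapping=True` gives character offsets per token, which is a more robust way to map an aspect's character span to token indices when words are split into sub-words.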

Data Structures for ABSA

| Component | Example | Label |
|---|---|---|
| Sentence | "The food was great but the service was terrible" | |
| Aspect Term | "food" | |
| Aspect Category | "food quality" | |
| Sentiment | | Positive |
| Opinion Term | "great" | |
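In code, one annotated sentence can be represented as a record holding one entry per aspect. The dataclasses below are a hypothetical schema mirroring the table's components, not a standard dataset format; the "service" annotation is an illustrative addition. The `to_pairs` method shows how a sentence with two aspects typically expands into two (sentence, aspect, label) training pairs for an aspect-level classifier.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class AspectAnnotation:
    term: str        # aspect term as it appears in the text, e.g. "food"
    category: str    # coarse aspect category, e.g. "food quality"
    opinion: str     # opinion term carrying the sentiment, e.g. "great"
    polarity: str    # sentiment label: "positive", "negative" or "neutral"

@dataclass
class ABSAExample:
    sentence: str
    aspects: List[AspectAnnotation] = field(default_factory=list)

    def to_pairs(self) -> List[Tuple[str, str, str]]:
        """Expand into one (sentence, aspect term, polarity) pair per aspect."""
        return [(self.sentence, a.term, a.polarity) for a in self.aspects]

example = ABSAExample(
    sentence="The food was great but the service was terrible",
    aspects=[
        AspectAnnotation("food", "food quality", "great", "positive"),
        AspectAnnotation("service", "service", "terrible", "negative"),
    ],
)
pairs = example.to_pairs()
```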

Multi-Task Learning for Sentiment

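Training one shared encoder on several related sentiment objectives (polarity, emotion, aspect category) can improve generalization, since the tasks share useful signal and regularize each other: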

from transformers import BertModel
import torch.nn as nn

class MultiTaskSentiment(nn.Module):
    def __init__(self, bert_model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_model_name)
        hidden_size = self.bert.config.hidden_size

        # Task-specific heads on a shared encoder
        self.sentiment_head = nn.Linear(hidden_size, 3)      # Pos/Neg/Neu
        self.emotion_head = nn.Linear(hidden_size, 6)         # Joy, Anger, etc.
        self.aspect_head = nn.Linear(hidden_size, 5)          # Food, Service, etc. (one per sentence)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids, attention_mask=attention_mask)
        pooled = outputs.pooler_output                        # (batch, hidden)

        return {
            "sentiment": self.sentiment_head(pooled),
            "emotion": self.emotion_head(pooled),
            "aspect": self.aspect_head(pooled),
        }

# Multi-task loss: weighted sum of per-task cross-entropy losses.
# Weights are keyed by task name so they cannot drift out of alignment with `labels`;
# tasks missing from `labels` (e.g. a batch from a sentiment-only dataset) are skipped.
def multi_task_loss(outputs, labels, weights=None):
    if weights is None:
        weights = {"sentiment": 1.0, "emotion": 0.5, "aspect": 0.8}
    loss_fct = nn.CrossEntropyLoss()
    total_loss = 0.0
    for task, label in labels.items():
        total_loss = total_loss + weights[task] * loss_fct(outputs[task], label)
    return total_loss

Training Strategies

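Definition: Focal Loss for Imbalanced Sentiment

Standard cross-entropy treats every example equally, so under class imbalance the loss is dominated by the many easy, majority-class examples. Focal loss down-weights well-classified examples:

$$\text{FL}(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)$$

where $p_t$ is the predicted probability of the true class, $\alpha_t$ is an optional class weight, and $\gamma \ge 0$ is the focusing parameter. Setting $\gamma = 0$ (with $\alpha_t = 1$) recovers cross-entropy; the commonly used $\gamma = 2$ focuses learning on hard, misclassified examples.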
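A direct NumPy transcription of the formula for a batch of logits and integer labels is sketched below. The optional `alpha` vector (giving $\alpha_t$ per class) and the mean reduction are assumptions of this sketch. In a PyTorch training loop the same steps can be built on `torch.nn.functional.cross_entropy(..., reduction="none")`, whose per-example output is $-\log p_t$.

```python
import numpy as np

def focal_loss(logits, labels, gamma=2.0, alpha=None):
    """Mean focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    logits: (batch, num_classes); labels: (batch,) integer class ids;
    alpha: optional (num_classes,) per-class weights (defaults to 1).
    """
    # Numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    log_pt = log_probs[np.arange(len(labels)), labels]   # log p_t for the true class
    pt = np.exp(log_pt)
    alpha_t = 1.0 if alpha is None else np.asarray(alpha)[labels]
    return np.mean(-alpha_t * (1.0 - pt) ** gamma * log_pt)

logits = np.array([[4.0, 0.0, 0.0],    # confident and correct (easy example)
                   [0.0, 0.5, 0.0]])   # uncertain (hard example)
labels = np.array([0, 2])
ce = focal_loss(logits, labels, gamma=0.0)   # gamma = 0 recovers cross-entropy
fl = focal_loss(logits, labels, gamma=2.0)
```

For the easy example $p_t \approx 0.965$, so $(1 - p_t)^2 \approx 0.001$ and its loss nearly vanishes, while the hard example ($p_t \approx 0.27$) keeps about half of its cross-entropy.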

| Loss Function | Use Case | Key Benefit |
|---|---|---|
| Cross-Entropy | Balanced datasets | Simple, effective |
| Focal Loss | Imbalanced classes | Hard-example focus |
| Label Smoothing | Noisy labels | Regularization |
| Contrastive Loss | Few-shot sentiment | Metric learning |
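Label smoothing, listed in the table above, replaces each one-hot target with a mixture of the one-hot vector and a uniform distribution over the $K$ classes: $(1 - \varepsilon)\,\text{onehot} + \varepsilon / K$. PyTorch exposes this as `nn.CrossEntropyLoss(label_smoothing=...)`; the pure-Python sketch below only computes the smoothed target so the arithmetic is visible (the function name is our own).

```python
def smooth_targets(label, num_classes, epsilon=0.1):
    """Smoothed target distribution: (1 - eps) * one_hot + eps / K."""
    return [(1.0 - epsilon) * (1.0 if k == label else 0.0) + epsilon / num_classes
            for k in range(num_classes)]

target = smooth_targets(label=2, num_classes=3, epsilon=0.1)
# roughly [0.0333, 0.0333, 0.9333]: the true class keeps most of the mass
```

Because the model is never pushed toward a probability of exactly 1, it is less prone to becoming overconfident on examples whose labels are wrong.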

Evaluation and Best Practices

| Metric | Formula | When to Use |
|---|---|---|
| Accuracy | $\frac{TP + TN}{\text{Total}}$ | Balanced classes |
| Macro-F1 | $\frac{1}{C}\sum_{i=1}^{C} F1_i$ | Equal class importance |
| Weighted-F1 | $\sum_i \frac{n_i}{N} F1_i$ | Proportional to class support |
| MCC | $\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$ | Imbalanced binary |

Here $C$ is the number of classes, $n_i$ the number of true examples of class $i$, and $N$ the total number of examples.
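These metrics can be computed from predictions in a few lines of pure Python, which makes the difference between macro and weighted averaging easy to see. This is a sketch for building intuition; in practice `sklearn.metrics.f1_score` (with `average="macro"` or `average="weighted"`) and `sklearn.metrics.matthews_corrcoef` compute the same quantities. The toy labels are made up.

```python
def per_class_f1(y_true, y_pred, cls):
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def macro_f1(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, c) for c in classes) / len(classes)

def weighted_f1(y_true, y_pred):
    n = len(y_true)
    classes = sorted(set(y_true))            # weight each class by its support n_i / N
    return sum(y_true.count(c) / n * per_class_f1(y_true, y_pred, c) for c in classes)

def mcc_binary(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Imbalanced toy example: 8 negatives, 2 positives, model always predicts 0
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)   # 0.8
```

The degenerate "always negative" model scores 0.8 accuracy and about 0.71 weighted-F1, but only about 0.44 macro-F1 and an MCC of 0, which is why macro-F1 and MCC are preferred when classes are imbalanced.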

Key Takeaways

  • LSTMs remain effective for resource-constrained environments and short texts
  • Fine-tuned Transformer encoders such as BERT and RoBERTa are strong defaults and outperform LSTMs on most sentiment benchmarks
  • ABSA provides actionable insights by linking sentiment to specific aspects
  • Multi-task learning can improve generalization across related sentiment sub-tasks
  • Always consider class imbalance when choosing loss functions (e.g. focal loss) and metrics (e.g. macro-F1, MCC)