LSTM and GRU Networks

LSTM and GRU

Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks mitigate the vanishing gradient problem of plain RNNs by introducing gating mechanisms that regulate how much information is kept, added, and exposed at each time step.

LSTM Gates

Forget Gate

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

Input Gate

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

Candidate Values

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

Cell State Update

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
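To connect this update to the vanishing-gradient claim: along the direct path through the cell state, and treating the gate values as constants (a simplifying assumption), the derivative of C_t with respect to C_{t-1} is f_t. The gradient across many steps therefore scales with the product of the forget-gate values. The sketch below uses made-up constant gate values purely for illustration; `direct_path_gradient` is a hypothetical helper, not part of any library.

```python
def direct_path_gradient(forget_values):
    """Product of forget-gate values along the cell-state path.

    Assumes scalar gates held constant, which ignores the gates'
    own dependence on earlier states.
    """
    grad = 1.0
    for f in forget_values:
        grad *= f
    return grad


steps = 100
mostly_open = direct_path_gradient([0.99] * steps)  # gate close to 1
half_open = direct_path_gradient([0.5] * steps)     # gate at 0.5

print(mostly_open)  # about 0.366
print(half_open)    # about 7.9e-31
```

A forget gate that stays near 1 lets gradient survive 100 steps at a usable magnitude, while a factor of 0.5 per step drives it to effectively zero.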

Output Gate

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
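The gate equations above can be sketched as a single time step in PyTorch. This is a minimal illustration rather than how `nn.LSTM` is implemented internally: the function name `lstm_step`, the `params` dictionary layout, and the tensor sizes are all assumptions made for the example. The last line of the step, h_t = o_t ⊙ tanh(C_t), is the standard LSTM hidden-state update; it is not listed among the equations above and is included here as an assumption about the intended formulation.

```python
import torch


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step written directly from the gate equations.

    x_t: (batch, input_dim); h_prev, c_prev: (batch, hidden_dim).
    params maps each of "f", "i", "c", "o" to a (weight, bias) pair with
    weight of shape (hidden_dim + input_dim, hidden_dim). Hypothetical layout.
    """
    concat = torch.cat([h_prev, x_t], dim=1)      # [h_{t-1}, x_t]
    W_f, b_f = params["f"]
    W_i, b_i = params["i"]
    W_c, b_c = params["c"]
    W_o, b_o = params["o"]

    f_t = torch.sigmoid(concat @ W_f + b_f)       # forget gate
    i_t = torch.sigmoid(concat @ W_i + b_i)       # input gate
    c_tilde = torch.tanh(concat @ W_c + b_c)      # candidate values
    c_t = f_t * c_prev + i_t * c_tilde            # cell state update
    o_t = torch.sigmoid(concat @ W_o + b_o)       # output gate
    h_t = o_t * torch.tanh(c_t)                   # hidden state (standard form, assumed)
    return h_t, c_t


torch.manual_seed(0)
batch, input_dim, hidden_dim = 4, 8, 16           # illustrative sizes
params = {
    gate: (0.1 * torch.randn(hidden_dim + input_dim, hidden_dim),
           torch.zeros(hidden_dim))
    for gate in ("f", "i", "c", "o")
}
x_t = torch.randn(batch, input_dim)
h_prev = torch.zeros(batch, hidden_dim)
c_prev = torch.zeros(batch, hidden_dim)

h_t, c_t = lstm_step(x_t, h_prev, c_prev, params)
print(h_t.shape, c_t.shape)
```

Processing a full sequence amounts to calling this step once per token and feeding `h_t` and `c_t` back in as `h_prev` and `c_prev`.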

LSTM Implementation

import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_layers,
                 num_classes, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(
            embed_dim, hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0.0,  # applied between stacked layers only
            bidirectional=True
        )
        self.fc = nn.Linear(hidden_dim * 2, num_classes)
        self.dropout = nn.Dropout(dropout)

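    def forward(self, text):
        # text: (batch, seq_len) tensor of token indices
        embedded = self.dropout(self.embedding(text))
        # hidden: (num_layers * 2, batch, hidden_dim)
        output, (hidden, cell) = self.lstm(embedded)

        # Concatenate the top layer's final forward (hidden[-2]) and
        # backward (hidden[-1]) hidden states -> (batch, hidden_dim * 2)
        hidden = torch.cat((hidden[-2], hidden[-1]), dim=1)
        hidden = self.dropout(hidden)
        return self.fc(hidden)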

# Initialize
model = LSTMClassifier(
    vocab_size=25000, embed_dim=300,
    hidden_dim=256, num_layers=2,
    num_classes=2, dropout=0.5
)

# Forward pass
batch = torch.randint(0, 25000, (32, 100))
output = model(batch)
print(output.shape)  # torch.Size([32, 2])
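The classifier relies on `hidden[-2]` and `hidden[-1]` being the top layer's forward and backward final states. The sketch below checks that reading against the per-token `output` tensor on a small stacked bidirectional LSTM. The layer sizes are arbitrary values chosen for the example, and dropout is left at its default of zero so the comparison is deterministic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim = 16                                    # illustrative size
lstm = nn.LSTM(8, hidden_dim, num_layers=2,
               batch_first=True, bidirectional=True)

x = torch.randn(4, 10, 8)                          # (batch, seq_len, embed_dim)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([4, 10, 32])
print(h_n.shape)     # torch.Size([4, 4, 16]) -> (num_layers * 2, batch, hidden_dim)

# The forward direction finishes at the last time step,
# the backward direction finishes at the first one.
fwd_final = output[:, -1, :hidden_dim]
bwd_final = output[:, 0, hidden_dim:]

matches_forward = torch.allclose(h_n[-2], fwd_final, atol=1e-6)
matches_backward = torch.allclose(h_n[-1], bwd_final, atol=1e-6)
print(matches_forward, matches_backward)
```

Note that this holds for unpadded batches where every sequence has the same length, as in the random batch used above.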

GRU Architecture

GRU simplifies LSTM by combining the forget and input gates into a single update gate and by merging the cell state into the hidden state, so each step carries only one state vector.

GRU Update Gate

$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)$$

GRU Reset Gate

$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)$$

GRU Candidate

$$\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t] + b)$$

GRU Output

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
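The four GRU equations above can be written out as one time step. This is a sketch that follows the notation of this lesson; library implementations may arrange the gates or the interpolation differently. The function name `gru_step`, the `params` layout, and the sizes are assumptions made for the example.

```python
import torch


def gru_step(x_t, h_prev, params):
    """One GRU time step written directly from the equations above.

    x_t: (batch, input_dim); h_prev: (batch, hidden_dim).
    params maps "z", "r", "h" to (weight, bias) pairs with weight of
    shape (hidden_dim + input_dim, hidden_dim). Hypothetical layout.
    """
    W_z, b_z = params["z"]
    W_r, b_r = params["r"]
    W_h, b_h = params["h"]

    concat = torch.cat([h_prev, x_t], dim=1)          # [h_{t-1}, x_t]
    z_t = torch.sigmoid(concat @ W_z + b_z)           # update gate
    r_t = torch.sigmoid(concat @ W_r + b_r)           # reset gate
    gated = torch.cat([r_t * h_prev, x_t], dim=1)     # [r_t * h_{t-1}, x_t]
    h_tilde = torch.tanh(gated @ W_h + b_h)           # candidate
    return (1 - z_t) * h_prev + z_t * h_tilde         # interpolated new state


torch.manual_seed(0)
batch, input_dim, hidden_dim = 4, 8, 16               # illustrative sizes
params = {
    gate: (0.1 * torch.randn(hidden_dim + input_dim, hidden_dim),
           torch.zeros(hidden_dim))
    for gate in ("z", "r", "h")
}
x_t = torch.randn(batch, input_dim)
h_prev = torch.randn(batch, hidden_dim)

h_t = gru_step(x_t, h_prev, params)
print(h_t.shape)

# With a strongly negative update-gate bias, z_t is close to 0
# and the step copies the previous state almost unchanged.
closed = dict(params)
closed["z"] = (params["z"][0], torch.full((hidden_dim,), -20.0))
h_copy = gru_step(x_t, h_prev, closed)
```

The second call shows the role of the update gate: when z_t is near 0 the old state passes through, and when it is near 1 the candidate replaces it.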

GRU Implementation

class GRUClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_layers,
                 num_classes, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(
            embed_dim, hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0.0,  # applied between stacked layers only
            bidirectional=True
        )
        self.fc = nn.Linear(hidden_dim * 2, num_classes)
        self.dropout = nn.Dropout(dropout)

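    def forward(self, text):
        # text: (batch, seq_len) tensor of token indices
        embedded = self.dropout(self.embedding(text))
        # A GRU returns only a hidden state (no separate cell state);
        # hidden: (num_layers * 2, batch, hidden_dim)
        output, hidden = self.gru(embedded)
        # Concatenate the top layer's final forward and backward hidden states
        hidden = torch.cat((hidden[-2], hidden[-1]), dim=1)
        return self.fc(self.dropout(hidden))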

LSTM vs GRU Comparison

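| Aspect | LSTM | GRU |
|---|---|---|
| Gates | 3 (forget, input, output) | 2 (update, reset) |
| Parameters | More (4 weight blocks per layer) | Fewer (3 weight blocks per layer) |
| Training speed | Slower per step | Faster per step |
| Performance | Sometimes slightly better (task-dependent) | Often comparable |
| Memory usage | Higher | Lower |
| Cell state | Separate cell state C_t | None; memory is carried in the hidden state h_t |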
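The parameter row of the comparison can be checked by counting weights in PyTorch's built-in layers. The sketch below reuses the sizes from the classifiers in this lesson; `count_parameters` is a small hypothetical helper, not a library function.

```python
import torch.nn as nn


def count_parameters(module):
    """Total number of elements across all parameter tensors."""
    return sum(p.numel() for p in module.parameters())


lstm = nn.LSTM(300, 256, num_layers=2, batch_first=True, bidirectional=True)
gru = nn.GRU(300, 256, num_layers=2, batch_first=True, bidirectional=True)

lstm_params = count_parameters(lstm)
gru_params = count_parameters(gru)

print(lstm_params, gru_params)
print(gru_params / lstm_params)  # 0.75: three weight blocks instead of four
```

For identical input and hidden sizes the GRU has three quarters of the LSTM's recurrent-layer parameters. The embedding and output layers are the same in both classifiers, so the gap in total model size is smaller than that ratio suggests.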

Bidirectional RNNs

A bidirectional RNN processes the sequence in both directions, so the representation at each position can draw on both the preceding and the following context.

# Forward: "The cat sat" -> h_forward
# Backward: "sat cat The" -> h_backward
# Combined: [h_forward; h_backward]

bilstm = nn.LSTM(128, 256, bidirectional=True, batch_first=True)
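As a quick illustration of what the combined representation looks like, the sketch below runs a random batch through the same layer and splits the per-token output into its two directions. The batch size and sequence length are arbitrary values chosen for the example.

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(128, 256, bidirectional=True, batch_first=True)

x = torch.randn(4, 10, 128)              # (batch, seq_len, input_size)
output, (h_n, c_n) = bilstm(x)

print(output.shape)  # torch.Size([4, 10, 512]) -> forward and backward concatenated
print(h_n.shape)     # torch.Size([2, 4, 256])  -> one final state per direction

# The last dimension holds the forward half first, then the backward half.
h_forward = output[..., :256]
h_backward = output[..., 256:]
print(h_forward.shape, h_backward.shape)
```

Any layer that consumes `output` therefore needs an input size of twice the hidden size, which is why the classifiers above use `hidden_dim * 2` in their final linear layer.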