
Fine-Tuning Transformers

Hugging Face Trainer, LoRA, and QLoRA

Fine-tuning adapts pre-trained models to specific tasks. Modern approaches range from full fine-tuning to parameter-efficient methods like LoRA and QLoRA.

Hugging Face Trainer

The Trainer API provides a complete training loop with logging, evaluation, and checkpointing.

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
)
from datasets import load_dataset
import numpy as np

# Load dataset
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        padding="max_length",
        truncation=True,
        max_length=512
    )

tokenized_datasets = dataset.map(tokenize_function, batched=True)
small_train = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))

# Load model
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Define metrics
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = (predictions == labels).mean()
    return {"accuracy": accuracy}

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=19,  # ~10% of the 189 optimizer steps (63 per epoch x 3 epochs on one device)
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
    eval_strategy="epoch",  # named evaluation_strategy in transformers < 4.41
    save_strategy="epoch",
    load_best_model_at_end=True,
    learning_rate=2e-5,
)

# Create trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train,
    eval_dataset=small_eval,
    compute_metrics=compute_metrics,
)

# Train
trainer.train()
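
Before a long run, it can help to sanity-check the metric function on a few hand-made values. The sketch below repeats compute_metrics from above and feeds it four made-up logit rows; the logits and labels are illustrative assumptions, not IMDB data.

```python
import numpy as np

def compute_metrics(eval_pred):
    # eval_pred unpacks into (logits, labels), as in the Trainer example
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

# Four hand-made examples: each row is [negative_logit, positive_logit]
logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]])
labels = np.array([0, 1, 0, 0])

metrics = compute_metrics((logits, labels))
print(metrics)
```

The third example is deliberately misclassified, so three of the four predictions match their labels.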

LoRA (Low-Rank Adaptation)

LoRA freezes pre-trained weights and injects trainable low-rank decomposition matrices into each transformer layer.

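LoRA Weight Update

$$W' = W + \Delta W = W + BA$$

Where:

  • W: Original frozen weight (d × d)
  • B: Trainable low-rank matrix (d × r)
  • A: Trainable low-rank matrix (r × d)
  • r: Rank (typically 4-64, much smaller than d)

LoRA Forward Pass

$$h = Wx + \frac{\alpha}{r} BAx$$

Where α is a scaling hyperparameter (lora_alpha in PEFT), so the low-rank update BA is scaled by α/r before it is added to the frozen path.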
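
The forward pass can be checked numerically. This is a minimal NumPy sketch, not PEFT's implementation: the sizes d, r, and alpha are toy values chosen for illustration, and lora_forward is a hypothetical helper name. B starts at zero, which is the initialization described in the LoRA paper, so the adapted layer initially behaves exactly like the frozen one.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 16, 4, 8           # toy sizes, chosen only for illustration

W = rng.normal(size=(d, d))      # frozen pre-trained weight
A = rng.normal(size=(r, d))      # trainable, random init
B = np.zeros((d, r))             # trainable, zero init so the update starts at 0

def lora_forward(x, W, A, B, alpha, r):
    """Compute h = W x + (alpha / r) * B A x without forming B @ A."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
h_start = lora_forward(x, W, A, B, alpha, r)

# Pretend training has moved B away from zero.
B_trained = rng.normal(size=(d, r))
h_trained = lora_forward(x, W, A, B_trained, alpha, r)

# Merging the adapter into W gives the same output with no extra matmuls.
W_merged = W + (alpha / r) * (B_trained @ A)

print(np.allclose(h_start, W @ x))
print(np.allclose(h_trained, W_merged @ x))
```

Because the update can be merged into W after training, a LoRA-adapted layer need not add inference latency.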

from peft import LoraConfig, get_peft_model, TaskType

# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                          # Rank
    lora_alpha=32,                # Scaling factor
    lora_dropout=0.1,
    target_modules=["query", "value"],  # Apply to attention layers
    bias="none",
)

# Create PEFT model
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
peft_model = get_peft_model(model, lora_config)

# Print trainable parameters
peft_model.print_trainable_parameters()
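# Expect roughly 0.3M trainable parameters (the LoRA matrices plus the
# classification head) out of ~110M, i.e. about 0.3%; the exact printout
# depends on the PEFT version.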

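LoRA reduces the number of trainable parameters by two to three orders of magnitude, depending on the rank and on which modules are targeted, while typically staying close to full fine-tuning in task performance. The low-rank constraint rests on the assumption that the weight updates needed for adaptation have a low intrinsic rank.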
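
The per-matrix saving follows directly from the shapes of B and A. The sketch below is plain arithmetic under an assumed hidden size of 4096; the function names are hypothetical. It counts one adapted weight matrix only, so the whole-model ratio is larger still, since most matrices receive no adapter at all.

```python
def full_update_params(d_in, d_out):
    # A dense update has one parameter per weight entry.
    return d_in * d_out

def lora_update_params(d_in, d_out, r):
    # B is (d_out x r) and A is (r x d_in).
    return r * (d_in + d_out)

d = 4096  # assumed hidden size, for illustration only
full = full_update_params(d, d)
for r in (4, 8, 64):
    lora = lora_update_params(d, d, r)
    print(f"r={r:>2}: {lora:>9,} vs {full:,} params ({full // lora}x fewer)")
```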

QLoRA (Quantized LoRA)

QLoRA combines 4-bit quantization with LoRA, enabling fine-tuning of large models on consumer GPUs.

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA config for QLoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

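# Prepare the quantized model for training before attaching the adapters
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model)

model = get_peft_model(model, lora_config)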
model.print_trainable_parameters()
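
A back-of-the-envelope estimate shows where the saving comes from. This sketch counts the memory of the weights alone and deliberately ignores quantization constants, activations, LoRA parameters, and optimizer state, so real usage will be higher; weight_memory_gb is a hypothetical helper and 7e9 is a nominal parameter count.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

n = 7e9  # nominal parameter count for a "7B" model
for name, bits in [("fp32", 32), ("bf16", 16), ("nf4", 4)]:
    print(f"{name}: {weight_memory_gb(n, bits):.1f} GB")
```

Storing the frozen weights in 4 bits shrinks them to a quarter of their 16-bit size, which is what brings a 7B model within reach of a single consumer GPU.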

Memory Comparison

| Method | Approx. GPU Memory (7B model) | Trainable Params | Relative Speed |
|---|---|---|---|
| Full fine-tuning | ~40GB | 100% | 1x |
| LoRA (r=8) | ~18GB | ~0.1% | ~1.2x |
| QLoRA (r=16) | ~6GB | ~0.1% | ~0.8x |
| Adapter | ~12GB | ~1% | ~1.1x |

Comparison of Fine-Tuning Methods

# Method 1: Full fine-tuning
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
# All parameters are trainable

# Method 2: Freeze base, train head only
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
for param in model.base_model.parameters():
    param.requires_grad = False

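# Method 3: LoRA (on a fresh copy of the base model)
from peft import get_peft_model, LoraConfig
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
lora_config = LoraConfig(
    task_type="SEQ_CLS",  # keeps the classification head trainable
    r=8, lora_alpha=16, target_modules=["query", "value"],
)
model = get_peft_model(model, lora_config)

# Method 4: Bottleneck adapter layers
# PEFT has no AdapterConfig class; this sketch assumes the separate
# AdapterHub `adapters` package (pip install adapters) instead.
import adapters
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
adapters.init(model)                              # add adapter support to the model
model.add_adapter("bottleneck", config="seq_bn")  # sequential bottleneck adapter
model.train_adapter("bottleneck")                 # freeze the base model, train the adapter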

Training Best Practices

| Practice | Recommendation | Impact |
|---|---|---|
| Learning rate | 2e-5 to 5e-5 (full), 1e-4 to 3e-4 (LoRA) | Stability |
| Batch size | 16-32 effective | Generalization |
| Warmup | 6-10% of steps | Convergence |
| Weight decay | 0.01-0.1 | Regularization |
| Epochs | 2-5 | Overfitting |
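
The "effective" batch size and the warmup percentage both translate into concrete step counts. The helper below is a hypothetical sketch of that arithmetic; its name and arguments are assumptions, and it approximates rather than reproduces how Trainer counts steps when gradient accumulation is used. The example call uses the 1,000-example subset, batch size 16, and 3 epochs from the Trainer section on a single device.

```python
import math

def schedule_summary(num_examples, per_device_batch, grad_accum,
                     num_devices, epochs, warmup_fraction):
    """Derive effective batch size and step counts from the basic settings."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_fraction)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }

summary = schedule_summary(
    num_examples=1000, per_device_batch=16, grad_accum=1,
    num_devices=1, epochs=3, warmup_fraction=0.1,
)
print(summary)
```

On small datasets the total step count is low, so a fixed warmup value chosen for a large run can easily exceed the whole schedule; deriving it from the fraction avoids that.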

Mixed Precision Training

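import torch

# Assumes `model`, `optimizer`, and `dataloader` are already defined,
# with the model and each batch on the GPU.
scaler = torch.amp.GradScaler("cuda")  # torch.cuda.amp.GradScaler() on older PyTorch

for batch in dataloader:
    optimizer.zero_grad()
    # Run the forward pass in FP16 where it is numerically safe
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        outputs = model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
        )
        loss = outputs.loss

    scaler.scale(loss).backward()  # scale the loss to avoid FP16 gradient underflow
    scaler.step(optimizer)         # unscale gradients; skip the step on inf/NaN
    scaler.update()                # adjust the scale factor for the next iteration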

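Mixed precision training runs the forward and backward passes in FP16 where it is numerically safe and keeps the weights and optimizer updates in FP32. This can cut activation memory by up to about half and speeds up training on GPUs with Tensor Cores, while loss scaling preserves small gradients that FP16 would otherwise flush to zero.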
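
The need for loss scaling is easy to reproduce outside a training loop. This NumPy sketch uses a made-up gradient value of 1e-8 and a scale of 2**16, which matches GradScaler's default initial scale; it mimics the idea only and is not how PyTorch implements it.

```python
import numpy as np

grad = 1e-8        # a small, made-up gradient value
scale = 65536.0    # 2**16

# Stored directly in FP16, the value is below the smallest subnormal and is lost.
unscaled_fp16 = np.float16(grad)

# Scaling first moves it into FP16's representable range.
scaled_fp16 = np.float16(grad * scale)

# Unscaling in FP32 recovers the original value to FP16 precision.
recovered = np.float32(scaled_fp16) / np.float32(scale)

print(unscaled_fp16, scaled_fp16, recovered)
```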
