
Federated Learning — Privacy-Preserving ML


Federated Learning — Training Models Without Sharing Data

Learn how federated learning enables collaborative model training while keeping data private and secure. The approach is especially relevant to healthcare, finance, and other privacy-sensitive applications.

  • Federated Averaging — The core algorithm for distributed training
  • Privacy Preservation — Keeping data local while learning globally
  • Communication Efficiency — Reducing the cost of distributed learning

"The future of AI is decentralized and privacy-preserving."

Federated Learning — Complete Guide

Federated learning trains models across decentralized devices without centralizing data, which makes it a natural fit for privacy-sensitive applications.


Federated Learning Architecture

Figure: Federated learning architecture. A central server holds the global model θ and handles aggregation and distribution. Four devices (Hospitals A to D) hold local data that is never shared, such as patient records, lab results, imaging data, and genomic data. The server broadcasts θ to every device, and each device returns an update Δθᵢ.

Federated Averaging (FedAvg): $\theta_{\text{global}} = \sum_k \frac{n_k}{n} \theta_k$, where $n_k$ is the number of local samples on device $k$ and $n$ is the total number of samples across all devices.

How It Works

Definition: Federated Learning

Federated learning is a distributed machine learning approach where models are trained across multiple decentralized devices or servers holding local data samples, without exchanging the raw data.

Centralized vs Federated

  • Centralized ML: Data → Central server → Model (privacy risk, regulatory burden)
  • Federated ML: Model → Devices → Updates → Server (data stays local)

Formal Problem Statement:

Given $K$ clients with local datasets $\mathcal{D}_k$, minimize:

$$\min_{\theta} F(\theta) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(\theta)$$

where $F_k(\theta) = \frac{1}{n_k} \sum_{i \in \mathcal{D}_k} \ell(x_i, y_i; \theta)$ is the local objective, $n_k$ is the number of samples on client $k$, and $n = \sum_k n_k$.
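The weighting by $n_k/n$ in the objective above makes the federated objective equal to the average loss over the pooled data. The following is a minimal sketch of that identity, assuming a scalar model $y \approx \theta x$ with squared-error loss; the data and function names are hypothetical and chosen only for illustration.

```python
def local_objective(theta, data):
    """F_k(theta): mean squared error of y ≈ theta * x on one client's data."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)


def global_objective(theta, clients):
    """F(theta) = sum_k (n_k / n) * F_k(theta)."""
    n = sum(len(d) for d in clients)
    return sum(len(d) / n * local_objective(theta, d) for d in clients)


# Three clients with different amounts of (x, y) data (made-up values).
clients = [
    [(1.0, 2.0), (2.0, 4.1)],
    [(3.0, 5.9)],
    [(0.5, 1.2), (1.5, 2.8), (2.5, 5.1)],
]
pooled = [pair for d in clients for pair in d]

theta = 1.9
weighted = global_objective(theta, clients)    # federated objective
centralized = local_objective(theta, pooled)   # loss on pooled data
print(weighted, centralized)                   # the two values agree
```

The agreement holds only for this sample-count weighting; an unweighted average of the $F_k$ would over-represent small clients.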


The FedAvg Algorithm

FedAvg Algorithm: Communication Round t

  1. Server Broadcast: Send $\theta_t$ to all $K$ clients (the global model is shared).
  2. Local Training: Each client $k$ runs $E$ epochs of $\theta_k \leftarrow \theta_k - \eta \nabla F_k(\theta_k)$.
  3. Upload Updates: Clients send $\Delta\theta_k$ to the server (gradients or model diffs).
  4. Aggregation: $\theta_{t+1} = \sum_k \frac{n_k}{n} \theta_k$ (weighted average).

Repeat for $t = 0, 1, 2, \ldots, T$ rounds.

Convergence Guarantee (Convex Case): $F(\bar{\theta}) - F(\theta^*) \le O(1/\sqrt{KT}) + O(1/(\eta KT)) + O(E^2 G^2/(K^2 \eta^2))$

Communication Complexity: Total cost = $T \times K \times d$ ($T$ rounds × $K$ clients × $d$ parameters). Compression (Top-K sparsification, quantization) can reduce this by 10-100×.
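The four steps of a FedAvg round can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production implementation: the model is a linear fit $y \approx wx + b$, every client participates in every round, and the helper names (`local_sgd`, `fedavg_round`, `make_client`) and synthetic data are hypothetical.

```python
import random


def local_sgd(theta, data, lr=0.05, epochs=5):
    """Step 2: run `epochs` passes of per-sample SGD on squared error."""
    w, b = theta
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * 2 * err * x
            b -= lr * 2 * err
    return (w, b)


def fedavg_round(theta, clients, lr=0.05, epochs=5):
    """Steps 1-4: broadcast theta, train locally, average weighted by n_k / n."""
    n = sum(len(d) for d in clients)
    local_models = [local_sgd(theta, d, lr, epochs) for d in clients]
    w = sum(len(d) / n * m[0] for d, m in zip(clients, local_models))
    b = sum(len(d) / n * m[1] for d, m in zip(clients, local_models))
    return (w, b)


def make_client(num_samples, rng):
    """Synthetic client data drawn from y = 3x + 1 plus small noise."""
    data = []
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.01)))
    return data


rng = random.Random(0)
clients = [make_client(n, rng) for n in (20, 50, 30)]

theta = (0.0, 0.0)
for t in range(30):          # T = 30 communication rounds
    theta = fedavg_round(theta, clients)
print(theta)                 # close to the true parameters (3, 1)
```

Here the raw `(x, y)` pairs never leave `local_sgd`; only the locally trained parameters are passed to the averaging step, which is the property the algorithm is built around.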

Differential Privacy in Federated Learning

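Differential Privacy: (ε, δ)-DP Mechanism

Definition, (ε, δ)-Differential Privacy: a mechanism $M$ satisfies $\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S] + \delta$ for every output set $S$ and all neighboring datasets with $|D \,\Delta\, D'| = 1$.

  1. Gradient Clipping: $g \leftarrow g \cdot \min(1, C/\|g\|)$. Clipping gradients to norm $C$ controls the sensitivity of the averaged gradient, $\Delta f = 2C/n$.
  2. Gaussian Noise: $\tilde{g} = g + \mathcal{N}(0, \sigma^2 I)$ with $\sigma \ge \Delta f \cdot \sqrt{2\ln(1.25/\delta)}/\varepsilon$ (the classic bound, valid for $\varepsilon < 1$). Higher ε means less noise and less privacy.
  3. Private Update: $\theta \leftarrow \theta - \eta \cdot \tilde{g}$. The server aggregates the noisy gradients, and each round has a privacy cost $\varepsilon_r$.

Privacy Accounting (Composition Theorem): after $T$ rounds, $\varepsilon_{\text{total}} \approx \sqrt{2T \ln(1/\delta)} \cdot \varepsilon_r$ to leading order for small $\varepsilon_r$. Rényi DP and the moments accountant give tighter bounds.

Key trade-off: More training rounds consume more privacy budget, so either σ must grow or training must stop earlier.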
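The clip-then-noise mechanism above can be sketched as follows. This is a minimal illustration that takes the sensitivity $\Delta f = 2C/n$ and the classic Gaussian-mechanism bound from the text as given; the function names are hypothetical, and `random.gauss` is used only for readability (a real deployment would need a cryptographically secure noise source and a proper privacy accountant).

```python
import math
import random


def clip(update, clip_norm):
    """g <- g * min(1, C / ||g||): rescale so the L2 norm is at most C."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def gaussian_sigma(epsilon, delta, sensitivity):
    """sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon (for epsilon < 1)."""
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def private_mean(updates, clip_norm, epsilon, delta, rng):
    """Average clipped updates, then add Gaussian noise to each coordinate."""
    n = len(updates)
    clipped = [clip(u, clip_norm) for u in updates]
    mean = [sum(col) / n for col in zip(*clipped)]
    sigma = gaussian_sigma(epsilon, delta, sensitivity=2 * clip_norm / n)
    return [m + rng.gauss(0.0, sigma) for m in mean]


rng = random.Random(0)
updates = [[3.0, 4.0], [0.3, 0.4], [-1.0, 2.0]]   # made-up client gradients
noisy = private_mean(updates, clip_norm=1.0, epsilon=0.5, delta=1e-5, rng=rng)
print(noisy)
```

With only three clients the noise dominates the signal; because the sensitivity shrinks as $1/n$, the same ε becomes much cheaper in accuracy terms as the number of participants grows.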

Communication Efficiency

Definition: Communication Bottleneck

In federated learning, communication is the primary bottleneck. Each round requires sending model parameters ($d$ dimensions) to $K$ clients and receiving updates back. Compression techniques reduce this cost at the expense of convergence speed.
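To make the cost concrete, the following sketch works through the arithmetic for one direction of traffic (client uploads). The function name, the example sizes, and the assumption that each sparse entry carries a 32-bit index are all illustrative choices rather than properties of any particular system.

```python
def total_upload_bytes(rounds, clients, params,
                       value_bits=32, keep_fraction=1.0, index_bits=0):
    """Bytes uploaded over all rounds: T * K * (values sent) * (bits per value) / 8."""
    values_per_update = params * keep_fraction
    bits_per_update = values_per_update * (value_bits + index_bits)
    return rounds * clients * bits_per_update / 8


T, K, d = 100, 10, 1_000_000   # rounds, clients, parameters (example sizes)

dense = total_upload_bytes(T, K, d)                    # 32-bit floats
quantized = total_upload_bytes(T, K, d, value_bits=8)  # 8-bit values
sparse = total_upload_bytes(T, K, d, keep_fraction=0.01, index_bits=32)  # top 1%

print(f"dense:     {dense / 1e9:.2f} GB")
print(f"quantized: {quantized / 1e9:.2f} GB ({dense / quantized:.0f}x smaller)")
print(f"sparse:    {sparse / 1e9:.2f} GB ({dense / sparse:.0f}x smaller)")
```

Keeping 1% of the entries yields a 50× reduction rather than 100× in this sketch because each kept value is sent together with its index.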

Key Compression Methods

Gradient Compression:

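  • Top-K Sparsification: Send only the largest k% of gradient entries by magnitude, which cuts the values sent by about $100/k$× (somewhat less once indices are transmitted too)
  • Quantization: Reduce precision from 32-bit to 8-bit (4× smaller) or to 1-bit binary (32× smaller)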
  • Error Feedback: Accumulate compression error locally to maintain convergence
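Top-K sparsification and error feedback fit together naturally: whatever is not sent in one round is carried over and added to the next gradient. The sketch below is a minimal illustration with hypothetical names (`top_k_sparsify`, `ErrorFeedbackCompressor`), using plain Python lists in place of tensors.

```python
def top_k_sparsify(vec, k):
    """Keep the k largest-magnitude entries; return (sparse dict, dense residual)."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    sparse = {i: vec[i] for i in keep}
    residual = [0.0 if i in sparse else v for i, v in enumerate(vec)]
    return sparse, residual


class ErrorFeedbackCompressor:
    """Accumulates what compression dropped and re-adds it to the next gradient."""

    def __init__(self, dim, k):
        self.residual = [0.0] * dim
        self.k = k

    def compress(self, grad):
        corrected = [g + r for g, r in zip(grad, self.residual)]
        sparse, self.residual = top_k_sparsify(corrected, self.k)
        return sparse   # {index: value} pairs to transmit


compressor = ErrorFeedbackCompressor(dim=4, k=1)
first = compressor.compress([0.5, -2.0, 0.4, 0.1])
second = compressor.compress([0.5, -0.1, 0.4, 0.1])
print(first)    # {1: -2.0}
print(second)   # {0: 1.0}
```

In the second call, index 0 wins only because the 0.5 left unsent in the first round was added back; without the residual, small but persistent gradient components would never be transmitted.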

One-Shot Federated Learning:

  • Each client trains locally to convergence
  • Server aggregates once: $\theta_{\text{global}} = \sum_k \frac{n_k}{n} \theta_k$
  • No communication rounds — but lower model quality

Structured Updates:

  • Restrict updates to low-rank matrices or sparse structures
  • The update is compact by construction, so no separate lossy compression step is needed, at the cost of restricting which updates can be expressed

Privacy-Utility Trade-off

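Figure: Privacy vs Model Utility Trade-off. x-axis: privacy budget ε (log scale); y-axis: model accuracy (%). Plotted points: ε = 0.1, accuracy 72%; ε = 0.5, accuracy 81%; ε = 1.0, accuracy 87%; ε = 5.0, accuracy 91%; ε = 10, accuracy 93%. A no-DP baseline is shown for reference. Small ε is the high-privacy end of the curve and large ε is the high-utility end.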

Secure Aggregation

Definition: Secure Aggregation

Secure aggregation ensures the server learns only the aggregate $\sum_k \theta_k$ but not any individual $\theta_k$. This is achieved via cryptographic protocols in which clients add random masks that cancel out during aggregation.

Protocol Overview:

  1. Pairwise Masking: Each pair of clients $(i, j)$ shares a random mask $m_{ij} = -m_{ji}$ via Diffie-Hellman key exchange
  2. Summation: Each client sends $\theta_i + \sum_{j \ne i} m_{ij}$ to the server
  3. Cancellation: Server sums all masked updates: $\sum_i \theta_i + \sum_{i<j}(m_{ij} + m_{ji}) = \sum_i \theta_i$
  4. Privacy: Individual updates remain hidden even from the server
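The cancellation in step 3 can be demonstrated with a small simulation. This is a sketch under simplifying assumptions: updates are already quantized to integers, arithmetic is done modulo $2^{32}$ so that masks cancel exactly, and a seeded `random.Random` stands in for the pairwise secrets that real clients would derive via key exchange. No dropout handling is included, and all names are hypothetical.

```python
import random

MODULUS = 2 ** 32   # all arithmetic is modulo this value


def pairwise_masks(num_clients, dim, seed=0):
    """masks[i][j] is what client i adds for peer j; masks[i][j] = -masks[j][i] (mod M)."""
    rng = random.Random(seed)   # stand-in for secrets agreed via key exchange
    masks = [[[0] * dim for _ in range(num_clients)] for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            m = [rng.randrange(MODULUS) for _ in range(dim)]
            masks[i][j] = m
            masks[j][i] = [(-v) % MODULUS for v in m]
    return masks


def mask_update(i, update, masks):
    """Client i sends theta_i + sum over j != i of m_ij (mod M)."""
    masked = list(update)
    for j, m in enumerate(masks[i]):
        if j != i:
            masked = [(a + b) % MODULUS for a, b in zip(masked, m)]
    return masked


def aggregate(masked_updates):
    """Server sums the masked vectors; the pairwise masks cancel."""
    return [sum(col) % MODULUS for col in zip(*masked_updates)]


updates = [[3, 10], [5, 20], [7, 30]]   # made-up integer updates
masks = pairwise_masks(num_clients=3, dim=2)
masked = [mask_update(i, u, masks) for i, u in enumerate(updates)]

print(masked[0])           # looks random to the server
print(aggregate(masked))   # [15, 60], the true sum
```

Working modulo a fixed integer rather than with floats is what makes the cancellation exact; it also means real-valued updates have to be scaled and rounded before masking.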

Practical Considerations

  • Dropout handling: Use Shamir's secret sharing so aggregation still works if some clients drop out
  • Overhead: roughly 2-3× computation overhead; masked updates are the same size as plain ones, though key agreement and secret-share exchange add some extra messages
  • Combined with DP: Secure aggregation + differential privacy provides defense in depth
  • Used by: Google (Gboard), Apple (Siri), PySyft framework

Non-IID Data Challenges

Figure: Non-IID Data Distributions Across Clients. Left panel (IID, balanced distribution): clients C1 to C5 each hold 20%, and each client has a similar class distribution. Right panel (non-IID, skewed distribution): clients C1 to C5 each have different class proportions.

Mitigation Strategies:

  • FedProx: Add the proximal term $\frac{\mu}{2}\|\theta - \theta^t\|^2$ to the local objective, which keeps clients close to the global model
  • SCAFFOLD: Use control variates to correct client drift
  • Per-Layer Fine-Tuning: Only aggregate certain layers, freeze others
  • Data Augmentation: Synthetically balance data across clients
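Of the strategies above, FedProx is the simplest to show in code: the proximal term contributes an extra gradient $\mu(\theta - \theta^t)$ that pulls the local model back toward the global one. The sketch below assumes a scalar model $y \approx wx$ with squared-error loss; the function name, data, and hyperparameters are hypothetical.

```python
def fedprox_local_train(theta_global, data, mu=0.1, lr=0.05, epochs=5):
    """Local SGD on loss + (mu / 2) * (w - theta_global)^2."""
    w = theta_global
    for _ in range(epochs):
        for x, y in data:
            grad_loss = 2 * (w * x - y) * x          # gradient of squared error
            grad_prox = mu * (w - theta_global)      # gradient of the proximal term
            w -= lr * (grad_loss + grad_prox)
    return w


theta_global = 1.0
# A client whose local data (slope 5) disagrees strongly with the global model.
skewed_client = [(1.0, 5.0), (2.0, 10.0)]

w_plain = fedprox_local_train(theta_global, skewed_client, mu=0.0)   # plain local SGD
w_prox = fedprox_local_train(theta_global, skewed_client, mu=10.0)   # strong proximal pull

print(w_plain)   # drifts most of the way to the local optimum at 5
print(w_prox)    # stays noticeably closer to the global model
```

Setting `mu=0.0` recovers ordinary local training as used in FedAvg, so the proximal weight acts as a dial between local fit and agreement with the global model.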

Federated Learning at Scale

Production Considerations

System Heterogeneity:

  • Devices have different compute, memory, and network capabilities
  • Solution: Asynchronous updates, straggler tolerance, adaptive participation

Privacy Regulations:

  • GDPR, HIPAA require data minimization
  • Federated learning + DP supports data minimization, though compliance still depends on the rest of the system (consent, retention, access controls)
  • Audit trail via secure logging

Real-World Deployments:

  • Google Gboard: Next-word prediction trained across a large population of mobile devices
  • Apple Siri: Voice recognition improvement across devices
  • Healthcare: Multi-hospital collaboration for disease prediction (e.g., cancer, COVID-19)
  • Finance: Anti-money laundering across banks without sharing customer data

Framework Landscape:

  • PySyft: Python library for private ML (OpenMined)
  • TensorFlow Federated: Google's FL framework
  • FATE: Enterprise FL by WeBank
  • Flower: Framework for FL research and production

Key Takeaways

Summary: Federated Learning

  • Federated learning trains models without sharing data
  • FedAvg is the standard aggregation algorithm
  • Differential privacy provides formal privacy guarantees: $(\varepsilon, \delta)$-DP
  • Secure aggregation hides individual updates from the server
  • Non-IID data is the main technical challenge → use FedProx, SCAFFOLD
  • Communication efficiency via compression (Top-K, quantization) is critical
  • Privacy-utility trade-off: $\varepsilon$ controls the balance
  • Composition theorems track privacy budget across rounds
  • Used by Google, Apple, healthcare, finance for privacy compliance
