
Federated Learning — Training Models Without Sharing Data

Learn how federated learning enables collaborative model training while keeping data private and secure. Essential for healthcare, finance, and privacy-sensitive applications.

Federated Averaging — The core algorithm for distributed training
Privacy Preservation — Keeping data local while learning globally
Communication Efficiency — Reducing the cost of distributed learning

"The future of AI is decentralized and privacy-preserving."


Federated Learning Architecture

How It Works

Definition: Federated Learning

Federated learning is a distributed machine learning approach where models are trained across multiple decentralized devices or servers holding local data samples, without exchanging the raw data.

Centralized vs Federated

Centralized ML: Data → Central server → Model (privacy risk, regulatory burden)
Federated ML: Model → Devices → Updates → Server (data stays local)

Formal Problem Statement:

Given $K$ clients with local datasets $\mathcal{D}_k$ of size $n_k = |\mathcal{D}_k|$, and $n = \sum_{k=1}^{K} n_k$ samples in total, minimize:

$$\min_{\theta} F(\theta) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(\theta)$$

where $F_k(\theta) = \frac{1}{n_k} \sum_{i \in \mathcal{D}_k} \ell(x_i, y_i; \theta)$ is the local objective of client $k$.

The FedAvg Algorithm
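
As a rough illustration of how FedAvg optimizes the weighted objective $\sum_k \frac{n_k}{n} F_k(\theta)$, the sketch below runs a few rounds on a toy one-parameter least-squares problem. The helper names (`local_sgd`, `fedavg_round`, `make_client`), the learning rate, and the synthetic data are assumptions made for illustration and do not come from any framework. Real deployments usually sample only a subset of clients per round, which this sketch omits.

```python
import random

def local_sgd(theta, data, lr=0.05, epochs=5):
    """Run a few epochs of SGD on one client's data for the model y = theta * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (theta * x - y) * x  # derivative of (theta * x - y)^2
            theta -= lr * grad
    return theta

def fedavg_round(theta_global, client_datasets):
    """One round: every client trains locally, the server averages with weights n_k / n."""
    n = sum(len(d) for d in client_datasets)
    local_models = [local_sgd(theta_global, d) for d in client_datasets]
    return sum(len(d) / n * t for d, t in zip(client_datasets, local_models))

def make_client(rng, n_k, slope=3.0):
    """Hypothetical synthetic client: n_k noisy samples of y = slope * x."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_k)]
    return [(x, slope * x + rng.gauss(0.0, 0.01)) for x in xs]

rng = random.Random(0)
clients = [make_client(rng, n_k) for n_k in (20, 50, 100)]  # unequal n_k

theta = 0.0
for _ in range(20):
    theta = fedavg_round(theta, clients)  # raw data never leaves make_client's output
```

Only the scalar `theta` moves between the server and the clients; the `(x, y)` pairs are read only inside `local_sgd`.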

Differential Privacy in Federated Learning
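
The text here does not specify a mechanism, so the following is a sketch under an assumption: one common way to apply differential privacy in federated training is to clip each client update to a fixed L2 norm and add Gaussian noise to the sum before averaging. The names `clip_update`, `dp_average`, `clip_norm`, and `noise_multiplier` are hypothetical. Translating a noise level into a concrete $(\varepsilon, \delta)$ guarantee requires a privacy accountant, which is not shown.

```python
import math
import random

def clip_update(update, clip_norm):
    """Scale the update down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]

def dp_average(updates, clip_norm, noise_multiplier, rng):
    """Average clipped updates after adding Gaussian noise to their sum."""
    clipped = [clip_update(u, clip_norm) for u in updates]
    sigma = noise_multiplier * clip_norm  # noise scale tied to the clipping bound
    dim = len(updates[0])
    noisy_sum = [
        sum(c[j] for c in clipped) + rng.gauss(0.0, sigma)
        for j in range(dim)
    ]
    return [s / len(updates) for s in noisy_sum]

rng = random.Random(0)
updates = [[3.0, 4.0], [0.3, 0.4]]  # the first update exceeds the clipping bound
noisy = dp_average(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Clipping bounds how much any single client can change the sum, which is what lets the added noise mask an individual contribution.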

Communication Efficiency

Definition: Communication Bottleneck

In federated learning, communication is often the primary bottleneck. Each round requires sending the model parameters ($d$ dimensions) to $K$ clients and receiving their updates back. Compression techniques reduce this cost, usually at the expense of slower convergence or some loss in accuracy.

Key Compression Methods

Gradient Compression:

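Top-K Sparsification: Send only the largest-magnitude fraction $\rho$ of gradient entries (for example, the top 1%), which reduces the payload by roughly $1/\rho$ × before accounting for the indices that must also be sent
Quantization: Reduce precision from 32-bit floats to 8-bit values (about 4× smaller) or to 1-bit signs (up to 32× smaller)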
Error Feedback: Accumulate compression error locally to maintain convergence
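
The three techniques above can be sketched in a few lines. This is a minimal illustration, not a production codec: the names `top_k_sparsify`, `ErrorFeedbackCompressor`, `quantize_8bit`, and `dequantize_8bit` are hypothetical, and the quantizer is a simple uniform min-max scheme chosen for brevity.

```python
def top_k_sparsify(vec, k):
    """Keep the k largest-magnitude entries as an {index: value} mapping."""
    idx = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    return {i: vec[i] for i in idx}

class ErrorFeedbackCompressor:
    """Adds the entries dropped in earlier rounds back in before sparsifying."""

    def __init__(self, dim, k):
        self.residual = [0.0] * dim
        self.k = k

    def compress(self, grad):
        corrected = [g + r for g, r in zip(grad, self.residual)]
        sparse = top_k_sparsify(corrected, self.k)
        # Whatever was not sent is remembered locally for the next round.
        self.residual = [0.0 if i in sparse else c for i, c in enumerate(corrected)]
        return sparse

def quantize_8bit(vec):
    """Uniform min-max quantization of floats to integer codes in [0, 255]."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255
    if scale == 0:
        scale = 1.0  # all entries equal; any scale reproduces them
    codes = [round((v - lo) / scale) for v in vec]
    return codes, lo, scale

def dequantize_8bit(codes, lo, scale):
    return [lo + c * scale for c in codes]

compressor = ErrorFeedbackCompressor(dim=4, k=1)
first = compressor.compress([0.5, -2.0, 0.1, 1.0])   # sends only index 1
second = compressor.compress([0.5, 0.0, 0.1, 1.0])   # residual boosts index 3

codes, lo, scale = quantize_8bit([-1.0, 0.25, 1.0])
restored = dequantize_8bit(codes, lo, scale)
```

In the second call, index 3 is selected because its accumulated value (current gradient plus residual) is now the largest, which is the effect error feedback is meant to have.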

One-Shot Federated Learning:

Each client trains locally to convergence
Server aggregates once: $\theta_{\text{global}} = \sum_k \frac{n_k}{n} \theta_k$
Only a single communication round is needed, typically at the cost of lower model quality

Structured Updates:

Restrict updates to low-rank matrices or sparse structures
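Compact by construction: the update is learned directly in the restricted form, so no separate lossy compression step is needed, although the restriction limits which updates can be expressed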

Privacy-Utility Trade-off

Secure Aggregation

Definition: Secure Aggregation

Secure aggregation ensures the server learns only the aggregate $\sum_k \theta_k$ and not any individual $\theta_k$. This is achieved via cryptographic protocols in which clients add random masks that cancel out during aggregation.

Protocol Overview:

Pairwise Masking: Each pair of clients $(i, j)$ agrees on a shared secret via Diffie-Hellman key exchange and derives from it a random mask with $m_{ij} = -m_{ji}$
Summation: Each client $i$ sends the masked update $\theta_i + \sum_{j \neq i} m_{ij}$ to the server
Cancellation: Server sums all masked updates: $\sum_i \theta_i + \sum_{i<j}(m_{ij} + m_{ji}) = \sum_i \theta_i$
Privacy: Individual updates remain hidden even from the server
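
The cancellation in the protocol above can be checked with a small simulation. This sketch makes several simplifying assumptions: updates are integer vectors, arithmetic is modulo $2^{32}$, and the Diffie-Hellman exchange is replaced by a dictionary of pre-shared seeds (`shared_seeds`). The function names are hypothetical, and dropout handling is not modeled.

```python
import random

MOD = 2 ** 32  # masks and sums live in the integers modulo 2^32

def pairwise_mask(seed, dim):
    """Expand a shared seed into a mask vector; both clients get the same values."""
    rng = random.Random(seed)
    return [rng.randrange(MOD) for _ in range(dim)]

def masked_update(i, update, shared_seeds):
    """Client i adds +mask for partners j > i and -mask for j < i, so m_ij = -m_ji."""
    masked = list(update)
    for (a, b), seed in shared_seeds.items():  # keys satisfy a < b
        if i not in (a, b):
            continue
        sign = 1 if i == a else -1
        mask = pairwise_mask(seed, len(update))
        masked = [(v + sign * m) % MOD for v, m in zip(masked, mask)]
    return masked

updates = {0: [5, 1, 7], 1: [2, 2, 2], 2: [10, 0, 3]}

# Stand-in for the key exchange: one secret seed per unordered pair of clients.
seed_rng = random.Random(42)
shared_seeds = {
    (a, b): seed_rng.getrandbits(64)
    for a in updates for b in updates if a < b
}

masked = {i: masked_update(i, u, shared_seeds) for i, u in updates.items()}

# The server only ever sees the masked vectors; their sum equals the true sum.
aggregate = [sum(col) % MOD for col in zip(*masked.values())]
```

Each masked vector looks random on its own, but every mask appears once with each sign, so the masks vanish from `aggregate`.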

Practical Considerations

Dropout handling: Use Shamir's secret sharing so aggregation still works if some clients drop out
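Overhead: Adds computation on clients and server, plus extra communication for key exchange and secret shares; the exact cost depends on the protocol and the number of clients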
Combined with DP: Secure aggregation + differential privacy provides defense in depth
Used by: Google (Gboard), Apple (Siri), PySyft framework

Non-IID Data Challenges

Mitigation Strategies:

FedProx: Add proximal term $\frac{\mu}{2}\|\theta - \theta^t\|^2$ to local objective — keeps clients close to global model
SCAFFOLD: Use control variates to correct client drift
Per-Layer Fine-Tuning: Aggregate only the shared layers and keep the remaining (personalized) layers local to each client
Data Augmentation: Synthetically balance data across clients
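
To make the FedProx item above concrete, the sketch below adds the gradient of the proximal term, $\mu(\theta - \theta^t)$, to each local SGD step on a toy one-parameter model. The function name `fedprox_local_sgd`, the hyperparameters, and the data are assumptions for illustration only.

```python
def fedprox_local_sgd(theta_global, data, mu=0.1, lr=0.05, epochs=5):
    """Local SGD on y = theta * x with the FedProx proximal term added.

    The local objective is loss + (mu / 2) * (theta - theta_global)^2,
    so each step also pulls theta back toward the global model.
    """
    theta = theta_global
    for _ in range(epochs):
        for x, y in data:
            grad_loss = 2.0 * (theta * x - y) * x
            grad_prox = mu * (theta - theta_global)
            theta -= lr * (grad_loss + grad_prox)
    return theta

# A client whose local optimum (theta = 5) is far from the global model (theta = 0).
skewed_data = [(1.0, 5.0)] * 20

plain = fedprox_local_sgd(0.0, skewed_data, mu=0.0)  # mu = 0 is plain local SGD
prox = fedprox_local_sgd(0.0, skewed_data, mu=5.0)   # strong pull toward global
```

With `mu=0.0` the client drifts all the way to its own optimum; a larger `mu` keeps the local model closer to the global one, which is the mechanism FedProx uses to limit client drift on non-IID data.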

Federated Learning at Scale

Production Considerations

System Heterogeneity:

Devices have different compute, memory, and network capabilities
Solution: Asynchronous updates, straggler tolerance, adaptive participation

Privacy Regulations:

GDPR, HIPAA require data minimization
Federated learning combined with DP supports data-minimization requirements, but it does not by itself guarantee regulatory compliance
Audit trail via secure logging

Real-World Deployments:

Google Gboard: Next-word prediction trained across a large population of mobile devices
Apple Siri: Voice recognition improvement across devices
Healthcare: Multi-hospital collaboration for disease prediction (e.g., cancer, COVID-19)
Finance: Anti-money laundering across banks without sharing customer data

Framework Landscape:

PySyft: Python library for private ML (OpenMined)
TensorFlow Federated: Google's FL framework
FATE: Enterprise FL by WeBank
Flower: Framework for FL research and production

Key Takeaways

Summary: Federated Learning

Federated learning trains models without sharing data
FedAvg is the standard aggregation algorithm
Differential privacy provides formal privacy guarantees: $(\varepsilon, \delta)$-DP
Secure aggregation hides individual updates from the server
Non-IID data is the main technical challenge → use FedProx, SCAFFOLD
Communication efficiency via compression (Top-K, quantization) is critical
Privacy-utility trade-off: $\varepsilon$ controls the balance
Composition theorems track privacy budget across rounds
Used by Google, Apple, healthcare, finance for privacy compliance

