Federated Learning — Training Models Without Sharing Data
Learn how federated learning enables collaborative model training while keeping data private and secure. Essential for healthcare, finance, and privacy-sensitive applications.
- Federated Averaging — The core algorithm for distributed training
- Privacy Preservation — Keeping data local while learning globally
- Communication Efficiency — Reducing the cost of distributed learning
"The future of AI is decentralized and privacy-preserving."
Federated Learning — Complete Guide
Federated learning trains models across decentralized devices without centralizing data. Essential for privacy-sensitive applications.
Federated Learning Architecture
How It Works
Definition: Federated Learning
Federated learning is a distributed machine learning approach where models are trained across multiple decentralized devices or servers holding local data samples, without exchanging the raw data.
Centralized vs Federated
- Centralized ML: Data → Central server → Model (privacy risk, regulatory burden)
- Federated ML: Model → Devices → Updates → Server (data stays local)
Formal Problem Statement:
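Given $K$ clients with local datasets $\mathcal{D}_1, \dots, \mathcal{D}_K$, where $n_k = |\mathcal{D}_k|$ and $n = \sum_{k=1}^{K} n_k$, minimize:
$$\min_{w} F(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w)$$
where $F_k(w) = \frac{1}{n_k} \sum_{i \in \mathcal{D}_k} \ell(w; x_i, y_i)$ is the local objective.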
The FedAvg Algorithm
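The FedAvg loop can be sketched in pure Python. In each round the server broadcasts the global model, every client runs a few local training steps, and the server replaces the global model with the average of the returned models, weighted by each client's share of the training samples. This is a minimal illustration under assumptions this guide does not state: a linear model with squared-error loss, full-batch gradient descent as the local solver, and every client participating in every round. Real deployments usually sample a subset of clients per round. The helper names `local_train` and `fed_avg` are hypothetical.

```python
import random
from typing import List, Tuple

Vector = List[float]
Dataset = List[Tuple[Vector, float]]  # (features, target) pairs held by one client


def local_train(w: Vector, data: Dataset, lr: float, epochs: int) -> Vector:
    """Hypothetical client step: full-batch gradient descent on mean squared error."""
    w = list(w)  # copy so the broadcast global model is not modified in place
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def fed_avg(w_global: Vector, clients: List[Dataset], rounds: int,
            lr: float = 0.1, local_epochs: int = 5) -> Vector:
    """Broadcast, train locally, then average the local models weighted by n_k / n."""
    n_total = sum(len(d) for d in clients)
    for _ in range(rounds):
        local_models = [local_train(w_global, d, lr, local_epochs) for d in clients]
        w_global = [
            sum(len(d) / n_total * w_k[j] for d, w_k in zip(clients, local_models))
            for j in range(len(w_global))
        ]
    return w_global


# Toy run: three clients of different sizes, all drawn from y = 2*x0 - 1*x1 (no noise).
random.seed(0)
true_w = [2.0, -1.0]
clients = []
for size in (10, 20, 30):
    points = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(size)]
    clients.append([(x, true_w[0] * x[0] + true_w[1] * x[1]) for x in points])

w = fed_avg([0.0, 0.0], clients, rounds=100)
print([round(v, 3) for v in w])  # close to the true weights [2.0, -1.0]
```

The toy data is noiseless and identically distributed, so every client's local optimum coincides and averaging recovers the true weights. Under non-IID data those local optima differ, which is where the client drift discussed later comes from.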
Differential Privacy in Federated Learning
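This guide does not spell out a specific mechanism. One common approach, assumed here in the style of DP-FedAvg, bounds each client's influence and then adds calibrated noise. Each client update is clipped to an L2 norm of at most `clip_norm`, and Gaussian noise with standard deviation `noise_multiplier * clip_norm` is added to the sum before averaging. The sketch shows only that aggregation step. The function and parameter names are illustrative. Converting a noise multiplier and a number of rounds into a concrete $(\epsilon, \delta)$ guarantee requires a privacy accountant, which is not shown.

```python
import math
import random
from typing import List

Vector = List[float]


def clip_update(update: Vector, clip_norm: float) -> Vector:
    """Scale an update down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    if norm <= clip_norm:
        return list(update)
    return [u * clip_norm / norm for u in update]


def dp_average(updates: List[Vector], clip_norm: float,
               noise_multiplier: float, rng: random.Random) -> Vector:
    """Clip each client update, add Gaussian noise to the sum, then average."""
    clipped = [clip_update(u, clip_norm) for u in updates]
    # Adding or removing one client changes the clipped sum by at most clip_norm
    # (its L2 sensitivity), so the noise scale is a multiple of clip_norm.
    sigma = noise_multiplier * clip_norm
    dim = len(updates[0])
    noisy_sum = [sum(c[j] for c in clipped) + rng.gauss(0.0, sigma) for j in range(dim)]
    return [s / len(updates) for s in noisy_sum]


updates = [[3.0, 4.0], [0.3, 0.4]]             # the first update has norm 5 and is clipped
print(clip_update(updates[0], clip_norm=1.0))  # approximately [0.6, 0.8]
rng = random.Random(0)
print(dp_average(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

A larger `noise_multiplier` gives stronger privacy at the cost of noisier averages. This is the privacy-utility trade-off that the privacy budget controls.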
Communication Efficiency
Definition: Communication Bottleneck
In federated learning, communication is often the primary bottleneck. Each round requires sending the model parameters ($d$ dimensions) to the participating clients and receiving their updates back. Compression techniques reduce this cost, usually at the expense of slower convergence.
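A quick back-of-the-envelope calculation shows why this cost dominates. The sketch assumes a dense float32 model (4 bytes per parameter) that each participating client downloads once and uploads once per round. The model size and client count are hypothetical numbers chosen only for illustration.

```python
def round_traffic_bytes(num_params: int, num_clients: int, bytes_per_param: int = 4) -> int:
    """Bytes moved in one round if each client downloads and uploads a dense model."""
    per_client = 2 * num_params * bytes_per_param  # model down + update up
    return per_client * num_clients


# Hypothetical sizes: a 10M-parameter float32 model and 100 clients per round.
total = round_traffic_bytes(num_params=10_000_000, num_clients=100)
print(total / 1e9, "GB per round")  # 8.0 GB per round
```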
Key Compression Methods
Gradient Compression:
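- Top-K Sparsification: Only send the largest K% of gradient entries (by magnitude), reducing communication by roughly $100/K\times$ before counting the indices that must also be sent
- Quantization: Reduce precision from 32-bit to 8-bit (4× smaller) or to 1-bit (up to 32× smaller)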
- Error Feedback: Accumulate compression error locally to maintain convergence
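The three compression methods above can be sketched in plain Python. The sketch assumes a per-client compressor that applies top-K to the gradient plus the residual left over from earlier rounds, and a simple uniform min/max 8-bit quantizer. Real systems use many variants of both. All class and function names here are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def top_k_sparsify(vec: Vector, k_percent: float) -> Dict[int, float]:
    """Keep the largest-magnitude k_percent of entries, returned as {index: value}."""
    k = max(1, math.ceil(len(vec) * k_percent / 100))
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    return {i: vec[i] for i in keep}


class ErrorFeedbackCompressor:
    """Client-side top-K compressor that carries dropped values into the next round."""

    def __init__(self, dim: int, k_percent: float):
        self.residual = [0.0] * dim
        self.k_percent = k_percent

    def compress(self, grad: Vector) -> Dict[int, float]:
        corrected = [g + r for g, r in zip(grad, self.residual)]
        sent = top_k_sparsify(corrected, self.k_percent)
        # Entries that were not sent are kept locally and added to the next gradient.
        self.residual = [0.0 if i in sent else c for i, c in enumerate(corrected)]
        return sent


def quantize_8bit(vec: Vector) -> Tuple[List[int], float, float]:
    """Uniform 8-bit quantization: map [min, max] onto the integers 0..255."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # constant vectors would otherwise divide by zero
    return [round((v - lo) / scale) for v in vec], lo, scale


def dequantize_8bit(q: List[int], lo: float, scale: float) -> Vector:
    """Map 8-bit codes back to approximate float values."""
    return [lo + qi * scale for qi in q]


comp = ErrorFeedbackCompressor(dim=4, k_percent=25)
print(comp.compress([0.1, -2.0, 0.3, 0.05]))  # {1: -2.0}
print(comp.residual)                          # [0.1, 0.0, 0.3, 0.05]
```

The residual ensures that small entries are delayed rather than discarded. The quantizer sends one byte per value plus two floats (`lo`, `scale`) per vector, and its reconstruction error is at most half a quantization step.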
One-Shot Federated Learning:
- Each client trains locally to convergence
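- Server aggregates once: $w = \sum_{k=1}^{K} \frac{n_k}{n} w_k$, where $w_k$ is client $k$'s locally trained model, $n_k$ its number of samples, and $n = \sum_k n_k$
- Only a single communication round, but model quality is typically lower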
Structured Updates:
- Restrict updates to low-rank matrices or sparse structures
- Updates are trained directly in this compact form, so they can be sent as-is without a separate lossy compression step
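As an illustration of the low-rank case, assume the client learns its update to a weight matrix directly as a product $A B$ of two thin factors. It then uploads only $A$ and $B$, and the server rebuilds the full update. The sketch counts the values sent in each case and shows the reconstruction. The helper names are hypothetical, and the rank is chosen for illustration only.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product; the server uses it to rebuild the update A @ B."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)] for row in a]


def upload_sizes(rows: int, cols: int, rank: int) -> Tuple[int, int]:
    """Values sent for a dense rows x cols update vs. its factors A (rows x r), B (r x cols)."""
    return rows * cols, rank * (rows + cols)


full, factored = upload_sizes(1000, 1000, rank=10)
print(full, factored, full // factored)  # 1000000 20000 50

a = [[1.0], [2.0]]   # rank-1 factor A (2 x 1)
b = [[3.0, 4.0]]     # rank-1 factor B (1 x 2)
print(matmul(a, b))  # [[3.0, 4.0], [6.0, 8.0]]
```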
Privacy-Utility Trade-off
Secure Aggregation
Definition: Secure Aggregation
Secure aggregation ensures the server learns only the aggregate $\sum_i \Delta w_i$ but not any individual client update $\Delta w_i$. This is achieved via cryptographic protocols in which clients add random masks that cancel out during aggregation.
Protocol Overview:
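- Pairwise Masking: Each pair of clients $i < j$ agrees on a shared random seed via Diffie-Hellman key exchange and expands it into a mask vector $s_{ij}$
- Masked Upload: Each client $i$ sends $y_i = \Delta w_i + \sum_{j > i} s_{ij} - \sum_{j < i} s_{ji}$ to the server, where $\Delta w_i$ is its model update
- Cancellation: The server sums all masked updates; every mask appears once with each sign, so $\sum_i y_i = \sum_i \Delta w_i$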
- Privacy: Individual updates remain hidden even from the server
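The masking and cancellation steps can be simulated in a few lines. Client $i$ adds the mask it shares with every higher-numbered client and subtracts the mask it shares with every lower-numbered one. This is a toy sketch under loud assumptions. Pairwise seeds are drawn from a shared local RNG instead of Diffie-Hellman. `random.Random` stands in for a cryptographic PRG, which makes it insecure. Updates are assumed to be already integer-encoded (for example, via fixed-point) so that modular arithmetic makes the masks cancel exactly. There is no dropout handling. Function names are hypothetical.

```python
import random
from itertools import combinations
from typing import Dict, List, Tuple

MOD = 2 ** 32  # work modulo a fixed integer so the masks cancel exactly


def pair_mask(seed: int, dim: int) -> List[int]:
    """Expand a pairwise seed into a mask vector (stand-in PRG, NOT cryptographically secure)."""
    prg = random.Random(seed)
    return [prg.randrange(MOD) for _ in range(dim)]


def mask_update(i: int, update: List[int], seeds: Dict[Tuple[int, int], int],
                clients: List[int]) -> List[int]:
    """y_i = x_i + sum_{j>i} s_ij - sum_{j<i} s_ji  (mod MOD)."""
    y = list(update)
    for j in clients:
        if j == i:
            continue
        mask = pair_mask(seeds[(min(i, j), max(i, j))], len(update))
        sign = 1 if j > i else -1
        y = [(yk + sign * mk) % MOD for yk, mk in zip(y, mask)]
    return y


clients = [0, 1, 2]
# Hypothetical setup: a real protocol derives each pair's seed with Diffie-Hellman.
setup = random.Random(42)
seeds = {pair: setup.randrange(2 ** 64) for pair in combinations(clients, 2)}
updates = {0: [5, 1, 0], 1: [2, 2, 2], 2: [0, 7, 3]}  # integer-encoded client updates

masked = [mask_update(i, updates[i], seeds, clients) for i in clients]
aggregate = [sum(col) % MOD for col in zip(*masked)]
print(aggregate)  # [7, 10, 5]
```

Each `masked[i]` looks uniformly random on its own, yet the sum equals the sum of the raw updates.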
Practical Considerations
- Dropout handling: Use Shamir's secret sharing so aggregation still works if some clients drop out
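- Overhead: Each masked update is the same size as the plain update, but the key exchange and secret-sharing steps add extra messages and computation that grow with the number of clients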
- Combined with DP: Secure aggregation + differential privacy provides defense in depth
- Used by: Google (Gboard), Apple (Siri), PySyft framework
Non-IID Data Challenges
Mitigation Strategies:
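- FedProx: Add a proximal term $\frac{\mu}{2}\lVert w - w^t \rVert^2$ to each local objective, where $w^t$ is the current global model; this keeps client models close to the global model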
- SCAFFOLD: Use control variates to correct client drift
- Per-Layer Fine-Tuning: Aggregate only certain shared layers and fine-tune the remaining layers locally on each client
- Data Augmentation: Synthetically balance data across clients
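FedProx's proximal term changes only the local update rule. The gradient of $F_k(w) + \frac{\mu}{2}\lVert w - w^t \rVert^2$ is $\nabla F_k(w) + \mu (w - w^t)$. The sketch runs plain gradient descent on that objective. It assumes a toy quadratic client loss whose optimum lies far from the global model. The function names and numbers are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def fedprox_local_update(w_global: Vector, grad_fn: Callable[[Vector], Vector],
                         mu: float, lr: float, steps: int) -> Vector:
    """Gradient descent on F_k(w) + (mu / 2) * ||w - w_global||^2."""
    w = list(w_global)
    for _ in range(steps):
        g = grad_fn(w)  # gradient of the client's own objective F_k
        # The proximal term contributes mu * (w - w_global), pulling w toward the global model.
        w = [wi - lr * (gi + mu * (wi - wgi)) for wi, gi, wgi in zip(w, g, w_global)]
    return w


client_opt = [4.0]  # hypothetical client whose local optimum is far from the global model


def client_grad(w: Vector) -> Vector:
    """Gradient of the toy local loss F_k(w) = 0.5 * ||w - client_opt||^2."""
    return [wi - ci for wi, ci in zip(w, client_opt)]


for mu in (0.0, 1.0):
    w = fedprox_local_update([0.0], client_grad, mu=mu, lr=0.1, steps=200)
    print(mu, round(w[0], 3))  # mu=0.0 -> 4.0 (full drift); mu=1.0 -> 2.0, i.e. 4 / (1 + mu)
```

With `mu=0` the client drifts all the way to its own optimum. A positive `mu` stops it partway, which limits how far heterogeneous clients pull the average apart.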
Federated Learning at Scale
Production Considerations
System Heterogeneity:
- Devices have different compute, memory, and network capabilities
- Solution: Asynchronous updates, straggler tolerance, adaptive participation
Privacy Regulations:
- GDPR, HIPAA require data minimization
- Federated learning + DP supports data-minimization requirements, though it does not guarantee compliance on its own
- Audit trail via secure logging
Real-World Deployments:
- Google Gboard: Next-word prediction trained on billions of devices
- Apple Siri: Voice recognition improvement across devices
- Healthcare: Multi-hospital collaboration for disease prediction (e.g., cancer, COVID-19)
- Finance: Anti-money laundering across banks without sharing customer data
Framework Landscape:
- PySyft: Python library for private ML (OpenMined)
- TensorFlow Federated: Google's FL framework
- FATE: Enterprise FL by WeBank
- Flower: Framework for FL research and production
Key Takeaways
Summary: Federated Learning
- Federated learning trains models without sharing data
- FedAvg is the standard aggregation algorithm
- Differential privacy provides formal privacy guarantees: $(\epsilon, \delta)$-DP
- Secure aggregation hides individual updates from the server
- Non-IID data is the main technical challenge → use FedProx, SCAFFOLD
- Communication efficiency via compression (Top-K, quantization) is critical
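- Privacy-utility trade-off: the privacy budget $\epsilon$ controls the balance (smaller $\epsilon$ means stronger privacy but typically lower accuracy)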
- Composition theorems track privacy budget across rounds
- Used by Google, Apple, healthcare, finance for privacy compliance
What to Learn Next
-> ML Ethics — Fairness, Bias, Interpretability and Responsible AI
-> MLOps — Machine Learning Operations Complete Guide
-> ML System Design — Architecture and Production Patterns
-> Model Evaluation — Metrics, Cross-Validation and Selection
-> Model Deployment — APIs, Containers and Production ML
-> Causal Inference — Moving Beyond Correlation