Design AI Content Moderation System


Building multi-modal content moderation for billions of posts with high accuracy

Interview Question

"Design an AI content moderation system like Facebook or YouTube that can analyze text, images, and videos in real-time, detect policy violations with high accuracy, and handle millions of content submissions daily while maintaining low latency."

Difficulty: Hard | Frequently asked at Meta, Google/YouTube, TikTok, Twitter


1. Requirements Gathering

Functional Requirements

  1. Multi-modal Analysis: Analyze text, images, and videos
  2. Real-time Detection: Classify content in real-time as it's uploaded
  3. Policy Compliance: Enforce complex community guidelines
  4. Human Review: Escalate uncertain cases to human moderators
  5. Appeals Process: Allow users to appeal moderation decisions
  6. Explainability: Provide reasons for moderation decisions
  7. Continuous Learning: Adapt to new types of violations

Non-Functional Requirements

  1. Latency: < 500ms for text, < 2s for images, < 10s for videos
  2. Throughput: 10M+ content items/day (about 116 items/second on average), with peaks of 1,000+ items/second
  3. Recall: > 95% for high-severity violations
  4. Precision: > 99% (minimize false positives)
  5. Availability: 99.99% uptime
  6. Scale: 2B+ users, 500M+ daily posts
  7. Multilingual: Support 100+ languages

ℹ️

Scale Perspective: Facebook serves over 2B daily active users, who create 500M+ posts per day, and YouTube receives 500+ hours of video per minute. Content moderation must scale to this volume while maintaining accuracy and low latency.
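The throughput and scale figures in the requirements differ by a factor of 50 (10M items/day versus 500M posts/day), so it is worth converting daily volumes into per-second rates before sizing queues and model fleets. A back-of-the-envelope sketch; the 3x peak-to-average factor is a hypothetical assumption, not a number from the requirements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def avg_rate_per_second(items_per_day: float) -> float:
    """Average arrival rate implied by a daily volume."""
    return items_per_day / SECONDS_PER_DAY


def peak_rate_per_second(items_per_day: float, peak_factor: float = 3.0) -> float:
    """Peak rate under an assumed (hypothetical) peak-to-average ratio."""
    return avg_rate_per_second(items_per_day) * peak_factor


def video_hours_per_day(hours_per_minute: float) -> float:
    """Convert an upload rate in hours per minute to hours per day."""
    return hours_per_minute * 60 * 24


if __name__ == "__main__":
    for label, volume in [("10M items/day", 10e6), ("500M posts/day", 500e6)]:
        print(f"{label}: ~{avg_rate_per_second(volume):,.0f}/s average, "
              f"~{peak_rate_per_second(volume):,.0f}/s peak at an assumed 3x")
    print(f"500 h/min of video = {video_hours_per_day(500):,.0f} h/day")
```

At 500M posts/day the average rate alone is roughly 5,800 items/s, well above the 1,000 items/s peak listed under throughput, so it helps to confirm with the interviewer which volume the design must handle.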


2. High-Level Architecture Overview

Architecture Diagram
┌─────────────────────────────────────────────────────────────────────────────┐
│                         CONTENT UPLOAD                                      │
│  Mobile Apps │ Web Clients │ APIs │ Live Streaming │ Stories                │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         CONTENT INGESTION                                   │
│  Message Queue │ Content Storage │ Metadata Extraction │ Pre-processing     │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
┌────────────────────────┐ ┌───────────────┐ ┌──────────────────────┐
│  TEXT ANALYSIS         │ │ IMAGE ANALYSIS│ │ VIDEO ANALYSIS       │
│  (NLP Models)          │ │ (CV Models)   │ │ (3D CNN + NLP)       │
│  (< 100ms)             │ │ (< 500ms)     │ │ (< 5s)               │
└────────────────────────┘ └───────────────┘ └──────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        DECISION ENGINE                                      │
│  Score Aggregation │ Policy Rules │ Confidence Threshold │ Action Selection │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
┌────────────────────────┐ ┌───────────────┐ ┌──────────────────────┐
│  AUTO-APPROVE          │ │ HUMAN REVIEW  │ │ AUTO-REMOVE          │
│  (High confidence safe)│ │ (Uncertain)   │ │ (High confidence bad)│
└────────────────────────┘ └───────────────┘ └──────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        FEEDBACK LOOP                                        │
│  Moderator Decisions │ User Appeals │ Policy Updates │ Model Retraining     │
└─────────────────────────────────────────────────────────────────────────────┘

💡

Key Insight: Content moderation requires multi-modal analysis. Text, images, and videos each need specialized models. The decision engine must combine signals from all modalities and apply complex policy rules.
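A minimal sketch of the decision engine's last step, assuming each modality has already produced a violation probability per policy. The `route` function and both thresholds are hypothetical placeholders; in practice each policy would get its own thresholds, tuned against the precision and recall targets.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Thresholds:
    remove: float = 0.95   # hypothetical: max score at or above this -> auto-remove
    approve: float = 0.10  # hypothetical: max score below this -> auto-approve


def route(scores_by_modality: Dict[str, Dict[str, float]],
          thresholds: Thresholds = Thresholds()) -> str:
    """Map per-modality, per-policy violation scores to one of three actions.

    scores_by_modality: {modality: {policy: probability of violation}}
    Aggregation is a plain max over every modality and policy.
    """
    worst = max(
        (p for scores in scores_by_modality.values() for p in scores.values()),
        default=0.0,
    )
    if worst >= thresholds.remove:
        return "auto_remove"
    if worst < thresholds.approve:
        return "auto_approve"
    return "human_review"  # the uncertain middle band


if __name__ == "__main__":
    print(route({"text": {"hate_speech": 0.02}, "image": {"nsfw": 0.05}}))  # auto_approve
    print(route({"text": {"hate_speech": 0.40}}))                            # human_review
    print(route({"image": {"violence": 0.98}}))                              # auto_remove
```

Taking the max is conservative: one strongly flagged modality is enough to act. It cannot, however, detect harm that only appears when a benign caption meets a benign image, which is why the combined signal from a fusion model matters.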


3. Data Pipeline Design

3.1 Content Data Model

from dataclasses import dataclass
from typing import List, Dict, Optional
from datetime import datetime

@dataclass
class ContentItem:
    content_id: str
    user_id: str
    content_type: str  # text, image, video, live
    timestamp: datetime
    
    # Text content
    text: Optional[str]
    
    # Image content
    image_url: Optional[str]
    image_hashes: Optional[List[str]]
    
    # Video content
    video_url: Optional[str]
    video_duration: Optional[float]
    thumbnail_url: Optional[str]
    
    # Metadata
    language: str
    device_type: str
    location: Optional[Dict]
    
    # Moderation status
    moderation_status: str  # pending, approved, rejected, under_review
    confidence_score: Optional[float]
    violation_types: Optional[List[str]]

@dataclass
class ModerationDecision:
    content_id: str
    decision: str  # approve, reject, escalate
    confidence: float
    violation_types: List[str]
    explanation: str
    moderator_id: Optional[str]
    timestamp: datetime
    appeal_status: Optional[str]

3.2 Multi-Modal Feature Extraction

import asyncio
from typing import Dict

class MultiModalFeatureExtractor:
    def __init__(self):
        self.text_analyzer = TextAnalyzer()
        self.image_analyzer = ImageAnalyzer()
        self.video_analyzer = VideoAnalyzer()  # defined elsewhere; same extract() interface
    
    async def extract_features(self, content: ContentItem) -> Dict:
        # One extraction coroutine per modality present in the item
        tasks = {}
        if content.text:
            tasks['text'] = self.text_analyzer.extract(content.text)
        if content.image_url:
            tasks['image'] = self.image_analyzer.extract(content.image_url)
        if content.video_url:
            tasks['video'] = self.video_analyzer.extract(content.video_url)
        
        # Modalities are independent, so run them concurrently:
        # latency is set by the slowest modality rather than the sum of all of them
        results = await asyncio.gather(*tasks.values())
        features = dict(zip(tasks.keys(), results))
        
        # Cross-modal features (compute_cross_modal_match is defined elsewhere)
        if 'text' in features and 'image' in features:
            features['text_image_match'] = self.compute_cross_modal_match(
                features['text'], features['image']
            )
        
        return features

class TextAnalyzer:
    # The predict_*/detect_*/... methods wrap model calls and are defined elsewhere
    async def extract(self, text: str) -> Dict:
        keys = ['toxicity_score', 'hate_speech_score', 'spam_score',
                'language', 'sentiment', 'entities', 'topic']
        # The model calls are independent, so run them concurrently
        values = await asyncio.gather(
            self.predict_toxicity(text),
            self.predict_hate_speech(text),
            self.predict_spam(text),
            self.detect_language(text),
            self.analyze_sentiment(text),
            self.extract_entities(text),
            self.classify_topic(text),
        )
        return dict(zip(keys, values))

class ImageAnalyzer:
    async def extract(self, image_url: str) -> Dict:
        # Download and decode once, then share the image across all models
        image = await self.load_image(image_url)
        keys = ['nsfw_score', 'violence_score', 'gore_score', 'face_count',
                'ocr_text', 'objects', 'scene']
        values = await asyncio.gather(
            self.predict_nsfw(image),
            self.predict_violence(image),
            self.predict_gore(image),
            self.detect_faces(image),
            self.extract_text_from_image(image),
            self.detect_objects(image),
            self.classify_scene(image),
        )
        return dict(zip(keys, values))

⚠️

Critical Design Considerations:

  1. Multi-modal fusion: Text and images can be more harmful together than either is alone
  2. Context: Same image can be benign or harmful depending on context
  3. Adversarial attacks: Users try to evade detection with subtle modifications
  4. Cultural sensitivity: Different regions have different standards

4. Model Selection and Training

4.1 Multi-Modal Architecture

import asyncio
from typing import Dict

class ContentModerationModel:
    def __init__(self):
        # Per-modality classifiers and the fusion model are defined elsewhere
        self.text_model = TextClassificationModel()
        self.image_model = ImageClassificationModel()
        self.video_model = VideoClassificationModel()
        self.fusion_model = MultiModalFusionModel()
    
    async def predict(self, content: ContentItem) -> Dict:
        tasks = {}
        if content.text:
            tasks['text'] = self.text_model.predict(content.text)
        if content.image_url:
            tasks['image'] = self.image_model.predict(content.image_url)
        if content.video_url:
            tasks['video'] = self.video_model.predict(content.video_url)
        
        if not tasks:
            # Fail loudly instead of silently returning nothing for an empty item
            raise ValueError(f"content {content.content_id} has no text, image, or video")
        
        # Run the per-modality models concurrently
        results = await asyncio.gather(*tasks.values())
        predictions = dict(zip(tasks.keys(), results))
        
        # Multi-modal fusion when more than one modality is present
        if len(predictions) > 1:
            return await self.fusion_model.fuse(predictions)
        
        return next(iter(predictions.values()))
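`MultiModalFusionModel.fuse` is left abstract above. One simple, untrained baseline is per-policy late fusion with a noisy-OR, which treats each modality's score as independent evidence for the same violation. This is a swapped-in heuristic for illustration, not a learned fusion model, and the input shape (modality to {policy: probability}) is an assumption.

```python
from typing import Dict


def noisy_or_fuse(predictions: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Late fusion of per-modality violation probabilities, policy by policy.

    fused = 1 - prod_m (1 - p_m): the item counts as violating unless every
    modality independently "misses" it. Absent modalities are simply skipped.
    """
    fused: Dict[str, float] = {}
    for scores in predictions.values():
        for policy, p in scores.items():
            miss_so_far = 1.0 - fused.get(policy, 0.0)
            fused[policy] = 1.0 - miss_so_far * (1.0 - p)
    return fused


if __name__ == "__main__":
    preds = {
        "text": {"hate_speech": 0.5},
        "image": {"hate_speech": 0.5, "nsfw": 0.2},
    }
    print(noisy_or_fuse(preds))  # hate_speech rises to 0.75; nsfw stays at about 0.2
```

Noisy-OR can only raise a score when modalities agree; it still misses content that is harmful only in combination, which is the main argument for training a fusion model on jointly labeled examples.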

4.2 Training Strategy

import heapq

class ModerationTrainer:
    def __init__(self, model):
        # `model` exposes fit() and predict(); mine_hard_negatives() and
        # compute_uncertainty() are helpers defined elsewhere
        self.model = model
    
    def train_with_hard_examples(self, train_data, hard_examples=(), oversample=3):
        """Train, then re-train with extra weight on hard examples"""
        # Standard training
        self.model.fit(train_data)
        
        # Hard negative mining: benign items the first-pass model scores as violations
        hard_negatives = self.mine_hard_negatives(train_data)
        
        # Re-train with supplied and mined hard examples repeated `oversample` times
        hard_set = list(hard_examples) + list(hard_negatives)
        combined_data = list(train_data) + hard_set * oversample
        self.model.fit(combined_data)
    
    def active_learning(self, unlabeled_data, budget=100):
        """Select the most informative (most uncertain) examples for labeling"""
        # nlargest avoids sorting the whole pool when budget << len(unlabeled_data)
        return heapq.nlargest(
            budget,
            unlabeled_data,
            key=lambda item: self.compute_uncertainty(self.model.predict(item)),
        )
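`compute_uncertainty` is not defined above. For a binary violation classifier that returns a single probability p (an assumption), two common choices are prediction entropy and distance from the 0.5 decision boundary. A minimal sketch of both, plus a hypothetical `select_for_labeling` helper that applies uncertainty sampling to a pool of scored items:

```python
import math
from typing import Dict, List


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) prediction: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def margin_uncertainty(p: float) -> float:
    """1.0 when p sits on the 0.5 decision boundary, 0.0 when the model is certain."""
    return 1.0 - abs(2 * p - 1)


def select_for_labeling(probs: Dict[str, float], budget: int) -> List[str]:
    """Ids of the `budget` items whose predictions have the highest entropy."""
    return sorted(probs, key=lambda k: binary_entropy(probs[k]), reverse=True)[:budget]


if __name__ == "__main__":
    probs = {"a": 0.02, "b": 0.48, "c": 0.91, "d": 0.60}
    print(select_for_labeling(probs, budget=2))  # ['b', 'd']
```

For a single binary output both scores rank items identically, since each depends only on |p - 0.5|; they diverge for multi-class policy labels, where entropy reflects the whole distribution.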

ℹ️

Training Best Practices:

  1. Use focal loss for class imbalance
  2. Implement hard negative mining
  3. Use active learning for efficient labeling
  4. Regular retraining with new violation types
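Focal loss rescales binary cross-entropy by (1 - p_t)^γ, where p_t is the probability the model assigns to the true class, so confidently correct predictions (the abundant benign posts) contribute little and training concentrates on hard examples. A per-example sketch in plain Python; γ = 2 and α = 0.25 are commonly cited defaults from the original focal loss paper, not values tuned for moderation.

```python
import math


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25,
               eps: float = 1e-7) -> float:
    """Binary focal loss for a single example.

    p: predicted probability that the item violates policy
    y: 1 if the item actually violates policy, else 0
    With gamma = 0 this reduces to alpha-weighted binary cross-entropy.
    """
    p = min(max(p, eps), 1.0 - eps)          # keep log() finite
    p_t = p if y == 1 else 1.0 - p           # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = focal_loss(0.05, y=0)   # benign post, confidently scored as benign
    hard = focal_loss(0.70, y=0)   # benign post the model wrongly leans toward flagging
    print(f"easy={easy:.6f} hard={hard:.6f}")
```

For the easy benign post the modulating factor is (1 - 0.95)^2 = 0.0025, so it contributes 400x less than it would under cross-entropy with the same α weight.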

5. Serving Architecture

5.1 Real-time Moderation Pipeline

Architecture Diagram
Content Upload → Message Queue → Parallel Processing → Decision Engine → Action
                    │                │                    │              │
                    ▼                ▼                    ▼              ▼
               ┌─────────┐    ┌─────────┐          ┌─────────┐    ┌─────────┐
               │ Kafka   │    │ Text    │          │ Score   │    │ Approve │
               │ Queue   │    │ Image   │          │ Fusion  │    │ Reject  │
               │         │    │ Video   │          │         │    │ Escalate│
               └─────────┘    └─────────┘          └─────────┘    └─────────┘
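The parallel-processing stage can be sketched with asyncio: launch every modality's analyzer at once and bound each by its latency budget, so a slow model escalates the item instead of blocking the upload. A minimal sketch; the budgets reuse the per-modality figures from the high-level diagram, while `fake_model` and the convention that a missing score (None) means "escalate to human review" are hypothetical.

```python
import asyncio
from typing import Awaitable, Dict, Optional, Tuple

# Per-modality model budgets (seconds), taken from the high-level diagram
BUDGETS = {"text": 0.1, "image": 0.5, "video": 5.0}


async def run_with_budget(modality: str,
                          analysis: Awaitable[float]) -> Tuple[str, Optional[float]]:
    """Await one analyzer; return None instead of a score if it misses its budget."""
    try:
        return modality, await asyncio.wait_for(analysis, timeout=BUDGETS[modality])
    except asyncio.TimeoutError:
        return modality, None  # hypothetical convention: missing evidence -> escalate


async def parallel_analyze(analyses: Dict[str, Awaitable[float]]) -> Dict[str, Optional[float]]:
    """Run every modality concurrently and collect the scores that arrived in time."""
    results = await asyncio.gather(
        *(run_with_budget(m, a) for m, a in analyses.items())
    )
    return dict(results)


async def fake_model(score: float, delay: float) -> float:
    await asyncio.sleep(delay)  # stand-in for model inference latency
    return score


async def demo() -> Dict[str, Optional[float]]:
    return await parallel_analyze({
        "text": fake_model(0.12, delay=0.01),
        "image": fake_model(0.40, delay=0.9),  # overruns the 0.5 s image budget
    })


if __name__ == "__main__":
    print(asyncio.run(demo()))  # {'text': 0.12, 'image': None}
```

Because the analyzers run concurrently, the total wait is bounded by the largest budget among the modalities present (about 0.5 s in the demo), not by their sum.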

5.2 Human-in-the-Loop

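import asyncio
import itertools

class HumanInTheLoop:
    # Violation types that always get human review, even at high model confidence
    HIGH_RISK_VIOLATIONS = {'terrorism', 'child_exploitation', 'imminent_harm'}
    
    def __init__(self, confidence_threshold=0.7):
        # Predictions less confident than this are escalated
        self.confidence_threshold = confidence_threshold
        # asyncio.PriorityQueue pops the smallest entry first
        self.priority_queue = asyncio.PriorityQueue()
        # Tie-breaker so equal priorities never fall through to comparing content objects
        self._sequence = itertools.count()
    
    async def should_escalate(self, prediction):
        # Check confidence
        if prediction['confidence'] < self.confidence_threshold:
            return True
        
        # Check violation type
        return any(v in self.HIGH_RISK_VIOLATIONS for v in prediction['violation_types'])
    
    async def assign_to_moderator(self, content, prediction):
        # Determine priority (compute_priority is defined elsewhere; higher = more urgent)
        priority = self.compute_priority(prediction)
        
        # Find an available moderator with matching expertise (defined elsewhere)
        moderator = await self.find_moderator(prediction['violation_types'])
        
        # Enqueue with negated priority so the most urgent item is dequeued first
        await self.priority_queue.put((-priority, next(self._sequence), content, moderator))
        
        return moderator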

💡

Human Review Tips:

  1. Prioritize high-severity violations
  2. Match moderator expertise to violation type
  3. Provide clear context and explanations
  4. Track moderator performance and wellbeing
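`compute_priority` is left undefined in the `HumanInTheLoop` code. A sketch of the first tip, assuming hypothetical severity weights and a view count as the reach signal: severity is scaled so that it always dominates, and reach only orders items within the same severity tier.

```python
import math
from typing import Iterable

# Hypothetical severity weights; real values would come from policy teams
SEVERITY = {
    "child_exploitation": 100, "terrorism": 100, "imminent_harm": 90,
    "hate_speech": 40, "nudity": 30, "spam": 5,
}
DEFAULT_SEVERITY = 10  # unknown violation types


def compute_priority(violation_types: Iterable[str], views: int) -> float:
    """Higher means review sooner.

    After the x10 scaling, distinct severity tiers are at least 50 points apart,
    while log10 reach stays below 10 for anything under ten billion views,
    so severity always dominates and reach only breaks ties within a tier.
    """
    severity = max((SEVERITY.get(v, DEFAULT_SEVERITY) for v in violation_types), default=0)
    reach = math.log10(1 + max(views, 0))
    return severity * 10 + reach


if __name__ == "__main__":
    queue = {
        "viral_spam": (["spam"], 5_000_000),
        "terror_report": (["terrorism"], 12),
        "hate_post": (["hate_speech"], 800),
    }
    order = sorted(queue, key=lambda k: compute_priority(*queue[k]), reverse=True)
    print(order)  # ['terror_report', 'hate_post', 'viral_spam']
```

Here a terrorism report seen 12 times is reviewed before spam seen five million times.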

6. Monitoring and Observability

6.1 Key Metrics

class ModerationMetrics:
    QUALITY_METRICS = ['precision', 'recall', 'f1_score', 'false_positive_rate', 'false_negative_rate']
    OPERATIONAL_METRICS = ['latency_p50', 'latency_p99', 'throughput', 'queue_depth']
    BUSINESS_METRICS = ['user_satisfaction', 'appeal_rate', 'overturn_rate']
    SAFETY_METRICS = ['high_severity_recall', 'time_to_detection', 'recidivism_rate']
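Most of the quality and appeal metrics reduce to ratios over confusion-matrix counts and appeal outcomes. A minimal sketch; overturn rate is taken here as overturned appeals divided by all appeals, and the counts in the example are made up.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConfusionCounts:
    tp: int  # violating content correctly actioned
    fp: int  # benign content wrongly actioned
    fn: int  # violating content missed
    tn: int  # benign content correctly left up


def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0


def quality_metrics(c: ConfusionCounts) -> Dict[str, float]:
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1_score": _ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
        "false_negative_rate": _ratio(c.fn, c.fn + c.tp),
    }


def appeal_metrics(actions: int, appeals: int, overturned: int) -> Dict[str, float]:
    return {
        "appeal_rate": _ratio(appeals, actions),       # share of enforcement actions appealed
        "overturn_rate": _ratio(overturned, appeals),  # share of appeals that reversed the decision
    }


if __name__ == "__main__":
    m = quality_metrics(ConfusionCounts(tp=990, fp=8, fn=40, tn=98_962))
    print({k: round(v, 4) for k, v in m.items()})
    print(appeal_metrics(actions=998, appeals=50, overturned=5))
```

For those made-up counts precision is about 0.992 and recall about 0.961, so both the > 99% precision and > 95% recall targets would be met. In production, recall has to be estimated by sampling and labeling content the system left up, since missed violations only surface through sampling or user reports.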

7. Scale Considerations and Trade-offs

7.1 Horizontal Scaling

Architecture Diagram
Content Volume: Partition by content type and user region
Model Serving: GPU instances for CV models, CPU for NLP
Storage: Distributed object storage for content
Queue: Kafka with partitioning by content type
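Partitioning the queue by content type and region comes down to deriving a key per item and mapping it to a partition deterministically. A sketch under stated assumptions: the `content_type:region` key format is hypothetical, and CRC32 is an illustrative stable hash (Kafka's default Java producer partitioner uses murmur2 on the key bytes, which this does not reproduce).

```python
import zlib


def partition_key(content_type: str, region: str) -> str:
    # Hypothetical key format: one logical stream per (content type, region)
    return f"{content_type}:{region}"


def choose_partition(key: str, num_partitions: int) -> int:
    """Deterministically map a key to a partition in [0, num_partitions).

    CRC32 is stable across processes; Python's built-in hash() of a str is
    randomized per interpreter run, so it is unsuitable for routing.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    for ct, region in [("video", "eu"), ("text", "apac"), ("video", "eu")]:
        key = partition_key(ct, region)
        print(key, "->", choose_partition(key, num_partitions=12))
```

Every item with the same key lands on the same partition, which preserves per-key ordering; the trade-off is that one very hot key, such as one region's video uploads, can overload a single partition.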

7.2 Cost vs Performance Trade-offs

Dimension          | Option A (Cost Optimized)    | Option B (Performance Optimized)
-------------------|------------------------------|---------------------------------
Model Complexity   | Lightweight models           | Ensemble of heavy models
Human Review       | Minimal human review         | Extensive human review
Latency            | Batch processing             | Real-time processing
Accuracy           | Accept some false negatives  | Minimize all errors
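Where a deployment sits on the accuracy row is usually decided by picking the auto-remove threshold on a labeled validation set. A minimal sketch that returns the lowest threshold meeting a precision target such as the 99% requirement; the validation pairs in the example are synthetic.

```python
from typing import List, Optional, Tuple


def threshold_for_precision(scored: List[Tuple[float, int]],
                            target: float) -> Optional[float]:
    """Lowest score threshold t whose precision for (score >= t) meets `target`.

    scored: (model score, label) pairs from a labeled validation set; label 1 = violating.
    Recall can only fall as t rises, so the lowest qualifying t gives the best
    recall among thresholds that satisfy the precision target.
    Returns None if no threshold reaches the target.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best: Optional[float] = None
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Evaluate only after every item tied at this score has been counted
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if tp / (tp + fp) >= target:
            best = score
    return best


if __name__ == "__main__":
    validation = [(0.99, 1), (0.97, 1), (0.95, 1), (0.90, 0),
                  (0.85, 1), (0.60, 0), (0.40, 1), (0.10, 0)]
    print(threshold_for_precision(validation, target=0.99))  # 0.95
    print(threshold_for_precision(validation, target=0.75))  # 0.85
```

Raising the target moves the threshold up and gives up recall, which is exactly the "accept some false negatives" end of the trade-off.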

8. Advanced Topics

8.1 Adversarial Robustness

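class AdversarialRobustness:
    # detect_obfuscated_image/text and detect_encoding_tricks are defined elsewhere
    def detect_evasion(self, content: ContentItem) -> bool:
        # Check for image obfuscation (ContentItem carries the image as image_url)
        if content.image_url and self.detect_obfuscated_image(content.image_url):
            return True
        
        # Check for text obfuscation (text is optional)
        if content.text and self.detect_obfuscated_text(content.text):
            return True
        
        # Check for encoding tricks
        if self.detect_encoding_tricks(content):
            return True
        
        return False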
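One ingredient of a `detect_obfuscated_text`-style check is canonicalizing text before classification, so character-level tricks map back to the words the classifier was trained on. A sketch using the standard library's `unicodedata.normalize` with NFKC plus a tiny, hypothetical zero-width and leetspeak table; production systems need far larger mappings.

```python
import unicodedata

# Zero-width characters commonly inserted to split flagged words
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
# Tiny, hypothetical leetspeak table; a production mapping would be far larger
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "$": "s", "@": "a"})


def normalize_text(text: str) -> str:
    """Canonical form to classify alongside the raw text."""
    text = unicodedata.normalize("NFKC", text)  # fold fullwidth letters, ligatures, etc.
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    return text.casefold().translate(LEET)


def has_hidden_characters(text: str) -> bool:
    """True if the text contains zero-width characters or NFKC-foldable look-alikes."""
    return any(ch in ZERO_WIDTH for ch in text) or unicodedata.normalize("NFKC", text) != text


if __name__ == "__main__":
    evasive = "\uff284te\u200b"  # fullwidth 'H', leetspeak '4', trailing zero-width space
    print(normalize_text(evasive), has_hidden_characters(evasive))  # hate True
```

NFKC does not fold Cyrillic or Greek look-alikes (such as Cyrillic 'а', U+0430) into Latin letters, so homoglyph handling needs a separate confusables table. The leetspeak table also rewrites legitimate digits ('2023' becomes '2o2e'), which is why the normalized form should be classified alongside the raw text, not instead of it.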

8.2 Cross-Modal Understanding

class CrossModalAnalyzer:
    def analyze_combined(self, text, image):
        # Text-image consistency
        consistency = self.compute_consistency(text, image)
        
        # Combined harmfulness
        combined_harm = self.compute_combined_harm(text, image)
        
        # Context understanding
        context = self.understand_context(text, image)
        
        return {
            'consistency': consistency,
            'combined_harm': combined_harm,
            'context': context
        }
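`compute_consistency` can be approximated by cosine similarity between a text embedding and an image embedding from a shared embedding space (CLIP-style dual encoders are one common way to get one). The embeddings are assumed inputs here; the sketch only covers the comparison and its rescaling to [0, 1].

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compute_consistency(text_embedding: Sequence[float],
                        image_embedding: Sequence[float]) -> float:
    """Rescale cosine similarity from [-1, 1] to a [0, 1] consistency score.

    Both embeddings are assumed to come from the same (hypothetical) shared
    text-image embedding model.
    """
    return (cosine_similarity(text_embedding, image_embedding) + 1.0) / 2.0


if __name__ == "__main__":
    print(compute_consistency([0.6, 0.8], [0.6, 0.8]))  # same direction: high consistency
    print(compute_consistency([1.0, 0.0], [0.0, 1.0]))  # orthogonal: unrelated
```

A low score flags a caption that does not match its image, one way innocuous images get recontextualized; it is a signal for the decision engine, not a verdict on its own.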

9. Implementation Roadmap

Phase 1: Basic Moderation (Weeks 1-4)

  • Text classification model
  • Basic image classification
  • Simple rule engine

Phase 2: Multi-Modal (Weeks 5-8)

  • Video analysis
  • Cross-modal fusion
  • Human review system

Phase 3: Advanced Features (Weeks 9-12)

  • Adversarial robustness
  • Active learning
  • Appeals process

Phase 4: Optimization (Weeks 13-16)

  • Latency optimization
  • Cost optimization
  • Global deployment

10. Summary and Key Takeaways

Architecture Recap

  1. Multi-modal analysis: Text, image, and video models
  2. Fusion model: Combines signals from all modalities
  3. Human-in-the-loop: For uncertain cases
  4. Feedback loop: Continuous improvement

Key Metrics

  • Recall: > 95% for high-severity violations
  • Precision: > 99% to minimize false positives
  • Latency: < 500ms for text, < 2s for images, < 10s for videos

Common Interview Mistakes

  1. Not discussing multi-modal analysis
  2. Ignoring adversarial robustness
  3. Forgetting about human review
  4. Not considering cultural sensitivity

ℹ️

Final Interview Tip: Emphasize the balance between automation and human review. Discuss how you'd handle adversarial attacks and cultural differences. Show understanding of both ML techniques and policy requirements.

