Spark vs Flink vs Presto vs Trino: When to Use What
Difficulty: Expert | Companies: Meta, Google, Netflix, Uber, Airbnb
ℹ️ Interview Context
Technology comparison questions test architectural judgment. Interviewers expect you to understand trade-offs and recommend the right tool for specific use cases, not just list features.
Question
Compare Spark, Flink, Presto, and Trino for different data processing scenarios. What are the architectural differences between batch and streaming engines? When would you choose one over another? Provide specific use cases and performance characteristics for each.
Detailed Answer
1. Architecture Comparison
2. Performance Characteristics
# Performance comparison across workloads:
| Workload | Spark | Flink | Presto | Trino |
|---|---|---|---|---|
| Batch ETL | Excellent | Good | Good | Good |
| Streaming | Good | Excellent | N/A | N/A |
| Interactive Query | Good | N/A | Excellent | Excellent |
| ML Training | Excellent | Limited | N/A | N/A |
| Ad-hoc Analytics | Good | Limited | Excellent | Excellent |
| Large-scale Join | Excellent | Good | Good | Good |
# Latency comparison (typical orders of magnitude; actual figures depend on cluster size, data volume, and tuning):
# Batch processing:
# Spark: minutes to hours (depends on data size)
# Flink: seconds to minutes (for batch mode)
# Presto/Trino: seconds to minutes (in-memory)
#
# Streaming:
# Spark Structured Streaming: 100ms - 1s (micro-batch)
# Flink: 1-10ms (event-at-a-time)
# Presto/Trino: Not designed for streaming
#
# Query latency:
# Spark SQL: 1-30 seconds
# Flink SQL: 1-10 seconds (for bounded queries)
# Presto/Trino: 1-10 seconds (in-memory)
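The gap between micro-batch and event-at-a-time streaming latency can be illustrated with a small model. The sketch below is plain Python, not Spark or Flink code, and every number in it (arrival times, the 500 ms trigger interval, the per-batch and per-event costs) is a hypothetical assumption chosen for illustration. It ignores queuing and network effects.

```python
import math


def micro_batch_latencies(arrivals_ms, trigger_interval_ms, batch_cost_ms):
    """Per-event latency when events wait for the next trigger boundary."""
    latencies = []
    for t in arrivals_ms:
        # The event is picked up by the first trigger at or after its arrival.
        trigger_at = math.ceil(t / trigger_interval_ms) * trigger_interval_ms
        latencies.append(trigger_at - t + batch_cost_ms)
    return latencies


def per_event_latencies(arrivals_ms, event_cost_ms):
    """Per-event latency when each event is processed as soon as it arrives."""
    return [event_cost_ms for _ in arrivals_ms]


arrivals = [10, 120, 250, 480, 730, 990]  # hypothetical arrival times in ms
micro_batch = micro_batch_latencies(arrivals, trigger_interval_ms=500, batch_cost_ms=50)
per_event = per_event_latencies(arrivals, event_cost_ms=5)

print("micro-batch mean latency (ms):", sum(micro_batch) / len(micro_batch))
print("per-event mean latency (ms):", sum(per_event) / len(per_event))
```

In this model the micro-batch latency is dominated by the wait for the next trigger, which is why shortening the trigger interval helps only until the per-batch scheduling cost becomes the bottleneck.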
3. Use Case Recommendations
# Use Case 1: Large-scale ETL (100GB - 10TB)
# Winner: Apache Spark
# Reason: Mature ecosystem, fault tolerance, wide connector support
from pyspark.sql import SparkSession
from pyspark.sql import functions as F  # provides F.col and F.sum used below

spark = SparkSession.builder \
    .appName("ETL Pipeline") \
    .config("spark.sql.shuffle.partitions", "200") \
    .getOrCreate()

# Spark ETL example: read raw data, keep active rows, total the amount per category
raw_df = spark.read.parquet("s3://data/raw/")
transformed = raw_df \
    .filter(F.col("status") == "active") \
    .groupBy("category") \
    .agg(F.sum("amount").alias("total"))
transformed.write.parquet("s3://data/processed/")
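The `spark.sql.shuffle.partitions` value of 200 above is Spark's default and is rarely ideal across the 100GB to 10TB range. A rough way to pick a starting point is to divide the shuffled data volume by a target partition size. The helper below is a hypothetical sketch, and the 128 MiB target is a common rule of thumb assumed here, not a Spark-mandated value.

```python
def suggest_shuffle_partitions(shuffle_bytes, target_partition_bytes=128 * 1024**2):
    """Return a starting partition count so each partition is near the target size."""
    if shuffle_bytes <= 0:
        return 1
    # Integer ceiling division avoids float rounding on very large inputs.
    return -(-shuffle_bytes // target_partition_bytes)


GIB = 1024**3
TIB = 1024**4

partitions_100gib = suggest_shuffle_partitions(100 * GIB)
partitions_10tib = suggest_shuffle_partitions(10 * TIB)
print(partitions_100gib, partitions_10tib)
```

The result would be passed as a string to the same `spark.sql.shuffle.partitions` setting. On Spark 3.x, adaptive query execution can also coalesce shuffle partitions at runtime, so treat the computed value as an upper-bound starting point.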
# Use Case 2: Real-time fraud detection (< 100ms latency)
# Winner: Apache Flink
# Reason: True streaming, low latency, stateful processing
# Flink equivalent (Java pseudocode; Transaction, FraudDetector, AlertSink, and kafkaSource are placeholders):
# StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
# DataStream<Transaction> transactions =
#     env.fromSource(kafkaSource, WatermarkStrategy.noWatermarks(), "transactions");
# transactions
#     .keyBy(Transaction::getUserId)
#     .process(new FraudDetector())  // Stateful, per-key processing
#     .addSink(new AlertSink());
# env.execute("Fraud Detection");
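To make "keyed, stateful processing" concrete, here is a minimal plain-Python sketch of what a per-key fraud detector does. It is not Flink API code. The `Transaction` and `FraudDetector` names mirror the pseudocode above, while the rule itself (a transaction under 1.00 followed by one over 500.00 from the same user within 60 seconds) and its thresholds are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    user_id: str
    amount: float
    timestamp_ms: int


class FraudDetector:
    SMALL_AMOUNT = 1.00    # hypothetical threshold
    LARGE_AMOUNT = 500.00  # hypothetical threshold
    WINDOW_MS = 60_000     # hypothetical window

    def __init__(self):
        # Keyed state: user_id -> timestamp of that user's last small transaction.
        self._last_small = {}

    def process(self, tx):
        """Handle one event, return any alerts, and update the per-key state."""
        alerts = []
        last_small_ts = self._last_small.get(tx.user_id)
        if (
            last_small_ts is not None
            and tx.amount > self.LARGE_AMOUNT
            and tx.timestamp_ms - last_small_ts <= self.WINDOW_MS
        ):
            alerts.append(f"ALERT user={tx.user_id} amount={tx.amount:.2f}")
        if tx.amount < self.SMALL_AMOUNT:
            self._last_small[tx.user_id] = tx.timestamp_ms
        else:
            # Any non-small transaction clears the pending flag for this user.
            self._last_small.pop(tx.user_id, None)
        return alerts


stream = [
    Transaction("u1", 0.50, 0),
    Transaction("u2", 900.00, 1_000),    # large, but no earlier small transaction
    Transaction("u1", 750.00, 30_000),   # small then large within the window
    Transaction("u3", 0.10, 40_000),
    Transaction("u3", 600.00, 200_000),  # large, but outside the window
]
detector = FraudDetector()
alerts = [alert for tx in stream for alert in detector.process(tx)]
print(alerts)
```

In a streaming engine the dictionary would be replaced by managed keyed state, so that it is partitioned by key across workers and included in checkpoints.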
# Use Case 3: Interactive analytics (BI dashboards)
# Winner: Presto/Trino
# Reason: In-memory execution, no setup overhead, SQL-first
# Trino query (SQL):
# SELECT category, SUM(amount) as total
# FROM events
# WHERE date >= '2024-01-01'
# GROUP BY category
# ORDER BY total DESC
# LIMIT 10;
# Use Case 4: Machine learning training
# Winner: Apache Spark
# Reason: MLlib integration, distributed training, feature engineering
from pyspark.ml import Pipeline  # needed for Pipeline(...) below
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

# training_data is assumed to be a DataFrame with feature1, feature2, and label columns
assembler = VectorAssembler(inputCols=["feature1", "feature2"], outputCol="features")
classifier = RandomForestClassifier(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, classifier])
model = pipeline.fit(training_data)
# Use Case 5: Mixed batch and streaming
# Winner: Apache Spark
# Reason: Unified API (Structured Streaming), same code for batch and stream
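The "same code for batch and stream" idea can be sketched without Spark: one transformation function is applied either to the whole bounded dataset or to successive micro-batches of an unbounded source. The code below is a plain-Python analogy under that assumption, with hypothetical record fields, and is not the Structured Streaming API.

```python
from collections import Counter
from itertools import islice


def transform(records):
    """Shared business logic: keep active rows and total the amount per category."""
    totals = Counter()
    for record in records:
        if record["status"] == "active":
            totals[record["category"]] += record["amount"]
    return totals


def run_batch(records):
    """Batch mode: apply the logic once to the full dataset."""
    return dict(transform(records))


def run_micro_batches(source, batch_size):
    """Streaming mode: apply the same logic per micro-batch and keep running totals."""
    running = Counter()
    iterator = iter(source)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            break
        running.update(transform(chunk))
        yield dict(running)


records = [
    {"category": "books", "status": "active", "amount": 10},
    {"category": "toys", "status": "inactive", "amount": 99},
    {"category": "books", "status": "active", "amount": 5},
    {"category": "toys", "status": "active", "amount": 7},
]
batch_result = run_batch(records)
snapshots = list(run_micro_batches(records, batch_size=2))
print(batch_result, snapshots)
```

Once all input has been seen, the final streaming snapshot matches the batch result, which is the property a unified API aims to preserve.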
# Use Case 6: Ad-hoc queries on data lake
# Winner: Trino (formerly PrestoSQL)
# Reason: Federated SQL over many connectors with a cost-based optimizer; often reported to outperform Presto, though the gap is workload-dependent
4. Memory and State Management
# Memory management comparison:
| Feature | Spark | Flink | Presto/Trino |
|---|---|---|---|
| Memory Model | Unified | Segmented | In-memory |
| State Backend | External | RocksDB | None |
| Backpressure | Limited | Excellent | None |
| Checkpointing | External | Built-in | None |
| Fault Tolerance | RDD lineage | Chandy-Lamport | Query retry |
# Spark memory model:
# Unified memory pool divided into:
# - Execution memory (sort, join, aggregation)
# - Storage memory (cached data)
# Can borrow from each other
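The size of Spark's unified pool can be estimated from the executor heap. The sketch below assumes Spark's documented defaults (`spark.memory.fraction` = 0.6, `spark.memory.storageFraction` = 0.5, and about 300 MiB of reserved memory). The function name and the 8 GiB heap are hypothetical, and off-heap and overhead memory are ignored.

```python
RESERVED_MIB = 300  # memory Spark sets aside before sizing the unified pool


def spark_unified_memory(executor_heap_mib, memory_fraction=0.6, storage_fraction=0.5):
    """Estimate the unified pool and its soft storage/execution split, in MiB."""
    available = executor_heap_mib - RESERVED_MIB
    unified = available * memory_fraction
    storage = unified * storage_fraction
    return {
        "unified_mib": unified,
        # Cached blocks within this share are protected from eviction by execution.
        "storage_protected_mib": storage,
        # Execution starts with the rest and can borrow unused storage memory.
        "execution_initial_mib": unified - storage,
        # Left for user data structures and internal metadata.
        "user_mib": available - unified,
    }


estimate = spark_unified_memory(8 * 1024)  # hypothetical 8 GiB executor heap
for name, value in estimate.items():
    print(f"{name}: {value:.1f}")
```

The storage/execution boundary is soft: either side can use the other's idle memory, but execution can evict cached blocks only down to the protected storage share.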
# Flink memory model:
# Segmented memory with:
# - Framework heap (job/task management)
# - Task heap (user code)
# - Network buffers (shuffle)
# - Managed memory (state, sorting)
# Presto/Trino memory model:
# In-memory processing:
# - Query memory (per query)
# - Exchange memory (shuffle between stages)
# No persistent state management
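# State management comparison:
# Spark: Built-in state store for Structured Streaming (HDFS-backed by default,
#   RocksDB optional), checkpointed to external durable storage
# - Very large or shared state is often kept in an external system
#   (Delta Lake, Cassandra, HBase)
# - Good for large state (TB+) when externalized
# - Higher latency for state access than Flink's local state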
#
# Flink: Built-in (RocksDB, heap)
# - Integrated state management
# - Good for medium state (GB-TB)
# - Low latency for state access
#
# Presto/Trino: Stateless
# - No state management
# - Queries are stateless
# - Good for read-only analytics
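Checkpoint-based recovery can be sketched in a few lines: operator state is snapshotted together with the input offset, and after a failure the operator restores the snapshot and replays the log from that offset. This is a single-operator simplification in plain Python with hypothetical names. Real engines coordinate snapshots across many parallel operators.

```python
import copy


class CountingOperator:
    """Stateful operator that counts occurrences per key."""

    def __init__(self, counts=None):
        self.counts = counts if counts is not None else {}

    def process(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1


def run(log, checkpoint_every, crash_at=None):
    """Process a replayable log, optionally simulating one crash at an offset."""
    operator = CountingOperator()
    checkpoint = {"offset": 0, "state": {}}
    offset = 0
    crashed = False
    while offset < len(log):
        if crash_at is not None and offset == crash_at and not crashed:
            # Failure: in-memory state is lost, so restore the last snapshot
            # and rewind the input to the offset stored with it.
            crashed = True
            operator = CountingOperator(copy.deepcopy(checkpoint["state"]))
            offset = checkpoint["offset"]
            continue
        operator.process(log[offset])
        offset += 1
        if offset % checkpoint_every == 0:
            # Snapshot the state and the input position together.
            checkpoint = {"offset": offset, "state": copy.deepcopy(operator.counts)}
    return operator.counts


log = ["a", "b", "a", "c", "a", "b", "c"]
without_failure = run(log, checkpoint_every=3)
with_failure = run(log, checkpoint_every=3, crash_at=5)
print(without_failure, with_failure)
```

Because the state and the offset are captured in one snapshot, replayed events are not double-counted. This is the basis of exactly-once state semantics, and it requires a replayable source such as Kafka.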
5. Ecosystem and Integration
# Ecosystem comparison:
| Component | Spark | Flink | Presto/Trino |
|---|---|---|---|
| SQL | Spark SQL | Flink SQL | SQL native |
| ML | MLlib | Flink ML | None |
| Graph | GraphX | Gelly | None |
| Streaming | Structured Streaming | DataStream | None |
| Connectors | 100+ | 50+ | 80+ |
| Serialization | Parquet, ORC | Avro, Protobuf | Parquet, ORC |
| Storage | HDFS, S3, GCS | HDFS, S3 | HDFS, S3 |
# Connector support:
# Spark: HDFS, S3, GCS, Azure, Cassandra, HBase, Kafka, JDBC, Delta Lake
# Flink: HDFS, S3, GCS, Azure, Cassandra, Kafka, JDBC, Elasticsearch
# Presto/Trino: HDFS, S3, GCS, Azure, Cassandra, Kafka, JDBC, MySQL, PostgreSQL
# SQL compatibility:
# Spark SQL: ANSI SQL with extensions
# Flink SQL: ANSI SQL with streaming extensions
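# Presto/Trino: ANSI SQL with dialect differences
# - Presto (PrestoDB): originated at Facebook (now Meta); Amazon Athena was originally built on it
# - Trino (formerly PrestoSQL): fork led by Presto's original creators; the two dialects are close but have diverged in some functions and features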
6. Operational Characteristics
# Operational comparison:
<div className="my-6 flex justify-center">
<svg viewBox="0 0 800 190" width="100%" style={{ maxWidth: 700 }} xmlns="http://www.w3.org/2000/svg">
<defs>
<linearGradient id="ops-hdr" x1="0" y1="0" x2="1" y2="1">
<stop offset="0%" stopColor="#6366f1"/>
<stop offset="100%" stopColor="#4f46e5"/>
</linearGradient>
<filter id="ops-shadow">
<feDropShadow dx="0" dy="2" stdDeviation="3" floodOpacity="0.12"/>
</filter>
</defs>
<rect x="10" y="10" width="780" height="170" rx="14" fill="#fff" filter="url(#ops-shadow)" stroke="#e2e8f0" strokeWidth="1"/>
<rect x="10" y="10" width="780" height="30" rx="14" fill="url(#ops-hdr)"/>
<rect x="10" y="24" width="780" height="16" fill="url(#ops-hdr)"/>
<text x="140" y="30" textAnchor="middle" fill="#fff" fontFamily="Inter,system-ui,sans-serif" fontSize="11" fontWeight="700">Aspect</text>
<text x="340" y="30" textAnchor="middle" fill="#fff" fontFamily="Inter,system-ui,sans-serif" fontSize="11" fontWeight="700">Spark</text>
<text x="500" y="30" textAnchor="middle" fill="#fff" fontFamily="Inter,system-ui,sans-serif" fontSize="11" fontWeight="700">Flink</text>
<text x="680" y="30" textAnchor="middle" fill="#fff" fontFamily="Inter,system-ui,sans-serif" fontSize="11" fontWeight="700">Presto/Trino</text>
<text x="140" y="56" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">Deployment</text>
<text x="340" y="56" textAnchor="middle" fill="#3b82f6" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">YARN, K8s</text>
<text x="500" y="56" textAnchor="middle" fill="#3b82f6" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">YARN, K8s</text>
<text x="680" y="56" textAnchor="middle" fill="#3b82f6" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">K8s, Bare</text>
<line x1="30" y1="66" x2="770" y2="66" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="80" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">Scaling</text>
<text x="340" y="80" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Dynamic</text>
<text x="500" y="80" textAnchor="middle" fill="#3b82f6" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Reactive</text>
<text x="680" y="80" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Dynamic</text>
<line x1="30" y1="90" x2="770" y2="90" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="104" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">Monitoring</text>
<text x="340" y="104" textAnchor="middle" fill="#3b82f6" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Spark UI</text>
<text x="500" y="104" textAnchor="middle" fill="#3b82f6" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Web UI</text>
<text x="680" y="104" textAnchor="middle" fill="#3b82f6" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Web UI</text>
<line x1="30" y1="114" x2="770" y2="114" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="128" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">Resource Usage</text>
<text x="340" y="128" textAnchor="middle" fill="#ef4444" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">High</text>
<text x="500" y="128" textAnchor="middle" fill="#f59e0b" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Medium</text>
<text x="680" y="128" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Low</text>
<line x1="30" y1="138" x2="770" y2="138" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="152" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">Startup Time</text>
<text x="340" y="152" textAnchor="middle" fill="#ef4444" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Minutes</text>
<text x="500" y="152" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Seconds</text>
<text x="680" y="152" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Seconds</text>
<line x1="30" y1="162" x2="770" y2="162" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="176" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">Learning Curve</text>
<text x="340" y="176" textAnchor="middle" fill="#f59e0b" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Medium</text>
<text x="500" y="176" textAnchor="middle" fill="#ef4444" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">High</text>
<text x="680" y="176" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Low</text>
</svg>
</div>
# Resource requirements:
# Spark:
# - Driver: 1-4 cores, 2-8 GB memory
# - Executors: 4-16 cores, 8-64 GB memory each
# - Total: scales linearly with data size
#
# Flink:
# - JobManager: 1-2 cores, 2-4 GB memory
# - TaskManagers: 4-16 cores, 8-32 GB memory each
# - Total: scales with parallelism
#
# Presto/Trino:
# - Coordinator: 1-2 cores, 4-8 GB memory
# - Workers: 4-16 cores, 16-64 GB memory each
# - Total: scales with query concurrency
# Scaling patterns:
# Spark: Add more executors (dynamic allocation)
# Flink: Add more TaskManagers (reactive scaling)
# Presto/Trino: Add more workers (auto-discovery)
7. Decision Framework
Example decisions:
1. Netflix: Spark for ETL, Flink for real-time recommendations
2. Uber: Spark for batch analytics, Flink for real-time pricing
3. LinkedIn: Spark for ML, Presto for interactive analytics
4. Airbnb: Spark for ETL, Trino for ad-hoc queries
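The recommendations in this guide can be condensed into a small lookup. The function below is a hypothetical sketch: the workload labels are made up for illustration, and the 100 ms cutoff is taken from the fraud-detection use case and the micro-batch latency range quoted earlier. A real decision would also weigh team skills, existing infrastructure, and cost.

```python
RECOMMENDATIONS = {
    "batch_etl": "Spark",
    "ml_training": "Spark",
    "mixed_batch_streaming": "Spark",
    "streaming": "Flink",
    "interactive_bi": "Presto/Trino",
    "adhoc_data_lake": "Trino",
}

# Below this latency budget, micro-batch engines are assumed to be too slow.
EVENT_AT_A_TIME_THRESHOLD_MS = 100


def recommend_engine(workload, max_latency_ms=None):
    """Return the engine this guide would suggest for a workload label."""
    if workload not in RECOMMENDATIONS:
        raise ValueError(f"unknown workload: {workload!r}")
    choice = RECOMMENDATIONS[workload]
    needs_low_latency = (
        max_latency_ms is not None and max_latency_ms < EVENT_AT_A_TIME_THRESHOLD_MS
    )
    # A tight latency budget overrides Spark for workloads with a streaming part.
    if needs_low_latency and workload == "mixed_batch_streaming":
        return "Flink"
    return choice


print(recommend_engine("batch_etl"))
print(recommend_engine("mixed_batch_streaming", max_latency_ms=50))
print(recommend_engine("adhoc_data_lake"))
```

In an interview, walking through the inputs to such a function (workload type, latency budget, state size) tends to be more convincing than naming a tool outright.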
8. Hybrid Architectures
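```python
# Common hybrid architectures:
# Architecture 1: Lambda Architecture
# Batch layer: Spark (hourly/daily batch)
# Speed layer: Flink (real-time streaming)
# Serving layer: Presto/Trino (query serving)
# Architecture 2: Kappa Architecture
# Stream processing: Flink (all data as streams)
# Batch processing: Flink (reprocessing by replaying the stream in batch mode)
# Query serving: Trino (ad-hoc queries)
# Architecture 3: Lakehouse Architecture
# Ingestion: Spark Structured Streaming
# Processing: Spark (batch and micro-batch)
# Storage: Delta Lake / Iceberg / Hudi
# Query: Trino (interactive) + Spark (batch)

# Implementation example (requires the Kafka and Delta Lake connector packages):
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("Lakehouse Pipeline").getOrCreate()

# Placeholder event schema; replace with the real payload fields
schema = StructType([
    StructField("category", StringType()),
    StructField("amount", DoubleType()),
])

# 1. Ingest with Spark Structured Streaming (broker address is a placeholder)
stream_df = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "events") \
    .load()

# 2. Process with Spark (Kafka values are binary, so cast before parsing JSON)
processed = stream_df \
    .withColumn("event", F.from_json(F.col("value").cast("string"), schema)) \
    .select("event.*")

# 3. Write to Delta Lake
query = processed.writeStream \
    .format("delta") \
    .option("checkpointLocation", "/checkpoint") \
    .start("/delta/events")

# 4. Query with Trino (for interactive analytics)
# Trino query: SELECT * FROM delta.events WHERE date = CURRENT_DATE

# 5. Train ML with Spark MLlib
training_data = spark.read.format("delta").load("/delta/training")
pipeline = Pipeline(stages=[])  # Placeholder; add feature and model stages as in Use Case 4
model = pipeline.fit(training_data)
```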
⚠️ Common Pitfall
Choosing Flink for batch ETL when Spark is more appropriate. Flink excels at streaming but carries extra overhead for batch workloads. As a default, use Spark for batch and Flink for streaming. Spark Structured Streaming is a reasonable choice when micro-batch latency (roughly 100 ms to 1 s) is acceptable, and Flink is the better fit when you need true event-at-a-time processing.
💡 Interview Tip
When comparing technologies, focus on use case fit rather than listing features. The best tool depends on your specific requirements: data size, latency, state management, and ecosystem.
Summary
| Technology | Primary Use | Strengths | Weaknesses |
|---|---|---|---|
| Spark | Batch ETL, ML | Mature ecosystem, fault tolerance | Higher latency for streaming |
| Flink | Real-time streaming | Low latency, state management | Steeper learning curve |
| Presto | Interactive analytics | In-memory, fast queries | No streaming or state management |
| Trino | Ad-hoc analytics | Fast federated SQL, broad connector support | No streaming or state management |
The key to technology selection is understanding your specific requirements and choosing the tool that best fits them, not the most popular or feature-rich option.