Caching: MEMORY_ONLY, DISK_ONLY, SER, Kryo Serialization
Difficulty: Expert | Companies: Netflix, Uber, Airbnb, LinkedIn, Databricks
ℹ️ Interview Context
Caching is one of the most impactful optimization levers in Spark. Interviewers expect you to understand every storage level, when to use each, and how serialization choices affect memory footprint and GC pressure.
Question
Compare all Spark storage levels (MEMORY_ONLY, MEMORY_ONLY_SER, MEMORY_AND_DISK, DISK_ONLY, and their replicated _2 variants). When should you use Kryo serialization vs Java serialization for cached data? How does caching interact with the unified memory manager and garbage collection? Provide quantitative analysis of memory savings.
Detailed Answer
1. Storage Levels: Complete Taxonomy
from pyspark import StorageLevel
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.appName("CachingDeepDive") \
.config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
.getOrCreate()
# All storage levels:
<div className="my-6 flex justify-center">
<svg viewBox="0 0 800 340" width="100%" style={{ maxWidth: 750 }} xmlns="http://www.w3.org/2000/svg">
<defs>
<linearGradient id="cache-hdr" x1="0" y1="0" x2="1" y2="1">
<stop offset="0%" stopColor="#6366f1"/>
<stop offset="100%" stopColor="#4f46e5"/>
</linearGradient>
<filter id="cache-shadow">
<feDropShadow dx="0" dy="2" stdDeviation="3" floodOpacity="0.12"/>
</filter>
</defs>
<rect x="10" y="10" width="780" height="320" rx="14" fill="#fff" filter="url(#cache-shadow)" stroke="#e2e8f0" strokeWidth="1"/>
<rect x="10" y="10" width="780" height="32" rx="14" fill="url(#cache-hdr)"/>
<rect x="10" y="28" width="780" height="14" fill="url(#cache-hdr)"/>
<text x="140" y="32" textAnchor="middle" fill="#fff" fontFamily="Inter,system-ui,sans-serif" fontSize="11" fontWeight="700">Storage Level</text>
<text x="340" y="32" textAnchor="middle" fill="#fff" fontFamily="Inter,system-ui,sans-serif" fontSize="11" fontWeight="700">Memory</text>
<text x="440" y="32" textAnchor="middle" fill="#fff" fontFamily="Inter,system-ui,sans-serif" fontSize="11" fontWeight="700">Disk</text>
<text x="560" y="32" textAnchor="middle" fill="#fff" fontFamily="Inter,system-ui,sans-serif" fontSize="11" fontWeight="700">Serialized</text>
<text x="700" y="32" textAnchor="middle" fill="#fff" fontFamily="Inter,system-ui,sans-serif" fontSize="11" fontWeight="700">Replicated</text>
<text x="140" y="58" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">MEMORY_ONLY</text>
<text x="340" y="58" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="440" y="58" textAnchor="middle" fill="#ef4444" fontFamily="Inter,system-ui,sans-serif" fontSize="10">No</text>
<text x="560" y="58" textAnchor="middle" fill="#ef4444" fontFamily="Inter,system-ui,sans-serif" fontSize="10">No</text>
<text x="700" y="58" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">1</text>
<line x1="30" y1="66" x2="770" y2="66" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="80" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">MEMORY_ONLY_2</text>
<text x="340" y="80" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="440" y="80" textAnchor="middle" fill="#ef4444" fontFamily="Inter,system-ui,sans-serif" fontSize="10">No</text>
<text x="560" y="80" textAnchor="middle" fill="#ef4444" fontFamily="Inter,system-ui,sans-serif" fontSize="10">No</text>
<text x="700" y="80" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">2</text>
<line x1="30" y1="88" x2="770" y2="88" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="102" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">MEMORY_ONLY_SER</text>
<text x="340" y="102" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="440" y="102" textAnchor="middle" fill="#ef4444" fontFamily="Inter,system-ui,sans-serif" fontSize="10">No</text>
<text x="560" y="102" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="700" y="102" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">1</text>
<line x1="30" y1="110" x2="770" y2="110" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="124" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">MEMORY_ONLY_SER_2</text>
<text x="340" y="124" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="440" y="124" textAnchor="middle" fill="#ef4444" fontFamily="Inter,system-ui,sans-serif" fontSize="10">No</text>
<text x="560" y="124" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="700" y="124" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">2</text>
<line x1="30" y1="132" x2="770" y2="132" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="146" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">MEMORY_AND_DISK</text>
<text x="340" y="146" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="440" y="146" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="560" y="146" textAnchor="middle" fill="#ef4444" fontFamily="Inter,system-ui,sans-serif" fontSize="10">No</text>
<text x="700" y="146" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">1</text>
<line x1="30" y1="154" x2="770" y2="154" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="168" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">MEMORY_AND_DISK_2</text>
<text x="340" y="168" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="440" y="168" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="560" y="168" textAnchor="middle" fill="#ef4444" fontFamily="Inter,system-ui,sans-serif" fontSize="10">No</text>
<text x="700" y="168" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">2</text>
<line x1="30" y1="176" x2="770" y2="176" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="190" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">MEMORY_AND_DISK_SER</text>
<text x="340" y="190" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="440" y="190" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="560" y="190" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="700" y="190" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">1</text>
<line x1="30" y1="198" x2="770" y2="198" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="212" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">MEMORY_AND_DISK_SER_2</text>
<text x="340" y="212" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="440" y="212" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="560" y="212" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="700" y="212" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">2</text>
<line x1="30" y1="220" x2="770" y2="220" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="234" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">DISK_ONLY</text>
<text x="340" y="234" textAnchor="middle" fill="#ef4444" fontFamily="Inter,system-ui,sans-serif" fontSize="10">No</text>
<text x="440" y="234" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="560" y="234" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="700" y="234" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">1</text>
<line x1="30" y1="242" x2="770" y2="242" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="256" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">DISK_ONLY_2</text>
<text x="340" y="256" textAnchor="middle" fill="#ef4444" fontFamily="Inter,system-ui,sans-serif" fontSize="10">No</text>
<text x="440" y="256" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="560" y="256" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="700" y="256" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">2</text>
<line x1="30" y1="264" x2="770" y2="264" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="278" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">DISK_ONLY_3</text>
<text x="340" y="278" textAnchor="middle" fill="#ef4444" fontFamily="Inter,system-ui,sans-serif" fontSize="10">No</text>
<text x="440" y="278" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="560" y="278" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="700" y="278" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">3</text>
<line x1="30" y1="286" x2="770" y2="286" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="140" y="300" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">OFF_HEAP</text>
<text x="340" y="300" textAnchor="middle" fill="#6b7280" fontFamily="Inter,system-ui,sans-serif" fontSize="10">Off</text>
<text x="440" y="300" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="560" y="300" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">Yes</text>
<text x="700" y="300" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">1</text>
</svg>
</div>
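The on-heap level names in the table follow a regular pattern, so the table's columns can be derived from the name alone. The sketch below is a plain-Python illustration of that naming scheme, not Spark's own `StorageLevel` class; `LevelFlags` and `flags_from_name` are hypothetical helpers, and `OFF_HEAP` is deliberately left out because its name does not follow the pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelFlags:
    use_memory: bool
    use_disk: bool
    serialized: bool
    replication: int


def flags_from_name(name: str) -> LevelFlags:
    """Derive the table's columns from an on-heap storage level name."""
    parts = name.split("_")
    # A trailing number is the replica count; no suffix means one copy.
    replication = int(parts[-1]) if parts[-1].isdigit() else 1
    use_memory = "MEMORY" in parts
    use_disk = "DISK" in parts
    # Disk-only data is always serialized; in memory only the _SER levels are.
    serialized = "SER" in parts or not use_memory
    return LevelFlags(use_memory, use_disk, serialized, replication)


for level in ("MEMORY_ONLY", "MEMORY_AND_DISK_SER_2", "DISK_ONLY_3"):
    print(level, flags_from_name(level))
```

Reading the name this way is a quick sanity check in an interview: `_SER` changes the in-memory representation, a numeric suffix changes only the number of replicas.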
2. Caching API and Configuration
from pyspark.sql import functions as F
from pyspark import StorageLevel
df = spark.read.parquet("s3://data/events/")
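# Basic caching (lazy: blocks are stored when the first action runs)
df.cache() # DataFrame default is MEMORY_AND_DISK; RDD.cache() defaults to MEMORY_ONLY
# The calls below are alternatives, not a sequence: persist() on an already
# persisted DataFrame keeps the existing level, so call df.unpersist() before switching.
# Explicit persistence with storage level
df.persist(StorageLevel.MEMORY_AND_DISK)
# Serialized in memory. The *_SER constants belong to the Scala/Java API; current
# PySpark releases do not define them, so the level is built from its flags:
# StorageLevel(useDisk, useMemory, useOffHeap, deserialized, replication).
# Kryo affects cached RDDs of JVM objects, not DataFrame caches.
df.persist(StorageLevel(False, True, False, False, 1))
# Disk only (no memory caching)
df.persist(StorageLevel.DISK_ONLY)
# Two replicas for fault tolerance
df.persist(StorageLevel.MEMORY_ONLY_2)
# Off-heap (stored outside the JVM heap; requires spark.memory.offHeap.enabled=true
# and a nonzero spark.memory.offHeap.size)
df.persist(StorageLevel.OFF_HEAP)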
# Unpersist
df.unpersist()
# Verify cached status
print(df.is_cached)
print(df.storageLevel)
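One detail the API calls above do not show is that `persist()` and `cache()` are lazy: they record the request, and the data is stored only when an action runs. The following is a toy plain-Python model of that behavior, offered as a sketch; `LazyDataset` and `expensive` are hypothetical names and nothing here calls Spark.

```python
class LazyDataset:
    """Toy stand-in for a DataFrame; `compute` plays the role of the lineage."""

    def __init__(self, compute):
        self._compute = compute
        self._persist_requested = False
        self._cached = None

    def persist(self):
        # Only records intent; nothing is computed or stored yet.
        self._persist_requested = True
        return self

    def unpersist(self):
        self._persist_requested = False
        self._cached = None
        return self

    def collect(self):
        # An action: serve from the cache if materialized, else run the lineage.
        if self._cached is not None:
            return self._cached
        result = self._compute()
        if self._persist_requested:
            self._cached = result
        return result


runs = []


def expensive():
    runs.append(1)  # count how often the lineage is executed
    return [x * x for x in range(5)]


ds = LazyDataset(expensive).persist()
print("runs after persist():", len(runs))
ds.collect()
ds.collect()
print("runs after two actions:", len(runs))
```

In this model the lineage runs once for the two actions, which is the effect caching is meant to buy; without the `persist()` call it would run on every action.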
3. Memory Footprint Analysis
# Object size estimation for a simple (int, double, String) row
# All figures are back-of-envelope estimates, not measurements
# Java objects have ~16 bytes header + field overhead
# Deserialized Java objects on the heap (MEMORY_ONLY in the JVM API)
# Per row overhead:
# Object header: 16 bytes
# Integer field: 8 bytes (boxed)
# Double field: 8 bytes (boxed)
# String reference: 8 bytes
# String data: 40+ bytes (char array, hash, etc.)
# Total: ~80 bytes per row (deserialized Java objects)
# Kryo serialization size estimation
# Per row overhead:
# Integer: 4 bytes (unboxed)
# Double: 8 bytes (unboxed)
# String: 2 + N bytes (length + chars)
# Total: ~30 bytes per row (Kryo)
# Off-heap (UnsafeRow) size estimation
# Per row overhead:
# Null bitset: 8 bytes (per 64 fields)
# Fixed-width slots: 8 bytes per field
# Variable-width: offset and length in the slot, data appended in 8-byte words
# Total: ~40 bytes per row (off-heap)
# Quantitative comparison for 1 billion rows:
rows = 1e9
java_size = rows * 80 / 1e9 # 80 GB
kryo_size = rows * 30 / 1e9 # 30 GB
offheap_size = rows * 40 / 1e9 # 40 GB
print(f"Java objects (deserialized): {java_size:.1f} GB")
print(f"Kryo serialized: {kryo_size:.1f} GB")
print(f"Off-heap (UnsafeRow): {offheap_size:.1f} GB")
# Under these assumptions Kryo needs ~62% less memory than deserialized Java objects
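To turn the footprint estimates into a cluster-sizing number, one option is to divide each estimate by the storage memory an executor can protect from eviction. The sketch below assumes Spark's documented defaults (300 MiB reserved heap, `spark.memory.fraction=0.6`, `spark.memory.storageFraction=0.5`) and a hypothetical 16 GiB executor heap; `storage_region_bytes` and `executors_needed` are illustrative helpers, and the result is a conservative rough guide rather than a measurement.

```python
import math

RESERVED_BYTES = 300 * 1024**2  # heap Spark sets aside before sizing the unified pool


def storage_region_bytes(heap_bytes, memory_fraction=0.6, storage_fraction=0.5):
    """Storage memory per executor that execution cannot evict."""
    unified = (heap_bytes - RESERVED_BYTES) * memory_fraction
    return unified * storage_fraction


def executors_needed(dataset_gb, heap_gb):
    """Executors required to hold the whole cache inside the protected region."""
    region = storage_region_bytes(heap_gb * 1024**3)
    return math.ceil(dataset_gb * 1e9 / region)


# On-heap estimates from the comparison above: 80 GB and 30 GB for 1 billion rows.
for label, size_gb in [("80 GB estimate", 80), ("30 GB Kryo estimate", 30)]:
    print(f"{label}: {executors_needed(size_gb, heap_gb=16)} executors with 16 GiB heaps")
```

Storage may borrow idle execution memory, so fewer executors can work in practice; sizing against the protected region simply avoids relying on that.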
4. Kryo vs Java Serialization
# Kryo serialization configuration
# spark.serializer and the spark.kryo* settings are read when the SparkContext starts,
# so set them on the builder of a fresh application (or via spark-submit --conf).
# spark.conf.set() at runtime is rejected or has no effect for these keys.
spark = (
    SparkSession.builder
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryoserializer.buffer", "64k")  # initial buffer
    .config("spark.kryoserializer.buffer.max", "512m")  # max buffer
    .config("spark.kryo.registrationRequired", "false")  # allow unregistered classes
    # Register custom JVM classes for better performance (placeholder class name)
    .config("spark.kryo.registrator", "com.example.MyKryoRegistrator")
    .getOrCreate()
)
# Kryo benefits:
# 1. Typically 2-10x smaller serialized size than Java serialization (workload dependent)
# 2. Typically 2-5x faster serialization/deserialization (workload dependent)
# 3. Lower GC pressure (fewer objects)
# Kryo limitations:
# 1. Requires class registration for best performance
# 2. Not all Java types supported natively
# 3. Cross-version compatibility issues
# PySpark Kryo usage
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
schema = StructType([
StructField("id", IntegerType(), True),
StructField("name", StringType(), True)
])
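# Kryo runs on the JVM side: it serializes JVM objects such as cached RDD records and shuffle data
# DataFrame rows use Spark's internal binary format regardless of spark.serializer
# Custom Python objects are serialized with pickle or cloudpickle, not Kryo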
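Kryo is a JVM library and cannot be exercised from plain Python, but the reason a compact serializer produces smaller output can be shown by analogy. The sketch below compares Python's self-describing `pickle` format with a fixed binary layout written by `struct`; the row values and the layout are illustrative assumptions, and the byte counts say nothing about real Kryo or Java serialization sizes.

```python
import pickle
import struct

row = (42, 3.14, "alice")

# Self-describing encoding: type opcodes and framing travel with the data.
self_describing = pickle.dumps(row, protocol=pickle.HIGHEST_PROTOCOL)

# Compact encoding: both sides already agree on the layout
# (int32, float64, uint16 length prefix, UTF-8 bytes), so only values are written.
name_bytes = row[2].encode("utf-8")
layout = f"<idH{len(name_bytes)}s"
compact = struct.pack(layout, row[0], row[1], len(name_bytes), name_bytes)

# Round trip to confirm no information was lost.
ident, score, _, raw_name = struct.unpack(layout, compact)
decoded = (ident, score, raw_name.decode("utf-8"))

print("self-describing bytes:", len(self_describing))
print("compact bytes:", len(compact))
print("round trip ok:", decoded == row)
```

Registering classes with Kryo plays a similar role to agreeing on the layout up front: the serializer can write a short identifier instead of a full class description.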
5. Cache eviction and spill mechanics
# When cache is full, Spark evicts blocks using LRU (Least Recently Used)
# The eviction process differs by storage level:
# MEMORY_ONLY:
# Eviction: LRU blocks dropped entirely
# Recomputation: lost blocks recomputed from lineage
# Risk: if lineage is long, recomputation is expensive
# MEMORY_AND_DISK:
# Eviction: LRU blocks spilled to disk
# Recovery: evicted blocks are read back from disk (slower, but no recomputation)
# Risk: disk I/O bottleneck if too much spilling
# MEMORY_ONLY_SER:
# Eviction: LRU blocks dropped
# Same as MEMORY_ONLY but with smaller memory footprint
# Risk: deserialization overhead on access
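# Unified memory manager interaction:
# Cache uses the "storage" portion of the unified memory pool (spark.memory.fraction, default 0.6)
# If execution needs more memory, it can evict cached blocks, but only until
# storage shrinks to spark.memory.storageFraction (default 0.5 of the pool)
# Storage can borrow free execution memory but can never evict execution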
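# Cache size monitoring
# getStorageStatus() is not available on SparkContext in current releases.
# getRDDStorageInfo() is a JVM developer API reached through Py4J (_jsc is internal);
# it lists cached RDDs, including those that back cached DataFrames.
for info in spark.sparkContext._jsc.sc().getRDDStorageInfo():
    print(f"{info.name()}:")
    print(f"  Cached partitions: {info.numCachedPartitions()} / {info.numPartitions()}")
    print(f"  Used memory: {info.memSize() / 1e6:.1f} MB")
    print(f"  Used disk: {info.diskSize() / 1e6:.1f} MB")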
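The difference between dropping and spilling on eviction can be seen in a small simulation. This is a toy model in plain Python, not Spark's block manager: `BlockStore` is a hypothetical class, block sizes are arbitrary units, and it ignores details such as Spark not evicting blocks of the RDD currently being cached.

```python
from collections import OrderedDict


class BlockStore:
    """Toy model of LRU cache eviction on one executor."""

    def __init__(self, capacity, spill_to_disk):
        self.capacity = capacity            # memory budget in arbitrary units
        self.spill_to_disk = spill_to_disk  # True ~ MEMORY_AND_DISK, False ~ MEMORY_ONLY
        self.memory = OrderedDict()         # block id -> size, least recently used first
        self.disk = {}
        self.recomputed = 0

    def put(self, block_id, size):
        # Evict least recently used blocks until the new block fits.
        while self.memory and sum(self.memory.values()) + size > self.capacity:
            victim, victim_size = self.memory.popitem(last=False)
            if self.spill_to_disk:
                self.disk[victim] = victim_size
        self.memory[block_id] = size

    def get(self, block_id, size):
        if block_id in self.memory:
            self.memory.move_to_end(block_id)  # mark as most recently used
            return "memory"
        if block_id in self.disk:
            return "disk"
        self.recomputed += 1                   # dropped block: rebuild from lineage
        self.put(block_id, size)
        return "recomputed"


for spill in (False, True):
    store = BlockStore(capacity=2, spill_to_disk=spill)
    for block in ("a", "b", "c"):
        store.put(block, 1)                    # "a" is evicted when "c" arrives
    print(f"spill_to_disk={spill}: block 'a' served from {store.get('a', 1)}")
```

With spilling disabled the evicted block has to be recomputed, and re-inserting it evicts another block in turn, which is how an undersized MEMORY_ONLY cache can end up thrashing.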
6. When to Use Each Storage Level
# Decision matrix:
<div className="my-6 flex justify-center">
<svg viewBox="0 0 800 200" width="100%" style={{ maxWidth: 720 }} xmlns="http://www.w3.org/2000/svg">
<defs>
<linearGradient id="dm-hdr" x1="0" y1="0" x2="1" y2="1">
<stop offset="0%" stopColor="#6366f1"/>
<stop offset="100%" stopColor="#4f46e5"/>
</linearGradient>
<filter id="dm-shadow">
<feDropShadow dx="0" dy="2" stdDeviation="3" floodOpacity="0.12"/>
</filter>
</defs>
<rect x="10" y="10" width="780" height="180" rx="14" fill="#fff" filter="url(#dm-shadow)" stroke="#e2e8f0" strokeWidth="1"/>
<rect x="10" y="10" width="780" height="32" rx="14" fill="url(#dm-hdr)"/>
<rect x="10" y="28" width="780" height="14" fill="url(#dm-hdr)"/>
<text x="200" y="32" textAnchor="middle" fill="#fff" fontFamily="Inter,system-ui,sans-serif" fontSize="11" fontWeight="700">Scenario</text>
<text x="560" y="32" textAnchor="middle" fill="#fff" fontFamily="Inter,system-ui,sans-serif" fontSize="11" fontWeight="700">Recommended Storage Level</text>
<text x="200" y="60" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">Iterative ML algorithms</text>
<text x="560" y="60" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">MEMORY_ONLY (fastest access)</text>
<line x1="30" y1="70" x2="770" y2="70" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="200" y="84" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">Large dataset, limited RAM</text>
<text x="560" y="84" textAnchor="middle" fill="#3b82f6" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">MEMORY_AND_DISK (spill to disk)</text>
<line x1="30" y1="94" x2="770" y2="94" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="200" y="108" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">Very large dataset</text>
<text x="560" y="108" textAnchor="middle" fill="#f59e0b" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">DISK_ONLY (no memory pressure)</text>
<line x1="30" y1="118" x2="770" y2="118" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="200" y="132" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">GC-sensitive workloads</text>
<text x="560" y="132" textAnchor="middle" fill="#8b5cf6" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">OFF_HEAP or MEMORY_ONLY_SER</text>
<line x1="30" y1="142" x2="770" y2="142" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="200" y="156" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">Fault-tolerant cluster</text>
<text x="560" y="156" textAnchor="middle" fill="#3b82f6" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">MEMORY_ONLY_2 (replication)</text>
<line x1="30" y1="166" x2="770" y2="166" stroke="#e2e8f0" strokeWidth="0.5"/>
<text x="200" y="180" textAnchor="middle" fill="#334155" fontFamily="Inter,system-ui,sans-serif" fontSize="10">Ad-hoc analytics</text>
<text x="560" y="180" textAnchor="middle" fill="#10b981" fontFamily="Inter,system-ui,sans-serif" fontSize="10" fontWeight="600">MEMORY_AND_DISK_SER (balanced)</text>
</svg>
</div>
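The size-driven rows of the decision matrix can be expressed as a small rule of thumb. The function below is a sketch under stated assumptions: `recommend_level` is a hypothetical helper, the ordering of the checks is one possible reading of the matrix, and the `disk_only_ratio` cutoff of 5 is an arbitrary illustrative value rather than a Spark setting.

```python
def recommend_level(dataset_gb, cache_memory_gb, gc_sensitive=False,
                    replicate=False, disk_only_ratio=5.0):
    """Pick a storage level name from dataset size relative to cache memory."""
    ratio = dataset_gb / cache_memory_gb
    if ratio > disk_only_ratio:
        return "DISK_ONLY"          # very large dataset: avoid memory pressure
    if ratio > 1.0:
        return "MEMORY_AND_DISK"    # does not fit: spill the remainder to disk
    if gc_sensitive:
        return "MEMORY_ONLY_SER"    # fewer, larger objects for the collector
    if replicate:
        return "MEMORY_ONLY_2"      # second replica for faster recovery
    return "MEMORY_ONLY"            # fits in memory: fastest access


print(recommend_level(dataset_gb=10, cache_memory_gb=40))
print(recommend_level(dataset_gb=60, cache_memory_gb=40))
print(recommend_level(dataset_gb=500, cache_memory_gb=40))
```

A rule like this is only a starting point; the recomputation cost of the lineage and the access pattern still decide whether caching helps at all.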
# Example: Iterative ML with caching
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
# Cache training data for multiple iterations
training_data = spark.read.parquet("s3://data/training/")
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
prepared = assembler.transform(training_data).select("features", "label")
prepared.cache() # Cache before iterative algorithm
# Multiple training iterations reuse cached data
for i in range(10):
    lr = LogisticRegression(maxIter=10, regParam=0.01 * (i + 1))
    model = lr.fit(prepared) # reads from cache each time
prepared.unpersist()
7. Cache-Related Diagnostics
# Spark UI diagnostics for caching
# Navigate to Storage tab to see:
# - Cached RDD/DataFrame name
# - Storage level
# - Size in memory / on disk
# - Number of partitions cached
# - Fraction cached
# Programmatic diagnostics
def cache_diagnostics(df, name="DataFrame"):
    """Analyze caching behavior of a DataFrame."""
    # Force materialization
    count = df.count()
    # Check storage level
    print(f"--- {name} Cache Diagnostics ---")
    print(f"Storage Level: {df.storageLevel}")
    print(f"Is Cached: {df.is_cached}")
    # Sum sizes over all cached RDDs (JVM developer API via Py4J; this covers
    # every cache in the application, not only this DataFrame)
    storage = spark.sparkContext._jsc.sc().getRDDStorageInfo()
    total_mem = sum(s.memSize() for s in storage)
    total_disk = sum(s.diskSize() for s in storage)
    print(f"Total memory used: {total_mem / 1e6:.1f} MB")
    print(f"Total disk used: {total_disk / 1e6:.1f} MB")
    return count
# After caching
df.cache()
cache_diagnostics(df, "Events")
⚠️ Common Pitfall
Caching a DataFrame that won't be reused wastes memory and adds GC pressure. Always verify a DataFrame will be accessed multiple times before caching. Caching is lazy, so run an action such as df.count() to materialize it, then confirm in the Storage tab that the expected fraction is cached.
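Whether a DataFrame is reused often enough to justify caching can be estimated with a simple cost model. The sketch below is illustrative only: `caching_costs` is a hypothetical helper, and the timings passed to it are made-up inputs that would have to come from profiling a real job.

```python
def caching_costs(n_uses, compute_s, cache_write_s, cache_read_s):
    """Return (seconds without cache, seconds with cache) for n_uses actions."""
    without_cache = n_uses * compute_s
    # The first action still computes the data and also pays to store it;
    # every later action only pays the cache read.
    with_cache = compute_s + cache_write_s + (n_uses - 1) * cache_read_s
    return without_cache, with_cache


for uses in (1, 2, 10):
    plain, cached = caching_costs(uses, compute_s=60, cache_write_s=20, cache_read_s=5)
    verdict = "cache" if cached < plain else "do not cache"
    print(f"{uses} use(s): {plain}s without, {cached}s with -> {verdict}")
```

With a single use the cached path is always slower by the write cost, which is the pitfall described above; the break-even point moves earlier as the lineage gets more expensive.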
💡 Interview Tip
When asked about caching, always discuss the trade-off triangle: memory footprint vs. recomputation cost vs. GC pressure. The optimal choice depends on your workload's access pattern and memory constraints.
Summary
| Storage Level | Best For | Memory Cost | Failure Handling |
|---|---|---|---|
| MEMORY_ONLY | Fastest access, fits in RAM | Highest | Recompute from lineage |
| MEMORY_ONLY_SER | Memory-constrained caching | Medium | Recompute from lineage |
| MEMORY_AND_DISK | Large datasets with spillover | Medium | Read from disk |
| DISK_ONLY | Very large datasets | Lowest | Read from disk |
| OFF_HEAP | GC-sensitive workloads | Off-heap | Recompute from lineage |
The key insight: caching is not free. It trades memory and GC pressure for faster access. Always profile before and after caching.