πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

ELT vs ETL: Modern Data Stack Architecture

Data EngineeringData Architecture⭐ Premium

Advertisement

dbt & Snowflake Interview

ELT vs ETL: Modern Data Stack Architecture

Understanding data transformation paradigms

Interview Question

"Your company is migrating from a traditional ETL system to a modern ELT architecture. Compare ETL and ELT, explain the tradeoffs, and design the new architecture. Include: (1) tool selection, (2) data modeling approach, (3) transformation strategy, (4) testing and monitoring."

Difficulty: Medium-Hard | Frequently asked at dbt, Snowflake, Fivetran, Airbyte


Theoretical Foundation

ETL (Extract, Transform, Load)

Data is transformed before loading into the target system.

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    ETL Architecture                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                             β”‚
β”‚  Source Systems ──▢ Extract ──▢ Transform ──▢ Load ──▢ Targetβ”‚
β”‚                                                             β”‚
β”‚  Example:                                                   β”‚
β”‚  PostgreSQL ──▢ Extract rows ──▢ Clean/Validate ──▢ Redshiftβ”‚
β”‚                                                             β”‚
β”‚  Transformation happens in:                                 β”‚
β”‚  - ETL tool (Informatica, Talend)                          β”‚
β”‚  - Custom code (Python, Java)                              β”‚
β”‚  - Middleware (Apache Camel)                                β”‚
β”‚                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Characteristics:

  • Transform before loading
  • Requires staging area
  • Fixed schema (schema-on-write)
  • Batch processing
  • Complex error handling

Pros:

  • Data is clean when loaded
  • Reduced load on target system
  • Well-established patterns

Cons:

  • Expensive transform infrastructure
  • Long development cycles
  • Inflexible schema changes
  • Limited ad-hoc analysis

ELT (Extract, Load, Transform)

Data is loaded first, then transformed within the target system.

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    ELT Architecture                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                             β”‚
β”‚  Source Systems ──▢ Extract ──▢ Load ──▢ Transform ──▢ Targetβ”‚
β”‚                                                             β”‚
β”‚  Example:                                                   β”‚
β”‚  PostgreSQL ──▢ Extract rows ──▢ Snowflake ──▢ dbt models   β”‚
β”‚                                                             β”‚
β”‚  Transformation happens in:                                 β”‚
β”‚  - Target warehouse (Snowflake, BigQuery)                  β”‚
β”‚  - dbt (data build tool)                                   β”‚
β”‚  - Spark on warehouse                                      β”‚
β”‚                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Characteristics:

  • Load raw data first
  • Transform within warehouse
  • Flexible schema (schema-on-read)
  • SQL-based transformations
  • Version controlled

Pros:

  • Faster time to insights
  • Cheaper storage
  • Flexible schema evolution
  • SQL-based (easier to learn)
  • Version controlled transformations

Cons:

  • Raw data in warehouse
  • Higher warehouse compute costs
  • Requires warehouse optimization

Modern Data Stack

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Modern Data Stack (ELT)                        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                             β”‚
β”‚  Ingestion: Fivetran, Airbyte, Stitch                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  Extract from 200+ sources                          β”‚   β”‚
β”‚  β”‚  Load raw data to warehouse                         β”‚   β”‚
β”‚  β”‚  Managed connectors (no maintenance)                β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                             β”‚
β”‚  Storage: Snowflake, BigQuery, Redshift, Databricks        β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  Scalable compute and storage                       β”‚   β”‚
β”‚  β”‚  SQL interface                                      β”‚   β”‚
β”‚  β”‚  Time travel, cloning                               β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                             β”‚
β”‚  Transformation: dbt, Dataform, SQLMesh                    β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  SQL-based transformations                          β”‚   β”‚
β”‚  β”‚  Version controlled (Git)                           β”‚   β”‚
β”‚  β”‚  Testing and documentation                          β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                             β”‚
β”‚  Orchestration: Airflow, Dagster, Prefect                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  Schedule and monitor pipelines                     β”‚   β”‚
β”‚  β”‚  Dependency management                              β”‚   β”‚
β”‚  β”‚  Alerting                                           β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                             β”‚
β”‚  BI: Tableau, Looker, Power BI, Metabase                   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  Visualization and dashboards                       β”‚   β”‚
β”‚  β”‚  Self-service analytics                             β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Comparison Matrix

AspectETLELT
Transform LocationExternal toolTarget warehouse
Data MovementSource β†’ Staging β†’ TargetSource β†’ Target
SchemaOn-write (fixed)On-read (flexible)
DevelopmentCustom codeSQL + dbt
Version ControlDifficultEasy (Git)
TestingManualAutomated (dbt tests)
Time to InsightsWeeks/MonthsDays/Hours
CostHigh (ETL infrastructure)Lower (warehouse compute)
FlexibilityLowHigh
Skill RequirementETL developersSQL + analytics

Data Modeling Approaches

Inmon (Top-Down)

Inmon Approach (ETL) — Top-DownSource→ETL→3NF Warehouse→Data Marts→BICentralized, normalized warehouse → single source of truth

Kimball (Bottom-Up)

Kimball Approach (ETL/ELT) — Bottom-UpSource→ETL/ELT→Dimensional Model→BIStar/Snowflake schema → business process oriented

Modern (dbt Approach)

Modern Approach (ELT) — dbtSource→Ingestion→Raw→Staging→Marts→BILayered architecture → incremental models → self-service analytics

Code Implementation

ETL Example (Traditional)

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder \
    .appName("ETL_Example") \
    .getOrCreate()

# ============================================================
# ETL: Extract, Transform, Load
# ============================================================

# EXTRACT: Read from source
source_data = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://source:5432/production") \
    .option("dbtable", "orders") \
    .option("user", "reader") \
    .option("password", "password") \
    .load()

# TRANSFORM: Clean and enrich (in Spark)
transformed_data = source_data \
    .filter(F.col("status") != "cancelled") \
    .withColumn("order_date", F.to_date("created_at")) \
    .withColumn("amount_clean", F.when(F.col("amount") < 0, 0).otherwise(F.col("amount"))) \
    .join(customer_data, "customer_id") \
    .groupBy("order_date", "customer_segment") \
    .agg(
        F.count("*").alias("order_count"),
        F.sum("amount_clean").alias("total_amount")
    )

# LOAD: Write to target warehouse
transformed_data.write \
    .format("parquet") \
    .mode("overwrite") \
    .partitionBy("order_date") \
    .save("s3://warehouse/order_summary/")

ELT Example (Modern with dbt)

-- ============================================================
-- ELT: Extract, Load, Transform (using dbt)
-- ============================================================

-- models/staging/stg_orders.sql
-- Source: Raw data loaded by Fivetran/Airbyte
-- Purpose: Clean and standardize raw data

SELECT
    id AS order_id,
    customer_id,
    amount,
    status,
    created_at,
    updated_at,
    _fivetran_synced AS _loaded_at
FROM {{ source('production', 'orders') }}
WHERE status != 'deleted'

-- models/marts/fct_orders.sql
-- Purpose: Business-level order facts

WITH orders AS (
    SELECT * FROM {{ ref('stg_orders') }}
),

customers AS (
    SELECT * FROM {{ ref('stg_customers') }}
),

order_summary AS (
    SELECT
        o.order_id,
        o.customer_id,
        c.customer_name,
        c.segment AS customer_segment,
        o.amount,
        o.status,
        DATE_TRUNC('day', o.created_at) AS order_date,
        o.created_at,
        o.updated_at
    FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.customer_id
)

SELECT * FROM order_summary

-- models/marts/rpt_daily_revenue.sql
-- Purpose: Daily revenue report

WITH orders AS (
    SELECT * FROM {{ ref('fct_orders') }}
),

daily_summary AS (
    SELECT
        order_date,
        customer_segment,
        COUNT(DISTINCT order_id) AS order_count,
        COUNT(DISTINCT customer_id) AS customer_count,
        SUM(amount) AS total_revenue,
        AVG(amount) AS avg_order_value
    FROM orders
    WHERE status = 'completed'
    GROUP BY order_date, customer_segment
)

SELECT * FROM daily_summary

dbt Project Structure

Architecture Diagram
my_dbt_project/
β”œβ”€β”€ dbt_project.yml
β”œβ”€β”€ packages.yml
β”œβ”€β”€ models/
β”‚   β”œβ”€β”€ staging/
β”‚   β”‚   β”œβ”€β”€ _staging__models.yml
β”‚   β”‚   β”œβ”€β”€ stg_orders.sql
β”‚   β”‚   β”œβ”€β”€ stg_customers.sql
β”‚   β”‚   └── stg_products.sql
β”‚   β”œβ”€β”€ marts/
β”‚   β”‚   β”œβ”€β”€ _marts__models.yml
β”‚   β”‚   β”œβ”€β”€ fct_orders.sql
β”‚   β”‚   β”œβ”€β”€ dim_customers.sql
β”‚   β”‚   └── rpt_daily_revenue.sql
β”‚   └── intermediate/
β”‚       β”œβ”€β”€ int_orders_enriched.sql
β”‚       └── int_customer_lifetime.sql
β”œβ”€β”€ tests/
β”‚   β”œβ”€β”€ test_order_amount_positive.sql
β”‚   └── test_customer_unique.sql
β”œβ”€β”€ macros/
β”‚   β”œβ”€β”€ generate_schema_name.sql
β”‚   └── materializations.sql
└── seeds/
    β”œβ”€β”€ country_codes.csv
    └── product_categories.csv

dbt Configuration

# dbt_project.yml
name: 'my_project'
version: '1.0.0'
config-version: 2

profile: 'snowflake'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:
  - "target"
  - "dbt_packages"

models:
  my_project:
    staging:
      +materialized: view
      +schema: staging
    marts:
      +materialized: table
      +schema: analytics
    intermediate:
      +materialized: ephemeral

vars:
  start_date: '2024-01-01'
  end_date: '2024-12-31'

# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: ">=1.0.0"
  - package: dbt-labs/codegen
    version: ">=0.11.0"
  - package: calogica/dbt_expectations
    version: ">=0.10.0"

dbt Tests

# models/staging/_staging__models.yml
version: 2

models:
  - name: stg_orders
    description: "Cleaned orders from production database"
    columns:
      - name: order_id
        description: "Primary key"
        tests:
          - unique
          - not_null
      - name: customer_id
        description: "Foreign key to customers"
        tests:
          - not_null
          - relationships:
              to: ref('stg_customers')
              field: customer_id
      - name: amount
        description: "Order amount"
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 1000000
      - name: status
        description: "Order status"
        tests:
          - accepted_values:
              values: ['pending', 'completed', 'cancelled', 'refunded']

# models/marts/_marts__models.yml
version: 2

models:
  - name: fct_orders
    description: "Order facts with customer information"
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: order_date
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: "2020-01-01"
              max_value: "2030-12-31"

dbt Macros

-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name }}
    {%- endif -%}
{%- endmacro %}

-- macros/insert_audit_columns.sql
{% macro insert_audit_columns() %}
    , CURRENT_TIMESTAMP AS _loaded_at
    , '{{ invocation_id }}' AS _dbt_invocation_id
    , '{{ run_started_at }}' AS _dbt_run_started_at
{% endmacro %}

dbt Snapshots (SCD Type 2)

-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at',
    )
}}

SELECT * FROM {{ source('production', 'customers') }}

{% endsnapshot %}

Airflow Integration

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago

with DAG(
    'dbt_pipeline',
    schedule_interval='0 6 * * *',
    start_date=days_ago(1),
    catchup=False,
) as dag:

    # Run dbt staging models
    dbt_staging = BashOperator(
        task_id='dbt_staging',
        bash_command='cd /opt/dbt && dbt run --select staging',
    )

    # Run dbt tests
    dbt_test = BashOperator(
        task_id='dbt_test',
        bash_command='cd /opt/dbt && dbt test --select staging',
    )

    # Run dbt marts
    dbt_marts = BashOperator(
        task_id='dbt_marts',
        bash_command='cd /opt/dbt && dbt run --select marts',
    )

    # Run dbt tests on marts
    dbt_test_marts = BashOperator(
        task_id='dbt_test_marts',
        bash_command='cd /opt/dbt && dbt test --select marts',
    )

    # Dependencies
    dbt_staging >> dbt_test >> dbt_marts >> dbt_test_marts

πŸ’‘

Production Tip: dbt is the de facto standard for ELT transformations. It provides: (1) version control for SQL, (2) automated testing, (3) documentation generation, (4) incremental models, and (5) a rich package ecosystem. Always use dbt for SQL-based transformations.


Common Follow-Up Questions

Q1: When would you choose ETL over ELT?

Choose ETL when:

  • Target system can't handle raw data (e.g., legacy systems)
  • Data is extremely sensitive and must be masked before loading
  • Transformations are complex and require custom code
  • Source systems have limited data

Choose ELT when:

  • Target is a modern warehouse (Snowflake, BigQuery)
  • You need flexibility for ad-hoc analysis
  • You want faster time to insights
  • Your team knows SQL

Q2: How do you handle late-arriving data in ELT?

-- dbt incremental model with late-arriving data
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        incremental_strategy='merge'
    )
}}

SELECT
    order_id,
    customer_id,
    amount,
    status,
    created_at,
    updated_at
FROM {{ source('production', 'orders') }}

{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}

Q3: How do you test data quality in ELT?

-- dbt tests
-- Schema tests
tests:
  - unique
  - not_null
  - accepted_values
  - relationships

-- Data tests
SELECT
    order_id
FROM {{ ref('fct_orders') }}
WHERE amount < 0
-- Should return 0 rows

Q4: How do you handle schema evolution in ELT?

-- dbt handles schema evolution automatically
-- New columns are added to the target table
-- Use {{ config(materialized='incremental') }} for automatic schema evolution

⚠️

Critical Consideration: ELT requires a powerful warehouse to handle transformations. If your warehouse is slow or expensive, ETL might be more cost-effective. Always benchmark both approaches for your specific workload.


Company-Specific Tips

dbt Interview Tips

  • Discuss dbt and SQLMesh for transformations
  • Explain incremental models and materializations
  • Mention testing and documentation
  • Talk about packages and macros

Snowflake Interview Tips

  • Focus on ELT with Snowflake
  • Discuss Snowflake features (time travel, cloning)
  • Mention Snowflake optimization (clustering, search optimization)
  • Talk about Snowflake cost management

Fivetran/Airbyte Interview Tips

  • Discuss managed connectors for ingestion
  • Explain CDC and incremental sync
  • Mention schema evolution handling
  • Talk about data freshness monitoring

ℹ️

Final Takeaway: ELT is the modern standard for data transformation. It provides flexibility, speed, and cost-effectiveness. Use dbt for SQL transformations, Snowflake/BigQuery for storage, and Fivetran/Airbyte for ingestion. Always consider your team's skills and budget when choosing between ETL and ELT.

Advertisement