πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Looker Deep Dive: LookML, Explores & Derived Tables

GCP Data EngineeringLooker Advanced⭐ Premium

Advertisement

Looker Advanced Patterns

Master advanced Looker patterns including LookML modeling, explores, derived tables, and data engineering integration.

18 min readAdvanced

Advanced LookML Patterns

# Advanced explore with fanout prevention
explore: orders {
  label: "Sales Orders"
  group_label: "Sales"

  # Join with fanout prevention
  join: order_items {
    type: left_outer
    sql_on: ${orders.order_id} = ${order_items.order_id} ;;
    relationship: one_to_many
  }

  # Aggregate awareness for performance
  aggregate_table: daily_orders {
    query: {
      dimensions: [order_date_date]
      measures: [order_count, total_revenue]
    }
    materialization: {
      datagroup_trigger: daily_etl
    }
  }

  # Always filter for performance
  always_filter: {
    filters: [order_date_date: "30 days"]
  }

  # Access control
  access_filter: {
    field: customers.region
    user_attribute: allowed_regions
  }

  # Conditionally join
  join: promotions {
    type: left_outer
    sql_on: ${orders.promotion_id} = ${promotions.promotion_id} AND {% condition promotions.promotion_date %} ${promotions.start_date} {% endcondition %} ;;
    relationship: many_to_one
  }
}

Persistent Derived Tables (PDTs)

view: user_lifetime_value {
  derived_table: {
    sql:
      SELECT
        user_id,
        SUM(amount) as ltv,
        COUNT(*) as order_count,
        MIN(order_date) as first_order_date,
        MAX(order_date) as last_order_date
      FROM `project.analytics.orders`
      WHERE status = 'completed'
      GROUP BY 1
    ;;
    datagroup_trigger: daily_etl
    distribution: "user_id"
    sortkeys: ["user_id"]
    partition_keys: ["first_order_date"]
    cluster_keys: ["user_id"]
  }

  dimension: user_id {
    type: string
    primary_key: yes
    sql: ${TABLE}.user_id ;;
  }

  dimension: ltv {
    type: number
    sql: ${TABLE}.ltv ;;
    value_format_name: usd
  }

  dimension: ltv_tier {
    type: tier
    tiers: [0, 100, 500, 1000, 5000]
    style: integer
    sql: ${ltv} ;;
  }

  measure: total_ltv {
    type: sum
    sql: ${ltv} ;;
    value_format_name: usd
  }
}

✨

Best Practice: Use aggregate tables for dashboard performance. Implement datagroups for PDT refresh scheduling. Use always_filter to prevent full-table scans. Document all explores and fields. Test LookML changes in development mode before deploying.

πŸ’¬

Common Interview Questions

Q1: What is aggregate awareness in Looker?

Answer: Aggregate awareness automatically routes queries to the most efficient pre-computed aggregate table. Looker selects the smallest aggregate that satisfies the query, improving performance and reducing costs.

Q2: What is a datagroup in Looker?

Answer: Datagroups define refresh policies for PDTs. They can trigger on schedule, SQL query results, or custom triggers. Datagroups ensure PDTs are refreshed at appropriate intervals based on data freshness requirements.

Q3: How do you prevent fanout in Looker joins?

Answer: Use symmetric aggregates (count_distinct, sum_distinct) to prevent double-counting. Use join_types that limit row multiplication. Consider creating derived tables that pre-aggregate data before joining.

Q4: What is the difference between derived tables and PDTs?

Answer: Derived tables are virtual tables computed at query time. PDTs are materialized in the database and refreshed on a schedule. PDTs improve performance for expensive queries but require storage and refresh time.

Q5: How do you optimize Looker for large datasets?

Answer: 1) Use aggregate tables for dashboards, 2) Implement PDTs for expensive queries, 3) Use always_filter to prevent full-table scans, 4) Optimize LookML joins, 5) Use BigQuery materialized views, 6) Implement caching.

Advertisement