Template Types
Custom Flex Template
# pipeline.py - The actual pipeline code
import argparse
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run(argv=None):
    """Custom Dataflow pipeline: Pub/Sub to BigQuery."""
    # Template parameters; the names match those declared in metadata.json.
    parser = argparse.ArgumentParser()
    parser.add_argument('--inputTopic', required=True)
    parser.add_argument('--outputTable', required=True)
    args, beam_args = parser.parse_known_args(argv)

    # Project, region, and runner are supplied at launch time rather than
    # hardcoded. Reading from Pub/Sub requires streaming mode.
    pipeline_options = PipelineOptions(beam_args, streaming=True)

    with beam.Pipeline(options=pipeline_options) as pipeline:
        (
            pipeline
            | 'Read' >> beam.io.ReadFromPubSub(topic=args.inputTopic)
            | 'Parse' >> beam.Map(lambda x: json.loads(x.decode('utf-8')))
            | 'Transform' >> beam.Map(lambda x: {
                'event_id': x['id'],
                'event_type': x['type'],
                'timestamp': x['timestamp'],
            })
            | 'Write' >> beam.io.WriteToBigQuery(
                args.outputTable,
                schema='event_id:STRING,event_type:STRING,timestamp:TIMESTAMP',
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == '__main__':
    run()
# metadata.json - Template metadata
{
  "name": "Custom Pub/Sub to BigQuery",
  "description": "Custom streaming pipeline from Pub/Sub to BigQuery",
  "parameters": [
    {
      "name": "inputTopic",
      "label": "Input Pub/Sub Topic",
      "helpText": "Pub/Sub topic to read from",
      "isOptional": false
    },
    {
      "name": "outputTable",
      "label": "Output BigQuery Table",
      "helpText": "BigQuery table to write to",
      "isOptional": false
    }
  ]
}
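A metadata file like the one above can be sanity-checked before the template is built. The following is a minimal sketch using only the standard library. The helper names `check_metadata` and `missing_required` are hypothetical and not part of any Google Cloud tooling, and treating `name`, `label`, and `helpText` as mandatory is an assumption based on the fields used in the example.

```python
from __future__ import annotations

# Fields that every parameter entry carries in the example metadata.
# Treating all three as mandatory is an assumption of this sketch.
EXPECTED_FIELDS = ("name", "label", "helpText")


def check_metadata(metadata: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    seen = set()
    for index, param in enumerate(metadata.get("parameters", [])):
        for field in EXPECTED_FIELDS:
            if not param.get(field):
                problems.append(f"parameter {index}: missing '{field}'")
        name = param.get("name")
        if name:
            if name in seen:
                problems.append(f"parameter {index}: duplicate name '{name}'")
            seen.add(name)
    return problems


def missing_required(metadata: dict, supplied: dict) -> list[str]:
    """Names of non-optional parameters that have no supplied value."""
    return [
        param["name"]
        for param in metadata.get("parameters", [])
        if param.get("name")
        and not param.get("isOptional", False)
        and param["name"] not in supplied
    ]
```

Loading the file with `json.load` and passing the result to `check_metadata` gives a quick pre-build check; `missing_required` can be run against the values intended for the launch command to catch a forgotten parameter early.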
Template Deployment
# Build Flex Template
gcloud dataflow flex-templates build gs://my-templates/pubsub-to-bq \
    --image=gcr.io/my-project/pipeline:latest \
    --sdk-language=PYTHON \
    --metadata-file=metadata.json

# Run Flex Template
gcloud dataflow flex-templates run pubsub-to-bq-job \
    --template-file-gcs-location=gs://my-templates/pubsub-to-bq \
    --parameters=inputTopic=projects/my-project/topics/events,outputTable=my-project:analytics.events \
    --region=us-central1
Best Practice: Use Flex Templates for custom pipelines, as they provide better versioning, parameter management, and container-based deployment. Keep the template spec file in Cloud Storage and store the container image in Artifact Registry for better access control. Always include a metadata file so that parameters are documented.
Common Interview Questions
Q1: What is the difference between Flex and Classic templates?
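Answer: Flex Templates package the pipeline as a Docker container image, with a template spec file in Cloud Storage that points to it. The job graph is built when the template is launched, which gives better versioning, parameter management, and deployment flexibility. Classic Templates stage a serialized job graph in Cloud Storage and are simpler but less flexible. Use Flex Templates for production custom pipelines.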
Q2: When would you use pre-built vs. custom templates?
Answer: Use pre-built templates for common patterns (Pub/Sub to BigQuery, GCS to BigQuery). Use custom templates when you need custom transformations, specific error handling, or integration with multiple services not covered by pre-built templates.
Q3: How do you version control Dataflow templates?
Answer: Store pipeline code in Git, build container images with semantic version tags, push them to Artifact Registry, and build each template spec against a specific image tag rather than a floating tag such as latest. Use a CI/CD pipeline (for example, Cloud Build) to automate template builds.
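The versioning flow in this answer could be scripted roughly as follows. This sketch only assembles the `gcloud` argument list and does not execute it. The function names, the repository name `my-repo` in the usage note, and the one-spec-file-per-version bucket layout are all hypothetical, and the version check is a simplified MAJOR.MINOR.PATCH pattern rather than the full SemVer grammar.

```python
from __future__ import annotations

import re

# Simplified MAJOR.MINOR.PATCH check; not the full SemVer grammar.
_VERSION = re.compile(r"\d+\.\d+\.\d+")


def image_uri(location: str, project: str, repo: str, image: str, version: str) -> str:
    """Artifact Registry Docker image URI pinned to a single version tag."""
    if not _VERSION.fullmatch(version):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return f"{location}-docker.pkg.dev/{project}/{repo}/{image}:{version}"


def build_command(bucket: str, template: str, version: str, image: str) -> list[str]:
    """Argument list for `gcloud dataflow flex-templates build`.

    The spec path layout (one JSON file per version) is an assumption
    of this sketch, not a Dataflow requirement.
    """
    spec_path = f"gs://{bucket}/{template}/{version}.json"
    return [
        "gcloud", "dataflow", "flex-templates", "build", spec_path,
        f"--image={image}",
        "--sdk-language=PYTHON",
        "--metadata-file=metadata.json",
    ]
```

For example, `image_uri("us-central1", "my-project", "my-repo", "pipeline", "1.4.2")` yields an image reference that can be passed to `build_command`, and the resulting list can be handed to `subprocess.run` in a CI step. Rejecting tags such as `latest` keeps each template spec tied to one immutable image.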
Q4: What are the benefits of Flex Templates?
Answer: 1) Container-based deployment, 2) Parameter validation, 3) Better versioning, 4) Custom runtime dependencies, 5) Integration with Artifact Registry, 6) Support for multiple SDKs.
Q5: How do you debug template deployment issues?
Answer: 1) Check Cloud Build logs for container build errors, 2) Verify template metadata parameters, 3) Check Dataflow job logs, 4) Validate IAM permissions, 5) Test pipeline locally with DirectRunner first.
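Before a DirectRunner run, the parse and transform logic can be checked as plain functions, with no Beam installation needed. The sketch below assumes the two lambdas from pipeline.py are pulled out into a hypothetical `to_row` function, and adds a hypothetical `safe_to_row` wrapper for malformed messages. Neither name appears in the original pipeline.

```python
from __future__ import annotations

import json


def to_row(message: bytes) -> dict:
    """Decode one Pub/Sub payload and map it to the BigQuery row shape."""
    event = json.loads(message.decode("utf-8"))
    return {
        "event_id": event["id"],
        "event_type": event["type"],
        "timestamp": event["timestamp"],
    }


def safe_to_row(message: bytes) -> tuple[dict | None, str | None]:
    """Return (row, None) on success, or (None, error text) on bad input."""
    try:
        return to_row(message), None
    except (UnicodeDecodeError, json.JSONDecodeError, KeyError, TypeError) as exc:
        return None, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    sample = json.dumps(
        {"id": "e1", "type": "click", "timestamp": "2024-01-01T00:00:00Z"}
    ).encode("utf-8")
    print(to_row(sample))
    print(safe_to_row(b"not json"))
```

In the pipeline, `to_row` could replace the 'Parse' and 'Transform' steps with a single `beam.Map(to_row)`, so the same function is exercised both in unit checks and on Dataflow.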