🎉 75% of content is free forever — Unlock Premium from $10/mo →
CW
Search courses…
💼 Servicesℹ️ About✉️ ContactView Pricing Plansfrom $10

Generators, Iterators, yield, __iter__, __next__

PythonIterators & Generators⭐ Premium

Advertisement

Google, Meta & Amazon Interview

Generators, Iterators, yield, iter, next

Understanding iteration protocols and lazy evaluation in Python

Interview Question

"Explain the difference between iterators and generators. How does yield work internally? Implement custom iterators and generators for real-world use cases. Discuss memory efficiency and performance implications."

Difficulty: Hard | Frequently asked at Google, Meta, Amazon, Microsoft


Theoretical Foundation

Iterator Protocol

The iterator protocol consists of two methods:

  • __iter__(): Returns the iterator object itself
  • __next__(): Returns the next value or raises StopIteration
# Basic iterator implementation
class CountUp:
    """Custom iterator that counts up to a limit."""
    
    def __init__(self, start=0, limit=10):
        self.current = start
        self.limit = limit
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

# Usage
counter = CountUp(start=1, limit=5)
for num in counter:
    print(num, end=" ")
# Output: 1 2 3 4 5

# Manual iteration
counter2 = CountUp(start=1, limit=3)
print(next(counter2))  # 1
print(next(counter2))  # 2
print(next(counter2))  # 3
# next(counter2) would raise StopIteration

Iterables vs Iterators

# Iterable: Has __iter__() method
# Iterator: Has __iter__() AND __next__() methods

class MyIterable:
    """Example of an iterable."""
    
    def __init__(self, data):
        self.data = data
    
    def __iter__(self):
        # Returns a new iterator each time
        return MyIterator(self.data)

class MyIterator:
    """Example of an iterator."""
    
    def __init__(self, data):
        self.data = data
        self.index = 0
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        value = self.data[self.index]
        self.index += 1
        return value

# Usage
my_iterable = MyIterable([1, 2, 3])

# Can create multiple independent iterators
for item in my_iterable:
    print(item, end=" ")  # 1 2 3
print()

for item in my_iterable:  # Works again!
    print(item, end=" ")  # 1 2 3

ℹ️

Key Concept: Iterables can create multiple independent iterators, while iterators are consumed once.


Generators

Generator Functions

Generators are functions that use yield to produce values lazily.

def countdown(n):
    """Generator function for countdown."""
    print("Starting countdown")
    while n > 0:
        yield n  # Pauses here
        n -= 1
    print("Done!")

# Usage
gen = countdown(3)
print(next(gen))  # Starting countdown, 3
print(next(gen))  # 2
print(next(gen))  # 1
# next(gen) would print "Done!" and raise StopIteration

# In for loop
for num in countdown(3):
    print(num, end=" ")
# Output: Starting countdown 3 2 1 Done!

Generator Expressions

# List comprehension (creates full list in memory)
squares_list = [x*x for x in range(1000000)]
print(f"List size: {len(squares_list)} items")
print(f"Memory: {squares_list.__sizeof__()} bytes")

# Generator expression (lazy evaluation)
squares_gen = (x*x for x in range(1000000))
print(f"Generator: {squares_gen}")
print(f"Memory: {squares_gen.__sizeof__()} bytes")

# Process one item at a time
total = sum(x*x for x in range(1000000))
print(f"Total: {total}")

Output:

Architecture Diagram
List size: 1000000 items
Memory: 8448728 bytes
Generator: <generator object <genexpr> at 0x...>
Memory: 200 bytes
Total: 333333333333500000

How yield Works Internally

def simple_generator():
    """Step-by-step yield execution."""
    print("Step 1: Before first yield")
    yield 1
    print("Step 2: After first yield, before second")
    yield 2
    print("Step 3: After second yield")
    return "Done"  # Optional return value

# Demonstrate execution flow
gen = simple_generator()

print("Creating generator (nothing executes yet)")
print(f"First next(): ", end="")
print(next(gen))  # Executes until first yield

print(f"Second next(): ", end="")
print(next(gen))  # Resumes and executes until second yield

print(f"Third next(): ", end="")
try:
    next(gen)  # Resumes and executes until return/StopIteration
except StopIteration as e:
    print(f"StopIteration with value: {e.value}")

Output:

Architecture Diagram
Creating generator (nothing executes yet)
First next(): Step 1: Before first yield
1
Second next(): Step 2: After first yield, before second
2
Third next(): Step 3: After second yield
StopIteration with value: Done

💡

Key Insight: yield pauses the generator's execution and saves its state. next() resumes from where it left off.


Memory Efficiency

Generator vs List

import sys
import time

def memory_comparison():
    """Compare memory usage of generators vs lists."""
    
    n = 1000000
    
    # List: Creates all items in memory
    start = time.time()
    list_data = [i * i for i in range(n)]
    list_time = time.time() - start
    list_memory = sys.getsizeof(list_data)
    
    # Generator: Creates items on demand
    start = time.time()
    gen_data = (i * i for i in range(n))
    gen_time = time.time() - start
    gen_memory = sys.getsizeof(gen_data)
    
    print(f"List: {list_memory:,} bytes, {list_time:.4f}s")
    print(f"Generator: {gen_memory:,} bytes, {gen_time:.4f}s")
    print(f"Memory savings: {(1 - gen_memory/list_memory) * 100:.1f}%")

memory_comparison()

Output:

Architecture Diagram
List: 8,448,728 bytes, 0.0312s
Generator: 200 bytes, 0.0001s
Memory savings: 100.0%

Infinite Sequences

def fibonacci():
    """Infinite Fibonacci sequence."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def primes():
    """Infinite prime number generator."""
    yield 2
    candidate = 3
    while True:
        is_prime = True
        for i in range(2, int(candidate ** 0.5) + 1):
            if candidate % i == 0:
                is_prime = False
                break
        if is_prime:
            yield candidate
        candidate += 2

# Get first 10 Fibonacci numbers
fib = fibonacci()
first_10_fib = [next(fib) for _ in range(10)]
print(f"First 10 Fibonacci: {first_10_fib}")

# Get first 10 primes
prime_gen = primes()
first_10_primes = [next(prime_gen) for _ in range(10)]
print(f"First 10 Primes: {first_10_primes}")

Output:

Architecture Diagram
First 10 Fibonacci: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
First 10 Primes: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

ℹ️

Memory Efficiency: Generators can represent infinite sequences with O(1) memory, while lists would consume all available memory.


Advanced Generator Patterns

Generator Pipeline

def read_large_file(file_path):
    """Read file line by line (memory efficient)."""
    with open(file_path, 'r') as f:
        for line in f:
            yield line.strip()

def filter_comments(lines):
    """Filter out comment lines."""
    for line in lines:
        if not line.startswith('#'):
            yield line

def parse_csv(lines):
    """Parse CSV lines."""
    for line in lines:
        yield line.split(',')

def transform_data(records):
    """Transform data."""
    for record in records:
        yield {
            'name': record[0],
            'age': int(record[1]),
            'email': record[2]
        }

# Pipeline: file -> filter -> parse -> transform
def process_large_file(file_path):
    """Process large file with generator pipeline."""
    lines = read_large_file(file_path)
    non_comments = filter_comments(lines)
    csv_data = parse_csv(non_comments)
    transformed = transform_data(csv_data)
    return transformed

# Usage
# for record in process_large_file('large_data.csv'):
#     process(record)

Generator with send()

def accumulator():
    """Generator that accumulates values using send()."""
    total = 0
    while True:
        value = yield total
        if value is None:
            break
        total += value

# Usage
acc = accumulator()
next(acc)  # Prime the generator
print(acc.send(10))   # 10
print(acc.send(20))   # 30
print(acc.send(30))   # 60

# Advanced: Average calculator
def running_average():
    """Calculate running average using send()."""
    total = 0
    count = 0
    average = 0
    while True:
        value = yield average
        if value is None:
            break
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)  # Prime
print(avg.send(10))   # 10.0
print(avg.send(20))   # 15.0
print(avg.send(30))   # 20.0

Generator Delegation with yield from

def flatten_nested(nested_list):
    """Flatten arbitrarily nested lists using yield from."""
    for item in nested_list:
        if isinstance(item, (list, tuple)):
            yield from flatten_nested(item)
        else:
            yield item

# Usage
nested = [1, [2, 3], [4, [5, 6]], [[7, 8], 9]]
flat = list(flatten_nested(nested))
print(f"Flattened: {flat}")

# Generator delegation
def outer_generator():
    """Delegate to inner generators."""
    yield from inner_generator_1()
    yield from inner_generator_2()

def inner_generator_1():
    yield 1
    yield 2

def inner_generator_2():
    yield 3
    yield 4

# Usage
for item in outer_generator():
    print(item, end=" ")  # 1 2 3 4

💡

Interview Tip: yield from is a powerful tool for generator delegation and flattening nested structures.


Real-World Use Cases

Database Pagination

def fetch_paginated(url, page_size=100):
    """Fetch paginated data from API."""
    page = 1
    while True:
        # In real code: response = requests.get(f"{url}?page={page}&size={page_size}")
        # Simulate API response
        data = [{"id": i, "page": page} for i in range(page_size)]
        
        if not data:
            break
        
        yield from data
        page += 1

# Usage
# for record in fetch_paginated("https://api.example.com/users"):
#     process(record)

File Processing

def read_in_chunks(file_obj, chunk_size=8192):
    """Read file in chunks for memory efficiency."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            break
        yield chunk

def process_large_file(filename):
    """Process large file chunk by chunk."""
    with open(filename, 'rb') as f:
        for chunk in read_in_chunks(f):
            # Process chunk
            yield len(chunk)

# Usage
# for chunk_size in process_large_file("huge_file.bin"):
#     print(f"Processed {chunk_size} bytes")

Data Stream Processing

import time
from collections import deque

def tail_log(file_path, n=10):
    """Tail last n lines of a log file (like Unix tail -f)."""
    with open(file_path, 'r') as f:
        # Read last n lines
        last_lines = deque(maxlen=n)
        for line in f:
            last_lines.append(line)
        
        yield from last_lines
        
        # Follow new lines
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.1)
                continue
            yield line

# Usage
# for line in tail_log("/var/log/app.log"):
#     print(line, end="")

Web Scraping

import re

def extract_links(html_content):
    """Extract links from HTML content."""
    link_pattern = r'href=["\'](.*?)["\']'
    for match in re.finditer(link_pattern, html_content):
        yield match.group(1)

def crawl(start_url, max_depth=2):
    """Simple web crawler using generators."""
    visited = set()
    queue = [(start_url, 0)]
    
    while queue:
        url, depth = queue.pop(0)
        
        if url in visited or depth > max_depth:
            continue
        
        visited.add(url)
        
        # In real code: response = requests.get(url)
        # html = response.text
        html = f"<a href='http://example.com/page{i}'>Link {i}</a>"
        
        yield {'url': url, 'depth': depth, 'links': list(extract_links(html))}
        
        # Add new URLs to queue
        for link in extract_links(html):
            if link not in visited:
                queue.append((link, depth + 1))

ℹ️

Real-World Pattern: Generators are essential for processing large datasets, streaming data, and memory-efficient operations.


Itertools for Advanced Iteration

import itertools
import operator

def itertools_examples():
    """Demonstrate powerful itertools functions."""
    
    # chain: Combine multiple iterables
    list1 = [1, 2, 3]
    list2 = [4, 5, 6]
    chained = itertools.chain(list1, list2)
    print(f"chain: {list(chained)}")
    
    # islice: Slice an iterator
    def infinite_counter():
        n = 0
        while True:
            yield n
            n += 1
    
    first_10 = list(itertools.islice(infinite_counter(), 10))
    print(f"islice: {first_10}")
    
    # groupby: Group consecutive elements
    data = [('A', 1), ('A', 2), ('B', 3), ('B', 4), ('A', 5)]
    grouped = itertools.groupby(data, key=lambda x: x[0])
    for key, group in grouped:
        print(f"groupby {key}: {list(group)}")
    
    # product: Cartesian product
    colors = ['red', 'blue']
    sizes = ['S', 'M', 'L']
    products = list(itertools.product(colors, sizes))
    print(f"product: {products}")
    
    # permutations and combinations
    items = ['A', 'B', 'C']
    perms = list(itertools.permutations(items, 2))
    print(f"permutations: {perms}")
    
    combos = list(itertools.combinations(items, 2))
    print(f"combinations: {combos}")
    
    # accumulate: Running totals
    numbers = [1, 2, 3, 4, 5]
    accumulated = list(itertools.accumulate(numbers))
    print(f"accumulate: {accumulated}")

itertools_examples()

Output:

Architecture Diagram
chain: [1, 2, 3, 4, 5, 6]
islice: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
groupby A: [('A', 1), ('A', 2)]
groupby B: [('B', 3), ('B', 4)]
groupby A: [('A', 5)]
product: [('red', 'S'), ('red', 'M'), ('red', 'L'), ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]
permutations: [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
combinations: [('A', 'B'), ('A', 'C'), ('B', 'C')]
accumulate: [1, 3, 6, 10, 15]

Performance Analysis

Time Complexity

OperationListGeneratorNotes
CreationO(n)O(1)Generator is lazy
First elementO(1)O(1)
All elementsO(n)O(n)Same
Random accessO(1)O(n)Generator must iterate
MemoryO(n)O(1)Generator is better

Space Complexity

import sys

def space_comparison():
    """Compare space complexity."""
    
    n = 1000000
    
    # List: O(n) space
    list_data = [i for i in range(n)]
    print(f"List memory: {sys.getsizeof(list_data):,} bytes")
    
    # Generator: O(1) space
    gen_data = (i for i in range(n))
    print(f"Generator memory: {sys.getsizeof(gen_data)} bytes")
    
    # Even better: Use itertools
    import itertools
    gen_itertools = itertools.count()
    print(f"itertools memory: {sys.getsizeof(gen_itertools)} bytes")

space_comparison()

Output:

Architecture Diagram
List memory: 8,448,728 bytes
Generator memory: 200 bytes
itertools memory: 56 bytes

⚠️

Performance Trade-off: Generators save memory but don't support random access. Choose based on your access pattern.


Common Patterns and Idioms

Generator Comprehension

# Generator comprehension syntax
squares = (x*x for x in range(10))
even_squares = (x for x in squares if x % 2 == 0)

# Chaining generators
def process_data(data):
    return (
        transform(item)
        for item in data
        if item is not None
        and item > 0
    )

def transform(x):
    return x * 2

StopIteration for Flow Control

def first_positive(numbers):
    """Return first positive number using StopIteration."""
    def _generator():
        for n in numbers:
            if n > 0:
                yield n
                return  # Raises StopIteration with None
    
    return next(_generator(), None)

# Alternative: Custom exception
class Found(Exception):
    def __init__(self, value):
        self.value = value

def find_in_nested(data, target):
    """Find target in nested structure."""
    def _search(obj):
        if isinstance(obj, list):
            for item in obj:
                _search(item)
        elif obj == target:
            raise Found(obj)
    
    try:
        _search(data)
        return None
    except Found as e:
        return e.value

result = find_in_nested([1, [2, 3], [4, [5]]], 3)
print(f"Found: {result}")

Context Manager Generators

from contextlib import contextmanager

@contextmanager
def managed_resource(name):
    """Context manager using generator."""
    print(f"Acquiring {name}")
    resource = {"name": name, "active": True}
    try:
        yield resource
    finally:
        print(f"Releasing {name}")
        resource["active"] = False

# Usage
with managed_resource("database") as res:
    print(f"Using {res}")
    # Do work
print(f"After: {res}")

💡

Interview Tip: Generators + context managers = powerful patterns for resource management.


Interview Tips

Common Follow-up Questions

  1. "What's the difference between return and yield in a generator?"

    • yield produces a value and pauses
    • return raises StopIteration (can optionally carry a value)
    • return in generators (Python 3.3+) sets StopIteration.value
  2. "How do you send values to a generator?"

    • Use generator.send(value)
    • Generator must be primed with next() first
    • yield expression returns the sent value
  3. "What are generator pipelines?"

    • Chain multiple generators together
    • Each generator processes data and passes to next
    • Memory efficient for large data processing

Code Review Tips

# BAD: Creating large list
def get_data():
    return [process(x) for x in large_dataset]

# GOOD: Generator for large data
def get_data():
    return (process(x) for x in large_dataset)

# BAD: Not handling StopIteration
def first_item(iterable):
    return next(iterable)

# GOOD: Handling empty iterable
def first_item(iterable, default=None):
    return next(iterable, default)

# BAD: Consuming generator multiple times
def process():
    gen = (x for x in range(10))
    list1 = list(gen)
    list2 = list(gen)  # Empty!

# GOOD: Create new generator
def process():
    def gen():
        return (x for x in range(10))
    list1 = list(gen())
    list2 = list(gen())  # Works!

⚠️

Common Mistake: Generators are consumed once. Create new generators if you need multiple passes.


Summary

FeatureIteratorGeneratorGenerator Expression
SyntaxClass with __iter__, __next__Function with yield(expr for x in iterable)
MemoryO(n) or O(1)O(1)O(1)
StateManual managementAutomaticAutomatic
ComplexityMore codeLess codeLeast code
FlexibilityFull controlLimitedLimited

When to Use What?

  • Iterators: When you need full control over iteration
  • Generators: For lazy evaluation and memory efficiency
  • Generator Expressions: For simple transformations
  • itertools: For complex iteration patterns

ℹ️

Key Takeaway: Generators and iterators are fundamental to Python's memory efficiency. Master them for writing Pythonic, performant code.


Practice Problems

  1. Custom Range: Implement a range() function using an iterator
  2. Flatten Nested: Create a generator to flatten arbitrarily nested lists
  3. Window Sliding: Implement a sliding window generator
  4. Chunk Splitter: Create a generator that splits data into chunks
  5. Infinite Buffer: Implement a ring buffer using generators

Further Reading

  • PEP 255: Simple Generators
  • PEP 342: Coroutines via Enhanced Generators
  • PEP 380: Syntax for Delegating to a Sub-Generator
  • Books: "Fluent Python" by Luciano Ramalho

Remember: Generators are lazy iterators that yield values one at a time. They're essential for memory-efficient processing of large datasets.

Advertisement