🎉 75% of content is free forever — Unlock Premium from $10/mo →
CW
Search courses…
💼 Servicesℹ️ About✉️ ContactView Pricing Plansfrom $10

Python Performance — Profiling & Optimization

Python AdvancedPerformance🟢 Free Lesson

Advertisement

Python Performance — Profiling & Optimization

Understanding performance helps you write faster, more efficient Python code. Always profile first to find actual bottlenecks before optimizing.

Learning Objectives

  • Profile code to find bottlenecks with cProfile and line_profiler
  • Use timeit for micro-benchmarks
  • Apply common Python performance optimizations
  • Use memory profiling tools and generators for memory efficiency
  • Understand Big O notation for Python operations
  • Optimize data structures and algorithms

timeit — Micro-benchmarks

Measure execution time of small code snippets:

import timeit

# Time a statement
time = timeit.timeit('sum(range(1000))', number=10000)
print(f"Total: {time:.4f}s")  # ~0.5s

# Time a function
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

time = timeit.timeit(lambda: fib(30), number=10)
print(f"Fib(30) x10: {time:.4f}s")

# Compare approaches
list_time = timeit.timeit(
    '[i ** 2 for i in range(1000)]',
    number=10000
)
gen_time = timeit.timeit(
    'sum(i ** 2 for i in range(1000))',
    number=10000
)
print(f"List comprehension: {list_time:.4f}s")
print(f"Generator: {gen_time:.4f}s")

Timeit from Command Line

# Time a statement
python -m timeit "sum(range(1000))"

# Time with setup code
python -m timeit -s "import random; data = [random.randint(0,100) for _ in range(1000)]" "sorted(data)"

# Compare multiple approaches
python -m timeit "l = [1,2,3,4,5]; l.append(6)"
python -m timeit "l = [1,2,3,4,5]; l + [6]"

cProfile — Function-Level Profiling

Find which functions take the most time:

import cProfile
import pstats

def slow_function():
    """Simulate slow code."""
    total = 0
    for i in range(1000000):
        total += i ** 2
    return total

def another_slow():
    """Another slow function."""
    data = [i for i in range(500000)]
    return sum(data)

# Profile execution
cProfile.run('slow_function(); another_slow()')

# More detailed profiling
profiler = cProfile.Profile()
profiler.enable()

slow_function()
another_slow()

profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')  # Sort by cumulative time
stats.print_stats(10)  # Show top 10 functions

Profiling to File

import cProfile
import pstats

# Profile and save results
cProfile.run('slow_function()', 'profile_output')

# Load and analyze later
stats = pstats.Stats('profile_output')
stats.strip_dirs()
stats.sort_stats('cumulative')
stats.print_stats(20)

# Save report to file
with open('profile_report.txt', 'w') as f:
    stats = pstats.Stats('profile_output', stream=f)
    stats.sort_stats('cumulative')
    stats.print_stats(20)

line_profiler — Line-by-Line Profiling

Find exactly which lines are slow:

# Install: pip install line_profiler
# Add @profile decorator
@profile
def process_data(data):
    filtered = [x for x in data if x > 0]         # Line 4
    squared = [x ** 2 for x in filtered]           # Line 5
    total = sum(squared)                            # Line 6
    average = total / len(filtered) if filtered else 0  # Line 7
    return average                                  # Line 8

# Run: kernprof -l -v script.py

Practical Line Profiling

# profile_example.py
import random

def generate_data(n):
    return [random.randint(0, 1000) for _ in range(n)]

def filter_data(data):
    return [x for x in data if x > 100]

def transform_data(data):
    return [x ** 2 for x in data]

def aggregate_data(data):
    return sum(data) / len(data) if data else 0

def main():
    data = generate_data(1000000)
    filtered = filter_data(data)
    transformed = transform_data(filtered)
    result = aggregate_data(transformed)
    return result

if __name__ == '__main__':
    main()

# Profile with: python -m line_profiler profile_example.py

Memory Profiling

# Install: pip install memory_profiler

from memory_profiler import profile

@profile
def memory_intensive():
    """Function with memory issues."""
    # List comprehension - high memory
    large_list = [i ** 2 for i in range(1000000)]

    # Filter creates another list
    filtered = [x for x in large_list if x % 2 == 0]

    # This doubles memory usage
    result = sum(filtered)

    return result

# Run: python -m memory_profiler script.py
# Output shows memory usage per line

Check Object Size

import sys

# Check size of different objects
print(f"int: {sys.getsizeof(42)} bytes")
print(f"float: {sys.getsizeof(3.14)} bytes")
print(f"string: {sys.getsizeof('hello')} bytes")
print(f"list: {sys.getsizeof([1, 2, 3])} bytes")
print(f"dict: {sys.getsizeof({'a': 1})} bytes")
print(f"set: {sys.getsizeof({1, 2, 3})} bytes")

# Large lists
big_list = list(range(1000000))
print(f"List of 1M ints: {sys.getsizeof(big_list)} bytes")
# ~8MB for the list object itself (each int is a separate object)

Common Python Optimizations

Use Sets for Membership Testing

import time

# Slow: O(n) lookup in list
large_list = list(range(1000000))
start = time.time()
999999 in large_list
print(f"List lookup: {time.time() - start:.6f}s")

# Fast: O(1) lookup in set
large_set = set(range(1000000))
start = time.time()
999999 in large_set
print(f"Set lookup: {time.time() - start:.6f}s")
# Sets are ~1000x faster for membership testing

Use Generators for Large Datasets

import sys

# List comprehension - loads all into memory
list_comp = [i ** 2 for i in range(1000000)]
print(f"List size: {sys.getsizeof(list_comp)} bytes")  # ~8MB

# Generator expression - lazy evaluation
gen_expr = (i ** 2 for i in range(1000000))
print(f"Generator size: {sys.getsizeof(gen_expr)} bytes")  # ~200 bytes

# Use generators for aggregation
total = sum(i ** 2 for i in range(1000000))  # Memory efficient

Use List Comprehensions Over Loops

import time

# Slow: manual loop
start = time.time()
result = []
for i in range(100000):
    result.append(i * 2)
print(f"Loop: {time.time() - start:.4f}s")

# Fast: list comprehension
start = time.time()
result = [i * 2 for i in range(100000)]
print(f"Comprehension: {time.time() - start:.4f}s")
# Comprehensions are ~30% faster due to optimized bytecode

Use Built-in Functions and Libraries

import time

data = list(range(100000))

# Slow: manual sum
start = time.time()
total = 0
for x in data:
    total += x
print(f"Manual sum: {time.time() - start:.4f}s")

# Fast: built-in sum
start = time.time()
total = sum(data)
print(f"Built-in sum: {time.time() - start:.4f}s")

# Fast: built-in sort
start = time.time()
sorted_data = sorted(data)
print(f"Built-in sort: {time.time() - start:.4f}s")

String Concatenation

import time

# Slow: string concatenation in loop
start = time.time()
result = ""
for i in range(10000):
    result += str(i)
print(f"Concatenation: {time.time() - start:.4f}s")

# Fast: join method
start = time.time()
result = "".join(str(i) for i in range(10000))
print(f"Join: {time.time() - start:.4f}s")

Big O Notation for Python

# O(1) — Constant time
def get_first(items):
    return items[0]  # Always one operation

# O(n) — Linear time
def find_item(items, target):
    for item in items:  # Checks each item
        if item == target:
            return True
    return False

# O(n²) — Quadratic time
def bubble_sort(items):
    n = len(items)
    for i in range(n):
        for j in range(n):  # Nested loop
            if items[j] > items[j+1]:
                items[j], items[j+1] = items[j+1], items[j]

# O(log n) — Logarithmic time
def binary_search(sorted_items, target):
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

# Python operation complexities:
# List append: O(1)
# List index: O(1)
# List contains: O(n)
# Dict get: O(1) average
# Dict contains: O(1) average
# Set contains: O(1) average
# Set add: O(1) average

Memory Optimization with slots

import sys

# Without __slots__
class PointRegular:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# With __slots__
class PointOptimized:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Compare memory usage
regular = PointRegular(1, 2)
optimized = PointOptimized(1, 2)

print(f"Regular: {sys.getsizeof(regular)} bytes + dict: {sys.getsizeof(regular.__dict__)} bytes")
print(f"Optimized: {sys.getsizeof(optimized)} bytes")
# Optimized uses ~40% less memory per instance

# For millions of objects, this adds up significantly

Real-World Examples

Example 1: Optimizing a Slow Function

import time
from collections import Counter

# SLOW VERSION
def count_words_slow(text):
    """Count word frequencies - slow version."""
    words = text.lower().split()
    word_count = {}
    for word in words:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    return word_count

# OPTIMIZED VERSION
def count_words_fast(text):
    """Count word frequencies - fast version."""
    return Counter(text.lower().split())

# Even faster for large texts
def count_words_optimized(text):
    """Use Counter with generator for memory efficiency."""
    return Counter(word for word in text.lower().split() if word)

# Benchmark
text = "hello world " * 100000

start = time.time()
result1 = count_words_slow(text)
print(f"Slow: {time.time() - start:.4f}s")

start = time.time()
result2 = count_words_fast(text)
print(f"Fast: {time.time() - start:.4f}s")

Example 2: Database Query Optimization

import sqlite3
import time

def slow_query():
    """Inefficient database query pattern."""
    conn = sqlite3.connect('app.db')
    cursor = conn.cursor()

    # Bad: N+1 query problem
    users = cursor.execute("SELECT * FROM users").fetchall()
    for user in users:
        # This runs a query for EACH user!
        orders = cursor.execute(
            "SELECT * FROM orders WHERE user_id = ?", (user[0],)
        ).fetchall()

    return users

def fast_query():
    """Efficient database query pattern."""
    conn = sqlite3.connect('app.db')
    cursor = conn.cursor()

    # Good: Single JOIN query
    results = cursor.execute('''
        SELECT u.*, GROUP_CONCAT(o.id) as order_ids
        FROM users u
        LEFT JOIN orders o ON u.id = o.user_id
        GROUP BY u.id
    ''').fetchall()

    return results

Example 3: Caching Expensive Computations

import time
from functools import lru_cache

# Without caching
def fibonacci_slow(n):
    if n < 2:
        return n
    return fibonacci_slow(n-1) + fibonacci_slow(n-2)

# With caching
@lru_cache(maxsize=None)
def fibonacci_fast(n):
    if n < 2:
        return n
    return fibonacci_fast(n-1) + fibonacci_fast(n-2)

# Benchmark
start = time.time()
result1 = fibonacci_slow(35)
print(f"Slow fib(35): {result1} in {time.time() - start:.4f}s")

start = time.time()
result2 = fibonacci_fast(35)
print(f"Fast fib(35): {result2} in {time.time() - start:.4f}s")
# Caching makes it ~1000x faster

# Manual caching
cache = {}
def fibonacci_manual(n):
    if n in cache:
        return cache[n]
    if n < 2:
        return n
    result = fibonacci_manual(n-1) + fibonacci_manual(n-2)
    cache[n] = result
    return result

Common Mistakes

MistakeProblemSolution
Optimizing without profilingWastes time on non-bottlenecksProfile first to find actual bottlenecks
Using lists for membership testingO(n) lookupUse sets for O(1) lookup
String concatenation in loopsO(n²) timeUse "".join()
Not using generatorsHigh memory usageUse generators for large datasets
Ignoring built-in functionsSlower than C implementationsUse sum(), min(), max(), sorted()
Not using __slots__High memory per instanceUse __slots__ for many instances

Best Practices

# 1. Profile before optimizing
import cProfile
cProfile.run('my_function()')

# 2. Use appropriate data structures
# Need fast lookup? Use dict or set
# Need ordered data? Use list
# Need sorted data? Use heapq or sorted()

# 3. Use built-in functions
total = sum(data)
minimum = min(data)
maximum = max(data)
sorted_data = sorted(data)

# 4. Use generators for memory efficiency
total = sum(x**2 for x in range(1000000))

# 5. Cache expensive computations
from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_computation(n):
    return n ** n

# 6. Use __slots__ for many instances
class Point:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x = x
        self.y = y

Key Takeaways

  1. Profile first — don't optimize blindly; find the actual bottleneck
  2. Use sets for O(1) membership testing instead of lists
  3. Use generators for memory-efficient iteration over large datasets
  4. Use list comprehensions over manual loops — they're ~30% faster
  5. Use built-in functions (sum, min, max, sorted) — they're implemented in C
  6. Use __slots__ to reduce memory overhead for classes with many instances
  7. Cache expensive computations with @lru_cache for repeated calls

Premium Content

Python Performance — Profiling & Optimization

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
💼Interview Prep
📜Certificates
🤝Community Access

Already a member? Log in

Need Expert Python Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement