Integration Fundamentals
Why It Matters
Integration is the reverse operation of differentiation and one of the two pillars of calculus. While derivatives measure rates of change, integrals accumulate quantities: areas under curves, volumes of solids, total probability, and expected values. In machine learning, integration is indispensable: probability density functions are normalized by integrating to 1, expected values are computed as integrals, Bayesian inference requires marginalizing over posterior distributions, and variational inference approximates intractable integrals. Every time you compute the probability that a continuous random variable falls in a range, you are performing integration. Every time you sample from a distribution or estimate an expectation, integration is happening underneath. Mastering integration means understanding the mathematical foundation behind probability, statistics, and the training of probabilistic models.
What is an Integral
Definition: Indefinite Integral (Antiderivative)
The indefinite integral of a function $f(x)$ is a family of functions $F(x) + C$ whose derivative is $f(x)$. It is written $\int f(x)\,dx = F(x) + C$, where $C$ is an arbitrary constant of integration representing the fact that infinitely many antiderivatives differ only by a constant.
Definition: Definite Integral
The definite integral of $f$ from $a$ to $b$ is the signed area between the curve $y = f(x)$ and the $x$-axis over the interval $[a, b]$. It is defined as the limit of Riemann sums:
Riemann Sum Definition
$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i^*)\,\Delta x$$
Here,
- $\Delta x$ = Width of each subinterval: $(b - a) / n$
- $x_i^*$ = A sample point in the $i$-th subinterval
- $f(x_i^*)$ = Function value at the sample point
- $a, b$ = The lower and upper limits of integration
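The limit in the Riemann sum definition can be watched numerically. The following is a minimal sketch, assuming midpoints as the sample points and $\sin x$ on $[0, \pi]$ as an illustrative integrand (exact value 2); the helper name `riemann_sum` is hypothetical.

```python
import numpy as np

def riemann_sum(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b] using n equal subintervals."""
    dx = (b - a) / n                            # width of each subinterval
    midpoints = a + (np.arange(n) + 0.5) * dx   # sample point in each subinterval
    return float(np.sum(f(midpoints)) * dx)

# The sums should approach the exact value 2 as n grows.
for n in (10, 100, 1000):
    value = riemann_sum(np.sin, 0.0, np.pi, n)
    print(f"n={n:5d}  sum={value:.8f}  error={abs(value - 2.0):.2e}")
```

Any other choice of sample point (left or right endpoints, for example) converges to the same limit for continuous functions, only at a different speed.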
Definite vs Indefinite
An indefinite integral produces a family of functions ($F(x) + C$), while a definite integral produces a number (the signed area). The Fundamental Theorem of Calculus connects them: the definite integral equals the antiderivative evaluated at the upper bound minus its value at the lower bound.
Not All Functions Have Elementary Antiderivatives
Many common functions, such as $e^{-x^2}$, $\frac{\sin x}{x}$, and $\frac{1}{\ln x}$, do not have closed-form antiderivatives in terms of elementary functions. For these, we rely on numerical integration or special functions (e.g., the error function $\operatorname{erf}(x)$).
Fundamental Theorem of Calculus
Theorem: Fundamental Theorem of Calculus (Part 1)
If $f$ is continuous on $[a, b]$ and $F(x) = \int_a^x f(t)\,dt$, then $F$ is differentiable on $(a, b)$ and $F'(x) = f(x)$. In other words, differentiation undoes integration.
Theorem: Fundamental Theorem of Calculus (Part 2)
If $f$ is continuous on $[a, b]$ and $F$ is any antiderivative of $f$ (i.e., $F'(x) = f(x)$), then:
Fundamental Theorem of Calculus
$$\int_a^b f(x)\,dx = F(b) - F(a)$$
Here,
- $f(x)$ = The integrand, the function being integrated
- $F(x)$ = An antiderivative of $f$, where $F'(x) = f(x)$
- $a$ = The lower limit of integration
- $b$ = The upper limit of integration
- $F(b) - F(a)$ = The net change of $F$ over $[a, b]$
Intuition
Part 1 says that if you build a function by integrating $f$ from a fixed point $a$ to a variable point $x$, the rate at which this accumulated area grows is exactly $f(x)$. Part 2 says that to compute the accumulated area, you only need any antiderivative: evaluate it at the top bound and subtract its value at the bottom bound. This transforms the hard problem of summing infinitely many infinitesimal contributions into the easy problem of evaluating a function at two points.
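Both parts can be checked numerically. This is a rough sketch, assuming $f(x) = \cos x$ with antiderivative $\sin x$ as an illustrative pair; `accumulated_area` is a hypothetical helper that plays the role of the accumulation function in Part 1.

```python
import numpy as np
from scipy import integrate

f = np.cos  # illustrative integrand; its antiderivative is sin

def accumulated_area(x, a=0.0):
    """Area under f from the fixed point a to the variable point x."""
    value, _ = integrate.quad(f, a, x)
    return value

x0, h = 1.2, 1e-5

# Part 1: the derivative of the accumulated area should equal f(x0).
# A central difference approximates that derivative.
derivative = (accumulated_area(x0 + h) - accumulated_area(x0 - h)) / (2 * h)
print(f"d/dx of accumulated area at {x0}: {derivative:.8f}  f(x0): {f(x0):.8f}")

# Part 2: the accumulated area should equal sin(x0) - sin(0).
net_change = np.sin(x0) - np.sin(0.0)
print(f"quad: {accumulated_area(x0):.10f}  F(b) - F(a): {net_change:.10f}")
```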
Applying the Fundamental Theorem
Problem: Compute .
Solution
Find an antiderivative: (since ).
Evaluate at the bounds: .
The area under from to is .
Properties of Definite Integrals
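| Property | Formula | Description |
|---|---|---|
| Reversing bounds | $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$ | Swapping limits negates the integral |
| Additivity | $\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$ | Split at any intermediate point |
| Linearity | $\int_a^b [\alpha f(x) + \beta g(x)]\,dx = \alpha \int_a^b f(x)\,dx + \beta \int_a^b g(x)\,dx$ | Constants factor out, integrals add |
| Zero-width | $\int_a^a f(x)\,dx = 0$ | Integrating over a point gives zero |
| Positivity | If $f(x) \ge 0$ on $[a, b]$, then $\int_a^b f(x)\,dx \ge 0$ | Non-negative functions have non-negative integrals |
| Comparison | If $f(x) \le g(x)$ on $[a, b]$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$ | Inequality preserved under integration |
| Triangle Inequality | $\left\lvert \int_a^b f(x)\,dx \right\rvert \le \int_a^b \lvert f(x) \rvert\,dx$ | The integral of the absolute value bounds the absolute value of the integral |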
Basic Integration Rules
The following rules allow us to integrate complex functions by breaking them into simpler parts.
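| Rule | Formula | Example |
|---|---|---|
| Constant | $\int k\,dx = kx + C$ | |
| Power Rule | $\int x^n\,dx = \frac{x^{n+1}}{n+1} + C$ ($n \ne -1$) | |
| Reciprocal | $\int \frac{1}{x}\,dx = \ln\lvert x \rvert + C$ | |
| Exponential | $\int e^x\,dx = e^x + C$ | |
| General Exponential | $\int a^x\,dx = \frac{a^x}{\ln a} + C$ | |
| Sine | $\int \sin x\,dx = -\cos x + C$ | |
| Cosine | $\int \cos x\,dx = \sin x + C$ | |
| Secant (squared) | $\int \sec^2 x\,dx = \tan x + C$ | |
| Cosecant (squared) | $\int \csc^2 x\,dx = -\cot x + C$ | |
| Secant-Tangent | $\int \sec x \tan x\,dx = \sec x + C$ | |
| Cosecant-Cotangent | $\int \csc x \cot x\,dx = -\csc x + C$ | |
| Inverse Sine | $\int \frac{1}{\sqrt{1 - x^2}}\,dx = \arcsin x + C$ | |
| Inverse Tangent | $\int \frac{1}{1 + x^2}\,dx = \arctan x + C$ | |
| Hyperbolic Sine | $\int \sinh x\,dx = \cosh x + C$ | |
| Hyperbolic Cosine | $\int \cosh x\,dx = \sinh x + C$ | |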
Power Rule Exception
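The power rule fails when $n = -1$, because the formula would divide by $n + 1 = 0$. In that case, $\int x^{-1}\,dx = \ln\lvert x \rvert + C$. This is because $\frac{d}{dx} \ln\lvert x \rvert = \frac{1}{x}$.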
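The exception is less abrupt than it looks: as $n$ approaches $-1$, the power-rule value of a definite integral approaches the logarithm. A small sketch, assuming the interval $[1, 2]$ as an illustrative choice (so the limiting value is $\ln 2$); `power_rule_integral` is a hypothetical helper.

```python
import math

def power_rule_integral(n, a=1.0, b=2.0):
    """Integral of x**n over [a, b] via the power rule (valid only for n != -1)."""
    return (b ** (n + 1) - a ** (n + 1)) / (n + 1)

log_value = math.log(2.0)  # integral of 1/x over [1, 2]

# As n approaches -1, the power-rule result approaches ln 2.
for n in (-0.9, -0.99, -0.999, -0.9999):
    approx = power_rule_integral(n)
    print(f"n={n:8.4f}  power rule: {approx:.6f}  gap to ln 2: {abs(approx - log_value):.2e}")
```

Calling `power_rule_integral(-1)` raises `ZeroDivisionError`, which is the numerical face of the exception described above.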
Integration by Substitution
Definition: Integration by Substitution (u-substitution)
The substitution rule is the reverse of the chain rule for differentiation. If $u = g(x)$ is a differentiable function whose range is an interval, and $f$ is continuous on that interval, then:
Substitution Rule
$$\int f(g(x))\,g'(x)\,dx = \int f(u)\,du$$
Here,
- $u = g(x)$ = The substitution, the inner function
- $du = g'(x)\,dx$ = The differential of $u$
- $f(g(x))\,g'(x)$ = The original integrand with chain rule factor
When to Use Substitution
Use substitution when the integrand contains a function and its derivative (or a constant multiple of it). Look for an "inner function" whose derivative appears as a factor. Common patterns: $\int [g(x)]^n\,g'(x)\,dx$, $\int e^{g(x)}\,g'(x)\,dx$, $\int \frac{g'(x)}{g(x)}\,dx$.
Substitution Example 1
Problem: Compute .
Solution
Let , so .
.
Substitution Example 2
Problem: Compute .
Solution
Let , so .
When , . When , .
.
Don't Forget to Change the Bounds
When performing substitution on a definite integral, you must either change the limits of integration to match the new variable or back-substitute before evaluating. Forgetting to update the bounds is one of the most common errors.
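The effect of stale bounds is easy to see numerically. The sketch below uses an illustrative integral chosen for this note, $\int_0^2 2x\,e^{x^2}\,dx$ with $u = x^2$, so the correct new bounds are $u = 0$ and $u = 4$; it is not one of the worked examples above.

```python
import numpy as np
from scipy import integrate

# Original integral in x: integrand 2x * exp(x^2) on [0, 2].
original, _ = integrate.quad(lambda x: 2 * x * np.exp(x**2), 0, 2)

# After u = x^2: integrand exp(u), with bounds mapped to u(0) = 0 and u(2) = 4.
substituted, _ = integrate.quad(np.exp, 0, 4)

# Common mistake: keeping the old x-bounds [0, 2] for the u-integral.
stale_bounds, _ = integrate.quad(np.exp, 0, 2)

print(f"original integral:       {original:.6f}")
print(f"u-integral, new bounds:  {substituted:.6f}")
print(f"u-integral, old bounds:  {stale_bounds:.6f}  (wrong)")
```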
Integration by Parts
Definition: Integration by Parts
Integration by parts is the reverse of the product rule for differentiation. It is used to integrate products of functions:
Integration by Parts
$$\int u\,dv = uv - \int v\,du$$
Here,
- $u$ = The part you differentiate (choose something that simplifies)
- $dv$ = The part you integrate (choose something easy to integrate)
- $du$ = The derivative of $u$ (times $dx$)
- $v$ = The antiderivative of $dv$
Definite Integral Version
$$\int_a^b u\,dv = \big[uv\big]_a^b - \int_a^b v\,du$$
Here,
- $\big[uv\big]_a^b$ = The boundary term: $u(b)v(b) - u(a)v(a)$
- $\int_a^b v\,du$ = The remaining integral (often simpler)
LIATE Rule for Choosing u
A helpful heuristic for choosing $u$: Logarithmic -> Inverse trig -> Algebraic -> Trigonometric -> Exponential. Choose $u$ as whichever comes first in this list. The remaining factor becomes $dv$.
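The formula can be exercised symbolically. A minimal sketch with SymPy, assuming the illustrative integrand $x e^x$ (algebraic times exponential, so LIATE suggests $u = x$); the variable names are chosen for this note.

```python
import sympy as sp

x = sp.Symbol('x')

# LIATE: algebraic comes before exponential, so u = x and dv = e^x dx.
u = x
dv = sp.exp(x)

du = sp.diff(u, x)        # du/dx = 1
v = sp.integrate(dv, x)   # v = e^x

# Integration by parts: uv - integral of v du
by_parts = u * v - sp.integrate(v * du, x)

# Compare against SymPy's direct answer and check by differentiating.
direct = sp.integrate(u * dv, x)
difference = sp.simplify(sp.diff(by_parts, x) - u * dv)

print("by parts:", by_parts)
print("direct:  ", direct)
print("derivative minus integrand:", difference)
```

Swapping the roles ($u = e^x$, $dv = x\,dx$) is also valid algebraically but produces a harder remaining integral, which is the situation the heuristic is meant to avoid.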
Integration by Parts Example 1
Problem: Compute .
Solution
Let (algebraic), (exponential). Then , .
.
Integration by Parts Example 2
Problem: Compute .
Solution
Let , . Then , .
.
Integration by Parts Example 3
Problem: Compute .
Solution
Apply integration by parts twice:
Let , . Then , .
Now integrate by parts again:
Let , . Then , .
Substitute back:
Cyclic Integration by Parts
When integration by parts returns you to the original integral (as in the example), you can solve for the integral algebraically. This "cyclic" pattern occurs with products of exponentials and trigonometric functions.
Common Integrals
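| Integral | Result | Notes |
|---|---|---|
| $\int \frac{1}{x}\,dx$ | $\ln\lvert x \rvert + C$ | The $n = -1$ case |
| $\int e^x\,dx$ | $e^x + C$ | Its own antiderivative |
| $\int a^x\,dx$ | $\frac{a^x}{\ln a} + C$ | $a > 0$, $a \ne 1$ |
| $\int \tan x\,dx$ | $-\ln\lvert \cos x \rvert + C$ | Rewrite as $\frac{\sin x}{\cos x}$ |
| $\int \frac{1}{x^2 - a^2}\,dx$ | $\frac{1}{2a} \ln\left\lvert \frac{x - a}{x + a} \right\rvert + C$ | Partial fractions |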
Improper Integrals
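Definition: Improper Integral
An improper integral is an integral where either the interval of integration is infinite or the integrand has an infinite discontinuity (vertical asymptote) within the interval. We evaluate it as a limit.
Infinite Upper Limit
$$\int_a^{\infty} f(x)\,dx = \lim_{t \to \infty} \int_a^t f(x)\,dx$$
Here,
- $a$ = The finite lower bound
- $t \to \infty$ = The upper bound approaches infinity
Infinite Lower Limit
$$\int_{-\infty}^{b} f(x)\,dx = \lim_{t \to -\infty} \int_t^b f(x)\,dx$$
Here,
- $t \to -\infty$ = The lower bound approaches negative infinity
- $b$ = The finite upper bound
Double Infinite
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{c} f(x)\,dx + \int_c^{\infty} f(x)\,dx$$
Here,
- $c$ = Any finite point (often 0)
Definition: Convergence and Divergence
An improper integral converges if the limit exists and is finite. It diverges if the limit does not exist or is infinite. When an integral is split into two pieces (as in the doubly infinite case), both pieces must converge independently for the full integral to converge.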
Improper Integral That Converges
Problem: Evaluate .
Solution
.
The integral converges to .
Improper Integral That Diverges
Problem: Evaluate .
Solution
.
The integral diverges (grows without bound).
p-Integral Test
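$\int_1^{\infty} \frac{1}{x^p}\,dx$ converges if and only if $p > 1$. This is a quick way to determine convergence for power-type integrands. For example, $\int_1^{\infty} \frac{1}{x^{1.01}}\,dx$ converges (just barely), while $\int_1^{\infty} \frac{1}{x}\,dx$ diverges.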
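The test can be illustrated by following the partial integral $\int_1^t x^{-p}\,dx$ as $t$ grows. The sketch below uses the closed form of that partial integral; `tail_integral` is a hypothetical helper name and the values of $p$ are illustrative.

```python
import math

def tail_integral(p, t):
    """Closed form of the integral of x**(-p) over [1, t]."""
    if p == 1:
        return math.log(t)
    return (t ** (1 - p) - 1) / (1 - p)

# For p > 1 the values settle toward 1 / (p - 1); for p <= 1 they keep growing.
for p in (2.0, 1.01, 1.0, 0.5):
    values = [tail_integral(p, t) for t in (1e2, 1e4, 1e6)]
    print(f"p={p:4.2f}  " + "  ".join(f"{v:12.4f}" for v in values))
```

For $p = 1.01$ the limit is $1/(p-1) = 100$, but the partial integrals approach it very slowly, which is what "just barely" means in practice.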
Improper Integral with Discontinuity
Problem: Evaluate .
Solution
The integrand has a vertical asymptote at .
.
The integral converges to .
Numerical Integration
When an antiderivative cannot be found in closed form, or when the integrand is defined only by data points, we approximate the integral numerically.
Trapezoidal Rule
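Definition: Trapezoidal Rule
Approximates the area under the curve by dividing the interval into $n$ subintervals and approximating each strip as a trapezoid. The error is proportional to $h^2$ (second-order method).
Trapezoidal Rule
$$\int_a^b f(x)\,dx \approx \frac{h}{2} \left[ f(x_0) + 2 \sum_{i=1}^{n-1} f(x_i) + f(x_n) \right]$$
Here,
- $h = (b - a)/n$ = Width of each subinterval
- $x_i = a + ih$ = The $i$-th grid point
- $n$ = Number of subintervals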
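A direct transcription of the composite trapezoidal formula, as a sketch on a uniform grid. `trapezoid_rule` is a hypothetical helper written for this note, and $\sin x$ on $[0, \pi]$ (exact value 2) is an illustrative test integrand.

```python
import numpy as np

def trapezoid_rule(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals."""
    x = np.linspace(a, b, n + 1)   # grid points x_0, ..., x_n
    y = f(x)
    h = (b - a) / n
    # Endpoints get weight 1/2, interior points weight 1.
    return float(h * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1]))

# Doubling n halves h, so a second-order method should cut the error by about 4.
err_50 = abs(trapezoid_rule(np.sin, 0.0, np.pi, 50) - 2.0)
err_100 = abs(trapezoid_rule(np.sin, 0.0, np.pi, 100) - 2.0)
print(f"error n=50: {err_50:.3e}  error n=100: {err_100:.3e}  ratio: {err_50 / err_100:.2f}")
```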
Simpson's Rule
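Definition: Simpson's Rule
Approximates the area using parabolic arcs instead of straight lines. Requires an even number of subintervals. The error is proportional to $h^4$ (fourth-order method), making it significantly more accurate than the trapezoidal rule for smooth functions.
Simpson's Rule
$$\int_a^b f(x)\,dx \approx \frac{h}{3} \left[ f(x_0) + 4 \sum_{i\ \text{odd}} f(x_i) + 2 \sum_{\substack{i\ \text{even} \\ 0 < i < n}} f(x_i) + f(x_n) \right]$$
Here,
- $h = (b - a)/n$ = Width of each subinterval ($n$ must be even)
- $x_i = a + ih$ = The $i$-th grid point
- $4 f(x_i)$, $i$ odd = Weight 4 for odd-indexed points
- $2 f(x_i)$, $i$ even = Weight 2 for even-indexed points (except endpoints)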
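The weight pattern 1, 4, 2, 4, ..., 2, 4, 1 maps onto array slices. This is a minimal sketch on a uniform grid; `simpson_rule` is a hypothetical helper, not the SciPy routine.

```python
import numpy as np

def simpson_rule(f, a, b, n):
    """Composite Simpson's rule with n equal subintervals (n must be even)."""
    if n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    odd_sum = y[1:-1:2].sum()    # x_1, x_3, ..., x_{n-1}: weight 4
    even_sum = y[2:-1:2].sum()   # x_2, x_4, ..., x_{n-2}: weight 2
    return float(h / 3 * (y[0] + 4 * odd_sum + 2 * even_sum + y[-1]))

# Simpson's rule is exact for cubics: the integral of x^3 over [0, 2] is 4.
cubic = simpson_rule(lambda x: x**3, 0.0, 2.0, 4)
# A smooth non-polynomial integrand: the integral of sin over [0, pi] is 2.
sine = simpson_rule(np.sin, 0.0, np.pi, 10)
print(f"x^3 on [0, 2]: {cubic:.12f}   sin on [0, pi]: {sine:.8f}")
```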
Gaussian Quadrature
Definition: Gaussian Quadrature
A higher-order numerical integration method that chooses both the nodes and weights optimally. An $n$-point Gaussian quadrature rule integrates polynomials of degree up to $2n - 1$ exactly. It is particularly efficient for smooth functions.
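The exactness claim can be checked with Gauss-Legendre nodes from NumPy. A sketch, assuming the standard affine map from the reference interval $[-1, 1]$ to $[a, b]$; `gauss_legendre` is a hypothetical helper name.

```python
import numpy as np

def gauss_legendre(f, a, b, n):
    """n-point Gauss-Legendre quadrature of f over [a, b]."""
    nodes, weights = np.polynomial.legendre.leggauss(n)  # defined on [-1, 1]
    x = 0.5 * (b - a) * nodes + 0.5 * (b + a)            # map nodes to [a, b]
    return float(0.5 * (b - a) * np.sum(weights * f(x)))

# With n = 3 points the rule is exact up to degree 2n - 1 = 5, but not degree 6.
deg5 = gauss_legendre(lambda x: x**5, 0.0, 1.0, 3)   # exact value 1/6
deg6 = gauss_legendre(lambda x: x**6, 0.0, 1.0, 3)   # exact value 1/7
print(f"degree 5 error: {abs(deg5 - 1/6):.2e}")
print(f"degree 6 error: {abs(deg6 - 1/7):.2e}")
```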
| Method | Order of Accuracy | Best For | Nodes Required |
|---|---|---|---|
| Trapezoidal | $O(h^2)$ | Rough data, quick estimates | Uniform grid |
| Simpson's | $O(h^4)$ | Smooth functions, moderate precision | Uniform grid (even $n$) |
| Gaussian | Exact for polynomials of degree $\le 2n - 1$ | High-precision integration of smooth functions | Optimally placed nodes |
| Monte Carlo | $O(1/\sqrt{N})$ | High-dimensional integrals | Random samples |
Monte Carlo Integration in High Dimensions
For integrals over high-dimensional spaces (common in Bayesian inference and physics simulations), grid-based methods suffer from the curse of dimensionality: the number of grid points grows exponentially with dimension. Monte Carlo integration converges at rate $O(1/\sqrt{N})$ in the number of samples $N$ regardless of dimension, which often makes it the most practical choice for high-dimensional integrals.
Python Implementation
Basic Integration with scipy.integrate
import numpy as np
from scipy import integrate

# Define the function to integrate
def f(x):
    return x**2 * np.exp(-x)

# Compute the integral from 0 to infinity (improper integral)
result, error = integrate.quad(f, 0, np.inf)
print("Integral of x^2 * exp(-x) from 0 to inf:")
print(f"  Result: {result:.6f}")
print(f"  Error estimate: {error:.2e}")
print("  Exact (Gamma(3) = 2! = 2): 2.000000")

# Definite integral of sin(x) from 0 to pi
result, error = integrate.quad(np.sin, 0, np.pi)
print(f"\nIntegral of sin(x) from 0 to pi: {result:.6f} (exact: 2)")
Numerical Methods Comparison
import numpy as np
from scipy import integrate

# Define test function: f(x) = x^2
f = lambda x: x**2
a, b = 0, 1
exact = 1/3

# Trapezoidal rule (scipy's trapezoid replaces np.trapz, deprecated in NumPy 2.0)
for n in [10, 100, 1000]:
    x = np.linspace(a, b, n + 1)
    trap_result = integrate.trapezoid(f(x), x=x)
    print(f"Trapezoidal (n={n:4d}): {trap_result:.8f} error: {abs(trap_result - exact):.2e}")
print()

# Simpson's rule (n is even here, as the rule requires)
for n in [10, 100, 1000]:
    x = np.linspace(a, b, n + 1)
    simp_result = integrate.simpson(f(x), x=x)
    print(f"Simpson's (n={n:4d}): {simp_result:.8f} error: {abs(simp_result - exact):.2e}")
print()

# Gaussian quadrature (fixed-order Gauss-Legendre; the second return value is always None)
for n in [5, 10, 20]:
    result, _ = integrate.fixed_quad(f, a, b, n=n)
    print(f"Gauss quad (n={n:4d}): {result:.8f} error: {abs(result - exact):.2e}")

# scipy.integrate.quad (adaptive); the second return value is an error estimate
result, error = integrate.quad(f, a, b)
print(f"\nAdaptive quad: {result:.8f} error estimate: {error:.2e}")
Symbolic Integration with SymPy
import sympy as sp
x = sp.Symbol('x')
# Symbolic indefinite integral
f = x**2 * sp.exp(x)
F = sp.integrate(f, x)
print(f"Indefinite integral of x^2 * e^x: {F}")
# Definite integral
result = sp.integrate(x**2, (x, 0, 1))
print(f"Definite integral of x^2 from 0 to 1: {result}")
# Improper integral
result = sp.integrate(sp.exp(-x**2), (x, -sp.oo, sp.oo))
print(f"Integral of e^(-x^2) from -inf to inf: {result}")
print(f" = sqrt(pi) = {sp.sqrt(sp.pi)}")
# Verify Fundamental Theorem
F = sp.integrate(sp.sin(x), x)
print(f"\nAntiderivative of sin(x): {F}")
print(f"FTC: F(pi) - F(0) = {F.subs(x, sp.pi) - F.subs(x, 0)}")
High-Dimensional Integration (Monte Carlo)
import numpy as np

def monte_carlo_integrate(f, bounds, n_samples=100000):
    """Monte Carlo integration over a box given as [(low, high), ...]."""
    dim = len(bounds)
    # Draw points uniformly inside the box
    samples = np.random.uniform(
        low=[b[0] for b in bounds],
        high=[b[1] for b in bounds],
        size=(n_samples, dim)
    )
    values = np.array([f(s) for s in samples])
    volume = np.prod([b[1] - b[0] for b in bounds])
    # Estimate = volume * sample mean; the standard error scales by the volume too
    mean_val = np.mean(values)
    std_err = volume * np.std(values) / np.sqrt(n_samples)
    return mean_val * volume, std_err

# Example: integral of x^2 + y^2 over [0,1] x [0,1]
f = lambda p: p[0]**2 + p[1]**2
result, error = monte_carlo_integrate(f, [(0, 1), (0, 1)])
exact = 2/3  # integral of x^2 + y^2 over [0,1]^2
print(f"Monte Carlo estimate: {result:.6f} +/- {error:.6f}")
print(f"Exact value: {exact:.6f}")
Applications in AI/ML
Probability Density Functions
Integration and Probability
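The entire foundation of continuous probability rests on integration. A probability density function $f(x)$ must satisfy $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (normalization), and the probability of any event is computed as an integral of the density.
Probability from PDF
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
Here,
- $f(x)$ = The probability density function of the random variable $X$
- $a, b$ = The interval bounds
- $P(a \le X \le b)$ = The probability that $X$ falls in $[a, b]$
Normalization Condition
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
Here,
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$ = A valid PDF must integrate to 1 over its support
Expected Values and Moments
Expected Value
$$E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx$$
Here,
- $E[X]$ = The expected value (mean) of $X$
- $x\,f(x)$ = Each value weighted by its probability density
Variance
$$\operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$
Here,
- $\mu = E[X]$ = The mean of $X$
- $(x - \mu)^2$ = Squared deviation from the mean
General Moment
$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f(x)\,dx$$
Here,
- $g(x)$ = Any function of the random variable
- $E[g(X)]$ = The expected value of $g(X)$, a weighted average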
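These formulas can be evaluated numerically for a concrete density. The sketch below assumes an exponential distribution with rate 2 as an illustrative example, for which the mean is $1/2$ and the variance is $1/4$; the variable names are chosen for this note.

```python
import numpy as np
from scipy import integrate

rate = 2.0  # illustrative rate parameter (an assumption for this example)

def pdf(x):
    """Exponential density on [0, inf)."""
    return rate * np.exp(-rate * x)

# Normalization: the density integrates to 1 over its support.
total, _ = integrate.quad(pdf, 0, np.inf)

# Expected value and variance as integrals against the density.
mean, _ = integrate.quad(lambda x: x * pdf(x), 0, np.inf)
variance, _ = integrate.quad(lambda x: (x - mean) ** 2 * pdf(x), 0, np.inf)

# Probability of an interval: P(0.5 <= X <= 1.0).
prob, _ = integrate.quad(pdf, 0.5, 1.0)

print(f"total mass: {total:.6f}")
print(f"mean: {mean:.6f}  variance: {variance:.6f}")
print(f"P(0.5 <= X <= 1): {prob:.6f}")
```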
Bayesian Inference
Integrals in Bayesian Methods
Bayesian inference is built on integration. The posterior distribution requires the evidence (marginal likelihood), which is an integral: $p(D) = \int p(D \mid \theta)\,p(\theta)\,d\theta$. This integral is often intractable, which is why methods like MCMC, variational inference, and Laplace approximation exist: each of them either approximates this integral or avoids computing it directly.
Evidence (Marginal Likelihood)
$$p(D) = \int p(D \mid \theta)\,p(\theta)\,d\theta$$
Here,
- $p(D \mid \theta)$ = The likelihood of the data given parameters
- $p(\theta)$ = The prior distribution over parameters
- $p(D)$ = The evidence, obtained by integrating over all parameter values
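In one dimension the evidence integral is tractable and can be computed directly. A minimal sketch, assuming a coin-flip model with a uniform prior on the heads probability and an illustrative data set of 7 heads and 3 tails (none of this comes from the text above); for this model the evidence has the closed form $B(8, 4)$, the beta function.

```python
from scipy import integrate, special

heads, tails = 7, 3  # illustrative data (an assumption for this example)

def likelihood(theta):
    """Probability of one specific sequence with the given counts."""
    return theta**heads * (1 - theta)**tails

def prior(theta):
    """Uniform prior density on [0, 1]."""
    return 1.0

# Evidence: integrate likelihood * prior over all parameter values.
evidence, _ = integrate.quad(lambda t: likelihood(t) * prior(t), 0, 1)
closed_form = special.beta(heads + 1, tails + 1)

# Dividing by the evidence turns likelihood * prior into a normalized posterior.
posterior_mass, _ = integrate.quad(lambda t: likelihood(t) * prior(t) / evidence, 0, 1)

print(f"evidence (quad):        {evidence:.8e}")
print(f"evidence (closed form): {closed_form:.8e}")
print(f"posterior total mass:   {posterior_mass:.6f}")
```

With many parameters the same integral runs over a high-dimensional space, where this kind of direct quadrature stops being feasible.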
Common Probability Integrals
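| Distribution | Key Integral |
|---|---|
| Normal | $\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}\,dx = 1$ |
| Standard Normal | $\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx = 1$ |
| Exponential | $\int_0^{\infty} \lambda e^{-\lambda x}\,dx = 1$, $\int_0^{\infty} x\,\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda}$ |
| Beta | $\int_0^1 x^{\alpha - 1} (1 - x)^{\beta - 1}\,dx = B(\alpha, \beta)$ |
| Gamma | $\int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx = \Gamma(\alpha)$ |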
Common Mistakes
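| Mistake | Incorrect | Correct | Explanation |
|---|---|---|---|
| Forgetting the constant of integration | $\int f(x)\,dx = F(x)$ | $\int f(x)\,dx = F(x) + C$ | Always include $+C$ for indefinite integrals |
| Wrong sign on trig integral | $\int \sin x\,dx = \cos x + C$ | $\int \sin x\,dx = -\cos x + C$ | The integral of sine is negative cosine |
| Power rule on $\frac{1}{x}$ | $\int x^{-1}\,dx = \frac{x^0}{0} + C$ | $\int x^{-1}\,dx = \ln\lvert x \rvert + C$ | Power rule fails at $n = -1$; use $\ln\lvert x \rvert$ |
| Not changing bounds on substitution | Evaluate with old bounds after $u$-sub | Update bounds when substituting | If $u = g(x)$, new bounds are $g(a)$ and $g(b)$ |
| Dropping absolute value in $\ln$ | $\int \frac{1}{x}\,dx = \ln x + C$ | $\int \frac{1}{x}\,dx = \ln\lvert x \rvert + C$ | $\ln\lvert x \rvert$ is defined for $x < 0$ too |
| Confusing integration with differentiation rules | Treating integrals like derivatives | Integration follows different rules | E.g., $\int f g\,dx \ne \int f\,dx \cdot \int g\,dx$ |
| Forgetting boundary term in integration by parts | $\int u\,dv = -\int v\,du$ | $\int u\,dv = uv - \int v\,du$ | The $uv$ term is essential |
| Divergent improper integrals | Assuming $\int_1^{\infty} \frac{1}{x^p}\,dx$ always converges | Converges only for $p > 1$ | Check convergence before evaluating |
| Wrong $du$ in substitution | Forgetting to include the derivative | $du = g'(x)\,dx$, not just $dx$ | The differential must include the derivative |
| Splitting integrals incorrectly | $\int_{-1}^{1} \frac{1}{x}\,dx = 0$ (by symmetry) | $\int_{-1}^{1} \frac{1}{x}\,dx$ diverges | The integrand has a singularity at $x = 0$ |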
Double-Check Your Work
After computing an integral, verify by differentiating your answer. If $\int f(x)\,dx = F(x) + C$, then $F'(x)$ should equal $f(x)$. For definite integrals, check boundary values and sign consistency.
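This check is easy to automate with SymPy. A small sketch; `is_antiderivative` is a hypothetical helper, and the candidate answers are illustrative.

```python
import sympy as sp

x = sp.Symbol('x')

def is_antiderivative(F, f):
    """Return True if differentiating F with respect to x gives back f."""
    return sp.simplify(sp.diff(F, x) - f) == 0

# A correct answer and a sign error for the integral of sin(x).
print(is_antiderivative(-sp.cos(x), sp.sin(x)))   # correct candidate
print(is_antiderivative(sp.cos(x), sp.sin(x)))    # wrong sign

# A by-parts style answer: (x - 1) e^x for the integral of x e^x.
print(is_antiderivative((x - 1) * sp.exp(x), x * sp.exp(x)))
```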
Interview Questions
Q1: State the Fundamental Theorem of Calculus and explain its significance.
Answer
The Fundamental Theorem of Calculus has two parts:
Part 1: If $F(x) = \int_a^x f(t)\,dt$ and $f$ is continuous, then $F'(x) = f(x)$. This shows differentiation undoes integration.
Part 2: If $F$ is any antiderivative of $f$, then $\int_a^b f(x)\,dx = F(b) - F(a)$. This allows us to compute definite integrals using antiderivatives.
Significance: It connects the two branches of calculus (differential and integral), transforms the problem of computing areas into evaluating antiderivatives, and provides the theoretical foundation for the entire field.
Q2: When would you use integration by parts vs. substitution?
Answer
- Substitution is used when the integrand contains a composition of functions where the inner function's derivative appears as a factor (reverse of the chain rule). Example: $\int 2x \cos(x^2)\,dx$.
- Integration by parts is used for products of different types of functions (reverse of the product rule). Example: $\int x e^x\,dx$.
- A good heuristic: if you see an "inner function times its derivative" pattern, use substitution. If you see a product of different function types (algebraic × exponential, algebraic × trig, etc.), use by parts. The LIATE rule helps choose $u$.
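Q3: Why does $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$? Why is this important in ML?
Answer
This is the Gaussian integral. It cannot be computed by finding an antiderivative (since $e^{-x^2}$ has no elementary antiderivative). The classic proof squares the integral $I$ and converts to polar coordinates:
$I^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2 + y^2)}\,dx\,dy = \int_0^{2\pi} \int_0^{\infty} e^{-r^2}\,r\,dr\,d\theta = 2\pi \cdot \frac{1}{2} = \pi$.
Taking the square root gives $I = \sqrt{\pi}$.
Importance in ML: This integral gives the normalization constant for the Gaussian distribution, one of the most widely used distributions in statistics and ML. It ensures $\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}\,dx = 1$, which is required for any valid probability distribution.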
Q4: Explain the difference between convergence and divergence for improper integrals. Give an example of each.
Answer
An improper integral converges if the limit exists and is finite; it diverges if the limit does not exist or is infinite.
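Convergent example: $\int_1^{\infty} \frac{1}{x^2}\,dx = 1$. The area is finite despite the infinite interval.
Divergent example: $\int_1^{\infty} \frac{1}{x}\,dx = \infty$. The area grows without bound.
Key insight: The p-test says $\int_1^{\infty} \frac{1}{x^p}\,dx$ converges iff $p > 1$. For integrals near a singularity like $\int_0^1 \frac{1}{x^p}\,dx$, it converges iff $p < 1$.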
Q5: How is numerical integration used when analytical solutions are unavailable?
Answer
When the antiderivative cannot be expressed in closed form (e.g., $e^{-x^2}$, $\frac{\sin x}{x}$, or integrands defined only by data), we use numerical methods:
- Trapezoidal rule: Approximate area with trapezoids. Simple, $O(h^2)$ convergence. Good for rough data.
- Simpson's rule: Use parabolic arcs. $O(h^4)$ convergence. Better for smooth functions.
- Gaussian quadrature: Optimally placed nodes. Integrates degree $2n - 1$ polynomials exactly with $n$ points.
- Monte Carlo integration: Random sampling. Converges at $O(1/\sqrt{N})$ but is often the most practical method for high-dimensional integrals (critical in Bayesian inference).
- Adaptive quadrature: Automatically refines the grid where the integrand is difficult. scipy.integrate.quad uses this.
Q6: What role does integration play in training probabilistic models?
Answer
Integration is essential in several aspects of probabilistic model training:
- Normalization: Every PDF must integrate to 1. For complex models, this constant is often intractable.
- Marginalization: To compute $p(x) = \int p(x, z)\,dz$, we integrate out latent variables.
- Evidence computation: $p(D) = \int p(D \mid \theta)\,p(\theta)\,d\theta$ is needed for model comparison and Bayesian model selection.
- Expected loss: $\mathbb{E}_{p(\theta \mid D)}[L(\theta)] = \int L(\theta)\,p(\theta \mid D)\,d\theta$, the expected loss over the posterior.
- Variational inference: Approximates intractable integrals with tractable ones by minimizing KL divergence.
- MCMC: Draws samples from posteriors by constructing Markov chains whose stationary distribution is the target, a way to estimate integrals via sampling.
Q7: Prove that using the definition of the integral.
Answer
Using the Riemann sum with a uniform partition into subintervals of width :
Using Faulhaber's formula for large :
Alternatively, by the Fundamental Theorem: .
Practice Problems
Problem 1: Substitution
Compute .
Solution
Let , so and .
.
Problem 2: Integration by Parts
Compute .
Solution
First application: , . Then , .
Second application: , . Then , .
Combine: .
Problem 3: Definite Integral with Substitution
Compute .
Solution
Let , so .
When : . When : .
.
Problem 4: Improper Integral
Determine whether converges, and if so, evaluate it.
Solution
Use integration by parts: , . Then , .
Evaluate the improper integral:
The integral converges to .
Problem 5: Probability Application
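Let $X$ have PDF $f(x) = \frac{1}{2} e^{-\lvert x \rvert}$ for $x \in \mathbb{R}$ (Laplace distribution). Find $E[X]$ and $\operatorname{Var}(X)$.
Solution
Expected value: By symmetry of $f(x)$ (even function) and $x$ (odd function):
$E[X] = \int_{-\infty}^{\infty} x \cdot \frac{1}{2} e^{-\lvert x \rvert}\,dx = 0$ (odd integrand over symmetric interval).
Variance: $\operatorname{Var}(X) = E[X^2] - (E[X])^2 = \int_{-\infty}^{\infty} x^2 \cdot \frac{1}{2} e^{-\lvert x \rvert}\,dx = \int_0^{\infty} x^2 e^{-x}\,dx$.
Using integration by parts twice (or the gamma function $\Gamma(3) = 2! = 2$):
$\operatorname{Var}(X) = 2$.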
Quick Reference
Key Takeaways
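- Indefinite Integral: $\int f(x)\,dx = F(x) + C$ where $F'(x) = f(x)$, a family of antiderivatives.
- Definite Integral: $\int_a^b f(x)\,dx = F(b) - F(a)$, the signed area under the curve, computed via the Fundamental Theorem.
- Fundamental Theorem: Connects differentiation and integration: $\frac{d}{dx} \int_a^x f(t)\,dt = f(x)$ and $\int_a^b f(x)\,dx = F(b) - F(a)$.
- Power Rule: $\int x^n\,dx = \frac{x^{n+1}}{n+1} + C$ for $n \ne -1$; for $n = -1$: $\int \frac{1}{x}\,dx = \ln\lvert x \rvert + C$.
- Substitution: $\int f(g(x))\,g'(x)\,dx = \int f(u)\,du$, the reverse of the chain rule.
- Integration by Parts: $\int u\,dv = uv - \int v\,du$, the reverse of the product rule; use LIATE to choose $u$.
- Improper Integrals: Evaluate as limits; converges if the limit is finite, diverges otherwise.
- Numerical Methods: Trapezoidal ($O(h^2)$), Simpson's ($O(h^4)$), Gaussian quadrature (exact for polynomials of degree $\le 2n - 1$), Monte Carlo ($O(1/\sqrt{N})$).
- Probability: $P(a \le X \le b) = \int_a^b f(x)\,dx$, $E[X] = \int x\,f(x)\,dx$, normalization requires $\int f(x)\,dx = 1$.
- Bayesian Integration: Evidence $p(D) = \int p(D \mid \theta)\,p(\theta)\,d\theta$ is often intractable, motivating MCMC and variational methods.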
Cross-References
- Limits and Continuity: The foundation for understanding integrals as limits of Riemann sums -> Limits and Continuity
- Derivatives and Differentiation: Integration is the reverse operation of differentiation -> Derivatives and Differentiation
- Chain Rule: Integration by substitution is the reverse of the chain rule -> Chain Rule and Implicit Differentiation
- Multivariable Calculus: Double and triple integrals extend integration to multiple dimensions -> Multivariable Calculus
- Taylor Series: Polynomial approximations used in deriving integration rules -> Taylor Series
- Probability Foundations: Integration is the backbone of continuous probability -> Probability Foundations
- Probability Distributions: Common distributions and their integral properties -> Probability Distributions
- Expectation and Variance: Expected values are integrals of functions against PDFs -> Expectation and Variance
- Numerical Integration: In-depth coverage of numerical methods -> Numerical Methods
- Differential Equations: Many differential equations are solved by integration -> Differential Equations
- Optimization: Integration in the context of optimization and Lagrange multipliers -> Optimization Fundamentals
- Statistics (MLE): MLE involves integrals over likelihood functions -> Maximum Likelihood Estimation
- Bayesian Statistics: Posterior computation requires integration -> Bayesian Statistics