Vector and Matrix Norms

Why It Matters: Norms are the foundation of measuring "size" and "distance" in vector spaces. They determine how we quantify error, define convergence, and control optimization. In machine learning, the choice of norm directly influences model behavior: an L1 penalty tends to produce sparse solutions, an L2 penalty shrinks weights smoothly, and the L∞ norm captures worst-case scenarios. Understanding norms is essential for regularization, numerical stability analysis, and distance-based algorithms.

What is a Norm

Definition: Vector Norm

A norm is a function $\|\cdot\|: V \to \mathbb{R}_{\geq 0}$ from a vector space $V$ to the non-negative real numbers that satisfies four fundamental axioms:

| Axiom | Property | Description |
|---|---|---|
| 1 | Non-negativity | $\lVert \vec{x} \rVert \geq 0$ for all $\vec{x} \in V$ |
| 2 | Definiteness | $\lVert \vec{x} \rVert = 0 \iff \vec{x} = \vec{0}$ |
| 3 | Homogeneity | $\lVert \alpha \vec{x} \rVert = \lvert \alpha \rvert \cdot \lVert \vec{x} \rVert$ for all scalars $\alpha$ |
| 4 | Triangle Inequality | $\lVert \vec{x} + \vec{y} \rVert \leq \lVert \vec{x} \rVert + \lVert \vec{y} \rVert$ for all $\vec{x}, \vec{y} \in V$ |

A vector space equipped with a norm is called a normed vector space. The norm induces a natural distance function $d(\vec{x}, \vec{y}) = \|\vec{x} - \vec{y}\|$, making it a metric space.
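As a quick numerical sanity check of the four axioms, the sketch below tests them for the Euclidean norm on randomly drawn vectors. The helper name `euclidean_norm`, the seed, and the tolerances are illustrative assumptions; a passing check is evidence, not a proof.

```python
import math
import random

def euclidean_norm(x):
    """Euclidean (L2) norm of a sequence of real numbers."""
    return math.sqrt(sum(v * v for v in x))

random.seed(0)   # arbitrary seed, only for reproducibility
TOL = 1e-12      # slack for floating-point round-off

axioms_hold = True
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(4)]
    y = [random.uniform(-5, 5) for _ in range(4)]
    alpha = random.uniform(-3, 3)

    # Axiom 1: non-negativity
    nonneg = euclidean_norm(x) >= 0
    # Axiom 3: homogeneity uses |alpha|, not alpha
    homogeneous = math.isclose(
        euclidean_norm([alpha * v for v in x]),
        abs(alpha) * euclidean_norm(x),
        rel_tol=1e-9, abs_tol=TOL,
    )
    # Axiom 4: triangle inequality
    triangle = (
        euclidean_norm([a + b for a, b in zip(x, y)])
        <= euclidean_norm(x) + euclidean_norm(y) + TOL
    )
    axioms_hold = axioms_hold and nonneg and homogeneous and triangle

# Axiom 2 (one direction): the zero vector has norm exactly zero.
zero_norm = euclidean_norm([0.0, 0.0, 0.0, 0.0])
print(axioms_hold, zero_norm)
```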

Vector Norms

Lp Norm Family

$$\|\vec{x}\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$$

Here,

  • $\vec{x}$ = Vector in $\mathbb{R}^n$
  • $p$ = Parameter satisfying $p \geq 1$
  • $|x_i|$ = Absolute value of the i-th component

| Norm | Formula | When to Use |
|---|---|---|
| L1 (Manhattan) | $\lVert \vec{x} \rVert_1 = \sum_{i=1}^{n} \lvert x_i \rvert$ | Sparse solutions, feature selection (Lasso) |
| L2 (Euclidean) | $\lVert \vec{x} \rVert_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$ | Smooth solutions, general-purpose (Ridge) |
| L∞ (Max Norm) | $\lVert \vec{x} \rVert_\infty = \max_i \lvert x_i \rvert$ | Worst-case analysis, adversarial robustness |
| Lp (General) | $\lVert \vec{x} \rVert_p = \left(\sum \lvert x_i \rvert^p\right)^{1/p}$ | Interpolation between L1 and L∞ |
| L0 (Pseudo-norm) | $\lVert \vec{x} \rVert_0 = \#\{i : x_i \neq 0\}$ | Cardinality (non-convex, NP-hard to optimize) |
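The formulas in the table translate almost directly into code. The following is a minimal standard-library sketch; the function names `lp_norm` and `l0_count` are made up for this illustration and are not part of any library.

```python
import math

def lp_norm(x, p):
    """Lp norm of a sequence; p is a number >= 1 or math.inf."""
    if p == math.inf:
        return max(abs(v) for v in x)
    if p < 1:
        raise ValueError("p must be >= 1 for a true norm")
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def l0_count(x):
    """L0 'pseudo-norm': number of non-zero entries (not a true norm)."""
    return sum(1 for v in x if v != 0)

x = [1, -2, 3, -4]
l1 = lp_norm(x, 1)            # sum of absolute values
l2 = lp_norm(x, 2)            # Euclidean length
linf = lp_norm(x, math.inf)   # largest absolute component
nnz = l0_count(x)             # number of non-zero components
print(l1, round(l2, 3), linf, nnz)
```

The `p < 1` guard reflects the requirement $p \geq 1$: for smaller $p$ the triangle inequality fails, so the expression is no longer a norm.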

Step-by-Step Example: Computing Vector Norms

Computing Vector Norms for x = [1, -2, 3, -4]

Given $\vec{x} = \begin{bmatrix} 1 \\ -2 \\ 3 \\ -4 \end{bmatrix}$, compute all common norms.

Step 1: L1 Norm

$$\|\vec{x}\|_1 = |1| + |-2| + |3| + |-4| = 1 + 2 + 3 + 4 = 10$$

Step 2: L2 Norm

$$\|\vec{x}\|_2 = \sqrt{1^2 + (-2)^2 + 3^2 + (-4)^2} = \sqrt{1 + 4 + 9 + 16} = \sqrt{30} \approx 5.477$$

Step 3: L∞ Norm

$$\|\vec{x}\|_\infty = \max(|1|, |-2|, |3|, |-4|) = 4$$

Step 4: L4 Norm (example of Lp)

$$\|\vec{x}\|_4 = (1^4 + 2^4 + 3^4 + 4^4)^{1/4} = (1 + 16 + 81 + 256)^{1/4} = 354^{1/4} \approx 4.34$$

Key Insight: For any vector, $\|\vec{x}\|_\infty \leq \|\vec{x}\|_2 \leq \|\vec{x}\|_1$. The L∞ norm captures only the largest component, the L2 norm combines all components through a root-sum-of-squares, and the L1 norm sums all magnitudes. As $p$ increases, the Lp norm decreases monotonically and converges to the L∞ norm.
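The convergence of the Lp norm to the L∞ norm can be observed numerically. The sketch below evaluates the Lp norm of the example vector for a handful of increasing values of $p$; the chosen values of `p` and the helper `lp_norm` are arbitrary illustration choices.

```python
def lp_norm(x, p):
    """Lp norm for finite p >= 1, straight from the definition."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [1, -2, 3, -4]
ps = [1, 2, 4, 8, 16, 64]
values = [lp_norm(x, p) for p in ps]
max_abs = max(abs(v) for v in x)   # the L-infinity norm

for p, v in zip(ps, values):
    print(f"p={p:>2}: {v:.4f}")
print(f"max |x_i| = {max_abs}")
```

The printed values shrink from 10 at $p = 1$ toward 4, the largest absolute component.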

Matrix Norms

Frobenius Norm

$$\|A\|_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^2} = \sqrt{\text{tr}(A^TA)} = \sqrt{\sum_{i=1}^{\min(m,n)} \sigma_i^2}$$

Here,

  • $A$ = Matrix of size m × n
  • $a_{ij}$ = Element in row i, column j
  • $\text{tr}$ = Trace (sum of diagonal elements)
  • $\sigma_i$ = Singular values of A

The Frobenius norm treats a matrix as a vector in $\mathbb{R}^{m \times n}$ and computes its Euclidean norm. It equals the square root of the sum of squared singular values.
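The three expressions for the Frobenius norm can be compared numerically. This is a small NumPy sketch on an arbitrarily chosen 2 × 3 matrix; the variable names are illustrative only.

```python
import numpy as np

# Arbitrary 2 x 3 example matrix (assumption for illustration)
A = np.array([[1.0, 2.0, 0.0],
              [3.0, 4.0, -1.0]])

entrywise = np.sqrt(np.sum(A ** 2))           # root-sum-of-squares of entries
via_trace = np.sqrt(np.trace(A.T @ A))        # sqrt(tr(A^T A))
sigma = np.linalg.svd(A, compute_uv=False)    # singular values of A
via_svd = np.sqrt(np.sum(sigma ** 2))         # root-sum-of-squares of singular values
builtin = np.linalg.norm(A, ord="fro")        # NumPy's Frobenius norm

print(entrywise, via_trace, via_svd, builtin)
```

All four numbers agree up to round-off, and equal $\sqrt{31}$ for this matrix.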

Spectral Norm (Operator 2-Norm)

$$\|A\|_2 = \sigma_{\max}(A) = \sqrt{\lambda_{\max}(A^TA)}$$

Here,

  • $\sigma_{\max}(A)$ = Largest singular value of A
  • $\lambda_{\max}(A^TA)$ = Largest eigenvalue of $A^TA$

The spectral norm measures the maximum "stretch" factor of a linear transformation. It equals the largest singular value.
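The "maximum stretch" reading can be illustrated by sampling many unit vectors and measuring how much the matrix lengthens each one. In this sketch the matrix, the seed, and the sample size are arbitrary assumptions; sampling only approaches the maximum from below, while the top right singular vector attains it.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed
A = np.array([[1.0, 2.0], [3.0, 4.0]])

U, s, Vt = np.linalg.svd(A)
sigma_max = s[0]                          # largest singular value

# Columns of X are random unit vectors in R^2.
X = rng.normal(size=(2, 10000))
X /= np.linalg.norm(X, axis=0)
stretch = np.linalg.norm(A @ X, axis=0)   # ||A x||_2 for each unit vector x
max_sampled = stretch.max()

# The first right singular vector achieves the maximum stretch exactly.
v1 = Vt[0]
attained = np.linalg.norm(A @ v1)

print(f"sigma_max = {sigma_max:.4f}, best sample = {max_sampled:.4f}")
```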

Nuclear Norm

$$\|A\|_* = \sum_{i=1}^{r} \sigma_i = \text{tr}\left(\sqrt{A^TA}\right)$$

Here,

  • $\sigma_i$ = Singular values of A
  • $r$ = Rank of A

The nuclear norm (also called the trace norm) is the convex envelope of the rank function over the unit spectral norm ball. It is used in matrix completion and low-rank approximation.
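One consequence of the convex-envelope property is that, for a matrix scaled to have spectral norm 1, the nuclear norm never exceeds the rank. The sketch below checks this on a randomly generated rank-2 matrix; the matrix sizes and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
# Rank-2 matrix built as a product of thin factors (5x2 times 2x6).
B = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 6))

sigma = np.linalg.svd(B, compute_uv=False)
nuclear = sigma.sum()                       # sum of singular values
builtin = np.linalg.norm(B, ord="nuc")      # NumPy's nuclear norm
rank = np.linalg.matrix_rank(B)

# Rescale so the spectral norm is 1, then compare nuclear norm and rank.
B_unit = B / sigma[0]
nuclear_unit = np.linalg.norm(B_unit, ord="nuc")

print(f"rank = {rank}, nuclear norm on unit spectral ball = {nuclear_unit:.4f}")
```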

Comparison of Matrix Norms

| Norm | Formula | Use Case |
|---|---|---|
| Frobenius | $\lVert A \rVert_F = \sqrt{\sum a_{ij}^2}$ | General matrix similarity, reconstruction error |
| Spectral | $\lVert A \rVert_2 = \sigma_{\max}$ | Stability analysis, condition number, Lipschitz constants |
| Nuclear | $\lVert A \rVert_* = \sum \sigma_i$ | Matrix completion, low-rank recovery |
| L1 (entry-wise) | $\lVert A \rVert_{1,1} = \sum \lvert a_{ij} \rvert$ | Sparse matrix recovery |
| L∞ (entry-wise) | $\lVert A \rVert_{\infty,\infty} = \max \lvert a_{ij} \rvert$ | Bounded perturbations |

Induced (Operator) Norms

Definition: Induced Matrix Norm

An induced norm (also called an operator norm) measures the maximum output norm over all inputs of unit norm, with input and output measured in the same vector norm. The most common induced norms are:

| Induced Norm | Definition | Formula |
|---|---|---|
| Induced 2-norm | $\lVert A \rVert_2 = \max_{\lVert \vec{x} \rVert_2 = 1} \lVert A\vec{x} \rVert_2$ | $\sigma_{\max}(A)$ |
| Induced 1-norm | $\lVert A \rVert_1 = \max_{\lVert \vec{x} \rVert_1 = 1} \lVert A\vec{x} \rVert_1$ | $\max_j \sum_i \lvert a_{ij} \rvert$ (maximum absolute column sum) |
| Induced ∞-norm | $\lVert A \rVert_\infty = \max_{\lVert \vec{x} \rVert_\infty = 1} \lVert A\vec{x} \rVert_\infty$ | $\max_i \sum_j \lvert a_{ij} \rvert$ (maximum absolute row sum) |
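The column-sum and row-sum formulas can be computed by hand and compared with the unit vectors that attain them. This is a minimal NumPy sketch on an arbitrary 2 × 3 matrix; the variable names are illustrative.

```python
import numpy as np

# Arbitrary example matrix with mixed signs (assumption for illustration)
A = np.array([[1.0, -2.0, 3.0],
              [-4.0, 5.0, -6.0]])

max_col_sum = np.abs(A).sum(axis=0).max()   # induced 1-norm
max_row_sum = np.abs(A).sum(axis=1).max()   # induced infinity-norm

# The 1-norm is attained by the standard basis vector of the heaviest column.
j = np.abs(A).sum(axis=0).argmax()
e_j = np.zeros(A.shape[1])
e_j[j] = 1.0
attained_1 = np.abs(A @ e_j).sum()

# The infinity-norm is attained by the sign pattern of the heaviest row.
i = np.abs(A).sum(axis=1).argmax()
s_i = np.sign(A[i])
attained_inf = np.abs(A @ s_i).max()

print(max_col_sum, attained_1, max_row_sum, attained_inf)
```

Both maximizers have unit norm in the corresponding vector norm ($\|e_j\|_1 = 1$ and $\|s_i\|_\infty = 1$), which is why they are valid inputs in the definition.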

Example: Computing Matrix Norms

Matrix Norms for A = [[1, 2], [3, 4]]

Given $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$.

Frobenius Norm:

$$\|A\|_F = \sqrt{1^2 + 2^2 + 3^2 + 4^2} = \sqrt{30} \approx 5.477$$

Spectral Norm: Compute $A^TA$:

$$A^TA = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 10 & 14 \\ 14 & 20 \end{bmatrix}$$

The eigenvalues solve $\lambda^2 - 30\lambda + 4 = 0$ (trace $30$, determinant $200 - 196 = 4$):

$$\lambda_{\max} = \frac{30 + \sqrt{30^2 - 4(200-196)}}{2} = \frac{30 + \sqrt{884}}{2} \approx 29.87$$

$$\|A\|_2 = \sqrt{29.87} \approx 5.465$$

Induced 1-norm:

$$\|A\|_1 = \max(1+3, 2+4) = \max(4, 6) = 6$$

Induced ∞-norm:

$$\|A\|_\infty = \max(1+2, 3+4) = \max(3, 7) = 7$$
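The eigenvalue route to the spectral norm is easy to get wrong by hand, so a short arithmetic check with the standard library may help. The sketch hard-codes the entries of $A^TA$ for this example and applies the quadratic formula to its characteristic polynomial; the variable names are illustrative.

```python
import math

# Entries of the symmetric matrix A^T A = [[a, b], [b, c]] for A = [[1, 2], [3, 4]]
a, b, c = 10.0, 14.0, 20.0

trace = a + c                   # 30
det = a * c - b * b             # 200 - 196 = 4
disc = trace ** 2 - 4 * det     # discriminant of lambda^2 - trace*lambda + det

lam_max = (trace + math.sqrt(disc)) / 2
lam_min = (trace - math.sqrt(disc)) / 2

spectral = math.sqrt(lam_max)               # largest singular value of A
frobenius = math.sqrt(lam_max + lam_min)    # eigenvalues of A^T A sum to ||A||_F^2

print(f"lambda_max = {lam_max:.4f}, ||A||_2 = {spectral:.4f}, ||A||_F = {frobenius:.4f}")
```

The last line also illustrates the link between the two norms: the eigenvalues of $A^TA$ are the squared singular values, so their sum is $\|A\|_F^2 = 30$.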

Norm Equivalence

Theorem: Norm Equivalence in Finite Dimensions

For any two norms $\|\cdot\|_a$ and $\|\cdot\|_b$ on a finite-dimensional vector space $V$ (with $\dim(V) = n$), there exist constants $c_1, c_2 > 0$ such that for all $\vec{x} \in V$:

$$c_1 \|\vec{x}\|_a \leq \|\vec{x}\|_b \leq c_2 \|\vec{x}\|_a$$

Specific bounds for $\mathbb{R}^n$:

| Relationship | Reverse Bound |
|---|---|
| $\lVert \vec{x} \rVert_\infty \leq \lVert \vec{x} \rVert_2$ | $\lVert \vec{x} \rVert_2 \leq \sqrt{n} \lVert \vec{x} \rVert_\infty$ |
| $\lVert \vec{x} \rVert_2 \leq \lVert \vec{x} \rVert_1$ | $\lVert \vec{x} \rVert_1 \leq \sqrt{n} \lVert \vec{x} \rVert_2$ |
| $\lVert \vec{x} \rVert_\infty \leq \lVert \vec{x} \rVert_1$ | $\lVert \vec{x} \rVert_1 \leq n \lVert \vec{x} \rVert_\infty$ |

Implication: In finite dimensions, all norms define the same topology, so convergence in one norm implies convergence in all others. However, the constants matter: the L1 norm can be up to $n$ times larger than the L∞ norm. In infinite dimensions (function spaces), norms need not be equivalent.
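The six inequalities can be spot-checked on random vectors, and the all-ones vector shows that the factor $n$ between L1 and L∞ is actually reached. This is a standard-library sketch; the dimension, seed, and tolerance are arbitrary assumptions.

```python
import math
import random

random.seed(0)   # arbitrary seed
n = 6            # arbitrary dimension
EPS = 1e-9       # slack for floating-point round-off

bounds_hold = True
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    linf = max(abs(v) for v in x)
    bounds_hold = bounds_hold and (
        linf <= l2 + EPS and l2 <= math.sqrt(n) * linf + EPS
        and l2 <= l1 + EPS and l1 <= math.sqrt(n) * l2 + EPS
        and linf <= l1 + EPS and l1 <= n * linf + EPS
    )

# Tightness: for the all-ones vector, ||x||_1 / ||x||_inf equals n exactly.
ones = [1.0] * n
ratio_l1_linf = sum(abs(v) for v in ones) / max(abs(v) for v in ones)

print(bounds_hold, ratio_l1_linf)
```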

Unit Ball: Geometric Interpretation

The unit ball of a norm is the set $B = \{\vec{x} : \|\vec{x}\| \leq 1\}$. Its shape reveals the geometric character of the norm.

| Norm | Unit Ball Shape | Geometry |
|---|---|---|
| L1 | Diamond (rotated square in 2D) | Vertices at $(\pm 1, 0)$ and $(0, \pm 1)$ |
| L2 | Circle / Sphere | Smooth, rotationally symmetric |
| L∞ | Square / Hypercube | Vertices at $(\pm 1, \pm 1)$ |
| Lp ($1 < p < \infty$) | Rounded shape (superellipse) | Interpolates between the diamond ($p \to 1$) and the square ($p \to \infty$), passing through the circle at $p = 2$ |

The L1 unit ball has "pointy" vertices on the coordinate axes, which explains why L1 optimization tends to produce sparse solutions: the optimal point is more likely to land on a vertex (or, in higher dimensions, a low-dimensional face) where some coordinates are exactly zero.

Norms in Optimization

Regularized Loss Function

$$\min_{\vec{w}} \mathcal{L}(\vec{w}) + \lambda \|\vec{w}\|_p^p$$

Here,

  • $\mathcal{L}(\vec{w})$ = Loss function (e.g., squared error)
  • $\lambda$ = Regularization strength
  • $\|\vec{w}\|_p^p$ = p-norm penalty (p=1 or 2 common)

| Penalty | Name | Effect |
|---|---|---|
| $\lambda \lVert \vec{w} \rVert_1$ | Lasso | Sparse solutions, automatic feature selection |
| $\lambda \lVert \vec{w} \rVert_2^2$ | Ridge | Small weights, no feature selection |
| $\lambda_1 \lVert \vec{w} \rVert_1 + \lambda_2 \lVert \vec{w} \rVert_2^2$ | Elastic Net | Combines sparsity and smoothness |
| $\lambda \lVert \vec{w} \rVert_\infty$ | Minimax | Bounded maximum coefficient |
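The difference between the Lasso and Ridge rows is easiest to see in one dimension, where both penalized problems have closed-form minimizers. The sketch below assumes the simplified loss $\tfrac{1}{2}(w - z)^2$ for a single weight; the function names and the sample values are illustrative, not part of any library.

```python
def lasso_1d(z, lam):
    """argmin over w of 0.5*(w - z)**2 + lam*abs(w): soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0          # small inputs are set exactly to zero

def ridge_1d(z, lam):
    """argmin over w of 0.5*(w - z)**2 + lam*w**2: proportional shrinkage."""
    return z / (1.0 + 2.0 * lam)

z_values = [3.0, -2.0, 0.4, -0.1]   # hypothetical unregularized weights
lam = 0.5

lasso_w = [lasso_1d(z, lam) for z in z_values]
ridge_w = [ridge_1d(z, lam) for z in z_values]

print("lasso:", lasso_w)
print("ridge:", ridge_w)
```

The L1 solution zeroes every input with $|z| \leq \lambda$, while the L2 solution only rescales, so a non-zero input stays non-zero.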

Distance Metrics

A norm $\|\cdot\|$ induces a distance metric $d(\vec{x}, \vec{y}) = \|\vec{x} - \vec{y}\|$ that satisfies:

  1. Non-negativity: $d(\vec{x}, \vec{y}) \geq 0$
  2. Identity: $d(\vec{x}, \vec{y}) = 0 \iff \vec{x} = \vec{y}$
  3. Symmetry: $d(\vec{x}, \vec{y}) = d(\vec{y}, \vec{x})$
  4. Triangle inequality: $d(\vec{x}, \vec{z}) \leq d(\vec{x}, \vec{y}) + d(\vec{y}, \vec{z})$

| Distance | Formula | Use Case |
|---|---|---|
| Manhattan ($L_1$) | $d_1 = \sum \lvert x_i - y_i \rvert$ | Grid-based movement, high-dimensional data |
| Euclidean ($L_2$) | $d_2 = \sqrt{\sum (x_i - y_i)^2}$ | Geometric distance, clustering |
| Chebyshev ($L_\infty$) | $d_\infty = \max \lvert x_i - y_i \rvert$ | Warehouse logistics, robotics |
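All three distances are the same construction with a different $p$, which a single helper can make explicit. This is a standard-library sketch; the function name `minkowski_distance` and the sample points are assumptions for illustration.

```python
import math

def minkowski_distance(x, y, p):
    """Distance induced by the Lp norm; p is a number >= 1 or math.inf."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

a, b, c = (0, 0), (3, 4), (6, 0)

manhattan = minkowski_distance(a, b, 1)
euclidean = minkowski_distance(a, b, 2)
chebyshev = minkowski_distance(a, b, math.inf)

# Triangle inequality d(a, c) <= d(a, b) + d(b, c) for each choice of p.
triangle_ok = all(
    minkowski_distance(a, c, p)
    <= minkowski_distance(a, b, p) + minkowski_distance(b, c, p)
    for p in (1, 2, math.inf)
)

print(manhattan, euclidean, chebyshev, triangle_ok)
```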

Python Implementation

import numpy as np

# --- Vector Norms ---
x = np.array([1, -2, 3, -4])

l1 = np.linalg.norm(x, ord=1)           # L1: 10.0
l2 = np.linalg.norm(x, ord=2)           # L2: sqrt(30) ≈ 5.477
linf = np.linalg.norm(x, ord=np.inf)    # L∞: 4.0
l4 = np.linalg.norm(x, ord=4)           # L4: 354^(1/4) ≈ 4.34

print(f"L1: {l1}, L2: {l2:.4f}, L∞: {linf}, L4: {l4:.4f}")

# --- Matrix Norms ---
A = np.array([[1, 2], [3, 4]])

frob = np.linalg.norm(A, ord='fro')     # Frobenius: sqrt(30) ≈ 5.477
spectral = np.linalg.norm(A, ord=2)     # Spectral: largest singular value ≈ 5.465
nuclear = np.linalg.norm(A, ord='nuc')  # Nuclear: sum of singular values ≈ 5.831

print(f"Frobenius: {frob:.4f}")
print(f"Spectral: {spectral:.4f}")
print(f"Nuclear: {nuclear:.4f}")

# --- Induced Norms ---
induced1 = np.linalg.norm(A, ord=1)     # Induced 1-norm: max column sum
induced_inf = np.linalg.norm(A, ord=np.inf)  # Induced ∞-norm: max row sum
print(f"Induced 1-norm: {induced1}")
print(f"Induced ∞-norm: {induced_inf}")

# --- Distance Computation ---
from scipy.spatial.distance import cdist  # pairwise distances between two point sets

points = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
dist_l1 = cdist(points, points, metric='cityblock')   # Manhattan
dist_l2 = cdist(points, points, metric='euclidean')   # Euclidean
dist_linf = cdist(points, points, metric='chebyshev') # Chebyshev

# --- Regularization Comparison ---
from sklearn.linear_model import Lasso, Ridge

np.random.seed(42)
X = np.random.randn(100, 10)
true_coef = np.zeros(10)
true_coef[:3] = [3, -2, 1]  # Only 3 non-zero features
y = X @ true_coef + np.random.randn(100) * 0.1

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

print(f"Lasso coefficients: {np.round(lasso.coef_, 3)}")  # Sparse
print(f"Ridge coefficients: {np.round(ridge.coef_, 3)}")  # Small but non-zero

Applications in AI/ML

L1 Regularization (Lasso): The L1 norm penalty $\lambda \|\vec{w}\|_1$ drives some weights to exactly zero, performing automatic feature selection. This is critical in high-dimensional settings where only a subset of features matter (genomics, NLP feature selection).

L2 Regularization (Ridge): The L2 norm penalty $\lambda \|\vec{w}\|_2^2$ shrinks all weights toward zero but generally does not set them exactly to zero. It reduces overfitting and often improves generalization. It is a common default penalty for linear models.

Adversarial Robustness: The L∞ norm measures the maximum per-component perturbation allowed in adversarial examples. Adversarial training with projected gradient descent (PGD) approximately solves the inner maximization $\max_{\|\delta\|_\infty \leq \epsilon} \mathcal{L}(x + \delta, y)$.
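The L∞ constraint is convenient because projecting onto the L∞ ball is just component-wise clipping. The sketch below shows one commonly used form of the PGD update (a signed-gradient step followed by projection). The gradient vector is a made-up placeholder held fixed for illustration, whereas a real attack would recompute it from the model at every step; `project_linf` and `pgd_step` are hypothetical helper names.

```python
import numpy as np

def project_linf(delta, eps):
    """Project delta onto the L-infinity ball of radius eps (component-wise clip)."""
    return np.clip(delta, -eps, eps)

def pgd_step(delta, grad, step_size, eps):
    """One signed-gradient ascent step on the loss, followed by projection."""
    return project_linf(delta + step_size * np.sign(grad), eps)

eps, step_size = 0.1, 0.04
delta = np.zeros(4)
grad = np.array([0.5, -2.0, 0.0, 1.5])   # hypothetical loss gradient w.r.t. the input

for _ in range(5):
    delta = pgd_step(delta, grad, step_size, eps)

print(delta, np.max(np.abs(delta)))
```

After a few steps every component with a non-zero gradient saturates at $\pm\epsilon$, so the perturbation sits on the boundary of the L∞ ball.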

Gradient Clipping: In deep learning, gradients are clipped by norm: if $\|\vec{g}\| > \tau$, then $\vec{g} \leftarrow \tau \cdot \vec{g} / \|\vec{g}\|$. This rescales the gradient to norm $\tau$ while keeping its direction, which limits exploding gradients and stabilizes training.
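The clipping rule is a two-line function. This is a minimal NumPy sketch using the L2 norm; the name `clip_by_norm` and the example gradients are illustrative assumptions rather than any framework's API.

```python
import numpy as np

def clip_by_norm(g, tau):
    """Rescale g to have L2 norm tau if its norm exceeds tau; otherwise return it unchanged."""
    norm = np.linalg.norm(g)
    if norm > tau:
        return g * (tau / norm)
    return g

g_large = np.array([3.0, 4.0])   # norm 5, above the threshold
g_small = np.array([0.3, 0.4])   # norm 0.5, below the threshold

clipped_large = clip_by_norm(g_large, tau=1.0)
clipped_small = clip_by_norm(g_small, tau=1.0)

print(clipped_large, np.linalg.norm(clipped_large))
print(clipped_small)
```

Only the length changes: the clipped gradient points in the same direction as the original.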

Matrix Completion (Netflix Prize): The nuclear norm $\|A\|_*$ is minimized to recover low-rank matrices from partial observations: $\min_{X} \|X\|_*$ subject to $X$ matching the observed entries.

Common Mistakes

| Mistake | Why It's Wrong | Correct Approach |
|---|---|---|
| Using L0 norm for optimization | L0 is non-convex, NP-hard | Use L1 as convex relaxation |
| Confusing $\lVert A \rVert_F$ with $\lVert A \rVert_2$ | Frobenius is the root-sum-of-squares of all singular values, spectral takes the max | $\lVert A \rVert_2 \leq \lVert A \rVert_F \leq \sqrt{r} \lVert A \rVert_2$, where $r = \text{rank}(A)$ |
| Forgetting $\lVert c\vec{x} \rVert = \lvert c \rvert \lVert \vec{x} \rVert$ | Homogeneity requires absolute value on scalar | $\lVert -3\vec{x} \rVert = 3 \lVert \vec{x} \rVert$, not $-3 \lVert \vec{x} \rVert$ |
| Assuming all norms are equivalent in infinite dimensions | Norm equivalence requires finite dimensions | In function spaces, different norms define different topologies |
| Using L2 norm for sparse feature selection | L2 shrinks but doesn't zero out features | Use L1 (Lasso) or Elastic Net |
| Ignoring norm when computing condition number | $\kappa(A) = \lVert A \rVert \cdot \lVert A^{-1} \rVert$ depends on the norm | Choose the norm appropriate for your error metric |

Interview Questions

Q1: Why does L1 regularization produce sparse solutions while L2 does not?

Solution

Geometrically, the L1 unit ball is a diamond with vertices on the axes. The level curves of the loss function are more likely to first touch the L1 ball at a vertex, where some coordinates are exactly zero. The L2 ball is a circle, so level curves typically touch it at points where all coordinates are non-zero.

Q2: What is the relationship between the spectral norm and the Frobenius norm?

Solution

For any matrix $A$: $\|A\|_2 \leq \|A\|_F \leq \sqrt{r} \cdot \|A\|_2$, where $r = \text{rank}(A)$. The spectral norm equals the largest singular value, while the Frobenius norm equals the root-sum-of-squares of all singular values.

Q3: When would you use the nuclear norm instead of the Frobenius norm?

Solution

Use the nuclear norm when you want to encourage low-rank structure in a matrix. The nuclear norm is the tightest convex relaxation of the rank function. Applications include matrix completion (e.g., recommender systems), denoising, and dimensionality reduction.

Q4: Prove that the L∞ norm is indeed a norm on $\mathbb{R}^n$.

Solution

  1. Non-negativity: $\max |x_i| \geq 0$ since each $|x_i| \geq 0$.
  2. Definiteness: $\max |x_i| = 0 \implies x_i = 0$ for all $i \implies \vec{x} = \vec{0}$. Conversely, $\vec{x} = \vec{0}$ gives $\max |x_i| = 0$.
  3. Homogeneity: $\|\alpha \vec{x}\|_\infty = \max |\alpha x_i| = |\alpha| \max |x_i| = |\alpha| \|\vec{x}\|_\infty$.
  4. Triangle inequality: $\|\vec{x} + \vec{y}\|_\infty = \max |x_i + y_i| \leq \max (|x_i| + |y_i|) \leq \max |x_i| + \max |y_i| = \|\vec{x}\|_\infty + \|\vec{y}\|_\infty$.

Q5: What is the condition number of a matrix, and why does the norm matter?

Solution

The condition number is $\kappa(A) = \|A\| \cdot \|A^{-1}\|$. It measures how sensitive the solution of $A\vec{x} = \vec{b}$ is to perturbations in $\vec{b}$: the relative error in $\vec{x}$ can be up to $\kappa(A)$ times the relative error in $\vec{b}$. A large condition number indicates an ill-conditioned problem. The value depends on the norm chosen; typically the spectral norm or the L∞ norm is used.
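To see the norm dependence concretely, the condition number of a small matrix can be computed in three induced norms straight from the definition. This is a NumPy sketch on the matrix $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, chosen here only as an example; the dictionary `kappa` is an illustrative name.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
A_inv = np.linalg.inv(A)

# kappa(A) = ||A|| * ||A^{-1}|| in the induced 1-, 2-, and infinity-norms
kappa = {}
for name, order in [("1", 1), ("2", 2), ("inf", np.inf)]:
    kappa[name] = np.linalg.norm(A, order) * np.linalg.norm(A_inv, order)
    print(f"kappa_{name}(A) = {kappa[name]:.4f}")

# NumPy's built-in condition number in the 2-norm, for comparison
kappa_2_builtin = np.linalg.cond(A, 2)
```

For this matrix the 1-norm and ∞-norm condition numbers are both 21, while the 2-norm value is smaller (about 14.93), so the same matrix looks more or less ill-conditioned depending on the norm.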

Q6: How do norms relate to convergence in optimization algorithms?

Solution

Convergence of an iterative algorithm $\vec{x}^{(k)} \to \vec{x}^*$ is defined with respect to a norm: $\|\vec{x}^{(k)} - \vec{x}^*\| \to 0$. In finite dimensions, convergence in one norm implies convergence in all norms. However, the rate of convergence (and practical numerical behavior) can differ significantly between norms.

Practice Problems

Problem 1: Compute the L1, L2, L∞, and L4 norms of $\vec{x} = \begin{bmatrix} 3 \\ -4 \\ 0 \\ 5 \end{bmatrix}$.

Solution

$$\|\vec{x}\|_1 = 3 + 4 + 0 + 5 = 12$$
$$\|\vec{x}\|_2 = \sqrt{9 + 16 + 0 + 25} = \sqrt{50} = 5\sqrt{2} \approx 7.07$$
$$\|\vec{x}\|_\infty = \max(3, 4, 0, 5) = 5$$
$$\|\vec{x}\|_4 = (81 + 256 + 0 + 625)^{1/4} = 962^{1/4} \approx 5.57$$

Verify: $\|\vec{x}\|_\infty \leq \|\vec{x}\|_4 \leq \|\vec{x}\|_2 \leq \|\vec{x}\|_1$: $5 \leq 5.57 \leq 7.07 \leq 12$ ✓

Problem 2: Verify the Cauchy-Schwarz inequality for $\vec{x} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}$.

Solution

$$\vec{x} \cdot \vec{y} = 1(4) + 2(5) + 3(6) = 32$$
$$\|\vec{x}\|_2 = \sqrt{1 + 4 + 9} = \sqrt{14} \approx 3.74$$
$$\|\vec{y}\|_2 = \sqrt{16 + 25 + 36} = \sqrt{77} \approx 8.77$$
$$|\vec{x} \cdot \vec{y}| = 32 \leq \sqrt{14} \cdot \sqrt{77} = \sqrt{1078} \approx 32.83$$

Cauchy-Schwarz holds: $32 \leq 32.83$ ✓

Problem 3: Compute the Frobenius, spectral, and nuclear norms of $A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$.

Solution

$$\|A\|_F = \sqrt{4 + 0 + 0 + 9} = \sqrt{13} \approx 3.61$$

Since $A$ is diagonal, its singular values are $|2| = 2$ and $|3| = 3$.

$$\|A\|_2 = \sigma_{\max} = 3$$
$$\|A\|_* = \sigma_1 + \sigma_2 = 2 + 3 = 5$$

Problem 4: Show that for any vector $\vec{x} \in \mathbb{R}^n$: $\|\vec{x}\|_\infty \leq \|\vec{x}\|_2 \leq \sqrt{n} \|\vec{x}\|_\infty$.

Solution

Lower bound: Let $j = \arg\max_i |x_i|$. Then:

$$\|\vec{x}\|_2 = \sqrt{\sum_i x_i^2} \geq \sqrt{x_j^2} = |x_j| = \|\vec{x}\|_\infty$$

Upper bound: Since $|x_i| \leq \|\vec{x}\|_\infty$ for all $i$:

$$\|\vec{x}\|_2 = \sqrt{\sum_i x_i^2} \leq \sqrt{\sum_i \|\vec{x}\|_\infty^2} = \sqrt{n \|\vec{x}\|_\infty^2} = \sqrt{n} \|\vec{x}\|_\infty$$

Quick Reference

| Concept | Formula | Key Property |
|---|---|---|
| L1 Norm | $\lVert \vec{x} \rVert_1 = \sum \lvert x_i \rvert$ | Promotes sparsity |
| L2 Norm | $\lVert \vec{x} \rVert_2 = \sqrt{\sum x_i^2}$ | Promotes smoothness |
| L∞ Norm | $\lVert \vec{x} \rVert_\infty = \max \lvert x_i \rvert$ | Worst-case measure |
| Lp Norm | $\lVert \vec{x} \rVert_p = (\sum \lvert x_i \rvert^p)^{1/p}$ | Interpolates between L1 and L∞ |
| Frobenius | $\lVert A \rVert_F = \sqrt{\text{tr}(A^TA)}$ | Matrix Euclidean norm |
| Spectral | $\lVert A \rVert_2 = \sigma_{\max}(A)$ | Maximum stretch factor |
| Nuclear | $\lVert A \rVert_* = \sum \sigma_i$ | Low-rank relaxation |
| Induced 1-norm | $\lVert A \rVert_1 = \max_j \sum_i \lvert a_{ij} \rvert$ | Max column sum |
| Induced ∞-norm | $\lVert A \rVert_\infty = \max_i \sum_j \lvert a_{ij} \rvert$ | Max row sum |
| Condition Number | $\kappa(A) = \lVert A \rVert \cdot \lVert A^{-1} \rVert$ | Numerical stability |

Cross-References

  • Vector Spaces: Foundation for defining norms
  • Eigenvalues and Singular Values: Used to compute spectral and nuclear norms
  • Inner Products: Cauchy-Schwarz inequality connects norms to inner products
  • Optimization: Regularization, gradient descent, constrained optimization
  • Machine Learning: Lasso, Ridge, Elastic Net, adversarial robustness
  • Numerical Linear Algebra: Condition numbers, stability analysis
  • Clustering: K-means uses Euclidean norm, K-medians uses L1
  • Dimensionality Reduction: PCA minimizes Frobenius norm reconstruction error
⭐
