Finding the Optimal Boundary — Maximum Margin Classification
SVM finds the hyperplane that maximizes the margin between classes. It is theoretically elegant and powerful in high-dimensional spaces.
- Maximum Margin — The widest possible gap between classes
- Kernel Trick — Nonlinear classification without explicit transformation
- Support Vectors — The critical points that define the decision boundary
"The art of discovery consists of seeing what everyone has seen and thinking what nobody has thought."
Support Vector Machines — Complete Guide
SVM finds the optimal hyperplane that maximizes the margin between classes. It is one of the most theoretically elegant ML algorithms.
Maximum Margin Classifier
Definition: Support Vector Machine (SVM)
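Given training data $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ with $y_i \in \{-1, +1\}$, SVM finds the hyperplane $\mathbf{w}^\top \mathbf{x} + b = 0$ that maximizes the margin $2/\|\mathbf{w}\|$, subject to $y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1$ for all $i$. Maximizing $2/\|\mathbf{w}\|$ is equivalent to minimizing $\frac{1}{2}\|\mathbf{w}\|^2$, which turns the problem into a convex quadratic program.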
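As a minimal sketch, assuming scikit-learn and a hand-made separable toy dataset (chosen so the answer is known in advance: the widest gap is the line $x_1 = 0$, with margin 4), a linear SVC with a very large C approximates the hard-margin problem and recovers $\mathbf{w}$, $b$, and the margin $2/\|\mathbf{w}\|$:

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data (an illustrative assumption): the closest opposite-class
# points are (2, 0) and (-2, 0), so the maximum margin is 4.
X = np.array([[2.0, 0.0], [3.0, 1.0], [3.0, -1.0],
              [-2.0, 0.0], [-3.0, 1.0], [-3.0, -1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C leaves (almost) no room for slack, approximating a hard margin.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]          # normal vector of the hyperplane w^T x + b = 0
b = clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)

# Functional margins y_i (w^T x_i + b): all >= 1, and equal to 1 on support vectors.
functional = y * (X @ w + b)

print("w =", np.round(w, 3), " b =", round(b, 3))
print("margin =", round(margin, 3))
print("support vectors:", X[clf.support_].tolist())
print("y_i f(x_i) =", np.round(functional, 3))
```

Only the two closest points end up as support vectors; the other four satisfy the constraint with room to spare.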
Soft Margin SVM
Definition: Soft Margin SVM (C-SVM)
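For non-separable data, allow margin violations with slack variables $\xi_i \ge 0$:
$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0 \;\; \text{for all } i.$$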
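For a fixed hyperplane, the smallest feasible slack is $\xi_i = \max(0,\, 1 - y_i(\mathbf{w}^\top \mathbf{x}_i + b))$, which is the hinge loss. The NumPy sketch below uses a hand-picked $\mathbf{w}$, $b$, and four illustrative points (all assumptions, not fitted values) to evaluate the slacks and the soft-margin objective:

```python
import numpy as np

def slacks(w, b, X, y):
    """Smallest feasible slack per point: xi_i = max(0, 1 - y_i (w^T x_i + b))."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

def soft_margin_objective(w, b, X, y, C):
    """Primal C-SVM objective: 0.5 * ||w||^2 + C * sum_i xi_i."""
    return 0.5 * (w @ w) + C * slacks(w, b, X, y).sum()

# Hand-picked hyperplane x_1 = 0 and four illustrative points.
w = np.array([1.0, 0.0])
b = 0.0
X = np.array([[2.0, 0.0],    # outside the margin, correct side: xi = 0
              [0.5, 1.0],    # inside the margin, correct side: xi = 0.5
              [-0.5, 0.0],   # wrong side of the boundary: xi = 1.5
              [-3.0, 1.0]])  # outside the margin, correct side: xi = 0
y = np.array([1.0, 1.0, 1.0, -1.0])

xi = slacks(w, b, X, y)
print("xi =", xi)
print("objective (C=1): ", soft_margin_objective(w, b, X, y, C=1.0))
print("objective (C=10):", soft_margin_objective(w, b, X, y, C=10.0))
```

A slack between 0 and 1 means a correctly classified point inside the margin; a slack above 1 means a misclassified point. Raising C makes the same violations cost more, which is why larger C pushes the optimizer toward fewer of them.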
The C Parameter
- Large C: Less regularization, fewer margin violations (may overfit)
- Small C: More regularization, more margin violations (smoother boundary)
- $C \to \infty$: Hard margin SVM (requires perfectly separable data)
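The effect of C can be observed directly. This sketch assumes scikit-learn and two overlapping Gaussian blobs generated for illustration; it fits a linear SVC at three values of C and reports the margin width $2/\|\mathbf{w}\|$ and the number of support vectors:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping Gaussian blobs (illustrative data, not from the text).
X, y = make_blobs(n_samples=200, centers=[[-2.0, 0.0], [2.0, 0.0]],
                  cluster_std=1.5, random_state=0)

results = {}
for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])  # width of the gap, 2 / ||w||
    n_sv = len(clf.support_)                      # points on or inside the margin
    acc = clf.score(X, y)
    results[C] = (margin, n_sv, acc)
    print(f"C={C:>6}: margin={margin:.2f}, support vectors={n_sv}, train acc={acc:.3f}")
```

Small C tolerates many violations, so the margin is wide and many points become support vectors; large C shrinks the margin to reduce violations.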
The Kernel Trick
Definition: Kernel Trick
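The kernel trick allows SVM to learn nonlinear decision boundaries by implicitly mapping inputs $\mathbf{x}$ into a high-dimensional feature space $\phi(\mathbf{x})$ without explicitly computing the transformation. The dual formulation only needs dot products $\phi(\mathbf{x}_i)^\top \phi(\mathbf{x}_j)$, which a kernel function computes directly: $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^\top \phi(\mathbf{x}_j)$.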
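The claim that the dual needs only kernel evaluations can be checked against a fitted model. As a sketch, assuming scikit-learn's SVC (whose binary dual_coef_ attribute stores $\alpha_i y_i$ for each support vector) and make_moons toy data, the decision function $f(\mathbf{x}) = \sum_{i \in \text{SV}} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$ is rebuilt by hand from the support vectors alone:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
gamma = 1.0  # fixed explicitly so the same value can be reused below
clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)

# Kernel values between each support vector and each input point.
K = rbf_kernel(clf.support_vectors_, X, gamma=gamma)   # shape (n_SV, n_samples)

# Sum of alpha_i * y_i * K(x_i, x) over support vectors, plus the bias b.
f_manual = clf.dual_coef_ @ K + clf.intercept_         # shape (1, n_samples)
f_sklearn = clf.decision_function(X)

print("manual matches decision_function:", np.allclose(f_manual.ravel(), f_sklearn))
print(f"{len(clf.support_)} of {len(X)} training points are support vectors")
```

No feature map $\phi$ appears anywhere: only kernel values between support vectors and inputs are needed.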
Common Kernel Functions
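- Linear: $K(\mathbf{x}, \mathbf{z}) = \mathbf{x}^\top \mathbf{z}$ (no mapping; original space)
- Polynomial: $K(\mathbf{x}, \mathbf{z}) = (\gamma\, \mathbf{x}^\top \mathbf{z} + r)^d$ (polynomial features)
- RBF (Gaussian): $K(\mathbf{x}, \mathbf{z}) = \exp(-\gamma \|\mathbf{x} - \mathbf{z}\|^2)$ (infinite-dimensional feature space)
- Sigmoid: $K(\mathbf{x}, \mathbf{z}) = \tanh(\gamma\, \mathbf{x}^\top \mathbf{z} + r)$ (neural-network-like)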
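The four formulas translate directly into NumPy. This sketch cross-checks them against scikit-learn's pairwise kernel functions (where the offset $r$ is called coef0), using two arbitrary example vectors, and then confirms the kernel trick for the degree-2 homogeneous polynomial kernel in 2-D, whose explicit feature map is $\phi(\mathbf{v}) = (v_1^2, \sqrt{2}\, v_1 v_2, v_2^2)$:

```python
import numpy as np
from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                      rbf_kernel, sigmoid_kernel)

# Direct translations of the formulas above; r corresponds to sklearn's coef0.
def k_linear(x, z):
    return x @ z

def k_poly(x, z, gamma, r, d):
    return (gamma * (x @ z) + r) ** d

def k_rbf(x, z, gamma):
    diff = x - z
    return np.exp(-gamma * (diff @ diff))

def k_sigmoid(x, z, gamma, r):
    return np.tanh(gamma * (x @ z) + r)

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
X1, Z1 = x.reshape(1, -1), z.reshape(1, -1)  # sklearn expects 2-D arrays

pairs = {
    "linear": (k_linear(x, z), linear_kernel(X1, Z1)[0, 0]),
    "poly": (k_poly(x, z, gamma=0.5, r=1.0, d=3),
             polynomial_kernel(X1, Z1, degree=3, gamma=0.5, coef0=1.0)[0, 0]),
    "rbf": (k_rbf(x, z, gamma=0.5), rbf_kernel(X1, Z1, gamma=0.5)[0, 0]),
    "sigmoid": (k_sigmoid(x, z, gamma=0.1, r=0.0),
                sigmoid_kernel(X1, Z1, gamma=0.1, coef0=0.0)[0, 0]),
}
for name, (mine, ref) in pairs.items():
    print(f"{name:>7}: manual={mine:.6f}  sklearn={ref:.6f}")

# Explicit feature map for the degree-2 homogeneous polynomial kernel in 2-D:
# (x^T z)^2 equals phi(x)^T phi(z), computed here both ways.
def phi(v):
    return np.array([v[0] ** 2, np.sqrt(2.0) * v[0] * v[1], v[1] ** 2])

print("kernel:", k_poly(x, z, gamma=1.0, r=0.0, d=2), " explicit:", phi(x) @ phi(z))
```

Both routes give $(\mathbf{x}^\top \mathbf{z})^2 = 11^2 = 121$ (up to floating-point rounding), but the kernel never builds the 3-D feature vectors.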
RBF Kernel
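$$K(\mathbf{x}, \mathbf{z}) = \exp\left(-\gamma \|\mathbf{x} - \mathbf{z}\|^2\right) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{z}\|^2}{2\sigma^2}\right)$$
Here,
- $\gamma = 1/(2\sigma^2)$: inverse bandwidth; large $\gamma$ → more complex boundary
- $\sigma$: bandwidth parameter (the distance scale over which similarity decays)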
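A short NumPy sketch of the two parameterizations and of how $\gamma$ controls the reach of each point. The final lines assume scikit-learn's documented rule that gamma='scale' means $1 / (n_\text{features} \cdot \operatorname{Var}(X))$; the random data are purely illustrative:

```python
import numpy as np

def rbf(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

sigma = 1.0
gamma = 1.0 / (2.0 * sigma ** 2)  # gamma = 1 / (2 sigma^2) makes both forms agree

x = np.zeros(2)
for dist in [0.5, 1.0, 2.0]:
    z = np.array([dist, 0.0])
    same = np.isclose(rbf(x, z, gamma), np.exp(-dist ** 2 / (2 * sigma ** 2)))
    # Larger gamma (smaller sigma): similarity falls off faster with distance,
    # so each training point influences a smaller neighborhood.
    print(f"d={dist}: K(gamma=0.5)={rbf(x, z, 0.5):.4f}, "
          f"K(gamma=10)={rbf(x, z, 10.0):.2e}, matches sigma form: {same}")

# gamma='scale' is 1 / (n_features * X.var()); on standardized data the overall
# variance is 1, so it reduces to 1 / n_features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
gamma_scale = 1.0 / (X_std.shape[1] * X_std.var())
print("gamma='scale' on standardized 4-feature data:", round(gamma_scale, 4))
```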
[Figure: Kernel decision boundaries]
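As a rough numerical counterpart to comparing decision boundaries across kernels, this sketch (assuming scikit-learn and its make_moons toy dataset, which no straight line can separate) fits linear, polynomial, and RBF SVMs with otherwise default settings and compares held-out accuracy:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaving half-circles: a nonlinear boundary is needed.
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for kernel in ["linear", "poly", "rbf"]:
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)
    print(f"{kernel:>6}: test accuracy = {scores[kernel]:.3f}")
```

The linear kernel is limited to a straight cut through the two moons, while the RBF kernel can bend around them.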
Python Implementation
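```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Example data (an assumption for illustration; any numeric X, y works)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Linear SVM: the scaler is fit on the training data only, inside the pipeline
pipe_linear = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='linear', C=1.0))
])
pipe_linear.fit(X_train, y_train)
print(f"Linear: {pipe_linear.score(X_test, y_test):.3f}")

# RBF SVM (scikit-learn's default kernel; gamma='scale' is also the default)
pipe_rbf = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='rbf', C=1.0, gamma='scale'))
])
pipe_rbf.fit(X_train, y_train)
print(f"RBF: {pipe_rbf.score(X_test, y_test):.3f}")
```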
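C and gamma are usually tuned together by cross-validation. A sketch using scikit-learn's GridSearchCV on the same kind of pipeline; the breast-cancer dataset and the grid values are illustrative assumptions, and pipeline step parameters are addressed as "<step name>__<parameter>":

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svm", SVC(kernel="rbf")),
])

# Log-spaced candidate values (illustrative; widen the grid if the best
# value lands on an edge).
param_grid = {
    "svm__C": [0.1, 1, 10, 100],
    "svm__gamma": ["scale", 0.01, 0.1, 1],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print(f"CV accuracy: {search.best_score_:.3f}")
print(f"test accuracy: {search.score(X_test, y_test):.3f}")
```

Because the scaler sits inside the pipeline, it is refit on each cross-validation training fold, so the validation folds never leak into the scaling statistics.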
Always Scale Features
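SVM is sensitive to feature magnitudes because both the margin and kernels such as RBF are computed from dot products and distances, so a feature with a large numeric range dominates them. Always standardize features (zero mean, unit variance) before training an SVM, and fit the scaler on the training data only; placing StandardScaler inside a Pipeline, as in the code above, handles this.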
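A tiny illustration of the problem, using hypothetical data where feature 0 is a 0/1 flag and feature 1 is an income in dollars. Before scaling, a full-range change in the flag is invisible next to the income; after StandardScaler, both full-range changes count equally:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: feature 0 is a 0/1 flag, feature 1 an income in dollars.
X = np.array([[0.0, 40000.0],
              [1.0, 60000.0],
              [0.0, 60000.0],
              [1.0, 40000.0]])

def dist(A, i, j):
    """Euclidean distance between rows i and j of A."""
    return np.linalg.norm(A[i] - A[j])

X_scaled = StandardScaler().fit_transform(X)  # per-column zero mean, unit variance

# Rows 0 and 3 differ only in feature 0 (across its full range);
# rows 0 and 2 differ only in feature 1 (across its full range).
print("raw:    d(0,3) =", dist(X, 0, 3), " d(0,2) =", dist(X, 0, 2))
print("scaled: d(0,3) =", dist(X_scaled, 0, 3), " d(0,2) =", dist(X_scaled, 0, 2))
```

Raw distances are 1 versus 20000; after scaling both become 2, so an RBF kernel would no longer ignore the flag.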
Key Takeaways
Summary: SVM
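- SVM finds the maximum margin hyperplane: $\min_{\mathbf{w}, b} \frac{1}{2}\|\mathbf{w}\|^2$ s.t. $y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1$
- Support vectors are the training points with nonzero dual coefficients $\alpha_i$: points on the margin boundary and, with a soft margin, points inside the margin or misclassified. Only they determine the decision boundary
- Kernel trick enables nonlinear classification: $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^\top \phi(\mathbf{x}_j)$
- RBF kernel is the common default (and scikit-learn's SVC default); it implicitly maps to an infinite-dimensional space
- C parameter controls regularization: large C means fewer margin violations and less regularization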
- Always scale features — SVM is distance-based
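- SVMs work well for high-dimensional data, even when the number of features exceeds the number of samples ($d > n$)
- Slow for large datasets: kernel SVM training scales between roughly $O(n^2)$ and $O(n^3)$ in the number of samples $n$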
What to Learn Next
- Logistic Regression: Classification with probability, from linear to sigmoid.
- Naive Bayes: Bayes' theorem in action; fast, simple, surprisingly powerful.
- Dimensionality Reduction: Reduce features while preserving information with PCA and t-SNE.