ML Cheatsheet — Quick Reference Guide

ML Cheatsheet — Everything You Need in One Place

Your comprehensive quick reference for machine learning concepts, algorithms, formulas, and best practices. Perfect for interviews and daily work.

  • Algorithm Summaries — Quick reference for all major ML algorithms
  • Formula Reference — Mathematical foundations at your fingertips
  • Best Practices — Proven guidelines for ML projects

"Knowledge is power, but organized knowledge is superpower."

A comprehensive quick reference for machine learning algorithms, metrics, math, and Python code.


Algorithm Comparison Chart

Algorithm Comparison: When to Use What

| Algorithm | Type | Pros | Cons | Best For |
|---|---|---|---|---|
| Linear Regression | Linear | Simple, interpretable | Assumes linearity | Baselines, interpretable models |
| Logistic Regression | Linear | Outputs probabilities, fast | Linear decision boundary | Binary classification |
| Random Forest | Ensemble | Robust; some implementations handle missing values | Less interpretable | Default choice for tabular data |
| XGBoost | Ensemble | Often top accuracy on tabular data, fast | Sensitive to hyperparameters | Competitions, tabular data |
| SVM | Kernel | Effective in high dimensions | Slow on large datasets | Small to medium datasets |
| Neural Network | Deep | Universal approximator | Needs lots of data | Images, text, speech |
| K-Means | Clustering | Simple, scalable | Must specify K | Customer segmentation |
| DBSCAN | Clustering | Finds clusters of arbitrary shape | Struggles with varying density | Anomaly detection |

Decision Tree: Model Selection

Model Selection Decision Tree

What type of problem?

  • Supervised (labeled data)
      • Classification
          • Binary: LogReg, SVM, XGB
          • Multi-class: RF, XGB, NN
      • Regression
          • Continuous: LinReg, XGB, NN
  • Unsupervised (no labels)
      • Clustering
          • K-Means: known K
          • DBSCAN: unknown K
      • Dim. Reduction
          • PCA, t-SNE: visualization

Golden Rules: Start simple (linear) → add complexity if needed. Feature engineering > algorithm choice. Cross-validate everything.

Classification Metrics

Metrics for evaluating classification models, measuring performance across different aspects like accuracy, precision, recall, and their trade-offs.

| Metric | Formula | When to Use |
|---|---|---|
| Accuracy | $(TP+TN)/(TP+TN+FP+FN)$ | Balanced classes |
| Precision | $TP/(TP+FP)$ | False positives are costly (e.g., spam filtering) |
| Recall | $TP/(TP+FN)$ | False negatives are costly (e.g., cancer screening) |
| F1 Score | $2 \cdot \frac{P \cdot R}{P + R}$, with $P$ = precision and $R$ = recall | Imbalanced classes |
| AUC-ROC | Area under ROC curve | Ranking quality |
| Log Loss | $-\frac{1}{N}\sum[y\log p + (1-y)\log(1-p)]$ | Probabilistic predictions |
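
The formulas in the table above can be checked on a small example. The following is a minimal sketch using only the Python standard library; the function names `classification_metrics` and `log_loss` and the confusion-matrix counts are hypothetical, chosen to show how accuracy can look strong on imbalanced data while precision and recall stay modest.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Hypothetical imbalanced dataset: 1000 samples, only 50 actual positives.
metrics = classification_metrics(tp=30, tn=930, fp=20, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# A model that always predicts p = 0.5 has log loss ln(2), about 0.693.
print(f"log loss: {log_loss([1, 0], [0.5, 0.5]):.3f}")
```

Here accuracy is 0.96 while precision, recall, and F1 are all about 0.6, which is why the table recommends F1 over accuracy for imbalanced classes.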

Regression Metrics

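| Metric | Formula | Interpretation |
|---|---|---|
| MSE | $\frac{1}{N}\sum(y_i - \hat{y}_i)^2$ | Penalizes large errors |
| RMSE | $\sqrt{MSE}$ | Same units as target |
| MAE | $\frac{1}{N}\sum\lvert y_i - \hat{y}_i\rvert$ | More robust to outliers than MSE |
| R² | $1 - \frac{SS_{res}}{SS_{tot}}$ | Fraction of variance explained (at most 1; can be negative for a model worse than predicting the mean) |
| MAPE | $\frac{100\%}{N}\sum\left\lvert\frac{y-\hat{y}}{y}\right\rvert$ | Percentage error (undefined when $y = 0$) |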
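
As a worked illustration of the regression formulas above, here is a minimal standard-library sketch. The function name `regression_metrics` and the four data points are made up for the example; it assumes no true value is zero (for MAPE) and that the targets are not all identical (for R²).

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, R^2, and MAPE for paired true/predicted values."""
    n = len(y_true)
    errors = [y - y_hat for y, y_hat in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
        # MAPE divides by y, so it is undefined if any true value is 0.
        "mape": 100 / n * sum(abs(e / y) for e, y in zip(errors, y_true)),
    }


# Hypothetical data, chosen so the arithmetic is easy to follow by hand.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.0, 9.0]
results = regression_metrics(y_true, y_pred)
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

With these numbers the squared errors sum to 1.5, so MSE is 0.375, MAE is 0.5, R² is 1 - 1.5/20 = 0.925, and MAPE is 12.5%.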

Math Quick Reference

Linear Algebra

Dot product: $\mathbf{a} \cdot \mathbf{b} = \sum_i a_i b_i$

Matrix multiply: $(AB)_{ij} = \sum_k A_{ik}B_{kj}$

Norm: $\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}$

Eigenvalue: $A\mathbf{v} = \lambda\mathbf{v}$
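
The four identities above map directly onto a few lines of code. This is a sketch in plain Python (in practice a library such as numpy would be used); the helper names `dot`, `matmul`, and `l2_norm` and the example matrix are illustrative assumptions.

```python
import math


def dot(a, b):
    """Dot product: sum of elementwise products."""
    return sum(x * y for x, y in zip(a, b))


def matmul(A, B):
    """Matrix product: entry (i, j) is the dot product of row i of A and column j of B."""
    columns_of_B = list(zip(*B))
    return [[dot(row, col) for col in columns_of_B] for row in A]


def l2_norm(x):
    """Euclidean (L2) norm: square root of the sum of squares."""
    return math.sqrt(dot(x, x))


print(dot([1, 2, 3], [4, 5, 6]))                    # 1*4 + 2*5 + 3*6 = 32
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]
print(l2_norm([3, 4]))                              # 5.0

# Eigenvalue check: for this symmetric matrix, v = [1, 1] is an
# eigenvector with eigenvalue 3, so A v should equal 3 * v.
A = [[2, 1], [1, 2]]
v = [1, 1]
Av = [dot(row, v) for row in A]
print(Av)                                           # [3, 3]
```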

Calculus

Power rule: $\frac{d}{dx}x^n = nx^{n-1}$

Chain rule: $\frac{d}{dx}f(g(x)) = f'(g(x)) \cdot g'(x)$

Gradient: $\nabla f = \left[\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right]$
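
One way to see the power rule, chain rule, and gradient working together is to compare a hand-derived gradient against a finite-difference estimate. The sketch below does this for a made-up function $f(x_1, x_2) = (x_1^2 + x_2)^3$; the function names and the step size `h` are assumptions chosen for illustration.

```python
def numerical_gradient(f, x, h=1e-5):
    """Central-difference estimate of the gradient of f at point x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def f(x):
    """Example function: outer cube applied to the inner term g = x1^2 + x2."""
    return (x[0] ** 2 + x[1]) ** 3


def analytic_gradient(x):
    """Gradient via the chain rule: d/dx g^3 = 3 g^2 * dg/dx."""
    g = x[0] ** 2 + x[1]
    outer = 3 * g ** 2               # power rule on the outer function
    return [outer * 2 * x[0],        # dg/dx1 = 2 * x1
            outer * 1.0]             # dg/dx2 = 1


point = [1.0, 2.0]
exact = analytic_gradient(point)     # g = 3, so the gradient is [54.0, 27.0]
approx = numerical_gradient(f, point)
print("analytic :", exact)
print("numerical:", approx)
```

The same comparison is a common sanity check for hand-written gradients before using them in an optimizer.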

Probability

Bayes' theorem: $P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$

Expected value: $E[X] = \sum_x x \cdot P(x)$

Variance: $\text{Var}(X) = E[(X-\mu)^2] = E[X^2] - (E[X])^2$

Normal distribution: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
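
A short numeric example may make these probability formulas more concrete. The sketch below uses only the standard library; the diagnostic-test numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) and the helper names are hypothetical.

```python
import math


def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(A | B) from P(A), P(B | A), and P(B | not A), via Bayes' theorem."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)  # P(B)
    return likelihood * prior / evidence


def expected_value(dist):
    """E[X] for a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = E[X^2] - (E[X])^2."""
    second_moment = sum(x * x * p for x, p in dist.items())
    return second_moment - expected_value(dist) ** 2


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    coeff = 1 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


# Hypothetical test: even with 95% sensitivity, a positive result implies
# only about a 16% chance of disease when prevalence is 1%.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(disease | positive) = {posterior:.3f}")

# Fair six-sided die: E[X] = 3.5 and Var(X) = 35/12.
die = {face: 1 / 6 for face in range(1, 7)}
print(f"E[X] = {expected_value(die):.3f}, Var(X) = {variance(die):.3f}")

# Standard normal density at its mean is 1 / sqrt(2 * pi), about 0.399.
print(f"f(0) = {normal_pdf(0.0):.3f}")
```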


Python Libraries

  • Data: pandas, numpy
  • Visualization: matplotlib, seaborn, plotly
  • ML: scikit-learn, xgboost, lightgbm
  • Deep Learning: pytorch, tensorflow, keras
  • NLP: transformers, spacy, nltk
  • CV: opencv, torchvision
  • AutoML: auto-sklearn, optuna
  • Deployment: fastapi, flask, streamlit
  • Experiment: mlflow, wandb

Key Takeaways

Summary: ML Cheatsheet

  • Start simple — linear models as baselines before complex ones
  • Feature engineering matters more than algorithm choice
  • Cross-validate everything — never trust a single train/test split
  • Regularize to reduce overfitting (L1 encourages sparse weights, L2 shrinks weights toward zero)
  • Scale features for distance-based algorithms (KNN, SVM, K-Means)
  • Ensemble multiple models for best performance (bagging, stacking)
  • Monitor models in production for data drift and performance degradation
  • Keep learning — the field evolves fast (new papers every week)
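
Two of the takeaways above, cross-validation and feature scaling, can be sketched without any libraries. The code below is a simplified illustration, not a replacement for a library such as scikit-learn; the function names `k_fold_indices` and `standardize` are hypothetical, the folds are contiguous (shuffle the data first if its order carries meaning), and scaling uses the population standard deviation.

```python
import math


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    The first n_samples % k folds get one extra sample so that
    every sample lands in exactly one test fold.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mean) / std for v in values]


folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)

z_scores = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
print([round(z, 3) for z in z_scores])
```

When the two are combined, the scaling statistics are usually computed on each training fold only and then applied to the matching test fold, so that no information leaks from the held-out data.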

What to Learn Next

  • What is Machine Learning? Complete Introduction
  • Linear Regression: Complete Guide with Math and Code
  • Model Evaluation: Metrics, Cross-Validation and Selection
  • Transformers: Attention Is All You Need Complete Guide
  • ML System Design: Architecture and Production Patterns
  • ML Interview Prep: Questions, Answers and System Design
