πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Logistic Regression

StatisticsClassification🟒 Free Lesson

Advertisement

Logistic Regression

Why It Matters

Logistic regression is the baseline classifier and the foundation for neural network classification. It models binary outcomes using the sigmoid function, producing interpretable odds ratios. Every data scientist must understand its coefficients, evaluation metrics, and relationship to more complex models. It is the starting point for any classification task.


Overview

Logistic regression models the probability that a binary outcome Y=1Y = 1 given features XX. Unlike linear regression, it uses the sigmoid function Οƒ(z)=1/(1+eβˆ’z)\sigma(z) = 1/(1 + e^{-z}) to map the linear predictor to a probability between 0 and 1. The model is linear in the log-odds space: log⁑(p/(1βˆ’p))=Ξ²0+Ξ²1x1+β‹―+Ξ²pxp\log(p/(1-p)) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p. Coefficients exponentiate to odds ratios (eΞ²je^{\beta_j}), providing intuitive effect size estimates. A classification threshold (typically 0.5) converts probabilities to binary predictions. The model is fitted via maximum likelihood estimation (MLE), not OLS.


Key Concepts

Logistic Regression Model

P(Y=1∣X)=Οƒ(Ξ²0+Ξ²1x1+β‹―+Ξ²pxp)P(Y=1|X) = \sigma(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)

Here,

  • Οƒ(z)\sigma(z)=Sigmoid function: $1 / (1 + e^{-z})$
  • Ξ²0,Ξ²1,…\beta_0, \beta_1, \ldots=Model coefficients
  • x1,…,xpx_1, \ldots, x_p=Input features

Log-Odds (Logit) Link

log⁑(p1βˆ’p)=Ξ²0+Ξ²1x1+β‹―+Ξ²pxp\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p

Here,

  • pp=$P(Y=1|X)$ β€” probability of class 1
  • p1βˆ’p\frac{p}{1-p}=Odds of the outcome

Odds Ratio

OR=eΞ²j\text{OR} = e^{\beta_j}

Here,

  • Ξ²j\beta_j=Coefficient for feature j
  • eΞ²je^{\beta_j}=Multiplicative effect on odds for one-unit increase in x_j

Log-Likelihood

β„“(Ξ²)=βˆ‘i=1n[yilog⁑(p^i)+(1βˆ’yi)log⁑(1βˆ’p^i)]\ell(\beta) = \sum_{i=1}^{n} [y_i \log(\hat{p}_i) + (1-y_i) \log(1-\hat{p}_i)]

Here,

  • yiy_i=Observed outcome (0 or 1)
  • p^i\hat{p}_i=Predicted probability for observation i

Classification Metrics

MetricFormulaUse Case
AccuracyTP+TNTP+TN+FP+FN\frac{TP+TN}{TP+TN+FP+FN}Balanced classes
PrecisionTPTP+FP\frac{TP}{TP+FP}Cost of false positive high
RecallTPTP+FN\frac{TP}{TP+FN}Cost of false negative high
F1 Score2β‹…Precisionβ‹…RecallPrecision+Recall\frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}Balance precision and recall
AUC-ROCArea under ROC curveThreshold-independent evaluation

Odds Ratio Interpretation

OR ValueInterpretation
OR = 1No association
OR > 1Positive association (increases odds)
OR < 1Negative association (decreases odds)
OR = 2Doubles the odds
OR = 0.5Halves the odds

Quick Example

Interpreting Logistic Coefficient

Ξ²^1=0.5\hat{\beta}_1 = 0.5 for age in a logistic regression predicting disease.

Odds ratio = e0.5β‰ˆ1.65e^{0.5} \approx 1.65. For each year increase in age, the odds of disease multiply by 1.65 (65% increase in odds). The 95% CI for the OR might be [1.2, 2.3], indicating the effect is statistically significant.

Threshold Trade-Off

A model predicts P(spam)=0.45P(\text{spam}) = 0.45 for an email. With threshold 0.5, it's classified as not spam. But if missing spam is costly (false negative), lower the threshold to 0.3 β€” catching more spam but also flagging more legitimate emails. The choice of threshold depends on the relative costs of false positives vs. false negatives.

Confusion Matrix

A classifier predicts 80 correct and 20 incorrect out of 100 samples:

Predicted PositivePredicted Negative
Actual PositiveTP = 45FN = 5
Actual NegativeFP = 15TN = 35

Accuracy = (45+35)/100 = 80%. Precision = 45/(45+15) = 75%. Recall = 45/(45+5) = 90%. F1 = 2(0.75)(0.9)/(0.75+0.9) = 81.8%.


Key Takeaways

Summary: Logistic Regression

  • Sigmoid Function: Maps any real number to (0,1)(0, 1), making it ideal for probability estimation.
  • Log-Odds Linear: The model is linear in log-odds space, not probability space. This avoids probabilities outside [0,1][0, 1].
  • Odds Ratio: eΞ²je^{\beta_j} gives the multiplicative effect on odds for a one-unit increase in xjx_j. The most interpretable output.
  • MLE Estimation: Fitted via maximum likelihood, not OLS. The log-likelihood is maximized.
  • Threshold: Default 0.5, but can be adjusted to trade off precision vs. recall based on business costs.
  • Evaluation: Use accuracy for balanced classes, precision when false positives are costly, recall when false negatives are costly, and F1 for balance. AUC-ROC for threshold-independent evaluation.
  • Foundation: Logistic regression is the baseline for binary classification. Neural networks generalize it with hidden layers and non-linear activations.

Deep Dive

For detailed explanations, worked examples, and Python implementations, explore the dedicated statistics lessons:

Logistic Regression

Odds Ratios

  • Odds Ratios β€” Interpreting coefficients as odds ratios, confidence intervals, and practical examples

Related Topics

⭐

Premium Content

Logistic Regression

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert Mathematics Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement