Logistic Regression — Binary Classification with Statistics

Statistical Foundations of Binary Classification

Logistic regression models the probability of a binary outcome through the logit (log-odds) link function. Maximum likelihood estimation, Wald tests, and likelihood ratio tests provide rigorous statistical inference for classification problems.

  • Medical Diagnosis — Predict disease presence from patient characteristics

  • Credit Scoring — Estimate default probability for loan applications

  • Customer Analytics — Model churn likelihood from behavioral features

The sigmoid function maps any linear combination to a valid probability.


Why Not Linear Regression for Binary Data?

Linear regression is inappropriate for binary responses $Y \in \{0, 1\}$ because:

  1. The errors $\varepsilon_i = Y_i - (\beta_0 + \beta_1 X_i)$ are not normally distributed: because $Y_i$ is Bernoulli, each error can take only two values for a given $X_i$.

  2. The predicted values $\hat{Y}$ can fall outside $[0, 1]$, which is impossible for probabilities.

  3. The variance is not constant: $\text{Var}(Y \mid X) = p(X)(1 - p(X))$, which depends on $X$.

Logistic regression solves these problems by modeling the probability through the logistic (sigmoid) function.
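
The second problem is easy to see on a small example. The sketch below fits ordinary least squares to a made-up data set (the values of `xs` and `ys` are hypothetical and chosen only for illustration) and checks the range of the fitted values.

```python
# Hypothetical toy data: one predictor x and a binary response y.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Ordinary least squares estimates for a single predictor.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Fitted values of the linear model at the observed x values.
fitted = [b0 + b1 * x for x in xs]

print("smallest fitted value:", min(fitted))  # falls below 0
print("largest fitted value:", max(fitted))   # rises above 1
```

On this data the straight line leaves the interval $[0, 1]$ at both ends, so its fitted values cannot be read as probabilities.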


The Logistic Model

Definition: Logistic Regression Model

Logistic regression models the probability of success as a function of predictors via the logit link function:

$$P(Y = 1 \mid \mathbf{X}) = \frac{\exp(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p)}{1 + \exp(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p)}$$

Equivalently, in terms of the log-odds (logit):

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$

where $p = P(Y = 1 \mid \mathbf{X})$.
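
The equivalence of the two forms follows in a few steps. As a sketch, write $\eta$ for the linear predictor (the symbol $\eta$ is introduced here for brevity and is not part of the notation above):

```latex
\begin{aligned}
\eta &= \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p \\
p &= \frac{e^{\eta}}{1 + e^{\eta}}, \qquad 1 - p = \frac{1}{1 + e^{\eta}} \\
\frac{p}{1 - p} &= e^{\eta} \\
\log\left(\frac{p}{1 - p}\right) &= \eta
\end{aligned}
```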

Logistic (Sigmoid) Function

$$p(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

Here,

  • $p(x)$ = probability of success given $x$
  • $\beta_0$ = intercept (log-odds when $x = 0$)
  • $\beta_1$ = coefficient for $x$
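
A minimal sketch of this function in Python follows. The names `sigmoid` and `p_of_x` and the coefficient values are hypothetical, chosen only to illustrate the formula. The two branches are the two algebraically equal forms above, each used where its exponential cannot overflow.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        # exp(-z) <= 1 here, so this form cannot overflow.
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form exp(z) / (1 + exp(z)).
    ez = math.exp(z)
    return ez / (1.0 + ez)

def p_of_x(x: float, beta0: float, beta1: float) -> float:
    """Probability of success for a single predictor x."""
    return sigmoid(beta0 + beta1 * x)

# Hypothetical coefficients, for illustration only.
beta0, beta1 = -3.0, 1.5

for x in (-10.0, 0.0, 2.0, 10.0):
    print(x, p_of_x(x, beta0, beta1))

# The intercept is the log-odds at x = 0.
p0 = p_of_x(0.0, beta0, beta1)
log_odds_at_zero = math.log(p0 / (1.0 - p0))
```

With these values the linear predictor is zero at $x = 2$, so the probability there is exactly 0.5, and `log_odds_at_zero` recovers `beta0` up to rounding.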

Odds and Odds Ratios

Definition: Odds

The odds of an event with probability $p$ are:

$$\text{Odds} = \frac{p}{1 - p}$$

Odds range from 0 to $\infty$. When $p = 0.5$, the odds are 1. When $p = 0.75$, the odds are 3 (3 to 1).
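
The two numerical cases above can be checked with a short sketch. The helper names `odds` and `prob_from_odds` are hypothetical.

```python
def odds(p: float) -> float:
    """Convert a probability in [0, 1) to odds p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)

def prob_from_odds(o: float) -> float:
    """Inverse conversion: odds o >= 0 back to a probability."""
    if o < 0.0:
        raise ValueError("odds must be non-negative")
    return o / (1.0 + o)

even_odds = odds(0.5)            # the p = 0.5 case
three_to_one = odds(0.75)        # the p = 0.75 case
back = prob_from_odds(three_to_one)
print(even_odds, three_to_one, back)
```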

Theorem: Interpretation of Coefficients via Odds Ratios

Exponentiating a coefficient gives the odds ratio (OR) for a one-unit increase in $X_j$, holding the other predictors fixed:

$$\text{OR} = e^{\beta_j} = \frac{\text{odds at } X_j + 1}{\text{odds at } X_j}$$

  • $\beta_j > 0 \Rightarrow \text{OR} > 1$: increasing $X_j$ increases the odds of success

  • $\beta_j < 0 \Rightarrow \text{OR} < 1$: increasing $X_j$ decreases the odds of success

  • $\beta_j = 0 \Rightarrow \text{OR} = 1$: $X_j$ has no effect on the odds
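
As an illustrative check, the sketch below computes the odds at $x$ and at $x + 1$ for a single-predictor model and compares their ratio with $e^{\beta_1}$. The coefficient values and helper names are hypothetical.

```python
import math

def prob(x: float, beta0: float, beta1: float) -> float:
    """P(Y = 1 | x) under a single-predictor logistic model."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def odds(p: float) -> float:
    return p / (1.0 - p)

# Hypothetical coefficients, for illustration only.
beta0, beta1 = -1.0, 0.7
odds_ratio = math.exp(beta1)

# Ratio of odds after a one-unit increase, at several starting points.
ratios = [
    odds(prob(x + 1.0, beta0, beta1)) / odds(prob(x, beta0, beta1))
    for x in (-2.0, 0.0, 3.0)
]
print(odds_ratio, ratios)
```

The ratio is the same at every starting value of $x$, which is why a single number summarizes the effect of a predictor on the odds scale. The change in probability, by contrast, depends on where $x$ starts.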


Goodness of Fit

| Metric | Definition | Interpretation |
|--------|-----------|----------------|
| McFadden's $R^2$ | $1 - \ell(\hat{\beta})/\ell(\hat{\beta}_0)$ | 0.2–0.4 is considered good |
| AIC | $-2\ell(\hat{\beta}) + 2p$ | Lower is better; penalizes complexity |
| BIC | $-2\ell(\hat{\beta}) + p \ln(n)$ | Lower is better; stronger penalty than AIC when $n \geq 8$ |
| Hosmer–Lemeshow | Compares observed vs. predicted frequencies in deciles | Non-significant $p$-value indicates good fit |
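
The first three rows can be computed directly from log-likelihoods. The sketch below uses a made-up response vector and made-up fitted probabilities (`p_hat` is hypothetical and does not come from an actual model fit). It reads the table's $p$ as the number of estimated parameters, which is an assumption, and takes $\hat{\beta}_0$ to be the intercept-only model, whose fitted probability is the sample proportion of successes.

```python
import math

def log_likelihood(y, p_hat):
    """Bernoulli log-likelihood of binary outcomes y under probabilities p_hat."""
    return sum(
        math.log(p) if yi == 1 else math.log(1.0 - p)
        for yi, p in zip(y, p_hat)
    )

# Hypothetical data and hypothetical fitted probabilities.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p_hat = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
n = len(y)

# Intercept-only (null) model: every observation gets the sample proportion.
p_null = sum(y) / n
ll_model = log_likelihood(y, p_hat)
ll_null = log_likelihood(y, [p_null] * n)

num_params = 2  # assumed: intercept plus one slope

mcfadden_r2 = 1.0 - ll_model / ll_null
aic = -2.0 * ll_model + 2 * num_params
bic = -2.0 * ll_model + num_params * math.log(n)

print(mcfadden_r2, aic, bic)
```

Both AIC and BIC are only meaningful when compared across models fitted to the same data.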


Key Takeaways

Summary: Logistic Regression

  • Logistic regression outputs probabilities via the sigmoid function — never below 0 or above 1

  • Coefficients are on the log-odds scale — exponentiate to get odds ratios

  • MLE, not OLS, is used to fit logistic regression — there is no closed-form solution

  • The Likelihood Ratio Test is generally more reliable than the Wald test, particularly in small samples or when coefficient estimates are large

  • AUC–ROC is a better performance metric than accuracy for imbalanced classes

  • The logit link ensures the linear predictor maps to a valid probability

  • Odds ratios provide intuitive multiplicative interpretation of effects
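
Because the likelihood equations have no closed-form solution, the fit is computed iteratively. The following is a minimal sketch of Newton-Raphson for a single predictor, written in plain Python on a made-up data set. The function name `fit_logistic`, the data, and the stopping rule are assumptions for illustration, not a description of how any particular library fits the model.

```python
import math

def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, max_iter=25, tol=1e-10):
    """Newton-Raphson for the model logit(p) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # Score vector: gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix: negative Hessian, with weights p(1 - p).
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        # Solve the 2x2 system (information) * step = score.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data with overlapping classes (not perfectly separable).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0_hat, b1_hat = fit_logistic(xs, ys)
print(b0_hat, b1_hat, math.exp(b1_hat))  # intercept, slope, odds ratio
```

The data are deliberately not perfectly separable: when a threshold on $x$ separates the classes exactly, the maximum likelihood estimate does not exist and the iterations diverge.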
