ROC Curves and AUC: Model Discrimination

Evaluating How Well Classifiers Separate Classes

ROC curves plot the true positive rate against the false positive rate across all classification thresholds, while AUC summarizes overall discrimination ability in a single number. These threshold-independent metrics reveal a model's inherent ability to distinguish between classes.

  • Medical Screening: Compare diagnostic tests for disease detection accuracy
  • Fraud Detection: Evaluate model performance across operating thresholds
  • Credit Risk: Assess borrower classification before setting cutoff policies

An AUC of 0.5 means random guessing; an AUC of 1 means perfect separation.


ROC curves and AUC measure how well a classifier distinguishes between classes. They are threshold-independent metrics that evaluate the discrimination ability of a model.

Definition: ROC Curve

A plot of the True Positive Rate (sensitivity) against the False Positive Rate (1 - specificity) at various classification thresholds.


Key Metrics

True Positive Rate (Sensitivity)

$$TPR = \frac{TP}{TP + FN}$$

Here,

  • $TP$ = True positives (correctly predicted positive)
  • $FN$ = False negatives (missed positive cases)

False Positive Rate

$$FPR = \frac{FP}{FP + TN}$$

Here,

  • $FP$ = False positives (incorrectly predicted positive)
  • $TN$ = True negatives (correctly predicted negative)

Precision

$$Precision = \frac{TP}{TP + FP}$$

Here,

  • $Precision$ = Proportion of positive predictions that are correct
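
As a quick check of the three formulas above, the sketch below computes TPR, FPR, and precision from raw confusion counts. The function name `classification_rates` and the counts are hypothetical, chosen only for illustration.

```python
def classification_rates(tp, fp, fn, tn):
    """Return (TPR, FPR, precision) from confusion-matrix counts."""
    tpr = tp / (tp + fn)          # sensitivity / recall
    fpr = fp / (fp + tn)          # 1 - specificity
    precision = tp / (tp + fp)    # share of positive predictions that are correct
    return tpr, fpr, precision


# Hypothetical counts: 50 actual positives, 950 actual negatives
tpr, fpr, precision = classification_rates(tp=40, fp=30, fn=10, tn=920)
print(f"TPR={tpr:.3f}, FPR={fpr:.3f}, Precision={precision:.3f}")
```

Note that FPR stays low here because its denominator is the large negative class, while precision reacts directly to the 30 false positives.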

Area Under the Curve (AUC)

AUC

$$AUC = \int_0^1 TPR(FPR^{-1}(x)) \, dx$$

Here,

  • $AUC$ = Area under the ROC curve (0 to 1)

| AUC | Interpretation |
| --- | --- |
| 0.5 | No discrimination (random guessing) |
| 0.5 - 0.7 | Poor discrimination |
| 0.7 - 0.8 | Acceptable discrimination |
| 0.8 - 0.9 | Excellent discrimination |
| > 0.9 | Outstanding discrimination |

Probabilistic Interpretation

AUC = the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case.
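
The two views of AUC (area under the ROC curve and pairwise ranking probability) can be checked against each other. The sketch below is illustrative only: the helper names `trapezoid_auc` and `pairwise_auc` and the six toy scores are assumptions, and ties are counted as half a win, which is the usual convention.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def trapezoid_auc(y_true, y_score):
    """Area under the empirical ROC curve via the trapezoidal rule."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def pairwise_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (pos.size * neg.size))


# Toy data: 3 positives and 3 negatives, 7 of the 9 pairs are ranked correctly
y_true = [1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.7, 0.6, 0.2, 0.3, 0.1]

auc_area = trapezoid_auc(y_true, y_score)
auc_pairs = pairwise_auc(y_true, y_score)
auc_sklearn = roc_auc_score(y_true, y_score)
print(auc_area, auc_pairs, auc_sklearn)
```

The pairwise version needs memory proportional to the number of positive-negative pairs, so it is best suited to small samples or teaching purposes.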


Threshold Selection

The optimal threshold depends on the cost ratio of false positives vs false negatives.

Optimal Threshold

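$$\text{Threshold}^* = \frac{C_{FP} \cdot P(N)}{C_{FN} \cdot P(P)}$$

Here,

  • $C_{FP}$ = Cost of a false positive
  • $C_{FN}$ = Cost of a false negative
  • $P(P), P(N)$ = Prior probabilities of the positive and negative classes

This value is a cutoff on the likelihood ratio (equivalently, the slope of the ROC curve at the optimal operating point), so it can exceed 1. For a calibrated predicted probability $p$, the matching rule is to predict positive when $p > C_{FP} / (C_{FP} + C_{FN})$.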

Balanced Threshold

The default threshold of 0.5 is optimal only when classes are equally important and equally prevalent. Adjust the threshold based on the specific application costs.
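
One way to apply the cost argument in practice is to evaluate the expected cost at every ROC point and keep the cheapest threshold. The sketch below assumes correct predictions cost nothing and estimates the priors from the sample; the function name `min_cost_threshold`, the toy scores, and the cost values are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_curve


def min_cost_threshold(y_true, y_score, cost_fp, cost_fn):
    """Pick the score threshold with the lowest expected cost per sample."""
    y_true = np.asarray(y_true)
    # Keep every candidate threshold so no operating point is skipped
    fpr, tpr, thresholds = roc_curve(y_true, y_score, drop_intermediate=False)
    p_pos = y_true.mean()       # P(P), estimated from the sample
    p_neg = 1.0 - p_pos         # P(N)
    expected_cost = cost_fp * fpr * p_neg + cost_fn * (1.0 - tpr) * p_pos
    best = int(np.argmin(expected_cost))
    return float(thresholds[best]), float(expected_cost[best])


y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.05, 0.1, 0.2, 0.3, 0.45, 0.7, 0.4, 0.6, 0.8, 0.9]

# Missing a positive is 5x as costly as a false alarm: the threshold moves down
t_low, cost_low = min_cost_threshold(y_true, y_score, cost_fp=1, cost_fn=5)
# A false alarm is 5x as costly as a miss: the threshold moves up
t_high, cost_high = min_cost_threshold(y_true, y_score, cost_fp=5, cost_fn=1)
print(t_low, cost_low, t_high, cost_high)
```

Scanning the empirical ROC points avoids any calibration assumption, at the price of tuning the threshold to the sample at hand.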


Confusion Matrix

|  | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |

F1 Score

$$F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}$$

Here,

  • $Recall$ = Same as TPR
  • $F1$ = Harmonic mean of precision and recall
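
The sketch below reads the four confusion-matrix cells with scikit-learn and checks the F1 formula against `f1_score`. The labels are made-up toy data. One detail worth knowing: `confusion_matrix` orders labels ascending by default, giving `[[TN, FP], [FN, TP]]`; passing `labels=[1, 0]` reproduces the positive-first layout used in the table above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Toy labels: 4 actual positives, 6 actual negatives
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# labels=[1, 0] puts the positive class first: [[TP, FN], [FP, TN]]
tp, fn, fp, tn = confusion_matrix(y_true, y_pred, labels=[1, 0]).ravel()

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_manual = 2 * precision * recall / (precision + recall)
f1_sklearn = f1_score(y_true, y_pred)

print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
print(f"F1 (formula)={f1_manual:.3f}, F1 (sklearn)={f1_sklearn:.3f}")
```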

Precision-Recall Curve

When classes are imbalanced, the PR curve may be more informative than ROC.

ROC with Imbalanced Classes

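With severe class imbalance (e.g., 99% negatives), ROC can be optimistically misleading: the FPR denominator is the large pool of negatives, so even many false positives barely raise FPR. The PR curve focuses on the minority class, where those same false positives lower precision directly.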
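
A small constructed example makes the gap concrete. The scores below are hand-built, not real model output: 10 positives out of 1,000 samples, with 50 negatives scored above every positive. Average precision (`average_precision_score`) is used here as a one-number summary of the PR curve.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# 10 positives (1%) and 990 negatives
y_true = np.concatenate([np.ones(10), np.zeros(990)]).astype(int)

# Positives all score 0.9; 50 negatives score higher (0.95), the rest score 0.1
y_score = np.concatenate([
    np.full(10, 0.90),
    np.full(50, 0.95),
    np.full(940, 0.10),
])

roc_auc = roc_auc_score(y_true, y_score)             # 940 / 990 of pairs ranked correctly
avg_prec = average_precision_score(y_true, y_score)  # precision is 10 / 60 at full recall

print(f"ROC AUC = {roc_auc:.3f}, average precision = {avg_prec:.3f}")
```

Fifty false alarms are only about 5% of the negatives, so ROC AUC stays near 0.95, yet they outnumber the true positives five to one, which the PR summary exposes.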


Multi-Class Extensions

One-vs-Rest (OvR)

Compute ROC for each class against all others, then average.

One-vs-One (OvO)

Compute ROC for each pair of classes, then average the pairwise AUCs.

Multi-class AUC (macro)

$$AUC_{macro} = \frac{1}{K}\sum_{k=1}^{K} AUC_k$$

Here,

  • $K$ = Number of classes
  • $AUC_k$ = AUC for class k (one-vs-rest)
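
The macro formula can be reproduced by hand and compared with scikit-learn's built-in option. This is a minimal sketch on the Iris dataset; the dataset, split, and model settings are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # one probability column per class

# One-vs-rest: treat class k as positive and all other classes as negative
per_class_auc = [
    roc_auc_score(y_test == k, proba[:, col])
    for col, k in enumerate(clf.classes_)
]
macro_manual = float(np.mean(per_class_auc))

# Same quantity computed by scikit-learn directly
macro_sklearn = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")
# One-vs-one averaging is available through the same function
macro_ovo = roc_auc_score(y_test, proba, multi_class="ovo", average="macro")

print(per_class_auc, macro_manual, macro_sklearn, macro_ovo)
```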

DeLong Test

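Tests whether two AUCs estimated on the same test cases are significantly different. The covariance term accounts for the correlation between the two models' scores.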

DeLong Test

$$z = \frac{AUC_1 - AUC_2}{\sqrt{Var(AUC_1) + Var(AUC_2) - 2Cov(AUC_1, AUC_2)}}$$

Here,

  • $z$ = Test statistic (approximately standard normal under $H_0$)
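
scikit-learn does not ship a DeLong test, so the following is a minimal sketch of the statistic above, estimating the variances and covariance from per-case placement values. It assumes binary labels and two models scored on the same cases; the function names and the simulated scores are illustrative, and the pairwise comparison uses memory proportional to positives times negatives.

```python
import math

import numpy as np
from sklearn.metrics import roc_auc_score


def placement_values(pos, neg):
    """Per-positive and per-negative mean of psi (1 if pos > neg, 0.5 if tied)."""
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)


def delong_test(y_true, scores_1, scores_2):
    """Return (AUC_1, AUC_2, z, two-sided p-value) for two correlated AUCs."""
    y_true = np.asarray(y_true).astype(bool)
    v10, v01, aucs = [], [], []
    for scores in (scores_1, scores_2):
        scores = np.asarray(scores, dtype=float)
        v_pos, v_neg = placement_values(scores[y_true], scores[~y_true])
        v10.append(v_pos)
        v01.append(v_neg)
        aucs.append(float(v_pos.mean()))
    m, n = int(y_true.sum()), int((~y_true).sum())
    # 2x2 covariance matrix of the two AUC estimates
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return aucs[0], aucs[1], float(z), float(p_value)


# Simulated scores: model A is less noisy than model B on the same cases
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
scores_a = y + rng.normal(0.0, 1.0, size=200)
scores_b = y + rng.normal(0.0, 2.0, size=200)

auc_a, auc_b, z, p_value = delong_test(y, scores_a, scores_b)
auc_a_check = roc_auc_score(y, scores_a)
print(f"AUC A={auc_a:.3f}, AUC B={auc_b:.3f}, z={z:.2f}, p={p_value:.4f}")
```

For production use, a vetted implementation (for example in a dedicated statistics package) is preferable to this sketch.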

Python Implementation

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (roc_curve, roc_auc_score, precision_recall_curve,
                              confusion_matrix, classification_report)
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

np.random.seed(42)

# Generate a synthetic binary dataset with roughly 70% negatives
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           weights=[0.7], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit model
model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)
y_prob = model.predict_proba(X_test)[:, 1]

# ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
auc = roc_auc_score(y_test, y_prob)

# Find the threshold maximizing Youden's J = TPR - FPR
# (weights sensitivity and specificity equally, ignoring costs and prevalence)
J = tpr - fpr
optimal_idx = np.argmax(J)
optimal_threshold = thresholds[optimal_idx]

# Plot
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# ROC
axes[0].plot(fpr, tpr, 'b-', label=f'AUC = {auc:.3f}')
axes[0].plot([0, 1], [0, 1], 'r--', label='Random')
axes[0].scatter(fpr[optimal_idx], tpr[optimal_idx], c='red', s=100, label=f'Threshold = {optimal_threshold:.2f}')
axes[0].set_xlabel('FPR')
axes[0].set_ylabel('TPR')
axes[0].set_title('ROC Curve')
axes[0].legend()

# Precision-Recall
precision, recall, _ = precision_recall_curve(y_test, y_prob)
axes[1].plot(recall, precision, 'b-')
axes[1].set_xlabel('Recall')
axes[1].set_ylabel('Precision')
axes[1].set_title('Precision-Recall Curve')
plt.tight_layout()
plt.show()

# Confusion matrix at optimal threshold
y_pred_optimal = (y_prob >= optimal_threshold).astype(int)
print(f"Optimal threshold: {optimal_threshold:.3f}")
print(f"Confusion Matrix:\n{confusion_matrix(y_test, y_pred_optimal)}")
print(f"\n{classification_report(y_test, y_pred_optimal)}")

Worked Example

Example: Medical Screening

Evaluating a disease screening test with 5% prevalence:

| Threshold | Sensitivity | Specificity | PPV | F1 |
| --- | --- | --- | --- | --- |
| 0.3 | 0.95 | 0.72 | 0.15 | 0.26 |
| 0.5 | 0.82 | 0.88 | 0.26 | 0.40 |
| 0.7 | 0.65 | 0.95 | 0.41 | 0.50 |

AUC = 0.89 (excellent discrimination)

Recommendation: Use threshold = 0.3 for screening (high sensitivity, accept more false positives). Use threshold = 0.7 for diagnosis (high specificity, fewer false positives).
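
The PPV and F1 columns follow from sensitivity, specificity, and prevalence by Bayes' rule, which the sketch below works through for the three thresholds. The function name `ppv_and_f1` is hypothetical; the inputs are the sensitivity and specificity values from the example and the stated 5% prevalence.

```python
def ppv_and_f1(sensitivity, specificity, prevalence):
    """Derive PPV (precision) and F1 from test characteristics and prevalence."""
    tp = sensitivity * prevalence                 # true-positive share of the population
    fp = (1 - specificity) * (1 - prevalence)     # false-positive share of the population
    ppv = tp / (tp + fp)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return ppv, f1


# threshold -> (sensitivity, specificity), taken from the worked example
rows = {0.3: (0.95, 0.72), 0.5: (0.82, 0.88), 0.7: (0.65, 0.95)}
results = {t: ppv_and_f1(se, sp, prevalence=0.05) for t, (se, sp) in rows.items()}

for t, (ppv, f1) in results.items():
    print(f"threshold={t}: PPV={ppv:.2f}, F1={f1:.2f}")
```

Even at 95% specificity, fewer than half of the positive calls are true cases, because healthy patients outnumber sick ones 19 to 1.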


Key Takeaways

Summary: ROC and AUC

  • ROC curve plots TPR vs FPR across all thresholds
  • AUC summarizes discrimination ability: 0.5 = random, 1.0 = perfect
  • AUC = probability that a random positive scores higher than a random negative
  • Threshold selection depends on the relative costs of FP vs FN
  • For imbalanced classes, the PR curve is often more informative than ROC
  • Use the DeLong test to compare AUCs between models
  • F1 score balances precision and recall at a single threshold
