
Point-Biserial Correlation — Binary and Continuous Variables


Bridging Categories and Numbers in One Statistic

When one variable splits your data into two groups (treated/untreated, pass/fail) and the other is continuous (test score, blood pressure), the point-biserial correlation summarizes the relationship in a single number. Testing whether that number differs from zero is mathematically equivalent to the independent-samples t-test with pooled variance.

Key things point-biserial correlation helps you understand:

  • Group differences — Whether binary membership (e.g., gender, treatment status) is associated with different means on a continuous outcome.
  • Effect size — The square of r_pb tells you the proportion of variance explained by group membership.
  • T-test equivalence — You can convert a t-statistic directly into r_pb, making it easy to compare effect sizes across studies.

Whenever you run an independent t-test, you are already computing point-biserial correlation — you just might not know it.


What is Point-Biserial Correlation?

Definition

The point-biserial correlation measures the association between a binary variable and a continuous variable.


Point-Biserial Correlation

The point-biserial correlation coefficient is a special case of Pearson's r that measures the relationship between a dichotomous (binary) variable and a continuous variable.
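
To make the "special case of Pearson's r" claim concrete, here is a minimal sketch that runs both SciPy functions on the same 0/1-coded data. The data and the variable names (`group`, `outcome`) are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: 0/1 group codes and a continuous measurement
group = np.array([0] * 25 + [1] * 25)
outcome = np.concatenate([rng.normal(50, 8, 25), rng.normal(56, 8, 25)])

# Plain Pearson correlation on the 0/1 codes
r_pearson, p_pearson = stats.pearsonr(group, outcome)

# Dedicated point-biserial function
r_pb, p_pb = stats.pointbiserialr(group, outcome)

print(f"Pearson r        = {r_pearson:.6f}, p = {p_pearson:.6f}")
print(f"Point-biserial r = {r_pb:.6f}, p = {p_pb:.6f}")
```

Both lines should print the same coefficient and p-value, since the point-biserial coefficient is Pearson's r applied to a variable that happens to take only the values 0 and 1.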

Point-Biserial Formula

$$r_{pb} = \frac{\bar{x}_1 - \bar{x}_0}{s_n} \sqrt{\frac{n_0 \cdot n_1}{n^2}}$$

Here,

  • $\bar{x}_1$ = Mean of the continuous variable for group 1 (code=1)
  • $\bar{x}_0$ = Mean of the continuous variable for group 0 (code=0)
  • $s_n$ = Standard deviation of the continuous variable (population formula)
  • $n_0, n_1$ = Sample sizes of each group
  • $n$ = Total sample size

import numpy as np
from scipy import stats

np.random.seed(42)

# Binary variable: gender (0=Male, 1=Female)
gender = np.array([0]*30 + [1]*30)

# Continuous variable: test scores
scores_male = np.random.normal(75, 10, 30)
scores_female = np.random.normal(82, 10, 30)
scores = np.concatenate([scores_male, scores_female])

r_pb, p_value = stats.pointbiserialr(gender, scores)
print(f"Point-biserial r = {r_pb:.4f}")
print(f"p-value          = {p_value:.6f}")
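
The formula can also be evaluated by hand and compared with SciPy. The sketch below assumes 0/1 coding and uses the population standard deviation (`ddof=0`) over all observations, as the definition of s_n requires. The helper name `point_biserial_manual` and the simulated data are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: 30 observations per group
group = np.array([0] * 30 + [1] * 30)
x = np.concatenate([rng.normal(75, 10, 30), rng.normal(82, 10, 30)])


def point_biserial_manual(codes, values):
    """Evaluate r_pb = (mean1 - mean0) / s_n * sqrt(n0 * n1 / n^2)."""
    codes = np.asarray(codes)
    values = np.asarray(values, dtype=float)
    x1 = values[codes == 1]
    x0 = values[codes == 0]
    n1, n0, n = len(x1), len(x0), len(values)
    s_n = values.std(ddof=0)  # population SD of the whole sample
    return (x1.mean() - x0.mean()) / s_n * np.sqrt(n0 * n1 / n**2)


r_manual = point_biserial_manual(group, x)
r_scipy, _ = stats.pointbiserialr(group, x)

print(f"Manual formula: {r_manual:.6f}")
print(f"SciPy:          {r_scipy:.6f}")
```

Using the sample standard deviation (`ddof=1`) here instead would give a slightly different number that no longer matches Pearson's r.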

Relationship to Independent t-Test

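# The point-biserial r can be recovered from the pooled-variance t-statistic:
# r_pb = sign(t) * sqrt(t² / (t² + df)), with df = n0 + n1 - 2

# ttest_ind uses pooled variance by default (equal_var=True)
t_stat, p_t = stats.ttest_ind(scores_female, scores_male)
df = len(scores_female) + len(scores_male) - 2
# The square root alone only gives |r_pb|, so carry over the sign of t
r_from_t = np.sign(t_stat) * np.sqrt(t_stat**2 / (t_stat**2 + df))

print(f"t-statistic = {t_stat:.4f}, p = {p_t:.6f}")
print(f"r from t-test: {r_from_t:.4f}")
print(f"r from pointbiserial: {r_pb:.4f}")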

Equivalence to t-test

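Testing r_pb against zero gives the same t-statistic and p-value as the independent-samples t-test with pooled (equal) variances. Welch's unequal-variance t-test does not, in general, satisfy this identity. The square of r_pb equals the proportion of variance explained by group membership.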
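
The conversion also works in the other direction: t = r_pb · sqrt(df / (1 − r_pb²)) with df = n − 2. The sketch below checks this on simulated data with unequal group sizes (the data and variable names are assumptions for illustration) and prints Welch's test for contrast.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical groups of unequal size
group0 = rng.normal(75, 10, 20)
group1 = rng.normal(82, 10, 40)
codes = np.array([0] * len(group0) + [1] * len(group1))
values = np.concatenate([group0, group1])

# Correlation first, then convert to a t-statistic
r_pb, p_r = stats.pointbiserialr(codes, values)
df = len(values) - 2
t_from_r = r_pb * np.sqrt(df / (1 - r_pb**2))

# Pooled-variance (Student's) t-test: should match the conversion
t_pooled, p_pooled = stats.ttest_ind(group1, group0, equal_var=True)

# Welch's t-test is a different statistic and need not match
t_welch, p_welch = stats.ttest_ind(group1, group0, equal_var=False)

print(f"t from r_pb : {t_from_r:.4f} (p = {p_r:.6f})")
print(f"pooled t    : {t_pooled:.4f} (p = {p_pooled:.6f})")
print(f"Welch t     : {t_welch:.4f} (p = {p_welch:.6f})")
```

Group 1 is passed first to `ttest_ind` so that the sign of t agrees with the sign of r_pb under 0/1 coding.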


Interpretation

| r_pb value (absolute) | Interpretation |
| --- | --- |
| 0.10 – 0.29 | Small effect |
| 0.30 – 0.49 | Medium effect |
| 0.50+ | Large effect |

# Effect size interpretation
r_squared = r_pb**2
print(f"r² = {r_squared:.4f}")
print(f"{r_squared*100:.1f}% of variance in scores explained by gender")
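
One way to see why r² is a "proportion of variance explained" is to compute the between-group sum of squares directly and divide by the total sum of squares. The sketch below does this on simulated data; the names `ss_between`, `ss_total`, and `explained_share` are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical data: 0/1 codes and continuous scores
codes = np.array([0] * 30 + [1] * 30)
values = np.concatenate([rng.normal(75, 10, 30), rng.normal(82, 10, 30)])

r_pb, _ = stats.pointbiserialr(codes, values)

grand_mean = values.mean()

# Total variation of the scores around the grand mean
ss_total = np.sum((values - grand_mean) ** 2)

# Variation attributable to the two group means
ss_between = sum(
    np.sum(codes == g) * (values[codes == g].mean() - grand_mean) ** 2
    for g in (0, 1)
)

explained_share = ss_between / ss_total

print(f"r_pb squared          = {r_pb**2:.6f}")
print(f"SS_between / SS_total = {explained_share:.6f}")
```

The two printed values should agree, which is the sense in which group membership "explains" that share of the variance.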

Point-Biserial Correlation in Machine Learning

| ML Application | Usage | Why |
| --- | --- | --- |
| Feature selection | Binary target vs continuous feature | Identify discriminative features |
| A/B testing | Binary variant (A/B) vs continuous metric | Measure treatment effect |
| Classification | Binary class separation | Quick feature importance |

import numpy as np
from scipy.stats import pointbiserialr

np.random.seed(42)
# Binary outcome (e.g., pass/fail) and continuous feature (e.g., hours studied)
passed = np.random.binomial(1, 0.6, 200)
hours = np.where(passed == 1,
                 np.random.normal(8, 2, 200),
                 np.random.normal(4, 2, 200))

r, p = pointbiserialr(passed, hours)
print(f"Point-biserial r: {r:.3f}, p-value: {p:.4f}")
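# Label the strength with the thresholds from the interpretation table
strength = "strongly" if abs(r) >= 0.5 else "moderately" if abs(r) >= 0.3 else "weakly"
print(f"Hours studied is {strength} correlated with passing")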
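
For the feature-selection use case in the table, a rough sketch of ranking several continuous features against a binary target might look like the following. The feature names and simulated effect sizes are invented for illustration. This is only a univariate, linear screen, so it can miss features that matter through interactions or nonlinear effects.

```python
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(4)
n = 300

# Hypothetical binary target
y = rng.binomial(1, 0.5, n)

# Hypothetical features with different amounts of class separation
features = {
    "strong_signal": rng.normal(0, 1, n) + 1.5 * y,
    "weak_signal": rng.normal(0, 1, n) + 0.3 * y,
    "pure_noise": rng.normal(0, 1, n),
}

# Score each feature by its point-biserial correlation with the target
results = []
for name, column in features.items():
    r, p = pointbiserialr(y, column)
    results.append((name, r, p))

# Rank by absolute correlation, strongest first
results.sort(key=lambda item: abs(item[1]), reverse=True)

for name, r, p in results:
    print(f"{name:<14} r = {r:+.3f}  p = {p:.4g}")
```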

Key Takeaways

Measures association between a binary and continuous variable — a special case of Pearson's r.

Equivalent to the independent t-test — r_pb² = t²/(t²+df), so every t-test already produces a point-biserial correlation.

Positive r_pb means group 1 (coded 1) has a higher mean; negative means group 0 has the higher mean.

r_pb² gives the proportion of variance in the continuous variable explained by group membership — your effect size in one number.

"The t-test and point-biserial correlation are two sides of the same coin — one tells you if the difference is significant, the other tells you how big it actually is."

Summary: Point-Biserial Correlation

  • Measures association between a binary and continuous variable — a special case of Pearson's r
  • Equivalent to the independent t-test — r_pb² = t²/(t²+df)
  • Positive r_pb: group 1 (coded 1) has higher mean; negative: group 0 has higher mean
  • Assumptions: continuous variable is approximately normal within each group with similar variances across groups, observations are independent
  • Effect size: r_pb² gives the proportion of variance in the continuous variable explained by group membership
  • Use when: one variable is naturally dichotomous (pass/fail, male/female, treated/untreated)
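
The sign convention depends entirely on which group is coded 1. As a small sketch on simulated data (variable names are illustrative), swapping the codes flips the sign of r_pb and leaves its magnitude unchanged:

```python
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(5)

# Hypothetical data: the group coded 1 has the higher mean
codes = np.array([0] * 40 + [1] * 40)
values = np.concatenate([rng.normal(10, 2, 40), rng.normal(12, 2, 40)])

r_original, _ = pointbiserialr(codes, values)

# Recode: the old group 0 becomes group 1 and vice versa
r_flipped, _ = pointbiserialr(1 - codes, values)

print(f"Original coding: r_pb = {r_original:+.4f}")
print(f"Swapped coding:  r_pb = {r_flipped:+.4f}")
```

When reporting r_pb, it helps to state which category was coded 1 so that readers can interpret the sign.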
