
Factor Analysis — Latent Variable Models


Discovering Hidden Variables That Drive Observed Patterns

Factor analysis identifies latent constructs that explain correlations among observed variables. It reveals underlying dimensions — like intelligence or socioeconomic status — that cannot be measured directly but manifest through multiple indicators.

  • Psychology — Identify personality traits from questionnaire responses

  • Marketing — Discover latent customer segments from behavioral data

  • Education — Measure abstract constructs like academic aptitude from test scores

Beneath the surface of many measured variables lie a few hidden forces shaping them all.


In practical terms, factor analysis reduces many correlated measurements to a smaller set of underlying latent (hidden) factors.

Definition: Factor Analysis

A statistical method that describes variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.

Factor Model

$$X_i = \ell_{i1}F_1 + \ell_{i2}F_2 + \cdots + \ell_{im}F_m + \varepsilon_i$$

Here,

  • $X_i$ = Observed variable i (standardized)
  • $F_j$ = Latent factor j
  • $\ell_{ij}$ = Factor loading: association between variable i and factor j
  • $\varepsilon_i$ = Unique factor (error) for variable i
  • $m$ = Number of latent factors (m < p, where p is the number of observed variables)
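
To make the model concrete, the sketch below simulates data from a two-factor model and checks that the sample covariance is close to $LL^\top + \Psi$, the covariance the model implies when the factors are uncorrelated with unit variance. The loading values, sample size, seed, and the symbol $\Psi$ (diagonal matrix of unique variances) are illustrative assumptions, not values from this lesson.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loading matrix: 4 observed variables, 2 latent factors
L = np.array([[0.8, 0.0],
              [0.7, 0.1],
              [0.1, 0.8],
              [0.0, 0.6]])

# Unique variances chosen so that every X_i has variance 1 (standardized)
psi = 1.0 - (L ** 2).sum(axis=1)

n = 20000
F = rng.standard_normal((n, 2))                    # latent factors F_j
eps = rng.standard_normal((n, 4)) * np.sqrt(psi)   # unique factors eps_i
X = F @ L.T + eps                                  # X_i = sum_j l_ij * F_j + eps_i

implied = L @ L.T + np.diag(psi)                   # covariance implied by the model
sample = np.cov(X, rowvar=False)                   # covariance observed in the data
max_abs_diff = np.abs(sample - implied).max()
print(f"Largest gap between sample and implied covariance: {max_abs_diff:.3f}")
```

With a large simulated sample the gap should be small, which is the sense in which a few factors "explain" the correlations among many variables.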

Factor Loadings

For standardized variables and uncorrelated (orthogonal) factors, each factor loading equals the correlation between an observed variable and a factor. Together the loadings form the p × m loading matrix $L$.

Communality

$$h_i^2 = \ell_{i1}^2 + \ell_{i2}^2 + \cdots + \ell_{im}^2$$

Here,

  • $h_i^2$ = Communality: proportion of variance in $X_i$ explained by all factors
  • $\ell_{ij}$ = Factor loading of variable i on factor j
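
The reason a sum of squared loadings can be read as a proportion of variance follows from the factor model, under the assumption (not stated explicitly above) that the factors are uncorrelated with each other and with the unique factors and have unit variance. The symbol $\psi_i$ for the unique variance is introduced here for illustration:

$$\operatorname{Var}(X_i) = \sum_{j=1}^{m} \ell_{ij}^2 \operatorname{Var}(F_j) + \operatorname{Var}(\varepsilon_i) = h_i^2 + \psi_i = 1$$

Because $X_i$ is standardized, the unique variance is $\psi_i = 1 - h_i^2$. For example, loadings of 0.8 and 0.1 give $h_i^2 = 0.64 + 0.01 = 0.65$ and $\psi_i = 0.35$.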

Interpreting Loadings

A loading close to 1 or -1 means the variable strongly represents that factor, while a loading near 0 indicates little relationship. As a common rule of thumb, loadings with an absolute value above 0.4 are considered meaningful.


Factor Extraction Methods

Principal Component Method

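A common default approach. Factors are extracted sequentially, each one accounting for as much of the remaining variance as possible. The table below compares it with other extraction methods.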

| Method | Key Idea | When to Use |
|--------|----------|-------------|
| Principal Components | Maximize total variance | Default; data reduction |
| Maximum Likelihood | Maximize likelihood under normality | Normal data; hypothesis testing |
| Principal Axis Factoring | Iteratively estimates communalities | When normality is questionable |
| Minimum Residual | Minimize off-diagonal residuals | Small samples |
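
As a minimal sketch of the principal component method, the loadings can be taken as the leading eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues. The correlation matrix `R` and the choice of `m = 2` factors below are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical correlation matrix for 4 standardized variables
R = np.array([[1.0, 0.6, 0.1, 0.1],
              [0.6, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.5],
              [0.1, 0.1, 0.5, 1.0]])

# eigh returns eigenvalues in ascending order, so reorder to largest first
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

m = 2  # number of factors to keep (assumed for this example)
loadings = eigvecs[:, :m] * np.sqrt(eigvals[:m])   # p x m loading matrix
communalities = (loadings ** 2).sum(axis=1)        # row sums of squared loadings

print(loadings.round(3))
print(communalities.round(3))
```

The sign of each column is arbitrary, since an eigenvector multiplied by -1 is still an eigenvector, so only the pattern of large and small loadings should be interpreted.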


Rotation

Rotation makes factors more interpretable by aiming for simple structure: each variable loads highly on one factor and weakly on the others.

Types of Rotation

| Type | Constraint | When to Use |
|------|-----------|-------------|
| Varimax (orthogonal) | Factors uncorrelated | When factors are independent |
| Promax (oblique) | Factors may correlate | When factors are expected to be related |
| Oblimin (oblique) | Factors may correlate | Flexible oblique rotation |

Rotation Choice

If you believe the underlying factors are independent, use Varimax. If factors are likely correlated (common in social sciences), use Promax or Oblimin.
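
To illustrate what an orthogonal rotation does, here is a minimal sketch of varimax using a widely used iterated-SVD scheme. It is not the implementation of any particular library, and the loading matrix `simple`, the 30-degree mixing angle, and the helper names are hypothetical. The example deliberately blurs a simple-structure loading matrix and then checks that varimax sharpens it again. Oblique rotations such as Promax and Oblimin are not covered by this sketch.

```python
import numpy as np

def varimax(loadings, max_iter=500, tol=1e-8):
    """Orthogonal varimax rotation via iterated SVD (illustrative sketch)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target of the varimax criterion
        target = rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                      # nearest orthogonal matrix
        new_objective = s.sum()
        if new_objective < objective * (1 + tol):
            break                              # no meaningful improvement
        objective = new_objective
    return loadings @ rotation, rotation

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return (loadings ** 2).var(axis=0).sum()

# Hypothetical simple structure, then mixed by a 30-degree rotation
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)
print(np.abs(rotated).round(2))
```

Because the rotation is orthogonal, communalities are unchanged; only the way each variable's explained variance is divided among the factors changes.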


Number of Factors

Several criteria help determine how many factors to retain:

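  1. Kaiser's Rule: Retain factors whose eigenvalues (of the correlation matrix) are greater than 1

  2. Scree Plot: Look for the "elbow" where eigenvalues level off

  3. Parallel Analysis: Retain factors whose eigenvalues exceed those obtained from random data of the same size

  4. Velicer's MAP: Choose the number of factors that minimizes the average squared partial correlation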
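
The first and third criteria can be sketched in a few lines of NumPy. This is an illustrative version under stated assumptions: the data are simulated from two hypothetical factors, and parallel analysis is implemented here by comparing each observed eigenvalue with the 95th percentile of eigenvalues from random normal data of the same shape, which is one common variant rather than the only one.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: 6 variables driven by 2 hypothetical latent factors
n, p = 500, 6
F = rng.standard_normal((n, 2))
L = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
              [0.0, 0.8], [0.0, 0.8], [0.0, 0.8]])
X = F @ L.T + 0.6 * rng.standard_normal((n, p))

def sorted_eigenvalues(data):
    """Eigenvalues of the correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

observed = sorted_eigenvalues(X)

# Kaiser's rule: count eigenvalues greater than 1
n_kaiser = int((observed > 1.0).sum())

# Parallel analysis: eigenvalues from random data with no factor structure
n_sims = 200
random_eigs = np.array([sorted_eigenvalues(rng.standard_normal((n, p)))
                        for _ in range(n_sims)])
threshold = np.percentile(random_eigs, 95, axis=0)

# Keep leading factors until the first one that fails to beat the threshold
exceeds = observed > threshold
n_parallel = p if exceeds.all() else int(np.argmin(exceeds))

print("Observed eigenvalues:", observed.round(2))
print("Kaiser's rule:", n_kaiser, "| Parallel analysis:", n_parallel)
```

On real data the criteria can disagree, so it is worth comparing them rather than relying on a single rule.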

Variance Explained

$$\text{Proportion} = \frac{\lambda_j}{\sum_{k=1}^{p} \lambda_k}$$

Here,

  • $\lambda_j$ = Eigenvalue for factor j
  • $p$ = Total number of variables
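
As a small worked example of this formula, consider a hypothetical correlation matrix made of two blocks of three variables with a within-block correlation of 0.64. The eigenvalues of a correlation matrix sum to p, so the denominator is simply the number of variables.

```python
import numpy as np

# Hypothetical correlation matrix: two independent blocks of three variables
block = np.full((3, 3), 0.64)
np.fill_diagonal(block, 1.0)
R = np.block([[block, np.zeros((3, 3))],
              [np.zeros((3, 3)), block]])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest first
proportion = eigvals / eigvals.sum()             # lambda_j / sum_k lambda_k
cumulative = np.cumsum(proportion)

print("Eigenvalues:", eigvals.round(2))
print("Proportion: ", proportion.round(3))
print("Cumulative: ", cumulative.round(3))
```

Here the two largest eigenvalues are each 1 + 2(0.64) = 2.28, so each factor accounts for 2.28 / 6 = 38% of the total variance and the first two together account for 76%.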

Assumptions

  • Linearity: Relationships among variables are linear

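  • Multivariate normality: Data are approximately normally distributed (this matters mainly for maximum likelihood extraction and its significance tests)

  • Adequate sample size: Generally n > 100 (a common rule of thumb also asks for at least 5 observations per variable)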

  • No perfect multicollinearity: Variables are not perfectly correlated

Bartlett's Test

Before running factor analysis, use Bartlett's test of sphericity to test the null hypothesis that the correlation matrix is an identity matrix. A significant result (p < 0.05) indicates that the variables are correlated enough for factor analysis to be appropriate.
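
For readers who want to see what the test computes, here is a minimal sketch using the standard statistic $\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln|R|$ with $p(p-1)/2$ degrees of freedom. The function name `bartlett_sphericity` and the simulated data are hypothetical; SciPy is assumed to be available for the chi-square tail probability.

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(data):
    """Bartlett's test that the correlation matrix is an identity matrix."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)            # log of det(R), numerically stable
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) / 2
    p_value = stats.chi2.sf(statistic, dof)        # upper-tail probability
    return statistic, dof, p_value

# Simulated data: 4 variables sharing one hypothetical common factor
rng = np.random.default_rng(2)
common = rng.standard_normal((300, 1))
data = 0.8 * common + 0.6 * rng.standard_normal((300, 4))

statistic, dof, p_value = bartlett_sphericity(data)
print(f"chi2={statistic:.2f}, df={dof:.0f}, p={p_value:.4e}")
```

The determinant of an identity matrix is 1, so its logarithm is 0; the more strongly the variables are correlated, the smaller the determinant and the larger the statistic.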


KMO Test

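The Kaiser-Meyer-Olkin (KMO) measure assesses sampling adequacy by comparing the observed correlations with the partial correlations. Values near 1 indicate that the variables share enough common variance for factoring.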

KMO Statistic

$$KMO = \frac{\sum\sum_{i \neq j} r_{ij}^2}{\sum\sum_{i \neq j} r_{ij}^2 + \sum\sum_{i \neq j} a_{ij}^2}$$

Here,

  • $r_{ij}$ = Correlation between variables i and j
  • $a_{ij}$ = Partial correlation between variables i and j

| KMO Value | Interpretation |
|-----------|---------------|
| 0.9 - 1.0 | Marvelous |
| 0.8 - 0.9 | Meritorious |
| 0.7 - 0.8 | Middling |
| 0.6 - 0.7 | Mediocre |
| 0.5 - 0.6 | Miserable |
| < 0.5 | Unacceptable |
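
The overall KMO statistic can be sketched directly from its formula. In this illustrative version the partial correlations $a_{ij}$ (each pair controlling for all other variables) are obtained from the inverse of the correlation matrix; the function name `kmo_overall` and the simulated data are hypothetical.

```python
import numpy as np

def kmo_overall(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)        # partial correlations a_ij
    off_diag = ~np.eye(corr.shape[0], dtype=bool)  # mask selecting i != j
    r2 = (corr[off_diag] ** 2).sum()
    a2 = (partial[off_diag] ** 2).sum()
    return r2 / (r2 + a2)

# Simulated data: 6 variables sharing one hypothetical common factor
rng = np.random.default_rng(3)
common = rng.standard_normal((500, 1))
data = 0.8 * common + 0.6 * rng.standard_normal((500, 6))

kmo_value = kmo_overall(data)
print(f"KMO: {kmo_value:.3f}")
```

When the variables share a strong common factor, the partial correlations are small relative to the raw correlations and the ratio approaches 1.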


Python Implementation


```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

np.random.seed(42)

# Simulated data with 3 latent factors
n = 500
f1 = np.random.randn(n)
f2 = np.random.randn(n)
f3 = np.random.randn(n)

data = pd.DataFrame({
    'X1': 0.8*f1 + 0.1*np.random.randn(n),
    'X2': 0.7*f1 + 0.2*np.random.randn(n),
    'X3': 0.6*f1 + 0.3*np.random.randn(n),
    'X4': 0.8*f2 + 0.1*np.random.randn(n),
    'X5': 0.7*f2 + 0.2*np.random.randn(n),
    'X6': 0.6*f2 + 0.3*np.random.randn(n),
    'X7': 0.9*f3 + 0.1*np.random.randn(n),
    'X8': 0.7*f3 + 0.2*np.random.randn(n),
})

# Bartlett's test
chi_square, p_value = calculate_bartlett_sphericity(data)
print(f"Bartlett's test: chi2={chi_square:.2f}, p={p_value:.4e}")

# KMO
kmo_all, kmo_model = calculate_kmo(data)
print(f"KMO: {kmo_model:.3f}")

# Factor analysis with 3 factors, varimax rotation
fa = FactorAnalyzer(n_factors=3, rotation='varimax')
fa.fit(data)

# Loadings
loadings = pd.DataFrame(fa.loadings_,
    index=data.columns,
    columns=['Factor 1', 'Factor 2', 'Factor 3'])
print("\nFactor Loadings:")
print(loadings.round(3))

# Variance explained
variance = fa.get_factor_variance()
print(f"\nVariance explained: {variance[1].round(3)}")
print(f"Cumulative: {variance[2].round(3)}")
```


Worked Example

Example: Personality Assessment

A psychologist measures 6 personality items and wants to identify underlying traits. After running factor analysis with varimax rotation, the loading matrix is:

| Item | Factor 1 (Extraversion) | Factor 2 (Agreeableness) |
|------|------------------------|-------------------------|
| Talkative | 0.82 | 0.11 |
| Sociable | 0.78 | 0.08 |
| Warm | 0.65 | 0.32 |
| Kind | 0.12 | 0.85 |
| Cooperative | 0.09 | 0.79 |
| Trusting | 0.15 | 0.71 |

Items 1-3 load highly on Factor 1 (Extraversion), while items 4-6 load highly on Factor 2 (Agreeableness). Apart from a modest cross-loading for Warm (0.32), every secondary loading is below 0.2, so this simple structure supports a clean two-factor solution.
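
The communality and variance-explained formulas from earlier can be applied directly to this loading matrix. The sketch below uses only the numbers in the table; the variable names are illustrative.

```python
import numpy as np

items = ["Talkative", "Sociable", "Warm", "Kind", "Cooperative", "Trusting"]
loadings = np.array([[0.82, 0.11],
                     [0.78, 0.08],
                     [0.65, 0.32],
                     [0.12, 0.85],
                     [0.09, 0.79],
                     [0.15, 0.71]])

# Communality: sum of squared loadings across factors for each item
communalities = (loadings ** 2).sum(axis=1)

# Variance explained by each factor: sum of squared loadings down each column,
# divided by the number of standardized items
ss_loadings = (loadings ** 2).sum(axis=0)
proportion = ss_loadings / loadings.shape[0]

# Number of loadings per item above the 0.4 rule of thumb
salient_per_item = (np.abs(loadings) > 0.4).sum(axis=1)

for item, h2 in zip(items, communalities):
    print(f"{item:12s} h^2 = {h2:.4f}")
print("Proportion of variance per factor:", proportion.round(4))
```

For Talkative, for instance, the communality is 0.82² + 0.11² = 0.6724 + 0.0121 = 0.6845, so about 68% of its variance is explained by the two factors. Together the factors account for roughly 62% of the total variance across the six items.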

Figure: Factor Loadings by Item

Key Takeaways

Summary: Factor Analysis

  • Factor analysis reduces many correlated variables into fewer latent factors

  • Factor loadings measure the association between variables and factors (as a rule of thumb, $|\ell| > 0.4$ is meaningful)

  • Communality $h_i^2$ is the proportion of a variable's variance explained by all factors

  • Use Kaiser's rule, scree plot, or parallel analysis to choose the number of factors

  • Varimax rotation assumes independent factors; Promax allows correlated factors

  • Always check Bartlett's test and KMO before interpreting results

  • Factor analysis requires adequate sample size (n > 100) and linear relationships

