
Types of Data in Statistics — Quantitative vs Qualitative



Know Your Data Before You Analyze It

Understanding data types is the first and most critical step in any statistical analysis. The type of data you have determines which statistical methods are valid, which visualizations are appropriate, and what conclusions you can draw. Choose wrong, and your entire analysis falls apart.

Here is what mastering data types helps you do:

  • Select the Right Tests — Different data types require different statistical methods; using the wrong one produces meaningless results.
  • Visualize Effectively — The chart type that works for continuous data is useless for categorical data, and vice versa.
  • Avoid Common Mistakes — Stop treating ZIP codes as numbers, Likert scales as intervals, or discrete counts as continuous values.
  • Communicate Clearly — Classifying variables correctly lets you describe your data accurately to stakeholders and peers.

Identifying your data type is not busywork — it is the foundation every analysis stands on.


Types of Data in Statistics

Definition

Data types classify variables based on their mathematical properties. The type determines which statistical methods, visualizations, and operations are valid.


The Data Type Hierarchy

ALL DATA
  Qualitative (Categorical)
    Nominal (no order): eye color, blood type
    Ordinal (has order): education, ratings
  Quantitative (Numerical)
    Discrete (countable): children, cars
    Continuous (measurable): height, temperature

Qualitative (Categorical) Data

Definition: Qualitative Data

Qualitative data represents categories or groups — things that are described rather than measured numerically.

Nominal Data

Categories with no natural order. You can only determine equality or inequality.

Examples:

  • Eye color: brown, blue, green, hazel
  • Blood type: A, B, AB, O
  • Country of birth
  • Product category: electronics, clothing, food
  • Survey responses: Yes / No

Valid operations: Count, mode, chi-square test
Invalid operations: Mean, median, subtraction
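
The valid operations for nominal data can be sketched in a few lines. The sample below is invented for illustration (it is not data from this lesson), and the chi-square call assumes SciPy is installed; with only eight observations the test is a demonstration of the API rather than a trustworthy result.

```python
import pandas as pd
from scipy.stats import chisquare

# Hypothetical sample of eye colors (illustrative data only)
eye_color = pd.Series(
    ["brown", "brown", "blue", "green", "brown", "blue", "hazel", "brown"]
)

counts = eye_color.value_counts()  # frequency of each category
mode = eye_color.mode()[0]         # most common category
print(counts)
print("Mode:", mode)

# Chi-square goodness-of-fit against equal expected frequencies.
# Expected counts here are tiny (2 per category), so treat the p-value as a demo.
stat, p_value = chisquare(counts.values)
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")
```

Counting and comparing frequencies never requires the categories to have an order or a numeric value, which is why these operations are safe for nominal variables.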

Eye Color Distribution (Nominal Data)

Ordinal Data

Categories with a meaningful order, but the gaps between categories are not necessarily equal.

Examples:

  • Education level: high school < bachelor's < master's < PhD
  • Customer satisfaction: Poor < Fair < Good < Excellent
  • Military rank: Private < Corporal < Sergeant < Captain
  • Star ratings: ★ < ★★ < ★★★ < ★★★★ < ★★★★★

Valid operations: Ordering, median, percentiles, Spearman correlation
Invalid operations: Arithmetic mean (controversial), subtraction (intervals unknown)
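
One way to make the order explicit in pandas is an ordered categorical. The sketch below uses made-up satisfaction ratings; the median is taken on the integer position codes, which only uses rank information and not distances between categories.

```python
import pandas as pd

levels = ["Poor", "Fair", "Good", "Excellent"]  # lowest to highest

# Hypothetical survey responses (illustrative data only)
ratings = pd.Series(
    pd.Categorical(
        ["Good", "Poor", "Excellent", "Good", "Fair", "Good", "Excellent"],
        categories=levels,
        ordered=True,
    )
)

# Ordering works: min/max and comparisons respect the declared order
print(ratings.min(), ratings.max())
at_least_good = int((ratings >= "Good").sum())
print("Responses rated Good or better:", at_least_good)

# Median via position codes (0 = Poor ... 3 = Excellent).
# With an odd number of responses the median code is a whole number;
# with an even number it can fall between two categories.
median_code = int(ratings.cat.codes.median())
median_label = levels[median_code]
print("Median rating:", median_label)
```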

Customer Satisfaction (Ordinal Data)

Key Distinction

"Excellent" is better than "Good", but is it exactly twice as good? Ordinal scales can't tell us.


Quantitative (Numerical) Data

Definition: Quantitative Data

Quantitative data represents measured or counted quantities — numbers that have mathematical meaning.

Discrete Data

Can only take specific, countable values — usually whole numbers. There are gaps between possible values.

Examples:

  • Number of children in a family (0, 1, 2, 3, ... — not 1.7)
  • Number of cars in a parking lot
  • Number of defects in a product
  • Shoe sizes (though not whole numbers, they're discrete: 8, 8.5, 9...)
  • Number of goals scored in a soccer match

Valid operations: All arithmetic, count, Poisson distribution, binomial distribution
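
As a small illustration of a discrete distribution, the sketch below computes binomial probabilities for the number of defective items in a batch. The batch size and defect rate are assumed values chosen for the example, and `binomial_pmf` is a helper written here, not a library function.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) when each of n items is independently defective with probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n_items = 10       # hypothetical batch size
defect_rate = 0.1  # hypothetical probability that a single item is defective

# Probability is assigned only to whole-number outcomes 0, 1, ..., n_items
probs = {k: binomial_pmf(k, n_items, defect_rate) for k in range(n_items + 1)}
for k, prob in probs.items():
    print(f"P(defects = {k}) = {prob:.4f}")

total = sum(probs.values())
print("Total probability:", round(total, 6))
```

The probabilities over the countable outcomes sum to 1; there is nothing left over for values between them.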

Number of Children per Family (Discrete Data)

Continuous Data

Can take any value within a range, including fractions and decimals. Limited only by measurement precision.

Examples:

  • Height (1.753847... meters)
  • Temperature (23.7°C)
  • Time to complete a task
  • Weight, blood pressure, distance
  • Stock prices

Valid operations: All arithmetic, normal distribution, integration, derivatives

Height Distribution (Continuous Data)

Interval vs Ratio (A Deeper Cut)

Within quantitative data, we can further distinguish:

Feature                    | Interval             | Ratio
Equal intervals            | ✅ Yes               | ✅ Yes
True zero (zero = absence) | ❌ No                | ✅ Yes
Meaningful ratios          | ❌ No                | ✅ Yes
Example                    | Temperature (°C), IQ | Height, weight, income

Interval example: 0°C is not "no temperature." 40°C is not twice as hot as 20°C (in the thermodynamic sense). Temperature in Kelvin is ratio.

Ratio example: A person who weighs 80 kg is genuinely twice as heavy as someone who weighs 40 kg.
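
The temperature example can be checked with a little arithmetic. In this sketch, `celsius_to_kelvin` is a helper defined here for illustration; the point is that differences survive the change of scale while ratios do not.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

t1_c, t2_c = 20.0, 40.0

# Ratio on the interval scale: arithmetically 2.0, but not physically meaningful
ratio_celsius = t2_c / t1_c

# Ratio on the ratio scale (Kelvin has a true zero): about 1.07
ratio_kelvin = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

# Differences are the same on both scales, so subtraction is valid for interval data
diff_celsius = t2_c - t1_c
diff_kelvin = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

print(f"Celsius ratio: {ratio_celsius:.2f}")
print(f"Kelvin ratio:  {ratio_kelvin:.2f}")
print(f"Differences:   {diff_celsius:.1f} vs {diff_kelvin:.1f}")
```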


Why Data Types Matter for Statistics

Analysis Goal    | Nominal         | Ordinal          | Discrete/Continuous
Central tendency | Mode            | Mode, Median     | Mean, Median, Mode
Spread           | Frequency       | IQR              | Std Dev, Variance
Correlation      | Cramér's V      | Spearman ρ       | Pearson r
Group comparison | Chi-square      | Kruskal-Wallis   | ANOVA, t-test
Regression       | Dummy variables | Ordinal logistic | Linear regression
Visualization    | Bar chart       | Bar/box          | Histogram, scatter
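
The group-comparison row of the table can be sketched with SciPy. The data below is randomly generated purely for illustration (the column names and group labels are assumptions), so the p-values carry no meaning beyond showing which test pairs with which data type.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible

# Hypothetical data: three groups of 30 observations each
df_demo = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 30),
    "owns_car": rng.choice(["yes", "no"], size=90),  # nominal
    "satisfaction": rng.integers(1, 6, size=90),     # ordinal, 1 to 5
    "height_cm": rng.normal(170, 10, size=90),       # continuous
})

# Nominal outcome: chi-square test of independence on the contingency table
table = pd.crosstab(df_demo["group"], df_demo["owns_car"])
chi2, p_chi2, dof, _ = stats.chi2_contingency(table)

# Ordinal outcome: Kruskal-Wallis compares groups using ranks only
ratings_by_group = [g["satisfaction"].to_numpy() for _, g in df_demo.groupby("group")]
h_stat, p_kw = stats.kruskal(*ratings_by_group)

# Continuous outcome: one-way ANOVA compares group means
heights_by_group = [g["height_cm"].to_numpy() for _, g in df_demo.groupby("group")]
f_stat, p_anova = stats.f_oneway(*heights_by_group)

print(f"Chi-square:     p = {p_chi2:.3f} (dof = {dof})")
print(f"Kruskal-Wallis: p = {p_kw:.3f}")
print(f"ANOVA:          p = {p_anova:.3f}")
```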

Python: Identifying and Working with Data Types

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load a rich dataset
df = sns.load_dataset('tips')
print("Dataset shape:", df.shape)
print("\nData types (pandas dtypes):")
print(df.dtypes)

Output:

Dataset shape: (244, 7)

Data types (pandas dtypes):
total_bill     float64   <- Continuous quantitative
tip            float64   <- Continuous quantitative
sex           category   <- Nominal qualitative
smoker        category   <- Nominal qualitative
day           category   <- Ordinal qualitative (Thur < Fri < Sat < Sun)
time          category   <- Nominal qualitative
size            int64    <- Discrete quantitative

# --- Statistical summaries differ by type ---

print("\n=== Quantitative Variables ===")
print(df[['total_bill', 'tip', 'size']].describe())

print("\n=== Qualitative Variables ===")
for col in ['sex', 'smoker', 'day', 'time']:
    print(f"\n{col} — value counts:")
    print(df[col].value_counts())
    print(f"Mode: {df[col].mode()[0]}")

# --- Visualizations appropriate to each type ---
fig, axes = plt.subplots(2, 3, figsize=(14, 8))

# Continuous: histogram
axes[0, 0].hist(df['total_bill'], bins=20, color='steelblue', edgecolor='black', alpha=0.7)
axes[0, 0].set_title('Total Bill (Continuous)\n-> Histogram')
axes[0, 0].set_xlabel('Amount ($)')

# Discrete: bar chart
size_counts = df['size'].value_counts().sort_index()
axes[0, 1].bar(size_counts.index, size_counts.values, color='coral', edgecolor='black')
axes[0, 1].set_title('Party Size (Discrete)\n-> Bar Chart')
axes[0, 1].set_xlabel('Size')

# Nominal: pie chart
sex_counts = df['sex'].value_counts()
axes[0, 2].pie(sex_counts.values, labels=sex_counts.index, autopct='%1.1f%%', startangle=90)
axes[0, 2].set_title('Sex (Nominal)\n-> Pie Chart')

# Ordinal: ordered bar
day_order = ['Thur', 'Fri', 'Sat', 'Sun']
day_counts = df['day'].value_counts().reindex(day_order)
axes[1, 0].bar(day_counts.index, day_counts.values, color='mediumseagreen', edgecolor='black')
axes[1, 0].set_title('Day (Ordinal)\n-> Ordered Bar Chart')

# Continuous: box plot by category
df.boxplot(column='tip', by='day', ax=axes[1, 1])
fig.suptitle('')  # drop the "Boxplot grouped by day" title pandas adds to the whole figure
axes[1, 1].set_title('Tip by Day\n-> Box Plot')

# Scatter: two continuous
axes[1, 2].scatter(df['total_bill'], df['tip'], alpha=0.5, color='purple')
axes[1, 2].set_title('Bill vs Tip (Continuous × Continuous)\n-> Scatter Plot')
axes[1, 2].set_xlabel('Total Bill ($)')
axes[1, 2].set_ylabel('Tip ($)')

plt.tight_layout()
plt.savefig('data_types_visualization.png', dpi=150)
plt.show()

Data Type Classification in Practice

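def classify_variable(series: pd.Series, nunique_threshold: int = 15) -> str:
    """Guess a statistical data type from a pandas dtype (a heuristic, not a verdict)."""
    dtype = series.dtype
    nunique = series.nunique()

    if isinstance(dtype, pd.CategoricalDtype):
        # Only categoricals explicitly declared ordered=True are reported as ordinal
        return 'Ordinal Categorical' if dtype.ordered else 'Nominal Categorical'
    elif pd.api.types.is_bool_dtype(dtype):
        return 'Nominal (Binary)'
    elif pd.api.types.is_object_dtype(dtype) or pd.api.types.is_string_dtype(dtype):
        return 'Nominal Categorical'
    elif pd.api.types.is_integer_dtype(dtype):
        # Integers can still be identifiers (IDs, ZIP codes); check the meaning by hand
        if nunique <= nunique_threshold:
            return f'Discrete Quantitative ({nunique} unique values)'
        else:
            return 'Discrete Quantitative (high cardinality)'
    elif pd.api.types.is_float_dtype(dtype):
        return 'Continuous Quantitative'
    else:
        return f'Unknown ({dtype})'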

# Apply to the tips dataset
print("Variable Classification:")
print("-" * 50)
for col in df.columns:
    classification = classify_variable(df[col])
    print(f"{col:<15} -> {classification}")

Common Mistakes

Treating Ordinal as Interval

Averaging Likert-scale responses (1–5) treats them as interval data. This is common practice, but it assumes equal spacing that the scale does not guarantee: the difference between "Strongly Agree" and "Agree" may not equal the difference between "Neutral" and "Disagree."
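
A quick sketch of why a single average can mislead for Likert data. The two response sets below are invented for illustration; they share the same mean and median yet describe very different groups, which is why the frequency table is the safer summary.

```python
from collections import Counter
from statistics import mean, median

# Two hypothetical sets of responses on a 1-5 agreement scale
polarized = [1, 1, 1, 5, 5, 5, 5, 1]
neutral = [3, 3, 3, 3, 3, 3, 3, 3]

for name, responses in [("polarized", polarized), ("neutral", neutral)]:
    counts = dict(sorted(Counter(responses).items()))  # frequency of each response
    print(f"{name}: mean={mean(responses)}, median={median(responses)}, counts={counts}")
```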

ZIP Codes as Quantitative

ZIP code 90210 is not "40,210 more" than ZIP code 50000. It's a nominal identifier that happens to be written with digits.
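
In pandas, one way to guard against this is to store the codes as labels rather than integers. The sketch below uses a made-up `orders` table; the column names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical orders table where ZIP codes were parsed as integers
orders = pd.DataFrame({
    "zip_code": [90210, 50000, 90210, 2134],  # 2134 lost its leading zero
    "amount": [25.0, 40.0, 15.0, 30.0],
})

# pandas will compute this without complaint, but the number describes nothing real
meaningless_mean = orders["zip_code"].mean()
print("Mean ZIP code (meaningless):", meaningless_mean)

# Treat ZIP codes as labels: restore the 5-digit form, then store as a category
orders["zip_code"] = orders["zip_code"].astype(str).str.zfill(5).astype("category")

# Valid nominal operations: counting and grouping
zip_counts = orders["zip_code"].value_counts()
amount_by_zip = orders.groupby("zip_code", observed=True)["amount"].sum()
print(zip_counts)
print(amount_by_zip)
```

Once the column is categorical, an accidental call to `.mean()` on it raises an error instead of silently returning a number.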

Treating Discrete Data as Continuous

Modeling the number of children with a continuous distribution assigns probability to impossible values, such as 1.7 children or a negative count. A count model such as Poisson or negative binomial is usually a better fit.
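
To make the problem concrete, the sketch below compares a normal model with a Poisson model for a count variable. The mean and standard deviation are assumed values for illustration, and `poisson_pmf` is a helper written here using only the standard library.

```python
import math
from statistics import NormalDist

mean_children, sd_children = 1.7, 1.3  # hypothetical summary values

# A normal model spreads probability over every real number, including negatives
normal_model = NormalDist(mu=mean_children, sigma=sd_children)
p_negative = normal_model.cdf(0)  # P(children < 0) under the normal model
print(f"Normal model: P(children < 0) = {p_negative:.3f}")

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with mean lam, for k = 0, 1, 2, ..."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# A Poisson model puts all of its probability on whole numbers 0, 1, 2, ...
p_whole_numbers = sum(poisson_pmf(k, mean_children) for k in range(50))
print(f"Poisson model: probability on 0..49 = {p_whole_numbers:.6f}")
```

Note that a mean of 1.7 is still a perfectly good parameter for the Poisson model; what matters is that each individual outcome it predicts is a whole number.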


Practice Exercises

Exercise 1: Classify each variable:

  • a) Temperature in Fahrenheit
  • b) Movie genre (Action, Comedy, Drama)
  • c) Customer age
  • d) Job satisfaction rating (1 = Very Unsatisfied, 5 = Very Satisfied)
  • e) Number of siblings

Exercise 2: For each variable in the iris dataset, identify the type and choose the most appropriate visualization.

import seaborn as sns
iris = sns.load_dataset('iris')
print(iris.dtypes)
# Your classifications and visualizations here

Solution

# sepal_length: float64 -> Continuous -> histogram or box plot
# sepal_width: float64 -> Continuous -> histogram or box plot
# petal_length: float64 -> Continuous -> histogram or box plot
# petal_width: float64 -> Continuous -> histogram or box plot
# species: object/category -> Nominal -> bar chart

import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset('iris')
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Continuous: petal_length distribution by species
for species in iris['species'].unique():
    subset = iris[iris['species'] == species]['petal_length']
    axes[0].hist(subset, bins=15, alpha=0.6, label=species)
axes[0].set_title('Petal Length by Species\n(Continuous, grouped)')
axes[0].legend()

# Nominal: species counts
iris['species'].value_counts().plot(kind='bar', ax=axes[1], color='coral', edgecolor='black')
axes[1].set_title('Species Count\n(Nominal)')
axes[1].tick_params(rotation=0)

plt.tight_layout()
plt.show()

Data Types in Machine Learning & Deep Learning

ML models require numeric input; how you encode depends on data type:

  • Categorical (Nominal) -> One-Hot Encoding
  • Ordinal -> Label Encoding
  • Discrete -> Embedding / Count
  • Continuous -> StandardScaler

Data Type  | ML Encoding            | Deep Learning       | Example
Nominal    | One-Hot, Label         | Embedding layer     | Color: [1,0,0] for Red
Ordinal    | Ordinal encoding       | Embedding layer     | Rating: 1,2,3,4,5
Discrete   | Count, Bin             | Embedding           | Children: 0,1,2,3
Continuous | StandardScaler, MinMax | Normalization layer | Height: 1.72m

Example — Encoding Data for ML:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Sample data
df = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'red', 'blue'],
    'size': ['S', 'M', 'L', 'XL', 'M'],  # Ordinal
    'price': [10.5, 20.3, 15.7, 12.1, 25.0]  # Continuous
})

# One-Hot Encode nominal data (color)
# 'sparse' was renamed to 'sparse_output' in scikit-learn 1.2 and later removed
encoder = OneHotEncoder(sparse_output=False)
color_encoded = encoder.fit_transform(df[['color']])
print("One-Hot Encoded Color:")
print(pd.DataFrame(color_encoded, columns=encoder.get_feature_names_out()))

# Standardize continuous data (price)
scaler = StandardScaler()
price_scaled = scaler.fit_transform(df[['price']])
print("\nStandardized Price:")
print(price_scaled.flatten().round(2))

Output:

One-Hot Encoded Color:
   color_blue  color_green  color_red
0         0.0          0.0        1.0
1         1.0          0.0        0.0
2         0.0          1.0        0.0
3         0.0          0.0        1.0
4         1.0          0.0        0.0

Standardized Price:
[-1.17  0.67 -0.19 -0.87  1.55]
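
The example above leaves the ordinal `size` column unencoded. One possible approach, sketched here with scikit-learn's `OrdinalEncoder`, is to pass the category order explicitly; the variable names `sizes` and `size_encoder` are chosen for this illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

sizes = pd.DataFrame({"size": ["S", "M", "L", "XL", "M"]})

# Pass the order explicitly. With the default categories='auto' the encoder
# sorts the labels alphabetically (L, M, S, XL), which would scramble the ranking.
size_encoder = OrdinalEncoder(categories=[["S", "M", "L", "XL"]])
size_encoded = size_encoder.fit_transform(sizes[["size"]])

print(size_encoded.ravel())  # S -> 0, M -> 1, L -> 2, XL -> 3
```

The resulting integers preserve rank only. A model that treats them as plain numbers will still assume equal spacing between sizes, so whether this encoding is appropriate depends on the model.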

Key Takeaways

Data type determines your entire analysis pipeline — identify types before doing anything else.

ML models need numeric input — encoding strategy depends on data type.

Deep learning uses embedding layers for categorical data, which are more compact than one-hot vectors when a variable has many categories.

Wrong type leads to wrong method leads to wrong conclusion — this is not just academic pedantry.

"Get the data type right, and the statistics follow. Get it wrong, and no method can save you."


