Computer Vision

Semantic Segmentation — Pixel-Level Understanding of Images

Semantic segmentation assigns a class label to every pixel in an image, enabling fine-grained scene understanding. It is critical for autonomous driving, medical imaging, and robotics.

U-Net Dominates Medical — Encoder-decoder with skip connections excels on small medical imaging datasets
Dice + BCE is Standard — Combined loss functions handle class imbalance and optimize overlap directly
mIoU is the Metric — Mean Intersection over Union across all classes is the primary evaluation measure

Segmentation Types

Definition: Types of Segmentation

Semantic Segmentation: Each pixel gets a class label (no distinction between instances)
Instance Segmentation: Each instance of each class gets a separate mask
Panoptic Segmentation: Combines semantic and instance segmentation, giving every pixel a class label plus an instance ID for countable objects
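
To make the distinction concrete, here is a minimal NumPy sketch on a toy label grid. The class IDs, the "car" class, and the `instance_to_class` lookup table are made up for illustration and do not come from any dataset.

```python
import numpy as np

# Toy 4x6 instance map: 0 = background, 1 and 2 = two separate objects.
instance_map = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Hypothetical lookup: instance id -> class id (0 = background, 1 = "car").
instance_to_class = np.array([0, 1, 1])

# Semantic segmentation: one class label per pixel, instances are merged.
semantic_map = instance_to_class[instance_map]

# Instance segmentation: one boolean mask per object.
instance_masks = {i: instance_map == i for i in (1, 2)}

# Panoptic segmentation: a (class id, instance id) pair for every pixel.
panoptic = np.stack([semantic_map, instance_map], axis=-1)

print(semantic_map)
print(panoptic.shape)
```

In the semantic map both objects carry the same label 1, so they can no longer be told apart; the instance masks and the panoptic pairs keep them separate.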

Fully Convolutional Network (FCN)

Definition: FCN

FCN replaces the fully connected layers of a classification CNN with 1×1 convolutions, so the network outputs a spatial map of class scores rather than a single label per image:

Use a pre-trained CNN (VGG, ResNet) as encoder
Replace FC layers with 1×1 convolutions
Upsample to original resolution using transposed convolution
Output: $H \times W \times C$ (where $C$ is number of classes)
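
The steps above can be traced at the level of array shapes. The following is a rough NumPy sketch, not a trainable model: the encoder output is random data, the weights are untrained, and nearest-neighbor repetition stands in for the learned transposed convolution. All sizes (`H`, `W`, `stride`, `D`, `C`) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 32, 48      # input image size (assumed)
stride = 8         # total downsampling factor of the encoder (assumed)
D, C = 16, 3       # encoder feature channels and number of classes (assumed)

# Stand-in for the encoder output: a coarse feature map of shape (H/8, W/8, D).
features = rng.normal(size=(H // stride, W // stride, D))

# A 1x1 convolution is a linear layer applied independently at each location.
weights = rng.normal(size=(D, C))
bias = np.zeros(C)
coarse_scores = features @ weights + bias              # (4, 6, C)

# Upsample back to the input resolution (stand-in for transposed convolution).
scores = coarse_scores.repeat(stride, axis=0).repeat(stride, axis=1)  # (H, W, C)

# Per-pixel prediction: the class with the highest score.
label_map = scores.argmax(axis=-1)                     # (H, W)

print(coarse_scores.shape, scores.shape, label_map.shape)
```

Because no layer depends on a fixed input size, the same weights apply to images of any resolution, which is the main practical benefit of the fully convolutional design.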

Transposed Convolution Output Size

$$\text{Output size} = (\text{Input size} - 1) \times \text{Stride} - 2 \times \text{Padding} + \text{Kernel size}$$
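
As a quick check of the formula, the helper below evaluates it for two common configurations. The function name `transposed_conv_output_size` is hypothetical, and the formula as written assumes no dilation and no extra output padding.

```python
def transposed_conv_output_size(input_size, stride, padding, kernel_size):
    """Output size of a transposed convolution along one spatial dimension.

    Implements (input - 1) * stride - 2 * padding + kernel, i.e. the formula
    above without dilation or output padding.
    """
    return (input_size - 1) * stride - 2 * padding + kernel_size


# Kernel 4, stride 2, padding 1 doubles the resolution exactly.
print(transposed_conv_output_size(16, stride=2, padding=1, kernel_size=4))  # 32

# Kernel 2, stride 2, no padding also doubles it.
print(transposed_conv_output_size(4, stride=2, padding=0, kernel_size=2))   # 8
```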

U-Net Architecture

Definition: U-Net

U-Net is the dominant architecture for medical image segmentation:

Encoder (contracting path): Downsamples with conv + max pool
Bottleneck: Deepest feature representation
Decoder (expanding path): Upsamples with transposed convolution
Skip connections: Concatenate encoder features with decoder features

Skip connections preserve spatial details lost during downsampling, crucial for precise pixel-level localization.
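
A single encoder/decoder level is enough to show what the skip connection contributes. This is a shape-level NumPy sketch under simplifying assumptions: there are no learned convolutions, and nearest-neighbor upsampling stands in for the transposed convolution. The helper names `max_pool2x` and `upsample2x` are hypothetical.

```python
import numpy as np

def max_pool2x(x):
    """2x2 max pooling on an (H, W, C) array with even H and W."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample2x(x):
    """Nearest-neighbor 2x upsampling (stand-in for transposed convolution)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
encoder_features = rng.normal(size=(8, 8, 4))   # fine-resolution encoder output

bottleneck = max_pool2x(encoder_features)       # (4, 4, 4): spatial detail is lost
decoder_features = upsample2x(bottleneck)       # (8, 8, 4): blocky, detail not recovered

# Skip connection: concatenate along the channel axis.
merged = np.concatenate([encoder_features, decoder_features], axis=-1)  # (8, 8, 8)

print(bottleneck.shape, decoder_features.shape, merged.shape)
```

Each 2×2 block of `decoder_features` holds a single repeated value, while the first four channels of `merged` still carry the original per-pixel detail from the encoder.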

DeepLab

Definition: DeepLab

DeepLab uses atrous (dilated) convolution to capture multi-scale context:

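$$(f *_t k)(p) = \sum_{s + t \cdot r = p} f(s)\, k(r)$$

where $t$ is the dilation rate, $s$ ranges over input positions, and $r$ ranges over kernel positions. Setting $t = 1$ recovers ordinary convolution. Dilated convolution enlarges the receptive field without reducing resolution or adding parameters.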
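
The formula can be transcribed almost literally for a 1-D signal. The sketch below is plain Python and keeps only the output positions where every kernel tap falls inside the input; the function names are hypothetical.

```python
def atrous_conv1d(f, k, t):
    """1-D atrous convolution: out(p) = sum over r of f(p - t*r) * k(r).

    Only 'valid' positions are returned, i.e. those where all taps are in range.
    """
    span = t * (len(k) - 1)   # distance between the first and last tap
    return [
        sum(f[p - t * r] * k[r] for r in range(len(k)))
        for p in range(span, len(f))
    ]

def receptive_field(kernel_size, t):
    """Number of input positions covered by one output value."""
    return t * (kernel_size - 1) + 1


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]           # difference between the two outer taps

print(atrous_conv1d(signal, kernel, t=1))   # [2, 2, 2, 2, 2]
print(atrous_conv1d(signal, kernel, t=2))   # [4, 4, 4]
print(receptive_field(3, 1), receptive_field(3, 2))   # 3 5
```

With the same three weights, raising the dilation rate from 1 to 2 widens the window from 3 to 5 input positions, while outputs are still produced at every consecutive valid position rather than being strided.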

Key components:

Atrous convolution: Multi-scale feature extraction
ASPP (Atrous Spatial Pyramid Pooling): Parallel dilated convolutions with different rates
CRF refinement: Post-processing for sharp boundaries (used in DeepLab v1 and v2, dropped from v3 onward)
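
The ASPP idea can be sketched in 1-D: run the same input through several dilated filters in parallel and stack the results as channels. This is a simplified illustration, not the DeepLab module itself: the rates `(1, 2, 4)`, the all-ones kernel, and the helper name `dilated_same_conv1d` are assumptions, and the learned fusion of the branches is omitted.

```python
import numpy as np

def dilated_same_conv1d(x, kernel, rate):
    """Dilated filter with zero padding so the output has the same length as x.

    Assumes an odd kernel length. Taps are applied without flipping the kernel.
    """
    k = len(kernel)
    pad = rate * (k - 1) // 2
    xp = np.pad(x, pad)
    return np.array([
        sum(kernel[j] * xp[i + j * rate] for j in range(k))
        for i in range(len(x))
    ])

x = np.arange(10, dtype=float)
kernel = [1.0, 1.0, 1.0]

# Parallel branches with different dilation rates, stacked as channels.
branches = [dilated_same_conv1d(x, kernel, rate) for rate in (1, 2, 4)]
aspp_features = np.stack(branches, axis=-1)   # shape (10, 3)

print(aspp_features.shape)
print(aspp_features[0])   # each branch looks a different distance to the right
```

Every branch keeps the full input resolution, so the stacked features describe each position at several context widths at once.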

Loss Functions for Segmentation

Definition: Dice Loss

Dice loss directly optimizes the overlap metric:

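$$\mathcal{L}_{\text{Dice}} = 1 - \frac{2 \sum_i p_i g_i + \epsilon}{\sum_i p_i + \sum_i g_i + \epsilon}$$

where $p_i$ is the predicted probability for pixel $i$, $g_i$ is the ground truth mask value (0 or 1), and $\epsilon$ is a small constant that avoids division by zero when both the prediction and the mask are empty.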
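
A direct NumPy transcription of the formula for a binary mask might look like this. The default `eps=1e-6` is an assumed value, not one prescribed by the text.

```python
import numpy as np

def dice_loss(p, g, eps=1e-6):
    """Soft Dice loss for predicted probabilities p and a binary mask g."""
    p = np.asarray(p, dtype=float)
    g = np.asarray(g, dtype=float)
    intersection = np.sum(p * g)
    return float(1 - (2 * intersection + eps) / (np.sum(p) + np.sum(g) + eps))


g = np.array([1, 1, 0, 0])
print(dice_loss([0.9, 0.8, 0.1, 0.2], g))   # roughly 0.15
print(dice_loss([1.0, 1.0, 0.0, 0.0], g))   # 0.0 for a perfect prediction
print(dice_loss([0.0, 0.0, 0.0, 0.0], [0, 0, 0, 0]))  # 0.0: eps handles the empty case
```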

Definition: Combined Loss

$$\mathcal{L} = \mathcal{L}_{\text{BCE}} + \lambda \mathcal{L}_{\text{Dice}}$$

Combining BCE and Dice loss handles both pixel-level classification and overlap optimization; $\lambda$ weights the Dice term relative to BCE.
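
The effect of class imbalance shows up clearly on a toy example: a mask with 1 foreground pixel out of 100 and a model that predicts "almost certainly background" everywhere. The sketch below is self-contained NumPy; `lam=1.0` and the `eps` values are assumed defaults.

```python
import numpy as np

def bce_loss(p, g, eps=1e-7):
    """Mean binary cross-entropy; p is clipped to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(g * np.log(p) + (1 - g) * np.log(1 - p)))

def dice_loss(p, g, eps=1e-6):
    """Soft Dice loss for predicted probabilities p and a binary mask g."""
    return float(1 - (2 * np.sum(p * g) + eps) / (np.sum(p) + np.sum(g) + eps))

def combined_loss(p, g, lam=1.0):
    """BCE plus lam times Dice, as in the formula above."""
    return bce_loss(p, g) + lam * dice_loss(p, g)


g = np.zeros(100)
g[0] = 1.0                  # a single foreground pixel
p = np.full(100, 0.01)      # the model predicts background everywhere

print(bce_loss(p, g))       # small (about 0.06): BCE barely notices the miss
print(dice_loss(p, g))      # close to 1: almost no overlap with the foreground
print(combined_loss(p, g))
```

BCE averages over all pixels, so the missed foreground pixel is diluted by the 99 easy background pixels. The Dice term depends only on overlap with the foreground and keeps the total loss high until that pixel is actually found.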

Evaluation: IoU

Definition: Mean IoU (mIoU)

The standard metric for segmentation:

$$\text{IoU}_c = \frac{\text{TP}_c}{\text{TP}_c + \text{FP}_c + \text{FN}_c}$$

$$\text{mIoU} = \frac{1}{C}\sum_{c=1}^{C} \text{IoU}_c$$

where $\text{TP}_c$, $\text{FP}_c$, and $\text{FN}_c$ are the pixel counts of true positives, false positives, and false negatives for class $c$, and $C$ is the number of classes.
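
One common way to compute these quantities is through a confusion matrix accumulated over all pixels. The following NumPy sketch uses a tiny flattened label map; the function names are hypothetical, and skipping classes that appear in neither the ground truth nor the prediction (via `nanmean`) is one convention among several.

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes):
    """Rows are ground truth classes, columns are predicted classes."""
    idx = gt.ravel() * num_classes + pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def per_class_iou(cm):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c); NaN for classes with no pixels."""
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)


gt   = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(gt, pred, num_classes=3)
iou = per_class_iou(cm)
miou = float(np.nanmean(iou))

print(iou)    # class 0: 1/3, class 1: 2/3, class 2: 1/2
print(miou)   # 0.5
```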

Applications

Summary

Semantic segmentation assigns a class label to every pixel
FCN introduced fully convolutional architecture for pixel-wise prediction
U-Net with skip connections dominates medical imaging
DeepLab uses dilated convolution for multi-scale context
Dice + BCE loss handles class imbalance
mIoU is the standard evaluation metric
