Computer Vision
Semantic Segmentation — Pixel-Level Understanding of Images
Semantic segmentation assigns a class label to every pixel in an image, enabling fine-grained scene understanding. It is critical for autonomous driving, medical imaging, and robotics.
- U-Net Dominates Medical — Encoder-decoder with skip connections excels on small medical imaging datasets
- Dice + BCE is Standard — Combined loss functions handle class imbalance and optimize overlap directly
- mIoU is the Metric — Mean Intersection over Union across all classes is the primary evaluation measure
Semantic Segmentation — FCN, U-Net, DeepLab and Medical Imaging
Segmentation Types
Definition: Types of Segmentation
- Semantic Segmentation: Each pixel gets a class label (no distinction between instances)
- Instance Segmentation: Each instance of each class gets a separate mask
- Panoptic Segmentation: Combines semantic + instance segmentation
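The difference between the three label formats can be sketched with a toy example. This is a minimal illustration, assuming a hypothetical 4×4 image that contains two objects of the same class (class id 1, with 0 as background); the array names are made up for the example.

```python
import numpy as np

# Two hypothetical instance masks of the same class on a 4x4 image.
instance_masks = np.zeros((2, 4, 4), dtype=bool)
instance_masks[0, 0:2, 0:2] = True    # first object
instance_masks[1, 2:4, 2:4] = True    # second object
instance_classes = np.array([1, 1])   # class id of each instance

# Semantic map: one class id per pixel, instances are not distinguished.
semantic = np.zeros((4, 4), dtype=np.int64)
for mask, cls in zip(instance_masks, instance_classes):
    semantic[mask] = cls

# Instance map: one instance id per pixel (0 = background).
instance_ids = np.zeros((4, 4), dtype=np.int64)
for idx, mask in enumerate(instance_masks, start=1):
    instance_ids[mask] = idx

# Panoptic-style label: a (class id, instance id) pair per pixel.
panoptic = np.stack([semantic, instance_ids], axis=-1)
```

In the semantic map both objects share the value 1, while the instance map keeps them apart as ids 1 and 2.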
Fully Convolutional Network (FCN)
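Definition: FCN
An FCN replaces the fully connected layers of a classification CNN with convolutional layers, so the network outputs a spatial map of per-pixel predictions instead of a single label:
- Use a pre-trained CNN (VGG, ResNet) as encoder
- Replace FC layers with convolutions; the final classifier becomes a 1×1 convolution with one output channel per class
- Upsample to original resolution using transposed convolution
- Output: $H \times W \times C$ class scores (where $C$ is the number of classes and $H \times W$ is the input resolution)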
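The prediction head can be sketched in NumPy. This is a rough illustration under stated assumptions, not the original FCN implementation: the encoder output is replaced by random features, and nearest-neighbour repetition stands in for the learned transposed convolution. The helper names `conv1x1` and `upsample_nearest` are hypothetical.

```python
import numpy as np

def conv1x1(features, weight, bias):
    """1x1 convolution: the same linear classifier applied at every pixel.

    features: (H, W, D), weight: (D, C), bias: (C,) -> scores: (H, W, C)
    """
    return features @ weight + bias

def upsample_nearest(scores, factor):
    """Nearest-neighbour upsampling (a stand-in for transposed convolution)."""
    return scores.repeat(factor, axis=0).repeat(factor, axis=1)

rng = np.random.default_rng(0)
H, W, D, C = 8, 8, 16, 3                      # coarse feature map, 3 classes
features = rng.standard_normal((H, W, D))     # placeholder for encoder output
weight = rng.standard_normal((D, C))
bias = np.zeros(C)

scores = conv1x1(features, weight, bias)      # (8, 8, 3) per-pixel class scores
full = upsample_nearest(scores, factor=4)     # (32, 32, 3) at input resolution
labels = full.argmax(axis=-1)                 # (32, 32) predicted class per pixel
```

Because the 1×1 convolution has no spatial extent, it works on feature maps of any size, which is what lets the network accept arbitrary input resolutions.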
U-Net Architecture
Definition: U-Net
U-Net is the dominant architecture for medical image segmentation:
- Encoder (contracting path): Downsamples with conv + max pool
- Bottleneck: Deepest feature representation
- Decoder (expanding path): Upsamples with transposed convolution
- Skip connections: Concatenate encoder features with decoder features
Skip connections preserve spatial details lost during downsampling, crucial for precise pixel-level localization.
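The shape bookkeeping of a single skip connection can be sketched as follows. This is a simplified illustration with one resolution level and no learned layers; nearest-neighbour upsampling is used in place of transposed convolution, and the helper names are hypothetical.

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling on an (H, W, C) array with even H and W."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample2x(x):
    """Nearest-neighbour 2x upsampling (stand-in for transposed convolution)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
enc1 = rng.standard_normal((64, 64, 8))   # encoder features at full resolution

bottleneck = max_pool2x2(enc1)            # (32, 32, 8): fine detail is lost here
dec1 = upsample2x(bottleneck)             # (64, 64, 8): resolution back, detail not

# Skip connection: concatenate encoder and decoder features along channels,
# so later decoder layers see the original high-resolution features.
merged = np.concatenate([enc1, dec1], axis=-1)   # (64, 64, 16)
```

Concatenation doubles the channel count at that level, which is why U-Net decoder convolutions take more input channels than the matching encoder convolutions produce.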
DeepLab
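Definition: DeepLab
DeepLab uses atrous (dilated) convolution to capture multi-scale context:
$$y[i] = \sum_{k} x[i + r \cdot k]\, w[k]$$
where $x$ is the input, $w$ is the filter, and $r$ is the dilation rate. Dilated convolution increases the receptive field without reducing resolution or adding parameters.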
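A 1D version makes the effect of the dilation rate concrete. This is a minimal sketch, assuming an odd filter length and zero padding so that the output has the same length as the input; `dilated_conv1d` is a hypothetical helper written for this example.

```python
import numpy as np

def dilated_conv1d(x, w, rate):
    """Dilated convolution: each output sums taps spaced `rate` samples apart.

    Zero padding keeps the output the same length as x (assumes odd len(w)),
    so output i is centred on x[i].
    """
    k = len(w)
    pad = rate * (k - 1) // 2
    xp = np.pad(x, pad)                      # zero padding on both sides
    return np.array([
        sum(xp[i + rate * j] * w[j] for j in range(k))
        for i in range(len(x))
    ])

x = np.arange(10, dtype=float)
w = np.array([1.0, 1.0, 1.0])

y1 = dilated_conv1d(x, w, rate=1)   # taps cover 3 neighbouring samples
y2 = dilated_conv1d(x, w, rate=2)   # same 3 weights now span 5 samples
```

Both outputs have 10 samples, the same as the input, yet with `rate=2` each output value draws on a window of 5 input samples using the same three weights.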
Key components:
- Atrous convolution: Multi-scale feature extraction
- ASPP (Atrous Spatial Pyramid Pooling): Parallel dilated convolutions with different rates
- CRF refinement: Fully connected CRF post-processing for sharper boundaries (used in DeepLab v1 and v2, dropped from DeepLabv3 onward)
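The idea behind ASPP can be sketched in 1D: run the same input through several dilated filters in parallel and stack the results as channels. This is a simplified illustration only. A real ASPP module uses separate learned 2D filters per branch plus extra branches and a fusion layer; here one shared filter and the hypothetical helpers `dilate_kernel` and `aspp_1d` are assumptions made for brevity.

```python
import numpy as np

def dilate_kernel(w, rate):
    """Insert (rate - 1) zeros between filter taps."""
    out = np.zeros(rate * (len(w) - 1) + 1)
    out[::rate] = w
    return out

def aspp_1d(x, w, rates):
    """Apply the filter at several dilation rates and stack the branches."""
    branches = [
        # Flip the kernel so np.convolve computes a correlation.
        np.convolve(x, dilate_kernel(w, r)[::-1], mode="same")
        for r in rates
    ]
    return np.stack(branches, axis=-1)       # (length, number of rates)

x = np.ones(32)
w = np.array([1.0, 1.0, 1.0])
out = aspp_1d(x, w, rates=(1, 2, 4))         # one channel per dilation rate
```

Each output position now carries one value per rate, so the layers that follow can mix context gathered at several scales.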
Loss Functions for Segmentation
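Definition: Dice Loss
Dice loss directly optimizes the overlap metric:
$$\mathcal{L}_{\text{Dice}} = 1 - \frac{2 \sum_i p_i g_i}{\sum_i p_i + \sum_i g_i}$$
where $p_i$ is the predicted probability and $g_i$ is the ground truth mask value at pixel $i$.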
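A soft Dice loss for a binary mask can be sketched as one minus twice the overlap divided by the total mass of prediction and target. The small smoothing constant `eps` is an assumption added here to avoid division by zero on empty masks; it is not part of the definition.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2*sum(p*g) / (sum(p) + sum(g)).

    pred: predicted foreground probabilities in [0, 1]
    target: binary ground truth mask of the same shape
    eps: smoothing term (an assumption, guards against empty masks)
    """
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(pred * target)
    dice = (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return float(1.0 - dice)

mask = np.array([[1, 0], [1, 0]])
perfect = dice_loss(mask, mask)        # prediction equals the mask
disjoint = dice_loss(1 - mask, mask)   # prediction misses every foreground pixel
```

A perfect prediction gives a loss near 0 and a completely disjoint one gives a loss near 1, regardless of how many background pixels the image contains.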
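Definition: Combined Loss
Combining binary cross-entropy (BCE) with Dice loss handles both pixel-level classification and overlap optimization. A common form is the sum of the two terms, optionally weighted:
$$\mathcal{L} = \mathcal{L}_{\text{BCE}} + \mathcal{L}_{\text{Dice}}$$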
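The sketch below shows why the two terms complement each other on an imbalanced mask. It assumes an unweighted sum by default; the `dice_weight` parameter and the clipping and smoothing constants are illustrative choices, not values prescribed by the text.

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over all pixels."""
    pred = np.clip(pred, eps, 1.0 - eps)     # avoid log(0)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2*sum(p*g) / (sum(p) + sum(g))."""
    intersection = np.sum(pred * target)
    return float(1.0 - (2.0 * intersection + eps)
                 / (pred.sum() + target.sum() + eps))

def bce_dice_loss(pred, target, dice_weight=1.0):
    """Sum of BCE and Dice; dice_weight is a hypothetical tuning knob."""
    return bce_loss(pred, target) + dice_weight * dice_loss(pred, target)

# Imbalanced example: 2 foreground pixels out of 100.
target = np.zeros(100)
target[:2] = 1.0
pred = np.full(100, 0.01)                    # "everything is background"

bce = bce_loss(pred, target)                 # small: most pixels are right
dice = dice_loss(pred, target)               # close to 1: almost no overlap
total = bce_dice_loss(pred, target)
```

Here BCE barely penalizes the all-background prediction because 98% of pixels are classified correctly, while the Dice term stays high until the foreground is actually found.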
Evaluation: IoU
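Definition: Mean IoU (mIoU)
The standard metric for segmentation:
$$\text{IoU}_c = \frac{TP_c}{TP_c + FP_c + FN_c}, \qquad \text{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \text{IoU}_c$$
where $TP_c$, $FP_c$, $FN_c$ are the true positives, false positives, and false negatives for class $c$, and $C$ is the number of classes.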
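Per-class IoU and its mean can be computed directly from predicted and ground truth label maps. This is a minimal sketch; skipping classes that appear in neither map is one common convention and is an assumption here, since benchmarks differ on how they treat absent classes.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean of per-class IoU = TP / (TP + FP + FN) over integer label maps."""
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (target == c))
        fp = np.sum((pred == c) & (target != c))
        fn = np.sum((pred != c) & (target == c))
        denom = tp + fp + fn
        if denom == 0:
            continue          # class absent from both maps: skipped (assumption)
        ious.append(tp / denom)
    return float(np.mean(ious))

target = np.array([[0, 0],
                   [1, 1]])
pred = np.array([[0, 1],
                 [1, 1]])

# Class 0: TP=1, FP=0, FN=1 -> 1/2.  Class 1: TP=2, FP=1, FN=0 -> 2/3.
miou = mean_iou(pred, target, num_classes=2)   # (1/2 + 2/3) / 2 = 7/12
```

Averaging over classes rather than pixels means a rare class counts as much as a frequent one, which is why mIoU is stricter than pixel accuracy on imbalanced data.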
Applications
Summary
- Semantic segmentation assigns a class label to every pixel
- FCN introduced fully convolutional architecture for pixel-wise prediction
- U-Net with skip connections dominates medical imaging
- DeepLab uses dilated convolution for multi-scale context
- Dice + BCE loss handles class imbalance
- mIoU is the standard evaluation metric