
Semantic Segmentation — FCN, U-Net, DeepLab and Medical Imaging


Semantic Segmentation — Pixel-Level Understanding of Images

Semantic segmentation assigns a class label to every pixel in an image, enabling fine-grained scene understanding. It is critical for autonomous driving, medical imaging, and robotics.

  • U-Net Dominates Medical — Encoder-decoder with skip connections excels on small medical imaging datasets
  • Dice + BCE is Standard — Combined loss functions handle class imbalance and optimize overlap directly
  • mIoU is the Metric — Mean Intersection over Union across all classes is the primary evaluation measure


Segmentation Types

Definition: Types of Segmentation

  1. Semantic Segmentation: Each pixel gets a class label (no distinction between instances)
  2. Instance Segmentation: Each instance of each class gets a separate mask
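  3. Panoptic Segmentation: Combines semantic + instance segmentation, so every pixel gets a class label and pixels belonging to countable objects also get an instance ID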
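Figure: Segmentation types comparison (panels: Original, Semantic, Instance, Panoptic). Semantic segmentation labels all cars with the same class; instance segmentation separates Car₁ and Car₂; panoptic segmentation also labels the person.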
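To make the difference between the three types concrete, here is a minimal sketch on a tiny 2×5 label map. The class IDs (`ROAD`, `CAR`, `PERSON`), the convention that instance ID 0 means "no instance", and the `(class, instance)` tuple encoding are illustrative assumptions, not a standard dataset format.

```python
# Hypothetical class IDs, chosen only for this illustration.
ROAD, CAR, PERSON = 0, 1, 2

# Semantic map: one class ID per pixel; both cars share the label CAR.
semantic = [
    [CAR, CAR, ROAD, CAR, CAR],
    [ROAD, ROAD, PERSON, ROAD, ROAD],
]

# Instance map: one instance ID per pixel (0 = no instance, an assumption).
# The two cars receive different IDs even though they share a class.
instance = [
    [1, 1, 0, 2, 2],
    [0, 0, 1, 0, 0],
]


def panoptic_from(semantic_map, instance_map):
    """Pair each pixel's class label with its instance ID."""
    return [
        [(cls, inst) for cls, inst in zip(sem_row, inst_row)]
        for sem_row, inst_row in zip(semantic_map, instance_map)
    ]


panoptic = panoptic_from(semantic, instance)
```

In the resulting map the two cars are `(CAR, 1)` and `(CAR, 2)`, while road pixels are `(ROAD, 0)` because road is not split into instances.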

Fully Convolutional Network (FCN)

Definition: FCN

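FCN replaces the fully connected layers of a classification CNN with convolutional layers (1×1 convolutions in the classifier head), so the network outputs a spatial map of pixel-wise predictions instead of a single label: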

  1. Use a pre-trained CNN (VGG, ResNet) as encoder
  2. Replace FC layers with 1×1 convolutions
  3. Upsample to original resolution using transposed convolution
  4. Output: $H \times W \times C$ (where $C$ is the number of classes)
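A 1×1 convolution is a linear classifier applied independently at every pixel, which is why it can stand in for a fully connected layer. The sketch below illustrates only that step in plain Python, assuming features are stored as nested lists of shape H×W×D; the function names and the toy weights are hypothetical.

```python
def conv1x1(features, weights, biases):
    """Apply a 1x1 convolution to an H x W x D feature map.

    weights has shape C x D and biases has length C, so the result
    is an H x W x C map of class scores (one score vector per pixel).
    """
    return [
        [
            [
                sum(w * x for w, x in zip(class_weights, pixel)) + bias
                for class_weights, bias in zip(weights, biases)
            ]
            for pixel in row
        ]
        for row in features
    ]


def argmax_labels(scores):
    """Turn an H x W x C score map into an H x W map of class indices."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Toy example: 2 x 2 pixels, D = 2 features per pixel, C = 3 classes.
features = [[[1, 0], [0, 1]], [[0, 2], [0, 0]]]
weights = [[1, 0], [0, 1], [-1, -1]]
biases = [0, 0, 0.5]

scores = conv1x1(features, weights, biases)
labels = argmax_labels(scores)
```

Because the same weights are reused at every position, the layer works for any input height and width.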
Transposed Convolution Output Size

$$\text{Output size} = (\text{Input size} - 1) \times \text{Stride} - 2 \times \text{Padding} + \text{Kernel size}$$
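The output-size formula can be checked with a few lines of arithmetic. This is a sketch of the formula exactly as stated above; it ignores extras such as output padding or dilation that some frameworks add, and the function name is an assumption.

```python
def transposed_conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length of a transposed convolution along one dimension."""
    if input_size < 1 or kernel_size < 1 or stride < 1 or padding < 0:
        raise ValueError("sizes must be positive and padding non-negative")
    return (input_size - 1) * stride - 2 * padding + kernel_size


# Two common ways to double the resolution of a 16-pixel feature map.
doubled_a = transposed_conv_output_size(16, kernel_size=2, stride=2, padding=0)
doubled_b = transposed_conv_output_size(16, kernel_size=4, stride=2, padding=1)

# A single large-stride layer: 7 -> 224, i.e. a 32x upsampling.
upsampled_32x = transposed_conv_output_size(7, kernel_size=64, stride=32, padding=16)
```

Both doubling configurations give 32, and the last call gives 224, which shows how one layer can undo the 32× downsampling of a typical encoder.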

U-Net Architecture

Definition: U-Net

U-Net is the dominant architecture for medical image segmentation:

  • Encoder (contracting path): Downsamples with conv + max pool
  • Bottleneck: Deepest feature representation
  • Decoder (expanding path): Upsamples with transposed convolution
  • Skip connections: Concatenate encoder features with decoder features

Skip connections preserve spatial details lost during downsampling, crucial for precise pixel-level localization.
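The effect of skip connections on tensor shapes can be traced without a deep learning framework. The sketch below assumes the common U-Net layout (channels double at each encoder level, 2× pooling and 2× upsampling, size-preserving convolutions); `unet_shape_trace` is a hypothetical helper and not part of any library.

```python
def unet_shape_trace(height, width, num_classes, base_channels=64, depth=4):
    """List (stage, channels, H, W) for a U-Net-style encoder-decoder."""
    if height % (2 ** depth) or width % (2 ** depth):
        raise ValueError("height and width must be divisible by 2**depth")

    trace, skips = [], []
    channels, h, w = base_channels, height, width

    # Encoder: remember each level's features, then halve the resolution.
    for _ in range(depth):
        trace.append(("encoder", channels, h, w))
        skips.append((channels, h, w))
        channels, h, w = channels * 2, h // 2, w // 2

    trace.append(("bottleneck", channels, h, w))

    # Decoder: upsample, concatenate the matching skip, then convolve.
    for skip_channels, skip_h, skip_w in reversed(skips):
        channels, h, w = channels // 2, h * 2, w * 2
        assert (h, w) == (skip_h, skip_w), "skip and decoder sizes must match"
        trace.append(("concat", channels + skip_channels, h, w))
        channels = skip_channels
        trace.append(("decoder", channels, h, w))

    # Final 1x1 convolution maps the last decoder features to class scores.
    trace.append(("output", num_classes, h, w))
    return trace


trace = unet_shape_trace(256, 256, num_classes=3)
```

For a 256×256 input the bottleneck is 1024×16×16, and the first concatenation joins 512 upsampled channels with 512 encoder channels at 32×32, which is where the fine spatial detail re-enters the decoder.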

Figure: U-Net architecture, encoder-decoder with skip connections. Encoder (downsampling): 64×H×W → 128×H/2 → 256×H/4 → 512×H/8. Bottleneck: 1024×H/16. Decoder (upsampling): 512×H/8 → 256×H/4 → 128×H/2 → C×H×W output. Skip connections concatenate encoder features with decoder features at matching resolutions.

DeepLab

Definition: DeepLab

DeepLab uses atrous (dilated) convolution to capture multi-scale context:

Atrous convolution:

$$(f * k)(p) = \sum_{s + r \cdot t = p} f(s)\, k(r)$$

where $t$ is the dilation rate, $s$ indexes input positions, and $r$ indexes kernel taps. Dilated convolution increases the receptive field without reducing resolution or adding parameters.
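A one-dimensional sketch makes the formula concrete. It follows the definition above literally (a true convolution, so the kernel is flipped, unlike the cross-correlation that most frameworks compute) and keeps only positions where every kernel tap falls inside the signal. Both function names are illustrative.

```python
def effective_kernel_size(kernel_size, rate):
    """Span covered by a kernel whose taps are spaced `rate` samples apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate=1):
    """out[p] = sum over r of signal[p - rate * r] * kernel[r] (valid positions)."""
    span = rate * (len(kernel) - 1)
    return [
        sum(signal[p - rate * r] * kernel[r] for r in range(len(kernel)))
        for p in range(span, len(signal))
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # difference between the two outer taps

dense = atrous_conv1d(signal, kernel, rate=1)    # taps 2 samples apart
dilated = atrous_conv1d(signal, kernel, rate=2)  # taps 4 samples apart
```

The same three weights cover 3 samples at rate 1 and 5 samples at rate 2, so the receptive field grows while the parameter count stays fixed.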

Key components:

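  • Atrous convolution: Enlarges the receptive field without downsampling the feature map
  • ASPP (Atrous Spatial Pyramid Pooling): Parallel dilated convolutions with different rates, which capture context at multiple scales
  • CRF refinement: Post-processing for sharp boundaries (used in DeepLab v1 and v2, dropped from v3 onward)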
Figure: Atrous (dilated) convolution. A standard 3×3 kernel (rate 1), a dilated 3×3 kernel with rate 2 (larger receptive field), and a dilated 3×3 kernel with rate 4 (even larger). ASPP applies parallel dilated convolutions with rates 1, 6, 12, and 18.
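The ASPP idea, several dilated filters applied in parallel to the same features, can be sketched in one dimension. This is a deliberately simplified illustration: it reuses one kernel for every branch and uses zero padding, whereas a real ASPP module learns separate weights per branch and includes further components. The function names are hypothetical.

```python
def dilated_same_conv1d(signal, kernel, rate):
    """Zero-padded dilated filter with a centred, odd-length kernel.

    The output has the same length as the input, so resolution is kept.
    """
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for p in range(n):
        total = 0.0
        for j, weight in enumerate(kernel):
            idx = p + (j - half) * rate
            if 0 <= idx < n:  # taps outside the signal read as zero
                total += weight * signal[idx]
        out.append(total)
    return out


def aspp_1d(signal, kernel, rates=(1, 6, 12, 18)):
    """Run one branch per rate and stack the results per position."""
    branches = [dilated_same_conv1d(signal, kernel, rate) for rate in rates]
    return [list(values) for values in zip(*branches)]


signal = [float(i) for i in range(40)]
pyramid = aspp_1d(signal, kernel=[1.0, 1.0, 1.0])
```

Each position ends up with one value per rate, analogous to concatenating the branch outputs along the channel axis. Near the border the branches differ because taps at larger rates reach further into the signal.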

Loss Functions for Segmentation

Definition: Dice Loss

Dice loss directly optimizes the overlap metric:

$$\mathcal{L}_{\text{Dice}} = 1 - \frac{2 \sum_i p_i g_i + \epsilon}{\sum_i p_i + \sum_i g_i + \epsilon}$$

where $p_i$ is the predicted probability and $g_i$ is the ground-truth mask value for pixel $i$, and $\epsilon$ is a small smoothing constant that avoids division by zero.
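The formula translates directly into code. The sketch below operates on flattened lists of probabilities and binary mask values; the default `eps` value is an assumption, since the text does not specify one.

```python
def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss for one binary mask, given as flat sequences."""
    intersection = sum(p * g for p, g in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


mask = [1, 0, 1, 0]
perfect = dice_loss([1.0, 0.0, 1.0, 0.0], mask)   # full overlap
disjoint = dice_loss([0.0, 1.0, 0.0, 1.0], mask)  # no overlap
empty = dice_loss([0.0, 0.0, 0.0, 0.0], [0, 0, 0, 0])
```

A perfect prediction gives a loss of 0, a prediction with no overlap gives a loss close to 1, and the smoothing term makes the loss 0 when both prediction and mask are empty.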

Definition: Combined Loss

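$$\mathcal{L} = \mathcal{L}_{\text{BCE}} + \lambda \mathcal{L}_{\text{Dice}}$$

Combining BCE and Dice loss handles both pixel-level classification and overlap optimization: BCE supplies a per-pixel classification signal, Dice loss rewards region overlap and is less sensitive to foreground/background imbalance, and $\lambda$ weights the Dice term.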
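A minimal sketch of the combined loss for a binary mask follows. Averaging BCE over pixels, clamping probabilities before taking logarithms, and the default `lam=1.0` are assumptions made for the example, not values given in the text.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over pixels, clamped for numerical safety."""
    total = 0.0
    for p, g in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= g * math.log(p) + (1 - g) * math.log(1.0 - p)
    return total / len(probs)


def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss, following the formula in the text."""
    intersection = sum(p * g for p, g in zip(probs, targets))
    return 1.0 - (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)


def combined_loss(probs, targets, lam=1.0):
    """BCE plus lam times Dice loss."""
    return bce_loss(probs, targets) + lam * dice_loss(probs, targets)


# A maximally uncertain prediction on a half-foreground mask.
uncertain = combined_loss([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0], lam=1.0)
```

For this uncertain prediction the BCE term is ln 2 and the Dice term is about 0.5, so the total is roughly 1.19. Setting `lam=0` recovers plain BCE.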


Evaluation: IoU

Definition: Mean IoU (mIoU)

The standard metric for segmentation:

$$\text{IoU}_c = \frac{\text{TP}_c}{\text{TP}_c + \text{FP}_c + \text{FN}_c}$$

$$\text{mIoU} = \frac{1}{C}\sum_{c=1}^{C} \text{IoU}_c$$

where $\text{TP}_c$, $\text{FP}_c$, and $\text{FN}_c$ are the true positive, false positive, and false negative pixel counts for class $c$, and $C$ is the number of classes.
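The metric can be computed directly from two flattened label maps. In this sketch, classes that appear in neither the prediction nor the ground truth are skipped when averaging. That is a common convention but an assumption here, since the formula above averages over all $C$ classes.

```python
def per_class_iou(pred, target, num_classes):
    """IoU for each class, or None when the class is absent from both maps."""
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, g in zip(pred, target) if p == c and g == c)
        fp = sum(1 for p, g in zip(pred, target) if p == c and g != c)
        fn = sum(1 for p, g in zip(pred, target) if p != c and g == c)
        union = tp + fp + fn
        ious.append(tp / union if union else None)
    return ious


def mean_iou(pred, target, num_classes):
    """Average IoU over the classes that occur in pred or target."""
    valid = [iou for iou in per_class_iou(pred, target, num_classes)
             if iou is not None]
    return sum(valid) / len(valid) if valid else float("nan")


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]

ious = per_class_iou(pred, target, num_classes=3)
miou = mean_iou(pred, target, num_classes=3)
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mIoU is 0.5. Because every class counts equally, a rare class with poor IoU lowers the score as much as a frequent one.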

Applications

Medical Imaging (U-Net dominates here)

  • Tumor segmentation
  • Organ delineation
  • Cell counting
  • Retinal vessel analysis
  • Skin lesion analysis

Autonomous Driving (real-time required)

  • Road segmentation
  • Lane detection
  • Pedestrian detection
  • Sky/building parsing
  • Free space estimation

Other Domains (DeepLab for speed)

  • Satellite imagery
  • Agricultural monitoring
  • Industrial inspection
  • AR/VR scene understanding
  • Robot navigation

Summary

  • Semantic segmentation assigns a class label to every pixel
  • FCN introduced the fully convolutional architecture for pixel-wise prediction
  • U-Net with skip connections dominates medical imaging
  • DeepLab uses dilated convolution for multi-scale context
  • Dice + BCE loss handles class imbalance
  • mIoU is the standard evaluation metric
