
“Defining where one object ends and another begins.”

TL;DR

Boundary detection identifies where objects begin and end in images, a critical capability for autonomous driving, medical imaging, and photo editing. This article covers the evolution from classical Canny edge detection through deep learning models like U-Net and Mask R-CNN, specialized loss functions for thin structures, CRF-based refinement, and real-time deployment techniques. If you are working on model compression for edge deployment or distributed training of large vision models, these patterns apply directly.

[Image: A topographic relief map with sharp contour lines carved into layered colored acrylic]

1. The Problem: Edges vs. Boundaries

  • Edge Detection: Finding sharp changes in pixel intensity (low-level).
    • Example: A checkerboard pattern has many edges.
  • Boundary Detection: Finding semantically meaningful contours of objects (high-level).
    • Example: The outline of a “Dog” or “Car”.

Applications:

  • Autonomous Driving: Lane detection, road boundaries.
  • Medical Imaging: Tumor segmentation, organ boundaries.
  • Photo Editing: “Select Subject” tool in Photoshop.

2. Classical Approaches

Before Deep Learning, we used math.

1. Canny Edge Detector (1986)

The gold standard for decades.

  1. Gaussian Blur: Remove noise.
  2. Gradient Calculation: Find intensity change (\nabla I) using Sobel filters.
  3. Non-Maximum Suppression: Thin out edges to 1-pixel width.
  4. Hysteresis Thresholding: Keep strong edges, and weak edges connected to strong ones.

Pros: Fast, precise localization. Cons: Detects all edges (texture, shadows), not just object boundaries.
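
In OpenCV, the last three steps are a single call; a minimal sketch (threshold values are illustrative):

import cv2

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# Step 1: Gaussian blur to suppress noise
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)

# Steps 2-4: cv2.Canny computes gradients, applies non-maximum
# suppression, and runs hysteresis with the weak/strong thresholds.
edges = cv2.Canny(blurred, threshold1=100, threshold2=200)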

2. Structured Forests

  • Uses Random Forests to classify patches as “edge” or “non-edge”.
  • Uses hand-crafted features (color, gradient histograms).

3. Deep Learning Approaches

Modern systems use CNNs to learn semantic boundaries.

1. Holistically-Nested Edge Detection (HED)

  • Architecture: VGG-16 backbone.
  • Multi-Scale: Predicts edges at multiple layers (conv3, conv4, conv5).
  • Fusion: Combines side-outputs into a final edge map.
  • Loss: Weighted Cross-Entropy (to handle class imbalance: 90% pixels are non-edge).
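
That class-balanced weighting can be sketched in a few lines (a paraphrase of the idea, not the paper's code):

import torch
import torch.nn.functional as F

def hed_balanced_bce(logits, target):
    # target: binary edge map in {0, 1}
    num_pos = target.sum()
    beta = 1.0 - num_pos / target.numel()  # fraction of non-edge pixels

    # Edge pixels get weight beta (large), non-edge pixels 1 - beta (small)
    weights = torch.where(target > 0.5, beta, 1.0 - beta)
    return F.binary_cross_entropy_with_logits(logits, target, weight=weights)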

2. CASENet (Category-Aware Semantic Edge Detection)

  • Not just “is this an edge?”, but “is this a Dog edge or a Car edge?”.
  • Architecture: ResNet-101 with multi-label loss.
  • Output: K channels, one for each class boundary.

4. Deep Dive: U-Net for Boundary Detection

U-Net is the standard for biomedical segmentation, but it excels at boundaries too.

Architecture:

  • Encoder (Contracting Path): Captures context (What is this?).
  • Decoder (Expanding Path): Precise localization (Where is it?).
  • Skip Connections: Concatenate high-res features from encoder to decoder to recover fine details.

Loss Function for Thin Boundaries: Standard Cross-Entropy produces thick, blurry boundaries. Solution: Dice Loss or Tversky Loss.

Dice = \frac{2 |P \cap G|}{|P| + |G|}

where P is the prediction and G is the ground truth.

5. System Design: Lane Detection System

Scenario: Self-driving car needs to stay in lane.

Pipeline:

  1. Input: Camera feed (1080p, 60fps).
  2. Preprocessing: ROI cropping (focus on road), Perspective Transform (Bird’s Eye View).
  3. Model: Lightweight CNN (e.g., ENet or LaneNet).
    • Output: Binary mask of lane lines.
  4. Post-processing:
    • Curve Fitting: Fit a 2nd- or 3rd-degree polynomial (e.g., y = ax^2 + bx + c) to the lane pixels.
    • Kalman Filter: Smooth predictions over time (lanes don’t jump); see the sketch after this list.
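
A minimal sketch of the curve-fitting and smoothing steps with NumPy (the Kalman filter is approximated by exponential smoothing for brevity; names are illustrative):

import numpy as np

def fit_lane(mask, prev_coeffs=None, smoothing=0.8):
    # Coordinates of predicted lane pixels in the bird's-eye-view mask
    ys, xs = np.nonzero(mask)

    # Fit x = a*y^2 + b*y + c: lanes are near-vertical in bird's-eye
    # view, so x as a function of y is better conditioned.
    coeffs = np.polyfit(ys, xs, deg=2)

    # Temporal smoothing stands in for the Kalman filter: lanes don't jump.
    if prev_coeffs is not None:
        coeffs = smoothing * prev_coeffs + (1 - smoothing) * coeffs

    return coeffs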

Challenges:

  • Occlusion: Car in front blocks view.
  • Lighting: Shadows, glare, night.
  • Worn Markings: Faded lines.

6. Deep Dive: Active Contour Models (Snakes)

A hybrid approach: Deep Learning gives a rough mask, Snakes refine it.

Concept:

  • Define a curve (snake) around the object.
  • Define an Energy Function:
    • E_{internal}: Smoothness (don’t bend too sharply).
    • E_{external}: Image forces (snap to high gradients).
  • Minimize the energy iteratively. The snake “shrink-wraps” the object.
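
scikit-image ships a classic snake implementation; a minimal sketch of the refinement step (parameter values are illustrative):

import numpy as np
from skimage.filters import gaussian
from skimage.segmentation import active_contour

def refine_with_snake(image, cx, cy, radius):
    # Initialize the snake as a circle around the rough object location
    s = np.linspace(0, 2 * np.pi, 400)
    init = np.column_stack([cy + radius * np.sin(s), cx + radius * np.cos(s)])

    # alpha controls smoothness, beta controls rigidity; the snake
    # settles onto nearby high-gradient contours of the blurred image
    return active_contour(gaussian(image, sigma=3), init,
                          alpha=0.015, beta=10, gamma=0.001)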

Modern Twist: Deep Snake. Use a GNN to predict vertex offsets for the polygon contour.

7. Evaluation Metrics

  1. F-Measure (ODS/OIS):
    • ODS (Optimal Dataset Scale): Best fixed threshold for the whole dataset.
    • OIS (Optimal Image Scale): Best threshold per image.
  2. Boundary IoU:
    • Standard IoU is dominated by the object interior.
    • Boundary IoU computes intersection only along the contour band.
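
A minimal sketch of Boundary IoU with OpenCV morphology (the band width d is a free parameter; this follows the idea rather than the reference implementation):

import cv2
import numpy as np

def boundary_iou(gt, pred, d=2):
    gt = (gt > 0).astype(np.uint8)
    pred = (pred > 0).astype(np.uint8)

    # Reduce each mask to a thin band along its contour:
    # the mask minus the mask eroded by d pixels.
    kernel = np.ones((2 * d + 1, 2 * d + 1), np.uint8)
    gt_band = gt - cv2.erode(gt, kernel)
    pred_band = pred - cv2.erode(pred, kernel)

    intersection = np.logical_and(gt_band, pred_band).sum()
    union = np.logical_or(gt_band, pred_band).sum()
    return intersection / max(union, 1)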

8. Real-World Case Studies

Case Study 1: Adobe Photoshop “Select Subject”

  • Problem: User wants to cut out a person.
  • Solution: Deep Learning model (Sensei) predicts a “trimap” (Foreground, Background, Unknown).
  • Refinement: Matting Laplacian to solve the alpha value for hair/fur pixels.

Case Study 2: Tesla Autopilot

  • Problem: Map the drivable space.
  • Solution: “HydraNet” multi-task learning.
  • Heads: Lane lines, Road edges, Curbs.
  • Vector Space: Projects image-space predictions into 3D vector space for planning.

9. Summary

| Component | Technology |
| --- | --- |
| Low-Level | Canny, Sobel |
| Deep Learning | HED, CASENet, U-Net |
| Refinement | Active Contours, CRF |
| Metrics | Boundary IoU, F-Score |

10. Deep Dive: U-Net Architecture Implementation

Let’s implement a production-ready U-Net in PyTorch.

import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv(x)

class UNet(nn.Module):
    def __init__(self, in_channels=3, out_channels=1):
        super().__init__()

        # Encoder
        self.enc1 = DoubleConv(in_channels, 64)
        self.enc2 = DoubleConv(64, 128)
        self.enc3 = DoubleConv(128, 256)
        self.enc4 = DoubleConv(256, 512)

        self.pool = nn.MaxPool2d(2)

        # Bottleneck
        self.bottleneck = DoubleConv(512, 1024)

        # Decoder
        self.upconv4 = nn.ConvTranspose2d(1024, 512, 2, stride=2)
        self.dec4 = DoubleConv(1024, 512) # 1024 = 512 (upconv) + 512 (skip)

        self.upconv3 = nn.ConvTranspose2d(512, 256, 2, stride=2)
        self.dec3 = DoubleConv(512, 256)

        self.upconv2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = DoubleConv(256, 128)

        self.upconv1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = DoubleConv(128, 64)

        self.out = nn.Conv2d(64, out_channels, 1)

    def forward(self, x):
        # Encoder
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        e4 = self.enc4(self.pool(e3))

        # Bottleneck
        b = self.bottleneck(self.pool(e4))

        # Decoder with skip connections
        d4 = self.upconv4(b)
        d4 = torch.cat([d4, e4], dim=1)
        d4 = self.dec4(d4)

        d3 = self.upconv3(d4)
        d3 = torch.cat([d3, e3], dim=1)
        d3 = self.dec3(d3)

        d2 = self.upconv2(d3)
        d2 = torch.cat([d2, e2], dim=1)
        d2 = self.dec2(d2)

        d1 = self.upconv1(d2)
        d1 = torch.cat([d1, e1], dim=1)
        d1 = self.dec1(d1)

        # Note: return raw logits (drop the sigmoid) if training with
        # BCEWithLogitsLoss, which applies the sigmoid internally.
        return torch.sigmoid(self.out(d1))

11. Deep Dive: Loss Functions for Boundary Detection

Standard Binary Cross-Entropy (BCE) produces thick boundaries. We need specialized losses.

1. Weighted BCE (Class Imbalance)

Boundary pixels are rare (< 5% of image). Weight them higher.

def weighted_bce_loss(pred, target, pos_weight=10.0):
    # Expects raw logits: BCEWithLogitsLoss applies the sigmoid internally
    bce = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([pos_weight]))
    return bce(pred, target)

2. Dice Loss (Overlap Metric)

Directly optimizes region overlap (the Dice coefficient, closely related to IoU).

def dice_loss(pred, target, smooth=1.0):
    pred = pred.view(-1)
    target = target.view(-1)

    intersection = (pred * target).sum()
    dice = (2. * intersection + smooth) / (pred.sum() + target.sum() + smooth)

    return 1 - dice

3. Tversky Loss (Precision/Recall Trade-off)

Generalization of Dice. Control false positives vs. false negatives.

def tversky_loss(pred, target, alpha=0.7, beta=0.3, smooth=1.0):
    pred = pred.view(-1)
    target = target.view(-1)

    TP = (pred * target).sum()
    FP = ((1 - target) * pred).sum()
    FN = (target * (1 - pred)).sum()

    tversky = (TP + smooth) / (TP + alpha*FP + beta*FN + smooth)
    return 1 - tversky

4. Focal Loss (Hard Examples)

Down-weight easy examples, focus on hard ones.

def focal_loss(pred, target, alpha=0.25, gamma=2.0):
    bce = nn.functional.binary_cross_entropy(pred, target, reduction='none')
    pt = torch.exp(-bce)
    focal = alpha * (1 - pt) ** gamma * bce
    return focal.mean()
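
In practice, a weighted BCE term is often combined with Dice; a minimal sketch reusing the functions above (the 50/50 weighting is illustrative):

def combined_boundary_loss(logits, target, bce_weight=0.5):
    # Weighted BCE fights class imbalance; Dice sharpens the overlap
    bce = weighted_bce_loss(logits, target)
    dice = dice_loss(torch.sigmoid(logits), target)
    return bce_weight * bce + (1 - bce_weight) * dice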

12. Deep Dive: Post-Processing Techniques

Raw model output is noisy. Refine it.

1. Morphological Operations

import cv2
import numpy as np

def post_process_boundary(mask):
    # Convert to uint8
    mask = (mask * 255).astype(np.uint8)

    # Morphological closing (fill small gaps)
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # Skeletonization (thin to 1-pixel width);
    # cv2.ximgproc requires the opencv-contrib-python package
    mask = cv2.ximgproc.thinning(mask)

    return mask

2. Non-Maximum Suppression (NMS)

Keep only local maxima along the gradient direction.

def non_max_suppression(edge_map, gradient_direction):
    M, N = edge_map.shape
    suppressed = np.zeros((M, N))

    # Convert radians to degrees in [0, 180)
    angle = gradient_direction * 180. / np.pi
    angle[angle < 0] += 180

    for i in range(1, M-1):
        for j in range(1, N-1):
            # Select the two neighbors along the gradient direction
            if (0 <= angle[i, j] < 22.5) or (157.5 <= angle[i, j] <= 180):
                q, r = edge_map[i, j+1], edge_map[i, j-1]       # 0 degrees
            elif 22.5 <= angle[i, j] < 67.5:
                q, r = edge_map[i+1, j-1], edge_map[i-1, j+1]   # 45 degrees
            elif 67.5 <= angle[i, j] < 112.5:
                q, r = edge_map[i+1, j], edge_map[i-1, j]       # 90 degrees
            else:
                q, r = edge_map[i-1, j-1], edge_map[i+1, j+1]   # 135 degrees

            # Keep the pixel only if it is a local maximum along that direction
            if edge_map[i, j] >= q and edge_map[i, j] >= r:
                suppressed[i, j] = edge_map[i, j]

    return suppressed

13. Deep Dive: Real-Time Deployment Optimizations

For autonomous driving, we need 60 FPS (16ms per frame).

1. Model Quantization

Convert FP32 to INT8.

import torch.quantization

model_fp32 = UNet()
model_fp32.eval()

# Post-training dynamic quantization. Note: dynamic quantization mainly
# targets nn.Linear/nn.LSTM; Conv2d layers generally require static
# quantization with calibration data instead.
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32,
    {torch.nn.Linear},
    dtype=torch.qint8
)

# Typical speedup: 3-4x on CPU

2. TensorRT Optimization

NVIDIA’s inference optimizer.

import tensorrt as trt

# Convert the PyTorch model to ONNX first
torch.onnx.export(model, dummy_input, "unet.onnx")

# Build a TensorRT engine (API shown in the TensorRT 8.x style;
# details vary across versions)
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("unet.onnx", 'rb') as model_file:
    parser.parse(model_file.read())

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GB
config.set_flag(trt.BuilderFlag.FP16)  # use FP16 precision

engine = builder.build_engine(network, config)

# Typical speedup: 5-10x on GPU

3. Spatial Pyramid Pooling

Process multiple scales simultaneously.

class SPPLayer(nn.Module):
    def __init__(self, num_levels=3):
        super().__init__()
        self.num_levels = num_levels

    def forward(self, x):
        batch_size, channels, h, w = x.size()
        pooled = []

        for i in range(self.num_levels):
            level = i + 1
            # Pool to a (level x level) grid regardless of input size
            pooling = nn.AdaptiveMaxPool2d((level, level))
            pooled.append(pooling(x).view(batch_size, channels, -1))

        # Concatenate the flattened pyramid levels
        return torch.cat(pooled, dim=2)

14. Deep Dive: Data Augmentation for Boundary Detection

Boundaries are thin. Augmentation must preserve them.

import albumentations as A

transform = A.Compose([
    A.RandomRotate90(p=0.5),
    A.Flip(p=0.5),
    A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.1, rotate_limit=15, p=0.5),
    A.OneOf([
        A.ElasticTransform(alpha=120, sigma=120 * 0.05, alpha_affine=120 * 0.03, p=0.5),
        A.GridDistortion(p=0.5),
        A.OpticalDistortion(distort_limit=1, shift_limit=0.5, p=0.5),
    ], p=0.3),
    A.RandomBrightnessContrast(p=0.3),
])

# Apply to both image and mask
augmented = transform(image=image, mask=boundary_mask)

15. Deep Dive: Multi-Task Learning

Instead of just boundaries, predict boundaries + segmentation + depth.

Architecture:

            Shared Encoder
                  |
        ┌─────────┼─────────┐
        |         |         |
    Boundary   Segment    Depth
      Head       Head      Head

Loss: L_{total} = \lambda_1 L_{boundary} + \lambda_2 L_{segment} + \lambda_3 L_{depth}

Benefit: Shared features improve all tasks. Segmentation provides context for boundaries.
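
A minimal sketch of that objective (head outputs, target formats, and the λ values are illustrative assumptions):

def multi_task_loss(outputs, targets, lambdas=(1.0, 0.5, 0.5)):
    # outputs/targets: dicts with 'boundary', 'segment', and 'depth' entries
    l_boundary = dice_loss(outputs['boundary'], targets['boundary'])
    l_segment = nn.functional.cross_entropy(outputs['segment'], targets['segment'])
    l_depth = nn.functional.l1_loss(outputs['depth'], targets['depth'])
    return (lambdas[0] * l_boundary
            + lambdas[1] * l_segment
            + lambdas[2] * l_depth)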

16. System Design: Medical Image Boundary Detection

Scenario: Detect tumor boundaries in MRI scans.

Pipeline:

  1. Preprocessing:
    • Normalize intensity (Z-score).
    • Resize to 512x512.
    • Apply CLAHE (Contrast Limited Adaptive Histogram Equalization).
  2. Model: 3D U-Net (process volumetric data).
  3. Post-processing:
    • 3D Connected Components (remove small noise).
    • Surface smoothing (Laplacian smoothing).
  4. Validation: Radiologist review (Human-in-the-loop).

Metrics:

  • Dice Score: Overlap with ground truth.
  • Hausdorff Distance: Maximum boundary error.
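
Both are easy to compute for binary masks; a minimal sketch using SciPy (function names are illustrative):

import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_score(pred, gt):
    intersection = np.logical_and(pred, gt).sum()
    return 2.0 * intersection / (pred.sum() + gt.sum())

def hausdorff_distance(pred, gt):
    # Compare boundary coordinate sets; the symmetric Hausdorff
    # distance is the max of the two directed distances.
    p, g = np.argwhere(pred), np.argwhere(gt)
    return max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])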


17. Deep Dive: Conditional Random Fields (CRF) for Boundary Refinement

Problem: CNN outputs are often blurry at boundaries due to pooling and upsampling.

Solution: Post-process with a CRF to enforce spatial consistency.

Dense CRF (Fully Connected CRF):

  • Every pixel is connected to every other pixel.
  • Unary Potential: CNN prediction for pixel i.
  • Pairwise Potential: Encourages similar pixels to have similar labels.

E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)

Where:

  • \psi_u(x_i) = -\log P(x_i) (from CNN).
  • \psi_p(x_i, x_j) = \mu(x_i, x_j) \cdot k(f_i, f_j) (similarity kernel based on color and position).

Implementation (PyDenseCRF):

import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, prob_map):
    h, w = image.shape[:2]

    # Create CRF
    d = dcrf.DenseCRF2D(w, h, 2) # 2 classes: boundary/non-boundary

    # Unary potential
    U = unary_from_softmax(prob_map)
    d.setUnaryEnergy(U)

    # Pairwise potentials
    # Smoothness kernel (spatial proximity only)
    d.addPairwiseGaussian(sxy=3, compat=3)

    # Appearance kernel (color similarity and spatial proximity)
    d.addPairwiseBilateral(sxy=80, srgb=13, rgbim=image, compat=10)

    # Inference
    Q = d.inference(5) # 5 iterations
    refined = np.argmax(Q, axis=0).reshape((h, w))

    return refined

Result: Sharp, clean boundaries aligned with object edges.

18. Deep Dive: Attention Mechanisms for Boundary Detection

Observation: Not all regions are equally important. Focus on boundary regions.

Spatial Attention:

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size//2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Aggregate across channels
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)

        # Concatenate and convolve
        attention = torch.cat([avg_out, max_out], dim=1)
        attention = self.conv(attention)
        attention = self.sigmoid(attention)

        return x * attention

Channel Attention (SE Block):

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        # Global average pooling
        y = x.view(b, c, -1).mean(dim=2)
        # Excitation
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)

19. Case Study: Instance Segmentation (Mask R-CNN)

Problem: Detect boundaries of individual instances (e.g., 3 separate cars).

Mask R-CNN Architecture:

  1. Backbone: ResNet-50 + FPN (Feature Pyramid Network).
  2. RPN (Region Proposal Network): Proposes bounding boxes.
  3. RoI Align: Extract features for each box (better than RoI Pooling, preserves spatial alignment).
  4. Heads:
    • Classification: What class?
    • Box Regression: Refine box coordinates.
    • Mask: Binary mask for the instance (28x28, upsampled to box size).

Boundary Extraction:

  • The mask head outputs a soft mask.
  • Apply threshold (0.5) to get binary mask.
  • Use cv2.findContours() to extract boundary polygon.

Production Optimization:

import detectron2
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file("mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.WEIGHTS = "model_final.pth"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7

predictor = DefaultPredictor(cfg)

# Inference
outputs = predictor(image)
instances = outputs["instances"]

# Extract boundaries
for i in range(len(instances)):
    mask = instances.pred_masks[i].cpu().numpy()
    contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # contours[0] is the boundary polygon

20. Advanced: Differentiable Rendering for Boundary Optimization

Concept: Treat boundary detection as an inverse rendering problem.

Pipeline:

  1. Predict: 3D mesh of the object.
  2. Render: Project mesh to 2D using differentiable renderer (PyTorch3D).
  3. Loss: Compare rendered silhouette with target boundary.
  4. Backprop: Gradients flow through the renderer to update the mesh.

Code Sketch:

import torch.nn.functional as F
from pytorch3d.structures import Meshes
from pytorch3d.renderer import (
    MeshRenderer, MeshRasterizer, SoftSilhouetteShader,
    RasterizationSettings, PerspectiveCameras
)

# Define mesh (load_mesh is a placeholder for your own loader)
verts, faces = load_mesh()
meshes = Meshes(verts=[verts], faces=[faces])

# Differentiable renderer
cameras = PerspectiveCameras()
raster_settings = RasterizationSettings(image_size=512, blur_radius=1e-5)
renderer = MeshRenderer(
    rasterizer=MeshRasterizer(cameras=cameras, raster_settings=raster_settings),
    shader=SoftSilhouetteShader()
)

# Render; the silhouette lives in the alpha channel of the RGBA output
silhouette = renderer(meshes)[..., 3]

# Loss against the target boundary mask
loss = F.mse_loss(silhouette, target_boundary)
loss.backward()

# Update mesh vertices
optimizer.step()

Use Case: 3D reconstruction from 2D images (e.g., NeRF, 3D Gaussian Splatting).

21. Ethical Considerations

1. Bias in Medical Imaging:

  • If training data is mostly from one demographic (e.g., Caucasian patients), boundary detection might fail on others.
  • Fix: Diverse, representative datasets.

2. Surveillance:

  • Boundary detection enables person tracking and re-identification.
  • Mitigation: Privacy-preserving techniques (on-device processing, federated learning).

3. Deepfakes:

  • Precise boundary detection enables realistic face swaps.
  • Safeguard: Watermarking, detection models.


22. Benchmark Datasets for Boundary Detection

1. BSDS500 (Berkeley Segmentation Dataset):

  • 500 natural images with human-annotated boundaries.
  • Metric: F-measure (ODS/OIS).
  • Reference scores: HED reports F-ODS ≈ 0.79; later deep models exceed 0.82.

2. Cityscapes:

  • 5,000 street scene images with fine annotations.
  • Task: Instance-level boundary detection for cars, pedestrians, etc.
  • Metric: Boundary IoU.

3. NYU Depth V2:

  • 1,449 indoor RGB-D images.
  • Task: Depth discontinuities (boundaries in 3D).
  • Use Case: Robotics, AR/VR.

4. Medical Datasets:

  • ISIC (Skin Lesions): Melanoma boundary detection.
  • BraTS (Brain Tumors): 3D tumor boundaries in MRI.
  • DRIVE (Retinal Vessels): Blood vessel segmentation.

23. Production Monitoring and Debugging

Challenge: Model works in lab, fails in production.

Monitoring Metrics:

  1. Boundary Precision/Recall: Track over time.
  2. Inference Latency: P50, P95, P99.
  3. GPU Utilization: Should be > 80% for efficiency.
  4. Error Cases: Log images where Dice < 0.5.

Debugging Tools:

import wandb

# Log predictions
wandb.log({
    "prediction": wandb.Image(pred_mask),
    "ground_truth": wandb.Image(gt_mask),
    "dice_score": dice,
    "inference_time_ms": latency
})

# Alert if performance degrades
if dice < 0.7:
    wandb.alert(
        title="Low Dice Score",
        text=f"Dice = {dice} on image {image_id}"
    )

A/B Testing:

  • Deploy new model to 5% of traffic.
  • Compare boundary quality (human eval or automated metrics).
  • Gradual rollout if metrics improve.

24. Common Pitfalls and How to Avoid Them

Pitfall 1: Ignoring Class Imbalance

  • Boundary pixels are < 5% of the image.
  • Fix: Use weighted loss or focal loss.

Pitfall 2: Over-smoothing

  • Too much pooling/upsampling blurs boundaries.
  • Fix: Use skip connections (U-Net) or dilated convolutions.

Pitfall 3: Inconsistent Annotations

  • Different annotators draw boundaries differently.
  • Fix: Multi-annotator consensus, use soft labels (average of multiple annotations).

Pitfall 4: Domain Shift

  • Train on sunny day images, deploy on rainy nights.
  • Fix: Domain adaptation (CycleGAN), diverse training data.

Pitfall 5: Not Testing on Edge Cases

  • Occlusion, motion blur, low light.
  • Fix: Curate a “hard examples” test set.

25. Advanced: Boundary-Aware Data Augmentation

Standard augmentation (rotation, flip) isn’t enough for thin boundaries.

Elastic Deformation:

import elasticdeform

# Deform image and mask together with the same random grid
[image_deformed, mask_deformed] = elasticdeform.deform_random_grid(
    [image, mask],
    sigma=25,      # deformation strength
    points=3,      # grid resolution
    order=[3, 0],  # interpolation: cubic for image, nearest for mask
    axis=(0, 1)
)

Boundary-Specific Augmentation:

def augment_boundary(mask, dilation_range=(1, 3)):
    # Randomly dilate or erode the boundary to simulate annotation jitter
    kernel_size = np.random.randint(*dilation_range)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))

    if np.random.rand() > 0.5:
        mask = cv2.dilate(mask, kernel)
    else:
        mask = cv2.erode(mask, kernel)

    return mask

26. Advanced: Multi-Scale Boundary Detection

Objects have boundaries at different scales (fine hair vs. body outline).

Laplacian Pyramid:

def build_laplacian_pyramid(image, levels=4):
    # Gaussian pyramid: repeatedly downsample
    gaussian_pyramid = [image]
    for i in range(levels):
        image = cv2.pyrDown(image)
        gaussian_pyramid.append(image)

    # Laplacian pyramid: each level minus the upsampled next level
    laplacian_pyramid = []
    for i in range(levels):
        size = (gaussian_pyramid[i].shape[1], gaussian_pyramid[i].shape[0])
        expanded = cv2.pyrUp(gaussian_pyramid[i+1], dstsize=size)
        laplacian_pyramid.append(cv2.subtract(gaussian_pyramid[i], expanded))

    return laplacian_pyramid

# Process each scale, then fuse the multi-scale outputs
for level in build_laplacian_pyramid(image):
    boundary_map = model(level)
    # ... fuse multi-scale outputs (e.g., resize and average)


27. Hardware Considerations for Real-Time Boundary Detection

Challenge: Autonomous vehicles need 60 FPS at 1080p.

Hardware Options:

  1. NVIDIA Jetson AGX Xavier:
    • 32 TOPS (INT8).
    • Power: 30W.
    • Use Case: Embedded systems, drones.
  2. Tesla FSD Chip:
    • Custom ASIC for neural networks.
    • 144 TOPS.
    • Use Case: Tesla Autopilot.
  3. Google Edge TPU:
    • 4 TOPS.
    • Power: 2W.
    • Use Case: Mobile devices, IoT.

Optimization for Edge:

# Model pruning
import torch.nn.utils.prune as prune

# Prune 30% of the weights in every conv layer
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name='weight', amount=0.3)

# Knowledge distillation: train a small student to mimic a large teacher.
# (A base-width parameter is assumed here; the UNet class above would
# need a small change to expose it.)
teacher = UNet()  # large model, e.g., 64 base channels
student = UNet()  # small model, e.g., 16 base channels

loss = F.mse_loss(student(x), teacher(x).detach())

Performance Benchmarks (1080p Image):

| Hardware | Model | FPS | Latency (ms) | Power (W) |
| --- | --- | --- | --- | --- |
| RTX 3090 | U-Net (FP32) | 120 | 8.3 | 350 |
| RTX 3090 | U-Net (INT8) | 350 | 2.9 | 350 |
| Jetson Xavier | U-Net (INT8) | 45 | 22 | 30 |
| Edge TPU | MobileNet-UNet | 15 | 67 | 2 |
| CPU (i9) | U-Net (FP32) | 3 | 333 | 125 |

Takeaway: For real-time edge deployment, use INT8 quantization + lightweight architecture.

28. Interview Tips for Boundary Detection Problems

Q1: How would you handle class imbalance in boundary detection? Answer: Use weighted loss (weight boundary pixels 10x higher), focal loss, or Dice loss which is robust to imbalance.

Q2: Why use skip connections in U-Net? Answer: Pooling loses spatial information. Skip connections concatenate high-res features from the encoder to the decoder, recovering fine details needed for precise boundaries.

Q3: How to deploy a boundary detection model at 60 FPS? Answer: Model quantization (FP32 → INT8), TensorRT optimization, use lightweight architectures (MobileNet backbone), process at lower resolution and upsample.

Q4: How to evaluate boundary quality? Answer: Boundary IoU (intersection over union along the contour band), F-measure (precision/recall on boundary pixels), Hausdorff distance (maximum error).

Q5: What’s the difference between edge detection and boundary detection? Answer: Edge detection finds all intensity changes (low-level, includes texture). Boundary detection finds semantically meaningful object contours (high-level, requires understanding).

29. Further Reading

  1. “U-Net: Convolutional Networks for Biomedical Image Segmentation” (Ronneberger et al., 2015): The U-Net paper.
  2. “Holistically-Nested Edge Detection” (Xie & Tu, 2015): HED architecture.
  3. “Mask R-CNN” (He et al., 2017): Instance segmentation standard.
  4. “DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs” (Chen et al., 2018): CRF refinement.
  5. “Attention U-Net: Learning Where to Look for the Pancreas” (Oktay et al., 2018): Attention for medical imaging.

30. Conclusion

Boundary detection has evolved from simple gradient operators (Canny) to sophisticated deep learning models (U-Net, Mask R-CNN) that understand semantic context. The key challenges (thin structures, class imbalance, real-time performance) are being addressed through specialized loss functions (Dice, Tversky), attention mechanisms, and deployment optimizations (TensorRT, quantization). As we move toward 3D understanding and multi-modal fusion (LiDAR + Camera), boundary detection will remain a critical building block for autonomous systems, medical AI, and creative tools.

31. Summary

| Component | Technology |
| --- | --- |
| Low-Level | Canny, Sobel |
| Deep Learning | HED, CASENet, U-Net |
| Refinement | Active Contours, CRF |
| Metrics | Boundary IoU, F-Score |
| Deployment | TensorRT, Quantization |
| Advanced | Attention, Differentiable Rendering |

FAQ

What is the difference between edge detection and boundary detection?

Edge detection finds all intensity changes in an image, including textures, shadows, and noise. Boundary detection identifies semantically meaningful object contours that require high-level understanding of what constitutes an “object.” A checkerboard has many edges but no meaningful boundaries. Modern deep learning approaches like HED and CASENet bridge this gap by learning to distinguish semantic boundaries from low-level intensity changes.

How does U-Net handle precise boundary localization?

U-Net uses an encoder-decoder architecture with skip connections that concatenate high-resolution features from the encoder to the decoder. The encoder captures “what” (context through pooling), while the decoder recovers “where” (precise localization). Skip connections preserve fine spatial details that would otherwise be lost. Combined with specialized loss functions like Dice Loss, U-Net produces thin, accurate boundaries rather than the blurry predictions typical of standard cross-entropy training.

How do you deploy boundary detection models at 60 FPS for autonomous driving?

Key techniques include INT8 quantization (3-4x CPU speedup), TensorRT optimization (5-10x GPU speedup), lightweight architectures like MobileNet backbones, and processing at lower resolution with learned upsampling. Hardware like NVIDIA Jetson Xavier (32 TOPS) or Tesla’s custom FSD chip (144 TOPS) provides dedicated inference acceleration. Profiling on the target hardware is essential since desktop benchmarks do not reflect edge device performance.

What loss functions work best for boundary detection?

Standard cross-entropy produces thick, blurry boundaries because boundary pixels are rare (less than 5% of the image). Dice Loss directly optimizes for overlap and is robust to class imbalance. Tversky Loss generalizes Dice to control the precision-recall tradeoff. Focal Loss down-weights easy examples to focus on hard boundary pixels. In practice, combining Dice Loss with weighted BCE often gives the best results.


Originally published at: arunbaby.com/ml-system-design/0035-boundary-detection-in-ml

Want to work together?

I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.

Get in touch