Boundary Detection in ML
“Defining where one object ends and another begins.”
1. The Problem: Edges vs. Boundaries
- Edge Detection: Finding sharp changes in pixel intensity (low-level).
- Example: A checkerboard pattern has many edges.
- Boundary Detection: Finding semantically meaningful contours of objects (high-level).
- Example: The outline of a “Dog” or “Car”.
Applications:
- Autonomous Driving: Lane detection, road boundaries.
- Medical Imaging: Tumor segmentation, organ boundaries.
- Photo Editing: “Select Subject” tool in Photoshop.
2. Classical Approaches
Before Deep Learning, we used math.
1. Canny Edge Detector (1986)
The gold standard for decades.
- Gaussian Blur: Remove noise.
- Gradient Calculation: Find intensity change ($\nabla I$) using Sobel filters.
- Non-Maximum Suppression: Thin out edges to 1-pixel width.
- Hysteresis Thresholding: Keep strong edges, and weak edges connected to strong ones.
Pros: Fast, precise localization. Cons: Detects all edges (texture, shadows), not just object boundaries.
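As a reference point, a minimal OpenCV sketch of the pipeline (we blur explicitly; `cv2.Canny` bundles the gradient, NMS, and hysteresis steps; the thresholds are illustrative):

```python
import cv2

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)   # 1. Gaussian blur to remove noise
edges = cv2.Canny(blurred, 50, 150)            # 2-4. gradient, NMS, hysteresis thresholding
cv2.imwrite("edges.png", edges)
```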
2. Structured Forests
- Uses Random Forests to classify patches as “edge” or “non-edge”.
- Uses hand-crafted features (color, gradient histograms).
3. Deep Learning Approaches
Modern systems use CNNs to learn semantic boundaries.
1. Holistically-Nested Edge Detection (HED)
- Architecture: VGG-16 backbone.
- Multi-Scale: Side outputs predict edges at multiple stages of the backbone (conv1 through conv5).
- Fusion: Combines side-outputs into a final edge map.
- Loss: Weighted Cross-Entropy (to handle class imbalance: ~90% of pixels are non-edge).
2. CASENet (Category-Aware Semantic Edge Detection)
- Not just “is this an edge?”, but “is this a Dog edge or a Car edge?”.
- Architecture: ResNet-101 with multi-label loss.
- Output: $K$ channels, one for each class boundary.
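A minimal sketch of the category-aware idea: a 1×1 convolution maps backbone features to $K$ boundary channels, trained with a multi-label loss (the class count and feature shape below are assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

K = 19                                        # e.g., Cityscapes classes (assumption)
features = torch.randn(1, 2048, 64, 64)       # ResNet-101 backbone features (assumption)
head = nn.Conv2d(2048, K, kernel_size=1)      # one boundary map per class
logits = head(features)                       # shape (B, K, H, W)
# Multi-label: a pixel can lie on several class boundaries at once, so use per-channel BCE
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros_like(logits))
```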
4. Deep Dive: U-Net for Boundary Detection
U-Net is the standard for biomedical segmentation, but it excels at boundaries too.
Architecture:
- Encoder (Contracting Path): Captures context (What is this?).
- Decoder (Expanding Path): Precise localization (Where is it?).
- Skip Connections: Concatenate high-res features from encoder to decoder to recover fine details.
Loss Function for Thin Boundaries: Standard Cross-Entropy produces thick, blurry boundaries. Solution: Dice Loss or Tversky Loss.
\(Dice = \frac{2 |P \cap G|}{|P| + |G|}\)
where $P$ is the prediction and $G$ is the ground truth.
5. System Design: Lane Detection System
Scenario: Self-driving car needs to stay in lane.
Pipeline:
- Input: Camera feed (1080p, 60fps).
- Preprocessing: ROI cropping (focus on road), Perspective Transform (Bird’s Eye View).
- Model: Lightweight CNN (e.g., ENet or LaneNet).
- Output: Binary mask of lane lines.
- Post-processing:
- Curve Fitting: Fit a 2nd- or 3rd-degree polynomial (e.g., $y = ax^2 + bx + c$ for degree 2) to the detected lane points (see the sketch after this list).
- Kalman Filter: Smooth predictions over time (lanes don’t jump).
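A minimal sketch of this post-processing step, assuming the model outputs a binary bird's-eye-view mask as a NumPy array (an exponential moving average stands in for a full Kalman filter):

```python
import numpy as np

def fit_lane(mask):
    """Fit a 2nd-degree polynomial to lane pixels; x is modeled as a function of y
    because lanes are near-vertical in the bird's-eye view."""
    ys, xs = np.nonzero(mask)
    return np.polyfit(ys, xs, 2)   # coefficients [a, b, c] of x = a*y^2 + b*y + c

class LaneSmoother:
    """Exponential moving average over coefficients (a simple stand-in for a Kalman filter)."""
    def __init__(self, alpha=0.2):
        self.alpha, self.state = alpha, None

    def update(self, coeffs):
        self.state = coeffs if self.state is None else (1 - self.alpha) * self.state + self.alpha * coeffs
        return self.state
```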
Challenges:
- Occlusion: Car in front blocks view.
- Lighting: Shadows, glare, night.
- Worn Markings: Faded lines.
6. Deep Dive: Active Contour Models (Snakes)
A hybrid approach: Deep Learning gives a rough mask, Snakes refine it.
Concept:
- Define a curve (snake) around the object.
- Define an Energy Function:
- $E_{internal}$: Smoothness (don’t bend too sharply).
- $E_{external}$: Image forces (snap to high gradients).
- Minimize energy iteratively. The snake “shrink-wraps” the object.
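One common form of the energy (the classic Kass et al. snake functional; the weight $\gamma$ on the image term is illustrative):

\(E_{snake} = \int_0^1 \Big[ \alpha \,\lvert v'(s)\rvert^2 + \beta \,\lvert v''(s)\rvert^2 - \gamma \,\lvert \nabla I(v(s))\rvert^2 \Big] \, ds\)

where $v(s)$ parameterizes the contour, the first two terms are $E_{internal}$ (stretching and bending penalties), and the last term is $E_{external}$ (attraction to strong image gradients).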
Modern Twist: Deep Snake. Use a GNN to predict vertex offsets for the polygon contour.
7. Evaluation Metrics
- F-Measure (ODS/OIS):
- ODS (Optimal Dataset Scale): Best fixed threshold for the whole dataset.
- OIS (Optimal Image Scale): Best threshold per image.
- Boundary IoU:
- Standard IoU is dominated by the object interior.
- Boundary IoU computes intersection only along the contour band.
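A minimal sketch of Boundary IoU, approximating the contour band by subtracting an eroded mask (the band width `d` is a free parameter):

```python
import cv2
import numpy as np

def boundary_iou(gt, pred, d=2):
    """IoU restricted to a band of width d pixels inside each mask's contour."""
    kernel = np.ones((2 * d + 1, 2 * d + 1), np.uint8)

    def band(mask):
        mask = mask.astype(np.uint8)
        return mask - cv2.erode(mask, kernel)   # pixels within d of the boundary

    gt_b, pred_b = band(gt), band(pred)
    inter = np.logical_and(gt_b, pred_b).sum()
    union = np.logical_or(gt_b, pred_b).sum()
    return inter / union if union > 0 else 1.0
```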
8. Real-World Case Studies
Case Study 1: Adobe Photoshop “Select Subject”
- Problem: User wants to cut out a person.
- Solution: Deep Learning model (Sensei) predicts a “trimap” (Foreground, Background, Unknown).
- Refinement: Matting Laplacian to solve the alpha value for hair/fur pixels.
Case Study 2: Tesla Autopilot
- Problem: Map the drivable space.
- Solution: “HydraNet” multi-task learning.
- Heads: Lane lines, Road edges, Curbs.
- Vector Space: Projects image-space predictions into 3D vector space for planning.
9. Summary
| Component | Technology |
|---|---|
| Low-Level | Canny, Sobel |
| Deep Learning | HED, CASENet, U-Net |
| Refinement | Active Contours, CRF |
| Metrics | Boundary IoU, F-Score |
10. Deep Dive: U-Net Architecture Implementation
Let’s implement a production-ready U-Net in PyTorch.
import torch
import torch.nn as nn
class DoubleConv(nn.Module):
def __init__(self, in_channels, out_channels):
super().__init__()
self.conv = nn.Sequential(
nn.Conv2d(in_channels, out_channels, 3, padding=1),
nn.BatchNorm2d(out_channels),
nn.ReLU(inplace=True),
nn.Conv2d(out_channels, out_channels, 3, padding=1),
nn.BatchNorm2d(out_channels),
nn.ReLU(inplace=True)
)
def forward(self, x):
return self.conv(x)
class UNet(nn.Module):
def __init__(self, in_channels=3, out_channels=1):
super().__init__()
# Encoder
self.enc1 = DoubleConv(in_channels, 64)
self.enc2 = DoubleConv(64, 128)
self.enc3 = DoubleConv(128, 256)
self.enc4 = DoubleConv(256, 512)
self.pool = nn.MaxPool2d(2)
# Bottleneck
self.bottleneck = DoubleConv(512, 1024)
# Decoder
self.upconv4 = nn.ConvTranspose2d(1024, 512, 2, stride=2)
self.dec4 = DoubleConv(1024, 512) # 1024 = 512 (upconv) + 512 (skip)
self.upconv3 = nn.ConvTranspose2d(512, 256, 2, stride=2)
self.dec3 = DoubleConv(512, 256)
self.upconv2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
self.dec2 = DoubleConv(256, 128)
self.upconv1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
self.dec1 = DoubleConv(128, 64)
self.out = nn.Conv2d(64, out_channels, 1)
def forward(self, x):
# Encoder
e1 = self.enc1(x)
e2 = self.enc2(self.pool(e1))
e3 = self.enc3(self.pool(e2))
e4 = self.enc4(self.pool(e3))
# Bottleneck
b = self.bottleneck(self.pool(e4))
# Decoder with skip connections
d4 = self.upconv4(b)
d4 = torch.cat([d4, e4], dim=1)
d4 = self.dec4(d4)
d3 = self.upconv3(d4)
d3 = torch.cat([d3, e3], dim=1)
d3 = self.dec3(d3)
d2 = self.upconv2(d3)
d2 = torch.cat([d2, e2], dim=1)
d2 = self.dec2(d2)
d1 = self.upconv1(d2)
d1 = torch.cat([d1, e1], dim=1)
d1 = self.dec1(d1)
return torch.sigmoid(self.out(d1))
11. Deep Dive: Loss Functions for Boundary Detection
Standard Binary Cross-Entropy (BCE) produces thick boundaries. We need specialized losses.
1. Weighted BCE (Class Imbalance)
Boundary pixels are rare (< 5% of image). Weight them higher.
def weighted_bce_loss(pred, target, pos_weight=10.0):
    # Expects raw logits (no sigmoid): BCEWithLogitsLoss applies the sigmoid internally
    bce = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([pos_weight]))
    return bce(pred, target)
2. Dice Loss (Overlap Metric)
Directly optimizes for IoU.
def dice_loss(pred, target, smooth=1.0):
pred = pred.view(-1)
target = target.view(-1)
intersection = (pred * target).sum()
dice = (2. * intersection + smooth) / (pred.sum() + target.sum() + smooth)
return 1 - dice
3. Tversky Loss (Precision/Recall Trade-off)
Generalization of Dice. Control false positives vs. false negatives.
def tversky_loss(pred, target, alpha=0.7, beta=0.3, smooth=1.0):
pred = pred.view(-1)
target = target.view(-1)
TP = (pred * target).sum()
FP = ((1 - target) * pred).sum()
FN = (target * (1 - pred)).sum()
tversky = (TP + smooth) / (TP + alpha*FP + beta*FN + smooth)
return 1 - tversky
4. Focal Loss (Hard Examples)
Down-weight easy examples, focus on hard ones.
def focal_loss(pred, target, alpha=0.25, gamma=2.0):
bce = nn.functional.binary_cross_entropy(pred, target, reduction='none')
pt = torch.exp(-bce)
focal = alpha * (1 - pt) ** gamma * bce
return focal.mean()
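In practice these losses are usually combined; a hedged example mixing BCE with the `dice_loss` defined above (the 0.5 weighting is illustrative, and `pred` is assumed to hold post-sigmoid probabilities):

```python
def combined_loss(pred, target, bce_weight=0.5):
    # pred: probabilities in [0, 1]; target: binary boundary mask
    bce = nn.functional.binary_cross_entropy(pred, target)
    return bce_weight * bce + (1 - bce_weight) * dice_loss(pred, target)
```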
12. Deep Dive: Post-Processing Techniques
Raw model output is noisy. Refine it.
1. Morphological Operations
import cv2
import numpy as np
def post_process_boundary(mask):
# Convert to uint8
mask = (mask * 255).astype(np.uint8)
# Morphological closing (fill small gaps)
kernel = np.ones((3, 3), np.uint8)
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    # Skeletonization (thin to 1-pixel width; requires the opencv-contrib-python package)
    mask = cv2.ximgproc.thinning(mask)
return mask
2. Non-Maximum Suppression (NMS)
Keep only local maxima along the gradient direction.
def non_max_suppression(edge_map, gradient_direction):
M, N = edge_map.shape
suppressed = np.zeros((M, N))
angle = gradient_direction * 180. / np.pi
angle[angle < 0] += 180
for i in range(1, M-1):
for j in range(1, N-1):
q = 255
r = 255
# Angle 0
if (0 <= angle[i,j] < 22.5) or (157.5 <= angle[i,j] <= 180):
q = edge_map[i, j+1]
r = edge_map[i, j-1]
# Angle 45
elif (22.5 <= angle[i,j] < 67.5):
q = edge_map[i+1, j-1]
r = edge_map[i-1, j+1]
# Angle 90
elif (67.5 <= angle[i,j] < 112.5):
q = edge_map[i+1, j]
r = edge_map[i-1, j]
# Angle 135
elif (112.5 <= angle[i,j] < 157.5):
q = edge_map[i-1, j-1]
r = edge_map[i+1, j+1]
if (edge_map[i,j] >= q) and (edge_map[i,j] >= r):
suppressed[i,j] = edge_map[i,j]
return suppressed
13. Deep Dive: Real-Time Deployment Optimizations
For autonomous driving, we need 60 FPS (16ms per frame).
1. Model Quantization
Convert FP32 to INT8.
import torch.quantization
model_fp32 = UNet()
model_fp32.eval()
# Post-training dynamic quantization
# Note: dynamic quantization only quantizes Linear-style layers; Conv2d needs
# static quantization (see the sketch below)
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32,
    {torch.nn.Linear},
    dtype=torch.qint8
)
# Speedup: roughly 3-4x on CPU for the quantized layers
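For convolutional models, post-training static quantization is usually what delivers the INT8 speedup, since it also quantizes Conv2d activations. A sketch of the eager-mode workflow; the quant/dequant wrapper and the `calibration_loader` are assumptions about your setup:

```python
import torch

class QuantizableUNet(torch.nn.Module):
    """Wrap the float model with quant/dequant stubs so activations can be quantized."""
    def __init__(self, model):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.model = model
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.model(self.quant(x)))

qmodel = QuantizableUNet(UNet()).eval()
qmodel.qconfig = torch.quantization.get_default_qconfig("fbgemm")  # x86 server backend
torch.quantization.prepare(qmodel, inplace=True)                    # insert observers
with torch.no_grad():
    for images, _ in calibration_loader:                            # a few representative batches
        qmodel(images)
torch.quantization.convert(qmodel, inplace=True)                    # FP32 -> INT8
```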
2. TensorRT Optimization
NVIDIA’s inference optimizer.
import tensorrt as trt
# Convert PyTorch model to ONNX
torch.onnx.export(model, dummy_input, "unet.onnx")
# Build TensorRT engine
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))  # ONNX import requires an explicit-batch network
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("unet.onnx", 'rb') as model_file:
parser.parse(model_file.read())
config = builder.create_builder_config()
config.max_workspace_size = 1 << 30 # 1GB
config.set_flag(trt.BuilderFlag.FP16) # Use FP16
engine = builder.build_engine(network, config)
# Speedup: 5-10x on GPU
3. Spatial Pyramid Pooling
Process multiple scales simultaneously.
class SPPLayer(nn.Module):
def __init__(self, num_levels=3):
super().__init__()
self.num_levels = num_levels
def forward(self, x):
        batch_size, channels, _, _ = x.size()
        pooled = []
        for i in range(self.num_levels):
            level = i + 1
            # AdaptiveMaxPool2d computes the kernel size and stride internally for any input size
pooling = nn.AdaptiveMaxPool2d((level, level))
tensor = pooling(x).view(batch_size, channels, -1)
pooled.append(tensor)
return torch.cat(pooled, dim=2)
14. Deep Dive: Data Augmentation for Boundary Detection
Boundaries are thin. Augmentation must preserve them.
import albumentations as A
transform = A.Compose([
A.RandomRotate90(p=0.5),
A.Flip(p=0.5),
A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.1, rotate_limit=15, p=0.5),
A.OneOf([
A.ElasticTransform(alpha=120, sigma=120 * 0.05, alpha_affine=120 * 0.03, p=0.5),
A.GridDistortion(p=0.5),
A.OpticalDistortion(distort_limit=1, shift_limit=0.5, p=0.5),
], p=0.3),
A.RandomBrightnessContrast(p=0.3),
])
# Apply to both image and mask
augmented = transform(image=image, mask=boundary_mask)
15. Deep Dive: Multi-Task Learning
Instead of just boundaries, predict boundaries + segmentation + depth.
Architecture:
Shared Encoder
|
┌─────────┼─────────┐
| | |
Boundary Segment Depth
Head Head Head
Loss: \(L_{total} = \lambda_1 L_{boundary} + \lambda_2 L_{segment} + \lambda_3 L_{depth}\)
Benefit: Shared features improve all tasks. Segmentation provides context for boundaries.
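A minimal sketch of the shared-encoder / multi-head pattern and the weighted loss (head shapes, class counts, and the $\lambda$ values are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskNet(nn.Module):
    def __init__(self, encoder, feat_channels=256, num_classes=21):
        super().__init__()
        self.encoder = encoder                                 # shared backbone
        self.boundary_head = nn.Conv2d(feat_channels, 1, 1)    # binary boundary logits
        self.segment_head = nn.Conv2d(feat_channels, num_classes, 1)
        self.depth_head = nn.Conv2d(feat_channels, 1, 1)       # per-pixel depth

    def forward(self, x):
        f = self.encoder(x)
        return self.boundary_head(f), self.segment_head(f), self.depth_head(f)

def total_loss(outputs, targets, lambdas=(1.0, 0.5, 0.5)):
    boundary, segment, depth = outputs
    l_b = F.binary_cross_entropy_with_logits(boundary, targets["boundary"])
    l_s = F.cross_entropy(segment, targets["segment"])
    l_d = F.l1_loss(depth, targets["depth"])
    return lambdas[0] * l_b + lambdas[1] * l_s + lambdas[2] * l_d
```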
16. System Design: Medical Image Boundary Detection
Scenario: Detect tumor boundaries in MRI scans.
Pipeline:
- Preprocessing (sketched after this list):
- Normalize intensity (Z-score).
- Resize to 512x512.
- Apply CLAHE (Contrast Limited Adaptive Histogram Equalization).
- Model: 3D U-Net (process volumetric data).
- Post-processing:
- 3D Connected Components (remove small noise).
- Surface smoothing (Laplacian smoothing).
- Validation: Radiologist review (Human-in-the-loop).
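A hedged sketch of the preprocessing step from the pipeline above, applied to a single 2D slice (CLAHE's `clipLimit` and tile size are illustrative):

```python
import cv2
import numpy as np

def preprocess_slice(slice_2d):
    # Z-score intensity normalization
    normed = (slice_2d - slice_2d.mean()) / (slice_2d.std() + 1e-8)
    # Rescale to 8-bit so CLAHE can be applied
    u8 = cv2.normalize(normed, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return cv2.resize(clahe.apply(u8), (512, 512))
```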
Metrics:
- Dice Score: Overlap with ground truth.
- Hausdorff Distance: Maximum boundary error.
17. Deep Dive: Conditional Random Fields (CRF) for Boundary Refinement
Problem: CNN outputs are often blurry at boundaries due to pooling and upsampling.
Solution: Post-process with a CRF to enforce spatial consistency.
Dense CRF (Fully Connected CRF):
- Every pixel is connected to every other pixel.
- Unary Potential: CNN prediction for pixel $i$.
- Pairwise Potential: Encourages similar pixels to have similar labels.
These combine into the energy \(E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)\), which is minimized over labelings $x$. Where:
- $\psi_u(x_i) = -\log P(x_i)$ (from CNN).
- $\psi_p(x_i, x_j) = \mu(x_i, x_j) \cdot k(f_i, f_j)$ (similarity kernel based on color and position).
Implementation (PyDenseCRF):
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax
def crf_refine(image, prob_map):
h, w = image.shape[:2]
# Create CRF
d = dcrf.DenseCRF2D(w, h, 2) # 2 classes: boundary/non-boundary
# Unary potential
U = unary_from_softmax(prob_map)
d.setUnaryEnergy(U)
    # Pairwise potentials
    # Smoothness kernel (spatial proximity only)
    d.addPairwiseGaussian(sxy=3, compat=3)
    # Appearance kernel (color + position similarity)
    d.addPairwiseBilateral(sxy=80, srgb=13, rgbim=image, compat=10)
# Inference
Q = d.inference(5) # 5 iterations
refined = np.argmax(Q, axis=0).reshape((h, w))
return refined
Result: Sharp, clean boundaries aligned with object edges.
18. Deep Dive: Attention Mechanisms for Boundary Detection
Observation: Not all regions are equally important. Focus on boundary regions.
Spatial Attention:
class SpatialAttention(nn.Module):
def __init__(self, kernel_size=7):
super().__init__()
self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size//2)
self.sigmoid = nn.Sigmoid()
def forward(self, x):
# Aggregate across channels
avg_out = torch.mean(x, dim=1, keepdim=True)
max_out, _ = torch.max(x, dim=1, keepdim=True)
# Concatenate and convolve
attention = torch.cat([avg_out, max_out], dim=1)
attention = self.conv(attention)
attention = self.sigmoid(attention)
return x * attention
Channel Attention (SE Block):
class SEBlock(nn.Module):
def __init__(self, channels, reduction=16):
super().__init__()
self.fc = nn.Sequential(
nn.Linear(channels, channels // reduction),
nn.ReLU(),
nn.Linear(channels // reduction, channels),
nn.Sigmoid()
)
def forward(self, x):
b, c, _, _ = x.size()
# Global average pooling
y = x.view(b, c, -1).mean(dim=2)
# Excitation
y = self.fc(y).view(b, c, 1, 1)
return x * y.expand_as(x)
19. Case Study: Instance Segmentation (Mask R-CNN)
Problem: Detect boundaries of individual instances (e.g., 3 separate cars).
Mask R-CNN Architecture:
- Backbone: ResNet-50 + FPN (Feature Pyramid Network).
- RPN (Region Proposal Network): Proposes bounding boxes.
- RoI Align: Extract features for each box (better than RoI Pooling, preserves spatial alignment).
- Heads:
- Classification: What class?
- Box Regression: Refine box coordinates.
- Mask: Binary mask for the instance (28x28, upsampled to box size).
Boundary Extraction:
- The mask head outputs a soft mask.
- Apply threshold (0.5) to get binary mask.
- Use cv2.findContours() to extract the boundary polygon.
Production Optimization:
import cv2
import numpy as np
import detectron2
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
cfg = get_cfg()
cfg.merge_from_file("mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.WEIGHTS = "model_final.pth"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7
predictor = DefaultPredictor(cfg)
# Inference
outputs = predictor(image)
instances = outputs["instances"]
# Extract boundaries
for i in range(len(instances)):
mask = instances.pred_masks[i].cpu().numpy()
contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# contours[0] is the boundary polygon
20. Advanced: Differentiable Rendering for Boundary Optimization
Concept: Treat boundary detection as an inverse rendering problem.
Pipeline:
- Predict: 3D mesh of the object.
- Render: Project mesh to 2D using differentiable renderer (PyTorch3D).
- Loss: Compare rendered silhouette with target boundary.
- Backprop: Gradients flow through the renderer to update the mesh.
Code Sketch:
from pytorch3d.renderer import (
    MeshRenderer, MeshRasterizer, SoftSilhouetteShader,
    RasterizationSettings, PerspectiveCameras
)
from pytorch3d.structures import Meshes
import torch.nn.functional as F
# Define mesh
verts, faces = load_mesh()  # assumed helper returning vertex and face tensors
meshes = Meshes(verts=[verts], faces=[faces])
# Differentiable renderer
cameras = PerspectiveCameras()
raster_settings = RasterizationSettings(image_size=512, blur_radius=1e-5)
renderer = MeshRenderer(
rasterizer=MeshRasterizer(cameras=cameras, raster_settings=raster_settings),
shader=SoftSilhouetteShader()
)
# Render (SoftSilhouetteShader writes the soft silhouette into the alpha channel)
silhouette = renderer(meshes)[..., 3]
# Loss
loss = F.mse_loss(silhouette, target_boundary)
loss.backward()
# Update mesh vertices
optimizer.step()
Use Case: 3D reconstruction from 2D images (e.g., NeRF, 3D Gaussian Splatting).
21. Ethical Considerations
1. Bias in Medical Imaging:
- If training data is mostly from one demographic (e.g., Caucasian patients), boundary detection might fail on others.
- Fix: Diverse, representative datasets.
2. Surveillance:
- Boundary detection enables person tracking and re-identification.
- Mitigation: Privacy-preserving techniques (on-device processing, federated learning).
3. Deepfakes:
- Precise boundary detection enables realistic face swaps.
- Safeguard: Watermarking, detection models.
22. Benchmark Datasets for Boundary Detection
1. BSDS500 (Berkeley Segmentation Dataset):
- 500 natural images with human-annotated boundaries.
- Metric: F-measure (ODS/OIS).
- SOTA: deep models reach F-ODS ≈ 0.82 (HED itself reports ≈ 0.79), around human-level consistency (≈ 0.80).
2. Cityscapes:
- 5,000 street scene images with fine annotations.
- Task: Instance-level boundary detection for cars, pedestrians, etc.
- Metric: Boundary IoU.
3. NYU Depth V2:
- 1,449 indoor RGB-D images.
- Task: Depth discontinuities (boundaries in 3D).
- Use Case: Robotics, AR/VR.
4. Medical Datasets:
- ISIC (Skin Lesions): Melanoma boundary detection.
- BraTS (Brain Tumors): 3D tumor boundaries in MRI.
- DRIVE (Retinal Vessels): Blood vessel segmentation.
23. Production Monitoring and Debugging
Challenge: Model works in lab, fails in production.
Monitoring Metrics:
- Boundary Precision/Recall: Track over time.
- Inference Latency: P50, P95, P99.
- GPU Utilization: Should be > 80% for efficiency.
- Error Cases: Log images where Dice < 0.5.
Debugging Tools:
import wandb
# Log predictions
wandb.log({
"prediction": wandb.Image(pred_mask),
"ground_truth": wandb.Image(gt_mask),
"dice_score": dice,
"inference_time_ms": latency
})
# Alert if performance degrades
if dice < 0.7:
wandb.alert(
title="Low Dice Score",
text=f"Dice = {dice} on image {image_id}"
)
A/B Testing:
- Deploy new model to 5% of traffic.
- Compare boundary quality (human eval or automated metrics).
- Gradual rollout if metrics improve.
24. Common Pitfalls and How to Avoid Them
Pitfall 1: Ignoring Class Imbalance
- Boundary pixels are < 5% of the image.
- Fix: Use weighted loss or focal loss.
Pitfall 2: Over-smoothing
- Too much pooling/upsampling blurs boundaries.
- Fix: Use skip connections (U-Net) or dilated convolutions.
Pitfall 3: Inconsistent Annotations
- Different annotators draw boundaries differently.
- Fix: Multi-annotator consensus; use soft labels (average of multiple annotations), as sketched after this list.
Pitfall 4: Domain Shift
- Train on sunny day images, deploy on rainy nights.
- Fix: Domain adaptation (CycleGAN), diverse training data.
Pitfall 5: Not Testing on Edge Cases
- Occlusion, motion blur, low light.
- Fix: Curate a “hard examples” test set.
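For Pitfall 3, a minimal sketch of building soft labels by averaging several annotators' boundary masks:

```python
import numpy as np

def soft_label(annotations):
    """Average binary masks from multiple annotators into a soft target in [0, 1]."""
    stacked = np.stack([a.astype(np.float32) for a in annotations])
    return stacked.mean(axis=0)   # pixels most annotators marked approach 1.0
```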
25. Advanced: Boundary-Aware Data Augmentation
Standard augmentation (rotation, flip) isn’t enough for thin boundaries.
Elastic Deformation:
import elasticdeform
# Deform image and mask together
[image_deformed, mask_deformed] = elasticdeform.deform_random_grid(
[image, mask],
sigma=25, # Deformation strength
points=3, # Grid resolution
order=[3, 0], # Interpolation order (cubic for image, nearest for mask)
axis=(0, 1)
)
Boundary-Specific Augmentation:
def augment_boundary(mask, dilation_range=(1, 3)):
# Randomly dilate or erode boundary
kernel_size = np.random.randint(*dilation_range)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
if np.random.rand() > 0.5:
mask = cv2.dilate(mask, kernel)
else:
mask = cv2.erode(mask, kernel)
return mask
26. Advanced: Multi-Scale Boundary Detection
Objects have boundaries at different scales (fine hair vs. body outline).
Laplacian Pyramid:
def build_laplacian_pyramid(image, levels=4):
gaussian_pyramid = [image]
for i in range(levels):
image = cv2.pyrDown(image)
gaussian_pyramid.append(image)
laplacian_pyramid = []
for i in range(levels):
size = (gaussian_pyramid[i].shape[1], gaussian_pyramid[i].shape[0])
expanded = cv2.pyrUp(gaussian_pyramid[i+1], dstsize=size)
laplacian = cv2.subtract(gaussian_pyramid[i], expanded)
laplacian_pyramid.append(laplacian)
return laplacian_pyramid
# Process each scale and fuse (a simple average-fusion sketch)
outputs = []
for level in laplacian_pyramid:
    boundary_map = model(level)   # per-scale boundary prediction
    # Resize back to full resolution before fusing
    boundary_map = cv2.resize(boundary_map, (laplacian_pyramid[0].shape[1], laplacian_pyramid[0].shape[0]))
    outputs.append(boundary_map)
fused = np.mean(outputs, axis=0)  # fuse multi-scale outputs
27. Hardware Considerations for Real-Time Boundary Detection
Challenge: Autonomous vehicles need 60 FPS at 1080p.
Hardware Options:
- NVIDIA Jetson AGX Xavier:
- 32 TOPS (INT8).
- Power: 30W.
- Use Case: Embedded systems, drones.
- Tesla FSD Chip:
- Custom ASIC for neural networks.
- 144 TOPS.
- Use Case: Tesla Autopilot.
- Google Edge TPU:
- 4 TOPS.
- Power: 2W.
- Use Case: Mobile devices, IoT.
Optimization for Edge:
# Model pruning
import torch.nn.utils.prune as prune
# Prune 30% of weights
for module in model.modules():
if isinstance(module, nn.Conv2d):
prune.l1_unstructured(module, name='weight', amount=0.3)
# Knowledge distillation ("channels" below is an assumed base-width argument, not part of the UNet class above)
import torch.nn.functional as F
teacher = UNet(channels=64)  # large model
student = UNet(channels=16)  # small model
# Train student to mimic the teacher's soft outputs
loss = F.mse_loss(student(x), teacher(x).detach())
Performance Benchmarks (1080p Image):
| Hardware | Model | FPS | Latency (ms) | Power (W) |
|---|---|---|---|---|
| RTX 3090 | U-Net (FP32) | 120 | 8.3 | 350 |
| RTX 3090 | U-Net (INT8) | 350 | 2.9 | 350 |
| Jetson Xavier | U-Net (INT8) | 45 | 22 | 30 |
| Edge TPU | MobileNet-UNet | 15 | 67 | 2 |
| CPU (i9) | U-Net (FP32) | 3 | 333 | 125 |
Takeaway: For real-time edge deployment, use INT8 quantization + lightweight architecture.
28. Interview Tips for Boundary Detection Problems
Q1: How would you handle class imbalance in boundary detection? Answer: Use weighted loss (weight boundary pixels 10x higher), focal loss, or Dice loss which is robust to imbalance.
Q2: Why use skip connections in U-Net? Answer: Pooling loses spatial information. Skip connections concatenate high-res features from the encoder to the decoder, recovering fine details needed for precise boundaries.
Q3: How to deploy a boundary detection model at 60 FPS? Answer: Model quantization (FP32 → INT8), TensorRT optimization, use lightweight architectures (MobileNet backbone), process at lower resolution and upsample.
Q4: How to evaluate boundary quality? Answer: Boundary IoU (intersection over union along the contour band), F-measure (precision/recall on boundary pixels), Hausdorff distance (maximum error).
Q5: What’s the difference between edge detection and boundary detection? Answer: Edge detection finds all intensity changes (low-level, includes texture). Boundary detection finds semantically meaningful object contours (high-level, requires understanding).
29. Further Reading
- “U-Net: Convolutional Networks for Biomedical Image Segmentation” (Ronneberger et al., 2015): The U-Net paper.
- “Holistically-Nested Edge Detection” (Xie & Tu, 2015): HED architecture.
- “Mask R-CNN” (He et al., 2017): Instance segmentation standard.
- “DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs” (Chen et al., 2018): CRF refinement.
- “Attention U-Net: Learning Where to Look for the Pancreas” (Oktay et al., 2018): Attention for medical imaging.
30. Conclusion
Boundary detection has evolved from simple gradient operators (Canny) to sophisticated deep learning models (U-Net, Mask R-CNN) that understand semantic context. The key challenges—thin structures, class imbalance, real-time performance—are being addressed through specialized loss functions (Dice, Tversky), attention mechanisms, and deployment optimizations (TensorRT, quantization). As we move toward 3D understanding and multi-modal fusion (LiDAR + Camera), boundary detection will remain a critical building block for autonomous systems, medical AI, and creative tools.
31. Summary
| Component | Technology |
|---|---|
| Low-Level | Canny, Sobel |
| Deep Learning | HED, CASENet, U-Net |
| Refinement | Active Contours, CRF |
| Metrics | Boundary IoU, F-Score |
| Deployment | TensorRT, Quantization |
| Advanced | Attention, Differentiable Rendering |
Originally published at: arunbaby.com/ml-system-design/0035-boundary-detection-in-ml