Zero Overlap Solution (Hybrid Loss + Legalization) - 16s Total Runtime #37

Open

marcoshernanz wants to merge 1 commit into partcleda:main from marcoshernanz:main

Conversation

@marcoshernanz

Summary

This submission implements a robust placement optimizer that guarantees zero overlap across all test cases (including the 100k-cell test) while keeping total runtime low (~16.7s) and wirelength competitive.

Key Implementation Details

  1. Hybrid Overlap Loss (see the sketch after this list):
    • Uses an exact pairwise loss for small designs ($N \le 2000$).
    • Switches to a differentiable density-based loss (Gaussian binning) for large designs to maintain $O(N)$ memory/compute complexity.
  2. Deterministic Legalization:
    • Implements a fast row-based packing algorithm (_legalize_placement) as a post-processing step.
    • Ensures 100% valid placement (0.0000 overlap) regardless of optimization convergence.
  3. Wirelength Optimization:
    • Includes a dedicated fast path for large designs (_quick_wirelength_opt) using aggressive sampling and a simplified loss.
    • Runs pre- and post-legalization refinement to recover wirelength quality after overlap removal.
  4. Adaptive Scheduling:
    • Dynamically scales epochs, learning rates, and penalty weights based on netlist size ($N < 200$, $N < 2000$, $N < 10000$, etc.).
    • Skips heavy training loops for massive designs (100k cells) in favor of fast legalization, keeping runtime on those tests low (Test 12 runs in ~2.3s).
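A minimal sketch of the hybrid dispatch in item 1, assuming PyTorch; the function name, the dense Gaussian splat, and the bin count are illustrative simplifications, not the PR's exact code:

import torch

PAIRWISE_LIMIT = 2000  # from the summary: exact O(N^2) loss up to this size

def hybrid_overlap_loss(pos, w, h, bins=32):
    # pos: [N, 2] cell centers; w, h: [N] cell widths/heights.
    n = pos.shape[0]
    if n <= PAIRWISE_LIMIT:
        # Exact pairwise rectangle overlap, O(N^2) memory.
        dx = torch.abs(pos[:, None, 0] - pos[None, :, 0])
        dy = torch.abs(pos[:, None, 1] - pos[None, :, 1])
        ox = torch.relu((w[:, None] + w[None, :]) / 2 - dx)
        oy = torch.relu((h[:, None] + h[None, :]) / 2 - dy)
        area = ox * oy * (1 - torch.eye(n, device=pos.device))  # drop self terms
        return area.sum() / (n * n)
    # Density approximation: splat each cell's area onto a coarse grid with
    # Gaussian weights and penalize bins above the uniform-density target.
    # (A production version would scatter-add into bins for O(N) cost; the
    # dense [N, bins^2] form here is only for clarity.)
    lo = pos.min(dim=0).values.detach()
    span = (pos.max(dim=0).values.detach() - lo).clamp(min=1e-6)
    norm = (pos - lo) / span                              # map centers into [0, 1]^2
    ticks = torch.linspace(0.0, 1.0, bins, device=pos.device)
    centers = torch.cartesian_prod(ticks, ticks)          # [bins^2, 2] bin centers
    sigma = 1.0 / bins
    d2 = ((norm[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    weight = torch.exp(-d2 / (2 * sigma ** 2))            # Gaussian splat weights
    density = (weight * (w * h)[:, None]).sum(dim=0)      # soft area per bin
    target = (w * h).sum() / (bins * bins)
    return torch.relu(density - target).pow(2).mean()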

Results

  • Average Overlap: 0.0000 (Passed all 12 tests)
  • Average Wirelength: 0.7480
  • Total Runtime: ~16.68s (Test 12 ran in ~2.3s)

Copilot AI review requested due to automatic review settings January 18, 2026 12:50

Copilot AI left a comment


Pull request overview

This PR implements a sophisticated placement optimizer that achieves zero overlap across all test cases through a hybrid approach combining differentiable overlap losses with deterministic legalization. The solution adapts its strategy based on design size, using exact pairwise overlap detection for small designs and switching to density-based approximation for larger ones.

Changes:

  • Implemented hybrid overlap loss function that switches between exact pairwise (≤2048 cells) and Gaussian density-based approximation (>2048 cells)
  • Added deterministic row-based legalization post-processing to guarantee zero overlaps
  • Introduced adaptive hyperparameter scheduling based on design size with early stopping

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 18 comments.

placement.py - Core implementation: hybrid overlap loss, fast wirelength optimization, legalization algorithm, adaptive training with size-dependent hyperparameters, and optimized evaluation functions for large designs
README.md - Updated leaderboard with new submission results


0.0, device=cell_positions.device, dtype=cell_positions.dtype
)

warmup = min(1.0, (epoch + 1) / max(1, num_epochs * 0.2))

Copilot AI Jan 18, 2026


The warmup schedule modifies wl_scale to be 1.0 - 0.2 * warmup. When warmup reaches 1.0 (after 20% of epochs), wl_scale becomes 0.8. This means the wirelength loss weight is permanently reduced to 80% of its intended value (lambda_wirelength * 0.8) for the remaining 80% of training. This may be intentional, but the behavior is not documented and could be confusing. Consider documenting this behavior or using a more intuitive schedule.

Suggested change
warmup = min(1.0, (epoch + 1) / max(1, num_epochs * 0.2))
warmup = min(1.0, (epoch + 1) / max(1, num_epochs * 0.2))
# During the first 20% of epochs, gradually:
# - increase overlap loss weight from 1.0x to 4.0x (overlap_scale)
# - decrease wirelength loss weight from 1.0x to 0.8x (wl_scale)
# After warmup (warmup == 1.0), overlap_scale stays at 4.0 and
# wl_scale stays at 0.8 for the remaining 80% of training, so the
# effective wirelength weight is 0.8 * lambda_wirelength by design.

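To make the documented behavior concrete, a tiny illustration of the effective scales described in the suggestion (the 1.0x-to-4.0x overlap ramp and 1.0x-to-0.8x wirelength decay come from the comment above; num_epochs = 100 is an arbitrary example):

num_epochs = 100
for epoch in (0, 9, 19, 50, 99):
    warmup = min(1.0, (epoch + 1) / max(1, num_epochs * 0.2))
    overlap_scale = 1.0 + 3.0 * warmup  # ramps 1.0 -> 4.0 over the first 20% of epochs
    wl_scale = 1.0 - 0.2 * warmup       # decays 1.0 -> 0.8, then holds at 0.8
    print(epoch, round(overlap_scale, 2), round(wl_scale, 2))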
Comment on lines +813 to +829
for idx in macro_indices:
    dx = np.abs(positions[:, 0] - positions[idx, 0])
    dy = np.abs(positions[:, 1] - positions[idx, 1])

    min_sep_x = (widths + widths[idx]) / 2
    min_sep_y = (heights + heights[idx]) / 2

    overlap_x = min_sep_x - dx
    overlap_y = min_sep_y - dy
    mask = (overlap_x > 0) & (overlap_y > 0)
    mask[idx] = False

    if mask.any():
        overlap_area = overlap_x[mask] * overlap_y[mask]
        overlap_count += int(overlap_area.size)
        total_overlap_area += float(overlap_area.sum())
        max_overlap_area = max(max_overlap_area, float(overlap_area.max()))

Copilot AI Jan 18, 2026


The overlap counting for macros may double-count macro-to-macro overlaps. When iterating through macros and checking each against all cells, if macro A overlaps with macro B, this overlap will be counted twice: once when idx=A checks against B, and again when idx=B checks against A. The original O(n²) implementation avoided this by only checking pairs where i < j. Consider modifying the logic to exclude already-processed macros from the check, similar to how mask[idx] = False excludes self-overlaps.

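A minimal sketch of the suggested fix, assuming NumPy and the variable names from the snippet above; the processed array and the function wrapper are illustrative, not the PR's code:

import numpy as np

def count_macro_overlaps(positions, widths, heights, macro_indices):
    overlap_count = 0
    total_overlap_area = 0.0
    processed = np.zeros(positions.shape[0], dtype=bool)
    for idx in macro_indices:
        dx = np.abs(positions[:, 0] - positions[idx, 0])
        dy = np.abs(positions[:, 1] - positions[idx, 1])
        overlap_x = (widths + widths[idx]) / 2 - dx
        overlap_y = (heights + heights[idx]) / 2 - dy
        mask = (overlap_x > 0) & (overlap_y > 0)
        mask[idx] = False
        mask &= ~processed        # each macro-macro pair is counted once
        processed[idx] = True
        if mask.any():
            area = overlap_x[mask] * overlap_y[mask]
            overlap_count += int(area.size)
            total_overlap_area += float(area.sum())
    return overlap_count, total_overlap_area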
macro_overlap = (overlap_x * overlap_y).sum()
macro_overlap = macro_overlap / (macro_positions.shape[0] * num_cells + 1e-6)

return density_loss + 2.0 * macro_overlap

Copilot AI Jan 18, 2026


The magic number 2.0 used as the weight for macro overlap loss is not documented. This weight determines the relative importance of macro overlaps vs density-based overlap loss. Consider extracting as a named constant or documenting why this specific weight was chosen.

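A sketch of the suggested extraction, as it would appear inside the loss function; the constant name and the rationale comment are assumptions, not from the PR:

# Empirical weight: a macro collision is far more expensive to legalize than a
# diffuse density violation, so macro overlap is penalized more heavily.
MACRO_OVERLAP_WEIGHT = 2.0

return density_loss + MACRO_OVERLAP_WEIGHT * macro_overlap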
Comment on lines +358 to +365
if num_cells >= 50000:
    grid_size = 16
elif num_cells >= 20000:
    grid_size = 20
elif num_cells >= 10000:
    grid_size = 24
else:
    grid_size = int(min(48, max(24, round(math.sqrt(num_cells) / 8))))

Copilot AI Jan 18, 2026


The grid size thresholds (50000, 20000, 10000) and their corresponding values (16, 20, 24) are magic numbers without clear documentation. Consider extracting these as named constants or providing documentation explaining the rationale for these specific thresholds and grid sizes.

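One possible extraction, as a sketch; the thresholds and sizes are copied from the snippet above, while the names and rationale comments are assumptions:

import math

# Coarser grids for larger designs keep density evaluation cheap; the entries
# mirror the if/elif chain in the diff.
GRID_SIZE_THRESHOLDS = [
    (50_000, 16),  # very large designs: coarse 16x16 grid
    (20_000, 20),
    (10_000, 24),
]

def pick_grid_size(num_cells: int) -> int:
    for min_cells, size in GRID_SIZE_THRESHOLDS:
        if num_cells >= min_cells:
            return size
    return int(min(48, max(24, round(math.sqrt(num_cells) / 8))))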
Comment on lines +842 to +873
bins = {}
for idx in std_indices:
    gx = int(math.floor((positions[idx, 0] - min_x) / bin_size))
    gy = int(math.floor((positions[idx, 1] - min_y) / bin_size))
    bins.setdefault((gx, gy), []).append(idx)

for idx in std_indices:
    gx = int(math.floor((positions[idx, 0] - min_x) / bin_size))
    gy = int(math.floor((positions[idx, 1] - min_y) / bin_size))
    for dx_bin in range(-neighbor, neighbor + 1):
        for dy_bin in range(-neighbor, neighbor + 1):
            bucket = bins.get((gx + dx_bin, gy + dy_bin))
            if not bucket:
                continue
            for jdx in bucket:
                if jdx <= idx:
                    continue
                dx = abs(positions[idx, 0] - positions[jdx, 0])
                dy = abs(positions[idx, 1] - positions[jdx, 1])

                min_sep_x = (widths[idx] + widths[jdx]) / 2
                min_sep_y = (heights[idx] + heights[jdx]) / 2

                overlap_x = min_sep_x - dx
                overlap_y = min_sep_y - dy

                if overlap_x > 0 and overlap_y > 0:
                    overlap_area = overlap_x * overlap_y
                    overlap_count += 1
                    total_overlap_area += overlap_area
                    if overlap_area > max_overlap_area:
                        max_overlap_area = overlap_area

Copilot AI Jan 18, 2026


The spatial hashing implementation for large designs uses a dictionary with tuple keys (gx, gy) for binning. While functional, this could be optimized by using a more efficient data structure or pre-allocating a 2D array. Additionally, the neighbor range calculation could potentially miss overlaps if cells are larger than expected. Consider adding validation to ensure the neighbor range is sufficient to capture all potential overlaps.

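For the second point, a minimal sketch of such a validation, assuming widths and heights are NumPy arrays and bin_size/neighbor as in the snippet:

import math

# Two cells can only overlap if their centers are within half the sum of their
# extents, so the largest cell extent bounds the needed search radius in bins.
max_extent = float(max(widths.max(), heights.max()))
neighbor = max(neighbor, int(math.ceil(max_extent / bin_size)))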
Comment on lines +611 to +640
if num_epochs <= 0:
    if total_cells <= 200:
        num_epochs = 650
    elif total_cells <= 2000:
        num_epochs = 450
    elif total_cells <= 10000:
        num_epochs = 220
    else:
        num_epochs = 90

edge_sample_size = None
wirelength_every = 1
wl_phase = num_epochs

if total_cells >= 10000:
    lr = min(lr, 0.025)
    lambda_overlap = max(lambda_overlap, 28.0)
    lambda_wirelength = min(lambda_wirelength, 0.5)
    edge_sample_size = 50000
    wirelength_every = 6
    wl_phase = max(15, int(num_epochs * 0.35))
elif total_cells >= 2000:
    lr = min(lr, 0.03)
    lambda_overlap = max(lambda_overlap, 20.0)
    lambda_wirelength = min(lambda_wirelength, 1.1)
    edge_sample_size = 80000
    wirelength_every = 3
    wl_phase = max(50, int(num_epochs * 0.6))
else:
    lambda_overlap = max(lambda_overlap, 10.0)

Copilot AI Jan 18, 2026


The magic numbers used for adaptive scheduling (e.g., 650, 450, 220, 90 for epochs; 0.025, 28.0, 0.5 for hyperparameters; 50000, 80000 for edge sampling) lack documentation explaining why these specific values were chosen. This makes it difficult to understand the reasoning behind the adaptive strategy and to tune or debug the algorithm. Consider adding inline comments explaining the rationale for these thresholds and values.

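One way to document the strategy in one place, as a sketch; the numeric values are copied from the diff, while the names and rationale comments are assumptions about the author's intent:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Schedule:
    lr_cap: float               # applied via min(lr, lr_cap)
    overlap_floor: float        # applied via max(lambda_overlap, overlap_floor)
    wl_cap: float               # upper bound on lambda_wirelength
    edge_sample: Optional[int]  # edges sampled per wirelength evaluation
    wl_every: int               # evaluate wirelength loss every N epochs

# Bigger designs get gentler steps, stronger overlap pressure, and sparser,
# cheaper wirelength updates so total runtime stays bounded.
LARGE = Schedule(0.025, 28.0, 0.5, 50_000, 6)   # total_cells >= 10000
MEDIUM = Schedule(0.03, 20.0, 1.1, 80_000, 3)   # total_cells >= 2000
SMALL = Schedule(float("inf"), 10.0, float("inf"), None, 1)

def pick_schedule(total_cells: int) -> Schedule:
    if total_cells >= 10_000:
        return LARGE
    if total_cells >= 2_000:
        return MEDIUM
    return SMALL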
Comment on lines +398 to +407
dx_macro = torch.abs(macro_positions[:, None, 0] - positions[None, :, 0])
dy_macro = torch.abs(macro_positions[:, None, 1] - positions[None, :, 1])

min_sep_x = (macro_widths[:, None] + widths[None, :]) / 2
min_sep_y = (macro_heights[:, None] + heights[None, :]) / 2

overlap_x = torch.relu(min_sep_x - dx_macro)
overlap_y = torch.relu(min_sep_y - dy_macro)
macro_overlap = (overlap_x * overlap_y).sum()
macro_overlap = macro_overlap / (macro_positions.shape[0] * num_cells + 1e-6)

Copilot AI Jan 18, 2026


When macro_mask selects zero macros, macro_positions is an empty tensor with shape [0, 2]. This causes dx_macro and dy_macro to have shape [0, num_cells], and macro_overlap evaluates to 0 / (0 + 1e-6) = 0.0. While this works, it performs unnecessary computation. Consider adding an early check to skip macro overlap computation when no macros exist.

Suggested change
dx_macro = torch.abs(macro_positions[:, None, 0] - positions[None, :, 0])
dy_macro = torch.abs(macro_positions[:, None, 1] - positions[None, :, 1])
min_sep_x = (macro_widths[:, None] + widths[None, :]) / 2
min_sep_y = (macro_heights[:, None] + heights[None, :]) / 2
overlap_x = torch.relu(min_sep_x - dx_macro)
overlap_y = torch.relu(min_sep_y - dy_macro)
macro_overlap = (overlap_x * overlap_y).sum()
macro_overlap = macro_overlap / (macro_positions.shape[0] * num_cells + 1e-6)
if macro_positions.shape[0] == 0:
    # No macros present; skip macro overlap computation.
    macro_overlap = torch.tensor(0.0, device=positions.device)
else:
    dx_macro = torch.abs(macro_positions[:, None, 0] - positions[None, :, 0])
    dy_macro = torch.abs(macro_positions[:, None, 1] - positions[None, :, 1])
    min_sep_x = (macro_widths[:, None] + widths[None, :]) / 2
    min_sep_y = (macro_heights[:, None] + heights[None, :]) / 2
    overlap_x = torch.relu(min_sep_x - dx_macro)
    overlap_y = torch.relu(min_sep_y - dy_macro)
    macro_overlap = (overlap_x * overlap_y).sum()
    macro_overlap = macro_overlap / (macro_positions.shape[0] * num_cells + 1e-6)

N = cell_features.shape[0]
if N <= 1:
    return torch.tensor(0.0, requires_grad=True)
del pin_features, edge_list

Copilot AI Jan 18, 2026


The statement del pin_features, edge_list explicitly deletes unused parameters. While this can help clarify that these parameters are not used in this function, using del on function parameters is uncommon in Python and can be confusing. A more conventional approach would be to prefix the parameters with an underscore (e.g., _pin_features, _edge_list) or simply not use them without explicitly deleting them. The Python garbage collector will handle cleanup automatically when the function exits.

Suggested change
del pin_features, edge_list

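The underscore convention would look like this, as a sketch with an assumed signature:

import torch

def overlap_loss(cell_features, _pin_features=None, _edge_list=None):
    # Leading underscores mark intentionally unused parameters; no del needed.
    n = cell_features.shape[0]
    if n <= 1:
        return torch.tensor(0.0, requires_grad=True)
    ...  # rest of the loss computation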
Comment on lines +490 to +529
positions_cpu = positions.detach().cpu().numpy()
widths_cpu = widths.detach().cpu().numpy()
heights_cpu = heights.detach().cpu().numpy()

max_width = float(widths_cpu.max())
target_width = max(math.sqrt(total_area) * 1.2, max_width * 1.2)
pad = 0.05

x_vals = positions_cpu[:, 0]
y_vals = positions_cpu[:, 1]
range_x = float(x_vals.max() - x_vals.min() + 1.0)
sort_key = y_vals * range_x + x_vals
order = np.argsort(sort_key, kind="mergesort")

new_positions = np.zeros_like(positions_cpu)
current_x = 0.0
current_y = 0.0
row_height = 0.0

for idx in order:
    width = float(widths_cpu[idx])
    height = float(heights_cpu[idx])
    pad_x = max(pad, width * 0.02)
    pad_y = max(pad, height * 0.02)

    if current_x > 0.0 and current_x + width > target_width:
        current_y += row_height + pad_y
        current_x = 0.0
        row_height = 0.0

    new_positions[idx, 0] = current_x + width / 2
    new_positions[idx, 1] = current_y + height / 2
    current_x += width + pad_x
    row_height = max(row_height, height)

old_center = positions_cpu.mean(axis=0)
new_center = new_positions.mean(axis=0)
new_positions = new_positions + (old_center - new_center)

return torch.as_tensor(new_positions, device=device, dtype=dtype)

Copilot AI Jan 18, 2026


The function _legalize_placement converts tensors to numpy, performs operations in numpy, then converts back to tensors. This is inefficient as it breaks the computational graph and requires multiple data transfers between GPU and CPU. Consider implementing this functionality directly in PyTorch to maintain the computational graph and avoid unnecessary data transfers, especially since the placement optimization is already using PyTorch tensors.

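A sketch of keeping the work on-device as the comment suggests; note the row-packing loop is inherently sequential (each placement depends on the previous one), so only the sort and re-centering vectorize cleanly. Names mirror the snippet above, and the function wrapper is assumed:

import torch

def legalize_torch(positions, widths, heights, target_width, pad=0.05):
    x, y = positions[:, 0], positions[:, 1]
    range_x = x.max() - x.min() + 1.0
    order = torch.argsort(y * range_x + x)       # row-major sweep order, on-device
    new_pos = torch.zeros_like(positions)
    cur_x = cur_y = row_h = 0.0
    for idx in order.tolist():                   # packing itself stays serial
        w, h = float(widths[idx]), float(heights[idx])
        if cur_x > 0.0 and cur_x + w > target_width:
            cur_y += row_h + max(pad, h * 0.02)  # start a new row
            cur_x, row_h = 0.0, 0.0
        new_pos[idx, 0] = cur_x + w / 2
        new_pos[idx, 1] = cur_y + h / 2
        cur_x += w + max(pad, w * 0.02)
        row_h = max(row_h, h)
    # Re-center the packed layout on the original centroid.
    return new_pos + (positions.mean(dim=0) - new_pos.mean(dim=0))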
for pin_idx in range(total_pins):
    pin_cell = pin_to_cell[pin_idx].item()
    num_connections = torch.randint(1, 4, (1,)).item()  # 1-3 connections per pin
    pin_cell = int(pin_to_cell[pin_idx].item())

Copilot AI Jan 18, 2026


Variable pin_cell is not used.

Suggested change
pin_cell = int(pin_to_cell[pin_idx].item())
