Zero Overlap Solution (Hybrid Loss + Legalization) - 16s Total Runtime #37

Open

marcoshernanz wants to merge 1 commit into partcleda:main from marcoshernanz:main

Conversation

@marcoshernanz

Summary

This submission implements a robust placement optimizer that guarantees zero overlap across all test cases (including the 100k-cell test) while keeping total runtime low (~16.7s) and wirelength competitive.

Key Implementation Details

  1. Hybrid Overlap Loss (see the sketch after this list):
    • Uses an exact pairwise loss for small designs ($N \le 2000$).
    • Switches to a differentiable density-based loss (Gaussian binning) for large designs to maintain $O(N)$ memory/compute complexity.
  2. Deterministic Legalization:
    • Implements a fast row-based packing algorithm (_legalize_placement) as a post-processing step.
    • Ensures 100% valid placement (0.0000 overlap) regardless of optimization convergence.
  3. Wirelength Optimization:
    • Includes a dedicated fast path for large designs (_quick_wirelength_opt) using aggressive sampling and a simplified loss.
    • Runs pre- and post-legalization refinement to recover wirelength quality after overlap removal.
  4. Adaptive Scheduling:
    • Dynamically scales epochs, learning rates, and penalty weights based on netlist size ($N < 200$, $N < 2000$, $N < 10000$, etc.).
    • Skips heavy training loops for massive designs (100k cells) in favor of fast legalization, keeping runtime on those tests low (Test 12 runs in ~2.3s).
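A minimal sketch of the hybrid dispatch in item 1, assuming PyTorch; the function name, the dense Gaussian splat, and the bin count are illustrative simplifications, not the PR's exact code:

import torch

PAIRWISE_LIMIT = 2000  # from the summary: exact O(N^2) loss up to this size

def hybrid_overlap_loss(pos, w, h, bins=32):
    # pos: [N, 2] cell centers; w, h: [N] cell widths/heights.
    n = pos.shape[0]
    if n <= PAIRWISE_LIMIT:
        # Exact pairwise rectangle overlap, O(N^2) memory.
        dx = torch.abs(pos[:, None, 0] - pos[None, :, 0])
        dy = torch.abs(pos[:, None, 1] - pos[None, :, 1])
        ox = torch.relu((w[:, None] + w[None, :]) / 2 - dx)
        oy = torch.relu((h[:, None] + h[None, :]) / 2 - dy)
        area = ox * oy * (1 - torch.eye(n, device=pos.device))  # drop self terms
        return area.sum() / (n * n)
    # Density approximation: splat each cell's area onto a coarse grid with
    # Gaussian weights and penalize bins above the uniform-density target.
    # (A production version would scatter-add into bins for O(N) cost; the
    # dense [N, bins^2] form here is only for clarity.)
    lo = pos.min(dim=0).values.detach()
    span = (pos.max(dim=0).values.detach() - lo).clamp(min=1e-6)
    norm = (pos - lo) / span                              # map centers into [0, 1]^2
    ticks = torch.linspace(0.0, 1.0, bins, device=pos.device)
    centers = torch.cartesian_prod(ticks, ticks)          # [bins^2, 2] bin centers
    sigma = 1.0 / bins
    d2 = ((norm[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    weight = torch.exp(-d2 / (2 * sigma ** 2))            # Gaussian splat weights
    density = (weight * (w * h)[:, None]).sum(dim=0)      # soft area per bin
    target = (w * h).sum() / (bins * bins)
    return torch.relu(density - target).pow(2).mean()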

Results

  • Average Overlap: 0.0000 (Passed all 12 tests)
  • Average Wirelength: 0.7480
  • Total Runtime: ~16.68s (Test 12 ran in ~2.3s)

Copilot AI review requested due to automatic review settings January 18, 2026 12:50

Copilot AI left a comment


Pull request overview

This PR implements a sophisticated placement optimizer that achieves zero overlap across all test cases through a hybrid approach combining differentiable overlap losses with deterministic legalization. The solution adapts its strategy based on design size, using exact pairwise overlap detection for small designs and switching to density-based approximation for larger ones.

Changes:

  • Implemented hybrid overlap loss function that switches between exact pairwise (≤2048 cells) and Gaussian density-based approximation (>2048 cells)
  • Added deterministic row-based legalization post-processing to guarantee zero overlaps
  • Introduced adaptive hyperparameter scheduling based on design size with early stopping

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 18 comments.

placement.py - Core implementation: hybrid overlap loss, fast wirelength optimization, legalization algorithm, adaptive training with size-dependent hyperparameters, and optimized evaluation functions for large designs
README.md - Updated leaderboard with new submission results


0.0, device=cell_positions.device, dtype=cell_positions.dtype
)

warmup = min(1.0, (epoch + 1) / max(1, num_epochs * 0.2))

Copilot AI Jan 18, 2026


The warmup schedule modifies wl_scale to be 1.0 - 0.2 * warmup. When warmup reaches 1.0 (after 20% of epochs), wl_scale becomes 0.8. This means the wirelength loss weight is permanently reduced to 80% of its intended value (lambda_wirelength * 0.8) for the remaining 80% of training. This may be intentional, but the behavior is not documented and could be confusing. Consider documenting this behavior or using a more intuitive schedule.

Suggested change
warmup = min(1.0, (epoch + 1) / max(1, num_epochs * 0.2))
warmup = min(1.0, (epoch + 1) / max(1, num_epochs * 0.2))
# During the first 20% of epochs, gradually:
# - increase overlap loss weight from 1.0x to 4.0x (overlap_scale)
# - decrease wirelength loss weight from 1.0x to 0.8x (wl_scale)
# After warmup (warmup == 1.0), overlap_scale stays at 4.0 and
# wl_scale stays at 0.8 for the remaining 80% of training, so the
# effective wirelength weight is 0.8 * lambda_wirelength by design.

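To make the documented behavior concrete, a tiny illustration of the effective scales described in the suggestion (the 1.0x-to-4.0x overlap ramp and 1.0x-to-0.8x wirelength decay come from the comment above; num_epochs = 100 is an arbitrary example):

num_epochs = 100
for epoch in (0, 9, 19, 50, 99):
    warmup = min(1.0, (epoch + 1) / max(1, num_epochs * 0.2))
    overlap_scale = 1.0 + 3.0 * warmup  # ramps 1.0 -> 4.0 over the first 20% of epochs
    wl_scale = 1.0 - 0.2 * warmup       # decays 1.0 -> 0.8, then holds at 0.8
    print(epoch, round(overlap_scale, 2), round(wl_scale, 2))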
Comment on lines +813 to +829
for idx in macro_indices:
    dx = np.abs(positions[:, 0] - positions[idx, 0])
    dy = np.abs(positions[:, 1] - positions[idx, 1])

    min_sep_x = (widths + widths[idx]) / 2
    min_sep_y = (heights + heights[idx]) / 2

    overlap_x = min_sep_x - dx
    overlap_y = min_sep_y - dy
    mask = (overlap_x > 0) & (overlap_y > 0)
    mask[idx] = False

    if mask.any():
        overlap_area = overlap_x[mask] * overlap_y[mask]
        overlap_count += int(overlap_area.size)
        total_overlap_area += float(overlap_area.sum())
        max_overlap_area = max(max_overlap_area, float(overlap_area.max()))

Copilot AI Jan 18, 2026


The overlap counting for macros may double-count macro-to-macro overlaps. When iterating through macros and checking each against all cells, if macro A overlaps with macro B, this overlap will be counted twice: once when idx=A checks against B, and again when idx=B checks against A. The original O(n²) implementation avoided this by only checking pairs where i < j. Consider modifying the logic to exclude already-processed macros from the check, similar to how mask[idx] = False excludes self-overlaps.

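A minimal sketch of the suggested fix, assuming NumPy and the variable names from the snippet above; the processed array and the function wrapper are illustrative, not the PR's code:

import numpy as np

def count_macro_overlaps(positions, widths, heights, macro_indices):
    overlap_count = 0
    total_overlap_area = 0.0
    processed = np.zeros(positions.shape[0], dtype=bool)
    for idx in macro_indices:
        dx = np.abs(positions[:, 0] - positions[idx, 0])
        dy = np.abs(positions[:, 1] - positions[idx, 1])
        overlap_x = (widths + widths[idx]) / 2 - dx
        overlap_y = (heights + heights[idx]) / 2 - dy
        mask = (overlap_x > 0) & (overlap_y > 0)
        mask[idx] = False
        mask &= ~processed        # each macro-macro pair is counted once
        processed[idx] = True
        if mask.any():
            area = overlap_x[mask] * overlap_y[mask]
            overlap_count += int(area.size)
            total_overlap_area += float(area.sum())
    return overlap_count, total_overlap_area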
macro_overlap = (overlap_x * overlap_y).sum()
macro_overlap = macro_overlap / (macro_positions.shape[0] * num_cells + 1e-6)

return density_loss + 2.0 * macro_overlap

Copilot AI Jan 18, 2026


The magic number 2.0 used as the weight for macro overlap loss is not documented. This weight determines the relative importance of macro overlaps vs density-based overlap loss. Consider extracting as a named constant or documenting why this specific weight was chosen.

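A sketch of the suggested extraction, as it would appear inside the loss function; the constant name and the rationale comment are assumptions, not from the PR:

# Empirical weight: a macro collision is far more expensive to legalize than a
# diffuse density violation, so macro overlap is penalized more heavily.
MACRO_OVERLAP_WEIGHT = 2.0

return density_loss + MACRO_OVERLAP_WEIGHT * macro_overlap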
Comment on lines +358 to +365
if num_cells >= 50000:
    grid_size = 16
elif num_cells >= 20000:
    grid_size = 20
elif num_cells >= 10000:
    grid_size = 24
else:
    grid_size = int(min(48, max(24, round(math.sqrt(num_cells) / 8))))

Copilot AI Jan 18, 2026


The grid size thresholds (50000, 20000, 10000) and their corresponding values (16, 20, 24) are magic numbers without clear documentation. Consider extracting these as named constants or providing documentation explaining the rationale for these specific thresholds and grid sizes.

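One possible extraction, as a sketch; the thresholds and sizes are copied from the snippet above, while the names and rationale comments are assumptions:

import math

# Coarser grids for larger designs keep density evaluation cheap; the entries
# mirror the if/elif chain in the diff.
GRID_SIZE_THRESHOLDS = [
    (50_000, 16),  # very large designs: coarse 16x16 grid
    (20_000, 20),
    (10_000, 24),
]

def pick_grid_size(num_cells: int) -> int:
    for min_cells, size in GRID_SIZE_THRESHOLDS:
        if num_cells >= min_cells:
            return size
    return int(min(48, max(24, round(math.sqrt(num_cells) / 8))))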
Comment on lines +842 to +873
bins = {}
for idx in std_indices:
    gx = int(math.floor((positions[idx, 0] - min_x) / bin_size))
    gy = int(math.floor((positions[idx, 1] - min_y) / bin_size))
    bins.setdefault((gx, gy), []).append(idx)

for idx in std_indices:
    gx = int(math.floor((positions[idx, 0] - min_x) / bin_size))
    gy = int(math.floor((positions[idx, 1] - min_y) / bin_size))
    for dx_bin in range(-neighbor, neighbor + 1):
        for dy_bin in range(-neighbor, neighbor + 1):
            bucket = bins.get((gx + dx_bin, gy + dy_bin))
            if not bucket:
                continue
            for jdx in bucket:
                if jdx <= idx:
                    continue
                dx = abs(positions[idx, 0] - positions[jdx, 0])
                dy = abs(positions[idx, 1] - positions[jdx, 1])

                min_sep_x = (widths[idx] + widths[jdx]) / 2
                min_sep_y = (heights[idx] + heights[jdx]) / 2

                overlap_x = min_sep_x - dx
                overlap_y = min_sep_y - dy

                if overlap_x > 0 and overlap_y > 0:
                    overlap_area = overlap_x * overlap_y
                    overlap_count += 1
                    total_overlap_area += overlap_area
                    if overlap_area > max_overlap_area:
                        max_overlap_area = overlap_area

Copilot AI Jan 18, 2026


The spatial hashing implementation for large designs uses a dictionary with tuple keys (gx, gy) for binning. While functional, this could be optimized by using a more efficient data structure or pre-allocating a 2D array. Additionally, the neighbor range calculation could potentially miss overlaps if cells are larger than expected. Consider adding validation to ensure the neighbor range is sufficient to capture all potential overlaps.

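For the second point, a minimal sketch of such a validation, assuming widths and heights are NumPy arrays and bin_size/neighbor as in the snippet:

import math

# Two cells can only overlap if their centers are within half the sum of their
# extents, so the largest cell extent bounds the needed search radius in bins.
max_extent = float(max(widths.max(), heights.max()))
neighbor = max(neighbor, int(math.ceil(max_extent / bin_size)))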
Comment on lines +611 to +640
if num_epochs <= 0:
    if total_cells <= 200:
        num_epochs = 650
    elif total_cells <= 2000:
        num_epochs = 450
    elif total_cells <= 10000:
        num_epochs = 220
    else:
        num_epochs = 90

edge_sample_size = None
wirelength_every = 1
wl_phase = num_epochs

if total_cells >= 10000:
    lr = min(lr, 0.025)
    lambda_overlap = max(lambda_overlap, 28.0)
    lambda_wirelength = min(lambda_wirelength, 0.5)
    edge_sample_size = 50000
    wirelength_every = 6
    wl_phase = max(15, int(num_epochs * 0.35))
elif total_cells >= 2000:
    lr = min(lr, 0.03)
    lambda_overlap = max(lambda_overlap, 20.0)
    lambda_wirelength = min(lambda_wirelength, 1.1)
    edge_sample_size = 80000
    wirelength_every = 3
    wl_phase = max(50, int(num_epochs * 0.6))
else:
    lambda_overlap = max(lambda_overlap, 10.0)

Copilot AI Jan 18, 2026


The magic numbers used for adaptive scheduling (e.g., 650, 450, 220, 90 for epochs; 0.025, 28.0, 0.5 for hyperparameters; 50000, 80000 for edge sampling) lack documentation explaining why these specific values were chosen. This makes it difficult to understand the reasoning behind the adaptive strategy and to tune or debug the algorithm. Consider adding inline comments explaining the rationale for these thresholds and values.

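One way to document the strategy in one place, as a sketch; the numeric values are copied from the diff, while the names and rationale comments are assumptions about the author's intent:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Schedule:
    lr_cap: float               # applied via min(lr, lr_cap)
    overlap_floor: float        # applied via max(lambda_overlap, overlap_floor)
    wl_cap: float               # upper bound on lambda_wirelength
    edge_sample: Optional[int]  # edges sampled per wirelength evaluation
    wl_every: int               # evaluate wirelength loss every N epochs

# Bigger designs get gentler steps, stronger overlap pressure, and sparser,
# cheaper wirelength updates so total runtime stays bounded.
LARGE = Schedule(0.025, 28.0, 0.5, 50_000, 6)   # total_cells >= 10000
MEDIUM = Schedule(0.03, 20.0, 1.1, 80_000, 3)   # total_cells >= 2000
SMALL = Schedule(float("inf"), 10.0, float("inf"), None, 1)

def pick_schedule(total_cells: int) -> Schedule:
    if total_cells >= 10_000:
        return LARGE
    if total_cells >= 2_000:
        return MEDIUM
    return SMALL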
Comment on lines +398 to +407
dx_macro = torch.abs(macro_positions[:, None, 0] - positions[None, :, 0])
dy_macro = torch.abs(macro_positions[:, None, 1] - positions[None, :, 1])

min_sep_x = (macro_widths[:, None] + widths[None, :]) / 2
min_sep_y = (macro_heights[:, None] + heights[None, :]) / 2

overlap_x = torch.relu(min_sep_x - dx_macro)
overlap_y = torch.relu(min_sep_y - dy_macro)
macro_overlap = (overlap_x * overlap_y).sum()
macro_overlap = macro_overlap / (macro_positions.shape[0] * num_cells + 1e-6)

Copilot AI Jan 18, 2026


When macro_mask selects zero macros, macro_positions is an empty tensor with shape [0, 2]. This causes dx_macro and dy_macro to have shape [0, num_cells], and macro_overlap evaluates to 0 / (0 + 1e-6) = 0.0. While this works, it performs unnecessary computation. Consider adding an early check to skip macro overlap computation when no macros exist.

Suggested change
dx_macro = torch.abs(macro_positions[:, None, 0] - positions[None, :, 0])
dy_macro = torch.abs(macro_positions[:, None, 1] - positions[None, :, 1])
min_sep_x = (macro_widths[:, None] + widths[None, :]) / 2
min_sep_y = (macro_heights[:, None] + heights[None, :]) / 2
overlap_x = torch.relu(min_sep_x - dx_macro)
overlap_y = torch.relu(min_sep_y - dy_macro)
macro_overlap = (overlap_x * overlap_y).sum()
macro_overlap = macro_overlap / (macro_positions.shape[0] * num_cells + 1e-6)
if macro_positions.shape[0] == 0:
    # No macros present; skip macro overlap computation.
    macro_overlap = torch.tensor(0.0, device=positions.device)
else:
    dx_macro = torch.abs(macro_positions[:, None, 0] - positions[None, :, 0])
    dy_macro = torch.abs(macro_positions[:, None, 1] - positions[None, :, 1])
    min_sep_x = (macro_widths[:, None] + widths[None, :]) / 2
    min_sep_y = (macro_heights[:, None] + heights[None, :]) / 2
    overlap_x = torch.relu(min_sep_x - dx_macro)
    overlap_y = torch.relu(min_sep_y - dy_macro)
    macro_overlap = (overlap_x * overlap_y).sum()
    macro_overlap = macro_overlap / (macro_positions.shape[0] * num_cells + 1e-6)

N = cell_features.shape[0]
if N <= 1:
    return torch.tensor(0.0, requires_grad=True)
del pin_features, edge_list

Copilot AI Jan 18, 2026


The statement del pin_features, edge_list explicitly deletes unused parameters. While this can help clarify that these parameters are not used in this function, using del on function parameters is uncommon in Python and can be confusing. A more conventional approach would be to prefix the parameters with an underscore (e.g., _pin_features, _edge_list) or simply not use them without explicitly deleting them. The Python garbage collector will handle cleanup automatically when the function exits.

Suggested change
del pin_features, edge_list

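The underscore convention would look like this, as a sketch with an assumed signature:

import torch

def overlap_loss(cell_features, _pin_features=None, _edge_list=None):
    # Leading underscores mark intentionally unused parameters; no del needed.
    n = cell_features.shape[0]
    if n <= 1:
        return torch.tensor(0.0, requires_grad=True)
    ...  # rest of the loss computation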
Comment on lines +490 to +529
positions_cpu = positions.detach().cpu().numpy()
widths_cpu = widths.detach().cpu().numpy()
heights_cpu = heights.detach().cpu().numpy()

max_width = float(widths_cpu.max())
target_width = max(math.sqrt(total_area) * 1.2, max_width * 1.2)
pad = 0.05

x_vals = positions_cpu[:, 0]
y_vals = positions_cpu[:, 1]
range_x = float(x_vals.max() - x_vals.min() + 1.0)
sort_key = y_vals * range_x + x_vals
order = np.argsort(sort_key, kind="mergesort")

new_positions = np.zeros_like(positions_cpu)
current_x = 0.0
current_y = 0.0
row_height = 0.0

for idx in order:
    width = float(widths_cpu[idx])
    height = float(heights_cpu[idx])
    pad_x = max(pad, width * 0.02)
    pad_y = max(pad, height * 0.02)

    if current_x > 0.0 and current_x + width > target_width:
        current_y += row_height + pad_y
        current_x = 0.0
        row_height = 0.0

    new_positions[idx, 0] = current_x + width / 2
    new_positions[idx, 1] = current_y + height / 2
    current_x += width + pad_x
    row_height = max(row_height, height)

old_center = positions_cpu.mean(axis=0)
new_center = new_positions.mean(axis=0)
new_positions = new_positions + (old_center - new_center)

return torch.as_tensor(new_positions, device=device, dtype=dtype)

Copilot AI Jan 18, 2026


The function _legalize_placement converts tensors to numpy, performs operations in numpy, then converts back to tensors. This is inefficient as it breaks the computational graph and requires multiple data transfers between GPU and CPU. Consider implementing this functionality directly in PyTorch to maintain the computational graph and avoid unnecessary data transfers, especially since the placement optimization is already using PyTorch tensors.

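A sketch of keeping the work on-device as the comment suggests; note the row-packing loop is inherently sequential (each placement depends on the previous one), so only the sort and re-centering vectorize cleanly. Names mirror the snippet above, and the function wrapper is assumed:

import torch

def legalize_torch(positions, widths, heights, target_width, pad=0.05):
    x, y = positions[:, 0], positions[:, 1]
    range_x = x.max() - x.min() + 1.0
    order = torch.argsort(y * range_x + x)       # row-major sweep order, on-device
    new_pos = torch.zeros_like(positions)
    cur_x = cur_y = row_h = 0.0
    for idx in order.tolist():                   # packing itself stays serial
        w, h = float(widths[idx]), float(heights[idx])
        if cur_x > 0.0 and cur_x + w > target_width:
            cur_y += row_h + max(pad, h * 0.02)  # start a new row
            cur_x, row_h = 0.0, 0.0
        new_pos[idx, 0] = cur_x + w / 2
        new_pos[idx, 1] = cur_y + h / 2
        cur_x += w + max(pad, w * 0.02)
        row_h = max(row_h, h)
    # Re-center the packed layout on the original centroid.
    return new_pos + (positions.mean(dim=0) - new_pos.mean(dim=0))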
for pin_idx in range(total_pins):
    pin_cell = pin_to_cell[pin_idx].item()
    num_connections = torch.randint(1, 4, (1,)).item()  # 1-3 connections per pin
    pin_cell = int(pin_to_cell[pin_idx].item())

Copilot AI Jan 18, 2026


Variable pin_cell is not used.

Suggested change
pin_cell = int(pin_to_cell[pin_idx].item())
