WIP - E2E triage by alishakawaguchi · Pull Request #647 · entireio/cli

alishakawaguchi · 2026-03-06T20:19:21Z

No description provided.

…script

Automates E2E failure triage with three new components:
- scripts/download-e2e-artifacts.sh: reusable script to download CI artifacts
- .claude/skills/e2e-triage/SKILL.md: 7-step triage skill (classify flaky vs real bug, create PRs or issues)
- .github/workflows/e2e-triage.yml: workflow_run trigger that auto-runs Claude Opus on E2E failure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 1aa72dcd8a2b

Post "Claude is triaging..." when triage starts and a structured summary with PR/issue links when it completes. The skill now writes triage-summary.json, which the workflow parses with jq for the Slack message. Falls back to a warning if no summary is produced.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 8e5dcc6ef8ab

…ifications

- Build Slack payload via jq (payload-file-path) instead of interpolating raw text into inline JSON, which broke on quotes/newlines in summaries
- Add secrets.E2E_SLACK_WEBHOOK_URL guard to "Build Slack summary" and "Notify Slack - triage complete" steps (matching the "started" step)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 7c1914052967
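The jq-based payload construction this commit describes could look roughly like the sketch below. It is not the workflow's actual step: the function name `build_slack_payload` and the single-field `{"text": ...}` payload shape are assumptions for illustration. The point is that `jq --arg` passes the summary as a string value, so jq handles the escaping of quotes and newlines.

```shell
#!/usr/bin/env bash
set -eu

# build_slack_payload <summary-text> <output-file>
# Hypothetical helper: writes a JSON payload file for a Slack webhook.
# --arg binds the text as a jq string, so quotes and newlines are escaped
# by jq instead of being pasted into hand-written JSON.
build_slack_payload() {
  local summary="$1" out="$2"
  jq -n --arg text "$summary" '{text: $text}' > "$out"
}
```

A step would then call something like `build_slack_payload "$summary" payload.json` and hand `payload.json` to the Slack action through `payload-file-path`.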

cursor · 2026-03-06T20:19:26Z

PR Summary

High Risk
Adds a new workflow_run GitHub Action that runs an LLM with contents/pull-requests/issues write permissions to create PRs and issues based on CI artifacts, which is security- and process-sensitive. Misclassification or prompt-injection via artifact/log content could result in unintended repo changes or noisy issue/PR creation.

Overview
Introduces an automated E2E-failure triage pipeline: a new e2e-triage Claude skill plus an E2E Triage GitHub Action that triggers when E2E Tests fails, downloads the run’s artifacts, classifies failures as flaky vs real bug, and then creates batched PRs for flaky test hardening or files/comments on GitHub issues for real bugs.

Adds scripts/download-e2e-artifacts.sh to fetch and normalize per-agent artifacts into e2e/artifacts/ci-<run-id>/ (including .run-info.json), and updates e2e/README.md with instructions for local artifact download and the new triage workflow (including optional Slack start/complete summaries via triage-summary.json).

Written by Cursor Bugbot for commit 3edf654.

Copilot

Pull request overview

This PR adds automated E2E test failure triage infrastructure. When E2E tests fail in CI, a new workflow_run-triggered workflow invokes Claude Code (Opus) to download artifacts, analyze failures, classify them as flaky (agent non-determinism) or real bugs, and automatically create PRs for flaky fixes or GitHub issues for real bugs. It also includes Slack notifications at each stage.

Changes:

New e2e-triage.yml GitHub Actions workflow that auto-triggers on E2E test failures, runs Claude Code to triage, and sends Slack notifications with structured summaries.
New download-e2e-artifacts.sh script to download and restructure E2E test artifacts from GitHub Actions by run ID, URL, or "latest" failed run.
New .claude/skills/e2e-triage/SKILL.md providing structured instructions for Claude to classify and act on E2E failures, plus README updates documenting the new workflow and artifact download process.
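For orientation, accepting a run ID, a run URL, or "latest" might be handled along these lines. This is a sketch and not the script's code: the function name `resolve_run_id` is hypothetical, and the "latest" branch assumes the workflow is named "E2E Tests" and that `gh` is installed and authenticated.

```shell
#!/usr/bin/env bash
set -eu

# resolve_run_id <run-id | run-url | latest>
# Hypothetical helper: prints a numeric GitHub Actions run ID.
resolve_run_id() {
  local input="$1" rest id
  case "$input" in
    latest)
      # Assumption: the failing workflow is named "E2E Tests".
      gh run list --workflow "E2E Tests" --status failure --limit 1 \
        --json databaseId --jq '.[0].databaseId'
      ;;
    http://*|https://*)
      # Run URLs have the form .../actions/runs/<id>[/...]
      rest="${input#*/actions/runs/}"
      id="${rest%%[!0-9]*}"
      if [ "$rest" = "$input" ] || [ -z "$id" ]; then
        echo "not a run URL: $input" >&2
        return 1
      fi
      printf '%s\n' "$id"
      ;;
    *[!0-9]*|'')
      echo "unrecognized run reference: $input" >&2
      return 1
      ;;
    *)
      printf '%s\n' "$input"
      ;;
  esac
}
```

Normalizing all three input forms to one numeric ID up front keeps the rest of a download script independent of how the run was specified.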

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `.github/workflows/e2e-triage.yml` | New workflow triggered by E2E test failures; downloads artifacts, runs Claude Code triage, sends Slack notifications |
| `scripts/download-e2e-artifacts.sh` | New script to download, restructure, and annotate E2E artifacts from GitHub Actions |
| `.claude/skills/e2e-triage/SKILL.md` | Skill instructions for Claude to analyze failures, classify them, and create PRs/issues |
| `e2e/README.md` | Documents the new triage workflow, skill, and artifact download script |

Copilot · 2026-03-06T20:24:49Z

scripts/download-e2e-artifacts.sh

+  # Move contents up: e2e-artifacts-claude-code/* -> claude-code/
+  if [ -d "$agent" ]; then
+    # Agent dir already exists (shouldn't happen, but be safe)
+    cp -r "$wrapper"/* "$agent"/ 2>/dev/null || true


In the fallback cp -r branch (when $agent dir already exists), the original $wrapper directory is not removed after copying its contents. This means both e2e-artifacts-claude-code/ and claude-code/ would coexist, and the wrapper directory would appear in the agents_found listing on line 87, producing incorrect metadata.

Add rm -rf "$wrapper" after the cp -r to clean up the wrapper directory.

Suggested change

-    cp -r "$wrapper"/* "$agent"/ 2>/dev/null || true
+    cp -r "$wrapper"/* "$agent"/ 2>/dev/null || true
+    rm -rf "$wrapper"
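Put together, the move-or-merge step with the suggested cleanup might look like the following sketch. The function names `flatten_wrapper` and `flatten_all` are hypothetical, and the sketch deviates from the diff in two labeled ways: it copies with `"$wrapper"/.` so dotfiles are included, and it does not suppress `cp` errors.

```shell
#!/usr/bin/env bash
set -eu

# flatten_wrapper <wrapper-dir> <agent-dir>
# Moves e2e-artifacts-<agent>/ to <agent>/. If <agent>/ already exists,
# merges the contents and then removes the wrapper so it cannot be
# mistaken for an agent directory later.
flatten_wrapper() {
  local wrapper="$1" agent="$2"
  if [ -d "$agent" ]; then
    # "/." copies dotfiles too, which a "/*" glob would skip.
    cp -r "$wrapper"/. "$agent"/
    rm -rf "$wrapper"
  else
    mv "$wrapper" "$agent"
  fi
}

# flatten_all <artifact-root>
# Applies flatten_wrapper to every e2e-artifacts-* directory under the root.
flatten_all() {
  local root="$1" wrapper
  for wrapper in "$root"/e2e-artifacts-*; do
    [ -d "$wrapper" ] || continue
    flatten_wrapper "$wrapper" "$root/${wrapper##*/e2e-artifacts-}"
  done
}
```

Either branch leaves exactly one directory per agent, which is the invariant the later `agents_found` listing relies on.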

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


cursor · 2026-03-06T20:25:38Z

scripts/download-e2e-artifacts.sh

+    cp -r "$wrapper"/* "$agent"/ 2>/dev/null || true
+  else
+    mv "$wrapper" "$agent"
+  fi


Stale wrapper directories left after copy branch

Medium Severity

When the cp -r branch is taken (agent directory already exists), the original e2e-artifacts-* wrapper directory is never removed. This leaves stale wrapper directories that get included in agents_found (via ls -d */) and written into .run-info.json. Downstream, the triage skill iterates "each agent subdirectory in the artifact root," so it would scan these stale wrappers as if they were additional agents, potentially causing duplicate failure reports.

Additional Locations (1)

scripts/download-e2e-artifacts.sh#L86-L87
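Independently of the cleanup fix, the listing side could be made defensive against leftover wrappers. The sketch below is an illustration under assumptions, not the script's implementation: `list_agents` is a hypothetical name, and it replaces the `ls -d */` approach with a glob loop that skips any `e2e-artifacts-*` directory and warns about it.

```shell
#!/usr/bin/env bash
set -eu

# list_agents <artifact-root>
# Prints one agent directory name per line. Leftover e2e-artifacts-*
# wrappers are skipped and reported on stderr instead of being listed.
list_agents() {
  local root="$1" dir name
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    name="${dir%/}"
    name="${name##*/}"
    case "$name" in
      e2e-artifacts-*)
        echo "warning: stale wrapper directory: $name" >&2
        ;;
      *)
        printf '%s\n' "$name"
        ;;
    esac
  done
}
```

With this filter, a stale wrapper would surface as a warning rather than as an extra agent in the metadata or a duplicate failure report.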

Rewrite SKILL.md with dual-mode support (auto-detected via WORKFLOW_RUN_ID env var): local mode runs tests with mise and re-runs failures up to 3 times, CI mode triggers e2e-isolated.yml workflows for re-run verification. Classification now uses re-run results as the primary signal (all fail = real-bug, mixed results = flaky).

Workflow changes: actions permission upgraded to write for gh workflow run, timeout increased to 60m for re-run polling, Claude prompt updated with CI mode hint and re-run instructions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 9f75c3effd9b
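The mode detection and re-run classification described in this commit can be sketched as two small functions. The names `detect_mode` and `classify_reruns` and the "pass"/"fail" argument encoding are hypothetical. The sketch also assumes that re-runs which all pass count as flaky (since the original run failed) and that an empty result list is reported as "unknown"; the commit message only states the all-fail and mixed cases.

```shell
#!/usr/bin/env bash
set -eu

# detect_mode: "ci" when WORKFLOW_RUN_ID is set and non-empty, else "local".
detect_mode() {
  if [ -n "${WORKFLOW_RUN_ID:-}" ]; then echo ci; else echo local; fi
}

# classify_reruns <result>...
# Each argument is "pass" or "fail", one per re-run of a failing test.
classify_reruns() {
  local pass=0 fail=0 r
  for r in "$@"; do
    case "$r" in
      pass) pass=$((pass + 1)) ;;
      fail) fail=$((fail + 1)) ;;
      *) echo "unknown result: $r" >&2; return 2 ;;
    esac
  done
  if [ "$pass" -eq 0 ] && [ "$fail" -eq 0 ]; then
    echo "unknown"      # assumption: no re-run data, no verdict
  elif [ "$pass" -eq 0 ]; then
    echo "real-bug"     # every re-run failed
  else
    echo "flaky"        # at least one re-run passed
  fi
}
```

For example, `classify_reruns fail pass fail` reports a flaky test, while `classify_reruns fail fail fail` reports a real bug.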

Local mode now presents findings interactively and applies fixes directly in the working tree instead of creating branches/PRs/issues:
- Step 4a: findings report, proposed fixes, user approval gate, in-place fixes
- Step 4b: unchanged CI behavior (batched PR for flaky, issues for real bugs)
- Step 5: local mode gets simpler summary table, no triage-summary.json

Entire-Checkpoint: 4e1d9cf59d52

Consistent test failures can be test infrastructure bugs (e2e/ code), not product bugs (cmd/entire/cli/). Update classification signals, fix lists, and action sections to distinguish the two.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 28c90fcc7266

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: ba6877944a6c

Replace duplicated artifact-reading steps in e2e-triage Step 1 with a reference to debug-e2e's Debugging Workflow (steps 2-5), keeping the collect list so classification inputs remain clear. Add Related Skills section to README.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: eb14496bde1e

Entire-Checkpoint: bb778fbab533

alishakawaguchi and others added 3 commits March 6, 2026 11:42

alishakawaguchi self-assigned this Mar 6, 2026

Copilot AI review requested due to automatic review settings March 6, 2026 20:19

Copilot started reviewing on behalf of alishakawaguchi March 6, 2026 20:19

Copilot AI reviewed Mar 6, 2026

cursor bot reviewed Mar 6, 2026

alishakawaguchi and others added 8 commits March 6, 2026 12:47

Add README to E2E triage skill

6bdf24d

For testing purposes only

39d1d8f

For testing purposes only

4f54947

Fix allowed tools

270d48d

alishakawaguchi wants to merge 11 commits into main from alisha/e2e-triage

2 participants
