WIP - E2E triage #647

Draft
alishakawaguchi wants to merge 11 commits into main from alisha/e2e-triage

@alishakawaguchi
Contributor

No description provided.

alishakawaguchi and others added 3 commits March 6, 2026 11:42
…script

Automates E2E failure triage with three new components:
- scripts/download-e2e-artifacts.sh: reusable script to download CI artifacts
- .claude/skills/e2e-triage/SKILL.md: 7-step triage skill (classify flaky vs real bug, create PRs or issues)
- .github/workflows/e2e-triage.yml: workflow_run trigger that auto-runs Claude Opus on E2E failure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 1aa72dcd8a2b

Post "Claude is triaging..." when triage starts and a structured
summary with PR/issue links when it completes. The skill now writes
triage-summary.json which the workflow parses with jq for the Slack
message. Falls back to a warning if no summary is produced.
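
The fallback described above could look roughly like the following sketch. The function name `build_summary_text` and the field names `flaky` and `real_bugs` are hypothetical; the actual schema of triage-summary.json is not shown in this conversation.

```shell
#!/usr/bin/env bash
# Sketch only: the field names (.flaky, .real_bugs) are assumed, not taken from the PR.

# Print a one-line summary, or a warning if the summary file is missing,
# empty, or not a JSON object.
build_summary_text() {
  local summary_file="$1"
  if [ -s "$summary_file" ] && jq -e 'type == "object"' "$summary_file" >/dev/null 2>&1; then
    jq -r '"Flaky: \(.flaky // 0), real bugs: \(.real_bugs // 0)"' "$summary_file"
  else
    echo "Warning: triage finished but produced no summary."
  fi
}
```

Calling `build_summary_text triage-summary.json` prints either the summary line or the warning, so the Slack step always has some text to send.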

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 8e5dcc6ef8ab

…ifications

- Build Slack payload via jq (payload-file-path) instead of interpolating
  raw text into inline JSON, which broke on quotes/newlines in summaries
- Add secrets.E2E_SLACK_WEBHOOK_URL guard to "Build Slack summary" and
  "Notify Slack - triage complete" steps (matching the "started" step)
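
A minimal sketch of the jq-built payload, assuming a single `text` field and a hypothetical helper name `write_slack_payload` (the workflow's real payload shape is not shown here). Passing the summary through `--arg` lets jq do the JSON escaping, which is what avoids the breakage on quotes and newlines.

```shell
#!/usr/bin/env bash
# Sketch only: the helper name and the single "text" field are assumptions.

# Write a Slack webhook payload to $2; jq escapes quotes and newlines in $1.
write_slack_payload() {
  local text="$1" out="$2"
  jq -n --arg text "$text" '{text: $text}' > "$out"
}
```

The resulting file is what a `payload-file-path` input would point at, instead of JSON assembled by string interpolation.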

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 7c1914052967
alishakawaguchi self-assigned this Mar 6, 2026
Copilot AI review requested due to automatic review settings March 6, 2026 20:19

cursor bot commented Mar 6, 2026

PR Summary

High Risk
Adds a new workflow_run GitHub Action that runs an LLM with contents/pull-requests/issues write permissions to create PRs and issues based on CI artifacts, which is security- and process-sensitive. Misclassification or prompt-injection via artifact/log content could result in unintended repo changes or noisy issue/PR creation.

Overview
Introduces an automated E2E-failure triage pipeline: a new e2e-triage Claude skill plus an E2E Triage GitHub Action that triggers when the E2E Tests workflow fails, downloads the run’s artifacts, classifies failures as flaky vs. real bug, and then creates batched PRs for flaky-test hardening or files (or comments on) GitHub issues for real bugs.

Adds scripts/download-e2e-artifacts.sh to fetch and normalize per-agent artifacts into e2e/artifacts/ci-<run-id>/ (including .run-info.json), and updates e2e/README.md with instructions for local artifact download and the new triage workflow (including optional Slack start/complete summaries via triage-summary.json).

Written by Cursor Bugbot for commit 3edf654.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds automated E2E test failure triage infrastructure. When E2E tests fail in CI, a new workflow_run-triggered workflow invokes Claude Code (Opus) to download artifacts, analyze failures, classify them as flaky (agent non-determinism) or real bugs, and automatically create PRs for flaky fixes or GitHub issues for real bugs. It also includes Slack notifications at each stage.

Changes:

  • New e2e-triage.yml GitHub Actions workflow that auto-triggers on E2E test failures, runs Claude Code to triage, and sends Slack notifications with structured summaries.
  • New download-e2e-artifacts.sh script to download and restructure E2E test artifacts from GitHub Actions by run ID, URL, or "latest" failed run.
  • New .claude/skills/e2e-triage/SKILL.md providing structured instructions for Claude to classify and act on E2E failures, plus README updates documenting the new workflow and artifact download process.
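
The three input forms accepted by the download script (run ID, URL, or "latest") could be resolved along these lines. This is a sketch under stated assumptions: `resolve_run_id` is a hypothetical name, and passing "E2E Tests" as the workflow name to `gh run list` is a guess based on the workflow name mentioned in this PR, not the script's actual code.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_run_id and the "E2E Tests" workflow name are assumptions.

# Accept a numeric run ID, a GitHub Actions run URL, or the word "latest",
# and print the numeric run ID.
resolve_run_id() {
  local input="$1" id
  case "$input" in
    latest)
      # Most recent failed run (requires an authenticated gh CLI).
      gh run list --workflow "E2E Tests" --status failure --limit 1 \
        --json databaseId --jq '.[0].databaseId'
      ;;
    */actions/runs/*)
      id="${input##*/actions/runs/}"   # drop everything through "/actions/runs/"
      id="${id%%[!0-9]*}"              # keep only the leading digits
      [ -n "$id" ] || return 1
      echo "$id"
      ;;
    ''|*[!0-9]*)
      echo "unrecognized run reference: $input" >&2
      return 1
      ;;
    *)
      echo "$input"
      ;;
  esac
}
```

Normalizing all three forms to a run ID up front keeps the rest of the download logic independent of how the run was specified.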

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| .github/workflows/e2e-triage.yml | New workflow triggered by E2E test failures; downloads artifacts, runs Claude Code triage, sends Slack notifications |
| scripts/download-e2e-artifacts.sh | New script to download, restructure, and annotate E2E artifacts from GitHub Actions |
| .claude/skills/e2e-triage/SKILL.md | Skill instructions for Claude to analyze failures, classify them, and create PRs/issues |
| e2e/README.md | Documents the new triage workflow, skill, and artifact download script |

# Move contents up: e2e-artifacts-claude-code/* -> claude-code/
if [ -d "$agent" ]; then
    # Agent dir already exists (shouldn't happen, but be safe)
    cp -r "$wrapper"/* "$agent"/ 2>/dev/null || true

Copilot AI Mar 6, 2026

In the fallback cp -r branch (when $agent dir already exists), the original $wrapper directory is not removed after copying its contents. This means both e2e-artifacts-claude-code/ and claude-code/ would coexist, and the wrapper directory would appear in the agents_found listing on line 87, producing incorrect metadata.

Add rm -rf "$wrapper" after the cp -r to clean up the wrapper directory.

Suggested change
- cp -r "$wrapper"/* "$agent"/ 2>/dev/null || true
+ cp -r "$wrapper"/* "$agent"/ 2>/dev/null || true
+ rm -rf "$wrapper"


cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

    cp -r "$wrapper"/* "$agent"/ 2>/dev/null || true
else
    mv "$wrapper" "$agent"
fi

Stale wrapper directories left after copy branch

Medium Severity

When the cp -r branch is taken (agent directory already exists), the original e2e-artifacts-* wrapper directory is never removed. This leaves stale wrapper directories that get included in agents_found (via ls -d */) and written into .run-info.json. Downstream, the triage skill iterates "each agent subdirectory in the artifact root," so it would scan these stale wrappers as if they were additional agents, potentially causing duplicate failure reports.
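
To make the failure mode concrete, here is a sketch of a flattening step that cleans up after itself in both branches. `flatten_wrappers` is a hypothetical helper written for illustration, not the script's actual code, and it uses `cp -r "$wrapper"/.` rather than the original `"$wrapper"/*` so that dotfiles are copied and an empty wrapper does not produce an error.

```shell
#!/usr/bin/env bash
# Sketch only: flatten_wrappers is a hypothetical helper, not the script's code.

# Rename e2e-artifacts-<agent>/ to <agent>/ inside $1. If <agent>/ already
# exists, merge into it and delete the wrapper so that a later directory
# listing does not report the wrapper as an extra agent.
flatten_wrappers() {
  local root="$1" wrapper agent
  for wrapper in "$root"/e2e-artifacts-*/; do
    [ -d "$wrapper" ] || continue          # glob matched nothing
    wrapper="${wrapper%/}"
    agent="$root/${wrapper##*/e2e-artifacts-}"
    if [ -d "$agent" ]; then
      cp -r "$wrapper"/. "$agent"/
      rm -rf "$wrapper"
    else
      mv "$wrapper" "$agent"
    fi
  done
}
```

After `flatten_wrappers "$dir"`, no `e2e-artifacts-*` directory should remain under `$dir`, whichever branch was taken.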

alishakawaguchi and others added 8 commits March 6, 2026 12:47
Rewrite SKILL.md with dual-mode support (auto-detected via WORKFLOW_RUN_ID
env var): local mode runs tests with mise and re-runs failures up to 3
times, CI mode triggers e2e-isolated.yml workflows for re-run verification.
Classification now uses re-run results as the primary signal (all fail =
real-bug, mixed results = flaky).
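
The mode detection and the re-run rule above can be sketched as follows. The function names and the "pass"/"fail" encoding of re-run results are assumptions for illustration; treating an all-pass set of re-runs as flaky (because the original run failed) is also an assumption, since the commit message only names the all-fail and mixed cases.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the pass/fail result encoding are assumptions.

# CI mode when WORKFLOW_RUN_ID is set and non-empty, local mode otherwise.
detect_mode() {
  if [ -n "${WORKFLOW_RUN_ID:-}" ]; then echo "ci"; else echo "local"; fi
}

# Classify one test from its re-run results, each given as "pass" or "fail".
# All re-runs fail -> real-bug; at least one pass -> flaky (assumed to cover
# the all-pass case too); no results -> unknown.
classify_reruns() {
  local passes=0 fails=0 result
  for result in "$@"; do
    case "$result" in
      pass) passes=$((passes + 1)) ;;
      fail) fails=$((fails + 1)) ;;
    esac
  done
  if [ $((passes + fails)) -eq 0 ]; then
    echo "unknown"
  elif [ "$passes" -eq 0 ]; then
    echo "real-bug"
  else
    echo "flaky"
  fi
}
```

For example, `classify_reruns fail pass fail` reports a flaky test, while three failing re-runs report a real bug.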

Workflow changes: actions permission upgraded to write for gh workflow run,
timeout increased to 60m for re-run polling, Claude prompt updated with
CI mode hint and re-run instructions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 9f75c3effd9b

Local mode now presents findings interactively and applies fixes
directly in the working tree instead of creating branches/PRs/issues:
- Step 4a: findings report, proposed fixes, user approval gate, in-place fixes
- Step 4b: unchanged CI behavior (batched PR for flaky, issues for real bugs)
- Step 5: local mode gets simpler summary table, no triage-summary.json

Entire-Checkpoint: 4e1d9cf59d52

Consistent test failures can be test infrastructure bugs (e2e/ code),
not product bugs (cmd/entire/cli/). Update classification signals,
fix lists, and action sections to distinguish the two.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 28c90fcc7266

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: ba6877944a6c

Replace duplicated artifact-reading steps in e2e-triage Step 1 with a
reference to debug-e2e's Debugging Workflow (steps 2-5), keeping the
collect list so classification inputs remain clear. Add Related Skills
section to README.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: eb14496bde1e

Entire-Checkpoint: bb778fbab533