…script

Automates E2E failure triage with three new components:
- scripts/download-e2e-artifacts.sh: reusable script to download CI artifacts
- .claude/skills/e2e-triage/SKILL.md: 7-step triage skill (classify flaky vs real bug, create PRs or issues)
- .github/workflows/e2e-triage.yml: workflow_run trigger that auto-runs Claude Opus on E2E failure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 1aa72dcd8a2b
Post "Claude is triaging..." when triage starts and a structured summary with PR/issue links when it completes. The skill now writes triage-summary.json which the workflow parses with jq for the Slack message. Falls back to a warning if no summary is produced. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 8e5dcc6ef8ab
…ifications

- Build Slack payload via jq (payload-file-path) instead of interpolating raw text into inline JSON, which broke on quotes/newlines in summaries
- Add secrets.E2E_SLACK_WEBHOOK_URL guard to "Build Slack summary" and "Notify Slack - triage complete" steps (matching the "started" step)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 7c1914052967
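The jq-based payload construction described in this commit can be sketched in shell. This is a minimal illustration rather than the workflow's actual step: the function name `build_slack_payload` and the single `text` field are assumptions; only the idea of letting jq build the JSON comes from the commit message.

```shell
# Hypothetical helper: emit a Slack-style JSON payload on stdout.
# jq's --arg passes the summary in as a string value, so quotes and
# newlines are escaped by jq instead of breaking hand-written JSON.
build_slack_payload() {
  jq -n --arg text "$1" '{text: $text}'
}
```

A caller would typically write the result to a file, for example `build_slack_payload "$summary" > payload.json`, and hand that file to the notification step.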
Rewrite SKILL.md with dual-mode support (auto-detected via WORKFLOW_RUN_ID env var): local mode runs tests with mise and re-runs failures up to 3 times, CI mode triggers e2e-isolated.yml workflows for re-run verification. Classification now uses re-run results as the primary signal (all fail = real-bug, mixed results = flaky). Workflow changes: actions permission upgraded to write for gh workflow run, timeout increased to 60m for re-run polling, Claude prompt updated with CI mode hint and re-run instructions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 9f75c3effd9b
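The re-run classification rule in this commit (all fail = real-bug, mixed results = flaky) can be sketched as a small shell function. The name `classify_reruns`, the `pass`/`fail` argument format, and the `unknown` verdict for zero re-runs are assumptions made for illustration; the skill itself is a prose document, not this code.

```shell
# Hypothetical classifier: arguments are re-run outcomes ("pass" or "fail")
# for a test that already failed once in the original run.
classify_reruns() {
  local outcome
  if [ "$#" -eq 0 ]; then
    echo "unknown"   # assumption: no re-runs means no verdict
    return 0
  fi
  for outcome in "$@"; do
    if [ "$outcome" = "pass" ]; then
      echo "flaky"   # a pass alongside the original failure: mixed results
      return 0
    fi
  done
  echo "real-bug"    # every re-run failed
}
```

For example, `classify_reruns fail pass fail` reports a flaky test, while three `fail` arguments report a real bug.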
Local mode now presents findings interactively and applies fixes directly in the working tree instead of creating branches/PRs/issues:
- Step 4a: findings report, proposed fixes, user approval gate, in-place fixes
- Step 4b: unchanged CI behavior (batched PR for flaky, issues for real bugs)
- Step 5: local mode gets simpler summary table, no triage-summary.json

Entire-Checkpoint: 4e1d9cf59d52
Consistent test failures can be test infrastructure bugs (e2e/ code), not product bugs (cmd/entire/cli/). Update classification signals, fix lists, and action sections to distinguish the two. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 28c90fcc7266
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: ba6877944a6c
Replace duplicated artifact-reading steps in e2e-triage Step 1 with a reference to debug-e2e's Debugging Workflow (steps 2-5), keeping the collect list so classification inputs remain clear. Add Related Skills section to README. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: eb14496bde1e
Entire-Checkpoint: bb778fbab533
…d GitHub issues

- Remove workflow_run trigger from e2e-triage.yml (now dispatch-only)
- Remove issues permission and gh issue commands from CI mode
- Replace real-bug GitHub issues with structured CI log reports
- Add triage link to Slack failure notification in e2e.yml
- Update skill docs and README to reflect new behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 4725d29fd8ff
Remove CI mode (branch creation, PRs, triage-summary.json, CI re-runs via gh workflow run) from the e2e-triage skill while preserving local debugging of CI failures (downloading artifacts, analyzing them, running tests locally). Also removes the e2e-triage.yml workflow and the triage link from the E2E Slack failure notification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: c8ddf0fdb9df
Teach Step L1 to accept CI run references (latest, run ID, run URL) and use scripts/download-e2e-artifacts.sh to fetch artifacts, skipping local re-runs and jumping straight to shared analysis. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: fb0c122d807e
Skip re-downloading when the artifact directory already exists and is non-empty, printing a log message instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 5f75d4661043
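The skip-if-present behavior from this commit might look roughly like the following. It is a sketch under assumptions: `has_artifacts`, `download_once`, and `fetch` are hypothetical names, and `fetch` stands in for whatever command actually performs the download.

```shell
# Hypothetical check: succeed when the directory exists and is non-empty.
has_artifacts() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Hypothetical wrapper: download only when nothing is there yet.
# "fetch" is a placeholder that the caller must define.
download_once() {
  local dest="$1"
  if has_artifacts "$dest"; then
    # Log to stderr so stdout stays free for the script's real output.
    echo "Artifacts already present in $dest, skipping download" >&2
    return 0
  fi
  mkdir -p "$dest"
  fetch "$dest"
}
```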
…ge skill Pre-create ~/.config/cursor/ in Bootstrap() so the cursor CLI doesn't crash with ENOENT when writing cli-config.json after accepting workspace trust on CI. Follows the same pattern used by Claude, Gemini, and Droid agents. Update e2e-triage skill to require running real E2E tests after applying fixes, scoped by change type: agent-specific → that agent's full suite, shared infra → all affected agents, prompt-only → just the affected test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: fa1e4e3fb457
Split the monolithic e2e-triage skill into three focused commands (triage-ci, debug, implement) following the agent-integration plugin pattern. Triage-ci is report-only, implement is action-only, and the /e2e orchestrator runs both sequentially.

- Create .claude/plugins/e2e/ with plugin.json and command wrappers
- Create .claude/skills/e2e/ with orchestrator SKILL.md and 3 procedures
- Delete old .claude/skills/e2e-triage/ and .claude/skills/debug-e2e/
- Update all /debug-e2e references to /e2e:debug in agent-integration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 2b582392c6b6
PR Summary: Low Risk. Written by Cursor Bugbot for commit 3e27584.
Pull request overview
Adds tooling and documentation to improve debugging/triage of flaky E2E runs (especially Cursor), including a helper script for downloading CI artifacts and new .claude skill/plugin docs for a triage→fix workflow.
Changes:
- Add `scripts/download-e2e-artifacts.sh` to fetch and normalize GitHub Actions E2E artifacts locally.
- Update Cursor E2E agent bootstrap to pre-create the Cursor config directory to avoid runtime ENOENT failures.
- Add/update E2E triage/debug/implement skill + plugin documentation and refresh E2E README guidance.
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| scripts/download-e2e-artifacts.sh | New helper script to locate a run (latest/ID/URL), download artifacts, flatten wrapper dirs, and write .run-info.json. |
| e2e/agents/cursor_cli.go | Create Cursor config directory during bootstrap to reduce flaky failures. |
| e2e/README.md | Document local artifact download + new triage workflow references. |
| cmd/entire/cli/strategy/common_test.go | Minor formatting/alignment in tests. |
| .claude/skills/e2e/triage-ci.md | New CI triage procedure doc (download artifacts or rerun locally, classify flaky vs real-bug). |
| .claude/skills/e2e/implement.md | New procedure doc for applying fixes and verifying via scoped E2E runs. |
| .claude/skills/e2e/debug.md | Update/normalize debug procedure doc formatting. |
| .claude/skills/e2e/SKILL.md | Add orchestrator skill definition for the E2E triage→implement pipeline. |
| .claude/skills/agent-integration/*.md | Update references to use /e2e:debug instead of the old command name. |
| .claude/plugins/e2e/** | Add local plugin command wrappers + plugin metadata and README. |
| ;; | ||
| http*) | ||
| # Extract run ID from URL: https://github.com/<owner>/<repo>/actions/runs/<id> | ||
| run_id=$(echo "$input" | grep -oE '/runs/[0-9]+' | grep -oE '[0-9]+') |
With `set -euo pipefail`, this URL parsing pipeline will cause the script to exit immediately if the URL doesn't contain `/runs/<id>` (`grep` returns non-zero), so the subsequent "Could not extract" check never runs. Capture the pipeline's failure explicitly (e.g., via `if ! run_id=...; then die ...; fi` or `|| true`) so invalid URLs produce the intended error message instead of an abrupt exit.
| run_id=$(echo "$input" | grep -oE '/runs/[0-9]+' | grep -oE '[0-9]+') | |
| if ! run_id=$(echo "$input" | grep -oE '/runs/[0-9]+' | grep -oE '[0-9]+'); then | |
| die "Could not extract run ID from URL: $input" | |
| fi |
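An alternative to guarding the `grep` pipeline is to use a tool that does not fail on a non-match. The sketch below assumes `sed` is acceptable in place of `grep`; `extract_run_id` is a hypothetical helper name, not something defined in the script under review.

```shell
# Hypothetical helper: print the numeric run ID found in a GitHub Actions
# URL, or return 1 when the URL has no /runs/<id> segment.
extract_run_id() {
  local id
  # sed -n exits 0 even when nothing matches, so neither errexit nor
  # pipefail can abort the caller here; the empty result is checked below.
  id=$(printf '%s\n' "$1" | sed -n 's|.*/runs/\([0-9][0-9]*\).*|\1|p')
  [ -n "$id" ] || return 1
  printf '%s\n' "$id"
}
```

A caller can then keep the intended error path, for example `run_id=$(extract_run_id "$input") || die "Could not extract run ID from URL: $input"`, where `die` is the script's existing error helper.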
| else | ||
| mkdir -p "$dest" | ||
| log "Downloading artifacts to $dest/ ..." | ||
| gh run download "$run_id" --dir "$dest" 2>&1 >&2 || die "Failed to download artifacts. They may have expired (retention: 7 days)." |
The redirection `2>&1 >&2` does not enforce the contract that only the final absolute path is written to stdout. Redirections are applied left to right: stderr is first pointed at stdout, and stdout is then pointed at that same destination, so both streams end up on stdout. Redirect the command's stdout to stderr explicitly (and keep stderr on stderr) so callers can safely capture stdout.
| gh run download "$run_id" --dir "$dest" 2>&1 >&2 || die "Failed to download artifacts. They may have expired (retention: 7 days)." | |
| gh run download "$run_id" --dir "$dest" 1>&2 || die "Failed to download artifacts. They may have expired (retention: 7 days)." |
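The difference between the two redirections can be checked with a small stand-alone experiment. `emit` is a hypothetical stand-in for `gh run download`; it simply writes one line to each stream.

```shell
# Stand-in for a command that writes to both streams.
emit() { echo "to-stdout"; echo "to-stderr" >&2; }

# Original form: 2>&1 points stderr at stdout, then >&2 points stdout at
# that same destination, so both lines land in the captured output.
leaky=$( { emit 2>&1 >&2; } 2>/dev/null )

# Suggested form: stdout is sent to stderr, so nothing is captured.
clean=$( { emit 1>&2; } 2>/dev/null )

printf 'leaky=[%s]\nclean=[%s]\n' "$leaky" "$clean"
```

The outer `2>/dev/null` only silences the demo's stderr; in the real script stderr would stay visible as log output.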
| # --- Write run metadata --- | ||
| agents_found=$(cd "$dest" && ls -d */ 2>/dev/null | tr -d '/' | tr '\n' ', ' | sed 's/,$//') |
`agents_found` is computed via `ls -d */ | ...` under `set -euo pipefail`. If the download directory contains no subdirectories (e.g., no artifacts were present or the layout changes), the `*/` glob/`ls` will fail and the script will exit before writing `.run-info.json`. Consider using a glob with `nullglob`, `find`, or otherwise handling the empty case so the script fails with a clear message (or records an empty agent list) instead of exiting on `ls`.
| agents_found=$(cd "$dest" && ls -d */ 2>/dev/null | tr -d '/' | tr '\n' ', ' | sed 's/,$//') | |
| agents_found=$(cd "$dest" && find . -mindepth 1 -maxdepth 1 -type d -printf '%f\n' | tr '\n' ',' | sed 's/,$//') | |
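A portable way to handle the empty case is to iterate the glob directly, which avoids both `ls` and `find -printf` (the latter is a GNU extension). This is a sketch, not the script's code: `list_agents` is a hypothetical helper, and the comma-separated output format is assumed from the original pipeline.

```shell
# Hypothetical helper: print a comma-separated list of the immediate
# subdirectories of $1, or an empty line when there are none.
list_agents() {
  local dir agents=""
  for dir in "$1"/*/; do
    # When the glob matches nothing it stays literal and fails this test.
    [ -d "$dir" ] || continue
    agents="${agents:+$agents,}$(basename "$dir")"
  done
  printf '%s\n' "$agents"
}
```

Because the loop never runs a failing command, it is safe under `set -euo pipefail`, and the caller can decide whether an empty list is an error.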
| home, err := os.UserHomeDir() | ||
| if err != nil { | ||
| return fmt.Errorf("get home dir: %w", err) | ||
| } | ||
| dir := filepath.Join(home, ".config", "cursor") |
This hard-codes the Linux-style config path ~/.config/cursor. To keep local E2E runs working across OSes (and respect XDG_CONFIG_HOME), prefer deriving the base config dir with os.UserConfigDir() and then appending cursor.
| home, err := os.UserHomeDir() | |
| if err != nil { | |
| return fmt.Errorf("get home dir: %w", err) | |
| } | |
| dir := filepath.Join(home, ".config", "cursor") | |
| cfgDir, err := os.UserConfigDir() | |
| if err != nil { | |
| return fmt.Errorf("get user config dir: %w", err) | |
| } | |
| dir := filepath.Join(cfgDir, "cursor") |
| Use the `debug-e2e` skill (`.claude/skills/debug-e2e/`) for a structured workflow when investigating failures. | ||
| Use the `e2e-triage` skill (`.claude/skills/e2e-triage/`) to automate full triage: download CI artifacts, classify failures as flaky vs real bug, and create PRs or GitHub issues. Run locally with `/e2e-triage` or see the automated CI workflow below. |
This references an e2e-triage skill at .claude/skills/e2e-triage/ and a /e2e-triage command, but this PR adds the E2E skills under .claude/skills/e2e/ and the plugin commands as /e2e:triage-ci (or the orchestrator /e2e). Update the paths/command names here so the README matches what actually exists in the repo.
| Use the `debug-e2e` skill (`.claude/skills/debug-e2e/`) for a structured workflow when investigating failures. | |
| Use the `e2e-triage` skill (`.claude/skills/e2e-triage/`) to automate full triage: download CI artifacts, classify failures as flaky vs real bug, and create PRs or GitHub issues. Run locally with `/e2e-triage` or see the automated CI workflow below. | |
| Use the E2E debug workflow in the `e2e` skill (`.claude/skills/e2e/`) for a structured workflow when investigating failures. | |
| Use the CI triage workflow in the same `e2e` skill to automate full triage: download CI artifacts, classify failures as flaky vs real bug, and create PRs or GitHub issues. Run locally with the `/e2e:triage-ci` plugin command (or invoke the orchestrator with `/e2e`), or see the automated CI workflow below. |
| - **`.github/workflows/e2e.yml`** — Runs full suite on push to main. Matrix: `[claude, opencode, gemini]`. | ||
| - **`.github/workflows/e2e-isolated.yml`** — Manual dispatch for debugging a single test. Inputs: agent + test name filter. | ||
| - **`.github/workflows/e2e-triage.yml`** — Auto-triggers on E2E failure via `workflow_run`. Runs Claude Code (Opus) to download artifacts, classify failures, and create PRs (flaky) or issues (real bugs). |
The README lists .github/workflows/e2e-triage.yml, but there’s no such workflow file in .github/workflows/ in this branch. Either add the workflow in this PR or remove/adjust this bullet so the documentation doesn’t point at a non-existent file.
| - **`.github/workflows/e2e-triage.yml`** — Auto-triggers on E2E failure via `workflow_run`. Runs Claude Code (Opus) to download artifacts, classify failures, and create PRs (flaky) or issues (real bugs). |
Cursor Bugbot has reviewed your changes and found 1 potential issue.
| cp -r "$wrapper"/* "$agent"/ 2>/dev/null || true | ||
| else | ||
| mv "$wrapper" "$agent" | ||
| fi |
Wrapper directory not removed after copy operation
Low Severity
In the restructuring loop, when the $agent directory already exists, cp -r copies from the e2e-artifacts-* wrapper into it but never removes the wrapper directory. The stale e2e-artifacts-* directory then gets picked up by the ls -d */ on line 92, polluting agents_found in .run-info.json with entries like e2e-artifacts-claude-code alongside claude-code.
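One possible shape for the fix is to remove the wrapper once its contents have been merged. The sketch below is an assumption about how that could look: `merge_wrapper` is a hypothetical helper, and it copies with `"$wrapper"/.` rather than `"$wrapper"/*` so that dotfiles are included and an empty wrapper does not produce a glob error.

```shell
# Hypothetical helper: fold an e2e-artifacts-* wrapper directory into the
# agent directory, leaving no wrapper behind in either branch.
merge_wrapper() {
  local wrapper="$1" agent="$2"
  if [ -d "$agent" ]; then
    # Copy the wrapper's contents into the existing agent directory...
    cp -R "$wrapper"/. "$agent"/
    # ...then delete the wrapper so it is not listed as an agent later.
    rm -rf "$wrapper"
  else
    mv "$wrapper" "$agent"
  fi
}
```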
Cursor's atomic config write (cli-config.json.tmp → cli-config.json)
races when parallel tests trigger "Workspace Trust Required"
simultaneously. Pre-seeding the file with {} in Bootstrap() avoids
the temp-file rename path entirely.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 47c9b5f45145
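The pre-seeding idea can be illustrated in shell, although the actual change lives in the Go `Bootstrap()` function. `seed_cursor_config` is a hypothetical name, and the rule of never overwriting an existing file is an assumption added for the sketch, not something the commit message states.

```shell
# Hypothetical shell analogue of the Bootstrap() change: make sure the
# config directory and an empty JSON config exist before any test starts.
seed_cursor_config() {
  local dir="$1"   # e.g. "$HOME/.config/cursor"
  mkdir -p "$dir"
  # Assumption: never overwrite an existing config, only fill the gap.
  [ -e "$dir/cli-config.json" ] || printf '{}\n' > "$dir/cli-config.json"
}
```

Running this once before parallel tests start means every test finds `cli-config.json` already in place.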


No description provided.