feat: detect and kill stuck agents after 20 minutes of inactivity #173
Open
maswa wants to merge 1 commit into AutoForgeAI:master from
Agents that hang without producing output are now automatically detected and killed after 20 minutes of inactivity. This prevents features from being stuck indefinitely when an agent hangs.

Changes:
- Add AGENT_INACTIVITY_TIMEOUT constant (1200 seconds)
- Track last activity timestamp per agent
- Kill and restart agents with no output for 20+ minutes
- Clean up tracking on agent completion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CoreAspectStu added a commit to CoreAspectStu/autocoder-custom that referenced this pull request on Feb 9, 2026
ISSUE:
------
The production systemd service was starting uvicorn directly without ensuring the React frontend was built. This caused UI changes to not appear until someone manually ran 'npm run build'.

ROOT CAUSE:
-----------
The ExecStart line in autocoder-ui.service bypassed start_ui.py, which contains smart build detection logic:

ExecStart=/home/stu/.../venv/bin/python -m uvicorn server.main:app

SOLUTION:
---------
Created a production wrapper script that:
1. Runs 'npm run build' to compile TypeScript and bundle React app
2. Starts uvicorn server
3. Ensures UI changes are reflected on every service restart

FILES CREATED:
--------------
1. start_ui_production.sh
   - Production launcher for systemd
   - Builds frontend before starting server
   - Reports build status in logs
   - Fails fast if build fails
2. docs/BUILD_PROCESS.md
   - Comprehensive documentation
   - Problem description and solution
   - How build process works
   - Troubleshooting guide
   - Verification steps
3. verify_feature_173.py
   - Automated verification script
   - Tests wrapper script exists and is executable
   - Verifies systemd service configuration
   - Tests TypeScript compilation
   - Confirms dist directory is created

SYSTEMD CHANGES:
----------------
Modified: ~/.config/systemd/user/autocoder-ui.service
ExecStart: /home/stu/projects/autocoder/venv/bin/python ...
→ ExecStart: /home/stu/projects/autocoder/start_ui_production.sh

VERIFICATION:
-------------
All 6/6 checks passed:
✅ Wrapper script exists and is executable
✅ Systemd service uses wrapper script
✅ Wrapper script contains build command
✅ TypeScript strict mode enabled
✅ TypeScript compilation succeeds (7.03s)
✅ dist directory created with assets

IMPACT:
-------
Before: UI changes required manual 'npm run build' → service restart
After: UI changes automatically built on every service start
Build time: ~7 seconds (TypeScript + Vite bundling)
Output: ui/dist/ with optimized assets (~1.2 MB gzipped)

Marked feature AutoForgeAI#173 as PASSING.
What this does
When running in parallel mode, agents sometimes hang indefinitely — usually waiting for an API response that never comes back, or stuck in some internal loop. This leaves features permanently marked as "in progress" and wastes a concurrency slot.
This PR adds a simple inactivity timeout: if an agent produces no output for 20 minutes, it gets killed and the feature is released back to the queue.
How it works
- last_activity timestamp per agent, updated every time stdout produces output
- _check_stuck_agents() method runs each iteration of the main orchestrator loop

Why 20 minutes?
Working agents produce continuous output — tool calls, code generation, thinking blocks. Even complex features that take 1-2 hours always have activity. 20 minutes of complete silence reliably indicates something is wrong.
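The mechanism described under "How it works" can be sketched roughly as below. This is a minimal illustration, not the code from parallel_orchestrator.py: only AGENT_INACTIVITY_TIMEOUT, the last_activity idea, and the name of the check method come from this PR. The StuckAgentTracker class, its hook names, the integer feature IDs, and the injectable clock are assumptions made so the example can run on its own.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

AGENT_INACTIVITY_TIMEOUT = 1200  # seconds (20 minutes), the constant named in the PR


@dataclass
class StuckAgentTracker:
    """Hypothetical helper; the PR keeps this state inside the orchestrator."""

    timeout: float = AGENT_INACTIVITY_TIMEOUT
    clock: Callable[[], float] = time.monotonic  # injectable so tests need not wait
    last_activity: Dict[int, float] = field(default_factory=dict)

    def on_spawn(self, feature_id: int) -> None:
        # Start the inactivity window when the agent is launched.
        self.last_activity[feature_id] = self.clock()

    def on_output(self, feature_id: int) -> None:
        # Refresh the timestamp whenever the agent writes to stdout.
        if feature_id in self.last_activity:
            self.last_activity[feature_id] = self.clock()

    def on_complete(self, feature_id: int) -> None:
        # Clean up tracking once the agent has finished.
        self.last_activity.pop(feature_id, None)

    def check_stuck_agents(self) -> List[int]:
        # Return the agents that have been silent for at least `timeout` seconds.
        now = self.clock()
        return [fid for fid, ts in self.last_activity.items()
                if now - ts >= self.timeout]


# Usage with a fake clock: agent 1 stays silent, agent 2 prints at t=900s.
fake_now = [0.0]
tracker = StuckAgentTracker(clock=lambda: fake_now[0])
tracker.on_spawn(1)
tracker.on_spawn(2)
fake_now[0] = 900.0
tracker.on_output(2)
fake_now[0] = 1200.0
stuck = tracker.check_stuck_agents()  # only agent 1 has reached the timeout
```

In an orchestrator loop, the caller would presumably kill the process behind each returned ID, call on_complete for it, and release the feature back to the queue. A monotonic clock is used here so that wall-clock adjustments cannot trigger a false kill; whether the PR does the same is not stated.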
Changes
- parallel_orchestrator.py: 80 lines added (constant, tracking dict, spawn hooks, check method, cleanup, main loop integration)

Test plan
- Run with --max-concurrency 2 and verify agents complete normally (no false kills)