[Feature] Add check health in FD #5534
Thanks for your contribution!
Pull request overview
This PR adds health monitoring functionality for the token processor to detect and prevent hang situations. The implementation introduces a health check command that can be called externally to verify if the token processor is operating correctly.
Key changes:
- Added `healthy()` method to check token processor health status based on timestamp tracking
- Implemented timestamp monitoring before and after batch processing
- Added `check_health` command handler in the internal adapter
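The timestamp-based check described above can be illustrated with a minimal sketch. This is not the code from the PR: the class name, attribute names, and the injectable `clock` parameter are assumptions made for illustration. Only the idea of recording timestamps before and after batch processing and exposing a `healthy()` method comes from the overview.

```python
import time


class TokenProcessorHealthSketch:
    """Hypothetical stand-in for the token processor's health tracking."""

    def __init__(self, timeout_s=120.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock                # injectable so tests can fake time
        self._batch_started_at = None      # set just before a batch is processed
        self._batch_finished_at = None     # set once the batch returns

    def process_batch(self, batch, handler):
        """Run handler(batch), recording timestamps around the call."""
        self._batch_started_at = self._clock()
        self._batch_finished_at = None
        try:
            return handler(batch)
        finally:
            # Reached on return or exception; a true hang never gets here.
            self._batch_finished_at = self._clock()

    def healthy(self):
        """Return False only if a batch has been in flight longer than the timeout."""
        if self._batch_started_at is None or self._batch_finished_at is not None:
            return True  # idle, or the last batch completed
        return (self._clock() - self._batch_started_at) <= self.timeout_s
```

An external caller would poll `healthy()` periodically; a `False` result means the current batch started more than `timeout_s` seconds ago and has not finished.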
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| fastdeploy/envs.py | Adds FD_TOKEN_PROCESSOR_HEALTH_TIMEOUT configuration with 120-second default |
| fastdeploy/output/token_processor.py | Implements health monitoring with timestamps and healthy() method to detect hung states |
| fastdeploy/splitwise/internal_adapter_utils.py | Adds check_health command handler that invokes the token processor's healthy() method |
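To make the relationship between the three files concrete, here is a rough sketch of how a timeout setting and a `check_health` command might fit together. The environment variable name and its 120-second default come from the table above. The function names, the float parsing, and the response dictionary are hypothetical and may differ from what the PR actually implements.

```python
import os


def read_health_timeout(environ=os.environ):
    """Return the health timeout in seconds, defaulting to 120 when unset."""
    # Parsing as float is an assumption; the PR may use a different type.
    return float(environ.get("FD_TOKEN_PROCESSOR_HEALTH_TIMEOUT", "120"))


def handle_command(command, token_processor):
    """Hypothetical dispatcher that answers a check_health command."""
    if command == "check_health":
        # Delegate to the token processor's healthy() method.
        return {"command": command, "healthy": bool(token_processor.healthy())}
    raise ValueError(f"unsupported command: {command}")
```

Any object exposing a `healthy()` method can be passed as `token_processor`, which keeps the handler easy to exercise with a stub in tests.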
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #5534 +/- ##
==========================================
Coverage ? 60.73%
==========================================
Files ? 329
Lines ? 41161
Branches ? 6274
==========================================
Hits ? 24998
Misses ? 14273
Partials ? 1890
Motivation
To guard against the token processor hanging indefinitely, add a health-check (liveness probe) interface that can detect this condition.
Modifications
Add a new service-level health-check item: the token processor component.
Usage or Command
None
Accuracy Tests
None
Checklist
- Tag list: `[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`
- Run `pre-commit` before commit.
- If the PR targets the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.