
feat(reliability): add retry governor controls and telemetry#40

Open
ndycode wants to merge 1 commit into main from transform/stage-01-reliability


@ndycode (Owner) commented Mar 4, 2026

Summary

  • add a pure retry governor decision module for all-rate-limited retry behavior
  • add `retryAllAccountsAbsoluteCeilingMs` + `CODEX_AUTH_RETRY_ALL_ABSOLUTE_CEILING_MS` and wire it into the request loop
  • expose retry ceiling in Settings Hub (Rotation & Quota)
  • add structured `codex-metrics` counters for retry governor stop reasons
  • update docs and tests for config/schema/settings parity
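The governor described in the summary can be sketched as a pure function of its inputs. This is a minimal sketch under stated assumptions: `decideRetry`, `RetryGovernorInput`, the stop-reason strings, and the rule that 0 disables the per-wait cap or the ceiling are hypothetical, not the actual exports of lib/request/retry-governor.ts. It shows the three stop reasons and fail-closed handling of NaN, Infinity, and negative inputs.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the real
// exports of lib/request/retry-governor.ts.
type RetryStopReason = "wait-exceeds-max" | "retry-limit" | "absolute-ceiling";

interface RetryGovernorInput {
  waitMs: number; // proposed base wait before the next retry
  maxWaitMs: number; // per-retry wait cap; 0 disables the cap (assumption)
  attempt: number; // retries already performed in this request
  maxRetries: number; // retry budget for the request
  accumulatedWaitMs: number; // base wait already spent in this request
  absoluteCeilingMs: number; // total wait budget; 0 disables it (assumption)
}

type RetryDecision =
  | { retry: true; waitMs: number }
  | { retry: false; reason: RetryStopReason };

// Map NaN, +/-Infinity and negative values to a fallback so the decision
// never compares against a nonsensical number.
function sanitize(value: number, fallback: number): number {
  return Number.isFinite(value) && value >= 0 ? value : fallback;
}

function decideRetry(input: RetryGovernorInput): RetryDecision {
  const attempt = sanitize(input.attempt, Number.POSITIVE_INFINITY); // fail closed
  const maxRetries = sanitize(input.maxRetries, 0);
  const waitMs = sanitize(input.waitMs, 0);
  const maxWaitMs = sanitize(input.maxWaitMs, 0);
  const accumulatedMs = sanitize(input.accumulatedWaitMs, Number.POSITIVE_INFINITY); // fail closed
  const ceilingMs = sanitize(input.absoluteCeilingMs, 0);

  if (attempt >= maxRetries) return { retry: false, reason: "retry-limit" };
  if (maxWaitMs > 0 && waitMs > maxWaitMs) {
    return { retry: false, reason: "wait-exceeds-max" };
  }
  if (ceilingMs > 0 && accumulatedMs + waitMs > ceilingMs) {
    return { retry: false, reason: "absolute-ceiling" };
  }
  return { retry: true, waitMs };
}
```

Because the function takes plain numbers and returns a tagged result, each stop reason can be unit-tested without timers or network mocks, which is the separation the summary describes.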

Validation

  • npm run typecheck
  • npm run lint
  • npm run build
  • npm test
  • npm run clean:repo:check
  • npm run audit:ci

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests and Windows concurrency/token-redaction coverage.

Greptile Summary

adds a pure retry governor module that centralizes all-rate-limited retry decisions, plus a new absoluteCeilingMs config field and env var. separating the decision logic from the request loop lets it be unit-tested in isolation. three new telemetry counters track governor stop reasons (wait > max, retry limit, absolute ceiling).

key changes:

  • lib/request/retry-governor.ts - pure decision function with comprehensive edge case handling (NaN, Infinity, negative values)
  • index.ts - integrated governor, tracks accumulated wait across retries, logs stop reasons
  • lib/config.ts + lib/schemas.ts - new retryAllAccountsAbsoluteCeilingMs field with env override CODEX_AUTH_RETRY_ALL_ABSOLUTE_CEILING_MS
  • settings hub exposes the ceiling control in the Rotation & Quota category (0 to 24h range)
  • comprehensive test coverage: unit tests for governor logic + integration test verifying ceiling enforcement + telemetry
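A sketch of env-over-config resolution for the new field, assuming the env var takes precedence, an unparseable env value falls back to config, negative values clamp to 0, and the default is 0. Only the function, field, and env var names come from this PR; the two-argument signature, the `PluginConfig` shape, and the default are assumptions.

```typescript
// Hypothetical sketch of env-over-config resolution; the real signature of
// getRetryAllAccountsAbsoluteCeilingMs in lib/config.ts may differ.
const CEILING_ENV_KEY = "CODEX_AUTH_RETRY_ALL_ABSOLUTE_CEILING_MS";
const DEFAULT_CEILING_MS = 0; // assumption: 0 means "no absolute ceiling"

interface PluginConfig {
  retryAllAccountsAbsoluteCeilingMs?: number;
}

type Env = Record<string, string | undefined>;

// Parse an env value into a non-negative integer, or undefined if unusable.
function parseNonNegativeInt(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) ? Math.max(0, Math.floor(n)) : undefined;
}

function getRetryAllAccountsAbsoluteCeilingMs(config: PluginConfig, env: Env): number {
  // 1. The environment override wins when it parses cleanly.
  const fromEnv = parseNonNegativeInt(env[CEILING_ENV_KEY]);
  if (fromEnv !== undefined) return fromEnv;
  // 2. Otherwise use the config file value, clamped at 0 (min constraint).
  const fromConfig = config.retryAllAccountsAbsoluteCeilingMs;
  if (typeof fromConfig === "number" && Number.isFinite(fromConfig)) {
    return Math.max(0, Math.floor(fromConfig));
  }
  // 3. Fall back to the default.
  return DEFAULT_CEILING_MS;
}
```

In Node this would be called as `getRetryAllAccountsAbsoluteCeilingMs(config, process.env)`; passing the environment explicitly keeps the resolver pure and easy to test.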

notes:

  • accumulated wait tracking sums the base waitMs, but actual sleeps include ±20% jitter, so total real wait can exceed the ceiling by up to ~20% (noted in a code comment)
  • no Windows filesystem or token-safety concerns: the change is pure timing logic
  • no concurrency issues: accumulatedAllRateLimitedWaitMs is local to the request-loop scope
  • all tests passing, docs updated, config/schema/settings parity maintained
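To make the jitter note concrete: if the ceiling check sums base waits while each sleep is scaled by a factor in [0.8, 1.2), the real total can overshoot by up to about 20%. The sketch assumes uniform jitter and an injectable `rand` for determinism; `jitteredMs` and `simulateBaseTracking` are hypothetical helpers, not code from index.ts.

```typescript
// Hypothetical illustration: the ceiling check sees base waits, but each
// actual sleep is base * [0.8, 1.2).
const JITTER_RATIO = 0.2;

function jitteredMs(baseMs: number, rand: () => number): number {
  // rand() in [0, 1) maps to a multiplier in [1 - 0.2, 1 + 0.2)
  return Math.round(baseMs * (1 - JITTER_RATIO + rand() * 2 * JITTER_RATIO));
}

function simulateBaseTracking(
  baseWaitsMs: number[],
  ceilingMs: number,
  rand: () => number,
): { trackedMs: number; actualMs: number } {
  let trackedMs = 0; // what the ceiling check sees
  let actualMs = 0; // what the process actually sleeps
  for (const baseMs of baseWaitsMs) {
    if (trackedMs + baseMs > ceilingMs) break; // check uses the base value only
    trackedMs += baseMs;
    actualMs += jitteredMs(baseMs, rand);
  }
  return { trackedMs, actualMs };
}

// Three 20 s waits against a 60 s ceiling with near-maximal jitter.
const result = simulateBaseTracking([20000, 20000, 20000], 60000, () => 0.999);
console.log(result); // { trackedMs: 60000, actualMs: 71976 }
```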

Confidence Score: 4/5

  • safe to merge - well-tested retry logic with minor jitter tracking discrepancy
  • pure decision logic with comprehensive unit + integration tests, clean separation of concerns, and thorough edge case handling. minor style issue: accumulated wait tracks base values while actual sleeps include ±20% jitter, allowing real wait to slightly exceed ceiling. all validation passing, docs complete, no concurrency or safety risks.
  • index.ts around line 2417 - consider whether jitter should be included in accumulated wait tracking
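One way to act on this review note, sketched under assumptions: draw the jitter before the ceiling check and accumulate the jittered delay, so the ceiling bounds actual sleep time. `planJitteredWaits` is hypothetical and stands in for the loop in index.ts; in real code the jittered value would presumably be what gets passed to the governor as the proposed wait.

```typescript
// Hypothetical alternative: jitter first, then gate and accumulate the value
// actually slept, so the absolute ceiling bounds real wall-clock wait.
function planJitteredWaits(
  baseWaitsMs: number[],
  ceilingMs: number,
  rand: () => number,
): number[] {
  const planned: number[] = [];
  let accumulatedMs = 0;
  for (const baseMs of baseWaitsMs) {
    const delayMs = Math.round(baseMs * (0.8 + rand() * 0.4)); // ±20% jitter
    if (accumulatedMs + delayMs > ceilingMs) break; // would breach the ceiling
    accumulatedMs += delayMs;
    planned.push(delayMs);
  }
  return planned;
}

const delays = planJitteredWaits([20000, 20000, 20000], 60000, () => 0.999);
console.log(delays); // [ 23992, 23992 ]: the third retry is stopped by the ceiling
```

The trade-off is that high jitter draws end the retry sequence earlier than base-value tracking would; the PR as reviewed keeps base-value tracking and documents the overshoot instead.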

Important Files Changed

| Filename | Overview |
| --- | --- |
| lib/request/retry-governor.ts | new pure decision module for retry-all-rate-limited logic with comprehensive edge case handling |
| index.ts | integrated retry governor, added telemetry counters and accumulated wait tracking; minor jitter accounting issue |
| lib/config.ts | added getRetryAllAccountsAbsoluteCeilingMs with env override and min constraint |
| lib/codex-manager/settings-hub.ts | exposed retry ceiling in Rotation & Quota category with proper bounds and formatting |
| test/retry-governor.test.ts | comprehensive unit tests covering all stop reasons and edge cases |
| test/index-retry.test.ts | added integration test verifying absolute ceiling enforcement and telemetry |
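The Settings Hub bounds (0 to 24h) and a human-readable display could look roughly like this. `clampCeilingMs`, `formatCeiling`, and showing 0 as "off" are assumptions for illustration, not the actual helpers in lib/codex-manager/settings-hub.ts.

```typescript
// Hypothetical helpers; the real settings-hub.ts may bound and format differently.
const MAX_CEILING_MS = 24 * 60 * 60 * 1000; // 24h = 86,400,000 ms

// Clamp user input to the 0..24h range; non-finite input falls back to 0.
function clampCeilingMs(value: number): number {
  if (!Number.isFinite(value)) return 0;
  return Math.min(MAX_CEILING_MS, Math.max(0, Math.round(value)));
}

// Render milliseconds as "1h 2m 3s"; 0 is shown as "off" (assumption: 0 disables the ceiling).
function formatCeiling(ms: number): string {
  if (ms === 0) return "off";
  const totalSeconds = Math.floor(ms / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const parts: string[] = [];
  if (hours > 0) parts.push(`${hours}h`);
  if (minutes > 0) parts.push(`${minutes}m`);
  if (seconds > 0 || parts.length === 0) parts.push(`${seconds}s`);
  return parts.join(" ");
}

console.log(formatCeiling(clampCeilingMs(90000))); // 1m 30s
```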


Last reviewed commit: d7a8a4d

Adds a pure retry governor for all-rate-limited flows, introduces an absolute wait ceiling setting with env override, and wires decision-based retry gating into the request loop.

Also exposes retry ceiling in Settings Hub (Rotation & Quota), and adds structured codex-metrics counters for retry governor stop reasons.
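The structured stop-reason counters can be sketched as a small map keyed by reason. The `RetryGovernorMetrics` class, the reason strings, and the snapshot shape are hypothetical; the PR text does not show the `codex-metrics` payload format.

```typescript
// Hypothetical sketch: the real codex-metrics counter names and shape may differ.
type RetryStopReason = "wait-exceeds-max" | "retry-limit" | "absolute-ceiling";

const STOP_REASONS: readonly RetryStopReason[] = [
  "wait-exceeds-max",
  "retry-limit",
  "absolute-ceiling",
];

class RetryGovernorMetrics {
  private readonly counts = new Map<RetryStopReason, number>();

  // Increment the counter for one governor stop.
  record(reason: RetryStopReason): void {
    this.counts.set(reason, (this.counts.get(reason) ?? 0) + 1);
  }

  // Return every reason, including ones never hit, so consumers see explicit zeros.
  snapshot(): Record<RetryStopReason, number> {
    const out = {} as Record<RetryStopReason, number>;
    for (const reason of STOP_REASONS) out[reason] = this.counts.get(reason) ?? 0;
    return out;
  }
}

const metrics = new RetryGovernorMetrics();
metrics.record("absolute-ceiling");
metrics.record("absolute-ceiling");
metrics.record("retry-limit");
console.log(JSON.stringify(metrics.snapshot()));
// {"wait-exceeds-max":0,"retry-limit":1,"absolute-ceiling":2}
```

A caller would `record(reason)` whenever the governor returns a stop decision and emit `snapshot()` alongside the rest of the metrics output.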

Validation:
- npm run typecheck
- npm run lint
- npm run build
- npm test
- npm run clean:repo:check
- npm run audit:ci

Co-authored-by: Codex <noreply@openai.com>

@coderabbitai (bot) commented Mar 4, 2026

Warning

Rate limit exceeded

@ndycode has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 15 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a7fa01f5-4bd0-443b-bbcb-6e4e1127a2fd

📥 Commits

Reviewing files that changed from the base of the PR and between d36b04f and d7a8a4d.

📒 Files selected for processing (14)
  • docs/development/CONFIG_FIELDS.md
  • docs/reference/settings.md
  • index.ts
  • lib/codex-manager/settings-hub.ts
  • lib/config.ts
  • lib/request/retry-governor.ts
  • lib/schemas.ts
  • test/codex-manager-cli.test.ts
  • test/index-retry.test.ts
  • test/index.test.ts
  • test/plugin-config.test.ts
  • test/retry-governor.test.ts
  • test/schemas.test.ts
  • test/settings-hub-utils.test.ts

@ndycode (Owner, Author) commented Mar 5, 2026

@coderabbitai review

Re-triggering review to clear the transient rate-limit status check failure.

@coderabbitai (bot) commented Mar 5, 2026

@ndycode: re-triggering the review now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@ndycode (Owner, Author) commented Mar 5, 2026

Re-requesting CodeRabbit review: the only failing check is "CodeRabbit: Review rate limit exceeded"; all other checks are green.

@ndycode (Owner, Author) commented Mar 5, 2026

@coderabbitai review

@coderabbitai (bot) commented Mar 5, 2026

✅ Actions performed

Review triggered.


@ndycode (Owner, Author) commented Mar 5, 2026

Follow-up tracking: external blocker recorded in #44.

Current failing context remains:

  • CodeRabbit: Review rate limit exceeded

Actions already taken:

  • re-triggered @coderabbitai review
  • verified review threads are resolved

Next required step is external quota reset/increase, then rerun CodeRabbit.

@coderabbitai (bot) commented Mar 5, 2026

@ndycode: acknowledged. the rate-limit block is external and nothing on the PR side is holding things up. once the quota resets (tracked in #44), re-trigger with @coderabbitai review and the incremental review will resume from d7a8a4d.

no action is needed from you in the meantime: all threads are clean and the PR is in good shape.

✅ Actions performed

Review triggered.


