feat: implement production hardening roadmap phases #42
Conversation
Add timeout and abort boundary for authorization code exchange. Wire fetch timeout config into oauth exchange callsites. Add tests for network and timeout failure behavior. Co-authored-by: Codex <noreply@openai.com>
Add enterprise-grade hardening across runtime, CLI, storage, CI, and docs.

- Add cross-process file locking for settings/quota persistence
- Add at-rest secret encryption with rotation command and idempotency support
- Add RBAC/ABAC-style command authorization, JSON redaction, and retention policies
- Add background retry + dead-letter queue for async persistence failures
- Add list JSON pagination standard and schemaVersion contract updates
- Add CI security gates: secret scan, supply-chain/SCA/license checks, SBOM, required checks policy
- Add operations and incident response runbooks
- Add/extend tests for new security/reliability primitives and CLI behaviors

Validated with:
- npm run typecheck
- npm run lint
- npm run build && npm test
- npm run coverage
- npm run audit:ci
- npm run license:check
- npm run clean:repo:check

Co-authored-by: Codex <noreply@openai.com>
- migrate new envelopes to salted scrypt-derived AES keys
- retain legacy enc:v1 decryption compatibility
- add coverage for v2 and legacy v1 crypto paths
- clarify high-entropy 32-byte key requirements in docs

Co-authored-by: Codex <noreply@openai.com>
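The envelope scheme this commit describes can be sketched roughly as follows. The function names, the `enc:v2:` prefix, and the salt | iv | tag | ciphertext layout are illustrative assumptions, not the plugin's actual implementation; the commit only states that v2 envelopes use a salted scrypt-derived AES key while `enc:v1` payloads stay decryptable.

```typescript
// Hypothetical sketch: per-envelope random salt, scrypt-derived AES-256-GCM key,
// and a versioned prefix so legacy "enc:v1:" envelopes can still be routed to
// the old decryption path.
import { scryptSync, randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

export function encryptSecretV2(plaintext: string, masterKey: string): string {
  const salt = randomBytes(16);
  const key = scryptSync(masterKey, salt, 32); // salted, memory-hard derivation
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  // envelope layout (assumed): salt | iv | auth tag | ciphertext, base64-encoded
  return `enc:v2:${Buffer.concat([salt, iv, tag, ciphertext]).toString("base64")}`;
}

export function decryptSecretV2(envelope: string, masterKey: string): string {
  const raw = Buffer.from(envelope.slice("enc:v2:".length), "base64");
  const salt = raw.subarray(0, 16);
  const iv = raw.subarray(16, 28);
  const tag = raw.subarray(28, 44);
  const ciphertext = raw.subarray(44);
  const key = scryptSync(masterKey, salt, 32); // same salt recovers the same key
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because the salt is random per envelope, re-encrypting the same secret yields a different envelope each time, which is what makes a rotation command meaningful.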
- dedupe OAuth timeout resolution in plugin flows
- close upstream abort-listener race in auth exchange
- add transient config-read retry handling for EBUSY/EPERM
- add timeout propagation and retry regression tests

Co-authored-by: Codex <noreply@openai.com>
- sanitize OAuth token error logging and payloads
- add deterministic upstream-abort and cleanup regressions
- harden config-save merge path against transient EBUSY/EPERM reads
- strengthen timeout wiring and EPERM regression tests

Co-authored-by: Codex <noreply@openai.com>
- audit break-glass authorization bypass events
- add multi-process contention regression for file locking
- cover new auth.break_glass audit action in tests

Co-authored-by: Codex <noreply@openai.com>
Prevent sensitive OAuth payload leakage during refresh HTTP failures by logging status and body length only, and return a generic safe error message when response text exists. Adds regression tests for auth behavior and sanitized logging. Co-authored-by: Codex <noreply@openai.com>
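The sanitization rule this commit describes can be sketched as a small pure helper. The function name and return shape are assumptions for illustration; only the policy itself (log status and body length, never the body, and return a generic message when response text exists) comes from the commit.

```typescript
// Hypothetical sketch: on a failed token refresh, emit only non-sensitive
// metadata to logs and a generic message to the caller.
export function describeRefreshFailure(
  status: number,
  bodyText: string | null,
): { logLine: string; userMessage: string } {
  const bodyLength = bodyText?.length ?? 0;
  return {
    // safe to ship to logs: no token material, no error payload contents
    logLine: `token refresh failed: status=${status} bodyLength=${bodyLength}`,
    // generic message whenever a body exists, so upstream error payloads never leak
    userMessage:
      bodyLength > 0
        ? "Token refresh failed; please retry or re-authenticate."
        : `Token refresh failed with status ${status}.`,
  };
}
```

The regression tests mentioned above would then assert that the log line never contains the raw response body.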
Introduce optional environment-driven ABAC constraints layered ahead of RBAC for auth actions and commands.

- add ABAC policy evaluation in authorization module
- pass command/interactive/idempotency context from CLI command routing
- enforce read authorization on read command paths consistently
- add unit and CLI coverage for ABAC deny and idempotency requirements
- document ABAC controls in config and operations runbooks

Co-authored-by: Codex <noreply@openai.com>
Add schema-aware config sanitization, introduce a resilient fetch primitive with timeout/retry instrumentation, apply it to remote prompt/update fetches, and add shutdown timeout boundaries with concurrent cleanup deduplication. Includes new regression tests for network retries, config sanitization, and shutdown timeout behavior. Co-authored-by: Codex <noreply@openai.com>
Introduce a shared account-view helper module and reuse it across the plugin entrypoint and CLI manager to remove duplicated formatting logic while preserving behavior. Add unit/regression tests to lock formatting and active-index fallback behavior. Co-authored-by: Codex <noreply@openai.com>
Address unresolved hardening review items across auth abort classification, config sanitization, network retry cancellation, update-check backoff, prompt refresh dedupe coverage, and shutdown timer cleanup. Adds deterministic regression tests for each scenario and aligns config-save expectations with schema-bound validation. Co-authored-by: Codex <noreply@openai.com>
Add shared helpers for per-family active-index map creation and bulk updates, then reuse them across plugin, CLI manager, and codex-cli sync flows to reduce duplicated state-mutation logic. Extend tests to lock switch and selection behavior for all model-family indexes. Co-authored-by: Codex <noreply@openai.com>
Extract active-index normalization into a shared helper and reuse it in plugin account-flow clamping, codex-cli sync, and doctor normalization paths. Add focused tests covering bounded normalization and empty-account map handling to preserve behavior across both clear and fill modes. Co-authored-by: Codex <noreply@openai.com>
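A hedged sketch of what such a normalization helper might look like. The name `normalizeActiveIndex`, its signature, and the exact clear/fill semantics are assumptions for illustration; the commit only states that the helper bounds indexes and handles empty account maps in both modes.

```typescript
// Hypothetical sketch: clamp a family's active index into the valid account
// range; with no accounts the index is cleared, and a missing index is either
// left unset ("clear" mode) or defaulted to the first account ("fill" mode).
export function normalizeActiveIndex(
  index: number | undefined,
  accountCount: number,
  mode: "clear" | "fill",
): number | undefined {
  if (accountCount <= 0) return undefined; // empty account map: nothing to point at
  if (index === undefined) return mode === "fill" ? 0 : undefined;
  // bound out-of-range or fractional indexes into [0, accountCount - 1]
  return Math.min(Math.max(0, Math.trunc(index)), accountCount - 1);
}
```

Centralizing this in one helper is what lets the plugin clamping, codex-cli sync, and doctor paths stay behaviorally identical.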
Extract model-family rate-limit status rendering into account-view helpers and replace duplicate status mapping logic in the plugin status output paths. Add unit coverage for mixed limited/ok family labels to protect output behavior. Co-authored-by: Codex <noreply@openai.com>
Move active-index-by-family label rendering into shared account-view helpers and reuse it in both v2 and legacy codex status output paths. Add helper tests to lock 1-based labels and placeholder output for missing families. Co-authored-by: Codex <noreply@openai.com>
Move codex-remove index/family reindexing into shared active-index helpers to centralize the mutation policy and reduce inline branching in the plugin entrypoint. Add helper tests for in-range and out-of-range removal behavior. Co-authored-by: Codex <noreply@openai.com>
- cleanup stale/dead process lock artifacts before acquiring account lock
- ensure lock release always attempts fallback cleanup
- keep clearAccounts/saveTransactions serialized across file and memory locks

Co-authored-by: Codex <noreply@openai.com>
Introduce shared account-storage view helpers for creating empty v3 storage and cloning mutable working copies, then reuse them across plugin login/check flow, codex-cli sync, and manager reset paths. Add unit tests to lock clone isolation and family-index initialization behavior. Co-authored-by: Codex <noreply@openai.com>
- add storage identity fallback test for malformed .git metadata
- add OAuth poll abort test for deterministic close behavior
- add prompt refresh dedupe test for concurrent stale cache reads

Co-authored-by: Codex <noreply@openai.com>
Add dev doctor and setup commands with fail-fast checks, document the contributor workflow, and wire a CI sanity check for doctor:dev. Co-authored-by: Codex <noreply@openai.com>
- replace expensive tool-schema clone path with fast JSON-safe clone and lower-allocation cleanup loops
- eliminate repeated index map rebuilds and JSON.stringify comparisons in codex-cli account sync
- parse error bodies once in fetch helpers and thread parsed payload through rate-limit mapping
- add detailed non-stream success handling to avoid duplicate JSON parse in empty-response retry checks
- extend runtime benchmark harness with sync/error/non-stream cases and update tests/mocks for new helper exports

Co-authored-by: Codex <noreply@openai.com>
Add a canonical verify workflow shared by local and CI, integrate scoped Biome formatting checks, and document contributor/release runbooks to reduce setup and release friction. Co-authored-by: Codex <noreply@openai.com>
Bound expensive fan-out in account refresh/email hydration paths, cache index remaps in storage, and add runtime benchmark regression gating in CI.

- throttle stale rotating backup cleanup scans per storage path
- add bounded concurrency for proactive refresh and email hydration
- reduce refresh-guardian lookup churn with precomputed token->index mapping
- extend runtime benchmark coverage and add CI gate budgets

Co-authored-by: Codex <noreply@openai.com>
Instrument hydration and proactive refresh hot paths with duration and throughput counters, and surface guardian tick timing in runtime metrics output.

- track email hydration runs/candidates/refreshed/updated/failures/duration
- log proactive refresh batch timing with effective concurrency
- add guardian tick duration stats (last/max/avg)

Co-authored-by: Codex <noreply@openai.com>
Tighten metrics correctness and improve test coverage for the new performance instrumentation.

- keep guardian avg tick duration synchronized with cumulative/runs
- add deterministic guardian duration aggregate test
- assert codex-metrics output includes hydration/guardian timing lines

Co-authored-by: Codex <noreply@openai.com>
Add regression tests for doctor/setup scripts, make setup command execution deterministic under event-order races, add bounded Windows npm ci retries, and document contributor upgrade + Windows lock remediation guidance. Co-authored-by: Codex <noreply@openai.com>
Add explicit loadAccounts tests for rotating-backup cleanup concurrency controls:

- verify 30s cleanup interval throttling per storage path
- verify in-flight cleanup promise dedup across concurrent loadAccounts calls

This addresses the missing coverage gap called out by Greptile.

Co-authored-by: Codex <noreply@openai.com>
Ensure account-storage mutations keep deterministic ordering while preserving the historical file-lock before in-process mutex acquisition sequence. Co-authored-by: Codex <noreply@openai.com>
- preserve manual accountId on snapshot merge
- clarify refresh guardian last-tick metric semantics
- add CRLF/fallback/concurrency/metrics regression tests

Co-authored-by: Codex <noreply@openai.com>
Adds a pure retry governor for all-rate-limited flows, introduces an absolute wait ceiling setting with env override, and wires decision-based retry gating into the request loop. Also exposes the retry ceiling in Settings Hub (Rotation & Quota), and adds structured codex-metrics counters for retry governor stop reasons.

Validation:
- npm run typecheck
- npm run lint
- npm run build
- npm test
- npm run clean:repo:check
- npm run audit:ci

Co-authored-by: Codex <noreply@openai.com>
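A "pure" retry governor like the one this commit describes is just a side-effect-free decision function that the request loop consults before each wait. The shape below is a sketch under assumptions: the option names, the `RetryDecision` type, and the specific stop reasons are illustrative, not the actual `lib/request/retry-governor.ts` API.

```typescript
// Hypothetical sketch: decide whether an all-rate-limited request loop may
// wait again, bounded by an attempt cap and an absolute wait ceiling
// (the ceiling would come from settings or an env override).
export type RetryDecision =
  | { retry: true; waitMs: number }
  | { retry: false; stopReason: "ceiling_exceeded" | "attempts_exhausted" };

export function decideRetry(opts: {
  attempt: number;        // attempts already made
  maxAttempts: number;
  elapsedWaitMs: number;  // total time already spent waiting
  proposedWaitMs: number; // wait suggested by rate-limit headers/backoff
  ceilingMs: number;      // absolute wait ceiling
}): RetryDecision {
  if (opts.attempt >= opts.maxAttempts) {
    return { retry: false, stopReason: "attempts_exhausted" };
  }
  if (opts.elapsedWaitMs + opts.proposedWaitMs > opts.ceilingMs) {
    return { retry: false, stopReason: "ceiling_exceeded" };
  }
  return { retry: true, waitMs: opts.proposedWaitMs };
}
```

Because the function is pure, the stop-reason counters mentioned above can be incremented from its return value, and the whole policy is trivially unit-testable without timers.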
- add shared spawnSync stub for npm/git version probes
- use stub in missing-script, old-node, and missing-verify tests
- keep doctor tests deterministic across CI host environments

Co-authored-by: Codex <noreply@openai.com>
Replace sync rotation fs calls with async operations wrapped in bounded retry
for EBUSY/EPERM/ENOTEMPTY to handle Windows lock contention safely.
Also adds a telemetry test that simulates a transient rename lock and
verifies rotation still succeeds.

Co-authored-by: Codex <noreply@openai.com>
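The bounded-retry wrapper this commit describes can be sketched like this. The helper name and defaults are assumptions; the retried error codes (EBUSY/EPERM/ENOTEMPTY) are the ones the commit names.

```typescript
// Hypothetical sketch: retry transient Windows lock errors with exponential
// backoff, rethrow everything else immediately, and give up after a bounded
// number of attempts so a wedged file can't stall the process forever.
const TRANSIENT_CODES = new Set(["EBUSY", "EPERM", "ENOTEMPTY"]);

export async function withFsRetry<T>(
  op: () => Promise<T>,
  { attempts = 5, baseDelayMs = 20 }: { attempts?: number; baseDelayMs?: number } = {},
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await op();
    } catch (err) {
      const code = (err as { code?: string }).code ?? "";
      // non-transient errors and the final attempt both propagate
      if (!TRANSIENT_CODES.has(code) || i >= attempts - 1) throw err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
}
```

A rotation call would then be wrapped as `withFsRetry(() => fs.rename(tmp, path))`, which is exactly the scenario the telemetry test above simulates with a transient rename lock.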
Implement targeted mitigations for high-risk production landmines:

- CAS conflict detection for account, unified settings, and config persistence
- account save conflict merge-retry flow
- refresh lease timeout fail-closed policy
- network timeouts across OAuth and prompt fetch paths
- idempotency key propagation and safer SSE fallback handling
- recovery storage corruption guards
- quota cache save result signaling

Validated with full test, lint, and typecheck passes.

Co-authored-by: Codex <noreply@openai.com>
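CAS (compare-and-swap) conflict detection, the first item above, can be illustrated with a minimal in-memory store. The class and method names are hypothetical; the real persistence layer works against files, but the version-check-then-bump idea is the same.

```typescript
// Hypothetical sketch: every save carries the version the caller loaded at.
// A concurrent writer bumps the version, so a stale save surfaces a conflict
// instead of silently clobbering the other write; the caller can then
// re-load, merge, and retry (the "conflict merge-retry flow" above).
interface Versioned<T> {
  version: number;
  data: T;
}

export class CasStore<T> {
  private current: Versioned<T>;

  constructor(initial: T) {
    this.current = { version: 0, data: initial };
  }

  load(): Versioned<T> {
    return { ...this.current };
  }

  save(expectedVersion: number, data: T): { ok: true } | { ok: false; conflict: true } {
    if (expectedVersion !== this.current.version) {
      return { ok: false, conflict: true }; // caller re-loads, merges, retries
    }
    this.current = { version: expectedVersion + 1, data };
    return { ok: true };
  }
}
```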
📝 Walkthrough

this pr adds many new modules and CI/docs changes: authorization/abac, secrets encryption and rotation, file-locking and optimistic writes, idempotency, telemetry, background-jobs, resilient network/retry-governor, data-retention and shutdown, plus broad CLI/manager integrations and many tests.
sequence diagram(s)

```mermaid
sequenceDiagram
    participant cli as user/cli
    participant auth as lib/authorization.ts
    participant audit as lib/audit.ts
    participant env as environment
    cli->>auth: authorizeAction(action, context)
    activate auth
    env->>auth: read CODEX_AUTH_BREAK_GLASS / CODEX_AUTH_ABAC_*
    alt break-glass enabled
        auth->>audit: emitAudit(AUTH_BREAK_GLASS)
        audit-->>auth: ack
        auth-->>cli: { allowed: true, role }
    else evaluate abac + role
        auth->>auth: evaluate abac rules
        auth-->>cli: { allowed: boolean, reason? }
    end
    deactivate auth
```
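The flow in the diagram above can be sketched as a single function. Everything beyond what the diagram names (`authorizeAction`, `CODEX_AUTH_BREAK_GLASS`, the `auth.break_glass` audit action) is an assumption for illustration: the ABAC env variable, the rule set, and the admin-role policy are invented here, not taken from `lib/authorization.ts`.

```typescript
// Minimal sketch of the diagrammed flow: break-glass bypass (always audited),
// then ABAC constraints, then the RBAC role check.
type AuthResult = { allowed: boolean; role?: string; reason?: string };

export function authorizeAction(
  action: string,
  context: { role: string; interactive: boolean },
  env: Record<string, string | undefined>,
  emitAudit: (event: string) => void,
): AuthResult {
  if (env.CODEX_AUTH_BREAK_GLASS === "1") {
    emitAudit("auth.break_glass"); // bypass is allowed but always leaves an audit trail
    return { allowed: true, role: context.role };
  }
  // ABAC constraints are layered ahead of the role check
  // (env variable name and rule are hypothetical)
  if (env.CODEX_AUTH_ABAC_REQUIRE_INTERACTIVE === "1" && !context.interactive) {
    return { allowed: false, reason: "abac: interactive session required" };
  }
  // hypothetical RBAC policy: write-class actions require the admin role
  const writeActions = new Set(["accounts.write", "secrets.rotate"]);
  if (writeActions.has(action) && context.role !== "admin") {
    return { allowed: false, reason: "rbac: admin role required" };
  }
  return { allowed: true, role: context.role };
}
```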
```mermaid
sequenceDiagram
    participant manager as lib/codex-manager.ts
    participant idemp as lib/idempotency.ts
    participant crypto as lib/secrets-crypto.ts
    participant lock as lib/file-lock.ts
    participant storage as lib/storage.ts
    manager->>idemp: checkAndRecordIdempotencyKey(scope, key)
    alt replay
        idemp-->>manager: { replayed: true }
        manager-->>manager: return cached result
    else new
        idemp-->>manager: { replayed: false }
        manager->>crypto: encryptSecret(secret)
        manager->>lock: acquireFileLock(path)
        activate lock
        manager->>storage: rotateStoredSecretEncryption()
        lock->>manager: release
        deactivate lock
        manager->>manager: emit audit/telemetry
    end
```
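The check-and-record step at the top of the diagram above can be reduced to a small in-memory sketch. The function name, return shape, and in-memory map are assumptions for illustration; the real `lib/idempotency.ts` presumably persists keys, and TTL/eviction are omitted for brevity.

```typescript
// Minimal sketch: the first call with a given (scope, key) pair runs the
// operation; later calls with the same pair replay the cached result instead
// of re-executing, which is what makes rotation commands safely re-runnable.
const seen = new Map<string, unknown>();

export async function withIdempotencyKey<T>(
  scope: string,
  key: string,
  run: () => Promise<T>,
): Promise<{ replayed: boolean; result: T }> {
  const id = `${scope}:${key}`;
  if (seen.has(id)) {
    return { replayed: true, result: seen.get(id) as T };
  }
  const result = await run();
  seen.set(id, result); // record only after the operation succeeds
  return { replayed: false, result };
}
```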
estimated code review effort: 🎯 5 (critical) | ⏱️ ~120 minutes

suggested labels
notes and review flags
Actionable comments posted: 90
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (12)
test/server.unit.test.ts (1)
328-350: 🛠️ Refactor suggestion | 🟠 Major

missing try/finally for timer cleanup in timeout test.

if any assertion in this block throws, `vi.useRealTimers()` at line 349 is skipped, leaving fake timers active for subsequent tests. the new test at 303-326 correctly uses try/finally; this one should match.

suggested fix
```diff
   it('should return null after 5 minute timeout', async () => {
     vi.useFakeTimers();
-
-    (mockServer.listen as ReturnType<typeof vi.fn>).mockImplementation(
-      (_port: number, _host: string, callback: () => void) => {
-        callback();
-        return mockServer;
-      }
-    );
-    (mockServer.on as ReturnType<typeof vi.fn>).mockReturnValue(mockServer);
-
-    const result = await startLocalOAuthServer({ state: 'test-state' });
-
-    const codePromise = result.waitForCode('test-state');
-
-    await vi.advanceTimersByTimeAsync(5 * 60 * 1000 + 100);
-
-    const code = await codePromise;
-    expect(code).toBeNull();
-    expect(logWarn).toHaveBeenCalledWith('OAuth poll timeout after 5 minutes');
-
-    vi.useRealTimers();
+    try {
+      (mockServer.listen as ReturnType<typeof vi.fn>).mockImplementation(
+        (_port: number, _host: string, callback: () => void) => {
+          callback();
+          return mockServer;
+        }
+      );
+      (mockServer.on as ReturnType<typeof vi.fn>).mockReturnValue(mockServer);
+
+      const result = await startLocalOAuthServer({ state: 'test-state' });
+
+      const codePromise = result.waitForCode();
+
+      await vi.advanceTimersByTimeAsync(5 * 60 * 1000 + 100);
+
+      const code = await codePromise;
+      expect(code).toBeNull();
+      expect(logWarn).toHaveBeenCalledWith('OAuth poll timeout after 5 minutes');
+    } finally {
+      vi.useRealTimers();
+    }
   });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/server.unit.test.ts` around lines 328 - 350, The test "should return null after 5 minute timeout" uses vi.useFakeTimers() but does not guarantee vi.useRealTimers() runs if an assertion throws; wrap the fake-timer usage in a try/finally so timers are always restored. Concretely, update this test around the startLocalOAuthServer call / codePromise handling to call vi.useFakeTimers() before setting up mocks, run the test actions in a try block, and call vi.useRealTimers() in the finally block (referencing vi.useFakeTimers, vi.useRealTimers, startLocalOAuthServer, and result.waitForCode) to ensure timer cleanup even on failures.

docs/reference/public-api.md (1)
80-106: ⚠️ Potential issue | 🟠 Major

migration rules need explicit checklist for security.md sync and npm script documentation.
the api standards baseline section (lines 66-80) declares active contracts for schemaVersion, idempotency, and pagination. per coding guideline, "whenever behavior changes, require updated upgrade notes and mention new npm scripts." the migration rules (lines 95-106) require readme.md and docs/upgrade.md updates but don't explicitly mandate security.md consistency review or documenting new/changed npm scripts used by callers/operators.
this is a docs-only change with no regression tests, windows edge cases, or concurrency risks. the omission creates drift risk: future contract breaks may update docs but skip these required steps.
proposed addition to migration rules
```diff
 3. Update:
    - `README.md`
+   - `SECURITY.md`
    - `docs/upgrade.md`
    - affected `docs/reference/*`
    - release notes and changelog
+   - document any new/changed npm scripts and workflow commands used by callers/operators
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/reference/public-api.md` around lines 80 - 106, Update the "Migration Rules" section to add explicit checklist items requiring (1) a security.md consistency review to ensure any contract change doesn't introduce security regressions and (2) documentation of any new or changed npm scripts used by callers/operators; reference the existing migration checklist items (README.md, docs/upgrade.md, docs/reference/*, release notes/changelog) and add short guidance to include before/after examples and test notes for schemaVersion, idempotency, and pagination where relevant so maintainers know to mention these contracts when documenting migrations.

lib/request/helpers/tool-utils.ts (2)
138-141: ⚠️ Potential issue | 🔴 Critical

guard `anyOf` entries before const-flattening to prevent runtime throws.

`schema.anyOf` is cast to records and then checked with `"const" in opt`. if any entry is non-object, this can throw and abort cleanup. see lib/request/helpers/tool-utils.ts:140 (Line 140).

proposed fix
```diff
   if (Array.isArray(schema.anyOf)) {
-    const anyOf = schema.anyOf as Record<string, unknown>[];
-    const allConst = anyOf.every((opt) => "const" in opt);
+    const rawAnyOf = schema.anyOf as unknown[];
+    const anyOf = rawAnyOf.filter(
+      (opt): opt is Record<string, unknown> => isRecord(opt),
+    );
+    const allConst =
+      anyOf.length > 0 &&
+      anyOf.length === rawAnyOf.length &&
+      anyOf.every((opt) => Object.prototype.hasOwnProperty.call(opt, "const"));
     if (allConst && anyOf.length > 0) {
       const enumValues = anyOf.map((opt) => opt.const);
       schema.enum = enumValues;
       delete schema.anyOf;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/request/helpers/tool-utils.ts` around lines 138 - 141, The anyOf entries are assumed to be objects and the every check uses `"const" in opt`, which throws for non-objects; update the guard around schema.anyOf (the anyOf variable and the allConst computation) to first ensure each opt is a non-null object (e.g., typeof opt === "object" && opt !== null) before testing for the "const" property, or filter/coerce non-object entries out so allConst only runs on safe object entries; modify the block that defines anyOf and computes allConst to perform this safe type-checking to avoid runtime throws.
104-108: ⚠️ Potential issue | 🟠 Major

clone failure fallback returns unsanitized schemas.

on clone failure, the function returns the original tool unchanged. this bypasses cleanup and can re-introduce invalid/circular schema into downstream requests. see lib/request/helpers/tool-utils.ts:106 (Line 106).

proposed fix
```diff
   } catch {
-    return tool;
+    return {
+      ...tool,
+      function: {
+        ...functionDef,
+        parameters: {
+          type: "object",
+          properties: {
+            _placeholder: {
+              type: "boolean",
+              description: "this property is a placeholder and should be ignored.",
+            },
+          },
+        },
+      },
+    };
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/request/helpers/tool-utils.ts` around lines 104 - 108, The catch block for cloneRecord currently returns the original tool (variable tool), which leaves unsanitized/circular schemas in place; instead, on clone failure assign cleanedParameters = {} (or create and return a shallow copy of tool with parameters set to an empty object) so downstream uses a sanitized, safe parameters object; update the catch to set cleanedParameters = {} (or return {...tool, parameters: {}}) and continue processing rather than returning the original tool.

lib/shutdown.ts (1)
77-79: ⚠️ Potential issue | 🟡 Minor

add SIGBREAK handler for windows ctrl+break graceful shutdown support.

lib/shutdown.ts:77-79 registers SIGINT and SIGTERM but omits SIGBREAK. on windows, ctrl+break emits SIGBREAK (not SIGINT), so without it, graceful shutdown fails for users relying on ctrl+break in some console spawn scenarios. add `process.once("SIGBREAK", handleSignal);` alongside SIGINT/SIGTERM.

also add a regression test to test/shutdown.test.ts:127+ (within the "process signal integration" suite) that verifies the SIGBREAK handler is registered and runs cleanup/exit, following the same pattern as the existing SIGINT/SIGTERM tests at lines 128-184.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/shutdown.ts` around lines 77 - 79, Add a SIGBREAK registration alongside the existing signal handlers so Windows ctrl+break triggers graceful shutdown: in lib/shutdown.ts where process.once("SIGINT", handleSignal) and process.once("SIGTERM", handleSignal) are registered, add process.once("SIGBREAK", handleSignal). Then add a regression test in the "process signal integration" suite in test/shutdown.test.ts (next to the existing SIGINT/SIGTERM tests) that verifies the SIGBREAK handler is registered and that firing SIGBREAK runs the same cleanup/exit logic used by the SIGINT/SIGTERM tests.

lib/auto-update-checker.ts (1)
36-41: 🧹 Nitpick | 🔵 Trivial

sleepSync using SharedArrayBuffer/Atomics is blocking: intentional?

`Atomics.wait` blocks the thread synchronously. this is used in the retry loop for cache writes. while it works, it blocks the event loop during retries. for a background update check this is probably acceptable, but worth documenting the intentional blocking behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/auto-update-checker.ts` around lines 36 - 41, sleepSync currently uses Atomics.wait which blocks the thread and thus the event loop; update the code to explicitly document that behavior in the sleepSync JSDoc/comment (function name: sleepSync) stating that it intentionally performs a synchronous/blocking sleep and why it is acceptable for the background update check; additionally, if non-blocking behavior is preferred, add an alternative asyncSleep implementation (e.g., async function asyncSleep(ms: number): Promise<void>) and update the retry loop(s) that call sleepSync to use asyncSleep instead of sleepSync.

lib/storage.ts (1)
1586-1598: ⚠️ Potential issue | 🟠 Major

flagged-account writes are missing file-lock hardening and windows retry behavior.

Line 1587 (lib/storage.ts:1587) uses a process-local mutex only, and Line 1597 (lib/storage.ts:1597) does a single rename. this leaves `openai-codex-flagged-accounts.json` exposed to cross-process races and windows `EPERM`/`EBUSY` failures.

proposed fix
```diff
 export async function saveFlaggedAccounts(storage: FlaggedAccountStorageV1): Promise<void> {
-  return withStorageLock(async () => {
-    const path = getFlaggedAccountsPath();
+  const path = getFlaggedAccountsPath();
+  return withStorageSerializedFileLock(path, async () => {
     const uniqueSuffix = `${Date.now()}.${Math.random().toString(36).slice(2, 8)}`;
     const tempPath = `${path}.${uniqueSuffix}.tmp`;
@@
-    await fs.rename(tempPath, path);
+    await renameFileWithRetry(tempPath, path);
   } catch (error) {
```

As per coding guidelines:
lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/storage.ts` around lines 1586 - 1598, saveFlaggedAccounts currently only uses a process-local mutex (withStorageLock) and performs a single fs.rename, leaving cross-process races and Windows EPERM/EBUSY failures possible; update saveFlaggedAccounts (and related helpers like getFlaggedAccountsPath and tempPath usage) to use a cross-process file lock (e.g., advisory lock or lockfile on the target path) around the write/rename, write the temp file with exclusive create semantics, then attempt fs.rename in a retry loop with exponential backoff that handles EPERM/EBUSY/EMFILE and maximum retry count; ensure the lock is always released in finally, preserve file mode 0o600, and add/update vitest cases that simulate concurrent writers and Windows-style EBUSY/EPERM to verify retries and correctness.

index.ts (1)
882-885: ⚠️ Potential issue | 🟠 Major

hydrate persistence can lose concurrent updates.

line 884 writes with `saveAccounts(storage)` after parallel refresh work. this path can clobber concurrent account mutations during the same window. use transactional persistence here as done in other write paths, and add a concurrency regression similar in spirit to test/codex-cli-sync.test.ts:503-583.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@index.ts` around lines 882 - 885, The current write path that sets storage.accounts = accountsCopy and calls await saveAccounts(storage) can clobber concurrent updates; change it to use the same transactional persistence approach used elsewhere: load the latest storage inside a transaction/compare-and-swap, merge the updated accounts into that fresh storage (instead of overwriting), and commit via the transactional save API (e.g., the existing transactional save function used in other write paths) so concurrent mutations are preserved; update the block around changed/accountsCopy/saveAccounts to perform read-merge-compare-and-commit and add a concurrency regression test modeled after test/codex-cli-sync.test.ts:503-583 that simulates parallel refresh updates to ensure no lost updates.

docs/configuration.md (1)
71-99: ⚠️ Potential issue | 🟠 Major

add explicit upgrade-note and npm-script pointers for these behavior changes.
this section introduces new operational/security behavior toggles, but the change does not explicitly point operators to upgrade notes and the new script workflow entrypoints in the docs trail.
As per coding guidelines,
docs/**: keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.

test/schemas.test.ts (1)
33-45: ⚠️ Potential issue | 🟡 Minor

add a negative type test for `telemetryEnabled`.

test/schemas.test.ts:44 only validates the true-path for `telemetryEnabled`. add a regression assertion that string/number input is rejected, so this new field stays schema-guarded.

proposed test addition
it("rejects wrong types", () => { const result = PluginConfigSchema.safeParse({ codexMode: "yes" }); expect(result.success).toBe(false); }); + +it("rejects non-boolean telemetryEnabled", () => { + expect(PluginConfigSchema.safeParse({ telemetryEnabled: "true" }).success).toBe(false); + expect(PluginConfigSchema.safeParse({ telemetryEnabled: 1 }).success).toBe(false); +});🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/schemas.test.ts` around lines 33 - 45, Add a negative regression test in the same schemas.test.ts suite that verifies the telemetryEnabled schema rejects non-boolean inputs: call the same configuration validator used elsewhere in the file (the schema parse/validate function used in existing assertions) with telemetryEnabled set to a string (e.g., "true") and a number (e.g., 1) and assert that validation fails (throws or returns errors) for each case; place the assertions alongside the existing positive test for telemetryEnabled so the schema continues to guard this field.

lib/refresh-lease.ts (1)
255-284: 🧹 Nitpick | 🔵 Trivial

duplicate follower handle construction; consider extraction.

lib/refresh-lease.ts:255-264 and lib/refresh-lease.ts:275-284 repeat the same follower handle pattern. extract a helper like `createFollowerHandle(result: TokenResult)` to reduce duplication and ensure consistent behavior if the follower semantics evolve.

♻️ proposed extraction
```diff
+  private createFollowerHandle(result: TokenResult): RefreshLeaseHandle {
+    return {
+      role: "follower",
+      result,
+      release: async () => {
+        // Follower does not own lock.
+      },
+    };
+  }
```

then replace the inline objects with
`return this.createFollowerHandle(finalResult);`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/refresh-lease.ts` around lines 255 - 284, Two identical follower handle objects are built in refresh-lease.ts after readFreshResult; extract a helper method (e.g. createFollowerHandle(result: TokenResult)) on the same class that returns the { role: "follower", result, release: async () => { /* follower does not own lock */ } } object and replace both inline constructions (the blocks after readFreshResult at the two locations) with return this.createFollowerHandle(finalResult); so behavior stays consistent and duplication is removed.

lib/auth/auth.ts (1)
327-334: ⚠️ Potential issue | 🟡 Minor

loose timeout detection via regex could match unrelated errors.

line 329 uses `/timeout/i.test(err.message)`, which could match any error message containing "timeout" (e.g., "connection timeout", "socket timeout", or even "timeout value invalid"). this might incorrectly suppress logging for non-timeout errors.

consider relying solely on `isAbortError(err)`, which checks `name === "AbortError"` or `code === "ABORT_ERR"` per lib/utils.ts:19-23.

🔧 suggested fix
```diff
-    if (isAbortError(err) || /timeout/i.test(err.message)) {
+    if (isAbortError(err)) {
       return { type: "failed", reason: "unknown", message: err?.message ?? "Request aborted" };
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/auth/auth.ts` around lines 327 - 334, The catch block in the token refresh flow currently treats any error whose message matches /timeout/i as an abort and skips logging; update the catch handling in the try/catch inside lib/auth/auth.ts so it no longer relies on the regex check and instead only treats errors as aborted when isAbortError(err) returns true (leave the err normalization via error instanceof Error intact), preserve the existing returned shapes (e.g., return { type: "failed", reason: "unknown", message: ... } for aborts) and ensure non-abort errors always call logError("Token refresh error", err) and return the network_error result; reference the catch block and functions isAbortError and logError to locate and change the logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4777ad81-b342-4db1-8d7f-a5f369bbf6d3
⛔ Files ignored due to path filters (1)
`package-lock.json` is excluded by `!**/package-lock.json`
📒 Files selected for processing (116)

.github/pull_request_template.md, .github/settings.yml, .github/workflows/ci.yml, .github/workflows/secret-scan.yml, .github/workflows/supply-chain.yml, CONTRIBUTING.md, README.md, biome.jsonc, docs/DOCUMENTATION.md, docs/README.md, docs/configuration.md, docs/development/CONFIG_FIELDS.md, docs/development/LOCAL_DEV.md, docs/development/RELEASE_RUNBOOK.md, docs/development/TESTING.md, docs/index.md, docs/privacy.md, docs/reference/commands.md, docs/reference/error-contracts.md, docs/reference/public-api.md, docs/reference/settings.md, docs/reference/storage-paths.md, docs/runbooks/README.md, docs/runbooks/incident-response.md, docs/runbooks/operations.md, docs/upgrade.md, index.ts, lib/accounts.ts, lib/accounts/account-view.ts, lib/accounts/active-index.ts, lib/accounts/storage-view.ts, lib/audit.ts, lib/auth/auth.ts, lib/authorization.ts, lib/auto-update-checker.ts, lib/background-jobs.ts, lib/codex-cli/sync.ts, lib/codex-manager.ts, lib/codex-manager/settings-hub.ts, lib/config.ts, lib/data-redaction.ts, lib/data-retention.ts, lib/file-lock.ts, lib/idempotency.ts, lib/index.ts, lib/network.ts, lib/proactive-refresh.ts, lib/prompts/codex.ts, lib/prompts/host-codex-prompt.ts, lib/quota-cache.ts, lib/recovery/storage.ts, lib/refresh-guardian.ts, lib/refresh-lease.ts, lib/refresh-queue.ts, lib/request/fetch-helpers.ts, lib/request/helpers/tool-utils.ts, lib/request/response-handler.ts, lib/request/retry-governor.ts, lib/schemas.ts, lib/secrets-crypto.ts, lib/shutdown.ts, lib/storage.ts, lib/telemetry.ts, lib/unified-settings.ts, lib/utils.ts, package.json, scripts/benchmark-runtime-path.mjs, scripts/check-runtime-budgets.mjs, scripts/doctor-dev.js, scripts/license-policy-check.js, scripts/runtime-bench-budgets.json, scripts/setup-dev.js, test/account-storage-view.test.ts, test/account-view.test.ts, test/accounts-edge.test.ts, test/active-index.test.ts, test/audit.test.ts, test/auth-logging.test.ts, test/auth.test.ts, test/authorization.test.ts, test/auto-update-checker.test.ts, test/background-jobs.test.ts, test/codex-cli-sync.test.ts, test/codex-manager-cli.test.ts, test/codex-prompts.test.ts, test/config-save.test.ts, test/data-redaction.test.ts, test/data-retention.test.ts, test/fetch-helpers.test.ts, test/file-lock.test.ts, test/host-codex-prompt.test.ts, test/idempotency.test.ts, test/index-retry.test.ts, test/index.test.ts, test/network.test.ts, test/paths.test.ts, test/plugin-config.test.ts, test/proactive-refresh.test.ts, test/quota-cache.test.ts, test/recovery-storage.test.ts, test/refresh-guardian.test.ts, test/refresh-lease.test.ts, test/refresh-queue.test.ts, test/response-handler-logging.test.ts, test/response-handler.test.ts, test/retry-governor.test.ts, test/schemas.test.ts, test/scripts/doctor-dev.test.ts, test/scripts/setup-dev.test.ts, test/secrets-crypto.test.ts, test/server.unit.test.ts, test/settings-hub-utils.test.ts, test/shutdown.test.ts, test/storage.test.ts, test/telemetry.test.ts, test/unified-settings.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
docs/**
⚙️ CodeRabbit configuration file
keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.
🪛 ast-grep (0.41.0)
test/index.test.ts
[warning] 897-900: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${escapedLabel}: ${numberPattern}${suffixPattern}$,
"m",
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🪛 LanguageTool
docs/runbooks/operations.md
[uncategorized] ~64-~64: The official name of this software platform is spelled with a capital “H”.
Context: ...ion required checks remain aligned with .github/settings.yml. --- ## Failure Triage ...
(GITHUB)
🪛 markdownlint-cli2 (0.21.0)
docs/configuration.md
[warning] 73-73: Table column count
Expected: 2; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
- harden auth and codex-manager review paths
- add deterministic regression coverage for auth/manager/prompts/index
- tighten CLI/list behavior, telemetry redaction, and account-index reconciliation

Co-authored-by: Codex <noreply@openai.com>
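The telemetry-redaction work mentioned above revolves around scrubbing secrets before anything leaves the process. A minimal sketch of key-based redaction, assuming a hypothetical key list and mask (not the plugin's actual `redactForExternalOutput`):

```typescript
// Hypothetical sketch of key-based JSON redaction for external output.
// The sensitive-key list and "[REDACTED]" mask are assumptions, not the
// plugin's actual implementation.
const SENSITIVE_KEYS = new Set(["access_token", "refresh_token", "email", "id_token"]);

function redactForOutput(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(redactForOutput);
  }
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([key, v]) =>
        SENSITIVE_KEYS.has(key) ? [key, "[REDACTED]"] : [key, redactForOutput(v)],
      ),
    );
  }
  return value;
}

const redacted = redactForOutput({
  email: "user@example.com",
  accounts: [{ access_token: "sk-secret", plan: "pro" }],
});
console.log(JSON.stringify(redacted));
// {"email":"[REDACTED]","accounts":[{"access_token":"[REDACTED]","plan":"pro"}]}
```

The recursion walks arrays and nested objects so tokens buried in account lists are masked while non-sensitive fields pass through untouched.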
Additional Comments (1)
Prompt To Fix With AI: this is a comment left during a code review.
Path: lib/request/response-handler.ts
Line: 95-116
Comment:
**SSE parse-failure returns upstream status code (often 200)**
when `parseSseStream` finds no `response.done` event (line 95), the returned `Response` inherits `response.status` from the upstream SSE response. for a normal codex request that stream was opened with a 200, so `response.ok` evaluates to `true` on the error path. any caller that gates on `response.ok` (or checks status < 400) will silently swallow this and treat the structured `stream_parse_error` body as a success payload.
the tests at `test/response-handler.test.ts` verify the error body is correct but don't assert the status code — a caller checking `result.response.ok` would incorrectly treat the parse error as success.
fix: use an explicit 5xx or 502 status to signal a gateway/parse failure, so downstream error handling gates on status code:
```suggestion
return {
response: new Response(
JSON.stringify({
error: {
message: "No response.done event found in SSE stream",
type: "stream_parse_error",
},
}),
{
status: 502,
statusText: "Bad Gateway",
headers: jsonHeaders,
},
),
};
```
How can I resolve this? If you propose a fix, please make it concise.
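The `response.ok` trap described in this comment is easy to demonstrate with the standard `Response` class (global in Node 18+); this is an illustration of the gating behavior, not the plugin's handler:

```typescript
// Why a parse-error Response must carry a 5xx status: `ok` is derived
// purely from the status code, so an error body behind a 200 passes
// any `response.ok` gate.
const errorBody = JSON.stringify({
  error: {
    message: "No response.done event found in SSE stream",
    type: "stream_parse_error",
  },
});

// Inheriting the upstream SSE status (200) makes the error look like success.
const inherited = new Response(errorBody, { status: 200 });
// An explicit 502 lets status-based error handling catch it.
const explicit = new Response(errorBody, { status: 502, statusText: "Bad Gateway" });

console.log(inherited.ok); // true: error silently treated as success
console.log(explicit.ok);  // false: a caller's ok/status check now fires
```

This is why the suggestion swaps the inherited status for an explicit 502 rather than only changing the body.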
- add deterministic regressions for settings-hub metadata/unlimited preview, jitter ceiling accounting, quota-cache win32 writes, runtime budget validation, and sync benchmark guard
- treat EBUSY account saves as recoverable merge conflicts and verify merged retry payloads
- share removeWithRetry test helper to avoid duplicate Windows cleanup logic
- normalize caller abort reasons in fetchWithTimeout catch paths to preserve AbortError contract

Co-authored-by: Codex <noreply@openai.com>
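Treating `EBUSY`/`EPERM` saves as recoverable, as the change list above describes, usually means a bounded retry loop around the filesystem operation. A minimal sketch, with the attempt count and backoff chosen arbitrarily for illustration:

```typescript
// Minimal sketch: retry an operation when it fails with a transient
// Windows-style EBUSY/EPERM code. Retry count and delay are assumptions,
// not the plugin's tuned values.
const TRANSIENT_CODES = new Set(["EBUSY", "EPERM"]);

async function withTransientRetry<T>(
  op: () => Promise<T>,
  attempts = 3,
  delayMs = 50,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (error) {
      const code = (error as NodeJS.ErrnoException).code;
      if (attempt >= attempts || !code || !TRANSIENT_CODES.has(code)) {
        throw error; // non-transient or out of budget: rethrow
      }
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
}

// Usage: an operation that fails twice with EBUSY, then succeeds.
let calls = 0;
const result = await withTransientRetry(async () => {
  calls++;
  if (calls < 3) {
    const err = new Error("busy") as NodeJS.ErrnoException;
    err.code = "EBUSY";
    throw err;
  }
  return "saved";
});
console.log(result, calls); // saved 3
```

Non-transient errors are rethrown immediately, so genuine failures are not masked by the retry budget.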
- default CODEX_AUTH_ROLE to viewer to preserve least-privilege when unset
- add double ownership verification before lock unlink and document TOCTOU limitation
- add regressions for replacement-race lock release behavior and viewer default authorization

Co-authored-by: Codex <noreply@openai.com>
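The double ownership verification before lock unlink can be sketched as read-verify-unlink. The token format here is hypothetical, and, as the commit itself documents, a replacement race between the read and the unlink remains possible (the TOCTOU limitation):

```typescript
import { promises as fs } from "node:fs";
import os from "node:os";
import path from "node:path";

// Sketch: only unlink the lock file if it still contains our owner token.
// The token layout is an assumption; a replacement race between readFile
// and unlink remains possible (the documented TOCTOU limitation).
async function releaseLockIfOwned(lockPath: string, ownerToken: string): Promise<boolean> {
  let contents: string;
  try {
    contents = await fs.readFile(lockPath, "utf8");
  } catch {
    return false; // lock already gone
  }
  if (contents.trim() !== ownerToken) {
    return false; // someone else re-acquired the lock; leave it alone
  }
  await fs.unlink(lockPath);
  return true;
}

// Usage against a temp file.
const lockPath = path.join(await fs.mkdtemp(path.join(os.tmpdir(), "lock-")), "x.lock");
await fs.writeFile(lockPath, "pid:1234:abc");
console.log(await releaseLockIfOwned(lockPath, "pid:9999:zzz")); // false: not ours
console.log(await releaseLockIfOwned(lockPath, "pid:1234:abc")); // true: ours, removed
```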
Actionable comments posted: 29
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
test/server.unit.test.ts (1)
328-350: 🧹 Nitpick | 🔵 Trivial

**wrap timer cleanup in try/finally for determinism.**

if an assertion fails before line 349, fake timers stay active and can bleed into subsequent tests. the new close-abort test at `test/server.unit.test.ts:303-326` uses try/finally correctly—this one should too.

suggested fix

```diff
 it('should return null after 5 minute timeout', async () => {
   vi.useFakeTimers();
-
-  (mockServer.listen as ReturnType<typeof vi.fn>).mockImplementation(
-    (_port: number, _host: string, callback: () => void) => {
-      callback();
-      return mockServer;
-    }
-  );
-  (mockServer.on as ReturnType<typeof vi.fn>).mockReturnValue(mockServer);
-
-  const result = await startLocalOAuthServer({ state: 'test-state' });
-
-  const codePromise = result.waitForCode();
-
-  await vi.advanceTimersByTimeAsync(5 * 60 * 1000 + 100);
-
-  const code = await codePromise;
-  expect(code).toBeNull();
-  expect(logWarn).toHaveBeenCalledWith('OAuth poll timeout after 5 minutes');
-
-  vi.useRealTimers();
+  try {
+    (mockServer.listen as ReturnType<typeof vi.fn>).mockImplementation(
+      (_port: number, _host: string, callback: () => void) => {
+        callback();
+        return mockServer;
+      }
+    );
+    (mockServer.on as ReturnType<typeof vi.fn>).mockReturnValue(mockServer);
+
+    const result = await startLocalOAuthServer({ state: 'test-state' });
+
+    const codePromise = result.waitForCode();
+
+    await vi.advanceTimersByTimeAsync(5 * 60 * 1000 + 100);
+
+    const code = await codePromise;
+    expect(code).toBeNull();
+    expect(logWarn).toHaveBeenCalledWith('OAuth poll timeout after 5 minutes');
+  } finally {
+    vi.useRealTimers();
+  }
 });
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/server.unit.test.ts` around lines 328 - 350, The test "should return null after 5 minute timeout" uses vi.useFakeTimers() but doesn't guarantee restoring real timers on failure; wrap the fake-timer setup and all test logic in a try/finally so vi.useRealTimers() is always called. Specifically, in the test block containing startLocalOAuthServer, result.waitForCode(), vi.advanceTimersByTimeAsync, and assertions, move vi.useRealTimers() into a finally clause to ensure cleanup even if an assertion throws; reference the test name, vi.useFakeTimers/vi.useRealTimers, startLocalOAuthServer, and result.waitForCode to locate the code to change.

index.ts (1)
797-857: ⚠️ Potential issue | 🟠 Major

**add deterministic vitest coverage for the concurrent hydration worker path.**

`index.ts:797-857` introduces concurrent workers with shared progress/counter state (nextHydrateIndex, refreshed/failed/updated counters), but no corresponding deterministic regression is included in the changed `test/**` files. add a test that forces mixed refresh outcomes and asserts each account is processed once with stable aggregate counts.

As per coding guidelines, `test/**`: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Also applies to: 859-880

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@index.ts` around lines 797 - 857, The concurrent worker loop using shared state (workerCount, nextHydrateIndex, queuedRefresh, accountsToHydrate, refreshedCount, failedCount, updatedAccountCount) needs a deterministic vitest regression: add a test in test/** that sets HYDRATE_EMAILS_MAX_CONCURRENCY to a known value, provides a fixed accountsToHydrate array, and mocks queuedRefresh to return a controlled sequence of mixed outcomes (success with varied access/refresh/expires, failure) while ensuring each mock call is deterministic (use a seeded sequence or a controlled promise queue rather than timing/flaky delays). The test must assert each account was processed exactly once and that refreshedCount/failedCount/updatedAccountCount match expected values, and must avoid mocking real secrets or skipping assertions.

docs/reference/settings.md (1)
184-192: ⚠️ Potential issue | 🟠 Major

**validation commands are stale relative to current verification gates.**

`docs/reference/settings.md:184-192` still shows only runtime cli checks. this section should include the current verification gates and npm scripts used by the hardening rollout, plus an upgrade-note pointer.

proposed docs update

```diff
 ## Validation

 After changes:

-codex auth status
-codex auth check
-codex auth forecast --live
+npm run verify
+npm run verify:ci
+npm test -- test/documentation.test.ts
+# optional local checks
+npm run doctor:dev
+npm run setup:dev
```

As per coding guidelines `docs/**`: keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@docs/reference/settings.md` around lines 184 - 192, Update the Validation
section to replace the stale CLI-only checks with the current verification gates
and npm scripts: list "npm run verify", "npm run verify:ci", and "npm test --
test/documentation.test.ts" as the primary gates, and add optional local checks
"npm run doctor:dev" and "npm run setup:dev"; also add a brief upgrade-note
pointer to the release/upgrade documentation indicating these new scripts and
any changed CLI flags (ensure the section references the exact script names and
the documentation test file to match current workflows).</details> </blockquote></details> <details> <summary>lib/auth/auth.ts (1)</summary><blockquote> `234-253`: _⚠️ Potential issue_ | _🟠 Major_ **normalize refresh timeout input before calling `fetchwithtimeout`** `lib/auth/auth.ts:253` (line 253) uses `options.timeoutMs ?? OAUTH_REFRESH_TIMEOUT_MS`, so `0`, negative values, and `NaN` can bypass sane bounds. this is inconsistent with `lib/auth/auth.ts:121-126` (line 121-126), where exchange timeout is clamped and validated. <details> <summary>proposed fix</summary> ```diff type RefreshAccessTokenOptions = { signal?: AbortSignal; timeoutMs?: number; }; +function resolveRefreshTimeoutMs(timeoutMs: number | undefined): number { + if (typeof timeoutMs !== "number" || !Number.isFinite(timeoutMs)) { + return OAUTH_REFRESH_TIMEOUT_MS; + } + return Math.max(1_000, Math.floor(timeoutMs)); +} + export async function refreshAccessToken( refreshToken: string, options: RefreshAccessTokenOptions = {}, ): Promise<TokenResult> { try { const response = await fetchWithTimeout(TOKEN_URL, { @@ - }, options.timeoutMs ?? OAUTH_REFRESH_TIMEOUT_MS); + }, resolveRefreshTimeoutMs(options.timeoutMs));please add vitest coverage for
`timeoutMs` values `0`, `-1`, and `NaN` in the refresh path. As per coding guidelines `lib/**`: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@lib/auth/auth.ts` around lines 234 - 253, The refreshAccessToken function passes options.timeoutMs directly to fetchWithTimeout allowing 0, negative, or NaN to bypass the default; normalize and clamp timeoutMs before the fetch: read RefreshAccessTokenOptions.timeoutMs, coerce to a finite number, and if it's NaN or <= 0 fall back to OAUTH_REFRESH_TIMEOUT_MS (or apply the same min/max clamp logic used in the exchange path), then pass that normalized value to fetchWithTimeout; update refreshAccessToken and related references (refreshAccessToken, RefreshAccessTokenOptions, fetchWithTimeout, OAUTH_REFRESH_TIMEOUT_MS) accordingly and add Vitest cases asserting behavior for timeoutMs values 0, -1, and NaN in the refresh path, ensuring tests avoid logging tokens/emails and reference any changed tests in the commit message.

lib/storage.ts (1)
1118-1174: ⚠️ Potential issue | 🟠 Major

**known revision is overwritten with a stale checksum after migration**

`lib/storage.ts:1123-1129` can persist migrated content, but `lib/storage.ts:1173` still stores `rawChecksum` from the pre-migration file. then `saveAccountsUnlocked` conflict checks at `lib/storage.ts:1258-1273` can raise false `ECONFLICT` even with no concurrent writer.

proposed fix

```diff
-  const { normalized, storedVersion, schemaErrors, rawChecksum } = await loadAccountsFromPath(path);
+  const { normalized, storedVersion, schemaErrors, rawChecksum } = await loadAccountsFromPath(path);
+  let revisionToRemember = rawChecksum;
@@
   if (persistMigration) {
     try {
       await persistMigration(normalized);
+      revisionToRemember = (await readStorageRevision(path)) ?? rawChecksum;
     } catch (saveError) {
       log.warn("Failed to persist migrated storage", { error: String(saveError) });
     }
   }
@@
-  rememberKnownStorageRevision(path, rawChecksum);
+  rememberKnownStorageRevision(path, revisionToRemember);
```

please add a vitest regression for `v1 -> v3` auto-migration followed by immediate save to confirm no false conflict is raised. As per coding guidelines `lib/**`: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/storage.ts` around lines 1118 - 1174, The migration path currently persists migrated data (persistMigration) but still calls rememberKnownStorageRevision(path, rawChecksum) using the pre-migration rawChecksum, causing false ECONFLICTs in saveAccountsUnlocked; fix by updating the stored checksum after a successful persistence: after persistMigration(normalized) either compute the actual on-disk checksum for the migrated content or reload the written file (e.g., via loadAccountsFromPath(path)) to obtain the new rawChecksum and pass that value into rememberKnownStorageRevision instead of the original rawChecksum; update code paths around persistMigration, the promotion branch, and the final rememberKnownStorageRevision call accordingly (functions/methods: persistMigration, loadAccountsFromPath, rememberKnownStorageRevision, saveAccountsUnlocked), and add a vitest regression that performs a v1->v3 auto-migration then immediately calls saveAccountsUnlocked to assert no ECONFLICT is thrown.
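The stale-checksum issue above is an optimistic-concurrency problem: the remembered revision must describe the bytes actually on disk after migration. A self-contained sketch of checksum-based conflict detection (hypothetical helper names, not the plugin's `rememberKnownStorageRevision` internals):

```typescript
import { createHash } from "node:crypto";

// Sketch of checksum-based conflict detection: a save is rejected when the
// on-disk content no longer matches the revision we last observed.
const known = new Map<string, string>();

function checksum(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

function rememberRevision(file: string, content: string): void {
  known.set(file, checksum(content));
}

function assertNoConflict(file: string, onDisk: string): void {
  const expected = known.get(file);
  if (expected !== undefined && expected !== checksum(onDisk)) {
    throw Object.assign(new Error("concurrent modification"), { code: "ECONFLICT" });
  }
}

// If migration rewrites the file but we remember the PRE-migration checksum,
// the very next save sees a "conflict" with no concurrent writer: the bug
// described in the comment above.
const original = '{"version":1}';
const migrated = '{"version":3}';
rememberRevision("accounts.json", original); // stale: the file now holds `migrated`
let conflict = false;
try {
  assertNoConflict("accounts.json", migrated);
} catch {
  conflict = true;
}
console.log(conflict); // true: false positive until the post-migration checksum is remembered
rememberRevision("accounts.json", migrated);
assertNoConflict("accounts.json", migrated); // passes
```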
♻️ Duplicate comments (8)
test/utils.test.ts (1)
194-236: ⚠️ Potential issue | 🟠 Major

**add timeout-path regression coverage for `fetchWithTimeout`.**

Line 199 and Line 216 validate caller-abort behavior, but there is still no test for the hard-timeout abort path implemented in `lib/utils.ts:84-89` and `lib/utils.ts:123-125`. this leaves a concurrency-sensitive abort classification path unguarded.

As per coding guidelines `test/**`: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

proposed test addition

```diff
 describe('fetchWithTimeout', () => {
   afterEach(() => {
     vi.unstubAllGlobals();
   });
+
+  it('normalizes hard timeout to AbortError shape', async () => {
+    vi.useFakeTimers();
+    try {
+      vi.stubGlobal(
+        'fetch',
+        vi.fn((_input: unknown, init?: RequestInit) =>
+          new Promise((_, reject) => {
+            init?.signal?.addEventListener(
+              'abort',
+              () => reject(init.signal?.reason),
+              { once: true },
+            );
+          }),
+        ),
+      );
+      const request = fetchWithTimeout('https://example.com', {}, 1_000);
+      await vi.advanceTimersByTimeAsync(1_000);
+      await expect(request).rejects.toMatchObject({
+        name: 'AbortError',
+        code: 'ABORT_ERR',
+        message: 'Fetch timeout after 1000ms',
+      });
+    } finally {
+      vi.useRealTimers();
+    }
+  });
```
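The hard-timeout path under test here normalizes an internal timeout into the same `AbortError` shape callers already handle. A standalone sketch of that normalization, assuming the error fields shown in the proposed test (this is not the plugin's actual `fetchWithTimeout`):

```typescript
// Sketch: normalize a hard timeout into a stable AbortError shape so callers
// can classify it the same way as caller-initiated aborts. The field values
// mirror the shape asserted in the proposed test; names are assumptions.
function makeTimeoutAbortError(timeoutMs: number): Error & { code: string } {
  const error = new Error(`Fetch timeout after ${timeoutMs}ms`) as Error & { code: string };
  error.name = "AbortError";
  error.code = "ABORT_ERR";
  return error;
}

async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(makeTimeoutAbortError(timeoutMs)), timeoutMs);
  try {
    return await work(controller.signal);
  } catch (err) {
    // Surface the normalized abort reason rather than whatever the runtime threw.
    if (controller.signal.aborted) throw controller.signal.reason;
    throw err;
  } finally {
    clearTimeout(timer);
  }
}

// Usage: a never-resolving task is classified as a timeout AbortError.
const hang = (signal: AbortSignal) =>
  new Promise<never>((_, reject) =>
    signal.addEventListener("abort", () => reject(signal.reason), { once: true }),
  );
try {
  await withTimeout(hang, 20);
} catch (err) {
  const e = err as Error & { code?: string };
  console.log(e.name, e.code); // AbortError ABORT_ERR
}
```

Passing the error object as the `abort()` reason (Node 17.3+) is what keeps the rejection shape deterministic for assertions like the `toMatchObject` above.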
98-100: ⚠️ Potential issue | 🟠 Major

**invalid ttl fallback to `1ms` weakens replay protection under misconfiguration.**

Line 98 in `lib/idempotency.ts:98` maps non-finite/invalid ttl to `1`. that can effectively disable idempotency after ~1ms if caller input is bad. default to `IDEMPOTENCY_TTL_MS` for fail-safe behavior.

proposed fix

```diff
 function normalizeTtlMs(ttlMs: number): number {
-  return Number.isFinite(ttlMs) && ttlMs > 0 ? Math.floor(ttlMs) : 1;
+  if (Number.isFinite(ttlMs) && ttlMs > 0) {
+    return Math.max(1, Math.floor(ttlMs));
+  }
+  return IDEMPOTENCY_TTL_MS;
 }
```

As per coding guidelines `lib/**`: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@lib/idempotency.ts` around lines 98 - 100, normalizeTtlMs currently maps invalid/non-finite TTLs to 1ms which weakens replay protection; change normalizeTtlMs to return the configured fail-safe constant IDEMPOTENCY_TTL_MS when ttlMs is not a positive finite number (keep Math.floor for valid values), update the function reference in lib/idempotency.ts to import/use IDEMPOTENCY_TTL_MS, and adjust/add vitest unit tests that assert fallback behavior (ensure tests cover non-finite, negative, and NaN inputs) so the default is IDEMPOTENCY_TTL_MS rather than 1.

test/network.test.ts (1)
89-106: ⚠️ Potential issue | 🟡 Minor

**strengthen the 429 retry regression with callback payload assertions.**

Line 89 in `test/network.test.ts:89` covers 429 retry count, but it does not assert `onRetry` metadata. this can miss regressions in retry reason/status reporting while still passing.

proposed test tightening

```diff
 it("retries on HTTP 429 when configured", async () => {
+  const onRetry = vi.fn();
   fetchMock
     .mockResolvedValueOnce(new Response("rate-limited", { status: 429 }))
     .mockResolvedValueOnce(new Response("ok", { status: 200 }));
   const resultPromise = fetchWithTimeoutAndRetry("https://example.com", undefined, {
     timeoutMs: 2_000,
     retries: 1,
     baseDelayMs: 25,
     jitterMs: 0,
     retryOnStatuses: [429],
+    onRetry,
   });
   await vi.runAllTimersAsync();
   const result = await resultPromise;
   expect(result.attempts).toBe(2);
   expect(fetchMock).toHaveBeenCalledTimes(2);
+  expect(onRetry).toHaveBeenCalledWith(
+    expect.objectContaining({
+      reason: "status",
+      status: 429,
+      attempt: 1,
+      maxAttempts: 2,
+    }),
+  );
 });
```

As per coding guidelines `test/**`: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/network.test.ts` around lines 89 - 106, The test currently only asserts retry counts but not the onRetry callback payload; update the test for fetchWithTimeoutAndRetry to pass a mocked onRetry handler (e.g., vi.fn()) and assert it was called once with the expected metadata (attempt number 1, the retry reason/status indicating 429 or a response object with status 429, and the request URL and delay values), in addition to the existing attempts/fetch call count assertions so the retry reason/status reporting is validated deterministically.

lib/authorization.ts (1)
30-36:⚠️ Potential issue | 🔴 Criticaldefault role still "admin" when env absent - pr objectives flag this as critical.
lib/authorization.ts:31defaults to"admin"whenCODEX_AUTH_ROLEis unset. pr objectives explicitly state: "default role fallback ... currently defaults to 'admin' when env absent; should be 'viewer'". the past review addressed invalid values falling back to "viewer" (line 35), but the absent-env case still grants admin privileges.🔧 proposed fix
```diff
 function getRoleFromEnv(): AuthRole {
-  const raw = (process.env.CODEX_AUTH_ROLE ?? "admin").trim().toLowerCase();
+  const raw = (process.env.CODEX_AUTH_ROLE ?? "").trim().toLowerCase();
   if (raw === "operator" || raw === "viewer" || raw === "admin") {
     return raw;
   }
   return "viewer";
 }
```

This change means both absent env and invalid env fall back to "viewer", which is least-privilege. Update `test/authorization.test.ts:39-52` to expect `"viewer"` instead of `"admin"` when env is deleted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/authorization.ts` around lines 30 - 36: the getRoleFromEnv function currently defaults to "admin" when CODEX_AUTH_ROLE is unset; change it so the default fallback is "viewer" (least-privilege) and preserve the existing validation that invalid values map to "viewer"; update the logic in getRoleFromEnv to initialize raw to "viewer" (or check for undefined and return "viewer") instead of "admin", and update the test expectations in test/authorization.test.ts (lines checking behavior when env is deleted) to expect "viewer" rather than "admin".

test/data-retention.test.ts (1)
80-114: ⚠️ Potential issue | 🟡 Minor

Add an `EBUSY` case to complete Windows prune regression coverage.

Line 80 currently validates only `EPERM`. Please include an `EBUSY` variant (or parameterize both) so Windows transient fs behavior is fully pinned in `test/data-retention.test.ts:80` against `lib/data-retention.ts:95`.

Proposed parameterized variant
```diff
-it("rethrows non-ENOENT errors during recursive prune", async () => {
+it.each(["EPERM", "EBUSY"] as const)(
+  "rethrows non-ENOENT errors during recursive prune (%s)",
+  async (code) => {
   const { enforceDataRetention } = await import("../lib/data-retention.js");
   // ...
   unlinkSpy.mockImplementation(async (...args) => {
     const path = args[0];
     if (typeof path === "string" && path.endsWith("locked.log")) {
       const error = new Error("locked") as NodeJS.ErrnoException;
-      error.code = "EPERM";
+      error.code = code;
       throw error;
     }
     return originalUnlink(...(args as Parameters<typeof fs.unlink>));
   });
   // ...
-  ).rejects.toMatchObject({ code: "EPERM" });
+  ).rejects.toMatchObject({ code });
   // ...
-});
+});
```

As per coding guidelines, "`test/**`: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/data-retention.test.ts` around lines 80 - 114: the test only asserts that enforceDataRetention throws an EPERM error for the locked file; add coverage for the Windows transient case by including an EBUSY variant (or parameterize to test both "EPERM" and "EBUSY") in the mocked fs.unlink behavior inside the "rethrows non-ENOENT errors during recursive prune" test, so the mockImplementation throws a NodeJS.ErrnoException with code set to "EBUSY" for the locked.log path in addition to (or as a separate subtest from) "EPERM", and update the expect(...).rejects.toMatchObject to assert the corresponding error.code for each variant while restoring the spy in the finally block.

lib/file-lock.ts (2)
184-187: ⚠️ Potential issue | 🟠 Major

`released` is marked true before cleanup succeeds, blocking retry on transient failures.

If `releaseOwnedLock` or `releaseOwnedLockSync` throws, callers cannot retry release because the handle is already marked released. This can strand lock files until stale eviction.

Proposed fix
```diff
 release: async () => {
   if (released) return;
-  released = true;
-  await releaseOwnedLock(path, metadata.ownerId);
+  await releaseOwnedLock(path, metadata.ownerId);
+  released = true;
 },
```

```diff
 release: () => {
   if (released) return;
-  released = true;
-  releaseOwnedLockSync(path, metadata.ownerId);
+  releaseOwnedLockSync(path, metadata.ownerId);
+  released = true;
 },
```

As per coding guidelines
"`lib/**`: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails."

Also applies to: 265-268
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/file-lock.ts` around lines 184 - 187: the current release flow marks the local flag released = true before calling releaseOwnedLock/releaseOwnedLockSync, which prevents retries if those functions throw; change the logic so released is only set to true after the release call completes successfully (i.e., call await releaseOwnedLock(path, metadata.ownerId) inside a try/catch and set released = true in the try after success, rethrow or handle errors without flipping the flag), and apply the same pattern for the sync variant; update/add vitest cases that simulate transient failures (EBUSY/429) to assert retries and eventual cleanup, and ensure any added logging around failures does not leak tokens or emails.
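The ordering the prompt asks for can be sketched as a small factory; `releaseOwnedLock` here is a stand-in for the real helper, and the names are illustrative rather than the plugin's actual API. The key point is that the `released` flag only flips after cleanup succeeds, so a failed release stays retryable:

```typescript
// Sketch only: `releaseOwnedLock` is an injected stand-in for the real lock helper.
export function makeReleaseHandle(releaseOwnedLock: () => Promise<void>) {
  let released = false;
  return {
    release: async () => {
      if (released) return;
      // May throw on transient failures (e.g. EBUSY); the flag stays false
      // so the caller can retry instead of stranding the lock file.
      await releaseOwnedLock();
      released = true;
    },
  };
}
```

After a successful release, subsequent calls are no-ops, which preserves the original idempotency guarantee.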
94-117: ⚠️ Potential issue | 🔴 Critical

Owner verification is still exposed to a read-then-unlink race.
The current flow reads the owner id, then unlinks by path. Under stale-lock eviction plus rapid reacquire, a late releaser can still remove a newer lock file between those steps. This needs an atomic ownership release model and a regression in `test/file-lock.test.ts` that interleaves stale eviction/reacquire with delayed release.

As per coding guidelines
"`lib/**`: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails."

Also applies to: 119-142
test/plugin-config.test.ts (1)
1052-1073: ⚠️ Potential issue | 🟡 Minor

Absolute ceiling clamp coverage still misses pure config-value bounds.
These tests validate env clamp behavior, but they still do not assert negative/oversized values coming from `pluginConfig` when env is unset (`lib/config.ts:731-738` handles both paths). Add two config-path cases to lock the getter behavior.

Proposed tests
```diff
+  it('should clamp config retry-all absolute ceiling to non-negative values', () => {
+    delete process.env.CODEX_AUTH_RETRY_ALL_ABSOLUTE_CEILING_MS;
+    expect(
+      getRetryAllAccountsAbsoluteCeilingMs({ retryAllAccountsAbsoluteCeilingMs: -10 }),
+    ).toBe(0);
+  });
+
+  it('should clamp config retry-all absolute ceiling to 24h hard maximum', () => {
+    delete process.env.CODEX_AUTH_RETRY_ALL_ABSOLUTE_CEILING_MS;
+    expect(
+      getRetryAllAccountsAbsoluteCeilingMs({
+        retryAllAccountsAbsoluteCeilingMs: 999_999_999,
+      }),
+    ).toBe(24 * 60 * 60_000);
+  });
```

As per coding guidelines
"`test/**`: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/plugin-config.test.ts` around lines 1052 - 1073, Add two unit tests that cover the config-path clamping for getRetryAllAccountsAbsoluteCeilingMs when the environment var is unset: one that sets PluginConfig.retryAllAccountsAbsoluteCeilingMs to a negative value and asserts the getter returns 0, and another that sets it to a very large value and asserts the getter returns the 24-hour hard maximum; ensure both tests clear CODEX_AUTH_RETRY_ALL_ABSOLUTE_CEILING_MS (or leave it unset) so they exercise the config branch in the getter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/supply-chain.yml:
- Line 22: Replace all floating action tags like actions/checkout@v4 in the
workflow with their corresponding immutable commit SHAs; specifically locate
occurrences of actions/checkout@v4 (and any other actions referenced with `@v4` in
the same file) and update each tag to the full commit SHA (e.g.,
actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5) so every use in the
file is pinned to an exact commit; ensure you update all instances mentioned
(the entries at the same positions as actions/checkout@v4 and the other `@v4`
refs) and run a quick YAML validation after editing.
In `@docs/development/CONFIG_FIELDS.md`:
- Around line 197-214: The docs table of env overrides and the plugin field
inventory are missing the telemetry entries; add the
`CODEX_AUTH_TELEMETRY_ENABLED` row to the env variable table in CONFIG_FIELDS
and include `telemetryEnabled` in the plugin/field inventory so documentation
matches runtime/tests (the tests reference telemetry usage). Update the docs
entry text to briefly describe the toggle (boolean enabling/disabling telemetry)
and ensure naming matches the code symbols `CODEX_AUTH_TELEMETRY_ENABLED` and
`telemetryEnabled`.
In `@docs/runbooks/operations.md`:
- Around line 57-66: The pre-release checklist in docs/runbooks/operations.md is
missing two validation gates; update the checklist to include explicit entries
for the npm scripts `npm run verify` and `npm test --
test/documentation.test.ts` so operators run the full gate set, i.e., insert
both commands into the numbered list alongside `npm run verify:ci`/other checks
and ensure wording matches the rest of the list for consistency with documented
CLI/workflows.
In `@index.ts`:
- Around line 1784-1794: The auditLog calls (e.g., the invocation using
AuditAction.REQUEST_START and the actor argument currently set to account.email)
are persisting raw emails; change these audit actor values to a non-identifying
key instead (for example a hashed or formatted identifier derived from
account.index or account.id) and remove account.email from the persistent
metadata. Update all occurrences that pass account.email to auditLog (including
the calls using account.index + 1 and the metadata object with
accountIndex/model/modelFamily) to pass a stable non-PII actor string (e.g.,
`account-${account.index + 1}` or a hash of account.id) and ensure any email is
only used transiently (not stored) or omitted; preserve other metadata fields
(model, accountIndex, modelFamily) as before. Ensure tests referencing audit
records (vitest) are updated to expect the non-PII actor key.
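One way to build the non-PII actor string the prompt describes is to combine the 1-based account position with a short hash of an internal id. This is a hypothetical helper, not the plugin's real API; `accountId` and `auditActorKey` are illustrative names:

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a stable, non-identifying audit actor key
// from internal account fields so no email reaches persisted audit records.
export function auditActorKey(accountId: string, accountIndex: number): string {
  const digest = createHash("sha256").update(accountId).digest("hex").slice(0, 12);
  return `account-${accountIndex + 1}-${digest}`;
}
```

The hash keeps the key stable across reorderings of the account list while remaining unlinkable to the email without the internal id.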
In `@lib/accounts.ts`:
- Around line 846-856: The merge-on-conflict path currently uses
clampNonNegativeInt on latest.activeIndexByFamily values so a stale too-large
index can survive; update the merge logic in lib/accounts.ts where
mergedActiveIndexByFamily and activeIndex/activeIndexByFamily are computed to
additionally clamp each family index against the mergedAccounts.length-1 (i.e.,
min(currentIndex, mergedAccounts.length-1) after ensuring non-negative) so
indices never exceed the merged account bounds; then add a vitest regression
that simulates an EBUSY/ECONFLICT recovery merge where latest has fewer accounts
than stored family indices and assert for mergedActiveIndexByFamily and
activeIndex that every persisted index ∈ [0, accounts.length-1], and reference
the changed symbols clampNonNegativeInt, mergedActiveIndexByFamily, activeIndex,
and activeIndexByFamily in the test description.
In `@lib/accounts/account-view.ts`:
- Around line 23-24: The clamp currently returns fractional indexes; change
normalization in lib/accounts/account-view.ts to convert rawCandidate to an
integer (e.g., use Math.floor or Math.trunc on rawCandidate) before clamping so
the returned active index is an integer, and update the logic that computes the
active index (look for the variable rawCandidate / raw and the function
producing activeIndexByFamily.codex) to use the integer-normalized value; then
add a vitest regression in test/account-view.test.ts that sets
activeIndexByFamily.codex to a fractional value like 1.8 and asserts the
resolved active index equals 1 to cover this case.
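The normalization being requested can be sketched as a pure function, assuming a truncate-then-clamp order (names are illustrative, not the exact symbols in `lib/accounts/account-view.ts`):

```typescript
// Sketch: truncate to an integer first, then clamp into [0, accountCount - 1].
export function normalizeActiveIndex(rawCandidate: unknown, accountCount: number): number {
  const truncated =
    typeof rawCandidate === "number" && Number.isFinite(rawCandidate)
      ? Math.trunc(rawCandidate)
      : 0;
  if (accountCount <= 0) return 0;
  return Math.min(Math.max(truncated, 0), accountCount - 1);
}
```

Under this sketch the fractional case from the review (`1.8` with three accounts) resolves to `1`, matching the regression the prompt asks for.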
In `@lib/codex-manager.ts`:
- Around line 4255-4268: The failure branch that prints the rotate-secrets JSON
currently logs raw error and idempotencyKey; wrap the output object with
maybeRedactJsonOutput (or run its fields through the redaction helper) before
JSON.stringify in the rotate-secrets error path so error and idempotencyKey are
redacted; update the code around the console.log that emits the { command:
"rotate-secrets", ..., error: message, ...(idempotencyKey ? { idempotencyKey } :
{}) } object to call maybeRedactJsonOutput(...) and then JSON.stringify that
result. Also add a vitest assertion in test/codex-manager-cli.test.ts that
running rotate-secrets --json on the failure path produces redacted output
(e.g., contains the redact marker or does not contain the original
error/idempotencyKey values).
- Around line 1340-1343: checkAuthRateLimit can throw inside the oauth login
path causing runOAuthFlow to propagate exceptions; wrap the pre-checks
(checkAuthRateLimit and recordAuthAttempt) in runOAuthFlow with a try/catch that
converts rate-limit/EBUSY errors (e.g. 429 or filesystem EBUSY) into a valid
failed TokenResult instead of throwing, leaving loadPluginConfig unchanged; then
add a vitest regression in test/codex-manager-cli.test.ts that stubs
checkAuthRateLimit to throw a 429-like error and asserts the CLI path (which
calls runOAuthFlow) returns the expected failed auth TokenResult (and document
the new test referencing EBUSY/429 handling).
In `@lib/config.ts`:
- Around line 306-319: Add vitest tests for readFileSyncWithRetry that mock
isRetryableFsError to return true and stub/spy on sleepSync to assert it is
called with CONFIG_READ_RETRY_BASE_DELAY_MS * 2**attempt for each retry attempt,
and assert retry stops and the original error is thrown after
CONFIG_READ_MAX_ATTEMPTS; include a separate test verifying cumulative blocking
time equals the sum of backoff delays (document the computed max cumulative
block time) and mark it acceptable for startup paths. Reference the
functions/symbols sleepSync, readFileSyncWithRetry, isRetryableFsError,
CONFIG_READ_MAX_ATTEMPTS, and CONFIG_READ_RETRY_BASE_DELAY_MS when locating code
to test; keep tests in the vitest suite (e.g., test/config.test.ts) and
restore/unmock original implementations after each test.
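The delay schedule the prompt asks the tests to assert follows directly from the `base * 2**attempt` formula. A sketch under assumed constants (the real values live in `lib/config.ts`):

```typescript
// Illustrative constants; the real values are defined in lib/config.ts.
const CONFIG_READ_MAX_ATTEMPTS = 4;
const CONFIG_READ_RETRY_BASE_DELAY_MS = 25;

// Delays applied before retry attempts 1..maxAttempts-1.
export function backoffSchedule(maxAttempts: number, baseDelayMs: number): number[] {
  return Array.from(
    { length: Math.max(maxAttempts - 1, 0) },
    (_, attempt) => baseDelayMs * 2 ** attempt,
  );
}

// Worst-case cumulative blocking time, the figure the test should document.
export const maxCumulativeBlockMs = backoffSchedule(
  CONFIG_READ_MAX_ATTEMPTS,
  CONFIG_READ_RETRY_BASE_DELAY_MS,
).reduce((sum, delay) => sum + delay, 0);
```

With these assumed constants the schedule is 25, 50, 100 ms, so the worst case blocks for 175 ms total, which is the kind of bound the test can assert and document.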
In `@lib/data-retention.ts`:
- Around line 57-75: The pruneDirectoryByAge flow currently does a
check-then-delete on directories (pruneDirectoryByAge + fs.rmdir) and rethrows
non-ENOENT errors, which causes flaky failures when rmdir races with concurrent
writers and raises ENOTEMPTY/EBUSY; update the deletion logic to catch ENOTEMPTY
and EBUSY, retry rmdir a few times with short backoff (e.g., exponential/backoff
up to N attempts) before giving up, ensuring removed count and recursion still
behave correctly; update the catch block that currently only treats ENOENT
specially to also swallow ENOTEMPTY/EBUSY until retries are exhausted and only
rethrow other errors; add a vitest regression that mocks/simulates concurrent
repopulation and a windows-style EBUSY from fs.rmdir (e.g., first call throws
EBUSY, subsequent call succeeds, and another test where a concurrent writer
re-adds a file so rmdir ultimately fails) to assert pruneDirectoryByAge handles
retries and doesn't abort retention, and ensure tests verify no logs emit
sensitive tokens/emails.
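The retry-on-transient-code pattern the prompt describes can be sketched generically; `retryOnCodes` is a hypothetical helper (the real change would live inside pruneDirectoryByAge), and `sleep` is injected so vitest fake timers can drive it deterministically:

```typescript
type ErrnoLike = { code?: string };

// Sketch: retry an fs operation while it fails with a retryable errno
// (e.g. ENOTEMPTY/EBUSY from rmdir), backing off exponentially, and
// rethrow once attempts are exhausted or the code is not retryable.
export async function retryOnCodes<T>(
  op: () => Promise<T>,
  retryableCodes: ReadonlySet<string>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void>,
  baseDelayMs = 10,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (error) {
      const code = (error as ErrnoLike).code;
      if (attempt + 1 >= maxAttempts || code === undefined || !retryableCodes.has(code)) {
        throw error;
      }
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

Wrapping only the `rmdir` call this way lets the prune keep its removed-count and recursion behavior while absorbing races with concurrent writers.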
In `@lib/request/fetch-helpers.ts`:
- Around line 667-673: handleSuccessResponseDetailed returns converted.response
from convertSseToJsonWithDetails, but convertSseToJsonWithDetails (in
response-handler.ts) currently maps SSE parse failures to the upstream
response.status which can be 200; update convertSseToJsonWithDetails so that
when it constructs a parse-failure response it sets status to 502 and a suitable
error body instead of using response.status, then ensure
handleSuccessResponseDetailed (in fetch-helpers.ts) continues to use
converted.response; add a vitest regression in test/fetch-helpers.test.ts that
sends a malformed SSE stream with an upstream 200 and asserts the final response
status is 502; mention the changed test in the commit message and, per lib/**
guidelines, ensure any new queue logic added/modified around this flow properly
handles EBUSY and 429 retry/backoff scenarios.
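The parse-failure mapping being requested can be sketched with the global `Response` (available in Node 18+); the error-body field names here are illustrative, not the module's actual contract:

```typescript
// Sketch: a malformed SSE body behind an upstream 200 becomes a 502 with a
// JSON error payload, instead of inheriting response.status from upstream.
export function sseParseFailureResponse(detail: string): Response {
  const body = JSON.stringify({
    error: { type: "upstream_parse_error", message: `SSE parse failure: ${detail}` },
  });
  return new Response(body, {
    status: 502,
    headers: { "content-type": "application/json" },
  });
}
```

502 is the conventional status for "upstream returned something unusable", which is why the regression asserts it even when the upstream status was 200.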
In `@lib/secrets-crypto.ts`:
- Around line 26-28: The deriveAesKey function currently calls scryptSync(input,
salt, AES_KEY_LENGTH) which blocks the event loop; update the code or docs:
either (a) replace scryptSync with the async scrypt and make deriveAesKey return
a Promise (and update its callers such as decryptSecret to await it), or (b) add
a clear comment above deriveAesKey documenting that scryptSync is blocking
(~50-100ms) and recommend using the async variant or caching derived keys for
hot paths; reference deriveAesKey, scryptSync, AES_KEY_LENGTH, and decryptSecret
in your change so reviewers can verify caller updates or the added
documentation.
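Option (a) can be sketched by promisifying the same scrypt call; `AES_KEY_LENGTH` is assumed to be 32 (AES-256) based on the review context, and callers like `decryptSecret` would need to await the result:

```typescript
import { scrypt } from "node:crypto";

const AES_KEY_LENGTH = 32; // assumed: 32-byte key for AES-256

// Non-blocking variant of the key derivation: same scrypt parameters,
// promisified so the ~50-100ms KDF does not stall the event loop.
export function deriveAesKeyAsync(input: string, salt: Buffer): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    scrypt(input, salt, AES_KEY_LENGTH, (err, key) => (err ? reject(err) : resolve(key)));
  });
}
```

If hot paths derive the same key repeatedly, caching the derived key by `(input, salt)` is the complementary mitigation the comment in option (b) would recommend.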
In `@lib/shutdown.ts`:
- Around line 31-33: The current runCleanup() directly awaits the shared promise
cleanupInFlight which lets later callers bypass their own timeout if the first
caller timed out; change the in-flight branch to wait with the same bounded
timeout logic used for fresh cleanups (i.e., race cleanupInFlight against the
existing timeout promise instead of awaiting it unconditionally), ensure
cleanupInFlight is left intact if it still runs, and update/add a vitest
regression in test/shutdown.test.ts that simulates a hanging cleanup where call
`#1` times out and verifies call `#2` also returns after its timeout; reference the
runCleanup function and the cleanupInFlight symbol in the changes and tests.
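The bounded-wait-on-shared-promise shape can be sketched as a standalone helper (names are illustrative; the real fix would live inside runCleanup around `cleanupInFlight`): every caller races the shared in-flight promise against its own timeout, so a second caller is not stuck behind a hung first cleanup.

```typescript
// Sketch: race shared in-flight work against a per-caller timeout, leaving
// the underlying promise intact if it is still running.
export async function awaitBounded<T>(
  work: Promise<T>,
  timeoutMs: number,
  onTimeout: T,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(onTimeout), timeoutMs);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // don't hold the event loop open after the race settles
  }
}
```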
In `@lib/storage.ts`:
- Around line 345-373: The current fallback releaseStorageLockFallback
unconditionally force-removes the lock file (called from withAccountFileLock
when lock.release() throws), which can delete a lock owned by another process;
change the fallback so it does NOT blindly unlink: instead, on release failure
read/inspect the lock metadata (e.g., the lock file contents or lock token
returned by acquireFileLock) and only remove the file if you can prove ownership
(or better, avoid removing at all and only log the failure), update
withAccountFileLock to pass any lock identity/token from acquireFileLock into
the fallback path so ownership can be verified, and ensure
releaseStorageLockFallback no longer calls fs.rm(..., {force:true}) without
ownership proof. Also add a vitest regression that mocks lock.release() throwing
and asserts fs.rm is not called (or that removal only occurs when ownership is
verified) and update tests to cover Windows/EBUSY-like behavior and logging
checks to avoid leaking sensitive info.
In `@lib/telemetry.ts`:
- Around line 82-85: Add Windows transient errno coverage by including "EACCES"
and "EAGAIN" in the RETRYABLE_FILE_OP_CODES Set (alongside existing "EBUSY",
"EPERM", "ENOTEMPTY") in lib/telemetry.ts so the existing retry plumbing that
uses FILE_OP_MAX_ATTEMPTS and FILE_OP_BASE_DELAY_MS will retry on common Windows
lock errors; update test/telemetry.test.ts to add cases that simulate EACCES and
EAGAIN during telemetry file write/rotation to assert retries occur and no data
is dropped, and ensure added tests do not log secrets (tokens/emails).
In `@lib/unified-settings.ts`:
- Around line 217-228: Add a brief inline comment above the retry loop that uses
Atomics.wait to explain that the synchronous sleep is intentionally blocking the
event loop because this code path uses fs.renameSync (renameSync) and callers
expect synchronous behavior; mention that this is deliberate for the sync path,
reference UNIFIED_SETTINGS_PATH/renameSync and the isRetryableFsError retry
logic, and note that developers should avoid converting this to async without
adjusting callers or behavior.
In `@scripts/benchmark-runtime-path.mjs`:
- Around line 36-50: The removeWithRetry function calls an undefined sleep,
causing ReferenceError on retry; fix by adding a proper async backoff helper
(e.g., implement or import a sleep/msDelay function that returns a Promise
resolved after given ms) and use that from removeWithRetry, ensuring the symbol
name used matches the call (sleep) or update the call to the actual helper name;
also add a unit/regression test that mocks rm to throw a retryable code
(EBUSY/EPERM/ENOTEMPTY) then succeeds and assert that removeWithRetry retries
(no exception) and that the delay function is invoked (or time advanced via fake
timers) to prevent future regressions.
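The missing helper is a one-liner; a minimal sketch of the promise-based `sleep` the retry path expects:

```typescript
// Promise-based delay used between remove retries in the benchmark script.
export function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

With fake timers in the regression test, `vi.advanceTimersByTimeAsync` can drive this without real waiting.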
In `@test/authorization.test.ts`:
- Around line 39-52: Update the test to match the new default role: when
deleting process.env.CODEX_AUTH_ROLE assert getAuthorizationRole() returns
"viewer" (not "admin") and assert authorizeAction("secrets:rotate").allowed is
false; keep the same setup/teardown around process.env but change the expected
role string and the expected authorization boolean to reflect the new default in
getAuthorizationRole and authorizeAction.
In `@test/background-jobs.test.ts`:
- Around line 26-49: Add a concurrent DLQ write regression test that ensures the
file-locking in runBackgroundJobWithRetry correctly serializes writes: import
runBackgroundJobWithRetry and getBackgroundJobDlqPath from
lib/background-jobs.js, spawn several parallel runBackgroundJobWithRetry calls
that each fail on first attempt (maxAttempts: 1) so they write to the same DLQ,
await Promise.all to finish, then read the DLQ with fs.readFile and assert the
number of lines equals the number of jobs; this will catch regressions in the
DLQ write serialization implemented in runBackgroundJobWithRetry.
In `@test/config-save.test.ts`:
- Around line 76-140: The test never asserts that the injected transient read
errors actually executed, so add an assertion that the helper variable
transientReadFailures hit the expected failure count (e.g., === 2) after calling
savePluginConfig to guarantee the readFileSync mock path ran; reference the
transientReadFailures variable and the mocked readFileSync (used when invoking
savePluginConfig) and place the expectation just after the try/finally block
(before reading/parsing the config) so the test fails if the contention branch
in savePluginConfig never executed.
In `@test/fetch-helpers.test.ts`:
- Around line 530-539: The test for SSE parse failures in detailed mode (the it
block using handleSuccessResponseDetailed) checks the error body but doesn't
assert the response status; update this test to also assert that
result.response.status (or the returned Response.status) equals 502 to ensure
parse failures map to HTTP 502 in the non-streaming path exercised by
handleSuccessResponseDetailed and the logic around the SSE parsing in the
fetch-helpers code (referenced at the parse/fallback handling near the code path
invoked by handleSuccessResponseDetailed). Ensure the assertion is added
alongside the existing body shape checks so the contract is locked.
In `@test/idempotency.test.ts`:
- Around line 97-111: The test for checkAndRecordIdempotencyKey currently only
verifies immediate replay and can pass with an unsafe 1ms fallback; update the
"normalizes non-finite ttl values to a safe minimum" test to advance fake time a
few milliseconds after the first successful record and then assert replay still
returns true. Concretely, after the first expect(...).resolves.toEqual({
replayed: false }) call, call vi.advanceTimersByTime(5) (or vi.setSystemTime(new
Date(Date.now() + 5))) and then call
checkAndRecordIdempotencyKey("codex.auth.rotate-secrets", "key-non-finite",
Number.NaN) again and assert .resolves.toEqual({ replayed: true }); keep
vi.useFakeTimers()/vi.useRealTimers() as-is.
In `@test/index-retry.test.ts`:
- Around line 334-347: The test currently simulates single-flight behavior
inside the mock (refreshAndUpdateTokenMock) by using refreshInFlight, which
hides whether the plugin under test actually dedups concurrent refreshes; remove
refreshInFlight from the mock so the mock simply calls refreshEndpoint() and
returns its value, then implement the single-flight expectation by invoking the
real integration seam (the plugin method that triggers refresh) concurrently in
the test and asserting refreshAndUpdateTokenMock was called only once (check
call counts and timing), and adjust the related assertions around the calls at
the integration boundaries referenced near refreshAndUpdateTokenMock,
refreshEndpoint and the concurrent-refresh test blocks (lines ~318 and ~370-380)
so the test proves plugin single-flight deterministically under Vitest.
In `@test/index.test.ts`:
- Around line 530-587: The tests set persistent mock implementations for
configModule.loadPluginConfig and configModule.getFetchTimeoutMs using
vi.mocked(...).mockReturnValue(...) which can leak between tests because the
suite only calls vi.clearAllMocks(); replace that with resetting mock
implementations between tests (e.g., call vi.resetAllMocks() or
vi.restoreAllMocks() in the shared beforeEach/afterEach) or switch those
specific mocks to mockImplementationOnce so each test provides its own return
value; ensure changes target the setup that currently calls vi.clearAllMocks()
and update tests referencing loadPluginConfig and getFetchTimeoutMs to rely on
per-test mock setups.
In `@test/refresh-guardian.test.ts`:
- Around line 75-79: The timing assertions in tests using
missingManagerGuardian.getStats() (and the other test locations) are too
permissive (>= 0) and should assert deterministic branch outcomes: update the
assertions for missingManagerGuardian and similar instances to expect exact runs
counts (e.g., 0 or the known countedRun value), exact maxTickDurationMs and
avgTickDurationMs (0 when no run), and assert cumulativeTickDurationMs equals 0
for the early-return manager-missing branch; for the path exercising the
finally-based timing update (the countedRun gating in lib/refresh-guardian.ts
around lines handling countedRun and the finally block), assert non-zero
positive durations and that avgTickDurationMs equals cumulativeTickDurationMs /
runs, and adjust the tests at the other referenced spots (lines noted in the
review) to mirror these precise checks so tests deterministically cover the two
branches (early return vs finally timing update).
In `@test/retry-governor.test.ts`:
- Around line 95-108: Add a test that passes all-negative values to
decideRetryAllAccountsRateLimited to ensure clampNonNegative is applied to every
numeric input before evaluation; specifically call
decideRetryAllAccountsRateLimited with negative accountCount, waitMs, maxWaitMs,
currentRetryCount, accumulatedWaitMs, and absoluteCeilingMs (keep maxRetries
positive) and assert the outcome matches the clamped result (e.g., accountCount
clamps to 0 → reason "no-accounts"). This verifies the clamp-then-evaluate path
and prevents regressions around negative-field handling in
decideRetryAllAccountsRateLimited and the clampNonNegative logic.
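The clamp-then-evaluate contract the test pins can be stated as a tiny pure function; this is a sketch of the assumed `clampNonNegative` semantics, not the library's verified source:

```typescript
// Assumed semantics: negative and non-finite inputs collapse to 0 before
// the rate-limit decision logic evaluates them.
export function clampNonNegative(value: number): number {
  return Number.isFinite(value) && value > 0 ? value : 0;
}
```

Under this contract, a negative `accountCount` clamps to 0, which is why the expected outcome in the proposed test is the "no-accounts" reason.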
In `@test/settings-hub-utils.test.ts`:
- Around line 308-316: The test currently asserts the object returned by
persistBackendConfigSelection but that helper can return the input fallback on
failure (see persistBackendConfigSelection in settings-hub.ts), so change the
test to verify on-disk persistence: after calling
persistBackendConfigSelection(selected, "backend"), reload the stored backend
config from the real persistence layer (e.g., via the test API's read/load
method or by reading the config file used by the settings hub) and assert that
retryAllAccountsAbsoluteCeilingMs is 0 in the reloaded data; this ensures
failures in persistBackendConfigSelection don't make the test falsely pass.
In `@test/telemetry.test.ts`:
- Around line 220-242: The test "serializes concurrent telemetry writes without
dropping events" can miss events due to log rotation; update the test to read
events via queryTelemetryEvents (the helper in lib/telemetry.ts that reads
rotated files) instead of directly reading getTelemetryLogPath(), or
alternatively temporarily set maxFileSizeBytes to a larger value for this test
so rotation won't occur; modify the test to call queryTelemetryEvents() (or
adjust the telemetry config used in this test) and assert the returned array
length equals total and that every entry.details.token === "***MASKED***".
In `@test/tool-utils.test.ts`:
- Around line 623-642: Update the test to include Number.NEGATIVE_INFINITY in
the input anyOf and assert post-flattening behavior from cleanupToolDefinitions:
add Number.NEGATIVE_INFINITY to the anyOf list for the coercedValues parameter,
change the expected enum on props.coercedValues to include three nulls ([null,
null, null]), and add an assertion that the original anyOf was removed (e.g.,
props.coercedValues.anyOf is undefined or not present) to prove flattening
removed the anyOf wrapper; reference the cleanupToolDefinitions call and the
coercedValues property to locate the assertion changes.
---
Outside diff comments:
In `@docs/reference/settings.md`:
- Around line 184-192: Update the Validation section to replace the stale
CLI-only checks with the current verification gates and npm scripts: list "npm
run verify", "npm run verify:ci", and "npm test -- test/documentation.test.ts"
as the primary gates, and add optional local checks "npm run doctor:dev" and
"npm run setup:dev"; also add a brief upgrade-note pointer to the
release/upgrade documentation indicating these new scripts and any changed CLI
flags (ensure the section references the exact script names and the
documentation test file to match current workflows).
In `@index.ts`:
- Around line 797-857: The concurrent worker loop using shared state
(workerCount, nextHydrateIndex, queuedRefresh, accountsToHydrate,
refreshedCount, failedCount, updatedAccountCount) needs a deterministic vitest
regression: add a test in test/** that sets HYDRATE_EMAILS_MAX_CONCURRENCY to a
known value, provides a fixed accountsToHydrate array, and mocks queuedRefresh
to return a controlled sequence of mixed outcomes (success with varied
access/refresh/expires, failure) while ensuring each mock call is deterministic
(use a seeded sequence or a controlled promise queue rather than timing/flaky
delays). The test must assert each account was processed exactly once and that
refreshedCount/failedCount/updatedAccountCount match expected values, and must
avoid mocking real secrets or skipping assertions.
In `@lib/auth/auth.ts`:
- Around line 234-253: The refreshAccessToken function passes options.timeoutMs
directly to fetchWithTimeout allowing 0, negative, or NaN to bypass the default;
normalize and clamp timeoutMs before the fetch: read
RefreshAccessTokenOptions.timeoutMs, coerce to a finite number, and if it's NaN
or <= 0 fall back to OAUTH_REFRESH_TIMEOUT_MS (or apply the same min/max clamp
logic used in the exchange path), then pass that normalized value to
fetchWithTimeout; update refreshAccessToken and related references
(refreshAccessToken, RefreshAccessTokenOptions, fetchWithTimeout,
OAUTH_REFRESH_TIMEOUT_MS) accordingly and add Vitest cases asserting behavior
for timeoutMs values 0, -1, and NaN in the refresh path, ensuring tests avoid
logging tokens/emails and reference any changed tests in the commit message.
In `@lib/storage.ts`:
- Around line 1118-1174: The migration path currently persists migrated data
(persistMigration) but still calls rememberKnownStorageRevision(path,
rawChecksum) using the pre-migration rawChecksum, causing false ECONFLICTs in
saveAccountsUnlocked; fix by updating the stored checksum after a successful
persistence: after persistMigration(normalized) either compute the actual
on-disk checksum for the migrated content or reload the written file (e.g., via
loadAccountsFromPath(path)) to obtain the new rawChecksum and pass that value
into rememberKnownStorageRevision instead of the original rawChecksum; update
code paths around persistMigration, the promotion branch, and the final
rememberKnownStorageRevision call accordingly (functions/methods:
persistMigration, loadAccountsFromPath, rememberKnownStorageRevision,
saveAccountsUnlocked), and add a vitest regression that performs a v1->v3
auto-migration then immediately calls saveAccountsUnlocked to assert no
ECONFLICT is thrown.
In `@test/server.unit.test.ts`:
- Around line 328-350: The test "should return null after 5 minute timeout" uses
vi.useFakeTimers() but doesn't guarantee restoring real timers on failure; wrap
the fake-timer setup and all test logic in a try/finally so vi.useRealTimers()
is always called. Specifically, in the test block containing
startLocalOAuthServer, result.waitForCode(), vi.advanceTimersByTimeAsync, and
assertions, move vi.useRealTimers() into a finally clause to ensure cleanup even
if an assertion throws; reference the test name,
vi.useFakeTimers/vi.useRealTimers, startLocalOAuthServer, and result.waitForCode
to locate the code to change.
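The try/finally shape this comment asks for can be factored into a small helper; `withFakeTimers` and the `timers` shim below are illustrative names, not part of vitest:

```typescript
// Hypothetical helper wrapping the pattern: install fake timers, run the
// test body, and always restore real timers, even when an assertion throws.
async function withFakeTimers<T>(
  timers: { useFakeTimers(): void; useRealTimers(): void },
  body: () => Promise<T>,
): Promise<T> {
  timers.useFakeTimers();
  try {
    return await body();
  } finally {
    // Runs on both success and failure, so a throwing assertion can no
    // longer leak fake timers into later tests.
    timers.useRealTimers();
  }
}
```

In the actual test, `vi` would be passed as the `timers` object and the existing `startLocalOAuthServer` / `waitForCode` logic would move into `body`.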
---
Duplicate comments:
In `@lib/authorization.ts`:
- Around line 30-36: The getRoleFromEnv function currently defaults to "admin"
when CODEX_AUTH_ROLE is unset; change it so the default fallback is "viewer"
(least-privilege) and preserve the existing validation that invalid values map
to "viewer"; update the logic in getRoleFromEnv to initialize raw to "viewer"
(or check for undefined and return "viewer") instead of "admin", and update the
test expectations in test/authorization.test.ts (lines checking behavior when
env is deleted) to expect "viewer" rather than "admin".
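A sketch of the least-privilege fallback, with an assumed three-role union (the actual role set and validation live in `lib/authorization.ts`):

```typescript
type Role = "admin" | "operator" | "viewer"; // assumed role union
const VALID_ROLES: ReadonlySet<string> = new Set(["admin", "operator", "viewer"]);

function getRoleFromEnv(env: Record<string, string | undefined>): Role {
  // Unset falls back to "viewer" (least privilege); invalid values keep
  // the existing behavior of also mapping to "viewer".
  const raw = (env.CODEX_AUTH_ROLE ?? "viewer").trim().toLowerCase();
  return VALID_ROLES.has(raw) ? (raw as Role) : "viewer";
}
```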
In `@lib/file-lock.ts`:
- Around line 184-187: The current release flow marks the local flag released =
true before calling releaseOwnedLock/releaseOwnedLockSync, which prevents
retries if those functions throw; change the logic so released is only set to
true after the release call completes successfully (i.e., call await
releaseOwnedLock(path, metadata.ownerId) inside a try/catch and set released =
true in the try after success, rethrow or handle errors without flipping the
flag), and apply the same pattern for the sync variant; update/ add vitest cases
that simulate transient failures (EBUSY/429) to assert retries and eventual
cleanup, and ensure any added logging around failures does not leak tokens or
emails.
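The ordering fix in isolation (names are illustrative; the real handle in `lib/file-lock.ts` also carries the lock path and owner metadata):

```typescript
function makeLockHandle(releaseOwnedLock: () => Promise<void>) {
  let released = false;
  return {
    async release(): Promise<void> {
      if (released) return;
      // If this throws (e.g. transient EBUSY/EPERM on Windows), the flag
      // stays false and the caller can retry the release.
      await releaseOwnedLock();
      released = true; // only reached on success
    },
  };
}
```

The sync variant follows the same shape: call `releaseOwnedLockSync` first, flip the flag after it returns.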
In `@lib/idempotency.ts`:
- Around line 98-100: normalizeTtlMs currently maps invalid/non-finite TTLs to
1ms which weakens replay protection; change normalizeTtlMs to return the
configured fail-safe constant IDEMPOTENCY_TTL_MS when ttlMs is not a positive
finite number (keep Math.floor for valid values), update the function reference
in lib/idempotency.ts to import/use IDEMPOTENCY_TTL_MS, and adjust/add vitest
unit tests that assert fallback behavior (ensure tests cover non-finite,
negative, and NaN inputs) so the default is IDEMPOTENCY_TTL_MS rather than 1.
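A sketch of the fallback, with an assumed 24-hour value for `IDEMPOTENCY_TTL_MS` (the real constant in `lib/idempotency.ts` may differ):

```typescript
const IDEMPOTENCY_TTL_MS = 24 * 60 * 60 * 1000; // assumed fail-safe default

function normalizeTtlMs(ttlMs: unknown): number {
  const value = Number(ttlMs);
  // Non-finite, NaN, zero, and negative TTLs fall back to the fail-safe
  // constant instead of the old 1 ms floor that weakened replay protection.
  if (!Number.isFinite(value) || value <= 0) return IDEMPOTENCY_TTL_MS;
  return Math.floor(value); // keep the floor for valid values
}
```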
In `@test/data-retention.test.ts`:
- Around line 80-114: The test only asserts that enforceDataRetention throws an
EPERM error for the locked file; add coverage for the Windows transient case by
including an EBUSY variant (or parameterize to test both "EPERM" and "EBUSY") in
the mocked fs.unlink behavior inside the "rethrows non-ENOENT errors during
recursive prune" test, so the mockImplementation throws a NodeJS.ErrnoException
with code set to "EBUSY" for the locked.log path in addition to (or as a
separate subtest from) "EPERM", and update the expect(...).rejects.toMatchObject
to assert the corresponding error.code for each variant while restoring the spy
in the finally block.
In `@test/network.test.ts`:
- Around line 89-106: The test currently only asserts retry counts but not the
onRetry callback payload; update the test for fetchWithTimeoutAndRetry to pass a
mocked onRetry handler (e.g., vi.fn()) and assert it was called once with the
expected metadata (attempt number 1, the retry reason/status indicating 429 or a
response object with status 429, and the request URL and delay values), in
addition to the existing attempts/fetch call count assertions so the retry
reason/status reporting is validated deterministically.
In `@test/plugin-config.test.ts`:
- Around line 1052-1073: Add two unit tests that cover the config-path clamping
for getRetryAllAccountsAbsoluteCeilingMs when the environment var is unset: one
that sets PluginConfig.retryAllAccountsAbsoluteCeilingMs to a negative value and
asserts the getter returns 0, and another that sets it to a very large value and
asserts the getter returns the 24-hour hard maximum; ensure both tests clear
CODEX_AUTH_RETRY_ALL_ABSOLUTE_CEILING_MS (or leave it unset) so they exercise
the config branch in the getter.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 85817824-b9e5-472d-b088-bd378d68fd01
📒 Files selected for processing (76)
.github/workflows/ci.yml, .github/workflows/secret-scan.yml, .github/workflows/supply-chain.yml, docs/configuration.md, docs/development/CONFIG_FIELDS.md, docs/development/LOCAL_DEV.md, docs/development/TESTING.md, docs/reference/settings.md, docs/runbooks/operations.md, index.ts, lib/accounts.ts, lib/accounts/account-view.ts, lib/accounts/active-index.ts, lib/auth/auth.ts, lib/authorization.ts, lib/background-jobs.ts, lib/codex-manager.ts, lib/codex-manager/settings-hub.ts, lib/config.ts, lib/data-redaction.ts, lib/data-retention.ts, lib/file-lock.ts, lib/idempotency.ts, lib/network.ts, lib/prompts/codex.ts, lib/prompts/host-codex-prompt.ts, lib/quota-cache.ts, lib/refresh-queue.ts, lib/request/fetch-helpers.ts, lib/request/helpers/tool-utils.ts, lib/request/retry-governor.ts, lib/secrets-crypto.ts, lib/shutdown.ts, lib/storage.ts, lib/telemetry.ts, lib/unified-settings.ts, lib/utils.ts, scripts/benchmark-runtime-path-helpers.mjs, scripts/benchmark-runtime-path.mjs, scripts/check-runtime-budgets-helpers.mjs, scripts/check-runtime-budgets.mjs, scripts/license-policy-check.js, test/account-view.test.ts, test/accounts-edge.test.ts, test/active-index.test.ts, test/auth.test.ts, test/authorization.test.ts, test/background-jobs.test.ts, test/benchmark-runtime-path-helpers.test.ts, test/check-runtime-budgets-script.test.ts, test/codex-manager-cli.test.ts, test/codex-prompts.test.ts, test/config-save.test.ts, test/data-redaction.test.ts, test/data-retention.test.ts, test/fetch-helpers.test.ts, test/file-lock.test.ts, test/helpers/fs-retry.ts, test/idempotency.test.ts, test/index-retry.test.ts, test/index.test.ts, test/network.test.ts, test/plugin-config.test.ts, test/quota-cache.test.ts, test/refresh-guardian.test.ts, test/refresh-lease.test.ts, test/retry-governor.test.ts, test/scripts/doctor-dev.test.ts, test/secrets-crypto.test.ts, test/server.unit.test.ts, test/settings-hub-utils.test.ts, test/shutdown.test.ts, test/storage.test.ts, test/telemetry.test.ts, test/tool-utils.test.ts, test/utils.test.ts
Additional Comments (1)
Prompt To Fix With AI: This is a comment left during a code review.
Path: lib/quota-cache.ts
Line: 173-175
Comment:
**`existsSync` blocks the event loop and introduces a TOCTOU race**
`existsSync` is a synchronous syscall. on windows, antivirus can hold a filesystem handle on the quota-cache directory, causing `existsSync` to stall and block the Node event loop for hundreds of milliseconds. more importantly, this is a classic TOCTOU: another process could delete the file between the `existsSync` check and the `readCacheFileWithRetry` call. the outer `catch` (line 192) already handles all errors including `ENOENT`, so the guard is redundant.
removing it and letting the async path handle the absent-file case is both simpler and correct:
```suggestion
  try {
    const content = await readCacheFileWithRetry(QUOTA_CACHE_PATH);
```
the `readCacheFileWithRetry` loop only retries `EBUSY`/`EPERM`; `ENOENT` will fall through to the outer `catch` and return empty maps — identical observable behavior. no vitest change needed beyond verifying the existing "missing file" test still passes.
How can I resolve this? If you propose a fix, please make it concise.
Address outstanding review feedback for telemetry reliability and coverage:
- enforce telemetry source/outcome enum validation while parsing logs
- sort telemetry report events deterministically before summary/recent output
- await CLI telemetry lifecycle writes before command return/throw
- expand CLI/index/telemetry regressions (windows paths, deterministic time, exception paths, concurrent appends)
- document telemetry rollout/automation pointer in command reference
Co-authored-by: Codex <noreply@openai.com>
- address security/reliability review findings across auth, storage, telemetry, request, and shutdown paths
- pin supply-chain workflow actions and update config/release docs
- harden redaction, timeout/index clamping, cross-process telemetry writes, and retention cleanup retries
- add/adjust regression tests for CLI authz flows, retry behavior, locking, and edge-case normalization
Co-authored-by: Codex <noreply@openai.com>
Actionable comments posted: 23
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
lib/accounts.ts (1)

900-917: ⚠️ Potential issue | 🟠 Major: do not skip the fresh debounced save when the previous pending save rejects.

lib/accounts.ts:900-903 currently awaits `this.pendingSave` inside the main try block. If that prior promise rejects, execution jumps to catch and the new save at lib/accounts.ts:903-913 is never scheduled. This can drop a later state update.

Proposed fix:

```diff
 const doSave = async () => {
   try {
     if (this.pendingSave) {
-      await this.pendingSave;
+      try {
+        await this.pendingSave;
+      } catch (error) {
+        log.warn("Previous debounced save failed; continuing with fresh save", {
+          error: error instanceof Error ? error.message : String(error),
+        });
+      }
     }
     this.pendingSave = runBackgroundJobWithRetry({
       name: "accounts.saveToDiskDebounced",
       task: () => this.saveToDisk(),
```

Also add a vitest regression that simulates: first pending save fails, second debounced trigger still performs a new save attempt (`test/accounts-edge.test.ts` is a good fit).

As per coding guidelines:
lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/accounts.ts` around lines 900-917: the current code awaits this.pendingSave directly inside the try, so a rejection aborts scheduling the new debounced save; change the await to swallow the prior rejection (e.g., await this.pendingSave.catch(() => undefined)) before creating this.pendingSave so a rejected prior save doesn't jump to catch and prevent scheduling the new runBackgroundJobWithRetry; update the code paths around pendingSave, runBackgroundJobWithRetry and saveToDisk accordingly and add a vitest regression test (test/accounts-edge.test.ts) that simulates a failing first pending save and verifies the second debounced trigger still runs a new save attempt.

lib/storage.ts (2)
1578-1590: ⚠️ Potential issue | 🟠 Major: flagged storage writes still lack cross-process lock protection.

Accounts are serialized with a file lock in lib/storage.ts:1389-1393, but flagged writes at lib/storage.ts:1579 only use the in-process mutex. rotateStoredSecretEncryption then rewrites flagged storage at lib/storage.ts:1761, so concurrent processes can still do last-write-wins and lose updates. The rename retry helps Windows EBUSY, but it does not prevent a concurrent clobber.

Proposed fix:

```diff
 export async function saveFlaggedAccounts(storage: FlaggedAccountStorageV1): Promise<void> {
-  return withStorageLock(async () => {
-    const path = getFlaggedAccountsPath();
+  const path = getFlaggedAccountsPath();
+  return withAccountFileLock(path, () => withStorageLock(async () => {
     const uniqueSuffix = `${Date.now()}.${Math.random().toString(36).slice(2, 8)}`;
     const tempPath = `${path}.${uniqueSuffix}.tmp`;
@@
-  });
+  }));
 }
```

Please add a vitest regression for concurrent flagged writers across processes (test/storage.test.ts:1) and keep a Windows EBUSY case in the same area.

As per coding guidelines:
lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.

Also applies to: 1760-1762
1110-1116: ⚠️ Potential issue | 🟠 Major: do not overwrite the known revision with the pre-migration checksum.

Line 1114 (lib/storage.ts:1114) can persist migrated content, but line 1161 (lib/storage.ts:1161) then stores rawChecksum from the pre-migration file. This can trigger a false ECONFLICT on the next persist in the same transaction path.

Proposed fix:

```diff
 async function loadAccountsInternal(
   persistMigration: ((storage: AccountStorageV3) => Promise<void>) | null,
 ): Promise<AccountStorageV3 | null> {
   const path = getStoragePath();
+  let persistedDuringLoad = false;
   await cleanupStaleRotatingBackupArtifacts(path);
   const migratedLegacyStorage = persistMigration
     ? await migrateLegacyProjectStorageIfNeeded(persistMigration)
     : null;
   try {
     const { normalized, storedVersion, schemaErrors, rawChecksum } = await loadAccountsFromPath(path);
@@
     if (normalized && storedVersion !== normalized.version) {
       log.info("Migrating account storage to v3", { from: storedVersion, to: normalized.version });
       if (persistMigration) {
         try {
           await persistMigration(normalized);
+          persistedDuringLoad = true;
         } catch (saveError) {
           log.warn("Failed to persist migrated storage", { error: String(saveError) });
         }
       }
     }
@@
-    rememberKnownStorageRevision(path, rawChecksum);
+    if (!persistedDuringLoad) {
+      rememberKnownStorageRevision(path, rawChecksum);
+    }
     return normalized;
```

Please add a vitest regression around migration + transaction persist (e.g. test/storage.test.ts:1) to assert no false conflict after the migration write.

As per coding guidelines:
lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest).

Also applies to: 1161-1162
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/storage.ts` around lines 1110 - 1116, When migrating account storage in the block that checks `if (normalized && storedVersion !== normalized.version)` avoid overwriting the known revision/checksum with the pre-migration `rawChecksum`: after a successful `persistMigration(normalized)` ensure the in-memory or to-be-persisted checksum used later (the `rawChecksum` value written around the code that sets `rawChecksum` at/near the later store step) is replaced with the post-migration checksum (`normalized.checksum`) or cleared so it doesn’t reintroduce the old checksum and cause a false ECONFLICT; update the code that writes `rawChecksum` (the same symbol referenced at lines ~1161-1162) to respect the migrated checksum state and add a vitest regression (e.g., test/storage.test.ts) that performs migration + transaction persist to assert no ECONFLICT on the subsequent persist and reference this test in the change.
♻️ Duplicate comments (4)
test/refresh-guardian.test.ts (1)
74-80: ⚠️ Potential issue | 🟠 Major: separate manager-missing vs no-enabled timing expectations more strictly.

"manager missing" and "no enabled accounts" hit different paths: lib/refresh-guardian.ts:83-86 exits before timing setup, while lib/refresh-guardian.ts:189-201 should still write lastTickDurationMs in finally. In test/refresh-guardian.test.ts:91-97, asserting lastTickDurationMs === 0 does not prove that the finally path still updates timing for countedRun=false.

Proposed deterministic tightening:

```diff
@@
   const disabledGuardian = new RefreshGuardian(() => disabledManager, {
     intervalMs: 5_000,
     bufferMs: 60_000,
   });
+  const nowSpy = vi.spyOn(performance, "now");
+  nowSpy.mockReturnValueOnce(100).mockReturnValueOnce(112);
   await expect(disabledGuardian.tick()).resolves.toBeUndefined();
   expect(refreshExpiringAccountsMock).not.toHaveBeenCalled();
   const disabledStats = disabledGuardian.getStats();
   expect(disabledStats.runs).toBe(0);
-  expect(disabledStats.lastTickDurationMs).toBe(0);
+  expect(disabledStats.lastTickDurationMs).toBe(12);
   expect(disabledStats.maxTickDurationMs).toBe(0);
   expect(disabledStats.cumulativeTickDurationMs).toBe(0);
   expect(disabledStats.avgTickDurationMs).toBe(0);
```

As per coding guidelines: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Also applies to: 91-97
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/refresh-guardian.test.ts` around lines 74-80: the test currently conflates the "manager missing" path (which exits before timing setup) with the "no enabled accounts" path (which still executes the finally block and should update timing). Split the assertions into two deterministic checks: for the missing-manager case (use missingManagerGuardian.tick() and missingManagerGuardian.getStats()) assert runs==0 and that timing fields remain at their initialized zero/unchanged values; for the no-enabled-accounts case invoke the guardian path that reaches the finally block (the code path in lib/refresh-guardian.ts that sets timing when countedRun=false) and assert that lastTickDurationMs and related timing fields are updated (e.g., >0 or non-zero) to prove the finally timing logic ran. Ensure you reference and update the specific test invocations of missingManagerGuardian.tick() and getStats(), and add a separate test or adjust the existing one to explicitly exercise the countedRun=false branch so timing is deterministically checked.

test/file-lock.test.ts (1)
10-29: 🛠️ Refactor suggestion | 🟠 Major: reuse the shared fs retry helper instead of redefining it here.

This local removeWithRetry duplicates logic already centralized in test/helpers/fs-retry.ts, which reintroduces drift risk (test/file-lock.test.ts:10-29, test/helpers/fs-retry.ts:1).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/file-lock.test.ts` around lines 10-29: the test defines a local removeWithRetry that duplicates the centralized helper; delete this local function and import and use the shared helper instead (replace the local removeWithRetry with the exported helper from the test helpers module), updating any calls in file-lock.test.ts to call the imported helper; ensure the import name matches the helper's export (e.g., removeWithRetry or the actual exported identifier) and remove the RETRYABLE_REMOVE_CODES constant which is now handled by the shared implementation.

lib/file-lock.ts (1)
215-219: ⚠️ Potential issue | 🟠 Major: release state is set too early.

In lib/file-lock.ts:215-219 and lib/file-lock.ts:296-300, released flips before releaseOwnedLock* succeeds. If release throws a transient EBUSY/EPERM on Windows, this handle can't retry and the lock can linger until stale cleanup.

Proposed fix:

```diff
-  let released = false;
+  let released = false;
+  let releasing: Promise<void> | null = null;
   return {
     path,
     release: async () => {
       if (released) return;
-      released = true;
-      await releaseOwnedLock(path, metadata.ownerId);
+      if (releasing) return releasing;
+      releasing = (async () => {
+        await releaseOwnedLock(path, metadata.ownerId);
+        released = true;
+      })();
+      try {
+        await releasing;
+      } finally {
+        if (!released) releasing = null;
+      }
     },
   };
@@
-  let released = false;
+  let released = false;
   return {
     path,
     release: () => {
       if (released) return;
-      released = true;
       releaseOwnedLockSync(path, metadata.ownerId);
+      released = true;
     },
   };
```

Verification script:

```bash
#!/bin/bash
rg -n "let released|released = true|releaseOwnedLock|releaseOwnedLockSync" lib/file-lock.ts -C3
rg -n "replacement-race|release.*ebusy|release.*eperm|stale" test/file-lock.test.ts -i -C2
```

As per coding guidelines:
lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Also applies to: 296-300
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/file-lock.ts` around lines 215-219: the flag released is being set before the actual release call completes, which prevents retries on transient errors (EBUSY/EPERM); update the async release closure in the returned locker (the release function that calls releaseOwnedLock) so that released = true is assigned only after releaseOwnedLock(path, metadata.ownerId) resolves successfully (and similarly move the assignment in the synchronous variant that calls releaseOwnedLockSync), propagate or rethrow errors so callers can retry on transient failures, and add/adjust vitest cases (tests referencing replacement-race, release.*ebusy, release.*eperm, stale in file-lock.test.ts) to assert retry behavior and that logging does not leak tokens/emails when failures are logged.

lib/codex-manager.ts (1)
152-165: ⚠️ Potential issue | 🟠 Major: guard recordAuthAttempt in the same pre-check block to avoid batch aborts.

lib/codex-manager.ts:164 can still throw after the current checkAuthRateLimit catch, which means one account can still abort multi-account flows. Keep both pre-check calls in one guarded block and convert 429/EBUSY-style failures into a TokenResult failure so callers continue.

Proposed fix:

```diff
 async function queuedRefreshWithAuthRateLimit(
   refreshToken: string,
   context: { accountId?: string; email?: string },
 ): Promise<TokenResult> {
   const rateLimitKey = buildRefreshRateLimitKey(refreshToken, context.accountId, context.email);
   try {
     checkAuthRateLimit(rateLimitKey);
+    recordAuthAttempt(rateLimitKey);
   } catch (error) {
     if (error instanceof AuthRateLimitError) {
       return {
         type: "failed",
         reason: "unknown",
         message: error.message,
       };
     }
     throw error;
   }
-  recordAuthAttempt(rateLimitKey);
   const result = await queuedRefresh(refreshToken);
```

Please add a vitest regression in test/codex-manager-cli.test.ts:1 that stubs a pre-check recording failure (including Windows-style EBUSY) for one account and asserts other accounts still process.

As per coding guidelines:
lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/codex-manager.ts` around lines 152 - 165, Move recordAuthAttempt(rateLimitKey) into the same guarded pre-check block with checkAuthRateLimit(rateLimitKey) inside the try/catch in codex-manager (so both pre-checks are atomic) and, inside the catch that currently handles AuthRateLimitError, also catch errors thrown by recordAuthAttempt (including filesystem EBUSY and HTTP 429-style errors) and return a TokenResult-style failure object ({ type: "failed", reason: "unknown", message: error.message }) instead of letting the error bubble and abort multi-account flows; update the code paths around queuedRefresh(refreshToken) to only run when the pre-check block succeeds. Add a vitest regression in test/codex-manager-cli.test.ts that stubs recordAuthAttempt to throw an AuthRateLimitError and an EBUSY-style error for one account and asserts other accounts still proceed (i.e., their queuedRefresh runs), covering Windows EBUSY and 429-like failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/supply-chain.yml:
- Around line 1-11: Add a workflow-level concurrency block to supply-chain.yml
to prevent overlapping runs: at the top-level (before the jobs key) add a
concurrency key with a unique group name (e.g., "supply-chain-${{ github.ref
}}") and set cancel-in-progress: true so a new run cancels any in-flight run for
the same branch/ref; ensure the concurrency block is placed alongside name and
on, not inside any job.
- Around line 3-10: The sca-and-license and sbom jobs are currently running for
pull_request and execute npm/npx on untrusted PR code; update the workflow so
those two jobs (sca-and-license and sbom) are gated to run only on push and
schedule (not pull_request) while leaving dependency-review running on
pull_request; implement this by adding a job-level `if` condition to those jobs
(e.g., if: github.event_name != 'pull_request') or by splitting triggers so the
global on: keeps pull_request for
dependency-review but sca-and-license and sbom are scoped to push and schedule
only.
- Around line 72-73: The workflow step "Generate CycloneDX SBOM" currently
invokes an unpinned CLI via the command string `npx --yes
@cyclonedx/cyclonedx-npm --output-file sbom.cdx.json --omit dev`; pin that
package to a specific released version (e.g., change to `@4.2.0`) in the same
command to ensure reproducible SBOMs, and update the "Generate CycloneDX SBOM"
step metadata in .github/workflows/supply-chain.yml accordingly; additionally
add a minimal CI job or test that runs the SBOM generation and asserts the
produced sbom.cdx.json contains required top-level fields (e.g.,
bomFormat/version) to catch future version-induced breaks.
In `@docs/development/CONFIG_FIELDS.md`:
- Line 65: The docs table lists retryAllAccountsMaxRetries as defaulting to
Infinity but runtime sets it to 12; update the documentation entry for
retryAllAccountsMaxRetries in CONFIG_FIELDS.md to reflect the actual runtime
default of 12 and ensure any related fields (e.g.,
retryAllAccountsAbsoluteCeilingMs) remain accurate; locate the doc row
referencing retryAllAccountsMaxRetries and change its default value to 12 so it
matches the value set in lib/config.ts (where retryAllAccountsMaxRetries is
initialized).
In `@docs/runbooks/operations.md`:
- Around line 45-46: The runbook list of env vars omits that CODEX_AUTH_ROLE
defaults to "viewer" when unset; update the operations.md entry near the
CODEX_AUTH_ROLE / CODEX_AUTH_BREAK_GLASS bullet list to explicitly state "If
CODEX_AUTH_ROLE is not set, it defaults to 'viewer' (operator-facing behavior
change)" and mention CODEX_AUTH_BREAK_GLASS is for emergency windows only; also
add a short upgrade note and mirror this default in README and SECURITY docs and
any npm scripts or CLI docs that reference auth role behavior so all docs stay
consistent.
- Around line 77-79: Add a Windows-specific subsection to the filesystem lock
contention triage that explains how to identify and resolve
EBUSY/replacement-race scenarios (e.g., how to detect handles holding *.lock
files, using Sysinternals Handle/Process Explorer to find/kill locking
processes, recommended safe sequence to remove stale locks and restart services,
and the “hardened recovery” steps to use instead of simply deleting files), and
update the surrounding steps that mention stale `*.lock` files and DLQ
inspection to reference this Windows note; also ensure docs/** (README,
SECURITY, operations.md) mention any changed CLI flags/workflows and include
upgrade notes plus the new npm scripts required for the hardened recovery path
so on-call has consistent, platform-specific guidance.
In `@index.ts`:
- Around line 1311-1341: The current dedupe key builder getRefreshDedupeKey uses
the last 16 chars of refreshToken which can collide; replace that branch to
compute a collision-resistant hash (e.g., SHA-256 hex) of the full trimmed
refreshToken and use `refresh:<hash>` as the key (use an existing hash utility
or add one nearby and import it), keeping the accountId/email/index branches
unchanged; update refreshAccountAuth and refreshInFlightByAccount usage
unchanged since only the key changes; add a deterministic regression test that
constructs two RefreshDedupeAccount objects with distinct full refreshToken
values that share the same trailing 16 chars and assert that
refreshInFlightByAccount does not return the same promise for both (i.e., both
tokens trigger separate refreshAndUpdateToken calls).
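A sketch of the hashed key; the `refresh:` prefix mirrors the suggestion above, and the hash utility is assumed to be Node's built-in crypto:

```typescript
import { createHash } from "node:crypto";

function getRefreshDedupeKey(refreshToken: string): string {
  // SHA-256 over the full trimmed token: two tokens that share a 16-char
  // suffix can no longer collapse into one in-flight refresh entry.
  const hash = createHash("sha256").update(refreshToken.trim()).digest("hex");
  return `refresh:${hash}`;
}
```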
In `@lib/accounts/account-view.ts`:
- Around line 76-84: formatActiveIndexByFamilyLabels currently renders idx + 1
directly which allows fractional persisted values to show; update
formatActiveIndexByFamilyLabels to normalize indexes (use Math.floor or the same
normalization logic as resolveActiveIndex) before adding 1 and rendering the
label so labels are always integers; add a deterministic vitest regression in
test/account-view.test.ts that passes a fractional activeIndexByFamily (e.g.,
codex: 1.8) and asserts the output includes "codex: 2" to prevent regressions.
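The normalization in isolation; the label format is assumed from the test expectation above, and the function name is illustrative:

```typescript
function formatFamilyLabel(family: string, idx: number): string {
  // Floor before the +1 so a fractional persisted index like 1.8 renders
  // as an integer label ("codex: 2"), matching resolveActiveIndex.
  const normalized = Number.isFinite(idx) ? Math.floor(idx) : 0;
  return `${family}: ${normalized + 1}`;
}
```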
In `@lib/codex-manager.ts`:
- Around line 4726-4748: The emitTelemetry call is currently awaited and can
throw, so make telemetry emission best-effort by changing emitTelemetry to
swallow errors and not block command execution: wrap the recordTelemetryEvent
call in try/catch (inside emitTelemetry) or fire-and-forget without awaiting,
and apply the same protection for the finish/exception emits invoked around the
main try/catch (the other emitTelemetry calls near the end of the command flow).
Add a Vitest regression in test/codex-manager-cli.test.ts that stubs/mock
recordTelemetryEvent to throw (or rejects) and asserts the CLI command (the main
entry that calls emitTelemetry and printUsage paths) still returns its normal
exit code; reference the emitTelemetry helper and recordTelemetryEvent in the
test so failures in telemetry do not affect business logic.
In `@lib/data-retention.ts`:
- Around line 89-99: The file-unlink path in lib/data-retention.ts currently
does a single fs.unlink and rethrows non-ENOENT errors, which causes transient
Windows EBUSY/EPERM to abort the entire run; update the unlink logic to mirror
removeEmptyDirectoryWithRetry’s retry/backoff behavior (reuse the same retry
constants and exponential backoff loop used by removeEmptyDirectoryWithRetry) so
unlink will retry on EBUSY/EPERM and only fail after retries, and add a vitest
in test/data-retention.test.ts that mocks/fs-stubs or spies to simulate an
initial EBUSY (or EPERM) from fs.unlink followed by a successful unlink to
assert the retry path succeeds (similar to the existing directory test).
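A sketch of the retry loop, with the unlink and sleep functions injected so it stays testable; the retry count and base delay are assumptions meant to mirror removeEmptyDirectoryWithRetry:

```typescript
const UNLINK_MAX_RETRIES = 5;     // assumed to match the directory helper
const UNLINK_BASE_DELAY_MS = 50;  // assumed backoff base

async function unlinkWithRetry(
  path: string,
  unlinkFn: (p: string) => Promise<void>,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<void> {
  for (let attempt = 0; ; attempt++) {
    try {
      await unlinkFn(path);
      return;
    } catch (error) {
      const code = (error as { code?: string }).code;
      if (code === "ENOENT") return; // already gone: benign
      const transient = code === "EBUSY" || code === "EPERM";
      // Only transient Windows-style errors are retried, with exponential
      // backoff; everything else still fails the run immediately.
      if (!transient || attempt >= UNLINK_MAX_RETRIES) throw error;
      await sleep(UNLINK_BASE_DELAY_MS * 2 ** attempt);
    }
  }
}
```

In production the real `fs.promises.unlink` would be passed (or bound as the default); tests can pass a stub that throws EBUSY a few times before succeeding.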
In `@lib/secrets-crypto.ts`:
- Around line 104-112: The current isEncryptedSecret() treats any string
starting with `${ENCRYPTED_PREFIX}${LEGACY_ENCRYPTED_VERSION}:` or
`${ENCRYPTED_PREFIX}${ENCRYPTED_VERSION}:` as already encrypted which lets
malformed plaintext like "enc:v2:not-a-valid-envelope" bypass encryptSecret;
update isEncryptedSecret to more strictly validate the envelope format (e.g.,
check that the remainder matches the expected envelope pattern/structure or
base64/JSON format your encryption produces) and leave encryptSecret unchanged
except it should rely on this stricter check (functions to look at:
isEncryptedSecret and encryptSecret); add a deterministic vitest regression in
test/secrets-crypto.test.ts that passes a malformed prefixed value (e.g.,
"enc:v2:not-a-valid-envelope") to encryptSecret and asserts the returned value
is actually encrypted (not equal to the input, matches the expected encrypted
envelope pattern), and ensure the test names reference the changed behavior.
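One possible stricter check, assuming a colon-separated envelope whose segments after the version are base64 (the real v1/v2 layout in lib/secrets-crypto.ts may differ):

```typescript
const ENCRYPTED_PREFIX = "enc:";
const BASE64_SEGMENT = /^[A-Za-z0-9+/]+={0,2}$/;

function isEncryptedSecret(value: string): boolean {
  if (!value.startsWith(ENCRYPTED_PREFIX)) return false;
  const [, version, ...segments] = value.split(":");
  if (version !== "v1" && version !== "v2") return false;
  // Require at least two non-empty, plausibly-base64 segments, so a
  // malformed value like "enc:v2:not-a-valid-envelope" gets re-encrypted
  // instead of silently passing through encryptSecret.
  return segments.length >= 2 && segments.every((s) => s.length > 0 && BASE64_SEGMENT.test(s));
}
```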
In `@lib/storage.ts`:
- Around line 1739-1758: The TOCTOU race happens because code calls
fileExists(flaggedPath) then reads the file, and if the file is removed between
those calls an ENOENT is treated as a hard failure; update the validation in the
flagged-rotation block (references: flaggedPath, fileExists,
loadFlaggedAccounts, flaggedCount) to treat an ENOENT during the read/parse as a
benign “file disappeared” case (do not convert it into the hard `Failed to
validate...` error) — either by re-checking fileExists and returning success
when missing or by explicitly ignoring errors with error.code === 'ENOENT' in
the catch and only rethrowing other parsing/IO errors; add a vitest regression
(e.g., tests/storage.test.ts) that simulates the concurrent delete between the
initial existence check and read (use a temp file or mock fs.readFile to throw
ENOENT) to assert validation does not fail, and ensure the test also covers
Windows-like behavior (handle transient EBUSY/EPERM/429 scenarios as non-fatal
where appropriate).
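The benign-ENOENT handling in isolation, with the read function injected so the disappearing-file case is easy to simulate (the function and field names are illustrative, not the real lib/storage.ts symbols):

```typescript
async function countFlaggedAccounts(
  path: string,
  readFileFn: (p: string) => Promise<string>,
): Promise<number | null> {
  try {
    const raw = await readFileFn(path);
    return (JSON.parse(raw).accounts ?? []).length;
  } catch (error) {
    // The file vanishing between the existence check and this read is a
    // benign race, not a validation failure.
    if ((error as { code?: string }).code === "ENOENT") return null;
    throw error; // real IO/parse errors still surface
  }
}
```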
In `@lib/telemetry.ts`:
- Around line 290-292: The configureTelemetry function merges runtime settings
but does not clamp bounds for rotation parameters, so update configureTelemetry
to validate and clamp telemetryConfig.maxFiles (minimum 2) and
telemetryConfig.maxFileSizeBytes (minimum >0, cap to a sensible upper bound)
before assigning; reference the configureTelemetry function and telemetryConfig
object to locate the change and ensure behavior ties into the existing rotation
logic in the rotation code paths (e.g., where maxFiles is read). Also add a
regression test in test/telemetry.test.ts that sets maxFiles=1 and asserts
rotation still occurs (or that the config was clamped to >=2) to prevent future
regressions. Ensure the implementation handles concurrent updates safely if
configureTelemetry may be called at runtime.
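The clamping rule can be sketched as a pure function applied before assignment; the 512 MiB ceiling is an assumed "sensible upper bound", not a value taken from the codebase.

```typescript
const MIN_MAX_FILES = 2;
const MAX_FILE_SIZE_CAP = 512 * 1024 * 1024; // assumed upper bound

interface TelemetryConfigPatch {
  maxFiles?: number;
  maxFileSizeBytes?: number;
}

function clampTelemetryConfig(patch: TelemetryConfigPatch): TelemetryConfigPatch {
  const out = { ...patch };
  if (out.maxFiles !== undefined) {
    // Rotation needs at least the active file plus one rotated file.
    out.maxFiles = Math.max(MIN_MAX_FILES, Math.floor(out.maxFiles));
  }
  if (out.maxFileSizeBytes !== undefined) {
    // Must be positive; cap to avoid unbounded single-file growth.
    out.maxFileSizeBytes = Math.min(
      MAX_FILE_SIZE_CAP,
      Math.max(1, Math.floor(out.maxFileSizeBytes)),
    );
  }
  return out;
}
```

Applying the clamp inside `configureTelemetry` (rather than at the rotation call sites) keeps the stored config and the rotation logic consistent under concurrent runtime updates.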
In `@scripts/benchmark-runtime-path-helpers.mjs`:
- Around line 31-45: The assertion in assertSyncBenchmarkMergeResult assumes
reconciled.storage.accounts is an array and accesses .length / [0].refreshToken
directly; change the guard to explicitly verify reconciled?.storage and
Array.isArray(reconciled.storage.accounts) and that accounts.length >=
minimumAccounts before proceeding, and when checking expectedRefreshToken ensure
reconciled.storage.accounts[0] exists (or use the same array check) so malformed
storage triggers the same deterministic Error(`${caseName} failed`) rather than
a TypeError; update references in that function (minimumAccounts,
expectedRefreshToken, caseName) accordingly.
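A sketch of the explicit guard, written here in TypeScript for clarity even though the real helper is an `.mjs` module; the `Reconciled` shape is inferred from the review text rather than the actual benchmark types.

```typescript
interface Reconciled {
  storage?: { accounts?: Array<{ refreshToken?: string }> };
}

function assertMergeResult(
  reconciled: Reconciled | undefined,
  minimumAccounts: number,
  expectedRefreshToken: string,
  caseName: string,
): void {
  const accounts = reconciled?.storage?.accounts;
  // Explicit structural checks: malformed storage raises the same
  // deterministic error instead of a TypeError on .length or [0].
  if (
    !Array.isArray(accounts) ||
    accounts.length < minimumAccounts ||
    accounts[0]?.refreshToken !== expectedRefreshToken
  ) {
    throw new Error(`${caseName} failed`);
  }
}
```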
In `@test/authorization.test.ts`:
- Around line 39-86: Replace the manual single-key env save/restore in the three
tests that call getAuthorizationRole() and authorizeAction() with the existing
auth-env helpers so tests are deterministic: at the start of each test call
captureAuthEnv() (or clear all keys in AUTH_ENV_KEYS) to snapshot/clear all
authorization-related env vars, and in the finally block call restoreAuthEnv()
to restore them; update the tests around getAuthorizationRole and
authorizeAction to use captureAuthEnv/restoreAuthEnv (or explicitly clear
AUTH_ENV_KEYS) instead of only manipulating CODEX_AUTH_ROLE so
CODEX_AUTH_BREAK_GLASS and other ABAC keys can't leak and flip assertions.
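A minimal sketch of the capture/restore pattern the helpers implement; the `AUTH_ENV_KEYS` list here is illustrative and shorter than the project's real set.

```typescript
// Assumed subset of the real AUTH_ENV_KEYS list.
const AUTH_ENV_KEYS = ["CODEX_AUTH_ROLE", "CODEX_AUTH_BREAK_GLASS"] as const;

type AuthEnvSnapshot = Record<string, string | undefined>;

function captureAuthEnv(): AuthEnvSnapshot {
  const snapshot: AuthEnvSnapshot = {};
  for (const key of AUTH_ENV_KEYS) {
    snapshot[key] = process.env[key];
    delete process.env[key]; // clear so no auth key can leak into assertions
  }
  return snapshot;
}

function restoreAuthEnv(snapshot: AuthEnvSnapshot): void {
  for (const key of AUTH_ENV_KEYS) {
    if (snapshot[key] === undefined) delete process.env[key];
    else process.env[key] = snapshot[key];
  }
}
```

Calling `captureAuthEnv()` at the top of each test and `restoreAuthEnv()` in the `finally` block snapshots and clears every authorization key at once, so a stray `CODEX_AUTH_BREAK_GLASS` from another test cannot flip assertions.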
In `@test/benchmark-runtime-path-helpers.test.ts`:
- Around line 62-91: Add deterministic vitest cases that mirror the existing
EPERM retry test to also cover the other retryable Windows fs error codes EBUSY
and ENOTEMPTY for removeWithRetry: create two new it(...) tests that mock remove
to reject once with Object.assign(new Error("busy"), { code: "EBUSY" }) and once
with Object.assign(new Error("not empty"), { code: "ENOTEMPTY" }) respectively,
then resolve on the second call; inject a mocked sleep and assert remove was
called twice and sleep was called with 25; keep the existing non-retryable test
unchanged. Reference the removeWithRetry helper and the mocked remove and sleep
symbols used in the current tests when adding these cases.
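The retry contract those new cases should pin down can be sketched as below. This is a simplified single-retry model with injectable `remove` and `sleep`; the real helper in `scripts/benchmark-runtime-path-helpers.mjs` may retry more times or back off differently.

```typescript
// Windows fs error codes treated as retryable.
const RETRYABLE = new Set(["EPERM", "EBUSY", "ENOTEMPTY"]);

async function removeWithRetry(
  target: string,
  remove: (path: string) => Promise<void>,
  sleep: (ms: number) => Promise<void>,
): Promise<void> {
  try {
    await remove(target);
  } catch (error) {
    const code = (error as { code?: string }).code;
    if (code === undefined || !RETRYABLE.has(code)) throw error; // non-retryable: rethrow
    await sleep(25); // short bounded backoff before the single retry
    await remove(target);
  }
}
```

The mirrored test cases then reject once with `{ code: "EBUSY" }` or `{ code: "ENOTEMPTY" }`, resolve on the second call, and assert `remove` ran twice with `sleep(25)` in between.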
In `@test/config-save.test.ts`:
- Line 149: The current flaky assertion
expect(readAttempts).toBeGreaterThanOrEqual(3) should be tightened to assert an
exact retry count to detect regressions; replace it with
expect(readAttempts).toBe(3) (or the precise number your retry policy
guarantees) so the test for the retry loop (the readAttempts variable) is
deterministic and fails fast on accidental extra retries.
In `@test/fetch-helpers.test.ts`:
- Around line 542-547: The test at handleSuccessResponseDetailed should be
hardened to use an SSE-framed payload (e.g., event-stream style "data: ...\n\n"
chunks and Content-Type "text/event-stream") instead of plain text so the
streaming branch in lib/request/fetch-helpers.ts:671 is exercised; update the
test in test/fetch-helpers.test.ts (the case "'returns undefined parsedBody for
streaming success responses in detailed mode'") to construct a Response that
mimics SSE framing and content-type, call
handleSuccessResponseDetailed(response, true), assert result.parsedBody is
undefined, and assert await result.response.text() equals the original
SSE-framed string.
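Constructing the SSE-framed `Response` for that test is straightforward with the standard `Response` constructor (available globally in Node 18+); the payload content is illustrative.

```typescript
// Illustrative SSE framing: "data: ..." lines terminated by blank lines.
const sseBody = 'data: {"delta":"hello"}\n\ndata: [DONE]\n\n';

function makeSseResponse(): Response {
  return new Response(sseBody, {
    status: 200,
    headers: { "Content-Type": "text/event-stream" },
  });
}
```

The test then passes this response to `handleSuccessResponseDetailed(response, true)`, asserts `parsedBody` is `undefined`, and asserts `await result.response.text()` round-trips the original framed string.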
In `@test/index.test.ts`:
- Around line 2242-2262: The getAuth test fixture is using real-looking secret
literals ("sk-live-sensitive-access-token-1234567890" and
"refresh-sensitive-token-1234567890"); replace those with clearly synthetic
tokens (e.g., "sk-test-0000000000" and "refresh-test-0000000000") inside the
getAuth async function so secret scanners won't be triggered, and keep the rest
of the test logic (plugin.auth.loader, sdk.fetch call, and telemetry assertions
using recordTelemetryEventMock) unchanged to preserve behavior and assertions
that telemetry does not contain the synthetic tokens.
In `@test/shutdown.test.ts`:
- Around line 163-200: The test misses the sub-minimum clamp regression for
positive values below 1000ms; update the test in shutdown.test.ts to include a
deterministic case for process.env.CODEX_AUTH_SHUTDOWN_TIMEOUT_MS = "1" (which
should clamp to 1000ms) by adding await runCase("1") alongside the existing
runCase("0"), runCase("-1"), and runCase("not-a-number") calls; this verifies
the behavior in lib/shutdown.ts that clamps positive values under 1000 to 1000,
while still using registerCleanup and runCleanup and the same fake timers
pattern.
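The clamp being verified can be sketched as a small resolver; the 5000ms default is an assumption, while the 1000ms floor comes from the review text.

```typescript
const DEFAULT_SHUTDOWN_TIMEOUT_MS = 5000; // assumed default
const MIN_SHUTDOWN_TIMEOUT_MS = 1000;

function resolveShutdownTimeout(raw: string | undefined): number {
  const parsed = Number(raw);
  // Non-numeric, zero, or negative input falls back to the default;
  // positive values below the floor clamp up to 1000ms.
  if (!Number.isFinite(parsed) || parsed <= 0) return DEFAULT_SHUTDOWN_TIMEOUT_MS;
  return Math.max(MIN_SHUTDOWN_TIMEOUT_MS, parsed);
}
```

The added `runCase("1")` is the case that distinguishes "clamp up" from "fall back": `"1"` must produce 1000, not the default.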
In `@test/telemetry-lock.test.ts`:
- Around line 44-49: The test uses a secret-looking fixture value in the call to
telemetry.recordTelemetryEvent (see telemetry.recordTelemetryEvent invocation) —
replace the token "sk-live-lock-test-token" with a neutral, non-secret
placeholder (e.g. "test-placeholder-token" or "placeholder-token") in the
details.token field so test data hygiene is preserved; ensure no other test
fixtures use real-looking secret patterns and keep the existing assertions and
deterministic vitest setup intact (do not mock out assertions or remove the
telemetry call).
In `@test/telemetry.test.ts`:
- Around line 16-33: The local removeWithRetry function duplicates shared logic;
delete this local definition in test/telemetry.test.ts and import the shared
helper from test/helpers/fs-retry.ts instead (use the exported removeWithRetry
helper from that module), update any usages to call the imported
removeWithRetry, and remove any now-unused local symbols like
RETRYABLE_REMOVE_CODES or direct fs.rm retry logic so the test reuses the
centralized retry behavior.
- Around line 305-330: The test uses secret-like fixture values
("sk-concurrent-token-...") which can trigger secret-scanning; update the
telemetry test to use clearly synthetic, non-secret token strings (e.g.
"concurrent-token-0000" or "token-<index>") when calling recordTelemetryEvent in
the concurrent write spec, keeping the same deterministic generation pattern so
configureTelemetry, recordTelemetryEvent, and getTelemetryLogPath behavior and
the final masking assertion remain unchanged; ensure the token string change is
applied only in this test fixture and that parsed.every(...) still checks for
masked tokens.
---
Outside diff comments:
In `@lib/accounts.ts`:
- Around line 900-917: The current code awaits this.pendingSave directly inside
the try so a rejection aborts scheduling the new debounced save; change the
await to swallow the prior rejection (e.g., await this.pendingSave.catch(() =>
undefined)) before creating this.pendingSave so a rejected prior save doesn't
jump to catch and prevent scheduling the new runBackgroundJobWithRetry; update
the code paths around pendingSave, runBackgroundJobWithRetry and saveToDisk
accordingly and add a vitest regression test (test/accounts-edge.test.ts) that
simulates a failing first pending save and verifies the second debounced trigger
still runs a new save attempt.
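The fix can be sketched with a stripped-down debouncer; `Saver`, `scheduleSave`, and `attempts` are illustrative names, not the real `lib/accounts.ts` API.

```typescript
class Saver {
  pendingSave: Promise<void> | undefined = undefined;
  attempts = 0;

  scheduleSave(run: () => Promise<void>): Promise<void> {
    const prior = this.pendingSave ?? Promise.resolve();
    // .catch(() => undefined) swallows a rejected prior save so the chain
    // always reaches the new save attempt instead of aborting scheduling.
    this.pendingSave = prior
      .catch(() => undefined)
      .then(() => {
        this.attempts += 1;
        return run();
      });
    return this.pendingSave;
  }
}
```

The regression test then makes the first save reject and asserts the second trigger still executes a fresh save attempt.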
In `@lib/storage.ts`:
- Around line 1110-1116: When migrating account storage in the block that checks
`if (normalized && storedVersion !== normalized.version)` avoid overwriting the
known revision/checksum with the pre-migration `rawChecksum`: after a successful
`persistMigration(normalized)` ensure the in-memory or to-be-persisted checksum
used later (the `rawChecksum` value written around the code that sets
`rawChecksum` at/near the later store step) is replaced with the post-migration
checksum (`normalized.checksum`) or cleared so it doesn’t reintroduce the old
checksum and cause a false ECONFLICT; update the code that writes `rawChecksum`
(the same symbol referenced at lines ~1161-1162) to respect the migrated
checksum state and add a vitest regression (e.g., test/storage.test.ts) that
performs migration + transaction persist to assert no ECONFLICT on the
subsequent persist and reference this test in the change.
---
Duplicate comments:
In `@lib/codex-manager.ts`:
- Around line 152-165: Move recordAuthAttempt(rateLimitKey) into the same
guarded pre-check block with checkAuthRateLimit(rateLimitKey) inside the
try/catch in codex-manager (so both pre-checks are atomic) and, inside the catch
that currently handles AuthRateLimitError, also catch errors thrown by
recordAuthAttempt (including filesystem EBUSY and HTTP 429-style errors) and
return a TokenResult-style failure object ({ type: "failed", reason: "unknown",
message: error.message }) instead of letting the error bubble and abort
multi-account flows; update the code paths around queuedRefresh(refreshToken) to
only run when the pre-check block succeeds. Add a vitest regression in
test/codex-manager-cli.test.ts that stubs recordAuthAttempt to throw an
AuthRateLimitError and an EBUSY-style error for one account and asserts other
accounts still proceed (i.e., their queuedRefresh runs), covering Windows EBUSY
and 429-like failures.
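The atomic pre-check shape can be sketched as below; the injected functions stand in for the real `checkAuthRateLimit`, `recordAuthAttempt`, and `queuedRefresh`, and the `TokenResult` variants are reduced to what the review describes.

```typescript
type TokenResult =
  | { type: "success"; token: string }
  | { type: "failed"; reason: string; message: string };

async function refreshWithGuards(
  rateLimitKey: string,
  checkAuthRateLimit: (key: string) => void,
  recordAuthAttempt: (key: string) => void,
  queuedRefresh: () => Promise<string>,
): Promise<TokenResult> {
  try {
    // Both pre-checks live in one guarded block so they fail atomically.
    checkAuthRateLimit(rateLimitKey);
    recordAuthAttempt(rateLimitKey); // may throw EBUSY/429-style errors
  } catch (error) {
    // Degrade to a failure result so one account's bookkeeping error
    // never aborts the surrounding multi-account flow.
    return { type: "failed", reason: "unknown", message: (error as Error).message };
  }
  return { type: "success", token: await queuedRefresh() };
}
```

`queuedRefresh` only runs when the pre-check block succeeds, which is the property the multi-account regression test locks in.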
In `@lib/file-lock.ts`:
- Around line 215-219: The flag released is being set before the actual release
call completes, which prevents retries on transient errors (EBUSY/EPERM); update
the async release closure in the returned locker (the release function that
calls releaseOwnedLock) so that released = true is assigned only after
releaseOwnedLock(path, metadata.ownerId) resolves successfully (and similarly
move the assignment in the synchronous variant that calls releaseOwnedLockSync),
propagate or rethrow errors so callers can retry on transient failures, and
add/adjust vitest cases (tests referencing replacement-race, release.*ebusy,
release.*eperm, stale in file-lock.test.ts) to assert retry behavior and that
logging does not leak tokens/emails when failures are logged.
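The ordering fix can be sketched with a reduced locker; `makeLocker` and its injected `releaseOwnedLock` are illustrative stand-ins for the real closure in `lib/file-lock.ts`.

```typescript
function makeLocker(releaseOwnedLock: () => Promise<void>) {
  let released = false;
  return {
    get released() {
      return released;
    },
    async release(): Promise<void> {
      if (released) return;
      // Errors propagate to the caller and `released` stays false, so a
      // transient EBUSY/EPERM failure can be retried on the next call.
      await releaseOwnedLock();
      released = true; // flip the flag only after a successful release
    },
  };
}
```

Setting the flag before the call (the current bug) would make every later retry a silent no-op even though the lock file was never removed.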
In `@test/file-lock.test.ts`:
- Around line 10-29: The test defines a local removeWithRetry that duplicates
the centralized helper; delete this local function and import and use the shared
helper instead (replace the local removeWithRetry with the exported helper from
the test helpers module), updating any calls in file-lock.test.ts to call the
imported helper; ensure the import name matches the helper's export (e.g.,
removeWithRetry or the actual exported identifier) and remove the
RETRYABLE_REMOVE_CODES constant which is now handled by the shared
implementation.
In `@test/refresh-guardian.test.ts`:
- Around line 74-80: The test currently conflates the "manager missing" path
(which exits before timing setup) with the "no enabled accounts" path (which
still executes the finally block and should update timing). Split the assertions
into two deterministic checks: for the missing-manager case (use
missingManagerGuardian.tick() and missingManagerGuardian.getStats()) assert
runs==0 and that timing fields remain at their initialized zero/unchanged
values; for the no-enabled-accounts case invoke the guardian path that reaches
the finally block (the code path in lib/refresh-guardian.ts that sets timing
when countedRun=false) and assert that lastTickDurationMs and related timing
fields are updated (e.g., >0 or non-zero) to prove the finally timing logic ran.
Ensure you reference and update the specific test invocations of
missingManagerGuardian.tick() and getStats(), and add a separate test or adjust
the existing one to explicitly exercise the countedRun=false branch so timing is
deterministically checked.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9cdd410c-afbd-4fc7-9568-e3f53e2786e9
📒 Files selected for processing (45)
.github/workflows/supply-chain.yml, docs/development/CONFIG_FIELDS.md, docs/reference/commands.md, docs/runbooks/operations.md, index.ts, lib/accounts.ts, lib/accounts/account-view.ts, lib/auth/auth.ts, lib/authorization.ts, lib/codex-manager.ts, lib/config.ts, lib/data-redaction.ts, lib/data-retention.ts, lib/file-lock.ts, lib/idempotency.ts, lib/request/response-handler.ts, lib/secrets-crypto.ts, lib/shutdown.ts, lib/storage.ts, lib/telemetry.ts, lib/unified-settings.ts, scripts/benchmark-runtime-path-helpers.mjs, scripts/benchmark-runtime-path.mjs, test/account-view.test.ts, test/accounts-edge.test.ts, test/auth.test.ts, test/authorization.test.ts, test/background-jobs.test.ts, test/benchmark-runtime-path-helpers.test.ts, test/codex-manager-cli.test.ts, test/config-save.test.ts, test/data-retention.test.ts, test/fetch-helpers.test.ts, test/file-lock.test.ts, test/idempotency.test.ts, test/index-retry.test.ts, test/index.test.ts, test/refresh-guardian.test.ts, test/retry-governor.test.ts, test/settings-hub-utils.test.ts, test/shutdown.test.ts, test/storage.test.ts, test/telemetry-lock.test.ts, test/telemetry.test.ts, test/tool-utils.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/retry-governor.test.ts, test/account-view.test.ts, test/accounts-edge.test.ts, test/background-jobs.test.ts, test/data-retention.test.ts, test/telemetry-lock.test.ts, test/shutdown.test.ts, test/settings-hub-utils.test.ts, test/telemetry.test.ts, test/file-lock.test.ts, test/tool-utils.test.ts, test/auth.test.ts, test/index-retry.test.ts, test/refresh-guardian.test.ts, test/authorization.test.ts, test/codex-manager-cli.test.ts, test/index.test.ts, test/config-save.test.ts, test/benchmark-runtime-path-helpers.test.ts, test/fetch-helpers.test.ts, test/idempotency.test.ts, test/storage.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/data-redaction.ts, lib/request/response-handler.ts, lib/accounts.ts, lib/data-retention.ts, lib/idempotency.ts, lib/file-lock.ts, lib/auth/auth.ts, lib/accounts/account-view.ts, lib/shutdown.ts, lib/secrets-crypto.ts, lib/authorization.ts, lib/config.ts, lib/unified-settings.ts, lib/storage.ts, lib/codex-manager.ts, lib/telemetry.ts
docs/**
⚙️ CodeRabbit configuration file
keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.
Files:
docs/reference/commands.md, docs/runbooks/operations.md, docs/development/CONFIG_FIELDS.md
🧬 Code graph analysis (28)
- test/retry-governor.test.ts (1)
  - lib/request/retry-governor.ts (1): decideRetryAllAccountsRateLimited (43-73)
- test/account-view.test.ts (2)
  - lib/accounts/account-view.ts (5): resolveActiveIndex (16-26), getRateLimitResetTimeForFamily (28-48), formatRateLimitEntry (50-60), formatRateLimitStatusByFamily (62-74), formatActiveIndexByFamilyLabels (76-85)
  - lib/prompts/codex.ts (1): MODEL_FAMILIES (65-71)
- test/background-jobs.test.ts (1)
  - lib/background-jobs.ts (2): runBackgroundJobWithRetry (71-117), getBackgroundJobDlqPath (67-69)
- test/data-retention.test.ts (2)
  - scripts/benchmark-runtime-path-helpers.mjs (1): removeWithRetry (9-29)
  - lib/data-retention.ts (2): RetentionPolicy (5-11), enforceDataRetention (119-146)
- test/telemetry-lock.test.ts (2)
  - scripts/benchmark-runtime-path.mjs (1): tempDir (228-228)
  - scripts/benchmark-runtime-path-helpers.mjs (1): removeWithRetry (9-29)
- test/telemetry.test.ts (1)
  - lib/telemetry.ts (6): getTelemetryConfig (294-296), configureTelemetry (290-292), recordTelemetryEvent (302-334), queryTelemetryEvents (336-356), getTelemetryLogPath (298-300), summarizeTelemetryEvents (358-387)
- test/file-lock.test.ts (1)
  - lib/file-lock.ts (1): acquireFileLock (176-238)
- lib/accounts.ts (3)
  - lib/storage.ts (3): AccountStorageV3 (34-34), saveAccounts (1389-1394), loadAccounts (1039-1041)
  - lib/utils.ts (1): sleep (65-67)
  - lib/background-jobs.ts (1): runBackgroundJobWithRetry (71-117)
- lib/data-retention.ts (1)
  - lib/runtime-paths.ts (3): getCodexLogDir (226-228), getCodexCacheDir (212-214), getCodexMultiAuthDir (166-199)
- test/tool-utils.test.ts (1)
  - lib/request/helpers/tool-utils.ts (1): cleanupToolDefinitions (89-122)
- lib/file-lock.ts (1)
  - lib/utils.ts (1): sleep (65-67)
- lib/auth/auth.ts (2)
  - lib/utils.ts (2): fetchWithTimeout (76-134), isAbortError (20-24)
  - lib/schemas.ts (1): safeParseOAuthTokenResponse (312-318)
- lib/accounts/account-view.ts (1)
  - lib/prompts/codex.ts (2): ModelFamily (54-59), MODEL_FAMILIES (65-71)
- test/auth.test.ts (1)
  - lib/auth/auth.ts (3): exchangeAuthorizationCode (131-208), REDIRECT_URI (12-12), refreshAccessToken (242-300)
- test/index-retry.test.ts (1)
  - index.ts (1): OpenAIAuthPlugin (4415-4415)
- test/refresh-guardian.test.ts (1)
  - lib/refresh-guardian.ts (1): RefreshGuardian (32-206)
- test/authorization.test.ts (2)
  - lib/authorization.ts (2): getAuthorizationRole (38-40), authorizeAction (107-145)
  - lib/audit.ts (2): getAuditConfig (63-65), configureAudit (59-61)
- scripts/benchmark-runtime-path.mjs (5)
  - lib/codex-cli/state.ts (1): clearCodexCliStateCache (545-549)
  - scripts/benchmark-runtime-path-helpers.mjs (2): removeWithRetry (9-29), assertSyncBenchmarkMergeResult (31-46)
  - lib/storage.ts (1): normalizeAccountStorage (944-1032)
  - lib/codex-cli/sync.ts (1): syncAccountStorageFromCodexCli (328-421)
  - lib/request/fetch-helpers.ts (2): handleErrorResponse (595-632), handleSuccessResponseDetailed (651-684)
- test/codex-manager-cli.test.ts (1)
  - lib/codex-manager.ts (1): runCodexMultiAuthCli (4703-4810)
- test/index.test.ts (1)
  - lib/prompts/codex.ts (1): MODEL_FAMILIES (65-71)
- test/benchmark-runtime-path-helpers.test.ts (3)
  - scripts/benchmark-runtime-path-helpers.mjs (4): assertSyncBenchmarkMergeResult (31-46), remove (14-14), sleep (5-7), removeWithRetry (9-29)
  - lib/utils.ts (1): sleep (65-67)
  - test/helpers/fs-retry.ts (1): removeWithRetry (5-22)
- test/fetch-helpers.test.ts (1)
  - lib/request/fetch-helpers.ts (2): handleSuccessResponseDetailed (651-684), handleErrorResponse (595-632)
- lib/authorization.ts (1)
  - lib/audit.ts (1): auditLog (123-153)
- lib/config.ts (3)
  - lib/schemas.ts (1): PluginConfigSchema (13-62)
  - lib/utils.ts (2): isRecord (11-13), sleep (65-67)
  - lib/file-lock.ts (1): acquireFileLock (176-238)
- test/idempotency.test.ts (1)
  - lib/idempotency.ts (2): checkAndRecordIdempotencyKey (110-149), getIdempotencyStorePath (106-108)
- lib/unified-settings.ts (3)
  - lib/runtime-paths.ts (1): getCodexMultiAuthDir (166-199)
  - lib/file-lock.ts (2): acquireFileLockSync (257-323), acquireFileLock (176-238)
  - lib/utils.ts (1): sleep (65-67)
- test/storage.test.ts (1)
  - lib/storage.ts (7): loadAccounts (1039-1041), AccountStorageV3 (34-34), saveAccounts (1389-1394), saveFlaggedAccounts (1578-1600), loadFlaggedAccounts (1529-1576), getStoragePath (626-631), rotateStoredSecretEncryption (1705-1768)
- lib/telemetry.ts (3)
  - lib/runtime-paths.ts (1): getCodexLogDir (226-228)
  - lib/logger.ts (1): getCorrelationId (137-139)
  - lib/file-lock.ts (1): acquireFileLock (176-238)
🪛 ast-grep (0.41.0)
test/index.test.ts
[warning] 903-906: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context:
```ts
new RegExp(
  `^${escapedLabel}: ${numberPattern}${suffixPattern}$`,
  "m",
)
```
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🪛 LanguageTool
docs/runbooks/operations.md
[uncategorized] ~68-~68: The official name of this software platform is spelled with a capital “H”.
Context: ...ion required checks remain aligned with .github/settings.yml. --- ## Failure Triage ...
(GITHUB)
🔇 Additional comments (35)
test/tool-utils.test.ts (1)
623-648: good regression coverage for non-finite normalization. this change closes the earlier gap by covering `NaN`, `+Infinity`, and `-Infinity`, and it verifies both flattening (`anyOf` removed) and json-safe coercion (`[null, null, null]`). citations: `test/tool-utils.test.ts:623`, `test/tool-utils.test.ts:635`, `test/tool-utils.test.ts:646`, `test/tool-utils.test.ts:647`, `lib/request/helpers/tool-utils.ts:88`. As per coding guidelines for `test/**`: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
test/refresh-guardian.test.ts (1)
570-596: good deterministic regression coverage for duration aggregates. `test/refresh-guardian.test.ts:570-596` cleanly validates `last`/`max`/`cumulative`/`avg` math against controlled timing inputs and maps well to `lib/refresh-guardian.ts:189-201`. this is strong protection against timing-metric regressions.
test/fetch-helpers.test.ts (2)
509-540: good coverage for non-streaming detailed-mode contracts. nice lock-in of parse behavior and 502 mapping in `test/fetch-helpers.test.ts:509` and `test/fetch-helpers.test.ts:530` for the conversion path in `lib/request/fetch-helpers.ts:650`.
584-592: good regression for non-json error parse debug path. the assertion in `test/fetch-helpers.test.ts:590` gives useful coverage for the fallback branch exercised by `lib/request/fetch-helpers.ts:594`.
test/config-save.test.ts (3)
76-150: good concurrency regression coverage for transient env-path locks. `test/config-save.test.ts:76-150` now proves the contention path ran and preserves existing keys after retries. this is the right kind of windows/concurrency regression guard.
152-230: solid deterministic coverage for sync lock retries and exhaustion fallback. `test/config-save.test.ts:152-230` explicitly validates exponential backoff timing and exhausted-retry fallback behavior under retryable lock errors. this materially reduces concurrency regression risk.
271-277: default concurrency expectation update is consistent. `test/config-save.test.ts:271-277` correctly locks the new default path at `parallelProbingMaxConcurrency = 5` when using unified settings.
scripts/benchmark-runtime-path-helpers.mjs (1)
9-29: retry helper looks good for transient fs cleanup paths. the bounded retries and exponential backoff are appropriate for windows-style lock contention and are exercised by tests (`test/benchmark-runtime-path-helpers.test.ts:62`, `test/benchmark-runtime-path-helpers.test.ts:79`).
scripts/benchmark-runtime-path.mjs (3)
38-50: good fix for process-wide mutation concurrency risk. serializing `process.env` and cache mutation through `withProcessStateMutationLock` is the right containment pattern for this benchmark path (`lib/codex-cli/state.ts:544`).
226-269: cleanup and teardown path is robust now. the setup/restore block is tight, and using retryable temp-dir cleanup reduces windows flakiness while preserving failure semantics (`test/benchmark-runtime-path-helpers.test.ts:62`).
313-326: sync benchmark guard now validates real merge behavior. seeding a stale token and asserting `changed` plus the repaired token closes the no-op false-positive hole against reconcile semantics (`lib/codex-cli/sync.ts:327`).
test/benchmark-runtime-path-helpers.test.ts (1)
8-60: token-repair regression coverage is strong and deterministic. these cases correctly gate unchanged reconcile outputs and stale-token non-repair paths, matching reconcile behavior expectations (`lib/codex-cli/sync.ts:327`).
.github/workflows/supply-chain.yml (2)
22-26: good hardening updates landed. immutable action pinning and the sca job timeout are in place and look correct. see `.github/workflows/supply-chain.yml:22` and `.github/workflows/supply-chain.yml:34`. Also applies to: 34-34, 38-42, 61-64, 76-76
33-67: windows regression tests already exist in ci.yml. supply-chain.yml runs ubuntu-only, which is correct for its purpose (vulnerability and license policy gates). windows regression testing for the pr scope is already covered in `.github/workflows/ci.yml:129` with windows-latest runners executing win32-specific tests at line 151: `test/runtime-paths.test.ts`, `test/file-lock.test.ts`, `test/quota-cache.test.ts`. no action needed.
> Likely an incorrect or invalid review comment.
lib/shutdown.ts (2)
30-48: bounded timeout handling for in-flight callers looks good. `lib/shutdown.ts:30-48` correctly applies the same timeout wrapper to in-flight cleanup waits, which prevents later callers from waiting unboundedly.
54-66: add regression test for synchronous re-entry from cleanup callbacks. `lib/shutdown.ts:54-66` already guards against re-entrancy correctly: `cleanupInFlight` is assigned before any cleanup function executes (the IIFE body is microtask-scheduled). but test coverage is missing: add a deterministic vitest case in `test/shutdown.test.ts` where a cleanup function calls `runCleanup()` synchronously and assert cleanup still executes as a single pass with the in-flight guard working.
test/shutdown.test.ts (1)
83-143: good deterministic timeout and in-flight concurrency coverage. `test/shutdown.test.ts:83-143` uses fake timers, controlled deferred cleanup resolution, and explicit restoration paths. this is solid regression coverage for timeout and concurrent-caller behavior.
test/retry-governor.test.ts (1)
4-157: good deterministic coverage for retry-governor decision branches. `test/retry-governor.test.ts:4-157` covers the core allow/deny reasons, clamp normalization, and absolute-ceiling behavior with stable assertions. this is solid regression coverage.
lib/data-redaction.ts (1)
48-82: cycle handling and plain-object traversal look correct. `lib/data-redaction.ts:48-82` uses a `WeakSet` for circular references and avoids recursing into non-plain objects via `isObjectRecord` from `lib/data-redaction.ts:22-28`. this is the right hardening direction for redaction safety.
test/background-jobs.test.ts (1)
114-136: good concurrency regression for dlq file-lock serialization. `test/background-jobs.test.ts:114-136` directly exercises parallel failing jobs and verifies serialized dlq persistence. this is exactly the kind of regression that catches windows/concurrency write races.
lib/accounts.ts (1)
834-862: good bounds clamping in conflict merge output. `lib/accounts.ts:834-862` now clamps `activeIndexByFamily` and `activeIndex` against `mergedAccounts` bounds, which hardens the ebusy/econflict merge path.
test/index-retry.test.ts (1)
318-374: good token-refresh race regression with deterministic call-count assertions. `test/index-retry.test.ts:318-374` verifies single-flight behavior at the plugin integration seam by asserting both `refreshAndUpdateTokenMock` and upstream refresh endpoint call counts under concurrent `sdk.fetch` calls.
test/idempotency.test.ts (1)
42-113: good deterministic coverage for ttl and duplicate-key race paths. the fake-timer ttl tests and the concurrent duplicate-key assertion are solid and directly exercise the critical lock-protected path (`test/idempotency.test.ts:42-113`, `lib/idempotency.ts:109-148`).
test/data-retention.test.ts (1)
81-161: nice windows-oriented retention regressions. the transient `EBUSY` retry case plus explicit `EPERM`/`EBUSY` rejection checks give strong, deterministic filesystem coverage (`test/data-retention.test.ts:81-161`).
test/settings-hub-utils.test.ts (1)
289-319: good regression coverage for retry-all absolute ceiling semantics. preview, metadata, and persisted reload checks are all covered, and lines 314-318 now verify actual on-disk state instead of only returned values (`test/settings-hub-utils.test.ts:289-319`).
lib/file-lock.ts (1)
94-133: ownership re-check before unlink is solid race hardening. the two-phase owner verification in `lib/file-lock.ts:94-133` and `lib/file-lock.ts:135-174` (lines 109-112 and 150-153) is a good mitigation for stale-handle delete races. Also applies to: 135-174
test/codex-manager-cli.test.ts (2)
396-500: good deterministic telemetry coverage. `test/codex-manager-cli.test.ts:396-500` and `test/codex-manager-cli.test.ts:512-569` (lines 401-403 and 537-559) add stable time control, windows path assertions, redaction assertions, and explicit telemetry await semantics. this is strong regression coverage. Also applies to: 512-569
1193-1339: timeout, pagination, rate-limit, and idempotency regressions are well scoped. `test/codex-manager-cli.test.ts:1193-1339` and `test/codex-manager-cli.test.ts:2515-2868` (lines 1254-1259, 2620-2669, and 2733-2818) cover min-timeout clamp, env override precedence, pagination cursor behavior, rate-limit continuation, and rotate-secrets idempotency/redaction paths. Also applies to: 2515-2868
test/storage.test.ts (2)
705-821: good concurrency regression depth for load/save paths. `test/storage.test.ts:705-821` and `test/storage.test.ts:853-889` (lines 758-821 and 883-889) validate cleanup scan dedup under concurrent `loadAccounts` and guard against stale overwrite conflicts. Also applies to: 853-889
1703-1800: windows and lock/rotation resilience coverage is strong. `test/storage.test.ts:1703-1800` and `test/storage.test.ts:2114-2184` (lines 1703-1730, 1761-1800, and 2132-2184) add targeted regressions for `EBUSY`/`EPERM` lock-release behavior, lock serialization, flagged rename retry, and rotation failure signaling. Also applies to: 2114-2184
lib/idempotency.ts (1)
51-62: idempotency store hardening looks good. `lib/idempotency.ts:51-62`, `lib/idempotency.ts:73-92`, and `lib/idempotency.ts:99-149` (lines 58-60, 73-83, and 133-145) properly handle malformed json recovery, windows rename retries, ttl normalization, and lock-serialized replay detection. Also applies to: 73-92, 99-104, 110-149
lib/data-retention.ts (1)
45-66: directory cleanup retry strategy is a good windows-safe addition. `lib/data-retention.ts:45-66` (lines 53-63) correctly adds bounded retry/backoff for `ENOTEMPTY`/`EBUSY`/`EPERM` around `rmdir`, which reduces flaky retention failures under concurrent writers.
lib/unified-settings.ts (2)
200-243: good fix for revision-check toctou. `lib/unified-settings.ts:200-243` and `lib/unified-settings.ts:273-313` (lines 207-212 and 280-285) now validate `expectedRevision` while holding the file lock. this closes the prior check-then-lock race. Also applies to: 273-313
367-414: conflict-retry flow is well implemented. `lib/unified-settings.ts:367-414` and `lib/unified-settings.ts:453-470` (lines 369-383 and 455-469) add bounded `ECONFLICT` retries for both plugin and dashboard saves, which is the right tradeoff for optimistic concurrency under contention. Also applies to: 453-470
lib/config.ts (1)
394-435: good hardening on config writes. the lock acquisition + revision check + retryable rename flow in `lib/config.ts:394-435` is a solid improvement for cross-process safety and windows transient fs contention handling.
```yaml
name: Supply Chain

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]
  schedule:
    - cron: "0 4 * * 1"

jobs:
```
🧹 Nitpick | 🔵 Trivial
add workflow concurrency to avoid overlapping supply-chain runs.
there is no concurrency guard, so push/schedule runs can overlap and duplicate expensive gates. add a workflow-level concurrency group. see `.github/workflows/supply-chain.yml:1`.
proposed patch
```diff
 name: Supply Chain
+
+concurrency:
+  group: supply-chain-${{ github.ref }}
+  cancel-in-progress: true
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
name: Supply Chain

concurrency:
  group: supply-chain-${{ github.ref }}
  cancel-in-progress: true

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]
  schedule:
    - cron: "0 4 * * 1"

jobs:
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/supply-chain.yml around lines 1 - 11, Add a workflow-level
concurrency block to supply-chain.yml to prevent overlapping runs: at the
top-level (before the jobs key) add a concurrency key with a unique group name
(e.g., "supply-chain-${{ github.ref }}") and set cancel-in-progress: true so a
new run cancels any in-flight run for the same branch/ref; ensure the
concurrency block is placed alongside name and on, not inside any job.
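For reference, the top of the workflow with the suggested concurrency block in place might look like the following (a sketch of the suggestion only — verify against the actual file before committing):

```yaml
name: Supply Chain

concurrency:
  group: supply-chain-${{ github.ref }}
  cancel-in-progress: true

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]
  schedule:
    - cron: "0 4 * * 1"
```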
```yaml
on:
  pull_request:
    branches: [main]
  push:
    branches: [main]
  schedule:
    - cron: "0 4 * * 1"
```
🧩 Analysis chain
🏁 Script executed:

```shell
cat -n .github/workflows/supply-chain.yml
```

Repository: ndycode/codex-multi-auth
Length of output: 2618
scope sca/sbom away from pull_request execution.
sca-and-license and sbom both run npm ci and npx on untrusted pr code. gate these two jobs to push/schedule only, keeping dependency-review on pr where it belongs. see .github/workflows/supply-chain.yml:3-9 for triggers and .github/workflows/supply-chain.yml:31-79 for the unguarded jobs.
proposed patch

```diff
   sca-and-license:
     name: SCA and License Gate
+    if: github.event_name != 'pull_request'
     runs-on: ubuntu-latest
     timeout-minutes: 15
@@
   sbom:
     name: Generate SBOM
+    if: github.event_name != 'pull_request'
     runs-on: ubuntu-latest
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/supply-chain.yml around lines 3 - 10, The sca-and-license
and sbom jobs are currently running for pull_request and execute npm/npx on
untrusted PR code; update the workflow so those two jobs (sca-and-license and
sbom) are gated to run only on push and schedule (not pull_request) while
leaving dependency-review running on pull_request; implement this by restricting
their job-level runs-on condition (e.g., add an if: github.event_name !=
'pull_request' or equivalent job-level trigger) or by splitting triggers so the
global on: keeps pull_request for dependency-review but sca-and-license and sbom
are scoped to push and schedule only.
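A sketch of the gated job layout under this suggestion (job names follow the review; the placeholder steps are not the real workflow):

```yaml
jobs:
  dependency-review:
    # stays on pull_request, where reviewing incoming dependency changes belongs
    runs-on: ubuntu-latest
    steps:
      - run: echo "dependency review steps elided"

  sca-and-license:
    # skip untrusted PR code; run only for push/schedule events
    if: github.event_name != 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - run: echo "SCA and license steps elided"

  sbom:
    if: github.event_name != 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - run: echo "SBOM steps elided"
```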
```yaml
      - name: Generate CycloneDX SBOM
        run: npx --yes @cyclonedx/cyclonedx-npm --output-file sbom.cdx.json --omit dev
```
🧩 Analysis chain
🏁 Script executed:
# Check the supply-chain workflow file
cat -n .github/workflows/supply-chain.yml | sed -n '65,80p'Repository: ndycode/codex-multi-auth
Length of output: 608
🏁 Script executed:
# Search for any other version pins in the workflow
rg "@cyclonedx" .github/workflows/ -A 2 -B 2Repository: ndycode/codex-multi-auth
Length of output: 404
🏁 Script executed:
# Check if there are any tests for sbom generation
fd -e "test\.*" -e "*.test.*" | xargs rg -l "sbom\|cyclone" 2>/dev/null || echo "no sbom tests found"Repository: ndycode/codex-multi-auth
Length of output: 87
🏁 Script executed:
# Check for any existing version constraints or lockfiles that might affect this
fd -e "package-lock.json" -e "package.json" -e "yarn.lock" | head -5Repository: ndycode/codex-multi-auth
Length of output: 50
🏁 Script executed:
# Search for any documentation or comments about sbom generation versioning
rg "cyclone|sbom" --type-list | head -5
rg -i "cyclone" . -A 3 -B 3 --type md --type yml 2>/dev/null | head -50Repository: ndycode/codex-multi-auth
Length of output: 457
🏁 Script executed:
# Check the full workflow structure and job configuration
head -n 90 .github/workflows/supply-chain.yml | cat -nRepository: ndycode/codex-multi-auth
Length of output: 2618
🏁 Script executed:
# Check if there are any concurrency or matrix configurations
rg -i "concurrency|matrix|strategy|os\:" .github/workflows/supply-chain.yml -B 2 -A 2Repository: ndycode/codex-multi-auth
Length of output: 196
🏁 Script executed:
# Check what versions of cyclonedx-npm are available and commonly used
web_search "cyclonedx-npm latest version pinning best practice supply chain"Repository: ndycode/codex-multi-auth
Length of output: 118
🏁 Script executed:
# Check if there are any other npm package pins in this workflow
rg "npx.*@.*@" .github/workflows/supply-chain.ymlRepository: ndycode/codex-multi-auth
Length of output: 50
🏁 Script executed:
# Look at the broader supply chain workflow context
head -n 30 .github/workflows/supply-chain.yml | cat -nRepository: ndycode/codex-multi-auth
Length of output: 967
🌐 Web query:
cyclonedx-npm current version best practices pinning npm package supply chain security
💡 Result:
Current version (cyclonedx-npm)
The latest tagged release of @cyclonedx/cyclonedx-npm is 4.2.0 (released Mar 3, 2026). [1]
Pin it explicitly, e.g.
- CI one-off: `npx --yes --package @cyclonedx/cyclonedx-npm@4.2.0 cyclonedx-npm --help` (or your args)
- Repo-local: `npm i -D @cyclonedx/cyclonedx-npm@4.2.0` (then run via `npx cyclonedx-npm ...`) [1]
Best practices: pinning + npm supply-chain security (practical checklist)
1) Prefer lockfile-based, reproducible installs
- Commit your lockfile and use `npm ci` in CI to install exactly what's in it (instead of re-resolving). [2]
- Keep version ranges in `package.json` tight where feasible; from a security perspective "narrower is better," and the lockfile should be committed. [2]
2) Reduce install-time attack surface
- Consider installing with lifecycle scripts disabled in hardened CI paths (then re-enable only where needed), e.g. `npm ci --ignore-scripts`. [3]
3) Enforce “known-good” dependency resolution
- Use policy tooling to ensure dependencies are pinned/come from allowed registries (lockfile linting / allowlists), and review diffs to `package-lock.json` like code. [3]
4) SBOM generation hygiene with cyclonedx-npm
- For maximum determinism, you can generate SBOMs from the lockfile (less dependent on what's present in `node_modules`) using `--package-lock-only`. [1]
- Enable SBOM validation (`--validate`) as part of CI to catch malformed output early. [1]
5) Continuous monitoring
- Run `npm audit` (or your SCA of choice) continuously and treat new advisories as events that trigger controlled updates (PRs that change both `package.json` + lockfile). [3]
Sources
[1] CycloneDX/cyclonedx-node-npm GitHub repository (releases + usage/options) (github.com)
[2] npm Blog: “AppSec POV on Dependency Management” (lockfiles; narrower pinning) (blog.npmjs.org)
[3] “npm-security-best-practices” (pinning; npm ci; --ignore-scripts; provenance/trusted publishing concepts) (github.com)
Citations:
- 1: https://github.com/CycloneDX/cyclonedx-node-npm
- 2: https://blog.npmjs.org/post/187496869845/appsec-pov-on-dependency-management.html?utm_source=openai
- 3: https://github.com/bodadotsh/npm-security-best-practices?utm_source=openai
pin @cyclonedx/cyclonedx-npm to explicit version for reproducible sbom generation.
the unpinned `npx --yes @cyclonedx/cyclonedx-npm` at .github/workflows/supply-chain.yml:73 can pull different versions across runs, breaking sbom determinism and supply-chain repeatability. pin to a known version like `@4.2.0` to guarantee consistent output.
note: no regression tests exist for sbom generation; consider adding a basic test that validates the generated sbom artifact format and required fields to catch version-induced breaks early.
windows edge cases and concurrency risks don't apply here (job runs on ubuntu-latest only, no matrix or concurrency config).
proposed patch

```diff
-      - name: Generate CycloneDX SBOM
-        run: npx --yes @cyclonedx/cyclonedx-npm --output-file sbom.cdx.json --omit dev
+      - name: Generate CycloneDX SBOM
+        run: npx --yes @cyclonedx/cyclonedx-npm@4.2.0 --output-file sbom.cdx.json --omit dev
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/supply-chain.yml around lines 72 - 73, The workflow step
"Generate CycloneDX SBOM" currently invokes an unpinned CLI via the command
`npx --yes @cyclonedx/cyclonedx-npm --output-file sbom.cdx.json --omit dev`;
pin that package to a specific released version (e.g., change to `@4.2.0`)
in the same command to ensure reproducible SBOMs, and update the "Generate
CycloneDX SBOM" step metadata in .github/workflows/supply-chain.yml accordingly;
additionally add a minimal CI job or test that runs the SBOM generation and
asserts the produced sbom.cdx.json contains required top-level fields (e.g.,
bomFormat/version) to catch future version-induced breaks.
| `retryAllAccountsRateLimited` | `true` |
| `retryAllAccountsMaxWaitMs` | `0` |
| `retryAllAccountsMaxRetries` | `Infinity` |
| `retryAllAccountsAbsoluteCeilingMs` | `0` |
update retry max-retries default to match runtime
the table around docs/development/CONFIG_FIELDS.md:64-65 still implies retryAllAccountsMaxRetries defaults to Infinity, but runtime default is 12 (lib/config.ts:44, lib/config.ts:137). please align the docs to prevent operator misconfiguration.
As per coding guidelines docs/**: keep README, SECURITY, and docs consistent with actual CLI flags and workflows.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/development/CONFIG_FIELDS.md` at line 65, The docs table lists
retryAllAccountsMaxRetries as defaulting to Infinity but runtime sets it to 12;
update the documentation entry for retryAllAccountsMaxRetries in
CONFIG_FIELDS.md to reflect the actual runtime default of 12 and ensure any
related fields (e.g., retryAllAccountsAbsoluteCeilingMs) remain accurate; locate
the doc row referencing retryAllAccountsMaxRetries and change its default value
to 12 so it matches the value set in lib/config.ts (where
retryAllAccountsMaxRetries is initialized).
- `CODEX_AUTH_ROLE=admin|operator|viewer`
- `CODEX_AUTH_BREAK_GLASS=1` only for emergency windows
document the new unset-role default explicitly.
line 45 lists allowed values, but it does not state that codex_auth_role now defaults to viewer when unset. this is an operator-facing behavior change and should be explicit in the runbook to avoid accidental privilege assumptions.
proposed docs patch

```diff
 2. Validate RBAC defaults for deployment environments:
-   - `CODEX_AUTH_ROLE=admin|operator|viewer`
+   - `CODEX_AUTH_ROLE=admin|operator|viewer` (defaults to `viewer` when unset)
    - `CODEX_AUTH_BREAK_GLASS=1` only for emergency windows
```

As per coding guidelines, docs/**: keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/runbooks/operations.md` around lines 45 - 46, The runbook list of env
vars omits that CODEX_AUTH_ROLE defaults to "viewer" when unset; update the
operations.md entry near the CODEX_AUTH_ROLE / CODEX_AUTH_BREAK_GLASS bullet
list to explicitly state "If CODEX_AUTH_ROLE is not set, it defaults to 'viewer'
(operator-facing behavior change)" and mention CODEX_AUTH_BREAK_GLASS is for
emergency windows only; also add a short upgrade note and mirror this default in
README and SECURITY docs and any npm scripts or CLI docs that reference auth
role behavior so all docs stay consistent.
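The viewer-by-default behavior this comment documents can be sketched roughly as below. `resolveRole` is a hypothetical name — the real lookup in the repository's `lib/` authorization module may differ:

```typescript
type AuthRole = "admin" | "operator" | "viewer";

// Hypothetical sketch: unset or unrecognized CODEX_AUTH_ROLE falls back
// to the least-privilege role instead of admin.
function resolveRole(raw: string | undefined): AuthRole {
  const normalized = raw?.trim().toLowerCase();
  if (normalized === "admin" || normalized === "operator" || normalized === "viewer") {
    return normalized;
  }
  return "viewer"; // least-privilege default when unset or invalid
}
```

The design point is that misconfiguration degrades to fewer privileges, never more.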
```ts
const getAuth = async () => ({
  type: "oauth" as const,
  access: "sk-live-sensitive-access-token-1234567890",
  refresh: "refresh-sensitive-token-1234567890",
  expires: Date.now() + 60_000,
  multiAccount: true,
});

try {
  const sdk = await plugin.auth.loader(getAuth, { options: {}, models: {} });
  const response = await sdk.fetch!("https://api.openai.com/v1/chat", {
    method: "POST",
    body: JSON.stringify({ model: "gpt-5.1" }),
  });
  expect(response.status).toBe(503);

  expect(recordTelemetryEventMock).toHaveBeenCalled();
  const serializedCalls = JSON.stringify(recordTelemetryEventMock.mock.calls);
  expect(serializedCalls).not.toContain("sk-live-sensitive-access-token-1234567890");
  expect(serializedCalls).not.toContain("refresh-sensitive-token-1234567890");
} finally {
```
replace secret-like fixture literals with clearly synthetic tokens.
test/index.test.ts:2244-2246 uses sk-live-...-style values. this can trip secret scanners and violates test hygiene constraints for mocked secrets.
proposed fix

```diff
 const getAuth = async () => ({
   type: "oauth" as const,
-  access: "sk-live-sensitive-access-token-1234567890",
-  refresh: "refresh-sensitive-token-1234567890",
+  access: "telemetry_access_token_fixture_1234567890",
+  refresh: "telemetry_refresh_token_fixture_1234567890",
   expires: Date.now() + 60_000,
   multiAccount: true,
 });
@@
-  expect(serializedCalls).not.toContain("sk-live-sensitive-access-token-1234567890");
-  expect(serializedCalls).not.toContain("refresh-sensitive-token-1234567890");
+  expect(serializedCalls).not.toContain("telemetry_access_token_fixture_1234567890");
+  expect(serializedCalls).not.toContain("telemetry_refresh_token_fixture_1234567890");
```

As per coding guidelines test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/index.test.ts` around lines 2242 - 2262, The getAuth test fixture is
using real-looking secret literals ("sk-live-sensitive-access-token-1234567890"
and "refresh-sensitive-token-1234567890"); replace those with clearly synthetic
tokens (e.g., "sk-test-0000000000" and "refresh-test-0000000000") inside the
getAuth async function so secret scanners won't be triggered, and keep the rest
of the test logic (plugin.auth.loader, sdk.fetch call, and telemetry assertions
using recordTelemetryEventMock) unchanged to preserve behavior and assertions
that telemetry does not contain the synthetic tokens.
```ts
it("defaults to 8s timeout for non-positive and non-numeric values", async () => {
  const originalTimeout = process.env.CODEX_AUTH_SHUTDOWN_TIMEOUT_MS;
  vi.useFakeTimers();
  try {
    const runCase = async (value: string) => {
      process.env.CODEX_AUTH_SHUTDOWN_TIMEOUT_MS = value;
      let resolved = false;
      let releaseCleanup: (() => void) | undefined;
      registerCleanup(
        () =>
          new Promise<void>((resolve) => {
            releaseCleanup = resolve;
          }),
      );
      const cleanupPromise = runCleanup().then(() => {
        resolved = true;
      });
      await vi.advanceTimersByTimeAsync(7_999);
      expect(resolved).toBe(false);
      await vi.advanceTimersByTimeAsync(1);
      await cleanupPromise;
      expect(resolved).toBe(true);
      releaseCleanup?.();
      await runCleanup();
    };

    await runCase("0");
    await runCase("-1");
    await runCase("not-a-number");
  } finally {
    if (originalTimeout === undefined) {
      delete process.env.CODEX_AUTH_SHUTDOWN_TIMEOUT_MS;
    } else {
      process.env.CODEX_AUTH_SHUTDOWN_TIMEOUT_MS = originalTimeout;
    }
    vi.useRealTimers();
  }
});
```
add the missing sub-minimum clamp regression ("1" -> 1000ms).
test/shutdown.test.ts:163-200 (Line 163-200) checks default fallback cases but misses lib/shutdown.ts:15 (Line 15), where positive values below 1000 are clamped up to 1000. add a deterministic case for "1".
proposed test update

```diff
-    const runCase = async (value: string) => {
+    const runCase = async (value: string, expectedMs: number) => {
       process.env.CODEX_AUTH_SHUTDOWN_TIMEOUT_MS = value;
       let resolved = false;
       let releaseCleanup: (() => void) | undefined;
@@
-      await vi.advanceTimersByTimeAsync(7_999);
+      await vi.advanceTimersByTimeAsync(expectedMs - 1);
       expect(resolved).toBe(false);
       await vi.advanceTimersByTimeAsync(1);
       await cleanupPromise;
       expect(resolved).toBe(true);
@@
-    await runCase("0");
-    await runCase("-1");
-    await runCase("not-a-number");
+    await runCase("0", 8_000);
+    await runCase("-1", 8_000);
+    await runCase("not-a-number", 8_000);
+    await runCase("1", 1_000);
```

as per coding guidelines, "test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/shutdown.test.ts` around lines 163 - 200, The test misses the
sub-minimum clamp regression for positive values below 1000ms; update the test
in shutdown.test.ts to include a deterministic case for
process.env.CODEX_AUTH_SHUTDOWN_TIMEOUT_MS = "1" (which should clamp to 1000ms)
by adding await runCase("1") alongside the existing runCase("0"), runCase("-1"),
and runCase("not-a-number") calls; this verifies the behavior in lib/shutdown.ts
that clamps positive values under 1000 to 1000, while still using
registerCleanup and runCleanup and the same fake timers pattern.
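The parsing behavior the test exercises (fall back to 8000ms for invalid or non-positive values, clamp small positive values up to a 1000ms floor) can be sketched as below. `resolveShutdownTimeoutMs` is an assumed name for the logic in `lib/shutdown.ts`, not a confirmed export:

```typescript
const DEFAULT_SHUTDOWN_TIMEOUT_MS = 8_000;
const MIN_SHUTDOWN_TIMEOUT_MS = 1_000;

// Assumed shape of the resolver: invalid or non-positive env values fall back
// to the default; small positive values are clamped up to the minimum floor.
function resolveShutdownTimeoutMs(raw: string | undefined): number {
  const parsed = Number(raw);
  if (!Number.isFinite(parsed) || parsed <= 0) return DEFAULT_SHUTDOWN_TIMEOUT_MS;
  return Math.max(parsed, MIN_SHUTDOWN_TIMEOUT_MS);
}
```

Under this shape, "1" resolves to 1000ms rather than 8000ms, which is exactly why the review asks for a dedicated clamp case alongside the default-fallback cases.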
```ts
await telemetry.recordTelemetryEvent({
  source: "plugin",
  event: "request.locked_write",
  outcome: "info",
  details: { token: "sk-live-lock-test-token" },
});
```
replace secret-looking token fixtures with neutral placeholders.
line 48 uses `sk-live-lock-test-token`, which looks like a real key pattern (test/telemetry-lock.test.ts:48). replace it with a neutral non-secret-like value to keep test data hygiene strict.
As per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/telemetry-lock.test.ts` around lines 44 - 49, The test uses a
secret-looking fixture value in the call to telemetry.recordTelemetryEvent (see
telemetry.recordTelemetryEvent invocation) — replace the token
"sk-live-lock-test-token" with a neutral, non-secret placeholder (e.g.
"test-placeholder-token" or "placeholder-token") in the details.token field so
test data hygiene is preserved; ensure no other test fixtures use real-looking
secret patterns and keep the existing assertions and deterministic vitest setup
intact (do not mock out assertions or remove the telemetry call).
```ts
async function removeWithRetry(
  targetPath: string,
  options: { recursive?: boolean; force?: boolean },
): Promise<void> {
  for (let attempt = 0; attempt < 6; attempt += 1) {
    try {
      await fs.rm(targetPath, options);
      return;
    } catch (error) {
      const code = (error as NodeJS.ErrnoException).code;
      if (code === "ENOENT") return;
      if (!code || !RETRYABLE_REMOVE_CODES.has(code) || attempt === 5) {
        throw error;
      }
      await new Promise((resolve) => setTimeout(resolve, 25 * 2 ** attempt));
    }
  }
}
```
🧹 Nitpick | 🔵 Trivial
prefer reusing the shared fs retry helper instead of local duplication.
test/telemetry.test.ts:16-33 redefines removeWithRetry. import the shared helper from test/helpers/fs-retry.ts to keep retry behavior consistent across suites.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/telemetry.test.ts` around lines 16 - 33, The local removeWithRetry
function duplicates shared logic; delete this local definition in
test/telemetry.test.ts and import the shared helper from
test/helpers/fs-retry.ts instead (use the exported removeWithRetry helper from
that module), update any usages to call the imported removeWithRetry, and remove
any now-unused local symbols like RETRYABLE_REMOVE_CODES or direct fs.rm retry
logic so the test reuses the centralized retry behavior.
```ts
it("serializes concurrent telemetry writes without dropping events", async () => {
  configureTelemetry({ maxFileSizeBytes: 1_000_000, maxFiles: 12 });
  const total = 30;
  await Promise.all(
    Array.from({ length: total }, (_, index) =>
      recordTelemetryEvent({
        source: "plugin",
        event: "request.concurrent",
        outcome: "info",
        details: {
          index,
          token: `sk-concurrent-token-${String(index).padStart(4, "0")}`,
        },
      }),
    ),
  );

  const raw = await fs.readFile(getTelemetryLogPath(), "utf8");
  const lines = raw.trim().split("\n");
  expect(lines).toHaveLength(total);

  const parsed = lines.map(
    (line) => JSON.parse(line) as { details?: Record<string, unknown> },
  );
  expect(parsed.every((entry) => entry.details?.token === "***MASKED***")).toBe(true);
});
```
avoid secret-like token literals in telemetry fixtures.
test/telemetry.test.ts:316-317 uses sk-concurrent-token-... fixture values. use clearly synthetic non-secret strings to avoid false positives in secret scanning.
proposed fix

```diff
       details: {
         index,
-        token: `sk-concurrent-token-${String(index).padStart(4, "0")}`,
+        token: `telemetry_token_fixture_${String(index).padStart(4, "0")}`,
       },
```

As per coding guidelines test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/telemetry.test.ts` around lines 305 - 330, The test uses secret-like
fixture values ("sk-concurrent-token-...") which can trigger secret-scanning;
update the telemetry test to use clearly synthetic, non-secret token strings
(e.g. "concurrent-token-0000" or "token-<index>") when calling
recordTelemetryEvent in the concurrent write spec, keeping the same
deterministic generation pattern so configureTelemetry, recordTelemetryEvent,
and getTelemetryLogPath behavior and the final masking assertion remain
unchanged; ensure the token string change is applied only in this test fixture
and that parsed.every(...) still checks for masked tokens.
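The `***MASKED***` assertion above implies key-based redaction of details before events reach disk. A minimal sketch of that idea follows — the real redaction logic in the repository is certainly more thorough, and `redactDetails`/`SENSITIVE_KEYS` are illustrative names:

```typescript
const SENSITIVE_KEYS = new Set(["token", "access", "refresh", "secret"]);

// Illustrative recursive masking of sensitive keys in telemetry details.
function redactDetails(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redactDetails);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([key, child]) =>
        SENSITIVE_KEYS.has(key) ? [key, "***MASKED***"] : [key, redactDetails(child)],
      ),
    );
  }
  return value;
}
```

Because masking is keyed on field names rather than value patterns, the test fixture's token string can be anything — which is precisely why clearly synthetic values are safe to use.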
Summary
authorizationleast-privilege default,file-lockrelease race hardening).What Changed
test/index-retry.test.ts.retryAllAccountsAbsoluteCeilingMsunlimited preview + category/label wiring.lib/quota-cache.tsnow omits unixmodeonwin32; added regression.lib/accounts.tsconflict recovery now retries onEBUSY(plus deterministic merge regression).lib/file-lock.tsnow re-checks ownership immediately before delete (async + sync paths), with explicit TOCTOU limitation note and replacement-race regression test.benchmark-runtime-pathsync-merge guard (stale precondition + changed/token repair assertions) and added helper regression tests.maxAvgMsrejection.removeWithRetryhelper totest/helpers/fs-retry.tsand reused in affected tests.CODEX_AUTH_ROLEfallback fromadmintoviewer(least privilege) and updated tests.Behavior Change
CODEX_AUTH_ROLEis unset, authorization now defaults toviewerinstead ofadmin.How To Test
npm run typechecknpm run lint:tsnpm run lint:scriptsnpm test -- test/settings-hub-utils.test.ts test/accounts-edge.test.ts test/index-retry.test.ts test/quota-cache.test.ts test/check-runtime-budgets-script.test.ts test/benchmark-runtime-path-helpers.test.ts test/authorization.test.ts test/config-save.test.ts test/file-lock.test.ts test/utils.test.ts test/network.test.tsRisk / Rollout Notes
CODEX_AUTH_ROLE, setCODEX_AUTH_ROLE=adminexplicitly.Update 2026-03-05 (commit
6277c85)What changed
`test/telemetry-lock.test.ts`, `test/config-save.test.ts`, `test/fetch-helpers.test.ts`, `test/auth.test.ts`, `test/idempotency.test.ts`, `test/data-retention.test.ts`, `test/codex-manager-cli.test.ts`.

How to test
- `npm run typecheck`
- `npm run lint:ts`
- `npm run lint:scripts`
- `npm test -- test/telemetry-lock.test.ts test/telemetry.test.ts test/data-retention.test.ts test/shutdown.test.ts test/storage.test.ts test/auth.test.ts test/account-view.test.ts test/accounts-edge.test.ts test/codex-manager-cli.test.ts test/benchmark-runtime-path-helpers.test.ts test/index.test.ts test/index-retry.test.ts test/fetch-helpers.test.ts test/retry-governor.test.ts test/config-save.test.ts test/tool-utils.test.ts test/background-jobs.test.ts test/idempotency.test.ts test/settings-hub-utils.test.ts test/data-redaction.test.ts`

Risk / rollout notes
note: greptile review for oc-chatgpt-multi-auth. cite files like
`lib/foo.ts:123`. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary
This PR lands a focused production-hardening batch: least-privilege auth default (`viewer`), file-lock TOCTOU double-read hardening, `refreshAccessToken` timeout floor guard, `EBUSY` conflict recovery in account persistence, win32 `mode` omission in quota-cache writes, and extraction of a shared `removeWithRetry` test helper, all with targeted regression tests.

- `CODEX_AUTH_ROLE` unset now resolves to `"viewer"` instead of `"admin"`, closing a silent privilege escalation on unconfigured installs. Tests updated correctly.
- `refreshAccessToken` now uses the shared `resolveExchangeTimeoutMs` guard, preventing 0 / negative / non-finite callers from silently killing token rotation.
- `mode: 0o600` is correctly skipped on win32 in `quota-cache.ts`; account saves retry on `EBUSY`/`ECONFLICT` with a disk-merge strategy to survive antivirus-induced lock contention.
- `test/helpers/fs-retry.ts` was extracted, but `test/file-lock.test.ts` was not updated to import from it; it still carries a local duplicate, defeating the stated refactor goal.
- `mergeIntoLatestStorage` in `lib/accounts.ts` (lines 834–862) is indented one extra level, making nesting visually ambiguous despite being brace-correct; this may be caught by the biome formatter on CI.

Confidence Score: 4/5
The only open issues are the duplicate `removeWithRetry` in `test/file-lock.test.ts` and the indentation drift in `lib/accounts.ts`. Re-check `test/file-lock.test.ts` (duplicate helper) and `lib/accounts.ts` lines 834–862 (indentation drift in `mergeIntoLatestStorage`).

Important Files Changed
- Default changed from `admin` to `viewer` on line 31; the fallback for invalid values already returned `viewer`, so both paths are now consistently least-privilege. Tests updated accordingly.
- `refreshAccessToken` now routes its timeout through `resolveExchangeTimeoutMs`, clamping 0 / negative / non-finite values to a 1 000 ms floor; this matches the guard already present on `exchangeAuthorizationCode`.
- The `mergeIntoLatestStorage` and `persistStorageWithConflictRecovery` methods add `EBUSY`/`ECONFLICT` retry with disk-merge. The logic is correct, but the second half of `mergeIntoLatestStorage` (lines 834–862) has inconsistent extra indentation that makes nesting visually ambiguous and may be caught by the biome formatter.
- `mode: 0o600` is correctly omitted on win32 (Windows does not support POSIX permission bits and setting them can cause `EINVAL`). The temp-rename write path and file-lock acquisition look solid.
- Still uses a local `removeWithRetry` copy instead of importing from the newly extracted `test/helpers/fs-retry.ts`, defeating the stated refactor goal.
- Adds `RETRYABLE_REMOVE_CODES` and `removeWithRetry` with correct exponential backoff; the only issue is that `test/file-lock.test.ts` was not updated to consume it.
- Uses `"viewer"` as the default role; correctly exercises the least-privilege default, invalid-role fallback, break-glass bypass, ABAC read-only mode, deny-actions/commands, and interactive/idempotency-key guards.

Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[saveToDisk] --> B[buildStorageSnapshot]
    B --> C[persistStorageWithConflictRecovery]
    C --> D{saveAccounts}
    D -->|success| E[return]
    D -->|ECONFLICT / EBUSY| F{attempt < maxAttempts?}
    F -->|no| G[throw error]
    F -->|yes| H[loadAccounts from disk]
    H --> I[mergeIntoLatestStorage\nlatest + local]
    I --> J[sleep 20ms * 2^attempt]
    J --> D
    subgraph mergeIntoLatestStorage
        K[start with latest.accounts] --> L[claimIndex by token / accountId / email]
        L -->|found| M[mergeStoredAccount\nboth spread + rate-limit union]
        L -->|not found| N[append new account]
        M --> O[resolve activeIndexByFamily\nby local token lookup\nfall back to latest index]
        N --> O
    end
```

Last reviewed commit: `6277c85`
Context used:

- dashboard: "Every code change must explain how it defends against Windows filesystem concurrency bugs and ..." (source)