Skip to content

perf: performance & efficiency sweep (request/storage/refresh hot paths)#38

Open
ndycode wants to merge 12 commits intomainfrom
perf/efficiency-sweep
Open

perf: performance & efficiency sweep (request/storage/refresh hot paths)#38
ndycode wants to merge 12 commits intomainfrom
perf/efficiency-sweep

Conversation

@ndycode
Copy link
Owner

@ndycode ndycode commented Mar 4, 2026

Summary

  • Optimize request/response and sync hot paths to reduce parse/copy/reconcile overhead.
  • Bound fan-out in email hydration and proactive refresh to reduce burst pressure.
  • Optimize storage normalization and stale backup cleanup behavior on load path.
  • Add runtime benchmark coverage and a CI regression gate for performance budgets.
  • Add timing/throughput instrumentation for hydration and refresh guardian paths.

Commits

  • ef16e32 perf: optimize hot request and sync paths
  • bca2742 perf: reduce hot-path fanout and add perf gate
  • 05cb605 perf: add refresh and hydration timing metrics
  • 526f261 test: harden perf metrics coverage for review quality

Validation

  • npm run lint
  • npm run typecheck
  • npm test (100 files, 2333 tests passed)
  • npm run perf:runtime:ci (benchmark + budget gate passed)

Notes

  • External behavior preserved; changes focus on scheduling, deduplication, parsing efficiency, and observability.
  • Branch includes only perf/test/CI changes for this sweep.

Thread Resolution Update (2026-03-05)

What changed

  • Added plugin-level in-flight auth refresh deduping in index.ts so concurrent sdk.fetch paths do not fan out duplicate refresh work.
  • Hardened lib/refresh-guardian.ts account resolution to avoid wrong-account mutations:
    • token map now tracks all candidates
    • duplicate-token cases are disambiguated by stable identity (accountId/email)
    • no-refresh-token paths fall back to snapshot index resolution
  • Added Codex CLI sync regression to verify refresh/email re-key updates do not insert duplicate accounts on follow-up snapshots (test/codex-cli-sync.test.ts).
  • Strengthened concurrent retry regression by removing mock-side coalescing and asserting orchestration-level dedupe (test/index-retry.test.ts).
  • Made storage in-flight cleanup test deterministic with vi.waitFor (test/storage.test.ts).
  • Added new refresh guardian regressions for duplicate-token disambiguation and no-refresh-token fallback (test/refresh-guardian.test.ts).

How to test

  • npx vitest run test/index-retry.test.ts test/refresh-guardian.test.ts test/codex-cli-sync.test.ts test/storage.test.ts
  • npm run lint
  • npm run typecheck

Risk / rollout notes

  • Risk level: low.
  • Changes are scoped to concurrency correctness and test determinism; no intended user-facing CLI/API behavior changes.

Thread Resolution Update (2026-03-05, follow-up)

What changed

  • index.ts: refresh dedup now requires a stable refresh-token key; no-refresh auth now refreshes without shared dedup key to prevent cross-token/account collisions.
  • lib/codex-cli/sync.ts: preserved existing ^GccountIdSource when snapshot includes accountId (unless manual source explicitly wins), avoiding silent inferred->token coercion.
  • lib/storage.ts: added
    esetRotatingBackupCleanupStateForTesting() and reset cleanup state when storage path context changes to prevent module-level cleanup-state leakage.
  • est/index-retry.test.ts: added deterministic regressions for no-refresh concurrent fan-out and failed-refresh dedup cleanup.
  • est/index.test.ts: added hydration metrics/concurrency-cap regression through deep-check auth flow.
  • est/codex-cli-sync.test.ts: force snapshot reread via clearCodexCliStateCache() before second sync and strengthened assertions.
  • est/storage.test.ts: reset rotating cleanup state around tests for isolation.

How to test

px vitest run test/index-retry.test.ts test/index.test.ts test/codex-cli-sync.test.ts test/storage.test.ts

pm run lint

pm run typecheck

Risk / rollout notes

  • No external API changes. Main runtime behavior change is safer refresh dedup semantics when refresh token is unavailable.
  • Storage cleanup-state reset is low-risk and scoped to path transitions/testing context.

Final Thread Resolution Update (2026-03-05)

What changed

  • Updated RefreshGuardian.tick metrics to record maxTickDurationMs across all ticks, including error/no-op paths.
  • Hardened stale rotating-backup cleanup in lib/storage.ts with transient unlink retries (EPERM/EBUSY/EAGAIN) and non-throttled re-scan behavior when transient retries are exhausted.
  • Added deterministic regressions for:
    • failed-tick duration accounting ( est/refresh-guardian.test.ts)
    • transient stale-artifact cleanup retry/exhaustion behavior ( est/storage.test.ts)
    • bounded hydration concurrency under partial failures ( est/index.test.ts)
  • Refactored repeated env teardown in hydration tests via a local
    estoreEnv helper.

How to test

px vitest run test/refresh-guardian.test.ts test/storage.test.ts test/index.test.ts

pm run lint

pm run typecheck

Risk / rollout notes

  • Risk level: low.
  • Changes are scoped to observability/correctness hardening of refresh metrics and backup-cleanup reliability under transient filesystem lock conditions.

Thread Resolution Update (2026-03-06)

What changed

  • lib/refresh-guardian.ts: switched duplicate-refresh-token candidate handling to stable account.index resolution (no live-array positional lookup), with index-safe fallback.
  • lib/storage.ts: transient readdir scan failures (EBUSY/EPERM/EAGAIN) now do not advance stale-artifact cleanup throttling.
  • lib/storage.ts: in-flight cleanup finalizer now guards against stale state writes after storage-path transitions.
  • test/refresh-guardian.test.ts: hardened duplicate-token and account-removal regressions to use non-positional getAccountByIndex behavior.
  • test/storage.test.ts: added deterministic in-flight path-transition cleanup isolation regression.
  • test/index.test.ts: hoisted shared restoreEnv and metricValue helpers to remove duplicate test helper blocks.

How to test

  • npx vitest run test/refresh-guardian.test.ts test/storage.test.ts test/index.test.ts
  • npm run lint
  • npm run typecheck
  • npm test

Risk / rollout notes

  • External behavior unchanged; fixes are scoped to account/index correctness, transient filesystem reliability, and test maintainability.
  • Risk level: low.

Follow-up Thread Hardening (2026-03-06)

What changed

  • lib/storage.ts: extended transient stale-artifact unlink retry classification to include EACCES (Windows access-lock path).
  • test/storage.test.ts: added deterministic regression does not throttle stale-artifact cleanup after exhausted EACCES unlink retries.

How to test

  • npx vitest run test/storage.test.ts -t "EACCES unlink retries|exhausted transient unlink retries"
  • npm run lint
  • npm run typecheck
  • npm test

Risk / rollout notes

  • Low risk reliability fix; no external behavior changes, only cleanup retry/throttle correctness under Windows lock conditions.

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

this pr is a well-scoped performance and correctness sweep across the hot request/storage/refresh paths. all previously-flagged thread issues (dedup key collision, accountIdSource coercion, maxTickDurationMs undercounting, cleanup-state leakage) are resolved and covered by new regressions.

key changes:

  • refreshAuthWithDedup in index.ts deduplicates concurrent refresh work by stable refresh-token key; skips dedup entirely when no refresh token is present, preventing cross-account token contamination
  • email hydration and proactive refresh fan-out is now bounded by a worker-pool (cap=4) instead of unbounded Promise.all, reducing burst load on the token endpoint
  • handleSuccessResponseDetailed / convertSseToJsonWithDetails / readBodyForErrorHandling eliminate redundant JSON.parse on both the success and error paths by threading the already-parsed body through to the caller
  • cleanupStaleRotatingBackupArtifacts is now throttled with a 30s per-path gate, in-flight dedup, and transient-retry unlinks (EPERM/EBUSY/EAGAIN/EACCES) — important for windows desktop antivirus lock scenarios
  • lib/codex-cli/sync.ts builds account indexes once per sync pass and updates them incrementally, replacing three O(n) passes per snapshot with O(1) lookups
  • lib/refresh-guardian.ts disambiguates duplicate-refresh-token candidates by stable accountId/email identity and falls back to sourceAccount.index, avoiding wrong-account mutations; tick timing metrics (last/max/cumulative/avg) are recorded on all paths including error ticks
  • custom cloneJsonRecord in tool-utils.ts replaces JSON.parse(JSON.stringify()) with a direct traversal (circular detection via WeakSet, correct array-undefined-to-null semantics)
  • ci perf gate added — npm run perf:runtime:ci runs benchmark + budget check on node 20.x; budgets are generous enough for ci variance while still catching major regressions

windows/token safety notes addressed by this pr:

  • transient fs lock retries for stale-backup cleanup
  • cleanup-state reset on storage path transitions guards against module-level state leakage across plugin instances
  • refresh dedup is strictly keyed on a stable refresh token; fallback cases skip the dedup map entirely, preventing access-token header collisions across accounts

Confidence Score: 4/5

  • safe to merge — all previous thread issues are addressed, behavior is preserved for external callers, and new regressions cover the critical concurrency paths
  • changes are correctness/perf hardening with no intended behavior change; all previously-flagged concurrency and token-safety issues have regression coverage. the one point deducted is for the broad module-scope inFlightAuthRefreshByToken map in index.ts being plugin-instance-scoped (not exported/resettable) which could be hard to isolate in future test scenarios, though it doesn't cause any current issue
  • No files require special attention

Sequence Diagram

sequenceDiagram
    participant C1 as Caller 1 (sdk.fetch)
    participant C2 as Caller 2 (sdk.fetch)
    participant D as refreshAuthWithDedup
    participant M as inFlightAuthRefreshByToken
    participant R as refreshAndUpdateToken

    C1->>D: auth (refresh="r")
    D->>M: get("r") → undefined
    D->>R: refreshAndUpdateToken(auth)
    activate R
    D->>M: set("r", refreshPromise)
    D-->>C1: awaiting refreshPromise

    C2->>D: auth (refresh="r")
    D->>M: get("r") → refreshPromise
    D-->>C2: awaiting same refreshPromise

    R-->>D: resolved OAuthAuthDetails
    deactivate R
    D->>M: delete("r") via finally
    D-->>C1: OAuthAuthDetails
    D-->>C2: OAuthAuthDetails (same result)

    Note over C1,C2: No refresh token case — each caller gets its own refresh, no dedup
Loading

Comments Outside Diff (1)

  1. lib/storage.ts, line 1245-1247 (link)

    EACCES missing from transient retry set — windows safety gap

    EACCES (access denied) is the most common error code returned by windows when antivirus or shell-extension hooks briefly hold a file handle during unlink. the current list only retries EPERM, EBUSY, EAGAIN.

    when EACCES hits, canRetry is false, the error is thrown immediately, and because err.transientExhausted = isTransientUnlinkCode("EACCES") evaluates to false, shouldAdvanceLastRunAt stays true. this means the cleanup window advances as if the run succeeded — the stale artifact is left on disk, and the next retry is gated behind the 30-second interval rather than being attempted immediately.

    contrast this with the correct EPERM/EBUSY behavior: after exhausting all 5 attempts, transientExhausted = trueshouldAdvanceLastRunAt = false → interval resets to 0 → next load path retries immediately.

    regression test that would catch this: mock fs.unlink to throw EACCES twice then succeed; assert shouldAdvanceLastRunAt is false after the exhausted run and that the artifact is eventually removed on the next call.

Last reviewed commit: 462cce6

ndycode and others added 4 commits March 4, 2026 11:14
- replace expensive tool-schema clone path with fast JSON-safe clone and lower-allocation cleanup loops
- eliminate repeated index map rebuilds and JSON.stringify comparisons in codex-cli account sync
- parse error bodies once in fetch helpers and thread parsed payload through rate-limit mapping
- add detailed non-stream success handling to avoid duplicate JSON parse in empty-response retry checks
- extend runtime benchmark harness with sync/error/non-stream cases and update tests/mocks for new helper exports

Co-authored-by: Codex <noreply@openai.com>
Bound expensive fan-out in account refresh/email hydration paths, cache index remaps in storage, and add runtime benchmark regression gating in CI.

- throttle stale rotating backup cleanup scans per storage path
- add bounded concurrency for proactive refresh and email hydration
- reduce refresh-guardian lookup churn with precomputed token->index mapping
- extend runtime benchmark coverage and add CI gate budgets

Co-authored-by: Codex <noreply@openai.com>
Instrument hydration and proactive refresh hot paths with duration and throughput counters, and surface guardian tick timing in runtime metrics output.

- track email hydration runs/candidates/refreshed/updated/failures/duration
- log proactive refresh batch timing with effective concurrency
- add guardian tick duration stats (last/max/avg)

Co-authored-by: Codex <noreply@openai.com>
Tighten metrics correctness and improve test coverage for the new performance instrumentation.

- keep guardian avg tick duration synchronized with cumulative/runs
- add deterministic guardian duration aggregate test
- assert codex-metrics output includes hydration/guardian timing lines

Co-authored-by: Codex <noreply@openai.com>
@coderabbitai
Copy link

coderabbitai bot commented Mar 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

adds a runtime perf gate and benchmarking tooling; exposes parsed response bodies via detailed handlers; introduces bounded worker pools for email hydration and proactive refresh; refactors account indexing for O(1) lookups; adds rotating backup cleanup throttling and retry unlink; improves circular-safe cloning; expands guardian/runtime timing and hydration metrics.

Changes

Cohort / File(s) Summary
ci & benchmarking
/.github/workflows/ci.yml, package.json, scripts/benchmark-runtime-path.mjs, scripts/check-runtime-budgets.mjs, scripts/runtime-bench-budgets.json
adds a node-20.x runtime performance regression gate to ci, three npm bench scripts, an async-capable benchmark runner, and a budget checker that fails ci when avgMs > maxAvgMs.
http response handling
lib/request/fetch-helpers.ts, lib/request/response-handler.ts, test/response-handler.test.ts, test/fetch-helpers.test.ts
introduces detailed success results ({ response, parsedBody }) via handleSuccessResponseDetailed and convertSseToJsonWithDetails; centralizes body parsing for error/rate-limit logic and updates tests to assert parsedBody.
email hydration & metrics
index.ts, test/index.test.ts, test/index-retry.test.ts
replaces serial hydration with a bounded concurrent worker pool (HYDRATE_EMAILS_MAX_CONCURRENCY), adds detailed hydration metrics and logging, dedupes token refreshes, and switches to handleSuccessResponseDetailed; tests updated for new metrics and retry/dedup behavior.
account indexing & sync
lib/codex-cli/sync.ts, test/codex-cli-sync.test.ts
adds AccountIndexes (byAccountId, byRefreshToken, byEmail) and helpers to build/update/clear them; refactors upsertFromSnapshot and active-index resolution to use indexes and structural snapshot comparisons.
proactive refresh & refresh guardian
lib/proactive-refresh.ts, lib/refresh-guardian.ts, test/proactive-refresh.test.ts, test/refresh-guardian.test.ts
exports DEFAULT_PROACTIVE_REFRESH_MAX_CONCURRENCY, changes refreshExpiringAccounts to a worker-pool model with maxConcurrency and per-batch outcomes, and records tick timing stats (last/max/cumulative/avg) in guardian state.
storage & cleanup
lib/storage.ts, test/storage.test.ts
adds rotating backup cleanup throttling with per-path in-flight state and exponential unlink retry; adds resetRotatingBackupCleanupStateForTesting() test helper and an index builder for O(1) key→index lookups used in normalization.
clone & schema utils
lib/request/helpers/tool-utils.ts
replaces naive JSON cloning with circular-safe cloneJsonValue/cloneJsonRecord using WeakSet; mirrors JSON.stringify semantics and tightens schema cleanup to own-property iteration and nullable handling.
benchmark harness integration
scripts/benchmark-runtime-path.mjs
extends harness with async benchmark cases, temporary codex-cli state setup/cleanup, and aggregates async benchmark results into the CI payload.

Sequence Diagram(s)

sequenceDiagram
    participant client as client
    participant sdk as sdk.fetch
    participant handler as handleSuccessResponseDetailed
    participant hydrator as hydrationPool
    participant storage as storage
    client->>sdk: make api request
    sdk->>handler: receive http response
    handler-->>sdk: { response, parsedBody }
    sdk->>hydrator: enqueue accounts (maxConcurrency = N)
    par workers
        hydrator->>handler: worker network call (uses parsedBody if present)
        hydrator->>storage: persist per-account updates and worker metrics
    end
    hydrator->>storage: merge worker metrics into runtime metrics
    storage-->>client: expose updated runtime metrics
Loading
sequenceDiagram
    participant codex as codex snapshots
    participant indexer as account indexer
    participant reconciler as upsertFromSnapshot
    participant storage as storage
    codex->>indexer: provide snapshots
    indexer->>indexer: createAccountIndexes(byAccountId,byRefreshToken,byEmail)
    indexer->>reconciler: upsertFromSnapshot(using indexes, now)
    alt new account
        reconciler->>indexer: indexAccount(...)
    else updated account
        reconciler->>indexer: updateAccountIndexes(...)
    end
    reconciler->>storage: persist changes if any
    storage-->>codex: return changed flag
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

flagged concerns

  • concurrency risks: worker pools update shared counters, metrics, and account state. review index.ts:line, lib/proactive-refresh.ts:line, and lib/refresh-guardian.ts:line. add locking or careful atomic increments where necessary.
  • missing regression tests: deduplicated token refresh, worker error isolation, and hydration ordering need explicit tests. add tests around test/index-retry.test.ts:line, test/index.test.ts:line, and test/proactive-refresh.test.ts:line.
  • windows edge cases: benchmark harness and tmp-dir operations use mkdtemp/rm and posix assumptions. validate scripts/benchmark-runtime-path.mjs:line and scripts/check-runtime-budgets.mjs:line on windows CI agents to avoid temp-dir and rm differences.
  • circular clone correctness: new cloning uses a WeakSet per call; ensure no cross-call retention and add unit tests for deep cyclic structures in lib/request/helpers/tool-utils.ts:line.
  • metrics accumulation guards: guardian tick averaging must guard divide-by-zero and ordering of last vs. avg updates. review lib/refresh-guardian.ts:line and validate with test/refresh-guardian.test.ts:line.
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed Title uses conventional commits format (perf:), is exactly 72 chars, and accurately summarizes the multi-area performance optimization sweep.
Description check ✅ Passed The PR description comprehensively covers the optimization scope, thread resolutions, hardened concurrency/token semantics, new regressions, validation status, and risk assessment.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch perf/efficiency-sweep

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/codex-cli/sync.ts`:
- Around line 160-163: The merge currently lets snapshot.accountId
unconditionally override current.accountId (see mergedAccountId and
mergedAccountIdSource), which discards user-pinned IDs when
current.accountIdSource === "manual"; change the merge so that if
current.accountIdSource === "manual" you keep current.accountId and
current.accountIdSource regardless of snapshot.accountId, otherwise use
snapshot.accountId and set mergedAccountIdSource appropriately; update the
variables mergedAccountId and mergedAccountIdSource logic to implement this
conditional, and add a Vitest regression that covers a manual-source accountId
plus a snapshot that would otherwise match refresh-token behavior to ensure the
manual id is preserved; ensure the new test references the merge behavior in
sync.ts and conforms to lib/** guidelines (auth rotation, windows IO,
concurrency) and mention affected tests in the change description.

In `@lib/refresh-guardian.ts`:
- Around line 192-200: The code updates this.stats.lastTickDurationMs
unconditionally which records durations for early no-op ticks; if you want
lastTickDurationMs to represent only counted runs, move the assignment of
lastTickDurationMs inside the countedRun branch alongside the other updates (the
block that updates this.stats.maxTickDurationMs, cumulativeTickDurationMs and
avgTickDurationMs) in the tick completion logic in refresh-guardian (so related
to countedRun, durationMs, and this.stats.*). Alternatively, if you intend to
keep lastTickDurationMs for all ticks, add a clarifying comment near the
unconditional assignment explaining it is intentionally tracking even
uncounted/no-op ticks.

In `@test/fetch-helpers.test.ts`:
- Around line 509-517: Add a deterministic CRLF-framed SSE regression to the
existing test for handleSuccessResponseDetailed: create an sseContent string
that uses "\r\n" line endings (e.g. "data: {...}\r\n\r\n") to simulate Windows
framing, construct a Response from that content with status 200, call
handleSuccessResponseDetailed(response, false), and assert that
result.parsedBody and await result.response.json() both equal the expected
parsed object (same expectations as the LF case) to ensure non-stream parsing
handles CRLF correctly.

In `@test/index-retry.test.ts`:
- Around line 32-35: Add a deterministic concurrent-regression test in
test/index-retry.test.ts that triggers sdk.fetch in parallel to prove token
refresh + retry deduplication: create a mocked transport using the existing
handleSuccessResponseDetailed helper and a controllable delayed
401->refresh->200 flow, then call Promise.all on many simultaneous sdk.fetch
calls (e.g., 10) and assert that the token refresh function (or refresh
endpoint) was invoked exactly once and that all fetches eventually succeed; use
vitest utilities (vi.useFakeTimers/advanceTimersByTime or explicit Promise-based
delays) to keep the test deterministic, do not mock real secrets, and include
concrete assertions for single refresh call and correct responses.

In `@test/index.test.ts`:
- Around line 753-756: Replace the loose toContain assertions on the metrics
snapshot (the asserts using result) with deterministic numeric-format checks:
use regex/toMatch to assert each metric line follows "MetricName:
<number(.fraction)?>" format, parse the numeric values from result and assert
exact expected counts/durations (integers or fixed decimal places) instead of
substring presence; add a parallel execution sub-case that performs the same
operation via Promise.all to exercise concurrent code paths and then assert the
aggregated metric equals the sum of individual runs, using vitest APIs so tests
remain deterministic and do not mock real secrets or skip assertions.

In `@test/response-handler.test.ts`:
- Around line 56-64: Add a deterministic vitest unit that mirrors the existing
success case for convertSseToJsonWithDetails but verifies the fallback: when the
SSE stream contains no "response.done" final event, ensure the returned
details.parsedBody is undefined and the returned details.response preserves the
raw SSE text (assert response.text() equals the original sseContent); use the
same Response and Headers setup as the existing test and keep assertions
deterministic and synchronous.

ℹ️ Review info
Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 631c418f-b245-4566-afdd-e17a7dc8ca7b

📥 Commits

Reviewing files that changed from the base of the PR and between d36b04f and 526f261.

📒 Files selected for processing (19)
  • .github/workflows/ci.yml
  • index.ts
  • lib/codex-cli/sync.ts
  • lib/proactive-refresh.ts
  • lib/refresh-guardian.ts
  • lib/request/fetch-helpers.ts
  • lib/request/helpers/tool-utils.ts
  • lib/request/response-handler.ts
  • lib/storage.ts
  • package.json
  • scripts/benchmark-runtime-path.mjs
  • scripts/check-runtime-budgets.mjs
  • scripts/runtime-bench-budgets.json
  • test/fetch-helpers.test.ts
  • test/index-retry.test.ts
  • test/index.test.ts
  • test/proactive-refresh.test.ts
  • test/refresh-guardian.test.ts
  • test/response-handler.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/refresh-guardian.test.ts
  • test/proactive-refresh.test.ts
  • test/fetch-helpers.test.ts
  • test/index.test.ts
  • test/index-retry.test.ts
  • test/response-handler.test.ts
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/proactive-refresh.ts
  • lib/storage.ts
  • lib/refresh-guardian.ts
  • lib/codex-cli/sync.ts
  • lib/request/helpers/tool-utils.ts
  • lib/request/response-handler.ts
  • lib/request/fetch-helpers.ts
🧬 Code graph analysis (9)
test/refresh-guardian.test.ts (1)
lib/refresh-guardian.ts (1)
  • RefreshGuardian (32-204)
test/proactive-refresh.test.ts (1)
lib/proactive-refresh.ts (3)
  • refreshExpiringAccounts (136-200)
  • DEFAULT_PROACTIVE_BUFFER_MS (21-21)
  • DEFAULT_PROACTIVE_REFRESH_MAX_CONCURRENCY (25-25)
lib/storage.ts (1)
lib/storage/migrations.ts (1)
  • AccountMetadataV3 (40-57)
test/fetch-helpers.test.ts (2)
scripts/benchmark-runtime-path.mjs (2)
  • result (313-315)
  • result (328-331)
lib/request/fetch-helpers.ts (1)
  • handleSuccessResponseDetailed (644-676)
test/index.test.ts (2)
scripts/benchmark-runtime-path.mjs (2)
  • result (313-315)
  • result (328-331)
scripts/test-model-matrix.js (1)
  • result (383-386)
scripts/benchmark-runtime-path.mjs (4)
lib/codex-cli/state.ts (1)
  • clearCodexCliStateCache (545-549)
lib/storage.ts (1)
  • normalizeAccountStorage (796-882)
lib/codex-cli/sync.ts (1)
  • syncAccountStorageFromCodexCli (349-442)
lib/request/fetch-helpers.ts (2)
  • handleErrorResponse (588-625)
  • handleSuccessResponseDetailed (644-676)
lib/codex-cli/sync.ts (3)
lib/storage.ts (2)
  • AccountMetadataV3 (26-26)
  • AccountStorageV3 (26-26)
lib/codex-cli/state.ts (1)
  • CodexCliAccountSnapshot (24-27)
lib/prompts/codex.ts (2)
  • ModelFamily (47-52)
  • MODEL_FAMILIES (58-64)
test/response-handler.test.ts (1)
lib/request/response-handler.ts (1)
  • convertSseToJsonWithDetails (65-139)
lib/request/fetch-helpers.ts (3)
lib/constants.ts (1)
  • HTTP_STATUS (19-26)
lib/request/response-handler.ts (1)
  • convertSseToJsonWithDetails (65-139)
lib/utils.ts (1)
  • isRecord (11-13)
🔇 Additional comments (15)
scripts/runtime-bench-budgets.json (1)

1-15: budget config looks reasonable.

the budgets include headroom for ci variance as noted. values range from 0.12ms to 7.0ms which seems appropriate for the described operations. no issues with the structure.

package.json (1)

62-64: ci benchmark scripts are well-structured.

the script chain ensures bench runs before gate validation. output to .tmp/perf/ follows project conventions. iterations=40 provides reasonable sample size for ci variance.

test/proactive-refresh.test.ts (2)

329-374: solid concurrency regression test.

the controlled release pattern at test/proactive-refresh.test.ts:340-355 properly verifies that maxConcurrency=2 is respected. tracking active and maxActive ensures the worker pool doesn't exceed bounds. result ordering assertion at line 373 confirms map key preservation.


460-463: constant assertion ensures api stability.

verifies the exported default matches expected value of 4 at test/proactive-refresh.test.ts:462.

lib/proactive-refresh.ts (2)

159-168: worker pool concurrency is safe.

the pattern at lib/proactive-refresh.ts:160-162 reads and increments nextIndex synchronously before any await, so no race exists. each worker claims a position atomically within a single js tick, then awaits the refresh. the shared index model is valid for bounded async concurrency.


193-197: timing instrumentation is clean.

lib/proactive-refresh.ts:193-197 logs only operational metrics (total, concurrency, durationMs) without leaking tokens or emails. good observability hygiene.

lib/storage.ts (3)

362-420: throttled cleanup with in-flight deduplication is sound.

lib/storage.ts:363-374 prevents redundant scans within 30s and reuses in-flight work. the finally block at lines 414-418 correctly updates lastRunAt only after completion. unlink failures at lines 389-395 handle ENOENT gracefully. windows EBUSY is not explicitly handled here, but the scan itself is read-only so EBUSY on readdir would fail gracefully.


762-772: buildAccountIndexByKey provides efficient O(1) lookups.

lib/storage.ts:762-772 replaces linear findIndex scans with a single map construction pass. defensive null check at line 768 handles sparse arrays. good perf optimization for normalization hot path.


837-870: index map lookups replace linear scans correctly.

lib/storage.ts:843-844 and lib/storage.ts:867-870 now use O(1) map lookups instead of findIndex. fallback to clampIndex at line 847 preserves behavior when key is not found. semantics are preserved while reducing complexity from O(n²) to O(n) in normalization.

lib/refresh-guardian.ts (1)

131-137: refresh token index map enables stable resolution.

lib/refresh-guardian.ts:131-137 builds a lookup map to resolve accounts by refresh token instead of by index. this correctly handles the index-shift scenario where accounts are added/removed between snapshot and resolution. defensive check at line 135 skips accounts without refresh tokens.

test/refresh-guardian.test.ts (2)

556-582: tick duration aggregate test is well-designed.

test/refresh-guardian.test.ts:567-571 mocks performance.now() to produce controlled durations: tick 1 = 15ms (25-10), tick 2 = 30ms (60-30). the expected values at lines 578-581 are correct:

  • lastTickDurationMs = 30 (most recent)
  • maxTickDurationMs = 30 (max of 15, 30)
  • cumulativeTickDurationMs = 45 (15 + 30)
  • avgTickDurationMs = 23 (45 / 2 rounded)

deterministic and exercises the stats aggregation logic thoroughly.


109-111: sanity assertions for duration metrics are appropriate.

the >= 0 checks at test/refresh-guardian.test.ts:109-111 (and similar at lines 181-183, 370-372, 532-534) ensure duration metrics don't go negative due to timing quirks. lightweight guards that catch regressions without over-specifying.

scripts/check-runtime-budgets.mjs (1)

1-92: budget validation script is solid.

scripts/check-runtime-budgets.mjs handles edge cases well:

  • line 30: filters null/undefined entries and validates name type
  • line 48: validates maxAvgMs is finite number
  • line 57: validates avgMs is finite
  • line 36: throws if no budget cases defined

one minor note: toFixed(6) at lines 65, 69 produces excessive precision for sub-millisecond values. toFixed(3) would be cleaner for microsecond-level reporting. not blocking.

test/fetch-helpers.test.ts (1)

11-11: good import wiring for the detailed success path.

this keeps the test module aligned with the exported helper surface.
refs: test/fetch-helpers.test.ts:11, lib/request/fetch-helpers.ts:643-675.

test/index.test.ts (1)

215-218: mock surface parity looks good.

including handleSuccessResponseDetailed here prevents drift between plugin tests and fetch-helper exports.
refs: test/index.test.ts:215-218, lib/request/fetch-helpers.ts:643-675.

Add explicit loadAccounts tests for rotating-backup cleanup concurrency controls:

- verify 30s cleanup interval throttling per storage path
- verify in-flight cleanup promise dedup across concurrent loadAccounts calls

This addresses the missing coverage gap called out by Greptile.

Co-authored-by: Codex <noreply@openai.com>
@ndycode
Copy link
Owner Author

ndycode commented Mar 4, 2026

Addressed Greptile’s remaining 4/5 gap in commit 917d936.

Added explicit Vitest coverage for lib/storage.ts cleanup concurrency controls in est/storage.test.ts:

  • verifies 30s interval throttling for stale rotating-backup cleanup
  • verifies in-flight cleanup promise dedup across concurrent loadAccounts() calls

Validation rerun:

  • npm run lint
  • npm run typecheck
  • npm test (100 files, 2335 tests)
  • npm run perf:runtime:ci

- preserve manual accountId on snapshot merge
- clarify refresh guardian last-tick metric semantics
- add CRLF/fallback/concurrency/metrics regression tests

Co-authored-by: Codex <noreply@openai.com>
@ndycode
Copy link
Owner Author

ndycode commented Mar 5, 2026

@coderabbitai review

Re-triggering review to clear the transient rate-limit status check failure.

@coderabbitai
Copy link

coderabbitai bot commented Mar 5, 2026

@ndycode: re-triggering the review now. will cover the new test/storage.test.ts concurrency coverage additions and the full sweep across all changed paths.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@ndycode
Copy link
Owner Author

ndycode commented Mar 5, 2026

Re-requesting CodeRabbit review: the current failure is CodeRabbit: Review rate limit exceeded while other checks are green.

@ndycode
Copy link
Owner Author

ndycode commented Mar 5, 2026

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Mar 5, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@ndycode
Copy link
Owner Author

ndycode commented Mar 5, 2026

Follow-up tracking: external blocker recorded in #44.

Current failing context remains:

  • CodeRabbit: Review rate limit exceeded

Actions already taken:

  • re-triggered @coderabbitai review
  • verified review threads are resolved

Next required step is external quota reset/increase, then rerun CodeRabbit.

@coderabbitai
Copy link

coderabbitai bot commented Mar 5, 2026

@ndycode: acknowledged — the "review rate limit exceeded" status is a quota constraint on the CodeRabbit side, not a code or ci failure. re-triggering the review now. if the quota hasn't reset yet this may show the same status, but once it clears the review will run automatically against the latest commits including the test/storage.test.ts concurrency coverage added in 917d936.

if you need to escalate the quota reset faster, the CodeRabbit FAQ and pricing pages have details on plan limits and support channels: https://docs.coderabbit.ai/faq/ and https://docs.coderabbit.ai/about/pricing/

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

No code changes; this commit exists to re-run external CodeRabbit status checks that previously failed due provider rate limits.\n\nCo-authored-by: Codex <noreply@openai.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/codex-cli/sync.ts`:
- Around line 100-108: The updateAccountIndexes flow currently clears entries
for the previous AccountMetadataV3 and then indexes the next, which can miss
re-keying when refresh token or email changes; update updateAccountIndexes to
explicitly remove any old keys tied to previous.refreshToken and previous.email
(use clearAccountIndexEntries(previous, index) and then indexAccount(indexes,
next, index)) and ensure indexAccount handles collisions by deduping existing
entries before inserting; add a vitest that mutates an account’s refresh token
and email, applies a snapshot keyed by the new values, and asserts no duplicate
account is inserted (covering updateAccountIndexes, clearAccountIndexEntries,
indexAccount), and extend tests/queue logic to simulate EBUSY and 429 retry
paths to validate the new re-keying under concurrent IO.

In `@lib/refresh-guardian.ts`:
- Around line 132-147: The token-only lookup (liveIndexByRefreshToken) can
collide or miss accounts and cause wrong-account mutations; change resolution in
refresh-guardian to first try a token-to-index map that stores arrays of indices
(liveIndexByRefreshToken: Map<string, number[]>) and disambiguate by matching a
stable key from the snapshot (e.g. sourceAccount.id or sourceAccount.uniqueId or
sourceAccount.email) against manager.getAccountByIndex(...) candidates, and if
no token exists or no candidate matches, fall back to resolving by the original
snapshot index via manager.getAccountByIndex(accountIndex) (or by a stable
id-to-index map if available). Update uses of liveIndexByRefreshToken, the loop
over refreshResults, and the logic around sourceAccount.refreshToken and
resolvedIndex to implement this fallback and de-duplication; add/adjust tests to
cover no_refresh_token paths, duplicate-refresh-token scenarios, and ensure logs
do not contain secret tokens.

In `@test/index-retry.test.ts`:
- Around line 258-271: The mock refresh handler refreshAndUpdateTokenMock
currently deduplicates calls by using a shared refreshInFlight promise, which
masks plugin-level concurrency bugs; change the mock so each call returns a new
promise that calls refreshEndpoint (i.e., remove the refreshInFlight caching
logic in refreshAndUpdateTokenMock) so concurrent invocations in the test (the
calls asserted in the block around lines 294-304) exercise real race behavior;
ensure the mock still assigns refreshed data into the passed auth object
(Object.assign(auth, refreshed)) but do not coalesce calls—let the plugin under
test be responsible for orchestration so the test deterministically fails when
refresh dedupe regresses.

In `@test/storage.test.ts`:
- Around line 753-816: The polling loop waiting for cleanupScanCount is flaky;
replace it with a deterministic wait such as vi.waitFor(() => cleanupScanCount
=== 1) (or increase timeout) before starting the second loadAccounts call so the
test reliably detects the in-flight scan; update the test around the
cleanupScanCount polling (the for loop between the firstLoad = loadAccounts()
and secondLoad = loadAccounts()), keeping the existing readdirSpy, scanGate,
releaseScan, and final restore behavior intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 59700bc5-7894-4f14-88de-8d9a15a5bb4e

📥 Commits

Reviewing files that changed from the base of the PR and between 526f261 and bfdfe6b.

📒 Files selected for processing (8)
  • lib/codex-cli/sync.ts
  • lib/refresh-guardian.ts
  • test/codex-cli-sync.test.ts
  • test/fetch-helpers.test.ts
  • test/index-retry.test.ts
  • test/index.test.ts
  • test/response-handler.test.ts
  • test/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/response-handler.test.ts
  • test/storage.test.ts
  • test/fetch-helpers.test.ts
  • test/index.test.ts
  • test/codex-cli-sync.test.ts
  • test/index-retry.test.ts
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/refresh-guardian.ts
  • lib/codex-cli/sync.ts
🧬 Code graph analysis (5)
test/response-handler.test.ts (1)
lib/request/response-handler.ts (1)
  • convertSseToJsonWithDetails (65-139)
test/storage.test.ts (1)
lib/storage.ts (2)
  • setStoragePathDirect (467-472)
  • loadAccounts (889-891)
test/fetch-helpers.test.ts (1)
lib/request/fetch-helpers.ts (1)
  • handleSuccessResponseDetailed (644-676)
test/index-retry.test.ts (1)
index.ts (1)
  • OpenAIAuthPlugin (4256-4256)
lib/codex-cli/sync.ts (3)
lib/storage.ts (2)
  • AccountMetadataV3 (26-26)
  • AccountStorageV3 (26-26)
lib/codex-cli/state.ts (1)
  • CodexCliAccountSnapshot (24-27)
lib/prompts/codex.ts (2)
  • ModelFamily (47-52)
  • MODEL_FAMILIES (58-64)
🪛 ast-grep (0.41.0)
test/index.test.ts

[warning] 757-760: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${escapedLabel}: ${numberPattern}${suffixPattern}$,
"m",
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (9)
test/storage.test.ts (2)

700-751: solid regression test for the 30s cleanup throttle.

test correctly advances Date.now past the 30s window and validates cleanup scans fire at expected boundaries. the spy setup is clean and properly intercepts only the storage directory.

one minor note: line 741 advances by 29_000 and line 743 by 1_001, which totals 30_001ms from the first call. this correctly crosses the 30s threshold. the assertion at line 750 expecting exactly 2 scans aligns with the throttle spec.


711-716: fake test credentials are appropriate here.

these fake tokens and emails (real-refresh-token, real-user@example.com) are clearly fixture data for testing purposes. no real secrets or pii are exposed. per coding guidelines, this is acceptable.

test/codex-cli-sync.test.ts (1)

656-705: good regression coverage for manual-id preservation on token-based merge.

test/codex-cli-sync.test.ts:656 (Line 656) through test/codex-cli-sync.test.ts:705 (Line 705) is deterministic and directly protects the auth-rotation merge path by asserting manual accountId/accountIdSource are preserved while token/email fields still refresh.

As per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.

lib/codex-cli/sync.ts (1)

160-169: manual account-id precedence is correctly preserved now.

lib/codex-cli/sync.ts:160 (Line 160) through lib/codex-cli/sync.ts:169 (Line 169) keeps local manual identity authoritative during snapshot merge, which avoids auth-rotation identity drift.

As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.

lib/refresh-guardian.ts (2)

192-202: duration metric semantics look good here.

lib/refresh-guardian.ts:192-202 now clearly documents why lastTickDurationMs includes no-op ticks while aggregate duration metrics stay gated behind counted runs. this closes the earlier ambiguity cleanly.


139-180: regression tests for no_refresh_token, duplicate tokens, and 429 handling already exist.

the tests you asked for are already in test/refresh-guardian.test.ts. here's what's covered:

  • no_refresh_token disabling behavior at test/refresh-guardian.test.ts:279-372: the "classifies failure reasons" test verifies that setAccountEnabled(1, false) is called for accountB when reason is no_refresh_token (line 346-347), and markAccountCoolingDown(accountB, 60_000, "auth-failure") is invoked (line 350).

  • duplicate token reconciliation at test/refresh-guardian.test.ts:221-277: the "resolves refreshed account using stable refresh token when indices shift" test proves that accounts are correctly mapped by refresh token even when array indices change. additionally, test/refresh-guardian.test.ts:584-654 tests account removal safety during tick.

  • 429 classification at test/refresh-guardian.test.ts:279-372: line 310 sends a 429 statusCode, and line 353 confirms it's classified as "rate-limit" via markAccountCoolingDown(accountC, 60_000, "rate-limit").

no additional test coverage is needed for these scenarios.

			> Likely an incorrect or invalid review comment.
test/fetch-helpers.test.ts (1)

509-528: good deterministic windows sse regression coverage.

test/fetch-helpers.test.ts:509-528 now validates both standard and crlf-framed payloads with strict parsedBody and response.json() assertions, which cleanly exercises the non-stream detailed path in lib/request/fetch-helpers.ts:643-675.

test/response-handler.test.ts (1)

56-77: detail-mode fallback contract is now covered correctly.

test/response-handler.test.ts:56-77 now asserts both success metadata extraction and the no-final-event fallback (parsedBody undefined + raw text preserved), aligned with lib/request/response-handler.ts:64-138.

test/index.test.ts (1)

749-840: metrics assertions are materially stronger and concurrency-aware.

test/index.test.ts:749-840 moves from loose substring checks to parsed numeric assertions and adds a parallel fetch case, giving a deterministic guardrail for metrics aggregation regressions under concurrent load.

Add deterministic regressions for sync/indexing and refresh races, harden refresh guardian account resolution, and move auth refresh deduping into plugin orchestration.

Co-authored-by: Codex <noreply@openai.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
index.ts (1)

841-844: ⚠️ Potential issue | 🟠 Major

hydrate save needs windows-safe retry/backoff.

this persistence is single-shot. transient ebusy/eperm on windows can fail hydration writes and drop refreshed token/email updates in this pass. wrap saveAccounts with a small bounded retry (same pattern already used for watcher path switches).

proposed fix
 if (changed) {
   storage.accounts = accountsCopy;
-  await saveAccounts(storage);
+  for (let attempt = 0; attempt < 3; attempt += 1) {
+    try {
+      await saveAccounts(storage);
+      break;
+    } catch (error) {
+      const code = (error as NodeJS.ErrnoException | undefined)?.code;
+      if ((code !== "EBUSY" && code !== "EPERM") || attempt === 2) throw error;
+      await new Promise((resolve) => setTimeout(resolve, 25 * 2 ** attempt));
+    }
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@index.ts` around lines 841 - 844, Wrap the call to saveAccounts(...) inside
the same bounded retry/backoff you use for watcher path switches: detect
transient Windows errors (e.g., EBUSY/EPERM), retry a small number of times with
a short exponential/backoff delay, and only throw after retries are exhausted;
update the block where storage.accounts = accountsCopy and await
saveAccounts(storage) to perform the guarded retry so hydration writes are
resilient to transient filesystem locks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@index.ts`:
- Around line 392-393: The current dedup key generation for in-flight refreshes
uses a truncated fallback `access:${auth.access.slice(0, 32)}` which can
collide; update the logic that builds refreshKey (where auth.refresh is checked
and refreshKey is used to look up inFlightAuthRefreshByToken) to use a
collision-safe identifier instead of a 32-char prefix — e.g., use the full
access token or a deterministic hash (sha256/base64) of auth.access (or include
another unique claim/id from the token) so distinct access tokens cannot map to
the same refreshKey.
- Around line 389-410: Add deterministic regression tests covering
refreshAuthWithDedup and email hydration concurrency: write tests that (1)
stub/mock refreshAndUpdateToken and call refreshAuthWithDedup concurrently with
the same OAuthAuthDetails to assert the returned value is the identical Promise
instance and that inFlightAuthRefreshByToken contains only one entry for the
refresh key while pending, (2) trigger a rejection from the mocked
refreshAndUpdateToken and assert the finally block removes the entry from
inFlightAuthRefreshByToken after the promise settles, and (3) exercise the
hydration worker path by bypassing or mocking the hydrateEmails skip (the
environment check around hydrateEmails) and by controlling worker tasks to
deterministically saturate HYDRATE_EMAILS_MAX_CONCURRENCY (4) assert
emailHydrationMaxConcurrencyUsed never exceeds HYDRATE_EMAILS_MAX_CONCURRENCY
and that metrics emailHydrationRuns, emailHydrationCandidates,
emailHydrationRefreshed, emailHydrationUpdated, emailHydrationFailures and
emailHydrationCumulativeDurationMs accumulate expected values; use injected
mocks/fake timers/promises to make all concurrency behavior deterministic and
locate targets by symbol names refreshAuthWithDedup, inFlightAuthRefreshByToken,
refreshAndUpdateToken, hydrateEmails, HYDRATE_EMAILS_MAX_CONCURRENCY and the
emailHydration* metric names.

In `@test/codex-cli-sync.test.ts`:
- Around line 263-289: The test is accidentally passing because
syncAccountStorageFromCodexCli may reuse a cached codex-cli snapshot; before
calling syncAccountStorageFromCodexCli the second time, explicitly clear the
codex-cli state cache or force a refresh so the rewritten file is re-read. Add a
step between the firstSync and secondSync that either calls your existing
cache-clear helper (e.g., clearCodexCliCache or resetCodexCliStateCache) or
calls syncAccountStorageFromCodexCli with a forceRefresh/disableCache option so
the secondSync actually reads the updated accounts JSON and exercises the
rekey-on-updated-snapshot path.

---

Outside diff comments:
In `@index.ts`:
- Around line 841-844: Wrap the call to saveAccounts(...) inside the same
bounded retry/backoff you use for watcher path switches: detect transient
Windows errors (e.g., EBUSY/EPERM), retry a small number of times with a short
exponential/backoff delay, and only throw after retries are exhausted; update
the block where storage.accounts = accountsCopy and await saveAccounts(storage)
to perform the guarded retry so hydration writes are resilient to transient
filesystem locks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 78c5a5c8-5de0-4aa4-8b80-be99714532a0

📥 Commits

Reviewing files that changed from the base of the PR and between bfdfe6b and 596f48f.

📒 Files selected for processing (6)
  • index.ts
  • lib/refresh-guardian.ts
  • test/codex-cli-sync.test.ts
  • test/index-retry.test.ts
  • test/refresh-guardian.test.ts
  • test/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/refresh-guardian.test.ts
  • test/storage.test.ts
  • test/codex-cli-sync.test.ts
  • test/index-retry.test.ts
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/refresh-guardian.ts
🧬 Code graph analysis (3)
test/storage.test.ts (1)
lib/storage.ts (2)
  • setStoragePathDirect (467-472)
  • loadAccounts (889-891)
test/codex-cli-sync.test.ts (3)
scripts/benchmark-runtime-path.mjs (2)
  • accountsPath (208-208)
  • current (292-292)
lib/storage.ts (1)
  • AccountStorageV3 (26-26)
lib/codex-cli/sync.ts (1)
  • syncAccountStorageFromCodexCli (356-449)
test/index-retry.test.ts (1)
index.ts (1)
  • OpenAIAuthPlugin (4272-4272)
🔇 Additional comments (6)
test/storage.test.ts (1)

700-751: good deterministic coverage for cleanup throttle and in-flight dedup.

the new regressions in test/storage.test.ts:700-816 validate both the 30s throttle window and concurrent in-flight dedup with explicit gating and assertions. this is the right shape for filesystem/concurrency hot-path protection.

Also applies to: 753-816

test/codex-cli-sync.test.ts (1)

729-779: manual accountid preservation coverage is strong.

test/codex-cli-sync.test.ts:729-779 correctly asserts that manual accountId and accountIdSource survive refresh-token matching while auth/email fields still update.

lib/refresh-guardian.ts (1)

137-181: account resolution + tick telemetry changes look good.

the new candidate disambiguation and fallback flow in lib/refresh-guardian.ts:137-181 materially lowers wrong-account mutation risk under duplicate refresh tokens, and the duration aggregation in lib/refresh-guardian.ts:227-237 is cleanly implemented with counted-run gating.

Also applies to: 227-237

test/refresh-guardian.test.ts (1)

279-343: excellent regression additions for duplicate-token and metric paths.

test/refresh-guardian.test.ts:279-380 covers the core concurrency/account-resolution hazards, and test/refresh-guardian.test.ts:659-685 locks down the new tick-duration aggregates with deterministic timing control.

Also applies to: 345-380, 659-685

test/index-retry.test.ts (1)

242-299: good concurrency regression for refresh dedup under retry pressure.

the new test at test/index-retry.test.ts:242-299 is the right race repro: parallel sdk.fetch calls, gated refresh completion, and strict single-refresh assertions.

index.ts (1)

2367-2379: parsed-body reuse is a good hot-path optimization.

using handleSuccessResponseDetailed and only parsing on-demand for empty-response retries keeps behavior intact while avoiding redundant clone+parse in the common path.

- skip refresh dedup when refresh token is absent to prevent key collisions
- preserve existing inferred accountIdSource when syncing Codex CLI snapshots
- add storage cleanup-state reset hook for test/plugin isolation
- add deterministic regressions for dedup cleanup, no-refresh fanout, hydration metrics, and sync cache refresh behavior

Co-authored-by: Codex <noreply@openai.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
index.ts (1)

754-843: ⚠️ Potential issue | 🔴 Critical

add explicit vitest for hydration worker pool 429 handling and bounded concurrency.

the hydration worker queue at index.ts:754-843 has zero test coverage. i found no tests for hydrateEmails, HYDRATE_EMAILS_MAX_CONCURRENCY, or emailHydration in the test directory. while queuedRefresh has 429 coverage (test/refresh-queue.test.ts:373), it doesn't exercise the multi-worker concurrency scenario that your bounded pool introduces. add a vitest that drives concurrent queuedRefresh calls within the worker pool and asserts the 429 backoff behavior doesn't leak into the next hydration batch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@index.ts` around lines 754 - 843, Add a vitest that targets the email
hydration worker pool by invoking the function that runs this block
(hydrateEmails or the exported function that processes accounts using
HYDRATE_EMAILS_MAX_CONCURRENCY) and mock queuedRefresh to simulate 429 responses
for some calls (then succeed on retry) and varied latencies; assert that
concurrency never exceeds HYDRATE_EMAILS_MAX_CONCURRENCY, that 429 backoffs are
retried without affecting other workers (no leak into subsequent batch), and
that runtimeMetrics (emailHydrationRefreshed, emailHydrationFailures,
emailHydrationUpdated, emailHydrationMaxConcurrencyUsed, etc.) reflect the
expected counts; use vitest mocks/fake timers to control timing and ensure
Promise.all completes deterministically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/storage.ts`:
- Around line 392-425: The cleanup loop that unlinks staleArtifacts (inside the
anonymous async IIFE updating rotatingBackupCleanupState) must add
transient-retry handling for fs.unlink failures with Windows transient codes
("EBUSY", "EPERM"): implement a small retry loop with limited attempts and short
backoff (e.g., exponential or fixed delay) around the unlink call in the for
(const staleArtifactPath of staleArtifacts) block, treat EBUSY/EPERM as
transient (retry) and only log and give up after attempts are exhausted; also
ensure lastRunAt in rotatingBackupCleanupState is not advanced to Date.now()
when transient unlink retries still failed—only set lastRunAt after all
artifacts are either removed or definitively failed with non-transient errors;
update the failure logging to include attempt count and error details;
add/adjust vitest unit tests to simulate EBUSY/EPERM transient failures on
fs.unlink and verify retries occur and lastRunAt behavior is correct (throttling
not advanced on transient-failure exhaustion).

In `@test/index.test.ts`:
- Around line 844-919: Add a new test that mirrors the existing "tracks
hydration worker cap..." case but simulates partial failures: set
mockStorage.accounts to 6 entries, vi.mocked(queuedRefresh) should resolve
successfully for 4 tokens and throw/return failure for 2 tokens (identify by
refreshToken or index), keep the same active/maxActive tracking to validate
bounded concurrency, call oauthMethod.authorize() and then await
plugin.tool["codex-metrics"].execute(), and assert metricValue(report, "Email
hydration failures") === 2 while also asserting "Email hydration
refreshed"/"updated" reflect 4 successes, "Email hydration candidates" is 6, and
both the reported "Email hydration max concurrency used" and local maxActive are
<=4; reuse promptLoginMode mocking and the same env var save/restore pattern as
in the original test to avoid side effects.
- Around line 903-917: Refactor the repeated env restore logic by extracting a
small helper (e.g., restoreEnv) that takes an env var name and its previous
value and either deletes the key or resets it; replace the three blocks that
reference previousVitestWorkerId/VITEST_WORKER_ID, previousNodeEnv/NODE_ENV, and
previousSkipHydrate/CODEX_SKIP_EMAIL_HYDRATE with calls to that helper to DRY up
the cleanup in the test teardown.

In `@test/storage.test.ts`:
- Around line 703-754: Add a deterministic vitest regression that injects a
transient Windows-style lock error into fs.unlink while exercising loadAccounts
cleanup logic: mock fs.unlink (via vi.spyOn or vi.stub) to throw an
EBUSY/EPERM-like error on the first call for the targeted stale artifact path
and succeed on subsequent calls, then trigger concurrent calls to loadAccounts
(or otherwise invoke the cleanup queue path used by the cleanup logic) to
reproduce the transient-lock scenario and assert the cleanup finishes without
unhandled rejection and the expected cleanup side-effects still occur; restore
the spy after the test. Use the existing test patterns (date mocking, readdir
spy) and reference loadAccounts and the unlink spy to locate the scope to
change.

---

Outside diff comments:
In `@index.ts`:
- Around line 754-843: Add a vitest that targets the email hydration worker pool
by invoking the function that runs this block (hydrateEmails or the exported
function that processes accounts using HYDRATE_EMAILS_MAX_CONCURRENCY) and mock
queuedRefresh to simulate 429 responses for some calls (then succeed on retry)
and varied latencies; assert that concurrency never exceeds
HYDRATE_EMAILS_MAX_CONCURRENCY, that 429 backoffs are retried without affecting
other workers (no leak into subsequent batch), and that runtimeMetrics
(emailHydrationRefreshed, emailHydrationFailures, emailHydrationUpdated,
emailHydrationMaxConcurrencyUsed, etc.) reflect the expected counts; use vitest
mocks/fake timers to control timing and ensure Promise.all completes
deterministically.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 500ad623-5563-4c44-ad02-65bb19bd1cb0

📥 Commits

Reviewing files that changed from the base of the PR and between 596f48f and 02337db.

📒 Files selected for processing (7)
  • index.ts
  • lib/codex-cli/sync.ts
  • lib/storage.ts
  • test/codex-cli-sync.test.ts
  • test/index-retry.test.ts
  • test/index.test.ts
  • test/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/storage.test.ts
  • test/index-retry.test.ts
  • test/codex-cli-sync.test.ts
  • test/index.test.ts
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/storage.ts
  • lib/codex-cli/sync.ts
🧬 Code graph analysis (5)
test/storage.test.ts (1)
lib/storage.ts (3)
  • resetRotatingBackupCleanupStateForTesting (46-51)
  • setStoragePathDirect (475-481)
  • loadAccounts (898-900)
index.ts (1)
lib/request/fetch-helpers.ts (2)
  • refreshAndUpdateToken (356-391)
  • handleSuccessResponseDetailed (644-676)
lib/storage.ts (1)
lib/storage/migrations.ts (1)
  • AccountMetadataV3 (40-57)
test/codex-cli-sync.test.ts (3)
scripts/benchmark-runtime-path.mjs (2)
  • accountsPath (208-208)
  • current (292-292)
lib/codex-cli/sync.ts (1)
  • syncAccountStorageFromCodexCli (356-449)
lib/codex-cli/state.ts (1)
  • clearCodexCliStateCache (545-549)
test/index.test.ts (2)
lib/refresh-queue.ts (1)
  • queuedRefresh (408-410)
lib/cli.ts (1)
  • promptLoginMode (185-289)
🪛 ast-grep (0.41.0)
test/index.test.ts

[warning] 759-762: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${escapedLabel}: ${numberPattern}${suffixPattern}$,
"m",
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 854-854: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(^${escapedLabel}: (\\d+)$, "m")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (13)
lib/storage.ts (1)

771-781: good o(1) remap path for active and family indices.

the buildAccountIndexByKey path at lib/storage.ts:771-781 and lookups at lib/storage.ts:846-853 / lib/storage.ts:876-877 remove repeated linear scans cleanly, and the remap behavior is covered by test/storage.test.ts:45-79 and test/storage.test.ts:974-993.

Also applies to: 846-853, 876-877

test/storage.test.ts (1)

21-22: good isolation guard for shared cleanup state.

using resetRotatingBackupCleanupStateForTesting in both hooks at test/storage.test.ts:36 and test/storage.test.ts:40 prevents cross-test leakage from the per-path in-memory queue state.

Also applies to: 33-41

index.ts (2)

389-413: refresh dedup cleanup logic looks correct.

index.ts line 389-413 correctly skips dedup when refresh is absent and clears inFlightAuthRefreshByToken entries in finally, so failed refreshes do not poison future calls.


2370-2382: parsedbody threading is integrated cleanly.

index.ts line 2370-2382 uses handleSuccessResponseDetailed first and only falls back to lazy body parsing when parsedBody is missing, which is the right hot-path behavior.

lib/codex-cli/sync.ts (2)

100-108: index rekey maintenance is solid.

lib/codex-cli/sync.ts line 100-108 and line 196-198 clear old keys before reindexing merged records, which keeps refresh/email/accountId lookups stable after key updates.

Also applies to: 196-198


160-169: manual account id preservation guard is correct.

lib/codex-cli/sync.ts line 160-169 keeps manual accountId ownership intact while still allowing token/email/access merges, which protects auth rotation identity semantics.

test/codex-cli-sync.test.ts (2)

218-292: the rekey regression is deterministic and exercises the real path.

test/codex-cli-sync.test.ts line 263-287 rewrites snapshot data, clears codex cli state cache, and asserts changed plus token/email updates, so this is no longer self-fulfilling.

As per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.


294-340: accountidsource preservation coverage is strong.

test/codex-cli-sync.test.ts line 294-340 and line 780-829 validate inferred/manual source preservation during snapshot merges and protect rotation identity invariants.

As per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Also applies to: 780-829

test/index-retry.test.ts (2)

242-299: concurrent dedup regression is now scoped to plugin orchestration.

test/index-retry.test.ts line 258-264 avoids mock-side coalescing, and line 287-299 asserts a single refresh path across 10 concurrent sdk.fetch calls.

As per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.


301-378: no-refresh fanout and failed-entry cleanup are both covered well.

test/index-retry.test.ts line 301-378 verifies no-refresh concurrent fanout behavior, and line 380-435 verifies failed dedup entries are cleaned so later retries re-execute refresh.

As per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Also applies to: 380-435

test/index.test.ts (3)

215-218: lgtm — mock aligns with the detailed handler contract.

the mock correctly returns { response, parsedBody: undefined } which matches the handleSuccessResponseDetailed signature used in lib/request/fetch-helpers.ts. parsedBody as undefined is appropriate for the default test path.


748-842: solid metrics regression test — addresses prior feedback about coarse assertions.

the readNumericMetric helper with exact format matching and the parallel execution via Promise.all at lines 814-821 will catch counter aggregation races. the exact increment check at line 835 (totalAfter - baselineTotal === parallelRuns) is the right approach.

re: static analysis ReDoS warning at lines 759-762 — this is a false positive in test context since escapedLabel escapes regex metacharacters and all labels are test-controlled string literals ("Total upstream requests", etc.).


853-858: static analysis ReDoS warning is a false positive here.

the metricValue helper at line 854 builds regex from escapedLabel which is always a test-controlled literal string passed directly in the assertions below (e.g., "Email hydration runs"). the escape function handles regex metacharacters. no risk in test context.

Capture refresh-guardian max tick duration across error/no-op ticks, add transient stale-artifact unlink retries with non-throttled re-scan on exhausted transient locks, and expand deterministic hydration/storage regressions requested in review.

Co-authored-by: Codex <noreply@openai.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/refresh-guardian.ts`:
- Around line 137-145: The current logic in getAccountsSnapshot handling (in
lib/refresh-guardian.ts) records loop index i into liveIndexByRefreshToken and
then calls manager.getAccountByIndex(...) using those array positions, which is
wrong for reordered or sparse snapshots; instead capture and store each
account's stable account.index (candidate.index) when populating
liveIndexByRefreshToken, update any downstream use of those vectors to call
manager.getAccountByIndex(accountIndex) with that stored account.index, and
replace the bounds check that compares sourceAccount.index against
liveAccounts.length with a validation against the account index space (or use
manager.hasAccountIndex/getAccountByIndex safely) so we never mutate the wrong
account; run/update the deterministic tests in test/refresh-guardian.test.ts to
cover reordered/sparse indices and ensure logging still avoids leaking
tokens/emails.

In `@lib/storage.ts`:
- Around line 458-470: The finally block can re-insert stale state after
setStoragePath()/setStoragePathDirect() cleared or replaced it; modify the
finally so it only updates rotatingBackupCleanupState for the same path if the
current state's inFlight still equals the cleanupPromise (i.e., read current =
rotatingBackupCleanupState.get(path) and only call set(...) when
current?.inFlight === cleanupPromise), using the same
path/cleanupPromise/shouldAdvanceLastRunAt symbols; apply this guard where the
finally writes the lastRunAt/inFlight so an older cleanup cannot overwrite a
cleared or swapped state.
- Around line 448-455: The current catch around the fs.readdir call advances
lastRunAt even for transient lock errors (EBUSY/EPERM/EAGAIN), suppressing
retries; change the error handling so that when (error as
NodeJS.ErrnoException).code is one of the transient codes (EBUSY, EPERM, EAGAIN)
you do not update lastRunAt and return/exit early (or rethrow) instead of
proceeding to the success path that sets lastRunAt; keep the existing handling
for ENOENT and non-transient errors (log/warn) but only update lastRunAt inside
the success/completion path of the scan function (or after confirming no
transient error), and add/adjust vitest coverage to assert that lastRunAt is not
advanced for EBUSY/EPERM/EAGAIN scenarios.

In `@test/index.test.ts`:
- Around line 849-855: Hoist the duplicate restoreEnv helper into the outer
describe("codex-metrics tool") scope so both test blocks reuse the same
function; remove the inline definitions at the two duplicate sites (the ones
currently defining restoreEnv) and update the tests to call the shared
restoreEnv; keep the signature (key: string, previous: string | undefined): void
and behavior (delete env when previous is undefined, otherwise restore) so
existing calls continue to work.
- Around line 860-865: The metric parsing logic is duplicated as metricValue in
two places (test/index.test.ts lines around 860–865 and 932–937); consolidate by
moving a single helper into the shared describe scope (or replace both uses with
the existing readNumericMetric helper) so both tests call the same function.
Create one helper (e.g., metricValue or readNumericMetric) at the describe-level
that takes (report: string, label: string) and reuses the existing escaped-label
RegExp logic, then update both test sites to call that single helper and remove
the duplicate definition.

In `@test/refresh-guardian.test.ts`:
- Around line 297-303: The test mock for manager.getAccountByIndex is positional
(returns snapshots[1][index]) which hides bugs where code resolves array
positions instead of actual account.index values; change the mock in
test/refresh-guardian.test.ts so getAccountByIndex(index) looks up accounts by
their index property in the current snapshot (e.g., find the entry where
acc.index === index within the snapshot returned by getAccountsSnapshot) to
force the code path in lib/refresh-guardian (the index-resolution logic around
the duplicate-token race) to use real account.index semantics and keep the
regression deterministic.

In `@test/storage.test.ts`:
- Around line 703-937: Add a deterministic regression that reproduces a
path-transition during an in-flight cleanup scan: when calling loadAccounts()
start a cleanup scan (spy/mock fs.readdir or fs.unlink to detect/hold the scan
using a Promise gate), then call setStoragePathDirect(...) to switch the storage
path while that first scan is still blocked, then release the gate and await
both the first and a subsequent loadAccounts() call; assert that cleanupScan
counts are tracked per-directory (the original dir got cleaned and the new dir’s
cleanup did not leak into the old dir) and that stale artifacts in the old path
are removed while the new path remains intact. Target the loadAccounts logic and
the cleanup code paths referenced (the rotating cleanup and stale-artifact
unlink/retry behavior) using the same spy/restore pattern (readdirSpy/unlinkSpy
+ release gate) and vitest waitFor to keep the test deterministic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 96db57de-7221-4ae3-9f01-e3e6683ddffd

📥 Commits

Reviewing files that changed from the base of the PR and between 02337db and 01d6305.

📒 Files selected for processing (5)
  • lib/refresh-guardian.ts
  • lib/storage.ts
  • test/index.test.ts
  • test/refresh-guardian.test.ts
  • test/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/refresh-guardian.test.ts
  • test/storage.test.ts
  • test/index.test.ts
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/refresh-guardian.ts
  • lib/storage.ts
🧬 Code graph analysis (4)
test/refresh-guardian.test.ts (1)
lib/refresh-guardian.ts (1)
  • RefreshGuardian (38-242)
test/storage.test.ts (1)
lib/storage.ts (3)
  • resetRotatingBackupCleanupStateForTesting (46-51)
  • setStoragePathDirect (520-526)
  • loadAccounts (943-945)
test/index.test.ts (2)
lib/refresh-queue.ts (1)
  • queuedRefresh (408-410)
lib/cli.ts (1)
  • promptLoginMode (185-289)
lib/storage.ts (1)
lib/storage/migrations.ts (1)
  • AccountMetadataV3 (40-57)
🪛 ast-grep (0.41.0)
test/index.test.ts

[warning] 759-762: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${escapedLabel}: ${numberPattern}${suffixPattern}$,
"m",
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 861-861: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(^${escapedLabel}: (\\d+)$, "m")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 933-933: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(^${escapedLabel}: (\\d+)$, "m")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (5)
test/index.test.ts (4)

215-218: lgtm — mock aligns with new detailed response handler.

the mock returns { response, parsedBody: undefined } which matches the signature exposed by lib/request/fetch-helpers.ts. consistent with the other response handling mocks in this file.


748-842: solid numeric assertions replace loose toContain checks — addresses prior feedback.

the readNumericMetric helper with explicit regex patterns and the parallel fetch aggregation test properly verify that metrics increment deterministically. the delta assertions (expect(totalAfter - baselineTotal).toBe(parallelRuns)) are exactly the kind of concurrency regression coverage the guidelines demand.

the static analysis flags ReDoS on the regex at test/index.test.ts:759-762, but since label values are hardcoded test strings (e.g., "Total upstream requests"), not external input, this is a false positive in the test context.


916-1017: good partial-failure regression — validates worker pool doesn't stall on errors.

this test directly addresses the prior review feedback requesting a case where 2 of 6 refreshes fail. the assertions at test/index.test.ts:1000-1003 confirm:

  • deltaRefreshed === 4 (successes)
  • deltaFailures >= 2 (failures)
  • deltaFailures === deltaCandidates - deltaRefreshed (accounting identity)

the maxActive <= 4 check at line 1011 ensures bounded concurrency holds even when some workers throw. this is the concurrency regression coverage the guidelines require.


844-914: hydration worker cap test is deterministic and correctly asserts bounded concurrency.

the active/maxActive tracking pattern at test/index.test.ts:874-888 accurately captures peak parallelism, and the dual assertion at lines 907-908 (reported metric + local counter) is the right approach per prior feedback.

env var cleanup in finally block at lines 909-913 is correct.

lib/storage.ts (1)

816-826: good move to keyed index remapping for active/family lookups.

the buildAccountIndexByKey path in lib/storage.ts:816-923 removes repeated linear scans during normalization and keeps behavior clear.

Also applies to: 891-923

ndycode added 2 commits March 6, 2026 00:42
Address PR #38 review findings with minimal behavior-safe changes:\n- map live refresh-token candidates by stable account.index\n- keep stale-artifact cleanup throttle unchanged on transient scan failures\n- guard cleanup finalizer to avoid stale in-flight state writes after path transitions\n- add regression tests for sparse/reordered index lookups and path-transition isolation\n- dedupe duplicated codex-metrics test helpers\n\nCo-authored-by: Codex <noreply@openai.com>
Treat EACCES as a transient unlink failure in rotating backup cleanup retry logic and add deterministic regression coverage to ensure exhausted EACCES retries do not advance cleanup throttling.\n\nCo-authored-by: Codex <noreply@openai.com>
@ndycode
Copy link
Owner Author

ndycode commented Mar 5, 2026

All previously open review threads are now addressed and resolved, including the follow-up Windows EACCES transient cleanup handling + regression tests in 462cce6. Requesting re-review for approval refresh.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant