perf: performance & efficiency sweep (request/storage/refresh hot paths)#38
perf: performance & efficiency sweep (request/storage/refresh hot paths)#38
Conversation
- replace expensive tool-schema clone path with fast JSON-safe clone and lower-allocation cleanup loops - eliminate repeated index map rebuilds and JSON.stringify comparisons in codex-cli account sync - parse error bodies once in fetch helpers and thread parsed payload through rate-limit mapping - add detailed non-stream success handling to avoid duplicate JSON parse in empty-response retry checks - extend runtime benchmark harness with sync/error/non-stream cases and update tests/mocks for new helper exports Co-authored-by: Codex <noreply@openai.com>
Bound expensive fan-out in account refresh/email hydration paths, cache index remaps in storage, and add runtime benchmark regression gating in CI. - throttle stale rotating backup cleanup scans per storage path - add bounded concurrency for proactive refresh and email hydration - reduce refresh-guardian lookup churn with precomputed token->index mapping - extend runtime benchmark coverage and add CI gate budgets Co-authored-by: Codex <noreply@openai.com>
Instrument hydration and proactive refresh hot paths with duration and throughput counters, and surface guardian tick timing in runtime metrics output. - track email hydration runs/candidates/refreshed/updated/failures/duration - log proactive refresh batch timing with effective concurrency - add guardian tick duration stats (last/max/avg) Co-authored-by: Codex <noreply@openai.com>
Tighten metrics correctness and improve test coverage for the new performance instrumentation. - keep guardian avg tick duration synchronized with cumulative/runs - add deterministic guardian duration aggregate test - assert codex-metrics output includes hydration/guardian timing lines Co-authored-by: Codex <noreply@openai.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughadds a runtime perf gate and benchmarking tooling; exposes parsed response bodies via detailed handlers; introduces bounded worker pools for email hydration and proactive refresh; refactors account indexing for O(1) lookups; adds rotating backup cleanup throttling and retry unlink; improves circular-safe cloning; expands guardian/runtime timing and hydration metrics. Changes
Sequence Diagram(s)sequenceDiagram
participant client as client
participant sdk as sdk.fetch
participant handler as handleSuccessResponseDetailed
participant hydrator as hydrationPool
participant storage as storage
client->>sdk: make api request
sdk->>handler: receive http response
handler-->>sdk: { response, parsedBody }
sdk->>hydrator: enqueue accounts (maxConcurrency = N)
par workers
hydrator->>handler: worker network call (uses parsedBody if present)
hydrator->>storage: persist per-account updates and worker metrics
end
hydrator->>storage: merge worker metrics into runtime metrics
storage-->>client: expose updated runtime metrics
sequenceDiagram
participant codex as codex snapshots
participant indexer as account indexer
participant reconciler as upsertFromSnapshot
participant storage as storage
codex->>indexer: provide snapshots
indexer->>indexer: createAccountIndexes(byAccountId,byRefreshToken,byEmail)
indexer->>reconciler: upsertFromSnapshot(using indexes, now)
alt new account
reconciler->>indexer: indexAccount(...)
else updated account
reconciler->>indexer: updateAccountIndexes(...)
end
reconciler->>storage: persist changes if any
storage-->>codex: return changed flag
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes flagged concerns
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/codex-cli/sync.ts`:
- Around line 160-163: The merge currently lets snapshot.accountId
unconditionally override current.accountId (see mergedAccountId and
mergedAccountIdSource), which discards user-pinned IDs when
current.accountIdSource === "manual"; change the merge so that if
current.accountIdSource === "manual" you keep current.accountId and
current.accountIdSource regardless of snapshot.accountId, otherwise use
snapshot.accountId and set mergedAccountIdSource appropriately; update the
variables mergedAccountId and mergedAccountIdSource logic to implement this
conditional, and add a Vitest regression that covers a manual-source accountId
plus a snapshot that would otherwise match refresh-token behavior to ensure the
manual id is preserved; ensure the new test references the merge behavior in
sync.ts and conforms to lib/** guidelines (auth rotation, windows IO,
concurrency) and mention affected tests in the change description.
In `@lib/refresh-guardian.ts`:
- Around line 192-200: The code updates this.stats.lastTickDurationMs
unconditionally which records durations for early no-op ticks; if you want
lastTickDurationMs to represent only counted runs, move the assignment of
lastTickDurationMs inside the countedRun branch alongside the other updates (the
block that updates this.stats.maxTickDurationMs, cumulativeTickDurationMs and
avgTickDurationMs) in the tick completion logic in refresh-guardian (so related
to countedRun, durationMs, and this.stats.*). Alternatively, if you intend to
keep lastTickDurationMs for all ticks, add a clarifying comment near the
unconditional assignment explaining it is intentionally tracking even
uncounted/no-op ticks.
In `@test/fetch-helpers.test.ts`:
- Around line 509-517: Add a deterministic CRLF-framed SSE regression to the
existing test for handleSuccessResponseDetailed: create an sseContent string
that uses "\r\n" line endings (e.g. "data: {...}\r\n\r\n") to simulate Windows
framing, construct a Response from that content with status 200, call
handleSuccessResponseDetailed(response, false), and assert that
result.parsedBody and await result.response.json() both equal the expected
parsed object (same expectations as the LF case) to ensure non-stream parsing
handles CRLF correctly.
In `@test/index-retry.test.ts`:
- Around line 32-35: Add a deterministic concurrent-regression test in
test/index-retry.test.ts that triggers sdk.fetch in parallel to prove token
refresh + retry deduplication: create a mocked transport using the existing
handleSuccessResponseDetailed helper and a controllable delayed
401->refresh->200 flow, then call Promise.all on many simultaneous sdk.fetch
calls (e.g., 10) and assert that the token refresh function (or refresh
endpoint) was invoked exactly once and that all fetches eventually succeed; use
vitest utilities (vi.useFakeTimers/advanceTimersByTime or explicit Promise-based
delays) to keep the test deterministic, do not mock real secrets, and include
concrete assertions for single refresh call and correct responses.
In `@test/index.test.ts`:
- Around line 753-756: Replace the loose toContain assertions on the metrics
snapshot (the asserts using result) with deterministic numeric-format checks:
use regex/toMatch to assert each metric line follows "MetricName:
<number(.fraction)?>" format, parse the numeric values from result and assert
exact expected counts/durations (integers or fixed decimal places) instead of
substring presence; add a parallel execution sub-case that performs the same
operation via Promise.all to exercise concurrent code paths and then assert the
aggregated metric equals the sum of individual runs, using vitest APIs so tests
remain deterministic and do not mock real secrets or skip assertions.
In `@test/response-handler.test.ts`:
- Around line 56-64: Add a deterministic vitest unit that mirrors the existing
success case for convertSseToJsonWithDetails but verifies the fallback: when the
SSE stream contains no "response.done" final event, ensure the returned
details.parsedBody is undefined and the returned details.response preserves the
raw SSE text (assert response.text() equals the original sseContent); use the
same Response and Headers setup as the existing test and keep assertions
deterministic and synchronous.
ℹ️ Review info
Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 631c418f-b245-4566-afdd-e17a7dc8ca7b
📒 Files selected for processing (19)
.github/workflows/ci.ymlindex.tslib/codex-cli/sync.tslib/proactive-refresh.tslib/refresh-guardian.tslib/request/fetch-helpers.tslib/request/helpers/tool-utils.tslib/request/response-handler.tslib/storage.tspackage.jsonscripts/benchmark-runtime-path.mjsscripts/check-runtime-budgets.mjsscripts/runtime-bench-budgets.jsontest/fetch-helpers.test.tstest/index-retry.test.tstest/index.test.tstest/proactive-refresh.test.tstest/refresh-guardian.test.tstest/response-handler.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/refresh-guardian.test.tstest/proactive-refresh.test.tstest/fetch-helpers.test.tstest/index.test.tstest/index-retry.test.tstest/response-handler.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/proactive-refresh.tslib/storage.tslib/refresh-guardian.tslib/codex-cli/sync.tslib/request/helpers/tool-utils.tslib/request/response-handler.tslib/request/fetch-helpers.ts
🧬 Code graph analysis (9)
test/refresh-guardian.test.ts (1)
lib/refresh-guardian.ts (1)
RefreshGuardian(32-204)
test/proactive-refresh.test.ts (1)
lib/proactive-refresh.ts (3)
refreshExpiringAccounts(136-200)DEFAULT_PROACTIVE_BUFFER_MS(21-21)DEFAULT_PROACTIVE_REFRESH_MAX_CONCURRENCY(25-25)
lib/storage.ts (1)
lib/storage/migrations.ts (1)
AccountMetadataV3(40-57)
test/fetch-helpers.test.ts (2)
scripts/benchmark-runtime-path.mjs (2)
result(313-315)result(328-331)lib/request/fetch-helpers.ts (1)
handleSuccessResponseDetailed(644-676)
test/index.test.ts (2)
scripts/benchmark-runtime-path.mjs (2)
result(313-315)result(328-331)scripts/test-model-matrix.js (1)
result(383-386)
scripts/benchmark-runtime-path.mjs (4)
lib/codex-cli/state.ts (1)
clearCodexCliStateCache(545-549)lib/storage.ts (1)
normalizeAccountStorage(796-882)lib/codex-cli/sync.ts (1)
syncAccountStorageFromCodexCli(349-442)lib/request/fetch-helpers.ts (2)
handleErrorResponse(588-625)handleSuccessResponseDetailed(644-676)
lib/codex-cli/sync.ts (3)
lib/storage.ts (2)
AccountMetadataV3(26-26)AccountStorageV3(26-26)lib/codex-cli/state.ts (1)
CodexCliAccountSnapshot(24-27)lib/prompts/codex.ts (2)
ModelFamily(47-52)MODEL_FAMILIES(58-64)
test/response-handler.test.ts (1)
lib/request/response-handler.ts (1)
convertSseToJsonWithDetails(65-139)
lib/request/fetch-helpers.ts (3)
lib/constants.ts (1)
HTTP_STATUS(19-26)lib/request/response-handler.ts (1)
convertSseToJsonWithDetails(65-139)lib/utils.ts (1)
isRecord(11-13)
🔇 Additional comments (15)
scripts/runtime-bench-budgets.json (1)
1-15: budget config looks reasonable.the budgets include headroom for ci variance as noted. values range from 0.12ms to 7.0ms which seems appropriate for the described operations. no issues with the structure.
package.json (1)
62-64: ci benchmark scripts are well-structured.the script chain ensures bench runs before gate validation. output to
.tmp/perf/follows project conventions. iterations=40 provides reasonable sample size for ci variance.test/proactive-refresh.test.ts (2)
329-374: solid concurrency regression test.the controlled release pattern at
test/proactive-refresh.test.ts:340-355properly verifies thatmaxConcurrency=2is respected. trackingactiveandmaxActiveensures the worker pool doesn't exceed bounds. result ordering assertion at line 373 confirms map key preservation.
460-463: constant assertion ensures api stability.verifies the exported default matches expected value of 4 at
test/proactive-refresh.test.ts:462.lib/proactive-refresh.ts (2)
159-168: worker pool concurrency is safe.the pattern at
lib/proactive-refresh.ts:160-162reads and incrementsnextIndexsynchronously before anyawait, so no race exists. each worker claims a position atomically within a single js tick, then awaits the refresh. the shared index model is valid for bounded async concurrency.
193-197: timing instrumentation is clean.
lib/proactive-refresh.ts:193-197logs only operational metrics (total, concurrency, durationMs) without leaking tokens or emails. good observability hygiene.lib/storage.ts (3)
362-420: throttled cleanup with in-flight deduplication is sound.
lib/storage.ts:363-374prevents redundant scans within 30s and reuses in-flight work. the finally block at lines 414-418 correctly updateslastRunAtonly after completion. unlink failures at lines 389-395 handle ENOENT gracefully. windows EBUSY is not explicitly handled here, but the scan itself is read-only so EBUSY on readdir would fail gracefully.
762-772: buildAccountIndexByKey provides efficient O(1) lookups.
lib/storage.ts:762-772replaces linear findIndex scans with a single map construction pass. defensive null check at line 768 handles sparse arrays. good perf optimization for normalization hot path.
837-870: index map lookups replace linear scans correctly.
lib/storage.ts:843-844andlib/storage.ts:867-870now use O(1) map lookups instead of findIndex. fallback to clampIndex at line 847 preserves behavior when key is not found. semantics are preserved while reducing complexity from O(n²) to O(n) in normalization.lib/refresh-guardian.ts (1)
131-137: refresh token index map enables stable resolution.
lib/refresh-guardian.ts:131-137builds a lookup map to resolve accounts by refresh token instead of by index. this correctly handles the index-shift scenario where accounts are added/removed between snapshot and resolution. defensive check at line 135 skips accounts without refresh tokens.test/refresh-guardian.test.ts (2)
556-582: tick duration aggregate test is well-designed.
test/refresh-guardian.test.ts:567-571mocksperformance.now()to produce controlled durations: tick 1 = 15ms (25-10), tick 2 = 30ms (60-30). the expected values at lines 578-581 are correct:
- lastTickDurationMs = 30 (most recent)
- maxTickDurationMs = 30 (max of 15, 30)
- cumulativeTickDurationMs = 45 (15 + 30)
- avgTickDurationMs = 23 (45 / 2 rounded)
deterministic and exercises the stats aggregation logic thoroughly.
109-111: sanity assertions for duration metrics are appropriate.the
>= 0checks attest/refresh-guardian.test.ts:109-111(and similar at lines 181-183, 370-372, 532-534) ensure duration metrics don't go negative due to timing quirks. lightweight guards that catch regressions without over-specifying.scripts/check-runtime-budgets.mjs (1)
1-92: budget validation script is solid.
scripts/check-runtime-budgets.mjshandles edge cases well:
- line 30: filters null/undefined entries and validates name type
- line 48: validates maxAvgMs is finite number
- line 57: validates avgMs is finite
- line 36: throws if no budget cases defined
one minor note:
toFixed(6)at lines 65, 69 produces excessive precision for sub-millisecond values.toFixed(3)would be cleaner for microsecond-level reporting. not blocking.test/fetch-helpers.test.ts (1)
11-11: good import wiring for the detailed success path.this keeps the test module aligned with the exported helper surface.
refs:test/fetch-helpers.test.ts:11,lib/request/fetch-helpers.ts:643-675.test/index.test.ts (1)
215-218: mock surface parity looks good.including
handleSuccessResponseDetailedhere prevents drift between plugin tests and fetch-helper exports.
refs:test/index.test.ts:215-218,lib/request/fetch-helpers.ts:643-675.
Add explicit loadAccounts tests for rotating-backup cleanup concurrency controls: - verify 30s cleanup interval throttling per storage path - verify in-flight cleanup promise dedup across concurrent loadAccounts calls This addresses the missing coverage gap called out by Greptile. Co-authored-by: Codex <noreply@openai.com>
|
Addressed Greptile’s remaining 4/5 gap in commit 917d936. Added explicit Vitest coverage for lib/storage.ts cleanup concurrency controls in est/storage.test.ts:
Validation rerun:
|
- preserve manual accountId on snapshot merge - clarify refresh guardian last-tick metric semantics - add CRLF/fallback/concurrency/metrics regression tests Co-authored-by: Codex <noreply@openai.com>
|
@coderabbitai review Re-triggering review to clear the transient rate-limit status check failure. |
|
✅ Actions performedReview triggered.
|
|
Re-requesting CodeRabbit review: the current failure is CodeRabbit: Review rate limit exceeded while other checks are green. |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
Follow-up tracking: external blocker recorded in #44. Current failing context remains:
Actions already taken:
Next required step is external quota reset/increase, then rerun CodeRabbit. |
|
if you need to escalate the quota reset faster, the CodeRabbit FAQ and pricing pages have details on plan limits and support channels: https://docs.coderabbit.ai/faq/ and https://docs.coderabbit.ai/about/pricing/ ✅ Actions performedReview triggered.
|
No code changes; this commit exists to re-run external CodeRabbit status checks that previously failed due provider rate limits.\n\nCo-authored-by: Codex <noreply@openai.com>
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/codex-cli/sync.ts`:
- Around line 100-108: The updateAccountIndexes flow currently clears entries
for the previous AccountMetadataV3 and then indexes the next, which can miss
re-keying when refresh token or email changes; update updateAccountIndexes to
explicitly remove any old keys tied to previous.refreshToken and previous.email
(use clearAccountIndexEntries(previous, index) and then indexAccount(indexes,
next, index)) and ensure indexAccount handles collisions by deduping existing
entries before inserting; add a vitest that mutates an account’s refresh token
and email, applies a snapshot keyed by the new values, and asserts no duplicate
account is inserted (covering updateAccountIndexes, clearAccountIndexEntries,
indexAccount), and extend tests/queue logic to simulate EBUSY and 429 retry
paths to validate the new re-keying under concurrent IO.
In `@lib/refresh-guardian.ts`:
- Around line 132-147: The token-only lookup (liveIndexByRefreshToken) can
collide or miss accounts and cause wrong-account mutations; change resolution in
refresh-guardian to first try a token-to-index map that stores arrays of indices
(liveIndexByRefreshToken: Map<string, number[]>) and disambiguate by matching a
stable key from the snapshot (e.g. sourceAccount.id or sourceAccount.uniqueId or
sourceAccount.email) against manager.getAccountByIndex(...) candidates, and if
no token exists or no candidate matches, fall back to resolving by the original
snapshot index via manager.getAccountByIndex(accountIndex) (or by a stable
id-to-index map if available). Update uses of liveIndexByRefreshToken, the loop
over refreshResults, and the logic around sourceAccount.refreshToken and
resolvedIndex to implement this fallback and de-duplication; add/adjust tests to
cover no_refresh_token paths, duplicate-refresh-token scenarios, and ensure logs
do not contain secret tokens.
In `@test/index-retry.test.ts`:
- Around line 258-271: The mock refresh handler refreshAndUpdateTokenMock
currently deduplicates calls by using a shared refreshInFlight promise, which
masks plugin-level concurrency bugs; change the mock so each call returns a new
promise that calls refreshEndpoint (i.e., remove the refreshInFlight caching
logic in refreshAndUpdateTokenMock) so concurrent invocations in the test (the
calls asserted in the block around lines 294-304) exercise real race behavior;
ensure the mock still assigns refreshed data into the passed auth object
(Object.assign(auth, refreshed)) but do not coalesce calls—let the plugin under
test be responsible for orchestration so the test deterministically fails when
refresh dedupe regresses.
In `@test/storage.test.ts`:
- Around line 753-816: The polling loop waiting for cleanupScanCount is flaky;
replace it with a deterministic wait such as vi.waitFor(() => cleanupScanCount
=== 1) (or increase timeout) before starting the second loadAccounts call so the
test reliably detects the in-flight scan; update the test around the
cleanupScanCount polling (the for loop between the firstLoad = loadAccounts()
and secondLoad = loadAccounts()), keeping the existing readdirSpy, scanGate,
releaseScan, and final restore behavior intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 59700bc5-7894-4f14-88de-8d9a15a5bb4e
📒 Files selected for processing (8)
lib/codex-cli/sync.tslib/refresh-guardian.tstest/codex-cli-sync.test.tstest/fetch-helpers.test.tstest/index-retry.test.tstest/index.test.tstest/response-handler.test.tstest/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/response-handler.test.tstest/storage.test.tstest/fetch-helpers.test.tstest/index.test.tstest/codex-cli-sync.test.tstest/index-retry.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/refresh-guardian.tslib/codex-cli/sync.ts
🧬 Code graph analysis (5)
test/response-handler.test.ts (1)
lib/request/response-handler.ts (1)
convertSseToJsonWithDetails(65-139)
test/storage.test.ts (1)
lib/storage.ts (2)
setStoragePathDirect(467-472)loadAccounts(889-891)
test/fetch-helpers.test.ts (1)
lib/request/fetch-helpers.ts (1)
handleSuccessResponseDetailed(644-676)
test/index-retry.test.ts (1)
index.ts (1)
OpenAIAuthPlugin(4256-4256)
lib/codex-cli/sync.ts (3)
lib/storage.ts (2)
AccountMetadataV3(26-26)AccountStorageV3(26-26)lib/codex-cli/state.ts (1)
CodexCliAccountSnapshot(24-27)lib/prompts/codex.ts (2)
ModelFamily(47-52)MODEL_FAMILIES(58-64)
🪛 ast-grep (0.41.0)
test/index.test.ts
[warning] 757-760: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${escapedLabel}: ${numberPattern}${suffixPattern}$,
"m",
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🔇 Additional comments (9)
test/storage.test.ts (2)
700-751: solid regression test for the 30s cleanup throttle.test correctly advances
Date.nowpast the 30s window and validates cleanup scans fire at expected boundaries. the spy setup is clean and properly intercepts only the storage directory.one minor note: line 741 advances by
29_000and line 743 by1_001, which totals30_001msfrom the first call. this correctly crosses the 30s threshold. the assertion at line 750 expecting exactly 2 scans aligns with the throttle spec.
711-716: fake test credentials are appropriate here.these fake tokens and emails (
real-refresh-token,real-user@example.com) are clearly fixture data for testing purposes. no real secrets or pii are exposed. per coding guidelines, this is acceptable.test/codex-cli-sync.test.ts (1)
656-705: good regression coverage for manual-id preservation on token-based merge.
test/codex-cli-sync.test.ts:656(Line 656) throughtest/codex-cli-sync.test.ts:705(Line 705) is deterministic and directly protects the auth-rotation merge path by asserting manualaccountId/accountIdSourceare preserved while token/email fields still refresh.As per coding guidelines,
test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.lib/codex-cli/sync.ts (1)
160-169: manual account-id precedence is correctly preserved now.
lib/codex-cli/sync.ts:160(Line 160) throughlib/codex-cli/sync.ts:169(Line 169) keeps local manual identity authoritative during snapshot merge, which avoids auth-rotation identity drift.As per coding guidelines,
lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.lib/refresh-guardian.ts (2)
192-202: duration metric semantics look good here.
lib/refresh-guardian.ts:192-202now clearly documents whylastTickDurationMsincludes no-op ticks while aggregate duration metrics stay gated behind counted runs. this closes the earlier ambiguity cleanly.
139-180: regression tests for no_refresh_token, duplicate tokens, and 429 handling already exist.the tests you asked for are already in
test/refresh-guardian.test.ts. here's what's covered:
no_refresh_token disabling behavior at
test/refresh-guardian.test.ts:279-372: the "classifies failure reasons" test verifies thatsetAccountEnabled(1, false)is called for accountB when reason isno_refresh_token(line 346-347), andmarkAccountCoolingDown(accountB, 60_000, "auth-failure")is invoked (line 350).duplicate token reconciliation at
test/refresh-guardian.test.ts:221-277: the "resolves refreshed account using stable refresh token when indices shift" test proves that accounts are correctly mapped by refresh token even when array indices change. additionally,test/refresh-guardian.test.ts:584-654tests account removal safety during tick.429 classification at
test/refresh-guardian.test.ts:279-372: line 310 sends a 429 statusCode, and line 353 confirms it's classified as "rate-limit" viamarkAccountCoolingDown(accountC, 60_000, "rate-limit").no additional test coverage is needed for these scenarios.
> Likely an incorrect or invalid review comment.test/fetch-helpers.test.ts (1)
509-528: good deterministic windows sse regression coverage.
test/fetch-helpers.test.ts:509-528now validates both standard and crlf-framed payloads with strictparsedBodyandresponse.json()assertions, which cleanly exercises the non-stream detailed path inlib/request/fetch-helpers.ts:643-675.test/response-handler.test.ts (1)
56-77: detail-mode fallback contract is now covered correctly.
test/response-handler.test.ts:56-77now asserts both success metadata extraction and the no-final-event fallback (parsedBodyundefined + raw text preserved), aligned withlib/request/response-handler.ts:64-138.test/index.test.ts (1)
749-840: metrics assertions are materially stronger and concurrency-aware.
test/index.test.ts:749-840moves from loose substring checks to parsed numeric assertions and adds a parallel fetch case, giving a deterministic guardrail for metrics aggregation regressions under concurrent load.
Add deterministic regressions for sync/indexing and refresh races, harden refresh guardian account resolution, and move auth refresh deduping into plugin orchestration. Co-authored-by: Codex <noreply@openai.com>
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
index.ts (1)
841-844:⚠️ Potential issue | 🟠 Majorhydrate save needs windows-safe retry/backoff.
this persistence is single-shot. transient
ebusy/epermon windows can fail hydration writes and drop refreshed token/email updates in this pass. wrapsaveAccountswith a small bounded retry (same pattern already used for watcher path switches).proposed fix
if (changed) { storage.accounts = accountsCopy; - await saveAccounts(storage); + for (let attempt = 0; attempt < 3; attempt += 1) { + try { + await saveAccounts(storage); + break; + } catch (error) { + const code = (error as NodeJS.ErrnoException | undefined)?.code; + if ((code !== "EBUSY" && code !== "EPERM") || attempt === 2) throw error; + await new Promise((resolve) => setTimeout(resolve, 25 * 2 ** attempt)); + } + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@index.ts` around lines 841 - 844, Wrap the call to saveAccounts(...) inside the same bounded retry/backoff you use for watcher path switches: detect transient Windows errors (e.g., EBUSY/EPERM), retry a small number of times with a short exponential/backoff delay, and only throw after retries are exhausted; update the block where storage.accounts = accountsCopy and await saveAccounts(storage) to perform the guarded retry so hydration writes are resilient to transient filesystem locks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@index.ts`:
- Around line 392-393: The current dedup key generation for in-flight refreshes
uses a truncated fallback `access:${auth.access.slice(0, 32)}` which can
collide; update the logic that builds refreshKey (where auth.refresh is checked
and refreshKey is used to look up inFlightAuthRefreshByToken) to use a
collision-safe identifier instead of a 32-char prefix — e.g., use the full
access token or a deterministic hash (sha256/base64) of auth.access (or include
another unique claim/id from the token) so distinct access tokens cannot map to
the same refreshKey.
- Around line 389-410: Add deterministic regression tests covering
refreshAuthWithDedup and email hydration concurrency: write tests that (1)
stub/mock refreshAndUpdateToken and call refreshAuthWithDedup concurrently with
the same OAuthAuthDetails to assert the returned value is the identical Promise
instance and that inFlightAuthRefreshByToken contains only one entry for the
refresh key while pending, (2) trigger a rejection from the mocked
refreshAndUpdateToken and assert the finally block removes the entry from
inFlightAuthRefreshByToken after the promise settles, and (3) exercise the
hydration worker path by bypassing or mocking the hydrateEmails skip (the
environment check around hydrateEmails) and by controlling worker tasks to
deterministically saturate HYDRATE_EMAILS_MAX_CONCURRENCY (4) assert
emailHydrationMaxConcurrencyUsed never exceeds HYDRATE_EMAILS_MAX_CONCURRENCY
and that metrics emailHydrationRuns, emailHydrationCandidates,
emailHydrationRefreshed, emailHydrationUpdated, emailHydrationFailures and
emailHydrationCumulativeDurationMs accumulate expected values; use injected
mocks/fake timers/promises to make all concurrency behavior deterministic and
locate targets by symbol names refreshAuthWithDedup, inFlightAuthRefreshByToken,
refreshAndUpdateToken, hydrateEmails, HYDRATE_EMAILS_MAX_CONCURRENCY and the
emailHydration* metric names.
In `@test/codex-cli-sync.test.ts`:
- Around line 263-289: The test is accidentally passing because
syncAccountStorageFromCodexCli may reuse a cached codex-cli snapshot; before
calling syncAccountStorageFromCodexCli the second time, explicitly clear the
codex-cli state cache or force a refresh so the rewritten file is re-read. Add a
step between the firstSync and secondSync that either calls your existing
cache-clear helper (e.g., clearCodexCliCache or resetCodexCliStateCache) or
calls syncAccountStorageFromCodexCli with a forceRefresh/disableCache option so
the secondSync actually reads the updated accounts JSON and exercises the
rekey-on-updated-snapshot path.
---
Outside diff comments:
In `@index.ts`:
- Around line 841-844: Wrap the call to saveAccounts(...) inside the same
bounded retry/backoff you use for watcher path switches: detect transient
Windows errors (e.g., EBUSY/EPERM), retry a small number of times with a short
exponential/backoff delay, and only throw after retries are exhausted; update
the block where storage.accounts = accountsCopy and await saveAccounts(storage)
to perform the guarded retry so hydration writes are resilient to transient
filesystem locks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 78c5a5c8-5de0-4aa4-8b80-be99714532a0
📒 Files selected for processing (6)
index.tslib/refresh-guardian.tstest/codex-cli-sync.test.tstest/index-retry.test.tstest/refresh-guardian.test.tstest/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/refresh-guardian.test.tstest/storage.test.tstest/codex-cli-sync.test.tstest/index-retry.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/refresh-guardian.ts
🧬 Code graph analysis (3)
test/storage.test.ts (1)
lib/storage.ts (2)
setStoragePathDirect(467-472)loadAccounts(889-891)
test/codex-cli-sync.test.ts (3)
scripts/benchmark-runtime-path.mjs (2)
accountsPath(208-208)current(292-292)lib/storage.ts (1)
AccountStorageV3(26-26)lib/codex-cli/sync.ts (1)
syncAccountStorageFromCodexCli(356-449)
test/index-retry.test.ts (1)
index.ts (1)
OpenAIAuthPlugin(4272-4272)
🔇 Additional comments (6)
test/storage.test.ts (1)
700-751: good deterministic coverage for cleanup throttle and in-flight dedup.the new regressions in
test/storage.test.ts:700-816validate both the 30s throttle window and concurrent in-flight dedup with explicit gating and assertions. this is the right shape for filesystem/concurrency hot-path protection.Also applies to: 753-816
test/codex-cli-sync.test.ts (1)
729-779: manual accountid preservation coverage is strong.
test/codex-cli-sync.test.ts:729-779correctly asserts that manualaccountIdandaccountIdSourcesurvive refresh-token matching while auth/email fields still update.lib/refresh-guardian.ts (1)
137-181: account resolution + tick telemetry changes look good.the new candidate disambiguation and fallback flow in
lib/refresh-guardian.ts:137-181materially lowers wrong-account mutation risk under duplicate refresh tokens, and the duration aggregation inlib/refresh-guardian.ts:227-237is cleanly implemented with counted-run gating.Also applies to: 227-237
test/refresh-guardian.test.ts (1)
279-343: excellent regression additions for duplicate-token and metric paths.
test/refresh-guardian.test.ts:279-380covers the core concurrency/account-resolution hazards, andtest/refresh-guardian.test.ts:659-685locks down the new tick-duration aggregates with deterministic timing control.Also applies to: 345-380, 659-685
test/index-retry.test.ts (1)
242-299: good concurrency regression for refresh dedup under retry pressure.the new test at
test/index-retry.test.ts:242-299is the right race repro: parallelsdk.fetchcalls, gated refresh completion, and strict single-refresh assertions.index.ts (1)
2367-2379: parsed-body reuse is a good hot-path optimization.using
handleSuccessResponseDetailedand only parsing on-demand for empty-response retries keeps behavior intact while avoiding redundant clone+parse in the common path.
- skip refresh dedup when refresh token is absent to prevent key collisions - preserve existing inferred accountIdSource when syncing Codex CLI snapshots - add storage cleanup-state reset hook for test/plugin isolation - add deterministic regressions for dedup cleanup, no-refresh fanout, hydration metrics, and sync cache refresh behavior Co-authored-by: Codex <noreply@openai.com>
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
index.ts (1)
754-843:⚠️ Potential issue | 🔴 Criticaladd explicit vitest for hydration worker pool 429 handling and bounded concurrency.
the hydration worker queue at
index.ts:754-843has zero test coverage. i found no tests forhydrateEmails,HYDRATE_EMAILS_MAX_CONCURRENCY, oremailHydrationin the test directory. whilequeuedRefreshhas 429 coverage (test/refresh-queue.test.ts:373), it doesn't exercise the multi-worker concurrency scenario that your bounded pool introduces. add a vitest that drives concurrentqueuedRefreshcalls within the worker pool and asserts the 429 backoff behavior doesn't leak into the next hydration batch.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@index.ts` around lines 754 - 843, Add a vitest that targets the email hydration worker pool by invoking the function that runs this block (hydrateEmails or the exported function that processes accounts using HYDRATE_EMAILS_MAX_CONCURRENCY) and mock queuedRefresh to simulate 429 responses for some calls (then succeed on retry) and varied latencies; assert that concurrency never exceeds HYDRATE_EMAILS_MAX_CONCURRENCY, that 429 backoffs are retried without affecting other workers (no leak into subsequent batch), and that runtimeMetrics (emailHydrationRefreshed, emailHydrationFailures, emailHydrationUpdated, emailHydrationMaxConcurrencyUsed, etc.) reflect the expected counts; use vitest mocks/fake timers to control timing and ensure Promise.all completes deterministically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/storage.ts`:
- Around line 392-425: The cleanup loop that unlinks staleArtifacts (inside the
anonymous async IIFE updating rotatingBackupCleanupState) must add
transient-retry handling for fs.unlink failures with Windows transient codes
("EBUSY", "EPERM"): implement a small retry loop with limited attempts and short
backoff (e.g., exponential or fixed delay) around the unlink call in the for
(const staleArtifactPath of staleArtifacts) block, treat EBUSY/EPERM as
transient (retry) and only log and give up after attempts are exhausted; also
ensure lastRunAt in rotatingBackupCleanupState is not advanced to Date.now()
when transient unlink retries still failed—only set lastRunAt after all
artifacts are either removed or definitively failed with non-transient errors;
update the failure logging to include attempt count and error details;
add/adjust vitest unit tests to simulate EBUSY/EPERM transient failures on
fs.unlink and verify retries occur and lastRunAt behavior is correct (throttling
not advanced on transient-failure exhaustion).
In `@test/index.test.ts`:
- Around line 844-919: Add a new test that mirrors the existing "tracks
hydration worker cap..." case but simulates partial failures: set
mockStorage.accounts to 6 entries, vi.mocked(queuedRefresh) should resolve
successfully for 4 tokens and throw/return failure for 2 tokens (identify by
refreshToken or index), keep the same active/maxActive tracking to validate
bounded concurrency, call oauthMethod.authorize() and then await
plugin.tool["codex-metrics"].execute(), and assert metricValue(report, "Email
hydration failures") === 2 while also asserting "Email hydration
refreshed"/"updated" reflect 4 successes, "Email hydration candidates" is 6, and
both the reported "Email hydration max concurrency used" and local maxActive are
<=4; reuse promptLoginMode mocking and the same env var save/restore pattern as
in the original test to avoid side effects.
- Around line 903-917: Refactor the repeated env restore logic by extracting a
small helper (e.g., restoreEnv) that takes an env var name and its previous
value and either deletes the key or resets it; replace the three blocks that
reference previousVitestWorkerId/VITEST_WORKER_ID, previousNodeEnv/NODE_ENV, and
previousSkipHydrate/CODEX_SKIP_EMAIL_HYDRATE with calls to that helper to DRY up
the cleanup in the test teardown.
In `@test/storage.test.ts`:
- Around line 703-754: Add a deterministic vitest regression that injects a
transient Windows-style lock error into fs.unlink while exercising loadAccounts
cleanup logic: mock fs.unlink (via vi.spyOn or vi.stub) to throw an
EBUSY/EPERM-like error on the first call for the targeted stale artifact path
and succeed on subsequent calls, then trigger concurrent calls to loadAccounts
(or otherwise invoke the cleanup queue path used by the cleanup logic) to
reproduce the transient-lock scenario and assert the cleanup finishes without
unhandled rejection and the expected cleanup side-effects still occur; restore
the spy after the test. Use the existing test patterns (date mocking, readdir
spy) and reference loadAccounts and the unlink spy to locate the scope to
change.
---
Outside diff comments:
In `@index.ts`:
- Around line 754-843: Add a vitest that targets the email hydration worker pool
by invoking the function that runs this block (hydrateEmails or the exported
function that processes accounts using HYDRATE_EMAILS_MAX_CONCURRENCY) and mock
queuedRefresh to simulate 429 responses for some calls (then succeed on retry)
and varied latencies; assert that concurrency never exceeds
HYDRATE_EMAILS_MAX_CONCURRENCY, that 429 backoffs are retried without affecting
other workers (no leak into subsequent batch), and that runtimeMetrics
(emailHydrationRefreshed, emailHydrationFailures, emailHydrationUpdated,
emailHydrationMaxConcurrencyUsed, etc.) reflect the expected counts; use vitest
mocks/fake timers to control timing and ensure Promise.all completes
deterministically.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 500ad623-5563-4c44-ad02-65bb19bd1cb0
📒 Files selected for processing (7)
index.tslib/codex-cli/sync.tslib/storage.tstest/codex-cli-sync.test.tstest/index-retry.test.tstest/index.test.tstest/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/storage.test.tstest/index-retry.test.tstest/codex-cli-sync.test.tstest/index.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/storage.tslib/codex-cli/sync.ts
🧬 Code graph analysis (5)
test/storage.test.ts (1)
lib/storage.ts (3)
resetRotatingBackupCleanupStateForTesting(46-51)setStoragePathDirect(475-481)loadAccounts(898-900)
index.ts (1)
lib/request/fetch-helpers.ts (2)
refreshAndUpdateToken(356-391)handleSuccessResponseDetailed(644-676)
lib/storage.ts (1)
lib/storage/migrations.ts (1)
AccountMetadataV3(40-57)
test/codex-cli-sync.test.ts (3)
scripts/benchmark-runtime-path.mjs (2)
accountsPath(208-208)current(292-292)lib/codex-cli/sync.ts (1)
syncAccountStorageFromCodexCli(356-449)lib/codex-cli/state.ts (1)
clearCodexCliStateCache(545-549)
test/index.test.ts (2)
lib/refresh-queue.ts (1)
queuedRefresh(408-410)lib/cli.ts (1)
promptLoginMode(185-289)
🪛 ast-grep (0.41.0)
test/index.test.ts
[warning] 759-762: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${escapedLabel}: ${numberPattern}${suffixPattern}$,
"m",
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
[warning] 854-854: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(^${escapedLabel}: (\\d+)$, "m")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🔇 Additional comments (13)
lib/storage.ts (1)
771-781: good o(1) remap path for active and family indices.the
buildAccountIndexByKeypath atlib/storage.ts:771-781and lookups atlib/storage.ts:846-853/lib/storage.ts:876-877remove repeated linear scans cleanly, and the remap behavior is covered bytest/storage.test.ts:45-79andtest/storage.test.ts:974-993.Also applies to: 846-853, 876-877
test/storage.test.ts (1)
21-22: good isolation guard for shared cleanup state.using
resetRotatingBackupCleanupStateForTestingin both hooks attest/storage.test.ts:36andtest/storage.test.ts:40prevents cross-test leakage from the per-path in-memory queue state.Also applies to: 33-41
index.ts (2)
389-413: refresh dedup cleanup logic looks correct.
index.tsline 389-413 correctly skips dedup when refresh is absent and clearsinFlightAuthRefreshByTokenentries infinally, so failed refreshes do not poison future calls.
2370-2382: parsedbody threading is integrated cleanly.
index.tsline 2370-2382 useshandleSuccessResponseDetailedfirst and only falls back to lazy body parsing whenparsedBodyis missing, which is the right hot-path behavior.lib/codex-cli/sync.ts (2)
100-108: index rekey maintenance is solid.
lib/codex-cli/sync.tsline 100-108 and line 196-198 clear old keys before reindexing merged records, which keeps refresh/email/accountId lookups stable after key updates.Also applies to: 196-198
160-169: manual account id preservation guard is correct.
lib/codex-cli/sync.tsline 160-169 keeps manualaccountIdownership intact while still allowing token/email/access merges, which protects auth rotation identity semantics.test/codex-cli-sync.test.ts (2)
218-292: the rekey regression is deterministic and exercises the real path.
test/codex-cli-sync.test.tsline 263-287 rewrites snapshot data, clears codex cli state cache, and assertschangedplus token/email updates, so this is no longer self-fulfilling.As per coding guidelines,
test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
294-340: accountidsource preservation coverage is strong.
test/codex-cli-sync.test.tsline 294-340 and line 780-829 validate inferred/manual source preservation during snapshot merges and protect rotation identity invariants.As per coding guidelines,
test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.Also applies to: 780-829
test/index-retry.test.ts (2)
242-299: concurrent dedup regression is now scoped to plugin orchestration.
test/index-retry.test.tsline 258-264 avoids mock-side coalescing, and line 287-299 asserts a single refresh path across 10 concurrentsdk.fetchcalls.As per coding guidelines,
test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
301-378: no-refresh fanout and failed-entry cleanup are both covered well.
test/index-retry.test.tsline 301-378 verifies no-refresh concurrent fanout behavior, and line 380-435 verifies failed dedup entries are cleaned so later retries re-execute refresh.As per coding guidelines,
test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.Also applies to: 380-435
test/index.test.ts (3)
215-218: lgtm — mock aligns with the detailed handler contract.the mock correctly returns
{ response, parsedBody: undefined }which matches thehandleSuccessResponseDetailedsignature used inlib/request/fetch-helpers.ts. parsedBody as undefined is appropriate for the default test path.
748-842: solid metrics regression test — addresses prior feedback about coarse assertions.the
readNumericMetrichelper with exact format matching and the parallel execution viaPromise.allat lines 814-821 will catch counter aggregation races. the exact increment check at line 835 (totalAfter - baselineTotal === parallelRuns) is the right approach.re: static analysis ReDoS warning at lines 759-762 — this is a false positive in test context since
escapedLabelescapes regex metacharacters and all labels are test-controlled string literals ("Total upstream requests", etc.).
853-858: static analysis ReDoS warning is a false positive here.the
metricValuehelper at line 854 builds regex fromescapedLabelwhich is always a test-controlled literal string passed directly in the assertions below (e.g.,"Email hydration runs"). the escape function handles regex metacharacters. no risk in test context.
Capture refresh-guardian max tick duration across error/no-op ticks, add transient stale-artifact unlink retries with non-throttled re-scan on exhausted transient locks, and expand deterministic hydration/storage regressions requested in review. Co-authored-by: Codex <noreply@openai.com>
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/refresh-guardian.ts`:
- Around line 137-145: The current logic in getAccountsSnapshot handling (in
lib/refresh-guardian.ts) records loop index i into liveIndexByRefreshToken and
then calls manager.getAccountByIndex(...) using those array positions, which is
wrong for reordered or sparse snapshots; instead capture and store each
account's stable account.index (candidate.index) when populating
liveIndexByRefreshToken, update any downstream use of those vectors to call
manager.getAccountByIndex(accountIndex) with that stored account.index, and
replace the bounds check that compares sourceAccount.index against
liveAccounts.length with a validation against the account index space (or use
manager.hasAccountIndex/getAccountByIndex safely) so we never mutate the wrong
account; run/update the deterministic tests in test/refresh-guardian.test.ts to
cover reordered/sparse indices and ensure logging still avoids leaking
tokens/emails.
In `@lib/storage.ts`:
- Around line 458-470: The finally block can re-insert stale state after
setStoragePath()/setStoragePathDirect() cleared or replaced it; modify the
finally so it only updates rotatingBackupCleanupState for the same path if the
current state's inFlight still equals the cleanupPromise (i.e., read current =
rotatingBackupCleanupState.get(path) and only call set(...) when
current?.inFlight === cleanupPromise), using the same
path/cleanupPromise/shouldAdvanceLastRunAt symbols; apply this guard where the
finally writes the lastRunAt/inFlight so an older cleanup cannot overwrite a
cleared or swapped state.
- Around line 448-455: The current catch around the fs.readdir call advances
lastRunAt even for transient lock errors (EBUSY/EPERM/EAGAIN), suppressing
retries; change the error handling so that when (error as
NodeJS.ErrnoException).code is one of the transient codes (EBUSY, EPERM, EAGAIN)
you do not update lastRunAt and return/exit early (or rethrow) instead of
proceeding to the success path that sets lastRunAt; keep the existing handling
for ENOENT and non-transient errors (log/warn) but only update lastRunAt inside
the success/completion path of the scan function (or after confirming no
transient error), and add/adjust vitest coverage to assert that lastRunAt is not
advanced for EBUSY/EPERM/EAGAIN scenarios.
In `@test/index.test.ts`:
- Around line 849-855: Hoist the duplicate restoreEnv helper into the outer
describe("codex-metrics tool") scope so both test blocks reuse the same
function; remove the inline definitions at the two duplicate sites (the ones
currently defining restoreEnv) and update the tests to call the shared
restoreEnv; keep the signature (key: string, previous: string | undefined): void
and behavior (delete env when previous is undefined, otherwise restore) so
existing calls continue to work.
- Around line 860-865: The metric parsing logic is duplicated as metricValue in
two places (test/index.test.ts lines around 860–865 and 932–937); consolidate by
moving a single helper into the shared describe scope (or replace both uses with
the existing readNumericMetric helper) so both tests call the same function.
Create one helper (e.g., metricValue or readNumericMetric) at the describe-level
that takes (report: string, label: string) and reuses the existing escaped-label
RegExp logic, then update both test sites to call that single helper and remove
the duplicate definition.
In `@test/refresh-guardian.test.ts`:
- Around line 297-303: The test mock for manager.getAccountByIndex is positional
(returns snapshots[1][index]) which hides bugs where code resolves array
positions instead of actual account.index values; change the mock in
test/refresh-guardian.test.ts so getAccountByIndex(index) looks up accounts by
their index property in the current snapshot (e.g., find the entry where
acc.index === index within the snapshot returned by getAccountsSnapshot) to
force the code path in lib/refresh-guardian (the index-resolution logic around
the duplicate-token race) to use real account.index semantics and keep the
regression deterministic.
In `@test/storage.test.ts`:
- Around line 703-937: Add a deterministic regression that reproduces a
path-transition during an in-flight cleanup scan: when calling loadAccounts()
start a cleanup scan (spy/mock fs.readdir or fs.unlink to detect/hold the scan
using a Promise gate), then call setStoragePathDirect(...) to switch the storage
path while that first scan is still blocked, then release the gate and await
both the first and a subsequent loadAccounts() call; assert that cleanupScan
counts are tracked per-directory (the original dir got cleaned and the new dir’s
cleanup did not leak into the old dir) and that stale artifacts in the old path
are removed while the new path remains intact. Target the loadAccounts logic and
the cleanup code paths referenced (the rotating cleanup and stale-artifact
unlink/retry behavior) using the same spy/restore pattern (readdirSpy/unlinkSpy
+ release gate) and vitest waitFor to keep the test deterministic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 96db57de-7221-4ae3-9f01-e3e6683ddffd
📒 Files selected for processing (5)
lib/refresh-guardian.tslib/storage.tstest/index.test.tstest/refresh-guardian.test.tstest/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/refresh-guardian.test.tstest/storage.test.tstest/index.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/refresh-guardian.tslib/storage.ts
🧬 Code graph analysis (4)
test/refresh-guardian.test.ts (1)
lib/refresh-guardian.ts (1)
RefreshGuardian(38-242)
test/storage.test.ts (1)
lib/storage.ts (3)
resetRotatingBackupCleanupStateForTesting(46-51)setStoragePathDirect(520-526)loadAccounts(943-945)
test/index.test.ts (2)
lib/refresh-queue.ts (1)
queuedRefresh(408-410)lib/cli.ts (1)
promptLoginMode(185-289)
lib/storage.ts (1)
lib/storage/migrations.ts (1)
AccountMetadataV3(40-57)
🪛 ast-grep (0.41.0)
test/index.test.ts
[warning] 759-762: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
^${escapedLabel}: ${numberPattern}${suffixPattern}$,
"m",
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
[warning] 861-861: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(^${escapedLabel}: (\\d+)$, "m")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
[warning] 933-933: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(^${escapedLabel}: (\\d+)$, "m")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🔇 Additional comments (5)
test/index.test.ts (4)
215-218: lgtm — mock aligns with new detailed response handler.the mock returns
{ response, parsedBody: undefined }which matches the signature exposed bylib/request/fetch-helpers.ts. consistent with the other response handling mocks in this file.
748-842: solid numeric assertions replace loosetoContainchecks — addresses prior feedback.the
readNumericMetrichelper with explicit regex patterns and the parallel fetch aggregation test properly verify that metrics increment deterministically. the delta assertions (expect(totalAfter - baselineTotal).toBe(parallelRuns)) are exactly the kind of concurrency regression coverage the guidelines demand.the static analysis flags ReDoS on the regex at
test/index.test.ts:759-762, but sincelabelvalues are hardcoded test strings (e.g.,"Total upstream requests"), not external input, this is a false positive in the test context.
916-1017: good partial-failure regression — validates worker pool doesn't stall on errors.this test directly addresses the prior review feedback requesting a case where 2 of 6 refreshes fail. the assertions at
test/index.test.ts:1000-1003confirm:
deltaRefreshed === 4(successes)deltaFailures >= 2(failures)deltaFailures === deltaCandidates - deltaRefreshed(accounting identity)the
maxActive <= 4check at line 1011 ensures bounded concurrency holds even when some workers throw. this is the concurrency regression coverage the guidelines require.
844-914: hydration worker cap test is deterministic and correctly asserts bounded concurrency.the
active/maxActivetracking pattern attest/index.test.ts:874-888accurately captures peak parallelism, and the dual assertion at lines 907-908 (reported metric + local counter) is the right approach per prior feedback.env var cleanup in
finallyblock at lines 909-913 is correct.lib/storage.ts (1)
816-826: good move to keyed index remapping for active/family lookups.the
buildAccountIndexByKeypath inlib/storage.ts:816-923removes repeated linear scans during normalization and keeps behavior clear.Also applies to: 891-923
Address PR #38 review findings with minimal behavior-safe changes:\n- map live refresh-token candidates by stable account.index\n- keep stale-artifact cleanup throttle unchanged on transient scan failures\n- guard cleanup finalizer to avoid stale in-flight state writes after path transitions\n- add regression tests for sparse/reordered index lookups and path-transition isolation\n- dedupe duplicated codex-metrics test helpers\n\nCo-authored-by: Codex <noreply@openai.com>
Treat EACCES as a transient unlink failure in rotating backup cleanup retry logic and add deterministic regression coverage to ensure exhausted EACCES retries do not advance cleanup throttling.\n\nCo-authored-by: Codex <noreply@openai.com>
|
All previously open review threads are now addressed and resolved, including the follow-up Windows EACCES transient cleanup handling + regression tests in 462cce6. Requesting re-review for approval refresh. |
Summary
Commits
Validation
Notes
Thread Resolution Update (2026-03-05)
What changed
index.tsso concurrentsdk.fetchpaths do not fan out duplicate refresh work.lib/refresh-guardian.tsaccount resolution to avoid wrong-account mutations:test/codex-cli-sync.test.ts).test/index-retry.test.ts).vi.waitFor(test/storage.test.ts).test/refresh-guardian.test.ts).How to test
npx vitest run test/index-retry.test.ts test/refresh-guardian.test.ts test/codex-cli-sync.test.ts test/storage.test.tsnpm run lintnpm run typecheckRisk / rollout notes
Thread Resolution Update (2026-03-05, follow-up)
What changed
esetRotatingBackupCleanupStateForTesting() and reset cleanup state when storage path context changes to prevent module-level cleanup-state leakage.
How to test
px vitest run test/index-retry.test.ts test/index.test.ts test/codex-cli-sync.test.ts test/storage.test.ts
pm run lint
pm run typecheck
Risk / rollout notes
Final Thread Resolution Update (2026-03-05)
What changed
estoreEnv helper.
How to test
px vitest run test/refresh-guardian.test.ts test/storage.test.ts test/index.test.ts
pm run lint
pm run typecheck
Risk / rollout notes
Thread Resolution Update (2026-03-06)
What changed
How to test
Risk / rollout notes
Follow-up Thread Hardening (2026-03-06)
What changed
How to test
Risk / rollout notes
note: greptile review for oc-chatgpt-multi-auth. cite files like
lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.Greptile Summary
this pr is a well-scoped performance and correctness sweep across the hot request/storage/refresh paths. all previously-flagged thread issues (dedup key collision,
accountIdSourcecoercion,maxTickDurationMsundercounting, cleanup-state leakage) are resolved and covered by new regressions.key changes:
refreshAuthWithDedupinindex.tsdeduplicates concurrent refresh work by stable refresh-token key; skips dedup entirely when no refresh token is present, preventing cross-account token contaminationPromise.all, reducing burst load on the token endpointhandleSuccessResponseDetailed/convertSseToJsonWithDetails/readBodyForErrorHandlingeliminate redundantJSON.parseon both the success and error paths by threading the already-parsed body through to the callercleanupStaleRotatingBackupArtifactsis now throttled with a 30s per-path gate, in-flight dedup, and transient-retry unlinks (EPERM/EBUSY/EAGAIN/EACCES) — important for windows desktop antivirus lock scenarioslib/codex-cli/sync.tsbuilds account indexes once per sync pass and updates them incrementally, replacing three O(n) passes per snapshot with O(1) lookupslib/refresh-guardian.tsdisambiguates duplicate-refresh-token candidates by stableaccountId/email identity and falls back tosourceAccount.index, avoiding wrong-account mutations; tick timing metrics (last/max/cumulative/avg) are recorded on all paths including error tickscloneJsonRecordintool-utils.tsreplacesJSON.parse(JSON.stringify())with a direct traversal (circular detection viaWeakSet, correct array-undefined-to-null semantics)npm run perf:runtime:ciruns benchmark + budget check on node 20.x; budgets are generous enough for ci variance while still catching major regressionswindows/token safety notes addressed by this pr:
Confidence Score: 4/5
inFlightAuthRefreshByTokenmap inindex.tsbeing plugin-instance-scoped (not exported/resettable) which could be hard to isolate in future test scenarios, though it doesn't cause any current issueSequence Diagram
sequenceDiagram participant C1 as Caller 1 (sdk.fetch) participant C2 as Caller 2 (sdk.fetch) participant D as refreshAuthWithDedup participant M as inFlightAuthRefreshByToken participant R as refreshAndUpdateToken C1->>D: auth (refresh="r") D->>M: get("r") → undefined D->>R: refreshAndUpdateToken(auth) activate R D->>M: set("r", refreshPromise) D-->>C1: awaiting refreshPromise C2->>D: auth (refresh="r") D->>M: get("r") → refreshPromise D-->>C2: awaiting same refreshPromise R-->>D: resolved OAuthAuthDetails deactivate R D->>M: delete("r") via finally D-->>C1: OAuthAuthDetails D-->>C2: OAuthAuthDetails (same result) Note over C1,C2: No refresh token case — each caller gets its own refresh, no dedupComments Outside Diff (1)
lib/storage.ts, line 1245-1247 (link)EACCESmissing from transient retry set — windows safety gapEACCES(access denied) is the most common error code returned by windows when antivirus or shell-extension hooks briefly hold a file handle duringunlink. the current list only retriesEPERM,EBUSY,EAGAIN.when
EACCEShits,canRetryisfalse, the error is thrown immediately, and becauseerr.transientExhausted = isTransientUnlinkCode("EACCES")evaluates tofalse,shouldAdvanceLastRunAtstaystrue. this means the cleanup window advances as if the run succeeded — the stale artifact is left on disk, and the next retry is gated behind the 30-second interval rather than being attempted immediately.contrast this with the correct
EPERM/EBUSYbehavior: after exhausting all 5 attempts,transientExhausted = true→shouldAdvanceLastRunAt = false→ interval resets to 0 → next load path retries immediately.regression test that would catch this: mock
fs.unlinkto throwEACCEStwice then succeed; assertshouldAdvanceLastRunAtisfalseafter the exhausted run and that the artifact is eventually removed on the next call.Last reviewed commit: 462cce6