Skip to content

Integration: merge all approved PRs (hardening, CI, reliability, performance, features)#46

Merged
ndycode merged 101 commits intodevfrom
integration/all-approved-prs-to-dev
Mar 6, 2026
Merged

Integration: merge all approved PRs (hardening, CI, reliability, performance, features)#46
ndycode merged 101 commits intodevfrom
integration/all-approved-prs-to-dev

Conversation

@ndycode
Copy link
Owner

@ndycode ndycode commented Mar 6, 2026

Merged PRs

Summary

  • Integrates the approved CI/DX, hardening, reliability, architecture, telemetry, rotation, and wrapper follow-up work onto a single branch targeting dev.
  • Preserves merge history with sequential merges on integration/all-approved-prs-to-dev.
  • Includes one post-merge validation fix: flagged-account decrypt failures now fail fast instead of silently returning empty storage.
  • Includes package.json formatting normalization required to clear the existing Biome formatting gate.

Validation

  • Passed: npm run typecheck
  • Passed: npm test
  • Passed: npm run build
  • Passed: npm run lint
  • Passed: npm run clean:repo:check

Conflict Resolutions

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

this integration PR lands ~12 approved branches onto dev: storage at-rest encryption (AES-256-GCM with scrypt KDF), a dual-lock hierarchy (in-process mutex + cross-process file lock + OCC revision check) for account saves, oauth timeout hardening, a local telemetry ledger, retry-governor controls, quota-based account rotation, and several CI/DX additions. the breadth of changes is large but the individual pieces are well-scoped and the conflict resolutions documented in the PR description are accurate.

three issues require attention before merging to production:

  • blocking sync I/O in telemetry log rotationrotateLogsIfNeeded() calls statSync/renameSync/unlinkSync inside the async queueAppend chain (lib/telemetry.ts:256), blocking the node event loop. on windows, AV-held file locks can make these stall for seconds, freezing token refresh and account save operations. needs to be converted to async fs.stat/fs.rename/fs.unlink. no vitest coverage exists for concurrent rotation during a save.

  • unguarded TOCTOU race in rotateStoredSecretEncryption — the main accounts rotation is correctly serialized via withAccountStorageTransaction, but the flagged-accounts load→save block (lib/storage.ts:2295-2298) runs with no lock. two concurrent invocations or a concurrent saveFlaggedAccounts call can cause a silent overwrite of the just-rotated ciphertext.

  • DISK_PREFERRED_MERGE_KEYS discards freshly-refreshed tokens during conflict recovery — including "refreshToken" in the disk-wins set (lib/accounts.ts:863) means that when a concurrent save conflict triggers the merge path, a token refreshed in-memory since the last successful write gets replaced by the older disk copy, causing the next request to auth-fail with a 401.

additionally, isPidAlive via process.kill(pid, 0) is a no-op on windows (always EPERM for cross-process calls), which defeats the pid-based stale lock reclaim path in lib/file-lock.ts; the age-only fallback is the only working path on windows and this should be documented and tested.

Confidence Score: 2/5

  • not safe to merge as-is — the sync I/O in telemetry and the token-discard bug in conflict recovery are production landmines on windows
  • three confirmed logic bugs: (1) blocking sync fs calls inside the async append queue in telemetry.ts will freeze token refresh on windows under AV load; (2) rotateStoredSecretEncryption has an unprotected TOCTOU on flagged accounts; (3) DISK_PREFERRED_MERGE_KEYS will silently discard a freshly-refreshed token during conflict recovery, causing a 401 that cannot be self-healed. these are not speculative — they are deterministic failure paths in the new code paths added by this PR
  • lib/telemetry.ts (sync I/O in async queue), lib/storage.ts (unguarded flagged-accounts rotation), lib/accounts.ts (disk-preferred merge drops fresh tokens)

Important Files Changed

Filename Overview
lib/telemetry.ts new telemetry ledger with redaction and log rotation; synchronous I/O in rotateLogsIfNeeded blocks the event loop inside the async append queue — critical on windows where AV holds file locks
lib/storage.ts major hardening: dual-lock hierarchy (in-process mutex + fs file lock), OCC revision check, at-rest encryption, WAL journaling; rotateStoredSecretEncryption has an unguarded TOCTOU race on flagged accounts outside the serialized lock
lib/accounts.ts adds conflict-recovery merge with debounced retry; DISK_PREFERRED_MERGE_KEYS includes refreshToken which can discard a freshly-refreshed in-memory token during a concurrent save conflict
lib/file-lock.ts new cross-process file lock with stale-lock detection; isPidAlive via process.kill(pid, 0) is unreliable on windows — EPERM is always thrown for cross-process signals, making pid-based stale detection a no-op there
lib/auth/auth.ts adds per-call abort+timeout context for oauth code exchange and refresh; error body is no longer forwarded in plain text (good for token leakage); timeout regex fallback in refreshAccessToken is mildly fragile but functionally safe given fetchWithTimeout emits AbortErrors
lib/secrets-crypto.ts new AES-256-GCM encryption with scrypt KDF (v2) and legacy SHA-256 key derivation (v1); v2 is solid; v1 legacy path remains for migration reads only and is not used for new encryptions
lib/request/retry-governor.ts pure deterministic retry decision function; absolute ceiling and per-retry limit guards are correct; well-tested by test/retry-governor.test.ts
lib/quota-cache.ts saveQuotaCache now wraps the temp-write+rename in acquireFileLock, eliminating the prior concurrent-writer race; return type correctly changed to boolean

Sequence Diagram

sequenceDiagram
    participant Caller
    participant AccountManager
    participant withStorageSerializedFileLock
    participant withAccountFileMutex
    participant withAccountFileLock
    participant withStorageLock
    participant saveAccountsUnlocked
    participant withStorageSaveFileLock
    participant disk as Disk (accounts.json)

    Caller->>AccountManager: saveToDisk()
    AccountManager->>AccountManager: buildStorageSnapshot()
    AccountManager->>AccountManager: persistStorageWithConflictRecovery()
    AccountManager->>withStorageSerializedFileLock: path, fn
    withStorageSerializedFileLock->>withAccountFileMutex: fn
    withAccountFileMutex->>withAccountFileLock: path (.queue.lock), fn
    withAccountFileLock->>withStorageLock: fn
    withStorageLock->>saveAccountsUnlocked: storage
    saveAccountsUnlocked->>withStorageSaveFileLock: path (.lock), task
    withStorageSaveFileLock->>disk: OCC revision check (SHA-256)
    alt revision matches
        withStorageSaveFileLock->>disk: write WAL, write temp, rename
        withStorageSaveFileLock-->>saveAccountsUnlocked: ok
    else ECONFLICT
        withStorageSaveFileLock-->>saveAccountsUnlocked: throws ECONFLICT
        saveAccountsUnlocked-->>withStorageLock: throws
        withStorageLock-->>AccountManager: throws
        AccountManager->>disk: loadAccounts() (reload)
        AccountManager->>AccountManager: mergeIntoLatestStorage(latest, local)
        AccountManager->>AccountManager: retry persistStorageWithConflictRecovery
    end
Loading

Fix All in Codex

Last reviewed commit: afda35a

Greptile also left 3 inline comments on this PR.

Context used:

  • Rule from dashboard - What: Every code change must explain how it defends against Windows filesystem concurrency bugs and ... (source)

ndycode and others added 30 commits March 3, 2026 20:11
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Add timeout and abort boundary for authorization code exchange.

Wire fetch timeout config into oauth exchange callsites.

Add tests for network and timeout failure behavior.

Co-authored-by: Codex <noreply@openai.com>
Add enterprise-grade hardening across runtime, CLI, storage, CI, and docs.\n\n- Add cross-process file locking for settings/quota persistence\n- Add at-rest secret encryption with rotation command and idempotency support\n- Add RBAC/ABAC-style command authorization, JSON redaction, and retention policies\n- Add background retry + dead-letter queue for async persistence failures\n- Add list JSON pagination standard and schemaVersion contract updates\n- Add CI security gates: secret scan, supply-chain/SCA/license checks, SBOM, required checks policy\n- Add operations and incident response runbooks\n- Add/extend tests for new security/reliability primitives and CLI behaviors\n\nValidated with:\n- npm run typecheck\n- npm run lint\n- npm run build && npm test\n- npm run coverage\n- npm run audit:ci\n- npm run license:check\n- npm run clean:repo:check

Co-authored-by: Codex <noreply@openai.com>
- migrate new envelopes to salted scrypt-derived AES keys\n- retain legacy enc:v1 decryption compatibility\n- add coverage for v2 and legacy v1 crypto paths\n- clarify high-entropy 32-byte key requirements in docs

Co-authored-by: Codex <noreply@openai.com>
- dedupe OAuth timeout resolution in plugin flows\n- close upstream abort-listener race in auth exchange\n- add transient config-read retry handling for EBUSY/EPERM\n- add timeout propagation and retry regression tests\n\nCo-authored-by: Codex <noreply@openai.com>
- sanitize OAuth token error logging and payloads
- add deterministic upstream-abort and cleanup regressions
- harden config-save merge path against transient EBUSY/EPERM reads
- strengthen timeout wiring and EPERM regression tests

Co-authored-by: Codex <noreply@openai.com>
- audit break-glass authorization bypass events\n- add multi-process contention regression for file locking\n- cover new auth.break_glass audit action in tests

Co-authored-by: Codex <noreply@openai.com>
Prevent sensitive OAuth payload leakage during refresh HTTP failures by logging status and body length only, and return a generic safe error message when response text exists. Adds regression tests for auth behavior and sanitized logging.

Co-authored-by: Codex <noreply@openai.com>
Introduce optional environment-driven ABAC constraints layered ahead of RBAC for auth actions and commands.

- add ABAC policy evaluation in authorization module
- pass command/interactive/idempotency context from CLI command routing
- enforce read authorization on read command paths consistently
- add unit and CLI coverage for ABAC deny and idempotency requirements
- document ABAC controls in config and operations runbooks

Co-authored-by: Codex <noreply@openai.com>
Add schema-aware config sanitization, introduce a resilient fetch primitive with timeout/retry instrumentation, apply it to remote prompt/update fetches, and add shutdown timeout boundaries with concurrent cleanup deduplication. Includes new regression tests for network retries, config sanitization, and shutdown timeout behavior.

Co-authored-by: Codex <noreply@openai.com>
Introduce a shared account-view helper module and reuse it across the plugin entrypoint and CLI manager to remove duplicated formatting logic while preserving behavior.

Add unit/regression tests to lock formatting and active-index fallback behavior.

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Address unresolved hardening review items across auth abort classification, config sanitization, network retry cancellation, update-check backoff, prompt refresh dedupe coverage, and shutdown timer cleanup. Adds deterministic regression tests for each scenario and aligns config-save expectations with schema-bound validation.

Co-authored-by: Codex <noreply@openai.com>
Add shared helpers for per-family active-index map creation and bulk updates, then reuse them across plugin, CLI manager, and codex-cli sync flows to reduce duplicated state-mutation logic.

Extend tests to lock switch and selection behavior for all model-family indexes.

Co-authored-by: Codex <noreply@openai.com>
Extract active-index normalization into a shared helper and reuse it in plugin account-flow clamping, codex-cli sync, and doctor normalization paths.

Add focused tests covering bounded normalization and empty-account map handling to preserve behavior across both clear and fill modes.

Co-authored-by: Codex <noreply@openai.com>
Extract model-family rate-limit status rendering into account-view helpers and replace duplicate status mapping logic in the plugin status output paths.

Add unit coverage for mixed limited/ok family labels to protect output behavior.

Co-authored-by: Codex <noreply@openai.com>
Move active-index-by-family label rendering into shared account-view helpers and reuse it in both v2 and legacy codex status output paths.

Add helper tests to lock 1-based labels and placeholder output for missing families.

Co-authored-by: Codex <noreply@openai.com>
Move codex-remove index/family reindexing into shared active-index helpers to centralize the mutation policy and reduce inline branching in the plugin entrypoint.

Add helper tests for in-range and out-of-range removal behavior.

Co-authored-by: Codex <noreply@openai.com>
- cleanup stale/dead process lock artifacts before acquiring account lock

- ensure lock release always attempts fallback cleanup

- keep clearAccounts/saveTransactions serialized across file and memory locks

Co-authored-by: Codex <noreply@openai.com>
Introduce shared account-storage view helpers for creating empty v3 storage and cloning mutable working copies, then reuse them across plugin login/check flow, codex-cli sync, and manager reset paths.

Add unit tests to lock clone isolation and family-index initialization behavior.

Co-authored-by: Codex <noreply@openai.com>
- add storage identity fallback test for malformed .git metadata

- add OAuth poll abort test for deterministic close behavior

- add prompt refresh dedupe test for concurrent stale cache reads

Co-authored-by: Codex <noreply@openai.com>
Align local and CI validation through npm run verify commands, wire CI to verify:ci, and update contributor-facing validation docs/templates.

Co-authored-by: Codex <noreply@openai.com>
- replace instantaneous fetch-count check with stable post-refresh assertion window

- verify fetch count remains fixed after refresh persistence completes

Co-authored-by: Codex <noreply@openai.com>
Add dev doctor and setup commands with fail-fast checks, document the contributor workflow, and wire a CI sanity check for doctor:dev.

Co-authored-by: Codex <noreply@openai.com>
- replace microtask spin-loops with vi.waitFor polling

- keep final no-extra-fetch stability assertion after refresh

Co-authored-by: Codex <noreply@openai.com>
Add a canonical verify workflow shared by local and CI, integrate scoped Biome formatting checks, and document contributor/release runbooks to reduce setup and release friction.

Co-authored-by: Codex <noreply@openai.com>
Add regression tests for doctor/setup scripts, make setup command execution deterministic under event-order races, add bounded Windows npm ci retries, and document contributor upgrade + Windows lock remediation guidance.

Co-authored-by: Codex <noreply@openai.com>
Ensure account-storage mutations keep deterministic ordering while preserving the historical file-lock before in-process mutex acquisition sequence.

Co-authored-by: Codex <noreply@openai.com>
Drop pre-acquire dead-process lock cleanup and only run fallback lock-file deletion when lock.release() fails. Also align lock-order comment and add debug observability for fallback cleanup failures.

Co-authored-by: Codex <noreply@openai.com>
ndycode and others added 26 commits March 5, 2026 22:11
- dedupe safe audit resource sanitization in plugin fetch flow
- surface retention retry exhaustion and cover persistent EBUSY regression
- share test removeWithRetry helper and add license policy parsing regressions

Co-authored-by: Codex <noreply@openai.com>
- add recovery drill timeout and failure notifier hook
- clarify upgrade runbook npm script execution context
- harden secret-scan allowlist regression fixtures and test isolation
- reuse shared removeWithRetry helper in audit forwarder tests

Co-authored-by: Codex <noreply@openai.com>
Rotate immediately on 429 responses and bootstrap runtime selection from persisted quota cache (accountId-first, normalized email fallback) with a 30m lazy TTL and env escape hatch.

Co-authored-by: Codex <noreply@openai.com>
- block exhausted accounts until the furthest reset window
- make quota bootstrap loading best-effort on cache failures
- add regressions for email fallback, concurrency, refresh races, and EPERM/EBUSY

Co-authored-by: Codex <noreply@openai.com>
- avoid cross-model bootstrap blocking within the same family
- align scheduler keys with the model that produced cached quota state
- make concurrency/race tests deterministic with explicit barriers

Co-authored-by: Codex <noreply@openai.com>
- simplify bootstrap apply path to use the loader's best-effort fallback
- keep error handling centralized in loadQuotaCacheForBootstrap

Co-authored-by: Codex <noreply@openai.com>
Add an explicit ceiling-exhausted guard and integration coverage to ensure exhausted absolute ceiling stops immediately instead of spinning into retry-limit handling.

Co-authored-by: Codex <noreply@openai.com>
- avoid over-blocking when quota reset timestamps are known but elapsed
- add failure cooldown with retry interval to prevent load hammering while allowing recovery
- normalize model ids for bootstrap exact matching using provider-prefix stripping + lowercase
- add regression coverage for model-id variants, elapsed resets, and cooldown recovery behavior

Co-authored-by: Codex <noreply@openai.com>
- use shared quota schedule key builder in bootstrap, runtime deferral, and stream fallback updates
- add regression in index-retry for provider-prefixed model-id deferral matching
- tighten recovery assertion to verify rotation target account deterministically

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
- harden recovery drill workflow concurrency and webhook payload handling\n- tighten forwarder harness determinism for lock/timeout edge cases\n- dedupe test cleanup helper and make slo report module import test-compatible\n\nCo-authored-by: Codex <noreply@openai.com>
…s-to-dev

Co-authored-by: Codex <noreply@openai.com>
…to-dev

Conflict resolution: preserved verify-first workflow from PR #36 while adding setup, doctor, and Biome-based local bootstrap tooling from PR #37.

Co-authored-by: Codex <noreply@openai.com>
…ved-prs-to-dev

Co-authored-by: Codex <noreply@openai.com>
…-prs-to-dev

Conflict resolution: kept the verify-first DX workflow while preserving PR #32 enterprise hardening scripts, docs, workflows, and runtime changes.

Co-authored-by: Codex <noreply@openai.com>
Conflict resolution: applied the stacked hardening follow-up on top of the merged PR #32 runtime and test surfaces while preserving the earlier DX workflow choices.

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
@coderabbitai
Copy link

coderabbitai bot commented Mar 6, 2026

Important

Review skipped

Too many files!

This PR contains 157 files, which is 7 over the limit of 150.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f679d069-c567-42f6-96ac-4b2944c84846

📥 Commits

Reviewing files that changed from the base of the PR and between d36b04f and afda35a.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (157)
  • .github/pull_request_template.md
  • .github/settings.yml
  • .github/workflows/ci.yml
  • .github/workflows/recovery-drill.yml
  • .github/workflows/release-provenance.yml
  • .github/workflows/retention-maintenance.yml
  • .github/workflows/sbom-attestation.yml
  • .github/workflows/secret-scan.yml
  • .github/workflows/supply-chain.yml
  • .gitleaks.toml
  • CONTRIBUTING.md
  • README.md
  • SECURITY.md
  • biome.jsonc
  • config/performance-budgets.json
  • config/performance-budgets.schema.json
  • config/slo-policy.json
  • docs/DOCUMENTATION.md
  • docs/README.md
  • docs/configuration.md
  • docs/development/CONFIG_FIELDS.md
  • docs/development/LOCAL_DEV.md
  • docs/development/RELEASE_RUNBOOK.md
  • docs/development/TESTING.md
  • docs/index.md
  • docs/operations/audit-forwarding.md
  • docs/operations/incident-drill-template.md
  • docs/operations/incident-response.md
  • docs/operations/release-runbook.md
  • docs/operations/slo-error-budget.md
  • docs/privacy.md
  • docs/reference/commands.md
  • docs/reference/error-contracts.md
  • docs/reference/public-api.md
  • docs/reference/settings.md
  • docs/reference/storage-paths.md
  • docs/runbooks/README.md
  • docs/runbooks/incident-response.md
  • docs/runbooks/operations.md
  • docs/upgrade.md
  • index.ts
  • lib/accounts.ts
  • lib/accounts/account-view.ts
  • lib/accounts/active-index.ts
  • lib/accounts/storage-view.ts
  • lib/audit.ts
  • lib/auth/auth.ts
  • lib/authorization.ts
  • lib/auto-update-checker.ts
  • lib/background-jobs.ts
  • lib/codex-cli/sync.ts
  • lib/codex-manager.ts
  • lib/codex-manager/settings-hub.ts
  • lib/config.ts
  • lib/data-redaction.ts
  • lib/data-retention.ts
  • lib/file-lock.ts
  • lib/idempotency.ts
  • lib/index.ts
  • lib/keytar.d.ts
  • lib/network.ts
  • lib/prompts/codex.ts
  • lib/prompts/host-codex-prompt.ts
  • lib/quota-cache.ts
  • lib/recovery/storage.ts
  • lib/refresh-lease.ts
  • lib/refresh-queue.ts
  • lib/request/fetch-helpers.ts
  • lib/request/response-handler.ts
  • lib/request/retry-governor.ts
  • lib/schemas.ts
  • lib/secrets-crypto.ts
  • lib/secrets/token-store.ts
  • lib/shutdown.ts
  • lib/storage.ts
  • lib/storage/migrations.ts
  • lib/telemetry.ts
  • lib/unified-settings.ts
  • lib/utils.ts
  • package.json
  • scripts/audit-log-forwarder.js
  • scripts/benchmark-runtime-path.mjs
  • scripts/codex.js
  • scripts/compliance-evidence-bundle.js
  • scripts/doctor-dev.js
  • scripts/enterprise-health-check.js
  • scripts/generate-sbom.js
  • scripts/keychain-assert.js
  • scripts/license-policy-check.js
  • scripts/performance-budget-check.js
  • scripts/retention-cleanup.js
  • scripts/secret-scan-regression.sh
  • scripts/seed-health-fixture.js
  • scripts/setup-dev.js
  • scripts/slo-budget-report.js
  • scripts/verify-sbom.js
  • test/account-storage-view.test.ts
  • test/account-view.test.ts
  • test/accounts-edge.test.ts
  • test/accounts.test.ts
  • test/active-index.test.ts
  • test/audit-log-forwarder.test.ts
  • test/audit-retry.test.ts
  • test/audit.test.ts
  • test/auth-logging.test.ts
  • test/auth.test.ts
  • test/authorization.test.ts
  • test/auto-update-checker.test.ts
  • test/background-jobs.test.ts
  • test/chaos/fault-injection.test.ts
  • test/codex-manager-cli.test.ts
  • test/codex-prompts.test.ts
  • test/compliance-evidence-bundle.test.ts
  • test/config-save.test.ts
  • test/data-redaction.test.ts
  • test/data-retention.test.ts
  • test/documentation.test.ts
  • test/enterprise-health-check.test.ts
  • test/fetch-helpers.test.ts
  • test/file-lock-fd-leak.test.ts
  • test/file-lock.test.ts
  • test/generate-sbom.test.ts
  • test/helpers/remove-with-retry.ts
  • test/host-codex-prompt.test.ts
  • test/idempotency.test.ts
  • test/index-retry.test.ts
  • test/index.test.ts
  • test/license-policy-check.test.ts
  • test/network.test.ts
  • test/paths.test.ts
  • test/plugin-config.test.ts
  • test/public-api-contract.test.ts
  • test/quota-cache.test.ts
  • test/recovery-storage.test.ts
  • test/refresh-lease.test.ts
  • test/refresh-queue.test.ts
  • test/response-handler-logging.test.ts
  • test/response-handler.test.ts
  • test/retention-cleanup.test.ts
  • test/retry-governor.test.ts
  • test/schemas.test.ts
  • test/scripts/doctor-dev.test.ts
  • test/scripts/setup-dev.test.ts
  • test/secrets-crypto.test.ts
  • test/security/secret-scan-regression.test.ts
  • test/server.unit.test.ts
  • test/settings-hub-utils.test.ts
  • test/shutdown.test.ts
  • test/slo-budget-report.test.ts
  • test/storage-flagged.test.ts
  • test/storage-recovery-paths.test.ts
  • test/storage-v4-keychain.test.ts
  • test/storage.test.ts
  • test/telemetry.test.ts
  • test/token-store.test.ts
  • test/unified-settings.test.ts
  • vitest.config.ts

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch integration/all-approved-prs-to-dev

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment on lines +254 to +259
await queueAppend(async () => {
await ensureLogDir();
rotateLogsIfNeeded();
const line = `${JSON.stringify(entry)}\n`;
await fs.appendFile(getTelemetryPath(), line, "utf8");
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synchronous I/O inside async queue blocks the event loop

rotateLogsIfNeeded() (called at line 256) uses statSync, existsSync, unlinkSync, and renameSync. all four are synchronous and are called directly inside the queueAppend async task. on windows, antivirus/filesystem watchers routinely hold file locks on *.jsonl files for dozens of milliseconds; a blocking statSync or renameSync inside node's microtask queue will freeze the entire event loop for that duration, stalling all pending i/o — including in-flight token refresh or account save operations.

rotateLogsIfNeeded should be converted to an async function using fs.stat, fs.rename, and fs.unlink (already imported as promises as fs). this also creates a testable regression point — there is currently no vitest coverage for concurrent log rotation while a save is in flight.

Suggested change
await queueAppend(async () => {
await ensureLogDir();
rotateLogsIfNeeded();
const line = `${JSON.stringify(entry)}\n`;
await fs.appendFile(getTelemetryPath(), line, "utf8");
});
await queueAppend(async () => {
await ensureLogDir();
await rotateLogsIfNeeded();
const line = `${JSON.stringify(entry)}\n`;
await fs.appendFile(getTelemetryPath(), line, "utf8");
});

(rotateLogsIfNeeded itself should be converted to async using fs.stat, fs.rename, and fs.unlink.)

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/telemetry.ts
Line: 254-259

Comment:
**Synchronous I/O inside async queue blocks the event loop**

`rotateLogsIfNeeded()` (called at line 256) uses `statSync`, `existsSync`, `unlinkSync`, and `renameSync`. all four are synchronous and are called directly inside the `queueAppend` async task. on windows, antivirus/filesystem watchers routinely hold file locks on `*.jsonl` files for dozens of milliseconds; a blocking `statSync` or `renameSync` inside node's microtask queue will freeze the entire event loop for that duration, stalling all pending i/o — including in-flight token refresh or account save operations.

`rotateLogsIfNeeded` should be converted to an async function using `fs.stat`, `fs.rename`, and `fs.unlink` (already imported as `promises as fs`). this also creates a testable regression point — there is currently no vitest coverage for concurrent log rotation while a save is in flight.

```suggestion
		await queueAppend(async () => {
			await ensureLogDir();
			await rotateLogsIfNeeded();
			const line = `${JSON.stringify(entry)}\n`;
			await fs.appendFile(getTelemetryPath(), line, "utf8");
		});
```

(`rotateLogsIfNeeded` itself should be converted to `async` using `fs.stat`, `fs.rename`, and `fs.unlink`.)

How can I resolve this? If you propose a fix, please make it concise.

Fix in Codex

Comment on lines +2295 to +2298
const flagged = await loadFlaggedAccounts();
const flaggedCount = flagged.accounts.length;
if (flaggedCount > 0) {
await saveFlaggedAccounts(flagged);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rotateStoredSecretEncryption has an unguarded TOCTOU race on flagged accounts

the main accounts block is correctly serialized via withAccountStorageTransaction (which holds withStorageSerializedFileLock), but the flagged-accounts rotate (loadFlaggedAccountssaveFlaggedAccounts) runs entirely outside any lock. two concurrent callers or a concurrent saveFlaggedAccounts from elsewhere will interleave: the second writer's load snapshot is already stale before it calls save, silently overwriting the first writer's rotation result. this is a real risk on windows where cli restarts can overlap.

the saveFlaggedAccounts path already supports atomic temp-file-rename writes, but without serialization the load→save window is unguarded. there is no regression test for concurrent rotateStoredSecretEncryption invocations in test/storage.test.ts or test/storage-flagged.test.ts.

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/storage.ts
Line: 2295-2298

Comment:
**`rotateStoredSecretEncryption` has an unguarded TOCTOU race on flagged accounts**

the main accounts block is correctly serialized via `withAccountStorageTransaction` (which holds `withStorageSerializedFileLock`), but the flagged-accounts rotate (`loadFlaggedAccounts``saveFlaggedAccounts`) runs entirely outside any lock. two concurrent callers or a concurrent `saveFlaggedAccounts` from elsewhere will interleave: the second writer's load snapshot is already stale before it calls save, silently overwriting the first writer's rotation result. this is a real risk on windows where cli restarts can overlap.

the `saveFlaggedAccounts` path already supports atomic temp-file-rename writes, but without serialization the load→save window is unguarded. there is no regression test for concurrent `rotateStoredSecretEncryption` invocations in `test/storage.test.ts` or `test/storage-flagged.test.ts`.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Codex

Comment on lines +863 to +864
if (DISK_PREFERRED_MERGE_KEYS.has(rawKey) && currentValue !== undefined) {
continue;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DISK_PREFERRED_MERGE_KEYS can discard a freshly-refreshed in-memory token during conflict recovery

DISK_PREFERRED_MERGE_KEYS contains "refreshToken" and "accessToken". in mergeStoredAccountRecords the guard currentValue !== undefined means the disk copy always wins. the problem: mergeIntoLatestStorage is called during conflict recovery (after a successful refreshAccessToken that updated the in-memory account but before the first save succeeded). if the disk has the old token and the in-memory state has the new one, the merge discards the fresh token and writes the stale one back. the next request will then fail auth with a 401 using the now-invalid token.

the intent is to prefer stable long-lived disk state, but refreshToken rotates on every refresh call; disk-preferred semantics are wrong for this field in the conflict-recovery merge path. consider comparing token expiry (expiresAt) and preferring the newer token, or at minimum not including "refreshToken" in DISK_PREFERRED_MERGE_KEYS.

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/accounts.ts
Line: 863-864

Comment:
**`DISK_PREFERRED_MERGE_KEYS` can discard a freshly-refreshed in-memory token during conflict recovery**

`DISK_PREFERRED_MERGE_KEYS` contains `"refreshToken"` and `"accessToken"`. in `mergeStoredAccountRecords` the guard `currentValue !== undefined` means the disk copy always wins. the problem: `mergeIntoLatestStorage` is called **during conflict recovery** (after a successful `refreshAccessToken` that updated the in-memory account but before the first save succeeded). if the disk has the old token and the in-memory state has the new one, the merge discards the fresh token and writes the stale one back. the next request will then fail auth with a 401 using the now-invalid token.

the intent is to prefer stable long-lived disk state, but `refreshToken` rotates on every refresh call; disk-preferred semantics are wrong for this field in the conflict-recovery merge path. consider comparing token expiry (`expiresAt`) and preferring the newer token, or at minimum not including `"refreshToken"` in `DISK_PREFERRED_MERGE_KEYS`.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Codex

@ndycode ndycode merged commit 555ca58 into dev Mar 6, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant