Conversation
Implement security, reliability, observability, and release-governance hardening for enterprise certification baseline. - add keychain-backed secret storage with V4 account schema and migrations - harden file permissions for settings/config/audit artifacts - add CI secret scan and release provenance workflows - add enterprise health, retention cleanup, and performance budget checks - add incident/release operational runbooks and documentation updates - add targeted tests and config updates for new controls Co-authored-by: Codex <noreply@openai.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthroughwalkthroughadds keychain-backed secret storage (v4), deterministic secret refs, secure file permissions, audit retention and purge, many operational scripts and ci/workflows, and extensive tests for token-store, storage-v4, audit forwarding, and ops tooling. changes
estimated code review effort🎯 5 (critical) | ⏱️ ~120 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
Additional Comments (3)
this needs to be throttled — run at most once per process lifetime or via a debounce window (e.g., once per hour). a simple approach: let lastPurgeMs = 0;
const PURGE_INTERVAL_MS = 60 * 60 * 1000; // 1 hour
function purgeExpiredLogs(): void {
const now = Date.now();
if (now - lastPurgeMs < PURGE_INTERVAL_MS) return;
lastPurgeMs = now;
// ... existing purge logic
}no regression test covers concurrent audit writes or the EBUSY-under-antivirus scenario (e.g.,
when both the guard should also check that the seed carries meaningful identity, not just that it's non-empty: const hasMeaningfulSeed =
normalizedAccountId.length > 0 || normalizedEmail.length > 0;
const seed = hasMeaningfulSeed ? stableSeed : fallbackSeed;this is a token-leakage risk on windows desktops where antivirus-delayed logins can stack accounts with missing metadata.
wrap each deletion to isolate failures: for (const targetPath of expired) {
try {
await removeWithRetry(targetPath, { force: true });
} catch (error) {
console.warn(`retention-cleanup: could not remove ${targetPath}: ${error?.message ?? error}`);
}
}this is especially important as an operational script that runs on user desktops where file locks are unpredictable. also, no vitest coverage exists for this partial-failure path. |
There was a problem hiding this comment.
Actionable comments posted: 30
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/config.ts (1)
294-304:⚠️ Potential issue | 🟠 Majoradd deterministic retry regression for env-path rename locks
the retry loop in
lib/config.ts:294-304handlesebusy/eperm, but there is no deterministic vitest case that forcesfs.renamefailures on the env-path branch intest/config-save.test.ts:71-132. this is a concurrency/reliability risk for locked files.test case to add
+ it("retries env-path rename on ebusy and succeeds", async () => { + const configPath = join(tempDir, "plugin-config.json"); + process.env.CODEX_MULTI_AUTH_CONFIG_PATH = configPath; + await fs.writeFile(configPath, JSON.stringify({ codexMode: true }), "utf8"); + + const renameSpy = vi.spyOn(fs, "rename"); + renameSpy.mockImplementationOnce(async () => { + const error = new Error("busy") as NodeJS.ErrnoException; + error.code = "EBUSY"; + throw error; + }); + try { + const { savePluginConfig } = await import("../lib/config.js"); + await savePluginConfig({ codexMode: false }); + const parsed = JSON.parse(await fs.readFile(configPath, "utf8")) as Record<string, unknown>; + expect(parsed.codexMode).toBe(false); + } finally { + renameSpy.mockRestore(); + } + });as per coding guidelines
lib/**: focus on auth rotation, windows filesystem io, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/config.ts` around lines 294 - 304, The rename retry loop in lib/config.ts (the fs.rename retry inside the save flow) needs a deterministic vitest that simulates transient EBUSY/EPERM failures on the env-path branch so we can assert the retry behavior; add a test in test/config-save.test.ts that mocks or stubs fs.rename to throw EBUSY/EPERM for the first N attempts and then succeed, assert that the code calls fs.rename the expected number of times and ultimately succeeds, and ensure the test covers the env-path code path (the same function that performs the rename/rename-with-tempPath logic) so we catch concurrency/lock regressions and validate exponential backoff handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 53-57: The "Enterprise health check" CI step (npm script
ops:health-check which runs scripts/enterprise-health-check.js) can pass
vacuously because the script inspects ~/.codex/multi-auth paths that are absent
on fresh runners; either seed the expected fixture files in the workflow before
the "Enterprise health check" step, or update scripts/enterprise-health-check.js
to accept a new flag (e.g., --require-files) that causes it to exit non-zero
when the expected paths are missing, or else remove/mark the ops:health-check
step as local-only in the workflow; reference the script name
(scripts/enterprise-health-check.js) and the npm script (ops:health-check) when
making the change.
In @.github/workflows/release-provenance.yml:
- Around line 30-35: The workflow job "Validate quality gates" is missing the
security checklist gates listed in SECURITY.md (audit:ci, ops:health-check,
perf:budget-check); update the run steps in that job to include `npm run
audit:ci`, `npm run ops:health-check`, and `npm run perf:budget-check` in the
prescribed order (before or alongside `npm run lint`, `npm run typecheck`, `npm
run build`, `npm test`) so the release-provenance.yml job aligns with
SECURITY.md's required checks.
- Around line 37-38: The publish job's "Publish package with provenance" step
runs `npm publish --provenance --access public` but doesn't supply
authentication; add the environment variable NODE_AUTH_TOKEN pointing to the
repository secret (e.g., set NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}) for that
step so `npm publish` can authenticate, and ensure `actions/setup-node` is
configured with the correct registry-url and that NPM_TOKEN secret has publish
permissions for this package.
In @.github/workflows/secret-scan.yml:
- Around line 19-23: Add a regression test step to the secret-scan workflow so
the scanner's policy is exercised: after the "Run gitleaks" step (the action
using gitleaks/gitleaks-action@v2), add a step that runs the existing regression
script test/security/secret-scan-regression.test.sh (or an equivalent small
test) and ensure it runs in the same environment (pass GITHUB_TOKEN and
GITLEAKS_CONFIG) and returns non-zero on seeded disallowed-secret detection so
the job fails if allowlist/policy drift neuters the scanner; reference the "Run
gitleaks" step and the script name when locating where to insert the new
workflow step.
- Around line 15-23: Replace mutable action versions in the workflow: pin uses:
actions/checkout@v4 and uses: gitleaks/gitleaks-action@v2 to their full commit
SHAs (use the upstream commit SHAs for those actions) and add an explicit
top-level permissions block that grants only contents: read and pull-requests:
write for the GITHUB_TOKEN; update references to GITHUB_TOKEN and
GITLEAKS_CONFIG remain unchanged but ensure the new permissions block is present
at the workflow root so the token has least privilege.
In @.gitleaks.toml:
- Around line 1-2: The .gitleaks.toml currently only defines an [allowlist] and
no detection rules; add an [extend] section with useDefault = true (i.e., add a
top-level [extend] useDefault = true entry) so gitleaks inherits its default
[[rules]] instead of running with zero rules; place the [extend] useDefault =
true directive before the existing [allowlist] block and ensure you do not
remove or rename the existing [allowlist] or any future [[rules]] sections.
- Around line 10-15: Remove the risky OpenAI-style allowlist entry
'''sk-[A-Za-z0-9]{20,}''' from the regexes array in .gitleaks.toml (it currently
appears as the fourth regex); simply delete that pattern so the file no longer
suppresses potential real API keys while leaving the other synthetic patterns
intact.
- Around line 6-8: Update the regex path patterns in .gitleaks.toml to accept
both forward and backslashes by replacing instances of '/' with a character
class that matches either separator (use [\\/]) for the existing patterns (e.g.,
the three literals '''^test/''' , '''^docs/releases/''' ,
'''^docs/development/DEEP_AUDIT_2026-03-01\.md$'''); ensure each pattern becomes
'''^test[\\/]''', '''^docs[\\/]releases[\\/]''', and
'''^docs[\\/]development[\\/]DEEP_AUDIT_2026-03-01\.md$''' so gitleaks on
Windows will match backslash paths too.
In `@config/performance-budgets.json`:
- Around line 1-7: Add explicit units for the numeric thresholds in
performance-budgets.json by documenting that values are milliseconds: either
create a sibling performance-budgets.schema.json that defines the properties
(filterInput_small, filterInput_large, cleanupToolDefinitions_medium,
cleanupToolDefinitions_large, accountHybridSelection_200) with a description
mentioning "milliseconds", or add a clear header comment to
scripts/performance-budget-check.js that states the unit for these keys; ensure
the documentation references the same key names so future maintainers know the
unit when tuning budgets.
In `@docs/configuration.md`:
- Line 69: The markdown table cell for CODEX_SECRET_STORAGE_MODE contains pipe
characters that break table parsing; update the cell for
`CODEX_SECRET_STORAGE_MODE` to avoid unescaped '|' characters (either replace
them with slashes like `keychain/plaintext/auto` to match other entries such as
`CODEX_MODE=0/1`, or escape each pipe as `\|`) so the table renders correctly
and consistently.
In `@docs/privacy.md`:
- Around line 51-57: Update the retention cleanup docs to show how to override
the default 90-day window by appending the flag when invoking the npm script;
specifically, add an example using the npm script name "npm run
ops:retention-cleanup" with the override flag "--days=<n>" (e.g., "--days=30")
so operators know to run "npm run ops:retention-cleanup -- --days=30" to
customize retention.
In `@lib/audit.ts`:
- Around line 157-158: The current sequence calls purgeExpiredLogs() before
rotateLogsIfNeeded(), which can cause files just created by rotation to be
immediately deleted; change the order so rotateLogsIfNeeded() runs before
purgeExpiredLogs(), or modify purgeExpiredLogs() (used by the caller) to skip
files that were modified/created within the rotation window (e.g., check recent
mtime or a rotation marker) to avoid purging newly-rotated audit files; update
the caller that currently invokes purgeExpiredLogs() and rotateLogsIfNeeded() to
call rotateLogsIfNeeded() first (or implement the mtime check inside
purgeExpiredLogs()) and ensure tests cover recently-rotated files like
audit.4.log.
- Around line 99-118: purgeExpiredLogs is being invoked on every auditLog call
causing a directory scan per event; change the call site so purgeExpiredLogs
runs only periodically by adding a module-scoped timestamp (e.g., lastPurgeMs)
and a constant PURGE_INTERVAL_MS (suggest 1 hour) and, inside auditLog, check if
Date.now() - lastPurgeMs >= PURGE_INTERVAL_MS before calling purgeExpiredLogs
(update lastPurgeMs when you run it), or alternatively move the purgeExpiredLogs
invocation into your log rotation/rotation-related function so it only runs at
rotation time; reference purgeExpiredLogs, auditLog, auditConfig, and
DEFAULT_AUDIT_RETENTION_DAYS when making the change.
- Around line 105-117: The purgeExpiredLogs loop currently uses
readdirSync/statSync/unlinkSync and swallows all errors; change it to explicitly
detect Windows EBUSY errors around statSync/unlinkSync and retry those
operations using the sync retry pattern from lib/logger.ts (Atomics.wait-based
backoff), keeping the same cutoffMs logic and still skipping non-audit files;
narrow the catch to rethrow non-EBUSY errors and only suppress after retries
fail. Update/add test/audit.test.ts to simulate an EBUSY during
unlinkSync/statSync and assert that the retry logic is invoked and the file is
eventually purged or a deterministic failure occurs (matching existing
behavior).
In `@lib/codex-manager.ts`:
- Around line 4043-4065: The auditActionForCommand function currently maps
command names (auditActionForCommand) to semantically incorrect AuditAction
values; update the AuditAction enum to add either a generic COMMAND_RUN
(recommended) or specific actions like FORECAST_RUN/REPORT_RUN/DOCTOR_RUN, then
change auditActionForCommand to return COMMAND_RUN for query-style commands
(forecast, report, list, status) and to return COMMAND_RUN or command-specific
run actions for fix/doctor so dry-run invocations aren't treated as
CONFIG_CHANGE; ensure callers that record audits include the original command
name in metadata so the generic COMMAND_RUN can be disambiguated.
In `@lib/schemas.ts`:
- Line 149: Change the accessTokenRef Zod rule to reject empty strings by
updating the schema entry from accessTokenRef: z.string().optional() to a
non-empty string when present (e.g., accessTokenRef:
z.string().nonempty().optional() or z.string().min(1).optional()) in
lib/schemas.ts, and add a unit test in test/schemas.test.ts that uses the
schema's safeParse (or parse) to assert that an object with accessTokenRef: ""
is rejected (safeParse returns success=false or parse throws), ensuring the
schema enforces non-empty values for accessTokenRef.
In `@lib/secrets/token-store.ts`:
- Around line 103-111: The current sequence writes refreshTokenRef then
accessTokenRef and can leave refresh persisted if the access write fails; update
the logic in lib/secrets/token-store.ts (the getKeytarOrThrow call and the
setPassword calls for SECRET_SERVICE using refreshTokenRef and accessTokenRef)
so the write is rollback-safe: either write access first (if present) then
refresh, or keep the existing order but wrap the access setPassword in a
try/catch and on any failure call keytar.deletePassword(SECRET_SERVICE,
refreshTokenRef) to remove the orphaned refresh token before re-throwing the
error; ensure no logs print secrets or emails. Add a vitest regression in
test/token-store.test.ts that forces keytar.setPassword to throw for the
`:access` write (mocking getKeytarOrThrow/keytar), verifies the refresh entry is
removed (deletePassword called) and that the original error bubbles up, and
include assertions that no token strings are emitted to logs.
- Around line 45-53: The dynamic import of keytar in lib/secrets/token-store.ts
assumes methods exist at the top-level (mod.setPassword, mod.getPassword,
mod.deletePassword) which breaks for CJS packages imported into ESM (they appear
under mod.default); change the resolution logic after the import to normalize
the module (e.g., let keytar = (mod && (mod.setPassword || mod.getPassword ||
mod.deletePassword)) ? mod : mod?.default ?? null) and then validate
keytar.setPassword/getPassword/deletePassword before returning it; also update
the test in test/token-store.test.ts to add or adjust a mock that simulates the
CJS-wrapped shape ({ default: { setPassword, getPassword, deletePassword } }) or
add a dedicated esm interop test case to ensure the wrapped export path works.
In `@lib/storage.ts`:
- Around line 890-893: The warning currently logs PII (rawAccount.email); remove
email from the payload and log only non-sensitive identifiers (e.g.,
rawAccount.accountId) or, if you need a correlate, a one-way hash of the email
(use the existing hashing utility or add one) in the log.warn call where the
message "Skipping v4 account with missing keychain secret" is emitted. Update
any affected vitest tests to expect the email is not present (or expect the
hash), and run/adjust tests under lib/**; also ensure any new queue changes
introduced around this code follow existing EBUSY/429 retry patterns used
elsewhere.
- Around line 1279-1307: clearPersistedAccountSecrets currently only inspects
the top-level v4 JSON shape and calls deleteAccountSecrets which can be
mode-dependent; update clearPersistedAccountSecrets to also detect WAL-style
records (objects with {version:1, content: "..."}) and JSON.parse the content to
extract account refs (refreshTokenRef/accessTokenRef) the same way as top-level
accounts, then call a mode-agnostic deletion path (either add a
force/deleteDirect helper used by deleteAccountSecrets or pass a force flag) so
deletions run even if runtime mode flipped; additionally, implement
retries/backoff for delete failures due to transient EBUSY or 429 errors and
ensure logs avoid printing tokens/emails, and add/adjust vitest cases covering
WAL payload parsing, mode-flip deletion, EBUSY/429 retries, and Windows IO
concurrency scenarios referencing clearPersistedAccountSecrets,
deleteAccountSecrets, and the new force/deleteDirect helper.
- Around line 1111-1139: The serialize path writes secrets to the keychain via
ensureSecretStorageBackendAvailable() and persistAccountSecrets() before the
file write completes, so if later steps fail the keychain will have orphaned
refs; modify the serialization flow in lib/storage.ts to collect the refs
returned by persistAccountSecrets (refsByIndex), attempt the file
write/migrateV3ToV4 and only commit (keep) refs on success, and on any failure
perform a rollback that deletes/revokes all refs written during this call (use
the same secret backend API used to persist refs), ensuring rollback handles
transient errors like EBUSY/429 with retries/backoff; update/move logic around
persistAccountSecrets, migrateV3ToV4, and the final JSON stringify so cleanup
runs on throw, avoid logging any tokens/emails during errors, and add a vitest
regression test at test/storage-v4-keychain.test.ts that simulates a file-write
failure after persistAccountSecrets and asserts that all created refs were
removed.
In `@lib/unified-settings.ts`:
- Around line 126-131: The settings writer currently sets file mode only on
initial temp-file creation (writeFileSync with SECURE_FILE_MODE) but when the
temp is atomically renamed over an existing file the old permissions remain; add
a regression test that ensures re-hardening by calling
saveUnifiedPluginConfigSync twice on the same path and verifying the final file
mode is 0o600. Specifically, in the test suite (where expectSecureFileMode is
defined) create a single temp directory, call saveUnifiedPluginConfigSync(...)
once, then call it a second time against the same directory, stat the resulting
UNIFIED_SETTINGS_PATH and assert its mode equals SECURE_FILE_MODE (0o600); if
desired extend the production code path around saveUnifiedPluginConfigSync /
renameSync to explicitly chmod the target to SECURE_FILE_MODE after rename to
guarantee permissions on overwrite.
In `@scripts/enterprise-health-check.js`:
- Around line 71-83: The existsSync/ stat pair for walPath creates a TOCTOU
race; replace the existsSync check with a single await stat(walPath) inside a
try/catch (or wrap the existing stat call in try/catch) so if the file is
removed between check and use you handle the ENOENT gracefully and skip pushing
checks/findings; update the block referencing walPath, walStats, walAgeMs,
WAL_STALE_MS, checks and findings to only compute walAgeMs and push the
stale-wal finding when stat succeeds, and log/ignore ENOENT errors in the catch
so the script remains robust.
- Line 69: Replace the hardcoded auditDir assignment in
enterprise-health-check.js (currently using join(root, "logs")) to call and use
getCodexLogDir() (the same helper used by lib/audit.ts and lib/runtime-paths.ts)
so the health check follows fallback locations returned by
getCodexMultiAuthDir()/getCodexLogDir(); update any references to resolveRoot()
there to prefer getCodexLogDir() for audit path resolution. Add a regression
test modeled on test/runtime-paths.test.ts (lines ~48-100) that sets up fallback
account/log directories and asserts enterprise-health-check scans the fallback
audit dir (including asserting newestMtimeMs() sees files with platform-specific
separators), and ensure path comparisons in the test and the health-check logic
treat Windows paths case-insensitively to match lib/runtime-paths.ts
deduplication behavior.
In `@scripts/performance-budget-check.js`:
- Around line 32-43: The loop assumes report is an object with an array at
report.results and can throw if report is malformed; before iterating, validate
that report is an object and that report.results is an Array (e.g., use typeof
report === "object" && report !== null and Array.isArray(report.results)) and
fall back to an empty array if not; update the iteration to use the validated
results array so the logic around budgets[result.name], result.avgMs and pushing
into violations remains safe.
- Around line 26-43: The code in main() calls
JSON.parse(readFileSync(budgetPath)) and JSON.parse(readFileSync(outputPath))
without ensuring those files exist or contain valid JSON; add guards before
parsing: check fs.existsSync(budgetPath) and fs.existsSync(outputPath) (or
try/catch around readFileSync) and produce a clear error/exit if missing, and
wrap JSON.parse in try/catch to handle invalid JSON from runBenchmark()
failures, logging which file (budgetPath or outputPath) failed and
returning/exit early so violations logic (report.results) only runs on valid
parsed objects.
In `@test/storage-v4-keychain.test.ts`:
- Around line 57-98: Add two deterministic vitest regression tests: (1)
concurrent save test — call saveAccounts concurrently (e.g., Promise.all on
several saveAccounts with distinct account payloads) and after all resolve read
getStoragePath() file, JSON.parse it and assert it’s valid JSON (no torn
writes), version === 4 and the expected accounts data is present/consistent;
reference saveAccounts and getStoragePath to locate code. (2) windows
rename-retry test — vi.mock or vi.doMock fs.rename to throw an Error with code
'EPERM' (or 'EBUSY') on the first N attempts then succeed on a subsequent
attempt, call saveAccounts and assert persistence succeeded by reading the file
and checking contents; reference saveAccounts and the internal rename path
(fs.rename) so you can stub the retry logic. In both tests mock keytar like the
existing test to keep deterministic secrets, ensure mocks are reset between
tests, and assert fs.rename was called the expected number of times.
In `@test/token-store.test.ts`:
- Around line 32-63: Add deterministic vitest tests in test/token-store.test.ts
that cover concurrency and failure-paths for keychain mode: keep the existing
happy-path but add three new cases that use vi.doMock("keytar") and controlled
Promise resolvers to (1) simulate concurrent calls to
token-store.persistAccountSecrets("acct-1", ...) and assert both callers either
get consistent refs and stored values (use Promise-controlled ordering to
reproduce race), (2) simulate a forced rejection from setPassword when writing
the access token while allowing the refresh write to succeed and assert that
token-store.persistAccountSecrets rolls back by deleting the refresh entry
(verify deletePassword was called for the refresh ref), and (3) verify
deleteAccountSecrets cleans up both refresh and access entries (mock
deletePassword and assert calls). Ensure each test calls
token-store.ensureSecretStorageBackendAvailable() and uses
token-store.loadAccountSecrets/persistAccountSecrets/deleteAccountSecrets for
assertions; make mocks deterministic by exposing resolve/reject controls rather
than timing-based delays.
In `@test/unified-settings.test.ts`:
- Around line 6-10: The test helper expectSecureFileMode currently returns early
on Windows, skipping validation; instead, update expectSecureFileMode to perform
Windows-safe assertions: when process.platform === "win32" assert the file
exists and is readable (e.g., attempt a deterministic fs.readFile or fs.access)
and assert no temp/backup files were left behind (check that no sibling
filenames like `${path}.tmp` or similar exist) so the secure-write paths
exercised by lib/unified-settings.ts (the secure-write branch around lines where
secure write is implemented) are validated on Windows; keep the existing POSIX
mode check for non-Windows. Ensure assertions use vitest APIs and remain
deterministic.
---
Outside diff comments:
In `@lib/config.ts`:
- Around line 294-304: The rename retry loop in lib/config.ts (the fs.rename
retry inside the save flow) needs a deterministic vitest that simulates
transient EBUSY/EPERM failures on the env-path branch so we can assert the retry
behavior; add a test in test/config-save.test.ts that mocks or stubs fs.rename
to throw EBUSY/EPERM for the first N attempts and then succeed, assert that the
code calls fs.rename the expected number of times and ultimately succeeds, and
ensure the test covers the env-path code path (the same function that performs
the rename/rename-with-tempPath logic) so we catch concurrency/lock regressions
and validate exponential backoff handling.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ed2c4888-6257-4052-b159-051d60f151ae
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (33)
.github/workflows/ci.yml.github/workflows/release-provenance.yml.github/workflows/secret-scan.yml.gitleaks.tomlREADME.mdSECURITY.mdconfig/performance-budgets.jsondocs/README.mddocs/configuration.mddocs/operations/incident-response.mddocs/operations/release-runbook.mddocs/privacy.mddocs/reference/settings.mddocs/reference/storage-paths.mdlib/audit.tslib/codex-manager.tslib/config.tslib/keytar.d.tslib/schemas.tslib/secrets/token-store.tslib/storage.tslib/storage/migrations.tslib/unified-settings.tspackage.jsonscripts/enterprise-health-check.jsscripts/performance-budget-check.jsscripts/retention-cleanup.jstest/config-save.test.tstest/schemas.test.tstest/storage-v4-keychain.test.tstest/token-store.test.tstest/unified-settings.test.tsvitest.config.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
docs/**
⚙️ CodeRabbit configuration file
keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.
Files:
docs/README.mddocs/operations/release-runbook.mddocs/operations/incident-response.mddocs/reference/settings.mddocs/configuration.mddocs/reference/storage-paths.mddocs/privacy.md
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/keytar.d.tslib/config.tslib/storage.tslib/codex-manager.tslib/secrets/token-store.tslib/unified-settings.tslib/storage/migrations.tslib/audit.tslib/schemas.ts
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/storage-v4-keychain.test.tstest/token-store.test.tstest/config-save.test.tstest/unified-settings.test.tstest/schemas.test.ts
🧬 Code graph analysis (9)
test/storage-v4-keychain.test.ts (1)
lib/storage.ts (3)
saveAccounts(1273-1277)getStoragePath(470-475)loadAccounts(872-874)
lib/storage.ts (4)
lib/storage/migrations.ts (3)
AccountStorageV1(34-38)AccountStorageV3(59-64)AccountStorageV4(83-88)lib/secrets/token-store.ts (6)
getEffectiveSecretStorageMode(62-68)loadAccountSecrets(119-136)ensureSecretStorageBackendAvailable(78-82)deriveAccountSecretRef(84-94)persistAccountSecrets(96-117)deleteAccountSecrets(138-147)lib/schemas.ts (2)
AnyAccountStorageSchema(210-214)AccountStorageV4Schema(165-170)lib/utils.ts (1)
isRecord(11-13)
lib/codex-manager.ts (1)
lib/audit.ts (1)
auditLog(146-177)
lib/secrets/token-store.ts (1)
lib/logger.ts (1)
createLogger(366-424)
lib/unified-settings.ts (1)
lib/runtime-paths.ts (1)
getCodexMultiAuthDir(166-199)
test/unified-settings.test.ts (1)
lib/unified-settings.ts (1)
getUnifiedSettingsPath(223-225)
lib/storage/migrations.ts (2)
lib/storage.ts (6)
AccountMetadataV4(48-48)RateLimitStateV3(43-43)CooldownReason(42-42)AccountStorageV4(49-49)AccountStorageV3(47-47)AccountMetadataV3(46-46)lib/prompts/codex.ts (1)
ModelFamily(47-52)
scripts/performance-budget-check.js (5)
scripts/audit-dev-allowlist.js (1)
process(139-147)scripts/test-model-matrix.js (1)
result(383-386)scripts/enterprise-health-check.js (1)
payload(100-106)scripts/retention-cleanup.js (1)
payload(89-95)lib/logger.ts (1)
error(389-393)
test/schemas.test.ts (1)
lib/schemas.ts (2)
AccountStorageV4Schema(165-170)AnyAccountStorageSchema(210-214)
🪛 LanguageTool
docs/operations/incident-response.md
[style] ~39-~39: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...rst, then codex auth doctor --fix. 4. If auth failures persist, rotate account v...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 markdownlint-cli2 (0.21.0)
docs/configuration.md
[warning] 69-69: Table column count
Expected: 2; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
🔇 Additional comments (17)
.github/workflows/secret-scan.yml (1)
3-8: trigger scope is appropriate for mainline gating; no concurrency risk here.push + pull_request on main is a reasonable baseline trigger setup for this scanner workflow.
ref:
lib/.github/workflows/secret-scan.yml:3README.md (1)
189-189: lgtm — env var documented consistently across readme and docs/configuration.md:69.the
keychaindefault aligns with the secret storage implementation inlib/secrets/token-store.ts. docs match the actual behavior.SECURITY.md (1)
86-93: lgtm — pre-release checklist now includes the new ops and perf scripts.order is sensible:
audit:cifirst for vulnerability scan, thenops:health-checkandperf:budget-checkbefore lint/test/build. matches the ci workflow additions.lib/keytar.d.ts (1)
1-5: type declarations match keytar npm api exactly; subset is appropriate.declarations at lines 1-5 are correct—setPassword, getPassword, deletePassword all match node-keytar's published signatures. only these three are needed; token-store.ts:lib/secrets/token-store.ts:105-110, lib/secrets/token-store.ts:126-133, lib/secrets/token-store.ts:143-145 never calls findPassword or findCredentials.
verify test coverage: test/token-store.test.ts:32-63 covers keytar roundtrip (store + load). no ebusy/429 applicable here (local credential store, not http queue). logging at lib/secrets/token-store.ts:128 safely logs ref, not token value.
lib/config.ts (1)
287-291: harden file/directory permissions after creation and add regression tests for retry scenarioslib/config.ts:287-291 sets mode at mkdir/writeFile creation, but skips chmod to remediate pre-existing permissions or enforce exact modes after atomic rename. on unix-like systems, parent directory permissions or umask can weaken the created file/directory; chmod after creation ensures the target permissions regardless. the retry logic handles EBUSY/EPERM in the rename loop (lib/config.ts:294-305) but test/config-save.test.ts has no coverage for these transient filesystem errors.
windows mode-bit behavior is correctly skipped in test assertions (test/config-save.test.ts:28), so no windows-specific chmod is needed. however, test coverage gaps exist:
- no test cases for EBUSY/EPERM retries in config save paths
- no test cases for directory permission remediation on reused custom config paths (pre-existing directories)
apply the proposed fix (chmod on non-windows platforms after mkdir and after rename success) and add vitest cases covering:
- EBUSY on rename (verify retry succeeds after transient lock)
- pre-existing directory with loose permissions (verify chmod tightens perms on reused paths)
lib/codex-manager.ts (1)
4067-4089: lgtm - audit wrapper handles success and failure paths correctly.the
runWithAuditwrapper properly captures exit codes on success and error messages on failure. metadata is limited to safe values (exitCode, error string) with no token/email leakage.scripts/retention-cleanup.js (2)
33-47: good ebusy/eperm retry handling.
scripts/retention-cleanup.js:33-47correctly implements exponential backoff forEBUSY,EPERM, andENOTEMPTYerrors with up to 6 attempts. this handles transient windows file locking scenarios well.
85-87: sequential deletion is appropriate here.sequential deletion in the loop avoids thundering herd on filesystem and reduces likelihood of concurrent EBUSY conflicts. good choice for reliability over raw speed.
docs/reference/storage-paths.md (1)
34-37: upgrade.md must document v4 storage format migration path. the security note describes version 4 storing keychain references instead of raw tokens in account files. per coding guidelines, behavior changes require updated upgrade notes. docs/upgrade.md documents worktree migration but has no v4 migration steps—add section covering plaintext→keychain transition, CODEX_SECRET_STORAGE_MODE usage for controlled environments, and any re-auth requirements. also verify docs/SECURITY and docs/README mention this storage format change if they reference account security.package.json (1)
127-129: add windows regression tests for keytar credential manager integration.keytar@7.9.0 is the latest version on npm. the optional dependency is properly handled—code gracefully falls back to plaintext mode when keytar is unavailable (lib/secrets/token-store.ts:76-80). tests exist for keychain and plaintext modes (test/token-store.test.ts:32-63, test/storage-v4-keychain.test.ts:57-98).
however, there's no regression test coverage for windows: keychain storage on windows uses the windows credential manager (not system keychain like macOS), and the tests don't verify this platform-specific behavior. add a windows-specific test or at minimum document the edge case.
lib/secrets/token-store.ts (1)
84-94: ref derivation logic looks good
lib/secrets/token-store.ts: lines 84-94has deterministic normalization and fixed-length hash output, which is good for stable key references.test/schemas.test.ts (1)
241-262: solid v4 schema coverage
test/schemas.test.ts: lines 241-262andtest/schemas.test.ts: lines 312-322add clear positive/negative coverage for v4 parsing and discriminated union behavior.Also applies to: 312-322
test/storage-v4-keychain.test.ts (1)
8-25: retry helper is well-bounded
test/storage-v4-keychain.test.ts: lines 8-25uses finite retries and targeted retryable codes. this is a good deterministic teardown pattern on windows.lib/storage/migrations.ts (1)
129-160: migration mapping is clean
lib/storage/migrations.ts: lines 129-160cleanly separates token refs from account metadata and preserves v3 fields during v4 conversion.test/token-store.test.ts (1)
65-81: stable ref test is useful
test/token-store.test.ts: lines 65-81correctly checks case-insensitive email normalization for deterministic secret refs.lib/schemas.ts (1)
210-214: v4 inclusion in the discriminated union looks correct
lib/schemas.ts: lines 210-214correctly extends version-based parsing for v4.lib/storage.ts (1)
1149-1149: permission hardening here is good
lib/storage.ts: line 1149setting directory mode to0o700is the right baseline for sensitive local storage.
Implement remaining roadmap controls for enterprise readiness hardening. - add scheduled retention and recovery drill workflows with artifacts - add SBOM generation/verification and dependency attestation workflow - add compliance evidence bundle, audit forwarding, and SLO reporting scripts - expand runtime performance benchmarks and enforce full budget metric coverage - add incident drill template, SLO policy doc, and audit forwarding runbook - wire keychain assertion and SBOM checks into CI/release gates Co-authored-by: Codex <noreply@openai.com>
Additional Comments (5)
beyond windows safety: scanning + stat-ing every audit file on every single no regression test covers the case where add
the
the guard should also check that the stableSeed encodes something meaningfully unique:
when a keychain entry is missing, the account is silently skipped from for example: 3 v4 accounts, activeIndex=2. keychain entry for account 0 is gone (deleted externally). hydration produces 2 accounts at indices 0 and 1 in the new array, but track how many accounts were skipped before the active index and adjust: let skippedBeforeActive = 0;
for (let i = 0; i < data.accounts.length; i++) {
const rawAccount = data.accounts[i]!;
const secrets = await loadAccountSecrets({ ... });
if (!secrets || !secrets.refreshToken) {
if (i < data.activeIndex) skippedBeforeActive++;
continue;
}
hydratedAccounts.push({ ... });
}
const adjustedActiveIndex = Math.max(0, data.activeIndex - skippedBeforeActive);
return normalizeAccountStorage({
version: 3,
accounts: hydratedAccounts,
activeIndex: adjustedActiveIndex,
activeIndexByFamily: data.activeIndexByFamily,
});
this allowlist entry should be scoped to only the already-allowlisted paths by using a path-scoped rule allowlist inside a custom regexes = [
'''fake_refresh_token_[0-9]+''',
'''secret-(access|refresh)-token''',
'''top secret prompt''',
# narrow this: only suppress keys that are clearly synthetic fixture values
'''sk-fake[A-Za-z0-9]{16,}'''
] |
Co-authored-by: Codex <noreply@openai.com>
Additional Comments (3)
since
the test for stale log purging uses after the spy is restored and the next test calls add a reset to prevent this cross-test bleed:
consider writing a short "pending commit" journal entry (containing only the refs, not the tokens) to disk before starting keychain writes, then deleting it on success. on load, if this journal exists, it can be used as the cleanup manifest. this is the same WAL pattern already used for the main accounts file. |
Add targeted fixes for outstanding review threads: - fix markdown table rendering for CODEX_SECRET_STORAGE_MODE - add audit purge retry regression coverage for EBUSY lock contention - strengthen enterprise health-check fallback audit directory regression checks Co-authored-by: Codex <noreply@openai.com>
Additional Comments (3)
this requires
the blanket tighten the allowlist to known synthetic fixture patterns: alternatively, if broad test exemption is intentional, document the decision and require test credential values to use the
every code change must reference the regression test added for Windows filesystem race conditions. the current add a vitest test that triggers concurrent
|
The audit purge retry helper no longer calls Atomics.wait(), eliminating event-loop blocking during transient filesystem retry loops. Added regression coverage to assert Atomics.wait is not invoked while retrying EBUSY deletions.\n\nCo-authored-by: Codex <noreply@openai.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
Follow-up tracking: external blocker recorded in #44. Current failing context remains:
Actions already taken:
Next required step is external quota reset/increase, then rerun CodeRabbit. |
|
✅ Actions performedReview triggered.
|
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 27
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/codex-manager.ts (1)
4065-4150:⚠️ Potential issue | 🟠 Majoradd vitest regressions for the new audit wrapper path.
i don’t see coverage in the provided tests for
runWithAuditsuccess/non-zero/throwing behavior and metadata shape. this path now gates most cli commands and should have direct regression tests intest/....As per coding guidelines
lib/**: verify every change cites affected tests (vitest).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/codex-manager.ts` around lines 4065 - 4150, Add vitest regression tests that exercise the new audit wrapper path: write tests that invoke runWithAudit indirectly via runCodexMultiAuthCli (using commands like "auth login", "auth switch", and a command that returns non-zero) and directly mock/spy auditLog to assert three cases—successful exit (exitCode 0), non-zero exit (exitCode != 0), and thrown error—verifying the AuditOutcome and metadata shape (fields: command, exitCode or error) are logged correctly; use spies/mocks on auditLog and the called runners (runAuthLogin, runSwitch, runReport, etc.) to simulate each runner behavior and restore originals after each test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 47-74: The workflow currently only runs on ubuntu-latest which
leaves Windows filesystem/permission behavior untested; either add a windows
runner entry to the workflow matrix so jobs like "Seed enterprise health
fixture" and the "Enterprise health check" (which sets CODEX_MULTI_AUTH_DIR and
invokes npm run ops:health-check) are executed on windows, or explicitly
document/mark the CI as Linux-only in the ops runbooks and the workflow
description; update the matrix to include a windows-latest job or add a
comment/README note near the workflow to indicate linux-only support and any
Windows-specific caveats for file permissions (0o600/0o700) and health-check
scripts.
- Around line 61-67: The CI step "Seed enterprise health fixture" currently
writes openai-codex-accounts.json with "version":3; add a v4 fixture (or
replace) so the new v4 storage/keychain schema is exercised by
scripts/enterprise-health-check.js: create an additional file (e.g.,
openai-codex-accounts.v4.json) or write a v4 JSON alongside the existing seed
that includes the v4-specific keys used by the keychain-backed storage, and
update the workflow step to seed that file (or run a second seed) so the
enterprise health check runs against v4 schema paths.
In @.github/workflows/recovery-drill.yml:
- Around line 31-35: Replace the inline vitest invocation in the GitHub Actions
step (the multi-line run block that creates .tmp and calls npm run test with
explicit test files) with the packaged npm script ops:recovery-drill; update the
run command to call npm run ops:recovery-drill and forward any needed reporter
flags (e.g., --reporter and --outputFile) so the workflow uses the single source
of truth (npm script) instead of duplicating the test target list.
In @.github/workflows/retention-maintenance.yml:
- Line 16: The job-level env uses runner.temp which is unavailable at job scope;
move setting CODEX_MULTI_AUTH_DIR out of the job-level env and instead set it in
the first workflow step by writing CODEX_MULTI_AUTH_DIR=${{ runner.temp
}}/codex-retention-root to $GITHUB_ENV (e.g., in the first step's run block echo
"CODEX_MULTI_AUTH_DIR=${{ runner.temp }}/codex-retention-root" >> $GITHUB_ENV)
so subsequent steps can access the variable; update/remove the job-level
CODEX_MULTI_AUTH_DIR entry accordingly.
In @.gitleaks.toml:
- Around line 8-17: The current .gitleaks.toml allowlist block uses OR semantics
(path OR regex) which lets secrets under test/ escape; change to using per-entry
[[allowlists]] with condition = "AND" so both path and regex must match (create
entries that pair path '^test/security/fixtures[\\/]' with the appropriate
regexes), narrow the path patterns from '^test[\\/]'' to
'^test/security/fixtures[\\/]' to limit scope, pin the gitleaks action to
v8.25.0+ (or update CI test expectations) so AND is supported, and add/update a
regression test in test/security/secret-scan-regression.test.sh to cover
Windows-style paths using [\\/] to ensure cross-platform behavior is correct.
In `@docs/configuration.md`:
- Line 69: Add a migration subsection to upgrade.md that documents migrating
existing deployments from plaintext to keychain mode: explain the
CODEX_SECRET_STORAGE_MODE options (`plaintext`, `keychain`, `auto`), provide
step-by-step migration commands/configuration changes to switch the env var and
safely migrate secrets (use the plaintext→keychain flow exercised in
test/storage-v4-keychain.test.ts as the authoritative sequence), note any
required downtime or backup steps, and include Windows-specific guidance about
filesystem locks and retry/backoff behavior (EBUSY, EPERM) referencing the error
handling described in releases/v0.1.2.md so operators know to retry or stop
interfering processes during migration.
In `@docs/operations/incident-response.md`:
- Around line 81-85: The Drill Cadence docs are missing the new npm script;
update docs/operations/incident-response.md (Drill Cadence section) to include a
step that explicitly instructs operators to run the new CLI script by name: npm
run ops:recovery-drill, and mention that this script is part of the drill
workflow (referencing incident-drill-template.md if needed) so README/docs
reflect the actual CLI surface and operators know to run the recovery drill
command during monthly tabletop drills.
In `@lib/audit.ts`:
- Around line 124-139: The purge throttle timestamp (lastPurgeAttemptMs) is
being updated before the directory read, so a transient read failure (e.g.,
Windows EBUSY) blocks further purges for PURGE_INTERVAL_MS; move the assignment
of lastPurgeAttemptMs to after the successful withRetryableAuditFsOperation(()
=> readdirSync(auditConfig.logDir)) call so it only advances when the read
succeeds, keep the existing retentionDays/cutoffMs calculation logic intact, and
ensure the change touches the symbols lastPurgeAttemptMs,
withRetryableAuditFsOperation, readdirSync, PURGE_INTERVAL_MS in lib/audit.ts;
update/add vitest coverage to simulate EBUSY/429 read failures and verify
throttling is not applied on failed reads per the lib/** concurrency/Windows IO
guidelines.
- Line 207: The write to audit log should be wrapped with the existing retry
helper and the chmod should be guarded on Windows: replace the direct
writeFileSync(logPath, line, ...) with a call inside
withRetryableAuditFsOperation(() => writeFileSync(logPath, line, { encoding:
"utf8", flag: "a", mode: 0o600 })), and after successful write call
fs.chmodSync(logPath, 0o600) only if process.platform !== "win32"; also add a
Vitest in test/audit.test.ts that calls auditLog and asserts the file mode is
0o600 (or skips the check on win32) to cover the regression.
In `@lib/codex-manager.ts`:
- Around line 4082-4085: The audit call is currently passing String(error) into
metadata which may leak tokens or sensitive data; replace that with a sanitized
error string instead of raw exception text: create or use a sanitizer (e.g.,
sanitizeError) and pass a safe summary such as sanitizedError =
sanitizeError(error) || error?.message?.slice(0,200) || "error"; then update the
auditLog call (auditLog(action, "cli-user", resource, AuditOutcome.FAILURE, {
command, error: sanitizedError })) so only non-sensitive, truncated error
summaries or generic error codes are stored rather than the full
exception/stack.
In `@lib/secrets/token-store.ts`:
- Around line 202-205: The current cleanup short-circuits when
deleteSecretRefWithRetry throws for refs.refreshTokenRef, skipping the access
token deletion; change the logic in token-store (function that performs these
deletions where refs.refreshTokenRef and refs.accessTokenRef are used) so both
deleteSecretRefWithRetry calls are invoked regardless of the other's failure
(e.g., run sequentially with independent try/catch blocks and aggregate/log
errors without leaking token/email values), and add a deterministic vitest
regression in test/token-store.test.ts that stubs/fakes deleteSecretRefWithRetry
to throw on the refreshTokenRef call and asserts that deleteSecretRefWithRetry
was still called for accessTokenRef (and that errors are handled/recorded as
expected). Ensure the test targets the same function/flow that uses
refs.refreshTokenRef and refs.accessTokenRef and cite the new test in the change
description.
In `@lib/storage.ts`:
- Around line 914-921: The activeIndexByFamily map is not adjusted when accounts
are skipped during v4 hydration; update the object before calling
normalizeAccountStorage by remapping each family index
(data.activeIndexByFamily) with the same offset used for activeIndex
(skippedBeforeActiveIndex): for every (family, idx) if idx is a number compute
newIdx = Math.max(0, idx - skippedBeforeActiveIndex) and also clamp newIdx to
hydratedAccounts.length - 1, preserving non-numeric/undefined values as-is, then
pass this remapped object to normalizeAccountStorage alongside
adjustedActiveIndex; apply this transform where activeIndexByFamily is currently
forwarded (near adjustedActiveIndex and normalizeAccountStorage).
In `@scripts/audit-log-forwarder.js`:
- Around line 72-93: The resume logic can drop events when a rotating filename
(e.g., "audit.log" -> "audit.1.log") sorts before the checkpoint filename:
update collectBatch/ fileComesBefore logic so checkpoints are not compared
purely by lexicographic filename. Specifically, modify
fileComesBefore/collectBatch to detect rotated aliases (e.g., match base name
patterns like "audit.log" vs "audit.1.log") or resolve stable identities before
ordering (use inode/device or canonical file identifier when available) and only
skip files when their stable identity is earlier than checkpoint.file; also
ensure the checkpoint comparison inside collectBatch (checkpoint.file === file
&& lineNumber <= checkpoint.line) remains based on the same stable identity.
This will prevent skipping unforwarded tail lines after rotation.
- Around line 125-133: The POST in sendBatch currently does a blocking fetch
without timeout or retries; wrap the fetch call in an AbortController-based
timeout and implement retry with exponential backoff + jitter for transient
429/5xx and network errors (retry limit e.g., 3) before throwing; ensure
failures/timeouts from sendBatch propagate so the checkpoint handling
(checkpointMode flag logic) advances or fails deterministically rather than
hanging. Add unit/integration tests: one that simulates a hanging endpoint to
assert sendBatch times out, one for the Windows edge case around the
checkpointMode flag behavior at the checkpoint logic, and a concurrency test
that launches multiple instances to exercise checkpoint file race handling and
confirm recovery/locking behavior. Use the sendBatch function name and the
checkpointMode/checkpoint file handling code paths as the touch points.
In `@scripts/compliance-evidence-bundle.js`:
- Around line 51-59: The runNpm function uses execFileSync for both Windows
(cmd.exe) and non-Windows paths but doesn't pass a maxBuffer option, causing
crashes on verbose npm output; update both execFileSync calls in runNpm to
include a sufficiently large maxBuffer (e.g., 10*1024*1024 or 30*1024*1024) in
the options passed (merge with the existing options object) so stdout/stderr
won't overflow, and add regression tests invoking verbose npm commands (e.g.,
long test/build/lint output) against scripts/compliance-evidence-bundle.js to
ensure the buffer bump prevents CI failures.
In `@scripts/enterprise-health-check.js`:
- Around line 159-161: The newestMtimeMs routine has a TOCTOU race: check of
existsSync(dir) can pass but readdir(dir) can throw if the directory is
removed/renamed concurrently; wrap the readdir call in a try/catch inside
newestMtimeMs (or the function that contains the shown snippet), treat errors
with code ENOENT or ENOTDIR as “no audit data” (return null) rather than letting
the exception bubble, and only rethrow unexpected errors; also add a regression
test in test/enterprise-health-check.test.ts (near the existing health-check
test around line ~114) that simulates audit-dir churn and asserts newestMtimeMs
(or the health-check behavior) returns null / marks no-audit-data instead of
throwing.
In `@scripts/generate-sbom.js`:
- Around line 14-23: The execFileSync calls in generate-sbom.js (the branches
that use process.execPath + npmExecPath and the fallback execFileSync("npm",
...)) must include a larger stdout/stderr buffer to avoid
ERR_CHILD_PROCESS_STDIO_MAXBUFFER: add maxBuffer: 20 * 1024 * 1024 to the
options object passed to both execFileSync calls (the ones called with
npmExecPath and with "npm"), and change the fallback logic to only call
execFileSync("npm", ...) when npmExecPath is falsy (i.e., prefer using the
resolved npmExecPath variable); update references to npmExecPath, sbomArgs and
the two execFileSync invocations accordingly. Also add a regression test that
runs scripts/generate-sbom.js against a realistic dependency graph (or mocks a
large stdout from the child process) to assert the script completes without
hitting the maxBuffer error.
In `@scripts/keychain-assert.js`:
- Around line 26-30: The execFileSync call using literal "npm" can fail on
Windows; update the fallback used in the block that calls execFileSync("npm",
args, { cwd: process.cwd(), stdio: "inherit", env }) to choose the correct
executable: prefer process.env.npm_execpath when present, otherwise use a
platform-aware name (e.g., const npmBin = process.platform === "win32" ?
"npm.cmd" : "npm") and call execFileSync(npmBin, args, { cwd: process.cwd(),
stdio: "inherit", env }). Keep the same args and env variables and ensure you
reference the existing variables (args, env) and the execFileSync invocation so
only the executable string is changed.
In `@scripts/performance-budget-check.js`:
- Around line 27-29: Duplicate helper isRecord in
scripts/performance-budget-check.js currently mirrors lib/utils.ts:10-12; for
now you can leave the local function to keep the script standalone, but if this
script is later incorporated into the build graph, remove the local isRecord and
import the canonical implementation (e.g., import { isRecord } from 'lib/utils')
instead; locate the local function named isRecord in performance-budget-check.js
and either retain it as-is for standalone execution or replace it with an import
and delete the duplicate implementation when you move the script into the
bundled codebase.
In `@scripts/retention-cleanup.js`:
- Around line 99-110: The script currently builds payload with failedFiles and
status but always exits 0; change the end of the routine that constructs payload
(the block that creates payload, including properties command, root,
retentionDays, cutoffIso, deletedFiles, failedFiles, failures, status) so that
if failed.length > 0 (i.e., payload.failedFiles > 0) the process exits with a
non-zero code (e.g., process.exit(1)) or throw an error; keep the existing
"partial" status but ensure the failure causes a failing exit code, and add a
unit test in test/retention-cleanup.test.ts that simulates a deletion failure
and asserts the script exits non‑zero when failedFiles > 0.
- Around line 50-52: The current check-then-use in collectExpiredFiles around
existsSync(rootPath) + readdir(rootPath, { withFileTypes: true }) has a race
that can throw if the directory is removed concurrently; wrap the readdir call
in a try/catch inside collectExpiredFiles (or the surrounding async block) and
treat ENOENT (and optionally EPERM on Windows) as a transient condition by
skipping/returning gracefully instead of letting the exception abort the run;
keep the existsSync check if desired but ensure the catch handles these errors
and only rethrows other unexpected errors.
In `@scripts/slo-budget-report.js`:
- Around line 106-108: staleWalFindings is only counting findings with code
"stale-wal" but tests/assertions expect "stale-audit-log" as the stale audit
regression code; update the filter on health.findings used to compute
staleWalFindings to include both codes (check finding.code === "stale-wal" ||
finding.code === "stale-audit-log") or normalize/rename the code to match the
test, ensuring the variable staleWalFindings, the health.findings array and the
finding.code checks are updated consistently so the SLO evaluation counts the
intended stale-audit-log findings.
In `@test/audit.test.ts`:
- Around line 237-242: The test mocks Date.now with an out-of-range value
(Number.MAX_SAFE_INTEGER - 1_000) which can produce invalid timestamps for
utimesSync; replace that with a deterministic in-range epoch (e.g. a constant
like 1600000000000 or Date.parse("2020-09-13T12:26:40Z")) and use vi.spyOn(Date,
"now").mockReturnValue(IN_RANGE_EPOCH) so staleMs = Date.now() - 3*24*60*60*1000
and utimesSync(staleLogPath, staleDate, staleDate) receive valid dates; update
the Date.now spy in the block around staleLogPath/writeFileSync/utimesSync
accordingly and keep the deterministic vitest spy usage.
In `@test/schemas.test.ts`:
- Around line 320-330: The test suite currently exercises
AnyAccountStorageSchema for a v4 shape but does not add a regression test for
the helper safeParseAccountStorage; add a deterministic Vitest regression that
calls safeParseAccountStorage with an explicit v4 payload (version: 4, accounts
with refreshTokenRef/addedAt/lastUsed, activeIndex) and assert the parse
succeeds and result.version === 4 so the helper wiring cannot regress
independently of the union schema (reference symbols: AnyAccountStorageSchema
and safeParseAccountStorage; place next to the existing schemas tests).
In `@test/security/secret-scan-regression.test.sh`:
- Around line 58-69: The Docker fallback in the secret-scan test pins the
gitleaks image to zricethezav/gitleaks:v8.24.2; add a matching explicit version
note for the native binary usage referenced earlier (the native gitleaks path
used around the native binary invocation) so both paths declare the expected
gitleaks version; update the test script (e.g., near the native binary variable
or the native invocation) to include a comment or variable like
EXPECTED_GITLEAKS_VERSION="v8.24.2" and reference it in the native path/usage to
ensure parity with the docker image reference used in the docker run command.
- Around line 96-98: The pass-case invocation of run_gitleaks_detect (using
PASS_CASE_DIR and PASS_REPORT) prints "secret-scan-regression: passed"
unconditionally; capture and check its exit status like the fail-case does so
failures are detected and logged. Modify the block that calls
run_gitleaks_detect to save its exit code ($?), test it (e.g., if [ $rc -ne 0 ]
...), and print a clear failure message and non-zero exit/return on error,
otherwise print the existing success message; reference the run_gitleaks_detect
call and the PASS_CASE_DIR/PASS_REPORT variables when implementing the check.
---
Outside diff comments:
In `@lib/codex-manager.ts`:
- Around line 4065-4150: Add vitest regression tests that exercise the new audit
wrapper path: write tests that invoke runWithAudit indirectly via
runCodexMultiAuthCli (using commands like "auth login", "auth switch", and a
command that returns non-zero) and directly mock/spy auditLog to assert three
cases—successful exit (exitCode 0), non-zero exit (exitCode != 0), and thrown
error—verifying the AuditOutcome and metadata shape (fields: command, exitCode
or error) are logged correctly; use spies/mocks on auditLog and the called
runners (runAuthLogin, runSwitch, runReport, etc.) to simulate each runner
behavior and restore originals after each test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 52e0294c-22c1-458a-8b40-ed47482b82ce
📒 Files selected for processing (46)
.github/workflows/ci.yml.github/workflows/recovery-drill.yml.github/workflows/release-provenance.yml.github/workflows/retention-maintenance.yml.github/workflows/sbom-attestation.yml.github/workflows/secret-scan.yml.gitleaks.tomlREADME.mdSECURITY.mdconfig/performance-budgets.jsonconfig/performance-budgets.schema.jsonconfig/slo-policy.jsondocs/README.mddocs/configuration.mddocs/operations/audit-forwarding.mddocs/operations/incident-drill-template.mddocs/operations/incident-response.mddocs/operations/release-runbook.mddocs/operations/slo-error-budget.mddocs/privacy.mddocs/reference/settings.mdlib/audit.tslib/codex-manager.tslib/schemas.tslib/secrets/token-store.tslib/storage.tslib/unified-settings.tspackage.jsonscripts/audit-log-forwarder.jsscripts/benchmark-runtime-path.mjsscripts/compliance-evidence-bundle.jsscripts/enterprise-health-check.jsscripts/generate-sbom.jsscripts/keychain-assert.jsscripts/performance-budget-check.jsscripts/retention-cleanup.jsscripts/slo-budget-report.jsscripts/verify-sbom.jstest/audit-retry.test.tstest/audit.test.tstest/enterprise-health-check.test.tstest/schemas.test.tstest/security/secret-scan-regression.test.shtest/storage-v4-keychain.test.tstest/token-store.test.tstest/unified-settings.test.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
docs/**
⚙️ CodeRabbit configuration file
keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.
Files:
docs/operations/slo-error-budget.mddocs/README.mddocs/operations/release-runbook.mddocs/operations/incident-drill-template.mddocs/configuration.mddocs/privacy.mddocs/operations/incident-response.mddocs/operations/audit-forwarding.mddocs/reference/settings.md
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/unified-settings.test.tstest/audit-retry.test.tstest/storage-v4-keychain.test.tstest/enterprise-health-check.test.tstest/token-store.test.tstest/security/secret-scan-regression.test.shtest/schemas.test.tstest/audit.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/unified-settings.tslib/codex-manager.tslib/schemas.tslib/audit.tslib/secrets/token-store.tslib/storage.ts
🧬 Code graph analysis (13)
scripts/benchmark-runtime-path.mjs (1)
lib/storage.ts (1)
normalizeAccountStorage(776-865)
test/unified-settings.test.ts (1)
lib/unified-settings.ts (3)
getUnifiedSettingsPath(254-256)saveUnifiedPluginConfigSync(289-293)loadUnifiedPluginConfigSync(267-275)
lib/unified-settings.ts (1)
lib/runtime-paths.ts (1)
getCodexMultiAuthDir(166-199)
test/audit-retry.test.ts (2)
lib/logger.ts (1)
error(389-393)scripts/audit-dev-allowlist.js (1)
audit(150-161)
lib/codex-manager.ts (1)
lib/audit.ts (1)
auditLog(180-211)
test/storage-v4-keychain.test.ts (1)
lib/storage.ts (4)
saveAccounts(1310-1314)getStoragePath(470-475)loadAccounts(872-874)clearAccounts(1389-1419)
scripts/generate-sbom.js (2)
scripts/audit-dev-allowlist.js (1)
process(139-147)lib/logger.ts (1)
error(389-393)
lib/audit.ts (2)
lib/runtime-paths.ts (1)
getCodexLogDir(226-228)scripts/retention-cleanup.js (3)
attempt(34-34)retentionDays(71-71)cutoffMs(73-73)
scripts/performance-budget-check.js (1)
lib/utils.ts (1)
isRecord(11-13)
scripts/compliance-evidence-bundle.js (3)
scripts/audit-log-forwarder.js (3)
prefix(13-13)hit(14-14)dryRun(137-137)lib/logger.ts (1)
error(389-393)scripts/performance-budget-check.js (1)
message(35-35)
lib/secrets/token-store.ts (2)
lib/logger.ts (2)
createLogger(366-424)error(389-393)lib/utils.ts (1)
sleep(65-67)
test/schemas.test.ts (1)
lib/schemas.ts (2)
AccountStorageV4Schema(165-170)AnyAccountStorageSchema(210-214)
test/audit.test.ts (1)
lib/audit.ts (2)
configureAudit(65-67)auditLog(180-211)
🪛 actionlint (1.7.11)
.github/workflows/retention-maintenance.yml
[error] 16-16: context "runner" is not allowed here. available contexts are "github", "inputs", "matrix", "needs", "secrets", "strategy", "vars". see https://docs.github.com/en/actions/learn-github-actions/contexts#context-availability for more details
(expression)
🪛 LanguageTool
docs/operations/incident-response.md
[style] ~39-~39: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...rst, then codex auth doctor --fix. 4. If auth failures persist, rotate account v...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~40-~40: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...dex>and re-runcodex auth check`. 5. If all accounts are exhausted/disabled, es...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🔇 Additional comments (18)
scripts/benchmark-runtime-path.mjs (1)
109-134: good benchmark coverage for storage normalization and account-id resolution.this adds realistic storage pressure cases and a correctness guard (
version === 3) aroundnormalizeAccountStorageinscripts/benchmark-runtime-path.mjs:177-180. this is a solid addition for perf regression tracking.Also applies to: 170-180
test/unified-settings.test.ts (1)
6-18: strong regression coverage for windows-safe secure writes and overwrite hardening.
test/unified-settings.test.ts:6-18andtest/unified-settings.test.ts:103-115close the windows/temp-leak and repeated-write permission regressions with deterministic vitest assertions.Also applies to: 68-69, 100-115
test/audit-retry.test.ts (1)
20-77: good concurrency regression test for retry behavior.
test/audit-retry.test.ts:20-77deterministically reproducesebusyretries and verifies the non-blocking path (atomics.waitnot used). this is the right regression guard for the incident fix.scripts/verify-sbom.js (1)
12-47: sbom validation flow is clean and ci-friendly.
scripts/verify-sbom.js:12-47has clear structural checks and machine-readable output for pipeline gating.package.json (1)
62-74: script surface expansion looks consistent with the enterprise ops goals.the new
ops:*,sbom:*, and perf scripts inpackage.json:62-74are coherent and the optionalkeytarwiring inpackage.json:134-136fits the keychain storage path.Also applies to: 134-136
.github/workflows/sbom-attestation.yml (1)
10-48: workflow wiring looks good.permissions and push-only attestation gate are correctly scoped for sbom generation + provenance attestation.
.github/workflows/release-provenance.yml (1)
30-59: release gate sequence and publish auth are in place.the quality gates, sbom checks, and
NODE_AUTH_TOKENwiring are consistent with release provenance expectations.lib/schemas.ts (1)
143-173: v4 schema integration looks solid.
lib/schemas.ts:148-149correctly enforces non-empty token refs, andlib/schemas.ts:210-214cleanly extends version discrimination to v4 for migration parsing.Also applies to: 208-214
.github/workflows/ci.yml (1)
58-59: keychain assertion step placement looks correct.runs after build, uses dedicated npm script. the step will fail fast if keychain storage contract is broken, which is the right behavior for enterprise readiness gating.
scripts/keychain-assert.js (1)
33-37: error handling is reasonable.the try-catch propagates the child process exit status correctly.
execFileSyncthrows an error withstatusproperty on non-zero exit, soerror?.statuswill capture test failures.scripts/performance-budget-check.js (1)
40-96: validation and error handling look solid.past concerns about missing existence checks (lines 42-47) and defensive json parsing (lines 48-53) are addressed. the violation detection loop and missing-benchmark flagging logic at lines 57-83 is correct.
test/security/secret-scan-regression.test.sh (1)
81-94: node assertion block is well-structured.the inline node script at lines 81-94 validates three conditions: non-empty findings, presence of
src/leak.txt, and absence of allowlistedtest/fixture.txt. this provides strong regression coverage for scanner policy drift. as per coding guidelines for test/**, the assertions are explicit and not skipped.docs/operations/slo-error-budget.md (1)
40-50: slo documentation references are accurate; scripts verified.both
npm run ops:slo-reportin package.json and the underlyingscripts/slo-budget-report.jsfile exist and match the documented workflows atdocs/operations/slo-error-budget.md:40-50. the two command variants shown—npm script for report generation and direct invocation with --enforce for gate enforcement—are consistent with actual implementation.config/performance-budgets.json (1)
1-9: approve - units are well documented.the seven budgets use consistent naming. unit documentation is comprehensive:
config/performance-budgets.schema.jsonexplicitly states "all values are milliseconds" in the root description and each property, whilescripts/performance-budget-check.js:7reiterates the same. thresholds look reasonable.note: verify regression tests exist for the performance-budget-check script itself to prevent silent failures during ci runs.
.github/workflows/secret-scan.yml (1)
1-33: action shas verified; regression test comprehensive.both actions pinned to valid, recent commits: actions/checkout v4.3.1 (nov 2025) and gitleaks-action v2.3.9 (apr 2025). permissions appropriately minimal.
regression test at test/security/secret-scan-regression.test.sh:1-98 is solid: creates fail-case with openai api key leak pattern and pass-case with allowlisted fixture, verifies detection works and allowlist rules hold via cli and docker fallback. test uses node assertions on json output.
no windows edge cases apply here. no concurrency risks: single job, sequential steps. workflow structure is correct.
test/enterprise-health-check.test.ts (1)
56-155: good deterministic coverage for fallback and stale-audit paths.
test/enterprise-health-check.test.ts:56(Line 56),test/enterprise-health-check.test.ts:75(Line 75), andtest/enterprise-health-check.test.ts:114(Line 114) give strong regression coverage for missing artifacts, fallback root resolution, and stale audit detection.test/token-store.test.ts (1)
125-241: keychain race, rollback, and cleanup coverage is in place.
test/token-store.test.ts:125(Line 125),test/token-store.test.ts:177(Line 177), andtest/token-store.test.ts:211(Line 211) add deterministic regression coverage for concurrent writes, access-write rollback, and secret deletion cleanup.test/storage-v4-keychain.test.ts (1)
112-323: strong regression coverage for windows io and keychain rollback paths.
test/storage-v4-keychain.test.ts:112(Line 112),test/storage-v4-keychain.test.ts:167(Line 167), andtest/storage-v4-keychain.test.ts:206(Line 206) cover torn-write risk, windows-style rename retry, and rollback cleanup in deterministic vitest scenarios.
Address remaining PR #43 review feedback across workflows, docs, runtime hardening, and script reliability. Adds deterministic regressions for rotation/checkpoint edge cases, retention failure exits, storage/token cleanup behavior, unified settings permissions, and schema/parser coverage. Co-authored-by: Codex <noreply@openai.com>
There was a problem hiding this comment.
Actionable comments posted: 24
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/ci.yml (1)
20-27: 🧹 Nitpick | 🔵 Trivialconsider pinning actions to commit shas for consistency with secret-scan.yml.
.github/workflows/secret-scan.yml:19,24pins actions to full commit shas but this workflow uses mutable version tags (actions/checkout@v4,actions/setup-node@v4). for supply chain consistency, consider pinning these as well.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci.yml around lines 20 - 27, The workflow uses mutable tags actions/checkout@v4 and actions/setup-node@v4; replace those with the corresponding full commit SHAs (the same pattern used in secret-scan.yml) to pin versions for supply-chain consistency: update the uses lines for actions/checkout and actions/setup-node to their full commit SHA references instead of `@v4` and keep the existing inputs (node-version, cache) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 62-65: Extract the inline one-liner into a dedicated Node script
(e.g., scripts/seed-health-fixture.js) that uses fs.mkdirSync, path.join and
fs.writeFileSync to create the .tmp/health-fixture directory, logs subdir, and
the three JSON files; then update the workflow step (formerly running node -e
"...") to run node scripts/seed-health-fixture.js. In the new script, build
paths with path.join and use recursive mkdir and explicit JSON.stringify calls
(matching the existing object shapes and filenames openai-codex-accounts.json,
settings.json, and logs/audit.log) so the logic is maintainable and
Windows-safe.
In @.github/workflows/recovery-drill.yml:
- Around line 36-37: The health-check step currently redirects only stdout using
the command "node scripts/enterprise-health-check.js >
.tmp/recovery-drill-health.json" so stderr (warnings/errors) are lost; update
the step to capture both streams by redirecting stderr into stdout (e.g., append
"2>&1" to the command) or use a tool like tee to persist combined output,
ensuring the "Run health check snapshot" command writes both stdout and stderr
into .tmp/recovery-drill-health.json for later analysis.
In `@docs/operations/audit-forwarding.md`:
- Around line 45-60: The delivery contract needs an added section describing
retry behavior on delivery failures: inspect scripts/audit-log-forwarder.js
(look for the functions that perform sending/forwarding and checkpointing) and
document whether the forwarder retries on network/SIEM errors, the max retry
count, backoff strategy (fixed/exponential and intervals), behavior on permanent
failures (when it stops retrying and requires manual re-run), and how
checkpoints (`file`, `line`, `updatedAt`) are updated/rolled back during
retries; add this as a short "Failure & Retry Behavior" subsection that
references the actual behavior and any configurable parameters found in that
script.
- Around line 17-19: Rephrase the `CODEX_SIEM_API_KEY` bullet to remove the
circular wording: change "optional bearer token, if required by SIEM" to a clear
condition such as "bearer token required when the SIEM endpoint enforces
authentication (provide API key here)" so it states when `CODEX_SIEM_API_KEY`
must be set and when it can be omitted.
In `@docs/operations/incident-drill-template.md`:
- Around line 61-66: Update the Exit Criteria checklist item that reads "health
check returned `pass`" to explicitly reference the JSON field and expected
value—e.g., change it to something like `health check returned status: "pass"
(check JSON "status" field)` so operators know to inspect the health check JSON
`status` field; modify the string in the document where the checklist item is
defined (the line containing "health check returned `pass`") to include this
clearer text and/or a short reference to the JSON structure.
In `@docs/upgrade.md`:
- Around line 75-80: Move the keychain availability validation before
instructing operators to set CODEX_SECRET_STORAGE_MODE; specifically, place the
`npm run ops:keychain-assert` step prior to the line that tells users to set
CODEX_SECRET_STORAGE_MODE=keychain (or auto), or alternatively add an explicit
note next to CODEX_SECRET_STORAGE_MODE advising to run `npm run
ops:keychain-assert` first to confirm keychain availability; reference the env
var name CODEX_SECRET_STORAGE_MODE and the npm script ops:keychain-assert when
updating the docs so readers clearly validate before choosing "auto".
In `@lib/audit.ts`:
- Around line 118-131: withRetryableAuditFsOperation currently retries up to 5
times with no delay; update it to perform exponential backoff between attempts
so transient EBUSY/contention errors can clear. Inside
withRetryableAuditFsOperation, after catching a retryable error (use
isRetryableAuditFsError) and before the next loop iteration, add a delay that
grows with attempt (e.g., baseSleepMs * 2**attempt) or mirror the exponential
backoff used in unified-settings.ts, and ensure the final
non-retryable/last-attempt behavior still throws the original error (or
lastError) as before.
In `@lib/codex-manager.ts`:
- Around line 4065-4072: sanitizeAuditError currently redacts token patterns but
can leak email addresses; update sanitizeAuditError to also mask emails by
applying the existing maskEmail(email: string) utility (used elsewhere) or an
equivalent regex replacement before truncation, e.g., call maskEmail on the raw
string or run a regex like /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g
to replace matches with a masked value, keep the existing token redactions and
slice(0, 200) behavior, and ensure references to sanitizeAuditError and
maskEmail are used to locate the change.
In `@lib/secrets/token-store.ts`:
- Around line 69-71: Remove the duplicated local sleep function in
token-store.ts and instead import and use the shared sleep utility from
lib/utils (the same helper currently defined at utils.ts lines ~64-66);
specifically delete the local function named sleep and add an import for the
sleep function, then update any local references to call the imported sleep to
eliminate duplication (refer to the sleep symbol in token-store.ts and the sleep
export in utils).
In `@README.md`:
- Line 189: The markdown table row for CODEX_SECRET_STORAGE_MODE contains
unescaped pipe characters in the description that break the table; update the
README entry for CODEX_SECRET_STORAGE_MODE to escape the pipes or wrap the
entire value in code spans/backticks (e.g., escape "|" as "\|" or use
"`keychain|plaintext|auto`") so the table renders as the correct number of
columns and the description no longer splits into extra cells.
In `@scripts/audit-log-forwarder.js`:
- Around line 231-250: withCheckpointLock currently acquires a lock via
open(lockPath, "wx") and retries, but lacks Windows EPERM handling, stale-lock
detection, and an overall timeout; update withCheckpointLock to (1) write the
current process pid into the lock file after successful open so we can validate
authorship, (2) on failed open, if error.code === "EPERM" treat it like
contention on Windows (same backoff path), (3) when seeing an existing lock
file, stat it and check its age (mtime) against a configurable STALE_LOCK_MS and
if older then attempt to read pid and validate process existence (where
possible) and remove the stale lock before retrying, and (4) enforce a total
timeout/attempt cap (or compute total waited time) to throw a clear timeout
error instead of waiting indefinitely; reference the withCheckpointLock function
and CHECKPOINT_LOCK_MAX_ATTEMPTS/ sleep/backoff logic when implementing these
changes.
In `@scripts/enterprise-health-check.js`:
- Around line 186-208: Remove the redundant existsSync call in function
checkSecureMode and rely solely on await stat(path); update the catch block to
explicitly handle ENOENT by skipping/returning when error.code === "ENOENT" and
only push a "stat-failed" finding for other errors; keep the rest of the
permission-check logic (details.mode & 0o777 comparison and pushing the
"insecure-file-permissions" finding) unchanged so that TOCTOU is avoided and
deleted/missing files are treated as non-errors.
In `@scripts/slo-budget-report.js`:
- Around line 55-76: The runHealthCheck function calls execFileSync without a
timeout and builds the error message only from stdout/stderr, which can hang CI
and produce empty findings for spawn errors; change runHealthCheck to invoke
execFileSync with a timeout and appropriate kill signal/options (e.g., timeout
and killSignal/maxBuffer) when running "scripts/enterprise-health-check.js" and,
in the catch block, fall back to error.message (or error.toString()) when
error.stdout/error.stderr are empty so the finding contains actionable text;
also add unit tests for runHealthCheck to cover a stalled child (timeout), spawn
ENOENT/permission errors, and Windows behavior to prevent regressions.
In `@test/audit-retry.test.ts`:
- Around line 40-48: The test's fs mock in vi.doMock is missing chmodSync
causing lib/audit.ts (which calls chmodSync(logPath, 0o600)) to hit the real fs
or throw on POSIX; update the mock returned by vi.doMock in
test/audit-retry.test.ts to include chmodSync (a no-op or stub) alongside
writeFileSync, mkdirSync, existsSync, statSync, renameSync, readdirSync, and
unlinkSync so the chmodSync call in the audit code is handled by the mock during
tests.
In `@test/compliance-evidence-bundle.test.ts`:
- Around line 48-49: The test uses a hardcoded "node" call in the fake npm shims
(variables npmShellSource and npmCmdSource) which can pick up different node
executables from PATH and cause flakes; change both shims to invoke the exact
node binary via process.execPath (use process.execPath.replace(/\\/g, "/") for
the POSIX shim and process.execPath for the Windows shim) so the launched
fake-npm runs the same Node used for the test runner, keeping tests
deterministic.
- Around line 71-88: Add a dedicated Windows regression test in
test/compliance-evidence-bundle.test.ts that explicitly exercises the Windows
cmd.exe npm invocation path used by runNpm(): create a fake npm shim named
npm.cmd (or a cmd.exe-wrapper) in the fake bin (reusing createFakeNpmBin or a
small variant) that emits very verbose output, set PATH to point to that bin and
ensure process.platform is treated as 'win32' for the test (or use the test
harness to simulate Windows behavior), then spawn the scriptPath as in the
existing test and assert the run completes without a maxBuffer-related failure;
ensure you restore any mutated globals/env (process.platform override and PATH)
after the test.
In `@test/retention-cleanup.test.ts`:
- Around line 11-26: The removeWithRetry function is duplicated; extract the
logic into a single shared test utility module and export a function named
removeWithRetry, then replace the duplicated implementations in the tests with
an import of that exported removeWithRetry; ensure the exported function
preserves the same signature (async function removeWithRetry(targetPath:
string): Promise<void>) and behavior (same retryableCodes, ENOENT handling,
exponential backoff) and update any test imports to use the new shared utility.
- Around line 80-86: The test declares `let result;` with an implicit any —
explicitly type it to match the return of `runRetentionCleanup` (the `spawnSync`
result) to satisfy `noImplicitAnyLet`: e.g. import and use
`SpawnSyncReturns<string>` from 'child_process' and change the declaration to
`let result: SpawnSyncReturns<string>;` (or use `let result: ReturnType<typeof
runRetentionCleanup>;`) so the variable has the correct typed return shape.
- Around line 65-69: Replace the inline Windows skip in the test block named
"exits non-zero when deletions fail" by using Vitest's conditional skip helper:
remove the dummy assertion and early return inside the it(...) and instead
change the test declaration to use it.skipIf(process.platform === "win32", ...)
so the test is skipped on Windows; update the test identifier (the string "exits
non-zero when deletions fail") and preserve the existing async callback body and
assertions unchanged.
In `@test/schemas.test.ts`:
- Around line 241-270: Add a unit test in the AccountStorageV4Schema describe
block that verifies v4 accepts the optional activeIndexByFamily field: create a
storage object based on validStorage, add an activeIndexByFamily object (e.g., {
codex: 0, legacy: 0 }), call AccountStorageV4Schema.safeParse(storage) and
assert result.success is true; place the test near the other V4 tests so it
exercises the optional activeIndexByFamily handling.
In `@test/security/secret-scan-regression.test.sh`:
- Around line 27-35: Replace the realistic-looking OPENAI_API_KEY values in the
test fixtures (files written under "${FAIL_CASE_DIR}/src/leak.txt" and
"${FAIL_CASE_DIR}/test/security/fixtures/real-secret.txt") with clearly
synthetic placeholders that cannot be mistaken for production secrets (e.g.
"OPENAI_API_KEY=sk-test-placeholder-0000" or
"OPENAI_API_KEY=PLACEHOLDER_API_KEY") while preserving the fixture structure and
intent of the regression (so tests still exercise detection of secret-like
patterns); ensure the other fixture
("${FAIL_CASE_DIR}/test/security/fixtures/fixture.txt") remains deterministic
and do not remove assertions or change test semantics, only change token strings
to non-production-looking synthetic values.
- Around line 1-4: The shell harness file secret-scan-regression.test.sh should
not live under test/**; move that script to scripts/ (e.g.,
scripts/secret-scan-regression.sh) and create a Vitest test (e.g.,
secret-scan-regression.test.ts) that either invokes the moved script via a
controlled runner (execa/spawn) or, better, ports the shell checks into Vitest
assertions using expect; ensure the Vitest test enforces determinism (no real
secrets, no skipped assertions), reproduces required regression conditions
(concurrency/token-refresh/windows cases as needed), and returns proper
pass/fail status through Vitest reporting rather than executing a standalone
shell harness under test/**.
In `@test/token-store.test.ts`:
- Around line 148-158: The test's race setup is flaky because awaiting
Promise.resolve() does not guarantee the first call to
tokenStore.persistAccountSecrets entered the mock's critical branch; add a
deferred signal (e.g., firstRefreshEntered) that is resolved inside the mock
when refreshCallCount === 1, then await that signal before resolving refreshGate
so the first write is deterministically inside the blocking region; update the
mock and test to create and await firstRefreshEntered around the
refreshCallCount === 1 check and only then call refreshGate.resolve().
---
Outside diff comments:
In @.github/workflows/ci.yml:
- Around line 20-27: The workflow uses mutable tags actions/checkout@v4 and
actions/setup-node@v4; replace those with the corresponding full commit SHAs
(the same pattern used in secret-scan.yml) to pin versions for supply-chain
consistency: update the uses lines for actions/checkout and actions/setup-node
to their full commit SHA references instead of `@v4` and keep the existing inputs
(node-version, cache) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: da374995-84ca-45c4-9649-3bac38e425d3
📒 Files selected for processing (52)
.github/workflows/ci.yml.github/workflows/recovery-drill.yml.github/workflows/release-provenance.yml.github/workflows/retention-maintenance.yml.github/workflows/sbom-attestation.yml.github/workflows/secret-scan.yml.gitleaks.tomlREADME.mdSECURITY.mdconfig/performance-budgets.jsonconfig/performance-budgets.schema.jsonconfig/slo-policy.jsondocs/README.mddocs/configuration.mddocs/operations/audit-forwarding.mddocs/operations/incident-drill-template.mddocs/operations/incident-response.mddocs/operations/release-runbook.mddocs/operations/slo-error-budget.mddocs/privacy.mddocs/reference/settings.mddocs/upgrade.mdlib/audit.tslib/codex-manager.tslib/schemas.tslib/secrets/token-store.tslib/storage.tslib/unified-settings.tspackage.jsonscripts/audit-log-forwarder.jsscripts/benchmark-runtime-path.mjsscripts/compliance-evidence-bundle.jsscripts/enterprise-health-check.jsscripts/generate-sbom.jsscripts/keychain-assert.jsscripts/performance-budget-check.jsscripts/retention-cleanup.jsscripts/slo-budget-report.jsscripts/verify-sbom.jstest/audit-log-forwarder.test.tstest/audit-retry.test.tstest/audit.test.tstest/compliance-evidence-bundle.test.tstest/enterprise-health-check.test.tstest/generate-sbom.test.tstest/retention-cleanup.test.tstest/schemas.test.tstest/security/secret-scan-regression.test.shtest/slo-budget-report.test.tstest/storage-v4-keychain.test.tstest/token-store.test.tstest/unified-settings.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
docs/**
⚙️ CodeRabbit configuration file
keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.
Files:
docs/operations/release-runbook.mddocs/privacy.mddocs/operations/incident-drill-template.mddocs/upgrade.mddocs/operations/audit-forwarding.mddocs/operations/slo-error-budget.mddocs/configuration.mddocs/README.mddocs/reference/settings.mddocs/operations/incident-response.md
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/schemas.test.tstest/token-store.test.tstest/compliance-evidence-bundle.test.tstest/retention-cleanup.test.tstest/enterprise-health-check.test.tstest/unified-settings.test.tstest/storage-v4-keychain.test.tstest/audit-log-forwarder.test.tstest/audit-retry.test.tstest/slo-budget-report.test.tstest/audit.test.tstest/security/secret-scan-regression.test.shtest/generate-sbom.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/codex-manager.tslib/secrets/token-store.tslib/schemas.tslib/unified-settings.tslib/audit.tslib/storage.ts
🧬 Code graph analysis (14)
scripts/benchmark-runtime-path.mjs (1)
lib/storage.ts (1)
normalizeAccountStorage(776-865)
test/schemas.test.ts (1)
lib/schemas.ts (3)
AccountStorageV4Schema(165-170)AnyAccountStorageSchema(210-214)safeParseAccountStorage(309-315)
test/compliance-evidence-bundle.test.ts (1)
scripts/compliance-evidence-bundle.js (3)
outDir(141-141)result(152-152)payload(157-168)
test/retention-cleanup.test.ts (1)
scripts/retention-cleanup.js (5)
attempt(34-34)code(39-39)code(55-55)root(81-81)payload(108-117)
test/enterprise-health-check.test.ts (1)
scripts/enterprise-health-check.js (5)
code(164-164)root(213-213)payload(279-286)findings(214-214)checks(215-215)
test/unified-settings.test.ts (1)
lib/unified-settings.ts (3)
getUnifiedSettingsPath(254-256)saveUnifiedPluginConfigSync(289-293)loadUnifiedPluginConfigSync(267-275)
test/audit-log-forwarder.test.ts (1)
scripts/audit-log-forwarder.js (5)
endpoint(266-266)root(269-269)logDir(270-270)checkpointPath(271-271)checkpoint(274-274)
scripts/performance-budget-check.js (3)
scripts/slo-budget-report.js (1)
payload(152-179)lib/utils.ts (1)
isRecord(11-13)scripts/enterprise-health-check.js (1)
payload(279-286)
lib/secrets/token-store.ts (2)
lib/logger.ts (2)
createLogger(366-424)error(389-393)lib/utils.ts (1)
sleep(65-67)
scripts/enterprise-health-check.js (1)
lib/runtime-paths.ts (1)
getCodexHomeDir(43-46)
test/slo-budget-report.test.ts (1)
scripts/slo-budget-report.js (4)
root(88-88)logDir(89-89)policyPath(85-85)payload(152-179)
test/audit.test.ts (1)
lib/audit.ts (3)
getAuditLogPath(232-234)auditLog(190-230)configureAudit(74-77)
scripts/retention-cleanup.js (1)
scripts/enterprise-health-check.js (1)
code(164-164)
lib/storage.ts (5)
lib/storage/migrations.ts (5)
AccountStorageV1(34-38)AccountStorageV3(59-64)AccountStorageV4(83-88)AccountMetadataV3(40-57)migrateV3ToV4(129-160)lib/secrets/token-store.ts (6)
getEffectiveSecretStorageMode(99-105)loadAccountSecrets(169-186)ensureSecretStorageBackendAvailable(115-119)deriveAccountSecretRef(121-132)persistAccountSecrets(134-167)deleteAccountSecrets(188-218)lib/prompts/codex.ts (2)
ModelFamily(47-52)MODEL_FAMILIES(58-64)lib/schemas.ts (2)
AnyAccountStorageSchema(210-214)AccountStorageV4Schema(165-170)lib/utils.ts (1)
isRecord(11-13)
🪛 Biome (2.4.4)
test/retention-cleanup.test.ts
[error] 81-81: This variable implicitly has the any type.
(lint/suspicious/noImplicitAnyLet)
🪛 LanguageTool
docs/operations/incident-response.md
[style] ~39-~39: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...rst, then codex auth doctor --fix. 4. If auth failures persist, rotate account v...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~40-~40: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...dex>and re-runcodex auth check`. 5. If all accounts are exhausted/disabled, es...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 markdownlint-cli2 (0.21.0)
README.md
[warning] 189-189: Table column count
Expected: 2; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
🔇 Additional comments (72)
scripts/compliance-evidence-bundle.js (1)
52-65: maxbuffer hardening is correctly applied on both npm execution paths.looks good.
scripts/compliance-evidence-bundle.js:52-65now setsmaxBufferfor bothcmd.exeand directnpm, and this is covered by the verbose-output regression intest/compliance-evidence-bundle.test.ts:71-106. no additional concurrency risk is introduced here.docs/reference/settings.md (2)
129-136: lgtm — enterprise keychain pin documented correctly.the new
CODEX_SECRET_STORAGE_MODEentry and enterprise recommendation align withdocs/configuration.mdguidance. the pin tokeychainfor production is the correct default per the referenced storage-mode docs.
183-183: lgtm — cross-reference added.linking to
../configuration.mdimproves discoverability for operators looking up storage mode semantics.config/slo-policy.json (1)
1-8: lgtm — slo policy structure is well-defined.the 99.5% request success rate and zero-tolerance for stale wal findings align with enterprise reliability expectations.
scripts/slo-budget-report.jsconsumes this correctly per test coverage intest/slo-budget-report.test.ts.SECURITY.md (1)
87-91: lgtm — pre-release script sequence documented.the addition of
ops:health-check,perf:budget-check,sbom:generate, andsbom:verifybefore lint aligns with CI workflow steps anddocs/operations/release-runbook.mdpreconditions.config/performance-budgets.json (1)
1-9: lgtm — performance budgets defined with schema validation.the thresholds are reasonable and
config/performance-budgets.schema.jsondocuments the units per prior feedback.scripts/performance-budget-check.jsconsumes these correctly.docs/operations/slo-error-budget.md (1)
1-59: lgtm — slo policy documentation is comprehensive.the error budget thresholds (50%/100% burn triggers) and governance steps align with enterprise reliability practices. the
--enforceflag for CI gating is well-documented.docs/privacy.md (1)
51-64: lgtm — retention and audit forwarding documented with override examples.the
--days=30override example addresses prior feedback. the--dry-runflag for audit forwarding aligns withdocs/operations/audit-forwarding.mdguidance for testing before production..github/workflows/recovery-drill.yml (1)
31-34: lgtm — recovery drill uses packaged npm script.the workflow correctly uses
npm run ops:recovery-drillinstead of inline vitest targets, avoiding drift between workflow andpackage.json:72as flagged in prior review..gitleaks.toml (1)
1-19: lgtm — gitleaks config addresses prior feedback.the config now includes:
[extend] useDefault = trueto inherit detection rulescondition = "AND"requiring both path and regex match[\\/]for windows path compatibilitysk-test-prefix instead ofsk-to avoid suppressing real api keysthe narrowed path to
test[\\/]security[\\/]fixtures[\\/]limits allowlist scope appropriately.test/audit.test.ts (1)
250-267: good deterministic stale-log regression.
test/audit.test.ts:252uses an in-range fixed epoch and restores the spy infinally, which keeps the purge test stable and reproducible.test/unified-settings.test.ts (2)
6-28: windows-safe secure-write coverage is solid.
test/unified-settings.test.ts:7-14now asserts readability and no leaked temp files on win32, while preserving strict mode checks on posix.
115-127: good regression for repeated sync writes.the new test at
test/unified-settings.test.ts:115validates state correctness and secure file mode persistence across consecutive sync writes.scripts/verify-sbom.js (1)
1-49: lgtm, clean sbom verification script.the validation flow is straightforward.
fail()exits on error, payload construction is safe. no token/secret handling here.config/performance-budgets.schema.json (1)
1-53: schema looks correct.strict
additionalProperties: falsewith required keys andminimum: 0constraints. properly typed as numbers.lib/codex-manager.ts (2)
4074-4097: audit wrapping implementation looks solid.
lib/codex-manager.ts:4074-4097correctly logs success/failure outcomes with sanitized errors and exit codes. the rethrow preserves original exception behavior.
4124-4159: cli commands properly wrapped with audit.all auditable commands (
login,list,status,switch,check,verify-flagged,forecast,report,fix,doctor) now route throughrunWithAudit.featuresintentionally excluded since it's a static report.lib/unified-settings.ts (2)
127-151: sync write path correctly hardens permissions after rename.
lib/unified-settings.ts:127-151now applieschmodSyncto both directory and file after atomic rename on non-windows. this addresses the prior overwrite-permission gap. retry logic withAtomics.waitis acceptable for synchronous context.
193-217: async write path mirrors sync hardening correctly.
lib/unified-settings.ts:193-217usesfs.chmodwith platform guard and best-effort error handling. consistent with sync path behavior.lib/audit.ts (2)
133-162: purge throttle correctly positioned after successful read.
lib/audit.ts:149updateslastPurgeAttemptMsonly after the readdir succeeds. this prevents a transient read failure from blocking purge attempts for the full throttle interval. addresses prior review feedback.
200-226: audit write path properly hardened.
lib/audit.ts:200-226now:
- calls
rotateLogsIfNeeded()beforepurgeExpiredLogs()(correct order)- wraps
writeFileSyncin retry helper- applies
chmod 0o600on non-windows with best-effort handlingall prior review concerns addressed.
test/audit-retry.test.ts (1)
74-76: good regression coverage for non-blocking retry.
test/audit-retry.test.ts:76assertsAtomics.waitis not called, which validates the pr description claim that blocking audit retry sleep was removed. deterministic mocking with controlled EBUSY failures.scripts/enterprise-health-check.js (2)
158-169: newestMtimeMs correctly handles directory churn.
scripts/enterprise-health-check.js:161-169wraps readdir in try/catch for ENOENT/ENOTDIR. theexistsSyncat line 159 is redundant since the catch handles missing dirs, but it's harmless as an early exit optimization.
222-236: wal check toctou race fixed.
scripts/enterprise-health-check.js:222-236removed theexistsSynccheck and now wraps the entire stat call in try/catch. the transient unavailability comment is clear.scripts/retention-cleanup.js (3)
33-47: removeWithRetry has correct exponential backoff.
scripts/retention-cleanup.js:33-47retries up to 6 times with25 * 2 ** attemptms delay for EBUSY/EPERM/ENOTEMPTY. ENOENT treated as success (file already gone). good pattern.
49-60: collectExpiredFiles handles directory churn.
scripts/retention-cleanup.js:52-60wraps readdir in try/catch and skips on ENOENT/ENOTDIR/EPERM. theexistsSyncat line 50 is redundant but harmless. addresses prior review feedback.
108-122: partial cleanup now fails with exit code 1.
scripts/retention-cleanup.js:119-121exits non-zero whenfailed.length > 0. this ensures automation detects retention drift. addresses prior review feedback..github/workflows/retention-maintenance.yml (1)
1-48: workflow structure looks correct after runner.temp fix.the fix at line 20 properly moves
CODEX_MULTI_AUTH_DIRto step-level whererunner.tempis available. workflow now follows github actions context availability rules.one minor observation: the inline node scripts at lines 33 and 42 are dense one-liners. consider extracting to a small helper script under
scripts/for maintainability, but not blocking.test/enterprise-health-check.test.ts (1)
1-178: test coverage for enterprise-health-check looks comprehensive.tests cover missing artifacts, fallback root resolution, stale audit detection, and ENOTDIR edge cases. the
pathsEqualhelper attest/enterprise-health-check.test.ts:39-43properly handles windows path normalization.lib/storage.ts (5)
876-939: v4 hydration correctly remapsactiveIndexByFamilywhen accounts are skipped.
lib/storage.ts:917-931implements the fix from past review:remapFamilyIndexadjusts family indices based onskippedIndices, and line 928 properly clamps tohydratedAccounts.length - 1. the logging at line 898-900 now usesaccountIdonly, avoiding PII leakage.
1135-1185: serialize path now has rollback-safe secret writes.
lib/storage.ts:1147-1184collectspersistedSecretRefsduring the loop and performs cleanup viaPromise.allSettledat line 1180-1182 if any step fails. this addresses the prior transactionality concern.
1354-1373:collectPersistedSecretRefshandles both v4 payloads and wal journal content.
lib/storage.ts:1354-1373first checks for direct v4 shape, then falls back to parsing walcontentfield. this addresses the prior concern about missing wal payload refs during cleanup.
1375-1400:clearPersistedAccountSecretsuses force mode for mode-agnostic deletion.
lib/storage.ts:1382callsdeleteAccountSecrets(ref, { force: true })which bypasses mode checks and attempts keychain deletion regardless of current mode. this addresses the prior mode-flip concern.one consideration: if keytar is unavailable,
deleteAccountSecretswithforce: truewill silently return (line 200 in token-store.ts). this is acceptable for cleanup but worth noting.
1281-1285: rollback on save failure cleans up persisted secrets.
lib/storage.ts:1281-1285performs cleanup ofpersistedSecretRefswhen save fails after secrets were written. this completes the transactional save flow.scripts/audit-log-forwarder.js (2)
187-229:sendBatchnow implements timeout and retry with backoff.
scripts/audit-log-forwarder.js:197-228properly usesAbortControllerwith configurable timeout, retries on 429/5xx and network errors, and applies exponential backoff with jitter. this addresses the prior blocking fetch concern.
110-143: checkpoint rotation is under-tested; file-based locking lacks windows guarantees.rotation handling at scripts/audit-log-forwarder.js:110-143 uses filename heuristics with line-count fallback, but test coverage is minimal. test/audit-log-forwarder.test.ts:141 only covers the checkpoint.line > activeLineCount case; no tests for rapid rotation during collection, concurrent checkpoint writes, or sequential multi-rotation scenarios.
more critically, checkpoint locking at scripts/audit-log-forwarder.js:235 relies on open(lockPath, "wx", 0o600) which has weaker semantics on windows (not exclusive like unix). this introduces a real race condition on windows when concurrent instances try to write the checkpoint—two processes could both acquire the lock and corrupt the file.
add regression tests for:
- rapid rotation (audit.log → audit.1.log → audit.2.log in succession)
- checkpoint lock contention
- windows-specific concurrency (if windows support is required, consider cross-platform atomic rename or explicit mutex)
lib/secrets/token-store.ts (3)
47-67: esm/cjs interop for keytar is now correct.
lib/secrets/token-store.ts:51-52properly handles cjs default export:(imported.default ?? imported) as KeytarModule. this fixes the prior issue where keytar methods were undefined.
148-160: rollback on access token write failure is now implemented.
lib/secrets/token-store.ts:148-160wraps the access token write in try/catch and callsdeleteSecretRefWithRetryto clean up the refresh token on failure. the warning at line 154-157 logs onlyrefreshTokenRef, not the actual token - safe for PII.
202-217: delete cleanup continues after first failure.
lib/secrets/token-store.ts:202-217collects errors from both delete attempts and throws the first error only after both are attempted. this prevents orphaned access secrets when refresh deletion fails.docs/README.md (1)
62-66: operations documentation links added consistently.
docs/README.md:62-66adds links to the new operational runbooks (incident-response, incident-drill-template, release-runbook, slo-error-budget, audit-forwarding). these align with the pr objectives for enterprise readiness.docs/configuration.md (1)
69-69: lgtm - documentation aligns with enterprise guidance.the
CODEX_SECRET_STORAGE_MODEentry now renders correctly and matches the format used elsewhere in the table (e.g.,CODEX_MODE=0/1at line 65). enterprise pin recommendation is consistent withdocs/reference/settings.mdanddocs/reference/storage-paths.md.docs/upgrade.md (1)
65-99: migration section addresses past review feedback.the plaintext→keychain migration flow is documented with backup, env var, validation, rewrite, and health check steps. windows-specific EBUSY/EPERM guidance at lines 97-98 aligns with
releases/v0.1.2.mderror handling patterns..github/workflows/secret-scan.yml (2)
1-35: workflow addresses prior review feedback on sha pinning and permissions.actions are pinned to full commit shas (
lib/.github/workflows/secret-scan.yml:19,24), explicit permissions block added with least privilege, and regression test step included.
27-35: the version prefix difference betweenGITLEAKS_VERSION(line 27) andEXPECTED_GITLEAKS_VERSION(line 34) doesn't affect the regression test. test/security/secret-scan-regression.test.sh:7 only usesEXPECTED_GITLEAKS_VERSIONwith the v prefix for the docker image tag and never referencesGITLEAKS_VERSION. the mismatch either serves different purposes (action config vs test environment) orGITLEAKS_VERSIONis unused, but it won't cause test failures or false positives.> Likely an incorrect or invalid review comment..github/workflows/ci.yml (1)
11-17: matrix now includes windows-latest - addresses past review.ci will now exercise file permission hardening (0o600/0o700) and health check scripts on both ubuntu and windows runners. v4 fixture with
refreshTokenRef/accessTokenRefat line 64 also addresses the v4 schema coverage gap.test/slo-budget-report.test.ts (3)
11-26: removeWithRetry handles windows filesystem locks correctly.the retry logic at
test/slo-budget-report.test.ts:11-26handles EBUSY, EPERM, ENOTEMPTY with exponential backoff - aligns with windows filesystem behavior mentioned indocs/upgrade.md:97-98.
39-76: test validates stale-wal-findings evaluation path.deterministic test: creates fixture, sets mtime 10 days back, runs script synchronously, and asserts
health.staleWalFindings === 1and evaluation statusfail. the test is isolated and uses vitest per guidelines.
28-37: slo-budget-report.js doesn't write checkpoint files—no concurrency regression test needed here.this script reads audit logs and writes a single output report (line 182 of scripts/slo-budget-report.js). it has no stateful checkpoint, resume-point, or locking logic like audit-log-forwarder.js does. concurrent runs would safely read the same source logs and write to separate output paths without races.
per coding guidelines, regression tests for concurrency should target actual bugs, not hypothetical ones. the current test exercises the core functionality; adding a concurrent-run test would verify something that's already correct by design (readonly log dir + independent output paths).
docs/operations/incident-drill-template.md (1)
36-42: command evidence section aligns with incident-response.md.the commands at lines 39-42 (
npm run ops:health-check,codex auth report --live --json,codex auth doctor --json) match the evidence collection requirements indocs/operations/incident-response.md.test/schemas.test.ts (3)
241-270: v4 schema tests cover core paths.
test/schemas.test.ts:241-270validates v4 storage acceptance, missing refreshTokenRef rejection, and empty accessTokenRef rejection. aligns withlib/schemas.ts:164-169schema definition.
320-330: v4 union test addresses past review feedback.this test at
test/schemas.test.ts:320-330exercisesAnyAccountStorageSchemawith a v4 payload, confirming the discriminated union handles version 4 correctly.
475-483: safeParseAccountStorage v4 regression test addresses past review.
test/schemas.test.ts:475-483adds explicit v4 success case forsafeParseAccountStorage, ensuring helper wiring cannot regress independently of the union schema..github/workflows/sbom-attestation.yml (1)
1-48: workflow looks good overall.permissions are correctly scoped for attestation (id-token write, attestations write). the conditional
if: github.event_name == 'push' && github.ref == 'refs/heads/main'at line 45 correctly limits provenance attestation to main branch pushes..github/workflows/release-provenance.yml (2)
30-48: quality gate fixture setup and checks look complete.the inline fixture creation at lines 34-37 ensures
ops:health-check --require-filespasses in CI. this addresses the earlier feedback about missing quality gates.one consideration: the fixture at line 35 uses
version:3for accounts but keychain mode would expect v4. sinceCODEX_SECRET_STORAGE_MODEisn't set here, the plaintext path should work, but verify this doesn't cause health-check warnings if the script expects v4 in keychain-enabled environments.[approve_code_changes, request_verification]
#!/bin/bash # check if enterprise-health-check.js validates storage version or storage mode rg -n "version.*[34]|SECRET_STORAGE_MODE" scripts/enterprise-health-check.js | head -20
56-59: npm publish auth is now correctly configured.
NODE_AUTH_TOKENat line 59 addresses the earlier critical issue. ensure theNPM_TOKENsecret is configured in repository settings with publish permissions.scripts/generate-sbom.js (1)
1-45: past feedback addressed: maxBuffer and windows npm.cmd handling.
MAX_BUFFER_BYTESat line 8 (20mb) and its usage at lines 20, 26 address the buffer overflow concern. thenpm.cmdfallback at line 22 handles windows path resolution. the test attest/generate-sbom.test.tscovers the large-buffer scenario.package.json (2)
134-136: keytar as optionalDependency is correct for graceful degradation.this allows keychain mode to work when keytar is available while falling back to plaintext on systems without native keychain support.
[approve_code_changes, request_verification]
#!/bin/bash # verify keytar 7.9.0 exists on npm curl -s "https://registry.npmjs.org/keytar" | jq '.versions | keys | map(select(startswith("7.9")))' | head -10
132-133: zod 4.3.6 is compatible with the codebase.lib/schemas.ts uses modern zod 4.x patterns throughout:
z.discriminatedUnion()at line 210 and 264,safeParse()for all validations, and proper error handling viaresult.error.issues. comprehensive test coverage validates schema behavior across all edge cases. no v3-specific patterns detected and no compatibility issues exist.> Likely an incorrect or invalid review comment.test/generate-sbom.test.ts (2)
11-26: removeWithRetry utility handles windows filesystem behavior correctly.retry logic with exponential backoff at line 23 (
25 * 2 ** attempt) and retryable codes (EBUSY, EPERM, ENOTEMPTY) at line 12 match expected windows transient errors. ENOENT early return at line 19 is correct.
58-84: good regression coverage for maxBuffer scenario.the fake npm script outputs ~1.5mb (line 35:
"x".repeat(1_500_000)), and assertions at lines 82-83 verify the output is >1mb and valid json. this exercises theMAX_BUFFER_BYTES = 20 * 1024 * 1024path inscripts/generate-sbom.js:8.consider adding a negative test case where npm sbom fails (exit non-zero) to verify the script's error handling at
scripts/generate-sbom.js:42-44.[approve_code_changes, suggest_optional_refactor]
lib/schemas.ts (2)
140-172: v4 schema with keychain refs looks correct.
refreshTokenRef: z.string().min(1)at line 148 andaccessTokenRef: z.string().min(1).optional()at line 149 correctly enforce non-empty refs when present (addresses past feedback).verify
test/schemas.test.tsincludes validation cases for:
- v4 with valid refs
- v4 with empty
refreshTokenRef(should reject)- v4 with empty
accessTokenRef(should reject when provided)[approve_code_changes, request_verification]
#!/bin/bash # check for v4 schema tests rg -n "V4|refreshTokenRef|accessTokenRef" test/schemas.test.ts | head -30
207-214: AnyAccountStorageSchema union updated for v4.discriminated union at line 210-214 now includes V1, V3, V4. migration logic in
lib/storage/migrations.tsshould handle v3→v4 conversion.test/storage-v4-keychain.test.ts (6)
27-45: keytar mock correctly isolates tests from real keychain.the mock at lines 31-44 uses an in-memory
Map<string, string>and exposessetPassword,getPassword,deletePassword. this avoids touching real keychain while testing v4 storage semantics. as per coding guidelines, tests reject mocking real secrets.
112-165: concurrent saveAccounts test covers serialization.
Promise.allat lines 116-147 exercises the save queue inlib/storage.ts. assertion at lines 162-164 confirms the final loaded state is one of the two valid outcomes. this addresses the past feedback about concurrency regression coverage.one gap: the test verifies no torn json but doesn't assert the queue serialized (i.e., both saves completed, not interleaved). consider adding a comment clarifying expected behavior: last writer wins.
167-204: EPERM rename retry test covers windows behavior.
mockImplementationOnceat lines 172-176 throws EPERM on first attempt, then falls through to real rename. assertion at line 201 (renameCallCount >= 2) verifies retry occurred. this addresses past feedback about windows rename retry regression.consider also testing EBUSY (common on windows when file handles are held), since
lib/storage.tslikely handles both. if EBUSY is already covered by the retry logic, a separate test case would strengthen coverage.[approve_code_changes, suggest_optional_refactor]
206-239: rollback test verifies keychain cleanup on save failure.EACCES at line 213 causes rename to fail permanently (not retryable). assertion at line 238 confirms secrets map is empty after failure. this ensures no orphaned keychain entries when persistence fails.
241-288: hydration test verifies activeIndex adjustment when secrets are missing.the test sets up v4 storage with two accounts but only seeds keychain secrets for
acct_2. assertions at lines 284-287 confirm:
- only one account remains after hydration
- activeIndex adjusted from 1 to 0
- activeIndexByFamily.codex adjusted from 1 to 0
this is important for graceful degradation when keychain entries are cleared externally.
290-326: WAL cleanup test verifies keychain refs cleared on mode switch.the test writes a WAL file with v4 refs at lines 298-319, switches to plaintext mode at line 321, then calls
clearAccounts(). assertions at lines 323-325 confirm:
- keychain secrets deleted
- WAL file removed
this ensures no secret leakage when transitioning storage modes.
test/audit-log-forwarder.test.ts (4)
57-77: withServer helper properly manages server lifecycle.the
try/finallyat lines 72-76 ensures server cleanup even on test failure. binding to127.0.0.1at line 64 avoids network prompts on some OSes.
90-139: 429 retry test verifies transient error handling.the server responds with 429 on first request (lines 105-108), then 200 on retry. assertions verify:
- exit status 0 (line 126)
- 2 messages sent (line 129)
- 2 requests made (line 130)
- checkpoint written with correct line number (lines 137-138)
this exercises the retry path in
scripts/audit-log-forwarder.js.
208-253: checkpoint lock test has a potential race condition.the
releaseTimerat lines 226-228 fires after 120ms, but test timeout behavior depends onCODEX_AUDIT_FORWARDER_TIMEOUT_MS: "300". if the forwarder's lock-wait logic has a shorter timeout or doesn't poll frequently, the test may flake.consider either:
- adding a comment explaining expected timing
- using a more deterministic signal (e.g., file watch or polling confirmation)
this is minor since the test does pass with current implementation.
[approve_code_changes, suggest_optional_refactor]
175-206: timeout test verifies non-blocking behavior on hanging endpoint.the server never responds (line 188), forwarder times out after 50ms × 2 attempts. assertions verify:
- exit status 1 (line 201)
- error message present (line 202)
- no checkpoint created (line 205)
this ensures the forwarder doesn't block indefinitely on unresponsive endpoints.
Resolve remaining PR43 threads across enterprise workflows, audit/lock reliability, and deterministic regression coverage.\n\nHighlights:\n- Extract CI health fixture seeding into a dedicated script and capture stderr in recovery drill artifacts.\n- Clarify ops docs/runbook text and retry semantics, and fix README table rendering for secret-storage mode.\n- Add retry backoff in audit fs operations, strengthen audit error redaction, and remove duplicated token-store sleep helper.\n- Harden checkpoint lock handling with stale-lock cleanup, EPERM contention handling, and bounded timeout errors.\n- Harden health-check execution in slo-budget-report with timeout/kill options, script override support, and fallback error messaging.\n- Convert secret scan regression coverage to deterministic Vitest flow; move shell harness out of test/** and sanitize fixture tokens.\n- Improve Windows and concurrency regression tests across compliance, retention, audit, schemas, and token-store suites.\n\nValidation:\n- npm run typecheck\n- npm run lint\n- npm run test -- test/audit-log-forwarder.test.ts test/audit-retry.test.ts test/audit.test.ts test/compliance-evidence-bundle.test.ts test/enterprise-health-check.test.ts test/retention-cleanup.test.ts test/schemas.test.ts test/slo-budget-report.test.ts test/token-store.test.ts test/security/secret-scan-regression.test.ts\n\nCo-authored-by: Codex <noreply@openai.com>
There was a problem hiding this comment.
Actionable comments posted: 11
♻️ Duplicate comments (1)
test/token-store.test.ts (1)
245-272:⚠️ Potential issue | 🟠 Majoradd a deterministic retry regression for transient delete failures.
lib/secrets/token-store.ts:70-93retries delete onEBUSY/EPERM/EAGAIN/429, buttest/token-store.test.ts:245-272only covers non-retryableEACCES. add a vitest case that throwsEBUSY(orstatus=429) for the first refresh-delete attempts, then succeeds, and assert retry count plus access-delete attempt.proposed test addition
it("attempts access-token cleanup even when refresh-token cleanup fails", async () => { const deletedRefs: string[] = []; mockKeytar({ @@ expect(deletedRefs).toEqual(["acct-partial:refresh", "acct-partial:access"]); }); + +it("retries transient refresh delete failures and still processes access cleanup", async () => { + const deletedRefs: string[] = []; + let refreshAttempts = 0; + mockKeytar({ + setPassword: async () => {}, + getPassword: async () => null, + deletePassword: async (_service: string, account: string) => { + if (account === "acct-retry:refresh") { + refreshAttempts += 1; + if (refreshAttempts < 3) { + const error = new Error("resource busy") as NodeJS.ErrnoException; + error.code = "EBUSY"; + throw error; + } + } + deletedRefs.push(account); + return true; + }, + }); + + process.env.CODEX_SECRET_STORAGE_MODE = "keychain"; + const tokenStore = await import("../lib/secrets/token-store.js"); + await tokenStore.ensureSecretStorageBackendAvailable(); + + await tokenStore.deleteAccountSecrets({ + refreshTokenRef: "acct-retry:refresh", + accessTokenRef: "acct-retry:access", + }); + + expect(refreshAttempts).toBe(3); + expect(deletedRefs).toEqual(expect.arrayContaining(["acct-retry:refresh", "acct-retry:access"])); +});As per coding guidelines,
test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/token-store.test.ts` around lines 245 - 272, Add a deterministic vitest regression that exercises the retry logic in lib/secrets/token-store.ts (the block around deleteAccountSecrets and the retry loop at lines ~70-93): mock keytar.deletePassword to throw a transient error (e.g., code "EBUSY" or an object with status 429) on the first refresh-token delete attempt and succeed on subsequent calls, then call deleteAccountSecrets with both refreshTokenRef and accessTokenRef and assert that the refresh delete was retried (expected retry count) and that the access-token delete was still attempted and recorded; ensure the test is deterministic (use vitest timers or counters) and placed alongside existing token-store.test.ts cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/recovery-drill.yml:
- Around line 39-45: The artifact upload step ("Upload recovery drill artifacts"
using actions/upload-artifact@v4) only runs on success; modify that step to
always run by adding an execution condition such as if: ${{ always() }} so the
two diagnostic files (.tmp/recovery-drill-vitest.json and
.tmp/recovery-drill-health.json) are uploaded even when the drill fails.
In `@docs/operations/incident-drill-template.md`:
- Around line 38-42: Add a short PowerShell example and note alongside the
existing bash-fenced evidence block showing the equivalent commands (npm run
ops:health-check, codex auth report --live --json, codex auth doctor --json) so
Windows responders have an explicit example; include a powershell-fenced code
block labeled "powershell" with those three commands and a one-line sentence
noting "Run these in an elevated PowerShell session" (or similar) to prevent
improvisation during incidents and keep docs consistent with CLI
flags/workflows.
In `@docs/upgrade.md`:
- Around line 69-94: The markdown list in docs/upgrade.md has inconsistent
ordered-list formatting and unindented fenced code blocks; fix by normalizing
the numbering style for the four related steps ("Validate keychain backend
availability", "Set `CODEX_SECRET_STORAGE_MODE=keychain`", "Trigger a controlled
account rewrite", "Verify health and storage state") and indenting each fenced
```bash``` block by at least three spaces so they are nested under their
respective list items (e.g., ensure the npm run ops:keychain-assert, codex auth
check/report, and npm run ops:health-check blocks are indented under their list
entries).
In `@lib/audit.ts`:
- Around line 120-131: withRetryableAuditFsOperation currently uses Atomics.wait
(retrySleepSignal and RETRYABLE_AUDIT_FS_BASE_DELAY_MS) which blocks the main
thread; remove the blocking Atomics.wait calls inside
withRetryableAuditFsOperation (and any uses in logAuditEntry and
purgeOldLogFiles) and replace with non-blocking retry semantics: for synchronous
call sites keep bounded immediate retries (loop without sleeping) and for async
contexts refactor to async variants that await a non-blocking delay
(setTimeout/promisified) before retrying; update isRetryableAuditFsError usage
to preserve retry/throw logic and remove the atomicsWaitSpy assertion from
test/audit-retry.test.ts; also apply the same replacement for Atomics.wait
occurrences in unified-settings (lib/unified-settings.ts), logger
(lib/logger.ts), and auto-update-checker (lib/auto-update-checker.ts).
In `@lib/codex-manager.ts`:
- Around line 4044-4099: Add vitest regression tests for
auditActionForCommand/sanitizeAuditError/runWithAudit: mock auditLog and write
deterministic tests that (1) call runWithAudit for representative commands
("login","switch","check","verify-flagged","forecast" and an unknown/default)
and assert auditLog is invoked with the correct AuditAction (via
auditActionForCommand) and AuditOutcome.SUCCESS when runner resolves 0 and
AuditOutcome.FAILURE when runner resolves nonzero; (2) simulate runner throwing
errors and assert auditLog was called with error metadata produced by
sanitizeAuditError — include inputs containing sk- keys,
refresh/access_token_..., secret-access/refresh-token, email addresses and long
strings to verify redaction patterns and the 200-char slice; and (3) include
thrown errors containing "EBUSY" and HTTP "429" text to confirm
redaction/handling; use spies/mocks on auditLog and deterministic error objects
to make the tests repeatable.
In `@scripts/audit-log-forwarder.js`:
- Around line 92-95: The discovery and final-mtime reporting still have
exists/stat races (see discoverAuditFiles and the final mtime stat block around
lines 404-406); update discoverAuditFiles to not rely on existsSync — use
readdir withFileTypes and for each Dirent perform a safe stat inside a try/catch
(skip entries that throw ENOENT), and change the final mtime reporting code to
wrap fs.stat/fs.promises.stat in a try/catch that treats
missing/replace-on-rotate files as non-fatal (log and continue, do not
process.exit(1)). Add a deterministic vitest regression in
test/audit-log-forwarder.test.ts that simulates deletion/replacement between
discovery and stat (e.g., create file list, remove/replace a file before stat)
and asserts the script/process exits 0 and advances checkpoints; keep the test
deterministic and avoid mocking secrets.
In `@scripts/secret-scan-regression.sh`:
- Line 89: The regex used to normalize Windows path separators in the fallback
scanner is wrong: change the replace call that builds rel (currently using
path.relative(sourceDir, full).replace(/\\\\/g, "/")) to use a pattern that
matches single backslashes (use /\\/g) so single "\" characters are replaced
with "/", ensuring Windows paths are normalized; update the replace invocation
on the rel variable accordingly and run the secret scan tests (and add a
Windows-path test case for the fallback scanner) to confirm behavior.
In `@scripts/seed-health-fixture.js`:
- Around line 10-48: The fixture writes and creates sensitive files/directories
with default permissions; update the mkdirSync calls to set a restrictive
directory mode (e.g., mode: 0o700) and update each writeFileSync call to pass an
options object that sets a secure file mode (e.g., mode: 0o600) along with
encoding so files like openai-codex-accounts.json, settings.json and audit.log
are created with explicit, non-world-readable permissions; locate usages of
mkdirSync, writeFileSync, logsDir and root to apply these option changes
consistently.
In `@scripts/slo-budget-report.js`:
- Around line 31-34: The loadAuditEntries function can race between existsSync
and readdir causing a hard failure when the directory vanishes or is rotated;
update loadAuditEntries to wrap the readdir call in a try/catch that treats
ENOENT and ENOTDIR (and other transient fs errors) as "no files" and returns []
instead of throwing, preserving current behavior for other unexpected errors;
then add a deterministic vitest regression in test/slo-budget-report.test.ts
that simulates logDir churn by mocking fs.readdir (or readdirSync wrapper used
by loadAuditEntries) to throw ENOENT and ENOTDIR in separate cases and assert
the overall script still produces a valid payload (no crash) — reference the
loadAuditEntries function and the test file name when locating code to change.
In `@test/audit-log-forwarder.test.ts`:
- Around line 255-323: Add a deterministic Vitest regression that forces the
"newest log disappears between collection and final stat" race by creating a log
file, ensuring the forwarder will discover it, then deleting that logfile before
the forwarder calls fs.stat (so the stat will raise ENOENT/ENOTDIR) and
asserting the process exits successfully with newestLogMtimeMs: null; implement
this as a new it(...) alongside the existing tests using the same runForwarder
and withServer helpers, control timing via small env waits (e.g.,
CODEX_AUDIT_FORWARDER_STALE_LOCK_MS/MAX_WAIT_MS) or explicit sync deletion right
after starting the forwarder, and verify result.status === 0 and result.stdout
or parsed output contains newestLogMtimeMs: null to ensure the forwarder
swallows the ENOENT and continues.
In `@test/audit.test.ts`:
- Around line 271-300: The Date.now spy created via vi.spyOn(Date,
"now").mockReturnValue(...) (assigned to nowSpy) is only restored inside a late
finally block and can leak if earlier setup throws; move the
nowSpy.mockRestore() into a finally that encloses the entire test body (or add
an outer try/finally that wraps all setup, configureAudit(...), file ops,
auditLog(...) calls) so nowSpy is always restored, referencing the nowSpy
variable and the vi.spyOn(Date, "now") call in this test (also ensure auditLog
and configureAudit calls remain inside that try block).
---
Duplicate comments:
In `@test/token-store.test.ts`:
- Around line 245-272: Add a deterministic vitest regression that exercises the
retry logic in lib/secrets/token-store.ts (the block around deleteAccountSecrets
and the retry loop at lines ~70-93): mock keytar.deletePassword to throw a
transient error (e.g., code "EBUSY" or an object with status 429) on the first
refresh-token delete attempt and succeed on subsequent calls, then call
deleteAccountSecrets with both refreshTokenRef and accessTokenRef and assert
that the refresh delete was retried (expected retry count) and that the
access-token delete was still attempted and recorded; ensure the test is
deterministic (use vitest timers or counters) and placed alongside existing
token-store.test.ts cases.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: fbdeb2d0-5835-4055-b62f-367835666b58
📒 Files selected for processing (25)
.github/workflows/ci.yml.github/workflows/recovery-drill.ymlREADME.mddocs/operations/audit-forwarding.mddocs/operations/incident-drill-template.mddocs/upgrade.mdlib/audit.tslib/codex-manager.tslib/secrets/token-store.tsscripts/audit-log-forwarder.jsscripts/enterprise-health-check.jsscripts/secret-scan-regression.shscripts/seed-health-fixture.jsscripts/slo-budget-report.jstest/audit-log-forwarder.test.tstest/audit-retry.test.tstest/audit.test.tstest/compliance-evidence-bundle.test.tstest/enterprise-health-check.test.tstest/helpers/remove-with-retry.tstest/retention-cleanup.test.tstest/schemas.test.tstest/security/secret-scan-regression.test.tstest/slo-budget-report.test.tstest/token-store.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/codex-manager.tslib/secrets/token-store.tslib/audit.ts
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/audit-retry.test.tstest/slo-budget-report.test.tstest/audit.test.tstest/schemas.test.tstest/enterprise-health-check.test.tstest/helpers/remove-with-retry.tstest/audit-log-forwarder.test.tstest/security/secret-scan-regression.test.tstest/retention-cleanup.test.tstest/compliance-evidence-bundle.test.tstest/token-store.test.ts
docs/**
⚙️ CodeRabbit configuration file
keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.
Files:
docs/operations/audit-forwarding.mddocs/upgrade.mddocs/operations/incident-drill-template.md
🧬 Code graph analysis (13)
test/audit-retry.test.ts (2)
lib/logger.ts (1)
error(389-393)scripts/audit-dev-allowlist.js (1)
audit(150-161)
test/audit.test.ts (1)
lib/audit.ts (3)
auditLog(193-233)getAuditLogPath(235-237)configureAudit(76-79)
test/schemas.test.ts (1)
lib/schemas.ts (3)
AccountStorageV4Schema(165-170)AnyAccountStorageSchema(210-214)safeParseAccountStorage(309-315)
test/enterprise-health-check.test.ts (2)
test/helpers/remove-with-retry.ts (1)
removeWithRetry(3-18)scripts/enterprise-health-check.js (4)
root(213-213)payload(279-286)findings(214-214)checks(215-215)
test/helpers/remove-with-retry.ts (3)
scripts/audit-log-forwarder.js (3)
attempt(205-205)attempt(292-292)code(306-306)lib/logger.ts (1)
error(389-393)scripts/enterprise-health-check.js (1)
code(164-164)
scripts/slo-budget-report.js (2)
scripts/audit-log-forwarder.js (5)
logDir(347-347)entries(94-94)entries(153-153)root(346-346)payload(375-381)scripts/enterprise-health-check.js (4)
entries(97-97)entries(160-160)root(213-213)payload(279-286)
test/security/secret-scan-regression.test.ts (1)
test/helpers/remove-with-retry.ts (1)
removeWithRetry(3-18)
scripts/audit-log-forwarder.js (1)
lib/utils.ts (1)
sleep(65-67)
lib/secrets/token-store.ts (2)
lib/logger.ts (2)
createLogger(366-424)error(389-393)lib/utils.ts (1)
sleep(65-67)
scripts/enterprise-health-check.js (1)
lib/runtime-paths.ts (1)
getCodexHomeDir(43-46)
test/retention-cleanup.test.ts (2)
scripts/retention-cleanup.js (2)
root(81-81)payload(108-117)test/helpers/remove-with-retry.ts (1)
removeWithRetry(3-18)
test/compliance-evidence-bundle.test.ts (1)
scripts/compliance-evidence-bundle.js (3)
outDir(141-141)result(152-152)payload(157-168)
lib/audit.ts (1)
lib/runtime-paths.ts (1)
getCodexLogDir(226-228)
🪛 markdownlint-cli2 (0.21.0)
docs/upgrade.md
[warning] 75-75: Ordered list item prefix
Expected: 1; Actual: 2; Style: 1/1/1
(MD029, ol-prefix)
[warning] 81-81: Ordered list item prefix
Expected: 1; Actual: 3; Style: 1/2/3
(MD029, ol-prefix)
[warning] 83-83: Ordered list item prefix
Expected: 2; Actual: 4; Style: 1/2/3
(MD029, ol-prefix)
[warning] 90-90: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1
(MD029, ol-prefix)
🔇 Additional comments (24)
README.md (1)
189-189: table row is now clear and render-safe.line 189 now avoids pipe-delimiter breakage and keeps enterprise guidance explicit for
keychainpinning. looks good. seelib/README.md:189.docs/operations/incident-drill-template.md (1)
63-63: exit criteria wording is now precise.line 63 clearly points operators to the json
statusfield and expected value. good fix. seelib/docs/operations/incident-drill-template.md:63.test/compliance-evidence-bundle.test.ts (3)
53-57: deterministic launcher setup is correct.both fake npm shims invoke
process.execPath, which removes path drift across runners.test/compliance-evidence-bundle.test.ts:53-57.As per coding guidelines,
test/**: tests must stay deterministic and use vitest.
78-113: maxbuffer regression coverage is strong.this test checks process success, json payload status, manifest integrity, and a >1mb log artifact in one flow.
test/compliance-evidence-bundle.test.ts:78-113.As per coding guidelines,
test/**: tests must stay deterministic and use vitest.
115-139: windows npm.cmd path is explicitly validated.good explicit windows regression: wrapper marker confirms the
cmdpath is used, not just implicitly covered.test/compliance-evidence-bundle.test.ts:115-139.As per coding guidelines,
test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.docs/operations/audit-forwarding.md (1)
23-41: looks good.the command examples and failure/retry semantics are clear and actionable in
docs/operations/audit-forwarding.md:23anddocs/operations/audit-forwarding.md:61.Also applies to: 61-70
test/helpers/remove-with-retry.ts (1)
3-17: good extraction and retry behavior.the helper in
test/helpers/remove-with-retry.ts:3is bounded, deterministic, and handles expected transient windows fs errors.test/retention-cleanup.test.ts (1)
49-71: nice deterministic windows gating.
test/retention-cleanup.test.ts:49correctly usesit.skipIf(process.platform === "win32")and keeps real assertions in the active path.test/audit.test.ts (1)
211-226: good windows + posix regression coverage for secure audit log mode.Line 212-Line 216 in
test/audit.test.ts:212now validates the windows path with a real content assertion, and Line 225 intest/audit.test.ts:225enforces0600on posix. this is the right cross-platform guard forlib/audit.tspermission hardening.As per coding guidelines
test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.scripts/enterprise-health-check.js (1)
158-169: good churn handling and secure-mode error behavior.Line 161-Line 167 in
scripts/enterprise-health-check.js:161correctly treatsENOENT/ENOTDIRas no-audit-data, and Line 200 inscripts/enterprise-health-check.js:200avoids noisy findings for missing files during permission checks. this keeps health checks stable under filesystem churn.Also applies to: 186-207
test/slo-budget-report.test.ts (1)
83-140: good deterministic coverage for timeout and windows override behavior.Line 90-Line 109 in
test/slo-budget-report.test.ts:90validates timeout/kill options, and Line 129-Line 139 intest/slo-budget-report.test.ts:129covers windows-style health script paths without skipping assertions.As per coding guidelines
test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.test/enterprise-health-check.test.ts (1)
59-161: strong fallback and audit-dir churn regression coverage.Line 89-Line 95 in
test/enterprise-health-check.test.ts:89keeps path assertions windows-safe, and Line 141-Line 161 intest/enterprise-health-check.test.ts:141locks in theENOTDIRchurn behavior expected byscripts/enterprise-health-check.js.As per coding guidelines
test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.scripts/audit-log-forwarder.js (1)
289-327: checkpoint lock acquisition and stale-lock handling are solid.
withCheckpointLockinscripts/audit-log-forwarder.js:289now has pid ownership, stale lock cleanup, and bounded wait semantics. this is a strong improvement for multi-process safety..github/workflows/ci.yml (1)
16-72: ci hardening updates look good.the matrix in
.github/workflows/ci.yml:16now exercises windows + linux, and Line 66-Line 69 runs enterprise health in required-artifact mode against the seeded fixture. this meaningfully reduces vacuous passes.test/security/secret-scan-regression.test.ts (2)
12-34: solid deterministic scanner helper for regression fixtures.
collectSyntheticFindingsis clear and stable, and path normalization intest/security/secret-scan-regression.test.ts:26keeps findings consistent across os path separators.
47-81: good regression coverage for allowlist vs non-allowlisted findings.the assertions at
test/security/secret-scan-regression.test.ts:75-80are specific and enforce both fail-case and pass-case behavior without skipped checks.scripts/secret-scan-regression.sh (1)
116-151: explicit fail-case and pass-case status checks are well handled.the guarded
set +e/ status capture flow inscripts/secret-scan-regression.sh:116-151gives clear diagnostics and deterministic exits.test/schemas.test.ts (2)
241-281: v4 schema coverage is solid here.
test/schemas.test.ts:241-281now validates valid v4 payloads, missingrefreshTokenRef, emptyaccessTokenRef, and optionalactiveIndexByFamilyhandling with deterministic assertions.
331-341: v4 union + helper wiring is now covered end-to-end.
test/schemas.test.ts:331-341andtest/schemas.test.ts:486-494correctly pin both discriminated-union parsing andsafeParseAccountStoragehelper behavior for version 4.Also applies to: 486-494
lib/secrets/token-store.ts (3)
52-58: keytar interop normalization looks correct.
lib/secrets/token-store.ts:52-58properly normalizes cjs/esm shapes and validates required keytar methods before enabling keychain operations.
142-157: partial-write rollback is implemented correctly.
lib/secrets/token-store.ts:142-157rolls back the refresh secret if access secret persistence fails, then rethrows the original failure.
199-214: independent delete attempts are correctly preserved.
lib/secrets/token-store.ts:199-214attempts refresh and access deletions independently and only throws after both attempts, which avoids access-secret orphaning.test/token-store.test.ts (2)
125-160: the concurrency race setup is now deterministic.
test/token-store.test.ts:125-160usesfirstRefreshEntered+refreshGatecorrectly, so the interleaving is explicit and no longer microtask-flush dependent.
179-243: rollback and full cleanup regressions are well covered.
test/token-store.test.ts:179-243deterministically verifies refresh rollback on access-write failure and validates dual-ref delete cleanup behavior.
Address remaining PR43 unresolved threads with targeted hardening and regressions across workflows, docs, audit/runtime scripts, and CLI audit coverage. Co-authored-by: Codex <noreply@openai.com>
There was a problem hiding this comment.
Actionable comments posted: 8
♻️ Duplicate comments (3)
docs/operations/incident-drill-template.md (1)
44-49:⚠️ Potential issue | 🟡 Minoradd an explicit windows execution note next to the powershell block.
the powershell commands are good, but add one short line like “run in an elevated powershell session” to prevent responder improvisation during incidents. see
lib/docs/operations/incident-drill-template.md:44-49.As per coding guidelines, "docs/**: keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/operations/incident-drill-template.md` around lines 44 - 49, Add a short explicit Windows execution note immediately adjacent to the PowerShell code block containing "npm run ops:health-check" and the "codex auth ..." commands—e.g., a single-line instruction such as "Run in an elevated PowerShell session (Run as Administrator)"—so responders know to execute those commands in an elevated PowerShell environment; update the nearby PowerShell fence in the incident-drill template to include that one-line note.scripts/slo-budget-report.js (1)
31-42: 🧹 Nitpick | 🔵 Trivialredundant existsSync can be removed.
scripts/slo-budget-report.js:32checksexistsSync(logDir)but lines 36-42 already catch ENOENT/ENOTDIR from readdir. the existsSync is redundant and introduces a minor toctou window. consider removing line 32.proposed simplification
async function loadAuditEntries(logDir, cutoffMs) { - if (!existsSync(logDir)) return []; let entries; try { entries = await readdir(logDir, { withFileTypes: true });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/slo-budget-report.js` around lines 31 - 42, Remove the redundant existsSync check in loadAuditEntries and rely on the try/catch around readdir to handle missing or non-directory paths; specifically, delete the preliminary if (!existsSync(logDir)) return []; in function loadAuditEntries and keep the existing try { entries = await readdir(logDir, { withFileTypes: true }); } catch (error) { const code = error?.code; if (code === "ENOENT" || code === "ENOTDIR") return []; throw error; } so you avoid the TOCTOU window and still handle ENOENT/ENOTDIR from readdir.scripts/seed-health-fixture.js (1)
10-10:⚠️ Potential issue | 🟠 Majorharden pre-existing directory modes to prevent permission drift.
line 10 sets mode only on creation. if
rootorlogsDiralready exists with broader perms, this call does not tighten them. add best-effortchmodSyncfor both directories aftermkdirSync. evidence: scripts/seed-health-fixture.js:7-10.proposed fix
mkdirSync(logsDir, { recursive: true, mode: 0o700 }); +try { + chmodSync(root, 0o700); +} catch { + // best-effort hardening for non-posix environments. +} +try { + chmodSync(logsDir, 0o700); +} catch { + // best-effort hardening for non-posix environments. +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/seed-health-fixture.js` at line 10, The mkdirSync call using mode 0o700 only sets permissions when creating logsDir; to harden pre-existing directories, follow up the existing mkdirSync(logsDir, { recursive: true, mode: 0o700 }) with best-effort chmodSync calls to tighten permissions on both logsDir and its parent/root directory (use the same 0o700 or appropriate restrictive mode), wrapping each chmodSync in a try/catch to ignore non-fatal errors; reference the logsDir identifier and the existing mkdirSync invocation when locating where to add the chmodSync calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/recovery-drill.yml:
- Around line 1-46: The scheduled "Recovery Drill" workflow's job
"recovery-drill" lacks a failure notification; add a final step (after "Upload
recovery drill artifacts") that runs only on failure (use GitHub Actions
condition if: failure()) and sends a notification (Slack webhook, email, or
other alert) with context and links to the artifact files
(.tmp/recovery-drill-vitest.json and .tmp/recovery-drill-health.json); reference
the job name "recovery-drill" and the artifacts uploaded by "Upload recovery
drill artifacts" when implementing the notifier so failures trigger an immediate
alert to the on-call channel or distribution list.
- Around line 11-29: Add a job-level timeout to the recovery-drill GitHub
Actions job to prevent runaway runs: update the job block named "recovery-drill"
to include a timeout-minutes value (e.g., 30) so the job will be aborted if it
exceeds that time. Locate the "recovery-drill" job definition and add the
timeout-minutes field alongside runs-on and steps to enforce a maximum runtime
for the monthly drill.
In `@docs/upgrade.md`:
- Around line 75-79: The documentation uses npm script examples (e.g., npm run
ops:keychain-assert and npm run ops:health-check) but doesn't state they must be
executed from the project repository checkout where package.json defines these
scripts; update the upgrade notes to add a short clarifying sentence after each
npm command (lines referencing ops:keychain-assert and ops:health-check) stating
"Run this from the project repository root where package.json defines the
enterprise ops scripts, or run your CI/job wrapper that exposes these scripts,"
and optionally provide the equivalent standalone commands or CI wrapper note to
avoid missing-script errors for operators.
In `@scripts/secret-scan-regression.sh`:
- Around line 30-32: The test fixture written by
scripts/secret-scan-regression.sh uses a non-secret literal
("fake_refresh_token_12345") so the scanner assertion passes trivially; update
the here-doc content written to
"${FAIL_CASE_DIR}/test/security/fixtures/fixture.txt" to use a secret-like value
that matches the secret regex used in tests (e.g., a long base64/hex-looking
token or pattern-consistent string) so it mirrors the change in
test/security/secret-scan-regression.test.ts and ensures the assertion exercises
the scanner rather than succeeding trivially.
In `@test/audit-log-forwarder.test.ts`:
- Around line 12-27: The test defines a duplicate removeWithRetry helper; remove
the local async function removeWithRetry from test/audit-log-forwarder.test.ts
and replace it with an import from the shared helper (exported from
test/helpers/remove-with-retry.ts). Locate the local function named
removeWithRetry, delete its declaration, add an import for removeWithRetry from
the shared helper, and ensure any TypeScript types/usage in
audit-log-forwarder.test.ts match the shared helper's signature.
In `@test/security/secret-scan-regression.test.ts`:
- Around line 47-52: The inline regex sanity checks in the test "keeps fixture
allowlist behavior and flags only non-allowlisted secrets" should be extracted
into a separate test to improve isolation and diagnostics: create a new it block
named something like it("fixture allowlist regex handles windows paths") that
contains the two assertions using the regex
/^test[\\/]security[\\/]fixtures[\\/]/i.test("test\\security\\fixtures\\fixture.txt")
and the path normalization assertion
"test\\security\\fixtures\\fixture.txt".replace(/\\/g, "/") ===
"test/security/fixtures/fixture.txt", leaving the original test to only assert
gitleaksConfig contains the allowlist; reference the existing test name and
these regex/assertion expressions when locating and moving the checks.
- Around line 67-78: The allowlist test currently writes non-matching content to
"fixture.txt" so the detector is never triggered; update the writeFile call that
creates fixture.txt in the failCase (created via failCase variable) to include a
secret-like string that matches the secretPattern (e.g. an OPENAI_API_KEY value
such as "OPENAI_API_KEY=sk-...") so the detector would flag it without the
allowlist; keep the other assertions and use collectSyntheticFindings to verify
that the allowlist path actually excludes the file by ensuring the failFindings
includes "test/security/fixtures/real-secret.txt" but does not include
"test/security/fixtures/fixture.txt".
In `@test/slo-budget-report.test.ts`:
- Around line 11-26: The test defines a duplicate removeWithRetry function;
instead delete the local async function removeWithRetry in
test/slo-budget-report.test.ts and import the shared helper from
test/helpers/remove-with-retry.ts (use the exported symbol removeWithRetry) so
all retries use the central implementation; ensure the import matches the
helper's exported name and types and leave existing calls in the test unchanged.
---
Duplicate comments:
In `@docs/operations/incident-drill-template.md`:
- Around line 44-49: Add a short explicit Windows execution note immediately
adjacent to the PowerShell code block containing "npm run ops:health-check" and
the "codex auth ..." commands—e.g., a single-line instruction such as "Run in an
elevated PowerShell session (Run as Administrator)"—so responders know to
execute those commands in an elevated PowerShell environment; update the nearby
PowerShell fence in the incident-drill template to include that one-line note.
In `@scripts/seed-health-fixture.js`:
- Line 10: The mkdirSync call using mode 0o700 only sets permissions when
creating logsDir; to harden pre-existing directories, follow up the existing
mkdirSync(logsDir, { recursive: true, mode: 0o700 }) with best-effort chmodSync
calls to tighten permissions on both logsDir and its parent/root directory (use
the same 0o700 or appropriate restrictive mode), wrapping each chmodSync in a
try/catch to ignore non-fatal errors; reference the logsDir identifier and the
existing mkdirSync invocation when locating where to add the chmodSync calls.
In `@scripts/slo-budget-report.js`:
- Around line 31-42: Remove the redundant existsSync check in loadAuditEntries
and rely on the try/catch around readdir to handle missing or non-directory
paths; specifically, delete the preliminary if (!existsSync(logDir)) return [];
in function loadAuditEntries and keep the existing try { entries = await
readdir(logDir, { withFileTypes: true }); } catch (error) { const code =
error?.code; if (code === "ENOENT" || code === "ENOTDIR") return []; throw
error; } so you avoid the TOCTOU window and still handle ENOENT/ENOTDIR from
readdir.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6f068912-b8ae-4c53-a3bd-56349f881730
📒 Files selected for processing (14)
.github/workflows/recovery-drill.ymldocs/operations/incident-drill-template.mddocs/upgrade.mdlib/audit.tsscripts/audit-log-forwarder.jsscripts/secret-scan-regression.shscripts/seed-health-fixture.jsscripts/slo-budget-report.jstest/audit-log-forwarder.test.tstest/audit-retry.test.tstest/audit.test.tstest/codex-manager-cli.test.tstest/security/secret-scan-regression.test.tstest/slo-budget-report.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/slo-budget-report.test.tstest/audit-retry.test.tstest/codex-manager-cli.test.tstest/audit.test.tstest/security/secret-scan-regression.test.tstest/audit-log-forwarder.test.ts
docs/**
⚙️ CodeRabbit configuration file
keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.
Files:
docs/operations/incident-drill-template.mddocs/upgrade.md
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/audit.ts
🧬 Code graph analysis (7)
test/slo-budget-report.test.ts (3)
test/helpers/remove-with-retry.ts (1)
removeWithRetry(3-18)scripts/slo-budget-report.js (6)
code(37-37)code(55-55)root(128-128)logDir(129-129)policyPath(125-125)payload(192-219)scripts/enterprise-health-check.js (3)
code(164-164)root(213-213)payload(279-286)
test/audit-retry.test.ts (2)
lib/logger.ts (1)
error(389-393)scripts/audit-dev-allowlist.js (1)
audit(150-161)
test/audit.test.ts (1)
lib/audit.ts (3)
auditLog(190-230)getAuditLogPath(232-234)configureAudit(74-77)
test/security/secret-scan-regression.test.ts (1)
test/helpers/remove-with-retry.ts (1)
removeWithRetry(3-18)
scripts/audit-log-forwarder.js (1)
lib/utils.ts (1)
sleep(65-67)
scripts/seed-health-fixture.js (3)
scripts/audit-log-forwarder.js (1)
root(370-370)scripts/slo-budget-report.js (1)
root(128-128)scripts/enterprise-health-check.js (2)
root(213-213)settingsPath(218-218)
test/audit-log-forwarder.test.ts (1)
test/helpers/remove-with-retry.ts (1)
removeWithRetry(3-18)
🔇 Additional comments (31)
docs/upgrade.md (1)
69-94: nice improvement: migration flow order and list structure are now clear.the keychain validation now precedes
auto, and the ordered-list/code-block structure looks consistent. this resolves the earlier readability/lint concerns..github/workflows/recovery-drill.yml (4)
1-10: workflow metadata and permissions look solid.
lib/...:recovery-drill.yml:3-6— schedule + manual dispatch is the right pattern for monthly operational drills.permissions: contents: readat line 8-9 follows least-privilege; no write access needed here.
31-34: delegating toops:recovery-drillis the right call.
lib/...:recovery-drill.yml:31-34— this addresses the prior review feedback to avoid duplicating the test target list inline. the workflow now calls the packaged npm script and forwards reporter flags. clean.
36-37: health check captures stderr now — good.
lib/...:recovery-drill.yml:37—2>&1redirect ensures warnings and errors land in the artifact. addresses prior feedback.one minor note: if
enterprise-health-check.jsemits non-json to stderr, the combined output won't be valid json. downstream consumers should handle that gracefully or the script should emit structured output only.
39-46: artifact upload withif: always()addresses prior feedback.
lib/...:recovery-drill.yml:40-46— ensures diagnostic files are uploaded even when drill steps fail. this is critical for post-incident analysis of scheduled runs.test/security/secret-scan-regression.test.ts (3)
1-7: lgtm on imports.vitest imports are correct, removeWithRetry helper is appropriate for windows filesystem cleanup edge cases. no issues here.
12-34: collectSyntheticFindings looks solid.recursive directory walk with async fs operations, proper path normalization at
test/security/secret-scan-regression.test.ts:26, and sequential checks for secret pattern then allowlist. logic is correct.
36-45: cleanup is correct and windows-safe.
afterEachusesremoveWithRetrywhich handlesEBUSY/EPERMretries - this is the right call for windows filesystem behavior pertest/helpers/remove-with-retry.ts:3-4.scripts/secret-scan-regression.sh (4)
1-15: setup and config validation looks good.strict mode with
set -euo pipefailatscripts/secret-scan-regression.sh:3, proper config path resolution, and fail-fast on missing config. clean.
57-114: run_gitleaks_detect fallback chain is well-structured.native binary → node fallback → docker fallback. the node fallback at lines 74-100 correctly handles path normalization with
/\\/gat line 89 (previous review flagged/\\\\/g). exit codes propagate correctly.one minor note: the node fallback uses synchronous fs operations which could block on large directories, but for a regression test with small synthetic fixtures this is acceptable.
116-151: fail-case and pass-case status handling is correct.
scripts/secret-scan-regression.sh:116-119andscripts/secret-scan-regression.sh:144-147both use explicit status capture withset +e/set -eand targeted error messages. this addresses the previous review feedback about implicit status assumptions.
126-142: findings validation is thorough.checks array non-empty, validates specific file paths, and confirms allowlisted fixture is excluded. the optional chaining
f?.Fileis defensive against malformed output. good.test/audit-retry.test.ts (1)
1-79: lgtm - retry test covers ebusy path and confirms atomics.wait removal.the test correctly exercises the retry loop in
lib/audit.ts:118-131by failing unlinkSync twice with EBUSY before succeeding. theatomicsWaitSpyassertion at line 78 locks in that blocking atomics.wait is no longer used.one optional addition: consider adding test cases for EPERM and EAGAIN since
lib/audit.ts:61includes them inRETRYABLE_AUDIT_FS_CODES. windows can throw EPERM during file contention.lib/audit.ts (3)
113-131: retry helper uses immediate bounded retries - acceptable for sync audit.
lib/audit.ts:118-131retries up to 5 times with no delay between attempts. past reviews discussed adding backoff but the final decision was to keep sync context simple with immediate retries since audit is best-effort. the test attest/audit-retry.test.ts:78confirms atomics.wait is not used.
133-162: purge throttle correctly updates after successful read.
lib/audit.ts:149setslastPurgeAttemptMsonly after a successfulreaddirSync, so transient EBUSY failures won't block purges for the full interval. per-file errors at lines 158-160 are silently swallowed as best-effort, which is appropriate.
216-226: write and chmod hardening looks correct.
lib/audit.ts:217-219wraps writeFileSync in retry helper.lib/audit.ts:220-226guards chmodSync behindprocess.platform !== "win32"with a best-effort catch. this matches the pattern inlib/unified-settings.ts.test/codex-manager-cli.test.ts (2)
1984-2060: audit command mapping test covers key cli flows.
test/codex-manager-cli.test.ts:1984-2060validates that login/switch/check/verify-flagged/forecast/status commands emit correct AuditAction and AuditOutcome values. the failure paths (lines 1993-2000, 2003-2011) and success paths are both covered.
2062-2098: sanitization test verifies token and email redaction in error metadata.
test/codex-manager-cli.test.ts:2062-2098throws an error containing tokens, emails, and a long string, then asserts the audited error is truncated to 200 chars and contains redaction placeholders. this locks in compliance with the lib/** guideline against logging tokens or emails.test/audit.test.ts (2)
211-226: windows branch now has real assertion.
test/audit.test.ts:212-216on win32 now calls auditLog and assertscontent.toContain("\"action\":\"request.start\"")instead of the previous no-opexpect(true).toBe(true). this satisfies the guideline to not skip assertions.
271-299: date spy properly restored across full test body.
test/audit.test.ts:274-298wraps the entire test logic in try/finally so nowSpy.mockRestore() runs even if setup throws. this addresses the past review concern about spy leakage.scripts/audit-log-forwarder.js (3)
68-90: loadCheckpoint handles race between existsSync and readFile.
scripts/audit-log-forwarder.js:69-70uses existsSync but the subsequent readFile at line 73 is wrapped in try/catch (lines 86-88) that returns a safe default checkpoint on any error. this handles the toctou race gracefully.
212-254: sendBatch implements timeout and retry with backoff.
scripts/audit-log-forwarder.js:222-253uses AbortController for timeout, retries 429/5xx and network errors with exponential backoff plus jitter. clearTimeout in finally prevents timer leaks. this addresses the past review concerns about hanging fetches and fragile error handling.
306-344: checkpoint lock handles windows eperm and stale lock detection.
scripts/audit-log-forwarder.js:324treats both EEXIST and EPERM as contention (windows throws EPERM).clearStaleCheckpointLockat line 328 checks mtime and validates pid existence before removing stale locks. the max wait guard at lines 333-338 prevents indefinite blocking.scripts/slo-budget-report.js (2)
89-117: runHealthCheck has timeout, killSignal, and fallback message.
scripts/slo-budget-report.js:98-100uses configurable timeout, SIGTERM kill signal, and maxBuffer. line 105 falls back to error.message when stdout/stderr are empty. this addresses the past review concerns about hanging child processes and empty findings.
146-152: stale findings count includes both stale-wal and stale-audit-log.
scripts/slo-budget-report.js:148-150filters for both finding codes, fixing the mismatch flagged in past reviews where only "stale-wal" was counted but tests expected "stale-audit-log".test/slo-budget-report.test.ts (1)
129-140: windows path test verifies backslash handling.
test/slo-budget-report.test.ts:132setsCODEX_SLO_HEALTH_CHECK_SCRIPTwith a windows-style backslash path and line 138 asserts it's passed through correctly. this covers the windows edge case.test/audit-log-forwarder.test.ts (2)
324-359: newest log race test addresses past review request.
test/audit-log-forwarder.test.ts:324-359creates a log file, deletes it during the server handler (simulating race between collection and final stat), and assertsnewestLogMtimeMs: nullwith exit status 0. this covers the churn regression flagged in past reviews.
208-253: lock contention test validates concurrency behavior.
test/audit-log-forwarder.test.ts:208-253creates a lock file, releases it after 120ms via setTimeout, and verifies the forwarder waits and completes successfully. this is a good regression for concurrent process scenarios per the test/** guideline.scripts/seed-health-fixture.js (3)
40-44: windows edge case handling for chmod is good.the try/catch blocks correctly keep the script portable on non-posix systems while still attempting hardening on posix. evidence: scripts/seed-health-fixture.js:40-44, scripts/seed-health-fixture.js:51-55, scripts/seed-health-fixture.js:62-66.
Also applies to: 51-55, 62-66
6-8: no actual concurrency risk in current ci workflow.the script runs once sequentially in
.github/workflows/ci.yml:64with no matrix strategy or parallel job invocation. each github actions job has isolated workspace by default, soGITHUB_WORKSPACEpoints to unique per-job storage. a concurrency issue only exists if: (1) multiple jobs in the same workflow run this script simultaneously, or (2) developers manually invoke it in parallel locally—neither is evident. remove this flag unless the workflow will run this script from multiple concurrent jobs.
1-66: add regression tests for fixture permissions and matrix execution.this security-sensitive fixture path lacks dedicated regression test coverage. seed-health-fixture.js:1-66 sets 0o600/0o700 permissions and runs in ci.yml:64 across a windows-latest matrix, where chmodSync will fail (though gracefully caught). while the try-catch handles posix-incompatibility, there's no test verifying permissions enforcement or confirming the fixture handles repeated invocation safely across parallel matrix runs. add direct coverage for permission bits and concurrent seeding behavior.
| async function removeWithRetry(targetPath: string): Promise<void> { | ||
| const retryableCodes = new Set(["EBUSY", "EPERM", "ENOTEMPTY"]); | ||
| for (let attempt = 0; attempt < 6; attempt += 1) { | ||
| try { | ||
| await fs.rm(targetPath, { recursive: true, force: true }); | ||
| return; | ||
| } catch (error) { | ||
| const code = (error as NodeJS.ErrnoException).code; | ||
| if (code === "ENOENT") return; | ||
| if (!code || !retryableCodes.has(code) || attempt === 5) { | ||
| throw error; | ||
| } | ||
| await new Promise((resolve) => setTimeout(resolve, 25 * 2 ** attempt)); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
duplicate removeWithRetry helper - import from shared location.
test/slo-budget-report.test.ts:11-26 duplicates the helper from test/helpers/remove-with-retry.ts. import the shared version to keep test utilities dry.
proposed fix
+import { removeWithRetry } from "./helpers/remove-with-retry.js";
-async function removeWithRetry(targetPath: string): Promise<void> {
- const retryableCodes = new Set(["EBUSY", "EPERM", "ENOTEMPTY"]);
- for (let attempt = 0; attempt < 6; attempt += 1) {
- try {
- await fs.rm(targetPath, { recursive: true, force: true });
- return;
- } catch (error) {
- const code = (error as NodeJS.ErrnoException).code;
- if (code === "ENOENT") return;
- if (!code || !retryableCodes.has(code) || attempt === 5) {
- throw error;
- }
- await new Promise((resolve) => setTimeout(resolve, 25 * 2 ** attempt));
- }
- }
-}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async function removeWithRetry(targetPath: string): Promise<void> { | |
| const retryableCodes = new Set(["EBUSY", "EPERM", "ENOTEMPTY"]); | |
| for (let attempt = 0; attempt < 6; attempt += 1) { | |
| try { | |
| await fs.rm(targetPath, { recursive: true, force: true }); | |
| return; | |
| } catch (error) { | |
| const code = (error as NodeJS.ErrnoException).code; | |
| if (code === "ENOENT") return; | |
| if (!code || !retryableCodes.has(code) || attempt === 5) { | |
| throw error; | |
| } | |
| await new Promise((resolve) => setTimeout(resolve, 25 * 2 ** attempt)); | |
| } | |
| } | |
| } | |
| import { removeWithRetry } from "./helpers/remove-with-retry.js"; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/slo-budget-report.test.ts` around lines 11 - 26, The test defines a
duplicate removeWithRetry function; instead delete the local async function
removeWithRetry in test/slo-budget-report.test.ts and import the shared helper
from test/helpers/remove-with-retry.ts (use the exported symbol removeWithRetry)
so all retries use the central implementation; ensure the import matches the
helper's exported name and types and leave existing calls in the test unchanged.
- add recovery drill timeout and failure notifier hook - clarify upgrade runbook npm script execution context - harden secret-scan allowlist regression fixtures and test isolation - reuse shared removeWithRetry helper in audit forwarder tests Co-authored-by: Codex <noreply@openai.com>
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/recovery-drill.yml:
- Around line 11-15: The workflow lacks a concurrency key so the recovery-drill
job can run in parallel; add a top-level concurrency stanza to the workflow
(e.g., concurrency: { group: "recovery-drill", cancel-in-progress: true } or a
stable group name using github.workflow and github.ref) to ensure the Monthly
Storage Recovery Drill job (recovery-drill) cannot start a new run while an
existing run is in progress and to optionally cancel in-progress runs when a new
one is triggered.
- Around line 54-62: Replace the unsafe printf-based JSON construction and add a
curl timeout: instead of building payload with printf '{"text":"%s"}'
"${message}", use jq to produce properly escaped JSON by running jq -n --arg msg
"${message}" '{"text": $msg}' and assign that to payload (or stream it directly
to curl), and add --max-time 30 to the curl invocation that posts to
"${RECOVERY_DRILL_WEBHOOK_URL}" while keeping the existing flags (--fail
--silent --show-error) and the RUN_URL/message variables unchanged.
In `@docs/upgrade.md`:
- Around line 79-80: In docs/upgrade.md there are fenced code blocks that close
(the ``` after the bash examples such as the block around "npm run
ops:keychain-assert" and the block around "npm run ops:health-check") and are
immediately followed by paragraph text; add a single blank line after each
closing fence so the paragraph text is separated from the fenced code block
(i.e., insert a blank line after the closing ``` for the ops:keychain-assert
block and after the closing ``` for the ops:health-check block) to satisfy
markdownlint MD031.
In `@test/audit-log-forwarder.test.ts`:
- Around line 13-35: The runForwarder harness can hang forever; add a hard
timeout that kills the spawned child and resolves the promise when exceeded.
Inside runForwarder (the child variable returned by spawn), start a timer (e.g.,
with setTimeout) that calls child.kill() and then resolves with a timeout result
(status null and an explanatory stderr fragment or message) if the process
hasn't exited; clear the timer on child 'close' and on 'error' to avoid leaks,
and ensure the same stdout/stderr aggregation is preserved before resolving.
- Around line 210-223: The test's lock-release timer (releaseTimer using
setTimeout to fs.unlink(checkpointLockPath)) races with the forwarder's timeout
(CODEX_AUDIT_FORWARDER_TIMEOUT_MS), causing flakes; replace the fixed 120ms
wakeup with a deterministic approach—either increase the delay to exceed the
timeout window (e.g., 500ms) or, preferably, replace the timer with a
polling/wait helper that repeatedly checks for the forwarder hitting the lock
wait state before calling fs.unlink(checkpointLockPath); update the test around
runForwarder and the env var CODEX_AUDIT_FORWARDER_TIMEOUT_MS accordingly and
remove the fragile 120ms timing.
- Around line 239-272: The test currently only asserts exit status; update the
"clears stale checkpoint locks and proceeds" test to also assert that the
forwarder actually processed the log and advanced the checkpoint: when calling
withServer and runForwarder, capture and assert the HTTP request payload/body
was received (via the withServer request handler) contains the expected audit
event, and after runForwarder finishes read checkpointPath to assert it exists
and contains an updated checkpoint (e.g., a timestamp or offset indicating
progress) and that checkpointLockPath was removed or replaced; use the existing
helpers runForwarder, withServer, checkpointPath, checkpointLockPath, and logDir
to locate files and the server handler to capture payload, keeping assertions
deterministic under vitest.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: dd50ee38-1b3a-42fa-b76c-ee46f056052e
📒 Files selected for processing (5)
.github/workflows/recovery-drill.ymldocs/upgrade.mdscripts/secret-scan-regression.shtest/audit-log-forwarder.test.tstest/security/secret-scan-regression.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
docs/**
⚙️ CodeRabbit configuration file
keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.
Files:
docs/upgrade.md
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/audit-log-forwarder.test.tstest/security/secret-scan-regression.test.ts
🧬 Code graph analysis (1)
test/security/secret-scan-regression.test.ts (1)
test/helpers/remove-with-retry.ts (1)
removeWithRetry(3-18)
🪛 markdownlint-cli2 (0.21.0)
docs/upgrade.md
[warning] 79-79: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 95-95: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🔇 Additional comments (8)
docs/upgrade.md (1)
98-102: good windows migration guardrails here.the
EBUSY/EPERMnote and handle-lock guidance are clear and practical for operators on windows indocs/upgrade.mdline 98-102.test/security/secret-scan-regression.test.ts (2)
12-34: solid synthetic scanner helper for deterministic assertions.
test/security/secret-scan-regression.test.ts:12keeps path normalization and allowlist filtering explicit, which makes the regression deterministic across os variants.
47-54: good windows and allowlist regression coverage.
test/security/secret-scan-regression.test.ts:47andtest/security/secret-scan-regression.test.ts:56now validate windows path handling and prove the allowlisted fixture is excluded while real findings are still reported.As per coding guidelines
test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.Also applies to: 56-93
scripts/secret-scan-regression.sh (1)
57-114: execution paths and status handling look correct.
scripts/secret-scan-regression.sh:57andscripts/secret-scan-regression.sh:116correctly enforce fail-case/pass-case expectations across native, docker, and fallback scanner paths.Also applies to: 116-153
.github/workflows/recovery-drill.yml (3)
1-10: workflow header looks solid.minimal
permissions: contents: read, monthly cron trigger, and manual dispatch support are all appropriate for an enterprise recovery drill. no issues here.
32-38: drill test and health check steps look good.
lib/...:recovery-drill.yml:32-38— delegates tonpm run ops:recovery-drillas requested, and the health check now captures stderr with2>&1. both past concerns addressed.
40-47: artifact upload unconditionally — good.
lib/...:recovery-drill.yml:40-47—if: always()ensures drill artifacts are preserved for post-mortem even when earlier steps fail. past concern addressed.test/audit-log-forwarder.test.ts (1)
308-343: good regression coverage for newest-log disappearance race.
test/audit-log-forwarder.test.ts:308-343(Line 308) is solid: it validates success exit,newestLogMtimeMs: null, and checkpoint advancement after file disappearance.
| jobs: | ||
| recovery-drill: | ||
| name: Monthly Storage Recovery Drill | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 30 |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
consider adding a concurrency group to prevent parallel drill runs.
lib/...:recovery-drill.yml:11-15 — no concurrency key means a manual workflow_dispatch during an in-progress scheduled run would run both in parallel. for a drill that touches storage/secrets fixtures, parallel runs could produce confusing or conflicting results.
proposed fix
jobs:
recovery-drill:
name: Monthly Storage Recovery Drill
runs-on: ubuntu-latest
timeout-minutes: 30
+ concurrency:
+ group: recovery-drill
+ cancel-in-progress: false
steps:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| jobs: | |
| recovery-drill: | |
| name: Monthly Storage Recovery Drill | |
| runs-on: ubuntu-latest | |
| timeout-minutes: 30 | |
| jobs: | |
| recovery-drill: | |
| name: Monthly Storage Recovery Drill | |
| runs-on: ubuntu-latest | |
| timeout-minutes: 30 | |
| concurrency: | |
| group: recovery-drill | |
| cancel-in-progress: false | |
| steps: |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/recovery-drill.yml around lines 11 - 15, The workflow
lacks a concurrency key so the recovery-drill job can run in parallel; add a
top-level concurrency stanza to the workflow (e.g., concurrency: { group:
"recovery-drill", cancel-in-progress: true } or a stable group name using
github.workflow and github.ref) to ensure the Monthly Storage Recovery Drill job
(recovery-drill) cannot start a new run while an existing run is in progress and
to optionally cancel in-progress runs when a new one is triggered.
| run: | | ||
| message="Recovery drill failed. Run: ${RUN_URL}. Artifacts: .tmp/recovery-drill-vitest.json and .tmp/recovery-drill-health.json." | ||
| if [[ -n "${RECOVERY_DRILL_WEBHOOK_URL:-}" ]]; then | ||
| payload=$(printf '{"text":"%s"}' "${message}") | ||
| curl --fail --silent --show-error \ | ||
| -X POST \ | ||
| -H "Content-Type: application/json" \ | ||
| --data "${payload}" \ | ||
| "${RECOVERY_DRILL_WEBHOOK_URL}" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
git ls-files | grep -i "recovery-drill"Repository: ndycode/codex-multi-auth
Length of output: 104
🏁 Script executed:
cat -n .github/workflows/recovery-drill.yml | sed -n '50,65p'Repository: ndycode/codex-multi-auth
Length of output: 1017
🌐 Web query:
jq availability ubuntu-latest GitHub Actions runner 2024 2025
💡 Result:
jqis preinstalled on GitHub-hosted Ubuntu runners. Onubuntu-22.04the image includes jq 1.6. (github.com)- On
ubuntu-24.04the image includes jq 1.7 (APT package shown as 1.7.1). (github.com) ubuntu-latestswitched from Ubuntu 22.04 to Ubuntu 24.04 during Dec 2024–Jan 2025 (gradual migration), so the defaultjqonubuntu-latesteffectively moved from 1.6 → 1.7 across that window. To avoid surprises, pinruns-on: ubuntu-22.04orubuntu-24.04. (github.com)
Citations:
- 1: https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2204-Readme.md
- 2: https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2404-Readme.md
- 3: [cli][Report] GitHub Actions with
ubuntu-latestNow Runs on Ubuntu 24.04, Which No Longer Includes Vercel CLI vercel/vercel#12801
use jq for json escaping instead of printf.
.github/workflows/recovery-drill.yml:57 — printf '{"text":"%s"}' "${message}" doesn't escape special chars in the payload. the current message is safe since it's built from github context variables, but future changes could break this silently. jq is preinstalled on ubuntu-latest (1.7+), so replace the printf call with jq -n --arg msg "${message}" '{"text": $msg}' for proper escaping.
also add --max-time 30 to curl to prevent indefinite hangs if the webhook endpoint doesn't respond.
proposed fix
run: |
message="Recovery drill failed. Run: ${RUN_URL}. Artifacts: .tmp/recovery-drill-vitest.json and .tmp/recovery-drill-health.json."
if [[ -n "${RECOVERY_DRILL_WEBHOOK_URL:-}" ]]; then
- payload=$(printf '{"text":"%s"}' "${message}")
+ payload=$(jq -n --arg msg "${message}" '{"text": $msg}')
curl --fail --silent --show-error \
+ --max-time 30 \
-X POST \
-H "Content-Type: application/json" \
--data "${payload}" \
"${RECOVERY_DRILL_WEBHOOK_URL}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/recovery-drill.yml around lines 54 - 62, Replace the
unsafe printf-based JSON construction and add a curl timeout: instead of
building payload with printf '{"text":"%s"}' "${message}", use jq to produce
properly escaped JSON by running jq -n --arg msg "${message}" '{"text": $msg}'
and assign that to payload (or stream it directly to curl), and add --max-time
30 to the curl invocation that posts to "${RECOVERY_DRILL_WEBHOOK_URL}" while
keeping the existing flags (--fail --silent --show-error) and the
RUN_URL/message variables unchanged.
| ``` | ||
| Run this from the project repository root where `package.json` defines enterprise ops scripts, or run your CI/job wrapper that exposes these scripts. |
There was a problem hiding this comment.
fix md031 fence spacing around list-item code blocks.
line 79 and line 95 close fenced blocks and immediately continue with paragraph text. add a blank line after each closing fence so markdownlint stays clean in docs/upgrade.md line 79 and docs/upgrade.md line 95.
proposed patch
2. Validate keychain backend availability:
```bash
npm run ops:keychain-assert
```
+
Run this from the project repository root where `package.json` defines enterprise ops scripts, or run your CI/job wrapper that exposes these scripts.
@@
5. Verify health and storage state:
```bash
npm run ops:health-check -- --require-files
```
+
Run this from the same repository checkout (or your standard CI/job wrapper).Also applies to: 95-96
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 79-79: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/upgrade.md` around lines 79 - 80, In docs/upgrade.md there are fenced
code blocks that close (the ``` after the bash examples such as the block around
"npm run ops:keychain-assert" and the block around "npm run ops:health-check")
and are immediately followed by paragraph text; add a single blank line after
each closing fence so the paragraph text is separated from the fenced code block
(i.e., insert a blank line after the closing ``` for the ops:keychain-assert
block and after the closing ``` for the ops:health-check block) to satisfy
markdownlint MD031.
| function runForwarder( | ||
| args: string[], | ||
| env: NodeJS.ProcessEnv = {}, | ||
| ): Promise<{ status: number | null; stdout: string; stderr: string }> { | ||
| return new Promise((resolve, reject) => { | ||
| const child = spawn(process.execPath, [scriptPath, ...args], { | ||
| env: { ...process.env, ...env }, | ||
| stdio: ["ignore", "pipe", "pipe"], | ||
| }); | ||
| let stdout = ""; | ||
| let stderr = ""; | ||
| child.stdout.on("data", (chunk) => { | ||
| stdout += chunk.toString(); | ||
| }); | ||
| child.stderr.on("data", (chunk) => { | ||
| stderr += chunk.toString(); | ||
| }); | ||
| child.on("error", reject); | ||
| child.on("close", (status) => { | ||
| resolve({ status, stdout, stderr }); | ||
| }); | ||
| }); | ||
| } |
There was a problem hiding this comment.
add a hard timeout to the child-process harness.
in test/audit-log-forwarder.test.ts:13-35 (Line 13), runForwarder can hang forever if the script never exits, which makes this suite non-deterministic and can deadlock ci.
proposed fix
function runForwarder(
args: string[],
env: NodeJS.ProcessEnv = {},
+ timeoutMs = 10_000,
): Promise<{ status: number | null; stdout: string; stderr: string }> {
return new Promise((resolve, reject) => {
const child = spawn(process.execPath, [scriptPath, ...args], {
env: { ...process.env, ...env },
stdio: ["ignore", "pipe", "pipe"],
});
+ const timeout = setTimeout(() => {
+ child.kill();
+ reject(new Error(`runForwarder timed out after ${timeoutMs}ms`));
+ }, timeoutMs);
let stdout = "";
let stderr = "";
child.stdout.on("data", (chunk) => {
stdout += chunk.toString();
});
child.stderr.on("data", (chunk) => {
stderr += chunk.toString();
});
- child.on("error", reject);
+ child.on("error", (error) => {
+ clearTimeout(timeout);
+ reject(error);
+ });
child.on("close", (status) => {
+ clearTimeout(timeout);
resolve({ status, stdout, stderr });
});
});
}As per coding guidelines test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function runForwarder( | |
| args: string[], | |
| env: NodeJS.ProcessEnv = {}, | |
| ): Promise<{ status: number | null; stdout: string; stderr: string }> { | |
| return new Promise((resolve, reject) => { | |
| const child = spawn(process.execPath, [scriptPath, ...args], { | |
| env: { ...process.env, ...env }, | |
| stdio: ["ignore", "pipe", "pipe"], | |
| }); | |
| let stdout = ""; | |
| let stderr = ""; | |
| child.stdout.on("data", (chunk) => { | |
| stdout += chunk.toString(); | |
| }); | |
| child.stderr.on("data", (chunk) => { | |
| stderr += chunk.toString(); | |
| }); | |
| child.on("error", reject); | |
| child.on("close", (status) => { | |
| resolve({ status, stdout, stderr }); | |
| }); | |
| }); | |
| } | |
| function runForwarder( | |
| args: string[], | |
| env: NodeJS.ProcessEnv = {}, | |
| timeoutMs = 10_000, | |
| ): Promise<{ status: number | null; stdout: string; stderr: string }> { | |
| return new Promise((resolve, reject) => { | |
| const child = spawn(process.execPath, [scriptPath, ...args], { | |
| env: { ...process.env, ...env }, | |
| stdio: ["ignore", "pipe", "pipe"], | |
| }); | |
| const timeout = setTimeout(() => { | |
| child.kill(); | |
| reject(new Error(`runForwarder timed out after ${timeoutMs}ms`)); | |
| }, timeoutMs); | |
| let stdout = ""; | |
| let stderr = ""; | |
| child.stdout.on("data", (chunk) => { | |
| stdout += chunk.toString(); | |
| }); | |
| child.stderr.on("data", (chunk) => { | |
| stderr += chunk.toString(); | |
| }); | |
| child.on("error", (error) => { | |
| clearTimeout(timeout); | |
| reject(error); | |
| }); | |
| child.on("close", (status) => { | |
| clearTimeout(timeout); | |
| resolve({ status, stdout, stderr }); | |
| }); | |
| }); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/audit-log-forwarder.test.ts` around lines 13 - 35, The runForwarder
harness can hang forever; add a hard timeout that kills the spawned child and
resolves the promise when exceeded. Inside runForwarder (the child variable
returned by spawn), start a timer (e.g., with setTimeout) that calls
child.kill() and then resolves with a timeout result (status null and an
explanatory stderr fragment or message) if the process hasn't exited; clear the
timer on child 'close' and on 'error' to avoid leaks, and ensure the same
stdout/stderr aggregation is preserved before resolving.
| const releaseTimer = setTimeout(async () => { | ||
| await fs.unlink(checkpointLockPath).catch(() => {}); | ||
| }, 120); | ||
| try { | ||
| const result = await runForwarder( | ||
| [ | ||
| `--endpoint=${endpoint}`, | ||
| `--log-dir=${logDir}`, | ||
| `--checkpoint=${checkpointPath}`, | ||
| ], | ||
| { | ||
| CODEX_AUDIT_FORWARDER_MAX_ATTEMPTS: "2", | ||
| CODEX_AUDIT_FORWARDER_TIMEOUT_MS: "300", | ||
| }, |
There was a problem hiding this comment.
remove the timer race in the lock-release test.
in test/audit-log-forwarder.test.ts:210-223 (Line 210), releasing the lock at 120ms with a 300ms timeout budget is tight and can flake on slower windows runners under load.
proposed fix
- const releaseTimer = setTimeout(async () => {
+ const releaseTimer = setTimeout(async () => {
await fs.unlink(checkpointLockPath).catch(() => {});
- }, 120);
+ }, 50);
try {
const result = await runForwarder(
[
`--endpoint=${endpoint}`,
`--log-dir=${logDir}`,
`--checkpoint=${checkpointPath}`,
],
{
CODEX_AUDIT_FORWARDER_MAX_ATTEMPTS: "2",
CODEX_AUDIT_FORWARDER_TIMEOUT_MS: "300",
+ CODEX_AUDIT_FORWARDER_MAX_WAIT_MS: "2000",
+ CODEX_AUDIT_FORWARDER_LOCK_MAX_ATTEMPTS: "200",
},
);As per coding guidelines test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/audit-log-forwarder.test.ts` around lines 210 - 223, The test's
lock-release timer (releaseTimer using setTimeout to
fs.unlink(checkpointLockPath)) races with the forwarder's timeout
(CODEX_AUDIT_FORWARDER_TIMEOUT_MS), causing flakes; replace the fixed 120ms
wakeup with a deterministic approach—either increase the delay to exceed the
timeout window (e.g., 500ms) or, preferably, replace the timer with a
polling/wait helper that repeatedly checks for the forwarder hitting the lock
wait state before calling fs.unlink(checkpointLockPath); update the test around
runForwarder and the env var CODEX_AUDIT_FORWARDER_TIMEOUT_MS accordingly and
remove the fragile 120ms timing.
| it("clears stale checkpoint locks and proceeds", async () => { | ||
| const root = mkdtempSync(path.join(tmpdir(), "audit-forwarder-stale-lock-")); | ||
| fixtures.push(root); | ||
| const logDir = path.join(root, "logs"); | ||
| const checkpointPath = path.join(root, "checkpoint.json"); | ||
| const checkpointLockPath = `${checkpointPath}.lock`; | ||
| await fs.mkdir(logDir, { recursive: true }); | ||
| await fs.writeFile( | ||
| path.join(logDir, "audit.log"), | ||
| '{"timestamp":"2026-03-01T00:00:00Z","action":"request.start"}\n', | ||
| "utf8", | ||
| ); | ||
| await fs.writeFile(checkpointLockPath, "999999\n", "utf8"); | ||
| const staleDate = new Date(Date.now() - 60 * 1000); | ||
| utimesSync(checkpointLockPath, staleDate, staleDate); | ||
|
|
||
| await withServer((_req, res) => { | ||
| res.statusCode = 200; | ||
| res.end("ok"); | ||
| }, async (endpoint) => { | ||
| const result = await runForwarder( | ||
| [ | ||
| `--endpoint=${endpoint}`, | ||
| `--log-dir=${logDir}`, | ||
| `--checkpoint=${checkpointPath}`, | ||
| ], | ||
| { | ||
| CODEX_AUDIT_FORWARDER_STALE_LOCK_MS: "50", | ||
| CODEX_AUDIT_FORWARDER_MAX_WAIT_MS: "500", | ||
| }, | ||
| ); | ||
| expect(result.status).toBe(0); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
strengthen stale-lock assertions to prevent false positives.
in test/audit-log-forwarder.test.ts:239-272 (Line 239), this regression only checks exit status. add payload and checkpoint assertions so it proves the stale lock was actually cleared and processing advanced.
proposed fix
}, async (endpoint) => {
const result = await runForwarder(
[
`--endpoint=${endpoint}`,
`--log-dir=${logDir}`,
`--checkpoint=${checkpointPath}`,
],
{
CODEX_AUDIT_FORWARDER_STALE_LOCK_MS: "50",
CODEX_AUDIT_FORWARDER_MAX_WAIT_MS: "500",
},
);
expect(result.status).toBe(0);
+ const payload = parseJsonStdout(result.stdout);
+ expect(payload.status).toBe("sent");
+ expect(payload.sent).toBe(1);
});
+
+ const checkpoint = JSON.parse(await fs.readFile(checkpointPath, "utf8")) as {
+ file?: string;
+ line?: number;
+ };
+ expect(checkpoint.file).toBe("audit.log");
+ expect(checkpoint.line).toBe(1);
});As per coding guidelines test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| it("clears stale checkpoint locks and proceeds", async () => { | |
| const root = mkdtempSync(path.join(tmpdir(), "audit-forwarder-stale-lock-")); | |
| fixtures.push(root); | |
| const logDir = path.join(root, "logs"); | |
| const checkpointPath = path.join(root, "checkpoint.json"); | |
| const checkpointLockPath = `${checkpointPath}.lock`; | |
| await fs.mkdir(logDir, { recursive: true }); | |
| await fs.writeFile( | |
| path.join(logDir, "audit.log"), | |
| '{"timestamp":"2026-03-01T00:00:00Z","action":"request.start"}\n', | |
| "utf8", | |
| ); | |
| await fs.writeFile(checkpointLockPath, "999999\n", "utf8"); | |
| const staleDate = new Date(Date.now() - 60 * 1000); | |
| utimesSync(checkpointLockPath, staleDate, staleDate); | |
| await withServer((_req, res) => { | |
| res.statusCode = 200; | |
| res.end("ok"); | |
| }, async (endpoint) => { | |
| const result = await runForwarder( | |
| [ | |
| `--endpoint=${endpoint}`, | |
| `--log-dir=${logDir}`, | |
| `--checkpoint=${checkpointPath}`, | |
| ], | |
| { | |
| CODEX_AUDIT_FORWARDER_STALE_LOCK_MS: "50", | |
| CODEX_AUDIT_FORWARDER_MAX_WAIT_MS: "500", | |
| }, | |
| ); | |
| expect(result.status).toBe(0); | |
| }); | |
| }); | |
| it("clears stale checkpoint locks and proceeds", async () => { | |
| const root = mkdtempSync(path.join(tmpdir(), "audit-forwarder-stale-lock-")); | |
| fixtures.push(root); | |
| const logDir = path.join(root, "logs"); | |
| const checkpointPath = path.join(root, "checkpoint.json"); | |
| const checkpointLockPath = `${checkpointPath}.lock`; | |
| await fs.mkdir(logDir, { recursive: true }); | |
| await fs.writeFile( | |
| path.join(logDir, "audit.log"), | |
| '{"timestamp":"2026-03-01T00:00:00Z","action":"request.start"}\n', | |
| "utf8", | |
| ); | |
| await fs.writeFile(checkpointLockPath, "999999\n", "utf8"); | |
| const staleDate = new Date(Date.now() - 60 * 1000); | |
| utimesSync(checkpointLockPath, staleDate, staleDate); | |
| await withServer((_req, res) => { | |
| res.statusCode = 200; | |
| res.end("ok"); | |
| }, async (endpoint) => { | |
| const result = await runForwarder( | |
| [ | |
| `--endpoint=${endpoint}`, | |
| `--log-dir=${logDir}`, | |
| `--checkpoint=${checkpointPath}`, | |
| ], | |
| { | |
| CODEX_AUDIT_FORWARDER_STALE_LOCK_MS: "50", | |
| CODEX_AUDIT_FORWARDER_MAX_WAIT_MS: "500", | |
| }, | |
| ); | |
| expect(result.status).toBe(0); | |
| const payload = parseJsonStdout(result.stdout); | |
| expect(payload.status).toBe("sent"); | |
| expect(payload.sent).toBe(1); | |
| }); | |
| const checkpoint = JSON.parse(await fs.readFile(checkpointPath, "utf8")) as { | |
| file?: string; | |
| line?: number; | |
| }; | |
| expect(checkpoint.file).toBe("audit.log"); | |
| expect(checkpoint.line).toBe(1); | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/audit-log-forwarder.test.ts` around lines 239 - 272, The test currently
only asserts exit status; update the "clears stale checkpoint locks and
proceeds" test to also assert that the forwarder actually processed the log and
advanced the checkpoint: when calling withServer and runForwarder, capture and
assert the HTTP request payload/body was received (via the withServer request
handler) contains the expected audit event, and after runForwarder finishes read
checkpointPath to assert it exists and contains an updated checkpoint (e.g., a
timestamp or offset indicating progress) and that checkpointLockPath was removed
or replaced; use the existing helpers runForwarder, withServer, checkpointPath,
checkpointLockPath, and logDir to locate files and the server handler to capture
payload, keeping assertions deterministic under vitest.
Summary
Validation
npm run typechecknpm run lintnpm run buildnpm testnpm run coveragenpm run ops:health-checknpm run perf:budget-checknpm run ops:retention-cleanup -- --days=90Post-review updates (2026-03-05)
What changed
docs/configuration.mdforCODEX_SECRET_STORAGE_MODEso rendered tables no longer split on pipe characters.test/audit-retry.test.tsto validate retry behavior for transientEBUSYcontention during stale audit log purge.test/enterprise-health-check.test.tswith fallback-audit regression coverage to ensure stale audit checks target fallbackmulti-auth/logslocations and remain Windows path-safe.scripts/enterprise-health-check.jsthat audit directory derivation intentionally mirrors runtime log-dir semantics from resolved multi-auth root.How to test
npm run test -- test/audit-retry.test.ts test/audit.test.ts test/schemas.test.ts test/token-store.test.ts test/enterprise-health-check.test.tsnpm run typechecknpm run lint:tsnpm run lint:scriptsRisk / rollout notes
PR Resolution Update (2026-03-05)
What Changed
How To Test
pm run typecheck
n- pm run lintn-pm run build
n- pm test -- test/audit-retry.test.ts test/audit.test.tsn-pm test`n
Risk / Rollout Notes
PR Resolution Update (2026-03-05, final pass)
What changed
scripts/audit-log-forwarder.js: rotation-safe checkpoint resume, timeout+retry backoff, checkpoint locking and atomic writes.scripts/retention-cleanup.js: directory churn guards and non-zero exit on partial failures.scripts/compliance-evidence-bundle.jsandscripts/generate-sbom.js: explicit largemaxBuffersupport.scripts/slo-budget-report.js: stale finding aggregation now includes bothstale-walandstale-audit-log.lib/audit.ts: purge throttle timing corrected and secure/retry-safe append behavior.lib/codex-manager.ts: sanitized audit error metadata.lib/secrets/token-store.ts: independent refresh/access secret delete attempts.lib/storage.ts: v4 hydration family-index remap when accounts are skipped.ops:recovery-drillcommand.v8.25.0.Added/updated regression tests
test/audit-log-forwarder.test.tstest/retention-cleanup.test.tstest/compliance-evidence-bundle.test.tstest/generate-sbom.test.tstest/slo-budget-report.test.tstest/audit.test.tstest/token-store.test.tstest/storage-v4-keychain.test.tstest/unified-settings.test.tstest/enterprise-health-check.test.tstest/schemas.test.tstest/security/secret-scan-regression.test.shHow to test
npm run lintnpm run typechecknpm testRisk / rollout notes
Thread Resolution Update (2026-03-05, final unblock)
What changed
if: always()to preserve diagnostics on failed runs.Atomics.waitfromlib/audit.tsretry path and updated regression assertions accordingly.scripts/audit-log-forwarder.js: readdir/stat churn guards, safe final mtime lookup, checkpoint tmp cleanup inside lock.scripts/slo-budget-report.js: log-dir churn guards (ENOENT/ENOTDIR) for directory and file reads.scripts/secret-scan-regression.sh: Windows path separator normalization fixed.scripts/seed-health-fixture.js: explicit0o700/0o600modes with best-effort chmod hardening.test/audit-log-forwarder.test.tstest/slo-budget-report.test.tstest/codex-manager-cli.test.tstest/audit-retry.test.tstest/audit.test.tstest/security/secret-scan-regression.test.tsHow to test
npx vitest run test/audit-retry.test.ts test/audit.test.ts test/audit-log-forwarder.test.ts test/slo-budget-report.test.ts test/security/secret-scan-regression.test.ts test/codex-manager-cli.test.tsnpm run lintnpm run typecheckRisk / rollout notes
Thread Resolution Update (2026-03-05, follow-up pass)
What changed
ecovery-drill job and added an if: failure() notifier step with optional webhook delivery (secrets.RECOVERY_DRILL_WEBHOOK_URL) plus run/artifact context.
pm run ops:keychain-assert and
pm run ops:health-check must be run from repository root (or CI/job wrapper).
emoveWithRetry helper import from est/helpers/remove-with-retry.ts.
How to test
px vitest run test/audit-log-forwarder.test.ts test/security/secret-scan-regression.test.ts
pm run lint
pm run typecheck
Risk / rollout notes
note: greptile review for oc-chatgpt-multi-auth. cite files like
lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.Greptile Summary
this pr lands keychain-backed token-at-rest storage (v4 schema), hardens local file permissions across audit/settings/config, adds ci secret scanning (gitleaks), and ships operational runbooks + health/retention/slo scripts. it's a broad enterprise readiness baseline touching 63 files.
key changes and findings:
lib/secrets/token-store.ts,lib/storage.ts): v4 schema stores only opaque refs on disk; tokens live in the os keychain via keytar. save/load/clear are all wrapped inwithStorageLock. rollback on partial write failure is implemented and tested. solid.lib/audit.ts):withRetryableAuditFsOperationspins 5 times with zero delay between attempts. since it's fully synchronous, all retries fire within microseconds — real windows AV-heldEBUSYlocks won't release in that window. the test validates retry count but not timing, so it passes without exercising the actual contention scenario.scripts/audit-log-forwarder.js): thecollectBatchloop reads each file with noENOENTguard. if retention cleanup deletes a file betweendiscoverAuditFilesand thereadFile, the forwarder crashes. all other file reads in this script are guarded.lib/codex-manager.ts):runWithAudithardcodes"cli-user"as the actor for every command. enterprise audit trails lose per-account attribution.0o700dirs and0o600files with best-effortchmodSyncafter rename on non-windows; correct and consistent acrosslib/config.ts,lib/unified-settings.ts, andlib/audit.ts.--provenance, retention drill, monthly recovery drill all look correct.pull-requests: writeon secret scan is intentional for gitleaks pr annotations.test/storage-v4-keychain.test.tsandtest/token-store.test.tscover round-trips, concurrent writes, EPERM retry, rollback, and activeIndex adjustment well.Confidence Score: 3/5
Important Files Changed
Sequence Diagram
sequenceDiagram participant CLI as codex-manager CLI participant SA as saveAccounts (locked) participant SSFS as serializeStorageForPersist participant KT as keytar (keychain) participant FS as filesystem participant LA as loadAccounts participant HV4 as hydrateV4Storage CLI->>SA: saveAccounts(v3Storage) SA->>SA: withStorageLock acquired SA->>SSFS: serializeStorageForPersist(v3) SSFS->>KT: setPassword(service, ref:refresh, token) KT-->>SSFS: ok SSFS->>KT: setPassword(service, ref:access, token) alt access write fails KT-->>SSFS: error SSFS->>KT: deletePassword(service, ref:refresh) SSFS-->>SA: throw error SA->>KT: deleteAccountSecrets(persistedRefs, force=true) else all writes ok KT-->>SSFS: ok SSFS-->>SA: {content: v4JSON, persistedSecretRefs} end SA->>FS: writeFile(tempPath, v4JSON, mode=0o600) SA->>FS: rename(tempPath, storagePath) alt rename succeeds SA->>SA: persistedSecretRefs = [] (no rollback) SA->>FS: unlink(walPath) else rename fails (EPERM on Windows, retried) SA->>FS: retry rename up to 5x SA->>KT: deleteAccountSecrets(persistedSecretRefs) rollback end SA->>SA: withStorageLock released CLI->>LA: loadAccounts() LA->>FS: readFile(storagePath) FS-->>LA: v4 JSON LA->>HV4: hydrateV4Storage(v4data) loop for each account HV4->>KT: getPassword(service, ref:refresh) KT-->>HV4: refreshToken (or null → skip) HV4->>KT: getPassword(service, ref:access) KT-->>HV4: accessToken (or null) end HV4->>HV4: adjustActiveIndex (skip-aware) HV4-->>LA: AccountStorageV3 (hydrated) LA-->>CLI: accountsComments Outside Diff (3)
lib/secrets/token-store.ts, line 40-44 (link)default storage mode silently forces keychain, breaking installs without keytar
parseSecretStorageModereturns"keychain"for any unrecognized value — including an empty string, which is what you get whenCODEX_SECRET_STORAGE_MODEis unset ((undefined ?? "").trim().toLowerCase()→""). that means every user who hasn't explicitly configured this env var gets hard-bound to keychain. thengetEffectiveSecretStorageModeshort-circuits on"keychain"(line 102) without checking availability, sogetKeytarOrThrow()throws on anysaveAccountscall if keytar isn't installed.the safe default is
"auto": try keychain, fall back to plaintext.vitest.config.tspapers over this by forcing"plaintext"in test runs (line 15-16), hiding the breakage in CI.lib/audit.ts, line 108-121 (link)synchronous retry loop has zero delay — ineffective against Windows AV EBUSY locks
withRetryableAuditFsOperationspins 5 times with no sleep between attempts. on windows, an antivirus scanner holding an audit log file (EBUSY) typically releases it after 50–500 ms — all 5 retries complete in under a millisecond, so they all hit the same lock. contrast this withdeleteSecretRefWithRetryintoken-store.ts(line 85-96) which correctly usesawait sleep(25 * 2 ** attempt)exponential backoff.the issue is the loop is synchronous (
() => Tnot() => Promise<T>), so you can't useawait sleep(). for a drop-in synchronous fix that won't block the event loop for long durations,Atomics.waitwith a short timeout is the standard approach — but the test explicitly assertsexpect(atomicsWaitSpy).not.toHaveBeenCalled()(test/audit-retry.test.ts:76) which means even a minimal delay isn't covered. at minimum, add a regression test that asserts retry attempts are spaced apart, or document the trade-off explicitly and accept that EBUSY retries may exhaust immediately on windows.lib/storage.ts, line 876-920 (link)activeIndex adjustment logic misses the case where the active account itself has a missing keychain secret
skippedBeforeActiveIndexonly increments whenindex < data.activeIndex(line 893). if the active account atdata.activeIndexis the one with a missing secret and gets skipped (line 899), nothing increments andadjustedActiveIndex = data.activeIndex - 0 = data.activeIndex. buthydratedAccountshas one fewer element, makingadjustedActiveIndexone past the end of the array before it's passed tonormalizeAccountStorage.example: 2 accounts,
activeIndex = 1, account[0] has secrets, account[1] doesn't.index (1) == activeIndex (1), NOT< activeIndex→skippedBeforeActiveIndexstays 0adjustedActiveIndex = max(0, 1 - 0) = 1→ points past a 1-element arraywhile
normalizeAccountStorage(line 828) clamps the index viaclampIndex(), this edge case isn't explicitly tested intest/storage-v4-keychain.test.ts, and the behavior when the currently active account has a missing keychain secret is ambiguous. add a regression test covering this scenario: skip the active account and verify the adjusted index points to a valid account (or falls back correctly).lib/audit.ts, line 64-77 (link)sync retry loop has zero delay — EBUSY on Windows won't recover
withRetryableAuditFsOperationspins through 5 attempts with no sleep between them. on Windows, antivirus-inducedEBUSY/EPERMlocks typically last 1–50 ms. all 5 retries complete in ~5 µs and all fail. the previousAtomics.wait()was correctly removed to avoid event-loop freezes, but the replacement still needs some backoff or the retries are a no-op for real windows concurrency bugs.since the callers (
writeFileSync,unlinkSync,statSync) are synchronous there is no easy way to add async backoff here without refactoring. a practical low-impact fix: accept one fast retry (catches true kernel-level transients) and only call the retry wrapper for the critical write path, not for the best-effort purge. for the purge, simply catch and skip on first error — no retry loop needed. for the write path, wrapping in the outertry/catch(which already swallows errors) means a write failure is not catastrophic; removing the retry loop there would also be defensible.at minimum, add an inline comment referencing
test/audit-retry.test.tsand explaining why zero-delay retries are acceptable given the outer catch. the current code implies the retries provide protection they don't actually give on Windows.lib/codex-manager.ts, line 179-186 (link)token redaction patterns are too narrow — session and bearer tokens will leak
sanitizeAuditErroronly catches three specific patterns:sk-[A-Za-z0-9_-]{12,}— old-style OpenAI API keys(?:refresh|access)_token_[A-Za-z0-9_-]{8,}— underscore-delimited token varssecret-(?:access|refresh)-token— literal label stringtokens that will not be redacted and that appear in this codebase:
sess-[A-Za-z0-9_-]+(used inAccountMetadataV3.refreshToken)Bearer <token>in error messages from fetch/axiosOPENAI_API_KEYorCODEX_TOKENvalues that get stringified in generic error messagesa session token string is e.g.
sess-AbCdEfGhIjKl1234567890— the regex above does not match it and it would appear in audit logs verbatim.also add a regression test in
test/audit.test.tscovering these patterns — there is currently no test that verifiessess-*tokens are stripped.lib/secrets/token-store.ts, line 480-491 (link)ref derivation from
refreshTokencreates orphaned keychain entries on token rotationwhen
accountIdandemailare both absent,deriveAccountSecretRefhashes the rawrefreshTokenas the seed. every token rotation produces a different base ref and a new keychain entry, permanently orphaning the previous<ref>:refreshand<ref>:accessentries. there is no GC path for these orphans.AccountMetadataV3hasaccountIdas optional (many legacy anonymous accounts lack it), so this isn't an edge case — it will silently accumulate stale secrets in the system keychain over time.consider storing the previous ref alongside the account during a rotation so
deleteAccountSecretscan be called on the old ref before the new one is written, or adding a periodiclistPasswords(SECRET_SERVICE)sweep that compares against live refs. at minimum, add a comment documenting this known accumulation risk and ensuretest/token-store.test.tscovers the token-rotation-without-accountId path.lib/storage.ts, line 787-825 (link)orphaned keychain secrets if process is killed between keychain write and file rename
serializeStorageForPersistwrites all tokens to the keychain (lines 797–806), then returns the serialized content. the caller (saveAccountsUnlocked) writes to a temp file and renames it atomically. the cleanup logic at the end of the catch block will rollback secrets if the rename fails — but only if the process survives to reach that catch. a crash (OOM kill, SIGKILL, power loss) between the keychain write loop and the finalfs.renamepermanently orphans those keychain entries.there is currently no startup GC that compares keychain entries against the live accounts file and deletes unreferenced ones. on Windows this is particularly likely because the
EPERMrename retry pattern means the process spends more time between the two operations.the concrete risk: after a few crash-during-save cycles, the system keychain accumulates stale token entries that are never cleaned up. add a
reconcileKeychainSecrets(storagePath)step toloadAccountsthat reads existing refs from disk and deletes any keychain entry not currently referenced.test/storage-v4-keychain.test.tscurrently does not cover the crash-between-write-and-rename scenario.lib/audit.ts, line 76 (link)Atomics.waitstill blocks the event loop — PR description claims it was removedthe pr description (resolution update 2026-03-05) explicitly states "Removed blocking
Atomics.wait()from audit purge retry handling inlib/audit.ts" and "Added regression coverage intest/audit-retry.test.tsassertingAtomics.waitis not called during purge retry flows." however the code still callsAtomics.wait:and
test/audit-retry.test.tsline 79 asserts it was called 2 times (expect(atomicsWaitSpy).toHaveBeenCalledTimes(2)), which directly contradicts the stated intent.Atomics.waitis a synchronous, blocking call that freezes the entire Node.js main-thread event loop for up to10 * 2^attemptms per retry (10ms, 20ms, 40ms, 80ms, 160ms). sinceauditLogis called on every request, andpurgeExpiredLogs(which goes throughwithRetryableAuditFsOperation) runs for up to an hour's worth of log files, windows antivirus lock contention onunlinkSyncorwriteFileSyncwill block the event loop for hundreds of milliseconds. this is the exact regression the description claims to fix.replace with async sleep (matching the pattern already used in
token-store.ts):or, since all callers are synchronous fs ops, consider removing retry from the
auditLoghot path entirely and only retrying in the background purge. at minimum, the test assertion must be flipped totoHaveBeenCalledTimes(0)if the removal claim is true.scripts/audit-log-forwarder.js, line 344 (link)Dead
unlink(tmpPath)— unreachable in both success and failure pathswriteCheckpointAtomicplaceswriteFile+renameinsidewithCheckpointLock. two cases:rename(tmpPath, checkpointPath)succeeds →tmpPathno longer exists when execution reaches line 344 →unlinkthrowsENOENT, silently swallowed by.catch(() => {})— no-op.rename(orwriteFile) throws inside the lock callback →withCheckpointLockre-throws the error →writeCheckpointAtomicexits via exception before reaching line 344 — unreachable.in neither case does the cleanup do useful work. this leaves a subtle correctness gap: if
writeFilesucceeds butrenamefails, thetmpPathorphan is never removed, leaking a.tmpfile on windows (antivirus may hold it open indefinitely).move the cleanup inside the lock action, guarded by
rename's failure:lib/audit.ts, line 118-131 (link)No backoff between sync retries — ineffective on Windows EBUSY
withRetryableAuditFsOperationretries immediately 5 times with zero delay. on Windows, antivirus and explorer holdEBUSY/EPERMlocks for hundreds of milliseconds at minimum. five rapid hot-loop retries will almost always exhaust all attempts while the lock is still held, so this degenerates to a single attempt in practice.the PR description states
Atomics.wait()was removed to prevent event-loop freezes — correct call — but without any yield point the synchronous retry is effectively a no-op for AV contention. the regression test (test/audit-retry.test.ts) validates retry count and thatAtomics.waitis not called, but does not validate that the backoff mechanism actually resolves real contention.recommended fix: refactor
purgeExpiredLogsand thechmodSyncpost-write path to be async (they're already in an async call chain viaauditLog), then use exponential backoff withawait sleep(25 * 2 ** attempt)— the same pattern already used inlib/secrets/token-store.ts'sdeleteSecretRefWithRetry. the synchronouswriteFileSyncpath inauditLogitself is harder to change, but purge retries at least can be made async-safe.lib/secrets/token-store.ts, line 212-214 (link)Second delete error silently discarded — token leakage masked
when both
refreshTokenandaccessTokendeletes fail,deleteErrorsaccumulates two entries but onlydeleteErrors[0]is thrown. the second failure (usually the access token) is silently dropped. this means a dangling access token in the keychain will never surface in logs or be visible to callers — exactly the token leakage scenario this PR is trying to harden against.at minimum, log a warn for the second error before throwing the first so ops teams can detect the keychain inconsistency.
scripts/audit-log-forwarder.js, line 323-327 (link)EPERMas a contention indicator is overly broad on WindowsEPERMfromopen(lockPath, "wx")is treated the same asEEXIST(lock contention), causing the forwarder to spin-wait with exponential backoff. on Windows,EPERMcan arise from a variety of real permission errors — e.g., the checkpoint directory is read-only or the process lacks write access — that will never resolve no matter how many retries are attempted. treating these as transient contention will exhaustCHECKPOINT_LOCK_MAX_WAIT_MS(60 s by default) before surfacing the real error.if you need to handle Windows lock-file races more broadly, consider also checking
code === "EACCES"only whenclearStaleCheckpointLockalready confirmed the lock file exists (i.e., contention is confirmed), rather than blindly treating allEPERM/EACCESas contention.lib/audit.ts, line 118-131 (link)Sync retry loop has no inter-attempt delay — ineffective for Windows AV/EBUSY
withRetryableAuditFsOperationis fully synchronous. all 5 retries fire within microseconds of each other. on windows, when antivirus software holds anEBUSYlock, the lock typically isn't released for 10–100ms. by the time attempt 4 runs, the AV still holds the file — all retries fail before the OS ever releases it.the pr description explicitly mentions "Windows filesystem concurrency bugs" as a target, but this retry gives no real protection for that scenario. the previous
Atomics.waitwas removed (correctly), but nothing replaced it.since
auditLogis sync, you can'tawait sleep()here. options:auditLogasync (bigger change, but correct)catch {}already does this for failed writes) — but then the 5-iteration retry loop is noisethe regression test (
test/audit-retry.test.ts) validates thatunlinkSyncis called 3 times, but the test mock returns synchronously, so it doesn't reproduce the real windows timing scenario at all. the test passes without delay and would pass even if the loop ran 0 retries.this should be called out explicitly in a comment, or the retry loop for
writeFileSyncshould be removed since it cannot meaningfully help under real AV pressure without a delay.scripts/audit-log-forwarder.js, line 177-179 (link)unguarded
readFilewill crash forwarder on file-disappearance racediscoverAuditFilesvalidates that each file exists viastatbefore returning the list. but there's no ENOENT guard aroundreadFile(fullPath)here incollectBatch. on windows (or any platform), the retention cleanup inlib/audit.ts(purgeExpiredLogs) runs concurrently with the forwarder whenauditLogfires. if a file is purged betweendiscoverAuditFilesand this read, the error propagates up throughcollectBatch → mainand terminates the process.compare with
discoverAuditFiles(lines ~100–120) which explicitly catchesENOENT/ENOTDIRand skips.Last reviewed commit: 2b0afa4
Context used:
dashboard- What: Every code change must explain how it defends against Windows filesystem concurrency bugs and ... (source)