Skip to content

Conversation

Copy link

Copilot AI commented Dec 12, 2025

Pull Request Update - Code Review Fixes

Changes Made

Type Safety Improvements ✅

  • Changed error: any to error: unknown throughout codebase for better type safety
  • Added proper type guards for error handling in congressClient.ts and workerPool.ts
  • Safe error message extraction with fallbacks to String()

Variable Naming ✅

  • Renamed offset and limit to pageOffset and pageLimit in congressClient to avoid confusion with SQL keywords

PackageId Generation Fix ✅

  • Fixed timestamp-based packageId generation that was non-deterministic
  • Now uses content-based hash (JSON.stringify + base64) for consistent IDs across pagination runs
  • Ensures deduplication works correctly when processing same items multiple times

Worker Pool Refactoring ✅

  • Extracted common error handling logic into addTaskWithLifecycle helper method
  • Eliminated code duplication across addCongressTask, addGovInfoTask, and addBulkDataTask
  • Fixed inconsistent state: failed tasks no longer marked as completed
  • Errors logged but not re-thrown to prevent queue interruption

Storage Manager ✅

  • Added empty string check for directory path validation alongside '.' check

Build & Test ✅

  • Added pretest script to automatically build before running tests
  • Tests now always run against current code, not stale build output

All Tests Passing ✅

  • Build successful
  • Smoke tests passing
  • Type safety improved throughout
Original prompt

use the postman mcp server (take steps to acquire it if you dont have it yet) to breakdown and reverse engineer the api endpoints/websites at api.congress.gov and govinfo.gov/bulkdata (or govinfo.gov/api). and then analyze the endpoints, giving a breakdown of all of the offerings. Lets then systematically design mcp server to interact with each of the api endpoints and then ingest the bulk data offerings at each one in an organized fashion. lets come up with a way to track the data that we have ingested to avoid duplication. for each of the functions that we are to design the api endpiotns we need to respect the rate limiit which can be gathered form the websites and we need to use offset variable for our loops and we also need to observe the pagination variable as well. tracking these variables is a necessity to the success of our functions. they are integral to their functioning. lets also create a way to use a large number of additional workers to speed up the ingestion process to run parrallel workers that can be spread across the data


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 5 commits December 12, 2025 08:16
…API integration

Co-authored-by: cbwinslow <8528478+cbwinslow@users.noreply.github.com>
Co-authored-by: cbwinslow <8528478+cbwinslow@users.noreply.github.com>
Co-authored-by: cbwinslow <8528478+cbwinslow@users.noreply.github.com>
…ndling, add completion tracking

Co-authored-by: cbwinslow <8528478+cbwinslow@users.noreply.github.com>
…practices

Co-authored-by: cbwinslow <8528478+cbwinslow@users.noreply.github.com>
Copilot AI changed the title [WIP] Analyze and design MCP server for API interactions Add MCP server for systematic Congress.gov and GovInfo.gov bulk data ingestion Dec 12, 2025
Copilot AI requested a review from cbwinslow December 12, 2025 08:32
@cbwinslow cbwinslow marked this pull request as ready for review December 12, 2025 08:37
Copilot AI review requested due to automatic review settings December 12, 2025 08:38
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a comprehensive MCP (Model Context Protocol) server for programmatically ingesting bulk data from api.congress.gov and govinfo.gov APIs. The implementation includes rate limiting, pagination tracking, SQLite-based deduplication, parallel processing via worker pools, and resume capabilities for interrupted ingestions.

Key Changes:

  • Complete TypeScript-based MCP server with 20+ tools for Congress and GovInfo API access
  • SQLite storage manager with deduplication and state persistence
  • Configurable worker pool system for parallel data ingestion with 5-10 concurrent workers
  • Comprehensive documentation including API endpoints, usage guide, and MCP configuration

Reviewed changes

Copilot reviewed 16 out of 18 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
src/index.ts Main MCP server with 20+ tool handlers for Congress API, GovInfo API, workers, and storage
src/congress/congressClient.ts Congress.gov API client with 8 endpoints, rate limiting (5K req/hr), and pagination
src/govinfo/govinfoClient.ts GovInfo.gov API client for collections, packages, search, and bulk data
src/workers/workerPool.ts Parallel processing worker pool with configurable concurrency and task queue
src/storage/storageManager.ts SQLite-based storage for deduplication, pagination state, and worker progress
src/utils/rateLimiter.ts Bottleneck-based rate limiting with token bucket algorithm
package.json Project configuration with dependencies for MCP SDK, axios, SQLite, p-queue
tsconfig.json TypeScript configuration targeting ES2022 with strict mode
test-smoke.js Smoke tests for storage, deduplication, and pagination without API keys
test-manual.js Manual integration tests requiring API keys
USAGE_GUIDE.md Comprehensive usage documentation with examples and patterns
API_ENDPOINTS.md Complete API endpoint reference for both Congress and GovInfo APIs
IMPLEMENTATION_SUMMARY.md Technical architecture and design decisions documentation
SECURITY_SUMMARY.md Security analysis, best practices, and CodeQL scan results
MCP_CONFIGURATION.md Claude Desktop integration setup instructions
README.md Updated project overview with MCP server features
.gitignore Ignore patterns for dependencies, build outputs, data, and environment files
.env.example Template for environment configuration with API keys and worker settings

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

params: Record<string, any> = {},
pagination: PaginationParams = {}
): Promise<any> {
const { offset = 0, limit = 250 } = pagination;
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reserved words 'limit' and 'offset' are used as variable names in the pagination parameters. While this works in JavaScript/TypeScript, it could cause confusion and potential issues. Consider using more specific names like 'resultLimit' and 'resultOffset' or 'pageLimit' and 'pageOffset' to avoid any potential conflicts with SQL keywords (even though these are used with parameterized queries).

Copilot uses AI. Check for mistakes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f0644e4. Renamed to pageOffset and pageLimit to avoid confusion with SQL keywords.

packageId = `url-${Buffer.from(item.url).toString('base64').substring(0, 32)}`;
} else {
// Last resort: use item index with timestamp
packageId = `${endpoint}-${offset + items.indexOf(item)}-${Date.now()}`;
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The packageId generation logic uses a timestamp as part of the fallback, which creates a potential issue. When the same item is processed multiple times (e.g., during pagination or resume), it will generate different packageIds due to the timestamp changing, defeating the deduplication mechanism. The packageId should be deterministic based on the item's properties, not time-dependent.

Copilot uses AI. Check for mistakes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f0644e4. Replaced timestamp with content-based hash (JSON.stringify + base64) for deterministic package IDs.

Comment on lines 188 to 203
} catch (error: any) {
console.error(`Worker ${workerId} error:`, error);

this.storage.updateWorkerProgress({
workerId,
collection: `govinfo-${collection}`,
status: 'error',
currentOffset: 0,
itemsProcessed: 0,
lastActivity: new Date().toISOString(),
errorMessage: error.message,
});

this.activeTasks.delete(taskId);
this.completedTasks++;
throw error;
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling pattern is duplicated in three locations (addCongressTask, addGovInfoTask, addBulkDataTask). Similar issue as lines 110-125: tasks are marked as completed even when errors occur, and then the error is thrown, which creates inconsistent state.

Copilot uses AI. Check for mistakes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f0644e4. Extracted common logic to addTaskWithLifecycle helper, improved error handling (failed tasks not marked completed), and errors no longer re-thrown to prevent queue interruption.


this.activeTasks.delete(taskId);
this.completedTasks++;
} catch (error: any) {
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling uses 'error: any' type annotation which defeats TypeScript's type safety. Consider using 'error: unknown' and then checking the type, or creating a proper error type to handle error.message safely.

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f0644e4. Changed to error: unknown with proper type guards in addTaskWithLifecycle helper.

Comment on lines 88 to 89
} catch (error: any) {
if (error.response?.status === 429) {
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling uses 'error: any' type throughout the file. Consider using 'error: unknown' for better type safety, especially since you're accessing error.response which may not exist on all error types.

Suggested change
} catch (error: any) {
if (error.response?.status === 429) {
} catch (error: unknown) {
// Type guard for AxiosError with response
if (
typeof error === 'object' &&
error !== null &&
'response' in error &&
typeof (error as any).response === 'object' &&
(error as any).response !== null &&
'status' in (error as any).response &&
(error as any).response.status === 429
) {

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f0644e4. Changed to error: unknown with proper type guard checking for response.status === 429.

Comment on lines 215 to 285

const task: Task = {
id: taskId,
type: 'bulkdata',
target: path,
};

this.activeTasks.set(taskId, task);

this.queue.add(async () => {
const workerId = `worker-${taskId}`;

try {
this.storage.updateWorkerProgress({
workerId,
collection: `bulkdata-${path}`,
status: 'working',
currentOffset: 0,
itemsProcessed: 0,
lastActivity: new Date().toISOString(),
});

if (!this.govinfoClient) {
throw new Error('GovInfo client not initialized');
}

await this.govinfoClient.ingestBulkData(
path,
(processed) => {
this.storage.updateWorkerProgress({
workerId,
collection: `bulkdata-${path}`,
status: 'working',
currentOffset: processed,
itemsProcessed: processed,
lastActivity: new Date().toISOString(),
});
}
);

this.storage.updateWorkerProgress({
workerId,
collection: `bulkdata-${path}`,
status: 'idle',
currentOffset: 0,
itemsProcessed: 0,
lastActivity: new Date().toISOString(),
});

this.activeTasks.delete(taskId);
this.completedTasks++;
} catch (error: any) {
console.error(`Worker ${workerId} error:`, error);

this.storage.updateWorkerProgress({
workerId,
collection: `bulkdata-${path}`,
status: 'error',
currentOffset: 0,
itemsProcessed: 0,
lastActivity: new Date().toISOString(),
errorMessage: error.message,
});

this.activeTasks.delete(taskId);
this.completedTasks++;
throw error;
}
});

return taskId;
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The three task addition methods (addCongressTask, addGovInfoTask, addBulkDataTask) have significant code duplication in their error handling, worker progress tracking, and task lifecycle management. Consider extracting this common logic into a private helper method to improve maintainability and reduce the risk of inconsistencies.

Suggested change
const task: Task = {
id: taskId,
type: 'bulkdata',
target: path,
};
this.activeTasks.set(taskId, task);
this.queue.add(async () => {
const workerId = `worker-${taskId}`;
try {
this.storage.updateWorkerProgress({
workerId,
collection: `bulkdata-${path}`,
status: 'working',
currentOffset: 0,
itemsProcessed: 0,
lastActivity: new Date().toISOString(),
});
if (!this.govinfoClient) {
throw new Error('GovInfo client not initialized');
}
await this.govinfoClient.ingestBulkData(
path,
(processed) => {
this.storage.updateWorkerProgress({
workerId,
collection: `bulkdata-${path}`,
status: 'working',
currentOffset: processed,
itemsProcessed: processed,
lastActivity: new Date().toISOString(),
});
}
);
this.storage.updateWorkerProgress({
workerId,
collection: `bulkdata-${path}`,
status: 'idle',
currentOffset: 0,
itemsProcessed: 0,
lastActivity: new Date().toISOString(),
});
this.activeTasks.delete(taskId);
this.completedTasks++;
} catch (error: any) {
console.error(`Worker ${workerId} error:`, error);
this.storage.updateWorkerProgress({
workerId,
collection: `bulkdata-${path}`,
status: 'error',
currentOffset: 0,
itemsProcessed: 0,
lastActivity: new Date().toISOString(),
errorMessage: error.message,
});
this.activeTasks.delete(taskId);
this.completedTasks++;
throw error;
}
});
return taskId;
const task: Task = {
id: taskId,
type: 'bulkdata',
target: path,
};
this._addTaskWithLifecycle(
task,
`bulkdata-${path}`,
async (workerId: string, progressCallback: (processed: number) => void) => {
if (!this.govinfoClient) {
throw new Error('GovInfo client not initialized');
}
await this.govinfoClient.ingestBulkData(
path,
progressCallback
);
}
);
return taskId;
}
/**
* Private helper to add a task with common lifecycle/error/progress logic
*/
private _addTaskWithLifecycle(
task: Task,
collection: string,
operation: (workerId: string, progressCallback: (processed: number) => void) => Promise<void>
): void {
this.activeTasks.set(task.id, task);
this.queue.add(async () => {
const workerId = `worker-${task.id}`;
try {
this.storage.updateWorkerProgress({
workerId,
collection,
status: 'working',
currentOffset: 0,
itemsProcessed: 0,
lastActivity: new Date().toISOString(),
});
await operation(workerId, (processed: number) => {
this.storage.updateWorkerProgress({
workerId,
collection,
status: 'working',
currentOffset: processed,
itemsProcessed: processed,
lastActivity: new Date().toISOString(),
});
});
this.storage.updateWorkerProgress({
workerId,
collection,
status: 'idle',
currentOffset: 0,
itemsProcessed: 0,
lastActivity: new Date().toISOString(),
});
this.activeTasks.delete(task.id);
this.completedTasks++;
} catch (error: any) {
console.error(`Worker ${workerId} error:`, error);
this.storage.updateWorkerProgress({
workerId,
collection,
status: 'error',
currentOffset: 0,
itemsProcessed: 0,
lastActivity: new Date().toISOString(),
errorMessage: error.message,
});
this.activeTasks.delete(task.id);
this.completedTasks++;
throw error;
}
});

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f0644e4. Extracted common logic to addTaskWithLifecycle private helper method, eliminating duplication across all three task methods.

@cbwinslow
Copy link

@copilot apply changes based on the comments in this thread

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
cbwinslow and others added 4 commits December 15, 2025 00:52
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ol, fix packageId generation

Co-authored-by: cbwinslow <8528478+cbwinslow@users.noreply.github.com>
@cbwinslow
Copy link

@gemini please review the repo and the changes and i want you to fix the conflicts so the pr can successfully merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants