Research Update Scripts

Automated maintenance scripts for the AgentReady research report.

Overview

The update_research.py script runs weekly via GitHub Actions to:

Search for recent research on AI-assisted development
Analyze relevance using the Claude API
Update agent-ready-codebase-attributes.md with new citations
Create a pull request for review

Setup

1. Install Dependencies

pip install anthropic pyyaml requests python-dotenv

2. Set Environment Variables

export ANTHROPIC_API_KEY="sk-ant-api03-..."
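
Before a manual run it can help to confirm that the variable is actually visible to Python. The following is a minimal sketch, not part of the shipped scripts; `require_api_key` is a hypothetical helper name, and only the variable name `ANTHROPIC_API_KEY` comes from this README.

```python
import os


def require_api_key(env=os.environ):
    """Return the API key from the environment, or raise with a clear message.

    Hypothetical helper: the real script may read the key differently.
    """
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it before running the script"
        )
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        # Never print the key itself, only the fact that it is present.
        print("ANTHROPIC_API_KEY is set")
    except RuntimeError as exc:
        print(f"Configuration error: {exc}")
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary instead of the real process environment.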

3. Configure Settings

Edit research_config.yaml to customize:

max_updates_per_run: How many attributes to update per week
min_citation_quality_score: Threshold for including updates
priority_attributes: Which attributes get updated first
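
To illustrate how these three settings might interact, here is a rough sketch under stated assumptions. The candidate record shape (`attribute_id`, `relevance_score`), the function name `select_updates`, and the "at or above threshold" comparison are all hypothetical; the actual logic in update_research.py may differ.

```python
def select_updates(candidates, config):
    """Pick which candidate updates to apply in one run.

    Assumed behavior (not confirmed by the script itself):
    - drop candidates scoring below min_citation_quality_score
    - order by position in priority_attributes, then by score
    - keep at most max_updates_per_run
    """
    settings = config["update_settings"]
    threshold = settings["min_citation_quality_score"]
    limit = settings["max_updates_per_run"]
    priority = config.get("priority_attributes", [])

    kept = [c for c in candidates if c["relevance_score"] >= threshold]

    def sort_key(candidate):
        attr = candidate["attribute_id"]
        # Attributes not listed as priorities sort after all listed ones.
        rank = priority.index(attr) if attr in priority else len(priority)
        return (rank, -candidate["relevance_score"])

    return sorted(kept, key=sort_key)[:limit]


if __name__ == "__main__":
    config = {
        "update_settings": {
            "max_updates_per_run": 2,
            "min_citation_quality_score": 0.7,
        },
        "priority_attributes": ["1.1", "2.1"],
    }
    candidates = [
        {"attribute_id": "3.2", "relevance_score": 0.95},
        {"attribute_id": "2.1", "relevance_score": 0.80},
        {"attribute_id": "1.1", "relevance_score": 0.75},
    ]
    for chosen in select_updates(candidates, config):
        print(chosen["attribute_id"], chosen["relevance_score"])
```

In this sketch a priority attribute with a lower score still wins over a higher-scoring non-priority attribute, which is one reasonable reading of "get updated first".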

Manual Usage

Run Full Update

python scripts/update_research.py

Test Configuration

# Verify config loads correctly
python -c "import yaml; print(yaml.safe_load(open('scripts/research_config.yaml')))"

GitHub Actions Integration

The workflow runs automatically every Monday at 9 AM UTC.

Manual trigger:

gh workflow run research-update.yml

View recent runs:

gh run list --workflow=research-update.yml

Output

Exit Codes

0: Changes made, PR should be created
1: No changes needed or error occurred
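
A caller can branch on these codes as in the sketch below. The helper names `should_create_pr` and `run_update` are hypothetical; only the meaning of codes 0 and 1 comes from this README. Because code 1 covers both "no changes" and "error", the sketch does not try to tell those two cases apart.

```python
import subprocess
import sys


def should_create_pr(returncode: int) -> bool:
    """Exit code 0 means changes were made; anything else means no PR."""
    return returncode == 0


def run_update(script_path="scripts/update_research.py"):
    """Run the update script with the current interpreter and report the outcome."""
    result = subprocess.run([sys.executable, script_path])
    if should_create_pr(result.returncode):
        print("Changes made: a pull request should be created")
    else:
        # Code 1 is ambiguous, so point the reader at the logs.
        print("No PR: either nothing changed or the script failed; check the output")
    return result.returncode
```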

Files Modified

agent-ready-codebase-attributes.md: Research report content
- Updated attribute sections with new findings
- New citations added
- Version incremented
- Date updated to current

Configuration Reference

`research_config.yaml`

update_settings:
  max_updates_per_run: 5          # Limit changes per PR
  min_citation_quality_score: 0.7  # Claude relevance threshold
  search_recency_months: 12        # Only recent research

priority_attributes:
  - "1.1"  # CLAUDE.md
  - "2.1"  # README
  # ... Tier 1 attributes processed first

search_domains:
  prioritized:
    - anthropic.com
    - arxiv.org
    # ... High-authority sources
  blocked:
    - spam-site.com
    # ... Low-quality sources to avoid
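
A quick sanity check of the loaded configuration could look like the sketch below. It works on the dictionary returned by `yaml.safe_load`, so it has no dependencies of its own. The function name `validate_config` and the specific rules (score between 0 and 1, positive integers, string attribute IDs) are assumptions inferred from the example values above, not documented constraints.

```python
def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_config(config):
    """Return a list of human-readable problems; an empty list means OK."""
    settings = config.get("update_settings")
    if not isinstance(settings, dict):
        return ["update_settings: missing or not a mapping"]

    problems = []
    for key in ("max_updates_per_run", "search_recency_months"):
        value = settings.get(key)
        if not isinstance(value, int) or isinstance(value, bool) or value < 1:
            problems.append(f"{key}: expected a positive integer, got {value!r}")

    score = settings.get("min_citation_quality_score")
    if not _is_number(score) or not 0.0 <= score <= 1.0:
        problems.append(
            f"min_citation_quality_score: expected a number in [0, 1], got {score!r}"
        )

    for attr in config.get("priority_attributes", []):
        # Unquoted 1.1 in YAML loads as a float, not the string "1.1".
        if not isinstance(attr, str):
            problems.append(f"priority_attributes: {attr!r} should be a quoted string")

    return problems
```

The last check targets an easy YAML mistake: writing `- 1.1` instead of `- "1.1"` produces a float, which would no longer match a string attribute ID.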

Development

Test Search Functionality

# Run from inside the scripts/ directory so that update_research is importable
from update_research import ResearchUpdater

updater = ResearchUpdater()
results = updater.search_recent_research("1.1", "CLAUDE.md Configuration Files")
print(f"Found {len(results)} results")

Test Relevance Analysis

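from update_research import ResearchUpdater

updater = ResearchUpdater()
# Run the search step first so that search_results is defined
search_results = updater.search_recent_research("1.1", "CLAUDE.md Configuration Files")
analysis = updater.analyze_relevance(
    "1.1",
    search_results,
    "Current attribute content..."
)
print(f"Relevance score: {analysis['relevance_score']}")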

Dry Run (No File Modifications)

# Comment out the write operations in update_attribute_section()
# to test without modifying files
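
As an alternative to commenting code out, a write helper can take a dry-run flag. This is only a sketch of the pattern: `write_report` is a hypothetical function, and update_research.py is not known to expose such a flag.

```python
from pathlib import Path


def write_report(path, content, dry_run=False):
    """Write content to path unless dry_run is set.

    Returns True if the file was written, False if the write was skipped.
    """
    target = Path(path)
    if dry_run:
        # Report what would happen without touching the file system.
        print(f"[dry run] would write {len(content)} characters to {target}")
        return False
    target.write_text(content, encoding="utf-8")
    return True
```

Routing every write through one function like this means a single flag controls all file modifications, which is harder to get wrong than commenting out individual lines.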

Troubleshooting

No Updates Generated

Possible causes:

min_citation_quality_score too high
No recent research found
Search API issues

Solutions:

Lower threshold in config
Check search functionality manually
Verify API credentials

PR Not Created

Possible causes:

Script exited with code 1 (no changes)
GitHub Actions permissions issue
Branch conflicts

Solutions:

Check workflow logs: gh run view --log
Verify repository permissions
Delete stale automated/research-update branch

API Rate Limits

Claude API:

Default: 1,000 requests/minute
Cost: ~$0.30 per weekly run
Caching: Search results cached 7 days

Mitigation:

Reduce max_updates_per_run
Decrease search_recency_months (a narrower window returns fewer results to analyze)
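
The 7-day search cache mentioned above amounts to a freshness check on a stored timestamp. The sketch below shows one way to express that rule; the names `CACHE_TTL` and `is_cache_fresh` are hypothetical, and how the script actually stores or expires cached results is not documented here.

```python
from datetime import datetime, timedelta, timezone

# Seven days, matching the caching period stated above.
CACHE_TTL = timedelta(days=7)


def is_cache_fresh(cached_at, now=None, ttl=CACHE_TTL):
    """Return True if an entry cached at cached_at is still within the TTL.

    Both timestamps are expected to be timezone-aware datetimes.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - cached_at < ttl
```

A fresh entry can be reused without a new search, which reduces the number of requests made per run.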

Cost Estimation

Weekly Run (5 Updates)

Search generation: 5 × ~2K tokens = 10K tokens
Relevance analysis: 5 × ~6K tokens = 30K tokens
Total: ~40K tokens/week ≈ $0.30/week

Annual Cost

~$15-20/year for weekly automation
Scales linearly with max_updates_per_run
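
The arithmetic above can be reproduced with a short script. The token counts and the $0.30 baseline are taken from this section; the linear scaling follows the statement above, and the function names are illustrative only.

```python
# Figures from the estimate above.
SEARCH_TOKENS_PER_UPDATE = 2_000
ANALYSIS_TOKENS_PER_UPDATE = 6_000
BASELINE_UPDATES = 5
BASELINE_WEEKLY_COST_USD = 0.30
WEEKS_PER_YEAR = 52


def weekly_tokens(max_updates_per_run):
    """Tokens per weekly run: search generation plus relevance analysis."""
    per_update = SEARCH_TOKENS_PER_UPDATE + ANALYSIS_TOKENS_PER_UPDATE
    return max_updates_per_run * per_update


def weekly_cost_usd(max_updates_per_run):
    """Scale the baseline cost linearly with the number of updates."""
    return BASELINE_WEEKLY_COST_USD * max_updates_per_run / BASELINE_UPDATES


def annual_cost_usd(max_updates_per_run):
    return weekly_cost_usd(max_updates_per_run) * WEEKS_PER_YEAR


if __name__ == "__main__":
    for updates in (5, 10):
        print(
            f"{updates} updates: {weekly_tokens(updates):,} tokens/week, "
            f"${weekly_cost_usd(updates):.2f}/week, "
            f"${annual_cost_usd(updates):.2f}/year"
        )
```

With the default of 5 updates this gives 40,000 tokens per week and $15.60 per year, at the low end of the $15-20 range quoted above.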

Security

API Key Protection

Store only in GitHub Secrets (never commit)
Rotate quarterly or after team changes
Use least-privilege API keys

Content Validation

All URLs verified before adding
Malicious content filtered
JSON parsing validated to prevent injection
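
The checks listed above could be sketched roughly as follows, using only the standard library. This is an illustration under assumptions, not the script's actual validation code: `is_allowed_url` and `parse_analysis` are hypothetical names, the blocked list mirrors the `blocked` entry in research_config.yaml, and `relevance_score` is the field used in the Development examples.

```python
import json
from urllib.parse import urlparse


def is_allowed_url(url, blocked_domains):
    """Accept only http(s) URLs whose host is not a blocked domain or subdomain."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname.lower()
    return not any(
        host == domain or host.endswith("." + domain) for domain in blocked_domains
    )


def parse_analysis(raw):
    """Parse a JSON response and check its shape before using any field.

    Raises ValueError if the text is not a JSON object with a numeric
    relevance_score.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    score = data.get("relevance_score")
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("relevance_score must be a number")
    return data
```

Parsing with `json.loads` and then checking types, rather than evaluating or string-matching the response, keeps untrusted model or web output from being treated as anything other than data.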
