Centralized, AI-readable documentation extracted from 243+ frameworks, libraries, and developer tools. Automated extraction tools keep documentation current with upstream sources.
llm-code-docs/
├── docs/
│ ├── llms-txt/ # 238 sites following llms.txt standard (HIGHEST PRIORITY)
│ ├── github-scraped/ # 15 Git repository extractions
│ └── web-scraped/ # Custom web scrapers (Claude Code SDK, READMEs)
├── scripts/ # All extraction and update tools
├── AGENTS.md # Guide for AI agents using these docs
├── CLAUDE.md # AI assistant instructions
├── index.yaml # Index of all documentation sources
├── todo.md # Wishlist and future ideas
└── README.md # This file
See AGENTS.md for detailed guidance on finding and using documentation in this repository.
238 sites following the llms.txt standard - optimized for LLM consumption.
Notable sources include:
- AI/LLM: Anthropic, OpenAI, Vercel AI SDK, LangChain, Ollama
- Web Frameworks: Next.js, React, Vue, Astro, Remix, SvelteKit
- Python: FastAPI, Pydantic, Streamlit, Gradio
- JavaScript: Bun, Deno, Vite, Vitest, Zod
- Databases: Supabase, PlanetScale, Turso, Neon
- Infrastructure: Cloudflare, Vercel, Fly.io, Railway
15 repositories cloned and extracted for comprehensive documentation:
| Project | Description |
|---|---|
| CircuitPython | Adafruit's fork of MicroPython for microcontrollers |
| MicroPython | Python for microcontrollers |
| Textual | Modern TUI framework |
| FastAPI | Modern Python web framework |
| Flask | Lightweight WSGI framework |
| Click | Python CLI framework |
| SQLAlchemy | Python SQL toolkit and ORM |
| Go | Official Go documentation |
| Python 3.13 | Official Python documentation |
| Goose | AI-powered developer agent |
| LibreChat | Multi-AI chat interface |
| Joplin | Note-taking application |
| BuildBuddy | Remote execution platform |
| esptool | ESP bootloader utility |
Custom scrapers for sites without llms.txt support:
- Claude Code SDK - Anthropic's Claude Code development tools
- READMEs - Individual README files from various projects
./scripts/update.sh
# Update all llms.txt sites (238 sites in parallel)
python3 scripts/llms-txt-scraper.py
# Update single site
python3 scripts/llms-txt-scraper.py --site anthropic
# Update Git repository extractions
python3 scripts/extract_docs.py
# Update Claude Code SDK docs
python3 scripts/claude-code-sdk-docs.py
To add a new llms.txt site:
- Edit scripts/llms-sites.yaml:
    - name: new-site
      base_url: https://example.com/
      description: Site description
- Download:
    python3 scripts/llms-txt-scraper.py --site new-site
scripts/llms-sites.yaml is the central registry of all llms.txt-compliant documentation sources. Each entry specifies:
- name - Unique identifier and output folder name
- base_url - URL where llms.txt is located
- description - Brief description of the documentation
- rate_limit_seconds (optional) - Delay between requests
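The field list above can be checked mechanically before a scrape is attempted. The following is a minimal sketch, assuming each entry has already been loaded into a plain dict; `validate_site_entry` and `REQUIRED_FIELDS` are hypothetical names for illustration and are not part of the repository's scripts.

```python
from urllib.parse import urlparse

# Fields the registry description lists as mandatory (rate_limit_seconds is optional).
REQUIRED_FIELDS = ("name", "base_url", "description")


def validate_site_entry(entry: dict) -> list:
    """Return a list of problems found in one registry entry (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = entry.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")

    # base_url should be an absolute http(s) URL.
    base_url = entry.get("base_url")
    if isinstance(base_url, str) and base_url.strip():
        parsed = urlparse(base_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"base_url is not an http(s) URL: {base_url}")

    # When present, rate_limit_seconds should be a non-negative number.
    rate_limit = entry.get("rate_limit_seconds")
    if rate_limit is not None:
        is_number = isinstance(rate_limit, (int, float)) and not isinstance(rate_limit, bool)
        if not is_number or rate_limit < 0:
            problems.append("rate_limit_seconds must be a non-negative number")

    return problems


entry = {
    "name": "new-site",
    "base_url": "https://example.com/",
    "description": "Site description",
}
print(validate_site_entry(entry))  # → []
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in an entry at once.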
scripts/repo_config.yaml holds the configuration for Git-based documentation extraction:
- repo_url - GitHub repository URL
- source_folder - Path to documentation within repo
- target_folder - Output path under docs/github-scraped/
- branch - Branch to clone (default: main/master)
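To show how these fields might fit together, here is a rough sketch that turns one entry into a shallow `git clone` command and an output path. It is not taken from `extract_docs.py`: the helper names, the example repository URL, and the checkout directory are all hypothetical. Only the standard `git clone` options `--depth` and `--branch` are used.

```python
from pathlib import PurePosixPath
from typing import List, Optional

# Root named in the field list above.
GITHUB_SCRAPED_ROOT = PurePosixPath("docs/github-scraped")


def clone_command(repo_url: str, checkout_dir: str, branch: Optional[str] = None) -> List[str]:
    """Build a shallow clone command; without a branch, git uses the remote's default."""
    cmd = ["git", "clone", "--depth", "1"]
    if branch:
        cmd += ["--branch", branch]
    cmd += [repo_url, checkout_dir]
    return cmd


def output_dir(target_folder: str) -> PurePosixPath:
    """Resolve target_folder to its location under docs/github-scraped/."""
    return GITHUB_SCRAPED_ROOT / target_folder


# Hypothetical entry, for illustration only.
entry = {
    "repo_url": "https://github.com/example/project.git",
    "source_folder": "docs",
    "target_folder": "project",
}
print(clone_command(entry["repo_url"], "/tmp/project-checkout", entry.get("branch")))
print(output_dir(entry["target_folder"]))  # → docs/github-scraped/project
```

Leaving `branch` unset and letting git pick the remote's default is one way to cover both `main` and `master` without probing for either.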
- Smart Caching: 23-hour freshness window avoids redundant downloads
- Parallel Downloads: 15 concurrent workers for fast bulk updates
- Source Headers: Each file includes source URL for traceability
- Error Resilience: Individual failures don't stop bulk operations
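The four behaviors above can be sketched with the standard library alone. This is an illustrative outline under stated assumptions, not the scrapers' actual code: freshness is judged by file modification time, the source header format is invented for the example, and `is_fresh`, `with_source_header`, and `run_all` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

FRESHNESS_SECONDS = 23 * 60 * 60  # 23-hour freshness window
MAX_WORKERS = 15                  # concurrent download workers


def is_fresh(path: Path, now: Optional[float] = None) -> bool:
    """True if the file exists and was modified within the freshness window."""
    if not path.exists():
        return False
    now = time.time() if now is None else now
    return now - path.stat().st_mtime < FRESHNESS_SECONDS


def with_source_header(source_url: str, body: str) -> str:
    """Prefix content with its source URL (this header format is an assumption)."""
    return f"# Source: {source_url}\n\n{body}"


def run_all(tasks: Dict[str, object], worker: Callable) -> Tuple[dict, dict]:
    """Run worker(arg) for every task in parallel, collecting failures instead of raising."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = {pool.submit(worker, arg): name for name, arg in tasks.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one failed site should not stop the rest
                failures[name] = exc
    return results, failures
```

A caller would typically skip any site whose output file passes `is_fresh`, hand the remaining sites to `run_all`, and report the `failures` mapping at the end of the run.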
- 238 llms.txt documentation sites
- 15 Git repository extractions
- 12,000+ markdown/RST files
- 300MB+ total documentation
Steps for adding an llms.txt site:
- Check if the site has llms.txt support (visit {docs-url}/llms.txt)
- Edit scripts/llms-sites.yaml with the new entry
- Run python3 scripts/llms-txt-scraper.py --site new-site
- Verify extraction: ls -lh docs/llms-txt/new-site/
Steps for adding a Git repository:
- Edit scripts/repo_config.yaml with repo details
- Run python3 scripts/extract_docs.py
Check index.yaml under not_yet_fetched for libraries we've identified but haven't extracted. See todo.md for ideas and expansion strategies.
Priority order:
- llms.txt - Highest quality, official AI-optimized format
- Git repos - Comprehensive but requires custom configuration
- Web scraping - Last resort for critical documentation
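The priority order can be expressed as a simple first-match lookup. The sketch below assumes the three source types are labeled with the folder names under docs/; `pick_source` is a hypothetical helper, not something the repository provides.

```python
from typing import Iterable, Optional

# Highest priority first, using the docs/ subfolder names as labels.
SOURCE_PRIORITY = ("llms-txt", "github-scraped", "web-scraped")


def pick_source(available: Iterable[str]) -> Optional[str]:
    """Return the highest-priority source type that is available, or None."""
    available = set(available)
    for source in SOURCE_PRIORITY:
        if source in available:
            return source
    return None


print(pick_source({"web-scraped", "github-scraped"}))  # → github-scraped
```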
Maintained for AI-assisted development across multiple frameworks and tools.