Fast Repository Indexing Library for Coding Assistants
Loregrep is a Rust library with Python bindings that parses codebases into fast, searchable in-memory indexes. It's designed to provide coding assistants and AI tools with structured access to code functions, structures, dependencies, and call graphs.
Currently supports Rust and Python with TypeScript/JavaScript coming soon.
- Installation
- Quick Start
- What It Does
- Use Cases
- Language Support
- API Examples
- Available Tools
- Performance
- Architecture
- Examples
- Development Setup
- Contributing
- License
cargo add loregreppip install loregrep- Rust: 1.70 or later
- Python: 3.8 or later
- Memory: 100MB+ available RAM (scales with repository size)
- OS: Linux, macOS, Windows
Looking for language-specific documentation?
- 🦀 Rust Developers: See README-rust.md for Rust-specific API, CLI usage, performance tips, and integration patterns
- 🐍 Python Developers: See README-python.md for Python-specific examples, async patterns, and AI integration
- 📖 General Overview: Continue reading this README for project overview and cross-language examples
First, clone the loregrep repository to try the examples:
git clone https://github.com/Vasu014/loregrep.git
cd loregrepRust:
use loregrep::LoreGrep;
use serde_json::json;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Zero-configuration setup - auto-detects project type
let mut loregrep = LoreGrep::auto_discover(".")?;
let scan_result = loregrep.scan(".").await?;
println!("📁 Scanned loregrep repository");
println!(" {} files analyzed, {} functions found",
scan_result.files_scanned, scan_result.functions_found);
// Search for functions containing "main"
let result = loregrep.execute_tool("search_functions", json!({
"pattern": "main",
"limit": 5
})).await?;
println!("🔍 Found functions:\n{}", result.content);
Ok(())
}Python:
import asyncio
import loregrep
import os
async def main():
# Zero-configuration setup - auto-detects project type
loregrep_instance = loregrep.LoreGrep.auto_discover(".")
scan_result = await loregrep_instance.scan(".")
repo_name = os.path.basename(os.getcwd())
print(f"📁 Scanned {repo_name} repository")
print(f" {scan_result.files_scanned} files analyzed, "
f"{scan_result.functions_found} functions found")
# Search for functions containing "main"
result = await loregrep_instance.execute_tool("search_functions", {
"pattern": "main",
"limit": 5
})
print(f"🔍 Found functions:\n{result.content}")
asyncio.run(main())Expected Output (when run in loregrep repository):
📁 Scanned loregrep repository
42 files analyzed, 156 functions found
🔍 Found functions:
main (src/main.rs:15) - Entry point for CLI application
cli_main (src/cli_main.rs:8) - CLI entry point wrapper
async_main (src/internal/ai/conversation.rs:45) - Main conversation loop
...
Once you understand the basics, scan your own projects:
Rust:
// Easiest way - auto-discover project type and scan
let mut loregrep = LoreGrep::auto_discover("/path/to/your/project")?;
let scan_result = loregrep.scan("/path/to/your/project").await?;
println!("📁 Found {} functions in {} files",
scan_result.functions_found, scan_result.files_scanned);Python:
# Easiest way - auto-discover project type and scan
loregrep_instance = loregrep.LoreGrep.auto_discover("/path/to/your/project")
scan_result = await loregrep_instance.scan("/path/to/your/project")
print(f"📁 Found {scan_result.functions_found} functions in {scan_result.files_scanned} files")- Parses code files using tree-sitter for accurate syntax analysis
- Indexes functions, structs, imports, exports, and relationships in memory
- Stores complete file path information for every code element for cross-file reference tracking
- Provides 6 standardized tools that coding assistants can call to query the codebase
- Enables AI systems to understand code structure without re-parsing
Every function, struct, import, export, and function call includes its originating file path:
{
"name": "parse_config",
"file_path": "src/config.rs", // ← Always included
"start_line": 45,
"signature": "pub fn parse_config(...)"
}Why This Matters for AI Coding Assistants:
- Cross-file Navigation: AI can track where functions are defined vs. where they're called
- Impact Analysis: Understand which files are affected by changes
- Context Awareness: AI knows the full location context when suggesting code changes
- Refactoring Safety: Track all references across the entire codebase
Before (v0.3.2 and earlier):
{
"name": "parse_config",
"signature": "pub fn parse_config(path: &str) -> Result<Config>",
"line_number": 45
}AI knows the function exists but not where it's located.
After (v0.4.2+):
{
"name": "parse_config",
"file_path": "src/config.rs", // ← Now included!
"signature": "pub fn parse_config(path: &str) -> Result<Config>",
"start_line": 45,
"end_line": 52
}AI knows exactly where the function is defined and can track cross-file relationships.
Real-World Impact:
- Question: "How is the Config struct used across the codebase?"
- Before: AI could find Config but couldn't tell you it's defined in
src/config.rsand used insrc/main.rs,src/api.rs, etc. - After: AI can now provide a complete cross-file usage analysis with specific file locations.
- ❌ Not an AI tool itself (provides data TO AI systems)
- ❌ Not a traditional code analysis tool (no linting, metrics, complexity analysis)
- ❌ Not a replacement for LSP servers (different use case)
Create AI-powered tools that understand your codebase:
# Your custom coding assistant
assistant = MyCodingAssistant(loregrep_instance)
answer = await assistant.ask("How do I use the Config struct?")
# Assistant can now search functions, analyze files, and understand relationshipsAdd structured code understanding to chatbots and AI applications:
- Before: AI sees code as plain text
- After: AI understands functions, classes, imports, and relationships
Build advanced code search beyond text matching:
# Find all functions that call a specific function
callers = await loregrep.execute_tool("find_callers", {
"function_name": "parse_config",
"include_context": True
})Understand codebase structure and complexity:
# Get complete repository overview
tree = await loregrep.execute_tool("get_repository_tree", {
"include_file_details": True,
"show_complexity": True
})| Language | Status | Functions | Structs/Classes | Imports | Calls |
|---|---|---|---|---|---|
| Rust | ✅ Full | ✅ | ✅ | ✅ | ✅ |
| Python | ✅ Full | ✅ | ✅ | ✅ | ✅ |
| TypeScript | 📋 Planned | 📋 | 📋 | 📋 | 📋 |
| JavaScript | 📋 Planned | 📋 | 📋 | 📋 | 📋 |
| Go | 📋 Future | 📋 | 📋 | 📋 | 📋 |
Want to contribute a language analyzer? See Contributing
Loregrep provides three convenient ways to create instances:
auto_discover()- Zero configuration, automatically detects project languagesbuilder()- Full control with fluent configuration methods- Project presets - Language-specific optimized configurations (
rust_project(),python_project(),polyglot_project())
use loregrep::{LoreGrep, ToolSchema};
use serde_json::json;
// Easiest way: zero-configuration auto-discovery
let mut loregrep = LoreGrep::auto_discover(".")?;
// Alternative: configure with builder pattern for fine control
// let mut loregrep = LoreGrep::builder()
// .with_rust_analyzer() // ✅ Rust analyzer registered
// .with_python_analyzer() // ✅ Python analyzer registered
// .max_file_size(1024 * 1024) // 1MB max file size
// .max_depth(10) // Maximum 10 directory levels
// .file_patterns(vec!["*.rs", "*.py"]) // Include only these files
// .exclude_patterns(vec!["target/", "node_modules/"]) // Skip these dirs
// .respect_gitignore(true) // Honor .gitignore
// .build()?;
// Scan repository (use "." for current directory)
let scan_result = loregrep.scan(".").await?;
println!("Found {} functions in {} files",
scan_result.functions_found, scan_result.files_scanned);
// Get available tools for LLM integration
let tools: Vec<ToolSchema> = LoreGrep::get_tool_definitions();
// Execute tools
let functions = loregrep.execute_tool("search_functions", json!({
"pattern": "parse.*config",
"limit": 10,
"include_private": false
})).await?;
let file_analysis = loregrep.execute_tool("analyze_file", json!({
"file_path": "src/config.rs", // Adjust path to match your project
"include_source": true,
"show_complexity": true
})).await?;import asyncio
import loregrep
import json
async def main():
# Easiest way: zero-configuration auto-discovery
loregrep_instance = loregrep.LoreGrep.auto_discover(".")
# Alternative: configure with builder pattern for fine control
# loregrep_instance = (loregrep.LoreGrep.builder()
# .with_rust_analyzer() # ✅ Rust analyzer registered
# .with_python_analyzer() # ✅ Python analyzer registered
# .max_file_size(1024 * 1024)
# .max_depth(10)
# .file_patterns(["*.py", "*.rs", "*.js"])
# .exclude_patterns(["__pycache__/", "node_modules/"])
# .respect_gitignore(True)
# .build())
# Scan repository (use "." for current directory)
scan_result = await loregrep_instance.scan(".")
print(f"Found {scan_result.functions_found} functions in {scan_result.files_scanned} files")
# Get available tools
tools = loregrep.LoreGrep.get_tool_definitions()
for tool in tools:
print(f"Tool: {tool.name} - {tool.description}")
# Execute tools
functions = await loregrep_instance.execute_tool("search_functions", {
"pattern": "parse.*config",
"limit": 10,
"include_private": False
})
file_analysis = await loregrep_instance.execute_tool("analyze_file", {
"file_path": "src/config.py", # Adjust path to match your project
"include_source": True,
"show_complexity": True
})
asyncio.run(main())Loregrep provides 6 standardized tools designed for LLM integration:
Find functions by name or pattern across the codebase.
Input:
{
"pattern": "parse.*config",
"limit": 10,
"include_private": false,
"file_pattern": "*.rs"
}Output:
{
"functions": [
{
"name": "parse_config",
"file_path": "src/config.rs",
"line_number": 45,
"signature": "pub fn parse_config(path: &str) -> Result<Config>",
"visibility": "public",
"complexity": 3,
"start_line": 45,
"end_line": 52
},
{
"name": "load_config",
"file_path": "src/main.rs",
"line_number": 23,
"signature": "fn load_config(file: &Path) -> Config",
"visibility": "private",
"complexity": 2,
"start_line": 23,
"end_line": 28
}
],
"total_found": 2
}Use Case: Find entry points, locate specific functionality, discover API patterns.
Find structures, classes, and types by name or pattern.
Input:
{
"pattern": "Config",
"limit": 5,
"include_fields": true
}Output:
{
"structs": [
{
"name": "Config",
"file_path": "src/config.rs",
"line_number": 12,
"fields": [
{"name": "name", "field_type": "String", "is_public": true},
{"name": "port", "field_type": "u16", "is_public": true},
{"name": "debug", "field_type": "bool", "is_public": true}
],
"visibility": "public",
"start_line": 12,
"end_line": 16
},
{
"name": "DatabaseConfig",
"file_path": "src/db/mod.rs",
"line_number": 8,
"fields": [
{"name": "url", "field_type": "String", "is_public": false},
{"name": "max_connections", "field_type": "usize", "is_public": false}
],
"visibility": "public",
"start_line": 8,
"end_line": 12
}
]
}Use Case: Understand data structures, find models, discover type definitions.
Get comprehensive analysis of a specific file.
Input:
{
"file_path": "src/config.rs",
"include_source": true,
"show_complexity": true
}Output:
{
"file_path": "src/config.rs",
"language": "rust",
"functions": [
{
"name": "parse_config",
"file_path": "src/config.rs",
"start_line": 45,
"end_line": 52,
"signature": "pub fn parse_config(path: &str) -> Result<Config>",
"is_public": true,
"is_async": false
}
],
"structs": [
{
"name": "Config",
"file_path": "src/config.rs",
"start_line": 12,
"end_line": 16,
"is_public": true,
"fields": [{"name": "port", "field_type": "u16", "is_public": true}]
}
],
"imports": [
{
"module_path": "std::fs",
"file_path": "src/config.rs",
"line_number": 1,
"imported_items": ["read_to_string"],
"is_external": true
}
],
"exports": [
{
"exported_item": "Config",
"file_path": "src/config.rs",
"line_number": 16,
"is_public": true
}
],
"complexity_score": 15,
"source_code": "// Full source if requested"
}Use Case: Deep dive into specific files, understand file structure, code review.
Find imports, exports, and dependencies for a file.
Input:
{
"file_path": "src/main.rs",
"include_external": true,
"show_usage": true
}Output:
{
"imports": [
{
"module_path": "./config",
"file_path": "src/main.rs",
"line_number": 2,
"imported_items": ["Config"],
"is_external": false,
"usage_count": 3
},
{
"module_path": "std::fs",
"file_path": "src/main.rs",
"line_number": 1,
"imported_items": ["read_to_string"],
"is_external": true,
"usage_count": 1
}
],
"exports": [
{
"exported_item": "main",
"file_path": "src/main.rs",
"line_number": 8,
"is_public": true,
"type": "function"
}
]
}Use Case: Understand module relationships, track dependencies, refactoring impact.
Find where specific functions are called.
Input:
{
"function_name": "parse_config",
"include_context": true,
"limit": 20
}Output:
{
"callers": [
{
"function_name": "parse_config",
"file_path": "src/main.rs",
"line_number": 23,
"column": 17,
"caller_function": "main",
"context": "let config = parse_config(&args.config_path)?;",
"is_method_call": false
},
{
"function_name": "parse_config",
"file_path": "src/utils/loader.rs",
"line_number": 45,
"column": 12,
"caller_function": "load_default_config",
"context": "return parse_config(DEFAULT_CONFIG_PATH);",
"is_method_call": false
}
]
}Use Case: Impact analysis, understand function usage, refactoring safety.
Get repository structure and overview.
Input:
{
"include_file_details": true,
"max_depth": 3,
"show_stats": true
}Output:
{
"structure": {
"src/": {
"main.rs": {
"functions": 1,
"lines": 45,
"file_path": "src/main.rs",
"function_details": [
{"name": "main", "file_path": "src/main.rs", "start_line": 8, "is_public": false}
]
},
"config.rs": {
"functions": 3,
"lines": 120,
"file_path": "src/config.rs",
"function_details": [
{"name": "parse_config", "file_path": "src/config.rs", "start_line": 45, "is_public": true},
{"name": "validate_config", "file_path": "src/config.rs", "start_line": 65, "is_public": false},
{"name": "default_config", "file_path": "src/config.rs", "start_line": 85, "is_public": true}
]
}
}
},
"stats": {
"total_files": 15,
"total_functions": 67,
"languages": ["rust"],
"cross_file_references": 23
}
}Use Case: Repository overview, architecture understanding, documentation generation.
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Code Files │───▶│ Tree-sitter │───▶│ In-Memory │
│ (.rs, .py, │ │ Parsing │ │ RepoMap │
│ .ts, etc.) │ │ │ │ Indexes │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│
▼
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Coding Assistant│◀───│ 6 Query Tools │◀───│ Fast Lookups │
│ (Claude, GPT, │ │ (search, analyze,│ │ (functions, │
│ Cursor, etc.) │ │ dependencies) │ │ structs, etc.)│
└─────────────────┘ └──────────────────┘ └─────────────────┘
LoreGrep: Main API facade with builder patternRepoMap: Fast in-memory indexes with lookup optimizationRepositoryScanner: File discovery with gitignore support- Language Analyzers: Tree-sitter based parsing (currently Rust only)
- Tool System: 6 standardized tools for AI integration
src/
├── lib.rs # Public API exports and documentation
├── loregrep.rs # Main LoreGrep struct and builder
├── core/ # Core types and errors
├── types/ # Data structures (FunctionSignature, etc.)
├── analyzers/ # Language-specific parsers
│ └── rust.rs # Rust analyzer (tree-sitter based)
├── storage/ # In-memory indexing
│ └── repo_map.rs # RepoMap with fast lookups
├── scanner/ # File discovery
├── internal/ # Internal CLI and AI implementation
│ ├── cli.rs # CLI application
│ ├── ai/ # Anthropic client and conversation
│ └── ui/ # Progress indicators and theming
└── cli_main.rs # CLI binary entry point
Memory Management:
- Indexes built in memory for fast access
- Thread-safe with
Arc<Mutex<>>design - Memory usage scales linearly with codebase size
- No external dependencies required at runtime
Error Handling:
Uses comprehensive error types from core::errors::LoreGrepError:
- IO errors (file access, permissions)
- Parse errors (malformed code)
- Configuration errors
- API errors (for AI integration)
The project includes comprehensive examples for different use cases:
basic_scan.rs- Simple repository scanning and basic queriestool_execution.rs- Complete LLM tool integration patternsfile_watcher.rs- File watching for automatic re-indexingcoding_assistant.rs- Full coding assistant implementationadvanced_queries.rs- Complex search and analysis patterns
basic_usage.py- Python API introduction and common patternsai_integration.py- Integration with OpenAI/Anthropic APIsweb_server.py- REST API wrapper for web applicationsbatch_analysis.py- Processing multiple repositories
Rust:
cargo run --example basic_scan -- /path/to/repo
cargo run --example coding_assistantPython:
cd python/examples
python basic_usage.py
python ai_integration.py- Rust 1.70 or later
- Python 3.8+ (for Python bindings)
- For AI integration tests: Anthropic API key
git clone https://github.com/Vasu014/loregrep.git
cd loregrep
# Build Rust library
cargo build --release
# Build Python package (requires maturin)
pip install maturin
maturin develop --features python# Run all tests
cargo test
# Run specific test suites
cargo test public_api_integration # Public API tests
cargo test cli::tests # CLI tests
cargo test storage::tests # Storage/indexing tests
# Run with output
cargo test -- --nocapture
# Test Python bindings
cd python && python -m pytestThe CLI is primarily for development and testing:
# Build development binary
cargo build --bin loregrep
# Basic commands for testing
./target/debug/loregrep scan .
./target/debug/loregrep search "parse" --type function
./target/debug/loregrep analyze src/main.rs- ✅ 60+ tests passing across core functionality
⚠️ 8 pre-existing test failures in older modules (technical debt)- ✅ 100% pass rate on new Phase 3B+ tests
We welcome contributions! Loregrep prioritizes the library API over CLI functionality - focus on core indexing and analysis capabilities that enable AI coding assistants.
Help expand beyond Rust to support more programming languages:
-
Python analyzer (
src/analyzers/python.rs)- Functions, classes, decorators
- Import/export analysis
- Method calls and inheritance
-
TypeScript analyzer (
src/analyzers/typescript.rs)- Interfaces, types, generics
- ES6+ features, async/await
- Module system analysis
-
JavaScript analyzer (
src/analyzers/javascript.rs)- Functions, classes, arrow functions
- CommonJS and ES6 modules
- Dynamic features handling
- Memory optimization for large repositories (>100k files)
- Incremental update detection when files change
- Query result caching improvements
- Parallel parsing optimizations
- Call graph visualization and analysis
- Dependency impact analysis
- Cross-language project support
- Code complexity metrics
-
Fork and Clone
git clone https://github.com/Vasu014/loregrep.git cd loregrep -
Create Feature Branch
git checkout -b feature/python-analyzer
-
Develop and Test
cargo build cargo test cargo clippy cargo fmt -
Test Integration
# Test CLI cargo build --bin loregrep ./target/debug/loregrep scan examples/ # Test public API cargo run --example basic_scan # Test Python bindings (if applicable) maturin develop --features python cd python && python examples/basic_usage.py
-
Submit Pull Request
- Include tests for new functionality
- Update documentation for public API changes
- Follow existing code style and patterns
- Use
rustfmtfor formatting:cargo fmt - Use
clippyfor linting:cargo clippy - Add comprehensive tests for new functionality
- Update documentation for any public API changes
- Follow existing module organization patterns
- Include examples for new features
- GitHub Issues: Bug reports and feature requests
- GitHub Discussions: Design discussions and questions
- Discord: Real-time chat (link in issues)
- TypeScript Analyzer: Complete TS/JS support including modern features
- JavaScript Analyzer: Full JavaScript support with ES6+ features
- Go Language Support: Package analysis and interface tracking
- Call Graph Analysis: Function call extraction and visualization
- Dependency Tracking: Advanced import/export analysis
- Incremental Updates: Smart re-indexing when files change
- Performance Improvements: 2x improvement in scanning speed for large repositories
- Memory Optimization: Improved handling of very large repositories
- Database Persistence: Optional disk-based storage for massive codebases
- Distributed Analysis: Support for monorepos and multi-service codebases
- MCP Server Integration: Standard Model Context Protocol interface
- Editor Integrations: VS Code and IntelliJ plugins
- API Stability: Stable public API with semantic versioning
Want to help with any roadmap item? Check out our Contributing Guide!
This project is dual-licensed under either:
Choose the license that best fits your use case. Most users prefer MIT for simplicity.
Ready to get started? Jump to Installation or try the Quick Start example!