AskYourDocs is a privacy-first, local-only CLI tool that transforms your document collections into an intelligent Q&A system. Using advanced RAG (Retrieval Augmented Generation) technology, it allows you to ask natural language questions about your documents and get accurate, contextual answers with source citations.
- π 100% Privacy: All processing happens locally, your documents never leave your machine
- π§ Intelligent Q&A: Ask natural language questions and get contextual answers
- π Multi-Format Support: PDF, Word, PowerPoint, Markdown, code files, and more
- β‘ Fast Retrieval: Hybrid search combining semantic and keyword matching
- π― Source Attribution: Every answer includes citations to source documents
- π Incremental Updates: Only processes changed files for efficiency
- π¨ Beautiful CLI: Rich terminal output with progress bars and colors
- βοΈ Highly Configurable: YAML-based configuration for all settings
# Basic installation (local models only)
pip install askyourdocs
# With remote LLM support
pip install askyourdocs[remote]
# With GPU acceleration
pip install askyourdocs[gpu]
# Full installation with all features
pip install askyourdocs[all]# Clone the repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs
# Install Poetry if you haven't already
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install
# Install with all extras for development
poetry install --extras "all"
# Run a basic command
poetry run askyourdocs --help# Clone the repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e .
# Or install with optional dependencies
pip install -e ".[gpu,remote,dev]"-
Install Ollama (for local LLM inference):
# macOS brew install ollama # Linux curl -fsSL https://ollama.ai/install.sh | sh # Windows (WSL) curl -fsSL https://ollama.ai/install.sh | sh
-
Start Ollama and download the default model:
# Start Ollama service ollama serve # In another terminal, download the default lightweight model ollama pull tinyllama:1.1b # Or download a more capable model (larger download) ollama pull llama3.1:8b
If you prefer to use remote LLM providers, you'll need API keys:
OpenAI Setup:
# Install with OpenAI support
pip install askyourdocs[openai]
# Set your API key
export OPENAI_API_KEY="your-api-key-here"
# Configure for OpenAI
askyourdocs config setup --provider openaiAnthropic Setup:
# 1. Install with remote provider support
pip install askyourdocs[remote]
# 2. Get your API key from https://console.anthropic.com/settings/keys
export ANTHROPIC_API_KEY="your-api-key-here"
# 3. Configure for Anthropic (recommended)
askyourdocs config setup --provider anthropicAzure OpenAI Setup:
# 1. Install with remote provider support
pip install askyourdocs[remote]
# 2. Set your credentials
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
# 3. Configure for Azure (recommended)
askyourdocs config setup --provider azure-
Index your documents:
# Index documents in current directory askyourdocs ingest # Index specific directory askyourdocs ingest ./my-documents # Index with progress and verbose output askyourdocs ingest ./docs --verbose
-
Ask questions:
# Ask a question askyourdocs ask "What are the main conclusions in the research papers?" # Ask with specific number of sources askyourdocs ask "How does the API authentication work?" --top-k 5 # Get detailed response with full sources askyourdocs ask "Summarize the project requirements" --verbose
-
Interactive mode:
# Start interactive session askyourdocs interactive # In interactive mode: > What is the project timeline? > Can you explain the technical architecture? > exit
-
Check system status:
# View system status and configuration askyourdocs status # Validate configuration askyourdocs config validate
-
Configuration management:
# Interactive setup askyourdocs config setup # View configuration askyourdocs config show # Set specific values askyourdocs config set model.temperature 0.2 askyourdocs config set retrieval.top_k 10
askyourdocs ingest [PATH] [OPTIONS]
# Examples:
askyourdocs ingest # Current directory
askyourdocs ingest ./documents # Specific path
askyourdocs ingest --include "*.pdf,*.md" # Filter file types
askyourdocs ingest --exclude "temp/*" # Exclude patterns
askyourdocs ingest --force # Rebuild entire index
askyourdocs ingest --watch # Watch for changesOptions:
--include TEXT: File patterns to include (e.g., ".pdf,.docx")--exclude TEXT: File patterns to exclude (e.g., "temp/,.log")--force: Force rebuild of entire index--watch: Watch directory for changes and auto-update--chunk-size INTEGER: Override chunk size for processing--verbose: Show detailed processing information
askyourdocs ask "your question" [OPTIONS]
# Examples:
askyourdocs ask "What is the main thesis?"
askyourdocs ask "How do I configure the database?" --top-k 5
askyourdocs ask "Summarize key findings" --mode compact
askyourdocs ask "What are the requirements?" --streamOptions:
--top-k INTEGER: Number of relevant chunks to retrieve (default: 5)--mode TEXT: Response mode (compact/tree_summarize/accumulate)--stream: Stream response as it's generated--no-sources: Don't show source citations--threshold FLOAT: Similarity threshold for retrieval (0.0-1.0)
askyourdocs search "keyword" [OPTIONS]
# Examples:
askyourdocs search "authentication"
askyourdocs search "machine learning" --limit 10
askyourdocs search "API" --format jsonaskyourdocs refresh [OPTIONS]
# Examples:
askyourdocs refresh # Rebuild current index
askyourdocs refresh --reset # Delete and rebuild from scratch
askyourdocs refresh --optimize # Optimize vector storeaskyourdocs status
# Example output:
π AskYourDocs Status
βββ π Documents: 1,247 files indexed
βββ π§© Chunks: 5,834 text chunks
βββ πΎ Storage: 156.7 MB vector data
βββ π§ Model: llama3.1:8b (Ollama)
βββ π Embeddings: BAAI/bge-small-en-v1.5
βββ βοΈ Config: ~/.config/askyourdocs/config.yamlaskyourdocs config [COMMAND] [OPTIONS]
# View current configuration
askyourdocs config show
askyourdocs config show --format yaml
askyourdocs config show --section model
# Set configuration values
askyourdocs config set model.name llama3.1:8b
askyourdocs config set chunking.chunk_size 1500
askyourdocs config set embedding.model "sentence-transformers/all-MiniLM-L6-v2"
# Interactive setup
askyourdocs config setup
askyourdocs config setup --provider openai
# Validate configuration
askyourdocs config validate
# Reset to defaults
askyourdocs config reset
# Show configuration file location
askyourdocs config pathaskyourdocs interactive [OPTIONS]
# Start interactive session with custom settings
askyourdocs interactive --top-k 3 --streamaskyourdocs export --output backup.tar.gz
askyourdocs export --output backup.tar.gz --include-configaskyourdocs import --input backup.tar.gz
askyourdocs import --input backup.tar.gz --mergeAskYourDocs uses a YAML configuration file located at ~/.config/askyourdocs/config.yaml. You can customize all aspects of the tool:
model:
provider: "ollama" # Local Ollama server
name: "tinyllama:1.1b" # Lightweight model (fast, good for most tasks)
base_url: "http://localhost:11434"
temperature: 0.1 # Response creativity (0.0-2.0)
max_tokens: 2048 # Maximum response length
embedding:
provider: "huggingface" # Local embeddings
model: "BAAI/bge-small-en-v1.5" # Fast, accurate embeddings
device: "cpu" # cpu/cuda/mps/autoSetup Command: askyourdocs config setup --provider ollama
OpenAI Configuration:
model:
provider: "openai"
name: "gpt-4" # or gpt-3.5-turbo
api_key: "sk-your-key-here" # Or set OPENAI_API_KEY env var
temperature: 0.1
max_tokens: 2048
embedding:
provider: "openai" # Optional: use OpenAI embeddings
model: "text-embedding-3-small"
api_key: "sk-your-key-here"Setup Command: askyourdocs config setup --provider openai
Anthropic Configuration:
model:
provider: "anthropic"
name: "claude-3-5-sonnet-20241022" # Latest Claude model
api_key: "sk-ant-your-key-here" # Or set ANTHROPIC_API_KEY env var
temperature: 0.1
max_tokens: 2048
embedding:
provider: "huggingface" # Keep local embeddings for privacy
model: "BAAI/bge-small-en-v1.5"Setup Command: askyourdocs config setup --provider anthropic
Azure OpenAI Configuration:
model:
provider: "azure"
name: "gpt-4"
api_key: "your-azure-key"
azure_endpoint: "https://your-resource.openai.azure.com/"
azure_deployment: "your-deployment-name"Setup Command: askyourdocs config setup --provider azure
Document Processing:
chunking:
strategy: "sentence" # sentence/recursive/semantic/fixed
chunk_size: 1000 # Characters per chunk (100-8000)
chunk_overlap: 200 # Overlap between chunks
respect_boundaries: true # Respect sentence/paragraph boundaries
min_chunk_size: 100 # Minimum chunk sizeRetrieval Settings:
retrieval:
top_k: 5 # Number of chunks to retrieve (1-50)
similarity_threshold: 0.7 # Minimum similarity score (0.0-1.0)
rerank: true # Re-rank results for better relevance
retrieval_mode: "hybrid" # vector/keyword/hybrid
max_context_length: 4000 # Maximum context for LLMStorage Settings:
storage:
backend: "chromadb" # Vector database backend
path: ".askyourdocs" # Storage directory
compression: true # Enable compression
collection_name: "documents" # Collection name# 1. Install and setup
pip install askyourdocs
ollama serve # In one terminal
ollama pull tinyllama:1.1b # In another terminal
# 2. Index your documents
askyourdocs ingest ./my-documents
# 3. Ask questions
askyourdocs ask "What are the key findings?"# 1. Install with remote provider support
pip install askyourdocs[remote]
# 2. Set up OpenAI API key
export OPENAI_API_KEY="your-api-key"
# 3. Configure for OpenAI
askyourdocs config setup --provider openai
# 4. Index and query documents
askyourdocs ingest ./documents
askyourdocs ask "What are the key findings in these documents?"
# 5. Verify setup
askyourdocs status# Index your research papers
askyourdocs ingest ./research-papers --include "*.pdf"
# Ask analytical questions
askyourdocs ask "What are the common methodologies across these studies?"
askyourdocs ask "Which papers mention transformer architecture?"
askyourdocs ask "Summarize the key findings about neural networks"# Index your codebase documentation
askyourdocs ingest ./docs --include "*.md,*.rst"
# Query your docs
askyourdocs ask "How do I set up authentication?"
askyourdocs ask "What are the API rate limits?"
askyourdocs ask "Show me examples of database configuration"# Index contracts and legal docs
askyourdocs ingest ./legal --include "*.pdf,*.docx"
# Ask specific questions
askyourdocs ask "What are the termination clauses?"
askyourdocs ask "What payment terms are specified?"
askyourdocs ask "Are there any liability limitations?"
# Query specific contract types
askyourdocs ask "What are the key terms?" --path ./employment-contracts
askyourdocs ask "What are the renewal conditions in ./service-agreements?"AskYourDocs supports querying specific paths, with automatic ingestion if needed:
# Method 1: Using --path option
askyourdocs ask "What are the main topics?" --path ./research-papers
# Method 2: Include path in question
askyourdocs ask "What are the key findings in ./data-analysis?"
# Auto-ingestion: If path isn't indexed, it will be ingested automatically
askyourdocs ask "Summarize the content" --path ./new-documents
# Multiple path queries
askyourdocs ask "Compare findings in ./study-a vs ./study-b"# Switch to different providers (recommended method)
askyourdocs config setup --provider ollama
askyourdocs config setup --provider openai
askyourdocs config setup --provider anthropic
askyourdocs config setup --provider azure
# Interactive setup (choose provider during setup)
askyourdocs config setup
# Advanced: Direct configuration (for automation/scripts)
askyourdocs config set chunking.chunk_size 1500
askyourdocs config set embedding.device "cuda"
askyourdocs config set retrieval.top_k 10
# View current configuration
askyourdocs config show
# Validate configuration
askyourdocs config validate# Check system status
askyourdocs status
# Refresh index (incremental)
askyourdocs refresh
# Full rebuild (when changing chunk settings)
askyourdocs refresh --reset
# Optimize vector store
askyourdocs refresh --optimize# Create backup
askyourdocs export --output documents-backup.tar.gz --include-config
# Restore from backup
askyourdocs import --input documents-backup.tar.gz
# Merge with existing index
askyourdocs import --input additional-docs.tar.gz --merge| Category | Formats | Extensions |
|---|---|---|
| Documents | PDF, Word, PowerPoint, OpenDocument | .pdf, .docx, .pptx, .odt, .odp |
| Text | Plain text, Markdown, reStructuredText | .txt, .md, .rst, .csv |
| Code | Source code, configuration files | .py, .js, .java, .cpp, .yaml, .json |
| Structured | HTML, XML, LaTeX, Jupyter | .html, .xml, .tex, .ipynb |
AskYourDocs uses a modern RAG architecture:
- Document Ingestion: Files are processed and split into semantic chunks
- Embedding Generation: Text chunks are converted to vector embeddings
- Vector Storage: ChromaDB stores embeddings with metadata for fast retrieval
- Query Processing: User questions are embedded and matched against stored vectors
- Context Retrieval: Most relevant chunks are retrieved based on similarity
- Response Generation: Local LLM generates answers using retrieved context
- Local Processing: All operations happen on your machine
- No Data Transmission: Documents never leave your environment
- Secure Storage: Vector data stored locally with optional encryption
- No Telemetry: Zero tracking or analytics
- Open Source: Full transparency with auditable code
"Configuration issues found"
# Check what's wrong
askyourdocs status
askyourdocs config validate
# Fix with interactive setup (recommended)
askyourdocs config setup"Ollama connection failed"
# Check if Ollama is running
ollama list
# Start Ollama if not running
ollama serve
# Test connection
curl http://localhost:11434/api/tags
# Download the default model
ollama pull tinyllama:1.1b
# List available models
ollama list"No documents found"
# Check current directory
askyourdocs ingest --verbose
# Specify path explicitly
askyourdocs ingest /path/to/documents
# Check supported formats
askyourdocs ingest --include "*.pdf,*.docx,*.txt""Embedding model download failed"
# Check internet connection and try again
askyourdocs refresh
# Use different model
askyourdocs config set embedding.model "sentence-transformers/all-MiniLM-L6-v2""API key not found" (for remote providers)
# Set environment variable first
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
export OPENAI_API_KEY="your-openai-key"
export AZURE_OPENAI_API_KEY="your-azure-key"
# Then configure provider (recommended)
askyourdocs config setup --provider anthropic
askyourdocs config setup --provider openai
askyourdocs config setup --provider azure
# Verify configuration
askyourdocs config validate
askyourdocs statusPerformance Issues
# Reduce chunk size
askyourdocs config set chunking.chunk_size 800
# Reduce batch size
askyourdocs config set embedding.batch_size 16
# Optimize storage
askyourdocs refresh --optimize
# Switch to lighter model
askyourdocs config set model.name "tinyllama:1.1b"
# Use GPU acceleration (if available)
askyourdocs config set embedding.device "cuda"# Show general help
askyourdocs --help
# Show command-specific help
askyourdocs ask --help
askyourdocs ingest --help
# Show current configuration
askyourdocs config show
# Check system status
askyourdocs status# Clone repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs
# Install Poetry
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install --extras "all"
# Run a basic command
poetry run askyourdocs --help
# Install pre-commit hooks
pre-commit install# Clone repository
git clone https://github.com/lincmba/askyourdocs.git
cd askyourdocs
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install with development dependencies
pip install -e ".[dev,gpu,remote]"
# Install pre-commit hooks
pre-commit install# Run with coverage
poetry run pytest
# or: pytest
# Run with coverage
poetry run pytest --cov=askyourdocs
# or: pytest --cov=askyourdocs
# Format code
poetry run black src/ tests/
poetry run ruff check src/ tests/
# Type checking
poetry run mypy src/
# Run all quality checks
poetry run pre-commit run --all-files
# Build package
poetry build
# Install locally for testing
poetry installNote: Local models require initial download but then work offline. Remote models require internet and API costs.
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run the test suite
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- LlamaIndex: For the excellent RAG framework
- ChromaDB: For fast vector storage
- Ollama: For local LLM inference
- Rich: For beautiful terminal output
- Click: For the CLI framework
- π§ Email: lincolncmba@gmail.com