Docify - Local-First AI Second Brain

Your personal research assistant that remembers everything you've ever read.

Docify is an open-source, local-first AI application that lets you upload any resource (PDFs, URLs, documents, images, code), ask questions about them, and receive cited, grounded answers, all while keeping your data completely private.

✨ Key Features

  • 🔒 Privacy-First: All processing happens locally (embeddings, LLM, storage)
  • 🧠 Smart Deduplication: Content-based fingerprinting prevents duplicate processing
  • 📚 Multi-Format Support: PDF, URL, Word, Excel, Markdown, images (OCR), code, and more
  • 💬 Cited Answers: Every response includes citations to source documents
  • 🔍 Hybrid Search: Combines semantic (vector) and keyword (BM25) search
  • 🤖 Local LLM: Runs Mistral 7B via Ollama (optional cloud LLM support)
  • 🌐 Workspace Model: Personal, team, or hybrid collaboration
  • 🚀 One-Command Setup: Docker Compose orchestration
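
The content-based deduplication mentioned above can be sketched as hashing normalized text rather than raw bytes, so re-uploading the same document under a different filename is caught. This is a minimal illustration, not Docify's actual implementation; the function names (`content_fingerprint`, `is_duplicate`) are hypothetical:

```python
import hashlib

def content_fingerprint(text: str) -> str:
    """Hash normalized content so formatting/filename changes don't matter."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

seen: set[str] = set()

def is_duplicate(text: str) -> bool:
    """Return True if equivalent content was already ingested."""
    fp = content_fingerprint(text)
    if fp in seen:
        return True
    seen.add(fp)
    return False
```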

πŸ—οΈ Architecture Overview

Docify's RAG pipeline integrates 11 core services:

  1. Resource Ingestion - Upload, parse, deduplicate
  2. Chunking - Semantic boundary preservation
  3. Embeddings (Async) - Vector generation via Celery
  4. Query Expansion - Better recall with variants
  5. Hybrid Search - Semantic + keyword (BM25)
  6. Re-Ranking - 5-factor scoring + conflict detection
  7. Context Assembly - Token budget management
  8. Prompt Engineering - Anti-hallucination prompts
  9. LLM Service - Ollama/OpenAI/Anthropic support
  10. Citation Verification - Verify claims against sources
  11. Message Generation - Full pipeline orchestration
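
A common way to sketch step 5, fusing the semantic and BM25 result lists into one ranking, is reciprocal rank fusion. Docify's own re-ranking uses 5-factor scoring, so treat this as a generic illustration of hybrid-search fusion rather than the project's actual algorithm:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists; k dampens the weight of top ranks."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc3", "doc1", "doc2"]  # vector-similarity order
keyword = ["doc1", "doc4", "doc3"]   # BM25 order
fused = reciprocal_rank_fusion([semantic, keyword])
```

`doc1` wins here because it ranks highly in both lists, which is the behavior hybrid search is after.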

See ARCHITECTURE.md for complete technical details.
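
Step 7's token budget management can be sketched as greedily packing the highest-ranked chunks until the budget is exhausted. The chars-per-token heuristic below is an assumption for illustration; the real pipeline would likely count tokens with the model's tokenizer:

```python
def assemble_context(chunks, budget_tokens=2048, est_chars_per_token=4):
    """Greedily pack relevance-sorted chunks into a token budget."""
    context, used = [], 0
    for chunk in chunks:  # assumed pre-sorted, most relevant first
        cost = len(chunk) // est_chars_per_token + 1
        if used + cost > budget_tokens:
            break
        context.append(chunk)
        used += cost
    return "\n\n".join(context)
```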

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose
  • 8GB RAM minimum (16GB recommended)
  • 20GB disk space (for models and data)

Docker Setup (Recommended)

# Clone the repository
git clone https://github.com/keshavashiya/docify.git
cd docify

# Copy environment configuration
cp .env.example .env

# Start all services
docker-compose up -d --build

# Wait for services to be healthy (~2-3 minutes)
docker-compose ps

# Initialize database (one-time setup)
docker-compose exec postgres psql -U docify -d docify -c "CREATE EXTENSION IF NOT EXISTS vector"
docker-compose exec backend alembic upgrade head

# Download optimized models (one-time, ~2GB total)
docker-compose exec ollama ollama pull mistral:7b-instruct-q4_0
docker-compose exec ollama ollama pull all-minilm:22m

# Restart services with models loaded
docker-compose restart backend celery-worker

Verify Setup

# Check if all containers are running
docker-compose ps

# Test API health
curl http://localhost:8000/api/health

# Monitor system resources
docker stats docify-ollama docify-backend

# View logs
docker-compose logs -f backend
docker-compose logs -f celery-worker

Access

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:8000
  • API docs (Swagger UI): http://localhost:8000/docs

πŸ› οΈ Local Development

Backend (Python/FastAPI)

cd backend

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start development server (requires running docker-compose services)
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Frontend (React/TypeScript)

cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

📦 Tech Stack

Backend

  • FastAPI (Python 3.10+)
  • PostgreSQL 15+ with pgvector
  • Celery + Redis (async tasks)
  • Ollama (local LLM: mistral:7b-instruct-q4_0, all-minilm:22m)
  • sentence-transformers (embeddings)
  • Optional cloud LLM support (OpenAI/Anthropic)

Frontend

  • React 18+ with TypeScript
  • Vite, Tailwind CSS
  • React Query, Zustand

Infrastructure

  • Docker & Docker Compose
  • Alembic (database migrations)

📖 API Usage

Upload a Resource

curl -X POST "http://localhost:8000/api/resources/upload" \
  -F "file=@research_paper.pdf" \
  -F "workspace_id=<your-workspace-id>"

Search

curl -X POST "http://localhost:8000/api/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is RAG?", "workspace_id": "<id>"}'

Ask Questions

curl -X POST "http://localhost:8000/api/conversations/<id>/messages" \
  -H "Content-Type: application/json" \
  -d '{"content": "Explain the main findings", "role": "user"}'

🐳 Docker & Troubleshooting

Common Commands

# Start all services
docker-compose up -d

# View logs (all services)
docker-compose logs -f

# View logs for specific service
docker-compose logs -f backend
docker-compose logs -f celery-worker

# Stop all services
docker-compose down

# Stop and remove data (WARNING: deletes all data)
docker-compose down -v

# Restart specific service
docker-compose restart backend

Port Conflicts

If you get "port already in use" errors:

# PostgreSQL: Docify uses 5433 (standard is 5432)
# Redis: Docify uses 6380 (standard is 6379)
# Backend: Docify uses 8000
# Frontend: Docify uses 3000
# Ollama: Docify uses 11434

# Check what's using a port (macOS/Linux)
lsof -i :8000

# Kill process (if needed)
kill -9 <PID>

Manual API Testing

Use the built-in API documentation:

  • Open http://localhost:8000/docs in your browser
  • Try requests directly in Swagger UI
  • All endpoints are documented with request/response schemas

Alternatively, use curl:

# Health check
curl http://localhost:8000/api/health

# List workspaces
curl http://localhost:8000/api/workspaces

# Create workspace
curl -X POST http://localhost:8000/api/workspaces \
  -H "Content-Type: application/json" \
  -d '{"name":"My Workspace","workspace_type":"personal"}'

📄 License

MIT License - see LICENSE file for details

πŸ™ Acknowledgments


Made with ❤️ for researchers, students, and knowledge workers
