⚠️ Important Notice
🚧 This project is currently under active development 🚧
Thoth is not yet ready for production use. The API and features are subject to change without notice. We recommend against using this in production environments until a stable release is available.
Inspired by Thoth, the ancient Egyptian god of wisdom, writing, and knowledge, this project aims to be a reliable and intelligent storage solution for modern applications.
Thoth is an open-source, self-hosted storage solution designed for companies that want S3-like capabilities on their own infrastructure without the complexity of cloud services. Our goal is to provide a simple, no-code solution that lets organizations manage their files on-premises with minimal setup and maintenance.
- Built-in File Validation: Automatic checks for file size, type, and security
- Pre-configured Rules: Set up validation rules without writing any code
- Custom Processing: Define workflows for file processing on upload
- Security First: Built-in scanning and verification for safe file storage
Thoth offers the power of enterprise-grade object storage with the simplicity of a plug-and-play solution, making it perfect for businesses that need reliable, secure file storage without the overhead of cloud services or custom development.
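To make the validation idea concrete, here is a minimal sketch of what a size rule and a type rule could look like in Java. The names `FileValidationRule`, `MaxSizeRule`, `AllowedTypeRule`, and `UploadValidator` are hypothetical and are not taken from Thoth's codebase; in Thoth such rules are meant to be configured rather than hand-written.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical rule abstraction: returns an error message when the upload is rejected.
interface FileValidationRule {
    Optional<String> validate(String fileName, String contentType, long sizeInBytes);
}

// Rejects uploads larger than a configured limit.
record MaxSizeRule(long maxBytes) implements FileValidationRule {
    public Optional<String> validate(String fileName, String contentType, long sizeInBytes) {
        return sizeInBytes > maxBytes
                ? Optional.of(fileName + " exceeds " + maxBytes + " bytes")
                : Optional.empty();
    }
}

// Rejects uploads whose declared content type is not on the allow-list.
record AllowedTypeRule(Set<String> allowedTypes) implements FileValidationRule {
    public Optional<String> validate(String fileName, String contentType, long sizeInBytes) {
        return allowedTypes.contains(contentType)
                ? Optional.empty()
                : Optional.of(fileName + " has disallowed type " + contentType);
    }
}

final class UploadValidator {
    private UploadValidator() {}

    // Runs every rule and collects the error messages; an empty list means the upload passes.
    static List<String> check(List<FileValidationRule> rules,
                              String fileName, String contentType, long sizeInBytes) {
        return rules.stream()
                .map(rule -> rule.validate(fileName, contentType, sizeInBytes))
                .flatMap(Optional::stream)
                .toList();
    }
}
```

Collecting every error, rather than stopping at the first one, lets a client fix all problems with an upload in a single round trip.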
Thoth is built using Hexagonal Architecture (Ports & Adapters) to ensure:
- Maintainability: Clear separation of concerns
- Testability: Easy to write unit and integration tests
- Flexibility: Swap components without affecting the core business logic
- Scalability: Designed to grow with your needs
graph TB
%% External Layer
Client[Client Applications]
Ollama[Ollama AI Service<br/>🤖 llama3.2 + nomic-embed-text]
%% Application Layer
subgraph "Thoth Application"
API[REST API<br/>📡 Controllers & Swagger]
Domain[Domain Logic<br/>🧠 Use Cases & Services]
Storage[Storage Layer<br/>💾 File System + Vector Store]
end
%% Database Layer
PostgreSQL[(PostgreSQL 17<br/>🗄️ + pgvector extension)]
%% Main Flows
Client --> API
API --> Domain
Domain --> Storage
Storage --> PostgreSQL
%% AI Integration
Domain --> Ollama
Ollama --> Storage
%% Styling
classDef external fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef application fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef database fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
class Client,Ollama external
class API,Domain,Storage application
class PostgreSQL database
Document ingestion:
Client → API → Domain Logic → Storage Layer
↓
Ollama (Embedding) → Vector Store → PostgreSQL

RAG query:
Client → API → Domain Logic → Vector Store (Search)
↓
Ollama (Chat) → Response

File storage:
Client → API → Domain Logic → Storage Layer → PostgreSQL
- API Layer: RESTful API for client interactions
- Domain Layer: Core business logic and entities
- Infrastructure Layer: Implementation details (storage, database, etc.)
- Ports: Interfaces that define the application's boundaries
- Adapters: Concrete implementations of the ports
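The relationship between ports and adapters can be sketched as follows. This is an illustrative example only: `ObjectStorePort`, `InMemoryObjectStore`, and `UploadObjectService` are hypothetical names, not classes from Thoth's source tree.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Output port (hypothetical): the domain states what it needs, not how it is done.
interface ObjectStorePort {
    void put(String bucket, String key, byte[] content);
    Optional<byte[]> get(String bucket, String key);
}

// Adapter (hypothetical): an in-memory implementation, handy for unit tests.
final class InMemoryObjectStore implements ObjectStorePort {
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();

    @Override
    public void put(String bucket, String key, byte[] content) {
        // Defensive copy so later changes to the caller's array do not leak in.
        objects.put(bucket + "/" + key, content.clone());
    }

    @Override
    public Optional<byte[]> get(String bucket, String key) {
        return Optional.ofNullable(objects.get(bucket + "/" + key))
                .map(bytes -> bytes.clone());
    }
}

// Domain service: depends only on the port, so adapters can be swapped freely.
final class UploadObjectService {
    private final ObjectStorePort store;

    UploadObjectService(ObjectStorePort store) {
        this.store = store;
    }

    void upload(String bucket, String key, byte[] content) {
        if (content.length == 0) {
            throw new IllegalArgumentException("Refusing to store an empty object");
        }
        store.put(bucket, key, content);
    }
}
```

Because `UploadObjectService` only sees the interface, a file-system or database adapter could replace the in-memory one without touching the domain code, which is the flexibility the architecture is aiming for.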
- Object Storage: Store and retrieve any type of file or binary data
- Bucket Management: Organize your data into logical containers
- Namespace Support: Multi-tenancy support for different organizations or teams
- Extensible Functions: Custom processing for stored objects
- REST API: Simple and consistent API for integration
- Self-hosted: Full control over your data
- Document Processing: Automatic text extraction and chunking
- Vector Search: Semantic similarity search using embeddings
- RAG (Retrieval-Augmented Generation): AI-powered document querying
- Intelligent Storage: AI-assisted file organization and retrieval
- Multi-Model Support: Separate models for chat and embeddings
- Vector Database: PostgreSQL with pgvector extension for efficient vector operations
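As an illustration of the chunking step mentioned under Document Processing, the sketch below splits text into fixed-size, overlapping windows. It is an assumption about how chunking might work, not Thoth's actual algorithm; `TextChunker` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class TextChunker {
    private TextChunker() {}

    // Splits text into windows of at most chunkSize characters.
    // Consecutive windows share `overlap` characters so context is not cut abruptly.
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("Require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }
}
```

For example, `TextChunker.chunk("abcdefghij", 4, 1)` returns `[abcd, defg, ghij]`. Each chunk would then be embedded and stored separately, so a query can match the relevant passage instead of the whole document.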
- Language: Java 24
- Framework: Spring Boot 3.5.0
- Build Tool: Maven
- Database: PostgreSQL 17 with pgvector extension
- AI Integration: Spring AI 1.0.1
- AI Models:
  - Chat Model: Llama 3.2 (latest)
  - Embedding Model: nomic-embed-text (768 dimensions)
- Testing: JUnit 5, Testcontainers, AssertJ, Mockito
- Containerization: Docker
Thoth uses two separate AI models for different purposes:
Chat model (llama3.2:latest):
- Purpose: Text generation, conversation, and RAG responses
- Usage: AI assistant interactions and document query processing
- Model Size: ~4.7GB
Embedding model (nomic-embed-text):
- Purpose: Convert text to numerical vectors for similarity search
- Dimensions: 768 (must match the vector dimension configured in the database schema)
- Model Size: ~274MB
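To show why the embedding dimension matters, here is a small sketch of cosine similarity, one common way to compare embedding vectors. In Thoth the comparison is delegated to pgvector inside PostgreSQL, so this plain-Java version is only an illustration; `VectorMath` is a hypothetical helper.

```java
final class VectorMath {
    private VectorMath() {}

    // Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones.
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            // Vectors from different models (for example 768 vs. 1024 dimensions) cannot be compared.
            throw new IllegalArgumentException(
                    "Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

The length check mirrors the constraint in the database: a vector column sized for 768 dimensions cannot hold or compare vectors of any other length, so changing the embedding model generally means changing the schema as well.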
- Install Ollama:
  # macOS
  brew install ollama
  # Linux
  curl -fsSL https://ollama.ai/install.sh | sh
  # Windows
  # Download from https://ollama.ai/download
- Start Ollama Service:
  ollama serve
- Pull Required Models:
  # Pull chat model
  ollama pull llama3.2:latest
  # Pull embedding model
  ollama pull nomic-embed-text
- Verify Models:
  ollama list
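The same verification can be done programmatically. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models, and reports which required models are missing. `OllamaModelCheck` is a hypothetical helper, and the substring match is a deliberate simplification; a real implementation would parse the JSON.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

final class OllamaModelCheck {
    private OllamaModelCheck() {}

    // Naive check: looks for each quoted model name prefix in the raw JSON body.
    static List<String> missingModels(String tagsJson, List<String> required) {
        return required.stream()
                .filter(model -> !tagsJson.contains("\"" + model))
                .toList();
    }

    // Fetches the raw model list from a running Ollama instance.
    static String fetchTags(String baseUrl) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/tags"))
                .GET()
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }
}
```

With Ollama running locally, `OllamaModelCheck.missingModels(OllamaModelCheck.fetchTags("http://localhost:11434"), List.of("llama3.2", "nomic-embed-text"))` should return an empty list once both models have been pulled.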
# Run Ollama in Docker
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Pull models
docker exec -it ollama ollama pull llama3.2:latest
docker exec -it ollama ollama pull nomic-embed-text

The models are configured in application.properties:
# AI Model Configuration
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.embedding.model=nomic-embed-text
spring.ai.ollama.chat.model=llama3.2:latest
spring.ai.ollama.vectorstore.pgvector.dimensions=768

For local development, we provide a development-dependencies.yml file that sets up all necessary services using Docker Compose. This includes:
- PostgreSQL 17 with pgvector: The primary database with vector support
- pgAdmin 4: Web-based database management tool
To start the development environment:
docker-compose -f development-dependencies.yml up -d

Access the services at:
- PostgreSQL: localhost:5432
- pgAdmin: http://localhost:5050 (email: admin@admin.com, password: admin)
Thoth leverages Testcontainers for reliable integration testing. This allows us to:
- Run tests against real database instances
- Ensure consistent test environments
- Test database migrations and queries with actual PostgreSQL
- Isolate tests using containerized dependencies
Testcontainers automatically manages the lifecycle of Docker containers, spinning up fresh instances for each test class and tearing them down afterward.
To run the tests:
mvn test

Prerequisites:
- Java 24 or higher
- Maven 3.6+
- PostgreSQL 17+ with pgvector extension
- Ollama (for AI models)
- Docker (optional, for containerized deployment)
- Minimum 8GB RAM (for AI model processing)
- Minimum 10GB free disk space (for AI models)
- Clone the repository:
  git clone https://github.com/Mahmoud02/thoth.git
  cd thoth
- Start PostgreSQL with pgvector:
  docker-compose -f development-dependencies.yml up -d
- Install and start Ollama:
  # Install Ollama
  curl -fsSL https://ollama.ai/install.sh | sh
  # Start Ollama service
  ollama serve
  # Pull required models
  ollama pull llama3.2:latest
  ollama pull nomic-embed-text
- Build the project:
  mvn clean install
- Run the application:
  mvn spring-boot:run
- Verify installation:
  - API Documentation: http://localhost:8080/swagger-ui.html
  - Health Check: http://localhost:8080/actuator/health
For a containerized deployment, run:
docker-compose up -d

Once the application is running, you can access the API documentation at:
- Swagger UI: http://localhost:8080/swagger-ui.html
- OpenAPI Docs: http://localhost:8080/v3/api-docs
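When scripting a deployment, it can help to wait until the health endpoint at http://localhost:8080/actuator/health reports `UP` before calling the API. The following is a minimal polling sketch under that assumption; `HealthProbe` is a hypothetical helper and not part of Thoth.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.Supplier;

final class HealthProbe {
    private HealthProbe() {}

    // Spring Boot Actuator answers {"status":"UP"} (optionally followed by details) when healthy.
    static boolean isUp(String body) {
        return body != null
                && body.replaceAll("\\s", "").startsWith("{\"status\":\"UP\"");
    }

    // Calls the probe up to maxAttempts times, sleeping between failed attempts.
    static boolean waitUntilUp(Supplier<String> probe, int maxAttempts, Duration delay) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isUp(probe.get())) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return false;
                }
            }
        }
        return false;
    }

    // Probe backed by a real HTTP call; yields null when the server is unreachable.
    static Supplier<String> httpProbe(String url) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        return () -> {
            try {
                return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt();
                }
                return null;
            }
        };
    }
}
```

A typical call would be `HealthProbe.waitUntilUp(HealthProbe.httpProbe("http://localhost:8080/actuator/health"), 30, Duration.ofSeconds(1))`. Passing the probe as a `Supplier` keeps the retry logic testable without a running server.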
src/
├── main/
│   ├── java/
│   │   └── com/mahmoud/thoth/
│   │       ├── ai/                     # AI Integration Layer
│   │       │   ├── config/             # AI Configuration (Ollama)
│   │       │   ├── controller/         # AI Controllers (RAG, ThothAI)
│   │       │   ├── dto/                # AI DTOs (Query, Response)
│   │       │   └── service/            # AI Services (Document Processing, RAG)
│   │       ├── api/                    # REST API Layer
│   │       │   ├── controller/v1/      # Versioned Controllers
│   │       │   ├── doc/                # API Documentation
│   │       │   ├── dto/                # API DTOs
│   │       │   └── mapper/             # Object Mappers
│   │       ├── config/                 # Application Configuration
│   │       ├── domain/                 # Core Business Logic (Hexagonal Architecture)
│   │       │   ├── model/              # Domain Models
│   │       │   ├── port/               # Ports (Interfaces)
│   │       │   │   ├── in/             # Input Ports (Commands/Queries)
│   │       │   │   └── out/            # Output Ports (Repositories)
│   │       │   └── service/            # Domain Services (Use Cases)
│   │       ├── function/               # Bucket Functions System
│   │       │   ├── annotation/         # Function Metadata Annotations
│   │       │   ├── config/             # Function Configuration
│   │       │   ├── exception/          # Function Exceptions
│   │       │   ├── factory/            # Function Factory
│   │       │   └── impl/               # Function Implementations
│   │       ├── infrastructure/         # Infrastructure Layer
│   │       │   ├── repository/         # Repository Adapters
│   │       │   ├── store/              # Storage Implementations
│   │       │   │   └── impl/sqlite/
│   │       │   │       ├── converter/  # JSONB Converters
│   │       │   │       ├── entity/     # Database Entities
│   │       │   │       └── repository/ # SQLite Repositories
│   │       │   └── StorageService.java
│   │       ├── model/                  # Shared Models
│   │       ├── service/                # Application Services
│   │       ├── shared/                 # Shared Components
│   │       │   ├── exception/          # Global Exception Handling
│   │       │   └── JsonUtil.java       # JSON Utilities
│   │       └── ThothApplication.java
│   └── resources/
│       ├── application.properties      # Application Configuration
│       └── db/migration/               # Database Migrations
│           ├── V1__Create_Namespace_Table.sql
│           ├── V2__Create_Bucket_Table.sql
│           ├── V3__Create_Object_Table.sql
│           ├── V4__document_chunks.sql
│           ├── V5__Add_Ingested_Column_To_Objects.sql
│           └── V6__Create_Vector_Store_Table.sql
└── test/                               # Test Files
    ├── java/                           # Test Classes
    └── resources/                      # Test Configuration
Contributions are welcome! Please read our Contributing Guidelines for details on our code of conduct and the process for submitting pull requests.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Inspired by the wisdom of Thoth, the ancient Egyptian god of knowledge
- Built with the help of the open-source community
- Special thanks to all contributors and supporters
