Mahmoud02/thoth

⚠️ Important Notice

🚧 This project is currently under active development 🚧

Thoth is not yet ready for production use. The API and features are subject to change without notice. We recommend against using this in production environments until a stable release is available.

Thoth - Cloud Storage Solution

Thoth - The Egyptian God of Wisdom, Writing, and Knowledge

🏛️ Inspiration

Inspired by Thoth, the ancient Egyptian god of wisdom, writing, and knowledge, this project aims to be a reliable and intelligent storage solution for modern applications.

🚀 Overview

Thoth is an open-source, self-hosted storage solution designed for companies that want S3-like capabilities on their own infrastructure without the complexity of cloud services. Our goal is to provide a simple, no-code solution that lets organizations manage their files on-premises with minimal setup and maintenance.

Key Capabilities:

  • Built-in File Validation: Automatic checks for file size, type, and security
  • Pre-configured Rules: Set up validation rules without writing any code
  • Custom Processing: Define workflows for file processing on upload
  • Security First: Built-in scanning and verification for safe file storage

Thoth offers the power of enterprise-grade object storage with the simplicity of a plug-and-play solution, making it perfect for businesses that need reliable, secure file storage without the overhead of cloud services or custom development.
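
The validation rules listed under Key Capabilities could be modeled roughly as follows. This is a minimal sketch and not Thoth's actual implementation: `UploadRule`, `maxSize`, `allowedExtensions`, and `validate` are hypothetical names chosen for illustration, and the size limit and extensions are arbitrary.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical demo; none of these names come from the Thoth codebase.
class ValidationDemo {
    public static void main(String[] args) {
        List<UploadRule> rules = List.of(
                UploadRule.maxSize(5L * 1024 * 1024),            // reject files over 5 MiB
                UploadRule.allowedExtensions(Set.of("pdf", "txt")));
        System.out.println(UploadRule.validate(rules, "report.pdf", 1024));
        System.out.println(UploadRule.validate(rules, "setup.exe", 1024));
    }
}

// A rule inspects upload metadata and returns an error message, or empty if the file passes.
interface UploadRule {
    Optional<String> check(String fileName, long sizeBytes);

    static UploadRule maxSize(long maxBytes) {
        return (name, size) -> size <= maxBytes
                ? Optional.empty()
                : Optional.of("file exceeds " + maxBytes + " bytes");
    }

    static UploadRule allowedExtensions(Set<String> extensions) {
        return (name, size) -> {
            int dot = name.lastIndexOf('.');
            String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase();
            return extensions.contains(ext)
                    ? Optional.empty()
                    : Optional.of("extension '" + ext + "' is not allowed");
        };
    }

    // Runs every rule and collects the failures; an empty list means the upload is accepted.
    static List<String> validate(List<UploadRule> rules, String fileName, long sizeBytes) {
        return rules.stream()
                .map(rule -> rule.check(fileName, sizeBytes))
                .flatMap(Optional::stream)
                .toList();
    }
}
```

Keeping each rule as a small function makes it possible to assemble a rule set from configuration rather than code, which is the idea behind "pre-configured rules".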

🏗️ Architecture

Thoth is built using Hexagonal Architecture (Ports & Adapters) to ensure:

  • Maintainability: Clear separation of concerns
  • Testability: Easy to write unit and integration tests
  • Flexibility: Swap components without affecting the core business logic
  • Scalability: Designed to grow with your needs

System Architecture

graph TB
    %% External Layer
    Client[Client Applications]
    Ollama[Ollama AI Service<br/>🤖 llama3.2 + nomic-embed-text]
    
    %% Application Layer
    subgraph "Thoth Application"
        API[REST API<br/>📡 Controllers & Swagger]
        Domain[Domain Logic<br/>🧠 Use Cases & Services]
        Storage[Storage Layer<br/>💾 File System + Vector Store]
    end
    
    %% Database Layer
    PostgreSQL[(PostgreSQL 17<br/>🗄️ + pgvector extension)]
    
    %% Main Flows
    Client --> API
    API --> Domain
    Domain --> Storage
    Storage --> PostgreSQL
    
    %% AI Integration
    Domain --> Ollama
    Ollama --> Storage
    
    %% Styling
    classDef external fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef application fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef database fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
    
    class Client,Ollama external
    class API,Domain,Storage application
    class PostgreSQL database

Key Data Flows

📄 Document Upload & AI Processing

Client → API → Domain Logic → Storage Layer
                              ↓
                         Ollama (Embedding) → Vector Store → PostgreSQL
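
Before text reaches the embedding model in the flow above, it has to be split into chunks. The sketch below shows one common approach, fixed-size windows with overlap. It is an illustration only: `TextChunker`, the chunk size, and the overlap are assumptions, and Thoth's real chunking strategy may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; TextChunker is not a class from the Thoth codebase.
class ChunkingDemo {
    public static void main(String[] args) {
        List<String> chunks = TextChunker.chunk("abcdefghij", 4, 1);
        System.out.println(chunks);
    }
}

class TextChunker {
    // Splits text into windows of at most chunkSize characters;
    // consecutive windows share `overlap` characters so context is not cut abruptly.
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }
}
```

Each chunk would then be embedded separately and stored as one row in the vector store.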

🔍 RAG Query Processing

Client → API → Domain Logic → Vector Store (Search)
                              ↓
                         Ollama (Chat) → Response
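
The "Vector Store (Search)" step ranks stored chunks by similarity to the query embedding. In Thoth this work is delegated to PostgreSQL with pgvector; the plain-Java sketch below only illustrates the underlying idea using cosine similarity. `VectorSearch`, `cosine`, and `topK` are hypothetical names, and the two-dimensional vectors stand in for real 768-dimensional embeddings.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical demo; an in-memory stand-in for what pgvector does in the database.
class VectorSearchDemo {
    public static void main(String[] args) {
        Map<String, float[]> store = Map.of(
                "chunk-a", new float[] {1f, 0f},
                "chunk-b", new float[] {0f, 1f},
                "chunk-c", new float[] {0.9f, 0.1f});
        System.out.println(VectorSearch.topK(store, new float[] {1f, 0f}, 2));
    }
}

class VectorSearch {
    // Cosine similarity: dot product divided by the product of the vector norms.
    // Assumes neither vector is all zeros.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ids of the k stored vectors most similar to the query, best first.
    static List<String> topK(Map<String, float[]> store, float[] query, int k) {
        return store.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, float[]> e) -> -cosine(e.getValue(), query)))
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

The text of the top-ranked chunks is what gets passed to the chat model as context for the final response.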

βš™οΈ Bucket Management

Client β†’ API β†’ Domain Logic β†’ Storage Layer β†’ PostgreSQL

Core Components

  • API Layer: RESTful API for client interactions
  • Domain Layer: Core business logic and entities
  • Infrastructure Layer: Implementation details (storage, database, etc.)
  • Ports: Interfaces that define the application's boundaries
  • Adapters: Concrete implementations of the ports
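
To make the relationship between ports and adapters concrete, here is a minimal sketch in plain Java. All names (`BucketRepository`, `BucketService`, `InMemoryBucketRepository`) are hypothetical and chosen for illustration; they are not taken from Thoth's source.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical wiring; class and method names are illustrative, not Thoth's real ones.
class HexagonalDemo {
    public static void main(String[] args) {
        BucketService service = new BucketService(new InMemoryBucketRepository());
        service.createBucket("invoices");
        System.out.println(service.exists("invoices"));
    }
}

// Output port: the domain declares what it needs from persistence.
interface BucketRepository {
    void save(String name);
    boolean contains(String name);
}

// Domain service (use case): depends only on the port, never on a concrete store.
class BucketService {
    private final BucketRepository repository;

    BucketService(BucketRepository repository) {
        this.repository = repository;
    }

    void createBucket(String name) {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("bucket name must not be blank");
        }
        if (repository.contains(name)) {
            throw new IllegalStateException("bucket already exists: " + name);
        }
        repository.save(name);
    }

    boolean exists(String name) {
        return repository.contains(name);
    }
}

// Adapter: one interchangeable implementation of the port (here, an in-memory set).
class InMemoryBucketRepository implements BucketRepository {
    private final Set<String> names = new HashSet<>();

    @Override
    public void save(String name) {
        names.add(name);
    }

    @Override
    public boolean contains(String name) {
        return names.contains(name);
    }
}
```

Because `BucketService` only sees the interface, a database-backed adapter can replace the in-memory one without touching the domain logic, and unit tests can use the in-memory version directly.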

📚 Features

Core Storage Features

  • Object Storage: Store and retrieve any type of file or binary data
  • Bucket Management: Organize your data into logical containers
  • Namespace Support: Multi-tenancy support for different organizations or teams
  • Extensible Functions: Custom processing for stored objects
  • REST API: Simple and consistent API for integration
  • Self-hosted: Full control over your data

AI-Powered Features

  • Document Processing: Automatic text extraction and chunking
  • Vector Search: Semantic similarity search using embeddings
  • RAG (Retrieval-Augmented Generation): AI-powered document querying
  • Intelligent Storage: AI-assisted file organization and retrieval
  • Multi-Model Support: Separate models for chat and embeddings
  • Vector Database: PostgreSQL with pgvector extension for efficient vector operations

🛠️ Technology Stack

  • Language: Java 24
  • Framework: Spring Boot 3.5.0
  • Build Tool: Maven
  • Database: PostgreSQL 17 with pgvector extension
  • AI Integration: Spring AI 1.0.1
  • AI Models:
    • Chat Model: Llama 3.2 (latest)
    • Embedding Model: nomic-embed-text (768 dimensions)
  • Testing: JUnit 5, Testcontainers, AssertJ, Mockito
  • Containerization: Docker

🤖 AI Models & Installation

Required AI Models

Thoth uses two separate AI models for different purposes:

1. Chat Model: Llama 3.2 (latest)

  • Purpose: Text generation, conversation, and RAG responses
  • Usage: AI assistant interactions and document query processing
  • Model Size: ~2.0GB

2. Embedding Model: nomic-embed-text

  • Purpose: Convert text to numerical vectors for similarity search
  • Dimensions: 768 (critical for database schema)
  • Model Size: ~274MB

Installing AI Models

Option 1: Using Ollama (Recommended)

  1. Install Ollama:

    # macOS
    brew install ollama
    
    # Linux
    curl -fsSL https://ollama.ai/install.sh | sh
    
    # Windows
    # Download from https://ollama.ai/download
  2. Start Ollama Service:

    ollama serve
  3. Pull Required Models:

    # Pull chat model
    ollama pull llama3.2:latest
    
    # Pull embedding model
    ollama pull nomic-embed-text
  4. Verify Models:

    ollama list

Option 2: Using Docker

# Run Ollama in Docker
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull models
docker exec -it ollama ollama pull llama3.2:latest
docker exec -it ollama ollama pull nomic-embed-text

Model Configuration

The models are configured in application.properties:

# AI Model Configuration
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.embedding.model=nomic-embed-text
spring.ai.ollama.chat.model=llama3.2:latest
spring.ai.ollama.vectorstore.pgvector.dimensions=768

⚠️ Important: The embedding dimensions (768) are critical and must match the database schema exactly.
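
The reason the dimension count matters is that a fixed-width vector column only accepts vectors of exactly that length, so switching to an embedding model with a different output size requires a schema change. A guard like the one below can surface a mismatch early with a clear message. This is a sketch under assumptions: `EmbeddingDimensions` and `requireMatch` are hypothetical names, not part of Thoth or Spring AI.

```java
// Hypothetical demo; EmbeddingDimensions is not a class from Thoth or Spring AI.
class DimensionGuardDemo {
    public static void main(String[] args) {
        float[] embedding = new float[768]; // stands in for a nomic-embed-text vector
        EmbeddingDimensions.requireMatch(embedding, 768);
        System.out.println("embedding accepted");
    }
}

class EmbeddingDimensions {
    // Fails fast when an embedding's length differs from the configured vector column width.
    static float[] requireMatch(float[] embedding, int expectedDimensions) {
        if (embedding.length != expectedDimensions) {
            throw new IllegalStateException("embedding has " + embedding.length
                    + " dimensions, but the vector store expects " + expectedDimensions);
        }
        return embedding;
    }
}
```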

🚀 Getting Started

Development Setup

For local development, we provide a development-dependencies.yml file that sets up all necessary services using Docker Compose. This includes:

  • PostgreSQL 17 with pgvector: The primary database with vector support
  • pgAdmin 4: Web-based database management tool

To start the development environment:

docker-compose -f development-dependencies.yml up -d

Access the services at:

  • PostgreSQL: localhost:5432
  • pgAdmin: http://localhost:5050 (email: admin@admin.com, password: admin)

Testing with Testcontainers

Thoth leverages Testcontainers for reliable integration testing. This allows us to:

  • Run tests against real database instances
  • Ensure consistent test environments
  • Test database migrations and queries with actual PostgreSQL
  • Isolate tests using containerized dependencies

Testcontainers automatically manages the lifecycle of Docker containers, spinning up fresh instances for each test class and tearing them down afterward.

To run the tests:

mvn test

Prerequisites

  • Java 24 or higher
  • Maven 3.6+
  • PostgreSQL 17+ with pgvector extension
  • Ollama (for AI models)
  • Docker (optional, for containerized deployment)
  • Minimum 8GB RAM (for AI model processing)
  • Minimum 10GB free disk space (for AI models)

Installation

  1. Clone the repository:

    git clone https://github.com/Mahmoud02/thoth.git
    cd thoth
  2. Start PostgreSQL with pgvector:

    docker-compose -f development-dependencies.yml up -d
  3. Install and start Ollama:

    # Install Ollama
    curl -fsSL https://ollama.ai/install.sh | sh
    
    # Start Ollama service
    ollama serve
    
    # Pull required models
    ollama pull llama3.2:latest
    ollama pull nomic-embed-text
  4. Build the project:

    mvn clean install
  5. Run the application:

    mvn spring-boot:run
  6. Verify installation:

    • API Documentation: http://localhost:8080/swagger-ui.html
    • Health Check: http://localhost:8080/actuator/health

Docker Setup

docker-compose up -d

📚 API Documentation

Once the application is running, you can access the API documentation at:

  • Swagger UI: http://localhost:8080/swagger-ui.html
  • OpenAPI Docs: http://localhost:8080/v3/api-docs

🏗️ Project Structure

src/
├── main/
│   ├── java/
│   │   └── com/mahmoud/thoth/
│   │       ├── ai/                 # AI Integration Layer
│   │       │   ├── config/         # AI Configuration (Ollama)
│   │       │   ├── controller/     # AI Controllers (RAG, ThothAI)
│   │       │   ├── dto/            # AI DTOs (Query, Response)
│   │       │   └── service/        # AI Services (Document Processing, RAG)
│   │       ├── api/                # REST API Layer
│   │       │   ├── controller/v1/  # Versioned Controllers
│   │       │   ├── doc/            # API Documentation
│   │       │   ├── dto/            # API DTOs
│   │       │   └── mapper/         # Object Mappers
│   │       ├── config/             # Application Configuration
│   │       ├── domain/             # Core Business Logic (Hexagonal Architecture)
│   │       │   ├── model/          # Domain Models
│   │       │   ├── port/           # Ports (Interfaces)
│   │       │   │   ├── in/         # Input Ports (Commands/Queries)
│   │       │   │   └── out/        # Output Ports (Repositories)
│   │       │   └── service/        # Domain Services (Use Cases)
│   │       ├── function/           # Bucket Functions System
│   │       │   ├── annotation/     # Function Metadata Annotations
│   │       │   ├── config/         # Function Configuration
│   │       │   ├── exception/      # Function Exceptions
│   │       │   ├── factory/        # Function Factory
│   │       │   └── impl/           # Function Implementations
│   │       ├── infrastructure/     # Infrastructure Layer
│   │       │   ├── repository/     # Repository Adapters
│   │       │   ├── store/          # Storage Implementations
│   │       │   │   └── impl/sqlite/
│   │       │   │       ├── converter/  # JSONB Converters
│   │       │   │       ├── entity/     # Database Entities
│   │       │   │       └── repository/ # SQLite Repositories
│   │       │   └── StorageService.java
│   │       ├── model/              # Shared Models
│   │       ├── service/            # Application Services
│   │       ├── shared/             # Shared Components
│   │       │   ├── exception/      # Global Exception Handling
│   │       │   └── JsonUtil.java   # JSON Utilities
│   │       └── ThothApplication.java
│   └── resources/
│       ├── application.properties  # Application Configuration
│       └── db/migration/           # Database Migrations
│           ├── V1__Create_Namespace_Table.sql
│           ├── V2__Create_Bucket_Table.sql
│           ├── V3__Create_Object_Table.sql
│           ├── V4__document_chunks.sql
│           ├── V5__Add_Ingested_Column_To_Objects.sql
│           └── V6__Create_Vector_Store_Table.sql
└── test/                          # Test Files
    ├── java/                      # Test Classes
    └── resources/                 # Test Configuration

🤝 Contributing

Contributions are welcome! Please read our Contributing Guidelines for details on our code of conduct and the process for submitting pull requests.

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Inspired by the wisdom of Thoth, the ancient Egyptian god of knowledge
  • Built with the help of the open-source community
  • Special thanks to all contributors and supporters
