Aethelgard is a lightweight, pure-pull Federated Retrieval-Augmented Generation (FedRAG) framework. It allows you to query highly sensitive, distributed vector databases (like clinical patient data) without ever moving raw data or opening inbound corporate firewalls. Unlike traditional federated learning frameworks that focus on training models across silos, Aethelgard focuses strictly on inference and routing. If deployed, Aethelgard could eliminate millions of years of diagnostic waiting time without requiring a single Data Use Agreement (DUA), creating a scalable infrastructure for global clinical consensus.
See the HQ video presentation here
Figure 1: The UI for the Local Intelligence Node (samples/demo_app.py). The current version is built on NiceGUI.
- Pure-Pull Architecture: Edge nodes use outbound asynchronous polling. Zero inbound port-forwarding required by IT departments.
- 100% Pluggable: Ships with FastAPI and Redis defaults, but core abstractions allow easy swapping to gRPC, Kafka, AWS SQS, or GCP Pub/Sub.
- Semantic Firewall Ready: Designed to easily integrate local LLM verification (e.g., MedGemma) to sanitize vector search results before they are transmitted back to the global orchestrator.
Solving the rare disease "Diagnostic Odyssey" requires more than a single application; it requires a paradigm shift in how clinical systems communicate. Healthcare is notoriously fragmented. Every hospital has a unique IT infrastructure, differing firewall policies, and strict, incompatible data governance laws (HIPAA, GDPR, etc.).
We did not build an app. We built a protocol.
Aethelgard is designed as a foundational Federated Retrieval-Augmented Generation (FedRAG) Framework. Applications are brittle and siloed; protocols scale. We engineered Aethelgard to act as the decentralized nervous system for clinical intelligence:
- Agnostic to the UI: Whether a hospital uses Epic, Cerner, or a custom legacy Electronic Health Record (EHR) system, Aethelgard operates at the infrastructure layer, allowing local apps to hook into the global network seamlessly.
- Adaptable to any IT Environment: Built on strict Hexagonal Architecture, the core geometric and AI logic is completely decoupled from the transport layer.
- Beyond Diagnostics: While our primary demonstration focuses on rare disease diagnostics, the Aethelgard protocol can be instantly adapted for pharmacovigilance (detecting rare adverse drug reactions globally), multi-center clinical trial matching, and real-time epidemiological tracking - all without moving a single row of raw data.
Figure 2: System design of the protocol. The super-link is built on top of a message queue.
Each component has a pre-defined interface in the framework; see the classes in /aethelgard.
- Broadcast: The global orchestrator drops a vectorized query into a secure mailbox (Broker).
- Pull: The client node (behind a strict hospital firewall) wakes up on its 10-second heartbeat and asks, "Do I have any mail?"
- Local RAG: The client executes a local vector search (e.g., LanceDB) and sanitizes the output.
- Upload: The client pushes the safe, sanitized insight back to the orchestrator (super-link on the diagram).
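The four steps above can be sketched end to end in a few lines. This is a minimal in-memory illustration, not the project's implementation: `mailboxes`, `broadcast`, `pull`, `local_rag`, and `upload` are hypothetical names, a plain dictionary stands in for the broker, and a list of records stands in for the local vector store.

```python
import math
from collections import defaultdict, deque

# Hypothetical in-memory stand-ins; the real broker, transport, and vector
# store sit behind the interfaces in /aethelgard and may look different.
mailboxes = defaultdict(deque)   # client_id -> pending tasks (the "Broker")
insights = defaultdict(list)     # request_id -> sanitized insights


def broadcast(request_id, query_vector, target_clients):
    """Step 1: the orchestrator drops the query into each client's mailbox."""
    for client_id in target_clients:
        mailboxes[client_id].append(
            {"request_id": request_id, "query_vector": query_vector}
        )


def pull(client_id):
    """Step 2: the client asks for its mail; returns None if the box is empty."""
    box = mailboxes[client_id]
    return box.popleft() if box else None


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def local_rag(query_vector, local_store):
    """Step 3: local vector search; only the safe summary and score are returned."""
    best = max(local_store, key=lambda rec: cosine(query_vector, rec["vector"]))
    return {"summary": best["safe_summary"],
            "score": cosine(query_vector, best["vector"])}


def upload(request_id, client_id, insight):
    """Step 4: the client pushes the sanitized insight back to the orchestrator."""
    insights[request_id].append({"client_id": client_id, **insight})


# One heartbeat of a single node (a real node would sleep between pulls).
store_b = [
    {"vector": [1.0, 0.0, 0.0], "safe_summary": "protocol A", "raw_note": "stays local"},
    {"vector": [0.0, 1.0, 0.0], "safe_summary": "protocol B", "raw_note": "stays local"},
]
broadcast("req-1", [0.9, 0.1, 0.0], ["hospital_b"])
task = pull("hospital_b")
if task is not None:
    upload(task["request_id"], "hospital_b",
           local_rag(task["query_vector"], store_b))
print(insights["req-1"])
```

Every call in the sketch is initiated by the client side, which is the property that removes the need for inbound ports.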
The most significant technical hurdle in Federated RAG is ensuring that transmitted semantic vectors cannot be reverse-engineered to reveal patient Protected Health Information (PHI).
Our empirical evaluation of 1920-dimensional clinical vectors revealed that strict Local Differential Privacy (LDP) is mathematically incompatible with exact Top-1 retrieval utility in high-dimensional spaces. Applying standard LDP collapsed Top-1 retrieval accuracy to under 10%.
To resolve this, Aethelgard utilizes an Empirical Noise Strategy.
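By applying controlled Gaussian noise to the query embedding before it is broadcast, the transmitted vector is obfuscated while its nearest-neighbor ranking is preserved.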
Result: 100% Top-1 Retrieval Accuracy across the network with zero raw data exposure.
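To make the privacy-utility trade-off concrete, the following sketch adds Gaussian noise of different scales to synthetic query vectors and measures Top-1 retrieval accuracy. It is an illustration under stated assumptions: the corpus is random (far easier to separate than real clinical embeddings), the `sigma` values are arbitrary and are not the project's calibrated noise level, and `add_gaussian_noise` and `top1_accuracy` are hypothetical helper names.

```python
import math
import random


def add_gaussian_noise(vector, sigma, rng):
    """Return a copy of `vector` with independent N(0, sigma^2) noise per dimension."""
    return [x + rng.gauss(0.0, sigma) for x in vector]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top1_accuracy(corpus, sigma, rng):
    """Fraction of noisy queries whose nearest corpus vector is the original one."""
    hits = 0
    for idx, vec in enumerate(corpus):
        noisy_query = add_gaussian_noise(vec, sigma, rng)
        best = max(range(len(corpus)), key=lambda j: cosine(noisy_query, corpus[j]))
        hits += int(best == idx)
    return hits / len(corpus)


rng = random.Random(0)
DIM, N_DOCS = 1920, 20  # 1920 matches the embedding size; N_DOCS is arbitrary
corpus = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N_DOCS)]

# Assumed noise scales, chosen only to show the trend.
for sigma in (0.1, 1.0, 10.0):
    acc = top1_accuracy(corpus, sigma, rng)
    print(f"sigma={sigma}: top-1 accuracy = {acc:.2f}")
```

On real data the tolerable noise scale has to be measured empirically, since clinical embeddings cluster much more tightly than random vectors.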
Privacy-Utility Trade-off Analysis | Paper Draft
The codebase is organized to separate infrastructure from implementation logic.
Here are the main components:
```
aethelgard/
├── aethelgard/                  # The core Python package
│   ├── __init__.py
│   ├── core/                    # Abstract Base Classes & Core Utilities (The "Ports")
│   │   ├── broker.py            # Defines the BaseTaskBroker interface
│   │   ├── config.py            # Global logging and environment configuration
│   │   ├── llm_middleware.py    # Model-agnostic LLM routing (powered by LiteLLM)
│   │   ├── smartfolder.py       # SQLite-based state tracker for local file changes
│   │   └── transport.py         # Defines ServerTransport & ClientTransport interfaces
│   ├── brokers/                 # Concrete state managers (The "Adapters")
│   │   └── redis_broker.py      # Distributed task queue implementation using Redis
│   ├── firewall/                # Security & Sanitization
│   │   └── litellm_firewall.py  # The MedGemma-powered generative sanitization adapter
│   ├── transports/              # Concrete network protocols
│   │   ├── fastapi_server.py    # REST/HTTP Orchestrator API implementation
│   │   └── httpx_client.py      # Async HTTP client for outbound node polling
│   └── node.py                  # The Edge Node heartbeat and execution loop
├── pipeline/                    # Scripts for GCP batch inference and data prep
│                                # (only needed if a new dataset for experiments is required)
├── profiles/                    # .env configuration files for different network nodes
├── samples/                     # Demonstration scripts and interactive UIs
│   ├── demo_app.py              # Example interactive clinician app built on NiceGUI
│   ├── test_integration.py      # Full network broadcast and consensus simulation for smoke tests
│   └── ...
├── tests/                       # Unit and integration test suite
├── docker-compose.yml           # Instantly spins up the Redis & FastAPI Orchestrator
├── Dockerfile                   # Container definition for the SuperLink server
├── pyproject.toml               # Modern Python packaging configuration
└── README.md
```
Aethelgard is engineered strictly on Hexagonal Architecture (Ports and Adapters) to ensure high decoupling between the clinical logic and the infrastructure layer. This design makes the protocol adaptable to any hospital IT environment and allows for easy swapping of backend components.
The central SuperLink orchestrator operates via a unified REST API (FastAPIServer) that maps directly to an abstract BaseTaskBroker.
This abstraction allows the state management to be instantly swapped from the default Redis implementation to enterprise message queues like Kafka or GCP Pub/Sub without altering the core logic.
- Broadcast: The server drops a new query into the target clients' respective queues.
- Poll: Client nodes securely hit a polling endpoint to pull their pending tasks.
- Insight & ACK: Clients push successfully sanitized insights back to the server and explicitly acknowledge (`ACK`) task completion to safely clear the queue.
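The port-and-adapter split described above can be illustrated with a small sketch. The names here are assumptions: `TaskBrokerPort` is a hypothetical stand-in for `BaseTaskBroker` (whose real method names may differ), `InMemoryBroker` plays the role of an adapter such as the Redis one, and `Orchestrator` represents core logic that only ever sees the port.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class TaskBrokerPort(ABC):
    """Hypothetical port; the real BaseTaskBroker in core/broker.py may differ."""

    @abstractmethod
    def enqueue(self, client_id: str, task: dict) -> None: ...

    @abstractmethod
    def pending(self, client_id: str) -> list: ...

    @abstractmethod
    def ack(self, client_id: str, request_id: str) -> None: ...


class InMemoryBroker(TaskBrokerPort):
    """Adapter backed by a dict; a Redis or Kafka adapter would expose the same methods."""

    def __init__(self):
        self._queues = defaultdict(dict)  # client_id -> {request_id: task}

    def enqueue(self, client_id, task):
        self._queues[client_id][task["request_id"]] = task

    def pending(self, client_id):
        # Tasks stay visible until acknowledged.
        return list(self._queues[client_id].values())

    def ack(self, client_id, request_id):
        self._queues[client_id].pop(request_id, None)


class Orchestrator:
    """Core logic depends only on the port, never on a concrete adapter."""

    def __init__(self, broker: TaskBrokerPort):
        self.broker = broker

    def broadcast(self, request_id, query_vector, target_clients):
        for client_id in target_clients:
            self.broker.enqueue(
                client_id, {"request_id": request_id, "query_vector": query_vector}
            )


orchestrator = Orchestrator(InMemoryBroker())
orchestrator.broadcast("req-1", [0.1, 0.2], ["hospital_a", "hospital_b"])
print(orchestrator.broker.pending("hospital_a"))
orchestrator.broker.ack("hospital_a", "req-1")
print(orchestrator.broker.pending("hospital_a"))
```

Swapping the backend then means passing a different adapter to `Orchestrator`, with no change to the broadcast logic.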
Figure 3: The super-link API layer. The current version is built on FastAPI and uvicorn (aethelgard/transports/fastapi_server.py).
The Node class operates a secure, outbound-only heartbeat loop. By exclusively polling the transport layer for tasks, it completely eliminates the need for inbound corporate firewall ports.
- Upon receiving a task, the node executes a dependency-injected `search_fn` (acting as the local Semantic Firewall) against the incoming query vector.
- If a relevant clinical insight is found locally, it securely uploads the sanitized payload.
- The node strictly guarantees an `ACK` after successful processing (even if no matching data is found) to maintain network consensus integrity.
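The per-task behavior of the node can be sketched as a single function. This is an assumption-laden illustration rather than the actual `Node` code: `handle_task`, `RecordingTransport`, `push_insight`, and `ack` are hypothetical names, and the transport only records calls instead of making HTTP requests.

```python
from typing import Callable, Optional


class RecordingTransport:
    """Hypothetical stand-in for a client transport; it only records outbound calls."""

    def __init__(self):
        self.calls = []

    def push_insight(self, request_id: str, payload: dict) -> None:
        self.calls.append(("insight", request_id, payload))

    def ack(self, request_id: str) -> None:
        self.calls.append(("ack", request_id))


def handle_task(task: dict,
                search_fn: Callable[[list], Optional[dict]],
                transport) -> None:
    """Run the injected search, upload an insight if one exists, then ACK."""
    # If search_fn raises, no ACK is sent and the task remains queued.
    insight = search_fn(task["query_vector"])
    if insight is not None:
        transport.push_insight(task["request_id"], insight)
    # ACK is sent whether or not a match was found.
    transport.ack(task["request_id"])


transport = RecordingTransport()
task = {"request_id": "req-1", "query_vector": [0.1, 0.2]}

handle_task(task, lambda vec: {"summary": "protocol A"}, transport)  # match found
handle_task(task, lambda vec: None, transport)                       # no match
print(transport.calls)
```

Injecting `search_fn` keeps the loop independent of the vector store and of the sanitizer, so either can be replaced in tests.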
The LiteLLMFirewall acts as the critical generative security layer between the local vector database and the outbound network.
- It executes the local mathematical vector search using a provided retriever function (e.g., against LanceDB).
- The retrieved raw, highly sensitive clinical text is injected into a specialized Jinja prompt template.
- A local model, such as MedGemma 4B routed via our LiteLLM middleware, acts as an intelligent sanitizer to synthesize and extract only the relevant clinical protocol.
- The firewall returns only the sanitized JSON payload and the similarity score; the raw retrieved text itself is never transmitted.
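The retrieve, prompt, sanitize, return sequence above can be sketched as follows. Several things are assumptions made to keep the example self-contained: `firewall_search` is a hypothetical function name, `string.Template` stands in for the Jinja template, and the retriever and the LLM are stub callables rather than LanceDB and MedGemma.

```python
import json
from string import Template

# string.Template stands in for the Jinja prompt template mentioned above.
PROMPT = Template(
    "Extract only the clinical protocol from the note below. "
    "Omit names, dates, and identifiers. Reply as JSON with key 'protocol'.\n\n$note"
)


def firewall_search(query_vector, retriever, llm):
    """Return only a sanitized JSON payload and a score, or None if nothing matched."""
    hit = retriever(query_vector)            # -> (raw_text, score) or None
    if hit is None:
        return None
    raw_text, score = hit
    reply = llm(PROMPT.substitute(note=raw_text))
    sanitized = json.loads(reply)            # fails loudly on non-JSON model output
    # raw_text is deliberately not part of the returned object.
    return {"sanitized_insight": json.dumps(sanitized), "score": score}


# Stubs for illustration only; a real deployment would call a vector store and a local model.
def fake_retriever(query_vector):
    return ("Patient John Doe, MRN 12345, treated with protocol X.", 0.93)


def fake_llm(prompt):
    return json.dumps({"protocol": "protocol X"})


result = firewall_search([0.1, 0.2], fake_retriever, fake_llm)
print(result)
```

The structure only ensures that raw text is not returned by the function; how well identifiers are removed depends on the model and the prompt, which is why the output is worth validating before upload.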
To seamlessly manage the local ingestion of EHR notes and medical imaging, Aethelgard implements a SmartFolder utility. It acts similarly to a local git status tracker,
utilizing a lightweight SQLite database to track file modification times and sizes.
This ensures that only newly added or modified clinical records are computationally embedded and fused into the local vector store.
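A change tracker of this kind can be sketched with the standard library alone. The class below is a hypothetical illustration (`FolderTracker`, `changed_files`, and `commit` are invented names, and the table layout is an assumption); the real `SmartFolder` in core/smartfolder.py may be organized differently.

```python
import sqlite3
import tempfile
from pathlib import Path


class FolderTracker:
    """Hypothetical sketch of a SQLite-backed tracker of file mtimes and sizes."""

    def __init__(self, db_path: str):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS files "
            "(path TEXT PRIMARY KEY, mtime_ns INTEGER, size INTEGER)"
        )

    def changed_files(self, folder: str) -> list:
        """Return files that are new or whose mtime/size differ from the recorded state."""
        changed = []
        for path in sorted(Path(folder).rglob("*")):
            if not path.is_file():
                continue
            stat = path.stat()
            row = self.db.execute(
                "SELECT mtime_ns, size FROM files WHERE path = ?", (str(path),)
            ).fetchone()
            if row != (stat.st_mtime_ns, stat.st_size):
                changed.append(path)
        return changed

    def commit(self, paths) -> None:
        """Record the current state of `paths`, e.g. after they have been embedded."""
        for path in paths:
            stat = Path(path).stat()
            self.db.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
                (str(path), stat.st_mtime_ns, stat.st_size),
            )
        self.db.commit()


with tempfile.TemporaryDirectory() as tmp:
    tracker = FolderTracker(":memory:")
    note = Path(tmp) / "note_001.txt"
    note.write_text("admission note")

    new_files = tracker.changed_files(tmp)    # the new note is reported
    tracker.commit(new_files)                 # ...and marked as processed
    print(len(new_files), len(tracker.changed_files(tmp)))
```

Comparing modification time and size is cheap but heuristic; a content hash would be the stricter alternative at the cost of reading every file.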
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/v1/query/broadcast` | Initiates a federated query. Drops the query payload into the secure queues of all targeted client nodes. |
| GET | `/api/v1/client/{client_id}/poll` | The outbound polling endpoint utilized by hospital nodes to retrieve their pending task queues. |
| POST | `/api/v1/query/{request_id}/insight` | Endpoint for client nodes to push back their successfully sanitized, localized insights. |
| POST | `/api/v1/query/{request_id}/ack` | Required endpoint for clients to acknowledge task completion, instructing the broker to drop the task from the active queue. |
| GET | `/api/v1/query/{request_id}/consensus` | Polled by the original requesting client to retrieve the globally aggregated insights once all targeted nodes have responded. |
Figure 4: The structure of the payload used in the protocol. The 1920-d combined embedding is mixed with noise before broadcasting; the user prompt is added as is. Text embeddings use ollama/embeddinggemma and CXR image embeddings use google/medsiglip-448. The core logic is in pipeline/generate_embeddings.py.
Aethelgard utilizes strict JSON schemas for all network communication to ensure type safety and seamless cross-node deserialization. The data exchange revolves around three primary payloads:
- Clinical Query (`/broadcast`): When a doctor initiates a search, the orchestrator receives a payload containing the `query_text` (the human-readable clinical question), the `query_vector` (the 1920-dimensional fused multimodal embedding, obfuscated with empirical noise), and the `target_clients` (the list of hospital nodes to poll).
- Insight Submission (`/insight`): When a remote node successfully finds a match and sanitizes it via the MedGemma Semantic Firewall, it returns an object containing its `client_id` and the `sanitized_insight` (a JSON string containing the extracted clinical protocol devoid of Protected Health Information).
- Acknowledgment (`/ack`): A simple payload containing the `client_id`, sent by the edge node to clear the task from the orchestrator's processing queue, regardless of whether a semantic match was found.
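The three payloads can be modeled as small typed records. The sketch below uses standard-library dataclasses as a stand-in for whatever schema library the project actually uses; the class names `ClinicalQuery`, `InsightSubmission`, and `Acknowledgment` are hypothetical, while the field names come from the descriptions above.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

EMBEDDING_DIM = 1920  # size of the fused multimodal embedding


@dataclass
class ClinicalQuery:
    query_text: str             # human-readable clinical question
    query_vector: List[float]   # noised 1920-d embedding
    target_clients: List[str]   # hospital nodes to poll

    def __post_init__(self):
        if len(self.query_vector) != EMBEDDING_DIM:
            raise ValueError(
                f"query_vector must have {EMBEDDING_DIM} dimensions, "
                f"got {len(self.query_vector)}"
            )


@dataclass
class InsightSubmission:
    client_id: str
    sanitized_insight: str      # JSON string produced by the semantic firewall


@dataclass
class Acknowledgment:
    client_id: str


query = ClinicalQuery(
    query_text="Unexplained bilateral infiltrates, similar cases?",
    query_vector=[0.0] * EMBEDDING_DIM,
    target_clients=["hospital_a", "hospital_b"],
)
wire = json.dumps(asdict(query))            # what would be POSTed to /broadcast
restored = ClinicalQuery(**json.loads(wire))
print(restored.target_clients)

insight = InsightSubmission("hospital_b", json.dumps({"protocol": "protocol X"}))
ack = Acknowledgment("hospital_b")
print(json.dumps(asdict(insight)), json.dumps(asdict(ack)))
```

Validating the vector length at construction time catches a mismatched embedding model before the query is broadcast.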
Ensure you have the Google Cloud SDK installed and authenticated.
- Clone the repository

```bash
git clone https://github.com/akaliutau/aethelgard.git
cd aethelgard
```

- Create and activate a Conda environment

```bash
conda create -n aethelgard python=3.12 -y
conda activate aethelgard
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Install ollama and Gemma/embedding models

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama --version
ollama pull embeddinggemma
ollama pull gemma3:4b
# quick test
ollama run gemma3:4b "What is the capital of France?"
```

Cache Location: The model weights (typically a .gguf file) are cached securely on local disk:

- Linux: /usr/share/ollama/.ollama/models
NOTE: Extra steps for using Gated Models
- Accept the Terms: You cannot download these models anonymously. If they are hosted at Hugging Face, you must log in to Hugging Face, navigate to the model card page, and review the Health AI Developer Foundations terms of use. Once you click to agree, your access request is processed immediately.
- Generate a Token: Go to your Hugging Face account settings, generate an Access Token (Read permission), and store it in the `.env` file under HF
- (optional) Run the Editable Install

```bash
pip install -e .
```

- Build all images

```bash
sudo docker build -t aethelgard-server:latest .
sudo docker images
```

- (optional) Re-Generate datasets from scratch

Note: this step is only needed if you need to re-create the dataset for your experiments.
Evaluating a privacy-preserving clinical network requires high-fidelity, multimodal data. To safely validate Aethelgard, we generated a highly realistic synthetic dataset distributed across our simulated hospital environments.
The dataset is a curated subset (N=66) of the CheXpert chest X-ray competition dataset. We mapped these open-access images to generative, synthetic clinical admission notes.
See the dedicated page for more details and step-by-step instructions.
First, validate the whole workflow by running the integration test.
Run the following commands to build and start the persistent Redis broker and the super-link:

```bash
sudo docker compose up --build --remove-orphans
sudo docker ps
```

This starts a Redis container that accepts requests at redis://localhost:6379.
A cache folder containing the /appendonlydir data will automatically appear in your project directory.
The super-link should then be available at http://localhost:8010/docs.

In another terminal, run the local node using the profile for Hospital B:

```bash
python samples/03_hospital_node.py --config profiles/node_b.env
```

If everything is green, run the demo app using the profile for Hospital A:

```bash
python samples/demo_app.py --config profiles/node_a.env
```

The application's UI page will open automatically in the browser.
Aethelgard is a foundation ready for enterprise scaling. Our immediate roadmap focuses on making the protocol completely invisible to the end-user while expanding its security and interoperability:
- Native OS Daemon & Zero-Touch Ollama Integration: Package the Aethelgard edge node as a lightweight, headless background service (e.g., `systemd` for Linux, Windows Service). This daemon will natively orchestrate local Ollama instances, dynamically loading and unloading MedGemma weights and managing the inference lifecycle automatically, requiring zero technical overhead or terminal usage from clinicians.
- Automated FHIR/HL7 EHR Ingestion: Build native data pipelines to continuously ingest, vectorize, and index clinical notes and imaging directly from standard Electronic Health Record (EHR) systems (like Epic and Cerner) in real-time, completely replacing manual data uploads.
- Enterprise Message Brokers: Expand the `BaseTaskBroker` port beyond Redis. Ship drop-in adapters for state-scale deployments using Apache Kafka, GCP Pub/Sub, or AWS SQS with zero changes to the core geometric logic.
- Hardware-Level Enclaves (TEE): Integrate Trusted Execution Environments (e.g., Intel SGX, AMD SEV) for the SuperLink Message Queue to guarantee mathematically and at the hardware level that the centralized routing infrastructure cannot inspect even the heavily obfuscated query vectors.
- Specialized Multi-Agent Semantic Firewalls: Evolve the single MedGemma instance into a local multi-agent system. Deploy specialized sub-agents for distinct tasks (e.g., a Genomic Privacy Agent, a Radiographic Reasoning Agent) that debate and synthesize the final, hyper-secure outbound payload.
- gRPC Multiplexing: Upgrade the FastAPI/HTTPX transports to multiplexed gRPC to support low-latency, high-volume vector polling across tens of thousands of concurrent hospital nodes globally.
If you use Aethelgard or our Empirical Noise methodology in your research, please cite our work:
```bibtex
@misc{kaliutau2024aethelgard,
  title={Project Aethelgard: Decentralized Clinical Intelligence via Federated RAG},
  author={Kaliutau, Aliaksei},
  year={2026},
  howpublished={\url{https://github.com/akaliutau/aethelgard}},
  note={Built for the MedGemma Impact Challenge}
}
```
Project Aethelgard is open-source software distributed under the MIT License.
By keeping the core routing and security protocol open and accessible, we aim to lower the barrier to entry for underfunded rural clinics and state-scale hospital networks alike. See the LICENSE file for more details.
Built for the MedGemma Impact Challenge organized by Google Research.
Sharing knowledge to save lives.



