MEDGEM is an offline inference engine for MedGemma-1.5-4B, a medical-domain Multimodal Large Language Model (MLLM). It enables secure, high-performance inference on local devices without requiring internet connectivity.
- Android Version: Android 14 (API 34) or higher.
- RAM: 12GB or more (required for 4B-parameter models). (Note: the app can load the text model on devices with 8GB RAM if the vision encoder is disabled, but the RAG module requires at least 12GB for optimal performance.)
- Storage: ~4GB free space for model checkpoints.
- Offline Inference: Run models locally for maximum data privacy and security.
- Multimodal Support: Process both text and visual medical data (X-Rays, scans). Vision Encoder can be disabled to save memory on lower-end devices.
- RAG (Retrieval-Augmented Generation): Integrate external knowledge bases to improve accuracy and reduce hallucinations. The app comes pre-loaded with a medical knowledge base.
- Patient Management & SOAP Notes: Create patient profiles and generate structured SOAP (Subjective, Objective, Assessment, Plan) notes from visit data using AI.
- Thinking Mode: Enable Chain-of-Thought reasoning (Gemini-style thinking) for complex medical queries to get more reasoned responses.
- Knowledge Search: Perform direct semantic searches on the medical database to find relevant information without starting a chat.
- Protocol Viewer: Quick offline access to essential medical PDF protocols.
- Customizable Inference: Fine-tune generation parameters (Temperature, Top-P, Max Tokens), set custom System Prompts, and adjust Chunk Sizes for performance optimization.
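To make the Temperature and Top-P settings concrete, here is a minimal, illustrative sketch of temperature plus nucleus (top-p) sampling. This is not the engine's actual decoder, just the standard technique those parameters control:

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Temperature + nucleus (top-p) sampling over a {token: logit} dict."""
    # Temperature scaling: lower values sharpen the distribution.
    scaled = {t: l / temperature for t, l in logits.items()}
    # Softmax with max-subtraction for numerical stability.
    m = max(scaled.values())
    exps = {t: math.exp(l - m) for t, l in scaled.items()}
    z = sum(exps.values())
    probs = sorted(((t, e / z) for t, e in exps.items()),
                   key=lambda kv: kv[1], reverse=True)
    # Keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cum = [], 0.0
    for t, p in probs:
        kept.append((t, p))
        cum += p
        if cum >= top_p:
            break
    # Sample proportionally from the surviving candidates.
    total = sum(p for _, p in kept)
    r = rng.random() * total
    for t, p in kept:
        r -= p
        if r <= 0:
            return t
    return kept[-1][0]
```

Lower temperatures and smaller top-p values make output more deterministic; higher values increase diversity, which is why the app exposes both knobs.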
We evaluated our on-device quantized models against their original HuggingFace counterparts to ensure minimal quality loss during edge deployment.
| Model | On-Device Format | HF Reference | Result |
|---|---|---|---|
| MedGemma 1.5 4B | ExecuTorch (8da4w) | google/medgemma-1.5-4b-it | ✅ Clinically Equivalent |
| MedAsr | ONNX int8 (sherpa-onnx) | google/medasr | ✅ 0.00% WER |
| EmbeddingGemma 300M | LiteRT int8 (TFLite) | google/embeddinggemma-300m | ✅ 0.9987 Cosine Sim |
For detailed methodology, see the Evaluation Guide and Full Evaluation Report.
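The embedding result above is a cosine-similarity score between quantized and reference model outputs. A minimal sketch of that metric, using made-up vectors rather than actual model embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for float32 vs. int8-quantized embeddings.
reference = [0.12, -0.45, 0.88, 0.05]
quantized = [0.11, -0.44, 0.89, 0.06]
score = cosine_similarity(reference, quantized)
```

A score near 1.0 means the quantized model produces embeddings pointing in nearly the same direction as the full-precision reference, so retrieval rankings are preserved.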
We have detailed documentation available in the docs/ directory:
- Setup & Installation: Start here! Downloading models, device setup, and build instructions.
- Architecture: System design, component interaction, and custom optimizations.
- UI/UX Guide: Overview of screens, theming, and visualization features.
- Evaluation Report: Quantization quality benchmarks.
- Evaluation Guide: How to run the evaluation scripts.
- Data Pipeline & RAG: Ingestion workflow, RAG pipeline, and script explanations.
- Database Schema: Detailed look at the ObjectBox database design.
- Troubleshooting: Solutions for common issues (OOM, missing files).
- Contributing: Guidelines for code style and pull requests.
- Install Prerequisites: Android Studio, SDK/NDK, `uv`, `git`.
- Download Checkpoints:

  ```shell
  uv tool install hf
  hf auth login
  # Download models (LLM, ASR, and Embedding)
  hf download kamalkraj/medgemma-1.5-4b-it-executorch --local-dir models/llm
  hf download kamalkraj/medasr-onnx --local-dir models/asr
  hf download kamalkraj/embeddinggemma-300m-litert --local-dir models/embedding
  ```

- Push to Device: See SETUP.md for detailed `adb push` and internal directory move commands.
- Build & Run: Open in Android Studio and run on your device.
For detailed setup instructions, including manual model conversion and building AARs from source, please refer to SETUP.md.
The application already includes a pre-built database (app/src/main/assets/initial_data.mdb). If you want to add additional PDFs to the knowledge base, refer to the RAG Ingestion Module.
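Conceptually, RAG ingestion splits extracted PDF text into fixed-size, overlapping chunks before embedding them into the database. A minimal sketch under that assumption; the function name and default sizes here are illustrative, not the ingestion module's actual API:

```python
def chunk_text(text, chunk_size=512, overlap=64):
    """Split text into overlapping character chunks for embedding.

    Overlap between consecutive chunks helps preserve context that
    would otherwise be cut at a chunk boundary.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

Smaller chunks improve retrieval precision but increase the number of embeddings to store and search, which is the trade-off behind the app's adjustable Chunk Size setting.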
