Foundry Local is an on-device AI inference solution that lets you run AI models locally through a CLI, SDK, or REST API. This repository provides a collection of Jupyter Notebook tutorials to help you get started and explore advanced capabilities.
Website: www.foundrylocal.ai
Foundry Local is currently in preview.
Foundry Local is a Microsoft on-device AI inference solution designed to let developers and organizations run modern generative AI models directly on their local hardware (Windows PCs, macOS with Apple Silicon, or servers) without relying on cloud-based endpoints.
- Complete Data Privacy – All prompts and outputs are processed entirely on your device. Data never leaves your system, making it ideal for sensitive, confidential, or regulated workloads in healthcare, government, finance, and more.
- Low-Latency Inference – Run AI models locally for real-time, interactive experiences with minimal latency; no network round-trips required.
- Offline Operation – Once models are downloaded, everything works fully offline. Perfect for remote environments, air-gapped systems, or locations with unreliable connectivity.
- Cost Efficiency – Leverage your existing hardware (CPU, GPU, NPU) for inference, eliminating recurring cloud costs and providing predictable cost control.
- OpenAI-Compatible API – Foundry Local exposes an OpenAI-compatible REST API, allowing you to use the same code for local and cloud-based inference. Switch between local and Azure endpoints by simply changing the base URL.
- Multiple Integration Options – Interact via CLI, Python SDK, JavaScript SDK, .NET SDK, or REST API; flexible integration for any workflow.
- Automatic Hardware Optimization – Foundry Local detects your hardware and automatically downloads the best-optimized model variant (NVIDIA CUDA, AMD DirectML, Apple Metal, Intel/Qualcomm NPU, or CPU with INT4/INT8 quantization).
- No Azure Subscription Required – Use Foundry Local entirely standalone, though hybrid cloud-to-edge workflows with Azure AI Foundry are fully supported.
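The "same code, different base URL" idea behind the OpenAI-compatible API can be sketched with the standard library alone. The port and model alias below are placeholders: Foundry Local assigns its service port at startup (check `foundry service status`), and `foundry model list` shows the real aliases on your machine.

```python
import json
import urllib.request

# Placeholder endpoints -- the local port is assigned by the Foundry Local
# service at startup, and the Azure URL is a stand-in for your own resource.
LOCAL_BASE = "http://localhost:5273/v1"
AZURE_BASE = "https://YOUR-RESOURCE.openai.azure.com/openai/v1"

def build_chat_request(base_url: str, model: str, prompt: str):
    """Build an OpenAI-style chat-completions request; only base_url
    differs between local and cloud inference."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# Swap endpoints without touching any other code:
req = build_chat_request(LOCAL_BASE, "phi-3.5-mini", "Hello!")
# With a running Foundry Local service you would then call:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same `build_chat_request` helper works against an Azure endpoint by passing `AZURE_BASE` (plus your API key header) instead of `LOCAL_BASE`.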
| Platform | Details |
|---|---|
| Windows | Windows 10/11 (x64, ARM), Windows Server 2025 |
| macOS | macOS with Apple Silicon (M1/M2/M3/M4) |
| Hardware | Min 8 GB RAM (16 GB recommended); NVIDIA, AMD, Intel, Qualcomm GPUs/NPUs, Apple Metal |
- Applications handling sensitive or regulated data (HIPAA, GDPR)
- Scenarios with unreliable or no internet access
- Prototyping and developing AI applications before cloud deployment
- Real-time, interactive AI-driven applications requiring low latency
- Reducing ongoing public cloud inference costs
| # | Notebook | Description |
|---|---|---|
| 01 | Getting Started with Foundry Local | Introduction to Foundry Local – installation, setup, and running your first local model |
| 02 | Foundry Local Chat Completions | Using the chat completions API to interact with local models |
| 03 | Foundry Local Practical Applications | Real-world use cases and practical examples with Foundry Local |
| 04 | Foundry Local Mistral 7B | Running and interacting with the Mistral 7B model locally |
| 05 | Advanced Function Calling with Foundry Local | Implementing advanced function calling and tool use with local models |
| 06 | Deploying Custom Models with Microsoft Olive and Foundry Local | Optimizing and deploying custom models using Microsoft Olive |
Foundry Local's architecture is designed for efficient, private, and scalable on-device AI inference. For the complete architecture reference, see the official documentation: Foundry Local Architecture on Microsoft Learn.
```
┌────────────────────────────────────────────────────────────────┐
│                    Developer / Application                     │
│              (CLI, Python SDK, JS SDK, .NET SDK)               │
└───────────────────────────────┬────────────────────────────────┘
                                │
                                ▼
┌────────────────────────────────────────────────────────────────┐
│                     Foundry Local Service                      │
│             (OpenAI-Compatible REST API Endpoint)              │
│                                                                │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐    │
│  │    Model     │  │    Cache     │  │      Service       │    │
│  │   Manager    │  │   Manager    │  │      Manager       │    │
│  └──────┬───────┘  └──────┬───────┘  └────────────────────┘    │
│         │                 │                                    │
│         ▼                 ▼                                    │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                      ONNX Runtime                        │  │
│  │     (CPU / CUDA / DirectML / Metal / NPU Providers)      │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌────────────────────────────────────────────────────────────────┐
│                         Local Hardware                         │
│         (CPU, NVIDIA GPU, AMD GPU, Apple Silicon, NPU)         │
└────────────────────────────────────────────────────────────────┘
```
| Component | Role |
|---|---|
| Foundry Local Service | Core engine that orchestrates local AI model execution. Exposes an OpenAI-compatible REST API endpoint for inference and model management. |
| Model Manager | Handles the full model lifecycle: downloading, loading, unloading, compilation, and removal from cache. |
| Cache Manager | Manages local storage of AI models. Configure cache locations, list cached models, and optimize storage space. |
| Service Manager | Controls the Foundry Local Service β start, stop, monitor, and restart for maintenance or configuration changes. |
| ONNX Runtime | The inference engine that executes optimized models across supported hardware. Uses execution providers (CUDA, DirectML, Metal, CPU) for hardware-specific acceleration. |
| CLI & SDKs | Primary interfaces to interact with the service. CLI for command-line operations; Python, JavaScript, C#, and Rust SDKs for programmatic integration. |
- Request – The developer sends a request via CLI, SDK, or REST API
- Routing – The Foundry Local Service receives the request through its OpenAI-compatible endpoint
- Model Operations – The Model Manager loads the requested model (downloading and caching it if needed)
- Inference – ONNX Runtime executes the inference using the optimal hardware execution provider
- Response – Results are returned through the same API interface
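The request/response ends of this flow can be traced in a short client-side sketch. This is a hypothetical illustration, not the SDK's API: steps 3–4 (Model Operations, Inference) happen inside the service, so they appear only as comments.

```python
import json
import urllib.request

def extract_reply(response_body: dict) -> str:
    """Step 5 (Response): pull the assistant message out of an
    OpenAI-style chat-completions response body."""
    return response_body["choices"][0]["message"]["content"]

def chat(base_url: str, model: str, prompt: str) -> str:
    # Step 1 (Request): the application builds an OpenAI-style request.
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # Step 2 (Routing): the service receives this at its OpenAI-compatible
    # endpoint. Steps 3-4 then run inside the service: the Model Manager
    # loads/caches the model and ONNX Runtime executes it on the best
    # available hardware. The call below requires a running service.
    with urllib.request.urlopen(req) as resp:
        return extract_reply(json.load(resp))

# extract_reply demonstrated on a minimal fake response (no service needed):
fake = {"choices": [{"message": {"role": "assistant", "content": "hi!"}}]}
print(extract_reply(fake))  # hi!
```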
- Local-first design – All processing happens on-device with no data leaving the system
- Cloud-compatible – Same API interface as Azure OpenAI, enabling seamless local-to-cloud portability
- Hardware-aware – Automatic detection and optimization for available compute resources
- Efficient caching – Models are downloaded once and cached locally for instant offline access
The official Foundry Local documentation is available at www.foundrylocal.ai and covers everything you need to get started and build on-device AI applications.
| Resource | Link | Description |
|---|---|---|
| Official Website | foundrylocal.ai | Main homepage with overview, downloads, and getting started guides |
| Microsoft Learn | Foundry Local on Microsoft Learn | In-depth documentation including concepts, quickstarts, and API references |
| Architecture | Foundry Local Architecture | Detailed architecture overview and component descriptions |
| Getting Started Guide | Get Started with Foundry Local | Step-by-step guide to install and run your first model |
- Installation & Setup – How to install Foundry Local on Windows, macOS, and servers
- CLI Reference – Full command-line interface documentation (`foundry model list`, `foundry model run`, etc.)
- SDK Integration – Python, JavaScript, and .NET SDK guides with code examples
- REST API – OpenAI-compatible REST API reference for seamless integration
- Hardware Optimization – How Foundry Local auto-detects and optimizes for your hardware (NVIDIA/AMD GPU, Apple Silicon, NPU, CPU)
- Custom Model Deployment – Guide to converting and deploying your own models using Microsoft Olive
Foundry Local provides a curated catalog of pre-optimized, open-source AI models ready to run on your device. Browse the full model catalog at foundrylocal.ai/models.
More than 25 models are available.
The model catalog is regularly updated. Visit foundrylocal.ai/models for the latest available models.
Foundry Local automatically detects your hardware and downloads the best-optimized variant for your device:
- NVIDIA GPU – CUDA-accelerated ONNX models
- AMD GPU – DirectML-optimized models
- Apple Silicon – Metal-accelerated models for M-series chips
- Intel/Qualcomm NPU – Neural Processing Unit optimized models
- CPU – Quantized INT4/INT8 models for CPU-only inference
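As a rough illustration of this selection, ONNX Runtime exposes its available execution providers by name. The mapping below from provider names to the hardware tiers above is an illustrative assumption, not Foundry Local's actual selection logic; with the `onnxruntime` package installed, `onnxruntime.get_available_providers()` returns the real list for your machine.

```python
# Illustrative mapping from ONNX Runtime execution-provider names to the
# hardware tiers listed above. The provider names are real ONNX Runtime
# identifiers; the tier labels and priority rule are assumptions.
PROVIDER_TIERS = {
    "CUDAExecutionProvider": "NVIDIA GPU (CUDA)",
    "DmlExecutionProvider": "DirectML (Windows GPU)",
    "CoreMLExecutionProvider": "Apple Silicon (Metal/CoreML)",
    "QNNExecutionProvider": "Qualcomm NPU",
    "OpenVINOExecutionProvider": "Intel (OpenVINO)",
    "CPUExecutionProvider": "CPU (INT4/INT8 quantized models)",
}

def best_tier(available: list[str]) -> str:
    """Return the tier of the first (highest-priority) known provider;
    ONNX Runtime lists providers in priority order."""
    for name in available:
        if name in PROVIDER_TIERS:
            return PROVIDER_TIERS[name]
    return "unknown"

# On a CPU-only machine:
print(best_tier(["CPUExecutionProvider"]))  # CPU (INT4/INT8 quantized models)
# Feed it the real list with:
# import onnxruntime; best_tier(onnxruntime.get_available_providers())
```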
```shell
# List all available models in the catalog
foundry model list

# Get detailed info about a specific model
foundry model info <model-alias>

# Download and run a model
foundry model run <model-alias>

# Remove a cached model
foundry model remove <model-alias>
```

You can also deploy custom models from Hugging Face by converting them to ONNX format using Microsoft Olive. See Notebook 06 for a complete walkthrough.
A reference list of models is also available in this repository: models.xlsx
- Python 3.10+
- Foundry Local installed β see foundrylocal.ai for installation instructions
- Jupyter Notebook or JupyterLab
1. Clone this repository:

   ```shell
   git clone https://github.com/retkowsky/foundry-local.git
   cd foundry-local
   ```

2. Install the required Python packages:

   ```shell
   pip install -r requirements.txt
   ```

3. Launch Jupyter and open any notebook:

   ```shell
   jupyter notebook
   ```
| Package | Purpose |
|---|---|
| `foundry-local` | Core Foundry Local package |
| `foundry-local-sdk` | Foundry Local Python SDK |
| `openai` | OpenAI-compatible API client |
| `onnxruntime` / `onnxruntime-genai` | ONNX Runtime for model inference |
| `olive-ai` | Microsoft Olive for model optimization |
| `transformers` | Hugging Face Transformers |
| `torch` | PyTorch |
- Foundry Local Website
- Foundry Local Documentation
- Foundry Local Architecture
- Available Models Catalog
- Microsoft Learn – Foundry Local
- Models Reference (Excel)
| Field | Details |
|---|---|
| Name | Serge Retkowsky |
| Created | 26 February 2026 |
| Last updated | 26 February 2026 |
| Email | serge.retkowsky@microsoft.com |
| LinkedIn | https://www.linkedin.com/in/serger/ |
| Medium publications | https://medium.com/@sergems18/ |

