everettVT/daft-examples-1

 
 

Daft Examples

An examples hub for running multimodal AI workloads on Daft, the distributed query engine providing simple and reliable data processing for any modality and scale.

This repository is organized into three sections:

  1. Usage Patterns - Small, atomic demonstrations of core features.
  2. Use Cases - Complete pipelines and end-to-end workflows built in Daft.
  3. Notebooks - End-to-end tutorials on working with Daft in an interactive Jupyter notebook.

Getting Started

This project uses uv scripts so that each example runs with its own isolated set of dependencies.

You can run any script like:

uv run usage_patterns/prompt/prompt.py

If you don't have uv, see the uv installation guide.

Setting up a venv for notebooks and type hints

To create a virtual environment that provides type hints and the Jupyter notebook dependencies, run:

make setup

Environment variables

Some examples require credentials. Create a .env file in the repo root with the keys you need:

OPENAI_API_KEY=sk-...


# AWS (for Common Crawl access; requester pays)
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
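How each example reads the .env file is not shown here. As a rough sketch, assuming simple `KEY=value` lines like the ones above, the standard-library helper below loads the file and reports which keys are still missing. The function names `parse_env_text`, `apply_env`, and `missing_keys` are hypothetical and not part of this repository or of Daft.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values, environ=os.environ) -> None:
    """Copy parsed values into the environment without overwriting existing ones."""
    for key, value in values.items():
        environ.setdefault(key, value)


def missing_keys(required, environ=os.environ) -> list[str]:
    """Return the required keys that are unset or empty."""
    return [key for key in required if not environ.get(key)]


env_path = Path(".env")
if env_path.is_file():
    apply_env(parse_env_text(env_path.read_text()))

missing = missing_keys(["OPENAI_API_KEY"])
if missing:
    print(f"Missing credentials: {', '.join(missing)}")
```

Using `setdefault` means variables already exported in the shell take precedence over the file, which is a design choice of this sketch rather than a documented behavior of the examples.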

System dependencies

  • Some examples use libraries such as soundfile or PyAV to process audio and video files, which require ffmpeg.
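A quick way to confirm the prerequisite before running an audio or video example is to look for the executable on PATH. This is a minimal sketch with a hypothetical helper name (`missing_tools`). It only checks for an `ffmpeg` binary and says nothing about whether a particular library install bundles its own codecs.

```python
import shutil


def missing_tools(tools=("ffmpeg",)) -> list[str]:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print(f"Missing system tools: {', '.join(missing)}")
else:
    print("ffmpeg found on PATH")
```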

Usage Patterns

Prompt

Embeddings

Classification

Common Crawl

UDFs

  • udfs/cls_with_types.py - Class-based UDFs with TypedDict, Pydantic, batch processing, and async functions
  • udfs/udf.py - Simple UDF example to extract file names from File objects
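To give a feel for the kind of per-row logic a simple UDF like the one in udfs/udf.py might wrap, here is a plain-Python sketch that extracts a file name from a local path or URL. It deliberately does not use Daft's UDF decorators or its File type, since their exact usage is defined in the scripts themselves. The function name `file_name` and the sample paths are assumptions for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def file_name(path_or_url: str) -> str:
    """Return the final path component of a POSIX-style path or a URL."""
    # urlparse separates scheme, host/bucket, and query string from the path.
    parsed = urlparse(path_or_url)
    return PurePosixPath(parsed.path).name


for example in (
    "s3://bucket/audio/clip.wav",
    "/data/images/cat.png",
    "https://example.com/videos/demo.mp4?download=1",
):
    print(file_name(example))
```

A function of this shape takes one value and returns one value, which is the simplest form a row-wise UDF can take. The batch, async, and class-based variants listed above build on the same idea.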

I/O

Use Cases

Transcription

Voice AI Analytics

Notebooks
