This is an example of simple model evaluation for a RAG-based prompt pipeline. The quality of each model response is evaluated by another LLM acting as a judge. The judge model compares the input prompt against the output response to determine the quality of the response, and the resulting score is used to rank the models.
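For context, here is a minimal sketch of what a judge step like this can look like, assuming the judge is an OpenAI model reached through the chat completions API. This is not the pipeline's actual implementation; the judge model name, prompt wording, and 1-10 scale are illustrative only.

```sh
# Hypothetical judge call: score one model response against the original prompt.
# Assumes OPENAI_API_KEY is set; the judge model and scoring scale are illustrative.
PROMPT="What is a RAG?"
RESPONSE="RAG stands for Retrieval-Augmented Generation, a technique that ..."

BODY=$(jq -n --arg p "$PROMPT" --arg r "$RESPONSE" '{
  model: "gpt-4o",
  messages: [
    {role: "system",
     content: "You are an impartial judge. Score the response to the prompt from 1 (poor) to 10 (excellent). Reply with the number only."},
    {role: "user", content: ("Prompt:\n" + $p + "\n\nResponse:\n" + $r)}
  ]
}')

curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer ${OPENAI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d "$BODY" | jq -r '.choices[0].message.content'
```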
- cllm installed
- OpenAI API key available, with OPENAI_API_KEY set in the environment
- Ollama installed, with models downloaded and its chat completion API running
- Groq Cloud account, with GROQ_API_KEY set in the environment (a minimal setup check is sketched below)
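A quick way to satisfy these prerequisites before running the tasks; the key values below are placeholders, and the Ollama check assumes the default local address (http://localhost:11434).

```sh
# Export the API keys the pipeline expects (values are placeholders).
export OPENAI_API_KEY="sk-..."
export GROQ_API_KEY="gsk_..."

# Confirm the Ollama chat completion API is running and list the downloaded models
# (assumes the default Ollama address, http://localhost:11434).
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
```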
Evaluate a single model from the ./.cllm/systems directory against the prompting guide vector store.
RAG=promptingguide
PROMPT="What is a RAG?"
task evaluate rag=${RAG} rag_fetch=5 prompt="${PROMPT}" model=l/llama
Evaluate all the models in the ./.cllm/systems directory against the prompting guide vector store.
RAG=promptingguide
PROMPT="What is a RAG?"
task all rag=${RAG} rag_fetch=5 prompt="${PROMPT}"
task report rag=${RAG}
Evaluate just the defined Ollama models using the cllm vector store.
RAG=cllm
PROMPT="What is cllm and how does it work?"
task exp-local-models rag=${RAG} rag_fetch=5 prompt="${PROMPT}"
task report rag=${RAG}
The output of the evaluation is a table of the models and their scores. The scores are based on the quality of the response to the prompt; the higher the score, the better the response.
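The exact report format depends on what `task report` emits. Purely as an illustration, if the scores were exported to a CSV with `model,score` columns (a hypothetical file name and layout), the models could be ranked like this:

```sh
# Hypothetical ranking step: assumes a CSV report at traces/cllm/report.csv
# with "model,score" columns. Adjust to the actual format task report produces.
sort -t, -k2,2 -rn traces/cllm/report.csv | column -t -s,
```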
Example output of evaluating the models against the cllm vector store (see the traces/cllm directory).
