Playground-Model-Testing

Project Overview

This is a Python testing framework for validating AI model responses across different providers through the SEMOSS API. The framework runs standardized tests against multiple models and uses OpenAI models to confirm response quality.

Docker Setup (Recommended)

Create a .env file in the root directory, based on the provided .env.example.
Make sure your SEMOSS instance is running, then open Docker Desktop and run docker-compose up --build in the root directory of this project to start the server and frontend services.

If you are developing and want code changes reflected in the running containers, rebuild them by running docker-compose up --build again after each change.

The server will be available at http://localhost:8888 and the frontend at http://localhost:3000.

Local Environment Setup (You don't need this if using Docker)

Required Environment Variables (in .env):

SEMOSS_ACCESS_KEY - Access key for SEMOSS API
SEMOSS_SECRET_KEY - Secret key for SEMOSS API
SEMOSS_BASE_URL - Base URL for SEMOSS instance (e.g., http://localhost:9090/Monolith/api)
OPENAI_API_KEY - OpenAI API key for confirmation testing
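
A quick pre-flight check can catch a missing variable before the server starts. The sketch below assumes the variables are already present in the process environment (for example, exported by your shell); `missing_env_vars` is a hypothetical helper and is not part of this repository.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "SEMOSS_ACCESS_KEY",
    "SEMOSS_SECRET_KEY",
    "SEMOSS_BASE_URL",
    "OPENAI_API_KEY",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    `environ` defaults to os.environ; passing a plain dict makes the
    helper easy to exercise without touching the real environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the real environment; an empty list means all four variables are set.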

UV Server Installation

Create a virtual environment for the server in the root directory:

uv venv

Then install required packages:

uv sync

If the above doesn't work, use:

uv pip install -r pyproject.toml

Pip Installation

Alternatively, you can set up the environment using pip.

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install dependencies:

pip install -r pyproject.toml

Frontend Installation

Install Frontend dependencies:

cd client

npm install

Running Application

To run the server, use the following command:

python server.py

The server will start on port 8888.

To run the frontend, use the following command in the client directory:

npm run dev

Proceed to http://localhost:3000 in your web browser.

Project Structure

src/: Contains all source code.
- runners/: Logic for executing tests against selected models.
- tests/: Standardized test cases and response models.
- utils/: Utility functions and model definitions.
- confirmations/: Logic for confirming test responses using OpenAI models.
- pixels/: Pixel factory class for creating pixel calls.

Adding New Models

To add a new model, update the models list in src/utils/models.py with the new model's details.
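
As a rough illustration of what a models list entry might look like, here is a minimal sketch. The actual structure in src/utils/models.py is not shown in this README, so `ModelEntry`, its fields, `add_model`, and every value below are hypothetical placeholders; follow the shape of the existing entries in that file.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical shape of one entry; the real fields may differ."""
    name: str       # display name shown when selecting a model
    engine_id: str  # placeholder for the identifier SEMOSS uses for the model
    provider: str   # placeholder provider label


# Placeholder data only; these are not real models or identifiers.
MODELS: List[ModelEntry] = [
    ModelEntry("Example Model A", "example-engine-id-a", "example-provider"),
]


def add_model(models: List[ModelEntry], entry: ModelEntry) -> List[ModelEntry]:
    """Return a new list with `entry` appended, rejecting duplicate ids."""
    if any(m.engine_id == entry.engine_id for m in models):
        raise ValueError(f"Model with id {entry.engine_id!r} already exists")
    return [*models, entry]
```

Rejecting duplicate identifiers up front is a design choice of this sketch; it keeps a copy-pasted entry from silently shadowing an existing one.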

Adding New Tests

Create a new test method in src/tests/standard_tests.py, or create a new file or class that contains the method.
If required, update the Pixel Maker class to include any new parameters needed for the test.
Update the TestSelections class in src/runners/runners.py to include the new test option.
Update the run_selected_tests function in src/runners/runners.py to execute the new test when it is selected.
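
The steps above can be sketched as follows. Only the names TestSelections and run_selected_tests come from this README; their real signatures are not shown here, so the fields, the two test functions, and the return type are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TestSelections:
    """Hypothetical selection flags; the real class may look different."""
    text_test: bool = False
    my_new_test: bool = False  # step 3: expose the new test as an option


def run_text_test(model_name: str) -> str:
    # Stand-in for an existing standardized test.
    return f"text_test ran against {model_name}"


def run_my_new_test(model_name: str) -> str:
    # Step 1: the new test method (placeholder body).
    return f"my_new_test ran against {model_name}"


def run_selected_tests(model_name: str, selections: TestSelections) -> List[str]:
    """Step 4: run each test whose flag is set and collect the results."""
    results: List[str] = []
    if selections.text_test:
        results.append(run_text_test(model_name))
    if selections.my_new_test:
        results.append(run_my_new_test(model_name))
    return results
```

With this shape, `run_selected_tests("example-model", TestSelections(my_new_test=True))` runs only the new test.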

Features to Add

Ability to add models through the UI: Update the code to read models from a JSON file so that models can be added through the UI instead of being hardcoded in models.py.
Full Capabilities Test: Once more tests are built, provide the ability to add a model, run the full test suite, and return a table of the model's capabilities.
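
For the first item, one possible starting point is a small loader that reads model definitions from JSON. This is only a sketch of the planned feature, not existing code: the file layout (a JSON array of objects), the required "name" key, and the function names are all assumptions.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def parse_models(raw: str) -> List[Dict[str, Any]]:
    """Parse a JSON array of model objects and validate its basic shape."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("Expected a JSON array of model objects")
    for item in data:
        # "name" as a required key is an assumption of this sketch.
        if not isinstance(item, dict) or "name" not in item:
            raise ValueError("Each model must be an object with a 'name' key")
    return data


def load_models(path: str) -> List[Dict[str, Any]]:
    """Read and parse the models file at `path` (hypothetical location)."""
    return parse_models(Path(path).read_text(encoding="utf-8"))
```

Keeping parsing separate from file access means a UI endpoint could validate a submitted model with `parse_models` before anything is written to disk.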
