Caution

mooltiproxy is a weekend side project. It was developed in mid-2023 and has not been maintained since. It was useful to me back when there was no standardized interface to the various LLM servers, but things have since been rationalized, so it may no longer be useful except from an educational perspective. Features may be incomplete or broken. There is no support, no documentation guarantees, and no warranty of any kind. Use at your own risk.

MOOLTIPROXY

A minimalistic HTTP proxy with URL mapping and body translation for LLMs

Mooltiproxy is an abstraction layer that exposes one API while addressing another API in the background. It can thus be seen as an API translation proxy.

It is a pico-framework (too small to even call micro or nano) with very basic features, and, being a personal pet project, it obviously cannot compete with more complete solutions. However, it is self-contained with minimal dependencies, and as such lets you rapidly experiment with switching from one API to another without changing your client code.

Mooltiproxy was initially designed to expose OpenAI-like API endpoints while actually using Hugging Face Text Generation Inference (TGI) endpoints. It therefore lets you switch from OpenAI to open LLMs without changing your client code, whether you are using the openai Python bindings, LangChain, or even direct HTTP requests, but it can be used for other purposes as well.

Mooltiproxy offers the following features:

  • Bearer token authorization with basic security features (IP ban, whitelist, blacklist)
  • endpoint routing
  • request and answer body mapping
  • input translation for chat completion for most of the open LLMs out there, using prompt templates (e.g. transforming a history of messages with associated roles from a chat completion request into a single prompt string suitable for a text completion endpoint)
  • input translation for chat completion for Llama2-Chat models (which use a very specific prompt template)

So it can be used for instance:

  • to protect unprotected endpoints with an API key
  • to expose only a subset of endpoints
  • to masquerade endpoint paths
  • to translate from one API to another, and back.

or a combination of all the above.

Installation

Clone this repository and install the dependencies in your Python (virtual) environment:

$ pip install -r requirements.txt

Configuration

Proxy Master Key

The proxy won't let you expose its endpoints to the outside world without a bearer token, and won't run without one, so you have to define a master API key that must be included in the HTTP Authorization header of every request you address to the proxy.

The recommended method is to set the environment variable MOOLTIPROXY_KEY to the value of your choice (a sufficiently long, hard-to-guess string).

An alternative method is to set it in the system.masterkey entry of the config.yaml file (see below), but this should be used for testing purposes only, and the proxy will nag you with security recommendations until you switch to the recommended method.
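
If you call the proxy directly (without an OpenAI client), the key simply goes in a standard bearer Authorization header. Here is a minimal sketch using the requests package; the proxy URL and the endpoint path are illustrative and depend on your deployment and config.yaml:

import os

import requests

MOOLTIPROXY_URL = "http://localhost:8000"  # illustrative: wherever your proxy listens

headers = {"Authorization": f"Bearer {os.environ['MOOLTIPROXY_KEY']}"}
payload = {
    "messages": [{"role": "user", "content": "Hello, who are you?"}],
    "temperature": 0.7,
    "max_tokens": 150,
}

# The exact path depends on the endpoints you expose in config.yaml
response = requests.post(f"{MOOLTIPROXY_URL}/v1/chat/completions", headers=headers, json=payload)
print(response.json())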

Proxy configuration file

Copy the config.template.yaml file into a config.yaml file, and edit it according to your needs. The entry names, included comments, and examples should be clear enough to understand the purpose of each line.

Note: I may enrich this paragraph based on user feedback, if any.
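
As a purely illustrative sketch (the actual schema and key names are those of config.template.yaml; everything below except system.masterkey and the built-in function names is hypothetical), a configuration typically declares a master key for testing, one or more named targets, and the endpoints you expose with their mapping and prompt functions:

# Hypothetical sketch -- refer to config.template.yaml for the real schema
system:
  masterkey: "change-me"              # testing only; prefer the MOOLTIPROXY_KEY env variable
targets:
  mytarget:
    url: "http://my-tgi-server:8080"  # hypothetical key and value
endpoints:
  - path: "/v1/chat/completions"      # hypothetical keys below
    request_mapper: "chatReqOpenAItoTGI"
    answer_mapper: "chatAnsTGItoOpenAI"
    prompter: "fromTemplate"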

Execution

To run the proxy, just type `python -m main` in a terminal window.

Note: The proxy logs are by default written to stdout, so if you plan to run it in the background, redirect stdout to the file of your choice. I know, I could have used logging.
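
For example (the log file name is arbitrary), you could run it in the background with:

$ python -m main > mooltiproxy.log 2>&1 &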

Debugging

In most cases, the log messages should be clear enough, especially regarding configuration errors, so read them.

Using it with an OpenAI client

To use it with an OpenAI client, you only have to use the MOOLTIPROXY_KEY as your OpenAI API key, and the mooltiproxy URL as your OpenAI API base URL.

For instance, with the openai Python package:

import os
import openai

openai.api_base = "<your proxy url and port>"
openai.api_key = os.getenv('MOOLTIPROXY_KEY')

As long as your default target in config.yaml is configured, you're all set.

Now, if you want to address a specific target, you have to pass it explicitly. This relies on the fact that most client APIs simply let unknown kwargs pass through without raising errors. Since the target is passed as an HTTP header named 'target', you just have to add headers = {'target': 'your target name'}.

For instance, with the openai Python package:

import os
import openai

openai.api_base = "<your proxy url and port>"
openai.api_key = os.getenv('MOOLTIPROXY_KEY')
TARGET = 'mytarget'

response = openai.ChatCompletion.create(
    model=chat_model,  # Note: with TGI, this will be disregarded, as it serves only one model
    messages=[
        {"role": "system", "content": "You are Bob, a very polite and helpful assistant."},
        {"role": "user", "content": "Hello, who are you?"}],
    temperature=0.7,
    max_tokens=150,
    headers={"target": TARGET},  # * This is how you specify the target
)

Extensions

Body Mapping

Mooltiproxy comes with built-in back-and-forth mapping between OpenAI and TGI:

  • textReqOpenAItoTGI & textAnsTGItoOpenAI: map the OpenAI text completion endpoint (/completions) to the TGI text completion endpoint (/generate)
  • chatReqOpenAItoTGI & chatAnsTGItoOpenAI: map the OpenAI chat completion endpoint (/chat/completions) to the TGI text completion endpoint (/generate), whilst performing prompt translation
  • identity: although you don't need to specify any mapping when you don't want to perform any translation, you can use this function to force the proxy to go through the translation code branch. This may prove useful in debug mode, to see what happens behind the scenes.

To support other APIs, and thus other translations, you'll need to write your own mapping functions, which can then be specified in the config.yaml file and will be automatically invoked by the proxy main loop.

Such functions must be added to the mappers.py file, and their signature must be as follows: def yourfunctionname(ip: Any, cfg: dict = {}) -> Any, where ip is the input payload and cfg is the endpoint configuration object as specified in the config.yaml file.

Though not mandatory, it is also recommended that you adopt the following naming convention:

  • <prefix>Req<Source>to<Target> for the request mapper
  • <prefix>Ans<Target>to<Source> for the answer mapper

where prefix is a free string indicating the type of endpoint used, and Source and Target identify the source and target APIs.

Here is an example mapping an OpenAI text completion request to a TGI text completion request (basically building the output payload from the input payload, setting default values, and adding some logic when a one-to-one mapping is not possible or when value transformations are needed):

from typing import Any


def textReqOpenAItoTGI(ip: Any, cfg: dict = {}) -> Any:
    """Converts an OpenAI text completion payload to TGI format"""

    # * Sanity Checks
    # temperature can be 0 for OpenAI, but must be strictly positive for TGI
    temperature = ip.get("temperature", 0.5)
    if temperature <= 0:
        temperature = 0.01

    # build the output payload
    op = {
        "inputs": ip.get("prompt", ""),
        "parameters": {
            "best_of": ip.get("best_of", 1),
            "decoder_input_details": True,
            "details": False,
            "do_sample": False,
            "max_new_tokens": ip.get("max_tokens", 20),
            "repetition_penalty": ip.get("frequency_penalty", 1.03),
            "return_full_text": False,
            "seed": 0,
            "stop": ip.get("stop", []),
            "temperature": ip.get("temperature", 0.5),
            "top_k": ip.get("top_k", 10),
            "top_p": ip.get("top_p", 0.95),
            "truncate": None,
            "typical_p": 0.95,
            "watermark": False,
        },
    }

    return op
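
The answer mapper works in the opposite direction. The sketch below is not the built-in textAnsTGItoOpenAI, just an illustration of the idea: it assumes a non-streaming TGI /generate answer of the form {"generated_text": "..."} and builds an OpenAI-style completion answer, with the id, created and model fields filled with placeholder values (the cfg entry used for the model name is hypothetical):

import time
import uuid
from typing import Any


def textAnsTGItoOpenAI_sketch(ip: Any, cfg: dict = {}) -> Any:
    """Illustrative only: converts a TGI /generate answer into an OpenAI-style /completions answer"""
    return {
        "id": f"cmpl-{uuid.uuid4().hex}",       # placeholder id
        "object": "text_completion",
        "created": int(time.time()),
        "model": cfg.get("model", "unknown"),   # hypothetical cfg entry
        "choices": [
            {
                "text": ip.get("generated_text", ""),
                "index": 0,
                "logprobs": None,
                "finish_reason": "stop",        # TGI finish details are not mapped in this sketch
            }
        ],
    }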

Prompt Mapping

Mooltiproxy comes with a built-in list of prompt mapping functions:

  • fromTemplate: a generic prompt transformation function, using a prompt template defined in the config.yaml file.
  • llama2_chat: a prompt transformation function specifically designed for Llama2-Chat models (not Llama2 models, which can be served via fromTemplate)

To support other prompts, you'll need to write your own prompt mapping functions, which can then be specified in the config.yaml file and will be automatically invoked by the proxy main loop.

Such functions must be added to the prompters.py file, and their signature must be as follows (only if you want to use them with the OpenAI<->TGI built-in mappers; if they are invoked from your own mapper functions, you are free to use your own signature): def yourfunctionname(messages: list[dict], cfg: dict) -> (str, str), where messages is the list of input messages with associated roles, and cfg is the endpoint configuration object as specified in the config.yaml file. The function returns the prompt string and a stop sequence string.

There is no specific naming convention.

For examples of prompters, see the prompters.py file.
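
As an illustration only (this is not one of the built-in functions), here is a minimal prompter respecting that signature: it flattens the role-annotated messages into a single prompt string and returns it together with a stop sequence. The role prefixes and the stop string are arbitrary choices made for this example:

def simple_chat_prompter(messages: list[dict], cfg: dict) -> (str, str):
    """Illustrative only: flattens role-annotated messages into a single prompt string"""
    lines = []
    for msg in messages:
        role = msg.get("role", "user")
        content = msg.get("content", "")
        if role == "system":
            lines.append(content)
        else:
            lines.append(f"{role.capitalize()}: {content}")
    lines.append("Assistant:")          # cue the model to answer
    prompt = "\n".join(lines)
    return prompt, "\nUser:"            # stop sequence: next user turn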

Limitations

  • Does not support streaming APIs (yet)
  • HTTPS should work (as long as you activate it in the config file and copy the cert and key .pem files into the certificates subdirectory) but is untested
  • Does not support certificate exchange for authentication
  • All requests to a given endpoint are routed the same way, regardless of the HTTP method. If the target does not support the request method, it will answer with a 405, which is in turn passed back to the client. I may add the HTTP method as a routing parameter in the future, if there is a use case.
  • For simplicity's sake, the proxy acts as a "yes server" when receiving OPTIONS requests, without even consulting the actual API server.
  • Not designed to handle heavy traffic
  • Obviously adds overhead to API response time
  • I may not dedicate a lot more time to adding features to this pet project, but I'm open to fixing issues, updating documentation, merging PRs, and evolving the list of mapper functions and prompt templates based on users' feedback.
