Broker

The broker custom processor load balances over multiple LLM APIs (routers) and temporarily removes an LLM API from the pool when requests to it fail. The load balancing and failover logic is easy to change. The default behavior is as follows:

  • Requests to /azure/ will be forwarded to /azure/gpt35turbo/, /azure/gpt4/, or /azure/dud/. The selection is random.
  • If a request is routed to /azure/dud/, you will receive an error code. From that point on, /azure/dud/ is disabled for 10 minutes.

The disabled time can be changed using the DISABLED_TIME environment variable. The /azure/dud/ path illustrates what happens if one of the LLM APIs fails.
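
The selection and disable logic can be pictured with a short shell sketch. This is an illustration only, not the broker source; it assumes the three router paths above and DISABLED_TIME in minutes.

# Illustration only: random router selection with a temporary disable window
ROUTERS="/azure/gpt4/ /azure/gpt35turbo/ /azure/dud/"
DISABLED_TIME=${DISABLED_TIME:-10}      # minutes a failed router stays disabled
declare -A DISABLED_UNTIL               # router -> epoch seconds when it is usable again

pick_router() {
  local now r
  local candidates=()
  now=$(date +%s)
  for r in $ROUTERS; do
    # only routers whose disable window has passed are eligible
    if (( ${DISABLED_UNTIL[$r]:-0} <= now )); then
      candidates+=("$r")
    fi
  done
  # pick one eligible router at random (assumes at least one is enabled)
  echo "${candidates[RANDOM % ${#candidates[@]}]}"
}

mark_failed() {
  # call when a request to router $1 fails
  DISABLED_UNTIL[$1]=$(( $(date +%s) + DISABLED_TIME * 60 ))
}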

Prerequisites

Make sure you have (at least) two deployments. These are the defaults:

gpt4
gpt35turbo

If you want to use other deployments, update the router -> outbound -> endpoint entries in gl_config.json:

"path": "/azure/gpt4/",
"endpoint": "openai/deployments/your_first_deployment/chat/completions?api-version=2023-05-15",

"path": "/azure/gpt35turbo/",
"endpoint": "openai/deployments/your_second_deployment/chat/completions?api-version=2023-05-15",

Quick Start: Load Balance LLM APIs

1. Clone the GitHub repo

git clone https://github.com/direktoren/gecholog_resources.git

2. Set environment variables

# Windows (Command Prompt)

# Set the nats token (necessary for broker to connect to gecholog)
setx NATS_TOKEN "changeme"

# Set the gui secret to access the gecholog web interface
setx GUI_SECRET "changeme"

# Replace this with the url to your LLM API
setx AISERVICE_API_BASE "https://your.openai.azure.com/"

# macOS/Linux

# Set the nats token (necessary for broker to connect to gecholog)
export NATS_TOKEN=changeme

# Set the gui secret to access the gecholog web interface
export GUI_SECRET=changeme

# Replace this with the url to your LLM API
export AISERVICE_API_BASE=https://your.openai.azure.com/
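
To confirm the variables are visible to your shell, you can echo them (macOS/Linux shown; on Windows, note that setx only takes effect in new Command Prompt sessions):

echo "$NATS_TOKEN $GUI_SECRET $AISERVICE_API_BASE"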


3. Start gecholog and the broker processor

cd gecholog_resources/processors/broker
docker compose up -d

The Docker Compose command starts and configures the LLM Gateway gecholog and the processor broker. It builds the broker container locally.
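
To confirm both containers are up, list the services of the compose project:

docker compose ps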


NOTE: To take the app down, run docker compose down -v


4. Make the calls

This example will use Azure OpenAI, but you can use any LLM API service.

# Windows (Command Prompt)
setx AISERVICE_API_KEY "your_api_key"

# macOS/Linux
export AISERVICE_API_KEY=your_api_key


Let's send a request to the /azure/ router. Run it as many times as you like:

# Windows (Command Prompt)
curl -X POST ^
     -H "api-key: %AISERVICE_API_KEY%" ^
     -H "Content-Type: application/json" ^
     -d "{\"messages\": [{\"role\": \"system\",\"content\": \"Assistant is a large language model trained by OpenAI.\"},{\"role\": \"user\",\"content\": \"Who are the founders of Microsoft?\"}],\"max_tokens\": 15}" ^
     http://localhost:5380/azure/

# macOS/Linux
curl -X POST -H "api-key: $AISERVICE_API_KEY" -H "Content-Type: application/json" -d '{
    "messages": [
      {
        "role": "system",
        "content": "Assistant is a large language model trained by OpenAI."
      },
      {
        "role": "user",
        "content": "Who are the founders of Microsoft?"
      }
    ],
    "max_tokens": 15
  }' "http://localhost:5380/azure/"


If the request is routed to /azure/gpt4/ or /azure/gpt35turbo/, you will see a response like this:

{
  "id": "chatcmpl-8nZCiOLutrIDeVT94lyXkYzdKtkDe",
  "object": "chat.completion",
  "created": 1706824088,
  "model": "gpt-35-turbo",  // OR gpt-4 
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The founders of Microsoft are Bill Gates and Paul Allen. They founded Microsoft on"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 15,
    "total_tokens": 44
  }
}

When the request is routed to /azure/dud/, you will receive an empty response. From that point on, /azure/dud/ is disabled for DISABLED_TIME minutes and cannot be hit again until it is re-enabled.
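
To watch the load balancing in action, you can send a handful of requests in a row and print only the model that answered (macOS/Linux, requires jq; requests routed to /azure/dud/ return an empty body, so nothing is printed for them):

for i in $(seq 1 10); do
  curl -s -X POST \
       -H "api-key: $AISERVICE_API_KEY" \
       -H "Content-Type: application/json" \
       -d '{"messages": [{"role": "user", "content": "Who are the founders of Microsoft?"}], "max_tokens": 5}' \
       http://localhost:5380/azure/ | jq -r '.model'
done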

5. Learn about configuring gecholog via web interface

The GUI_SECRET is the password to log in to the web interface of gecholog, available at http://localhost:8080/login.

Usage

Load Balancing and Disabled Routers

broker will randomly select the router to send the traffic to. Example

request1 to /azure/ randomly selects /azure/gpt4/
request2 to /azure/ randomly selects /azure/gpt4/
request3 to /azure/ randomly selects /azure/gpt35turbo/
request4 to /azure/ randomly selects /azure/dud/

broker will remove a router from the selection for DISABLED_TIME minutes when a request has failed. Example

request1 to /azure/ randomly selects /azure/gpt4/
request2 to /azure/ randomly selects /azure/dud/ failed. Disabled for DISABLED_TIME minutes
request3 to /azure/ randomly selects /azure/gpt4/
request4 to /azure/ randomly selects /azure/gpt35turbo/
request5 to /azure/ randomly selects /azure/gpt35turbo/
request6 to /azure/ randomly selects /azure/gpt4/
request7 to /azure/ randomly selects /azure/gpt35turbo/
...
request234 to /azure/ randomly selects /azure/gpt4/
# /azure/dud/ enabled again
request235 to /azure/ randomly selects /azure/gpt35turbo/
request236 to /azure/ randomly selects /azure/gpt35turbo/
request237 to /azure/ randomly selects /azure/dud/ failed. Disabled for DISABLED_TIME minutes
...

Disabled time

broker will disable a router for DISABLED_TIME minutes after a failed request. The disable time is set via the DISABLED_TIME environment variable; the default is DISABLED_TIME=10.
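
For example, to disable a failed router for 2 minutes instead of 10, pass a different value to the broker container (the same flag appears in the manual start below):

docker run -d \
        --network gecholog --name broker \
        --env NATS_TOKEN=$NATS_TOKEN \
        --env GECHOLOG_HOST=gecholog \
        --env DISABLED_TIME=2 \
        broker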

Start gecholog and broker manually

# Windows (Command Prompt)

# Set the nats token (necessary for broker to connect to gecholog)
setx NATS_TOKEN "changeme"

# Set the gui secret to access the gecholog web interface
setx GUI_SECRET "changeme"

# Replace this with the url to your LLM API
setx AISERVICE_API_BASE "https://your.openai.azure.com/"

# macOS/Linux

# Set the nats token (necessary for broker to connect to gecholog)
export NATS_TOKEN=changeme

# Set the gui secret to access the gecholog web interface
export GUI_SECRET=changeme

# Replace this with the url to your LLM API
export AISERVICE_API_BASE=https://your.openai.azure.com/


Start and configure the gecholog container

cd gecholog_resources/processors/broker

# Create a docker network
docker network create gecholog

# Spin up gecholog container
docker run -d -p 5380:5380 -p 4222:4222 -p 8080:8080 \
  --network gecholog --name gecholog \
  --env NATS_TOKEN=$NATS_TOKEN \
  --env GUI_SECRET=$GUI_SECRET \
  --env AISERVICE_API_BASE=$AISERVICE_API_BASE \
  gecholog/gecholog:latest

# Copy the gl_config to gecholog (if valid it will be applied directly)
docker cp gl_config.json gecholog:/app/conf/gl_config.json

Optional: Check that the config file has been applied (both commands should produce the same checksum)

docker exec gecholog ./healthcheck -s gl -p

versus

# Windows
CertUtil -hashfile gl_config.json SHA256

# macOS
shasum -a 256 gl_config.json

# Linux
sha256sum gl_config.json


Continue with the broker processor container

# Build the processor container
docker build --no-cache -f Dockerfile -t broker .

# Start the processor container
docker run -d \
        --network gecholog --name broker \
        --env NATS_TOKEN=$NATS_TOKEN \
        --env GECHOLOG_HOST=gecholog \
        --env DISABLED_TIME=10 \
        broker
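
To verify that the processor container started, you can check its logs:

docker logs broker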

Monitor logs in real time

You can connect to the service bus of the gecholog container to see the logs from the API calls.

This command will display the router that broker has selected

# Windows (Command Prompt)
nats sub --translate "jq .response.gl_path" -s "%NATS_TOKEN%@localhost" "coburn.gl.logger"

# macOS/Linux
nats sub --translate "jq .response.gl_path" -s "$NATS_TOKEN@localhost" "coburn.gl.logger"
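
These commands assume the NATS CLI (nats) is installed on your machine. One way to install it, assuming you have Go available:

go install github.com/nats-io/natscli/nats@latest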


Example of output

17:19:28 Subscribing on coburn.gl.logger 
[#1] Received on "coburn.gl.logger"
"/azure/dud/"

[#2] Received on "coburn.gl.logger"
"/azure/gpt4/"

[#3] Received on "coburn.gl.logger"
"/azure/gpt4/"

[#4] Received on "coburn.gl.logger"
"/azure/gpt35turbo/"