Mock

The mock custom processor lets you record and replay your LLM API responses. It works like this:

  • Make a regular LLM API request to any gecholog router, for example, /service/standard/
  • The mock custom processor will record the response payload and response headers
  • Send as many requests as you want to /mock/service/standard/ to get the same response over and over again

The mock processor will randomize the response time to simulate an LLM API. You can change this behavior via the LAMBDA environment variable.

Quick Start: Simulate LLM API Responses

1. Clone the GitHub repo

git clone https://github.com/direktoren/gecholog_resources.git

2. Set environment variables

Windows (Command Prompt):

# Set the nats token (necessary for mock to connect to gecholog)
setx NATS_TOKEN "changeme"

# Set the gui secret to be able to log in to the gecholog web interface
setx GUI_SECRET "changeme"

# Replace this with the url to your LLM API
setx AISERVICE_API_BASE "https://your.openai.azure.com/"

macOS/Linux:

# Set the nats token (necessary for mock to connect to gecholog)
export NATS_TOKEN=changeme

# Set the gui secret to be able to log in to the gecholog web interface
export GUI_SECRET=changeme

# Replace this with the url to your LLM API
export AISERVICE_API_BASE=https://your.openai.azure.com/


3. Start gecholog and the mock processor

cd gecholog_resources/processors/mock
docker compose up -d

The Docker Compose command starts and configures the LLM gateway gecholog and the mock processor, building the mock container locally.


NOTE: To take the app down, run docker compose down -v
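Optionally, confirm that both containers are up before moving on:

docker compose ps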


4. Make the calls

This example will use Azure OpenAI, but you can use any LLM API service.

Windows (Command Prompt):

setx AISERVICE_API_KEY "your_api_key"
setx DEPLOYMENT "your_azure_deployment"

macOS/Linux:

export AISERVICE_API_KEY=your_api_key
export DEPLOYMENT=your_azure_deployment


Send the request to the /service/standard/ router:

Windows (Command Prompt):

curl -X POST ^
     -H "api-key: %AISERVICE_API_KEY%" ^
     -H "Content-Type: application/json" ^
     -d "{\"messages\": [{\"role\": \"system\",\"content\": \"Assistant is a large language model trained by OpenAI.\"},{\"role\": \"user\",\"content\": \"Who are the founders of Microsoft?\"}],\"max_tokens\": 15}" ^
     http://localhost:5380/service/standard/openai/deployments/%DEPLOYMENT%/chat/completions?api-version=2023-12-01-preview

macOS/Linux:

curl -X POST -H "api-key: $AISERVICE_API_KEY" -H "Content-Type: application/json" -d '{
    "messages": [
      {
        "role": "system",
        "content": "Assistant is a large language model trained by OpenAI."
      },
      {
        "role": "user",
        "content": "Who are the founders of Microsoft?"
      }
    ],
    "max_tokens": 15
  }' "http://localhost:5380/service/standard/openai/deployments/$DEPLOYMENT/chat/completions?api-version=2023-12-01-preview"


Expect a response like the one below; it becomes the recorded response for /mock/service/standard/ requests:

{
  "id": "chatcmpl-8nZCiOLutrIDeVT94lyXkYzdKtkDe",
  "object": "chat.completion",
  "created": 1706824088,
  "model": "gpt-35-turbo",
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The founders of Microsoft are Bill Gates and Paul Allen. They founded Microsoft on"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 15,
    "total_tokens": 44
  }
}

Now try to make your requests to the mock router /mock/service/standard/:

Windows (Command Prompt):

curl -X POST ^
     -H "api-key: %AISERVICE_API_KEY%" ^
     -H "Content-Type: application/json" ^
     -d "{\"messages\": [{\"role\": \"system\",\"content\": \"Assistant is a large language model trained by OpenAI.\"},{\"role\": \"user\",\"content\": \"Who are the founders of Microsoft?\"}],\"max_tokens\": 15}" ^
     http://localhost:5380/mock/service/standard/openai/deployments/%DEPLOYMENT%/chat/completions?api-version=2023-12-01-preview

macOS/Linux:

curl -X POST -H "api-key: $AISERVICE_API_KEY" -H "Content-Type: application/json" -d '{
    "messages": [
      {
        "role": "system",
        "content": "Assistant is a large language model trained by OpenAI."
      },
      {
        "role": "user",
        "content": "Who are the founders of Microsoft?"
      }
    ],
    "max_tokens": 15
  }' "http://localhost:5380/mock/service/standard/openai/deployments/$DEPLOYMENT/chat/completions?api-version=2023-12-01-preview"


Each time, you should receive the recorded response:

{
  "id": "chatcmpl-8nZCiOLutrIDeVT94lyXkYzdKtkDe",
  "object": "chat.completion",
  "created": 1706824088,
  "model": "gpt-35-turbo",
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The founders of Microsoft are Bill Gates and Paul Allen. They founded Microsoft on"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 15,
    "total_tokens": 44
  }
}

Congratulations, you now have a mock LLM API!

5. Learn about configuring gecholog via the web interface

The GUI_SECRET is the password to log in to the gecholog web interface, available at http://localhost:8080/login.
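A quick way to check that the interface is being served (assuming the default port mapping from the compose file; the exact status code may vary):

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/login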

Usage

Record responses

mock will store the last response for each router.

request1 to /service/standard/ returns answer1
request2 to /service/standard/ returns answer2
request3 to /service/standard/ returns answer3
request4 to /mock/service/standard/ returns answer3

mock will separate the responses for each router.

request1 to /service/standard/ returns answer1
request2 to /service/capped/ returns answer2
request3 to /mock/service/standard/ returns answer1
request4 to /mock/service/capped/ returns answer2
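To convince yourself of the replay behavior, a small loop like this sketch (assuming jq is installed and the Quick Start environment variables are set) should write three identical payloads:

# Send three requests to the mock router and store the normalized payloads
for i in 1 2 3; do
  curl -s -X POST -H "api-key: $AISERVICE_API_KEY" -H "Content-Type: application/json" \
       -d '{"messages": [{"role": "user", "content": "ping"}], "max_tokens": 5}' \
       "http://localhost:5380/mock/service/standard/openai/deployments/$DEPLOYMENT/chat/completions?api-version=2023-12-01-preview" \
       | jq -S . > "response_$i.json"
done

# All three files should be byte-identical
diff response_1.json response_2.json && diff response_2.json response_3.json && echo "identical"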

Change response time

mock randomizes the response time by drawing from an exponential distribution controlled by the LAMBDA environment variable. Set LAMBDA=0 (the default) to disable the simulated latency. The docker-compose.yml uses LAMBDA=0.2, which gives a mean response time of 500 ms.
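For intuition, samples from an exponential distribution with rate LAMBDA have mean 1/LAMBDA and can be drawn with inverse-transform sampling. The awk one-liner below illustrates the shape of the delays; it is not the processor's actual code, and the time unit mock attaches to each draw is not documented here:

# Draw 10 samples from Exp(LAMBDA) via inverse-transform sampling;
# with LAMBDA=0.2 the sample mean approaches 1/0.2 = 5 units
LAMBDA=0.2
awk -v l="$LAMBDA" 'BEGIN { srand(); for (i = 0; i < 10; i++) print (-log(1 - rand()) / l) }'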

Start gecholog and mock manually

Windows (Command Prompt):

# Set the nats token (necessary for mock to connect to gecholog)
setx NATS_TOKEN "changeme"

# Set the gui secret to be able to log in to the gecholog web interface
setx GUI_SECRET "changeme"

# Replace this with the url to your LLM API
setx AISERVICE_API_BASE "https://your.openai.azure.com/"

macOS/Linux:

# Set the nats token (necessary for mock to connect to gecholog)
export NATS_TOKEN=changeme

# Set the gui secret to be able to log in to the gecholog web interface
export GUI_SECRET=changeme

# Replace this with the url to your LLM API
export AISERVICE_API_BASE=https://your.openai.azure.com/


Start and configure the gecholog container

cd gecholog_resources/processors/mock

# Create a docker network
docker network create gecholog

# Spin up gecholog container
docker run -d -p 5380:5380 -p 4222:4222 -p 8080:8080 \
  --network gecholog --name gecholog \
  --env NATS_TOKEN=$NATS_TOKEN \
  --env GUI_SECRET=$GUI_SECRET \
  --env AISERVICE_API_BASE=$AISERVICE_API_BASE \
  gecholog/gecholog:latest

# Copy the gl_config to gecholog (if valid it will be applied directly)
# This config tells gecholog when to call mock
docker cp gl_config.json gecholog:/app/conf/gl_config.json

Optional: check that the config file has been applied. Both of the following should produce the same checksum.

In the container:

docker exec gecholog ./healthcheck -s gl -p

On your host:

# Windows
CertUtil -hashfile gl_config.json SHA256

# macOS
shasum -a 256 gl_config.json

# Linux
sha256sum gl_config.json


Continue with the mock processor container

# Build the processor container
docker build --no-cache -f Dockerfile -t mock .

# Start the processor container
docker run -d \
        --network gecholog --name mock \
        --env NATS_TOKEN=$NATS_TOKEN \
        --env GECHOLOG_HOST=gecholog \
        --env LAMBDA=0.2 \
        mock
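To confirm the processor container started cleanly, inspect its logs (the exact log lines depend on the mock build):

docker logs mock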

Response headers

mock stores both the response payload and the response headers; the headers are coupled to each recorded answer. gecholog applies its processing rules for headers as documented in Section Headers. For example, the Session-Id header will be unique for each request to the /mock/ router:

{
  "Session-Id": [
    "TST00001_1709042087441156891_5_0"
  ]
}
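You can inspect the headers on a mock response directly; this sketch discards the body and prints only the response headers:

# -D - dumps response headers to stdout, -o /dev/null discards the body
curl -s -D - -o /dev/null -X POST \
     -H "api-key: $AISERVICE_API_KEY" -H "Content-Type: application/json" \
     -d '{"messages": [{"role": "user", "content": "ping"}], "max_tokens": 5}' \
     "http://localhost:5380/mock/service/standard/openai/deployments/$DEPLOYMENT/chat/completions?api-version=2023-12-01-preview"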

Monitor logs in realtime

You can connect to the service bus of the gecholog container to see the logs from the API calls.

This command displays the control field that mock uses to prevent the request from being forwarded to the LLM API:

Windows (Command Prompt):

nats sub --translate "jq .request.control" -s "%NATS_TOKEN%@localhost" "coburn.gl.logger"

macOS/Linux:

nats sub --translate "jq .request.control" -s "$NATS_TOKEN@localhost" "coburn.gl.logger"


Sending a first request to /service/standard/ and three consecutive requests to /mock/service/standard/ produces the output below. The first entry is null because the initial request carries no control field; for the mock requests, mock sets the control field to the recorded router path, which is what stops gecholog from forwarding them upstream:

14:10:56 Subscribing on coburn.gl.logger 
[#1] Received on "coburn.gl.logger"
null

[#2] Received on "coburn.gl.logger"
"/service/standard/openai/deployments/gpt4/chat/completions"

[#3] Received on "coburn.gl.logger"
"/service/standard/openai/deployments/gpt4/chat/completions"

[#4] Received on "coburn.gl.logger"
"/service/standard/openai/deployments/gpt4/chat/completions"