Regex

The regex custom processor uses regular expressions to extract information from the LLM API response. Both the response fields to extract from and the regex patterns to apply can be customized. The default behavior is as follows:

  • regex attempts to extract the text inside a fenced block of the form ```markdown TEXT``` from the response field choices[0].message.content for all routers except the /json/ router.
  • It adds a new field, regex, to the response. The field indicates whether the match was successful and contains the extracted text.
  • For the /json/ router, regex extracts JSON and deserializes the extracted data.

The environment variable MATCH_JSON toggles the deserialization feature.
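
The default behavior described above can be sketched in Python. This is an illustration only, not the processor's actual implementation (which is written in Go). The pattern, the function name add_regex_field, and the match_json parameter (standing in for MATCH_JSON) are assumptions; the shape of the added regex field follows the example responses in this document.

```python
import json
import re

# Three backticks, built at runtime so this example stays easy to embed.
FENCE = "`" * 3

# Assumed pattern: the text inside a fenced markdown or json block.
# re.DOTALL lets "." match newlines; ".*?" keeps each match as short as possible.
FENCE_PATTERN = re.compile(FENCE + r"(?:markdown|json)\n(.*?)\n" + FENCE, re.DOTALL)


def add_regex_field(response: dict, match_json: bool = False) -> dict:
    """Return a copy of the response with a 'regex' field added."""
    content = response["choices"][0]["message"]["content"]
    sections = []
    for text in FENCE_PATTERN.findall(content):
        obj = None
        if match_json:
            try:
                obj = json.loads(text)
            except json.JSONDecodeError:
                obj = None  # keep only the raw text when it is not valid JSON
        sections.append({"text": text, "object": obj})
    result = dict(response)
    result["regex"] = {"match": bool(sections), "sections": sections}
    return result
```

How the real processor reports a failed match is not shown in this document; the empty sections list used here is a choice made for the sketch.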

Quick Start: Explore Regex Extraction

1. Clone the GitHub repo

git clone https://github.com/direktoren/gecholog_resources.git

2. Set environment variables

Windows (Command Prompt):

# Set the nats token (necessary for the regex processor to connect to gecholog)
setx NATS_TOKEN "changeme"

# Set the gui secret to be able to log in to the gecholog web interface
setx GUI_SECRET "changeme"

# Replace this with the url to your LLM API
setx AISERVICE_API_BASE "https://your.openai.azure.com/"

# Note: setx only affects new Command Prompt windows, so open a new one before continuing

Linux/macOS:

# Set the nats token (necessary for the regex processor to connect to gecholog)
export NATS_TOKEN=changeme

# Set the gui secret to be able to log in to the gecholog web interface
export GUI_SECRET=changeme

# Replace this with the url to your LLM API
export AISERVICE_API_BASE=https://your.openai.azure.com/


3. Start gecholog and the regex processor

cd gecholog_resources/processors/regex
docker compose up -d

The Docker Compose command starts and configures the LLM Gateway gecholog and the processor regex. It builds the regex container locally.


NOTE: To take the app down, run docker compose down -v


4. Make the calls

These examples will use Azure OpenAI, but you can use any LLM API service.

Windows (Command Prompt):

setx AISERVICE_API_KEY "your_api_key"
setx DEPLOYMENT "your_azure_deployment"

Linux/macOS:

export AISERVICE_API_KEY=your_api_key
export DEPLOYMENT=your_azure_deployment


Markdown Extraction

Make a request to the /markdown/ router and ask the LLM API for a response in markdown. Test this with GPT-4 for best results.

Windows (Command Prompt):

curl -X POST ^
     -H "api-key: %AISERVICE_API_KEY%" ^
     -H "Content-Type: application/json" ^
     -d "{\"messages\": [{\"role\": \"system\",\"content\": \"Assistant is a large language model trained by OpenAI.\"},{\"role\": \"user\",\"content\": \"Who are the founders of Microsoft? Please provide answer in 20 words in markdown\"}],\"max_tokens\": 150}" ^
     http://localhost:5380/markdown/openai/deployments/%DEPLOYMENT%/chat/completions?api-version=2023-12-01-preview

Linux/macOS:

curl -X POST -H "api-key: $AISERVICE_API_KEY" -H "Content-Type: application/json" -d '{
    "messages": [
      {
        "role": "system",
        "content": "Assistant is a large language model trained by OpenAI."
      },
      {
        "role": "user",
        "content": "Who are the founders of Microsoft? Please provide answer in 20 words in markdown"
      }
    ],
    "max_tokens": 150
  }' "http://localhost:5380/markdown/openai/deployments/$DEPLOYMENT/chat/completions?api-version=2023-12-01-preview"
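
For readers who prefer a script over curl, the same request could be built with Python's standard library. This is a minimal sketch: the helper name build_request and its parameters are hypothetical, while the URL layout, headers, and body mirror the curl command above.

```python
import json
import urllib.request


def build_request(router: str, deployment: str, api_key: str, prompt: str,
                  base_url: str = "http://localhost:5380") -> urllib.request.Request:
    """Build (but do not send) a chat completion request for a gecholog router."""
    url = (f"{base_url}/{router}/openai/deployments/{deployment}"
           "/chat/completions?api-version=2023-12-01-preview")
    body = {
        "messages": [
            {"role": "system",
             "content": "Assistant is a large language model trained by OpenAI."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 150,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Sending the request requires gecholog to be running locally:
#   import os
#   request = build_request("markdown", os.environ["DEPLOYMENT"],
#                           os.environ["AISERVICE_API_KEY"],
#                           "Who are the founders of Microsoft? "
#                           "Please provide answer in 20 words in markdown")
#   with urllib.request.urlopen(request) as reply:
#       print(json.load(reply))
```

Passing "json" instead of "markdown" as the router targets the /json/ router in the same way.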


You should receive a response like this:

{
  "id": "chatcmpl-8gA57qCBPn7nTOMqhNDIiaFxoCRD7",
  "object": "chat.completion",
  "created": 1705059221,
  "model": "gpt-4",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "```markdown\nMicrosoft was founded by Bill Gates and Paul Allen on April 4, 1975.\n```"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 38,
    "completion_tokens": 22,
    "total_tokens": 60
  },
  "system_fingerprint": "fp_6d044fb900",
  "regex": {
    "match": true,
    "sections": [
      {
        "text": "Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975.",
        "object": null
      }
    ]
  }
}

The regex processor has added the following fields:

{
  "regex": {
    "match": true,
    "sections": [
      {
        "text": "Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975.",
        "object": null
      }
    ]
  }
}

JSON Extraction

Make a request to the /json/ router and ask the LLM API for a response in JSON. Test this with GPT-4 for best results.

Windows (Command Prompt):

curl -X POST ^
     -H "api-key: %AISERVICE_API_KEY%" ^
     -H "Content-Type: application/json" ^
     -d "{\"messages\": [{\"role\": \"system\",\"content\": \"Assistant is a large language model trained by OpenAI.\"},{\"role\": \"user\",\"content\": \"Who are the founders of Microsoft? Please provide answer in 20 words in json\"}],\"max_tokens\": 150}" ^
     http://localhost:5380/json/openai/deployments/%DEPLOYMENT%/chat/completions?api-version=2023-12-01-preview

Linux/macOS:

curl -X POST -H "api-key: $AISERVICE_API_KEY" -H "Content-Type: application/json" -d '{
    "messages": [
      {
        "role": "system",
        "content": "Assistant is a large language model trained by OpenAI."
      },
      {
        "role": "user",
        "content": "Who are the founders of Microsoft? Please provide answer in 20 words in json"
      }
    ],
    "max_tokens": 150
  }' "http://localhost:5380/json/openai/deployments/$DEPLOYMENT/chat/completions?api-version=2023-12-01-preview"


You should receive a response like this:

{
  "id": "chatcmpl-8gA6hfW1QLmh2MaLTI8J55KraVyBq",
  "object": "chat.completion",
  "created": 1705059319,
  "model": "gpt-4",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "```json\n{\n  \"founders_of_microsoft\": [\n    \"Bill Gates\",\n    \"Paul Allen\"\n  ]\n}\n```"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 38,
    "completion_tokens": 27,
    "total_tokens": 65
  },
  "system_fingerprint": "fp_6d044fb900",
  "regex": {
    "match": true,
    "sections": [
      {
        "text": "{\n  \"founders_of_microsoft\": [\n    \"Bill Gates\",\n    \"Paul Allen\"\n  ]\n}",
        "object": {
          "founders_of_microsoft": [
            "Bill Gates",
            "Paul Allen"
          ]
        }
      }
    ]
  }
}

The regex processor has added the following fields and deserialized the JSON object:

{
  "regex": {
    "match": true,
    "sections": [
      {
        "text": "{\n  \"founders_of_microsoft\": [\n    \"Bill Gates\",\n    \"Paul Allen\"\n  ]\n}",
        "object": {
          "founders_of_microsoft": [
            "Bill Gates",
            "Paul Allen"
          ]
        }
      }
    ]
  }
}
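
A client could read the added field as in the sketch below. The function name extracted_values is hypothetical; the field names (regex, match, sections, text, object) are taken from the responses shown above. The sketch prefers the deserialized object and falls back to the raw text when object is null.

```python
from typing import Any, List


def extracted_values(response: dict) -> List[Any]:
    """Return one value per section: the deserialized object if present, else the text."""
    regex = response.get("regex") or {}
    if not regex.get("match"):
        return []
    values = []
    for section in regex.get("sections", []):
        obj = section.get("object")
        values.append(obj if obj is not None else section.get("text"))
    return values
```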

5. Learn about configuring gecholog via the web interface

GUI_SECRET is the password for logging in to the gecholog web interface, available at http://localhost:8080/login.

Usage

Field Selection

The regex processor uses gjson syntax to select the response fields for extraction. This makes regex LLM API agnostic. The default field selection pattern is choices[0].message.content.
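
To illustrate what path-based field selection does, here is a small Python sketch. It is not gjson and covers only a tiny subset of what such a library offers: dotted keys plus array indices written either as [0] or as a bare number between dots. The function name select_field is hypothetical.

```python
import re
from typing import Any

# A path token is either a key (no dots or brackets) or an index like [0].
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def select_field(document: Any, path: str) -> Any:
    """Resolve a path such as 'choices[0].message.content'; return None if absent."""
    current = document
    for key, index in _TOKEN.findall(path):
        if index or (key.isdigit() and isinstance(current, list)):
            step = int(index or key)  # array index
        else:
            step = key  # object key
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return None
    return current
```

Because the selection is driven by a path rather than hard-coded field names, pointing the processor at a different LLM API is a matter of changing the path.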

Regular Expression

regex uses regular expressions to extract patterns from the response fields. regex is written in Go and uses RE2 syntax.
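
RE2 syntax does not support backreferences or lookaround assertions, so custom patterns should rely on groups, flags, and lazy quantifiers. The sketch below uses Python's re module, which is a different engine, but the pattern is limited to constructs that are also valid in RE2: the (?s) flag, named groups, and a lazy quantifier. The pattern itself and the function name find_sections are assumptions for illustration.

```python
import re

# Three backticks, built at runtime so this example stays easy to embed.
FENCE = "`" * 3

# (?s)          "." also matches newlines
# (?P<lang>..)  named group for the fence label, e.g. "json" or "markdown"
# (?P<text>.*?) lazy match, so each fenced block becomes its own section
PATTERN = re.compile(
    r"(?s)" + FENCE + r"(?P<lang>[a-z]*)\n(?P<text>.*?)\n" + FENCE
)


def find_sections(content: str) -> list:
    """Return a dict with 'lang' and 'text' for every fenced block in content."""
    return [match.groupdict() for match in PATTERN.finditer(content)]
```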

Start gecholog and regex manually

Windows (Command Prompt):

# Set the nats token (necessary for the regex processor to connect to gecholog)
setx NATS_TOKEN "changeme"

# Set the gui secret to be able to log in to the gecholog web interface
setx GUI_SECRET "changeme"

# Replace this with the url to your LLM API
setx AISERVICE_API_BASE "https://your.openai.azure.com/"

Linux/macOS:

# Set the nats token (necessary for the regex processor to connect to gecholog)
export NATS_TOKEN=changeme

# Set the gui secret to be able to log in to the gecholog web interface
export GUI_SECRET=changeme

# Replace this with the url to your LLM API
export AISERVICE_API_BASE=https://your.openai.azure.com/


Start and configure the gecholog container

cd gecholog_resources/processors/regex

# Create a docker network
docker network create gecholog

# Spin up gecholog container
docker run -d -p 5380:5380 -p 4222:4222 -p 8080:8080 \
  --network gecholog --name gecholog \
  --env NATS_TOKEN=$NATS_TOKEN \
  --env GUI_SECRET=$GUI_SECRET \
  --env AISERVICE_API_BASE=$AISERVICE_API_BASE \
  gecholog/gecholog:latest

# Copy the gl_config to gecholog (if valid it will be applied directly)
docker cp gl_config.json gecholog:/app/conf/gl_config.json

Optional: Check that the config file is applied (both statements should produce the same checksum)

docker exec gecholog ./healthcheck -s gl -p

versus

Windows:

CertUtil -hashfile gl_config.json SHA256

macOS:

shasum -a 256 gl_config.json

Linux:

sha256sum gl_config.json
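
If none of these tools is available, the local checksum can also be computed with a few lines of Python. This is a minimal sketch: the helper names sha256_of_file and same_checksum are hypothetical, and the comparison ignores case and surrounding whitespace because the exact output format of the healthcheck command is not shown here.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 checksum of a file as a lowercase hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_checksum(left: str, right: str) -> bool:
    """Compare two hex checksums, ignoring case and surrounding whitespace."""
    return left.strip().lower() == right.strip().lower()


# Example: print(sha256_of_file("gl_config.json"))
```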


Continue with the regex processor container

# Build the processor container
docker build --no-cache -f Dockerfile -t regex .

# Start the processor container
docker run -d \
    --network gecholog --name regex \
    --env NATS_TOKEN=$NATS_TOKEN \
    --env GECHOLOG_HOST=gecholog \
    --env MATCH_JSON=true \
    regex

Monitor logs in realtime

You can connect to the service bus of the gecholog container to see the logs from the API calls.

This command displays the regex response field that the processor adds to each response:

Windows (Command Prompt):

nats sub --translate "jq .response.egress_payload.regex" -s "%NATS_TOKEN%@localhost" "coburn.gl.logger"

Linux/macOS:

nats sub --translate "jq .response.egress_payload.regex" -s "$NATS_TOKEN@localhost" "coburn.gl.logger"


Example

12:33:27 Subscribing on coburn.gl.logger 
[#1] Received on "coburn.gl.logger"
{
  "match": true,
  "sections": [
    {
      "text": "Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975.",
      "object": null
    }
  ]
}