Documentation

MindBalancer is a high-performance load balancer and reverse proxy for AI/LLM APIs. Think ProxySQL, but for AI.

Installation

Requirements

  • Go 1.20 or later
  • SQLite (included)
  • Any Linux, macOS, or Windows system

From Source

# Clone the repository
git clone https://github.com/mindbalancer/mindbalancer-labs.git
cd mindbalancer-labs

# Build
make build

# Binaries will be in ./bin/
ls -la bin/
# mindbalancer  mindsql

Using Go Install

go install github.com/mindbalancer/mindbalancer-labs/cmd/mindbalancer@latest
go install github.com/mindbalancer/mindbalancer-labs/cmd/mindsql@latest

Quick Start

Get MindBalancer running in under 5 minutes.

1. Create Configuration

# mindbalancer.cnf
[mindbalancer]
proxy_bind_address = 0.0.0.0
proxy_port = 6034
admin_bind_address = 127.0.0.1
admin_port = 6032
data_dir = /var/lib/mindbalancer

# Optional: 32-char key for API key encryption
api_key_encryption_key = your-32-character-encryption-key

2. Start MindBalancer

./bin/mindbalancer -config mindbalancer.cnf

You should see:

Starting proxy server on 0.0.0.0:6034
Starting admin MySQL server on 127.0.0.1:6032
Starting admin HTTP server on 127.0.0.1:6033

3. Add Your First Server

# Connect to admin interface
./bin/mindsql

# Add an OpenAI server
mindsql> INSERT INTO ai_servers (name, provider_type, endpoint, api_key_encrypted, weight, status)
         VALUES ('openai-primary', 'openai', 'https://api.openai.com', 'sk-your-key', 100, 'ONLINE');

# Verify
mindsql> SELECT * FROM ai_servers;

4. Send Your First Request

curl http://localhost:6034/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Configuration

MindBalancer uses an INI-style configuration file.

Setting                     Default                Description
proxy_port                  6034                   Port for the OpenAI-compatible API
admin_port                  6032                   Port for the mindsql admin interface
data_dir                    /var/lib/mindbalancer  Directory for the SQLite database
health_check_interval_ms    5000                   Health check frequency (ms)
circuit_breaker_threshold   5                      Consecutive failures before the circuit opens
max_retries                 3                      Maximum retry attempts per request
cache_enabled               true                   Enable response caching
cache_ttl_ms                300000                 Cache entry TTL (5 minutes)
rate_limit_enabled          true                   Enable per-user rate limiting

Architecture

MindBalancer sits between your application and AI providers, handling:

  • Load Balancing — Distribute requests across multiple providers
  • Health Checks — Continuously monitor provider health
  • Circuit Breaking — Prevent cascade failures
  • Response Caching — Cache deterministic responses
  • Request Routing — Route by model, pattern, or user
┌─────────────┐     ┌──────────────────┐     ┌─────────────┐
│ Application │────▶│   MindBalancer   │────▶│   OpenAI    │
│  (OpenAI    │     │                  │     └─────────────┘
│   SDK)      │     │  ┌────────────┐  │     ┌─────────────┐
└─────────────┘     │  │ Balancer   │  │────▶│  Anthropic  │
                    │  │ Router     │  │     └─────────────┘
                    │  │ Cache      │  │     ┌─────────────┐
                    │  │ Metrics    │  │────▶│   Ollama    │
                    │  └────────────┘  │     └─────────────┘
                    └──────────────────┘

Load Balancing

Strategies

MindBalancer supports multiple load balancing strategies:

  • Weighted Round-Robin — Distribute based on server weights
  • Least Connections — Route to server with fewest active requests
  • Latency-based — Prefer servers with lowest latency
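
To make the first strategy concrete, here is a minimal weighted round-robin sketch in Python. The server names and weights are hypothetical, and MindBalancer itself is written in Go; this only illustrates how weights translate into request shares.

```python
from functools import reduce
from itertools import cycle
from math import gcd

def build_schedule(servers):
    """Expand (name, weight) pairs into a repeating schedule.

    A server with weight 100 receives twice as many requests as one
    with weight 50. Weights are divided by their GCD so the schedule
    stays as short as possible."""
    g = reduce(gcd, (weight for _, weight in servers))
    slots = []
    for name, weight in servers:
        slots.extend([name] * (weight // g))
    return cycle(slots)

# Hypothetical pool: openai-primary gets 2/3 of the traffic.
schedule = build_schedule([("openai-primary", 100), ("groq-1", 50)])
first_six = [next(schedule) for _ in range(6)]
print(first_six)
```

Note that this naive expansion sends consecutive requests to the same server; smoother variants (e.g. nginx-style smooth weighted round-robin) interleave the picks while preserving the same long-run ratios.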

Hostgroups

Organize servers into hostgroups for different workloads:

-- Fast models for chat
INSERT INTO ai_servers (name, hostgroup, ...) VALUES ('groq-1', 1, ...);

-- Powerful models for complex tasks  
INSERT INTO ai_servers (name, hostgroup, ...) VALUES ('openai-1', 2, ...);

-- Route based on model
INSERT INTO routing_rules (match_model, destination_hostgroup)
VALUES ('llama*', 1), ('gpt-4*', 2);
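
The match_model patterns above behave like shell globs. A sketch of that matching logic in Python, assuming rules are evaluated in insertion order and the first match wins (the fallback hostgroup here is a hypothetical default):

```python
from fnmatch import fnmatch

# (pattern, destination_hostgroup) pairs mirroring the SQL above.
rules = [("llama*", 1), ("gpt-4*", 2)]

def route(model, default_hostgroup=0):
    """Return the hostgroup of the first rule whose pattern matches."""
    for pattern, hostgroup in rules:
        if fnmatch(model, pattern):
            return hostgroup
    return default_hostgroup

print(route("llama-3-70b"))   # matched by llama*
print(route("gpt-4-turbo"))   # matched by gpt-4*
print(route("claude-3"))      # no rule matches, falls back to default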

Failover

Health Checks

MindBalancer continuously monitors server health with configurable checks:

mindsql> SHOW HEALTH STATUS;
+---------------+---------+---------+---------------------+
| server        | healthy | latency | last_check          |
+---------------+---------+---------+---------------------+
| openai-main   | Yes     | 245ms   | 2025-01-25 10:30:05 |
| anthropic-1   | Yes     | 312ms   | 2025-01-25 10:30:05 |
| ollama-local  | No      | -       | 2025-01-25 10:30:04 |
+---------------+---------+---------+---------------------+

Circuit Breaker

After circuit_breaker_threshold consecutive failures, the circuit breaker opens so that further requests fail fast instead of piling onto a struggling provider:

  • Closed — Normal operation, requests flow through
  • Open — Too many failures, requests fail fast
  • Half-Open — Testing if service recovered
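
The three states above can be sketched as a small state machine. This is an illustrative Python sketch, not MindBalancer's Go implementation; the `cooldown` parameter (time before a half-open trial is allowed) is an assumption.

```python
import time

class CircuitBreaker:
    """Minimal sketch of the closed / open / half-open breaker."""

    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold    # consecutive failures that open the circuit
        self.cooldown = cooldown      # seconds before a half-open trial
        self.failures = 0
        self.opened_at = None         # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True                                       # closed: flow through
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True                                       # half-open: one trial
        return False                                          # open: fail fast

    def record_success(self):
        self.failures = 0
        self.opened_at = None                                 # close the circuit

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()                 # open the circuit

cb = CircuitBreaker(threshold=3, cooldown=60.0)
for _ in range(3):
    cb.record_failure()
print(cb.allow_request())   # False: circuit is open, requests fail fast
```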

Retry with Backoff

Failed requests automatically retry with exponential backoff:

[mindbalancer]
max_retries = 3
retry_initial_delay_ms = 100
retry_max_delay_ms = 5000
retry_multiplier = 2.0
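
With the settings above, the delay before each retry grows as a capped geometric sequence. A quick sketch of that arithmetic:

```python
def backoff_delays(max_retries=3, initial_ms=100, max_ms=5000, multiplier=2.0):
    """Delay (ms) before each retry: initial * multiplier**attempt, capped."""
    return [min(initial_ms * multiplier**attempt, max_ms)
            for attempt in range(max_retries)]

print(backoff_delays())               # [100.0, 200.0, 400.0]
print(backoff_delays(max_retries=8))  # later delays are capped at 5000 ms
```

Real-world retry loops often add random jitter to these delays so that many clients do not retry in lockstep; whether MindBalancer jitters is not specified here.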

Caching

MindBalancer caches deterministic responses to reduce costs and latency.

How It Works

  • Only caches when temperature=0 (deterministic output)
  • Cache key = hash(model + messages + temperature + max_tokens)
  • Response includes X-Cache: HIT or X-Cache: MISS header
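
The cache key construction can be sketched as follows. The exact serialization and hash function MindBalancer uses are internal; this Python sketch only shows why logically identical requests land on the same key.

```python
import hashlib
import json

def cache_key(model, messages, temperature, max_tokens):
    """Stable hash over the request fields that determine the output.

    json.dumps with sort_keys=True and fixed separators gives a
    canonical serialization, so identical requests hash identically."""
    payload = json.dumps(
        {"model": model, "messages": messages,
         "temperature": temperature, "max_tokens": max_tokens},
        sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode()).hexdigest()

req = {"model": "gpt-4",
       "messages": [{"role": "user", "content": "Hello!"}],
       "temperature": 0, "max_tokens": 1000}
k1 = cache_key(**req)
k2 = cache_key(**req)
print(k1 == k2)   # identical requests -> same key -> X-Cache: HIT
```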

Managing Cache

-- Check cache status
mindsql> SHOW CACHE STATUS;
+------------------+------------------+
| Variable         | Value            |
+------------------+------------------+
| status           | enabled          |
| hits             | 1247             |
| misses           | 523              |
| hit_rate         | 0.70             |
| evictions        | 12               |
| size_bytes       | 2458624          |
| item_count       | 847              |
+------------------+------------------+

-- Enable/disable caching
mindsql> CACHE ENABLE;
mindsql> CACHE DISABLE;

-- Clear cache
mindsql> CACHE CLEAR;

HTTP API

# Get cache status
curl http://localhost:6033/api/cache

# Enable cache
curl -X PUT http://localhost:6033/api/cache -d '{"enabled": true}'

# Clear cache
curl -X POST http://localhost:6033/api/cache/clear

API Reference

MindBalancer exposes an OpenAI-compatible API.

Chat Completions

POST /v1/chat/completions

{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}

Models List

GET /v1/models

Response Headers

Header                 Description
X-Request-ID           Unique request identifier
X-Cache                HIT or MISS (cache status)
X-Retry-Count          Number of retries performed (if any)
X-RateLimit-Remaining  Requests remaining in the current window
X-RateLimit-Reset      Unix timestamp when the limit resets

mindsql CLI

mindsql is a MySQL-compatible CLI for managing MindBalancer.

Connection

# Default connection
./bin/mindsql

# Custom host/port
./bin/mindsql -h 192.168.1.100 -P 6032

# Execute single command
./bin/mindsql -e "SELECT * FROM ai_servers"

Commands

-- Server Management
SELECT * FROM ai_servers;
INSERT INTO ai_servers (name, provider_type, endpoint, api_key_encrypted, weight, status)
  VALUES ('name', 'openai', 'https://...', 'sk-...', 100, 'ONLINE');
DELETE FROM ai_servers WHERE name = 'server-name';

-- Routing Rules
SELECT * FROM routing_rules;

-- Users & Rate Limits
SELECT * FROM ai_users;

-- Monitoring
SHOW HEALTH STATUS;
SHOW API KEYS;
SHOW STATS;
SHOW VARIABLES;

-- Cache Management
SHOW CACHE STATUS;
CACHE ENABLE;
CACHE DISABLE;
CACHE CLEAR;

-- Configuration
SET max_retries = 5;

Providers

MindBalancer supports multiple AI providers through a unified interface.

Provider      Type       Endpoint
OpenAI        openai     https://api.openai.com
Anthropic     anthropic  https://api.anthropic.com
Azure OpenAI  azure      https://YOUR.openai.azure.com
Ollama        ollama     http://localhost:11434
Groq          groq       https://api.groq.com
Google AI     google     https://generativelanguage.googleapis.com

Monitoring

Prometheus Metrics

MindBalancer exposes Prometheus-compatible metrics at :9090/metrics.

# Request metrics
mindbalancer_requests_total{server, model, status}
mindbalancer_request_duration_seconds{server, model}
mindbalancer_tokens_total{server, model, type}

# Cost tracking
mindbalancer_cost_usd_total{server, model, provider_type}

# Cache metrics
mindbalancer_cache_hits_total{model}
mindbalancer_cache_misses_total{model}

# Health metrics
mindbalancer_server_health{server}

Web Dashboard

Access the built-in dashboard at http://localhost:6033/ for real-time monitoring.

Security

API Key Encryption

API keys are encrypted at rest using AES-256-GCM. Set a 32-character encryption key:

[mindbalancer]
api_key_encryption_key = your-32-character-encryption-key

Rate Limiting

Configure per-user rate limits:

[mindbalancer]
rate_limit_enabled = true
default_requests_per_minute = 60
default_tokens_per_minute = 100000
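
As an illustration of what per-user requests-per-minute limiting means, here is a fixed-window counter sketch in Python. MindBalancer's actual algorithm may differ (e.g. a sliding window or token bucket), and the user names are hypothetical.

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Sketch of per-user requests-per-minute limiting.

    Requests are counted per (user, minute) bucket; once a bucket
    reaches the limit, further requests in that minute are denied."""

    def __init__(self, requests_per_minute=60):
        self.limit = requests_per_minute
        self.counts = defaultdict(int)

    def allow(self, user, now=None):
        now = time.time() if now is None else now
        bucket = (user, int(now // 60))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True

limiter = FixedWindowLimiter(requests_per_minute=2)
print(limiter.allow("alice", 0.0))    # True
print(limiter.allow("alice", 0.0))    # True
print(limiter.allow("alice", 1.0))    # False: limit hit in this window
print(limiter.allow("alice", 60.0))   # True: new minute, counter resets
```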

Best Practices

  • Run admin interface on localhost only (admin_bind_address = 127.0.0.1)
  • Use TLS in production (configure tls_cert_file and tls_key_file)
  • Rotate the encryption key periodically
  • Monitor rate limit headers for abuse detection