LLM Configuration

Configure AI providers for decision synthesis in self-hosted Align.

Overview

Align uses LLMs for:

  • Decision synthesis — Extracting structured decisions from conversations
  • Context understanding — Interpreting the discussion around each decision
  • Embeddings — Semantic search across decisions

Provider Options

Provider     Pros                         Cons
OpenAI       Best quality, easy setup     Data leaves your infra
Anthropic    High quality, safety focus   Data leaves your infra
Self-Hosted  Full data sovereignty        More setup, hardware needed

Option 1: OpenAI

Setup

  1. Get an API key from platform.openai.com

  2. Create the secret:

kubectl create secret generic align-llm \
  --namespace align \
  --from-literal=openai-api-key="sk-..."

  3. Configure in Helm values:

secrets:
  llm:
    openaiApiKey: "" # Pulled from secret
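
With the secret created, roll the new values out to your release. A minimal sketch, assuming the release is named align and <align-chart> stands in for whatever chart reference you installed from:

helm upgrade align <align-chart> \
  --namespace align \
  -f values.yaml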

Models Used

  • GPT-4 — Decision synthesis
  • text-embedding-3-small — Embeddings (unless local embeddings are enabled; see Embeddings below)

Option 2: Anthropic

Setup

  1. Get an API key from console.anthropic.com

  2. Create the secret:

kubectl create secret generic align-llm \
  --namespace align \
  --from-literal=anthropic-api-key="sk-ant-..."

  3. Configure in Helm values:

secrets:
  llm:
    anthropicApiKey: "" # Pulled from secret

Models Used

  • Claude 3 — Decision synthesis

Option 3: Self-Hosted Models

For complete data sovereignty, use your own LLM server.

Supported Servers

Any server implementing the OpenAI API format:

Server     Use Case                  Setup
Ollama     Easy local deployment     ollama serve
vLLM       Production GPU inference  Docker/K8s
LocalAI    CPU-friendly              Docker
LM Studio  Desktop GUI               App
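
In practice, "OpenAI API format" means the server answers the standard /v1/chat/completions and /v1/models routes. A quick compatibility check against a candidate server, sketched here against a local Ollama on its default port (adjust the host, port, and model name to your setup, and make sure the model is already pulled):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3:8b", "messages": [{"role": "user", "content": "Reply with ok"}]}'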

Quick Start with Ollama

  1. Deploy Ollama in your cluster (a matching PersistentVolumeClaim for model storage is sketched after these steps):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: align
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          resources:
            limits:
              nvidia.com/gpu: 1 # Optional: GPU
          volumeMounts:
            - name: models
              mountPath: /root/.ollama
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: ollama-models
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: align
spec:
  selector:
    app: ollama
  ports:
    - port: 11434

  2. Pull a model:

kubectl exec -it deploy/ollama -n align -- ollama pull llama3:70b

  3. Configure Align:

# values.yaml
secrets:
  llm:
    custom:
      baseUrl: "http://ollama.align.svc.cluster.local:11434/v1"
      model: "llama3:70b"
      apiKey: "" # Not needed for Ollama
    useLocalEmbeddings: true
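
The Deployment in step 1 mounts a PersistentVolumeClaim named ollama-models that is not defined above. A minimal sketch; the requested size and any storageClassName are assumptions, so size it for the models you plan to pull (a 70B model alone takes tens of gigabytes):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models
  namespace: align
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi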

vLLM for Production

For high-throughput production use:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
  namespace: align
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args:
            - "--model"
            - "meta-llama/Llama-3-70b-chat-hf"
            - "--tensor-parallel-size"
            - "4"
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 4
          env:
            - name: HUGGING_FACE_HUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token
                  key: token
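
The Deployment above reads a Hugging Face token from a secret named hf-token, which has to exist before the pod can pull the gated Llama weights. Create it the same way as the other secrets (the token value is your own):

kubectl create secret generic hf-token \
  --namespace align \
  --from-literal=token="hf_..."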

Configure Align:

secrets:
  llm:
    custom:
      baseUrl: "http://vllm.align.svc.cluster.local:8000/v1"
      model: "meta-llama/Llama-3-70b-chat-hf"

Embeddings

Local Embeddings (Default)

Align uses local sentence-transformers by default:

  • Model: all-MiniLM-L6-v2 (384 dimensions)
  • Cost: Free (runs locally in the Brain pod)
  • Privacy: Data never leaves your cluster

Enable with:

secrets:
  llm:
    useLocalEmbeddings: true

OpenAI Embeddings

To use OpenAI embeddings instead:

secrets:
  llm:
    useLocalEmbeddings: false
    openaiApiKey: "sk-..."

Model Recommendations

Task                Model             Notes
Decision synthesis  llama3:70b        Best quality for local
Decision synthesis  llama3:8b         Faster, lower resource
Decision synthesis  mistral:7b        Good balance
Decision synthesis  gpt-4             Best overall (cloud)
Decision synthesis  claude-3-opus     High quality (cloud)
Embeddings          all-MiniLM-L6-v2  Local, fast, good quality

Configuration via UI

You can also configure LLM settings in the Align UI:

  1. Go to Settings → LLM Settings
  2. Select provider
  3. Enter credentials
  4. Save

UI-configured settings are stored encrypted in the database and take precedence over Helm values.


Troubleshooting

Connection refused

Ensure the LLM server is accessible from the Brain pod:

kubectl exec -it deploy/align-brain -n align -- \
  curl http://ollama:11434/v1/models

Slow responses

  • Use GPU acceleration (vLLM recommended for production)
  • Reduce model size (8B instead of 70B; see the sketch after this list)
  • Increase Brain pod resources
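
For the Ollama setup above, reducing model size usually means pulling the smaller model and pointing the custom model setting at it; a sketch (the rest of the custom block stays as configured earlier):

kubectl exec -it deploy/ollama -n align -- ollama pull llama3:8b

secrets:
  llm:
    custom:
      model: "llama3:8b"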

Model not found

# For Ollama, pull the model first
kubectl exec -it deploy/ollama -n align -- ollama pull llama3:70b

JSON mode issues

Some local models don't support JSON mode reliably. Consider:

  • Using models fine-tuned for structured output
  • Falling back to OpenAI/Anthropic for critical tasks

Security

  • API keys are encrypted at rest
  • Self-hosted models keep all data in your cluster
  • Use Kubernetes NetworkPolicies to restrict LLM server access
  • Consider mTLS for production deployments
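
For example, a NetworkPolicy can restrict the Ollama service so only the Brain pod can reach it. A minimal sketch, assuming the app: ollama label from the Deployment above and an app: align-brain label on the Brain pods (the Brain label name is an assumption; match both to your deployment):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ollama-allow-brain-only
  namespace: align
spec:
  podSelector:
    matchLabels:
      app: ollama
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: align-brain
      ports:
        - protocol: TCP
          port: 11434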