LLM Configuration
Configure AI providers for decision synthesis in self-hosted Align.
Overview
Align uses LLMs for:
- Decision synthesis — Extracting structured decisions from conversations
- Context understanding — Interpreting the discussion surrounding each decision
- Embeddings — Semantic search across decisions
Provider Options
| Provider | Pros | Cons |
|---|---|---|
| OpenAI | Best quality, easy setup | Data leaves your infra |
| Anthropic | High quality, safety focus | Data leaves your infra |
| Self-Hosted | Full data sovereignty | More setup, hardware needed |
Option 1: OpenAI
Setup
- Get an API key from platform.openai.com
- Create the secret:

```bash
kubectl create secret generic align-llm \
  --namespace align \
  --from-literal=openai-api-key="sk-..."
```

- Configure in Helm values:

```yaml
secrets:
  llm:
    openaiApiKey: ""  # Pulled from secret
```
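Before handing the key to Align, you can sanity-check it directly against the OpenAI API (a quick smoke test, not part of the Align setup itself):

```bash
# Returns a JSON list of models if the key is valid
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer sk-..."
```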
Models Used
- GPT-4 — Decision synthesis
- text-embedding-3-small — Embeddings (unless local embeddings are enabled)
Option 2: Anthropic
Setup
- Get an API key from console.anthropic.com
- Create the secret:

```bash
kubectl create secret generic align-llm \
  --namespace align \
  --from-literal=anthropic-api-key="sk-ant-..."
```

- Configure in Helm values:

```yaml
secrets:
  llm:
    anthropicApiKey: ""  # Pulled from secret
```
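The key can be verified the same way against the Anthropic API (the model name and token below are placeholders):

```bash
curl -s https://api.anthropic.com/v1/messages \
  -H "x-api-key: sk-ant-..." \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "claude-3-opus-20240229", "max_tokens": 16, "messages": [{"role": "user", "content": "Hello"}]}'
```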
Models Used
- Claude 3 — Decision synthesis
Option 3: Self-Hosted Models
For complete data sovereignty, use your own LLM server.
Supported Servers
Any server implementing the OpenAI API format:
| Server | Use Case | Setup |
|---|---|---|
| Ollama | Easy local deployment | ollama serve |
| vLLM | Production GPU inference | Docker/K8s |
| LocalAI | CPU-friendly | Docker |
| LM Studio | Desktop GUI | App |
Quick Start with Ollama
- Deploy Ollama in your cluster (a matching PersistentVolumeClaim for model storage is sketched after the manifest):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: align
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          resources:
            limits:
              nvidia.com/gpu: 1  # Optional: GPU
          volumeMounts:
            - name: models
              mountPath: /root/.ollama
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: ollama-models
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: align
spec:
  selector:
    app: ollama
  ports:
    - port: 11434
```
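The Deployment above expects a PersistentVolumeClaim named ollama-models for model storage. If your cluster doesn't already provide one, a minimal claim could look like the following (the 100Gi size is only an example; size it for the models you plan to pull, and set a storageClassName if your cluster needs one):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models
  namespace: align
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
```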
- Pull a model:

```bash
kubectl exec -it deploy/ollama -n align -- ollama pull llama3:70b
```
- Configure Align:

```yaml
# values.yaml
secrets:
  llm:
    custom:
      baseUrl: "http://ollama.align.svc.cluster.local:11434/v1"
      model: "llama3:70b"
      apiKey: ""  # Not needed for Ollama
    useLocalEmbeddings: true
```
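With the model pulled, it's worth confirming the OpenAI-compatible endpoint answers before pointing Align at it. A quick smoke test from the Brain pod (the prompt is just an example):

```bash
kubectl exec -it deploy/align-brain -n align -- \
  curl -s http://ollama.align.svc.cluster.local:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3:70b", "messages": [{"role": "user", "content": "Reply with OK"}]}'
```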
vLLM for Production
For high-throughput production use:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
  namespace: align
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args:
            - "--model"
            - "meta-llama/Llama-3-70b-chat-hf"
            - "--tensor-parallel-size"
            - "4"
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 4
          env:
            - name: HUGGING_FACE_HUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token
                  key: token
```
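The Deployment references a Secret named hf-token, which holds the Hugging Face access token needed to download the gated Llama weights. Create it before the pod starts (the token value is a placeholder):

```bash
kubectl create secret generic hf-token \
  --namespace align \
  --from-literal=token="hf_..."
```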
Configure Align:

```yaml
secrets:
  llm:
    custom:
      baseUrl: "http://vllm.align.svc.cluster.local:8000/v1"
      model: "meta-llama/Llama-3-70b-chat-hf"
```
Embeddings
Local Embeddings (Default)
Align uses local sentence-transformers by default:
- Model: all-MiniLM-L6-v2 (384 dimensions)
- Cost: Free (runs locally in the Brain pod)
- Privacy: Data never leaves your cluster
Enable with:

```yaml
secrets:
  llm:
    useLocalEmbeddings: true
```
OpenAI Embeddings
To use OpenAI embeddings instead:

```yaml
secrets:
  llm:
    useLocalEmbeddings: false
    openaiApiKey: "sk-..."
```
Recommended Models
| Task | Model | Notes |
|---|---|---|
| Decision synthesis | llama3:70b | Best quality for local |
| Decision synthesis | llama3:8b | Faster, lower resource |
| Decision synthesis | mistral:7b | Good balance |
| Decision synthesis | gpt-4 | Best overall (cloud) |
| Decision synthesis | claude-3-opus | High quality (cloud) |
| Embeddings | all-MiniLM-L6-v2 | Local, fast, good quality |
Configuration via UI
You can also configure LLM settings in the Align UI:
- Go to Settings → LLM Settings
- Select provider
- Enter credentials
- Save
UI-configured settings are stored encrypted in the database and take precedence over Helm values.
Troubleshooting
Connection refused
Ensure the LLM server is accessible from the Brain pod:

```bash
kubectl exec -it deploy/align-brain -n align -- \
  curl http://ollama:11434/v1/models
```
Slow responses
- Use GPU acceleration (vLLM recommended for production)
- Reduce model size (8B instead of 70B)
- Increase Brain pod resources (see the values sketch below)
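
The exact values key for Brain pod resources depends on your chart version; the brain block below is an assumption, so verify the key name against your chart's values.yaml:

```yaml
# Hypothetical values.yaml snippet -- the "brain" key name is an assumption
brain:
  resources:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      cpu: "4"
      memory: 8Gi
```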
Model not found

```bash
# For Ollama, pull the model first
kubectl exec -it deploy/ollama -n align -- ollama pull llama3:70b
```
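To check which models the Ollama pod already has (and the exact names Align should reference), list them:

```bash
kubectl exec -it deploy/ollama -n align -- ollama list
```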
JSON mode issues
Some local models don't support JSON mode reliably. Consider:
- Using models fine-tuned for structured output
- Falling back to OpenAI/Anthropic for critical tasks
Security
- API keys are encrypted at rest
- Self-hosted models keep all data in your cluster
- Use Kubernetes NetworkPolicies to restrict LLM server access (see the sketch below)
- Consider mTLS for production deployments
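
A sketch of the NetworkPolicy point above, assuming the Brain pods carry an app: align-brain label (adjust the selectors and port to match your deployment); it allows only Align's Brain pods to reach the Ollama server:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-ollama-access
  namespace: align
spec:
  podSelector:
    matchLabels:
      app: ollama
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: align-brain
      ports:
        - protocol: TCP
          port: 11434
```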