LLM Configuration

Configure AI providers for decision synthesis in self-hosted Align.

Overview

Align uses LLMs for:

  • Decision synthesis — Extracting structured decisions from conversations
  • Context understanding — Interpreting the discussion around each decision
  • Embeddings — Semantic search across decisions

Provider Options

Provider     Pros                         Cons
OpenAI       Best quality, easy setup     Data leaves your infra
Anthropic    High quality, safety focus   Data leaves your infra
Self-Hosted  Full data sovereignty        More setup, hardware needed

Option 1: OpenAI

Setup

  1. Get an API key from platform.openai.com

  2. Create the secret:

kubectl create secret generic align-llm \
  --namespace align \
  --from-literal=openai-api-key="sk-..."

  3. Configure in Helm values:

secrets:
  llm:
    openaiApiKey: "" # Pulled from secret
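
With the secret created, roll the new values out to your release. A minimal sketch, assuming the release is named align and <align-chart> stands in for whatever chart reference you installed from:

helm upgrade align <align-chart> \
  --namespace align \
  -f values.yaml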

Models Used

  • GPT-4 — Decision synthesis
  • text-embedding-3-small — Embeddings (unless local embeddings are enabled; see Embeddings below)

Option 2: Anthropic

Setup

  1. Get an API key from console.anthropic.com

  2. Create the secret:

kubectl create secret generic align-llm \
  --namespace align \
  --from-literal=anthropic-api-key="sk-ant-..."

  3. Configure in Helm values:

secrets:
  llm:
    anthropicApiKey: "" # Pulled from secret

Models Used

  • Claude 3 — Decision synthesis

Option 3: Self-Hosted Models

For complete data sovereignty, use your own LLM server.

Supported Servers

Any server implementing the OpenAI API format:

Server     Use Case                  Setup
Ollama     Easy local deployment     ollama serve
vLLM       Production GPU inference  Docker/K8s
LocalAI    CPU-friendly              Docker
LM Studio  Desktop GUI               App
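
In practice, "OpenAI API format" means the server answers the standard /v1/chat/completions and /v1/models routes. A quick compatibility check against a candidate server, sketched here against a local Ollama on its default port (adjust the host, port, and model name to your setup, and make sure the model is already pulled):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3:8b", "messages": [{"role": "user", "content": "Reply with ok"}]}'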

Quick Start with Ollama

  1. Deploy Ollama in your cluster (a matching PersistentVolumeClaim for model storage is sketched after these steps):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: align
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          resources:
            limits:
              nvidia.com/gpu: 1 # Optional: GPU
          volumeMounts:
            - name: models
              mountPath: /root/.ollama
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: ollama-models
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: align
spec:
  selector:
    app: ollama
  ports:
    - port: 11434

  2. Pull a model:

kubectl exec -it deploy/ollama -n align -- ollama pull llama3:70b

  3. Configure Align:

# values.yaml
secrets:
  llm:
    custom:
      baseUrl: "http://ollama.align.svc.cluster.local:11434/v1"
      model: "llama3:70b"
      apiKey: "" # Not needed for Ollama
    useLocalEmbeddings: true
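
The Deployment in step 1 mounts a PersistentVolumeClaim named ollama-models that is not defined above. A minimal sketch; the requested size and any storageClassName are assumptions, so size it for the models you plan to pull (a 70B model alone takes tens of gigabytes):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models
  namespace: align
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi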

vLLM for Production

For high-throughput production use:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
  namespace: align
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args:
            - "--model"
            - "meta-llama/Llama-3-70b-chat-hf"
            - "--tensor-parallel-size"
            - "4"
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 4
          env:
            - name: HUGGING_FACE_HUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token
                  key: token
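
The Deployment above reads a Hugging Face token from a secret named hf-token, which has to exist before the pod can pull the gated Llama weights. Create it the same way as the other secrets (the token value is your own):

kubectl create secret generic hf-token \
  --namespace align \
  --from-literal=token="hf_..."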

Configure Align:

secrets:
  llm:
    custom:
      baseUrl: "http://vllm.align.svc.cluster.local:8000/v1"
      model: "meta-llama/Llama-3-70b-chat-hf"

Embeddings

Local Embeddings (Default)

Align uses local sentence-transformers by default:

  • Model: all-MiniLM-L6-v2 (384 dimensions)
  • Cost: Free (runs locally in the Brain pod)
  • Privacy: Data never leaves your cluster

Enable with:

secrets:
  llm:
    useLocalEmbeddings: true

OpenAI Embeddings

To use OpenAI embeddings instead:

secrets:
  llm:
    useLocalEmbeddings: false
    openaiApiKey: "sk-..."

Model Recommendations

Task                Model             Notes
Decision synthesis  llama3:70b        Best quality for local
Decision synthesis  llama3:8b         Faster, lower resource
Decision synthesis  mistral:7b        Good balance
Decision synthesis  gpt-4             Best overall (cloud)
Decision synthesis  claude-3-opus     High quality (cloud)
Embeddings          all-MiniLM-L6-v2  Local, fast, good quality

Configuration via UI

You can also configure LLM settings in the Align UI:

  1. Go to Settings → LLM Settings
  2. Select provider
  3. Enter credentials
  4. Save

UI-configured settings are stored encrypted in the database and take precedence over Helm values.


Troubleshooting

Connection refused

Ensure the LLM server is accessible from the Brain pod:

kubectl exec -it deploy/align-brain -n align -- \
  curl http://ollama:11434/v1/models

Slow responses

  • Use GPU acceleration (vLLM recommended for production)
  • Reduce model size (8B instead of 70B; see the sketch after this list)
  • Increase Brain pod resources
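
For the Ollama setup above, reducing model size usually means pulling the smaller model and pointing the custom model setting at it; a sketch (the rest of the custom block stays as configured earlier):

kubectl exec -it deploy/ollama -n align -- ollama pull llama3:8b

secrets:
  llm:
    custom:
      model: "llama3:8b"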

Model not found

# For Ollama, pull the model first
kubectl exec -it deploy/ollama -n align -- ollama pull llama3:70b

JSON mode issues

Some local models don't support JSON mode reliably. Consider:

  • Using models fine-tuned for structured output
  • Falling back to OpenAI/Anthropic for critical tasks

Security

  • API keys are encrypted at rest
  • Self-hosted models keep all data in your cluster
  • Use Kubernetes NetworkPolicies to restrict LLM server access
  • Consider mTLS for production deployments
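
For example, a NetworkPolicy can restrict the Ollama service so only the Brain pod can reach it. A minimal sketch, assuming the app: ollama label from the Deployment above and an app: align-brain label on the Brain pods (the Brain label name is an assumption; match both to your deployment):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ollama-allow-brain-only
  namespace: align
spec:
  podSelector:
    matchLabels:
      app: ollama
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: align-brain
      ports:
        - protocol: TCP
          port: 11434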