
Models is where you wire Runwita to one or more AI providers. Two tiers, four providers each, fully independent. See AI tiers for what each tier does; this page is the configuration reference.

The shape of the panel

You’ll see two cards: Frontier (intelligence) and Workhorse (extraction and chat). Each has the same controls:
  • Provider toggle, four buttons: Claude, OpenAI, Ollama, Custom.
  • API key, password-masked input.
  • Base URL, only visible for Ollama and Custom.
  • Model, a dropdown populated from the provider’s model-listing endpoint (/v1/models, or /api/tags for Ollama) when you click Load models.
Settings auto-saves only the theme picker. Everything else lives in the form until you click Save all at the bottom.
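The panel above can be pictured as a nested mapping: one entry per tier, each holding a full provider configuration. This is an illustrative sketch only, not Runwita’s actual storage schema; every field name here is an assumption.

```python
# Hypothetical sketch of the Models form state -- field names are
# illustrative, not Runwita's actual storage format.
form_state = {
    "frontier": {
        "provider": "claude",            # one of: claude, openai, ollama, custom
        "api_key": "sk-ant-placeholder",
        "base_url": None,                # only used for ollama / custom
        "model": "claude-opus-4-5",
    },
    "workhorse": {
        "provider": "ollama",
        "api_key": "",                   # unused for Ollama
        "base_url": "http://localhost:11434/v1",
        "model": "qwen3:8b",
    },
}

def save_all(state):
    """Pretend persistence: returns a copy of what would be committed.
    Nothing is persisted until Save all is clicked; only the theme
    picker auto-saves."""
    return {tier: dict(cfg) for tier, cfg in state.items()}
```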

Provider: Claude (Anthropic)

The default for the Frontier tier. Recommended whenever output quality matters more than cost.
1. Get an API key

console.anthropic.com → API Keys → Create key. Copy it.
2. Paste it into the API key field

On either or both tiers, depending on what you want Claude to handle.
3. Click Load models

The dropdown populates with every Claude model your key has access to. Newer models appear at the top.
4. Pick a model

Suggested defaults: claude-haiku-4-5 for both tiers (cheap, fast, surprisingly capable). claude-opus-4-5 for Frontier when budget allows. claude-sonnet-4-5 for Workhorse when you want a step up from Haiku without going to Opus.
5. Save all

The bottom save bar commits your changes.
The base URL for Claude isn’t user-editable; it’s always https://api.anthropic.com.

Provider: OpenAI

1. Get an API key

platform.openai.com → API keys → Create new secret key.
2. Paste it in

Same field. Same flow.
3. Click Load models

The dropdown filters to chat-completion-capable models only: embeddings, audio, image, and reasoning-only models are stripped out, so you don’t have to wade through 60+ entries.
4. Pick a model

Recommended defaults: gpt-5.4-nano for Workhorse (cheapest credible). gpt-5 for Frontier or for Workhorse if Workhorse is struggling. gpt-4.1 if you need a larger output budget.
OpenAI’s GPT-5 and o-series models have a different parameter shape than older models (max_completion_tokens instead of max_tokens, and no custom temperature). Runwita detects this automatically by model name; you don’t have to configure anything.
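The two behaviors above can be sketched as simple name heuristics. This is an illustrative approximation of what the doc describes, not Runwita’s actual code; the prefix lists are assumptions.

```python
# Illustrative heuristics only -- Runwita's real filtering and detection
# logic may differ.

# Strip non-chat models (embeddings, audio, image, etc.) from a
# /v1/models listing.
NON_CHAT_PREFIXES = ("text-embedding", "whisper", "tts", "dall-e",
                     "omni-moderation", "babbage", "davinci")

def chat_capable(model_ids):
    return [m for m in model_ids if not m.startswith(NON_CHAT_PREFIXES)]

# GPT-5 and o-series models take max_completion_tokens and reject a
# custom temperature; older models take max_tokens.
def uses_new_param_shape(model_id):
    return model_id.startswith(("gpt-5", "o1", "o3", "o4"))

def completion_params(model_id, budget):
    if uses_new_param_shape(model_id):
        return {"max_completion_tokens": budget}
    return {"max_tokens": budget, "temperature": 0.2}
```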

Provider: Ollama (local)

Ollama runs a model on your own machine. Zero API cost, zero data leaves your laptop. It’s slower and less capable than the cloud options, but for privacy or offline use it’s the answer.
1. Install Ollama

ollama.com/download. Runs as a background service.
2. Pull a model

Open Terminal and run ollama pull qwen3:8b. (Qwen3-8B is the current recommended default for Runwita-style extraction tasks. Llama 3 and Phi-3 also work.)
3. Configure the tier

Provider: Ollama. Base URL: http://localhost:11434/v1 (the default; only change it if you’ve moved Ollama somewhere else). Click Load models and the dropdown shows everything you’ve pulled.
4. Pick the model

qwen3:8b is the safe default. Smaller models (qwen3:4b, phi3:mini) are faster but the output quality drops off a cliff for topic matching.
The API key field is unused for Ollama (Ollama doesn’t require auth), but the form keeps it visible for consistency. Leave it blank.
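For Ollama, Load models reads the local /api/tags endpoint, which lists pulled models under a models key. A minimal sketch of parsing that response (the payload below is a trimmed example, not live output):

```python
import json

# Trimmed example of an /api/tags response from a local Ollama install.
raw = json.dumps({
    "models": [
        {"name": "qwen3:8b"},
        {"name": "phi3:mini"},
    ]
})

def pulled_models(payload):
    """Return the model names that would populate the dropdown."""
    return [entry["name"] for entry in json.loads(payload).get("models", [])]
```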

Provider: Custom (any OpenAI-compatible endpoint)

For LiteLLM, vLLM, Together, OpenRouter, your own proxy, anything that speaks the OpenAI chat completions API.
1. Set the base URL

The full URL up to but not including /chat/completions. For LiteLLM that’s typically https://your-litellm-host.example/v1. For OpenRouter it’s https://openrouter.ai/api/v1.
2. Set the API key

Whatever your endpoint expects. Paste it in.
3. Click Load models

Hits ${base_url}/models. If your endpoint exposes a model list, the dropdown populates.
4. Pick the model

The exact model identifier your endpoint uses. “claude-3-5-sonnet” on a LiteLLM proxy is different from “anthropic/claude-3-5-sonnet” on OpenRouter; watch the names.
If your custom endpoint doesn’t expose /v1/models, you can still use Custom: the Load models button errors out, but you can pick from previously-saved models or just type the model name in. The model field is editable text.

Token budgets

Runwita sends generous output token budgets to give the AI room to produce thorough notes:
  • Single-meeting extraction: 16,384 tokens.
  • Merged extractions (multiple inbox items combined): 32,768 tokens.
  • Frontier intelligence calls: 6,144 tokens (these are short outputs).
Smaller models may not fill these budgets, and that’s fine. Models with lower native output caps (some open models cap at 4K or 8K) might hit a truncation error. If you see “AI response was truncated”, switch to a model with a higher output cap; the error message tells you which model truncated.
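The budgets above can be restated as a lookup; this is a paraphrase of the documented numbers, not Runwita’s source, and the call-type names are assumptions.

```python
# Output token budgets per call type, as documented above.
# Key names are illustrative, not Runwita's internal identifiers.
OUTPUT_BUDGETS = {
    "single_extraction": 16_384,      # one meeting
    "merged_extraction": 32_768,      # multiple inbox items combined
    "frontier_intelligence": 6_144,   # short outputs
}

def output_budget(call_type):
    return OUTPUT_BUDGETS[call_type]
```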

What “Load models” actually does

Clicking Load models makes a real HTTP request from your machine to the provider’s /v1/models endpoint (or /api/tags for Ollama). It uses your API key. Two implications:
  1. If the request fails (bad key, wrong endpoint, network issue), you’ll see the exact error inline; that’s diagnostic info, not just “something’s wrong”.
  2. The model list reflects what your account actually has access to right now. If OpenAI rolls out a new model and your key is on the right tier to use it, it appears in the dropdown automatically.
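The request target can be sketched as a per-provider URL rule. This is a simplification: the Claude and OpenAI base URLs shown are the public defaults, and the exact routing inside Runwita may differ.

```python
def models_url(provider, base_url=None):
    """Where 'Load models' sends its listing request (sketch)."""
    if provider == "claude":
        return "https://api.anthropic.com/v1/models"
    if provider == "openai":
        return "https://api.openai.com/v1/models"
    if provider == "ollama":
        # Ollama's native tag listing; base_url defaults to the local service.
        root = (base_url or "http://localhost:11434/v1").removesuffix("/v1")
        return root.rstrip("/") + "/api/tags"
    # Custom: ${base_url}/models, e.g. https://openrouter.ai/api/v1/models
    return base_url.rstrip("/") + "/models"
```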

Switching providers preserves your settings

Each provider has its own API-key slot per tier. So you can configure Claude and OpenAI both, switch back and forth, and your keys are preserved. The active provider is what gets used for extractions; the others sit dormant.
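That per-tier, per-provider key storage can be pictured like this (an illustrative sketch; the structure and key values are assumptions):

```python
# Illustrative sketch: each tier keeps one key slot per provider, so
# switching the active provider never discards the other keys.
keys = {
    "frontier":  {"claude": "sk-ant-placeholder", "openai": "sk-placeholder",
                  "ollama": "", "custom": ""},
    "workhorse": {"claude": "", "openai": "sk-placeholder",
                  "ollama": "", "custom": ""},
}
active = {"frontier": "claude", "workhorse": "openai"}

def switch_provider(tier, provider):
    """Change the active provider; stored keys are untouched."""
    active[tier] = provider

def key_in_use(tier):
    """The key extractions would actually use for this tier."""
    return keys[tier][active[tier]]
```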

What’s next

Connections

TickTick sync, MeetingScribe folder, transcription cleanup.

AI tiers (concept)

What each tier does, when to upgrade which.