BYO LLM code review.
Open source AI code review without vendor lock-in. The first model-agnostic AI code review tool with predictable AI review costs. You bring the model, we never touch your bill. Pay your provider directly with zero markup on inference, switch models in one config line, see every cent the review costs.
BYO LLM code review is when an AI code reviewer runs on a model you own (your account, your API key, your pricing), instead of a model bundled by the review vendor. The vendor charges for orchestration. You pay the model provider directly for inference, with no markup in the middle.
Three things a bundled model decides for you.
SaaS code review vendors that lock the model are deciding three things on your behalf. Each one looks like a feature on the marketing page and shows up as a bill or a migration headache later.
Vendor lock-in on the model contract
The model lives behind the vendor's account. When you outgrow them, switching is not "swap an env var"; it is renegotiating the entire review pipeline. With BYO LLM, the agent is the thing you can swap. The model contract stays where it always was: with the provider.
Cost opacity on inference
Bundled-vendor pricing rolls inference into a per-seat fee. You cannot tell what fraction is the model and what fraction is margin. When the bill grows, you have no levers (cheaper model, smaller context, fewer reviews) because you do not see the math. BYO LLM gives you two invoices and an audit trail per request.
Model lifecycle traps
Provider deprecates the snapshot the vendor bundled. The vendor moves you to the next default without telling you. Your review behavior shifts overnight. With BYO LLM, you pin the snapshot, you run a regression set on the candidate before flipping, and the deprecation is your calendar event, not a surprise.
You bring it. You own it.
Bundled vendors decide which model you use, which version you run, when it gets deprecated, and what the markup is. BYO LLM puts every one of those decisions back on your side of the contract. Four guarantees ship with Kodus by default.
01 ━
You pick the provider
OpenAI, Anthropic, Google, Groq, Together, Fireworks, an internal vLLM, an open-weight model on Ollama. The call goes where you point it. Kodus stays out of the routing.
02 ━
You pin the model version
Lock to claude-sonnet-4-6, gpt-5.1, or any specific release. No silent upgrades. When the provider deprecates a snapshot, you decide when the next one ships, not us.
03 ━
You see the full LLM bill
Inference invoices land on your provider account, line-item per request. No hidden inference cost rolled into a per-seat fee. The bill matches the work and the math is yours to audit.
04 ━
Zero markup on inference
We charge per seat for orchestration. The model spend is between you and the provider, with nothing skimmed in the middle. Same applies to the self-hosted edition (which is free).
Two invoices. One difference.
Bundled-vendor pricing hides what the inference actually costs. BYO LLM splits the bill cleanly: orchestration on one side, inference on the other, both audited by you. Example below uses a 50-developer team on annual rates, running ~30 PRs per developer per month on Claude Sonnet 4.6.
The bundled receipt: inference cost is folded into the per-seat price, so you cannot tell what the model actually costs or what margin sits on top. On annual prepay the 50-developer team pays $2,400 a month ($48/dev); monthly billing pushes that to $3,000 ($60/dev).
The BYO receipt: two invoices, two providers, two line items you can take to finance. The Kodus seat invoice covers orchestration at $8/dev ($400 a month for the 50 developers on annual prepay); the inference invoice comes from Anthropic with your token usage and your spend cap.
Numbers above use public list pricing as of 2026-05 on annual prepay: CodeRabbit Pro at $48/dev/month (or $60/dev/month if billed monthly), Kodus Teams at $8/dev/month (or $10/dev/month if billed monthly), Anthropic Claude Sonnet 4.6 at list passthrough rates. Inference estimate assumes ~3.5k input + ~600 output tokens per PR review across 1,500 reviewed PRs/month. Your bill will vary with model choice, PR size, and rule depth. Talk to us for a real estimate against your repo history.
Run the math on your team.
Three pre-computed scenarios across team size and model class. Pick the one closest to your shape. Numbers use public list pricing on annual prepay for both vendors, with Anthropic passthrough rates for the inference side.
Smaller team, fast triage on Haiku. Inference is so cheap it almost vanishes against seats.
Mid-sized team on Sonnet for balanced reasoning depth. This is the receipt comparison shown above.
Larger org. Same model class, 4x the PR volume. Annual savings buys a senior eng FTE.
All scenarios assume ~30 PRs per developer per month with ~3.5k input + ~600 output tokens per PR review on Anthropic list pricing. CodeRabbit Pro at $48/dev/month annual prepay ($60/dev/month monthly). Kodus Teams at $8/dev/month annual prepay ($10/dev/month monthly). Real numbers will vary with model choice, PR size, and rule depth. Reach out for a quote based on your actual repo history.
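If you want to run the math literally, here is a back-of-the-envelope shell sketch. The seat rates and token volumes come from the fine print above; the $3-per-million-input and $15-per-million-output figures are stand-ins for Sonnet-class list pricing, and the script is illustrative rather than a Kodus tool, so swap in your provider's current rates and your own volumes.

# Rough monthly-cost sketch. Token volumes match the fine print above; the
# per-million-token prices are assumptions standing in for Sonnet-class list
# pricing -- check your provider's current price sheet before trusting the output.
DEVS=50                     # team size
PRS_PER_DEV=30              # reviewed PRs per developer per month
INPUT_TOKENS_PER_PR=3500
OUTPUT_TOKENS_PER_PR=600
KODUS_SEAT=8                # $/dev/month, annual prepay
BUNDLED_SEAT=48             # $/dev/month, annual prepay (CodeRabbit Pro list)
INPUT_PRICE=3               # $ per 1M input tokens (assumed)
OUTPUT_PRICE=15             # $ per 1M output tokens (assumed)

awk -v devs=$DEVS -v prs=$PRS_PER_DEV -v in_t=$INPUT_TOKENS_PER_PR -v out_t=$OUTPUT_TOKENS_PER_PR \
    -v seat=$KODUS_SEAT -v bundled=$BUNDLED_SEAT -v ip=$INPUT_PRICE -v op=$OUTPUT_PRICE 'BEGIN {
  total_prs = devs * prs
  inference = total_prs * (in_t * ip + out_t * op) / 1e6
  printf "bundled vendor : $%.0f/month (seats only, inference hidden inside)\n", devs * bundled
  printf "Kodus seats    : $%.0f/month\n", devs * seat
  printf "inference      : $%.2f/month on your provider invoice\n", inference
}'

With the defaults, the inference line works out to roughly $29 a month next to $400 of Kodus seats and $2,400 of bundled seats, the same shape as the receipt comparison above.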
From PR opened to inline comments. Four stages, the model you brought.
Deterministic pipeline. Real components in the repo, no marketing-ware. Click any stage to read the source.
Trigger from your Git host or the CLI.
Webhooks come in from GitHub, GitLab, Bitbucket, and Azure DevOps. Self-managed flavors work the same way: GitHub Enterprise Server, GitLab Self-Managed, Bitbucket Data Center. Or skip the Git host entirely and trigger reviews from the Kodus CLI in your dev loop or CI.
Kody builds the picture before it writes anything.
A sandbox is provisioned for the review (local on your box, hosted on E2B, or skipped entirely if you don't want one). Kody reads the diff, walks the code structure, and pulls in your Kody Rules and linked tickets. Everything assembled, then it starts looking for problems.
One reviewer by default. Three specialists when you ask for it.
In normal mode Kody runs as a single generalist reviewer that covers logic, security, and performance in one pass, plus your Kody Rules agent if you have rules configured. Switch the review to deep mode and three dedicated specialists go in parallel instead. Same model you brought, same sandbox.
default · logic, security, and performance in one agent
Kody Rules · enforces your team's custom rules and conventions
deep mode · three specialists run in parallel
every agent uses the provider you set in .env
Findings come back on the PR or in the CLI.
When the review starts on a PR, Kody posts line-anchored inline comments and sets approve or request-changes status, right next to the rest of your CI. When it starts from the CLI, the same findings stream back to your terminal as structured output you can pipe, fail builds on, or feed into another tool.
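As a concrete sketch of that piping: the kodus review command, its flags, and the JSON field names below are hypothetical placeholders, not documented CLI syntax; the point is only that structured findings compose with jq and an exit code in CI.

# Hypothetical CI step. `kodus review`, its flags, and the findings[] / severity
# fields are illustrative assumptions, not documented Kodus CLI output.
kodus review --output json > findings.json

# Fail the build when the review reports anything critical.
CRITICAL=$(jq '[.findings[] | select(.severity == "critical")] | length' findings.json)
if [ "$CRITICAL" -gt 0 ]; then
  echo "Blocking merge: $CRITICAL critical finding(s) from the review."
  exit 1
fi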
14+ providers. One config shape.
Run code review with Claude Sonnet 4.6, GPT-5.1, Gemini 2.5 Pro, Llama 3.3, Kimi K2, GLM 4.6, or any OpenAI-compatible endpoint you operate. Three env vars and the agent talks to any of them. Frontier models, speed-tuned inference, open weights. Same code path on every side.
# Same 3 vars for every provider.
# Swap the base URL, swap the model, you are done.
API_OPENAI_FORCE_BASE_URL="https://api.openai.com/v1"
API_OPEN_AI_API_KEY="sk-..."
API_LLM_PROVIDER_MODEL=gpt-5.1

# Anthropic exposes an OpenAI-compatible endpoint.
API_OPENAI_FORCE_BASE_URL="https://api.anthropic.com/v1"
API_OPEN_AI_API_KEY="sk-ant-..."
API_LLM_PROVIDER_MODEL=claude-sonnet-4-6

# Gemini exposes an OpenAI-compatible endpoint.
API_OPENAI_FORCE_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai"
API_OPEN_AI_API_KEY="..."
API_LLM_PROVIDER_MODEL=gemini-2.5-pro

# Same 3 vars. Point at your own gateway.
# vLLM, Ollama, LiteLLM, TGI, any OpenAI-compatible server.
API_OPENAI_FORCE_BASE_URL="http://llm.internal.your-co/v1"
API_OPEN_AI_API_KEY="sk-local-anything"
API_LLM_PROVIDER_MODEL=your-local-model
One config shape. The base URL is the switch. Kody never knows which vendor is on the other end.
Kodus is the BYO-friendly side. Everyone else bundles.
Most AI code review vendors lock the model. The honest exception is PR-Agent, the open source agent Qodo donated to the community in late 2025 (now maintained at the-PR-Agent/pr-agent, distinct from Qodo's commercial Qodo Merge). Below: how Kodus, PR-Agent, and the bundled crowd stack up on the eight things that matter for BYO LLM teams.
Only tool that ships all 8 properties with a free hosted option.
| Capability | Kodus | PR-Agent | Sourcery | CodeRabbit | Greptile |
|---|---|---|---|---|---|
| Models & providers | | | | | |
| Bring your own API key | Yes | Yes | Enterprise only | Bundled | Bundled |
| Multi-provider support | 14+ providers | LiteLLM (many) | Proprietary | Internal routing | Internal routing |
| Pin specific model version | Yes | Yes | No | No | No |
| Local / open-weight models | vLLM, Ollama, TGI | Ollama, vLLM | No | No | No |
| Cost & transparency | | | | | |
| Zero markup on inference | Yes | Yes (OSS) | Bundled | Bundled | Bundled |
| Inference billed separately | Yes | Yes | Opaque | Opaque | Opaque |
| Product & access | | | | | |
| Free hosted Cloud option | Yes | Self-host only | Free for OSS | Trial | Trial |
| Open source agent | AGPLv3 | Apache 2.0 | No | No | No |
For teams whose AI bill should not be a mystery.
Cost-aware engineering orgs, multi-team companies juggling more than one model, and security teams that can't approve a black-box inference pipeline. BYO LLM gives all three the same thing: a model their finance, security, and platform people can actually reason about.
Cost-conscious teams
One bill per provider. No surprises.
The inference invoice comes from Anthropic or OpenAI with token-level detail. Finance can forecast spend by repo, model, or month. Hit a budget cap and your provider stops the bleed, not us.
Model lifecycle control
Upgrade on your schedule, not theirs.
Pin a snapshot the day it ships. Run a regression set against your real PR history before flipping the model env var. Deprecation timelines and behavior changes stop being an unannounced incident.
Multi-model orgs
Different repos, different models.
Backend on Sonnet for cross-file reasoning, frontend on Haiku for cost, an internal vLLM endpoint for the data-platform repo that can't call out. One agent, one config shape, three model routes.
Open-weight commitment
Run the review on a model you can read.
Llama, Mistral, Qwen, GLM, Kimi, DeepSeek. If your governance policy says no closed-weight models in the loop, point Kodus at an internal inference server and the agent never knows the difference.
FAQ
What does BYO LLM actually mean here?
It means the LLM provider is on your side of the contract. Your API key, your account, your usage caps, your invoice. Kodus orchestrates the review and sends the prompts, but the inference call hits your provider directly. There is no Kodus middleman in the request, the billing, or the rate limit. It is open source AI code review without vendor lock-in.
Do you charge anything on top of inference?
No. We charge per seat for the agent, orchestration, web UI, and the rule engine. Inference is billed by your provider at their list price. The self-hosted edition is free under AGPLv3 with the same model setup. The number on your seat invoice and the number on your Anthropic/OpenAI invoice are independent.
Which providers are supported?
OpenAI, Anthropic, Google Gemini, Google Vertex AI, Novita, Groq, Cerebras, Together AI, Fireworks, Chutes, Moonshot / Kimi, Synthetic, and Z.ai / GLM are wired in. Anything else with an OpenAI-compatible API works through API_OPENAI_FORCE_BASE_URL, including an internal vLLM, TGI, LiteLLM, or Ollama instance you operate. Three env vars cover every case.
Can I run the review on a local or open-weight model?
Yes. Point API_OPENAI_FORCE_BASE_URL at your internal inference endpoint (vLLM, TGI, Ollama, LiteLLM, or any OpenAI-compatible server). Set the model name in API_LLM_PROVIDER_MODEL. The agent does not need to know whether Llama, Mistral, GLM, Kimi, DeepSeek, or anything else is on the other end of the wire.
Can I pin a specific model version?
Yes. Use the exact snapshot string the provider exposes: claude-sonnet-4-6, gpt-5.1-2026-04-15, gemini-2.5-pro. Kodus does not pick a model behind your back. When the provider deprecates a snapshot, you decide when the next pin ships.
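A minimal .env sketch of what a pin looks like, using the variable from the config examples above and the dated snapshot string from this answer:

# Pin the exact dated snapshot; nothing changes until you edit this line.
API_LLM_PROVIDER_MODEL=gpt-5.1-2026-04-15
# When the provider announces a deprecation, flip the pin on your own schedule,
# ideally after a regression set passes on the candidate snapshot.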
Can different repos use different models?
Yes. The agent reads model config per repository, so the backend repo can run on Sonnet 4.6 for cross-file reasoning while the frontend repo runs on Haiku 4.5 for cost. The internal data-platform repo can route to a local vLLM endpoint. Same agent binary, three different model paths.
Is there a default model if I do not configure one?
On Cloud, yes: a hosted default kicks in during the trial so you can test without setup. On self-hosted, no, because there is no Kodus account to bill against. You must set API_OPENAI_FORCE_BASE_URL, API_OPEN_AI_API_KEY, and API_LLM_PROVIDER_MODEL in .env before the agent can call out.
What happens when my provider deprecates the model I pinned?
The provider notifies you on their normal cadence. We do not silently swap the model. We maintain a regression test set of historical PRs and recommend running it against the candidate version before flipping the env var. That lets you measure the behavior delta on real diffs instead of trusting marketing copy.
How do I know whether a cheaper model is good enough?
Two ways. Static: each review logs token counts, and the provider invoice gives you cost per call, so you can rebuild the math in BigQuery or your data warehouse. Dynamic: run the same PR through two different model env vars and compare findings. Cheap models triage, expensive models go deep. Use the cheaper one until it misses something that matters.
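A sketch of the dynamic comparison, assuming the CLI honors the same env vars as .env; the kodus review command, its flags, the Haiku model string, and the JSON field names are illustrative assumptions, not documented syntax.

# Hypothetical A/B harness. Only API_LLM_PROVIDER_MODEL is the documented switch;
# the command, flags, model string, and JSON shape below are placeholders.
API_LLM_PROVIDER_MODEL=claude-haiku-4-5  kodus review --pr 1234 --output json > cheap.json
API_LLM_PROVIDER_MODEL=claude-sonnet-4-6 kodus review --pr 1234 --output json > deep.json

# Findings the cheaper model missed (adjust the jq paths to the real output shape).
jq -s '[.[1].findings[].id] - [.[0].findings[].id]' cheap.json deep.json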
How does Kodus compare to PR-Agent?
PR-Agent is the honest peer on BYO LLM mechanics. It is the open source agent Qodo donated to the community in late 2025, now maintained at the-PR-Agent/pr-agent (11k+ stars, active releases through 2026) and distinct from Qodo Merge, Qodo's commercial paid product with bundled models. Both Kodus and PR-Agent bring your key, pin model versions, support local models, and run open source. The difference is on the product side: Kodus ships a free hosted Cloud option (PR-Agent is self-host only), a polished web UI for review history and Kody Rules, and first-class integrations with GitHub Enterprise Server, GitLab Self-Managed, and Bitbucket Data Center. Pick PR-Agent if a CLI-first, OSS-only stance is what you want. Pick Kodus if you want the same BYO LLM mechanics plus a managed surface for the rest of the team.
Pick your model. Run the review.
Three env vars and the agent talks to any OpenAI-compatible provider. Switch in one line. No markup, ever.