BYO LLM code review.
Open source AI code review without vendor lock-in. The first model-agnostic AI code review tool with predictable AI review costs. You bring the model, we never touch your bill. Pay your provider directly with zero markup on inference, switch models in one config line, see every cent the review costs.
BYO LLM code review is when an AI code reviewer runs on a model you own (your account, your API key, your pricing), instead of a model bundled by the review vendor. The vendor charges for orchestration. You pay the model provider directly for inference, with no markup in the middle.
Three things a bundled model decides for you.
SaaS code review vendors that lock the model are deciding three things on your behalf. Each one looks like a feature on the marketing page and shows up as a bill or a migration headache later.
Vendor lock-in on the model contract
The model lives behind the vendor's account. When you outgrow them, switching is not "swap an env var"; it is renegotiating the entire review pipeline. With BYO LLM, the agent is the thing you can swap. The model contract stays where it always was: with the provider.
Cost opacity on inference
Bundled-vendor pricing rolls inference into a per-seat fee. You cannot tell what fraction is the model and what fraction is margin. When the bill grows, you have no levers (cheaper model, smaller context, fewer reviews) because you do not see the math. BYO LLM gives you two invoices and an audit trail per request.
Model lifecycle traps
Provider deprecates the snapshot the vendor bundled. The vendor moves you to the next default without telling you. Your review behavior shifts overnight. With BYO LLM, you pin the snapshot, you run a regression set on the candidate before flipping, and the deprecation is your calendar event, not a surprise.
You bring it. You own it.
Bundled vendors decide which model you use, which version you run, when it gets deprecated, and what the markup is. BYO LLM puts every one of those decisions back on your side of the contract. Four guarantees ship with Kodus by default.
01 ━
You pick the provider
OpenAI, Anthropic, Google, Groq, Together, Fireworks, an internal vLLM, an open-weight model on Ollama. The call goes where you point it. Kodus stays out of the routing.
02 ━
You pin the model version
Lock to claude-sonnet-4-6, gpt-5.1, or any specific release. No silent upgrades. When the provider deprecates a snapshot, you decide when the next one ships, not us.
03 ━
You see the full LLM bill
Inference invoices land on your provider account, line-item per request. No hidden inference cost rolled into a per-seat fee. The bill matches the work and the math is yours to audit.
04 ━
Zero markup on inference
We charge per seat for orchestration. The model spend is between you and the provider, with nothing skimmed in the middle. Same applies to the self-hosted edition (which is free).
Two invoices. One difference.
Bundled-vendor pricing hides what the inference actually costs. BYO LLM splits the bill cleanly: orchestration on one side, inference on the other, both audited by you. Example below uses a 50-developer team on annual rates, running ~30 PRs per developer per month on Claude Sonnet 4.6.
The bundled receipt: inference cost is folded into the per-seat price, so you cannot tell what the model actually costs or what margin sits on top. On annual prepay the 50-developer team pays $2,400 a month ($48/dev); monthly billing pushes that to $3,000 ($60/dev).
The BYO receipt: two invoices, two providers, two line items you can take to finance. The Kodus seat invoice covers orchestration at $8/dev ($400 a month for the 50 developers on annual prepay); the inference invoice comes from Anthropic with your token usage and your spend cap.
Numbers above use public list pricing as of 2026-05 on annual prepay: CodeRabbit Pro at $48/dev/month (or $60/dev/month if billed monthly), Kodus Teams at $8/dev/month (or $10/dev/month if billed monthly), Anthropic Claude Sonnet 4.6 at list passthrough rates. Inference estimate assumes ~3.5k input + ~600 output tokens per PR review across 1,500 reviewed PRs/month. Your bill will vary with model choice, PR size, and rule depth. Talk to us for a real estimate against your repo history.
Run the math on your team.
Three pre-computed scenarios across team size and model class. Pick the one closest to your shape. Numbers use public list pricing on annual prepay for both vendors, with Anthropic passthrough rates for the inference side.
Smaller team, fast triage on Haiku. Inference is so cheap it almost vanishes against seats.
Mid-sized team on Sonnet for balanced reasoning depth. This is the receipt comparison shown above.
Larger org. Same model class, 4x the PR volume. Annual savings buys a senior eng FTE.
All scenarios assume ~30 PRs per developer per month with ~3.5k input + ~600 output tokens per PR review on Anthropic list pricing. CodeRabbit Pro at $48/dev/month annual prepay ($60/dev/month monthly). Kodus Teams at $8/dev/month annual prepay ($10/dev/month monthly). Real numbers will vary with model choice, PR size, and rule depth. Reach out for a quote based on your actual repo history.
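If you want to run the math literally, here is a back-of-the-envelope shell sketch. The seat rates and token volumes come from the fine print above; the $3-per-million-input and $15-per-million-output figures are stand-ins for Sonnet-class list pricing, and the script is illustrative rather than a Kodus tool, so swap in your provider's current rates and your own volumes.

# Rough monthly-cost sketch. Token volumes match the fine print above; the
# per-million-token prices are assumptions standing in for Sonnet-class list
# pricing -- check your provider's current price sheet before trusting the output.
DEVS=50                     # team size
PRS_PER_DEV=30              # reviewed PRs per developer per month
INPUT_TOKENS_PER_PR=3500
OUTPUT_TOKENS_PER_PR=600
KODUS_SEAT=8                # $/dev/month, annual prepay
BUNDLED_SEAT=48             # $/dev/month, annual prepay (CodeRabbit Pro list)
INPUT_PRICE=3               # $ per 1M input tokens (assumed)
OUTPUT_PRICE=15             # $ per 1M output tokens (assumed)

awk -v devs=$DEVS -v prs=$PRS_PER_DEV -v in_t=$INPUT_TOKENS_PER_PR -v out_t=$OUTPUT_TOKENS_PER_PR \
    -v seat=$KODUS_SEAT -v bundled=$BUNDLED_SEAT -v ip=$INPUT_PRICE -v op=$OUTPUT_PRICE 'BEGIN {
  total_prs = devs * prs
  inference = total_prs * (in_t * ip + out_t * op) / 1e6
  printf "bundled vendor : $%.0f/month (seats only, inference hidden inside)\n", devs * bundled
  printf "Kodus seats    : $%.0f/month\n", devs * seat
  printf "inference      : $%.2f/month on your provider invoice\n", inference
}'

With the defaults, the inference line works out to roughly $29 a month next to $400 of Kodus seats and $2,400 of bundled seats, the same shape as the receipt comparison above.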
From PR opened to inline comments. Four stages, the model you brought.
Deterministic pipeline. Real components in the repo, no marketing-ware. Click any stage to read the source.
Trigger from your Git host or the CLI.
Webhooks come in from GitHub, GitLab, Bitbucket, and Azure DevOps. Self-managed flavors work the same way: GitHub Enterprise Server, GitLab Self-Managed, Bitbucket Data Center. Or skip the Git host entirely and trigger reviews from the Kodus CLI in your dev loop or CI.
Kody builds the picture before it writes anything.
A sandbox is provisioned for the review (local on your box, hosted on E2B, or skipped entirely if you don't want one). Kody reads the diff, walks the code structure, and pulls in your Kody Rules and linked tickets. Everything assembled, then it starts looking for problems.
One reviewer by default. Three specialists when you ask for it.
In normal mode Kody runs as a single generalist reviewer that covers logic, security, and performance in one pass, plus your Kody Rules agent if you have rules configured. Switch the review to deep mode and three dedicated specialists go in parallel instead. Same model you brought, same sandbox.
default · logic, security, and performance in one agent
Kody Rules · enforces your team's custom rules and conventions
deep mode · three specialists run in parallel
every agent uses the provider you set in .env
Findings come back on the PR or in the CLI.
When the review starts on a PR, Kody posts line-anchored inline comments and sets approve or request-changes status, right next to the rest of your CI. When it starts from the CLI, the same findings stream back to your terminal as structured output you can pipe, fail builds on, or feed into another tool.
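As a concrete sketch of that piping: the kodus review command, its flags, and the JSON field names below are hypothetical placeholders, not documented CLI syntax; the point is only that structured findings compose with jq and an exit code in CI.

# Hypothetical CI step. `kodus review`, its flags, and the findings[] / severity
# fields are illustrative assumptions, not documented Kodus CLI output.
kodus review --output json > findings.json

# Fail the build when the review reports anything critical.
CRITICAL=$(jq '[.findings[] | select(.severity == "critical")] | length' findings.json)
if [ "$CRITICAL" -gt 0 ]; then
  echo "Blocking merge: $CRITICAL critical finding(s) from the review."
  exit 1
fi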
14+ providers. One config shape.
Run code review with Claude Sonnet 4.6, GPT-5.1, Gemini 2.5 Pro, Llama 3.3, Kimi K2, GLM 4.6, or any OpenAI-compatible endpoint you operate. Three env vars and the agent talks to any of them. Frontier models, speed-tuned inference, open weights. Same code path on every side.
# Same 3 vars for every provider.
# Swap the base URL, swap the model, you are done.
API_OPENAI_FORCE_BASE_URL="https://api.openai.com/v1"
API_OPEN_AI_API_KEY="sk-..."
API_LLM_PROVIDER_MODEL=gpt-5.1

# Anthropic exposes an OpenAI-compatible endpoint.
API_OPENAI_FORCE_BASE_URL="https://api.anthropic.com/v1"
API_OPEN_AI_API_KEY="sk-ant-..."
API_LLM_PROVIDER_MODEL=claude-sonnet-4-6

# Gemini exposes an OpenAI-compatible endpoint.
API_OPENAI_FORCE_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai"
API_OPEN_AI_API_KEY="..."
API_LLM_PROVIDER_MODEL=gemini-2.5-pro

# Same 3 vars. Point at your own gateway.
# vLLM, Ollama, LiteLLM, TGI, any OpenAI-compatible server.
API_OPENAI_FORCE_BASE_URL="http://llm.internal.your-co/v1"
API_OPEN_AI_API_KEY="sk-local-anything"
API_LLM_PROVIDER_MODEL=your-local-model
One config shape. The base URL is the switch. Kody never knows which vendor is on the other end.
Kodus is the BYO-friendly side. Everyone else bundles.
Most AI code review vendors lock the model. The honest exception is PR-Agent, the open source agent Qodo donated to the community in late 2025 (now maintained at the-PR-Agent/pr-agent, distinct from Qodo's commercial Qodo Merge). Below: how Kodus, PR-Agent, and the bundled crowd stack up on the eight things that matter for BYO LLM teams.
Only tool that ships all 8 properties with a free hosted option.
| Capability | Kodus | PR-Agent | Sourcery | CodeRabbit | Greptile |
|---|---|---|---|---|---|
| Models & providers | | | | | |
| Bring your own API key | Yes | Yes | Enterprise only | Bundled | Bundled |
| Multi-provider support | 14+ providers | LiteLLM (many) | Proprietary | Internal routing | Internal routing |
| Pin specific model version | Yes | Yes | No | No | No |
| Local / open-weight models | vLLM, Ollama, TGI | Ollama, vLLM | No | No | No |
| Cost & transparency | | | | | |
| Zero markup on inference | Yes | Yes (OSS) | Bundled | Bundled | Bundled |
| Inference billed separately | Yes | Yes | Opaque | Opaque | Opaque |
| Product & access | | | | | |
| Free hosted Cloud option | Yes | Self-host only | Free for OSS | Trial | Trial |
| Open source agent | AGPLv3 | Apache 2.0 | No | No | No |
For teams whose AI bill should not be a mystery.
Cost-aware engineering orgs, multi-team companies juggling more than one model, and security teams that can't approve a black-box inference pipeline. BYO LLM gives all three the same thing: a model their finance, security, and platform people can actually reason about.
Cost-conscious teams
One bill per provider. No surprises.
The inference invoice comes from Anthropic or OpenAI with token-level detail. Finance can forecast spend by repo, model, or month. Hit a budget cap and your provider stops the bleed, not us.
Model lifecycle control
Upgrade on your schedule, not theirs.
Pin a snapshot the day it ships. Run a regression set against your real PR history before flipping the model env var. Deprecation timelines and behavior changes stop being an unannounced incident.
Multi-model orgs
Different repos, different models.
Backend on Sonnet for cross-file reasoning, frontend on Haiku for cost, an internal vLLM endpoint for the data-platform repo that can't call out. One agent, one config shape, three model routes.
Open-weight commitment
Run the review on a model you can read.
Llama, Mistral, Qwen, GLM, Kimi, DeepSeek. If your governance policy says no closed-weight models in the loop, point Kodus at an internal inference server and the agent never knows the difference.
FAQ
What does BYO LLM actually mean here?
It means the LLM provider is on your side of the contract. Your API key, your account, your usage caps, your invoice. Kodus orchestrates the review and sends the prompts, but the inference call hits your provider directly. There is no Kodus middleman in the request, the billing, or the rate limit. It is open source AI code review without vendor lock-in.
Do you charge anything on top of inference?
No. We charge per seat for the agent, orchestration, web UI, and the rule engine. Inference is billed by your provider at their list price. The self-hosted edition is free under AGPLv3 with the same model setup. The number on your seat invoice and the number on your Anthropic/OpenAI invoice are independent.
Which providers are supported?
OpenAI, Anthropic, Google Gemini, Google Vertex AI, Novita, Groq, Cerebras, Together AI, Fireworks, Chutes, Moonshot / Kimi, Synthetic, and Z.ai / GLM are wired in. Anything else with an OpenAI-compatible API works through API_OPENAI_FORCE_BASE_URL, including an internal vLLM, TGI, LiteLLM, or Ollama instance you operate. Three env vars cover every case.
Can I run the review on a local or open-weight model?
Yes. Point API_OPENAI_FORCE_BASE_URL at your internal inference endpoint (vLLM, TGI, Ollama, LiteLLM, or any OpenAI-compatible server). Set the model name in API_LLM_PROVIDER_MODEL. The agent does not need to know whether Llama, Mistral, GLM, Kimi, DeepSeek, or anything else is on the other end of the wire.
Can I pin a specific model version?
Yes. Use the exact snapshot string the provider exposes: claude-sonnet-4-6, gpt-5.1-2026-04-15, gemini-2.5-pro. Kodus does not pick a model behind your back. When the provider deprecates a snapshot, you decide when the next pin ships.
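A minimal .env sketch of what a pin looks like, using the variable from the config examples above and the dated snapshot string from this answer:

# Pin the exact dated snapshot; nothing changes until you edit this line.
API_LLM_PROVIDER_MODEL=gpt-5.1-2026-04-15
# When the provider announces a deprecation, flip the pin on your own schedule,
# ideally after a regression set passes on the candidate snapshot.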
Can different repos use different models?
Yes. The agent reads model config per repository, so the backend repo can run on Sonnet 4.6 for cross-file reasoning while the frontend repo runs on Haiku 4.5 for cost. The internal data-platform repo can route to a local vLLM endpoint. Same agent binary, three different model paths.
Is there a default model if I do not configure one?
On Cloud, yes: a hosted default kicks in during the trial so you can test without setup. On self-hosted, no, because there is no Kodus account to bill against. You must set API_OPENAI_FORCE_BASE_URL, API_OPEN_AI_API_KEY, and API_LLM_PROVIDER_MODEL in .env before the agent can call out.
What happens when my provider deprecates the model I pinned?
The provider notifies you on their normal cadence. We do not silently swap the model. We maintain a regression test set of historical PRs and recommend running it against the candidate version before flipping the env var. That lets you measure the behavior delta on real diffs instead of trusting marketing copy.
How do I know whether a cheaper model is good enough?
Two ways. Static: each review logs token counts, and the provider invoice gives you cost per call, so you can rebuild the math in BigQuery or your data warehouse. Dynamic: run the same PR through two different model env vars and compare findings. Cheap models triage, expensive models go deep. Use the cheaper one until it misses something that matters.
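A sketch of the dynamic comparison, assuming the CLI honors the same env vars as .env; the kodus review command, its flags, the Haiku model string, and the JSON field names are illustrative assumptions, not documented syntax.

# Hypothetical A/B harness. Only API_LLM_PROVIDER_MODEL is the documented switch;
# the command, flags, model string, and JSON shape below are placeholders.
API_LLM_PROVIDER_MODEL=claude-haiku-4-5  kodus review --pr 1234 --output json > cheap.json
API_LLM_PROVIDER_MODEL=claude-sonnet-4-6 kodus review --pr 1234 --output json > deep.json

# Findings the cheaper model missed (adjust the jq paths to the real output shape).
jq -s '[.[1].findings[].id] - [.[0].findings[].id]' cheap.json deep.json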
How does Kodus compare to PR-Agent?
PR-Agent is the honest peer on BYO LLM mechanics. It is the open source agent Qodo donated to the community in late 2025, now maintained at the-PR-Agent/pr-agent (11k+ stars, active releases through 2026) and distinct from Qodo Merge, Qodo's commercial paid product with bundled models. Both Kodus and PR-Agent bring your key, pin model versions, support local models, and run open source. The difference is on the product side: Kodus ships a free hosted Cloud option (PR-Agent is self-host only), a polished web UI for review history and Kody Rules, and first-class integrations with GitHub Enterprise Server, GitLab Self-Managed, and Bitbucket Data Center. Pick PR-Agent if a CLI-first, OSS-only stance is what you want. Pick Kodus if you want the same BYO LLM mechanics plus a managed surface for the rest of the team.
Pick your model. Run the review.
Three env vars and the agent talks to any OpenAI-compatible provider. Switch in one line. No markup, ever.