State of AI Code Review 2026 · Kodus Research

AI-authored PRs draw 1.6× more review findings than human ones.

Name: State of AI Code Review 2026
Creator: Kodus
License: https://creativecommons.org/licenses/by/4.0/

Per pull request, AI-authored code draws 1.6× more review findings — and breaks 2.1× more of the team's own rules.


Across 22,743 AI PRs · measured from the diff, not benchmarks

Contents

Navigate the Report

01 Implementation · 02 Bugs · 03 Vulnerabilities · 04 Rules · 05 Models · 06 Adoption wave · 07 AI-authored code · 08 Merged anyway · 09 Economics

Implementation

What happens to a delivered suggestion.

33% became code

1.1

Implementation rate over time

The 33% is a blended average over the full window; the monthly rate nearly doubled, from 25% to 48% in eight months.

"Any implementation" by PR-created month · recent months are right-censored (biased low) · June partial, excluded

1.2

Outcome of every suggestion

Across the full window, 1 in 3 becomes code.

n = 180,739

1.3

By finding class

Type barely moves the rate — a 7-point spread.

Class	Delivered	Impl/adapt
Performance	4,713	38.2%
Bugs	100,014	34.5%
Custom rules	68,603	31.3%
Security	7,409	31.2%
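
The blended 33% reported in 1.2 can be recovered from this table as a delivery-weighted average of the four class rates. A minimal sketch of that arithmetic, using only the figures above (the variable and function names are illustrative, not part of the Kodus pipeline):

```python
# Delivered suggestions and implemented/adapted rate per finding class (table 1.3).
CLASSES = {
    "Performance": (4_713, 0.382),
    "Bugs": (100_014, 0.345),
    "Custom rules": (68_603, 0.313),
    "Security": (7_409, 0.312),
}


def blended_rate(classes):
    """Return (total delivered, delivery-weighted implementation rate)."""
    total = sum(delivered for delivered, _ in classes.values())
    implemented = sum(delivered * rate for delivered, rate in classes.values())
    return total, implemented / total


total, rate = blended_rate(CLASSES)
print(total)          # 180739, matching n in 1.2
print(f"{rate:.1%}")  # 33.2%
```

Because bugs and custom rules make up over 90% of delivered suggestions, the blended rate sits between their two rates regardless of how performance and security findings fare.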

1.4

By language

The language sets the ceiling: Rust 60%, TypeScript 34%.

≥ 200 suggestions, ≥ 5 orgs · scaled to leader

1.5

By severity

It runs backwards — critical is fixed least.

1.6

By PR size

Small PRs get ignored most, not large ones.

Bottom line

What becomes code depends more on context — language and PR size — than on what the finding is. And the rate isn't static: it nearly doubled in eight months.

Bugs

The ten classes Kody catches most.

34.5% of bugs fixed

2.1

Top 10 bug classes

#	Class	Severity	What it is
01	Null access on optional fields	high	A nested field that may not exist, often a column added in a migration that older records still hold as null.
02	Race conditions	critical	Two requests for the same resource arrive at once and both think they're first.
03	Schema drift, create vs update	high	Two validators describe the same record but disagree on what's required.
04	Critical logic commented out	critical	A worker, cron, or middleware disabled during a refactor that never came back.
05	Async / await abuse	high	Async called from sync, transactions committed twice, or blocking IO in an async function.
06	Inverted boolean / off-by-one	high	A condition that evaluates the opposite of what was meant, common after an operator flip.
07	Hardcoded where dynamic required	medium	A session ID, model name, or environment value committed as a string literal.
08	Downstream breakage from schema change	high	Dropping a column or renaming a field while other code still expects the old shape.
09	Resource leaks	medium	Memory, file handles, timers, or listeners allocated and never released.
10	Database edge cases	medium	Queries that assume well-formed data: no duplicates, never empty, fits one row.

n = 100,014 across 497 orgs · 34.5% fixed
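
As an illustration of class 01, the sketch below shows a record created before a migration added a nested field, and a guarded read that tolerates the missing value. The record shape and both `billing_city` helpers are hypothetical, chosen only to make the pattern concrete.

```python
from typing import Optional

# Hypothetical records: `old_user` predates a migration that added `profile`,
# so the column is null for it.
old_user = {"id": 1, "profile": None}
new_user = {"id": 2, "profile": {"address": {"city": "Lisbon"}}}


def billing_city_unsafe(user: dict) -> str:
    # Raises TypeError on old_user: None is not subscriptable.
    return user["profile"]["address"]["city"]


def billing_city(user: dict) -> Optional[str]:
    """Walk the nested path, returning None as soon as a level is missing."""
    node = user
    for key in ("profile", "address", "city"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


print(billing_city(new_user))  # Lisbon
print(billing_city(old_user))  # None
```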

Bottom line

The same ten classes recur everywhere — and about a third get fixed. Catching bugs is pattern-matching, not detective work.

Vulnerabilities

The security landscape, by frequency.

#1 SQL injection

3.1

Top 12 vulnerability classes

SQL injection still #1; prompt injection is new in 2026.

#	Class	Severity	What it is
01	SQL injection	critical	Still the most common. User input concatenated straight into SQL strings.
02	Path traversal	critical	Unsanitized URL parameters used to construct file paths.
03	Missing authorization	critical	Endpoints that lost their auth dependency during a refactor.
04	XSS via unescaped attributes	high	User-controlled values written into HTML attributes without escaping.
05	SSRF	high	URL parameters fed straight into outbound HTTP requests.
06	Hardcoded secrets in source	critical	Tokens, API keys, and credentials committed alongside code.
07	Open redirect	medium	Redirect helpers that block obvious schemes but miss protocol-relative URLs.
08	Sensitive data in logs	high	Raw error objects, internal IDs, or full payloads logged unsanitized.
09	Prompt injection (new)	critical	System prompts that concatenate user-controlled strings without delimiters.
10	Default credentials	high	Services deployed with well-known default username/password.
11	Command injection	critical	User input passed unsanitized into shell commands.
12	PostMessage origin not validated	high	Window message listeners that process events from any origin.

n = 7,409 across 352 orgs · 31.2% fixed
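
To make class 01 concrete, here is a small sketch using Python's standard `sqlite3` module and an in-memory database. The table and the attacker string are made up for illustration; the point is only the contrast between string concatenation and a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [("alice", "admin"), ("bob", "dev")]
)

user_input = "nobody' OR '1'='1"  # attacker-controlled value

# Vulnerable: the input is concatenated into the SQL string and rewrites the query.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "'"
leaked = conn.execute(unsafe_sql).fetchall()

# Safer: a bound parameter is treated as data, never as SQL.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(leaked))  # 2, every row matches
print(len(safe))    # 0, no user has that literal name
```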

Bottom line

The classics — SQL injection, leaked secrets — still dominate, but AI pushed a brand-new class into the top tier: prompt injection.

Rules

What teams codify — and how often it lands.

9,916 team rules

4.1

20 most adopted custom rules

By distinct organizations adopting them.

#	Rule	Orgs
1	Write a clear, scoped PR title	55
2	Prohibit hardcoded secrets	41
3	Always sanitize user inputs	36
4	Prevent hardcoded secrets	34
5	Avoid equality operators in loop termination	30
6	Avoid using eval	30
7	Enforce TypeScript strict mode	30
8	Always validate JSON parsing	28
9	Enforce strict TypeScript configuration	28
10	Prevent SQL injection in queries	26

#	Rule	Orgs
11	Avoid async operations in constructors	25
12	Mark unchanged variables as const	25
13	Ensure React list keys are stable	24
14	Do not nest React components	23
15	Enable TypeScript strict mode	22
16	Do not export mutable variables	21
17	Do not ignore exceptions	21
18	Prevent SQL injection via concatenation	21
19	React children not passed as props	20
20	Avoid building commands from user input	19

Distinct orgs, not instances · 9,916 rules by 717 orgs

4.2

Rules vs other signal

Custom rules land on par with bugs.

68,603 rule-driven suggestions

Bottom line

Beyond bugs, teams codify their own standards, nearly 10,000 rules, and those land at nearly the same rate bug fixes do (31.3% vs 34.5%).

Models

Acceptance by the model that ran the review.

41% Gemini, most volume

5.1

By model family

Gemini carries most of the volume (41%). Read the gap as a team signal, not a model verdict.

By the model that ran the review · Gemini 44,324, others 358–980

Bottom line

Switching the reviewer model barely moves the outcome — what drives acceptance is the team, not the model.

The adoption wave

Who's adopting AI review — and how it ships.

×17 more teams / month

6.1

Median PR size, by month

Platform-wide, PR size more than doubled: 73 → 157 lines.

Jan 2025 – May 2026 · all orgs · median lines changed

6.2

Median time to merge, by month

And merge time collapsed: 18.9h → 1.7h.

open → close, merged PRs · median hours

6.3

Composition, not behavior

Hold the same teams fixed and both trends vanish — the curve is who showed up, not how teams ship.

All teams (platform)

PR size	+115%
Time to merge	−91%
Active orgs / mo	~17×

Same 24 teams (balanced)

PR size	×1.13 · 13↑/11↓
Time to merge	×1.05 · 13↑/11↓
Verdict	coin flip

Balanced panel: orgs with ≥ 10 PRs/quarter at both ends · Q3 2025 vs Q2 2026
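
The gap between the two panels is a composition effect: a pooled median can move when the mix of teams changes even if no team changes its own behavior. A toy sketch with made-up numbers (the team names and PR sizes are hypothetical, not drawn from the dataset):

```python
from statistics import median

# Hypothetical PR sizes (lines changed). Each team's sizes are the same in both periods.
TEAM_PRS = {
    "veteran_a": [60, 70, 80],
    "veteran_b": [65, 75, 85],
    "newcomer_c": [150, 160, 170],
    "newcomer_d": [140, 155, 165],
}


def pooled_median(active_teams):
    """Median PR size across all PRs from the teams active in a period."""
    return median(size for team in active_teams for size in TEAM_PRS[team])


early = pooled_median(["veteran_a", "veteran_b"])
late = pooled_median(["veteran_a", "veteran_b", "newcomer_c", "newcomer_d"])

print(early)  # 72.5
print(late)   # 112.5
```

In this toy case the pooled median rises by about 55% while every team's own median is unchanged, which is the pattern a balanced panel is meant to separate from a genuine change in behavior.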

Bottom line

Median PR size more than doubled and merge time fell from 18.9h to 1.7h, but not because existing teams changed. The teams now arriving already ship larger PRs and merge them faster by default. The shift is generational, not behavioral.

AI-authored code

When the author is a machine.

1.6× more findings

7.1

Declared AI-coauthored share

From 0.8% to 30% in eight months — and that's a floor.

Share of all PRs by month opened

7.2

Which assistant wrote it

Claude is behind 85% of declared AI-coauthored PRs.

Claude 85%

Cursor 13%

Copilot 3.6%

Devin / Codex 0.3%

Shares sum > 100% — a few PRs carry more than one assistant's trailer

7.3

AI vs human-only

2.6× larger — and fixed more often.

Metric	AI-coauthored	Human-only
PR size	275	107
Findings/PR	~2.0	~1.3
Implemented	42.3%	31.1%

Median PR 2.6× larger on AI-coauthored code

7.4

By finding class — AI vs human

More findings, fixed more — every class. Widest gap: rules, 2.1×.

Class	Per 100 PRs (AI/human)	Implemented (AI/human)
Bugs	95 / 72	44.2% / 32.5%
Custom rules	95 / 45	40.1% / 28.5%
Security	7.1 / 5.2	42.3% / 29.4%
Performance	4.9 / 3.3	47.7% / 36.1%

Declared via commit trailer · a conservative floor
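
The headline multipliers follow directly from the per-100-PR column. A short sketch of that arithmetic, using only the figures in table 7.4 (variable names are illustrative):

```python
# Findings per 100 PRs as (AI-coauthored, human-only), from table 7.4.
PER_100_PRS = {
    "Bugs": (95, 72),
    "Custom rules": (95, 45),
    "Security": (7.1, 5.2),
    "Performance": (4.9, 3.3),
}

ratios = {name: ai / human for name, (ai, human) in PER_100_PRS.items()}
ai_total = sum(ai for ai, _ in PER_100_PRS.values())
human_total = sum(human for _, human in PER_100_PRS.values())

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"Findings per PR: {ai_total / 100:.1f} vs {human_total / 100:.1f}")
print(f"Overall: {ai_total / human_total:.2f}x")  # Overall: 1.61x
```

Custom rules show the widest per-class ratio (about 2.1×), while the overall 1.6× is the ratio of the summed columns, roughly 2.0 versus 1.3 findings per PR.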

Bottom line

AI already writes a third of the code we see — in bigger PRs that draw more of every kind of finding, broken team rules most of all.

Merged anyway

Critical flags that ship unaddressed.

71.8% shipped open

8.1

Flags shipped unaddressed

71.8% of flagged merged PRs ship with one still open.

71.8% flagged PRs ship with ≥1 open

60% security flags unaddressed

64% critical flags unaddressed

58% / 74% AI vs human-written code

n = 13,609 flagged merged PRs · folds in false positives & accepted risk

Bottom line

Flagging isn't a gate: about 7 in 10 flagged PRs merge with at least one flag still open, and 64% of critical flags go unaddressed. Without something that blocks the merge, review is just advice.

Economics

What it costs to run the reviewer.

$1.50 per PR reviewed

9.1

Volume & latency

Input-heavy — ~14:1, ~2.4 min per review.

14.3B input

Input 14.3B Output 1.0B ~14:1 — re-reads the whole file every call

211,892 LLM calls

~470K input tokens / PR

May 2026 · median call 13.3s, p95 66.5s, full review ~2.4 min

9.2

Cost per PR, by model

Same workload, each model's list price — $0.10 to $3.75 per PR (~38×).

Same May-2026 workload (14.3B in / 1.0B out) at each model's list price, sourced Jun 2026 · ~2.75× for a landed fix
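
The per-PR figure follows from token volume and list price. A rough sketch of that calculation, assuming the ~470K input tokens per PR from 9.1 and the reported ~14:1 input-to-output ratio; the two price points are hypothetical placeholders, not any vendor's actual rates:

```python
INPUT_TOKENS_PER_PR = 470_000                      # ~470K, from 9.1
OUTPUT_TOKENS_PER_PR = INPUT_TOKENS_PER_PR / 14.3  # implied by 14.3B in / 1.0B out


def cost_per_pr(price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one review given prices per million input/output tokens."""
    return (
        INPUT_TOKENS_PER_PR / 1e6 * price_in_per_m
        + OUTPUT_TOKENS_PER_PR / 1e6 * price_out_per_m
    )


# Hypothetical price points (USD per million tokens), for illustration only.
cheap = cost_per_pr(price_in_per_m=0.10, price_out_per_m=0.40)
pricey = cost_per_pr(price_in_per_m=3.00, price_out_per_m=15.00)

print(f"${cheap:.2f} vs ${pricey:.2f} per PR")
print(f"reported spread: {3.75 / 0.10:.1f}x")  # 37.5x, the report's ~38x
```

With input outweighing output roughly 14 to 1, the input price carries most of the bill in this sketch even when output tokens cost several times more each.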

Bottom line

Reviewing a PR is cheap and input-bound — but model choice swings the bill ~38×. The lever is which model, not whether to review.

Appendix

Methodology

Source	Kodus production pipeline — GitHub, GitLab, Bitbucket, Azure DevOps.
Window	PRs created 2025-09-01 onward, through the 2026-06-08 snapshot.
Scope	Bugs, security, performance, custom rules. Deprecated categories excluded.
"Implemented"	Flagged lines changed on a later commit. Measured from the diff — silent ignores counted.
Privacy	No slice under 50 suggestions. No org, repo, or PR identifiers exposed.
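
The "Implemented" definition above amounts to an overlap test between the lines a suggestion flagged and the lines a later commit changed. A minimal sketch of that idea, assuming line ranges are inclusive `(start, end)` pairs in the same file; the function name and data shapes are hypothetical, and the production pipeline may well differ (for example in how it tracks lines that shift between commits):

```python
from typing import Iterable, Tuple

LineRange = Tuple[int, int]  # inclusive (start, end) line numbers


def is_implemented(flagged: LineRange, later_changes: Iterable[LineRange]) -> bool:
    """True if any range changed by a later commit overlaps the flagged range."""
    f_start, f_end = flagged
    return any(start <= f_end and f_start <= end for start, end in later_changes)


# A suggestion flagged lines 40-44; a follow-up commit touched 10-12 and 43-50.
print(is_implemented((40, 44), [(10, 12), (43, 50)]))  # True
# A commit that only touched unrelated lines counts as a silent ignore.
print(is_implemented((40, 44), [(10, 12), (60, 61)]))  # False
```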

The takeaways

1.6× more review findings on AI-authored code

48% of suggestions get fixed, up from 25%

71.8% of flagged PRs merge with a flag still open

30% of PRs now declare AI authorship

Run it on your PRs

See what gets caught in your code.

Open source AI code review that learns your team's rules. This is what it does in production.


State of AI Code Review (2026-Q2) · Research by Kodus · kodus.io/data · CC BY 4.0