State of AI Code Review 2026 · Kodus Research

AI code draws 1.6× more review findings than human code.

Per pull request, AI-authored code draws 1.6× more review findings — and breaks 2.1× more of the team's own rules.

Across 22,743 AI PRs · measured from the diff, not benchmarks

01

Implementation

What happens to a delivered suggestion.

33% became code
1.1

Implementation rate over time

The 33% is a blended average — it has nearly doubled: 25% → 48% in eight months.

"Any implementation" by PR-created month · recent months are right-censored (biased low) · June partial, excluded
1.2

Outcome of every suggestion

Across the full window, 1 in 3 becomes code.

n = 180,739
1.3

By finding class

Type barely moves the rate — a 7-point spread.

Class | Delivered | Implemented/adapted
Performance | 4,713 | 38.2%
Bugs | 100,014 | 34.5%
Custom rules | 68,603 | 31.3%
Security | 7,409 | 31.2%
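The class table also shows why the overall rate is a blend. Weighting each class's implementation rate by its delivered count reproduces the one-in-three figure from panel 1.2, and the four delivered counts sum exactly to its n = 180,739. A quick check using only the numbers in the table:

```python
# Delivered count and implemented/adapted rate per class, from the table above.
by_class = {
    "Performance": (4_713, 0.382),
    "Bugs": (100_014, 0.345),
    "Custom rules": (68_603, 0.313),
    "Security": (7_409, 0.312),
}

delivered = sum(n for n, _ in by_class.values())
implemented = sum(n * rate for n, rate in by_class.values())
blended = implemented / delivered  # delivery-weighted average of the class rates

print(delivered)          # 180739
print(round(blended, 3))  # 0.332, about 1 in 3
```

Because Bugs and Custom rules make up over 90% of deliveries, the blend sits close to their rates; the smaller Performance class barely moves it.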
1.4

By language

The language sets the ceiling: Rust 60%, TypeScript 34%.

≥ 200 suggestions, ≥ 5 orgs · scaled to leader
1.5

By severity

It runs backwards — critical is fixed least.

1.6

By PR size

Small PRs get ignored most, not large ones.

Bottom line

What becomes code depends more on context — language and PR size — than on what the finding is. And the rate isn't static: it nearly doubled in eight months.
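Panel 1.1's caption warns that recent months are right-censored: a suggestion delivered just before the snapshot has had less time to be implemented than one from months earlier, so the latest rates read low. One common correction, sketched here as an illustration and not necessarily Kodus's method, is a fixed follow-up horizon: count only fixes that land within N days of delivery, and only for suggestions delivered at least N days before the snapshot. The field names delivered and implemented are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def rate_within_horizon(suggestions: list[dict], horizon_days: int, snapshot: date) -> Optional[float]:
    """Implementation rate with a fixed follow-up window.

    Each suggestion is a dict with hypothetical keys:
      "delivered":   date the suggestion was posted
      "implemented": date the flagged lines changed, or None
    """
    window = timedelta(days=horizon_days)
    # Only suggestions old enough to have had the full window are eligible.
    eligible = [s for s in suggestions if s["delivered"] + window <= snapshot]
    if not eligible:
        return None
    hits = sum(
        1
        for s in eligible
        if s["implemented"] is not None and s["implemented"] - s["delivered"] <= window
    )
    return hits / len(eligible)


snapshot = date(2026, 6, 8)
sample = [
    {"delivered": date(2026, 1, 1), "implemented": date(2026, 1, 5)},  # fixed in 4 days
    {"delivered": date(2026, 1, 1), "implemented": date(2026, 3, 1)},  # fixed, but after 30 days
    {"delivered": date(2026, 2, 1), "implemented": None},              # never fixed
    {"delivered": date(2026, 6, 1), "implemented": date(2026, 6, 2)},  # too recent: excluded
]
print(rate_within_horizon(sample, horizon_days=30, snapshot=snapshot))  # prints 0.3333333333333333
```

The trade-off is visible in the last sample row: the most recent N days drop out entirely, in exchange for months that can be compared like for like.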

02

Bugs

The ten classes Kody catches most.

34.5% of bugs fixed
2.1

Top 10 bug classes

# | Class | Severity | What it is
01 | Null access on optional fields | high | A nested field that may not exist, often a column added in a migration that older records still hold as null.
02 | Race conditions | critical | Two requests for the same resource arrive at once and both think they're first.
03 | Schema drift, create vs update | high | Two validators describe the same record but disagree on what's required.
04 | Critical logic commented out | critical | A worker, cron job, or middleware disabled during a refactor and never re-enabled.
05 | Async / await abuse | high | Async functions called from sync code, transactions committed twice, or blocking I/O inside an async function.
06 | Inverted boolean / off-by-one | high | A condition that evaluates to the opposite of what was meant, common after an operator flip.
07 | Hardcoded where dynamic required | medium | A session ID, model name, or environment value committed as a string literal.
08 | Downstream breakage from schema change | high | Dropping a column or renaming a field while other code still expects the old shape.
09 | Resource leaks | medium | Memory, file handles, timers, or listeners allocated and never released.
10 | Database edge cases | medium | Queries that assume well-formed data: no duplicates, never empty, fits one row.
n = 100,014 across 497 orgs · 34.5% fixed
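Class 02 is the classic check-then-act race. The sketch below uses Python's built-in sqlite3 module and a hypothetical claims table to replay the interleaving deterministically: both requests run their existence check before either one writes, so both believe they are first. A uniqueness constraint in the database is one standard fix, because the second write then fails instead of silently creating a duplicate.

```python
import sqlite3


def make_db(unique: bool) -> sqlite3.Connection:
    """In-memory table of resource claims; `unique` adds a UNIQUE constraint."""
    conn = sqlite3.connect(":memory:")
    column = "resource TEXT UNIQUE" if unique else "resource TEXT"
    conn.execute(f"CREATE TABLE claims ({column}, owner TEXT)")
    return conn


def is_claimed(conn: sqlite3.Connection, resource: str) -> bool:
    row = conn.execute("SELECT 1 FROM claims WHERE resource = ?", (resource,)).fetchone()
    return row is not None


def replay_race(conn: sqlite3.Connection, resource: str) -> list[str]:
    """Replay two requests whose checks both run before either insert."""
    # Both checks happen first, so both requests see an unclaimed resource.
    checks = {owner: not is_claimed(conn, resource) for owner in ("a", "b")}
    winners = []
    for owner, thinks_first in checks.items():
        if not thinks_first:
            continue
        try:
            conn.execute("INSERT INTO claims VALUES (?, ?)", (resource, owner))
            winners.append(owner)
        except sqlite3.IntegrityError:
            pass  # the UNIQUE constraint rejected the duplicate claim


    return winners


print(replay_race(make_db(unique=False), "invoice-42"))  # ['a', 'b']: both "won"
print(replay_race(make_db(unique=True), "invoice-42"))   # ['a']
```

The check alone can never be made safe in application code; the guarantee has to live where the write happens.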
Bottom line

The same ten classes recur everywhere — and about a third get fixed. Catching bugs is pattern-matching, not detective work.

03

Vulnerabilities

The security landscape, by frequency.

#1 SQL injection
3.1

Top 12 vulnerability classes

SQL injection still #1; prompt injection is new in 2026.

# | Class | Severity | What it is
01 | SQL injection | critical | Still the most common. User input concatenated straight into SQL strings.
02 | Path traversal | critical | Unsanitized URL parameters used to construct file paths.
03 | Missing authorization | critical | Endpoints that lost their auth dependency during a refactor.
04 | XSS via unescaped attributes | high | User-controlled values written into HTML attributes without escaping.
05 | SSRF | high | URL parameters fed straight into outbound HTTP requests.
06 | Hardcoded secrets in source | critical | Tokens, API keys, and credentials committed alongside code.
07 | Open redirect | medium | Redirect helpers that block obvious schemes but miss protocol-relative URLs.
08 | Sensitive data in logs | high | Raw error objects, internal IDs, or full payloads logged unsanitized.
09 | Prompt injection (new) | critical | System prompts that concatenate user-controlled strings without delimiters.
10 | Default credentials | high | Services deployed with well-known default username/password.
11 | Command injection | critical | User input passed unsanitized into shell commands.
12 | PostMessage origin not validated | high | Window message listeners that process events from any origin.
n = 7,409 across 352 orgs · 31.2% fixed
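Class 01 is easiest to see side by side. The sketch below uses Python's built-in sqlite3 module with a hypothetical users table: concatenating input lets a crafted value rewrite the WHERE clause, while a parameterized query binds the same value as plain data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "admin"), ("bob", "user")])


def find_user_unsafe(name: str) -> list[tuple]:
    # Vulnerable: the input becomes part of the SQL text itself.
    return conn.execute("SELECT name FROM users WHERE name = '" + name + "'").fetchall()


def find_user_safe(name: str) -> list[tuple]:
    # Parameterized: the driver binds `name` as a value, never as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


payload = "x' OR '1'='1"
print(find_user_unsafe(payload))  # both rows, alice and bob: the filter is bypassed
print(find_user_safe(payload))    # []
```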
Bottom line

The classics — SQL injection, leaked secrets — still dominate, but AI pushed a brand-new class into the top tier: prompt injection.

04

Rules

What teams codify — and how often it lands.

9,916 team rules
4.1

20 most adopted custom rules

By distinct organizations adopting them.

# | Rule | Orgs
1 | Write a clear, scoped PR title | 55
2 | Prohibit hardcoded secrets | 41
3 | Always sanitize user inputs | 36
4 | Prevent hardcoded secrets | 34
5 | Avoid equality operators in loop termination | 30
6 | Avoid using eval | 30
7 | Enforce TypeScript strict mode | 30
8 | Always validate JSON parsing | 28
9 | Enforce strict TypeScript configuration | 28
10 | Prevent SQL injection in queries | 26
11 | Avoid async operations in constructors | 25
12 | Mark unchanged variables as const | 25
13 | Ensure React list keys are stable | 24
14 | Do not nest React components | 23
15 | Enable TypeScript strict mode | 22
16 | Do not export mutable variables | 21
17 | Do not ignore exceptions | 21
18 | Prevent SQL injection via concatenation | 21
19 | React children not passed as props | 20
20 | Avoid building commands from user input | 19
Distinct orgs, not instances · 9,916 rules by 717 orgs
4.2

Rules vs other signal

Custom rules land on par with bugs.

68,603 rule-driven suggestions
Bottom line

Beyond bugs, teams codify their own standards (nearly 10,000 rules), and those land at roughly the same rate as bug fixes: 31.3% vs 34.5%.

05

Models

Acceptance by the model that ran the review.

41% Gemini, most volume
5.1

By model family

Gemini carries most of the volume (41%). Read the gap as a team signal, not a model verdict.

By the model that ran the review · Gemini 44,324, others 358–980
Bottom line

Switching the reviewer model barely moves the outcome — what drives acceptance is the team, not the model.

06

The adoption wave

Who's adopting AI review — and how it ships.

~17× more active teams / month
6.1

Median PR size, by month

Platform-wide, PR size more than doubled: 73 → 157 lines.

Jan 2025 – May 2026 · all orgs · median lines changed
6.2

Median time to merge, by month

And merge time collapsed: 18.9h → 1.7h.

open → close, merged PRs · median hours
6.3

Composition, not behavior

Hold the same teams fixed and both trends vanish — the curve is who showed up, not how teams ship.

All teams (platform): PR size +115% · time to merge −91% · active orgs/month ~17×
Same 24 teams (balanced): PR size ×1.13 (13↑/11↓) · time to merge ×1.05 (13↑/11↓) · verdict: coin flip
Balanced panel: orgs with ≥ 10 PRs/quarter at both ends · Q3 2025 vs Q2 2026
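The balanced-panel comparison can be sketched in a few lines. The filter follows the caption (at least 10 PRs per quarter at both ends); the per-org ratio of medians and the exact two-sided sign test used to read 13↑/11↓ as a coin flip are our illustrative choices, not necessarily Kodus's method, and the {org: {quarter: [values]}} input shape is hypothetical.

```python
from math import comb
from statistics import median


def balanced_panel(values_by_org: dict, start: str, end: str, min_prs: int = 10) -> dict:
    """Keep only orgs with at least `min_prs` PRs in both the start and end quarter.

    values_by_org: {org: {quarter: [per-PR values, e.g. lines changed]}} (hypothetical shape)
    """
    return {
        org: (quarters[start], quarters[end])
        for org, quarters in values_by_org.items()
        if len(quarters.get(start, [])) >= min_prs and len(quarters.get(end, [])) >= min_prs
    }


def median_ratios(panel: dict) -> dict:
    """Per-org ratio of the end-quarter median to the start-quarter median."""
    return {org: median(after) / median(before) for org, (before, after) in panel.items()}


def sign_test_p(ups: int, downs: int) -> float:
    """Exact two-sided sign test: how surprising is this split if up and down are 50/50?"""
    n = ups + downs
    k = max(ups, downs)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * tail)


print(round(sign_test_p(13, 11), 2))  # 0.84: indistinguishable from a coin flip
```

A p-value this large means a 13/11 split is exactly what unchanged teams would produce by chance, which is why the panel reads the platform-wide trends as composition rather than behavior.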
Bottom line

Median PR size doubled and merge time fell from 18.9h to 1.7h, but not because existing teams changed. A new generation of teams is arriving that already ships bigger PRs and merges them fast by default. The shift is generational, not behavioral.

07

AI-authored code

When the author is a machine.

1.6× more findings
7.1

Declared AI-coauthored share

From 0.8% to 30% in eight months — and that's a floor.

Share of all PRs by month opened
7.2

Which assistant wrote it

Claude is behind 85% of declared AI-coauthored PRs.

Claude 85%
Cursor 13%
Copilot 3.6%
Devin / Codex 0.3%
Shares sum > 100% — a few PRs carry more than one assistant's trailer
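Declared AI authorship comes from commit trailers, and counting it per assistant explains why the shares above can exceed 100%: a PR whose commits carry two assistants' trailers counts once for each. The sketch below assumes Co-authored-by trailers and a hypothetical keyword map; real trailer text varies by tool and version, and undeclared AI help is invisible to this method, which is why the share is a floor.

```python
import re

# Hypothetical keyword -> assistant map; real trailer names vary by tool and version.
ASSISTANTS = {
    "claude": "Claude",
    "cursor": "Cursor",
    "copilot": "Copilot",
    "devin": "Devin / Codex",
    "codex": "Devin / Codex",
}

# One match per "Co-authored-by: Name <email>" line, case-insensitive.
TRAILER = re.compile(r"^co-authored-by:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def declared_assistants(commit_messages: list[str]) -> set[str]:
    """Assistants named in any Co-authored-by trailer across a PR's commits."""
    found = set()
    for message in commit_messages:
        for coauthor in TRAILER.findall(message):
            for keyword, assistant in ASSISTANTS.items():
                # Word boundaries avoid matching, say, a human named "Claudette".
                if re.search(rf"\b{keyword}\b", coauthor, re.IGNORECASE):
                    found.add(assistant)
    return found


pr_commits = [
    "Add retry logic\n\nCo-Authored-By: Claude <noreply@example.com>",
    "Fix lint\n\nCo-authored-by: Cursor Agent <agent@example.com>",
]
print(sorted(declared_assistants(pr_commits)))  # ['Claude', 'Cursor']
```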
7.3

AI vs human-only

2.6× larger — and fixed more often.

Metric | AI | Human
PR size (median lines) | 275 | 107
Findings/PR | ~2.0 | ~1.3
Implemented | 42.3% | 31.1%
Median PR 2.6× larger on AI-coauthored code
7.4

By finding class — AI vs human

More findings, fixed more — every class. Widest gap: rules, 2.1×.

Class | Per 100 PRs (AI / human) | Implemented (AI / human)
Bugs | 95 / 72 | 44.2% / 32.5%
Custom rules | 95 / 45 | 40.1% / 28.5%
Security | 7.1 / 5.2 | 42.3% / 29.4%
Performance | 4.9 / 3.3 | 47.7% / 36.1%
Declared via commit trailer · a conservative floor
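The headline multipliers follow from this table. Assuming each finding belongs to exactly one class, summing the per-100-PR counts gives findings per PR (about 2.0 for AI, 1.3 for human), and their ratio is the 1.6×; the per-class ratios put custom rules at the widest gap. Bugs alone come out near 1.3×.

```python
# Findings per 100 PRs, (AI, human), from the table above.
per_100 = {
    "Bugs": (95, 72),
    "Custom rules": (95, 45),
    "Security": (7.1, 5.2),
    "Performance": (4.9, 3.3),
}

ai_per_pr = sum(ai for ai, _ in per_100.values()) / 100     # about 2.02
human_per_pr = sum(hu for _, hu in per_100.values()) / 100  # 1.255
overall = ai_per_pr / human_per_pr

by_class = {cls: ai / hu for cls, (ai, hu) in per_100.items()}
widest = max(by_class, key=by_class.get)

print(round(overall, 1))                   # 1.6
print(widest, round(by_class[widest], 1))  # Custom rules 2.1
print(round(by_class["Bugs"], 1))          # 1.3
```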
Bottom line

AI already co-authors nearly a third of the PRs we see, and that's a floor. Those PRs are bigger and draw more of every kind of finding, broken team rules most of all.

08

Merged anyway

Critical flags that ship unaddressed.

71.8% shipped open
8.1

Flags shipped unaddressed

71.8% of flagged merged PRs ship with at least one flag still open.

71.8% of flagged PRs ship with ≥1 flag open
60% of security flags unaddressed
64% of critical flags unaddressed
58% / 74% AI vs human-written code
n = 13,609 flagged merged PRs · folds in false positives & accepted risk
Bottom line

Flagging isn't a gate: 7 in 10 flagged PRs merge with a flag still open, and 64% of critical flags go unaddressed. Without something that blocks the merge, review is just advice.
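The 71.8% is a share of flagged merged PRs, not of flags. The sketch below computes that metric next to a minimal merge gate of the kind the bottom line calls for; the Flag type, its fields, and the choice to block only on critical severity are hypothetical, and in practice such a gate would typically run as a required CI check.

```python
from dataclasses import dataclass


@dataclass
class Flag:
    severity: str   # e.g. "critical", "high", "medium" (hypothetical field)
    resolved: bool  # True once the flagged lines changed (hypothetical field)


def shipped_open_rate(merged_prs: list[list[Flag]]) -> float:
    """Share of flagged merged PRs that still had at least one open flag."""
    flagged = [flags for flags in merged_prs if flags]
    if not flagged:
        return 0.0
    return sum(any(not f.resolved for f in flags) for flags in flagged) / len(flagged)


def merge_allowed(flags: list[Flag], blocking: tuple[str, ...] = ("critical",)) -> bool:
    """Minimal gate: refuse the merge while any blocking-severity flag is open."""
    return not any(f.severity in blocking and not f.resolved for f in flags)


prs = [
    [Flag("critical", False)],                      # shipped with a critical flag open
    [Flag("high", True)],                           # every flag addressed
    [],                                             # never flagged: excluded from the rate
    [Flag("high", False), Flag("critical", True)],  # a high flag still open
]
print(round(shipped_open_rate(prs), 3))              # 0.667
print(merge_allowed(prs[0]), merge_allowed(prs[3]))  # False True
```

Blocking only on critical keeps the gate narrow; widening `blocking` trades merge speed for fewer open flags at ship time.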

09

Economics

What it costs to run the reviewer.

$1.50 per PR reviewed
9.1

Volume & latency

Input-heavy — ~14:1, ~2.4 min per review.

Input 14.3B tokens · output 1.0B tokens · ~14:1, because the reviewer re-reads the whole file every call
211,892 LLM calls
~470K input tokens / PR
May 2026 · median call 13.3s, p95 66.5s, full review ~2.4 min
9.2

Cost per PR, by model

Same workload, each model's list price — $0.10 to $3.75 per PR (~38×).

Same May-2026 workload (14.3B in / 1.0B out) at each model's list price, sourced Jun 2026 · ~2.75× for a landed fix
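Per-PR cost is token volume times list price. Because input outweighs output roughly 14:1, input price dominates the bill unless a model charges more than about 14× as much per output token as per input token. The prices below are hypothetical placeholders, not any vendor's actual list price; the input count is the ~470K per PR from panel 9.1, and the output count is derived from the 14.3B / 1.0B ratio.

```python
INPUT_TOKENS_PER_PR = 470_000                      # ~470K, from panel 9.1
OUTPUT_TOKENS_PER_PR = INPUT_TOKENS_PER_PR / 14.3  # derived from 14.3B in / 1.0B out


def cost_per_pr(usd_per_m_input: float, usd_per_m_output: float,
                input_tokens: float = INPUT_TOKENS_PER_PR,
                output_tokens: float = OUTPUT_TOKENS_PER_PR) -> float:
    """List-price cost of one review: tokens / 1M * price per 1M tokens."""
    return input_tokens / 1e6 * usd_per_m_input + output_tokens / 1e6 * usd_per_m_output


# Hypothetical (input, output) prices in USD per million tokens.
budget = cost_per_pr(0.10, 0.40)
premium = cost_per_pr(3.00, 15.00)
input_share = cost_per_pr(3.00, 0.0) / premium  # input's share of the premium bill

print(f"${budget:.2f} vs ${premium:.2f} per PR, {premium / budget:.0f}x apart")
print(f"input is {input_share:.0%} of the premium bill")
```

Even with output priced 5× input in this example, input tokens account for roughly three quarters of the bill, which is why the report calls the workload input-bound.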
Bottom line

Reviewing a PR is cheap and input-bound — but model choice swings the bill ~38×. The lever is which model, not whether to review.

Appendix

Methodology

Source: Kodus production pipeline (GitHub, GitLab, Bitbucket, Azure DevOps).
Window: PRs created from 2025-09-01 through the 2026-06-08 snapshot.
Scope: Bugs, security, performance, custom rules. Deprecated categories excluded.
"Implemented": Flagged lines changed in a later commit. Measured from the diff, so silently ignored suggestions count as not implemented.
Privacy: No slice under 50 suggestions. No org, repo, or PR identifiers exposed.
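The "Implemented" definition reduces to an overlap test between the flagged line range and the lines a later commit changed. A minimal sketch, with hypothetical function names and inclusive (start, end) line ranges; a production pipeline would also need to carry the flagged range forward through intermediate commits, since line numbers shift as code is edited.

```python
Range = tuple[int, int]  # inclusive (start_line, end_line)


def overlaps(a: Range, b: Range) -> bool:
    """True if two inclusive line ranges share at least one line."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_implemented(flagged: Range, later_changes: list[Range]) -> bool:
    """A suggestion counts as implemented if any later change touches a flagged line.

    Suggestions never touched count as not implemented, so silent ignores
    lower the rate instead of dropping out of it.
    """
    return any(overlaps(flagged, change) for change in later_changes)


print(is_implemented((10, 12), [(1, 3), (12, 20)]))  # True: line 12 changed
print(is_implemented((10, 12), [(13, 15)]))          # False: nearby edits, not on it
```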
The takeaways
1.6× more review findings on AI-authored code
48% of suggestions now get fixed, up from 25%
71.8% of flagged PRs merge with a flag still open
30% of PRs now declare AI authorship
Run it on your PRs

See what gets caught in your code.

Open source AI code review that learns your team's rules. This is what it does in production.

State of AI Code Review (2026-Q2) · Research by Kodus · kodus.io/data · CC BY 4.0