AI-authored code draws 1.6× more review findings than human code.
Per pull request, that is 1.3× more bug findings and 2.1× more violations of the team's own rules.
Implementation
What happens to a delivered suggestion.
Implementation rate over time
The 33% is a blended average — it has nearly doubled: 25% → 48% in eight months.
Outcome of every suggestion
Across the full window, 1 in 3 becomes code.
By finding class
Type barely moves the rate — a 7-point spread.
| Class | Delivered | Implemented or adapted |
|---|---|---|
| Performance | 4,713 | 38.2% |
| Bugs | 100,014 | 34.5% |
| Custom rules | 68,603 | 31.3% |
| Security | 7,409 | 31.2% |
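The blended 33% and the 7-point spread can both be re-derived from the table above. The sketch below does that arithmetic, assuming the four classes listed account for all delivered suggestions in scope; the function and variable names are illustrative and not part of the report.

```python
# Delivered counts and implemented-or-adapted rates copied from the table above.
BY_CLASS = {
    "Performance": (4_713, 38.2),
    "Bugs": (100_014, 34.5),
    "Custom rules": (68_603, 31.3),
    "Security": (7_409, 31.2),
}


def blended_rate(by_class):
    """Volume-weighted implementation rate across classes, in percent."""
    total = sum(delivered for delivered, _ in by_class.values())
    implemented = sum(delivered * rate / 100 for delivered, rate in by_class.values())
    return 100 * implemented / total


def spread(by_class):
    """Gap, in percentage points, between the highest and lowest class rate."""
    rates = [rate for _, rate in by_class.values()]
    return max(rates) - min(rates)


if __name__ == "__main__":
    print(f"blended: {blended_rate(BY_CLASS):.1f}%")
    print(f"spread: {spread(BY_CLASS):.1f} points")
```

Because bugs and custom rules make up most of the volume, the blended figure sits close to their rates rather than to the higher performance rate.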
By language
The language sets the ceiling: Rust 60%, TypeScript 34%.
By severity
It runs backwards — critical is fixed least.
By PR size
Small PRs get ignored most, not large ones.
What becomes code depends more on context — language and PR size — than on what the finding is. And the rate isn't static: it nearly doubled in eight months.
Bugs
The ten classes Kody catches most.
Top 10 bug classes
| # | Class | Severity | What it is |
|---|---|---|---|
| 01 | Null access on optional fields | high | A nested field that may not exist — often a column added in a migration older records still have null. |
| 02 | Race conditions | critical | Two requests for the same resource arrive at once and both think they're first. |
| 03 | Schema drift, create vs update | high | Two validators describe the same record but disagree on what's required. |
| 04 | Critical logic commented out | critical | A worker, cron, or middleware disabled during a refactor that never came back. |
| 05 | Async / await abuse | high | Async called from sync, transactions committed twice, or blocking IO in an async function. |
| 06 | Inverted boolean / off-by-one | high | A condition that evaluates the opposite of what was meant, common after an operator flip. |
| 07 | Hardcoded where dynamic required | medium | A session ID, model name, or environment value committed as a string literal. |
| 08 | Downstream breakage from schema change | high | Dropping a column or renaming a field while other code still expects the old shape. |
| 09 | Resource leaks | medium | Memory, file handles, timers, or listeners allocated and never released. |
| 10 | Database edge cases | medium | Queries that assume well-formed data: no duplicates, never empty, fits one row. |
The same ten classes recur everywhere — and about a third get fixed. Catching bugs is pattern-matching, not detective work.
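As one illustration of class 02, the sketch below shows a check-then-act claim made safe by holding a lock across both the check and the write. It is a minimal in-memory example under stated assumptions: `ReservationStore`, `claim`, and `simulate` are hypothetical names, and a real service would more likely rely on a database uniqueness constraint or a transaction than on a process-local lock.

```python
import threading


class ReservationStore:
    """Hypothetical in-memory stand-in for a table keyed by resource id."""

    def __init__(self):
        self._owners = {}
        self._lock = threading.Lock()

    def claim(self, resource_id, requester):
        # The check and the write happen under one lock, so two concurrent
        # callers cannot both observe the resource as free.
        with self._lock:
            if resource_id in self._owners:
                return False
            self._owners[resource_id] = requester
            return True


def simulate(n_requests=50):
    """Fire n_requests concurrent claims at the same resource and collect results."""
    store = ReservationStore()
    results = []
    results_lock = threading.Lock()

    def worker(i):
        won = store.claim("seat-1", f"req-{i}")
        with results_lock:
            results.append(won)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    outcomes = simulate()
    print(f"{sum(outcomes)} winner out of {len(outcomes)} requests")
```

Without the lock, the membership check and the assignment are separate steps, which is the window in which both requests can think they are first.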
Vulnerabilities
The security landscape, by frequency.
Top 12 vulnerability classes
SQL injection still #1; prompt injection is new in 2026.
| # | Class | Severity | What it is |
|---|---|---|---|
| 01 | SQL injection | critical | Still the most common. User input concatenated straight into SQL strings. |
| 02 | Path traversal | critical | Unsanitized URL parameters used to construct file paths. |
| 03 | Missing authorization | critical | Endpoints that lost their auth dependency during a refactor. |
| 04 | XSS via unescaped attributes | high | User-controlled values written into HTML attributes without escaping. |
| 05 | SSRF | high | URL parameters fed straight into outbound HTTP requests. |
| 06 | Hardcoded secrets in source | critical | Tokens, API keys, and credentials committed alongside code. |
| 07 | Open redirect | medium | Redirect helpers that block obvious schemes but miss protocol-relative URLs. |
| 08 | Sensitive data in logs | high | Raw error objects, internal IDs, or full payloads logged unsanitized. |
| 09 | Prompt injection (new) | critical | System prompts that concatenate user-controlled strings without delimiters. |
| 10 | Default credentials | high | Services deployed with well-known default username/password. |
| 11 | Command injection | critical | User input passed unsanitized into shell commands. |
| 12 | PostMessage origin not validated | high | Window message listeners that process events from any origin. |
The classics — SQL injection, leaked secrets — still dominate, but AI pushed a brand-new class into the top tier: prompt injection.
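To make the top entry concrete, the sketch below contrasts string concatenation with a parameterized query using Python's standard `sqlite3` module. The `users` table, the function names, and the payload are assumptions made up for the illustration; they do not come from the report's data.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: user input is concatenated straight into the SQL string.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the driver passes the value separately from the SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def demo():
    """Run the same injection payload through both versions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    payload = "nobody' OR '1'='1"
    return find_user_unsafe(conn, payload), find_user_safe(conn, payload)


if __name__ == "__main__":
    unsafe_rows, safe_rows = demo()
    print("unsafe returned", len(unsafe_rows), "rows")
    print("safe returned", len(safe_rows), "rows")
```

In the unsafe version the payload rewrites the `WHERE` clause so it matches every row; in the safe version the same string is treated as a literal name that matches nothing.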
Rules
What teams codify — and how often it lands.
20 most adopted custom rules
By distinct organizations adopting them.
| # | Rule | Orgs |
|---|---|---|
| 1 | Write a clear, scoped PR title | 55 |
| 2 | Prohibit hardcoded secrets | 41 |
| 3 | Always sanitize user inputs | 36 |
| 4 | Prevent hardcoded secrets | 34 |
| 5 | Avoid equality operators in loop termination | 30 |
| 6 | Avoid using eval | 30 |
| 7 | Enforce TypeScript strict mode | 30 |
| 8 | Always validate JSON parsing | 28 |
| 9 | Enforce strict TypeScript configuration | 28 |
| 10 | Prevent SQL injection in queries | 26 |
| 11 | Avoid async operations in constructors | 25 |
| 12 | Mark unchanged variables as const | 25 |
| 13 | Ensure React list keys are stable | 24 |
| 14 | Do not nest React components | 23 |
| 15 | Enable TypeScript strict mode | 22 |
| 16 | Do not export mutable variables | 21 |
| 17 | Do not ignore exceptions | 21 |
| 18 | Prevent SQL injection via concatenation | 21 |
| 19 | React children not passed as props | 20 |
| 20 | Avoid building commands from user input | 19 |
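Rule 5 is easier to see in code. The sketch below is one possible reading of "avoid equality operators in loop termination": a counter that steps past its target never satisfies `!=`, while `<` still terminates. The function names and the `max_iter` safety cap are assumptions added so the example always finishes.

```python
def count_steps_bounded(start, stop, step, max_iter=1000):
    """Loop that terminates with `<`, so overshooting the target still ends it."""
    i, steps = start, 0
    while i < stop and steps < max_iter:
        i += step
        steps += 1
    return steps


def count_steps_equality(start, stop, step, max_iter=1000):
    """Loop that terminates with `!=`; only the max_iter cap stops an overshoot."""
    i, steps = start, 0
    while i != stop and steps < max_iter:
        i += step
        steps += 1
    return steps


if __name__ == "__main__":
    # Stepping by 2 from 0 never lands exactly on 5.
    print(count_steps_bounded(0, 5, 2))
    print(count_steps_equality(0, 5, 2))
```

With a step of 2 and a target of 5, the bounded loop stops after three iterations, while the equality loop runs until the artificial cap is reached.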
Rules vs other signal
Custom rules land on par with bugs.
Beyond bugs, teams codify their own standards — nearly 10,000 rules — and those land at the same rate bug fixes do.
Models
Acceptance by the model that ran the review.
By model family
Gemini carries the largest share of the volume (41%). Read the gap as a team signal, not a model verdict.
Switching the reviewer model barely moves the outcome — what drives acceptance is the team, not the model.
The adoption wave
Who's adopting AI review — and how it ships.
Median PR size, by month
Platform-wide, PR size more than doubled: 73 → 157 lines.
Median time to merge, by month
And merge time collapsed: 18.9h → 1.7h.
Composition, not behavior
Hold the same teams fixed and both trends vanish — the curve is who showed up, not how teams ship.
Median PR size doubled and merge time fell from 18.9h to 1.7h, but not because existing teams changed. Newly arriving teams already ship larger PRs and merge them fast by default. The shift is generational, not behavioral.
AI-authored code
When the author is a machine.
Declared AI-coauthored share
From 0.8% to 30% in eight months — and that's a floor.
Which assistant wrote it
Claude is behind 85% of declared AI-coauthored PRs.
AI vs human-only
2.6× larger — and fixed more often.
By finding class — AI vs human
More findings, fixed more — every class. Widest gap: rules, 2.1×.
| Class | Per 100 PRs (AI/human) | Implemented (AI/human) |
|---|---|---|
| Bugs | 95 / 72 | 44.2% / 32.5% |
| Custom rules | 95 / 45 | 40.1% / 28.5% |
| Security | 7.1 / 5.2 | 42.3% / 29.4% |
| Performance | 4.9 / 3.3 | 47.7% / 36.1% |
Nearly a third of the PRs we see are declared AI-coauthored, and they are bigger PRs that draw more of every kind of finding, broken team rules most of all.
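The headline multipliers follow from the "Per 100 PRs" column of the table above. The sketch below recomputes them, assuming the overall figure is the ratio of summed findings across the four classes; that aggregation is an inference from the numbers, not a method the report states.

```python
# Findings per 100 PRs as (AI-coauthored, human-only), copied from the table above.
PER_100_PRS = {
    "Bugs": (95, 72),
    "Custom rules": (95, 45),
    "Security": (7.1, 5.2),
    "Performance": (4.9, 3.3),
}


def ratio_by_class(table):
    """AI-to-human ratio of findings per 100 PRs for each class."""
    return {name: ai / human for name, (ai, human) in table.items()}


def overall_ratio(table):
    """AI-to-human ratio of findings per 100 PRs, summed over all classes."""
    ai_total = sum(ai for ai, _ in table.values())
    human_total = sum(human for _, human in table.values())
    return ai_total / human_total


if __name__ == "__main__":
    for name, ratio in ratio_by_class(PER_100_PRS).items():
        print(f"{name}: {ratio:.1f}x")
    print(f"All classes: {overall_ratio(PER_100_PRS):.1f}x")
```

These are per-PR ratios. Since AI-coauthored PRs are also reported as 2.6× larger, they do not by themselves say anything about findings per line of code.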
Merged anyway
Critical flags that ship unaddressed.
Flags shipped unaddressed
71.8% of merged PRs that drew a critical flag ship with at least one still open.
Flagging isn't a gate: about 7 in 10 merged PRs with a critical flag ship with one unaddressed. Without something that blocks the merge, review is just advice.
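One way to turn a flag into a gate is a check that fails while any blocking finding is unresolved. The sketch below is a minimal version of that idea; the `Finding` type, its fields, and the default blocking set are hypothetical and do not describe how Kodus or any Git platform implements merge checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical review finding; field names are assumptions."""
    severity: str
    resolved: bool


def merge_allowed(findings, blocking=frozenset({"critical"})):
    """Return False while any finding of a blocking severity is still unresolved."""
    return not any(f.severity in blocking and not f.resolved for f in findings)


if __name__ == "__main__":
    pr_findings = [
        Finding(severity="critical", resolved=False),
        Finding(severity="medium", resolved=False),
    ]
    # A CI step could map this result to a failing status check.
    print("merge allowed:", merge_allowed(pr_findings))
```

Keeping the blocking severities as a parameter lets a team start by gating only critical findings and widen the set later.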
Economics
What it costs to run the reviewer.
Volume & latency
Input-heavy: about 14 input tokens per output token, and about 2.4 minutes per review.
Cost per PR, by model
Same workload, each model's list price — $0.10 to $3.75 per PR (~38×).
Reviewing a PR is cheap and input-bound — but model choice swings the bill ~38×. The lever is which model, not whether to review.
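The cost structure can be sketched as token counts multiplied by per-token list prices. Everything numeric in the example call is an assumption chosen only to reflect a 14:1 input-to-output ratio: the token counts and the per-million prices are hypothetical, and only the $0.10 and $3.75 endpoints come from the report.

```python
def review_cost_usd(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Cost of one review given token counts and list prices per million tokens."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


def input_share(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Fraction of the review cost that comes from input tokens."""
    total = review_cost_usd(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output)
    return (input_tokens * usd_per_m_input / 1_000_000) / total


if __name__ == "__main__":
    # Hypothetical review: 140,000 input and 10,000 output tokens (14:1),
    # at hypothetical prices of $1.00 and $4.00 per million tokens.
    cost = review_cost_usd(140_000, 10_000, 1.00, 4.00)
    share = input_share(140_000, 10_000, 1.00, 4.00)
    print(f"cost per review: ${cost:.2f}, input share: {share:.0%}")

    # Spread between the cheapest and priciest per-PR costs quoted in the report.
    print(f"model spread: {3.75 / 0.10:.1f}x")
```

Even when output tokens are priced several times higher than input tokens, a 14:1 ratio keeps most of the bill on the input side, which is why the model's input price matters most.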
Methodology
| Field | Detail |
|---|---|
| Source | Kodus production pipeline: GitHub, GitLab, Bitbucket, Azure DevOps. |
| Window | PRs created 2025-09-01 onward, through the 2026-06-08 snapshot. |
| Scope | Bugs, security, performance, custom rules. Deprecated categories excluded. |
| "Implemented" | Flagged lines changed on a later commit. Measured from the diff; silent ignores counted. |
| Privacy | No slice under 50 suggestions. No org, repo, or PR identifiers exposed. |
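The "Implemented" definition can be read as a range-overlap test between the lines a suggestion flagged and the lines a later commit changed. The sketch below shows that reading only; the report does not describe its matching logic, the function names are hypothetical, and the example ignores line-number shifts between commits.

```python
def ranges_overlap(a, b):
    """True if two inclusive (start, end) line ranges share at least one line."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_implemented(flagged_range, later_changed_ranges):
    """Treat a suggestion as implemented if any later change touches its flagged lines."""
    return any(ranges_overlap(flagged_range, changed) for changed in later_changed_ranges)


if __name__ == "__main__":
    flagged = (10, 14)  # lines the suggestion pointed at
    print(is_implemented(flagged, [(1, 3), (12, 20)]))  # a later commit touched lines 12-20
    print(is_implemented(flagged, [(15, 20)]))          # later changes missed the flagged lines
```

A diff-based measure like this counts a suggestion as ignored when nobody replies to it and the lines never change, which matches the "silent ignores counted" note.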
See what gets caught in your code.
Open source AI code review that learns your team's rules. This is what it does in production.
State of AI Code Review (2026-Q2) · Research by Kodus · kodus.io/data · CC BY 4.0