AI Code Review Benchmark

We evaluated Kody and three other AI code review tools (CodeRabbit, GitHub Copilot, and Cursor) on the same 38 PRs across five open-source projects: Sentry, Cal.com, Grafana, Discourse, and Keycloak. The goal is to show how each tool performs on real reviews.

Methodology

We selected 38 pull requests from five large, actively maintained open-source repositories. Each PR contained at least one real, documented issue (a bug, a security vulnerability, or a performance concern) that was later confirmed by the project maintainers.

We then ran four AI code review tools on each PR under identical conditions: same diff, same context window, default configuration. No tool received any hints or custom rules.

A finding was counted as a hit only if the tool flagged the specific issue that the PR was known to contain. Generic style or formatting comments were ignored.
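The hit-counting rule above can be sketched as a small tally. This is a minimal sketch, not the benchmark's actual tooling: the `Finding` record, its `matches_known_issue` flag (assumed to be set by a human reviewer who compares each comment to the documented issue), and the tool and PR names in the example data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One review comment left by a tool on a PR (hypothetical record shape)."""
    tool: str
    pr_id: str
    # Assumption: a human reviewer labels whether this comment flags the
    # specific, maintainer-confirmed issue. Style or formatting comments
    # are labeled False.
    matches_known_issue: bool


def count_hits(findings):
    """Return {tool: number of PRs on which the known issue was flagged}.

    A PR counts at most once per tool, no matter how many matching
    comments the tool left on it.
    """
    hit_prs = {}
    for finding in findings:
        prs = hit_prs.setdefault(finding.tool, set())
        if finding.matches_known_issue:
            prs.add(finding.pr_id)
    return {tool: len(prs) for tool, prs in hit_prs.items()}


# Hypothetical example data, not taken from the benchmark.
example_findings = [
    Finding("tool_a", "pr-1", True),
    Finding("tool_a", "pr-1", True),   # second comment on the same issue
    Finding("tool_a", "pr-2", False),  # style-only comment, ignored
    Finding("tool_b", "pr-1", False),
]
example_hits = count_hits(example_findings)
```

Collecting PR identifiers in a set keeps a tool from scoring twice on one PR, which matches the one-known-issue-per-PR framing of the benchmark.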

TL;DR
Critical: Kodus detected 69% of critical issues (9/13). Cursor came second at 62%.
High: Kodus led high-severity detection at 81% (13/16). CodeRabbit detected only 31%.
Medium: Kodus caught 89% of medium issues (8/9). GitHub Copilot came second at 78%.
Overall: Kodus led at every severity level, catching 30 of the 38 known issues (79%).

Overall Performance

Overall Issues Detected (38 PRs)

Kodus: 30 / 38 (79%)
Cursor: 22 / 38 (58%)
GitHub Copilot: 20 / 38 (53%)
CodeRabbit: 15 / 38 (39%)

Critical Severity (13 PRs)

Kodus: 9 / 13 (69%)
Cursor: 8 / 13 (62%)
GitHub Copilot: 7 / 13 (54%)
CodeRabbit: 5 / 13 (38%)

High Severity (16 PRs)

Kodus: 13 / 16 (81%)
Cursor: 8 / 16 (50%)
GitHub Copilot: 6 / 16 (38%)
CodeRabbit: 5 / 16 (31%)

Medium Severity (9 PRs)

Kodus: 8 / 9 (89%)
GitHub Copilot: 7 / 9 (78%)
Cursor: 6 / 9 (67%)
CodeRabbit: 5 / 9 (56%)
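As a cross-check, the per-severity counts above can be summed to reproduce the overall figures. The sketch below uses only the counts reported in this section. The variable and function names are illustrative, and rounding halves up to the nearest whole percent is an assumption about how the published percentages were derived.

```python
import math

# Total PRs per severity and hits per tool, copied from the figures above.
SEVERITY_TOTALS = {"critical": 13, "high": 16, "medium": 9}
HITS = {
    "Kodus": {"critical": 9, "high": 13, "medium": 8},
    "Cursor": {"critical": 8, "high": 8, "medium": 6},
    "GitHub Copilot": {"critical": 7, "high": 6, "medium": 7},
    "CodeRabbit": {"critical": 5, "high": 5, "medium": 5},
}


def percent(hits, total):
    """Detection rate as a whole percent, rounding halves up (assumed)."""
    return math.floor(100 * hits / total + 0.5)


def overall(hits_by_severity):
    """Sum one tool's hits across severities; return (hits, total, percent)."""
    hits = sum(hits_by_severity.values())
    total = sum(SEVERITY_TOTALS.values())
    return hits, total, percent(hits, total)


overall_by_tool = {tool: overall(counts) for tool, counts in HITS.items()}
```

Under these assumptions the sums come to 30, 22, 20, and 15 hits out of 38 PRs, matching the overall results.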

Detailed Results

Sentry (9 PRs)

| PR | Bug | Severity |
| --- | --- | --- |
| Replays Self-Serve Bulk Delete System | Breaking changes in error response format | CRITICAL |
| GitHub OAuth Security Enhancement | Null reference if github_authenticated_user state is missing | CRITICAL |
| Optimize spans buffer insertion with eviction during insert | Negative offset cursor manipulation bypasses pagination boundaries | CRITICAL |
| Enhanced Pagination Performance for High-Volume Audit Logs | Importing non-existent OptimizedCursorPaginator | HIGH |
| Reorganize incident creation / issue occurrence logic | Using stale config variable instead of updated one | HIGH |
| Add ability to use queues to manage parallelism | Invalid queue.ShutDown exception handling | HIGH |
| Add hook for producing occurrences from the stateful detector | Incomplete implementation (only contains pass) | HIGH |
| Span Buffer Multiprocess Enhancement with Health Monitoring | Inconsistent metric tagging with 'shard' and 'shards' | MEDIUM |
| Implement cross-system issue synchronization | Shared mutable default in dataclass timestamp | MEDIUM |

Total: Kodus 7 / 9, CodeRabbit 3 / 9, GitHub Copilot 4 / 9, Cursor 4 / 9

Cal.com (8 PRs)

| PR | Bug | Severity |
| --- | --- | --- |
| feat: 2fa backup codes | Backup codes not invalidated after use | CRITICAL |
| fix: handle collective multiple host on destinationCalendar | Null reference error if array is empty | MEDIUM |
| feat: convert InsightsBookingService to use Prisma.sql raw queries | Potential SQL injection risk in raw SQL query construction | CRITICAL |
| Comprehensive workflow reminder management for booking lifecycle events | Missing database cleanup when immediateDelete is true | HIGH |
| Advanced date override handling and timezone compatibility improvements | Incorrect end time calculation using slotStartTime instead of slotEndTime | MEDIUM |
| OAuth credential sync and app integration enhancements | Timing attack vulnerability using direct string comparison | CRITICAL |
| SMS workflow reminder retry count tracking | OR condition causes deletion of all workflow reminders | HIGH |
| Add guest management functionality to existing bookings | Case sensitivity bypass in email blacklist | HIGH |

Total: Kodus 5 / 8, CodeRabbit 3 / 8, GitHub Copilot 5 / 8, Cursor 3 / 8

Grafana (8 PRs)

| PR | Bug | Severity |
| --- | --- | --- |
| Advanced SQL Analytics Framework | enableSqlExpressions function always returns false, disabling SQL functionality | CRITICAL |
| Unified Storage Performance Optimizations | Race condition in cache locking | HIGH |
| Notification Rule Processing Engine | Missing key prop causing React rendering issues | MEDIUM |
| Advanced Query Processing Architecture | Double interpolation risk | CRITICAL |
| Dual Storage Architecture | Incorrect metrics recording methods causing misleading performance tracking | MEDIUM |
| Frontend Asset Optimization | Deadlock potential during concurrent annotation deletion operations | HIGH |
| AuthZService: improve authz caching | Cache entries without expiration causing permanent permission denials | HIGH |
| Anonymous: Add configurable device limit | Race condition in CreateOrUpdateDevice method | HIGH |

Total: Kodus 7 / 8, CodeRabbit 4 / 8, GitHub Copilot 4 / 8, Cursor 7 / 8

Discourse (8 PRs)

| PR | Bug | Severity |
| --- | --- | --- |
| FEATURE: automatically downsize large images | Method overwriting causing parameter mismatch | MEDIUM |
| FEATURE: per-topic unsubscribe option in emails | Nil reference non-existent TopicUser | HIGH |
| Add comprehensive email validation for blocked users | BlockedEmail.should_block? modifies DB during read | CRITICAL |
| Enhance embed URL handling and validation system | SSRF vulnerability using open(url) without validation | CRITICAL |
| UX: show complete URL path if website domain is same as instance domain | String mutation with << operator | MEDIUM |
| FIX: proper handling of group memberships | Race conditions in async member loading | HIGH |
| FEATURE: Localization fallbacks (server-side) | Thread-safety issue with lazy @loaded_locales | HIGH |
| FEATURE: Can edit category/host relationships for embedding | NoMethodError before_validation in EmbeddableHost | CRITICAL |

Total: Kodus 8 / 8, CodeRabbit 3 / 8, GitHub Copilot 6 / 8, Cursor 6 / 8

Keycloak (5 PRs)

| PR | Bug | Severity |
| --- | --- | --- |
| Add AuthzClientCryptoProvider for authorization client cryptographic operations | Returns wrong provider (default keystore instead of BouncyCastle) | HIGH |
| Fixing Re-authentication with passkeys | ConditionalPasskeysEnabled() called without UserModel parameter | MEDIUM |
| Add Client resource type and scopes to authorization schema | Inconsistent feature flag bug causing orphaned permissions | HIGH |
| Implement access token context encoding framework | Wrong parameter in null check (grantType vs. rawTokenId) | CRITICAL |
| Add caching support for IdentityProviderStorageProvider.getForLogin operations | Recursive caching call using session instead of delegate | CRITICAL |

Total: Kodus 3 / 5, CodeRabbit 2 / 5, GitHub Copilot 1 / 5, Cursor 2 / 5
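
The per-repository totals can be aggregated to confirm that they add up to the overall results, and to see which tool led on each repository. This is a sketch over the Total rows listed above; the dictionary layout and function names are illustrative assumptions, not part of the benchmark.

```python
# (PR count, hits per tool) for each repository, copied from the Total rows.
REPO_TOTALS = {
    "Sentry": (9, {"Kodus": 7, "CodeRabbit": 3, "GitHub Copilot": 4, "Cursor": 4}),
    "Cal.com": (8, {"Kodus": 5, "CodeRabbit": 3, "GitHub Copilot": 5, "Cursor": 3}),
    "Grafana": (8, {"Kodus": 7, "CodeRabbit": 4, "GitHub Copilot": 4, "Cursor": 7}),
    "Discourse": (8, {"Kodus": 8, "CodeRabbit": 3, "GitHub Copilot": 6, "Cursor": 6}),
    "Keycloak": (5, {"Kodus": 3, "CodeRabbit": 2, "GitHub Copilot": 1, "Cursor": 2}),
}


def aggregate(repo_totals):
    """Return (total PR count, {tool: total hits}) across all repositories."""
    pr_count = 0
    hits = {}
    for prs, per_tool in repo_totals.values():
        pr_count += prs
        for tool, n in per_tool.items():
            hits[tool] = hits.get(tool, 0) + n
    return pr_count, hits


def leaders(per_tool):
    """Return the tool or tools with the most hits, sorted by name (ties kept)."""
    best = max(per_tool.values())
    return sorted(tool for tool, n in per_tool.items() if n == best)


total_prs, total_hits = aggregate(REPO_TOTALS)
leaders_by_repo = {
    repo: leaders(per_tool) for repo, (_, per_tool) in REPO_TOTALS.items()
}
```

Keeping ties in `leaders` matters here: on Cal.com and Grafana two tools share the top score, so reporting a single winner would hide part of the result.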

Don't take our word for it.
Try Kody on your next PR.

Spin it up in under 2 minutes — cloud or self-hosted, no credit card.