AI Code Review Benchmark

We evaluated Kody and other AI code review tools on the same PRs across five open-source projects. The goal is to give you a clear picture of how each tool performs in real reviews.

How We Built This Benchmark

We used the same public repositories from an existing benchmark and added Kody, our code review agent. To keep the comparison meaningful, we focused only on Critical-, High-, and Medium-severity issues.

We ran the exact same pull requests through four AI code review tools (Kodus, Coderabbit, GitHub Copilot, and Cursor BugBot) with no additional setup or custom configuration, specifically to avoid skewing the results.

All tools were evaluated using the same dataset under the same conditions.
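
To make the scoring concrete, here is a minimal sketch of how per-severity detection rates like the ones in the TL;DR could be computed. The `Issue` structure, the (pr, issue_id) matching, and the example data are assumptions for illustration only, not the benchmark's actual scripts or schema.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical representation of a known (ground-truth) bug in a benchmark PR.
# Not the benchmark's actual schema.
@dataclass(frozen=True)
class Issue:
    pr: str          # pull request identifier
    issue_id: str    # identifier of the known bug
    severity: str    # "critical", "high", or "medium"

def detection_rates(known_issues: list[Issue],
                    tool_findings: set[tuple[str, str]]) -> dict[str, float]:
    """Share of known issues per severity that a tool flagged.

    `tool_findings` holds (pr, issue_id) pairs that a reviewer matched
    between the tool's comments and the ground-truth issues.
    """
    totals: dict[str, int] = defaultdict(int)
    found: dict[str, int] = defaultdict(int)
    for issue in known_issues:
        totals[issue.severity] += 1
        if (issue.pr, issue.issue_id) in tool_findings:
            found[issue.severity] += 1
    return {sev: found[sev] / totals[sev] for sev in totals}

# Made-up example: two known issues, one detected by the tool.
issues = [
    Issue("example-repo#123", "null-deref", "high"),
    Issue("example-repo#456", "race-condition", "critical"),
]
print(detection_rates(issues, {("example-repo#123", "null-deref")}))
# {'high': 1.0, 'critical': 0.0}
```

The per-severity split matters because an overall average can hide a tool that only catches easy, medium-severity bugs while missing critical ones.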

Repositories Analyzed

  • Sentry

  • Cal.com

  • Grafana

  • Discourse

  • Keycloak

TL;DR

  • For critical issues, Kodus (6%) and GitHub Copilot (62%) delivered the best results. Even so, the numbers show there's still plenty of room for improvement in detecting this class of issue.

  • For high-severity issues, the gap between tools became more noticeable. Coderabbit had its worst performance here (31%), falling well below the others. Cursor BugBot (50%) and Kodus (81%) performed better, though results still varied from PR to PR.

  • For medium-severity issues, all tools performed at a higher level. Kodus detected 89% of the cases, and Cursor achieved its best performance in this category, finding 67% of the bugs.

Overall, Kodus was the most consistent tool across all three categories (critical, high, and medium), identifying 79% of the issues, while the others fluctuated more depending on the type of problem.

Don’t take our word for it. Try Kody on your next PR.

Spin it up in under 2 minutes—cloud or self-hosted, no credit card.