AI Code Review Benchmark
We evaluated Kody and other AI code review tools on the same PRs across five open-source projects. The goal is to give you a clear picture of how each tool performs in real reviews.
We selected 38 pull requests from five large, actively maintained open-source repositories. Each PR contained at least one real, documented issue (a bug, a security vulnerability, or a performance concern) that was later confirmed by the project maintainers.
We then ran four AI code review tools on each PR under identical conditions: same diff, same context window, default configuration. No tool received any hints or custom rules.
A finding was counted as a hit only if the tool flagged the specific issue that the PR was known to contain. Generic style or formatting comments were ignored.
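The counting rule can be sketched in a few lines of Python. This is only an illustration, not the harness used for the benchmark: the `Finding` structure, the `issue_id` labels, and the example data are all hypothetical, and the sketch assumes each tool comment has already been matched by hand to an issue label (or to `None` for style and formatting remarks).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """One review comment from one tool (hypothetical structure)."""
    tool: str
    pr_id: str
    issue_id: Optional[str]  # None for generic style or formatting comments


def is_hit(finding: Finding, known_issues: dict) -> bool:
    """A hit must name the specific issue documented for that PR."""
    expected = known_issues.get(finding.pr_id)
    return finding.issue_id is not None and finding.issue_id == expected


def detected_prs(findings, known_issues):
    """Map each tool to the set of PRs where it flagged the known issue."""
    hits = {}
    for finding in findings:
        if is_hit(finding, known_issues):
            hits.setdefault(finding.tool, set()).add(finding.pr_id)
    return hits


# Made-up example data, for illustration only.
known_issues = {"pr-1": "null-reference"}
findings = [
    Finding("tool-a", "pr-1", "null-reference"),  # flags the known issue: hit
    Finding("tool-a", "pr-1", None),              # style comment: ignored
    Finding("tool-b", "pr-1", "unused-import"),   # different issue: not a hit
]
print(detected_prs(findings, known_issues))
```

Collecting hits as a set per tool means that several comments on the same issue still count once, so each PR contributes at most one hit per tool.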
Overall Performance
[Charts: issues detected overall (38 PRs) and by severity: critical (13 PRs), high (16 PRs), medium (9 PRs)]
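The overall figures can be recomputed from the "Total" rows of the five tables under Detailed Results. The sketch below simply sums those rows. The tables are indexed by position because the text does not name them, and the `TOTALS` and `overall` names are illustrative.

```python
# (hits, PRs reviewed) per table, copied from the "Total" rows in table order.
TOTALS = {
    "Kodus":          [(7, 9), (5, 8), (7, 8), (8, 8), (3, 5)],
    "CodeRabbit":     [(3, 9), (3, 8), (4, 8), (3, 8), (2, 5)],
    "GitHub Copilot": [(4, 9), (5, 8), (4, 8), (6, 8), (1, 5)],
    "Cursor":         [(4, 9), (3, 8), (7, 8), (6, 8), (2, 5)],
}


def overall(totals):
    """Sum hits and PR counts across all tables for each tool."""
    result = {}
    for tool, rows in totals.items():
        hits = sum(h for h, _ in rows)
        prs = sum(n for _, n in rows)
        result[tool] = (hits, prs)
    return result


for tool, (hits, prs) in overall(TOTALS).items():
    print(f"{tool}: {hits} / {prs} ({hits / prs:.0%})")
```

Summed this way, the totals come to 30 / 38 for Kodus, 15 / 38 for CodeRabbit, 20 / 38 for GitHub Copilot, and 22 / 38 for Cursor.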
Detailed Results

| PR | Bug | Severity | Kodus | CodeRabbit | GitHub Copilot | Cursor |
|---|---|---|---|---|---|---|
| Replays Self-Serve Bulk Delete System | Breaking changes in error response format | CRITICAL | ✕ | ✕ | ✕ | ✕ |
| GitHub OAuth Security Enhancement | Null reference if github_authenticated_user state is missing | CRITICAL | ✓ | ✕ | ✓ | ✓ |
| Optimize spans buffer insertion with eviction during insert | Negative offset cursor manipulation bypasses pagination boundaries | CRITICAL | ✓ | ✓ | ✕ | ✓ |
| Enhanced Pagination Performance for High-Volume Audit Logs | Importing non-existent OptimizedCursorPaginator | HIGH | ✕ | ✕ | ✕ | ✕ |
| Reorganize incident creation / issue occurrence logic | Using stale config variable instead of updated one | HIGH | ✓ | ✓ | ✕ | ✕ |
| Add ability to use queues to manage parallelism | Invalid queue.ShutDown exception handling | HIGH | ✓ | ✕ | ✓ | ✕ |
| Add hook for producing occurrences from the stateful detector | Incomplete implementation (only contains pass) | HIGH | ✓ | ✕ | ✕ | ✓ |
| Span Buffer Multiprocess Enhancement with Health Monitoring | Inconsistent metric tagging with 'shard' and 'shards' | MEDIUM | ✓ | ✕ | ✓ | ✕ |
| Implement cross-system issue synchronization | Shared mutable default in dataclass timestamp | MEDIUM | ✓ | ✓ | ✓ | ✓ |
| Total | | | 7 / 9 | 3 / 9 | 4 / 9 | 4 / 9 |

| PR | Bug | Severity | Kodus | CodeRabbit | GitHub Copilot | Cursor |
|---|---|---|---|---|---|---|
| feat: 2fa backup codes | Backup codes not invalidated after use | CRITICAL | ✕ | ✓ | ✕ | ✕ |
| fix: handle collective multiple host on destinationCalendar | Null reference error if array is empty | MEDIUM | ✓ | ✕ | ✓ | ✓ |
| feat: convert InsightsBookingService to use Prisma.sql raw queries | Potential SQL injection risk in raw SQL query construction | CRITICAL | ✕ | ✕ | ✓ | ✕ |
| Comprehensive workflow reminder management for booking lifecycle events | Missing database cleanup when immediateDelete is true | HIGH | ✓ | ✕ | ✕ | ✓ |
| Advanced date override handling and timezone compatibility improvements | Incorrect end time calculation using slotStartTime instead of slotEndTime | MEDIUM | ✓ | ✓ | ✓ | ✕ |
| OAuth credential sync and app integration enhancements | Timing attack vulnerability using direct string comparison | CRITICAL | ✓ | ✕ | ✕ | ✕ |
| SMS workflow reminder retry count tracking | OR condition causes deletion of all workflow reminders | HIGH | ✓ | ✓ | ✓ | ✓ |
| Add guest management functionality to existing bookings | Case sensitivity bypass in email blacklist | HIGH | ✕ | ✕ | ✓ | ✕ |
| Total | | | 5 / 8 | 3 / 8 | 5 / 8 | 3 / 8 |

| PR | Bug | Severity | Kodus | CodeRabbit | GitHub Copilot | Cursor |
|---|---|---|---|---|---|---|
| Advanced SQL Analytics Framework | enableSqlExpressions function always returns false, disabling SQL functionality | CRITICAL | ✓ | ✓ | ✓ | ✓ |
| Unified Storage Performance Optimizations | Race condition in cache locking | HIGH | ✓ | ✕ | ✕ | ✓ |
| Notification Rule Processing Engine | Missing key prop causing React rendering issues | MEDIUM | ✓ | ✓ | ✕ | ✓ |
| Advanced Query Processing Architecture | Double interpolation risk | CRITICAL | ✓ | ✕ | ✓ | ✓ |
| Dual Storage Architecture | Incorrect metrics recording methods causing misleading performance tracking | MEDIUM | ✓ | ✓ | ✓ | ✓ |
| Frontend Asset Optimization | Deadlock potential during concurrent annotation deletion operations | HIGH | ✕ | ✓ | ✓ | ✓ |
| AuthZService: improve authz caching | Cache entries without expiration causing permanent permission denials | HIGH | ✓ | ✕ | ✕ | ✓ |
| Anonymous: Add configurable device limit | Race condition in CreateOrUpdateDevice method | HIGH | ✓ | ✕ | ✕ | ✕ |
| Total | | | 7 / 8 | 4 / 8 | 4 / 8 | 7 / 8 |

| PR | Bug | Severity | Kodus | CodeRabbit | GitHub Copilot | Cursor |
|---|---|---|---|---|---|---|
| FEATURE: automatically downsize large images | Method overwriting causing parameter mismatch | MEDIUM | ✓ | ✓ | ✓ | ✓ |
| FEATURE: per-topic unsubscribe option in emails | Nil reference non-existent TopicUser | HIGH | ✓ | ✓ | ✓ | ✓ |
| Add comprehensive email validation for blocked users | BlockedEmail.should_block? modifies DB during read | CRITICAL | ✓ | ✕ | ✕ | ✓ |
| Enhance embed URL handling and validation system | SSRF vulnerability using open(url) without validation | CRITICAL | ✓ | ✓ | ✓ | ✓ |
| UX: show complete URL path if website domain is same as instance domain | String mutation with << operator | MEDIUM | ✓ | ✕ | ✓ | ✓ |
| FIX: proper handling of group memberships | Race conditions in async member loading | HIGH | ✓ | ✕ | ✕ | ✕ |
| FEATURE: Localization fallbacks (server-side) | Thread-safety issue with lazy @loaded_locales | HIGH | ✓ | ✕ | ✓ | ✕ |
| FEATURE: Can edit category/host relationships for embedding | NoMethodError before_validation in EmbeddableHost | CRITICAL | ✓ | ✕ | ✓ | ✓ |
| Total | | | 8 / 8 | 3 / 8 | 6 / 8 | 6 / 8 |

| PR | Bug | Severity | Kodus | CodeRabbit | GitHub Copilot | Cursor |
|---|---|---|---|---|---|---|
| Add AuthzClientCryptoProvider for authorization client cryptographic operations | Returns wrong provider (default keystore instead of BouncyCastle) | HIGH | ✓ | ✓ | ✕ | ✕ |
| Fixing Re-authentication with passkeys | ConditionalPasskeysEnabled() called without UserModel parameter | MEDIUM | ✕ | ✕ | ✕ | ✕ |
| Add Client resource type and scopes to authorization schema | Inconsistent feature flag bug causing orphaned permissions | HIGH | ✓ | ✕ | ✕ | ✓ |
| Implement access token context encoding framework | Wrong parameter in null check (grantType vs. rawTokenId) | CRITICAL | ✓ | ✓ | ✓ | ✓ |
| Add caching support for IdentityProviderStorageProvider.getForLogin operations | Recursive caching call using session instead of delegate | CRITICAL | ✕ | ✕ | ✕ | ✕ |
| Total | | | 3 / 5 | 2 / 5 | 1 / 5 | 2 / 5 |
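
The severity counts quoted under Overall Performance (13 critical, 16 high, 9 medium) can be cross-checked against the rows of the five tables above. The sketch below transcribes each row as its severity plus one letter per tool and tallies hits per severity. The `Y`/`N` encoding and the `tally` helper are illustrative choices, not part of the benchmark.

```python
from collections import Counter

TOOLS = ["Kodus", "CodeRabbit", "GitHub Copilot", "Cursor"]

# One line per table row, in document order: severity, then one letter per
# tool in TOOLS order (Y = issue detected, N = missed).
ROWS = """
CRITICAL NNNN
CRITICAL YNYY
CRITICAL YYNY
HIGH NNNN
HIGH YYNN
HIGH YNYN
HIGH YNNY
MEDIUM YNYN
MEDIUM YYYY
CRITICAL NYNN
MEDIUM YNYY
CRITICAL NNYN
HIGH YNNY
MEDIUM YYYN
CRITICAL YNNN
HIGH YYYY
HIGH NNYN
CRITICAL YYYY
HIGH YNNY
MEDIUM YYNY
CRITICAL YNYY
MEDIUM YYYY
HIGH NYYY
HIGH YNNY
HIGH YNNN
MEDIUM YYYY
HIGH YYYY
CRITICAL YNNY
CRITICAL YYYY
MEDIUM YNYY
HIGH YNNN
HIGH YNYN
CRITICAL YNYY
HIGH YYNN
MEDIUM NNNN
HIGH YNNY
CRITICAL YYYY
CRITICAL NNNN
"""


def tally(rows_text):
    """Return (PR count per severity, hit count per (severity, tool))."""
    prs = Counter()
    hits = Counter()
    for line in rows_text.split("\n"):
        if not line.strip():
            continue
        severity, marks = line.split()
        prs[severity] += 1
        for tool, mark in zip(TOOLS, marks):
            hits[(severity, tool)] += mark == "Y"
    return prs, hits


prs, hits = tally(ROWS)
for severity in ("CRITICAL", "HIGH", "MEDIUM"):
    row = ", ".join(f"{tool} {hits[(severity, tool)]}" for tool in TOOLS)
    print(f"{severity} ({prs[severity]} PRs): {row}")
```

The tally reproduces the 13 / 16 / 9 split and gives a per-tool breakdown for each severity level, which is a quick way to check any single cell of the tables.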