Multi-file Refactoring Challenge
Code Refactoring
Tests each tool's ability to refactor a 500-line Express.js API from callbacks to async/await across 8 interconnected files while keeping all 47 existing tests passing.
Methodology
Each tool was given the same starter codebase with callback-based Express routes, middleware, and database queries. Tools were instructed to convert all asynchronous operations to async/await, update error handling to use try/catch, and ensure all tests pass. Results were scored on completion percentage, test pass rate, output code quality, and time taken. Each tool was run 3 times, and its best result is reported.
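To make the task concrete, here is a minimal sketch of the kind of conversion the tools were asked to perform. It is not taken from the benchmark codebase: `findUserById`, `getUserCallback`, and `getUserAsync` are hypothetical names, the database query is simulated in memory, and the handlers only mimic the `(req, res, next)` shape of an Express route so the example runs without Express installed.

```javascript
const { promisify } = require("node:util");

// Hypothetical callback-style query, standing in for a real database call.
function findUserById(id, callback) {
  setImmediate(() => {
    if (id === 1) {
      callback(null, { id: 1, name: "Ada" });
    } else {
      callback(new Error("user not found"));
    }
  });
}

// Before: callback-based handler that forwards errors to next() by hand.
function getUserCallback(req, res, next) {
  findUserById(req.params.id, (err, user) => {
    if (err) return next(err);
    res.json(user);
  });
}

// After: the same query wrapped in a promise, awaited inside try/catch.
const findUserByIdAsync = promisify(findUserById);

async function getUserAsync(req, res, next) {
  try {
    const user = await findUserByIdAsync(req.params.id);
    res.json(user);
  } catch (err) {
    // Forward the failure to the error-handling middleware.
    next(err);
  }
}
```

Both handlers are meant to behave identically from the caller's point of view, which is why an existing test suite can act as the regression check for this kind of refactor. The explicit `next(err)` in the catch block matters in Express 4, where a rejected promise from an async handler is not forwarded to the error middleware automatically.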
| Tool | Completion (%, higher is better) | Tests Passing (/47, higher is better) | Time (min, lower is better) | Code Quality (/10, higher is better) |
|---|---|---|---|---|
| Claude Code | 98 | 47 | 4.2 | 9.3 |
| Cursor | 95 | 46 | 6.1 | 8.8 |
| Aider | 92 | 45 | 5.5 | 8.5 |
| GitHub Copilot | 85 | 43 | 8.3 | 8 |
| Devin | 90 | 44 | 12 | 8.2 |
| Windsurf | 88 | 42 | 7.8 | 7.9 |
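The methodology lists test pass rate as a scoring criterion, while the table reports raw counts out of 47. As a small illustration, the sketch below converts those counts into percentages. The `passRate` helper and the rounding to one decimal place are assumptions made for this example, not part of the published scoring.

```javascript
const TOTAL_TESTS = 47;

// Tests passing per tool, copied from the table above.
const testsPassing = {
  "Claude Code": 47,
  "Cursor": 46,
  "Aider": 45,
  "GitHub Copilot": 43,
  "Devin": 44,
  "Windsurf": 42,
};

// Hypothetical helper: pass rate as a percentage, rounded to one decimal.
function passRate(passed, total = TOTAL_TESTS) {
  return Math.round((passed / total) * 1000) / 10;
}

const passRates = Object.fromEntries(
  Object.entries(testsPassing).map(([tool, passed]) => [tool, passRate(passed)])
);

console.log(passRates);
```

With these inputs, the pass rates range from 100 for Claude Code down to 89.4 for Windsurf.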