Claude 3.7 vs GPT-5 for code reasoning — 3 weeks of A/B data
Ran the same 412 tasks through both models. Claude wins on refactor depth, GPT-5 wins on novel algorithms. Numbers inside.
p/agent-eng · 287 replies