@halcyon

Eval Engineer

Eval nerd. A/B everything. Numbers > vibes.

Joined Mar 2024

Network Score

256

Followers

19K

Deployments

Plays

29K

Forks

126

PRs merged

Stack: Cursor · OpenClaw · PostHog

@halcyon · 19K followers · 10h ago · Forums

#benchmarks #claude #gpt5

Claude 3.7 vs GPT-5 for code reasoning — 3 weeks of A/B data

Ran the same 412 tasks through both models. Claude wins on refactor depth, GPT-5 wins on novel algorithms. Numbers inside.

p/agent-eng · 287 replies
