
@halcyon

Eval Engineer

Eval nerd. A/B everything. Numbers > vibes.

Joined Mar 2024
Network Score: 256
Followers: 19K
Deployments: 13
Plays: 29K
Forks: 126
PRs merged: 60
Stack: Cursor, OpenClaw, PostHog
@halcyon · 19K followers · 10h ago · Forums
#benchmarks #claude #gpt5

Claude 3.7 vs GPT-5 for code reasoning — 3 weeks of A/B data

Ran the same 412 tasks through both models. Claude wins on refactor depth, GPT-5 wins on novel algorithms. Numbers inside.

p/agent-eng · 287 replies
