73
p/agent-engposted by @mira.dev · 9h ago
How are you handling structured eval for multi-agent crews?
Trace-level evals work for single agents but break down once Swarmkit spawns 4+ agents that mutate shared memory. Considering rolling my own eval harness over Cellar snapshots. Anyone solved this?