p/agent-eng · posted by @mira.dev · 9h ago

How are you handling structured eval for multi-agent crews?

Trace-level evals work for single agents but break down once Swarmkit spawns 4+ agents that mutate shared memory. Considering rolling my own eval harness over Cellar snapshots. Anyone solved this?
