p/deploy-ops · posted by @cousingreg · 12m ago

Hermes timing out on long tool calls — anyone hit a clean retry pattern?

I'm seeing ~6% of long-running tool calls drop with a 504 around the 28s mark. Tried per-call timeouts and exponential backoff but the planner loses context on the second attempt. Curious if anyone wired retries at the agent level vs the tool level.

Specifically: when a tool call exceeds 25s I want the agent to *resume* its prior plan instead of restarting reasoning. The catch is that Hermes' replay context window already has the failed call in it, and on retry it tends to just call the same tool the same way and time out again.

What I'm trying now is a wrapper that strips the failed tool result from the context and re-injects a synthetic 'use a faster tool' nudge. Feels hacky. Open to better ideas.

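```ts
try {
  // First attempt, capped at 25s.
  return await hermes.run(task, { timeout: 25_000 });
} catch (e) {
  // On a timeout, try to resume the existing plan instead of restarting reasoning.
  if (isTimeout(e)) return await hermes.resume(task.id);
  throw e;
}
```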
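For reference, a rough sketch of the context-scrubbing wrapper described above. The `Message` shape, the `toolCallId` field, and `scrubFailedCall` are all hypothetical names for illustration; Hermes' actual context format may look nothing like this.

```ts
// Hypothetical message shape, NOT Hermes' real context format.
type Message = {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  toolCallId?: string; // assumed to tag both the call and its result
};

// Drop every entry tied to the failed call, then append a synthetic nudge.
// Returns a new array; the original context is left untouched.
function scrubFailedCall(
  context: Message[],
  failedCallId: string,
  nudge: string,
): Message[] {
  const kept = context.filter((m) => m.toolCallId !== failedCallId);
  return [...kept, { role: "system", content: nudge }];
}

const context: Message[] = [
  { role: "user", content: "fetch the report" },
  { role: "assistant", content: "calling slow_search", toolCallId: "c1" },
  { role: "tool", content: "504 after 28s", toolCallId: "c1" },
];
const cleaned = scrubFailedCall(context, "c1", "Prefer a faster tool for this step.");
```

This removes the call as well as its result, so the planner does not see a dangling call with no answer. Whether that is preferable to keeping the call visible is a judgment call.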
tags · Hermes-deployment · agent-eval

5 replies · sorted by votes

@halcy.on · 8m ago · 84 votes

We hit the same wall. Solution that stuck: retry at the *tool* layer with a budget, and only escalate to the agent if the tool itself is unhealthy. Agent retries are too expensive once the planner has committed.

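```ts
// timeout(ms) is assumed to reject with a timeout error after ms; isTimeout(e) detects it.
const withBudget = (fn, ms = 25_000, retries = 2) => async (...args) => {
  // One initial attempt plus up to `retries` retries, each capped at ms.
  for (let i = 0; i <= retries; i++) {
    try { return await Promise.race([fn(...args), timeout(ms)]); }
    // Rethrow non-timeout errors immediately, and the last timeout once the budget is spent.
    catch (e) { if (i === retries || !isTimeout(e)) throw e; }
  }
};
```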
@cousingreg · 5m ago · 22 votes

This maps to what I want. Does the budget reset between planner steps or accumulate across the whole run?

@halcy.on · 3m ago · 18 votes

Per planner step. Whole-run budgets caused stalls when one tool degraded — the agent would hoard retries for nothing.
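A minimal sketch of what a per-step budget could look like, assuming the agent loop calls some hook at the start of each planner step. `RetryBudget`, `startStep`, and `tryConsume` are made-up names for illustration, not part of Hermes.

```ts
// Hypothetical per-step retry budget; names are illustrative only.
class RetryBudget {
  private readonly perStep: number;
  private remaining: number;

  constructor(perStep: number) {
    this.perStep = perStep;
    this.remaining = perStep;
  }

  // Call when the planner moves to a new step: the budget refills.
  startStep(): void {
    this.remaining = this.perStep;
  }

  // Returns true and spends one retry if any are left in this step.
  tryConsume(): boolean {
    if (this.remaining <= 0) return false;
    this.remaining -= 1;
    return true;
  }
}

const budget = new RetryBudget(2);
budget.tryConsume(); // spends the first retry of this step
budget.startStep();  // next planner step: back to 2 retries
```

Because the counter refills on every step, a degraded tool in one step cannot drain retries that later steps would need, and there is nothing for the agent to save up.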

@drift.ai · 6m ago · 41 votes

Hot take: don't retry. Make the planner *aware* of the timeout and let it choose a smaller tool. Retries hide the real failure mode.
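One way to read this, as a sketch only: surface the timeout to the planner as an ordinary observation rather than an exception. The `ToolOutcome` type, `toObservation`, and the tool names below are hypothetical, and how the planner actually consumes observations is an assumption.

```ts
// Hypothetical outcome type: a timeout is data, not an exception.
type ToolOutcome =
  | { kind: "ok"; tool: string; result: string }
  | { kind: "timeout"; tool: string; elapsedMs: number; cheaperTools: string[] };

// Render the outcome as text the planner sees on its next step.
function toObservation(o: ToolOutcome): string {
  if (o.kind === "ok") return `${o.tool} returned: ${o.result}`;
  const hint =
    o.cheaperTools.length > 0
      ? ` Smaller tools available: ${o.cheaperTools.join(", ")}.`
      : "";
  return `${o.tool} timed out after ${o.elapsedMs}ms.${hint}`;
}

const observation = toObservation({
  kind: "timeout",
  tool: "full_scan",
  elapsedMs: 25_000,
  cheaperTools: ["index_lookup"],
});
```

The failure stays visible in the run history, and the choice of a smaller tool is left to the planner instead of being hidden inside a retry loop.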

@sam_w · 2m ago · 12 votes

We swapped to streaming partial results so the agent can short-circuit before the timeout fires. Cut p99 by 60%. Happy to share the SSE shim.
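To illustrate the short-circuit idea (this is not the SSE shim mentioned above, only a generic sketch): if a tool exposes its output as an async stream of chunks, the consumer can stop reading as soon as it has enough. `collectUntil` and `isEnough` are hypothetical names, and the assumption is that chunks arrive as strings.

```ts
// Read chunks until the caller decides the partial result is good enough.
async function collectUntil(
  stream: AsyncIterable<string>,
  isEnough: (partial: string[]) => boolean,
): Promise<string[]> {
  const partial: string[] = [];
  for await (const chunk of stream) {
    partial.push(chunk);
    // break calls the iterator's return(), letting the producer clean up.
    if (isEnough(partial)) break;
  }
  return partial;
}

// Stand-in for a slow tool that streams results one at a time.
async function* slowTool(): AsyncGenerator<string> {
  for (const row of ["row-1", "row-2", "row-3", "row-4"]) {
    yield row;
  }
}

// Stop after two rows instead of waiting for the full result.
const firstTwo = collectUntil(slowTool(), (p) => p.length >= 2);
```

With an async generator as the producer, breaking out of the loop also stops further chunks from being generated, so the remaining work is never done.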