Your agents crash.
Ours recover.
JamJet checkpoints every step as it happens. When a worker dies mid-run, the scheduler reclaims the lease and resumes from exactly where it stopped — no lost work, no duplicate actions, no reruns.
See it run
$ jamjet run research-pipeline.yaml
▸ Starting execution exec_7f3a...
▸ [Plan] ✓ completed 420ms
▸ [Research] ✓ completed 1.2s
▸ [Analyze] ✗ worker crashed
▸ Lease expired · reclaiming...
▸ [Analyze] ✓ resumed 890ms
▸ [Review] ✓ completed 650ms
▸ [Synthesize] ✓ completed 1.1s
▸ Execution complete · 5/5 nodes · 0 events lost
from jamjet import task, workflow

@task
async def analyze(data: dict) -> dict:
    # your logic here; crash-safe by default
    return {"summary": llm.call(data)}

@workflow
async def pipeline():
    raw = await fetch_data()
    out = await analyze(raw)
    return out
12 lines of Python. Crash-safe by default.
Why teams care
Completed steps stay completed. No wasted tokens after failure.
Downstream actions are less likely to be repeated after failure.
Replay the exact execution instead of reconstructing it from logs.
Long-running workflows survive crashes, restarts, and lease handoffs.
Six things that go wrong with agents.
Six gates that catch them.
Each demo shows what your code looked like, what JamJet logged, and what the runtime decided.
@workflow
async def pipeline(data):
    a = await analyze(data)  # crash here
    b = await synthesize(a)
    return b

▸ [analyze] ✗ worker crashed at 1.4s
▸ Lease expired · scheduler reclaiming...
▸ [analyze] ✓ resumed (event-sourced) 920ms
▸ [synthesize] ✓ completed 1.1s
▸ exec_7f3a · 0 events lost · 0 reruns

Reclaimed the lease, replayed the event log, and resumed at the exact failed node. Completed steps were not re-run.
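The resume-from-log behavior described above can be sketched in plain Python. This is a minimal illustration, not JamJet's implementation: `EventLog`, `run_pipeline`, `WorkerCrash`, and the step functions are hypothetical names, and the log lives in memory rather than in durable storage. The idea is that each step is recorded once it succeeds, so a second run skips anything already in the log.

```python
import asyncio


class WorkerCrash(Exception):
    """Stands in for a worker process dying mid-step."""


class EventLog:
    """Hypothetical append-only log of completed steps (in memory for the sketch)."""

    def __init__(self):
        self.events = []  # list of (step_name, result)

    def result_for(self, step):
        for name, result in self.events:
            if name == step:
                return True, result
        return False, None

    def append(self, step, result):
        self.events.append((step, result))


async def run_pipeline(steps, log, data, calls):
    """Run steps in order, skipping any step already recorded in the log."""
    value = data
    for name, fn in steps:
        done, cached = log.result_for(name)
        if done:
            value = cached  # replayed from the log, not re-executed
            continue
        calls.append(name)  # record every real execution attempt
        value = await fn(value)
        log.append(name, value)  # checkpoint only after the step succeeds
    return value


def make_flaky(fail_times):
    """Build an analyze step that crashes the first `fail_times` times."""
    state = {"left": fail_times}

    async def analyze(x):
        if state["left"] > 0:
            state["left"] -= 1
            raise WorkerCrash("worker died")
        return x + ["analyzed"]

    return analyze


async def plan(x):
    return x + ["planned"]


async def synthesize(x):
    return x + ["synthesized"]


async def demo():
    log, calls = EventLog(), []
    steps = [("plan", plan), ("analyze", make_flaky(1)), ("synthesize", synthesize)]
    try:
        await run_pipeline(steps, log, [], calls)
    except WorkerCrash:
        pass  # a scheduler would reclaim the lease at this point
    result = await run_pipeline(steps, log, [], calls)  # resume from the log
    return result, calls


result, calls = asyncio.run(demo())
```

In this sketch `plan` executes once even though the pipeline runs twice, because the second run reads its result back from the log.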
@task(tools=[search, read_files])
async def assistant(q):
    # `delete_database` is NOT in the allow-list
    return await agent.run(q)

▸ [assistant] agent requests tool: delete_database
▸ Policy: ALLOWED_TOOLS = {search, read_files}
▸ Decision: BLOCKED · tool not in allow-list
▸ Audit: evt_a3b8 → policy.deny[delete_database]
▸ Agent: receives "tool unavailable; pick another"

Blocked the tool call before execution. The audit log records what was attempted, who attempted it, and why it was denied.
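An allow-list gate of this kind can be sketched in a few lines. The `ToolGate` class and the stand-in tools below are hypothetical and only illustrate the pattern: check the tool name before anything executes, and write an audit record either way.

```python
from dataclasses import dataclass, field


@dataclass
class ToolGate:
    """Hypothetical allow-list check that runs before any tool executes."""

    allowed: frozenset
    audit: list = field(default_factory=list)

    def call(self, agent, tool, fn, *args):
        if tool not in self.allowed:
            # Record who asked for what and why it was refused; never call fn.
            self.audit.append({"agent": agent, "tool": tool,
                               "decision": "deny",
                               "reason": "tool not in allow-list"})
            return "tool unavailable; pick another"
        self.audit.append({"agent": agent, "tool": tool,
                           "decision": "allow", "reason": "in allow-list"})
        return fn(*args)


executed = []  # tracks which stand-in tools actually ran


def delete_database():
    executed.append("delete_database")
    return "deleted"


def search(query):
    executed.append("search")
    return f"results for {query}"


gate = ToolGate(allowed=frozenset({"search", "read_files"}))
denied = gate.call("assistant", "delete_database", delete_database)
ok = gate.call("assistant", "search", search, "jamjet")
```

The denied call returns a message the agent can act on, and `delete_database` never runs.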
@workflow
async def transfer(amount, to):
    await guard.requires_approval(
        action="wire_transfer",
        amount=amount,
    )
    return await bank.send(amount, to)

▸ [transfer] action: wire_transfer · $50,000
▸ [approval] suspended · approver: [email protected]
...worker restarts after deploy...
▸ [approval] resumed · still waiting (durable)
▸ [approval] approved by [email protected] at 14:08
▸ [bank.send] ✓ completed 1.8s

Suspended the run durably until a human decided. Survived a worker restart. Resumed only after the approval landed in the audit log.
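One way to picture a durable pause is to keep the approval state outside worker memory. The sketch below assumes a JSON file as the store. `ApprovalStore`, `request_approval`, `approve`, and `try_send` are hypothetical helpers, not JamJet APIs, and a real runtime would use a proper database.

```python
import json
import os
import tempfile


class ApprovalStore:
    """Hypothetical durable store: state lives in a JSON file, not in the worker."""

    def __init__(self, path):
        self.path = path

    def load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def save(self, state):
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self.path)  # atomic rename: no half-written file


def request_approval(store, run_id, action, amount):
    state = store.load()
    state.setdefault(run_id, {"action": action, "amount": amount,
                              "status": "pending"})
    store.save(state)


def approve(store, run_id, approver):
    state = store.load()
    state[run_id].update(status="approved", approver=approver)
    store.save(state)


def try_send(store, run_id, send):
    """Run the guarded action only if the stored status is 'approved'."""
    entry = store.load().get(run_id)
    if entry is None or entry["status"] != "approved":
        return None  # still waiting
    return send(entry["amount"])


path = os.path.join(tempfile.mkdtemp(), "approvals.json")
request_approval(ApprovalStore(path), "run-1", "wire_transfer", 50_000)

# Simulate a worker restart: a fresh store object reads the same file.
restarted = ApprovalStore(path)
before = try_send(restarted, "run-1", lambda amount: f"sent {amount}")
approve(restarted, "run-1", "approver-1")
after = try_send(restarted, "run-1", lambda amount: f"sent {amount}")
```

Because the pending request is read back from disk, the restarted worker still refuses to send until the approval is recorded.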
@task(
    strategy="reflection",
    max_iters=8,
    max_cost=0.50,
)
async def reasoner(q):
    return await loop_until_confident(q)

▸ [reasoner] iter 1 · $0.08 · confidence 0.62
▸ [reasoner] iter 2 · $0.16 · confidence 0.71
▸ [reasoner] iter 3 · $0.27 · confidence 0.78
▸ [reasoner] iter 4 · $0.41 · confidence 0.81
▸ Budget: $0.41 + projected $0.18 > cap $0.50
▸ Decision: HALT · returning best-so-far (0.81)

The runtime stopped the loop before it exceeded the cost cap. The best-so-far answer was returned with full cost telemetry attached.
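The halt decision in the log comes down to one check: spend so far plus the projected cost of the next iteration, compared against the cap. Here is a minimal sketch of that check, using integer cents to avoid floating-point drift. The per-iteration costs are read off the log above. The projection rule in `project_next` is an assumption for illustration and does not reproduce the runtime's own $0.18 estimate, and `run_with_budget` is a hypothetical name.

```python
def project_next(history):
    """Assumed projection rule: last cost plus the most recent increase."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + max(0, history[-1] - history[-2])


def run_with_budget(iterations, cap_cents, max_iters):
    """Consume (cost_cents, confidence, answer) tuples until a limit is hit.

    Returns (best, spent_cents, reason), where best is (confidence, answer).
    """
    spent, best, history = 0, None, []
    steps = iter(iterations)
    for _ in range(max_iters):
        # Check the projection BEFORE paying for another iteration.
        if history and spent + project_next(history) > cap_cents:
            return best, spent, "halt: projected cost exceeds cap"
        step = next(steps, None)
        if step is None:
            return best, spent, "done"
        cost, confidence, answer = step
        spent += cost
        history.append(cost)
        if best is None or confidence > best[0]:
            best = (confidence, answer)
    return best, spent, "halt: max_iters reached"


# Per-iteration costs in cents; the fifth entry should never be consumed.
runs = [(8, 0.62, "draft 1"), (8, 0.71, "draft 2"), (11, 0.78, "draft 3"),
        (14, 0.81, "draft 4"), (18, 0.85, "draft 5")]
best, spent, reason = run_with_budget(runs, cap_cents=50, max_iters=8)
```

With these inputs the loop stops after four iterations at 41 cents and returns the 0.81-confidence draft, since a fifth iteration is projected to cross the 50-cent cap.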
# after a run completes:
bundle = await jamjet.audit.export(
    run_id="exec_7f3a",
    format="pdf",
    include=["events", "tools",
             "policy", "approvals"],
)

▸ Building evidence package for exec_7f3a...
▸ Events: 247 (signed, immutable)
▸ Tool calls: 18 (args, results, latency)
▸ Policy: 12 (3 allowed, 9 evaluated)
▸ Approvals: 1 ([email protected], 14:08)
▸ Output: audit_exec_7f3a.pdf · 4.2 MB

Every decision the runtime made is in the bundle. Hand it to security, compliance, or your auditor, with no log-stitching required.
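The page does not say how events are signed. One common way to make an event log tamper-evident is an HMAC chain, where each signature covers the event plus the previous signature. The sketch below shows that technique using only the standard library. It is an assumption about the general approach, not a description of JamJet's bundle format, and `sign_chain` and `verify_chain` are hypothetical names.

```python
import copy
import hashlib
import hmac
import json


def sign_chain(events, key):
    """Sign each event together with the previous signature."""
    prev, records = b"", []
    for event in events:
        payload = json.dumps(event, sort_keys=True).encode()
        sig = hmac.new(key, prev + payload, hashlib.sha256).hexdigest()
        records.append({"event": event, "sig": sig})
        prev = sig.encode()
    return records


def verify_chain(records, key):
    """Recompute every signature; any edit to an earlier event breaks the chain."""
    prev = b""
    for record in records:
        payload = json.dumps(record["event"], sort_keys=True).encode()
        expected = hmac.new(key, prev + payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, record["sig"]):
            return False
        prev = record["sig"].encode()
    return True


key = b"demo-key"  # illustrative only; a real key would come from a secret store
records = sign_chain(
    [{"type": "policy.deny", "tool": "delete_database"},
     {"type": "approval", "action": "wire_transfer"}],
    key,
)

tampered = copy.deepcopy(records)
tampered[0]["event"]["tool"] = "search"  # rewrite history

intact_ok = verify_chain(records, key)
tampered_ok = verify_chain(tampered, key)
```

Verification passes for the untouched records and fails once the first event is altered.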
# session 8: agent has talked to user before
context = await engram.context_for(
    user="sunil",
    topic="release notes",
    budget_tokens=400,
)

[ from earlier sessions ]
▸ sunil prefers concise answers (s1)
▸ project: JamJet Cloud (s1)
▸ shipped policy violations panel (s7)
▸ location: Amsterdam (superseded Pune, s6)
[ token budget: 287 of 400 ]

Engram surfaced durable facts (with timestamps and supersession markers) instead of dumping raw chat history into context. The result stays under the token budget.
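The two ideas in this demo, newer facts superseding older ones and selection under a token budget, can be sketched without any memory library. `Fact` and `context_for` below are hypothetical names. Counting tokens by whitespace-separated words is a crude stand-in for a real tokenizer, and the newest-first selection order is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    key: str      # what the fact is about, e.g. "location"
    text: str
    session: int  # session in which the fact was learned


def context_for(facts, budget_tokens, count_tokens=lambda s: len(s.split())):
    """Keep the latest fact per key, then pack newest-first under the budget."""
    latest = {}
    for fact in sorted(facts, key=lambda f: f.session):
        latest[fact.key] = fact  # a later session supersedes an earlier one

    chosen, used = [], 0
    for fact in sorted(latest.values(), key=lambda f: -f.session):
        cost = count_tokens(fact.text)
        if used + cost > budget_tokens:
            continue  # skip facts that would overflow the budget
        chosen.append(fact)
        used += cost
    return chosen, used


facts = [
    Fact("style", "prefers concise answers", 1),
    Fact("location", "Pune", 1),
    Fact("location", "Amsterdam", 6),
    Fact("shipped", "shipped policy violations panel", 7),
]
chosen, used = context_for(facts, budget_tokens=6)
texts = [fact.text for fact in chosen]
```

Here the superseded location never reaches the context, and the oldest fact is dropped because it would push the total past the six-token budget.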
How teams use JamJet under load
Investment Due Diligence
A durable multi-agent workflow for report generation, risk review, and compliance checks.
RAG Assistant
A retrieval-and-synthesis workflow where every step is checkpointed, traceable, and replayable.
Human Approval Workflow
Pause durably for a human decision, then resume without losing state or re-running prior work.
MCP Tool Integration
Use external tools through MCP while keeping each tool call inside the durable runtime.
Agent-to-Agent Delegation
Delegate to specialized agents via A2A with identity-aware, cost-aware, replayable execution.