Your agents crash.
Ours recover.

JamJet checkpoints every step as it happens. When a worker dies mid-run, the scheduler reclaims the lease and resumes from the last checkpoint: no lost work, no duplicate actions, and no reruns of completed steps.

$ pip install jamjet

Read the docs

Live execution

See it run

jamjet · research-pipeline.yaml

$ jamjet run research-pipeline.yaml
▸ Starting execution exec_7f3a...
▸ [Plan]       ✓ completed  420ms
▸ [Research]   ✓ completed  1.2s
▸ [Analyze]    ✗ worker crashed
▸ Lease expired · reclaiming...
▸ [Analyze]    ✓ resumed    890ms
▸ [Review]     ✓ completed  650ms
▸ [Synthesize] ✓ completed  1.1s
▸ Execution complete · 5/5 nodes · 0 events lost

pipeline.py Python · 12 lines

from jamjet import task, workflow

@task
async def analyze(data: dict) -> dict:
    # your logic here — crash-safe by default
    return {"summary": llm.call(data)}

@workflow
async def pipeline():
    raw  = await fetch_data()
    out  = await analyze(raw)
    return out

12 lines of Python. Crash-safe by default.

Impact

Why teams care

Lower rerun cost

Completed steps stay completed. No wasted tokens after failure.

Safer side effects

Downstream actions are less likely to be repeated after failure.

Faster debugging

Replay the exact execution instead of reconstructing it from logs.

More reliable operations

Long-running workflows survive crashes, restarts, and lease handoffs.

Failure modes

Six things that go wrong with agents.
Six gates that catch them.

Each failure mode below shows what your code looked like, what JamJet logged, and what the runtime decided.

The worker dies mid-run. Crash recovery

Your code

@workflow
async def pipeline(data):
    a = await analyze(data)     # crash here
    b = await synthesize(a)
    return b

Runtime

▸ [analyze]    ✗ worker crashed at 1.4s
▸ Lease expired · scheduler reclaiming...
▸ [analyze]    ✓ resumed (event-sourced) 920ms
▸ [synthesize] ✓ completed 1.1s
▸ exec_7f3a · 0 events lost · 0 reruns

What JamJet did

Reclaimed the lease, replayed the event log, and resumed at the exact failed node. Completed steps were not re-run.
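
The replay behavior described above can be approximated in plain Python. This is a minimal sketch, not JamJet's implementation: `EventLog`, `run_with_replay`, `WorkerCrash`, and the step functions are hypothetical names, and the log lives in memory rather than in durable storage. It only illustrates how a log of completed steps lets a second worker skip finished work.

```python
import asyncio

class EventLog:
    """Hypothetical append-only record of completed step results."""
    def __init__(self):
        self.completed = {}  # step name -> result

    def record(self, step, result):
        self.completed[step] = result

class WorkerCrash(Exception):
    """Stands in for a worker process dying mid-step."""

async def run_with_replay(steps, log, data):
    """Run steps in order, reusing any result already in the log."""
    value = data
    for name, fn in steps:
        if name in log.completed:
            value = log.completed[name]  # replayed, not re-executed
            continue
        value = await fn(value)
        log.record(name, value)          # checkpoint after each step
    return value

calls = []
crash_once = {"armed": True}

async def plan(x):
    calls.append("plan")
    return x + ["planned"]

async def analyze(x):
    calls.append("analyze")
    if crash_once["armed"]:
        crash_once["armed"] = False
        raise WorkerCrash("worker died mid-step")
    return x + ["analyzed"]

async def main():
    log = EventLog()
    steps = [("plan", plan), ("analyze", analyze)]
    try:
        await run_with_replay(steps, log, [])
    except WorkerCrash:
        pass  # a second worker picks up the run from the same log
    return await run_with_replay(steps, log, [])

result = asyncio.run(main())
```

In this sketch `plan` executes once even though the run is attempted twice; only the step that failed is executed again.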

The agent reaches for a tool it shouldn't. Policy enforcement

Your code

@task(tools=[search, read_files])
async def assistant(q):
    # `delete_database` is NOT in the allow-list
    return await agent.run(q)

Runtime

▸ [assistant]  agent requests tool: delete_database
▸ Policy:      ALLOWED_TOOLS = {search, read_files}
▸ Decision:    BLOCKED · tool not in allow-list
▸ Audit:       evt_a3b8 → policy.deny[delete_database]
▸ Agent:       receives "tool unavailable; pick another"

What JamJet did

Blocked the tool call before execution. The audit log records what was attempted, who attempted it, and why it was denied.
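
An allow-list check of this kind can be sketched in a few lines. The names below (`ToolPolicy`, `call_tool`, `registry`) are assumptions made for illustration and are not JamJet's API; the point is that the check runs before the tool does, and that every decision is appended to an audit list.

```python
from dataclasses import dataclass, field

@dataclass
class ToolPolicy:
    """Hypothetical allow-list policy with an in-memory audit trail."""
    allowed: frozenset
    audit: list = field(default_factory=list)

    def check(self, agent, tool):
        ok = tool in self.allowed
        self.audit.append({
            "agent": agent,
            "tool": tool,
            "decision": "allow" if ok else "deny",
            "reason": None if ok else "tool not in allow-list",
        })
        return ok

def call_tool(policy, agent, tool, registry, *args):
    """Consult the policy first; the tool runs only if it is allowed."""
    if not policy.check(agent, tool):
        return "tool unavailable; pick another"
    return registry[tool](*args)

registry = {
    "search": lambda q: f"results for {q}",
    "delete_database": lambda: "dropped",
}
policy = ToolPolicy(allowed=frozenset({"search", "read_files"}))

blocked = call_tool(policy, "assistant", "delete_database", registry)
allowed = call_tool(policy, "assistant", "search", registry, "jamjet")
```

The denied call never reaches `registry`, and the first audit entry records the agent, the tool it asked for, and the reason for the denial.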

A high-risk action waits for a human. Durable approval

Your code

@workflow
async def transfer(amount, to):
    await guard.requires_approval(
        action="wire_transfer",
        amount=amount,
    )
    return await bank.send(amount, to)

Runtime

▸ [transfer]   action: wire_transfer · $50,000
▸ [approval]   suspended · approver: [email protected]
  ...worker restarts after deploy...
▸ [approval]   resumed · still waiting (durable)
▸ [approval]   approved by [email protected] at 14:08
▸ [bank.send]  ✓ completed 1.8s

What JamJet did

Suspended the run durably until a human decided. Survived a worker restart. Resumed only after the approval landed in the audit log.
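
One way to picture a durable suspension is to keep the pending approval in storage that outlives the worker process. The sketch below assumes a JSON file as that storage; `ApprovalStore`, `try_transfer`, and the approver name are hypothetical and do not reflect how JamJet persists state.

```python
import json
import os
import tempfile

class ApprovalStore:
    """Hypothetical file-backed store for pending approvals."""
    def __init__(self, path):
        self.path = path

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def _save(self, state):
        with open(self.path, "w") as f:
            json.dump(state, f)

    def request(self, run_id, action, amount):
        state = self._load()
        # setdefault keeps an existing record, so re-requesting after a
        # restart does not reset an approval that is already in flight.
        state.setdefault(run_id, {"action": action, "amount": amount,
                                  "status": "pending", "approver": None})
        self._save(state)

    def approve(self, run_id, approver):
        state = self._load()
        state[run_id].update(status="approved", approver=approver)
        self._save(state)

    def status(self, run_id):
        return self._load()[run_id]["status"]

def try_transfer(store, run_id, amount, sent):
    """Send only once the stored approval says 'approved'."""
    store.request(run_id, "wire_transfer", amount)
    if store.status(run_id) != "approved":
        return "suspended"
    sent.append(amount)
    return "completed"

path = os.path.join(tempfile.mkdtemp(), "approvals.json")
sent = []
first = try_transfer(ApprovalStore(path), "exec_7f3a", 50_000, sent)
# Simulated restart: a brand-new store object reads the same file.
after_restart = try_transfer(ApprovalStore(path), "exec_7f3a", 50_000, sent)
ApprovalStore(path).approve("exec_7f3a", "approver-1")
final = try_transfer(ApprovalStore(path), "exec_7f3a", 50_000, sent)
```

Because each call builds a fresh `ApprovalStore`, nothing is carried over in memory; the transfer happens exactly once, and only after the approval is written.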

The reflection loop won't stop spending. Cost guardrail

Your code

@task(
    strategy="reflection",
    max_iters=8,
    max_cost=0.50,
)
async def reasoner(q):
    return await loop_until_confident(q)

Runtime

▸ [reasoner]   iter 1 · $0.08 · confidence 0.62
▸ [reasoner]   iter 2 · $0.16 · confidence 0.71
▸ [reasoner]   iter 3 · $0.27 · confidence 0.78
▸ [reasoner]   iter 4 · $0.41 · confidence 0.81
▸ Budget:      $0.41 + projected $0.18 > cap $0.50
▸ Decision:    HALT · returning best-so-far (0.81)

What JamJet did

The runtime stopped the loop before it exceeded the cost cap. The best-so-far answer was returned with full cost telemetry attached.
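
The halt decision can be sketched as a check made after each iteration: if the money already spent plus a projection for the next iteration would exceed the cap, stop and return the best answer so far. How JamJet projects the next cost is not stated here, so this sketch assumes the next iteration costs about as much as the last one. `run_with_budget` is a hypothetical helper, and costs are kept in integer cents to avoid floating-point drift.

```python
def run_with_budget(iterations, max_iters, max_cost_cents):
    """Consume (cost_cents, confidence, answer) tuples until a limit trips."""
    spent, best, reason = 0, None, "exhausted"
    for i, (cost, confidence, answer) in enumerate(iterations, start=1):
        spent += cost
        if best is None or confidence > best[0]:
            best = (confidence, answer)
        if i >= max_iters:
            reason = "max_iters"
            break
        # Assumption: the next iteration costs about as much as this one.
        if spent + cost > max_cost_cents:
            reason = "budget"
            break
    confidence, answer = best if best else (None, None)
    return {"answer": answer, "confidence": confidence,
            "spent_cents": spent, "reason": reason}

# Per-iteration costs chosen so the running total matches the demo log:
# 8, 16, 27, 41 cents.
trace = [
    (8, 0.62, "draft 1"),
    (8, 0.71, "draft 2"),
    (11, 0.78, "draft 3"),
    (14, 0.81, "draft 4"),
    (18, 0.85, "draft 5"),  # never reached
]
result = run_with_budget(iter(trace), max_iters=8, max_cost_cents=50)
```

With these numbers the loop stops after the fourth iteration, having spent 41 cents, and returns "draft 4" without paying for a fifth.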

Compliance asks for evidence of what happened. Audit export

Your code

# after a run completes:
bundle = await jamjet.audit.export(
    run_id="exec_7f3a",
    format="pdf",
    include=["events", "tools",
             "policy", "approvals"],
)

Runtime

▸ Building evidence package for exec_7f3a...
▸ Events:      247 (signed, immutable)
▸ Tool calls:  18  (args, results, latency)
▸ Policy:      12  (3 allowed, 9 evaluated)
▸ Approvals:   1   ([email protected], 14:08)
▸ Output:      audit_exec_7f3a.pdf · 4.2 MB

What JamJet did

Every decision the runtime made is in the bundle. Hand it to security, compliance, or your auditor — no log-stitching required.
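
The demo log describes events as "signed, immutable". One common way to get that property is an HMAC hash chain, where each signature covers the previous signature as well as the event itself. The sketch below uses that technique with the standard library and emits plain Python structures rather than a PDF; it is an assumption about how such a bundle could be made tamper-evident, not a description of JamJet's export format. `sign_events` and `verify` are hypothetical names.

```python
import copy
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous signature" for the first event

def _sign(prev, event, key):
    payload = json.dumps(event, sort_keys=True)
    return hmac.new(key, (prev + payload).encode(), hashlib.sha256).hexdigest()

def sign_events(events, key):
    """Chain events so each signature depends on everything before it."""
    chained, prev = [], GENESIS
    for event in events:
        sig = _sign(prev, event, key)
        chained.append({"event": event, "prev": prev, "sig": sig})
        prev = sig
    return chained

def verify(chained, key):
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev = GENESIS
    for entry in chained:
        expected = _sign(prev, entry["event"], key)
        if entry["prev"] != prev or not hmac.compare_digest(entry["sig"], expected):
            return False
        prev = entry["sig"]
    return True

key = b"demo-key-not-for-production"
events = [
    {"type": "tool_call", "tool": "search"},
    {"type": "policy", "decision": "deny", "tool": "delete_database"},
    {"type": "approval", "status": "approved"},
]
bundle = sign_events(events, key)
intact = verify(bundle, key)

tampered = copy.deepcopy(bundle)
tampered[1]["event"]["decision"] = "allow"  # rewrite history
tamper_detected = not verify(tampered, key)
```

An auditor holding the key can rerun `verify` on the bundle; changing a single recorded decision invalidates that entry and every entry after it.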

The agent forgot what it knows about you. Engram recall

Your code

# session 8 — agent has talked to user before
context = await engram.context_for(
    user="sunil",
    topic="release notes",
    budget_tokens=400,
)

Runtime

[ from earlier sessions ]
▸ sunil prefers concise answers          (s1)
▸ project: JamJet Cloud                  (s1)
▸ shipped policy violations panel        (s7)
▸ location: Amsterdam (superseded Pune, s6)
[ token budget: 287 of 400 ]

What JamJet did

Engram surfaced durable facts, each tagged with the session it came from and any fact it supersedes, instead of dumping raw chat history into context. The result stays under the token budget.
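
The recall step can be pictured as two passes: keep only the newest fact per key, then add facts to the context until the token budget is used up. The sketch below is illustrative only. `Fact` and this standalone `context_for` are hypothetical and are not Engram's API, and tokens are approximated by whitespace-separated words.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    key: str
    value: str
    session: int

def context_for(facts, budget_tokens):
    """Return (lines, tokens_used) with superseded facts dropped."""
    latest = {}
    for fact in sorted(facts, key=lambda f: f.session):
        latest[fact.key] = fact  # a later session supersedes an earlier one
    lines, used = [], 0
    for fact in latest.values():
        line = f"{fact.key}: {fact.value} (s{fact.session})"
        cost = len(line.split())  # crude token estimate: word count
        if used + cost > budget_tokens:
            break
        lines.append(line)
        used += cost
    return lines, used

facts = [
    Fact("style", "concise answers", 1),
    Fact("project", "JamJet Cloud", 1),
    Fact("location", "Pune", 1),
    Fact("location", "Amsterdam", 6),
]
full_lines, full_used = context_for(facts, budget_tokens=400)
tight_lines, tight_used = context_for(facts, budget_tokens=8)
```

With a generous budget all three current facts are returned and the older location is dropped; with a budget of 8 only the first two facts fit.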

Real workflows

How teams use JamJet under load

Investment Due Diligence

A durable multi-agent workflow for report generation, risk review, and compliance checks.

multi-agent · a2a · durable

due-diligence.py

@task(model="claude-sonnet-4-6", tools=[market_data, sec_filings])
async def analyze_company(ticker: str) -> Report:
    """Deep financial analysis with risk assessment."""

RAG Assistant

A retrieval-and-synthesis workflow where every step is checkpointed, traceable, and replayable.

rag · tools · durable

rag-agent.py

@task(tools=[search, retrieve, summarize])
async def answer(question: str) -> str:
    """Search knowledge base, retrieve docs, synthesize answer."""

Human Approval Workflow

Pause durably for a human decision, then resume without losing state or re-running prior work.

hitl · durable · production

approval.py

@workflow.step(type="human_approval")
async def approve(state):
    return await state.wait_for_approval()

MCP Tool Integration

Use external tools through MCP while keeping each tool call inside the durable runtime.

mcp · tools · interop

mcp-agent.py

@task(tools=[web_search, calculator], strategy="react")
async def research(q: str) -> str:
    """Research with MCP tools, crash-safe."""

Agent-to-Agent Delegation

Delegate to specialized agents via A2A with identity-aware, cost-aware, replayable execution.

a2a · routing · production

delegation.py

card = await discover("/.well-known/agent.json")
result = await delegate(card, task="analyze report")
