# JamJet Benchmarks — Agent Framework Overhead Comparison
Framework orchestration overhead — JamJet vs LangGraph vs raw LLM call.
All runners make the same LLM call through the same client. The difference is pure framework tax.
## llama3.2 · Ollama · Apple M-series · 2026-03-08
| Framework | mean (ms) | median | p95 | p99 | stdev | overhead | visual |
|---|---|---|---|---|---|---|---|
| Raw (baseline) | 947.2 | 943.7 | 970.3 | 972.2 | 9.9 | — | |
| JamJet 0.1.1 | 948.6 | 948.2 | 959.0 | 964.2 | 6.0 | +1.4ms | |
| LangGraph | 944.0 | 943.0 | 953.8 | 961.1 | 8.1 | -3.2ms | |
Note: All three runners are within measurement noise (~1ms of each other). JamJet's in-process executor adds no observable overhead over a raw LLM call.
## qwen3:8b (thinking mode) · Ollama · Apple M-series · 2026-03-08
| Framework | mean (ms) | median | p95 | p99 | stdev | overhead | visual |
|---|---|---|---|---|---|---|---|
| Raw (baseline) | 8429.5 | 8303.4 | 8940.3 | 9427.6 | 352.3 | — | |
| JamJet 0.1.1 | 10140.1 | 10139.1 | 10487.0 | 10519.5 | 285.1 | +1710.6ms | |
| LangGraph | 11902.9 | 11923.3 | 12761.8 | 12823.5 | 551.7 | +3473.3ms | |
Note: qwen3:8b generates variable-length chain-of-thought, so the high stdev dominates — the overhead numbers here reflect token-generation variance, not framework overhead.
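The summary columns in the tables above (mean, median, p95, p99, stdev) can be computed from raw per-run latency samples with Python's standard library; a minimal sketch:

```python
import statistics

def summarize(samples_ms):
    """Summary stats matching the table columns, from per-run latencies in ms."""
    # 99 cut points; index 94 is the 95th percentile, index 98 the 99th.
    qs = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.mean(samples_ms),
        "median": statistics.median(samples_ms),
        "p95": qs[94],
        "p99": qs[98],
        "stdev": statistics.stdev(samples_ms),  # sample stdev, as reported
    }
```

The "overhead" column is then just `framework["mean"] - baseline["mean"]`.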
## Vertex AI (Gemini 2.0 Flash) — plan-and-execute agent
End-to-end run: JamJet `@task` + `@tool` on Vertex AI's OpenAI-compatible endpoint. A research agent that plans, executes each step, then synthesizes.
| Step | Latency (ms) | Prompt tokens | Compl tokens | Total tokens |
|---|---|---|---|---|
| plan — Gemini Flash | 2,641 | 96 | 104 | 200 |
| step 1 execution | 1,324 | 129 | 90 | 219 |
| step 2 execution | 1,413 | 127 | 110 | 237 |
| step 3 execution | 1,228 | 132 | 103 | 235 |
| step 4 execution | 1,986 | 664 | 186 | 850 |
| step 5 execution | 1,290 | 280 | 100 | 380 |
| synthesize — Gemini Flash | 3,050 | 124 | 153 | 277 |
| TOTAL (12 calls) | 41,811 | 6,121 | 4,840 | 10,961 |

Totals cover all 12 LLM calls in the run; only a representative subset of steps is itemized above.
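Per-step traces like the rows above roll up into run totals by simple summation. A minimal sketch, using illustrative values from the table (plain dicts, not a JamJet API):

```python
# Each dict is one LLM call's trace: latency and token usage.
steps = [
    {"name": "plan", "ms": 2641, "prompt": 96, "completion": 104},
    {"name": "synthesize", "ms": 3050, "prompt": 124, "completion": 153},
]

def totals(steps):
    """Aggregate per-step traces into the TOTAL row's columns."""
    return {
        "ms": sum(s["ms"] for s in steps),
        "prompt": sum(s["prompt"] for s in steps),
        "completion": sum(s["completion"] for s in steps),
        "total": sum(s["prompt"] + s["completion"] for s in steps),
    }
```

With an OpenAI-compatible client, the per-step token counts come from `response.usage` (`prompt_tokens`, `completion_tokens`, `total_tokens`) on each call.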
## Methodology
All benchmarks measure wall-clock time per call. Each framework makes the identical LLM call through the same OpenAI-compatible client — what we measure is framework orchestration overhead.
- Raw (baseline) — bare `openai.OpenAI().chat.completions.create()` call
- JamJet — `Workflow.run_sync()` in-process executor
- LangGraph — `StateGraph.compile().invoke()` with a single node
- Warmup runs excluded from measurements
- Each timed run is independent — no shared state
- Benchmarks run sequentially to avoid contention
- Hardware: Apple M-series, 16GB RAM, Ollama local
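The timing loop itself isn't shown in this doc; a minimal sketch of a per-call wall-clock harness that discards warmup runs, where `call` stands in for any of the three runners above:

```python
import time

def bench(call, runs=20, warmup=3):
    """Wall-clock latency per invocation of `call`, in ms."""
    for _ in range(warmup):
        call()  # warmup runs excluded from measurements
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        call()  # each timed run is independent — no shared state
        samples.append((time.perf_counter() - t0) * 1000.0)
    return samples
```

Runners are benchmarked one after another (never concurrently) so they don't contend for the local Ollama instance.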