Framework overhead — zero-cost orchestration
JamJet vs raw LLM calls vs LangGraph, reproducible end-to-end.
We benchmarked two models: a fast 3B (llama3.2) and a slower 8B chain-of-thought model (qwen3:8b). On the fast model, JamJet's in-process executor adds no observable overhead over a raw LLM call, and LangGraph is tied within measurement noise. On the chain-of-thought model, variable-length generation dominates wall-clock time, so apparent per-framework differences reflect token-count variance rather than orchestration cost. The numbers are below; the reproduction script is in the methodology section.
Last updated 2026-03-08
llama3.2 · Ollama · Apple M-series · 2026-03-08
| Framework | mean (ms) | median (ms) | p95 (ms) | p99 (ms) | stdev (ms) | overhead |
|---|---|---|---|---|---|---|
| Raw (baseline) | 947.2 | 943.7 | 970.3 | 972.2 | 9.9 | — |
| JamJet 0.1.1 | 948.6 | 948.2 | 959.0 | 964.2 | 6.0 | +1.4 ms |
| LangGraph | 944.0 | 943.0 | 953.8 | 961.1 | 8.1 | -3.2 ms |
Note: All three setups are within measurement noise (~1 ms). JamJet's in-process executor adds zero observable overhead over a raw LLM call.
qwen3:8b (thinking mode) · Ollama · Apple M-series · 2026-03-08
| Framework | mean (ms) | median (ms) | p95 (ms) | p99 (ms) | stdev (ms) | overhead |
|---|---|---|---|---|---|---|
| Raw (baseline) | 8429.5 | 8303.4 | 8940.3 | 9427.6 | 352.3 | — |
| JamJet 0.1.1 | 10140.1 | 10139.1 | 10487.0 | 10519.5 | 285.1 | +1710.6 ms |
| LangGraph | 11902.9 | 11923.3 | 12761.8 | 12823.5 | 551.7 | +3473.3 ms |
Note: qwen3:8b generates variable-length chain-of-thought, so run-to-run latency varies widely. The high stdev dominates the comparison: the overhead column here mostly reflects token-generation variance, not framework orchestration cost.
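The per-row statistics in the tables above (mean, median, p95, p99, stdev) can be recomputed from raw latency samples with Python's standard library alone; a minimal sketch, where the sample list is purely illustrative and not data from the actual runs:

```python
import statistics

def summarize(latencies_ms):
    """Summary statistics matching the table columns: mean/median/p95/p99/stdev."""
    q = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p95": q[94],  # 95th-percentile cut point (0-indexed among 99 cuts)
        "p99": q[98],  # 99th-percentile cut point
        "stdev": statistics.stdev(latencies_ms),
    }

# Illustrative samples only (ms), not the benchmark's actual data:
print(summarize([947.0, 943.7, 951.2, 939.8, 970.3, 944.5, 948.1, 955.6]))
```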
Methodology
All benchmarks measure wall-clock time per call. Each framework makes an identical LLM call through the same OpenAI-compatible client, so any difference from the raw baseline is framework orchestration overhead.
- Raw (baseline) — bare openai.OpenAI().chat.completions.create() call
- JamJet — Workflow.run_sync() in-process executor
- LangGraph — StateGraph.compile().invoke() with a single node
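In outline, such a harness times each setup as an opaque callable, with warmup calls executed but excluded. This is a sketch, not the actual bench_single_call.py; the commented baseline usage assumes an OpenAI-compatible server such as Ollama with the environment variables from the reproduction steps:

```python
import time

def time_call(fn, warmup=3, runs=20):
    """Time fn() per call in ms. Warmup calls run first and are excluded."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

# Raw baseline (assumes OPENAI_API_KEY / OPENAI_BASE_URL are set for Ollama):
#   from openai import OpenAI
#   client = OpenAI()
#   raw = lambda: client.chat.completions.create(
#       model="llama3.2",
#       messages=[{"role": "user", "content": "Say hi."}],
#   )
#   samples = time_call(raw)
```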
```shell
# Reproduce locally (Ollama)
export OPENAI_API_KEY="ollama"
export OPENAI_BASE_URL="http://localhost:11434/v1"
export MODEL_NAME="llama3.2"
git clone https://github.com/jamjet-labs/jamjet-benchmarks
cd jamjet-benchmarks/benchmarks
pip install -r requirements.txt
python bench_single_call.py --json results/my-run.json
```
- Warmup runs excluded from measurements
- Each timed run is independent — no shared state
- Benchmarks run sequentially to avoid contention
- Hardware: Apple M-series, 16GB RAM, Ollama local
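The overhead column is simply mean(framework) minus mean(baseline), so you can diff your own runs. A sketch, assuming (hypothetically; the real schema of results/my-run.json may differ) that results map framework name to a list of per-call latencies in milliseconds:

```python
import json
import statistics

def overhead_ms(results, baseline="raw"):
    """Mean overhead of each framework vs the baseline, in ms.

    `results` maps framework name -> per-call latencies (ms). The key names
    and JSON schema here are assumptions, not the benchmark's actual format.
    """
    base = statistics.mean(results[baseline])
    return {name: statistics.mean(xs) - base
            for name, xs in results.items() if name != baseline}

# with open("results/my-run.json") as f:   # schema assumed, see above
#     print(overhead_ms(json.load(f)))
```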
More: JamJet benchmarks index · Engram LongMemEval-S leaderboard · JamJet comparisons