Framework overhead — zero-cost orchestration

JamJet vs raw LLM calls vs LangGraph, reproducible end-to-end.

Across two reader models, a fast 3B (llama3.2) and a slower 8B chain-of-thought model (qwen3:8b), JamJet's in-process executor adds no observable overhead over raw LLM calls. On llama3.2 all three setups finish within about 1 ms of each other, with LangGraph likewise tied within measurement noise; on qwen3:8b the apparent deltas are dominated by variable-length token generation, not orchestration. The numbers are below; the reproduction script is in the methodology section.

Last updated 2026-03-08

llama3.2 · Ollama · Apple M-series · 2026-03-08

Model: llama3.2 · Endpoint: http://localhost:11434/v1 (Ollama) · Runs: 20 (+3 warmup)

| Framework      | mean (ms) | median | p95   | p99   | stdev | overhead |
|----------------|-----------|--------|-------|-------|-------|----------|
| Raw (baseline) | 947.2     | 943.7  | 970.3 | 972.2 | 9.9   |          |
| JamJet 0.1.1   | 948.6     | 948.2  | 959.0 | 964.2 | 6.0   | +1.4 ms  |
| LangGraph      | 944.0     | 943.0  | 953.8 | 961.1 | 8.1   | -3.2 ms  |

Note: all three frameworks are within measurement noise (~1 ms) of each other. JamJet's in-process executor adds no observable overhead over a raw LLM call.
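The summary columns in the tables above can be derived from raw per-run latencies along these lines. This is a hypothetical sketch: the exact percentile method used by `bench_single_call.py` is an assumption, and `summarize` is an illustrative helper, not part of the published script.

```python
import statistics

def summarize(latencies_ms, baseline_mean=None):
    """Reduce a list of per-run latencies (ms) to the columns shown above."""
    mean = statistics.fmean(latencies_ms)
    # statistics.quantiles with n=100 yields 99 cut points;
    # index 94 is the 95th percentile, index 98 the 99th.
    q = statistics.quantiles(latencies_ms, n=100)
    stats = {
        "mean": round(mean, 1),
        "median": round(statistics.median(latencies_ms), 1),
        "p95": round(q[94], 1),
        "p99": round(q[98], 1),
        "stdev": round(statistics.stdev(latencies_ms), 1),
    }
    if baseline_mean is not None:
        # Overhead is simply the mean delta against the raw baseline.
        stats["overhead_ms"] = round(mean - baseline_mean, 1)
    return stats
```

The overhead column is the mean delta against the raw baseline, which is why it can go slightly negative when the difference is inside measurement noise.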

qwen3:8b (thinking mode) · Ollama · Apple M-series · 2026-03-08

Model: qwen3:8b · Endpoint: http://localhost:11434/v1 (Ollama) · Runs: 15 (+3 warmup)

| Framework      | mean (ms) | median  | p95     | p99     | stdev | overhead   |
|----------------|-----------|---------|---------|---------|-------|------------|
| Raw (baseline) | 8429.5    | 8303.4  | 8940.3  | 9427.6  | 352.3 |            |
| JamJet 0.1.1   | 10140.1   | 10139.1 | 10487.0 | 10519.5 | 285.1 | +1710.6 ms |
| LangGraph      | 11902.9   | 11923.3 | 12761.8 | 12823.5 | 551.7 | +3473.3 ms |

Note: qwen3:8b generates variable-length chain-of-thought, so run-to-run stdev is high. The overhead column here reflects token-generation variance, not framework orchestration cost.

Methodology

All benchmarks measure wall-clock time per call. Each framework makes the identical LLM call through the same OpenAI-compatible client — what we measure is framework orchestration overhead.

  • Raw (baseline) — bare openai.OpenAI().chat.completions.create() call
  • JamJet — Workflow.run_sync() in-process executor
  • LangGraph — StateGraph.compile().invoke() with a single node
```bash
# Reproduce locally (Ollama)
export OPENAI_API_KEY="ollama"
export OPENAI_BASE_URL="http://localhost:11434/v1"
export MODEL_NAME="llama3.2"

git clone https://github.com/jamjet-labs/jamjet-benchmarks
cd jamjet-benchmarks/benchmarks
pip install -r requirements.txt
python bench_single_call.py --json results/my-run.json
```
  • Warmup runs excluded from measurements
  • Each timed run is independent — no shared state
  • Benchmarks run sequentially to avoid contention
  • Hardware: Apple M-series, 16GB RAM, Ollama local
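The measurement loop described by these bullets can be sketched as follows. This is a minimal sketch under stated assumptions: `make_call` stands in for any of the three call paths (raw client, `Workflow.run_sync()`, or a compiled `StateGraph`), and the actual internals of `bench_single_call.py` may differ.

```python
import time

def bench(make_call, runs=20, warmup=3):
    """Time `make_call` sequentially, discarding warmup runs."""
    for _ in range(warmup):
        make_call()  # warm model/cache; not measured
    samples_ms = []
    for _ in range(runs):
        # Each timed run is independent: no state carried between calls.
        t0 = time.perf_counter()
        make_call()
        samples_ms.append((time.perf_counter() - t0) * 1000.0)
    return samples_ms
```

Runs are strictly sequential, matching the contention-avoidance bullet above; `time.perf_counter()` gives a monotonic wall-clock suitable for short intervals.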