Framework overhead — zero-cost orchestration

JamJet vs raw LLM calls vs LangGraph, reproducible end-to-end.

Across two reader models, a fast 3B (llama3.2) and a slower 8B chain-of-thought model (qwen3:8b), JamJet's in-process executor adds no observable overhead over raw LLM calls. On llama3.2 all three setups finish within about 1 ms of each other, with LangGraph likewise tied within measurement noise; on qwen3:8b the apparent deltas are dominated by variable-length token generation, not orchestration. The numbers are below; the reproduction script is in the methodology section.

Last updated 2026-03-08

llama3.2 · Ollama · Apple M-series · 2026-03-08

Model: llama3.2 · Endpoint: http://localhost:11434/v1 (Ollama) · Runs: 20 (+3 warmup)

| Framework      | mean (ms) | median | p95   | p99   | stdev | overhead |
|----------------|-----------|--------|-------|-------|-------|----------|
| Raw (baseline) | 947.2     | 943.7  | 970.3 | 972.2 | 9.9   |          |
| JamJet 0.1.1   | 948.6     | 948.2  | 959.0 | 964.2 | 6.0   | +1.4 ms  |
| LangGraph      | 944.0     | 943.0  | 953.8 | 961.1 | 8.1   | -3.2 ms  |

Note: all three frameworks are within measurement noise (~1 ms) of each other. JamJet's in-process executor adds no observable overhead over a raw LLM call.
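The summary columns in the tables above can be derived from raw per-run latencies along these lines. This is a hypothetical sketch: the exact percentile method used by `bench_single_call.py` is an assumption, and `summarize` is an illustrative helper, not part of the published script.

```python
import statistics

def summarize(latencies_ms, baseline_mean=None):
    """Reduce a list of per-run latencies (ms) to the columns shown above."""
    mean = statistics.fmean(latencies_ms)
    # statistics.quantiles with n=100 yields 99 cut points;
    # index 94 is the 95th percentile, index 98 the 99th.
    q = statistics.quantiles(latencies_ms, n=100)
    stats = {
        "mean": round(mean, 1),
        "median": round(statistics.median(latencies_ms), 1),
        "p95": round(q[94], 1),
        "p99": round(q[98], 1),
        "stdev": round(statistics.stdev(latencies_ms), 1),
    }
    if baseline_mean is not None:
        # Overhead is simply the mean delta against the raw baseline.
        stats["overhead_ms"] = round(mean - baseline_mean, 1)
    return stats
```

The overhead column is the mean delta against the raw baseline, which is why it can go slightly negative when the difference is inside measurement noise.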

qwen3:8b (thinking mode) · Ollama · Apple M-series · 2026-03-08

Model: qwen3:8b · Endpoint: http://localhost:11434/v1 (Ollama) · Runs: 15 (+3 warmup)

| Framework      | mean (ms) | median  | p95     | p99     | stdev | overhead   |
|----------------|-----------|---------|---------|---------|-------|------------|
| Raw (baseline) | 8429.5    | 8303.4  | 8940.3  | 9427.6  | 352.3 |            |
| JamJet 0.1.1   | 10140.1   | 10139.1 | 10487.0 | 10519.5 | 285.1 | +1710.6 ms |
| LangGraph      | 11902.9   | 11923.3 | 12761.8 | 12823.5 | 551.7 | +3473.3 ms |

Note: qwen3:8b generates variable-length chain-of-thought, so run-to-run stdev is high. The overhead column here reflects token-generation variance, not framework orchestration cost.

Methodology

All benchmarks measure wall-clock time per call. Each framework makes the identical LLM call through the same OpenAI-compatible client — what we measure is framework orchestration overhead.

  • Raw (baseline) — bare openai.OpenAI().chat.completions.create() call
  • JamJet — Workflow.run_sync() in-process executor
  • LangGraph — StateGraph.compile().invoke() with a single node
```bash
# Reproduce locally (Ollama)
export OPENAI_API_KEY="ollama"
export OPENAI_BASE_URL="http://localhost:11434/v1"
export MODEL_NAME="llama3.2"

git clone https://github.com/jamjet-labs/jamjet-benchmarks
cd jamjet-benchmarks/benchmarks
pip install -r requirements.txt
python bench_single_call.py --json results/my-run.json
```
  • Warmup runs excluded from measurements
  • Each timed run is independent — no shared state
  • Benchmarks run sequentially to avoid contention
  • Hardware: Apple M-series, 16GB RAM, Ollama local
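The measurement loop described by these bullets can be sketched as follows. This is a minimal sketch under stated assumptions: `make_call` stands in for any of the three call paths (raw client, `Workflow.run_sync()`, or a compiled `StateGraph`), and the actual internals of `bench_single_call.py` may differ.

```python
import time

def bench(make_call, runs=20, warmup=3):
    """Time `make_call` sequentially, discarding warmup runs."""
    for _ in range(warmup):
        make_call()  # warm model/cache; not measured
    samples_ms = []
    for _ in range(runs):
        # Each timed run is independent: no state carried between calls.
        t0 = time.perf_counter()
        make_call()
        samples_ms.append((time.perf_counter() - t0) * 1000.0)
    return samples_ms
```

Runs are strictly sequential, matching the contention-avoidance bullet above; `time.perf_counter()` gives a monotonic wall-clock suitable for short intervals.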