Field notes
Engineering posts on durable execution, policy, audit, memory, and the boring reliability work that makes AI agents survive past the demo.
Worth your time
Five pieces that hold up no matter which framework you run.
I ran 40 agent-action control tests against Microsoft AGT, Cloudflare HITL, LangSmith Gateway, and Anthropic permission policies
Incident analysis · When AI Deletes the Database
Landscape · Google ADK vs JamJet: Building a Claims Processing Agent
Landscape · The State of Memory in Java AI Agents (April 2026)
Incident analysis · Every Major AI Agent Failure Has the Same Root Cause
- 01
Approvals That Survive kill -9
JamJet's human-in-the-loop approvals are event-sourced. Park a payment on a human gate, kill the runtime, restart it, and the approval is still waiting. Approve it and the workflow finishes.
- 02
What's Missing in JVM AI: Governance and Budget Enforcement
Java got serious about AI observability this year. Governance is a different story: the frameworks ship interception hooks, not a safety layer, and nobody enforces a budget.
- 03
Your Agents Don't Need a Smarter Model. They Need a Runtime.
Models keep getting better and agents keep breaking in production. The bottleneck was never intelligence. It is execution reliability, and that is a runtime problem.
- 04
I ran 40 agent-action control tests against Microsoft AGT, Cloudflare HITL, LangSmith Gateway, and Anthropic permission policies
AgentBoundary v0.1 conformance suite, 40 deterministic tests, four vendors. Per-vendor mapping and per-scenario verdicts in the public adapters/ tree. Run it yourself in 60 seconds.
- 05
Datasets as Policy Test Fixtures for Production AI Agents
Curate a handful of traces, draft a policy, replay it against the curated set, and read a per-event verdict diff before any agent runs the rule in production.
- 06
I tried to delete a database with an AI agent. The runtime said no.
JamJet 0.8.1 (Python) and @jamjet/cloud 0.2.2 (TypeScript) ship a runtime safety layer that intercepts an agent's tool calls before the tool function is invoked — and the four zero-setup demos prove the path.
- 07
Every AI toolchain is inventing its own safety layer. We shipped one that works across all of them.
JamJet shipped a portable policy layer that runs the same safety rules across Claude Code hooks, OpenAI Agents SDK guardrails, MCP stdio traffic, and the JamJet Python/TS SDKs. One policy file. One audit trail.
- 08
When AI Deletes the Database
From PocketOS to Replit, AI agents are wiping production databases. Why this is a runtime problem, not a model problem, and the architecture pattern that prevents it.
- 09
Engram on LongMemEval: What Worked, What Didn't, What We Learned
We added 8 retrieval-and-reading techniques to Engram and benchmarked each independently against LongMemEval-S. Three shipped, five didn't. The negative results turned out to be the most useful part.
- 10
Your AI Agents Won't Survive an Audit
89% of enterprise AI agents never reach production. The EU AI Act is enforceable in August. Here's what production safety actually requires — and why most agent frameworks aren't ready.
- 11
Zero-Sidecar Durable AI Agents in Java
Kill your agent. Restart it. It remembers everything. The JamJet Java Runtime embeds durable execution directly in your JVM — no Docker, no sidecar, no REST overhead.
- 12
Why Your AI Agents Need Observability — and What to Measure
You would not deploy a microservice without metrics and tracing. Why are you deploying AI agents blind? Here is what to measure and how.
- 13
Getting Started with MCP: Connect AI Agents to Any Tool
Model Context Protocol is becoming the USB-C of AI agents. Here is how to connect your agents to databases, APIs, and file systems — with working code.
- 14
Google ADK vs JamJet: Building a Claims Processing Agent
We built the same insurance claims agent in both frameworks. One crashes and loses everything. The other picks up exactly where it left off.
- 15
How to Choose an AI Agent Framework in 2026
LangGraph, CrewAI, AutoGen, Google ADK, JamJet — the landscape is crowded. Here is a practical decision framework for picking the right one.
- 16
Engram: A Memory Layer for AI Agents That Actually Works
One cargo install. Zero infrastructure. Your agents remember everything — with temporal knowledge graphs, semantic search, and MCP-native tools.
- 17
The State of Memory in Java AI Agents (April 2026)
A tour of every option Java developers have for adding persistent memory to AI agents — and why most of them stop at chat history.
- 18
The Companies Quietly Replacing Entire Workflows with AI Agents — While You're Still Debating Prompts
While most teams argue about prompt engineering, early movers are shipping autonomous agent workflows that handle claims, onboarding, and due diligence end-to-end. Here's what they know that you don't.
- 19
Akka Agents vs JamJet: Actor Model or Agent-Native Runtime?
Two production-grade approaches to AI agents on the JVM. Akka adapted 20 years of actor infrastructure. JamJet was purpose-built from day one. An honest architectural comparison with code, diagrams, and a decision matrix.
- 20
JamJet Spring Boot Starter — Production-Grade Agent Runtime for Spring AI
Add one dependency to your Spring Boot application. Get crash recovery, audit trails, replay testing, and human-in-the-loop for every Spring AI agent call. JamJet brings its full agent runtime — strategies, multi-agent coordination, MCP, A2A, eval harness — to the Spring ecosystem.
- 21
Every Major AI Agent Failure Has the Same Root Cause
Klarna, Air Canada, DPD — sourced post-mortems of real AI agent failures. The pattern is always the same: prototype infrastructure in production. Named companies, real timelines, avoidable lessons.
- 22
AI Agents Need Their Spring Moment — It Starts with the Runtime
Spring transformed how Java built enterprise apps. AI agents need the same transformation — not another framework, but a production runtime. A sourced comparison of every major JVM AI framework and where the gap remains.
- 23
What Your Competitors Are Already Doing With AI Agents
Named companies, real metrics, sourced data. How finance, legal, support, and insurance deploy AI agents in production — and what it means if you haven't started.
- 24
Why AI Agents Are the Next Competitive Advantage — and What Leaders Need to Know
What AI agents mean for business leaders: faster decisions, better scale, and a new operating model. No code, no jargon — just the strategic case.
- 25
What's New: Incremental Streaming, LLM Tiebreaker, and Reasoning Modes
True incremental NDJSON streaming for agent tools, async LLM tiebreaker for coordinator routing, and reasoning mode scoring for Agent Cards.
- 26
Why We Built JamJet
The demo-to-production gap in AI agents is real. Here is why we built a new runtime instead of reaching for another framework.
- 27
Building a multi-agent wealth advisor with JamJet
Four specialist AI agents — risk profiler, market analyst, tax strategist, portfolio architect — collaborate through a durable workflow to produce investment recommendations. A deep dive into the architecture, with a side-by-side comparison to Google ADK.
- 28
Data governance for AI agents: PII, redaction, and retention
How JamJet's data policy engine handles PII detection, automatic redaction, and time-based retention — enforced by the Rust runtime, not by convention.
- 29
Phase 4: Enterprise security for production agents
Multi-tenant isolation, PII redaction, OAuth delegation, mTLS federation — the enterprise layer that lets agents handle real data in real organizations.
- 30
OAuth delegation and federation auth for AI agents
RFC 8693 token exchange, scope narrowing, per-step scoping, mTLS federation — how JamJet ensures agents never exceed the permissions they were granted.
- 31
Migrating from LangGraph to JamJet: what actually changes
A side-by-side walkthrough of the same workflow in LangGraph and JamJet — what maps across, what disappears, and what you gain.
- 32
Building a self-evaluating AI agent in 50 lines
Draft, judge, retry. A workflow that scores its own output and loops until it is good enough — or gives up gracefully.
- 33
Testing AI agents like software
Most teams test their agents by running them manually and eyeballing the output. There is a better way — and it fits in a CI pipeline.
- 34
Phase 3: Eval Harness, Project Templates, and the Path to Trustworthy Agents
Shipping the eval harness, four built-in project templates, and why testing your agents the same way you test software is the only path forward.
- 35
Why I built JamJet's runtime in Rust
Not a trendy choice. A conviction-based one. Here is what it cost, what it taught me, and why I would do it again.
- 36
Announcing JamJet: The Agent-Native Runtime
We built the runtime we wished existed for AI agents — durable, composable, and built for production from day one.