Engram: A Memory Layer for AI Agents That Actually Works
AI agents have a goldfish problem. Every conversation starts from zero. Over a session your agent learns your name, your preferences, and your project context, and then the session ends and it’s all gone.
The existing solutions ask you to spin up vector databases, graph stores, and extraction pipelines. Mem0 needs Qdrant. Zep needs Neo4j and Docker. Letta needs PostgreSQL. By the time you’ve configured the infrastructure, you’ve forgotten what you were building.
Today we’re releasing Engram — a durable memory layer for AI agents. One binary. Zero infrastructure. SQLite all the way down.
Get started in 10 seconds
Python:
pip install jamjet
import asyncio
from jamjet.engram import EngramClient
async def main():
    async with EngramClient() as memory:
        # Extract and store facts from a conversation turn
        await memory.add(messages=[{"role": "user", "content": "I live in Austin"}], user_id="alice")
        facts = await memory.recall("where does the user live", user_id="alice")
asyncio.run(main())
Java:
<dependency>
  <groupId>dev.jamjet</groupId>
  <artifactId>jamjet-sdk</artifactId>
  <version>0.4.0</version>
</dependency>
Standalone MCP server (for Claude Code, Cursor, etc.):
cargo install jamjet-engram-server
engram serve --db memory.db
That last command gives you an MCP server with 7 memory tools that any AI agent can use.
How it works
Engram is built around three ideas:
1. Memory is structured, not a bag of vectors
Most memory systems dump everything into a vector store and hope similarity search finds the right thing. Engram extracts facts, entities, and relationships from conversations, building a temporal knowledge graph alongside the vector index.
"I'm allergic to peanuts and I live in Austin"
→ Fact: "User is allergic to peanuts" (confidence: 0.95)
→ Fact: "User lives in Austin" (confidence: 0.97)
→ Entity: user_123 (person)
→ Entity: peanuts (allergen)
→ Entity: Austin (place)
→ Relationship: user_123 --allergic_to--> peanuts
→ Relationship: user_123 --lives_in--> Austin
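To make the shape of that output concrete, here is a minimal Python sketch of the three record types. The class and field names are illustrative assumptions, not Engram's actual schema; only the example values come from the extraction above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fact:
    text: str
    confidence: float


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str


@dataclass(frozen=True)
class Relationship:
    subject: str
    predicate: str
    object: str


@dataclass
class Extraction:
    """Hypothetical container for everything extracted from one message."""
    facts: list = field(default_factory=list)
    entities: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def neighbors(self, entity: str) -> list:
        """Entities one outgoing edge away from `entity`."""
        return [r.object for r in self.relationships if r.subject == entity]


example = Extraction(
    facts=[
        Fact("User is allergic to peanuts", 0.95),
        Fact("User lives in Austin", 0.97),
    ],
    entities=[
        Entity("user_123", "person"),
        Entity("peanuts", "allergen"),
        Entity("Austin", "place"),
    ],
    relationships=[
        Relationship("user_123", "allergic_to", "peanuts"),
        Relationship("user_123", "lives_in", "Austin"),
    ],
)
```

The relationships are what a plain vector store lacks: `example.neighbors("user_123")` walks from the user to peanuts and Austin without any similarity search.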
2. Retrieval is hybrid, not single-signal
When your agent recalls memory, Engram fuses three signals:
- Vector search — semantic similarity via embeddings
- Keyword search — SQLite FTS5 for exact terms, proper nouns, IDs
- Graph walk — traverse entity relationships for structurally connected facts
Each signal carries a configurable weight (default: 50% vector, 30% keyword, 20% graph). The goal is higher recall than any single retrieval method achieves on its own, because a fact missed by one signal can still surface through another.
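One simple way to picture the fusion is a weighted sum over per-signal scores. The sketch below assumes each signal returns scores already normalized to [0, 1] and that fusion is a plain weighted sum; the post does not say which fusion method Engram uses, and `fuse_scores` is a hypothetical name.

```python
# Default weights taken from the text above.
DEFAULT_WEIGHTS = {"vector": 0.5, "keyword": 0.3, "graph": 0.2}


def fuse_scores(signals, weights=DEFAULT_WEIGHTS):
    """Combine per-signal scores into one ranking.

    signals: {signal_name: {fact_id: score in [0, 1]}}
    Returns a list of (fact_id, fused_score), best first.
    """
    fused = {}
    for name, scores in signals.items():
        w = weights.get(name, 0.0)  # unknown signals contribute nothing
        for fact_id, score in scores.items():
            fused[fact_id] = fused.get(fact_id, 0.0) + w * score
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


signals = {
    "vector": {"f1": 0.9, "f2": 0.4},
    "keyword": {"f2": 1.0},
    "graph": {"f1": 0.5, "f3": 1.0},
}
ranked = fuse_scores(signals)
```

Here `f3` is invisible to both vector and keyword search but still makes the ranking through the graph signal, which is the point of fusing.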
3. Memory decays like the brain does
Engram includes a consolidation engine inspired by cognitive science:
- Decay — stale facts lose confidence exponentially (30-day half-life)
- Promote — frequently-accessed facts graduate from conversation to long-term knowledge
- Dedup — batch vector similarity scan merges near-identical facts
- Summarize — LLM condenses conversation clusters into knowledge-tier facts
- Reflect — LLM generates higher-order insights from patterns across facts
Run it on a schedule or trigger it manually. Your agent’s memory stays clean without manual curation.
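Read literally, a 30-day half-life means a fact's confidence halves for every 30 days it goes stale. A minimal sketch, assuming staleness is measured in days since the fact was last accessed (the post does not say which timestamp Engram uses); the function name is hypothetical:

```python
HALF_LIFE_DAYS = 30.0  # from the Decay bullet above


def decayed_confidence(confidence, days_stale, half_life_days=HALF_LIFE_DAYS):
    """Exponential decay: confidence * 0.5 ** (days_stale / half_life_days)."""
    if days_stale < 0:
        raise ValueError("days_stale must be non-negative")
    return confidence * 0.5 ** (days_stale / half_life_days)
```

Under this reading a fact stored at confidence 0.8 sits at 0.4 after 30 idle days and 0.2 after 60, so rarely used facts fade gradually instead of being cut off at a hard age limit.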
MCP-native from day one
Engram speaks MCP (Model Context Protocol) natively. Add it to Claude Code, Cursor, or any MCP-compatible client:
{
  "mcpServers": {
    "memory": {
      "command": "engram",
      "args": ["serve", "--db", "~/.engram/memory.db"]
    }
  }
}
Seven tools are exposed:
| Tool | What it does |
|---|---|
| memory_add | Extract facts from conversation messages |
| memory_recall | Semantic search over stored facts |
| memory_context | Token-budgeted context block for system prompts |
| memory_search | FTS5 keyword search |
| memory_forget | Soft-delete a fact (with audit trail) |
| memory_stats | Storage statistics |
| memory_consolidate | Run the consolidation engine |
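An MCP client normally builds tool calls for you, but it helps to see what goes over the wire. MCP is JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The sketch below serializes such a request in Python; the argument names (`query`, `user_id`) are assumptions modeled on the SDK example earlier, not a documented schema.

```python
import json


def tool_call(request_id, tool, arguments):
    """Serialize a JSON-RPC 2.0 `tools/call` request as used by MCP."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Argument names below are assumed, not taken from Engram's tool schema.
msg = tool_call(
    1,
    "memory_recall",
    {"query": "where does the user live", "user_id": "alice"},
)
```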
REST API too
Need HTTP instead of MCP? Same binary, different flag:
engram serve --db memory.db --mode rest --port 9090
REST mode exposes nine endpoints under /v1/memory/*. Two of them:
# Add facts from a conversation
curl -X POST localhost:9090/v1/memory \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "I live in Austin"}], "user_id": "alice"}'

# Get a token-budgeted context block
curl -X POST localhost:9090/v1/memory/context \
  -H "Content-Type: application/json" \
  -d '{"query": "where does the user live", "user_id": "alice", "token_budget": 1000}'
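The same context request can be built from Python with nothing but the standard library. This sketch mirrors the second curl call; the helper name is hypothetical, and since the post does not show the response shape, the example stops at constructing the request.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9090"  # matches --port 9090 in the serve command


def context_request(query, user_id, token_budget):
    """Build a POST request for /v1/memory/context (same body as the curl call)."""
    body = json.dumps(
        {"query": query, "user_id": user_id, "token_budget": token_budget}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/memory/context",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = context_request("where does the user live", "alice", 1000)

# To send it against a running server:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```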
Context assembly that respects your token budget
The memory_context tool doesn’t just dump all facts into your prompt. It:
- Retrieves candidates via hybrid search
- Ranks by tier priority: Working > Conversation > Knowledge
- Greedily fills your token budget (never exceeds it)
- Formats as XML system prompt tags, Markdown, or raw JSON
<memory>
<conversation>
- User prefers dark mode
- User is allergic to peanuts
</conversation>
<knowledge>
- User is health-conscious and actively managing diet
</knowledge>
</memory>
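The ranking and greedy fill steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Engram's implementation: token cost uses a char/4 heuristic rounded up, ties within a tier break on relevance, and a fact that does not fit is skipped so that a smaller one can still use the remaining budget. All names are hypothetical.

```python
# Lower number = higher priority: Working > Conversation > Knowledge.
TIER_PRIORITY = {"working": 0, "conversation": 1, "knowledge": 2}


def estimate_tokens(text):
    """char/4 heuristic, rounded up, with a floor of one token."""
    return max(1, -(-len(text) // 4))


def assemble_context(candidates, token_budget):
    """Pick facts by tier, then relevance, without exceeding the budget.

    candidates: list of (tier, relevance, text)
    Returns (selected, used) where selected is a list of (tier, text).
    """
    ordered = sorted(candidates, key=lambda c: (TIER_PRIORITY[c[0]], -c[1]))
    selected, used = [], 0
    for tier, _relevance, text in ordered:
        cost = estimate_tokens(text)
        if used + cost > token_budget:
            continue  # does not fit; a shorter fact later on still might
        selected.append((tier, text))
        used += cost
    return selected, used


candidates = [
    ("knowledge", 0.9, "User is health-conscious and actively managing diet"),
    ("conversation", 0.6, "User prefers dark mode"),
    ("conversation", 0.8, "User is allergic to peanuts"),
]
selected, used = assemble_context(candidates, token_budget=15)
```

With a budget of 15 estimated tokens, both conversation facts fit and the longer knowledge fact is left out, even though it has the highest relevance score. Tier priority wins over relevance in this sketch.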
Architecture
Engram is two Rust crates:
- jamjet-engram (library): traits, SQLite stores, extraction pipeline, retrieval, context assembly, consolidation engine. 95 tests.
- jamjet-engram-server (binary): MCP stdio server + Axum REST API + clap CLI. 22 tests.
Everything is trait-based and pluggable:
| Trait | Default | Can swap to |
|---|---|---|
| FactStore | SQLite | Postgres, any SQL |
| VectorStore | Embedded cosine | Qdrant, Pinecone |
| GraphStore | SQLite triple store | Neo4j, FalkorDB |
| EmbeddingProvider | Ollama | OpenAI, ONNX, any API |
| LlmClient | Ollama | Claude, GPT, any API |
| TokenEstimator | char/4 heuristic | tiktoken, any tokenizer |
Zero mandatory infrastructure. Swap backends when you need scale.
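The pluggable pattern is easiest to see with the smallest trait in the table. The sketch below expresses the idea in Python using `typing.Protocol` as a stand-in for a Rust trait. The method name and both implementations are assumptions for illustration; the real trait signatures live in the jamjet-engram crate and may differ.

```python
from typing import Protocol


class TokenEstimator(Protocol):
    """Anything with an `estimate` method can be plugged in."""

    def estimate(self, text: str) -> int: ...


class CharHeuristic:
    """Default-style estimator: roughly one token per four characters."""

    def estimate(self, text: str) -> int:
        return max(1, -(-len(text) // 4))  # ceiling division


class WhitespaceEstimator:
    """Alternative estimator: one token per whitespace-separated word."""

    def estimate(self, text: str) -> int:
        return len(text.split())


def fits(text: str, budget: int, estimator: TokenEstimator) -> bool:
    """Caller code depends only on the protocol, not on a concrete backend."""
    return estimator.estimate(text) <= budget
```

Swapping `CharHeuristic()` for `WhitespaceEstimator()` changes the counting strategy without touching `fits`, which is the same property that lets a backend like SQLite be replaced behind a store trait.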
What’s next
- Pluggable backends — Qdrant, Neo4j, Postgres adapters (feature-gated)
- JamJet runtime integration — memory as durable workflow nodes with event sourcing
- Migration tools — import from Mem0, Zep, and other memory systems
- Spring Boot starter — auto-configuration for Java enterprise
Try it
# Python SDK
pip install jamjet
# Standalone MCP server (Rust)
cargo install jamjet-engram-server
engram serve --db memory.db
Source: github.com/jamjet-labs/jamjet (Apache 2.0)
Crates: jamjet-engram | jamjet-engram-server
PyPI: jamjet