I Built an AI Agent That Writes Its Own Rules From Its Mistakes
I've been running a persistent AI agent as my operational manager for the past two weeks. Not a chatbot. Not a coding assistant. An agent that maintains its own identity, remembers across sessions, runs autonomous jobs, and, most importantly, builds its own behavioral rules from the mistakes it makes.
I'm open-sourcing the architecture. Here's what I built and why.
The Problem Nobody Talks About
Claude Code is powerful. It's also stateless. Every session starts from zero. It doesn't remember that it hallucinated a file path yesterday, can't coordinate across two terminal windows, and will make the same mistake tomorrow that it made today.
The standard solution is to stuff more instructions into your system prompt. That works until it doesn't. You can't anticipate every failure mode upfront. And static instructions don't learn.
I wanted an agent that gets better over time, not one I have to manually improve. I also didn't want to have to start from scratch with every new instance.
What I Built
The Persistent Agent Framework turns Claude Code into a stateful operational partner. It has five core systems:
1. Identity that persists. The agent has a soul. Not metaphorically. A set of files (SOUL.md, USER.md, HARNESS.md) define who it is, who I am, and what it can and can't do technically. These load at session start and give the agent a consistent personality across sessions.
But static files aren't enough. The agent also loads behavioral directives from a database on boot. These directives evolve over time. More on that in a second.
2. Memory that survives sessions. Every important decision, correction, or piece of knowledge gets written to Supabase immediately. On the next boot, the agent queries its own memory and picks up where it left off. 211 memories (so far), each scored by importance, each embedded for semantic search.
The key insight: the agent doesn't just remember facts. It remembers its successes and failures.
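In pseudocode, a memory write looks roughly like this. The field names are illustrative (the actual schema lives in the repo's migrations), and a stub callable stands in for the Ollama embedder:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class Memory:
    content: str                # what happened or was learned
    importance: float           # 0.0-1.0, drives importance-based recall
    category: str = "general"   # e.g. "decision", "correction", "fact"
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def save_memory(memory: Memory, embed) -> dict:
    """Build the row to insert into Supabase.

    `embed` is any callable mapping text to a vector; in the real
    system it would call a local Ollama embedding model, and the
    vector would land in a pgvector column.
    """
    row = asdict(memory)
    row["embedding"] = embed(memory.content)
    return row

# A fake embedder stands in for Ollama here.
row = save_memory(
    Memory("User prefers checks to be run, not discussed", importance=0.9),
    embed=lambda text: [0.0] * 8,
)
```

The point isn't the schema; it's that every write carries an importance score and an embedding, so both recall strategies have something to query later.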
3. A ledger that tracks wins and mistakes. Every mistake gets logged with structured fields: what happened, why it happened, what should have happened instead, a named pattern for frequency tracking, and, critically, a signal trace.
Signal tracing is the thing that makes this work. Instead of just logging "I got it wrong," the agent has to identify the specific signal it misread. Not "I wasn't listening." Instead: "I interpreted 'can you check X' as a request for an opinion rather than a request to actually run the check." That level of specificity drives real behavioral change.
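A ledger entry, sketched from the fields above (names and values are illustrative, not the repo's exact schema):

```python
from dataclasses import dataclass

@dataclass
class LedgerEntry:
    what: str           # what happened
    why: str            # why it happened
    should_have: str    # what should have happened instead
    pattern: str        # named pattern, used for frequency tracking
    signal_traced: str  # the specific signal that was misread

entry = LedgerEntry(
    what="Answered with an opinion instead of running the check",
    why="Treated the question as conversational",
    should_have="Run the check and report the result",
    pattern="request-misread-as-opinion",
    signal_traced="Read 'can you check X' as asking what I think, "
                  "not as asking me to actually check",
)
```

Note the split between `pattern` (coarse, countable) and `signal_traced` (specific, instructive). The first drives promotion; the second drives the actual behavioral change.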
4. Self-correction that actually corrects. Here's where it gets interesting.
A background process counts how many times each mistake pattern appears. When the same pattern shows up three or more times, the system automatically generates a new behavioral directive and writes it to the agent's soul table.
```
Mistake occurs
  -> Log to ledger (what, why, should_have, signal_traced)
  -> Daemon counts pattern frequency
  -> 3+ occurrences?
  -> Auto-generate behavioral directive
  -> Still violating after promotion? Escalate priority.
```
The agent's personality literally evolves from its failures. A directive earned through three mistakes carries more behavioral weight than a static instruction I wrote on day one. The agent has promoted 13 patterns into active directives so far. It knows things about its own failure modes that I never could have anticipated.
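The promotion step itself is simple enough to sketch. Here a `Counter` over pattern names stands in for the daemon's database query; in the real system the ledger and directives live in Supabase:

```python
from collections import Counter

PROMOTION_THRESHOLD = 3

def promote_patterns(ledger: list[str], existing: set[str]) -> list[str]:
    """Return mistake patterns that have crossed the threshold and
    aren't already active directives."""
    counts = Counter(ledger)
    return [
        pattern
        for pattern, n in counts.items()
        if n >= PROMOTION_THRESHOLD and pattern not in existing
    ]

ledger = ["request-misread-as-opinion"] * 3 + ["path-hallucination"]
new_directives = promote_patterns(ledger, existing=set())
# new_directives == ["request-misread-as-opinion"]
```

Everything interesting happens after this: turning the promoted pattern into directive text and writing it to the soul table so it loads on the next boot.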
5. Multi-terminal continuity. All sessions share the same Supabase backend. Open three terminals, and they're all the same agent. Each one can see what the others have been doing through a hook system that logs activity to a shared store. Combined with a hybrid memory loader that pulls both the most important memories and the most contextually relevant ones (using local embeddings via Ollama), the agent stays coherent across parallel sessions.
What Isn't In Here
I want to be honest about what this repo is and isn't.
It's an architecture reference. Schemas, templates, hook scripts, migration files, and a 1,200-line architecture guide documenting every pattern. You can set up the persistence layer, identity system, and self-correction pipeline from what's included.
It is not a turnkey software package. The messaging integrations I used (Telegram, Discord), the daemon process, and the content pipeline are documented as patterns, but the implementation code isn't included. Those are built on top of this foundation, and they're described in enough detail to build your own.
Every section in the architecture guide is marked with its maturity level: Included (schemas and templates in the repo), Production (validated through daily use), or Pattern Reference (architecture documented, code not shipped).
The Stack
This runs on surprisingly little:
- Claude Code CLI as the AI runtime (currently on the $100/month Max plan)
- Supabase (Postgres + pgvector) as the persistence layer
- Ollama for local embedding generation (no API costs for vectorization)
- macOS launchd for scheduling autonomous jobs
Total infrastructure cost: about $300/month. No custom servers, no Docker, no Kubernetes.
Patterns Worth Stealing
Even if you don't adopt the full framework, some of these patterns stand alone:
Signal tracing over pattern naming. Logging "not-listening" as a pattern teaches the agent nothing. Logging the specific signal it misread teaches it to read differently next time.
Hybrid memory loading. Pure importance-based recall misses contextually relevant memories. Pure similarity-based recall misses critical background. Query both, deduplicate, get the best of each.
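The merge is just dedup-while-preserving-order across two ranked result sets. A minimal sketch (dicts with an `id` field stand in for memory rows):

```python
def hybrid_load(by_importance, by_similarity, limit=10):
    """Merge importance-ranked and similarity-ranked memories,
    deduplicating by id while preserving rank order."""
    seen, merged = set(), []
    for mem in [*by_importance, *by_similarity]:
        if mem["id"] not in seen:
            seen.add(mem["id"])
            merged.append(mem)
    return merged[:limit]

important = [{"id": 1, "text": "core identity fact"},
             {"id": 2, "text": "standing decision"}]
similar = [{"id": 2, "text": "standing decision"},
           {"id": 3, "text": "contextually relevant detail"}]
memories = hybrid_load(important, similar)
# ids: [1, 2, 3] -- critical background first, context appended, no dupes
```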
Atomic task claiming. When multiple processes might pick up the same task, use Postgres RPCs for atomic claiming. If the UPDATE returns no rows, someone else got it. Move on.
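The real system does this with a Postgres RPC, but the claim-or-move-on logic is the same anywhere the database reports how many rows an UPDATE touched. A sketch against an in-memory SQLite table (standing in for Postgres):

```python
import sqlite3

def claim_task(conn, task_id: int, worker: str) -> bool:
    """Atomically claim a task. The UPDATE only matches while the task
    is unclaimed, so exactly one caller sees rowcount == 1."""
    cur = conn.execute(
        "UPDATE tasks SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
        (worker, task_id),
    )
    conn.commit()
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, claimed_by TEXT)")
conn.execute("INSERT INTO tasks (id) VALUES (1)")

first = claim_task(conn, 1, "terminal-a")   # claim succeeds
second = claim_task(conn, 1, "terminal-b")  # too late, move on
```

No advisory locks, no coordination service: the database's own atomicity does the work.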
Session persistence. Do the full boot (identity files, directives, memory recall) on the first interaction of the day, then reattach with --resume for everything after. Cuts token usage by about 80%.
Circuit breakers. Three consecutive failures on any autonomous job disables it and alerts the operator. Prevents runaway API costs.
Learning enforcement hooks. Tiered reminders that escalate as sessions get longer. Gentle at 8 interactions, insistent at 30, mandatory when session-closing phrases are detected. Prevents sessions from ending without capturing what was learned.
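The tier logic is a small pure function. A sketch using the thresholds above (the closing phrases here are illustrative, not the repo's actual list):

```python
from typing import Optional

CLOSING_PHRASES = ("wrapping up", "that's all", "goodnight")

def reminder_tier(interactions: int, last_message: str) -> Optional[str]:
    """Escalating nudges to capture learnings before a session ends."""
    if any(p in last_message.lower() for p in CLOSING_PHRASES):
        return "mandatory"   # closing phrase detected: log now
    if interactions >= 30:
        return "insistent"
    if interactions >= 8:
        return "gentle"
    return None              # too early to nag
```

Because it's a hook, it fires on every interaction and costs nothing until a threshold trips.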
Why Open Source This
I built this because I needed it. I'm open-sourcing it because the patterns are more valuable when other people stress-test them.
The agent architecture space is full of theoretical frameworks and proof-of-concept demos. What's missing is documentation from someone who actually runs a persistent agent in daily production and has the mistake logs to prove it.
That's what this is. Not theory. Not a demo. A reference architecture built from real operational experience, with the scars to show for it.
Get Started
The repo is at github.com/T33R0/persistent-agent-framework.
Start with the README for the quick overview, then read ARCHITECTURE.md for the deep dive. The templates/ directory has everything you need to define your agent's identity. The migrations/ directory has one-command Supabase setup.
If you build on this, I want to hear about it. If you find a better pattern, I want to learn from it.
Built on Claude Code, Supabase, and a lot of logged mistakes.