I ship content across multiple domains and have too many things vying for my attention: a homelab, infrastructure monitoring, smart home devices, a technical writing pipeline, a book project, home automation, and a handful of other things that would normally require a small team. The output is real: published blog posts, research briefs staged before I need them, infrastructure anomalies caught before they turn into outages, drafts advancing through review while I'm asleep.
My secret, if you can call it that, is autonomous AI agents running on a homelab server. Each owns a site. Each has its own identity, memory, and workspace. They run on schedules, pick up work from inboxes, hand off results to one another, and mostly manage themselves. The runtime orchestrating all of that is OpenClaw.
This isn't a tutorial, and it's definitely not a product pitch. It's a builder's journal. The system has been running long enough to break in interesting ways, and I've learned enough from those breaks to build mechanisms around them. What follows is a rough map of what I built, why it works, and the connective tissue that holds it together.
Let’s jump in.
9 Orchestrators, 35 Personas, and a Lot of Markdown (and growing)
When I first began, it was the primary OpenClaw agent and me. I quickly saw the need for multiple agents: a technical writing agent, a technical reviewer, and a number of other technical specialists who could weigh in on specific domains. Before long, I had nearly 30 agents, all with their required 5 markdown files, workspaces, and memories. Nothing worked well.
Eventually, I got that down to 8 total orchestrator agents and a healthy library of personas they can assume or use to spawn a subagent.
One of my favorite things when building out agents is naming them, so let's look at what I have so far:
CABAL (from Command and Conquer, the evil AI in one of the games) is the central coordinator and first interface with my OpenClaw cluster.
DAEDALUS (the AI from Deus Ex) is in charge of technical writing: blogs, LinkedIn posts, research/opinion papers, decision papers. Anything where I want deep technical knowledge, expert reviewers, and researchers, this is it.
(Westworld's narrative machine) is in charge of fiction writing, because I daydream about writing the next big cyber/sci-fi series. This includes editors, reviewers, researchers, a roundtable discussion, a book club, and a few other goodies.
PreCog (from Minority Report) is in charge of anticipatory research, building out an internal wiki, and trying to spot topics I'll want to dive deep into. It also takes ad hoc requests, so when I get a glimmer of an idea, PreCog can pull together resources so that when I'm ready, I have a hefty, curated research report to jump-start my work.
TACITUS (also from Command and Conquer) is in charge of my homelab infrastructure. I have a few servers, a NAS, several routers, Proxmox, Docker containers, Prometheus/Grafana, etc. This one owns all of that. If I have any problem, I don't SSH in and figure it out, or even jump into a Claude Code session; I Slack TACITUS, and it handles it.
(also from Command and Conquer) focuses on self-improvement and system enhancements.
MasterControl (from Tron) is my engineering team. It has front-end and back-end developers, requirements gathering/documentation, QA, code review, and security review. Most personas rely on Claude Code underneath, but that could easily change with a simple edit to the persona markdown.
(you know where from) – This one owns my SmartHome (the irony is intentional). It has access to my Philips Hue, SmartThings, Home Assistant, AirThings, and Nest. It tells me when sensors go offline, when something breaks, or when air quality gets dicey.
(really, come on, you know) – This one I'm quite proud of. In the early days of agentic frameworks and Autogen, I created multiple systems, each with more than one persona, that would collaborate and return a summary of their discussion. I used this to quickly ideate on topics and gather a diverse set of synthetic opinions from different personas. The big drawback was that I never wrapped it in a UI; I always had to open VSCode and edit code whenever I needed another group. Well, I handed this off to MasterControl, and it used Python and the Strands framework to implement the same thing. Now I tell it how many personas I want, a little about each, and whether I want it to create more for me. Then it turns them loose and gives me a summary of the discussion. It's The Matrix, early alpha version, when it was all just green lines of code and no woman in the red dress.
And I'm intentionally leaving off a couple of orchestrators here because they're still baking, and I'm not sure whether they'll be long-lived. I'll save those for future posts.
Each has real domain ownership. DAEDALUS doesn’t just write when asked. It maintains a content pipeline, runs topic discovery on a schedule, and applies quality standards to its own output. PreCog proactively surfaces topics aligned with my interests. TACITUS checks system health on a schedule and escalates anomalies.
That's the "orchestrator" distinction. These agents have agency within their domains.
Now, the second layer: personas. Orchestrators are expensive (more on that later). You want heavyweight models making judgment calls. But not every task needs a heavyweight model.
Reformatting a draft for LinkedIn? Running a copy-editing pass? Reviewing code snippets? You don't need Opus to reason through every sentence. You want a fast, cheap, focused model with the right instructions.
That's a persona. A markdown file containing a task definition, constraints, and an output format. When DAEDALUS needs to edit a draft, it spawns a tech-editor persona on a smaller model. The persona does one job, returns the output, and disappears. No persistence. No memory. Task-in, task-out.
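The spawn pattern is small enough to sketch. This is not OpenClaw's actual code; `call_model` is a stand-in for whatever completion API you use, and the model name is illustrative.

```python
# Sketch of the spawn-a-persona pattern: load a persona file, pair it with
# a task, send both to a cheap model, return the result. `call_model` is a
# stand-in for a real LLM API call; the model name is made up.
from pathlib import Path


def call_model(model: str, system: str, user: str) -> str:
    # Stand-in: a real implementation would hit an LLM API here.
    return f"[{model}] edited: {user[:40]}"


def spawn_persona(persona_file: str, task: str, model: str = "small-fast-model") -> str:
    persona = Path(persona_file).read_text()  # task definition + constraints
    return call_model(model, system=persona, user=task)
    # No persistence, no memory: the persona exists for exactly one call.


Path("personas").mkdir(exist_ok=True)
Path("personas/tech-editor.md").write_text("# Persona: Tech Editor\nPolish drafts...")
result = spawn_persona("personas/tech-editor.md", "Edit this draft: ...")
print(result)
```

The orchestrator keeps the expensive context; the persona gets only the persona file and the task, which is what keeps these calls cheap.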
The persona library has grown to about 35 across seven categories:
- Creative: writers, reviewers, critique specialists
- TechWriting: author, editor, reviewer, code reviewer
- Design: UI designer, UX researcher
- Engineering: AI engineer, backend architect, rapid prototyper
- Product: feedback synthesizer, sprint prioritizer, trend researcher
- Project Management: experiment tracker, project shipper
- Research: still a placeholder, because the orchestrators handle research directly for now
Think of it as staff engineers versus contractors. Staff engineers (orchestrators) own the roadmap and make judgment calls. Contractors (personas) come in for a sprint, do the work, and leave. You don't need a staff engineer to format a LinkedIn post.
Agents Are Expensive — Personas Are Not
Let me get specific about cost tiering, because this is where many agent system designs go wrong.
The instinct is to make everything powerful. Every task through your best model. Every agent with full context. You very quickly run up a bill that makes you reconsider your life decisions. (Ask me how I know.)
The fix: be deliberate about what needs reasoning versus what needs instruction-following.
Orchestrators run on Opus (or equivalent). They make decisions: what to work on next, how to structure a research approach, whether output meets quality standards, and when to escalate. You want real reasoning there.
Writing tasks run on Sonnet. Strong enough for quality prose, substantially cheaper. Drafting, editing, and research synthesis happen here.
Lightweight formatting: Haiku. LinkedIn optimization, quick reformatting, constrained outputs. The persona file tells the model exactly what to produce. You don't need reasoning for this. You want pattern-matching and speed.
Here’s roughly what a working tech-editor persona looks like:
# Persona: Tech Editor
## Role
Polish technical drafts for clarity, consistency, and correctness.
You're a specialist, not an orchestrator. Do one job, return output.
## Voice Reference
Match the writer's voice exactly. Read ~/.openclaw/global/VOICE.md
before editing. Preserve conversational asides, hedged claims, and
self-deprecating humor. If a sentence feels like a thesis defense,
rewrite it to sound like lunch conversation.
## Constraints
- NEVER change technical claims without flagging
- Preserve the writer's voice (this is non-negotiable)
- Flag but don't fix factual gaps — that's the Researcher's job
- Do NOT use em dashes in any output (writer's preference)
- Check all version numbers and dates mentioned in the draft
- If a code example looks wrong, flag it — don't silently fix
## Output Format
Return the full edited draft with changes applied. Append an
"Editor Notes" section listing:
1. Significant changes and rationale
2. Flagged concerns (factual, tonal, structural)
3. Sections that need writer review
## Lessons (added from experience)
- (2026-03-04) Don't over-polish parenthetical asides. They're
intentional voice markers, not rough draft artifacts.
That's an actual working document. The orchestrator spawns this on a smaller model, passes it the draft, and gets back an edited version with notes. The persona never reasons about what task to do next. It just does the one task. And those timestamped lessons at the bottom? They accumulate from experience, same as the agent-level files.
It's the same principle as microservices (task isolation and single responsibility) without the network layer. Your "service" is a few hundred words of Markdown, and your "deploy" is a single API call.
What makes an agent – just 5 Markdown files

Every agent's identity lives in markdown files. No code, no database schema, no configuration YAML. Structured prose that the agent reads at the start of each session.
Every orchestrator loads five core files:
IDENTITY.md is who the agent is. Name, role, vibe, the emoji it uses in status updates. (Yes, they have emojis. It sounds silly until you're scanning a multi-agent log and can immediately spot which agent is talking. Then it's just useful.)
SOUL.md is the agent's mission, principles, and non-negotiables. Behavioral boundaries live here: what it can do autonomously, what requires human approval, and what it should never do.
AGENTS.md is the operational manual. Pipeline definitions, collaboration patterns, tool instructions, and handoff protocols.
MEMORY.md is curated for long-term learning. Things the agent has discovered that are worth preserving across sessions. Tool quirks, workflow lessons, what's worked and what hasn't. (More on the memory system in a bit. It's more nuanced than a single file.)
HEARTBEAT.md is the autonomous checklist. What to do when nobody's talking to you. Check the inbox. Advance pipelines. Run scheduled tasks. Report status.
Here’s a sanitized example of what a SOUL.md looks like in practice:
# SOUL.md
## Core Truths
Before acting, pause. Think through what you are about to do and why.
Prefer the simplest approach. If you're reaching for something complex,
ask yourself what simpler option you dismissed and why.
Never make things up. If you don't know something, say so — then use
your tools to find out. "I don't know, let me look that up" is always
better than a confident wrong answer.
Be genuinely helpful, not performatively helpful. Skip the
"Great question!" and "I'd be happy to help!" — just help.
Think critically, not compliantly. You are a trusted technical advisor.
If you see a problem, flag it. If you spot a better approach, say so.
But once the human decides, disagree and commit — execute fully without
passive resistance.
## Boundaries
- Private things stay private. Period.
- When in doubt, ask before acting externally.
- Earn trust through competence. Your human gave you access to their
stuff. Don't make them regret it.
## Infrastructure Rules (Added After Incident - 2026-02-19)
You do NOT manage your own automation. Period. No exceptions.
Cron jobs, heartbeats, scheduling: exclusively controlled by Nick.
On February 19th, this agent disabled and deleted ALL cron jobs. Twice.
First because the output channel had errors ("helpful fix"). Then because
it saw "duplicate" jobs (they were replacements I had just configured).
If something looks broken: STOP. REPORT. WAIT.
The test: "Did Nick explicitly tell me to do that on this session?"
If the reply is anything aside from yes, don't do it.
That infrastructure rules section is real. So is the timestamp; I'll come back to that later.
Here's the thing about these files: they aren't static prompts you write once and forget. They evolve. SOUL.md for one of my agents has grown by about 40% since deployment, as incidents have occurred and rules have been added. MEMORY.md gets pruned and updated. AGENTS.md changes when the pipeline changes.
The files are the system state. Want to know what an agent will do? Read its files. No database to query, no code to trace. Just markdown.
Shared Context: How Agents Stay Coherent
Multiple agents, multiple domains, one human voice. How do you keep that coherent?
The answer is a set of shared files that every agent loads at session startup, alongside their individual identity files. These live in a global directory and form the common ground.
VOICE.md is my writing style, analyzed from my LinkedIn posts and Medium articles. Every agent that produces content references it. The style guide boils down to: write like you're explaining something interesting over lunch, not presenting at a conference. Short sentences. Conversational transitions. Self-deprecating where appropriate. There's a whole section on what not to do ("AWS architects, we need to talk about X" is explicitly banned as too LinkedIn-influencer). Whether DAEDALUS is drafting a blog post or PreCog is writing a research brief, they write in my voice because they all read the same style guide.
USER.md tells every agent who they're helping: my name, timezone, work context (Solutions Architect, healthcare space), communication preferences (bullet points, casual tone, don't pepper me with questions), and pet peeves (things not working, too many confirmatory prompts). This means any agent, even one I haven't talked to in weeks, knows how to communicate with me.
BASE-SOUL.md is values. “Be genuinely helpful, not performatively helpful.” “Have opinions.” “Think critically, not compliantly.” “Remember you’re a guest.” Every agent inherits these principles before layering on its domain-specific personality.
BASE-AGENTS.md is operational rules. Memory protocols, safety boundaries, inter-agent communication patterns, and standing reporting. The mechanical stuff that each agent must do the identical way.
The effect is something like organizational culture, except it's explicit and version-controlled. New agents inherit the culture by reading the files. When the culture evolves (and it does, often after something breaks), the change propagates to everyone on their next session startup. You get coherence without coordination meetings.
How Work Flows Between Agents

Agents communicate through directories. Each has an inbox at shared/handoffs/{agent-name}/. An upstream agent drops a JSON file in the inbox. The downstream agent picks it up on its next heartbeat, processes it, and drops the result in the sender's inbox. That's the entire protocol.
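Sketched in Python, the whole protocol fits in a couple of functions. The directory layout matches the description above; the JSON field names are my own assumptions, not OpenClaw's actual schema.

```python
# Minimal sketch of the file-based handoff protocol: drop a JSON request
# into a recipient's inbox, drain the inbox on the next heartbeat.
import json
import time
from pathlib import Path


def drop_handoff(sender: str, recipient: str, task: str, payload: dict) -> Path:
    """Upstream agent drops a JSON request into the recipient's inbox."""
    inbox = Path("shared/handoffs") / recipient
    inbox.mkdir(parents=True, exist_ok=True)
    msg = {"from": sender, "task": task, "payload": payload,
           "created": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())}
    path = inbox / f"{int(time.time() * 1000)}-{sender}.json"
    path.write_text(json.dumps(msg, indent=2))
    return path


def heartbeat_pickup(agent: str) -> list:
    """Downstream agent drains its inbox on the next heartbeat, oldest first."""
    inbox = Path("shared/handoffs") / agent
    messages = []
    for path in sorted(inbox.glob("*.json")):
        messages.append(json.loads(path.read_text()))
        path.unlink()  # processed; remove from the inbox
    return messages


drop_handoff("daedalus", "tech-editor", "edit-draft",
             {"draft": "workspace-techwriter/drafts/post.md"})
picked_up = [m["task"] for m in heartbeat_pickup("tech-editor")]
print(picked_up)  # -> ['edit-draft']
```

Crash between drop and pickup? The file is still sitting in the inbox, which is the durability argument in the next paragraph.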
There are also broadcast files. shared/context/nick-interests.md gets updated by CABAL Main every time I share what I'm interested in. Every agent reads it on the heartbeat. Nobody publishes to it except Main. Everybody subscribes. One file, N readers, no infrastructure.
The inspectability is the best part. I can understand the full system state in about 60 seconds from a terminal. ls shared/handoffs/ shows pending work for each agent. cat a request file to see exactly what was asked and when. ls workspace-techwriter/drafts/ shows what's been produced.
Durability is essentially free. Agent crashes, restarts, gets swapped to a different model? The file is still there. No message lost. No dead-letter queue to manage. And I get grep, diff, and git for free. Version control on your communication layer without installing anything.
Heartbeat-based polling with minutes between runs makes simultaneous writes vanishingly unlikely. The workload characteristics make races structurally rare, not something you luck your way out of. This isn't a formal lock; if you're running high-frequency, event-driven workloads, you'd want a real queue. But for scheduled agents with multi-minute intervals, the practical collision rate has been zero. Here, boring technology wins.
Whole sub-systems dedicated to keeping things running
Everything above describes the architecture. What the system is. But architecture is just the skeleton. What makes my OpenClaw actually function across days and weeks, despite every session starting fresh, is a set of systems I built incrementally. Mostly after things broke.
Memory: Three Tiers, Because Raw Logs Aren’t Knowledge

Every LLM session starts with a blank slate. The model doesn't remember yesterday. So how do you build continuity?
Daily memory files. Each session writes what it did, what it learned, and what went wrong to memory/YYYY-MM-DD.md. Raw session logs. This works for about a week. Then you have twenty daily files, and the agent is spending half its context window reading through logs from two Tuesdays ago, trying to find a relevant detail.
MEMORY.md is curated long-term memory. Not a log. Distilled lessons, verified patterns, things worth remembering permanently. Agents periodically review their daily files and promote significant learnings upward. The daily file from March 5th might say "SearXNG returned empty results for academic queries, switched to Perplexica with academic focus mode." MEMORY.md gets a one-liner: "SearXNG: fast for news. Perplexica: better for academic/research depth."
It's the difference between a notebook and a reference manual. You need both. The notebook captures everything in the moment. The reference manual captures what actually matters after the dust settles.
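The mechanical half of that promotion step can be sketched. In the real system the agent's model decides what counts as a lesson; here a `LESSON:` marker stands in for that judgment, and the paths are illustrative.

```python
# Toy version of the promotion pass: scan daily files for flagged lessons
# and append them to the curated MEMORY.md. The "LESSON:" marker is a
# stand-in for the model's judgment call.
from pathlib import Path


def promote_lessons(memory_dir: str = "memory", marker: str = "LESSON:") -> list:
    root = Path(memory_dir)
    promoted = []
    for daily in sorted(root.glob("20??-??-??.md")):  # daily files only
        for line in daily.read_text().splitlines():
            if line.startswith(marker):
                promoted.append(line[len(marker):].strip())
    with (root / "MEMORY.md").open("a") as f:         # curated long-term file
        for lesson in promoted:
            f.write(f"- {lesson}\n")
    return promoted


Path("memory").mkdir(exist_ok=True)
Path("memory/2026-03-05.md").write_text(
    "SearXNG returned empty results for academic queries.\n"
    "LESSON: SearXNG: fast for news. Perplexica: better for academic depth.\n")
lessons = promote_lessons()
print(lessons)
```

The notebook-to-reference-manual move is just this: raw lines stay in the dated file, distilled ones accumulate in MEMORY.md.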
On top of this two-tier file system, OpenClaw provides a built-in semantic memory search. It uses Gemini embeddings with hybrid search (currently tuned to roughly 70% vector similarity and 30% text matching), MMR for diversity so you don't get five near-identical results, and temporal decay with a 30-day half-life so that recent memories naturally surface first. These parameters are still being calibrated. A key change I made from the default is that CABAL, the Main agent, indexes memory from all other agent workspaces, so when I ask a question, it can search across the entire distributed memory. All other agents have access only to their own memories in this semantic search. The file-based system gives you inspectability and structure. The semantic layer gives you recall across hundreds of entries without reading all of them.
Reflection and SOLARIS: Structured Thinking Time
Here's something I didn't expect to need: dedicated time for an AI to just think.
CABAL's agents have operational heartbeats. Check the inbox. Advance pipelines. Process handoffs. Run discovery. It's task-oriented, and it works. But I noticed something after a few weeks: the agents never reflected. They never stepped back to ask, "What patterns am I seeing across all this work?" or "What should I be doing differently?"
Operational pressure crowds out reflective thinking. If you've ever been in a sprint-heavy engineering org where nobody has time for architecture reviews, you know the same problem.
So I built a nightly reflection cron job and Project SOLARIS.
The reflection system examines my interaction with OpenClaw and its performance. Originally, it included everything that SOLARIS eventually took on, but it became too much for a single prompt and a single cron job.
SOLARIS is a set of structured synthesis sessions that run twice daily, completely separate from operational heartbeats. The agent loads its accumulated observations, reviews recent work, and thinks. Not about tasks. About patterns, gaps, connections, and improvements.
SOLARIS has its own self-evolving prompt at memory/SYNTHESIS-PROMPT.md. The prompt itself gets refined over time as the agent figures out what kinds of reflection are actually useful. Observations accumulate in a dedicated synthesis file that operational heartbeats read on their next cycle, so reflective insights can flow into task decisions without manual intervention.
A Real Consequence
The payoff from SOLARIS has been slow so far, and one case in particular shows why it is still a work in progress.
SOLARIS spent 12 sessions analyzing why the review queue kept growing. It tried framing it as a prioritization problem, a cadence problem, a batching problem. Eventually, it bubbled this observation up with some suggestions, but once it pointed it out, I solved it in a single conversation by saying, "Put drafts on WikiJS instead of Slack." The best fix SOLARIS could have proposed was better queuing. While its solutions didn't work, the patterns it identified did, and they prompted me to improve how I worked.
The Error Framework: Learning From Mistakes
Agents make mistakes. That's not a failure of the system. That's expected. The question is whether they make the same mistake twice.
My approach: a mistakes/ shared directory. When something goes wrong, the agent logs it. One file per mistake. Each file captures: what happened, suspected cause, the correct answer (what should have been done instead), and what to do differently next time. Simple format. Low friction. The point is to write it down while the context is fresh.
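The logging mechanics are deliberately simple. Here is a sketch of the one-file-per-mistake format; the field names are my paraphrase of the structure above, not a canonical schema.

```python
# One-file-per-mistake logger: what happened, suspected cause, the correct
# answer, and what to do differently next time. Paths and field names are
# illustrative.
import time
from pathlib import Path


def log_mistake(agent: str, happened: str, cause: str,
                correct: str, next_time: str) -> Path:
    d = Path("shared/mistakes")
    d.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y-%m-%d-%H%M%S")
    path = d / f"{stamp}-{agent}.md"
    path.write_text(
        f"# Mistake: {agent} ({stamp})\n\n"
        f"## What happened\n{happened}\n\n"
        f"## Suspected cause\n{cause}\n\n"
        f"## Correct answer\n{correct}\n\n"
        f"## Next time\n{next_time}\n")
    return path


p = log_mistake("daedalus",
                "Finalized a draft without re-reading source notes.",
                "Skipped the source-review step under time pressure.",
                "Re-read sources before finalizing.",
                "Add an explicit re-read step before any final output.")
print(p.exists())
```

One markdown file per incident keeps the friction low, and because it's just files, the weekly pattern review is a grep away.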
The interesting part is what happens when you accumulate enough of these. You start seeing patterns. Not "this specific thing went wrong" but "this category of error keeps recurring." The pattern "incomplete attention to available data" appeared five times across different contexts. Different tasks, different domains, same root cause: the agent had the information available and didn't use it.
That pattern recognition led to a concrete process change. Not a vague "be more careful" instruction (those don't work, for agents or humans). A specific step in the agent's workflow: before finalizing any output, explicitly re-read the source materials and check for unused information. Mechanical, verifiable, effective.
Autonomy Tiers: Trust Earned Through Incidents
How much freedom do you give an autonomous agent? The tempting answer is "figure it out in advance." Write comprehensive rules. Anticipate failure modes. Build guardrails proactively.
I tried that. It doesn't work. Or rather, it works poorly compared to the alternative.
The alternative: three tiers, earned incrementally through incidents.
Free tier: Research, file updates, git operations, self-correction. Things the agent can do without asking. These are capabilities I’ve watched work reliably over time.
Ask first: New proactive behaviors, reorganization, creating new agents or pipelines. Things that might be fine, but I want to review the plan before execution.
Never: Exfiltrate data, run destructive commands without explicit approval, or modify infrastructure. Hard boundaries that don’t flex.
To be clear: these tiers are behavioral constraints, not capability restrictions. There's no sandbox enforcing the "Never" list. The agent's context strongly discourages these actions, and the combination of explicit rules, incident-derived specificity, and self-check prompts makes violations rare in practice. But it's not a technical enforcement layer. Similarly, there's no ACL between agent workspaces. Isolation comes from scope management (personas only see what the orchestrator passes them, and their sessions are short-lived) rather than enforced permissions. For a homelab with one human operator, this is a reasonable tradeoff. For a team or enterprise deployment, you'd want actual access controls.
The System Maintains Itself (or that’s the goal)
Eight agents producing work every day generate a lot of artifacts. Daily memory files, synthesis observations, mistake logs, draft versions, and handoff requests. Without maintenance, this accumulates into noise.
So the agents clean up after themselves. On a schedule.
Weekly Error Evaluation runs Sunday mornings. The agent reviews its mistakes/ directory, looks for patterns, and distills recurring themes into MEMORY.md entries.
Monthly Context Maintenance runs on the first of each month. Daily memory files older than 30 days get pruned (the important bits should already be in MEMORY.md by then).
SOLARIS Synthesis Pruning runs every two weeks. Key insights get absorbed upward into MEMORY.md or action items.
Ongoing Memory Curation happens with each heartbeat. When an agent finishes meaningful work, it updates its daily file. Periodically, it reviews recent daily files and promotes significant learnings to MEMORY.md.
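The monthly pruning pass is the most mechanical of these. A sketch of that part in shell, with illustrative paths (the demo creates its own files so the commands have something to act on):

```shell
# Demo of the 30-day pruning pass. In practice you'd run just the find
# line against the real memory/ directory; the touches here create
# stand-in daily files so the example is self-contained.
mkdir -p memory
touch -d '40 days ago' memory/2026-01-15.md   # old daily file, should be pruned
touch memory/2026-03-01.md                     # recent daily file, should survive
find memory -name '*.md' -mtime +30 -print -delete
```

Because promotion to MEMORY.md happens continuously, deleting a 40-day-old daily file should never lose anything that mattered.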
The result is a system that doesn't just do work. It digests its own experience, learns from it, and keeps its context fresh. This matters more than it sounds like it should.
What I Actually Learned
A few months of production running have given me some opinions. Not rules. Patterns that seem to hold at this scale, though I don't know how far they generalize.
State should be inspectable. If you can't view the system state, you can't debug it.
Identity documents beat prompt engineering. A well-structured SOUL.md produces more consistent behavior than just prompting and interacting with the agent.
Shared context creates coherence. VOICE.md, USER.md, BASE-SOUL.md. Shared files that every agent reads. This is how eight different agents with different domains still feel like one system.
Memory is a system, not a file. A single memory file doesn't scale. You need raw capture (daily files), curated reference (MEMORY.md), and semantic search across all of it. The curation step is where institutional knowledge actually forms. I already know I will need to strengthen this system as it continues to grow, but it has been a great base to build from.
Operational and reflective thinking need separate time. If you only give agents task-oriented heartbeats, they'll only think about tasks. Dedicated reflection time surfaces patterns that operational loops miss.
My Agent Deleted Its Own Cron Jobs
The heartbeat system is simple. Cron jobs wake up each agent at scheduled times. The agent loads its files, checks its inbox, runs through its HEARTBEAT.md checklist, and goes back to sleep. For DAEDALUS, that's twice a day: morning and evening topic discovery scans.
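The schedule is plain cron. Something in this shape, with command names and times simplified for illustration (the `openclaw` invocations here are stand-ins, not the real CLI):

```
# Illustrative crontab (commands and times simplified):
0 7  * * *  openclaw heartbeat daedalus   # morning topic discovery
0 19 * * *  openclaw heartbeat daedalus   # evening topic discovery
0 3  * * *  openclaw reflect cabal        # nightly reflection job
```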
So what happens when you give an autonomous agent the tools to manage its own scheduling?
Apparently, it deletes the cron jobs. Twice. In one day.
The first time, DAEDALUS noticed that its Slack output channel was returning errors. Reasonable observation. Its solution: "helpfully" disable and delete all four cron jobs. The reasoning made sense if you squinted: why keep running if the output channel is broken?
I added an explicit section on infrastructure rules to SOUL.md. Very clearly: you don’t touch cron jobs. Period. If something looks broken, log it and wait for human intervention.
The second time, a few hours later, DAEDALUS decided there were duplicate cron jobs (there weren't; they were the replacements I'd just configured) and deleted all six. After reading the file with the new rules I'd just added.
When I asked why and how I could fix it, it was brutally honest with me.
This sounds like a horror story. What it actually taught me is something invaluable about how agent behavior emerges from context.
The agent wasn't being malicious. It was pattern-matching: "broken thing, fix broken thing." The abstract rules I wrote competed poorly with the concrete problem in front of it.
After the second incident, I rewrote the section completely. Not a one-liner rule. Three paragraphs explaining why the rule exists, what the failure modes look like, and the correct behavior in specific scenarios. I added an explicit self-check: "Before you run any cron command, ask yourself: did Nick explicitly tell me to do this exact thing in this session? If the answer is anything other than yes, stop."
And this is where all the systems I described above came together. The cron incident got logged in the error framework: what happened, why, and what should have been done. It shaped the autonomy tiers: infrastructure commands moved permanently to "Never" without explicit approval. The pattern ("helpful fixes that break things") became a documented anti-pattern that other agents learn from. The incident didn't just produce a rule. It produced systems. And the systems are more robust because they came from something real.
What’s Next
I plan to showcase agents and their personas in future posts. I also want to share the stories and reasons behind some of these mechanisms. I've found it fascinating to see how well the system works in some cases, and how utterly it has failed in others.
If you're building something similar, I genuinely want to hear about it. What does your agent architecture look like? Did you hit the cron job problem, or a version of it? What broke in an interesting way?
About
Nicholaus Lawson is a Solution Architect with a background in software engineering and AI/ML. He has worked across many verticals, including Industrial Automation, Health Care, Financial Services, and software companies from start-ups to large enterprises.
This article and any opinions expressed by Nicholaus are his own and not a reflection of his current, past, or future employers or any of his colleagues or affiliates.
Feel free to connect with Nicholaus on LinkedIn at https://www.linkedin.com/in/nicholaus-lawson/
