Three Agents, One Container, 2 GB of RAM
Scaling from one AI agent to three on a single LXC container, with agent-to-agent delegation, per-agent MCP servers, and an n8n automation layer.
Part 6 of the series: Building The Hub
Three weeks after deploying Nightcrawler, I had a problem: one agent wasn’t enough. I needed specialists. Nightcrawler was good at general conversation and background processing, but when I asked about Hub architecture decisions or needed deep technical analysis, I wanted agents with different expertise and perspectives.
So I built two more. On the same 2 GB container.
The three agents
Each agent has a distinct identity, expertise, and communication style:
Nightcrawler — the original. Autonomous personal agent, background worker, capture processor. Addresses me as “Mein Freund.” Handles the daily operational work: processing captures, monitoring Hub state, proactive notifications. The generalist who keeps things running.
Gideon — Hub intelligence and navigator. Addresses me as “Captain.” Calm, composed, analytical. Gideon’s specialty is understanding the big picture — project status, cross-cutting concerns, strategic recommendations. When I need to understand how a change in one project affects three others, Gideon is who I ask.
JARVIS — technical engineering specialist. Addresses me as “Sir.” Precise, thorough, detail-oriented. JARVIS handles architecture reviews, code analysis, and technical deep dives. When I need someone to think through the implications of a database schema change or evaluate a library choice, JARVIS delivers structured, comprehensive analysis.
The naming convention continues the superhero theme. Gideon is the AI aboard the Waverider in DC’s Legends of Tomorrow, JARVIS from Iron Man. Each agent has identity files (identity.md, Brain.md, system-prompt.txt) that define their personality, knowledge boundaries, and behavioral guidelines.

Shared runtime, separate identity
The architecture follows a simple principle: share the infrastructure, isolate the identity.
All three agents run on a single container as separate systemd services. They share the same runtime code — the LLM pipeline, context guard, memory writer, and Hub reader are identical. What differs is the identity layer: each agent loads its own persona files, has its own system prompt, and maintains its own conversation context.
This means deploying a new agent is mostly a matter of writing identity files and creating a systemd service. The engineering work is done once; adding agents is a content problem, not a code problem.
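The identity layer can be sketched in a few lines. The file names (identity.md, Brain.md, system-prompt.txt) come from the setup above; the function name, concatenation order, and data shape are illustrative, not the actual implementation:

```typescript
import * as fs from "fs";
import * as path from "path";

interface AgentIdentity {
  name: string;
  systemPrompt: string; // concatenated persona files, fed to the shared LLM pipeline
}

// Persona files per agent, in the order they are stitched together (assumed order)
const IDENTITY_FILES = ["system-prompt.txt", "identity.md", "Brain.md"];

// Load one agent's identity layer from its directory. The runtime code
// (LLM pipeline, context guard, memory writer, Hub reader) stays shared;
// only this per-agent content differs.
export function loadIdentity(name: string, dir: string): AgentIdentity {
  const parts = IDENTITY_FILES
    .map((f) => path.join(dir, f))
    .filter((p) => fs.existsSync(p))
    .map((p) => fs.readFileSync(p, "utf8").trim());
  return { name, systemPrompt: parts.join("\n\n") };
}
```

With this shape, "adding an agent" really is just a new directory of persona files plus a systemd unit pointing the shared runtime at it.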
The resource footprint is surprisingly small. The baseline memory usage for the container sits at about 241 MB. Each agent’s Node.js process adds maybe 50-80 MB when idle. The heavy lifting happens in the CLI calls to Claude or Gemini, which spawn external processes that clean up after themselves. Three agents on 2 GB of RAM with plenty of headroom.
The marathon session
February 28th was one of those sessions where you start in the morning and look up to find it’s 2 AM. Here’s the timeline:
- 01:50 — Multi-agent system deployed. All three agents running, responding to messages independently.
- 09:45 — Hub Chat data model updated with channels and DMs. DMs are one-on-one conversations with a specific agent (type: 'dm', agent field). Channels are group conversations where multiple agents can participate (type: 'channel', agents[] array).
- 12:40 — Agent-to-agent delegation working. Hub Chat MCP server deployed.
- 14:25 — FOUNDRY deployed (n8n on a separate container).
- 16:15 — Microsoft To Do MCP integration live.
- 21:07 — Per-agent MCP servers operational.
That’s six major milestones in one session. Some days the flow state hits and you just ride it.
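The Hub Chat data model from the 09:45 milestone maps naturally onto a discriminated union. The `type`, `agent`, and `agents[]` fields are from the actual model; the helper function is an illustrative sketch of how the distinction gets used:

```typescript
// One-on-one conversation with a single agent
interface DmConversation {
  type: "dm";
  agent: string;      // the agent on the other end
}

// Group conversation where multiple agents participate
interface ChannelConversation {
  type: "channel";
  agents: string[];   // all participating agents
}

type Conversation = DmConversation | ChannelConversation;

// Which agents should process a new message? In a DM exactly one;
// in a channel, every participant.
function respondents(c: Conversation): string[] {
  return c.type === "dm" ? [c.agent] : c.agents;
}
```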
Agent-to-agent delegation
This is where things get interesting. Agents can consult each other using <delegate> tags in their responses. If you ask Gideon about a technical architecture question and it’s outside his expertise, he can delegate to JARVIS:
Gideon: That's an interesting architectural question, Captain.
Let me consult with JARVIS on the technical specifics.
<delegate to="jarvis">
Analyze the trade-offs between event-driven and polling-based
approaches for the capture processing pipeline.
</delegate>
JARVIS processes the delegated request and returns the analysis to Gideon, who incorporates it into his response. From the user’s perspective, you asked one agent and got a comprehensive answer that drew on another agent’s expertise.
The implementation is in-process — no Firestore polling or external API calls. The delegating agent calls the target agent’s processing function directly, passes the context, and gets the result back synchronously. This keeps the latency low and avoids the complexity of async inter-agent communication.
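A minimal sketch of that in-process resolution, assuming each agent's processing can be represented as a function (the real pipeline calls an LLM and would be async; the `<delegate to="...">` tag format is from the example above, the registry and regex are illustrative):

```typescript
type AgentFn = (prompt: string) => string;

// Matches <delegate to="target">request</delegate>, capturing target and request
const DELEGATE_RE = /<delegate to="([^"]+)">([\s\S]*?)<\/delegate>/g;

// Replace each <delegate> tag in an agent's draft response with the target
// agent's answer, by calling its processing function directly -- no Firestore
// polling, no external API hop.
function resolveDelegations(
  draft: string,
  agents: Map<string, AgentFn>
): string {
  return draft.replace(DELEGATE_RE, (match, target: string, request: string) => {
    const fn = agents.get(target);
    return fn ? fn(request.trim()) : match; // unknown target: leave the tag as-is
  });
}
```

The synchronous call is what keeps latency low: the delegating agent blocks on the result and weaves it into its own reply before anything is written back to the chat.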

Channel concurrency
Channels introduced a concurrency challenge. In a DM, only one agent processes each message. In a channel, multiple agents might need to respond to the same message. The naive approach — using a Firestore transaction to claim the message with a status: 'thinking' flag — would mean only one agent could process it.
The fix was to skip the status gate for channel messages. Each agent processes the message independently without trying to claim exclusive access. The transaction still exists for DMs (where you don’t want duplicate responses), but channels let all participating agents run in parallel.
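The claim logic reduces to a small branch. This sketch stands in for the Firestore transaction with an in-memory map (the `status: 'thinking'` flag is from the actual model; the function and store are illustrative):

```typescript
type MessageStatus = "new" | "thinking" | "done";

// Stand-in for the Firestore message documents
const statuses = new Map<string, MessageStatus>();

// Returns true if this agent should process the message.
function tryClaim(messageId: string, conversationType: "dm" | "channel"): boolean {
  if (conversationType === "channel") {
    return true; // no status gate: every participating agent runs in parallel
  }
  // DM: claim exclusively (a real version does this inside a Firestore
  // transaction) so only one agent responds
  if ((statuses.get(messageId) ?? "new") !== "new") return false;
  statuses.set(messageId, "thinking");
  return true;
}
```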
FOUNDRY: the automation layer
Running alongside the agents on its own dedicated container is FOUNDRY — an n8n instance that serves as the shared automation layer. While the agents handle conversations and analysis, FOUNDRY handles integrations.
n8n workflows expose functionality as MCP tools that the agents can call. The first integrations:
- Microsoft To Do — agents can read, create, and update tasks in my To Do lists. When Nightcrawler processes captures and identifies action items, it can create todos directly.
- YouTube — a later addition that follows the same pattern: workflows that wrap API calls and expose them as tools.
The separation is intentional. Agents shouldn’t need API credentials or integration logic. They call MCP tools, and FOUNDRY handles the plumbing. If I need to change how the To Do integration works, I update the n8n workflow — the agents don’t need to know or care.
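The separation can be pictured as a thin tool client on the agent side. Everything here is hypothetical naming, purely to illustrate the boundary: agents call tools by name, and whatever sits behind the tool (an n8n workflow, credentials, retry logic) is invisible to them:

```typescript
type ToolHandler = (args: Record<string, unknown>) => unknown;

class ToolClient {
  private handlers = new Map<string, ToolHandler>();

  // In the real system registration would come from MCP discovery; here a
  // local handler stands in for a FOUNDRY workflow behind an endpoint.
  register(name: string, handler: ToolHandler): void {
    this.handlers.set(name, handler);
  }

  call(name: string, args: Record<string, unknown>): unknown {
    const h = this.handlers.get(name);
    if (!h) throw new Error(`unknown tool: ${name}`);
    return h(args); // the agent never touches API credentials
  }
}
```

Swapping out how a tool works means re-registering its handler (re-deploying the workflow); the agent-side call site never changes.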
Per-agent MCP servers
Each agent exposes its own MCP server on a dedicated port, making its capabilities discoverable by other agents and external tools.
The interesting part is the cross-agent registry. Each agent periodically discovers other agents’ tools via their MCP servers, with a 60-second re-discovery interval. This means if I add a new tool to JARVIS, Gideon and Nightcrawler will discover it within a minute and can start using it.
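The registry loop can be sketched like this. The 60-second interval is from the setup above; the discovery function and data shapes are illustrative (a real version would hit each peer agent's MCP server and list its tools):

```typescript
interface RemoteTool {
  agent: string;
  name: string;
}

// Stands in for an MCP tool-listing call against one agent's server
type ToolLister = () => string[];

// Snapshot every peer's current tool list into a flat registry
function discover(peers: Map<string, ToolLister>): RemoteTool[] {
  const tools: RemoteTool[] = [];
  for (const [agent, list] of peers) {
    for (const name of list()) tools.push({ agent, name });
  }
  return tools;
}

// Re-discover peers' tools every 60 seconds, so a tool added to one agent
// becomes visible to the others within a minute.
function startRegistry(peers: Map<string, ToolLister>): { current: RemoteTool[] } {
  const registry = { current: discover(peers) };
  const timer = setInterval(() => { registry.current = discover(peers); }, 60_000);
  timer.unref(); // don't keep the Node process alive just for rediscovery
  return registry;
}
```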
This creates an emergent capability network. Each agent has its own tools, can use other agents’ tools, and can delegate complex requests. The system becomes more capable than any individual agent.
The lesson
You don’t need expensive cloud infrastructure for multi-agent AI systems. A single LXC container on a homelab Proxmox server — the kind of setup you can build for a few hundred dollars in used hardware — can run multiple autonomous agents with room to spare.
The constraints actually help. When you have 2 GB of RAM, you think carefully about what each agent needs. You share infrastructure instead of duplicating it. You keep processes lean. The result is a system that’s not just cheap to run but also simple to understand and maintain.
Cloud AI platforms will charge you per seat, per token, per API call. A homelab agent system costs electricity and whatever you’re already paying for LLM subscriptions. For a personal productivity system that runs 24/7, the economics aren’t even close.
Three agents, one container, 2 GB of RAM. Sometimes constraints are features.