Perceptor: Giving The Hub Its Own Local Brain

At some point, local AI stops being an experiment and starts looking like infrastructure.

For a while, I treated local models as something to try on the side. Fun benchmarks, occasional experiments, maybe a fallback when a cloud model was unavailable. The real system still depended on remote inference for anything that mattered.

But The Hub kept growing. More scheduled agents. More classification tasks. More summarization. More small decisions that did not need frontier-model intelligence, but did need to be cheap, private, fast, and available.

That is when I added Perceptor.

Perceptor is a dedicated Mac mini in the worker model. Its job is simple: provide local inference for The Hub. Not replace every cloud model. Not become a magical offline supercomputer. Just become the local brain the system can rely on for the right class of tasks.

Why a dedicated machine

I could have run local models on my laptop. I did, at first. The problem is that a laptop is not infrastructure. It sleeps. It travels. It gets busy. It runs browser tabs, video calls, IDEs, random experiments, and whatever else the day throws at it.

An always-on system needs something more boring.

That is a compliment. Boring infrastructure is available, predictable, and isolated from my daily mess. A small dedicated machine can sit in the background and do one job well. It can be reachable over the private network, managed like part of the homelab, and tuned for inference without competing with my workday.

Perceptor became that machine.

Provisioning is part of the architecture

The setup was not just “install a model runner and call it done.” Provisioning mattered because Perceptor sits between personal infrastructure and AI execution.

I wanted disk encryption. I wanted remote access through a private network. I wanted the machine to have its own identity instead of blending into my personal desktop setup. I wanted cloud account sync kept to a minimum so it would not become another source of surprising coupling.

Those choices are not glamorous, but they shape the security model. Local inference only feels good if the machine running it is boring in the right ways: encrypted, reachable, documented, and not casually sharing more personal state than it needs.

The worker model helped here. Perceptor is not an always-on chat agent, and it is not a fully privileged coding session. It is an inference host. That boundary makes the setup easier to reason about.

The first useful benchmark

The first model I stood up was Gemma 3 12B through Ollama. I gave it a simple essay-style prompt and measured output speed. In my local setup, that came out to about 12.9 tokens per second for a medium-length response.
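
One way to reproduce a measurement like this, as a minimal sketch: Ollama's non-streaming `/api/generate` response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds), so output speed is their ratio. The URL below is Ollama's default local endpoint, and the helper names are hypothetical. This is not necessarily how the number above was measured.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; adjust host and port as needed.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Compute output speed from a non-streaming Ollama response.

    eval_count is the number of generated tokens and eval_duration is the
    generation time in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def run_benchmark(model: str, prompt: str) -> float:
    """Send one prompt to Ollama and return generated tokens per second."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Generous timeout: the first call may also have to load the model.
    with urllib.request.urlopen(request, timeout=600) as reply:
        return tokens_per_second(json.loads(reply.read()))
```

Calling `run_benchmark("gemma3:12b", "...")` against a running Ollama instance returns a single tokens-per-second figure. The model tag is an assumption about how the model was pulled, and a single run is noisy, so averaging a few prompts of similar length gives a steadier number.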

That number is not spectacular in isolation. It is also not the point.

What mattered was that the result was usable. A local model could produce a coherent answer fast enough for background workflows. Not instant, not frontier-class, but good enough for many of the tasks The Hub actually runs: classification, rough summarization, extraction, routing, and first-pass recommendations.

[Image: a compact local inference node producing glowing token streams and latency traces]

The lesson was not “local models can do everything.” The lesson was “local models can do more than enough to deserve a permanent place in the architecture.”

Local inference changes the economics

Once a local model is always available, you start seeing tasks differently.

Cloud inference makes you think about cost per call, rate limits, provider outages, and whether a task is worth spending model budget on. That is healthy for important work, but it can discourage tiny background intelligence. You do not want every small classification or extraction step to feel like a product decision.

Local inference changes that. It makes low-stakes intelligence cheap enough to sprinkle into the system.

Should this captured note become a task? Should this video go into a watch-later queue or a learning queue? Is this transcript segment relevant to a project? Does this message look like something that should wake me up? These are not always hard questions. They are volume questions.

Perceptor gives The Hub a place to answer those questions without calling out to a remote model every time.

It does not replace cloud models

The important caveat is that local-first is not local-only.

There are still jobs where I want a stronger model. Deep reasoning, careful writing, code review, complex planning, and anything with high ambiguity still benefit from more capable remote models. I do not want to pretend a smaller local model is smarter than it is just because it runs on my hardware.

The better pattern is tiered inference.

Use local models for cheap, frequent, bounded work. Use stronger cloud models for harder reasoning and final judgment. Let the system route intelligently based on risk, cost, privacy, and expected value.
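
A routing rule like this can be sketched in a few lines. Everything here is hypothetical: the `Task` fields, the `choose_tier` name, and the specific policy (privacy first, then stakes and boundedness) are illustrative assumptions, not The Hub's actual router.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Task:
    name: str
    private: bool      # input should not leave the private network
    high_stakes: bool  # a wrong answer is expensive
    bounded: bool      # small input, structured output, easy to check


def choose_tier(task: Task) -> Tier:
    """Pick an inference tier for a task (illustrative policy only)."""
    # Assumption: privacy outranks capability, so private input stays local.
    if task.private:
        return Tier.LOCAL
    # Hard or risky work goes to the stronger remote model.
    if task.high_stakes or not task.bounded:
        return Tier.CLOUD
    # Cheap, frequent, bounded work stays on the local model.
    return Tier.LOCAL
```

With this policy, a video-tagging task (bounded, low stakes) routes to `Tier.LOCAL`, while a code review (unbounded, high stakes) routes to `Tier.CLOUD`. A fuller version might weigh cost and expected value as scores rather than booleans.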

That is much more interesting than trying to crown one model as the winner.

A new kind of dependency

Adding Perceptor also created a new operational dependency. The Hub now has a machine that other systems may rely on for inference. That means availability, monitoring, model updates, and fallback behavior all matter.

If Perceptor is asleep, disconnected, or overloaded, what happens? Does a workflow fall back to a cloud model? Does it retry later? Does it mark the task as blocked? These are not theoretical questions once a local model moves from experiment to infrastructure.
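
Those three options can be made explicit per workflow. The sketch below is one hypothetical shape for that decision: `OnFailure`, `run_with_fallback`, and the status strings are invented for illustration, and `local` and `cloud` stand in for whatever functions actually call each model.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class OnFailure(Enum):
    FALLBACK_TO_CLOUD = "fallback_to_cloud"
    RETRY_LATER = "retry_later"
    BLOCK = "block"


def run_with_fallback(
    prompt: str,
    local: Callable[[str], str],
    cloud: Callable[[str], str],
    on_failure: OnFailure,
) -> Tuple[str, Optional[str]]:
    """Try the local model first and return (status, answer)."""
    try:
        return "done_local", local(prompt)
    except OSError:
        # OSError covers refused connections, timeouts, and urllib's URLError,
        # which is roughly what an asleep or disconnected host looks like.
        if on_failure is OnFailure.FALLBACK_TO_CLOUD:
            return "done_cloud", cloud(prompt)
        if on_failure is OnFailure.RETRY_LATER:
            return "retry_later", None
        return "blocked", None
```

Making the policy a parameter forces each workflow to state what it wants: a wake-me-up check might fall back to the cloud, while a privacy-sensitive classifier might prefer to retry later rather than send its input elsewhere.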

I started keeping a small backlog of follow-up items: evaluate faster runtimes, expose local inference more cleanly to the private network, move bounded classifiers over one by one, and decide which workflows deserve fallback paths.

The key was to migrate gradually. Perceptor should earn trust task by task.

The feeling of having a local brain

There is something satisfying about hearing the machine spin quietly through a task that used to require a cloud API.

Not because local is morally superior. It is not. The cloud is amazing. But personal infrastructure feels different when part of the intelligence is physically yours. The system becomes more self-contained. Less like a pile of API calls. More like a workshop with its own tools.

Perceptor gave The Hub that feeling.

It also gave me a natural next step. SAGE already had a bounded classification workflow for YouTube videos. The input was small, the output was structured, the evaluation set was clear, and correctness was easy to check.

That made SAGE the perfect first real migration.
