Local AI inference,
across your entire fleet.

Run open-source AI across all your machines. No cloud service. Your data stays on your own hardware and network.

$ pip install 'vimin-core[mlx]'

$ vimin-core start-center

View on GitHub Talk to us about enterprise →

Your devices, working together

Most AI tools run on one machine at a time. vimin-core changes that.

Most homes and offices have more computing power sitting idle than they realise. vimin-core treats all of it as one: send a task to every machine at once, each one runs it locally with its own data, and results come back in seconds. You decide what gets sent back to the center and what stays on the device.

With vimin enterprise

In the enterprise version, agents talk to each other rather than just receiving tasks. One machine can break a problem into pieces and hand them off to others. You can route tasks to specific devices, set rules about what data moves where, and plug the whole thing into your existing tools and workflows.

Architecture

How it works

One machine routes the work. The rest run the models.

01

Start a center node

Run vimin-core start-center on any machine in your network. It routes tasks. No inference happens here.

02

Connect inference nodes

Run vimin-core start-agent on each machine after exporting the center's API key and fleet token. Same-LAN machines can usually discover the center automatically over mDNS.

03

Send tasks

POST to /api/tasks. The center routes work to agents, collects results, and returns them. All on your network.

# 1. Install (pick your hardware)
pip install 'vimin-core[mlx]'       # Apple Silicon
pip install 'vimin-core[llamacpp]'  # Linux / Windows / CUDA / CPU

# 2. Start the center node
vimin-core start-center                         # localhost only (single machine)
vimin-core start-center --host 0.0.0.0          # accept agents from other machines

# 3. Connect agents (run on each inference machine)
export ORCHESTRATOR_API_KEY=<center-api-key>
export VIMIN_FLEET_TOKEN=<center-fleet-token>
vimin-core start-agent                          # same machine or same LAN (auto-discovered when mDNS is available)
vimin-core start-agent --center http://<center-ip>:8080  # cross-subnet / explicit URL

# 4. Broadcast a task
vimin-core broadcast "Summarize Q3 results." --mode return

Enterprise

Built for serious deployments

Nodes that talk to each other and run multi-step jobs without a human in the loop.

Agent-to-agent coordination

Nodes can break a task into pieces and pass them to other nodes through the control plane. Each step still runs locally. OpenClaw nodes join the same fleet alongside standard agents.

Fleet pipelines & workflows

Chain inference tasks across nodes so the output of one model becomes the input to the next. Define multi-step pipelines with conditional branching and per-step hardware routing.

Hierarchical node orchestration

Assign department-level administrative nodes that coordinate their own cluster of inference machines. You can also define virtual graph nodes that route across departments, tags, and other topology nodes.

Data sovereignty controls

Choose exactly what flows back to the center node. Sensitive inference results can stay on the edge device entirely.

Plugin integrations

Connect vimin to your existing stack. First-party integrations include LiveKit for real-time audio and video inference pipelines. Build your own with the plugin API.

Custom models & hardware adaptation

Run fine-tuned or proprietary models alongside standard registry models. Hardware backends not covered by vimin-core can be added with dedicated integration work.

Priority support & SLA

Direct access to the engineering team. Response time SLAs with guaranteed escalation for production incidents.

Source-available vs. Enterprise

vimin-core vs. vimin

Start free and self-hosted. Upgrade when you need more.

	vimin-core Free · Source available	vimin Enterprise
Max nodes	10	Unlimited
Broadcast dispatch	✓	✓
OpenClaw node support	broadcast only	✓ + agent coordination
SSO & Role-based Access Control (RBAC)	—	✓
Audit Logging & Compliance Reporting (SOC2)	basic local audit log	✓
High Availability (HA) Center Nodes	—	✓
Enterprise Telemetry & Observability Export	—	✓
Air-gapped deployment support	self-hosted	✓
Per-node task targeting	—	✓
Manual approval for new agents	—	✓
Hierarchical & graph node topology	—	✓
Fleet pipelines & workflows	basic center-driven pipelines	✓
Data sovereignty controls	edge-only result mode	✓
LiveKit & plugin integrations	—	✓
MLX · llama-cpp · ONNX backends	✓	✓
Custom models & hardware backends	—	✓
Priority support & SLA	—	✓

Ready to scale beyond 10 nodes?

Book a 15-minute call to talk through your deployment.

Schedule a conversation →

Or email us at

Local AI inference, across your entire fleet.