exo: Run your own AI cluster at home with everyday devices. Maintained by exo labs.
exo connects all your devices into an AI cluster. Not only does exo enable running models larger than would fit on a single device, but with day-0 support for RDMA over Thunderbolt, makes models run faster as you add more devices.
Features
Automatic Device Discovery: Devices running exo automatically discover each other - no manual configuration.
Topology-Aware Auto Parallel: exo figures out the best way to split your model across all available devices based on a realtime view of your device topology. It takes into account device resources and network latency/bandwidth between each link.
Tensor Parallelism: exo supports sharding models, for up to 1.8x speedup on 2 devices and 3.2x speedup on 4 devices.
MLX Support: exo uses MLX as an inference backend and MLX distributed for distributed communication.
Benchmarks
Qwen3-235B (8-bit) on 4 × M3 Ultra Mac Studio with Tensor Parallel RDMA
Devices running exo automatically discover each other, without needing any manual configuration. Each device provides an API and a dashboard for interacting with your cluster (runs at http://localhost:52415).
Important note for Linux users: Currently, exo runs on CPU on Linux. GPU support for Linux platforms is under development. If you’d like to see support for your specific Linux hardware, please search for existing feature requests or create a new one.
macOS App
exo ships a macOS app that runs in the background on your Mac.
The app will ask for permission to modify system settings and install a new Network profile. Improvements to this are being worked on.
Enabling RDMA on macOS
RDMA is a new capability added to macOS 26.2. It works on any Mac with Thunderbolt 5 (M4 Pro Mac Mini, M4 Max Mac Studio, M4 Max MacBook Pro, M3 Ultra Mac Studio).
Note that on Mac Studio, you cannot use the Thunderbolt 5 port next to the Ethernet port.
To enable RDMA on macOS, follow these steps:
Shut down your Mac.
Hold down the power button for 10 seconds until the boot menu appears.
Select “Options” to enter Recovery mode.
When the Recovery UI appears, open the Terminal from the Utilities menu.
In the Terminal, type:
rdma_ctl enable
and press Enter.
Reboot your Mac.
After that, RDMA will be enabled in macOS and exo will take care of the rest.
Using the API
If you prefer to interact with exo via the API, here is an example creating an instance of a small model (mlx-community/Llama-3.2-1B-Instruct-4bit), sending a chat completions request and deleting the instance.
1. Preview instance placements
The /instance/previews endpoint will preview all valid placements for your model.
Send a POST to /instance with your desired placement in the instance field (the full payload must match types as in CreateInstanceParams), which you can copy from step 1:
On macOS, exo uses the GPU. On Linux, exo currently runs on CPU. We are working on extending hardware accelerator support. If you’d like support for a new hardware platform, please search for an existing feature request and add a thumbs up so we know what hardware is important to the community.
Contributing
See CONTRIBUTING.md for guidelines on how to contribute to exo.
exo: Run your own AI cluster at home with everyday devices. Maintained by exo labs.
exo connects all your devices into an AI cluster. Not only does exo enable running models larger than would fit on a single device, but with day-0 support for RDMA over Thunderbolt, makes models run faster as you add more devices.
Features
Benchmarks
Qwen3-235B (8-bit) on 4 × M3 Ultra Mac Studio with Tensor Parallel RDMA
Source: Jeff Geerling: 15 TB VRAM on Mac Studio – RDMA over Thunderbolt 5
DeepSeek v3.1 671B (8-bit) on 4 × M3 Ultra Mac Studio with Tensor Parallel RDMA
Source: Jeff Geerling: 15 TB VRAM on Mac Studio – RDMA over Thunderbolt 5
Kimi K2 Thinking (native 4-bit) on 4 × M3 Ultra Mac Studio with Tensor Parallel RDMA
Source: Jeff Geerling: 15 TB VRAM on Mac Studio – RDMA over Thunderbolt 5
Quick Start
Devices running exo automatically discover each other, without needing any manual configuration. Each device provides an API and a dashboard for interacting with your cluster (runs at
http://localhost:52415).There are two ways to run exo:
Run from Source (macOS)
Prerequisites:
brew (for simple package management on macOS)
uv (for Python dependency management)
macmon (for hardware monitoring on Apple Silicon)
node (for building the dashboard)
rust (to build Rust bindings, nightly for now)
Clone the repo, build the dashboard, and run exo:
This starts the exo dashboard and API at http://localhost:52415/
Run from Source (Linux)
Prerequisites:
Installation methods:
Option 1: Using system package manager (Ubuntu/Debian example):
Option 2: Using Homebrew on Linux (if preferred):
Note: The
macmonpackage is macOS-only and not required for Linux.Clone the repo, build the dashboard, and run exo:
This starts the exo dashboard and API at http://localhost:52415/
Important note for Linux users: Currently, exo runs on CPU on Linux. GPU support for Linux platforms is under development. If you’d like to see support for your specific Linux hardware, please search for existing feature requests or create a new one.
macOS App
exo ships a macOS app that runs in the background on your Mac.
The macOS app requires macOS Tahoe 26.2 or later.
Download the latest build here: EXO-latest.dmg.
The app will ask for permission to modify system settings and install a new Network profile. Improvements to this are being worked on.
Enabling RDMA on macOS
RDMA is a new capability added to macOS 26.2. It works on any Mac with Thunderbolt 5 (M4 Pro Mac Mini, M4 Max Mac Studio, M4 Max MacBook Pro, M3 Ultra Mac Studio).
Note that on Mac Studio, you cannot use the Thunderbolt 5 port next to the Ethernet port.
To enable RDMA on macOS, follow these steps:
After that, RDMA will be enabled in macOS and exo will take care of the rest.
Using the API
If you prefer to interact with exo via the API, here is an example creating an instance of a small model (
mlx-community/Llama-3.2-1B-Instruct-4bit), sending a chat completions request and deleting the instance.1. Preview instance placements
The
/instance/previewsendpoint will preview all valid placements for your model.Sample response:
This will return all valid placements for this model. Pick a placement that you like. To pick the first one, pipe into
jq:2. Create a model instance
Send a POST to
/instancewith your desired placement in theinstancefield (the full payload must match types as inCreateInstanceParams), which you can copy from step 1:Sample response:
3. Send a chat completion
Now, make a POST to
/v1/chat/completions(the same format as OpenAI’s API):4. Delete the instance
When you’re done, delete the instance by its ID (find it via
/stateor/instanceendpoints):Other useful API endpoints:*
curl http://localhost:52415/modelscurl http://localhost:52415/stateFor further details, see API types and endpoints in src/exo/master/api.py.
Hardware Accelerator Support
On macOS, exo uses the GPU. On Linux, exo currently runs on CPU. We are working on extending hardware accelerator support. If you’d like support for a new hardware platform, please search for an existing feature request and add a thumbs up so we know what hardware is important to the community.
Contributing
See CONTRIBUTING.md for guidelines on how to contribute to exo.