Fix copy code button not working in dashboard (#1659)
Fixes #1657
Motivation
The copy button on code blocks in the dashboard does nothing when clicked. This affects all code blocks in assistant responses.
Root Cause
In `MarkdownContent.svelte`, click event listeners are bound to `.copy-code-btn` elements via `setupCopyButtons()` inside a Svelte 5 `$effect`. However, the effect fires before the DOM has been updated with the new HTML, so `querySelectorAll(".copy-code-btn")` finds zero buttons.

Additionally, during streaming, the `content` prop updates on every token, causing the entire `{@html processedHtml}` block to be re-rendered. This destroys all previously bound event listeners, even if they were successfully attached.

Changes
Replaced the per-button `addEventListener` approach with event delegation: a single click listener on the container element that catches clicks bubbling up from any `.copy-code-btn` or `.copy-math-btn`. This:
- Eliminates the timing issue (the listener exists before the buttons are rendered)
- Survives HTML re-renders during streaming (no need to re-bind)
- Removes the need for `setupCopyButtons()` and the `data-listenerBound` tracking

Testing
- Load any model
- Prompt it to generate a code block (e.g. “write a hello world in Python”)
- Click the copy button on the code block
- Paste — the code is copied correctly
- Verified the button also works during streaming (before generation completes)
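The delegation approach described above can be sketched as plain DOM code. This is a simplified sketch, not the component's actual implementation: the real listener is bound to the container hosting the `{@html}` output, and the lookup of the code text via `nextElementSibling` is an assumption about the rendered markup.

```javascript
// Sketch of the event-delegation fix: one listener on a stable container
// instead of one listener per button. `copyText` is injected so the
// handler can be exercised without a real clipboard.
function makeDelegatedCopyHandler(copyText) {
  return function onContainerClick(event) {
    // closest() walks up from the clicked node, so a click on an icon
    // inside the button still resolves to the button element.
    const btn = event.target.closest(".copy-code-btn, .copy-math-btn");
    if (!btn) return; // the click was not on a copy button
    // Assumption: the button is rendered immediately before its code block.
    const text = btn.nextElementSibling?.textContent ?? "";
    copyText(text);
  };
}

// Bound once; survives every {@html processedHtml} re-render because the
// container element itself is never replaced:
// container.addEventListener("click",
//   makeDelegatedCopyHandler((t) => navigator.clipboard.writeText(t)));
```

Because the listener lives on the container rather than on the buttons, it exists before any buttons are rendered and needs no re-binding when the HTML is replaced.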
Co-authored-by: Wysie <wysie@users.noreply.github.com>
exo: Run frontier AI locally. Maintained by exo labs.
exo connects all your devices into an AI cluster. Not only does exo enable running models larger than would fit on a single device, but, with day-0 support for RDMA over Thunderbolt, it also makes models run faster as you add more devices.
Features
Dashboard
exo includes a built-in dashboard for managing your cluster and chatting with models.
4 × 512GB M3 Ultra Mac Studio running DeepSeek v3.1 (8-bit) and Kimi-K2-Thinking (4-bit)
Benchmarks
Qwen3-235B (8-bit) on 4 × M3 Ultra Mac Studio with Tensor Parallel RDMA
Source: Jeff Geerling: 15 TB VRAM on Mac Studio – RDMA over Thunderbolt 5
DeepSeek v3.1 671B (8-bit) on 4 × M3 Ultra Mac Studio with Tensor Parallel RDMA
Source: Jeff Geerling: 15 TB VRAM on Mac Studio – RDMA over Thunderbolt 5
Kimi K2 Thinking (native 4-bit) on 4 × M3 Ultra Mac Studio with Tensor Parallel RDMA
Source: Jeff Geerling: 15 TB VRAM on Mac Studio – RDMA over Thunderbolt 5
Quick Start
Devices running exo automatically discover each other, without needing any manual configuration. Each device provides an API and a dashboard for interacting with your cluster (runs at http://localhost:52415).

There are two ways to run exo:
Run from Source (macOS)
If you have Nix installed, you can skip most of the steps below and run exo directly:
Note: To accept the Cachix binary cache (and avoid the Xcode Metal Toolchain), add to /etc/nix/nix.conf:

Then restart the Nix daemon:

sudo launchctl kickstart -k system/org.nixos.nix-daemon

Prerequisites:
Xcode (provides the Metal Toolchain required for MLX compilation)
brew (for simple package management on macOS)
uv (for Python dependency management)
macmon (for hardware monitoring on Apple Silicon)
node (for building the dashboard)
rust (to build Rust bindings, nightly for now)
Clone the repo, build the dashboard, and run exo:
This starts the exo dashboard and API at http://localhost:52415/
Please see the Enabling RDMA on macOS section to enable this feature on macOS >= 26.2.
Run from Source (Linux)
Prerequisites:
Installation methods:
Option 1: Using system package manager (Ubuntu/Debian example):
Option 2: Using Homebrew on Linux (if preferred):
Note: The `macmon` package is macOS-only and is not required on Linux.

Clone the repo, build the dashboard, and run exo:
This starts the exo dashboard and API at http://localhost:52415/
Important note for Linux users: Currently, exo runs on CPU on Linux. GPU support for Linux platforms is under development. If you’d like to see support for your specific Linux hardware, please search for existing feature requests or create a new one.
Configuration Options:
`--no-worker`: Run exo without the worker component. Useful for coordinator-only nodes that handle networking and orchestration but don't execute inference tasks. This is helpful for machines without sufficient GPU resources but with good network connectivity.

File Locations (Linux):
exo follows the XDG Base Directory Specification on Linux:
- `~/.config/exo/` (or `$XDG_CONFIG_HOME/exo/`)
- `~/.local/share/exo/` (or `$XDG_DATA_HOME/exo/`)
- `~/.cache/exo/` (or `$XDG_CACHE_HOME/exo/`)

You can override these locations by setting the corresponding XDG environment variables.
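For example, to keep exo's data and cache on a larger volume, export the XDG variables before launching exo (the mount point below is hypothetical):

```shell
# Move exo's data and cache directories by overriding the XDG variables
# before launching exo (hypothetical mount point):
export XDG_DATA_HOME="/mnt/big/xdg-data"    # exo data  -> /mnt/big/xdg-data/exo/
export XDG_CACHE_HOME="/mnt/big/xdg-cache"  # exo cache -> /mnt/big/xdg-cache/exo/
```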
macOS App
exo ships a macOS app that runs in the background on your Mac.
The macOS app requires macOS Tahoe 26.2 or later.
Download the latest build here: EXO-latest.dmg.
The app will ask for permission to modify system settings and install a new Network profile. Improvements to this are being worked on.
Custom Namespace for Cluster Isolation:
The macOS app includes a custom namespace feature that allows you to isolate your exo cluster from others on the same network. This is configured through the `EXO_LIBP2P_NAMESPACE` setting:

Use cases:

Configuration: Access this setting in the app's Advanced settings (or set the `EXO_LIBP2P_NAMESPACE` environment variable when running from source). The namespace is logged on startup for debugging purposes.
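For instance, when running from source, each team could pick its own namespace (the value `team-a` below is just an example):

```shell
# Isolate this node in its own cluster namespace; only nodes configured
# with the same EXO_LIBP2P_NAMESPACE value will discover each other.
export EXO_LIBP2P_NAMESPACE="team-a"
# then start exo from source, e.g.:
# uv run exo
```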
Uninstalling the macOS App
The recommended way to uninstall is through the app itself: click the menu bar icon → Advanced → Uninstall. This cleanly removes all system components.
If you’ve already deleted the app, you can run the standalone uninstaller script:
This removes:
Note: You’ll need to manually remove EXO from Login Items in System Settings → General → Login Items.
Enabling RDMA on macOS
RDMA is a new capability added to macOS 26.2. It works on any Mac with Thunderbolt 5 (M4 Pro Mac Mini, M4 Max Mac Studio, M4 Max MacBook Pro, M3 Ultra Mac Studio).
Please refer to the caveats for immediate troubleshooting.
To enable RDMA on macOS, follow these steps:
After that, RDMA will be enabled in macOS and exo will take care of the rest.
Important Caveats
`tmp/set_rdma_network_config.sh`, which will disable Thunderbolt Bridge and set DHCP on each RDMA port.

Using the API
If you prefer to interact with exo via the API, here is an example that creates an instance of a small model (`mlx-community/Llama-3.2-1B-Instruct-4bit`), sends a chat completions request, and deletes the instance.

1. Preview instance placements
The `/instance/previews` endpoint will preview all valid placements for your model.

Sample response:

This will return all valid placements for this model. Pick a placement that you like. To pick the first one, pipe into `jq`:

2. Create a model instance
Send a POST to `/instance` with your desired placement (copied from step 1) in the `instance` field; the full payload must match the types in `CreateInstanceParams`:

Sample response:
3. Send a chat completion
Now, make a POST to `/v1/chat/completions` (the same format as OpenAI's API):

4. Delete the instance
When you're done, delete the instance by its ID (find it via the `/state` or `/instance` endpoints):

Other useful API endpoints:

- `curl http://localhost:52415/models`
- `curl http://localhost:52415/state`

For further details, see:
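Putting the four steps together, an end-to-end session might look like the sketch below. The endpoint paths come from the steps above; how the model is passed to the preview endpoint, the `jq` path for the first placement, the response shapes, and the DELETE route are assumptions, so adjust them to the actual sample responses:

```shell
BASE=http://localhost:52415
MODEL=mlx-community/Llama-3.2-1B-Instruct-4bit

# 1. Preview placements and keep the first one
#    (query parameter and jq path are assumptions)
PLACEMENT=$(curl -s "$BASE/instance/previews?model=$MODEL" | jq '.[0]')

# 2. Create the instance with that placement in the `instance` field
curl -s -X POST "$BASE/instance" \
  -H "Content-Type: application/json" \
  -d "{\"instance\": $PLACEMENT}"

# 3. Chat with it (OpenAI-compatible request format)
curl -s -X POST "$BASE/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}]}"

# 4. Look up the instance ID in /state, then delete it
#    (state shape and DELETE route are assumptions)
ID=$(curl -s "$BASE/state" | jq -r '.instances | keys[0]')
curl -s -X DELETE "$BASE/instance/$ID"
```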
Benchmarking
The `exo-bench` tool measures model prefill and token generation speed across different placement configurations. This helps you optimize model performance and validate improvements.

Prerequisites:

- Start exo with `uv run exo` before benchmarking
- The benchmark uses the `/bench/chat/completions` endpoint

Basic usage:
Key parameters:
- `--model`: Model to benchmark (short ID or HuggingFace ID)
- `--pp`: Prompt size hints (comma-separated integers)
- `--tg`: Generation lengths (comma-separated integers)
- `--max-nodes`: Limit placements to N nodes (default: 4)
- `--instance-meta`: Filter by `ring`, `jaccl`, or `both` (default: both)
- `--sharding`: Filter by `pipeline`, `tensor`, or `both` (default: both)
- `--repeat`: Number of repetitions per configuration (default: 1)
- `--warmup`: Warmup runs per placement (default: 0)
- `--json-out`: Output file for results (default: bench/results.json)

Example with filters:
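For example, a filtered run over tensor-parallel placements on up to two nodes might look like this (the `exo-bench` invocation style is an assumption; the flags are the ones documented above):

```shell
# Benchmark a small model across tensor-parallel placements on up to
# 2 nodes, 3 repetitions each, writing results to a JSON file.
exo-bench --model mlx-community/Llama-3.2-1B-Instruct-4bit \
  --pp 512,2048 --tg 128 \
  --max-nodes 2 --sharding tensor \
  --repeat 3 --json-out bench/results.json
```

This requires a running cluster (start it with `uv run exo` first, per the prerequisites above).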
The tool outputs performance metrics including prompt tokens per second (prompt_tps), generation tokens per second (generation_tps), and peak memory usage for each configuration.
Hardware Accelerator Support
On macOS, exo uses the GPU. On Linux, exo currently runs on CPU. We are working on extending hardware accelerator support. If you’d like support for a new hardware platform, please search for an existing feature request and add a thumbs up so we know what hardware is important to the community.
Contributing
See CONTRIBUTING.md for guidelines on how to contribute to exo.