```go
// router/router.go - The aggregation service
func NewService(endpoints []*routing.Endpoint, usageSvc *usage.Service, retryMax int) *Service {
    pool := routing.NewEndpointPool(endpoints, retryMax)
    return &Service{pool: pool, usageSvc: usageSvc}
}

func (s *Service) Forward(ctx context.Context, rawReq []byte, clientFormat string) ([]byte, error) {
    resp, usage, err := call.Request(ctx, s.pool, rawReq, clientFormat)
    if err != nil {
        return nil, err
    }
    s.usageSvc.Record(usage) // Track cost
    return resp, nil
}
```
All routing is handled by the fluxcore domain layer.
Protocol Conversion

```shell
# Use the Anthropic SDK with an OpenAI backend
export ANTHROPIC_BASE_URL=http://127.0.0.1:8765

# Use the OpenAI SDK with an Anthropic backend
export OPENAI_API_BASE=http://127.0.0.1:8765/v1
```

Format conversion happens automatically; your SDK doesn't know the difference.
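As a rough illustration of what the converter must handle (a sketch, not tokrouter's actual code): the OpenAI format carries the system prompt inline in the `messages` array, while the Anthropic format expects it as a separate top-level field.

```go
package main

import "fmt"

// Message is a simplified shape shared by both formats.
type Message struct {
	Role    string
	Content string
}

// toAnthropic extracts the OpenAI-style system message into a separate
// field, leaving the remaining conversation turns untouched.
// Illustrative only; tokrouter's real converter covers far more.
func toAnthropic(msgs []Message) (system string, rest []Message) {
	for _, m := range msgs {
		if m.Role == "system" {
			system = m.Content
			continue
		}
		rest = append(rest, m)
	}
	return system, rest
}

func main() {
	system, rest := toAnthropic([]Message{
		{Role: "system", Content: "You are helpful."},
		{Role: "user", Content: "Hello"},
	})
	fmt.Println(system, len(rest)) // You are helpful. 1
}
```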
Circuit Breaker

Model state machine (fluxcore):

```
Healthy → Fail(1) → Fail(2) → Fail(3) → Unhealthy
   ↑                                        │
   └────────── 60s auto recovery ───────────┘
```

tokrouter auto-switches to the next healthy model.
tokrouter
Your LLM Aggregator
One config file. All your LLM APIs.
4 Lines to Production
Challenges
Managing multiple LLM providers presents these challenges:
The Solution
tokrouter gives you unified control:
Installation
Pre-built Binaries (Recommended)
Download from GitHub Releases:
Linux (amd64)
Linux (arm64 / Raspberry Pi)
macOS (Intel)
macOS (Apple Silicon)
Windows (PowerShell)
Verify:
Build from Source
Docker
Quick Start
Manual Docker Run
Available Tags
latest
v0.1.0
main

Quick Start
Who Uses This
Architecture
How It Works
Model-Level Routing
Requests are routed to endpoints matching the requested model:
A request for gpt-4 only routes to gpt-4 endpoints (never to gpt-3.5-turbo).

Model Alias
Map request model names to actual model names:
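A minimal sketch of alias resolution followed by model-level filtering (the alias name `my-default` and the `Endpoint` shape here are hypothetical, for illustration only):

```go
package main

import "fmt"

// aliases maps requested model names to actual backend model names.
// This mapping is hypothetical; define yours in the tokrouter config.
var aliases = map[string]string{
	"my-default": "gpt-4",
}

type Endpoint struct {
	URL   string
	Model string
}

// route resolves an alias, then keeps only the endpoints serving that
// model, mirroring the model-level routing rule described above.
func route(requested string, eps []Endpoint) []Endpoint {
	if actual, ok := aliases[requested]; ok {
		requested = actual
	}
	var out []Endpoint
	for _, ep := range eps {
		if ep.Model == requested {
			out = append(out, ep)
		}
	}
	return out
}

func main() {
	eps := []Endpoint{
		{URL: "https://api.deepseek.com", Model: "gpt-3.5-turbo"},
		{URL: "https://api.groq.com", Model: "gpt-4"},
	}
	// "my-default" resolves to gpt-4 and matches only the gpt-4 endpoint.
	fmt.Println(len(route("my-default", eps))) // 1
}
```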
Hot Reload
Reload config without restart:
Latency-Aware Routing
Endpoints are selected by:
This avoids slow endpoints automatically.
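One common way to implement this is an exponentially weighted moving average of per-endpoint latency; the sketch below is illustrative and not necessarily fluxcore's exact algorithm.

```go
package main

import "fmt"

// endpointStat tracks a moving average of observed latency per endpoint.
type endpointStat struct {
	Name      string
	AvgMillis float64
}

// observe folds a new latency sample into the moving average,
// weighting recent samples more heavily (alpha is an assumed constant).
func (e *endpointStat) observe(ms float64) {
	const alpha = 0.3
	if e.AvgMillis == 0 {
		e.AvgMillis = ms
		return
	}
	e.AvgMillis = alpha*ms + (1-alpha)*e.AvgMillis
}

// fastest picks the endpoint with the lowest average latency.
func fastest(stats []*endpointStat) *endpointStat {
	best := stats[0]
	for _, s := range stats[1:] {
		if s.AvgMillis < best.AvgMillis {
			best = s
		}
	}
	return best
}

func main() {
	a := &endpointStat{Name: "a"}
	b := &endpointStat{Name: "b"}
	a.observe(120)
	b.observe(40)
	fmt.Println(fastest([]*endpointStat{a, b}).Name) // b
}
```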
Cost Tracking
CLI Commands
API Endpoints
POST /v1/chat/completions
POST /v1/messages
GET /status
GET /health

AI Tool Integration
Claude Code (Anthropic format):
aider (OpenAI format):
Cursor / VS Code Copilot:
Protocol Support
/v1/chat/completions
/v1/messages

OpenAI-Compatible Providers:
https://open.bigmodel.cn/api/paas/v4
https://api.deepseek.com
https://api.mistral.ai
https://api.groq.com

Directory Structure
FAQ
Q: How do I add a new API key?
Q: How do I test if my key works?
Q: Pricing unit: what does input: 0.03 mean?
It means $0.03 per 1,000 tokens. Example: 100K input tokens = 0.03 × 100 = $3.00.
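The arithmetic can be checked with a tiny helper (illustrative only; tokrouter's own cost tracker does this for you):

```go
package main

import "fmt"

// cost computes dollars for a token count at a per-1K-token price,
// matching the pricing example above (input: 0.03 → $0.03 per 1,000 tokens).
func cost(tokens int, pricePer1K float64) float64 {
	return float64(tokens) / 1000 * pricePer1K
}

func main() {
	fmt.Printf("$%.2f\n", cost(100_000, 0.03)) // $3.00
}
```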
Q: How do I use Claude Code with tokrouter?
Q: How do I use aider with tokrouter?
Q: Why port 8765?
Port 7890 conflicts with Clash proxy. 8765 is uncommon and safe.
Q: Does tokrouter support streaming?
Yes, streaming is fully supported for both OpenAI and Anthropic formats.
Q: How does automatic fallback work?
Model fails 3 times → marked unhealthy → auto-switch to next healthy model. 60 seconds later, retry unhealthy model.
Get Started
Next steps:
Run tokrouter init to configure, then point your tools at http://127.0.0.1:8765.

Related Projects
License
MIT. Free forever.
tokrouter - Your LLM Aggregator. One config, one binary, full control.