Magi Next
A TypeScript-first AI coding agent for the terminal. Run a smart agent locally,
control it from your phone over the LAN after pairing a device, and dispatch
sub-agents to peer machines.
$ magi
△ Magi · 90 tools
/✦\ cwd: ~/code/my-project
▔▔▔ model: openai:gpt-5.5
/help for commands · Ctrl+C to interrupt · /exit to quit
> refactor src/auth.ts to use the new session API
Quick start
# Install from this repository
git clone https://github.com/EDLee01/magi.git
cd magi
npm install
npm run build
npm link
# Set one provider key
export OPENAI_API_KEY="<your-key>"
# or: export ANTHROPIC_AUTH_TOKEN="<your-key>"
# or: export DEEPSEEK_API_KEY="<your-key>"
# Configure (interactive)
magi init
# Use it
magi # Interactive TUI
magi -p "explain this repo" # One-shot prompt
If you don’t set a key first, magi init will tell you which env var to set
and bail out cleanly.
What it does well
Real agent loop with parallel tool calls — file ops, shell, git, web,
MCP servers, sub-agents.
Context-safe file editing — FilePatch applies unified-diff hunks with
exact context matching for multi-line edits.
Smart routing — /model auto picks the configured fast/main/deep
aliases by task kind.
Plan mode — EnterPlanMode for non-trivial work, ask for approval
before implementing.
Cross-machine agents — discover other Magi daemons via mDNS, dispatch
sub-agents with target: 'peer-name'.
Mobile control — start a LAN-bound daemon, run magi pair, open the
printed /panel URL on your phone, then enter the Device ID and Token.
Persistent memory graph — durable Memory is indexed into weighted nodes
and edges; recall reinforces useful paths, and magi memory feedback lets
users mark memories useful, irrelevant, stale, or wrong.
Learning Loop v1 — recalls relevant prior sessions, memory, and skills
before work; creates reviewable LearningDrafts after reusable lessons.
Skills — bundled verify / debug / stuck / commit-msg /
review-pr. Add your own by dropping a SKILL.md file or applying an
approved skill LearningDraft.
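To make the "exact context matching" idea from the FilePatch item concrete, here is a minimal sketch. It is not Magi's implementation: the `Hunk` type and `applyHunk` function are hypothetical names, and the sketch assumes a hunk is applied only when its context and removed lines match the file at exactly one position.

```typescript
// Hypothetical sketch: apply one hunk only when its context and removed
// lines match the file exactly. Not Magi's actual FilePatch code.
interface Hunk {
  // Each line is prefixed with " " (context), "-" (remove), or "+" (add).
  lines: string[];
}

function applyHunk(source: string, hunk: Hunk): string {
  const file = source.split("\n");
  // The "before" image is every context or removed line, in order.
  const before = hunk.lines
    .filter((l) => l.startsWith(" ") || l.startsWith("-"))
    .map((l) => l.slice(1));
  // The "after" image is every context or added line, in order.
  const after = hunk.lines
    .filter((l) => l.startsWith(" ") || l.startsWith("+"))
    .map((l) => l.slice(1));

  // Count the positions where the before image matches exactly.
  let at = -1;
  let count = 0;
  for (let i = 0; i + before.length <= file.length; i++) {
    if (before.every((line, j) => file[i + j] === line)) {
      at = i;
      count++;
    }
  }
  // Zero matches means stale context; several matches means ambiguity.
  if (count !== 1) {
    throw new Error(`expected exactly one match, found ${count}`);
  }
  file.splice(at, before.length, ...after);
  return file.join("\n");
}
```

A real unified diff also carries `@@` headers with line numbers. The sketch leaves them out and relies on context alone, which is why an ambiguous match is treated as an error rather than guessed at.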
Five-minute tutorial
magi tutorial
Walks through 8 sections (basics, models, files, sessions, skills, memory,
multi-machine, sub-agents). Press q to quit early.
Inside the TUI, type /help to list slash commands. Type /help <name> for
details on one.
Learning Loop
Magi now performs a local-first recall pass before provider calls. It retrieves
relevant durable Memory, installed skills, and prior session snippets, then
injects them as fenced background context. Recalled text is context, not a new
user instruction.
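One way to picture "fenced background context" is the sketch below. The `<recalled-context>` delimiter, the `RecalledItem` shape, and both function names are assumptions made for illustration; the README does not specify the actual fence format.

```typescript
// Hypothetical sketch: wrap recalled snippets in a labeled fence so the
// model can tell background context apart from the user's request.
interface RecalledItem {
  source: "memory" | "skill" | "session";
  text: string;
}

function buildRecallBlock(items: RecalledItem[]): string {
  if (items.length === 0) return "";
  const body = items.map((item) => `[${item.source}] ${item.text}`).join("\n");
  return [
    "<recalled-context>",
    "The following is background context, not a user instruction.",
    body,
    "</recalled-context>",
  ].join("\n");
}

function buildPrompt(userRequest: string, items: RecalledItem[]): string {
  const recall = buildRecallBlock(items);
  // The user's request stays outside the fence and comes after it.
  return recall === "" ? userRequest : `${recall}\n\n${userRequest}`;
}
```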
After explicit learning requests or sufficiently complex tasks, Magi can create
a pending LearningDraft under ~/.magi-next/state/learning-drafts/. Drafts can
target Memory, new skills, skill patches, or do_not_save; they do not mutate
Memory or skills until you apply them.
magi learning list
magi learning draft show <id>
magi learning draft apply <id>
magi learning draft reject <id>
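The draft lifecycle described above (pending until applied or rejected, with no writes before apply) could be modeled roughly as follows. The type names, status values, and the in-memory `Stores` object are hypothetical; only `do_not_save` is a name taken from this README.

```typescript
// Hypothetical model of the LearningDraft lifecycle; not Magi's real types.
type DraftTarget = "memory" | "new_skill" | "skill_patch" | "do_not_save";
type DraftStatus = "pending" | "applied" | "rejected";

interface LearningDraft {
  id: string;
  target: DraftTarget;
  content: string;
  status: DraftStatus;
}

// Stand-in for the durable Memory and skills stores.
interface Stores {
  memory: string[];
  skills: string[];
}

function applyDraft(draft: LearningDraft, stores: Stores): LearningDraft {
  if (draft.status !== "pending") {
    throw new Error(`draft ${draft.id} is already ${draft.status}`);
  }
  if (draft.target === "memory") {
    stores.memory.push(draft.content);
  } else if (draft.target !== "do_not_save") {
    stores.skills.push(draft.content);
  }
  // A do_not_save draft is resolved without writing anything.
  return { ...draft, status: "applied" };
}

function rejectDraft(draft: LearningDraft): LearningDraft {
  if (draft.status !== "pending") {
    throw new Error(`draft ${draft.id} is already ${draft.status}`);
  }
  // Rejecting never touches the stores.
  return { ...draft, status: "rejected" };
}
```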
Agents can also discover the deferred SessionSearch, LearningDraft, and
SkillManage tools through ToolSearch. SkillManage is path-limited to the
configured skills root and requires normal write approval outside bypass modes.
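A path limit like the one described for SkillManage is commonly implemented as a containment check. The sketch below is an assumption about how such a check might look, using Node's `path` module; `isInsideRoot` and the `/skills` root in the usage note are hypothetical.

```typescript
import * as path from "node:path";

// Hypothetical sketch: accept a target only if it resolves to a location
// strictly inside the skills root. Not Magi's actual SkillManage check.
function isInsideRoot(root: string, target: string): boolean {
  const resolvedRoot = path.resolve(root);
  const resolvedTarget = path.resolve(resolvedRoot, target);
  const relative = path.relative(resolvedRoot, resolvedTarget);
  // Reject the root itself, anything that climbs out via "..", and
  // results on a different drive (absolute on Windows).
  const escapes = relative === ".." || relative.startsWith(".." + path.sep);
  return relative !== "" && !escapes && !path.isAbsolute(relative);
}
```

With a root of `/skills`, `debug/SKILL.md` is accepted while `../etc/passwd` and `/etc/passwd` are rejected. The sketch does not resolve symlinks; a stricter check would compare real paths first.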
Memory recall quality can be checked with reusable case files.
npm run test:memory-eval seeds an isolated Memory root through the public CLI,
then checks recall after restart, graph links, correction replacement, Dream
reject/apply behavior, and maintenance weight decay.
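Recall reinforcement and maintenance weight decay can be illustrated with a toy model of weighted edges. The update rules and every constant below are invented for illustration; the README does not state Magi's actual formulas.

```typescript
// Toy model of a weighted memory edge. All rates are placeholder values.
interface Edge {
  from: string;
  to: string;
  weight: number; // assumed to stay within [0, 1]
}

// A successful recall along an edge nudges its weight toward 1.
function reinforce(edge: Edge, rate = 0.2): Edge {
  return { ...edge, weight: edge.weight + rate * (1 - edge.weight) };
}

// Maintenance scales every weight down and drops edges below a floor.
function decay(edges: Edge[], factor = 0.9, floor = 0.05): Edge[] {
  return edges
    .map((e) => ({ ...e, weight: e.weight * factor }))
    .filter((e) => e.weight >= floor);
}
```

Under these assumed rules, frequently recalled paths converge toward weight 1 while unused ones fade and are eventually pruned.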
Patch Engine behavior can be checked with:
npm run test:patch-eval
That eval runs a real headless CLI session against a mock provider and verifies
FilePatch ranking, failed-patch recovery, successful retry, exact FileEdit use,
and that FileWrite is not used for existing-file edits.
Goal/Plan lifecycle behavior can be checked with:
npm run test:goal-plan-eval
That eval runs a real headless CLI session against a mock provider and verifies
active goal injection, completed-goal suppression, plan mode mutation denial,
submitted plan persistence, and goal completion state.
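"Plan mode mutation denial" amounts to refusing write-capable tools while a plan is awaiting approval. The sketch below is an assumption about the shape of such a guard: the tool names appear in this README, but grouping them as "mutating" and the `checkToolCall` function are illustrative.

```typescript
type Mode = "normal" | "plan";

// Assumption: these tools count as mutating. The grouping is illustrative.
const MUTATING_TOOLS: ReadonlySet<string> = new Set([
  "FileWrite",
  "FileEdit",
  "FilePatch",
  "Bash",
]);

interface Decision {
  allowed: boolean;
  reason: string;
}

// Deny mutating tools in plan mode; allow everything else.
function checkToolCall(mode: Mode, tool: string): Decision {
  if (mode === "plan" && MUTATING_TOOLS.has(tool)) {
    return { allowed: false, reason: `${tool} is denied in plan mode` };
  }
  return { allowed: true, reason: "ok" };
}
```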
Tool Discovery behavior can be checked with:
npm run test:tool-discovery-eval
That eval runs a real headless CLI session against a mock provider and verifies
core/deferred tool exposure, ToolSearch intent ranking, select:<tool> schema
reveal, and persisted tool usage feedback affecting later ranking.
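The ranking behavior this eval covers can be approximated as keyword matching with a small bonus for past usage, plus an exact-name path for `select:<tool>`. The scoring formula, the `ToolEntry` shape, and the descriptions in the usage note are assumptions, not Magi's actual ranking.

```typescript
interface ToolEntry {
  name: string;
  description: string;
  uses: number; // persisted usage count (hypothetical field)
}

function rankTools(query: string, tools: ToolEntry[]): ToolEntry[] {
  // "select:<tool>" asks for one tool by exact name.
  const prefix = "select:";
  if (query.startsWith(prefix)) {
    const name = query.slice(prefix.length);
    return tools.filter((t) => t.name === name);
  }
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const score = (t: ToolEntry): number => {
    const haystack = `${t.name} ${t.description}`.toLowerCase();
    const hits = terms.filter((term) => haystack.includes(term)).length;
    // Usage only breaks ties among tools that already match the query.
    return hits === 0 ? 0 : hits + 0.1 * Math.log1p(t.uses);
  };
  return tools
    .map((t) => ({ tool: t, score: score(t) }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((x) => x.tool);
}
```

For example, if SkillManage ("create or patch skills", 5 uses) and LearningDraft ("create learning drafts", 0 uses) both match the query `create`, SkillManage ranks first because of its usage bonus.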
Control API behavior can be checked with:
npm run test:control-api-eval
That eval starts magi serve from the built CLI, pairs a device, verifies SSE
events, resolves a mobile approval for FileWrite, cancels a streaming background
job, cancels an active approval, resumes a panel session, and checks durable
audit evidence.
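For readers unfamiliar with SSE, the sketch below parses a chunk of server-sent event text into events, following the standard `event:` / `data:` field format. It is a simplified illustration that handles only those two fields; the event names Magi emits are not documented here, so the one in the usage note is made up.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Simplified SSE parser: events are separated by blank lines, and only
// the "event" and "data" fields are handled.
function parseSse(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of chunk.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event type per the SSE format
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // Strip the single optional space after the colon.
        data.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

Feeding it `event: approval\ndata: {"tool":"FileWrite"}\n\n` yields one event of type `approval` whose data is the JSON string.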
Complex task harness behavior can be checked with:
npm run test:complex-harness
That harness runs isolated H1-H10 business fixtures through the built CLI and
mock providers, then validates stream-json lifecycle, SQLite session/audit
evidence, file diffs, forbidden paths, multi-agent write conflicts, Bash
approval control, and provider retry/fallback routing.
Live provider behavior can be checked with an opt-in smoke task:
MAGI_LIVE_SMOKE=1 MAGI_OPENAI_API_KEY=... npm run test:live-smoke
The live smoke creates an isolated fixture, asks the configured model to fix a
failing test, reruns that test, and writes a short report. Without
MAGI_LIVE_SMOKE=1, the script skips and writes a skipped report so normal CI
does not depend on upstream model availability.
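The opt-in gate can be pictured as follows. This is a sketch of the skip behavior only: `runLiveSmoke`, the `SmokeReport` shape, and passing the environment in as a plain object are assumptions, while `MAGI_LIVE_SMOKE=1` comes from the command above.

```typescript
type Env = Record<string, string | undefined>;

interface SmokeReport {
  status: "skipped" | "passed" | "failed";
  reason?: string;
}

// Runs the task only when the opt-in flag is exactly "1"; otherwise
// returns a skipped report without touching the provider.
function runLiveSmoke(env: Env, runTask: () => boolean): SmokeReport {
  if (env["MAGI_LIVE_SMOKE"] !== "1") {
    return { status: "skipped", reason: "MAGI_LIVE_SMOKE is not set to 1" };
  }
  return { status: runTask() ? "passed" : "failed" };
}
```

Because a skipped report is still a report, downstream aggregation can distinguish "not run" from "failed".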
After the eval scripts run, aggregate the current capability evidence with:
npm run report:capability
npm run report:capability:nightly
npm run verify runs the aggregate report last and fails if blackbox, Memory,
Patch Engine, Goal/Plan, Tool Discovery, Control API, model task, or complex
harness gates miss their required thresholds. The default capability trend
profile is strict for CI.
report:capability:nightly uses the same evidence with a wider efficiency
budget for scheduled longer benchmark runs, while still failing on score,
regression, and excessive provider/tool call growth.
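The difference between the strict and nightly profiles can be sketched as two threshold sets applied to the same evidence. Every number and field name below is a placeholder; the real thresholds live in the repository's report scripts.

```typescript
interface GateResult {
  name: string;
  score: number;
  providerCalls: number;
}

interface Profile {
  minScore: number;
  maxProviderCalls: number;
}

type ProfileName = "strict" | "nightly";

// Placeholder thresholds: nightly keeps the score bar and widens only
// the efficiency budget.
const PROFILES: Record<ProfileName, Profile> = {
  strict: { minScore: 0.9, maxProviderCalls: 20 },
  nightly: { minScore: 0.9, maxProviderCalls: 40 },
};

// Returns the names of gates that miss the profile's thresholds.
function failingGates(results: GateResult[], profileName: ProfileName): string[] {
  const profile = PROFILES[profileName];
  return results
    .filter((r) => r.score < profile.minScore || r.providerCalls > profile.maxProviderCalls)
    .map((r) => r.name);
}
```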
Configuration
~/.magi-next/config.yaml:
providers:
  openai:
    type: openai
    apiKeyEnv: OPENAI_API_KEY
    baseUrl: https://api.openai.com/v1
    defaultModel: gpt-5.5
models:
  aliases:
    fast: openai:gpt-5.5
    main: openai:gpt-5.5
    review: openai:gpt-5.5
    deep: openai:gpt-5.5
  router: # used when alias = "auto"
    fast: { family: gpt, role: haiku, contextWindow: 200000, supportsVision: true }
    main: { family: gpt, role: sonnet, contextWindow: 200000, supportsVision: true }
    deep: { family: gpt, role: opus, contextWindow: 200000, supportsVision: true }
memory:
  selectionModel: fast # optional relevance selector for large memory sets
  writeDecisionModel: fast # optional judge for remember/correction requests
Run magi init to generate a working config and skip the manual yaml.
It supports OpenAI, Anthropic, and DeepSeek credentials.
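Alias values in the config follow a `provider:model` pattern. The sketch below shows one way a task kind could be mapped to an alias and then resolved to a provider and model. The task kinds, `pickAlias`, and `resolveAlias` are hypothetical and do not describe Magi's actual router.

```typescript
// Hypothetical sketch of alias resolution; not Magi's actual router.
type Alias = "fast" | "main" | "review" | "deep";

interface ModelRef {
  provider: string;
  model: string;
}

// Task kinds are invented for illustration.
type TaskKind = "lookup" | "edit" | "architecture";

function pickAlias(kind: TaskKind): Alias {
  switch (kind) {
    case "lookup":
      return "fast";
    case "edit":
      return "main";
    case "architecture":
      return "deep";
  }
}

// Splits "provider:model" at the first colon.
function resolveAlias(aliases: Record<Alias, string>, alias: Alias): ModelRef {
  const target = aliases[alias];
  const sep = target.indexOf(":");
  if (sep <= 0 || sep === target.length - 1) {
    throw new Error(`expected "provider:model", got "${target}"`);
  }
  return { provider: target.slice(0, sep), model: target.slice(sep + 1) };
}

// Mirrors the aliases block of the example config.
const aliases: Record<Alias, string> = {
  fast: "openai:gpt-5.5",
  main: "openai:gpt-5.5",
  review: "openai:gpt-5.5",
  deep: "openai:gpt-5.5",
};
```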
Cross-machine setup
# On each machine you want to use:
MAGI_CONTROL_BIND=0.0.0.0 magi daemon start
# On your "main" machine, see who's around:
magi peers
# Pair another machine (run on the peer to get a token):
magi pair from-peer
# → outputs a Device ID + Token
# On main, save the credentials:
magi peers add peer-2 http://192.168.1.50:8765 <device-id> <token>
# Now in the TUI, the agent can dispatch to peer-2:
> compare the auth modules in this repo and the one on peer-2
The agent uses the Agent tool with target: "peer-2" to dispatch
sub-agents. Multiple targets in the same response run in parallel.
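Running several targets in parallel while keeping one peer's failure from sinking the others can be sketched like this. The `Dispatcher` callback stands in for whatever transport Magi uses to reach a peer; it and `dispatchAll` are hypothetical names.

```typescript
interface DispatchResult {
  target: string;
  ok: boolean;
  output: string;
}

// Hypothetical transport: sends a prompt to one peer and resolves with
// that sub-agent's output.
type Dispatcher = (target: string, prompt: string) => Promise<string>;

// Starts every dispatch at once and collects per-target outcomes.
async function dispatchAll(
  targets: string[],
  prompt: string,
  dispatch: Dispatcher,
): Promise<DispatchResult[]> {
  return Promise.all(
    targets.map(async (target): Promise<DispatchResult> => {
      try {
        return { target, ok: true, output: await dispatch(target, prompt) };
      } catch (err) {
        // A failing peer is reported instead of rejecting the whole batch.
        return { target, ok: false, output: String(err) };
      }
    }),
  );
}
```

Results come back in the same order as `targets`, so a caller can pair each output with the peer it asked.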
Phone access
# The default daemon bind is 127.0.0.1, which a phone cannot reach.
magi daemon stop
MAGI_CONTROL_BIND=0.0.0.0 magi daemon start
magi pair my-phone
# → prints Device ID, Token, and one or more URLs like:
# http://192.168.1.10:8765/panel
Open the printed /panel URL on a phone connected to the same LAN, then enter
the printed Device ID and Token. Tokens are not placed in the URL. The current
CLI prints URLs and credentials; it does not generate a QR code.
Building from source
git clone https://github.com/EDLee01/magi.git
cd magi
npm install
npm run build
npm test
npm run test:memory-eval
npm run test:patch-eval
npm run test:goal-plan-eval
npm run test:tool-discovery-eval
npm run test:control-api-eval
npm run test:live-smoke
npm run report:capability
Requires Node ≥ 20.
Status
Active development. The core agent loop, routing, MCP, daemon, multi-machine
dispatch, and mobile web panel are implemented and covered by tests. Beta
quality; APIs and UX may still change.
Filing bugs: open a GitHub issue with the output of magi doctor and magi --version.
Common commands
magi
magi -p "<prompt>"
magi init
magi doctor
magi sessions
magi resume <id>
magi memory search <q>
magi memory link --from <node> --to <node>
magi memory feedback --target <node> --signal useful
magi memory feedback trends
magi memory eval --case-file <file>
magi learning list
magi ps
magi logs <job-id>
magi daemon start
magi pair <name>
magi peers
magi tutorial
Documentation
magi tutorial: interactive walkthrough
State and isolation
Everything lives at ~/.magi-next/ by default. Override the root with
MAGI_CONFIG_DIR=/path for testing or sandboxing.