docs: sync the doc system with regenerate / loop / concession + P3 work
The skill driver, user manual, READMEs, spec, and status handoff had fallen behind the code:
- SKILL.md: add `regenerate` and `loop` to the command table, document the anti-sycophancy concession gate on `critique`, and correct the stale “no autonomous loop” / “doesn’t handle figures/tables” boundaries.
- user-manual.zh-CN.md: add `regenerate` and `loop` sections, a concession note on `critique`, and refresh the analyze-repo signals / PDF-caption / loop limitation lines.
- README (en + zh): fix front-matter claims that `loop` isn’t exposed yet, add the new analyze-repo signal fields and the figure/table caption note, and drop `loop` from the zh “not yet implemented” list.
- research-copilot-v1-spec.md: add an implementation-status banner mapping the proposed commands to what actually shipped (loop/critique/regenerate/collect).
- handoff-next-steps.zh-CN.md: refresh the weaknesses list (sections+captions, repo signals, the full workflow command set incl. loop).
Co-Authored-By: Claude Opus 4.8 (1M context) noreply@anthropic.com
# AI Researcher
CLI-first Research Copilot prototype. The user-facing CLI is still v0-style, while the codebase now contains a first v1 workflow slice.
For a fuller Chinese user manual, see `docs/user-manual.zh-CN.md`.

## What It Does
This project is not a general-purpose autonomous researcher. The current CLI focuses on four task types plus seven inspection/workflow helpers:
- `topic`: Turns a research topic into a structured research map.
- `analyze-paper`: Analyzes a local file or paper identifier/URL and produces a structured paper card style output.
- `analyze-repo`: Analyzes a local repository and produces a first-pass structure and reproducibility review.
- `plan-experiments`: Turns a research idea into multiple falsifiable experiment plans.
- `search`: Searches arXiv and (optionally, via `--source`) OpenAlex, and prints matching papers to stdout.
- `collect`: Searches for papers and appends them as de-duplicated evidence refs to an existing task, without changing its workflow status.
- `critique`: Runs the structured critic on a task and advances it through the `criticizing → criticized` gate. This is the only way past the critic gate.
- `status`: Prints the saved workflow status, suggested next transition, and whether a critique or human decision is still required.
- `advance`: Advances a saved task through the workflow state machine after explicit human input. Refuses to leave `criticizing` until `critique` has run.
- `report`: Rewrites `report.md` for an existing task from the saved task state.
- `review`: Prints the path to the saved `task.json` for an existing task.

## Current Scope
What the current prototype does:
- … `artifacts/`
- … `prompts/`
- … `.env` is configured
- … `pdfminer.six`, falling back to `pypdf` when needed
- … `paper_card.method`, `paper_card.datasets`, and `paper_card.metrics` entries when they do not appear in extracted text
- … `topic`, `analyze-paper`, and `plan-experiments` when the model path is enabled (`analyze-paper` keeps these external papers separate from in-paper grounding, so other papers’ abstracts can’t masquerade as verbatim quotes from the analyzed paper)
- … `--source {arxiv,openalex,all}` on `search` and `collect`, while the default and all internal auto-search remain arXiv-only, and a single provider failing degrades to a warning plus the other providers’ results
- … `criticizing`, `critique` is the only edge to `criticized`, and no task can reach `done` without it
- … `task.critique_log` across `regenerate`) raises a `concession_alarm` and needs `critique --override` to proceed
- … `regenerate`) and auto-drives the workflow to the next decision point (`loop`), stopping at every critic block or human gate rather than forcing past it

Current boundaries:

- `loop` is bounded by design: it never auto-resolves a critic block or auto-approves the human-decision gate
- `report` is a regeneration helper and can overwrite the original generated `report.md`

## Environment
Preferred local setup:
If `uv` is unavailable, `venv` also works:

For an OpenAI-compatible model provider, create a local `.env` from `.env.example` and fill in your endpoint, key, and default model:

Expected variables:

- `OPENAI_API_KEY`
- `OPENAI_BASE_URL`
- `OPENAI_MODEL`
- `OPENAI_TIMEOUT_SECONDS`
- `OPENAI_MAX_RETRIES`

## Quick Start
Use the installed CLI inside the environment:
You can also use the repo-local launcher without installation:
Each core command prints the generated `task.json` path, for example:

The `task_id` is the directory name under `artifacts/tasks/`, such as `topic-map-weak-to-strong-alignment-1a2b3c4d`. Use that ID with the helper commands:

## Commands And Outputs

### topic

Example:
Input:
Primary output fields in `outputs/result.json`:

- `summary`
- `research_questions`
- `hypotheses`
- `reading_queue`
- `evidence_needs`
- `next_action`

Human-readable output:

- `report.md` with summary, questions, hypotheses, reading queue, and next action

### analyze-paper

Example:
Accepted input:
Primary output fields in `outputs/result.json`:

- `summary`
- `paper_input`
- `resolved_input`
- `claims_checklist`
- `paper_card`
- `ungrounded_paper_card_fields` when model output contains method, dataset, or metric entries not found in extracted text
- `next_action`

Notes:

- … `downloads/`
- … `downloads/index.json`
- … `pdfminer.six`, falling back to `pypdf` when needed

### analyze-repo

Example:
Accepted input:
Primary output fields in `outputs/result.json`:

- `summary`
- `repo_path`
- `file_count`
- `top_file_types`
- `readme_excerpt`
- `dependency_files`
- `entrypoints`
- `test_files`
- `config_files`
- `train_eval_scripts`
- `ci_configs`
- `notebooks`
- `data_dirs`
- `run_commands`
- `critical_checks`
- `reproducibility_notes`
- `readme_assessment` when generated by the model path
- `next_action`

Notes:

### plan-experiments

Example:
Input:
Primary output fields in `outputs/result.json`:

- `summary`
- `experiment_plans`
- `next_action`

Each item in `experiment_plans` includes:

- `name`
- `objective`
- `baseline`
- `metrics`
- `failure_conditions`
- `risks`

### search

Example:
Behavior:
- `--source {arxiv,openalex,all}` selects the backend (default `arxiv`, so OpenAlex is opt-in)
- … `artifacts/` or attach results to an existing task

### collect

Example:
Behavior:
- … `source_type` + `source_ref`) to an existing task
- … `--query` overrides it, `--max` defaults to `3`, and `--source {arxiv,openalex,all}` selects the backend (default `arxiv`)
- … `evidence.json`, `task.json` evidence refs, adds a `collect_evidence` entry to the decision log, and logs to `logs/events.log`

This closes the critic loop: when `critique` blocks a task back to `collecting_evidence`, run `collect` to attach external corroboration, then `advance` (`collecting_evidence → criticizing`), then `critique` again.

### status

Example:
Behavior:
### critique

Example:
Behavior:
- … `outputs/critic.json` + `critic.md`
- … `recommended_action` and open-concern count) to `task.json`
- … `criticizing`, advances it to `criticized`, the only edge out of the critic gate
- … `task.critique_log` (preserved across `regenerate`). On a re-critique that flips toward passing while dropping >50% of the prior round’s concerns (or concedes again right after a prior concession), it raises a `concession_alarm`, holds the task at `collecting_evidence`, and only `critique --override REASON` proceeds, which stops the model from rubber-stamping a regenerated result for conversational harmony
- … `outputs/result.json`

### advance

Example:
Behavior:
- … `criticizing` and tells you to run `critique` first
- … `task.json` and `logs/events.log`

### regenerate

Example:
Behavior:
- … `task_id`, overwriting `result.json` / `report.md` (`collect` only attaches evidence and `advance` only changes state; neither recomputes the product)
- … `source_type` + `source_ref`), so externally `collect`-ed corroboration survives
- … `critic.json` / `critic.md` / critic refs, then rewinds the task to `criticizing` to face the gate again
- … `collecting_evidence` / `criticizing` / `criticized`)
- … `collect` cannot: when the critic blocks because a field is absent from the paper itself (in-paper grounding), only recomputing the analysis helps, not more external evidence

### loop

Example:
Behavior:
- … `advance`, and at `criticizing` it runs the real critic (reusing `critique` / `advance`, so the gate logic stays single-sourced)
- … `criticizing → collecting_evidence`), the human-decision gate (`needs_human_decision`), terminal states (`done` / `failed`), and a safety step cap
- … `collect` / `regenerate` to push past the critic gate: the gate exists to stop blind advancement, so resolving a block is a human call. `loop` stops and prints the remediation commands (`collect` / `regenerate` / `critique --override`); resolve, then run `loop` again to continue from where it stopped

### report

Example:
Behavior:
- … `report.md` from saved task state
- … `report.md`; the regenerated file is a task-state report and may be less detailed than the original generated analysis report

### review

Example:
Behavior:
- … `task.json`

## Test
## Output Layout

- `artifacts/tasks/<task_id>/task.json`
- `artifacts/tasks/<task_id>/report.md`
- `artifacts/tasks/<task_id>/evidence.json`
- `artifacts/tasks/<task_id>/inputs/pdf_text.txt` when extractable PDF text exists
- `artifacts/tasks/<task_id>/logs/events.log`
- `artifacts/tasks/<task_id>/outputs/result.json`
- `downloads/` stores fetched paper files
- `downloads/index.json` stores download dedup metadata
- `prompts/<task>/system.txt` and `prompts/<task>/user.txt` store editable prompt assets

## v1 Slice Already Present
The latest code includes a small v1 foundation, but it is not a complete v1 product yet:
- `src/ai_researcher/workflow.py` defines research-loop statuses and transition validation.
- `TaskRecord` already has fields for `research_question`, `hypotheses`, `evidence_needs`, `critic_refs`, and `decision_log`.
- `docs/research-copilot-v1-spec.md` describes the target v1 workflow and open questions.

The current executable CLI still exposes only:

`collect` is the real, wired command (see the `### collect` section above) and is distinct from the still-unimplemented names below. Do not document `start`, `loop`, `collect-papers`, `add-evidence`, `evidence`, or `propose-run` as available commands until they are wired into `src/ai_researcher/cli.py`.

## File Meanings
### task.json

This is the task state snapshot. It contains:

- `task_id`
- `task_type`
- `status`
- `input`
- `summary`
- `evidence_refs`
- `next_action`

It is the control record for the task, not the full analysis body.
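As a quick illustration of treating `task.json` as a control record, here is a minimal Python sketch that loads the file and keeps only the fields listed above. It assumes the file is plain JSON with those names as top-level keys and that it lives at `artifacts/tasks/<task_id>/task.json`. The helper name `load_task_state` is hypothetical and not part of the project's API.

```python
import json
from pathlib import Path

# Control fields named in this section; assumed to be top-level JSON keys.
CONTROL_FIELDS = (
    "task_id",
    "task_type",
    "status",
    "input",
    "summary",
    "evidence_refs",
    "next_action",
)


def load_task_state(task_id: str, root: Path = Path("artifacts/tasks")) -> dict:
    """Hypothetical helper: return only the control fields from a task's task.json."""
    task_file = root / task_id / "task.json"
    data = json.loads(task_file.read_text(encoding="utf-8"))
    # Keys missing from the file come back as None instead of raising.
    return {field: data.get(field) for field in CONTROL_FIELDS}
```

Anything beyond these fields, such as the analysis payload, would still be read from `outputs/result.json`.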
### outputs/result.json

This is the main structured output for the task.
This file contains the task-specific payload, for example:
- … `topic`
- `paper_card` for `analyze-paper`
- … `analyze-repo`
- `experiment_plans` for `plan-experiments`

If you want the actual structured result, this is the main file to inspect.
### report.md

This is the human-readable version of the result.
It is not identical to `task.json`. It is derived from the structured output and meant for reading, not programmatic consumption.

### outputs/critic.json and critic.md

These are created by `research critique <task_id>`. The JSON file is the structured critic result; the markdown file is the human-readable version. Critiques do not overwrite the original `outputs/result.json`.

### evidence.json

This stores the evidence references attached to the task.
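The `collect` and `regenerate` sections describe evidence refs as de-duplicated by `source_type` + `source_ref`. A minimal sketch of such a merge follows, assuming each ref is a dict carrying those two keys. The function name `merge_evidence_refs` is illustrative, and the project's actual implementation may differ.

```python
def merge_evidence_refs(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Append incoming refs, skipping any whose (source_type, source_ref) already exists."""
    # Assumption: every ref dict has "source_type" and "source_ref" keys.
    seen = {(ref["source_type"], ref["source_ref"]) for ref in existing}
    merged = list(existing)  # copy, so the caller's list is not mutated
    for ref in incoming:
        key = (ref["source_type"], ref["source_ref"])
        if key not in seen:
            seen.add(key)
            merged.append(ref)
    return merged
```

Keying on the pair rather than on `source_ref` alone keeps two providers apart even if they happen to reuse an identifier.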
### inputs/pdf_text.txt

This stores extracted PDF text for human audit when PDF text extraction succeeds.
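The "`pdfminer.six`, falling back to `pypdf`" behavior mentioned earlier can be sketched as an ordered list of extractors. This is a sketch under assumptions, not the project's code: it treats an exception or empty text as the trigger for the fallback, and the function names are hypothetical. The library calls used are `pdfminer.high_level.extract_text` and `pypdf.PdfReader`.

```python
from typing import Callable, Iterable, Optional


def _extract_with_pdfminer(path: str) -> str:
    from pdfminer.high_level import extract_text  # provided by pdfminer.six
    return extract_text(path)


def _extract_with_pypdf(path: str) -> str:
    from pypdf import PdfReader
    reader = PdfReader(path)
    # extract_text() may return an empty value for pages without a text layer.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def extract_pdf_text(
    path: str,
    extractors: Optional[Iterable[Callable[[str], str]]] = None,
) -> Optional[str]:
    """Try each extractor in order; return the first non-empty text, else None."""
    if extractors is None:
        extractors = (_extract_with_pdfminer, _extract_with_pypdf)
    for extract in extractors:
        try:
            text = extract(path)
        except Exception:
            continue  # missing library or unreadable PDF: try the next extractor
        if text and text.strip():
            return text
    return None
```

A `None` result would correspond to the case where no `inputs/pdf_text.txt` is written.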
### logs/events.log

This stores task lifecycle events such as creation and finalization.
## Verified End-To-End Paths
Real model-backed end-to-end runs have been verified for:
- `topic`
- `analyze-paper`
- `plan-experiments`

The closed critic loop (`topic` → `collect --source openalex` → `critique`) has also been verified with a real model and real network: OpenAlex was live-verified, and arXiv instability (timeout / HTTP 429) was reproduced, with OpenAlex providing resilience so the task still completed.

`analyze-repo` is implemented and available, but the current implementation is still a local static analysis pass rather than a richer execution-aware repo workflow.
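For readers tracing the critic loop, the concession rule described in the `critique` section (a re-critique that flips toward passing while dropping more than 50% of the prior round's concerns) can be sketched as below. This is a simplified illustration under assumptions: it compares concern counts only, the `CritiqueRound` type and its field names are hypothetical, and the "concedes again right after a prior concession" rule is not modeled.

```python
from dataclasses import dataclass


@dataclass
class CritiqueRound:
    passing: bool       # hypothetical flag: the critic's recommended action lets the task proceed
    open_concerns: int  # open-concern count recorded for the round


def raises_concession_alarm(
    prev: CritiqueRound,
    curr: CritiqueRound,
    max_drop: float = 0.5,
) -> bool:
    """Return True when the new round flips to passing and drops more than max_drop of prior concerns."""
    flipped_to_pass = curr.passing and not prev.passing
    if not flipped_to_pass or prev.open_concerns <= 0:
        return False
    dropped = prev.open_concerns - curr.open_concerns
    return dropped / prev.open_concerns > max_drop
```

For instance, going from 6 open concerns (blocked) to 2 (passing) drops two thirds of the concerns and would trip the alarm in this sketch, while dropping exactly half would not.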