AncientDoc: Benchmarking Vision-Language Models on Chinese Ancient Documents
📖 Introduction
Chinese ancient documents are invaluable carriers of history and culture, but their visual complexity, linguistic variety, and lack of benchmarks make them challenging for modern Vision-Language Models (VLMs). We introduce AncientDoc, the first benchmark designed for evaluating VLMs on Chinese ancient documents, covering the full pipeline from OCR to knowledge reasoning.
🏛 Dataset Overview
🧩 Task Definition
📊 Evaluation Metrics
🚀 Baseline Results
We evaluate both open-source VLMs (Qwen2.5-VL, InternVL, LLaVA, etc.) and closed-source VLMs (GPT-4o, Gemini 2.5 Pro, Doubao-V2, etc.).
Data Format
Each JSONL file contains:
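As a minimal sketch of how such a file could be consumed, the snippet below parses JSONL lines into Python dicts. The field names (`image_path`, `task`, `question`, `answer`) are illustrative assumptions, not the benchmark's actual schema:

```python
import json

# Hypothetical AncientDoc-style record; field names are
# illustrative assumptions, not the dataset's real schema.
sample_line = (
    '{"image_path": "images/0001.png", '
    '"task": "ocr", '
    '"question": "Transcribe the page.", '
    '"answer": "..."}'
)

def load_jsonl(lines):
    """Parse an iterable of JSONL lines, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]

records = load_jsonl([sample_line])
print(records[0]["task"])  # prints "ocr"
```

In practice you would iterate over the file object directly (`load_jsonl(open(path, encoding="utf-8"))`), since JSONL stores one JSON object per line.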
📌 Citation
If you use AncientDoc in your research, please cite:
🔗 Resources
Data License
The AncientDoc dataset is released under the CC0 license.