Note: if you want the vLLM and Transformers code to run in the same environment, you don't need to worry about an installation error like: vllm 0.8.5+cu118 requires transformers>=4.51.1
vLLM-Inference
vLLM:
Note: change INPUT_PATH/OUTPUT_PATH and other settings in DeepSeek-OCR-master/DeepSeek-OCR-vllm/config.py
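For orientation, a config fragment like the one below is what the note refers to; only INPUT_PATH and OUTPUT_PATH are named in this README, so the values and comments are placeholders, not the repo's actual file.

```python
# Illustrative sketch of DeepSeek-OCR-vllm/config.py.
# Only INPUT_PATH and OUTPUT_PATH are mentioned in the note above;
# the paths shown here are placeholders to be replaced with your own.
INPUT_PATH = '/path/to/your/images'   # image file or directory to OCR
OUTPUT_PATH = '/path/to/output/dir'   # where OCR results are written
```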
```Shell
uv venv
source .venv/bin/activate
# Until the v0.11.1 release, you need to install vLLM from the nightly build
uv pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```
Or you can run the script directly:
```Shell
cd DeepSeek-OCR-master/DeepSeek-OCR-hf
python run_dpsk_ocr.py
```
Support-Modes
The current open-source model supports the following modes:
Native resolution:
Tiny: 512×512 (64 vision tokens)✅
Small: 640×640 (100 vision tokens)✅
Base: 1024×1024 (256 vision tokens)✅
Large: 1280×1280 (400 vision tokens)✅
Dynamic resolution:
Gundam: n×640×640 + 1×1024×1024 ✅
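The vision-token counts above follow a simple pattern: every native mode yields a (resolution/64)² grid of tokens (512→64, 640→100, 1024→256, 1280→400), and Gundam mode combines n Small tiles with one Base view. This is arithmetic inferred from the numbers listed above, not an officially stated formula:

```python
def vision_tokens(resolution: int) -> int:
    """Token count for a native-resolution mode: a (resolution/64)^2 grid.

    The 64-pixel granularity is inferred from the listed figures
    (512->64, 640->100, 1024->256, 1280->400), not stated by the model card.
    """
    side = resolution // 64
    return side * side

def gundam_tokens(n: int) -> int:
    """Dynamic 'Gundam' mode: n local 640x640 tiles plus one 1024x1024 view."""
    return n * vision_tokens(640) + vision_tokens(1024)

for r in (512, 640, 1024, 1280):
    print(r, vision_tokens(r))
print('gundam n=4 ->', gundam_tokens(4))  # 4*100 + 256 = 656
```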
Prompts examples
```Shell
# document: <image>\n<|grounding|>Convert the document to markdown.
# other image: <image>\n<|grounding|>OCR this image.
# without layouts: <image>\nFree OCR.
# figures in document: <image>\nParse the figure.
# general: <image>\nDescribe this image in detail.
# rec: <image>\nLocate <|ref|>xxxx<|/ref|> in the image.
# '先天下之忧而忧'
```
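The prompt strings above can be collected into a small helper. The template strings are copied verbatim from the examples; the dict and the `rec_prompt` function are a convenience sketch, not part of the DeepSeek-OCR repository:

```python
# Prompt templates taken from the examples above.
PROMPTS = {
    'document': '<image>\n<|grounding|>Convert the document to markdown.',
    'image_ocr': '<image>\n<|grounding|>OCR this image.',
    'free_ocr': '<image>\nFree OCR.',
    'figure': '<image>\nParse the figure.',
    'describe': '<image>\nDescribe this image in detail.',
}

def rec_prompt(text: str) -> str:
    """Build a localization ('rec') prompt for a reference string."""
    return f'<image>\nLocate <|ref|>{text}<|/ref|> in the image.'
```

For example, `rec_prompt('先天下之忧而忧')` reproduces the last example above.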
📥 Model Download | 📄 Paper Link | 📄 Arxiv Paper Link
DeepSeek-OCR: Contexts Optical Compression
Explore the boundaries of visual-text compression.
Release
Contents
Install
vLLM-Inference
[2025/10/23] The upstream version of vLLM now supports DeepSeek-OCR.
Transformers-Inference
```python
from transformers import AutoModel, AutoTokenizer
import torch

model_name = 'deepseek-ai/DeepSeek-OCR'
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2',
                                  trust_remote_code=True, use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)

# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = 'your_image.jpg'
output_path = 'your/output/dir'

res = model.infer(tokenizer, prompt=prompt, image_file=image_file,
                  output_path=output_path, base_size=1024, image_size=640,
                  crop_mode=True, save_results=True, test_compress=True)
```
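The `base_size=1024, image_size=640, crop_mode=True` combination in the call above corresponds to the Gundam dynamic-resolution mode. Assuming the native-resolution modes simply set `base_size == image_size` to the mode's resolution with cropping disabled (only the Gundam row is confirmed by the snippet above), the per-mode settings could be tabulated as:

```python
# infer() settings per mode. Only the 'gundam' row appears in the snippet
# above; the native-resolution rows are an assumption, not from the repo.
MODE_SETTINGS = {
    'tiny':   dict(base_size=512,  image_size=512,  crop_mode=False),
    'small':  dict(base_size=640,  image_size=640,  crop_mode=False),
    'base':   dict(base_size=1024, image_size=1024, crop_mode=False),
    'large':  dict(base_size=1280, image_size=1280, crop_mode=False),
    'gundam': dict(base_size=1024, image_size=640,  crop_mode=True),
}
```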
Support-Modes
Prompts examples
Visualizations
Acknowledgement
We would like to thank Vary, GOT-OCR2.0, MinerU, PaddleOCR, OneChart, and Slow Perception for their valuable models and ideas.
We also appreciate the benchmarks Fox and OmniDocBench.
Citation