YOLOv12
YOLOv12: Attention-Centric Real-Time Object Detectors
Yunjie Tian¹, Qixiang Ye², David Doermann¹
¹ University at Buffalo, SUNY; ² University of Chinese Academy of Sciences
Comparison with popular methods in terms of latency-accuracy (left) and FLOPs-accuracy (right) trade-offs
Updates
2025/02/24: Some blog introductions: ultralytics, LearnOpenCV, Medium@Mert. Thanks to them!
2025/02/22: YOLOv12 TensorRT CPP Inference Repo + Google Colab Notebook Support.
2025/02/22: Android deployment. TensorRT-YOLO accelerates yolo12 inference. Thanks to them!
2025/02/21: Try yolo12 for classification, oriented bounding boxes, pose estimation, and instance segmentation at ultralytics. Please pay attention to this issue. Thanks to them!
2025/02/20: yolo12 is now supported on any computer or edge device.
2025/02/20: ONNX CPP version. Training a yolov12 model on a custom dataset: an introduction on YouTube, and "How to train YOLO12 on a custom dataset | Step-by-step guide" by Noor.
2025/02/19: The arXiv version is public. A demo is available (try Demo2 or Demo3 if it is busy).
Abstract
Enhancing the network architecture of the YOLO framework has been crucial for a long time but has focused on CNN-based improvements despite the proven superiority of attention mechanisms in modeling capabilities. This is because attention-based models cannot match the speed of CNN-based models. This paper proposes an attention-centric YOLO framework, namely YOLOv12, that matches the speed of previous CNN-based ones while harnessing the performance benefits of attention mechanisms.

YOLOv12 surpasses all popular real-time object detectors in accuracy with competitive speed. For example, YOLOv12-N achieves 40.6% mAP with an inference latency of 1.64 ms on a T4 GPU, outperforming advanced YOLOv10-N / YOLOv11-N by 2.1%/1.2% mAP with a comparable speed. This advantage extends to other model scales. YOLOv12 also surpasses end-to-end real-time detectors that improve DETR, such as RT-DETR / RT-DETRv2: YOLOv12-S beats RT-DETR-R18 / RT-DETRv2-R18 while running 42% faster, using only 36% of the computation and 45% of the parameters.
Main Results
Table columns: model, input size (pixels), mAP 50-95, latency on T4 with TensorRT10, parameters (M), FLOPs (G).
Installation
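After installing, a quick sanity check can confirm that the environment is usable. The snippet below is a minimal sketch: the package names it looks for (`ultralytics`, since this code is based on it, and its `torch` dependency) are assumptions, and `missing_packages` is a hypothetical helper, not part of this repository.

```python
import importlib.util
import sys


def missing_packages(names=("torch", "ultralytics")):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed package names; adjust to match the actual requirements.
missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All checked packages found for Python", sys.version.split()[0])
```

An empty result only means the packages are importable; it does not verify GPU support or version compatibility.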
Validation
yolov12n
yolov12s
yolov12m
yolov12l
yolov12x
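The five scales listed above can be validated with the ultralytics-style `YOLO` class that this code is based on. The following is a minimal sketch under stated assumptions: checkpoint files named `yolov12{n,s,m,l,x}.pt` and a `coco.yaml` dataset config are assumed, and `weights_for` and `validate` are hypothetical helper names introduced for illustration.

```python
MODEL_SCALES = ("n", "s", "m", "l", "x")


def weights_for(scale):
    """Return the assumed checkpoint filename for a scale, e.g. 'yolov12n.pt'."""
    if scale not in MODEL_SCALES:
        raise ValueError(f"unknown scale {scale!r}; expected one of {MODEL_SCALES}")
    return f"yolov12{scale}.pt"


def validate(scale="n", data="coco.yaml"):
    """Run validation for one scale and return the metrics object."""
    # Imported lazily so weights_for() stays usable without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights_for(scale))  # assumed checkpoint name
    return model.val(data=data, save_json=True)
```

Calling `validate("s")` would evaluate the small model; in ultralytics, the returned metrics expose mAP 50-95 as `metrics.box.map`.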
Training
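A minimal training sketch using the ultralytics-style API follows. The model config name `yolov12n.yaml`, the dataset config `coco.yaml`, and every hyperparameter value are assumptions chosen for illustration; they are not the settings used for the reported results. `train_kwargs` and `train` are hypothetical helpers.

```python
def train_kwargs(data="coco.yaml", epochs=100, batch=16, imgsz=640, device="0"):
    """Collect training arguments in one place (placeholder values, not the authors')."""
    return {
        "data": data,      # dataset config file
        "epochs": epochs,  # number of training epochs
        "batch": batch,    # total batch size
        "imgsz": imgsz,    # training image size in pixels
        "device": device,  # e.g. "0" for one GPU, "0,1" for two, or "cpu"
    }


def train(scale="n", **overrides):
    """Build a model from an assumed config name and train it from scratch."""
    # Imported lazily so train_kwargs() stays usable without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(f"yolov12{scale}.yaml")  # assumed config name
    return model.train(**train_kwargs(**overrides))
```

For a quick smoke test, something like `train("n", epochs=1, device="cpu")` keeps the run short before committing to a full schedule.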
Prediction
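Prediction with the ultralytics-style API returns one result object per image. The sketch below assumes a checkpoint named `yolov12n.pt` and flattens each result into plain tuples; `to_records` and `predict` are hypothetical helpers, and the confidence threshold is an illustrative value.

```python
def to_records(names, class_ids, confidences, boxes):
    """Pair each detection's class name with its confidence and [x1, y1, x2, y2] box."""
    return [
        (names[int(cls_id)], conf, box)
        for cls_id, conf, box in zip(class_ids, confidences, boxes)
    ]


def predict(source, scale="n", conf=0.25):
    """Run detection on an image path or URL and return a list of records per image."""
    # Imported lazily so to_records() stays usable without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(f"yolov12{scale}.pt")  # assumed checkpoint name
    results = model.predict(source=source, conf=conf)
    return [
        to_records(r.names, r.boxes.cls.tolist(), r.boxes.conf.tolist(), r.boxes.xyxy.tolist())
        for r in results
    ]
```

Each record is a `(class_name, confidence, box)` tuple, which is convenient for logging or for filtering by class without depending on tensor types.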
Export
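The updates above mention ONNX and TensorRT deployments, so an export sketch covers those two targets. It assumes the ultralytics-style `export` method and a checkpoint named `yolov12n.pt`; restricting the formats to `onnx` and `engine` (the TensorRT format name in ultralytics) is a choice made for this example, and `export_kwargs` and `export` are hypothetical helpers.

```python
SUPPORTED_FORMATS = ("onnx", "engine")  # limited to two formats for this sketch


def export_kwargs(fmt="onnx", half=False):
    """Validate the requested format and build the export arguments."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}")
    return {"format": fmt, "half": half}


def export(scale="n", fmt="onnx", half=False):
    """Export a checkpoint and return the path of the exported file."""
    # Imported lazily so export_kwargs() stays usable without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(f"yolov12{scale}.pt")  # assumed checkpoint name
    return model.export(**export_kwargs(fmt, half))
```

TensorRT export (`fmt="engine"`) requires a GPU environment with TensorRT installed; `half=True` requests FP16 precision.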
Demo
Acknowledgement
The code is based on ultralytics. Thanks for their excellent work!
Citation