ESOD: Efficient Small Object Detection on High-Resolution Images
This repository is the official implementation of Efficient Small Object Detection on High-Resolution Images.
Installation
Python>=3.6.0 is required, with all dependencies in requirements.txt installed, including PyTorch>=1.7:
Data Preparation
We currently support VisDrone, UAVDT, and TinyPerson datasets. Follow the instructions below to prepare datasets.
Data Prepare

Darknet Format: The Darknet framework locates the label file for each image by replacing the last instance of /images/ in the image path with /labels/ (and the image extension with .txt). For example:

dataset/images/im0.jpg
dataset/labels/im0.txt

The images and labels from VisDrone, UAVDT, and TinyPerson are all organized in this format.
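The pairing rule above can be sketched in a few lines of Python (YOLOv5-style code calls this img2label_paths; the function name here is illustrative):

```python
import os

def img2label_path(img_path):
    """Map a Darknet-style image path to its label path: replace the last
    '/images/' segment with '/labels/' and swap the extension for '.txt'."""
    sa = os.sep + "images" + os.sep
    sb = os.sep + "labels" + os.sep
    head, sep, tail = img_path.rpartition(sa)
    label = head + sb + tail if sep else img_path
    return os.path.splitext(label)[0] + ".txt"
```

For example, img2label_path("dataset/images/im0.jpg") yields "dataset/labels/im0.txt". Note that only the last /images/ segment is replaced, so nested image directories are handled correctly.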
Ground-Truth Heatmap: We recommend leveraging the Segment Anything Model (SAM) to introduce a precise shape prior into the GT heatmaps for training. Install SAM first:

cd third_party/segment-anything
pip install -e .

Dataset - VisDrone: Download the data, and ensure the subsets under the /path/to/visdrone directory are as follows:

Then make a soft link to your directory and run the scripts/data_prepare.py script to reorganize the images and labels:

Training
Run the commands below to reproduce results on the datasets, e.g., VisDrone. First download the pretrained weights (e.g., YOLOv5m) and put them in the weights/pretrained/ directory.

Training on a Single GPU

Here is the default setting to adapt YOLOv5m to VisDrone using our ESOD framework:
When multiple GPUs are available, the DistributedDataParallel mode can be applied to speed up training. Simply set GPUS according to your devices, e.g., GPUS=0,1,2,3.

We support the YOLOv5 series (YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x). After downloading them to weights/, simply change MODEL (as well as IMAGE_SIZE) to apply a different model:

Besides, we also support the RetinaNet, RTMDet, YOLOv8, and GPViT models. You can download the pre-trained weights, convert them to ESOD initialization, and train the models on specific datasets:
Feel free to set MODEL as retinanet, rtmdet, or gpvit (yolov8m does not require model conversion). Detailed instructions will come soon.

Besides VisDrone, we also support model training on the UAVDT and TinyPerson datasets:
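As a sketch of how the GPUS, MODEL, and IMAGE_SIZE variables fit together, the snippet below derives one DDP process per listed GPU; the torchrun launcher line is an assumption based on PyTorch's standard tooling, not this repo's actual script:

```shell
# Illustrative values only -- adjust to your hardware and chosen model.
GPUS=0,1,2,3
MODEL=yolov5l
IMAGE_SIZE=1920
# One DDP process per GPU; torchrun is PyTorch's standard multi-process launcher.
NPROC=$(echo "$GPUS" | awk -F, '{print NF}')
echo "Launching $NPROC processes on GPUs $GPUS with $MODEL at image size $IMAGE_SIZE"
```

Counting the comma-separated entries in GPUS keeps the process count and the visible devices in sync automatically.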
Testing
Vanilla Evaluation
Run the commands below to compute evaluation results (AP, AP50, empty rate, missing rate) with the integrated utils/metrics.py.

For computational analysis (including GFLOPs and FPS), use the following command:
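For intuition, FPS is simply iterations divided by wall-clock time. A minimal timing harness (with a stand-in workload, not the repo's profiler) looks like:

```python
import time

def measure_fps(run_once, n_iters=50):
    """Time n_iters calls of run_once and return frames per second."""
    start = time.perf_counter()
    for _ in range(n_iters):
        run_once()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed

# Stand-in workload; replace with a real model forward pass (and GPU
# synchronization, e.g. torch.cuda.synchronize) for meaningful numbers.
fps = measure_fps(lambda: sum(i * i for i in range(10000)))
```

When benchmarking on GPU, remember that CUDA kernels launch asynchronously, so timing without synchronization underestimates latency.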
Official Evaluation
The data organization for the official evaluation tools differs from the Darknet format, so an intermediate data conversion is required. Run the command below to get the results in Darknet format.
Then run the specified script data_convert.py for the corresponding data format and perform the official evaluation.

UAVDT: coming soon.
TinyPerson: coming soon.
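The core of such a conversion is a coordinate change: Darknet labels store class cx cy w h normalized to [0, 1], while VisDrone's evaluation format expects absolute bbox_left, bbox_top, width, height values in pixels. A sketch of that step (the real data_convert.py may handle additional fields such as score and category):

```python
def darknet_to_visdrone(cx, cy, w, h, img_w, img_h):
    """Convert a normalized center-format box to absolute left-top-width-height."""
    bw, bh = w * img_w, h * img_h     # scale to pixel units
    left = cx * img_w - bw / 2        # center x minus half-width
    top = cy * img_h - bh / 2         # center y minus half-height
    return left, top, bw, bh
```

For example, a box (0.5, 0.5, 0.25, 0.5) in a 1000x800 image maps to (375.0, 200.0, 250.0, 400.0).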
Inference
The script detect.py runs inference and saves the results to runs/detect. --view-cluster will draw the generated patches in green boxes and save the heat maps from both prediction and ground truth.

Pretrained Weights

Please find the pre-trained weights on Google Drive.
Acknowledgment
A large part of the code is borrowed from YOLO. Many thanks for this wonderful work.
Citation

If you find this work useful in your research, please kindly cite the paper:

@article{liu2025esod,
  title={ESOD: Efficient Small Object Detection on High-Resolution Images},
  author={Liu, Kai and Fu, Zhihang and Jin, Sheng and Chen, Ze and Zhou, Fan and Jiang, Rongxin and Chen, Yaowu and Ye, Jieping},
  journal={IEEE Transactions on Image Processing},
  volume={34},
  pages={183--195},
  year={2025}
}