u2mot
This repo is the official implementation of Uncertainty-aware Unsupervised Multi-Object Tracking
Abstract
Without manually annotated identities, unsupervised multi-object trackers struggle to learn reliable feature embeddings. This makes the similarity-based inter-frame association stage error-prone, giving rise to an uncertainty problem. The uncertainty accumulated frame by frame prevents trackers from learning feature embeddings that remain consistent over time. Recent methods adopt self-supervised techniques to avoid this uncertainty problem, but they fail to capture temporal relations, so the inter-frame uncertainty persists. This paper argues that although the uncertainty problem is inevitable, the uncertainty itself can be leveraged to improve the learned consistency. Specifically, an uncertainty-based metric is developed to verify and rectify risky associations. The resulting accurate pseudo-tracklets boost the learning of feature consistency. Moreover, accurate tracklets make it possible to incorporate temporal information into spatial transformations: we propose a tracklet-guided augmentation strategy that simulates each tracklet's motion and adopts a hierarchical uncertainty-based sampling mechanism for hard-sample mining. The resulting unsupervised MOT framework, U2MOT, proves effective on the MOT-Challenge and VisDrone-MOT benchmarks, achieving state-of-the-art performance among published supervised and unsupervised trackers.
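As a rough illustration (not the paper's exact formulation), an association's uncertainty can be measured by the margin between the best and second-best embedding similarities: a small margin flags a risky tracklet-detection match that should be verified or rectified before being used as a pseudo-label.

```python
import numpy as np

def risky_associations(sim, margin_thr=0.1):
    """Flag low-margin (high-uncertainty) tracklet-detection matches.

    sim: (num_tracklets, num_detections) similarity matrix.
    Returns indices of tracklets whose best match barely beats the
    runner-up -- a hypothetical stand-in for U2MOT's uncertainty metric.
    """
    risky = []
    for t, row in enumerate(sim):
        order = np.argsort(row)[::-1]           # best match first
        margin = row[order[0]] - row[order[1]]  # best minus second-best
        if margin < margin_thr:
            risky.append(t)
    return risky

sim = np.array([
    [0.9, 0.2, 0.1],   # confident match -> low uncertainty
    [0.6, 0.55, 0.1],  # ambiguous match -> high uncertainty
])
print(risky_associations(sim))  # -> [1]
```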
Installation
Step 1. Install u2mot (verified with PyTorch 1.8.1).
Step 2. Install pycocotools.
Step 3. Install the remaining dependencies.
Data preparation
Download MOT17, MOT20, CrowdHuman, Cityperson, ETHZ, VisDrone-MOT, and BDD100K-MOT (optional), and put them under /datasets in the following structure:
Then, you need to convert the datasets to COCO format; the results will be saved in datasets/<dataset>/annotations.
Before mixing different datasets, you need to follow the operations in tools/data/mix_data_xxx.py to create data folders and soft-links.
Finally, you can mix the training data for the MOT-Challenge benchmarks (no extra training data is needed for the VisDrone-MOT and BDD100K-MOT benchmarks):
Model zoo
The pre-trained weights are provided below:
Since we adopt several tracking tricks from BoT-SORT, some of the results are slightly different from the performance reported in the paper.
For better results, you may carefully tune the tracking parameters of each sequence, including the detection score threshold, the matching threshold, etc.
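For instance, the effect of the detection score threshold can be sketched as a simple filter over the detector output (the function and parameter names here are illustrative, not u2mot's actual config):

```python
def filter_detections(dets, score_thr=0.6):
    """Keep only detections whose confidence reaches the threshold.

    dets: list of (x1, y1, x2, y2, score) tuples; score_thr plays the
    role of the per-sequence detection score threshold mentioned above.
    """
    return [d for d in dets if d[4] >= score_thr]

dets = [(0, 0, 10, 10, 0.9), (5, 5, 20, 20, 0.4)]
print(len(filter_detections(dets)))                  # -> 1
print(len(filter_detections(dets, score_thr=0.3)))   # -> 2
```

Lowering the threshold recovers more low-confidence boxes at the cost of more false positives, which is why it is tuned per sequence.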
Training
The COCO-pretrained YOLOX model can be downloaded from their model zoo. After downloading the pretrained models, you can put them under <u2mot_HOME>/pretrained.
For MOT20, VisDrone-MOT, and BDD100K-MOT, you need to clip the bounding boxes so that they lie inside the image. Specifically, set _clip=True in yolox/data/data_augment.py:line177, yolox/data/datasets/mosaicdetection.py:line86/358, and yolox/utils/boxes.py:line149.
To ease the joint optimization of the detection head and the re-ID head, you can first train the detector (following ByteTrack), and then train only the re-ID head:
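For reference, the clipping enabled by _clip=True amounts to clamping box coordinates to the image bounds; a minimal sketch (not the repo's actual code):

```python
import numpy as np

def clip_boxes(boxes, img_w, img_h):
    """Clip (x1, y1, x2, y2) boxes to lie inside an img_w x img_h image,
    mirroring what _clip=True enables in the files listed above."""
    boxes = np.asarray(boxes, dtype=float).copy()
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, img_w)  # x coordinates
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, img_h)  # y coordinates
    return boxes

print(clip_boxes([[-5, 10, 30, 50]], img_w=25, img_h=40))
# -> [[ 0. 10. 25. 40.]]
```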
First, you need to prepare your dataset in COCO format. You can refer to MOT-to-COCO or VisDrone-to-COCO.
Second, you need to create an Exp file for your dataset. You can refer to the MOT17 training Exp file. Don't forget to modify get_data_loader() and get_eval_loader() in your Exp file.
Third, modify the img_path2seq(), check_period(), and get_frame_cnt() functions in yolox/data/datasets/mot.py to parse your image paths and video info.
Finally, you can train u2mot on your dataset by running:
Tracking
Performance on the MOT17 half val set is evaluated with the official TrackEval (the configured code has been provided at <u2mot_HOME>/TrackEval).
First, run u2mot to get the tracking results, which will be saved in pretrained/u2mot/track_results:
To leverage UTL in the inference stage, just add the --use-uncertainty flag:
Then, run TrackEval to evaluate the tracking performance:
To test on MOT17, run u2mot, and the results will be saved in YOLOX_outputs/yolox_x_mix_u2mot17/track_res:
Submit the txt files under track_res_dti to the MOTChallenge website for evaluation.
For MOT20, we use the input size 1600 x 896 for MOT20-04 and MOT20-07, and 1920 x 736 for MOT20-06 and MOT20-08.
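The per-sequence input sizes above could be encoded as a small lookup table; the sequence-to-size mapping is as stated, while the fallback default is an assumption:

```python
# Hypothetical per-sequence test-size lookup for MOT20, as (width, height).
MOT20_INPUT_SIZES = {
    "MOT20-04": (1600, 896),
    "MOT20-07": (1600, 896),
    "MOT20-06": (1920, 736),
    "MOT20-08": (1920, 736),
}

def input_size(seq, default=(1600, 896)):
    """Return the test input size for a sequence (default is an assumption)."""
    return MOT20_INPUT_SIZES.get(seq, default)

print(input_size("MOT20-06"))  # -> (1920, 736)
```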
Run u2mot:
Submit the txt files under track_res_dti to the MOTChallenge website for evaluation.
For the VisDrone-MOT benchmark, we use the input size 1600 x 896. Following TrackFormer, the performance is evaluated with the default motmetrics.
Run u2mot, and the results will be saved at YOLOX_outputs/yolox_x_u2mot_visdrone/track_res:
Evaluate the results:
You will get the performance in terms of MOTA, IDF1, and ID switches.
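As a reminder of what these numbers mean, MOTA aggregates false negatives, false positives, and identity switches over all frames; this is the textbook CLEAR-MOT formula, not code from this repo:

```python
def mota(num_fn, num_fp, num_idsw, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, per the CLEAR-MOT definition.

    num_gt is the total number of ground-truth boxes over all frames;
    note MOTA can be negative when errors outnumber ground-truth boxes.
    """
    return 1.0 - (num_fn + num_fp + num_idsw) / num_gt

print(round(mota(num_fn=100, num_fp=50, num_idsw=10, num_gt=1000), 3))  # -> 0.84
```

IDF1, by contrast, measures how consistently identities are preserved and is computed from identity-level true/false positives rather than per-frame counts.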
Run u2mot, and the results will be saved at YOLOX_outputs/yolox_x_u2mot_bdd100k/track_res:
Then, convert the result files into json format, zip those json files, and submit them to the official EvalAI website to get the tracking performance:
python tools/utils/convert_bdd.py
cd YOLOX_outputs/yolox_x_u2mot_bdd100k
zip -r -q bdd100k_pred.zip ./track_res_json
Be careful to evaluate on the val and test splits separately.
Citation
@inproceedings{liu2023u2mot,
title={Uncertainty-aware Unsupervised Multi-Object Tracking},
author={Liu, Kai and Jin, Sheng and Fu, Zhihang and Chen, Ze and Jiang, Rongxin and Ye, Jieping},
booktitle={International Conference on Computer Vision},
year={2023}
}
Acknowledgement
A large part of the code is borrowed from YOLOX, FairMOT, ByteTrack, ByteTrack_ReID, and BoT-SORT. Many thanks for their wonderful works.