p5k6xuzwc/iccv2021

Backbone
Transformer
Performance Boosters
GAN
NAS
NeRF
Loss
Zero-Shot Learning
Few-Shot Learning
Long-Tailed
Vision and Language
Unsupervised/Self-Supervised
Multi-Label Image Recognition
2D Object Detection
Semantic Segmentation
Instance Segmentation
Medical Image Segmentation
Video Object Segmentation
Few-shot Segmentation
Human Motion Segmentation
Object Tracking
3D Point Cloud
3D Object Detection
3D Semantic Segmentation
3D Instance Segmentation
3D Multi-Object Tracking
Point Cloud Denoising
Point Cloud Registration
Point Cloud Completion
Radar Semantic Segmentation
Image Restoration
Super-Resolution
Denoising
Medical Image Denoising
Deblurring
Shadow Removal
Video Frame Interpolation
Video Inpainting
Person Re-identification
Person Search
2D/3D Human Pose Estimation
6D Object Pose Estimation
3D Head Reconstruction
Face Recognition
Facial Expression Recognition
Action Recognition
Temporal Action Localization
Action Detection
Group Activity Recognition
Sign Language Recognition
Text Detection
Text Recognition
Text Replacement
Visual Question Answering (VQA)
Adversarial Attack
Depth Estimation
Gaze Estimation
Crowd Counting
Lane Detection
Trajectory Prediction
Anomaly Detection
Scene Graph Generation
Image Editing
Image Synthesis
Image Retrieval
3D Reconstruction
Video Stabilization
Fine-Grained Recognition
Style Transfer
Neural Painting
Feature Matching
Semantic Correspondence
Edge Detection
Camera Calibration
Image Quality Assessment
Metric Learning
Unsupervised Domain Adaptation
Video Rescaling
Hand-Object Interaction
Vision-and-Language Navigation
Datasets
Others
Backbone
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Paper(Oral): https://arxiv.org/abs/2102.12122
Code: https://github.com/whai362/PVT
AutoFormer: Searching Transformers for Visual Recognition
Paper: https://arxiv.org/abs/2107.00651
Code: https://github.com/microsoft/AutoML
Bias Loss for Mobile Neural Networks
Paper: https://arxiv.org/abs/2107.11170
Code: None
Vision Transformer with Progressive Sampling
Paper: https://arxiv.org/abs/2108.01684
Code: https://github.com/yuexy/PS-ViT
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Paper: https://arxiv.org/abs/2101.11986
Code: https://github.com/yitu-opensource/T2T-ViT
Rethinking Spatial Dimensions of Vision Transformers
Paper: https://arxiv.org/abs/2103.16302
Code: https://github.com/naver-ai/pit
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Paper: https://arxiv.org/abs/2103.14030
Code: https://github.com/microsoft/Swin-Transformer
Conformer: Local Features Coupling Global Representations for Visual Recognition
Paper: https://arxiv.org/abs/2105.03889
Code: https://github.com/pengzhiliang/Conformer
MicroNet: Improving Image Recognition with Extremely Low FLOPs
Paper: https://arxiv.org/abs/2108.05894
Code: https://github.com/liyunsheng13/micronet
Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition
Paper: https://arxiv.org/abs/2102.01063
Code: https://github.com/idstcv/ZenNAS
Visual Transformer
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Paper: https://arxiv.org/abs/2103.14030
Code: https://github.com/microsoft/Swin-Transformer
An Empirical Study of Training Self-Supervised Vision Transformers
Paper(Oral): https://arxiv.org/abs/2104.02057
MoCo v3 Code: None
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Paper(Oral): https://arxiv.org/abs/2102.12122
Code: https://github.com/whai362/PVT
Group-Free 3D Object Detection via Transformers
Paper: https://arxiv.org/abs/2104.00678
Code: None
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Paper: https://arxiv.org/abs/2107.12309
Code: None
Rethinking and Improving Relative Position Encoding for Vision Transformer
Paper: https://arxiv.org/abs/2107.14222
Code: https://github.com/microsoft/AutoML/tree/main/iRPE
Emerging Properties in Self-Supervised Vision Transformers
Paper: https://arxiv.org/abs/2104.14294
Code: https://github.com/facebookresearch/dino
Learning Spatio-Temporal Transformer for Visual Tracking
Paper: https://arxiv.org/abs/2103.17154
Code: https://github.com/researchmm/Stark
Fast Convergence of DETR with Spatially Modulated Co-Attention
Paper: https://arxiv.org/abs/2101.07448
Code: https://github.com/abc403/SMCA-replication
Vision Transformer with Progressive Sampling
Paper: https://arxiv.org/abs/2108.01684
Code: https://github.com/yuexy/PS-ViT
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Paper: https://arxiv.org/abs/2101.11986
Code: https://github.com/yitu-opensource/T2T-ViT
Rethinking Spatial Dimensions of Vision Transformers
Paper: https://arxiv.org/abs/2103.16302
Code: https://github.com/naver-ai/pit
The Right to Talk: An Audio-Visual Transformer Approach
Paper: https://arxiv.org/abs/2108.03256
Code: None
Joint Inductive and Transductive Learning for Video Object Segmentation
Paper: https://arxiv.org/abs/2108.03679
Code: https://github.com/maoyunyao/JOINT
Conformer: Local Features Coupling Global Representations for Visual Recognition
Paper: https://arxiv.org/abs/2105.03889
Code: https://github.com/pengzhiliang/Conformer
Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer
Paper: https://arxiv.org/abs/2108.03032
Code: https://github.com/zhiheLu/CWT-for-FSS
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
Paper: https://arxiv.org/abs/2108.03798
Code: https://github.com/wzmsltw/PaintTransformer
Conditional DETR for Fast Training Convergence
Paper: https://arxiv.org/abs/2108.06152
Code: https://github.com/Atten4Vis/ConditionalDETR
MUSIQ: Multi-scale Image Quality Transformer
Paper: https://arxiv.org/abs/2108.05997
Code: https://github.com/google-research/google-research/tree/master/musiq
SOTR: Segmenting Objects with Transformers
Paper: https://arxiv.org/abs/2108.06747
Code: https://github.com/easton-cau/SOTR
PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers
Paper(Oral): https://arxiv.org/abs/2108.08839
Code: https://github.com/yuxumin/PoinTr
SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
Paper: https://arxiv.org/abs/2108.04444
Code: https://github.com/AllenXiangX/SnowflakeNet
Improving 3D Object Detection with Channel-wise Transformer
Paper: https://arxiv.org/abs/2108.10723
Code: https://github.com/hlsheng1/CT3D
TransFER: Learning Relation-aware Facial Expression Representations with Transformers
Paper: https://arxiv.org/abs/2108.11116
Code: None
GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer
Paper: https://arxiv.org/abs/2108.12630
Code: https://github.com/xueyee/GroupFormer
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction
Paper: https://arxiv.org/abs/2109.00512
Code: https://github.com/facebookresearch/co3d
Dataset: https://github.com/facebookresearch/co3d
Voxel Transformer for 3D Object Detection
Paper: https://arxiv.org/abs/2109.02497
Code: None
3D Human Texture Estimation from a Single Image with Transformers
Homepage: https://www.mmlab-ntu.com/project/texformer/
Paper(Oral): https://arxiv.org/abs/2109.02563
Code: None
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
Paper: https://arxiv.org/abs/2109.02974
Code: https://github.com/ruiliu-ai/FuseFormer
CTRL-C: Camera calibration TRansformer with Line-Classification
Paper: https://arxiv.org/abs/2109.02259
Code: https://github.com/jwlee-vcl/CTRL-C
An End-to-End Transformer Model for 3D Object Detection
Homepage: https://facebookresearch.github.io/3detr/
Paper: https://arxiv.org/abs/2109.08141
Code: https://github.com/facebookresearch/3detr
Eformer: Edge Enhancement based Transformer for Medical Image Denoising
Paper: https://arxiv.org/abs/2109.08044
Code: None
PnP-DETR: Towards Efficient Visual Analysis with Transformers
Paper: https://arxiv.org/abs/2109.07036
Code: https://github.com/twangnh/pnp-detr
Transformer-based Dual Relation Graph for Multi-label Image Recognition
Paper: https://arxiv.org/abs/2110.04722
Code: None
Performance Boosters
FaPN: Feature-aligned Pyramid Network for Dense Image Prediction
Paper: https://arxiv.org/abs/2108.07058
Code: https://github.com/EMI-Group/FaPN
Unifying Nonlocal Blocks for Neural Networks
Paper: https://arxiv.org/abs/2108.02451
Code: https://github.com/zh460045050/SNL_ICCV2021
Towards Learning Spatially Discriminative Feature Representations
Paper: https://arxiv.org/abs/2109.01359
Code: None
GAN
Labels4Free: Unsupervised Segmentation using StyleGAN
Homepage: https://rameenabdal.github.io/Labels4Free/
Paper: https://arxiv.org/abs/2103.14968
GNeRF: GAN-based Neural Radiance Field without Posed Camera
Paper(Oral): https://arxiv.org/abs/2103.15606
Code: https://github.com/MQ66/gnerf
EigenGAN: Layer-Wise Eigen-Learning for GANs
Paper: https://arxiv.org/abs/2104.12476
Code: https://github.com/LynnHo/EigenGAN-Tensorflow
From Continuity to Editability: Inverting GANs with Consecutive Images
Paper: https://arxiv.org/abs/2107.13812
Code: https://github.com/Qingyang-Xu/InvertingGANs_with_ConsecutiveImgs
Sketch Your Own GAN
Homepage: https://peterwang512.github.io/GANSketching/
Paper: https://arxiv.org/abs/2108.02774
Code: https://github.com/peterwang512/GANSketching
Manifold Matching via Deep Metric Learning for Generative Modeling
Paper: https://arxiv.org/abs/2106.10777
Code: https://github.com/dzld00/pytorch-manifold-matching
Dual Projection Generative Adversarial Networks for Conditional Image Generation
Paper: https://arxiv.org/abs/2108.09016
Code: None
GAN Inversion for Out-of-Range Images with Geometric Transformations
Paper: https://arxiv.org/abs/2108.08998
Code: None
ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement
Homepage: https://yuval-alaluf.github.io/restyle-encoder/
Paper: https://arxiv.org/abs/2104.02699
Code: https://github.com/yuval-alaluf/restyle-encoder
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Paper(Oral): https://arxiv.org/abs/2103.17249
Code: https://github.com/orpatashnik/StyleCLIP
Image Synthesis via Semantic Composition
Homepage: https://shepnerd.github.io/scg/
Paper: https://arxiv.org/abs/2109.07053
Code: https://github.com/dvlab-research/SCGAN
NAS
AutoFormer: Searching Transformers for Visual Recognition
Paper: https://arxiv.org/abs/2107.00651
Code: https://github.com/microsoft/AutoML
BN-NAS: Neural Architecture Search with Batch Normalization
Paper: https://arxiv.org/abs/2108.07375
Code: https://github.com/bychen515/BNNAS
Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition
Paper: https://arxiv.org/abs/2102.01063
Code: https://github.com/idstcv/ZenNAS
NeRF
GNeRF: GAN-based Neural Radiance Field without Posed Camera
Paper(Oral): https://arxiv.org/abs/2103.15606
Code: https://github.com/MQ66/gnerf
KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
Paper: https://arxiv.org/abs/2103.13744
Code: https://github.com/creiser/kilonerf
In-Place Scene Labelling and Understanding with Implicit Scene Representation
Homepage: https://shuaifengzhi.com/Semantic-NeRF/
Paper(Oral): https://arxiv.org/abs/2103.15875
Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
Homepage: https://ajayj.com/dietnerf
Paper(DietNeRF): https://arxiv.org/abs/2104.00677
BARF: Bundle-Adjusting Neural Radiance Fields
Homepage: https://chenhsuanlin.bitbucket.io/bundle-adjusting-NeRF/
Paper(Oral): https://arxiv.org/abs/2104.06405
Code: https://github.com/chenhsuanlin/bundle-adjusting-NeRF
Self-Calibrating Neural Radiance Fields
Paper: https://arxiv.org/abs/2108.13826
Code: https://github.com/POSTECH-CVLab/SCNeRF
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction
Paper: https://arxiv.org/abs/2109.00512
Code: https://github.com/facebookresearch/co3d
Dataset: https://github.com/facebookresearch/co3d
Neural Articulated Radiance Field
Paper: https://arxiv.org/abs/2104.03110
Code: https://github.com/nogu-atsu/NARF
NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo
Paper(Oral): https://arxiv.org/abs/2109.01129
Code: https://github.com/weiyithu/NerfingMVS
SNARF: Differentiable Forward Skinning for Animating Non-rigid Neural Implicit Shapes
Homepage: https://xuchen-ethz.github.io/snarf
Paper: https://arxiv.org/abs/2104.03953
Code: https://github.com/xuchen-ethz/snarf
CodeNeRF: Disentangled Neural Radiance Fields for Object Categories
Paper: https://arxiv.org/abs/2109.01750
Code: https://github.com/wayne1123/code-nerf
Loss
Rank & Sort Loss for Object Detection and Instance Segmentation
Paper(Oral): https://arxiv.org/abs/2107.11669
Code: https://github.com/kemaloksuz/RankSortLoss
Bias Loss for Mobile Neural Networks
Paper: https://arxiv.org/abs/2107.11170
Code: None
A Robust Loss for Point Cloud Registration
Paper: https://arxiv.org/abs/2108.11682
Code: None
Reconcile Prediction Consistency for Balanced Object Detection
Paper: https://arxiv.org/abs/2108.10809
Code: None
Influence-Balanced Loss for Imbalanced Visual Classification
Paper: https://arxiv.org/abs/2110.02444
Code: https://github.com/pseulki/IB-Loss
Zero-Shot Learning
FREE: Feature Refinement for Generalized Zero-Shot Learning
Paper: https://arxiv.org/abs/2107.13807
Code: https://github.com/shiming-chen/FREE
Discriminative Region-based Multi-Label Zero-Shot Learning
Paper: https://arxiv.org/abs/2108.09301
Code: https://arxiv.org/abs/2108.09301
Few-Shot Learning
Relational Embedding for Few-Shot Classification
Paper: https://arxiv.org/abs/2108.0966
Code: https://github.com/dahyun-kang/renet
Few-Shot and Continual Learning with Attentive Independent Mechanisms
Paper: https://arxiv.org/abs/2107.14053
Code: https://github.com/huang50213/AIM-Fewshot-Continual
Long-Tailed
Parametric Contrastive Learning
Paper: https://arxiv.org/abs/2107.12028
Code: https://github.com/jiequancui/Parametric-Contrastive-Learning
Influence-Balanced Loss for Imbalanced Visual Classification
Paper: https://arxiv.org/abs/2110.02444
Code: https://github.com/pseulki/IB-Loss
Vision and Language
VLGrammar: Grounded Grammar Induction of Vision and Language
Paper: https://arxiv.org/abs/2103.12975
Code: https://github.com/evelinehong/VLGrammar
Unsupervised/Self-Supervised
An Empirical Study of Training Self-Supervised Vision Transformers
Paper(Oral): https://arxiv.org/abs/2104.02057
MoCo v3 Code: None
DetCo: Unsupervised Contrastive Learning for Object Detection
Paper: https://arxiv.org/abs/2102.04803
Code: https://github.com/xieenze/DetCo
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
Paper: https://arxiv.org/abs/2108.02183
Code: None
Improving Contrastive Learning by Visualizing Feature Transformation
Paper(Oral): https://arxiv.org/abs/2108.02982
Code: https://github.com/DTennant/CL-Visualizing-Feature-Transformation
Self-Supervised Visual Representations Learning by Contrastive Mask Prediction
Paper: https://arxiv.org/abs/2108.08012
Code: None
Temporal Knowledge Consistency for Unsupervised Visual Representation Learning
Paper: https://arxiv.org/abs/2108.10668
Code: None
MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving
Paper: https://arxiv.org/abs/2108.12178
Code: https://github.com/KaiChen1998/MultiSiam
Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds
Homepage: https://siyuanhuang.com/STRL/
Paper: https://arxiv.org/abs/2109.00179
Code: https://github.com/yichen928/STRL
Self-supervised Product Quantization for Deep Unsupervised Image Retrieval
Paper: https://arxiv.org/abs/2109.02244
Code: https://github.com/youngkyunJang/SPQ
Multi-Label Image Recognition
Residual Attention: A Simple but Effective Method for Multi-Label Recognition
Paper: https://arxiv.org/abs/2108.02456
Code: https://github.com/Kevinz-code/CSRA
2D Object Detection
DetCo: Unsupervised Contrastive Learning for Object Detection
Paper: https://arxiv.org/abs/2102.04803
Code: https://github.com/xieenze/DetCo
Detecting Invisible People
Homepage: http://www.cs.cmu.edu/~tkhurana/invisible.htm
Paper: https://arxiv.org/abs/2012.08419
Active Learning for Deep Object Detection via Probabilistic Modeling
Paper: https://arxiv.org/abs/2103.16130
Code: None
Conditional Variational Capsule Network for Open Set Recognition
Paper: https://arxiv.org/abs/2104.09159
Code: https://github.com/guglielmocamporese/cvaecaposr
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
Homepage: https://ashkamath.github.io/mdetr_page/
Paper(Oral): https://arxiv.org/abs/2104.12763
Code: https://github.com/ashkamath/mdetr
Rank & Sort Loss for Object Detection and Instance Segmentation
Paper(Oral): https://arxiv.org/abs/2107.11669
Code: https://github.com/kemaloksuz/RankSortLoss
SimROD: A Simple Adaptation Method for Robust Object Detection
Paper(Oral): https://arxiv.org/abs/2107.13389
Code: None
GraphFPN: Graph Feature Pyramid Network for Object Detection
Paper: https://arxiv.org/abs/2108.00580
Code: None
Fast Convergence of DETR with Spatially Modulated Co-Attention
Paper: https://arxiv.org/abs/2101.07448
Code: https://github.com/abc403/SMCA-replication
Conditional DETR for Fast Training Convergence
Paper: https://arxiv.org/abs/2108.06152
Code: https://github.com/Atten4Vis/ConditionalDETR
TOOD: Task-aligned One-stage Object Detection
Paper(Oral): https://arxiv.org/abs/2108.07755
Code: https://github.com/fcjian/TOOD
Reconcile Prediction Consistency for Balanced Object Detection
Paper: https://arxiv.org/abs/2108.10809
Code: None
Mutual Supervision for Dense Object Detection
Paper: https://arxiv.org/abs/2109.05986
Code: https://github.com/MCG-NJU/MuSu-Detection
PnP-DETR: Towards Efficient Visual Analysis with Transformers
Paper: https://arxiv.org/abs/2109.07036
Code: https://github.com/twangnh/pnp-detr
Deep Structured Instance Graph for Distilling Object Detectors
Paper: https://arxiv.org/abs/2109.12862
Code: https://github.com/dvlab-research/Dsig
Semi-Supervised Object Detection
End-to-End Semi-Supervised Object Detection with Soft Teacher
Paper: https://arxiv.org/abs/2106.09018
Code: None
Rotated Object Detection
Oriented R-CNN for Object Detection
Paper: https://arxiv.org/abs/2108.05699
Code: https://github.com/jbwang1997/OBBDetection
Few-Shot Object Detection
DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection
Paper: https://arxiv.org/abs/2108.09017
Code: https://github.com/er-muyue/DeFRCN
Semantic Segmentation
Personalized Image Semantic Segmentation
Paper: https://arxiv.org/abs/2107.13978
Code: https://github.com/zhangyuygss/PIS
Dataset: https://github.com/zhangyuygss/PIS
Enhanced Boundary Learning for Glass-like Object Segmentation
Paper: https://arxiv.org/abs/2103.15734
Code: https://github.com/hehao13/EBLNet
Self-Regulation for Semantic Segmentation
Paper: https://arxiv.org/abs/2108.09702
Code: https://github.com/dongzhang89/SR-SS
Mining Contextual Information Beyond Image for Semantic Segmentation
Paper: https://arxiv.org/abs/2108.11819
Code: https://github.com/CharlesPikachu/mcibi
Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation
Paper(Oral): https://arxiv.org/abs/2107.11264
Code: https://github.com/shjung13/Standardized-max-logits
ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation
Paper: https://arxiv.org/abs/2108.12382
Code: https://github.com/SegmentationBLWX/sssegmentation
Scaling up instance annotation via label propagation
Homepage: http://scaling-anno.csail.mit.edu/
Paper: https://arxiv.org/abs/2110.02277
Code: None
Unsupervised Domain Adaptation Semantic Segmentation
Multi-Anchor Active Domain Adaptation for Semantic Segmentation
Paper(Oral): https://arxiv.org/abs/2108.08012
Code: https://github.com/munanning/MADA
Few-Shot Semantic Segmentation
Learning Meta-class Memory for Few-Shot Semantic Segmentation
Paper: https://arxiv.org/abs/2108.02958
Code: None
Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer
Paper: https://arxiv.org/abs/2108.03032
Code: https://github.com/zhiheLu/CWT-for-FSS
Semi-Supervised Semantic Segmentation
Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation
Paper(Oral): https://arxiv.org/abs/2107.11279
Code: https://github.com/CVMI-Lab/DARS
Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation
Paper: https://arxiv.org/abs/2108.09025
Code: None
Weakly Supervised Semantic Segmentation
Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
Paper: https://arxiv.org/abs/2107.11787
Code: None
Complementary Patch for Weakly Supervised Semantic Segmentation
Paper: https://arxiv.org/abs/2108.03852
Code: None
Unsupervised Segmentation
Labels4Free: Unsupervised Segmentation using StyleGAN
Homepage: https://rameenabdal.github.io/Labels4Free/
Paper: https://arxiv.org/abs/2103.14968
Instance Segmentation
Instances as Queries
Paper: https://arxiv.org/abs/2105.01928
Code: https://github.com/hustvl/QueryInst
Crossover Learning for Fast Online Video Instance Segmentation
Paper: https://arxiv.org/abs/2104.05970
Code: https://github.com/hustvl/CrossVIS
Rank & Sort Loss for Object Detection and Instance Segmentation
Paper(Oral): https://arxiv.org/abs/2107.11669
Code: https://github.com/kemaloksuz/RankSortLoss
SOTR: Segmenting Objects with Transformers
Paper: https://arxiv.org/abs/2108.06747
Code: https://github.com/easton-cau/SOTR
Scaling up instance annotation via label propagation
Homepage: http://scaling-anno.csail.mit.edu/
Paper: https://arxiv.org/abs/2110.02277
Code: None
Medical Image Segmentation
Recurrent Mask Refinement for Few-Shot Medical Image Segmentation
Paper: https://arxiv.org/abs/2108.00622
Code: https://github.com/uci-cbcl/RP-Net
Video Object Segmentation
Hierarchical Memory Matching Network for Video Object Segmentation
Paper: https://arxiv.org/abs/2109.11404
Code: https://github.com/Hongje/HMMN
Full-Duplex Strategy for Video Object Segmentation
Homepage: http://dpfan.net/FSNet/
Paper: https://arxiv.org/abs/2108.03151
Code: https://github.com/GewelsJI/FSNet
Joint Inductive and Transductive Learning for Video Object Segmentation
Paper: https://arxiv.org/abs/2108.03679
Code: https://github.com/maoyunyao/JOINT
Few-shot Segmentation
Mining Latent Classes for Few-shot Segmentation
Paper(Oral): https://arxiv.org/abs/2103.15402
Code: https://github.com/LiheYoung/MiningFSS
Human Motion Segmentation
Graph Constrained Data Representation Learning for Human Motion Segmentation
Paper: https://arxiv.org/abs/2107.13362
Code: None
Object Tracking
Learning to Track Objects from Unlabeled Videos
Paper: https://arxiv.org/abs/2108.12711
Code: https://github.com/VISION-SJTU/USOT
Learning Spatio-Temporal Transformer for Visual Tracking
Paper: https://arxiv.org/abs/2103.17154
Code: https://github.com/researchmm/Stark
Learning to Adversarially Blur Visual Object Tracking
Paper: https://arxiv.org/abs/2107.12085
Code: https://github.com/tsingqguo/ABA
HiFT: Hierarchical Feature Transformer for Aerial Tracking
Paper: https://arxiv.org/abs/2108.00202
Code: https://github.com/vision4robotics/HiFT
Learn to Match: Automatic Matching Network Design for Visual Tracking
Paper: https://arxiv.org/abs/2108.00803
Code: https://github.com/JudasDie/SOTS
Saliency-Associated Object Tracking
Paper: https://arxiv.org/abs/2108.03637
Code: https://github.com/ZikunZhou/SAOT.git
RGBD Object Tracking
DepthTrack: Unveiling the Power of RGBD Tracking
Paper: https://arxiv.org/abs/2108.13962
Code: https://github.com/xiaozai/DeT
Dataset: https://github.com/xiaozai/DeT
3D Point Cloud
Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds
Homepage: https://siyuanhuang.com/STRL/
Paper: https://arxiv.org/abs/2109.00179
Code: https://github.com/yichen928/STRL
Unsupervised Point Cloud Pre-Training via View-Point Occlusion, Completion
Homepage: https://hansen7.github.io/OcCo/
Paper: https://arxiv.org/abs/2010.01089
Code: https://github.com/hansen7/OcCo
DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation
Paper: https://arxiv.org/abs/2108.04023
Code: None
Adaptive Graph Convolution for Point Cloud Analysis
Paper: https://arxiv.org/abs/2108.08035
Code: https://github.com/hrzhou2/AdaptConv-master
3D Object Detection
Group-Free 3D Object Detection via Transformers
Paper: https://arxiv.org/abs/2104.00678
Code: None
Improving 3D Object Detection with Channel-wise Transformer
Paper: https://arxiv.org/abs/2108.10723
Code: https://github.com/hlsheng1/CT3D
AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection
Paper: https://arxiv.org/abs/2108.11127
Code: https://github.com/zongdai/AutoShape
4D-Net for Learned Multi-Modal Alignment
Paper: https://arxiv.org/abs/2109.01066
Code: None
Voxel Transformer for 3D Object Detection
Paper: https://arxiv.org/abs/2109.02497
Code: None
Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection
Paper: https://arxiv.org/abs/2109.02499
Code: None
An End-to-End Transformer Model for 3D Object Detection
Homepage: https://facebookresearch.github.io/3detr/
Paper: https://arxiv.org/abs/2109.08141
Code: https://github.com/facebookresearch/3detr
RangeDet: In Defense of Range View for LiDAR-based 3D Object Detection
Paper: https://arxiv.org/abs/2103.10039
Code: https://github.com/TuSimple/RangeDet
Geometry-based Distance Decomposition for Monocular 3D Object Detection
Paper: https://arxiv.org/abs/2104.03775
Code: https://github.com/Rock-100/MonoDet
3D Semantic Segmentation
ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation
Paper: https://arxiv.org/abs/2107.11769
Code: None
Learning with Noisy Labels for Robust Point Cloud Segmentation
Homepage: https://shuquanye.com/PNAL_website/
Paper(Oral): https://arxiv.org/abs/2107.14230
VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation
Paper(Oral): https://arxiv.org/abs/2107.13824
Code: https://github.com/hzykent/VMNet
Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation
Paper: https://arxiv.org/abs/2107.14724
Code: https://github.com/leolyj/DsCML
DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation
Paper: https://arxiv.org/abs/2108.04023
Code: None
Adaptive Graph Convolution for Point Cloud Analysis
Paper: https://arxiv.org/abs/2108.08035
Code: https://github.com/hrzhou2/AdaptConv-master
Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation
Paper: https://arxiv.org/abs/2106.15277
Code: https://github.com/ICEORY/PMF
3D Instance Segmentation
Hierarchical Aggregation for 3D Instance Segmentation
Paper: https://arxiv.org/abs/2108.02350
Code: https://github.com/hustvl/HAIS
3D Multi-Object Tracking
Exploring Simple 3D Multi-Object Tracking for Autonomous Driving
Paper: https://arxiv.org/abs/2108.10312
Code: https://github.com/qcraftai/simtrack
Point Cloud Denoising
Score-Based Point Cloud Denoising
Paper: https://arxiv.org/abs/2107.10981
Code: None
Point Cloud Registration
HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration
Homepage: https://ispc-group.github.io/hregnet
Paper: https://arxiv.org/abs/2107.11992
Code: https://github.com/ispc-lab/HRegNet
A Robust Loss for Point Cloud Registration
Paper: https://arxiv.org/abs/2108.11682
Code: None
Point Cloud Completion
PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers
Paper(Oral): https://arxiv.org/abs/2108.08839
Code: https://github.com/yuxumin/PoinTr
SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
Paper: https://arxiv.org/abs/2108.04444
Code: https://github.com/AllenXiangX/SnowflakeNet
Radar Semantic Segmentation
Multi-View Radar Semantic Segmentation
Paper: https://arxiv.org/abs/2103.16214
Code: https://github.com/valeoai/MVRSS
Image Restoration
Dynamic Attentive Graph Learning for Image Restoration
Paper: https://arxiv.org/abs/2109.06620
Code: https://github.com/jianzhangcs/DAGL
Super-Resolution
Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks
Paper: https://arxiv.org/abs/2004.03791
Code: https://github.com/LongguangWang/ArbSR
Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution
Paper: https://arxiv.org/abs/2108.05302
Code: https://github.com/JingyunLiang/MANet
Deep Reparametrization of Multi-Frame Super-Resolution and Denoising
Paper(Oral): https://arxiv.org/abs/2108.08286
Code: None
Dual-Camera Super-Resolution with Aligned Attention Modules
Homepage: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
Paper: https://arxiv.org/abs/2109.01349
Code: https://github.com/Tengfei-Wang/DualCameraSR
Dataset: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme
Paper: https://www4.comp.polyu.edu.hk/~cslzhang/paper/ICCV21_RealVSR.pdf
Code: https://github.com/IanYeung/RealVSR
Dataset: https://github.com/IanYeung/RealVSR
Denoising
Deep Reparametrization of Multi-Frame Super-Resolution and Denoising
Paper(Oral): https://arxiv.org/abs/2108.08286
Code: None
Rethinking Deep Image Prior for Denoising
Paper: https://arxiv.org/abs/2108.12841
Code: https://github.com/gistvision/DIP-denosing
Medical Image Denoising
Eformer: Edge Enhancement based Transformer for Medical Image Denoising
Paper: https://arxiv.org/abs/2109.08044
Code: None
Deblurring
Rethinking Coarse-to-Fine Approach in Single Image Deblurring
Paper: https://arxiv.org/abs/2108.05054
Code: https://github.com/chosj95/MIMO-UNet
Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions
Paper: https://arxiv.org/abs/2108.09108
Code: None
Shadow Removal
CANet: A Context-Aware Network for Shadow Removal
Paper: https://arxiv.org/abs/2108.09894
Code: https://github.com/Zipei-Chen/CANet
Video Frame Interpolation
XVFI: eXtreme Video Frame Interpolation
Paper(Oral): https://arxiv.org/abs/2103.16206
Code: https://github.com/JihyongOh/XVFI
Dataset: https://github.com/JihyongOh/XVFI
Asymmetric Bilateral Motion Estimation for Video Frame Interpolation
Paper: https://arxiv.org/abs/2108.06815
Code: https://github.com/JunHeum/ABME
Video Inpainting
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
Paper: https://arxiv.org/abs/2109.02974
Code: https://github.com/ruiliu-ai/FuseFormer
Person Re-identification
TransReID: Transformer-based Object Re-Identification
Paper: https://arxiv.org/abs/2102.04378
Code: https://github.com/heshuting555/TransReID
IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID
Paper(Oral): https://arxiv.org/abs/2108.02413
Code: https://github.com/SikaStar/IDM
Person Search
Weakly Supervised Person Search with Region Siamese Networks
Paper: https://arxiv.org/abs/2109.06109
Code: None
2D/3D Human Pose Estimation
2D Human Pose Estimation
Human Pose Regression with Residual Log-likelihood Estimation
Paper(Oral): https://arxiv.org/abs/2107.11291
Code(RLE): https://github.com/Jeff-sjtu/res-loglikelihood-regression
Online Knowledge Distillation for Efficient Pose Estimation
Paper: https://arxiv.org/abs/2108.02092
Code: None
3D Human Pose Estimation
Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows
Paper: https://arxiv.org/abs/2107.13788
Code: https://github.com/twehrbein/Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows
Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images
Paper: https://arxiv.org/abs/2109.05885
Code: None
6D Object Pose Estimation
StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation
Paper: https://arxiv.org/abs/2109.10115
Code: None
Dataset: None
3D Head Reconstruction
H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction
Homepage: https://crisalixsa.github.io/h3d-net/
Paper: https://arxiv.org/abs/2107.12512
Face Recognition
SynFace: Face Recognition with Synthetic Data
Paper: https://arxiv.org/abs/2108.07960
Code: None
Facial Expression Recognition
TransFER: Learning Relation-aware Facial Expression Representations with Transformers
Paper: https://arxiv.org/abs/2108.11116
Code: None
Action Recognition
MGSampler: An Explainable Sampling Strategy for Video Action Recognition
Paper: https://arxiv.org/abs/2104.09952
Code: None
Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition
Paper: https://arxiv.org/abs/2107.12213
Code: https://github.com/Uason-Chen/CTR-GCN
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
Paper: https://arxiv.org/abs/2108.02183
Code: None
Dynamic Network Quantization for Efficient Video Inference
Homepage: https://cs-people.bu.edu/sunxm/VideoIQ/project.html
Paper: https://arxiv.org/abs/2108.10394
Code: https://github.com/sunxm2357/VideoIQ
Temporal Action Localization
Enriching Local and Global Contexts for Temporal Action Localization
Paper: https://arxiv.org/abs/2107.12960
Code: None
Action Detection
Class Semantics-based Attention for Action Detection
Paper: https://arxiv.org/abs/2109.02613
Code: None
Group Activity Recognition
GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer
Paper: https://arxiv.org/abs/2108.12630
Code: https://github.com/xueyee/GroupFormer
Sign Language Recognition
Visual Alignment Constraint for Continuous Sign Language Recognition
Paper: https://arxiv.org/abs/2104.02330
Code: https://github.com/ycmin95/VAC_CSLR
Text Detection
Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection
Paper: https://arxiv.org/abs/2107.12664
Code: https://github.com/GXYM/TextBPN
Text Recognition
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
Paper: https://arxiv.org/abs/2107.12090
Code: None
Text Replacement
STRIVE: Scene Text Replacement In Videos
Homepage: https://striveiccv2021.github.io/STRIVE-ICCV2021/
Paper: https://arxiv.org/abs/2109.02762
Code: https://github.com/striveiccv2021/STRIVE-ICCV2021/
Dataset: https://github.com/striveiccv2021/STRIVE-ICCV2021/
Visual Question Answering (VQA)
Greedy Gradient Ensemble for Robust Visual Question Answering
Paper: https://arxiv.org/abs/2107.12651
Code: https://github.com/GeraldHan/GGE
Adversarial Attack
Feature Importance-aware Transferable Adversarial Attacks
Paper: https://arxiv.org/abs/2107.14185
Code: https://github.com/hcguoO0/FIA
AdvDrop: Adversarial Attack to DNNs by Dropping Information
Paper: https://arxiv.org/abs/2108.09034
Code: https://github.com/RjDuan/AdvDrop
Depth Estimation
Augmenting Depth Estimation with Geospatial Context
Paper: https://arxiv.org/abs/2109.09879
Code: None
NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo
Paper(Oral): https://arxiv.org/abs/2109.01129
Code: https://github.com/weiyithu/NerfingMVS
Monocular Depth Estimation
MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments
Paper: https://arxiv.org/abs/2107.12429
Code: None
Towards Interpretable Deep Networks for Monocular Depth Estimation
Paper: https://arxiv.org/abs/2108.05312
Code: https://github.com/youzunzhi/InterpretableMDE
Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark
Paper: https://arxiv.org/abs/2108.03830
Code: https://github.com/w2kun/RNW
Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation
Paper: https://arxiv.org/abs/2108.07628
Code: https://github.com/LINA-lln/ADDS-DepthNet
StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation
Paper: https://arxiv.org/abs/2108.08574
Code: https://github.com/SJTU-ViSYS/StructDepth
Gaze Estimation
Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation
Paper: https://arxiv.org/abs/2107.13780
Code: https://github.com/DreamtaleCore/PnP-GA
Crowd Counting
Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework
Paper(Oral): https://arxiv.org/abs/2107.12746
Code(P2PNet): https://github.com/TencentYoutuResearch/CrowdCounting-P2PNet
Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting
Paper: https://arxiv.org/abs/2107.12619
Code: https://github.com/TencentYoutuResearch/CrowdCounting-UEPNet
Lane Detection
VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection
Paper: https://arxiv.org/abs/2108.08482
Code: https://github.com/yujun0-0/MMA-Net
Dataset: https://github.com/yujun0-0/MMA-Net
Trajectory Prediction
Human Trajectory Prediction via Counterfactual Analysis
Paper: https://arxiv.org/abs/2107.14202
Code: https://github.com/CHENGY12/CausalHTP
Personalized Trajectory Prediction via Distribution Discrimination
Paper: https://arxiv.org/abs/2107.14204
Code: https://github.com/CHENGY12/DisDis
MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction
Paper: https://arxiv.org/abs/2108.09274
Code: https://github.com/selflein/MG-GAN
Social NCE: Contrastive Learning of Socially-aware Motion Representations
Paper: https://arxiv.org/abs/2012.11717
Code: https://github.com/vita-epfl/social-nce
Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving
Paper: https://arxiv.org/abs/2109.01510
Code: https://github.com/xrenaa/Safety-Aware-Motion-Prediction
Anomaly Detection
Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
Paper: https://arxiv.org/abs/2101.10030
Code: https://github.com/tianyu0207/RTFM
Scene Graph Generation
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Paper: https://arxiv.org/abs/2107.12309
Code: None
Image Editing
Sketch Your Own GAN
Homepage: https://peterwang512.github.io/GANSketching/
Paper: https://arxiv.org/abs/2108.02774
Code: https://github.com/peterwang512/GANSketching
Image Synthesis
Image Synthesis via Semantic Composition
Homepage: https://shepnerd.github.io/scg/
Paper: https://arxiv.org/abs/2109.07053
Code: https://github.com/dvlab-research/SCGAN
Image Retrieval
Self-supervised Product Quantization for Deep Unsupervised Image Retrieval
Paper: https://arxiv.org/abs/2109.02244
Code: https://github.com/youngkyunJang/SPQ
3D Reconstruction
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction
Paper: https://arxiv.org/abs/2109.00512
Code: https://github.com/facebookresearch/co3d
Dataset: https://github.com/facebookresearch/co3d
Video Stabilization
Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization
Paper: https://arxiv.org/abs/2108.09041
Code: https://github.com/Annbless/OVS_Stabilization
Fine-Grained Recognition
Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach
Paper: https://arxiv.org/abs/2108.02399
Code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
Dataset: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
Style Transfer
AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer
Paper: https://arxiv.org/abs/2108.03647
Paddle Code: https://github.com/PaddlePaddle/PaddleGAN
PyTorch Code: https://github.com/Huage001/AdaAttN
Neural Painting
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
Paper: https://arxiv.org/abs/2108.03798
Code: https://github.com/wzmsltw/PaintTransformer
Feature Matching
Learning to Match Features with Seeded Graph Matching Network
Paper: https://arxiv.org/abs/2108.08771
Code: https://github.com/vdvchen/SGMNet
Semantic Correspondence
Multi-scale Matching Networks for Semantic Correspondence
Paper: https://arxiv.org/abs/2108.00211
Code: https://github.com/wintersun661/MMNet
Edge Detection
Pixel Difference Networks for Efficient Edge Detection
Paper: https://arxiv.org/abs/2108.07009
Code: https://github.com/zhuoinoulu/pidinet
RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth
Paper: https://arxiv.org/abs/2108.00616
Code: https://github.com/MengyangPu/RINDNet
Dataset: https://github.com/MengyangPu/RINDNet
Camera Calibration
CTRL-C: Camera calibration TRansformer with Line-Classification
Paper: https://arxiv.org/abs/2109.02259
Code: https://github.com/jwlee-vcl/CTRL-C
Image Quality Assessment
MUSIQ: Multi-scale Image Quality Transformer
Paper: https://arxiv.org/abs/2108.05997
Code: https://github.com/google-research/google-research/tree/master/musiq
Metric Learning
Deep Relational Metric Learning
Paper: https://arxiv.org/abs/2108.10026
Code: https://github.com/zbr17/DRML
Towards Interpretable Deep Metric Learning with Structural Matching
Paper: https://arxiv.org/abs/2108.05889
Code: https://github.com/wl-zhao/DIML
Unsupervised Domain Adaptation
Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation
Paper(Oral): https://arxiv.org/abs/2107.13467
Code: None
Video Rescaling
Self-Conditioned Probabilistic Learning of Video Rescaling
Paper: https://arxiv.org/abs/2107.11639
Code: None
Hand-Object Interaction
Learning a Contact Potential Field to Model the Hand-Object Interaction
Paper: https://arxiv.org/abs/2012.00924
Code: https://lixiny.github.io/CPF
Vision-and-Language Navigation
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Paper: https://arxiv.org/abs/2108.09105
Code: https://airbert-vln.github.io/
Dataset: https://airbert-vln.github.io/
Datasets
Beyond Road Extraction: A Dataset for Map Update using Aerial Images
Homepage: https://favyen.com/muno21/
Paper: https://arxiv.org/abs/2110.04690
Code: https://github.com/favyen/muno21
StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation
Paper: https://arxiv.org/abs/2109.10115
Code: None
Dataset: None
RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth
Paper: https://arxiv.org/abs/2108.00616
Code: https://github.com/MengyangPu/RINDNet
Dataset: https://github.com/MengyangPu/RINDNet
Panoptic Narrative Grounding
Homepage: https://bcv-uniandes.github.io/panoptic-narrative-grounding/
Paper(Oral): https://arxiv.org/abs/2109.04988
Code: https://github.com/BCV-Uniandes/PNG
Dataset: https://github.com/BCV-Uniandes/PNG
STRIVE: Scene Text Replacement In Videos
Homepage: https://striveiccv2021.github.io/STRIVE-ICCV2021/
Paper: https://arxiv.org/abs/2109.02762
Code: https://github.com/striveiccv2021/STRIVE-ICCV2021/
Dataset: https://github.com/striveiccv2021/STRIVE-ICCV2021/
Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme
Paper: https://www4.comp.polyu.edu.hk/~cslzhang/paper/ICCV21_RealVSR.pdf
Code: https://github.com/IanYeung/RealVSR
Dataset: https://github.com/IanYeung/RealVSR
Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes
Paper: https://arxiv.org/abs/2109.03585
Code: None
Dual-Camera Super-Resolution with Aligned Attention Modules
Homepage: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
Paper: https://arxiv.org/abs/2109.01349
Code: https://github.com/Tengfei-Wang/DualCameraSR
Dataset: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
DepthTrack: Unveiling the Power of RGBD Tracking
Paper: https://arxiv.org/abs/2108.13962
Code: https://github.com/xiaozai/DeT
Dataset: https://github.com/xiaozai/DeT
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction
Paper: https://arxiv.org/abs/2109.00512
Code: https://github.com/facebookresearch/co3d
Dataset: https://github.com/facebookresearch/co3d
BioFors: A Large Biomedical Image Forensics Dataset
Paper: https://arxiv.org/abs/2108.12961
Code: None
Dataset: None
Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach
Paper: https://arxiv.org/abs/2108.02399
Code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
Dataset: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Paper: https://arxiv.org/abs/2108.09105
Code: https://airbert-vln.github.io/
Dataset: https://airbert-vln.github.io/
Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation
Paper: http://arxiv.org/abs/2108.08202
Code: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
Dataset: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection
Paper: https://arxiv.org/abs/2108.08482
Code: https://github.com/yujun0-0/MMA-Net
Dataset: https://github.com/yujun0-0/MMA-Net
XVFI: eXtreme Video Frame Interpolation
Paper(Oral): https://arxiv.org/abs/2103.16206
Code: https://github.com/JihyongOh/XVFI
Dataset: https://github.com/JihyongOh/XVFI
Personalized Image Semantic Segmentation
Paper: https://arxiv.org/abs/2107.13978
Code: https://github.com/zhangyuygss/PIS
Dataset: https://github.com/zhangyuygss/PIS
H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction
Homepage: https://crisalixsa.github.io/h3d-net/
Paper: https://arxiv.org/abs/2107.12512
Others
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
Paper: https://arxiv.org/abs/2109.14910
Code: None
ReconfigISP: Reconfigurable Camera Image Processing Pipeline
Paper: https://arxiv.org/abs/2109.04760
Code: None
Panoptic Narrative Grounding
Homepage: https://bcv-uniandes.github.io/panoptic-narrative-grounding/
Paper(Oral): https://arxiv.org/abs/2109.04988
Code: https://github.com/BCV-Uniandes/PNG
Dataset: https://github.com/BCV-Uniandes/PNG
NEAT: Neural Attention Fields for End-to-End Autonomous Driving
Paper: https://arxiv.org/abs/2109.04456
Code: https://github.com/autonomousvision/neat
Keep CALM and Improve Visual Feature Attribution
Paper: https://arxiv.org/abs/2106.07861
Code: https://github.com/naver-ai/calm
YouRefIt: Embodied Reference Understanding with Language and Gesture
Paper: https://arxiv.org/abs/2109.03413
Code: None
Pri3D: Can 3D Priors Help 2D Representation Learning?
Paper: https://arxiv.org/abs/2104.11225
Code: https://github.com/Sekunde/Pri3D
Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain
Paper: https://arxiv.org/abs/2108.08487
Code: https://github.com/iCGY96/APR
Continual Learning for Image-Based Camera Localization
Paper: https://arxiv.org/abs/2108.09112
Code: None
Multi-Task Self-Training for Learning General Representations
Paper: https://arxiv.org/abs/2108.11353
Code: None
A Unified Objective for Novel Class Discovery
Homepage: https://ncd-uno.github.io/
Paper(Oral): https://arxiv.org/abs/2108.08536
Code: https://github.com/DonkeyShot21/UNO
Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs
Paper: https://arxiv.org/abs/2108.07884
Code: https://github.com/islamamirul/PermuteNet
Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation
Paper: http://arxiv.org/abs/2108.08202
Code: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
Dataset: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
Impact of Aliasing on Generalization in Deep Convolutional Networks
Paper: https://arxiv.org/abs/2108.03489
Code: None
Out-of-Core Surface Reconstruction via Global TGV Minimization
Paper: https://arxiv.org/abs/2107.14790
Code: None
Progressive Correspondence Pruning by Consensus Learning
Homepage: https://sailor-z.github.io/projects/CLNet.html
Paper: https://arxiv.org/abs/2101.00591
Code: https://github.com/sailor-z/CLNet
Energy-Based Open-World Uncertainty Modeling for Confidence Calibration
Paper: https://arxiv.org/abs/2107.12628
Code: None
Generalized Shuffled Linear Regression
Paper: https://drive.google.com/file/d/1Qu21VK5qhCW8WVjiRnnBjehrYVmQrDNh/view?usp=sharing
Code: https://github.com/SILI1994/Generalized-Shuffled-Linear-Regression
Discovering 3D Parts from Image Collections
Homepage: https://chhankyao.github.io/lpd/
Paper: https://arxiv.org/abs/2107.13629
Semi-Supervised Active Learning with Temporal Output Discrepancy
Paper: https://arxiv.org/abs/2107.14153
Code: https://github.com/siyuhuang/TOD
Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?
Paper: https://arxiv.org/abs/2105.02498
Code: https://github.com/KingJamesSong/DifferentiableSVD
Hand-Object Contact Consistency Reasoning for Human Grasps Generation
Homepage: https://hwjiang1510.github.io/GraspTTA/
Paper(Oral): https://arxiv.org/abs/2104.03304
Code: None
Equivariant Imaging: Learning Beyond the Range Space
Paper(Oral): https://arxiv.org/abs/2103.14756
Code: https://github.com/edongdongchen/EI
Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Paper(Oral): https://arxiv.org/abs/2012.00451
Code: https://github.com/antoyang/just-ask-