Improving Video Retrieval by Adaptive Margin

Video retrieval is becoming increasingly important owing to the rapid
emergence of videos on the Internet. The dominant paradigm for video retrieval
learns video-text representations by pushing the similarity of positive pairs
above that of negative pairs by at least a fixed margin. However, the negative
pairs used for training are sampled randomly, so the video and text in a
negative pair may be semantically related or even equivalent, yet most methods
still force their representations apart to decrease their similarity. This
leads to inaccurate supervision and poor performance in learning video-text
representations.
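
As a rough illustration of the fixed-margin paradigm described above, the following sketch computes a bidirectional max-margin ranking loss over a small batch. It is a generic hinge formulation written for this post, not code from the paper; the function name `fixed_margin_loss`, the default margin of 0.2, and the averaging scheme are all assumptions.

```python
def fixed_margin_loss(sim, margin=0.2):
    """Bidirectional max-margin ranking loss on a square similarity matrix.

    sim[i][j] is the similarity between video i and text j. Diagonal entries
    are positive pairs; every off-diagonal entry is treated as a negative,
    regardless of how semantically close it actually is.
    Assumes a batch of at least two pairs.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        for j in range(n):
            if i == j:
                continue
            # Video i as query, text j as negative.
            total += max(0.0, margin + sim[i][j] - pos)
            # Text i as query, video j as negative.
            total += max(0.0, margin + sim[j][i] - pos)
    # Average over all (query, negative) combinations in both directions.
    return total / (2 * n * (n - 1))


if __name__ == "__main__":
    batch_sim = [[0.9, 0.1],
                 [0.8, 0.7]]
    print(fixed_margin_loss(batch_sim))
```

Every negative is penalized against the same margin here, which is exactly the behavior the abstract criticizes when a randomly sampled negative happens to be semantically close to the positive.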

While most video retrieval methods overlook this phenomenon, we propose an
adaptive margin that changes with the distance between positive and negative
pairs to address the issue. First, we design the calculation framework of the
adaptive margin, including the distance measurement method and the function
that maps the distance to the margin. Then, we explore a novel implementation
called “Cross-Modal Generalized Self-Distillation” (CMGSD), which can be built
on top of most video retrieval models with few modifications. Notably, CMGSD
adds little computational overhead at training time and none at test time.
Experimental results on three widely used datasets demonstrate that the
proposed method yields significantly better performance than the corresponding
backbone model and outperforms state-of-the-art methods by a large margin.
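
The abstract does not specify the distance measure or the distance-to-margin function, so the sketch below only illustrates the general idea of a margin that shrinks when a negative is semantically close to the positive. Everything in it is an assumption made for illustration: the distance is taken as one minus the cosine similarity of two hypothetical "supervision" embeddings, the mapping is linear, and the names `adaptive_margin` and `adaptive_margin_term` are invented here. It is not the CMGSD implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def adaptive_margin(pos_emb, neg_emb, max_margin=0.2):
    """Margin that grows with the distance between a positive and a negative.

    Assumed distance: 1 - cosine similarity, clipped to [0, 1].
    Assumed mapping: linear, so identical samples get margin 0 and
    unrelated (orthogonal) samples get the full max_margin.
    """
    distance = min(1.0, max(0.0, 1.0 - cosine(pos_emb, neg_emb)))
    return max_margin * distance


def adaptive_margin_term(pos_sim, neg_sim, pos_emb, neg_emb, max_margin=0.2):
    """Hinge term for one (positive, negative) pair using the adaptive margin."""
    margin = adaptive_margin(pos_emb, neg_emb, max_margin)
    return max(0.0, margin + neg_sim - pos_sim)


if __name__ == "__main__":
    # A negative that is semantically identical to the positive is not pushed away.
    print(adaptive_margin_term(0.5, 0.45, [1.0, 0.0], [1.0, 0.0]))
    # An unrelated negative is still held to the full margin.
    print(adaptive_margin_term(0.5, 0.45, [1.0, 0.0], [0.0, 1.0]))
```

With a mapping of this kind, the penalty for a near-duplicate negative vanishes while unrelated negatives are treated as in the fixed-margin case, and no extra work is needed at test time because the margin only appears in the training loss.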
Paper link: http://arxiv.org/pdf/2303.05093v1

