Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting

Multi-person pose forecasting remains a challenging problem, especially in
modeling fine-grained human body interaction in complex crowd scenarios.
Existing methods typically represent the whole pose sequence as a temporal
series, yet overlook interactive influences among people based on skeletal body
parts. In this paper, we propose a novel Trajectory-Aware Body Interaction
Transformer (TBIFormer) for multi-person pose forecasting via effectively
modeling body part interactions. Specifically, we construct a Temporal Body
Partition Module that transforms all the pose sequences into a Multi-Person
Body-Part sequence to retain spatial and temporal information based on body
semantics. Then, we devise a Social Body Interaction Self-Attention (SBI-MSA)
module, utilizing the transformed sequence to learn body part dynamics for
inter- and intra-individual interactions. Furthermore, different from prior
Euclidean distance-based spatial encodings, we present a novel and efficient
Trajectory-Aware Relative Position Encoding for SBI-MSA to offer discriminative
spatial information and additional interactive clues. On both short- and
long-term horizons, we empirically evaluate our framework on CMU-Mocap,
MuPoTS-3D, and synthesized datasets (6 to 10 persons), and demonstrate
that our method greatly outperforms state-of-the-art methods. Code will be
made publicly available upon acceptance.
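
The pipeline described in the abstract (partition poses into body-part tokens, then attend across all persons' parts with a relative position term) can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the 15-joint skeleton, the `BODY_PARTS` grouping, the function names, and the single-head attention without learned projections are all assumptions made for illustration. In particular, `trajectory_distance_bias` is only a stand-in that uses the mean distance between part-centroid trajectories; the abstract does not specify how the Trajectory-Aware Relative Position Encoding is computed.

```python
import numpy as np

# Hypothetical grouping of a 15-joint skeleton into 5 body parts (an assumption,
# not the partition used in the paper).
BODY_PARTS = {
    "torso": [0, 1, 2],
    "left_arm": [3, 4, 5],
    "right_arm": [6, 7, 8],
    "left_leg": [9, 10, 11],
    "right_leg": [12, 13, 14],
}


def partition_body_parts(poses, parts=BODY_PARTS):
    """Turn poses of shape (N, T, J, 3) into one token per (person, body part).

    Each token flattens that part's joints over all T frames, so the result
    has shape (N * P, T * K * 3) for P parts of K joints each.
    """
    tokens = []
    for person in range(poses.shape[0]):
        for joints in parts.values():
            tokens.append(poses[person][:, joints, :].reshape(-1))
    return np.stack(tokens)


def trajectory_distance_bias(poses, parts=BODY_PARTS):
    """Stand-in relative position term (NOT the paper's encoding).

    Returns an (N * P, N * P) matrix of negative mean distances between the
    centroid trajectories of every pair of person-part tokens.
    """
    centroids = np.stack([
        poses[person][:, joints, :].mean(axis=1)  # (T, 3) centroid trajectory
        for person in range(poses.shape[0])
        for joints in parts.values()
    ])
    diff = centroids[:, None] - centroids[None, :]       # (M, M, T, 3)
    return -np.linalg.norm(diff, axis=-1).mean(axis=-1)  # (M, M)


def attention_with_bias(tokens, bias):
    """Single-head self-attention over all tokens with an additive bias."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[-1]) + bias
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ tokens, weights


# Toy input: 2 persons, 4 frames, 15 joints, 3D coordinates.
rng = np.random.default_rng(0)
poses = rng.normal(size=(2, 4, 15, 3))
tokens = partition_body_parts(poses)
bias = trajectory_distance_bias(poses)
attended, weights = attention_with_bias(tokens, bias)
```

Because tokens from every person sit in the same sequence, a single attention pass covers both intra-individual pairs (two parts of one person) and inter-individual pairs (parts of different persons), which is the property the abstract attributes to SBI-MSA.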
Paper link: http://arxiv.org/pdf/2303.05095v1

