Diffusion-based framework for weakly-supervised temporal action localization

Yuanbing Zou, Qingjie Zhao*, Prodip Kumar Sarker, Shanshan Li, Lei Wang, Wangwang Liu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Weakly supervised temporal action localization aims to localize action instances with only video-level supervision. In the absence of frame-level annotations, effectively separating action snippets from background in semantically ambiguous features becomes an arduous challenge. To address this issue from a generative modeling perspective, we propose a novel two-stage diffusion-based network. First, we design a local masking module that learns local semantic information and generates binary masks in the early stage; these masks (1) are used to perform action-background separation and (2) serve as the pseudo-ground truth required by the diffusion module. In the second stage, we propose a diffusion module that generates high-quality action predictions under pseudo-ground-truth supervision. In addition, we further optimize the refining operation in the local masking module to improve its efficiency. Experimental results demonstrate that the proposed method achieves promising performance on the publicly available mainstream datasets THUMOS14 and ActivityNet. The code is available at http://github.com/Rlab123/action_diff.
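To make the two-stage idea in the abstract concrete, the sketch below illustrates one plausible arrangement: a stage-1 module that thresholds per-snippet foreground scores into binary pseudo-ground-truth masks, and a stage-2 module that denoises a noised mask conditioned on the video features and is supervised by those pseudo masks. All module names, shapes, and hyperparameters are illustrative assumptions and are not taken from the authors' repository.

```python
# Minimal PyTorch sketch of the two-stage pipeline described in the abstract.
# Everything here (names, dimensions, thresholds) is an assumption for
# illustration only; see the linked repository for the actual implementation.
import torch
import torch.nn as nn


class LocalMaskingModule(nn.Module):
    """Stage 1: predict per-snippet foreground scores from video features
    and threshold them into binary masks used as pseudo-ground truth."""

    def __init__(self, feat_dim=2048, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, 1, kernel_size=1),
        )

    def forward(self, feats, threshold=0.5):
        # feats: (batch, feat_dim, num_snippets)
        scores = torch.sigmoid(self.net(feats)).squeeze(1)   # (batch, T)
        pseudo_masks = (scores > threshold).float()          # action/background split
        return scores, pseudo_masks


class MaskDiffusionModule(nn.Module):
    """Stage 2: denoise a noisy mask sequence conditioned on video features;
    trained to match the stage-1 pseudo-ground-truth masks."""

    def __init__(self, feat_dim=2048, hidden=512):
        super().__init__()
        self.denoiser = nn.Sequential(
            nn.Conv1d(feat_dim + 1, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, 1, kernel_size=1),
        )

    def forward(self, feats, noisy_mask):
        # Concatenate the noisy mask with the features along the channel axis.
        x = torch.cat([feats, noisy_mask.unsqueeze(1)], dim=1)
        return self.denoiser(x).squeeze(1)  # predicted clean-mask logits


# Toy forward pass: batch of 2 videos, 2048-dim features, 100 snippets.
feats = torch.randn(2, 2048, 100)
stage1 = LocalMaskingModule()
stage2 = MaskDiffusionModule()

scores, pseudo_gt = stage1(feats)
noisy = pseudo_gt + 0.3 * torch.randn_like(pseudo_gt)  # simple noising step
pred = stage2(feats, noisy)
loss = nn.functional.binary_cross_entropy_with_logits(pred, pseudo_gt)
print(scores.shape, pseudo_gt.shape, pred.shape, loss.item())
```

In this sketch the stage-1 masks play both roles mentioned in the abstract: they separate action from background directly, and they supply the supervision signal for the denoising objective of stage 2.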

Original language: English
Article number: 111207
Journal: Pattern Recognition
Volume: 160
DOI
Publication status: Published - April 2025
