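This paper aims to recover the original component signals from an audio mixture using visual cues of the sound sources, a task usually called visually guided sound source separation. The proposed Cascaded Opponent Filter (COF) framework consists of multiple stages that recursively refine the sound separation based on appearance and motion information. Its key element is a novel opponent filter module that identifies and relocates residual components between sound sources. The authors also propose a Sound Source Location Masking (SSLM) technique which, together with COF, produces a pixel-level mask of the source location. The entire system is trained end-to-end on a large set of unlabelled videos. COF is compared with recent baselines and achieves state-of-the-art performance on three challenging datasets (MUSIC, A-MUSIC, and A-NATURAL). The implementation and pre-trained models will be made publicly available.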
Original title: Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Original abstract: The objective of this paper is to recover the original component signals from a mixture audio with the aid of visual cues of the sound sources. Such task is usually referred as visually guided sound source separation. The proposed Cascaded Opponent Filter (COF) framework consists of multiple stages, which recursively refine the sound separation based on appearance and motion information. A key element is a novel opponent filter module that identifies and relocates residual components between sound sources. Finally, we propose a Sound Source Location Masking (SSLM) technique, which, together with COF, produces a pixel level mask of the source location. The entire system is trained end-to-end using a large set of unlabelled videos. We compare COF with recent baselines and obtain state-of-the-art performance in three challenging datasets (MUSIC, A-MUSIC, and A-NATURAL). The implementation and pre-trained models will be made publicly available.
Original authors: Lingyu Zhu, Esa Rahtu
Original URL: https://arxiv.org/abs/2006.03028
基于级联对手滤波网络的视觉引导声源分离(CS CP).pdf (from the Tencent Cloud Community, posted by 蔡秋纯)