Abstract: Short-form videos (SFVs) have recently emerged as one of the most prominent forms of entertainment content on social media, attracting huge audiences across platforms such as TikTok, ...
Abstract: A high-quality enrollment speech is crucial to target speaker extraction (TSE), since it provides essential cues for identifying the target speaker in the mixture. However, real applications ...