Abstract: Although audio-visual speech separation has made significant advances, it is relatively difficult to obtain the audio and visual modalities simultaneously in real scenarios, often ...