Abstract: Semantic learning and understanding of multi-vehicle interaction patterns in a cluttered driving environment are essential but challenging for autonomous vehicles to make proper decisions.
Abstract: In this paper, we present a few-shot text-to-video framework, LAMP, which enables a text-to-image diffusion model to Learn A specific Motion Pattern with 8~16 videos on a single GPU.