Abstract: Visual grounding in remote sensing (RSVG) images aims to detect specific objects associated with referring expressions in remote sensing images. Existing methods typically combine outputs of ...
Abstract: Recent advances in image understanding have enabled methods that leverage large language models for multimodal reasoning in remote sensing. However, existing approaches still struggle to ...