GroupViT is a framework for learning semantic segmentation purely from text captions without using any mask supervision. It learns to perform bottom-up hierarchical spatial grouping of ...
(Note: The link above will automatically redirect to the latest available release. Look for the .zip file under the "Assets" section of the release.) This project and its creators are not affiliated ...