Abstract: Recent advances in large vision-language models (LVLMs) typically employ vision encoders based on the Vision Transformer (ViT) architecture. The division of images into patches by ViT ...
Large language models lack grounding in physical causality — a gap world models are designed to fill. Here's how three ...
Latent spaces are abstract, high-dimensional areas within neural networks where patterns and relationships are encoded, but ...
Abstract: Traditional transmission line inspection, which relies on manual recording of fault information, is prone to ambiguity. The semantics generated by general image description models suffer ...
a. The Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Health System, New York, NY, USA; b. The Hasso Plattner Institute for Digital Health at Mount Sinai, Mount Sinai Health ...