Google has announced PaliGemma 2, a new family of vision-language models. Launched on December 5th, it succeeds the original PaliGemma, the first vision-language model in the Gemma family, released seven months earlier. Built on Gemma 2, the models are designed to understand and interact with visual information, and, according to Google, are intended to simplify the integration of advanced vision-language capabilities into developers' applications.

PaliGemma 2 offers scalable performance, adaptable to various tasks through three model sizes (3B, 10B, and 28B parameters) and three input resolutions (224px, 448px, and 896px). A headline feature is long captioning: the models generate detailed, context-aware captions that go beyond simple object recognition to describe the actions, emotions, and overall narrative within an image.