Multimodal learning

How Vision Language Models Are Trained from “Scratch”

...to remodel a small text-only language model and give it the ability to see. This article summarizes what I learned and takes a deeper look at the network architectures...

Pairwise Cross-Variance Classification

This project is about improving zero-shot classification of images and text using CV/LLM models, without spending money and time on fine-tuning during training or on re-running models at inference. It uses a novel dimensionality reduction technique...

Scaling audio-visual learning without labels

Researchers from MIT, the MIT-IBM Watson AI Lab, IBM Research, and elsewhere...
