Walkthrough

YOLOv1 Paper Walkthrough: The Day YOLO First Saw the World

When we talk about object detection, one model that likely comes to mind first is YOLO (at least for me), given its popularity in the field of computer vision. The very first version...

MobileNetV3 Paper Walkthrough: The Tiny Giant Getting Even Smarter

Welcome back to the Tiny Giant series — a series where I share what I have learned about MobileNet architectures. In the two articles so far, I covered MobileNetV1 and MobileNetV2. Take a look at references ...

MobileNetV2 Paper Walkthrough: The Smarter Tiny Giant

MobileNet was a breakthrough in the field of computer vision because it proved that deep learning models need not be computationally expensive to achieve high accuracy. Last month I posted an article where...

Paper Walkthrough: Attention Is All You Need

As the title suggests, in this article I'm going to implement the Transformer architecture from scratch with PyTorch — yes, literally from scratch. Before we get into it, let me provide a brief overview...

A Walkthrough of Nvidia’s Latest Multi-Modal LLM Family

From LLaVA and Flamingo to NVLM. Multi-modal LLM development has been advancing fast lately. Although industrial multi-modal models like GPT-4v, GPT-4o, Gemini, and Claude 3.5 Sonnet are probably the most eye-catching performers today, the open-source models...

Seamless: In-Depth Walkthrough of Meta’s Latest Open-Source Suite of Translation Models

Meta’s open-source Seamless models: A deep dive into translation model architectures and a Python implementation guide using HuggingFace
