Walkthrough

YOLOv2 & YOLO9000 Paper Walkthrough: Better, Faster, Stronger

That’s the ambitious title the authors chose for their paper introducing both YOLOv2 and YOLO9000. The title of the paper itself is “”, which was published back in December 2016. The...

YOLOv1 Loss Function Walkthrough: Regression for All

In my previous article I explained how YOLOv1 works and how to build the architecture from scratch with PyTorch. In today’s article, I'm going to focus on the loss function used to...

YOLOv1 Paper Walkthrough: The Day YOLO First Saw the World

If we talk about object detection, one model that likely comes to mind first is YOLO (well, at least for me) because of its popularity in the field of computer vision. The very first version...

MobileNetV3 Paper Walkthrough: The Tiny Giant Getting Even Smarter

Welcome back to the Tiny Giant series, where I share what I learned about MobileNet architectures. In the past two articles I covered MobileNetV1 and MobileNetV2. Take a look at references ...

MobileNetV2 Paper Walkthrough: The Smarter Tiny Giant

Introduction was a breakthrough in the field of computer vision because it proved that deep learning models don't necessarily have to be computationally expensive to achieve high accuracy. Last month I posted an article where...

Paper Walkthrough: Attention Is All You Need

As the title suggests, in this article I'm going to implement the Transformer architecture from scratch with PyTorch (yes, literally from scratch). Before we get into it, let me provide a brief overview...

A Walkthrough of Nvidia’s Latest Multi-Modal LLM Family

From LLaVA, Flamingo, to NVLM. Multi-modal LLM development has been advancing fast lately. Although the industrial multi-modal models like GPT-4v, GPT-4o, Gemini, and Claude 3.5 Sonnet are probably the most eye-catching performers today, the open-source models...

Seamless: In-Depth Walkthrough of Meta’s Latest Open-Source Suite of Translation Models

Meta’s open-source Seamless models: A deep dive into translation model architectures and a Python implementation guide using HuggingFace
