PyTorch

YOLOv2 & YOLO9000 Paper Walkthrough: Better, Faster, Stronger

That’s the ambitious title the authors selected for their paper introducing both YOLOv2 and YOLO9000. The paper itself was published back in December 2016. The...

Optimizing Data Transfer in Distributed AI/ML Training Workloads

This is part of a series of posts on optimizing data transfer using the NVIDIA Nsight™ Systems (nsys) profiler. Part one focused on CPU-to-GPU data copies, and part two on GPU-to-CPU copies. In this post, we turn our attention...

Optimizing Data Transfer in Batched AI/ML Inference Workloads

This post is a follow-up to Optimizing Data Transfer in AI/ML Workloads, where we demonstrated using NVIDIA Nsight™ Systems (nsys) to study and solve the common data-loading bottleneck: occurrences where the GPU idles while it waits for input...

Optimizing Data Transfer in AI/ML Workloads

In a typical AI/ML workload, a deep learning model is executed on a dedicated GPU accelerator using input data batches it receives from a CPU host. Ideally, the GPU, the more expensive resource, needs to...

Optimizing PyTorch Model Inference on AWS Graviton

Running AI/ML models can be an especially expensive endeavor. Many of our posts have focused on a wide range of tips, tricks, and techniques for analyzing and optimizing the runtime performance of AI/ML workloads....

Optimizing PyTorch Model Inference on CPU

As reliance on AI models grows, so does the criticality of optimizing their runtime performance. While the degree to which AI models will outperform human intelligence remains a heated topic of debate, their need for powerful and expensive...

On the Challenge of Converting TensorFlow Models to PyTorch

In the interest of managing reader expectations and preventing disappointment, we would like to begin by stating that this post does not provide a fully satisfactory solution to the problem described in the title. We are...

Overcoming the Hidden Performance Traps of Variable-Shaped Tensors: Efficient Data Sampling in PyTorch

This is part of a series of posts on the subject of analyzing and optimizing PyTorch models. Throughout the series, we have advocated for using the PyTorch Profiler in AI model development and demonstrated the...
