Optimizing

Optimizing Token Generation in PyTorch Decoder Models

which have pervaded nearly every facet of our day-to-day lives are autoregressive decoder models. These models apply compute-heavy kernel operations to generate tokens one at a time in a way...
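The one-token-at-a-time pattern the excerpt describes can be sketched as a greedy decoding loop: each step feeds the tokens generated so far back into the model and appends the highest-scoring next token. This is a minimal illustrative sketch; `next_token_logits` is a hypothetical stand-in for a real decoder's forward pass, not an API from any library.

```python
# Minimal sketch of greedy autoregressive decoding.
# `next_token_logits` is a hypothetical toy "model" standing in
# for a real decoder forward pass (e.g. a PyTorch module).

VOCAB_SIZE = 8
EOS_TOKEN = 0

def next_token_logits(tokens):
    # Toy scoring: prefer the token that follows the last one
    # (modulo the vocabulary size).
    last = tokens[-1]
    return [-abs((last + 1) % VOCAB_SIZE - t) for t in range(VOCAB_SIZE)]

def greedy_decode(prompt, max_new_tokens=5):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)   # one forward pass per new token
        next_tok = max(range(VOCAB_SIZE), key=logits.__getitem__)
        tokens.append(next_tok)
        if next_tok == EOS_TOKEN:            # stop at end-of-sequence
            break
    return tokens

print(greedy_decode([3]))  # → [3, 4, 5, 6, 7, 0]
```

Note that every new token requires another full forward pass over the growing sequence, which is why the per-token kernel cost the post refers to dominates generation time.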

Optimizing Deep Learning Models with SAM

Overparameterization, Generalizability, and SAM: The dramatic success of recent deep learning, especially in the domains of Computer Vision and Natural Language Processing, is built on "overparameterized" models: models with enough parameters to memorize the training data...

Optimizing Vector Search: Why You Should Flatten Structured Data 

structured data into a RAG system, engineers often default to embedding raw JSON directly into a vector database. In reality, however, this intuitive approach leads to dramatically degraded retrieval performance. Modern...
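A common alternative to embedding raw JSON is to flatten each record into a plain-text string first, so field names and values land in the natural-language space the embedding model was trained on. A minimal sketch, assuming a hypothetical product record (the helper and record shape are illustrative, not from the post):

```python
# Flatten a nested JSON-like record into one text string
# suitable for passing to an embedding model.

def flatten(record, prefix=""):
    """Recursively turn nested dicts/lists into 'path: value' fragments."""
    parts = []
    if isinstance(record, dict):
        for key, value in record.items():
            path = f"{prefix}.{key}" if prefix else key
            parts.extend(flatten(value, path))
    elif isinstance(record, list):
        for i, value in enumerate(record):
            parts.extend(flatten(value, f"{prefix}[{i}]"))
    else:
        parts.append(f"{prefix}: {record}")
    return parts

product = {"name": "widget", "specs": {"weight_kg": 1.2, "colors": ["red", "blue"]}}
text = "; ".join(flatten(product))
print(text)
# → name: widget; specs.weight_kg: 1.2; specs.colors[0]: red; specs.colors[1]: blue
```

The resulting string, rather than the raw JSON, is what gets embedded and stored in the vector database.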

Optimizing Data Transfer in Distributed AI/ML Training Workloads

part of a series of posts on optimizing data transfer using the NVIDIA Nsight™ Systems (nsys) profiler. Part one focused on CPU-to-GPU data copies, and part two on GPU-to-CPU copies. In this post, we turn our attention...

Optimizing Data Transfer in Batched AI/ML Inference Workloads

is a follow-up to Optimizing Data Transfer in AI/ML Workloads, where we demonstrated using NVIDIA Nsight™ Systems (nsys) to study and solve the common data-loading bottleneck: occurrences where the GPU idles while it waits for input...

Optimizing Data Transfer in AI/ML Workloads

a typical setup, a deep learning model is executed on a dedicated GPU accelerator using input data batches it receives from a CPU host. Ideally, the GPU, the more expensive resource, should...

Optimizing PyTorch Model Inference on AWS Graviton

AI/ML models can be an extremely expensive endeavor. Many of our posts have focused on a wide range of tips, tricks, and techniques for analyzing and optimizing the runtime performance of AI/ML workloads....

Optimizing PyTorch Model Inference on CPU

grows, so does the criticality of optimizing their runtime performance. While the degree to which AI models will outperform human intelligence remains a heated topic of debate, their need for powerful and expensive...
