Triton

Artificial Intelligence

Cutting LLM Memory by 84%: A Deep Dive into Fused Kernels

…or fine-tuned an LLM, you’ve likely hit a wall on the very last step: the cross-entropy loss. The culprit is the logit bottleneck. To predict the next token, we project a hidden state into...

ASK ANA - January 16, 2026

Artificial Intelligence

Breaking the Hardware Barrier: Software FP8 for Older GPUs

As deep learning models grow larger and datasets expand, practitioners face an increasingly common bottleneck: GPU memory bandwidth. While cutting-edge hardware offers FP8 precision to speed up training and inference, most data scientists and...

ASK ANA - December 28, 2025

Artificial Intelligence

Learning Triton One Kernel at a Time: Softmax

In the previous article of this series, we covered an operation used in all fields of computer science: matrix multiplication. It is heavily used in neural networks to compute the activations of linear layers. However, activations on their...

ASK ANA - November 23, 2025

Artificial Intelligence

Learning Triton One Kernel at a Time: Matrix Multiplication

Matrix multiplication is arguably the most common operation performed by GPUs. It is the fundamental building block of linear algebra and shows up across a wide spectrum of fields such as graphics, physics...

ASK ANA - October 15, 2025


Popular categories

Artificial Intelligence (11037)

New Post (1)

My Blog (1)


Recent posts

Latest Rowhammer attacks give complete control of machines running Nvidia GPUs

Our most capable open models so far

Achieving Single-Digit Microsecond Latency Inference for Capital Markets

Frontier multimodal intelligence on device

How to Handle Classical Data in Quantum Models?
