Cutting LLM Memory by 84%: A Deep Dive into Fused Kernels

If you’ve trained or fine-tuned an LLM, you’ve likely hit a wall on the very last step: the cross-entropy loss. The offender is the logit bottleneck. To predict the next token, we project a hidden state into...
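To get a feel for why that last step hurts, here is a rough back-of-the-envelope sketch of how much memory a full logits tensor can take. It assumes the projection produces one value per vocabulary entry for every token position, which is the standard setup for a language-model head. The helper name `logits_bytes` and the shapes below (batch of 8, 4096 tokens, 128,000-entry vocabulary, float32) are illustrative assumptions, not figures from this post.

```python
def logits_bytes(batch_size: int, seq_len: int, vocab_size: int,
                 bytes_per_elem: int = 4) -> int:
    """Bytes needed to hold one logits tensor of shape (batch, seq, vocab).

    Hypothetical helper for illustration; bytes_per_elem=4 assumes float32.
    """
    return batch_size * seq_len * vocab_size * bytes_per_elem


if __name__ == "__main__":
    # Assumed, illustrative shapes (not taken from the article).
    size = logits_bytes(batch_size=8, seq_len=4096, vocab_size=128_000)
    print(f"logits tensor: {size / 2**30:.2f} GiB")
```

The count grows linearly with vocabulary size, so a large vocabulary can make this one tensor far bigger than the hidden states that feed it. That cost is before counting any gradient or intermediate copies made while computing the loss.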
