Language models have witnessed rapid advancements, with Transformer-based architectures leading the charge in natural language processing. Nonetheless, as models scale, the challenges of handling long contexts, memory efficiency, and throughput have become...