1. Introduction
Ever for the reason that introduction of the self-attention mechanism, Transformers have been the highest alternative relating to Natural Language Processing (NLP) tasks. Self-attention-based models are highly parallelizable and require substantially fewer parameters,...