Deriving optimal initial variance of weight matrices in neural network layers with ReLU activation function

Initialization techniques are one of the prerequisites for successfully training a deep learning architecture. Traditionally, weight initialization methods...
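The quantity named in the title can be checked numerically. Below is a minimal sketch in plain Python. It assumes zero-mean i.i.d. Gaussian weights, zero biases, and square layers of a hypothetical width. It uses the well-known He (Kaiming) result Var(w) = 2 / n_in. The reasoning behind that result: if the pre-activation y of a layer is symmetric around zero, ReLU(y) keeps only the positive half, so E[ReLU(y)^2] = Var(y) / 2. The next layer's pre-activation then has variance n_in * Var(w) * Var(y) / 2, and choosing Var(w) = 2 / n_in keeps it constant from layer to layer. The helper names (`he_std`, `propagate`, and so on) and the 1/n_in baseline are illustrative assumptions, not taken from the article. This is not necessarily the derivation the article goes on to present.

```python
import math
import random


def he_std(fan_in):
    """Weight standard deviation for He initialization: Var(w) = 2 / fan_in."""
    return math.sqrt(2.0 / fan_in)


def naive_std(fan_in):
    """Hypothetical baseline with Var(w) = 1 / fan_in, which ignores the ReLU halving."""
    return math.sqrt(1.0 / fan_in)


def init_weights(fan_in, fan_out, std, rng):
    """Zero-mean Gaussian weight matrix with shape (fan_out, fan_in)."""
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def relu(v):
    return [max(0.0, a) for a in v]


def matvec(w, x):
    """Dense layer without bias: y = W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def second_moment(v):
    """Empirical E[v^2] over the units of one layer."""
    return sum(a * a for a in v) / len(v)


def propagate(width, depth, std_fn, seed=0):
    """Push one random input through `depth` ReLU layers.

    Returns the empirical second moment of each layer's pre-activation.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    moments = []
    for _ in range(depth):
        w = init_weights(width, width, std_fn(width), rng)
        y = matvec(w, x)  # pre-activation of this layer
        moments.append(second_moment(y))
        x = relu(y)  # ReLU output becomes the next layer's input
    return moments


if __name__ == "__main__":
    he = propagate(256, 10, he_std)
    naive = propagate(256, 10, naive_std)
    print("Var(w) = 2/n_in:", [round(m, 3) for m in he])
    print("Var(w) = 1/n_in:", [round(m, 5) for m in naive])
```

Both runs share a seed, so the 1/n_in weights are exactly the He weights scaled by 1/sqrt(2). Because ReLU is positively homogeneous, the comparison isolates the effect of the variance choice. The He run keeps the pre-activation second moment roughly constant, up to sampling noise. The 1/n_in run halves it at every layer, so the signal shrinks geometrically with depth.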