DeLighT: Deep and Light-weight Transformer (Parameter Reduction for Transformers),
DeLighT, by University of Washington, Facebook AI Research, and Allen Institute for AI,
ICLR 2021, over 60 citations (Sik-Ho Tsang @ Medium)
NLP, LM, NMT, Transformer

Within each Transformer block, a deep and lightweight transformation is applied via the DeLighT block. Across blocks, block-wise scaling is…
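The block-wise scaling idea can be sketched as follows: the depth of the lightweight transformation grows linearly from the block nearest the input to the block nearest the output. This is a minimal sketch only; the function name and the default depths `n_min=4`, `n_max=8` are illustrative assumptions, not the paper's exact configuration.

```python
def block_depths(num_blocks: int, n_min: int = 4, n_max: int = 8) -> list[int]:
    """Block-wise scaling sketch: linearly interpolate the per-block depth
    of the lightweight transformation from n_min (near the input) to
    n_max (near the output), rounding to the nearest integer."""
    if num_blocks == 1:
        return [n_max]
    return [round(n_min + (n_max - n_min) * b / (num_blocks - 1))
            for b in range(num_blocks)]
```

For example, `block_depths(5)` assigns depths `[4, 5, 6, 7, 8]`: shallower blocks near the input, deeper blocks near the output, which is where the parameter savings come from relative to a uniform-depth stack.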