High Training Efficiency Transformer for Multi-scenario Non-Autoregressive Neural Machine Translation

Authors: Xiangyu Qu, Guojing Liu, Liang Li
Conference: ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Pages: 760-770
Keywords: Machine translation · Non-autoregressive Transformer · High efficiency training

Abstract

Non-autoregressive neural machine translation (NAT) focuses on improving inference efficiency through parallel decoding. However, NAT training methods have seen little improvement over those of autoregressive translation (AT) models, leading to an imbalance between training efficiency and inference speed. In this paper, we propose Padding Accelerated Training (PAT) for NAT. Specifically, we pad short sentences not with padding tokens but with another real training sentence, and apply Sequence Concatenating (SC) attention to obtain a sentence-level blocking matrix that prevents the packed sentences from interfering with one another. Experiments show that PAT is applicable to both sentence-level and document-level machine translation scenarios. While preserving translation performance, PAT improves training speed by more than 2 times across multiple experimental tasks.
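The core mechanism described in the abstract is packing a second real sentence into the padding slots of a short sentence while masking attention so the two sentences cannot see each other. The sketch below is a minimal illustration of that idea, not the authors' code: the function name `sequence_concat_mask` and the use of per-token segment ids are assumptions made for the example.

```python
# Minimal sketch of a sentence-level blocking mask for packed sequences,
# in the spirit of the SC attention described in the abstract.
import torch

def sequence_concat_mask(segment_ids: torch.Tensor) -> torch.Tensor:
    """Build a block-diagonal attention mask for packed sentences.

    segment_ids: (batch, seq_len) tensor in which each token carries the id
    of the sentence it belongs to (e.g. 0 for the original short sentence,
    1 for the real sentence used in place of padding).
    Returns a (batch, seq_len, seq_len) boolean mask where True marks the
    positions a query token is allowed to attend to.
    """
    # A token may attend to another token only if they share a segment id.
    return segment_ids.unsqueeze(2) == segment_ids.unsqueeze(1)

# Example: a slot of length 8 holding a 5-token sentence followed by a
# 3-token filler sentence instead of padding tokens.
segments = torch.tensor([[0, 0, 0, 0, 0, 1, 1, 1]])
mask = sequence_concat_mask(segments)            # shape (1, 8, 8)
scores = torch.randn(1, 8, 8)                    # hypothetical attention logits
scores = scores.masked_fill(~mask, float("-inf"))
attn = scores.softmax(dim=-1)                    # each sentence attends only to itself
```

Because the mask is block-diagonal, the packed batch behaves like two independent sentences during attention while still filling the padded positions with useful training signal.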