Learning to Adaptively Incorporate External Syntax through Gated Self-Attention

Authors: Shengyuan Hou
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 1051-1067
Keywords: Adaptive gating mechanism, Constituency syntax-aware architecture, Machine translation.

Abstract

Transformers are known to implicitly learn syntax from data during training, although the quality of this learned syntax depends heavily on the size and quality of the data. Explicitly introducing structured syntactic information into the sequential Transformer, however, is not trivial. Previous analytical studies have shown that Transformers build increasingly abstract representations across layers, with the lower layers being more closely tied to syntactic information. Accordingly, to provide extra flexibility and interpretability when utilizing constituency syntax, we propose an architecture that allows each Transformer layer to adaptively control the weight with which external syntax is incorporated, through a gating mechanism. Analysis of the learned syntactic gating weights reveals that the Transformer tends to utilize constituency syntax hierarchically, which aligns well with previous findings and showcases the interpretability of our architecture. Moreover, experiments on five machine translation datasets across various language pairs show that our model outperforms the vanilla Transformer by 1.22 BLEU on average and is competitive with other recent syntax-aware models. Only a few additional hyperparameters are required, alleviating the burden of searching for the best location at which to incorporate syntax.
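For intuition, the sketch below illustrates one way a per-layer gate can mix a vanilla self-attention output with a syntax-aware attention output; this is a minimal illustration only, assuming a sigmoid gate over the concatenation of the two streams, and the paper's exact formulation and syntax encoding may differ.

```python
import torch
import torch.nn as nn


class GatedSyntaxFusion(nn.Module):
    """Hypothetical per-layer gate mixing a vanilla self-attention output
    with a syntax-aware attention output (illustrative sketch only)."""

    def __init__(self, d_model: int):
        super().__init__()
        # Gate conditioned on both streams; produces a per-position,
        # per-dimension mixing weight in (0, 1).
        self.gate_proj = nn.Linear(2 * d_model, d_model)

    def forward(self, attn_out: torch.Tensor, syntax_out: torch.Tensor) -> torch.Tensor:
        # attn_out, syntax_out: (batch, seq_len, d_model)
        g = torch.sigmoid(self.gate_proj(torch.cat([attn_out, syntax_out], dim=-1)))
        # g near 1 -> rely on the syntax-aware stream; g near 0 -> vanilla attention.
        return g * syntax_out + (1.0 - g) * attn_out


if __name__ == "__main__":
    layer = GatedSyntaxFusion(d_model=512)
    x = torch.randn(2, 10, 512)  # vanilla self-attention output
    s = torch.randn(2, 10, 512)  # syntax-biased attention output
    print(layer(x, s).shape)     # torch.Size([2, 10, 512])
```

Because each layer learns its own gate, inspecting the gate values per layer gives an interpretable signal of where in the network external syntax is actually being used.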