T-Attention: Optimizing Attention Computation Using Temporal Parameter Time

Authors: Yichen Yang, Hongxu Hou, and Wei Chen
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 1308-1322
Keywords: Attention, Parametric Equations, Fourier Series, Function Projection.

Abstract

The attention mechanism exhibits remarkable capability in processing sequential data; however, its computational complexity scales quadratically with sequence length, resulting in significant resource demands. Numerous studies have achieved substantial success in leveraging sparse matrices to reduce the computational burden of dot-product operations, thereby improving both the computational efficiency and accuracy of models. Nevertheless, the question remains: can these computations be optimized further? In this paper, we introduce a novel approach based on function projection, integrating a restructured word embedding technique with the attention mechanism to alleviate computational overhead. We first validate the theoretical efficacy of designing word embeddings using parametric equations and demonstrate the effectiveness of our proposed embedding method. Subsequently, we conduct experiments across a variety of basis functions, illustrating that our approach affords greater flexibility in parameter selection while effectively reducing computational costs. Compared with state-of-the-art attention-based models, our method reduces inference time, underscoring its practical advantages.
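The abstract does not spell out the mechanics, but one way to see why parametric, Fourier-series embeddings can cut the cost of attention's score computation is the sum-to-product identity: if each token is assigned a scalar temporal parameter t and embedded as the basis vector [cos(t), sin(t), cos(2t), sin(2t), ...], then the dot product of two such embeddings collapses to a sum of cosines of the scalar difference between their parameters. The sketch below illustrates this under that assumption only; it is not the authors' implementation, and the names fourier_embedding, score_dot, and score_closed_form are hypothetical.

```python
import numpy as np

def fourier_embedding(t: float, num_terms: int) -> np.ndarray:
    """Embed a scalar temporal parameter t as a vector of Fourier
    basis functions [cos(t), sin(t), cos(2t), sin(2t), ...]."""
    k = np.arange(1, num_terms + 1)
    return np.concatenate([np.cos(k * t), np.sin(k * t)])

def score_dot(t_q: float, t_k: float, num_terms: int) -> float:
    """Attention score as an ordinary O(d) dot product of embeddings."""
    return fourier_embedding(t_q, num_terms) @ fourier_embedding(t_k, num_terms)

def score_closed_form(t_q: float, t_k: float, num_terms: int) -> float:
    """Same score via cos(k*t_q)cos(k*t_k) + sin(k*t_q)sin(k*t_k)
    = cos(k*(t_q - t_k)): the score depends only on the scalar
    difference t_q - t_k, so the full embedding vectors never need
    to be materialized."""
    k = np.arange(1, num_terms + 1)
    return float(np.cos(k * (t_q - t_k)).sum())

if __name__ == "__main__":
    t_q, t_k, m = 0.7, 2.1, 32
    assert np.isclose(score_dot(t_q, t_k, m), score_closed_form(t_q, t_k, m))
    print(score_dot(t_q, t_k, m), score_closed_form(t_q, t_k, m))
```

Both routes give the same score, but the closed form replaces a 2m-dimensional dot product with m cosine evaluations of one scalar, which hints at how projecting onto a fixed basis can shrink the arithmetic behind each attention score.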