High Utility Pattern Fusion by Pretrained Language Models for Text Classification

Authors: Yujia Wu, Hong Ren
Conference: ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Pages: 339-350
Keywords: Text Classification, Transformer Encoders, High Utility Pattern, Pre-trained Language Models

Abstract

In text classification, identifying correlation patterns among semantics remains a persistent challenge. To tackle this issue, we propose High Utility Pattern (HUP) fusion by Pretrained Language Models for Text Classification, which aims to improve text classification by learning correlation patterns among semantics within a shared embedding space. Specifically, HUP employs a Triplet Network architecture, which uses three distinct encoders to extract sample semantics, correlation-pattern information, and label semantics, respectively. We employ a high-utility itemset mining algorithm to extract correlation patterns of high utility, and by incorporating prompt templates into labels, the model can fully leverage the semantic knowledge embedded in pre-trained models. Finally, through joint training, the distance between a sample and its corresponding label is minimized, while the distance between the sample and labels not associated with it is maximized. Experiments on six standard text classification datasets show that HUP notably improves classification accuracy, with average accuracy increases ranging from 1.52% to 89.08%.
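To make the joint training objective concrete, here is a minimal PyTorch sketch of the triplet-style setup the abstract describes: the sample and pattern views are fused into the same space as the prompted-label embeddings, then a margin loss pulls each sample toward its own label and pushes it away from a non-associated one. This is an illustrative reconstruction, not the authors' released code; the `HUPFusion` module, the linear fusion layer, the cosine-distance margin loss, the 768-dimensional embeddings, and the margin of 0.5 are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HUPFusion(nn.Module):
    """Fuses the sample embedding with the mined-pattern embedding into the
    shared space where label embeddings live. In the paper each view comes
    from its own Transformer encoder; here the embeddings are taken as
    precomputed inputs."""

    def __init__(self, dim: int = 768):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)  # [sample; pattern] -> shared space

    def forward(self, sample_emb: torch.Tensor, pattern_emb: torch.Tensor) -> torch.Tensor:
        fused = self.fuse(torch.cat([sample_emb, pattern_emb], dim=-1))
        return F.normalize(fused, dim=-1)


def triplet_label_loss(fused: torch.Tensor,
                       pos_label: torch.Tensor,
                       neg_label: torch.Tensor,
                       margin: float = 0.5) -> torch.Tensor:
    """Minimize the distance between a sample and its own (prompted) label
    embedding while maximizing the distance to a non-associated label."""
    d_pos = 1.0 - F.cosine_similarity(fused, pos_label, dim=-1)
    d_neg = 1.0 - F.cosine_similarity(fused, neg_label, dim=-1)
    return F.relu(d_pos - d_neg + margin).mean()


if __name__ == "__main__":
    model = HUPFusion(dim=768)
    # Stand-ins for encoder outputs: sample text, HUP patterns, labels.
    sample, pattern = torch.randn(8, 768), torch.randn(8, 768)
    pos, neg = torch.randn(8, 768), torch.randn(8, 768)
    fused = model(sample, pattern)
    loss = triplet_label_loss(fused, F.normalize(pos, dim=-1), F.normalize(neg, dim=-1))
    print(f"triplet loss: {loss.item():.4f}")
```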