Mapping Cyber Threat Intelligence through Active Semi-Supervised Learning (ASSBM)

Authors: Sujie Shao, Zhiyi Li, Yan Liu, Shaoyong Guo, and Chao Yang
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 962-978
Keywords: CTI Mapping, Active Learning, SecureBERT, BERT.

Abstract

Cyber Threat Intelligence (CTI) plays a critical role in enhancing the implementation of cybersecurity programs by offering comprehensive information on attacks, which enables organizations to identify and respond to cyber threats more effectively. However, because most CTI data is presented in natural language and often contains ambiguous content, it requires interpretation and summarization by security experts before it can be used effectively. To address these challenges, this paper proposes a CTI mapping method based on active and semi-supervised SecureBERT, aimed at alleviating the scarcity of labeled data and the ambiguities inherent in the CTI mapping task. The method extracts potential attack stage information from CTI at minimal cost, ensuring accurate mapping even when labeled samples are insufficient. We introduce an active learning sampling strategy that integrates uncertainty and instance relevance, selecting the most representative samples from unlabeled data to augment the training set. This strategy enhances the interpretability of label-scarce and ambiguous CTI, facilitating precise mappings between ambiguous CTI and the corresponding phases of cyber attacks. Experiments on the CPTC and CCDC datasets show that the proposed method outperforms various baseline models, both as the quantity of labeled data varies and in comparison with different active learning algorithms. In situations where labeled CTI is limited, the proposed approach significantly improves the interpretive effectiveness of CTI, thereby enhancing the model's classification accuracy and training efficiency.
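The sampling strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: uncertainty is taken to be normalized prediction entropy, instance relevance is taken to be the mean cosine similarity of a sample's embedding to the other unlabeled samples, and the two are blended with a hypothetical `weight` parameter. The function names are likewise hypothetical and are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_for_labeling(probs, embeddings, k, weight=0.5):
    """Return indices of the k unlabeled samples with the highest combined score.

    Assumed scoring (not the paper's formula):
        score = weight * normalized_entropy + (1 - weight) * mean_similarity
    Assumes at least two classes, so the maximum entropy is non-zero.
    """
    n = len(probs)
    max_entropy = math.log(len(probs[0]))
    scores = []
    for i in range(n):
        # Uncertainty: how undecided the classifier is about sample i.
        uncertainty = entropy(probs[i]) / max_entropy
        # Relevance: how similar sample i is to the rest of the unlabeled pool.
        others = [cosine(embeddings[i], embeddings[j]) for j in range(n) if j != i]
        relevance = sum(others) / len(others) if others else 0.0
        scores.append(weight * uncertainty + (1 - weight) * relevance)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]


# Toy pool: three unlabeled samples, three attack-stage classes, 2-d embeddings.
pool_probs = [[0.98, 0.01, 0.01], [0.34, 0.33, 0.33], [0.5, 0.4, 0.1]]
pool_embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
chosen = select_for_labeling(pool_probs, pool_embeddings, k=2)
```

In a setup like the one the abstract describes, the class probabilities and embeddings would presumably come from the SecureBERT classifier, and the selected samples would be sent to an annotator before the next training round. Blending the two terms keeps the selection from favoring outliers that are uncertain but unrepresentative of the pool.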