GCT-Net: A malicious Android application detection method based on multimodal feature fusion

Authors: Yuheng Huang, Weihao Huang, Chunhong Jiang, Song Xie, and Hongsong Wang
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 1773-1790
Keywords: Android Malware Detection · Multimodal Learning · Graph Neural Network · Tree-Structured LSTM

Abstract

Despite the persistent threat of malicious Android applications, many existing detection methods struggle to effectively integrate and analyze the heterogeneous threat indicators embedded within APKs. This fragmented analysis often fails to capture the complex interplay between different threat vectors. To address this challenge, we propose GCT-Net (GNN-CNN-Tree-LSTM-Net), a novel deep learning framework that synergistically fuses graph, convolutional, and tree-structured features for unified malware detection. We first disassemble APKs via reverse engineering to derive three critical modalities: API call sequences, modeled as directed graphs and processed by a Graph Neural Network (GNN) to capture semantic dependencies; binary code, converted into greyscale images and analyzed by a two-layer Convolutional Neural Network (2D-CNN) to detect spatial malware patterns; and URL strings, parsed into syntax trees and encoded by a hierarchical Tree-LSTM network to learn structural embeddings. These modality-specific features are adaptively integrated through three dense layers. Evaluated on two datasets, GCT-Net achieves state-of-the-art performance with 96.48%/93.75% accuracy, 97.62%/96.45% precision, 97.05%/95.40% recall, and 97.33%/95.42% F1-score, outperforming other models. Ablation studies confirm the critical contributions of all three modalities and validate the fusion efficacy, establishing a new method for multimodal malware analysis.
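
As an illustration of the image modality only, the conversion of binary code into a greyscale image can be sketched as below. This is a minimal sketch and not the authors' implementation: the abstract does not specify the image width or how a short final row is handled, so the function name `bytes_to_greyscale`, the default width of 16, and the zero padding are all assumptions made for the example. Each byte is treated as one pixel intensity in the range 0 to 255.

```python
from typing import List


def bytes_to_greyscale(data: bytes, width: int = 16) -> List[List[int]]:
    """Arrange raw bytes row by row into a 2D grid of pixel intensities (0-255).

    Hypothetical helper: the width and the zero padding are assumptions,
    not details taken from the paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Pad the tail with zero bytes so that the last row is complete.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    # Slice the padded byte string into rows of `width` pixels each.
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


if __name__ == "__main__":
    # 20 arbitrary bytes become a 3 x 8 image; the last row is zero padded.
    image = bytes_to_greyscale(bytes(range(20)), width=8)
    for row in image:
        print(row)
```

The resulting grid could then be passed to a 2D convolutional network as a single-channel image; a fixed width keeps byte offsets vertically aligned across rows.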