- YOLOCrane: An Enhanced YOLOv8-Based Algorithm for Robust Crane Detection in Transmission Line Scenarios, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xiaolong Wang, Bo Jiang, Yanwei Zhang, Genyi Wang, and Guocheng An
Abstract: Crane detection under transmission lines, crucial for power system safety monitoring, faces challenges in accuracy and generalization, particularly in environments with structural interference (e.g., utility poles) and dynamic vegetation occlusion. To address these issues, we propose YOLOCrane, an enhanced YOLOv8-based algorithm. First, a simplified backbone network reduces computational complexity by 33% while maintaining robust feature extraction. Second, by incorporating learnable mask parameters and multi-channel fusion, the channel-wise LBP algorithm adaptively extracts texture features across dimensions, addressing the limitations of traditional LBP in fixed 3×3 windows. Finally, a heterogeneous dual-branch attention fusion module integrates convolutional features with LBP texture patterns, enabling complementary learning of spatial and texture information. Experimental results on the CraneLine dataset demonstrate that YOLOCrane achieves an mAP0.5 of 85.8%, surpassing YOLOv8x by 2.4%, YOLOv11x by 2.1%, and RT-DETRx by 1.7%, while improving inference speed by 9.72 FPS. These advancements underscore YOLOCrane's capability to tackle detection challenges in complex environments, providing a robust solution for real-time safety monitoring of transmission lines.
Keyword: Crane detection under transmission lines, simplified backbone (SB), channel-wise LBP, dual-branch attention fusion (DAF)
Cite@inproceedings{ICIC2025,
author = {Xiaolong Wang and Bo Jiang and Yanwei Zhang and Genyi Wang and Guocheng An},
title = {YOLOCrane: An Enhanced YOLOv8-Based Algorithm for Robust Crane Detection in Transmission Line Scenarios},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {162-173},
note = {Poster Volume Ⅰ}
}
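For readers unfamiliar with the texture descriptor that the YOLOCrane abstract builds on, the following is a minimal sketch of the classic fixed 3×3 local binary pattern (LBP), not the paper's learnable channel-wise variant. The function name `lbp_3x3` and the clockwise bit ordering are illustrative assumptions.

```python
def lbp_3x3(image):
    """Classic 3x3 LBP codes for the interior pixels of a 2D grayscale image.

    `image` is a list of equal-length rows of numbers. Neighbors are visited
    clockwise from the top-left; that ordering is a convention chosen here.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    height, width = len(image), len(image[0])
    codes = []
    for r in range(1, height - 1):
        row_codes = []
        for c in range(1, width - 1):
            center = image[r][c]
            code = 0
            # Set one bit per neighbor that is at least as bright as the center.
            for bit, (dr, dc) in enumerate(offsets):
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes
```

Each interior pixel is reduced to one 8-bit code, which is why the window size is fixed at 3×3 in the traditional formulation.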
- Low-Light Gaze Estimation for Fine-grained Intelligent Classroom Behavior Monitoring, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jin Wang, Meng Chen, and Dandan Wang
Abstract: Computer Vision (CV) technology is crucial for intelligent classroom behavior monitoring. Current CV methods can only measure coarse-grained metrics like attendance rate and head-up rate, while gaze estimation enables fine-grained monitoring of each student. However, existing gaze estimation algorithms struggle in low-light classroom environments. To address this, we propose the LLSGE-Net framework, which integrates low-light image enhancement with gaze estimation. This multi-stage enhancement and calibration process significantly improves image quality. Our method utilizes Local-Global Context Fusion (ALGCF) for better eye and face feature integration, and a feature enhancement technique combining 1D convolution and group normalization. The Enhanced Local Spatial and Global Channel Attention (ELSCA) improves the localization of regions of interest, while the Deep Feature Extraction Network (DFENet) refines high-level features. Extensive experiments demonstrate the superiority of our approach in real-world low-light classroom scenarios for student attention detection and behavior monitoring.
Keyword: Gaze Estimation, Classroom Behavior Monitoring, Local-Global Context Fusion, Attention Mechanism, Deep Feature Extraction Network.
Cite@inproceedings{ICIC2025,
author = {Jin Wang and Meng Chen and Dandan Wang},
title = {Low-Light Gaze Estimation for Fine-grained Intelligent Classroom Behavior Monitoring},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2641-2658},
}
- FDML: An Improved Few-Shot Fault Detection Method for Transmission Lines Based on Meta-Learning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yanwei Zhang, Bocheng Huang, Guocheng An, and Xiaolong Wang
Abstract: In the task of fault detection in power transmission lines, certain fault categories suffer from insufficient samples, leading to inefficiencies in traditional object detection algorithms. Meta-learning, which employs multi-task learning and fine-tuning to extract common features across different tasks, performs well in few-shot object detection and demonstrates excellent generalization capabilities for new tasks. For this reason, an improved few-shot fault detection method based on meta-learning (FDML) is proposed in this paper. Firstly, to address the problem of distribution difference in data domains, a two-stage meta-learning training method is introduced to achieve model migration through meta fine-tuning. Secondly, we propose a support and query feature matching module (SQFM) to make the most of support features to assist detection, in which prototype features of the support class are accurately extracted in the first three stages of the backbone and then assigned to the query set features to highlight the class-specific representative features. To further integrate high-level features before model prediction, a high-level semantic feature fusion module (HSFF) is designed to fuse RoI features and prototype features by combining four feature fusion methods. Experimental results show that FDML effectively improves the few-shot object detection accuracy on the public dataset PASCAL VOC and the fault dataset InsPLAD-fault, compared to the classic few-shot algorithms. Under the conditions of K = {5, 10, 20} shots in the fault dataset InsPLAD-fault, the mAP50 values are respectively 5.7%, 7.2%, and 4.2% higher than the baseline network, which provides a solution for few-shot transmission line fault detection.
Keyword: transmission lines, fault detection, few-shot, meta learning, SQFM, HSFF
Cite@inproceedings{ICIC2025,
author = {Yanwei Zhang and Bocheng Huang and Guocheng An and Xiaolong Wang},
title = {FDML: An Improved Few-Shot Fault Detection Method for Transmission Lines Based on Meta-Learning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {174-191},
note = {Poster Volume Ⅰ}
}
- A Deep Learning Model for 3D Skeleton-Based Temporal Hand Gesture Localization, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jingtao Chen, Jieyu Zhao, and Kedi Shen
Abstract: Temporal action localization has recently garnered widespread attention. Current studies on temporal action localization mostly focus on 2D temporal action localization for the human body, with little research on more complex hand gestures. 2D-based temporal hand gesture localization has notable drawbacks, such as motion ambiguity and difficulty in handling complex gestures. To address these issues, we designed a 3D skeleton-based temporal action localization model, which processes 3D hand skeleton sequences and has a more robust capacity for learning and representing complex hand gestures. The model includes a backbone network, a temporal action localization module, and a classification head. The GCN-based backbone network is responsible for feature extraction from the skeleton sequence. The temporal action localization module fuses multi-scale temporal features through a pyramid structure, then performs action localization. The classification head predicts the action category. Additionally, we designed a new loss function to guide the model's learning. We validated our model on a self-constructed 3D hand skeleton dataset, and the results show that our model demonstrates good performance in temporal hand gesture localization.
Keyword: Temporal action localization, hand gesture, skeleton, multi-view.
Cite@inproceedings{ICIC2025,
author = {Jingtao Chen and Jieyu Zhao and Kedi Shen},
title = {A Deep Learning Model for 3D Skeleton-Based Temporal Hand Gesture Localization},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2656-2669},
}
- A New Method of Exploring the Parameters of Heston Option Pricing Model: Multi-Population Genetic Algorithm, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yan Fang, Chang Guan, and Julius Wu
Abstract: The Heston option pricing model is fundamental for valuing financial derivatives. This paper enhances parameter estimation for the Heston model by extending the traditional Genetic Algorithm (GA). While GA is effective for optimization, it often suffers from slow convergence and local minima. To address these issues, we introduce a multi-population Genetic Algorithm (MPGA), which improves convergence stability and preserves the optimal solution. Empirical analysis of Shanghai 50ETF and Hang Seng Index options demonstrates that: (1) MPGA is an effective and reliable method for parameter estimation in the Heston model; (2) the ask-bid weighting scheme significantly outperforms equal weighting in this context; (3) the Heston model calibrated with MPGA achieves higher accuracy compared to traditional approaches; and (4) the proposed method performs effectively in both developed and developing markets.
Keyword: Heston Model, Multi-Population Genetic Algorithm, Option Pricing, SSE 50ETF Options, Hang Seng Index Options.
Cite@inproceedings{ICIC2025,
author = {Yan Fang and Chang Guan and Julius Wu},
title = {A New Method of Exploring the Parameters of Heston Option Pricing Model: Multi-Population Genetic Algorithm},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2486-2503},
}
- A fast multi-source target recognition system for Dangshan pear based on lightweight “graph neural network - YOLOv5s”, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Kaijie Zhang, Chao Wang, Xin Liu, Xiaoyong Yu, Yingying Wang, Dejun Li, and Kangjian Zhang
Abstract: China's Anhui Dangshan pear has a sweet and creamy taste and is a favorite among consumers. However, it faces low economic added value and weak competitiveness, mainly due to backward post-production quality detection, grading, and information management technology. To address these problems, the project integrates graph neural networks and YOLOv5s to construct a multi-source image detection system for the fruit under test. The system uses graph neural networks to complete the comprehensive inversion of physicochemical properties, appearance, and other quality parameters, and exploits the small memory footprint and fast recognition rate of YOLOv5s to accelerate identification of the target fruit. Experimental results show that in batch image recognition, the average recognition time for a single image is about 0.02 seconds, and the recognition accuracy reaches 99.41%. At the same time, to ensure fast and robust operation of the production line, a Dangshan pear multi-source information recognition system was developed, steadily promoting the quality and added value of the fruit industry and the rapid development of the regional economy.
Keyword: Graph neural network, YOLOv5s, Rapid detection, Dangshan pear.
Cite@inproceedings{ICIC2025,
author = {Kaijie Zhang and Chao Wang and Xin Liu and Xiaoyong Yu and Yingying Wang and Dejun Li and Kangjian Zhang},
title = {A fast multi-source target recognition system for Dangshan pear based on lightweight “graph neural network - YOLOv5s”},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1912-1923},
note = {Poster Volume Ⅱ}
}
- Tor Traffic Classification Based on Burst Features, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Ding Li, Yi Pan, and Yinlong Xu
Abstract: The classification of Tor traffic is of crucial importance in the identification of anonymous web applications and the defense against cybercrime. Previous studies have focused on the automatic extraction of raw traffic features by means of deep learning algorithms. However, these methods have neglected the global intrinsic relationship between local features at different data locations, which has resulted in limited classification performance. To address this, a darknet traffic classification method based on burst feature aggregation, called burst matrix, is proposed. The proposed method aggregates the temporal and length features of Tor traffic in terms of bursts, then captures local spatio-temporal features from the burst matrix using convolutional neural networks. The intrinsic relationships and hidden connections between the previously extracted spatio-temporal features are then mined using the self-attention mechanism. The efficacy of the burst matrix method is evaluated using the ISCXTor2016 dataset. The experimental results demonstrate that the burst matrix significantly outperforms other contemporary methods, attaining an F1-score of over 95%.
Keyword: Network Security, Encrypted Traffic Identification, Dark Web, Onion Routing, Deep Learning.
Cite@inproceedings{ICIC2025,
author = {Ding Li and Yi Pan and Yinlong Xu},
title = {Tor Traffic Classification Based on Burst Features},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {771-783},
note = {Poster Volume Ⅰ}
}
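The burst aggregation step described in the Tor abstract above can be pictured with a small sketch. It assumes a burst is a maximal run of consecutive packets travelling in the same direction and summarizes each run by packet count, total bytes, and duration. The paper's actual burst matrix layout is not given in the abstract, so the `Packet` fields and the `to_bursts` helper are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Packet:
    timestamp: float  # seconds since the start of the flow
    size: int         # payload length in bytes
    outgoing: bool    # True for client-to-server, False for the reverse


def to_bursts(packets):
    """Group consecutive same-direction packets into burst summaries.

    Returns a list of (direction, packet_count, total_bytes, duration) rows,
    where direction is +1 for outgoing and -1 for incoming.
    """
    rows = []
    for outgoing, run in groupby(packets, key=lambda p: p.outgoing):
        run = list(run)
        direction = 1 if outgoing else -1
        total_bytes = sum(p.size for p in run)
        duration = run[-1].timestamp - run[0].timestamp
        rows.append((direction, len(run), total_bytes, duration))
    return rows
```

Stacking these rows (padded or truncated to a fixed length) would give a 2D array that a convolutional network could consume, which is one plausible reading of "burst matrix".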
- DCNLLMs: Deep CTR Prediction with LLMs for Enhanced LTL Freight Matching, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chunhu Bian, Fuyuan Liu, Yuxuan Guo, Dezheng Ji, Jinyue Liu, and Xiaohui Jia
Abstract: The logistics sector, particularly less-than-truckload (LTL) freight, is undergoing rapid development, with recommendation systems becoming increasingly crucial for optimizing operational efficiency. While deep learning and large language models (LLMs) have revolutionized recommendation systems across various domains, their application in LTL freight matching remains underexplored, with traditional methods still prevalent. To address this gap, this paper introduces DCNLLMs, a novel system designed for predicting click-through rates (CTR) in LTL cargo-vehicle matching scenarios. DCNLLMs leverages the extensive knowledge base of LLMs to provide expert-level recommendations. A key contribution is a specifically designed fine-tuning framework that aligns CTR prediction with the inherent knowledge of the LLM, significantly enhancing recommendation accuracy and relevance in the LTL logistics context. Comprehensive experiments comparing DCNLLMs with multiple state-of-the-art recommendation models demonstrate the superior effectiveness of our proposed approach. These findings not only validate the efficacy of DCNLLMs but also highlight its transformative potential in innovating LTL freight matching, paving the way for more efficient and intelligent logistics operations.
Keyword: DCNV3, Recommendation, Large language model, less-than-truckload
Cite@inproceedings{ICIC2025,
author = {Chunhu Bian and Fuyuan Liu and Yuxuan Guo and Dezheng Ji and Jinyue Liu and Xiaohui Jia},
title = {DCNLLMs: Deep CTR Prediction with LLMs for Enhanced LTL Freight Matching},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {11-22},
}
- CMUNet: Enhancing Image Segmentation through Advanced Channel Dependency Modeling, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Sizhe Yang, Yutao Qin, and Wei Ren
Abstract: Image segmentation aims to partition an image into meaningful regions, facilitating the analysis of its structure and content. U-shaped models inspired by UNet have become the dominant architecture in this field, leveraging an encoder-decoder design with skip connections to retain spatial details. Within this framework, CNNs and Transformers have achieved remarkable success but still suffer from either limited receptive fields or the high computational cost of long-range modeling. Recently, Mamba has emerged as a powerful approach for modeling long-range dependencies with linear complexity. However, existing efforts to integrate Mamba into U-Net primarily emphasize spatial feature extraction, largely overlooking the intricate inter-channel relationships encapsulating diverse semantic patterns. In this paper, we propose Channel Mamba UNet (CMUNet), which explicitly captures channel dependencies with two key components: Channel Mamba (CMamba), a module that adaptively recalibrates channel-wise features in the encoder, and Skip Connection Mamba (SkiM), a mechanism that facilitates multi-level channel fusion to bridge the semantic gap between the encoder and decoder. Comprehensive evaluations on MoNuSeg, GlaS, and ISIC-2018 confirm the effectiveness of CMUNet, achieving Dice scores of 81.28, 92.18, and 91.13, along with superior IoU and HD95 metrics.
Keyword: Image Segmentation, Channel-wise Modeling, UNet, Mamba
Cite@inproceedings{ICIC2025,
author = {Sizhe Yang and Yutao Qin and Wei Ren},
title = {CMUNet: Enhancing Image Segmentation through Advanced Channel Dependency Modeling},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {192-204},
note = {Poster Volume Ⅰ}
}
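The Dice and IoU figures quoted in the CMUNet abstract are overlap metrics between a predicted mask and a ground-truth mask. A minimal sketch for flat binary masks follows; the function names and the convention of returning 1.0 when both masks are empty are assumptions made here, not details from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for flat binary masks (0/1)."""
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total


def iou_score(pred, target):
    """Intersection over union |A∩B| / |A∪B| for flat binary masks (0/1)."""
    intersection = sum(p and t for p, t in zip(pred, target))
    union = sum(p or t for p, t in zip(pred, target))
    return 1.0 if union == 0 else intersection / union
```

Both functions return a fraction in [0, 1]; published results such as those above are typically this value multiplied by 100.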
- A Method that Utilizes Negative Queries in Object Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jue Zhou, Hong Chen, and Qingling Zhao
Abstract: DEtection TRansformer (DETR) is an end-to-end object detection model based on transformers that generates multiple object queries per ground truth box and selects the best prediction through matching. However, studies have shown that many object queries rejected by the Hungarian matching algorithm still focus on foreground elements. Leveraging these potentially useful negative queries in DETR has thus emerged as a promising research direction. In this paper, we propose FUSION-DETR, a novel model that introduces a one-to-many matching strategy by grouping queries while preserving DETR's traditional one-to-one matching. This approach utilizes the foreground information embedded in queries rejected by Hungarian matching as negative samples. Furthermore, during training, the model dynamically assigns weights to the one-to-many loss using a clustering-based method, enhancing its robustness. Experiments demonstrate that the FUSION-DETR approach improves Deformable-DETR by 3.0 mAP50 on the BDD-100K dataset and achieves 70.5 mAP50 on COCO2017, outperforming existing DETR-based models incorporating one-to-many assignment.
Keyword: object recognition, neural networks, deep learning
Cite@inproceedings{ICIC2025,
author = {Jue Zhou and Hong Chen and Qingling Zhao},
title = {A Method that Utilizes Negative Queries in Object Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1261-1272},
note = {Poster Volume Ⅱ}
}
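To make the notion of "negative queries" in the abstract above concrete: in DETR-style one-to-one matching, each ground-truth box is assigned exactly one query by minimizing the total matching cost, and every other query is left unmatched. The sketch below uses a brute-force search over permutations as a stand-in for the Hungarian algorithm (same optimum on small inputs, but it does not scale); the cost matrix and the function name `match_queries` are illustrative assumptions.

```python
from itertools import permutations


def match_queries(cost):
    """One-to-one matching between queries (rows) and ground truths (columns).

    `cost[q][g]` is the matching cost of query q against ground truth g, with
    at least as many queries as ground truths. Returns (matched, negatives),
    where `matched` maps each ground-truth index to its query index and
    `negatives` lists the queries left unmatched.
    """
    num_queries, num_gt = len(cost), len(cost[0])
    best_total, best_assignment = float("inf"), None
    # Try every way of picking one distinct query per ground-truth box.
    for chosen in permutations(range(num_queries), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(chosen))
        if total < best_total:
            best_total, best_assignment = total, chosen
    matched = {g: q for g, q in enumerate(best_assignment)}
    negatives = [q for q in range(num_queries) if q not in best_assignment]
    return matched, negatives
```

The `negatives` list is the pool of rejected queries that a one-to-many strategy could draw on.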
- GFR: An Effective Plugin for Enhancing the ECG Classification Capability of Models, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xunde Dong, Yupeng Qiang, Xiuling Liu, Yang Yang, Yihai Fang, and Jianhong Dou
Abstract: Electrocardiogram (ECG) serves as a crucial non-invasive diagnostic tool for monitoring clinical cardiac conditions. Remarkable progress has been achieved in deep learning-based ECG classification research. Generally, its overall architecture can be divided into three parts: the feature extraction layer, the feature fusion methods (typically concatenation and summation), and the multi-layer perceptron (MLP) classification layer. In this paper, we propose a plugin Global Feature Refinement (GFR) module to enhance the performance of multi-branch models for ECG classification. The GFR plugin assigns weights to different branching features in a dynamic disease-aware manner to capture critical global information while emphasizing important features. Specifically, these dynamic weights are obtained through the integration, mapping, and scaling of global features. Finally, the weighted features are summed for ECG classification. Extensive experiments on three large-scale imbalanced datasets demonstrate that the GFR plugin, with fewer than 6.2k additional parameters, improves the performance of eight models of different sizes to varying degrees. Specifically, the maximum improvement in F1 score and accuracy was 8.27% and 6.41%, respectively.
Keyword: electrocardiogram (ECG) classification, feature refinement, multi-branch networks
Cite@inproceedings{ICIC2025,
author = {Xunde Dong and Yupeng Qiang and Xiuling Liu and Yang Yang and Yihai Fang and Jianhong Dou},
title = {GFR: An Effective Plugin for Enhancing the ECG Classification Capability of Models},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1788-1803},
note = {Poster Volume Ⅱ}
}
- PRLL: Policy Regularization and Reward Shaping Assisted by Large Language Models, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Qianxia Zheng, Xiangfeng Luo, and Tao Wang
Abstract: Through continuous exploration and repeated trials, reinforcement learning (RL) enables agents to learn the optimal strategy, acquiring a certain level of behavioral intelligence. However, in complex and dynamically changing real-world environments, the state and action spaces grow significantly larger. This implies that agents need to explore the environment more extensively to identify viable solutions. Unfortunately, such repetitive and inefficient exploration often leads to increased training time, higher costs, and greater risks. Several methods have emerged that use prior knowledge from large language models (LLMs) to assist RL training, but many of these approaches do not consider the issue of low sample efficiency. To address these challenges, we propose Policy Regularization and reward shaping assisted by Large Language models (PRLL). Firstly, PRLL calculates the similarity between LLM-generated suggestions and the agent's actions, using this as a regularization term to constrain the agent's exploration direction. Secondly, to efficiently align the agent's behavior with human preferences, PRLL employs LLMs to evaluate the alignment between the agent's actions and human values, translating this evaluation into an intrinsic reward signal. Experiments in both discrete and continuous action spaces demonstrate that PRLL outperforms most baseline methods while requiring fewer training time steps.
Keyword: Deep reinforcement learning, Large Language Models, Policy Regularization, Reward Shaping
Cite@inproceedings{ICIC2025,
author = {Qianxia Zheng and Xiangfeng Luo and Tao Wang},
title = {PRLL: Policy Regularization and Reward Shaping Assisted by Large Language Models},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {306-320},
}
- CP-Xception: A Lightweight Facial Expression Recognition Model For AI Companions, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xiaomeng Zhang and Jianhua Cao
Abstract: In this paper, we propose CP-Xception, a lightweight facial expression recognition model for AI companions, which is based on Mini-Xception and features a small number of parameters, fast inference speed, and high recognition accuracy. Specifically, we integrate the feature segmentation concept from CSPNet into the model, splitting the input features into primary and secondary paths to extract deep and shallow features, respectively. We also incorporate the ParC module into the last two feature extraction stages of the backbone network. This enhancement enables the model to effectively capture both local details and global contextual information. The CP-Xception model is trained and evaluated on four public datasets: FER2013, FER2013Plus, CK+, and JAFFE. The results show that the CP-Xception model achieves recognition accuracy improvements of 2.16%, 3.37%, and 4.31% over the Mini-Xception model on the FER2013, CK+, and JAFFE datasets, respectively. CP-Xception has only 30,149 parameters and 3.527 MFLOPs, approximately 50% of those of the Mini-Xception model, which makes the model more lightweight while also ensuring fast inference speed. We have deployed the model to the companion terminal for practical testing and observed satisfactory performance.
Keyword: facial expression recognition, CP-Xception model, AI companions, deep learning
Cite@inproceedings{ICIC2025,
author = {Xiaomeng Zhang and Jianhua Cao},
title = {CP-Xception: A Lightweight Facial Expression Recognition Model For AI Companions},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2672-2684},
}
- A Neural Network Framework Based on Symmetric Differential Equations, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Kun Jiang
Abstract: Modern mathematical neural networks are derived from biological neural networks, yet the currently popular general large models do not incorporate biological neural networks. The primary reason for this is that the differential equations based on biological neural networks are difficult to manipulate. At present, mathematical neural networks are characterized by their capacity for large-scale deployment, while biological neural networks offer strong biological interpretability. This paper introduces a system of differential equations with perfect symmetry and convenient manipulability, enabling us to manipulate this system as easily as we manipulate numbers in a matrix, thus integrating the advantages of both. As we are introducing a brand-new neural network framework, we first explore the mathematical properties of the differential equations, then define a new signal propagation method, and finally propose a new training approach for the neural network. The training of this new neural network does not rely on the traditional back-propagation algorithm; instead, it depends solely on the propagation of local signals. This implies that we no longer require global information to train the network. Each neuron can adjust based on the signals it receives and its predetermined strategy. As a verification, we mimicked the linking method of a multilayer perceptron (MLP) to create a new neural network and trained it on the MNIST dataset, demonstrating the effectiveness of our methodology.
Keyword: Symmetric differential equations, Fixed point, Multilayer perceptron, Neural network, Backward propagation.
Cite@inproceedings{ICIC2025,
author = {Kun Jiang},
title = {A Neural Network Framework Based on Symmetric Differential Equations},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1277-1292},
note = {Poster Volume Ⅱ}
}
- MetaCleaner: A Deep Neural Network for Phage Recognition with Denoising, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yingqi Liu, Yong Wang, and Ying Wang
Abstract: At present, phage genome sequence recognition models based on deep learning face two problems: contamination by human genome sequences and noise interference. To address these problems, we propose MetaCleaner, a phage genome sequence recognition model. MetaCleaner uses the k-mer count as the classification basis for genome sequences, and uses a parallel convolution filter and average pooling method to extract the k-mer count features of genome sequences. The denoising module, implemented with the transformer architecture, predicts the difference between the k-mer count feature of the noisy sequence and that of the noise-free sequence, and the denoising operation is completed by subtracting this difference from the k-mer count feature of the noisy sequence. Finally, the denoised k-mer count feature is input into the fully connected layer to obtain the probability of the sequence belonging to phage and human. Our experiments on test sets with noise show that MetaCleaner is robust to noise, and experiments on real metagenomic datasets show that MetaCleaner outperforms recently proposed phage recognition models.
Keyword: Deep learning · Metagenomics · Denoising · phage identification · Noise.
Cite@inproceedings{ICIC2025,
author = {Yingqi Liu and Yong Wang and Ying Wang},
title = {MetaCleaner: A Deep Neural Network for Phage Recognition with Denoising},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1168-1185},
}
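The k-mer count features that MetaCleaner takes as its classification basis can be illustrated with a short sketch. It counts overlapping k-mers over the A/C/G/T alphabet and ignores windows containing any other symbol. The function name, the default `k=3`, and the handling of ambiguous bases are assumptions for illustration; the paper's convolution-based extraction and denoising module are not reproduced here.

```python
from collections import Counter
from itertools import product


def kmer_count_vector(sequence, k=3):
    """Return overlapping k-mer counts as a fixed-length vector.

    The vector has 4**k entries ordered lexicographically over "ACGT".
    Windows containing any other character (for example "N") are ignored
    because they never match a vocabulary entry.
    """
    sequence = sequence.upper()
    counts = Counter(
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
    )
    vocabulary = ("".join(letters) for letters in product("ACGT", repeat=k))
    return [counts.get(kmer, 0) for kmer in vocabulary]
```

A fixed-length vector like this gives every sequence the same input shape regardless of its length, which is what makes it usable as input to a fully connected layer.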
- Hybrid Prototype Contrastive Learning with Cross-Attention for Few-Shot Relation Classification, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zeyu Zhang, Shaowei Wang, Nana Bu, Junzhe Zhang, and Yuanyuan Xiao
Abstract: Few-shot relation classification (FSRC) aims to identify the relation class between entities in a text with a small amount of labeled data. Recently, some studies have focused on optimizing prototype representations by incorporating relation information into the prototype network or applying contrastive learning to alleviate the prediction confusion problem. However, these approaches primarily rely on global instance features and relation information, making it difficult to capture fine-grained local semantic information, leading to misjudgment of abnormal samples and confusion of similar classes. To address these problems, we introduce a novel hybrid prototype contrastive learning (HPCL) model. Dynamically fusing global and local prototypes through a cross-attention mechanism significantly improves the performance of few-shot relation classification. In addition, HPCL combines a dual contrastive learning strategy (relation-prototype contrastive learning and query-prototype contrastive learning) to effectively enhance intra-class feature sharing and inter-class feature discriminability by optimizing prototype representation. We have conducted extensive experiments on the public datasets FewRel 1.0 and FewRel 2.0, and the results show that HPCL not only performs well on traditional datasets but also demonstrates a strong generalization ability in cross-domain adaptation tasks, which can effectively alleviate the challenges brought by data scarcity and insufficient relation description.
Keyword: Few-shot relation classification, Prototype network, Relation information, Cross-attention mechanism, Contrastive learning.
Cite@inproceedings{ICIC2025,
author = {Zeyu Zhang and Shaowei Wang and Nana Bu and Junzhe Zhang and Yuanyuan Xiao},
title = {Hybrid Prototype Contrastive Learning with Cross-Attention for Few-Shot Relation Classification},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1928-1943},
note = {Poster Volume Ⅱ}
}
- Research on Confusing Entity Linking Method Based on Graph Neural Network, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zhen Zhang, Jiaxing Fan, Jing Wang, Yemao Zhang, Lingnan Bai, Zhe Xu, Ruiyao Han and Jun Yu
Abstract: Entity Linking (EL) is a fundamental task in Natural Language Processing (NLP), aiming to accurately map entity mentions in text to their corresponding entities in a knowledge base, thereby bridging unstructured text with structured knowledge. EL plays a crucial role in various applications, including information retrieval, knowledge graph construction, and question answering. With the rapid advancement of deep learning, entity linking methods based on pre-trained language models (PLMs) have made remarkable progress, particularly in terms of semantic representation and context understanding. However, these methods still face challenges when dealing with ambiguous candidate entities, such as homonyms, synonyms, or cases with insufficient contextual information, which often lead to incorrect disambiguation. To address this issue, this paper proposes a Confusing Entity Linking model based on Graph Neural Networks (CEL-GNN), which leverages graph structures to capture subtle differences between candidate entity descriptions, thereby enhancing the accuracy and robustness of entity linking. The proposed model first employs a BERT-based encoding layer to generate representations for both short texts and candidate entity descriptions. It then applies the TF-IDF method to extract keywords and construct a knowledge graph. Subsequently, a Graph Distillation Operator (GDO) is introduced to extract distinguishable features, further improving the disambiguation performance. Experimental results demonstrate that the proposed approach achieves outstanding performance on the CCKS2020 Chinese short-text entity linking benchmark. Compared to the baseline BERT model, our method achieves an F1 score of 88.9, significantly improving entity linking effectiveness.
Keyword: Entity Linking, Graph Neural Network, BERT.
Cite@inproceedings{ICIC2025,
author = {Zhen Zhang and Jiaxing Fan and Jing Wang and Yemao Zhang and Lingnan Bai and Zhe Xu and Ruiyao Han and Jun Yu},
title = {Research on Confusing Entity Linking Method Based on Graph Neural Network},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3679-3691},
}
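The TF-IDF keyword extraction step mentioned in the abstract above can be illustrated with a short sketch. It assumes pre-tokenized descriptions, the plain `log(N / df)` form of IDF, and a hypothetical `top_k` cutoff; the paper's actual tokenization, weighting variant, and graph construction are not specified in the abstract.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=2):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by score (descending), then alphabetically for a stable order.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results

# Two made-up candidate descriptions for the ambiguous mention "apple".
docs = [
    ["apple", "company", "iphone", "company"],
    ["apple", "fruit", "tree", "fruit"],
]
keywords = tfidf_keywords(docs)
```

The shared term "apple" scores zero because it appears in every description, so the surviving keywords are the ones that distinguish the candidates from each other.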
- Bearing Remaining Useful Life Prediction via Multi-Scale Convolution and Bidirectional Gated Recurrent Unit Network, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jian Li, Rangyong Zhang, Hu Liang, and Yiming Zhang
Abstract: Accurate remaining useful life (RUL) prediction of rolling bearings plays a vital role in industrial predictive maintenance. Nevertheless, current approaches fail to effectively extract multi-scale degradation features in noisy environments, resulting in significant prediction inaccuracies. We propose a Multi-Scale Convolutional Bidirectional Gated Recurrent Unit (MSCNN-BiGRU) network for bearing remaining useful life prediction. First, raw vibration signals undergo deep feature extraction via a Stacked Denoising Autoencoder (SDAE), followed by dimensionality reduction using a Hierarchical Self-Organizing Map (HSOM) to generate a 1D degradation curve (DC). A Multi-Scale Convolution module is then constructed, incorporating 1D dilated convolution and a multi-scale strategy to extract degradation features from the DC, enabling the simultaneous capture of localized defects and global trend patterns. Finally, an attention layer is integrated at the feature input stage, combined with a GRU to construct a Bidirectional GRU (BiGRU) prediction model, which dynamically weights critical temporal dependencies for accurate RUL estimation. Experiments on the PHM2012 dataset show that MAE is reduced by an average of 18.7% compared to sub-optimal models. This work provides a generalizable framework for RUL prediction of rotating machinery, enhancing the reliability of industrial maintenance systems.
Keyword: Remaining Useful Life, Rolling bearings, Feature Extraction, Degradation Curve, Bidirectional Gated Recurrent Unit.
Cite@inproceedings{ICIC2025,
author = {Jian Li and Rangyong Zhang and Hu Liang and Yiming Zhang},
title = {Bearing Remaining Useful Life Prediction via Multi-Scale Convolution and Bidirectional Gated Recurrent Unit Network},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1292-1303},
note = {Poster Volume Ⅱ}
}
- Enhancing Fire and Smoke Segmentation with Cross-Attention Side-Adapter Network, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yuyang Deng, Xiying Luan, and Fan Zhong
Abstract: Fire and smoke segmentation is crucial for disaster management and emergency response. Existing fire and smoke segmentation methods predominantly rely on conventional deep learning models such as U-Net. However, a key challenge in fire and smoke segmentation is the inherent uncertainty in target scale: segmentation targets can range from large-scale fire or smoke regions to minute initial flames or smoke particles. This multi-scale characteristic poses additional challenges, as traditional models often struggle to balance global feature extraction for large targets with fine-grained feature representation for small targets, leading to missed or inaccurate detections. Furthermore, existing fire segmentation datasets exhibit limited diversity, resulting in models trained with conventional methods that lack generalization ability in cross-domain applications. To overcome these limitations and enhance model performance, this study proposes an optimized Side Adapter Network (SAN) that integrates cross-attention mechanisms and a CrossViT architecture to improve feature extraction across different target scales. Specifically, the proposed approach employs cross-attention mechanisms to enhance information exchange between CLIP and the side network, while CrossViT effectively strengthens the side network’s capability in capturing fine-grained image details. Experimental results demonstrate that, compared to traditional CNN and Transformer-based models, the optimized SAN achieves significant improvements in accuracy for fire and smoke detection and segmentation tasks. Moreover, due to its strong open-vocabulary semantic segmentation capability, the model exhibits robust generalization in cross-domain applications, enabling it to effectively handle complex environments and diverse fire scenarios.
Keyword: Fire And Smoke Image Segmentation, Open-Vocabulary Semantic Segmentation, Cross Attention, Side Adapter Network.
Cite@inproceedings{ICIC2025,
author = {Yuyang Deng and Xiying Luan and Fan Zhong},
title = {Enhancing Fire and Smoke Segmentation with Cross-Attention Side-Adapter Network},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2687-2699},
}
- NoC Security through Encryption: Mitigating Threats from Compromised Networks, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Hengtai Zhao and Guozhi Song
Abstract: Network on chip (NoC) is a communication architecture that uses on-chip communication protocols and infrastructure to connect various components within a system on chip (SoC). NoC has replaced the traditional bus-based communication architecture with a network-based approach. It typically consists of routers and links that connect processing cores, memory blocks, and other IP blocks within the chip. The high scalability of NoC allows for the integration of many processing components or IP cores on a single chip. The increase in SoC complexity has led to an increase in the use of IP from third-party suppliers. An attacker can target the NoC by implanting hardware Trojans (HTs) into IP from an unreliable third-party provider. Because the NoC is directly connected to various parts of the chip, it becomes a prime target for data leakage and other security attacks. To enhance the anonymity of security-critical information in the NoC, we considered packet encryption and routing methods that explore path diversity. We designed and implemented a new NoC architecture and proposed the use of the Ascon encryption algorithm and secure anonymous routing for safeguarding secure packets. We also conducted experimental evaluations of the improved NoC architecture using the gem5 simulation tool. Through these evaluations, we verified the performance of the architecture under different traffic patterns. The results demonstrate that, while ensuring low network latency, it can effectively prevent packets from being maliciously redirected and reduce the increase in average hop count caused by hardware Trojan attacks.
Keyword: Network-on-Chip, eavesdropping, anonymous routing, Ascon.
Cite@inproceedings{ICIC2025,
author = {Hengtai Zhao and Guozhi Song},
title = {NoC Security through Encryption: Mitigating Threats from Compromised Networks},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {784-801},
note = {Poster Volume Ⅰ}
}
- Patent Value Prediction Method Based on Bibliographic Item Concatenation, Legal Value Calculation, and Deep Learning Fusion, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xiaoyuan Ma, Liang Zhang, and Wei Yan
Abstract: Citation analysis in high-value patent identification faces challenges such as regional bias, time lag, and insufficient legal event analysis. This paper adopts a strategy combining multi-source data fusion and deep learning techniques to enhance the accuracy and comprehensiveness of patent value assessment. The dataset is sourced from the IncoPat patent database. Patent validity duration serves as the key metric for categorization. The data is divided into three value classes (low, medium, and high) after value calculation. Oversampling is applied to address imbalanced sample distribution, laying the groundwork for subsequent model research. The study introduces a patent value assessment model built on BERT and BiLSTM. The BERT embedding layer captures word semantics. The BiLSTM encoder deeply encodes the semantic structure of the text. The value prediction layer outputs classification probabilities. The BERT-BiLSTM model is compared with the BERT model. Experimental results on the test set show that the BERT-BiLSTM model achieves a lower test loss of 0.53 and a higher test accuracy of 79.40%, surpassing the BERT model's 77.31%. For the high value class, the BERT-BiLSTM model outperforms the BERT model in recall and F1 scores. The results demonstrate the superior performance of the BERT-BiLSTM model in patent text value classification tasks. This method exhibits significant advantages in patent value prediction.
Keyword: BERT, Patent Recommendation Framework, Patent Value Judgment, Deep Learning
Cite@inproceedings{ICIC2025,
author = {Xiaoyuan Ma and Liang Zhang and Wei Yan},
title = {Patent Value Prediction Method Based on Bibliographic Item Concatenation, Legal Value Calculation, and Deep Learning Fusion},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {729-744},
}
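The oversampling step mentioned in the abstract above can be sketched as simple random oversampling. This is an assumption for illustration: the abstract does not say which oversampling method was used, and the patent texts, labels, and `seed` parameter below are hypothetical.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement only as many extras as the class is missing.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels

# Made-up imbalanced data across the three value classes.
texts = ["p1", "p2", "p3", "p4", "p5", "p6"]
labels = ["low", "low", "low", "medium", "medium", "high"]
balanced_texts, balanced_labels = random_oversample(texts, labels)
class_counts = Counter(balanced_labels)
```

Oversampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the test set.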
- Unsupervised Local Editing of Ocular Images via Reverse-attention block, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Renzhong Wu, Shenghui Liao, Jianfeng Li, Lihong Liu, Xiaoyan Kui, and Yongrong Ji
Abstract: Predicting the postoperative appearance after strabismus surgery is of great significance for improving patients' understanding of the postoperative outcomes, enhancing communication between doctors and patients, and alleviating preoperative anxiety. Although some researchers have used image generation models to predict postoperative appearances for various diseases, existing methods typically rely on paired data for model training. Generative Adversarial Networks (GANs) have demonstrated strong application potential in image generation tasks, and cycle consistency loss has promoted the development of unsupervised image generation techniques. However, traditional cycle consistency loss often results in the retention of unnecessary traces from the source image in the generated images. To address these issues, we propose an unsupervised image generation model based on GANs. By incorporating a reverse-attention block into the generator, the model is guided to focus on key editing regions. Additionally, we employ reverse-attention consistency loss to maintain identity consistency while reducing unnecessary trace residues. Furthermore, we introduce a multi-scale discriminator to ensure that the generated images have more reasonable texture details. Experimental results demonstrate that our model effectively reduces trace residues in the generated postoperative images and produces details that are more consistent with reality.
Keyword: Image-to-image Translation, Reverse-attention Consistency Loss, Unsupervised Learning, Generative Adversarial Networks.
Cite@inproceedings{ICIC2025,
author = {Renzhong Wu and Shenghui Liao and Jianfeng Li and Lihong Liu and Xiaoyan Kui and Yongrong Ji},
title = {Unsupervised Local Editing of Ocular Images via Reverse-attention block},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {205-216},
note = {Poster Volume Ⅰ}
}
- Enhancing Multi-Category Smoke Detection Using Similarity Constrained Loss, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xingyuan Chen, Ding Xu, Yuzhe Huang, Qishen Chen, and Huahu Xu
Abstract: Smoke detection plays a crucial role in ensuring public safety across various domains, including industrial settings, daily life, and disaster management. The effectiveness of smoke detection models heavily relies on the availability of comprehensive datasets and the optimization of loss functions. However, existing smoke detection research primarily focuses on fire-related scenarios, overlooking the significant differences in characteristics between smoke generated by different causes. To address this issue, we have developed a Multi-Category smoke detection dataset (MC-smoke dataset), which is organized based on the smoke's origin and main components. This dataset includes three categories of smoke and contains a total of 1,115 images. Furthermore, to alleviate the loss ambiguity issue present in existing object detection losses, we propose a novel Similarity-Constrained (SC) loss function. This function uses a similarity constraint coefficient in the bounding box to influence center regression and vertex regression losses, enabling more accurate smoke detection. Lastly, extensive experiments were conducted on both the MC-smoke dataset and the classic object detection dataset PASCAL VOC 2007, validating the substantial effectiveness enhancement achieved by the SC loss function. Additionally, comprehensive baseline and comparative experiments were conducted to affirm the suitability of the MC-smoke dataset for research about smoke detection training, testing, and validation.
Keyword: Smoke detection, Loss ambiguity, Bounding box loss, Multi-category smoke
Cite@inproceedings{ICIC2025,
author = {Xingyuan Chen and Ding Xu and Yuzhe Huang and Qishen Chen and Huahu Xu},
title = {Enhancing Multi-Category Smoke Detection Using Similarity Constrained Loss},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {217-229},
note = {Poster Volume Ⅰ}
}
- An Indoor Terminal Positioning Algorithm for Mobile Communication Management and Control Scenarios, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jingwen Fu, Hang Zhang, Liqi Zhuang, Xing Gao, and Meng Zhang
Abstract: The popularity of mobile terminals has brought convenience to people's lives, but in some important areas and critical places, the illegal use of mobile terminals introduces the problem of sensitive information leakage. To address the security and confidentiality risks of such scenes, this paper proposes an indoor mobile terminal high-precision grid positioning algorithm based on the Received Signal Strength Indicator (RSSI). Firstly, by mining the uplink signal protocol characteristics of mobile communication, the correlation of the demodulation reference signal is used to accurately extract the target user data, and the interference from non-target radiation sources is effectively excluded. Secondly, an efficient grid positioning strategy is proposed, which divides the complex indoor environment into multiple fine grids and constructs a probabilistic model based on RSSI data to achieve the target location estimation. Finally, an indoor mobile terminal positioning prototype system for mobile communication control scenarios is built based on the self-developed receiver board. Experiments show that the method in this paper is an effective and feasible solution for indoor mobile terminal positioning in mobile communication control scenarios and improves the ability to protect electromagnetic security in confidential and critical areas.
Keyword: Indoor location, Received signal strength indicator, Demodulation reference signal, Terminal management and control.
Cite@inproceedings{ICIC2025,
author = {Jingwen Fu and Hang Zhang and Liqi Zhuang and Xing Gao and Meng Zhang},
title = {An Indoor Terminal Positioning Algorithm for Mobile Communication Management and Control Scenarios},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1804-1817},
note = {Poster Volume Ⅱ}
}
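The grid positioning idea in the abstract above (divide the room into cells, score each cell with a probabilistic RSSI model, pick the most likely cell) can be sketched as follows. Everything specific here is an assumption for illustration: the log-distance path-loss model, the Gaussian noise with `sigma`, the `rssi_at_1m` and `path_loss_exp` values, and the receiver layout are hypothetical and not taken from the paper.

```python
import math

def expected_rssi(cell, receiver, rssi_at_1m=-40.0, path_loss_exp=2.0):
    """Log-distance path-loss model: RSSI (dBm) expected at `receiver` for a terminal at `cell`."""
    d = max(math.dist(cell, receiver), 1.0)  # clamp to the 1 m reference distance
    return rssi_at_1m - 10.0 * path_loss_exp * math.log10(d)

def grid_posterior(measured, receivers, grid, sigma=4.0):
    """Posterior over grid cells under independent Gaussian RSSI noise and a uniform prior."""
    log_likes = []
    for cell in grid:
        ll = 0.0
        for rssi, rx in zip(measured, receivers):
            err = rssi - expected_rssi(cell, rx)
            ll += -0.5 * (err / sigma) ** 2
        log_likes.append(ll)
    peak = max(log_likes)  # subtract the peak before exponentiating for numerical stability
    weights = [math.exp(ll - peak) for ll in log_likes]
    total = sum(weights)
    return [w / total for w in weights]

# 10 m x 10 m room split into 1 m cells (cell centers); three hypothetical receivers.
grid = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
receivers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (6.5, 3.5)
measured = [expected_rssi(true_position, rx) for rx in receivers]  # noise-free for the demo
posterior = grid_posterior(measured, receivers, grid)
estimate = grid[posterior.index(max(posterior))]
```

With real measurements the posterior spreads over neighboring cells, and the model parameters would typically be calibrated from a site survey rather than fixed as above.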
- SAM2-LongVideo: a robust tracking system based on SAM2 and YOLO, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Saijun Wang
Abstract: Video Object Segmentation (VOS) aims to track and segment objects across video sequences at the pixel level. SAM2, a state-of-the-art segmentation model, excels in zero-shot image and video segmentation but faces challenges in long video processing, including object tracking loss and high memory usage. This paper proposes a solution by combining SAM2 with YOLOv11. YOLOv11 provides real-time bounding box prompts for efficient segmentation, while SAM2 segments the target across sub-sequences. This approach reduces memory requirements and prevents object tracking loss by dividing the video into smaller sub-sequences without laborious human annotation. Experimental results show that the method maintains high segmentation accuracy with minimal memory consumption, making it suitable for long video segmentation in resource-constrained environments. This method offers an efficient and scalable solution for VOS tasks in various applications. The codes and GUI are available at https://github.com/Saijun-Wang/SAM2-LongVideo.
Keyword: Video Object Segmentation, SAM2, YOLOv11, Long Video Segmentation, Memory Optimization.
Cite@inproceedings{ICIC2025,
author = {Saijun Wang},
title = {SAM2-LongVideo: a robust tracking system based on SAM2 and YOLO},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2703-2712},
}
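The sub-sequence strategy described in the abstract above can be sketched at the control-flow level. This is not the released code: `detect_box` and `segment_chunk` are hypothetical callables standing in for the detector and the segmenter, and the fixed `chunk_size` is an assumption.

```python
def split_into_subsequences(num_frames, chunk_size):
    """Return (start, end) frame ranges, end exclusive, covering the video in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, num_frames))
        for start in range(0, num_frames, chunk_size)
    ]

def process_video(num_frames, chunk_size, detect_box, segment_chunk):
    """Re-prompt at the first frame of every chunk, then segment only that chunk."""
    masks = []
    for start, end in split_into_subsequences(num_frames, chunk_size):
        box = detect_box(start)                       # hypothetical detector call
        masks.extend(segment_chunk(start, end, box))  # hypothetical segmenter call
    return masks

# Stand-in callables so the sketch runs without any model.
chunks = split_into_subsequences(num_frames=1000, chunk_size=300)
masks = process_video(
    1000,
    300,
    detect_box=lambda frame: (0, 0, 10, 10),
    segment_chunk=lambda start, end, box: [f"mask_{i}" for i in range(start, end)],
)
```

Because each chunk is processed independently, peak memory depends on `chunk_size` rather than on the full video length, and a fresh box prompt at each chunk boundary gives the tracker a chance to recover a lost object.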
- Distribution-Aware Unsupervised Attacks on Graph Contrastive Learning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jinhua Huang, Jing Zhu, Zhao Ma, Hongli Ding, Yizhuo Wang, and Xucen Luo
Abstract: Graph contrastive learning has significantly advanced unsupervised graph representation learning, achieving performance comparable to supervised models. However, the robustness of graph contrastive learning models remains a major challenge. Most existing adversarial attacks are designed for supervised settings, making them inapplicable when label information is unavailable in unsupervised scenarios. To address this limitation, we propose D2AGCL, a distribution-aware unsupervised attack specifically designed for graph contrastive learning models. Our approach poisons graph data to degrade the overall quality of graph contrastive learning embeddings by dynamically adjusting attack zones and gradient aggregation strategies, thus compromising the performance of downstream tasks. Extensive experiments on multiple benchmark datasets demonstrate that D2AGCL consistently outperforms existing unsupervised attack methods and even achieves comparable or superior performance against supervised adversarial baselines.
Keyword: Graph Contrastive Learning, Unsupervised Attack, Distribution Shift.
Cite@inproceedings{ICIC2025,
author = {Jinhua Huang and Jing Zhu and Zhao Ma and Hongli Ding and Yizhuo Wang and Xucen Luo},
title = {Distribution-Aware Unsupervised Attacks on Graph Contrastive Learning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {233-246},
}
- Coarse-to-Fine Scene Graph Similarity Reasoning for Image-text Retrieval, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chengsong Sun, Qingyun Liu, Yuankun Liu, Boyan Liu, Xiang Yuan, Bingce Wang, Tong Mo, and Weiping Li
Abstract: Image-text retrieval is a crucial task that aims to find the counterparts from the opposing modality. Scene graph based image-text retrieval methods leverage the object and predicate features to reason about the cross-modal similarity, thereby increasing the retrieval accuracy. However, existing scene graph based image-text retrieval methods simply fuse the similarity calculations for features at each granularity in a single network, which only brings a slight improvement in the retrieval performance. The features of the scene graph fail to be effectively utilized. Therefore, this paper proposes a Coarse-to-Fine Scene Graph Similarity Reasoning (CFSGR) method to conduct coarse-grained and fine-grained cross-modal similarity reasoning separately. CFSGR includes two networks: a coarse-grained similarity reasoning network for graphs and a fine-grained similarity reasoning network for objects and predicates. Moreover, CFSGR conducts local and global alignments for each feature, ensuring that the similarities at each granularity of visual and textual scene graphs are fully exploited. The evaluation and ablation study on Flickr30K demonstrates the superiority of CFSGR among the state-of-the-art (SOTA) image-text retrieval methods, and CFSGR achieves competitive results with an Rsum of 506. The source code is available at https://github.com/okeike/CFSGR.
Keyword: Image-text retrieval, Multi-modal similarity reasoning, Scene graph, Contrastive learning
Cite@inproceedings{ICIC2025,
author = {Chengsong Sun and Qingyun Liu and Yuankun Liu and Boyan Liu and Xiang Yuan and Bingce Wang and Tong Mo and Weiping Li},
title = {Coarse-to-Fine Scene Graph Similarity Reasoning for Image-text Retrieval},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3571-3584},
}
- Adaptive Weight Optimization for Ship Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Peng Sheng and Ruifu Wang
Abstract: Ship detection is crucial for maintaining maritime sovereignty and monitoring ocean pollution. However, deploying this technology in complex marine environments presents significant challenges, especially when detecting small vessels. Their subtle features, multi-scale variations, and the interference of complex backgrounds often result in poor target localization and classification accuracy in existing models. To address these issues, this paper presents a ship detection model based on multi-modal fusion. The model leverages pre-trained parameters from public datasets to extract features, enhances target identification through a cross-modal synergy mechanism, and introduces an uncertainty loss function to dynamically adjust loss weights, significantly improving detection accuracy across different ship sizes and complex backgrounds. Experimental results on the Levir-Ship dataset, which includes optical remote sensing images, demonstrate the model’s effectiveness with AP, AP_50, AP_75, and AR scores of 33.7%, 84.8%, 16.1%, and 45.4%, respectively. These results validate the model’s superiority in ship detection, offering strong technical support for maritime surveillance and pollution monitoring, and paving the way for future advancements in marine monitoring technologies.
Keyword: Ship detection, deep learning, multi-modal fusion
Cite@inproceedings{ICIC2025,
author = {Peng Sheng and Ruifu Wang},
title = {Adaptive Weight Optimization for Ship Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {230-246},
note = {Poster Volume Ⅰ}
}
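The abstract above mentions an uncertainty loss that dynamically adjusts loss weights but does not give its form. One widely used formulation, shown here purely as an assumption, weights each task loss L_i by exp(-s_i) and adds s_i as a regularizer, where s_i is a learnable log-variance. The loss values, learning rate, and step count below are made up, and the losses are held fixed so that the behavior of the weights is easy to see.

```python
import math

def uncertainty_weighted_loss(losses, log_vars):
    """Sum of exp(-s_i) * L_i + s_i over tasks, with s_i a learnable log-variance."""
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))

def log_var_gradients(losses, log_vars):
    """Analytic gradient of the total loss with respect to each s_i."""
    return [1.0 - math.exp(-s) * loss for loss, s in zip(losses, log_vars)]

losses = [4.0, 0.5]      # e.g. localization and classification losses (made-up values)
log_vars = [0.0, 0.0]    # s_i = 0 gives both terms weight 1
initial_total = uncertainty_weighted_loss(losses, log_vars)

for _ in range(200):     # plain gradient descent on s_i with the losses held fixed
    grads = log_var_gradients(losses, log_vars)
    log_vars = [s - 0.1 * g for s, g in zip(log_vars, grads)]

weights = [math.exp(-s) for s in log_vars]
final_total = uncertainty_weighted_loss(losses, log_vars)
```

With fixed losses each weight settles at 1 / L_i, so the larger loss term is down-weighted and the smaller one up-weighted. In actual training the s_i are optimized jointly with the network parameters, so the weights keep moving as the task losses change.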
- T-Attention: Optimizing Attention Computation Using Temporal Parameter Time, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yichen Yang, Hongxu Hou, and Wei Chen
Abstract: The attention mechanism exhibits remarkable capability in processing sequential data; however, its computational complexity scales quadratically with sequence length, resulting in significant resource demands. Numerous studies have achieved substantial success in leveraging sparse matrices to reduce the computational burden of dot-product operations, thereby improving both the computational efficiency and accuracy of models. Nevertheless, the question remains: can we further optimize these computations? In this paper, we introduce a novel approach based on function projection, integrating a restructured word embedding technique with the attention mechanism to alleviate computational overhead. We first validate the theoretical efficacy of designing word embeddings using parametric equations and demonstrate the effectiveness of our proposed embedding method. Subsequently, we conduct experiments across a variety of basis functions, illustrating that our approach affords greater flexibility in parameter selection while effectively reducing computational costs. Compared to state-of-the-art attention-based models, our method achieves a reduction in inference time, underscoring its practical advantages.
Keyword: Attention, Parametric Equations, Fourier Series, Function Projection.
Cite@inproceedings{ICIC2025,
author = {Yichen Yang and Hongxu Hou and Wei Chen},
title = {T-Attention: Optimizing Attention Computation Using Temporal Parameter Time},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1308-1322},
note = {Poster Volume Ⅱ}
}
- HMoE-SiMBA: Heterogeneous Mixture-of-Experts with SiMBA Attention for Robust Chinese Speech Emotion Recognition, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Lu Wang and Xinyue Duan
Abstract: Speech Emotion Recognition (SER) for Mandarin Chinese is crucial for human-computer interaction, yet faces challenges in real-world applications due to unique tonal and prosodic features. Existing methods suffer from limitations in feature extraction, model generalization, and computational efficiency. To address these issues, we propose HMoE-SiMBA, a novel framework based on HMoE (Heterogeneous Mixture-of-Experts) and SiMBA (Simplified Mamba-Based Architecture) attention for addressing stability and generalization issues in Chinese SER. Our approach employs a multi-modal feature representation layer to comprehensively capture emotional cues, utilizes heterogeneous feature extractors with dynamic routing to enhance feature adaptability, and combines EinFFT and Mamba for efficient sequence modeling. Experiments on the CASIA dataset demonstrate that HMoE-SiMBA achieves 92.2% accuracy, significantly outperforming existing methods with robust performance in complex acoustic environments.
Keyword: Chinese speech emotion recognition, State Space Models, Heterogeneous Mixture-of-Experts
Cite@inproceedings{ICIC2025,
author = {Lu Wang and Xinyue Duan},
title = {HMoE-SiMBA: Heterogeneous Mixture-of-Experts with SiMBA Attention for Robust Chinese Speech Emotion Recognition},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1323-1339},
note = {Poster Volume Ⅱ}
}
- ObjectContrast: Self-supervised Point Cloud Pre-training via Object Feature Contrast, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Nuo Xu, Qinghong Yang, and Weiguang Zhuang
Abstract: Current point cloud object detection methods rely on expensive manual annotation. Utilizing contrastive learning for self-supervised pre-training on unlabeled large-scale point clouds can reduce annotation costs and improve model performance. However, selecting effective features for instance discrimination is crucial for contrastive learning. Previous methods have constructed instances for pre-training at different levels, such as points, proposals, and scenes, but the features of these instances differ from the objects to be detected. Considering that instance discrimination tasks based on object-level features align with downstream object detection tasks, we propose a novel and efficient self-supervised point cloud object detection pre-training framework called ObjectContrast. To learn more effective point cloud representations, this framework constructs two self-supervised pre-training modules: object-level instance discrimination contrast (ObCo) and bounding box geometric contrast prediction (BoxCo). ObCo drives the model to learn general object representations to locate object foregrounds and determine categories. BoxCo enhances the model's geometric perception capabilities regarding the dimension and orientation of 3D bounding boxes. Extensive experiments on various detectors and datasets validate the efficiency and transferability of ObjectContrast. Compared with the state-of-the-art self-supervised pre-training methods, ObjectContrast demonstrates superior performance.
Keyword: Self-supervised, Point Cloud, Object Detection
Cite@inproceedings{ICIC2025,
author = {Nuo Xu and Qinghong Yang and Weiguang Zhuang},
title = {ObjectContrast: Self-supervised Point Cloud Pre-training via Object Feature Contrast},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {247-262},
}
- Complex Encoding Transformer for 3D Sonar Target Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Tiancheng Cai, Dongdong Zhao, Peng Chen, Yiran Li, Xiang Tian, and Rong-hua Liang
Abstract: With the ongoing advancement of 3D sonar detection technology, research on underwater 3D target detection has gained increasing significance. Currently, there is substantial research on optical point clouds, while research on 3D sonar point clouds remains limited. Underwater 3D sonar recognition differs from optical recognition, facing challenges such as high sparsity, strong noise intensity, and inter-object coupling. As a result, traditional optical-based methods struggle with recognizing coupled targets like frogmen and bubbles. This paper proposes a detection method based on a dynamic complex encoding transformer. By combining the principles of sparse array 3D sonar imaging and complex decoupling based on prior knowledge, noise and sidelobe interference are effectively reduced. To address the challenges of detecting concealed targets, this paper proposes a novel 3D backbone based on complex encoding, which effectively enhances additional information around targets, achieving efficient recognition of 3D sonar targets. Finally, our model achieved satisfactory performance in both qualitative and quantitative experiments.
Keyword: Underwater detection, 3D sonar, acoustic pointcloud, transformer, complex encoding.
Cite@inproceedings{ICIC2025,
author = {Tiancheng Cai, Dongdong Zhao, Peng Chen, Yiran Li, Xiang Tian, and Rong-hua Liang},
title = {Complex Encoding Transformer for 3D Sonar Target Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2718-2729},
}
- MHASSNet: A Deep Neural Network-based Automatic Sleep Staging Model Using Hybrid Attention Mechanism and State Space Model, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zhentao Huang, Shanwen Zhang, and Yin Tian
Abstract: Automatic sleep stage classification is an important means of measuring sleep quality. This article introduces a new deep learning architecture, MHASSNet, which aims to improve the accuracy of sleep stage classification to more effectively assess sleep quality. The model features a hybrid attention mechanism and multimodal signal processing capability. First, MHASSNet extracts low-frequency and high-frequency features through a multi-scale convolutional neural network (MSCNN). Then, it utilizes a hybrid attention module (MAM) that combines spatial and channel attention mechanisms to capture the important spatiotemporal dependencies between these features. Additionally, a state-space model (SSM) is employed to enhance the understanding of temporal context information. Experimental results show that, when tested on two public datasets, MHASSNet achieved significant results across various evaluation metrics, demonstrating its superior performance and potential applications in automatic sleep stage classification.
Keyword: Sleep Stage, Multi Scale Convolution, Attention Mechanism
Cite@inproceedings{ICIC2025,
author = {Zhentao Huang, Shanwen Zhang, and Yin Tian},
title = {MHASSNet: A Deep Neural Network-based Automatic Sleep Staging Model Using Hybrid Attention Mechanism and State Space Model},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2191-2203},
note = {Poster Volume Ⅱ}
}
- Wavelet-Based Cross-Frequency and Cross-Region Interaction Convolutional Neural Network for Working Memory Load Level Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Congming Tan, Yahong Ma, and Yin Tian
Abstract: The detection of Working Memory Load (WML) plays a crucial role in neurofeedback processes and the treatment of disorders such as ADHD. However, the performance of existing detection methods remains unsatisfactory. Neuropsychology research indicates that high-level cognitive processes are driven by both inter-regional collaborations across different brain functional areas and cross-frequency couplings. To comprehensively capture brain activities spanning both frequency domains and intra/inter-regional interactions, we propose a novel cognitively-inspired neural network, the Wavelet-based Cross-Frequency and Cross-Region Interaction Convolutional Neural Network (CFCRNet), for WML decoding. Specifically, CFCRNet first employs predefined wavelet kernels to perform 1D convolution for time-frequency feature extraction, followed by multi-branch learning to model cross-frequency feature coupling with varying scales, and finally integrates intra- and inter-regional information interactions through spatial attention mechanisms. This architecture systematically fuses neurophysiologically meaningful cross-frequency coupling mechanisms with functional integration principles across brain regions, constructing a network model capable of simultaneously resolving dynamic characteristics of neural signals across different frequency bands and complex interactive relationships within/between functional areas. Experimental validation on our collected working memory dataset and public benchmarks demonstrates that incorporating neuroscientific priors into neural network design enhances classification performance. Collectively, our findings establish an advanced framework for accurate WML detection that can be extended to explore detection tasks associated with other cognitive behaviors and neurological disorders.
Keyword: working memory, cross-frequency, cross-region interaction.
Cite@inproceedings{ICIC2025,
author = {Congming Tan, Yahong Ma, and Yin Tian},
title = {Wavelet-Based Cross-Frequency and Cross-Region Interaction Convolutional Neural Network for Working Memory Load Level Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2207-2223},
note = {Poster Volume Ⅱ}
}
- DAE-FAN: Simple and Efficient Mining of Periodic Patterns for Traffic Prediction, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Sheng Li and Shuojiang Xu
Abstract: As a fundamental task of intelligent transportation systems, traffic prediction aims to forecast traffic time series in road networks based on historical observation data to support upper-level applications. Although deep learning models have performed well in this field in recent years, their architectures have become increasingly complex and less efficient and lack sufficient modelling of the intrinsic features of traffic data over time. In this paper, we focus on the special periodicity of traffic data and propose a novel model called DAE-FAN, which designs a dual adaptive embedding mechanism consisting of feature, periodicity, and spatial embedding. Employing self-supervised learning, the combination of these embedding matrices can autonomously represent the periodic changes in traffic features and spatial patterns. In addition, we introduce the Fourier principle to construct the Fourier neural network to enhance the capability of modelling periodicity. Extensive experiments on four large public traffic datasets demonstrate the superior performance and efficiency of DAE-FAN with its simpler structure compared to current traffic prediction models, providing a promising direction for efficiently solving traffic prediction challenges.
Keyword: Traffic Prediction, Adaptive Embedding, Fourier Principle
Cite@inproceedings{ICIC2025,
author = {Sheng Li and Shuojiang Xu},
title = {DAE-FAN: Simple and Efficient Mining of Periodic Patterns for Traffic Prediction},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {23-35},
}
- Transformer-Based Anomaly Detection in Deep Reinforcement Learning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zhen Chen, Jian Zhao, Youpeng Zhao, Yong Liao, and Hu Huang
Abstract: Despite the promising potential of Deep Reinforcement Learning (DRL) to adapt to various tasks, it remains highly vulnerable to adversarial attacks and anomalous observation signals. Existing research on DRL robustness does not fully safeguard against all types of adversarial perturbations and disturbances, which hampers its use in critical real-world systems and applications, such as smart grids, traffic control, and autonomous vehicles. In these contexts, anomalous states can cause decision-making errors that may be exacerbated in subsequent actions, resulting in irreparable damage. To promptly detect anomalous states and prevent greater losses, we propose the Transformer-Based Anomaly Detection (T-BAD) framework. Utilizing the transformer's ability to handle sequential data robustly, T-BAD enables real-time detection of anomalous states and actions. Specifically, our approach first collects trajectory data from the DRL algorithm used by the agent and trains a transformer module to accurately model the agent's actions. For anomaly detection, we input k consecutive states and k-1 consecutive actions (excluding the current action) into the transformer, and then compare its output with the agent's action. The difference between them indicates whether the DRL model is influenced by interference or adversarial attacks. Extensive experiments across multiple scenarios in Atari and MuJoCo demonstrate that T-BAD outperforms existing baselines in anomaly detection while also possessing some capacity for anomaly correction.
Keyword: Deep reinforcement learning, Transformer, Anomaly detection.
Cite@inproceedings{ICIC2025,
author = {Zhen Chen, Jian Zhao, Youpeng Zhao, Yong Liao, and Hu Huang},
title = {Transformer-Based Anomaly Detection in Deep Reinforcement Learning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {321-334},
}
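The detection step described in the T-BAD abstract above (feed k states and k-1 earlier actions to a sequence model, then compare its predicted action with the action the agent actually took) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: `predictor` stands in for the trained transformer, the Euclidean gap and the fixed `threshold` are placeholder choices, and none of it is taken from the authors' implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def action_gap(predicted: Vector, taken: Vector) -> float:
    """Euclidean distance between the predicted and the executed action."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, taken)))


def is_anomalous(
    states: List[Vector],
    prev_actions: List[Vector],
    taken_action: Vector,
    predictor: Callable[[List[Vector], List[Vector]], Vector],
    threshold: float,
) -> bool:
    """Flag a step when the agent's action deviates from the model's prediction.

    `states` holds k consecutive states; `prev_actions` holds the k-1 actions
    that preceded the current one (the current action is excluded).
    """
    if len(prev_actions) != len(states) - 1:
        raise ValueError("expected k states and k-1 previous actions")
    predicted = predictor(states, prev_actions)
    return action_gap(predicted, taken_action) > threshold


def repeat_last_action(states: List[Vector], prev_actions: List[Vector]) -> Vector:
    """Hypothetical stand-in predictor: assume the agent repeats its last action."""
    return prev_actions[-1]


if __name__ == "__main__":
    window_states = [[0.0], [0.1], [0.2]]
    window_actions = [[1.0], [1.0]]
    print(is_anomalous(window_states, window_actions, [1.05], repeat_last_action, 0.5))
    print(is_anomalous(window_states, window_actions, [3.0], repeat_last_action, 0.5))
```

In practice the threshold would have to be calibrated on clean trajectories, and a discrete action space would call for a different gap measure (for example, the probability the model assigns to the executed action).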
- Multi-Agent Reinforcement Learning with Cooperative Mechanism for Dynamic Job Shop Scheduling Problem, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jie Shang, Junqing Li, and Jiake Li
Abstract: With the vigorous growth of the manufacturing industry, the complex dynamic scheduling problem has become a focus of enterprise production that cannot be ignored. Therefore, this paper establishes a mathematical model of the dynamic job shop scheduling problem (DJSP) with uncertain processing time. The optimization objective is to minimize the makespan. First, a new abstract algebraic structure, the scheduling group, is proposed to represent the scheduling relationships. Then, a multi-agent reinforcement learning (MADRL) method is proposed, including two agents: proximal policy optimization (PPO) and deep Q-network (DQN). The DQN changes the network structure under the cooperative mechanism, obtaining samples with higher priority in experience replay. In PPO, state features are characterized by the scheduling solutions of jobs and machines. Different dispatching rules are assigned to feasible machines as the action space, with the reward function defined by idle time during scheduling. Finally, the proposed method is compared with other heuristic dispatching rules through static and dynamic experiments. Across instances of various scales, the results demonstrate strong performance, further validating the generality and superiority of the method.
Keyword: Dynamic job shop scheduling, Uncertain processing time, Multi-agent reinforcement learning.
Cite@inproceedings{ICIC2025,
author = {Jie Shang, Junqing Li, and Jiake Li},
title = {Multi-Agent Reinforcement Learning with Cooperative Mechanism for Dynamic Job Shop Scheduling Problem},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {335-348},
}
- Enhancing the Robustness of Classification Against Adversarial Attacks through a Dual-Enhancement Strategy, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Guo Niu, Shuaiwei Jiao, Nannan Zhu, Juxin Liao, Shengjun Deng, Tao Li, Xiongfei Yao, and Huanlin Mo
Abstract: Deep neural networks have achieved remarkable success in target classification, but as accuracy improves, model robustness has become a growing concern. Existing methods, such as adversarial training, enhance robustness, yet adversarial examples can still lead to high-confidence, incorrect predictions. To address this issue, we propose a new defense mechanism, Dynamic MixCut. This method combines the advantages of multi-box CutMix and Mixup by enhancing the diversity and complexity in the sample generation process, enabling more effective defense against complex adversarial attacks, especially in dynamic perturbation environments. Through in-depth theoretical analysis, we reveal the fundamental reasons behind the robustness limitations of traditional Mixup under multi-step attacks, particularly the limitations of mixing adversarial perturbations between samples. Furthermore, the Dynamic MixCut method enhances the model's adaptability to diverse attack strategies by integrating more sophisticated perturbation designs in the generation of adversarial examples, thereby mitigating the trade-off between standard accuracy and adversarial robustness. Experimental results on the CIFAR-10 and SVHN datasets demonstrate that the Dynamic MixCut method improves adversarial accuracy by over 10% on average compared to the baseline while preserving standard accuracy. This research provides novel insights into robust training for object classification tasks and contributes to the advancement of adversarial training techniques.
Keyword: Object classification, Adversarial Attacks, Multi-step Attacks, Adversarial Robustness, Robust Training.
Cite@inproceedings{ICIC2025,
author = {Guo Niu, Shuaiwei Jiao, Nannan Zhu, Juxin Liao, Shengjun Deng, Tao Li, Xiongfei Yao, and Huanlin Mo},
title = {Enhancing the Robustness of Classification Against Adversarial Attacks through a Dual-Enhancement Strategy},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {247-259},
note = {Poster Volume Ⅰ}
}
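For readers unfamiliar with the two augmentations that the Dynamic MixCut abstract above builds on, the following Python sketch shows the standard single-box forms of Mixup and CutMix on tiny list-based "images". It is not the Dynamic MixCut method itself: the multi-box and dynamic aspects are not reproduced, and the function names and the explicit `box` argument are assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[float]]
Label = List[float]


def mixup(x_a: Image, x_b: Image, y_a: Label, y_b: Label, lam: float) -> Tuple[Image, Label]:
    """Standard Mixup: convex combination of two images and their one-hot labels."""
    x = [[lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def cutmix(x_a: Image, x_b: Image, y_a: Label, y_b: Label,
           box: Tuple[int, int, int, int]) -> Tuple[Image, Label]:
    """Standard CutMix: paste a rectangle of x_b into x_a.

    `box` is (top, left, bottom, right) with exclusive bottom/right bounds.
    The label weight of x_a equals the fraction of its area left untouched.
    """
    top, left, bottom, right = box
    height, width = len(x_a), len(x_a[0])
    x = [row[:] for row in x_a]  # copy so the input image is not modified
    for i in range(top, bottom):
        for j in range(left, right):
            x[i][j] = x_b[i][j]
    lam = 1 - ((bottom - top) * (right - left)) / (height * width)
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


if __name__ == "__main__":
    zeros = [[0.0] * 4 for _ in range(4)]
    ones = [[1.0] * 4 for _ in range(4)]
    print(mixup(zeros, ones, [1.0, 0.0], [0.0, 1.0], lam=0.25))
    print(cutmix(zeros, ones, [1.0, 0.0], [0.0, 1.0], box=(0, 0, 2, 2)))
```

In typical training pipelines the mixing weight and the box are sampled at random per batch; they are passed explicitly here so the behaviour is easy to inspect.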
- FE-DETR: A Fourier-Enhanced, Edge-Aware Framework for UAV-Based Remote Sensing Object Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Guocheng An, Wenbin Liu, and Pengzhan Sheng
Abstract: Aerial object detection in drone-based imagery presents unique challenges including sub-20px targets, motion blur, dense occlusions, and complex backgrounds. Existing methods struggle to harmonize spectral sensitivity with spatial precision while maintaining real-time efficiency. This paper proposes FE-DETR, an optimized end-to-end framework integrating Fourier-enhanced processing, adaptive attention, and edge-aware fusion. First, the Fourier-Enhanced Feature Fusion (FFF) module synergizes global frequency analysis with multi-scale dilated convolutions, amplifying faint object signatures while preserving structural integrity under motion blur. Second, the Adaptive WL-GH Attention dynamically allocates computation between local window attention and global cross-window reasoning via learnable feature statistics. Third, the Edge-Enhanced Multi-Scale Fusion neck (E²MF) embeds physics-inspired Sobel operators to maintain structural coherence in occlusion-heavy scenes. Evaluated on VisDrone2019, FE-DETR achieves state-of-the-art 50.4 mAP50 and 31.1 mAP50-95 with 17.3M parameters and 54.9G FLOPs. Ablation studies confirm the complementary benefits of spectral-spatial fusion and edge-aware processing. The framework demonstrates robust performance across illumination variations and scale disparities, offering practical efficiency for UAV deployment. Code will be released at https://github.com/Avery5233/FE-DETR.
Keyword: Aerial Object Detection, RT-DETR, Fourier-Enhanced Feature Fusion, Edge-Aware Neck
Cite@inproceedings{ICIC2025,
author = {Guocheng An, Wenbin Liu, and Pengzhan Sheng},
title = {FE-DETR: A Fourier-Enhanced, Edge-Aware Framework for UAV-Based Remote Sensing Object Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2734-2748},
}
- Effective Local Texture Estimation Using Wavelet Transforms for Arbitrary-Scale Super-Resolution, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Baihong Qian, Yu Lu, Dian Ding, Yi-Chao Chen, Qiaoling Xiao, Guanghui Gao, Zhengguang Xiao, and Guangtao Xue
Abstract: Image super-resolution (SR) aims to reconstruct high-resolution images from low-resolution inputs, addressing challenges like sensor noise, optical distortions, and compression artifacts. Traditional SR methods often struggle with preserving fine details, particularly in regions with sharp transitions or complex textures. In this work, we propose a novel Local Wavelet Transformer (LWT) framework that leverages the Discrete Wavelet Transform (DWT) to capture both local textures and global structures, improving the accuracy of fine-grained detail restoration. By introducing a magnification factor decomposition strategy, our method enables super-resolution at arbitrary scaling levels, ensuring flexibility and precise detail preservation across different magnifications. We demonstrate the effectiveness of our approach through extensive experiments on multiple benchmark datasets, showing superior performance and achieving state-of-the-art results in high-resolution image reconstruction under diverse conditions. Our results highlight the potential of wavelet-based analysis for enhancing SR tasks, particularly in scenarios requiring fine detail recovery and sharp transitions.
Keyword: Single image super resolution, Discrete wavelet transformation, Local attention mechanism
Cite@inproceedings{ICIC2025,
author = {Baihong Qian, Yu Lu, Dian Ding, Yi-Chao Chen, Qiaoling Xiao, Guanghui Gao, Zhengguang Xiao, and Guangtao Xue},
title = {Effective Local Texture Estimation Using Wavelet Transforms for Arbitrary-Scale Super-Resolution},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {260-271},
note = {Poster Volume Ⅰ}
}
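The abstract above relies on the Discrete Wavelet Transform to separate coarse structure from fine detail. As a rough illustration of that idea only, the Python sketch below implements one level of the 1D Haar DWT and its inverse. The choice of the Haar wavelet and the function names are assumptions; the abstract does not state which wavelet the LWT framework uses, and images would need a 2D transform.

```python
import math
from typing import List, Tuple


def haar_dwt_1d(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail): scaled pairwise sums carry the coarse
    structure, scaled pairwise differences carry the local detail.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt_1d(approx: List[float], detail: List[float]) -> List[float]:
    """Inverse of haar_dwt_1d: rebuild the signal from both coefficient sets."""
    scale = 1 / math.sqrt(2)
    signal: List[float] = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) * scale, (a - d) * scale])
    return signal


if __name__ == "__main__":
    row = [4.0, 2.0, 5.0, 5.0]
    low, high = haar_dwt_1d(row)
    print(low, high)
    print(haar_idwt_1d(low, high))
```

A flat pair of samples produces a zero detail coefficient, which is why detail bands concentrate on edges and textures.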
- MDTH: A Multi-Scale Deep Learning Network for Steel Surface Defect Detection with Trans-Ham Feature Fusion, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yihong Wu, Yuhao Guo, Chen Yang, and Chao Zhang
Abstract: In steel surface defect detection, accurately identifying various types of defects is essential. However, the diverse morphologies of defects and complex backgrounds encountered in real-world industrial production pose significant challenges for existing object detection networks. To overcome these issues, this paper presents a deep-learning-based network model named MDTH based on YOLOv10, which integrates multi-scale deep convolutional feature extraction with Swin Transformer encoding through an enhanced hybrid attention mechanism (Trans-HAM). Firstly, the Multi-Angle Perception and Depth-wise separable convolution module (MAPD) is employed to capture the edges and texture details of steel surfaces, effectively identifying minor defects. Secondly, the Trans-HAM module extracts more comprehensive and fine-grained feature information, enabling the model to simultaneously focus on both local details and global structures. Finally, MPDIoU is employed to optimize the overlap and shape matching of bounding boxes, improving the accuracy of defect localization. Experimental results on the NEU-DET and PKU-Market-PCB datasets show that the proposed MDTH model achieves an mAP@0.5 of 81.2% and 95.3%, respectively, which greatly improves detection accuracy and outperforms commonly used models.
Keyword: Defect detection, Hybrid attention mechanism, MPDIoU loss function, Neural network.
Cite@inproceedings{ICIC2025,
author = {Yihong Wu, Yuhao Guo, Chen Yang, and Chao Zhang},
title = {MDTH: A Multi-Scale Deep Learning Network for Steel Surface Defect Detection with Trans-Ham Feature Fusion},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2749-2764},
}
- AdvDetectGPT: Detecting Adversarial Examples Using Large Vision-Language Models, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Ming Zhang, Huayang Cao, and Cheng Qian
Abstract: Adversarial examples have been proven to be a substantial threat to the security applications of deep neural networks. Adversarial detection plays a pivotal role in defending against adversarial attacks. While the underlying concept is straightforward, the practical realization of adversarial detection is non-trivial, frequently encountering challenges of universality and effectiveness. In this study, we leverage the powerful capabilities of large vision-language models (LVLMs) and develop AdvDetectGPT, a novel adversarial detector based on LVLMs. AdvDetectGPT can learn to identify adversarial examples directly from clean and adversarial instances, independent of the victim model's outputs or internal responses. The extensive experiments show that AdvDetectGPT significantly outperforms the state-of-the-art baselines. AdvDetectGPT exhibits robust generalization, capable of detecting adversarial examples crafted by novel attacks on new models, as well as those with customized perturbations distinct from the training set. Code is available at https://github.com/mingcheung/AdvDetectGPT.
Keyword: Adversarial detection, Adversarial examples, Deep neural networks, LVLMs.
Cite@inproceedings{ICIC2025,
author = {Ming Zhang, Huayang Cao, and Cheng Qian},
title = {AdvDetectGPT: Detecting Adversarial Examples Using Large Vision-Language Models},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {802-812},
note = {Poster Volume Ⅰ}
}
- A Coarse-Precise Refinement Learning-Based Knowledge Distillation Network for Anomaly Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chaoyang Li, Haozheng Zhang, Fei Wang, Chengkun Li, and Yanhong Yang
Abstract: Anomaly detection serves a crucial role in large-scale industrial manufacturing. Knowledge distillation (KD)-based approaches have demonstrated excellent performance, yet their efficacy is constrained by the identical symmetric structures. In this study, we propose an enhanced KD-based architecture with a dual-learning mechanism, called DLKD, to precisely characterize normal samples and improve detection performance. Specifically, we first introduce a coarse decoder into the student network to preliminarily reconstruct the teacher features, in which the SSM-based global feature reconstruction block (GFRB) and CNN-based local feature reconstruction block (LFRB) effectively model global and local information. A precise refinement learner is subsequently provided to finely tune the coarse reconstructed features. Extensive experiments on two publicly available anomaly detection datasets demonstrate the effectiveness and potential of the proposed DLKD. This work further explores KD-based methods for anomaly detection and provides a unique yet robust baseline for the community.
Keyword: Anomaly detection, Knowledge distillation, Dual-learning, Feature reconstruction
Cite@inproceedings{ICIC2025,
author = {Chaoyang Li, Haozheng Zhang, Fei Wang, Chengkun Li, and Yanhong Yang},
title = {A Coarse-Precise Refinement Learning-Based Knowledge Distillation Network for Anomaly Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {272-283},
note = {Poster Volume Ⅰ}
}
- Hierarchical Refinement and Bilateral Attention Fusion for Polyp Segmentation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Pusheng An, JiYa MengHe, and He Yi
Abstract: In the field of medical imaging, polyp segmentation is a crucial task as it enables doctors to accurately identify and segment polyps in endoscopic images and other medical images. Currently, numerous deep-learning-based polyp segmentation models mainly rely on multi-scale feature fusion techniques to delineate the boundaries of polyps. However, these existing methods often fail to consider the interconnection between the model's localization and segmentation processes. Generally, when searching for polyps, people first determine the approximate location of the polyps and then gradually obtain detailed feature information about them. In view of this, we propose a hierarchical refinement multi-scale feature fusion model named HRFFNet. First, we design a hierarchical refinement feature extraction method to precisely optimize the initially located polyp regions. Then, we develop a feature fusion block named FB, which relies on the overall lesion information to form multi-scale feature representations. Through extensive experiments on four commonly used benchmark datasets, we find that HRFFNet performs outstandingly in polyp segmentation, and its performance significantly surpasses that of existing state-of-the-art models.
Keyword: Polyp Segmentation, Hierarchical Refinement, Bilateral Attention, Feature Fusion
Cite@inproceedings{ICIC2025,
author = {Pusheng An, JiYa MengHe, and He Yi},
title = {Hierarchical Refinement and Bilateral Attention Fusion for Polyp Segmentation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {284-298},
note = {Poster Volume Ⅰ}
}
- CAP: Contextual Enhancement and Adaptive Prompting Network for Zero-Shot Composed Image Retrieval, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Dian Chen, Bo Li, Ying Qin, Qingwen Li, Hong Li, and Shikui Wei
Abstract: This paper focuses on the zero-shot Composed Image Retrieval (ZS-CIR) task, which only requires unlabeled images or image-title pairs for model training. Previous work has utilized textual inversion networks to form queries by combining fixed templates such as "a photo of" with pseudo-words projected from reference image features into the text embedding space. However, fixed prompt templates offer limited performance improvement for the model and can affect the learning of instance-specific contextual information in open-domain tasks. To address these issues, we propose a zero-shot composed image retrieval framework based on contextual enhancement and adaptive prompting (CAP), which consists of a Contextual Enhancement Module (CEM) and an Adaptive Prompting Module (APM). CEM introduces bi-directional LSTM re-parameterized learnable prompts, and APM decouples the retrieval instances and maps the different features to the corresponding prompt parameters. These two modules cooperate to construct the optimal prompts adapted to the retrieval instances. Extensive qualitative and quantitative experiments on three datasets show that our model generalizes well and performs better than state-of-the-art methods.
Keyword: Composed image retrieval, Zero-shot, Contrastive learning
Cite@inproceedings{ICIC2025,
author = {Dian Chen, Bo Li, Ying Qin, Qingwen Li, Hong Li, and Shikui Wei},
title = {CAP: Contextual Enhancement and Adaptive Prompting Network for Zero-Shot Composed Image Retrieval},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3586-3597},
}
- Multi-view Graph Attention Contrastive Learning for Predicting miRNA-Disease Association, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Li-Juan Qiao, Yu-Kai Ma, Yu-Tian Wang, Shuang Liu, Cun-Mei Ji, and Chun-Hou Zheng
Abstract: MicroRNAs (miRNAs) are an important class of endogenous non-coding RNAs that regulate critical biological processes, such as cell differentiation, proliferation, and apoptosis, through post-transcriptional mechanisms. Recent studies have shown that aberrant miRNA expression is closely linked to the pathogenesis of complex diseases, including cancer and neurodegenerative disorders. This study introduces a novel prediction model, MGACMDA, which combines multi-view contrastive learning and residual graph attention to overcome several limitations of existing miRNA-disease association prediction methods, such as limited robustness to data sparsity, high sensitivity to network noise, and insufficient extraction of deep topological information. We propose three data enhancement methods to construct global, local, and topological views, and design a graph attention encoder with residual connections to fuse shallow topological features with deep representations through a residual mechanism. Finally, a momentum-driven multi-view contrastive learning module is designed, in which a momentum encoder maintains the global negative sample queue, significantly improving the ability to discriminate sparse associations. We applied MGACMDA to benchmark datasets, including HMDD v2.0 and HMDD v3.2, using 5-fold cross-validation. The evaluation metrics, including F1 score, AUC, and AUPR values, together with case studies, indicate that our method is efficient and robust for predicting miRNA-disease associations.
Keyword: miRNA-disease association prediction, Multi-view similarity network, Contrastive learning, Graph attention network, Similarity kernel fusion, Residual attention networks
Cite@inproceedings{ICIC2025,
author = {Li-Juan Qiao, Yu-Kai Ma, Yu-Tian Wang, Shuang Liu, Cun-Mei Ji, and Chun-Hou Zheng},
title = {Multi-view Graph Attention Contrastive Learning for Predicting miRNA-Disease Association},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2393-2410},
note = {Poster Volume Ⅱ}
}
- Emotional and Social Signals in Multimedia: Vibrato Variations in Classical Singing, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: JieYing Liu
Abstract: In acoustic psychology research, many studies have explored how acoustic features enhance emotional expression in singing, but the role of vibrato has been relatively underexamined. This study investigates how variations in vibrato affect emotional perception in singing through two experiments. Recordings of professional classical singers performing notes with different vibrato speeds and amplitudes were analyzed to examine correlations between vibrato features and emotional perception. A perceptual emotional assessment further revealed that changes in vibrato significantly influence the perception of basic emotional categories. Specifically, variations in vibrato speed and amplitude were found to alter how emotions are perceived. The findings indicate that even controlled vibrato variations carry emotional cues recognizable to listeners. The study also highlights that vibrato parameters are independent, and professional opera singers can control these features separately to convey distinct emotions.
Keyword: Vibrato Characteristics · Acoustic Psychology · Emotional Perception · Singing Emotion.
Cite@inproceedings{ICIC2025,
author = {JieYing Liu},
title = {Emotional and Social Signals in Multimedia: Vibrato Variations in Classical Singing},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1819-1834},
note = {Poster Volume Ⅱ}
}
- Efficient and Lightweight Federated Learning Scheme for Privacy Protection and Enhancement, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xuyan Zhang, Zhencheng Fan, Da Huang, Yuhua Tang, and Xiyao Liu
Abstract: Deep Neural Networks DNNs have been widely used in computer vision, speech recognition, and recommender systems, which require large amounts of user data. However, the collection of data can result in data privacy breaches. Federated learning FL protects data privacy by enabling multiple clients to collaborate in training deep neural network models on private datasets and sharing the training results. However, traditional federated learning solutions are vulnerable to malicious data theft from malicious clients and cloud servers that infer private user data from transmitted intermediate parameters and are unfriendly to resource-constrained clients. In this paper, we propose an effi-cient and privacy-preserving lightweight federated learning PL-FL scheme based on the federated averaging algorithm that combines differential privacy DP and ring-based fully homomorphic encryption FHE . Specifically, we utilize a Gaussian mechanism to perturb the client's local model parameters, on top of which we use ring-based learning of FHE to prevent theft by mali-cious attackers. The formal analysis presented in this paper demonstrates that the proposed scheme can achieve model convergence with reduced communi-cation consumption and time while providing robust privacy protection. Exten-sive experimental results on diverse datasets illustrate that the scheme exhibits competitive model performance and computational efficiency, when compared to the FL baseline. Furthermore, the privacy analysis experiments demonstrate that the approach effectively prevents malicious data theft and recovery, providing strong privacy protection capabilities.
Keyword: Federated Learning, Privacy-preserving, Differential Privacy, Fully Homomorphic Encryption.
Cite@inproceedings{ICIC2025,
author = {Xuyan Zhang and Zhencheng Fan and Da Huang and Yuhua Tang and Xiyao Liu},
title = {Efficient and Lightweight Federated Learning Scheme for Privacy Protection and Enhancement},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3493-3508},
}
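The client-side perturbation and server-side averaging described in the PL-FL abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the clipping bound `max_norm`, and the `noise_multiplier` parameter are hypothetical, the averaging is unweighted, and the homomorphic-encryption layer is omitted entirely.

```python
import math
import random


def clip_l2(update, max_norm):
    """Scale `update` so its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


def perturb(update, max_norm, noise_multiplier, rng):
    """Clip a client's parameters, then add Gaussian noise (Gaussian mechanism)."""
    # Assumption: the noise standard deviation scales with the clipping bound.
    sigma = noise_multiplier * max_norm
    return [x + rng.gauss(0.0, sigma) for x in clip_l2(update, max_norm)]


def federated_average(client_updates):
    """Unweighted federated averaging over equally sized parameter vectors."""
    n = len(client_updates)
    return [sum(column) / n for column in zip(*client_updates)]


rng = random.Random(0)
clients = [[3.0, 4.0], [0.3, 0.4], [-1.0, 0.0]]  # hypothetical local parameters
noisy = [perturb(u, max_norm=1.0, noise_multiplier=0.1, rng=rng) for u in clients]
global_update = federated_average(noisy)
```

Clipping before adding noise bounds each client's contribution, which is what lets the noise scale be tied to a fixed sensitivity; in the scheme described above, the perturbed parameters would additionally be encrypted before transmission.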
- F3ND: Bridging the Semantic Gap with Tri-modal Self-Attention for Enhanced Fake News Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zhonghao Yao and Huaping Zhang
Abstract: The rapid development of Internet technology and the rise of social media have promoted the spread of fake news, most of which contains both text and image content. However, multimodal methods inevitably encounter a semantic gap when analyzing such news. To solve this problem, we propose F3ND, a tri-modal integrated self-attention framework for fake news detection. F3ND combines unimodal text and image features with fused multimodal features. The fused features act as a bridge to enhance the correlation between the two unimodal features, which effectively addresses the semantic gap problem when understanding multimodal content. At the same time, we introduce a self-attention mechanism to dynamically assign weights to different features, retaining the discriminative information in unimodal features that helps to determine whether the news is fake. Our experiments on the Weibo and Weibo21 datasets show that F3ND achieves better performance than many previous baseline models, demonstrating the robustness and effectiveness of our method.
Keyword: Fake News Detection, Self-Attention Mechanism, Semantic Gap.
Cite@inproceedings{ICIC2025,
author = {Zhonghao Yao and Huaping Zhang},
title = {F3ND: Bridging the Semantic Gap with Tri-modal Self-Attention for Enhanced Fake News Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {813-824},
note = {Poster Volume Ⅰ}
}
- EPASC-SH: Efficient Privacy-Preserving Authentication and Secure Communication Protocol for Smart Homes, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shuo Wang, Yifan Liu, Wenlei Chai, Fan Feng, Yi Liu, and Zhenpeng Liu
Abstract: Existing authentication schemes in smart home environments often suffer from centralized reliance, high computational overhead, and vulnerability to single-point failures. Therefore, this paper proposes a distributed aggregated signature-based authentication scheme. The scheme employs an anonymous mechanism in which multiple authorization centers collaboratively generate anonymous identity signatures, reducing the risk of single-point failure and enhancing privacy protection. The signature accumulator aggregates and uniformly verifies the signatures of multiple smart devices to reduce the computational overhead and improve the authentication efficiency. Experimental results show that the scheme can effectively improve the efficiency of signature generation in smart home environments and outperforms existing authentication schemes in terms of computation, communication, and energy overhead, thus providing an identity authentication method with strong security, high authentication efficiency, and privacy protection for smart home systems.
Keyword: Multiple Authorization Center, Anonymous Authentication, Aggregated signatures, Smart Home
Cite@inproceedings{ICIC2025,
author = {Shuo Wang and Yifan Liu and Wenlei Chai and Fan Feng and Yi Liu and Zhenpeng Liu},
title = {EPASC-SH: Efficient Privacy-Preserving Authentication and Secure Communication Protocol for Smart Homes},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {825-838},
note = {Poster Volume Ⅰ}
}
- A Lightweight Detection Network inspired by the Olfactory Learning Circuit of Caenorhabditis elegans, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jiangpeng Zheng, Jiacheng Zhao, Xuebin Wang, Meng Zhao, Fan Shi, and He Liu
Abstract: Jiangpeng Zheng (affiliations 1, 2; ORCID 0009-0007-7644-6083), Jiacheng Zhao (affiliations 1, 2; ORCID 0009-0001-1174-2945), Xuebin Wang (affiliation 3; ORCID 0000-0002-3095-4417), Meng Zhao (affiliations 1, 2; ORCID 0000-0002-5060-9223), Fan Shi (affiliations 1, 2; ORCID 0000-0003-2074-0228), and He Liu (affiliation 3; ORCID 0000-0001-9418-9171)
Keyword: Caenorhabditis elegans, Neural circuits, Artificial neural networks, Object detection, Lightweight network.
Cite@inproceedings{ICIC2025,
author = {Jiangpeng Zheng and Jiacheng Zhao and Xuebin Wang and Meng Zhao and Fan Shi and He Liu},
title = {A Lightweight Detection Network inspired by the Olfactory Learning Circuit of Caenorhabditis elegans},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1339-1352},
note = {Poster Volume Ⅱ}
}
- BDTIAG: Reliable and Efficient Black-Box Adversarial Text-to-Image Generation via Decision Boundary Exploration, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yongqi Jiao, Yucheng Shi, Yufei Gao, Lin Wei, and Lei Shi
Abstract: Text-to-image generation models can produce high-quality images from textual descriptions. However, they are vulnerable to adversarial attacks, which can manipulate outputs and bypass content moderation systems, leading to potential security risks. We propose BDTIAG, a black-box adversarial attack framework that improves attack efficiency and stealthiness. It comprises two key phases: (1) Adversarial Sample Space Expansion (ASSE), which systematically perturbs text to generate diverse adversarial samples, and (2) Boundary Perturbation Backtracking (BPB), which refines these samples to maximize attack success while minimizing detection. Extensive experiments on DALL·E, DALL·E 2, Imagen, and AttnGAN demonstrate that BDTIAG outperforms existing black-box attack methods, achieving a 6.25% increase in attack success rate and reducing the number of queries by 41.02% compared to RIATIG, all while preserving semantic consistency and naturalness.
Keyword: Adversarial Attack, Text-to-Image Generation, Black-Box Framework, Information Security, Semantic Perturbation
Cite@inproceedings{ICIC2025,
author = {Yongqi Jiao and Yucheng Shi and Yufei Gao and Lin Wei and Lei Shi},
title = {BDTIAG: Reliable and Efficient Black-Box Adversarial Text-to-Image Generation via Decision Boundary Exploration},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {839-852},
note = {Poster Volume Ⅰ}
}
- Fine and Coarse-grained Graph Flow Neural Network for Traffic Forecasting, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yuhao Zhao and Zhanquan Wang
Abstract: Recently, the Intelligent Transportation System has been developed to help relieve traffic congestion, which calls for accurate prediction of middle and long-term traffic flow. However, existing models cannot achieve sufficient accuracy in middle and long-term prediction. To solve this, we propose a Fine and Coarse-grained Graph Flow Neural Network (FCGFNN), which makes better predictions by capturing both fluctuating and stable traffic patterns. Firstly, an asymmetric embedding layer is designed to integrate graph structure and temporal dependencies with two dimensions of data. Then, a Season-Trend Encoder is designed to extract essential spatial-temporal features as well as to handle non-stationary flows. Finally, the pattern of traffic flow prediction is obtained. Experimental results on two real public traffic datasets show average performance improvements of 5.9%, 7.5%, and 7.7% across 30-minute, 45-minute, and 60-minute prediction intervals.
Keyword: Traffic Forecasting, Decomposition Model, Attention Mechanism.
Cite@inproceedings{ICIC2025,
author = {Yuhao Zhao and Zhanquan Wang},
title = {Fine and Coarse-grained Graph Flow Neural Network for Traffic Forecasting},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1354-1370},
note = {Poster Volume Ⅱ}
}
- LADF-YOLO: A Highly Accurate Low-Light Target Detection Algorithm, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Songyang Li, Jianping Shuai, Ya Zhou, Yaoyang Zhang, and Yingying Chen
Abstract: Images captured in complex low-light environments often exhibit weak contrast, high noise, and blurred edges. Directly applying existing target detection models to low-light images can lead to missing details and inaccurate localization, resulting in poor detection accuracy. To address these issues, this paper presents a low-light target detection method based on LADF-YOLO. The method first introduces a ReS Feature Pyramid Network (ReSFPN) integrated with a backbone network to capture more effective image features in low-light conditions. The method then designs a detection head that eliminates the need for non-maximum suppression (NMS-free), utilizing a dual-label assignment strategy and a consistent matching metric to align the optimization direction of the head, thereby enhancing the model's overall performance. Finally, experiments on the real low-light image dataset DarkFace demonstrate that the proposed LADF-YOLO outperforms other leading target detection algorithms in low-light conditions. Compared to the benchmark model YOLOv8, LADF-YOLO achieves a 10.8% improvement in mAP@0.5 and a 9.9% improvement in Recall.
Keyword: Low-light images, Targeted Detection, YOLO, Feature Pyramid Network.
Cite@inproceedings{ICIC2025,
author = {Songyang Li and Jianping Shuai and Ya Zhou and Yaoyang Zhang and Yingying Chen},
title = {LADF-YOLO: A Highly Accurate Low-Light Target Detection Algorithm},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2765-2781},
}
- MS-ConformerNet: A Multi-Scale Joint Encoding Network for OTDR Signal Analysis, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xueqing Xu, Zhihui Sun, Jiwen Xu, Shaodong Jiang, Shilei Wei, Faxiang Zhang, and Xianlong Liu
Abstract: This paper proposes a multi-task deep neural network architecture for optical time-domain reflectometer (OTDR) signal analysis, enabling end-to-end learning for fiber fault classification and event localization. To address the limitations of traditional methods in complex scenarios, such as insufficient feature representation and conflicts in multi-task optimization, this study designs a multi-scale pooling module to extract cross-scale features, integrates an improved bidirectional feature pyramid network (BiFPN) to enhance multi-resolution feature fusion, and introduces a Conformer hybrid encoding block that combines self-attention and gated convolution to model both global and local features. Additionally, a task-aware dynamic gating mechanism is proposed to mitigate conflicts in multi-objective optimization. Experimental results demonstrate that the proposed model outperforms traditional methods in classification accuracy and fault localization, providing a high-precision, cost-effective solution for optical network monitoring and maintenance.
Keyword: OTDR, Multi-task, Deep Learning, Fault Diagnosis.
Cite@inproceedings{ICIC2025,
author = {Xueqing Xu and Zhihui Sun and Jiwen Xu and Shaodong Jiang and Shilei Wei and Faxiang Zhang and Xianlong Liu},
title = {MS-ConformerNet: A Multi-Scale Joint Encoding Network for OTDR Signal Analysis},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3695-3708},
}
- Rule Augmentation and Perception Smoothing for Training-free Video Anomaly Detection with LLMs, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Dongliang Zhao, Bo Sun, Jun He, Li Yuan, Mingyang Yue, and Zhichao Wu
Abstract: Video Anomaly Detection (VAD) is widely applied in the field of public safety. Recently, training-free video anomaly detection based on large language models (LLMs) has achieved remarkable progress. However, while pre-trained LLMs in previous methods contain rich general-domain knowledge, they often lack a nuanced understanding of domain-specific knowledge, leading to reduced performance in specific scenarios, such as campus environments. Furthermore, these methods often overlook the temporal consistency and motion continuity between anomalous video frames when utilizing LLMs for score judgment. To address these challenges, we propose a method for video anomaly detection using rule augmentation and perception smoothing. Specifically, the rule augmentation strategy can automatically generate anomaly detection rules based on the management standards of various scenarios. Perception smoothing employs an adaptive temporal smoothing strategy to enhance the robustness of score judgment based on LLMs. Extensive experiments demonstrate that the proposed method not only outperforms state-of-the-art, training-free methods on general datasets such as UCF-Crime and XD-Violence, but also achieves significant improvements on the specific scenario dataset ShanghaiTech.
Keyword: Video Anomaly Detection, Large language models, Training-free, Perception Smoothing, Rule augmentation.
Cite@inproceedings{ICIC2025,
author = {Dongliang Zhao and Bo Sun and Jun He and Li Yuan and Mingyang Yue and Zhichao Wu},
title = {Rule Augmentation and Perception Smoothing for Training-free Video Anomaly Detection with LLMs},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2780-2792},
}
- SegMAE: A Dual Decoder Framework with Patch Wise Constraint for Skin Lesion Segmentation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jiacheng Huang, Haozhe Li, Gexian Liu, Gao Wang, and Keming Mao
Abstract: Skin lesion segmentation remains a challenging task in medical image analysis. Although Transformer-based segmentation models have achieved notable progress in recent years, they still suffer from limitations such as the imbalance between local and global modeling, single-task architectural design, and insufficient attention to critical regions. These issues hinder their segmentation performance on complex skin lesion images. To address these challenges, we propose SegMAE, a dual-decoder segmentation framework that integrates image reconstruction and segmentation tasks to jointly enhance the model’s understanding of both global context and local details. The model adopts a CNN-Transformer hybrid encoder, with an MAE decoder for reconstruction and a Cascaded Upsampler for segmentation. To enhance the model’s performance and generalization, we design a two-stage training strategy that first involves pretraining and then proceeds to hybrid multi-task training. In addition, we introduce a Patch-wise Loss function that adaptively emphasizes training on critical regions, thereby improving segmentation accuracy and robustness. Experimental results on ISIC2017, ISIC2018, and PH2 demonstrate that SegMAE consistently outperforms existing mainstream methods across multiple evaluation metrics, showcasing superior segmentation performance and strong generalization capability.
Keyword: Skin Lesion Segmentation, Patch-wise Loss, Hybrid Training Strategy, Dual Decoder Architecture.
Cite@inproceedings{ICIC2025,
author = {Jiacheng Huang and Haozhe Li and Gexian Liu and Gao Wang and Keming Mao},
title = {SegMAE: A Dual Decoder Framework with Patch Wise Constraint for Skin Lesion Segmentation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {299-314},
note = {Poster Volume Ⅰ}
}
- CMTFormer: Contrastive Multi-Scale Transformer for Long-Term Time Series Forecasting, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chenhao Ye, Shuai Zhang, and Guangping Xu
Abstract: Long-term time series forecasting remains challenging due to complex temporal dependencies, diverse data distributions, and computational inefficiencies with extended sequences. We propose CMTFormer, a novel architecture that addresses these limitations through multi-scale temporal modeling and contrastive learning. Our approach combines adaptive trend decomposition across multiple timescales with a representation learning framework that leverages self-attention mechanisms and dilated convolutions. The proposed multi-scale trend decomposition disentangles time series into interpretable components at varying resolutions, while the contrastive learning strategy enhances feature discrimination by differentiating between semantically related and unrelated temporal patterns. Extensive experiments on six real-world benchmarks spanning energy, transportation, weather, finance, and public health domains demonstrate that CMTFormer consistently outperforms state-of-the-art forecasting models.
Keyword: Long-term time series forecasting, multi-scale temporal modeling, contrastive learning, self-attention.
Cite@inproceedings{ICIC2025,
author = {Chenhao Ye and Shuai Zhang and Guangping Xu},
title = {CMTFormer: Contrastive Multi-Scale Transformer for Long-Term Time Series Forecasting},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {36-48},
}
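The multi-scale trend decomposition mentioned in the CMTFormer abstract above can be illustrated with a small sketch. It is a stand-in under stated assumptions, not the paper's method: it uses fixed centered moving averages at a few hypothetical window sizes and averages them with equal weights, whereas the abstract describes an adaptive decomposition whose details are not given here.

```python
def moving_average(series, window):
    """Centered moving average; edge values are repeated so the length is preserved.

    Assumes `window` is odd.
    """
    half = window // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [sum(padded[i:i + window]) / window for i in range(len(series))]


def multi_scale_decompose(series, windows=(3, 7)):
    """Split a series into a multi-scale trend and the remainder around it."""
    # One smoothed copy of the series per timescale.
    trends = [moving_average(series, w) for w in windows]
    # Assumption: scales are combined with equal weights (no learned weighting).
    trend = [sum(values) / len(trends) for values in zip(*trends)]
    remainder = [x - t for x, t in zip(series, trend)]
    return trend, remainder


series = [1.0, 2.0, 4.0, 3.0, 5.0, 7.0, 6.0, 8.0]  # hypothetical observations
trend, remainder = multi_scale_decompose(series)
```

By construction the two components sum back to the input, so a forecasting model can treat the smooth trend and the faster-varying remainder separately.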
- LCAA: Lightweight Convolutional Attention Autoencoder for Acoustic Anomaly Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yuxue Wang and Chenhao Ye
Abstract: Industrial machinery monitoring is pivotal in modern manufacturing, where unexpected equipment failures could incur significant economic and operational costs. In this work, we introduce LCAA, a novel unsupervised framework tailored for acoustic anomaly detection in industrial environments. Our approach synergistically combines convolutional neural networks with multi-head attention mechanisms within a compact autoencoder architecture, enabling the effective capture of both temporal and frequency domain features inherent in acoustic signals. By selectively focusing on the most informative components of the input, the proposed model enhances feature extraction, leading to improved detection accuracy and faster convergence compared to traditional methods. Extensive experiments on multiple benchmark datasets demonstrate that LCAA not only outperforms state-of-the-art baselines in detecting subtle anomalies but also maintains a minimal parameter footprint, thereby facilitating real-time deployment on resource-constrained edge devices. This study contributes a robust and efficient solution for proactive maintenance strategies, promoting enhanced operational reliability and reduced downtime in industrial systems.
Keyword: Acoustic anomaly detection, Convolutional neural networks, Attention mechanisms, Autoencoder, Industrial monitoring
Cite@inproceedings{ICIC2025,
author = {Yuxue Wang and Chenhao Ye},
title = {LCAA: Lightweight Convolutional Attention Autoencoder for Acoustic Anomaly Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {263-274},
}
- Vision Mamba UNet+: an Improved Multi-Organ Segmentation Method Based on State-Space Model, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Song Shen, Haohan Ding, Xiaohui Cui, Yicheng Di, Long Wang, and Wancheng He
Abstract: In the domain of multi-organ segmentation for medical imaging, considerable advancements have been achieved through the application of Convolutional Neural Networks (CNNs) and Transformer-based architectures. While CNNs excel in local feature extraction, their inherently small receptive fields limit their capacity to capture global context. Conversely, Transformers, with their ability to model global dependencies, offer superior performance in this regard, but their computational demands, particularly for high-resolution medical images, present significant challenges. To address these limitations, this study proposes Vision Mamba UNet+, an optimized architecture rooted in the Mamba framework. Vision Mamba UNet+ effectively balances the extraction of both local and global information while substantially reducing computational overhead. The model leverages components from VMamba and Vision Mamba encoders, structured around a ‘U’-shaped encoder-decoder framework that incorporates skip connections and multi-scale feature fusion to maximize performance. Experimental evaluations on the Synapse dataset demonstrate that Vision Mamba UNet+ achieves superior computational efficiency and segmentation accuracy, underscoring its promise for application in complex medical image segmentation tasks.
Keyword: medical image segmentation, vision mamba, state space models
Cite@inproceedings{ICIC2025,
author = {Song Shen and Haohan Ding and Xiaohui Cui and Yicheng Di and Long Wang and Wancheng He},
title = {Vision Mamba UNet+: an Improved Multi-Organ Segmentation Method Based on State-Space Model},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {315-326},
note = {Poster Volume Ⅰ}
}
- TM-SPEECH: END-TO-END TEXT TO SPEECH BASED ON INTEGRATING TRANSFORMER AND MAMBA, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Long Wang, Zichao Deng, Haoke Hou, Song Shen, Wancheng He, and Haohan Ding
Abstract: Text-to-speech synthesis is the process of converting natural language text into speech. In recent years, deep learning has made significant strides in this field. Although the Transformer model is effective at capturing dependencies, its attention mechanism’s quadratic complexity results in longer training times and increased costs. Recent advancements in state-space models (SSMs) have demonstrated impressive performance in modeling long-range dependencies due to their sub-quadratic complexity. Mamba, a notable example of SSMs, exhibits linear time complexity and excels in tasks involving long sequences, similar to those in natural language. In this paper, we propose TM-Speech, which integrates Mamba for modeling long-range dependencies and Transformer for capturing short-range dependencies, thereby reducing model training costs. Comparative experiments show that TM-Speech is almost 2× smaller and 3× faster than FastSpeech2 during training, while also achieving superior inferred audio quality. The code is available at https://github.com/Apolarity886/TMSpeech.
Keyword: text-to-speech, speech synthesis, Transformer, Mamba
Cite@inproceedings{ICIC2025,
author = {Long Wang and Zichao Deng and Haoke Hou and Song Shen and Wancheng He and Haohan Ding},
title = {TM-SPEECH: END-TO-END TEXT TO SPEECH BASED ON INTEGRATING TRANSFORMER AND MAMBA},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1835-1846},
note = {Poster Volume Ⅱ}
}
- User Privacy Leakage in Text-based Recommendation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zhiyong Wu, Hongguang Chen, and Yin Chen
Abstract: With the breakthrough development of large language models, the recommendation system is undergoing a transformation from the traditional recommendation model based on the unique identity (ID) of users/items (pure ID-based model, IDRec) to the recommendation model integrated with a pre-trained modality encoder (modality-based recommendation model, MoRec). This paradigm overcomes the cold start problem and enables the recommendation model to achieve cross-platform migration through pretraining. However, the vector encoded by the modality encoder always contains more information. Given this, a natural question arises: would MoRec suffer from more security issues, which could cause serious leakage of user historical behavior data, compared to IDRec? We aim to explore this question and study a modality-based attack model in the textual field. Specifically, we study several subquestions: (i) which recommendation paradigm, T-MoRec (Textual-MoRec) or IDRec, performs worse in protecting user privacy against attacks by the attack model? (ii) can the latest technical advances from NLP translate into attack improvement for T-MoRec? (iii) are there other factors that affect the textual recommendation attack model, and what is the proper setting to conduct the attack? To answer all these questions, we propose the Text-based Recommendation Attack Model (TRAM) and conduct rigorous experiments with the textual modality. We provide the first empirical study on two public datasets, MIND and EB-NeRD, demonstrating that T-MoRec leads to serious leakage of user historical behavior data compared with IDRec under the same conditions. Additionally, we show that the leakage is consistently influenced by the hyperparameters and training cost in the textual recommendation attack model.
Keyword: Information Security, Recommendation System, Data Mining
Cite@inproceedings{ICIC2025,
author = {Zhiyong Wu and Hongguang Chen and Yin Chen},
title = {User Privacy Leakage in Text-based Recommendation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {853-865},
note = {Poster Volume Ⅰ}
}
- DocHQ: Towards Multi-modal Document Understanding via Hybrid Feature Queries, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jin Wang, Yingying Liu, and Yahong Han
Abstract: Significant progress has been made in general multi-modal tasks leveraging pre-trained visual and language models. However, in visual document understanding tasks, enhancing performance by utilizing existing models encounters difficulties due to the fundamental differences between natural and document images. In this paper, we introduce DocHQ, a multi-modal document image understanding model with pre-trained visual and language models, employing a hybrid feature query for feature alignment between document visual information and language text. Our approach combines learnable and fixed task-oriented queries within a cross-attention visual-language alignment module to extract more fine-grained information from document images. Moreover, we utilize large-scale document images for alignment training between the pre-trained image encoder and the language model. Experimental results demonstrate that our method achieves outstanding performance across three different types of document image understanding tasks compared to existing approaches.
Keyword: Document Image Understanding, Multi-modal Feature Alignment, Document Pretrain model.
Cite@inproceedings{ICIC2025,
author = {Jin Wang and Yingying Liu and Yahong Han},
title = {DocHQ: Towards Multi-modal Document Understanding via Hybrid Feature Queries},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {327-339},
note = {Poster Volume Ⅰ}
}
- GSE-MN4: Group-Shared Exponents Integer Quantization for MobileNetV4, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shenglin Yang, Zhuo Han, Hahn Yuan, and Yanmei Hu
Abstract: This paper introduces Group-Shared Exponents Integer Quantization for MobileNetV4, a novel quantization framework tailored for efficient deployment of deep learning models on resource-constrained edge devices. Our method employs the Group-Shared Exponents (GSE) format, which shares exponents among groups of parameters and quantizes mantissas under a shared exponent constraint, significantly reducing memory overhead compared to traditional quantization techniques. Furthermore, we introduce an automated mixed-precision quantization scheme that allocates bit-widths based on layer sensitivity, thereby assigning each layer an optimal quantization bit-width. This strategy effectively optimizes the trade-off between accuracy and efficiency. Extensive experiments on the ImageNet1K dataset demonstrate that GSE-MN4 outperforms conventional quantization methods. For instance, the GSE-MIX quantization method on MNv4-Conv-S achieves a Top-1 accuracy of 73.34% with a memory footprint of only 3.23 MB, maintaining high accuracy while substantially reducing memory usage. Our work highlights the potential of GSE-INT for efficient and accurate deployment of deep learning models in mobile and edge scenarios.
Keyword: Post-Training Quantization, MobileNetV4
Cite@inproceedings{ICIC2025,
author = {Shenglin Yang and Zhuo Han and Hahn Yuan and Yanmei Hu},
title = {GSE-MN4: Group-Shared Exponents Integer Quantization for MobileNetV4},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {340-352},
note = {Poster Volume Ⅰ}
}
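The idea of sharing one exponent across a group of parameters while quantizing each mantissa to an integer, as described in the GSE-MN4 abstract above, can be sketched as follows. This is a generic block-floating-point style illustration under stated assumptions: the function names, the power-of-two step rule, and the signed symmetric mantissa range are hypothetical choices, and the exact GSE format and the mixed-precision allocation are not reproduced.

```python
import math


def quantize_group(values, mantissa_bits):
    """Quantize one group to signed integer mantissas with a single shared exponent."""
    qmax = 2 ** (mantissa_bits - 1) - 1
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return 0, [0] * len(values)
    # Smallest power-of-two step for which the largest value fits in [-qmax, qmax].
    exponent = math.ceil(math.log2(peak / qmax))
    step = 2.0 ** exponent
    # Clamp defensively in case of floating-point rounding at the boundary.
    mantissas = [max(-qmax, min(qmax, round(v / step))) for v in values]
    return exponent, mantissas


def dequantize_group(exponent, mantissas):
    """Reconstruct approximate values from the shared exponent and the mantissas."""
    step = 2.0 ** exponent
    return [m * step for m in mantissas]


group = [0.5, -0.25, 0.1, 0.0]  # hypothetical weights in one group
exponent, mantissas = quantize_group(group, mantissa_bits=8)
restored = dequantize_group(exponent, mantissas)
```

Only one exponent is stored per group, so the per-parameter cost is just the mantissa bits; the rounding error of each restored value is at most half of the shared step.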
- ME-GCN: Motif-Enhanced Graph Convolutional Network for Recommendation Systems, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jianmin Xu and Ping Lu
Abstract: Recommendation systems play a key role in helping users cope with the vast amount of online information, and graph convolutional network (GCN)-based methods have attracted much attention due to their ability to model complex relationships in user-item interaction graphs. However, existing GCN models mainly focus on direct connections between nodes, ignoring the potential value of higher-order structural patterns such as triangles. In this paper, we propose a motif-enhanced graph convolutional network (ME-GCN) to improve recommendation performance by explicitly leveraging the triangle patterns in the user-item interaction graph. Specifically, we design an efficient sparse matrix algorithm to compute the triangle participation of nodes and integrate it into the node embedding via a learnable projection mechanism, which enhances the motif capability of higher-order structural patterns while retaining the simple architecture of GCN. Experiments on three public datasets (MovieLens-1M, Amazon-Books, and Yelp2018) show that ME-GCN significantly outperforms existing benchmark models, especially in sparse data scenarios (up to 7.47%). Ablation experiments further verify the importance of the triangular model, whose contribution far exceeds simple structural features such as the first-order node degree.
Keyword: Graph Convolutional networks, Recommendation systems, Motif-enhanced, Triangular structures, Sparse matrix computation.
Cite@inproceedings{ICIC2025,
author = {Jianmin Xu and Ping Lu},
title = {ME-GCN: Motif-Enhanced Graph Convolutional Network for Recommendation Systems},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2470-2486},
note = {Poster Volume Ⅱ}
}
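The per-node triangle participation referred to in the ME-GCN abstract above can be illustrated on a generic undirected graph. This is a minimal sketch, not the paper's sparse matrix algorithm: it stores the graph as adjacency sets (a sparse representation) and computes the same quantity as the diagonal of A³ divided by 2, where A is the adjacency matrix. How triangles are defined on the user-item graph is not stated in the abstract, so the example graph and function name are hypothetical.

```python
def triangle_participation(edges):
    """Return {node: number of triangles containing that node} for an undirected graph."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    counts = {}
    for v, nbrs in neighbors.items():
        # Each triangle through v is seen twice, once from each of its other two corners.
        paired = sum(len(nbrs & neighbors[u]) for u in nbrs)
        counts[v] = paired // 2
    return counts


# Hypothetical graph: one triangle a-b-c plus a pendant node d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
participation = triangle_participation(edges)
```

The resulting counts could then serve as a scalar structural feature per node, for example passed through a learnable projection before being combined with the node embedding.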
- Data Augmentation via Bit-Plane Manipulation for Object Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Changcheng Lu, Songjie Du, Weiguo Pan, Bingxin Xu, and Nuoya Li
Abstract: Current object detection algorithms based on deep learning heavily depend on a substantial amount of annotated data for model training. High-quality datasets are crucial in addressing challenges such as overfitting. However, collecting large amounts of annotated data is challenging in certain fields. To mitigate this limitation, this paper introduces a data augmentation method based on low-bit plane manipulation. Specifically, this paper employs selected data augmentation methods by processing the low bit planes of the annotated regions in images. This can modify the low-frequency information of the images while minimizing significant visual changes, which is crucial for tasks that depend on high-quality images. During the bit-plane combination process, the augmented image data is obtained through the combination of different bit planes, thereby increasing the diversity of training data. The effectiveness of the proposed method is validated on existing object detection and classification methods, demonstrating notable performance improvements on the public datasets voc2007, voc2012, and kitti2D. These results demonstrate its applicability to object detection and classification tasks that require high-quality input images, enhancing the performance of the algorithms. The code and data can be found at https://github.com/cjjhf/Data_augmentation.
Keyword: Data Augmentation, Bit-Plane Manipulation, Object Detection.
Cite@inproceedings{ICIC2025,
author = {Changcheng Lu and Songjie Du and Weiguo Pan and Bingxin Xu and Nuoya Li},
title = {Data Augmentation via Bit-Plane Manipulation for Object Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {353-369},
note = {Poster Volume Ⅰ}
}
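The bit-plane idea described in the abstract above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`split_bit_planes`, `merge_bit_planes`, `augment_low_planes`), the choice of the two lowest planes, and the use of random bit flips as the augmentation are all assumptions made for illustration on a tiny grayscale image.

```python
import random

def split_bit_planes(pixel_rows):
    """Return 8 binary planes (index 0 = least significant bit) of an 8-bit image."""
    return [[[(p >> k) & 1 for p in row] for row in pixel_rows] for k in range(8)]

def merge_bit_planes(planes):
    """Recombine 8 binary planes into 8-bit pixel values."""
    height, width = len(planes[0]), len(planes[0][0])
    return [[sum(planes[k][y][x] << k for k in range(8)) for x in range(width)]
            for y in range(height)]

def augment_low_planes(pixel_rows, box, low_planes=(0, 1), seed=0):
    """Randomly flip bits of the chosen low planes inside box = (x0, y0, x1, y1).

    Hypothetical augmentation: the abstract does not say which operations
    are applied to the low bit planes.
    """
    rng = random.Random(seed)
    planes = split_bit_planes(pixel_rows)
    x0, y0, x1, y1 = box
    for k in low_planes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                if rng.random() < 0.5:
                    planes[k][y][x] ^= 1
    return merge_bit_planes(planes)

# Toy 4x4 grayscale image; the annotated region is the top-left 2x2 block.
image = [[200, 201, 50, 51],
         [198, 202, 49, 52],
         [10, 12, 240, 238],
         [11, 13, 241, 239]]
augmented = augment_low_planes(image, box=(0, 0, 2, 2))
max_change = max(abs(a - b)
                 for row_a, row_b in zip(augmented, image)
                 for a, b in zip(row_a, row_b))
```

Because only planes 0 and 1 are touched, no pixel can change by more than 1 + 2 = 3 gray levels, which is why edits to low bit planes stay visually subtle.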
- Scientific Literature Retrieval and Recommendation Model Based on RoBERTa and SASRec, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yuhui Zhang
Abstract: With the rapid growth in the quantity and variety of scientific literature, efficiently retrieving and recommending relevant documents for researchers has become a challenge. This paper proposes a scientific literature retrieval and recommendation model integrating the Robustly Optimized BERT Pretraining Approach (RoBERTa) and Self-Attentive Sequential Recommendation (SASRec). By incorporating semantic feature information extracted by the RoBERTa model and domain category information predicted by the FastBERT model, and by combining traditional self-attention sequence recommendation models with proxy attention mechanisms and learnable filtering encoders, the model effectively captures the long-term dependencies of user behavior. This enhances the accuracy of scientific literature retrieval and recommendation. Experimental results demonstrate that the proposed model outperforms traditional methods in retrieval and recommendation accuracy, personalization, and efficiency.
Keyword: Literature Retrieval and Recommendation, RoBERTa, FastBERT, SASRec, Sequential Recommendation
Cite@inproceedings{ICIC2025,
author = {Yuhui Zhang},
title = {Scientific Literature Retrieval and Recommendation Model Based on RoBERTa and SASRec},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3602-3613},
}
- An LLM-empowered General Workflow for Legal Case Analysis: A Case Study on Elderly Laborer Protection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yuting Wang, Runliang Niu, Xingyuan Min, Nanfei Gu, Qianli Xing, and Qi Wang
Abstract: The emergence of LLMs has revolutionized the field of legal case analysis. Existing research primarily focuses on specific issues, e.g., legal view generation and case retrieval, neglecting the universal ability of legal semantic features within texts to address multiple issues. In this paper, we develop a novel general workflow utilizing LLMs for diverse legal case analysis tasks, uncovering implicit information in legal case datasets. Specifically, the workflow involves three steps: (1) legal experts first define a fine-grained element framework for legal cases; (2) LLMs then extract these elements from documents and convert them into structured tables, aiming to capture the especially meaningful information contained in each document; (3) various questions of interest to legal experts can be addressed by selecting and analyzing relevant elements. Benefiting from LLMs' knowledge and ability in understanding and processing text, element annotation becomes scalable, allowing our workflow to handle general legal intelligence tasks. We validate the feasibility and effectiveness of our workflow on the Elderly Laborer Protection issue as a case study, exploring the factors affecting judgment outcomes from a causal perspective. Our fine-grained legal case dataset, annotated at the document level by legal experts, and our easy-to-use workflow tools are available at https: anonymous.4open.science' LegalCaseAnalysis-814F .
Keyword: Large Language Models, Legal Case Analysis, General Workflow.
Cite@inproceedings{ICIC2025,
author = {Yuting Wang and Runliang Niu and Xingyuan Min and Nanfei Gu and Qianli Xing and Qi Wang},
title = {An LLM-empowered General Workflow for Legal Case Analysis: A Case Study on Elderly Laborer Protection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {745-758},
}
- Ancient Chinese Character Image Retrieval Based on Self-Attention Mechanism and Multi-Scale Feature Fusion, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Ye Yang
Abstract: The diverse fonts and forms of ancient Chinese characters and the significant differences in glyphs present great challenges for retrieving ancient Chinese character images. In this study, we propose a Chinese character image retrieval model based on a self-attention mechanism and multi-scale feature fusion (SAMSFF) to improve the feature extraction ability and accuracy for Chinese character images in ancient documents. Firstly, an improved inverted residual module called HardFused IB is constructed using the optimized SE attention mechanism to obtain enhanced features of key information. Secondly, the static dynamic context fusion module is used to fully exploit the context information between adjacent keys to improve the expressiveness and representativeness of the output features. Finally, the bilinear multi-scale feature fusion module (BMSFblock) is constructed to perform an adaptive fusion of the multi-layer features extracted by the designed network. The network measures the Euclidean distance between the query image and the candidate images, then sorts and returns the most relevant results. The mAP@-1 of the proposed retrieval method on the ancient Chinese character image dataset is 0.932. Experimental results show that the model can effectively extract the features of ancient Chinese character images, improves retrieval accuracy, and has certain advantages in ancient Chinese character image retrieval.
Keyword: Ancient Chinese Character images, Image Retrieval, Self-Attention Mechanism, Multi-Scale Feature Fusion.
Cite@inproceedings{ICIC2025,
author = {Ye Yang},
title = {Ancient Chinese Character Image Retrieval Based on Self-Attention Mechanism and Multi-Scale Feature Fusion},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {370-381},
note = {Poster Volume Ⅰ}
}
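The final retrieval step mentioned in the abstract above (rank candidates by Euclidean distance to the query) can be sketched as follows. The feature vectors, the gallery names, and the helper names `euclidean` and `rank_candidates` are hypothetical; in the paper the features would come from the SAMSFF network, which is not reproduced here.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_candidates(query, candidates, top_k=3):
    """Return the names of the top_k candidates closest to the query."""
    scored = sorted(candidates.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in scored[:top_k]]

# Made-up 3-dimensional features standing in for network embeddings.
gallery = {
    "glyph_a": [0.9, 0.1, 0.0],
    "glyph_b": [0.0, 1.0, 0.2],
    "glyph_c": [0.8, 0.2, 0.1],
}
ranking = rank_candidates([1.0, 0.0, 0.0], gallery, top_k=2)
```

Smaller distance means higher relevance, so the list is sorted in ascending order of distance.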
- Novel Defect Detection for a Badminton Shuttlecock Based on Improved YOLOv8 with RepVGGBlock, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yujie Li, Xin Li, Qiuyuan Gan, and Benying Tan
Abstract: In the secondary recycling of badminton shuttlecocks, manual selection is easily affected by human factors such as fatigue and attention lapses. This leads to low efficiency, making it impossible to meet large-scale refurbishment needs. This paper proposes YOLOv8n_RepVGG, an enhanced badminton shuttlecock classification method based on a modified YOLOv8n architecture. We use RepVGGBlock modules in the backbone network to improve the model's representational capacity. Compared to baseline YOLOv8, the proposed model achieves a precision of 91.1%, an increase of 3.1%, and a mean average precision (mAP50) of 87.2%, an increase of 1.2%. The proposed approach not only advances badminton shuttlecock reconditioning processes but also contributes to global sustainability initiatives through enhanced resource optimization.
Keyword: Defect detection for badminton shuttlecock, YOLO, RepVGGBlock, Data augmentation, Deep learning.
Cite@inproceedings{ICIC2025,
author = {Yujie Li and Xin Li and Qiuyuan Gan and Benying Tan},
title = {Novel Defect Detection for a Badminton Shuttlecock Based on Improved YOLOv8 with RepVGGBlock},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {382-395},
note = {Poster Volume Ⅰ}
}
- Efficient and Verifiable Privacy-Preserving Convolutional Computation for CNN Inference with Untrusted Clouds, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jinyu Lu, Xinrong Sun, Yunting Tao, Tong Ji, Fanyu Kong, and Guoqiang Yang
Abstract: The widespread adoption of convolutional neural networks (CNNs) in resource-constrained scenarios has driven the development of Machine Learning as a Service (MLaaS) systems. However, this approach is susceptible to privacy leakage, as the data sent from the client to the untrusted cloud server often contains sensitive information. Existing CNN privacy-preserving schemes, while effective in ensuring data confidentiality through homomorphic encryption and secret sharing, face efficiency bottlenecks, particularly in convolution operations. In this paper, we propose a novel verifiable privacy-preserving scheme tailored for CNN convolutional layers. Our scheme enables efficient encryption and decryption, allowing resource-constrained clients to securely offload computations to the untrusted cloud server. Additionally, we present a verification mechanism capable of checking the correctness of the results with a success probability of at least 1-1/Z. Extensive experiments conducted on 10 datasets and various CNN architectures demonstrate that our scheme achieves speedups ranging from 26× to 87× compared to the original plaintext model while maintaining accuracy.
Keyword: Privacy-preserving, Convolutional Neural Network, MLaaS, Verifiable Computation
Cite@inproceedings{ICIC2025,
author = {Jinyu Lu and Xinrong Sun and Yunting Tao and Tong Ji and Fanyu Kong and Guoqiang Yang},
title = {Efficient and Verifiable Privacy-Preserving Convolutional Computation for CNN Inference with Untrusted Clouds},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {866-881},
note = {Poster Volume Ⅰ}
}
- Efficient Delegated Multi-Party Private Set Intersection Protocol for Large-Scale Datasets, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Ou Ruan, Huiwen Miao, and Changwang Yan
Abstract: Private set intersection (PSI), a core research direction in modern cryptography, can accurately obtain the intersection of multiple parties' data while ensuring the confidentiality of each participant's original set. With the advantages of cloud platforms in storage and computation, cloud-based PSI schemes are attracting more and more attention. Based on the Paillier homomorphic encryption algorithm and a pseudo-random function, this paper proposes an efficient delegated multi-party private set intersection protocol suitable for large-scale datasets. The protocol transforms the dataset intersection problem into a polynomial root-finding problem and uses random polynomial blinding and homomorphic encryption techniques to ensure the security of the protocol. We give a rigorous formal security proof of the protocol and implement it in the C++ programming language. The experimental analysis demonstrates the following advantages: (a) the protocol is more suitable for large-scale dataset scenarios than the related protocols: the time complexity of our server is [...] while it is [...] in other protocols, where d is the size of the dataset; (b) our protocol is much more efficient: the running time for clients of our protocol is 1/3 of that of the others; (c) the protocol does not depend on the existence of a secure channel, whereas the compared protocols do.
Keyword: Private Set Intersection, Cloud Delegation, Set Polynomial Representation, Large-Scale Datasets
Cite@inproceedings{ICIC2025,
author = {Ou Ruan and Huiwen Miao and Changwang Yan},
title = {Efficient Delegated Multi-Party Private Set Intersection Protocol for Large-Scale Datasets},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {882-897},
note = {Poster Volume Ⅰ}
}
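The reduction of set intersection to polynomial roots mentioned in the abstract above can be sketched in plain Python. This is an illustration of the underlying algebra only, not the protocol itself: Paillier encryption, the pseudo-random function, and the cloud delegation are all omitted, and the blinding here uses random non-zero scalars as a simplified stand-in for the random polynomial blinding named in the abstract. The field modulus `PRIME` and all function names are assumptions.

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1, a Mersenne prime; the field size is an assumption

def poly_from_roots(elements):
    """Coefficients (lowest degree first) of prod (x - e) modulo PRIME."""
    coeffs = [1]
    for e in elements:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - e * c) % PRIME      # contribution of -e * c * x^i
            nxt[i + 1] = (nxt[i + 1] + c) % PRIME  # contribution of c * x^(i+1)
        coeffs = nxt
    return coeffs

def evaluate(coeffs, x):
    """Horner evaluation of the polynomial at x, modulo PRIME."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % PRIME
    return result

def blinded_sum(poly_a, poly_b, rng):
    """Return r_a * P_A + r_b * P_B for random non-zero scalars r_a and r_b."""
    r_a = rng.randrange(1, PRIME)
    r_b = rng.randrange(1, PRIME)
    size = max(len(poly_a), len(poly_b))
    padded_a = poly_a + [0] * (size - len(poly_a))
    padded_b = poly_b + [0] * (size - len(poly_b))
    return [(r_a * a + r_b * b) % PRIME for a, b in zip(padded_a, padded_b)]

set_a = [3, 17, 42, 99]
set_b = [5, 17, 99, 1000]
combined = blinded_sum(poly_from_roots(set_a), poly_from_roots(set_b), random.Random(7))

# Party A tests its own elements: P_A(e) = 0 there, so the blinded sum
# vanishes exactly when P_B(e) = 0 as well, i.e. when e is in both sets.
intersection = [e for e in set_a if evaluate(combined, e) == 0]
```

Each set is encoded as the polynomial whose roots are its elements, so a shared element is a common root of every party's polynomial and therefore a root of any blinded combination of them.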
- FusionCLIP-AD: Hierarchical Global-Local Adaptation with Learnable Embeddings for Robust Medical Image Anomaly Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Hongwei Li and S. Kevin Zhou
Abstract: Although CLIP-based few-shot learning has shown promise in anomaly detection, it still exhibits notable limitations in medical imaging applications: fixed prompt mechanisms are difficult to adapt finely to domain differences, and the lack of collaborative modeling between local and global features results in a loss of holistic information. This paper proposes a novel hierarchical adaptation framework with two components: (1) integration of global and local features to effectively capture potential details and comprehensive information in medical images, and (2) multilevel learnable anomaly prompts dynamically constructed in the embedding space. By learning fused features and prompts across different layers, the model flexibly and accurately addresses complex scenarios in medical imaging. Experimental results demonstrate that the proposed method significantly enhances CLIP's few-shot learning performance in medical image anomaly detection tasks. Our method achieves state-of-the-art performance on LiverCT with 85.55% AUROC under 4-shot settings, surpassing prior art such as MVFA (81.18%).
Keyword: Vision-Language Model, Few-Shot Anomaly Detection, Medical Image Analysis
Cite@inproceedings{ICIC2025,
author = {Hongwei Li and S. Kevin Zhou},
title = {FusionCLIP-AD: Hierarchical Global-Local Adaptation with Learnable Embeddings for Robust Medical Image Anomaly Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2796-2809},
}
- TMN: Bridging Modality Gap via Transition Modality Network for Visible-Infrared Person Re-Identification, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Mengzhe Wang and Yuhao Wang
Abstract: The visible-infrared person re-identification (VI-ReID) task aims to match visible and infrared pedestrian images. However, due to the modality gap, VI-ReID faces serious technical challenges. Existing methods have made significant progress but still suffer from two limitations: unsupervised image generation methods are computationally intensive and may introduce additional noise; feature-level alignment struggles with designing effective loss functions for complex and abstract feature outputs, resulting in insufficient learning and constraint of the model. To address these issues, we propose a Transition Modality Network (TMN), which constructs a transitional modality between the two modalities, enabling early-stage cross-modality interaction at shallow network layers and thereby avoiding heavy computation and complex loss function design. First, the processed visible and infrared features are input into the Visible-infrared Transition Modality Fusion module (VI-TMF) to construct the transition modality. Second, we embed the Grouped Spatial-Channel Excitation block (GSCE) into ResNet-50 for deep feature processing and extraction. Finally, we design a cross-modality bridging loss function to align the features of the three modalities. In experiments on two benchmark datasets, TMN achieves Rank-1/mAP accuracy of 71.42%/65.91% on the SYSU-MM01 dataset and 92.14%/83.25% on the RegDB dataset, demonstrating that transition modality construction effectively bridges cross-modality discrepancies and establishes a novel paradigm for addressing the fundamental challenges in VI-ReID tasks.
Keyword: Visible-infrared person re-identification, Transition Modality, Feature interaction and fusion
Cite@inproceedings{ICIC2025,
author = {Mengzhe Wang and Yuhao Wang},
title = {TMN: Bridging Modality Gap via Transition Modality Network for Visible-Infrared Person Re-Identification},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1943-1959},
note = {Poster Volume Ⅱ}
}
- FDFE-Net: Frequency Domain Feature Enhancement Network for Infrared Small Target Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Haoyu Zuo, Xincheng Zhang, Zhou Yang, Jiazhen Huang, and Xu Wang
Abstract: Infrared small target detection (IRSTD) encounters challenges due to the tiny sizes of targets and interference from complex backgrounds. To overcome these issues, this paper proposes a novel Frequency Domain Feature Enhancement Network (FDFE-Net). The proposed network significantly improves the detection accuracy and robustness for IRSTD by integrating the micro-scale feature encoder (MSF Encoder) and the frequency domain feature enhancement (FDFE) module. Specifically, the MSF Encoder combines parallel feature extraction and feature enhancement modules to effectively capture multi-scale feature information, thus mitigating information loss. The FDFE module introduces frequency domain features via the Haar wavelet transform, enhancing the semantic differences between targets and backgrounds, thereby improving the distinguishability of small targets. Experimental results on three public datasets, NUAA-SIRST, NUDT-SIRST, and IRSTD-1K, demonstrate that the proposed FDFE-Net outperforms several state-of-the-art IRSTD methods across multiple evaluation metrics.
Keyword: Infrared small target detection, Deep learning, Frequency domain feature, Haar wavelet transform.
Cite@inproceedings{ICIC2025,
author = {Haoyu Zuo and Xincheng Zhang and Zhou Yang and Jiazhen Huang and Xu Wang},
title = {FDFE-Net: Frequency Domain Feature Enhancement Network for Infrared Small Target Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {396-410},
note = {Poster Volume Ⅰ}
}
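The Haar wavelet transform that the FDFE module relies on can be illustrated with a one-level 2-D decomposition in pure Python. This is only a sketch of the standard orthonormal transform on 2x2 blocks; how FDFE-Net actually feeds the sub-bands into the network is not described in the abstract, and the function name `haar2d_level1` and the sub-band names are assumptions.

```python
def haar2d_level1(image):
    """One-level orthonormal 2-D Haar transform (even height and width required).

    Returns four sub-bands: approximation, horizontal difference,
    vertical difference, and diagonal difference.
    """
    approx, horiz, vert, diag = [], [], [], []
    for y in range(0, len(image), 2):
        row_a, row_h, row_v, row_d = [], [], [], []
        for x in range(0, len(image[0]), 2):
            a, b = image[y][x], image[y][x + 1]
            c, d = image[y + 1][x], image[y + 1][x + 1]
            row_a.append((a + b + c + d) / 2)  # low-pass in both directions
            row_h.append((a - b + c - d) / 2)  # left column minus right column
            row_v.append((a + b - c - d) / 2)  # top row minus bottom row
            row_d.append((a - b - c + d) / 2)  # diagonal minus anti-diagonal
        approx.append(row_a)
        horiz.append(row_h)
        vert.append(row_v)
        diag.append(row_d)
    return approx, horiz, vert, diag

# Flat background (10) with a single bright "small target" pixel (50).
image = [[10, 10, 10, 10],
         [10, 50, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]]
approx, horiz, vert, diag = haar2d_level1(image)

def energy(grid):
    """Sum of squared values in a 2-D grid."""
    return sum(v * v for row in grid for v in row)
```

On this toy image the detail sub-bands are zero everywhere except in the block containing the bright pixel, which is the property that makes small targets stand out against a smooth background.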
- Cache Optimization in Consortium Blockchain System Based on GCN and XGBoost, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Ao Xiong, Wenchuan Ma, Yan Zhang, Zhe Du, Xuwen Liu, and He Huang
Abstract: With the rapid development of the global carbon market, the application of consortium blockchain technology in carbon trading and carbon neutrality management has become increasingly widespread, ensuring the security and transparency of transaction data. However, as the transaction scale continues to expand, the on-chain storage capacity and query efficiency of blockchain systems have gradually become key factors limiting system performance. Although traditional off-chain storage solutions have alleviated the pressure on on-chain storage to some extent, high-concurrency access scenarios still face the challenge of high query latency. While in-memory caching methods can improve query speed, they are often limited by memory capacity, making it difficult to meet the query response time requirements when dealing with large amounts of data. To address these issues, this paper proposes a cache optimization strategy based on transaction access prediction, combining Graph Convolutional Networks (GCN) and Extreme Gradient Boosting (XGBoost). The method first constructs a relationship graph of transaction data using GCN and extracts the structural features of transaction nodes. It then combines the XGBoost model to predict access frequency and dynamically adjusts the cache replacement strategy. Experimental results show that, compared to traditional algorithms, the proposed method significantly improves cache hit rates and optimizes query performance in high-concurrency trading environments. This study provides an intelligent optimization solution for cache management in blockchain trading systems, which is of great significance for improving the operational efficiency of the carbon trading market.
Keyword: Blockchain, Cache Optimization, Graph Convolutional Networks
Cite@inproceedings{ICIC2025,
author = {Ao Xiong and Wenchuan Ma and Yan Zhang and Zhe Du and Xuwen Liu and He Huang},
title = {Cache Optimization in Consortium Blockchain System Based on GCN and XGBoost},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {63-78},
}
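The replacement policy outlined in the abstract above (evict according to predicted access frequency) might look like the following sketch. The GCN feature extractor and the XGBoost predictor are not reproduced: a fixed dictionary of made-up predicted frequencies stands in for the model output, and the class name `PredictiveCache` and its admission rule are assumptions rather than the authors' design.

```python
class PredictiveCache:
    """Fixed-capacity cache that evicts the entry with the lowest predicted frequency."""

    def __init__(self, capacity, predicted_frequency):
        self.capacity = capacity
        # key -> predicted access frequency (stand-in for a learned predictor)
        self.predicted_frequency = predicted_frequency
        self.store = {}

    def _score(self, key):
        # Keys without a prediction are treated as never accessed.
        return self.predicted_frequency.get(key, 0.0)

    def get(self, key):
        """Return the cached value, or None on a miss."""
        return self.store.get(key)

    def put(self, key, value):
        """Insert a value; return False if the item is not admitted."""
        if key not in self.store and len(self.store) >= self.capacity:
            victim = min(self.store, key=self._score)
            if self._score(key) <= self._score(victim):
                return False  # predicted colder than everything already cached
            del self.store[victim]
        self.store[key] = value
        return True

cache = PredictiveCache(capacity=2,
                        predicted_frequency={"tx1": 0.9, "tx2": 0.1, "tx3": 0.5})
cache.put("tx1", "record-1")
cache.put("tx2", "record-2")
cache.put("tx3", "record-3")              # evicts tx2, the coldest entry
admitted = cache.put("tx4", "record-4")   # no prediction for tx4, so it is rejected
```

Unlike LRU, which only looks at past accesses, this policy ranks entries by a forecast, so an item that was touched recently but is predicted to go cold can still be evicted first.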
- Omnidirectional Image Quality Assessment with TransVGG and Fused Saliency Guidance, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xican Tan, Jing Yu, Keke Tong, Shengfeng Lou, Jingsong Meng, Chuang Ma, and Wenzhi Chen
Abstract: Most existing omnidirectional image quality assessment (OIQA) models focus on locally salient regions within viewports, neglecting the critical guiding role of global saliency in holistic quality evaluation. This limitation restricts their performance when processing complex images. To tackle this, we propose a TransVGG-based OIQA framework guided by fused saliency maps. First, the SalBiNet360 network is employed to generate fused saliency maps that combine local and global saliency information, simulating human viewing behavior during omnidirectional image observation. Then a collaborative architecture integrating Swin-Transformer and VGG is designed to synergistically extract global and local features, thereby resolving the insufficiency of diverse guidance information. To enhance long-sequence data processing, the Mamba model is utilized for efficient omnidirectional image comprehension. Then a parallel hybrid attention mechanism is introduced to retrieve semantic features from the saliency features and guide the global understanding module. Experiments carried out on two OIQA datasets demonstrate that the proposed model outperforms advanced methods.
Keyword: Omnidirectional Image Quality Assessment, Parallelized Channel-and-spatial Attention Mechanism, TransVGG, Fused Saliency Guidance
Cite@inproceedings{ICIC2025,
author = {Xican Tan and Jing Yu and Keke Tong and Shengfeng Lou and Jingsong Meng and Chuang Ma and Wenzhi Chen},
title = {Omnidirectional Image Quality Assessment with TransVGG and Fused Saliency Guidance},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {411-422},
note = {Poster Volume Ⅰ}
}
- Research on Personalized Recommendation System for Crop Cultivation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Minmin Wang, Chen Dong, Yiran Liu, and Yuehong Lin
Abstract: This paper introduces a personalized crop recommendation system using ensemble learning and a collaborative filtering algorithm to tackle traditional cultivation's reliance on experience and low economic returns. A soft voting ensemble model combining KNN, SVM, and RF boosts recommendation accuracy to 99.13% and alleviates the cold start issue. An Intelligent Integrated Scoring Mechanism merges collaborative filtering scores with market price scores in a 1:1 ratio, producing a ranked crop list and an Intelligent Integrated Recommendation Score, further increasing accuracy to 99.27% and achieving Pareto optimality between yield and economic benefits. Experiments show the system improves the F1 score by 7.2% and 2.1% over KNN and SVM baselines, respectively, and raises the NDCG metric by 16% compared to the collaborative filtering algorithm, enhancing recommendation quality and farmers' economic outcomes.
Keyword: Crop Cultivation Recommendation, Soft Voting Ensemble Model, Intelligent Integrated Scoring Mechanism, Cold Start, Pareto Optimal Cultivation
Cite@inproceedings{ICIC2025,
author = {Minmin Wang and Chen Dong and Yiran Liu and Yuehong Lin},
title = {Research on Personalized Recommendation System for Crop Cultivation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {49-62},
}
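The two scoring steps named in the abstract above, soft voting and the 1:1 score merge, can be sketched without any machine-learning library. The probability tables below are invented numbers standing in for the outputs of trained KNN, SVM, and RF models, and the function names `soft_vote` and `fuse_scores` as well as the crop names and scores are hypothetical.

```python
def soft_vote(prob_tables):
    """Average the per-class probabilities predicted by several classifiers."""
    classes = prob_tables[0].keys()
    return {c: sum(table[c] for table in prob_tables) / len(prob_tables)
            for c in classes}

def fuse_scores(cf_scores, price_scores):
    """Equal-weight (1:1) merge of two score tables, ranked from high to low."""
    fused = {crop: 0.5 * cf_scores[crop] + 0.5 * price_scores[crop]
             for crop in cf_scores}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Made-up class probabilities from three hypothetical models.
knn = {"rice": 0.6, "maize": 0.3, "cotton": 0.1}
svm = {"rice": 0.5, "maize": 0.4, "cotton": 0.1}
rf = {"rice": 0.2, "maize": 0.7, "cotton": 0.1}
averaged = soft_vote([knn, svm, rf])
predicted = max(averaged, key=averaged.get)

# Made-up collaborative filtering and market price scores, both in [0, 1].
ranked = fuse_scores({"rice": 0.8, "maize": 0.6, "cotton": 0.4},
                     {"rice": 0.2, "maize": 0.7, "cotton": 1.0})
ranked_crops = [crop for crop, _ in ranked]
```

In this toy case a majority (hard) vote would pick rice, since two of the three models rank it first, while soft voting picks maize because the third model is far more confident; that sensitivity to confidence is the usual reason for preferring soft voting.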
- FMLTC-IDS: A Federated Meta-Learning and Adaptive Time Clustering-based IoT Intrusion Detection System, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jingxian Zhou, Zhou Liu, and Qiang Zhu
Abstract: With the rapid proliferation of IoT devices, network security threats have intensified. Federated Learning (FL) has been applied to anomaly-based Network Intrusion Detection Systems (NIDS) to identify malicious traffic and mitigate risks. However, traditional FL struggles to handle non-IID data, and while Personalized FL (PFL) improves adaptability, it remains insufficient in addressing the dynamic nature of time-series data. To address these issues, this paper proposes an IoT Intrusion Detection System based on Federated Meta-Learning and Adaptive Temporal Clustering (FMLTC-IDS). The method uses Model-Agnostic Meta-Learning (MAML) to optimize the initialization of the global model, enhancing personalized adaptation. It also introduces adaptive batch adjustment and gradient-weighted sampling strategies to improve local training efficiency. Additionally, Principal Component Analysis (PCA) is used for dimensionality reduction, and a time-series-based dynamic weighted DBA-K-Means clustering method is employed to optimize model clustering quality, enhancing the system's ability to handle spatiotemporal non-IID data. Experimental results show that FMLTC-IDS achieves excellent performance on CICIDS2017, BoT-IoT, and real IoT traffic datasets, outperforming existing methods (e.g., Fed-ANIDS, SSFL) by more accurately adapting to data heterogeneity, improving accuracy, recall, and F1-score, and accelerating model convergence. Furthermore, ablation experiments validate the effectiveness of dynamic batch adjustment, PCA dimensionality reduction, and time-series clustering strategies, demonstrating significant advantages in enhancing the personalized detection capabilities and overall detection accuracy of FMLTC-IDS.
Keyword: IoT Security, Personalized Intrusion Detection, Federated Learning, spatial-temporal non-IID problem
Cite@inproceedings{ICIC2025,
author = {Jingxian Zhou and Zhou Liu and Qiang Zhu},
title = {FMLTC-IDS: A Federated Meta-Learning and Adaptive Time Clustering-based IoT Intrusion Detection System},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {898-914},
note = {Poster Volume Ⅰ}
}
- MTS-DTA: A drug target affinity prediction framework based on multi-task optimization and co-training, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Bingchen Zhao, Lei Yu, and Hongzhe Tang
Abstract: Drug-target affinity (DTA) prediction remains a critical challenge in AI-driven drug discovery, yet it suffers from a severe scarcity of experimentally validated data due to the prohibitively high costs and time-intensive nature of biochemical assays. This data limitation not only amplifies overfitting risks but also compromises model generalizability under real-world distributional shifts. While existing approaches predominantly rely on molecular docking simulations and generative models, which are capable of producing synthetic data, they inadequately exploit available information due to inherent prior biases. To address these challenges, we propose MTS-DTA, a semi-supervised multi-task framework integrating co-training strategies with cross-task representation alignment. The framework introduces two core innovations: (1) multi-task synchronization, which enhances feature generalizability through joint optimization of representation and prediction tasks; (2) correlation-guided pseudo-labeling, which dynamically generates pseudo-labels via inter-task dependencies to leverage unlabeled data while mitigating noise propagation. Benchmark evaluations confirm the framework's improved robustness against distributional biases, establishing a viable strategy to address data scarcity in drug discovery.
Keyword: Drug-Target Affinity Prediction, Multi-task Learning, Semi-supervised Learning, Masked Language Modeling
Cite@inproceedings{ICIC2025,
author = {Bingchen Zhao and Lei Yu and Hongzhe Tang},
title = {MTS-DTA: A drug target affinity prediction framework based on multi-task optimization and co-training},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2424-2440},
note = {Poster Volume Ⅱ}
}
- ConDRL-JSP: A Contrastive and Reinforcement Learning-Based Framework for JSP, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jian Li, Shuhan Qi, Xinyu Xiao, Chao Xing, Jiajia Zhang, and Xuan Wang
Abstract: In industries like manufacturing, the Job Shop Scheduling Problem (JSP), a classic NP-hard combinatorial optimization problem, faces numerous challenges. Traditional solution methods are restricted to specific scenarios and suffer from high computational complexity, while constructive-based methods often exhibit low sample efficiency and poor generalization ability. To address these limitations, this paper proposes a novel job shop scheduling framework that integrates contrastive learning and reinforcement learning, complemented by a curriculum learning strategy to enhance model training and significantly improve production scheduling efficiency. A key motivation for incorporating contrastive learning is to address the weak feature discrimination and limited data mining capabilities of traditional reinforcement learning methods in JSP. By leveraging contrastive learning, the framework enhances the model's ability to extract discriminative features from complex scheduling states, enabling more effective decision-making in diverse and large-scale scenarios. Additionally, the framework adopts a curriculum learning strategy to guide the model through a progressive learning process. This strategy dynamically adjusts the difficulty of training tasks based on the model's performance, starting with simpler instances and gradually advancing to more complex ones. This approach not only improves the model's generalization ability but also helps avoid local optima, ensuring robust and efficient scheduling solutions. Experimental results demonstrate that the proposed framework achieves significant improvements in scheduling quality, reducing the makespan and approaching optimal solutions across various benchmark datasets. The integration of contrastive learning and curriculum learning provides a powerful and adaptable solution to the challenges of JSP, offering a promising direction for future research in combinatorial optimization.
Keyword: Job Shop Scheduling Problem, Contrastive Learning, Reinforcement Learning, Curriculum Learning
Cite@inproceedings{ICIC2025,
author = {Jian Li and Shuhan Qi and Xinyu Xiao and Chao Xing and Jiajia Zhang and Xuan Wang},
title = {ConDRL-JSP: A Contrastive and Reinforcement Learning-Based Framework for JSP},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {349-366},
}
- MT-Net: A heterogeneous image matching method based on modality transformation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Min Nuo, Fan Wang, Xinrong Wu, Xueqi Cheng, and Xiaopeng Hu
Abstract: This paper focuses on the challenges in heterogeneous image matching. The matching accuracy of heterogeneous image pairs is lower than that of homogeneous image pairs. Existing methods have attempted adaptive improvements to address these challenges, but accuracy still needs improvement. This is because heterogeneous image pairs exhibit significant differences, primarily due to their distinct imaging mechanisms. To address this issue, we propose an end-to-end hybrid framework that employs modality transformation for heterogeneous image matching. First, a modality transformation method based on style transfer is proposed to convert heterogeneous image pairs into pseudo-homologous image pairs. Second, we extract multiscale and multilevel discriminative features from the pseudo-homologous image pairs to enhance the repeatability and discrimination of keypoints. Third, a unified matching loss is proposed to optimize the method for generating pseudo-homologous images. This loss function improves the performance of the modality transformation module and of the entire network. The experiments indicate that the proposed MT-Net improves the mean matching result by 0.9%–3.5%.
Keyword: Heterogeneous Images, Image Matching, Style Transfer, End-to-end Learning.
Cite@inproceedings{ICIC2025,
author = {Min Nuo, Fan Wang, Xinrong Wu, Xueqi Cheng, and Xiaopeng Hu},
title = {MT-Net: A heterogeneous image matching method based on modality transformation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2811-2822},
}
- Statistical Feature-Driven Regularization for Structured Model Pruning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jielei Wang, Dongnan Liu, Heng Yin, Kexin Li, Guangchun Luo, and Guoming Lu
Abstract: Structured pruning is a highly effective model compression technique that balances accuracy and acceleration, making it widely adopted in the field of convolutional neural networks. Traditional pruning methods relying on magnitude-based criteria exhibit limitations in distinguishing critical channels because of narrow parameter distributions in sparse models. Building on this phenomenon, we propose a statistical feature-driven structured pruning framework that integrates dependency-aware group regularization. By incorporating a dependency graph to model inter-layer relationships and leveraging both the mean and variance of channel parameters, we design a dynamic regularization term to reduce both the norm and variance of channels, encouraging uniform shrinkage. Our approach has been validated through experiments across diverse datasets and model architectures, achieving only a 0.71% accuracy drop on ImageNet compared to the baseline model under similar FLOPs reduction ratios.
Keyword: Structured Pruning, Convolutional Neural Networks, Regularization, Statistical Feature
Cite@inproceedings{ICIC2025,
author = {Jielei Wang, Dongnan Liu, Heng Yin, Kexin Li, Guangchun Luo, and Guoming Lu},
title = {Statistical Feature-Driven Regularization for Structured Model Pruning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1385-1396},
note = {Poster Volume Ⅱ}
}
- Arbitrary-Scale Super-Resolution for Remote Sensing Images with Multi-Branch Feature Enhancement and Scale-Specific Dictionary Attention, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xin Jin, Zhiyuan Li, Yuhao Xie, Bo Li, Cong Huang, Xiaoyuan Xu, Ahmed Zahir, and Qian Jiang
Abstract: With the widespread application of deep learning technologies such as Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs), facial forgery techniques have matured rapidly, bringing innovative applications to multiple fields while also raising serious security concerns. To address this challenge, researchers have developed various deepfake detectors. However, these detectors have shown significant vulnerabilities when faced with adversarial attacks. This study aims to systematically evaluate the performance of deepfake detectors under adversarial attacks and test the effectiveness of various defense methods. Through large-scale experiments, we analyzed the performance of different types of detectors under various adversarial attacks and assessed the efficacy of existing defense strategies. The results indicate that while some defense methods perform well in specific scenarios, the overall robustness of detectors still needs improvement. This research not only deepens our understanding of adversarial robustness in deepfake detection but also provides important experimental evidence and theoretical guidance for developing more effective defense strategies.
Keyword: Internet of things, adversarial examples, object detection, computer vision, deep learning
Cite@inproceedings{ICIC2025,
author = {Xin Jin, Zhiyuan Li, Yuhao Xie, Bo Li, Cong Huang, Xiaoyuan Xu, Ahmed Zahir, and Qian Jiang},
title = {Arbitrary-Scale Super-Resolution for Remote Sensing Images with Multi-Branch Feature Enhancement and Scale-Specific Dictionary Attention},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {423-440},
}
- SOC estimation of sodium-ion batteries based on EKF, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jiayi Zhu, Xiaoke Liu, and Taixin Chen
Abstract: As a new generation of energy storage devices, sodium-ion batteries have promising applications in renewable energy storage and electric vehicles. However, the state-of-charge (SOC) estimation of sodium-ion batteries is limited by their complex electrochemical properties and dynamic responses. In this paper, we propose an SOC estimation method for sodium-ion batteries based on the extended Kalman filter (EKF). Firstly, a second-order RC equivalent circuit model is established for the characteristics of sodium-ion batteries, the feasibility of accurate open-circuit-voltage-based SOC estimation is verified through open-circuit voltage (OCV) test experiments, and an experimental data-driven method is adopted for parameter identification; furthermore, a system model is constructed in the Matlab/Simulink simulation platform, and the applicability of the model is verified through online simulation. Secondly, the state space equations are constructed based on the model, and the improved EKF algorithm is used to realize the online estimation of SOC. Finally, the effectiveness of the proposed method is verified under the stage discharge condition. The simulation results show that the method can accurately track the SOC changes of sodium-ion batteries, and the estimation error is controlled within 2.5%, with high estimation accuracy and robustness. Through accurate parameter identification and model optimization, this paper significantly improves the accuracy of SOC estimation of sodium-ion batteries, provides an efficient and reliable solution for the sodium-ion battery management system, and provides important technical support for its promotion in practical applications.
Keyword: Sodium-ion Battery, SOC Estimation, Extended Kalman Filter, Equivalent Circuit Model, Parameter Identification.
Cite@inproceedings{ICIC2025,
author = {Jiayi Zhu, Xiaoke Liu, and Taixin Chen},
title = {SOC estimation of sodium-ion batteries based on EKF},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3524-3541},
}
- Addressing Noise and Stochasticity in Fraud Detection for Service Networks, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Wenxin Zhang, Ding Xu, Xi Xuan, Lei Jiang, Guangzhen Yao, Renda Han, Xiangxiang Lang, and Cuicui Luo
Abstract: Fraud detection is crucial in social service networks to maintain user trust and improve service network security. Existing spectral graph-based methods address this challenge by leveraging different graph filters to capture signals with different frequencies in service networks. However, most graph filter-based methods struggle to derive clean and discriminative graph signals. On the one hand, they overlook the noise in the information propagation process, which degrades their filtering ability. On the other hand, they fail to discriminate the frequency-specific characteristics of graph signals, which distorts signal fusion. To address these issues, we develop a novel spectral graph network based on information bottleneck theory (SGNN-IB) for fraud detection in service networks. SGNN-IB splits the original graph into homophilic and heterophilic subgraphs to better capture the signals at different frequencies. For the first limitation, SGNN-IB applies information bottleneck theory to extract key characteristics of encoded representations. For the second limitation, SGNN-IB introduces prototype learning to implement signal fusion, preserving the frequency-specific characteristics of signals. Extensive experiments on three real-world datasets demonstrate that SGNN-IB outperforms state-of-the-art fraud detection methods.
Keyword: Fraud detection, Graph neural network, Heterophily
Cite@inproceedings{ICIC2025,
author = {Wenxin Zhang, Ding Xu, Xi Xuan, Lei Jiang, Guangzhen Yao, Renda Han, Xiangxiang Lang, and Cuicui Luo},
title = {Addressing Noise and Stochasticity in Fraud Detection for Service Networks},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {423-437},
note = {Poster Volume Ⅰ}
}
- Smart Contract Vulnerabilities Detection with Adaptive Loss Weight and Entropy Weight, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jingyuan Hu, Peng Su, and Xuanxia Yao
Abstract: Smart contract security constitutes the foundational cornerstone for ensuring the trusted operational integrity of blockchain ecosystems. In recent years, multi-task learning (MTL) architectures have been widely adopted in smart contract vulnerability detection, owing to their context-aware optimization and superior generalization capabilities compared to single-task learning (STL) frameworks. However, MTL-based approaches for smart contract vulnerability detection face two persistent challenges: (1) the negative transfer phenomenon, whose mitigation via adaptive loss weighting in smart contract vulnerability detection remains underexplored in existing research; and (2) performance degradation caused by the homogeneous contribution assumption, where undifferentiated contract representations impair expert layer learning efficacy. To overcome these limitations, we propose a novel detection framework incorporating adaptive loss weight and entropy-based feature enhancement. Our dual-weighting mechanism introduces: (1) dynamic loss coefficients that automatically balance task-specific optimization objectives based on evolving learning complexity and task significance, and (2) entropy-aware attention weights that prioritize high-information contract features during expert network training. Comprehensive evaluations on real-world smart contract datasets demonstrate the framework's superior detection performance compared to three state-of-the-art adaptive weighting baselines. Experimental results reveal significant improvements in F1-score across multiple vulnerability types, validating the effectiveness of our approach in mitigating negative transfer while maintaining robust concurrent detection capabilities. The experimental code will be systematically organized and made publicly available on GitHub shortly.
Keyword: Smart Contract, Vulnerability Detection, Adaptive Loss Weight, Entropy Weight.
Cite@inproceedings{ICIC2025,
author = {Jingyuan Hu, Peng Su, and Xuanxia Yao},
title = {Smart Contract Vulnerabilities Detection with Adaptive Loss Weight and Entropy Weight},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1401-1416},
note = {Poster Volume Ⅱ}
}
- A New Exploration: Ancient Book Defect Detection with Attention Mechanisms, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jun Yu, Yemao Zhang, Jiahui Cheng, Lingnan Bai, Jiaxing Fan, Zhen Zhang, Ruiyao Han, and Zhe Xu
Abstract: With the increasing application of digital technology in cultural heritage preservation, the digitization of ancient books and their defect detection has become an important research topic. Ancient books are prone to a variety of defects, and traditional manual detection methods are inefficient and cannot guarantee accuracy. This paper constructs a specialized dataset containing six types of defects and proposes an improved YOLOv8 network, which is applied to ancient book defect detection for the first time. By introducing three attention mechanisms (CBAM, SEBlock, and ECA) and applying improvements at different positions within the network, the model's ability to recognize defects is enhanced. Experimental results show that the improved YOLOv8 model significantly improves detection performance.
Keyword: Ancient Book Defect Detection, Object Detection, YOLOv8, Attention Mechanism.
Cite@inproceedings{ICIC2025,
author = {Jun Yu, Yemao Zhang, Jiahui Cheng, Lingnan Bai, Jiaxing Fan, Zhen Zhang, Ruiyao Han, and Zhe Xu},
title = {A New Exploration: Ancient Book Defect Detection with Attention Mechanisms},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2827-2840},
}
- Argus: Multi-view LiDAR Point Cloud Fusion for Enhancing Vehicle Detection in Auto Driving, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yifei Tian, Hongwei Huang, and Xiangyu Li
Abstract: The environmental perception of unmanned ground vehicles (UGVs) directly impacts decisions like path planning and obstacle avoidance, with vehicle detection being critical for autonomous driving. LiDAR provides high-precision point clouds but suffers from sparse density and self-occlusion, often resulting in incomplete vehicle point clouds that hinder detection performance. To address this, we propose Argus, a multi-view registration and completion model that fuses multi-frame point clouds of surrounding vehicles. Argus achieves multi-view fusion through a self-attention-based cumulative registration module and a coarse-to-fine residual completion module, refining vehicle point clouds using grid residual layers and a multi-layer perceptron. Compared to single-view point clouds, Argus produces denser and more complete vehicle shapes, serving as an independent plug-in to enhance detection methods. Experiments on the KITTI dataset show that Argus improves downstream vehicle detection performance.
Keyword: LiDAR Point Cloud Fusion, Multiple Accumulating Registration Strategy, Coarse to Fine Complete, Vehicle Detection
Cite@inproceedings{ICIC2025,
author = {Yifei Tian, Hongwei Huang, and Xiangyu Li},
title = {Argus: Multi-view LiDAR Point Cloud Fusion for Enhancing Vehicle Detection in Auto Driving},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2563-2576},
}
- Separable Auxiliary Training for Real-Time Small Object Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xinrong Wu, Fan Wang, Min Nuo, Ying Zhou, and Xiaopeng Hu
Abstract: During the training of end-to-end detectors, one-to-one label assignments result in an insufficient number of positive samples, impeding the learning of discriminative features. Existing methods have employed one-to-many label assignments and denoising training strategies to provide additional supervision, thereby increasing the number of positive samples or introducing samples with noise. However, this additional supervision performs bidirectional feature fusion with the original end-to-end models, increasing the computational costs of the model during inference. In this paper, we propose Separable Auxiliary Training (SAT) for real-time small object detection to achieve auxiliary supervision without additional inference delay. In SAT, an auxiliary branch supervised by a one-to-many label assignment is adopted to assist a deployment branch during training. To avoid increasing the inference costs, a one-way feature flow from the deployment branch to the auxiliary branch has been designed. The flow ensures that the deployment branch can be deployed independently without sacrificing any accuracy. Extensive experiments demonstrate that SAT can provide additional supervision to enhance performance without increasing computational costs during inference.
Keyword: Small Object Detection, SAT, Separable Auxiliary Supervision, RT-DETR
Cite@inproceedings{ICIC2025,
author = {Xinrong Wu, Fan Wang, Min Nuo, Ying Zhou, and Xiaopeng Hu},
title = {Separable Auxiliary Training for Real-Time Small Object Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2842-2853},
}
- EN-BERT: A Transformer-based Model for Encrypted Attack Traffic Detection with Pre-training and Fine-tuning Phases, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xiaoying Huang, Yanping Xu, Yanbo Fang, Yuxin Shen, and Yongxing Xu
Abstract: Encrypted attacks represented by ransomware and APT are becoming increasingly complex, posing a huge threat to cyberspace. As a result, encrypted attack traffic detection is imperative. Traditional encrypted attack traffic detection methods face challenges that include feature extraction limitations, dataset imbalance, and poor generalization capabilities. To address these issues, this paper proposes EN-Bert, a Transformer-based model with both pre-training and fine-tuning phases. In the pre-training stage, the encrypted traffic dataset is first processed using Token serialization for traffic shunting, segmentation, and feature extraction. Then the model is pre-trained on two tasks, Masked Flow Model (MFM) and Same-origin Flow Prediction (SFP), to uncover the contextual relationships between traffic flows. In the fine-tuning stage, this paper addresses the imbalance issues through data enhancement. From the model’s perspective, weighted cross-entropy loss and K-L divergence are designed in this phase to optimize the model’s performance and enhance its generalization ability. In the experimental section, with the CIC-IDS-2017 dataset and a self-collected encrypted DOS attack traffic dataset, comparative and ablation experiments demonstrate that the EN-Bert model proficiently addresses challenges related to dataset imbalance and poor model generalization, proving to be an effective and reliable approach for encrypted attack traffic detection.
Keyword: encrypted attack traffic detection, EN-Bert, pre-training phase, data enhancement, weighted cross-entropy, K-L divergence.
Cite@inproceedings{ICIC2025,
author = {Xiaoying Huang, Yanping Xu, Yanbo Fang, Yuxin Shen, and Yongxing Xu},
title = {EN-BERT: A Transformer-based Model for Encrypted Attack Traffic Detection with Pre-training and Fine-tuning Phases},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {441-455},
}
- SEBF-YOLO: An Improved YOLOv8s for Small Insect Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Lai Jiang, Rui Xiong, and Zhiwu Liao
Abstract: To address the low detection accuracy of small insects with blurred features and complex backgrounds in agricultural scenarios, we propose an improved YOLOv8 (You Only Look Once version 8) model, SEBF-YOLO, to tackle the shortcomings of insufficient feature extraction and fusion in the original YOLOv8 for small insect detection. Given that small insects with low pixel occupancy in spatial domains often suffer from feature loss during extraction, a Space-to-Depth (SPD) module is introduced after each convolutional layer in the backbone network to enhance the extraction of fine-grained features for small targets. For the challenges of complex backgrounds and feature blurriness, which are rooted in the model’s inability to distinguish backgrounds and its lack of effective multi-scale feature fusion, the C2f_EMA module is added after concatenation layers in the neck network, establishing bidirectional cross-scale connections and adopting a weighted fusion strategy to strengthen critical features of blurred targets by integrating multi-level features. Subsequently, the BiFormer module is introduced after C2f_EMA to leverage dynamic attention mechanisms for weighted focusing on fused feature maps, integrating local details and global contextual information to suppress background interference and enhance target discrimination in complex scenes. Experimental results on a self-built dataset demonstrate that SEBF-YOLO achieves a mean Average Precision (mAP) of 77.3% at an Intersection over Union (IoU) of 0.5, a 4.1% improvement over the original model, providing an effective solution for detecting small insect targets in agricultural environments.
Keyword: YOLOv8, small target detection, attention mechanism
Cite@inproceedings{ICIC2025,
author = {Lai Jiang, Rui Xiong, and Zhiwu Liao},
title = {SEBF-YOLO: An Improved YOLOv8s for Small Insect Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2858-2875},
}
- Glass Segmentation with Multi Scales and Primary Prediction Guiding, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zhiyu Xu and Qingliang Chen
Abstract: Glass-like objects are everywhere in daily life, yet they are very hard for existing methods to segment. Their transparency makes them difficult to detect against chaotic backgrounds, and their vague separation boundaries further impede the acquisition of exact contours. Moving machines that ignore glass risk crashing into transparent barriers or have difficulty analysing objects reflected in mirrors, so it is of substantial significance to accurately locate glass-like objects and fully delineate their contours. In this paper, inspired by the scale integration strategy and the refinement method, we propose a new network, named MGNet, which consists of a Fine-Rescaling and Merging module (FRM) to improve the ability to extract spatial relationships and a Primary Prediction Guiding module (PPG) to better mine the leftover semantics from the fused features. Moreover, we supervise the model with a novel loss function that includes an uncertainty-aware loss to produce high-confidence segmentation maps. Unlike existing glass segmentation models, which must be trained with different settings for different datasets, our model is trained under consistent settings and achieves superior performance on three popular public datasets.
Keyword: glass segmentation
Cite@inproceedings{ICIC2025,
author = {Zhiyu Xu and Qingliang Chen},
title = {Glass Segmentation with Multi Scales and Primary Prediction Guiding},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2579-2595},
}
- Geological Layer-Aware Cross-Modal Learning for Multi-Label Drill Cuttings Classification, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shuying Cui, Jianwei Niu, Xuefeng Liu, and Yan Ding
Abstract: Multi-label drill cuttings classification reveals the current lithologies during drilling, which is crucial for guiding oil drilling operations. Many existing methods rely on annotated images for feature extraction, making their accuracy highly dependent on dataset size. However, due to equipment and manpower constraints, datasets in this field are generally small, posing a significant challenge for improving traditional methods for accurate multi-label classification. Notably, we observed that geological layers influence the distribution of drill cuttings, highlighting the importance of effectively leveraging geological priors for classification. In this paper, we propose a geological layer-aware cross-modal learning framework, which explicitly leverages local layer-wise information and global label co-occurrence patterns for multi-label drill cuttings classification. Unlike conventional end-to-end models, our framework first estimates the geological layer of a given image and derives a corresponding cuttings proportion vector. These priors are then employed to guide the alignment between visual and textual features, leading to more precise visual representations. Furthermore, we introduce a global co-occurrence matrix that captures label dependencies and enhances the learning of visual representations through a graph convolutional network (GCN), resulting in more accurate label predictions. Experiments on our dataset demonstrate that our approach significantly outperforms state-of-the-art methods, achieving a mean average precision (mAP) of 98.8%.
Keyword: Multi-label Image Classification, Drill Cuttings, Layer-Wise Proportion, Co-Occurrence Matrix
Cite@inproceedings{ICIC2025,
author = {Shuying Cui, Jianwei Niu, Xuefeng Liu, and Yan Ding},
title = {Geological Layer-Aware Cross-Modal Learning for Multi-Label Drill Cuttings Classification},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2873-2887},
}
- UniversalRAG: Universal Retrieval-Augmented Generation Framework, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Tianci Wu, Zikang Zhang, Dong Zhang, and Juntao Li
Abstract: Large Language Models (LLMs) excel in various tasks, yet hallucination limits their applicability in high-accuracy, domain-specific scenarios. Retrieval-Augmented Generation (RAG) mitigates this issue by integrating external knowledge retrieval, but existing systems struggle with multimodal, multi-format corpora common in industrial settings, and targeted evaluation datasets remain scarce. This paper introduces UniversalRAG, a modular, plug-and-play RAG framework supporting diverse document formats with adaptive indexing, retrieval, and generation agents, enhancing RAG adaptability and output quality. To validate its effectiveness, we develop the FACT dataset (Fact-based Augmented Corpus Testing) for RAG evaluation. Experimental results show that UniversalRAG, when paired with GPT-4o, achieves a 73.68 score, an 8.54-point improvement over the naive RAG baseline, significantly outperforming traditional methods. Ablation studies confirm the essential roles of indexing, retrieval, and generation agents in system performance. This work not only introduces a versatile RAG framework but also fills a critical gap in end-to-end evaluation, advancing RAG system development and assessment.
Keyword: Large language model, Hallucination mitigation, Information retrieval.
Cite@inproceedings{ICIC2025,
author = {Tianci Wu, Zikang Zhang, Dong Zhang, and Juntao Li},
title = {UniversalRAG: Universal Retrieval-Augmented Generation Framework},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {759-770},
}
- Achieving High Efficiency Heart Image Segmentation in U-Net by Means of Early Fusion and Contextual Information Reconstruction, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Huijuan Hao and Wenpeng Wang
Abstract: Accurate segmentation of the ventricles provides quantitative data on the cardiac structure and function, facilitating precise diagnosis for medical professionals. Deep learning methods have been widely applied and proven highly effective in cardiac medical image segmentation; however, challenges still remain. Due to the complexity of anatomical structures and the issues related to fine detail recovery in high-quality cardiac medical images, many current studies attempt to improve feature extraction and segmentation accuracy by increasing the network depth. While increasing network depth by introducing more convolutional layers and nonlinear transformations can effectively address issues such as complex morphology, high-quality cardiac images may still encounter problems such as overfitting, insufficient detail recovery, noise sensitivity, computational resource bottlenecks, and class imbalance, especially in deeper layers of the network. To address these challenges, this paper introduces EAS-Net. EAS-Net employs a classic U-shaped encoder architecture and incorporates a novel Residual Structure Unit (RSU) technique. Furthermore, the information prior to sampling, along with the process sampling information, is fused in advance and then input into the innovative Contextual Information Reconstruction (CIR) method and the Multi-Head Dilated Attention (MHDA) algorithm. This block effectively captures multi-scale contextual information, expands the receptive field, and significantly reduces computational complexity. Extensive experiments on multiple medical datasets demonstrate that EAS-Net exhibits high efficiency and robustness in high-quality cardiac image segmentation, particularly in left and right ventricular segmentation. It achieves exceptional performance while maintaining a low model complexity.
Keyword: Ventricular segmentation, Deep learning, High-quality cardiac images, Computational complexity
Cite@inproceedings{ICIC2025,
author = {Huijuan Hao and Wenpeng Wang},
title = {Achieving High Efficiency Heart Image Segmentation in U-Net by Means of Early Fusion and Contextual Information Reconstruction},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2408-2421},
note = {Poster Volume Ⅱ}
}
- SIA-YOLO: A Lightweight Multi-scale Feature Fusion Network For Bearing Surface Defect Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yafei Zhu, Rangyong Zhang, Qijia Ping, and Jian Li
Abstract: Bearing surface defect detection is a key task in manufacturing quality control. However, traditional detection methods often fail to meet the requirements in terms of accuracy and efficiency when faced with defects of small size, diverse shapes and complex backgrounds. To solve this problem, this paper proposes a lightweight multi-scale feature fusion network based on YOLOv11. Firstly, the lightweight New StarNet module is used as the backbone to extract features by stacking multiple star operation blocks, while downsampling is performed using convolutional layers, and nonlinear mapping is achieved through element-wise multiplication. This improves the model's feature extraction capability while reducing inference overhead through lightweight calculation. Secondly, the IRMA attention module is embedded in the neck, so that the model can better extract important features of the bearing surface, while enhancing the small target detection capability and keeping the model lightweight. Finally, the improved AFPN module is used to optimize the detection head, which significantly enhances the model's feature expression capability and effectively improves the model's detection capability for multi-scale defects. Experiments on the ZC bearing dataset show that SIA-YOLO reduces the computational cost from the 6.4 GFLOPs of YOLOv11 to 4.2 GFLOPs, a reduction of 34.4%. The mAP@0.5 of the SIA-YOLO algorithm increased by 1.6 percentage points, from 87.5% to 89.1%. A large number of ablation and comparative experiments have verified the effectiveness and generalization ability of the model in bearing surface defect detection.
Keyword: Bearing Surface Defect Detection, YOLOv11, Multi-scale Feature Fusion, Lightweight, Attention Mechanism.
Cite@inproceedings{ICIC2025,
author = {Yafei Zhu, Rangyong Zhang, Qijia Ping, and Jian Li},
title = {SIA-YOLO: A Lightweight Multi-scale Feature Fusion Network For Bearing Surface Defect Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2889-2906},
}
- LBERT-MSDF: A Dynamic Fusion Network for Multi-Scale Text Feature Extraction, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: MingLe Zhou, XinYu Liu, JiaChen Li, and DeLong Han
Abstract: Text categorization is a core task in natural language processing and plays an irreplaceable role in tasks such as spam detection, user sentiment analysis, and news topic classification. Existing models have difficulty capturing different levels of semantic information in a text at the same time, which leads to classification ambiguity and reduces classification accuracy. To improve the model's understanding of semantics, this paper proposes a BERT-based dynamic fusion network for multi-scale semantic feature extraction. The network introduces a multi-functional channel depth modulator (VCDM), which enables the model to perform a deeper analysis at the macro level, such as the context of the text. A small-scale feature extractor (BMHA) is also proposed to perform feature extraction at the micro level and build the underlying structure of the vocabulary, which complements and enhances the effect of the VCDM module. To capture feature information better and optimize the accuracy of the classification results, the optimal BERT output level is selected for the input and output of the model, and an adaptive weighting mechanism is introduced to dynamically fuse the outputs of the two modules. The experimental results show that the MSDF model outperforms existing models, achieving accuracies of 95.03%, 93.63%, and 89.47% on three datasets, respectively, which demonstrates the effectiveness of the proposed MSDF model on the text classification task.
Keyword: Natural language processing, Text classification, BERT, Multi-scale feature extraction, Weighted fusion strategy.
Cite@inproceedings{ICIC2025,
author = {MingLe Zhou and XinYu Liu and JiaChen Li and DeLong Han},
title = {LBERT-MSDF: A Dynamic Fusion Network for Multi-Scale Text Feature Extraction},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {771-786},
}
- DDF-Net: A Dual-Branch Deep Feature Fusion Network for Few-Shot Hyperspectral Image Classification, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Ke Chen and AiGuo Chen
Abstract: In recent years, various deep learning frameworks have been introduced for hyperspectral image (HSI) classification. However, the proposed network models often exhibit high model complexity and fail to provide high classification accuracy in few-shot learning scenarios. In this paper, we propose a Dual-Branch Deep Feature Fusion Network (DDF-Net) for few-shot hyperspectral image classification. DDF-Net extracts multi-layer features from hyperspectral images using a pre-trained CNN model and applies Principal Component Analysis (PCA) for dimensionality reduction. Subsequently, non-overlapping image patches are extracted from the reduced-dimensional features and processed through two parallel streams: a 3D-CNN stream for spatial feature extraction and a CV-CNN stream for spectral feature extraction. Additionally, to enhance model performance, the Squeeze-and-Excitation (SE) mechanism is incorporated. Finally, the features from the two branches are integrated through concatenation fusion, enhanced by the SE module, and then input into an SVM for classification. Experiments conducted on multiple datasets demonstrate the effectiveness and efficiency of DDF-Net in hyperspectral image classification, outperforming state-of-the-art methods.
Keyword: Hyperspectral Data, Few-Shot Learning, Deep Features, Convolutional Kernels, Dual-Branch.
Cite@inproceedings{ICIC2025,
author = {Ke Chen and AiGuo Chen},
title = {DDF-Net: A Dual-Branch Deep Feature Fusion Network for Few-Shot Hyperspectral Image Classification},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2904-2921},
}
- pLFL: A Lightweight Federated Learning Framework for Credit Risk Prediction, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yu Songsen, Yan Songlian, Qiu Miaosheng, and Pan Ming
Abstract: A significant challenge faced by numerous small- and medium-sized banks in the field of credit risk prediction lies in the limitations of available data, high non-performing loan ratios, and stringent data privacy regulations. Federated learning (FL) offers a promising solution by enabling collaborative model training across multiple institutions without the need to share sensitive data, thus safeguarding privacy while enhancing the accuracy of credit risk predictions. This study focuses on borrower default prediction as a practical application scenario for small- and medium-sized banks and introduces a lightweight federated learning framework (pLFL) designed to optimize model performance. The proposed framework integrates an enhanced tADA data preprocessing technique with an improved pFed aggregation algorithm, effectively addressing the aforementioned challenges. To evaluate the efficacy of the pLFL framework, experiments were conducted on two real-world datasets. The results demonstrate substantial performance improvements: on the credit card dataset, the F1 score of the model increased to 81.5%, with Precision reaching 91.5%. On the Lending Club Loan Data dataset, communication overhead was significantly reduced, and the global model's convergence rate accelerated to 1.8 times its original speed. Furthermore, the pLFL framework incorporates parameter quantization and asynchronous communication strategies to minimize system resource consumption, underscoring its practicality for small- and medium-sized financial institutions. This research presents an efficient and privacy-preserving solution for credit risk prediction in the financial sector, particularly in scenarios requiring cross-institutional collaboration with heterogeneous data distributions.
Keyword: Federated Learning, Credit Risk, Lightweight, Risk Prediction
Cite@inproceedings{ICIC2025,
author = {Yu Songsen and Yan Songlian and Qiu Miaosheng and Pan Ming},
title = {pLFL: A Lightweight Federated Learning Framework for Credit Risk Prediction},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3447-3464},
}
- Hyperbolic Hierarchical Topic-based Keyphrase Generation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Wu Zhuang, Heng Yu, and Yafu Li
Abstract: Keyphrases can concisely describe the high-level topics discussed in a document that usually possesses hierarchical topic structures. Thus, it is crucial to understand the hierarchical topic structures and employ them to guide keyphrase identification. However, existing works that integrate hierarchical topic information into a deep keyphrase generation model still remain in Euclidean space. Their ability to capture the hierarchical structures is limited by the nature of Euclidean space. To this end, we design a new hyperbolic hierarchical topic-based keyphrase generation method (Hyper-HTKG) to effectively exploit the hierarchical topics to improve keyphrase generation performance. Concretely, we propose a novel hyperbolic hierarchical topic-guided sequence generation method for keyphrase generation, which consists of two major modules: a hyperbolic hierarchical topic model that learns the latent topic tree across the whole corpus of documents, and a hyperbolic keyphrase generation model that generates keyphrases under hierarchical topic guidance. Finally, these two modules are jointly trained to help them learn complementary information from each other. To the best of our knowledge, this is the first study to explore a hyperbolic hierarchical topic-based network for keyphrase generation. Compared with seven baseline methods, Hyper-HTKG demonstrates superior performance in experiments conducted on five benchmark datasets.
Keyword: Hyperbolic keyphrase generation, Hyperbolic hierarchical topic model, Hyperbolic keyphrase generation model.
Cite@inproceedings{ICIC2025,
author = {Wu Zhuang and Heng Yu and Yafu Li},
title = {Hyperbolic Hierarchical Topic-based Keyphrase Generation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {787-804},
}
- PRMT: Retentive Networks Meet Vision Transformers for Plant Disease Identification, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jialong Guo, Baofang Chang, and Guoqiang Li
Abstract: Accurate and rapid classification of plant diseases is crucial for enhancing productivity in contemporary agriculture. Both modern deep-learning models and conventional methods encounter obstacles when identifying plant diseases. For instance, complicated scenarios often increase processing costs and reduce recognition accuracy. This study introduces the PRMT (Retentive Networks Meet Vision Transformers for Plant Disease Identification) framework, utilizing the Retentive Networks Meet Vision Transformers (RMT) architecture. The method utilizes Manhattan distances and spatial prior knowledge to create a spatial attenuation matrix. It improves internal correlations and enables a greater understanding of the relationships among image regions. The design incorporates the Convolutional Block Attention Module (CBAM) to enhance feature representations. Incorporating 2D average pooling in the backbone network diminishes sensitivity to local noise and inhibits an increase in model parameters. We employed datasets on paddy, corn, wheat, and coffee diseases. To enhance the utilization of the datasets, we implemented rotation, scaling, and color modification and conducted three-fold cross-validation. We assessed the PRMT model's performance using recall, specificity, accuracy, and precision metrics and compared it with other models. The results show that the PRMT model can handle large and complicated datasets of agricultural diseases, leading to much better results with only a few extra parameters. Our methodology improves the effectiveness of categorizing intricate plant disease images.
Keyword: Plant disease, Manhattan, Networks, Attention, Spatial prior knowledge.
Cite@inproceedings{ICIC2025,
author = {Jialong Guo and Baofang Chang and Guoqiang Li},
title = {PRMT: Retentive Networks Meet Vision Transformers for Plant Disease Identification},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2920-2937},
}
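The spatial attenuation matrix mentioned in the PRMT abstract above can be pictured with a small sketch. It assumes the exponential form used in retentive-style attention, where the weight between two image tokens decays as `gamma` raised to their Manhattan distance on the patch grid. The function name, the grid size, and the value of `gamma` are illustrative assumptions, not details taken from the paper.

```python
def manhattan_decay_matrix(height, width, gamma=0.9):
    """Build an (H*W) x (H*W) matrix whose entry (n, m) is
    gamma ** (|row_n - row_m| + |col_n - col_m|).

    Assumption: exponential decay in Manhattan distance; gamma is a
    hypothetical hyperparameter in (0, 1).
    """
    # Flatten the grid row by row so token index = row * width + col.
    coords = [(r, c) for r in range(height) for c in range(width)]
    return [
        [gamma ** (abs(r1 - r2) + abs(c1 - c2)) for (r2, c2) in coords]
        for (r1, c1) in coords
    ]


# Example: a 2 x 3 grid of patches with a strong decay factor.
decay = manhattan_decay_matrix(2, 3, gamma=0.5)
```

Each token keeps full weight on itself, and the weight shrinks for tokens that are farther away on the grid, which is one way to encode a spatial prior before it is combined with attention scores.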
- Knowledge Distillation Based on Logit Ranking Alignment, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Guoming Lu, Mingdong Zhang, Zihan Cheng, Jielei Wang, Yu Peng, and Kexin Li
Abstract: With the development of deep learning, models have become increasingly large and difficult to deploy effectively on embedded devices, mobile applications, and edge computing environments. Knowledge distillation has become a mainstream approach to solving this problem due to its simplicity and efficiency. In the field of image classification, most knowledge distillation methods focus on how to enable the student model to learn more knowledge from the teacher model, but they introduce unnecessarily strict constraints. To address this challenge, we propose a novel knowledge distillation paradigm based on logit ranking alignment, i.e., aligning the logit rankings of the teacher and student models. Since the traditional hard ranking algorithm is non-differentiable, we introduce a fast differentiable soft ranking algorithm to obtain the soft logit rankings of the teacher and student models, and then use an L2 loss to align them. Extensive experiments on CIFAR-100 and Tiny-ImageNet validate the effectiveness of our method.
Keyword: Knowledge Distillation, Logit Ranking Alignment, Fast Differentiable Soft Ranking, Image Classification.
Cite@inproceedings{ICIC2025,
author = {Guoming Lu and Mingdong Zhang and Zihan Cheng and Jielei Wang and Yu Peng and Kexin Li},
title = {Knowledge Distillation Based on Logit Ranking Alignment},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2935-2949},
}
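The idea of aligning soft logit rankings with an L2 loss, as described in the abstract above, can be sketched in a few lines. The abstract does not specify which fast differentiable soft ranking algorithm the authors use, so this sketch swaps in a simple pairwise-sigmoid relaxation, which costs O(n^2) per sample and is likely not the paper's method. The function names and the `temperature` parameter are hypothetical.

```python
import math


def soft_rank(logits, temperature=1.0):
    """Smooth descending rank of each logit.

    rank_i = 1 + sum over j != i of sigmoid((x_j - x_i) / temperature),
    so the largest logit gets a rank near 1. This pairwise relaxation is
    an assumed stand-in for the paper's soft ranking algorithm.
    """
    ranks = []
    for i, xi in enumerate(logits):
        rank = 1.0
        for j, xj in enumerate(logits):
            if j != i:
                rank += 1.0 / (1.0 + math.exp(-(xj - xi) / temperature))
        ranks.append(rank)
    return ranks


def ranking_alignment_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean squared (L2) difference between teacher and student soft ranks."""
    teacher_ranks = soft_rank(teacher_logits, temperature)
    student_ranks = soft_rank(student_logits, temperature)
    squared = [(t - s) ** 2 for t, s in zip(teacher_ranks, student_ranks)]
    return sum(squared) / len(squared)


# Same ordering at a different scale versus a swapped ordering.
same_order = ranking_alignment_loss([3.0, 1.0, 2.0], [6.0, 2.0, 4.0])
swapped = ranking_alignment_loss([3.0, 1.0, 2.0], [1.0, 3.0, 2.0])
```

A student whose logits preserve the teacher's ordering is penalized far less than one that swaps classes, even when the logit magnitudes differ, which illustrates why a ranking constraint is looser than matching logits exactly.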
- DAMLP: Data Augmented Multi-Layer Perceptrons for Multivariate Time Series Forecasting, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jiyanglin Li, Heming Du, Yiming Tang, Jinhong You, Shouguo Du, and Wen Li
Abstract: Multivariate time series forecasting (MTSF) plays a crucial role in various applications by predicting future values based on historical data across multiple variates. Although deep learning models have achieved remarkable success in MTSF tasks, they often face challenges related to data scarcity. Data augmentation, which enriches the training data, has emerged as a promising technique to improve forecasting accuracy. However, preserving temporal dependencies in augmented data remains a significant challenge. In this paper, we introduce Data Augmented Multi-Layer Perceptrons (DAMLP), a novel MTSF framework that integrates a Data Augmentation (DA) module and a simple yet effective Multi-Layer Perceptrons (MLP) architecture. Our DA module enhances the training dataset by increasing the frequency of time series with high correlations to others while reducing the frequency of low-correlation series, thus mitigating the interference on the model's forecasting accuracy caused by low-correlation series. To efficiently utilize the augmented dataset, we use a simple MLP architecture that provides an efficient solution without sacrificing forecasting performance. Our experimental results on multiple real-world datasets demonstrate that DAMLP outperforms state-of-the-art models with less memory usage and training time. Our approach highlights the potential of leveraging correlation information to improve the accuracy and efficiency of MTSF models.
Keyword: Multivariate Time Series Forecasting, Data Augmentation, MLP
Cite@inproceedings{ICIC2025,
author = {Jiyanglin Li and Heming Du and Yiming Tang and Jinhong You and Shouguo Du and Wen Li},
title = {DAMLP: Data Augmented Multi-Layer Perceptrons for Multivariate Time Series Forecasting},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1416-1432},
note = {Poster Volume Ⅱ}
}
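The correlation-driven augmentation described in the DAMLP abstract above can be illustrated with a rough sketch. The abstract does not say how correlation is measured or how sampling frequency is adjusted, so this example assumes the mean absolute Pearson correlation with the other series and normalizes it into sampling weights. All names are hypothetical, and at least two series are required.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y) if std_x > 0 and std_y > 0 else 0.0


def sampling_weights(series):
    """Weight each series by its mean absolute correlation with the others.

    Assumption: a higher weight means the series is drawn more often when
    building the augmented training set.
    """
    scores = []
    for i, current in enumerate(series):
        others = [abs(pearson(current, other))
                  for j, other in enumerate(series) if j != i]
        scores.append(sum(others) / len(others))
    total = sum(scores)
    if total == 0:
        # No usable correlation signal: fall back to uniform sampling.
        return [1.0 / len(series)] * len(series)
    return [score / total for score in scores]


# Two strongly related series and one weakly related series.
weights = sampling_weights([
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [3, 1, 4, 1, 5],
])
```

The two linearly related series receive larger weights than the weakly related one, so they would appear more often in the augmented training set.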
- METACoref: A Coreference Resolution Approach Based on Meta-information Loss for Document, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Ying Mao, Yong Peng, and Yong Zhong
Abstract: Coreference resolution is a key technique in natural language processing, aiming at recognizing different representations pointing to the same entity in a text. However, in order to improve performance, existing methods rely on single semantic features for complex representations and iterative operations on the one hand, and introduce multiple complex structures and external knowledge on the other hand, which sacrifices efficiency and generalization performance to a certain extent. Therefore, this study explores textual meta-information and proposes a meta-information loss-based coreference resolution model, METACoref, which optimizes the task at two levels, i.e., mention recognition and coreference prediction. METACoref first enriches word representation with syntactic information and entity types, and then obtains subword-based word representation based on local and masked attention mechanisms. In mention recognition, METACoref integrates entity type features, speaker features, and belonging sentence position features to compensate for the lack of pure semantic modeling. In coreference prediction, METACoref uses a combination of dynamically balanced semantic loss and structured meta-information loss to complement the semantic information. Structured meta-information loss computes a representation of the consistency of speaker information between mentions and the relative distance between mentions. Experiments on the OntoNotes 5.0 dataset show that the method achieves superior performance in mention identification and coreference prediction, significantly improving the performance of the coreference resolution model in terms of efficiency, robustness, and long-distance dependency handling.
Keyword: Mention Identification, Coreference Prediction, Coreference Resolution, Meta-information.
Cite@inproceedings{ICIC2025,
author = {Ying Mao and Yong Peng and Yong Zhong},
title = {METACoref: A Coreference Resolution Approach Based on Meta-information Loss for Document},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2284-2295},
note = {Poster Volume Ⅱ}
}
- Dance Dissected: Enhancing Labanotation Generation through Part-specific Attention with Transformers, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Min Li, Jing Sang, and Lina Du
Abstract: Labanotation is a widely used notation system for recording human dance movements. However, writing standard Labanotation scores requires extensive professional training. Automatically generating Labanotation scores from motion capture data can significantly reduce manual effort in dance documentation and can serve as a powerful tool for preserving folk dances in the protection of intangible cultural heritage. Despite this, existing methods for Labanotation generation face challenges in capturing the more fluid and variable limb movements inherent in folk dance performances. In this paper, we introduce a novel Transformer-based model called PSA-Transformer (Transformer with Part-Specific Attention) to achieve more accurate Labanotation generation. First, we develop a Part-Specific Attention (PSA) module that adheres to the body part division rules of Labanotation. This module extracts spatial attention features at individual body part levels, enhancing the precision of movement capture. Then, this attention mechanism is integrated into an encoder-decoder architecture, enabling the model to learn global temporal dependencies within the feature sequences produced by the PSA module. As such, we sequentially generate corresponding Laban symbols using the decoder component of the PSA-Transformer. Extensive experiments on two real-world datasets demonstrate that our proposed model performs favorably against current state-of-the-art methods in automatic Labanotation generation.
Keyword: Labanotation generation, Part-Specific Attention, Transformer, limb movements, encoder-decoder.
Cite@inproceedings{ICIC2025,
author = {Min Li and Jing Sang and Lina Du},
title = {Dance Dissected: Enhancing Labanotation Generation through Part-specific Attention with Transformers},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1959-1971},
note = {Poster Volume Ⅱ}
}
- Shared Bicycle Demand Prediction Based on Hierarchical Spatiotemporal Graph Convolution Network, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zikang Dai, Lifeng Yang, Liming Jiang, and Huanyu Wang
Abstract: Shared bicycle demand prediction plays a crucial role in urban public transportation planning, citizen mobility, and environmental protection. However, existing demand prediction models have certain limitations in modeling the coupling relationship between the inflow and outflow of shared bicycle stations. Furthermore, the performance of the model is highly influenced by sparse data, especially for fine-grained shared bicycle demand prediction tasks. To address these challenges, this paper proposes a Hierarchical Spatiotemporal Graph Convolutional Network (HST-GCN). Specifically, we apply a hierarchical spatial-temporal feature learning framework to capture both coarse-grained and fine-grained shared bicycle demand features, and utilize a feature transformation matrix to achieve cross-scale fusion of these demand features, alleviating the impact of data sparsity on fine-grained feature modeling. We also design a dynamic coupling graph convolution module to better model the dynamic spatial dependencies between the inflow and outflow of shared bicycle stations. On this basis, we integrate temporal convolution networks and temporal attention mechanisms to capture the spatial-temporal correlations of shared bicycle demand. Extensive experiments are conducted on the Citi Bike dataset from New York and the Divvy dataset from Chicago. The results show that the proposed model outperforms the baseline models in prediction accuracy.
Keyword: Transportation planning, Demand forecasting, Temporal convolution network, Graph convolution network
Cite@inproceedings{ICIC2025,
author = {Zikang Dai and Lifeng Yang and Liming Jiang and Huanyu Wang},
title = {Shared Bicycle Demand Prediction Based on Hierarchical Spatiotemporal Graph Convolution Network},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {111-127},
}
- A Heading Correction Method for UAV Swarms Against Yaw Deception Based on the Consensus Potential Field, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Huan Zhao, Yuxin Xue, and Zhaojun Gu
Abstract: A yaw spoofing heading correction method utilizing a consensus force field is introduced to address the issue of drone swarm heading deviation during Global Navigation Satellite System (GNSS) spoofing attacks. First, a noisy environment based on asymmetric information is established to simulate the attack and defense scenario of unmanned aerial vehicle clusters against yaw deception attacks under environmental noise interference. Then, based on the attack principle, the yaw correction problem is transformed into an artificial potential field problem in which the repulsive field is invisible, and the repulsive source is predicted based on spatial relationships to achieve preliminary correction of the yaw direction. Finally, by designing a lightweight gamma Consensus mechanism and further correcting the yaw direction through credibility calculation and the consensus mechanism, collaborative defense against yaw deception attacks is achieved. The experimental results indicate that, under the CPF method, the cluster achieved a destination error of 10.32 meters, a trajectory deviation of 13.35 meters, and a task completion rate of 90.45%. Compared with game models, random decision, and other methods, this is a significant improvement, which verifies the effectiveness and robustness of the method against yaw deception attacks in long-distance flight missions.
Keyword: UAV swarm, artificial potential field, consensus mechanism, GNSS spoofing defense
Cite@inproceedings{ICIC2025,
author = {Huan Zhao and Yuxin Xue and Zhaojun Gu},
title = {A Heading Correction Method for UAV Swarms Against Yaw Deception Based on the Consensus Potential Field},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {136-148},
note = {Poster Volume Ⅰ}
}
- YOLO-FireAD: Efficient Fire Detection via Attention-Guided Inverted Residual Learning and Dual-Pooling Feature Preservation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Weichao Pan, Bohan Xu, Xu Wang, Chengze Lv, Shuoyang Wang, and Zhenke Duan
Abstract: Fire detection in dynamic environments faces continuous challenges, including illumination interference, frequent false detections and missed detections, and difficulty in balancing efficiency and accuracy. To address the feature extraction limitations and information loss in existing YOLO-based models, this study proposes You Only Look Once for Fire Detection with Attention-guided Inverted Residual and Dual-pooling Downscale Fusion (YOLO-FireAD) with two core innovations: (1) the Attention-guided Inverted Residual Block (AIR) integrates hybrid channel-spatial attention with inverted residuals to adaptively enhance fire features and suppress environmental noise; (2) the Dual Pool Downscale Fusion Block (DPDF) preserves multi-scale fire patterns through learnable fusion of max-average pooling outputs, mitigating small-fire detection failures. Extensive evaluation on two public datasets shows the efficient performance of our model. Experimental results show that the proposed model remains lightweight (1.45M parameters, 51.8% lower than YOLOv8n; 4.6 GFLOPs, 43.2% lower than YOLOv8n), and its mAP75 is 1.3-5.5% higher than that of mainstream real-time object detection models (YOLOv8n, YOLOv9t, YOLOv10n, YOLO11n, YOLOv12n) and other YOLOv8 variants.
Keyword: Fire detection, Efficient models, YOLO, Attention mechanisms, Small object detection.
Cite@inproceedings{ICIC2025,
author = {Weichao Pan and Bohan Xu and Xu Wang and Chengze Lv and Shuoyang Wang and Zhenke Duan},
title = {YOLO-FireAD: Efficient Fire Detection via Attention-Guided Inverted Residual Learning and Dual-Pooling Feature Preservation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {438-452},
note = {Poster Volume Ⅰ}
}
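The "learnable fusion of max-average pooling outputs" mentioned in the YOLO-FireAD abstract above can be illustrated with a toy example. The internal structure of the DPDF block is not given in the abstract, so this sketch assumes the simplest possible form: a single scalar `fusion_logit`, squashed by a sigmoid, that blends 2x2 max pooling and 2x2 average pooling on one feature map. All names here are hypothetical.

```python
import math


def pool2x2(feature_map, reduce_fn):
    """Apply reduce_fn to each non-overlapping 2x2 window (stride 2).

    A trailing odd row or column is dropped.
    """
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            reduce_fn([
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ])
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


def dual_pool_fusion(feature_map, fusion_logit=0.0):
    """Blend max and average pooling with weight alpha = sigmoid(fusion_logit).

    Assumption: fusion_logit stands in for a trainable parameter; in a real
    network it would be updated by backpropagation.
    """
    alpha = 1.0 / (1.0 + math.exp(-fusion_logit))
    max_out = pool2x2(feature_map, max)
    avg_out = pool2x2(feature_map, lambda window: sum(window) / len(window))
    return [
        [alpha * m + (1.0 - alpha) * a for m, a in zip(max_row, avg_row)]
        for max_row, avg_row in zip(max_out, avg_out)
    ]


# A 2 x 4 map: the second window holds one small, bright response.
fused = dual_pool_fusion([[1, 2, 0, 0], [3, 4, 0, 8]], fusion_logit=0.0)
```

Average pooling alone would dilute the isolated response in the second window, while max pooling alone would discard the surrounding context. A learned blend lets the network keep part of both, which is the intuition behind preserving small-fire patterns during downscaling.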
- MLVP-Net: Deepfake Detection Based on Multi-Level Visual Perception, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Kai Li, Shaochen Jiang, Liejun Wang, Songlin Li, Chao Liu, and Sijia He
Abstract: The negative impacts of Deepfake technology have attracted widespread attention in the multimedia forensics community. Due to the insufficient diversity of existing datasets, models tend to overly rely on forgery-specific features, resulting in poor generalization. To address this issue, we propose a multi-level visual perception network (MLVP-Net), which explores local, spatial, and semantic consistency from different perspectives to improve detection accuracy. Specifically, we first introduce a Multi-scale Spatial Perception Module (MSPM) that effectively captures both long-range and local information through parallel cascaded Hybrid State Space (HSS) blocks and multi-kernel convolution operations. Then, we present a Detail Feature Enhancement Module (DFEM), which employs multiple differential convolutions for multi-directional perception, enabling the model to sense and weight details from different directions. Finally, we propose a Content-Adaptive Attention Module (CAAM), which enriches contextual information by fusing multi-level features while guiding the model to focus on more useful information by combining channel and spatial attention mechanisms. Extensive experiments demonstrate that our MLVP-Net significantly outperforms all comparison methods across five benchmark datasets in Deepfake detection.
Keyword: Deepfake Detection, EfficientNet, State Space Model, Differential Convolution, Content Adaptive Attention
Cite@inproceedings{ICIC2025,
author = {Kai Li and Shaochen Jiang and Liejun Wang and Songlin Li and Chao Liu and Sijia He},
title = {MLVP-Net: Deepfake Detection Based on Multi-Level Visual Perception},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {453-467},
note = {Poster Volume Ⅰ}
}
- LLM Evaluation Panel Selection Based on Cross Assessment and Similarity Matrix, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zhixiang Yang, Rongduo Han, Xiaoteng Pan, Liming Kang, Meiping Wang, Nan Gao, and Haining Zhang
Abstract: The evaluation of Large Language Models (LLMs) has become increasingly crucial as these models continue to advance in capabilities and complexity. Current evaluation methods primarily rely on lexical metrics and single-model scoring systems, which fall short of comprehensively and accurately assessing the capabilities and performance of LLMs in semantic understanding and logical reasoning. This presents a significant challenge in developing reliable and trustworthy assessment frameworks. The contributions of the study are as follows. First, it introduces an automated approach that combines cross-model evaluation mechanisms with similarity analysis to systematically select members for the multi-model evaluation panel. Second, it validates the effectiveness of its methodology using expert-annotated evaluation data. Experimental results demonstrate that the multi-model evaluation panel approach achieves a noticeable improvement in scoring consistency with human evaluation compared to the single-model approach.
Keyword: Large language models, Model performance evaluation, Multi-model scoring, Cross Assessment, Similarity Matrix
Cite@inproceedings{ICIC2025,
author = {Zhixiang Yang and Rongduo Han and Xiaoteng Pan and Liming Kang and Meiping Wang and Nan Gao and Haining Zhang},
title = {LLM Evaluation Panel Selection Based on Cross Assessment and Similarity Matrix},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {805-820},
}
- Change Detection in Wide-Field Video Images Based on Low Illumination Enhancement, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jia He, Tianyu Ren, Yankai Cao, Zhenhong Jia, and Sensen Song
Abstract: In low illumination environments, the image quality of eagle-eye surveillance devices is significantly degraded by high-density random noise and inhomogeneous illumination. In addition, the change targets in the surveillance area are usually small, which further increases the difficulty of change detection (CD) and makes it prone to false positives and false negatives. In this paper, we propose a new unsupervised CD method for small targets. Specifically, under low illumination, the image is pre-enhanced using the bright channel prior and the Single-Scale Retinex (SSR) algorithm to improve image quality. Two difference images (DIs) are then generated by the arctangent operator and the Chi-Square Transform (CST), and the difference information is fused using the improved multiplicative fusion (MTF) technique to ensure the completeness of the details in the change region and suppress the noise. In particular, for areas with few or no changes, we propose a threshold segmentation method based on Log-Normal Distribution Histogram Fitting Error Minimization (LNDFEM) to achieve the segmentation of change regions. Experimental results demonstrate that the proposed method outperforms comparison algorithms in terms of overall error, F1-Score, and Kappa coefficient, and exhibits stronger robustness.
Keyword: Low Illumination Change Detection, Wide Field of View, Security Surveillance, Log-normal Distribution, Image Enhancement.
Cite@inproceedings{ICIC2025,
author = {Jia He and Tianyu Ren and Yankai Cao and Zhenhong Jia and Sensen Song},
title = {Change Detection in Wide-Field Video Images Based on Low Illumination Enhancement},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {468-481},
note = {Poster Volume Ⅰ}
}
- Causality Extraction in Chinese Public Health Events Text, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shituo Ma, Lingwei Chen, and Ran Wang
Abstract: Extracting causality in public health event datasets is crucial, and traditional sentence-level extraction methods have been extensively studied. However, the performance of widely used models remains poor, especially for Chinese datasets. One reason is the lack of high-quality labeled Chinese datasets in this field. Additionally, implicit causality, cross-sentence causality, and multiple causalities in Chinese datasets make it difficult for models to fully extract causality. To address these issues, we constructed the first Chinese public health event dataset for causality extraction, containing 33,286 Weibo texts. We propose a model with multi-task learning to provide additional information and an attention mechanism to focus on key context for causality. The model achieved an F1 score of 0.9554 on our dataset and performed well in multiple causalities and cross-sentence causality. Our work focuses on short-text relationship extraction in the context of public health events, addressing the unique challenges of implicit causality and cross-sentence dependencies.
Keyword: Public Health Events, Causality Extraction, BERT-BiLSTM-Attention-CRF, Multi-task Learning
Cite@inproceedings{ICIC2025,
author = {Shituo Ma, Lingwei Chen, and Ran Wang},
title = {Causality Extraction in Chinese Public Health Events Text},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2222-2233},
note = {Poster Volume Ⅱ}
}
- Deep Reinforcement Learning for Solving Electric Vehicle Routing Problems with Battery Swapping Station, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Qichao Sun, Junqing Li, and Xiaolong Chen
Abstract: With the widespread adoption of electric vehicles (EVs) in logistics and transportation, long charging times and limited range have become significant challenges. Battery swapping stations offer an efficient energy replenishment method that can substantially reduce replenishment time. This study investigates the Electric Vehicle Routing Problem with Battery Swapping Station (EVRP-BSS). To address this problem, a Deep Reinforcement Learning (DRL)-based approach is proposed that trains an encoder-decoder structured policy network to construct vehicle routes sequentially. First, the construction of the EVRP-BSS solution is modeled as a Markov Decision Process (MDP). Then, to recognize the relationships more effectively, we design a graph convolutional network (GCN)-based encoder that separately embeds node features and edge features (e.g., distance, slope), further fusing them through self-attention to generate global representations for downstream tasks. This approach enhances the node information, ultimately leading to high-quality solutions. During training, we update the model parameters by the multiple starting sampling trajectory method. Experimental results demonstrate that our method outperforms various traditional and DRL-based baselines while showing strong generalization ability.
Keyword: Deep Reinforcement Learning, Electric Vehicle Routing Problem, Graph Convolutional Network.
Cite@inproceedings{ICIC2025,
author = {Qichao Sun, Junqing Li, and Xiaolong Chen},
title = {Deep Reinforcement Learning for Solving Electric Vehicle Routing Problems with Battery Swapping Station},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {149-161},
note = {Poster Volume Ⅰ}
}
- Multi-scale Depth-Calibrated Kernel-split Network for Monocular Occupancy Prediction, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xiaoke Tan, Jinlai Zhang, Yuxin Chen, Kai Gao, Qiqi Li, Lin Hu, and Gang Wu
Abstract: In the task of indoor occupancy prediction from a single image, there is an issue where it is challenging to complement the semantic scene due to the small size and intricate nature of the predicted objects. Traditional methods struggle to obtain rich and accurate semantic information, which results in the worsening of the problem of semantic scene blurring. To solve the problem, we propose a novel approach accompanied by our proposed Multi-scale Context Calibration Module (MCCM), Depth Calibrated Residual Network (DCRNet) and Kernel-split Depthwise Attention (KSDA) to enhance the scene semantic information and alleviate the depth blurring problem of semantic scenes. Ablation experiments confirm the effectiveness of our module. Comparative analysis with the SOTA model verifies the superiority and generalisation of our model. Comparison on the OccScanNet_mini dataset confirms the excellent generalisation of our method even with limited data. Specifically, our method reaches 47.94 and 32.33 for IoU and mIoU on the NYUv2 dataset, and 42.81 and 29.59 for IoU and mIoU on the OccScanNet dataset, respectively.
Keyword: Indoor occupancy prediction · Semantic scene completion · Computer vision.
Cite@inproceedings{ICIC2025,
author = {Xiaoke Tan, Jinlai Zhang, Yuxin Chen, Kai Gao, Qiqi Li, Lin Hu, and Gang Wu},
title = {Multi-scale Depth-Calibrated Kernel-split Network for Monocular Occupancy Prediction},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2951-2965},
}
- Research on accurate prediction of Air quality index in Nanyang City driven by machine learning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: FengYue Jiang, HongJie Tu, Zhen Shen, Ya Qiu
Abstract: Air quality prediction models often face the problems of insufficient generalization ability and poor prediction accuracy. To address this challenge, a machine learning model is proposed for Air Quality Index (AQI) prediction. Based on the air quality data of Nanyang City from 2023 to 2024, this study screened out PM2.5, PM10 and CO as key characteristic variables through correlation coefficient analysis. The parameters of the eXtreme Gradient Boosting, Random Forest, and Support Vector Regression (SVR) models were optimized using the grid search method, and the models were then trained and used for prediction. The experimental results show that the SVR model optimized by grid search performs best: its R² is 0.886875, MAE is 0.026789, and MSE is 0.001636. This study provides an efficient and feasible method for air quality prediction, and provides data support and technical reference for the formulation of air pollution prevention strategies and environmental management decisions.
Keyword: Air quality prediction, Machine learning, Grid search, Support vector regression model, Hyperparameter optimization
Cite@inproceedings{ICIC2025,
author = {FengYue Jiang, HongJie Tu, Zhen Shen, Ya Qiu},
title = {Research on accurate prediction of Air quality index in Nanyang City driven by machine learning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {128-138},
}
- Self-Supervised Monocular Depth Estimation Based on Dual-Branch DepthNet and Multi-Attention Fusion, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yuanyuan Wang, Dianxi Shi, Junze Zhang, Luoxi Jing, Xueqi Li, and Xucan Chen
Abstract: Monocular depth estimation infers 3D geometric structures of scenes from a single RGB image, offering significant applications in autonomous driving, robot navigation, and other fields. While current self-supervised learning methods avoid dependency on ground truth depth data, they still exhibit notable limitations in complex scenarios: traditional encoder-decoder architectures inevitably lose high-frequency detail features when acquiring global context through continuous downsampling, resulting in blurred edges and texture distortion in depth maps. To alleviate these issues, we propose a novel approach named HyperDetailNet, which significantly enhances depth estimation detail preservation. Specifically, our method contains two key components: (1) A dual-branch detail-global feature extraction network, where the detail branch adopts an enlarge-then-reduce strategy to preserve high-frequency texture information, while the global branch extracts overall structural information of the scene. (2) To effectively fuse features from both branches, we designed a multi-attention fusion module that combines spatial attention, channel attention, and sliding window self-attention mechanisms to enhance model perception of detailed regions. Experimental results demonstrate that HyperDetailNet achieves excellent performance on both KITTI and Make3D datasets, with significant improvements in depth estimation for edge and texture-rich areas. Additionally, ablation experiments verify the effectiveness of the dual-branch detail-global feature extraction DepthNet and multi-attention fusion module.
Keyword: Monocular Depth Estimation · Self-Supervised · Dual-Branch Network
Cite@inproceedings{ICIC2025,
author = {Yuanyuan Wang, Dianxi Shi, Junze Zhang, Luoxi Jing, Xueqi Li, and Xucan Chen},
title = {Self-Supervised Monocular Depth Estimation Based on Dual-Branch DepthNet and Multi-Attention Fusion},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {482-498},
note = {Poster Volume Ⅰ}
}
- BGE-YOLO: An Improved YOLOv8 for Chinese Handwritten Text Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Rui Xiong, Shiyu Li, Xiuwei Yang, and Zhiwu Liao
Abstract: Handwritten text detection is a crucial step in converting handwritten text images into editable text. However, in practical applications, text detection still faces numerous challenges, including the complexity of environmental backgrounds, diversity of target scales, and the contact between complex characters. To address these challenges, this paper proposes a BGE-YOLO model for handwritten text detection. Firstly, a new feature fusion module is designed to achieve bidirectional information flow through cross-scale connections and rapid planning, ensuring effective integration of features across multiple scales. On this basis, a Global Attention Mechanism (GAM) is incorporated, which reduces information loss and amplifies the interaction of global dimensional features, enabling the model to extract meaningful information in complex backgrounds. Additionally, the incorporated Multi-Scale Attention (EMA) module utilizes a novel cross-spatial learning approach, enhancing the interaction of local features and further improving feature fusion efficiency. Furthermore, a data augmentation strategy enriches the self-constructed handwritten text image dataset, further improving the model's generalization ability. Experimental results indicate that compared to the YOLOv8 model, the mAP50 and accuracy (P) of this model have increased by 2.8% and 3.9%, respectively. This validates the advantages of the BGE-YOLO model in handwritten text detection and facilitates more convenient information extraction from handwritten text.
Keyword: handwritten text, text detection, BGE-YOLO, BiFPN, Global Attention Module, EMA, self-constructed dataset.
Cite@inproceedings{ICIC2025,
author = {Rui Xiong, Shiyu Li, Xiuwei Yang, and Zhiwu Liao},
title = {BGE-YOLO: An Improved YOLOv8 for Chinese Handwritten Text Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2966-2983},
}
- Cultural Heritage Assistant: A Lightweight Retrieval Augmented Generation Method Enhanced Vision-Language Model for Cultural Heritage, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shiyu Wang, Huanda Lu, Haibiao Yao, Zhiyu Wu, Chen Hu, Xiangjie Xie, and Xin Yu
Abstract: External knowledge makes the Vision-Language Model (VLM) more versatile. However, traditional methods often fail to address the nuanced challenges of cultural heritage, mainly when dealing with unfamiliar or complex artifact-related queries. This limitation is evident in Vision-Language Models, which struggle to generate responses without exposure to domain knowledge. Frequent retraining to accommodate new artifacts or knowledge domains is computationally expensive and impractical. To overcome these limitations, we propose Cultural Heritage Assistant, a lightweight Retrieval-Augmented Generation (RAG) method designed for enhancing small-scale VLMs. Our approach integrates visual and textual retrieval modules to augment the input context, enabling the model to generate professional and accurate responses for cultural heritage queries. Experimental results on the constructed Hemudu Artifacts Visual Question-Answering dataset demonstrate the effectiveness of this approach. This method offers a solution for preserving and disseminating cultural heritage, bridging the gap between advanced VLM capabilities and domain-specific expertise.
Keyword: VLM, RAG, Cultural Heritage Assistant
Cite@inproceedings{ICIC2025,
author = {Shiyu Wang, Huanda Lu, Haibiao Yao, Zhiyu Wu, Chen Hu, Xiangjie Xie, and Xin Yu},
title = {Cultural Heritage Assistant: A Lightweight Retrieval Augmented Generation Method Enhanced Vision-Language Model for Cultural Heritage},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3617-3630},
}
- PBSpikformer: A Pure Spike-Driven Spiking Neural Network with Fourier-Phase Attention and Dynamic Batch Context, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chengfan Yang, Tao Deng, Ran Li, and Fei Yan
Abstract: Spiking Neural Networks (SNNs), inspired by biological neurons, have gained increasing attention for their energy efficiency and event-driven computation. However, their binary nature and complex dynamics make it difficult to train high-performance, low-latency models, limiting their progress compared to Artificial Neural Networks (ANNs). To address these challenges, we propose PBSpikformer, a directly trainable spiking Transformer architecture that incorporates two novel components: Fourier-Phase Attention (FPA) and Dynamic Batch Context (DBC). FPA combines spike-based Q-K token attention with Spectral Cross-Modal Augmentation (SCMA) to effectively fuse spatial, temporal, and frequency-domain features while reducing computational complexity. DBC introduces batch-level global signals to modulate local and global activations, improving gradient flow and training robustness. Extensive experiments show that PBSpikformer outperforms existing SNN models across multiple benchmarks, achieving 96.7% accuracy on CIFAR10-DVS, a 12.7% improvement over previous methods, and becoming the first directly trained SNN to surpass 90% accuracy on this dataset.
Keyword: Spiking neural network, Fourier transform, Attention mechanism
Cite@inproceedings{ICIC2025,
author = {Chengfan Yang, Tao Deng, Ran Li, and Fei Yan},
title = {PBSpikformer: A Pure Spike-Driven Spiking Neural Network with Fourier-Phase Attention and Dynamic Batch Context},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1432-1443},
note = {Poster Volume Ⅱ}
}
- V2VSR: Keypoints Feature Fusion-Based Cooperative Perception Method Under Communication Delays, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Junxi Chen, Xiangfeng Luo, Liyan Ma, and Xue Chen
Abstract: Single-agent LiDAR-based perception has significantly progressed but remains constrained by factors such as sensor range and occlusion. Multi-agent point clouds cooperative perception leverages inter-agent communication to share sensory information, thereby enhancing the perception capabilities of individual agents. Existing methods often assume ideal communication conditions. However, in the real world, data transmission delays are inevitable, which can cause the central agent to receive inaccurate features, leading to significant misguidance in perception results. This paper proposes a novel framework for multi-agent point clouds cooperative perception to efficiently extract key features and reduce latency. Specifically, we introduce a Sparse-Residual PointPillar (SRPP) backbone, improving inference speed and receptive field, and a Pillar Set Abstract Module (PSM), which abstracts scenes into compact keypoint features, significantly reducing shared feature map size. Additionally, we employ an inter-agent attention module, leveraging the characteristic of the main agent's own feature map, which requires no transmission and thus has no latency, to correct potential feature distortions and mitigate the impact of partially unavoidable delays, thereby improving system robustness. Our method can significantly reduce the shared feature map size to less than 0.1 MB, approximately 40 times smaller than most state-of-the-art methods. Even with significantly reduced shared feature maps, our model still outperforms other methods under ideal communication conditions and demonstrates a substantial advantage under delayed communication scenarios, indicating that our method significantly enhances the perception system's performance and delay robustness.
Keyword: Cooperative perception, 3D object detection, Sparse-Residual convolution, Key Feature Selection, Transformer
Cite@inproceedings{ICIC2025,
author = {Junxi Chen, Xiangfeng Luo, Liyan Ma, and Xue Chen},
title = {V2VSR: Keypoints Feature Fusion-Based Cooperative Perception Method Under Communication Delays},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3509-3526},
}
- Defect Detection and Classification of PCB Based on RT-DETR, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shuai Hao, Xiaoqi He, and Ya Li
Abstract: Printed circuit board (PCB) defect detection is crucial for ensuring PCB quality. However, small defect sizes on PCBs and the substantial parameter and computational requirements of deep learning models pose challenges in detection accuracy and deployment on resource-constrained devices. To address these issues, we introduce an optimized PCB defect detection method based on the RT-DETR model. Firstly, we propose the ContextAdown-D-LKAEfficientFormerV2 lightweight network to replace the ResNet18 backbone, reducing parameters and computations while enhancing small defect feature extraction. Secondly, we present the Bi-Slim-Neck lightweight structure to replace the CCFM component in the original model, achieving lightweight design and improved feature fusion capabilities to leverage effective features fully. Lastly, we propose the InnerShapeIoU loss function to replace the GIoU loss function, accounting for the influence of PCB defect bounding box shapes and scales on regression, and generating auxiliary bounding boxes suitable for PCB defect detection tasks and detectors. This enhances model generalization and detection accuracy. Experimental results show that the improved model achieves detection accuracies of 97% (mAP50) and 55% (mAP50:95), with a 23.8% reduction in parameters and a 44.9% decrease in computations compared to the original model. This indicates that the improved method significantly boosts parameter efficiency and reduces computational complexity while maintaining high detection accuracy.
Keyword: RT-DETR, PCB defect detection, Small defect.
Cite@inproceedings{ICIC2025,
author = {Shuai Hao, Xiaoqi He, and Ya Li},
title = {Defect Detection and Classification of PCB Based on RT-DETR},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1974-1989},
note = {Poster Volume Ⅱ}
}
- Fre-CW: Targeted Attack on Time Series Forecasting using Frequency Domain Loss, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Naifu Feng, Lixing Chen, Junhua Tang, Hua Ding, Jianhua Li, and Yang Bai
Abstract: Transformer-based models have made significant progress in time series forecasting. However, a key limitation of deep learning models is their susceptibility to adversarial attacks, which has not been studied enough in the context of time series prediction. In contrast to areas such as computer vision, where adversarial robustness has been extensively studied, frequency domain features of time series data play an important role in the prediction task but have not been sufficiently explored in terms of adversarial attacks. This paper proposes a time series prediction attack algorithm based on frequency domain loss. Specifically, we adapt an attack method originally designed for classification tasks to the prediction field and optimize the adversarial samples using both time-domain and frequency-domain losses. To the best of our knowledge, there is no relevant research on using frequency information for time-series adversarial attacks. Our experimental results show that these current time series prediction models are vulnerable to adversarial attacks, and our approach achieves excellent performance on major time series forecasting datasets.
Keyword: Adversarial attack, time series forecasting, frequency domain analysis
Cite@inproceedings{ICIC2025,
author = {Naifu Feng, Lixing Chen, Junhua Tang, Hua Ding, Jianhua Li, and Yang Bai},
title = {Fre-CW: Targeted Attack on Time Series Forecasting using Frequency Domain Loss},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {915-928},
note = {Poster Volume Ⅰ}
}
- Retrieval and Ranking of Scientific Documents Based on LSB and TOPSIS, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Peng Jin
Abstract: Fusing mathematical expressions, related text features, and document attributes is vital for improving the retrieval and ranking performance of scientific documents based on mathematical information. However, due to the specificity of mathematical expressions and their contextual relationships, as well as the diversity of document attributes, they are challenging to utilize fully in existing models. To address these issues, a retrieval and ranking model of scientific documents based on LSB (LDA-SBERT) and TOPSIS (Technique for Order Preference by Similarity to an Ideal Solution) is proposed. Firstly, the mathematical expressions are analyzed using a symbol-level multidimensional parsing algorithm, and hesitant fuzzy sets are introduced to calculate the similarity of mathematical expressions. Then, the LSB model is used to analyze the mathematical expression contexts, extract the contextual features, and calculate text similarity accordingly. By integrating the similarity of mathematical expressions and contexts, the preliminary retrieval results are obtained. Finally, the document attribute set is constructed, and the TOPSIS is used to calculate the influence weight of documents and weight it against the preliminary results to achieve a more precise and influential ranking of scientific documents. Experimental results show that the average MAP_10 is 85.5% and the average NDCG@10 is 87.8%.
Keyword: Scientific document retrieval, document ranking, mathematical expressions, hesitant fuzzy sets, LSB, TOPSIS.
Cite@inproceedings{ICIC2025,
author = {Peng Jin},
title = {Retrieval and Ranking of Scientific Documents Based on LSB and TOPSIS},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3633-3646},
}
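The TOPSIS step named in the abstract above can be illustrated with a minimal, generic sketch. It follows the textbook procedure (vector normalization, weighting, distance to the ideal and anti-ideal solutions, relative closeness) and is not the paper's implementation: the attribute columns, weights, and sample values are hypothetical, and how the resulting score is combined with the preliminary retrieval results is not specified in the abstract, so that step is omitted.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a TOPSIS closeness score in [0, 1] for each row of `matrix`.

    matrix  : list of rows, one per document; columns are attributes (hypothetical)
    weights : one weight per attribute
    benefit : True if larger is better for that attribute, False if smaller is better
    """
    n_cols = len(weights)
    # Vector-normalize each column (guard against an all-zero column).
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)]
                for row in matrix]
    # Ideal and anti-ideal solutions, column by column.
    cols = list(zip(*weighted))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        # If every row is identical, both distances are zero; use a neutral score.
        scores.append(d_worst / total if total else 0.5)
    return scores


# Hypothetical attributes: [citation count, years since publication]
docs = [[120, 2], [15, 9], [60, 5]]
scores = topsis(docs, weights=[0.7, 0.3], benefit=[True, False])
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

A document that is best on every attribute receives a score of 1 and one that is worst on every attribute receives 0, so the scores can be read as a relative influence weight under the chosen attributes.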
- DAICW: Defect Detection Algorithm for High-voltage Transmission Lines in Complex Weather, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Haofeng Li, Zhiqing Guo, Liejun Wang, and Yongming Li
Abstract: High-voltage transmission lines are continuously exposed to outdoor environments, where harsh natural conditions can lead to structural damage. Addressing the challenge of defect detection under complex weather conditions, we propose the Detection Algorithm in Complex Weather (DAICW). Firstly, we introduce the Detail-Enhanced Convolution (DEConv), designed to extract richer features without increasing the parameters. Subsequently, a Focus-Detect is incorporated to emphasize defect features within images while suppressing background interference. Finally, using the LInner-IoU loss function can effectively accelerate convergence and improve the model’s ability to detect small objects. Experiments with other mainstream models reveal that DAICW achieves a detection precision of 78.2% and a recall of 67.8%, showcasing robust adaptability in detecting multiple defect types under complex weather scenarios.
Keyword: High-voltage transmission line, Complex weather, Defect detection, Feature enhancement.
Cite@inproceedings{ICIC2025,
author = {Haofeng Li, Zhiqing Guo, Liejun Wang, and Yongming Li},
title = {DAICW: Defect Detection Algorithm for High-voltage Transmission Lines in Complex Weather},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {499-512},
note = {Poster Volume Ⅰ}
}
- Beyond Limitations: Omni-DETR for Comprehensive Object Detection in Real-Time Applications, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jiantao Nie, Zuohua Ding, and Xiao Zhu
Abstract: In the field of autonomous driving, the performance of object detectors is crucial, as they must not only provide accurate and reliable environmental perception but also meet the demands for high inference speed and lightweight model requirements. Although the RT-DETR model, based on the Transformer architecture, has demonstrated high speed and accuracy in real-time end-to-end object detection, its performance in detecting small objects is suboptimal. To address this issue, we introduce Omni-DETR, an advanced model aimed at enhancing detection accuracy for small objects without compromising efficiency. Omni-DETR incorporates FasterIRANet as its backbone for feature extraction, significantly reducing redundant computations and memory access, thereby achieving high inference speed while enhancing accuracy. In the encoder, we propose the Dimensional Feature Integrator (DFI), which strengthens the model's capability to capture multi-scale features. Additionally, we design a novel bounding box regression loss function, InnerMPDIoU. Experimental results on the TT100K dataset demonstrate that Omni-DETR achieves an AP of 61.2 and a processing speed of 42.8 FPS on a 3090 GPU, while attaining 53.7 AP on the COCO dataset. Compared to several existing models, Omni-DETR proves its superiority in comprehensive performance.
Keyword: End-to-End Object Detection, Small Object, Transformer
Cite@inproceedings{ICIC2025,
author = {Jiantao Nie, Zuohua Ding, and Xiao Zhu},
title = {Beyond Limitations: Omni-DETR for Comprehensive Object Detection in Real-Time Applications},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {513-528},
note = {Poster Volume Ⅰ}
}
- A High-Imperceptibility Image Steganography Scheme via Makeup Transfer Network and Multiple Feature Fusion, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Meihong Yang, Ziyi Feng, Bin Ma, Qi Li, and Linna Zhou
Abstract: Existing image steganography inevitably leaves modification traces in the cover image, creating a risk of secret information leakage. Therefore, this paper proposes a color image steganography algorithm based on a Makeup Transfer Network and multi-scale feature fusion, which embeds the secret image during the makeup transfer process. Specifically, the secret image is first mapped into its latent representation, which is then fused with makeup features at multiple scales to generate a makeup-ed stego image, yielding excellent stego image quality and high imperceptibility of the secret information. Moreover, an Information Compensation Network (ICN) is constructed for deep fine-grained feature fusion. By using the differences between the original and rebuilt secret information as the network loss, the information of the secret image is comprehensively compensated and its quality is further improved. Experimental results show that the proposed scheme exhibits superior image quality on both the target image and the recovered secret image, thus providing good security.
Keyword: Steganography, Makeup transfer, Information compensation, Multi-scale feature fusion.
Cite@inproceedings{ICIC2025,
author = {Meihong Yang, Ziyi Feng, Bin Ma, Qi Li, and Linna Zhou},
title = {A High-Imperceptibility Image Steganography Scheme via Makeup Transfer Network and Multiple Feature Fusion},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {929-943},
note = {Poster Volume Ⅰ}
}
- AMGCN-FL: Adaptive Multi-Graph Convolutional Networks for Personalized Federated Learning in Industrial IoT Environments, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chenhao Ye and Hailang Jia
Abstract: Industrial Internet of Things (IIoT) environments generate vast amounts of heterogeneous data across distributed devices, presenting unique challenges for machine learning applications. Federated Learning (FL) has emerged as a promising paradigm for collaborative model training while preserving data privacy. However, existing FL approaches struggle with the non-IID (non-Independent and Identically Distributed) nature of industrial data, leading to suboptimal personalization. In this paper, we propose AMGCN-FL, an Adaptive Multi-Graph Convolutional Network for Federated Learning that addresses the challenges of personalized learning in heterogeneous IIoT environments. Our approach leverages adaptive graph structures to capture complex relationships between clients and introduces a novel parameter-efficient knowledge transfer mechanism. Theoretical analysis demonstrates the convergence properties of our algorithm under non-IID data distributions. Extensive experiments on benchmark datasets show that AMGCN-FL consistently outperforms state-of-the-art personalized FL methods, achieving up to 5.8% improvement in accuracy while maintaining communication efficiency. The proposed method demonstrates robust performance across various degrees of data heterogeneity, making it particularly suitable for real-world industrial applications.
Keyword: Federated Learning, Personalization, Graph Convolutional Networks, Industrial IoT, Non-IID Data
Cite@inproceedings{ICIC2025,
author = {Chenhao Ye and Hailang Jia},
title = {AMGCN-FL: Adaptive Multi-Graph Convolutional Networks for Personalized Federated Learning in Industrial IoT Environments},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1447-1462},
note = {Poster Volume Ⅱ}
}
- MMCFusionNet: A Multimodal Mixture of Experts and Collaborative Attention Fusion Network for Abnormal Emotion Recognition, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Haitao Xiong, Xin Zhou, Wei Jiao, and Yuanyuan Cai
Abstract: The growing popularity of short videos on social media has introduced new challenges for content moderation, particularly in detecting abnormal emotions like hate and sarcasm. These emotions usually exhibit higher concealment and multimodal inconsistency compared to conventional ones. While prior studies have primarily focused on conventional emotion recognition, research on abnormal emotions remains limited. Moreover, existing models often fail to leverage the complementary nature of multimodal data fully and lack robust intermodal interactions. This study proposes MMCFusionNet, a novel multimodal fusion framework designed for abnormal emotion recognition in short videos. The model extracts and aligns features from four modalities (text, visual, audio, and facial) through a dedicated feature encoder and alignment module to improve the ability of hate and sarcasm emotion recognition. At its core, the model integrates two key mechanisms: (1) Mixture of Experts (MoE) modules to enhance intramodal representations across temporal frames for identifying concealed emotional cues; (2) dual-channel collaborative attention (Co-Attention) modules to facilitate intermodal complementarity for resolving multimodal contradictions. Experimental results on the HateMM and MUStARD datasets show that MMCFusionNet outperforms baseline models across various evaluation metrics, with ablation studies confirming the effectiveness and robustness of each module.
Keyword: Emotion recognition, Multi-modal learning, Multi-modal fusion, Mixture of Experts, Collaborative Attention
Cite@inproceedings{ICIC2025,
author = {Haitao Xiong and Xin Zhou and Wei Jiao and Yuanyuan Cai},
title = {MMCFusionNet: A Multimodal Mixture of Experts and Collaborative Attention Fusion Network for Abnormal Emotion Recognition},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2982-2994},
}
- Meta-learning and Residual Block Enhanced YOLO for Accurate Detection of Gastrointestinal Pathology Lesions, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xiangyu Xue and Yakun Wang
Abstract: Early identification and accurate diagnosis of gastrointestinal diseases, particularly gastric cancer, are paramount for enhancing patient survival rates and treatment outcomes. However, diagnosing these diseases can be challenging, especially when symptoms are mild or absent. Endoscopy, a standard diagnostic tool, relies heavily on the endoscopist's expertise. Integrating artificial intelligence (AI) with endoscopic imaging has the potential to assist in diagnosis, reduce missed cases, and expedite timely treatment. Previous studies have focused on refining disease classification and improving diagnostic accuracy, often neglecting issues of data reliability and imbalance. This study proposes a novel approach utilizing model-agnostic meta-learning (MAML) strategies to address the challenges posed by sparse and imbalanced medical image data. We introduce the YOLO-MR model, which incorporates meta-recognition mechanisms and residual blocks into the YOLO framework. Experimental results demonstrate that the traditional YOLO model achieves a mean average precision (mAP) of only 41.7% on imbalanced data, highlighting the negative impact of data imbalance. Traditional data augmentation techniques improve the mAP to 65.2%, whereas our proposed YOLO-MR model achieves an mAP of 96%, an improvement of 54.3% over the traditional model. This enhancement effectively reduces the diagnostic accuracy gap between different disease categories and mitigates the issue of data imbalance. Furthermore, our research validates the strong potential of advanced techniques such as MAML and residual blocks in resource-limited medical image recognition tasks. These findings provide valuable insights into addressing the challenges of limited and imbalanced medical data in the healthcare field.
Keyword: Gastrointestinal endoscopy, Medical image, Meta-learning, YOLO, Lesions
Cite@inproceedings{ICIC2025,
author = {Xiangyu Xue and Yakun Wang},
title = {Meta-learning and Residual Block Enhanced YOLO for Accurate Detection of Gastrointestinal Pathology Lesions},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2238-2255},
note = {Poster Volume Ⅱ}
}
- SAKR: Enhancing Retrieval-Augmented Generation via Streaming Algorithm and K-Means Clustering, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Haoyu Kang, Yuzhou Zhu, Yukun Zhong, Ke Wang, and Ping Zhong
Abstract: Retrieval-augmented generation (RAG) has achieved significant success in information retrieval, helping large language models (LLMs) answer questions about unseen documents by building an external knowledge base. However, it faces significant challenges, including high memory consumption due to the extensive database and the inability to update the index database in real time when handling large data streams. To reduce the memory required for building the database while maintaining accuracy, we propose a novel approach that integrates a streaming algorithm with k-means clustering into RAG. Our approach applies a streaming algorithm to update the index dynamically and reduce memory consumption. Additionally, a k-means clustering algorithm that groups similar documents is applied to reduce query time. We conducted comparative experiments on RAG with streaming algorithm and k-means clustering (SAKR), and the results indicate that SAKR outperforms traditional RAG in both accuracy and memory efficiency, particularly for large-scale streaming data, with an average accuracy of 0.640 and 10% memory cost.
Keyword: Retrieval-augmented generation, Natural Language Processing, Streaming algorithm.
Cite@inproceedings{ICIC2025,
author = {Haoyu Kang and Yuzhou Zhu and Yukun Zhong and Ke Wang and Ping Zhong},
title = {SAKR: Enhancing Retrieval-Augmented Generation via Streaming Algorithm and K-Means Clustering},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {821-832},
}
- MCF-SVC: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Hui Li, Hongyu Wang, Bohan Sun, Zhijin Chen, and Yanmin Qian
Abstract: Singing voice conversion aims to convert a source singing voice into a target singing voice without changing the content. Currently, flow-based models can complete the task of voice conversion, but they struggle to effectively extract latent variables in the more rhythmically rich and emotionally expressive task of singing voice conversion, while also suffering from low efficiency in voice processing. In this paper, we propose a high-fidelity flow-based model based on multi-condition feature constraints called MCF-SVC, which enhances the capture of voice details by integrating multiple latent attribute encoders. We also use multi-stream inverse short-time Fourier transform (MS-iSTFT) instead of a traditional vocoder to speed up voice reconstruction. We compared the synthesized singing voice of our model with those of other competitive models along multiple dimensions, and our proposed model is highly consistent with the current state of the art. A demo is available at https://lazycat1119.github.io/MCF-SVC-demo.
Keyword: Singing voice conversion, Flow model, MS-iSTFT, Multi-Condition.
Cite@inproceedings{ICIC2025,
author = {Hui Li and Hongyu Wang and Bohan Sun and Zhijin Chen and Yanmin Qian},
title = {MCF-SVC: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1463-1477},
note = {Poster Volume Ⅱ}
}
- AKPFL: A Personalized Federated Learning Architecture to Alleviate Statistical Heterogeneity, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shuoshuo Fang, Jialong Sun, Wenchao Zhang, Zongjian Yang, and Kejia Zhang
Abstract: Federated Learning (FL) has gained considerable attention in machine learning for its ability to preserve data privacy while enabling collaborative modeling. However, statistical heterogeneity, such as non-independent and identically distributed (non-IID) data, severely limits performance. To address this limitation, this study proposes an Adaptive Kernel Alignment-based Personalized Federated Learning framework (AKPFL). This approach achieves a balance between global model sharing and local adaptation by incorporating a dynamic kernel adjustment mechanism and a personalized model fusion strategy, thereby improving model generalization and robustness in heterogeneous data environments. Experimental results demonstrate that, compared to existing algorithms, AKPFL delivers substantial improvements in test accuracy on datasets such as FashionMNIST, CIFAR-10, and CIFAR-100, particularly under high statistical heterogeneity. The code for the framework will be released publicly following the completion of the paper review process.
Keyword: Federated Learning, Personalized Federated Learning, Statistical Heterogeneity, Kernel Function, Meta-Learning.
Cite@inproceedings{ICIC2025,
author = {Shuoshuo Fang and Jialong Sun and Wenchao Zhang and Zongjian Yang and Kejia Zhang},
title = {AKPFL: A Personalized Federated Learning Architecture to Alleviate Statistical Heterogeneity},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1478-1493},
note = {Poster Volume Ⅱ}
}
- MMGT-PD: A Multi-Modal Graph Transformer for Parkinson’s Disease Stage Classification Using Clinical Omics and Whole Blood RNA Sequencing Data, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chengjie Ding, Zeqi Xu, and Wei Zhang
Abstract: The assessment of both motor and non-motor functions in Parkinson's disease (PD) plays a crucial role in disease diagnosis and early intervention. In recent years, multi-modal deep learning methods have demonstrated excellent performance in identifying disease subtypes. However, previous studies have primarily focused on clinical and transcriptomic data, neglecting the information on gene associations. This paper proposes a multi-modal graph Transformer model named MMGT-PD, which integrates whole blood RNA sequencing data, gene co-expression networks, and clinical omics data, combining modality-specific and consensus information to significantly enhance the accuracy of Parkinson's disease diagnosis. The model constructs a gene co-expression network using RNA sequencing data and designs an RNA sequencing encoder that combines a Graph Attention Network (GAT) and a Kolmogorov-Arnold Network (KAN) to extract RNA-specific representations. Additionally, the model introduces the Genegraph-Clinic Fusion (GCFusion) module to enhance the integration of multi-modal data by extracting shared information through inter-modal interactions. This paper conducts extensive comparative experiments on two well-known Parkinson's disease datasets, and the results show that the MMGT-PD method outperforms baseline models.
Keyword: Whole blood RNA sequencing data, Gene co-expression networks, Clinical omics data, Graph Transformer
Cite@inproceedings{ICIC2025,
author = {Chengjie Ding and Zeqi Xu and Wei Zhang},
title = {MMGT-PD: A Multi-Modal Graph Transformer for Parkinson’s Disease Stage Classification Using Clinical Omics and Whole Blood RNA Sequencing Data},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1186-1200},
}
- MedFedSSL: Semi-Supervised Federated Learning with Dual Consistency and Adaptive Distillation for Medical Imaging, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chenhao Ye
Abstract: Medical image analysis presents unique challenges due to limited labeled data, privacy concerns, and heterogeneous data distributions across institutions. Federated learning (FL) offers a promising solution by enabling collaborative model training without sharing raw data. However, existing FL approaches often struggle with label scarcity in medical domains. In this paper, we propose MedFedSSL, a novel semi-supervised federated learning framework specifically designed for medical image analysis. Our approach integrates a dual-consistency regularization mechanism with an adaptive knowledge distillation strategy to effectively leverage both labeled and unlabeled data across distributed clients. We introduce a theoretically sound optimization objective that addresses the challenges of data heterogeneity and label imbalance in medical imaging. Extensive experiments on multiple medical imaging datasets demonstrate that MedFedSSL significantly outperforms state-of-the-art federated learning and semi-supervised learning methods, achieving superior performance with limited labeled data while preserving privacy. Our theoretical analysis provides convergence guarantees and bounds on the generalization error of the proposed approach.
Keyword: Federated Learning, Semi-Supervised Learning, Medical Image Analysis, Deep Learning
Cite@inproceedings{ICIC2025,
author = {Chenhao Ye},
title = {MedFedSSL: Semi-Supervised Federated Learning with Dual Consistency and Adaptive Distillation for Medical Imaging},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2997-3009},
}
- BPMFARD: A Bayesian Probabilistic Matrix Factorization Algorithm with Automatic Rank Determination in Recommender Systems, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Hao Wang, Junfeng Yan, Jingyuan Xiao, Guangzhi Qu, and Feng Zhang
Abstract: Matrix factorization is a prevalent and effective technique for building recommender systems. However, traditional matrix factorization methods demand manual setting and tuning of hyperparameters, like regularization coefficients, learning rates, and the dimension of the feature matrix (rank). To automate this, we introduce BPMFARD, a Bayesian probabilistic matrix factorization algorithm with automatic rank determination. By setting a prior distribution for factor matrices and devising an effective parameter elimination strategy, BPMFARD enables automatic parameter adjustment during training to significantly alleviate overfitting, enhancing recommendation accuracy. Experiments with benchmark datasets show that BPMFARD outperforms the benchmark methods. Since matrix factorization can be seen as a simple neural network, the rank determination strategy in matrix factorization may provide a valuable and interesting research perspective for embedding size learning in neural collaborative filtering.
Keyword: Recommender Systems, Matrix Factorization, Automatic Rank Determination, Bayesian Inference Optimization
Cite@inproceedings{ICIC2025,
author = {Hao Wang and Junfeng Yan and Jingyuan Xiao and Guangzhi Qu and Feng Zhang},
title = {BPMFARD: A Bayesian Probabilistic Matrix Factorization Algorithm with Automatic Rank Determination in Recommender Systems},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {456-471},
}
- Code Generation Security in LLMs: A Hybrid Detection and Post-Processing Framework for Vulnerability Mitigation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Beilei Zhang, Tao Hu, and Hailong Ma
Abstract: Large Language Models (LLMs) have transformed code generation but introduce critical security risks in software development pipelines. This paper proposes a hybrid framework combining static analysis (Bandit/CodeQL), dynamic fuzzing (AFL++), and syntax-aware repair rules to mitigate vulnerabilities in LLM-generated code without retraining. Evaluated on an enhanced SecurityEval benchmark with 185 test samples, our framework achieves a 68.2% reduction in vulnerabilities (95% CI: 64.7–71.7%) while preserving 92.1% functional correctness across four state-of-the-art LLMs (Qwen2.5-72B, QwQ-32B, ChatGPT-3.5, and ChatGPT-4). Key findings reveal significant disparities in model security: ChatGPT-4 demonstrates superior vulnerability awareness (static VDR: 83% vs. 61% for Qwen2.5-72B) and generates 1.9× fewer vulnerabilities than open-source alternatives. The lightweight repair pipeline operates at 0.8 seconds per sample, enabling real-time deployment. This work highlights the necessity of integrating hybrid detection with context-aware repair to balance security and functionality in LLM-generated code.
Keyword: LLM Security, Code Generation, Static Analysis, Dynamic Fuzzing, Vulnerability Repair.
Cite@inproceedings{ICIC2025,
author = {Beilei Zhang and Tao Hu and Hailong Ma},
title = {Code Generation Security in LLMs: A Hybrid Detection and Post-Processing Framework for Vulnerability Mitigation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {833-845},
}
- HH-GNN: Homogeneity- and Heterogeneity-Aware Graph Neural Network for Fraud Detection with Noisy Labels, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Boyi He, Jianzhe Zhao, Xuan Wang, Wei Ai, and Tao Meng
Abstract: Because graph structures can represent rich information by aggregating neighborhood information, graph neural networks (GNNs) are heavily used in fraud detection tasks. However, fraud detection problems involve a large amount of noise that degrades the detection effectiveness of the model. On the one hand, fraudsters actively generate noise through two disguises: feature disguise and relationship disguise. On the other hand, part of the noise arises during graph construction, because the labels of the adopted data are not guaranteed to be correct and because normal nodes may be unintentionally connected to fraudulent nodes. To solve these problems, we propose a framework that focuses on both homogeneous and heterogeneous information (HH-GNN). It reduces noise at graph nodes and connections by considering both homogeneous and heterogeneous information in the distance calculation method and by using the dilated k-NN algorithm to achieve neighbor aggregation. Meanwhile, based on the early learning phenomenon, we introduce ELR regularization to effectively suppress the influence of noisy labels during gradient descent. Experiments on graph-based fraud detection tasks on four real datasets, using AUC and F1-macro as metrics, demonstrate the effectiveness and superiority of the proposed HH-GNN.
Keyword: Fraud Detection, Graph Neural Networks, Node Classification.
Cite@inproceedings{ICIC2025,
author = {Boyi He and Jianzhe Zhao and Xuan Wang and Wei Ai and Tao Meng},
title = {HH-GNN: Homogeneity- and Heterogeneity-Aware Graph Neural Network for Fraud Detection with Noisy Labels},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3462-3475},
}
- FSSNet: Frequency-Spatial Synergy Network for Universal Deepfake Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zepeng Su and Yin Chen
Abstract: In this paper, our aim is to develop a detector capable of effectively identifying previously unseen deepfake images, even with limited training data. Existing deepfake detection methods predominantly focus on a single modality. For instance, frequency-domain approaches leverage Fourier transforms to capture frequency information, while spatial-domain methods utilize convolutional networks to extract visual features. However, relying on a single modality limits the ability to capture diverse feature types, resulting in poor generalization. To overcome this limitation, we propose a dual-stream network, FSSNet, which integrates the Scale-aware Bidirectional Cross Attention (SBCA) module and the Adaptive Feature Fusion (AFF) module for comprehensive and dynamic multi-modal feature fusion. Experimental results on deepfake images generated by eight unseen GAN models and ten unseen diffusion models demonstrate the superior performance of FSSNet, showcasing its robust generalization capability.
Keyword: Deep learning, Deepfake detection, Multi-modal fusion.
Cite@inproceedings{ICIC2025,
author = {Zepeng Su and Yin Chen},
title = {FSSNet: Frequency-Spatial Synergy Network for Universal Deepfake Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3013-3024},
}
- Design of Cargo Sorting and Transmission Scheme based on Digital Twin, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jing Liu
Abstract: Technological advancements are driving the transformation of traditional warehousing logistics towards automation, with sorting being the core of system optimization. To address the challenges of long cycles and low efficiency caused by reliance on physical equipment debugging in current sorting projects, this study proposes a digital sorting and conveying solution based on digital twin technology. For real-time goods arrival sequences, we develop a data-driven virtual modeling method that generates characteristic material profiles, achieving high-fidelity cargo mapping and dynamically configurable routing strategies. Prior to physical system deployment, virtual simulation verifies the feasibility and stability of conveying units, with iterative optimization achieving an average cargo transmission success rate of 99.26% (peaking at 100%). Through parameter optimization experiments, the optimal configuration is determined as 5 conveying units with a speed of vn = 0.6 m/s, enabling efficient coordination among supply rate, conveyor speed, and robotic arm sorting capacity. This configuration balances system stability and space utilization, delivering a 59.1 s average waiting time and a throughput of 555.2 units/hour. The scheme is fed back to the real scenario before the actual equipment test.
Keyword: Warehousing Logistics, Digital twin, Cargo Sorting scheme.
Cite@inproceedings{ICIC2025,
author = {Jing Liu},
title = {Design of Cargo Sorting and Transmission Scheme based on Digital Twin},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3540-3555},
}
- PatchMamba: Multivariate Time Series Forecasting Model Based on Patch Attention and Mamba, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Haitao Xiong, Feng Shao, and Yuanyuan Cai
Abstract: Multivariate time series forecasting (MTSF) has broad applications in real-life situations. However, multivariate time series data in different scenarios may exhibit different inter-sequence or intra-sequence dependencies. Existing models often struggle to capture these complex dependencies between time series and variables simultaneously, thus limiting forecasting accuracy. To address these issues, this paper proposes PatchMamba, a multivariate time series forecasting model based on Patch Attention and Mamba. PatchMamba includes two novel Patch Attention mechanisms: CD-Patch Attention under the channel dependence strategy and CI-Patch Attention under the channel independence strategy. CD-Patch Attention effectively captures the inter-variable dependencies. In contrast, CI-Patch Attention takes each variable individually to extract local features, avoiding cross-channel interference. Furthermore, we use bidirectional Mamba (Bi-Mamba) to capture long temporal dependency information. Experiments show that PatchMamba achieves higher forecast accuracy on multiple real-world datasets compared to current state-of-the-art (SOTA) models. In addition, this paper validates the role and robustness of the model components through ablation experiments and parameter sensitivity analysis.
Keyword: Multivariate Time series forecasting, Bi-Mamba, Patch Attention, PatchMamba.
Cite@inproceedings{ICIC2025,
author = {Haitao Xiong and Feng Shao and Yuanyuan Cai},
title = {PatchMamba: Multivariate Time Series Forecasting Model Based on Patch Attention and Mamba},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {139-151},
}
- Dynamic Uncertainty Learning with Noisy Correspondence for Text-Based Person Search, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Haoming Ji and Zequn Xie
Abstract: Text-Based Person Search (TBPS) seeks to identify individuals using textual descriptions. However, in real-world scenarios, noisy correspondences (under-correlated or even false-correlated image-text pairs) significantly degrade retrieval performance. Existing approaches often overemphasize negative samples, inadvertently amplifying this noise. To address these challenges, we propose the Dynamic Uncertainty with Noisy Correspondences (DUNC) framework, which introduces a novel Cross-modal Uncertainty Learning paradigm and a robust loss function, Dynamic Robust Loss (DRL). Unlike conventional methods that rely on global representations, DUNC effectively mines fine-grained correspondences, improving alignment between text and image features. Furthermore, DUNC employs a Dirichlet distribution to model bidirectional evidence from cross-modal similarity, enabling the capture of alignment uncertainty and reducing the effects of large intra-class variations. Meanwhile, DRL adaptively selects and aggregates the most challenging negative samples, mitigating noise and capturing a richer distribution of negative samples. This design enhances robustness and representation quality, even in noisy environments. Extensive experiments on three benchmark datasets demonstrate that DUNC achieves strong resistance to noise and improves retrieval performance under both low- and high-noise conditions. The code is publicly available at https://github.com/ASL-forever/DUNC.
Keyword: Text-Based Person Search, Noisy Correspondences, Cross-modal Uncertainty Learning.
Cite@inproceedings{ICIC2025,
author = {Haoming Ji and Zequn Xie},
title = {Dynamic Uncertainty Learning with Noisy Correspondence for Text-Based Person Search},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1-15},
note = {Poster Volume Ⅰ}
}
- Class Prototype-guided Disambiguation in Partially Labeled Learning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yu He, YaLing Ge, and Jun Zhou
Abstract: Partial Label Learning (PLL) is a prominent research direction in weak supervision, in which each instance is associated with a set of ambiguous candidate labels. Recent PLL methods primarily focus on uncovering the latent true label using label ambiguity information. However, the candidate label set contains only one true label, so directly utilizing the entire candidate set introduces label noise and hinders performance improvement during model training. To address this issue, we propose a guided model learning method called Class Prototype-induced Weighted Contrastive Partial Label Learning (PIWCL) to effectively reduce the impact of label noise. Specifically, PIWCL consists of the Class Prototype-guided Module (CPGM) and the Weighted Contrastive Learning Module (WCLM). WCLM employs a novel weighting scheme to learn more compact and discriminative representations, mitigating the confusion caused by ambiguous class samples while capturing useful latent information. Meanwhile, CPGM guides the classifier's learning process, further improving its ability to distinguish between positive and negative samples and facilitating the training of WCLM. Experimental results show that, compared to existing PLL methods, PIWCL achieves significant improvements in effectiveness.
Keyword: Partial Label Learning, Class Prototype, Contrastive Learning, Weakly Supervised Learning
Cite@inproceedings{ICIC2025,
author = {Yu He and YaLing Ge and Jun Zhou},
title = {Class Prototype-guided Disambiguation in Partially Labeled Learning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {472-484},
}
- TS-KFNet: Key-Frame Optimized Lightweight Video Forgery Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Fan Zhang, Chang Liu, Yuchuan Luo, Zhenyu Qiu, and Zhiping Cai
Abstract: With the rapid development of generative artificial intelligence technologies, video forgery techniques have evolved from localized facial replacement to multimodal scene synthesis (text-to-video), posing severe challenges to media authenticity. Existing detection methods struggle to meet the requirements for identifying high-quality synthetic videos due to insufficient spatiotemporal dependency modeling, low computational efficiency, and limited sensitivity to subtle local artifacts. To address this, we propose TS-KFNet, a lightweight dual-stream detection framework that achieves efficient video forgery detection by fusing global spatiotemporal attention with keyframe-based local artifact analysis. The framework adopts a TimeSformer backbone network to capture global motion and appearance consistency through a divided space-time attention mechanism, reducing computational complexity from $O(TN^2)$ to $O(T^2+N^2)$. A dynamic keyframe selection strategy is introduced to filter the top 10 most informative keyframes based on motion-compensated grayscale difference analysis, significantly reducing computational costs. Simultaneously, a CNN-enhanced branch extracts local artifact features from keyframes, forming a hybrid architecture that balances efficiency and accuracy. Experiments on 8 cutting-edge video generation models demonstrate that TS-KFNet achieves an average accuracy of 94.0% and an AUC of 99.0%, outperforming existing methods by up to 12.5% in accuracy. The inference speed is 10 times faster than the state-of-the-art method AIGVDet. The core contributions include a multi-granularity detection paradigm, a keyframe-based efficient inference framework, and an evaluation benchmark for emerging forgery technologies. This study provides a reliable solution for real-time high-precision long video forgery detection in dynamic complex scenarios.
Keyword: Video forgery detection, TS-KFNet, Dual-stream framework, Spatiotemporal attention.
Cite@inproceedings{ICIC2025,
author = {Fan Zhang and Chang Liu and Yuchuan Luo and Zhenyu Qiu and Zhiping Cai},
title = {TS-KFNet: Key-Frame Optimized Lightweight Video Forgery Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3028-3043},
}
- DASTCN: Enhancing Cross-Subject P300 Detection via Adversarial Spatio-Temporal Learning and Adaptive Source Selection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xiaodong Yang, Fei Wang, and Zhibin Du
Abstract: Brain-Computer Interface (BCI) systems aim to decode neural activity and translate it into actionable commands for external devices. Electroencephalogram (EEG) is a widely used, non-invasive method for analyzing brain activity. However, the significant inter-subject variability in EEG signals poses a major challenge for the generalization of EEG-based models. While Domain-Adversarial Neural Networks (DANN) have demonstrated promising results in transfer learning tasks, their application to EEG-based cross-subject P300 detection remains relatively unexplored. In this study, we introduce the Domain-Adversarial Spatio-Temporal Convolution Network (DASTCN), which combines a Generative Adversarial Network (GAN) with a lightweight spatio-temporal convolutional architecture to address the issue of inter-subject variability. Extensive empirical evaluations show that DASTCN outperforms conventional models, achieving an accuracy of 84.9% in cross-subject P300 detection. These findings underscore the potential of DASTCN as a transformative tool for advancing practical BCI systems and offer significant implications for future research and applications in this field.
Keyword: Brain-Computer Interface, DANN, GAN, Cross-Subject.
Cite@inproceedings{ICIC2025,
author = {Xiaodong Yang and Fei Wang and Zhibin Du},
title = {DASTCN: Enhancing Cross-Subject P300 Detection via Adversarial Spatio-Temporal Learning and Adaptive Source Selection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2145-2157},
note = {Poster Volume Ⅱ}
}
- Anomaly Detection in Time Series Data Based on a Variable-Time Transformer, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Huijuan Hao and Can Li
Abstract: With the performance degradation caused by aging industrial equipment, multivariate time series anomaly detection has become pivotal for enabling Prognostics and Health Management (PHM) and preventive maintenance. However, existing methods face critical challenges in complex industrial scenarios, including insufficient real-time responsiveness, high noise sensitivity, and limited diversity in anomaly pattern recognition. To address these issues, this paper proposes VT-GAN, an anomaly detection framework that deeply integrates a Variable-Time Transformer (VTT) with Generative Adversarial Networks (GANs). The model employs parallel generator groups, where each generator extracts multi-scale temporal patterns through dilated causal convolution. The VTT architecture combines temporal self-attention, variable-specific self-attention, and cross-attention layers, explicitly modeling spatiotemporal interactions via learnable gating weights to effectively capture variable coupling effects under complex operating conditions. Furthermore, the integration of the Model-Agnostic Meta-Learning (MAML) framework enhances rapid adaptation to new tasks or environments. Extensive experiments on six industrial datasets, including Secure Water Treatment (SWaT) and Server Machine Dataset (SMD), demonstrate that VT-GAN outperforms the Transformer-GAN baseline with a 12.7% improvement in F1-score (average 89.3%), a 23.4% reduction in false alarm rate, and real-time inference latency under 28 ms. Ablation studies validate the critical contributions of the multi-generator architecture (F1-score improves by 3%) and the dynamic hybrid attention mechanism (F1-score improves by 5.2%). This work provides a robust and reliable real-time monitoring solution for industrial equipment health management, demonstrating significant potential for industrial deployment.
Keyword: Multivariate Time Series, Anomaly Detection, Variable-Time Transformer, Generative Adversarial Networks, Model-Agnostic Meta-Learning.
Cite@inproceedings{ICIC2025,
author = {Huijuan Hao and Can Li},
title = {Anomaly Detection in Time Series Data Based on a Variable-Time Transformer},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {94-110},
}
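The VT-GAN abstract above mentions generators that extract multi-scale temporal patterns through dilated causal convolution. The following is a minimal pure-Python sketch of that single operation, not the authors' generator; the function name, kernel weights, and input series are hypothetical and chosen only for illustration. Each output sample depends only on the current and earlier inputs, spaced `dilation` steps apart.

```python
def dilated_causal_conv1d(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Inputs before the start of the series are treated as zero, so the
    output at time t never depends on samples later than t (causality).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


# Hypothetical toy series and a two-tap kernel, for illustration only.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
out_d1 = dilated_causal_conv1d(series, [1.0, 1.0], dilation=1)  # looks back 1 step
out_d2 = dilated_causal_conv1d(series, [1.0, 1.0], dilation=2)  # looks back 2 steps
```

Running the same kernel with several dilation values is one simple way to view a series at more than one time scale; how VT-GAN combines such outputs is not specified in the abstract.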
- HyperMoCL: Emotion Recognition via Multimodal Representation Learning and Multi-Level Hypergraph Contrastive Learning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shuoxin Liu, Guocheng An, and Xiaolong Wang
Abstract: Multimodal Emotion Recognition in Conversation (MERC) aims to identify the emotional states of speakers by integrating linguistic, audio, and visual information from dialogues. The core challenge of MERC lies in effectively fusing multimodal information and extracting key features. In recent years, hypergraph-based methods have been explored to construct hypergraphs directly using features output from unimodal encoders. However, due to the heterogeneity across modalities and the propagation of noise and redundant information within the hypergraphs, the modeling of inter-modal relationships often becomes inaccurate. Furthermore, existing approaches that employ node-level hypergraph contrastive learning overlook global structural information, resulting in insufficient modeling of global features. To address these limitations, we propose HyperMoCL, which integrates multimodal representation learning and multi-level hypergraph contrastive learning. First, HyperMoCL obtains higher-quality modal features through multimodal representation learning for hypergraph construction. Subsequently, a multi-level hypergraph contrastive learning framework is employed to comprehensively capture the structural features of the hypergraph, thereby enhancing feature discriminability and model robustness. Experimental results on two widely used datasets (IEMOCAP, MELD) demonstrate that our method outperforms previous state-of-the-art approaches.
Keyword: multimodal representation learning, multi-level hypergraph contrastive learning, conversation emotion recognition.
Cite@inproceedings{ICIC2025,
author = {Shuoxin Liu, Guocheng An, and Xiaolong Wang},
title = {HyperMoCL: Emotion Recognition via Multimodal Representation Learning and Multi-Level Hypergraph Contrastive Learning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {846-863},
}
- MSPLoRA: A Multi-Scale Pyramid Low-Rank Adaptation for Efficient Model Fine-Tuning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jiancheng Zhao and Xingda Yu
Abstract: Parameter-Efficient Fine-Tuning (PEFT) has become an essential approach for adapting large-scale pre-trained models while reducing computational costs. Among PEFT methods, LoRA significantly reduces trainable parameters by decomposing weight updates into low-rank matrices. However, traditional LoRA applies a fixed rank across all layers, failing to account for the varying complexity of hierarchical information, which leads to inefficient adaptation and redundancy. To address this, we propose MSPLoRA (Multi-Scale Pyramid LoRA), which introduces Global Shared LoRA, Mid-Level Shared LoRA, and Layer-Specific LoRA to capture global patterns, mid-level features, and fine-grained information, respectively. This hierarchical structure reduces inter-layer redundancy while maintaining strong adaptation capability. Experiments on various NLP tasks demonstrate that MSPLoRA achieves more efficient adaptation and better performance while significantly reducing the number of trainable parameters. Furthermore, additional analyses based on Singular Value Decomposition validate its information decoupling ability, highlighting MSPLoRA as a scalable and effective optimization strategy for parameter-efficient fine-tuning in large language models. Our code is available at https://github.com/Oblivioniss/MSPLoRA.
Keyword: LoRA, PEFT, Multi-Scale Learning, Hierarchical Pyramid Structure, Redundancy Elimination
Cite@inproceedings{ICIC2025,
author = {Jiancheng Zhao and Xingda Yu},
title = {MSPLoRA: A Multi-Scale Pyramid Low-Rank Adaptation for Efficient Model Fine-Tuning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1494-1510},
note = {Poster Volume Ⅱ}
}
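The MSPLoRA abstract above builds on standard LoRA, which decomposes a weight update into low-rank matrices. As a minimal sketch of that generic idea only (it does not reproduce the Global Shared, Mid-Level Shared, or Layer-Specific adapters of MSPLoRA), the pure-Python example below applies a rank-r update W + (alpha / r) * B A and counts the adapter's trainable parameters. The matrix sizes, values, `alpha`, and helper names are illustrative assumptions.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_adapt(W, B, A, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the adapter rank (rows of A)."""
    r = len(A)
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def adapter_params(d_out, d_in, r):
    """Trainable parameters of one rank-r adapter: B is d_out x r, A is r x d_in."""
    return r * (d_out + d_in)


# Hypothetical 3x4 frozen weight and a rank-1 adapter, for illustration only.
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
B = [[1.0], [0.0], [2.0]]        # d_out x r
A = [[0.5, 0.0, 0.0, 1.0]]       # r x d_in

W_adapted = lora_adapt(W, B, A, alpha=2.0)
full_params = len(W) * len(W[0])         # parameters in a full update of W
lora_params = adapter_params(3, 4, r=1)  # parameters in the low-rank adapter
```

The saving grows with layer size: for a square weight of width d, a rank-r adapter trains 2rd values instead of d squared.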
- Enhancing the Identification of Related-Key Neural Differential Distinguishers for SPECK32/64, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Wanqing Wu and Mengxuan Cheng
Abstract: Lightweight encryption algorithms play a vital role in securing communications for resource-constrained devices. As a prominent lightweight cipher, SPECK has attracted extensive security analyses. At ASIACRYPT 2023, Bao et al. introduced a related-key neural network differential distinguisher capable of effectively distinguishing 9-round SPECK32/64 ciphertexts and integrated it into a 1+s+r+1 key recovery framework to attack 14-round SPECK32/64. Inspired by their work, this paper presents a new related-key neural differential distinguisher for SPECK32/64, built upon a novel related-key processing method and an alternative network architecture, which significantly boosts the accuracy of distinguishing 10-round ciphertexts. Within the same 1+s+r+1 key recovery framework, we employed our trained distinguisher to recover the key of 15-round SPECK32/64. The specific contributions are as follows: First, this paper introduces a novel related-key processing method, generating correlated subkey pairs for encrypting samples containing 64 plaintext pairs. Second, a related-key neural differential distinguisher was constructed based on the Inception module from GoogLeNet and the DenseNet architecture. Experimental results demonstrate that the trained distinguisher achieves a recognition accuracy of 97.24% for 10-round ciphertexts, surpassing Bao et al.'s results by extending the recognizable rounds by one. Finally, leveraging the 10-round neural distinguisher, this paper successfully executed a key recovery attack on 15-round SPECK32/64. Analysis of error-bit distributions revealed a correct key recovery success rate of 98.67%.
Keyword: SPECK, Key Recovery Attack, Neural Differential Distinguisher, Related Key.
Cite@inproceedings{ICIC2025,
author = {Wanqing Wu and Mengxuan Cheng},
title = {Enhancing the Identification of Related-Key Neural Differential Distinguishers for SPECK32/64},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {944-961},
note = {Poster Volume Ⅰ}
}
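The entry above concerns round-reduced SPECK32/64. For orientation, the sketch below shows only the publicly specified SPECK32 round function on 16-bit words (rotation amounts 7 and 2) and its inverse. It is not the authors' distinguisher or their related-key processing method, and the helper names and the sample round key are assumptions made for illustration.

```python
MASK = 0xFFFF  # SPECK32 works on two 16-bit words


def ror(x, r):
    """Rotate a 16-bit word right by r bits."""
    return ((x >> r) | (x << (16 - r))) & MASK


def rol(x, r):
    """Rotate a 16-bit word left by r bits."""
    return ((x << r) | (x >> (16 - r))) & MASK


def enc_round(x, y, k):
    """One SPECK32 encryption round with round key k."""
    x = ((ror(x, 7) + y) & MASK) ^ k
    y = rol(y, 2) ^ x
    return x, y


def dec_round(x, y, k):
    """Inverse of enc_round for the same round key k."""
    y = ror(y ^ x, 2)
    x = rol(((x ^ k) - y) & MASK, 7)
    return x, y


# Hypothetical single-round example: one state and one round key.
state = (0x6574, 0x694C)
round_key = 0x0100
after_one_round = enc_round(state[0], state[1], round_key)
recovered = dec_round(after_one_round[0], after_one_round[1], round_key)
```

A related-key experiment would encrypt the same plaintexts under two keys with a chosen difference; how the paper derives its correlated subkey pairs is not described in the abstract and is left out here.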
- Layer-Wise Stability Optimization for Accurate and Reliable Prediction on Clinical Lab Data, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shuang Zhang, Mei Wang, and Qiao Pan
Abstract: Clinical lab tests are essential for disease diagnosis, medical treatment and predictive modeling tasks within healthcare. Traditional learning methods often struggle with lab test data that include imprecision ranges. In this paper, we tackle the challenge of improving prediction stability without generating additional training samples or compromising accuracy. We reformulate the learning problem as a multi-objective optimization task, where accuracy and stability are both key objectives. To accomplish this, we develop a novel approach for calculating stability loss by decomposing it into a cumulative process that is propagated layer by layer. We then formulate a stability-enhanced loss (SELoss) function to control the layer-wise output errors and maintain prediction precision. In addition, we design a multi-stage learning mechanism to control instability in each layer, especially in the initial layers. These components regulate the learning process, achieving a much improved balance between accuracy and stability. Using three real-world datasets, experimental results demonstrate that SELoss achieves more accurate and stable predictions across various tasks by reducing the instability of each layer. Also, as input perturbation increases, the rise in output instability slows down.
Keyword: Prediction stability, neural networks, imprecise data, healthcare
Cite@inproceedings{ICIC2025,
author = {Shuang Zhang, Mei Wang, and Qiao Pan},
title = {Layer-Wise Stability Optimization for Accurate and Reliable Prediction on Clinical Lab Data},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2253-2269},
note = {Poster Volume Ⅱ}
}
- Prediction Research of TACE Treatment Response Based on Multimodal Data Fusion, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Lin Tong, Zhengkui Chen, Jun Luo, Qingli Zhou, Yuqiang Shen, and Jijun Tong
Abstract: Transcatheter arterial chemoembolization (TACE) is the preferred non-surgical treatment for hepatocellular carcinoma (HCC) patients, but up to 60% of HCC patients do not benefit from TACE treatment. Therefore, accurately and efficiently predicting the treatment response after TACE in HCC patients is of great significance for treatment planning. To address this, a predictive model based on the integration of clinical data and preoperative CT imaging was proposed. This model first used a convolutional neural network to extract microscopic structural features from CT images, then fed the extracted features into a Long Short-Term Memory network to obtain the global feature vector for each CT slice. Next, a deep neural network was used to extract features from the clinical data. Finally, the features were fused using an asymmetric cross-attention mechanism, followed by classification using a feedforward neural network. A retrospective study was conducted on 181 HCC patients who underwent TACE treatment at a hospital in Zhejiang Province from January 2018 to April 2022. The AUC, precision, accuracy, and recall of the prediction model are 0.85, 0.86, 0.88, and 0.87, respectively. The experimental results demonstrate that the model (Cnn-Lstm-Dnn-Cross-Attention, CLDCA) outperforms the comparison models, providing an effective solution for predicting the post-TACE treatment response in HCC patients.
Keyword: Multimodal fusion, Transcatheter arterial chemoembolization, Cross-attention mechanism, Convolutional neural networks.
Cite@inproceedings{ICIC2025,
author = {Lin Tong, Zhengkui Chen, Jun Luo, Qingli Zhou, Yuqiang Shen, and Jijun Tong},
title = {Prediction Research of TACE Treatment Response Based on Multimodal Data Fusion},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1509-1525},
note = {Poster Volume Ⅱ}
}
- DMKAN-Net: A Dual-Modal Fusion and KAN Decoder Network for RGB-D Salient Object Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Qingbo Xue, Panpan Zheng, and Liejun Wang
Abstract: At present, most RGB-D salient object detection (SOD) algorithms invest much of their computing power in the encoding and feature fusion stages. Admittedly, this strategy improves performance, but it also neglects the feature recovery and fitting abilities of the decoding stage. Therefore, we designed a dual-modal fusion and KAN decoder network called DMKAN-Net to better implement the RGB-D SOD task. This network structure has only three parts: a dual-stream encoder, a dual-modal converter, and a KAN decoder. Among them, the dual-stream encoder adopts the Swin Transformer encoder, which is mainly used to extract the multilevel and global features in the RGB and depth images. In the dual-modal fusion section, we design a dual-modal feature fusion module to capture the channel information and spatial information in the different modalities and fuse them. The KAN decoder is mainly composed of KAN modules, which use nonlinear and learnable activation functions to better recover and predict salient targets. Moreover, experiments performed on five benchmark datasets show that our method achieves competitive results.
Keyword: RGB-D, SOD, Dual-modal, KAN.
Cite@inproceedings{ICIC2025,
author = {Qingbo Xue, Panpan Zheng, and Liejun Wang},
title = {DMKAN-Net: A Dual-Modal Fusion and KAN Decoder Network for RGB-D Salient Object Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {529-545},
note = {Poster Volume Ⅰ}
}
- CDAD: A Novel High-Efficiency Cross-age Domain Adaptation ECG Diagnosis Algorithm for Adolescents, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yiyang Li, Zhenghan Zhang, Dejing Zhang, JieJia Chen, Zhenkun Cai, and Tang Tang
Abstract: Cardiovascular diseases are rapidly becoming one of the major health problems in adolescents. With the advancement of deep learning techniques, smart ECG diagnostic tools based on these techniques show great potential for application in real-world healthcare settings. However, the scarcity of ECG data in adolescents compared to older adults is a key challenge for deep learning techniques, the accuracy of which relies on extensive labeled training data. In this paper, we propose a Cross-age Domain Adaptation Diagnosis (CDAD) approach and introduce a domain adaptation network, Squeeze and Excitation Widekernel Neural Network (SEWNN), aiming to alleviate the constraints imposed by unlabeled data and cross-domain diagnosis. Firstly, large-scale labeled ECG data from elderly individuals are employed for feature extraction and model training. Subsequently, an adversarial learning approach is employed to enhance the model's cross-domain transfer capabilities. In addition, ensemble learning techniques that consider information from multiple cues are applied to improve prediction accuracy. In this study, we validate the effectiveness of the proposed method by applying it to three public ECG diagnostic datasets and evaluating its applicability from the elderly to adolescents. By comparing the experimental results with other methods, we demonstrate the validity of the method for ECG diagnosis in adolescents, as well as its robustness in cross-dataset diagnosis.
Keyword: ECG Diagnosis, Domain Adaptation, Adversarial Learning, Ensemble Learning
Cite@inproceedings{ICIC2025,
author = {Yiyang Li, Zhenghan Zhang, Dejing Zhang, JieJia Chen, Zhenkun Cai, and Tang Tang},
title = {CDAD: A Novel High-Efficiency Cross-age Domain Adaptation ECG Diagnosis Algorithm for Adolescents},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1201-1218},
}
- Key frame extraction based on sparse coding with deep frame features, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yujie Li, Ning Liu, Depeng Chen, Shuxue Ding, and Benying Tan
Abstract: Key frame extraction based on sparse coding can represent the entire video with a small number of key frames, reducing the redundancy of the video. However, existing sparse coding-based methods use raw video frame features, which leads to high computational complexity and significant time consumption. In this paper, we propose a novel key frame extraction method based on sparse coding and deep frame features (KSC-DFF) to address these challenges. First, we obtain deep frame features using a deep neural network, which can reduce the dimensionality of the input video data and generate deep frame features such as the main object features of the frame. To automatically extract deep frame features, a YOLO-based deep neural network called YOLO-MLP is designed for video feature extraction. Then, we use sparse coding to extract key frames based on deep frame features, which can reduce information redundancy and computation time while maintaining high accuracy. Experimental results on SumMe demonstrate that the proposed KSC-DFF outperforms the existing methods with an increase of 49.4% and a time reduction of nearly 98% compared to the conventional sparse coding-based method SMRS.
Keyword: Key frame extraction, Sparse coding, Deep learning, Feature extraction, YOLO-MLP.
Cite@inproceedings{ICIC2025,
author = {Yujie Li, Ning Liu, Depeng Chen, Shuxue Ding, and Benying Tan},
title = {Key frame extraction based on sparse coding with deep frame features},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {485-498},
}
- The Task Scheduling of IMA based on The Multi-stage Q-learning Differential Evolution, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shuying Feng, Lisong Wang, Shaohan Liu, Fengtao Xu, and Yizhuo Sun
Abstract: With the increasing complexity of integrated modular avionics (IMA) and the growing demand for efficient operation in multi-task environments, IMA systems must not only utilize various resources efficiently but also consider communication latency, system safety and real-time responsiveness during task execution. As the number and complexity of tasks increase, the system faces dual challenges: real-time task scheduling and resource utilization optimization. Therefore, we propose a task scheduling method based on a multi-stage Q-learning differential evolution algorithm. First, a bilevel scheduling model for IMA systems is constructed, which comprehensively considers key factors such as resource utilization, communication latency and safety. Second, an enhanced differential evolution algorithm is employed to optimize the model. Specifically, in the population initialization stage, the proposed algorithm uses a chaotic logistic map to ensure a uniform distribution of the initial population in the solution space. During the population evolution process, the fitness of each infeasible individual is corrected through the penalty function. Meanwhile, the proposed algorithm utilizes a Q-learning mechanism to dynamically adjust the evolutionary operators to improve their adaptability and employs a multi-stage constraint addition strategy to expand the search space of the algorithm. Finally, experimental results comparing the proposed algorithm with other algorithms demonstrate its superior performance, which indicates its effectiveness in solving task scheduling problems in integrated avionics systems.
Keyword: IMA Task Scheduling, Constraint Handling, Differential Evolution (DE), Q-learning Algorithm, Adaptive Operator.
Cite@inproceedings{ICIC2025,
author = {Shuying Feng, Lisong Wang, Shaohan Liu, Fengtao Xu, and Yizhuo Sun},
title = {The Task Scheduling of IMA based on The Multi-stage Q-learning Differential Evolution},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1215-1227},
note = {Poster Volume Ⅱ}
}
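The IMA scheduling abstract above states that the initial population is generated with a chaotic logistic map. A minimal sketch of that initialization step, assuming the common form x <- mu * x * (1 - x) with mu = 4, is shown below. The seed value `x0`, the bounds, and the function name are hypothetical; the abstract does not give the authors' parameter choices.

```python
def logistic_map_population(pop_size, dim, lower, upper, x0=0.7, mu=4.0):
    """Build pop_size individuals of length dim inside [lower, upper].

    A single chaotic sequence x <- mu * x * (1 - x) is iterated, and each
    value in [0, 1] is scaled linearly to the search interval.
    """
    x = x0
    population = []
    for _ in range(pop_size):
        individual = []
        for _ in range(dim):
            x = mu * x * (1.0 - x)                       # logistic map step
            individual.append(lower + x * (upper - lower))
        population.append(individual)
    return population


# Hypothetical problem size and bounds, for illustration only.
population = logistic_map_population(pop_size=6, dim=3, lower=-1.0, upper=1.0)
```

Unlike a pseudo-random generator, the sequence is fully determined by `x0`, so the same seed always reproduces the same starting population.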
- Cause-based Supervised Contrastive Learning with Adversarial Sample-label for Multimodal Emotion Recognition in Conversation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yujie Guan, Yu Wang, Weijie Feng, Tingyi Li, and Xiao Sun
Abstract: Multimodal Emotion Recognition in Conversation (MERC) holds significant importance in Natural Language Processing due to its broad range of applications. However, existing methods still face challenges in addressing the highly imbalanced class problem and extracting robust representations for complex conversational scenarios, which leads to a decrease in the generalization ability of the model and an inability to effectively recognize minority emotion classes. To address these challenges, a cause-based supervised contrastive learning framework with adversarial sample-label (CaSCLA) is proposed in this paper. Specifically, we employ a modality balancing technique to fuse the multimodal features, which are then fed into a novel causal-aware network to effectively capture the underlying causal relationships within dialogues. Besides, a supervised contrastive learning with adversarial sample-label method is proposed to alleviate the class imbalance problem by learning label representations and optimizing the similarity between sample features and label embeddings. Furthermore, CaSCLA applies an adversarial sample training strategy, constructing additional positive sample-label pairs to enhance the diversity of the data and increase the robustness of the model. Extensive experiments on the IEMOCAP and MELD benchmark datasets demonstrate that CaSCLA achieves competitive performance.
Keyword: Class Imbalance, Supervised Contrastive Learning, Emotion Recognition in Conversation.
Cite@inproceedings{ICIC2025,
author = {Yujie Guan, Yu Wang, Weijie Feng, Tingyi Li, and Xiao Sun},
title = {Cause-based Supervised Contrastive Learning with Adversarial Sample-label for Multimodal Emotion Recognition in Conversation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {882-897},
}
- Research on Movie Service Recommendation Algorithms Incorporating Film Attributes and Multimodal Information, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chuanxi Liu, Jing Li, Chengfan Jiao, Ming Zhu, and Wenxuan Liu
Abstract: As a personalized recommendation technology, the recommendation system aims to predict users' preferences for items and provide recommendation services for users. Movie recommendation technology can help users quickly find their preferred movies and thus meet their viewing needs. Traditional context-based movie recommendation models only use text data, obtaining limited information from single-modal data and failing to fully address the problem of data sparsity. This paper proposes a multimodal movie recommendation model (Layered Multi-head Attention Dynamic Graph, LMADG) that integrates text and image data, aiming to capture the dynamic changes in user interests and the graph structure information of user-movie interactions. By combining Graph Convolutional Network (GCN) and temporal attention mechanisms, LMADG can effectively extract the temporal features of users and movies and generate personalized recommendation results. Finally, comparative experiments are conducted on the MovieLens-1M, TMDB, and Netflix Prize datasets, verifying that the proposed model has better recommendation quality.
Keyword: Multi-modal, Graph convolutional network, Movie recommendation, Temporal attention mechanism
Cite@inproceedings{ICIC2025,
author = {Chuanxi Liu, Jing Li, Chengfan Jiao, Ming Zhu, and Wenxuan Liu},
title = {Research on Movie Service Recommendation Algorithms Incorporating Film Attributes and Multimodal Information},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {16-28},
note = {Poster Volume Ⅰ}
}
- BERT-PointerNet: A Unified Framework for Cross-Sentence Entity-Relation Extraction in Chinese Computer Science Texts, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: ChuCheng Wu, YunQi Huang, TuanXiong Ni, and ShiDeng Ma
Abstract: This study presents an innovative text annotation framework for constructing knowledge graphs in the Chinese computer science domain, addressing challenges such as nested entity resolution and implicit relation extraction in Chinese technical texts. The proposed method integrates relation extraction into named entity recognition (NER) via a novel CS_R_BMES tagging schema, which extends the BMES (Begin-Middle-End-Single) approach to encode both entity boundaries and relation types. By appending a fully connected layer to the BERT model, we generate domain-specific word embeddings that align with the CS_R_BMES annotation space. These embeddings are then fed into a BERT-BiLSTM-CRF-PointerNet architecture, where a Pointer Network decodes CRF-generated labels into structured triples, dynamically resolving nested entities and implicit relations through cross-attention mechanisms. Experimental results demonstrate a 4.19% F1 score improvement over baseline models, with the proposed model achieving 93.7% F1 for entity-relation extraction. Ablation studies confirm the critical role of BERT's contextual encoding and the Pointer Network's capability to handle complex linguistic phenomena. Notably, this framework exhibits strong generalizability, enabling cross-domain adaptation to fields like software engineering by adjusting entity-relation categories. The constructed knowledge graph provides a scalable foundation for educational applications in computer science.
Keyword: Knowledge Graph Construction, Named Entity Recognition Tasks, BERT Models, PointerNet, Ternary Extraction Tasks
Cite@inproceedings{ICIC2025,
author = {ChuCheng Wu, YunQi Huang, TuanXiong Ni, and ShiDeng Ma},
title = {BERT-PointerNet: A Unified Framework for Cross-Sentence Entity-Relation Extraction in Chinese Computer Science Texts},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {864-881},
}
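The BERT-PointerNet abstract above extends the BMES (Begin-Middle-End-Single) tagging approach. As a rough illustration of plain BMES tagging only, the sketch below converts labeled character spans into per-character tags. The CS_R_BMES schema itself is not specified in the abstract, so the label names, the span format, and the idea of carrying a relation type inside the label string are assumptions, not the authors' scheme.

```python
def bmes_tags(length, label):
    """Return BMES tags for one entity spanning `length` characters."""
    if length <= 0:
        raise ValueError("entity length must be positive")
    if length == 1:
        return [f"S-{label}"]
    return [f"B-{label}"] + [f"M-{label}"] * (length - 2) + [f"E-{label}"]


def tag_sequence(chars, spans):
    """Tag a character sequence given non-overlapping (start, end, label) spans.

    `end` is exclusive; characters outside every span are tagged "O".
    """
    tags = ["O"] * len(chars)
    for start, end, label in spans:
        tags[start:end] = bmes_tags(end - start, label)
    return tags


# Hypothetical input: ASCII letters stand in for the characters of a sentence.
chars = list("abcdefg")
# The second label carries a made-up relation type to suggest how a schema
# could encode relations alongside boundaries.
spans = [(0, 3, "ENT"), (4, 5, "ENT_partOf")]
tags = tag_sequence(chars, spans)
```

A sequence labeler such as a CRF would be trained to predict these tags, and a decoder would then read entity boundaries back from the B, M, E, and S prefixes.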
- ST-PCN: A Dynamic Point Cloud Classification Network Based on the Spatiotemporal Attention Mechanism of Millimeter-Wave Radar, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zanqiang Wu, Qiaojuan Tong, and Hongbing Ma
Abstract: With the advent of the intelligent era, human activity recognition has become increasingly important in application scenarios such as intelligent monitoring and human-computer interaction. The application of millimeter-wave radar in human behavior recognition has also emerged as a research hotspot in recent years. The point cloud data of millimeter-wave radar can provide depth information in a three-dimensional space, which endows it with unique advantages in capturing spatial postures. This paper proposes a general method for human behavior recognition based on millimeter-wave radar point clouds. This method first processes the point cloud data of each frame in the spatial dimension to extract spatial features. Subsequently, it models the temporal dimension. By introducing attention mechanisms in both space and time, the model can focus on important features, thereby improving the accuracy of behavior recognition. Finally, the extracted features are classified through a multi-layer perceptron (MLP). Comparisons with other methods on public datasets show that the proposed ST-PCN network model outperforms the baseline models, verifying its effectiveness and superiority.
Keyword: Millimeter-wave Radar, Point Cloud, Behavior Recognition.
Cite@inproceedings{ICIC2025,
author = {Zanqiang Wu, Qiaojuan Tong, and Hongbing Ma},
title = {ST-PCN: A Dynamic Point Cloud Classification Network Based on the Spatiotemporal Attention Mechanism of Millimeter-Wave Radar},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1850-1864},
note = {Poster Volume Ⅱ}
}
- Q-learning-based Optimal Control Scheme for Time-varying Uncertain Batch Processes, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jianan Liu, Wenjing Hong, and Jia Shi
Abstract: Industrial batch processes are extensively employed in the modern manufacturing industry due to their efficiency and flexibility. However, controlling and optimizing batch processes is difficult because of their complex non-stationary dynamics and inherent time-varying uncertainties. In order to address these issues, this paper proposes a novel Q-learning-based optimal control scheme for time-varying batch processes to achieve optimal control while reducing reliance on process modeling. First, based on the time-varying nominal model, an initial control policy is derived from dynamic programming and the principle of optimality. Nevertheless, the presence of unknown time-varying system uncertainties hinders the optimal performance of the initial control policy. To overcome this limitation, we utilize the repetitive nature of the batch process to collect operational data from multiple batch runs under the initial control policy. Then, the Q-learning-based optimal control scheme is developed to iteratively improve the initial control policy under the reinforcement learning framework. The convergence analysis demonstrates that the improved control policy gradually converges to the optimal control policy. Finally, the simulation results from the numerical multi-input multi-output batch system and the injection molding process demonstrate the proposed control method's effectiveness, applicability, and superior control performance.
Keyword: Reinforcement Q-learning, Optimal Control Scheme, Time-varying batch processes, Time-varying system uncertainties
Cite@inproceedings{ICIC2025,
author = {Jianan Liu, Wenjing Hong, and Jia Shi},
title = {Q-learning-based Optimal Control Scheme for Time-varying Uncertain Batch Processes},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2594-2611},
}
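The first step in the abstract above, deriving an initial policy from a time-varying nominal model by dynamic programming, can be illustrated with a minimal sketch. It assumes a scalar linear model `x[t+1] = a[t]*x[t] + b[t]*u[t]` and a quadratic cost, neither of which is stated in the abstract; the function name and parameters are hypothetical, and the Q-learning refinement itself is not shown.

```python
def backward_lq_gains(a, b, q, r, q_final):
    """Backward dynamic programming for an assumed scalar time-varying model.

    Model (assumption): x[t+1] = a[t] * x[t] + b[t] * u[t]
    Cost  (assumption): sum_t (q * x[t]**2 + r * u[t]**2) + q_final * x[N]**2
    Returns the feedback gains K (policy u[t] = -K[t] * x[t]) and the
    cost-to-go weights P (value at time t is P[t] * x[t]**2).
    """
    horizon = len(a)
    P = [0.0] * (horizon + 1)
    K = [0.0] * horizon
    P[horizon] = q_final
    # Principle of optimality: solve the last stage first, then step backwards.
    for t in range(horizon - 1, -1, -1):
        K[t] = (b[t] * P[t + 1] * a[t]) / (r + b[t] ** 2 * P[t + 1])
        P[t] = q + a[t] ** 2 * P[t + 1] - a[t] * b[t] * P[t + 1] * K[t]
    return K, P


if __name__ == "__main__":
    # Hypothetical three-step batch with drifting model parameters.
    gains, weights = backward_lq_gains(
        a=[1.0, 0.9, 0.8], b=[1.0, 1.1, 1.2], q=1.0, r=1.0, q_final=1.0
    )
    print(gains, weights)
```

In this reading, the gains computed from the nominal model would only serve as a starting point, since the true process differs from the nominal one.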
- Dual Path Attention and Re-parameterization Network for efficient image super-resolution, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yao Li, Junjie Huang, Ka Chen, Guoqiang Zhao, Aodie Cui, Yongzhuo Zhu, Hanyang Pan, and Chuang Li
Abstract: Efficient deep-learning-based methods have achieved significant performance in single image super-resolution. Recent research on efficient super-resolution has mainly focused on reducing the number of parameters and computational complexity through various network designs, enabling models to be better deployed on resource-constrained devices. In this work, we propose a novel and effective super-resolution model based on an attention mechanism and structural re-parameterization, called Dual Path Attention and Re-parameterization Network (DPARN), which uses a hybrid attention mechanism to balance the running speed and reconstruction quality of the model. Specifically, we utilize grouped convolution to introduce both parameter-free and enhanced spatial attention to improve the feature extraction capability of student networks. Meanwhile, we adopt a novel lightweight network training strategy that first uses knowledge distillation for initial training, during which structured knowledge from the teacher network is transmitted to the student network. Then, multiple loss functions are combined to fine-tune the student network, in order to preserve high-frequency details and avoid excessive smoothing caused by pixel loss. Finally, extensive experiments conducted on four benchmark datasets demonstrate the effectiveness and efficiency of the proposed DPARN. Our method achieves PSNR/SSIM performance comparable to state-of-the-art efficient super-resolution models, with faster inference speed and fewer network parameters.
Keyword: Super-resolution, Balance, Attention, Re-parameterization
Cite@inproceedings{ICIC2025,
author = {Yao Li and Junjie Huang and Ka Chen and Guoqiang Zhao and Aodie Cui and Yongzhuo Zhu and Hanyang Pan and Chuang Li},
title = {Dual Path Attention and Re-parameterization Network for efficient image super-resolution},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1525-1541},
note = {Poster Volume Ⅱ}
}
- Efficient Multimodal Sentiment Recognition with Dual Cross-Attention for Multi-Scale Features, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xinyu Ye, Tingsong Ma, Yi Feng, and Yiming Zhai
Abstract: Multimodal sentiment analysis has emerged as a key research field, particularly for decoding emotions conveyed through text and images on social media platforms. However, many approaches encounter difficulties when integrating textual and visual features across diverse dimensions, often leading to suboptimal performance. To address this, we propose a novel approach for multimodal sentiment recognition that designs a simple yet efficient network, inspired by feature pyramids. In this model, feature vectors are split into high-dimensional and low-dimensional representations, which are then processed through distinct cross-attention mechanisms tailored to their scales, followed by a fusion step to capture comprehensive cross-modal interactions. This strategy enhances the network’s ability to model relationships between modalities effectively. We evaluated our approach on the well-established MVSA-Single and MVSA-Multiple datasets, where it consistently surpasses existing techniques. Specifically, it achieves an accuracy of 78.27% and an F1 score of 77.95% on MVSA-Single, and an accuracy of 71.18% and an F1 score of 68.92% on MVSA-Multiple. These results demonstrate the potential of combining high- and low-dimensional features with dual cross-attention for social media sentiment analysis.
Keyword: Multimodal Sentiment Analysis, Cross-Attention Mechanism, Multi-Scale Features.
Cite@inproceedings{ICIC2025,
author = {Xinyu Ye and Tingsong Ma and Yi Feng and Yiming Zhai},
title = {Efficient Multimodal Sentiment Recognition with Dual Cross-Attention for Multi-Scale Features},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1990-2002},
note = {Poster Volume Ⅱ}
}
- Mapping Cyber Threat Intelligence through Active Semi-Supervised Learning (ASSBM), ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Sujie Shao, Zhiyi Li, Yan Liu, Shaoyong Guo, and Chao Yang
Abstract: Cyber Threat Intelligence (CTI) plays a critical role in enhancing the implementation of cybersecurity programs by offering comprehensive information on attacks, which enables organizations to identify and respond to cyber threats more effectively. However, because most CTI data is presented in natural language and often contains ambiguous content, it requires interpretation and summarization by security experts for effective utilization. To address these challenges, this paper proposes a mapping method for CTI based on active and semi-supervised SecureBERT, aimed at alleviating the scarcity of labeled data and the ambiguities inherent in the CTI mapping task. This method efficiently extracts potential attack stage information from CTI at a minimal cost, ensuring accurate mapping even when labeled sample sizes are insufficient. We introduce an active learning sampling strategy that integrates uncertainty and instance relevance, selecting the most representative samples from unlabeled data to augment the training set. This strategy enhances the interpretability of label-scarce and ambiguous CTI, facilitating precise mappings between ambiguous CTI and the corresponding phases of cyber attacks. Experiments on the CPTC and CCDC datasets demonstrate that the proposed method outperforms various baseline models, both as the quantity of labeled data varies and in comparison with different active learning algorithms. In situations where labeled CTI is limited, the proposed approach significantly improves the interpretive effectiveness of CTI, thereby enhancing the model's classification accuracy and training efficiency.
Keyword: CTI Mapping, Active Learning, SecureBERT, BERT.
Cite@inproceedings{ICIC2025,
author = {Sujie Shao and Zhiyi Li and Yan Liu and Shaoyong Guo and Chao Yang},
title = {Mapping Cyber Threat Intelligence through Active Semi-Supervised Learning (ASSBM)},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {962-978},
note = {Poster Volume Ⅰ}
}
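The uncertainty half of the sampling strategy described in the abstract above can be sketched with plain predictive entropy. This is a generic uncertainty-sampling illustration rather than the paper's strategy: the instance-relevance term is omitted, and the function names are hypothetical.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs, budget):
    """Return indices of the `budget` unlabeled samples with highest entropy.

    pool_probs: list of per-sample class-probability lists, e.g. softmax
    outputs of a classifier over attack stages (an assumed setup).
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: predictive_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


if __name__ == "__main__":
    pool = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]
    # The selected samples would be sent to an expert for labeling.
    print(select_most_uncertain(pool, budget=2))
```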
- HSAN: A Side Adapter Network with Hybrid Compression and Local Enhancement Attention, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yankui Fu, Fang Yang, and Qingxuan Shi
Abstract: Significant progress has been made in open-vocabulary semantic segmentation tasks, particularly in recognizing and segmenting unseen categories by leveraging Contrastive Language-Image Pre-training (CLIP). Among existing methods, the Side Adapter Network (SAN) stands out as an effective approach, achieving strong performance. However, we identify that SAN does not perform well in capturing fine-grained local features in complex scenes and high-resolution images. Additionally, it suffers from high computational costs and struggles to effectively fuse the features generated by its internal modules with those extracted by CLIP, resulting in degraded segmentation accuracy. To address these issues, we propose HSAN, which introduces the Hybrid Compression and Local Enhancement Attention (HCLEA) mechanism to reduce dimensionality for lower computational complexity while using additional convolutional neural networks to preserve and enhance local features. Furthermore, we design an Adaptive Feature Fusion Block (AFFB) that dynamically adjusts fusion weights based on input features, achieving better global-local feature fusion and fully leveraging CLIP’s generalization ability. Extensive experiments on benchmark datasets demonstrate that HSAN achieves higher accuracy and faster inference compared to SAN and other state-of-the-art methods.
Keyword: Open-Vocabulary Semantic Segmentation, Attention Mechanism, Feature Fusion, CLIP
Cite@inproceedings{ICIC2025,
author = {Yankui Fu and Fang Yang and Qingxuan Shi},
title = {HSAN: A Side Adapter Network with Hybrid Compression and Local Enhancement Attention},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3044-3058},
}
- APRstyler: Adaptive Patch and Reversible Text-Guided Single Image Style Transfer, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zhichao Liu, Fang Yang, and Xiufang Miao
Abstract: In recent years, text-guided image style transfer techniques have received extensive attention. CLIPstyler is the first model to achieve this goal and has demonstrated impressive stylization results. However, related models represented by CLIPstyler suffer from problems such as content distortion and uneven style during the transformation process. To address these issues, taking CLIPstyler as the baseline model, we propose a new CLIP-based Text-Guided Single Image Style Transfer with Adaptive Patch Selection and Reverse Generative Network (APRStyler). APRStyler designs an end-to-end adaptive adjustment patch scheme to dynamically adjust the weight of each patch. This reduces the negative impact of poorly sampled patches on image quality, enabling the model to better align with the target style. In addition, we introduce a reverse generative network to regenerate the style image from the original image. Through the reverse generative network, the closer the generated original image is to the real original image, the less structural information is lost during the style transfer process. Together, these ensure that the image after style transfer not only maintains style consistency but also maximally preserves the content structure information of the original image. Experimental results show that APRStyler can not only generate natural and delicate artistic images that conform to human visual perception, but also significantly improve the effect and stability of style transfer.
Keyword: Text-guided Style Transfer, CLIP, Adaptive Patch Selection
Cite@inproceedings{ICIC2025,
author = {Zhichao Liu and Fang Yang and Xiufang Miao},
title = {APRstyler: Adaptive Patch and Reversible Text-Guided Single Image Style Transfer},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {29-42},
note = {Poster Volume Ⅰ}
}
- AS-ES: Sparse Black-box Adversarial Attack by Active Subspace Evolution Strategy, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jinling Duan and Zhenhua Li
Abstract: Most existing adversarial attack methods are global: they perturb every pixel of the image, which is not realistic. In contrast, sparse attacks show that perturbing only local regions of the input image can deceive DNN models into making incorrect predictions. However, this approach requires a large number of queries to generate adversarial examples, and the key issues it faces are locating the perturbation area and optimizing the magnitude of the perturbation. Currently, generating high-quality adversarial examples and improving query efficiency in restricted environments is a challenge for black-box attacks. In this paper, we propose a sparse black-box attack method based on the Active Subspace Evolution Strategy (AS-ES), which locates the active subspace of the input image through the multi-armed bandit method, and uses the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm for perturbation search in the low-dimensional subspace. We model this problem as a bi-level optimization problem, optimizing both the perturbation position and magnitude to generate high-quality adversarial examples while achieving efficient attacks. We conducted extensive experiments on multiple datasets and verified that the AS-ES method generates adversarial examples with higher quality and query efficiency than existing state-of-the-art attack methods.
Keyword: adversarial example, sparse perturbation, black-box attack.
Cite@inproceedings{ICIC2025,
author = {Jinling Duan and Zhenhua Li},
title = {AS-ES: Sparse Black-box Adversarial Attack by Active Subspace Evolution Strategy},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1230-1247},
note = {Poster Volume Ⅱ}
}
- CDSS: Innovating Cross Differential Attention for Robust Monaural Multi-Speaker Audio-Visual Speech Separation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yinlong Zhang, Jinjiang Liu, Jiawei Jin, Jiuxin Lin, and Zhiyong Wu
Abstract: Target Speaker Extraction (TSE) based on visual cues has been widely adopted and further extended to Audio-Visual Multi-Speaker Speech Separation (AV-MSS) through either simultaneous multi-speaker processing or recursive approaches. However, in real-world scenarios, obtaining complete visual information for all speakers is often impractical due to data collection constraints. Existing methods mostly use basic self-attention mechanisms to model correlations between separated speech streams to mitigate missing visual cues. Nevertheless, these approaches overlook the critical distinction between speech signals with auxiliary visual information and those without, resulting in performance degradation when modalities are incomplete. To address this, we propose a novel Cross Differential Attention (CDA) mechanism that performs cross-modal differentiation, effectively highlighting the salient disparities between modalities. This design enables the model to adaptively emphasize informative, modality-specific features, thereby significantly improving robustness and effectiveness in both complete and missing-visual scenarios. Extensive experiments validate our method’s superiority, demonstrating state-of-the-art performance on both two-speaker and three-speaker mixture tasks.
Keyword: Speech Separation, Audio-Visual, Cross Differential Attention, Multi-Speaker Scenarios
Cite@inproceedings{ICIC2025,
author = {Yinlong Zhang and Jinjiang Liu and Jiawei Jin and Jiuxin Lin and Zhiyong Wu},
title = {CDSS: Innovating Cross Differential Attention for Robust Monaural Multi-Speaker Audio-Visual Speech Separation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1866-1883},
note = {Poster Volume Ⅱ}
}
- T-LENs: A Tile-Assisted Prompt Framework for Next Location Prediction via Large Language Models, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yifei Luo, Yuhang Wang, Ningyun Li, Lin Zhang, Haichen Xu, Yu Liu, Rui Luo, and Lin Zhang
Abstract: Next location prediction is critical for personalized recommendations, transportation planning, and emergency responses. However, the sparsity of mobility data and the stochastic nature of individuals' daily activities make accurate forecasting a significant challenge. Existing next location prediction methods often rely on discrete location IDs from limited-scale datasets, limiting interpretability and generalization across regions. To address these issues, we propose T-LENs, a prompt-based framework that combines continuous tile-assisted spatial encoding with the interpretive and reasoning capabilities of Large Language Models (LLMs). Our proposed tile-assisted encoding integrates seamlessly with existing methods and enhances privacy preservation by avoiding exposure of sensitive raw coordinates, while also mitigating noise from ultra-precise geolocation data. Furthermore, T-LENs models human mobility by jointly capturing long-term trends and short-term dependencies through a variable-length window, enabling LLMs to identify complex mobility patterns with high accuracy. Our experiments demonstrate that T-LENs significantly outperforms state-of-the-art baselines, achieving superior prediction accuracy with a 50% improvement in Acc@1 and 8% in nDCG@10, while requiring no dataset-specific training. To comprehensively assess the framework's adaptability, we further evaluate its performance across diverse LLMs, highlighting their potential and limitations in mobility modeling.
Keyword: Next location prediction, Tile encoding, LLMs.
Cite@inproceedings{ICIC2025,
author = {Yifei Luo and Yuhang Wang and Ningyun Li and Lin Zhang and Haichen Xu and Yu Liu and Rui Luo and Lin Zhang},
title = {T-LENs: A Tile-Assisted Prompt Framework for Next Location Prediction via Large Language Models},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {152-167},
}
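The abstract above does not say which tiling scheme T-LENs uses. As a hedged illustration of how a tile index can stand in for raw coordinates, the sketch below applies the widely used Web Mercator "slippy map" tile formula; the choice of scheme, the function name, and the zoom level are all assumptions.

```python
import math


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Map a WGS84 coordinate to Web Mercator tile indices (x, y) at `zoom`.

    Coarser zoom levels give larger tiles, so nearby raw coordinates collapse
    into the same index and the exact position is not exposed.
    """
    n = 2 ** zoom
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that edge coordinates (e.g. lon = 180) stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


if __name__ == "__main__":
    # Hypothetical check-in location; the tile index, not the coordinate,
    # would then be written into the prompt text.
    print(latlon_to_tile(29.87, 121.55, zoom=12))
```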
- Structural Entropy Dynamics in CNN Training: A Three-Phase Guided Framework with Applications in Training Optimization, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Fengming Dong, Jianghua Lv, Yining Chen, and Hexuan Li
Abstract: Recent advances in convolutional neural networks (CNNs) have achieved remarkable success across computer vision domains, yet the inherent complexity and opaque nature of their training processes continue to impede further efficiency improvements. As a quantitative indicator of graph structural complexity, structural entropy offers a novel perspective for analyzing the training dynamics of neural networks. This work proposes a graph structure abstraction-based representation method for CNNs, establishing a quantitative framework for training complexity assessment through the transformation of computational graphs into weighted directed graphs followed by structural entropy calculation. Through systematic monitoring of classical CNN architectures, we identify a three-phase evolution pattern of complexity dynamics: Adjustment Phase, Convergence Phase, and Specialization Phase, thereby formulating a structural entropy-guided characterization framework for CNN training processes. Furthermore, by establishing the correlation between dynamic structural entropy features and model performance, we develop optimization strategies including entropy-aware early stopping criteria and adaptive learning rate scheduling. Experimental results demonstrate that the proposed methodology achieves 27% training acceleration without sacrificing model accuracy, providing a principled approach to enhance CNN training efficiency through complexity-aware optimization.
Keyword: Structural Entropy, Convolutional Neural Networks, Training Process Analysis, Training Strategy Optimization
Cite@inproceedings{ICIC2025,
author = {Fengming Dong and Jianghua Lv and Yining Chen and Hexuan Li},
title = {Structural Entropy Dynamics in CNN Training: A Three-Phase Guided Framework with Applications in Training Optimization},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1540-1553},
note = {Poster Volume Ⅱ}
}
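To make the structural entropy calculation in the abstract above concrete, here is a minimal sketch of the one-dimensional structural entropy of a weighted graph, H = -Σ (d_i / vol) · log2(d_i / vol), where d_i is the weighted degree of node i and vol is the sum of all degrees. Which entropy variant the paper uses, and how edge direction is handled, is not stated; summing in- and out-weights per node is an assumption made here.

```python
import math


def one_dim_structural_entropy(edges):
    """One-dimensional structural entropy of a weighted graph.

    edges: iterable of (source, target, weight) triples. Direction is ignored
    when accumulating weighted degrees (an assumption of this sketch).
    """
    degree = {}
    for source, target, weight in edges:
        degree[source] = degree.get(source, 0.0) + weight
        degree[target] = degree.get(target, 0.0) + weight
    volume = sum(degree.values())
    if volume == 0:
        return 0.0
    return -sum(
        (d / volume) * math.log2(d / volume) for d in degree.values() if d > 0
    )


if __name__ == "__main__":
    # Hypothetical four-layer chain whose edge weights might be derived from
    # parameter magnitudes; re-evaluating per epoch gives an entropy curve.
    layers = [("conv1", "conv2", 0.8), ("conv2", "conv3", 0.5), ("conv3", "fc", 0.2)]
    print(one_dim_structural_entropy(layers))
```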
- An Enhanced Method of Multimodal Information Retrieval Based on Document Segmentation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jintao Liu, Chen Feng, Guang Jin, and Jun Fan
Abstract: With the rapid development of information technology, emerging technologies, typified by large language models (LLMs), are driving profound transformations in multiple industries. However, LLMs may hallucinate, making it difficult for them to accurately understand and effectively apply industry knowledge in specific fields. To address this issue, this paper proposes DocColQwen, a multimodal information retrieval enhancement method based on document segmentation. First, the large model analyses the user task, and then the ColPali (Contextualized Late Interaction over PaliGemma) idea is used to segment the multimodal experimental document into images. Subsequently, the images and user questions are encoded into vectors for matching, and the retrieved documents are passed to the Qwen2-VL model to generate the response. Finally, the method is evaluated on multimodal experimental documents to validate its effectiveness, providing a possible solution for the analysis and processing of multimodal test documents.
Keyword: Large language model, Multimodality, Vector matching, Retrieval-augmented generation
Cite@inproceedings{ICIC2025,
author = {Jintao Liu and Chen Feng and Guang Jin and Jun Fan},
title = {An Enhanced Method of Multimodal Information Retrieval Based on Document Segmentation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3648-3659},
}
- PolyRec: Polynomial Attention for Enhanced Sequential Recommendation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Peichen Ji, Jiwei Qin, Jie Ma, and Yanping Chen
Abstract: The self-attention mechanism has been widely adopted in sequential recommendation due to its powerful capability in modeling long-range dependencies. However, as the number of attention layers increases, user embedding vectors tend to collapse into a low-dimensional subspace. This collapse of embedding space leads to overly concentrated user distributions, resulting in the dimensional collapse phenomenon. The increased similarity among user embedding representations makes it challenging for the model to distinguish between different users, ultimately causing recommendation results to become homogenized. To mitigate this issue, we propose a novel sequential recommendation model named Polynomial Attention for enhanced Sequential Recommendation (PolyRec), which alleviates spatial collapse and improves the distribution of user representations. First, the model can better capture high-order structural information through the incorporation of high-order polynomial terms. Simultaneously, leveraging the orthogonality and optimal approximation properties of Chebyshev coefficients stabilizes the parameter training process and enhances the representation capability of the attention mechanism. Furthermore, we conduct a theoretical analysis to demonstrate that during neural networks' aggregation of target information, feature representations are prone to being squeezed by noise and redundant information, thereby exacerbating dimensional collapse. Therefore, by introducing Fourier transforms, we truncate the traditional residual connections in the frequency domain. This approach effectively retains more important information, thereby alleviating the over-squashing phenomenon. Experimental evaluations on four benchmark datasets demonstrate that our model outperforms other baseline methods in recommendation accuracy.
Keyword: Sequential recommendation, Dimensional collapse, Polynomial
Cite@inproceedings{ICIC2025,
author = {Peichen Ji and Jiwei Qin and Jie Ma and Yanping Chen},
title = {PolyRec: Polynomial Attention for Enhanced Sequential Recommendation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2300-2314},
note = {Poster Volume Ⅱ}
}
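The abstract above refers to Chebyshev coefficients and high-order polynomial terms without giving formulas. As background only, the sketch below evaluates Chebyshev polynomials of the first kind with the standard recurrence T_0(x) = 1, T_1(x) = x, T_{k+1}(x) = 2x·T_k(x) - T_{k-1}(x), and a weighted sum of them. How PolyRec applies such terms to attention scores is not specified in the abstract, so the function names and the usage shown are hypothetical.

```python
def chebyshev_basis(x, order):
    """Return [T_0(x), ..., T_order(x)] via the three-term recurrence."""
    values = [1.0]
    if order >= 1:
        values.append(x)
    for _ in range(2, order + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values


def chebyshev_series(x, coeffs):
    """Evaluate sum_k coeffs[k] * T_k(x), with x expected in [-1, 1]."""
    basis = chebyshev_basis(x, len(coeffs) - 1)
    return sum(c * t for c, t in zip(coeffs, basis))


if __name__ == "__main__":
    # Hypothetical use: re-weight a normalized similarity score with
    # coefficients that would be learned in a real model.
    print(chebyshev_series(0.5, [1.0, 2.0, 3.0]))
```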
- Steel Surface Defect Detection Based on YOLOv9 with Denoising Diffusion Implicit Models, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Haozhe Zhang and Yujie Li
Abstract: Due to the complexity of steel processing environments, surface defects inevitably occur during production. Detecting these defects is critical for ensuring product quality and industrial safety. Traditional manual inspection methods suffer from inefficiency and subjectivity, while existing algorithms struggle with feature extraction in complex scenarios. We propose a novel steel surface defect detection model YOLODDIM-DWConv-C3 based on YOLOv9, which enhances feature extraction capabilities while significantly reducing computational complexity. To address the scarcity of original data, we employ the Denoising Diffusion Implicit Model (DDIM) for data augmentation. The proposed YOLO-based defect detection model minimizes computational demands, enabling seamless deployment on edge devices for real-time defect monitoring. Experimental results on the NEU-DET dataset demonstrate that YOLO-DDC outperforms existing methods in both detection accuracy and computational efficiency. We have published the complete project at https://github.com/zhzhzsword/YOLO-DDC.
Keyword: Steel surface defect detection, DDIM, YOLO-DDC
Cite@inproceedings{ICIC2025,
author = {Haozhe Zhang and Yujie Li},
title = {Steel Surface Defect Detection Based on YOLOv9 with Denoising Diffusion Implicit Models},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3059-3070},
}
- MWDN: A long time series forecasting framework based on multi-scale wavelet decomposition network, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xingjie Feng, Jingyao Sun, and Jiaxi Chen
Abstract: Long-term time series forecasting remains a significant challenge due to complex temporal dependencies, scale variability, and noise interference. Existing deep learning methods often struggle to capture fine-grained temporal features, particularly in multivariate scenarios where spatio-temporal correlations vary across different resolutions. To address these limitations, we propose MWDN (multi-scale wavelet decomposition network), a novel forecasting framework that integrates multi-scale decomposition with frequency-aware modeling. MWDN employs a wavelet-based module to iteratively decompose the input into detail and approximation sequences, effectively separating seasonal and trend components. These are then processed in parallel via a dual-branch architecture, enabling efficient modeling of variable dependencies across frequencies. To further enhance representation, a multi-scale fusion module aggregates information across resolutions, improving prediction accuracy while mitigating information loss. Extensive experiments on multiple benchmark datasets show that MWDN consistently achieves state-of-the-art or second-best performance on both short- and long-term forecasting tasks. Ablation studies validate the effectiveness of the decomposition strategy and architectural design. MWDN offers a robust and scalable solution for multivariate time series forecasting. The source code is publicly available at: https://github.com/take-off-ddl/MWDN.
Keyword: Long-term Time Series Forecasting, Wavelet Decomposition, Multi-scale Modeling, Spatio-temporal Dependency.
Cite@inproceedings{ICIC2025,
author = {Xingjie Feng and Jingyao Sun and Jiaxi Chen},
title = {MWDN: A long time series forecasting framework based on multi-scale wavelet decomposition network},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {168-184},
}
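The iterative split into approximation and detail sequences mentioned in the abstract above can be sketched with the Haar wavelet, the simplest choice. The abstract does not name the wavelet MWDN uses, so Haar is an assumption, and the function names are hypothetical. The sketch also assumes the series length is even at every level; a trailing odd sample is dropped.

```python
import math


def haar_step(signal):
    """One Haar level: pairwise scaled sums (approximation) and differences (detail)."""
    scale = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / scale for i in pairs]
    return approx, detail


def haar_decompose(signal, levels):
    """Repeatedly decompose the approximation sequence.

    Returns the coarsest approximation (slow, trend-like content) and a list
    of detail sequences, finest first (fast, seasonal-like content).
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    trend, details = haar_decompose([4.0, 4.0, 2.0, 2.0], levels=2)
    print(trend, details)
```

In a dual-branch design such as the one the abstract describes, the approximation and detail sequences would be fed to separate branches; that part is not shown here.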
- UniGS: Unified 3D Gaussian Splatting for Long-Haired Talking Portraits, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yanping Hu, Ting Liu, and Yuzhuo Fu
Abstract: Talking portrait synthesis is a crucial task in computer vision, enabling realistic animations for applications in virtual communication, entertainment, and digital media. Current methods primarily focus on short-hair scenarios, where they rely on rigid segmentation to separate the head from the torso, followed by head reconstruction and simple compositional strategies to combine the head back with the torso. However, these methods face significant challenges when applied to long-haired individuals due to the complex interactions between hair and body, which can lead to visual artifacts and misalignments. In this work, we introduce a novel dataset specifically designed for long-haired individuals, providing a comprehensive benchmark for evaluating head-torso separation in these complex scenarios. Building upon this dataset, we propose UniGS, a unified 3DGS-based framework that holistically models the full portrait, eliminating the need for explicit segmentation. By incorporating audio, eye, and pose features into a deformation network and utilizing a static-to-dynamic training strategy, our method achieves superior realism and coherence. Experimental results show that our approach outperforms existing state-of-the-art techniques in both visual quality and inference efficiency, and effectively handles the complex visual challenges posed by long-haired scenarios. Additional comparisons on existing short-hair datasets further confirm the robustness of our method.
Keyword: Talking Portrait Generation, Head-Torso Separation, 3D Gaussian Splatting.
Cite@inproceedings{ICIC2025,
author = {Yanping Hu and Ting Liu and Yuzhuo Fu},
title = {UniGS: Unified 3D Gaussian Splatting for Long-Haired Talking Portraits},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3075-3086},
}
- Multi-Channel Fusion Graph Convolutional Networks with pseudo-label for Semi-Supervised Node Classification, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Guang Yang, Shiwen Sun, and Zhouhua Shi
Abstract: Graph Convolutional Networks (GCNs) have shown great promise in semi-supervised node classification tasks. However, existing GCNs face two key challenges: (1) while addressing the limitations of incomplete or noisy graph structures, the structural information of the graph remains underutilized; (2) the scarcity of labeled data limits the ability to learn comprehensive embeddings. To address these issues, we propose a novel Multi-channel Fusion Graph Convolutional Network with pseudo-labels, which learns a connected embedding by fusing multi-channel graph information and node features. First, to explore the latent information within the original data, we design a graph generation module to extend and reconstruct the original data into multiple graphs. Meanwhile, a multi-channel approach is employed to embed and fuse these graphs, capturing the complementarity across different channels. Second, to address the issue of label sparsity, we design a confidence propagation-based information gain filtering module to generate high-quality pseudo-labels. Extensive experiments on three benchmark datasets demonstrate that our method outperforms other approaches.
Keyword: Graph convolutional networks · Multi-channel · Pseudo labeling · Semi-supervised · Classification learning.
Cite@inproceedings{ICIC2025,
author = {Guang Yang and Shiwen Sun and Zhouhua Shi},
title = {Multi-Channel Fusion Graph Convolutional Networks with pseudo-label for Semi-Supervised Node Classification},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1556-1569},
note = {Poster Volume Ⅱ}
}
- A Sentiment Analysis Model for Aspect-Based Sentiment Analysis using Biaffine Attention and Sentiment Knowledge Enhancement, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yusi Gao
Abstract: Sentiment analysis aims to uncover the sentiment polarity of various targets in text. However, existing models predominantly rely on syntactic structures or sequential information, making it challenging to effectively capture the deep dependencies and complex emotional interactions between multi-word aspect and opinion terms. This limitation hampers the accurate modeling of semantic relationships among sentiment triplets (aspect terms, opinion terms, and sentiment polarity). Furthermore, current models often overlook the potential of external sentiment knowledge, which results in suboptimal performance when dealing with complex semantic dependencies and multi-word sentiment relationships. To address these challenges, we propose a novel Aspect-based Sentiment Analysis Model (SEBM) that leverages biaffine attention and sentiment knowledge enhancement to improve performance. First, we introduce a biaffine attention mechanism to model the intricate semantic and emotional dependencies between multi-word terms, enabling more precise capture of emotional interactions and semantic relationships. Second, we integrate external sentiment knowledge from the SenticNet lexicon to optimize the syntactic dependency graph, thereby enhancing the emotional dependencies between the context and aspect terms. This approach compensates for the limitations of existing models in sentiment information modeling. We validate the proposed method on three publicly available datasets: Restaurant, Laptop, and Twitter. The experimental results show that while the accuracy slightly decreased on the Restaurant dataset, SEBM achieved improvements of 2.0% and 1.18% in accuracy on the Laptop and Twitter datasets, respectively. Moreover, SEBM outperformed the baseline model SSEGCN in Macro-F1 scores, with improvements of 0.69%, 2.78%, and 1.01% on the three datasets.
Keyword: Sentiment analysis, Biaffine Attention Mechanism, External affective knowledge, SenticNet
Cite@inproceedings{ICIC2025,
author = {Yusi Gao},
title = {A Sentiment Analysis Model for Aspect-Based Sentiment Analysis using Biaffine Attention and Sentiment Knowledge Enhancement},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {898-914},
}
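For readers unfamiliar with the biaffine attention named in the SEBM abstract above, the following is a minimal sketch of the commonly used biaffine scoring form, s = xᵀUy + wᵀ[x; y] + b. It is not taken from the paper: the function name `biaffine_score`, the toy vectors, and the identity weight matrix are all hypothetical and chosen only for illustration.

```python
def biaffine_score(x, y, U, w, b):
    """Biaffine score s = x^T U y + w^T [x; y] + b for two term vectors."""
    # Bilinear term: pairwise interaction between the two representations.
    bilinear = sum(
        x[i] * U[i][j] * y[j] for i in range(len(x)) for j in range(len(y))
    )
    # Linear term over the concatenation [x; y].
    linear = sum(wi * vi for wi, vi in zip(w, x + y))
    return bilinear + linear + b


# Hypothetical 2-dimensional representations of an aspect and an opinion term.
aspect = [1.0, 0.0]
opinion = [0.5, 2.0]
U = [[1.0, 0.0], [0.0, 1.0]]  # identity: bilinear term reduces to a dot product
w = [0.1, 0.1, 0.1, 0.1]

score = biaffine_score(aspect, opinion, U, w, 0.0)
# bilinear part is 0.5, linear part is 0.35, so the score is approximately 0.85
```

In a trained model, `U`, `w`, and `b` would be learned parameters and the score would typically feed a softmax over candidate term pairs.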
- DASC-YOLO: An Attention Scale-aware Framework for Real-time Leather Defect Detection with Limited Samples, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zihao Li, Zuohao Wu, Hongyu Ao, Mingsheng Shang, and Guang Li
Abstract: Surface defect detection in leather manufacturing faces challenges including multi-scale defects, scarce samples, and texture interference. This study proposes an optimized YOLOv11 framework integrating attention mechanisms, cross-scale feature fusion, and few-shot learning. The backbone network employs spatial-channel attention and feature redundancy reduction to enhance defect discrimination. A cross-scale attention mechanism adaptively fuses multi-resolution features for improved small defect detection. A ProtoNet module addresses sample scarcity while ensuring localization precision. Evaluations on industrial leather datasets and public benchmarks demonstrate the model’s effectiveness, achieving 80.8% mAP on leather defects and 78.1% on steel surfaces with 3.02M parameters and real-time inference (70.3 FPS). The framework outperforms conventional methods in accuracy and robustness, offering a practical solution for automated quality inspection in texture-rich industrial scenarios. Our source code is available at https://github.com/zuoqiumama/DASC-YOLO.git
Keyword: Surface Defect Detection, Tiny Target, Convolutional Neural Network, Deep Learning.
Cite@inproceedings{ICIC2025,
author = {Zihao Li and Zuohao Wu and Hongyu Ao and Mingsheng Shang and Guang Li},
title = {DASC-YOLO: An Attention Scale-aware Framework for Real-time Leather Defect Detection with Limited Samples},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {499-515},
}
- Stability Optimization and Analysis of Energy Flow Networks versus Different Centrality Measurement, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yi Li and Xin Li
Abstract: Optimizing the stability and control performance of complex networks often hinges on effectively identifying critical nodes for targeted intervention. Due to their inherent complexity and high dimensionality, large-scale energy flow networks, prevalent in domains like power grids, transportation, and financial systems, present unique challenges in selecting optimal nodes for resource allocation. While numerous centrality measurements, such as Katz centrality, eigenvector centrality, closeness centrality, betweenness centrality, and PageRank, have been proposed to evaluate node importance, the impact of different centrality metrics on stability outcomes remains inadequately understood. Moreover, networks manifest diverse structural characteristics (including small-world, scale-free, and random graph properties), which further complicates the optimization problem. This paper systematically investigates how various node centrality measurements influence control stability across representative complex network structures. A unified energy-flow dynamical model is developed, and performance metrics such as $L_1$ are employed to quantify the network stability implications of employing different centrality metrics. Extensive numerical simulations over statistically generated network ensembles reveal significant variances in stability outcomes, highlighting the crucial role of centrality selection. The findings underscore the sensitivity of energy-flow stability to seemingly minor changes in topological node rankings, providing practical insights for enhancing control efficiency and robustness in real-world networked systems.
Keyword: Energy flow networks, dynamical systems, stability performance, centrality measurement
Cite@inproceedings{ICIC2025,
author = {Yi Li and Xin Li},
title = {Stability Optimization and Analysis of Energy Flow Networks versus Different Centrality Measurement},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3555-3570},
}
- UAG: Integrating R2UNet and Attention-Guided GNN for Robust Left Ventricle Motion Estimation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Junhao Wu, Huanbin Yao, Kai Li, and Muhammad Sadiq
Abstract: Background: Cardiac diseases significantly affect the structure and function of the left ventricle (LV) during the cardiac cycle. Purpose: Develop a robust framework (UAG) for precise detection and correspondence estimation of aberrant LV myocardial motion, enhancing diagnostic accuracy in cardiac disease management. Methods: This paper proposes UAG, an innovative framework for LV motion estimation. The UAG framework integrates a U-shaped network architecture (R2UNet) for precise LV endocardial contour segmentation and a graph neural network (GNN) enhanced with attention mechanisms for robust feature matching. Initially, R2UNet is trained on cardiac magnetic resonance (CMR) images to extract discriminative features representing key points along the LV myocardial boundary. Subsequently, the GNN, combined with the Sinkhorn algorithm, establishes accurate correspondence between landmarks across diverse cardiac phases by leveraging both spatial and semantic feature relationships. Results: Performance evaluation on two publicly available cardiac datasets demonstrates UAG’s superiority over state-of-the-art methods. Using matching accuracy (ACC) and average perpendicular distance (APD) as evaluation metrics, UAG achieves the lowest ACC and APD values, outperforming existing techniques in both normal and pathological LV contour scenarios. Conclusions: Experimental results validate UAG’s exceptional capability in LV motion estimation, particularly for images with abnormal contours. The integration of R2UNet’s multi-scale feature extraction and the attention-guided GNN ensures robustness against morphological variations, highlighting its potential for clinical applications in cardiac diagnostics.
Keyword: Left Ventricle, Myocardial Motion, U-shaped Network, Graph Neural Network, Image Segmentation, Endocardial Contour
Cite@inproceedings{ICIC2025,
author = {Junhao Wu and Huanbin Yao and Kai Li and Muhammad Sadiq},
title = {UAG: Integrating R2UNet and Attention-Guided GNN for Robust Left Ventricle Motion Estimation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2005-2020},
note = {Poster Volume Ⅱ}
}
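The UAG abstract above mentions the Sinkhorn algorithm for landmark correspondence. As a rough illustration only (not the authors' implementation), Sinkhorn normalization can be sketched as alternating row and column normalization of a positive score matrix until it is approximately doubly stochastic. The score values and the function name `sinkhorn` below are hypothetical.

```python
import math


def sinkhorn(scores, n_iters=50):
    """Alternately normalize rows and columns of exp(scores)."""
    # Exponentiate so that every entry is strictly positive.
    p = [[math.exp(s) for s in row] for row in scores]
    n_rows, n_cols = len(p), len(p[0])
    for _ in range(n_iters):
        # Row step: make each row sum to 1.
        for i in range(n_rows):
            row_sum = sum(p[i])
            p[i] = [v / row_sum for v in p[i]]
        # Column step: make each column sum to 1.
        for j in range(n_cols):
            col_sum = sum(p[i][j] for i in range(n_rows))
            for i in range(n_rows):
                p[i][j] /= col_sum
    return p


# Hypothetical similarity scores between 3 landmarks in two cardiac phases.
scores = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.0, 0.4, 1.8]]
assignment = sinkhorn(scores)

# Read off a hard match per landmark by taking the largest entry in each row.
matches = [max(range(3), key=lambda j: row[j]) for row in assignment]
```

The soft assignment matrix keeps the matching differentiable, which is why this normalization is often paired with learned feature matchers; the hard `matches` list is only for inspection.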
- RWTLA-Prompt: Leveraging Prompt Learning and Deep Networks for Sentiment Analysis, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yunxiang Nan, Hongyun Huang, and Zuohua Ding
Abstract: With the rapid development of social media, sentiment analysis has become a vital tool for understanding online public opinion and user attitudes. To enhance the model’s capacity for capturing nuanced emotional semantics, this paper proposes a novel sentiment analysis approach that integrates soft prompt learning with a hybrid neural network architecture. Specifically, we leverage RoBERTa-WWM to obtain rich semantic representations, and combine TextCNN and BiLSTM in a parallel structure to extract both local and contextual features. An attention mechanism is further incorporated to enhance the model’s focus on emotionally salient words. To improve task adaptability, we design soft prompt templates by extracting key information from input sentences using an extractive summarization method. These soft prompts are then concatenated with the original input and fed into the model for training and classification. Without significantly increasing the number of trainable parameters, our approach retains the pre-trained model’s language understanding capabilities while enhancing sentiment prediction performance. Experiments conducted on three representative datasets demonstrate that our proposed model outperforms both traditional hard prompt methods and baseline models without prompt learning. The optimized soft prompt templates achieve superior accuracy and F1 scores, validating the effectiveness and generalizability of our approach.
Keyword: Sentiment Analysis, Prompt Tuning, RoBERTa-WWM, TextCNN
Cite@inproceedings{ICIC2025,
author = {Yunxiang Nan and Hongyun Huang and Zuohua Ding},
title = {RWTLA-Prompt: Leveraging Prompt Learning and Deep Networks for Sentiment Analysis},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {915-932},
}
- YOLO-VIS: Human Vision Mechanism Enhanced YOLO for Forward-Looking Sonar Images Object Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Ziyu Zheng, Yuquan Wu, Xuewei Li, Linjuan Cheng, and Chenghao Hu
Abstract: Object detection in forward-looking sonar images plays a crucial role in marine resource exploration and national defense. Existing methods typically focus on traditional feature extraction approaches when processing sonar images, but these methods do not fully draw on the advanced processing mechanisms of the human brain in target recognition, leading to unsatisfactory performance on forward-looking sonar images with issues such as low resolution, dynamic changes, and noise interference. To address this, this paper proposes a brain-inspired forward-looking sonar target recognition framework named YOLO-VIS. We designed a low-level feature enhancement module based on large-kernel convolutions, which simulates the human brain’s preliminary processing of images by expanding the receptive field, thereby improving the quality of feature extraction. In addition, a visual attention weighting module is proposed, which further enhances the model’s focus on key features by optimizing feature selection based on the importance of neurons. Finally, through a multi-scale feature deep fusion module, the model’s target recognition capability at different scales is improved. Experimental results show that YOLO-VIS significantly improves target detection accuracy over existing methods on public datasets, verifying the effectiveness of brain-inspired mechanisms in sonar image recognition.
Keyword: Underwater Object Detection, Brain-Inspired Intelligence, Attention Mechanism, Large Kernel Convolution
Cite@inproceedings{ICIC2025,
author = {Ziyu Zheng and Yuquan Wu and Xuewei Li and Linjuan Cheng and Chenghao Hu},
title = {YOLO-VIS: Human Vision Mechanism Enhanced YOLO for Forward-Looking Sonar Images Object Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2021-2035},
note = {Poster Volume Ⅱ}
}
- Exploiting Mention-Entity Graph to Enhance In-context Learning for Collective Entity Linking, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xingyi Li, Boyuan Jia, Mingshuo Chen, Xiang Cheng, and Sen Su
Abstract: Entity Linking (EL) aims at mapping mentions to their corresponding entities. It has been shown that in-context learning based approaches can provide better performance. However, they ignore the interdependency between different EL decisions, i.e., the mentions in the same document should be semantically related to each other, leading to inaccuracy of the task. In this paper, we present CIRCLE, a collective entity linking approach via Mention-Entity graph based in-context learning. In CIRCLE, we propose a logic enhanced path information injection method, which leverages comparative and additive logic to enhance the path information. Moreover, we design a submodular function based demonstration selection method which selects the document-level demonstrations considering high coverage of semantic and path information. Furthermore, we design a Tree-of-Thoughts based demonstration format method which uses a four-layer tree structure for hierarchical thinking. Experimental results confirm the effectiveness of our approach.
Keyword: Collective entity linking, In-context learning, Mention-Entity graph, Large language models.
Cite@inproceedings{ICIC2025,
author = {Xingyi Li and Boyuan Jia and Mingshuo Chen and Xiang Cheng and Sen Su},
title = {Exploiting Mention-Entity Graph to Enhance In-context Learning for Collective Entity Linking},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {979-993},
note = {Poster Volume Ⅰ}
}
- Diversified Style Generation for Face Anti-Spoofing, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Daoyang Lin and Danwei Chen
Abstract: Face Anti-Spoofing (FAS) is crucial for protecting face recognition systems against various spoofing attacks. However, existing methods still suffer from significant performance degradation when handling unseen domains. To address this challenge, this paper designs a Diversified Style Transformation Network (DSTN) that enhances the domain generalization capability of FAS models through instance-level style augmentation. At the core is a method called Diversified Style Generation (DSG). DSG introduces a set of learnable style bases and uses the Dirichlet distribution to generate dynamic weights for each sample, constructing diversified style-enhanced features. During training, the model is exposed to a broader range of style variations, thereby learning style-invariant features. In addition, this paper designs a content consistency loss and a style diversity loss to preserve semantic information in the augmented features and to encourage diversity among style bases, further improving model robustness. Experiments on multiple standard cross-domain FAS benchmark datasets show that the proposed method outperforms state-of-the-art approaches across various domains, especially in unseen domain tasks, demonstrating stronger generalization capabilities. These results verify the effectiveness and potential of DSG in solving the domain generalization problem.
Keyword: Face Anti-spoofing, Domain Generalization, Style Augmentation.
Cite@inproceedings{ICIC2025,
author = {Daoyang Lin and Danwei Chen},
title = {Diversified Style Generation for Face Anti-Spoofing},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {546-562},
note = {Poster Volume Ⅰ}
}
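The DSG abstract above describes mixing a set of style bases with per-sample Dirichlet weights. A minimal sketch of that idea follows, under loudly stated assumptions: each style base is modelled as a (mean, std) pair for a single feature channel, and the mixed style is applied with an AdaIN-like re-normalization. Neither choice is confirmed by the abstract, and all names and values are hypothetical.

```python
import random
import statistics


def sample_dirichlet(alpha, rng):
    """Draw one weight vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def mix_style(bases, weights):
    """Convex combination of (mean, std) style bases."""
    mean = sum(w * b[0] for w, b in zip(weights, bases))
    std = sum(w * b[1] for w, b in zip(weights, bases))
    return mean, std


def restyle(feature, style):
    """Normalize one feature channel, then re-scale it with the mixed style.

    Assumption: AdaIN-like statistics swap; the paper may use another form.
    """
    mu = statistics.fmean(feature)
    sigma = statistics.pstdev(feature) or 1.0  # guard against constant channels
    new_mu, new_sigma = style
    return [(x - mu) / sigma * new_sigma + new_mu for x in feature]


rng = random.Random(0)
bases = [(0.0, 1.0), (2.0, 0.5), (-1.0, 2.0)]  # hypothetical style bases
weights = sample_dirichlet([1.0, 1.0, 1.0], rng)  # one weight vector per sample
augmented = restyle([0.2, 0.4, 0.9, 1.3], mix_style(bases, weights))
```

Because the weights are resampled for every input, each sample sees a different point inside the convex hull of the bases, which is one plausible way to obtain the "diversified" styles the abstract refers to.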
- CMAE:Channel-Masked Autoencoders, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yutao Wang, Yu Nie, and Yilai Zhang
Abstract: In the realm of self-supervised learning, excellent frameworks such as MAE and MoCo have emerged. However, these frameworks' complexity and reliance on specific architectures limit their universality and scalability across different models. Although these training paradigms have facilitated the performance improvement of lightweight models to a certain extent, related research remains scarce. Therefore, this paper aims to explore new approaches for enhancing the performance of lightweight models and proposes a more universal, concise, and scalable self-supervised learning framework called Channel-Masked Autoencoders (CMAE). CMAE effectively addresses the incompatibility issue between the MAE framework and convolutional neural networks and can be well applied to lightweight models. Additionally, we have further investigated the impact of noise strategies on the performance of lightweight models and applied them to CMAE. Our method is concise and efficient: the encoder learns latent representations from grayscale images obtained by randomly masking two color channels and approximately 50% random cropping, which provides information for the decoder to reconstruct the original image. This innovative idea stems from the fact that human vision relies primarily on texture and shape features rather than color. We conducted experiments on multiple datasets and tasks to evaluate the universality and generalization capabilities of the model comprehensively. In these experiments, CMAE exhibited remarkable performance; notably, the MobileViTv3 model pre-trained with CMAE achieved a 3.7 percentage point improvement in classification accuracy on the Mini-ImageNet dataset. Furthermore, CMAE also demonstrated advantages compared to MoCov3.
Keyword: Computer vision, Self-supervised learning, Visual feature learning, Transfer learning
Cite@inproceedings{ICIC2025,
author = {Yutao Wang and Yu Nie and Yilai Zhang},
title = {CMAE:Channel-Masked Autoencoders},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {516-532},
}
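The encoder input described in the CMAE abstract above (two of three color channels masked, plus a random crop) can be sketched as follows. This is one possible reading, not the authors' code: it assumes the surviving channel is used directly as the grayscale view and that the crop keeps roughly half of the image area. The function name `channel_mask_and_crop` and the toy image are hypothetical.

```python
import random


def channel_mask_and_crop(image, rng, keep_area=0.5):
    """Build a single-channel, randomly cropped view of an RGB image.

    image: H x W nested list of (r, g, b) tuples.
    """
    height, width = len(image), len(image[0])
    # Keep one randomly chosen channel, i.e. mask the other two.
    keep = rng.randrange(3)
    # Crop window covering about keep_area of the original area
    # (assumption: same aspect ratio as the input).
    scale = keep_area ** 0.5
    crop_h = max(1, round(height * scale))
    crop_w = max(1, round(width * scale))
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [
        [image[y][x][keep] for x in range(left, left + crop_w)]
        for y in range(top, top + crop_h)
    ]


rng = random.Random(0)
# Hypothetical 8 x 8 RGB image with simple gradient values.
image = [[(x, y, x + y) for x in range(8)] for y in range(8)]
view = channel_mask_and_crop(image, rng)
```

In a pre-training loop, `view` would be the encoder input and the full three-channel `image` the reconstruction target for the decoder.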
- A Gradient Noise-based Dynamic Conditional Diffusion Model for Time-Series Anomaly Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xianghe Du, Xueru Song, Shikang Pang, Shuaitao Yang, Yao Tong, and Jiahui Lu
Abstract: Time series anomaly detection is crucial in various real-world scenarios, including fault diagnosis, financial fraud detection, and early warning systems. While diffusion models have recently emerged as powerful generative tools for anomaly detection, two key challenges persist: (1) conventional Gaussian noise used during the forward process fails to suppress anomaly-specific frequencies due to spectral mismatches, and (2) most existing methods adopt a unified model to detect all types of anomalies, overlooking the distinct characteristics of trend, seasonal, and mixture anomalies. To address these issues, we propose GNDC-DM, a gradient noise-based dynamic conditional diffusion model for time series anomaly detection. GNDC-DM employs three dedicated channels to detect different types of anomalies individually. In the trend and seasonal channels, we introduce a novel GNDC-DM that fuses gradient-aligned noise with stochastic Gaussian components, effectively preserving normal patterns while corrupting anomaly distortion. In the mixture channel, we dynamically incorporate trend and seasonal components as conditions to guide the denoising process, making mixed anomalies more distinguishable. Extensive experiments on four benchmark datasets demonstrate the superior performance of our approach, highlighting its ability to improve detection accuracy across various anomaly categories.
Keyword: Time series, anomaly detection, diffusion models
Cite@inproceedings{ICIC2025,
author = {Xianghe Du and Xueru Song and Shikang Pang and Shuaitao Yang and Yao Tong and Jiahui Lu},
title = {A Gradient Noise-based Dynamic Conditional Diffusion Model for Time-Series Anomaly Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1246-1262},
note = {Poster Volume Ⅱ}
}
- A Defect Recognition and Classification Method Based on Improved Convolutional Neural Network and Terahertz Time-Domain Spectroscopy System, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Liu Yang, Xiuwei Yang, Teng Li, Aoyu Zhu, and Jinhong Li
Abstract: Convolutional neural networks (CNNs) can perform defect recognition and classification, saving time compared to traditional methods. However, traditional CNNs struggle to achieve accurate differentiation due to insufficient feature extraction capability and low computational efficiency when dealing with scenes with complex backgrounds and similar defect categories. To solve these problems, this paper proposes an improved convolutional neural network based on multimodal data fusion to achieve efficient automated defect recognition and classification by combining the technical advantages of a terahertz time-domain spectroscopy system. First, the spectral data of the samples are obtained by a terahertz time-domain spectroscopy system, and the pre-processed spectral data are imaged. Second, the absorption coefficients are obtained by building a terahertz propagation model inversion. Then, the terahertz absorption coefficients are deeply fused with the image data to construct a multimodal dataset as the network input. Convolutional blocks with multi-layer asymmetric convolutional kernels are designed in the convolutional layer to enhance the accuracy and classification speed of defect recognition by strengthening the feature extraction and learning capabilities. Meanwhile, jump connections are chosen between the convolutional blocks, aiming to resist the problems of gradient vanishing and overfitting. Numerical experiments show that the improved CNN achieves 99.4% accuracy in defect classification with an F1 score of 0.99 and 100% accuracy in the confusion matrix validation set. Compared with the traditional CNN, the accuracy is improved by 6% and the F1 score is improved by 4%, which provides reliable technical support for defect recognition and classification in complex scenes.
Keyword: Convolutional Neural Network, Defect detection and classification, Terahertz Time-Domain Spectroscopy System
Cite@inproceedings{ICIC2025,
author = {Liu Yang and Xiuwei Yang and Teng Li and Aoyu Zhu and Jinhong Li},
title = {A Defect Recognition and Classification Method Based on Improved Convolutional Neural Network and Terahertz Time-Domain Spectroscopy System},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1370-1385},
note = {Poster Volume Ⅱ}
}
- Muti-Scale Encoder and Temporal Queries Decoder for Video Object Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Hongxiao Yang and Xi Chen
Abstract: Video Object Detection (VOD) leverages temporal information across adjacent frames in video datasets, enabling identification and localization beyond single-frame image object detection. Transformer-based detectors have achieved remarkable performance in static image object detection. However, their application to video object detection lacks sufficient exploration, particularly in aggregating spatial features and temporal features effectively. Recent research has replaced handcrafted components in traditional optical flow models and association networks with novel designs to integrate spatial features across frames, thereby incorporating temporal information. Nevertheless, these methods often introduce significant computational overhead or complex processing pipelines. Moreover, the integration of multi-scale spatial features and temporal features into a unified framework remains challenging, making it difficult to process both small and large objects simultaneously. To address these issues and enhance detection efficiency, we propose a novel method that aggregates multi-scale spatial features and contextual temporal information. Specifically, we propose a strip attention mechanism for intra-scale feature interaction, utilize a pyramid network to fuse spatial features across scales, and construct temporal associations across video frames through decoder structures. Our end-to-end approach aggregates target queries progressively from coarse to fine, striking a balance between performance and efficiency. Extensive experiments on the ImageNet VID dataset demonstrate that our method significantly improves video object detection.
Keyword: video object detection, temporal queries, muti-scale features, transformer
Cite@inproceedings{ICIC2025,
author = {Hongxiao Yang and Xi Chen},
title = {Muti-Scale Encoder and Temporal Queries Decoder for Video Object Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3090-3106},
}
- MAPL: Memory Augmentation and Pseudo-Labeling for Semi-Supervised Anomaly Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Junzhuo Chen and Shitong Kang
Abstract: Large volumes of unlabeled data and difficult-to-identify anomalies are urgent issues to overcome in most industrial scenes. To address these issues, a new methodology for detecting surface defects in industrial settings is introduced, referred to as Memory Augmentation and Pseudo-Labeling (MAPL). The methodology first introduces an anomaly simulation strategy, which significantly improves the model's ability to recognize rare or unknown anomaly types by generating simulated anomaly samples. To cope with the lack of labels for simulated anomalous samples, a pseudo-labeler method based on a one-classifier ensemble is employed in this study, which enhances the robustness of the model in the case of limited labeled data by automatically selecting key pseudo-labeling hyperparameters. Meanwhile, a memory-enhanced learning mechanism is introduced to effectively predict abnormal regions by analyzing the difference between the input samples and the normal samples in the memory pool. MAPL employs an end-to-end learning framework to identify the abnormal regions directly from the input data, which optimizes the efficiency and real-time performance of detection. In extensive trials on the recently developed BHAD dataset (including MVTec AD [1], Visa [2], and MDPP [3]), MAPL achieves an average image-level AUROC score of 86.2%, demonstrating a 5.1% enhancement compared to the original MemSeg [4] model. The source code is available at https://github.com/jzc777/MAPL.
Keyword: Anomaly Detection, Semi-Supervised Learning, Computer Vision.
Cite@inproceedings{ICIC2025,
author = {Junzhuo Chen and Shitong Kang},
title = {MAPL: Memory Augmentation and Pseudo-Labeling for Semi-Supervised Anomaly Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3106-3122},
}
- Mining Intention with Heterogeneous Attention and Distillation for Interaction Anticipation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yueke Zheng, Danhuai Zhao, Yicheng Liu, Kanghua Pan, Yuping He, Guo Chen, Guangpin Tao, Kang Zheng, Wei Zhu, and Tong Lu
Abstract: Short-term Object Interaction Anticipation (STA) aims to enhance intelligent systems with predictive capabilities and decision-making support by forecasting future interactions between objects. Existing methods often rely on visual changes in short video inputs, which limits the depth and accuracy of motivation prediction. In this study, we propose a novel approach, termed IntenFormer, inspired by how humans make decisions by understanding intentions. Specifically, IntenFormer employs a heterogeneous attention mechanism to simultaneously mine long- and short-term information frameworks, while incorporating knowledge distillation by utilizing a pretrained global intention model as a teacher, enabling the model to learn intention patterns. Extensive experiments on the Ego4D-STA dataset demonstrate that IntenFormer achieves highly competitive results, underscoring the efficacy of a unified approach to intention prediction and knowledge distillation.
Keyword: Object Interaction Anticipation, Knowledge Distillation.
Cite@inproceedings{ICIC2025,
author = {Yueke Zheng and Danhuai Zhao and Yicheng Liu and Kanghua Pan and Yuping He and Guo Chen and Guangpin Tao and Kang Zheng and Wei Zhu and Tong Lu},
title = {Mining Intention with Heterogeneous Attention and Distillation for Interaction Anticipation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3121-3132},
}
- Structure-Based Testing Criteria and Testing Case Generation for Deep Learning Systems, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yining Chen, Jianghua Lv, Fengming Dong, and Hexuan Li
Abstract: Deep neural networks (DNNs) are the basis of many modern AI applications and have been widely applied in various domains. As more safety-oriented fields (autonomous driving, medical diagnosis, etc.) begin to use DNNs, new requirements have been put forward for them. Not only must the accuracy and other objective indicators of a DNN be excellent, but the model must also be robust and able to handle various corner cases. However, a deep neural network computes like a black box, and a slight disturbance to the input may cause errors in the final output of the model. Therefore, it is important to test the adequacy of the deep neural network model, design appropriate evaluation indicators, and build a complete test evaluation system. We prove that there are differences in the internal structure of neural networks for different types of input. Based on this discovery, we propose Multi-Layer test criteria based on the neural network structure. To quantify and analyze the changes in the internal structure of neural networks under different types of input, this paper proposes an algorithm for mapping the deep neural network to tree-structured data. Finally, the Multi-Layer test criteria are used to guide the generation of test cases, yielding high-quality test cases.
Keyword: Deep neural network, White box testing, test case generation, test effectiveness
Cite@inproceedings{ICIC2025,
author = {Yining Chen and Jianghua Lv and Fengming Dong and Hexuan Li},
title = {Structure-Based Testing Criteria and Testing Case Generation for Deep Learning Systems},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1571-1583},
note = {Poster Volume Ⅱ}
}
- MgSFR: Multi-grained Semantic Fusion Retrieval for Multi-hop Reading Comprehension, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yingying Zhang, Bo Cheng, and Yuli Chen
Abstract: Multi-hop Reading Comprehension (RC) has become a critical task in natural language processing (NLP), requiring models to perform multiple reasoning steps and aggregate dispersed clues across multiple paragraphs to answer complex questions. Unlike single-hop RC, multi-hop RC aims to bridge information across diverse contexts and provide interpretable supporting facts, making it closer to human-like reasoning. To address these challenges, we propose a novel approach, Multi-grained Semantic Fusion Retrieval (MgSFR), which integrates semantic information from multiple granularities (word, phrase, sentence, and document). This fusion enhances the semantic relevance between questions and paragraphs, improving retrieval accuracy and efficiency. Additionally, MgSFR proposes a fine-grained semantic interaction mechanism that computes semantic similarity between different granularities, further boosting the model's performance. To complete the multi-hop RC pipeline, we introduce a multi-task reader that leverages this semantic fusion to enhance the model's reasoning capabilities. Experimental results on the HotpotQA benchmark dataset demonstrate that MgSFR significantly outperforms existing retrieval methods and provides high-quality context for multi-hop reasoning. Additionally, MgSFR achieves competitive performance compared to current state-of-the-art models in multi-hop reasoning tasks, validating its effectiveness in complex multi-hop tasks.
Keyword: Information Integration, Multi-grained Semantic Fusion, Multi-hop Reading Comprehension, Semantic Reasoning.
Cite@inproceedings{ICIC2025,
author = {Yingying Zhang and Bo Cheng and Yuli Chen},
title = {MgSFR: Multi-grained Semantic Fusion Retrieval for Multi-hop Reading Comprehension},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {533-549},
}
- Generating Privacy-Preserving Data without Compromising Analytical Utility, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xiaopeng Luo, Shanlin Feng, Yunlin Liu, and Taoting Xiao
Abstract: As machine learning technology improves by leaps and bounds, the demand for data is growing rapidly. However, in many practical applications, efficiently capturing useful information in private data is a challenging problem. Moreover, personal information in the data can seriously threaten the privacy of participating users, who are the building blocks of data-driven decision-making. The popularization of communication technology and data collection devices has led to mixed-type data containing numerical and categorical features. Mixed data provides more comprehensive and rich information to help us discover hidden patterns between features and data labels. It is worth noting that not all features contribute equally to the classification task. Features with poor correlation to the labels may not provide valuable information to the dataset and can even affect the accuracy of the analysis results. Such features are considered noise and irrelevant to the analysis task. This paper proposes a novel data synthesis method that considers the relevance of heterogeneous features to the data labels, even in scenarios with limited data. By enforcing strict privacy constraints through differential privacy and protecting users' private information with noise, this method generates new data, increasing the quantity and diversity of training data while preserving its utility. We evaluate the newly generated data protected under privacy constraints, assessing their utility in classifiers through experiments. The experimental results demonstrate that this method preserves the original data's utility and improves the classifiers' classification results.
Keyword: Mixed data, Feature selection, Feature ranking, Privacy-preserving
Cite@inproceedings{ICIC2025,
author = {Xiaopeng Luo and Shanlin Feng and Yunlin Liu and Taoting Xiao},
title = {Generating Privacy-Preserving Data without Compromising Analytical Utility},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {550-563},
}
- Deformation Tumor Synthesis with Modal-Data Adaptive Supervision for 3D Brain Tumor Segmentation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xiaofeng Peng and Feng Yang
Abstract: The accurate segmentation of brain tumors is important not only for treatment planning but also for follow-up evaluations. However, the inadequacy of annotated medical images poses challenges in training brain tumor segmentation models. This paper addresses this issue by presenting a new method called Deformation Tumor Synthesis with Model-Data Adaptive Supervision (DSMA). DSMA consists of data synthesis and weight allocation. The Deformation Tumor Synthesis (DTSS) strategy combines the morphological features of real tumors and adopts a unique iteration synthesis and fusion mechanism to generate diverse derived synthetic data customized for each set of real data. The Model-Data Adaptive Supervision (MAS) strategy dynamically filters and allocates the loss weights of synthetic data based on the real-time performance of the segmentation model to ensure the positive effects of adding synthetic data. The experimental results on the publicly available MRI brain imaging datasets BraTS2019 and BraTS2020 indicate that the proposed method achieves high-quality data synthesis and effectively improves the performance of the segmentation model.
Keyword: brain tumor segmentation, Deformation Tumor Synthesis, iteration synthesis, Model-Data Adaptive Supervision.
Cite@inproceedings{ICIC2025,
author = {Xiaofeng Peng and Feng Yang},
title = {Deformation Tumor Synthesis with Modal-Data Adaptive Supervision for 3D Brain Tumor Segmentation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {563-579},
note = {Poster Volume Ⅰ}
}
- Boosting Neural Language Inference via Cascaded Interactive Reasoning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Min Li and Chun Yuan
Abstract: Natural language inference (NLI) aims to accurately judge the logical relationship between premises and hypotheses. Due to the diverse expressions, rich semantics, and complex contexts inherent in language, NLI remains challenging. While Transformer-based pre-trained models have brought notable improvements, existing methods usually leverage only the last-layer token representations, limiting their effectiveness for modeling complex semantic interactions. To address this, we propose a Cascaded Interactive Reasoning Network (CIRN), which achieves deep semantic understanding by extracting multi-level semantic features within an interactive space. This hierarchical extraction mechanism simulates the progressive cognitive process from shallow to deep understanding, efficiently mining hidden semantic relationships between sentences. Extensive experiments across multiple benchmark datasets demonstrate consistent performance improvements over strong baseline methods.
Keyword: Neural language inference, Deep learning, Neural language process.
Cite@inproceedings{ICIC2025,
author = {Min Li and Chun Yuan},
title = {Boosting Neural Language Inference via Cascaded Interactive Reasoning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {933-944},
}
- CLUE: A High-Performance, Efficient, and Robust APT Detection Framework via Fine-Tuning Pretrained Transformer and Contrastive Learning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Wenzhuo Cui, Maihao Guo, Jingjing Feng, Shuyi Zhang, Zheng Liu, and Yu Wen
Abstract: In recent years, provenance graph-based approaches have become the standard for Advanced Persistent Threat (APT) detection and investigation. However, existing studies face several challenges: (1) the high computational cost of the training process makes it difficult to update the model in a timely manner, leading to delayed attack detection; (2) the imbalance in training data results in a scarcity of attack samples, which negatively impacts model performance; and (3) high false positive rates hinder practical deployment in real-world applications. To address these challenges, we propose CLUE, a novel APT detection framework that enables high-quality, multi-granular detection. CLUE employs lemmatization techniques to normalize sequence data extracted from provenance graphs and then directly fine-tunes a pretrained Transformer model, significantly reducing both training time and dependence on scarce attack data. Furthermore, CLUE incorporates contrastive learning to enhance generalization capability in data-scarce scenarios by optimizing inter-sample distances while accelerating model convergence. Our evaluation of CLUE across 10 real-world APT attack scenarios demonstrates that, compared to state-of-the-art methods, CLUE maintains superior detection performance while achieving a 7.4× reduction in average training time and requiring 45.2% less training data (particularly attack samples). These results validate CLUE's efficiency, robustness, and practical value in APT detection.
Keyword: APT detection, Provenance graph, Contrastive learning, Pretrained Transformer
Cite@inproceedings{ICIC2025,
author = {Wenzhuo Cui and Maihao Guo and Jingjing Feng and Shuyi Zhang and Zheng Liu and Yu Wen},
title = {CLUE: A High-Performance, Efficient, and Robust APT Detection Framework via Fine-Tuning Pretrained Transformer and Contrastive Learning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {994-1008},
note = {Poster Volume Ⅰ}
}
- DKE: LLM-Based Domain Knowledge Enhancement for Comprehensible Personality Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yutian Zhang, Conghui Zheng, and Li Pan
Abstract: Personality detection aims to identify personality traits of individuals based on social media posts. Many existing methods map words to psycholinguistic categories by utilizing Linguistic Inquiry and Word Count (LIWC) to introduce the domain knowledge required for personality detection. However, the categories of LIWC are static and lack a direct connection to personality detection, thereby limiting the model's ability to effectively leverage domain-specific knowledge. Trained on massive amounts of data, large language models (LLMs) possess extensive world knowledge, especially domain-specific knowledge that is beneficial to comprehending personality taxonomies. Inspired by this, we propose an LLM-based domain knowledge enhanced model to capture the implicit psycholinguistic knowledge in posts, achieving more accurate and comprehensible personality detection. Specifically, to leverage the LLM’s comprehensive knowledge base, we first input the posts into the LLM to obtain personality judgments and corresponding rationales, which are derived based on the core characteristics of each MBTI dimension. To better incorporate personality-related knowledge, the proposed model then conducts feature interactions between the rationales and the posts, generating text representations that better reflect personality traits. After that, the model adaptively adjusts the weights of the interactive features and aggregates them with the semantic features to form the final representations, based on which personality detection is performed. Experimental results on real-world datasets demonstrate that the proposed model effectively improves the quality of user personality representation and outperforms baseline methods.
Keyword: Large Language Model, Personality Detection, Knowledge Enhancement.
Cite@inproceedings{ICIC2025,
author = {Yutian Zhang and Conghui Zheng and Li Pan},
title = {DKE: LLM-Based Domain Knowledge Enhancement for Comprehensible Personality Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {945-960},
}
- EdgeMeter: An Edge Computing System and Algorithm for Intelligent Water Meter Recognition, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Ximing Li, Xiaosheng Xie, Yue Zhang, Min Wang, Xiao Du, Zelin He, and Yubin Guo
Abstract: An accurate and efficient water meter reading recognition system is essential for intelligent water resource management. However, existing systems face several challenges, including the high deployment costs of replacing old meters with smart ones, the limited device lifespan caused by local recognition on embedded devices, and the increased server workload associated with server-based processing. To address these issues, we propose an intelligent water meter recognition system based on a three-layer edge computing architecture. The IoT layer is responsible for data collection, utilizing an HC32F460 chip to automatically capture water meter images, compress the images, and transmit them to the edge layer. The edge layer is primarily composed of the YOLO-METER algorithm for water meter reading recognition. Based on YOLO11n, we have improved two modules by integrating FastC3k2 to enhance the extraction of low-contrast features and MRFBlock to refine feature selection and improve the localization of reading regions on the water meter. The cloud layer periodically aggregates water meter readings, performs data analysis, and provides users with related information, enabling real-time monitoring and insights. Experiments show that YOLO-METER achieves a 2.4% higher mAP50 and 6% fewer parameters than YOLO11n, enhancing recognition accuracy while reducing computational cost. This system facilitates efficient water usage monitoring, thereby improving operational efficiency and contributing to intelligent resource management.
Keyword: Edge Computing, Smart Meters, Reading Recognition, Deep Learning, IoT.
Cite@inproceedings{ICIC2025,
author = {Ximing Li and Xiaosheng Xie and Yue Zhang and Min Wang and Xiao Du and Zelin He and Yubin Guo},
title = {EdgeMeter: An Edge Computing System and Algorithm for Intelligent Water Meter Recognition},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3137-3153},
}
- Latent Query Alignment for Enhanced Domain-specific Retrieval-Augmented Generation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yijun Bei, Yan Jiang, Bin Zhao, Lihua Yu, Zhaoyu Zhong, and Yao Zhu
Abstract: Large Language Models (LLMs) demonstrate impressive generalization capabilities in various natural language processing tasks but often encounter performance degradation, particularly in domain-specific applications due to hallucinations and semantic mismatches. We propose LaQuA, a novel retrieval framework leveraging latent query alignment to bridge the semantic gap between user queries and specialized domain documents. LaQuA integrates three core innovations: latent query generation using LLMs, contrastive alignment with similarity-constrained synthetic queries, and a semantic bridging inference mechanism employing proxy queries. Comprehensive experiments on public benchmarks and a custom domain-specific dataset show that LaQuA significantly improves retrieval quality compared to standard dense retrievers and pseudo-query approaches. Additionally, evaluation within a Retrieval-Augmented Generation (RAG) pipeline demonstrates consistent enhancements in factual accuracy and content relevance across multiple language models and domains. Our findings suggest that latent query-driven semantic alignment substantially mitigates hallucinations and improves LLM performance in knowledge-intensive, domain-specific tasks.
Keyword: Large Language Models, Dense Retrieval, Latent Query Generation, Contrastive Learning, Retrieval-Augmented Generation
Cite@inproceedings{ICIC2025,
author = {Yijun Bei and Yan Jiang and Bin Zhao and Lihua Yu and Zhaoyu Zhong and Yao Zhu},
title = {Latent Query Alignment for Enhanced Domain-specific Retrieval-Augmented Generation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {961-977},
}
- InteractMatch: Segment Anything Model with Interact-Consistency for Semi-Supervised Medical Image Segmentation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Haohua Chang
Abstract: The scarcity of labeled data is a significant challenge in medical image segmentation tasks. In recent years, the Segment Anything Model (SAM) has gained attention as a foundational model for segmentation tasks due to its powerful zero-shot capabilities and prompt-based interactive manner. However, due to the substantial domain gap between medical and natural image data, adapting SAM to the medical domain requires a large volume of annotated medical data. Unfortunately, in medical applications, obtaining densely annotated data is both costly and challenging, particularly for rare diseases. Therefore, to efficiently fine-tune SAM, we consider utilizing Semi-Supervised Learning (SSL) to harvest knowledge from unlabeled samples. In this paper, we present InteractMatch, which consists of a Prompt Augmentation-Based Consistency (PAC) module and a Cross-Model Knowledge Distillation (CKD) module. The PAC module effectively leverages various types of prompts from SAM to facilitate model training on unlabeled data, improving both robustness and predictive accuracy by introducing perturbations to the prompts. Additionally, CKD is introduced to align the probability distributions of the two model branches, thereby reducing discrepancies in their predictions and enhancing the output invariance of the model. Extensive experiments on two public datasets demonstrate that InteractMatch achieves state-of-the-art performance in semi-supervised medical image segmentation, notably yielding a 1.93% Dice score improvement on the ACDC dataset.
Keyword: Segment Anything Model, Semi-Supervised Learning, Medical Image Segmentation.
Cite@inproceedings{ICIC2025,
author = {Haohua Chang},
title = {InteractMatch: Segment Anything Model with Interact-Consistency for Semi-Supervised Medical Image Segmentation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1009-1025},
note = {Poster Volume Ⅰ}
}
- Enhance Adversarial Attack against Trajectory Prediction Model by Considering the Characteristics of Trajectory Data and Model, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xin Guo, Yucheng Shi, Zhenghan Gao, Guangyao Bai, Yufei Gao, Lei Shi, and Wenwen Li
Abstract: Recent studies have revealed the vulnerability of trajectory prediction (TP) models to gradient-based adversarial attacks. However, existing gradient-based attacks overlook the characteristics of trajectory data and models, limiting their effectiveness in robustness evaluation. Therefore, we propose a gradient-based attack algorithm that considers both the rich physical information in the data and the common characteristics of the models. On the data side, trajectory data carries more physical information than the pixels targeted by image-domain attacks. Considering this, our method introduces momentum into the gradient updates to preserve physical information from previous iterations and keep the generated adversarial trajectory realistic. To search for more possible adversarial trajectories, it starts from different initial states and updates with different step sizes. On the model side, unlike the convolutional neural networks (CNNs) used for image recognition, trajectory prediction models are typically based on recurrent neural networks (RNNs), which tend to focus more on specific points within the data rather than treating all inputs with equal importance. An attention loss function is designed to guide the attack toward the points the model focuses on. Experiments on three models and two datasets show that our attack algorithm increases the trajectory prediction error, measured by mean displacement error (ADE), by over 7.94% compared to the previous state-of-the-art gradient-based attack. Our code is open source at GitHub: https: anonymous.4open.science' RMS-PGD.
Keyword: adversarial example, gradient-based attack, trajectory prediction, robustness evaluation
Cite@inproceedings{ICIC2025,
author = {Xin Guo and Yucheng Shi and Zhenghan Gao and Guangyao Bai and Yufei Gao and Lei Shi and Wenwen Li},
title = {Enhance Adversarial Attack against Trajectory Prediction Model by Considering the Characteristics of Trajectory Data and Model},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3152-3166},
}
- Video Anomaly Detection: A Systematic Taxonomy and Analysis of Deep Models, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yitong Yuan and Jingxin Cao
Abstract: Video anomaly detection (VAD), a foundational pillar of modern computer vision, has garnered significant attention due to its wide-ranging real-world applications. Despite considerable progress, VAD persistently grapples with formidable challenges, particularly the scarcity of anomalies in real-world datasets and the complexities of accurate anomaly annotation. This survey investigates state-of-the-art VAD methodologies, synthesizing their core challenges and elucidating tailored solutions. Addressing the shortcomings of prior reviews, we propose a unified taxonomy that classifies methods according to input modalities: raw video, mid-level visual features, and high-level visual-semantic representations, providing a perspicuous framework to discern their unique attributes. To enrich comprehension, we present rigorous comparisons and analyses across established benchmark datasets. Anticipating future developments, we delineate promising research trajectories, such as semantic context learning enabled by contrastive language-image pre-training (CLIP) and multi-modal large language models (MLLMs), to drive transformative advancements in the field. Furthermore, we meticulously examine the practical impediments existing approaches encounter in deployment in real-world environments.
Keyword: Video Anomaly Detection, Anomaly Detection, Computer Vision.
Cite@inproceedings{ICIC2025,
author = {Yitong Yuan and Jingxin Cao},
title = {Video Anomaly Detection: A Systematic Taxonomy and Analysis of Deep Models},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {43-59},
note = {Poster Volume Ⅰ}
}
- CAMS: Collaborative Small-Parameter Large Language Models for Educational QA Grading, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Gangliang Li, Dacheng Xu, Xiaodong Huang, Chengfeng Chen, and Shouqiang Liu
Abstract: Automated grading has become a crucial component of smart education, involving various complex Natural Language Processing (NLP) tasks, including text representation, similarity evaluation, and classification. Although large language models (LLMs) show great promise in improving grading accuracy and consistency, their high computational costs and data privacy concerns limit widespread adoption. This study introduces CAMS, an automated grading system based on smaller LLMs that enhances the grading process through model collaboration and chain-of-thought (CoT)-guided prompt templates. CAMS offers an efficient, locally deployable, and sustainable solution. By integrating Yi-1.5-9B into the proposed collaborative CAMS system and deploying it locally, the system achieved a grading score of 0.8511 and an overall score of 0.8148, demonstrating improvements of 0.1865 and 0.1732, respectively, compared to the standalone use of Yi-1.5-9B. Furthermore, the performance of CAMS approaches that of larger-scale model APIs such as GPT-3.5-Turbo (score = 0.8457).
Keyword: large language models, automatic grading, prompt engineering, natural language processing.
Cite@inproceedings{ICIC2025,
author = {Gangliang Li and Dacheng Xu and Xiaodong Huang and Chengfeng Chen and Shouqiang Liu},
title = {CAMS: Collaborative Small-Parameter Large Language Models for Educational QA Grading},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {978-993},
}
- DPPBP: Dual-stream Protein-peptide Binding Sites Prediction Based on Region Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yueli Yang, Yang Hua, Wenjie Zhang, and Xiaoning Song
Abstract: Prediction of protein-peptide binding sites plays a critical role in the regulation of cellular functions and targeted drug discovery. Recently, sequence-based prediction methods have been widely used due to their simplicity, effectiveness, and low cost of data collection. However, these methods rely on the binary classification of individual amino acids within the protein sequence, which often overlooks the dependencies between binding amino acids in the training labels. To address this issue, we propose a novel Dual-stream Protein-Peptide Binding sites Prediction method (DPPBP) based on region detection and a protein language model. In the first stream, we group successive binding sites into a single region to capture the relationships between binding amino acids and highlight the binding region of the entire sequence. Then, we use a fixed small set of learned target queries to reason about the relationships between the target regions and the global sequence information of the protein, generating the final predictions in parallel. In the second stream, we continue to use binary classification to discriminate each individual amino acid at a fine-grained level, and the final prediction is obtained by combining the results of both streams. Extensive experiments show that our DPPBP method outperforms the existing state-of-the-art sequence-based methods on two benchmark datasets. Datasets and code can be found at https://github.com/22Donkey/DPPBP.
Keyword: Protein-peptide interaction, Binding sites prediction, Dual-stream joint inference.
Cite@inproceedings{ICIC2025,
author = {Yueli Yang and Yang Hua and Wenjie Zhang and Xiaoning Song},
title = {DPPBP: Dual-stream Protein-peptide Binding Sites Prediction Based on Region Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2160-2174},
note = {Poster Volume Ⅱ}
}
- Performance Optimization of Imbalanced Intrusion Detection Data Classification Based on Voting Approach, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Guannan Wen and Zhenzhou An
Abstract: Intrusion detection systems (IDSs) often suffer from the category imbalance problem, i.e., malicious traffic is much scarcer than normal traffic, which results in inefficient detection. This paper proposes a data generation technique incorporating Pearson's correlation coefficient to solve this problem. In addition, feature selection based on the chi-square distribution and ensemble learning based on a voting method are used for model optimization. In the data generation stage, a sample similarity calculation based on the Pearson correlation coefficient is introduced to improve the Borderline-SMOTE method, generating high-quality data and reducing the negative impact of low-quality samples on the classification model. In the model optimization phase, experiments were conducted with multiple machine learning methods, and the most effective combination was selected. Experiments on three public datasets, UNSW-NB15, NSL-KDD, and CIC-IDS-2017, demonstrate the effectiveness of the method, with significant improvement in the detection of minority categories; in particular, on the UNSW-NB15 dataset, the F1 scores of the minority categories Analysis and Backdoor improved by 42% and 39%, respectively. In addition, a new evaluation metric, Mean Category Accuracy (MCA), is proposed, which provides a more balanced assessment of the detection performance across all attack types.
Keyword: Intrusion detection, Machine learning, Ensemble learning
Cite@inproceedings{ICIC2025,
author = {Guannan Wen and Zhenzhou An},
title = {Performance Optimization of Imbalanced Intrusion Detection Data Classification Based on Voting Approach},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1026-1043},
note = {Poster Volume Ⅰ}
}
- TR7Net: A Hybrid Transformer-CNN Framework for Endoscopic Image Segmentation with Validation on Spinal Surgery, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shao-Chi Pao, Liuyi Yang, Lin Lin, Fengcheng Mei, Xiaoxing Yang, and Bingding Huang
Abstract: In endoscopic surgery, accurate segmentation of medical images is indispensable for surgical planning, navigation, and real-time guidance. However, existing segmentation methods often fall short in capturing complex anatomical details and long-range contextual information, which are critical for precise segmentation. To address these challenges, we introduce TR7Net (TransUNet-RSU7-Network), a novel hybrid deep learning framework that integrates the strengths of TransUNet and RSU7 architectures. TransUNet excels in global context modeling through its transformer encoder-decoder structure, while RSU7 is renowned for its robust feature extraction capabilities, particularly in handling intricate image features. TR7Net synergizes these two architectures to achieve superior segmentation performance. Extensive experiments on spinal endoscopic datasets demonstrate that TR7Net outperforms both TransUNet and nnUNet regarding segmentation accuracy and robustness in surgical scenarios. This work presents a significant advancement in medical image segmentation for spinal endoscopic surgery, offering a more precise and reliable solution for surgical assistance.
Keyword: Spinal endoscopic surgery, Medical image segmentation, Deep learning, TransUNet, RSU7
Cite@inproceedings{ICIC2025,
author = {Shao-Chi Pao and Liuyi Yang and Lin Lin and Fengcheng Mei and Xiaoxing Yang and Bingding Huang},
title = {TR7Net: A Hybrid Transformer-CNN Framework for Endoscopic Image Segmentation with Validation on Spinal Surgery},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1219-1230},
}
- Behavior-Type Aware Representation Learning for Multiplex Behavior Recommendation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xin-Wei Yao, ShiXun Sun, Chuan He, Xin-Li Xu, Wei Huang, Qiang Li, and Xinggang Fan
Abstract: Efficient recommender systems are essential for modeling user-item interactions, such as views, favorites, and purchases. However, two challenges remain: (1) complex user-item interactions require a more informative method for modeling multiplex behavior patterns in representation learning; (2) the effect of different interactions on the target interaction is often ignored in recommender system scenarios. In this study, we propose a more informative framework, Behavior-Type Aware Representation Learning for Multiplex Behavior Recommendation (BA-MBRec), to learn representations of users and items by mining behavior-aware patterns in feature encoding. Specifically, BA-MBRec is a powerful approach tailored to effectively encode nodes across various multiplex structures. It not only adaptively captures individual behavior-aware patterns but also discovers the interdependencies across these various patterns within multiplex heterogeneous networks through hierarchical modeling and cross-behavioral aggregators. Experiments on three real-world datasets demonstrate its superior performance, with improvements of 5.2% in HR@10 and 10.16% in NDCG@10 over state-of-the-art methods. Our empirical studies further demonstrate the great potential of this framework for capturing the multiplexity of users’ preferences in recommendation scenarios. Our implementation code is available at https://github.com/sunshixx/BA-MBRec/tree/master.
Keyword: Recommender systems, Multiplex Heterogeneous Graph, Learning latent representations, Contrastive Learning
Cite@inproceedings{ICIC2025,
author = {Xin-Wei Yao and ShiXun Sun and Chuan He and Xin-Li Xu and Wei Huang and Qiang Li and Xinggang Fan},
title = {Behavior-Type Aware Representation Learning for Multiplex Behavior Recommendation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1587-1604},
note = {Poster Volume Ⅱ}
}
- CTGR: A Dual-Branch Convolutional-Transformer Network for RFID-Based Contactless Gesture Recognition, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Ruofan Ma, Lvqing Yang, Yongrong Wu, Qianwen Mao, Wensheng Dong, Yifan Liu, Ziyan Wen, Bo Yu, and Yishu Qiu
Abstract: RFID-based gesture recognition, while overcoming vision-based limitations in privacy and environmental robustness, faces three key challenges: inadequate temporal dynamics modeling, disjointed local-global feature integration, and suboptimal fusion of complementary Received Signal Strength Indicator (RSSI) and phase signals. To address these limitations, we present CTGR, a dual-branch CNN-Transformer architecture for RFID gesture recognition. Our CTGR framework first establishes dual parallel input pathways for RSSI and phase signals, then processes each branch through synergistic components: the STC layer employs depthwise separable convolutions to extract noise-robust local features from both modalities, and the MATE module applies multi-head self-attention to capture global temporal dependencies in each signal domain. Finally, CTGR combines the processed RSSI and phase features through a strategic fusion mechanism that effectively integrates their complementary properties, enabling comprehensive modeling of gesture dynamics. Extensive experiments across diverse scenarios validate the method's effectiveness, achieving a 97.38% average accuracy on a 7-class gesture dataset. Compared with mainstream recognition algorithms, CTGR demonstrates superior robustness in adapting to diverse users, varying gesture speeds, and challenging environmental conditions, ensuring consistent performance across real-world scenarios. This work enhances RFID-based interaction systems through spatio-temporal feature fusion, offering practical solutions for robust human-machine interfaces in dynamic environments.
Keyword: RFID, Gesture Recognition, CNNs, Transformer, Spatio-Temporal Features
Cite@inproceedings{ICIC2025,
author = {Ruofan Ma, Lvqing Yang, Yongrong Wu, Qianwen Mao, Wensheng Dong, Yifan Liu, Ziyan Wen, Bo Yu, and Yishu Qiu},
title = {CTGR: A Dual-Branch Convolutional-Transformer Network for RFID-Based Contactless Gesture Recognition},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2036-2047},
note = {Poster Volume Ⅱ}
}
- A Robust Intelligent Framework for Long Jump Action Scoring: From Pose Estimation to Motion Blur-Resistant Recognition, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zhiliang Qiu, Yanyan Su, Min Lu, Jun Xiang, and Shenglian Lu
Abstract: With the advancement of deep learning, sports motion analysis has become increasingly data-driven. However, techniques such as pose estimation, action recognition, and scoring often operate independently. To address this limitation, a unified framework is proposed for structured and objective long jump analysis. One major challenge in real-world scenarios is motion blur, which greatly reduces the accuracy of pose estimation. To mitigate this issue, a long jump dataset was collected from 30 athletes, annotated across four movement phases, multiple lighting conditions, and four levels of motion blur. Based on this dataset, a simple MetaFormer-based model named BaseFormerPose is developed, using uniformly stacked window self-attention. It achieves 91.0 AP on the long jump motion-blur dataset. An automatic scoring module is also introduced, and its outputs show strong agreement with pose-based scores from three expert coaches, suggesting improved consistency and reduced subjectivity in long jump evaluation.
Keyword: Human Pose Estimation, Deep Learning, Performance Evaluation, Motion Blur, Automatic Scoring
Cite@inproceedings{ICIC2025,
author = {Zhiliang Qiu, Yanyan Su, Min Lu, Jun Xiang, and Shenglian Lu},
title = {A Robust Intelligent Framework for Long Jump Action Scoring: From Pose Estimation to Motion Blur-Resistant Recognition},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {60-74},
note = {Poster Volume Ⅰ}
}
- Bidirectional Decoding Collaborating with Hierarchical Imitation for Long-Horizon Task Learning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Faquan Zhang, Bo Jin, Kun Zhang, and Ziqi Wei
Abstract: Imitation Learning (IL) struggles with long-horizon tasks due to insufficient policy generalization and adaptability in dynamic environments. To address this, we propose a hierarchical framework that integrates Hierarchical Reinforcement Learning (HRL) with a Bidirectional Decoding mechanism. The framework decomposes complex tasks into subtasks, leveraging human demonstrations to rapidly capture behavioral patterns through IL, while employing HRL to refine policies via reward-driven optimization. A novel Bidirectional Decoding mechanism leverages temporal consistency (backward coherence) and enhances robustness (forward contrast) by dynamically reassessing action sequences against strong and weak policy predictions. Evaluations in the Franka Kitchen environment demonstrate superior performance in task success rates and cumulative rewards, outperforming existing approaches. Ablation studies confirm the critical role of Bidirectional Decoding in resolving the rigidity of traditional action chunking, while the discovery of novel strategies that diverge from human demonstrations highlights autonomous policy improvement. Our framework efficiently handles dynamic and diverse long-horizon tasks, even with limited demonstration data, offering a robust solution for real-world applications such as robotic manipulation.
Keyword: Imitation Learning, Hierarchical Reinforcement Learning, Bidirectional Decoding
Cite@inproceedings{ICIC2025,
author = {Faquan Zhang, Bo Jin, Kun Zhang, and Ziqi Wei},
title = {Bidirectional Decoding Collaborating with Hierarchical Imitation for Long-Horizon Task Learning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {564-578},
}
- FarDetect3D: Enhancing Long-Range Object Detection in Multi-View 3D Systems, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Songyan Liu, Chaoyi Luo, Tong Xiao, Xiaofei Liu, and Jing Chai
Abstract: In this paper, we propose FarDetect3D, a novel multi-view 3D detection paradigm, to enhance the detection of long-range objects. FarDetect3D improves existing sparse query-based multi-view 3D object detection by introducing two modules: Remote Detection Denoising (ReDN) and Long-range Feature Attention (LrFA). ReDN utilizes fake long-range depth information to generate sparse 3D queries, which improves the performance of long-range detection. LrFA enhances the central features and captures the contextual relationships between distant pixels, further boosting detection accuracy. Experimental results show that our approach outperforms state-of-the-art camera-based multi-view 3D detection methods and can provide a robust solution for safe autonomous driving in complex environments.
Keyword: Multi-view 3D Object Detection, Denoising, Long-range Feature Attention
Cite@inproceedings{ICIC2025,
author = {Songyan Liu, Chaoyi Luo, Tong Xiao, Xiaofei Liu, and Jing Chai},
title = {FarDetect3D: Enhancing Long-Range Object Detection in Multi-View 3D Systems},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {75-88},
note = {Poster Volume Ⅰ}
}
- S2LNet: Review the Non-Stationarity in Multivariate Time Series Forecasting, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zhuang Xing
Abstract: Transformer-based methods have achieved remarkable advances in multivariate time series forecasting owing to their long-range modeling ability. However, the non-stationarity of real-world time series makes these models particularly prone to overfitting when the data distribution changes over time. Despite various recent attempts, existing studies either overlook cross-channel mutual information gains or struggle to effectively capture cross-time features. To overcome these limitations, we review the characteristics of time series and develop a novel Short-term to Long-term network called S2LNet, which combines short-term cross-time features into long-term distributions and then models cross-channel dependencies. For cross-time features, S2LNet first decomposes the input sequence into seasonal and trend items, then employs Transformers to capture seasonal features and multilayer perceptrons (MLPs) to model trend features. These modeled short-term features are then fused and downsampled into long-term relationships through the Long-term Fusion module, followed by a channel-wise Transformer for long-term cointegration across channels. Extensive experiments on various real-world benchmarks have verified the superiority of our model over other state-of-the-art baselines.
Keyword: Non-stationary Time Series, Cross-time Dependencies, Cross-channel Dependencies
Cite@inproceedings{ICIC2025,
author = {Zhuang Xing},
title = {S2LNet: Review the Non-Stationarity in Multivariate Time Series Forecasting},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1-10},
}
- Adaptive Knowledge Distillation with Dynamic Weight Allocation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shaokang Zhang, Yixin Zhang, Ning Ran, and Liang Wang
Abstract: Knowledge distillation has become a key technique for compressing pre-trained language models. However, existing methods suffer from two limitations. First, the student model can only imitate the teacher model, while the teacher cannot adapt to the ability of the student model. Second, the student model should focus on learning the knowledge that it is unfamiliar with, yet existing methods that distill all the knowledge of the teacher model may introduce redundant information. To address these issues, we propose Dynamic Weighted Adaptive Knowledge Distillation, which can adaptively update the teacher model and weight the distillation. Specifically, the teacher model is updated according to feedback on the performance of the distilled student model on an independent quiz dataset. We introduce a dynamic weight assignment mechanism that controls the knowledge learned by the student model based on the difference between the teacher model and the student model. Experimental results show that our method outperforms several state-of-the-art methods on multiple datasets.
Keyword: Knowledge Distillation, Pre-trained Language Models, Adaptive Weight Distillation
Cite@inproceedings{ICIC2025,
author = {Shaokang Zhang, Yixin Zhang, Ning Ran, and Liang Wang},
title = {Adaptive Knowledge Distillation with Dynamic Weight Allocation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {994-1006},
}
- Research on Privacy-Preserving Action Recognition Method Based on Adversarial Learning and Feature Enhancement, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xiaohan Qi and Xingang Wang
Abstract: Video action recognition technology provides technical support for automated monitoring and early event warning, liberating human resources, preventing anomalies in advance, and handling events more promptly. However, the issue of user privacy leakage is also a growing concern. This paper discusses how to conduct action recognition while protecting privacy, proposing a generative adversarial network architecture that combines a multi-scale feature fusion generator with a spatiotemporal consistency discriminator. The network continuously enhances its capabilities through adversarial training strategies to protect facial privacy in videos. It extracts features at different levels in the video through a multi-scale feature fusion mechanism, so that the video after facial privacy protection still maintains a high degree of realism. At the same time, to ensure that the accuracy of action recognition is not compromised, a feature enhancement module is designed to enhance action features and significantly inhibit privacy features. C3D is used as the action recognition model to accurately recognize a variety of actions in the video, such as running, jumping, and falling, enabling action analysis of video content. The proposed method is evaluated in terms of privacy protection level and action recognition performance. Experiments are conducted on three datasets, LFW, HMDB51, and Hollywood2, and the results show that this framework effectively protects personal information while maintaining high action recognition accuracy.
Keyword: Multi-scale feature fusion, Facial privacy protection, Action recognition, Feature enhancement
Cite@inproceedings{ICIC2025,
author = {Xiaohan Qi and Xingang Wang},
title = {Research on Privacy-Preserving Action Recognition Method Based on Adversarial Learning and Feature Enhancement},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {89-103},
note = {Poster Volume Ⅰ}
}
- Multi-Scale Contrastive Adapter for Vision-Language Model Group Robustness, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yue Cai, Wenqiong Zhang, and Yikai Wang
Abstract: While vision-language models (VLMs) like CLIP demonstrate strong zero-shot classification capabilities, their robustness to group shifts remains a critical challenge, as classification accuracy degrades significantly for minority groups. Existing methods to improve group robustness often require costly full-model retraining or rely on single-scale feature representations, which may inadequately capture diverse group characteristics. We propose the Multi-Scale Contrastive Adapter (MSCA) and related modules, a novel framework designed to improve group robustness in VLMs at lower computational cost. MSCA employs a multi-scale feature representation strategy, leveraging contrastive learning across multiple dimensions to alleviate the model's group shift on the dataset in several different dimensional spaces. A feature voting mechanism is introduced to dynamically select the most relevant feature dimensions during inference, further improving group robustness. Experiments across benchmarks (Waterbirds, CelebA, CIFAR-10.02) show that MSCA significantly improves the worst-group accuracy to 86.1% and reduces GAP from 55.2% to 4.1%, outperforming recent advanced methods like FairerCLIP. Our findings highlight that MSCA offers a practical pathway toward more robust vision-language models.
Keyword: Group Robustness, Vision-Language Models, Multi-Scale
Cite@inproceedings{ICIC2025,
author = {Yue Cai, Wenqiong Zhang, and Yikai Wang},
title = {Multi-Scale Contrastive Adapter for Vision-Language Model Group Robustness},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {104-118},
note = {Poster Volume Ⅰ}
}
- Unlocking CLIP for Generalized Deepfake Detection with Dynamic Mixture-of-Adapters, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jialong Liu, Guanghui Li, and Chenglong Dai
Abstract: The rapid development of deepfake technology has raised significant security and ethical concerns, requiring robust and generalizable detection methods. In this work, we propose a novel framework for deepfake detection that leverages the power of large-scale pre-trained vision-language models, specifically the Contrastive Language–Image Pre-training (CLIP) model. Our approach fine-tunes the CLIP image encoder for deepfake detection by introducing a Dynamic Mixture-of-Adapters (MoA) architecture, which consists of multiple lightweight, domain-specific adapter modules that are dynamically activated based on input images. To further improve cross-domain performance, we introduce three auxiliary regularization terms for fine-tuning: attention alignment and similarity regularization, which enforce consistency in feature extraction, and cached domain regularization, which preserves domain-specific prototypes. The proposed framework effectively balances domain-specific adaptation and generalization, addressing critical challenges in generalized deepfake detection. Extensive experiments on benchmark datasets, including FaceForensics++, CelebDF, DFDC, DFD, and DiFF, show that our method performs well in both in-domain and cross-domain deepfake detection tasks.
Keyword: Deepfake Detection, Dynamic Mixture-of-Adapters, Contrastive Language–Image Pre-training (CLIP)
Cite@inproceedings{ICIC2025,
author = {Jialong Liu, Guanghui Li, and Chenglong Dai},
title = {Unlocking CLIP for Generalized Deepfake Detection with Dynamic Mixture-of-Adapters},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {119-135},
note = {Poster Volume Ⅰ}
}
- Pipeline Method for Domain-specific Language Generation in Low-code Platforms Using Large Language Models, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xin Cui, Weixing Zhang, Linnan Jiang, Aimin Pan, and Fei Yang
Abstract: Advancements in language models, particularly Large Language Models (LLMs), have propelled the evolution of front-end low-code platforms, transitioning from the traditional drag-and-drop approach to an automated, code-based generation process built on a Domain-Specific Language (DSL). Within this context, the objective becomes generating the appropriate DSL from textual descriptions using large language models. Nonetheless, because DSL data is limited, challenges persist in training or fine-tuning LLMs for some DSL generation tasks, such as those of front-end low-code platforms. This study proposes a novel pipeline approach for DSL generation that takes advantage of the potential of prompt engineering. The methodology utilizes Named Entity Recognition (NER), a DSL knowledge vector database, and LLMs. The experiments demonstrated significant improvements in the quality of DSL generation while reducing token and time costs.
Keyword: DSL generation, LLMs, Vector database, In-context learning, Prompt engineering
Cite@inproceedings{ICIC2025,
author = {Xin Cui, Weixing Zhang, Linnan Jiang, Aimin Pan, and Fei Yang},
title = {Pipeline Method for Domain-specific Language Generation in Low-code Platforms Using Large Language Models},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1007-1020},
}
- Personalized federated prototype learning in mixed heterogeneous data scenarios, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jiahao Zeng, Wolong Xing, Liangtao Shi, Xin Huang, Jialin Wang, Zhile Cao, and Zhenkui Shi
Abstract: Federated learning has received significant attention for its ability to simultaneously protect customer privacy and leverage distributed data from multiple devices for model training. However, conventional approaches often focus on isolated heterogeneous scenarios, resulting in skewed feature distributions or label distributions. Meanwhile, data heterogeneity is actually a key factor in improving model performance. To address this issue, we propose a new approach called PFPL in mixed heterogeneous scenarios. The method provides richer domain knowledge and unbiased convergence targets by constructing personalized, unbiased prototypes for each client. Moreover, in the local update phase, we introduce consistent regularization to align local instances with their personalized prototypes, which significantly improves the convergence of the loss function. Experimental results on Digits and Office Caltech datasets validate the effectiveness of our approach and successfully reduce the communication cost.
Keyword: Skewed label distribution, Skewed distribution of features, Personalized Federated Learning, data heterogeneity
Cite@inproceedings{ICIC2025,
author = {Jiahao Zeng, Wolong Xing, Liangtao Shi, Xin Huang, Jialin Wang, Zhile Cao, and Zhenkui Shi},
title = {Personalized federated prototype learning in mixed heterogeneous data scenarios},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {579-594},
}
- Joint Deployment Optimization of Fixed and Vehicle-Mounted Edge Servers for Urban Internet of Vehicles, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xuyang Chen, Zhihai Tang, Jingtong Chen, Wei Song, Aiwen Huang, Le Chang, and Heng Li
Abstract: In the Internet of Vehicles (IoV), user computation demand varies spatially and temporally. Thus, traditional static edge servers with fixed capacity at fixed sites lack the flexibility to handle such user dynamics. To this end, we study the joint deployment optimization of fixed and vehicle-mounted edge servers for an IoV system, where fixed servers (FESes) offer the basic coverage of the computation offloading service, and vehicle-mounted edge servers (VESes) focus on serving demand hotspots on the move. We first design the GICUNet traffic flow prediction model to precisely forecast future traffic. Next, we allocate the computation capacity to each FES using Bayesian Optimization to minimize the deployment cost. We then design a Mobile Server scheduling algorithm based on Bipartite Graph Rematching (MS-BGR) to plan short-distance paths for the VESes that cover most of the user demand. Experimental results show that our solution outperforms existing popular algorithms in traffic prediction accuracy, adaptability to spatio-temporally dynamic user demand, and energy efficiency of the VES travel paths.
Keyword: Internet of Vehicles (IoV), Edge Computing, Vehicle-mounted Edge Servers, Traffic Prediction, Deployment, Path planning
Cite@inproceedings{ICIC2025,
author = {Xuyang Chen, Zhihai Tang, Jingtong Chen, Wei Song, Aiwen Huang, Le Chang, and Heng Li},
title = {Joint Deployment Optimization of Fixed and Vehicle-Mounted Edge Servers for Urban Internet of Vehicles},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {185-202},
}
- When Coordinated Knowledge Distillation Meets Mixture of Expert Inference: Insights from Portfolio Optimization, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jianzeng Song, Senjie Xia, Yong Zhang, Jie Wei, and Jianfei Yin
Abstract: To achieve stable profits in uncertain financial environments characterized by pervasive noise signals, unavoidable transaction costs and zero-sum dynamics, it is crucial to construct optimized portfolios based on comprehensive data processing. However, existing methods often overlook the importance of learning hedge financial knowledge from data and leveraging mixture-of-expert (MoE) inferences to maximize agent profitability. To address this issue, we propose the Coordinated Knowledge Distillation and Inference Framework (CKDIF). CKDIF introduces a three-dimensional discrete coordinate system to train deep reinforcement learning agents with hedge trading behaviors, enabling the effective distillation of underlying micro-financial knowledge directly from noisy financial data. Furthermore, CKDIF constructs a novel ensemble of MoE networks by harnessing these pretrained agents and uses the ensemble to make the final portfolio selection across any asset dimension. Notably, with transaction costs set at a realistic rate of 0.1%, CKDIF outperforms eight representative algorithms on five out of six real-world financial datasets. It achieves an average cumulative wealth and Calmar ratio that are 1.66 and 3.70 times higher, respectively, compared to the buy-and-hold strategy. These results underscore the potency of coordinated knowledge distillation and MoE inference in enhancing agent performance in competitive environments.
Keyword: Coordinated Knowledge Distillation, Mixture of Experts, Portfolio Optimization
Cite@inproceedings{ICIC2025,
author = {Jianzeng Song, Senjie Xia, Yong Zhang, Jie Wei, and Jianfei Yin},
title = {When Coordinated Knowledge Distillation Meets Mixture of Expert Inference: Insights from Portfolio Optimization},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3478-3492},
}
- Enhancing Hallucination Detection in Large Language Models through a Dual-Position Debate Multi-Agent Framework, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Qile He and Siting Le
Abstract: Large language models (LLMs) have demonstrated remarkable capabilities but are prone to generating factual inconsistencies, or hallucinations. Addressing this challenge is crucial for the reliable deployment of LLMs. This paper introduces a novel Dual-Position Debate (DPD) framework designed to enhance the veracity of LLM-generated content and mitigate hallucinations. DPD simulates a human debate by organizing agents into affirmative and negative teams, each comprising information gatherers, rebutters, analysts, and a summarizer. These agents collaboratively construct arguments, critique opposing viewpoints, and synthesize their findings. Furthermore, multiple independent LLMs act as referees, evaluating the debate and rendering judgments to ensure fairness and encourage rigorous information scrutiny. Extensive experiments across question answering, summarization, and dialogue tasks demonstrate the efficacy of the DPD framework, which outperforms existing baseline methods in reducing hallucinations.
Keyword: Large language models, Hallucination Detection, Multi-Agent Debate
Cite@inproceedings{ICIC2025,
author = {Qile He and Siting Le},
title = {Enhancing Hallucination Detection in Large Language Models through a Dual-Position Debate Multi-Agent Framework},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {292-305},
}
- Talk2Doc: A Patient Q&A system using Retrieval-Augmented Generation with Weighted Knowledge Graphs and LLMs, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Asad Khan, Zafar Ali, Irfanullah, Abdul Aziz, and Pavlos Kefalas
Abstract: Effective retrieval in patient question-answering (Q&A) systems is essential for addressing complex medical and healthcare inquiries. Traditional retrieval-augmented generation (RAG) methods leveraging large language models (LLMs) treat historical dialogues and issue-tracking tickets as unstructured text, overlooking critical intra- and inter-issue structures and semantic relationships. This limitation often reduces the contextual relevance and accuracy of generated responses. This paper introduces Talk2Doc, a novel Q&A system that combines RAG, weighted knowledge graphs (KGs), and LLMs to enhance response quality and contextual understanding in healthcare applications. In particular, Talk2Doc constructs a weighted KG from patient questions, preserving both intra-issue structures and inter-issue relationships. By retrieving relevant subgraphs, the system generates precise, contextually aware answers, effectively mitigating the drawbacks of fragmented text representations. The proposed system was rigorously evaluated using standard retrieval metrics (NDCG@K, Recall@K, MRR) and text generation metrics (BLEU, METEOR, ROUGE). Results show that Talk2Doc significantly outperforms existing approaches, improving answer accuracy and maintaining the structural integrity of patient dialogue information. By prioritizing semantic relationships among medical entities, Talk2Doc refines retrieval performance, ensuring high-quality responses. Scalable across diverse medical domains and languages, Talk2Doc represents a transformative advancement for healthcare Q&A systems.
Keyword: Weighted Knowledge Graph, Large Language Model, Retrieval Augmented Generation, Question Answering
Cite@inproceedings{ICIC2025,
author = {Asad Khan, Zafar Ali, Irfanullah, Abdul Aziz, and Pavlos Kefalas},
title = {Talk2Doc: A Patient Q\&A system using Retrieval-Augmented Generation with Weighted Knowledge Graphs and LLMs},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {595-610},
}
- Adaptive and High-security Image Steganography via Adversarial Embedding, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jibin Zheng, Li Ma, Wenyin Yang, Fen Liu, and Jihui Li
Abstract: Existing adversarial embedding methods, which are based on a distortion cost function and syndrome trellis codes (STCs), achieve adversarial embedding by manually adjusting the embedding cost. To address the limitation of hand-crafted embedding costs, we propose an end-to-end adversarial image steganography method, which automatically achieves adversarial embedding by using the gradients from the steganalytic network. We utilize gradients of the stego image to generate the adversarial embedding mask, then integrate it with the loss function to guide the secret messages to be embedded into specific security-enhanced regions. Compared with several state-of-the-art steganography methods, extensive experimental results demonstrate that our method significantly improves the security performance against convolutional neural network (CNN)-based steganalyzers and re-trained steganalyzers. For example, against steganalyzers, the security improvement of our method in terms of detection accuracy is 30.68% higher than that of the SOTA steganography methods at 0.4 bpp (bits per pixel).
Keyword: Image steganography, adversarial sample, invertible neural network
Cite@inproceedings{ICIC2025,
author = {Jibin Zheng, Li Ma, Wenyin Yang, Fen Liu, and Jihui Li},
title = {Adaptive and High-security Image Steganography via Adversarial Embedding},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1044-1056},
note = {Poster Volume Ⅰ}
}
- MECFE: A Novel Consensus Feature Engineering Approach for Enhanced Diabetes Risk Prediction, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jijun Tong, Lisi Ye, Congcong Yang, Yang Chen, Yuqiang Shen, Qingli Zhou, and Shudong Xia
Abstract: Diabetes mellitus has emerged as a global health crisis, with its prevalence rising sharply and placing significant strain on healthcare systems. Early and accurate prediction of diabetes risk is crucial for effective prevention and management. While machine learning and deep learning techniques have made advancements in diabetes prediction, feature engineering remains an underemphasized area and faces several challenges, particularly in terms of interpretability, depth, and stability. This study introduces a novel Medical Enhancement Consensus Feature Engineering (MECFE) approach to enhance the accuracy and interpretability of diabetes detection. MECFE integrates medical knowledge with data-driven approaches through two core modules: Medical-Data Collaborative Feature Construction (MD-CFC) and Heterogeneous Model Consensus Feature Selection (HM-CFS). MD-CFC enriches feature construction using structured medical knowledge and ClinicalBERT, while HM-CFS utilizes a three-layer weighted fusion strategy that combines evaluations from heterogeneous models, enhancing stability and clinical relevance. Building on data preprocessing, the application of MECFE improves the quality of model inputs, thereby enhancing model performance. The results show significant improvements in the performance metrics of all models, with Bayesian-optimized LightGBM achieving the best results: R² increasing by 0.108, RMSE decreasing by 23.66%, MSE reducing by 41.75%, and MAPE dropping by 16.42%, demonstrating the effectiveness of MECFE. Additionally, feature importance analysis and a regression tree in LightGBM are employed to further enhance the model's interpretability, providing deeper insights into the factors influencing diabetes risk.
Keyword: Diabetes Prediction Feature Engineering Heterogeneous Model LightGBM ClinicalBERT Interpretability.
Cite@inproceedings{ICIC2025,
author = {Jijun Tong and Lisi Ye and Congcong Yang and Yang Chen and Yuqiang Shen and Qingli Zhou and Shudong Xia},
title = {MECFE: A Novel Consensus Feature Engineering Approach for Enhanced Diabetes Risk Prediction},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1231-1244},
}
- Online Knowledge Distillation with Feature Disentanglement, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yifan Li, Zhengzhong Zhu, Pei Zhou, Kejiang Chen, and Jiangping Zhu
Abstract: Knowledge distillation is a method that trains a student model to approximate the performance of a teacher model. However, in real-world applications, the architectural discrepancy between teacher and student models often impedes the comprehensive transfer of knowledge from teacher to student. Moreover, the reduced number of learnable parameters in student models makes it harder to acquire high-dimensional knowledge from teacher models: because the teacher model's high-dimensional features are complex and redundant, the student model may have difficulty learning them. To address this challenge, this study proposes a knowledge distillation method based on variational autoencoders (VAEs). We use a VAE to compress the teacher model's high-dimensional features into low-dimensional robust features, which are extracted and transferred to the student model through the variational autoencoder loss function. Experimental results show that student models using this method achieve significant performance improvements on multiple benchmark datasets. Our research indicates that the low-dimensional robust features extracted by the VAE can effectively enhance the student model's learning process, providing a new approach for knowledge distillation tasks.
Keyword: Knowledge Distillation, Variational Autoencoders, Knowledge Transfer
Cite@inproceedings{ICIC2025,
author = {Yifan Li and Zhengzhong Zhu and Pei Zhou and Kejiang Chen and Jiangping Zhu},
title = {Online Knowledge Distillation with Feature Disentanglement},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1602-1616},
note = {Poster Volume Ⅱ}
}
- Distributed Hierarchical Structure for Multi-Objective OCR Text Recognition in Electrical Cabinet Inspection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zhixin Kang, Shu Lin, Jungang Xu, and Qingjie Kong
Abstract: Electrical cabinet label recognition is an important part of robotic intelligent inspection, and accurate label recognition is a prerequisite for effectively recording inspection anomalies. After the inspection robot takes pictures, OCR (Optical Character Recognition) can detect the location of the electrical cabinet labels and recognize the content on the labels. First, we use the oCLIP method to reduce model training time and the required dataset size. Second, on the electrical cabinet label dataset, text location detection based on the cutting-edge model DBNet++ reaches an accuracy of 96.74%, and text content recognition based on ABINet reaches an accuracy of 89.33%. Through comparative experiments, we found that applying only the ABINet visual model improves the text recognition accuracy to 90.58%, indicating that the language model in ABINet does not perform well for this task. The ABINet visual model is better at extracting local information from the image text, while the ABINet language model excels at recognizing semantic relationships across different parts of the text. Third, leveraging this characteristic, we designed ABINet-TS, a distributed hierarchical structure for the multi-objective OCR text recognition framework. In the first layer, the visual model recognizes local information from the image, while in the second layer, the language model correlates and corrects the predictions made by the visual model. This further improves the text recognition accuracy to 91.74%. Replacing the language model in ABINet-TS with BERT further improves the text-line accuracy to 92.16%.
Keyword: OCR, Text recognition, Text detection, Deep learning.
Cite@inproceedings{ICIC2025,
author = {Zhixin Kang and Shu Lin and Jungang Xu and Qingjie Kong},
title = {Distributed Hierarchical Structure for Multi-Objective OCR Text Recognition in Electrical Cabinet Inspection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3168-3183},
}
- A Lightweight Real-time Detection Algorithm for Drone WiFi Hijacking, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jingxian Zhou and Chengcheng Shangguan
Abstract: Because WiFi protocols prioritize usability over security and adopt weak encryption under resource constraints, consumer drones are vulnerable to malicious hijacking attacks when communicating with ground stations via WiFi. To address the scarcity of real attack samples in this field, the research team carried out drone WiFi hijacking attack experiments and built the first real drone WiFi hijacking attack dataset, covering multiple attack types such as De-Authentication attacks. Considering the limited computing resources of drones and the strict real-time requirements of their communications, the team used the self-built dataset and public datasets to conduct multi-dimensional comparative experiments on existing algorithms and selected XGBoost, which balances detection accuracy and light weight, as the base framework. On top of it, the team designed a three-level feature screening mechanism (variance threshold filtering, high-correlation elimination, and Boruta feature selection) and introduced a weighted cross-entropy loss function to optimize learning performance, yielding a lightweight real-time detection algorithm for drone WiFi hijacking. The experimental results show that this method can effectively detect drone WiFi hijacking attack traffic and that its overall performance is better than that of the existing algorithms. Compared with the original XGBoost method, the proposed method reaches an accuracy of 96.3% and halves the inference time, combining high accuracy with light weight.
Keyword: UAV WiFi Hijacking, Intrusion Detection System, Feature Selection Mechanism, Weighted Cross-Entropy.
Cite@inproceedings{ICIC2025,
author = {Jingxian Zhou and Chengcheng Shangguan},
title = {A Lightweight Real-time Detection Algorithm for Drone WiFi Hijacking},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1057-1073},
note = {Poster Volume Ⅰ}
}
- ASTPGFN: A Parallel Gating Fusion Framework with Adaptive Spatio-Temporal Modeling for Human Activity Recognition, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yan Mao, Guoyin Zhang, and Cuicui Ye
Abstract: Effective spatiotemporal modeling is crucial for Human Activity Recognition (HAR) using wearable sensors. This paper proposes a novel HAR method integrating a feature enhancement layer, a spatiotemporal gated fusion module, and a fine-grained spatiotemporal segmentation attention module. The feature enhancement layer transforms input data to improve its representational capacity. The spatiotemporal gated fusion module extracts global spatiotemporal features using a Transformer-based temporal encoder and a residual Graph Convolutional Network (GCN)-based spatial encoder, with an adaptive gating mechanism for feature fusion. The fine-grained segmentation attention module further refines local spatial and temporal features to enhance feature interaction. The fully integrated features are then classified using a fully connected layer. Experimental results on multiple public datasets demonstrate that the proposed method outperforms conventional approaches in terms of recognition accuracy, robustness, and generalization. This model provides an efficient and adaptive solution for HAR using wearable sensors.
Keyword: Wearable sensors, Human Activity Recognition, Spatiotemporal modeling, Graph convolutional network, Transformer.
Cite@inproceedings{ICIC2025,
author = {Yan Mao and Guoyin Zhang and Cuicui Ye},
title = {ASTPGFN: A Parallel Gating Fusion Framework with Adaptive Spatio-Temporal Modeling for Human Activity Recognition},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {611-623},
}
- Alleviating Distribution Shift in Time Series Forecasting with an Invertible Neural Network Transformation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zhiyuan Deng, Zhe Wu, Li Su, Yiling Wu, and Qingfang Zheng
Abstract: The distribution of time series data changes over time, posing challenges for accurate time series forecasting. One common approach to tackling distribution shift is to transform the data into a latent space where the impact of the shift is minimized. However, existing methods depend heavily on empirical distribution assumptions and lack guidance on the latent space, leading to sub-optimal performance gains. To tackle these challenges, we propose a new transformation technique that explicitly mitigates the distribution shift between historical and forecast data without any distribution assumptions. Specifically, an Invertible Neural Network Transformation (INNT) is designed to convert data into a smooth latent space. The INNT is constructed to be bidirectional and reversible through a temporal slicing mechanism, thereby preserving all information from the original data. Moreover, the transformation process is guided by a pretraining strategy that aims at reducing distribution divergence within the latent space. Additionally, the proposed method is model-agnostic, allowing for seamless integration into various existing forecasting models. Extensive experiments are conducted to validate the accuracy and generalization of the proposed framework.
Keyword: Multivariate Time Series Forecasting, Data Normalization, Distribution Shift.
Cite@inproceedings{ICIC2025,
author = {Zhiyuan Deng and Zhe Wu and Li Su and Yiling Wu and Qingfang Zheng},
title = {Alleviating Distribution Shift in Time Series Forecasting with an Invertible Neural Network Transformation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {79-93},
}
- Micro-Grained Feature Enhanced Network for High-Accuracy Counterfeit Identification in Luxury Products, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Peng Yan, Gang Wang, Yu Yang, Linna Zhou, and Xiangli Meng
Abstract: Significant progress has been made in luxury goods authentication using fine-grained image classification to exploit morphological differences at logo authentication points. However, with the advancement of luxury counterfeiting techniques, traditional fine-grained methods based on logo morphological differences face severe challenges: the micro-level differences between high-quality counterfeits and genuine products have approached the limit of human visual resolution. To address this, this paper proposes an authentication approach focusing on the micro-grained differences between authentic and counterfeit luxury goods. The key challenge is that existing deep learning models are distracted by macro signals, making it difficult to represent and extract micro-grained information. To overcome this, we design a Micro-Grained Feature Enhancement Module and a Multi-Scale Feature Learning Network. The former introduces a background replacement mechanism that generates diverse backgrounds for semantically identical foregrounds, preventing the model from relying on macro background information to establish decision boundaries and forcing it to focus on micro-grained differences in the foreground. The latter proposes a multi-scale feature capture module that establishes an adaptive key region localization and multi-scale feature fusion mechanism, combined with a weighted voting strategy to enhance classification robustness. Experimental results on two luxury goods datasets demonstrate that the diverse background mechanism and multi-scale feature fusion significantly enhance the representation of micro-grained features. Visualization results show that the model's attention shifts from distracting strong-signal background regions to enhanced micro-grained foreground regions, significantly improving its ability to distinguish high-precision counterfeits and greatly enhancing the credibility of its decisions.
Keyword: Micro-grained feature enhancement, High-quality counterfeit authentication, Multi-scale classification
Cite@inproceedings{ICIC2025,
author = {Peng Yan and Gang Wang and Yu Yang and Linna Zhou and Xiangli Meng},
title = {Micro-Grained Feature Enhanced Network for High-Accuracy Counterfeit Identification in Luxury Products},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3183-3198},
}
- Road Damage Detection Method based on Improved YOLO-World, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yuli Zhou, Lei Li, Yushan Ma, Wen Ya, Bin Gu, and Guanchun Song
Abstract: Artificial Intelligence (AI)-driven road damage detection is a crucial component of intelligent transportation and smart cities. Given the outstanding performance of the You Only Look Once (YOLO) model in computer vision tasks, intelligent road damage detection technologies based on YOLO are currently among the mainstream methods. However, existing methods suffer from low accuracy and poor real-time performance when dealing with small objects and complex backgrounds. To alleviate these issues, this paper proposes an intelligent road damage detection method based on an improved YOLO-World model. First, the Spatial Pyramid Pooling Cross Stage Partial Channel (SPPCSPC) convolutional structure and the FasterNet architecture are introduced into the detection backbone of the YOLO-World model, with the aim of simultaneously enhancing the model's ability to extract multi-scale features and its detection speed. Second, the Convolutional Block Attention Module (CBAM) is introduced into the detection head to improve the model's ability to extract key features. Finally, experimental results on the constructed complex road damage dataset show that the improved YOLO-World model outperforms existing state-of-the-art methods in terms of accuracy and detection speed. In particular, the mAP50 of the improved model is 18.2 percentage points higher than that of YOLOv10.
Keyword: Road Damage Detection, Complex Scenarios, YOLO-World, Real-time Detection
Cite@inproceedings{ICIC2025,
author = {Yuli Zhou and Lei Li and Yushan Ma and Wen Ya and Bin Gu and Guanchun Song},
title = {Road Damage Detection Method based on Improved YOLO-World},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3199-3215},
}
- Energy Efficiency Maximization in Wireless Federated Learning under Inter-Channel Interference, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xinjie Yuan, Shengjie Zhao, Weichao Chen, Fengxia Han, Jin Zeng, and Enze Cui
Abstract: Federated Learning (FL) offers a collaborative learning paradigm for a large number of devices while avoiding data centralization, which is particularly advantageous in wireless environments. However, inter-channel interference, a factor not fully explored in existing FL studies, significantly impacts model transmission and poses substantial challenges for resource allocation in the Wireless FL (WFL) framework. Additionally, the limited energy budgets of mobile devices necessitate energy-efficient strategies across both local computation and model transmission phases. To address these challenges, we formulate a joint learning and communication optimization problem aimed at maximizing the system's Energy Efficiency (EE) under given constraints. We address the problem by decomposing it into two sub-problems, power allocation and client selection, and tackling them sequentially. First, a purpose-designed graph neural network (GNN) is employed to parameterize the power allocation strategy, which is optimized through a primal-dual algorithm. Based on the power allocation model, we propose an online algorithm for energy-efficient client selection. Experimental results demonstrate that the proposed method achieves superior EE and reduced energy consumption compared to three baseline methods, while ensuring high-quality wireless transmission and achieving comparable global model accuracy.
Keyword: Federated Learning, Graph Neural Networks, Client Selection, Power Allocation, Energy Efficiency
Cite@inproceedings{ICIC2025,
author = {Xinjie Yuan and Shengjie Zhao and Weichao Chen and Fengxia Han and Jin Zeng and Enze Cui},
title = {Energy Efficiency Maximization in Wireless Federated Learning under Inter-Channel Interference},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1074-1091},
note = {Poster Volume Ⅰ}
}
- Cross-modal Adaptation of medical vision-language model for few-shot classification, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jingyi Wu and S. Kevin Zhou
Abstract: Medical Vision-Language Models (VLMs) show significant potential for auxiliary diagnosis, especially given the continuous growth of medical image data. However, adapting these models effectively with limited labeled data remains a challenge. This paper proposes a cross-modal adaptation method for few-shot medical image classification based on pre-trained VLMs. Our approach leverages both image features and corresponding text features extracted from the pre-trained models to train a classifier head. Furthermore, we employ the SHAP interpretability analysis method to select the most informative text features, thereby enhancing classification performance. We evaluated our method on the CheXpert5x200 dataset using MedCLIP and KAD as foundation models, comparing it against zero-shot classification and uni-modal adaptation (using only image features). Results demonstrate that our approach significantly improves few-shot classification performance over the baselines, and that the SHAP-based feature selection provides additional gains. Ultimately, we present a general, simple, and efficient cross-modal adaptation strategy that enhances medical VLM performance using only a small number of image samples, contributing to more reliable AI-powered diagnostic tools.
Keyword: Vision-Language Model · Cross-Modal Adaptation · Few-shot Learning
Cite@inproceedings{ICIC2025,
author = {Jingyi Wu and S. Kevin Zhou},
title = {Cross-modal Adaptation of medical vision-language model for few-shot classification},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {624-634},
}
- DRC-YOLO: An Improved Fire Detection Algorithm Based on YOLO11, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Wentao Li, Cunrui Zou, and Zhiguo Zhou
Abstract: Fire detection plays an important role in safety and loss reduction and is widely used in scenarios such as forests, industrial facilities and urban environments. However, fire detection faces many challenges, including the diversity of flame appearance, dynamic and unpredictable behaviour, and the complexity of distinguishing flames from similar visual phenomena. Existing fire detection algorithms generally suffer from low detection accuracy, slow processing speed, and poor adaptability to complex backgrounds. To address these limitations, we propose a fire detection algorithm called DRC-YOLO, an enhanced model based on YOLO11. First, we replace some standard convolution blocks with dynamic convolution layers, which improves the detection accuracy of irregular fire regions while maintaining the lightweight design of the model. Second, we integrate CBAM into the detection head and strengthen it with residual connections, improving the network's ability to localise fire-affected regions and its robustness. Finally, we enhance the spatial pyramid structure by simulating large-kernel convolution operations, significantly expanding the model's receptive field while improving multi-scale feature extraction capability and maintaining computational efficiency. Extensive experiments on the M4SFWD dataset show that DRC-YOLO improves the AP by 2.6%, the AR by 2.2% and the mAP@50 by 1.8% over the baseline model.
Keyword: Fire Detection, YOLO11, Dynamic Convolution, CBAM, LSKA.
Cite@inproceedings{ICIC2025,
author = {Wentao Li and Cunrui Zou and Zhiguo Zhou},
title = {DRC-YOLO: An Improved Fire Detection Algorithm Based on YOLO11},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1618-1635},
note = {Poster Volume Ⅱ}
}
- Automatic Scoring for Elementary Mathematics Solutions Based on Bi-LSTM and Attention Mechanism, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Ke Xu, Yuan Sun, Xi Zhang, Yan Huang, Songfeng Lu, and Yongqiang Zhang
Abstract: Automated scoring for mathematical subjective responses remains challenging due to the inherent complexity of integrating symbolic notations, procedural reasoning, and natural language explanations. Traditional NLP approaches fail to capture domain-specific mathematical semantics and logical dependencies between problem-solving steps. This study proposes a novel neural architecture named BiLSTM-AM that combines Bi-LSTM with hierarchical attention mechanisms to tackle pivotal challenges in the automated scoring of mathematical subjective responses. The architecture models the combination of formulaic expressions and textual narratives through a dual-channel embedding strategy, dynamically allocates weights to pivotal procedural elements using step-level attention, and automates the alignment of student-generated solutions with knowledge graphs constructed from expert solutions. Evaluated on a dataset of 1120 elementary mathematics solutions, the model achieves 90.52% scoring accuracy, outperforming state-of-the-art baselines by 8.17%. The novelty of this research lies in its interpretable attention mechanisms, which offer quantitative means to track the propagation of errors throughout the steps of mathematical solutions. This study contributes to the field of AI-enhanced educational evaluation by introducing a scalable and curriculum-sensitive framework designed for the assessment of open-ended mathematics problems.
Keyword: automatic essay scoring, elementary mathematics questions, problem solving, bidirectional LSTM, attention mechanisms
Cite@inproceedings{ICIC2025,
author = {Ke Xu and Yuan Sun and Xi Zhang and Yan Huang and Songfeng Lu and Yongqiang Zhang},
title = {Automatic Scoring for Elementary Mathematics Solutions Based on Bi-LSTM and Attention Mechanism},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {203-220},
}
- MPNet: Multiscale Compensated Probabilistic Adaptive Style Transfer Network for Underwater Image Enhancement, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shangyan Wang, Jianhua Yin, and Bingrong Xu
Abstract: Underwater images often suffer from color distortion and low contrast due to complex degradation factors, severely limiting their utility in many fields, such as marine exploration. Existing methods predominantly focus on establishing a deterministic mapping from degraded images to enhanced images and frequently overlook the diversity of underwater environments in water types and lighting conditions. Some studies have noticed this problem, but the proposed methods often cause overcorrection of image colors and low contrast. To address these limitations, we propose MPNet, a novel deep learning network capable of restoring color details and improving contrast in underwater images. Specifically, our approach introduces a framework centered around a Probabilistic Adaptive Style Transfer (PAST) module that integrates depthwise separable convolutions for uncertainty-aware enhancement to achieve more generalized color correction and contrast enhancement. Furthermore, a Multiscale Color-Texture Compensation (MCTC) module is developed through texture-color feedback, using parameter-shared SE-Res blocks and cross-layer fusion to mitigate detail loss and color bias in deep networks. Extensive experiments on the UIEB and EUVP datasets demonstrate its superiority over other advanced methods. Qualitative and visual results confirm its ability to effectively restore color and texture details and enhance contrast.
Keyword: Underwater Image Enhancement, Color Correction, Contrast Enhancement, Conditional Variational Autoencoder
Cite@inproceedings{ICIC2025,
author = {Shangyan Wang and Jianhua Yin and Bingrong Xu},
title = {MPNet: Multiscale Compensated Probabilistic Adaptive Style Transfer Network for Underwater Image Enhancement},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {580-591},
note = {Poster Volume Ⅰ}
}
- Prototype-based Bilevel Knowledge Distillation for Online Continual Learning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xiaochen Yang
Abstract: In Online Continual Learning (OCL), all samples arrive sequentially and are seen only once, posing a challenge in balancing the learning of new tasks with the retention of old task knowledge. Traditional methods often ignore the protection of previously learned knowledge while learning new tasks, leading to catastrophic forgetting. On the other hand, some methods focus on minimizing the forgetting of previous knowledge, which hinders the model's ability to effectively learn new knowledge. To balance learning new tasks and preserving old knowledge, we propose a new framework, Prototype-based Bilevel Knowledge Distillation (PBKD). By incorporating hierarchical prototypes and bilevel distillation mechanisms, PBKD enhances the model's ability to distinguish between classes through personalized feature representations and dynamically adjusts the knowledge transfer between teacher and student models. This approach allows for the effective retention of old task knowledge while improving the model's capacity to learn new tasks. Extensive experimental results demonstrate that PBKD achieves a more favorable combination of accuracy and forgetting rate on three benchmark datasets, validating its effectiveness in addressing the trade-off between learning and forgetting in OCL.
Keyword: Continual Learning, Knowledge Distillation, Prototype Learning.
Cite@inproceedings{ICIC2025,
author = {Xiaochen Yang},
title = {Prototype-based Bilevel Knowledge Distillation for Online Continual Learning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {635-649},
}
- Dynamic Encoding Selection: Adaptive Mamba and LLM Fusion for Temporal Knowledge Graph Reasoning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shuchong Wei and Liangjun Zang
Abstract: This paper introduces Dynamic Encoding Selection (DES), a novel framework for temporal knowledge graph reasoning that adaptively fuses representations from state space models and large language models (LLMs). While recent advancements in sequence modeling have improved temporal pattern recognition, they often lack the semantic understanding necessary for comprehensive reasoning. Similarly, large language models possess rich semantic knowledge but struggle with structured temporal dependencies. Our approach leverages the complementary strengths of both paradigms: it employs Mamba's state space architecture to efficiently capture sequential patterns with linear complexity, while using LLMs' pre-trained knowledge for semantic understanding. The key innovation lies in our adaptive fusion mechanism, which dynamically selects between sequential, semantic, or combined representations for each query based on contextual factors like temporal proximity and entity connectivity.
Keyword: Adaptive Representation Fusion, Dynamic Encoding Selection
Cite@inproceedings{ICIC2025,
author = {Shuchong Wei and Liangjun Zang},
title = {Dynamic Encoding Selection: Adaptive Mamba and LLM Fusion for Temporal Knowledge Graph Reasoning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2315-2332},
note = {Poster Volume Ⅱ}
}
- Light but Mighty: When Lightweight Meets Stable Detection in Infrared Small Target Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Pengyuan Zhang, Cheng Zhang, Xubing Yang, Yan Zhang, and Li Zhang
Abstract: With the development of deep learning, infrared small target detection methods have yielded promising results, benefiting from the powerful feature extraction capability of deep neural networks. However, methods with large numbers of parameters are often impractical due to hardware limitations, while existing lightweight models tend to be unstable as they struggle to effectively capture small targets. To cope with these challenges, we propose a novel light but mighty network (Limi-Net), which remains lightweight while capturing small targets stably. First, since targets are rare in infrared images, we propose an Infrared Target Simulator that generates pseudo targets for data augmentation, helping the model to better learn and recognize small targets. Then, a Lightweight Stable Encoder is designed to guarantee reliable feature extraction from diverse receptive fields, improving the discrimination of small targets and reducing memory consumption. In addition, we introduce a Coarse-to-fine Hybrid Upsampling Decoder that combines a dual upsampling fusion method and a coarse-to-fine alignment mechanism to integrate multi-scale features while preserving critical information. Extensive experiments demonstrate that Limi-Net achieves state-of-the-art (SOTA) performance while maintaining a lightweight architecture, making it well-suited for practical deployment. Our code is available at https://github.com/Arrosw/LimiNet.
Keyword: Infrared small target, Deep learning, Lightweight, Coarse to fine training.
Cite@inproceedings{ICIC2025,
author = {Pengyuan Zhang and Cheng Zhang and Xubing Yang and Yan Zhang and Li Zhang},
title = {Light but Mighty: When Lightweight Meets Stable Detection in Infrared Small Target Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3214-3225},
}
- A Heterogeneous Network Community Detection Method Based on GCN and Social Recommendation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: 张子潇,刘井莲,钟珊,司亚丽,龚声蓉
Abstract: To address the issue of sparse and singular connections between users and communities, we propose a novel heterogeneous network community detection method (GS-HCD) that combines a graph convolutional network (GCN) with social recommendation. First, the heterogeneous information network is transformed into a user-user social network Guu and a user-community binary network Guc based on predefined meta paths. The GCN model architecture is adjusted by adding regularization and appropriate activation functions, which optimizes the networks Guu and Guc. A labeling mechanism is then introduced to merge the two optimized networks and construct a user-community extension graph Gcu. Then, in the extended graph Gcu, meta paths that satisfy the criteria are selected between the target user node and the candidate community nodes, and the AvgSim similarity index is used to calculate the similarity based on these meta paths, forming candidate node pairs. Finally, the vector information of the user-community candidate nodes is fed into a deep-learning-based social recommendation model, which learns to capture the dynamic changes of user interests in social networks and recommends the optimal communities for users. The experimental results on three classical datasets confirm the performance of the GS-HCD model and the accuracy of community detection. Compared with multiple representative methods, the GS-HCD model performs outstandingly in terms of Precision, Recall, and F-score values, and its F1 and AUC values generally exceed the comparative baselines, demonstrating its effectiveness in community detection tasks.
Keyword: Community discovery, Social recommendation, Graph Convolutional Neural Network, Heterogeneous Information Network, Meta path
Cite@inproceedings{ICIC2025,
author = {张子潇,刘井莲,钟珊,司亚丽,龚声蓉},
title = {A Heterogeneous Network Community Detection Method Based on GCN and Social Recommendation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1633-1650},
note = {Poster Volume Ⅱ}
}
- Algorithm Research for Crop Pest and Disease Identification Based on Improved YOLOv8, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: YueHong Lin, Chen Dong, and ShuTing Wei
Abstract: To address the issues of insufficient feature extraction in complex environments, missed detections of small-scale pests and diseases, and inadequate multi-scale feature fusion in crop pest and disease detection, this paper proposes an improved YOLOv8-based pest and disease identification algorithm. First, to enhance feature extraction in complex scenarios, the Swin Transformer module was introduced into the YOLOv8 backbone network. Leveraging its hierarchical structure and Shifted Window Multi-Head Self-Attention, the model’s ability to capture global pest and disease features was strengthened. Second, to mitigate missed detections of small-scale pests and diseases, an SE attention module was added to the Neck, enabling adaptive channel-wise feature weighting to enhance feature representation. Finally, the YOLOv8 Concat module was replaced with BiFPN, which uses a learnable bi-directional fusion strategy to optimize cross-scale feature interactions. Experimental results showed that the improved YOLOv8 model excelled in detecting 23 crop pests and diseases, achieving 96.6% precision and 98.2% mAP50. It also maintained high accuracy and efficiency under challenging conditions such as overcast or strong lighting, demonstrating strong application potential.
Keyword: Pest and disease detection, YOLOv8, Swin Transformer, BiFPN, Multi-scale feature fusion
Cite@inproceedings{ICIC2025,
author = {YueHong Lin, Chen Dong, and ShuTing Wei},
title = {Algorithm Research for Crop Pest and Disease Identification Based on Improved YOLOv8},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3230-3246},
}
- Evolutionary Replay-Driven Federated Class-Incremental Learning for Cyber-attack Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Junyan Su, Wenbo Fang, Linlin Zhang, Wengang Ma, Junjiang He, Xiaolong Lan, and Tao Li
Abstract: With the continuous evolution of cyber-attack patterns and strategies, the task of detecting cyber-attacks in real time has become increasingly critical. However, existing replay-based class-incremental learning methods face two fundamental challenges: (a) reliance on continuous sample aggregation may raise concerns about private data leakage, and (b) a lack of careful consideration of evolving cyber-attacks leads to insufficient detection capabilities for attack variants. In this paper, we propose an evolutionary replay-driven federated class-incremental learning method for cyber-attack detection, which effectively enhances the detection of variants in incremental learning tasks while protecting data privacy. Specifically, at task T, each local client trains a classification model and stores prototypical features (‘genes’) for each class, accompanied by a category-specific convolutional autoencoder (CAE) model. Under privacy-preserving mechanisms, a global network attack detection model is trained via federated learning, with subsequent updates propagated to local client models. At task T+1, old knowledge genes are generated from the stored prototypical sample library using a gene evolution strategy and the pre-trained CAE model. These generated features are integrated with new data for the model update. Finally, the detection model is updated again through the federated learning mechanism. Extensive experiments conducted on authoritative datasets demonstrate the effectiveness of our proposed method. Experimental results show that our method achieves 90.16% accuracy in Task 2 and 85.90% accuracy in Task 3. Notably, in Task 3, our method outperforms the random replay method by 4.66%, the GAN-based replay method by 6.91%, and the VAE-based replay method by 22.32%. Code available: https://github.com/sjy722/Evolutionary-Replay-Driven-FCIL
Keyword: Federated Learning, Incremental Learning, Cyber-attack Detection
Cite@inproceedings{ICIC2025,
author = {Junyan Su, Wenbo Fang, Linlin Zhang, Wengang Ma, Junjiang He, Xiaolong Lan, and Tao Li},
title = {Evolutionary Replay-Driven Federated Class-Incremental Learning for Cyber-attack Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2517-2534},
}
- Representative Chain-of-Reasoning Framework for Aspect Sentiment Quad Prediction, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jiajian Li, Zhongquan Jian, Yancheng Wang, Qingqiang Wu, and Meihong Wang
Abstract: Aspect Sentiment Quad Prediction (ASQP) is a crucial sentiment analysis task that has attracted increasing attention. The most recent studies focus on generating complete sentiment quadruples through end-to-end generative models. However, these methods depend heavily on the quality and quantity of labeled data, perform poorly in low-resource scenarios, and are less suitable for real-world applications. To address these issues, we propose a novel Representative Chain-of-Reasoning framework (RCR), with the aim of providing representative knowledge for large language models (LLMs) and fully activating their reasoning capabilities for ASQP. Specifically, we develop a Chain Prompting (ChaPT) module to decompose the ASQP task into three subtasks using a step-by-step reasoning mechanism. Then, a Representative Demonstration Retriever (RepDR) is introduced to provide ChaPT with representative demonstrations, balancing diversity and similarity and enhancing the reasoning capabilities of LLMs at each step. Experimental results demonstrate the superiority of RCR in low-resource scenarios, with its optimal performance even surpassing that of the fully supervised BERT baseline.
Keyword: Aspect sentiment quad prediction, In-Context learning, Demonstration retrieval
Cite@inproceedings{ICIC2025,
author = {Jiajian Li, Zhongquan Jian, Yancheng Wang, Qingqiang Wu, and Meihong Wang},
title = {Representative Chain-of-Reasoning Framework for Aspect Sentiment Quad Prediction},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1021-1038},
}
- Multivariate Time Series Anomaly Detection Model Selection based on Rank Aggregation of Performance Metrics, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Mingyu Liu, Yijie Wang, Xiaohui Zhou, and Yongjun Wang
Abstract: Multivariate time series anomaly detection (MTAD) has significant practical relevance in various applications. Despite the recent proposal of numerous MTAD models, none has demonstrated consistently optimal performance across scenarios. Hence, there is an urgent need to investigate how to accurately select the most appropriate MTAD model for a specific dataset. Most studies on model selection depend on extensive pre-trained models. Nevertheless, in real-world situations, labels for time series anomaly data are seldom accessible, and the training cost of pre-trained models is significant. This paper presents an unsupervised method for selecting a multivariate time series anomaly detection model based on rank aggregation of performance metrics. We create a reliable performance ranking by aggregating rankings from various unsupervised evaluation metrics. Subsequently, an early-stopping mechanism is applied to minimize computational expense by identifying the Top-K models that consistently maintain their ranking in robust performance throughout the epochs. Extensive experiments on six real-world datasets demonstrate that our proposed unsupervised model selection method is comparably effective to the supervised method in selecting the optimal MTAD model.
Keyword: Multivariate time series, Anomaly detection, Model Selection, Rank Aggregation
Cite@inproceedings{ICIC2025,
author = {Mingyu Liu, Yijie Wang, Xiaohui Zhou, and Yongjun Wang},
title = {Multivariate Time Series Anomaly Detection Model Selection based on Rank Aggregation of Performance Metrics},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {650-660},
}
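The rank-aggregation step described in the abstract above can be illustrated with a minimal sketch. It assumes a simple Borda count as the aggregation rule, which is a stand-in chosen for illustration and not necessarily the rule used in the paper; the function name `borda_aggregate` and the metric and model names are hypothetical.

```python
from typing import Dict, List


def borda_aggregate(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Aggregate per-metric rankings with a Borda count (assumed rule).

    scores[metric][model] is a score where higher means better.
    Returns the model names ordered from best to worst.
    """
    # Every metric is assumed to score the same set of models.
    models = sorted(next(iter(scores.values())).keys())
    points = {m: 0 for m in models}
    for metric_scores in scores.values():
        # Rank the models under this metric, best first.
        ranked = sorted(models, key=lambda m: metric_scores[m], reverse=True)
        for position, model in enumerate(ranked):
            # The best model earns n-1 points, the worst earns 0.
            points[model] += len(models) - 1 - position
    # Sort by total points, breaking ties alphabetically.
    return sorted(models, key=lambda m: (-points[m], m))


if __name__ == "__main__":
    # Hypothetical unsupervised metric scores for three candidate models.
    example = {
        "metric_a": {"A": 0.9, "B": 0.5, "C": 0.7},
        "metric_b": {"A": 0.6, "B": 0.8, "C": 0.7},
        "metric_c": {"A": 0.8, "B": 0.2, "C": 0.9},
    }
    print(borda_aggregate(example))
```

A Top-K selection would then take the first K names of the returned list; the early-stopping check on ranking stability across epochs is not covered by this sketch.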
- EHNet: An Efficient Hybrid Network for Crowd Counting and Localization, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yuqing Yan and Yirui Wu
Abstract: In recent years, crowd counting and localization have become crucial techniques in computer vision, with applications spanning various domains. The presence of multi-scale crowd distributions within a single image remains a fundamental challenge in crowd counting tasks. To address this challenge, we introduce the Efficient Hybrid Network (EHNet), a novel framework for efficient crowd counting and localization. By reformulating crowd counting into a point regression framework, EHNet leverages the Spatial-Position Attention Module (SPAM) to capture comprehensive spatial contexts and long-range dependencies. Additionally, we develop an Adaptive Feature Aggregation Module (AFAM) to effectively fuse and harmonize multi-scale feature representations. Building upon these, we introduce the Multi-Scale Attentive Decoder (MSAD). Experimental results on four benchmark datasets demonstrate that EHNet achieves competitive performance with reduced computational overhead, outperforming existing methods on ShanghaiTech Part_A, ShanghaiTech Part_B, UCF-CC-50, and UCF-QNRF. Our code is available at https: anonymous.4open.science' EHNet.
Keyword: Crowd counting, Crowd localization, Efficient Hybrid Networks
Cite@inproceedings{ICIC2025,
author = {Yuqing Yan and Yirui Wu},
title = {EHNet: An Efficient Hybrid Network for Crowd Counting and Localization},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {592-603},
note = {Poster Volume Ⅰ}
}
- STE-YOLO: A Dual Enhancement Architecture for Small Object Detection in Aerial Imagery, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Wancheng He, Yihang Shen, Jun Wan, Peitao Wang, Haohan Ding, and Song Shen
Abstract: Object detection in aerial imagery faces unique challenges due to small object scales, ambiguous textures, and dense distributions. Traditional detection methods often struggle to preserve the structural information of small targets and to use both global context and local details effectively. To address these limitations, we propose STE-YOLO (Small Target Enhancement YOLO), featuring two key innovations: a Multi-level Content-Aware Feature Enhancement Module (MCAFE) that dynamically adjusts feature fusion strategies, and a Local-Enhance Global Attention (LEGA) module that effectively balances global context and local feature representation. Extensive experiments on the VisDrone2019 dataset demonstrate that STE-YOLO significantly outperforms baseline models. Compared to YOLOv10, our method achieves improvements of 10.8% in mAP@0.5 and 11.0% in mAP@0.5:0.95 on VisDrone2019. Additionally, we conducted generalization experiments on the DOTAv1.5 dataset, where our method also shows strong performance, demonstrating the robustness and adaptability of our approach across different aerial imagery scenarios while maintaining acceptable computational overhead.
Keyword: Small object detection, Target recognition, Deep learning, Computer vision, YOLO
Cite@inproceedings{ICIC2025,
author = {Wancheng He, Yihang Shen, Jun Wan, Peitao Wang, Haohan Ding, and Song Shen},
title = {STE-YOLO: A Dual Enhancement Architecture for Small Object Detection in Aerial Imagery},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {604-619},
note = {Poster Volume Ⅰ}
}
- Learning Flexible Job Shop Scheduling with Bidirectional Cross-Attention Network via Deep Reinforcement Learning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xiongxin Zha, Lisha Dong, Muhammad Sadiq, and Qingling Zhu
Abstract: Neural combinatorial optimization (NCO) for solving scheduling problems has gained increasing research attention because it does not rely on expert knowledge. However, existing NCO approaches face significant challenges in the flexible job shop scheduling problem (FJSP), because neural networks struggle to effectively capture the heterogeneous interactions among multiple machines and operations. To address this issue, we propose a bidirectional cross-attention neural architecture trained by deep reinforcement learning. Our approach introduces a dual interaction mechanism that enables simultaneous learning of operation priorities and machine availability constraints. We demonstrate the effectiveness of this approach through extensive experiments, showing its superiority over classical network architectures on both synthetic datasets.
Keyword: Deep Reinforcement Learning, Flexible Job Shop Scheduling Problem, Neural combinatorial optimization, Cross-Attention.
Cite@inproceedings{ICIC2025,
author = {Xiongxin Zha, Lisha Dong, Muhammad Sadiq, and Qingling Zhu},
title = {Learning Flexible Job Shop Scheduling with Bidirectional Cross-Attention Network via Deep Reinforcement Learning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1649-1661},
note = {Poster Volume Ⅱ}
}
- TIFVec: an Image Vectorization Approach Based on Texture Intensity Field, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yongjian Liu, Jiaqi Liu, Jiachen Li, Yanchun Ma, Qing Xie, and Anshu Hu
Abstract: Image vectorization converts raster images into vector graphics and is widely used in various fields. The current state-of-the-art approaches are learning-based models, which aim to establish the correlation between the raster image and a specific number of randomly distributed primitives, such as Bézier curves, through deep learning. However, these methods have not paid attention to the influence of some important factors on vectorization performance, such as the number of primitives and the initial primitive positions. Therefore, the converted vector outputs usually suffer from various shortcomings, such as an excessively high number of primitives, unclear rendering of details, color mean errors, and prolonged primitive optimization time. To address these issues, we propose an image vectorization framework termed TIFVec, which takes Bézier curves as primitives and discovers the interdependent mechanism among different factors. In the framework, we introduce the texture intensity field (TIF), which guides the optimization of the factors above through a primitive initialization strategy and a TIF-based objective function. Based on TIF, the connections among different factors can be constructed, and the performance of vectorization can be effectively improved. The experimental results demonstrate that our method significantly outperforms the current state-of-the-art models across multiple datasets in terms of visual results, evaluation indicators, and primitive optimization time.
Keyword: Image Vectorization, Texture Intensity Field, Vector Graphics
Cite@inproceedings{ICIC2025,
author = {Yongjian Liu, Jiaqi Liu, Jiachen Li, Yanchun Ma, Qing Xie, and Anshu Hu},
title = {TIFVec: an Image Vectorization Approach Based on Texture Intensity Field},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {620-634},
note = {Poster Volume Ⅰ}
}
- RINQC: A Robust Invisible Network for Quick Response Code, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chaoen Xiao, Ruiling Luo, Lei Zhang, Jianxin Wang, and Duo Zhang
Abstract: The widespread application of Quick Response (QR) codes urgently necessitates enhancing their traceability and anti-counterfeiting capabilities. However, traditional QR code-based watermark protection technology is susceptible to interference during cross-media transmission. To address this issue, this paper proposes a QR image watermark algorithm based on the UNet++ network. First, leveraging the multi-angle and high-speed recognition characteristics of QR codes, targeted improvements are made to the network structure and training process, with dilated convolutions incorporated into the encoder to enhance detail precision. Then, refined local discrimination is achieved through the integration of PatchGAN, continuously optimizing the watermark embedding method to improve the imperceptibility of the watermark. Finally, a distortion network mechanism is introduced during the training process to simulate the environment of capturing QR codes from different angles, thereby enhancing the robustness of the images. Experiments demonstrate that the proposed method achieves PSNR and SSIM values of 36.27 dB and 0.978, respectively, with better robustness and imperceptibility.
Keyword: Quick Response Code, Image Watermark, UNet++ Network, Deep Learning.
Cite@inproceedings{ICIC2025,
author = {Chaoen Xiao, Ruiling Luo, Lei Zhang, Jianxin Wang, and Duo Zhang},
title = {RINQC: A Robust Invisible Network for Quick Response Code},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {635-646},
note = {Poster Volume Ⅰ}
}
- HydraMamba: An Efficient and High-Performance Architecture for Time Series Classification through Multi-Mechanism Fusion, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Peiqi Tang, Mengna Liu, Xin Qin, Yutao Jin, and Xu Cheng
Abstract: Multivariate time series classification is a key task in fields such as healthcare, financial analysis, and industrial monitoring. However, existing methods still face challenges in modeling complex dependencies across different time scales, and their computational efficiency is relatively low. To address these issues, we propose an efficient and high-performance model architecture, HydraMamba, which enhances modeling capability by integrating three core mechanisms: the Time Feature Recalibration Module (TFRM) adaptively adjusts the feature weights of time segments to improve the model’s ability to focus on key moments; the Multi-Receptive Field Feature Extractor (MRFFE) extracts local and global information in parallel using receptive fields of different sizes, enhancing feature representation; and the Dynamic State-Space Mixer (DSSM), based on state-space modeling, effectively integrates multi-scale temporal features. We conducted extensive experiments on the UEA multivariate time series classification benchmark datasets, where HydraMamba outperformed mainstream methods such as TodyNet on the Heartbeat dataset, achieving an F1 score of 0.898. The experimental results show that HydraMamba maintains high computational efficiency while offering superior classification performance, demonstrating strong generalization ability and application potential.
Keyword: Time Series Classification, Multiscale, Spatial Model
Cite@inproceedings{ICIC2025,
author = {Peiqi Tang, Mengna Liu, Xin Qin, Yutao Jin, and Xu Cheng},
title = {HydraMamba: An Efficient and High-Performance Architecture for Time Series Classification through Multi-Mechanism Fusion},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2052-2063},
note = {Poster Volume Ⅱ}
}
- Arbitrary-Scale Super-Resolution for Remote Sensing Images with Multi-Branch Feature Enhancement and Scale-Specific Dictionary Attention, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xiaoxuan Ren, Qianqian Wang, Xin Jin, and Qian Jiang
Abstract: For remote sensing image processing, high-quality images are particularly essential because of their more detailed texture information and clearer edges. Image super-resolution (SR) can reconstruct a high-resolution (HR) image from its low-resolution (LR) counterpart, overcoming the limitations of devices and environmental conditions. The shortcoming of most SR methods is that they can only be applied to fixed-scale SR tasks, which raises training and deployment costs. Therefore, arbitrary-scale SR approaches have been proposed to restore HR images at different scales with a single model. However, most approaches use only simple MLPs or a local attention mechanism in the decoding phase, which limits the representative power of the model. In this work, we propose an arbitrary-scale super-resolution method for remote sensing images with Multi-branch Feature Enhancement and Scale-specific Dictionary Attention (MFESDA). We use a Multi-branch Feature Enhancement (MFE) module, which combines global information and scale-aware attention to capture more informative features. Moreover, we design a Scale-specific Multi-level Dictionary Attention Modulation (SMDAM) module in the decoding process, which makes use of scale-specific priors to improve performance. The experimental results show that the proposed model performs better than other arbitrary-scale SR approaches and achieves higher visual quality.
Keyword: Remote sensing, super resolution, implicit neural representation, attention mechanism
Cite@inproceedings{ICIC2025,
author = {Xiaoxuan Ren, Qianqian Wang, Xin Jin, and Qian Jiang},
title = {Arbitrary-Scale Super-Resolution for Remote Sensing Images with Multi-Branch Feature Enhancement and Scale-Specific Dictionary Attention},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {647-662},
note = {Poster Volume Ⅰ}
}
- RanpCode: Rank-based Pruning after Complete CP Decomposition for Model Compression, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Lianhua Yu, Guodong Zou, Guoming Lu, Jielei Wang, Kexin Li, and Guangchun Luo
Abstract: Traditional convolutional neural network (CNN) architectures have limitations such as over-parameterization and a large computational demand. One effective approach to address these issues is to replace convolutional kernels with their low-rank tensor approximations. Among the various methods available, Canonical Polyadic (CP) tensor decomposition stands out as particularly promising. However, employing CP decomposition for CNN compression presents two major challenges. First, numerical optimization algorithms used to fit convolutional tensors can cause rank-one tensors to cancel each other out, leading to instability and complicating the fine-tuning of the resulting model. Second, determining the appropriate rank for CP decomposition is inherently complex. To overcome these challenges, we propose RanpCode, a novel compression method based on CP decomposition. This method incorporates specially designed numerical fitting techniques to ensure complete CP decomposition and address instability issues. Furthermore, it employs a rank pruning scheme to automatically determine the optimal rank for each layer, with the rank globally optimized and adjusted according to the desired compression rate. Our evaluations on popular CNN architectures for image classification demonstrate that RanpCode achieves higher compression rates while maintaining superior accuracy.
Keyword: model compression, convolutional neural networks, CP decomposition, instability, rank selection.
Cite@inproceedings{ICIC2025,
author = {Lianhua Yu, Guodong Zou, Guoming Lu, Jielei Wang, Kexin Li, and Guangchun Luo},
title = {RanpCode: Rank-based Pruning after Complete CP Decomposition for Model Compression},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1664-1677},
note = {Poster Volume Ⅱ}
}
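To make the link between CP rank and compression rate in the abstract above concrete, here is a small sketch that counts parameters only. It assumes a 4-way convolution kernel of shape (C_out, C_in, k_h, k_w) approximated by R rank-one terms, each storing one vector per mode; the function names and the example layer are hypothetical, and the sketch does not reproduce the paper's fitting or rank-pruning procedure.

```python
def conv_params(c_out: int, c_in: int, k_h: int, k_w: int) -> int:
    """Number of weights in a dense convolution kernel (bias ignored)."""
    return c_out * c_in * k_h * k_w


def cp_params(rank: int, c_out: int, c_in: int, k_h: int, k_w: int) -> int:
    """Number of weights in a rank-`rank` CP approximation of the kernel.

    Each rank-one term stores one vector per tensor mode, so the total is
    rank * (c_out + c_in + k_h + k_w).
    """
    return rank * (c_out + c_in + k_h + k_w)


def compression_ratio(rank: int, c_out: int, c_in: int, k_h: int, k_w: int) -> float:
    """Dense parameter count divided by the CP parameter count."""
    return conv_params(c_out, c_in, k_h, k_w) / cp_params(rank, c_out, c_in, k_h, k_w)


if __name__ == "__main__":
    # Hypothetical layer: 256 -> 256 channels with a 3x3 kernel, CP rank 64.
    print(conv_params(256, 256, 3, 3))
    print(cp_params(64, 256, 256, 3, 3))
    print(round(compression_ratio(64, 256, 256, 3, 3), 2))
```

Lowering the rank shrinks the CP parameter count linearly, which is why a per-layer rank choice directly controls the overall compression rate.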
- SAM-DRA-UNet: An Enhanced U-Net Framework Integrating Knowledge Distillation and Transfer Learning for Brain Tumor Segmentation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Weihao Huang, Chunhong Jiang, Yuheng Huang, Jiayu Ye, Yuntao Nie, and Jiahui Pan
Abstract: Brain tumor segmentation is challenged by irregular morphology, scarce annotations, and class imbalance in medical imaging. This study proposes SAM-DRA-UNet, an enhanced U-Net framework integrating knowledge distillation and transfer learning. We first develop the DRA-UNet architecture by augmenting U-Net’s convolutional blocks with a novel depthwise-pointwise reinforced module and multiple residual simple attention modules, which infer 3D attention maps without parameter expansion while preserving baseline network weights. Furthermore, we employ the SAM model as the teacher network and the DRA-UNet as the student network, transferring knowledge through distillation. Experiments demonstrate that the model achieves mIoU scores of 0.8276 on the TCGA-LGG dataset and 0.8479 on the BraTS21 dataset, significantly outperforming the baseline U-Net and existing state-of-the-art methods. The model also exhibits stable performance across diverse datasets and knowledge distillation temperature settings, validating its generalization capability and providing a reliable solution for brain tumor image segmentation.
Keyword: Brain tumor segmentation, SAM-DRA-UNet, Knowledge distillation, Transfer learning
Cite@inproceedings{ICIC2025,
author = {Weihao Huang, Chunhong Jiang, Yuheng Huang, Jiayu Ye, Yuntao Nie, and Jiahui Pan},
title = {SAM-DRA-UNet: An Enhanced U-Net Framework Integrating Knowledge Distillation and Transfer Learning for Brain Tumor Segmentation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3245-3260},
}
- Research on Edge-Device Collaborative Task Offloading for Dependent Tasks, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Bo Peng, Xinxin Chen, Qiang Li, Li Yan, and Xinyu Zhang
Abstract: Edge-device collaborative computing can provide efficient computing services for emerging intelligent applications by coordinating the resources of terminal devices and edge servers. Because such applications are usually composed of multiple dependent subtasks, achieving efficient collaborative task processing becomes a key challenge. This paper models dependent tasks as a directed acyclic graph (DAG), whose multi-node collaborative scheduling problem has been proven to be NP-hard. To solve this problem, this paper designs a two-stage optimization framework: first, in the task sorting stage, a predicted-cost priority is introduced, and the computational cost of each subtask is estimated to dynamically adjust the order; then, a deep reinforcement learning method based on the proximal policy optimization (PPO) algorithm matches the optimal offloading location to each task. Simulation results show that the designed method can effectively reduce the cost of task completion.
Keyword: Edge-device collaboration, Dependency Task, Task Priority, Proximal Policy Optimization (PPO)
Cite@inproceedings{ICIC2025,
author = {Bo Peng, Xinxin Chen, Qiang Li, Li Yan, and Xinyu Zhang},
title = {Research on Edge-Device Collaborative Task Offloading for Dependent Tasks},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {367-377},
}
- DiMNet: Multi-Label Detection Algorithm for Panoramic Radiographs, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yanfu Li, Ruijie Huang, Pei Zhou, Zhengzhong Zhu, and Jiangping Zhu
Abstract: In recent years, deep learning has been widely applied to single-target detection tasks in dental images, achieving promising results. Existing methods aiming to achieve multi-label detection rely heavily on fully annotated data. However, due to the difficulty in obtaining such fully annotated data, the detection accuracy remains low, failing to meet the requirements of clinical diagnosis. To address this limitation, we propose DiMNet, an end-to-end multi-label object detection model based on an improved DiffusionDet, which incorporates multi-stage training, weight transfer, and cross-stage guidance to enable the model to be trained on partially annotated data, thereby improving detection accuracy. Additionally, we enhance the feature extraction backbone by integrating the Mamba model, leveraging its linear-time sequence modeling approach to maintain high accuracy while significantly improving inference speed. The model is capable of identifying dental pathologies in panoramic X-ray images while simultaneously providing the quadrant and tooth number of the affected tooth, maintaining high accuracy and fast inference speed, thereby meeting the requirements of fully automated diagnosis. During the experiments, we utilized DENTEX2023, which features a multi-level structure, enabling a comprehensive evaluation of the effectiveness of the proposed improvements in DiMNet. Experimental results demonstrate that DiMNet achieves AR scores of 71.7 for quadrant detection, 66.8 for enumeration, and 69.1 for dental pathology detection on the test dataset, accurately detecting all three targets in dental images simultaneously.
Keyword: Multi-Label Detection, Diffusion, MambaVision
Cite@inproceedings{ICIC2025,
author = {Yanfu Li and Ruijie Huang and Pei Zhou and Zhengzhong Zhu and Jiangping Zhu},
title = {DiMNet: Multi-Label Detection Algorithm for Panoramic Radiographs},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3261-3275},
}
- Cross-Document Fact Verification Based on Fine-Grained Graph Neural Network, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xiaoman Xu, Xiaoxu Zhu, and Peifeng Li
Abstract: Cross-Document Fact Verification (CDFV) aims to retrieve evidence from multiple documents to verify the factuality of a given claim. However, existing CDFV approaches fail to capture complex semantic relationships and fine-grained information in the evidence. To address these issues, we propose a Fine-Grained Graph Neural Network (FGGNN) for CDFV. FGGNN constructs a sentence-level graph during the evidence selection stage and efficiently propagates information within the graph using Graph Attention Networks (GAT), accurately capturing the complex relationships between sentences. This enables FGGNN to select trustworthy and relevant evidence. In the claim verification stage, FGGNN constructs a word-level evidence graph to capture fine-grained relationships at the word level. It then uses a Relational Graph Convolutional Network (RGCN) to propagate and update information within the graph, fully uncovering the potential logic in the evidence. Additionally, an attention mechanism is introduced to weight the evidence based on its relevance to the claim, emphasizing the importance of key evidence. Finally, FGGNN considers all the evidence and claim information to accurately predict the label of the claim. Experimental results on the CHEF dataset demonstrate the effectiveness of FGGNN in achieving accurate fact verification.
Keyword: Fact Verification, Evidence selection, Claim Verification
Cite@inproceedings{ICIC2025,
author = {Xiaoman Xu and Xiaoxu Zhu and Peifeng Li},
title = {Cross-Document Fact Verification Based on Fine-Grained Graph Neural Network},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1039-1050},
}
- DDSR: An Identity Preservation Framework for Facial Privacy Protection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jingxian Zhou, Wanying Zhao, Shuang Wang, and Ping Chen
Abstract: With the increasing risk of facial privacy leakage during image transmission, achieving both recognizability and privacy protection remains a major challenge. This paper proposes a novel diffusion-based framework for facial image privacy, termed Diffusion-Driven Steganography and Recovery (DDSR). DDSR utilizes a prompt-guided dual-phase diffusion strategy: during the steganography phase, identity prompts guide latent perturbation, and reverse diffusion under unrelated prompts generates semantically irrelevant stego images; during the recovery phase, the model re-encodes the stego image and reconstructs the original face under the guidance of the identity prompt. To enhance semantic alignment, we introduce a lightweight Prompt Consistency Regularization (PCR), which aligns recovered images and prompts in CLIP semantic space during training. This regularization improves prompt controllability without adding inference overhead. DDSR is compatible with low-resolution data and small-scale training, and does not rely on high-resolution inputs or large datasets. Extensive experiments demonstrate that DDSR achieves up to 98% face recognition accuracy on recovered images and outperforms prior methods by over 30% in resisting recognition attacks. Furthermore, DDSR provides improved robustness under image degradation while maintaining high visual quality and identity fidelity.
Keyword: Deep learning, facial privacy protection, diffusion models, image steganography
Cite@inproceedings{ICIC2025,
author = {Jingxian Zhou and Wanying Zhao and Shuang Wang and Ping Chen},
title = {DDSR: An Identity Preservation Framework for Facial Privacy Protection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {663-679},
note = {Poster Volume Ⅰ}
}
- Learning to Adaptively Incorporate External Syntax through Gated Self-Attention, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shengyuan Hou
Abstract: Transformers are known to implicitly learn syntax from data during training, although the quality of the learned syntax depends heavily on the size and quality of the data. However, introducing structured syntactic information into the sequential Transformer is not trivial. Previous analytical studies have shown that Transformers learn more abstract representations through layers, with their lower layers being more related to syntactic information. Accordingly, to provide extra flexibility and interpretability along with the utilization of constituency syntax, we propose an architecture that allows different layers of the Transformer to adaptively control the incorporation weights of external syntax through a gating mechanism. Our learned syntactic gating weights reveal that the Transformer tends to utilize constituency syntax hierarchically, which aligns well with previous findings and showcases the interpretability of our architecture. Moreover, experimental results on five machine translation datasets across various language pairs show that our model outperforms the vanilla Transformer by 1.22 BLEU on average and is competitive with other recent syntax-aware models. In addition, only a few additional hyperparameters are required, alleviating the burden of searching for the best syntax incorporation location.
Keyword: Adaptive gating mechanism, Constituency syntax-aware architecture, Machine translation.
Cite@inproceedings{ICIC2025,
author = {Shengyuan Hou},
title = {Learning to Adaptively Incorporate External Syntax through Gated Self-Attention},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1051-1067},
}
- CTRL: Contrastive Traffic Recognition with Lightweight network, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zijia Song, Yelin Wang, Pan Chen, Zhaobin Shen, Longxi Li, Wanyu Chen, and Yuliang Lu
Abstract: Network traffic classification plays a vital part in network security and management. Facing the growing sophistication of encryption techniques, various works focus on acquiring underlying features to achieve more advanced identification. However, current methods mostly depend on pre-training or on large models to fit implicit relationships, which can cause imbalanced unsupervised learning due to long-tail distributions and may be impractical for intrusion detection because of the high cost. Therefore, we propose a non-pretrained lightweight framework, termed Contrastive Traffic Recognition with Lightweight network (CTRL), to fully explore spatial-temporal features in traffic. Specifically, a two-stream architecture is adopted to decouple the mixed feature extraction, while the lightweight encoder is further improved to avoid weak representations. By employing a contrastive loss, a single model can grasp common knowledge from different views, which yields better traffic recognition. Extensive experiments on six public traffic datasets from various tasks show that CTRL, while maintaining the fewest parameters, outperforms state-of-the-art approaches with an average improvement of 7.5%.
Keyword: Traffic classification, Lightweight model, Spatial-temporal features, Contrastive learning
Cite@inproceedings{ICIC2025,
author = {Zijia Song and Yelin Wang and Pan Chen and Zhaobin Shen and Longxi Li and Wanyu Chen and Yuliang Lu},
title = {CTRL: Contrastive Traffic Recognition with Lightweight network},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1092-1103},
note = {Poster Volume Ⅰ}
}
- Intelligent Diagnosis for Breast Cancer Based on Multi-Modal Hierarchical Fusion of Ultrasound Images and Clinical Semantic Features, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jian Liu, Jie Ren, Guohui Wang, Yuqi Yan, Qunyang Zuo, Xinzheng Xue, and Dong Xu
Abstract: Ultrasound imaging plays a key role in breast cancer screening and diagnosis because it is non-invasive, real-time, and low-cost. This paper proposes an intelligent breast cancer diagnosis method based on a Multi-modal Hierarchical Fusion Network (MHFNet), aiming to fully integrate complementary information from B-mode images, Doppler images, and clinical semantic features. In MHFNet, a Semantic-augmented ResNet (SAR) is constructed to achieve the deep fusion of image and semantic features. A Hierarchical Semantic Fusion (HSF) module and a Semantic Integration Bottleneck (SIB) are designed to enhance the interaction and fusion of multi-modal information layer by layer. Finally, a unified multi-modal feature representation is developed for breast cancer diagnosis. The experimental results show that the proposed multi-modal classification fusion method is superior to the other algorithms compared, which verifies the positive role of multi-modal information complementarity in improving the diagnostic performance for breast cancer.
Keyword: Breast cancer, Ultrasound image, Deep learning, Clinical semantic features, Multi-modal
Cite@inproceedings{ICIC2025,
author = {Jian Liu and Jie Ren and Guohui Wang and Yuqi Yan and Qunyang Zuo and Xinzheng Xue and Dong Xu},
title = {Intelligent Diagnosis for Breast Cancer Based on Multi-Modal Hierarchical Fusion of Ultrasound Images and Clinical Semantic Features},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2377-2386},
note = {Poster Volume Ⅱ}
}
- An Enhanced Multi-Scale Feature Perception and Improved Feature Extraction-Based Algorithm for Cotton Pest and Disease Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zhisheng Wang, Lizhen He, Jiayu Peng, Yiwei Duan, Qi Liu, Zhaohui Wang, Xin Yang, and Jinhai Sa
Abstract: To address the significant variability of cotton leaf pests and diseases in shape, size, and location distribution in the natural environment, as well as the shortcomings of existing detection models in parameter optimization and detection efficiency, this paper proposes an algorithm for detecting cotton pests and diseases based on enhanced multi-scale feature sensing and improved feature extraction, named SDRM-YOLO. First, to improve the model's ability to perceive pest and disease feature information and its spatial localization accuracy, we propose a Multi-Scale Cross-Space Perception Attention (MCPA) mechanism, which effectively enhances the model's ability to focus on key target areas by fusing spatial information at multiple scales. Second, to improve the feature extraction quality of the model, we design the C2f-DCN-RCSOSA (C2f-DR) module, which enables the model to capture the foreground features of the target flexibly while strengthening the focus on key regions to enhance the feature expression capability. Finally, to reduce the computational complexity of the model and improve the detection speed, we introduce a lightweight network, ShuffleNetv2-RC, into the backbone network to optimize computational efficiency while maintaining high detection accuracy. The experimental results show that SDRM-YOLO outperforms other state-of-the-art target detection algorithms on both the Cotton Disease Dataset and the Cotton Pest Detect Dataset. Compared with the benchmark model YOLOv8n, the mAP50 metrics are improved by 7.3% and 2.5%, respectively, significantly enhancing the accuracy and robustness of cotton pest detection.
Keyword: Pests and diseases, target detection, feature extraction, multi-scale feature sensing
Cite@inproceedings{ICIC2025,
author = {Zhisheng Wang and Lizhen He and Jiayu Peng and Yiwei Duan and Qi Liu and Zhaohui Wang and Xin Yang and Jinhai Sa},
title = {An Enhanced Multi-Scale Feature Perception and Improved Feature Extraction-Based Algorithm for Cotton Pest and Disease Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {680-697},
note = {Poster Volume Ⅰ}
}
- DeCA: A Decomposition-Enhanced Framework for Query-Focused Table Summarization, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Luyi Wang, Yake Niu, Tiantian Peng, Renjie Ci, and Hui Zhao
Abstract: Query-focused table summarization aims to generate personalized summaries by reasoning and analyzing tabular data in response to user queries. Existing large language model (LLM)-based methods utilize intermediate facts to enhance reasoning. However, their effectiveness is constrained by limited inference rules and the generation of erroneous sub-queries, potentially introducing misleading information into the summarization. To address these issues, we propose DeCA, an innovative framework designed to generate non-repetitive and relevant sub-queries that support LLM reasoning and improve summary quality. Our framework comprises four modules: (1) a Table Schema Extractor that interprets table structure and information; (2) a Query Decomposer that recursively decomposes queries; (3) a Sub-query Checker that verifies non-repetition, relevance, and dependencies among sub-queries; and (4) an Answer Generator that generates summaries employing a hint-based answering strategy. Furthermore, we construct CQTS, the first large-scale Chinese table dataset for query-focused table summarization, consisting of 2,956 tables and 6,721 query-summary pairs. Extensive experiments on CQTS and two English datasets, QTSumm and FeTaQA, demonstrate that DeCA enhances LLM reasoning and outperforms existing methods in summary generation and sub-query formulation.
Keyword: Query Decomposition, Query-Focused Table Summarization, Large Language Model Reasoning, Chinese Table Dataset.
Cite@inproceedings{ICIC2025,
author = {Luyi Wang and Yake Niu and Tiantian Peng and Renjie Ci and Hui Zhao},
title = {DeCA: A Decomposition-Enhanced Framework for Query-Focused Table Summarization},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1068-1084},
}
- BS-YOLO: A Multi-scale Object Detection Model for Complex Environments, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zhaoyang Liu and Guangying Jin
Abstract: Helmet detection is a critical component of road safety. However, existing detection algorithms face challenges in accuracy and recall, particularly when dealing with multi-scale objects in complex environments. To address these issues, this study proposes an improved YOLOv8-based model, denoted as BS-YOLO. Firstly, drawing inspiration from StarNet, we propose two modules, namely StarFuseBlock and StarSPPF, to enhance the feature extraction capability of the model at shallow layers. Secondly, to mitigate the loss of texture features of small and medium-sized objects during the feature propagation process, we propose ResBiBlock, which captures global features through residual connections and Biformer Attention. Finally, MPDIoU is employed as a superior alternative to CIoU to enhance computational efficiency and provide greater robustness in situations where predicted boxes do not align with ground-truth boxes. To validate the performance of the proposed model, extensive experiments were performed on the TWHD dataset. Results show that BS-YOLO yields a 2.2% increase in precision, a 2.8% improvement in recall, and a 2.2% enhancement in mAP compared to the YOLOv8 baseline. The experimental results indicate that the proposed improvements effectively enhance the performance of the baseline model, particularly in terms of robustness when dealing with multi-scale objects in complex scenarios.
Keyword: YOLOv8, Object Detection, Star Operation, Biformer Attention.
Cite@inproceedings{ICIC2025,
author = {Zhaoyang Liu and Guangying Jin},
title = {BS-YOLO: A Multi-scale Object Detection Model for Complex Environments},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3276-3291},
}
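The BS-YOLO abstract above mentions replacing CIoU with MPDIoU. As a rough illustration only (this is not the authors' code), the sketch below computes an MPDIoU-style score under the commonly cited definition: plain IoU minus the squared distance between the two top-left corners and the squared distance between the two bottom-right corners, each normalized by the squared image diagonal. The function name `mpdiou` and the `(x1, y1, x2, y2)` box layout are assumptions made for this example.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU-style similarity between two axis-aligned boxes (illustrative sketch).

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed layout).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d_top_left = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    d_bottom_right = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2

    # Normalize by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    return iou - d_top_left / norm - d_bottom_right / norm


if __name__ == "__main__":
    print(mpdiou((0, 0, 10, 10), (0, 0, 10, 10), 100, 100))    # identical boxes
    print(mpdiou((0, 0, 10, 10), (20, 20, 30, 30), 100, 100))  # disjoint boxes
```

Unlike plain IoU, which is zero for every pair of non-overlapping boxes, this score keeps decreasing as the corners move apart, so a loss of the form `1 - mpdiou(...)` still distinguishes near misses from far misses.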
- A Knowledge Distillation Architecture for Pressure Based In-Bed Human Body Reconstruction, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Wentao Ni, Chen Lei, and Liangjing Yang
Abstract: This paper addresses the critical challenge of supine human mesh reconstruction in clinical monitoring scenarios through an innovative knowledge distillation framework. Confronting the inherent limitations of pressure sensor data (including limb occlusion artifacts and limited 3D expressiveness), we propose a hierarchical teacher-student architecture that synergistically integrates cross-modal knowledge from visual domain expertise. Our method leverages a pre-trained CLIFF model as the teacher to guide pressure-map student networks (ResNet variants) in estimating SMPL body parameters. The framework achieved a 2-4% error reduction across key metrics. This work proposes a new solution to optimize pressure-based human body reconstruction and multimodal dataset utilization.
Keyword: Human body reconstruction, Knowledge distillation, Multimodal data fusion
Cite@inproceedings{ICIC2025,
author = {Wentao Ni and Chen Lei and Liangjing Yang},
title = {A Knowledge Distillation Architecture for Pressure Based In-Bed Human Body Reconstruction},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3726-3734},
}
- Vision-Based Pedestrian Gesture Recognition System Using Spatiotemporal Features, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Md Aminul Islam and Xiaohui Cui
Abstract: Pedestrians, classified as Vulnerable Road Users (VRUs) due to their lack of protective equipment, face high risks in traffic collisions. While crossing roads, VRUs frequently use communicative gestures, such as raising a hand with the palm outward, to signal drivers to stop. Although human drivers can intuitively interpret these gestures, autonomous and driver-assistance systems still lack robust dynamic gesture interpretation, contributing to safety-critical failures. Despite progress in human-vehicle interaction for autonomous driving, prior research has largely emphasized traffic police signal recognition, pedestrian trajectory prediction, or movement-based intent analysis, neglecting VRUs' explicit gesture-based interactions. To address this gap, we present a systematic taxonomy of pedestrian gesture behaviors, grounded in real-world observations, along with a custom dataset. We propose a robust recognition framework that combines spatial feature extraction and deep learning to interpret these gestures. Our method leverages geometric relationships in body keypoints to model spatial patterns, while temporal dynamics are captured using a Long Short-Term Memory (LSTM) network. This architecture processes sequential geometric features to identify distinctive spatiotemporal characteristics of pedestrian gestures. Experiments on our proposed custom dataset and the public CTPG dataset demonstrate a recognition accuracy of 95.18% with near real-time inference speeds, surpassing existing vision-based approaches for VRU gesture recognition.
Keyword: Autonomous driving · Vulnerable Road User · Spatiotemporal features · Long Short-Term Memory (LSTM)
Cite@inproceedings{ICIC2025,
author = {Md Aminul Islam and Xiaohui Cui},
title = {Vision-Based Pedestrian Gesture Recognition System Using Spatiotemporal Features},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2067-2083},
note = {Poster Volume Ⅱ}
}
- Enhancing Vulnerability-Fixing Commit Classification: The Synergy of User-Guided and LLM, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yaning Zheng, Honglin Zhuang, Dongxia Wang, Huayang Cao, and Cheng Qian
Abstract: With the increasing complexity of software development environments, identifying and fixing vulnerabilities has become a key aspect of software maintenance. One way to improve the efficiency and effectiveness of vulnerability fixing is to classify vulnerability-fixing commits. However, existing vulnerability-fixing classification methods are limited by code language, code length, the commit dataset, and ambiguous or domain-specialized commits, which leads to low precision. In this paper, we propose a user-guided classification method for vulnerability-fixing commits. For ambiguous and domain-specialized commits, we incorporate human involvement and timely intervention in the process of fine-tuning the BERT model. Furthermore, a large language model (LLM) is employed to address the challenges posed by varying code languages and lengths. Experimental results show that our approach significantly improves the performance of commit classification. The accuracy of the user-guided BERT message classifier increases by 2-5% compared with baseline methods after 10 iterations of human participation. On the TensorFlow dataset, the patch classifier using the LLM outperforms HERMES by 11.6% in terms of F1-score. In summary, our overall classification, which combines the results of the message classifier and the patch classifier, outperforms HERMES by 14.6% and VulCurator by 5.6%.
Keyword: commit classification, user-guided, LLM
Cite@inproceedings{ICIC2025,
author = {Yaning Zheng and Honglin Zhuang and Dongxia Wang and Huayang Cao and Cheng Qian},
title = {Enhancing Vulnerability-Fixing Commit Classification: The Synergy of User-Guided and LLM},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1104-1121},
note = {Poster Volume Ⅰ}
}
- Emotion Traceability Analysis: A Multi Strategy Framework for LLM Dialogue Processing, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zichen Yu, Senwei Liang, Aiyu Li, Xiaoyi Zhu, Tianlan Pan and Jionglong Su
Abstract: Recent approaches in emotional causal reasoning have leveraged Retrieval-Augmented Generation (RAG) and multimodal fusion to enhance the accuracy of large language models (LLMs) in analyzing emotions. As a critical cognitive process for understanding, inferring, and predicting the antecedents and consequences of emotional states, emotional causal reasoning primarily involves two components: emotional understanding and causal inference. However, LLMs face two key challenges in analyzing emotional causality: (1) the inability to process ultra-long texts due to input length constraints, and (2) insufficient capability to track emotional dynamics in dialogues. To address these limitations, we propose the Emotion Traceability Analysis Framework (ETAF), which employs RAG-based keyword retrieval to extract critical events from dialogues and dynamically segments conversations according to event progression, enabling LLMs to comprehend contextualized events holistically. In addition, we integrate character analysis and variation correction modules to improve the precision of the model in tracking emotional causal chains between characters and refining the interpretation of emotional shifts. Experimental results on the ATLAS-6 dataset demonstrate that our framework improves the performance of GLM-4-air by 17.79%, outperforming DeepSeek-R1 (origin) by 6.49% and achieving state-of-the-art results.
Keyword: Emotional Causal Reasoning, Large Language Models, Retrieval-Augmented Generation, Emotion Traceability
Cite@inproceedings{ICIC2025,
author = {Zichen Yu and Senwei Liang and Aiyu Li and Xiaoyi Zhu and Tianlan Pan and Jionglong Su},
title = {Emotion Traceability Analysis: A Multi Strategy Framework for LLM Dialogue Processing},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1085-1096},
}
- Shadow Model Craft: An Efficient Framework for Privacy-Preserving Inference, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Meijuan Li, Yidong Wang, Ziwen Wei, and Zhe Cui
Abstract: Privacy-preserving inference (PPI) has become a critical requirement in Machine Learning as a Service (MLaaS), where both user inputs and model parameters are sensitive assets. Existing cryptography-based approaches, such as those relying on homomorphic encryption or secure multi-party computation (MPC), often suffer from substantial computational and communication overhead, making them impractical for large-scale deep learning models. In this paper, we propose a novel and efficient framework that protects model and input privacy while significantly improving inference efficiency. The core of our approach is Shadow Model Craft, a structural model decomposition strategy inspired by secret sharing. Instead of encrypting model parameters, we distill the original model into multiple lightweight shadow models with disjoint functionality and distribute them across non-colluding servers. Each server performs inference over secret-shared inputs using plaintext model fragments, thus eliminating the need for encrypted model parameters. Our design allows local execution of linear operations, further reducing inference latency. Experiments on CIFAR-10 and ImageNet demonstrate that our framework achieves strong privacy guarantees with up to 90% model compression and over remarkable speedup compared to other state-of-the-art works, all while maintaining competitive inference accuracy. This work offers a practical and scalable solution for secure deep learning inference in real-world deployments.
Keyword: Privacy-Preserving Inference, Machine Learning as a Service (MLaaS), Multi-Party Computation
Cite@inproceedings{ICIC2025,
author = {Meijuan Li and Yidong Wang and Ziwen Wei and Zhe Cui},
title = {Shadow Model Craft: An Efficient Framework for Privacy-Preserving Inference},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1122-1135},
note = {Poster Volume Ⅰ}
}
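The Shadow Model Craft abstract above relies on servers evaluating linear operations locally on secret-shared inputs. The sketch below is a generic illustration of why that works with additive secret sharing; it is not the paper's protocol, and the modulus `P` and the helper names `share`, `linear`, and `reconstruct` are assumptions made for this example. Because a linear map commutes with addition, each server can apply a plaintext weight matrix to its own share, and the sum of the per-server outputs equals the result on the original input.

```python
import secrets

# Prime modulus for the share arithmetic (an arbitrary choice for this sketch).
P = 2**61 - 1


def share(vec, n=2):
    """Split an integer vector into n additive shares modulo P (requires n >= 2)."""
    if n < 2:
        raise ValueError("need at least two shares")
    random_shares = [[secrets.randbelow(P) for _ in vec] for _ in range(n - 1)]
    # The last share is chosen so that all shares sum to the original value.
    last = [(v - sum(col)) % P for v, col in zip(vec, zip(*random_shares))]
    return random_shares + [last]


def linear(weights, vec):
    """Apply a plaintext weight matrix to a vector, modulo P."""
    return [sum(w * x for w, x in zip(row, vec)) % P for row in weights]


def reconstruct(shares):
    """Recombine additive shares by summing them element-wise modulo P."""
    return [sum(col) % P for col in zip(*shares)]


if __name__ == "__main__":
    W = [[1, 2], [3, 4]]
    x = [5, 6]
    input_shares = share(x, n=3)                      # one share per server
    output_shares = [linear(W, s) for s in input_shares]
    print(reconstruct(output_shares))                 # matches linear(W, x)
```

Non-linear layers do not have this property, which is why secret-sharing-based schemes typically treat them separately; how the paper handles them is not stated in the abstract.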
- GLAD-Net: Global-Local Adaptive Fusion and Cross-Stage Distillation for Cross-Level Multi-Scale Medical Image Segmentation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Wenkai Zhao, Lingwei Zhang, Yun Zhao, Xuecheng Bai, Zhenhuan Xu, and Yidi Li
Abstract: Medical image segmentation is crucial for disease diagnosis, treatment planning, and surgical navigation. Despite the advances achieved by U-Net-based multi-scale fusion methods, challenges persist, including difficulties in synergistically modeling global and local features amid lesion variations, inadequate cross-level coupling of multi-scale information, and the presence of asymmetric supervisory signals between the encoder and decoder. Accordingly, this study introduces GLAD-Net, which (1) incorporates a Global-Local Adaptive Module (GLAM) and a Selective KAN Module (SKM) into the U-Net backbone for dynamic weighted feature fusion to enhance hierarchical feature representation; (2) constructs a Channel-Spatial Collaborative Attention (CSCA) mechanism in the cross-layer connections that exploits the continuous spatial modeling ability of the KAN network to boost multi-scale feature expression; (3) employs a Cross-Level Multi-Scale Selective Fusion (CMSF) module in the decoder to merge SKM-weighted decoded features from early layers with the corresponding encoded features to enhance feature representation; and (4) applies a Cross-Stage Self-Distillation (CSSD) framework to reverse-distill high-level semantic features from the decoder into the early encoder stages to alleviate semantic bias. Experimental results show that GLAD-Net outperforms existing methods on most metrics on both the ISIC2017 and ISIC2018 datasets. Our source code will be made available.
Keyword: Medical image segmentation, Cross-Level Multi-scale feature fusion, Global-local feature modeling, Self-distillation framework
Cite@inproceedings{ICIC2025,
author = {Wenkai Zhao and Lingwei Zhang and Yun Zhao and Xuecheng Bai and Zhenhuan Xu and Yidi Li},
title = {GLAD-Net: Global-Local Adaptive Fusion and Cross-Stage Distillation for Cross-Level Multi-Scale Medical Image Segmentation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2083-2097},
note = {Poster Volume Ⅱ}
}
- MSFuzz: Directed Greybox Fuzzing Using Multi-Target Sensitivity-Based Energy Scheduling, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chengwei Qin and Zhao Ma
Abstract: Directed Greybox Fuzzing (DGF) effectively targets specific program locations for bug discovery, but existing tools face challenges in multi-target directed fuzzing due to static stage division and coarse energy scheduling. Key challenges include global optimization biases that overlook lower-priority targets, inadequate prioritization of seeds that reach multiple targets, and inflexible exploration-exploitation stage allocation. This paper presents adaptive strategies to tackle these issues: a multi-target sensitivity-based energy scheduling approach that dynamically prioritizes seeds based on their target sensitivity, and a state-aware stage coordination strategy that balances exploration and exploitation using real-time fuzzing metrics to enable flexible stage transitions. We implemented these techniques in the tool MSFuzz, which optimizes resource allocation to avoid single-target bias and prevent inefficient stage durations. Evaluations on Magma, FuzzBench, and real-world programs show that MSFuzz outperforms state-of-the-art fuzzers like AFLGo, achieving 6.57× faster crash reproduction on Magma and 1.32× higher target-guided efficiency on FuzzBench. MSFuzz also discovered 27 unique crashes (13 CVEs) in real-world programs.
Keyword: directed greybox fuzzing, bug discovery, energy scheduling.
Cite@inproceedings{ICIC2025,
author = {Chengwei Qin and Zhao Ma},
title = {MSFuzz: Directed Greybox Fuzzing Using Multi-Target Sensitivity-Based Energy Scheduling},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1136-1152},
note = {Poster Volume Ⅰ}
}
- CAAT: Channel-Aggregated Attention Transformer for Efficient Multivariate Time Series Forecasting, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Wenhao Tang, Wanli Zhao, and Fang Mei
Abstract: Multivariate Time Series Forecasting (MTSF) holds significant application value in finance, energy, transportation, and other domains. Existing Transformer-based approaches typically face two critical challenges: (1) the computational complexity of cross-channel modeling grows quadratically with the number of channels, and (2) noisy interference in cross-channel information leads to inefficient dependency modeling. This paper proposes a lightweight model named CAAT (Channel-Aggregated Attention Transformer) that achieves efficient forecasting through a Channel-Aggregation Module (CAM) and spatiotemporally decoupled attention mechanisms. The CAAT framework first compresses multivariate sequences into latent representations via MLPs, followed by saliency-based probabilistic sampling to select channel features with a high signal-to-noise ratio. Subsequently, the aggregated channel features are injected into the temporal dimension, enabling joint modeling of cross-temporal and cross-channel dependencies through temporal-axis attention mechanisms alone. Experimental results demonstrate that CAAT achieves significant improvements in prediction accuracy compared to baseline methods.
Keyword: MTSF, Transformer, Spatiotemporal Decoupled Attention, Channel Aggregation Module.
Cite@inproceedings{ICIC2025,
author = {Wenhao Tang and Wanli Zhao and Fang Mei},
title = {CAAT: Channel-Aggregated Attention Transformer for Efficient Multivariate Time Series Forecasting},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {221-232},
}
- NeuLF-Net: A Neural Latent Fusion Network for 3D Surface Reconstruction, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Boren Li, Xuhua Shi, and Haizhen Yu
Abstract: Learning-based implicit neural networks have achieved impressive performance on point cloud surface reconstruction. To reconstruct continuous surfaces from raw, discrete point clouds, existing methods typically project point clouds to grid latents or directly encode them as point latents. However, these methods rarely combine grid latents and point latents effectively and typically only perform simple topological transformations that ignore the spatial positional information of points, which seriously restricts the ability to capture fine details. In addition, traditional linear interpolation fails to sufficiently consider the global spatial information when inferring features of spatial points in sparse regions, resulting in a complete loss of expressiveness in some regions. In this paper, we propose a novel neural latent fusion network, named NeuLF-Net. The network serves as an end-to-end surface reconstruction framework, efficiently retaining the spatial encoding advantages of grid latents while capturing the fine-grained descriptive power of point latents. Specifically, we introduce a Neighbor Grid Enhancement Layer, which fully utilizes the neighbor information of the grid latents and point latents to enhance both latent types. Furthermore, we design a novel adaptive interpolation strategy that exhibits better adaptability for point cloud spatial feature extraction. We extensively evaluate our pipeline against previous methods on three datasets: ShapeNet, Synthetic Rooms, and ScanNet. Both quantitative and qualitative analyses demonstrate that NeuLF-Net substantially enhances the overall quality of point cloud reconstruction. From a visual perspective, the reconstruction results appear more realistic.
Keyword: NeuLF-Net, Implicit surface reconstruction, Point clouds, Adaptive interpolation strategy.
Cite@inproceedings{ICIC2025,
author = {Boren Li and Xuhua Shi and Haizhen Yu},
title = {NeuLF-Net: A Neural Latent Fusion Network for 3D Surface Reconstruction},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3292-3307},
}
- A Decision Tree Based On Related Family, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Wenxing Li, Xin Yang, Meihua Liu, and Tian Yang
Abstract: Decision trees are widely used supervised learning models known for their simplicity, interpretability, and effectiveness in classification and regression tasks. Feature selection can remove redundant and noisy features, enhancing the generalization and robustness of decision trees. However, due to the high computational cost of existing feature selection methods, feature selection is typically applied only once before classifier training, providing the classifier with dimensionally reduced data. This limits the synergistic effect between feature selection and the construction of split nodes in decision trees. The Related Family is an efficient feature evaluation method proposed by our research team. Its efficiency allows us to use it in the construction of split nodes in decision trees, leading to better splitting criteria. Building on this method, we introduce the Dynamic Related Family Decision Tree (DRFDT), which dynamically selects optimal features for each sample subgroup as the tree grows. Experiments demonstrate that DRFDT outperforms a wide range of classification algorithms across 15 UCI datasets, achieving an average accuracy of 89.30%. This represents significant improvements over classical single-feature decision tree methods (CART: +3.87%), traditional classification algorithms (KNN: +5.71%, SVM: +4.54%), multi-feature split decision tree algorithms (CART-LC: +3.99%, O1: +4.25%), and state-of-the-art decision tree classification algorithms (FGBDT: +4.88%, MPRBC: +4.77%, RSLRS: +26.84%).
Keyword: Rough set theory, Decision trees, Related family, Feature selection
Cite@inproceedings{ICIC2025,
author = {Wenxing Li and Xin Yang and Meihua Liu and Tian Yang},
title = {A Decision Tree Based On Related Family},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3710-3723},
}
- Exploration-Enhanced Dueling Double Deep Q-Network with Random Network Distillation for Satellite Beam Selection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zijing Cheng, Chuxiong Sun, Shuaijun Liu, and Lixiang Liu
Abstract: With the evolution of satellite communication systems towards achieving low-latency and high-throughput performance, dynamic beam resource scheduling emerges as a challenging sequential decision-making task that can be effectively tackled using deep reinforcement learning (DRL). However, owing to the sparse channel characteristics and complex multi-user interference in satellite communications, traditional DRL methods struggle to obtain effective learning signals during exploration, resulting in suboptimal resource allocation efficiency. To address this challenge, in this work, we propose Beam selection with Integrated RND (BIRD), a novel framework that combines the Dueling Double Deep Q-Network (DQN) architecture with Random Network Distillation (RND) to enhance exploration capabilities in sparse state spaces. Our main innovations include the design of an enhanced solution framework that integrates a Dueling DQN-based value evaluation architecture with the RND mechanism to improve exploration efficiency through intrinsic rewards. Additionally, we develop a novel Markov Decision Process (MDP) model for formalizing beam selection as a sequential decision problem. Simulation results demonstrate that BIRD achieves a significant 24.1% improvement in system sum rate compared to traditional beam selection methods.
Keyword: Multi-Beam Satellite, Deep Reinforcement Learning, Random Network Distillation, Beam Selection.
Cite@inproceedings{ICIC2025,
author = {Zijing Cheng and Chuxiong Sun and Shuaijun Liu and Lixiang Liu},
title = {Exploration-Enhanced Dueling Double Deep Q-Network with Random Network Distillation for Satellite Beam Selection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1153-1170},
note = {Poster Volume Ⅰ}
}
- Region Features Propagation with Class-Aware Contrastive Learning for Weakly Supervised Cardiac Segmentation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yu Xiao, Ping Wang, Xiuyang Zhao, Dongmei Niu, and Jinshuo Zhang
Abstract: Cardiac segmentation is crucial for analyzing heart structure and function, providing essential support for clinical diagnosis and treatment planning. However, obtaining fully annotated images is both costly and time-consuming. Scribble annotations, which utilize simple lines instead of pixel-wise annotations, offer a cost-effective alternative but lack sufficient information, making segmentation network training challenging. To address this, we propose ScribbleCorrNet (SCN), a novel framework for scribble-supervised medical image segmentation. SCN employs Correlation-Aware Label Enhancement (CALE) strategies, introducing two key mechanisms: (i) pixel affinity propagation (PAP), which propagates high-confidence pixels using pairwise similarities in a correlation map, and (ii) region shape refinement (RSR), which refines pseudo-labels by leveraging shape information encoded in the correlation map. Additionally, a class-aware contrastive learning (CACL) mechanism enhances intra-class consistency and inter-class separation. Experiments on the ACDC2017 and MSCMR datasets demonstrate SCN’s superior performance compared to existing scribble-based segmentation methods.
Keyword: Cardiac segmentation, scribble annotation, weakly supervised learning, contrastive learning.
Cite@inproceedings{ICIC2025,
author = {Yu Xiao and Ping Wang and Xiuyang Zhao and Dongmei Niu and Jinshuo Zhang},
title = {Region Features Propagation with Class-Aware Contrastive Learning for Weakly Supervised Cardiac Segmentation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {698-711},
note = {Poster Volume Ⅰ}
}
- Cross-Domain Functional Knowledge Integration Architecture Powered by Deep Reinforcement Learning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Ding Yishen, Hu Yahong, Xie Youbai, Meng Xianghui, and Mao Jiafa
Abstract: To address the inefficient generation of product functional design schemes in the face of complex user requirements, a multi-expert optimized reinforcement learning search network is proposed. By adding non-functional factors to functional design knowledge representation, the algorithm breaks through the limitations of traditional single-dimensional evaluations and optimizes the generated functional unit chains. A highly efficient circular experience pool and a dynamic priority sampling strategy are proposed to improve experience storage efficiency and training stability. Combining the dynamic weighting mechanism with the Mixture of Experts (MoE) model enhances the algorithm’s adaptability to complex design tasks. Experiments show that the circular experience pool can eliminate memory fragmentation, increase storage efficiency by 86.60%, and accelerate model convergence by 88.20%. The dynamic weighting mechanism maintains a stable success rate of 93.60% in scenarios with variable requirements, and the MoE model increases the search success rate to 94.33%.
Keyword: functional knowledge integration, knowledge representation, deep reinforcement learning, Mixture of Experts
Cite@inproceedings{ICIC2025,
author = {Ding Yishen and Hu Yahong and Xie Youbai and Meng Xianghui and Mao Jiafa},
title = {Cross-Domain Functional Knowledge Integration Architecture Powered by Deep Reinforcement Learning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {378-394},
}
- Neural Rule Learning with Network Architecture Search for Interpretable Classification, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xincheng He, Xueting Jiang, Haoran Liu, Shuo Guan, Jiayu Xue, Bowen Shen, and Yuangang Wang
Abstract: Deep neural network models have achieved unprecedented success in modeling unstructured data across tasks such as computer vision, speech recognition, and natural language processing. However, their inherent limitations in model transparency and interpretability hinder the analysis of underlying mechanisms, prediction processes, and decision rationales, restricting their application in critical domains such as healthcare, finance, and the judiciary. Traditional rule-based models exhibit strong transparency and interpretability. However, they are overly dependent on feature engineering, which significantly increases the cost of human intervention. Furthermore, their limited capability to represent data restricts the efficient and comprehensive utilization of large-scale datasets. While ensemble learning methods can enhance predictive performance, they often do so at the expense of model interpretability. To address the challenge of balancing predictive performance and interpretability in structured data classification tasks, this paper proposes an interpretable classification method that integrates neural rule learning with network architecture search. On the one hand, this method automatically learns interpretable logical rules to represent and classify the data. On the other hand, by incorporating network architecture search techniques, the model adapts to the characteristics of the dataset and determines the optimal network structure to achieve superior predictive performance. Through comparative experiments with different types of classification models across multiple datasets, we find that the proposed method is strongly competitive in both predictive accuracy and interpretability. It is capable of generating highly interpretable logical rules while maintaining excellent predictive performance.
Keyword: Neural Rule Learning, Network Architecture Search, Interpretable Classifier, Tabular Data, Deep Neural Network
Cite@inproceedings{ICIC2025,
author = {Xincheng He and Xueting Jiang and Haoran Liu and Shuo Guan and Jiayu Xue and Bowen Shen and Yuangang Wang},
title = {Neural Rule Learning with Network Architecture Search for Interpretable Classification},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2331-2346},
note = {Poster Volume Ⅱ}
}
- Multi-Stage Hallucination Mitigation and Structured Output Generation via Guided Inference and Model Synergy, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Wang Ruan, Tengda Qi, Jun He, Bo Sun, and Guomin Zheng
Abstract: In the pursuit of advanced artificial intelligence capabilities, the challenges posed by Large Language Models (LLMs) cannot be overlooked. LLMs, despite their capacity for multi-step logical reasoning through CoT prompting, often encounter issues such as hallucinations that undermine the accuracy of results and inefficiency in processing. Recognizing the complementary strengths of small language models, we introduce the MS-HM Synergy (Multi-Stage Hallucination Mitigation Synergy) framework. This novel framework, centered around guided inference and model synergy, comprises three essential stages. Firstly, Guided Inference utilizes LLMs for initial reasoning, tapping into their language understanding. Secondly, Hallucination Detection acts as a safeguard, meticulously identifying and eliminating unreliable outputs. Lastly, Result Standardization ensures the generation of coherent and structured outputs. Methodologically, LLMs are tasked with complex reasoning, while small language models play a crucial verification role. Empirical results on benchmarks like MMLU, MATH, and LogiQA exhibit substantial performance improvements. The MS-HM Synergy not only effectively mitigates hallucinations for enhanced reliability but also boosts efficiency and flexibility, heralding a new era of leveraging combined model strengths to overcome LLM limitations.
Keyword: Large Language Models (LLMs), Multi-step Logical Reasoning, Hallucinations, Small Language Models, Model Synergy.
Cite@inproceedings{ICIC2025,
author = {Wang Ruan and Tengda Qi and Jun He and Bo Sun and Guomin Zheng},
title = {Multi-Stage Hallucination Mitigation and Structured Output Generation via Guided Inference and Model Synergy},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1097-1110},
}
- A Robust Image Blind Watermarking Scheme Based on Staged Adaptive Strategy, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Hu Deng, Feng Chen, Pei Gan, Rongtao Liao, and XueHu Yan
Abstract: Image blind watermarking is a critical tool for copyright protection and verification of digital images. Existing watermarking schemes usually perform well under a single noise condition. However, in practical applications, watermarked images are often exposed to many different types of noise. This combined noise condition significantly degrades the image quality and watermark extraction accuracy of existing watermarking schemes. To address these challenges, we propose a novel two-stage training strategy that enhances watermarking robustness by training the model with various noise intensities, improving performance under combined noise conditions. To further improve the imperceptibility of the watermarked image while ensuring high accuracy of watermark extraction, we propose a strength-balanced watermarking optimization algorithm in the model testing phase. Furthermore, due to the non-differentiable nature of JPEG compression, existing schemes cannot achieve satisfactory watermarking performance under JPEG compression. We introduce a differentiable fine-grained JPEG compression module to improve the robustness of existing schemes to JPEG compression. Experimental results indicate that our proposed scheme outperforms state-of-the-art schemes under multiple noise conditions. Under noise-free conditions, it achieves a 0% bit error rate and 53.55 dB PSNR, and under combined noise conditions, it still achieves an average bit error rate of 2.40% and 42.70 dB PSNR.
Keyword: Image blind watermarking, Staged training strategy, JPEG compression.
Cite@inproceedings{ICIC2025,
author = {Hu Deng and Feng Chen and Pei Gan and Rongtao Liao and XueHu Yan},
title = {A Robust Image Blind Watermarking Scheme Based on Staged Adaptive Strategy},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1168-1183},
note = {Poster Volume Ⅰ}
}
- LightDrone-YOLO: A Novel Lightweight and Efficient Object Detection Network for Unmanned Aerial Vehicles, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xin Li, Tianze Zhang, Yifan Lyu, Zhixuan Miao, and Gang Shi
Abstract: In recent years, the application of unmanned aerial vehicles (UAVs) has grown exponentially in various fields due to their convenience. These vehicles have become ubiquitous in numerous fields, including environmental monitoring, agricultural management, urban planning, traffic monitoring, and emergency rescue, playing an instrumental role in these domains. However, target detection from the perspective of a drone is fraught with challenges. These challenges include the difficulty of detecting small targets, interference from lighting and background in complex scenes, and limited hardware resources. To address these challenges, we have enhanced the YOLOv8 model and introduced a lightweight and efficient target detection model specifically designed for the perspective of a drone, named LightDrone-YOLO. Firstly, a specialised layer is incorporated into the model to enhance the detection of small targets. Secondly, a lightweight multi-scale feature fusion neck (LMFF-Neck) is designed to reduce the number of parameters and computational complexity of the model and improve the fusion of multi-scale features. Thirdly, we improved the C2f module and renamed it C2f-MFEM, which is designed to enhance feature extraction. Finally, the spatial feature weighting fusion (SFWF) module was designed to accurately select the most valuable spatial information during the multi-scale feature fusion process. Experimental results on the VisDrone 2021 dataset demonstrate the effectiveness of the proposed method, and the mean average precision (mAP) is substantially improved. On the validation and test datasets, the proposed method outperformed other prevalent lightweight models, with mAP50 reaching 40.8% and 32.5%, respectively.
Keyword: YOLOv8, Feature fusion, Aerial Images, Object detection
Cite@inproceedings{ICIC2025,
author = {Xin Li and Tianze Zhang and Yifan Lyu and Zhixuan Miao and Gang Shi},
title = {LightDrone-YOLO: A Novel Lightweight and Efficient Object Detection Network for Unmanned Aerial Vehicles},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1680-1695},
note = {Poster Volume Ⅱ}
}
- Hybrid Point-Pillar-Transformer Network for 3D Small Object Detection in Autonomous Driving, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Rongjie Wang and Shuo Yang
Abstract: With the increasing demand for object detection accuracy in scenarios such as intelligent transportation and autonomous driving, methods that utilize point cloud and voxel features to achieve multi-modal feature fusion have become increasingly common in 3D object detection. However, existing methods often rely on inefficient linear fusion strategies during the multi-modal feature fusion process, which fail to adequately capture the dependencies between multi-source features, leading to insufficient feature integration. Additionally, during feature extraction, limitations in network architecture result in a lack of interaction between shallow and deep features, causing the loss of fine-grained feature information, which particularly affects the detection of small objects. To address these issues, we propose the Hybrid Point-Pillar-Transformer Network (HPP-TNet), a two-stage object detection framework that integrates point and pillar features. Specifically, we design a fine-grained pillar feature extraction module (CFPEM), which effectively alleviates the feature loss problem caused by voxel downsampling through shallow-deep feature interaction and lightweight attention design. Next, we develop a transformer-based multi-scale feature fusion module (TMFFM), which dynamically achieves cross-modal associations through a multi-head attention mechanism, enhancing context-aware features and fully realizing multi-source feature fusion. Experiments on the KITTI dataset demonstrate that our proposed algorithm achieves competitive detection performance compared to several state-of-the-art methods, particularly in the Cyclist and Pedestrian categories. Our code will be open-sourced soon.
Keyword: 3D Object Detection, Multi-modal transformer feature fusion, Pillar Features.
Cite@inproceedings{ICIC2025,
author = {Rongjie Wang and Shuo Yang},
title = {Hybrid Point-Pillar-Transformer Network for 3D Small Object Detection in Autonomous Driving},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2098-2108},
note = {Poster Volume Ⅱ}
}
- DTS-YOLO: Enhancing Object Detection via Dynamic Routing, Texture Encoding, and Semantic Fusion, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shipeng Zheng, Sen Zhang, and Yulin Chen
Abstract: To address limited feature representation, insufficient cross-scale fusion, and localization inaccuracies in complex scenes, we propose DTS-YOLO, a lightweight single-stage detector. It improves detection through dynamic feature aggregation, fine-grained texture encoding, and precise bounding box regression. Specifically, the Dynamic Route Enhanced Aggregation Module (DREAM) integrates multi-branch depthwise convolutions and lightweight Transformers to enrich multi-scale representations. To mitigate semantic inconsistency in fusion, Dynamic Cross-scale Feature Fusion (DCFF) combines Scale-aware Channel Attention Fusion (SCAF) and Intra-layer Feature Fusion Attention (IFFA) for enhanced semantic alignment. Additionally, edge and texture perception is reinforced via Sobel and Laplacian Pyramid modules. For robust localization, a novel Closed Complete IoU (CCIoU) loss introduces morphological closure operations to refine bounding box alignment under occlusion. Experiments on VisDrone2019 and DOTA-v1.5 (HBB) demonstrate consistent performance gains over baseline YOLO11, especially for small and dense objects in complex environments.
Keyword: Object detection, multi-scale fusion, texture encoding, semantic attention, bounding box regression.
Cite@inproceedings{ICIC2025,
author = {Shipeng Zheng and Sen Zhang and Yulin Chen},
title = {DTS-YOLO: Enhancing Object Detection via Dynamic Routing, Texture Encoding, and Semantic Fusion},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3307-3323},
}
- Safe Policy Improvement based on epsilon-bisimulation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yuan Zhuang
Abstract: This paper studies the Safe Policy Improvement (SPI) problem in Batch Reinforcement Learning (Batch RL), which aims to train a policy from a fixed dataset without environment interaction, while ensuring its performance is no worse than that of the baseline policy used for data collection. Most existing SPI methods impose constraints on training, but these constraints often make the training overly conservative, especially in complex environments where satisfying the constraints requires large amounts of data. Meanwhile, ε-bisimulation, a general state abstraction technique, has been widely used to enhance sample efficiency in reinforcement learning (RL). However, applying ε-bisimulation transforms the original dataset into one over abstracted observations, which typically violates the assumption of independent and identically distributed (i.i.d.) samples required by existing SPI methods. To address this limitation, this paper proposes a constraint for policy learning that incorporates ε-bisimulation to improve sample efficiency while ensuring the policy's performance.
Keyword: Batch Reinforcement Learning, Safe Policy Improvement, ε-bisimulation
Cite@inproceedings{ICIC2025,
author = {Yuan Zhuang},
title = {Safe Policy Improvement based on epsilon-bisimulation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {395-408},
}
- Boosting Adversarial Transferability via High-Frequency Augmentation and Hierarchical-Gradient Fusion, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yayin Zheng, Chen Wan, Zihong Guo, Hailing Kuang, and Xiaohai Lu
Abstract: Adversarial attacks have become a significant challenge in the security of machine learning models, particularly in the context of black-box defense strategies. Existing methods for enhancing adversarial transferability primarily focus on the spatial domain. This paper presents Frequency-Space Attack (FSA), a new adversarial attack framework that effectively integrates frequency-domain and spatial-domain transformations. FSA combines two key techniques: (1) High-Frequency Augmentation, which applies the Fourier transform with frequency-selective amplification to diversify inputs and emphasize the critical role of high-frequency components in adversarial attacks, and (2) Hierarchical-Gradient Fusion, which merges multi-scale gradient decomposition and fusion to capture both global structures and fine-grained details, resulting in smoother perturbations. Our experiments demonstrate that FSA consistently outperforms state-of-the-art methods across various black-box models. Notably, our proposed FSA achieves an average attack success rate increase of 23.6% compared with BSR (CVPR 2024) on eight black-box defense models.
Keyword: Adversarial Attack, Transferability, Frequency-Space Attack
Cite@inproceedings{ICIC2025,
author = {Yayin Zheng and Chen Wan and Zihong Guo and Hailing Kuang and Xiaohai Lu},
title = {Boosting Adversarial Transferability via High-Frequency Augmentation and Hierarchical-Gradient Fusion},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3323-3337},
}
- MPPose: An Efficient Multi-Path Network for 2D Human Pose Estimation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Qing Peng, Zhongteng Zhang, Zihao Zhang, Liu Zhang, Jing Chong, and Weihong Huang
Abstract: Human pose estimation models are increasingly deployed on low-computation devices, with extensive applications in motion capture and sports rehabilitation. The multi-scale feature extraction capability of high-resolution networks (HRNet) effectively addresses the issue of varying human body scales, enhancing the accuracy of lightweight models based on HRNet. However, the high-resolution architecture results in a more complex network structure and increased computational overhead. This paper introduces MPPose, a top-down human pose estimation framework that integrates coordinate classification based on keypoint heatmap representation. We design a single-branch network based on a high-resolution architecture, which implicitly retains and fuses multi-scale features. The multi-path network maintains both the simplicity of a single-branch network and the effectiveness of a high-resolution network, resulting in a simpler and more efficient architecture. Based on the high-resolution architecture, we retain only the blocks in the lowest-resolution branch and employ both cross-resolution and same-resolution feature fusion. We redesign an efficient block inspired by the shuffle block, which we call the Channel Expansion Attention Module (CEAM). CEAM compensates for the reduction in channel information caused by channel splitting by introducing a channel scaling module and a channel attention module. We evaluate our model against state-of-the-art top-down methods on the COCO and MPII datasets. Results show that it reduces computational overhead by 20% and improves inference speed by 37%, while achieving accuracy on par with Lite-HRNet.
Keyword: 2D Human Pose Estimation, Lightweight Network, Efficient Block.
Cite@inproceedings{ICIC2025,
author = {Qing Peng and Zhongteng Zhang and Zihao Zhang and Liu Zhang and Jing Chong and Weihong Huang},
title = {MPPose: An Efficient Multi-Path Network for 2D Human Pose Estimation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {712-727},
note = {Poster Volume Ⅰ}
}
- CvdKG: Cardiovascular Disease Knowledge Graph Construction with Cascading Pointer Networks, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yu Song, Bohan Yu, Dezhi Kong, Pengcheng Wu, Shuai Zhang, Xia Liu, Kejun Wu, and Kunli Zhang
Abstract: Cardiovascular disease is a common major chronic disease characterized by a high mortality rate and high difficulty in rehabilitation. This paper constructs a Cardiovascular Disease Knowledge Graph (CvdKG). Based on medical vocabularies and medical knowledge bases, CvdKG defines 15 entity types and 74 relationship types with diseases and operations as the core. A cascading pointer network model (CASREL_pclBERT) suited to the medical field is used to automatically extract knowledge from medical texts, and the extracted knowledge is then manually proofread. Knowledge fusion is carried out based on multi-similarity weighting. The constructed CvdKG includes 217 core cardiovascular diseases, 8,845 related diseases, 433 surgeries, and 68,316 triples. CvdKG can provide data support for intelligent question answering and auxiliary diagnosis of cardiovascular diseases.
Keyword: Knowledge graph, Cardiovascular disease, Cascading pointer network
Cite@inproceedings{ICIC2025,
author = {Yu Song and Bohan Yu and Dezhi Kong and Pengcheng Wu and Shuai Zhang and Xia Liu and Kejun Wu and Kunli Zhang},
title = {CvdKG: Cardiovascular Disease Knowledge Graph Construction with Cascading Pointer Networks},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2269-2282},
note = {Poster Volume Ⅱ}
}
- An Optimized Object Detection Approach on Medical Image using Feature Enhancement and Dynamic Loss, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yingjun Liu, Fuchun Liu, and Yingbin Huang
Abstract: Detection based on medical images is crucial for improving disease cure rates and patient prognosis. However, existing methods have limitations in feature extraction and computational efficiency. This paper presents an optimized method for medical image object detection using feature enhancement and dynamic loss (FENet-UIoU). It combines receptive field attention convolution (RFAConv), an efficient up-sampling convolution block (EUCB), and large separable kernel attention (LSKA) with the UIoU loss function to overcome the limitations of traditional convolutional neural networks (CNNs) in medical image detection. RFAConv highlights tumor features through spatial attention mechanisms, EUCB improves feature map resolution and computational efficiency, LSKA enhances feature capture and expression, and the unified intersection over union (UIoU) loss function uses dynamic weight allocation to optimize prediction box focus. Experimental results show that the proposed model improves mean average precision (mAP) by 7.8% over the baseline. Meanwhile, ablation experiments verify the synergistic effect of each module, showing that this study provides a high-precision and high-efficiency solution for medical image object detection.
Keyword: Medical Image, Object Detection, Feature Enhancement, Dynamic Loss.
Cite@inproceedings{ICIC2025,
author = {Yingjun Liu, Fuchun Liu, and Yingbin Huang},
title = {An Optimized Object Detection Approach on Medical Image using Feature Enhancement and Dynamic Loss},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3338-3353},
}
- Overlapping Community Detection Algorithm Based on Enhanced Label Propagation with Graph Neural Network Optimization, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xiaoliang Zhang, Xiaomeng Zhai, Meng Wang, and Hong Zhang
Abstract: The structure of a community is essential for understanding complex networks, yet detecting communities efficiently and accurately remains a significant challenge. Although the label propagation algorithm offers linear-time complexity, it faces issues with low robustness, high randomness, and a tendency to form overly large communities. To overcome these limitations, we propose an Overlapping Community Detection Algorithm based on Enhanced Label Propagation with Graph Neural Network Optimization (ELP-GNN). Our approach consists of three phases: first, an enhanced label propagation algorithm is employed to identify initial communities by incorporating core node selection and importance-based propagation; second, a Graph Neural Network (GNN) model is trained on the initial communities to learn node embeddings and optimize the community structures; and finally, a fusion strategy is applied to combine the strengths of both methods. We evaluate ELP-GNN on both real-world and synthetic networks, comparing its performance with existing overlapping and non-overlapping community detection algorithms. The experimental results demonstrate that our algorithm outperforms state-of-the-art methods in terms of accuracy and robustness, particularly in complex network structures with high mixing parameters.
Keyword: Community Detection, Label Propagation, Graph Neural Networks, Density Peak Clustering, Graph Computation
Cite@inproceedings{ICIC2025,
author = {Xiaoliang Zhang, Xiaomeng Zhai, Meng Wang, and Hong Zhang},
title = {Overlapping Community Detection Algorithm Based on Enhanced Label Propagation with Graph Neural Network Optimization},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2346-2357},
note = {Poster Volume Ⅱ}
}
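The ELP-GNN abstract above starts from the label propagation algorithm. As a point of reference, here is a minimal sketch of plain asynchronous label propagation. It is not the authors' enhanced variant: it has no core node selection, no importance-based propagation, no GNN refinement, and it yields non-overlapping communities. The function name `label_propagation`, the adjacency-dictionary input, and the tie-breaking rule are assumptions made for illustration.

```python
import random
from collections import Counter


def label_propagation(adjacency, max_iters=100, seed=0):
    """Plain asynchronous label propagation (illustrative baseline only).

    adjacency: dict mapping each node to a list of its neighbours
               (assumed undirected, i.e. edges are listed in both directions).
    Returns a dict mapping each node to its final community label.
    """
    rng = random.Random(seed)
    # Every node starts in its own community.
    labels = {node: node for node in adjacency}
    nodes = list(adjacency)

    for _ in range(max_iters):
        rng.shuffle(nodes)  # random visiting order in each sweep
        changed = False
        for node in nodes:
            neighbours = adjacency[node]
            if not neighbours:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[n] for n in neighbours)
            best = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == best)
            # Assumed tie-break: keep the current label if it is already
            # among the most frequent ones, otherwise pick one at random.
            if labels[node] in candidates:
                continue
            labels[node] = rng.choice(candidates)
            changed = True
        if not changed:
            break  # no label changed during a full sweep
    return labels
```

The random visiting order and random tie-breaking in this sketch are the kind of randomness the abstract identifies as a weakness of the basic algorithm.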
- GSM-AIV: Exposing the Fragile Boundaries of Mathematical Reasoning in LLMs through Contextual Recomposition, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Hualing Liu, Zilong Zhang, Yingxin Hong, Yiwei Guo, Shiqin Gong, and Mengkai Wang
Abstract: This study reveals the semantic contextual vulnerability of large language models in solving mathematical problems. By constructing GSM-AIV, an Algebraic Isomorphic Variants dataset of GSM8K, we find that large language models such as Mistral-7b have an average accuracy improvement of 5.65 percentage points under the condition of retaining the mathematical logical chain but changing the problem context. This phenomenon implies that mathematical reasoning in LLMs relies heavily on surface semantic patterns rather than deep mathematical understanding. We further propose the template-induced bias and attention entropy reduction hypotheses to argue for the phenomenon of loose coupling between the mathematical reasoning ability of the models and the semantic scenarios, which provides a new theoretical perspective on the design of evaluation frameworks.
Keyword: Large Language Model, Natural Language Processing, Mathematical Reasoning.
Cite@inproceedings{ICIC2025,
author = {Hualing Liu, Zilong Zhang, Yingxin Hong, Yiwei Guo, Shiqin Gong, and Mengkai Wang},
title = {GSM-AIV: Exposing the Fragile Boundaries of Mathematical Reasoning in LLMs through Contextual Recomposition},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1111-1121},
}
- PAF-DM: Proposal Alignment Framework for Multimodal Event Extraction via Dynamic Masking, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Hengrui Song and Chun Yuan
Abstract: Multimodal event extraction (MEE) aims to identify and classify event triggers and arguments by jointly modeling textual and visual information. While recent methods have shown promising progress, they often suffer from two key limitations: coarse-grained modality alignment and the scarcity of annotated multimodal data. To alleviate data scarcity, cross-modal data augmentation techniques, such as text-to-image and image-to-text generation, have been explored. However, synthetic data may introduce noise, including hallucinations and artifacts, which can negatively impact model performance. In this work, we propose PAF-DM, a novel framework for MEE that addresses both challenges through a proposal-based alignment paradigm and a dynamic masking strategy. Specifically, we incorporate a Q-Former architecture to achieve fine-grained alignment based on proposals between event-related elements across modalities, and also introduce a three-dimensional dynamic masking mechanism to reduce over-reliance on low-quality synthetic data. Experimental results on the M2E2 benchmark demonstrate that our approach achieves state-of-the-art performance and offers a robust solution for leveraging cross-modal data in MEE.
Keyword: Multimodal Event Extraction, Cross-modal Data Augmentation, Proposal, Dynamic Masking.
Cite@inproceedings{ICIC2025,
author = {Hengrui Song and Chun Yuan},
title = {PAF-DM: Proposal Alignment Framework for Multimodal Event Extraction via Dynamic Masking},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3354-3367},
}
- DDN-GP: Estimating Regression Predictive Distributions with Missing Data, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chaoran Pang, Hua Wang, Shikun Tian, Chen Chen, Wu Xu, and Lin Wang
Abstract: Probability density estimation in time series often encounters missing values, which compromise data completeness and usability, making it difficult to accurately estimate distributions and leading to biased results. To address this challenge, we propose a novel probability density estimation method called DDN-GP, which introduces Gaussian Process (GP) to Deconvolutional Density Networks (DDN). This method uses nonlinear dimensionality reduction, employing GP in the latent space to handle missing input data, and takes advantage of DDN to estimate arbitrarily distributed times in time series even with missing output data, ultimately improving both prediction accuracy and model robustness. We validate DDN-GP on multiple datasets with missing data, and the experimental results demonstrate that our approach enhances predictive performance quantification compared to existing methods.
Keyword: Probability Density Estimation, Time Series, Missing Data, Gaussian Processes.
Cite@inproceedings{ICIC2025,
author = {Chaoran Pang, Hua Wang, Shikun Tian, Chen Chen, Wu Xu, and Lin Wang},
title = {DDN-GP: Estimating Regression Predictive Distributions with Missing Data},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2439-2451},
note = {Poster Volume Ⅱ}
}
- Integration Detection Model for Deep Neural Network Backdoor Attacks, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chunlu Wu, Junjiang He, Wengang Ma, Ping He, Xiaolong Lan, Shixuan Ren, and Tao Li
Abstract: Deep Neural Networks (DNNs) have demonstrated exceptional performance across various domains. However, with the continuous development of adversarial attack techniques, DNNs face increasingly serious security threats. Existing backdoor attack detection methods are primarily designed for specific attack scenarios and often exhibit insufficient effectiveness when confronting complex attack forms such as dynamic sensitivity optimization and randomized obfuscation. This study proposes an Integration Detection Model for Deep Neural Network Backdoor Attacks (ID-Model), aiming to build an integrated detection framework capable of addressing various backdoor attacks. The ID-Model consists of three core components: the feature extraction and analysis module, the integrated detector module, and the data processing and alert module. Experimental results demonstrate that, compared to the STRIP and NNCDA methods, the ID-Model achieves a 19% improvement in detection accuracy under Original-Net and R-Net attacks. This research provides an important theoretical foundation for DNN security defense.
Keyword: Deep Neural Network, Backdoor Attack Detection, Integrated Detection, Security Enhancement
Cite@inproceedings{ICIC2025,
author = {Chunlu Wu, Junjiang He, Wengang Ma, Ping He, Xiaolong Lan, Shixuan Ren, and Tao Li},
title = {Integration Detection Model for Deep Neural Network Backdoor Attacks},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1695-1711},
note = {Poster Volume Ⅱ}
}
- ED-GCAE: Efficient and Adaptive Disentanglement via Shared Features and Dynamic Noise Injection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xingshen Zhang, Hong Pan, Bin Chai, Lin Wang, Bo Yang, and Shuangrong Liu
Abstract: While Autoencoder-based methods have become a dominant framework in disentangled representation learning, their reliance on simplistic Gaussian density estimation significantly limits disentanglement performance. The Gaussian Channel Autoencoder (GCAE) was introduced to address density estimation flexibility, yet it suffers from high computational costs due to its independent discriminator architecture and from sensitivity to noise. To overcome these challenges, we propose ED-GCAE, a novel framework designed to improve the efficiency and dynamic adaptability of GCAE. ED-GCAE incorporates a shared feature extraction backbone into the discriminator architecture, significantly enhancing computational efficiency and training stability. Concurrently, we introduce a dynamic latent-variable-dependent noise injection mechanism to balance disentanglement and stability. Experiments show that ED-GCAE outperforms baseline methods, achieving better disentangled representations while exhibiting enhanced training stability and computational efficiency.
Keyword: Disentanglement Representation Learning, Representation Learning, Deep Learning.
Cite@inproceedings{ICIC2025,
author = {Xingshen Zhang, Hong Pan, Bin Chai, Lin Wang, Bo Yang, and Shuangrong Liu},
title = {ED-GCAE: Efficient and Adaptive Disentanglement via Shared Features and Dynamic Noise Injection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2455-2467},
note = {Poster Volume Ⅱ}
}
- JLGS-CAD: CAD Reconstruction Based on Joint Learning for Geometry and Sequence, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Tianzhou Han and Fazhi He
Abstract: Achieving both high accuracy and greater similarity to the modeling process of human engineers in CAD reconstruction tasks is a challenging problem. In this paper, we propose JLGS-CAD, a neural network designed based on joint learning, aiming to jointly maximize the accuracy of model reconstruction and the quality of the modeling sequence. Based on the characteristics of the CAD modeling process, we divide the reconstruction task between two models: the Extrusion Model, responsible for the geometric accuracy of the reconstructed shape, and the Sketch Model, responsible for the quality of the modeling sequence. We adopt a hybrid supervision approach to enable joint learning of both sequences and geometry in the two models. This method significantly improves the quality of the modeling sequence while maintaining the precision of the reconstructed geometry, allowing the network to produce results more aligned with human modeling workflows. Our training pipeline consists of two stages: a supervised pre-training stage on a large-scale dataset with sequence annotations and a self-supervised fine-tuning stage on a target dataset without sequence labels. This reduces the network's dependency on large annotated CAD modeling datasets. Experiments conducted on the ABC and Fusion 360 datasets demonstrate the effectiveness of our method. JLGS-CAD accurately recovers geometric details and constructs editable and creative modeling workflows, showing clear advantages over state-of-the-art alternatives.
Keyword: CAD Reconstruction, Joint Learning, Sequence Generation, Hybrid Supervision, Model Finetune, Multimodel Learning
Cite@inproceedings{ICIC2025,
author = {Tianzhou Han and Fazhi He},
title = {JLGS-CAD: CAD Reconstruction Based on Joint Learning for Geometry and Sequence},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3369-3385},
}
- LLM-Based Immune Detection Method for Unknown Network Attacks in ICS Under Few-Shot Conditions, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Hao Wu, Jiangchuan Chen, Wengang Ma, Ping He, Xiaolong Lan, Tao Li, and Junjiang He
Abstract: The rapidly evolving landscape of unknown network attacks has significantly expanded the range of cyber threats. However, existing intrusion detection systems (IDS) primarily rely on large amounts of known attack samples for model training and can only effectively detect known network attacks, particularly in industrial control system (ICS) environments, where obtaining attack samples is extremely difficult. In this paper, inspired by artificial immune systems (AIS) and large language models (LLM), we propose an LLM-based immune detection method, named LLM-IDS, for identifying unknown network attacks in ICS under few-shot conditions. The artificial immune system, as a biologically inspired intelligent algorithm, inherently possesses the ability to identify unknown threats. Meanwhile, LLM, with its strong reasoning ability, can deeply explore the latent spatial feature information even with limited training samples. Specifically, we first map network attack data to the antigen space of the artificial immune system. Then, we design a specialized prompt template to guide the LLM in learning and analyzing the spatial distribution features of nonself antigens, thereby capturing the latent space feature distribution information. Finally, we generate immune space detectors under the guidance of LLM and activate them through tolerance mechanisms. Extensive experiments on multiple datasets demonstrate that LLM-IDS exhibits superior performance in detecting both known and unknown cyberattacks, significantly outperforming current mainstream IDS research achievements.
Keyword: Intrusion Detection System, Large Language Model, Artificial Immune System, Unknown Cyber Attacks
Cite@inproceedings{ICIC2025,
author = {Hao Wu, Jiangchuan Chen, Wengang Ma, Ping He, Xiaolong Lan, Tao Li, and Junjiang He},
title = {LLM-Based Immune Detection Method for Unknown Network Attacks in ICS Under Few-Shot Conditions},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2532-2549},
}
- Real-Time Control Method for Resource-Limited HAUV Based on Dual-Modal Dynamic Triggering, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Gehan Zhu, Qi Guo, Zhaoyang Wang, and Bo Xu
Abstract: Hybrid Aquatic-Aerial Underwater Vehicles (HAUVs) face significant control challenges in scenarios involving abrupt medium transitions, actuator constraints, and limited computational resources, including sudden dynamic changes, multimodal coupling, and insufficient real-time responsiveness. To address the actuator saturation issue inherent in traditional proportional-integral-derivative (PID) control and the high computational load of model predictive control (MPC), this paper proposes an event-triggered disturbance observer-based tiny model predictive control (ET-TMPC) method. First, a HAUV rigid-body dynamic model is established, where lumped disturbances during cross-medium transitions are estimated using a nonlinear disturbance observer. Second, the MPC optimization process is restructured by integrating the alternating direction method of multipliers (ADMM) with precomputation techniques, significantly reducing online computational complexity. Furthermore, a dual-modal FAL dynamic triggering strategy is introduced, which dynamically adjusts triggering thresholds for disturbance errors and state errors through FAL, thereby achieving co-optimization of control performance and resource efficiency in cross-domain trajectory tracking. Simulation results demonstrate that, compared to conventional PID and standard MPC, ET-TMPC substantially enhances trajectory tracking stability and anti-disturbance capability during water-to-air transition phases while effectively suppressing attitude fluctuations and reducing computational load.
Keyword: Event-Triggered, Model Predictive Control, Disturbance Observer, Alternating Direction Method of Multipliers
Cite@inproceedings{ICIC2025,
author = {Gehan Zhu, Qi Guo, Zhaoyang Wang, and Bo Xu},
title = {Real-Time Control Method for Resource-Limited HAUV Based on Dual-Modal Dynamic Triggering},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2610-2625},
}
- SAMCA: Segment Anything Model with Double Click Training and Shared Weight Adapter for Medical Ultrasound Image Segmentation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: YiRu Huo, YiChen Shi, Jun Feng, Liu Yang, and Na Liu
Abstract: Segmentation of medical ultrasound images is crucial for clinical diagnosis. However, challenges such as low contrast and blurred boundaries make obtaining large-scale labeled data for model training difficult. The Segment Anything Model (SAM), excelling at prompt-based segmentation in natural images, shows promise for ultrasound applications. In light of this, we propose SAMCA, a promptable medical ultrasound image segmentation model. SAMCA incorporates a shared weight adapter designed to efficiently transfer information between layers, allowing SAM to adapt to the complexities of medical ultrasound imaging. Additionally, we introduce a double click training strategy, where the first set of click prompts is used to provide guidance information for the initial target area, and the second set focuses on correcting local errors in the segmentation error-prone areas. A dynamic fusion mechanism ensures that the second set leverages the global context of the first set during refinement. Experimental comparisons with classic and recent segmentation networks demonstrate that SAMCA achieves state-of-the-art (SOTA) performance on the challenging TN3K and BUSI datasets, with DSC scores of 86.36% and 89.55%, respectively. Moreover, SAMCA is significantly more lightweight, requiring only 3% of parameter updates compared to SAM-Med2d. Our code will be publicly available.
Keyword: Medical ultrasound image segmentation, Segment anything model, Shared weight adapter, Double click training.
Cite@inproceedings{ICIC2025,
author = {YiRu Huo, YiChen Shi, Jun Feng, Liu Yang, and Na Liu},
title = {SAMCA: Segment Anything Model with Double Click Training and Shared Weight Adapter for Medical Ultrasound Image Segmentation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {728-744},
note = {Poster Volume Ⅰ}
}
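The SAMCA abstract above reports its results as DSC scores. As a minimal sketch of the standard Dice similarity coefficient, 2|A∩B| / (|A| + |B|), computed on flattened binary masks: the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two flat binary masks.

    pred, target: equal-length sequences of 0/1 (or False/True) values.
    Returns a float in [0, 1]; multiply by 100 for a percentage.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)

    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# One overlapping pixel out of two foreground pixels in each mask.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])  # 2 * 1 / (2 + 2)
```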
- Two-stage occlusion giant panda image inpainting based on partial convolutions, multi-scale contextual attention and a new PatchGAN with two discriminators, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xingchen Dong, Zhiwu Liao, and ChenPeng
Abstract: Image-based individual recognition of giant pandas in real wild scenes suffers from difficulties such as occlusion and multiple postures. Image inpainting is an important preprocessing step for handling occlusion in giant panda images. A two-stage inpainting model for giant panda images based on partial convolutions and multi-scale features is proposed. In the coarse inpainting stage, the structural information of the occluded part is restored, while the fine inpainting stage focuses on restoring details such as textures and edges based on the coarse result. A new PatchGAN with two discriminators in the coarse inpainting stage balances two-scale information guided by the WGAN-GP loss and L1 norm. The generator of the new PatchGAN uses partial convolutions to avoid the propagation and influence of misinformation in the occluded area. In the fine inpainting stage, a newly proposed module that fuses multi-scale features by contextual attention is added to the PatchGAN. The fine inpainting model learns and enhances the texture and details of the image output by the coarse inpainting stage through a global search for multi-scale similar image patches using multi-scale contextual attention. Thus, it can strengthen the semantic connection between occluded and real image regions. To achieve better multi-scale inpainting results, a perceptual loss and a style loss are added to the adversarial loss. Compared with state-of-the-art methods, the proposed method can effectively restore image textures and details while suppressing noise and visual artifacts. The PSNR, SSIM and FIN of the proposed method reach 35.69, 0.971 and 5.22 respectively, indicating that the proposed method obtains satisfactory inpainting results.
Keyword: image inpainting, dual discriminator, multi-scale context attention, partial convolution, PatchGAN.
Cite@inproceedings{ICIC2025,
author = {Xingchen Dong, Zhiwu Liao, and ChenPeng},
title = {Two-stage occlusion giant panda image inpainting based on partial convolutions, multi-scale contextual attention and a new PatchGAN with two discriminators},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2114-2131},
note = {Poster Volume Ⅱ}
}
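The inpainting abstract above reports a PSNR value among its metrics. As a minimal sketch of the standard peak signal-to-noise ratio, 10·log10(MAX² / MSE), on flattened pixel sequences: the function name `psnr` and the 8-bit default peak value of 255 are assumptions for illustration, and the paper's exact evaluation protocol is not stated in the abstract.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in decibels between two flat pixel sequences.

    reference, restored: equal-length, non-empty sequences of pixel intensities.
    max_value: largest possible pixel value (255 assumed for 8-bit images).
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded

    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a restored image closer to the reference.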
- Adaptive Parameter Control in Particle Swarm Optimization Based on Proximal Policy Optimization, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Ruiqi Fan, Lisong Wang, Shaohan Liu, Liang Liu, Fengtao Xu, Yizhuo Sun
Abstract: The performance of Particle Swarm Optimization algorithms when solving complex problems is highly dependent on parameter settings and prone to premature convergence. This paper proposes a Reinforcement Learning-based adaptive multi-subgroup PSO framework, termed PPOPSO, designed to enable an RL agent to learn online and dynamically adjust the core behavioral parameters of multiple parallel PSO subgroups. This framework utilizes Proximal Policy Optimization as the RL agent. Its decision-making process is informed by a state representation that incorporates rich historical performance indicators. Furthermore, it independently selects actions for each subgroup from a predefined set of parameter configurations, each representing different search strategies. The controlled multi-subgroup PSO system integrates a periodic elite particle migration mechanism to foster information sharing and maintain diversity among the subgroups. This design transforms the parameter adaptation challenge into a sequential decision-making process. This allows the system to autonomously balance exploration and exploitation based on the optimization stage and the state of each subgroup. Preliminary experimental results on the CEC2013 standard test function set indicate that, through dynamic parameter adjustment empowered by reinforcement learning, the proposed PPOPSO framework can exhibit superior performance compared to traditional methods, offering a promising new approach for complex optimization problems.
Keyword: Particle swarm optimization, Reinforcement Learning, Proximal Policy Optimization.
Cite@inproceedings{ICIC2025,
author = {Ruiqi Fan, Lisong Wang, Shaohan Liu, Liang Liu, Fengtao Xu, Yizhuo Sun},
title = {Adaptive Parameter Control in Particle Swarm Optimization Based on Proximal Policy Optimization},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2501-2518},
}
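The PPOPSO abstract above has an RL agent choose the behavioral parameters of each PSO subgroup. As a minimal sketch of where such parameters enter, here is a plain single-swarm PSO with a fixed inertia weight `w` and acceleration coefficients `c1` and `c2`. It does not include the PPO agent, multiple subgroups, or elite migration; the function name `pso_minimize`, its signature, and the example parameter values are all assumptions made for illustration.

```python
import random


def pso_minimize(objective, dim, bounds, w, c1, c2,
                 n_particles=20, iters=100, seed=0):
    """Basic global-best PSO with fixed parameters (illustrative only).

    w: inertia weight; c1, c2: cognitive and social coefficients. These are
    the kind of settings an external controller could switch between.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [objective(p) for p in pos]     # personal best values
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Position update, clamped to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical configuration on a 2-D sphere function.
best, value = pso_minimize(lambda x: sum(v * v for v in x),
                           dim=2, bounds=(-5.0, 5.0), w=0.7, c1=1.5, c2=1.5)
```

A larger `w` favors exploration and a smaller one favors exploitation, which is the trade-off the abstract says the agent balances per subgroup.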
- SpikeRWKV: Energy-efficient Large Language Model with Spiking Neural Network, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yulu Zhang, Qianzi Shen, and Zijian Wang
Abstract: Spiking Neural Networks (SNNs), as the third generation of neural networks, hold great promise for enhancing the energy efficiency of large language models (LLMs) due to their event-driven computation. However, their naive application in large-scale models typically depends on binary spike simulations over long time steps, making it challenging to balance performance and energy consumption. To address this issue, we propose a Multi-head Spike Encoding scheme with three advantages. First, it enables parallel spike processing to accelerate computation. Second, it supports precise representation of positive and negative spikes. Third, it mitigates energy surges caused by high-frequency spikes through hierarchical spike decomposition. To demonstrate the effectiveness of our encoding scheme, we introduce SpikeRWKV, an SNN-based adaptation of the RWKV language model. Experimental results demonstrate that SpikeRWKV significantly enhances performance on natural language understanding (NLU) tasks, achieving a 3.15× reduction in energy consumption compared to the baseline, along with an 8.3% lower perplexity and a 5.7% improvement in bits-per-character (BPC). Furthermore, SpikeRWKV is 3.88× more energy-efficient than its non-spiking counterpart.
Keyword: spiking neural networks, energy efficiency, spike encoding scheme
Cite@inproceedings{ICIC2025,
author = {Yulu Zhang, Qianzi Shen, and Zijian Wang},
title = {SpikeRWKV: Energy-efficient Large Language Model with Spiking Neural Network},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {661-674},
}
- HICCNN: A Hierarchical Approach to Enhancing Interpretability in Convolutional Neural Networks, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yinze Luo, Yuxiang Luo, Bo Peng, and Lijun Sun
Abstract: Convolutional Neural Networks (CNNs) frequently exhibit limited interpretability, which presents significant challenges to their deployment in high-stakes applications. Although existing methods such as ICCNN incorporate interpretability mechanisms, these approaches are typically confined to a single network layer and thus fail to capture the hierarchical nature of visual semantics. To overcome this limitation, we propose Hierarchical Interpretable Compositional Convolutional Neural Networks (HICCNN), a novel approach that facilitates layer-wise hierarchical interpretability without requiring any modifications to the original network architecture. Specifically, our method allows CNNs to learn semantically meaningful and fine-grained features in a structured hierarchy, thereby achieving a closer alignment with human visual cognition. Extensive quantitative experiments demonstrate that our model not only offers superior interpretability compared to existing methods but also enhances classification performance, particularly in complex multi-class tasks, by effectively leveraging the hierarchical compositional structure of the learned features. Moreover, we compare our method against Grad-CAM and demonstrate that our model achieves comparable semantic localization quality while offering built-in interpretability during inference, thereby eliminating the need for additional post-hoc explanation modules.
Keyword: Hierarchical interpretability, Neural network interpretability, Convolutional neural networks
Cite@inproceedings{ICIC2025,
author = {Yinze Luo, Yuxiang Luo, Bo Peng, and Lijun Sun},
title = {HICCNN: A Hierarchical Approach to Enhancing Interpretability in Convolutional Neural Networks},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1711-1725},
note = {Poster Volume Ⅱ}
}
- HTLNet: A Segmentation-Free Multi-View Approach for Robust 3D Tooth Landmark Localization, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chentao Wang, Jing Du, Ran Fan, and Fuchang Liu
Abstract: Tooth landmark localization plays a pivotal role in digital orthodontics, providing the computational foundation for generating alignment coordinates and guiding precise treatment planning. However, the limited availability of high-quality 3D tooth landmark datasets and the prevalent reliance on segmentation-based methods hinder the accuracy and scalability of current approaches. In this work, we manually annotated a publicly available dental dataset to construct a benchmark 3D tooth landmark dataset, which provides a foundation for robust evaluation in real-world clinical scenarios. To overcome the limitations of existing methods, we propose HTLNet (Heatmap-based Tooth Landmark Localization Network), a novel segmentation-free localization framework based on multi-view 2D heatmap regression. HTLNet eliminates the dependency on prior segmentation and reduces error propagation in the processing pipeline. Experimental results demonstrate that HTLNet outperforms state-of-the-art 3D models, such as PointNet-Reg, in terms of accuracy and robustness, especially under challenging conditions such as missing teeth or misaligned dentition. Our method provides a generalizable, scalable, and efficient solution, making it well-suited for integration into intelligent dental digital systems and advancing the application of computer vision technologies in digital healthcare.
Keyword: Neural networks, Heatmap regression, Multi-view learning, 3D landmark localization, Segmentation-free, Orthodontic applications, Tooth landmark datasets.
Cite@inproceedings{ICIC2025,
author = {Chentao Wang, Jing Du, Ran Fan, and Fuchang Liu},
title = {HTLNet: A Segmentation-Free Multi-View Approach for Robust 3D Tooth Landmark Localization},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1726-1742},
note = {Poster Volume Ⅱ}
}
- CFA-FSOD: Context-aware Feature Aggregation for Few-Shot Object Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Huajie Xu, Haikun Liao, Qiukai Huang, and Ganxiao Nong
Abstract: Few-shot object detection (FSOD) aims to detect novel categories from only a few labeled samples. Most meta-learning based FSOD methods tend to rely on static support features, which lack adaptability to query contexts and have limited representational power, and they often underutilize class-specific features when refining proposals to improve detection performance. To address these challenges, we propose a novel Context-aware Feature Aggregation for FSOD (CFA-FSOD) that enhances interaction in a support-query bidirectional manner. Concretely, a Query-guided Support Enhancement (QSE) module is proposed to adaptively integrate features from query image regions (typically proposals) into support features to enhance their flexibility; meanwhile, a Cross-attention Feature Modulation (CFM) module is proposed to leverage the enhanced support features to refine query proposals for fine-grained alignment. Experimental results on both Pascal VOC and MS COCO demonstrate that CFA-FSOD achieves outstanding performance in most evaluation settings, benefiting from its bidirectional interaction mechanism that improves the efficiency of sample utilization and the transfer of category-specific features.
Keyword: Few-shot Object Detection, Meta Learning, Feature Aggregation
Cite@inproceedings{ICIC2025,
author = {Huajie Xu, Haikun Liao, Qiukai Huang, and Ganxiao Nong},
title = {CFA-FSOD: Context-aware Feature Aggregation for Few-Shot Object Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3385-3399},
}
- MCSTA: Multi-dimensional Collaborative Spatial-Temporal Attention Model for Traffic Flow Prediction, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Dazhi Zhao, Jinlai Zhang, Kejia Wang, and Wenguang Wu
Abstract: Traffic flow prediction is critical for the effective management and public safety of modern cities; however, it remains a challenging task. The intricate spatiotemporal dependencies in traffic data and the trade-off between computational efficiency and predictive accuracy in existing models have long been key challenges. To address these issues, we propose a novel attention-based model built upon the Transformer architecture, termed the Multi-dimensional Collaborative Spatial-Temporal Attention Model (MCSTA). Our model introduces several innovations: first, we design a Lightweight Multi-dimensional Cooperative Enhanced Attention (LMCEA) mechanism to capture spatiotemporal relationships across multiple dimensions. Additionally, we propose Non-dimensionality Reduction Local Cross-Channel Attention (NDLCCA), which leverages 1D convolution to model local cross-channel interactions while circumventing dimensionality reduction operations. This approach significantly reduces computational complexity, enhances the utilization of inter-channel information, accurately captures correlations among channels, and ultimately provides more discriminative feature representations. Experimental evaluations on two real-world traffic datasets demonstrate that MCSTA outperforms state-of-the-art (SOTA) models. Compared to the baseline model, our approach achieves RMSE reductions of 3.43%, 6.63%, and 13.01% on the NYCBike dataset and 6.13%, 7.24%, and 7.49% on the NYCTaxi dataset, respectively.
Keyword: Traffic Flow Prediction, Transformer, Lightweight Attention, Cross-Channel Attention
Cite@inproceedings{ICIC2025,
author = {Dazhi Zhao, Jinlai Zhang, Kejia Wang, and Wenguang Wu},
title = {MCSTA: Multi-dimensional Collaborative Spatial-Temporal Attention Model for Traffic Flow Prediction},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1881-1898},
note = {Poster Volume Ⅱ}
}
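The MCSTA abstract above describes NDLCCA only at a high level: a 1D convolution over channels that avoids dimensionality reduction. As a rough illustration of that general idea (not the authors' implementation), the sketch below pools each channel to one scalar, lets each channel interact only with its neighbours through a small 1D kernel, and gates the channels with a sigmoid. The function name, the zero padding, and the kernel values are assumptions made for this example.

```python
import math


def channel_attention(feature_maps, kernel):
    """Reweight channels with a 1D convolution over per-channel averages.

    feature_maps: list of channels, each a flat list of floats.
    kernel: odd-length list of 1D convolution weights (hypothetical values).
    Returns (weights, scaled_feature_maps).
    """
    # Global average pooling: one scalar descriptor per channel.
    descriptors = [sum(ch) / len(ch) for ch in feature_maps]

    # Zero-pad so every channel has a full neighbourhood.
    pad = len(kernel) // 2
    padded = [0.0] * pad + descriptors + [0.0] * pad

    weights = []
    for i in range(len(descriptors)):
        # Local cross-channel interaction; the channel count is never reduced.
        z = sum(k * padded[i + j] for j, k in enumerate(kernel))
        weights.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid gate

    scaled = [[w * v for v in ch] for w, ch in zip(weights, feature_maps)]
    return weights, scaled
```

Because the kernel slides over the channel axis directly, the number of learnable weights depends on the kernel size rather than on the number of channels.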
- Traffic Flow Prediction Using Multi-Scale Convolution and Attention Mechanisms, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Pengfei Qi, Jinlai Zhang, Chulin Li, Linlong Lei, Wei Hao, and Xiong Jiang
Abstract: Traffic flow prediction is a critical task in intelligent transportation systems, significantly improving the efficiency of traffic management and scheduling. However, the complexity and diversity of traffic flow data pose substantial challenges to existing prediction methods. In particular, the frequent temporal variations and spatial characteristics in complex spatiotemporal data are difficult to handle effectively. To address this issue, this paper proposes MCANet, a novel prediction model designed to capture the intricate spatiotemporal features in traffic flow prediction through the integration of multi-scale convolution and attention mechanisms. Specifically, we introduce the Large Kernel Decomposition and Spatio-Temporal Selection (LKD-STS) module to enhance the model's ability to extract multi-scale features in traffic flow prediction, enabling it to better capture traffic patterns at different temporal scales. Additionally, we propose the Global Channel Spatial Attention (GCSA) module to improve the model's capability in capturing multi-scale traffic features and preserving spatial-channel information. Furthermore, we introduce the Partial Convolution Batch-normalization GELU (PCBG) module, which reduces redundant computations and memory access through partial convolution techniques, thereby enhancing the model's efficiency. Compared to the baseline model and other state-of-the-art (SOTA) traffic flow prediction models, MCANet demonstrates superior performance on the Flight and Traffic datasets. Notably, MCANet efficiently captures complex spatiotemporal features, maintaining stable performance in high-frequency traffic flow prediction tasks. Experimental results show that MCANet excels in traffic flow prediction tasks with varying prediction horizons. In particular, for a prediction length of T=96, MCANet outperforms SOTA models such as MSGNet and TimesNet, with Mean Squared Error (MSE) reductions of 2.7% and 5.7%, respectively.
Keyword: Multi-Scale Convolution, Attention Mechanism, Traffic Flow Prediction, Time Series.
Cite@inproceedings{ICIC2025,
author = {Pengfei Qi, Jinlai Zhang, Chulin Li, Linlong Lei, Wei Hao, and Xiong Jiang},
title = {Traffic Flow Prediction Using Multi-Scale Convolution and Attention Mechanisms},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1742-1758},
note = {Poster Volume Ⅱ}
}
- GEMN: A Novel Forest Fire Detection Network, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yong Liu, Shaochen Jiang, and Yongming Li
Abstract: Forest fires have had a significant impact on global ecosystems and human societies, necessitating the development of efficient and accurate early smoke detection technologies to combat fires. However, current smoke detection technologies face multiple challenges in real-time applications, including large parameter sizes, high computational complexity, and low detection accuracy in complex scenarios. Therefore, based on YOLOv10, we propose a lightweight, high-precision, real-time smoke detection network called GEMN. Firstly, to reduce the extraction of redundant features, we designed a GGCA attention mechanism. This mechanism significantly enhances the comprehensiveness of the model's feature extraction by strengthening the representation of important features. Secondly, to lower the computational complexity and parameter count of the model, we introduced a lightweight detection head named EISDH. Thirdly, we incorporated the MPDIoU function. This not only enhances the model's robustness but also simplifies the process of extracting unnecessary features from forest fire targets, further reducing the parameter count in the network model. GEMN demonstrates exceptional performance across three testing benchmark datasets. Notably, on the FFES dataset, compared to the baseline model, our GEMN model achieves a remarkable 0.991 mAP, an improvement of 2.5%, reaching an accuracy of 96.8%, an increase of 3.7%. Meanwhile, it compresses the parameters to 4.0 MB and shortens the inference time to 1.0 milliseconds, an approximately 30% improvement over the original model.
Keyword: Image Processing · YOLOv10 · Forest Fires
Cite@inproceedings{ICIC2025,
author = {Yong Liu, Shaochen Jiang, and Yongming Li},
title = {GEMN: A Novel Forest Fire Detection Network},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {745-758},
note = {Poster Volume Ⅰ}
}
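The GEMN abstract above names the MPDIoU function without defining it. The sketch below follows the formulation as it is commonly stated: IoU minus the squared distances between the two boxes' top-left corners and between their bottom-right corners, each normalised by the squared diagonal of the input image. This reading is an assumption; the abstract does not confirm which variant the authors use, and the function name and box layout `(x1, y1, x2, y2)` are chosen here for illustration.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU-style similarity between two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU from the intersection rectangle.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared corner distances: top-left and bottom-right.
    d1 = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    d2 = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2

    # Normalise by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    return iou - d1 / norm - d2 / norm
```

A regression loss would typically be taken as one minus this value, so that non-overlapping boxes still receive a gradient from the corner-distance terms.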
- A groundbreaking and innovative data privacy protection framework, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Binghui Liu
Abstract: In scenarios like smart healthcare, smart communities, and smart buildings, data collected by Internet of Things (IoT) devices often pertains to user privacy. However, due to the limited computing power and storage capacity of IoT devices, the data of the Data Subject (DS) are generally stored in the cloud, causing the DS to lose control over their data and increasing the risk of privacy leakage. Additionally, resource-constrained IoT devices often cannot afford the cost of encryption. In this paper, we propose PPFID, an efficient privacy-preserving framework that respects the DS's intentions. Specifically, PPFID enforces isolated computation and permission control via secure enclaves of Intel SGX on centrally aggregated data, and encrypts data to guarantee confidential access, computation, and delivery throughout the entire life of the data. To support fine-grained access control with the wishes of the DS at its core, we design the Privacy Metadata-Based Access Control (PMBAC) model, which considers the wishes of the DS when making access control decisions for each piece of data. Compared to other schemes, PPFID provides more data processing methods and introduces access control schemes that are both strongly isolated and respectful of the DS's rights. We implemented PPFID on Intel SGX and an embedded device and evaluated its feasibility. Our evaluation shows that PMBAC can process an access request in the enclave in just 140 ms, meeting the DS's real-time requirements. Although the computing time increases compared to non-protected environments, the prediction accuracy of VGG19 and CNN remains essentially the same. Experimental results demonstrate that PPFID is applicable in general IoT scenarios involving users' privacy data, and can ensure the confidentiality, integrity, and availability of data while respecting the wishes of the DS.
Keyword: Internet of Things · IoT Privacy Preserving · Intel SGX
Cite@inproceedings{ICIC2025,
author = {Binghui Liu},
title = {A groundbreaking and innovative data privacy protection framework},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1184-1196},
note = {Poster Volume Ⅰ}
}
- GafRel: A Joint Entity and Relation Extraction Framework for Chinese Electronic Medical Records with Multidimensional Semantic Enhancement, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chenyang He, Shudong Xia, and Jijun Tong
Abstract: Relation extraction from electronic medical records (EMRs) is essential for advancing biomedical information systems, but it remains challenging due to nested entity structures and limited contextual representations. To address these issues, we propose GafRel, a novel joint entity and relation extraction framework for Chinese EMRs. GafRel extends CasRel by integrating a Global Pointer to better handle nested entity recognition. Additionally, we design a Multi-dimensional Feature Enhancement Layer (MFEL), which enables multi-scale contextual modeling through semantic fusion of both local and global features. This architecture enhances the capacity of the model to capture local continuity and long-range dependencies. To address the lack of relation extraction datasets in Chinese EMRs, we construct DiaRel, a new dataset derived from EMRs of 608 hospital patients. Experiments on CMeIE-v2, DiaKG, and DiaRel demonstrate the strong performance of our method, where GafRel outperforms existing baselines with F1 scores of 53.38%, 53.41%, and 83.98% on the three datasets, respectively. These results highlight the effectiveness of GafRel in extracting complex relations from EMRs and its potential for advancing biomedical information extraction.
Keyword: Chinese electronic medical records, Relation extraction, Global pointer, Multidimensional Feature Enhancement Layer.
Cite@inproceedings{ICIC2025,
author = {Chenyang He, Shudong Xia, and Jijun Tong},
title = {GafRel: A Joint Entity and Relation Extraction Framework for Chinese Electronic Medical Records with Multidimensional Semantic Enhancement},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1122-1138},
}
- CBRRel: A Chinese Medical Entity Relationship Extraction Model Combining Location Aware Attention and Feature Fusion, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Chuanxia Lin, Shudong Xia, and Jijun Tong
Abstract: Entity-relation extraction from Chinese electronic medical records (EMRs) is essential for constructing medical knowledge graphs and enabling intelligent diagnosis and clinical decision-making. However, the presence of complex sentence structures, overlapping entities, and sparse annotations poses significant challenges. To address these issues, we propose CBRRel, a joint extraction model optimized for Chinese EMRs. The model integrates a UNet-based semantic fusion module to enhance multi-scale representation learning and improve boundary detection for complex entities. To further strengthen structural understanding, we introduce a relative position attention mechanism that effectively captures positional dependencies between entity pairs. In addition, we apply Fast Gradient Method (FGM) adversarial training to improve robustness against input perturbations. Experimental results on the CACMeD dataset show that CBRRel achieves 80.67% precision, 74.13% recall, and a 77.26% F1 score. On the DuIE public dataset, it achieves an F1 score of 76.74%, demonstrating strong capability in handling overlapping and complex relation scenarios. These results highlight the effectiveness of CBRRel and its potential for practical medical information extraction.
Keyword: Chinese EMRs, Entity-relation extraction, Relative position attention, Adversarial Training, UNet semantic fusion.
Cite@inproceedings{ICIC2025,
author = {Chuanxia Lin, Shudong Xia, and Jijun Tong},
title = {CBRRel: A Chinese Medical Entity Relationship Extraction Model Combining Location Aware Attention and Feature Fusion},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1139-1154},
}
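The CBRRel abstract above mentions Fast Gradient Method (FGM) adversarial training. A minimal sketch of the core step, assuming the usual formulation in which the embedding is perturbed along the L2-normalised loss gradient, is shown below. The gradient is passed in as a plain list, and the function name and the default `epsilon` are illustrative assumptions rather than values from the paper.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Return the FGM perturbation r = epsilon * g / ||g||_2 for gradient g."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # No gradient signal: leave the embedding untouched.
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def perturb_embedding(embedding, grad, epsilon=1.0):
    """Add the FGM perturbation to an embedding vector."""
    delta = fgm_perturbation(grad, epsilon)
    return [e + d for e, d in zip(embedding, delta)]
```

In a training loop, the model would be run once more on the perturbed embedding and the adversarial loss added to the clean loss before the optimiser step; the original embedding is then restored.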
- Cross-Modal Dependable Subjective Learning for Sketch Person Re-identification, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Junjie Huang, Chuang Li, and Zhihong Sun
Abstract: Sketch-based person re-identification (Sketch Re-ID) enables suspect retrieval when camera images are unavailable by leveraging sketches drawn from human memory. However, the subjectivity in sketches often introduces significant style variation, making it difficult to extract reliable cross-modal features. To address this challenge, we propose a novel Cross-Modal Dependable Subjective Learning (CMDSL) framework. It consists of a Flexible Feature Aggregation Module (FFAM) that removes style noise via instance normalization and captures dependable subjective semantics through attention-enhanced residual learning, and a Recognisable Target Centroid Loss (RTCL) that strengthens discriminability and alignment across modalities. Experiments on the MARKET-SKETCH-1K and PKU-Sketch datasets demonstrate that our approach effectively captures consistent subjective cues and achieves state-of-the-art performance under diverse sketch styles.
Keyword: Person re-identification, Sketch retrieval, Subjective understanding, Dependable Subjective Features, Target Centroid Loss
Cite@inproceedings{ICIC2025,
author = {Junjie Huang, Chuang Li, and Zhihong Sun},
title = {Cross-Modal Dependable Subjective Learning for Sketch Person Re-identification},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3400-3416},
}
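The CMDSL abstract above states that style noise is removed via instance normalization. As a reminder of what that operation does, here is a minimal sketch for a single channel of a single sample: the values are shifted to zero mean and scaled to unit variance, which discards per-instance contrast and brightness statistics. The flat-list representation and the `eps` value are assumptions for illustration; the paper's FFAM module is not reproduced here.

```python
import math


def instance_norm(channel, eps=1e-5):
    """Normalise one channel of one sample to zero mean and unit variance."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return [(v - mean) / math.sqrt(var + eps) for v in channel]
```

Two sketches of the same person drawn with different stroke intensity would map to similar normalised responses, which is the intuition behind using it against style variation.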
- Research on Motor Optimization Method for Seeder System Based on IWMA-RBF, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Xu Chen, Qinglong Meng, Ruirui Sun, Xudong Miao, Fengqi Hao, Qingyan Ding, and Jinqiang Bai
Abstract: To address the nonlinearity and strong coupling of seeder motor systems, as well as the insufficient dynamic response and weak anti-disturbance capability of traditional control methods, this study proposes an active disturbance rejection control (ADRC) method based on a radial basis function (RBF) neural network optimized by an improved whale migration algorithm (IWMA). First, building upon the whale migration algorithm (WMA), a leader proportion dynamic adjustment mechanism is introduced to optimize the population structure through a nonlinear attenuation function. Second, the global search capability is enhanced by integrating a hybrid guidance strategy and a Lévy flight perturbation mechanism, thereby constructing the IWMA algorithm with high-efficiency optimization performance. Third, the IWMA is combined with the RBF neural network to collaboratively optimize the RBF network's center vectors, kernel width, and output weights, forming an IWMA-RBF parameter self-tuning framework. Furthermore, an IWMA-RBF-based ADRC controller is designed. The dynamic compensation capability for disturbances such as sudden soil resistance changes is strengthened through an improved extended state observer (ESO), and multi-objective optimization of the nonlinear state error feedback (NLSEF) gain parameters is achieved using the IWMA-RBF algorithm. Simulation experiments demonstrate that, compared to traditional PID, ADRC, and RBF-ADRC controllers, the IWMA-RBF-ADRC controller significantly improves control accuracy, response speed, and robustness in the motor control system. Field seeding trials verify the superior stability and response speed of this method in complex seeding environments, providing effective technical support for practical applications.
Keyword: Seeder Motor Control, IWMA, RBF, ADRC, Parameter Self-Tuning
Cite@inproceedings{ICIC2025,
author = {Xu Chen, Qinglong Meng, Ruirui Sun, Xudong Miao, Fengqi Hao, Qingyan Ding, and Jinqiang Bai},
title = {Research on Motor Optimization Method for Seeder System Based on IWMA-RBF},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2548-2563},
}
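The IWMA abstract above refers to a Lévy flight perturbation mechanism without giving its form. One widely used way to draw Lévy-distributed step lengths is Mantegna's algorithm, sketched below. How the step is applied to a candidate solution (here, scaled by the distance to the current best) is an assumption for illustration, as are the function names and the default `beta` and `scale` values.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """Draw one Levy-distributed step using Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def perturb(position, best, scale=0.01, beta=1.5, rng=random):
    """Perturb a candidate relative to the best-known solution (hypothetical rule)."""
    return [x + scale * levy_step(beta, rng) * (x - b)
            for x, b in zip(position, best)]
```

Most steps are small, but the heavy tail occasionally produces a long jump, which is why such perturbations are used to help a population escape local optima.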
- LMCNet: A MobileNetV4-Enhanced YOLOv10 with Cross-Scale Fusion for Tomato Ripeness Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Jianying Chen and Chuanying Yang
Abstract: In order to quickly and accurately identify tomato fruit ripeness and implement automated tomato harvesting in agricultural environments, this study proposes a lightweight tomato ripeness detection model based on an improved YOLOv10. Firstly, a lightweight model based on the improved YOLOv10 is proposed by introducing the Universal Inverted Bottleneck module from the MobileNetV4 network and integrating it with the C2f module in YOLOv10. Then, a new feature fusion structure is designed, where the C2fUIB module replaces the original feature fusion module in the CCFM structure, and the GhostConv module is introduced to replace the standard Conv module. The improved model efficiently handles and fuses information at different scales, while enhancing the model's detection accuracy and computational efficiency for tomato fruits. The results on tomato fruit ripeness detection show that the accuracy, recall, and average precision are 88.2%, 86.2%, and 90.2%, respectively, with 4.62M parameters and a model size of 9.7 MB, combining high detection precision with a low parameter count. These results highlight the effect of the improved model on tomato fruit ripeness detection.
Keyword: YOLOv10, Ripeness Detection, Lightweight Model, Tomato, MobileNetV4, CCFM.
Cite@inproceedings{ICIC2025,
author = {Jianying Chen and Chuanying Yang},
title = {LMCNet: A MobileNetV4-Enhanced YOLOv10 with Cross-Scale Fusion for Tomato Ripeness Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {759-770},
note = {Poster Volume Ⅰ}
}
- Point Cloud Mapping and Loop Closure Detection Using Superpoint Semantic Graph for Autonomous Driving, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Ronghua Du, Zong Li, Jinlai Zhang, Kai Gao, Shaosheng Fan, Taishan Cao, Gengbiao Chen, and Zhenzhen Jin
Abstract: Accurate loop closure detection and pose estimation remain critical challenges for autonomous vehicles operating in dynamic urban environments, where perceptual aliasing, occlusions, and changing scenes often degrade localization performance. To this end, we present a novel hierarchical framework that leverages superpoint graphs to achieve robust place recognition and precise pose estimation. Our approach begins by constructing a topologically meaningful superpoint graph, where nodes represent stable environmental features and edges encode their spatial relationships. For loop closure detection, we introduce semantic-enhanced ring descriptors that combine geometric structure with semantic information, enabling reliable place recognition despite viewpoint changes or temporary occlusions. The system employs a two-stage verification process: initial candidate selection through descriptor matching, followed by geometric verification using superpoint centroids with RANSAC-based outlier rejection. The pose estimation pipeline employs a hierarchical refinement strategy, starting with superpoint centroid alignment, followed by dense ICP and sparse point-to-plane ICP, all integrated into a global pose graph optimization framework. Our overlap-based loop closure detection demonstrates superior performance across the KITTI, Apollo, and Ford Campus datasets, achieving state-of-the-art (SOTA) results on AUC, F1MAX, and recall rate. Furthermore, our pose estimation method exhibits consistently outstanding performance in both accuracy and robustness.
Keyword: Superpoint graph, Autonomous vehicle localization, Loop closure detection, Semantic pose estimation
Cite@inproceedings{ICIC2025,
author = {Ronghua Du, Zong Li, Jinlai Zhang, Kai Gao, Shaosheng Fan, Taishan Cao, Gengbiao Chen, and Zhenzhen Jin},
title = {Point Cloud Mapping and Loop Closure Detection Using Superpoint Semantic Graph for Autonomous Driving},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2625-2641},
}
- MAVSA-DI: Mongolian Audio-Visual Sentiment Analysis Based on Deep Residual Shrinkage Network and Improved 3D-DenseNet, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Ren Qing-Dao-Er-Ji, Qian Bo, Ying Lu, Yatu Ji, and Nier Wu
Abstract: To address the issue of inaccurate extraction of key emotional features in Mongolian audio and video data, which leads to suboptimal sentiment classification performance, this paper proposes a Mongolian Audio-Visual Sentiment Analysis model based on Deep Residual Shrinkage Network and Improved 3D-DenseNet (MAVSA-DI). Specifically, the audio branch adopts a Deep Residual Shrinkage Network (DRSN) to suppress noise interference through a soft-thresholding mechanism and enhance the extraction of emotion-relevant acoustic features. The video branch employs an Improved 3D-DenseNet (I3DD) by integrating the SPD-Conv module, which combines the deep feature extraction capability of SPD-Conv with the dense connectivity of 3D-DenseNet to improve spatiotemporal feature learning from low-resolution facial expressions. Furthermore, Intra-Modal Attention (IMA) mechanisms are applied to both branches to highlight intra-modal key information, followed by Cross-Modal Attention (CMA) to facilitate effective feature fusion. Experimental results demonstrate that the proposed model significantly outperforms several advanced baselines in terms of classification accuracy for Mongolian Audio-Visual Sentiment Analysis (MAVSA).
Keyword: Deep Residual Shrinkage Network, Improved 3D-DenseNet, SPD-Conv, Feature Fusion, Mongolian Audio-Visual Sentiment Analysis.
Cite@inproceedings{ICIC2025,
author = {Ren Qing-Dao-Er-Ji, Qian Bo, Ying Lu, Yatu Ji, and Nier Wu},
title = {MAVSA-DI: Mongolian Audio-Visual Sentiment Analysis Based on Deep Residual Shrinkage Network and Improved 3D-DenseNet},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1155-1167},
}
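The MAVSA-DI abstract above relies on the soft-thresholding mechanism of the Deep Residual Shrinkage Network to suppress noise. Soft thresholding itself is a standard operation: values whose magnitude is below a threshold are set to zero, and the rest are shrunk toward zero by that threshold. The sketch below applies it with a fixed threshold `tau`, which is a simplifying assumption; in a shrinkage network the threshold is produced by a learned sub-network rather than passed in by hand.

```python
def soft_threshold(x, tau):
    """Shrink x toward zero by tau; values with |x| <= tau become 0."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def denoise(features, tau):
    """Apply soft thresholding element-wise to a feature vector."""
    return [soft_threshold(v, tau) for v in features]
```

For example, `denoise([3.0, -0.4, 0.2, -2.5], 0.5)` keeps the two large activations (shrunk by 0.5) and zeroes the two small ones.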
- Coordinated Attacks through Graph Autoencoder in Federated Learning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Shuoxiang Wang, Bencan Gong, and Youlin Huang
Abstract: Federated Learning (FL) facilitates collaborative model training across distributed clients while preserving data privacy through decentralized computation. Architectural limitations render FL susceptible to adversarial attacks. In current attack methodologies, the absence of collaboration among attackers renders them more susceptible to detection. This paper proposes CAGA (Collaborative Attack via Graph Autoencoder), an innovative model poisoning framework. Attackers exploit graph-structured correlations among benign local models to infer the training data characteristics of the target model and subsequently adversarially reconstruct these correlations, aiming to significantly degrade the performance of the global FL model through crafted malicious updates. Unlike conventional attack methodologies, CAGA leverages pre-trusted malicious users embedded within benign user groups to execute dual-mode attacks, combining explicit adversarial actions with implicit exploitation of internal user privileges. The experimental results demonstrate that the proposed CAGA attack is highly aggressive and difficult to detect. The attack outperforms the existing GAE attack in terms of both aggressiveness and stealth.
Keyword: Federated Learning, Collaborative Attack, Graph Autoencoder, Dual-mode Attack
Cite@inproceedings{ICIC2025,
author = {Shuoxiang Wang, Bencan Gong, and Youlin Huang},
title = {Coordinated Attacks through Graph Autoencoder in Federated Learning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {675-692},
}
- Privacy-Preserving Defense Against Poisoning Attacks in Federated Learning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Youlin Huang, Bencan Gong, and Shuoxiang Wang
Abstract: Federated Learning (FL), as a collaborative training paradigm that does not rely on raw data sharing, faces dual security threats of privacy leakage and data poisoning attacks. These threats not only compromise client data privacy but also degrade the performance of the global model. To address this challenge, we propose a Privacy-Preserving Defense against Poisoning Attacks (PPDPA), which integrates privacy preservation and poisoning detection through a lossless masking mechanism. In this framework, the gradient uploaded by each client is first masked using a removable mask to protect gradient privacy. Without revealing the original gradients, the masked gradients are then aggregated, and Singular Value Decomposition (SVD) is employed to extract features and perform dimensionality reduction. In the resulting low-dimensional space, a clustering-based approach is used to identify poisoned gradients. Additionally, a verification mechanism is designed to ensure the integrity of the masking process during aggregation, effectively preventing attackers from manipulating the mask for stealthy poisoning. Finally, poisoned gradients are removed during aggregation to defend against data poisoning attacks. Extensive experiments demonstrate that PPDPA outperforms existing state-of-the-art privacy-preserving detection methods in both detection accuracy and defense efficiency.
Keyword: Federated Learning (FL), Defense Mechanism, Privacy Preservation, Label Flipping Attacks, Singular Value Decomposition (SVD)
Cite@inproceedings{ICIC2025,
author = {Youlin Huang, Bencan Gong, and Shuoxiang Wang},
title = {Privacy-Preserving Defense Against Poisoning Attacks in Federated Learning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {693-710},
}
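The PPDPA abstract above speaks of a removable, lossless mask but does not say how it is built. One familiar construction with that property, borrowed from secure aggregation and used here purely as an assumption, is pairwise cancelling masks: each pair of clients shares a random vector that one adds and the other subtracts, so individual uploads are hidden while the sum is exact. The sketch uses integer gradients to keep the cancellation exact; the SVD and clustering steps of the paper are not shown, and all names are hypothetical.

```python
import random


def make_pairwise_masks(num_clients, dim, seed=0):
    """Build one mask per client such that all masks sum to zero."""
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            # Secret shared by clients i and j only.
            shared = [rng.randrange(-1000, 1000) for _ in range(dim)]
            for d in range(dim):
                masks[i][d] += shared[d]
                masks[j][d] -= shared[d]
    return masks


def mask_gradients(grads, masks):
    """Each client uploads gradient + mask instead of the raw gradient."""
    return [[g + m for g, m in zip(grad, mask)]
            for grad, mask in zip(grads, masks)]


def aggregate(vectors):
    """Server-side sum; the pairwise masks cancel out."""
    return [sum(column) for column in zip(*vectors)]
```

With `grads = [[1, 2], [3, 4], [5, 6]]`, aggregating the masked uploads gives the same result as aggregating the raw gradients, which is the sense in which such a mask is lossless.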
- Membership Inference Attacks for Generative model-based One-Shot Federated Learning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yiwen Wang, Fan Qi, and Zixin Zhang
Abstract: In recent years, One-Shot Federated Learning (OSFL) has gained significant attention for its communication efficiency. With the rise of generative models, many approaches leverage synthetic data on the server to improve global model performance. However, this efficiency introduces heightened privacy risks that remain largely unexplored. In this paper, we conduct the first systematic exploration of privacy risks in OSFL by designing a Membership Inference Attack (MIA) strategy tailored to this paradigm. In our strategy, we introduce a general approach designed for all generative models, which infers membership by aligning query data with the global client distribution. Building on this, we extend the approach specifically for diffusion models, integrating global alignment with query-specific fine-grained details through finetuning and conditional generation, thereby enabling more robust inference. In particular, our strategy does not rely on auxiliary data, making it particularly relevant for privacy-sensitive OSFL settings. Extensive experiments validate the effectiveness of the proposed strategy, highlighting the critical privacy risks posed by generative models in OSFL.
Keyword: Membership Inference Attacks, One-Shot Federated Learning, Generative Model
Cite@inproceedings{ICIC2025,
author = {Yiwen Wang, Fan Qi, and Zixin Zhang},
title = {Membership Inference Attacks for Generative model-based One-Shot Federated Learning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1199-1215},
note = {Poster Volume Ⅰ}
}
- Semi-Supervised Object Detection via Dynamic Reweighting of Localization Error, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Huajie Xu and Ganxiao Nong
Abstract: Semi-supervised object detection (SSOD) leverages limited labeled data alongside abundant unlabeled data to improve detection performance. Existing SSOD methods based on the teacher-student framework tend to neglect localization error within pseudo-labels, detrimentally affecting the student model's bounding box regression and classification. To address this issue, a novel SSOD method based on dynamic localization error reweighting is proposed. In this method, predicted bounding boxes are modeled using a Gaussian distribution to derive a localization quality score quantifying localization error. This score underpins a strategy of Localization Error reweighting in Regression (LER), which dynamically adjusts the unsupervised regression loss to prioritize accurately localized pseudo-labels. Simultaneously, a strategy of Proposal Reliability reweighting in Classification (PRC) is proposed, utilizing teacher predictions to assess student proposal reliability. PRC combines class probabilities and localization quality scores to dynamically reweight the unsupervised classification loss, thereby mitigating interference from misassigned labels. Extensive experiments on the MS COCO and PASCAL VOC datasets demonstrate the effectiveness and superiority of our approach.
Keyword: Semi-supervised object detection, pseudo-label, localization error, loss reweighting
Cite@inproceedings{ICIC2025,
author = {Huajie Xu and Ganxiao Nong},
title = {Semi-Supervised Object Detection via Dynamic Reweighting of Localization Error},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3416-3431},
}
- Local-Semantic Attentive Bidirectional Bottleneck Network with Residual Feature Augmentation for Real-time Semantic Segmentation, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: ManYuan Gui, Jinlai Zhang, Yonghen Hu, Sheng Wu, Du Xu, Bo Ouyang, Shaosheng Fan, and Zhenzhen Jin
Abstract: Real-time semantic segmentation is critical for applications such as autonomous driving, where the core challenge lies in achieving high segmentation accuracy while maintaining efficient inference. This paper proposes LSA-BiNet, a bidirectional bottleneck network built on local-semantic attention and residual feature augmentation. The framework has three key innovations: (1) the Local Receptive Field Attention (LRFA) module achieves high-order feature interactions with first-order computational complexity through region-wise soft-weight computation and channel gating; (2) the Spatial Variance Fusion Module (SVFM) collaboratively models local and non-local features via low-frequency variance modulation and local detail enhancement; (3) the Residual Cross-level Attention Decoder (RCAD) enables precise pixel-level prediction using cross-level feature projection, dual gating mechanisms, and residual attention weighting. Extensive experiments on the Cityscapes and CamVid benchmarks demonstrate that LSA-BiNet achieves state-of-the-art (SOTA) mean Intersection-over-Union (mIoU) of 72.74% and 68.53% without ImageNet pretraining, while maintaining low computational complexity (8.81 GFLOPs) and real-time inference speeds (51.08 FPS on Cityscapes, 79.62 FPS on CamVid). Ablation studies confirm significant contributions of each module, establishing LSA-BiNet’s superiority over contemporary SOTA models.
Keyword: Real-time semantic segmentation, local-semantic attention, residual feature augmentation, bidirectional bottleneck network, computational efficiency
Cite@inproceedings{ICIC2025,
author = {ManYuan Gui, Jinlai Zhang, Yonghen Hu, Sheng Wu, Du Xu, Bo Ouyang, Shaosheng Fan, and Zhenzhen Jin},
title = {Local-Semantic Attentive Bidirectional Bottleneck Network with Residual Feature Augmentation for Real-time Semantic Segmentation},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1757-1774},
note = {Poster Volume Ⅱ}
}
- GCT-Net: A malicious Android application detection method based on multimodal feature fusion, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yuheng Huang, Weihao Huang, Chunhong Jiang, Song Xie, and Hongsong Wang
Abstract: Despite the persistent threat of malicious Android applications, many existing detection methods struggle to effectively integrate and analyze the heterogeneous threat indicators embedded within APKs. This fragmented analysis often fails to capture the complex interplay between different threat vectors. To address this challenge, we propose GCT-Net (GNN-CNN-Tree-LSTM-Net), a novel deep learning framework that synergistically fuses Graph, Convolutional, and Tree-structured features for unified malware detection. We first disassemble APKs via reverse engineering to derive three critical modalities: API call sequences, modeled as directed graphs and processed by a Graph Neural Network (GNN) to capture semantic dependencies; binary code, converted into greyscale images and analyzed via a two-layer Convolutional Neural Network (2D-CNN) to detect spatial malware patterns; and URL strings, parsed into syntax trees and encoded using a hierarchical Tree-LSTM network to learn structural embeddings. These modality-specific features are adaptively integrated through three dense layers. Evaluated on two datasets, GCT-Net achieves state-of-the-art performance with 96.48%/93.75% accuracy, 97.62%/96.45% precision, 97.05%/95.40% recall and 97.33%/95.42% F1-score, outperforming other models. Ablation studies confirm the critical contributions of all three modalities and validate the fusion efficacy, establishing a new method for multimodal malware analysis.
Keyword: Android Malware Detection · Multimodal Learning · Graph Neural Network · Tree-Structured LSTM
Cite@inproceedings{ICIC2025,
author = {Yuheng Huang, Weihao Huang, Chunhong Jiang, Song Xie, and Hongsong Wang},
title = {GCT-Net: A malicious Android application detection method based on multimodal feature fusion},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1773-1790},
note = {Poster Volume Ⅱ}
}
- Unsupervised Wood Surface Anomaly Detection via Enhanced GAN with Residual Dense and Attention Modules, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yuhao Guo, Fengqi Hao, Qingyan Ding, Jinqiang Bai, Dexin Ma, and Huijuan Hao
Abstract: Reliable surface-defect inspection is a prerequisite for modern wood-processing lines, yet manually labelled defect images are inherently scarce and imbalanced. We present ERA-GANomaly, an unsupervised anomaly-detection framework that combines an encoder–decoder–re-encoder backbone with Residual Dense Blocks (RDBs) and lightweight Efficient Channel Attention (ECA) to emphasise salient textures. Experiments on three wood-defect datasets show that ERA-GANomaly attains 92.4% accuracy and a macro-F1 of 83.0%, outperforming representative unsupervised baselines such as GANomaly, EGBAD and AnoGAN. Ablation studies verify that both ECA and RDB modules contribute markedly to detecting subtle defects, including cracks, chips and bark inclusions. These findings indicate that ERA-GANomaly offers a practical, label-free solution for industrial surface-defect screening.
Keyword: Unsupervised Anomaly Detection, Generative Adversarial Networks, Attention Mechanism, Residual Dense Blocks
Cite@inproceedings{ICIC2025,
author = {Yuhao Guo, Fengqi Hao, Qingyan Ding, Jinqiang Bai, Dexin Ma, and Huijuan Hao},
title = {Unsupervised Wood Surface Anomaly Detection via Enhanced GAN with Residual Dense and Attention Modules},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {275-291},
}
- HRS-UNet: A Semantic Segmentation Model for Precise Crop Classification in Hyperspectral Remote Sensing Image, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Zhiyu Yang, Lei Zou, and Yuhuai Lin
Abstract: Precise crop classification, as a pivotal technology underpinning precision agriculture, has attracted considerable attention in recent years. Hyperspectral imaging systems mounted on Unmanned Aerial Vehicles (UAVs) are capable of producing high spatial resolution hyperspectral imagery, offering distinct advantages including low operational costs, high operational flexibility, and real-time data acquisition. As a result, these systems have emerged as an optimal tool for precise crop classification within precision agriculture monitoring. Nevertheless, existing methods for crop classification using UAV hyperspectral imagery encounter a trade-off between global feature perception and computational complexity, frequently leading to the loss of spatial features. To tackle this issue, this study introduces a hyperspectral segmentation network, HRS-UNet, designed to achieve precise crop classification from hyperspectral samples. We also propose a Multiscale Spectral Aggregation (MSA) module, which greatly reduces the computational burden of the backbone network through feature enhancement and dimensionality reduction. Evaluation results on the UAV-HSI-Crop dataset reveal that our model attains state-of-the-art performance, achieving an overall classification accuracy of 89.96% and a Kappa coefficient of 0.8814, outperforming existing approaches. Our model offers a novel technical pathway for efficient monitoring in precision agriculture.
Keyword: Precision Agriculture, Hyperspectral Image, Semantic Segmentation, Remote Sensing
Cite@inproceedings{ICIC2025,
author = {Zhiyu Yang, Lei Zou, and Yuhuai Lin},
title = {HRS-UNet: A Semantic Segmentation Model for Precise Crop Classification in Hyperspectral Remote Sensing Image},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2129-2140},
note = {Poster Volume Ⅱ}
}
- Hierarchical Incongruity-Aware Fusion Network with Adaptive Refinement for Multi-modal Sarcasm Detection, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Fang Wang, Lei Chen, and Hao Pan
Abstract: Multi-modal sarcasm detection (MSD) aims to identify sarcastic sentiment conveyed through textual and visual modalities. The key challenge lies in capturing underlying incongruity across modalities. However, many existing studies rely on shallow feature fusion strategies, resulting in limited interaction between textual and visual features. Moreover, they often overlook localized inconsistencies in sarcasm, leading to insufficient representation of fine-grained sarcastic cues. To address these challenges, we propose a hierarchical incongruity-aware fusion network with semantic adaptive refinement (HIAF). Specifically, we first introduce a hierarchical fusion module that progressively captures multi-level incongruity through iterative transformer layers, guided by a cross-modal locality-constrained attention mechanism. Second, we design a semantic adaptive refinement module that dynamically integrates unimodal and cross-modal features based on their contextual contributions. Experiments demonstrate consistent outperformance over strong baselines, validating its capability in capturing multi-modal incongruity.
Keyword: Multi-modal Sarcasm Detection, Multi-modal Fusion, Hierarchical Attention
Cite@inproceedings{ICIC2025,
author = {Fang Wang, Lei Chen, and Hao Pan},
title = {Hierarchical Incongruity-Aware Fusion Network with Adaptive Refinement for Multi-modal Sarcasm Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3664-3679},
}
- Reconstructing Reality: Robust High-Frequency Recovery for MRI via Latent Diffusion Models, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Tianzhi Wang and Jian Wang
Abstract: Multi-contrast magnetic resonance imaging (MRI) is a widely used analytical tool for characterizing tissue contrast in neurological disorders. Although conventional MRI techniques provide rich contrast information in the diagnosis of neurological diseases, their limited spatial resolution often hinders the precise identification of subtle pathological regions. Therefore, super-resolution (SR) reconstruction of MRI images holds significant importance in the field of medical imaging. Traditional end-to-end deep neural network approaches tend to learn the average of multiple possible reconstruction outcomes, resulting in overly smoothed generated images that lack high-frequency details. In recent years, generative models have demonstrated remarkable capabilities in SR tasks by synthesizing more realistic high-frequency information, thereby substantially mitigating this issue. However, generative models generally exhibit considerable randomness, making it challenging to ensure the stability and consistency of the results. To address this, we propose a novel MRI SR method that integrates the strengths of both generative and discriminative models. Specifically, we employ a latent diffusion model (LDT) to capture the high-frequency information in real images and utilize the low-frequency information from low-resolution (LR) images as conditional input for an autoencoder to generate high-resolution (HR) images. Quantitative experimental results demonstrate that our method outperforms existing state-of-the-art MRI SR approaches across multiple metrics while maintaining a more lightweight architecture. Visualization results further validate the superiority of our method in reconstructing high-frequency details.
Keyword: MRI Super-Resolution, Diffusion Model, Autoencoder, Wavelet Transform, Discriminative Model
Cite@inproceedings{ICIC2025,
author = {Tianzhi Wang and Jian Wang},
title = {Reconstructing Reality: Robust High-Frequency Recovery for MRI via Latent Diffusion Models},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2176-2190},
note = {Poster Volume Ⅱ}
}
- HG-DETR: Image-Level Few-Shot Object Detection with Cross-Category and Query-Level Heterogeneous Graphs, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Liangchen Qu and Hongru Zhao
Abstract: Few-shot object detection (FSOD) aims to detect novel objects with limited annotated examples, yet existing methods face critical challenges in handling low-quality region proposals, leading to suboptimal generalization. Moreover, current meta-learning approaches often rely on pairwise region-class matching, which neglects contextual relationships among proposals and fails to leverage cross-class semantic dependencies, resulting in misclassification among similar classes and limited adaptability to novel categories. To address these limitations, we propose HG-DETR, a novel FSOD framework that integrates image-level detection with heterogeneous relational reasoning. Our method bypasses error-prone region proposal networks by directly operating on holistic image features through a Transformer-based architecture, enabling end-to-end optimization. By considering the multi-faceted relationships between proposals and classes, we propose (1) a cross-category semantic relationship graph that dynamically models semantic dependencies among base and novel classes to enhance prototype representations through knowledge transfer, (2) a query-level context aggregation graph that models spatial relationships within a query image by connecting top-confidence proposals and a class node, using a GCN layer to aggregate features and refine proposals, and (3) bidirectional class-query adaptation via attention mechanisms to align feature distributions and bridge domain gaps. Qualitative and quantitative results demonstrate that our method achieves superior performance in few-shot object detection on the Pascal VOC and MS COCO datasets compared with existing methods.
Keyword: Object Detection, Few-Shot Learning, Few-Shot Object Detection, Heterogeneous Graph Convolutional Networks
Cite@inproceedings{ICIC2025,
author = {Liangchen Qu and Hongru Zhao},
title = {HG-DETR: Image-Level Few-Shot Object Detection with Cross-Category and Query-Level Heterogeneous Graphs},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3431-3446},
}
- Decoding Olympic Medal Success: A Multi-Factor Analysis and Predictive Framework, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yuqian Huang, Zishuo Liu, Lian Mei, and Suohai Fan
Abstract: A comprehensive and interpretable framework is proposed to forecast and analyze Olympic medal distributions by integrating ensemble machine learning techniques with statistical concentration diagnostics. Utilizing a structured dataset comprising both athlete-level and nation-level features (such as performance records, sport-specific metadata, host advantages, and historical patterns), the framework employs the XGBoost algorithm to predict medal counts for the 2028 Summer Olympics. The model achieves strong predictive performance, particularly for gold medal forecasting (RMSE = 2.42, accuracy = 93.66%), and is validated through rigorous cross-validation procedures. To explore structural disparities in medal allocation, Gini and Herfindahl–Hirschman indices are computed across multiple disciplines, revealing significant concentration in sports like swimming, gymnastics, and athletics, where a limited number of countries consistently dominate podium outcomes. Model interpretability is enhanced using SHAP (SHapley Additive exPlanations), which identifies the relative contributions of demographic, structural, and sport-specific variables to medal predictions. This integrative approach not only enables accurate and explainable Olympic forecasting but also provides actionable insights for evaluating competitive equity and informing national sports investment strategies.
Keyword: Olympic Medal Prediction, TOPSIS, XGBoost, Great Coach Effect, Gini Index, Herfindahl-Hirschman Index, Data Clustering.
Cite@inproceedings{ICIC2025,
author = {Yuqian Huang, Zishuo Liu, Lian Mei, and Suohai Fan},
title = {Decoding Olympic Medal Success: A Multi-Factor Analysis and Predictive Framework},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {711-728},
}
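The abstract above mentions Gini and Herfindahl–Hirschman indices as concentration diagnostics for medal allocation. As a minimal sketch of what those two indices measure (not the authors' implementation; the function names and the sample medal counts are hypothetical, introduced only for illustration), the standard textbook definitions can be written as:

```python
from typing import Sequence


def gini(counts: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n,
    # with values sorted ascending and ranks i starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def hhi(counts: Sequence[float]) -> float:
    """Herfindahl-Hirschman index: sum of squared shares, between 1/n and 1."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((x / total) ** 2 for x in counts)


# Hypothetical medal counts for four countries in one discipline.
medals = [10, 6, 3, 1]
print(gini(medals), hhi(medals))
```

Both indices grow as medals concentrate in fewer countries: an even split gives a Gini of 0 and an HHI of 1/n, while a single dominant country pushes the HHI toward 1.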
- Adaptive Fusion Multi-View Contrastive Learning with Interest Aggregation for Collaborative Filtering, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Runze Feng, Junping Liu, and Mingchao Yu
Abstract: In recent years, recommendation systems based on graph neural networks (GNN) have achieved remarkable success. Despite their effectiveness, GNN-based methods are often affected by noisy interactions in user-item data. Consequently, several approaches have adopted graph contrastive learning (GCL) to address this challenge. However, most existing GCL approaches construct contrastive views from the user-item graph, without explicitly leveraging high-order relational information (i.e., user-user and item-item relationships). Moreover, they often adopt a uniform perspective on user-item connections, neglecting the diversity of user interests. To address these limitations, we present a graph contrastive recommendation model that incorporates an adaptive multi-view fusion strategy, named AdaFCL. Specifically, to more explicitly exploit high-order information, we design an adaptive fusion module that fuses edge weights derived from both the user-item interaction graph and the high-order collaborative graphs (i.e., user-user and item-item graphs). This fusion module then introduces a learnable generator based on GCN and GAT to generate low-noise contrastive views, as an alternative to traditional random perturbations. Furthermore, we design an interest aggregation module to embed users’ personalized preferences into the representation learning process. Extensive experiments on three public benchmark datasets demonstrate the superiority of AdaFCL. Compared to the strongest baselines, our model improves performance by up to 9.27% for NDCG@20 and 8.67% for Recall@20.
Keyword: Information systems · Recommender systems
Cite@inproceedings{ICIC2025,
author = {Runze Feng, Junping Liu, and Mingchao Yu},
title = {Adaptive Fusion Multi-View Contrastive Learning with Interest Aggregation for Collaborative Filtering},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2362-2379},
note = {Poster Volume Ⅱ}
}
- Safe Policy Improvement with Baseline Bootstrapping under State Abstraction, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Yuan Zhuang
Abstract: This paper studies the Safe Policy Improvement (SPI) problem in Batch Reinforcement Learning (Batch RL), which aims to train a policy from a fixed dataset without environment interaction, while ensuring its performance is no worse than that of the behavior policy used for data collection. Most existing methods require a substantial amount of historical data to ensure sufficient confidence in the performance of the learned policy. However, the fixed dataset is often limited, which makes learning overly conservative. To address this issue, we investigate the integration of state abstraction into the Safe Policy Improvement with Baseline Bootstrapping (SPIBB) framework to improve sample efficiency. While state abstraction has been widely used to improve sample efficiency, it traditionally lacks mechanisms for providing performance guarantees. We bridge this gap by deriving theoretical performance guarantees for policies learned from SPIBB under state abstraction. Empirical results show that our method achieves comparable or better policy improvement using fewer samples than the original SPIBB algorithm.
Keyword: Batch Reinforcement Learning, Safe Policy Improvement with Baseline Bootstrapping, State Abstraction
Cite@inproceedings{ICIC2025,
author = {Yuan Zhuang},
title = {Safe Policy Improvement with Baseline Bootstrapping under State Abstraction},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {409-422},
}
- A Novel Framework for sEMG Gesture Recognition Based on Soft Prompt Learning, ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Authors: Dingchi Sun and Junjian Ren
Abstract: Surface electromyography (sEMG) signals hold considerable promise for predicting human motion prior to its actual execution. However, a major challenge in sEMG-based intention recognition lies in the severe noise interference and high inter-subject variability inherent in traditional myoelectric time-series signals. These issues hinder accurate alignment with corresponding actions and constrain model learning capacity. To address these challenges, this study proposes a dual-modal contrastive learning framework based on Contrastive Language-Audio Pretraining (CLAP). By introducing textual prompts as auxiliary guidance for interpreting sEMG signals, the proposed method enhances recognition accuracy while reducing redundant training. In addition, a k-layer hierarchical processing algorithm is developed to expand the training dataset to a quadratic scale of its original size, thereby mitigating the problem of limited data availability and facilitating integrated prediction. The proposed approach is evaluated on public benchmark datasets, including Ninapro DB1, DB2, DB5, and CapgMyo. Experimental results show that the model outperforms state-of-the-art (SOTA) methods by 2–3%.
Keyword: Surface electromyography, gesture recognition, segmentation parameters, Multimodal learning, Contrastive Learning
Cite@inproceedings{ICIC2025,
author = {Dingchi Sun and Junjian Ren},
title = {A Novel Framework for sEMG Gesture Recognition Based on Soft Prompt Learning},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {1897-1910},
note = {Poster Volume Ⅱ}
}