-
Cascaded Feature Fusion Network for Small-size Pedestrian Detection,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Shilong Yu and Chenhui Yang
Abstract: Deep neural network-based target detectors cannot sufficiently extract effective features for detecting small-size pedestrians. In this letter, we propose a deep cascaded network framework for small-size pedestrian detection, which contains an Iterative Feature Augmentation module and a Residual Attention Fusion module. Specifically, the Iterative Feature Augmentation module adopts bilinear interpolation sampling and channel reshaping in the deep backbone network to achieve feature fusion at different scales. Moreover, we introduce a feature fusion coefficient to select small-size features. The Residual Attention Fusion module is constructed by stacking attention modules, and the attention modules at different depths produce adaptive changes in perceptual features. Each attention module is a bottom-up feed-forward structure, and features are reconstructed by residual connections between attention modules. Experiments on the challenging Tiny Citypersons, Caltech, and Tiny Person datasets show that our proposed modules achieve significant gains, with an almost 10% improvement in pedestrian average miss rate and precision compared to baseline networks.
Keyword: Cascaded convolutional neural network (CNN), Pedestrian Detection, Residual Attention, Image Processing
Cite
@inproceedings{ICIC_2024,
author = {Shilong Yu and Chenhui Yang},
title = {Cascaded Feature Fusion Network for Small-size Pedestrian Detection},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {1--12},
note = {Poster Volume Ⅰ}
}
-
Intent-Driven Attribute-Based Outsourcing Encryption Scheme,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Ke Li, Guowei Wu, and Jun Shen
Abstract: With the development of the Internet, users are increasingly concerned about protecting private information in their communications. As one of the important means of protecting private data, attribute-based encryption uses attributes as the credentials for user decryption, which prevents private data from being leaked or tampered with. However, traditional attribute-based encryption has problems such as an inflexible encryption process and a heavy computing burden for users. We propose an intent-driven attribute-based outsourcing encryption scheme, which integrates user intent parameters into the encryption algorithm to improve the flexibility and reliability of the encryption process. Because edge nodes have powerful computing and storage capabilities, we outsource some encryption and decryption operations from the users to the edge nodes, which reduces the computing overhead of users or terminals. Because the hierarchical relationship of attributes can help users quickly match attributes, we construct the attributes as attribute trees and determine the user's decryption permission according to the hierarchical relationship between user attributes. Finally, we give an analysis of the scheme, including a security proof, performance cost, and functional comparison.
Keyword: Intent-Driven, Attribute-Based Encryption, Hierarchical Attributes, Outsourced
Cite
@inproceedings{ICIC_2024,
author = {Ke Li and Guowei Wu and Jun Shen},
title = {Intent-Driven Attribute-Based Outsourcing Encryption Scheme},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {733--744},
note = {Poster Volume Ⅰ}
}
-
Piculet: Specialized Model-Guided Hallucination Alleviation for MultiModal Large Language Models,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Kohou Wang, Xiang Liu, Zhaoxiang Liu, Kai Wang, and Shiguo Lian
Abstract: Multimodal Large Language Models (MLLMs) have made significant progress in bridging the gap between visual and language modalities. However, hallucinations in MLLMs, where generated text does not align with image content, continue to be a major challenge. Existing methods for addressing hallucinations often rely on instruction tuning, which requires retraining the model with specific data and further increases the cost of using MLLMs. In this paper, we introduce a novel training-free method, named Warbler, for enhancing the input representation of MLLMs. Warbler leverages multiple auxiliary models to extract descriptions of visual information from the input image and combines these descriptions with the original image as input to the MLLM. We evaluate our method both quantitatively and qualitatively, and the results demonstrate that Warbler greatly decreases hallucinations of MLLMs. Our method is universal and can be easily extended to different MLLMs.
Keyword: Multimodal Large Language Models, hallucinations, training-free
Cite
@inproceedings{ICIC_2024,
author = {Kohou Wang and Xiang Liu and Zhaoxiang Liu and Kai Wang and Shiguo Lian},
title = {Piculet: Specialized Model-Guided Hallucination Alleviation for MultiModal Large Language Models},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {1--15},
note = {Poster Volume Ⅱ}
}
-
State Quantize for Pursuit Approximate Optimal Control using Reinforcement Learning,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Huanhuan Yu, Haohao Mai, Shuling Gao, Xiwen Huang, and Qiuling Yang
Abstract: In high-speed vehicle motion scenarios, solving optimal control problems faces significant challenges in terms of time and space complexity. Ensuring real-time performance of the controller requires efficient solving algorithms and support from high-performance computing platforms. To reduce the computational cost while approaching the performance of optimal control, approximate optimal control has emerged as a feasible solution. In this paper, we propose an approximate optimal vehicle control method that outperforms Model Predictive Control (MPC). The method combines the pure pursuit algorithm for vehicle path tracking with the Twin Delayed DDPG (TD3) algorithm to generate approximate lookahead distance and velocity control values for the vehicle. Additionally, the vehicle state is quantized and discretized. In our experiments with a vehicle simulator, we compare MPC with our proposed method. The results show that while MPC remains stable at vehicle speeds of up to 70 MPH, our method effectively controls the vehicle even at 100 MPH, with a higher control rate and greater robustness.
Keyword: approximate optimal control, pure pursuit, TD3, quantize
Cite
@inproceedings{ICIC_2024,
author = {Huanhuan Yu and Haohao Mai and Shuling Gao and Xiwen Huang and Qiuling Yang},
title = {State Quantize for Pursuit Approximate Optimal Control using Reinforcement Learning},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {202--217},
note = {Poster Volume Ⅱ}
}
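As a rough illustration of the pure pursuit step named in the abstract above, the following Python sketch computes a steering angle from a lookahead point using the textbook bicycle-model formula. It is not the authors' implementation: the function name, the vehicle-frame convention, and the wheelbase value are assumptions, and in the paper the lookahead distance would be produced by the TD3 policy rather than chosen by hand.

```python
import math


def pure_pursuit_steering(target_x, target_y, wheelbase):
    """Steering angle (rad) toward a lookahead point given in the vehicle frame.

    Assumed convention (hypothetical, not from the paper): x points forward,
    y points left, and the origin sits at the rear axle.
    """
    lookahead = math.hypot(target_x, target_y)
    if lookahead == 0.0:
        return 0.0  # already at the target point; no steering needed
    # Curvature of the circular arc that leaves the origin along +x
    # and passes through the target: kappa = 2 * y / Ld^2
    curvature = 2.0 * target_y / (lookahead ** 2)
    # Bicycle model: steering angle = atan(wheelbase * curvature)
    return math.atan(wheelbase * curvature)


# A target 3 m ahead and 4 m to the left, with an assumed 2.5 m wheelbase
angle = pure_pursuit_steering(3.0, 4.0, 2.5)
print(round(math.degrees(angle), 2))
```

A shorter lookahead distance gives a larger curvature for the same lateral offset, which is why the lookahead value is a natural quantity for a learned policy to adjust with speed.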
-
Content-Aware Network for Quality Estimation of Copper Scrap Granules,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Kaikai Zhao, Zhaoxiang Liu, Kai Wang, and Shiguo Lian
Abstract: To determine the quality level of copper scrap granules, existing methods require technicians to manually identify all kinds of impurities mixed into the granules, relying on their experience. In this paper, we pioneer a computer vision-based approach called Content-Aware Network (CANet) to estimate the quality of copper scrap granules. Specifically, CANet consists of a visual transformer-based backbone that extracts semantic features from copper scrap granule images; a multi-layer perceptron-based neck that explicitly estimates the volume proportion of copper in the granules and implicitly estimates the corresponding proportions of the various impurities; and a well-designed head that directly outputs the quality result. Benefiting from our novel architecture and loss functions, CANet can be trained end to end to accurately estimate the quality of copper scrap granules using only binary annotated images (copper area and non-copper area), without identifying the unknown impurities and their densities in advance. Experiments on real copper scrap granule datasets demonstrate the effectiveness and superiority of our proposed method.
Keyword: Copper scrap granules, Quality level, Visual transformer, Content-Aware
Cite
@inproceedings{ICIC_2024,
author = {Kaikai Zhao and Zhaoxiang Liu and Kai Wang and Shiguo Lian},
title = {Content-Aware Network for Quality Estimation of Copper Scrap Granules},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {16--27},
note = {Poster Volume Ⅱ}
}
-
Facial Expression Recognition Via Multi Semantic Diffusion Model on Imbalanced Datasets,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Ling Zhang and Junlan Dong
Abstract: This paper presents a novel facial expression recognition approach based on learning multiple semantic taxonomies from imbalanced datasets. Recent studies on imbalanced data mostly focus on how to homogenize the data volume between categories, presenting strategies such as minority over-sampling and majority balance cascading. In this paper, we pay more attention to the high-level semantic characterization of facial expressions, using more discriminative and conceptual attributes to describe samples in unbalanced sets. To fully exploit the semantic information contained in small-volume samples, we develop an Analytic Hierarchical Model (AHM) method based on facial Action Units (AUs) to enforce a discriminative mapping from the image feature space to a multi-semantic space with taxonomic relations. We apply convolutional neural networks to capture low-level image features and then use a dictionary learning algorithm to reconstruct images in the semantic space, in order to prevent deviation from individual identity. Experiments on the RAF-DB, FER2013, and SFEW expression databases show that the proposed method is robust for facial expression recognition in the wild.
Keyword: semantic diffusion, imbalanced dataset, facial action coding system (FACS), conceptual taxonomies, Analytic Hierarchical Model (AHM)
Cite
@inproceedings{ICIC_2024,
author = {Ling Zhang and Junlan Dong},
title = {Facial Expression Recognition Via Multi Semantic Diffusion Model on Imbalanced Datasets},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {453--463},
note = {Poster Volume Ⅰ}
}
-
Industrial Internet of Things Intrusion Detection System Based on Federated Learning,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Teng Fang and Lina Ge
Abstract: With the rapid development of the Industrial Internet of Things (IIoT), its security has become a focus of attention. Traditional centralized intrusion detection systems (IDS) face challenges of privacy leakage and high communication overhead in IIoT environments. This article proposes an IIoT intrusion detection system based on federated learning (FL-IDS). The system introduces Paillier homomorphic encryption to enhance the security of data transmission, uses Bi-LSTM to extract network traffic features, and uses a Transformer for model training. The experimental results show that our system outperforms other models in terms of detection rate and false alarm rate. This framework effectively improves the accuracy of intrusion detection, reduces communication bandwidth requirements, and protects user privacy while ensuring model convergence.
Keyword: Industrial Internet of Things, Intrusion Detection, Federated Learning, Bi-LSTM, Transformer
Cite
@inproceedings{ICIC_2024,
author = {Teng Fang and Lina Ge},
title = {Industrial Internet of Things Intrusion Detection System Based on Federated Learning},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {218--235},
note = {Poster Volume Ⅱ}
}
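The abstract above names Paillier homomorphic encryption but does not say how it is applied. The following Python sketch shows only the textbook additive property that makes Paillier attractive for federated aggregation: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. All function names are hypothetical, the primes are toy-sized and insecure, and the generator is fixed to g = n + 1 as a simplifying assumption; none of this is taken from the paper.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g is fixed to n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n."""
    n_sq = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:  # the blinding factor must be invertible mod n
        r = random.randrange(1, n)
    # With g = n + 1, g^m mod n^2 equals 1 + m * n mod n^2
    return ((1 + m * n) * pow(r, n, n_sq)) % n_sq


def decrypt(n, priv, c):
    """Recover the plaintext with the private pair (lam, mu)."""
    lam, mu = priv
    ell = (pow(c, lam, n * n) - 1) // n  # L(x) = (x - 1) / n
    return (ell * mu) % n


def add_encrypted(n, c1, c2):
    """Multiply ciphertexts to add the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


n, priv = keygen(61, 53)
c_sum = add_encrypted(n, encrypt(n, 120), encrypt(n, 45))
print(decrypt(n, priv, c_sum))  # 165
```

In a federated setting, an aggregator could combine encrypted client values this way without seeing any individual value; real deployments need large primes and an encoding for floating-point model updates, both omitted here.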
-
PAEN: Efficient Pillar-based 3D Object Detector Based on Attention and Dilated Convolution,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Jia Wen, Guanghao Zhang, Qi Zhang, Kelun Tian, and Kejun Ren
Abstract: Pillar-based 3D object detectors can complete the scene-sensing task efficiently and quickly, meeting the basic real-time detection needs of the autonomous driving perception module. In this paper, we propose a Pillar Sequence Attention Encoder and a Dilated Expansion Convolution Network. The former addresses the coarse encoding methods and limited encoded information in the pillar encoding stage, while the latter tackles the insufficient receptive field of the backbone network. Specifically, the Pillar Sequence Attention Encoder uses the Pillar Sequence Attention (PSA) module to capture attention information among points in the local region of the pillar and uses a Pillar Feature Soft Aggregation (PFSA) module to finely aggregate information from points within the pillar. The Dilated Expansion Convolution Network leverages dilated convolutions to capture both sparse and dense feature information across wide-ranging receptive fields. We conducted experiments on the KITTI dataset to validate the performance of our model and the effectiveness of the proposed modules. Experiments show that our method achieves a mean average precision (mAP) of 81.48% for the car category, surpassing the baseline model by 3.12%, while the inference time increases by only about 10 ms.
Keyword: 3D object detection, LiDAR, Pillar detector, Attention module, Dilated convolution
Cite
@inproceedings{ICIC_2024,
author = {Jia Wen and Guanghao Zhang and Qi Zhang and Kelun Tian and Kejun Ren},
title = {PAEN: Efficient Pillar-based 3D Object Detector Based on Attention and Dilated Convolution},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {28--39},
note = {Poster Volume Ⅱ}
}
-
FastHDRNet: A new efficient method for SDR-to-HDR Translation,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Tian Siyuan, Wang Hao, Rong Yiren, Wang Junhao, Dai Renjie, and He Zhengxiao
Abstract: Modern displays can render video content with a high dynamic range (HDR) and a wide color gamut (WCG). However, the majority of available resources are still in standard dynamic range (SDR), so an effective method for converting SDR content to HDR is needed. Existing deep neural network (DNN)-based SDR-to-HDR conversion methods outperform conventional methods, but they are either too large to deploy or generate severe artifacts. We propose a neural network for SDRTV-to-HDRTV conversion, termed "FastHDRNet". The network consists of two parts: Adaptive Universal Color Transformation and Local Enhancement. The architecture is designed as a lightweight network that exploits global statistics and local information with very high efficiency. Experiments show that our proposed method achieves state-of-the-art performance in both quantitative comparisons and visual quality, with a lightweight structure and improved inference speed.
Keyword: Inverse Tonemapping, Channel Selection Normalization, Image Processing
Cite
@inproceedings{ICIC_2024,
author = {Tian Siyuan and Wang Hao and Rong Yiren and Wang Junhao and Dai Renjie and He Zhengxiao},
title = {FastHDRNet: A new efficient method for SDR-to-HDR Translation},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {464--479},
note = {Poster Volume Ⅰ}
}
-
Transformer in Touch: A Survey,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Jing Gao, Ning Cheng, Bin Fang, and Wenjuan Han
Abstract: The Transformer model, which initially achieved significant success in natural language processing, has recently shown great potential in tactile perception. This review aims to comprehensively outline the application and development of Transformers in tactile technology. We first introduce the two fundamental concepts behind the success of the Transformer: the self-attention mechanism and large-scale pre-training. Then, we delve into the application of Transformers in various tactile tasks, including but not limited to object recognition, cross-modal generation, and object manipulation, offering a concise summary of the core methodologies, performance benchmarks, and design highlights. Finally, we suggest potential areas for further research and future work, aiming to generate more interest within the community, tackle existing challenges, and encourage the use of Transformer models in the tactile field.
Keyword: Tactile, Self-attention, Transformers, Self-supervision
Cite
@inproceedings{ICIC_2024,
author = {Jing Gao and Ning Cheng and Bin Fang and Wenjuan Han},
title = {Transformer in Touch: A Survey},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {13--39},
note = {Poster Volume Ⅰ}
}
-
SCD-YOLO: A security detection model for X-ray images based on the improved YOLOv5s,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Xiaotong Kong, Aimin Li, Wenqiang Li, Zhiyao Li, and Yuechen Zhang
Abstract: X-ray security inspection is widely used in subways, high-speed rail, airports, key locations, logistics, and other scenarios. However, because of the complexity and diversity of objects in real-world X-ray images, security personnel can easily make mistakes or miss items when they are fatigued or not fully focused. In this paper, we propose an improved model based on YOLOv5 to help security inspectors improve the efficiency of security inspection procedures. First, we replace the SPP (spatial pyramid pooling) feature fusion module with SPPFCSPC to further enhance the feature extraction capability. Then, we add CoordConv before each feature map is input to the detection head. This enables the model to perceive positional information and enhances its feature extraction capability, effectively addressing the detection of small prohibited items in complex backgrounds. Finally, we use a decoupled detection head instead of the traditional coupled detection head to separate the classification and localization tasks, which further improves the detection speed. The experimental results show that our method achieves 77% accuracy. Compared with state-of-the-art methods, our model also achieves significant improvements in detection accuracy and recall.
Keyword: security object detection, X-ray, YOLOv5s, neural network
Cite
@inproceedings{ICIC_2024,
author = {Xiaotong Kong and Aimin Li and Wenqiang Li and Zhiyao Li and Yuechen Zhang},
title = {SCD-YOLO: A security detection model for X-ray images based on the improved YOLOv5s},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {40--52},
note = {Poster Volume Ⅱ}
}
-
Imputing Missing Temperature Data of Meteorological Stations Based on Global Spatiotemporal Attention Neural Network,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Tianrui Hou, Xinshuai Guo, Li Wu, Xiaoying Wang, Guojing Zhang, and Jianqiang Huang
Abstract: Imputing missing temperature data from meteorological sites is necessary and valuable for researchers who analyze climate change and predict related natural disasters. Prior research often used interpolation-based methods, which largely ignored the temporal correlation within each site. Recently, researchers have attempted to leverage deep learning techniques. However, these models cannot fully utilize the spatiotemporal correlation in meteorological station data. Therefore, this paper proposes a global spatiotemporal attention neural network (GSTA-Net), which consists of two sub-networks: the global spatial attention network and the global temporal attention network. The global spatial attention network primarily addresses the global spatial correlations among meteorological stations. The global temporal attention network predominantly captures the global temporal correlations inherent in meteorological stations. To further exploit the spatiotemporal information in meteorological station data, adaptive weighting is applied to the outputs of the two sub-networks, thereby enhancing the imputation performance. Additionally, a progressive gated loss function is designed to guide and accelerate GSTA-Net's convergence. Finally, GSTA-Net is validated through a large number of experiments on the public datasets TND and QND with missing rates of 25%, 50%, and 75%. The experimental results indicate that GSTA-Net outperforms the latest models, including Linear, NLinear, DLinear, PatchTST, and STA-Net, on both the mean absolute error (MAE) and the root mean square error (RMSE) metrics.
Keyword: Attention mechanism, Deep learning, Neural network, Missing data imputing, Meteorological station data, Spatiotemporal correlation
Cite
@inproceedings{ICIC_2024,
author = {Tianrui Hou and Xinshuai Guo and Li Wu and Xiaoying Wang and Guojing Zhang and Jianqiang Huang},
title = {Imputing Missing Temperature Data of Meteorological Stations Based on Global Spatiotemporal Attention Neural Network},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {40--56},
note = {Poster Volume Ⅰ}
}
-
Few-Shot Constraint Enhancement Based on Generative Adversarial Networks,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Xin Song, Yanqing Song, Jianguo Chen, and Long Chen
Abstract: A constrained theoretical model for Generative Adversarial Networks (GANs) is proposed. To address issues such as overfitting, convergence difficulties, and mode collapse in the GAN training process, constrained training of the GAN structure and process based on Directed Graphical Models (DGM) is first introduced to address the instability and quality issues of generated samples. Then, a static constraint method is proposed, which, by setting the topology of D and G, calculates the similarity of interpretable measurement (EMS) and the final classification metrics of generated data on different classifiers, and measures the constraint strength through EMS to suppress overfitting during the generation process. Furthermore, the constraint on label-sharing features and weight updates effectively reduces the probability of mode collapse by appropriately constraining the role of label information in generation. Constraining the GAN in this way solves the problem of effective sample enhancement.
Keyword: Few-shot, Constraint Enhancement, Generative Adversarial Networks
Cite
@inproceedings{ICIC_2024,
author = {Xin Song and Yanqing Song and Jianguo Chen and Long Chen},
title = {Few-Shot Constraint Enhancement Based on Generative Adversarial Networks},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {57--69},
note = {Poster Volume Ⅰ}
}
-
Skeleton-Based Actions Recognition with Significant Displacements,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Chengming Liu, Jiahao Guan, and Haibo Pang
Abstract: In the realm of human skeleton-based action recognition, graph convolutional networks have proven to be successful. However, directly storing coordinate features in the graph structure makes it difficult to achieve shift, scale, and rotation invariance, which is crucial for actions with significant displacements, such as figure skating. In such actions, the significant displacements of athletes relative to the camera and the inherent perspective effects lead to variations in scale, position, and rotation-related features. To address this, drawing inspiration from the use of high-order information, we propose a novel cosine stream. This stream uses the bending angles of human joints for skeleton-based action recognition. Furthermore, we introduce a new keyframe downsampling algorithm that significantly improves model performance. Notably, our approach does not require any modifications to the backbone. In extensive experiments on three datasets (FSD-10, FineGYM, and NTU RGB+D), our approach demonstrates improved recognition of actions with significant displacements compared to current mainstream methods.
Keyword: Action Recognition, Skeleton, Angle, Figure skating
Cite
@inproceedings{ICIC_2024,
author = {Chengming Liu and Jiahao Guan and Haibo Pang},
title = {Skeleton-Based Actions Recognition with Significant Displacements},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {480--491},
note = {Poster Volume Ⅰ}
}
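To make the invariance argument in the abstract above concrete, the following Python sketch computes the cosine of the bending angle at a joint from three 3D keypoints and checks that it is unchanged by a shift and a uniform scaling of the skeleton. This is a generic geometric illustration, not the authors' cosine stream: the function name, the choice of joints, and the zero fallback for degenerate bones are assumptions.

```python
import math


def joint_angle_cosine(parent, joint, child, eps=1e-12):
    """Cosine of the angle at `joint` between the bones joint->parent and joint->child."""
    u = [p - j for p, j in zip(parent, joint)]
    v = [c - j for c, j in zip(child, joint)]
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u < eps or norm_v < eps:
        return 0.0  # assumed fallback for a zero-length bone
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def shift_and_scale(point, offset, scale):
    """Apply a uniform scale followed by a translation to one keypoint."""
    return tuple(scale * x + o for x, o in zip(point, offset))


# Hypothetical shoulder, elbow, and wrist coordinates
shoulder, elbow, wrist = (0.0, 1.0, 0.2), (0.3, 0.6, 0.1), (0.7, 0.8, 0.4)
original = joint_angle_cosine(shoulder, elbow, wrist)

# Move the whole skeleton and make it three times larger
moved = [shift_and_scale(p, (10.0, -5.0, 2.0), 3.0) for p in (shoulder, elbow, wrist)]
transformed = joint_angle_cosine(*moved)

print(abs(original - transformed) < 1e-9)  # True
```

Because the feature depends only on the directions of the two bones, it is also unchanged by any rotation of the skeleton, which raw coordinate features are not.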
-
DFE-IANet: A Method for Polyp Image Classification Based on Dual-domain Feature Extraction and Interaction Attention,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Wei Wang, Jixing He, and Xin Wang
Abstract: Detecting and treating polyps in the gastrointestinal tract early helps prevent colorectal cancer. However, there have been few studies to date on designing polyp image classification networks that balance efficiency and accuracy. This challenge is mainly attributed to the fact that polyps are similar to other pathologies and have complex features influenced by texture, color, and morphology. In this paper, we propose a novel network, DFE-IANet, based on both spectral transformation and feature interaction. First, to extract detailed and multi-scale features, the multi-scale frequency domain feature extraction (MSFD) block transforms the features to capture fine-grained texture details in the frequency domain. Second, the multi-scale interaction attention (MSIA) block is designed to enhance the network's capability of extracting critical features. This block introduces multi-scale features into self-attention, aiming to adaptively guide the network to concentrate on vital regions. Finally, with a compact parameter count of only 4M, DFE-IANet outperforms the latest and classical networks in terms of efficiency. Furthermore, DFE-IANet achieves state-of-the-art (SOTA) results on the challenging Kvasir dataset, with a Top-1 accuracy of 93.94%. This accuracy surpasses ViT by 8.94%, ResNet50 by 1.69%, and VMamba by 1.88%. The code is publicly available at https://anonymous.4open.science/r/DFE-IANet-FABE.
Keyword: Polyp image classification, spectral transformation, feature interaction, multi-scale
Cite
@inproceedings{ICIC_2024,
author = {Wei Wang and Jixing He and Xin Wang},
title = {DFE-IANet: A Method for Polyp Image Classification Based on Dual-domain Feature Extraction and Interaction Attention},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {492--508},
note = {Poster Volume Ⅰ}
}
-
A Lightweight Dual-Channel Multimodal Emotion Recognition Network Using Facial Expressions and Eye Movements,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Mengcheng Ji, Fulan Fan, Xin Nie, and Yahong Li
Abstract: Emotional understanding plays a crucial role in various fields related to human-computer interaction, emotional computing, and human behavior analysis. However, traditional single-modal methods often struggle to capture the complexity and subtleties of emotional states. With the advances in eye-tracking technology and facial expression recognition technology, eye-tracking and facial expressions provide complementary insight. We combine eye-tracking and facial expressions to conduct emotional research. Combining these two types of information describes the emotional experience of individuals more comprehensively and accurately and improves upon methods that use a single mode. Because human emotional changes require event induction, the events and methods of emotion induction are extremely important. We also present a data collection experiment based on emotion theory in psychology. We selected three types of emotion-activating images (positive, neutral, and negative) from the Chinese Affective Picture System (CAPS). We design a system that extracts features from the collected data and fuses the multimodal eye-tracking and facial expression features. This system is our proposed lightweight dual-channel multimodal emotion recognition network, VGG-inspired LightNet, which uses a convolutional neural network (CNN). The model achieved an accuracy of 96.25% in tests on our collected data.
Keyword: Multimodal, Facial expressions, Eye-tracking, Feature fusion, Emotional recognition
Cite
@inproceedings{ICIC_2024,
author = {Mengcheng Ji and Fulan Fan and Xin Nie and Yahong Li},
title = {A Lightweight Dual-Channel Multimodal Emotion Recognition Network Using Facial Expressions and Eye Movements},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {2024-08-05/2024-08-08},
year = 2024,
address = {Tianjin, China},
pages = {53--67},
note = {Poster Volume Ⅱ}
}
-
A Multi-subject Classification Algorithm Based on SVM Geometric Interpretation,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Chi Tang
Abstract: A new multi-subject classification algorithm based on the support vector machine (SVM) is proposed. For each class of training samples, a minimum convex hull surrounding as many samples as possible is constructed in the feature space using the soft SK algorithm, and the multi-subject classifier composed of multiple convex hulls is then obtained. For a sample to be classified, its classes are determined according to the convex hulls in which it is located. If it is not in any convex hull, its membership degree is first determined by its distance to the centroid of each class of samples, and the class to which it belongs is then determined according to the membership degree. Classification experiments are carried out on the standard dataset Reuters 21578, and the classification performance is compared with the hyperellipsoid SVM classification algorithm. The experimental results show that, compared with the hyperellipsoid SVM classification algorithm, the proposed algorithm ensures the inheritability of the classifier and significantly improves classification accuracy, which effectively mitigates the influence of sample distribution shape on classification performance.
Keyword: Multi-subject classification, Convex hull, Schlesinger-Kozinec algorithm, Support vector machine
Cite
@inproceedings{ICIC_2024,
author = {Chi Tang},
title = {A Multi-subject Classification Algorithm Based on SVM Geometric Interpretation},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {236-244},
note = {Poster Volume Ⅱ}
}
-
A Unified Model for Unimodal and Multimodal Rumor Detection,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Haibing Zhou, Zhong Qian, Peifeng Li, and Qiaoming Zhu
Abstract: Rumor detection aims to determine the truthfulness of a post, whether it is unimodal (plain text) or multimodal (text and images). However, previous models considered only one of these situations, ignoring the possibility of both occurring simultaneously. Additionally, previous multimodal models often failed to tackle the inconsistency between texts and images, which can produce noise and harm performance. To address these issues, we propose a novel unified model for unimodal and multimodal rumor detection, called the Graph Attention Generative Image Network (GAGIN), which is integrated with multimodal alignment. The experimental results on two popular datasets demonstrate that GAGIN outperforms the state-of-the-art baselines.
Keyword: Unified model, Rumor detection, Graph attention network, Diffusion model, CLIP model
Cite
@inproceedings{ICIC_2024,
author = {Haibing Zhou and Zhong Qian and Peifeng Li and Qiaoming Zhu},
title = {A Unified Model for Unimodal and Multimodal Rumor Detection},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {611-622},
note = {Poster Volume Ⅱ}
}
-
Deblurring via Video Diffusion Models,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yan Wang and Haoyang Long
Abstract: Video deblurring poses a significant challenge due to the intricate nature of blur, which often arises from a confluence of factors such as camera shake, object motion, and variations in depth. Diffusion models and video diffusion models have achieved remarkable results in image and video generation, respectively. Specifically, Diffusion Probabilistic Models (DPMs) have been successfully used for image deblurring, indicating the potential of video diffusion models for video deblurring. However, due to the significant data and training time requirements of diffusion models, the prospects of video diffusion models for video deblurring tasks remain uncertain. To investigate the feasibility of video diffusion models in video deblurring, this paper proposes a diffusion model specifically tailored for this task. Its model structure and some of its parameters are based on a pre-trained text-to-video diffusion model, and through a two-stage training process it can accomplish video deblurring with a relatively small number of training parameters and a relatively small amount of data. Furthermore, this paper compares the performance of the proposed model with baseline models and achieves state-of-the-art results.
Keyword: Computer vision, Video deblurring, Diffusion model
Cite
@inproceedings{ICIC_2024,
author = {Yan Wang and Haoyang Long},
title = {Deblurring via Video Diffusion Models},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {509-520},
note = {Poster Volume Ⅰ}
}
-
Double Global and Local Information-based Image Inpainting,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Shibin Wang, Wenjie Guo, Shiying Zhang, Xuening Guo, and Jiayi Guo
Abstract: With the development of deep learning, significant progress has been made in image inpainting. Deep learning-based image inpainting methods can generate visually plausible results. However, the inpainted images may contain distortions or artifacts, especially at boundaries and in high-texture regions. To address these issues, we propose an improved two-stage inpainting model with double local and global information. In the first stage, a Local Binary Pattern (LBP) learning network based on the U-Net architecture is employed to accurately predict the semantic and structural information of the missing regions. In the second stage, a double local and global network based on a spatial attention module and a Double-PatchGAN Discriminator (DPD) are proposed for further refinement. To achieve accurate, realistic, and high-quality inpainting results, Multiple Loss Functions (MLF) are designed to strengthen the information at different levels. Extensive experiments conducted on public datasets, including CelebA-HQ, Places2, and Paris StreetView, demonstrate that our model outperforms several existing methods in image inpainting.
Keyword: Deep learning, Image inpainting, Local Binary Pattern, Double-PatchGAN Discriminator, Multiple Loss Functions
Cite
@inproceedings{ICIC_2024,
author = {Shibin Wang and Wenjie Guo and Shiying Zhang and Xuening Guo and Jiayi Guo},
title = {Double Global and Local Information-based Image Inpainting},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {521-537},
note = {Poster Volume Ⅰ}
}
-
WIDDAS: A Word-Importance-Distribution-based Detection method against Word-Level Adversarial Samples,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Xiangge Li, Hong Luo, and Yan Sun
Abstract: Deep neural networks face security threats from adversarial samples, and even the most advanced large-scale language models are still vulnerable to adversarial attacks. Moreover, existing defense methods against adversarial attacks suffer from issues such as low detection accuracy, too many false detections on clean data, and high defense costs. Therefore, in this paper, we propose WIDDAS: a Word-Importance-Distribution-based Detection method against Word-Level Adversarial Samples. It comprises a detection module and an evaluation module. The detection module swiftly identifies potential adversarial samples based on the word importance distribution of the input text. The evaluation module then attempts to restore those samples and evaluates whether they are adversarial, thereby filtering out clean, non-adversarial data. Experimental results demonstrate that WIDDAS outperforms the baselines in detection accuracy on both adversarial samples and clean data. In particular, on Chinese data, the detection accuracy is at least 1.2% higher than that of the best baseline.
Keyword: Natural Language Processing, Adversarial Samples, Textual Defense, Adversarial Detection, Model Robustness
Cite
@inproceedings{ICIC_2024,
author = {Xiangge Li and Hong Luo and Yan Sun},
title = {WIDDAS: A Word-Importance-Distribution-based Detection method against Word-Level Adversarial Samples},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {70-86},
note = {Poster Volume Ⅰ}
}
-
Methods for blocking malicious traffic with static Bayesian game in IIoT,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Maoli Wang, Bowen Zhang, and Quanxi Xia
Abstract: With the development of the Industrial Internet of Things (IIoT), its security issues have become prominent, and network attacks, including malicious traffic threats, have continued to increase. There are already many effective methods for detecting malicious traffic in the IIoT, but how can detected malicious traffic be handled in a lightweight and effective way? To address this problem, we propose a method that uses Software-Defined Networking (SDN) to dynamically adjust malicious traffic in the IIoT. Exploiting SDN's network programmability and its decoupling of the control plane from the data plane, the SDN controller generates OpenFlow rule entries corresponding to malicious traffic, and the SDN switch then updates its flow table to block that traffic. We consider two types of malicious traffic, known and unknown, and two traffic blocking strategies, traffic dropping and traffic redirection. We model a static Bayesian game between the two types of malicious traffic and the two blocking strategies, taking into account factors such as current and future benefits, response costs, and risk levels. Using the Harsanyi transformation, we derive the Nash equilibrium point and the equilibrium strategy, and the strategy is then numerically analyzed and experimentally verified. The final result is that the overall utility is maximized when known malicious traffic is dropped and unknown malicious traffic is redirected.
Keyword: IIoT, blocking malicious traffic, static Bayesian game, SDN
Cite
@inproceedings{ICIC_2024,
author = {Maoli Wang and Bowen Zhang and Quanxi Xia},
title = {Methods for blocking malicious traffic with static Bayesian game in IIoT},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {745-764},
note = {Poster Volume Ⅰ}
}
-
SimPM: A Simple Patch Masking Contrastive Learning Framework for Time Series Forecasting,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Jinjun Zhang, Tianyi Wang, and Xiaolin Qin
Abstract: Time series forecasting plays a critical role in numerous practical industries, where effectively learning and extracting meaningful representations has always been a significant and challenging problem. Although contrastive learning methods have shown outstanding ability in learning meaningful representations in the computer vision and natural language processing domains, their performance in time series forecasting tasks is weaker. This weakness can mainly be attributed to their failure to fully consider the characteristics of time series data, leading to information loss. Specifically, existing data augmentation strategies primarily operate at the timestamp level and so cannot fully exploit local semantic information. Moreover, previous research has not taken into account the sharing of information between independent channels when dealing with inter-channel information. This limitation, to some extent, restricts the integrity of the learned representations. To address these issues, we propose SimPM, a simple patch masking contrastive learning framework for time series forecasting that effectively mitigates information loss. In our experiments on seven benchmark time series forecasting datasets, SimPM demonstrates competitive performance compared to existing contrastive learning methods.
Keyword: time series forecasting, contrastive learning, patch masking
Cite
@inproceedings{ICIC_2024,
author = {Jinjun Zhang and Tianyi Wang and Xiaolin Qin},
title = {SimPM: A Simple Patch Masking Contrastive Learning Framework for Time Series Forecasting},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {87-103},
note = {Poster Volume Ⅰ}
}
-
The path planning of hybrid algorithm for patrol robot in complex environment,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Zhiyong Wu and Weiwei Hu
Abstract: In this paper, an improved hybrid obstacle avoidance algorithm is proposed. The artificial potential field (APF) method requires little computation, is easy to implement, and adapts well to obstacle avoidance in dynamic environments, but it adapts poorly to complex environments: it gets stuck in local minima and at U-shaped obstacles. Combining the basic principles of the improved APF and the A* algorithm, we propose a hybrid algorithm to solve these problems. This hybrid path planning algorithm can plan a suitable path regardless of how complex the environment is. In addition, we perform edge dilation on the original obstacle grid map. This processing ensures that the planned path is safe and collision-free, and adjusting the dilation margin makes the planning algorithm universal; that is, the margin can be adjusted according to the radius of the robot while retaining a good planning effect. Furthermore, considering the steering limitations of four-wheel robots, we use a quasi-uniform B-spline to smooth the path, shortening its length and reducing its curvature. Together, these steps yield a satisfactory, safe, and collision-free path.
Keyword: hybrid algorithm, improved APF, improved A*, safe obstacle avoidance, patrol robot
Cite
@inproceedings{ICIC_2024,
author = {Zhiyong Wu and Weiwei Hu},
title = {The path planning of hybrid algorithm for patrol robot in complex environment},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {911-922},
note = {Poster Volume Ⅰ}
}
-
Roberta-MHARC: Enhanced Telecom Fraud Detection with Multi-head Attention and Residual Connection,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Jun Li and Cheng Zhang
Abstract: Telecom fraud has become one of the most severely affected areas of crime. With the development of artificial intelligence technology, telecom fraud texts have become highly concealed and deceptive. Existing prevention methods, such as mobile phone number tracking and detection and text recognition with traditional machine learning models, lack real-time performance in identifying telecom fraud. Furthermore, due to the scarcity of Chinese telecom fraud text data, the accuracy of recognizing Chinese telecom fraud text is not high. In this paper, we design Roberta-MHARC, a telecom fraud text decision-making model based on Roberta combined with a multi-head attention mechanism and residual connections. First, the model selects some categories of data in the CCL2023 telecom network fraud data set as basic samples and combines them with collected telecom fraud text data to form a five-category data set covering impersonation of customer service, impersonation of leaders or acquaintances, loans, public security fraud, and normal text. Second, during training, the model adds a multi-head attention mechanism and improves the training speed through residual connections. Finally, the model improves its accuracy on multi-classification tasks by introducing an inconsistency loss function alongside the cross-entropy loss. Experimental results show that our model achieves good results on multiple benchmark datasets.
Keyword: natural language processing, telecom fraud, Roberta, multi-head attention, multi-classification tasks
Cite
@inproceedings{ICIC_2024,
author = {Jun Li and Cheng Zhang},
title = {Roberta-MHARC: Enhanced Telecom Fraud Detection with Multi-head Attention and Residual Connection},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {623-638},
note = {Poster Volume Ⅱ}
}
-
AttentionNet: An Efficient Scheme for Human Activity Recognition,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Wei Yang, Xiaojun Jing, Hai Huang, Chao Li, and Botao Feng
Abstract: To address the problem of noise interference in two-dimensional radar signals, an efficient human behavior recognition scheme based on an attention mechanism and a convolutional neural network (CNN) is proposed. The algorithm combines the attention mechanism and a spatial pyramid pooling (SPP) layer with the CNN and fuses the hierarchical feature maps generated by the network to reduce noise interference. A comparison of the proposed method with common CNNs shows that the proposed algorithm performs more effectively under different noise conditions. In particular, when the signal-to-noise ratio (SNR) is higher than -10 dB, an accuracy of more than 90% can be achieved.
Keyword: Human activity recognition, Radar signal, Attention mechanism, CNN
Cite
@inproceedings{ICIC_2024,
author = {Wei Yang and Xiaojun Jing and Hai Huang and Chao Li and Botao Feng},
title = {AttentionNet: An Efficient Scheme for Human Activity Recognition},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {104-114},
note = {Poster Volume Ⅰ}
}
-
CoGSD: Fast Consistency Generation Based on 3D Gaussian Splatting,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Zhenghui Sun, Xianglong Li, and Shuyuan Chang
Abstract: This research introduces a multi-view consistent 3D modeling method, CoGSD (Consistent Gaussian Splatting Dreamer), which constructs 3D models rapidly and with multi-view consistency through 3D Gaussian Splatting. 3D Gaussian Splatting generation fails to construct effective 3D assets due to the lack of stable ground truth. In addition, the expansion characteristics of 3D Gaussian Splatting itself lead to abnormal expansion of saturated Gaussian points and to multi-view inconsistency problems. At the same time, the lack of credible ground truth also leads to multi-view inconsistency problems. To solve these problems, we use a pre-trained consistent diffusion model to generate consistent viewpoints. In our framework, instead of diffusion generation from a single prior perspective, the 2D image generation method of Score Distillation Sampling (SDS) uses a ControlNet-tuned pre-trained model to generate 2D images with coherent viewpoints, resulting in high-quality 3D model generation. The method in this paper provides an effective solution for 3D modeling and is expected to be widely used in the field of 3D modeling and visual effects.
Keyword: 3D Model Generation, Multi-view Consistent Image Generation, 3D Gaussian Splatting, Score Distillation Sampling, Diffusion Model
Cite
@inproceedings{ICIC_2024,
author = {Zhenghui Sun and Xianglong Li and Shuyuan Chang},
title = {CoGSD: Fast Consistency Generation Based on 3D Gaussian Splatting},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {245-258},
note = {Poster Volume Ⅱ}
}
-
Hide Your Weaknesses from Attackers: A Defense Method against Black-Box Adversarial Text Attacks,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Kun Li, Minxuan Lv, and Wei Zhou
Abstract: Despite the significant successes of LLMs in generative tasks, the current preference in classification scenarios leans predominantly towards pre-trained language models (PLMs), taking into account the balance between cost and effectiveness. However, these models are prone to manipulation through black-box adversarial text attacks, where attackers modify texts in subtle ways to deceive the models. Typically, attackers follow a two-step process: first, they identify the sentence elements that are crucial to the model; then they alter these elements by replacing, deleting, or adding words or characters.
Previous research has mainly focused on creating adversarial examples for training, aiming to improve model resilience. These efforts often overlook defenses against the initial phase of identifying vulnerable targets. This paper introduces a defensive strategy against these first-stage attacks, leveraging concepts from differential privacy. We propose a novel approach, Mask Regeneration, which conceals the targets using a [Mask] token and employs a Mask Language Model (MLM) to generate misleading samples. Additionally, we observe that key targets often align with high attention values in the model. Based on this insight, we introduce an Attention Shuffle tactic, which randomizes the top-k attention values at each transformer layer, further disorienting attackers.
The experiments show that our defense method achieves better robustness gains than the state of the art under three strong adversarial attacks on three typical NLP tasks: sentiment analysis, textual entailment, and topic classification. Moreover, they also demonstrate that the attack cost increases significantly when attacking our defense model.
Keyword: Natural language processing, Adversarial attack, Adversarial defense
Cite
@inproceedings{ICIC_2024,
author = {Kun Li and Minxuan Lv and Wei Zhou},
title = {Hide Your Weaknesses from Attackers: A Defense Method against Black-Box Adversarial Text Attacks},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {259-269},
note = {Poster Volume Ⅱ}
}
-
TUPL: Text-guided Unknown Pseudo-Labeling for Open World Object Detection,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Xuefei Wang and Dong Xu
Abstract: Open-world object detection (OWOD) aims to enhance the performance of a trained model in real-world environments. A crucial requirement is to detect unknown objects even with only a small number of known class labels. Existing OWOD models that rely on objectness scores to choose unknown pseudo candidates often suffer from sub-optimal performance due to bias towards known classes. Leveraging the inherent advantage of text, which contains natural high-level semantic information, we incorporate textual data into the process of generating pseudo-labels for unknown class objects. By combining this with random selection to further mitigate the bias problem, we present a concise yet powerful text-guided pseudo-labeling approach for OWOD, named TUPL. To further help the model distinguish all foreground objects in an image, we design an ROI feature refinement module that assists the model in learning distinctive foreground features. Experiments conducted on the PASCAL VOC and MS-COCO evaluation benchmarks demonstrate TUPL's exceptional open-world detection capability. Specifically, under the OWOD SPLIT setting, TUPL achieves a UR (Unknown Recall) value of 23.1, which is at least twice as high as that of existing pseudo-labeling methods based on objectness scores.
Keyword: Open world object detection, Cross-modal learning, Pseudo-labeling
Cite
@inproceedings{ICIC_2024,
author = {Xuefei Wang and Dong Xu},
title = {TUPL: Text-guided Unknown Pseudo-Labeling for Open World Object Detection},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {127-144},
note = {Poster Volume Ⅰ}
}
-
Neural Network Model for Malware Classification Based on BiD-ConvLSTM Encoder,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Junyu Wu, Lilie Chen, and Yuan Liu
Abstract: Malware poses a significant threat to the security of computer systems and data. Previous research has demonstrated that deep learning techniques are an effective solution to this problem. Many scholars convert the binary code of malware executable files into images and use neural networks for classification. However, neural networks typically require inputs of fixed size, while malware sizes vary greatly. Traditional algorithms that enforce a uniform image size can lead to information loss and redundancy. This paper proposes a malware classification model that employs a Bidirectional Dynamic Convolutional Long Short-Term Memory (BiD-ConvLSTM) encoder. This encoder can encode images generated from the binary code of malware executable files of varying sizes, outputting fixed-size feature images for neural network training. The encoded images are fed into a ResNet-50 for training and achieve up to 98.44% accuracy on the Microsoft Malware Classification Challenge (BIG 2015) dataset on Kaggle.
Keyword: Malware Detection, Malware Classification, Deep Learning, Neural Networks, BiD-ConvLSTM
Cite
@inproceedings{ICIC_2024,
author = {Junyu Wu and Lilie Chen and Yuan Liu},
title = {Neural Network Model for Malware Classification Based on BiD-ConvLSTM Encoder},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {765-780},
note = {Poster Volume Ⅰ}
}
-
Hyper-heuristic Ant Colony Optimization for solving the integrated distributed permutation flow shop problem and multiple compartments vehicle routing problem with simultaneous deterministic delivery and fuzzy pickup,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yan Geng, Bin Qian, Ning Guo, and Rong Hu
Abstract: Production and transportation are two essential activities in supply chain management, and decision-makers strive to enhance the operational efficiency of these two stages to maximize business interests. In this paper, we consider the integrated distributed permutation flow shop problem (IDFSP) and multiple compartments vehicle routing problem with simultaneous deterministic delivery and fuzzy pickup (IDFSP_MCVRPSDDFP). The IDFSP_MCVRPSDDFP aims to simultaneously minimize the cost and carbon emissions caused by both production and transportation. To address the IDFSP_MCVRPSDDFP, we propose a hyper-heuristic ant colony optimization algorithm (HH_ACO). The HH_ACO is composed of two main components: a hyper-heuristic algorithm (HHA) and an ant colony optimization algorithm (ACO). To enhance the efficiency of local search, we design six heuristic operations within the low-level heuristics (LLHs). Meanwhile, the ACO is utilized to enhance the performance of the high-level heuristics (HLS) within the HHA. Experimental simulations and data analysis have validated that HH_ACO can effectively solve the IDFSP_MCVRPSDDFP.
Keyword: distributed permutation flow shop, multi-objective optimization, hyper heuristic algorithm
Cite
@inproceedings{ICIC_2024,
author = {Yan Geng and Bin Qian and Ning Guo and Rong Hu},
title = {Hyper-heuristic Ant Colony Optimization for solving the integrated distributed permutation flow shop problem and multiple compartments vehicle routing problem with simultaneous deterministic delivery and fuzzy pickup},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {851-862},
note = {Poster Volume Ⅰ}
}
-
Multimodal Chinese Event Detection on Vision-Language Pre-training and Glyphs,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Qianqian Si, Zhongqing Wang, Peifeng Li, and Qiaoming Zhu
Abstract: When using visual information to complement textual data for event extraction, current approaches primarily process text and images independently using different pre-trained unimodal models and then fuse the feature information from the different modalities. However, pre-training and fine-tuning schemes have been extended to the joint domain of vision and language, leading to the development of vision-language pre-trained models (VLPs). These models are extensively trained on text and its corresponding images and then fine-tuned for vision-language tasks. In this paper, we propose a method for event detection based on Chinese glyphs and VLP models. Since Chinese characters are hieroglyphs, some radical features of the trigger words play an auxiliary role in the detection of text trigger words. We convert the text in the ACE Chinese corpus into text images and feed the text and images into the vision-language model to obtain multimodal features for event detection. Experimental results on the ACE 2005 Chinese corpus show that our proposed model outperforms the SOTA baselines.
Keyword: VLP, Chinese glyphs, event detection
Cite
@inproceedings{ICIC_2024,
author = {Qianqian Si and Zhongqing Wang and Peifeng Li and Qiaoming Zhu},
title = {Multimodal Chinese Event Detection on Vision-Language Pre-training and Glyphs},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {270-281},
note = {Poster Volume Ⅱ}
}
-
EMRA-proxy: Enhancing Multi-Class Region Semantic Segmentation in Remote Sensing Images with Attention Proxy,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yichun Yu, Yuqing Lan, Zhihuan Xing, Xiaoyi Yang, Tingyue Tang, and Dan Yu
Abstract: Semantic segmentation is a highly challenging task in high-resolution remote sensing (HRRS) images due to the complex spatial layouts and significant appearance variations of multi-class objects. Convolutional Neural Networks (CNNs) have been widely employed as feature extractors for various visual tasks, owing to their excellent ability to extract local features. However, due to the inherent bias of convolutional operations, CNNs inevitably have limitations in modeling long-range dependencies. Transformers, on the other hand, excel at capturing global representations but overlook the details of local features and category features, and they exhibit high computational and spatial complexity when dealing with high-resolution feature maps. Semantic segmentation has traditionally been modeled as predicting each point on a dense regular grid. In this work, we propose a novel and effective model, EMRA-proxy, which consists of two parts: a homogeneous regions attention proxy (HRA-proxy) and a Multi-class Attention proxy (MCA-proxy). The proposed EMRA-proxy model abandons the common Cartesian feature layout and operates purely at the region level.
First, to capture contextual information within a region, we use a Transformer to encode regions in a sequence-to-sequence manner by applying multiple layers of self-attention to region embeddings that act as proxies for specific regions. HRA-proxy then interprets the image as learnable surface subdivisions, each with flexible geometry and homogeneous semantics. Prediction per region is performed by a single linear classifier on top of the encoded region embeddings, thereby obtaining a homogeneous semantic mask feature map (HSMF-map). MCA-proxy then learns the global class attention map (GCA-map) to make up for ViT's shortcomings in multi-class information extraction. Finally, the HSMF-map and GCA-map are integrated to achieve high-precision multi-class remote sensing image segmentation.
Extensive experiments on three public remote sensing datasets demonstrate the superiority of EMRA-proxy and indicate that our method outperforms state-of-the-art methods in overall performance.
Keyword: Remote sensing Images, Semantic Segmentation, Homogeneous regions attention proxy, Multi-Class Attention Proxy
Cite
@inproceedings{ICIC_2024,
author = {Yichun Yu and Yuqing Lan and Zhihuan Xing and Xiaoyi Yang and Tingyue Tang and Dan Yu},
title = {EMRA-proxy: Enhancing Multi-Class Region Semantic Segmentation in Remote Sensing Images with Attention Proxy},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {538-562},
note = {Poster Volume Ⅰ}
}
-
Efficient Detection Model of Illegal Driving Behavior in Two-Wheeled Vehicles,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Liuyu Zhu, Zhiguang Wang, Zhiqiang Liu, Xiaoxue Li, and Shen Li
Abstract: The intelligent detection of illegal driving behaviors of two-wheeled vehicles, including electric two-wheeled vehicles, motorcycles, and bicycles, is a pivotal part of developing a modern intelligent traffic monitoring system. However, intelligent detection in this domain faces two principal challenges: (1) an absence of pertinent open-source datasets and (2) suboptimal accuracy and speed of existing object detection models in discerning illegal driving behavior of two-wheelers. Hence, this study focuses on two points: (1) constructing the two-wheeled vehicle illegal driving behavior detection (TIDBD) dataset, annotated with 10 driving states, and (2) proposing an efficient detection model, YOLOv8_VanillaBlock, tailored for detecting illegal driving behavior in two-wheeled vehicles. We experimentally compared YOLOv8_VanillaBlock with the original YOLOv8 on the TIDBD dataset using evaluation metrics such as floating point operations (FLOPs), mean average precision (mAP), and GPU inference time. The outcomes indicate that YOLOv8_VanillaBlock yields superior detection results.
Keyword: Illegal driving behavior detection, Dataset, VanillaBlock
Cite
@inproceedings{ICIC_2024,
author = {Liuyu Zhu and Zhiguang Wang and Zhiqiang Liu and Xiaoxue Li and Shen Li},
title = {Efficient Detection Model of Illegal Driving Behavior in Two-Wheeled Vehicles},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {68-79},
note = {Poster Volume Ⅱ}
}
-
Temal: A Time Encoding Module Augmented LLM for Financial Forecasting,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Mingjun Ma, Chenyu Wang, Ruyao Xu, Shuxiao Chen, and Zhongchen Miao
Abstract: In the domain of time series analysis, financial forecasting presents itself as a pinnacle of intricacy. Despite the multitude of models, even those powered by cutting-edge transformer architectures, their practical efficacy on financial datasets has remained unexplored. This challenge stems from the unique nature of financial derivatives: various time scales, multifaceted attributes, and volatile patterns. Therefore, this study introduces an innovative multi-modal fine-tuning framework, which harnesses the semantic comprehension capabilities of Large Language Models (LLMs) and encodes both time-series data and its domain-specific knowledge. To mitigate the shortcomings of LLMs in capturing temporal dynamics, we propose two pivotal innovations: a Time-series Encoding Module (TEM) and a Multi-Patch Method. The TEM seamlessly embeds sophisticated temporal representation algorithms within the LLM architecture. Concurrently, the Multi-Patch Method transforms 1D time series into multiple sets of 2D tensors, each representing distinct temporal segments, thereby enriching the model's temporal analysis capabilities. Our empirical evaluations reveal that the Multi-Patch Method adeptly handles the complex temporal fluctuations across varied intervals. The proposed model outperforms other competing methods, marking a 20.2% enhancement in forecasting accuracy for Turnover Ratio and a 9.1% improvement in zero-shot forecasting performance. Crucially, the TEM and the Multi-Patch Method offer modular improvements for LLM-based time-series forecasting, with potential applications across various domains.
Keyword: Time Series Forecasting, Large Language Models, Financial Derivatives
Cite
@inproceedings{ICIC_2024,
author = {Mingjun Ma and Chenyu Wang and Ruyao Xu and Shuxiao Chen and Zhongchen Miao},
title = {Temal: A Time Encoding Module Augmented LLM for Financial Forecasting},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {455-468},
note = {Poster Volume Ⅱ}
}
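The abstract above says the Multi-Patch Method turns a 1D time series into multiple sets of 2D tensors. The following is only a minimal sketch of that idea, not the authors' implementation: it assumes non-overlapping windows, drops any incomplete tail, and uses a hypothetical function name `multi_patch` and hypothetical patch lengths.

```python
from typing import Dict, List, Sequence


def multi_patch(series: Sequence[float],
                patch_lengths: Sequence[int]) -> Dict[int, List[List[float]]]:
    """Reshape a 1D series into one 2D grid (num_patches x patch_len) per patch length.

    Assumption: patches are non-overlapping and an incomplete tail is dropped.
    """
    grids: Dict[int, List[List[float]]] = {}
    for patch_len in patch_lengths:
        if patch_len <= 0:
            raise ValueError("patch lengths must be positive")
        usable = len(series) - len(series) % patch_len  # drop the incomplete tail
        grids[patch_len] = [list(series[i:i + patch_len])
                            for i in range(0, usable, patch_len)]
    return grids


# Hypothetical toy series of 12 observations, split with patch lengths 3 and 4.
prices = [float(t) for t in range(12)]
grids = multi_patch(prices, patch_lengths=(3, 4))
print(len(grids[3]), "x", len(grids[3][0]))
print(len(grids[4]), "x", len(grids[4][0]))
```

Each patch length yields a different 2D view of the same series, which is one plausible reading of "distinct temporal segments" in the abstract.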
-
A Deep Reinforcement Learning Method for Solving the Multi-depot Vehicle Routing Problem,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Haixin Xu, Rong Hu, Bin Qian, Ziqi Zhang, Qingxia Shang, and Huaiping Jin
Abstract: In this paper, a deep reinforcement learning optimization algorithm combined with a clustering decomposition strategy (DRLA_CD) is proposed for solving the multi-depot vehicle routing problem (MDVRP). First, taking into account the NP-hard and strongly coupled characteristics of MDVRP, an improved K-means algorithm (IKA) is designed to decompose MDVRP into several single-depot vehicle routing subproblems, thereby reducing the search space and improving the search efficiency of the algorithm. Second, the decomposed subproblems are solved using deep reinforcement learning, and the obtained subproblem solutions are then combined to form the complete solution of MDVRP. Finally, to confirm the efficacy of the proposed DRLA_CD, comparative and simulation tests are carried out on instances of different scales.
Keyword: Multi-depot vehicle routing problem, Deep reinforcement learning, Cluster of decomposition, Attention mechanism
Cite
@inproceedings{ICIC_2024,
author = {Haixin Xu and Rong Hu and Bin Qian and Ziqi Zhang and Qingxia Shang and Huaiping Jin},
title = {A Deep Reinforcement Learning Method for Solving the Multi-depot Vehicle Routing Problem},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {863-874},
note = {Poster Volume Ⅰ}
}
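The decomposition step in the abstract above splits a multi-depot problem into single-depot subproblems. The abstract does not describe the improved K-means algorithm (IKA), so the sketch below swaps in a much simpler stand-in: each customer is assigned to its nearest depot by Euclidean distance. The function name and the coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def decompose_by_nearest_depot(depots: List[Point],
                               customers: List[Point]) -> Dict[int, List[int]]:
    """Group customer indices by nearest depot (a stand-in for IKA, not IKA itself)."""
    groups: Dict[int, List[int]] = {d: [] for d in range(len(depots))}
    for c_idx, customer in enumerate(customers):
        nearest = min(range(len(depots)),
                      key=lambda d: math.dist(depots[d], customer))
        groups[nearest].append(c_idx)
    return groups


# Hypothetical instance: two depots, four customers.
depots = [(0.0, 0.0), (10.0, 10.0)]
customers = [(1.0, 1.0), (9.0, 8.0), (2.0, 0.0), (11.0, 12.0)]
subproblems = decompose_by_nearest_depot(depots, customers)
print(subproblems)
```

Each group can then be handed to any single-depot routing solver, and the per-depot routes concatenated into a solution for the whole instance.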
-
Enhancing YOLOv5 with Swin Transformer and Multi-Scale Attention for Improved Helmet Detection in Power Grid Construction Sites,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Jindong He, Tianming Zhuang, Jianwen Min, Botao Jiang, Chaosheng Feng, and Zhen Qin
Abstract: With the progress of computer vision technology, helmet wearing detection has become increasingly important for site safety, particularly in complex environments like power grid construction sites. However, this remains challenging due to incorrect and missed detections in crowded environments. To address this, we introduce a novel approach that refines the YOLOv5 network with the Swin Transformer. This method addresses the limitations of YOLO's convolutional architecture, which struggles with long-range dependencies and dense target detection. Our hybrid strategy combines the Transformer's ability to capture global dependencies with YOLO's processing speed, resulting in a robust, real-time detection system for power grid environments. Additionally, a novel Multi-Scale Convolutional Attention (MSCA) module is proposed, which overcomes the single-scale focus of existing attention mechanisms. By integrating attention across various scales, the MSCA module captures the semantic richness of different feature sizes, enhancing the model's performance in long-range contextual understanding and fine-grained semantic awareness. Extensive experiments are conducted on the safety helmet wearing detect and VOC2028-SafeHelmet datasets, and the superior performance validates the effectiveness and generalization of our proposed method.
Keyword: safety helmet wearing, target detection, vision Transformer, multi-scale representations
Cite
@inproceedings{ICIC_2024,
author = {Jindong He and Tianming Zhuang and Jianwen Min and Botao Jiang and Chaosheng Feng and Zhen Qin},
title = {Enhancing YOLOv5 with Swin Transformer and Multi-Scale Attention for Improved Helmet Detection in Power Grid Construction Sites},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {115-126},
note = {Poster Volume Ⅰ}
}
-
Employing Coarse-grained Task to Improve Fine-grained Dialogue Topic Shift Detection,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Jiangyi Lin, Yaxin Fan, Xiaomin Chu, Peifeng Li, and Qiaoming Zhu
Abstract: The goal of dialogue topic shift detection is to identify whether the current topic in a dialogue has shifted or not. Previous work has focused on detecting whether a topic has shifted, without delving into the finer-grained topic situations of the dialogue. To address this issue, we further explore fine-grained topic shift detection. Based on different categories of topic semantics, a multi-task learning framework is constructed by treating the labels of both coarse and fine granularity as different tasks. The topic semantics of the two granularities reinforce each other and enhance the robustness of the model. Finally, semantic coherence learning as well as weight adaptation learning are applied to alleviate the sample imbalance problem in the dataset, so that the model can distinguish different topic shift situations more effectively. Experimental results on the Chinese dataset CNTD show that the proposed model outperforms several baseline models.
Keyword: Chinese dialogue topic, Fine granularity topic, Topic shift detection, Multi-task Learning
Cite
@inproceedings{ICIC_2024,
author = {Jiangyi Lin and Yaxin Fan and Xiaomin Chu and Peifeng Li and Qiaoming Zhu},
title = {Employing Coarse-grained Task to Improve Fine-grained Dialogue Topic Shift Detection},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {639-650},
note = {Poster Volume Ⅱ}
}
-
Unsupervised Attention-Based Generative Adversarial Network for Remote Sensing Image Fusion,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Quanli Wang, Qian Jiang, Yuting Feng, Shengfa Miao, Huangqimei Zheng, and Xin Jin
Abstract: Remote sensing image fusion, also known as pan-sharpening, combines a single-band panchromatic (PAN) image with a multi-spectral (MS) image to generate a high-quality fused image. Most current methods for remote sensing image fusion are supervised: they require a proportionally down-sampled version of the original multi-spectral image as the training image and the original multi-spectral image as the label image. This results in poor performance on full-resolution images, so unsupervised methods are more practical. Furthermore, most methods do not consider the differences between MS and PAN images and use the same modules to extract features, which results in some information loss. Therefore, we design an unsupervised attention-based generative adversarial network fusion framework (UAB-GAN), which can be trained directly on datasets of unlabeled images. Specifically, the model framework consists of a generator and two discriminators. The generator employs different, specifically designed network modules to extract unique modal features from PAN and MS images, respectively. The two discriminators are then designed to preserve the spectral and spatial information of the different images. Additionally, we propose a unified loss function to integrate multi-scale spectral and spatial features without external data supervision. The effectiveness of the proposed method is demonstrated through experiments conducted on various datasets.
Keyword: Image Fusion, Generative Adversarial Networks (GAN), Unsupervised Methods, Remote Sensing Image
Cite
@inproceedings{ICIC_2024,
author = {Quanli Wang and Qian Jiang and Yuting Feng and Shengfa Miao and Huangqimei Zheng and Xin Jin},
title = {Unsupervised Attention-Based Generative Adversarial Network for Remote Sensing Image Fusion},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {563-580},
note = {Poster Volume Ⅰ}
}
-
A novel interaction graph and texts fusion method for review-based recommendation,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yuqiao Liu, Nan Zheng, and Song Zhang
Abstract: Review-based recommender systems aim to calculate users' preferences for items by leveraging user reviews. Current methods mainly consist of two components: user and item embedding learning, and user-item rating prediction. However, these methods overlook the higher-order interaction relationships in the user-item graph, which help capture users' preferences and the features shared among items. They also overlook the inherent attributes in item descriptions, which complement user reviews. In this paper, we propose a deep neural recommendation framework named UniDNR that unites item descriptions, user reviews, and the user-item interaction graph to make recommendations. UniDNR can be divided into three parts: the ID-level embedding layer, the text-level embedding layer, and the rating prediction layer. Specifically, the ID-level embedding layer captures the higher-order interactive relationships in the user-item interaction graph, which allows features to be better shared among users and items. The text-level embedding layer embeds items and users through aspect-based learning, which considers the different aspects mentioned in descriptions and reviews. We then combine the ID embedding and the text embedding to predict the most likely final rating assigned by the user. Experiments on three real-world datasets demonstrate the superiority of our proposed UniDNR model over state-of-the-art baselines.
Keyword: Recommender systems, Text information, Neural network
Cite
@inproceedings{ICIC_2024,
author = {Yuqiao Liu and Nan Zheng and Song Zhang},
title = {A novel interaction graph and texts fusion method for review-based recommendation},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {469-479},
note = {Poster Volume Ⅱ}
}
-
Enhancing Multi-Step Mathematical Reasoning in Large Language Models with Step-by-Step Similarity Prompts and Answer Voting,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Qi Ye, Xiang Ji, RuiHui Hou, JingPing Liu, and Tong Ruan
Abstract: Complex reasoning problems, especially multi-step mathematical reasoning problems, are a difficult class of NLP tasks to solve. Existing methods such as Manual-CoT improve the accuracy of reasoning tasks by manually designing prompts that lead large models to output reasoning paths. However, the quality of the inference steps generated by this method is not high, resulting in many calculation and planning errors. To tackle these problems, we propose a method that combines enhanced similar step-by-step prompts with an answer voting mechanism. Specifically, we first design a comprehensive prompt template that integrates task prompts, CoT prompts, and format prompts, and then use two similar templates to guide the Large Language Model in generating better inference paths. Furthermore, we use ChatGLM for efficient information retrieval and determine the most accurate answer through a majority voting system. We evaluate our method on five mathematical datasets and one symbolic dataset. The experimental results over GPT-3 show that our proposed method outperforms Zero-shot-CoT and Zero-shot-Program-of-Thought Prompting across all datasets by large margins of 7.3% and 4.4%, respectively, and exceeds Plan-and-Solve on five of the six datasets. In particular, on symbolic datasets our method outperforms all compared methods by an average margin of 13%. Our code and data are publicly available at https://anonymous.4open.science/r/ESPDE-2740.
Keyword: Mathematical reasoning, CoT, Similar prompt
Cite
@inproceedings{ICIC_2024,
author = {Qi Ye and Xiang Ji and RuiHui Hou and JingPing Liu and Tong Ruan},
title = {Enhancing Multi-Step Mathematical Reasoning in Large Language Models with Step-by-Step Similarity Prompts and Answer Voting},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {282-296},
note = {Poster Volume Ⅱ}
}
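The answer-voting step mentioned in the abstract above can be illustrated with plain majority voting over the final answers extracted from several reasoning paths. This is a generic sketch, not the authors' code: the function name `majority_vote` and the sample answers are hypothetical, and ties are broken in favor of the answer seen first.

```python
from collections import Counter
from typing import List, Optional


def majority_vote(answers: List[Optional[str]]) -> Optional[str]:
    """Return the most frequent non-empty answer, or None if there is none.

    Assumption: answers are already extracted from each reasoning path as
    strings; None marks a path from which no answer could be extracted.
    """
    valid = [a.strip() for a in answers if a is not None and a.strip()]
    if not valid:
        return None
    # most_common keeps first-seen order among equally frequent answers.
    return Counter(valid).most_common(1)[0][0]


# Hypothetical answers from four sampled reasoning paths.
final_answer = majority_vote(["42", "41", "42 ", None])
print(final_answer)
```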
-
VCEMO: Multi-Modal Emotion Recognition for Chinese Voiceprints,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Jinghua Tang, Liyun Zhang, Yu Lu, Dian Ding, Lanqing Yang, Yi-Chao Chen, Minjie Bian, Xiaoshan Li, and Guangtao Xue
Abstract: Emotion recognition can enhance humanized machine responses to user commands, while voiceprint-based perception systems can be easily integrated into commonly used devices like smartphones and stereos. Although Chinese has the largest number of speakers, there is a noticeable absence of high-quality corpus datasets for emotion recognition using Chinese voiceprints. The proposed dataset is constructed from everyday conversations and comprises over 100 users and 7,747 textual samples. Furthermore, this paper proposes a multimodal model as a benchmark, which effectively fuses speech, text, and external knowledge using a co-attention structure. The system employs contrastive learning-based regulation to handle the uneven distribution of the dataset and the diversity of emotional expressions. The experiments demonstrate a significant improvement of the proposed model over SOTA on the VCEMO and IEMOCAP datasets. Code and dataset will be released for research.
Keyword: Speech Emotion Recognition, Multi-modal, Chinese Voiceprints
Cite
@inproceedings{ICIC_2024,
author = {Jinghua Tang and Liyun Zhang and Yu Lu and Dian Ding and Lanqing Yang and Yi-Chao Chen and Minjie Bian and Xiaoshan Li and Guangtao Xue},
title = {VCEMO: Multi-Modal Emotion Recognition for Chinese Voiceprints},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {480-491},
note = {Poster Volume Ⅱ}
}
-
Efficient Quantized Transformer Network for EEG Emotion Recognition,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yun Ping, Chao Xu, Jing Hu, Jian Xiao and Yihang Song
Abstract: Emotion recognition algorithms based on electroencephalography (EEG) signals have made promising progress, but as wearable detection devices become more popular, there are greater demands on computational efficiency and model complexity, and existing models may pose a challenge for mobile devices and edge computing platforms. In this paper, we propose an efficient quantized transformer network (EQTNet) for EEG emotion recognition. The basic idea is to combine the power of the transformer for capturing long-term correlations in EEG data with efficient quantization to reduce the model size and computation cost. Our model combines two quantization methods: quantization based on weighted mean subtraction for weights, and elastic quantization based on learning parameters for EEG signals. The computational efficiency of the model is significantly improved, with a slight loss in accuracy. Experiments on the SEED and SEED-IV datasets show that EQTNet outperforms state-of-the-art baselines. Compared to the corresponding full-precision transformer, EQTNet reduces the computational complexity by a factor of 27, whereas the accuracy is reduced by only 1.58%. The code and models are available at https://github.com/ping-yun/EQTNet.git.
Keyword: EEG signals, emotion recognition, efficient transformer, quantification
Cite
@inproceedings{ICIC_2024,
author = {Yun Ping and Chao Xu and Jing Hu and Jian Xiao and Yihang Song},
title = {Efficient Quantized Transformer Network for EEG Emotion Recognition},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {438-454},
note = {Poster Volume Ⅱ}
}
-
HiQuFlexAsync: Hierarchical Federated Learning with Quantization, Flexible Client Selection and Asynchronous Communication,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Ze Zhao, Yifan Liu, Donglin Pan, Yi Liu, Xiaofei Li, and Zhenpeng Liu
Abstract: In three-tier federated learning at the cloud-edge, uneven data distribution can lead to a decrease in model performance, while the increased communication demands of multi-tier federated learning may impact system efficiency. To mitigate the impact of these challenges on federated learning performance, a novel method named HiQuFlexAsync is introduced. HiQuFlexAsync is an asynchronous three-tier federated learning approach with quantization capabilities. Within the framework of HiQuFlexAsync, a new quantizer is employed to naturally compress local and edge gradients, and an algorithm called "Cost-Optimized Heterogeneous Client-Edge Association" (COHEA) is developed. This algorithm aims to optimize the client selection process for federated learning by leveraging data heterogeneity and the physical diversity of clients. Simulation experiments on the MNIST and CIFAR-10 datasets demonstrate that, compared to the traditional three-tier architecture HierFAVG, HiQuFlexAsync achieves an approximate 5.6% increase in accuracy and a 12.2% enhancement in efficiency.
Keyword: Federated Learning, Hierarchical Mechanism, Quantization, Client Selection, Asynchronous Aggregation
Cite
@inproceedings{ICIC_2024,
author = {Ze Zhao and Yifan Liu and Donglin Pan and Yi Liu and Xiaofei Li and Zhenpeng Liu},
title = {HiQuFlexAsync: Hierarchical Federated Learning with Quantization, Flexible Client Selection and Asynchronous Communication},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {297-314},
note = {Poster Volume Ⅱ}
}
-
PropMat-DAE: An Avionics System Fault Diagnosis Algorithm based on Graph Anomaly Detection,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Tianyi Li and Lisong Wang
Abstract: Fault diagnosis of sensors in avionics systems is one of the important ways to ensure the normal operation of modern aircraft. However, most studies only consider the temporal anomalies of a single sensor and seldom consider the hidden spatial relationships between sensors, so they cannot fully capture the spatiotemporal anomalies of sensors. To solve this problem, we propose PropMat-DAE, a fault diagnosis framework that comprehensively considers sensor attribute and structural anomalies. In each iteration, in addition to calculating the anomaly scores of the two parts separately, it also optimizes a combined loss that takes full attention into account and represents the reconstruction error of the sensor under spatiotemporal fusion. Experimental results on open-source aerospace sensor datasets show that the proposed method is superior to 14 recent baseline methods in overall performance, and that its attention design is also superior to mainstream attention mechanisms.
Keyword: Fault diagnosis, GNN, Autoencoder, Matrix Decomposition
Cite
@inproceedings{ICIC_2024,
author = {Tianyi Li and Lisong Wang},
title = {PropMat-DAE: An Avionics System Fault Diagnosis Algorithm based on Graph Anomaly Detection},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {923-936},
note = {Poster Volume Ⅰ}
}
-
A small target detector design for aerial scenarios based on multi-cross adaptive fusion mechanism and high-efficiency feature extraction model,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Zikai Li, Xiangyu Kong, Haolin Chen, and Xu Peng
Abstract: With the rapid development of drone technology and artificial intelligence, target detection from the perspective of unmanned aerial vehicles (UAVs) has become a valuable area of research for a large variety of applications. However, detecting small and densely distributed objects with significant overlap, all while dealing with inadequate lighting conditions, remains a substantial challenge for target detection from this perspective. In this paper, we propose MPB-YOLO, a high-efficiency small target detection algorithm based on the YOLOv8s model, aiming to overcome the challenges of target detection from the UAV perspective. To address the issue of small target sizes, we improve the neck structure. Specifically, we introduce a small target detection head on top of the large-scale feature maps and propose a multi-scale adaptive fusion neck structure. This modification significantly enhances the model's performance on small-sized targets. To tackle the challenges of densely distributed and heavily overlapping targets, we incorporate deformable convolutions and design a spatial coordinate attention mechanism. This mechanism is integrated into the feature extraction module to enhance the network's perceptual capabilities, further improving the model's overall performance. The proposed MPB-YOLO method is evaluated in ablation experiments and compared with other state-of-the-art algorithms on the VisDrone2019 dataset. The results demonstrate that MPB-YOLO outperforms the baseline methods in object detection accuracy. Compared to YOLOv8s, our method achieves a significant improvement in mAP50, with a 26.3% increase on VisDrone2019-test and a 28% increase on VisDrone2019-val. Additionally, the parameter count of MPB-YOLO is 16.8% lower than that of the benchmark model. These experiments validate the effectiveness of MPB-YOLO for object detection in aerial scenarios.
Keyword: tiny object detector, complex scenarios, multi-cross feature fusion, remote sensing
Cite
@inproceedings{ICIC_2024,
author = {Zikai Li and Xiangyu Kong and Haolin Chen and Xu Peng},
title = {A small target detector design for aerial scenarios based on multi-cross adaptive fusion mechanism and high-efficiency feature extraction model},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {176-192},
note = {Poster Volume Ⅰ}
}
-
Value Imitation Reinforcement Learning in Self-Training Dialogue State Tracking,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Jie Yang, Hui Song, Bo Xu, and Tianqi Liu
Abstract: Few-shot Dialogue State Tracking aims to predict the dialogue state with limited labeled data, especially when human annotation is scarce. Existing approaches that use the Self-Training framework often suffer from the gradual drift problem, which results in a noisy expanded labeled dataset. Moreover, apart from the model initialization process, the knowledge in the annotated data has not been fully exploited to handle unlabeled data accurately. In this paper, we introduce Slot Value Imitation Reinforcement Learning into the Self-Training process to alleviate selection bias and improve the quality of pseudo-labels. The reinforcement learning step encourages pseudo-labeled data to imitate the standard value representation of each slot, and high-confidence pseudo-labels are then chosen by a dual selection strategy based on value probability and active slot accuracy. Experimental results on the MultiWOZ 2.0 and MultiWOZ 2.4 datasets demonstrate the effectiveness of our proposed model in few-shot DST scenarios. Compared to the original self-training method, Joint Goal Accuracy improves by up to 2.66% on MultiWOZ 2.0.
Keyword: Dialogue State Tracking, Reinforcement Learning, Self-Training
Cite
@inproceedings{ICIC_2024,
author = {Jie Yang and Hui Song and Bo Xu and Tianqi Liu},
title = {Value Imitation Reinforcement Learning in Self-Training Dialogue State Tracking},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {651-665},
note = {Poster Volume Ⅱ}
}
-
Time Sequence Based Dynamic Hirsch Index Measure for Scholar Impact Factor,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Qing Huo, Yanwen Li, Yue Zhao, Sijia Ma, Wenhua Ming, and Zhiyong Li
Abstract: The Hirsch Index (H-index) has become a widely used index for evaluating the academic influence of scholars, which is of great significance for talent evaluation and resource allocation. Currently, the H-index is mainly measured from static, present-day academic features of scholars, using different combinations of features. To gain a more comprehensive understanding of scholars' academic trajectories, it is necessary to consider their historical H-index when estimating their future impact. In this paper, we introduce the time sequence into the H-index and analyze the dynamic trend of the H-index as a measure of a scholar's influence. Through a comparison of a linear regression model, dynamic Bayesian networks (DBNs), and an LSTM sequence-to-sequence model, the experimental results show that the LSTM model is the most effective for short-term H-index prediction. It achieves an R² exceeding 0.95, which surpasses the linear regression model and DBNs by 11% and 8%, respectively. Additionally, the LSTM model exhibits a significantly lower MAE of only 1.30, a decrease of 1.0 and 0.9 compared to the linear regression model and DBNs, respectively. For long-term prediction, however, the performance of the LSTM model degrades and the DBNs perform better. Our method can effectively predict the H-index on Chinese medicine scholar data and avoids the problem of feature collection faced by the traditional method based on static academic features.
Keyword: Hirsch Index, Time Sequence, LSTM Neural Network, Dynamic Bayesian Networks
Cite
@inproceedings{ICIC_2024,
author = {Qing Huo and Yanwen Li and Yue Zhao and Sijia Ma and Wenhua Ming and Zhiyong Li},
title = {Time Sequence Based Dynamic Hirsch Index Measure for Scholar Impact Factor},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {492-503},
note = {Poster Volume Ⅱ}
}
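For readers unfamiliar with the quantity being forecast in the abstract above, the H-index is the largest h such that a scholar has h papers with at least h citations each. The sketch below computes it and builds a yearly H-index sequence of the kind a time-series model could consume. The citation snapshots are hypothetical and the prediction models from the paper are not reproduced.

```python
from typing import Dict, Iterable, List


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical per-paper citation counts of one scholar at the end of each year.
snapshots: Dict[int, List[int]] = {
    2021: [3, 1, 0],
    2022: [5, 3, 1],
    2023: [8, 4, 3],
}
h_sequence = [h_index(snapshots[year]) for year in sorted(snapshots)]
print(h_sequence)
```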
-
MD-BAN: Multi-Direction Mask and Detail Enhancement Blind-Area Network for Self-Supervised Real-World Denoising,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Ruiying Wang and Yong Jiang
Abstract: Recently, Asymmetric PD and Blind-Spot Network (AP-BSN) has shown effectiveness for real-world image denoising. However, when the noise-related area is large, its single-center pixel mask cannot break the spatial correlation of the noise, so the blind spot recovered from the surrounding pixels still contains noise, resulting in obviously abnormal color spots in the denoised image. In addition, AP-BSN enlarges the receptive field by stacking multiple dilated convolutional layers (DCLs), but these layers may lead to block artifacts and a partial loss of pixel detail due to their interpolation and overlap operations. To address these issues, we propose a multi-direction mask convolution kernel (MDMCK) that forms a blind area to further destroy large-scale spatially correlated noise. We also propose a detail feature enhancement (DFE) module to supplement the detail lost by MDMCK and the stacked DCLs. Finally, we use a robust joint loss function to train our model, generating denoised images with clean and sharp detail while alleviating the block artifacts. Extensive quantitative and qualitative evaluations on the SIDD and DND datasets show that our proposed method performs favorably.
Keyword: Self-supervised denoising, Real-world image, Multi-direction mask, Detail feature enhancement, Blind-area network
Cite
@inproceedings{ICIC_2024,
author = {Ruiying Wang and Yong Jiang},
title = {MD-BAN: Multi-Direction Mask and Detail Enhancement Blind-Area Network for Self-Supervised Real-World Denoising},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {315-328},
note = {Poster Volume Ⅱ}
}
-
An Efficient Two-Stage Black-Box Sparse Adversarial Attack Method Based on Intelligent Optimization,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Ting Mei, Hui Lu, Shiqi Wang, and Ruoliu Zhang
Abstract: Sparse adversarial attacks have attracted increasing attention due to the advantage of low attack costs by limiting the number of modified pixels, However, some sparse attacks assume full access to information from deep neural networks (DNNs), often necessitating a large number of queries, making them impractical, Other methods only constrain the number of perturbed pixels, regardless of the size of the individual perturbation to each pixel, resulting in easily detectable in vision, To overcome these issues, we propose a two-stage black-box sparse attack approach that efficiently generates adversarial examples with small distortions, The proposed method first employs sparse attacks to generate an initial improved perturbation vector that meets the confidence score threshold, using Genetic Algorithm (GA), Subsequently, the size of the initial sparse perturbation vector is optimized to identify the final adversarial example with smaller perturbations through the application of Particle Swarm Optimization (PSO), The experimental results demonstrate that our method can achieve attack success rates comparable to the state-of-the-art black-box sparse attack method within the same budget while introducing more imperceptible distortions, This holds for untargeted and targeted attacks on CIFAR-10 classifiers trained conventionally and adversarially.
Keyword: black-box attack, sparse attack, confidence scores, intelligent optimization
Cite
@inproceedings{ICIC_2024,
author = {Ting Mei and Hui Lu and Shiqi Wang and Ruoliu Zhang},
title = {An Efficient Two-Stage Black-Box Sparse Adversarial Attack Method Based on Intelligent Optimization},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {145-160},
note = {Poster Volume Ⅰ}
}
-
A Survey: Research Progress of Feature Fusion Technology,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Weiqi Wang, Feilong Bao, Zhecong Xing, and Zhe Lian
Abstract: Feature fusion techniques are a critical research topic in deep learning, aiming to concatenate feature information from diverse sources or varying levels to generate more comprehensive and accurate representations. This technology is extensively employed in downstream tasks that require rich target representations, such as image classification, semantic segmentation, and object detection. In recent years, driven by advances in deep learning, feature fusion techniques have progressed rapidly and have had a profound impact on the entire computer vision field. This paper takes a technique-evolution perspective to comprehensively summarize the innovative contributions of feature fusion technology within four cutting-edge domains: Convolutional Neural Network (CNN), Vision Transformer (ViT), Graph Convolutional Network (GCN), and Neural Architecture Search (NAS). We provide a detailed introduction to the specific implementation process of each technology and analyze, from different viewpoints, the pivotal role that feature fusion plays in each of them. Finally, we provide a systematic overview of the mechanisms behind several classical methods, collect links to their open-source code, and evaluate the performance of several classic methods.
Keyword: Deep learning, Feature fusion, Convolutional neural network, Vision transformer, Neural architecture search, Graph convolutional network
Cite
@inproceedings{ICIC_2024,
author = {Weiqi Wang and Feilong Bao and Zhecong Xing and Zhe Lian},
title = {A Survey: Research Progress of Feature Fusion Technology},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {161-175},
note = {Poster Volume Ⅰ}
}
-
DSCANet: Dynamic Snake Convolution with Attention for Crack Segmentation,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Wenbo Hu, Kuixuan Jiao, Kaijian Xia, and Rui Yao
Abstract: Pixel-wise crack segmentation plays a crucial role in infrastructure maintenance. However, it poses a significant challenge due to the irregular and slender nature of cracks. The standard convolution kernel struggles to accurately capture these structural features because its shape is a fixed square. Moreover, the presence of intricate details in cracks means that shallow information, which contains more spatial and nuanced details, significantly impacts the segmentation results. Any inadequacy or inaccuracy in local detailed information can lead to segmentation errors. In this paper, we propose a novel crack segmentation method based on an encoder-decoder framework. First, we propose a Dynamic Snake Convolution with Attention (DSCA) module to enhance feature extraction accuracy for cracks and direct the network's focus towards critical features. Additionally, we propose a multi-level and multi-scale feature fusion strategy that enables the network to effectively leverage both local spatial information and global semantic information. We enrich the decoder with more detailed information at various levels. We also incorporate a channel prior convolutional attention mechanism for feature fusion to supplement attention to both channel and spatial aspects. Finally, a Strip Pooling Module (SPM) is added to our network. The SPM enables the network to efficiently model long-range dependencies. Experimental results on two different crack datasets validate the superior performance of our method, which surpasses several mainstream methods.
Keyword: Crack segmentation, Feature fusion, Attention mechanism
Cite
@inproceedings{ICIC_2024,
author = {Wenbo Hu and Kuixuan Jiao and Kaijian Xia and Rui Yao},
title = {DSCANet: Dynamic Snake Convolution with Attention for Crack Segmentation},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {581-594},
note = {Poster Volume Ⅰ}
}
-
A Memetic Algorithm for Finding Robust and Influential Seeds for Networks under Cascading Failures,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Shun Cai, Shuai Wang, and Chengkun Yang
Abstract: Selecting a set of nodes with the strongest information spreading capability from a complex network is known as the influence maximization problem. Existing research has mostly focused on structurally stable networks, and the impact of network structural changes on the influence diffusion process is yet to be explored. At the same time, network systems are inevitably subject to disturbances or even structural damage during operation, such as cascading failures. This paper investigates the robust influence maximization (RIM) problem under cascading failures triggered by link attacks. A numerical metric is designed to comprehensively assess the robust influence performance of seeds. For the RIM problem, a memetic algorithm with an ecological niche strategy, termed MA-RIMCF-Link, is designed to find seeds with stable influence. Experiments on synthetic and practical networks validate the competitiveness and effectiveness of MA-RIMCF-Link compared to existing methods.
Keyword: Complex networks, cascading failure, robustness, influence maximization
Cite
@inproceedings{ICIC_2024,
author = {Shun Cai and Shuai Wang and Chengkun Yang},
title = {A Memetic Algorithm for Finding Robust and Influential Seeds for Networks under Cascading Failures},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {504-515},
note = {Poster Volume Ⅱ}
}
-
Analysis of risk factors for recurrence of Budd-Chiari syndrome based on zero-inflated model,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Shengli Li, Xiangting Liu, Muyao Zhou, Hui Wang, Na Yang, Cuocuo Wang, Qingqiao Zhang, Maoheng Zu, and Lei Wang
Abstract: Objective: To identify the optimal model for assessing the recurrence frequency of BCS, and further analyze the factors contributing to BCS recurrence. Methods: The study included a total of 754 patients who were admitted to the Affiliated Hospital of Xuzhou Medical University between January 2015 and July 2022. We constructed four different count outcome models: the Poisson model, the negative binomial (NB) model, the zero-inflated Poisson (ZIP) model, and the zero-inflated negative binomial (ZINB) model. We selected the model with the best fitting performance for exploring factors associated with BCS recurrence. Results: Of all 754 respondents, 511 patients reported no recurrences. The LR tests indicated that the NB model performed better than the Poisson regression model (X^2 = 124.91, p < 0.001), and the ZINB model outperformed the ZIP model (X^2 = 34.29, p < 0.001). In the ZINB model, the analysis of the counting process revealed that the variables significantly associated with recurrence frequency included age (odds ratio [OR] = 0.69; 95% confidence interval [CI]: 0.57-0.84), sex (female: OR = 1.77; 95% CI: 1.24-2.55), anticoagulant use (warfarin vs. new oral anticoagulants [NOACs]: OR = 2.11, 95% CI: 1.34-3.31; not using anticoagulants vs. NOACs: OR = 1.98, 95% CI: 1.20-3.28), absence of cirrhosis (OR = 0.57, 95% CI: 0.40-0.82), and neutrophil count (OR = 1.22, 95% CI: 1.04-1.42). Conclusions: The zero-inflated model proves robust in identifying factors influencing BCS recurrence compared to other models, elucidating the influence of gender, surgery, anticoagulation, cirrhosis, hospital duration, APOA, and neutrophil count on recurrence risk and frequency of BCS patients.
Keyword: ZINB regression, Dispersion, Count data, Budd-Chiari syndrome, Recurrence
Cite
@inproceedings{ICIC_2024,
author = {Shengli Li and Xiangting Liu and Muyao Zhou and Hui Wang and Na Yang and Cuocuo Wang and Qingqiao Zhang and Maoheng Zu and Lei Wang},
title = {Analysis of risk factors for recurrence of Budd-Chiari syndrome based on zero-inflated model},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {771-785},
note = {Poster Volume Ⅱ}
}
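The likelihood-ratio (LR) comparisons reported in the abstract above can be checked from the log-likelihoods of two nested count models. The following is a minimal sketch, not the authors' analysis code. It assumes each comparison (Poisson vs. NB, ZIP vs. ZINB) adds exactly one parameter, the NB dispersion, so the statistic is referred to a chi-square distribution with 1 degree of freedom. The function names are hypothetical. Because the dispersion parameter lies on the boundary of its range under the null hypothesis, some analyses halve this p-value; that adjustment is omitted here.

```python
import math


def likelihood_ratio_statistic(loglik_restricted, loglik_full):
    """LR statistic 2 * (logL_full - logL_restricted) for two nested models."""
    return 2.0 * (loglik_full - loglik_restricted)


def chi2_sf_1df(stat):
    """P(X > stat) for a chi-square variable with 1 degree of freedom.

    For 1 degree of freedom this equals erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


# The two LR statistics quoted in the abstract (assumed here to have 1 df each).
for label, stat in [("NB vs. Poisson", 124.91), ("ZINB vs. ZIP", 34.29)]:
    print(label, "p < 0.001:", chi2_sf_1df(stat) < 0.001)
```

Under the 1-degree-of-freedom assumption, both reported statistics are consistent with the stated p < 0.001.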
-
Incorporating Causal Connective Prediction to Improve Event Causality Identification with Generated Explanations,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Jinzhao Cheng, Sheng Xu, Peifeng Li, and Qiaoming Zhu
Abstract: Event Causality Identification (ECI) aims to predict the causal relation for a pair of events in text. Previous work has often combined fine-tuning with specific classifiers, which contradicts the pre-training task of the model and fails to utilize the knowledge within the pre-trained language model (PLM). Additionally, event causality identification is a complex inference task, and relying solely on sample content makes it challenging to establish an effective inference process. To tackle these two issues, we propose a new prompt-based approach for ECI, which includes a new task, causal connective prediction, and the use of explanations generated by a large language model (LLM) to enhance event causality identification. Initially, we direct the LLM to produce natural language explanations of target event pairs to aid prompt generation. These explanations assist the model in comprehending events and their correlation. Additionally, we develop a task for predicting causal connectives to guide the reasoning process. Furthermore, we introduce a tensor matching mechanism to capture the semantic interaction of events in context, supporting our two prompt tasks. Our experimental results on two benchmark datasets demonstrate that our method outperforms state-of-the-art models in the sentence-level ECI task.
Keyword: Event Causality Identification, Prompt-based Learning, Causal Connective Prediction
Cite
@inproceedings{ICIC_2024,
author = {Jinzhao Cheng and Sheng Xu and Peifeng Li and Qiaoming Zhu},
title = {Incorporating Causal Connective Prediction to Improve Event Causality Identification with Generated Explanations},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {691-702},
note = {Poster Volume Ⅱ}
}
-
Academic Institution Name Recognition Based on Representation Learning and Sematic Matching,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Jinyu Wang and Zhijie Ban
Abstract: Recognition of scholars' institution names has been extensively researched for accurately parsing academic papers. Existing rule-based methods are primarily applicable in cases where the writing styles of the organization name are regular. Other approaches need to pre-establish a knowledge base for mapping organization names, which demands considerable human resources. This paper presents a method based on representation learning and semantic matching, primarily leveraging the institution's textual information and the academic network's structure. We first construct an author-institution heterogeneous graph, on which maximal random walk and Word2Vec are used to obtain representation vectors for institution nodes. Then, we convert institution names into semantic vectors with the SimCSE model and generate institution candidate sets with the locality sensitive hashing algorithm. Finally, to avoid setting the cluster number, we propose a connected subgraph partitioning method to divide institution clusters. Experimental results on two real datasets demonstrate that our method significantly outperforms the existing state-of-the-art recognition methods.
Keyword: Academic Institution Name Recognition, Representation Learning, Semantic Matching
Cite
@inproceedings{ICIC_2024,
author = {Jinyu Wang and Zhijie Ban},
title = {Academic Institution Name Recognition Based on Representation Learning and Sematic Matching},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {516-528},
note = {Poster Volume Ⅱ}
}
-
Hybrid Convolutional Network for Object Detection and Multi-Class Counting,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yuanlin Ning, Ying Yang, Zhenbo Li, Jianquan Li, and Ping Song
Abstract: In this work, we introduce a novel Hybrid Convolutional Network designed for efficient object detection and multi-class counting in varied applications such as aerial photography and surveillance. Leveraging the strengths of hybrid networks, our model facilitates the simultaneous execution of detection and counting tasks by sharing common network structures, thereby accelerating the image analysis process and enhancing feature generalization. We propose a novel Density-Aware Non-Maximum Suppression algorithm that adaptively adjusts the Intersection over Union (IoU) threshold according to object density, ensuring robust detection performance in both dense and sparse scenes. Additionally, we introduce a Region Suppression Module that leverages detection outcomes to minimize noise in density maps, further improving counting accuracy. Through comprehensive experiments, our approach demonstrates state-of-the-art performance in counting tasks and competitive accuracy in detection tasks across various datasets, while maintaining high processing speed.
Keyword: object detection, object counting, multi-task learning
Cite
@inproceedings{ICIC_2024,
author = {Yuanlin Ning and Ying Yang and Zhenbo Li and Jianquan Li and Ping Song},
title = {Hybrid Convolutional Network for Object Detection and Multi-Class Counting},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {80-94},
note = {Poster Volume Ⅱ}
}
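The idea of adapting the IoU threshold to object density, mentioned in the abstract above, can be illustrated with a small greedy NMS loop. This is a sketch under stated assumptions and not the paper's Density-Aware Non-Maximum Suppression: the per-box `densities` input (a value in [0, 1]) and the rule `max(base_thr, density)` are hypothetical choices made only to show how a denser region can tolerate more overlap before suppression.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def density_aware_nms(boxes, scores, densities, base_thr=0.5):
    """Greedy NMS whose threshold grows with a per-box density estimate.

    densities[i] is a hypothetical density score in [0, 1] for box i.
    Assumed rule: threshold = max(base_thr, densities[i]).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        thr = max(base_thr, densities[best])
        # Drop remaining boxes that overlap the kept box by more than thr.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thr]
    return keep


boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
sparse_keep = density_aware_nms(boxes, scores, [0.0, 0.0, 0.0])
dense_keep = density_aware_nms(boxes, scores, [0.8, 0.8, 0.0])
print(sparse_keep, dense_keep)
```

With a fixed threshold of 0.5 the second box is suppressed by the first, while a high density estimate raises the threshold and keeps both overlapping boxes.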
-
Chinese Discourse Parsing on Hierarchical Topic Graphs,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Weihao Liu, Yaxin Fan, Xiaomin Chu, Peifeng Li, and Qiaoming Zhu
Abstract: Discourse parsing aims to help understand the structure and semantics of discourse by mining the intrinsic structured information of the text. Most existing methods lack guidance from topic information in modeling discourse units, resulting in inconsistencies in semantic modeling at various levels. Therefore, we propose a Chinese discourse parsing method on hierarchical topic graphs, in which topic information interacts with textual semantic information at different levels. In particular, we use GPT-4 to generate topic information at different levels. Then, we construct the topic information into a three-level hierarchical topic graph by referring to the original discourse unit division, allowing the core information at different levels to merge. The experiments on both Chinese UCDTB and English RST-DT demonstrate the effectiveness of our proposed method.
Keyword: Discourse parsing, Topic information, Hierarchical topic graph
Cite
@inproceedings{ICIC_2024,
author = {Weihao Liu and Yaxin Fan and Xiaomin Chu and Peifeng Li and Qiaoming Zhu},
title = {Chinese Discourse Parsing on Hierarchical Topic Graphs},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {666-677},
note = {Poster Volume Ⅱ}
}
-
Bitcoin Illegal Transaction Detection Model Based on Time Step and Ensemble Learning,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Zelai Yang and Xudong Li
Abstract: In recent years, bitcoin, as the leading digital currency, has grown in value. However, the anonymity of the bitcoin system makes it convenient for people to carry out illegal activities on it. This leads to irreversible losses to the rights and interests of investors, as well as the proliferation of criminal activities, resulting in significant economic losses. Thus, detecting and combating illegal transactions is crucial. This research is dedicated to solving the problem of detecting illegal transactions in open-source bitcoin transaction datasets. We propose a bitcoin illegal transaction detection model based on time step and ensemble learning. The model accounts for the time-sensitive nature of real-world illegal behaviour by grouping the dataset by time step. Moreover, the model leverages oversampling techniques in the ensemble learning stage to improve recall. Experimental results indicate that the proposed model makes it easier for the classifiers at each time step to capture the prevalent illegal patterns of the bitcoin system at different time steps. Results also show that this model achieves high precision and recall (precision = 0.99, recall = 0.9) on the Elliptic dataset, thus improving the detection rate of illegal transactions.
Keyword: Bitcoin, Illegal transaction, Time step, Ensemble learning, Oversampling
Cite
@inproceedings{ICIC_2024,
author = {Zelai Yang and Xudong Li},
title = {Bitcoin Illegal Transaction Detection Model Based on Time Step and Ensemble Learning},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {329-340},
note = {Poster Volume Ⅱ}
}
-
Improved Swarm Intelligence Algorithm Based on Novel Nonlinear Multi-Strategy Optimisation,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Xiaoran He, Jianuo Hou, Jianrong Li, and Chuanlei Zhang
Abstract: To address the low solution accuracy of the grey wolf optimization algorithm and its tendency to fall into local optima in the later stage, an improved swarm intelligence algorithm, NSGWO, based on a novel nonlinear multi-strategy optimization is proposed. Firstly, Logistic mapping is used to make the initial particles as uniformly distributed as possible, so as to provide a balanced search space. Secondly, improved Gaussian distributions are used to adjust the convergence factor, and a logarithmic function is introduced into the position-updating formula of NSGWO. Finally, an optimal position perturbation mechanism is introduced, which prompts the algorithm to jump out of the local optimal solution by perturbing the optimal position in a small range, thus further improving the optimisation performance of the algorithm. Several classical unimodal and multimodal test functions are used to verify the optimisation performance of NSGWO. The experimental results show that, compared with other classical optimisation algorithms, NSGWO has a greater advantage in terms of convergence speed and solution accuracy.
Keyword: Logistic mapping, convergence factor, position update, optimal position perturbation, GWO
Cite
@inproceedings{ICIC_2024,
author = {Xiaoran He and Jianuo Hou and Jianrong Li and Chuanlei Zhang},
title = {Improved Swarm Intelligence Algorithm Based on Novel Nonlinear Multi-Strategy Optimisation},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {875-888},
note = {Poster Volume Ⅰ}
}
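The Logistic-mapping initialisation mentioned in the abstract above can be sketched as follows. This is an illustrative sketch rather than the NSGWO implementation: it uses the standard logistic map x <- r * x * (1 - x), and the control parameter `r = 4.0`, the starting value `seed_value = 0.37`, and the function name are assumptions not taken from the paper.

```python
def logistic_map_population(n_particles, dim, lower, upper, r=4.0, seed_value=0.37):
    """Build an initial population from successive logistic-map iterates.

    Each iterate lies in [0, 1] when r = 4 and the start value is in (0, 1);
    it is then scaled linearly into the search range [lower, upper].
    """
    x = seed_value
    population = []
    for _ in range(n_particles):
        position = []
        for _ in range(dim):
            x = r * x * (1.0 - x)  # one step of the logistic map
            position.append(lower + x * (upper - lower))
        population.append(position)
    return population


wolves = logistic_map_population(n_particles=5, dim=3, lower=-10.0, upper=10.0)
for wolf in wolves:
    print(wolf)
```

Chaotic sequences of this kind are often used in place of pseudo-random draws with the aim of spreading the initial positions across the search range.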
-
Document-level Event Argument Extraction with Entity type-aware Graph Link Prediction,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Tianqi Liu, Jie Yang, and Hui Song
Abstract: Document-level event argument extraction faces challenges such as context modeling, cross-sentence correlations, and long-distance dependencies. Previous research has introduced abstract meaning representation to capture the semantic structure of documents. However, there are still issues with incomplete argument spans and misclassified argument roles. To improve the performance of the model in argument identification and classification, we propose a novel model, EBGE, which involves an entity type-aware bidirectional heterogeneous graph. It updates node representations by means of a relational graph attention network and then predicts arguments through node representations and span entity type embeddings. Experimental results on the public datasets WikiEvents and RAMS demonstrate that our model achieves improvements in F1 scores on both subtasks compared to previous state-of-the-art works.
Keyword: Event Argument Extraction, Abstract Meaning Representation, Entity Type
Cite
@inproceedings{ICIC_2024,
author = {Tianqi Liu and Jie Yang and Hui Song},
title = {Document-level Event Argument Extraction with Entity type-aware Graph Link Prediction},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {678-690},
note = {Poster Volume Ⅱ}
}
-
Unsupervised low-light image enhancement using statistic modules and dense connections,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yunqi Ma and Danwei Chen
Abstract: Low-light Image Enhancement (LLIE) is a crucial strategy for improving the brightness and visual characteristics of underexposed images. Traditional and machine learning-based LLIE methods often use a single image or map to merge the most prominent channel in RGB. However, this approach makes it difficult to achieve a comprehensive understanding of the image data because of the limited information available from a single source. Leveraging a single image or map may limit the algorithm's ability to capture the full spectrum of details and nuances present in the original image. Therefore, it is important to explore alternative approaches that can capture a more comprehensive and detailed representation of the input data. In this paper, we introduce an unsupervised approach, SDLLIE, which combines the advantages of retinex theory and deep learning. Firstly, a statistical module is used to extract various information from the input map, allowing for a comprehensive analysis of the image data. Secondly, dense connections are incorporated to prevent network degradation and facilitate the smooth flow of information across layers. Before extracting the illumination and reflectance components, we remove noise to improve the quality and accuracy of low-light images. To align the generated results with the desired outcomes, we use a set of customized loss functions to guide the training process and optimize the network parameters effectively. Our proposed SDLLIE method has been comprehensively evaluated using both quantitative and qualitative measures on three widely recognized benchmark datasets. The results demonstrate its considerable performance when compared to existing state-of-the-art approaches.
Keyword: Retinex, Statistic, Connection
Cite
@inproceedings{ICIC_2024,
author = {Yunqi Ma and Danwei Chen},
title = {Unsupervised low-light image enhancement using statistic modules and dense connections},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {595-606},
note = {Poster Volume Ⅰ}
}
-
K-Aster: A novel Membership Inference Attack via Prediction Sensitivity,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Ruoyi Li, Xinlong Zhao, Deshun Li, and Yuyin Tan
Abstract: Membership Inference Attacks (MIA) are considered a fundamental privacy risk in Machine Learning (ML); they attempt to determine whether a specific data sample is part of the training data of a target model. However, the recently proposed Aster only reports precision and recall for the member class, without reporting the false alarm rate (FAR) for the non-member class or the performance of target models. Additionally, Aster with Jacobi matrices requires the target model to output a vector of prediction probabilities, so it can be easily defended against when the model outputs only labels. In this paper, we propose a novel MIA method, K-Aster, which only needs the output labels and partial training data of the target model to determine whether the data samples were used to train a given ML model. We obtain different output labels of the target model by data enhancement. Then we extract features from the labels to fit a line and quantify the prediction sensitivity with its slope. Finally, we regard the samples with lower sensitivity as training data. Experimental results of attacks on Automatic Speech Recognition (ASR) systems show that our method is an important extension to Aster, which can achieve low FAR and high attack precision under non-classification tasks. The source code is available at https://github.com/13053676954/K-Aster.
Keyword: Machine Learning, Membership Inference Attack, Aster, Prediction Sensitivity
Cite
@inproceedings{ICIC_2024,
author = {Ruoyi Li and Xinlong Zhao and Deshun Li and Yuyin Tan},
title = {K-Aster: A novel Membership Inference Attack via Prediction Sensitivity},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {781-792},
note = {Poster Volume Ⅰ}
}
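The abstract above describes fitting a line to label-derived features and using its slope as a sensitivity score. The sketch below illustrates only that general idea and is not the K-Aster code: the feature used here (the fraction of augmented inputs whose output label differs from the original label, measured at increasing augmentation strength) and both function names are assumptions made for illustration.

```python
def label_change_rates(original_label, labels_per_level):
    """Fraction of augmented outputs that differ from the original label.

    labels_per_level[k] holds the labels returned by the target model for
    inputs augmented at (hypothetical) strength level k.
    """
    return [
        sum(1 for label in labels if label != original_label) / len(labels)
        for labels in labels_per_level
    ]


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


levels = [1, 2, 3]
# Toy label-only outputs for two samples (made-up values for illustration).
sample_a = [["cat", "cat"], ["cat", "cat"], ["cat", "dog"]]
sample_b = [["cat", "cat"], ["cat", "dog"], ["dog", "dog"]]
slope_a = least_squares_slope(levels, label_change_rates("cat", sample_a))
slope_b = least_squares_slope(levels, label_change_rates("cat", sample_b))
print(slope_a, slope_b)
```

In this toy setting sample A has the smaller slope, so under the abstract's rule (lower sensitivity suggests training data) it would be the more likely member; any decision threshold would have to be calibrated separately.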
-
ADPA-PCB: Enhancing PCB Defect Detection Neural Networks with Adaptive Activate Conv and PCBSAdd,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: ZhiNan Huang, GuoPeng Zhou, and JianQuan Zhang
Abstract: The effective operation of electronic products relies heavily on high-quality printed circuit boards (PCBs). PCB defects can lead to failures, resulting in substantial wastage and financial losses. Hence, it is imperative to employ efficient technologies for detecting defects in PCBs, aiming to minimize waste and ensure product reliability. However, existing defect detection methods struggle to balance accuracy and speed. Traditional approaches often compromise one aspect in favor of the other, leading to suboptimal performance and limited practicality in real-world scenarios. To address this challenge, this paper presents a novel defect detection method. Our study proposes augmenting publicly available datasets of PCB images by incorporating a broader range of defect types. This augmentation facilitates better simulation of real-world defect detection scenarios. Additionally, we introduce the Adaptive Activate Conv module to enhance the model's capacity to learn features associated with PCB defects. Furthermore, we propose the PCBSAdd module to improve the model's accuracy in detecting PCB defects. Extensive experiments are conducted using an expanded PCB dataset to evaluate the performance of the proposed method. The results demonstrate outstanding performance, exhibiting a noteworthy 2.1% increase in detection accuracy. Moreover, the proposed model maintains real-time applicability, highlighting its practical significance in industrial settings.
Keyword: Printed Circuit Boards (PCBs), Defect Detection, Adaptive Activate Conv, PCBSAdd, Deep Learning
Cite
@inproceedings{ICIC_2024,
author = {ZhiNan Huang and GuoPeng Zhou and JianQuan Zhang},
title = {ADPA-PCB: Enhancing PCB Defect Detection Neural Networks with Adaptive Activate Conv and PCBSAdd},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {95-106},
note = {Poster Volume Ⅱ}
}
-
Robust Lane Detection via Spatial and Temporal Fusion,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Siyuan Peng, Wangshu Yao, and Yifan Xue
Abstract: Lane detection is a crucial and challenging task in autonomous driving. Most existing detection methods perform well only in common scenes and poorly in extreme scenarios such as occlusion and strong illumination. To address this problem, this paper introduces LSTnet, a robust lane detection network based on spatial-temporal fusion for extreme scenarios such as occlusion. LSTnet incorporates a detachable local and global memory component as an external storage unit. Through fusion, read, and update operations on memory features, the component captures temporal information to compensate for the lack of information in extreme detection scenarios. Additionally, LSTnet uses a memory alignment loss function to guide the memory component to update the memory effectively, so as to obtain temporal consistency between the feature maps output by the memory component and the ground truth feature maps. Extensive experiments on two commonly used datasets demonstrate that the network achieves an F1 score of 79.49 on CULane and 97.31 on the TuSimple dataset.
Keyword: Lane Detection, Time Series Model, Memory Network
Cite
@inproceedings{ICIC_2024,
author = {Siyuan Peng and Wangshu Yao and Yifan Xue},
title = {Robust Lane Detection via Spatial and Temporal Fusion},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {107-118},
note = {Poster Volume Ⅱ}
}
-
BPSO and BRKGA for Broadcast Scheduling Problem,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Shuai Xiaoying and Yin Yuxia
Abstract: Broadcasting in wireless networks is an essential information dissemination method. However, a shared channel causes contention and collisions. TDMA is a widely used conflict-free scheduling scheme. In this study, a TDMA scheduling scheme based on BPSO (Binary Particle Swarm Optimization) and BRKGA (Biased Random-Key Genetic Algorithm) is proposed for the BSP (Broadcast Scheduling Problem), which is an NP-complete problem. First, a better population is generated by BPSO; then, the scheme uses BRKGA to get a solution closer to the optimum. The simulation results show that the proposed algorithm performs better in terms of lower frame length and higher channel utilization.
Keyword: BSP, TDMA, BPSO, BRKGA
Cite
@inproceedings{ICIC_2024,
author = {Shuai Xiaoying and Yin Yuxia},
title = {BPSO and BRKGA for Broadcast Scheduling Problem},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {889-900},
note = {Poster Volume Ⅰ}
}
-
Smoking Detection Model Based on Improved YOLOv8-s,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yujun Zhu, Canyang Zhou, and Bi Zeng
Abstract: Target detection algorithms face challenges in smoking detection tasks, particularly in identifying small targets and avoiding misidentification in various scenes. In this paper, we propose Smoking-YOLO based on YOLOv8-s, which adopts ConvNeXtv2 as the backbone to extract features at four scales, obtaining stronger contextual information to enhance small target detection capability. During the feature fusion stage, we employ a bidirectional three-channel four-scale fusion strategy to output four-scale prediction maps, strengthening the semantic focus on smoking details and improving the ability to distinguish pseudo-smoking behaviors. Finally, we add a slide weighting function to enhance attention to hard negative samples. Experimental results on the self-built Smoking-3k dataset show that our model achieves an AP_small of 0.31 for small targets, an improvement of 10.6%. The model's precision and recall reach mAP_0.5: 0.947 and mAP_0.5:0.95: 0.652, respectively, increasing by 3.1% and 7%, demonstrating the effectiveness of the model improvement. The code is available at https://github.com/TaroPlay/Smoking-YOLO.gi.
Keyword: Smoking Detection, Small Target Detection, Bidirectional Three-channel Four-scale Fusion Strategy
Cite
@inproceedings{ICIC_2024,
author = {Yujun Zhu and Canyang Zhou and Bi Zeng},
title = {Smoking Detection Model Based on Improved YOLOv8-s},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {193-209},
note = {Poster Volume Ⅰ}
}
-
A malicious traffic detection algorithm based on the combination of traffic statistical feature and BERT text feature,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: HongPeng Wang, YingMing Zeng, Jia Hu, KaiWei Kong, and LinLin Zhang
Abstract: Nowadays, most malicious traffic detection algorithms are based on statistical characteristics. However, as the behaviors of malicious traffic constantly evolve, attackers keep refining their techniques to evade these statistical feature-based detection algorithms. In today's complex network environment, relying solely on statistical features may not identify all malicious traffic. Therefore, this paper proposes a classification detection method that integrates statistical features with BERT text features as the training and testing features of the classification model. The classification model utilizes variational autoencoders to capture malicious traffic's latent patterns and anomalous features. Experimental results show that the proposed method can classify malicious traffic with an accuracy of 99%, which is significantly better than other malicious traffic detection algorithms. The proposed method combines mixed features with probabilistic modeling, significantly improving the accuracy of detecting malicious traffic and enabling early detection and prevention of potential network attacks and threats.
Keyword: variational autoencoder, network malicious traffic detection, deep learning, BERT
Cite
@inproceedings{ICIC_2024,
author = {HongPeng Wang and YingMing Zeng and Jia Hu and KaiWei Kong and LinLin Zhang},
title = {A malicious traffic detection algorithm based on the combination of traffic statistical feature and BERT text feature},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {793-806},
note = {Poster Volume Ⅰ}
}
-
LLM-as-an-Augmentor: Improving the Data Augmentation for Aspect-Based Sentiment Analysis with Large Language Models,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Mengyang Xu, Qihuang Zhong, and Juhua Liu
Abstract: Aspect-Based Sentiment Analysis (ABSA) is a vital fine-grained sentiment analysis task that aims to determine the sentiment polarity towards an aspect in a sentence. Due to the expensive and limited amounts of labeled data, data augmentation (DA) methods have become the de-facto standard for ABSA. However, current DA methods usually suffer from (1) poor fluency and coherence and (2) a lack of diversity in the generated data. To this end, we propose a simple-yet-effective DA method for ABSA, namely LLM-as-an-Augmentor, which leverages the powerful capability of third-party large language models (LLMs) to improve the quality of generated data. Specifically, we introduce several text reconstruction strategies and use them to guide the LLMs for automatic data generation via a carefully-designed prompting method. Extensive experiments on 5 baseline methods and 3 widely-used benchmarks show that our LLM-as-an-Augmentor can bring consistent and significant performance gains among all settings. More encouragingly, given only 15% of the labeled data, our method can achieve performance comparable to that of the full labeled data.
Keyword: Aspect-based Sentiment Analysis, Large Language Model, Data Augmentation, Prompt Engineering
Cite
@inproceedings{ICIC_2024,
author = {Mengyang Xu and Qihuang Zhong and Juhua Liu},
title = {LLM-as-an-Augmentor: Improving the Data Augmentation for Aspect-Based Sentiment Analysis with Large Language Models},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {703-714},
note = {Poster Volume Ⅱ}
}
-
Entity Resolution with Deep Interactions and Fine-Grained Difference Extraction based on BERT,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Huiting Yuan, Liang Zhu, Yu Wang, and Zhouyan Liu
Abstract: Entity Resolution (ER) is crucial for data integration; it aims to determine whether a pair of records from one or multiple datasets refer to the same real-world entity. With growing complexity and diversity in record structures, traditional ER methods rely on coarse-grained features, making it difficult to delve into subtle semantic associations and differences between records, which in turn affects model performance. Furthermore, processing each record pair individually also increases computational costs. To overcome the limits of existing methods, we propose DIBER, a novel ER model based on a Siamese network structure and a pre-trained language model (PLM) that generates contextually rich representations of records. DIBER harnesses co-attention to discern inter-record relationships and applies fusion and weighted attention to pinpoint subtle but significant distinctions. It further integrates a feature extractor for extracting fine-grained and pivotal matching information, complementing the global context furnished by the PLM. This results in richer, more discriminative entity representations. It can also be flexibly applied to blocking. Extensive experiments are conducted on benchmark datasets and compared with state-of-the-art (SOTA) methods, showing superior performance on small-scale datasets without injecting specific domain knowledge.
Keyword: entity resolution, matching, deep interaction, fine-grained information, blocking
Cite
@inproceedings{ICIC_2024,
author = {Huiting Yuan and Liang Zhu and Yu Wang and Zhouyan Liu},
title = {Entity Resolution with Deep Interactions and Fine-Grained Difference Extraction based on BERT},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {715-732},
note = {Poster Volume Ⅱ}
}
-
Clustering-based Self-Supervised Multi-Scale Generative Adversarial Network for Data Imputation,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yi Xu, Xuhui Xing, Anchi Chen, and Yang Liu
Abstract: Missing data has always been a challenging issue in machine learning. The Generative Adversarial Imputation Network (GAIN) has been proven to be superior to many existing solutions. However, GAIN suffers from two limitations: first, it does not consider the correlations among input samples; second, it only imputes based on the adversarial loss and the reconstruction loss of non-missing values, without considering the reconstruction loss of missing values. To address these issues, this paper proposes CCGAIN, a clustering-based self-supervised multi-scale Generative Adversarial Network method for data imputation. Firstly, the dataset to be imputed is clustered, and subsequent imputation is performed on samples within each cluster. Then, based on features with low missing rates, local-scale data is constructed for each cluster. Next, we use the imputation results of local-scale missing values as supervised information for global-scale missing value imputation, constructing the reconstruction loss for global-scale missing values. Finally, based on the reconstruction loss of missing values, the reconstruction loss of non-missing values, and the adversarial loss, imputation is performed at the global scale. Experimental results demonstrate the effectiveness of this method.
Keyword: Missing Data, Generative Adversarial Networks, Clustering
Cite
@inproceedings{ICIC_2024,
author = {Yi Xu and Xuhui Xing and Anchi Chen and Yang Liu},
title = {Clustering-based Self-Supervised Multi-Scale Generative Adversarial Network for Data Imputation},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {341-358},
note = {Poster Volume Ⅱ}
}
-
STASG: a Novel Traffic Prediction Model Based on Spatial-Temporal Attention Simple Graph Neural Network,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Xiujuan Xu, Jiaxin Ai, Renjie Liu, Xiaowei Zhao, and Yu Liu
Abstract: The growth of the autonomous driving industry in recent years has spurred research on intelligent transportation systems. However, predicting long-term traffic patterns is a complex task that can lead to overfitting and fluctuations in model predictions. To address these challenges, this paper proposes a spatio-temporal modeling approach that captures both the spatial and temporal features of traffic data. The method fuses these features using a gated fusion mechanism and then applies feedforward neural networks to transform the spatio-temporal data into predictions for future time steps. To mitigate overfitting, the paper introduces a novel loss function called the mean loss function. By minimizing fluctuations in model predictions, this approach aims to improve the accuracy of long-term traffic forecasts. Overall, this paper presents a promising approach to improving the performance of intelligent transportation systems, particularly in the area of long-term traffic prediction. The proposed method combines several techniques, including spatio-temporal modeling, neural networks, and a new loss function, to address the challenges of overfitting and prediction fluctuations. In multiple experiments on the publicly available transportation network datasets METR-LA and PEMS-Bay, our proposed model demonstrated improved performance in long-term traffic flow prediction.
Keyword: Traffic prediction, Gated attention unit, Mean Value loss, Graph neural network
Cite
@inproceedings{ICIC_2024,
author = {Xiujuan Xu and Jiaxin Ai and Renjie Liu and Xiaowei Zhao and Yu Liu},
title = {STASG: a Novel Traffic Prediction Model Based on Spatial-Temporal Attention Simple Graph Neural Network},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {937-954},
note = {Poster Volume Ⅰ}
}
-
BankCARE: Advancing Bank Services with Enhanced LLM and Retrieval Generation,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Deyu Chen and Xiaofeng Zhang
Abstract: In the context of the rapid development of fintech and the online transformation of banking, this study explores a retrieval-augmented generation (RAG)-based approach aiming to enhance the efficiency and quality of bank customer service. Considering the challenges faced by existing bank customer service systems in handling complex requirements and personalised solutions, especially data privacy protection and dialogue system intelligence, this paper proposes a novel LangChain-based RAG framework. This framework performs vector indexing and similarity search by integrating FAISS, and employs multiple embedding models for data processing and chunking to accurately and efficiently capture and respond to customer needs. By processing external information in real time, this method is able to adjust the response to the specific context of the query, improving the accuracy and adaptability of the response. Validation on a real bank customer service dataset demonstrates the advantages of this method over existing techniques in improving response speed and quality, significantly enhancing customer satisfaction and contributing to the development of the FinTech field.
Keyword: Fintech development, LangChain-based framework, Similarity search
Cite
@inproceedings{ICIC_2024,
author = {Deyu Chen and Xiaofeng Zhang},
title = {BankCARE: Advancing Bank Services with Enhanced LLM and Retrieval Generation},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {955-967},
note = {Poster Volume Ⅰ}
}
-
Domain Similarity-Perceived Label Assignment for Domain Generalized Underwater Object Detection,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Xisheng Li, Wei Li, Pinhao Song, Mingjun Zhang, and Guifang Sun
Abstract: The inherent characteristics and light fluctuations of water bodies give rise to huge differences between layers and regions in underwater environments. When the test set is collected in a different marine area from the training set, the issue of domain shift emerges, significantly compromising the model's ability to generalize. The Domain Adversarial Learning (DAL) training strategy has previously been used to tackle such challenges. However, DAL heavily depends on manually assigned one-hot domain labels, which imply no difference among the samples in the same domain. Such an assumption results in the instability of DAL. This paper introduces the concept of Domain Similarity-Perceived Label Assignment (DSP). The domain label for each image is regarded as its similarity to the specified domains. Through domain-specific data augmentation techniques, we achieved state-of-the-art results on the underwater cross-domain object detection benchmark S-UODAC2020. Furthermore, we validated the effectiveness of our method on the Cityscapes dataset.
Keyword: Domain adversarial learning, underwater object detection, pseudo domain label
Cite
@inproceedings{ICIC_2024,
author = {Xisheng Li and Wei Li and Pinhao Song and Mingjun Zhang and Guifang Sun},
title = {Domain Similarity-Perceived Label Assignment for Domain Generalized Underwater Object Detection},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {607-621},
note = {Poster Volume Ⅰ}
}
-
Breeding Strategies Generation and Crop Genetic Enhancement Based on Generative Adversarial Networks,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yanqing Song, Jianguo Chen, Ning Liu, Wei Zhang, Yunxian Chi, Xiaoyu Yang, Xiaozhen Hou, Zhihua Ding, Lei Guo, and Long Chen
Abstract: This study proposes a novel Generative Adversarial Networks (GANs) framework specifically designed for crop breeding, aimed at augmenting crop genetic information and formulating efficient breeding strategies. It addresses the pivotal scientific question of how to enhance the diversity and quality of crop genetic information and develop efficient breeding strategies to elevate the efficacy and success rates of crop breeding initiatives. Through an in-depth examination of GANs-based methodologies for the enhancement of crop genetic information, this research aims to simulate and generate crop genetic data characterized by elevated genetic diversity, thereby significantly expanding the genetic resource pool and offering a wider array of genetic materials for breeding purposes. Specifically, by simulating rare or inadequately explored genetic variations, GANs hold the potential to unveil novel genetic traits and characteristics, thus opening new avenues for crop enhancement. Moreover, this study will leverage the augmented genetic information to refine breeding strategies through GANs models, encompassing not only the optimization of hybrid combinations but also the prediction of how environmental factors and management practices affect the expression of crop traits. In essence, this research aspires to provide scientific and precise decision-making support for breeders, markedly enhancing the success rate and efficiency of breeding programs.
Keyword: Generative Adversarial Networks, Crop Genetic Enhancement, Breeding Strategy Generative
Cite
@inproceedings{ICIC_2024,
author = {Yanqing Song and Jianguo Chen and Ning Liu and Wei Zhang and Yunxian Chi and Xiaoyu Yang and Xiaozhen Hou and Zhihua Ding and Lei Guo and Long Chen},
title = {Breeding Strategies Generation and Crop Genetic Enhancement Based on Generative Adversarial Networks},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {786-797},
note = {Poster Volume Ⅱ}
}
-
Hierarchical Label Auto-Labeling and Relationship Constraints for Multi-Granularity Image Classification,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Changwang Mei, Xindong You, Shangzhi Teng, and Xueqiang LYU
Abstract: Hierarchical multi-granularity classification (HMC) aims to assign each object a label with multiple granularities from coarse to fine, focusing on the hierarchical structure of the label encoding. However, obtaining multi-granularity image labels through extensive manual labeling by experts is both costly and impractical for large-scale fine-grained visual classification (FGVC) datasets and new scenarios. In this paper, we propose a hierarchical label auto-labeling clustering algorithm (HLA) to automatically generate hierarchical multi-granularity image labels. Additionally, we introduce a hierarchical constraint loss (HCL) and propose a hierarchical prediction constraint loss (HPCL) to constrain the relationship between different hierarchies. Extensive experiments on three commonly used FGVC datasets, CUB-200-2011, FGVC-Aircraft, and Stanford Cars, demonstrate that the proposed HLA achieves performance similar to that of manual labeling. The introduced HCL and HPCL achieve promising performance on multi-granularity image classification datasets. Meanwhile, the consistent improvement on all object re-identification tasks demonstrates the effectiveness of our method.
Keyword: Hierarchical multi-granularity classification, Fine-grained visual classification, Automatic labeling
Cite
@inproceedings{ICIC_2024,
author = {Changwang Mei and Xindong You and Shangzhi Teng and Xueqiang LYU},
title = {Hierarchical Label Auto-Labeling and Relationship Constraints for Multi-Granularity Image Classification},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {622-636},
note = {Poster Volume Ⅰ}
}
-
Ultra-Sparse Viewpoints Novel View Synthesis via Global Feature-Driven Spatial Structure Comprehension,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Qijun He, Jingfu Yan, Jiahui Li, and Yifeng Li
Abstract: Our research primarily tackles the issue of substantial artifacts and geometric distortions encountered while synthesizing novel views from extremely sparse input views. We discovered that enhancing global features in such sparse conditions helps the network better comprehend the scene's spatial relationships, thereby improving rendering quality. Our methodology is divided into two principal components: the geometric reasoner and the classical neural renderer. The geometric reasoner unfolds in three phases. Initially, our approach emphasizes global feature extraction, utilizing these features to give the network a comprehensive grasp of the scene's overall layout and structure. It particularly focuses on deciphering spatial relationships between different views, facilitating geometric reasoning and the formulation of expressive 3D scene representations. The subsequent fusion stage employs a mechanism akin to the vision transformer to combine features from various input views across multiple scales and levels, thereby enriching the model's understanding of abstract spatial relationships and augmenting the light density attributes of all 3D points. The second component renders the color of any ray passing through the scene using a classic renderer. Experiments on the most popular real-scene forward-facing datasets and synthetic datasets show that our approach achieves state-of-the-art performance and produces richer details and a more complete silhouette structure than previous strong work on novel view synthesis.
Keyword: Transformer, ViT, NeRF, Sparse Views
Cite
@inproceedings{ICIC_2024,
author = {Qijun He and Jingfu Yan and Jiahui Li and Yifeng Li},
title = {Ultra-Sparse Viewpoints Novel View Synthesis via Global Feature-Driven Spatial Structure Comprehension},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {210-221},
note = {Poster Volume Ⅰ}
}
-
Enhancing Sequence Model with Mathematical Reasoning in Symbolic Integration,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Xingqi Lin, Liangyu Chen, Zhengfeng Yang, and Zhenbing Zeng
Abstract: Sequence models have shown their efficiency in tackling integration problems, outperforming traditional mathematical software on specific datasets. However, they also encounter some challenges: robustness against minor perturbations, compositionality for decomposable operations, and out-of-distribution generalization when dealing with larger values, longer problems, and functions not covered in the training set. These issues arise from the fact that integration problems can only be partially regarded as a language translation task, because integration follows its own mathematical rules. To address the above issues, this paper proposes a novel approach that enhances the sequence model with mathematical reasoning. We introduce the abstraction of coefficients, perform expression decomposition, and substitute known functions for unknown counterparts. Our model achieves 83.6% accuracy in integration testing, 100% accuracy in robustness testing, and 100% accuracy on additive composite expressions. Through mathematical rewriting, it also exhibits notable performance in extrapolation beyond the training distribution. Moreover, our model passes the SAGGA test. In general, we obtain a robust symbolic integrator.
Keyword: Sequence Model, Deep Learning, AI mathematics
Cite
@inproceedings{ICIC_2024,
author = {Xingqi Lin and Liangyu Chen and Zhengfeng Yang and Zhenbing Zeng},
title = {Enhancing Sequence Model with Mathematical Reasoning in Symbolic Integration},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {222-235},
note = {Poster Volume Ⅰ}
}
-
Unsupervised Low-light Image Enhancement with Generated Low-light Image Pairs,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yuping Xia, Fan Ji, Xiongxin Tang, and Fanjiang Xu
Abstract: Low-light image enhancement aims to improve the perception of images captured under low-light conditions. Many previous unsupervised methods rely solely on information from a single image and multiple priors for enhancement. However, the information within a single image is limited, and designing suitable priors can be challenging. Although some existing methods have addressed this using paired low-light images, obtaining such images is also difficult. To tackle this problem, we first generate paired low-light images with consistent content, independent noise, and slightly different illumination from a single low-light image. Second, we propose an unsupervised low-light enhancement network based on the paired images. Leveraging consistent image content, we establish mutual constraints between the two images to achieve identical enhancement results. To accomplish this, the Retinex theory is employed to decompose the images into illumination and reflectance components, ensuring consistency in the reflectance components of the two images, which facilitates the preservation of image content and avoids artifacts and color deviations. Moreover, as the images exhibit independent noise, we adopt Noise2Noise for noise removal, ensuring comprehensive denoising without affecting image details, which further improves the quality of the enhanced images. Extensive experiments demonstrate that our approach has superior denoising capability while ensuring enhancement performance, and achieves results comparable to state-of-the-art methods.
Keyword: Low-light image enhancement, Unsupervised, Generated low-light image pairs, Retinex theory, Noise2Noise
Cite
@inproceedings{ICIC_2024,
author = {Yuping Xia and Fan Ji and Xiongxin Tang and Fanjiang Xu},
title = {Unsupervised Low-light Image Enhancement with Generated Low-light Image Pairs},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {637-651},
note = {Poster Volume Ⅰ}
}
-
Design and Implementation of Unity3D-based Image Compression Coding Gamification Teaching System,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Di Fan, Yongfei Wang, Hongyun Liu, Ying Chen, and Aying Wei
Abstract: Some knowledge points of image compression coding are difficult to understand, and learning efficiency is low in traditional teaching. To address this, this work designs and implements a game-based teaching system based on virtual reality technology, oriented toward the basic theoretical knowledge of image compression coding and the experimental operation of image compression coding methods. With the help of the navigation system, particle system, animation system, and other technologies of the Unity3D platform, we create a virtual visualization and scenario-based image compression coding teaching environment, which shows students a more intuitive and three-dimensional image compression operation method than the traditional teaching mode and establishes a multi-dimensional and effective teaching scenario. Students can participate in the operation process of the image compression method independently and seek solutions to problems in the infinite possibilities of the virtual scene, which can fully stimulate students' creativity, improve learning efficiency, and achieve better learning results.
Keyword: Teaching Image Compression Coding, Virtual Reality, Virtual scenario-based teaching, Unity3D
Cite
@inproceedings{ICIC_2024,
author = {Di Fan and Yongfei Wang and Hongyun Liu and Ying Chen and Aying Wei},
title = {Design and Implementation of Unity3D-based Image Compression Coding Gamification Teaching System},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {652-669},
note = {Poster Volume Ⅰ}
}
-
Automatic Epicardial Adipose Tissue Segmentation in Cardiac CT with Position Regularization,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Qinghe Yuan, Qiong Su, Zhenteng Li, Qi Wen, Zhikang Lin, Dajun Chai, and Sheng Lian
Abstract: Cardiovascular diseases (CVDs) are a major global health concern. Epicardial adipose tissue (EAT) has been identified as playing a significant role in the pathogenesis and progression of cardiovascular diseases. While deep learning-based methods have shown promising results in EAT segmentation, they primarily treat EAT as a whole and do not consider the urgent clinical need for fine-grained segmentation at different locations. In this work, we propose a position-aware fine-grained EAT segmentation method that extends existing single-class coarse EAT segmentation to multi-class fine-grained segmentation of RV-, LV-, and PA-EAT. Our method utilizes a two-branch architecture, where one branch specializes in segmentation and the other focuses on precisely positioning the centroids of the various EATs, thereby enhancing model performance for EAT localization and boosting segmentation accuracy. By leveraging prior knowledge of the spatial distributions of different tissues, our method demonstrates favorable performance on a challenging self-collected dataset and a public dataset. The proposed method has the potential to aid in the automatic fine-grained segmentation of EAT, supporting more detailed clinical diagnosis.
Keyword: Epicardial adipose tissue (EAT), segmentation, multi-class, position regularization
Cite
@inproceedings{ICIC_2024,
author = {Qinghe Yuan and Qiong Su and Zhenteng Li and Qi Wen and Zhikang Lin and Dajun Chai and Sheng Lian},
title = {Automatic Epicardial Adipose Tissue Segmentation in Cardiac CT with Position Regularization},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {798-809},
note = {Poster Volume Ⅱ}
}
-
Deep Embedded Subspace Clustering with Hard-Sample Mining,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Li Zou, Tingting Leng, Rui Xie, Jiaxiong Liu, and Jun Zhou
Abstract: In recent years, subspace clustering has received increasing attention for its ability to accurately discover the underlying subspaces in high-dimensional data. Among these approaches, end-to-end subspace clustering methods compute cluster assignments by mapping points to subspaces. However, such methods ignore hard samples when computing the clustering assignment, i.e., low-value samples in the correct clustering assignment and high-value samples in the incorrect clustering assignment. To address this problem, in contrast to previous instance-level hard-sample approaches, we mine hard samples at the subspace cluster level. We first construct a deep embedded subspace clustering framework as the clustering target and learn subspace bases iteratively to obtain clustering assignments. Secondly, we utilize pseudo-supervised information and clustering assignments to mine hard samples at the subspace cluster level. Finally, a weight modulation strategy is proposed to dynamically focus on the hard samples and obtain more accurate subspace clustering assignments. Through extensive experiments, we show that our method outperforms state-of-the-art subspace clustering algorithms on four benchmark datasets.
Keyword: subspace clustering, end to end clustering, hard-sample mining
Cite
@inproceedings{ICIC_2024,
author = {Li Zou and Tingting Leng and Rui Xie and Jiaxiong Liu and Jun Zhou},
title = {Deep Embedded Subspace Clustering with Hard-Sample Mining},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {359-370},
note = {Poster Volume Ⅱ}
}
-
Threat Intelligence Quality Assessment Model Based on ATT&CK Framework for Multiple Application Scenarios,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Guangxiang Dai, Peng Wang, and Pengyi Wu
Abstract: With the increasing severity of cyber threats, cyber threat intelligence (CTI) has become a crucial tool for enhancing cyber security protection. Maximizing the potential value of threat intelligence requires sharing it properly and efficiently. However, the sharing process often faces challenges such as quality assessment. To tackle the problem of quantification in quality assessment, this paper proposes a threat intelligence quality assessment model based on contribution calculation with the ATT&CK framework. We introduce assessment metrics from an event perspective, take TTPs (Tactics, Techniques, and Procedures) and other elements into account, and incorporate specific application scenarios to evaluate threat intelligence so as to provide practical guidance for security practitioners. Finally, we demonstrate the effectiveness, practicality, and high coverage of the model in terms of event-relevant metrics through experimental assessment.
Keyword: Threat Intelligence, Quality Assessment, Contribution Calculation, Intelligence Sharing, Security Application Scenario
Cite
@inproceedings{ICIC_2024,
author = {Guangxiang Dai and Peng Wang and Pengyi Wu},
title = {Threat Intelligence Quality Assessment Model Based on ATT\&CK Framework for Multiple Application Scenarios},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {807-825},
note = {Poster Volume Ⅰ}
}
-
Multi-scale Period-dependent Transformer for Time Series Forecasting,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Jiatian Pi, Chenyue Wang, Kanlun Tan, Xin Wang, and Qiao Liu
Abstract: The periodicity of a time series is useful for improving the performance of forecasting models by revealing long-term trends, seasonal variations and oscillatory phenomena. Existing methods usually use a single-scale periodicity assumption at a certain fixed stage, which is decoupled from the inherent multi-scale and continuous nature of periodicity. This creates a bottleneck in how these methods exploit the periodic property of time series. It limits the ability of the models to explore the underlying periodic information for capturing reliable dependencies and constructing them efficiently in forecasts. To this end, in this paper, we make full use of the multi-scale information of time-series periodicity and construct a continuous periodic relational interaction at multiple stages. By modeling dependencies and feature aggregation at the sub-sequence level, we are able to break the bottleneck of underutilized periodic information. Specifically, we first extract the inherent stationary periodic measurement of sequence data and embed the multi-layer period pattern to model seasonal regularity. Second, to capture long-range periodicity correlation, we propose a novel attention mechanism that performs convergence of representation under the predictive paradigm with efficient sparse filtering based on periodic segments. Third, we implement sequence decomposition at multiple period scales to precisely separate trend and seasonality. Therefore, the intrinsic patterns of the time series can be reasonably deciphered and analyzed separately. Extensive experimental results on five benchmarks show that our method achieves favorable results, especially on strongly periodic data.
Keyword: Time-series forecasting, Periodicity, Transformer
Cite
@inproceedings{ICIC_2024,
author = {Jiatian Pi and Chenyue Wang and Kanlun Tan and Xin Wang and Qiao Liu},
title = {Multi-scale Period-dependent Transformer for Time Series Forecasting},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {968-979},
note = {Poster Volume Ⅰ}
}
-
Meta Weighted Loss: Balanced Scene Graph Generation with Meta-Learning,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yisen Wang, Yang Wang, and Yuxin Deng
Abstract: Unbiased Scene Graph Generation (SGG) is a major development direction of SGG. In recent years, a number of strong approaches have emerged in this field, but many of them tend to overlook a fundamental factor: the study of the loss function. This matters because there is a serious conflict between datasets, the loss function and the target metrics. In most relevant datasets, the frequency of different predicates varies greatly; for example, the common predicate 'on' appears 400 times more frequently than the rare predicate 'walking in'. Yet each predicate has the same weight in the loss function, which does not properly reflect this gap in the datasets. When we evaluate results, we also treat predicates uniformly. We can sum up this conflict with an interesting statement: sometimes fairness means a kind of unfairness. In response to this challenge, we introduce Meta Weighted Loss (MWL), an approach based on meta-learning. MWL leverages meta-learning principles to construct a meta-neural network during model training. This network establishes a rational relationship between the various predicates and their respective weights in the loss function so that the conflict above can be resolved. We verify the effectiveness and generalization of this approach on multiple datasets. Comprehensive experiments demonstrate the superior performance of MWL in SGG.
Keyword: Scene Graph Generation, Meta-Learning
Cite
@inproceedings{ICIC_2024,
author = {Yisen Wang and Yang Wang and Yuxin Deng},
title = {Meta Weighted Loss: Balanced Scene Graph Generation with Meta-Learning},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {119-132},
note = {Poster Volume Ⅱ}
}
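As a rough illustration of the weighting idea described in the abstract above, the sketch below scales the cross-entropy term of each predicate by a per-predicate weight. It is not the authors' method: in MWL the weights come from a meta-learned network, whereas here they are fixed, hand-picked numbers. The function name, the two-predicate setup and the weight values are all hypothetical.

```python
import math


def weighted_cross_entropy(probs, target, class_weights):
    """Cross-entropy for one sample, scaled by the weight of its true class.

    probs         -- predicted probability for each predicate class
    target        -- index of the ground-truth predicate
    class_weights -- one weight per predicate (fixed here; an assumption,
                     since MWL would learn these with a meta-network)
    """
    if not 0.0 < probs[target] <= 1.0:
        raise ValueError("probability of the target class must be in (0, 1]")
    return -class_weights[target] * math.log(probs[target])


# Hypothetical two-predicate example: index 0 = 'on' (common),
# index 1 = 'walking in' (rare). The rare predicate gets a larger weight.
weights = [1.0, 4.0]
probs = [0.9, 0.1]  # the model strongly prefers the common predicate

loss_uniform = weighted_cross_entropy(probs, 1, [1.0, 1.0])
loss_weighted = weighted_cross_entropy(probs, 1, weights)
```

With uniform weights a missed rare predicate costs the same as any other error; with the larger weight the same mistake contributes four times as much to the loss in this toy setting.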
-
Syntax-aware Event Temporal Relation Extraction Using Constraint Graph,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Haijiao Liu, Jie Zhou, Xin Zhou, Fei Hu, Jiaqian Yin, and Xiaodong Wang
Abstract: Extracting temporal relations among events is essential in natural language understanding tasks. When two event mentions are widely separated in a text, the contextual information between them often becomes complicated and the temporal clues are difficult to locate, making it more challenging to infer their temporal relationship. In this paper, we propose a novel approach named Constraint Graph-based and Syntax-aware Event Temporal Relation Extraction (CGSE) to address this issue. Specifically, we build temporal constraint rules from event attributes in databases to obtain prior temporal knowledge. Then we construct constraint graphs based on the temporal constraint rules and present a graph neural network to model the dependencies. To eliminate irrelevant information in complicated contexts, we employ the Shortest Dependency Paths (SDP) between events in syntactic dependency parse trees, while also incorporating more temporal clues into the SDP. After that, we use a graph transformer to learn the representation of the SDP. Finally, a constraint fusion module integrates constraint information and syntactic information to further improve performance. Experiments on two benchmark datasets, MATRES and TB-DENSE, show that our proposed method clearly outperforms previous state-of-the-art approaches in temporal relation extraction.
Keyword: Event temporal relation, Constraint graph, Dependency parse trees
Cite
@inproceedings{ICIC_2024,
author = {Haijiao Liu and Jie Zhou and Xin Zhou and Fei Hu and Jiaqian Yin and Xiaodong Wang},
title = {Syntax-aware Event Temporal Relation Extraction Using Constraint Graph},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {529-546},
note = {Poster Volume Ⅱ}
}
-
Enhancing Accuracy for Metal Target Detection Using CNN-GP Algorithm,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Xiaofen Wang, Xiaotong Zhang, Yadong Wan, and Peng Wang
Abstract: Inversion based on electromagnetic induction (EMI) is an important method for detecting underground metal targets in fields such as archaeology and geological exploration. However, traditional inversion algorithms grounded in the least-squares framework suffer from long iteration times, susceptibility to local optima, and dependence on initial values. To address these challenges and enhance the detection of underground metal targets, this paper proposes a CNN-GP algorithm based on a Convolutional Neural Network (CNN) and a Gaussian Process (GP). The proposed algorithm first extracts discriminative features with the CNN, then applies dimensionality reduction through a multilayer perceptron (MLP) to map the extracted features into low-dimensional vectors, and finally estimates the position of metal targets with the GP. To refine the accuracy of the CNN-GP algorithm, this paper uses grid and Bayesian search algorithms for network optimization. Results demonstrate that the Bayesian search algorithm expeditiously identifies an optimal set of hyperparameters, yielding inversion performance compared with grid search algorithm. Comparative analyses of inversion efficacy among the CNN, GP, MLP, and CNN-GP algorithms before and after optimization reveal CNN-GP as the best performer, with inversion errors of 0.5 cm, 0.5 cm, and 2.4 cm along the x, y, and z directions, respectively.
Keyword: Underground Metal Target Detection, Electromagnetic Induction, Convolutional Neural Networks, Hyperparameter Optimization, Transfer Learning
Cite
@inproceedings{ICIC_2024,
author = {Xiaofen Wang and Xiaotong Zhang and Yadong Wan and Peng Wang},
title = {Enhancing Accuracy for Metal Target Detection Using CNN-GP Algorithm},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {236-254},
note = {Poster Volume Ⅰ}
}
-
RPMF: In-hospital Mortality Risk Prediction based on Multimodal Fusion,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Changtong Ding, Shichao Geng, Quanrun Song, Yalong Liu, Yu Zhao, Xiangwei Zhang, and Lin Wang
Abstract: In predicting mortality and disease risk, the use of deep learning in clinical decision-support technology to analyze structured electronic health records (EHR) has been a highly scrutinized research area. However, despite the abundance of narrative clinical diagnostic records and ICU physiological indicators, we believe there are still two shortcomings in current disease risk prediction efforts. On the one hand, current models fail to fully utilize the available data and lack comprehensive modeling of patient characteristics. On the other hand, existing studies have not effectively captured potential correlations between multimodal data. In this paper, we introduced a pioneering design concept based on the initial health state of the patient, which treats the patient's current health status, characterized by disease information, as a key element in sequence modeling. In addition, we adopted the Informer model for processing time-series data of physiological indicators of ICU patients. More critically, we developed a multimodal feature interaction module that captures the interrelationships between different data modalities. Extensive experiments on real-world datasets show that our model significantly outperforms existing models, validating the efficiency of our proposed model.
Keyword: Machine learning, Electronic health records, Initial health status, In-hospital mortality risk prediction, Multimodal fusion
Cite
@inproceedings{ICIC_2024,
author = {Changtong Ding and Shichao Geng and Quanrun Song and Yalong Liu and Yu Zhao and Xiangwei Zhang and Lin Wang},
title = {RPMF: In-hospital Mortality Risk Prediction based on Multimodal Fusion},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {547-563},
note = {Poster Volume Ⅱ}
}
-
LLM-driven Interactive document classification through Keyword Feedback,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Boan Yu, Mei Wang, Dehua Chen, Qiao Pan, and Yunhua Wen
Abstract: Document classification offers a concise comprehension of document content, which is crucial for document organization and management in real applications. However, practical scenarios pose challenges due to limited annotated data and dynamic changes in document categories. In this paper, we propose an LLM-driven interactive document classification framework based on keyword feedback, which operates with minimal input: just the documents to be classified. We achieve this by first introducing an unsupervised-learning-based document classification framework. Then a keyword interaction process is designed to iteratively enhance the classifier's performance. Representative keyword explanations are generated in each iteration, which capture the most significant features or characteristics within each category. Crucially, an LLM feedback module is designed for interaction; it offers category descriptions and keyword feedback, facilitating seamless cooperation to enhance classification performance. Experimental results on benchmark datasets demonstrate that, with only a few feedback iterations, our framework significantly improves classifier accuracy compared with methods lacking feedback.
Keyword: Interactive document classification, keyword feedback, LLM
Cite
@inproceedings{ICIC_2024,
author = {Boan Yu and Mei Wang and Dehua Chen and Qiao Pan and Yunhua Wen},
title = {LLM-driven Interactive document classification through Keyword Feedback},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {371-381},
note = {Poster Volume Ⅱ}
}
-
The Design of a Deep Learning-based Adaptive Multi-Channel Fusion Network for Diabetes Diagnosis,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Peng Xia, Qi Qi, Yucong Duan, Ni Li, and Xinying Wang
Abstract: Accurate diagnosis of diabetes is crucial for effective health management of patients. Recent advances in machine learning have shown promising predictive results in diabetes diagnosis. In this paper, we developed an Adaptive Multi-Channel Fusion Network (AMCFN). Specifically, we defined a feature enhancement module that combines attention mechanisms to adaptively enhance input data. Meanwhile, we designed a multi-channel fusion network capable of simultaneously extracting various deep features, including temporal and nonlinear features, from the input data. Extensive experiments were conducted on the Pima Indian Diabetes Dataset (PIDD) and the Early-Stage Diabetes Risk Prediction Dataset (ESDRPD). Our model achieved high predictive accuracies of 95.83% and 99.6%, respectively. These results outperformed existing baseline models in diabetes diagnosis. Ablation experiments confirmed the contribution of the feature enhancement module and the multi-channel fusion network. Finally, we analyzed the prediction process of AMCFN using SHapley Additive exPlanations (SHAP). The analysis shows the importance ranking of each feature to the model output in different channels, and the importance ranking of each channel to the final diabetes diagnosis. This enhances the interpretability of AMCFN and validates the effectiveness of the multi-channel design. Our model demonstrates potential in diabetes diagnosis and is expected to increase end-user trust and confidence in early detection of diabetes.
Keyword: Diabetes diagnosis, Adaptive Multi-Channel Fusion Network, Feature enhancement, SHapley Additive exPlanations
Cite
@inproceedings{ICIC_2024,
author = {Peng Xia and Qi Qi and Yucong Duan and Ni Li and Xinying Wang},
title = {The Design of a Deep Learning-based Adaptive Multi-Channel Fusion Network for Diabetes Diagnosis},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {255-275},
note = {Poster Volume Ⅰ}
}
-
Relation-aware Subgraph Graph Neural Network for Modeling Document Relevance,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Zhenxiang Sun, Runyuan Sun, Zhifeng Liang, Bo Liu, Hui Liu, Zhenyu Li, and Hao Yuan
Abstract: In the information age, the exponential growth in the volume of document data makes it challenging to retrieve information quickly and accurately. Traditional keyword-based retrieval methods have limitations and cannot effectively capture the semantic information of a query, leading to irrelevant retrieval results. To improve retrieval accuracy, researchers have started to use knowledge graph (KG) tools to enhance the matching of document retrieval results. However, direct retrieval using graph structures is limited by exponential complexity and the inability to model distantly related documents. To solve this problem, we propose a new information retrieval model, SGDR (Subgraph Neural Network-based Graph Representation and Document Retrieval), which uses relational subgraph neural networks to deeply mine the structural information and semantic associations in the document KG. SGDR models the relevance of documents mainly from semantic relations and local structure in the KG. The experimental results show that the SGDR model outperforms several baseline models on the DocIR dataset, including significant improvements in the key performance metric AUC. The effectiveness of each module in the model is verified through ablation experiments, and the results emphasize the importance of initializing a deep representation of the document knowledge graph.
Keyword: Government Entity Recognition, Multi-feature Fusion, Multi-headed Attention Mechanism, Semantic Representation
Cite
@inproceedings{ICIC_2024,
author = {Zhenxiang Sun and Runyuan Sun and Zhifeng Liang and Bo Liu and Hui Liu and Zhenyu Li and Hao Yuan},
title = {Relation-aware Subgraph Graph Neural Network for Modeling Document Relevance},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {733-744},
note = {Poster Volume Ⅱ}
}
-
A Study on Explainable Inference Prediction of Diabetes Complications Based on Medical Knowledge Graph,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Shouqiang Liu, Mingyue Jiang, and Linying Su
Abstract: This paper presents a LightGBM-SMOTE-ENN model that uses a medical knowledge graph to predict diabetes complications with improved accuracy and interpretability. In response to the public health challenge posed by diabetes, this research utilizes advanced AI to analyze medical data, integrating patient information with symptom vectors from the knowledge graph to develop a reliable classification tool. The model's effectiveness is demonstrated through superior performance metrics such as accuracy, recall, and F1 score, attributed to a SHAP value-based method for interpretability. Future directions include expanding the knowledge graph and optimizing algorithms for broader application. This work not only advances diabetes complication prediction but also leverages medical knowledge graphs for clinical support, aiming to enhance healthcare quality and patient outcomes.
Keyword: Medical knowledge graph, Knowledge graph construction, Diagnostic reasoning for diabetic complications, Interpretability
Cite
@inproceedings{ICIC_2024,
author = {Shouqiang Liu and Mingyue Jiang and Linying Su},
title = {A Study on Explainable Inference Prediction of Diabetes Complications Based on Medical Knowledge Graph},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {810-826},
note = {Poster Volume Ⅱ}
}
-
An Innovative Zero-Shot Inference Approach Based on Deep Learning,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Zhuo Lei, Wei Li, Xiangwei Zhang, Qiang Yu, Lidan Shou, Shengquan Li, and Yunqing Mao
Abstract: We present a novel multi-modal zero-shot inference framework for urban management applications, particularly in retail environments. The deep learning model fuses multi-scale CNN-based object detection with self-attention mechanisms to enhance the identification of unauthorized activities and complex categorization tasks in fixed-point surveillance scenarios. Innovative components include lightweight channel aggregation modules that reduce high-dimensional representations, while intermediate interactions are captured through multi-stage gate aggregation. Spatial aggregation extracts context-aware multi-level features, addressing limitations of traditional DNNs. Attention down-sampling is integrated to address computational challenges when applying Transformers to high-resolution imagery. Multi-modal learning bypasses explicit class labels by training directly on raw text-image pairs using contrastive learning. This enables the model to learn from natural language supervision and perform zero-shot recognition across unseen categories. We obtain state-of-the-art performance on both a public dataset and our own urban management dataset.
Keyword: Zero-shot Learning, Multi-modal, Object Detection, Transformer
Cite
@inproceedings{ICIC_2024,
author = {Zhuo Lei and Wei Li and Xiangwei Zhang and Qiang Yu and Lidan Shou and Shengquan Li and Yunqing Mao},
title = {An Innovative Zero-Shot Inference Approach Based on Deep Learning},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {133-146},
note = {Poster Volume Ⅱ}
}
-
Mixed Feature Processing Model for Few-Shot Object Detection,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Qian Jiang, Shuting Li, Dakai Sun, Yu Wang, Biaohua Liu, Shengfa Miao, and Xin Jin
Abstract: Traditional object detection methods typically require large-scale annotated training data. However, in some areas, acquiring a large amount of annotated data can be extremely challenging. To address Few-Shot Object Detection (FSOD), researchers have introduced the concept of meta-learning. Currently, meta-learning is widely applied in two-stage object detection. We have identified several key issues affecting the accuracy of FSOD, including limited data, insufficient feature extraction capabilities, and the aggregation method between different features. To extract features more finely and aggregate them better, we separate the support branch and query branch of Meta-RCNN, forming two parallel branches. We create a mixed feature processing model for few-shot object detection: we put the Feature Pyramid Network (FPN) only into the backbone network of the query branch, creating a strong baseline that enhances the extraction capabilities for images of different dimensions. Additionally, for the first time in FSOD, we use a Variational Autoencoder (VAE) model to extract features. Adding the VAE to the support branch achieves data augmentation and improves the generalisation ability of the network by obtaining more useful information from the support set. In addition, we design a module $R$ to aggregate the output support image features with the query image features on the query branch; the aggregated results are fed into the detection head. Experimental results demonstrate that the proposed method exhibits good performance. Following the experimental settings for FSOD, we conducted extensive experiments on the PASCAL VOC dataset, showing that our method is superior to other currently available methods and achieves very satisfactory results.
Keyword: Few-Shot Object Detection, Feature Pyramid Network, Variational Autoencoder
Cite
@inproceedings{ICIC_2024,
author = {Qian Jiang and Shuting Li and Dakai Sun and Yu Wang and Biaohua Liu and Shengfa Miao and Xin Jin},
title = {Mixed Feature Processing Model for Few-Shot Object Detection},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {147-160},
note = {Poster Volume Ⅱ}
}
-
CE-TransUnet: A Convolutional Enhanced Model for Pulmonary Alveolus Pathology Image Segmentation,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yongkun Chen, Yu Qiu, Jierui Liu, Shiming Zha, Huayi He, and Zheng Li
Abstract: Pulmonary alveolus segmentation plays an important role in the diagnosis of alveolar emphysema and lobar pneumonia. Moreover, with precise segmentation, the area of tumor beds in non-small-cell lung cancers can be readily calculated, which helps determine the severity of a patient's cancer. These factors make the segmentation of alveolar pathological images highly meaningful. However, we have not identified any publicly available dataset of alveolar pathological images, nor any existing methods focused on alveolus segmentation. Therefore, we introduce our original Pulmonary Alveolus Pathology Image dataset (PAPI). Additionally, widely used and several state-of-the-art medical segmentation methods perform passably on PAPI but below expectations. We therefore propose the Convolutional Enhanced Transformer-based U-net (abbreviated as CE-TransUnet), which combines an improved U-net structure with our CE-Transformer block. We carefully identify salient characteristics of the pulmonary alveolus and make corresponding improvements in both the CE-Transformer blocks and the U-net structure. Our experimental results show that these adjustments enable our model to surpass current common segmentation models on PAPI and reach a Dice score of 95.31. We are also exploring the robustness of our model to adapt it to a wider range of scenarios.
Keyword: semantic segmentation, computer vision, pulmonary alveolus, medical digital pathology image, transformer
Cite
@inproceedings{ICIC_2024,
author = {Yongkun Chen and Yu Qiu and Jierui Liu and Shiming Zha and Huayi He and Zheng Li},
title = {CE-TransUnet: A Convolutional Enhanced Model for Pulmonary Alveolus Pathology Image Segmentation},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {382-399},
note = {Poster Volume Ⅱ}
}
-
PELMo: Prompt-based Ensemble Expert Language Models with Multi-label Routing,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Jiansheng Wang, Jian Zhang, Yuzhi Mu, Wei Han, Xuefan Xu, Junyu Shen, and Tianming Ma
Abstract: Large language models (LLMs) use few-shot and zero-shot prompting to better support tasks across multiple domains. Despite LLMs' strong performance on a wide range of natural language tasks, a single LLM often struggles to generalize to multiple domains that require different knowledge and abilities. To overcome this problem, we introduce PELMo, an ensemble framework designed to attain consistently superior performance by leveraging the diverse strengths of multiple language expert models. Our method combines multiple expert models by training an additional routing model. First, by optimizing prompts with instructions for different tasks, we obtain expert models with different task capabilities based on the same backbone. Afterwards, a multi-label routing model is trained to strategically select the k top-ranked expert models for each question. Finally, the final-layer outputs of the selected expert models are combined through weighted averaging to generate the ultimate answer. Our results demonstrate that PELMo outperforms the expert models within the target domain and achieves robust capabilities across the whole scope of tasks. Overall, these results demonstrate the benefits of ensembling the k top-ranked expert models during language modeling.
Keyword: large language models, expert models, prompt, ensemble
Cite
@inproceedings{ICIC_2024,
author = {Jiansheng Wang and Jian Zhang and Yuzhi Mu and Wei Han and Xuefan Xu and Junyu Shen and Tianming Ma},
title = {PELMo: Prompt-based Ensemble Expert Language Models with Multi-label Routing},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {745-759},
note = {Poster Volume Ⅱ}
}
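The routing-then-averaging step summarized in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PELMo implementation: the abstract does not say how the averaging weights are obtained, so the sketch assumes they are the router scores renormalised over the selected experts, and it treats each expert's final-layer output as a plain list of floats. The function name and the sample values are hypothetical.

```python
def ensemble_top_k(router_scores, expert_outputs, k):
    """Select the k highest-scoring experts and average their outputs.

    router_scores  -- one non-negative score per expert (assumed to come
                      from a routing model)
    expert_outputs -- one equal-length output vector per expert
    k              -- number of experts to keep

    Returns (selected expert indices, weighted-average output vector).
    Weights are the selected scores divided by their sum (an assumption).
    """
    if len(router_scores) != len(expert_outputs):
        raise ValueError("need exactly one score per expert")
    if not 1 <= k <= len(router_scores):
        raise ValueError("k must be between 1 and the number of experts")

    # Expert indices sorted by descending router score, truncated to k.
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    selected = ranked[:k]

    total = sum(router_scores[i] for i in selected)
    if total <= 0:
        raise ValueError("selected scores must sum to a positive value")

    dim = len(expert_outputs[selected[0]])
    combined = [0.0] * dim
    for i in selected:
        weight = router_scores[i] / total
        for j in range(dim):
            combined[j] += weight * expert_outputs[i][j]
    return selected, combined


# Hypothetical example with three experts and two-dimensional outputs.
scores = [0.1, 0.6, 0.3]
outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
selected, combined = ensemble_top_k(scores, outputs, k=2)
```

Here experts 1 and 2 are kept with weights 2/3 and 1/3, so the lowest-scoring expert does not influence the combined output at all.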
-
Cross-relational attention mechanism-driven graph neural network and Apriori algorithm are used for knowledge reasoning in knowledge graphs,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Kuang Wei, Yifei Wei, Huimei Wen, and Yuhong Su
Abstract: Knowledge reasoning for knowledge graphs refers to predicting unknown relationships in knowledge graphs to achieve automatic completion and expansion of knowledge. For the task of link prediction in knowledge graphs, this paper proposes an improved link prediction algorithm based on relational graph neural networks and a cross-relation attention mechanism, and integrates the Apriori association rule mining algorithm. The cross-relation attention mechanism enables information transfer across multiple relationships between nodes, improving the performance of graph neural networks. Using the Apriori association rule mining algorithm for data preprocessing can filter out much of the useless information in the inference input, improving the quality of the inference results. Finally, this model was compared with GCN, GAT, and R-GCN on two datasets, FB15K-237 and WN18, and the effectiveness of the proposed method was demonstrated. When training with a subgraph size of 80,000, the model achieves an MRR of 0.2753 on the FB15K-237 dataset and 0.9054 on the WN18 dataset.
Keyword: Link prediction, Cross-relation attention mechanism, Apriori algorithm
Cite
@inproceedings{ICIC_2024,
author = {Kuang Wei and Yifei Wei and Huimei Wen and Yuhong Su},
title = {Cross-relational attention mechanism-driven graph neural network and Apriori algorithm are used for knowledge reasoning in knowledge graphs},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {564-578},
note = {Poster Volume Ⅱ}
}
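For readers unfamiliar with the MRR figures quoted in the abstract above, mean reciprocal rank averages 1/rank of the correct entity over all test queries, so a value near 1 means the correct answer is usually ranked first. The sketch below is a generic illustration of the metric, not the authors' evaluation code; the function name and the sample ranks are hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """Return the mean of 1/rank over all queries.

    ranks -- 1-based rank of the correct entity for each test query
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks for three link-prediction queries: the correct
# entity was ranked 1st, 2nd and 4th among the candidates.
example_ranks = [1, 2, 4]
mrr = mean_reciprocal_rank(example_ranks)  # (1 + 1/2 + 1/4) / 3
```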
-
Learning to Solve Vehicle Routing Problems with Soft Time Windows via Collaborative Transformer,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Zengjian Yang, Junqing Li, and Xiaolong Chen
Abstract: Over the past several years, there has been rapid progress in harnessing advanced deep reinforcement learning techniques to address challenges including, but not limited to, the traveling salesperson problem (TSP) and the vehicle routing problem (VRP). However, the effectiveness of existing deep architectures for the vehicle routing problem with soft time windows (VRPSTW) is compromised by their integration of node and positional information into a single unified representation. In this article, we design a novel Collaborative Transformer (CT) framework based on a deep reinforcement learning architecture that learns the node features (e.g., locations, time windows) and positional features separately to avoid incompatible correlations, so as to improve the learning ability. During training, we leverage the Proximal Policy Optimization (PPO) algorithm to update the parameters of the model. The CT architecture serves as the policy network in the PPO framework. In tests on three datasets with 20, 50 and 100 customer points respectively, experiments show that our method outperforms existing DRL architectures, showcasing its effectiveness in solving the given task.
Keyword: Vehicle routing problem with soft time windows, Transformer, Deep reinforcement learning
Cite
@inproceedings{ICIC_2024,
author = {Zengjian Yang and Junqing Li and Xiaolong Chen},
title = {Learning to Solve Vehicle Routing Problems with Soft Time Windows via Collaborative Transformer},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {400-411},
note = {Poster Volume Ⅱ}
}
-
CFMT: A Music Transcription Model using Conformer Architecture,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yulong Wang, Hailong Yu, Fengchi Sun, and Jianyu Zhou
Abstract: Automatic music transcription is a fundamental process for converting audio recordings of musical compositions into symbolic representations. This study extends the current state of the art in automatic music transcription with a specific focus on leveraging Conformer models, known for their exceptional performance in speech recognition. Furthermore, this research introduces an approach designed to address the longstanding challenge of missing note-end events, which has previously hindered the accurate evaluation of Seq2Seq models using frame-wise metrics. Empirical findings reveal that a slightly modified Conformer model surpasses existing models across a spectrum of evaluation metrics, even outperforming models trained on distinct iterations of the MAESTRO dataset. Notably, this research contributes to the enhancement of frame-wise evaluation metrics for Seq2Seq models by providing estimations of possible note lengths for ongoing musical notes, resulting in a substantial improvement in evaluation accuracy.
Keyword: Automatic music transcription, Conformer, Note events
Cite
@inproceedings{ICIC_2024,
author = {Yulong Wang and Hailong Yu and Fengchi Sun and Jianyu Zhou},
title = {CFMT: A Music Transcription Model using Conformer Architecture},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {412-427},
note = {Poster Volume Ⅱ}
}
-
Feature Fusion Network for Skeleton-based Action Recognition,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Wei Guo, Haonan Ma, Zikai Li, and Jianwen Chen
Abstract: With the increasing demand for intelligence in human-computer interaction, security monitoring, intelligent nursing, and sports analysis, the development of human skeletal behavior recognition technology has attracted more attention. However, current methods based on human skeleton recognition face a trade-off between model complexity and accuracy, and struggle to comprehensively extract the required features. To address these challenges, this study proposes a Feature Fusion Network (FFN) with ST-GCN as the backbone network, which achieves comprehensive and detailed feature extraction through multiple feature fusion operations within the network. The parameter count of FFN is only 3.35 million. It achieves accuracies of 94.24% and 98.30% on the 2D skeletal data of the NTU RGB+D 60 dataset using the cross-subject and cross-view partition criteria, respectively. On the NTU RGB+D 120 dataset, it achieves accuracies of 87.31% and 90.95% using the cross-subject and cross-setup partition criteria, respectively, representing state-of-the-art performance in the field of deep learning for skeleton action recognition.
Keyword: Graph convolution, Skeletal action recognition, Feature fusion
Cite
@inproceedings{ICIC_2024,
author = {Wei Guo and Haonan Ma and Zikai Li and Jianwen Chen},
title = {Feature Fusion Network for Skeleton-based Action Recognition},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {276-289},
note = {Poster Volume Ⅰ}
}
-
Face Age Estimation with Multi-feature Fusion Model,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Van Anh Nguyen
Abstract: Age is one of the most important human attributes, so the task of extracting age features from face images has received extensive attention and attracted many researchers. How a human face changes with age is affected by factors such as gender, race, and environment, so the task is of great significance and also brings challenges. In recent years, many researchers have used deep learning techniques to solve the task of face age estimation. One approach uses the VGG16 model to extract features and then uses a classifier to estimate the age. However, the disadvantage of this model is that it uses a large number of parameters and a deep network, which makes it slow. Other researchers use a rank-consistent ordinal regression method, using the ResNet34 structure to extract features and then combining the two-category extension method to perform age prediction. This model achieves better results than the previous ordinal regression network on the UTKFace dataset, but the MAE value is still large. To overcome the aforementioned shortcomings and improve accuracy, we introduce a composite model that leverages various types of features, known as TransCNNFusion. The TransCNNFusion model combines the feature extraction abilities of the attention mechanism with the local facial feature extraction of a CNN. Experimental results demonstrate that the proposed model is as effective as or even superior to other Vision Transformer and CNN models, indicating its potential for practical applications.
Keyword: Estimate age, Global feature, Attention, Local feature, CNN
Cite
@inproceedings{ICIC_2024,
author = {Van Anh Nguyen},
title = {Face Age Estimation with Multi-feature Fusion Model},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {317-328},
note = {Poster Volume Ⅰ}
}
-
YOLORG: A multi-scale intestinal Organoid detection algorithm,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Zhipeng Shang, Xun Deng, Tianyu Sun, Feng Tan, Lun Hu, Xi Zhou, and Pengwei Hu
Abstract: Intestinal organoids have shown great research value in areas such as drug screening and disease modeling due to their unique physiological properties. However, the morphological diversity of organoids makes the accurate acquisition of morphological information both critical and challenging. Although conventional fluorescent labeling assays can provide a certain degree of morphological information, the potential risk of compromising the integrity of organoids cannot be ignored. Traditional bounding box detection methods are not capable of capturing details when dealing with the complex and variable morphology of intestinal organoids. Meanwhile, the huge size of gut organoid image datasets and the subjective, time-consuming nature of manual classification make it difficult to meet the research demand for high efficiency. Although some deep learning methods have made significant progress in the field of image processing, they still face great challenges in dealing with complex structures such as organoids, which have significant shape and size heterogeneity. This paper proposes an intestinal organoid detection method, YOLORG. YOLORG employs a multi-scale feature extraction module to fuse the multi-scale attributes of organoid specimens. This method effectively eliminates background interference and image noise, thus improving the accuracy and robustness of organoid detection.
Keyword: Intestinal Organoids, Detection, Multi-scale
Cite
@inproceedings{ICIC_2024,
author = {Zhipeng Shang and Xun Deng and Tianyu Sun and Feng Tan and Lun Hu and Xi Zhou and Pengwei Hu},
title = {YOLORG: A multi-scale intestinal Organoid detection algorithm},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {980-991},
note = {Poster Volume Ⅰ}
}
-
Enhancing Container Damage Detection with improved YOLOv5 Model: Integrating Swin Transformer,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Jiahao Chen, Chen Dong, and Yuxuan Wan
Abstract: To improve the safety of port logistics transportation, container damage detection is critical. Container damage is diverse and includes small-scale object damage (e.g., holes, dents, scratches). Traditional object detection algorithms used for container damage detection suffer from low accuracy and high miss rates for small-scale objects. This paper proposes an improvement to the YOLOv5 model based on the Transformer self-attention mechanism for container damage detection. To effectively capture global and long-range relationships in damage images, two layers of Swin Transformer blocks are added to the backbone network of YOLOv5. The PANet in the YOLOv5 neck is replaced with BiFPN, which enhances the ability to fuse multi-scale features in damage images while reducing computational complexity and information loss. Furthermore, the Focaler-IoU loss function is used to improve the balance of features extracted from different samples in the dataset. The training set is clustered using the KMeans algorithm to obtain 9 initial anchor boxes more suitable for the container damage dataset. Experimental results on the COCO and Tianjin Port official container damage datasets validate that the improved model achieves an mAP of 95.4%, outperforming common object detection algorithms such as Fast-RCNN and YOLOv5.
Keyword: Container damage detection, Improved YOLOv5, Transformer, BiFPN
Cite
@inproceedings{ICIC_2024,
author = {Jiahao Chen and Chen Dong and Yuxuan Wan},
title = {Enhancing Container Damage Detection with improved YOLOv5 Model: Integrating Swin Transformer},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {670-685},
note = {Poster Volume Ⅰ}
}
-
The loss model with class variance for fine-grained classification,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Qian Long, Bolun Zhu, Gaihua Wang, and Hongwei Qu
Abstract: We propose a loss model with class variance for fine-grained image classification. It adopts a basic convolutional neural network to extract features. The data from the dataset are shuffled and selected as inputs according to the batch size, and their outputs are processed by an attention model. Because the variance within the same class is smaller and the variance between different classes is larger, we use class variance to define the loss function in the training phase. The total loss model combines the class-variance loss function with the label loss function, and both are jointly employed to speed up convergence. Compared with state-of-the-art methods, experimental results demonstrate that our model has better performance.
Keyword: Class variance, Fine-grained, Image classification, Loss function
Cite
@inproceedings{ICIC_2024,
author = {Qian Long and Bolun Zhu and Gaihua Wang and Hongwei Qu},
title = {The loss model with class variance for fine-grained classification},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {290-299},
note = {Poster Volume Ⅰ}
}
-
Traffic Classification over Tor Network Based on RGB Images,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Depeng Chen, Xiao Wu, Jie Cui, and Hong Zhong
Abstract: The emergence of the Tor anonymous communication system can effectively protect users' identities from being leaked by untrusted destinations and third parties on the Internet. However, there are endless cases of abuse in which the Tor anonymous communication system is used to hide real identities and engage in cybercrime. Therefore, it is of great research significance to effectively identify Tor traffic. To distinguish different categories of Tor traffic and different categories of regular traffic, traditional gray image data processing methods are widely used, but gray images cannot represent richer color information. In this regard, our paper proposes an RGB image data processing method and combines it with deep learning to classify Tor traffic. We first verify the impact of the image-saving format on model performance, then explore the impact of different assignment methods of the RGB image on our experimental results, and finally compare the performance of the model trained by the RGB image method and the conventional gray image method. Experimental results show that this method can effectively identify different Tor traffic and regular traffic with extremely high accuracy.
Keyword: Tor anonymous communication system, RGB image, Assignment method, Traffic classification, Deep learning
Cite
@inproceedings{ICIC_2024,
author = {Depeng Chen and Xiao Wu and Jie Cui and Hong Zhong},
title = {Traffic Classification over Tor Network Based on RGB Images},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {826-838},
note = {Poster Volume Ⅰ}
}
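The contrast between gray and RGB images described in the abstract above can be illustrated with a minimal sketch. The abstract does not say how bytes are assigned to color channels, so the mapping below (three consecutive bytes per pixel, zero padding, a fixed image size) is an assumption chosen only for illustration, and the function names are hypothetical.

```python
def bytes_to_gray(payload: bytes, width: int = 4, height: int = 4) -> list:
    """Map raw traffic bytes to a height x width grid, one byte per pixel."""
    needed = width * height
    # Truncate long flows and zero-pad short ones to a fixed size.
    data = payload[:needed].ljust(needed, b"\x00")
    return [list(data[r * width:(r + 1) * width]) for r in range(height)]


def bytes_to_rgb(payload: bytes, width: int = 4, height: int = 4) -> list:
    """Map raw traffic bytes to a height x width grid of (R, G, B) tuples.

    Assumed assignment: three consecutive bytes fill the R, G and B
    channels of one pixel. Other assignments are equally possible.
    """
    needed = width * height * 3
    data = payload[:needed].ljust(needed, b"\x00")
    pixels = [tuple(data[i:i + 3]) for i in range(0, needed, 3)]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    flow = bytes(range(10))
    print(bytes_to_gray(flow, width=2, height=2))
    print(bytes_to_rgb(flow, width=2, height=2))
```

Under this assumed mapping, an image of the same size holds three times as many payload bytes in RGB form as in gray form, which is one way an RGB representation can carry more information per image.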
-
STMDF: An Effective Approach for Malicious Domain Detection through Dynamic Spatial-Temporal Analysis,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Hongwu Li, JianQiang Li, Xingyu Fu, DongZheng Jia, Yujia Zhu, Han Wang, and Qingyun Liu
Abstract: The Internet is widely used for network attacks, such as phishing, fraud, gambling, the spread of malware, and botnets. Domains play a crucial role in attackers' network communication due to their low cost and flexibility. Attackers frequently change or transfer malicious domains to evade detection, making it challenging to capture complete associations between domains and related resources. The inherent relationships among domains are difficult to forge; for instance, stable connections exist between domain operators from the same organization or between domains providing similar services. Recent research has employed graph learning techniques, including bipartite graphs, homogeneous graphs, and heterogeneous graphs, to integrate domain attributes and association information for uncovering implicit relationships between domains. However, approaches based on bipartite or homogeneous graphs have limited association information, while methods based on heterogeneous graphs require expert knowledge to design meta-paths and overlook the heterophilic interactions of the domain association graph, where two associated domains may not belong to the same label type. Furthermore, domains and related resources are dynamic, with attributes and associations changing over time, and previous methods have failed to consider these spatiotemporal characteristics. In summary, malicious domain identification techniques require reduced reliance on expert knowledge, consideration of the heterogeneity in graph networks, and attention to the spatial and temporal dynamics of domains and associated resources. In this paper, we propose a novel STMDF model for detecting malicious domains, which utilizes RNN and attention modules to learn temporal information, addressing the complex challenges in malicious domain identification. To validate the effectiveness of our approach, we conduct comprehensive comparisons with various existing detection models, demonstrating the superiority of our method.
Keyword: Spatial-temporal Snapshot Graph Learning, Attention Mechanism, Malicious Domain Identification
Cite
@inproceedings{ICIC_2024,
author = {Hongwu Li and JianQiang Li and Xingyu Fu and DongZheng Jia and Yujia Zhu and Han Wang and Qingyun Liu},
title = {STMDF: An Effective Approach for Malicious Domain Detection through Dynamic Spatial-Temporal Analysis},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {839-850},
note = {Poster Volume Ⅰ}
}
-
CESNet: Cross-dimensional information extraction and channel sharing,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Qian Long, Gaihua Wang, and Kehong Li
Abstract: To improve detection accuracy, this paper proposes a cross-dimensional information extraction and channel sharing network (CESNet). The cross-dimensional information extraction (CE) module uses max pooling and average pooling to strengthen important features in different dimensions, and then interacts across channels to focus on regions of interest. The channel sharing (CS) module consists of involution, group convolution, and efficient channel attention for deep convolutional neural networks (ECA-Net), and it can reduce the loss of semantic information caused by channel reduction during feature fusion. Experiments show that the proposed method can work on different networks. Among them, CESNet reaches 34.1 in box AP on the COCO dataset, and the detection performance of our network is better than that of other networks.
Keyword: Deep learning, Object detection, CE module, CS module
Cite
@inproceedings{ICIC_2024,
author = {Qian Long and Gaihua Wang and Kehong Li},
title = {CESNet: Cross-dimensional information extraction and channel sharing},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {300-316},
note = {Poster Volume Ⅰ}
}
-
Gated Cross-modal Attention and Multimodal Homogeneous Feature Discrepancy Learning for Speech Emotion Recognition,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Feng Li and Jiusong Luo
Abstract: Understanding human emotions from speech is crucial for computers to comprehend human intentions. Human emotions are expressed through a wide variety of forms, including speech, text, and facial expressions. However, most speech emotion recognition methods fail to consider the interactions between different information sources. Therefore, we propose a multimodal speech emotion recognition framework that integrates information from different modalities via a gated cross-modal attention mechanism and multimodal homogeneous feature discrepancy learning. Specifically, we first extract acoustic, visual, and textual features using different pre-trained models. Then, A-GRU-LVC (Auxiliary Gated Recurrent Unit with Learnable Vision Center) and A-GRU (Auxiliary Gated Recurrent Unit) are used to further extract emotion-related information from the visual and text features. Additionally, we design a gated cross-modal attention mechanism to dynamically fuse multimodal features. Finally, we introduce multimodal homogeneous feature discrepancy learning to better capture differences among different emotion samples. Evaluation results show that our proposed model achieves better recognition performance than previous methods on the IEMOCAP dataset.
Keyword: Speech emotion recognition, Multimodal, Wav2vec 2.0, Cross-modal attention mechanism
Cite
@inproceedings{ICIC_2024,
author = {Feng Li and Jiusong Luo},
title = {Gated Cross-modal Attention and Multimodal Homogeneous Feature Discrepancy Learning for Speech Emotion Recognition},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {329-338},
note = {Poster Volume Ⅰ}
}
-
Reverse nearest neighbourhood query based on road social networks,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yaoyu Liu, Shaopeng Wang, Chunkai Feng, and Wei Guo
Abstract: With the increasing popularity of mobile devices that support spatial positioning, numerous location-based service (LBS) systems have been put into place and widely adopted by users of mobile devices. Reverse nearest neighbor (RNN) queries are essential supporting techniques in these systems. A new and useful variant of RNN queries has emerged recently, known as the reverse nearest neighbourhood query for road networks (RNNH-RN), which discovers the neighbourhoods for which the query point is the nearest facility among all facilities. However, existing research has primarily focused on spatial queries, and to the best of our knowledge, there is no technique available for computing queries that incorporate social network information. Existing queries on road networks are not directly applicable to road social networks. In this paper, we introduce the reverse nearest neighbourhood query based on road social networks (RNNH-RS), where a neighbourhood is a set of at least m objects such that the maximum road network distance between any two objects is at most d and the objects within the neighbourhood have at least k familiar acquaintances. We validate the flexibility and effectiveness of the proposed query through experiments on a real-world road social network dataset.
Keyword: Road social network, Reverse nearest neighbor, Social influence, Spatial databases
Cite
@inproceedings{ICIC_2024,
author = {Yaoyu Liu and Shaopeng Wang and Chunkai Feng and Wei Guo},
title = {Reverse nearest neighbourhood query based on road social networks},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {579-598},
note = {Poster Volume Ⅱ}
}
-
Inference of gene regulatory network with regulation type based on signed graph convolutional network from time-series data,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Zi-Qiang Guo, Zhen Gao, Chun-Hou Zheng, and Pi-Jing Wei
Abstract: Gene regulatory network (GRN) inference has been an essential challenge in systems biology. Currently, most existing methods for GRN reconstruction ignore information about the regulation types, such as activation or inhibition. Additionally, concerning the characteristics of time-series data, most methods employ the same approach to process the time-series expression values of different samples, without considering the differences in gene expression values among them. To this end, this work proposes the SGCGRNT model (Signed Graph Convolutional neural network for GRN Inference from Time-series data), which utilizes a signed graph convolutional network to infer GRNs with both the direction and the regulatory type from time-series data. In addition, we define Spearman's Rank Correlation Mutual Information (S-RMI) to enable SGCGRNT to adapt to various types of gene expression data. Furthermore, the sampling idea of GraphSAGE is adopted, which can significantly save time and resources when processing large sample datasets. Experimental results demonstrate that SGCGRNT can accurately predict GRNs with both direction and regulation types.
Keyword: Gene regulatory network, Signed graph convolutional network, Link prediction, Regulation type
Cite
@inproceedings{ICIC_2024,
author = {Zi-Qiang Guo and Zhen Gao and Chun-Hou Zheng and Pi-Jing Wei},
title = {Inference of gene regulatory network with regulation type based on signed graph convolutional network from time-series data},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {827-838},
note = {Poster Volume Ⅱ}
}
-
On Finding Short Addition Chains for Large Integers,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Xiaopeng Zhao, Zhusen Liu, and Jiawei Qian
Abstract: An addition chain for a given exponent $n$ is an increasing sequence of positive integers whose first term is $1$, in which each subsequent term is obtained by adding two previous terms (which can be the same), and whose last element is equal to $n$. Constructing the shortest addition chain for a fixed exponent $n$ is the most efficient method for computing $x^n$ in some group under multiplication. Therefore, the addition chain plays a crucial role in modern cryptography: it can improve the computational efficiency of cryptographic algorithms that require fast exponentiation, such as RSA, ElGamal, Paillier, and ECC. However, the problem of finding the shortest addition chain is NP-complete. Moreover, existing evolutionary algorithms do not work well for finding short addition chains for large integers. This paper integrates genetic algorithms with the window method to obtain an efficient strategy for the addition chain problem involving large integers. We conduct experiments on an RSA-1536 modulus to verify the efficiency and practicability of our algorithm.
Keyword: Addition chains, Genetic algorithms, Window method, Exponentiation, Cryptography
Cite
@inproceedings{ICIC_2024,
author = {Xiaopeng Zhao and Zhusen Liu and Jiawei Qian},
title = {On Finding Short Addition Chains for Large Integers},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {428-437},
note = {Poster Volume Ⅱ}
}
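The addition chain definition in the abstract above can be made concrete with a minimal sketch. It is not the authors' genetic or window-based algorithm: it only builds a chain with the textbook left-to-right binary method, checks the chain property, and uses a chain to compute a modular power. All function names are hypothetical.

```python
def is_addition_chain(chain, n):
    """Return True if chain starts at 1, ends at n, is strictly increasing,
    and every later term is the sum of two earlier terms (possibly equal)."""
    if not chain or chain[0] != 1 or chain[-1] != n:
        return False
    for i in range(1, len(chain)):
        earlier = chain[:i]
        if chain[i] <= chain[i - 1]:
            return False
        if not any(chain[i] - a in earlier for a in earlier):
            return False
    return True


def binary_chain(n):
    """Left-to-right binary method: double for every bit after the leading
    one, and add 1 whenever that bit is set. Valid, but not always shortest."""
    chain = [1]
    for bit in bin(n)[3:]:  # skip the '0b' prefix and the leading 1 bit
        chain.append(chain[-1] * 2)
        if bit == "1":
            chain.append(chain[-1] + 1)
    return chain


def power_with_chain(x, chain, modulus):
    """Compute x**chain[-1] mod modulus with one multiplication per chain step."""
    powers = {1: x % modulus}
    for i in range(1, len(chain)):
        term = chain[i]
        for a in chain[:i]:
            b = term - a
            if b in powers:
                powers[term] = powers[a] * powers[b] % modulus
                break
    return powers[chain[-1]]


if __name__ == "__main__":
    print(binary_chain(15))                         # chain from the binary method
    shorter = [1, 2, 3, 6, 12, 15]                  # a hand-picked shorter chain
    print(is_addition_chain(shorter, 15))
    print(power_with_chain(3, shorter, 1000) == pow(3, 15, 1000))
```

For $n = 15$ the binary method needs six additions while the hand-picked chain needs five, which shows why searching for shorter chains can save multiplications during exponentiation.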
-
Prediction by Machine Learning Analysis of Genomic Data Phenotypic Frost Tolerance in Perccottus glenii,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Lilin Fan, Xuqing Chai, Zhixiong Tian, Yihang Qiao, Zhen Wang, and Yifan Zhang
Abstract: Analysis of the genome sequence of Perccottus glenii, the only fish known to possess freeze tolerance, holds significant importance for understanding how organisms adapt to extreme environments. Traditional biological analysis methods are time-consuming and have limited accuracy. To address these issues, we employ machine learning techniques to analyze the gene sequences of Perccottus glenii, with Neodontobutis hainanens as a comparative group. First, we propose five gene sequence vectorization methods and a method for handling ultra-long gene sequences. We conduct a comparative study of three vectorization methods (ordinal encoding, One-Hot encoding, and K-mer encoding) to identify the optimal encoding method. Second, we construct four classification models: Random Forest, LightGBM, XGBoost, and Decision Tree. The dataset used by these classification models was extracted from the National Center for Biotechnology Information database, and we vectorized the sequence matrices using the optimal encoding method, K-mer. The Random Forest model, which is the optimal model, achieved a classification accuracy of up to 99.98%. Lastly, we utilized SHAP values to conduct an interpretable analysis of the optimal classification model. Through ten-fold cross-validation and the AUC metric, we identified the top 10 features that contribute the most to the model's classification accuracy. This demonstrates that machine learning methods can effectively replace traditional manual analysis in identifying genes associated with the freeze tolerance phenotype in Perccottus glenii.
Keyword: Genome sequence, Machine learning, Vectorization method, SHAP
Cite
@inproceedings{ICIC_2024,
author = {Lilin Fan and Xuqing Chai and Zhixiong Tian and Yihang Qiao and Zhen Wang and Yifan Zhang},
title = {Prediction by Machine Learning Analysis of Genomic Data Phenotypic Frost Tolerance in Perccottus glenii},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {839-856},
note = {Poster Volume Ⅱ}
}
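The K-mer encoding named in the abstract above can be sketched as follows. The abstract does not state the value of k or whether counts, frequencies, or token indices are used, so the count-vector form and the default k = 2 below are assumptions for illustration, and the function name is hypothetical.

```python
from itertools import product


def kmer_vector(sequence, k=2, alphabet="ACGT"):
    """Return (vocabulary, counts): how often each possible k-mer occurs
    in the sequence, using a sliding window with step 1."""
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows with ambiguous bases such as N
            counts[index[kmer]] += 1
    return vocab, counts


if __name__ == "__main__":
    vocab, counts = kmer_vector("ACGTAC", k=2)
    # Print only the k-mers that actually occur.
    print({kmer: c for kmer, c in zip(vocab, counts) if c})
```

The vector length is fixed at 4^k regardless of sequence length, which is one reason this kind of encoding is convenient for tree-based classifiers such as those listed in the abstract.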
-
High Training Efficiency Transformer for Multi-scenario Non-Autoregressive Neural Machine Translation,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Xiangyu Qu, Guojing Liu, and Liang Li
Abstract: Non-autoregressive neural machine translation (NAT) focuses on improving inference efficiency through parallel decoding. However, the training methods of NAT models have seen little improvement compared with those of autoregressive translation (AT) models, which leads to an imbalance between training efficiency and inference speed. In this paper, we propose Padding Accelerated Training (PAT) for NAT. Specifically, we pad short sentences not with padding tokens but with another real training sentence, and apply Sequence Concatenating attention (SC) to obtain a sentence-level blocking matrix that prevents multiple sentences from interfering with each other. Experiments show that PAT is applicable to both sentence-level and document-level machine translation scenarios. While preserving translation performance, PAT improves training speed by more than 2 times in multiple experimental tasks.
Keyword: Machine translation, Non-autoregressive, Transformer, High Efficiency Training
Cite
@inproceedings{ICIC_2024,
author = {Xiangyu Qu and Guojing Liu and Liang Li},
title = {High Training Efficiency Transformer for Multi-scenario Non-Autoregressive Neural Machine Translation},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {760-770},
note = {Poster Volume Ⅱ}
}
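The idea of filling a training row with a second real sentence and then blocking cross-sentence attention, as described in the abstract above, can be sketched with plain lists. This is an illustrative reading of the abstract, not the authors' implementation: the packing rule, the segment ids, and the function names are assumptions, and a real model would build the mask as a tensor.

```python
def pack_sentences(sentences, max_len, pad_id=0):
    """Concatenate token-id sentences into one row of length max_len.

    Returns (tokens, segment_ids). Segment id 0 marks leftover padding;
    ids 1, 2, ... mark which packed sentence each token belongs to.
    """
    tokens, segment_ids = [], []
    for seg, sent in enumerate(sentences, start=1):
        if len(tokens) + len(sent) > max_len:
            break  # the next sentence no longer fits in this row
        tokens.extend(sent)
        segment_ids.extend([seg] * len(sent))
    pad = max_len - len(tokens)
    return tokens + [pad_id] * pad, segment_ids + [0] * pad


def blocking_mask(segment_ids):
    """Block-diagonal mask: position q may attend to position k only if both
    belong to the same packed sentence. Padding attends to nothing."""
    return [[q != 0 and q == k for k in segment_ids] for q in segment_ids]


if __name__ == "__main__":
    tokens, segments = pack_sentences([[5, 6, 7], [8, 9]], max_len=6)
    print(tokens, segments)
    for row in blocking_mask(segments):
        print(["x" if allowed else "." for allowed in row])
```

Because the mask is block-diagonal, the two packed sentences are processed in the same row without seeing each other, so slots that would otherwise hold padding tokens carry real training data.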
-
High Utility Pattern Fusion by Pretrained Language Models for Text Classification,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yujia Wu, Hong Ren, Xuan Zhang, and Guohua Xiao
Abstract: In the area of text classification, the identification of correlation patterns among semantics presents a persistent challenge. To tackle this issue, we propose a method called High Utility Pattern (HUP) fusion by Pretrained Language Models for Text Classification, which aims to enhance the performance of text classification techniques by learning correlation patterns among semantics within the same space. Specifically, HUP employs a Triplet Networks architecture, which utilizes three distinct encoders to extract sample semantics, correlation pattern information, and label semantic information, respectively. We employ a high-utility itemset mining algorithm to extract correlation pattern information with high utility, and by incorporating prompt templates into labels, the model is able to fully leverage the semantic knowledge embedded in pre-trained models. Ultimately, through joint training, the distance between a sample and its corresponding label is minimized, while the distance between the sample and labels that are not associated with the sample is maximized. Empirical investigations conducted on six standard text classification datasets reveal that the classification accuracy of HUP exhibits a notable enhancement, with an average accuracy increase ranging from 1.52% to 89.08%.
Keyword: Text Classification, Transformer Encoders, High Utility Pattern, Pre-trained Language Models
Cite
@inproceedings{ICIC_2024,
author = {Yujia Wu and Hong Ren and Xuan Zhang and Guohua Xiao},
title = {High Utility Pattern Fusion by Pretrained Language Models for Text Classification},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {339-350},
note = {Poster Volume Ⅰ}
}
-
: Rumor Detection based on Social Immune Network,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Mingrui Liu, Zexian Xie, Jielin Chen, and Binyang Li
Abstract: The dissemination of rumors on social media severely endangers political, economic, and social security, which highlights the importance of rumor detection. Current studies mainly focus on capturing the content information or propagation pattern of a message cascade, but most of these methods do not precisely describe the potential impact among tweets or a tweet's influence in the message cascade. To tackle this issue, this paper treats the spread of a rumor on social media as an immune response in an organism, with users as immune cells and retweets as antibodies. A rumor detection model based on the Social Immune Network, named SIN, is proposed. It utilizes the instantaneous rate of change in the number of immune cells (users) and antibodies (retweets) with a certain stance to describe a tweet's influence. In this process, interactions among retweets and users with different stances can be explored, thereby investigating the potential impact of each tweet. Extensive experiments on the PHEME dataset show that SIN outperforms the state-of-the-art method, with an F1 value of 84.7% (2.8% higher) and an accuracy of 86.2% (2.9% higher).
Keyword: Rumor Detection, Stance Classification, Dynamic Immune Network, Social Immune Network
Cite
@inproceedings{ICIC_2024,
author = {Mingrui Liu and Zexian Xie and Jielin Chen and Binyang Li},
title = {: Rumor Detection based on Social Immune Network},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = August,
date = 5-8,
year = 2024,
address = {Tianjin, China},
pages = {351-367},
note = {Poster Volume Ⅰ}
}
-
Squeeze and Learn: Compressing Long Sequences with Fourier Transformers for Gene Expression Prediction,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Vittorio Pipoli, Giuseppe Attanasio, Marta Lovino, and Elisa Ficarra
Abstract: Genes regulate fundamental processes in living cells, such as the synthesis of proteins or other functional molecules. Studying gene expression is hence crucial for both diagnostic and therapeutic purposes. State-of-the-art deep learning techniques such as Xpresso have been proposed to predict gene expression from raw DNA sequences. However, DNA sequences challenge computational approaches because of their length, typically in the order of the thousands, and sparsity, requiring models to capture both short- and long-range dependencies. Indeed, the application of recent techniques like transformers is prohibitive with common hardware resources. This paper proposes FNetCompression, a novel gene-expression prediction method. Crucially, FNetCompression combines convolutional encoders and memory-efficient Transformers to compress the sequence by up to 95% with minimal performance tradeoff. Experiments on the Xpresso dataset show that FNetCompression outscores our baselines and that the margin is statistically significant. Moreover, FNetCompression is 88% faster than a classical transformer-based architecture with minimal performance tradeoff.
Keyword: DNA sequences, Gene expression, Transformers
Cite
@inproceedings{ICIC_2024,
author = {Vittorio Pipoli and Giuseppe Attanasio and Marta Lovino and Elisa Ficarra},
title = {Squeeze and Learn: Compressing Long Sequences with Fourier Transformers for Gene Expression Prediction},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {857-867},
note = {Poster Volume Ⅱ}
}
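The Fourier token mixing suggested by the title of the entry above can be sketched in a few lines. This is only an illustration under stated assumptions: it shows an FNet-style mixing step (the real part of a discrete Fourier transform applied along the hidden and then the sequence dimension) in plain Python, and the names `dft` and `fourier_mix` are hypothetical. Whether FNetCompression uses exactly this layer, and how it is combined with the convolutional encoders, is not stated in the abstract.

```python
import cmath


def dft(values):
    """Naive discrete Fourier transform of a 1-D list (O(n^2), for illustration)."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fourier_mix(tokens):
    """FNet-style mixing (assumed): real part of a 2-D DFT.

    tokens is a list of seq_len rows, each with hidden_dim numbers.
    No learned parameters are involved in this step.
    """
    # Transform along the hidden dimension of every token.
    rows = [dft(row) for row in tokens]
    # Transform along the sequence dimension of every hidden channel.
    cols = [dft(list(col)) for col in zip(*rows)]
    # Back to [seq_len][hidden_dim], keeping only the real part.
    return [[cols[h][s].real for h in range(len(cols))] for s in range(len(tokens))]


tokens = [[1.0, 1.0], [1.0, 1.0]]  # toy input: 2 tokens, 2 hidden channels
mixed = fourier_mix(tokens)
```

Because the mixing step has no attention matrix, its memory use does not grow quadratically with sequence length, which is the usual motivation for this family of layers on long inputs.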
-
Cooperative Inference with Interleaved Operator Partitioning for CNNs,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Zhibang Liu, Chaonong Xu, Zhizhuo Liu, Lekai Huang, Jiachen Wei, and Chao Li
Abstract: Deploying deep learning models on IoT devices often faces challenges due to limited memory resources and computing capabilities. Cooperative inference is an important way to address this: an intelligent model is partitioned and then deployed across devices. To perform horizontal partitions, existing cooperative inference methods take either the output channel of operators or the height and width of feature maps as the partition dimensions. Because the activations of an operator are then distributed, they have to be concatenated before being fed to the next operator, which adds delay to cooperative inference. In this paper, we propose the Interleaved Operator Partitioning (IOP) strategy for CNN models. By partitioning an operator along the output channel dimension and its successor along the input channel dimension, activation concatenation becomes unnecessary, which reduces the number of communication connections and consequently the cooperative inference delay. Based on IOP, we further present a model segmentation algorithm for minimizing cooperative inference time, which greedily selects operators for IOP pairing based on the inference delay benefit harvested. Experimental results demonstrate that, compared with the state-of-the-art partition approaches used in CoEdge and AlexNet, the IOP strategy achieves 14.97%~16.97% faster acceleration and reduces peak memory usage by 21.22%~49.98% for three classical image classification models.
Keyword: deep learning, distributed inference, parallel computing
Cite
@inproceedings{ICIC_2024,
author = {Zhibang Liu and Chaonong Xu and Zhizhuo Liu and Lekai Huang and Jiachen Wei and Chao Li},
title = {Cooperative Inference with Interleaved Operator Partitioning for CNNs},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {368-377},
note = {Poster Volume Ⅰ}
}
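The partitioning idea in the IOP abstract above can be illustrated on two stacked 1x1 convolutions, which at a single pixel reduce to matrix-vector products over the channel dimension. This is a minimal sketch, not the authors' implementation: the two-device split, the ReLU between the operators, and the names `matvec`, `reference`, and `iop_two_devices` are assumptions made for illustration. Each device computes a slice of the first operator's output channels and feeds it directly into the matching input-channel slice of the second operator, so the intermediate activation is never concatenated; only the partial outputs are summed at the end.

```python
def matvec(weight, x):
    """1x1 convolution at one pixel: weight is [out_channels][in_channels]."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def relu(values):
    return [max(0.0, v) for v in values]


def reference(w1, w2, x):
    """Unpartitioned pipeline: w2 applied to relu(w1 applied to x)."""
    return matvec(w2, relu(matvec(w1, x)))


def iop_two_devices(w1, w2, x, split):
    """Hypothetical two-device interleaved partitioning.

    Operator 1 is split by output channel (rows of w1); operator 2 is
    split by input channel (columns of w2).
    """
    partials = []
    for lo, hi in ((0, split), (split, len(w1))):
        hidden_slice = relu(matvec(w1[lo:hi], x))        # stays on the device
        w2_slice = [row[lo:hi] for row in w2]            # matching input channels
        partials.append(matvec(w2_slice, hidden_slice))  # partial output
    # One element-wise sum merges the partial outputs of the two devices.
    return [a + b for a, b in zip(*partials)]


w1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, 3.0], [2.0, 0.0]]  # 2 inputs -> 4 hidden channels
w2 = [[1.0, 0.0, 2.0, -1.0], [0.5, 1.0, -1.0, 0.0]]      # 4 hidden -> 2 output channels
x = [1.0, 2.0]
assert iop_two_devices(w1, w2, x, split=2) == reference(w1, w2, x)
```

The element-wise activation is what makes the interleaving valid here: ReLU acts on each hidden channel independently, so slicing before or after it gives the same values.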
-
ALLSTATE: Hierarchical Clustering for Single Cells based on Non-linear Transition Embedding,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yating Lin, Minshu Wang, Wenxian Yang, and Rongshan Yu
Abstract: Single-cell RNA sequencing (scRNA-seq) provides critical insights into cellular diversity, essential for understanding complex biological dynamics. Traditional scRNA-seq analysis employs unsupervised clustering methods and supervised learning-based approaches to interpret cells or cell clusters. However, both of these approaches have limitations. Unsupervised learning-based methods struggle with selecting a single resolution, limiting their ability to reveal the multi-layered nature of cellular diversity. On the other hand, supervised learning-based methods lack the flexibility to adjust to the various levels of resolution needed to fully capture the complex spectrum of cell types and states. In response to these challenges, hierarchical clustering has emerged as a superior technique. It enables detailed exploration across various resolutions without predefined cluster counts, thus overcoming the limitations of both unsupervised and supervised methods. Nevertheless, the high-dimensional nature of scRNA-seq poses significant analytical challenges. We introduce ALLSTATE, a novel pipeline that utilizes non-linear transition embedding for dimension reduction, facilitating hierarchical clustering in a computationally efficient manner. Our experiments demonstrate that ALLSTATE achieves satisfactory clustering performance and allows exploration of the connections between cellular hierarchies and cell types at multiple levels of resolution. Additionally, ALLSTATE effectively captures complex cellular differentiation paths, offering a nuanced view of cellular heterogeneity with performance comparable to mainstream methods.
Keyword: scRNA-seq, Hierarchical clustering, Non-linear transition embedding, Cellular differentiation path
Cite
@inproceedings{ICIC_2024,
author = {Yating Lin and Minshu Wang and Wenxian Yang and Rongshan Yu},
title = {ALLSTATE: Hierarchical Clustering for Single Cells based on Non-linear Transition Embedding},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {868-882},
note = {Poster Volume Ⅱ}
}
-
A dual cross-modal interactive guided common representation method for fine-grained cross-modal retrieval,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Hongchun Lu, Min Han, Xue Li, and Le An
Abstract: In fine-grained cross-modal retrieval tasks, the huge heterogeneity gap between different modalities is a key factor leading to low retrieval performance. Therefore, addressing the media divide (i.e., the inconsistent representation of different media types) is an important way to improve retrieval performance. Although previous research has yielded some results, the standard model still has some shortcomings. First, the information interaction between different modalities is ignored when learning common representations of different media data. Second, discriminative fine-grained features are not fully exploited. To address these shortcomings, we propose a dual cross-modal interaction-guided common representation network (DCINet) to enhance the information interaction between different modalities while mining discriminative features in media data. Specifically, we construct a common representation network and train it with pre-interaction and post-interaction multimodal features as inputs, respectively. The two training strategies guide the learning of the common representation network through a maximal-minimal game, effectively enhancing cross-media semantic consistency and improving retrieval accuracy. Finally, extensive experiments and ablation studies conducted on public datasets demonstrate the effectiveness of our proposed method.
Keyword: Fine-grained Cross-media Retrieval, Cross-Modal Spatial Interaction, Cross-Modal Channel Interaction
Cite
@inproceedings{ICIC_2024,
author = {Hongchun Lu and Min Han and Xue Li and Le An},
title = {A dual cross-modal interactive guided common representation method for fine-grained cross-modal retrieval},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {686-700},
note = {Poster Volume Ⅰ}
}
-
Multi-dimensional Edge-based Graph Representation Learning for Obstructed Prohibited Items Detection in X-ray Images,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Haolin Tang, Hongxia Gao, and Runze Lin
Abstract: X-ray security inspection has been widely used to maintain safety in public places and transportation systems. Due to the imaging characteristics of X-ray images, the stacking of items can cause translucency interference in the images, making it challenging to detect contraband items in backpacks or suitcases during security checks. Most existing methods have improved detection by adjusting the combination of features without considering the relationships between targets. In this paper, we propose a novel prohibited item graph representation learning algorithm to explicitly model inter-item relationships, aiming at improving detection performance. Our approach starts with the GTG module, which generates a graph topology structure connecting the proposals output by the detection backbone network, where each proposal is treated as a node describing a candidate object. Then, the MDE module creates a set of multi-dimensional edge features to comprehensively and explicitly describe the relationships between each pair of connected nodes, allowing context information to be used for their detection. Extensive experiments validate the effectiveness of our method, which not only enhances detection accuracy but also better identifies hard-to-distinguish objects in complex scenarios. This exploration opens up a graph-based direction unexplored in prior research, providing a new path for future studies in graph-based X-ray security inspection. Our code is provided in the Supplementary Material.
Keyword: Prohibited items detection, X-ray image, Graph Representation Learning, Multi-dimensional Edge Feature
Cite
@inproceedings{ICIC_2024,
author = {Haolin Tang and Hongxia Gao and Runze Lin},
title = {Multi-dimensional Edge-based Graph Representation Learning for Obstructed Prohibited Items Detection in X-ray Images},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {701-713},
note = {Poster Volume Ⅰ}
}
-
DGUQA: Domain Generalization Uncertainty Informed Patient-Specific Quality Assurance,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Xiaoyang Zeng, Awais Ahmed, Rui Xi, and Mengshu Hou
Abstract: Deep Learning Automated Patient-Specific Quality Assurance (PSQA) endeavors to diminish the reliance on clinical resources. The accurate estimation of the dose difference metric, particularly the Gamma passing rate, is paramount in ensuring the safety and efficacy of radiation therapy plans. Although current research has yielded an overall performance on par with that of experts, it fails to address the local performance discrepancies of the model across diverse lesions, thereby highlighting a generalization challenge that undermines its credibility in real clinical settings. This paper introduces DGUQA, based on the theory of domain generalization in deep learning. DGUQA employs an adversarial loss-based regularization to address the issue of generalization. Further, since the model is biased toward the most common lesion organs, relying solely on a domain-generalized model would decrease overall performance. Therefore, in conjunction with safety requirements, we also model predictive uncertainty. The domain generalization model is used only when the uncertainty exceeds a certain threshold; otherwise, a standard model is employed. Experiments demonstrate that DGUQA shows superiority in both generalization performance and overall effectiveness. DGUQA notably enhances the trustworthiness of deep learning in PSQA and has meaningful implications for the clinical significance of medical deep learning.
Keyword: PSQA, Quality Assurance, Domain Generalization, Deep Learning, Uncertainty
Cite
@inproceedings{ICIC_2024,
author = {Xiaoyang Zeng and Awais Ahmed and Rui Xi and Mengshu Hou},
title = {DGUQA: Domain Generalization Uncertainty Informed Patient-Specific Quality Assurance},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {883-896},
note = {Poster Volume Ⅱ}
}
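The thresholding rule described in the DGUQA abstract above (use the domain-generalized model only when predictive uncertainty exceeds a threshold) can be sketched as a small routing function. Everything concrete here is an assumption for illustration: the abstract does not say how uncertainty is estimated, so the sketch uses the variance of repeated stochastic predictions as a stand-in, and the toy models, the threshold value, and the names `route` and `predictive_uncertainty` are hypothetical.

```python
from statistics import pvariance


def predictive_uncertainty(samples):
    """Assumed proxy: population variance of repeated stochastic predictions."""
    return pvariance(samples)


def route(plan, standard_model, dg_model, sample_predictions, threshold):
    """Return (prediction, model_name) following the thresholding rule."""
    uncertainty = predictive_uncertainty(sample_predictions(plan))
    if uncertainty > threshold:
        return dg_model(plan), "domain-generalized"
    return standard_model(plan), "standard"


# Toy stand-ins for trained Gamma-passing-rate predictors (hypothetical values).
def standard_model(plan):
    return 0.97


def dg_model(plan):
    return 0.93


def sample_predictions(plan):
    # Pretend that rare lesion sites yield widely scattered predictions.
    return [0.60, 0.90] if plan["site"] == "rare" else [0.96, 0.97, 0.98]


common = route({"site": "common"}, standard_model, dg_model, sample_predictions, 0.01)
rare = route({"site": "rare"}, standard_model, dg_model, sample_predictions, 0.01)
```

Routing per case keeps the standard model's accuracy on common lesion sites while falling back to the more robust model where the standard one is least trustworthy.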
-
Lithologic scene classification based on channel group fusion and adaptive feature filtering network,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Zhiyuan Sui, Haoyi Wang, and Xianju Li
Abstract: Lithology classification is an important research direction in geological remote sensing. Lithology is high-level semantic information, and its features are easily masked by vegetation, posing challenges in remote sensing feature extraction. In this study, we constructed a lithology scene classification dataset named MSRS-LSC based on multi-source remote sensing data. Subsequently, we proposed a lithology scene classification model called Channel Grouping Fusion and Adaptive Feature Filtering Network (CGFAFFNet). This model consists of two modules: (1) the Channel Grouping Fusion (CGF) module, which performs group learning, channel-wise information mixing and interaction, and weighted fusion to select key information from the feature maps extracted at different depths by dense connection blocks; and (2) the Adaptive Feature Filtering module, which cascades the fused features from different CGF modules and performs weighted calculations in both the channel and spatial dimensions to further filter key feature information. The proposed model achieved an OA, F1 score, and Kappa of 80.99% ± 0.4%, 81.26% ± 0.38%, and 78.85% ± 0.44%, respectively, outperforming mainstream scene classification models.
Keyword: lithology classification, Remote sensing, Scene classification
Cite
@inproceedings{ICIC_2024,
author = {Zhiyuan Sui and Haoyi Wang and Xianju Li},
title = {Lithologic scene classification based on channel group fusion and adaptive feature filtering network},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {378-393},
note = {Poster Volume Ⅰ}
}
-
MiNiformer: Enhance Vanilla Transformer with Mixer-Adapter for Long-term Traffic Forecasting,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Shaojun E, Wenjuan Han, Zhiwei Zhang, and Jinan Xu
Abstract: Recently, there has been a surge in scholarly interest in traffic forecasting. Most efforts have concentrated on short-term forecasting and have yielded promising results. Long-term forecasting, though more practical, presents two challenges. First, existing approaches primarily capture dependencies and correlations within short-term historical data. Their performance drops when handling long-term spatio-temporal forecasting, indicating limited scalability. Second, most approaches tend to emphasize temporal information, often neglecting important spatial geographic information. In response to these two challenges, we propose a transformer-based traffic forecasting approach, MiNiformer, featuring the Spatial Feature Extractor - Mixer Adapter as a crucial element. MiNiformer excels in extracting and integrating spatial features, leading to impressive results. Experiments show that MiNiformer, by leveraging spatial information and long-term dependencies, showcases robust long-term feature extraction capabilities and performs exceptionally well in both short-term and long-term scenarios.
Keyword: Spatio-temporal, traffic forecasting, Transformer, Mixer-Adapter
Cite
@inproceedings{ICIC_2024,
author = {Shaojun E and Wenjuan Han and Zhiwei Zhang and Jinan Xu},
title = {MiNiformer: Enhance Vanilla Transformer with Mixer-Adapter for Long-term Traffic Forecasting},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {394-409},
note = {Poster Volume Ⅰ}
}
-
Video-Image-Sentence Multi-Modality Sequential Recommendation Model,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Guowei Wang, Yicheng Di, and Yuan Liu
Abstract: At present, recommendation systems have become an indispensable tool for users to access information. Traditional sequential recommendation systems often rely on explicit item IDs, which have limitations in data sparsity and cold start scenarios. Recent studies have focused on using the modal features of items as inputs to models, allowing knowledge learned from different modal datasets to be transferred. In this context, we propose a pre-training method for modeling multiple modalities of data, which can effectively integrate information from different modalities. We also introduce a new loss calculation method to measure the performance of this method. Finally, to further improve the retrieval performance of the model, we propose a new sequential recommendation method that uses a sequence encoder to capture user interaction sequences and a project encoder to encode project information, sharing parameters to enhance information. We evaluate the proposed methods on three public datasets, and the experimental results demonstrate an improvement in the performance of our methods.
Keyword: Sequential Recommendation, Multi-Modal, Multimodal Pretraining
Cite
@inproceedings{ICIC_2024,
author = {Guowei Wang and Yicheng Di and Yuan Liu},
title = {Video-Image-Sentence Multi-Modality Sequential Recommendation Model},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {992-1006},
note = {Poster Volume Ⅰ}
}
-
Towards Comprehensive Multimodal Perception: Introducing the Touch-Language-Vision Dataset,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Ning Cheng, You Li, Jing Gao, Bin Fang, Jinan Xu, and Wenjuan Han
Abstract: Tactility provides crucial support and enhancement for the perception and interaction capabilities of both humans and robots. Nevertheless, the multimodal research related to touch primarily focuses on visual and tactile modalities, with limited exploration in the domain of language. Beyond vocabulary, sentence-level descriptions contain richer semantics. Based on this, we construct a touch-language-vision dataset named TLV (Touch-Language-Vision) by human-machine cascade collaboration, featuring sentence-level descriptions for multimode alignment. The new dataset is used to fine-tune our proposed lightweight training framework, TLV-Link (Linking Touch, Language, and Vision through Alignment), achieving effective semantic alignment with minimal parameter adjustments (1%). Project Page: https://xiaoen0.github.io/touch.page
Keyword: Tactile-related multimodal perception, Tactile dataset, Modal Alignment
Cite
@inproceedings{ICIC_2024,
author = {Ning Cheng and You Li and Jing Gao and Bin Fang and Jinan Xu and Wenjuan Han},
title = {Towards Comprehensive Multimodal Perception: Introducing the Touch-Language-Vision Dataset},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {599-610},
note = {Poster Volume Ⅱ}
}
-
A dynamic graph structure optimization diagnosis,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Zhiyuan Hu, Yangde Lin, Jianrong Li, and Juan Lyu
Abstract: In the field of industrial equipment management and academic network analysis, early fault diagnosis and node classification tasks are of great significance for ensuring the stable operation of equipment and promoting knowledge discovery. Existing methods face many challenges in dealing with large-scale and unbalanced data sets, especially in bearing fault diagnosis and scientific literature classification. In response to these challenges, this paper proposes a Dynamic Graph-Structured Optimization Diagnosis (DG-SOD) model based on graph neural networks. The innovation of the model primarily encompasses two aspects. First, concerning the dataset, the k-nearest neighbor algorithm is utilized to fuse the health status of bearings with vibration signal data. This integration facilitates the construction of a graph structure that accurately captures the complex relationships between different bearing states. Second, an optimization strategy combining Focal Loss and the graph Deep Open Classification method is used to enhance the performance of the model on unbalanced data and to further improve its applicability and accuracy in different fields. In the experiments, the DG-SOD model showed excellent performance in the above tasks. The accuracy of bearing fault diagnosis increased to 65%, the accuracy of Cora node classification increased from 76% to 86.65%, and the classification accuracy on CiteSeer increased from 70% to 76.05%. These results show that the DG-SOD model has clear advantages in dealing with data imbalance in industrial equipment detection and scientific literature classification and in improving the accuracy of minority class recognition. It provides new ideas and frameworks for future industrial equipment management and academic network analysis.
Keyword: bearing fault diagnosis, graph neural network, k-nearest neighbor algorithm, Focal Loss
Cite
@inproceedings{ICIC_2024,
author = {Zhiyuan Hu and Yangde Lin and Jianrong Li and Juan Lyu},
title = {A dynamic graph structure optimization diagnosis},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {410-423},
note = {Poster Volume Ⅰ}
}
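The Focal Loss mentioned in the abstract above is what lets a classifier concentrate on hard, minority-class samples. The sketch below uses the standard focal loss definition, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), for a single sample; the default `gamma` and `alpha` values and the function names are illustrative assumptions, since the abstract does not report the settings used by DG-SOD.

```python
import math


def cross_entropy(p_true):
    """Plain cross-entropy for one sample, given the probability of its true class."""
    return -math.log(p_true)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: easy samples (p_true near 1) are down-weighted."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


easy = focal_loss(0.9)  # well-classified sample, contributes very little
hard = focal_loss(0.1)  # misclassified sample, keeps most of its loss
```

With `gamma=0` and `alpha=1` the expression reduces to ordinary cross-entropy, so `gamma` controls how strongly well-classified (typically majority-class) samples are suppressed.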
-
BiSlim-6D: A 6D pose estimation network for efficient feature decoupling and fusion,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Xiaotong Gu, Yongshuai He, and Shenghai Wang
Abstract: 6D pose estimation is a crucial component of robotic interaction tasks and remains an active research direction in the field of computer vision. Therefore, developing a 6D pose estimation algorithm that simultaneously achieves high speed and high accuracy is essential. Recently, there has been a trend of utilizing real-time and accurate YOLO series methods for 6D pose estimation tasks. In this work, we propose a new 6D pose estimation network, BiSlim-6D, which uses CSPDarknet53 as the backbone and incorporates a BiFPN built following the Slim-neck paradigm as the feature fusion network. We evaluate the contributions of CSPDarknet53 and BiSlim-neck to the performance of 6D pose estimation and attempt to explain the reasons for these contributions. Furthermore, through comparative experiments, we demonstrate that BiSlim-6D exhibits strong overall performance among current 6D pose estimation networks. Our proposed method achieves an accuracy of 98.78% on the 2D reprojection metric and 81.51% on the ADD(-S) metric. The proposed method has the potential for practical application in relevant tasks in the future.
Keyword: Deep learning, Machine vision, Feature fusion, 6D Pose estimation
Cite
@inproceedings{ICIC_2024,
author = {Xiaotong Gu and Yongshuai He and Shenghai Wang},
title = {BiSlim-6D: A 6D pose estimation network for efficient feature decoupling and fusion},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {161-177},
note = {Poster Volume Ⅱ}
}
-
A New Bibliometrics Analysis Method for Imbalanced Classes and New Classes in the Domain of Biomedical Literature,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Yangde Lin, Zhiyuan Hu, Xiaoran He, Sujuan Liu, and Jianrong Li
Abstract: In the field of biomedical research, processing and understanding large amounts of academic literature quickly and accurately is critical to advancing the field. In massive data analysis, the original graph neural network (GNN) has many shortcomings: it has difficulty capturing the dynamic changes of data when dealing with dynamic graph data, and it is biased towards the larger categories when dealing with category imbalance, which weakens its ability to recognize small categories. In view of these issues, this study introduces the gDOC method into the field of biomedical literature analysis. Additionally, a lifelong learning framework, termed Biology Dynamic Graph Neural Network (BDGNN), is proposed, which integrates GNN to leverage its robust data representation capabilities. Furthermore, BDGNN incorporates the Focal Loss function and a temporal variance metric into the gDOC method. This enables dynamic adjustments of the model based on the temporal characteristics of the graph data. The amount of historical data used in the training process can thus be better adapted to the dynamic nature of biomedical literature citation networks. In the experimental phase, this study designs data preprocessing and data adaptation strategies tailored specifically for the PubMed dataset, and the BDGNN method is implemented on a variety of typical GNN models. By varying the historical data size and labeling rate, the performance of the models in dealing with new and imbalanced category problems is comprehensively evaluated. The experimental results confirm that the accuracy of the framework improves up to 89% in dealing with imbalanced and new category recognition tasks in the field of biomedical literature compared to existing techniques.
Keyword: bibliometrics analysis, graph neural networks, lifelong learning, imbalanced classes, new classes
Cite
@inproceedings{ICIC_2024,
author = {Yangde Lin and Zhiyuan Hu and Xiaoran He and Sujuan Liu and Jianrong Li},
title = {A New Bibliometrics Analysis Method for Imbalanced Classes and New Classes in the Domain of Biomedical Literature},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {424-438},
note = {Poster Volume Ⅰ}
}
-
Improving Stereo Matching Accuracy with Blind Super-Resolution Networks,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Dinghao Zheng, Jiangtao Peng, Qiwei Xie, and Qian Long
Abstract: Due to the limitation of the camera's photoreceptor, the stereo images obtained from stereo cameras can only be presented in the form of pixel blocks and cannot show the finer object information at sub-pixel scale between the pixel blocks, which may restrict breakthroughs in sub-pixel stereo matching accuracy. For this reason, we use a blind super-resolution network to simulate the sub-pixel effect and improve the accuracy of stereo matching. We combine the super-resolution network with stereo matching and design a new stereo matching process: the stereo image is first upscaled by the blind super-resolution network to supplement local information, and then the high-resolution image is fed into the stereo matching algorithm to generate a dense and fine disparity map. Through experiments, we find that the blind super-resolution network BSRGAN can effectively improve stereo matching accuracy in most scenarios. However, in repetitive and densely textured regions, the limitations of the blind super-resolution network lead to a degradation of matching accuracy. Nevertheless, our study provides a new idea and method for improving stereo matching accuracy.
Keyword: Stereo Matching, Blind Super-Resolution, Sub-Pixel
Cite
@inproceedings{ICIC_2024,
author = {Dinghao Zheng and Jiangtao Peng and Qiwei Xie and Qian Long},
title = {Improving Stereo Matching Accuracy with Blind Super-Resolution Networks},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {178-189},
note = {Poster Volume Ⅱ}
}
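Why upscaling can "simulate the sub-pixel effect" in the stereo matching entry above follows from simple arithmetic: an integer disparity found on images upscaled by a factor s corresponds to a disparity with 1/s-pixel granularity at the original resolution. The sketch below assumes the disparity map is mapped back to the original resolution, a step the abstract does not describe; the function names and the example numbers are hypothetical. The depth formula is the standard relation for a rectified stereo pair.

```python
def to_original_disparity(disparity_highres, scale):
    """Map a disparity measured on upscaled images back to original pixels."""
    return disparity_highres / scale


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Rectified-stereo relation: depth = focal length * baseline / disparity."""
    return focal_px * baseline_m / disparity_px


# An integer match at 4x resolution yields a quarter-pixel disparity estimate.
subpixel_disparity = to_original_disparity(37, scale=4)
depth_m = depth_from_disparity(subpixel_disparity, focal_px=1000.0, baseline_m=0.5)
```

Because depth is inversely proportional to disparity, the finer disparity steps matter most for distant objects, where a one-pixel error changes the estimated depth the most.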
-
Coal Mine Safety Alert System: Refining BP Neural Network with Genetic Algorithm Optimization,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Jiabin Luo and Hanzhe Pan
Abstract: In response to the persistent safety challenges within coal mines, this study proposes a novel approach integrating a three-layer feedforward backpropagation artificial neural network with a genetic algorithm (GA-BP) for establishing a safety early warning system. Focused on a coal mine in Shandong, China, the model's effectiveness is evaluated using relevant data for training and analysis. Results indicate the superiority of the GA-BP model over traditional BP neural networks, offering enhanced capability for identifying potential safety risks promptly. This advancement enables coal mine management to implement timely interventions, ensuring the safety of miners. The findings present valuable insights for engineering applications in similar contexts.
Keyword: Coal mine safety, Genetic algorithm-backpropagation neural network (GA-BP), Safety early warning model, Early warning indicators
Cite
@inproceedings{ICIC_2024,
author = {Jiabin Luo and Hanzhe Pan},
title = {Coal Mine Safety Alert System: Refining BP Neural Network with Genetic Algorithm Optimization},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {901-910},
note = {Poster Volume Ⅰ}
}
-
A Comprehensive Survey of Style Transfer: Techniques, Models, and Applications,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Weiqi Wang, Weiting Wang, Ying Xu, Feilong Bao, Zhecong Xing, Zhiguo Zhang, and Yuan Zhang
Abstract: Style transfer learning has garnered significant attention in the field of computer vision in recent years, with anime style transfer being particularly notable for its entertaining nature and widespread application. This fascinating feature has been integrated into various short video platforms, mini-programs, and photography applications. According to our survey, nearly one in four individuals has used applications based on anime style transfer models, with most users providing positive feedback, citing its novelty, fun, and high playability. This paper provides a comprehensive summary of the technological advancements in style transfer, focusing on four mainstream methods: Convolutional Neural Networks (CNN), Variational Autoencoders (VAE), Vision Transformers (ViT), and Generative Adversarial Networks (GAN). We detail the implementation of specific models for each method and systematically compare the performance of several representative models. Finally, we include links to the open-source code of these models to facilitate further research and application.
Keyword: Style Transfer, Convolutional Neural Networks (CNN), Variational Autoencoders (VAE), Vision Transformer (ViT), Generative Adversarial Networks (GAN), Image Processing
Cite
@inproceedings{ICIC_2024,
author = {Weiqi Wang and Weiting Wang and Ying Xu and Feilong Bao and Zhecong Xing and Zhiguo Zhang and Yuan Zhang},
title = {A Comprehensive Survey of Style Transfer: Techniques, Models, and Applications},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {439-452},
note = {Poster Volume Ⅰ}
}
-
LBAS: A Batch Authentication Scheme for M2M Scenarios,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Zhihuan Xing, Yongzhuo Huang, Yuqing Lan, Xiaoyi Yang, Yichun Yu, Xiaoxue Cui, and Dan Yu
Abstract: In some IoT (Internet of Things) Machine-to-Machine (M2M) scenarios, existing device authentication schemes often struggle to balance security, lightweight design, and flexibility, making them unsuitable for batch authentication among low-performance devices. Therefore, this paper proposes a Lightweight Batch Authentication Scheme (LBAS) tailored for M2M scenarios involving low-performance devices. LBAS employs a two-step authentication strategy to ensure both security and flexibility. It organizes devices into a complete binary tree structure, reducing the number of mutual authentications between devices and achieving lightweight authentication. Additionally, LBAS provides a comprehensive trust domain management strategy to handle different behaviors of devices within the trust domain, ensuring availability. We formalized the correctness of the LBAS authentication phase using BAN (Burrows-Abadi-Needham) logic and analyzed two common types of man-in-the-middle attacks, demonstrating the protocol's security. Compared to traditional publish-subscribe schemes, LBAS does not require a proxy server and reduces average time overhead by 44.3%.
Keyword: IoT, M2M, Security, Batch Authentication
Cite
@inproceedings{ICIC_2024,
author = {Zhihuan Xing and Yongzhuo Huang and Yuqing Lan and Xiaoyi Yang and Yichun Yu and Xiaoxue Cui and Dan Yu},
title = {LBAS: A Batch Authentication Scheme for M2M Scenarios},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {714-732},
note = {Poster Volume Ⅰ}
}
-
An Improved YOLOv8 Network for Real-Time Air-to-Ground Object Detection,
ICIC 2024 Posters, Tianjin, China,
August 5-8, 2024
Authors: Bo Peng and Chen Dong
Abstract: In recent years, target detection algorithms such as YOLO have been widely applied across various fields, particularly in scenarios that require rapid responses, such as autonomous driving and video surveillance. However, conventional YOLOv8 models face challenges of slow processing speeds and high latency in these contexts. To address these issues, we propose an improved model called FNYOLOv8, which incorporates FasterNet into the original network to enhance both the runtime speed and accuracy of the original model. FasterNet is an efficient neural network architecture that utilizes local convolution (PConv) techniques to extract spatial features more efficiently by reducing redundant computations and memory access. This design enables FasterNet to achieve higher runtime speeds on a wide range of devices while maintaining or even improving accuracy across various visual tasks. By integrating FasterNet into the backbone of YOLOv8, our FNYOLOv8 model can deliver faster inference speeds and higher energy efficiency while maintaining high precision. Experimental results on the Visdrone aerial dataset demonstrate that our FNYOLOv8 model is more suitable for rapid-response scenarios such as autonomous driving and video surveillance, as it ensures accuracy while enabling quick reasoning and enhancing processing speed.
Keyword: Object Detection, YOLOv8, Lightweight Network, Computer Vision, Partial Convolution
Cite
@inproceedings{ICIC_2024,
author = {Bo Peng and Chen Dong},
title = {An Improved YOLOv8 Network for Real-Time Air-to-Ground Object Detection},
booktitle = {Proceedings of the International Conference on Intelligent Computing (ICIC 2024)},
month = aug,
date = {5-8},
year = 2024,
address = {Tianjin, China},
pages = {190-201},
note = {Poster Volume Ⅱ}
}