- Cascaded Feature Fusion Network for Small-size Pedestrian Detection,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Shilong Yu and Chenhui Yang
Abstract: Deep neural network-based object detectors cannot sufficiently extract effective features for detecting small-size pedestrians. In this letter, we propose a deep cascaded network framework for small-size pedestrian detection, which contains an Iterative Feature Augmentation module and a Residual Attention Fusion module. Specifically, the Iterative Feature Augmentation module adopts bilinear interpolation sampling and channel reshaping in the deep backbone network to achieve feature fusion at different scales. Moreover, we also introduce a feature fusion coefficient to select small-size features. The Residual Attention Fusion module is constructed by stacking attention modules, and the attention modules at different depths produce adaptive changes in perceptual features. Each attention module is a bottom-up feed-forward structure, and features are reconstructed by residual connections between attention modules. Experiments on the challenging Tiny CityPersons, Caltech, and TinyPerson datasets show that our proposed modules achieve significant gains, with an almost 10% improvement in pedestrian average miss rate and precision compared to baseline networks.
Keyword: Cascaded convolutional neural network (CNN), Pedestrian Detection, Residual Attention, Image Processing
Cite
@inproceedings{Yu2024Cascaded,
author = {Shilong Yu and Chenhui Yang},
title = {Cascaded Feature Fusion Network for Small-size Pedestrian Detection},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {1--12},
note = {Poster Volume I}
}
- Intent-Driven Attribute-Based Outsourcing Encryption Scheme,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Ke Li, Guowei Wu, and Jun Shen
Abstract: With the development of the Internet, users are increasingly concerned about the protection of private information in their communication. As one of the important means of protecting private data, attribute-based encryption takes attributes as the credential for user decryption, which prevents private data from being leaked or tampered with. However, traditional attribute-based encryption has some problems, such as an inflexible encryption process and a heavy computing burden on users. We propose an intent-driven attribute-based outsourcing encryption scheme, which integrates user intent parameters into the encryption algorithm to improve the flexibility and reliability of the encryption process. Since edge nodes have powerful computing and storage capabilities, we outsource some encryption and decryption operations from users to edge nodes, which reduces the computing overhead of users or terminals. Because the hierarchical relationship of attributes can help users quickly match attributes, we construct the attributes as attribute trees and determine a user's decryption permission according to the hierarchical relationship between user attributes. Finally, we give a scheme analysis, including a security proof, performance cost, and functional comparison of the scheme.
Keyword: Intent-Driven, Attribute-Based Encryption, Hierarchical Attributes, Outsourced
Cite
@inproceedings{Li2024Intent,
author = {Ke Li and Guowei Wu and Jun Shen},
title = {Intent-Driven Attribute-Based Outsourcing Encryption Scheme},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {733--744},
note = {Poster Volume I}
}
- Piculet: Specialized Model-Guided Hallucination Alleviation for MultiModal Large Language Models,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Kohou Wang, Xiang Liu, Zhaoxiang Liu, Kai Wang, and Shiguo Lian
Abstract: Multimodal Large Language Models (MLLMs) have made significant progress in bridging the gap between visual and language modalities. However, hallucinations in MLLMs, where generated text does not align with image content, continue to be a major challenge. Existing methods for addressing hallucinations often rely on instruction tuning, which requires retraining the model with specific data and further increases the cost of utilizing MLLMs. In this paper, we introduce a novel training-free method, named Warbler, for enhancing the input representation of MLLMs. Warbler leverages multiple auxiliary models to extract descriptions of visual information from the input image and combines these descriptions together with the original image as input to the MLLM. We evaluate our method both quantitatively and qualitatively, and the results demonstrate that Warbler greatly decreases hallucinations of MLLMs. Our method can be easily extended to different MLLMs while being universal.
Keyword: Multimodal Large Language Models, hallucinations, training-free
Cite
@inproceedings{Wang2024Piculet,
author = {Kohou Wang and Xiang Liu and Zhaoxiang Liu and Kai Wang and Shiguo Lian},
title = {Piculet: Specialized Model-Guided Hallucination Alleviation for MultiModal Large Language Models},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {1--15},
note = {Poster Volume II}
}
- State Quantize for Pursuit Approximate Optimal Control using Reinforcement Learning,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Huanhuan Yu, Haohao Mai, Shuling Gao, Xiwen Huang, and Qiuling Yang
Abstract: In high-speed vehicle motion scenarios, solving optimal control problems faces significant challenges in terms of time and space complexity. Ensuring real-time performance of the controller requires efficient solving algorithms and support from high-performance computing platforms. To reduce the computational cost while approaching the performance of optimal control, approximate optimal control has emerged as a feasible solution. In this paper, we propose an approximate optimal vehicle control method that outperforms Model Predictive Control (MPC) in terms of performance. The method combines the pure pursuit algorithm for vehicle path tracking with the Twin Delayed DDPG (TD3) algorithm to generate approximate lookahead distance and velocity control values for the vehicle. Additionally, the vehicle state is quantized and discretized. In our experiments with a vehicle simulator, we compare MPC control with our proposed method. The results show that while MPC control remains stable only at vehicle speeds of up to 70 MPH, our method effectively controls the vehicle even at a speed of 100 MPH, with a higher control rate and greater robustness.
Keyword: approximate optimal control, pure pursuit, TD3, quantize
Cite
@inproceedings{Yu2024State,
author = {Huanhuan Yu and Haohao Mai and Shuling Gao and Xiwen Huang and Qiuling Yang},
title = {State Quantize for Pursuit Approximate Optimal Control using Reinforcement Learning},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {202--217},
note = {Poster Volume II}
}
- Content-Aware Network for Quality Estimation of Copper Scrap Granules,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Kaikai Zhao, Zhaoxiang Liu, Kai Wang, and Shiguo Lian
Abstract: To determine the quality level of copper scrap granules, existing methods have to manually identify all kinds of impurities mixed in the granules, relying on technicians' experience. In this paper, we pioneer a computer vision-based approach called Content-Aware Network (CANet) to estimate the quality of copper scrap granules. Specifically, CANet consists of a visual transformer-based backbone that extracts semantic features from copper scrap granule images, a multi-layer perceptron-based neck that explicitly estimates the volume proportion of copper in copper scrap granules and implicitly estimates the counterparts of the various impurities, and a well-designed head that directly outputs the quality result. Benefiting from our novel architecture and loss functions, CANet can be trained in an end-to-end manner to accurately estimate the quality of copper scrap granules with only binary annotated images (copper area and non-copper area), without identifying the unknown impurities and their densities in advance. Experiments on real copper scrap granule datasets demonstrate the effectiveness and superiority of our proposed method.
Keyword: Copper scrap granules, Quality level, Visual transformer, Content-Aware
Cite
@inproceedings{Zhao2024Content,
author = {Kaikai Zhao and Zhaoxiang Liu and Kai Wang and Shiguo Lian},
title = {Content-Aware Network for Quality Estimation of Copper Scrap Granules},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {16--27},
note = {Poster Volume II}
}
- Facial Expression Recognition Via Multi Semantic Diffusion Model on Imbalanced Datasets,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Ling Zhang and Junlan Dong
Abstract: This paper presents a novel facial expression recognition approach based on learning multiple semantic taxonomies on imbalanced datasets. Recent studies on imbalanced data mostly concern how to homogenize the data volume between different categories, presenting strategies such as minority over-sampling and majority balance cascading. In this paper, we instead pay more attention to the high-level semantic characterization of facial expressions, using more discriminative and conceptual attributes to describe samples in the case of unbalanced sets. To fully exploit the semantic information contained in the small-volume samples, we develop an Analytic Hierarchical Model (AHM) method based on facial Action Units (AUs) to enforce a discriminative mapping from the image feature space to a multi-semantic space with taxonomic relations. We apply convolutional neural networks to capture low-level image features and then use a dictionary learning algorithm to reconstruct images in the semantic space, in order to prevent deviation from individual identity. Experiments performed on the RAF-DB, FER2013, and SFEW expression databases show that the proposed method is robust for facial expression recognition in the wild.
Keyword: semantic diffusion, imbalanced dataset, Facial Action Coding System (FACS), conceptual taxonomies, Analytic Hierarchical Model (AHM)
Cite
@inproceedings{Zhang2024Facial,
author = {Ling Zhang and Junlan Dong},
title = {Facial Expression Recognition Via Multi Semantic Diffusion Model on Imbalanced Datasets},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {453--463},
note = {Poster Volume I}
}
- Industrial Internet of Things Intrusion Detection System Based on Federated Learning,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Teng Fang and Lina Ge
Abstract: With the rapid development of the Industrial Internet of Things (IIoT), its security has become a focus of attention. Traditional centralized intrusion detection systems (IDS) face challenges of privacy leakage and high communication overhead in IIoT environments. This article proposes an IIoT intrusion detection system based on federated learning (FL-IDS). The system introduces Paillier homomorphic encryption technology to enhance the security of data transmission, uses Bi-LSTM to extract network traffic data features, and uses a Transformer for model training. The experimental results show that our system outperforms other models in terms of detection rate and false alarm rate. This framework effectively improves the accuracy of intrusion detection, reduces communication bandwidth requirements, and protects user privacy while ensuring model convergence.
Keyword: Industrial Internet of Things, Intrusion Detection, Federated Learning, Bi-LSTM, Transformer
Cite
@inproceedings{Fang2024Industrial,
author = {Teng Fang and Lina Ge},
title = {Industrial Internet of Things Intrusion Detection System Based on Federated Learning},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {218--235},
note = {Poster Volume II}
}
- PAEN: Efficient Pillar-based 3D Object Detector Based on Attention and Dilated Convolution,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Jia Wen, Guanghao Zhang, Qi Zhang, Kelun Tian, and Kejun Ren
Abstract: Pillar-based 3D object detectors can complete scene-sensing tasks efficiently and quickly, meeting the basic real-time detection needs of an autonomous driving perception module. In this paper, we propose a Pillar Sequence Attention Encoder and a Dilated Expansion Convolution Network. The former addresses the coarse encoding methods and limited encoding information of the pillar encoding stage, while the latter tackles the insufficient receptive fields of the backbone network. Specifically, the Pillar Sequence Attention Encoder uses a Pillar Sequence Attention (PSA) module to capture attention information among points in the local region of a pillar and utilizes a Pillar Feature Soft Aggregation (PFSA) module to finely aggregate information from the points within a pillar. The Dilated Expansion Convolution Network leverages dilated convolutions to capture both sparse and dense feature information over wide-ranging receptive fields. We conducted experiments on the KITTI dataset to validate the performance of our model and the effectiveness of the proposed modules. Experiments show that our method achieved a mean average precision (mAP) of 81.48% for the car category, surpassing the baseline model by 3.12%, while the inference time only increases by about 10 ms.
Keyword: 3D object detection, LiDAR, Pillar detector, Attention module, Dilated convolution
Cite
@inproceedings{Wen2024PAEN,
author = {Jia Wen and Guanghao Zhang and Qi Zhang and Kelun Tian and Kejun Ren},
title = {PAEN: Efficient Pillar-based 3D Object Detector Based on Attention and Dilated Convolution},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {28--39},
note = {Poster Volume II}
}
- FastHDRNet: A new efficient method for SDR-to-HDR Translation,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Tian Siyuan, Wang Hao, Rong Yiren, Wang Junhao, Dai Renjie, and He Zhengxiao
Abstract: Modern displays possess the capability to render video content with a high dynamic range (HDR) and a wide color gamut (WCG). However, the majority of available resources are still in standard dynamic range (SDR), so we need an effective methodology for converting them. Existing deep neural network (DNN)-based SDR-to-HDR conversion methods outperform conventional methods, but they are either too large to deploy or generate objectionable artifacts. We propose a neural network for SDRTV-to-HDRTV conversion, termed "FastHDRNet". This network includes two parts, Adaptive Universal Color Transformation and Local Enhancement. The architecture is designed as a lightweight network that utilizes global statistics and local information with very high efficiency. Our experiments show that the proposed method achieves state-of-the-art performance in both quantitative comparisons and visual quality, with a lightweight structure and an enhanced inference speed.
Keyword: Inverse Tonemapping, Channel Selection Normalization, Image Processing
Cite
@inproceedings{Tian2024FastHDRNet,
author = {Tian Siyuan and Wang Hao and Rong Yiren and Wang Junhao and Dai Renjie and He Zhengxiao},
title = {FastHDRNet: A new efficient method for SDR-to-HDR Translation},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {464--479},
note = {Poster Volume I}
}
- Transformer in Touch: A Survey,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Jing Gao, Ning Cheng, Bin Fang, and Wenjuan Han
Abstract: The Transformer model, which initially achieved significant success in the field of natural language processing, has recently shown great potential in tactile perception. This review aims to comprehensively outline the application and development of Transformers in tactile technology. We first introduce the two fundamental concepts behind the success of the Transformer: the self-attention mechanism and large-scale pre-training. Then, we delve into the application of Transformers in various tactile tasks, including but not limited to object recognition, cross-modal generation, and object manipulation, offering a concise summary of the core methodologies, performance benchmarks, and design highlights. Finally, we suggest potential areas for further research and future work, aiming to generate more interest within the community, tackle existing challenges, and encourage the use of Transformer models in the tactile field.
Keyword: Tactile, Self-attention, Transformers, Self-supervision
Cite
@inproceedings{Gao2024Transformer,
author = {Jing Gao and Ning Cheng and Bin Fang and Wenjuan Han},
title = {Transformer in Touch: A Survey},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {13--39},
note = {Poster Volume I}
}
- SCD-YOLO: A security detection model for X-ray images based on the improved YOLOv5s,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Xiaotong Kong, Aimin Li, Wenqiang Li, Zhiyao Li, and Yuechen Zhang
Abstract: X-ray security inspection is widely used in subways, high-speed rail, airports, key locations, logistics, and other scenarios. However, because of the complexity and diversity of objects in real-world X-ray images, it is easy for security personnel to make mistakes or miss items when they are fatigued or not fully focused. In this paper, we propose an improved model based on YOLOv5 to help security inspectors improve the efficiency of security inspection procedures. First, we replaced the SPP (spatial pyramid pooling) feature fusion module with SPPFCSPC to further enhance the feature extraction capability. Then, we added CoordConv before each feature map is input to the detection head; this enables the model to perceive positional information and enhances its feature extraction capability, effectively addressing the detection of small prohibited items in complex backgrounds. Finally, we used a decoupled detection head instead of the traditional coupled detection head to separate the classification and localization tasks, which further improves the detection speed. The experimental results show that our method achieves 77% accuracy. Compared with state-of-the-art methods, our model also achieves significant improvements in detection accuracy and recall.
Keyword: security object detection, X-ray, YOLOv5s, neural network
Cite
@inproceedings{Kong2024SCD,
author = {Xiaotong Kong and Aimin Li and Wenqiang Li and Zhiyao Li and Yuechen Zhang},
title = {SCD-YOLO: A security detection model for X-ray images based on the improved YOLOv5s},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {40--52},
note = {Poster Volume II}
}
- Imputing Missing Temperature Data of Meteorological Stations Based on Global Spatiotemporal Attention Neural Network,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Tianrui Hou, Xinshuai Guo, Li Wu, Xiaoying Wang, Guojing Zhang, and Jianqiang Huang
Abstract: Imputing missing temperature data from meteorological stations is necessary and valuable for researchers to analyze climate change and predict related natural disasters. Prior research often used interpolation-based methods, which largely ignore the temporal correlation within a station itself. Recently, researchers have attempted to leverage deep learning techniques; however, these models cannot fully utilize the spatiotemporal correlation in meteorological station data. Therefore, this paper proposes a global spatiotemporal attention neural network (GSTA-Net), which consists of two sub-networks: a global spatial attention network and a global temporal attention network. The global spatial attention network primarily addresses the global spatial correlations among meteorological stations, while the global temporal attention network predominantly captures the global temporal correlations inherent in station data. To further exploit the spatiotemporal information in meteorological station data, adaptive weighting is applied to the outputs of the two sub-networks, thereby enhancing the imputation performance. Additionally, a progressive gated loss function has been designed to guide and accelerate GSTA-Net's convergence. Finally, GSTA-Net has been validated through a large number of experiments on the public datasets TND and QND with missing rates of 25%, 50%, and 75%, respectively. The experimental results indicate that GSTA-Net outperforms the latest models, including Linear, NLinear, DLinear, PatchTST, and STA-Net, on both the mean absolute error (MAE) and the root mean square error (RMSE) metrics.
Keyword: Attention mechanism, Deep learning, Neural network, Missing data imputing, Meteorological station data, Spatiotemporal correlation
Cite
@inproceedings{Hou2024Imputing,
author = {Tianrui Hou and Xinshuai Guo and Li Wu and Xiaoying Wang and Guojing Zhang and Jianqiang Huang},
title = {Imputing Missing Temperature Data of Meteorological Stations Based on Global Spatiotemporal Attention Neural Network},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {40--56},
note = {Poster Volume I}
}
- Few-Shot Constraint Enhancement Based on Generative Adversarial Networks,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Xin Song, Yanqing Song, Jianguo Chen, and Long Chen
Abstract: A constrained theoretical model for Generative Adversarial Networks (GANs) is proposed. To address issues such as overfitting, convergence difficulties, and mode collapse in the GAN training process, a GAN structure and process-constrained training based on Directed Graphical Models (DGM) is first introduced to solve the instability and quality issues of generated samples. Then, a static constraint method is proposed, which calculates the similarity of an interpretable measurement (EMS) and the final classification metrics of generated data on different classifiers by setting the topology of D and G, and measures the constraint strength through EMS to suppress overfitting during the generation process. Furthermore, constraining label-shared features and weight updates effectively reduces the probability of mode collapse by appropriately restricting the role of label information in generation. These constraints on the GAN solve the problem of effective sample enhancement.
Keyword: Few-shot, Constraint Enhancement, Generative Adversarial Networks
Cite
@inproceedings{Song2024FewShot,
author = {Xin Song and Yanqing Song and Jianguo Chen and Long Chen},
title = {Few-Shot Constraint Enhancement Based on Generative Adversarial Networks},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {57--69},
note = {Poster Volume I}
}
- Skeleton-Based Actions Recognition with Significant Displacements,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Chengming Liu, Jiahao Guan, and Haibo Pang
Abstract: In the realm of human skeleton-based action recognition, graph convolutional networks have proven successful. However, directly storing coordinate features in the graph structure makes it difficult to achieve shift, scale, and rotation invariance, which is crucial for actions with significant displacements, such as figure skating: the significant displacements of athletes relative to the camera, together with the inherent perspective effects, lead to variations in scale, position, and rotation-related features. To address this, drawing inspiration from leveraging high-order information, we propose a novel cosine stream. This stream utilizes the bending angles of human joints for skeleton-based action recognition. Furthermore, we introduce a new keyframe downsampling algorithm that significantly improves model performance. Notably, our approach does not necessitate any modifications to the backbone. Through extensive experiments on three datasets, FSD-10, FineGYM, and NTU RGB+D, our approach demonstrates improved recognition of actions with significant displacement compared to current mainstream methods.
Keyword: Action Recognition, Skeleton, Angle, Figure skating
Cite
@inproceedings{Liu2024Skeleton,
author = {Chengming Liu and Jiahao Guan and Haibo Pang},
title = {Skeleton-Based Actions Recognition with Significant Displacements},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {480--491},
note = {Poster Volume I}
}
- DFE-IANet: A Method for Polyp Image Classification Based on Dual-domain Feature Extraction and Interaction Attention,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Wei Wang, Jixing He, and Xin Wang
Abstract: Detecting and treating polyps in the gastrointestinal tract early is helpful in preventing colorectal cancer. However, there have been few studies to date on designing polyp image classification networks that balance efficiency and accuracy. This challenge is mainly attributed to the fact that polyps are similar to other pathologies and have complex features influenced by texture, color, and morphology. In this paper, we propose a novel network, DFE-IANet, based on both spectral transformation and feature interaction. Firstly, to extract detailed and multi-scale features, the features are transformed by the multi-scale frequency domain feature extraction (MSFD) block, which extracts texture details at the fine-grained level in the frequency domain. Secondly, the multi-scale interaction attention (MSIA) block is designed to enhance the network's capability of extracting critical features. This block introduces multi-scale features into self-attention, aiming to adaptively guide the network to concentrate on vital regions. Finally, with a compact parameter count of only 4M, DFE-IANet outperforms the latest and classical networks in terms of efficiency. Furthermore, DFE-IANet achieves state-of-the-art (SOTA) results on the challenging Kvasir dataset, demonstrating a remarkable Top-1 accuracy of 93.94%. This outstanding accuracy surpasses ViT by 8.94%, ResNet50 by 1.69%, and VMamba by 1.88%. The code is publicly available at https://anonymous.4open.science/r/DFE-IANet-FABE.
Keyword: Polyp image classification, spectral transformation, feature interaction, multi-scale
Cite
@inproceedings{Wang2024DFE,
author = {Wei Wang and Jixing He and Xin Wang},
title = {DFE-IANet: A Method for Polyp Image Classification Based on Dual-domain Feature Extraction and Interaction Attention},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {492--508},
note = {Poster Volume I}
}
- A Lightweight Dual-Channel Multimodal Emotion Recognition Network Using Facial Expressions and Eye Movements,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Mengcheng Ji, Fulan Fan, Xin Nie, and Yahong Li
Abstract: Emotional understanding plays a crucial role in various fields related to human-computer interaction, affective computing, and human behavior analysis. However, traditional single-modal methods often struggle to capture the complexity and subtleties of emotional states. With the advances in eye-tracking and facial expression recognition technology, eye movements and facial expressions provide complementary insight, so we combine the two to conduct emotion research. Combining these two types of information describes the emotional experience of individuals more comprehensively and accurately and improves upon single-modality methods. Because human emotional changes require event induction, the events and methods of emotion induction are extremely important. We also present a data collection experiment grounded in emotion theory from psychology: we selected three types of emotion-activating images (positive, neutral, and negative) from the Chinese Affective Picture System (CAPS). We design a system to extract features from the collected data, fusing the multimodal eye-tracking and facial expression signals. This system is our proposed dual-channel multimodal emotion recognition lightweight network, a VGG-inspired LightNet using a convolutional neural network (CNN). This model achieved an accuracy rate of 96.25% in tests using our gathered data.
Keyword: Multimodal, Facial expressions, Eye-tracking, Feature fusion, Emotional recognition
Cite
@inproceedings{Ji2024Lightweight,
author = {Mengcheng Ji and Fulan Fan and Xin Nie and Yahong Li},
title = {A Lightweight Dual-Channel Multimodal Emotion Recognition Network Using Facial Expressions and Eye Movements},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {53--67},
note = {Poster Volume II}
}
- A Multi-subject Classification Algorithm Based on SVM Geometric Interpretation,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Chi Tang
Abstract: A new multi-subject classification algorithm based on the support vector machine (SVM) is proposed. For each class of training samples, a minimum convex hull surrounding as many samples as possible is constructed in the feature space using the soft SK algorithm, and finally a multi-subject classifier composed of multiple convex hulls is obtained. For a sample to be classified, its classes are determined according to the convex hulls in which it is located. If it is not in any convex hull, a membership degree is first determined by the distance from the sample to the centroid of each class's samples, and then the class to which it belongs is determined according to the membership degree. Classification experiments are carried out on the standard Reuters-21578 dataset, and the classification performance is compared with the hyperellipsoid SVM classification algorithm. The experimental results show that, compared with the hyperellipsoid SVM classification algorithm, the proposed algorithm ensures the inheritability of the classifier and significantly improves classification accuracy, effectively mitigating the influence of the sample distribution shape on classification performance.
Keyword: Multi-subject classification, Convex hull, Schlesinger-Kozinec algorithm, Support vector machine
Cite
@inproceedings{ICIC2024,
author = {Chi Tang},
title = {A Multi-subject Classification Algorithm Based on SVM Geometric Interpretation},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {236-244},
note = {Poster Volume Ⅱ}
}
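As a rough illustration of the decision rule the abstract describes (not the authors' actual SK-based implementation — the function names, the use of a Delaunay triangulation for the hull test, and the inverse-distance membership are all assumptions for this sketch):

```python
import numpy as np
from scipy.spatial import Delaunay

def in_hull(points, x):
    """True if x lies inside (or on) the convex hull of `points`."""
    return Delaunay(points).find_simplex(x) >= 0

def classify(sample, class_samples):
    """Classify `sample` given class_samples: dict label -> (n_i, d) array.

    A sample inside one or more class hulls takes those classes
    (multi-subject, so several labels are possible). Otherwise, fall
    back to the class whose sample centroid is nearest.
    """
    hits = [c for c, pts in class_samples.items() if in_hull(pts, sample)]
    if hits:
        return hits
    # Fallback: membership degree decreases with distance to each centroid,
    # so the nearest centroid wins.
    dists = {c: np.linalg.norm(sample - pts.mean(axis=0))
             for c, pts in class_samples.items()}
    return [min(dists, key=dists.get)]
```

A sample covered by several hulls is assigned all of those subjects, which is what distinguishes multi-subject classification from the usual single-label SVM setting.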
- A Unified Model for Unimodal and Multimodal Rumor Detection,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Haibing Zhou, Zhong Qian, Peifeng Li, and Qiaoming Zhu
Abstract: Rumor detection aims to determine the truthfulness of a post, whether it is unimodal (plain text) or multimodal (text and images). However, previous models consider only one of these situations, ignoring the possibility of both occurring simultaneously. Additionally, previous multimodal models often fail to address the inconsistency between texts and images, which can introduce noise and harm performance. To address these issues, we propose a novel unified model for unimodal and multimodal rumor detection, called the Graph Attention Generative Image Network (GAGIN), which integrates multimodal alignment. The experimental results on two popular datasets demonstrate that GAGIN outperforms the state-of-the-art baselines.
Keyword: Unified model, Rumor detection, Graph attention network, Diffusion model, CLIP model
Cite
@inproceedings{ICIC2024,
author = {Haibing Zhou and Zhong Qian and Peifeng Li and Qiaoming Zhu},
title = {A Unified Model for Unimodal and Multimodal Rumor Detection},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {611-622},
note = {Poster Volume Ⅱ}
}
- Deblurring via Video Diffusion Models,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Yan Wang and Haoyang Long
Abstract: Video deblurring poses a significant challenge due to the intricate nature of blur, which often arises from a confluence of factors such as camera shake, object motion, and variations in depth. Diffusion models and video diffusion models have achieved remarkable results in image and video generation, respectively. In particular, Diffusion Probabilistic Models (DPMs) have been successfully utilized for image deblurring, indicating the vast potential of video diffusion models in the realm of video deblurring. However, due to the significant data and training-time requirements of diffusion models, the prospects of video diffusion models for video deblurring tasks remain uncertain. To investigate their feasibility, this paper proposes a diffusion model specifically tailored for this task. Its model structure and some parameters are based on a pre-trained text-to-video diffusion model, and through a two-stage training process it can accomplish video deblurring with a relatively small number of training parameters and a small amount of data. Furthermore, this paper compares the performance of the proposed model with baseline models and achieves state-of-the-art results.
Keyword: Computer vision, Video deblurring, Diffusion model
Cite
@inproceedings{ICIC2024,
author = {Yan Wang and Haoyang Long},
title = {Deblurring via Video Diffusion Models},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {509-520},
note = {Poster Volume Ⅰ}
}
- Double Global and Local Information-based Image Inpainting,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Shibin Wang, Wenjie Guo, Shiying Zhang, Xuening Guo, and Jiayi Guo
Abstract: With the development of deep learning, significant progress has been made in image inpainting. Deep learning-based image inpainting methods can generate visually plausible results; however, the inpainted images may contain distortions or artifacts, especially at boundaries and in high-texture regions. To address these issues, we propose an improved two-stage inpainting model that exploits both local and global information. In the first stage, a Local Binary Pattern (LBP) learning network based on the U-Net architecture is employed to accurately predict the semantic and structural information of the missing regions. In the second stage, a double local-and-global network based on a spatial attention module and a Double-PatchGAN Discriminator (DPD) is proposed for further refinement. To achieve accurate, realistic, and high-quality inpainting results, Multiple Loss Functions (MLF) are designed to strengthen information at different levels. Extensive experiments conducted on public datasets, including CelebA-HQ, Places2, and Paris StreetView, demonstrate that our model outperforms several existing methods in terms of image inpainting.
Keyword: Deep learning, Image inpainting, Local Binary Pattern, Double-PatchGAN Discriminator, Multiple Loss Functions
Cite
@inproceedings{ICIC2024,
author = {Shibin Wang and Wenjie Guo and Shiying Zhang and Xuening Guo and Jiayi Guo},
title = {Double Global and Local Information-based Image Inpainting},
booktitle = {Proceedings of the 20th International Conference on Intelligent Computing (ICIC 2024)},
month = {August},
date = {5-8},
year = {2024},
address = {Tianjin, China},
pages = {521-537},
note = {Poster Volume Ⅰ}
}
- WIDDAS: A Word-Importance-Distribution-based Detection method against Word-Level Adversarial Samples,
ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Authors: Xiangge Li, Hong Luo, and Yan Sun
Abstract: Deep neural networks face security threats from adversarial samples, and even the most advanced large-scale language models remain vulnerable to adversarial attacks. Moreover, existing defense methods against adversarial attacks suffer from issues such as low detection accuracy, excessive false detection of clean data, and high defense costs. Therefore, in this paper we propose WIDDAS, a Word-Importance-Distribution-based Detection method against Word-Level Adversarial Samples. It comprises a detection module and an evaluation module. The detection module swiftly identifies potential adversarial samples based on the word importance distribution of the input text. The evaluation module then attempts to restore those samples and evaluates whether they are adversarial, thereby filtering out clean (non-adversarial) data. Experimental results demonstrate that WIDDAS outperforms