A New Bibliometrics Analysis Method for Imbalanced Classes and New Classes in the Domain of Biomedical Literature

Authors: Yangde Lin, Zhiyuan Hu, Xiaoran He, Sujuan Liu, Jianrong Li
Conference: ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Pages: 424-438
Keywords: bibliometrics analysis, graph neural networks, lifelong learning, imbalanced classes, new classes.

Abstract

In biomedical research, processing and understanding large volumes of academic literature quickly and accurately is critical to advancing the field. In large-scale data analysis, conventional graph neural networks (GNNs) have several shortcomings: they struggle to capture changes in dynamic graph data, and under class imbalance they are biased toward the majority classes, which weakens their ability to recognize minority classes. To address these issues, this study introduces the gDOC method into biomedical literature analysis. It also proposes a lifelong learning framework, termed Biology Dynamic Graph Neural Network (BDGNN), which integrates GNNs to leverage their robust data representation capabilities. Furthermore, BDGNN incorporates the Focal Loss function and a temporal variance metric into the gDOC method. This enables the model to be adjusted dynamically based on the temporal characteristics of the graph data, so that the amount of historical data used in training can be better adapted to the dynamic nature of biomedical literature citation networks. In the experimental phase, this study designs data preprocessing and data adaptation strategies tailored to the PubMed dataset and implements the BDGNN method on a variety of typical GNN models. By varying the historical data size and labeling rate, the performance of the models on new-class and imbalanced-class problems is comprehensively evaluated. The experimental results confirm that, compared with existing techniques, the accuracy of the framework improves up to 89% on imbalanced-class and new-class recognition tasks in biomedical literature.
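
The abstract names Focal Loss as the component that counters class imbalance, without giving its exact form in BDGNN. As a rough illustration only, the sketch below uses the standard multi-class focal loss, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), in plain Python. The function names, the default `gamma` and `alpha` values, and the example logits are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Standard focal loss for one example (illustrative, not the paper's code).

    p_t is the predicted probability of the true class. The factor
    (1 - p_t) ** gamma shrinks the loss of well-classified examples, so
    hard examples (often from minority classes) dominate the total loss.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Plain cross-entropy, which is focal loss with gamma = 0 and alpha = 1."""
    return focal_loss(logits, target, gamma=0.0, alpha=1.0)


# Hypothetical logits for a 3-class node classification problem.
example_logits = [4.0, 0.0, 0.0]
easy = 0  # the model is confident and correct for this class
hard = 1  # the model assigns this class a low probability

print("cross-entropy easy/hard:",
      cross_entropy(example_logits, easy), cross_entropy(example_logits, hard))
print("focal loss    easy/hard:",
      focal_loss(example_logits, easy), focal_loss(example_logits, hard))
```

With `gamma = 0` the expression reduces to ordinary cross-entropy. Larger `gamma` values down-weight confidently classified examples more strongly, which is the mechanism that shifts training emphasis toward under-represented classes.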