MECFE: A Novel Consensus Feature Engineering Approach for Enhanced Diabetes Risk Prediction

Authors: Jijun Tong, Lisi Ye, Congcong Yang, Yang Chen, Yuqiang Shen, Qingli Zhou, and Shudong Xia
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 1231-1244
Keywords: Diabetes Prediction, Feature Engineering, Heterogeneous Model, LightGBM, ClinicalBERT, Interpretability

Abstract

Diabetes mellitus has emerged as a global health crisis, with its prevalence rising sharply and placing significant strain on healthcare systems. Early and accurate prediction of diabetes risk is crucial for effective prevention and management. While machine learning and deep learning techniques have advanced diabetes prediction, feature engineering remains underemphasized and faces several challenges, particularly in interpretability, depth, and stability. This study introduces a novel Medical Enhancement Consensus Feature Engineering (MECFE) approach to improve the accuracy and interpretability of diabetes detection. MECFE integrates medical knowledge with data-driven approaches through two core modules: Medical-Data Collaborative Feature Construction (MD-CFC) and Heterogeneous Model Consensus Feature Selection (HM-CFS). MD-CFC enriches feature construction using structured medical knowledge and ClinicalBERT, while HM-CFS uses a three-layer weighted fusion strategy that combines evaluations from heterogeneous models, enhancing stability and clinical relevance. Applied after data preprocessing, MECFE improves the quality of model inputs and thereby model performance. The results show significant improvements in the performance metrics of all models, with Bayesian-optimized LightGBM achieving the best results: R² increases by 0.108, RMSE decreases by 23.66%, MSE decreases by 41.75%, and MAPE decreases by 16.42%, demonstrating the effectiveness of MECFE. Additionally, LightGBM feature importance analysis and regression trees are used to further enhance the model's interpretability, providing deeper insight into the factors influencing diabetes risk.
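The abstract does not spell out the three layers of HM-CFS, so the following is only a minimal sketch of the general idea of consensus feature selection across heterogeneous models, not the authors' method. It assumes each model supplies one importance score per feature, rescales those scores to a common range, and fuses them with a weighted average. The feature names, model names, scores, and weights are all hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {feature: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: the model expresses no preference.
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def consensus_scores(model_scores, model_weights):
    """Fuse per-model feature scores into one weighted-average score per feature."""
    total_weight = sum(model_weights[model] for model in model_scores)
    fused = {}
    for model, scores in model_scores.items():
        share = model_weights[model] / total_weight
        for name, value in min_max_normalize(scores).items():
            fused[name] = fused.get(name, 0.0) + share * value
    return fused


def select_top_k(fused, k):
    """Return the k features with the highest fused score, best first."""
    return sorted(fused, key=fused.get, reverse=True)[:k]


# Hypothetical importance scores from three different kinds of model.
model_scores = {
    "tree_model": {"glucose": 0.50, "bmi": 0.30, "age": 0.15, "noise": 0.05},
    "linear_model": {"glucose": 2.0, "bmi": 0.5, "age": 1.0, "noise": 0.0},
    "mutual_info": {"glucose": 0.40, "bmi": 0.20, "age": 0.10, "noise": 0.00},
}
# Hypothetical trust assigned to each model's opinion.
model_weights = {"tree_model": 0.5, "linear_model": 0.3, "mutual_info": 0.2}

fused = consensus_scores(model_scores, model_weights)
selected = select_top_k(fused, k=3)
print(selected)
```

Normalizing before fusing keeps a model with large raw scores (here the hypothetical linear model) from dominating the consensus purely because of its scale.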