Clustering-based Self-Supervised Multi-Scale Generative Adversarial Network for Data Imputation

Authors: Yi Xu, Xuhui Xing, Anchi Chen, Yang Liu
Conference: ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Pages: 341-358
Keywords: Missing Data, Generative Adversarial Networks, Clustering.

Abstract

Missing data is a long-standing challenge in machine learning. The Generative Adversarial Imputation Network (GAIN) has been shown to outperform many existing solutions. However, GAIN has two limitations: first, it does not consider the correlations among input samples; second, it imputes using only the adversarial loss and the reconstruction loss of non-missing values, without considering the reconstruction loss of missing values. To address these issues, this paper proposes CCGAIN, a clustering-based self-supervised multi-scale generative adversarial network for data imputation. First, the dataset to be imputed is clustered, and subsequent imputation is performed on the samples within each cluster. Then, local-scale data is constructed for each cluster from the features with low missing rates. Next, the imputation results for local-scale missing values are used as supervision for global-scale missing-value imputation, which yields a reconstruction loss for global-scale missing values. Finally, imputation is performed at the global scale using the reconstruction loss of missing values, the reconstruction loss of non-missing values, and the adversarial loss. Experimental results demonstrate the effectiveness of this method.
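
The three loss terms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `max_missing_rate` threshold, the weights `alpha` and `beta`, and the per-term normalization are all assumptions made for illustration, and the clustering step and the networks themselves are omitted. The adversarial term follows the form commonly used for the GAIN generator (log-probability assigned by the discriminator to imputed entries).

```python
import numpy as np

EPS = 1e-8  # numerical guard for the logarithm


def low_missing_features(mask, max_missing_rate=0.2):
    """Indices of features whose missing rate is at most the threshold.

    mask: (n, d) array, 1 = observed, 0 = missing.
    The threshold is a hypothetical hyperparameter, not taken from the paper.
    """
    missing_rate = 1.0 - mask.mean(axis=0)
    return np.flatnonzero(missing_rate <= max_missing_rate)


def masked_mse(a, b, weight):
    """Mean squared error over the entries selected by a 0/1 weight array."""
    return float((weight * (a - b) ** 2).sum() / max(weight.sum(), 1.0))


def global_generator_loss(x, x_hat, mask, d_prob, local_imputed, local_cols,
                          alpha=10.0, beta=1.0):
    """Combine the three terms for one cluster (illustrative weighting).

    x             : (n, d) data, missing entries filled with any placeholder
    x_hat         : (n, d) global-scale generator output
    d_prob        : (n, d) discriminator estimate that each entry is observed
    local_imputed : (n, k) local-scale imputations for columns local_cols,
                    used as pseudo-labels for the missing entries
    """
    missing = 1.0 - mask
    # Adversarial term: the generator wants imputed entries judged "observed".
    adv = float(-(missing * np.log(d_prob + EPS)).sum() / max(missing.sum(), 1.0))
    # Reconstruction loss on non-missing values.
    rec_obs = masked_mse(x, x_hat, mask)
    # Reconstruction loss on missing values, supervised by local-scale results.
    rec_mis = masked_mse(local_imputed, x_hat[:, local_cols], missing[:, local_cols])
    return adv + alpha * rec_obs + beta * rec_mis, (adv, rec_obs, rec_mis)


# Toy cluster: two samples, two features, one missing entry at (0, 1).
mask = np.array([[1.0, 0.0], [1.0, 1.0]])
x = np.array([[1.0, 0.0], [2.0, 3.0]])
x_hat = np.array([[1.0, 2.0], [2.0, 3.0]])
d_prob = np.full((2, 2), 0.5)
local_cols = np.array([1])
local_imputed = np.array([[2.5], [3.0]])

total, (adv, rec_obs, rec_mis) = global_generator_loss(
    x, x_hat, mask, d_prob, local_imputed, local_cols
)
```

In this sketch the loss would be evaluated separately for each cluster, so that the generator only sees samples grouped together by the clustering step.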