MetaCleaner: A Deep Neural Network for Phage Recognition with Denoising

Authors: Yingqi Liu, Yong Wang, and Ying Wang
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 1168-1185
Keywords: Deep learning · Metagenomics · Denoising · Phage identification · Noise

Abstract

Deep learning models for phage genome sequence recognition currently face two problems: contamination by human genome sequences and noise interference. To address these problems, we propose MetaCleaner, a phage genome sequence recognition model. MetaCleaner uses k-mer counts as the basis for classifying genome sequences, and extracts k-mer count features with parallel convolution filters and average pooling. A denoising module built on the transformer architecture predicts the difference between the k-mer count features of the noisy sequence and those of the noise-free sequence; denoising is completed by subtracting this predicted difference from the k-mer count features of the noisy sequence. Finally, the denoised k-mer count features are fed into a fully connected layer to obtain the probabilities that the sequence belongs to phage or to human. Experiments on noisy test sets show that MetaCleaner is robust to noise, and experiments on real metagenomic datasets show that MetaCleaner outperforms recently proposed phage recognition models.
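
The two core ideas in the abstract, k-mer counting and denoising by subtracting a predicted difference, can be illustrated with a minimal Python sketch. This is not the authors' implementation: MetaCleaner extracts k-mer count features with convolution filters and average pooling and predicts the difference with a transformer, whereas the sketch below counts k-mers explicitly and takes the predicted difference as a given input. The function names `kmer_counts` and `denoise`, the choice of k, and the restriction to the alphabet ACGT are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k=3):
    """Return a fixed-length vector of k-mer counts.

    The vector has one entry per possible k-mer over the alphabet ACGT,
    in lexicographic order. K-mers containing other symbols (e.g. N)
    are ignored. (Hypothetical helper, not from the paper.)
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts.get(kmer, 0) for kmer in vocab]


def denoise(noisy_features, predicted_difference):
    """Subtract the predicted (noisy - clean) difference from the noisy features.

    In MetaCleaner the difference is predicted by a transformer module;
    here it is simply passed in. (Hypothetical helper, not from the paper.)
    """
    return [n - d for n, d in zip(noisy_features, predicted_difference)]


if __name__ == "__main__":
    features = kmer_counts("ACGTAC", k=2)
    print(len(features), sum(features))
    print(denoise([3, 1, 2], [1, 0, 2]))
```

Because the module predicts the difference rather than the clean features directly, a perfect prediction of zero leaves an already clean input unchanged, which is the usual motivation for residual formulations of this kind.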