CFMT: A Music Transcription Model using Conformer Architecture

Authors: Yulong Wang, Department of Software Engineering, Nankai University, China, 2120220692@mail.nankai.edu.cn; Hailong Yu, Department of Software Engineering, Nankai University, China, 2120220695@mail.nankai.edu.cn; Fengchi Sun, Department of Software Engineering, Nankai University, China, fengchisun@nankai.edu.cn; Jianyu Zhou, Department of Software Engineering, Nankai University, China, jyzhou@nankai.edu.cn
Conference: ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Pages: 412-427
Keywords: Automatic music transcription, Conformer, Note events

Abstract

Automatic music transcription is a fundamental process for converting audio recordings of musical compositions into symbolic representations. This study extends the current state of the art in automatic music transcription with a specific focus on leveraging Conformer models, known for their exceptional performance in speech recognition. Furthermore, this research introduces an innovative approach designed to address the longstanding challenge of missing note-end events, which has previously hindered the accurate evaluation of Seq2Seq models using frame-wise metrics. Empirical findings reveal that a slightly modified Conformer model surpasses existing models across a spectrum of evaluation metrics, even outperforming models trained on distinct iterations of the MAESTRO dataset. Notably, this research contributes to the enhancement of frame-wise evaluation metrics for Seq2Seq models by providing estimations of possible note lengths for ongoing musical notes, resulting in a substantial improvement in evaluation accuracy.
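
To make the frame-wise evaluation issue concrete, the following is a minimal illustrative sketch, not the authors' method. It assumes a note is a (pitch, onset, offset) tuple whose offset is None when the model emitted no note-end event, and it fills such gaps with a fixed placeholder duration (`default_len`), capped at the next onset of the same pitch and at the end of the clip. The abstract does not describe the paper's actual estimation procedure, so the function names, the frame rate, and the fill rule here are all hypothetical.

```python
from typing import List, Optional, Set, Tuple

# (MIDI pitch, onset in seconds, offset in seconds or None if the note-end is missing)
Note = Tuple[int, float, Optional[float]]


def fill_missing_offsets(notes: List[Note], clip_end: float,
                         default_len: float = 0.5) -> List[Tuple[int, float, float]]:
    """Assumed fill rule: offset = onset + default_len, capped at the next
    onset of the same pitch and at the end of the clip."""
    filled = []
    for pitch, onset, offset in notes:
        if offset is None:
            later_onsets = [o for p, o, _ in notes if p == pitch and o > onset]
            limit = min(later_onsets + [clip_end])
            offset = min(onset + default_len, limit)
        filled.append((pitch, onset, offset))
    return filled


def active_frames(notes: List[Tuple[int, float, float]],
                  fps: int = 100) -> Set[Tuple[int, int]]:
    """Return the set of (frame index, pitch) cells in which a note is sounding."""
    cells = set()
    for pitch, onset, offset in notes:
        for frame in range(round(onset * fps), round(offset * fps)):
            cells.add((frame, pitch))
    return cells


def frame_f1(reference: Set[Tuple[int, int]], estimate: Set[Tuple[int, int]]) -> float:
    """Frame-wise F1 score over active (frame, pitch) cells."""
    true_pos = len(reference & estimate)
    precision = true_pos / len(estimate) if estimate else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Usage: one reference note, one predicted note without a note-end event.
reference_notes = [(60, 0.0, 1.0)]
predicted_notes = [(60, 0.0, None)]
predicted_filled = fill_missing_offsets(predicted_notes, clip_end=2.0)
score = frame_f1(active_frames(reference_notes), active_frames(predicted_filled))
```

Without some fill rule the predicted note cannot be placed on the frame grid at all, which is why missing note-end events make frame-wise metrics hard to compute for Seq2Seq output. The choice of fill rule directly affects the resulting precision and recall.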