Value Imitation Reinforcement Learning in Self-Training Dialogue State Tracking

Authors: Jie Yang, Hui Song, Bo Xu, Tianqi Liu
Conference: ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Pages: 651-665
Keywords: Dialogue State Tracking, Reinforcement Learning, Self-Training

Abstract

Few-shot Dialogue State Tracking (DST) aims to predict the dialogue state with limited labeled data, which is important when human annotation is scarce. Existing approaches built on the Self-Training framework often suffer from the gradual drift problem, which results in a noisy expanded labeled dataset. Moreover, apart from the model initialization step, the knowledge contained in the annotated data has not been fully exploited to handle unlabeled data accurately. In this paper, we introduce Slot Value Imitation Reinforcement Learning into the Self-Training process to alleviate selection bias and improve the quality of pseudo-labels. The reinforcement learning step encourages pseudo-labeled data to imitate the standard value representation of each slot, and high-confidence pseudo-labels are then chosen by a dual selection strategy based on value probability and active slot accuracy. Experimental results on the MultiWOZ 2.0 and MultiWOZ 2.4 datasets demonstrate the effectiveness of our proposed model in few-shot DST scenarios. Compared to the original self-training method, Joint Goal Accuracy improves by up to 2.66% on MultiWOZ 2.0.
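The dual selection strategy mentioned above can be sketched as a simple filter over pseudo-labeled turns. This is an illustrative reconstruction from the abstract only, not the paper's implementation: the function name, data layout, and threshold are assumptions.

```python
# Hypothetical sketch of the dual selection strategy from the abstract:
# keep a pseudo-labeled example only if (a) every predicted slot value
# is high-confidence and (b) the active-slot prediction agrees with the
# set of slots the value predictor actually filled. All names and the
# threshold are illustrative, not taken from the paper.

def select_pseudo_labels(examples, value_threshold=0.9):
    """Filter pseudo-labeled dialogue turns by the dual criterion."""
    selected = []
    for ex in examples:
        # Criterion 1: value probability — every predicted slot value
        # must exceed the confidence threshold.
        values_ok = all(p >= value_threshold
                        for p in ex["value_probs"].values())
        # Criterion 2: active slot accuracy — the active-slot classifier
        # must agree with the slots that received values.
        slots_ok = set(ex["active_slots"]) == set(ex["value_probs"])
        if values_ok and slots_ok:
            selected.append(ex)
    return selected


examples = [
    {"value_probs": {"hotel-area": 0.97}, "active_slots": ["hotel-area"]},
    {"value_probs": {"train-day": 0.55}, "active_slots": ["train-day"]},
]
# Only the first turn passes both criteria.
print(select_pseudo_labels(examples))
```

In the actual method, these two signals are produced by the model fine-tuned on the seed annotated data; the filtered set is then merged into the labeled pool for the next self-training round.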