Safe Policy Improvement with Baseline Bootstrapping under State Abstraction

Authors: Yuan Zhuang
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 409-422
Keywords: Batch Reinforcement Learning, Safe Policy Improvement with Baseline Bootstrapping, State Abstraction.

Abstract

This paper studies the Safe Policy Improvement (SPI) problem in Batch Reinforcement Learning (Batch RL), which aims to train a policy from a fixed dataset without environment interaction while ensuring that its performance is no worse than that of the behavior policy used for data collection. Most existing methods require a substantial amount of historical data to ensure sufficient confidence in the performance of the learned policy. However, the fixed dataset is often limited, which makes the learned policy overly conservative. To address this issue, we investigate the integration of state abstraction into the Safe Policy Improvement with Baseline Bootstrapping (SPIBB) framework to improve sample efficiency. While state abstraction has been widely used to improve sample efficiency, it traditionally lacks mechanisms for providing performance guarantees. We bridge this gap by deriving theoretical performance guarantees for policies learned with SPIBB under state abstraction. Empirical results show that our method achieves comparable or better policy improvement using fewer samples than the original SPIBB algorithm.
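To make the bootstrapping-under-abstraction idea concrete, the following is a minimal Python sketch of a greedy SPIBB-style projection step in which state-action counts are pooled at the abstract level before thresholding. It is an illustrative sketch under stated assumptions, not the paper's implementation; the names `spibb_policy_abstract`, `phi`, and `n_wedge` are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

def spibb_policy_abstract(Q, pi_b, counts, phi, n_wedge):
    """One greedy SPIBB-style projection step with state abstraction (sketch).

    Q       : (n_states, n_actions) action-value estimates of the current policy
    pi_b    : (n_states, n_actions) behavior (baseline) policy probabilities
    counts  : dict mapping (abstract_state, action) -> sample count in the batch
    phi     : callable mapping a ground state index to its abstract state
    n_wedge : count threshold below which an action is bootstrapped on the baseline
    """
    n_states, n_actions = Q.shape
    pi = np.zeros_like(pi_b)
    for s in range(n_states):
        z = phi(s)
        # Actions observed too rarely at the abstract level keep the baseline
        # probability. Pooling counts across ground states that share an
        # abstract state leaves fewer pairs bootstrapped than vanilla SPIBB,
        # which is where the sample-efficiency gain comes from.
        bootstrapped = np.array(
            [counts.get((z, a), 0) < n_wedge for a in range(n_actions)]
        )
        pi[s, bootstrapped] = pi_b[s, bootstrapped]
        free_mass = 1.0 - pi[s, bootstrapped].sum()
        if (~bootstrapped).any():
            # Allocate the remaining mass greedily among non-bootstrapped actions.
            allowed = np.where(~bootstrapped)[0]
            best = allowed[np.argmax(Q[s, allowed])]
            pi[s, best] += free_mass
        else:
            # Every action is bootstrapped: fall back to the baseline policy.
            pi[s] = pi_b[s]
    return pi
```

In vanilla SPIBB the same thresholding is applied to per-ground-state counts; replacing them with abstract-state counts is the single change sketched here, and it is the step for which the paper derives its performance guarantees.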