Barrier Functions Inspired Reward Shaping for Reinforcement Learning

Nilaksh Nilaksh, Abhishek Ranjan, Shreenabh Agrawal, Aayush Jain, Pushpak Jagtap, Shishir Kolathaya

Abstract

Reinforcement Learning (RL) has progressed from simple control tasks to complex real-world challenges with large state spaces. While RL excels in these tasks, training time remains a limitation. Reward shaping is a popular solution, but existing methods often rely on value functions, which face scalability issues. This paper presents a novel safety-oriented reward-shaping framework inspired by barrier functions, offering simplicity and ease of implementation across various environments and tasks. To evaluate the effectiveness of the proposed reward formulations, we conduct simulation experiments on CartPole, Ant, and Humanoid environments, along with real-world deployment on the Unitree Go1 quadruped robot. Our results demonstrate that our method leads to 1.4-2.8 times faster convergence and actuation effort as low as 50-60% of that required with the vanilla reward. In a sim-to-real experiment with the Go1 robot, we demonstrated better control and dynamics of the robot with our reward framework. We have open-sourced our code at https://github.com/Safe-RL-IISc/barrier_shaping.

Index terms

Reinforcement Learning; Machine Learning for Robot Control; Legged Robots