Differential High Order Control Barrier Function-Based Safe Reinforcement Learning

Xiangyu Kong; Yuanqing Xia; Zhongqi Sun; Di Hua Zhai; Yunshan Deng; Sihua Zhang

doi:10.1109/LRA.2025.3575310

Xiangyu Kong, Yuanqing Xia^*, Zhongqi Sun, Di Hua Zhai, Yunshan Deng, Sihua Zhang

^*Corresponding author for this work

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

Safe reinforcement learning (RL) aims to learn policy while also ensuring the safety constraints. An increasingly common approach is to design a safety filter based on control barrier function (CBF) or high order control barrier function (HOCBF) for the RL policy. A quadratic programming (QP) is then formulated and solved to modify the RL policy, enabling safe exploration. However, directly integrating the safety filter with RL presents two challenges: (1) the conservativeness of safe policy, and (2) potential infeasibility of the QP under bounded input constraints. These issues limit the performance of safe RL. In this letter, we introduce a differential HOCBF constraint by incorporating neural network-based penalty functions into HOCBF. Furthermore, we propose a differential HOCBF-based safe RL framework in which the penalty functions and RL policy are trained concurrently. To address conservativeness, we train penalty functions to maximize long-term rewards while preventing abrupt changes in safe action, thereby achieving ideal performance. To ensure the feasibility of the formulated QP under bounded input constraints, we calculate a set for penalty functions and prove that the feasibility is guaranteed if the learned penalty functions remain within the set. Finally, we verify the effectiveness of the proposed framework on the wheeled mobile robot navigation and obstacle avoidance task.
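The safety-filter step described in the abstract can be sketched roughly as follows. This is only an illustration under strong simplifying assumptions, not the authors' implementation: it assumes a single-integrator model (x_dot = u), a first-order CBF for one circular obstacle instead of an HOCBF, and a constant scalar penalty `p` standing in for the learned neural-network penalty function. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def cbf_constraint(x: Sequence[float], center: Sequence[float],
                   radius: float, p: float) -> Tuple[List[float], float]:
    """Return (a, b) such that the CBF condition reads a . u >= b.

    Assumed model: x_dot = u, h(x) = ||x - center||^2 - radius^2.
    The condition h_dot + p * h >= 0 becomes 2 (x - center) . u >= -p * h.
    """
    diff = [xi - ci for xi, ci in zip(x, center)]
    h = sum(d * d for d in diff) - radius ** 2
    a = [2.0 * d for d in diff]
    b = -p * h
    return a, b


def safety_filter(u_rl: Sequence[float], a: Sequence[float],
                  b: float) -> List[float]:
    """Closed-form solution of min ||u - u_rl||^2 s.t. a . u >= b.

    Input bounds are ignored here; with bounds a QP solver is needed.
    """
    slack = sum(ai * ui for ai, ui in zip(a, u_rl)) - b
    norm_sq = sum(ai * ai for ai in a)
    if slack >= 0.0 or norm_sq == 0.0:
        # Either the RL action is already safe, or u cannot affect the constraint.
        return list(u_rl)
    # Project the RL action onto the boundary of the half-space a . u >= b.
    return [ui - slack / norm_sq * ai for ui, ai in zip(u_rl, a)]


def feasible_in_box(a: Sequence[float], b: float,
                    u_max: Sequence[float]) -> bool:
    """Check whether some u with |u_i| <= u_max_i satisfies a . u >= b."""
    best = sum(abs(ai) * mi for ai, mi in zip(a, u_max))
    return best >= b


if __name__ == "__main__":
    x, center, radius = (2.0, 0.0), (0.0, 0.0), 1.0
    u_rl = (-5.0, 0.0)  # RL action driving straight at the obstacle
    for p in (1.0, 4.0):
        a, b = cbf_constraint(x, center, radius, p)
        print(p, safety_filter(u_rl, a, b), feasible_in_box(a, b, (1.0, 1.0)))
```

In this toy setting a larger penalty lets the filtered action stay closer to the RL action, which is the conservativeness trade-off the abstract refers to, and the box check shows how input bounds can make the constraint unsatisfiable.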

Original language	English
Pages (from-to)	7524-7531
Number of pages	8
Journal	IEEE Robotics and Automation Letters
Volume	10
Issue number	7
DOI	http://doi.org/10.1109/LRA.2025.3575310
Publication status	Published - 2025
Externally published	Yes

Cite this

@article{c56625bb18f7446198eab7921dad754e,

title = "Differential High Order Control Barrier Function-Based Safe Reinforcement Learning",

abstract = "Safe reinforcement learning (RL) aims to learn policy while also ensuring the safety constraints. An increasingly common approach is to design a safety filter based on control barrier function (CBF) or high order control barrier function (HOCBF) for the RL policy. A quadratic programming (QP) is then formulated and solved to modify the RL policy, enabling safe exploration. However, directly integrating the safety filter with RL presents two challenges: (1) the conservativeness of safe policy, and (2) potential infeasibility of the QP under bounded input constraints. These issues limit the performance of safe RL. In this letter, we introduce a differential HOCBF constraint by incorporating neural network-based penalty functions into HOCBF. Furthermore, we propose a differential HOCBF-based safe RL framework in which the penalty functions and RL policy are trained concurrently. To address conservativeness, we train penalty functions to maximize long-term rewards while preventing abrupt changes in safe action, thereby achieving ideal performance. To ensure the feasibility of the formulated QP under bounded input constraints, we calculate a set for penalty functions and prove that the feasibility is guaranteed if the learned penalty functions remain within the set. Finally, we verify the effectiveness of the proposed framework on the wheeled mobile robot navigation and obstacle avoidance task.",

keywords = "Reinforcement learning (RL), collision avoidance, robot safety",

author = "Xiangyu Kong and Yuanqing Xia and Zhongqi Sun and Zhai, {Di Hua} and Yunshan Deng and Sihua Zhang",

note = "Publisher Copyright: {\textcopyright} 2016 IEEE.",

year = "2025",

doi = "10.1109/LRA.2025.3575310",

language = "English",

volume = "10",

pages = "7524--7531",

journal = "IEEE Robotics and Automation Letters",

issn = "2377-3766",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "7",

}

TY - JOUR

T1 - Differential High Order Control Barrier Function-Based Safe Reinforcement Learning

AU - Kong, Xiangyu

AU - Xia, Yuanqing

AU - Sun, Zhongqi

AU - Zhai, Di Hua

AU - Deng, Yunshan

AU - Zhang, Sihua

PY - 2025

Y1 - 2025

N2 - Safe reinforcement learning (RL) aims to learn policy while also ensuring the safety constraints. An increasingly common approach is to design a safety filter based on control barrier function (CBF) or high order control barrier function (HOCBF) for the RL policy. A quadratic programming (QP) is then formulated and solved to modify the RL policy, enabling safe exploration. However, directly integrating the safety filter with RL presents two challenges: (1) the conservativeness of safe policy, and (2) potential infeasibility of the QP under bounded input constraints. These issues limit the performance of safe RL. In this letter, we introduce a differential HOCBF constraint by incorporating neural network-based penalty functions into HOCBF. Furthermore, we propose a differential HOCBF-based safe RL framework in which the penalty functions and RL policy are trained concurrently. To address conservativeness, we train penalty functions to maximize long-term rewards while preventing abrupt changes in safe action, thereby achieving ideal performance. To ensure the feasibility of the formulated QP under bounded input constraints, we calculate a set for penalty functions and prove that the feasibility is guaranteed if the learned penalty functions remain within the set. Finally, we verify the effectiveness of the proposed framework on the wheeled mobile robot navigation and obstacle avoidance task.

AB - Safe reinforcement learning (RL) aims to learn policy while also ensuring the safety constraints. An increasingly common approach is to design a safety filter based on control barrier function (CBF) or high order control barrier function (HOCBF) for the RL policy. A quadratic programming (QP) is then formulated and solved to modify the RL policy, enabling safe exploration. However, directly integrating the safety filter with RL presents two challenges: (1) the conservativeness of safe policy, and (2) potential infeasibility of the QP under bounded input constraints. These issues limit the performance of safe RL. In this letter, we introduce a differential HOCBF constraint by incorporating neural network-based penalty functions into HOCBF. Furthermore, we propose a differential HOCBF-based safe RL framework in which the penalty functions and RL policy are trained concurrently. To address conservativeness, we train penalty functions to maximize long-term rewards while preventing abrupt changes in safe action, thereby achieving ideal performance. To ensure the feasibility of the formulated QP under bounded input constraints, we calculate a set for penalty functions and prove that the feasibility is guaranteed if the learned penalty functions remain within the set. Finally, we verify the effectiveness of the proposed framework on the wheeled mobile robot navigation and obstacle avoidance task.

KW - Reinforcement learning (RL)

KW - collision avoidance

KW - robot safety

UR - http://www.scopus.com/pages/publications/105007366011

U2 - 10.1109/LRA.2025.3575310

DO - 10.1109/LRA.2025.3575310

M3 - Article

AN - SCOPUS:105007366011

SN - 2377-3766

VL - 10

SP - 7524

EP - 7531

JO - IEEE Robotics and Automation Letters

JF - IEEE Robotics and Automation Letters

IS - 7

ER -
