A zeroth-order stochastic implicit method for bilevel-structured actor-critic schemes

Haochen Tao, Shisheng Cui*, Zhuo Li, Jian Sun

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Reinforcement learning algorithms are central to the cognition and decision-making of embodied intelligent agents. Bilevel optimization (BO) modeling, together with a host of efficient BO algorithms, has proven to be an effective means of addressing actor-critic (AC) policy optimization problems. In this work, an implicit zeroth-order stochastic algorithm is developed for a bilevel-structured AC problem model. A locally randomized spherical smoothing technique is introduced; it applies to nonsmooth nonconvex implicit AC formulations and avoids the need for a closed-form lower-level mapping. In the proposed zeroth-order scheme, the gradient of the implicit function is approximated through inexact lower-level value estimates that are available in practice. Under suitable assumptions, the algorithmic framework designed for the bilevel AC method has convergence guarantees under a fixed stepsize and a fixed smoothing parameter. Moreover, the proposed algorithm attains an overall iteration complexity of $\mathcal{O}(n^2 L_0^2 \tilde{L}_0^2 \epsilon^{-1})$. The convergence performance of the proposed algorithm is verified through numerical simulations.
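The idea of a zeroth-order implicit scheme can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy lower-level and upper-level problems, the function names, and all parameter values below are assumptions chosen for illustration. It uses the standard two-point spherical smoothing estimator, g = (n/η)(f(x + ηv) − f(x))v with v drawn uniformly from the unit sphere, where the implicit objective f(x) = F(x, y(x)) is evaluated through an inexact lower-level solve rather than a closed-form mapping y(x).

```python
import numpy as np

def sample_unit_sphere(n, rng):
    # Normalizing a standard Gaussian vector gives a uniform draw on the unit sphere.
    v = rng.standard_normal(n)
    return v / np.linalg.norm(v)

def inexact_lower_level(x, steps=20, lr=0.5):
    # Hypothetical lower-level problem: min_y 0.5 * ||y - x||^2,
    # solved only approximately with a few gradient steps.
    y = np.zeros(x.shape)
    for _ in range(steps):
        y -= lr * (y - x)
    return y

def implicit_objective(x):
    # Hypothetical nonsmooth upper-level objective F(x, y(x)),
    # evaluated with the inexact lower-level estimate of y(x).
    y = inexact_lower_level(x)
    return np.abs(y).sum() + 0.5 * np.sum((x - 1.0) ** 2)

def zo_gradient(f, x, eta, rng):
    # Two-point spherical smoothing estimator of the smoothed gradient:
    # only function values of f are needed, no derivatives.
    n = x.size
    v = sample_unit_sphere(n, rng)
    return (n / eta) * (f(x + eta * v) - f(x)) * v

def zo_descent(f, x0, eta=1e-2, step=1e-2, iters=2000, seed=0):
    # Fixed stepsize and fixed smoothing parameter (values are assumptions).
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x -= step * zo_gradient(f, x, eta, rng)
    return x

x0 = np.full(5, 3.0)
x_final = zo_descent(implicit_objective, x0)
print(implicit_objective(x0), implicit_objective(x_final))
```

Each iteration costs two evaluations of the implicit objective, and each evaluation costs one inexact lower-level solve, which is why the accuracy of the lower-level estimate matters for the quality of the gradient approximation.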

Original language: English
Article number: 150204
Journal: Science China Information Sciences
Volume: 68
Issue number: 5
DOI
Publication status: Published - May 2025
