A zeroth-order stochastic implicit method for bilevel-structured actor-critic schemes

Haochen Tao; Shisheng Cui; Zhuo Li; Jian Sun

doi:10.1007/s11432-024-4397-7

Haochen Tao, Shisheng Cui^*, Zhuo Li, Jian Sun

^*Corresponding author for this work

School of Automation

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

Reinforcement learning algorithms are central to the cognition and decision-making of embodied intelligent agents. A bilevel optimization (BO) modeling approach, along with a host of efficient BO algorithms, has been proven to be an effective means of addressing actor-critic (AC) policy optimization problems. In this work, based on a bilevel-structured AC problem model, an implicit zeroth-order stochastic algorithm is developed. A locally randomized spherical smoothing technique, which can be applied to nonsmooth nonconvex implicit AC formulations and avoid the closed-form lower-level mapping, is introduced. In the proposed zeroth-order scheme, the gradient of the implicit function can be approximated through inexact lower-level value estimations that are practically available. Under suitable assumptions, the algorithmic framework designed for the bilevel AC method is characterized by convergence guarantees under a fixed stepsize and smoothing parameter. Moreover, the proposed algorithm is equipped with the overall iteration complexity of $\mathcal{O}(n^{2} L_{0}^{2} \tilde{L}_{0}^{2} \epsilon^{-1})$. The convergence performance of the proposed algorithm is verified through numerical simulations.
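The abstract's core idea, estimating the gradient of an implicit function from function values alone, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the toy objective `upper_objective`, the inexact solver `approx_lower_level`, and all parameter values are hypothetical choices made for illustration. The estimator itself is the standard two-point spherical smoothing formula, (n / (2η)) · (f(x + ηv) − f(x − ηv)) · v with v drawn uniformly from the unit sphere, used here with a fixed stepsize and a fixed smoothing parameter η.

```python
import math
import random


def sample_unit_sphere(n, rng):
    """Return a uniform sample from the unit sphere in R^n (normalized Gaussian)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm > 0.0:
            return [gi / norm for gi in g]


def approx_lower_level(x, inner_steps=50, lr=0.5):
    """Hypothetical inexact lower-level solver: a few gradient steps on
    0.5 * (y - sum(x))**2, so y only approximates the exact solution sum(x)."""
    target = sum(x)
    y = 0.0
    for _ in range(inner_steps):
        y -= lr * (y - target)
    return y


def upper_objective(x, y):
    """Hypothetical nonsmooth upper-level objective |y - 1| + 0.1 * ||x||_1."""
    return abs(y - 1.0) + 0.1 * sum(abs(xi) for xi in x)


def implicit_value(x):
    """Implicit function value, evaluated with the inexact lower-level estimate."""
    return upper_objective(x, approx_lower_level(x))


def zo_gradient(x, eta, rng):
    """Two-point spherical-smoothing gradient estimate of the implicit function."""
    n = len(x)
    v = sample_unit_sphere(n, rng)
    x_plus = [xi + eta * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eta * vi for xi, vi in zip(x, v)]
    scale = n * (implicit_value(x_plus) - implicit_value(x_minus)) / (2.0 * eta)
    return [scale * vi for vi in v]


def zo_descent(x0, stepsize=0.01, eta=0.01, iters=3000, seed=0):
    """Zeroth-order descent with a fixed stepsize and fixed smoothing parameter."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = zo_gradient(x, eta, rng)
        x = [xi - stepsize * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    start = [2.0, -1.5, 3.0]
    end = zo_descent(start)
    print("initial value:", implicit_value(start))
    print("final value:  ", implicit_value(end))
```

No gradient of the upper-level objective and no closed-form lower-level map is used anywhere in the sketch; every update relies only on two evaluations of `implicit_value`, which is the property the abstract emphasizes.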

Original language	English
Article number	150204
Journal	Science China Information Sciences
Volume	68
Issue number	5
DOI	http://doi.org/10.1007/s11432-024-4397-7
Publication status	Published - May 2025

Cite this

@article{667b52dcb00448fb9833c62e46e7ed49,

title = "A zeroth-order stochastic implicit method for bilevel-structured actor-critic schemes",

abstract = "Reinforcement learning algorithms are central to the cognition and decision-making of embodied intelligent agents. A bilevel optimization (BO) modeling approach, along with a host of efficient BO algorithms, has been proven to be an effective means of addressing actor-critic (AC) policy optimization problems. In this work, based on a bilevel-structured AC problem model, an implicit zeroth-order stochastic algorithm is developed. A locally randomized spherical smoothing technique, which can be applied to nonsmooth nonconvex implicit AC formulations and avoid the closed-form lower-level mapping, is introduced. In the proposed zeroth-order scheme, the gradient of the implicit function can be approximated through inexact lower-level value estimations that are practically available. Under suitable assumptions, the algorithmic framework designed for the bilevel AC method is characterized by convergence guarantees under a fixed stepsize and smoothing parameter. Moreover, the proposed algorithm is equipped with the overall iteration complexity of $\mathcal{O}(n^{2} L_{0}^{2} \tilde{L}_{0}^{2} \epsilon^{-1})$. The convergence performance of the proposed algorithm is verified through numerical simulations.",

keywords = "actor-critic, bilevel optimization, implicit programming, stochastic approximation, zeroth-order algorithm",

author = "Haochen Tao and Shisheng Cui and Zhuo Li and Jian Sun",

note = "Publisher Copyright: {\textcopyright} Science China Press 2025.",

year = "2025",

month = may,

doi = "10.1007/s11432-024-4397-7",

language = "English",

volume = "68",

journal = "Science China Information Sciences",

issn = "1674-733X",

publisher = "Science China Press",

number = "5",

}

TY - JOUR

T1 - A zeroth-order stochastic implicit method for bilevel-structured actor-critic schemes

AU - Tao, Haochen

AU - Cui, Shisheng

AU - Li, Zhuo

AU - Sun, Jian

N1 - Publisher Copyright: © Science China Press 2025.

PY - 2025/5

Y1 - 2025/5

N2 - Reinforcement learning algorithms are central to the cognition and decision-making of embodied intelligent agents. A bilevel optimization (BO) modeling approach, along with a host of efficient BO algorithms, has been proven to be an effective means of addressing actor-critic (AC) policy optimization problems. In this work, based on a bilevel-structured AC problem model, an implicit zeroth-order stochastic algorithm is developed. A locally randomized spherical smoothing technique, which can be applied to nonsmooth nonconvex implicit AC formulations and avoid the closed-form lower-level mapping, is introduced. In the proposed zeroth-order scheme, the gradient of the implicit function can be approximated through inexact lower-level value estimations that are practically available. Under suitable assumptions, the algorithmic framework designed for the bilevel AC method is characterized by convergence guarantees under a fixed stepsize and smoothing parameter. Moreover, the proposed algorithm is equipped with the overall iteration complexity of $\mathcal{O}(n^{2} L_{0}^{2} \tilde{L}_{0}^{2} \epsilon^{-1})$. The convergence performance of the proposed algorithm is verified through numerical simulations.

KW - actor-critic

KW - bilevel optimization

KW - implicit programming

KW - stochastic approximation

KW - zeroth-order algorithm

UR - http://www.scopus.com/pages/publications/105003858922

U2 - 10.1007/s11432-024-4397-7

DO - 10.1007/s11432-024-4397-7

M3 - Article

AN - SCOPUS:105003858922

SN - 1674-733X

VL - 68

JO - Science China Information Sciences

JF - Science China Information Sciences

IS - 5

M1 - 150204

ER -
