Abstract
Deep reinforcement learning (DRL) has achieved groundbreaking results in robotics, cyber-physical systems, healthcare, and many other real-world applications in recent years. Despite this success, the inherent opacity and unpredictability of DRL controllers limit their adoption in many safety-critical scenarios. In such contexts, it is crucial to consider safety and behavioral requirements on the deployed agents in addition to their performance. In this paper, we propose using Scenario-Based Programming (SBP) to define a cost signal that is optimized together with the standard reward function to enforce additional behaviors in the final agents. To this end, we rely on the constrained DRL framework, and in particular on a modified version of Lagrangian-PPO, which we call λ-PPO, designed specifically for the multi-step, temporal nature of SBP requirements. This approach allows us to specify these requirements easily and to enforce the agent's adherence to them during training without compromising its freedom to explore the state space and converge to an optimal policy, which in turn enables the use of a simple reward function. We validated our method extensively on real robotic platforms in a mapless navigation task, demonstrating its effectiveness. We use SBP to define different types of requirements, including more predictable behavior, safety properties, and the injection of prior knowledge to drive training.
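The general idea of optimizing a cost signal alongside the reward can be illustrated with a minimal Python sketch of a generic Lagrangian relaxation. This is not the paper's λ-PPO, whose modifications are not described in the abstract. The names `LagrangeMultiplier`, `oscillation_cost`, and `penalized_advantage`, the step size, the cost budget, and the toy "no immediate turn reversal" rule are all assumptions made for illustration; an actual SBP requirement would be written in a scenario-based framework, not as a plain function.

```python
from dataclasses import dataclass


@dataclass
class LagrangeMultiplier:
    """Dual variable for a single cost constraint (hypothetical helper)."""
    value: float = 0.0
    lr: float = 0.05        # assumed dual step size
    threshold: float = 0.0  # assumed cost budget per episode

    def update(self, mean_episode_cost: float) -> float:
        # Dual ascent: increase lambda when the budget is exceeded,
        # decrease it otherwise, then project back onto lambda >= 0.
        step = self.lr * (mean_episode_cost - self.threshold)
        self.value = max(0.0, self.value + step)
        return self.value


def oscillation_cost(actions: list[str]) -> list[float]:
    """Toy multi-step rule (an assumption, not from the paper):
    cost 1 whenever a turn immediately reverses the previous turn."""
    opposite = {"left": "right", "right": "left"}
    costs = [0.0] * len(actions)
    for t in range(1, len(actions)):
        if opposite.get(actions[t - 1]) == actions[t]:
            costs[t] = 1.0
    return costs


def penalized_advantage(adv_reward: float, adv_cost: float, lam: float) -> float:
    # Lagrangian-relaxed signal that a policy-gradient step would maximize.
    return adv_reward - lam * adv_cost


if __name__ == "__main__":
    episode = ["left", "right", "forward", "right", "left"]
    costs = oscillation_cost(episode)
    lam = LagrangeMultiplier(lr=0.5, threshold=1.0)
    print(costs, lam.update(sum(costs)))
```

In this sketch the multiplier grows while the measured cost exceeds the budget and shrinks back toward zero once the agent complies, so the reward function itself can stay simple.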
| Original language | English |
|---|---|
| Title of host publication | Neural Information Processing - 31st International Conference, ICONIP 2024, Proceedings |
| Editors | Mufti Mahmud, Maryam Doborjeh, Kevin Wong, Andrew Chi Sing Leung, Zohreh Doborjeh, M. Tanveer |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 284-302 |
| Number of pages | 19 |
| ISBN (Print) | 9789819666058 |
| State | Published - 2025 |
| Event | 31st International Conference on Neural Information Processing, ICONIP 2024, Auckland, New Zealand (2 Dec 2024 → 6 Dec 2024) |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Volume | 15296 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 31st International Conference on Neural Information Processing, ICONIP 2024 |
|---|---|
| Country/Territory | New Zealand |
| City | Auckland |
| Period | 2/12/24 → 6/12/24 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
Fingerprint
Dive into the research topics of 'Enforcing Specific Behaviours via Constrained DRL and Scenario-Based Programming'. Together they form a unique fingerprint.