Enforcing Specific Behaviours via Constrained DRL and Scenario-Based Programming

  • Davide Corsi
  • Raz Yerushalmi*
  • Guy Amir
  • Alessandro Farinelli
  • David Harel
  • Guy Katz

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Deep reinforcement learning (DRL) has achieved groundbreaking results in robotics, cyber-physical systems, healthcare, and many other real-world applications in recent years. However, despite their success, the inherent opacity and unpredictability of DRL controllers limit their widespread adoption in many safety-critical scenarios. In such contexts, it is crucial to consider additional safety and behavioral requirements pertaining to the deployed agents in addition to their performance. In this paper, we propose using Scenario-Based Programming (SBP) to define a cost signal that can be optimized together with the standard reward function to enforce additional behaviors in the final agents. To this end, we rely on the constrained DRL framework, particularly on a modified version of Lagrangian-PPO, which we call λ-PPO, designed especially for the multi-step and temporal nature of the SBP requirements. This approach allows us to easily design and enforce the agent’s adherence to these requirements during training without compromising its freedom to explore the state space and converge to an optimal policy, enabling the use of a simple reward function. We have validated our method extensively by experimenting with real robotic platforms in a mapless navigation task, demonstrating the method’s success. We use SBP to define different types of requirements, including a more predictable behavior, safety properties, and the injection of prior knowledge to drive training.
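To make the idea of optimizing a rule-derived cost alongside the reward more concrete, here is a minimal sketch of a generic Lagrangian-style constrained update. It is not the paper's λ-PPO, whose details are not given in this record. The rule in `oscillation_cost`, the class `LagrangeMultiplier`, the function `penalized_advantage`, the learning rate, and the cost limit are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LagrangeMultiplier:
    """Scalar multiplier for a single cost constraint (generic Lagrangian method)."""
    value: float = 0.0
    lr: float = 0.05  # assumed step size, not taken from the paper

    def update(self, mean_episode_cost: float, cost_limit: float) -> float:
        # Gradient ascent on the multiplier: it grows while the observed cost
        # exceeds the limit and shrinks otherwise, projected to stay >= 0.
        self.value = max(0.0, self.value + self.lr * (mean_episode_cost - cost_limit))
        return self.value


def oscillation_cost(actions):
    """Hypothetical multi-step rule: cost 1 for each turn that directly reverses the previous turn."""
    opposite = {"left": "right", "right": "left"}
    return sum(1.0 for prev, cur in zip(actions, actions[1:]) if opposite.get(prev) == cur)


def penalized_advantage(reward_adv: float, cost_adv: float, lam: float) -> float:
    """Combine reward and cost advantages; dividing by (1 + lam) is one common rescaling."""
    return (reward_adv - lam * cost_adv) / (1.0 + lam)
```

In a training loop of this kind, the cost would be computed per episode from the agent's action trace, the multiplier updated from the average cost, and the combined advantage passed to an otherwise standard PPO policy update.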

Original language: English
Title of host publication: Neural Information Processing - 31st International Conference, ICONIP 2024, Proceedings
Editors: Mufti Mahmud, Maryam Doborjeh, Kevin Wong, Andrew Chi Sing Leung, Zohreh Doborjeh, M. Tanveer
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 284-302
Number of pages: 19
ISBN (Print): 9789819666058
State: Published - 2025
Event: 31st International Conference on Neural Information Processing, ICONIP 2024 - Auckland, New Zealand
Duration: 2 Dec 2024 to 6 Dec 2024

Publication series

Name: Lecture Notes in Computer Science
Volume: 15296 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 31st International Conference on Neural Information Processing, ICONIP 2024
Country/Territory: New Zealand
City: Auckland
Period: 2/12/24 to 6/12/24

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
