Deep reinforcement learning (DRL) agents have achieved unprecedented results when learning to generalize from unstructured data. However, the “black-box” nature of trained DRL agents makes it difficult to ensure that they adhere to various requirements posed by engineers. In this work, we put forth a novel technique for enhancing the reinforcement learning training loop, and specifically its reward function, in a way that allows engineers to directly inject their expert knowledge into the training process. This allows us to make the trained agent adhere to multiple constraints of interest. Moreover, using scenario-based modeling techniques, our method allows users to formulate these constraints using advanced, well-established behavioral modeling methods. Combining these modeling methods with machine learning tools produces agents that are both high performing and more likely to adhere to the prescribed constraints. Furthermore, the resulting agents are more transparent and hence more maintainable. We demonstrate our technique by evaluating it on a case study from the domain of internet congestion control, and present promising results.
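To illustrate the general idea of injecting expert rules through the reward function, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a hypothetical discrete action space (decrease, keep, or increase the sending rate), an observation consisting of recent packet-loss rates, and a hypothetical rule `no_increase_under_loss`. The names `ShapedReward`, `penalty`, and `LOSS_THRESHOLD` are likewise assumptions made for illustration. Each rule plays the role of a scenario that may forbid an action, and every violated rule subtracts a fixed penalty from the environment reward.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical action encoding for a congestion-control agent.
DECREASE, KEEP, INCREASE = 0, 1, 2

# Hypothetical threshold above which the latest loss rate counts as "high".
LOSS_THRESHOLD = 0.05

# A rule receives the observation and the proposed action and returns
# True when the action violates the constraint it encodes.
Rule = Callable[[Sequence[float], int], bool]


def no_increase_under_loss(observation: Sequence[float], action: int) -> bool:
    """Illustrative expert rule: do not raise the sending rate while the
    most recent loss rate is above the threshold."""
    latest_loss = observation[-1]
    return action == INCREASE and latest_loss > LOSS_THRESHOLD


@dataclass
class ShapedReward:
    """Subtracts a fixed penalty from the environment reward for every
    rule that the chosen action violates."""
    rules: Sequence[Rule]
    penalty: float = 1.0

    def __call__(self, env_reward: float,
                 observation: Sequence[float], action: int) -> float:
        violations = sum(1 for rule in self.rules if rule(observation, action))
        return env_reward - self.penalty * violations


shaped = ShapedReward(rules=[no_increase_under_loss], penalty=2.0)
# Increasing the rate under high loss is penalized; other choices are not.
penalized = shaped(1.0, [0.0, 0.2], INCREASE)
unpenalized = shaped(1.0, [0.0, 0.2], DECREASE)
```

In such a sketch, the training loop would pass the shaped value to the learning algorithm in place of the raw environment reward. The penalty magnitude is a tunable assumption that trades task performance against rule adherence.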
Bibliographical note
Funding Information:
The work of R. Yerushalmi, G. Amir, A. Elyasaf and G. Katz was partially supported by a grant from the Israeli Smart Transportation Research Center (ISTRC). The work of G. Amir was supported by a scholarship from the Clore Israel Foundation. The work of D. Harel, A. Marron and R. Yerushalmi was partially supported by a research grant from the Estate of Harry Levine, the Estate of Avraham Rothstein, Brenda Gruss and Daniel Hirsch, the One8 Foundation, Rina Mayer, Maurice Levy, and the Estate of Bernice Bernath, by grant 3698/21 from the ISF-NSFC (joint to the Israel Science Foundation and the National Science Foundation of China), and by a grant from the Minerva foundation.
© 2023, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
- Deep reinforcement learning
- Domain expertise
- Machine learning
- Rule-based specifications
- Scenario-based modeling