TY - JOUR
T1 - Revealing principles of autonomous thermal soaring in windy conditions using vulture-inspired deep reinforcement-learning
AU - Flato, Yoav
AU - Harel, Roi
AU - Tamar, Aviv
AU - Nathan, Ran
AU - Beatus, Tsevi
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/12
Y1 - 2024/12
AB - Thermal soaring, a technique used by birds and gliders to utilize updrafts of hot air, is an appealing model-problem for studying motion control and how it is learned by animals and engineered autonomous systems. Thermal soaring has rich dynamics and nontrivial constraints, yet it uses few control parameters and is becoming experimentally accessible. Following recent developments in applying reinforcement learning methods for training deep neural-network (deep-RL) models to soar autonomously both in simulation and real gliders, here we develop a simulation-based deep-RL system to study the learning process of thermal soaring. We find that this process has learning bottlenecks, we define a new efficiency metric and use it to characterize learning robustness, we compare the learned policy to data from soaring vultures, and find that the neurons of the trained network divide into function clusters that evolve during learning. These results pose thermal soaring as a rich yet tractable model-problem for the learning of motion control.
UR - http://www.scopus.com/inward/record.url?scp=85195622259&partnerID=8YFLogxK
U2 - 10.1038/s41467-024-48670-x
DO - 10.1038/s41467-024-48670-x
M3 - Article
C2 - 38858356
AN - SCOPUS:85195622259
SN - 2041-1723
VL - 15
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 4942
ER -