Robust Reinforcement Learning for Control
Adversarial Body Shape Search for Legged Robots
We propose an evolutionary computation method for adversarial attacks on the length and thickness of the body parts of legged robots trained by deep reinforcement learning. The attack changes the robot's body shape and interferes with walking; we call the attacked body an adversarial body shape. The method searches for adversarial body shapes by minimizing the expected cumulative reward earned in walking simulation. To evaluate its effectiveness, we perform experiments with three legged robots in OpenAI Gym: Walker2d, Ant-v2, and Humanoid-v2. The results reveal that Walker2d and Ant-v2 are more vulnerable to attacks on the length than on the thickness of the body parts, whereas Humanoid-v2 is vulnerable to attacks on both the length and the thickness. We further find that the adversarial body shapes break left-right symmetry or shift the center of gravity of the legged robots. Finding adversarial body shapes can be used to proactively diagnose the vulnerability of legged robot walking.
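The search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the simple elite-averaging evolution strategy, the function names, the scale bound `eps`, and the surrogate `toy_return` (which stands in for a walking rollout in the simulator) are all assumptions made for illustration.

```python
import random

def toy_return(scales, rng, episodes=5):
    """Hypothetical stand-in for walking rollouts (NOT a physics simulation).

    `scales` multiplies the nominal length of each body part. The return is
    highest for the nominal body and drops when the first two parts (think
    left and right leg) differ, mimicking a broken left-right symmetry.
    """
    left, right = scales[0], scales[1]
    base = (100.0 - 400.0 * (left - right) ** 2
            - 50.0 * sum((s - 1.0) ** 2 for s in scales))
    # Average over noisy episodes to estimate the expected cumulative reward.
    return sum(base + rng.gauss(0.0, 1.0) for _ in range(episodes)) / episodes

def search_adversarial_shape(reward_fn, dim, eps=0.1, pop=20, elite=5,
                             gens=30, seed=0):
    """Evolve body-part scale factors in [1-eps, 1+eps] that MINIMIZE return."""
    rng = random.Random(seed)
    lo, hi = 1.0 - eps, 1.0 + eps
    mean = [1.0] * dim              # start from the nominal body shape
    sigma = eps / 2                 # mutation strength (assumed value)
    best_x, best_r = list(mean), reward_fn(mean, rng)
    for _ in range(gens):
        # Sample a population around the current mean and clip to the bound.
        cands = [[min(hi, max(lo, m + sigma * rng.gauss(0.0, 1.0)))
                  for m in mean] for _ in range(pop)]
        # Sort ascending: the most damaging shapes (lowest return) come first.
        scored = sorted(((reward_fn(c, rng), c) for c in cands),
                        key=lambda t: t[0])
        elites = [c for _, c in scored[:elite]]
        mean = [sum(col) / elite for col in zip(*elites)]
        if scored[0][0] < best_r:
            best_r, best_x = scored[0]
    return best_x, best_r
```

Calling `search_adversarial_shape(toy_return, dim=4)` returns the most damaging scale vector found and its estimated return. In a real setting, `reward_fn` would rebuild the robot model with the scaled parts and run the trained policy in simulation.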
Adversarial joint attacks on legged robots
We address adversarial attacks on the joint actuators of legged robots trained by deep reinforcement learning. Vulnerability to such joint attacks can significantly affect the safety and robustness of legged robots. In this study, we demonstrate that adversarial perturbations to the torque control signals of the actuators can significantly reduce the rewards and cause walking instability. To find the adversarial torque perturbations, we develop black-box adversarial attacks, in which the adversary cannot access the neural networks trained by deep reinforcement learning. The black-box attacks can be applied to legged robots regardless of the network architecture or the deep reinforcement learning algorithm. We employ three search methods for the black-box attacks: random search, differential evolution, and numerical gradient descent. In experiments with the quadruped robot Ant-v2 and the bipedal robot Humanoid-v2 in OpenAI Gym environments, we find that, among the three methods, differential evolution finds the strongest torque perturbations most efficiently. In addition, we find that the quadruped robot Ant-v2 is vulnerable to the adversarial perturbations, whereas the bipedal robot Humanoid-v2 is robust to them. Consequently, the joint attacks can be used for proactive diagnosis of robot walking instability.
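As an illustration of the black-box setting, the sketch below runs a textbook DE/rand/1/bin differential evolution over a bounded torque offset. It only queries a return function and never inspects the policy. The surrogate `toy_episode_return`, the bound `eps`, and all hyperparameters are assumptions for illustration, not values from the paper.

```python
import random

def toy_episode_return(delta):
    """Hypothetical surrogate for one rollout in which the constant offset
    `delta` is added to the policy's torque commands. A real attack would
    run the simulator here and return the cumulative reward."""
    weights = [3.0, -2.0, 1.0, -4.0]
    return (100.0 - sum(w * d for w, d in zip(weights, delta))
            - 20.0 * sum(d * d for d in delta))

def de_attack(return_fn, dim, eps=0.5, pop=16, gens=40, f=0.7, cr=0.9, seed=0):
    """DE/rand/1/bin search for a perturbation in [-eps, eps]^dim that
    minimizes the return, using only black-box queries to `return_fn`."""
    rng = random.Random(seed)

    def clip(v):
        return max(-eps, min(eps, v))

    population = [[rng.uniform(-eps, eps) for _ in range(dim)]
                  for _ in range(pop)]
    fitness = [return_fn(p) for p in population]
    for _ in range(gens):
        for i in range(pop):
            # Pick three distinct individuals other than the target i.
            a, b, c = rng.sample([j for j in range(pop) if j != i], 3)
            j_rand = rng.randrange(dim)   # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    mutant = population[a][j] + f * (population[b][j]
                                                     - population[c][j])
                    trial.append(clip(mutant))
                else:
                    trial.append(population[i][j])
            r = return_fn(trial)
            if r <= fitness[i]:           # keep the trial if it hurts more
                population[i], fitness[i] = trial, r
    k = min(range(pop), key=fitness.__getitem__)
    return population[k], fitness[k]
```

`de_attack(toy_episode_return, dim=4)` returns the strongest perturbation found and the return it produces. Swapping the inner loop for uniform sampling would give a random-search baseline under the same query budget.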
Reinforcement Learning for Fault-Tolerant Robot Control
This study addresses the fault tolerance of quadruped robots to actuator failure, which is critical for robots operating in remote or extreme environments. We propose an adaptive curriculum reinforcement learning algorithm with dynamics randomization (ACDR). The ACDR algorithm adaptively trains a quadruped robot under random actuator failure conditions and yields a single robust policy for fault-tolerant robot control. We find that the hard2easy curriculum is more effective than the easy2hard curriculum for quadruped robot locomotion. The ACDR algorithm can be used to build a robot system that does not require additional modules for detecting actuator failures and switching policies. Experimental results show that the ACDR algorithm outperforms conventional algorithms in terms of average reward and walking distance.
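The abstract does not spell out the ACDR update rule, so the following is only a rough sketch of what a hard2easy failure curriculum could look like. It assumes a failure is modeled by scaling one actuator's torque by a coefficient `k` (0 for a dead actuator, 1 for a healthy one). The class name, the threshold rule, and the step size are hypothetical.

```python
import random

class Hard2EasyFailureCurriculum:
    """Samples actuator-failure conditions, starting from the hardest ones.

    A failure is a pair (joint, k): the torque of `joint` is multiplied by k.
    All names and the update rule are illustrative assumptions, not the
    published ACDR algorithm.
    """

    def __init__(self, n_joints, step=0.1, threshold=0.0, seed=0):
        self.n_joints = n_joints
        self.step = step            # how much the k-range widens per update
        self.threshold = threshold  # average return required to widen it
        self.k_max = 0.0            # hard2easy: begin with dead actuators only
        self.rng = random.Random(seed)

    def sample(self):
        """Draw a random failed joint and a failure coefficient in [0, k_max]."""
        return self.rng.randrange(self.n_joints), self.rng.uniform(0.0, self.k_max)

    def update(self, avg_return):
        """Adaptive step: admit milder failures only once the policy copes."""
        if avg_return >= self.threshold:
            self.k_max = min(1.0, self.k_max + self.step)

def apply_failure(action, joint, k):
    """Return a copy of `action` with the failed joint's torque scaled by k."""
    out = list(action)
    out[joint] *= k
    return out
```

In a training loop, one would call `sample()` at the start of each episode, pass every policy action through `apply_failure` before stepping the simulator, and call `update()` with the recent average return. An easy2hard variant would start with `k_max = 1.0` and shrink the lower end of the range instead.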
Wataru Okamoto, Hiroshi Kera, and Kazuhiko Kawamoto, Reinforcement Learning with Adaptive Curriculum Dynamics Randomization for Fault-Tolerant Robot Control, arXiv:2111.10005, 2021 [arXiv].
Wataru Okamoto and Kazuhiko Kawamoto, Reinforcement Learning with Randomized Physical Parameters for Fault-Tolerant Robots, Proc. SCIS-ISIS, pp.449-452, 2020 [arXiv].