Intrinsically Motivated Multi-Goal Reinforcement Learning Using Robotics Environment Integrated With OpenAI Gym

Published

March 8, 2024

Abstract

Sparse rewards are one of the most challenging problems in reinforcement learning (RL). Hindsight Experience Replay (HER) attempts to address this issue by relabelling the goals of a failed experience so that it becomes a successful one. In open-ended and changing environments, agents face a wide range of potential tasks that might not come with associated reward functions. Such autonomous learning agents must set their own tasks and build their own curriculum through intrinsically motivated exploration. Because some tasks might prove easy and others impossible, agents must actively select which task to practice at any given moment to maximize their overall mastery of the set of learnable tasks. The purpose of this technical report is two-fold. First, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on existing robotics hardware. The tasks include pushing, sliding, and pick-and-place with a Fetch robotic arm, as well as in-hand object manipulation with a Shadow Dexterous Hand. All tasks have sparse binary rewards and follow a Multi-Goal RL framework in which an agent is told what to do using an additional input.
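The goal relabelling idea behind HER can be illustrated with a minimal sketch. It assumes details the abstract does not state: a reward of 0 on success and -1 otherwise, a distance threshold of 0.05, a hypothetical dictionary layout for each transition, and relabelling with the final achieved goal of the episode. It is meant as an illustration of the idea, not as the implementation used in the report.

```python
import numpy as np

def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    """Binary reward: 0.0 if the goal is reached within `threshold`, else -1.0.

    The values 0/-1 and the threshold are assumptions made for this sketch.
    """
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal))
    return 0.0 if distance <= threshold else -1.0

def relabel_with_final_goal(episode, threshold=0.05):
    """Return a copy of `episode` whose goal is the outcome actually reached.

    `episode` is a list of dicts (hypothetical layout) where "achieved_goal"
    is the outcome observed after taking "action".
    """
    new_goal = episode[-1]["achieved_goal"]
    relabelled = []
    for step in episode:
        relabelled.append({
            "observation": step["observation"],
            "action": step["action"],
            "achieved_goal": step["achieved_goal"],
            "desired_goal": new_goal,
            # Recompute the reward against the substituted goal.
            "reward": sparse_reward(step["achieved_goal"], new_goal, threshold),
        })
    return relabelled
```

An episode that never reaches its original goal receives only -1 rewards, yet its relabelled copy ends with a reward of 0, which gives an off-policy learner a success signal to train on. The goal itself is part of the policy input, which is what the abstract refers to as telling the agent what to do through an additional input.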

The second part of the paper presents a set of concrete research ideas for improving RL algorithms, most of which are related to Multi-Goal RL and Hindsight Experience Replay. The Fetch environments are based on the 7-DoF Fetch robotics arm, which has a two-fingered parallel gripper. Agents focus on achievable tasks first and return to tasks that are being forgotten. Experiments conducted in a new multi-task, multi-goal robotic environment show that our algorithm benefits from these two ideas and demonstrates robustness to distracting tasks, forgetting, and changes in body properties.
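One possible way to obtain the behaviour described above (practising achievable tasks first and returning to tasks that are being forgotten) is to sample tasks in proportion to the absolute change in their recent success rate. The abstract does not specify the selection rule, so the sketch below is an assumption throughout: the class name, the window size, the epsilon mixing term, and the use of success-rate differences as a progress measure are all hypothetical.

```python
import random
from collections import deque

class ProgressBasedTaskSelector:
    """Samples tasks in proportion to absolute change in success rate (a sketch)."""

    def __init__(self, n_tasks, window=10, epsilon=0.2, seed=0):
        # Keep the last 2 * window outcomes per task: an older and a recent half.
        self.history = [deque(maxlen=2 * window) for _ in range(n_tasks)]
        self.window = window
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def record(self, task, success):
        self.history[task].append(1.0 if success else 0.0)

    def progress(self, task):
        outcomes = list(self.history[task])
        if len(outcomes) < 2 * self.window:
            return 0.0  # Not enough data yet to estimate a change.
        older, recent = outcomes[:self.window], outcomes[self.window:]
        # Absolute value: both improvement and forgetting attract practice.
        return abs(sum(recent) - sum(older)) / self.window

    def probabilities(self):
        n = len(self.history)
        progress = [self.progress(t) for t in range(n)]
        total = sum(progress)
        if total == 0.0:
            return [1.0 / n] * n  # No signal: fall back to uniform sampling.
        # Mix with a uniform term so that every task is still tried sometimes.
        return [self.epsilon / n + (1 - self.epsilon) * p / total for p in progress]

    def sample(self):
        probs = self.probabilities()
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Under this rule, a task that always succeeds and a task that never succeeds both show zero change and are sampled rarely, while a task whose success rate is rising or falling receives most of the practice. The uniform term keeps mastered and currently impossible tasks under occasional observation so that a later change can be detected.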

Keywords:

Artificial Intelligence, Product Management, Data Science, Reinforcement Learning, Deep Learning, Multi-Goal Reinforcement Learning, Robotics

References

Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., and Zaremba, W. (2017). Hindsight experience replay. In Advances in Neural Information Processing Systems, pages 5055-5065.

Bellemare, M. G., Dabney, W., and Munos, R. (2017). A distributional perspective on reinforcement learning. arXiv preprint arXiv:1707.06887.

Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540.

Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., and Wu, Y. (2017). OpenAI Baselines.

Florensa, C., Held, D., Wulfmeier, M., and Abbeel, P. (2017). Reverse curriculum generation for reinforcement learning. arXiv preprint arXiv:1707.05300.

Gu, S., Lillicrap, T., Ghahramani, Z., Turner, R. E., Schölkopf, B., and Levine, S. (2017). Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning. arXiv preprint arXiv:1706.00387.

He, F. S., Liu, Y., Schwing, A. G., and Peng, J. (2016). Learning to play in a day: Faster deep reinforcement learning by optimality tightening. arXiv preprint arXiv:1611.01606.

Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

A Novel Approach for Color Image Steganography using NUBASI and Randomized Secret Sharing Algorithm. Indian Journal of Science and Technology, 8(S7), 228-235. https://doi.org/10.17485/ijst/2015/v8iS7/64275