Nagajayant Nagamani
- Software Engagement Manager, Chennai, Tamil Nadu, India
Abstract
This paper investigates Random Latent Exploration (RLE), a novel reinforcement learning technique that enhances exploration using randomized latent vector conditioning. I evaluate RLE’s performance across various environments, including discrete control tasks (FourRoom), continuous control (IsaacLab), and complex visual domains (Atari games). The core approach augments traditional reward functions with intrinsic rewards, calculated as the dot product between state features and periodically resampled latent vectors. The policy and value networks are conditioned on these latent vectors and trained to optimize cumulative rewards that combine both extrinsic and intrinsic signals. I benchmark RLE against established methods such as Proximal Policy Optimization (PPO), NoisyNet, and Random Network Distillation (RND). Results show that RLE consistently improves exploration and performance, particularly in sparse-reward settings. Notably, it achieves greater state-space coverage, as measured by Shannon entropy of state visitations. Further analysis shows that the choice of latent vector distribution significantly affects performance, with standard normal and von Mises distributions outperforming uniform and exponential ones. I also propose two adaptive variants—queue-based and neural-adaptive von Mises-Fisher—that dynamically adjust exploration and yield improved outcomes in complex tasks. This study supports RLE’s effectiveness while highlighting areas for further development in exploration-driven reinforcement learning.
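The intrinsic-reward mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the feature map, the intrinsic-reward coefficient `BETA`, the latent dimensionality, and the resampling interval are all assumed placeholder values, and the conditioning of the policy and value networks on the latent vector is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

FEATURE_DIM = 8     # dimensionality of the state-feature / latent space (assumed)
RESAMPLE_EVERY = 4  # resample the latent vector every N steps (assumed)
BETA = 0.1          # intrinsic-reward coefficient (assumed)

def features(state):
    """Placeholder feature map phi(s); in practice this is a learned network."""
    return np.tanh(state)

def rle_rewards(states, extrinsic):
    """Combine extrinsic rewards with RLE-style intrinsic rewards phi(s) . z."""
    total = []
    z = None
    for t, (s, r_ext) in enumerate(zip(states, extrinsic)):
        if t % RESAMPLE_EVERY == 0:
            # Periodically resample the latent vector from a standard normal,
            # one of the distributions the study found to perform well.
            z = rng.standard_normal(FEATURE_DIM)
            z /= np.linalg.norm(z)  # unit norm keeps the reward scale stable
        r_int = float(features(s) @ z)  # dot product with state features
        total.append(r_ext + BETA * r_int)
    return total
```

Within one resampling window the same state yields the same intrinsic bonus; after resampling, the bonus changes direction, which is what drives the randomized exploration behavior.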
Keywords: Random Network Distillation (RND), Reinforcement Learning, NoisyNet, Random Latent Exploration (RLE), IsaacLab
[This article belongs to Current Trends in Signal Processing]
Nagajayant Nagamani. Randomized Latent Vectors for Enhanced Reinforcement Learning Exploration. Current Trends in Signal Processing. 2025; 15(03):19-25. Available from: https://journals.stmjournals.com/ctsp/article=2025/view=222434
References
- Mahankali S, Hong ZW, Sekhari A, Rakhlin A, Agrawal P. Random latent exploration for deep reinforcement learning. arXiv preprint arXiv:2407.13755. 2024 Jul 18.
- Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347. 2017 Jul 20.
- Fortunato M, Azar MG, Piot B, Menick J, Osband I, Graves A, Mnih V, Munos R, Hassabis D, Pietquin O, Blundell C. Noisy networks for exploration. arXiv preprint arXiv:1706.10295. 2017.
- Burda Y, Edwards H, Storkey A, Klimov O. Exploration by random network distillation. arXiv preprint arXiv:1810.12894. 2018 Oct 30.
- Sutton RS, Precup D, Singh S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial intelligence. 1999 Aug 1;112(1–2):181–211.
- Towers M, Kwiatkowski A, Terry J, Balis JU, De Cola G, Deleu T, Goulão M, Kallinteris A, Krimmel M, KG A, Perez-Vicente R. Gymnasium: A standard interface for reinforcement learning environments. arXiv preprint arXiv:2407.17032. 2024 Jul 24.
- Shannon CE. A mathematical theory of communication. The Bell system technical journal. 1948 Jul;27(3):379–423.
- Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602. 2013 Dec 19.
- Badia AP, Piot B, Kapturowski S, Sprechmann P, Vitvitskyi A, Guo ZD, Blundell C. Agent57: Outperforming the Atari human benchmark. In: International Conference on Machine Learning. 2020. pp. 507–517. PMLR.
- Serrano-Muñoz A, Chrysostomou D, Bøgh S, Arana-Arexolaleiba N. skrl: Modular and Flexible Library for Reinforcement Learning.
- Pinzón C, Jung K. Fast Python sampler for the von Mises Fisher distribution.

Current Trends in Signal Processing
| Volume | 15 |
| Issue | 03 |
| Received | 05/07/2025 |
| Accepted | 25/07/2025 |
| Published | 07/08/2025 |
| Publication Time | 33 Days |