Fundamentals of RL & Agents

Overview

Reinforcement Learning (RL) has achieved remarkable success, yet fundamental challenges remain in making agents sample-efficient, scalable, and capable of long-term reasoning. Our research delves into the theoretical underpinnings of RL to build more robust autonomous agents.

Active Projects

1. Scalable Environments with JAX

Goal: Accelerate RL research by orders of magnitude.

Details: Building on our earlier work, Navix, we use JAX to create vectorised grid-world environments that compile directly to XLA. This allows massive parallelisation, letting us train agents in seconds rather than hours and explore meta-learning frontiers that were previously out of reach.
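As a rough illustration of the idea, the sketch below vectorises a toy grid-world step function with `jax.vmap` and compiles it with `jax.jit`. It is not Navix's actual API: the names `step`, `MOVES`, `GOAL`, `GRID_SIZE` and `batched_step`, the 5x5 grid and the reward rule are all assumptions made for this example.

```python
import jax
import jax.numpy as jnp

GRID_SIZE = 5  # hypothetical 5x5 grid, chosen for illustration
# Action index -> (row, col) displacement: up, down, left, right.
MOVES = jnp.array([[-1, 0], [1, 0], [0, -1], [0, 1]])
GOAL = jnp.array([GRID_SIZE - 1, GRID_SIZE - 1])


def step(pos, action):
    """Advance a single environment by one step (pure function, so it can be jitted)."""
    # Move, then clip so the agent stays inside the grid.
    new_pos = jnp.clip(pos + MOVES[action], 0, GRID_SIZE - 1)
    # Reward of 1 only when the agent lands on the goal cell.
    reward = jnp.all(new_pos == GOAL).astype(jnp.float32)
    return new_pos, reward


# vmap turns the single-environment step into a batched one;
# jit compiles the batched function into one XLA program.
batched_step = jax.jit(jax.vmap(step))

num_envs = 1024
positions = jnp.zeros((num_envs, 2), dtype=jnp.int32)  # all agents start at (0, 0)
actions = jax.random.randint(jax.random.PRNGKey(0), (num_envs,), 0, 4)
positions, rewards = batched_step(positions, actions)
```

Because `step` is written for one environment and batched afterwards, the same function can be scaled to more environments by changing only `num_envs`.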

2. Temporal Credit Assignment

Goal: Solve the “needle in a haystack” problem in long-horizon tasks.

Details: When a reward is delayed, how does the agent know which past action caused it? We are developing new mechanisms for credit assignment that go beyond simple backpropagation through time, allowing agents to connect cause and effect over thousands of steps.
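To make the difficulty concrete, the sketch below computes ordinary discounted returns for an episode whose only reward arrives at the final step. This is the standard baseline, not the new mechanisms described above; the function name `discounted_returns`, the 1000-step horizon and the discount factor 0.99 are assumptions chosen for illustration.

```python
import jax
import jax.numpy as jnp


def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} with a reverse scan over the episode."""

    def backup(next_return, reward):
        current_return = reward + gamma * next_return
        return current_return, current_return

    # reverse=True walks the episode from the last step to the first,
    # while keeping the outputs aligned with the original time indices.
    _, returns = jax.lax.scan(backup, jnp.float32(0.0), rewards, reverse=True)
    return returns


horizon = 1000
# A single reward of 1 at the final step: the "needle".
rewards = jnp.zeros(horizon).at[-1].set(1.0)
returns = discounted_returns(rewards, gamma=0.99)
# returns[0] equals 0.99 ** 999, roughly 4e-5: the first action receives
# almost no learning signal from the reward it may have caused.
```

Under these assumptions the credit reaching a step shrinks geometrically with its distance from the reward, which is one way to see why long horizons are hard for discounting alone.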

3. Sample Efficiency via Invariances

Goal: Learn faster by understanding symmetries.

Details: We incorporate group theory into RL agents. By explicitly encoding known invariances (e.g., rotation, translation) into the network structure or the learning objective, we drastically reduce the number of samples needed to master a task.
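One standard way to encode an invariance in the structure of a function is group averaging, sketched below for the four 90-degree rotations of a square grid observation. This is a generic construction and not necessarily the one used in the project; `base_value`, `c4_invariant_value`, the linear readout and the 5x5 observation are hypothetical choices for this example.

```python
import jax
import jax.numpy as jnp


def base_value(params, obs):
    """Hypothetical unconstrained value function: a linear readout of the grid."""
    return jnp.sum(params * obs)


def c4_invariant_value(params, obs):
    """Average base_value over the four 90-degree rotations of the observation.

    Rotating obs only permutes the four terms of the average, so the
    output is unchanged: the invariance holds by construction.
    """
    rotations = jnp.stack([jnp.rot90(obs, k) for k in range(4)])
    return jnp.mean(jax.vmap(base_value, in_axes=(None, 0))(params, rotations))


key_params, key_obs = jax.random.split(jax.random.PRNGKey(0))
params = jax.random.normal(key_params, (5, 5))
obs = jax.random.normal(key_obs, (5, 5))

value = c4_invariant_value(params, obs)
value_rotated = c4_invariant_value(params, jnp.rot90(obs))
# value and value_rotated agree up to floating-point error.
```

Since the averaged function gives the same output for all four orientations, the agent does not need to see each orientation separately in its training data, which is the intuition behind the sample savings.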

Publications

Near-Optimal Sample Complexity in Reward-Free Kernel-Based Reinforcement Learning
A Kayal, S Vakili, L Toni, A Bernacchia. AISTATS 2025.
Reward-Free Kernel-Based Reinforcement Learning
A Kayal, S Vakili, L Toni, A Bernacchia. ICML 2024.
Navix: Scaling MiniGrid Environments with JAX
E Pignatelli, J Liesen, RT Lange, C Lu, PS Castro, L Toni. NeurIPS 2025 Dataset Track.
Assessing the zero-shot capabilities of LLMs for action evaluation in RL
E Pignatelli, J Ferret, T Rocktäschel, E Grefenstette, D Paglieri, S Coward, et al. arXiv preprint 2024.
A survey of temporal credit assignment in deep reinforcement learning
E Pignatelli, J Ferret, M Geist, T Mesnard, H van Hasselt, O Pietquin, L Toni. arXiv preprint 2023.
Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds
A Kayal, S Vakili, L Toni, D Shiu, A Bernacchia. ICML 2025.