<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Agents | Learning And Signal Processing</title><link>https://ucl-lasp.github.io/tag/agents/</link><atom:link href="https://ucl-lasp.github.io/tag/agents/index.xml" rel="self" type="application/rss+xml"/><description>Agents</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Sun, 25 Jan 2026 00:00:00 +0000</lastBuildDate><image><url>https://ucl-lasp.github.io/media/icon_hu488c70cfa50b07216f285734af4abcd1_22080_512x512_fill_lanczos_center_3.png</url><title>Agents</title><link>https://ucl-lasp.github.io/tag/agents/</link></image><item><title>Fundamentals of RL &amp; Agents</title><link>https://ucl-lasp.github.io/project/rl-fundamentals/</link><pubDate>Sun, 25 Jan 2026 00:00:00 +0000</pubDate><guid>https://ucl-lasp.github.io/project/rl-fundamentals/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>Reinforcement Learning (RL) has achieved remarkable success, yet fundamental challenges remain in making agents sample-efficient, scalable, and capable of long-term reasoning. Our research delves into the theoretical underpinnings of RL to build more robust autonomous agents.&lt;/p>
&lt;h2 id="active-projects">Active Projects&lt;/h2>
&lt;h3 id="1-scalable-environments-with-jax">1. Scalable Environments with JAX&lt;/h3>
&lt;p>&lt;strong>Goal:&lt;/strong> Accelerate RL research by orders of magnitude.
&lt;strong>Details:&lt;/strong> Building on our work &lt;strong>Navix&lt;/strong>, we leverage JAX to create vectorised grid-world environments that compile directly to XLA. The resulting massive parallelisation lets us train agents in seconds rather than hours and explore meta-learning frontiers previously out of reach.&lt;/p>
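To illustrate the vectorisation principle (this is a simplified NumPy sketch, not Navix's actual API; Navix achieves the same effect with JAX's `vmap` and `jit` compiling to XLA):

```python
import numpy as np

GRID_SIZE = 5
# Action deltas: up, down, left, right
DELTAS = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])

def step_batch(positions, actions):
    """Advance every environment in the batch with one array operation.

    positions: (n_envs, 2) integer agent coordinates
    actions:   (n_envs,)  integers in {0, 1, 2, 3}
    """
    moved = positions + DELTAS[actions]       # one gather + add for all envs
    return np.clip(moved, 0, GRID_SIZE - 1)  # keep agents inside the grid

positions = np.zeros((4, 2), dtype=int)  # 4 environments, all starting at (0, 0)
actions = np.array([1, 3, 0, 1])         # down, right, up, down
positions = step_batch(positions, actions)
print(positions.tolist())  # [[1, 0], [0, 1], [0, 0], [1, 0]]
```

Because every environment is stepped by the same array expression, thousands of rollouts cost roughly the same wall-clock time as one; under JAX the same pattern additionally fuses and compiles onto accelerators.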
&lt;h3 id="2-temporal-credit-assignment">2. Temporal Credit Assignment&lt;/h3>
&lt;p>&lt;strong>Goal:&lt;/strong> Solve the &amp;ldquo;needle in a haystack&amp;rdquo; problem in long-horizon tasks.
&lt;strong>Details:&lt;/strong> When a reward is delayed, how does the agent know which past action caused it? We are developing new mechanisms for &lt;strong>credit assignment&lt;/strong> that go beyond simple backpropagation through time, allowing agents to connect cause and effect over thousands of steps.&lt;/p>
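As a point of reference for the problem (this sketch shows eligibility traces, a classical credit-assignment mechanism, not the group's own method; the state names are hypothetical):

```python
def td_lambda_episode(values, trajectory, reward, alpha=0.5, gamma=1.0, lam=0.9):
    """One episode of TD(lambda) with a single reward at the final step."""
    traces = {s: 0.0 for s in values}
    for t, s in enumerate(trajectory):
        traces[s] += 1.0  # mark the current state as recently visited
        terminal = t == len(trajectory) - 1
        r = reward if terminal else 0.0
        next_v = 0.0 if terminal else values[trajectory[t + 1]]
        delta = r + gamma * next_v - values[s]
        for k in traces:  # credit flows to every traced (earlier) state
            values[k] += alpha * delta * traces[k]
            traces[k] *= gamma * lam  # older states receive less credit
    return values

V = {"s0": 0.0, "s1": 0.0, "s2": 0.0}
V = td_lambda_episode(V, ["s0", "s1", "s2"], reward=1.0)
print(V)  # earlier states get smaller but non-zero credit: s0 < s1 < s2
```

The exponentially decaying trace already struggles when cause and effect are thousands of steps apart, which is precisely the regime the mechanisms above target.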
&lt;h3 id="3-sample-efficiency-via-invariances">3. Sample Efficiency via Invariances&lt;/h3>
&lt;p>&lt;strong>Goal:&lt;/strong> Learn faster by understanding symmetries.
&lt;strong>Details:&lt;/strong> We incorporate group theory into RL agents. By explicitly encoding known invariances (e.g., rotation, translation) into the network structure or the learning objective, we drastically reduce the number of samples needed to master a task.&lt;/p>
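One simple way to encode a known symmetry, shown here as an illustrative sketch rather than the group's specific construction: average a value estimate over the orbit of the 90-degree rotation group C4, which makes any wrapped function exactly invariant by construction.

```python
import numpy as np

def rotations(obs):
    """All four 90-degree rotations of a square observation."""
    return [np.rot90(obs, k) for k in range(4)]

def invariant_value(value_fn, obs):
    """Symmetrise an arbitrary value function over C4 by orbit averaging."""
    return sum(value_fn(o) for o in rotations(obs)) / 4.0

# A deliberately non-invariant 'network': it only reads the top-left cell.
value_fn = lambda o: float(o[0, 0])

obs = np.arange(9.0).reshape(3, 3)
v1 = invariant_value(value_fn, obs)
v2 = invariant_value(value_fn, np.rot90(obs))
print(v1 == v2)  # True: rotated inputs receive identical values
```

Since every rotated copy of a state is guaranteed the same value, the agent never needs separate experience for each orientation, which is where the sample savings come from.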
&lt;h2 id="related-publications">Related Publications&lt;/h2>
&lt;div class="pub-list-item" style="margin-bottom: 1rem;">
&lt;i class="far fa-file-alt pub-icon" aria-hidden="true">&lt;/i>
&lt;span style="font-weight: bold;">Near-Optimal Sample Complexity in Reward-Free Kernel-Based Reinforcement Learning&lt;/span>&lt;br>
A Kayal, S Vakili, L Toni, A Bernacchia. &lt;em>AISTATS 2025&lt;/em>.&lt;br>
&lt;a class="btn btn-outline-primary btn-page-header btn-sm" href="https://ucl-lasp.github.io/publication/kayal-2025-sample/" target="_blank" rel="noopener">Details&lt;/a>
&lt;a class="btn btn-outline-primary btn-page-header btn-sm" href="https://arxiv.org/pdf/2502.07715.pdf" target="_blank" rel="noopener">PDF&lt;/a>
&lt;/div>
&lt;div class="pub-list-item" style="margin-bottom: 1rem;">
&lt;i class="far fa-file-alt pub-icon" aria-hidden="true">&lt;/i>
&lt;span style="font-weight: bold;">Reward-Free Kernel-Based Reinforcement Learning&lt;/span>&lt;br>
A Kayal, S Vakili, L Toni, A Bernacchia. &lt;em>ICML 2024&lt;/em>.&lt;br>
&lt;a class="btn btn-outline-primary btn-page-header btn-sm" href="https://ucl-lasp.github.io/publication/kayal-2024-reward/" target="_blank" rel="noopener">Details&lt;/a>
&lt;a class="btn btn-outline-primary btn-page-header btn-sm" href="https://arxiv.org/pdf/2502.07715.pdf" target="_blank" rel="noopener">PDF&lt;/a>
&lt;/div>
&lt;div class="pub-list-item" style="margin-bottom: 1rem;">
&lt;i class="far fa-file-alt pub-icon" aria-hidden="true">&lt;/i>
&lt;span style="font-weight: bold;">Navix: Scaling MiniGrid Environments with JAX&lt;/span>&lt;br>
E Pignatelli, J Liesen, RT Lange, C Lu, PS Castro, L Toni. &lt;em>NeurIPS 2025 Datasets and Benchmarks Track&lt;/em>.&lt;br>
&lt;a class="btn btn-outline-primary btn-page-header btn-sm" href="https://ucl-lasp.github.io/publication/pignatelli-2025-navix/" target="_blank" rel="noopener">Details&lt;/a>
&lt;a class="btn btn-outline-primary btn-page-header btn-sm" href="https://arxiv.org/abs/2407.19396" target="_blank" rel="noopener">PDF&lt;/a>
&lt;/div>
&lt;div class="pub-list-item" style="margin-bottom: 1rem;">
&lt;i class="far fa-file-alt pub-icon" aria-hidden="true">&lt;/i>
&lt;span style="font-weight: bold;">Assessing the zero-shot capabilities of LLMs for action evaluation in RL&lt;/span>&lt;br>
E Pignatelli, J Ferret, T Rocktäschel, E Grefenstette, D Paglieri, S Coward, et al. &lt;em>arXiv preprint 2024&lt;/em>.&lt;br>
&lt;a class="btn btn-outline-primary btn-page-header btn-sm" href="https://ucl-lasp.github.io/publication/pignatelli-2024-assessing/" target="_blank" rel="noopener">Details&lt;/a>
&lt;a class="btn btn-outline-primary btn-page-header btn-sm" href="https://arxiv.org/pdf/2409.12798.pdf" target="_blank" rel="noopener">PDF&lt;/a>
&lt;/div>
&lt;div class="pub-list-item" style="margin-bottom: 1rem;">
&lt;i class="far fa-file-alt pub-icon" aria-hidden="true">&lt;/i>
&lt;span style="font-weight: bold;">A survey of temporal credit assignment in deep reinforcement learning&lt;/span>&lt;br>
E Pignatelli, J Ferret, M Geist, T Mesnard, H van Hasselt, O Pietquin, L Toni. &lt;em>arXiv preprint 2023&lt;/em>.&lt;br>
&lt;a class="btn btn-outline-primary btn-page-header btn-sm" href="https://ucl-lasp.github.io/publication/pignatelli-2023-survey/" target="_blank" rel="noopener">Details&lt;/a>
&lt;a class="btn btn-outline-primary btn-page-header btn-sm" href="https://arxiv.org/pdf/2312.01072.pdf" target="_blank" rel="noopener">PDF&lt;/a>
&lt;/div>
&lt;div class="pub-list-item" style="margin-bottom: 1rem;">
&lt;i class="far fa-file-alt pub-icon" aria-hidden="true">&lt;/i>
&lt;span style="font-weight: bold;">Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds&lt;/span>&lt;br>
A Kayal, S Vakili, L Toni, D Shiu, A Bernacchia. &lt;em>ICML 2025&lt;/em>.&lt;br>
&lt;a class="btn btn-outline-primary btn-page-header btn-sm" href="https://ucl-lasp.github.io/publication/kayal-2025-bayesian/" target="_blank" rel="noopener">Details&lt;/a>
&lt;a class="btn btn-outline-primary btn-page-header btn-sm" href="https://arxiv.org/pdf/2411.01190.pdf" target="_blank" rel="noopener">PDF&lt;/a>
&lt;/div></description></item></channel></rss>