<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>LLMs | Learning And Signal Processing</title><link>https://ucl-lasp.github.io/tag/llms/</link><atom:link href="https://ucl-lasp.github.io/tag/llms/index.xml" rel="self" type="application/rss+xml"/><description>LLMs</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Sun, 25 Jan 2026 00:00:00 +0000</lastBuildDate><image><url>https://ucl-lasp.github.io/media/icon_hu488c70cfa50b07216f285734af4abcd1_22080_512x512_fill_lanczos_center_3.png</url><title>LLMs</title><link>https://ucl-lasp.github.io/tag/llms/</link></image><item><title>LLM Alignment &amp; Exploration</title><link>https://ucl-lasp.github.io/project/llm-alignment/</link><pubDate>Sun, 25 Jan 2026 00:00:00 +0000</pubDate><guid>https://ucl-lasp.github.io/project/llm-alignment/</guid><description>&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>Large Language Models (LLMs) are powerful, but aligning them with human preferences and encouraging them to explore novel solutions remain difficult. We bring techniques from control theory and exploration research to LLMs.&lt;/p>
&lt;h2 id="active-projects">Active Projects&lt;/h2>
&lt;h3 id="1-bayesian-optimization-from-human-feedback">1. Bayesian Optimization from Human Feedback&lt;/h3>
&lt;p>&lt;strong>Goal:&lt;/strong> Optimize LLM outputs with minimal human labelling.
&lt;strong>Details:&lt;/strong> We treat alignment as a Bayesian Optimization problem. By efficiently querying human preferences, we aim to find optimal prompts or model weights with theoretical regret bounds, minimizing the cost of human annotation.&lt;/p>
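&lt;p>As a rough illustration of learning from pairwise preference queries, the sketch below fits Bradley-Terry utility estimates for a handful of candidate prompts from simulated annotator comparisons. It is an illustrative toy, not the paper's algorithm: the candidate set, the simulated annotator, and the logistic update rule are all assumptions made here for demonstration.&lt;/p>

```python
import math
import random

random.seed(0)

# Hypothetical setup: five candidate prompts with hidden human utilities.
# A real pipeline would query annotators; a Bradley-Terry simulator stands in here.
true_utility = [0.0, 3.0, 0.5, 1.0, 0.2]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def human_prefers(i, j):
    """Simulated annotator: 1.0 if candidate i is preferred to candidate j."""
    p = sigmoid(true_utility[i] - true_utility[j])
    return random.choices([1.0, 0.0], weights=[p, 1.0 - p])[0]

# Utility estimates learned from pairwise feedback via a
# Bradley-Terry logistic gradient step.
est = [0.0] * len(true_utility)
lr = 0.5

for _ in range(1000):
    i, j = random.sample(range(len(est)), 2)
    y = human_prefers(i, j)
    p = sigmoid(est[i] - est[j])
    est[i] += lr * (y - p)
    est[j] -= lr * (y - p)

best = max(range(len(est)), key=lambda k: est[k])
print("best candidate:", best)
```

&lt;p>The actual project goes further than this toy: a Bayesian surrogate over the utility function lets queries be chosen actively rather than at random, which is what yields regret guarantees and keeps the number of human labels small.&lt;/p>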
&lt;h3 id="2-post-training-exploration">2. Post-Training Exploration&lt;/h3>
&lt;p>&lt;strong>Goal:&lt;/strong> Encourage LLMs to think &amp;ldquo;outside the box.&amp;rdquo;
&lt;strong>Details:&lt;/strong> Standard RLHF can lead to mode collapse (repetitive answers). We are investigating the impact of &lt;strong>intrinsic rewards&lt;/strong> on LLMs, encouraging the model to explore diverse reasoning paths and discover creative solutions during the fine-tuning phase.&lt;/p>
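&lt;p>A minimal sketch of the intrinsic-reward idea, under assumptions made here for illustration: a count-based novelty bonus is added to a stand-in task reward, so an answer the model keeps repeating earns a shrinking total reward while a novel answer still collects the full bonus. The function names and the bonus schedule are hypothetical, not the project's actual reward design.&lt;/p>

```python
import math
from collections import Counter

# Counts how often each answer string has been produced so far.
visit_counts = Counter()

def task_reward(answer):
    # Stand-in for an RLHF reward-model score.
    return 1.0 if answer == "correct" else 0.0

def total_reward(answer, beta=0.5):
    """Task reward plus a count-based novelty bonus that decays with repetition."""
    visit_counts[answer] += 1
    bonus = beta / math.sqrt(visit_counts[answer])
    return task_reward(answer) + bonus

r1 = total_reward("correct")   # first occurrence: full bonus
r2 = total_reward("correct")   # repeat: bonus shrinks
r_new = total_reward("novel")  # novel answer: full bonus again
print(r1, r2, r_new)
```

&lt;p>Because the repeated answer's bonus decays, a policy trained on this combined signal is pushed away from mode collapse and toward diverse reasoning paths.&lt;/p>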
&lt;h2 id="related-publications">Related Publications&lt;/h2>
&lt;div class="pub-list-item" style="margin-bottom: 1rem;">
&lt;i class="far fa-file-alt pub-icon" aria-hidden="true">&lt;/i>
&lt;span style="font-weight: bold;">Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds&lt;/span>&lt;br>
A Kayal, S Vakili, L Toni, D Shiu, A Bernacchia. &lt;em>ICML 2025&lt;/em>.&lt;br>
&lt;a class="btn btn-outline-primary btn-page-header btn-sm" href="https://ucl-lasp.github.io/publication/kayal-2025-bayesian/" target="_blank" rel="noopener">Details&lt;/a>
&lt;a class="btn btn-outline-primary btn-page-header btn-sm" href="https://arxiv.org/pdf/2411.01190.pdf" target="_blank" rel="noopener">PDF&lt;/a>
&lt;/div></description></item></channel></rss>