LLM Alignment & Exploration
Overview
Large Language Models (LLMs) are powerful, but aligning them with human preferences and encouraging them to explore novel solutions remains difficult. We bring techniques from control theory and exploration research to LLMs.
Active Projects
1. Bayesian Optimization from Human Feedback
Goal: Optimize LLM outputs with minimal human labelling. Details: We frame alignment as a Bayesian optimization problem: by querying human preferences efficiently, we aim to find near-optimal prompts or model weights under theoretical regret bounds, minimizing the cost of human annotation.
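The loop above can be sketched in a few lines. The toy below is a minimal Bayesian optimization sketch, not our actual pipeline: it tunes a single scalar (a hypothetical sampling temperature) with a numpy Gaussian-process surrogate and a UCB acquisition rule, and the `human_rating` function is a stand-in for a real human rater, invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a noisy human rater; the true optimum (0.7)
# is unknown to the optimizer. NOT part of any real annotation system.
def human_rating(t):
    return np.exp(-(t - 0.7) ** 2 / 0.05) + 0.05 * rng.standard_normal()

candidates = np.linspace(0.0, 1.5, 61)   # temperatures under consideration

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel between two 1-D point sets.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

X = [0.1, 1.4]                           # two seed queries
y = [human_rating(0.1), human_rating(1.4)]
for _ in range(15):                      # budget: 15 further human queries
    Xa, ya = np.array(X), np.array(y)
    K = rbf(Xa, Xa) + 1e-3 * np.eye(len(Xa))   # GP prior + observation noise
    Ks = rbf(candidates, Xa)
    mu = Ks @ np.linalg.solve(K, ya)           # posterior mean
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    ucb = mu + 2.0 * np.sqrt(np.clip(var, 0.0, None))  # UCB acquisition
    x_next = candidates[np.argmax(ucb)]        # most promising query
    X.append(float(x_next))
    y.append(human_rating(x_next))

best = X[int(np.argmax(y))]
print(f"best temperature found: {best:.2f}")
```

UCB is just one convenient acquisition rule; expected improvement or Thompson sampling would slot into the same loop, and the regret guarantees mentioned above depend on which rule is used.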
2. Post-Training Exploration
Goal: Encourage LLMs to think “outside the box.” Details: Standard RLHF can collapse onto a few high-reward modes, producing repetitive answers. We are investigating intrinsic rewards that push the model to explore diverse reasoning paths and discover creative solutions during fine-tuning.
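One common form of intrinsic reward is a novelty bonus. The toy below is a minimal sketch, not an actual RLHF run: "responses" are random embedding vectors, `task_reward` is a hypothetical stand-in for a preference model, and the intrinsic term is the distance to the nearest previously generated response, in the spirit of novelty-search methods.

```python
import numpy as np

rng = np.random.default_rng(1)

BETA = 0.5                                # weight on the intrinsic bonus

def task_reward(z):
    # Hypothetical stand-in for a learned preference/reward model.
    return -float(np.linalg.norm(z - 1.0))

def novelty(z, archive):
    # Intrinsic bonus: distance to the nearest earlier response embedding.
    if not archive:
        return 0.0
    return min(float(np.linalg.norm(z - a)) for a in archive)

archive = []                              # embeddings of past responses
shaped_rewards = []
for _ in range(20):
    z = rng.standard_normal(4)            # a sampled "response" embedding
    r = task_reward(z) + BETA * novelty(z, archive)
    archive.append(z)
    shaped_rewards.append(r)

print(f"mean shaped reward over 20 samples: {np.mean(shaped_rewards):.2f}")
```

The design choice is the combined objective `task_reward + BETA * novelty`: with `BETA = 0`, samples that repeat a known high-reward mode are indistinguishable from novel ones, which is exactly the collapse described above.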