Learning What Matters:

Adaptive Information Theoretic Objectives for Robot Exploration

Youwei Yu¹ Jionghao Wang² Zhengming Yu² Wenping Wang² Lantao Liu¹

¹Indiana University ²Texas A&M University

RSS 2026

Can we really improve policy from information gains?

Abstract

Problem: information-theoretic exploration objectives in robotics are hard to design because many model parameters are weakly observable or unidentifiable, causing standard objectives to overestimate what exploration data can actually teach the system.

Method: QOED (Quasi-Optimal Experimental Design) analyzes the Fisher information matrix to identify learnable parameter directions, then adapts the exploration objective to prioritize those identifiable directions while suppressing nuisance effects from unidentifiable parameters.

Result: QOED provides a theoretically grounded approximation to ideal information-maximizing exploration and improves both exploration efficiency and downstream policy learning, achieving up to 35.23% gains from identifiable-direction selection and 21.98% gains from nuisance suppression across simulated and real-world robotics tasks.

A vivid visualization of the QOED method.

We start from the standard information-gain view of robot exploration: choose a policy that collects data most informative about the hidden parameters.

\[ \pi^\star = \operatorname*{arg\,max}_{\pi} \mathbb{E}\left[\operatorname{Info\text{-}Gain}(\text{data}\mid \text{model}, \pi)\right]. \]

Bayesian optimal experimental design (BOED) approximates this objective using the Fisher information matrix. For a trajectory \(\tau\) and parameters \(\phi\), the score \(g = \nabla_{\phi} \log p(\tau \mid \phi, \pi)\) measures how sensitive the log-likelihood of the observed data is to parameter changes.

\[ \mathcal{B}_{\mathrm{BOED}}(\pi) = \operatorname{tr}(F_{\phi}^{\pi}), \qquad F_{\phi}^{\pi} = \mathbb{E}[g g^\top]. \]
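To make the trace objective concrete, here is a minimal sketch that estimates \(F_{\phi}^{\pi} = \mathbb{E}[g g^\top]\) by Monte Carlo. It assumes a toy linear-Gaussian observation model \(y = X\phi + \varepsilon\), which is not taken from the paper; the names `score`, `fisher_monte_carlo`, and `boed_objective` and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def score(phi, X, y, sigma):
    """Score g = grad_phi log p(y | phi, X) for y = X @ phi + N(0, sigma^2) noise."""
    return X.T @ (y - X @ phi) / sigma**2

def fisher_monte_carlo(phi, X, sigma, n_samples, rng):
    """Estimate F = E[g g^T] by sampling trajectories from the assumed model."""
    d = phi.shape[0]
    F = np.zeros((d, d))
    for _ in range(n_samples):
        y = X @ phi + sigma * rng.standard_normal(X.shape[0])
        g = score(phi, X, y, sigma)
        F += np.outer(g, g)
    return F / n_samples

def boed_objective(F):
    """Trace objective tr(F) over all parameters."""
    return float(np.trace(F))

rng = np.random.default_rng(0)
phi = np.array([1.0, -0.5, 2.0])      # hypothetical hidden parameters
X = rng.standard_normal((20, 3))      # hypothetical inputs visited under some policy
sigma = 0.5

F_mc = fisher_monte_carlo(phi, X, sigma, 5000, rng)
F_exact = X.T @ X / sigma**2          # closed form for this linear-Gaussian model

print(boed_objective(F_mc), boed_objective(F_exact))
```

For this toy model the Fisher matrix has the closed form \(X^\top X / \sigma^2\), so the Monte Carlo estimate can be checked against it. In a robotics setting the score would typically come from automatic differentiation of a learned dynamics model instead.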

Vanilla BOED maximizes information over all parameters. This can waste exploration on weakly observable or unidentifiable directions.
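As a hedged illustration of what an unidentifiable direction looks like, the sketch below uses a toy model in which only the sum of two parameters affects the data. The model and numbers are assumptions made for this example, and eigen-decomposition is just one plausible way to inspect the Fisher matrix, not necessarily the selection rule used by QOED.

```python
import numpy as np

# Toy model (an assumption for illustration): y_t = (phi_1 + phi_2) * x_t + noise.
# Only the sum phi_1 + phi_2 affects the data, so the difference is unidentifiable.
x = np.array([0.5, 1.0, -2.0, 1.5])
sigma = 1.0

# Jacobian of the predicted mean w.r.t. phi: each row is [x_t, x_t].
J = np.stack([x, x], axis=1)
F = J.T @ J / sigma**2                  # Fisher information under Gaussian noise

# eigh returns eigenvalues in ascending order with matching eigenvector columns.
eigvals, eigvecs = np.linalg.eigh(F)

print("eigenvalues:", eigvals)
print("direction with (near-)zero information:", eigvecs[:, 0])
```

The eigenvector with a (numerically) zero eigenvalue points along \((1, -1)\): no amount of data from this model can separate the two parameters along that direction, while the \((1, 1)\) direction carries all of the information.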

Agnostic QOED keeps only the selected critical parameters, indexed by \(k\), and treats the discarded (nuisance) parameters, indexed by \(\bar{k}\), as irrelevant, ignoring their coupling with the critical block.

QOED keeps the critical information that remains after removing the part predictable from nuisance parameters.

\[ \mathcal{B}_{\mathrm{Agnostic}}(\pi) = \operatorname{tr}(F_{kk}^{\pi}) \]

\[ \mathcal{B}_{\mathrm{QOED}}(\pi) = \operatorname{tr}\left( F_{kk}^{\pi} - F_{k\bar{k}}^{\pi} (F_{\bar{k}\bar{k}}^{\pi})^{-1} F_{\bar{k}k}^{\pi} \right). \]

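Intuitively, the subtracted term \(F_{k\bar{k}}^{\pi} (F_{\bar{k}\bar{k}}^{\pi})^{-1} F_{\bar{k}k}^{\pi}\) (shown in orange in the original rendering) is the nuisance shadow: information in the critical block that can be explained by nuisance directions. QOED subtracts this shadow and optimizes the remaining conditional information.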
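The two objectives above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the Fisher matrix `F` is given and symmetric, the index list `k` is already chosen, and the function names and the optional `ridge` regularizer (for a singular nuisance block) are hypothetical additions, not part of the source.

```python
import numpy as np

def agnostic_objective(F, k):
    """tr(F_kk): information in the critical block, ignoring nuisance coupling."""
    return float(np.trace(F[np.ix_(k, k)]))

def qoed_objective(F, k, ridge=0.0):
    """tr(F_kk - F_kn F_nn^{-1} F_nk): trace of the Schur complement of the nuisance block."""
    k = np.asarray(k)
    kbar = np.setdiff1d(np.arange(F.shape[0]), k)   # nuisance indices
    F_kk = F[np.ix_(k, k)]
    F_kn = F[np.ix_(k, kbar)]
    # Hypothetical ridge term: keeps the solve well-posed if F_nn is singular.
    F_nn = F[np.ix_(kbar, kbar)] + ridge * np.eye(len(kbar))
    # F is symmetric, so F_nk = F_kn^T; solve avoids forming an explicit inverse.
    shadow = F_kn @ np.linalg.solve(F_nn, F_kn.T)
    return float(np.trace(F_kk - shadow))

# Hypothetical 3x3 Fisher matrix; parameter 0 is critical, parameters 1 and 2 are nuisance.
F = np.array([[4.0, 2.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
k = [0]

b_agnostic = agnostic_objective(F, k)   # 4.0
b_qoed = qoed_objective(F, k)           # 4.0 - 1.6 = 2.4

print(b_agnostic, b_qoed)
```

Here the nuisance shadow is 1.6, so the conditional information (2.4) is smaller than the agnostic value (4.0). For an invertible \(F\), the Schur complement equals the inverse of the \(kk\) block of \(F^{-1}\), which is why it never exceeds \(F_{kk}\); the two objectives coincide only when the coupling block \(F_{k\bar{k}}\) is zero.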

Model-Based Policy Optimization Simulations

Panels: GO1 FLAT, GO1 ROUGH, G1 FLAT

Model-Based Policy Optimization Experiments

Panels: FRANKA, FOREST, HIGHBAY, MESH

BibTeX

@inproceedings{
    yuRSS26qoed,
    title={Learning What Matters: Adaptive Information Theoretic Objectives for Robot Exploration},
    author={Youwei Yu and Jionghao Wang and Zhengming Yu and Wenping Wang and Lantao Liu},
    booktitle={Proceedings of Robotics: Science and Systems},
    year={2026}
}