Learning What Matters:

Adaptive Information Theoretic Objectives for Robot Exploration

RVIZ
¹Indiana University   ²Texas A&M University

RSS 2026

Can information gain really improve the learned policy?

Problem: information-theoretic exploration objectives in robotics are hard to design because many model parameters are weakly observable or unidentifiable, causing standard objectives to overestimate what exploration data can actually teach the system.

Method: QOED (Quasi-Optimal Experimental Design) analyzes the Fisher information matrix to identify learnable parameter directions, then adapts the exploration objective to prioritize those identifiable directions while suppressing nuisance effects from unidentifiable parameters.

Result: QOED provides a theoretically grounded approximation to ideal information-maximizing exploration and improves both exploration efficiency and downstream policy learning, achieving up to 35.23% gains from identifiable-direction selection and 21.98% gains from nuisance suppression across simulated and real-world robotics tasks.
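To make the idea of "learnable parameter directions" in the Method summary concrete, here is a minimal sketch that flags identifiable directions from the eigenvalues of a Fisher information matrix. It is a generic illustration, not the paper's selection rule: the function name `identifiable_directions`, the relative threshold `rel_tol`, and the toy matrix are all assumptions made for this example.

```python
import numpy as np

def identifiable_directions(F, rel_tol=1e-6):
    """Return eigenvalues and eigenvectors of a symmetric Fisher matrix F
    whose eigenvalue exceeds rel_tol times the largest eigenvalue.

    The thresholding rule is an assumption for illustration only.
    """
    eigvals, eigvecs = np.linalg.eigh(F)      # eigenvalues in ascending order
    keep = eigvals > rel_tol * eigvals.max()  # drop (near-)zero information
    return eigvals[keep], eigvecs[:, keep]

# Toy case: the data depend only on phi_1 + phi_2, so every score vector is
# proportional to [1, 1] and the Fisher matrix has rank one.
F_toy = np.array([[1.0, 1.0],
                  [1.0, 1.0]])

vals, vecs = identifiable_directions(F_toy)
# One direction survives: the sum phi_1 + phi_2 is learnable,
# while the difference phi_1 - phi_2 carries no information.
print(vals.shape[0], np.abs(vecs[:, 0]))
```

In this toy case no amount of exploration can separate the two parameters, which is the kind of direction a standard objective would still count as if it were learnable.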

A vivid visualization of the QOED method.

We start from the standard information-gain view of robot exploration: choose a policy that collects data most informative about the hidden parameters.

\[ \pi^\star = \operatorname*{arg\,max}_{\pi} \mathbb{E}\left[\operatorname{Info\text{-}Gain}(\text{data}\mid \text{model}, \pi)\right]. \]

Bayesian optimal experimental design (BOED) approximates this objective using the Fisher information matrix. For a trajectory \(\tau\) and parameters \(\phi\), the score \(g = \nabla_{\phi} \log p(\tau \mid \phi, \pi)\) measures how sensitive the log-likelihood of the observed trajectory is to parameter changes.

\[ \mathcal{B}_{\mathrm{BOED}}(\pi) = \operatorname{tr}(F_{\phi}^{\pi}), \qquad F_{\phi}^{\pi} = \mathbb{E}[g g^\top]. \]
Vanilla BOED maximizes information over all parameters. This can waste exploration on weakly observable or unidentifiable directions.
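As a rough illustration of the BOED objective, the sketch below estimates \(F = \mathbb{E}[g g^\top]\) by averaging outer products of sampled score vectors and then takes the trace. The linear-Gaussian toy model, the two "policies", and all function names are assumptions introduced for this example; they are not taken from the paper.

```python
import numpy as np

def fisher_from_scores(scores):
    """Monte Carlo estimate of F = E[g g^T] from an (N, d) array of scores."""
    scores = np.asarray(scores, dtype=float)
    return scores.T @ scores / scores.shape[0]

def boed_objective(scores):
    """Trace of the estimated Fisher matrix (the mean squared score norm)."""
    return float(np.trace(fisher_from_scores(scores)))

# Toy model (assumption): y = phi . x + noise, with noise ~ N(0, sigma^2).
# Its score is g = (y - phi . x) * x / sigma^2, and y - phi . x is the noise.
rng = np.random.default_rng(0)
sigma = 0.5

def sample_scores(inputs):
    noise = rng.normal(0.0, sigma, size=inputs.shape[0])
    return noise[:, None] * inputs / sigma**2

# Two hypothetical policies: A excites all three inputs, B never excites the third.
inputs_a = rng.normal(size=(5000, 3))
inputs_b = inputs_a * np.array([1.0, 1.0, 0.0])

score_a = boed_objective(sample_scores(inputs_a))
score_b = boed_objective(sample_scores(inputs_b))
print(score_a > score_b)  # A gathers information about all three parameters
```

Because the trace sums information over every parameter, it rewards policy A for exciting the third input whether or not that parameter matters for the downstream task.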

Agnostic QOED keeps only the selected critical parameters \(k\), but treats the discarded parameters \(\bar{k}\) as irrelevant.

\[ \mathcal{B}_{\mathrm{Agnostic}}(\pi) = \operatorname{tr}(F_{kk}^{\pi}). \]

QOED keeps the critical information that remains after removing the part predictable from nuisance parameters.

\[ \mathcal{B}_{\mathrm{QOED}}(\pi) = \operatorname{tr}\left( F_{kk}^{\pi} - F_{k\bar{k}}^{\pi} (F_{\bar{k}\bar{k}}^{\pi})^{-1} F_{\bar{k}k}^{\pi} \right). \]
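The two objectives above can be sketched directly from a given Fisher matrix. This is a minimal sketch under stated assumptions: the function names are hypothetical, the critical index set `k` is supplied by the caller, and a pseudo-inverse replaces the inverse of the nuisance block so the code still runs when that block is singular (a choice made here, not one stated in the source).

```python
import numpy as np

def agnostic_objective(F, k):
    """tr(F_kk): information in the critical block, ignoring nuisance parameters."""
    k = np.asarray(k)
    return float(np.trace(F[np.ix_(k, k)]))

def qoed_objective(F, k):
    """tr(F_kk - F_kb F_bb^+ F_bk): critical information left after removing
    the part that the nuisance block (b = complement of k) can explain."""
    k = np.asarray(k)
    kbar = np.setdiff1d(np.arange(F.shape[0]), k)
    F_kk = F[np.ix_(k, k)]
    if kbar.size == 0:                       # no nuisance parameters at all
        return float(np.trace(F_kk))
    F_kb = F[np.ix_(k, kbar)]
    F_bb = F[np.ix_(kbar, kbar)]
    # Assumption: pseudo-inverse instead of inverse, for singular nuisance blocks.
    shadow = F_kb @ np.linalg.pinv(F_bb) @ F_kb.T   # F_bk = F_kb^T (F symmetric)
    return float(np.trace(F_kk - shadow))

# Parameter 0 is critical, parameter 1 is a correlated nuisance parameter.
F = np.array([[2.0, 1.0],
              [1.0, 1.0]])
print(agnostic_objective(F, [0]))  # 2.0
print(qoed_objective(F, [0]))      # approximately 1.0, i.e. 2 - 1*1/1
```

When the off-diagonal block is zero the two objectives agree; they differ only to the extent that critical and nuisance scores are correlated.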

Intuitively, the subtracted term \(F_{k\bar{k}}^{\pi} (F_{\bar{k}\bar{k}}^{\pi})^{-1} F_{\bar{k}k}^{\pi}\) is the nuisance shadow: information in the critical block that can be explained by nuisance directions. QOED subtracts this shadow and optimizes the remaining conditional information.
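A small worked case may help make the nuisance shadow concrete. Assume, purely for illustration, one critical parameter and one nuisance parameter, with Fisher entries \(a, b > 0\) and cross term \(c\) (these symbols are not from the paper):

\[ F^{\pi} = \begin{pmatrix} a & c \\ c & b \end{pmatrix}, \qquad \mathcal{B}_{\mathrm{Agnostic}}(\pi) = a, \qquad \mathcal{B}_{\mathrm{QOED}}(\pi) = a - \frac{c^{2}}{b} = a\,(1 - \rho^{2}), \quad \rho = \frac{c}{\sqrt{ab}}. \]

Since \(F^{\pi}\) is positive semidefinite, \(|\rho| \le 1\). When the critical and nuisance scores are uncorrelated (\(\rho = 0\)) the two objectives coincide, and as \(|\rho| \to 1\) the QOED objective goes to zero, because everything the data say about the critical parameter could equally be explained by the nuisance parameter.

More generally, whenever \(F^{\pi}\) is invertible, the standard block-inverse identity gives

\[ F_{kk}^{\pi} - F_{k\bar{k}}^{\pi} (F_{\bar{k}\bar{k}}^{\pi})^{-1} F_{\bar{k}k}^{\pi} = \left( \left[ (F^{\pi})^{-1} \right]_{kk} \right)^{-1}, \]

so the matrix inside the QOED trace is the inverse of the Cramér-Rao bound on the critical parameters when the nuisance parameters are also unknown.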

Model-Based Policy Optimization Simulations

Videos: GO1 FLAT, GO1 ROUGH, G1 FLAT.

Model-Based Policy Optimization Experiments

Videos: FRANKA, FOREST, HIGHBAY, MESH.

BibTeX

@inproceedings{yuRSS26qoed,
    title={Learning What Matters: Adaptive Information Theoretic Objectives for Robot Exploration},
    author={Youwei Yu and Jionghao Wang and Zhengming Yu and Wenping Wang and Lantao Liu},
    booktitle={Proceedings of Robotics: Science and Systems},
    year={2026}
}