Title: Efficient Adaptation of Reinforcement Learning Agents to Sudden Environmental Change
Date: Thursday, November 21st, 2024
Time: 9:00 AM - 11:00 AM EST
Location: Coda C1115 Druid Hills
Remote Meeting Link: Zoom Link
Meeting ID: 924 3120 2756
Passcode: 438181
Jonathan Balloch
Robotics PhD Student
School of Interactive Computing
Georgia Institute of Technology
Committee
Dr. Mark Riedl (Advisor): School of Interactive Computing, Georgia Tech
Dr. Seth Hutchinson: School of Interactive Computing, Georgia Tech
Dr. Sehoon Ha: School of Interactive Computing, Georgia Tech
Dr. Harish Ravichandar: School of Interactive Computing, Georgia Tech
Dr. Michael Littman: Brown University; National Science Foundation
Abstract
Real-world autonomous decision-making systems, from robots to recommendation engines, must operate in environments that change over time. While deep reinforcement learning (RL) has shown an impressive ability to learn optimal policies in stationary environments, most methods are data-intensive and assume a world that does not change between training and test time. As a result, conventional RL methods struggle to adapt when conditions change. This poses a fundamental challenge: how can RL agents efficiently adapt their behavior when they encounter novel environmental changes during deployment, without catastrophically forgetting useful prior knowledge? This dissertation demonstrates that efficient online adaptation requires two key capabilities: (1) prioritized exploration and sampling strategies that help identify and learn from relevant experiences, and (2) selective preservation of prior knowledge through structured representations that can be updated without disrupting reusable components.
We first establish a formal framework for studying online test-time adaptation (OTTA) in RL by introducing the Novelty Minigrid (NovGrid) test environment, along with metrics that systematically assess adaptation performance and characterize how different adaptation solutions handle various types of environmental change. We then turn to solutions for OTTA problems by investigating how different exploration and sampling strategies affect adaptation. Through a comprehensive evaluation of model-free exploration strategies, we show that methods emphasizing stochasticity and explicit diversity are most effective for adaptation across different novelty types. Building on these insights, we develop the Dual Objective Priority Sampling (DOPS) strategy. DOPS improves model-based RL adaptation by training the policy and world model on different subsets of data, each prioritized according to its own learning objective. By balancing the trade-off between distribution overlap and mismatched objectives, DOPS achieves more sample-efficient adaptation while maintaining stable performance.
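The abstract does not spell out how DOPS is implemented; purely as a rough illustration of the idea, the Python sketch below keeps two priority signals over a single replay buffer and draws separate batches for the world model and the policy. The priority signals (world-model prediction error and TD error) and all class and method names here are assumptions, not details taken from the dissertation.

    import numpy as np

    class DualObjectiveReplay:
        # Sketch of a replay buffer with two priority vectors over the same data:
        # one used to draw batches for the world model, one for the policy.
        def __init__(self, capacity):
            self.capacity = capacity
            self.storage = []                    # transitions, e.g., (s, a, r, s')
            self.model_prio = np.ones(capacity)  # assumed: world-model prediction error
            self.policy_prio = np.ones(capacity) # assumed: TD error
            self.pos = 0

        def add(self, transition):
            if len(self.storage) < self.capacity:
                self.storage.append(transition)
            else:
                self.storage[self.pos] = transition
            # New experience starts at maximal priority so it is sampled soon
            # after a novelty is encountered.
            max_p = max(self.model_prio.max(), self.policy_prio.max())
            self.model_prio[self.pos] = max_p
            self.policy_prio[self.pos] = max_p
            self.pos = (self.pos + 1) % self.capacity

        def _sample(self, priorities, batch_size):
            n = len(self.storage)
            probs = priorities[:n] / priorities[:n].sum()
            idx = np.random.choice(n, size=batch_size, p=probs)
            return idx, [self.storage[i] for i in idx]

        def sample_for_world_model(self, batch_size):
            return self._sample(self.model_prio, batch_size)

        def sample_for_policy(self, batch_size):
            return self._sample(self.policy_prio, batch_size)

        def update_priorities(self, idx, model_errors=None, td_errors=None):
            # Each objective refreshes only its own priorities after an update step.
            if model_errors is not None:
                self.model_prio[idx] = np.abs(model_errors) + 1e-6
            if td_errors is not None:
                self.policy_prio[idx] = np.abs(td_errors) + 1e-6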
To pair adaptation efficiency with knowledge preservation, we develop WorldCloner, a neurosymbolic approach that enables rapid world-model updates while preserving useful prior knowledge in a symbolic, rule-based representation. WorldCloner demonstrates how structured knowledge representation can dramatically improve adaptation efficiency compared to traditional neural approaches. Finally, we present Concept Bottleneck World Models (CBWMs), which extend these insights into an end-to-end differentiable architecture. By grounding learned representations in human-interpretable concepts, CBWMs enable selective preservation of unchanged knowledge during adaptation while maintaining competitive task performance. CBWMs provide a practical path toward interpretable and efficient adaptation in neural RL systems.
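As a hedged illustration rather than the dissertation's actual architecture, the sketch below shows one way a concept bottleneck world model could support selective preservation: gradients on the dynamics head are masked so that only concept dimensions affected by a novelty are updated during adaptation. The layer shapes, the gradient-masking mechanism, and all names are hypothetical.

    import torch
    import torch.nn as nn

    class ConceptBottleneckWorldModel(nn.Module):
        # Sketch: observation -> interpretable concepts -> predicted next concepts.
        def __init__(self, obs_dim, n_concepts, action_dim):
            super().__init__()
            self.concept_encoder = nn.Linear(obs_dim, n_concepts)
            self.dynamics = nn.Linear(n_concepts + action_dim, n_concepts)

        def forward(self, obs, action):
            # Concepts are bounded in [0, 1] so each dimension can be read as a
            # human-interpretable factor (e.g., "door is open").
            concepts = torch.sigmoid(self.concept_encoder(obs))
            next_concepts = torch.sigmoid(
                self.dynamics(torch.cat([concepts, action], dim=-1)))
            return concepts, next_concepts

        def freeze_unchanged_concepts(self, changed_idx):
            # Selective preservation: zero the gradients of dynamics rows that
            # predict concepts unaffected by the novelty, so only the changed
            # concept dynamics are revised during adaptation.
            mask = torch.zeros(self.dynamics.out_features)
            mask[changed_idx] = 1.0
            self.dynamics.weight.register_hook(lambda g: g * mask.unsqueeze(1))
            self.dynamics.bias.register_hook(lambda g: g * mask)

    # Usage (hypothetical): after detecting that only concept 3 changed,
    # adapt the world model while leaving the other concept dynamics frozen.
    model = ConceptBottleneckWorldModel(obs_dim=16, n_concepts=8, action_dim=4)
    model.freeze_unchanged_concepts(changed_idx=[3])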
Together, these contributions advance both the theoretical understanding and practical capabilities of adaptive RL systems. By showing how careful exploration and structured knowledge preservation can enable efficient online adaptation, this work helps bridge the gap between current RL systems and the demands of real-world applications where change is constant and adaptation essential.