Vision-centric state estimation and mapping for visually challenging scenarios
- Publication Date
- 2025
- Abstract
Reliable 3D scene understanding is essential for enabling autonomous robot operation in complex environments. This thesis addresses vision-based state estimation and mapping in visually challenging scenarios, where conventional methods often struggle due to motion blur, low light, or highly dynamic motion. The overarching goal is to develop vision-centric systems that enhance state estimation and scene interpretation by leveraging both novel sensing technologies and robust multi-session mapping strategies.
The first contribution of this thesis is a stereo event-based visual odometry (VO) system that fully exploits the asynchronous, high-temporal-resolution nature of event cameras. Unlike traditional frame-based VO systems, which estimate robot states discretely at a fixed rate, the proposed system models camera motion as a continuous-time trajectory, enabling per-event state estimation. It combines asynchronous feature tracking with a physically grounded motion prior to estimate a smooth trajectory whose pose can be queried at any time within the measurement window. Experimental results demonstrate that the system achieves competitive performance under high-speed motion and challenging lighting conditions, offering a promising alternative for continuous-time state estimation on asynchronous data streams.
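To make the continuous-time idea concrete, the sketch below shows how a pose can be queried at an arbitrary timestamp rather than only at frame boundaries. It is a minimal illustration only: the thesis estimates the trajectory with asynchronous feature tracks and a physically grounded motion prior, whereas this sketch simply interpolates on-manifold between hypothetical timestamped control poses; all class and function names are illustrative assumptions, not the thesis' implementation.

```python
# Minimal sketch: querying a continuous-time trajectory at any timestamp,
# e.g. at the timestamp of an individual event. Hypothetical names throughout.
import numpy as np

def so3_exp(w):
    """Rodrigues' formula: axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-9:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def so3_log(R):
    """Rotation matrix -> axis-angle vector."""
    cos_theta = np.clip((np.trace(R) - 1) / 2, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-9:
        return np.zeros(3)
    return theta / (2 * np.sin(theta)) * np.array(
        [R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])

class ContinuousTrajectory:
    """Piecewise-geodesic interpolation between timestamped control poses."""
    def __init__(self, stamps, rotations, translations):
        self.t = np.asarray(stamps)   # (N,) increasing timestamps
        self.R = rotations            # list of (3,3) rotation matrices
        self.p = translations         # list of (3,) positions

    def query(self, t):
        """Return (R, p) at an arbitrary time inside the window."""
        i = int(np.clip(np.searchsorted(self.t, t) - 1, 0, len(self.t) - 2))
        a = (t - self.t[i]) / (self.t[i + 1] - self.t[i])  # blend in [0, 1]
        # Geodesic (slerp-like) blend of rotation, linear blend of position.
        dR = so3_log(self.R[i].T @ self.R[i + 1])
        R = self.R[i] @ so3_exp(a * dR)
        p = (1 - a) * self.p[i] + a * self.p[i + 1]
        return R, p

# Usage: the pose is available at an event's timestamp, not a frame rate.
traj = ContinuousTrajectory(
    stamps=[0.0, 0.1],
    rotations=[np.eye(3), so3_exp(np.array([0.0, 0.0, 0.2]))],
    translations=[np.zeros(3), np.array([0.05, 0.0, 0.0])],
)
R_evt, p_evt = traj.query(0.037)
```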
The second contribution introduces Exosense, a scene understanding system tailored to self-balancing exoskeletons. Built around a wide field-of-view multi-camera device, Exosense generates rich, semantically annotated elevation maps that integrate geometry, terrain traversability, and room-level semantics. The system supports indoor navigation by providing reusable environment representations for localization and planning. Designed as a wearable sensing platform, Exosense emphasizes modularity and adaptability, with the potential for integration into a broader wearable sensor ecosystem.
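A semantically annotated elevation map of this kind can be pictured as a 2.5D grid with per-cell layers for height, traversability, and a room-level label. The sketch below is an assumption-laden simplification of that representation: the layer set, the update rule, and all names are illustrative, not Exosense's actual data structure.

```python
# Minimal sketch of a semantically annotated elevation map: a 2.5D grid with
# geometry, traversability, and room-level semantic layers. Hypothetical API.
import numpy as np

class SemanticElevationMap:
    def __init__(self, size=(100, 100), resolution=0.05):
        self.resolution = resolution                    # metres per cell
        self.height = np.full(size, np.nan)             # elevation layer
        self.traversability = np.zeros(size)            # 0 (blocked) .. 1 (free)
        self.room_label = np.full(size, -1, dtype=int)  # semantic room id

    def _cell(self, x, y):
        return int(x / self.resolution), int(y / self.resolution)

    def update(self, x, y, z, trav, label):
        """Fuse one measurement; keep the latest values (illustrative rule)."""
        i, j = self._cell(x, y)
        self.height[i, j] = z
        self.traversability[i, j] = trav
        self.room_label[i, j] = label

    def is_traversable(self, x, y, threshold=0.5):
        i, j = self._cell(x, y)
        return self.traversability[i, j] >= threshold

# Usage: a planner queries geometry and semantics through one interface.
m = SemanticElevationMap()
m.update(x=1.0, y=2.0, z=0.02, trav=0.9, label=3)  # label 3 = e.g. "corridor"
print(m.is_traversable(1.0, 2.0))                   # True
```

Keeping all layers in one grid is what makes the representation reusable: localization, traversability analysis, and room-level reasoning can share the same cells.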
Building upon Exosense, the third contribution is LT-Exosense, a change-aware, multi-session mapping system designed for long-term operation in dynamic environments. LT-Exosense incrementally merges scene representations built during repeated traversals of an environment, detects environmental changes, and updates a unified global map. This representation enables adaptive path planning in response to those changes. The system supports persistent spatial memory and is compatible with different sensor configurations, offering a flexible and scalable foundation for lifelong assistive mobility.
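The sketch below illustrates the change-aware merging step in its simplest form, assuming two sessions already aligned in a common frame and reduced to elevation grids. The cell-wise height comparison, the threshold, and the overwrite rule are illustrative assumptions, not the thesis' actual change-detection criterion.

```python
# Minimal sketch of change-aware multi-session merging over aligned
# elevation grids. NaN marks unobserved cells. Hypothetical names and rules.
import numpy as np

def merge_sessions(global_height, new_height, change_thresh=0.10):
    """Return the updated global map and a boolean mask of changed cells."""
    both_seen = ~np.isnan(global_height) & ~np.isnan(new_height)
    changed = both_seen & (np.abs(global_height - new_height) > change_thresh)

    merged = global_height.copy()
    merged[changed] = new_height[changed]        # trust the newer traversal
    newly_seen = np.isnan(global_height) & ~np.isnan(new_height)
    merged[newly_seen] = new_height[newly_seen]  # extend map coverage
    return merged, changed

# Usage: flagged cells can trigger replanning around, e.g., moved furniture.
g = np.array([[0.0, 0.0], [np.nan, 0.5]])
n = np.array([[0.0, 0.4], [0.1, 0.5]])
merged, changed = merge_sessions(g, n)
print(changed)  # [[False  True]
                #  [False False]]
```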
Together, these contributions span complementary aspects of vision-centric state estimation and mapping in challenging scenarios: high-speed sensing, semantic scene interpretation, and long-term map maintenance. The thesis opens up new possibilities for robust autonomy on resource-constrained platforms, such as drones and self-balancing exoskeletons, where reliable environmental understanding is critical to safe and intelligent operation.
- Publication Details
- Type
- D.Phil. Thesis
- Institution
- University of Oxford
- Notes
- Co-supervised with Prof. Maurice Fallon
- BibTeX Entry
@phdthesis{wang_dphil25,
  author = {Jianeng Wang},
  title  = {Vision-centric state estimation and mapping for visually challenging scenarios},
  school = {University of Oxford},
  year   = {2025},
}