
Abstract:
Generative AI has advanced rapidly in language and vision, driven by massive image–text datasets and scaled models. These gains are increasingly enabling robots with open-world perception and reasoning. Yet progress toward generalist robots is constrained by the scarcity of large-scale, high-quality interaction data, which limits real-world generalization and action-level reasoning. While MLLM-based systems show promise, they still struggle to acquire low-level skills in everyday settings. My research focuses on moving beyond data scaling alone by formalizing and leveraging reasoning as a core principle for building truly generalist robotic models. In this talk, I will present three recent works that aim to bridge the gap between the rich semantic world knowledge of MLLMs and actionable robot control. I will begin with AHA, a vision-language model that reasons about failures in robotic manipulation and improves the robustness of existing systems. Building on this, I will introduce SAM2Act, a 3D generalist robotic model with a memory-centric architecture capable of performing high-precision manipulation tasks while retaining and reasoning over past observations. Finally, I will present MolmoAct, AI2’s flagship robotic foundation model for spatial reasoning, designed as a generalist system that can be post-trained for a wide range of downstream manipulation tasks.
Bio:
Jiafei Duan is a Ph.D. candidate in Computer Science & Engineering at the University of Washington, advised by Professors Dieter Fox and Ranjay Krishna. His research focuses on foundation models for robotics, with an emphasis on developing scalable data collection and generation methods, grounding vision-language models in robotic reasoning, and advancing robust generalization in robot learning. His work has been featured in MIT Technology Review, GeekWire, VentureBeat, and Business Wire. Jiafei’s research has been published in top AI and robotics venues, including ICLR, ICML, RSS, CoRL, ECCV, IJCAI, CoLM, and EMNLP, and has earned awards such as Best Paper at Ubiquitous Robots 2023 and a Spotlight at ICLR 2024. He is a recipient of both the A*STAR National Science PhD Scholarship and the A*STAR Undergraduate Scholarship.