About

👋🏻 Hi! I’m a Presidential Young Professor at the National University of Singapore (NUS), School of Computing, where I direct the MAGIC Lab. I completed my PhD in Computer Science at the University of Washington, co-advised by Prof. Ranjay Krishna and Prof. Dieter Fox (2022–2026). I am also a Graduate Student Researcher at the Allen Institute for AI (AI2), working with the PRIOR and Robotics teams. I was also a Research Scientist Intern at NVIDIA.

My research mission is simple: I teach robots to perceive, reason, and act. I focus on robot learning, embodied AI, and building large-scale robotics foundation models that are deployable in the real world.

Previously, I obtained my B.Eng. in Electrical and Electronic Engineering with Highest Distinction from Nanyang Technological University (NTU), Singapore.

I am recruiting PhDs, RAs, Postdocs, and Visiting Students to join the MAGIC Lab at NUS! Sign up here.

🔥 News

[2026/07] I am teaching CS6283 (Robot Learning in the Era of Foundation Models) at NUS.
[2026/07] Invited to serve as Associate Editor for RA-L.
[2026/06] Our paper FailSafe: Reasoning and Recovery from Failures in Vision-Language-Action Models has been accepted to IROS 2026.
[2026/06] Honored to be a finalist for the Rising Star Award for Spatial Intelligence at the CVPR E2E3D Workshop.
[2026/06] Invited to serve as Area Chair for CoRL 2026.
[2026/06] Our paper VLS: Steering Pretrained Robot Policies via Vision-Language Models has received the Outstanding Paper Award at the Foundation Models Meet Embodied Agents Workshop @ CVPR 2026.
[2026/06] Our paper VLS: Steering Pretrained Robot Policies via Vision-Language Models has received the Best Paper Runner-Up at the CVPR 2026 3D-LLM/VLA Workshop.
[2026/06] Our paper MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation has received the Best Paper Award at the VLA Pipeline Workshop @ ICRA 2026.
[2026/06] Our paper MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation has received the Best Paper Award at the Beyond Teleoperation Workshop @ ICRA 2026.
[2026/06] Our paper MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation has received the Best Paper Award at the ICRA 2026 SDRL Workshop.
[2026/05] We are launching MolmoAct2: Action Reasoning Models for Real-World Deployment.
[2026/04] Our paper MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation has been accepted to RSS 2026 as an Oral.
[2026/02] Our paper From Mystery to Mastery: Failure Diagnosis and Recovery for Robotic Manipulation has been accepted to ICLR 2026.
[2026/01] Our paper MolmoAct: Action Reasoning Models that Reason in Space has been accepted to ICRA 2026.
[2025/12] Honored to join NUS as a Presidential Young Professor, directing the MAGIC Lab.
[2025/10] Our paper FailSafe: Reasoning and Recovery from Failures in Vision-Language-Action Models is now available as a preprint on arXiv.
[2025/09] Our paper SAM2Act has received the Best Paper Award at RemembeRL @ CoRL 2025.
[2025/05] Our paper SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation has been accepted to ICML 2025.
[2025/04] Our paper AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation has been accepted to ICLR 2025.
[2024/09] Our paper Manipulate-Anything: Automating Real-World Robots using Vision-Language Models has been accepted to CoRL 2024.
[2024/07] Our papers THE COLOSSEUM and Octopi have been accepted to RSS 2024 as Oral presentations.
[2024/01] Our paper Selective Visual Representations Improve Convergence and Generalization for Embodied AI has been accepted to ICLR 2024 as a Spotlight.
[2023/10] Our paper NEWTON: Are Large Language Models Capable of Physical Reasoning? has been accepted to EMNLP 2023.
[2023/07] AR2-D2 accepted to CoRL 2023. Good Time to Ask wins Best Paper Award at Ubiquitous Robots 2023.
[2022/07] PIP accepted to ECCV 2022. Intuitive Physics Survey accepted to IJCAI 2022 as an Oral.

📑 Publications

MolmoAct2: Action Reasoning Models for Real-World Deployment

Haoquan Fang*, Jiafei Duan*, Donovan Clay^§, Sam Wang^§, Shuo Liu^§, Weikai Huang^§, Xiang Fan^§, Wei-Chuan Tsai^§, Shirui Chen^§, Yi Ru Wang^§, Shanli Xing^§, Jaemin Cho, Jae Sung Park, Ainaz Eftekhar, Peter Sushko, Karen Farley, Angad Wadhwa, Cole Harrison, Winson Han, Ying-Chun Lee, Eli VanderBilt, Rose Hendrix, Suveen Ellawela, Lucas Ngoo, Joyce Chai, Zhongzheng Ren, Ali Farhadi, Dieter Fox, Ranjay Krishna

arXiv 2026

Paper Code Blog Hardware SiliconAngle TechTimes RoboticsNews AIEconomy AIInsider

MolmoAct: Action Reasoning Models that Reason in Space

Jason Lee*^§, Jiafei Duan*^§, Haoquan Fang*^§, Yuquan Deng^§, Shuo Liu^§, Boyang Li, Bohan Fang, Jieyu Zhang, Yi Ru Wang^§, Sangho Lee, Winson Han, Wilbert Pumacay, Angelica Wu, Rose Hendrix, Karen Farley, Eli VanderBilt, Ali Farhadi, Dieter Fox^§‡, Ranjay Krishna^§‡

International Conference on Robotics and Automation (ICRA) 2026

Rational Robots @ CoRL 2025 Best Paper Award Runner-up

Data @ CoRL 2025 Best Paper Award Nominee

Paper Code Blog Models Datasets VentureBeat GeekWire

SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation

Haoquan Fang, Markus Grotz, Wilbert Pumacay, Yi Ru Wang, Dieter Fox^‡, Ranjay Krishna^‡, Jiafei Duan^‡

International Conference on Machine Learning (ICML) 2025

RemembeRL @ CoRL 2025 Best Paper Award

Jiafei Duan

MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction

FailSafe: Reasoning and Recovery from Failures in Vision-Language-Action Models

WildDet3D: Scaling Promptable 3D Detection in the Wild

MolmoAct2: Action Reasoning Models for Real-World Deployment

MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation

MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation

TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics

VLS: Steering Pretrained Robot Policies via Vision-Language Models

Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning

RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields

RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation

RoboCade: Gamifying Robot Data Collection

The One RING: A Robotic Indoor Navigation Generalist

MolmoAct: Action Reasoning Models that Reason in Space

SAT: Spatial Aptitude Training for Multimodal Language Models

GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation

PointArena: Probing Multimodal Grounding Through Language-Guided Pointing

SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics

EVE: Enabling Anyone to Train Robots Using Augmented Reality

Octopi: Object Property Reasoning with Large Tactile-Language Models

THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation

Selective Visual Representations Improve Convergence and Generalization for Embodied AI

NEWTON: Are Large Language Models Capable of Physical Reasoning?

AR2-D2: Training a Robot Without a Robot

A Benchmark for Modeling Violation-of-Expectation in Physical Reasoning Across Event Categories

Good Time to Ask: A Learning Framework for Asking for Help in Navigation

PIP: Physical Interaction Prediction via Mental Simulation with Span Selection

A Survey on Machine Learning Approaches for Modelling Intuitive Physics

A Survey of Embodied AI: From Simulators to Research Tasks

ActioNet: Interactive End-to-End Platform for Task-Based Data Collection and Augmentation in 3D Environment

💬 Invited Talks

📖 Academic Services

🎓 Education

💼 Experience

🏅 Awards