The workshop will take place in Lecture Hall D, Aula Conference Centre, TU Delft, on July 15, 2024, from 8:45 to 17:00 CEST.
Overview
For robots to safely interact with people and the real world, they must not only perceive but also understand their surroundings in a semantically meaningful way (i.e., grasp the implications or pertinent properties of the objects in a scene). Advanced perception methods coupled with learning algorithms have made significant progress toward such semantic understanding, and recent breakthroughs in foundation models have opened further opportunities for robots to reason contextually about their operating environments. Semantics is ingrained in every aspect of robotics, from perception to action, and reliably exploiting semantic information in embodied systems requires tightly coupled design of perception, learning, and control algorithms (e.g., a robot in a warehouse must recognize objects on the floor and reason about whether it is safe to run over them). With this workshop, we hope to foster discussion of innovative approaches that harness semantic understanding for the design and deployment of intelligent embodied systems. We aim to facilitate interdisciplinary exchange between researchers in robot learning, perception, mapping, and control to identify the opportunities and pressing challenges in incorporating semantics into robotic applications.
Our workshop comprises two general themes with invited talks from experts in each:
Theme A – Environment Understanding and Reasoning: In this theme, we aim to provide an overview of the recent advances in 3D spatial understanding and reasoning in robotics. The invited talks will cover state-of-the-art methods enabling robots to derive semantically meaningful information from diverse sensor modalities and natural language instructions for effective downstream decision-making. The panel discussions will delve into opportunities and challenges related to geometric and semantic representations, uncertainty-aware perception, data-driven learning methods, and (safety) evaluation in practical applications.
Theme B – Safe Interaction with the World: In this theme, we will highlight seminal approaches to planning and control in complex, interactive scenarios. The invited talks will encompass control-theoretic frameworks for safe learning-based decision-making under uncertainty, safe and compliant human-robot interaction, and high-level language instructions for non-expert robot operation. The panel discussion will focus on strategies for incorporating semantic understanding into planning and control algorithms to enable robot interaction in complex environments and to enhance safe operation in real-world deployment.
Discussion Topics
Our workshop has two general themes. The first theme is “Environment Understanding and Reasoning,” where speakers will provide an overview of recent developments in 3D scene understanding, semantic mapping and localization, and language-conditioned contextual reasoning. The second theme, “Safe Interaction with the World,” will focus on downstream motion planning and control frameworks for safe interaction with the perceived world. A preliminary set of discussion questions is listed below.
Theme A: Environment Understanding and Reasoning
- How do we efficiently represent the robot operating environment to facilitate the downstream planning and control tasks? What are the advantages and limitations of different representations (e.g., dense metric maps and scene graphs)?
- How do we characterize and account for uncertainties from perception, especially in dynamic or changing scenes?
- How do we meaningfully fuse high-dimensional sensor data (camera, lidar, radar) with foundation models for contextual reasoning in robotics?
- World models have shown promising results in computer vision (e.g., video generation). What are the current challenges and opportunities for world models in robotics?
- How do ethical considerations come into play when designing robots with advanced semantic understanding for real-world applications?
Theme B: Safe Interaction with the World
- How do we incorporate semantic understanding into a robot decision-making pipeline to enable safe interaction?
- How do we propagate uncertainties from perception efficiently into planning and control to guarantee safety during deployment?
- How do we map a high-level understanding of the environment to constraints and objectives that are compatible with current planning and control-theoretic frameworks? Or do we need to rethink planning and control in the age of generative models?
- What dimensions of safety should be considered in real-world interactive scenarios, and how should safety be measured and benchmarked?
- How can interdisciplinary collaboration between researchers in robot learning, perception, and control enhance the development of intelligent embodied systems?
Program
Below is the program of the workshop. Times are in CEST.
Morning Session
08:45 – 09:00: Opening Remarks
09:00 – 09:20: Invited Talk: Oier Mees, “Low-Level Embodied Intelligence with Foundation Models”
09:20 – 09:40: Invited Talk: Angela Dai, “From Quantity to Quality for 3D Perception”
09:40 – 10:00: Invited Talk: Masha Itkina, “Towards Uncertainty-Aware Embodied Behavior Generalization”
10:00 – 10:30: Coffee Break and Poster Session
10:30 – 10:50: Invited Talk: Manuel Keppler, “Bridging the Divide: Unified Control for Rigid and Elastic Robots”
10:50 – 11:10: Invited Talk: Federico Tombari, “Open-set Semantics for Environment Understanding”
11:10 – 11:30: Invited Talk: Michael Milford, “What Can Understanding do for Navigating and Localizing Robots?”
11:30 – 12:30: Morning Session Panel Discussion
Afternoon Session
14:00 – 14:30: Spotlight Talks
14:30 – 14:50: Invited Talk: Luca Carlone, “Metric-Semantic World Models”
14:50 – 15:10: Invited Talk: Marco Pavone, “Leveraging Foundation Models to Develop Safer AV Stacks”
15:10 – 15:30: Invited Talk: Koushil Sreenath, “Generative Self-Supervised Learning for Legged Locomotion”
15:30 – 16:00: Coffee Break and Poster Session
16:00 – 16:20: Invited Talk: Andrea Bajcsy, “Towards Human–AI Safety: Unifying Generative AI and Control Systems Safety”
16:20 – 16:55: Afternoon Session Panel Discussion
16:55 – 17:00: Concluding Remarks
17:00 – 18:00: Social Gathering
We will have an optional post-workshop social event; further details will be shared on the workshop day.
Invited Speakers
- Michael Milford, Queensland University of Technology (QUT)
- Luca Carlone, Massachusetts Institute of Technology (MIT)
- Marco Pavone, Stanford University and NVIDIA
- Andrea Bajcsy, Carnegie Mellon University (CMU)
Contributed Talks
- “Opening Cabinets and Drawers in the Real World using a Commodity Mobile Manipulator” by Arjun Gupta (UIUC), Michelle Zhang (UIUC), Rishik Sathua (UIUC), Saurabh Gupta (UIUC)
- “Agile Flight from Pixels without State Estimation” by Ismail Geles (University of Zurich), Leonard Bauersfeld (University of Zurich), Angel Romero (University of Zurich), Jiaxu Xing (University of Zurich), Davide Scaramuzza (University of Zurich)
- “Scaling 3D Reasoning with LMMs to Large Robot Mission Environments Using Datagraphs” by Wouter Jeroen Meijer (TNO)
- “Generalizable Robotic Manipulation: Object-Centric Diffusion Policy with Language Guidance” by Hang Li (TUM), Qian Feng (Agile Robots SE and TUM), Zhi Zheng (TUM), Jianxiang Feng (TUM), Alois Knoll (TUM)
- “Gaussian Process Obstacle Modeling with Task-Informed Constraints for Robot Manipulation” by Abhinav Kumar (University of Michigan), Peter Mitrano (University of Michigan), Dmitry Berenson (University of Michigan)
- “Exploiting Radiance Fields for Grasp Generation on Novel Synthetic Views” by Abhishek Kashyap (Örebro University), Henrik Andreasson (Örebro University), Todor Stoyanov (Örebro University)
- “DP-VLM: Differentiable Planner-Augmented Vision-Language Model for Explainable Closed-Loop Autonomous Driving” by Pranjal Paul (IIIT-Hyderabad), Anant Garg (IIIT-Hyderabad), Tushar Choudhary (IIIT-Hyderabad), Arun Kumar Singh, K. Madhava Krishna (IIIT-Hyderabad)
- “Adapting a Foundation Model for Space-based Tasks” by Matthew Foutter (Stanford), Praneet Bhoj (Stanford), Rohan Sinha (Stanford), Amine Elhafsi (Stanford), Somrita Banerjee (Stanford), Christopher G. Agia (Stanford), Justin Kruger (Stanford), Tommaso Guffanti (Stanford), Daniele Gammelli (DTU), Simone D’Amico (Stanford), Marco Pavone (Stanford)
- “AutoGrAN: Autonomous Vehicle LiDAR Contaminant Detection using Graph Attention Networks – Extended Abstract” by Grafika Jati (University of Bologna), Martin Molan (University of Bologna), Junaid Ahmed Khan (University of Bologna), Francesco Barchi (University of Bologna), Andrea Bartolini (University of Bologna), Giuseppe Mercurio (University of Bologna), Andrea Acquaviva (University of Bologna)
- “Leveraging Interactive Distance Fields for Safe and Smooth Reactive Planning” by Usama Ali (THWS), Fouad Sukkar (UTS), Adrian Mueller (THWS), Lan Wu (UTS), Tobias Kaupp (THWS), Teresa A. Vidal-Calleja (UTS)
- “Mesh-based Object Tracking for Dynamic Semantic 3D Scene Graphs via Ray Tracing” by Lennart Niecksch (DFKI), Alexander Mock (Osnabrück University), Felix Igelbrink (DFKI), Thomas Wiemann (HS Fulda), Joachim Hertzberg (DFKI)