The workshop is scheduled for November 9, 2024, from 8:45 AM to 6:00 PM CET. It will be held in Room Jupiter (Ground Floor) at the Science Congress Center Munich.
Overview
Manipulation is a crucial skill for fully autonomous robots operating in complex, real-world environments. As robots move into dynamic, human-centric spaces, it is increasingly important to develop reliable and versatile manipulation abilities. With the availability of large datasets (e.g., RT-X) and recent advances in robot learning and perception (e.g., deep RL, diffusion, and language-conditioned methods), there has been significant progress in acquiring new skills, understanding common sense, and enabling natural interaction in human-centric environments. These advances spark new questions about (i) the learning methods that best utilize abundant data to learn versatile and reliable manipulation policies and (ii) the modalities (e.g., visual, tactile) and sources (e.g., real-world, high-fidelity contact simulations) of training data for acquiring general-purpose skills. In this workshop, we aim to facilitate an interdisciplinary exchange between the communities in robot learning, computer vision, manipulation, and control. Our goal is to map out further potential and limitations of current large-scale data-driven methods for the community and discuss pressing challenges and opportunities in diversifying data modalities and sources for mastering robot manipulation in real-world applications.
Discussion Themes
Our workshop comprises two closely related themes with invited talks from experts in each.
Theme A: Learning Methods for Versatile and Reliable Manipulation
– What are the roles of RL, imitation learning, and foundation models in manipulation, and how do we best leverage these methods/tools to achieve human-like learning and refinement of manipulation skills?
– Is scaling with large models and diverse datasets the way toward acquiring general-purpose manipulation skills? How do we best exploit our prior knowledge to facilitate versatile but also reliable learning? What are some challenges arising from cross-embodiment learning?
– How can foundation models trained on large datasets reach high reliability (99.9+%) as required in many real-world (industrial) applications? What are some criteria for real-world deployment?
– Will the common sense/reasoning capability enabled by foundation models improve the robustness of robot learning algorithms in the long run?
Theme B: Data Collection and Sensor Modalities for General-Purpose Skill Acquisition
– We have seen a proliferation of LLMs and VLMs in the robot decision-making software stack. Which sensor data modalities are required for learning and reliable deployment of manipulation skills?
– When is tactile feedback required for manipulation, and how can it be combined with vision? Can we train gripper-agnostic foundation models for dexterous manipulation?
– What role does internet video data play, and is simulation necessary to generate synthetic data? How can we collect informative data in the real world and effectively combine it with synthetic data for “in-the-wild” task learning?
– How can manipulation datasets containing different data modalities be effectively combined for cross-embodiment learning?
Program
Times below are in CET. The Theme A and Theme B sessions each include a 10-minute introduction, a set of 30-minute invited talks with brief Q&A, and a 30-minute moderated panel discussion; a spotlight talks session takes place in between.
Theme A: Learning Methods for Versatile and Reliable Manipulation
08:45 – 09:00 Opening Remarks and Theme A Introduction
09:00 – 09:30 Invited Talk: Sergey Levine, “Robotic foundation models”
09:30 – 10:00 Invited Talk: Jens Lundell, “Is there room for data efficiency in a world of abundant data?”
10:00 – 10:30 Invited Talk: Ankur Handa, “Exploring the frontiers of dexterous robot hand manipulation”
10:30 – 11:00 Coffee Break and Morning Poster Session
11:00 – 11:30 Invited Talk: Carlo Sferrazza, “Learning generalizable representations from vision and touch”
11:30 – 12:00 Theme A Panel Discussion
Theme B: Data Collection and Sensor Modalities for General-Purpose Skill Acquisition
13:45 – 13:55 Theme B Introduction
13:55 – 14:30 Spotlight Talks
14:30 – 15:00 Invited Talk: Ted Xiao, “What’s missing for robotics-first foundation models?”
15:00 – 15:30 Invited Talk: Christian Gehring, “A real-world perspective on mastering robot manipulation in a world of abundant data”
15:30 – 16:00 Coffee Break and Afternoon Poster Session
16:00 – 16:30 Invited Talk: Mohsen Kaboli, “Embodied interactive visuo-tactile perception and learning for robotic grasp and manipulation”
16:30 – 17:00 Invited Talk: Katerina Fragkiadaki, “Training robot manipulators with 3D scene representations in the real world and in simulation”
17:00 – 17:30 Invited Talk: Shuran Song
17:30 – 18:00 Theme B Panel Discussion
Post-Workshop Social
Workshop Papers
Spotlight Talks
- RoboCrowd: Scaling Robot Data Collection through Crowdsourcing
- STEER: Bridging VLMs and Low-Level Control for Adaptable Robotic Manipulation
- SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting
- Latent Action Pretraining From Videos
- Diffusion Policy Policy Optimization
- ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data
Accepted Papers
- BAKU: An Efficient Transformer for Multi-Task Policy Learning
- SkillGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment
- Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments
- Learning Precise, Contact-Rich Manipulation through Uncalibrated Tactile Skins
- AnySkin: Plug-and-play Skin Sensing for Robotic Touch
- Local Policies Enable Zero-shot Long-horizon Manipulation
- OPEN TEACH: A Versatile Teleoperation System for Robotic Manipulation
- UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation
- From Imitation to Refinement – Residual RL for Precise Visual Assembly
- DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
- DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation
- Watch Less, Feel More: Sim-to-Real RL for Generalizable Articulated Object Manipulation via Motion Adaptation and Impedance Control
- ATK: Automatic Task-driven Keypoint selection for Policy Transfer from Simulation to Real World
- Offline-to-online Reinforcement Learning for Image-based Grasping with Scarce Demonstrations
- STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning
- ScissorBot: Learning Generalizable Scissor Skill for Paper Cutting via Simulation, Imitation, and Sim2Real
- Student-Informed Teacher Training
- ActionFlow: Equivariant, Accurate, and Efficient Manipulation Policies with Flow Matching
- Subtask-Aware Visual Reward Learning from Segmented Demonstrations
- RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation
- Fast Reinforcement Learning without Rewards or Demonstrations via Auxiliary Task Examples
- SonicSense: Object Perception from In-Hand Acoustic Vibration
- Neural MP: A Generalist Neural Motion Planner
- ClutterGen: A Cluttered Scene Generator for Robot Learning
- Robot Manipulation with Flow Matching
- Multi-constrained robot motion generation
- Safe and stable motion primitives via imitation learning and geometric fabrics
- PokeFlex: A Real-World Dataset of Deformable Objects for Robotics
- What Matters in Learning from Large-Scale Datasets for Robot Manipulation
- Interactive Visuo-Tactile Learning to Estimate Properties of Articulated Objects
- Bi3D Diffuser Actor: 3D Policy Diffusion for Bi-manual Robot Manipulation
- Just Add Force for Delicate Robot Policies
- Towards Benchmarking Robotic Manipulation in Space
- GHIL-Glue: Hierarchical Control with Filtered Subgoal Images
- Diff-DAgger: Uncertainty Estimation with Diffusion Policy for Robotic Manipulation
- RT-Affordance: Reasoning about Robotic Manipulation with Affordances
- Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning
- Parental Guidance: Evolutionary Distillation for Non-Prehensile Mobile Manipulation
- Towards Standards and Guidelines for Developing Open-Source and Benchmarking Learning for Robot Manipulation in the COMPARE Ecosystem
- Efficient and Scalable Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
- Enhancing Probabilistic Imitation Learning with Robotic Perception for Self-Organising Robotic Workstation
- Versatile and Generalizable Manipulation via Goal-Conditioned Reinforcement Learning with Grounded Object Detection
- TidyBot++: An Open-Source Holonomic Mobile Manipulator for Robot Learning
Call for Papers
We invite researchers from different disciplines to share novel ideas on topics pertinent to the workshop themes, including but not limited to:
- Foundation models for robot learning
- Diffusion and energy-based policies for robot manipulation
- Deep reinforcement learning for real-world robot grasping and manipulation
- Real-world datasets and simulators for general-purpose skill acquisition
- Comparisons of foundation-model-based methods and conventional robot learning methods (e.g., task generalization versus performance)
- Visuo-tactile sensing for robot manipulation and/or methods leveraging multiple sensing modalities
- Environment perception and representation for robot learning
- Positions on what robots are not yet able to do (i.e., the challenges at the cutting edge of one or multiple subfields)
- Best practices for data collection and aggregation (multimodality, teleoperation, examples to include)
The review process will be double-blind. Accepted papers will be published on the workshop webpage and presented as spotlight talks or posters. If you have any questions, please contact us at contact.lsy@xcit.tum.de.
Paper Format
Suggested Length: 2–4 pages, excluding references
Style Template: CoRL Paper Template
Important Dates
Initial Submission: October 15, 2024 (11:59 pm AoE)
Author Notification: October 29, 2024
Camera Ready Submission: November 01, 2024 (11:59 pm AoE)
Workshop Date: November 09, 2024