CoRL 2024 Workshop on Mastering Robot Manipulation in a World of Abundant Data

The workshop is scheduled for November 9, 2024, from 8:45 AM to 6:00 PM CET. It will be held in Room Jupiter (Ground Floor) at the Science Congress Center Munich.

Overview

Manipulation is a crucial skill for fully autonomous robots operating in complex, real-world environments. As robots move into dynamic, human-centric spaces, it is increasingly important to develop reliable and versatile manipulation abilities. With the availability of large datasets (e.g., RT-X) and recent advances in robot learning and perception (e.g., deep RL, diffusion, and language-conditioned methods), there has been significant progress in acquiring new skills, reasoning with common sense, and enabling natural interaction in human-centric environments. These advances raise new questions about (i) which learning methods best utilize abundant data to learn versatile and reliable manipulation policies and (ii) which modalities (e.g., visual, tactile) and sources (e.g., real-world, high-fidelity contact simulations) of training data are needed for acquiring general-purpose skills. In this workshop, we aim to facilitate an interdisciplinary exchange between the communities in robot learning, computer vision, manipulation, and control. Our goal is to map out the potential and limitations of current large-scale data-driven methods and to discuss pressing challenges and opportunities in diversifying data modalities and sources for mastering robot manipulation in real-world applications.



Discussion Themes

Our workshop comprises two closely related themes with invited talks from experts in each.

Theme A: Learning Methods for Versatile and Reliable Manipulation

– What are the roles of RL, imitation learning, and foundation models in manipulation, and how do we best leverage these methods/tools to achieve human-like learning and refinement of manipulation skills?
– Is scaling with large models and diverse datasets the way toward acquiring general-purpose manipulation skills? How do we best exploit our prior knowledge to facilitate versatile but also reliable learning? What are some challenges arising from cross-embodiment learning?
– How can foundation models trained on large datasets reach high reliability (99.9+%) as required in many real-world (industrial) applications? What are some criteria for real-world deployment?
– Will the common sense/reasoning capability enabled by foundation models improve the robustness of robot learning algorithms in the long run?

Theme B: Data Collection and Sensor Modalities for General-Purpose Skill Acquisition

– We have seen a proliferation of LLMs and VLMs in the robot decision-making software stack. Which sensor data modalities are required for learning and reliable deployment of manipulation skills?
– When is tactile feedback required for manipulation, and how can it be combined with vision? Can we train gripper-agnostic foundation models for dexterous manipulation?
– What role does internet video data play, and is simulation necessary to generate synthetic data? How can we collect informative data in the real world and effectively combine it with synthetic data for “in-the-wild” task learning?
– How can manipulation datasets containing different data modalities be effectively combined for cross-embodiment learning?


Program

Times below are in CET. Sessions A and B each have a 10-minute introduction, a set of 30-minute invited talks with brief Q&A, and a 30-minute moderated panel discussion. A spotlight talks session is held in between.

Theme A: Learning Methods for Versatile and Reliable Manipulation

08:45 – 09:00 Opening Remarks and Theme A Introduction
09:00 – 09:30 Invited Talk: Sergey Levine, “Robotic foundation models”
09:30 – 10:00 Invited Talk: Jens Lundell, “Is there room for data efficiency in a world of abundant data?”
10:00 – 10:30 Invited Talk: Ankur Handa, “Exploring the frontiers of dexterous robot hand manipulation”
10:30 – 11:00 Coffee Break and Morning Poster Session
11:00 – 11:30 Invited Talk: Carlo Sferrazza, “Learning generalizable representations from vision and touch”
11:30 – 12:00 Theme A Panel Discussion

Theme B: Data Collection and Sensor Modalities for General-Purpose Skill Acquisition

13:45 – 13:55 Theme B Introduction
13:55 – 14:30 Spotlight Talks
14:30 – 15:00 Invited Talk: Ted Xiao, “What’s missing for robotics-first foundation models?”
15:00 – 15:30 Invited Talk: Christian Gehring, “A real world perspective on Mastering Robot Manipulation in a World of Abundant Data”
15:30 – 16:00 Coffee Break and Afternoon Poster Session
16:00 – 16:30 Invited Talk: Mohsen Kaboli, “Embodied interactive visuo-tactile perception and learning for robotic grasp and manipulation”
16:30 – 17:00 Invited Talk: Katerina Fragkiadaki, “Training robot manipulators with 3D scene representations in the real world and in simulation”
17:00 – 17:30 Invited Talk: Shuran Song
17:30 – 18:00 Theme B Panel Discussion

Post-Workshop Social

Invited Speakers


Sergey Levine
UC Berkeley

Ankur Handa
NVIDIA

Ted Xiao
Google DeepMind

Shuran Song
Stanford University

Mohsen Kaboli
BMW and TU/e

Carlo Sferrazza
UC Berkeley

Workshop Papers

Accepted Papers

Call for Papers

We are inviting researchers from different disciplines to share novel ideas on topics pertinent to the workshop themes, which include but are not limited to:

  • Foundation models for robot learning
  • Diffusion and energy-based policies for robot manipulation
  • Deep reinforcement learning for real-world robot grasping and manipulation
  • Real-world datasets and simulators for general-purpose skill acquisition
  • Comparisons of foundation-model-based methods and conventional robot learning methods (e.g., task generalization versus performance)
  • Visuo-tactile sensing for robot manipulation and/or methods leveraging multimodalities
  • Environment perception and representation for robot learning
  • Positions on what robots are not yet able to do (i.e., the challenges at the cutting edge of one or multiple subfields)
  • Best practices for data collection and aggregation (multimodality, teleoperation, examples to include)

The review process will be double-blind. Accepted papers will be published on the workshop webpage and will be presented as a spotlight talk or as a poster. If you have any questions, please contact us at contact.lsy@xcit.tum.de.

Paper Format

Suggested Length: minimum 2 and maximum 4 pages excluding references
Style Template: CoRL Paper Template

Important Dates

Initial Submission: October 15, 2024 (11:59 pm AoE)
Author Notification: October 29, 2024
Camera Ready Submission: November 01, 2024 (11:59 pm AoE)
Workshop Date: November 09, 2024

OpenReview Submission Link

http://tiny.cc/corl24-mrm-d-submission

Organizers


Angela Schoellig
TUM and University of Toronto

Animesh Garg
Georgia Tech and NVIDIA

Oier Mees
UC Berkeley