Multiagent Coordination and Learning

Some tasks cannot be done by a single robot. A group of robots collaborating on a task can be highly efficient, flexible, and robust: if one robot fails, another can take its place. However, coordinating a large group of robots through a centralized control unit is difficult, because the unit must communicate with every robot and compute the next actions for a possibly huge number of team members. We investigate decentralized control strategies, in which each robot is a self-contained unit that communicates with or observes its closest neighbors and makes decisions based on its own observations. The goal is for such a team of self-contained robots to be capable of achieving a joint goal. Such a decentralized approach scales to robot teams of any size. Our research in this area focuses particularly on:

  • Decentralized learning strategies that enable a team of robots to improve over time, and
  • Learning approaches that help us find decentralized control strategies for complex problems that we know how to solve in a centralized way but for which a decentralized strategy is difficult to find by intuition.
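
As a rough illustration of the decentralized idea described above, the following sketch lets every robot move toward the average position of its k nearest neighbors, using only local information. It is a generic textbook-style rendezvous rule, not a method from the publications listed here; the function names, the gain, and the choice of k are assumptions made for the example.

```python
import math


def nearest_neighbors(positions, i, k):
    """Indices of the k robots closest to robot i (robot i itself excluded)."""
    others = [j for j in range(len(positions)) if j != i]
    others.sort(key=lambda j: math.dist(positions[i], positions[j]))
    return others[:k]


def decentralized_step(positions, k=2, gain=0.5):
    """One synchronous update in which each robot uses only its k nearest neighbors.

    Every robot moves a fraction `gain` of the way toward the mean position
    of its neighbors. No robot needs the positions of the whole team.
    """
    new_positions = []
    for i, (x, y) in enumerate(positions):
        nbrs = nearest_neighbors(positions, i, k)
        if not nbrs:  # a lone robot has nothing to react to
            new_positions.append((x, y))
            continue
        mean_x = sum(positions[j][0] for j in nbrs) / len(nbrs)
        mean_y = sum(positions[j][1] for j in nbrs) / len(nbrs)
        new_positions.append((x + gain * (mean_x - x), y + gain * (mean_y - y)))
    return new_positions


def spread(positions):
    """Largest pairwise distance in the team (hypothetical helper for inspection)."""
    return max(math.dist(p, q) for p in positions for q in positions)
```

Note that a purely nearest-neighbor rule like this one can split a widely scattered team into separate clusters; it is meant only to show that each decision depends on local observations.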

 

Related Publications

Multiagent Coordination

SICNav: safe and interactive crowd navigation using model predictive control and bilevel optimization
S. Samavi, J. R. Han, F. Shkurti, and A. P. Schoellig
IEEE Transactions on Robotics, 2024.
Robots need to predict and react to human motions to navigate through a crowd without collisions. Many existing methods decouple prediction from planning, which does not account for the interaction between robot and human motions and can lead to the robot getting stuck. We propose SICNav, a Model Predictive Control (MPC) method that jointly solves for robot motion and predicted crowd motion in closed-loop. We model each human in the crowd to be following an Optimal Reciprocal Collision Avoidance (ORCA) scheme and embed that model as a constraint in the robot’s local planner, resulting in a bilevel nonlinear MPC optimization problem. We use a KKT-reformulation to cast the bilevel problem as a single level and use a nonlinear solver to optimize. Our MPC method can influence pedestrian motion while explicitly satisfying safety constraints in a single-robot multi-human environment. We analyze the performance of SICNav in two simulation environments and indoor experiments with a real robot to demonstrate safe robot motion that can influence the surrounding humans. We also validate the trajectory forecasting performance of ORCA on a human trajectory dataset. Code: https://github.com/sepsamavi/safe-interactive-crowdnav.git.

@ARTICLE{samavi-tro23,
title = {{SICNav}: Safe and Interactive Crowd Navigation using Model Predictive Control and Bilevel Optimization},
author = {Sepehr Samavi and James R. Han and Florian Shkurti and Angela P. Schoellig},
journal = {{IEEE Transactions on Robotics}},
year = {2024},
urllink = {https://arxiv.org/abs/2310.10982},
doi = {10.1109/TRO.2024.3484634},
abstract = {Robots need to predict and react to human motions to navigate through a crowd without collisions. Many existing methods decouple prediction from planning, which does not account for the interaction between robot and human motions and can lead to the robot getting stuck. We propose SICNav, a Model Predictive Control (MPC) method that jointly solves for robot motion and predicted crowd motion in closed-loop. We model each human in the crowd to be following an Optimal Reciprocal Collision Avoidance (ORCA) scheme and embed that model as a constraint in the robot's local planner, resulting in a bilevel nonlinear MPC optimization problem. We use a KKT-reformulation to cast the bilevel problem as a single level and use a nonlinear solver to optimize. Our MPC method can influence pedestrian motion while explicitly satisfying safety constraints in a single-robot multi-human environment. We analyze the performance of SICNav in two simulation environments and indoor experiments with a real robot to demonstrate safe robot motion that can influence the surrounding humans. We also validate the trajectory forecasting performance of ORCA on a human trajectory dataset. Code: https://github.com/sepsamavi/safe-interactive-crowdnav.git.}
}

AMSwarmX: safe swarm coordination in complex environments via implicit non-convex decomposition of the obstacle-free space
V. K. Adajania, S. Zhou, A. K. Singh, and A. P. Schoellig
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2024. Accepted.
Quadrotor motion planning in complex environments leverages the concept of safe flight corridor (SFC) to facilitate static obstacle avoidance. Typically, SFCs are constructed through convex decomposition of the environment’s free space into cuboids, convex polyhedra, or spheres. However, when dealing with a quadrotor swarm, such SFCs can be overly conservative, substantially limiting the available free space for quadrotors to coordinate. This paper presents an Alternating Minimization-based approach that does not require building a conservative free-space approximation. Instead, both static and dynamic collision constraints are treated in a unified manner. Dynamic collisions are handled based on shared position trajectories of the quadrotors. Static obstacle avoidance is coupled with distance queries from the Octomap, providing an implicit non-convex decomposition of free space. As a result, our approach is scalable to arbitrary complex environments. Through extensive comparisons in simulation, we demonstrate a 60% improvement in success rate, an average 1.8× reduction in mission completion time, and an average 23× reduction in per-agent computation time compared to SFC-based approaches. We also experimentally validated our approach using a Crazyflie quadrotor swarm of up to 12 quadrotors in obstacle-rich environments. The code, supplementary materials, and videos are released for reference.

@inproceedings{adajania-icra24,
author={Vivek K. Adajania and Siqi Zhou and Arun Kumar Singh and Angela P. Schoellig},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
title={{AMSwarmX}: Safe Swarm Coordination in CompleX Environments via Implicit Non-Convex Decomposition of the Obstacle-Free Space},
year={2024},
note={Accepted},
abstract = {Quadrotor motion planning in complex environments leverages the concept of safe flight corridor (SFC) to facilitate static obstacle avoidance. Typically, SFCs are constructed through convex decomposition of the environment's free space into cuboids, convex polyhedra, or spheres. However, when dealing with a quadrotor swarm, such SFCs can be overly conservative, substantially limiting the available free space for quadrotors to coordinate. This paper presents an Alternating Minimization-based approach that does not require building a conservative free-space approximation. Instead, both static and dynamic collision constraints are treated in a unified manner. Dynamic collisions are handled based on shared position trajectories of the quadrotors. Static obstacle avoidance is coupled with distance queries from the Octomap, providing an implicit non-convex decomposition of free space. As a result, our approach is scalable to arbitrary complex environments. Through extensive comparisons in simulation, we demonstrate a 60\% improvement in success rate, an average 1.8× reduction in mission completion time, and an average 23× reduction in per-agent computation time compared to SFC-based approaches. We also experimentally validated our approach using a Crazyflie quadrotor swarm of up to 12 quadrotors in obstacle-rich environments. The code, supplementary materials, and videos are released for reference.}
}

AMSwarm: an alternating minimization approach for safe motion planning of quadrotor swarms in cluttered environments
V. K. Adajania, S. Zhou, A. K. Singh, and A. P. Schoellig
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2023, p. 1421–1427.
This paper presents a scalable online algorithm to generate safe and kinematically feasible trajectories for quadrotor swarms. Existing approaches rely on linearizing Euclidean distance-based collision constraints and on axis-wise decoupling of kinematic constraints to reduce the trajectory optimization problem for each quadrotor to a quadratic program (QP). This conservative approximation often fails to find a solution in cluttered environments. We present a novel alternative that handles collision constraints without linearization and kinematic constraints in their quadratic form while still retaining the QP form. We achieve this by reformulating the constraints in a polar form and applying an Alternating Minimization algorithm to the resulting problem. Through extensive simulation results, we demonstrate that, as compared to Sequential Convex Programming (SCP) baselines, our approach achieves on average a 72% improvement in success rate, a 36% reduction in mission time, and a 42 times faster per-agent computation time. We also show that collision constraints derived from discrete-time barrier functions (BF) can be incorporated, leading to different safety behaviours without significant computational overhead. Moreover, our optimizer outperforms the state-of-the-art optimal control solver ACADO in handling BF constraints with a 31 times faster per-agent computation time and a 44% reduction in mission time on average. We experimentally validated our approach on a Crazyflie quadrotor swarm of up to 12 quadrotors. The code with supplementary material and video are released for reference.

@inproceedings{adajania-icra23,
author={Vivek K. Adajania and Siqi Zhou and Arun Kumar Singh and Angela P. Schoellig},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
title={{AMSwarm}: An Alternating Minimization Approach for Safe Motion Planning of Quadrotor Swarms in Cluttered Environments},
year={2023},
pages={1421--1427},
doi={10.1109/ICRA48891.2023.10161063},
urlvideo={http://tiny.cc/AMSwarmVideo},
urlcode={https://github.com/utiasDSL/AMSwarm},
abstract = {This paper presents a scalable online algorithm to generate safe and kinematically feasible trajectories for quadrotor swarms. Existing approaches rely on linearizing Euclidean distance-based collision constraints and on axis-wise decoupling of kinematic constraints to reduce the trajectory optimization problem for each quadrotor to a quadratic program (QP). This conservative approximation often fails to find a solution in cluttered environments. We present a novel alternative that handles collision constraints without linearization and kinematic constraints in their quadratic form while still retaining the QP form. We achieve this by reformulating the constraints in a polar form and applying an Alternating Minimization algorithm to the resulting problem. Through extensive simulation results, we demonstrate that, as compared to Sequential Convex Programming (SCP) baselines, our approach achieves on average a 72\% improvement in success rate, a 36\% reduction in mission time, and a 42 times faster per-agent computation time. We also show that collision constraints derived from discrete-time barrier functions (BF) can be incorporated, leading to different safety behaviours without significant computational overhead. Moreover, our optimizer outperforms the state-of-the-art optimal control solver ACADO in handling BF constraints with a 31 times faster per-agent computation time and a 44\% reduction in mission time on average. We experimentally validated our approach on a Crazyflie quadrotor swarm of up to 12 quadrotors. The code with supplementary material and video are released for reference.}
}

Swarm-GPT: combining large language models with safe motion planning for robot choreography design
A. Jiao, T. P. Patel, S. Khurana, A. Korol, L. Brunke, V. K. Adajania, U. Culha, S. Zhou, and A. P. Schoellig
Extended Abstract in the 6th Robot Learning Workshop at the Conference on Neural Information Processing Systems (NeurIPS), 2023.
This paper presents Swarm-GPT, a system that integrates large language models (LLMs) with safe swarm motion planning – offering an automated and novel approach to deployable drone swarm choreography. Swarm-GPT enables users to automatically generate synchronized drone performances through natural language instructions. With an emphasis on safety and creativity, Swarm-GPT addresses a critical gap in the field of drone choreography by integrating the creative power of generative models with the effectiveness and safety of model-based planning algorithms. This goal is achieved by prompting the LLM to generate a unique set of waypoints based on extracted audio data. A trajectory planner processes these waypoints to guarantee collision-free and feasible motion. Results can be viewed in simulation prior to execution and modified through dynamic re-prompting. Sim-to-real transfer experiments demonstrate Swarm-GPT’s ability to accurately replicate simulated drone trajectories, with a mean sim-to-real root mean square error (RMSE) of 28.7 mm. To date, Swarm-GPT has been successfully showcased at three live events, exemplifying safe real-world deployment of pre-trained models.
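
The sim-to-real error quoted above is a root mean square error between simulated and executed trajectories. The abstract does not state how the error is aggregated (per axis, per drone, or per time step), so the sketch below assumes the simplest variant: Euclidean position error at matching time steps of one trajectory. The function name and input layout are hypothetical.

```python
import math


def trajectory_rmse(simulated, measured):
    """RMSE of the Euclidean position error between two equally sampled trajectories.

    Both inputs are sequences of (x, y, z) positions at the same time steps.
    This aggregation is an assumption; the paper may average differently.
    """
    if len(simulated) != len(measured) or not simulated:
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [math.dist(s, m) ** 2 for s, m in zip(simulated, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

With positions given in metres, multiplying the result by 1000 gives a value in millimetres, the unit used in the abstract.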

@MISC{jiao-neurips23,
author = {Aoran Jiao and Tanmay P. Patel and Sanjmi Khurana and Anna-Mariya Korol and Lukas Brunke and Vivek K. Adajania and Utku Culha and Siqi Zhou and Angela P. Schoellig},
title = {{Swarm-GPT}: Combining Large Language Models with Safe Motion Planning for Robot Choreography Design},
year = {2023},
howpublished = {Extended Abstract in the 6th Robot Learning Workshop at the Conference on Neural Information Processing Systems (NeurIPS)},
urllink = {https://arxiv.org/abs/2312.01059},
abstract = {This paper presents Swarm-GPT, a system that integrates large language models (LLMs) with safe swarm motion planning - offering an automated and novel approach to deployable drone swarm choreography. Swarm-GPT enables users to automatically generate synchronized drone performances through natural language instructions. With an emphasis on safety and creativity, Swarm-GPT addresses a critical gap in the field of drone choreography by integrating the creative power of generative models with the effectiveness and safety of model-based planning algorithms. This goal is achieved by prompting the LLM to generate a unique set of waypoints based on extracted audio data. A trajectory planner processes these waypoints to guarantee collision-free and feasible motion. Results can be viewed in simulation prior to execution and modified through dynamic re-prompting. Sim-to-real transfer experiments demonstrate Swarm-GPT's ability to accurately replicate simulated drone trajectories, with a mean sim-to-real root mean square error (RMSE) of 28.7 mm. To date, Swarm-GPT has been successfully showcased at three live events, exemplifying safe real-world deployment of pre-trained models.},
}

Min-max vertex cycle covers with connectivity constraints for multi-robot patrolling
J. Scherer, A. P. Schoellig, and B. Rinner
IEEE Robotics and Automation Letters, vol. 7, iss. 4, p. 10152–10159, 2022.
We consider a multi-robot patrolling scenario with intermittent connectivity constraints, ensuring that robots’ data finally arrive at a base station. In particular, each robot traverses a closed tour periodically and meets with the robots on neighboring tours to exchange data. We model the problem as a variant of the min-max vertex cycle cover problem (MMCCP), which is the problem of covering all vertices with a given number of disjoint tours such that the longest tour length is minimal. In this work, we introduce the minimum idleness connectivity-constrained multi-robot patrolling problem, show that it is NP-hard, and model it as a mixed-integer linear program (MILP). The computational complexity of exactly solving this problem restrains practical applications, and therefore we develop approximate algorithms taking a solution for MMCCP as input. Our simulation experiments on 10 vertices and up to 3 robots compare the results of different solution approaches (including solving the MILP formulation) and show that our greedy algorithm can obtain an objective value close to the one of the MILP formulations but requires much less computation time. Experiments on instances with up to 100 vertices and 10 robots indicate that the greedy approximation algorithm tries to keep the length of the longest tour small by extending smaller tours for data exchange.
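
To make the min-max vertex cycle cover objective concrete, the sketch below evaluates a candidate solution: it checks that the given disjoint tours cover every vertex exactly once and returns the length of the longest closed tour. It is only an objective evaluator under assumed inputs (Euclidean coordinates, tours as vertex lists); it does not implement the MILP, the connectivity constraints, or the greedy algorithm from the paper.

```python
import math


def tour_length(tour, coords):
    """Length of the closed tour that visits `tour` in order and returns to its start."""
    if len(tour) < 2:
        return 0.0
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def min_max_objective(tours, coords):
    """Longest tour length of a vertex cycle cover (the quantity to be minimized).

    `coords` maps each vertex to an (x, y) position; `tours` is a list of
    vertex lists. Raises ValueError if the tours are not a disjoint cover.
    """
    covered = sorted(v for tour in tours for v in tour)
    if covered != sorted(coords):
        raise ValueError("tours must cover every vertex exactly once")
    return max(tour_length(tour, coords) for tour in tours)
```

For example, a unit square patrolled by one robot and a two-vertex segment of length 2 patrolled back and forth by another both give tours of length 4, so the objective value is 4.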

@article{scherer-ral22,
author={J{\"u}rgen Scherer and Angela P. Schoellig and Bernhard Rinner},
title={Min-Max Vertex Cycle Covers With Connectivity Constraints for Multi-Robot Patrolling},
journal = {{IEEE Robotics and Automation Letters}},
year = {2022},
volume = {7},
number = {4},
pages = {10152--10159},
doi = {10.1109/LRA.2022.3193242},
urllink = {https://ieeexplore.ieee.org/abstract/document/9837406/},
abstract = {We consider a multi-robot patrolling scenario with intermittent connectivity constraints, ensuring that robots' data finally arrive at a base station. In particular, each robot traverses a closed tour periodically and meets with the robots on neighboring tours to exchange data. We model the problem as a variant of the min-max vertex cycle cover problem (MMCCP), which is the problem of covering all vertices with a given number of disjoint tours such that the longest tour length is minimal. In this work, we introduce the minimum idleness connectivity-constrained multi-robot patrolling problem, show that it is NP-hard, and model it as a mixed-integer linear program (MILP). The computational complexity of exactly solving this problem restrains practical applications, and therefore we develop approximate algorithms taking a solution for MMCCP as input. Our simulation experiments on 10 vertices and up to 3 robots compare the results of different solution approaches (including solving the MILP formulation) and show that our greedy algorithm can obtain an objective value close to the one of the MILP formulations but requires much less computation time. Experiments on instances with up to 100 vertices and 10 robots indicate that the greedy approximation algorithm tries to keep the length of the longest tour small by extending smaller tours for data exchange.}
}

Learning to fly—a Gym environment with PyBullet physics for reinforcement learning of multi-agent quadcopter control
J. Panerati, H. Zheng, S. Zhou, J. Xu, A. Prorok, and A. P. Schoellig
in Proc. of the IEEE International Conference on Intelligent Robots and Systems (IROS), 2021, p. 7512–7519.
Robotic simulators are crucial for academic research and education as well as the development of safety-critical applications. Reinforcement learning environments (simple simulations coupled with a problem specification in the form of a reward function) are also important to standardize the development (and benchmarking) of learning algorithms. Yet, full-scale simulators typically lack portability and parallelizability. Vice versa, many reinforcement learning environments trade off realism for high sample throughputs in toy-like problems. While public data sets have greatly benefited deep learning and computer vision, we still lack the software tools to simultaneously develop, and fairly compare, control theory and reinforcement learning approaches. In this paper, we propose an open-source OpenAI Gym-like environment for multiple quadcopters based on the Bullet physics engine. Its multi-agent and vision-based reinforcement learning interfaces, as well as the support of realistic collisions and aerodynamic effects, make it, to the best of our knowledge, a first of its kind. We demonstrate its use through several examples, either for control (trajectory tracking with PID control, multi-robot flight with downwash, etc.) or reinforcement learning (single and multi-agent stabilization tasks), hoping to inspire future research that combines control theory and machine learning.

@INPROCEEDINGS{panerati-iros21,
author = {Jacopo Panerati and Hehui Zheng and SiQi Zhou and James Xu and Amanda Prorok and Angela P. Schoellig},
title = {Learning to Fly—a {Gym} Environment with {PyBullet} Physics for
Reinforcement Learning of Multi-agent Quadcopter Control},
booktitle = {{Proc. of the IEEE International Conference on Intelligent Robots and Systems (IROS)}},
year = {2021},
pages = {7512--7519},
doi = {10.1109/IROS51168.2021.9635857},
urlvideo = {https://www.youtube.com/watch?v=-zyrmneaz88},
urllink = {https://arxiv.org/abs/2103.02142},
abstract = {Robotic simulators are crucial for academic research and education as well as the development of safety-critical applications. Reinforcement learning environments (simple simulations coupled with a problem specification in the form of a reward function) are also important to standardize the development (and benchmarking) of learning algorithms. Yet, full-scale simulators typically lack portability and parallelizability. Vice versa, many reinforcement learning environments trade off realism for high sample throughputs in toy-like problems. While public data sets have greatly benefited deep learning and computer vision, we still lack the software tools to simultaneously develop, and fairly compare, control theory and reinforcement learning approaches. In this paper, we propose an open-source OpenAI Gym-like environment for multiple quadcopters based on the Bullet physics engine. Its multi-agent and vision-based reinforcement learning interfaces, as well as the support of realistic collisions and aerodynamic effects, make it, to the best of our knowledge, a first of its kind. We demonstrate its use through several examples, either for control (trajectory tracking with PID control, multi-robot flight with downwash, etc.) or reinforcement learning (single and multi-agent stabilization tasks), hoping to inspire future research that combines control theory and machine learning.},
}

Online trajectory generation with distributed model predictive control for multi-robot motion planning
C. E. Luis, M. Vukosavljev, and A. P. Schoellig
IEEE Robotics and Automation Letters, vol. 5, iss. 2, p. 604–611, 2020.
We present a distributed model predictive control (DMPC) algorithm to generate trajectories in real-time for multiple robots. We adopted the on-demand collision avoidance method presented in previous work to efficiently compute non-colliding trajectories in transition tasks. An event-triggered replanning strategy is proposed to account for disturbances. Our simulation results show that the proposed collision avoidance method can reduce, on average, around 50% of the travel time required to complete a multi-agent point-to-point transition when compared to the well-studied Buffered Voronoi Cells (BVC) approach. Additionally, it shows a higher success rate in transition tasks with a high density of agents, with more than 90% success rate with 30 palm-sized quadrotor agents in an 18 m^3 arena. The approach was experimentally validated with a swarm of up to 20 drones flying in close proximity.

@article{luis-ral20,
title = {Online Trajectory Generation with Distributed Model Predictive Control for Multi-Robot Motion Planning},
author = {Carlos E. Luis and Marijan Vukosavljev and Angela P. Schoellig},
journal = {{IEEE Robotics and Automation Letters}},
year = {2020},
volume = {5},
number = {2},
pages = {604--611},
doi = {10.1109/LRA.2020.2964159},
urlvideo = {https://www.youtube.com/watch?v=N4rWiraIU2k},
urllink = {https://arxiv.org/pdf/1909.05150.pdf},
abstract = {We present a distributed model predictive control (DMPC) algorithm to generate trajectories in real-time for multiple robots. We adopted the on-demand collision avoidance method presented in previous work to efficiently compute non-colliding trajectories in transition tasks. An event-triggered replanning strategy is proposed to account for disturbances. Our simulation results show that the proposed collision avoidance method can reduce, on average, around 50\% of the travel time required to complete a multi-agent point-to-point transition when compared to the well-studied Buffered Voronoi Cells (BVC) approach. Additionally, it shows a higher success rate in transition tasks with a high density of agents, with more than 90\% success rate with 30 palm-sized quadrotor agents in an 18 m^3 arena. The approach was experimentally validated with a swarm of up to 20 drones flying in close proximity.}
}

A modular framework for motion planning using safe-by-design motion primitives
M. Vukosavljev, Z. Kroeze, A. P. Schoellig, and M. E. Broucke
IEEE Transactions on Robotics, vol. 35, iss. 5, p. 1233–1252, 2019.
In this paper, we present a modular framework for solving a motion planning problem among a group of robots. The proposed framework utilizes a finite set of low-level motion primitives to generate motions in a gridded workspace. The constraints on allowable sequences of motion primitives are formalized through a maneuver automaton. At the high level, a control policy determines which motion primitive is executed in each box of the gridded workspace. We state general conditions on motion primitives to obtain provably correct behavior so that a library of safe-by-design motion primitives can be designed. The overall framework yields a highly robust design by utilizing feedback strategies at both the low and high levels. We provide specific designs for motion primitives and control policies suitable for multirobot motion planning; the modularity of our approach enables one to independently customize the designs of each of these components. Our approach is experimentally validated on a group of quadrocopters.

@article{vukosavljev-tro19,
title = {A modular framework for motion planning using safe-by-design motion primitives},
author = {Marijan Vukosavljev and Zachary Kroeze and Angela P. Schoellig and Mireille E. Broucke},
journal = {{IEEE Transactions on Robotics}},
year = {2019},
volume = {35},
number = {5},
pages = {1233--1252},
doi = {10.1109/TRO.2019.2923335},
urlvideo = {http://tiny.cc/modular-3alg},
urllink = {https://arxiv.org/abs/1905.00495},
abstract = {In this paper, we present a modular framework for solving a motion planning problem among a group of robots. The proposed framework utilizes a finite set of low-level motion primitives to generate motions in a gridded workspace. The constraints on allowable sequences of motion primitives are formalized through a maneuver automaton. At the high level, a control policy determines which motion primitive is executed in each box of the gridded workspace. We state general conditions on motion primitives to obtain provably correct behavior so that a library of safe-by-design motion primitives can be designed. The overall framework yields a highly robust design by utilizing feedback strategies at both the low and high levels. We provide specific designs for motion primitives and control policies suitable for multirobot motion planning; the modularity of our approach enables one to independently customize the designs of each of these components. Our approach is experimentally validated on a group of quadrocopters.}
}

Trajectory generation for multiagent point-to-point transitions via distributed model predictive control
C. E. Luis and A. P. Schoellig
IEEE Robotics and Automation Letters, vol. 4, iss. 2, p. 375–382, 2019.
This paper introduces a novel algorithm for multiagent offline trajectory generation based on distributed model predictive control (DMPC). By predicting future states and sharing this information with their neighbours, the agents are able to detect and avoid collisions while moving towards their goals. The proposed algorithm computes transition trajectories for dozens of vehicles in a few seconds. It reduces the computation time by more than 85% compared to previous optimization approaches based on sequential convex programming (SCP), while causing only a small impact on the optimality of the plans. We replaced the previous compatibility constraints in DMPC, which limit the motion of the agents in order to avoid collisions, by relaxing the collision constraints and enforcing them only when required. The approach was validated both through extensive simulations for a wide range of randomly generated transitions and with teams of up to 25 quadrotors flying in confined indoor spaces.
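
The idea of enforcing collision constraints only when required can be sketched as a check over shared predictions: an agent scans its own predicted positions against those received from neighbours and reports the earliest horizon step at which the separation becomes too small. The sketch below covers only this detection step. The function name, the dictionary layout, and the plain Euclidean distance threshold `r_min` are assumptions for illustration; the paper's collision model and the subsequent optimization are not reproduced here.

```python
import math


def first_predicted_collision(own_prediction, neighbor_predictions, r_min):
    """Earliest horizon step at which a shared prediction comes closer than r_min.

    `own_prediction` is a list of positions over the horizon.
    `neighbor_predictions` maps a neighbour id to its predicted positions.
    Returns (step, sorted list of violating neighbour ids), or None if the
    predictions stay separated, in which case no constraint is needed.
    """
    for step, own_position in enumerate(own_prediction):
        violators = [
            neighbor_id
            for neighbor_id, prediction in neighbor_predictions.items()
            if step < len(prediction)
            and math.dist(own_position, prediction[step]) < r_min
        ]
        if violators:
            return step, sorted(violators)
    return None
```

An agent that receives `None` can solve its trajectory problem without collision constraints; otherwise it would add constraints only for the reported neighbours around the reported step.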

@article{luis-ral19,
title = {Trajectory Generation for Multiagent Point-To-Point Transitions via Distributed Model Predictive Control},
author = {Carlos E. Luis and Angela P. Schoellig},
journal = {{IEEE Robotics and Automation Letters}},
year = {2019},
volume = {4},
number = {2},
pages = {375--382},
urllink = {https://arxiv.org/abs/1809.04230},
urlvideo = {https://youtu.be/ZN2e7h-kkpw},
abstract = {This paper introduces a novel algorithm for multiagent offline trajectory generation based on distributed model predictive control (DMPC). By predicting future states and sharing this information with their neighbours, the agents are able to detect and avoid collisions while moving towards their goals. The proposed algorithm computes transition trajectories for dozens of vehicles in a few seconds. It reduces the computation time by more than 85\% compared to previous optimization approaches based on sequential convex programming (SCP), while causing only a small impact on the optimality of the plans. We replaced the previous compatibility constraints in DMPC, which limit the motion of the agents in order to avoid collisions, by relaxing the collision constraints and enforcing them only when required. The approach was validated both through extensive simulations for a wide range of randomly generated transitions and with teams of up to 25 quadrotors flying in confined indoor spaces.}
}

Fast and in sync: periodic swarm patterns for quadrotors
X. Du, C. E. Luis, M. Vukosavljev, and A. P. Schoellig
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2019, p. 9143–9149.
This paper aims to design quadrotor swarm performances, where the swarm acts as an integrated, coordinated unit embodying moving and deforming objects. We divide the task of creating a choreography into three basic steps: designing swarm motion primitives, transitioning between those movements, and synchronizing the motion of the drones. The result is a flexible framework for designing choreographies comprised of a wide variety of motions. The motion primitives can be intuitively designed using few parameters, providing a rich library for choreography design. Moreover, we combine and adapt existing goal assignment and trajectory generation algorithms to maximize the smoothness of the transitions between motion primitives. Finally, we propose a correction algorithm to compensate for motion delays and synchronize the motion of the drones to a desired periodic motion pattern. The proposed methodology was validated experimentally by generating and executing choreographies on a swarm of 25 quadrotors.

@INPROCEEDINGS{du-icra19,
author = {Xintong Du and Carlos E. Luis and Marijan Vukosavljev and Angela P. Schoellig},
title = {Fast and in sync: periodic swarm patterns for quadrotors},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
year = {2019},
pages={9143--9149},
urllink = {https://arxiv.org/abs/1810.03572},
urlvideo = {https://drive.google.com/file/d/1D9CTpYSjdFHNjiYsFWeOI3--Ve1-3Bfp/view},
abstract = {This paper aims to design quadrotor swarm performances, where the swarm acts as an integrated, coordinated unit embodying moving and deforming objects. We divide the task of creating a choreography into three basic steps: designing swarm motion primitives, transitioning between those movements, and synchronizing the motion of the drones. The result is a flexible framework for designing choreographies comprised of a wide variety of motions. The motion primitives can be intuitively designed using few parameters, providing a rich library for choreography design. Moreover, we combine and adapt existing goal assignment and trajectory generation algorithms to maximize the smoothness of the transitions between motion primitives. Finally, we propose a correction algorithm to compensate for motion delays and synchronize the motion of the drones to a desired periodic motion pattern. The proposed methodology was validated experimentally by generating and executing choreographies on a swarm of 25 quadrotors.},
}

Towards scalable online trajectory generation for multi-robot systems
C. E. Luis, M. Vukosavljev, and A. P. Schoellig
Abstract and Poster, in Proc. of the Resilient Robot Teams Workshop at the IEEE International Conference on Robotics and Automation (ICRA), 2019.

We present a distributed model predictive control (DMPC) algorithm to generate trajectories in real-time for multiple robots, taking into account their trajectory tracking dynamics and actuation limits. An event-triggered replanning strategy is proposed to account for disturbances in the system. We adopted the on-demand collision avoidance method presented in previous work to efficiently compute non-colliding trajectories in transition tasks. Preliminary results in simulation show a higher success rate than previous online methods based on Buffered Voronoi Cells (BVC), while maintaining computational tractability for real-time operation.

@MISC{luis-icra19,
author = {Carlos E. Luis and Marijan Vukosavljev and Angela P. Schoellig},
title = {Towards Scalable Online Trajectory Generation for Multi-robot Systems},
year = {2019},
howpublished = {Abstract and Poster, in Proc. of the Resilient Robot Teams Workshop at the IEEE International Conference on Robotics and Automation (ICRA)},
abstract = {We present a distributed model predictive control (DMPC) algorithm to generate trajectories in real-time for multiple robots, taking into account their trajectory tracking dynamics and actuation limits. An event-triggered replanning strategy is proposed to account for disturbances in the system. We adopted the on-demand collision avoidance method presented in previous work to efficiently compute non-colliding trajectories in transition tasks. Preliminary results in simulation show a higher success rate than previous online methods based on Buffered Voronoi Cells (BVC), while maintaining computational tractability for real-time operation.},
}

Multiagent Learning

[DOI] To share or not to share? performance guarantees and the asymmetric nature of cross-robot experience transfer
M. J. Sorocky, S. Zhou, and A. P. Schoellig
IEEE Control Systems Letters, vol. 5, iss. 3, p. 923–928, 2020.

In the robotics literature, experience transfer has been proposed in different learning-based control frameworks to minimize the costs and risks associated with training robots. While various works have shown the feasibility of transferring prior experience from a source robot to improve or accelerate the learning of a target robot, there are usually no guarantees that experience transfer improves the performance of the target robot. In practice, the efficacy of transferring experience is often not known until it is tested on physical robots. This trial-and-error approach can be extremely unsafe and inefficient. Building on our previous work, in this paper we consider an inverse module transfer learning framework, where the inverse module of a source robot system is transferred to a target robot system to improve its tracking performance on arbitrary trajectories. We derive a theoretical bound on the tracking error when a source inverse module is transferred to the target robot and propose a Bayesian-optimization-based algorithm to estimate this bound from data. We further highlight the asymmetric nature of cross-robot experience transfer that has often been neglected in the literature. We demonstrate our approach in quadrotor experiments and show that we can guarantee positive transfer on the target robot for tracking random periodic trajectories.

@article{sorocky-lcss20,
title = {To Share or Not to Share? Performance Guarantees and the Asymmetric Nature of Cross-Robot Experience Transfer},
author = {Michael J. Sorocky and Siqi Zhou and Angela P. Schoellig},
journal = {{IEEE Control Systems Letters}},
year = {2020},
volume = {5},
number = {3},
pages = {923--928},
doi = {10.1109/LCSYS.2020.3005886},
urllink = {https://ieeexplore.ieee.org/document/9129781},
urlvideo = {https://www.youtube.com/watch?v=fPWNhIMcMqM},
urlvideo2 = {https://youtu.be/wVAxJO-pejQ},
abstract = {In the robotics literature, experience transfer has been proposed in different learning-based control frameworks to minimize the costs and risks associated with training robots. While various works have shown the feasibility of transferring prior experience from a source robot to improve or accelerate the learning of a target robot, there are usually no guarantees that experience transfer improves the performance of the target robot. In practice, the efficacy of transferring experience is often not known until it is tested on physical robots. This trial-and-error approach can be extremely unsafe and inefficient. Building on our previous work, in this paper we consider an inverse module transfer learning framework, where the inverse module of a source robot system is transferred to a target robot system to improve its tracking performance on arbitrary trajectories. We derive a theoretical bound on the tracking error when a source inverse module is transferred to the target robot and propose a Bayesian-optimization-based algorithm to estimate this bound from data. We further highlight the asymmetric nature of cross-robot experience transfer that has often been neglected in the literature. We demonstrate our approach in quadrotor experiments and show that we can guarantee positive transfer on the target robot for tracking random periodic trajectories.}
}

Experience selection using dynamics similarity for efficient multi-source transfer learning between robots
M. J. Sorocky, S. Zhou, and A. P. Schoellig
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2020, p. 2739–2745.

In the robotics literature, different knowledge transfer approaches have been proposed to leverage the experience from a source task or robot (real or virtual) to accelerate the learning process on a new task or robot. A commonly made but infrequently examined assumption is that incorporating experience from a source task or robot will be beneficial. For practical applications, inappropriate knowledge transfer can result in negative transfer or unsafe behaviour. In this work, inspired by a system gap metric from robust control theory, the nu-gap, we present a data-efficient algorithm for estimating the similarity between pairs of robot systems. In a multi-source inter-robot transfer learning setup, we show that this similarity metric allows us to predict relative transfer performance and thus informatively select experiences from a source robot before knowledge transfer. We demonstrate our approach with quadrotor experiments, where we transfer an inverse dynamics model from a real or virtual source quadrotor to enhance the tracking performance of a target quadrotor on arbitrary hand-drawn trajectories. We show that selecting experiences based on the proposed similarity metric effectively facilitates the learning of the target quadrotor, improving performance by 62% compared to a poorly selected experience.

@INPROCEEDINGS{sorocky-icra20,
author = {Michael J. Sorocky and Siqi Zhou and Angela P. Schoellig},
title = {Experience Selection Using Dynamics Similarity for Efficient Multi-Source Transfer Learning Between Robots},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
year = {2020},
pages = {2739--2745},
urllink = {https://ieeexplore.ieee.org/document/9196744},
urlvideo = {https://youtu.be/8m3mOkljujM},
abstract = {In the robotics literature, different knowledge transfer approaches have been proposed to leverage the experience from a source task or robot—real or virtual—to accelerate the learning process on a new task or robot. A commonly made but infrequently examined assumption is that incorporating experience from a source task or robot will be beneficial. For practical applications, inappropriate knowledge transfer can result in negative transfer or unsafe behaviour. In this work, inspired by a system gap metric from robust control theory, the nu-gap, we present a data-efficient algorithm for estimating the similarity between pairs of robot systems. In a multi-source inter-robot transfer learning setup, we show that this similarity metric allows us to predict relative transfer performance and thus informatively select experiences from a source robot before knowledge transfer. We demonstrate our approach with quadrotor experiments, where we transfer an inverse dynamics model from a real or virtual source quadrotor to enhance the tracking performance of a target quadrotor on arbitrary hand-drawn trajectories. We show that selecting experiences based on the proposed similarity metric effectively facilitates the learning of the target quadrotor, improving performance by 62\% compared to a poorly selected experience.},
}

[DOI] Distributed iterative learning control for multi-agent systems
A. Hock and A. P. Schoellig
Autonomous Robots, vol. 43, iss. 8, p. 1989–2010, 2019.

The goal of this work is to enable a team of quadrotors to learn how to accurately track a desired trajectory while holding a given formation. We solve this problem in a distributed manner, where each vehicle has only access to the information of its neighbors. The desired trajectory is only available to one (or few) vehicle(s). We present a distributed iterative learning control (ILC) approach where each vehicle learns from the experience of its own and its neighbors’ previous task repetitions and adapts its feedforward input to improve performance. Existing algorithms are extended in theory to make them more applicable to real-world experiments. In particular, we prove convergence of the learning scheme for any linear, causal learning function with gains chosen according to a simple scalar condition. Previous proofs were restricted to a specific learning function, which only depends on the tracking error derivative (D-type ILC). This extension provides more degrees of freedom in the ILC design and, as a result, better performance can be achieved. We also show that stability is not affected by a linear dynamic coupling between neighbors. This allows the use of an additional consensus feedback controller to compensate for non-repetitive disturbances. Possible robustness extensions for the ILC algorithm are discussed, the so-called Q-filter and a Kalman filter for disturbance estimation. Finally, this is the first work to show distributed ILC in experiment. With a team of two quadrotors, the practical applicability of the proposed distributed multi-agent ILC approach is attested and the benefits of the theoretic extension are analyzed. In a second experimental setup with a team of four quadrotors, we evaluate the impact of different communication graph structures on the learning performance. The results indicate that there is a trade-off between fast learning convergence and formation synchronicity, especially during the first iterations.

@article{hock-auro19,
title = {Distributed iterative learning control for multi-agent systems},
author = {Andreas Hock and Angela P. Schoellig},
journal = {{Autonomous Robots}},
year = {2019},
volume = {43},
number = {8},
pages = {1989--2010},
doi = {10.1007/s10514-019-09845-4},
abstract = {The goal of this work is to enable a team of quadrotors to learn how to accurately track a desired trajectory while holding a given formation. We solve this problem in a distributed manner, where each vehicle has only access to the information of its neighbors. The desired trajectory is only available to one (or few) vehicle(s). We present a distributed iterative learning control {(ILC)} approach where each vehicle learns from the experience of its own and its neighbors’ previous task repetitions and adapts its feedforward input to improve performance. Existing algorithms are extended in theory to make them more applicable to real-world experiments. In particular, we prove convergence of the learning scheme for any linear, causal learning function with gains chosen according to a simple scalar condition. Previous proofs were restricted to a specific learning function, which only depends on the tracking error derivative {(D-type ILC)}. This extension provides more degrees of freedom in the {ILC} design and, as a result, better performance can be achieved. We also show that stability is not affected by a linear dynamic coupling between neighbors. This allows the use of an additional consensus feedback controller to compensate for non-repetitive disturbances. Possible robustness extensions for the {ILC} algorithm are discussed, the so-called {Q-filter} and a {Kalman} filter for disturbance estimation. Finally, this is the first work to show distributed ILC in experiment. With a team of two quadrotors, the practical applicability of the proposed distributed multi-agent {ILC} approach is attested and the benefits of the theoretic extension are analyzed. In a second experimental setup with a team of four quadrotors, we evaluate the impact of different communication graph structures on the learning performance. The results indicate that there is a trade-off between fast learning convergence and formation synchronicity, especially during the first iterations.}
}

[DOI] Transfer learning for high-precision trajectory tracking through L1 adaptive feedback and iterative learning
K. Pereida, D. Kooijman, R. R. P. R. Duivenvoorden, and A. P. Schoellig
International Journal of Adaptive Control and Signal Processing, vol. 33, iss. 2, p. 388–409, 2019.

Robust and adaptive control strategies are needed when robots or automated systems are introduced to unknown and dynamic environments where they are required to cope with disturbances, unmodeled dynamics, and parametric uncertainties. In this paper, we demonstrate the capabilities of a combined L1 adaptive control and iterative learning control (ILC) framework to achieve high-precision trajectory tracking in the presence of unknown and changing disturbances. The L1 adaptive controller makes the system behave close to a reference model; however, it does not guarantee that perfect trajectory tracking is achieved, while ILC improves trajectory tracking performance based on previous iterations. The combined framework in this paper uses L1 adaptive control as an underlying controller that achieves a robust and repeatable behavior, while the ILC acts as a high-level adaptation scheme that mainly compensates for systematic tracking errors. We illustrate that this framework enables transfer learning between dynamically different systems, where learned experience of one system can be shown to be beneficial for another different system. Experimental results with two different quadrotors show the superior performance of the combined L1-ILC framework compared with approaches using ILC with an underlying proportional-derivative controller or proportional-integral-derivative controller. Results highlight that our L1-ILC framework can achieve high-precision trajectory tracking when unknown and changing disturbances are present and can achieve transfer of learned experience between dynamically different systems. Moreover, our approach is able to achieve precise trajectory tracking in the first attempt when the initial input is generated based on the reference model of the adaptive controller.

@ARTICLE{pereida-acsp18,
title={Transfer Learning for High-Precision Trajectory Tracking Through {L1} Adaptive Feedback and Iterative Learning},
author={Karime Pereida and Dave Kooijman and Rikky R. P. R. Duivenvoorden and Angela P. Schoellig},
journal={{International Journal of Adaptive Control and Signal Processing}},
year={2019},
volume = {33},
number = {2},
pages = {388--409},
doi={10.1002/acs.2887},
urllink={https://onlinelibrary.wiley.com/doi/abs/10.1002/acs.2887},
abstract={Robust and adaptive control strategies are needed when robots or automated systems are introduced to unknown and dynamic environments where they are required to cope with disturbances, unmodeled dynamics, and parametric uncertainties. In this paper, we demonstrate the capabilities of a combined L1 adaptive control and iterative learning control (ILC) framework to achieve high-precision trajectory tracking in the presence of unknown and changing disturbances. The L1 adaptive controller makes the system behave close to a reference model; however, it does not guarantee that perfect trajectory tracking is achieved, while ILC improves trajectory tracking performance based on previous iterations. The combined framework in this paper uses L1 adaptive control as an underlying controller that achieves a robust and repeatable behavior, while the ILC acts as a high-level adaptation scheme that mainly compensates for systematic tracking errors. We illustrate that this framework enables transfer learning between dynamically different systems, where learned experience of one system can be shown to be beneficial for another different system. Experimental results with two different quadrotors show the superior performance of the combined L1-ILC framework compared with approaches using ILC with an underlying proportional-derivative controller or proportional-integral-derivative controller. Results highlight that our L1-ILC framework can achieve high-precision trajectory tracking when unknown and changing disturbances are present and can achieve transfer of learned experience between dynamically different systems. Moreover, our approach is able to achieve precise trajectory tracking in the first attempt when the initial input is generated based on the reference model of the adaptive controller.},
}

Data-efficient multi-robot, multi-task transfer learning for trajectory tracking
K. Pereida, M. K. Helwa, and A. P. Schoellig
Abstract and Poster, in Proc. of the Resilient Robot Teams Workshop at the IEEE International Conference on Robotics and Automation (ICRA), 2019.

Learning can significantly improve the performance of robots in uncertain and changing environments; however, typical learning approaches need to start a new learning process for each new task or robot as transferring knowledge is cumbersome or not possible. In this work, we introduce a multi-robot, multi-task transfer learning framework that allows a system to complete a task by learning from a few demonstrations of another task executed on a different system. We focus on the trajectory tracking problem where each trajectory represents a different task. The proposed learning control architecture has two stages: (i) a multi-robot transfer learning framework that combines L1 adaptive control and iterative learning control, where the key idea is that the adaptive controller forces dynamically different systems to behave as a specified reference model; and (ii) a multi-task transfer learning framework that uses theoretical control results (e.g., the concept of vector relative degree) to learn a map from desired trajectories to the inputs that make the system track these trajectories with high accuracy. This map is used to calculate the inputs for a new, unseen trajectory. We conduct experiments on two different quadrotor platforms and six different trajectories where we show that using information from tracking a single trajectory learned by one quadrotor reduces, on average, the first-iteration tracking error on another quadrotor by 74%.

@MISC{pereida-icra19c,
author = {Karime Pereida and Mohamed K. Helwa and Angela P. Schoellig},
title = {Data-Efficient Multi-Robot, Multi-Task Transfer Learning for Trajectory Tracking},
year = {2019},
howpublished = {Abstract and Poster, in Proc. of the Resilient Robot Teams Workshop at the IEEE International Conference on Robotics and Automation (ICRA)},
abstract = {Learning can significantly improve the performance of robots in uncertain and changing environments; however, typical learning approaches need to start a new learning process for each new task or robot as transferring knowledge is cumbersome or not possible. In this work, we introduce a multi-robot, multi-task transfer learning framework that allows a system to complete a task by learning from a few demonstrations of another task executed on a different system. We focus on the trajectory tracking problem where each trajectory represents a different task. The proposed learning control architecture has two stages: (i) a \emph{multi-robot} transfer learning framework that combines $\mathcal{L}_1$ adaptive control and iterative learning control, where the key idea is that the adaptive controller forces dynamically different systems to behave as a specified reference model; and (ii) a \emph{multi-task} transfer learning framework that uses theoretical control results (e.g., the concept of vector relative degree) to learn a map from desired trajectories to the inputs that make the system track these trajectories with high accuracy. This map is used to calculate the inputs for a new, unseen trajectory. We conduct experiments on two different quadrotor platforms and six different trajectories where we show that using information from tracking a single trajectory learned by one quadrotor reduces, on average, the first-iteration tracking error on another quadrotor by 74\%.},
}

[DOI] Data-efficient multi-robot, multi-task transfer learning for trajectory tracking
K. Pereida, M. K. Helwa, and A. P. Schoellig
IEEE Robotics and Automation Letters, vol. 3, iss. 2, p. 1260–1267, 2018.

Transfer learning has the potential to reduce the burden of data collection and to decrease the unavoidable risks of the training phase. In this paper, we introduce a multi-robot, multi-task transfer learning framework that allows a system to complete a task by learning from a few demonstrations of another task executed on another system. We focus on the trajectory tracking problem where each trajectory represents a different task, since many robotic tasks can be described as a trajectory tracking problem. The proposed, multi-robot transfer learning framework is based on a combined L1 adaptive control and iterative learning control approach. The key idea is that the adaptive controller forces dynamically different systems to behave as a specified reference model. The proposed multi-task transfer learning framework uses theoretical control results (e.g., the concept of vector relative degree) to learn a map from desired trajectories to the inputs that make the system track these trajectories with high accuracy. This map is used to calculate the inputs for a new, unseen trajectory. Experimental results using two different quadrotor platforms and six different trajectories show that, on average, the proposed framework reduces the first-iteration tracking error by 74% when information from tracking a different, single trajectory on a different quadrotor is utilized.

@article{pereida-ral18,
title = {Data-Efficient Multi-Robot, Multi-Task Transfer Learning for Trajectory Tracking},
author = {Karime Pereida and Mohamed K. Helwa and Angela P. Schoellig},
journal = {{IEEE Robotics and Automation Letters}},
year = {2018},
volume = {3},
number = {2},
doi = {10.1109/LRA.2018.2795653},
pages = {1260--1267},
urllink = {http://ieeexplore.ieee.org/abstract/document/8264705/},
abstract = {Transfer learning has the potential to reduce the burden of data collection and to decrease the unavoidable risks of the training phase. In this paper, we introduce a multi-robot, multi-task transfer learning framework that allows a system to complete a task by learning from a few demonstrations of another task executed on another system. We focus on the trajectory tracking problem where each trajectory represents a different task, since many robotic tasks can be described as a trajectory tracking problem. The proposed, multi-robot transfer learning framework is based on a combined L1 adaptive control and iterative learning control approach. The key idea is that the adaptive controller forces dynamically different systems to behave as a specified reference model. The proposed multi-task transfer learning framework uses theoretical control results (e.g., the concept of vector relative degree) to learn a map from desired trajectories to the inputs that make the system track these trajectories with high accuracy. This map is used to calculate the inputs for a new, unseen trajectory. Experimental results using two different quadrotor platforms and six different trajectories show that, on average, the proposed framework reduces the first-iteration tracking error by 74\% when information from tracking a different, single trajectory on a different quadrotor is utilized.},
}

[DOI] Multi-robot transfer learning: a dynamical system perspective
M. K. Helwa and A. P. Schoellig
in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, p. 4702–4708.

Multi-robot transfer learning allows a robot to use data generated by a second, similar robot to improve its own behavior. The potential advantages are reducing the time of training and the unavoidable risks that exist during the training phase. Transfer learning algorithms aim to find an optimal transfer map between different robots. In this paper, we investigate, through a theoretical study of single-input single-output (SISO) systems, the properties of such optimal transfer maps. We first show that the optimal transfer learning map is, in general, a dynamic system. The main contribution of the paper is to provide an algorithm for determining the properties of this optimal dynamic map including its order and regressors (i.e., the variables it depends on). The proposed algorithm does not require detailed knowledge of the robots’ dynamics, but relies on basic system properties easily obtainable through simple experimental tests. We validate the proposed algorithm experimentally through an example of transfer learning between two different quadrotor platforms. Experimental results show that an optimal dynamic map, with correct properties obtained from our proposed algorithm, achieves 60-70% reduction of transfer learning error compared to the cases when the data is directly transferred or transferred using an optimal static map.
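
The claim that the optimal transfer map is, in general, a dynamic system can be illustrated with a toy example. The following is a minimal sketch and not the algorithm from the paper: it assumes two hypothetical first-order discrete-time SISO systems (the coefficients 0.9, 0.1, 0.6, 0.4 are made up), drives both with the same input, and fits a static map and a dynamic map from source output to target output by ordinary least squares. The choice of regressors for the dynamic map is also an assumption made for this toy case.

```python
import random

def simulate(a, b, u):
    """Hypothetical first-order system y[k+1] = a*y[k] + b*u[k], starting at rest."""
    y = [0.0]
    for uk in u:
        y.append(a * y[-1] + b * uk)
    return y

def solve(A, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * p for x, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def least_squares(rows, targets):
    """Fit targets ~ rows * theta via the normal equations."""
    n = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    proj = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve(gram, proj)

def rms_error(rows, targets, theta):
    errs = [t - sum(x * w for x, w in zip(r, theta)) for r, t in zip(rows, targets)]
    return (sum(e * e for e in errs) / len(errs)) ** 0.5

# Same random input applied to a "source" and a "target" robot (assumed models).
rng = random.Random(0)
u = [rng.uniform(-1.0, 1.0) for _ in range(500)]
y_src = simulate(0.9, 0.1, u)
y_tgt = simulate(0.6, 0.4, u)

targets = y_tgt[1:]  # y_tgt[k+1] for k = 0..499

# Static map: y_tgt[k+1] ~ c * y_src[k+1]
static_rows = [[ys] for ys in y_src[1:]]
# Dynamic map: y_tgt[k+1] ~ t0*y_tgt[k] + t1*y_src[k+1] + t2*y_src[k]
dynamic_rows = [[y_tgt[k], y_src[k + 1], y_src[k]] for k in range(len(u))]

theta_static = least_squares(static_rows, targets)
theta_dynamic = least_squares(dynamic_rows, targets)
static_rms = rms_error(static_rows, targets, theta_static)
dynamic_rms = rms_error(dynamic_rows, targets, theta_dynamic)

print("static map RMS error: ", static_rms)
print("dynamic map RMS error:", dynamic_rms)
print("dynamic map coefficients:", theta_dynamic)
```

For these assumed models, eliminating the input gives y_tgt[k+1] = a_t y_tgt[k] + (b_t/b_s) y_src[k+1] - (a_s b_t/b_s) y_src[k], so the dynamic fit should recover coefficients close to 0.6, 4.0 and -3.6, while no single static gain can reproduce the target output.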

@INPROCEEDINGS{helwa-iros17,
author={Mohamed K. Helwa and Angela P. Schoellig},
title={Multi-Robot Transfer Learning: A Dynamical System Perspective},
booktitle={{Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}},
year={2017},
pages={4702--4708},
doi={10.1109/IROS.2017.8206342},
urllink={https://arxiv.org/abs/1707.08689},
abstract={Multi-robot transfer learning allows a robot to use data generated by a second, similar robot to improve its own behavior. The potential advantages are reducing the time of training and the unavoidable risks that exist during the training phase. Transfer learning algorithms aim to find an optimal transfer map between different robots. In this paper, we investigate, through a theoretical study of single-input single-output (SISO) systems, the properties of such optimal transfer maps. We first show that the optimal transfer learning map is, in general, a dynamic system. The main contribution of the paper is to provide an algorithm for determining the properties of this optimal dynamic map including its order and regressors (i.e., the variables it depends on). The proposed algorithm does not require detailed knowledge of the robots’ dynamics, but relies on basic system properties easily obtainable through simple experimental tests. We validate the proposed algorithm experimentally through an example of transfer learning between two different quadrotor platforms. Experimental results show that an optimal dynamic map, with correct properties obtained from our proposed algorithm, achieves 60--70\% reduction of transfer learning error compared to the cases when the data is directly transferred or transferred using an optimal static map.},
}

[DOI] High-precision trajectory tracking in changing environments through L1 adaptive feedback and iterative learning
K. Pereida, R. R. P. R. Duivenvoorden, and A. P. Schoellig
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2017, p. 344–350.

As robots and other automated systems are introduced to unknown and dynamic environments, robust and adaptive control strategies are required to cope with disturbances, unmodeled dynamics and parametric uncertainties. In this paper, we propose and provide theoretical proofs of a combined L1 adaptive feedback and iterative learning control (ILC) framework to improve trajectory tracking of a system subject to unknown and changing disturbances. The L1 adaptive controller forces the system to behave in a repeatable, predefined way, even in the presence of unknown and changing disturbances; however, this does not imply that perfect trajectory tracking is achieved. ILC improves the tracking performance based on experience from previous executions. The performance of ILC is limited by the robustness and repeatability of the underlying system, which, in this approach, is handled by the L1 adaptive controller. In particular, we are able to generalize learned trajectories across different system configurations because the L1 adaptive controller handles the underlying changes in the system. We demonstrate the improved trajectory tracking performance and generalization capabilities of the combined method compared to pure ILC in experiments with a quadrotor subject to unknown, dynamic disturbances. This is the first work to show L1 adaptive control combined with ILC in experiment.

@INPROCEEDINGS{pereida-icra17,
author = {Karime Pereida and Rikky R. P. R. Duivenvoorden and Angela P. Schoellig},
title = {High-precision trajectory tracking in changing environments through {L1} adaptive feedback and iterative learning},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
year = {2017},
pages = {344--350},
doi = {10.1109/ICRA.2017.7989041},
urllink = {http://ieeexplore.ieee.org/abstract/document/7989044/},
urlslides = {../../wp-content/papercite-data/slides/pereida-icra17-slides.pdf},
abstract = {As robots and other automated systems are introduced to unknown and dynamic environments, robust and adaptive control strategies are required to cope with disturbances, unmodeled dynamics and parametric uncertainties. In this paper, we propose and provide theoretical proofs of a combined L1 adaptive feedback and iterative learning control (ILC) framework to improve trajectory tracking of a system subject to unknown and changing disturbances. The L1 adaptive controller forces the system to behave in a repeatable, predefined way, even in the presence of unknown and changing disturbances; however, this does not imply that perfect trajectory tracking is achieved. ILC improves the tracking performance based on experience from previous executions. The performance of ILC is limited by the robustness and repeatability of the underlying system, which, in this approach, is handled by the L1 adaptive controller. In particular, we are able to generalize learned trajectories across different system configurations because the L1 adaptive controller handles the underlying changes in the system. We demonstrate the improved trajectory tracking performance and generalization capabilities of the combined method compared to pure ILC in experiments with a quadrotor subject to unknown, dynamic disturbances. This is the first work to show L1 adaptive control combined with ILC in experiment.},
}

Distributed iterative learning control for a team of quadrotors
A. Hock and A. P. Schoellig
in Proc. of the IEEE Conference on Decision and Control (CDC), 2016, pp. 4640–4646.

The goal of this work is to enable a team of quadrotors to learn how to accurately track a desired trajectory while holding a given formation. We solve this problem in a distributed manner, where each vehicle has access only to the information of its neighbors. The desired trajectory is available only to one (or a few) vehicles. We present a distributed iterative learning control (ILC) approach where each vehicle learns from the experience of its own and its neighbors’ previous task repetitions, and adapts its feedforward input to improve performance. Existing algorithms are extended in theory to make them more applicable to real-world experiments. In particular, we prove stability for any causal learning function with gains chosen according to a simple scalar condition. Previous proofs were restricted to a specific learning function that only depends on the tracking error derivative (D-type ILC). Our extension provides more degrees of freedom in the ILC design and, as a result, better performance can be achieved. We also show that stability is not affected by a linear dynamic coupling between neighbors. This allows us to use an additional consensus feedback controller to compensate for non-repetitive disturbances. Experiments with two quadrotors attest to the effectiveness of the proposed distributed multi-agent ILC approach. This is the first work to show distributed ILC in experiment.
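
The learning scheme described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a hypothetical scalar plant y[t+1] = a*y[t] + b*u[t] + d, a simple proportional learning function, and a two-agent leader-follower graph in which only agent 0 sees the reference. All names and parameter values (`rollout`, `distributed_ilc`, `a`, `b`, `gain`, `offset`) are illustrative assumptions.

```python
import math

def norm(values):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in values))

def rollout(u, a=0.3, b=1.0, disturbance=0.2):
    """Simulate y[t+1] = a*y[t] + b*u[t] + disturbance from y[0] = 0.

    Returns y[1..T], so index t of the output lines up with the input u[t]
    that first affects it.
    """
    y, out = 0.0, []
    for u_t in u:
        y = a * y + b * u_t + disturbance
        out.append(y)
    return out

def distributed_ilc(reference, offset, iterations=30, gain=0.8):
    """Leader-follower ILC sketch (assumed setup, not the paper's design).

    Agent 0 knows the reference. Agent 1 only observes agent 0's output and
    tries to hold a constant formation offset relative to it.
    """
    horizon = len(reference)
    u0, u1 = [0.0] * horizon, [0.0] * horizon
    history = []
    for _ in range(iterations):
        # Each agent has its own repeatable disturbance (hypothetical values).
        y0 = rollout(u0, disturbance=0.2)
        y1 = rollout(u1, disturbance=-0.1)
        # Leader: error with respect to the reference.
        e0 = [r - y for r, y in zip(reference, y0)]
        # Follower: formation error with respect to its neighbor only.
        e1 = [y_lead + offset - y for y_lead, y in zip(y0, y1)]
        history.append((norm(e0), norm(e1)))
        # Feedforward update from the previous repetition's error.
        u0 = [u + gain * e for u, e in zip(u0, e0)]
        u1 = [u + gain * e for u, e in zip(u1, e1)]
    return history

reference = [math.sin(2 * math.pi * t / 50) for t in range(1, 51)]
history = distributed_ilc(reference, offset=0.5)
print("first iteration errors:", history[0])
print("last iteration errors: ", history[-1])
```

In this sketch the follower's error shrinks only after the leader's does, since the follower tracks its neighbor's output rather than the reference itself. The gain is chosen so that |1 - gain*b| < 1 for the assumed plant; other plants would need a different choice.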

@INPROCEEDINGS{hock-cdc16,
author = {Andreas Hock and Angela P. Schoellig},
title = {Distributed iterative learning control for a team of quadrotors},
booktitle = {{Proc. of the IEEE Conference on Decision and Control (CDC)}},
year = {2016},
pages = {4640--4646},
doi = {10.1109/CDC.2016.7798976},
urllink = {http://arxiv.org/abs/1603.05933},
urlvideo = {https://youtu.be/Qw598DRw6-Q},
urlvideo2 = {https://youtu.be/JppRu26eZgI},
urlslides = {../../wp-content/papercite-data/slides/hock-cdc16-slides.pdf},
abstract = {The goal of this work is to enable a team of quadrotors to learn how to accurately track a desired trajectory while holding a given formation. We solve this problem in a distributed manner, where each vehicle has access only to the information of its neighbors. The desired trajectory is available only to one (or a few) vehicles. We present a distributed iterative learning control (ILC) approach where each vehicle learns from the experience of its own and its neighbors’ previous task repetitions, and adapts its feedforward input to improve performance. Existing algorithms are extended in theory to make them more applicable to real-world experiments. In particular, we prove stability for any causal learning function with gains chosen according to a simple scalar condition. Previous proofs were restricted to a specific learning function that only depends on the tracking error derivative (D-type ILC). Our extension provides more degrees of freedom in the ILC design and, as a result, better performance can be achieved. We also show that stability is not affected by a linear dynamic coupling between neighbors. This allows us to use an additional consensus feedback controller to compensate for non-repetitive disturbances. Experiments with two quadrotors attest to the effectiveness of the proposed distributed multi-agent ILC approach. This is the first work to show distributed ILC in experiment.},
}

An upper bound on the error of alignment-based transfer learning between two linear, time-invariant, scalar systems
K. V. Raimalwala, B. A. Francis, and A. P. Schoellig
in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015, pp. 5253–5258.

Methods from machine learning have successfully been used to improve the performance of control systems in cases when accurate models of the system or the environment are not available. These methods require the use of data generated from physical trials. Transfer Learning (TL) allows for this data to come from a different, similar system. This paper studies a simplified TL scenario with the goal of understanding in which cases a simple, alignment-based transfer of data is possible and beneficial. Two linear, time-invariant (LTI), single-input, single-output systems are tasked to follow the same reference signal. A scalar, LTI transformation is applied to the output from a source system to align with the output from a target system. An upper bound on the 2-norm of the transformation error is derived for a large set of reference signals and is minimized with respect to the transformation scalar. Analysis shows that the minimized error bound is reduced for systems with poles that lie close to each other (that is, for systems with similar response times). This criterion is relaxed for systems with poles that have a larger negative real part (that is, for stable systems with fast response), meaning that poles can be further apart for the same minimized error bound. Additionally, numerical results show that using the reference signal as input to the transformation reduces the minimized bound further.
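
The alignment idea in this abstract can be sketched numerically. The example below does not reproduce the paper's analytical bound. It assumes two hypothetical first-order systems with unit DC gain, simulates their step responses with a forward-Euler scheme, and computes the least-squares scalar that aligns the source output with the target output. The function names, pole values, and step size are assumptions chosen for illustration.

```python
import math

def simulate_first_order(pole, reference, dt):
    """Forward-Euler simulation of dy/dt = -pole * y + pole * r, y(0) = 0."""
    y, out = 0.0, []
    for r in reference:
        out.append(y)
        y += dt * (-pole * y + pole * r)
    return out

def best_alignment_scalar(source, target):
    """Scalar alpha minimizing the 2-norm of (target - alpha * source)."""
    numerator = sum(s * t for s, t in zip(source, target))
    denominator = sum(s * s for s in source)
    return numerator / denominator

def alignment_error(source, target):
    """Return the optimal scalar and the residual 2-norm after alignment."""
    alpha = best_alignment_scalar(source, target)
    residual = math.sqrt(sum((t - alpha * s) ** 2 for s, t in zip(source, target)))
    return alpha, residual

dt = 0.01
reference = [1.0] * 1000                 # unit step over 10 seconds
target = simulate_first_order(2.0, reference, dt)
near_source = simulate_first_order(2.2, reference, dt)   # pole close to the target's
far_source = simulate_first_order(6.0, reference, dt)    # pole far from the target's

alpha_near, err_near = alignment_error(near_source, target)
alpha_far, err_far = alignment_error(far_source, target)
print("near source: alpha, error =", alpha_near, err_near)
print("far source:  alpha, error =", alpha_far, err_far)
```

For these assumed systems, the source whose pole lies closer to the target's leaves a smaller residual after alignment, which matches the qualitative trend stated in the abstract for the error bound.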

@INPROCEEDINGS{raimalwala-iros15,
author = {Kaizad V. Raimalwala and Bruce A. Francis and Angela P. Schoellig},
title = {An upper bound on the error of alignment-based transfer learning between two linear, time-invariant, scalar systems},
booktitle = {{Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}},
pages = {5253--5258},
year = {2015},
doi = {10.1109/IROS.2015.7354118},
urllink = {http://hdl.handle.net/1807/69365},
abstract = {Methods from machine learning have successfully been used to improve the performance of control systems in cases when accurate models of the system or the environment are not available. These methods require the use of data generated from physical trials. Transfer Learning (TL) allows for this data to come from a different, similar system. This paper studies a simplified TL scenario with the goal of understanding in which cases a simple, alignment-based transfer of data is possible and beneficial. Two linear, time-invariant (LTI), single-input, single-output systems are tasked to follow the same reference signal. A scalar, LTI transformation is applied to the output from a source system to align with the output from a target system. An upper bound on the 2-norm of the transformation error is derived for a large set of reference signals and is minimized with respect to the transformation scalar. Analysis shows that the minimized error bound is reduced for systems with poles that lie close to each other (that is, for systems with similar response times). This criterion is relaxed for systems with poles that have a larger negative real part (that is, for stable systems with fast response), meaning that poles can be further apart for the same minimized error bound. Additionally, numerical results show that using the reference signal as input to the transformation reduces the minimized bound further.}
}

Limited benefit of joint estimation in multi-agent iterative learning
A. P. Schoellig, J. Alonso-Mora, and R. D’Andrea
Asian Journal of Control, vol. 14, iss. 3, pp. 613–623, 2012.

This paper studies iterative learning control (ILC) in a multi-agent framework, wherein a group of agents simultaneously and repeatedly perform the same task. Assuming similarity between the agents, we investigate whether exchanging information between the agents improves an individual’s learning performance. That is, does an individual agent benefit from the experience of the other agents? We consider the multi-agent iterative learning problem as a two-step process of: first, estimating the repetitive disturbance of each agent; and second, correcting for it. We present a comparison of an agent’s disturbance estimate in the case of (I) independent estimation, where each agent has access only to its own measurement, and (II) joint estimation, where information of all agents is globally accessible. When the agents are identical and noise comes from measurement only, joint estimation yields a noticeable improvement in performance. However, when process noise is encountered or when the agents have an individual disturbance component, the benefit of joint estimation is negligible.
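
The favorable case in this abstract (identical agents, measurement noise only) can be illustrated with a small Monte Carlo sketch. It uses plain averaging as a stand-in estimator, which is an assumption made here for brevity and not the estimator used in the paper. The disturbance value, noise level, and function name are likewise hypothetical.

```python
import random

def mean_squared_errors(num_agents=4, iterations=5, trials=2000,
                        noise_std=1.0, seed=0):
    """Compare independent and joint averaging for identical agents.

    Every agent measures the same repetitive disturbance corrupted by
    independent measurement noise. Returns (mse_independent, mse_joint)
    for agent 0's disturbance estimate.
    """
    rng = random.Random(seed)
    disturbance = 0.7  # shared repetitive disturbance (hypothetical value)
    se_independent = 0.0
    se_joint = 0.0
    for _ in range(trials):
        measurements = [
            [disturbance + rng.gauss(0.0, noise_std) for _ in range(iterations)]
            for _ in range(num_agents)
        ]
        # Independent estimation: agent 0 averages only its own measurements.
        independent = sum(measurements[0]) / iterations
        # Joint estimation: all agents' measurements are pooled.
        joint = sum(sum(row) for row in measurements) / (num_agents * iterations)
        se_independent += (independent - disturbance) ** 2
        se_joint += (joint - disturbance) ** 2
    return se_independent / trials, se_joint / trials

mse_independent, mse_joint = mean_squared_errors()
print("independent MSE:", mse_independent)
print("joint MSE:      ", mse_joint)
```

Under these assumptions, pooling reduces the estimate's variance by about a factor of `num_agents` in expectation. The sketch does not model process noise or individual disturbance components, the cases in which the abstract reports that the benefit becomes negligible.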

@ARTICLE{schoellig-ajc12,
author = {Angela P. Schoellig and Javier Alonso-Mora and Raffaello D'Andrea},
title = {Limited benefit of joint estimation in multi-agent iterative learning},
journal = {{Asian Journal of Control}},
volume = {14},
number = {3},
pages = {613--623},
year = {2012},
doi = {10.1002/asjc.398},
urldata={../../wp-content/papercite-data/data/schoellig-ajc12-files.zip},
urlslides={../../wp-content/papercite-data/slides/schoellig-ajc12-slides.pdf},
abstract = {This paper studies iterative learning control (ILC) in a multi-agent framework, wherein a group of agents simultaneously and repeatedly perform the same task. Assuming similarity between the agents, we investigate whether exchanging information between the agents improves an individual's learning performance. That is, does an individual agent benefit from the experience of the other agents? We consider the multi-agent iterative learning problem as a two-step process of: first, estimating the repetitive disturbance of each agent; and second, correcting for it. We present a comparison of an agent's disturbance estimate in the case of (I) independent estimation, where each agent has access only to its own measurement, and (II) joint estimation, where information of all agents is globally accessible. When the agents are identical and noise comes from measurement only, joint estimation yields a noticeable improvement in performance. However, when process noise is encountered or when the agents have an individual disturbance component, the benefit of joint estimation is negligible.}
}

Sensitivity of joint estimation in multi-agent iterative learning control
A. P. Schoellig and R. D’Andrea
in Proc. of the IFAC (International Federation of Automatic Control) World Congress, 2011, pp. 1204–1212.

We consider a group of agents that simultaneously learn the same task, and revisit a previously developed algorithm, where agents share their information and learn jointly. We have already shown that, as compared to an independent learning model that disregards the information of the other agents, and when assuming similarity between the agents, a joint algorithm improves the learning performance of an individual agent. We now revisit the joint learning algorithm to determine its sensitivity to the underlying assumption of similarity between agents. We note that an incorrect assumption about the agents’ degree of similarity degrades the performance of the joint learning scheme. The degradation is particularly acute if we assume that the agents are more similar than they are in reality; in this case, a joint learning scheme can result in a poorer performance than the independent learning algorithm. In the worst case (when we assume that the agents are identical, but they are, in reality, not) the joint learning does not even converge to the correct value. We conclude that, when applying the joint algorithm, it is crucial not to overestimate the similarity of the agents; otherwise, a learning scheme that is independent of the similarity assumption is preferable.
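
The worst case named in this abstract, treating agents as identical when they are not, can be sketched as follows. The example assumes three agents with different constant disturbances and again uses plain averaging instead of the paper's joint learning algorithm. The disturbance values and the function name are hypothetical.

```python
import random

def final_estimates(disturbances, iterations=5000, noise_std=1.0, seed=1):
    """Estimate agent 0's disturbance with and without pooling.

    The pooled estimator wrongly assumes that all agents share one
    disturbance, so it averages every agent's measurements together.
    Returns (independent_estimate, joint_estimate).
    """
    rng = random.Random(seed)
    own_sum = 0.0
    pooled_sum = 0.0
    for _ in range(iterations):
        measurements = [d + rng.gauss(0.0, noise_std) for d in disturbances]
        own_sum += measurements[0]
        pooled_sum += sum(measurements) / len(measurements)
    return own_sum / iterations, pooled_sum / iterations

true_disturbances = [1.0, 1.5, 2.0]  # agents are similar but not identical
independent_estimate, joint_estimate = final_estimates(true_disturbances)
print("agent 0 true disturbance:", true_disturbances[0])
print("independent estimate:    ", independent_estimate)
print("joint estimate:          ", joint_estimate)
```

In this setup the pooled estimate settles near the average of all agents' disturbances and not near agent 0's own value, so more repetitions do not remove the bias. The independent estimate is noisier early on but is not affected by the incorrect similarity assumption.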

@INPROCEEDINGS{schoellig-ifac11,
author = {Angela P. Schoellig and Raffaello D'Andrea},
title = {Sensitivity of joint estimation in multi-agent iterative learning control},
booktitle = {{Proc. of the IFAC (International Federation of Automatic Control) World Congress}},
pages = {1204--1212},
year = {2011},
doi = {10.3182/20110828-6-IT-1002.03687},
urlslides = {../../wp-content/papercite-data/slides/schoellig-ifac11-slides.pdf},
urldata = {../../wp-content/papercite-data/data/schoellig-ifac11-files.zip},
abstract = {We consider a group of agents that simultaneously learn the same task, and revisit a previously developed algorithm, where agents share their information and learn jointly. We have already shown that, as compared to an independent learning model that disregards the information of the other agents, and when assuming similarity between the agents, a joint algorithm improves the learning performance of an individual agent. We now revisit the joint learning algorithm to determine its sensitivity to the underlying assumption of similarity between agents. We note that an incorrect assumption about the agents' degree of similarity degrades the performance of the joint learning scheme. The degradation is particularly acute if we assume that the agents are more similar than they are in reality; in this case, a joint learning scheme can result in a poorer performance than the independent learning algorithm. In the worst case (when we assume that the agents are identical, but they are, in reality, not) the joint learning does not even converge to the correct value. We conclude that, when applying the joint algorithm, it is crucial not to overestimate the similarity of the agents; otherwise, a learning scheme that is independent of the similarity assumption is preferable.}
}

University of Toronto Institute for Aerospace Studies