Vision-Based Autonomous Flight

We use vision to achieve robot localization and navigation without relying on external infrastructure. Our ground robot experiments localize using 3D vision sensors (stereo cameras or lidars) and build on the visual teach-and-repeat algorithm originally developed by Prof. Tim Barfoot’s group. We also work on vision-based flight of fixed-wing and quadrotor vehicles equipped with a camera. For example, one project explores visual navigation for the emergency return of UAVs in case of GPS or communication failure. We leverage the visual teach-and-repeat algorithm to let a vehicle autonomously teach itself a route with a monocular or stereo camera under GPS guidance and then re-follow that route using vision alone. To increase flexibility on the return path and not bind the vehicle to the self-taught path, we also investigate using other map sources such as satellite data.

 

Related Publications

Control-barrier-aided teleoperation with visual-inertial SLAM for safe MAV navigation in complex environments
S. Zhou, S. Papatheodorou, S. Leutenegger, and A. P. Schoellig
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2024. Accepted.
[View BibTeX] [View Abstract] [Download PDF]

In this paper, we consider a Micro Aerial Vehicle (MAV) system teleoperated by a non-expert and introduce a perceptive safety filter that leverages Control Barrier Functions (CBFs) in conjunction with Visual-Inertial Simultaneous Localization and Mapping (VI-SLAM) and dense 3D occupancy mapping to guarantee safe navigation in complex and unstructured environments. Our system relies solely on onboard IMU measurements, stereo infrared images, and depth images and autonomously corrects teleoperated inputs when they are deemed unsafe. We define a point in 3D space as unsafe if it satisfies either of two conditions: (i) it is occupied by an obstacle, or (ii) it remains unmapped. At each time step, an occupancy map of the environment is updated by the VI-SLAM by fusing the onboard measurements, and a CBF is constructed to parameterize the (un)safe region in the 3D space. Given the CBF and state feedback from the VI-SLAM module, a safety filter computes a certified reference that best matches the teleoperation input while satisfying the safety constraint encoded by the CBF. In contrast to existing perception-based safe control frameworks, we directly close the perception-action loop and demonstrate the full capability of safe control in combination with real-time VI-SLAM without any external infrastructure or prior knowledge of the environment. We verify the efficacy of the perceptive safety filter in real-time MAV experiments using exclusively onboard sensing and computation and show that the teleoperated MAV is able to safely navigate through unknown environments despite arbitrary inputs sent by the teleoperator.

@inproceedings{zhou-icra24,
author={Siqi Zhou and Sotiris Papatheodorou and Stefan Leutenegger and Angela P. Schoellig},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
title={Control-Barrier-Aided Teleoperation with Visual-Inertial {SLAM} for Safe {MAV} Navigation in Complex Environments},
year={2024},
note={Accepted},
abstract = {In this paper, we consider a Micro Aerial Vehicle (MAV) system teleoperated by a non-expert and introduce a perceptive safety filter that leverages Control Barrier Functions (CBFs) in conjunction with Visual-Inertial Simultaneous Localization and Mapping (VI-SLAM) and dense 3D occupancy mapping to guarantee safe navigation in complex and unstructured environments. Our system relies solely on onboard IMU measurements, stereo infrared images, and depth images and autonomously corrects teleoperated inputs when they are deemed unsafe. We define a point in 3D space as unsafe if it satisfies either of two conditions: (i) it is occupied by an obstacle, or (ii) it remains unmapped. At each time step, an occupancy map of the environment is updated by the VI-SLAM by fusing the onboard measurements, and a CBF is constructed to parameterize the (un)safe region in the 3D space. Given the CBF and state feedback from the VI-SLAM module, a safety filter computes a certified reference that best matches the teleoperation input while satisfying the safety constraint encoded by the CBF. In contrast to existing perception-based safe control frameworks, we directly close the perception-action loop and demonstrate the full capability of safe control in combination with real-time VI-SLAM without any external infrastructure or prior knowledge of the environment. We verify the efficacy of the perceptive safety filter in real-time MAV experiments using exclusively onboard sensing and computation and show that the teleoperated MAV is able to safely navigate through unknown environments despite arbitrary inputs sent by the teleoperator.}
}
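
For readers curious about the mechanics of a CBF safety filter, the fragment below is a minimal Python sketch, not the implementation from the paper above: it assumes a single-integrator MAV model, a known list of obstacle centres, and hand-picked values for the safety radius and gain, and it projects the teleoperated velocity command onto the safe set in closed form rather than through the occupancy-map-based CBF and VI-SLAM pipeline described in the abstract.

import numpy as np

def cbf_safety_filter(x, u_teleop, obstacles, r_safe=0.5, alpha=1.0):
    """Project a teleoperated velocity command onto the safe set.

    Closed-form solution of
        min ||u - u_teleop||^2   s.t.   grad_h(x) . u >= -alpha * h(x)
    with h(x) = ||x - x_obs||^2 - r_safe^2 for the nearest obstacle and
    single-integrator dynamics x_dot = u (an assumption of this sketch).
    """
    d = x - obstacles                       # offsets to all obstacle centres
    dists2 = np.sum(d * d, axis=1)
    i = np.argmin(dists2)                   # nearest obstacle dominates
    h = dists2[i] - r_safe ** 2             # CBF value (> 0 means safe)
    grad_h = 2.0 * d[i]                     # gradient of h w.r.t. x
    slack = grad_h @ u_teleop + alpha * h   # CBF constraint residual
    if slack >= 0.0:
        return u_teleop                     # teleop input is already safe
    # smallest correction (along grad_h) that restores the CBF constraint
    return u_teleop - slack / (grad_h @ grad_h) * grad_h

if __name__ == "__main__":
    x = np.array([0.0, 0.0, 1.0])                # current MAV position
    obstacles = np.array([[1.0, 0.0, 1.0]])      # one obstacle 1 m ahead
    u_teleop = np.array([2.0, 0.0, 0.0])         # operator flies straight at it
    print(cbf_safety_filter(x, u_teleop, obstacles))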

[DOI] Fly out the window: exploiting discrete-time flatness for fast vision-based multirotor flight
M. Greeff, S. Zhou, and A. P. Schoellig
IEEE Robotics and Automation Letters, vol. 7, iss. 2, pp. 5023–5030, 2022.
[View BibTeX] [View Abstract] [Download PDF] [More Information]

Recent work has demonstrated fast, agile flight using only vision as a position sensor and no GPS. Current feedback controllers for fast vision-based flight typically rely on a full-state estimate, including position, velocity and acceleration. An accurate full-state estimate is often challenging to obtain due to noisy IMU measurements, infrequent position updates from the vision system, and an imperfect motion model used to obtain high-rate state estimates required by the controller. In this work, we present an alternative control design that bypasses the need for a full-state estimate by exploiting discrete-time flatness, a structural property of the underlying vehicle dynamics. First, we show that the Euler discretization of the multirotor dynamics is discrete-time flat. This allows us to design a predictive controller using only a window of inputs and outputs, the latter consisting of position and yaw estimates. We highlight in simulation that our approach outperforms controllers that rely on an incorrect full-state estimate. We perform extensive outdoor multirotor flight experiments and demonstrate reliable vision-based navigation. In these experiments, our discrete-time flatness-based controller achieves speeds up to 10 m/s and significantly outperforms similar controllers that hinge on full-state estimation, achieving up to 80% path error reduction.

@article{greeff-ral22,
author = {Melissa Greeff and SiQi Zhou and Angela P. Schoellig},
title = {Fly Out The Window: Exploiting Discrete-Time Flatness for Fast Vision-Based Multirotor Flight},
journal = {{IEEE Robotics and Automation Letters}},
year = {2022},
volume = {7},
number = {2},
pages = {5023--5030},
doi = {10.1109/LRA.2022.3154008},
urllink = {https://ieeexplore.ieee.org/document/9720919},
abstract = {Recent work has demonstrated fast, agile flight using only vision as a position sensor and no GPS. Current feedback controllers for fast vision-based flight typically rely on a full-state estimate, including position, velocity and acceleration. An accurate full-state estimate is often challenging to obtain due to noisy IMU measurements, infrequent position updates from the vision system, and an imperfect motion model used to obtain high-rate state estimates required by the controller. In this work, we present an alternative control design that bypasses the need for a full-state estimate by exploiting discrete-time flatness, a structural property of the underlying vehicle dynamics. First, we show that the Euler discretization of the multirotor dynamics is discrete-time flat. This allows us to design a predictive controller using only a window of inputs and outputs, the latter consisting of position and yaw estimates. We highlight in simulation that our approach outperforms controllers that rely on an incorrect full-state estimate. We perform extensive outdoor multirotor flight experiments and demonstrate reliable vision-based navigation. In these experiments, our discrete-time flatness-based controller achieves speeds up to 10 m/s and significantly outperforms similar controllers that hinge on full-state estimation, achieving up to 80\% path error reduction.}
}
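
The sketch below illustrates the discrete-time flatness idea on an Euler-discretised double integrator rather than the full multirotor dynamics treated in the paper: three consecutive position outputs determine the velocity and the input exactly, so a simple one-step law can track a reference using only a short window of past outputs and inputs and no state estimator. The sample time DT and the scalar, single-axis setting are illustrative assumptions.

DT = 0.02  # illustrative sample time [s]

def flat_state_and_input(p_km1, p_k, p_k1):
    """Recover velocity and input from three consecutive outputs of the
    Euler-discretised double integrator
        p[k+1] = p[k] + DT * v[k],    v[k+1] = v[k] + DT * u[k],
    which is exactly the discrete-time flatness property."""
    v_km1 = (p_k - p_km1) / DT
    u_km1 = (p_k1 - 2.0 * p_k + p_km1) / DT ** 2
    return v_km1, u_km1

def flat_controller(p_km1, p_k, u_km1, p_ref_k2):
    """Choose u[k] so the output two steps ahead lands on the reference,
    using only a short window of past outputs and inputs (no estimator)."""
    p_k1 = 2.0 * p_k - p_km1 + DT ** 2 * u_km1   # output one step ahead
    return (p_ref_k2 - 2.0 * p_k1 + p_k) / DT ** 2

if __name__ == "__main__":
    p_prev = p = v = u_prev = 0.0
    for k in range(6):                            # track the ramp p_ref = 0.1 * k
        u = flat_controller(p_prev, p, u_prev, p_ref_k2=0.1 * (k + 2))
        p_prev, p, v, u_prev = p, p + DT * v, v + DT * u, u
        print(k + 1, round(p, 3))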

A perception-aware flatness-based model predictive controller for fast vision-based multirotor flight
M. Greeff, T. D. Barfoot, and A. P. Schoellig
in Proc. of the International Federation of Automatic Control (IFAC) World Congress, 2020, pp. 9412–9419.
[View BibTeX] [View Abstract] [Download PDF] [View Video]

Despite the push toward fast, reliable vision-based multirotor flight, most vision-based navigation systems still rely on controllers that are perception-agnostic. Given that these controllers ignore their effect on the system’s localisation capabilities, they can produce an action that allows vision-based localisation (and consequently navigation) to fail. In this paper, we present a perception-aware flatness-based model predictive controller (MPC) that accounts for its effect on visual localisation. To achieve perception awareness, we first develop a simple geometric model that uses over 12 km of flight data from two different environments (urban and rural) to associate visual landmarks with a probability of being successfully matched. In order to ensure localisation, we integrate this model as a chance constraint in our MPC such that we are probabilistically guaranteed that the number of successfully matched visual landmarks exceeds a minimum threshold. We show how to simplify the chance constraint to a nonlinear, deterministic constraint on the position of the multirotor. With desired speeds of 10 m/s, we demonstrate in simulation (based on real-world perception data) how our proposed perception-aware MPC is able to achieve faster flight while guaranteeing localisation compared to similar perception-agnostic controllers. We illustrate how our perception-aware MPC adapts the path constraint along the path based on the perception model by accounting for camera orientation, path error and location of the visual landmarks. The result is that repeating the same geometric path but with the camera facing in opposite directions can lead to different optimal paths flown.

@INPROCEEDINGS{greeff-ifac20,
author = {Melissa Greeff and Timothy D. Barfoot and Angela P. Schoellig},
title = {A Perception-Aware Flatness-Based Model Predictive Controller for Fast Vision-Based Multirotor Flight},
booktitle = {{Proc. of the International Federation of Automatic Control (IFAC) World Congress}},
year = {2020},
volume = {53},
number = {2},
pages = {9412--9419},
urlvideo = {https://youtu.be/aBEce5aWfvk},
abstract = {Despite the push toward fast, reliable vision-based multirotor flight, most vision-based navigation systems still rely on controllers that are perception-agnostic. Given that these controllers ignore their effect on the system’s localisation capabilities, they can produce an action that allows vision-based localisation (and consequently navigation) to fail. In this paper, we present a perception-aware flatness-based model predictive controller (MPC) that accounts for its effect on visual localisation. To achieve perception awareness, we first develop a simple geometric model that uses over 12 km of flight data from two different environments (urban and rural) to associate visual landmarks with a probability of being successfully matched. In order to ensure localisation, we integrate this model as a chance constraint in our MPC such that we are probabilistically guaranteed that the number of successfully matched visual landmarks exceeds a minimum threshold. We show how to simplify the chance constraint to a nonlinear, deterministic constraint on the position of the multirotor. With desired speeds of 10 m/s, we demonstrate in simulation (based on real-world perception data) how our proposed perception-aware MPC is able to achieve faster flight while guaranteeing localisation compared to similar perception-agnostic controllers. We illustrate how our perception-aware MPC adapts the path constraint along the path based on the perception model by accounting for camera orientation, path error and location of the visual landmarks. The result is that repeating the same geometric path but with the camera facing in opposite directions can lead to different optimal paths flown.},
}
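
As a rough illustration of the perception model used above, the following hypothetical stand-in assigns each mapped landmark a match probability that decays with the viewpoint change between the teach pass and a candidate camera position, and accepts a position only if the expected number of matches clears a threshold. The exponential decay, its constant k_ang, and the threshold n_min are assumptions for illustration, not the model identified from the 12 km of flight data in the paper.

import numpy as np

def match_probability(landmark, cam_pos, teach_pos, k_ang=2.0):
    """Probability that a landmark mapped from teach_pos is re-matched from
    cam_pos, modelled (as an assumption) as decaying with the viewpoint
    change seen from the landmark."""
    v_teach = landmark - teach_pos
    v_live = landmark - cam_pos
    cos_a = v_teach @ v_live / (np.linalg.norm(v_teach) * np.linalg.norm(v_live))
    angle = np.arccos(np.clip(cos_a, -1.0, 1.0))   # viewpoint change [rad]
    return float(np.exp(-k_ang * angle))

def expected_matches(landmarks, cam_pos, teach_pos):
    return sum(match_probability(l, cam_pos, teach_pos) for l in landmarks)

def position_is_localizable(landmarks, cam_pos, teach_pos, n_min=30):
    """Deterministic surrogate for the chance constraint: require the
    expected number of matched landmarks to reach n_min."""
    return expected_matches(landmarks, cam_pos, teach_pos) >= n_min

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # landmarks scattered on the ground, teach pass 30 m above them
    landmarks = [np.array([x, y, 0.0]) for x, y in rng.uniform(-30.0, 30.0, (60, 2))]
    teach_pos = np.array([0.0, 0.0, 30.0])
    for lateral_offset in (0.0, 5.0, 40.0):
        cam_pos = teach_pos + np.array([lateral_offset, 0.0, 0.0])
        print(lateral_offset, position_is_localizable(landmarks, cam_pos, teach_pos))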

Visual localization with Google Earth images for robust global pose estimation of UAVs
B. Patel, T. D. Barfoot, and A. P. Schoellig
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2020, pp. 6491–6497.
[View BibTeX] [View Abstract] [Download PDF] [View Video]

We estimate the global pose of a multirotor UAV by visually localizing images captured during a flight with Google Earth images pre-rendered from known poses. We metrically localize real images with georeferenced rendered images using a dense mutual information technique to allow accurate global pose estimation in outdoor GPS-denied environments. We show the ability to consistently localize throughout a sunny summer day despite major lighting changes while demonstrating that a typical feature-based localizer struggles under the same conditions. Successful image registrations are used as measurements in a filtering framework to apply corrections to the pose estimated by a gimballed visual odometry pipeline. We achieve less than 1 metre and 1 degree RMSE on a 303 metre flight and less than 3 metres and 3 degrees RMSE on six 1132 metre flights as low as 36 metres above ground level conducted at different times of the day from sunrise to sunset.

@INPROCEEDINGS{patel-icra20,
title = {Visual Localization with {Google Earth} Images for Robust Global Pose Estimation of {UAV}s},
author = {Bhavit Patel and Timothy D. Barfoot and Angela P. Schoellig},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
year = {2020},
pages = {6491--6497},
urlvideo = {https://tiny.cc/GElocalization},
abstract = {We estimate the global pose of a multirotor UAV by visually localizing images captured during a flight with Google Earth images pre-rendered from known poses. We metrically localize real images with georeferenced rendered images using a dense mutual information technique to allow accurate global pose estimation in outdoor GPS-denied environments. We show the ability to consistently localize throughout a sunny summer day despite major lighting changes while demonstrating that a typical feature-based localizer struggles under the same conditions. Successful image registrations are used as measurements in a filtering framework to apply corrections to the pose estimated by a gimballed visual odometry pipeline. We achieve less than 1 metre and 1 degree RMSE on a 303 metre flight and less than 3 metres and 3 degrees RMSE on six 1132 metre flights as low as 36 metres above ground level conducted at different times of the day from sunrise to sunset.}
}
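
The core of the dense mutual-information registration mentioned above can be sketched in a few lines of NumPy: estimate the mutual information of two grayscale images from their joint intensity histogram and search a small grid of pixel shifts for the best alignment. This is only a toy translational search; the paper registers live images against georeferenced Google Earth renders and fuses the result in a filtering framework, none of which is reproduced here.

import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """Mutual information of two same-size grayscale images, estimated from
    their joint intensity histogram."""
    hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p_xy = hist / hist.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))

def best_shift(live, rendered, max_shift=10):
    """Search small pixel shifts of the rendered image and keep the one that
    maximises mutual information with the live image."""
    best = (0, 0, -np.inf)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            mi = mutual_information(live, np.roll(rendered, (dy, dx), axis=(0, 1)))
            if mi > best[2]:
                best = (dy, dx, mi)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rendered = rng.random((120, 160))                       # stand-in for a map render
    live = np.roll(rendered, (3, -4), axis=(0, 1)) + 0.05 * rng.random((120, 160))
    print(best_shift(live, rendered))                       # should recover the (3, -4) shift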

Point me in the right direction: improving visual localization on UAVs with active gimballed camera pointing
B. Patel, M. Warren, and A. P. Schoellig
in Proc. of the Conference on Computer and Robot Vision (CRV), 2019, pp. 105–112. Best paper award, robot vision.
[View BibTeX] [View Abstract] [Download PDF]

Robust autonomous navigation of multirotor UAVs in GPS-denied environments is critical to enable their safe operation in many applications such as surveillance and reconnaissance, inspection, and delivery services. In this paper, we use a gimballed stereo camera for localization and demonstrate how the localization performance and robustness can be improved by actively controlling the camera’s viewpoint. For an autonomous route-following task based on a recorded map, multiple gimbal pointing strategies are compared: off-the-shelf passive stabilization, active stabilization, minimization of viewpoint orientation error, and pointing the camera optical axis at the centroid of previously observed landmarks. We demonstrate improved localization performance using an active gimbal-stabilized camera in multiple outdoor flight experiments on routes up to 315 m, and with 6-25 m altitude variations. Scenarios are shown where a static camera frequently fails to localize while a gimballed camera attenuates perspective errors to retain localization. We demonstrate that our orientation matching and centroid pointing strategies provide the best performance; enabling localization despite increasing velocity discrepancies between the map-generation flight and the live flight from 3-9 m/s, and 8 m path offsets.

@INPROCEEDINGS{patel-crv19,
author = {Bhavit Patel and Michael Warren and Angela P. Schoellig},
title = {Point Me In The Right Direction: Improving Visual Localization on {UAV}s with Active Gimballed Camera Pointing},
booktitle = {{Proc. of the Conference on Computer and Robot Vision (CRV)}},
year = {2019},
pages = {105--112},
note = {Best paper award, robot vision},
abstract = {Robust autonomous navigation of multirotor UAVs in GPS-denied environments is critical to enable their safe operation in many applications such as surveillance and reconnaissance, inspection, and delivery services. In this paper, we use a gimballed stereo camera for localization and demonstrate how the localization performance and robustness can be improved by actively controlling the camera’s viewpoint. For an autonomous route-following task based on a recorded map, multiple gimbal pointing strategies are compared: off-the-shelf passive stabilization, active stabilization, minimization of viewpoint orientation error, and pointing the camera optical axis at the centroid of previously observed landmarks. We demonstrate improved localization performance using an active gimbal-stabilized camera in multiple outdoor flight experiments on routes up to 315 m, and with 6-25 m altitude variations. Scenarios are shown where a static camera frequently fails to localize while a gimballed camera attenuates perspective errors to retain localization. We demonstrate that our orientation matching and centroid pointing strategies provide the best performance; enabling localization despite increasing velocity discrepancies between the map-generation flight and the live flight from 3-9 m/s, and 8 m path offsets.},
}
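
The landmark-centroid pointing strategy compared above reduces to simple geometry; the sketch below computes gimbal pan and tilt angles that align the camera optical axis with the centroid of a set of landmark positions expressed in the body frame. The frame convention (x forward, y right, z down) and the absence of gimbal limits are simplifying assumptions.

import numpy as np

def centroid_pointing_angles(landmarks_body):
    """Pan/tilt angles [rad] that align the camera optical axis with the
    centroid of the given landmark positions in the body frame
    (x forward, y right, z down)."""
    c = np.mean(landmarks_body, axis=0)              # landmark centroid
    pan = np.arctan2(c[1], c[0])                     # rotation about body z
    tilt = np.arctan2(c[2], np.hypot(c[0], c[1]))    # positive tilts the camera down
    return pan, tilt

if __name__ == "__main__":
    # previously observed landmarks ahead, to the right and below the vehicle
    landmarks = np.array([[12.0, 4.0, 8.0], [15.0, 6.0, 7.0], [10.0, 5.0, 9.0]])
    pan, tilt = centroid_pointing_angles(landmarks)
    print(np.degrees(pan), np.degrees(tilt))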

Flatness-based model predictive control for quadrotor trajectory tracking
M. Greeff and A. P. Schoellig
in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 6740–6745.
[View BibTeX] [View Abstract] [Download PDF] [Download Additional Material] [More Information]

The use of model predictive control for quadrotor applications requires balancing trajectory tracking performance and constraint satisfaction with fast computational demands. This paper proposes a Flatness-based Model Predictive Control (FMPC) approach that can be applied to quadrotors, and more generally, differentially flat nonlinear systems. Our proposed FMPC couples feedback model predictive control with feedforward linearization. The proposed approach has the computational advantage that, similar to linear model predictive control, it only requires solving a convex quadratic program instead of a nonlinear program. However, unlike linear model predictive control, we still account for the nonlinearity in the model through the use of an inverse term. In simulation, we demonstrate improved robustness over approaches that couple model predictive control with feedback linearization. In experiments using quadrotor vehicles, we demonstrate improved trajectory tracking compared to classical linear and nonlinear model predictive controllers.

@INPROCEEDINGS{greeff-iros18,
author={Melissa Greeff and Angela P. Schoellig},
title={Flatness-based Model Predictive Control for Quadrotor Trajectory Tracking},
booktitle={{Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}},
year={2018},
urllink={http://www.dynsyslab.org/wp-content/papercite-data/pdf/greeff-iros18.pdf},
urldata = {../../wp-content/papercite-data/data/greeff-icra18-supplementary.pdf},
pages={6740--6745},
abstract={The use of model predictive control for quadrotor applications requires balancing trajectory tracking performance and constraint satisfaction with fast computational demands. This paper proposes a Flatness-based Model Predictive Control (FMPC) approach that can be applied to quadrotors, and more generally, differentially flat nonlinear systems. Our proposed FMPC couples feedback model predictive control with feedforward linearization. The proposed approach has the computational advantage that, similar to linear model predictive control, it only requires solving a convex quadratic program instead of a nonlinear program. However, unlike linear model predictive control, we still account for the nonlinearity in the model through the use of an inverse term. In simulation, we demonstrate improved robustness over approaches that couple model predictive control with feedback linearization. In experiments using quadrotor vehicles, we demonstrate improved trajectory tracking compared to classical linear and nonlinear model predictive controllers.},
}
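
A key ingredient of the FMPC above is the inverse term that maps the flat-space solution back to vehicle inputs. The sketch below shows one common form of that map for an idealised quadrotor (thrust along the body z-axis, no drag): a desired flat-output acceleration is converted to a collective thrust and a desired attitude. The unit mass, the yaw handling, and the omission of the MPC itself are simplifying assumptions, not the paper's formulation.

import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])   # world z-up convention (assumption)

def flat_inverse(acc_des, yaw_des=0.0, mass=1.0):
    """Collective thrust [N] and desired attitude that realise acc_des for an
    idealised quadrotor (thrust along body z, no drag)."""
    f_vec = mass * (acc_des - GRAVITY)        # required thrust vector
    thrust = np.linalg.norm(f_vec)
    z_body = f_vec / thrust                   # desired body z-axis
    # complete the attitude from the desired yaw direction
    x_c = np.array([np.cos(yaw_des), np.sin(yaw_des), 0.0])
    y_body = np.cross(z_body, x_c)
    y_body /= np.linalg.norm(y_body)
    x_body = np.cross(y_body, z_body)
    R_des = np.column_stack((x_body, y_body, z_body))
    return thrust, R_des

if __name__ == "__main__":
    # e.g. the first acceleration returned by a linear MPC solved in flat space
    thrust, R_des = flat_inverse(np.array([1.0, 0.0, 0.2]))
    print(thrust)
    print(R_des)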

Towards visual teach & repeat for GPS-denied flight of a fixed-wing UAV
M. Warren, M. Paton, K. MacTavish, A. P. Schoellig, and T. D. Barfoot
in Proc. of the 11th Conference on Field and Service Robotics (FSR), 2017, pp. 481–498.
[View BibTeX] [View Abstract] [Download PDF] [More Information]

Most consumer and industrial Unmanned Aerial Vehicles (UAVs) rely on combining Global Navigation Satellite Systems (GNSS) with barometric and inertial sensors for outdoor operation. As a consequence these vehicles are prone to a variety of potential navigation failures such as jamming and environmental interference. This usually limits their legal activities to locations of low population density within line-of-sight of a human pilot to reduce risk of injury and damage. Autonomous route-following methods such as Visual Teach & Repeat (VT&R) have enabled long-range navigational autonomy for ground robots without the need for reliance on external infrastructure or an accurate global position estimate. In this paper, we demonstrate the localisation component of (VT&R) outdoors on a fixed-wing UAV as a method of backup navigation in case of primary sensor failure. We modify the localisation engine of (VT&R) to work with a single downward facing camera on a UAV to enable safe navigation under the guidance of vision alone. We evaluate the method using visual data from the UAV flying a 1200 m trajectory (at altitude of 80 m) several times during a multi-day period, covering a total distance of 10.8 km using the algorithm. We examine the localisation performance for both small (single flight) and large (inter-day) temporal differences from teach to repeat. Through these experiments, we demonstrate the ability to successfully localise the aircraft on a self-taught route using vision alone without the need for additional sensing or infrastructure.

@INPROCEEDINGS{warren-fsr17,
author={Michael Warren and Michael Paton and Kirk MacTavish and Angela P. Schoellig and Tim D. Barfoot},
title={Towards visual teach \& repeat for {GPS}-denied flight of a fixed-wing {UAV}},
booktitle={{Proc. of the 11th Conference on Field and Service Robotics (FSR)}},
year={2017},
pages={481--498},
urllink={https://link.springer.com/chapter/10.1007/978-3-319-67361-5_31},
abstract={Most consumer and industrial Unmanned Aerial Vehicles (UAVs) rely on combining Global Navigation Satellite Systems (GNSS) with barometric and inertial sensors for outdoor operation. As a consequence these vehicles are prone to a variety of potential navigation failures such as jamming and environmental interference. This usually limits their legal activities to locations of low population density within line-of-sight of a human pilot to reduce risk of injury and damage. Autonomous route-following methods such as Visual Teach & Repeat (VT&R) have enabled long-range navigational autonomy for ground robots without the need for reliance on external infrastructure or an accurate global position estimate. In this paper, we demonstrate the localisation component of (VT&R) outdoors on a fixed-wing UAV as a method of backup navigation in case of primary sensor failure. We modify the localisation engine of (VT&R) to work with a single downward facing camera on a UAV to enable safe navigation under the guidance of vision alone. We evaluate the method using visual data from the UAV flying a 1200 m trajectory (at altitude of 80 m) several times during a multi-day period, covering a total distance of 10.8 km using the algorithm. We examine the localisation performance for both small (single flight) and large (inter-day) temporal differences from teach to repeat. Through these experiments, we demonstrate the ability to successfully localise the aircraft on a self-taught route using vision alone without the need for additional sensing or infrastructure.},
}
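
Conceptually, the teach-and-repeat bookkeeping behind the experiments above boils down to storing a chain of keyframes along the taught route and, on repeat, localising against the keyframe nearest the current route progress. The minimal sketch below shows only that bookkeeping; the visual matching and the VT&R localisation engine itself are not reproduced, and the 10 m keyframe spacing is an arbitrary choice for the example.

from dataclasses import dataclass

import numpy as np

@dataclass
class Keyframe:
    route_distance: float     # metres along the taught route
    gps_position: np.ndarray  # recorded on the GPS-guided teach pass
    # the visual landmarks stored per keyframe are omitted in this sketch

def nearest_keyframe(keyframes, route_progress):
    """Select the taught keyframe whose along-route distance is closest to the
    current estimate of progress along the route (e.g. from visual odometry)."""
    return min(keyframes, key=lambda kf: abs(kf.route_distance - route_progress))

if __name__ == "__main__":
    # a 1200 m taught route with one keyframe every 10 m (arbitrary spacing)
    route = [Keyframe(d, np.array([d, 0.0, 80.0])) for d in np.arange(0.0, 1200.0, 10.0)]
    print(nearest_keyframe(route, 517.3).route_distance)   # -> 520.0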

[DOI] A proof-of-concept demonstration of visual teach and repeat on a quadrocopter using an altitude sensor and a monocular camera
A. Pfrunder, A. P. Schoellig, and T. D. Barfoot
in Proc. of the International Conference on Computer and Robot Vision (CRV), 2014, pp. 238–245.
[View BibTeX] [View Abstract] [Download PDF] [View Video] [Download Slides]

This paper applies an existing vision-based navigation algorithm to a micro aerial vehicle (MAV). The algorithm has previously been used for long-range navigation of ground robots based on on-board 3D vision sensors such as stereo or Kinect cameras. A teach-and-repeat operational strategy enables a robot to autonomously repeat a manually taught route without relying on an external positioning system such as GPS. For MAVs we show that a monocular downward looking camera combined with an altitude sensor can be used as a 3D vision sensor, replacing other resource-expensive 3D vision solutions. The paper also includes a simple path tracking controller that uses feedback from the visual and inertial sensors to guide the vehicle along a straight and level path. Preliminary experimental results demonstrate reliable, accurate and fully autonomous flight of an 8-m-long (straight and level) route, which was taught with the quadrocopter fixed to a cart. Finally, we present the successful flight of a more complex, 16-m-long route.

@INPROCEEDINGS{pfrunder-crv14,
author = {Andreas Pfrunder and Angela P. Schoellig and Timothy D. Barfoot},
title = {A proof-of-concept demonstration of visual teach and repeat on a quadrocopter using an altitude sensor and a monocular camera},
booktitle = {{Proc. of the International Conference on Computer and Robot Vision (CRV)}},
pages = {238--245},
year = {2014},
doi = {10.1109/CRV.2014.40},
urlvideo = {https://youtu.be/BRDvK4xD8ZY?list=PLuLKX4lDsLIaJEVTsuTAVdDJDx0xmzxXr},
urlslides = {../../wp-content/papercite-data/slides/pfrunder-crv14-slides.pdf},
abstract = {This paper applies an existing vision-based navigation algorithm to a micro aerial vehicle (MAV). The algorithm has previously been used for long-range navigation of ground robots based on on-board 3D vision sensors such as stereo or Kinect cameras. A teach-and-repeat operational strategy enables a robot to autonomously repeat a manually taught route without relying on an external positioning system such as GPS. For MAVs we show that a monocular downward looking camera combined with an altitude sensor can be used as a 3D vision sensor, replacing other resource-expensive 3D vision solutions. The paper also includes a simple path tracking controller that uses feedback from the visual and inertial sensors to guide the vehicle along a straight and level path. Preliminary experimental results demonstrate reliable, accurate and fully autonomous flight of an 8-m-long (straight and level) route, which was taught with the quadrocopter fixed to a cart. Finally, we present the successful flight of a more complex, 16-m-long route.}
}
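
The simple path-tracking controller mentioned in the abstract above can be imagined along the following lines: proportional feedback on the lateral offset and heading error relative to a straight, level taught path. The gains and the velocity/yaw-rate command interface are illustrative assumptions, not the controller used in the paper.

import numpy as np

def straight_path_tracker(pos, yaw, path_start, path_dir, k_lat=0.4, k_yaw=0.8):
    """Lateral-velocity and yaw-rate commands that steer the vehicle back onto
    the straight path through path_start with unit direction path_dir."""
    rel = pos[:2] - path_start[:2]
    along = rel @ path_dir[:2]
    lateral_err = rel - along * path_dir[:2]             # offset from the path
    yaw_err = np.arctan2(path_dir[1], path_dir[0]) - yaw
    yaw_err = (yaw_err + np.pi) % (2.0 * np.pi) - np.pi  # wrap to [-pi, pi)
    v_lat_cmd = -k_lat * lateral_err                     # push back toward the path
    yaw_rate_cmd = k_yaw * yaw_err                       # align heading with the path
    return v_lat_cmd, yaw_rate_cmd

if __name__ == "__main__":
    cmd = straight_path_tracker(
        pos=np.array([3.0, 1.5, 1.0]), yaw=0.2,
        path_start=np.array([0.0, 0.0]), path_dir=np.array([1.0, 0.0]))
    print(cmd)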
