RSS 2026 · Robotics: Science and Systems

Learning Agile Quadrotor Flight
in the Real World

LAFR · A self-adaptive framework where quadrotor policies evolve in real-world flight, growing faster and more agile with every iteration.

Yunfan Ren1 Zhiyuan Zhu1 Jiaxu Xing1 Davide Scaramuzza1

1 Robotics and Perception Group · University of Zurich

arXiv · Code · YouTube · Live 3D viewer
Abstract

From 2.0 m/s to 7.3 m/s in 100 seconds of physical flight.

LAFR is a self-adaptive framework that learns agile quadrotor flight directly in the real world, without precise system identification, without offline Sim2Real transfer, and without conservative safety margins. The system operates as a continuous closed-loop cycle bridging physical execution and differentiable simulation: a learned hybrid dynamics model closes the reality gap; RASH-BPTT (Real-world Anchored Short-horizon Backpropagation Through Time) optimizes the control policy via massively parallel rollouts anchored at the latest real-world state; and Adaptive Temporal Scaling jointly retunes the reference trajectory's time-scale \(\alpha\) using closed-loop sensitivity, maximizing agility while enforcing safety via a barrier function. The base policy evolves from a peak speed of 2.0 m/s to 7.3 m/s within roughly 100 seconds of physical flight time, converging to a 2.34 s figure-8 lap at \(\alpha = 0.28\).

Peak speed
2.0 → 7.3 m/s
≈ 100 s of physical flight
Lap time
2.34 s
Figure-8, reaching CTBR saturation
No offline SysID
Direct in real world
No massive data collection
Cite

BibTeX

If you find this work useful, please cite:

@inproceedings{ren2026agile,
  title     = {Learning Agile Quadrotor Flight in the Real World},
  author    = {Ren, Yunfan and Zhu, Zhiyuan and Xing, Jiaxu and Scaramuzza, Davide},
  booktitle = {Robotics: Science and Systems (RSS)},
  year      = {2026}
}
Method

Interactive Pipeline

A continuous closed-loop cycle bridging physical execution and differentiable simulation. Hover any module to spotlight it; click to pin the detail panel.

Module A: Continual Policy Learning (RASH-BPTT) with Hybrid Dynamics Model
Module B: Real-World Policy Rollout on the physical quadrotor
Module C: Simulation Model Calibration (hybrid ODE + neural residual)
Module D: Real-World Anchored Initialization
Module E: Adaptive Temporal Scaling
Continual Learning loop
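The interplay of modules A, C, and D can be sketched in a few lines of JAX. The toy below is a deliberately minimal stand-in, not the paper's implementation: a 1-D double integrator plays the quadrotor, a linear model plays the neural residual, and a linear feedback law plays the policy. What it does show faithfully is the RASH-BPTT mechanic: the tracking loss is differentiated through a short simulated rollout of the hybrid (nominal + residual) dynamics that starts from the latest real-world state, and the policy parameters are updated by gradient descent. All names (`hybrid_step`, `rollout_loss`, the horizon and gains) are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

DT = 0.02      # integration step [s]
HORIZON = 20   # short BPTT horizon, anchored at the latest real state

def nominal_step(state, u):
    # state = (position, velocity); Euler step of a double integrator
    p, v = state
    return jnp.array([p + DT * v, v + DT * u])

def residual(theta_res, state, u):
    # learned correction to the nominal model; a linear map stands in
    # for the paper's neural residual
    feats = jnp.array([state[0], state[1], u])
    return theta_res @ feats

def hybrid_step(theta_res, state, u):
    # hybrid dynamics = nominal ODE step + learned residual acceleration
    nxt = nominal_step(state, u)
    return nxt + DT * jnp.array([0.0, residual(theta_res, state, u)])

def policy(theta_pi, state, ref):
    # linear feedback on the tracking error, standing in for the policy
    return theta_pi @ (ref - state)

def rollout_loss(theta_pi, theta_res, anchor_state, ref):
    # RASH-BPTT: backpropagate the tracking cost through a short rollout
    # whose initial condition is the latest measured (anchor) state
    def step(state, _):
        u = policy(theta_pi, state, ref)
        nxt = hybrid_step(theta_res, state, u)
        return nxt, jnp.sum((ref - nxt) ** 2)
    _, costs = jax.lax.scan(step, anchor_state, None, length=HORIZON)
    return jnp.mean(costs)

grad_fn = jax.jit(jax.grad(rollout_loss))

theta_pi = jnp.zeros(2)            # policy parameters (trained below)
theta_res = jnp.zeros(3)           # residual parameters (fit from real data)
anchor = jnp.array([1.0, 0.0])     # latest real-world state
ref = jnp.array([0.0, 0.0])        # reference to track

for _ in range(300):               # plain gradient descent on the policy
    theta_pi -= 1.0 * grad_fn(theta_pi, theta_res, anchor, ref)
```

In the full pipeline the residual parameters would be fit from the logged real-world rollout (module C) before each policy update, and the anchor state would be refreshed every cycle (module D); here both are frozen for brevity.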
Live demo

Interactive 3D Viewer

The figure below embeds the official Rerun web viewer (WASM) streaming the training recording from this site. Drag to orbit, scroll to zoom, and use the sim_time timeline at the bottom to scrub through the eleven ATS iterations as \(\alpha\) contracts and the lap time falls from 8.34 s to 2.08 s.


Recording

What you're watching

The recording above corresponds row-for-row to the eleven ATS iterations below. \(\alpha\) contracts from 1.0 to its 0.25 floor, the lap time falls from 8.34 s to 2.08 s, and the policy's peak speed grows from 3.4 m/s to 10.0 m/s; once the residual closes the sim-to-real gap (iteration 2 onward), tracking RMSE stays at or below 0.16 m, well clear of the 0.35 m safety guard.

Iter | \(\alpha\) | Lap time | Tracking RMSE | Notes
-----|-------|----------|---------------|------
0    | 1.000 | 8.34 s   | 0.60 m        | Base policy, pre-residual
1    | 0.753 | 6.28 s   | 0.19 m        | ATS first contraction
2    | 0.753 | 6.28 s   | 0.11 m        | Residual closes sim-to-real gap
3    | 0.579 | 4.82 s   | 0.06 m        |
4    | 0.471 | 3.92 s   | 0.04 m        | Best RMSE (0.042 m)
5    | 0.399 | 3.32 s   | 0.06 m        |
6    | 0.350 | 2.92 s   | 0.07 m        |
7    | 0.313 | 2.60 s   | 0.08 m        |
8    | 0.281 | 2.34 s   | 0.09 m        |
9    | 0.255 | 2.12 s   | 0.16 m        | Approaching \(\alpha\) floor
10   | 0.250 | 2.08 s   | 0.09 m        | Converged at \(\alpha\) floor

After the residual closes the sim-to-real gap (iteration 2 onward), tracking RMSE drops to 0.042 m at iteration 4 and stays at or below 0.16 m for the rest of the run, well clear of the 0.35 m safety guard. The pipeline converges to a 2.08 s lap at \(\alpha = 0.25\) (the \(\alpha\) floor) on iteration 10.
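The \(\alpha\) contraction above can be pictured with a small sketch. The functional form below is an illustrative assumption, not the paper's actual update rule: a log-barrier on tracking RMSE (guard at 0.35 m, as in the run above) throttles the contraction as the error approaches the guard, the closed-loop sensitivity term further damps it, and \(\alpha\) is clipped at its 0.25 floor. The function name `ats_update` and the `gain` parameter are hypothetical.

```python
import math

RMSE_GUARD = 0.35    # safety guard on tracking RMSE [m]
ALPHA_FLOOR = 0.25   # hard lower bound on the time-scale

def ats_update(alpha, rmse, sensitivity, gain=0.1):
    """One illustrative ATS step: contract alpha (speed up the reference),
    slowed by a log-barrier on RMSE and by closed-loop sensitivity."""
    # barrier grows without bound as rmse approaches the 0.35 m guard
    barrier = -math.log(max(1e-9, 1.0 - rmse / RMSE_GUARD))
    # contract aggressively only when tracking is safe and insensitive
    step = gain / (1.0 + sensitivity + barrier)
    return max(ALPHA_FLOOR, alpha * (1.0 - step))
```

With this shape, a run tracking at 0.05 m RMSE contracts \(\alpha\) noticeably each iteration, a run near the 0.35 m guard barely moves, and once \(\alpha\) reaches 0.25 the floor holds it there, mirroring the behaviour in the table.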

Run it yourself

Reproduce

ROS-free, JAX-only reference implementation. A single workstation with a modern NVIDIA GPU reproduces the figure-8 lap-time curve end-to-end.

1 · Set up the environment
conda install -n base -c conda-forge mamba
mamba create -n flightning python=3.11 -y
mamba activate flightning
pip install --upgrade "jax[cuda12]"
pip install -e ".[dev]"
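Before training, it is worth confirming that the CUDA-enabled JAX build actually sees the GPU; with broken CUDA wheels, JAX may fall back to CPU and the massively parallel rollouts will crawl. A quick check:

```python
import jax

# Lists the accelerators JAX can use; expect a CUDA device here,
# not only CpuDevice, before launching training.
print(jax.devices())
print(jax.default_backend())  # "gpu" when the CUDA install is healthy
```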
2 · Train + run the pipeline
python -m flightning.scripts.train \
    --log_dir outputs/tracking

python -m flightning.online_learning.run_pipeline \
    --cfg flightning/cfg/online.yaml
Thanks

Acknowledgements

We thank the Robotics and Perception Group at the University of Zurich for hardware, lab space, and countless flight sessions. We are grateful to the open-source communities behind JAX, Rerun, and the broader differentiable-simulation ecosystem, on whose tools this work stands.

This research was supported in part by the National Centre of Competence in Research (NCCR) Robotics through the Swiss National Science Foundation (SNSF) and by the European Research Council (ERC) under the European Union's Horizon programme.