Abstract
This thesis investigates the potential of Reinforcement Learning (RL) for achieving robust and adaptable quadcopter control, focusing on trajectory and attitude stabilization. We compare state-of-the-art RL algorithms, specifically Proximal Policy Optimization (PPO), against traditional Proportional-Integral-Derivative (PID) controllers across three tasks: hovering, slow trajectory following, and fast trajectory following. To enhance realism, we employ a modified PyFlyt simulation environment with a high-fidelity Crazyflie 2.x model, accounting for motor dynamics, noise, wind disturbances, and aerodynamic drag.
The challenge of operating a quadcopter can be divided into two distinct parts: planning a flight path and actually following that path. Our focus is on the latter, training a reinforcement learning controller within a highly realistic simulated setting. This environment directly links simulated sensor data to motor commands, mimicking the real-world operation of the quadcopter. This end-to-end approach allows us to train the controller on the complete process, from sensing the environment to executing actions.
Our results show that end-to-end RL-based controllers consistently outperform both gain-scheduling and cascaded PID controllers in maintaining stable hover, particularly under strong wind conditions, where the RL controller achieved 70% of the average reward criterion while the cascaded PID and gain-scheduling PID controllers scored 45% and 24%, respectively. Traditional PID controllers, while widely used, often struggle to maintain stability in the face of external disturbances and changing system dynamics. This is particularly evident in challenging scenarios such as strong winds, where their fixed gains may not be sufficient to counteract the destabilizing forces. RL, on the other hand, learns to adapt its control strategy to the environment, making it more robust to such disturbances.
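To make the fixed-gain limitation concrete, below is a minimal single-loop PID sketch; it is illustrative only, since the baselines in this work are tuned gain-scheduling and cascaded controllers. Under a sustained wind bias, a fixed-gain loop can reject the disturbance only through its integral term, which accumulates slowly.

```python
# Minimal fixed-gain PID sketch (illustrative; not the thesis' tuned
# cascaded or gain-scheduling baseline).
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # Fixed kp/ki/kd: a constant wind bias must be absorbed entirely
        # by the slowly accumulating integral term, whereas an RL policy
        # can learn a state-dependent response to the same disturbance.
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```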
In slow trajectory following, RL demonstrates a wider range of motion and smoother angular control, achieving performance comparable to cascaded PID. Specifically, RL exhibited a 15% increase in the range of motion along the x and y axes compared to gain-scheduling PID, and a 10% reduction in angular velocity fluctuations compared to both PID controllers. This improvement can be attributed to RL's ability to learn complex control policies that optimize for both position and attitude objectives simultaneously.
For fast trajectory following, RL enables significantly faster navigation: in our experiments, RL achieved more than a 60% reduction in the time taken to complete the trajectory compared to the slow trajectory-following task. This highlights RL's ability to exploit the full capabilities of the quadcopter's actuators when the task demands it.
These findings highlight the limitations of traditional PID controllers in dynamic environments and demonstrate the superior adaptability and robustness of RL-based controllers. The ability of RL to learn and adapt to changing conditions makes it a promising approach for quadcopter control in real-world applications where the environment is often unpredictable and disturbances are common.
Future research will focus on generalizing these results across different platforms, establishing formal stability guarantees, and extending RL to complex real-world tasks such as obstacle avoidance and collaborative flight. A key challenge in generalizing RL across quadcopter platforms is the variation in physical parameters and dynamics; addressing it will require algorithms that can either adapt to these variations online or learn from a diverse set of training environments. Additionally, while RL has shown promising results in simulation, ensuring stability in real-world deployments remains an open research question, motivating work on formal stability guarantees for RL-based controllers. Finally, extending RL to tasks such as obstacle avoidance and collaborative flight will require addressing sensor limitations, partial observability, and multi-agent coordination.
By addressing these challenges, RL has the potential to revolutionize quadcopter control, enabling the deployment of autonomous aerial vehicles in a wide range of applications that were previously considered too complex or dangerous for traditional control methods.
School
School of Sciences and Engineering
Department
Robotics, Control & Smart Systems Program
Degree Name
MS in Robotics, Control and Smart Systems
Graduation Date
Fall 12-25-2024
Submission Date
9-5-2024
First Advisor
Dr. Maki Habib
Committee Member 1
Dr. Yasser Gadallah
Committee Member 2
Dr. Wahied Gharieb Ali Abdelaal
Committee Member 3
Dr. Seif El Dawlatly
Extent
137 p.
Document Type
Master's Thesis
Institutional Review Board (IRB) Approval
Approval has been obtained for this item
Recommended Citation
APA Citation
Chawa, M.
(2024). End-to-End Autonomous Quadcopter using Reinforcement Learning [Master's Thesis, the American University in Cairo]. AUC Knowledge Fountain.
https://fount.aucegypt.edu/etds/2421
MLA Citation
Chawa, Mohamed Marwan. End-to-End Autonomous Quadcopter using Reinforcement Learning. 2024. American University in Cairo, Master's Thesis. AUC Knowledge Fountain.
https://fount.aucegypt.edu/etds/2421