Monocular Visual SLAM Using YOLO-Based Object Detection and Depth Estimation

Authors

  • Ruhani Sujlana, Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA
  • Yojan Gautam, Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA
  • Ningshi Yao, Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA

Abstract

Unmanned aerial vehicles (UAVs) such as blimps and other lighter-than-air systems are well suited to indoor tasks like monitoring, navigation, and inspection. These vehicles offer advantages such as long flight duration and low power consumption, but they face major challenges in indoor environments where GPS signals are unavailable. In addition, their limited payload capacity rules out localization systems that rely on heavy sensors such as LiDAR or stereo cameras. To address these limitations, this project explores a monocular vision-based SLAM method that combines YOLO (You Only Look Once) object detection with monocular depth estimation. A trained YOLO model identifies recognizable objects in real time, while the depth estimation model predicts a per-pixel distance using patterns learned from large datasets. Together, these outputs allow the system to estimate the blimp's position and construct a map that encodes both spatial geometry and labeled objects. The system is designed to support navigation and mapping for lightweight indoor UAVs while minimizing computational load and sensor weight. Future testing in simulation and in indoor environments will evaluate the accuracy and speed of the combined method for deployment on small aerial platforms.
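As a minimal sketch of how such a detection-plus-depth pipeline could be wired together: the snippet below pairs an off-the-shelf Ultralytics YOLOv8 detector with the MiDaS small depth network as stand-ins for the project's trained models (the abstract does not name specific weights), takes the median predicted depth inside each detection box, and back-projects the box center through an assumed pinhole camera model to produce labeled 3D landmarks. The intrinsics FX/FY/CX/CY and the helper labeled_landmarks are hypothetical placeholders, not values from the paper.

```python
import cv2
import numpy as np
import torch
from ultralytics import YOLO

# Hypothetical pinhole intrinsics (fx, fy, cx, cy); a real deployment would
# use calibration values for the blimp's monocular camera.
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

detector = YOLO("yolov8n.pt")  # illustrative pretrained detector, not the authors' weights
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")  # lightweight monocular depth net
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

def labeled_landmarks(frame_bgr):
    """Return (label, confidence, xyz) tuples for one camera frame."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)

    # Predict a depth value for every pixel, resized back to frame resolution.
    # NOTE: MiDaS outputs *relative* inverse depth, not metric distance.
    with torch.no_grad():
        pred = midas(transform(rgb))
        depth = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=rgb.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze().numpy()

    landmarks = []
    result = detector(frame_bgr, verbose=False)[0]
    for box, cls, conf in zip(result.boxes.xyxy, result.boxes.cls, result.boxes.conf):
        x1, y1, x2, y2 = map(int, box.tolist())
        if x2 <= x1 or y2 <= y1:
            continue
        # Median depth inside the box is robust to background pixels.
        z = float(np.median(depth[y1:y2, x1:x2]))
        u, v = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        # Pinhole back-projection of the box center to a camera-frame 3D point.
        xyz = np.array([(u - CX) * z / FX, (v - CY) * z / FY, z])
        landmarks.append((result.names[int(cls)], float(conf), xyz))
    return landmarks
```

Because learned monocular depth is only defined up to an unknown scale, a scale (and possibly shift) calibration step, for example against objects of known size, would be needed before such back-projected landmarks could anchor a metric map; the sketch leaves that step out.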

Published

2025-09-25

Section

College of Engineering and Computing: Department of Electrical and Computer Engineering