Adaptive Multimodal RGB–LWIR Sensor Fusion and YOLO Backbone Architectures for Real-Time Object Tracking in Autonomous Turret Systems Across Variable Illumination Conditions
DOI:
https://doi.org/10.13021/jssr2025.5277

Abstract
Autonomous robotic platforms are playing a growing role across the emergency services sector, supporting missions such as search-and-rescue operations in disaster zones and reconnaissance. However, traditional red-green-blue (RGB) detection pipelines struggle in low-light environments, while thermal-based systems fail to capture object characteristics such as color and texture. These complementary limitations suggest that combining thermal and visible imaging may yield more reliable performance across diverse conditions. To address these challenges, this study introduces a unified, adaptive framework that fuses long-wave infrared (LWIR) and RGB video streams at multiple fusion ratios and dynamically selects the optimal detection model based on ambient illumination. Using a library of 33 custom-trained You Only Look Once (YOLO) models, we identified the top-performing architectures for three illumination conditions: no light (<10 lux), dim light (10–1000 lux), and daylight (>1000 lux). Fusion was performed by blending pixel intensities from aligned LWIR and RGB frames at eleven predefined ratios, from full RGB (100/0) to full LWIR (0/100) in 10% steps. We created a dataset of over 22,000 annotated images spanning varied illumination conditions, with a 75/25 split for model training and validation. Preliminary findings suggest that adaptive RGB–LWIR fusion produced noticeably higher average confidence and firing success rates than our baseline models (YOLOv5 and YOLOv11) operating on single modalities. This work lays the foundation for adaptive multimodal perception, improving the reliability of autonomous robotic vision in diverse environments.
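The fusion and model-selection steps described in the abstract can be summarized in a short sketch. The snippet below is illustrative only: it assumes pre-aligned, same-size frames, and the function names (fuse_frames, select_model), weight-file paths, and use of OpenCV's addWeighted for pixel blending are assumptions drawn from the abstract rather than the authors' released code. It blends RGB and LWIR frames at one of the eleven ratios and picks a detector for the measured illumination band.

import cv2
import numpy as np

# Eleven fusion ratios: 0.0 = full LWIR (0/100) ... 1.0 = full RGB (100/0), in 10% steps.
FUSION_RATIOS = [i / 10 for i in range(11)]

def fuse_frames(rgb_frame: np.ndarray, lwir_frame: np.ndarray, rgb_weight: float) -> np.ndarray:
    """Blend pixel intensities of pre-aligned, same-size RGB and LWIR frames."""
    if lwir_frame.ndim == 2:
        # LWIR is typically single-channel; replicate it to 3 channels so shapes match.
        lwir_frame = cv2.cvtColor(lwir_frame, cv2.COLOR_GRAY2BGR)
    return cv2.addWeighted(rgb_frame, rgb_weight, lwir_frame, 1.0 - rgb_weight, 0.0)

def select_model(lux: float) -> str:
    """Return detector weights for the measured illumination (hypothetical file names)."""
    if lux < 10:       # no light (<10 lux)
        return "weights/yolo_no_light.pt"
    if lux <= 1000:    # dim light (10-1000 lux)
        return "weights/yolo_dim_light.pt"
    return "weights/yolo_daylight.pt"  # daylight (>1000 lux)

In a live pipeline, the selected weights would presumably be loaded into the corresponding YOLO model and fed frames fused at the ratio chosen for that illumination band.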
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.