Robust Sleep Stage Classification from Multimodal Physiological Signals Using a Cross-Modal Transformer with Missing Data Resilience
Abstract
Accurate and robust classification of sleep stages, in particular distinguishing Wake, N1, N2, N3, and REM stages, is critical for diagnosing sleep disorders and promoting overall health. Traditional laboratory-based polysomnography (PSG) is resource-intensive, while existing at-home monitoring solutions often struggle with noisy or missing data. This research proposes a deep learning framework featuring a Cross-Modal Transformer for sleep stage classification that fuses multimodal physiological data (EEG, EOG, EMG) while maintaining performance in the presence of incomplete sensor streams. Unlike simple attention or gated fusion, which proved ineffective, the Transformer architecture forces the model to learn a joint, intertwined representation by treating the modalities as a single sequence of feature "tokens." The methodology involves extracting comprehensive features from the Sleep-EDF dataset, followed by subject-wise train-test splitting and aggressive data augmentation that simulates the random loss of any modality. A baseline attention model failed to adapt, with its Cohen's Kappa score dropping from 0.74 to 0.64-0.66 when a modality was removed. In contrast, the Cross-Modal Transformer achieved a Kappa of 0.75 and an accuracy of 81.3% with all modalities present and degraded gracefully, scoring 0.74 without EMG, 0.73 without EOG, and a strong 0.72 with EEG alone. Future work will explore real-time implementation, investigate the framework's adaptability to other physiological signals, and evaluate its performance on larger, more diverse datasets from real-world, at-home monitoring scenarios.
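As a minimal illustration of the two ideas the abstract describes, token-based cross-modal fusion and augmentation that randomly drops modalities, the following PyTorch sketch shows one plausible realization. All names, feature dimensions, and hyperparameters here are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

def random_modality_dropout(feats, keep_prob=0.7):
    """Augmentation sketch: randomly zero out whole modalities per sample,
    always keeping at least one, to mimic missing sensor streams."""
    names = list(feats)
    batch = next(iter(feats.values())).shape[0]
    mask = torch.rand(batch, len(names)) < keep_prob        # True = modality kept
    empty = ~mask.any(dim=1)                                 # samples that lost every modality
    mask[empty, torch.randint(len(names), (int(empty.sum()),))] = True
    dropped = {m: f * mask[:, i].unsqueeze(-1).float()
               for i, (m, f) in enumerate(feats.items())}
    return dropped, mask

class CrossModalTransformer(nn.Module):
    """Sketch: each modality's feature vector becomes one token in a shared
    sequence, and a Transformer encoder attends across those tokens."""

    def __init__(self, feat_dims=None, d_model=128, n_heads=4,
                 n_layers=2, n_classes=5):
        super().__init__()
        feat_dims = feat_dims or {"eeg": 128, "eog": 64, "emg": 32}  # assumed dims
        # Per-modality projection into a common token space.
        self.proj = nn.ModuleDict({m: nn.Linear(d, d_model)
                                   for m, d in feat_dims.items()})
        # Learned embedding identifying which modality a token came from.
        self.modality_emb = nn.ParameterDict(
            {m: nn.Parameter(0.02 * torch.randn(d_model)) for m in feat_dims})
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, n_classes)      # Wake, N1, N2, N3, REM

    def forward(self, feats, present_mask):
        # feats: dict of (batch, feat_dim) tensors; present_mask: (batch, n_modalities)
        # bool, with columns in the same order as feat_dims.
        tokens = torch.stack([self.proj[m](feats[m]) + self.modality_emb[m]
                              for m in self.proj], dim=1)
        # Missing modalities are excluded from attention via the padding mask.
        out = self.encoder(tokens, src_key_padding_mask=~present_mask)
        # Mean-pool only over the modality tokens that are actually present.
        denom = present_mask.sum(dim=1, keepdim=True).clamp(min=1)
        pooled = (out * present_mask.unsqueeze(-1)).sum(dim=1) / denom
        return self.classifier(pooled)                       # 5-class sleep stage logits
```

In training, `random_modality_dropout` would be applied to each batch and the resulting mask passed as `present_mask`, so the model learns to classify sleep stages from whichever subset of EEG, EOG, and EMG features survives.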
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.