PET and SPET: A Two-Stage Pipeline for Pulmonary Embolism Detection from CTPA Scans

Authors

  • Anisha Raghu, Quarry Lane School, Dublin, California
  • Mihai Boicu, Department of Information Sciences and Technology, George Mason University, Fairfax, VA

Abstract

Pulmonary embolism (PE) is a life-threatening condition with a high mortality rate of 30%. Prompt treatment is crucial to improving outcomes, but PE diagnosis often takes multiple days because few radiologists are available to analyze computed tomography pulmonary angiography (CTPA) scans, each of which comprises many X-ray image slices (often hundreds) of the chest region. Automating PE diagnosis can therefore significantly improve patient outcomes. We use the RadFusion dataset to improve upon prior work on diagnosing PE from CTPA scans and electronic health records (EHR). The labeling method and pre-processing of EHR data were improved, with over 50% of the features being identified as redundant. Correlation-based analysis was used to select key EHR features, drastically reducing the number of features needed. A novel two-stage pipeline was developed for analyzing CTPA images. This approach consists of one model, PE-Transformer (PET), which analyzes a window of contiguous CTPA slices (a chunk), followed by another model, Sequential PE-Transformer (S-PET), which aggregates information across multiple chunks. The PET model improved AUROC by 1.8% by using a standard DINOv2 + Transformer architecture rather than a custom architecture. The S-PET model improved the accuracy by an additional 1.2% by aggregating information across chunks. Distilling the metadata features to just the 16 most important features and using a random forest classifier resulted in the highest AUROC of 0.79, compared to 0.76 when using all the metadata features. The analysis supports the results from larger datasets such as INSPECT and bridges the gap in prior work (RadFusion), which indicated that EHR data was more accurate than imaging data. We also demonstrate that modern self-supervised backbones trained on web-scale data offer superior performance, reducing the need for custom architectures.
This research also shows that very few EHR features contribute to accuracy, reducing the need to collect large amounts of EHR data. The current approach is not end-to-end trainable and separates chunk-level and patient-level models; in the future, we aim to explore unified models and more complex backbones. The code and checkpoints are available at: https://github.com/Anisha234/ASSIP_research_penet
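
The two-stage structure described in the abstract (a chunk-level model followed by a model that aggregates across chunks) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the chunk size, the stride, and the way the final partial window is handled are all assumptions made for illustration, and the two toy callables merely stand in for the PET and S-PET models.

```python
from statistics import mean
from typing import Callable, List, Sequence


def split_into_chunks(slices: Sequence, chunk_size: int, stride: int) -> List[Sequence]:
    """Split an ordered stack of slices into contiguous windows (chunks).

    Hypothetical helper: the chunking rule used in the paper is not given
    in the abstract. Here the last window is shifted back so that the tail
    of the scan is always covered.
    """
    if chunk_size <= 0 or stride <= 0:
        raise ValueError("chunk_size and stride must be positive")
    if len(slices) == 0:
        raise ValueError("scan has no slices")
    if len(slices) <= chunk_size:
        return [slices]
    starts = list(range(0, len(slices) - chunk_size + 1, stride))
    if starts[-1] + chunk_size < len(slices):
        starts.append(len(slices) - chunk_size)
    return [slices[s:s + chunk_size] for s in starts]


def two_stage_predict(
    slices: Sequence,
    chunk_model: Callable[[Sequence], float],
    sequence_model: Callable[[List[float]], float],
    chunk_size: int = 4,
    stride: int = 4,
) -> float:
    """Stage 1 runs on each chunk; stage 2 aggregates the chunk outputs."""
    chunks = split_into_chunks(slices, chunk_size, stride)
    chunk_outputs = [chunk_model(chunk) for chunk in chunks]  # stands in for PET
    return sequence_model(chunk_outputs)                      # stands in for S-PET


# Toy stand-ins (assumptions): each "slice" is one number, the chunk model
# averages a window, and the sequence model takes the maximum over chunks.
toy_scan = [0.0] * 8 + [1.0, 1.0]
patient_score = two_stage_predict(toy_scan, chunk_model=mean, sequence_model=max)
```

In this sketch the two stages are plain callables, which mirrors the separation noted in the abstract: the chunk-level and patient-level models are distinct and are not trained end to end.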

Published

2025-09-25

Issue

Section

College of Engineering and Computing: Department of Information Sciences and Technology