Optimizing Sequence-to-Graph Alignment

Authors

  • Andrew Gao Department of Computer Science, George Mason University, Fairfax, VA
  • Justin Ma Department of Computer Science, George Mason University, Fairfax, VA
  • Justin Liu Department of Computer Science, George Mason University, Fairfax, VA
  • Bobby Chab Department of Computer Science, George Mason University, Fairfax, VA
  • Fei Li Department of Computer Science, George Mason University, Fairfax, VA

DOI:

https://doi.org/10.13021/jssr2025.5222

Abstract

Graph‑based reference genomes preserve population‑level variation that linear references cannot, yet the combinatorial expansion of candidate paths grows exponentially (2k for k variant nodes) with each additional variant node making sequence‑to‑graph alignment computationally demanding. We evaluate four alignment strategies, a greedy heuristic, dynamic programming (DP), exhaustive search, and exhaustive search with early stopping, on synthetic directed acyclic graphs (DAGs) containing 10-20 nodes with a 10% branching probability and simulated short reads. The greedy heuristic completes in <1 ms per read but attains only 43 % alignment accuracy. Early‑stop exhaustive search guarantees 100 % accuracy but requires 2.3s per read, underscoring the cost of exhaustive exploration and the tradeoff between runtime and completeness. DP achieves a balanced profile, delivering 93% accuracy with a runtime two orders of magnitude faster than exhaustive search. These results map the speed‑accuracy frontier in sequence‑to‑graph alignment and motivate banded DP and heuristic pruning to scale analyses to kilobase‑scale genome graphs.

Published

2025-09-25

Issue

Section

College of Engineering and Computing: Department of Computer Science