Optimizing Sequence-to-Graph Alignment
DOI:
https://doi.org/10.13021/jssr2025.5222Abstract
Graph‑based reference genomes preserve population‑level variation that linear references cannot, yet the combinatorial expansion of candidate paths grows exponentially (2k for k variant nodes) with each additional variant node making sequence‑to‑graph alignment computationally demanding. We evaluate four alignment strategies, a greedy heuristic, dynamic programming (DP), exhaustive search, and exhaustive search with early stopping, on synthetic directed acyclic graphs (DAGs) containing 10-20 nodes with a 10% branching probability and simulated short reads. The greedy heuristic completes in <1 ms per read but attains only 43 % alignment accuracy. Early‑stop exhaustive search guarantees 100 % accuracy but requires 2.3s per read, underscoring the cost of exhaustive exploration and the tradeoff between runtime and completeness. DP achieves a balanced profile, delivering 93% accuracy with a runtime two orders of magnitude faster than exhaustive search. These results map the speed‑accuracy frontier in sequence‑to‑graph alignment and motivate banded DP and heuristic pruning to scale analyses to kilobase‑scale genome graphs.
Published
Issue
Section
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.