Computational Identification and Mapping of Protein-DNA Interactions
Abstract
Protein-DNA interactions play a crucial role in key biological processes such as gene regulation, replication, and transcription, and understanding them informs the development of gene editing technologies like CRISPR. Machine learning models such as AlphaFold and RoseTTAFold can generally predict protein-DNA structures, but models that explain the critical binding interactions from protein or DNA sequence alone remain underdeveloped. A better understanding of these interactions should improve the accuracy of such models, potentially to the point of predicting ideal protein binding partners for a given DNA sequence. We developed a Python script that explicitly identifies key interactions between protein and DNA in 89 protein-DNA interfaces from the Protein Data Bank. The script facilitates visualization and analysis of interaction patterns along the DNA sequence, providing a framework for understanding the intermolecular interactions underlying known protein-DNA interfaces and for validating the presence of key interactions on DNA bases. Building on this, we began developing a second algorithm that determines the best match among known DNA-binding proteins for an arbitrary stretch of DNA. These algorithms could be adapted to generate modified CRISPR machinery that directly recognizes specific DNA sequences, potentially increasing CRISPR specificity while decreasing dependence on gRNA.
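As an illustration of what identifying interactions at a protein-DNA interface can involve, the following is a minimal sketch and not the script described above. It assumes a simple distance criterion (a hypothetical 3.5 Å cutoff between any protein atom and any DNA atom) and a hypothetical in-memory atom record; the function names `find_contacts` and `contacts_per_base` are invented for this example, and the coordinates are made up rather than taken from a Protein Data Bank entry.

```python
from collections import Counter
from math import dist

# Hypothetical atom record for this sketch:
# (chain_id, residue_name, residue_number, atom_name, (x, y, z))
DNA_RESIDUES = {"DA", "DC", "DG", "DT"}  # PDB residue names for deoxyribonucleotides


def find_contacts(atoms, cutoff=3.5):
    """Return (protein_atom, dna_atom, distance) for each pair within cutoff (angstroms).

    The 3.5 angstrom default is an assumption chosen for illustration.
    """
    dna = [a for a in atoms if a[1] in DNA_RESIDUES]
    protein = [a for a in atoms if a[1] not in DNA_RESIDUES]
    contacts = []
    for p in protein:
        for d in dna:
            separation = dist(p[4], d[4])
            if separation <= cutoff:
                contacts.append((p, d, separation))
    return contacts


def contacts_per_base(contacts):
    """Count contacts per DNA position, keyed by (chain, residue name, residue number)."""
    return Counter((d[0], d[1], d[2]) for _, d, _ in contacts)


# Made-up coordinates: one arginine atom close to a guanine atom, everything else far apart.
atoms = [
    ("A", "ARG", 42, "NH1", (0.0, 0.0, 0.0)),
    ("A", "LYS", 57, "NZ", (10.0, 0.0, 0.0)),
    ("B", "DG", 5, "O6", (2.9, 0.0, 0.0)),
    ("B", "DT", 6, "O4", (20.0, 0.0, 0.0)),
]
contacts = find_contacts(atoms)
profile = contacts_per_base(contacts)
```

Here `profile` holds a per-base contact count that could be plotted along the DNA sequence. A real analysis would likely distinguish interaction types (for example hydrogen bonds versus contacts with the phosphate backbone) and use a spatial index rather than an all-pairs loop.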
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.