Machine Learning Models of CHI3L1 Mutant Pathogenicity using Computational Mutagenesis
CHI3L1 is a secreted glycoprotein thought to play an active and crucial role in angiogenesis, antibacterial and inflammatory responses throughout the body, and stimulation of human connective tissue cell proliferation. Pathogenic CHI3L1 variants have been observed in numerous cancers and in autoimmune and inflammatory diseases, including asthma, atopic dermatitis, rheumatoid arthritis, Alzheimer’s disease, and glioblastoma, among other common, poorly treatable conditions. Evaluating the pathogenicity of CHI3L1 missense variants is critical to the future of CHI3L1-related drug research and the discovery of promising biomarkers. Current knowledge of pathogenic variants is limited to clinical findings whose evidence is insufficient for confident validation, with many reports describing only “possible” or “probable” pathogenic effects of SNPs. Using four structure-based and sequence-based machine learning programs (AutoMute2.0, SNPs&GO3d, MutPred, PANTHER), each trained on large numbers of diverse single-residue mutations to predict the disease potential of SNPs, computational mutagenesis was performed on approximately 7,300 CHI3L1 variants using the protein sequence and the structure from PDB ID 1hjx. The AlphaFold structure AF-P36222-F1 was also run in AutoMute, both to explore predictions for residues 1-21, which are unmapped in structure 1hjx, and to cross-reference results from regions with high model confidence; residues 1-21 yielded low-confidence results owing to the low pLDDT scores in that region. CHI3L1 variants were assessed at all possible residue positions, although predictions were limited at positions with fewer than six neighboring amino acid residues. The results provide a promising baseline for future study of CHI3L1 mutant-prompted pathogenesis.
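As an illustration of what exhaustive computational mutagenesis involves, the minimal sketch below enumerates every single-residue substitution of a protein sequence. It is not the pipeline used in this work: the function name `enumerate_missense_variants` and the six-residue toy sequence are hypothetical, and none of the four predictors is called. For a sequence of length n the enumeration yields 19n variants, so a 383-residue sequence would give 7,277, which is consistent with the roughly 7.3k variants reported above if the full-length precursor is assumed to have that length.

```python
# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_missense_variants(sequence, start_position=1):
    """Return every single-residue substitution as e.g. 'M1A'.

    Hypothetical helper for illustration only; the label format
    (wild type, position, mutant) is an assumed convention.
    """
    variants = []
    for offset, wild_type in enumerate(sequence):
        if wild_type not in AMINO_ACIDS:
            raise ValueError(f"Non-standard residue {wild_type!r} at index {offset}")
        position = start_position + offset
        for mutant in AMINO_ACIDS:
            # Skip the identity "substitution": it is not a variant.
            if mutant != wild_type:
                variants.append(f"{wild_type}{position}{mutant}")
    return variants


# Toy sequence (an assumption, not the CHI3L1 sequence).
toy_sequence = "MGVKAS"
toy_variants = enumerate_missense_variants(toy_sequence)
print(len(toy_variants))   # 19 substitutions x 6 positions = 114
print(toy_variants[:3])    # ['M1A', 'M1C', 'M1D']
```

Each label produced this way could then be submitted to a predictor in whatever input format that tool requires; positions a structure-based tool cannot score (for example, those with too few neighboring residues) would simply be dropped from its output.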
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.