AutoMute2.0 to Predict Disease-Causing and Neutral Mutations in Human Transmembrane Proteins


  • Abhinav Pappu, Aspiring Scientists’ Summer Internship Program Intern
  • Dr. Iosif I. Vaisman, Aspiring Scientists’ Summer Internship Program Primary Mentor



Mutations in transmembrane proteins can be detrimental and can cause deadly diseases. Predicting the disease potential of these mutations is therefore valuable for clinical purposes and drug research. AutoMute2.0 is a framework, trained on cytoplasmic proteins, that provides tools to predict the disease potential of human single-point mutations. Because cytoplasmic proteins differ structurally from transmembrane proteins, however, it was unclear whether AutoMute2.0 would accurately predict the disease potential of mutations in transmembrane proteins. In this project, we recorded and analyzed the results of applying AutoMute2.0 to human transmembrane proteins, a task not documented in any scientific literature. From the MutHTP database, which contains prediction data for over 200,000 point mutations from other frameworks trained specifically on transmembrane proteins, we created a dataset of 402 point mutations from nine distinct proteins. We then used AutoMute2.0 to classify each point mutation as either “Neutral” or “Disease-causing”. When we compared these predictions with those of four other models trained on transmembrane proteins, AutoMute2.0 agreed with them 61.8% of the time on average across the four models. BorodaTM, a software tool trained on transmembrane proteins, showed an average of 67.3%, only 5.5 percentage points higher than AutoMute2.0. Because AutoMute2.0's agreement with the other models was close to that of a model trained on transmembrane proteins, we concluded that AutoMute2.0 can predict the disease potential of point mutations in transmembrane proteins, though only with low confidence. Because there were no ground-truth values to compare the predictions against, it is hard to determine how accurate any single model is, so it is best to draw on prediction data from multiple models when assessing disease potential.
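
The comparison described above, in which one model's labels are scored against those of several other models and the scores are averaged, could be sketched roughly as follows. This is only an illustrative sketch, not the procedure actually used in the project: the function names, the reference model names (model_a, model_b), and the four toy labels are all hypothetical, and the score is computed here as the simple fraction of mutations on which two models give the same label.

```python
from statistics import mean


def agreement(labels_a, labels_b):
    """Fraction of mutations on which two models assign the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def average_agreement(target_labels, reference_models):
    """Mean agreement between one model and each reference model."""
    return mean(
        agreement(target_labels, ref_labels)
        for ref_labels in reference_models.values()
    )


# Toy data (hypothetical): one label per point mutation, in the same order.
automute_labels = ["Disease-causing", "Neutral", "Disease-causing", "Neutral"]
reference_models = {
    "model_a": ["Disease-causing", "Neutral", "Neutral", "Neutral"],
    "model_b": ["Disease-causing", "Disease-causing", "Disease-causing", "Neutral"],
}

score = average_agreement(automute_labels, reference_models)
print(f"Average agreement: {score:.1%}")
```

Without ground-truth labels, a score of this kind measures only how often the models agree with one another, not how often any of them is correct.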





College of Science: School of Systems Biology