Using Data Visualization and Machine Learning Techniques to Determine Which Features Are Most Helpful in Predicting Malignant or Benign Tumors


  • Anish Kohli, Aspiring Scientists’ Summer Internship Program Intern
  • Anvitha Aravelly, Aspiring Scientists’ Summer Internship Program Co-mentor
  • Dr. Kamaljeet Sanghera, Aspiring Scientists’ Summer Internship Program Primary Mentor



Cancer is projected to become the leading cause of death in the United States within the next few years. Although there are many ways to prevent this, some types of cancer remain untreatable. Breast cancer is one of the most common types, and early diagnosis is crucial for effective treatment. Many studies have addressed predicting the type of breast tumor; this study took public breast cancer tumor data from Dr. William H. Wolberg of the University of Wisconsin Hospital and applied data visualization, classification, and machine learning algorithms to it. The aim was to establish an adequate model by revealing the predictive features for early-stage breast cancer patients. Data visualization and machine learning techniques can have a significant impact on cancer detection when determining the type of tumor. In this work, I examined the features that indicate whether a tumor is benign or malignant, using heat maps, box plots, and swarm plots to see how each feature correlates with the diagnosis. Visualizing the data clarified the correlations among the features and highlighted redundant features that were unnecessary for making predictions. It also helped identify which features are most strongly correlated with a tumor being benign or malignant. This is helpful in assessing a patient's condition and deciding how to act efficiently: cancer is dangerous, and if a patient has a malignant tumor, action must be taken immediately to provide the best possible care.
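The correlation analysis described above can be sketched roughly as follows. This is a minimal illustration, not the study's actual code. It assumes the Wisconsin Diagnostic copy of the data that ships with scikit-learn (the abstract does not say which version of the dataset was used), and the helper names and the 0.95 redundancy threshold are hypothetical choices made for the example.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer


def load_tumor_frame():
    """Return the features as a DataFrame plus a 0/1 'malignant' column.

    Assumption: scikit-learn's bundled Wisconsin Diagnostic dataset stands in
    for the data used in the study.
    """
    data = load_breast_cancer(as_frame=True)
    frame = data.data.copy()
    # scikit-learn encodes target 0 = malignant, 1 = benign; flip so 1 = malignant.
    frame["malignant"] = 1 - data.target
    return frame


def rank_features_by_correlation(frame, label="malignant"):
    """Absolute Pearson correlation of each feature with the label, descending."""
    corr = frame.corr()[label].drop(label)
    return corr.abs().sort_values(ascending=False)


def redundant_pairs(frame, label="malignant", threshold=0.95):
    """Feature pairs whose mutual |correlation| exceeds a hypothetical threshold."""
    corr = frame.drop(columns=[label]).corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                pairs.append((cols[i], cols[j], float(corr.iloc[i, j])))
    return pairs


def plot_heatmap(frame, path="correlation_heatmap.png"):
    """Save a correlation heat map to disk; needs matplotlib and seaborn."""
    import matplotlib
    matplotlib.use("Agg")  # render off-screen so no display is required
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(14, 12))
    sns.heatmap(frame.corr(), cmap="coolwarm", center=0, ax=ax)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


if __name__ == "__main__":
    frame = load_tumor_frame()
    # Features most strongly correlated with a malignant diagnosis.
    print(rank_features_by_correlation(frame).head(10))
    # Candidate redundant features (near-duplicates of one another).
    for first, second, r in redundant_pairs(frame):
        print(f"{first} ~ {second}: {r:.3f}")
```

Ranking by absolute correlation with the label corresponds to reading the diagnosis row of a heat map, while the pairwise check flags features that carry nearly the same information and could be dropped before training a classifier. Box plots and swarm plots of the top-ranked features, split by diagnosis, would be the natural next step.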





Institute for Digital Innovation