MLFeed: Data Science Tutor

Authors

  • Neha Konduru, Aspiring Scientists’ Summer Internship Program Intern
  • Saloni Shah, Aspiring Scientists’ Summer Internship Program Intern
  • Sajed Jalil, Aspiring Scientists’ Summer Internship Program Co-mentor
  • Dr. Thomas LaToza, Aspiring Scientists’ Summer Internship Program Primary Mentor

DOI:

https://doi.org/10.13021/jssr2022.3409

Abstract

The demand for data scientists is growing, yet newcomers struggle to learn best practices for building machine learning (ML) models. MLFeed is a system that walks a developer through possible approaches, based on data science best practices, across several development steps. The tool also detects potential issues in a developer's code and explains how to resolve them. MLFeed was built using data from Kaggle, a website hosting a large amount of publicly accessible code in the form of notebooks that apply various approaches to data science problems. MLFeed scans top-ranked Kaggle notebooks for keywords indicating the type of approach each notebook uses, which allows us to build an AST (abstract syntax tree) and enhance the deep learning algorithms. Based on the scan, the tool suggests how the developer might continue building their code, in the form of integrative templates. To improve the tool, we created an API (application programming interface) reference for scikit-learn, a machine learning software library, in order to add more keywords and identify the methods within each class. We also attempted to run unit tests on various model keywords from an ML API dictionary to detect object types for AST classification. MLFeed can make data science more accessible and easier to learn, in a field that many people hesitate to enter because of its difficulty.
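The abstract does not include MLFeed's code. As a rough illustration of the keyword-scanning idea only, the sketch below uses Python's standard `ast` module to parse a notebook code cell without executing it and flag calls whose names appear in a small keyword dictionary. The `KEYWORDS` entries, the step labels, `SAMPLE_CELL`, and the `call_name` and `scan_cell` helpers are all hypothetical; they are not MLFeed's actual dictionary or implementation.

```python
import ast
from typing import Dict, List, Tuple

# Hypothetical keyword dictionary (illustrative only): maps scikit-learn
# names to the development step their use suggests.
KEYWORDS: Dict[str, str] = {
    "train_test_split": "data splitting",
    "StandardScaler": "preprocessing",
    "LogisticRegression": "model",
    "RandomForestClassifier": "model",
    "GridSearchCV": "tuning",
    "accuracy_score": "evaluation",
}


def call_name(node: ast.Call) -> str:
    """Return the called name: 'f' for f(...), 'm' for obj.m(...), else ''."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return ""


def scan_cell(source: str) -> List[Tuple[int, str, str]]:
    """Parse one code cell (without running it) and return sorted
    (line number, keyword, step) tuples for every matching call."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = call_name(node)
            if name in KEYWORDS:
                hits.append((node.lineno, name, KEYWORDS[name]))
    return sorted(hits)


# A made-up notebook cell; parsing never executes it, so X and y need not exist.
SAMPLE_CELL = """\
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
"""

if __name__ == "__main__":
    for line, keyword, step in scan_cell(SAMPLE_CELL):
        print(line, keyword, step)
```

Running the file prints `3 train_test_split data splitting` and `4 LogisticRegression model`; `model.fit` is skipped because `fit` is not in the dictionary. Matching on bare call names is a deliberate simplification: it cannot tell a scikit-learn `LogisticRegression` apart from an unrelated function with the same name, so resolving which object a name actually refers to would need further analysis, such as tracking imports.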

Published

2022-12-13

Section

College of Engineering and Computing: Department of Computer Science
