MLFeed: Data Science Tutor

Neha Konduru; Saloni Shah; Sajed Jalil; Thomas LaToza

doi:10.13021/jssr2022.3409

Authors

Neha Konduru, Aspiring Scientists’ Summer Internship Program Intern
Saloni Shah, Aspiring Scientists’ Summer Internship Program Intern
Sajed Jalil, Aspiring Scientists’ Summer Internship Program Co-mentor
Dr. Thomas LaToza, Aspiring Scientists’ Summer Internship Program Primary Mentor

DOI:

https://doi.org/10.13021/jssr2022.3409

Abstract

Demand for data scientists is growing, yet new practitioners struggle to identify best practices for building machine learning (ML) models. MLFeed is a system that walks a developer through candidate approaches, grounded in established data science concepts and practices, at several steps of development. The tool also detects potential issues in the developer’s code and explains how to resolve them. We built MLFeed on data from Kaggle, which hosts a substantial amount of publicly accessible code in the form of notebooks that apply particular approaches to data science problems. MLFeed scans the top-ranked Kaggle notebooks for keywords that indicate the type of approach the developer is using, which allows us to build an abstract syntax tree (AST) and enhance the deep learning algorithms. Based on the scan, the tool suggests how the developer should continue building their code, in the form of integrative templates. To improve the tool, we created an application programming interface (API) reference for scikit-learn, a machine learning library, to supply more keywords and identify the methods within each class. We also attempted to run unit tests on various model keywords from a machine-learning API dictionary to detect the object types for AST classification. MLFeed can make data science more accessible and easier to learn, in a field that people are often reluctant to enter because of its difficulty.
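The keyword scan described in the abstract can be illustrated with a minimal sketch. This is not MLFeed's implementation: the `KEYWORDS` table, the `find_approaches` function, and the sample notebook cell are all hypothetical, and the sketch assumes that a "keyword" is a scikit-learn class name appearing as a call in the notebook's code. It uses Python's standard `ast` module to parse a code cell and report which known classes are instantiated.

```python
import ast

# Hypothetical keyword table: a tiny stand-in for an API reference that
# maps scikit-learn class names to a coarse approach label.
KEYWORDS = {
    "RandomForestClassifier": "ensemble classification",
    "LogisticRegression": "linear classification",
    "StandardScaler": "feature scaling",
    "KMeans": "clustering",
}


def find_approaches(source: str) -> list[tuple[str, int, str]]:
    """Return (name, line, approach) for each call to a known keyword."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both `KMeans(...)` and `cluster.KMeans(...)`.
        if isinstance(func, ast.Name):
            name = func.id
        elif isinstance(func, ast.Attribute):
            name = func.attr
        else:
            continue
        if name in KEYWORDS:
            hits.append((name, node.lineno, KEYWORDS[name]))
    # Report hits in the order they appear in the source.
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical notebook cell; it is only parsed, never executed,
# so the undefined names X and y are harmless.
notebook_cell = """
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

scaler = StandardScaler()
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)
"""

for name, line, approach in find_approaches(notebook_cell):
    print(f"line {line}: {name} ({approach})")
```

Walking the syntax tree rather than searching the raw text means that class names mentioned only in comments or strings are ignored, and method calls such as `model.fit` are skipped unless the method name itself is in the table.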
