MLFeed: Data Science Tutor
DOI: https://doi.org/10.13021/jssr2022.3409

Abstract
The demand for data scientists is growing, yet newcomers to the field struggle to learn best practices for building machine learning (ML) models. MLFeed is a system that guides developers through candidate approaches at each step of model development, drawing on established data science concepts and practices. The tool also detects potential issues in a developer's code and explains how to resolve them. The project draws on data from Kaggle, a website hosting a large body of publicly accessible code in the form of notebooks that apply various approaches to data science problems. MLFeed scans top-ranked Kaggle notebooks for keywords that indicate which approach a developer is using; this scan allows us to build an AST (abstract syntax tree) and enhance the deep learning algorithms. Based on the scan, the tool suggests how the developer could continue building their code, in the form of integrative templates. To improve the tool, we created an API (application programming interface) reference for scikit-learn, a machine learning library for Python, to add more keywords and to identify the methods within each class. We also attempted to run unit tests on various model keywords from a machine learning API dictionary to detect object types for AST classification. MLFeed can make data science more accessible and easier to learn in a field that many people hesitate to enter because of its difficulty.
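The keyword-and-AST scanning step described above can be illustrated with Python's standard `ast` module. This is a minimal sketch, not MLFeed's implementation: the `API_KEYWORDS` dictionary, the step labels, `STEP_ORDER`, and the helpers `detect_steps` and `suggest_next` are all hypothetical, chosen only to show how scikit-learn names found in a notebook cell could be grouped by development step and used to propose a next step.

```python
import ast
from collections import defaultdict
from typing import Optional

# Hypothetical keyword dictionary mapping scikit-learn names to the
# development step they usually indicate. A fuller reference could be
# generated from the scikit-learn API documentation.
API_KEYWORDS = {
    "train_test_split": "data splitting",
    "StandardScaler": "preprocessing",
    "SimpleImputer": "preprocessing",
    "RandomForestClassifier": "modeling",
    "LogisticRegression": "modeling",
    "accuracy_score": "evaluation",
    "cross_val_score": "evaluation",
}

# Hypothetical order in which the steps are expected to appear.
STEP_ORDER = ["data splitting", "preprocessing", "modeling", "evaluation"]


def detect_steps(source: str) -> dict[str, list[str]]:
    """Parse code into an AST and group matched API names by step."""
    tree = ast.parse(source)  # parse only; the code is never executed
    found = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):         # e.g. StandardScaler()
            name = node.id
        elif isinstance(node, ast.Attribute):  # e.g. metrics.accuracy_score
            name = node.attr
        elif isinstance(node, ast.alias):      # e.g. from x import train_test_split
            name = node.name.split(".")[-1]
        else:
            continue
        step = API_KEYWORDS.get(name)
        if step is not None:
            found[step].add(name)
    return {step: sorted(names) for step, names in found.items()}


def suggest_next(steps: dict[str, list[str]]) -> Optional[str]:
    """Return the first expected step not yet present, or None if all are."""
    for step in STEP_ORDER:
        if step not in steps:
            return step
    return None


notebook_cell = """
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = RandomForestClassifier().fit(X_train, y_train)
"""

steps = detect_steps(notebook_cell)
print(steps)                # {'data splitting': ['train_test_split'], 'modeling': ['RandomForestClassifier']}
print(suggest_next(steps))  # preprocessing
```

Walking the parsed tree, rather than matching raw text, avoids false hits on keywords that appear only in comments or strings; a real system would still need the larger API reference the abstract mentions to cover more than a handful of names.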
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.