Model Management For Machine Learning
The role of the Data Scientist has evolved considerably over the years. Every Data Scientist is now expected to be capable of pulling in data and parsing it into a suitable format. Whether that means preparing the data for Exploratory Data Analysis (EDA) or for fitting a Machine Learning model, Data Scientists should know this process like the back of their hand. This progress is largely due to the rapid rise of open-source contributions, which have lowered the barrier to entry and made Machine Learning development more accessible.
As happens when boundaries are broken, new problems arise. The accessibility provided by the open-source community has enabled Data Scientists to iterate over many more experiments and optimize hundreds of parameters at speeds never seen before. As a result, things can get quite messy, which is why tools that support the management of models have cropped up on the market.
There are plenty of moving parts involved when we deploy and use Machine Learning models in business: model exploration, model refinement, integration testing, and model deployment, to name a few. Advancing without a plan for how to manage models can make Machine Learning development extremely counterproductive and cumbersome, hence the need for Machine Learning Model Management, a vital part of MLOps, which we discussed previously.
Two of the main challenges that arise when performing ad-hoc Machine Learning development, each of which could be broken down further, are as follows:
- Collaboration Prevention
Not recording experiments is a shot in the foot for any team developing Machine Learning models, since carefully kept records allow experiment results to be reproduced and confirmed. In other cases, we may wish to prevent duplication of experiments, such as when multiple practitioners are working on the same problem and each is conducting their own experiments. Whereas a solo practitioner may be capable of recalling all the experiments they’ve conducted, a team is highly susceptible to duplicated experiments when there is no solid method for tracking the experiments conducted by teammates.
- Reproducibility Crisis
The great innovator Thomas Edison is widely quoted as saying, “I have not failed. I’ve just found 10,000 ways that won’t work,” and this has since been the attitude adopted by inventors when approaching problems. Essentially, the quote implies that even in failure there are useful insights to derive from an experiment, but this is where the issue lies in Machine Learning development. Practitioners usually work in a rapid, iterative environment, and the tendency after a failed iteration is simply to move on to the next one, since tracking past experiments requires effort on top of the effort already being spent on generating new ideas for approaching the problem.
However, many are aware that past experiments may hold more insights than they appeared to at the time, so it is not unusual to want to reproduce a past experiment. This involves using the same source code, model hyperparameters, and version of the data that were used at the time of experimentation. Without a means to track these, insightful information is often lost during the ML development process, with no way to reproduce the experiments and recover it.
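To make the three ingredients of reproducibility concrete, here is a minimal Python sketch. It assumes that hashing the source code, the hyperparameters, and the raw data bytes is enough to identify a run. The function names (sha256_hex, make_run_record) and the record layout are hypothetical illustrations and are not taken from any particular framework.

```python
import hashlib
import json


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def make_run_record(source_code: str, hyperparams: dict, data: bytes) -> dict:
    """Build a record that pins down the three ingredients of an experiment.

    Hypothetical helper: a real setup might store a Git commit hash and a
    dataset version tag instead of hashing the content directly.
    """
    # sort_keys makes the hash independent of dictionary insertion order.
    params_json = json.dumps(hyperparams, sort_keys=True)
    record = {
        "code_hash": sha256_hex(source_code.encode("utf-8")),
        "params_hash": sha256_hex(params_json.encode("utf-8")),
        "data_hash": sha256_hex(data),
        "hyperparams": hyperparams,
    }
    # One id per run: identical code, parameters, and data give the same id.
    combined = record["code_hash"] + record["params_hash"] + record["data_hash"]
    record["run_id"] = sha256_hex(combined.encode("utf-8"))
    return record


if __name__ == "__main__":
    code = "model = fit(X, y, lr=0.1)"
    data_v1 = b"feature,label\n1.0,0\n2.0,1\n"
    data_v2 = b"feature,label\n1.0,0\n2.0,1\n3.0,1\n"

    first = make_run_record(code, {"lr": 0.1, "depth": 3}, data_v1)
    same = make_run_record(code, {"depth": 3, "lr": 0.1}, data_v1)
    changed = make_run_record(code, {"lr": 0.1, "depth": 3}, data_v2)

    # Same code, parameters, and data: the ids match.
    print(first["run_id"] == same["run_id"])
    # A different version of the data: the ids differ.
    print(first["run_id"] == changed["run_id"])
```

Storing a record like this next to each result means a past experiment can later be matched to the exact code, settings, and data that produced it, even if nobody remembers the details.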
Model Management
Traditional software development solutions have proven insufficient when working on Machine Learning. Nonetheless, it has become more and more evident that an analogous solution is required for Machine Learning applications, hence the rise of model management.
As stated earlier, Model Management falls under the broad MLOps framework, which can be broken down further into 3 distinct parts, each optimizing for a particular aspect of Machine Learning development:
- Model Development
- Model Deployment
- Model Management
Model Management underpins the model development phase of the MLOps pipeline. Essentially, Model Management is a relatively new category of technologies and processes that help companies consistently and safely develop, validate, deliver, and monitor Machine Learning models.
One example of a problem that model management solves is the tracking of experiments. The development of Machine Learning models involves a fast-paced experimental phase, in which various models and approaches are implemented and their performance is measured on the metrics used to evaluate model quality.
An individual practitioner can iterate through multiple experiments on their own. However, as the number of practitioners working on a particular problem grows, so does the number of experiments. Without proper model management, this could significantly increase the time to market, as practitioners would have to constantly stop to record each experiment manually in order to avoid duplicated experiments. In traditional software development we’d probably track this work using tools like Git or JIRA, but Machine Learning development is too chaotic and fast-moving for these tools to suffice.
Although this may sound like a very complex problem to solve, the solution is actually quite simple: write down the relevant information, or, in better terminology, log it. With all of the necessary information saved in a secure location, it is easy for practitioners to come back to past experiments to compare them, analyze them, and select a champion model to proceed with for future development, without the stop-starts of manually tracking experiments. This in turn means a reduced time to market.
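As a rough illustration of what logging the relevant information can look like, the Python sketch below appends each experiment to a shared JSON Lines file, checks whether a parameter set has already been tried, and picks a champion by a chosen metric. The file format, the field names, and the helper functions (log_experiment, load_experiments, already_tried, select_champion) are assumptions made for this example and are not the API of any model management framework.

```python
import json
import tempfile
from pathlib import Path


def log_experiment(log_path: Path, author: str, params: dict, metrics: dict) -> None:
    """Append one experiment as a JSON line to a shared log file."""
    entry = {"author": author, "params": params, "metrics": metrics}
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, sort_keys=True) + "\n")


def load_experiments(log_path: Path) -> list:
    """Read every logged experiment back; a missing file means none yet."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def already_tried(experiments: list, params: dict) -> bool:
    """Check whether this exact parameter set has already been logged."""
    return any(entry["params"] == params for entry in experiments)


def select_champion(experiments: list, metric: str) -> dict:
    """Return the experiment with the highest value of the given metric."""
    scored = [entry for entry in experiments if metric in entry["metrics"]]
    if not scored:
        raise ValueError(f"no experiment reports the metric {metric!r}")
    return max(scored, key=lambda entry: entry["metrics"][metric])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_file = Path(tmp) / "experiments.jsonl"
        log_experiment(log_file, "alice", {"model": "tree", "depth": 3}, {"accuracy": 0.81})
        log_experiment(log_file, "bob", {"model": "tree", "depth": 5}, {"accuracy": 0.86})

        history = load_experiments(log_file)
        # A teammate can check the log before repeating a configuration.
        print(already_tried(history, {"model": "tree", "depth": 3}))
        # The champion is the logged run with the best accuracy.
        print(select_champion(history, "accuracy")["author"])
```

This sketch treats a higher metric value as better, which suits accuracy but not a loss. Dedicated frameworks add storage, access control, and a user interface on top of the same basic idea.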
There are many model management frameworks. Since Git is the industry-standard version control system for software development, the majority of Model Management frameworks provide a very similar experience to Git with respect to architecture and user interface. Notable examples include ModelDB, which is known as a pioneer in making Machine Learning model management accessible, MLflow, which is also quite popular, and Azure Machine Learning, which is offered by Microsoft.
Here at Datatron, we offer a platform to govern and manage all of your Machine Learning, Artificial Intelligence, and Data Science Models in Production. Additionally, we help you automate, optimize, and accelerate your Machine Learning models to ensure they are running smoothly and efficiently in production. To learn more about our services, be sure to Book a Demo.