What is MLOps?
Finally! The company you’ve worked at for the past 8 years has heeded your cries to invest in machine learning, a growing field poised to revolutionize business as we know it. You were placed in charge of growing the Data Science team, but your foresight had you covered. The past 18 months of lingering around Data Science circles and getting acquainted with the culture in your free time earned you first dibs on some of the most competent Data Scientists around.
Excited by the prospect of churning out models to solve some of the most difficult problems, ones that were out of reach before the rise of Artificial Intelligence, you lay down some demo projects to get things running.
6 weeks later the results come back and performance metrics on the demos are blossoming. Former internal Artificial Intelligence skeptics convert to AI evangelists overnight. When the executives are informed, their pupils dilate until they look like dollar signs. “Jackpot, baby!” one executive screeches. “We need this in production ASAP. How soon can you do it?” asks another. Considering all the complicated tasks that have already been solved, one would imagine that we could be swimming in dollars pretty quickly, since all that is left is some routine development work. Right?
To put this into perspective, around the end of 2019 and the start of 2020, DeepLearning.AI’s newsletter, The Batch, reported on a study which found that “only 22 percent of companies using machine learning have successfully deployed a model.”
“More and more companies are developing machine learning models for internal use. But many are still struggling to bridge the gap to practical deployments.”
(Source: DeepLearning.AI, The Batch Newsletter)
Understanding the challenges we face when moving Machine Learning from development into production provides the foundation for what MLOps does and why it is essential for companies adopting Machine Learning into their workflow.
What are the Challenges?
We can begin by getting acquainted with the term “DevOps”. DevOps is a set of best practices in software engineering that makes it possible to ship software into a production environment within minutes while ensuring that the application keeps running reliably once it is there. In other words, DevOps is a software engineering practice that unifies software DEVelopment and software OPerationS.
DevOps = Software Development + Software Operations
Here is where the issue lies. When building traditional software applications, most of the DevOps team’s concern is the code. Machine Learning applications, on the other hand, involve data as well as code. This is the fundamental difference between a Machine Learning application and a traditional software application.
The final machine learning model that is deployed into production is the result of applying an algorithm to a large set of data (better known as the training data), and that data in effect determines how the model behaves in a production environment. The model’s behavior also depends on the input data it receives at inference time, but we have no way of knowing in advance what that data will be.
Essentially, the code in both traditional software applications and machine learning applications is crafted in a controlled development environment. However, machine learning applications also include data that comes from a never-ending source called the “real world”. Data does not stop changing, and there is no way to completely control how it changes. To picture this, think of code and data as living in their own independent planes while sharing the dimension of time.
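To make the idea of ever-changing data a little more concrete, here is a minimal sketch of how one might notice that live data has moved away from the data a model was trained on. It is only an illustration: the `drift_score` function, the order values, and the `DRIFT_THRESHOLD` cut-off are all hypothetical, and real monitoring setups typically use richer statistics than a shift in the mean.

```python
from statistics import mean, stdev


def drift_score(training_values, live_values):
    """Distance between the live mean and the training mean,
    measured in training standard deviations."""
    baseline_mean = mean(training_values)
    baseline_std = stdev(training_values)
    if baseline_std == 0:
        raise ValueError("training data has no variance; score is undefined")
    return abs(mean(live_values) - baseline_mean) / baseline_std


# Hypothetical feature: order value seen at training time vs. in production.
training_orders = [20.0, 22.0, 19.0, 21.0, 18.0, 20.0]
live_orders_same = [21.0, 19.0, 20.0, 22.0, 18.0, 20.0]
live_orders_shifted = [30.0, 32.0, 29.0, 31.0, 28.0, 30.0]

DRIFT_THRESHOLD = 3.0  # assumed cut-off; would need tuning per feature

for name, live in [("same", live_orders_same), ("shifted", live_orders_shifted)]:
    score = drift_score(training_orders, live)
    status = "DRIFT" if score > DRIFT_THRESHOLD else "ok"
    print(f"{name}: score={score:.2f} -> {status}")
```

The code that computes the score never changes here. Only the data does, which is exactly why the data plane needs its own monitoring.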
The gap between the planes illustrates a disconnect that is at the root of several challenges facing anybody who attempts to deploy a Machine Learning model into production successfully. These challenges include:
- Slow deployments
- Training-Serving skew (a difference between performance during training and performance during serving)
- Lack of reproducibility
The goal of anyone deploying machine learning models into production is to bridge the gap between the two planes, in turn overcoming challenges such as those listed above.
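Two of these challenges can be sketched in a few lines. The example below is a toy under stated assumptions, not a recipe: the threshold "model", the data, and the helper names (`fit_threshold`, `seeded_split`, `run_fingerprint`) are all made up for illustration. It measures training-serving skew as the gap between training and serving accuracy, and it supports reproducibility by fixing the random seed and hashing the exact inputs of a run.

```python
import hashlib
import json
import random


def fit_threshold(values):
    """Toy 'model': predict positive when a value exceeds the training mean."""
    return sum(values) / len(values)


def accuracy(threshold, values, labels):
    """Fraction of values whose prediction (value > threshold) matches the label."""
    correct = sum((v > threshold) == y for v, y in zip(values, labels))
    return correct / len(values)


def seeded_split(rows, seed, holdout=2):
    """Shuffle with a fixed seed so the same split can be recreated later."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[holdout:], shuffled[:holdout]


def run_fingerprint(rows, seed):
    """Hash the exact data and seed that went into a training run."""
    payload = json.dumps({"rows": rows, "seed": seed}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical data: by serving time, the real world has shifted upward.
train_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_labels = [False, False, False, True, True, True]
serving_values = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
serving_labels = [False, False, False, True, True, True]

threshold = fit_threshold(train_values)
training_accuracy = accuracy(threshold, train_values, train_labels)
serving_accuracy = accuracy(threshold, serving_values, serving_labels)
skew = training_accuracy - serving_accuracy
print(f"training={training_accuracy:.2f} serving={serving_accuracy:.2f} skew={skew:.2f}")

# Reproducibility: same seed gives the same split, same inputs give the same hash.
print(seeded_split(train_values, seed=42) == seeded_split(train_values, seed=42))
print(run_fingerprint(train_values, seed=42))
```

Storing the fingerprint alongside the deployed model is one simple way to answer the question "which data and settings produced this model?" later on.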
Overall, MLOps is a set of best practices that were modeled after the existing discipline of DevOps in order to aid businesses in deploying and maintaining Machine Learning algorithms in production reliably and efficiently.
In future posts we will cover MLOps in greater detail, looking at various aspects of what is involved in MLOps, such as:
- The Deployment Environment
- Model Agnostic Deployment for various technologies
- High-Availability and Disaster Recoverability
- Multi-tenancy
- Model Validation, A/B Testing, and Shadow Mode
Here at Datatron, we offer a platform to govern and manage all of your Machine Learning, Artificial Intelligence, and Data Science models in production. Additionally, we help you automate, optimize, and accelerate your Machine Learning models to ensure they are running smoothly and efficiently in production. To learn more about our services, be sure to Book a Demo.
Thank you for reading! Connect with me on Medium, LinkedIn, and Twitter to read more of my insights on Data Science and Artificial Intelligence.