Machine Learning For The Modern Enterprise: The Way Forward For AI ModelOps
It is difficult to argue that any emerging technology is as impactful, or holds as much potential, as machine learning: the ability to train systems to become really smart and operate in ways previously thought unimaginable. In fact, research published by Microsoft showed that companies with a strong AI strategy in place outperform companies without one by as much as 5%.
The impact is unquestionable, so one might wonder why not every company is jumping into the fray and implementing its own machine learning strategy. The simple answer is that it is very difficult to do. At most companies, only a very small percentage of ML projects actually make it to production. To understand why, it is important to look at how models are created and deployed, and at the challenges in current methods of operationalizing them.
Recently, Victor Thu, Datatron’s VP of Customer Success and Operations, gave a webinar for ML Ops World in which he summarized the current problems of creating production-ready machine learning models and how Datatron seeks to pave the way forward. Here, we summarize his talking points.
Current Problems With Integrating Machine Learning Solutions in Enterprise
Research by Gartner shows that up to 85% of machine learning projects fail and never make it into production. Although interest in implementing machine learning solutions within enterprises has become widespread, the number of actual implementations remains fairly low.
Given current advances in machine learning and data science techniques, the obstacle to bringing these solutions into production is not the quality of the models but the lack of infrastructure that would allow companies to do so. The development lifecycle for machine learning, after all, is fundamentally different from the lifecycle of traditional software engineering.
For one, because of the sheer amount of data that is needed for machine learning models, there is a greater security and privacy concern. Also, ML solutions are more complex and difficult to integrate with most companies’ current technology stack. To be more precise, these are the two most profound challenges:
1. The machine learning development lifecycle is time-consuming.
Creating machine learning models does not take a lot of time; it can be done in a matter of weeks. After machine learning engineers hand their models to the DevOps teams, however, getting those models to production takes anywhere from six to nine months. The processes involved are siloed, and there is a great disconnect between teams that most companies have yet to address.
2. It is difficult to operationalize machine learning models.
Even when machine learning models have made it to production, there is no guarantee that they are running the way they were intended. The difficulty of operationalizing machine learning models is proving to be a costly business risk and a big reason why companies still choose not to add machine learning to their current stack. Models may, for example, be slowed down by regulatory compliance issues or experience data drift. Moreover, the models could be using bad data that leads to bias, among other issues that may cause them to fail under the radar.
Where Datatron Fits In The Chaos
Datatron’s mission is to address all of the issues mentioned above. It gives companies a way to cut down the time it takes to move machine learning models into production after they are created, shortening the development lifecycle. Its centralized platform also makes it easier to deploy, monitor, and govern these models with any framework, language, or library. Here is how Datatron does the heavy lifting:
Actionable Model Catalog
One very troublesome pain point for most companies trying to bring machine learning into their tech stack is the lack of a centralized repository for all of their models. Without one, the processes involved with these models become siloed and isolated, which makes the development lifecycle far more convoluted and inefficient. Even when teams do have a catalog for their machine learning models and data, it is often not actionable. For example, existing catalogs have no infrastructure for version control of machine learning models. If a model in production runs into the operationalization issues mentioned earlier, such as bias and data drift, it would be very difficult to pinpoint the source of these issues without version control.
Datatron offers an actionable, centralized repository for all of your machine learning models, with support for both binary and Docker formats. Teams get greater visibility and can see which models are active. The models also become more transparent, with documentation and information on the assumptions behind each version. Not only do teams have version control, but they can also easily track the changes made to each model.
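To make the idea of an actionable, versioned catalog more concrete, here is a minimal sketch in Python. It is not Datatron’s API; the names ModelCatalog, ModelVersion, register, activate, and the example artifact locations are all hypothetical and serve only to illustrate what tracking versions, notes, and the active model might look like.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str  # hypothetical location of a binary or Docker image
    notes: str         # documentation and assumptions for this version


@dataclass
class ModelCatalog:
    """Illustrative in-memory catalog; a real one would persist its state."""
    _versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)
    _active: Dict[str, int] = field(default_factory=dict)

    def register(self, name: str, artifact_uri: str, notes: str) -> ModelVersion:
        # Each registration appends a new, sequentially numbered version.
        history = self._versions.setdefault(name, [])
        entry = ModelVersion(len(history) + 1, artifact_uri, notes)
        history.append(entry)
        return entry

    def activate(self, name: str, version: int) -> None:
        # Only a version that was registered can be marked active.
        if not any(v.version == version for v in self._versions.get(name, [])):
            raise KeyError(f"{name} has no version {version}")
        self._active[name] = version

    def active(self, name: str) -> Optional[ModelVersion]:
        number = self._active.get(name)
        if number is None:
            return None
        return self._versions[name][number - 1]

    def history(self, name: str) -> List[ModelVersion]:
        return list(self._versions.get(name, []))


catalog = ModelCatalog()
catalog.register("churn", "registry.example/churn:1", "Trained on 2020 data")
catalog.register("churn", "registry.example/churn:2", "Added tenure feature")
catalog.activate("churn", 2)
```

With a structure like this, a team that sees bias or drift in production can look up which version is active and read the notes recorded for it and for its predecessors.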
Model Deployment
The way most teams currently deploy machine learning models to production can lead to a lot of issues. One is that their tools do not address the problem of different codebases and frameworks being disjointed from one another. Another is that managing large datasets can be a very daunting task. Even when teams have the tools to manage large datasets, they still cannot tell whether the data within those sets is good. That is, they have no tools to score the quality of their data. As mentioned earlier, poor data can lead to bias, which means valuable time and resources are wasted. It is also apparent that most teams today struggle with tracking and versioning the datasets they use to train their models.
Datatron makes it easy to scale a company’s model deployment. First of all, Datatron’s deployment infrastructure is truly vendor agnostic, which means models built on different frameworks no longer clash and developers are free to choose whatever framework works best for them. Teams get detailed records for each of their deployments, giving them truly actionable visibility that helps them deploy models faster. Once models are deployed, teams have access to a REST API endpoint for interactive requests, which makes it easier to perform tasks such as A/B testing to determine model performance. Managing large datasets is made easier, and determining the quality of the data used is also made possible through batch scoring.
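As a rough illustration of what scoring the quality of a batch of data can mean, the sketch below computes, for each field, the share of rows whose value is present and falls inside an expected range. This is an assumed, simplified check and not a description of how Datatron scores data; the function name quality_score, the field names, and the ranges are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Tuple


def quality_score(
    rows: Iterable[Mapping[str, object]],
    ranges: Mapping[str, Tuple[float, float]],
) -> Dict[str, float]:
    """Return, per field, the fraction of rows with a present, in-range value."""
    rows = list(rows)
    scores: Dict[str, float] = {}
    for field_name, (low, high) in ranges.items():
        valid = sum(
            1
            for row in rows
            if row.get(field_name) is not None and low <= row[field_name] <= high
        )
        scores[field_name] = valid / len(rows) if rows else 0.0
    return scores


# Hypothetical batch: one missing age, one impossible age, one negative income.
batch = [
    {"age": 34, "income": 52000.0},
    {"age": None, "income": 61000.0},
    {"age": 212, "income": 48000.0},
    {"age": 45, "income": -10.0},
]
scores = quality_score(batch, {"age": (0, 120), "income": (0.0, 1e7)})
```

A low score for a field is a signal to inspect that field before the batch is used for training or scoring.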
Monitoring and Governance
As mentioned earlier, once machine learning models have been deployed into production, several issues can make operationalizing them unfeasible, including problems with monitoring and governance. MLOps teams need to be able to test whether their models are actually working as intended and whether the data their models are trained on is actually high quality, which makes the model auditing process fairly arduous and time-consuming. Metrics such as compute usage, accuracy, and precision have to be monitored constantly.
Datatron’s platform offers solutions to both of these issues. With the dashboard, users can better understand how their models perform on the datasets they see in production. The metrics mentioned earlier, such as accuracy and precision, are also shown on the dashboard, giving teams greater visibility into how their models are actually performing. Users can also easily create alerts that allow them to address both deviation and data drift. MLOps teams also have a Governance Dashboard with which they can validate their models and compare different versions by replaying other versions against historical production data. Because the operationalization issues are addressed, companies can ensure that their machine learning models in production actually perform, minimizing potential losses and maximizing revenue.
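The metrics and drift alerts described above can be sketched in a few lines of Python. This is an illustrative example, not Datatron’s implementation: it uses the Population Stability Index (PSI) as one common way to quantify data drift, and the alert threshold of 0.2 is a widely used rule of thumb that we assume here. All function names are hypothetical.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Fraction of positive predictions that are truly positive."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_pos:
        return 0.0
    return sum(t == positive for t in predicted_pos) / len(predicted_pos)


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline and a production sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(expected: Sequence[float], actual: Sequence[float],
                threshold: float = 0.2) -> bool:
    """True when PSI exceeds the (assumed) alert threshold."""
    return psi(expected, actual) > threshold


baseline = [i / 100 for i in range(100)]        # feature values seen in training
shifted = [0.5 + i / 200 for i in range(100)]   # production values moved upward
```

Here drift_alert(baseline, baseline) stays quiet because the two samples are identical, while drift_alert(baseline, shifted) fires because the production values occupy only the upper half of the baseline range.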
Patented Publisher / Challenger
Another big issue that slows down MLOps and DevOps teams is that their workflows become disjointed when their machine learning models run into issues, which they always do. The Publisher REST API endpoint is essential for teams to be able to make interactive requests to these models, but whenever a model gets updated, a new endpoint is generated. This is of course very problematic: with so many teams involved, most organizations lack the coordination to keep communicating these new endpoints. This greatly slows down the development lifecycle for production-level AI.
Datatron fixes this by using a single static endpoint URL through something called a challenger credit gateway. Models deployed by the publishers can be changed as many times as needed without a new endpoint URL having to be communicated every time a change is made. This gives teams great flexibility when fine-tuning their models and drastically cuts down on production time. Having multiple publisher models connected to a single challenger credit gateway also lets teams perform A/B testing, in which they control certain variables in each version of the model and test which version works best for certain tasks.
Datatron recognizes how significant a strong machine learning implementation infrastructure is for companies that want to take full advantage of this radical new technology. All of this is in an effort to ensure that enterprises do not get left behind in the machine learning revolution that is starting in the business world. As far as machine learning for the modern enterprise goes, Datatron seeks to pave the way forward.
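The general pattern of a stable entry point in front of swappable models can be sketched as follows. This is a simplified illustration of the idea only, not Datatron’s patented mechanism; the class StaticGateway, its methods, and the two toy models are hypothetical, and the weighted random split is one assumed way to divide traffic for A/B testing.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Model = Callable[[dict], float]


class StaticGateway:
    """One stable entry point; the models behind it can be swapped or split."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self._routes: List[Tuple[str, Model, float]] = []
        self._rng = random.Random(seed)

    def set_routes(self, routes: Dict[str, Tuple[Model, float]]) -> None:
        # Callers keep using the same gateway; only the routing table changes.
        total = sum(weight for _, weight in routes.values())
        if total <= 0:
            raise ValueError("route weights must sum to a positive number")
        self._routes = [
            (name, model, weight / total) for name, (model, weight) in routes.items()
        ]

    def predict(self, payload: dict) -> Tuple[str, float]:
        if not self._routes:
            raise RuntimeError("no model is deployed behind this gateway")
        weights = [w for _, _, w in self._routes]
        name, model, _ = self._rng.choices(self._routes, weights=weights)[0]
        return name, model(payload)  # report which version served the request


def model_v1(payload: dict) -> float:
    return 0.1 * payload["x"]


def model_v2(payload: dict) -> float:
    return 0.2 * payload["x"]


gateway = StaticGateway(seed=0)
gateway.set_routes({"v1": (model_v1, 1.0)})
first = gateway.predict({"x": 10})

# Roll out v2 alongside v1 with an even split; clients change nothing.
gateway.set_routes({"v1": (model_v1, 0.5), "v2": (model_v2, 0.5)})
served = {gateway.predict({"x": 10})[0] for _ in range(200)}
```

Returning the name of the version that served each request is what makes the comparison possible: outcomes can be grouped by version to judge which one works best for a given task.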