How to Deploy ML Models
Deploying a machine learning model is one of the most important parts of an ML pipeline: it determines how applicable and accessible the model ultimately is. Building the model may be the most challenging step, but deploying it successfully is what converts your time and effort into real output. There are several important aspects of model deployment to consider, covered below.
Data access and query: Make sure that your model has easy access to the data so it can make predictions and, if needed, retrain itself on new data. There are two main ways ML pipelines query data: calling an API to fetch data stored in another service, or accepting data uploaded through a form (HTML or another framework). In either case, keep the data safe and encrypt it in transit.
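As a minimal sketch of the API route, the snippet below pulls new records from a hypothetical HTTPS endpoint (the URL, bearer token, and `since` parameter are illustrative placeholders, not a real service); serving requests over HTTPS is what keeps the transfer encrypted:

```python
import requests

# Hypothetical endpoint and token; substitute your own data service.
DATA_URL = "https://data.example.com/api/v1/records"
API_TOKEN = "your-api-token"

def fetch_new_records(since):
    """Pull records newer than `since`; HTTPS keeps the transfer encrypted."""
    resp = requests.get(
        DATA_URL,
        params={"since": since},
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()  # fail loudly on 4xx/5xx responses
    return resp.json()
```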
Data processing and storage: How well your ML model performs in production depends on how you store and process your data. If a user uploads a CSV, keeping only the raw CSV can make every run time-consuming and computationally expensive, especially when the files are huge. To counteract this, store the data in slices, or in an indexed structure such as a hash table or binary tree, so the model can access what it needs without digging through millions of rows of a CSV (see the sketch after the image below).
[Image: raw data stored as a CSV (https://unsplash.com/photos/Wpnoqo2plFA)]
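One concrete way to store the data in slices is sketched below: stream the CSV in chunks with pandas and write each slice to a columnar format like Parquet (assuming a Parquet engine such as pyarrow is installed); the hash-table or tree stores mentioned above would serve the same goal:

```python
import pandas as pd  # assumes a Parquet engine (pyarrow) is installed

def slice_csv(csv_path, out_dir, chunk_rows=100_000):
    """Stream a huge CSV in slices and store each slice as Parquet,
    so the model can later load only the slice it needs."""
    for i, chunk in enumerate(pd.read_csv(csv_path, chunksize=chunk_rows)):
        chunk.to_parquet(f"{out_dir}/part-{i:05d}.parquet")
```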
Storage of the ML infrastructure: Your machine learning model can simply be a Python file, if you retrain the model on every run, or a pickle file holding a trained Python object that can be loaded and applied to incoming data. Most simple deployments load a pickled version of a trained model and use it to predict outcomes; other ways to store trained models exist but are less common. Make sure to have enough storage space for your pickle files.
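A minimal sketch of the pickle workflow, using a toy scikit-learn model as a stand-in for a real training pipeline:

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy training step, standing in for your real pipeline.
X, y = make_classification(n_samples=100, random_state=0)
model = LogisticRegression().fit(X, y)

# Store the fitted object so the deployed service can skip retraining.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# At serving time: load once, then predict on incoming data.
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)
print(loaded.predict(X[:5]))
```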
Processing infrastructure: This is a crucial part of deploying ML models. The processing infrastructure is what loads the ML model and applies it to incoming data automatically. If you are using Python, the easiest approach is a Flask app with predefined functions that load the model and apply it to an uploaded CSV of data points. The infrastructure can also be an API that runs every hour, querying a MongoDB server (or wherever you have stored your data) for new data points and using them to make predictions. The latter suits more complicated applications where prediction and retraining take time and real-time output is difficult to produce. In such cases, the developer has to build a queue system that lets users queue their prediction jobs and have the results emailed to them, which brings us to the most important point.
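Here is a minimal sketch of such a Flask app, assuming a pickled model saved as `model.pkl` and a `/predict` route that accepts an uploaded CSV (both names are illustrative):

```python
import pickle

import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the pickled model once at startup rather than on every request.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    """Accept an uploaded CSV of data points and return predictions."""
    data = pd.read_csv(request.files["file"])
    return jsonify(predictions=model.predict(data).tolist())

if __name__ == "__main__":
    app.run()
```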
Another important aspect of the processing infrastructure is validation: make sure the user can't load data that would otherwise cause problems in the model. Remember, the ML model is just a machine and does not know how to process data structures it has never seen before. For example, a model expecting an integer cannot process the string '1'; the data needs to be converted into numerical values (integers or floats) before it is sent into the ML model.
Thus, testing the infrastructure against a variety of inputs and writing code to handle all kinds of scenarios is very important. Ask yourself questions like: 'Would adding a space in one data point make a difference?' Imagining all the possible scenarios will help you build systems that need less work later.
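A sketch of what such validation might look like, assuming a hypothetical two-column numeric schema; it strips stray whitespace and rejects anything that cannot be coerced to a number:

```python
import pandas as pd

EXPECTED_COLUMNS = ["age", "income"]  # hypothetical schema

def validate(df):
    """Reject malformed uploads before they reach the model."""
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # Strip stray whitespace, then coerce to numeric; bad cells become NaN.
    cleaned = df[EXPECTED_COLUMNS].apply(
        lambda col: pd.to_numeric(col.astype(str).str.strip(), errors="coerce")
    )
    if cleaned.isna().any().any():
        raise ValueError("Found non-numeric values after cleaning")
    return cleaned
```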
Presentation and output: You need a proper way to display the results of the model. This could just be an HTML file with dynamic variables that let you populate values like the accuracy, the predicted result, and the error. In more complicated pipelines, the API renders the result as a PDF or an email sent to the specified address. In other cases, it may store the result, which the user can then query with a key or job ID generated at the time of submission.
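As a sketch of the simplest case, an HTML template with dynamic variables rendered by Flask (the route, fields, and values are all placeholders):

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Minimal HTML with dynamic variables; the fields shown are illustrative.
RESULT_PAGE = """
<h1>Prediction results</h1>
<p>Predicted result: {{ prediction }}</p>
<p>Accuracy: {{ accuracy }}%</p>
"""

@app.route("/result/<job_id>")
def result(job_id):
    # In a real app these values would be looked up by the job ID.
    return render_template_string(RESULT_PAGE, prediction=1, accuracy=94.2)
```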
Logging of results: This is an underrated component of deploying ML models. Log key statistics and results for each run of the model to make sure everything runs smoothly. You can then build a simple script that scans the logs for specific errors or problems and highlights them on a monitoring dashboard. Logging also helps you keep track of bugs or edge cases that the infrastructure does not yet handle.
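A minimal sketch using Python's standard `logging` module to write per-run statistics to a file that a monitoring script can later scan (the file name and logged fields are illustrative):

```python
import logging

# Route run statistics to a file that a monitoring script can scan.
logging.basicConfig(
    filename="model_runs.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("ml-service")

def log_run(job_id, n_rows, accuracy):
    """Record key statistics for each prediction run (names illustrative)."""
    log.info("job=%s rows=%d accuracy=%.3f", job_id, n_rows, accuracy)
```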
Monitoring and maintenance: Maintaining the ML model and fixing it regularly is recommended but may not be necessary, depending on the context of the problem. In high-stakes, legally regulated environments like loan applications, ML models need to be monitored carefully, and any biases or drift must be fixed quickly. In other cases, where the model has seen essentially all of the population data and doesn't need retraining, such as some biological models trained on human DNA, monitoring makes little sense beyond watching for errors or bugs that might cause problems for users.
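Where monitoring is warranted, even a crude drift check helps. The sketch below flags a feature whose live mean has shifted relative to the training distribution (the threshold is an arbitrary placeholder, and regulated settings call for proper statistical tests):

```python
import numpy as np

def mean_drift(train_col, live_col, threshold=0.1):
    """Flag drift when the live mean moves more than `threshold` of the
    training standard deviation. A crude heuristic, not a formal test."""
    shift = abs(np.mean(live_col) - np.mean(train_col))
    return shift > threshold * np.std(train_col)
```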
Obtain user feedback: Test the model with a small number of users before distributing it to the general public. Make sure to collect feedback about the model's pain points and address them accordingly. If your model is very complicated, employing a UX/UI researcher may be worth the time and effort.
Cloud infrastructure and compute power: Lastly, make good estimates of the memory and compute resources each job uses, and choose your cloud infrastructure accordingly. A Flask application runs well on a free Heroku server but cannot handle more than about 200 users at a time. If you expect more simultaneous users or queries, invest in AWS servers that provide the memory and performance you need; this also lets you scale the model easily as usage grows.
In the end, your ML model's success depends on many things, including the infrastructure you develop and the infrastructure you deploy it on. A lot can go sideways in the beginning, so keep checking your logs and system usage to provide a seamless service to your users.
Here at Datatron, we offer a platform to govern and manage all of your Machine Learning, Artificial Intelligence, and Data Science Models in Production. Additionally, we help you automate, optimize, and accelerate your Machine Learning models to ensure they are running smoothly and efficiently in production. To learn more about our services, be sure to Request a Demo.