The term DevOps is already set in stone: a combination of cultural philosophies, practices, and tools that automate and integrate processes, a methodology that enables IT organizations to deliver applications and services faster. Nowadays we are also hearing a lot about MLOps, a term that is rapidly gaining momentum alongside Data Scientists, Machine Learning Engineers, AI Engineers…
MLOps is a set of practices for collaboration and communication between data scientists and operations teams, providing a scalable and controlled way to deploy machine learning models. Applying these practices increases quality, simplifies the management process, and automates the deployment of ML models in large-scale production environments, work that often spans data science, DevOps, and IT.
MLOps is evolving into an independent approach to the entire machine learning lifecycle.
Why and How It Works
The objective of MLOps is to help individuals and businesses deploy and maintain machine learning models efficiently and reliably. These goals are hard to achieve without a good procedure, framework, or philosophy to follow. MLOps helps teams become more agile and strategic in their decisions by automating model development and deployment, speeding both up and, of course, lowering operational costs.
MLOps allows data scientists to do their work without worrying about the details of model execution: by adopting the key MLOps phases, engineers can collaborate and increase the pace of model development and production.
The key phases of MLOps are as follows (sketched in code after the list):
- Data gathering
- Data analysis
- Data transformation/preparation
- Model training and development
- Model validation
- Model serving
- Model monitoring
- Model re-training
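To make these phases more concrete, here is a minimal, illustrative Python sketch using scikit-learn on a toy dataset. The dataset, model choice, accuracy threshold, and artifact file name are assumptions made for the example, not a prescribed implementation.

```python
# Minimal sketch of several MLOps phases on a toy dataset.
# The dataset, model, threshold, and file name are illustrative placeholders.
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Data gathering (a built-in toy dataset stands in for real data sources)
X, y = load_iris(return_X_y=True)

# Data transformation/preparation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Model training and development
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Model validation: fail fast if the model does not meet a minimum bar
accuracy = accuracy_score(y_test, model.predict(X_test))
assert accuracy > 0.9, f"Validation failed: accuracy {accuracy:.2f} below threshold"

# Model serving (simplified): persist the artifact for a serving layer to load
joblib.dump({"scaler": scaler, "model": model}, "model.joblib")
```

In a real pipeline each phase would be a separate, automated step, with monitoring and re-training closing the loop.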
MLOps differentiates ML model management from the traditional software development lifecycle by adding the following capabilities:
- MLOps aims to unify the release cycle for machine learning models and software applications
- MLOps enables automated testing of ML artifacts
- MLOps enables machine learning models and datasets to be built and treated as first-class artifacts within CI/CD systems (see the testing sketch after this list)
- MLOps reduces technical debt across ML models
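As a rough illustration of automated testing of ML artifacts inside a CI/CD system, the following hypothetical pytest module checks the model.joblib artifact produced in the earlier sketch; the path, feature count, and checks are assumptions.

```python
# test_model_artifact.py -- hypothetical tests a CI job could run against
# the persisted artifact from the training step ("model.joblib").
import joblib
import numpy as np

ARTIFACT_PATH = "model.joblib"  # placeholder path produced by the training step


def test_artifact_loads():
    artifact = joblib.load(ARTIFACT_PATH)
    assert "model" in artifact and "scaler" in artifact


def test_prediction_shape_and_classes():
    artifact = joblib.load(ARTIFACT_PATH)
    X = np.random.rand(5, 4)  # 4 features, matching the toy training data
    preds = artifact["model"].predict(artifact["scaler"].transform(X))
    assert preds.shape == (5,)
    assert set(preds).issubset({0, 1, 2})  # the toy dataset has three classes
```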
DevOps vs MLOps
While MLOps shares many DevOps principles, there are notable differences in execution:
- MLOps is much more experimental – Data Scientists and ML Engineers need to tweak various features related to the models themselves, but, similarly to DevOps, they also need to track and manage the data and codebase for reproducible results.
- Hybrid team composition – The team is quite different from a DevOps team, as it also includes data scientists and ML/DL engineers.
- Testing – MLOps has additional steps to control during testing, such as model validation and model training, besides conventional unit and integration testing.
- Automated deployment – You cannot just deploy an offline-trained ML model as a service; you need a multi-step pipeline to automatically retrain and deploy an ML model, which adds complexity.
- Continuous Integration – CI is no longer only about testing and validating code, but also about testing and validating data, data schemas and models.
- Continuous Deployment – CD is no longer about a single software package or service, but about a system that can automatically deploy another service or roll back changes to an existing model.
- Continuous Training – CT is a new property, not present in DevOps, that is unique to ML systems and concerns automatically retraining and serving the models (see the sketch after this list).
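The sketch below illustrates two of these ML-specific concerns under assumed column names and thresholds: validating incoming data against an expected schema (part of CI) and a simple retraining trigger driven by a monitored metric (part of CT).

```python
# Sketch of two ML-specific pipeline checks: data/schema validation (CI)
# and a retraining trigger (CT). Column names, dtypes, and the accuracy
# floor are illustrative assumptions.
import pandas as pd

EXPECTED_SCHEMA = {
    "sepal_length": "float64",
    "sepal_width": "float64",
    "petal_length": "float64",
    "petal_width": "float64",
}
ACCURACY_FLOOR = 0.90  # placeholder threshold for the monitored metric


def validate_schema(df: pd.DataFrame) -> None:
    """Fail the pipeline early if new data does not match the expected schema."""
    for column, dtype in EXPECTED_SCHEMA.items():
        if column not in df.columns:
            raise ValueError(f"Missing column: {column}")
        if str(df[column].dtype) != dtype:
            raise ValueError(f"Column {column} is {df[column].dtype}, expected {dtype}")
    if df.isnull().any().any():
        raise ValueError("Null values found in incoming data")


def should_retrain(production_accuracy: float) -> bool:
    """Trigger retraining when the monitored metric drops below the floor."""
    return production_accuracy < ACCURACY_FLOOR
```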
Important Existing Tools
The landscape of MLOps tools & technologies is changing rapidly, but an MLOps technology stack should include tooling for the following tasks (an experiment-tracking sketch follows the list):
- Data engineering
- Version control of data, ML models and code
- CI/CD pipelines
- Automating deployments and experiments
- Model performance assessment
- Model monitoring in production
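As one possible way to cover experiment tracking and model versioning, here is a minimal sketch with MLflow (one of the open-source tools listed further below); the experiment name, parameters, and metric are illustrative.

```python
# Minimal sketch of experiment tracking and model versioning with MLflow.
# Experiment name, parameters, and metric are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

mlflow.set_experiment("iris-demo")

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"C": 1.0, "max_iter": 1000}
    model = LogisticRegression(**params).fit(X_train, y_train)

    mlflow.log_params(params)                                        # hyperparameters
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))  # evaluation result
    mlflow.sklearn.log_model(model, "model")                         # versioned model artifact
```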
Depending on their requirements and constraints, organizations can implement an in-house MLOps solution by combining existing open-source libraries, but there are also fully managed services that let us build, train, and deploy ML models faster and more reliably.
The top commercial solutions from the major cloud providers are:
- AWS SageMaker
- Azure MLOps suite:
  - Azure Machine Learning
  - Azure Pipelines
  - Azure Monitor
- GCP MLOps suite:
  - Dataflow
  - AI Platform Notebooks
  - TFX
  - Kubeflow Pipelines
Besides the cloud solutions, there are many purpose-built tools that together form an MLOps ecosystem; some of the most widely used are listed below, followed by a short Optuna sketch:
- Project Jupyter
- Airflow
- Kubeflow
- MLflow
- Optuna
- Cortex
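As an example of automating experiments with one of these tools, here is a short, illustrative Optuna sketch that tunes a single hyperparameter of a scikit-learn model; the search space and number of trials are arbitrary.

```python
# Illustrative hyperparameter search with Optuna; the search space and
# number of trials are arbitrary choices for the example.
import optuna
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def objective(trial: optuna.Trial) -> float:
    # Sample the regularization strength on a log scale
    C = trial.suggest_float("C", 1e-3, 1e2, log=True)
    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(C=C, max_iter=1000)
    return cross_val_score(model, X, y, cv=3).mean()


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print("Best parameters:", study.best_params)
```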