The term DevOps is already set in stone: a combination of cultural philosophies, practices, and tools that automate and integrate processes, a methodology that enables IT organizations to deliver applications and services faster. Nowadays we are also hearing a lot about MLOps, a term that is rapidly gaining momentum alongside Data Scientists, Machine Learning Engineers, AI Engineers…
MLOps is a set of practices for collaboration and communication between data scientists and operations teams, providing a scalable and controlled way to deploy machine learning models. Applying these practices increases quality, simplifies the management process, and automates the deployment of ML models in large-scale production environments, work that often spans data science, DevOps, and IT.
MLOps is evolving into an independent approach to the entire machine learning lifecycle.
Why and How It Works
The objective of MLOps is to help individuals and businesses deploy and maintain machine learning models efficiently and reliably. These goals are hard to achieve without a good procedure, framework, or philosophy to follow. MLOps helps teams become more agile and strategic in their decisions by automating model development and deployment, speeding both up and, of course, lowering operational costs.
MLOps allows data scientists to do their work without worrying about the details of model execution: by adopting the key MLOps phases, engineers can collaborate and increase the pace of model development and production.
The key phases of MLOps are as follows (sketched in code after the list):
- Data gathering
- Data analysis
- Data transformation/preparation
- Model training and development
- Model validation
- Model serving
- Model monitoring
- Model re-training
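To make these phases more concrete, here is a minimal, illustrative Python sketch using scikit-learn on a toy dataset. The dataset, model choice, accuracy threshold, and artifact file name are assumptions made for the example, not a prescribed implementation.

```python
# Minimal sketch of several MLOps phases on a toy dataset.
# The dataset, model, threshold, and file name are illustrative placeholders.
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Data gathering (a built-in toy dataset stands in for real data sources)
X, y = load_iris(return_X_y=True)

# Data transformation/preparation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Model training and development
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Model validation: fail fast if the model does not meet a minimum bar
accuracy = accuracy_score(y_test, model.predict(X_test))
assert accuracy > 0.9, f"Validation failed: accuracy {accuracy:.2f} below threshold"

# Model serving (simplified): persist the artifact for a serving layer to load
joblib.dump({"scaler": scaler, "model": model}, "model.joblib")
```

In a real pipeline each phase would be a separate, automated step, with monitoring and re-training closing the loop.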
MLOps differentiates ML model management from the traditional software development lifecycle by adding the following capabilities:
- MLOps aims to unify the release cycle for machine learning models and software applications
- MLOps enables automated testing of ML artifacts
- MLOps enables machine learning models and datasets to be built and treated as first-class artifacts within CI/CD systems (see the testing sketch after this list)
- MLOps reduces technical debt across ML models
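As a rough illustration of automated testing of ML artifacts inside a CI/CD system, the following hypothetical pytest module checks the model.joblib artifact produced in the earlier sketch; the path, feature count, and checks are assumptions.

```python
# test_model_artifact.py -- hypothetical tests a CI job could run against
# the persisted artifact from the training step ("model.joblib").
import joblib
import numpy as np

ARTIFACT_PATH = "model.joblib"  # placeholder path produced by the training step


def test_artifact_loads():
    artifact = joblib.load(ARTIFACT_PATH)
    assert "model" in artifact and "scaler" in artifact


def test_prediction_shape_and_classes():
    artifact = joblib.load(ARTIFACT_PATH)
    X = np.random.rand(5, 4)  # 4 features, matching the toy training data
    preds = artifact["model"].predict(artifact["scaler"].transform(X))
    assert preds.shape == (5,)
    assert set(preds).issubset({0, 1, 2})  # the toy dataset has three classes
```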
DevOps vs MLOps
While MLOps shares many DevOps principles, there are notable differences in execution:
- MLOps is much more experimental – Data Scientists and ML Engineers need to tweak various features related to the models themselves, but, similarly to DevOps, they also need to track and manage the data and codebase for reproducible results.
- Hybrid team composition – The team is quite different from a DevOps team, as it also includes data scientists and ML/DL engineers.
- Testing – MLOps has additional steps to control during testing, such as model validation and model training, besides conventional unit and integration testing.
- Automated deployment – You cannot just deploy an offline-trained ML model as a service; you need a multi-step pipeline to automatically retrain and deploy an ML model, which adds complexity.
- Continuous Integration – CI is no longer only about testing and validating code, but also about testing and validating data, data schemas and models.
- Continuous Deployment – CD is no longer about a single software package or service, but about a system that can automatically deploy another service or roll back changes to an existing model.
- Continuous Training – CT is a new property, not present in DevOps, that is unique to ML systems and concerns automatically retraining and serving the models (see the sketch after this list).
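The sketch below illustrates two of these ML-specific concerns under assumed column names and thresholds: validating incoming data against an expected schema (part of CI) and a simple retraining trigger driven by a monitored metric (part of CT).

```python
# Sketch of two ML-specific pipeline checks: data/schema validation (CI)
# and a retraining trigger (CT). Column names, dtypes, and the accuracy
# floor are illustrative assumptions.
import pandas as pd

EXPECTED_SCHEMA = {
    "sepal_length": "float64",
    "sepal_width": "float64",
    "petal_length": "float64",
    "petal_width": "float64",
}
ACCURACY_FLOOR = 0.90  # placeholder threshold for the monitored metric


def validate_schema(df: pd.DataFrame) -> None:
    """Fail the pipeline early if new data does not match the expected schema."""
    for column, dtype in EXPECTED_SCHEMA.items():
        if column not in df.columns:
            raise ValueError(f"Missing column: {column}")
        if str(df[column].dtype) != dtype:
            raise ValueError(f"Column {column} is {df[column].dtype}, expected {dtype}")
    if df.isnull().any().any():
        raise ValueError("Null values found in incoming data")


def should_retrain(production_accuracy: float) -> bool:
    """Trigger retraining when the monitored metric drops below the floor."""
    return production_accuracy < ACCURACY_FLOOR
```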
Important Existing Tools
The landscape of MLOps tools & technologies is changing rapidly, but an MLOps technology stack should include tooling for the following tasks (an experiment-tracking sketch follows the list):
- Data engineering
- Version control of data, ML models and code
- CI/CD pipelines
- Automating deployments and experiments
- Model performance assessment
- Model monitoring in production
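As one possible way to cover experiment tracking and model versioning, here is a minimal sketch with MLflow (one of the open-source tools listed further below); the experiment name, parameters, and metric are illustrative.

```python
# Minimal sketch of experiment tracking and model versioning with MLflow.
# Experiment name, parameters, and metric are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

mlflow.set_experiment("iris-demo")

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"C": 1.0, "max_iter": 1000}
    model = LogisticRegression(**params).fit(X_train, y_train)

    mlflow.log_params(params)                                        # hyperparameters
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))  # evaluation result
    mlflow.sklearn.log_model(model, "model")                         # versioned model artifact
```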
Depending on their requirements and constraints, organizations can implement an in-house MLOps solution by combining existing open-source libraries, but there are also fully managed services that let us build, train, and deploy ML models faster and more reliably.
The top commercial solutions from the major cloud providers are:
- AWS SageMaker
- Azure MLOps suite:
  - Azure Machine Learning
  - Azure Pipelines
  - Azure Monitor
- GCP MLOps suite:
  - Dataflow
  - AI Platform Notebooks
  - TFX
  - Kubeflow Pipelines
Besides the cloud solutions, there are many purpose-built tools that together form an MLOps ecosystem; some of the most widely used are listed below, followed by a short Optuna sketch:
- Project Jupyter
- Airflow
- Kubeflow
- MLflow
- Optuna
- Cortex
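As an example of automating experiments with one of these tools, here is a short, illustrative Optuna sketch that tunes a single hyperparameter of a scikit-learn model; the search space and number of trials are arbitrary.

```python
# Illustrative hyperparameter search with Optuna; the search space and
# number of trials are arbitrary choices for the example.
import optuna
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def objective(trial: optuna.Trial) -> float:
    # Sample the regularization strength on a log scale
    C = trial.suggest_float("C", 1e-3, 1e2, log=True)
    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(C=C, max_iter=1000)
    return cross_val_score(model, X, y, cv=3).mean()


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print("Best parameters:", study.best_params)
```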