MLOps Pipeline — from experiment to production

28. 07. 2020 · Updated: 24. 03. 2026 · 3 min read · CORE SYSTEMS · devops
This article was published in 2020. Some information may be outdated.

87% of machine learning models never make it to production. The problem isn't in the algorithms — it's in operations. MLOps is a discipline that brings DevOps principles to the ML world: versioning, automation, monitoring, reproducibility. Here's our approach to building an MLOps pipeline in 2020.

Why a Jupyter notebook isn't enough

A data scientist creates a model in a Jupyter notebook. It works great on their laptop, on their dataset, with their version of libraries. And then comes the question: “How do we deploy this to production?”

This is where the problem begins. A notebook isn’t a deployment artifact. It has no tests, no dependency management, no data versioning, no monitoring. The transition from experiment to production is a manual, painful and non-reproducible process that typically takes 3-6 months.

MLOps pipeline — five layers

Based on experience from projects in 2020, we’ve defined a five-layer MLOps pipeline:

1. Data management

Everything starts with data. And data changes — new records, fixed bugs, changed schema. Without data versioning, you can’t reproduce experiments.

  • DVC (Data Version Control): Git for data. Versions large files and datasets using metadata in git and storage in S3/Azure Blob
  • Feature Store: central repository of feature definitions — Feast (open-source) or managed solutions in Azure ML. Consistent features for training and serving
  • Data validation: automatic data quality checks before training — Great Expectations for schema validation, distribution checks, missing values (see the sketch after this list)
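One hedged sketch of what such a pre-training check could look like with the classic Great Expectations pandas API; the file path, column names and bounds are made up for the example, and return shapes differ between library versions:

import great_expectations as ge

# Wrap the training data in a Great Expectations dataset (illustrative path)
df = ge.read_csv("data/training.csv")

# Basic schema and quality expectations before training
checks = [
    df.expect_column_to_exist("customer_id"),
    df.expect_column_values_to_not_be_null("customer_id"),
    df.expect_column_values_to_be_between("age", min_value=18, max_value=100),
]

# Abort the pipeline if any expectation fails
if not all(check["success"] for check in checks):
    raise ValueError("Data validation failed - aborting training")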

2. Experiment tracking

Every experiment must be recorded — hyperparameters, metrics, artifacts, code and data versions. MLflow Tracking is the de facto standard in 2020:

import mlflow

with mlflow.start_run():
    # Log hyperparameters of this run
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("n_estimators", 200)

    # Train and evaluate the model
    model = train_model(X_train, y_train)
    mlflow.log_metric("auc", evaluate(model, X_test))

    # Store the serialized model as a run artifact
    mlflow.sklearn.log_model(model, "model")

Every run is trackable, comparable and reproducible. No more "I trained that good model last Thursday, but I have no idea which parameters I used."

3. Model registry and CI/CD

MLflow Model Registry provides a central catalog of models with lifecycle management — Staging, Production, Archived. Model promotion from development to production goes through an automated pipeline (a code sketch follows the list):

  1. Data scientist registers model in registry as “Staging”
  2. CI pipeline runs automated tests — unit tests on prediction function, integration tests on API endpoint, performance tests on latency
  3. Automatic metric validation — new model must be better than current production model (A/B test on holdout dataset)
  4. Code review and approval from ML engineering team
  5. Automatic deployment to production — canary deployment with gradual traffic ramp-up
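One hedged sketch of the registry calls behind steps 1 and 5, using the MLflow client API; the model name and run URI are placeholders, not taken from the project described here:

import mlflow
from mlflow.tracking import MlflowClient

# Step 1: register the model logged in the current run under a registry name
version = mlflow.register_model("runs:/<run_id>/model", "recommendation-model")

client = MlflowClient()
client.transition_model_version_stage(
    name="recommendation-model",
    version=version.version,
    stage="Staging",
)

# Step 5: after tests, metric validation and approval, the CI pipeline
# promotes the same version by transitioning it to stage="Production"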

4. Model serving

How to deploy the model? In 2020 we see three main patterns:

  • REST API: Model packaged in Flask/FastAPI container behind load balancer. Simple, universal, but latency depends on network roundtrip
  • Batch inference: Spark job that periodically scores entire dataset. Suitable for recommendation systems, scoring, segmentation
  • Edge deployment: Model converted to ONNX and deployed directly in application or on edge device. Minimal latency, no network dependency

For an e-commerce client, we deployed a recommendation model as Azure ML real-time endpoint — managed Kubernetes cluster, autoscaling based on request queue depth, A/B testing between model versions. Average inference latency: 23ms.
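For the REST API pattern, a minimal sketch of a FastAPI prediction endpoint could look like the following; the registry URI, feature shape and response format are illustrative assumptions, not the client setup described above:

from typing import List

import mlflow.pyfunc
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model once at startup; the registry URI is illustrative
model = mlflow.pyfunc.load_model("models:/recommendation-model/Production")

class PredictionRequest(BaseModel):
    features: List[float]  # illustrative flat feature vector

@app.post("/predict")
def predict(request: PredictionRequest):
    # pyfunc models accept a numpy array or pandas DataFrame
    prediction = model.predict(np.array([request.features]))
    return {"prediction": float(prediction[0])}

In practice such a container would sit behind the load balancer mentioned in the list above.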

5. Monitoring and retraining

Models in production degrade over time. Data drift means the distribution of input data changes; concept drift means the relationship between inputs and outputs shifts.
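One simple, hedged way to detect data drift on a single numeric feature is a two-sample Kolmogorov-Smirnov test comparing the training distribution with recent production inputs; the threshold and column names below are illustrative:

from scipy.stats import ks_2samp

def feature_drifted(train_values, production_values, p_threshold=0.01):
    # Two-sample Kolmogorov-Smirnov test: a low p-value means the two
    # samples are unlikely to come from the same distribution
    statistic, p_value = ks_2samp(train_values, production_values)
    return p_value < p_threshold

# Illustrative usage: compare a feature's training distribution with
# the values seen in production over the last week
# drifted = feature_drifted(train_df["age"], recent_prod_df["age"])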

Tags: mlops, machine learning, mlflow, kubeflow