MLOps Pipeline — from experiment to production

28. 07. 2020 · Updated: 24. 03. 2026 · 3 min read · CORE SYSTEMS · devops
This article was published in 2020. Some information may be outdated.

87% of machine learning models never make it to production. The problem isn't in the algorithms — it's in operations. MLOps is a discipline that brings DevOps principles to the ML world: versioning, automation, monitoring, reproducibility. Here's our approach to building an MLOps pipeline in 2020.

Why a Jupyter notebook isn't enough

A data scientist creates a model in a Jupyter notebook. It works great on their laptop, on their dataset, with their version of libraries. And then comes the question: “How do we deploy this to production?”

This is where the problem begins. A notebook isn’t a deployment artifact. It has no tests, no dependency management, no data versioning, no monitoring. The transition from experiment to production is a manual, painful and non-reproducible process that typically takes 3-6 months.

MLOps pipeline — five layers

Based on experience from projects in 2020, we’ve defined a five-layer MLOps pipeline:

1. Data management

Everything starts with data. And data changes — new records, fixed bugs, changed schema. Without data versioning, you can’t reproduce experiments.

  • DVC (Data Version Control): Git for data. Versions large files and datasets using metadata in git and storage in S3/Azure Blob
  • Feature Store: central repository of feature definitions — Feast (open-source) or managed solutions in Azure ML. Consistent features for training and serving
  • Data validation: automatic data quality checks before training — Great Expectations for schema validation, distribution checks, missing values (see the sketch after this list)
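One hedged sketch of what such a pre-training check could look like with the classic Great Expectations pandas API; the file path, column names and bounds are made up for the example, and return shapes differ between library versions:

import great_expectations as ge

# Wrap the training data in a Great Expectations dataset (illustrative path)
df = ge.read_csv("data/training.csv")

# Basic schema and quality expectations before training
checks = [
    df.expect_column_to_exist("customer_id"),
    df.expect_column_values_to_not_be_null("customer_id"),
    df.expect_column_values_to_be_between("age", min_value=18, max_value=100),
]

# Abort the pipeline if any expectation fails
if not all(check["success"] for check in checks):
    raise ValueError("Data validation failed - aborting training")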

2. Experiment tracking

Every experiment must be recorded — hyperparameters, metrics, artifacts, code and data versions. MLflow Tracking is the de facto standard in 2020:

import mlflow

with mlflow.start_run():
    # Log hyperparameters of this run
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("n_estimators", 200)

    # Train and evaluate the model
    model = train_model(X_train, y_train)
    mlflow.log_metric("auc", evaluate(model, X_test))

    # Store the serialized model as a run artifact
    mlflow.sklearn.log_model(model, "model")

Every run is trackable, comparable and reproducible. No more "I trained that good model last Thursday, but I have no idea which parameters I used."

3. Model registry and CI/CD

MLflow Model Registry provides a central catalog of models with lifecycle management — Staging, Production, Archived. Model promotion from development to production goes through an automated pipeline (a code sketch follows the list):

  1. Data scientist registers model in registry as “Staging”
  2. CI pipeline runs automated tests — unit tests on prediction function, integration tests on API endpoint, performance tests on latency
  3. Automatic metric validation — new model must be better than current production model (A/B test on holdout dataset)
  4. Code review and approval from ML engineering team
  5. Automatic deployment to production — canary deployment with gradual traffic ramp-up
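One hedged sketch of the registry calls behind steps 1 and 5, using the MLflow client API; the model name and run URI are placeholders, not taken from the project described here:

import mlflow
from mlflow.tracking import MlflowClient

# Step 1: register the model logged in the current run under a registry name
version = mlflow.register_model("runs:/<run_id>/model", "recommendation-model")

client = MlflowClient()
client.transition_model_version_stage(
    name="recommendation-model",
    version=version.version,
    stage="Staging",
)

# Step 5: after tests, metric validation and approval, the CI pipeline
# promotes the same version by transitioning it to stage="Production"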

4. Model serving

How to deploy the model? In 2020 we see three main patterns:

  • REST API: Model packaged in Flask/FastAPI container behind load balancer. Simple, universal, but latency depends on network roundtrip
  • Batch inference: Spark job that periodically scores entire dataset. Suitable for recommendation systems, scoring, segmentation
  • Edge deployment: Model converted to ONNX and deployed directly in application or on edge device. Minimal latency, no network dependency

For an e-commerce client, we deployed a recommendation model as Azure ML real-time endpoint — managed Kubernetes cluster, autoscaling based on request queue depth, A/B testing between model versions. Average inference latency: 23ms.
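For the REST API pattern, a minimal sketch of a FastAPI prediction endpoint could look like the following; the registry URI, feature shape and response format are illustrative assumptions, not the client setup described above:

from typing import List

import mlflow.pyfunc
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model once at startup; the registry URI is illustrative
model = mlflow.pyfunc.load_model("models:/recommendation-model/Production")

class PredictionRequest(BaseModel):
    features: List[float]  # illustrative flat feature vector

@app.post("/predict")
def predict(request: PredictionRequest):
    # pyfunc models accept a numpy array or pandas DataFrame
    prediction = model.predict(np.array([request.features]))
    return {"prediction": float(prediction[0])}

In practice such a container would sit behind the load balancer mentioned in the list above.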

5. Monitoring and retraining

Models in production degrade over time. Data drift means the distribution of input data changes; concept drift means the relationship between inputs and outputs shifts.
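One simple, hedged way to detect data drift on a single numeric feature is a two-sample Kolmogorov-Smirnov test comparing the training distribution with recent production inputs; the threshold and column names below are illustrative:

from scipy.stats import ks_2samp

def feature_drifted(train_values, production_values, p_threshold=0.01):
    # Two-sample Kolmogorov-Smirnov test: a low p-value means the two
    # samples are unlikely to come from the same distribution
    statistic, p_value = ks_2samp(train_values, production_values)
    return p_value < p_threshold

# Illustrative usage: compare a feature's training distribution with
# the values seen in production over the last week
# drifted = feature_drifted(train_df["age"], recent_prod_df["age"])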

Tags: mlops, machine learning, mlflow, kubeflow