
MLOps Pipeline — from experiment to production

28. 07. 2020 · 3 min read · CORE SYSTEMS · devops

87% of machine learning models never make it to production. The problem isn’t in the algorithms — it’s in operations. MLOps is a discipline that brings DevOps principles to the ML world: versioning, automation, monitoring, reproducibility. Here’s our approach to an MLOps pipeline in 2020.

Why a Jupyter notebook isn’t enough

A data scientist creates a model in a Jupyter notebook. It works great on their laptop, on their dataset, with their version of libraries. And then comes the question: “How do we deploy this to production?”

This is where the problem begins. A notebook isn’t a deployment artifact. It has no tests, no dependency management, no data versioning, no monitoring. The transition from experiment to production is a manual, painful and non-reproducible process that typically takes 3-6 months.

MLOps pipeline — five layers

Based on experience from projects in 2020, we’ve defined a five-layer MLOps pipeline:

1. Data management

Everything starts with data. And data changes — new records, bug fixes, schema changes. Without data versioning, you can’t reproduce experiments.

  • DVC (Data Version Control): Git for data. Versions large files and datasets using metadata in git and storage in S3/Azure Blob
  • Feature Store: central repository of feature definitions — Feast (open-source) or managed solutions in Azure ML. Consistent features for training and serving
  • Data validation: automatic data quality checks before training — Great Expectations for schema validation, distribution checks and missing values (see the sketch after this list)
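To make the validation step concrete, here is a minimal sketch using the classic pandas API of Great Expectations. The file path, column names and thresholds are hypothetical and stand in for your own training schema; the exact shape of the validation result may differ slightly between library versions.

import great_expectations as ge
import pandas as pd

# Hypothetical training dataset -- column names are illustrative only
raw = pd.read_csv("data/orders.csv")
df = ge.from_pandas(raw)

# Schema check: the columns we train on must be present, in the expected order
df.expect_table_columns_to_match_ordered_list(
    ["customer_id", "order_amount", "created_at"]
)

# Missing values and a basic range check on the distribution
df.expect_column_values_to_not_be_null("customer_id")
df.expect_column_values_to_be_between("order_amount", min_value=0, max_value=100_000)

# Evaluate all registered expectations and stop the pipeline on any violation
results = df.validate()
if not results["success"]:
    raise ValueError("Data validation failed, aborting training run")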

2. Experiment tracking

Every experiment must be recorded — hyperparameters, metrics, artifacts, code and data versions. MLflow Tracking is the de facto standard in 2020:

import mlflow
import mlflow.sklearn

with mlflow.start_run():
    # Log hyperparameters of this run
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("n_estimators", 200)

    # Train and evaluate the model
    model = train_model(X_train, y_train)
    mlflow.log_metric("auc", evaluate(model, X_test))

    # Store the trained model as a run artifact
    mlflow.sklearn.log_model(model, "model")

Every run is trackable, comparable and reproducible. No more “I trained a good model last Thursday, but I don’t know which parameters I used”.

3. Model registry and CI/CD

MLflow Model Registry provides a central catalog of models with lifecycle management — Staging, Production, Archived. Model promotion from development to production goes through an automated pipeline:

  1. The data scientist registers the model in the registry as “Staging” (see the sketch after this list)
  2. The CI pipeline runs automated tests — unit tests on the prediction function, integration tests on the API endpoint, performance tests on latency
  3. Automatic metric validation — the new model must be better than the current production model (A/B test on a holdout dataset)
  4. Code review and approval from the ML engineering team
  5. Automatic deployment to production — canary deployment with gradual traffic ramp-up
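As an illustration of steps 1 and 5, a minimal sketch using the MLflow Model Registry client. The model name churn-classifier and the run URI are hypothetical, and the tests, metric validation and approvals in between are assumed to run in the CI pipeline itself.

import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Step 1: register the freshly trained model under a hypothetical name.
# The run ID placeholder would come from the training run in the CI context.
model_uri = "runs:/<run_id>/model"
mv = mlflow.register_model(model_uri, "churn-classifier")

# Move the new version to Staging, where the automated tests run against it
client.transition_model_version_stage(
    name="churn-classifier", version=mv.version, stage="Staging"
)

# Step 5: after tests, metric validation and review pass, promote to Production
client.transition_model_version_stage(
    name="churn-classifier", version=mv.version, stage="Production"
)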

4. Model serving

How to deploy the model? In 2020 we see three main patterns:

  • REST API: the model is packaged in a Flask/FastAPI container behind a load balancer. Simple and universal, but latency depends on the network roundtrip (see the sketch after this list)
  • Batch inference: a Spark job periodically scores the entire dataset. Suitable for recommendation systems, scoring and segmentation
  • Edge deployment: the model is converted to ONNX and deployed directly in the application or on an edge device. Minimal latency, no network dependency
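A minimal sketch of the REST pattern, assuming the model sits in the MLflow Model Registry and is served from a FastAPI container. The model name and the feature fields are illustrative; a real service mirrors the training schema.

import mlflow.pyfunc
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the Production version of a hypothetical registered model at startup
model = mlflow.pyfunc.load_model("models:/churn-classifier/Production")

class PredictionRequest(BaseModel):
    # Illustrative feature set
    days_since_last_order: int
    order_amount: float

@app.post("/predict")
def predict(request: PredictionRequest):
    # pyfunc models accept a pandas DataFrame and return an array of predictions
    features = pd.DataFrame([request.dict()])
    prediction = model.predict(features)
    return {"prediction": float(prediction[0])}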

For an e-commerce client, we deployed a recommendation model as an Azure ML real-time endpoint — managed Kubernetes cluster, autoscaling based on request queue depth, A/B testing between model versions. Average inference latency: 23 ms.

5. Monitoring and retraining

Models in production degrade over time. Data drift means the distribution of the input data changes; concept drift means the relationship between inputs and outputs shifts. Both have to be detected by continuously comparing production inputs and predictions against the training baseline, and both are the trigger for retraining.
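As a sketch of what a data drift check can look like, the snippet below compares the training distribution of each monitored feature with a recent production window using a two-sample Kolmogorov–Smirnov test from scipy. The p-value threshold, feature names and the retraining hook are hypothetical.

import pandas as pd
from scipy.stats import ks_2samp

def detect_data_drift(train_df: pd.DataFrame, prod_df: pd.DataFrame,
                      features: list, p_threshold: float = 0.01) -> dict:
    """Flag features whose production distribution differs from training."""
    drifted = {}
    for feature in features:
        # Two-sample KS test: a small p-value suggests the distributions differ
        statistic, p_value = ks_2samp(train_df[feature].dropna(),
                                      prod_df[feature].dropna())
        if p_value < p_threshold:
            drifted[feature] = {"ks_statistic": statistic, "p_value": p_value}
    return drifted

# Hypothetical usage: trigger retraining when any monitored feature drifts
# drifted = detect_data_drift(train_df, last_week_df, ["order_amount", "items_per_order"])
# if drifted:
#     trigger_retraining_pipeline(drifted)  # assumed orchestration hook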

Tags: mlops, machine learning, mlflow, kubeflow
