87% of machine learning models never make it to production. The problem isn’t in the algorithms — it’s in operations. MLOps is a discipline that brings DevOps principles to the ML world: versioning, automation, monitoring, reproducibility. Here’s our approach to building an MLOps pipeline in 2020.
Why a Jupyter notebook isn’t enough
A data scientist creates a model in a Jupyter notebook. It works great on their laptop, on their dataset, with their version of libraries. And then comes the question: “How do we deploy this to production?”
This is where the problem begins. A notebook isn’t a deployment artifact. It has no tests, no dependency management, no data versioning, no monitoring. The transition from experiment to production is a manual, painful and non-reproducible process that typically takes 3-6 months.
MLOps pipeline — five layers
Based on experience from projects in 2020, we’ve defined a five-layer MLOps pipeline:
1. Data management
Everything starts with data. And data changes — new records arrive, bugs get fixed, the schema evolves. Without data versioning, you can’t reproduce experiments.
- DVC (Data Version Control): Git for data. Versions large files and datasets using metadata in git and storage in S3/Azure Blob
- Feature Store: central repository of feature definitions — Feast (open-source) or managed solutions in Azure ML. Consistent features for training and serving
- Data validation: automatic data quality checks before training — Great Expectations for schema validation, distribution checks and missing values (see the sketch after this list)
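To make the first layer concrete, here is a minimal sketch of a load-then-validate step, assuming the dataset is versioned with DVC and loaded into pandas. The file path, the "v1.2" tag and the column names (user_id, price) are illustrative, not from a specific project:

import dvc.api
import pandas as pd
import great_expectations as ge

# Pull an exact, tagged revision of the training data through the DVC Python API
# (the path and the "v1.2" tag are hypothetical)
with dvc.api.open("data/train.csv", rev="v1.2") as f:
    df = pd.read_csv(f)

# Wrap the DataFrame so Great Expectations checks become available on it
df_ge = ge.from_pandas(df)

# Fail fast before training if basic quality checks do not hold
assert df_ge.expect_column_values_to_not_be_null("user_id").success
assert df_ge.expect_column_values_to_be_between("price", min_value=0).success

Pinning the data revision next to the code revision is what makes an experiment reproducible: the same git commit plus the same DVC tag yields the same training run.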
2. Experiment tracking
Every experiment must be recorded — hyperparameters, metrics, artifacts, code and data versions. MLflow Tracking is the de facto standard in 2020:
import mlflow
import mlflow.sklearn

with mlflow.start_run():
    # Log hyperparameters so the run can be reproduced later
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("n_estimators", 200)

    model = train_model(X_train, y_train)

    # Log the evaluation metric and the serialized model as artifacts
    mlflow.log_metric("auc", evaluate(model, X_test))
    mlflow.sklearn.log_model(model, "model")
Every run is trackable, comparable and reproducible. No more “I trained that good model last Thursday, but I don’t know with what parameters”.
3. Model registry and CI/CD
MLflow Model Registry provides a central catalog of models with lifecycle management — Staging, Production, Archived. Model promotion from development to production goes through an automated pipeline (sketched in code after the list):
- The data scientist registers the model in the registry as “Staging”
- The CI pipeline runs automated tests — unit tests on the prediction function, integration tests on the API endpoint, performance tests on latency
- Automatic metric validation — the new model must beat the current production model on a holdout dataset
- Code review and approval from the ML engineering team
- Automatic deployment to production — canary deployment with a gradual traffic ramp-up
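As a sketch of the first step, registering a model and moving it to Staging via the MLflow client; the run ID and the "recommender" name are placeholders, not from a real project:

import mlflow
from mlflow.tracking import MlflowClient

# run_id comes from the training run above (e.g. run.info.run_id)
run_id = "<run id from the tracking step>"
version = mlflow.register_model(f"runs:/{run_id}/model", "recommender")

# Start in Staging; the CI pipeline gates the promotion to Production
client = MlflowClient()
client.transition_model_version_stage(
    name="recommender",
    version=version.version,
    stage="Staging",
)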
4. Model serving
How to deploy the model? In 2020 we see three main patterns:
- REST API: Model packaged in a Flask/FastAPI container behind a load balancer. Simple and universal, but latency depends on the network round trip (see the sketch after this list)
- Batch inference: Spark job that periodically scores entire dataset. Suitable for recommendation systems, scoring, segmentation
- Edge deployment: Model converted to ONNX and deployed directly in application or on edge device. Minimal latency, no network dependency
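As an illustration of the REST pattern, a minimal FastAPI sketch that loads a model from the MLflow registry at startup; the "recommender" name and the flat feature vector are assumptions for the example:

from typing import List

import mlflow.sklearn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the Production version from the registry once at startup
# (the "recommender" name is illustrative)
model = mlflow.sklearn.load_model("models:/recommender/Production")

class PredictRequest(BaseModel):
    features: List[float]

@app.post("/predict")
def predict(req: PredictRequest):
    # scikit-learn models expect a 2D array: one row per sample
    prediction = model.predict([req.features])
    return {"prediction": prediction.tolist()}

Run it with uvicorn inside the container; the model loads once per worker, not once per request, which keeps the per-call latency down to the network round trip plus inference.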
For an e-commerce client, we deployed a recommendation model as Azure ML real-time endpoint — managed Kubernetes cluster, autoscaling based on request queue depth, A/B testing between model versions. Average inference latency: 23ms.
5. Monitoring and retraining
Models in production degrade. Data drift — the distribution of input data changes. Concept drift — the relationship between inputs and outputs shifts. Both need continuous monitoring, so that detected drift raises an alert and ultimately triggers retraining before quality visibly drops.
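One concrete way to detect data drift is the population stability index (PSI) computed per feature. This is a generic sketch, not a specific library’s API; the 0.2 threshold is a common rule of thumb, not a universal constant:

import numpy as np

def population_stability_index(expected, actual, bins=10):
    # Bin edges come from the training-time (expected) distribution
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid log(0) in sparse bins
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct))

# Rule of thumb: PSI above ~0.2 signals drift worth investigating;
# a retraining trigger can hang off exactly this kind of check

Comparing a window of live traffic against the training sample feature by feature turns “the model feels worse” into a number the pipeline can alert on.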