Training a model is half the work. Getting it into production, monitoring performance, and safely updating it — that’s the other, harder half.
Model Serving on Kubernetes
We use Seldon Core to orchestrate model serving on Kubernetes. Each deployment is an inference graph: pre-processing → model → post-processing. Seldon scales replicas automatically based on request rate and exposes every model over both REST and gRPC endpoints.
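As a sketch of what the model step in such a graph can look like, here is a minimal Seldon Core Python wrapper. The class name, model path, and joblib usage are illustrative assumptions, not Seldon requirements:

```python
# Minimal Seldon Core model wrapper (sketch). Seldon's Python server
# wraps a class like this and exposes predict() over REST and gRPC.
# Class name and model path are illustrative assumptions.
import joblib

class SentimentModel:
    def __init__(self):
        # Load the trained model once at container startup.
        self.model = joblib.load("/mnt/models/model.joblib")

    def predict(self, X, features_names=None):
        # X arrives as a numpy array built from the request payload.
        return self.model.predict_proba(X)

# Typically served with: seldon-core-microservice SentimentModel
```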
A/B Testing ML Models
We never deploy a new model to 100% of traffic at once. With a canary deployment, 5% of traffic goes to the new model and 95% to the existing one. We compare business metrics (conversion rate, not just offline accuracy), and if the new model wins, we roll it out gradually.
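Seldon can express this traffic split declaratively, but the core idea fits in a few lines. The following request-level sketch is hypothetical (the function and model names are placeholders); the variant tag it returns is what lets us attribute business metrics to each model:

```python
import random
from typing import Any, Callable, Tuple

def route(features: Any,
          stable_model: Callable[[Any], Any],
          canary_model: Callable[[Any], Any],
          canary_fraction: float = 0.05) -> Tuple[Any, str]:
    """Send ~canary_fraction of requests to the new model, the rest
    to the stable one. The variant tag is logged with each prediction
    so conversion rate can be compared per model downstream."""
    if random.random() < canary_fraction:
        return canary_model(features), "canary"
    return stable_model(features), "stable"
```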
Model Monitoring
We track prediction latency, error rate, feature drift (is the distribution of the input data changing?), and prediction drift (is the model's output distribution changing?). We use Alibi Detect for drift detection and alert when thresholds are exceeded.
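A minimal feature-drift check with Alibi Detect's Kolmogorov–Smirnov detector might look like this; the reference sample, production batch, and p-value threshold below are illustrative:

```python
import numpy as np
from alibi_detect.cd import KSDrift

# Reference sample drawn from the training distribution (synthetic here).
x_ref = np.random.randn(1000, 10)
detector = KSDrift(x_ref, p_val=0.05)  # per-feature two-sample KS test

# A recent batch of production inputs, shifted to simulate drift.
x_prod = np.random.randn(200, 10) + 0.3
result = detector.predict(x_prod)

if result["data"]["is_drift"]:
    print("Feature drift detected: raise an alert / schedule retraining")
```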
ML in Production = Continuous Delivery
Model deployment is a DevOps problem: A/B testing, canary releases, and monitoring follow the same principles as for any other software.