MLflow serves us well for experiment tracking, but end-to-end ML pipelines need more than a tracker. We evaluated two candidates: Kubeflow (self-hosted) and Vertex AI (managed).
Kubeflow on AKS
An open-source ML platform on Kubernetes. It provides pipelines defined as DAGs, Jupyter notebooks, Katib for hyperparameter tuning, and KFServing (since renamed KServe) for model serving. Advantage: full control. Disadvantage: it is operationally demanding; upgrading Kubeflow is like upgrading a small operating system.
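To make "pipelines as DAGs" concrete, here is a minimal sketch in plain Python. It does not use the Kubeflow Pipelines SDK; it only illustrates the idea that each step declares the steps it depends on and a scheduler derives a valid execution order. The step names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "ingest": set(),
    "validate": {"ingest"},
    "featurize": {"validate"},
    "profile_data": {"validate"},   # independent of featurize, could run in parallel
    "train": {"featurize"},
    "evaluate": {"train"},
    "deploy": {"evaluate", "profile_data"},
}


def execution_order(dag):
    """Return one valid order in which the steps can run.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    i.e. if the graph is not actually a DAG.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step in execution_order(PIPELINE):
        print(step)
```

Steps with no dependency between them (`featurize` and `profile_data` here) may appear in either order, which is exactly the freedom a pipeline engine uses to run them in parallel.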
Vertex AI (GCP)
A managed ML platform from Google. It offers AutoML for non-ML engineers, custom training jobs, managed pipelines, and model monitoring. Advantage: no infrastructure to operate. Disadvantages: vendor lock-in and cost.
Our Decision
A hybrid approach: Kubeflow Pipelines on AKS for custom workloads, and Vertex AI AutoML for rapid prototypes and smaller projects. MLflow remains the shared experiment tracker across both platforms.
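The routing rule above can be sketched as a small helper. This is an illustration under stated assumptions, not our actual tooling: the `Workload` type, the `kind` labels, the platform identifiers, and the tracking URL are all hypothetical. `MLFLOW_TRACKING_URI` and `MLFLOW_EXPERIMENT_NAME` are the environment variables the MLflow client reads; how results from a managed AutoML run get logged to MLflow is out of scope here.

```python
from dataclasses import dataclass

# Hypothetical address of the shared MLflow tracking server.
TRACKING_URI = "https://mlflow.example.internal"


@dataclass(frozen=True)
class Workload:
    name: str
    kind: str  # assumed labels: "custom", "prototype" or "small"


def choose_platform(workload: Workload) -> str:
    """Map a workload to a platform following the hybrid rule."""
    if workload.kind == "custom":
        return "kubeflow-aks"
    if workload.kind in ("prototype", "small"):
        return "vertex-automl"
    raise ValueError(f"unknown workload kind: {workload.kind!r}")


def tracking_settings(workload: Workload) -> dict:
    """Settings a launcher script might export before starting a run.

    The tracking URI is the same regardless of platform, so runs from
    both sides end up in one MLflow instance.
    """
    return {
        "PLATFORM": choose_platform(workload),  # hypothetical variable
        "MLFLOW_TRACKING_URI": TRACKING_URI,
        "MLFLOW_EXPERIMENT_NAME": workload.name,
    }


if __name__ == "__main__":
    print(tracking_settings(Workload("churn-model", "custom")))
    print(tracking_settings(Workload("pricing-poc", "prototype")))
```

Keeping the platform choice in one function means the rule can change (for example, when a prototype graduates to a custom workload) without touching the code that launches runs.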
There Is No Single Right Platform
The choice depends on the team, the budget, and how you weigh control against simplicity.