Machine learning isn’t magic. It’s mathematics, data and lots of experimentation. Here we share what we learned from our first ML projects.
scikit-learn for 80% of problems
In 2020 you have three main choices: scikit-learn (classical ML), TensorFlow (deep learning, from Google) and PyTorch (deep learning, from Facebook). In our experience, about 80% of problems are solved by classical algorithms: Random Forest, XGBoost, logistic regression. (XGBoost is a separate library, but it offers a scikit-learn-compatible interface.) We reach for deep learning only for NLP and computer vision.
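To make the "classical algorithms first" point concrete, here is a minimal sketch of comparing two scikit-learn baselines with cross-validation. The data is synthetic (generated by `make_classification`), and the model settings and the `compare_models` helper are illustrative assumptions, not the setup from our projects.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a tabular business dataset (not real client data).
# weights=[0.8, 0.2] makes the positive class the minority, as is typical for churn.
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=8,
    weights=[0.8, 0.2],
    random_state=0,
)

# Two classical baselines; hyperparameters are illustrative defaults.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def compare_models(models, X, y, cv=5):
    """Return the mean cross-validated ROC AUC for each model (hypothetical helper)."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in models.items()
    }


scores = compare_models(models, X, y)
for name, auc in scores.items():
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```

A loop like this is cheap to run, so it is a reasonable first step before considering anything heavier.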
First project: churn prediction
The client was a telco: 500K customers, 47 features, 18 months of history. XGBoost won with an AUC of 0.87. We spent 70% of our time on the data: cleaning and feature engineering. The biggest improvement came from better features, not from a better algorithm.
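For readers new to the metric: AUC can be read as the probability that a randomly chosen churner gets a higher score than a randomly chosen non-churner. The sketch below computes it that way in plain Python. The `roc_auc` function and the labels and scores are toy illustrations, not project data, and the pairwise loop is only practical for small inputs.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example: 1 = churned, 0 = stayed; scores are made-up model outputs.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 8 of 9 pairs are ranked correctly
```

On real data you would use a library implementation such as `sklearn.metrics.roc_auc_score`, which does the same job far more efficiently.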
What surprised us
Production deployment is hard. Getting from a Jupyter notebook to production, with monitoring and versioning, is a completely different discipline. Explainability mattered too: the client wanted to know not only “who will leave”, but also “why”. We added SHAP for interpretation.
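To give a feel for what answering the "why" involves, here is a minimal sketch of the idea SHAP builds on: Shapley values, which split one prediction into per-feature contributions. This is not the `shap` library and not our production code. It computes exact Shapley values by brute force over all feature subsets, replaces "missing" features with a single baseline customer (a simplification), and uses a hand-written toy model with hypothetical feature names.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction (brute force, small feature counts only).

    Features outside a coalition take their value from `baseline`.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {f: (instance[f] if f in coalition else baseline[f]) for f in names}
        return predict(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {f}) - value(coalition))
        phi[f] = total
    return phi


def toy_churn_score(x):
    """Hypothetical hand-written scoring rule, standing in for a trained model."""
    score = 0.1 + 0.05 * x["support_calls"] - 0.02 * x["tenure_years"]
    if x["support_calls"] >= 3 and x["on_contract"] == 0:
        score += 0.2  # interaction: many support calls and no contract
    return score


# Made-up customer and reference point, for illustration only.
customer = {"support_calls": 4, "tenure_years": 1, "on_contract": 0}
baseline = {"support_calls": 1, "tenure_years": 5, "on_contract": 1}

contributions = shapley_values(toy_churn_score, customer, baseline)
for feature, phi in contributions.items():
    print(f"{feature}: {phi:+.3f}")
```

The contributions add up to the difference between this customer's score and the baseline score, which is what makes them usable as an explanation per customer. The brute-force loop grows exponentially with the number of features; libraries like SHAP exist precisely to make this tractable for real models.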
ML isn’t rocket science, but it’s not trivial either
scikit-learn, a quality dataset and basic statistics will get you far. But a production ML system is complex, and we’ll write more about that.