
Monitoring & Predictive Maintenance

Fix it before it breaks. Not after.

Continuous monitoring, anomaly detection, Remaining Useful Life prediction. Reduce unplanned outages by up to 70%.

-70%
Unplanned outages
-25%
Maintenance costs
+20%
Equipment lifetime
<12 months
ROI

From reactive to predictive maintenance

Most industrial companies run maintenance reactively or on a time-based schedule. Both are suboptimal:

Reactive maintenance: The machine breaks → we fix it. Unplanned outage, rush-ordered spare parts, overtime. The most expensive approach — an unplanned outage costs 3-10× more than a planned one.

Time-based maintenance: Replace the bearing every 6 months, regardless of its actual condition. We replace parts that still work (waste), and failures between intervals still catch us by surprise.

Predictive maintenance: Continuous condition monitoring. Maintenance when data says “there will be a problem in 2 weeks” — not sooner (waste), not later (failure). Optimal timing, minimal outages.

ROI in numbers

Industry consensus (McKinsey, Deloitte):

  • 25-30% reduction in maintenance costs
  • 70-75% reduction in unplanned outages
  • 20-25% extension of equipment lifetime
  • 35-45% reduction in spare parts inventory

Unplanned production line outage: significant cost per hour (depends on industry). One predicted and prevented outage pays for the project.

Condition monitoring

Sensors and measured quantities

Vibration — the most sensitive indicator of mechanical condition:

  • Accelerometers on bearing housings, motors, gearboxes
  • FFT (Fast Fourier Transform) analysis of the frequency spectrum
  • Characteristic frequencies: BPFO, BPFI, BSF, FTF for different bearing defect types
  • Envelope analysis for detection of early-stage defects
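
To make the FFT step concrete, here is a minimal sketch on a simulated signal: a 50 Hz shaft component plus a weaker 160 Hz tone standing in for a bearing defect frequency. Both frequencies and the sample rate are illustrative, not values from a real machine.

```python
import numpy as np

# Simulated vibration signal: 50 Hz shaft component plus a weaker 160 Hz
# "defect" tone (illustrative of a bearing fault frequency) and noise.
fs = 10_000                                   # sample rate in Hz
t = np.arange(0, 1.0, 1 / fs)                 # 1 second of data
rng = np.random.default_rng(0)
signal = (np.sin(2 * np.pi * 50 * t)
          + 0.3 * np.sin(2 * np.pi * 160 * t)
          + 0.1 * rng.standard_normal(t.size))

# Magnitude spectrum via FFT; with 1 s of data the resolution is exactly 1 Hz.
spectrum = np.abs(np.fft.rfft(signal)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)

# The two strongest spectral lines recover the shaft and defect frequencies.
peaks = sorted(freqs[np.argsort(spectrum)[-2:]].tolist())
print(peaks)   # → [50.0, 160.0]
```

In practice the defect line sits at a bearing-specific frequency (BPFO, BPFI, BSF or FTF) computed from the bearing geometry and shaft speed.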

Temperature:

  • Surface temperature of motors, bearings, transformers
  • Thermal cameras for non-contact measurement
  • Trend: a slow increase above baseline = lubricant degradation, overload, clogged cooler

Electrical quantities:

  • Current and voltage — Motor Current Signature Analysis (MCSA)
  • Phase asymmetry = winding problem
  • Current increase at constant load = mechanical resistance

Other:

  • Acoustic emission — ultrasonic detection of leaks, partial discharge
  • Pressure — hydraulic systems, compressors, filtration systems
  • Flow — cooling circuits, lubrication systems
  • Humidity — transformer oil, insulation

Sensor infrastructure

Retrofit without cabling:

Wireless vibration sensors (ABB, SKF, Fluke) with a 3-5 year battery life. Installation takes minutes — a magnet or adhesive mount on the bearing housing. Communication via BLE, Wi-Fi or LoRaWAN to a gateway → cloud/edge.

Integration with existing PLCs:

Most modern PLCs already collect data from analogue inputs. No need to add sensors — just export the data via OPC-UA. Caveat: PLCs typically sample slowly (~1 Hz), while vibration analysis requires 10-50 kHz sampling, hence the dedicated sensors.

Anomaly detection

Baseline — what is normal

Every machine has its own “fingerprint” of normal operation. A vibration at 120 Hz with an amplitude of 2.5 mm/s is normal for motor X under load Y. A statistical model of normal operation:

  • Training period: 2-4 weeks of data collection in normal operation
  • Feature extraction: Statistical features (mean, std, RMS, kurtosis, crest factor), frequency features (dominant frequencies, spectral entropy)
  • Baseline model: Multivariate normal distribution or autoencoder
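
The feature-extraction step can be sketched as follows. The window of “healthy” data here is synthetic Gaussian noise, and the feature set mirrors the statistical features listed above:

```python
import numpy as np

# Statistical features for one window of vibration samples, matching the
# feature list above. The "healthy" window here is synthetic Gaussian data.
def extract_features(window):
    std = window.std()
    rms = np.sqrt(np.mean(window ** 2))
    return {
        "mean": window.mean(),
        "std": std,
        "rms": rms,
        # Fisher kurtosis: near 0 for Gaussian data, rises with impact-type defects
        "kurtosis": np.mean((window - window.mean()) ** 4) / std ** 4 - 3,
        # Crest factor: peak-to-RMS ratio, typically ~3-4 for a healthy signal
        "crest_factor": np.max(np.abs(window)) / rms,
    }

rng = np.random.default_rng(1)
features = extract_features(rng.normal(0.0, 1.0, 10_000))
```

Each monitored window is condensed into such a feature vector, and the baseline model is then fitted over these vectors rather than over raw samples.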

Anomaly detection methods

Statistical methods (quick to deploy):

  • Z-score per feature. Alert when |z| > 3 (3 sigma).
  • Control charts (Shewhart, CUSUM, EWMA). Industrial standard, well understood.
  • Advantage: interpretable, low false positives, no training required.
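
A minimal sketch of one of these statistical methods, an EWMA control chart. The smoothing factor lam = 0.2 and the 3-sigma limit are conventional textbook defaults rather than tuned values, and the drifting data is simulated:

```python
import numpy as np

# Minimal EWMA control chart for one monitored feature (e.g. vibration RMS
# in mm/s). mu0 and sigma0 come from the baseline period.
def ewma_alarms(x, mu0, sigma0, lam=0.2, L=3.0):
    limit = L * sigma0 * np.sqrt(lam / (2 - lam))  # asymptotic control limit
    z, alarms = mu0, []
    for i, xi in enumerate(x):
        z = lam * xi + (1 - lam) * z               # exponentially weighted mean
        if abs(z - mu0) > limit:
            alarms.append(i)
    return alarms

rng = np.random.default_rng(2)
baseline = rng.normal(2.5, 0.1, 200)                          # normal operation
drift = rng.normal(2.5, 0.1, 100) + np.linspace(0, 0.6, 100)  # slow degradation
alarms = ewma_alarms(np.concatenate([baseline, drift]), mu0=2.5, sigma0=0.1)
# The simulated degradation (indices 200+) trips the chart well before the
# full 0.6 mm/s shift is reached.
```

The smoothing makes the chart sensitive to small sustained shifts, which is exactly the slow-degradation pattern predictive maintenance cares about.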

ML methods (more accurate, more complex):

  • Isolation Forest: Unsupervised, tree-based. Anomalies are points that are “easily isolated”. Fast, low memory.
  • One-class SVM: Learns a boundary around normal data. Everything outside = anomaly.
  • Autoencoder: A neural network compresses and reconstructs the input. High reconstruction error = anomaly. Handles multivariate, non-linear patterns.
  • Temporal models: An LSTM predicts the next timestep. A large deviation between prediction and actual = anomaly.
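
As a sketch of the first method, here is an Isolation Forest trained on two synthetic features per reading (vibration RMS and bearing temperature). The contamination setting and the example readings are illustrative only:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Isolation Forest trained on two synthetic features per reading:
# vibration RMS (mm/s) and bearing temperature (°C). Values are illustrative.
rng = np.random.default_rng(3)
normal = rng.normal([2.5, 60.0], [0.2, 2.0], size=(500, 2))

model = IsolationForest(contamination=0.01, random_state=0).fit(normal)

print(model.predict([[2.6, 61.0]]))   # inlier: close to normal operation
print(model.predict([[4.5, 85.0]]))   # anomaly: hot and vibrating
```

`predict` returns 1 for inliers and -1 for anomalies; the `contamination` parameter sets what fraction of the training data the model treats as borderline.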

Alert management

Not every anomaly is an alarm. Hierarchy:

  1. Info: Deviation detected, trend monitoring activated
  2. Warning: Deviation persists / grows. Schedule an inspection.
  3. Alert: High probability of failure within X days. Schedule maintenance.
  4. Critical: Immediate risk. Consider shutdown.
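
The hierarchy above could be encoded roughly like this; the 2-day and 14-day thresholds are illustrative placeholders, not recommended values:

```python
from typing import Optional

# Illustrative mapping from model outputs to the four alert levels above.
# The 2-day and 14-day thresholds are placeholders, not recommendations.
def alert_level(days_to_failure: Optional[float], deviation_persists: bool) -> str:
    if days_to_failure is not None and days_to_failure <= 2:
        return "critical"   # immediate risk, consider shutdown
    if days_to_failure is not None and days_to_failure <= 14:
        return "alert"      # high failure probability, schedule maintenance
    if deviation_persists:
        return "warning"    # persistent or growing deviation, schedule an inspection
    return "info"           # deviation detected, keep watching the trend
```

In a real deployment the thresholds differ per machine class and are tuned together with the false-positive rate discussed below.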

Alarm fatigue prevention: too many false alarms and operators start ignoring them. Target: <5% false positive rate. Alert tuning is a continuous process.

Remaining Useful Life (RUL) Prediction

Anomaly detection says: “something is wrong.” RUL prediction says: “it will last approximately 14 more days.”

Approaches

Physics-based models: Mathematical model of the degradation process (Paris law for crack propagation, Archard equation for wear). Accurate, but requires deep domain knowledge and specific failure mode understanding.
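
A sketch of the Paris-law idea: numerically integrating crack growth to estimate cycles until a critical crack length. The material constants C and m and the geometry factor Y are illustrative values, not real material data:

```python
import math

# Paris law da/dN = C * (dK)^m with dK = Y * dsigma * sqrt(pi * a).
# Units here: a in metres, dsigma in MPa, dK in MPa*sqrt(m).
# C, m and Y are illustrative values, not real material data.
def cycles_to_failure(a0, a_crit, dsigma, C=1e-11, m=3.0, Y=1.0, steps=10_000):
    """Integrate crack length from a0 to a_crit, summing dN = da / (C * dK^m)."""
    da = (a_crit - a0) / steps
    a, cycles = a0, 0.0
    for _ in range(steps):
        dK = Y * dsigma * math.sqrt(math.pi * a)
        cycles += da / (C * dK ** m)
        a += da
    return cycles

# A higher stress range consumes the remaining life faster:
n_100 = cycles_to_failure(a0=0.001, a_crit=0.025, dsigma=100)
n_150 = cycles_to_failure(a0=0.001, a_crit=0.025, dsigma=150)
```

The same integration run from the currently measured crack length gives a physics-based RUL estimate, which is why these models need the specific failure mode to be well understood.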

Data-driven models:

  1. Training data: Historical run-to-failure data — sensors from installation to failure. The more run-to-failure cycles, the better the model.
  2. Feature engineering: Sliding window statistics, trend features, frequency domain features.
  3. Model: Gradient boosted trees (XGBoost, LightGBM) for tabular data. LSTM/Transformer for sequence data. Survival analysis (Cox regression) for censored data.
  4. Output: Predicted RUL in days/hours + confidence interval.
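
A toy version of steps 1-3, using scikit-learn's gradient boosting as a self-contained stand-in for XGBoost/LightGBM and trained on synthetic run-to-failure histories:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)

def run_to_failure(life):
    """Synthetic machine life: a wear indicator rises toward failure at t = life."""
    t = np.arange(life)
    wear = (t / life) ** 2 + rng.normal(0, 0.02, life)  # degradation feature
    rul = life - t                                      # label: cycles remaining
    return np.column_stack([wear, np.gradient(wear)]), rul

# Five historical run-to-failure cycles with different lifetimes.
X_parts, y_parts = zip(*(run_to_failure(life) for life in (300, 400, 500, 350, 450)))
X, y = np.vstack(X_parts), np.concatenate(y_parts)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
# A machine with a high wear reading should get a short predicted RUL.
pred = model.predict([[0.8, 0.004]])[0]
```

Real feature matrices would include many more sliding-window and frequency-domain features, but the structure — rows of features, RUL as the regression target — is the same.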

Hybrid: Physics-informed neural networks — ML model with physical constraints. Best of both worlds.

Without historical failure data

No run-to-failure data? (Most companies don’t.) We start with:

  1. Anomaly detection — catches deviations from normal
  2. Degradation tracking — trends in key indicators
  3. Expert knowledge — maintenance team knows what indicators mean
  4. Gradual accumulation of failure data — each failure = training data for future model

Typically after 1-2 years of operation we accumulate enough data for a supervised RUL model.

IoT dashboards

Grafana as the visualisation platform

  • Fleet overview: All machines on one screen. Green/yellow/red. Drill-down to detail.
  • Machine detail: Real-time sensor data, trends, historical comparison, predictions.
  • Shift report: KPIs per shift — OEE (Overall Equipment Effectiveness), availability, performance, quality.
  • Maintenance view: Open work orders, upcoming predictions, spare parts inventory.

Alerting integration

Alert from monitoring → automatic workflow:

  1. Anomaly detected → alert in Grafana
  2. Work order created in CMMS (SAP PM, Maximo, Fiix)
  3. Assignment to technician (automatic or manual)
  4. Technician diagnoses and repairs
  5. Repair confirmation → work order closed
  6. Feedback to ML model (was the prediction correct?)
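
Steps 1-2 of the workflow could be glued together roughly like this. The payload and work-order field names are invented for illustration; a real SAP PM / Maximo / Fiix integration would use that system's own API:

```python
# Hypothetical glue between alert and CMMS: turning an alert payload into a
# work-order record. All field names here are invented for illustration.
def alert_to_work_order(alert):
    # Map alert severity to a CMMS priority (1 = highest); unknown -> lowest.
    priority = {"critical": 1, "alert": 2, "warning": 3}.get(alert["severity"], 4)
    return {
        "asset_id": alert["machine_id"],
        "priority": priority,
        "description": f"{alert['metric']} anomaly: {alert['message']}",
        "source": "condition-monitoring",  # lets the CMMS trace the order back
    }

work_order = alert_to_work_order({
    "machine_id": "PUMP-07",
    "severity": "alert",
    "metric": "vibration_rms",
    "message": "RMS trending 40% above baseline",
})
```

Tagging the work order with its monitoring source is what makes step 6 possible: closed orders can be joined back to the predictions that triggered them.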

Mobile access

Maintenance team in the field needs data on their phones:

  • Responsive Grafana dashboards
  • Push notifications on new alerts
  • QR code on the machine → opens the machine dashboard on mobile
  • Offline access to runbooks and documentation

Technology stack

Sensors: ABB, SKF, Fluke wireless vibration. Industrial RTD/thermocouple. Current transformers.

Data pipeline: MQTT, OPC-UA, Kafka, InfluxDB, TimescaleDB.

ML: scikit-learn, XGBoost, PyTorch (LSTM/Transformer), MLflow, Kubeflow.

Visualisation: Grafana, custom web dashboards.

Integration: SAP PM, IBM Maximo, Fiix CMMS, custom work order systems.

Edge: Processing on the edge for real-time anomaly detection (see Edge Computing).

Frequently asked questions

What data do we need to get started?

Minimum: vibration, temperature, current/voltage, operating hours. Ideally: historical failure data (what broke, when, under what conditions). The more historical data, the more accurate the predictions. But even without a failure history we can start with anomaly detection.

How accurate are the predictions?

It depends on data quality and quantity. Typically: anomaly detection catches 85-95% of incipient failures; RUL prediction achieves ±15-20% accuracy (i.e. a predicted 14 days → actual failure in 12-17 days). Accuracy improves with more data.

Do we need to install new sensors?

Often not. We use data from existing PLCs and SCADA systems. If key sensors are missing (e.g. vibration on critical bearings), we recommend a retrofit — wireless sensors without any cabling.

How long does implementation take?

A pilot on 5-10 machines: 2-3 months. Scale-out to the entire operation: 6-12 months. Typical payback period: 6-12 months, through fewer unplanned outages and optimised planned maintenance.

Have a project?

Let's talk about it.

Schedule a meeting