Network Operations Center for a Czech Telecom Operator

Leading Czech telecom operator

Challenge

The client, the largest Czech telecommunications operator with more than 8 million customers, faced a fundamental challenge in 2024. The company was preparing for a massive 5G network rollout across the entire Czech Republic while simultaneously needing to modernise its existing network monitoring infrastructure. The existing solution, built on a combination of proprietary tools and internally developed scripts, was reaching its limits.

The main problem was fragmented visibility of the network infrastructure: data from the various network segments (access network, transport network, core network, mobile RAN) was stored in separate systems with no unified view. Network Operations Centre (NOC) operators had to switch between five different consoles to diagnose a single incident. The mean time to identify the root cause of an outage (MTTI) averaged 47 minutes, which was unacceptable for the planned 5G network with its stringent SLAs.

With the arrival of 5G, the number of network elements grew sharply: thousands of small cells, new gNodeB base stations, and edge computing nodes generated a volume of telemetry data that the existing platform could not process. The operator needed a solution capable of processing more than 2 million metrics per second with a latency below 5 seconds.

The operator’s management approached CORE SYSTEMS with a request to design and implement a unified next-generation monitoring platform that would cover both the existing 4G/LTE infrastructure and the upcoming 5G network.

Solution

CORE SYSTEMS designed and implemented a platform called NetPulse, a centralised monitoring system built on open-source technologies with a proprietary integration layer. The key design principles were horizontal scalability and a modular architecture that allows new data sources to be connected gradually.

The solution was divided into four main phases:

Phase 1 — Data Ingestion Layer: Building a universal collection layer capable of receiving data from heterogeneous sources — SNMP traps, syslog, streaming telemetry (gNMI/gRPC), NETCONF, and proprietary APIs from individual vendors (Ericsson, Nokia, Huawei). All data is normalised to a unified data model and published to Apache Kafka clusters.

Phase 2 — Stream Processing: Implementation of a real-time analytics engine based on Apache Flink for event correlation, anomaly detection, and automatic root cause analysis. The system uses a combination of rules engines and ML models trained on the operator’s historical incidents.

Phase 3 — NOC Dashboard: Development of custom Grafana dashboards with geographic coverage visualisation, signal heat maps, hierarchical network topology display, and drill-down functionality from the overall overview down to an individual port on a specific device.

Phase 4 — 5G Integration: Extension of the platform with 5G-specific metrics — monitoring of network slicing, edge computing nodes, beamforming parameters, and handover statistics between 4G and 5G.

Architecture

The NetPulse platform runs on a Kubernetes cluster deployed in the operator’s private data centre in Prague, with a disaster recovery replica in Brno. The architecture is designed as event-driven microservices:

The collection layer consists of a fleet of collectors — lightweight containerised agents specialised for individual protocols. Each collector implements an adapter for a specific type of data source and transforms raw data into a canonical format (Protocol Buffers). Collectors run as Kubernetes DaemonSets on dedicated worker nodes.
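The adapter idea behind the collectors can be sketched in a few lines of Python. This is a minimal illustration, not the NetPulse code: the `CanonicalEvent` fields, the adapter class names, and the keys of the raw dictionaries are all assumptions, and a plain dataclass stands in for the Protocol Buffers message that the platform actually uses.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CanonicalEvent:
    """Hypothetical canonical record (the real platform uses Protocol Buffers)."""
    source: str       # protocol the event arrived on
    element_id: str   # network element that emitted it
    metric: str       # normalised metric or event name
    value: float
    timestamp: float  # seconds since the Unix epoch


class Adapter(Protocol):
    def to_canonical(self, raw: dict) -> CanonicalEvent: ...


class SnmpTrapAdapter:
    """Maps an already decoded SNMP trap (assumed dict layout) to the canonical form."""

    def to_canonical(self, raw: dict) -> CanonicalEvent:
        return CanonicalEvent(
            source="snmp",
            element_id=raw["agent_addr"],
            metric=raw["oid_name"],
            value=float(raw["value"]),
            timestamp=float(raw["received_at"]),
        )


class SyslogAdapter:
    """Maps a parsed syslog record (assumed dict layout); severity becomes the value."""

    def to_canonical(self, raw: dict) -> CanonicalEvent:
        return CanonicalEvent(
            source="syslog",
            element_id=raw["hostname"],
            metric=raw["app_name"],
            value=float(raw["severity"]),
            timestamp=float(raw["timestamp"]),
        )


# One adapter per protocol; new data sources are added by registering another adapter.
ADAPTERS: dict[str, Adapter] = {
    "snmp": SnmpTrapAdapter(),
    "syslog": SyslogAdapter(),
}


def normalise(protocol: str, raw: dict) -> CanonicalEvent:
    """Pick the adapter for the protocol and return a canonical event."""
    return ADAPTERS[protocol].to_canonical(raw)
```

Because every adapter returns the same record type, everything downstream of the collectors can stay unaware of which protocol or vendor produced an event.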

The messaging backbone is provided by an Apache Kafka cluster with 12 brokers, partitioned by regions and network segment types. Kafka Streams handles simple transformations and enrichment (adding geolocation, mapping to inventory), while more complex analytics runs in the Flink cluster.
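Partitioning by region and segment type could be expressed as a composite message key. The sketch below is illustrative only: the key format, the function names, and the partition count in the usage note are assumptions, and CRC32 stands in for the murmur2 hash that Kafka's default Java partitioner applies to keys.

```python
import zlib


def partition_key(region: str, segment: str) -> str:
    """Build a composite key so that events from one region and segment stay together."""
    return f"{region}:{segment}"


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition number.

    CRC32 is an illustrative stand-in for the client library's own key hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

For example, `partition_for(partition_key("prague", "ran"), 48)` always returns the same partition, so all RAN events from that region are consumed in order by a single consumer.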

Apache Flink performs alarm correlation using sliding-window operations: it groups related events within a 30-second time window and identifies the root cause. ML models for anomaly detection (isolation forests, LSTM networks) run as Flink user-defined functions (UDFs) and are regularly retrained on new data.
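The windowed correlation can be illustrated without Flink. The following plain-Python sketch keeps a sliding 30-second window of alarms and names as root cause the alarming element that sits upstream of the most other alarming elements. The topology map, the element names, and the "most downstream alarms" heuristic are assumptions made for illustration; the source does not describe the actual correlation rules.

```python
from collections import deque
from dataclasses import dataclass

WINDOW_SECONDS = 30.0


@dataclass(frozen=True)
class Alarm:
    element_id: str
    timestamp: float  # seconds


# Hypothetical topology: child element -> upstream parent (assumed to be a tree).
UPSTREAM = {"cell-1": "router-a", "cell-2": "router-a", "router-a": "core-1"}


def ancestors(element_id: str, upstream: dict) -> list:
    """Return every element upstream of element_id, nearest first."""
    chain = []
    current = upstream.get(element_id)
    while current is not None:
        chain.append(current)
        current = upstream.get(current)
    return chain


def root_cause(window, upstream: dict) -> str:
    """Pick the alarming element that is upstream of the most other alarming elements."""
    alarming = {alarm.element_id for alarm in window}

    def downstream_count(candidate: str) -> int:
        return sum(candidate in ancestors(other, upstream) for other in alarming)

    # sorted() makes tie-breaking deterministic.
    return max(sorted(alarming), key=downstream_count)


def correlate(alarms, upstream: dict, window_seconds: float = WINDOW_SECONDS):
    """Yield (alarm, suspected root cause) pairs using a sliding time window."""
    window = deque()
    for alarm in sorted(alarms, key=lambda a: a.timestamp):
        window.append(alarm)
        # Evict alarms that have fallen out of the window.
        while alarm.timestamp - window[0].timestamp > window_seconds:
            window.popleft()
        yield alarm, root_cause(window, upstream)
```

With alarms from `router-a`, `cell-1`, and `cell-2` arriving within ten seconds of each other, each of them is attributed to `router-a`; an isolated `cell-1` alarm more than 30 seconds later is attributed to the cell itself.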

The data layer combines Prometheus for short-term metrics (15-day retention), Elasticsearch for logs and events (90-day retention), and PostgreSQL with the TimescaleDB extension for long-term trends and reporting (2-year retention). ClickHouse serves as the analytics engine for ad-hoc queries over large volumes of historical data.
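The retention tiers translate into a simple lookup. This sketch only encodes the retention periods stated above; the dictionary keys, the function name, and the 730-day approximation of two years are assumptions.

```python
# Retention per store in days, taken from the figures above (2 years taken as 730 days).
RETENTION_DAYS = {
    "prometheus": 15,      # short-term metrics
    "elasticsearch": 90,   # logs and events
    "timescaledb": 730,    # long-term trends and reporting
}


def is_retained(store: str, age_days: float) -> bool:
    """Return True if data of the given age is still within the store's retention."""
    return age_days <= RETENTION_DAYS[store]
```

A query for a metric from 30 days ago, for instance, can no longer be served from Prometheus (`is_retained("prometheus", 30)` is `False`) and has to go to the long-term store.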

The presentation layer is built on Grafana with custom plugins developed by CORE SYSTEMS — in particular a plugin for network topology visualisation with automatic layout and an interactive map plugin displaying signal coverage overlaid on a map of the Czech Republic.

The platform also monitors itself (meta-monitoring) and is backed up with Velero to S3-compatible object storage.

Results

The deployment of the NetPulse platform delivered measurable results for the client within the first six months of operation:

Operational efficiency: The mean time to identify the root cause of an outage (MTTI) dropped from 47 minutes to 11 minutes thanks to automatic alarm correlation. The overall mean time to repair (MTTR) fell by 62%, directly improving customer satisfaction and SLA compliance.

Unified view: NOC operators now work with a single console covering all 12,000+ network elements across the entire Czech Republic. Geographic visualisation enables immediate identification of regional issues and switching between logical and physical views of the network.

5G rollout support: The platform provided critical data for 5G coverage planning — analysis of load on existing 4G cells helped identify locations with the highest priority for 5G deployment. Network slicing monitoring enabled the operator to offer enterprise customers guaranteed SLA parameters.

Capacity planning: Predictive models analysing trends in network element utilisation can forecast capacity expansion needs with 89% accuracy 6 weeks in advance, enabling proactive investment planning.
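The source does not say which predictive models are used. As a deliberately simple stand-in, the sketch below fits a least-squares line to weekly utilisation samples and extrapolates it six weeks ahead; the function names, the sample data, and the threshold are hypothetical.

```python
def fit_line(samples):
    """Least-squares fit of y = intercept + slope * week for equally spaced weekly samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    weeks = range(n)
    mean_x = sum(weeks) / n
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in weeks)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(samples, weeks_ahead=6):
    """Extrapolate the fitted trend weeks_ahead weeks past the last sample."""
    slope, intercept = fit_line(samples)
    return intercept + slope * (len(samples) - 1 + weeks_ahead)


def needs_expansion(samples, threshold, weeks_ahead=6):
    """Flag an element whose forecast utilisation reaches the threshold."""
    return forecast(samples, weeks_ahead) >= threshold
```

A production model would account for seasonality and uncertainty, which a straight line ignores; the sketch only shows the shape of the "forecast, then compare with a threshold" step.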

Financial impact: Automating routine tasks and reducing false alarms by 74% saves CZK 8.5 million a year in NOC operating costs. Escalations to second- and third-level support fell by 41%.

Automation: The platform automatically resolves 23% of common incidents without operator intervention — for example automatic service restarts, switching to backup routes, or escalating to the vendor with a complete diagnostic package.
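Automatic resolution of this kind is commonly built as a mapping from incident type to a remediation playbook. The sketch below shows that dispatch pattern only; the incident types, the dictionary fields, and the action functions are hypothetical, and the actions return a description instead of touching any real system.

```python
def restart_service(incident: dict) -> str:
    return f"restarted {incident['service']}"


def switch_to_backup_route(incident: dict) -> str:
    return f"rerouted {incident['link']} via backup route"


def escalate_to_vendor(incident: dict) -> str:
    return f"escalated to {incident['vendor']} with diagnostic package"


# Hypothetical mapping from incident type to playbook.
PLAYBOOKS = {
    "service_down": restart_service,
    "link_degraded": switch_to_backup_route,
    "hardware_fault": escalate_to_vendor,
}


def remediate(incident: dict):
    """Run the matching playbook, or return None to leave the incident to an operator."""
    action = PLAYBOOKS.get(incident["type"])
    if action is None:
        return None
    return action(incident)
```

Incident types without a playbook fall through to the operator, which keeps automation limited to cases that have an explicitly defined response.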

Technology

The NetPulse project uses a modern technology stack optimised for high throughput and low latency:

  • Apache Kafka — messaging backbone, 2M+ messages/s, geo-replication between Prague and Brno data centres
  • Apache Flink — stream processing, event correlation, real-time analytics
  • Prometheus + Thanos — metrics collection and long-term storage with a global query view
  • Elasticsearch — full-text log search, Watcher-based alerting
  • Grafana — visualisation and dashboarding with custom plugins for telco specifics
  • PostgreSQL + TimescaleDB — relational data, inventory, configuration database
  • ClickHouse — OLAP analytics over historical data
  • Kubernetes (OpenShift) — container orchestration in an on-premise environment
  • Python — ML models, data pipelines, integration scripts
  • Ansible + ArgoCD — deployment automation and GitOps workflow

The collaboration between CORE SYSTEMS and the client continues under a long-term managed service contract covering ongoing platform development, ML model training, and support for the expansion of 5G coverage.
