
CATEGORY

Data

61 articles

intermediate 12. 11. 2025

MongoDB Aggregation Pipeline

How to query MongoDB efficiently. $match, $group, $lookup, $unwind.

intermediate 30. 10. 2025

ETL vs ELT — When to Use Which Approach for Data Pipelines

Comparison of ETL and ELT approaches to data pipelines. When to transform data before loading it and when to transform it afterwards.

intermediate 21. 10. 2025

ChromaDB Tutorial

Open-source vector DB — installation and querying.

intermediate 20. 10. 2025

The Complete Guide to PostgreSQL

A complete PostgreSQL guide: installation, SQL, indexes, JSONB, replication, backup.

intermediate 14. 10. 2025

SQL vs NoSQL

Relational vs non-relational databases — CAP theorem and use cases.

intermediate 01. 10. 2025

How to Scale an Application from 0 to 1M Users

Progressive web application scaling — from a single server to a million users.

intermediate 25. 09. 2025

Lakehouse Architecture — Merging Data Lake and Warehouse

Lakehouse combines the flexibility of a data lake with the reliability of a warehouse. Medallion architecture.

intermediate 24. 09. 2025

Parquet, Avro, ORC — Serialization Formats for Data Engineering

Comparison of the data formats Parquet, Avro, ORC and JSON. When to use which in a data pipeline.

intermediate 17. 09. 2025

MongoDB vs PostgreSQL

Document vs relational database — when to choose which.

intermediate 07. 09. 2025

The Complete Guide to Elasticsearch

Elasticsearch: full-text search, aggregations, logging, monitoring.

intermediate 22. 08. 2025

Spark Structured Streaming — Unified Batch and Stream Processing

Spark Structured Streaming combines batch and stream in one API. Micro-batch and Delta Lake integration.

intermediate 21. 08. 2025

Batch Processing

How to process large data volumes efficiently. Chunking, streaming and parallel processing.

intermediate 13. 08. 2025

System Design Interview: Preparation

How to prepare for a system design interview — framework, examples, resources.

intermediate 13. 08. 2025

Looker — BI Platform with LookML Modeling Layer

Google's Looker with LookML layer. Central metric definitions and governance.

intermediate 12. 08. 2025

OWASP Top 10: Injection

SQL injection, NoSQL injection, OS command injection — how they work and how to defend against them.

intermediate 04. 08. 2025

Read Replicas — Scaling Reads

Scaling databases with read replicas. Master-slave replication and routing.

intermediate 05. 07. 2025

ClickHouse — Columnar Database for Lightning-Fast Analytical Queries

ClickHouse is an open-source columnar OLAP database. MergeTree engine and materialized views.

intermediate 10. 06. 2025

Docker Compose for Development

Local development with docker-compose. Multi-container setup, volumes and networking.

intermediate 02. 06. 2025

Trino — Distributed SQL Engine for Federated Queries

Trino is a distributed SQL engine for querying heterogeneous sources without moving data.

intermediate 05. 04. 2025

Hadoop Ecosystem — HDFS, YARN and Modern Alternatives

The Hadoop ecosystem from HDFS to Hive. History and the transition to modern cloud solutions.

intermediate 22. 03. 2025

PostgreSQL Replication

Streaming and logical replication, failover.

intermediate 23. 02. 2025

DataHub — Open Data Catalog for Modern Data Stack

DataHub from LinkedIn is an open-source data catalog. Metadata, lineage and governance.

intermediate 03. 02. 2025

Redis Patterns — Cache, Session, Queue

Redis as cache, session store, pub/sub, rate limiter.

intermediate 12. 12. 2024

PostgreSQL JSON Operations

JSONB type, operators, indexes and practical examples.

intermediate 13. 11. 2024

PostgreSQL Indexes Deep Dive

B-tree, GIN, GiST, BRIN, partial and expression indexes.

intermediate 06. 11. 2024

PostgreSQL vs MySQL

The two most popular open-source SQL databases.

intermediate 08. 09. 2024

Apache Flink — Real-time Stream Processing Engine

Apache Flink is a framework for stateful stream processing. Windowing, event time and exactly-once semantics.

intermediate 10. 07. 2024

PostgreSQL Advanced Features

JSONB, CTE, window functions, partitioning, extensions.

intermediate 27. 04. 2024

Connection Pooling

How to properly pool database connections. PgBouncer, HikariCP, SQLAlchemy.

intermediate 24. 03. 2024

OLAP vs OLTP — Analytical vs Transactional Databases

The difference between OLAP and OLTP databases. Columnar vs row storage and how to choose between them for different use cases.

intermediate 16. 03. 2024

Dagster — Modern Orchestration with Asset-Based Approach

Dagster brings an asset-oriented approach to orchestration. Software-defined assets, type system and monitoring.

intermediate 02. 03. 2024

WebSocket — Real-Time Communication

WebSocket server implementation for chat, notifications and live dashboards. Scaling with Redis.

intermediate 12. 01. 2024

AWS RDS — Managed Databases

Relational Database Service. Multi-AZ, Read Replicas, Aurora, backup, and performance tuning.

intermediate 26. 11. 2023

Real-Time Analytics — Architecture for Real-Time Analysis

Real-time analytics architecture. Lambda vs Kappa, streaming pipelines and OLAP engines.

intermediate 20. 09. 2023

Debezium — Change Data Capture for Real-time Replication

Debezium is an open-source CDC platform. Capturing database changes via Kafka Connect.

intermediate 07. 08. 2023

When to Use NoSQL vs SQL

SQL vs NoSQL — PostgreSQL vs MongoDB vs Redis. When to use what.

intermediate 28. 07. 2023

Azure Cosmos DB — Global NoSQL

Cosmos DB API models, consistency levels, partitioning and RU optimization.

intermediate 04. 07. 2023

PostgreSQL Partitioning

Declarative partitioning for large tables.

intermediate 10. 06. 2023

Redis Pub/Sub

Real-time messaging with Redis Pub/Sub.

intermediate 26. 05. 2023

How Much Does Web Application Hosting Cost (2025)

Real costs of web application hosting — from free tier to enterprise.

intermediate 09. 03. 2023

Apache Kafka — Distributed Streaming Platform

Apache Kafka is a distributed event streaming system. Topics, partitioning and consumer groups.

intermediate 01. 02. 2023

Elasticsearch Tutorial

Full-text search, indexing, queries, aggregations.

intermediate 28. 01. 2023

Apache Iceberg — Open Table Format for Data Lake

Apache Iceberg — hidden partitioning, schema evolution and time travel. Vendor-neutral.

intermediate 06. 11. 2022

dbt — Data Transformation in Warehouse Using SQL

dbt enables data transformation in the warehouse using SQL. Models, tests, documentation and versioning.

intermediate 16. 07. 2022

Database Migration Checklist

Database migration checklist — planning, testing, rollback, zero-downtime.

intermediate 25. 06. 2022

CAP Theorem in Practice

Practical implications of the CAP theorem on system design. CP vs AP systems.

intermediate 20. 06. 2022

Kafka Connect — System Integration Without Code

Kafka Connect links Kafka with databases, files and cloud services. Source and sink connectors without coding.

intermediate 04. 06. 2022

Apache Spark — Distributed Batch Processing of Big Data

Apache Spark is an engine for distributed data processing. DataFrame API, Spark SQL and optimization.

intermediate 08. 05. 2022

Kafka vs RabbitMQ

Event streaming vs message broker — architecture and use cases.

intermediate 10. 04. 2022

Docker Volumes and Storage

Managing data in Docker — volumes, bind mounts and best practices.

intermediate 03. 12. 2021

PostgreSQL: 15 Optimization Tricks

PostgreSQL optimization — indexes, EXPLAIN ANALYZE, connection pooling, vacuum and more.

intermediate 25. 09. 2021

PostgreSQL Installation and Configuration

Complete guide to PostgreSQL installation and configuration.

intermediate 05. 08. 2021

Redis Streams

Persistent event streaming with consumer groups.

intermediate 13. 01. 2021

Schema Registry — Central Schema Management for Streaming

Schema Registry versions schemas in the Kafka ecosystem. Avro, Protobuf and compatibility strategies.

intermediate 21. 12. 2020

Lakehouse vs Data Warehouse — When to Choose Which Approach

Comparison of lakehouse and traditional data warehouse. Architecture, costs, performance and migration.

intermediate 02. 06. 2020

Metabase — Open-Source BI for Self-Serve Analytics

Metabase is an open-source BI platform. Query builder, dashboards, and embedding.

intermediate 24. 05. 2020

Database per Service

Why each microservice should have its own database and how to handle cross-service queries.

intermediate 12. 05. 2020

PostgreSQL EXPLAIN ANALYZE

Reading query plans — scans, joins, cost and optimization.

intermediate 05. 02. 2020

Analytics Engineering — The Role Between Data and Business

Analytics engineering bridges data engineering and business analytics. dbt, modelling and self-serve.

intermediate 03. 07. 2019

DuckDB — An Analytical Database Right in Your Notebook

DuckDB is an embedded OLAP database. Zero dependency, SQL over CSV, Parquet and JSON.

intermediate 21. 03. 2019

MySQL vs PostgreSQL

The two most popular open-source relational databases.
