61 articles
How to query MongoDB efficiently. $match, $group, $lookup, $unwind.
Comparison of ETL and ELT approaches to data pipelines. When to transform data before loading it and when to load it first and transform it in the warehouse.
Open-source vector DB — installation and querying.
The complete PostgreSQL guide: installation, SQL, indexes, JSONB, replication, backup.
Relational vs non-relational databases — CAP theorem and use cases.
Scaling a web application step by step: from a single server to a million users.
A lakehouse combines the flexibility of a data lake with the reliability of a data warehouse. The medallion architecture.
Comparison of the Parquet, Avro, ORC and JSON data formats. When to use which in a data pipeline.
Document vs relational database — when to choose which.
Elasticsearch: full-text search, aggregations, logging, monitoring.
Spark Structured Streaming combines batch and stream in one API. Micro-batch and Delta Lake integration.
How to efficiently process large data volumes. Chunking, streaming and parallel processing.
How to prepare for a system design interview — framework, examples, resources.
Google's Looker and its LookML modeling layer. Central metric definitions and governance.
SQL injection, NoSQL injection, OS command injection — how they work and how to defend against them.
Scaling databases with read replicas. Master-slave replication and routing.
ClickHouse is an open-source columnar OLAP database. MergeTree engine and materialized views.
Local development with docker-compose. Multi-container setup, volumes and networking.
Trino is a distributed SQL engine for querying heterogeneous sources without moving data.
The Hadoop ecosystem from HDFS to Hive. History and the transition to modern cloud solutions.
Streaming and logical replication, failover.
DataHub from LinkedIn is an open-source data catalog. Metadata, lineage and governance.
Redis as cache, session store, pub/sub, rate limiter.
JSONB type, operators, indexes and practical examples.
B-tree, GIN, GiST, BRIN, partial and expression indexes.
The two most popular open-source SQL databases.
Apache Flink is a framework for stateful stream processing. Windowing, event time and exactly-once semantics.
JSONB, CTE, window functions, partitioning, extensions.
How to properly pool database connections. PgBouncer, HikariCP, SQLAlchemy.
The difference between OLAP and OLTP databases. Columnar vs row-oriented storage and how to choose between them for different use cases.
Dagster brings an asset-oriented approach to orchestration. Software-defined assets, type system and monitoring.
WebSocket server implementation for chat, notifications and live dashboards. Scaling with Redis.
Relational Database Service. Multi-AZ, Read Replicas, Aurora, backup, and performance tuning.
Real-time analytics architecture. Lambda vs Kappa, streaming pipelines and OLAP engines.
Debezium is an open-source CDC platform. Capturing database changes via Kafka Connect.
SQL vs NoSQL — PostgreSQL vs MongoDB vs Redis. When to use what.
Cosmos DB API models, consistency levels, partitioning and RU optimization.
Declarative partitioning for large tables.
Real-time messaging with Redis Pub/Sub.
Real costs of web application hosting — from free tier to enterprise.
Apache Kafka is a distributed event streaming system. Topics, partitioning and consumer groups.
Full-text search, indexing, queries, aggregations.
Apache Iceberg — hidden partitioning, schema evolution and time travel. Vendor-neutral.
dbt enables data transformation in the warehouse using SQL. Models, tests, documentation and versioning.
Database migration checklist — planning, testing, rollback, zero-downtime.
Practical implications of the CAP theorem on system design. CP vs AP systems.
Kafka Connect links Kafka with databases, files and cloud services. Source and sink connectors without coding.
Apache Spark is an engine for distributed data processing. DataFrame API, Spark SQL and optimization.
Event streaming vs message broker — architecture and use cases.
Managing data in Docker — volumes, bind mounts and best practices.
PostgreSQL optimization — indexes, EXPLAIN ANALYZE, connection pooling, vacuum and more.
Complete guide to PostgreSQL installation and configuration.
Persistent event streaming with consumer groups.
Schema Registry versions schemas in the Kafka ecosystem. Avro, Protobuf and compatibility strategies.
Comparison of the lakehouse and the traditional data warehouse. Architecture, costs, performance and migration.
Metabase is an open-source BI platform. Query builder, dashboards, and embedding.
Why each microservice should have its own database and how to handle cross-service queries.
Reading query plans — scans, joins, cost and optimization.
Analytics engineering bridges data engineering and business analytics. dbt, modelling and self-serve.
DuckDB is an embedded OLAP database. Zero dependencies, SQL directly over CSV, Parquet and JSON files.
The two most popular open-source relational databases.