Data Blueprint
Architecture before technology.
We map your data sources, flows and consumers. We design a Medallion architecture with a clear source of truth and an implementable plan.
Why blueprint before implementation
Most data projects fail on architecture, not technology. A team picks Snowflake, starts building pipelines, and six months later:
- Nobody knows what the source of truth for “revenue” is
- Three teams have three different definitions of “active customer”
- Data quality is a disaster and nobody trusts the dashboards
- Pipelines fail silently and nobody knows why
A blueprint addresses these problems upfront.
Discovery process
Week 1-2: Data Landscape Mapping
- Inventory of all data sources (ERP, CRM, e-shop, DMS, spreadsheets)
- Data flow mapping (who sends what, where, how often, and through which channel)
- Consumer identification (who needs the data, in what form, how often)
- Qualitative assessment (where the problems are, what hurts most)
Week 3: Architecture Design
- Source of Truth definition for key entities (customer, order, product)
- Medallion architecture (Bronze → Silver → Gold)
- Technology selection based on requirements
- Data governance model (ownership, quality SLA, access control)
Week 4: Roadmap
- Use case prioritization by business value and technical feasibility
- MVP pipeline definition (most painful use case)
- Timeline and resource estimate
- Risk assessment and mitigation
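The prioritization step in week 4 can be illustrated with a small scoring sketch. It is only one possible approach: the 1 to 5 scales, the multiplication rule, the `UseCase` type and the sample backlog are all assumptions made for illustration, not a fixed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    business_value: int         # assumed scale: 1 (low) to 5 (high)
    technical_feasibility: int  # assumed scale: 1 (hard) to 5 (easy)


def priority_score(use_case: UseCase) -> int:
    """Multiply both scores so a use case must do well on both to rank high."""
    return use_case.business_value * use_case.technical_feasibility


def prioritize(use_cases: list[UseCase]) -> list[UseCase]:
    """Return use cases ordered from highest to lowest priority score."""
    return sorted(use_cases, key=priority_score, reverse=True)


# Invented example backlog, purely illustrative.
backlog = [
    UseCase("Unified revenue reporting", business_value=5, technical_feasibility=4),
    UseCase("Real-time stock levels", business_value=4, technical_feasibility=2),
    UseCase("Customer churn dashboard", business_value=3, technical_feasibility=3),
]

ranked = prioritize(backlog)
mvp_candidate = ranked[0]  # the top-ranked use case becomes the MVP pipeline candidate
```

Multiplying rather than adding the two scores penalizes use cases that are valuable but hard to deliver, which tends to favor a quick first win for the MVP.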
Medallion Architecture Design
For every project we design three layers:
Bronze (Raw): Exact copy of source data. Immutable, append-only. No transformation. Purpose: audit trail, reprocessing, debugging.
Silver (Cleaned): Cleaned, validated, standardized data. Defined schema, data types, constraints. Quality gates automatically monitor completeness and consistency.
Gold (Business-ready): Denormalized views optimized for consumers. Semantic layer with business metric definitions. Access controls per role/team.
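The three layers can be sketched in plain Python to make the responsibilities concrete. This is a minimal in-memory illustration, not a production design: the field names (`order_id`, `customer_id`, `amount`), the function names and the quality rule are hypothetical, and a real platform would use tables in a warehouse or lakehouse instead of lists.

```python
from datetime import datetime, timezone

# Bronze: append-only store of raw records, kept exactly as received.
bronze: list[dict] = []


def ingest_to_bronze(raw_record: dict, source: str) -> None:
    """Append the untouched payload plus load metadata; never update or delete."""
    bronze.append({
        "source": source,
        "loaded_at": datetime.now(timezone.utc).isoformat(),
        "payload": dict(raw_record),  # copy so later mutation cannot alter Bronze
    })


def _text(value: object) -> str:
    """Normalize a required field to a stripped string; None becomes empty."""
    return "" if value is None else str(value).strip()


def build_silver(bronze_rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Clean and validate Bronze rows; return (accepted, rejected)."""
    accepted, rejected = [], []
    for row in bronze_rows:
        payload = row["payload"]
        try:
            clean = {
                "order_id": _text(payload["order_id"]),
                "customer_id": _text(payload["customer_id"]),
                "amount": float(payload["amount"]),
            }
        except (KeyError, TypeError, ValueError):
            rejected.append(row)  # missing field or wrong data type
            continue
        # Quality gate: completeness plus one simple consistency rule.
        if clean["order_id"] and clean["customer_id"] and clean["amount"] >= 0:
            accepted.append(clean)
        else:
            rejected.append(row)
    return accepted, rejected


def build_gold(silver_rows: list[dict]) -> dict[str, float]:
    """Business-ready view: total revenue per customer."""
    revenue: dict[str, float] = {}
    for row in silver_rows:
        revenue[row["customer_id"]] = revenue.get(row["customer_id"], 0.0) + row["amount"]
    return revenue


# Invented sample records.
ingest_to_bronze({"order_id": 1, "customer_id": "C1", "amount": "100.50"}, source="eshop")
ingest_to_bronze({"order_id": 2, "customer_id": "C1", "amount": "49.50"}, source="eshop")
ingest_to_bronze({"order_id": 3, "customer_id": "C2", "amount": "n/a"}, source="erp")

silver, rejected = build_silver(bronze)
gold = build_gold(silver)
```

Because Bronze keeps the rejected record unchanged, the Silver step can be re-run after a rule is fixed without going back to the source system.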
Technology Selection
We don’t pick technology based on hype. We decide based on:
| Criterion | Recommended choice | Rationale / note |
|---|---|---|
| Data volume < 100 GB | PostgreSQL + dbt | Spark would be overkill |
| Data volume 100 GB - 10 TB | Snowflake / Databricks | dbt for transformations |
| Real-time requirement | Kafka + Flink | Batch is insufficient |
| Budget < 50K/month | Open-source stack | Managed services are expensive |
| Team skill | Known technology | A new tool means ramp-up time |
Result: an architecture that makes sense for your situation, not for a vendor's sales team.
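The criteria in the table above can be read as a simple rule set. The sketch below encodes them literally, as an illustration only: the `Requirements` type and `shortlist` function are hypothetical, 10 TB is taken as 10,000 GB, the budget is in the same unstated currency as the table, and real selection also weighs factors the table does not list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    data_volume_gb: float
    needs_real_time: bool
    monthly_budget: float  # same (unstated) currency as the table's 50K threshold
    known_technologies: tuple[str, ...] = ()


def shortlist(req: Requirements) -> list[str]:
    """Collect the table rows that apply to the given requirements."""
    notes: list[str] = []
    if req.data_volume_gb < 100:
        notes.append("PostgreSQL + dbt (Spark would be overkill)")
    elif req.data_volume_gb <= 10_000:  # assumption: 10 TB == 10,000 GB
        notes.append("Snowflake / Databricks, with dbt for transformations")
    else:
        notes.append("Volume above 10 TB: not covered by the table, assess individually")
    if req.needs_real_time:
        notes.append("Kafka + Flink (batch is insufficient)")
    if req.monthly_budget < 50_000:
        notes.append("Open-source stack (managed services are expensive)")
    if req.known_technologies:
        notes.append("Prefer known technology: " + ", ".join(req.known_technologies))
    return notes


# Invented example: a small company with a PostgreSQL-savvy team.
recommendations = shortlist(
    Requirements(
        data_volume_gb=50,
        needs_real_time=False,
        monthly_budget=20_000,
        known_technologies=("PostgreSQL",),
    )
)
```

The function only lists the applicable rows; when rows pull in different directions (for example a mid-sized volume with a small budget), the trade-off still has to be resolved by people during the architecture design.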
Frequently asked questions
An implementable document: data landscape map, source of truth definition, target architecture (Medallion), technology recommendation, prioritized roadmap, cost estimate. Not a PowerPoint deck, but code and diagrams.
Discovery + blueprint: 2-4 weeks, from 400K CZK. Includes business workshops, technical audit, architectural design and roadmap.
No. The data platform connects to existing sources (CDC, API, export). Source systems remain unchanged. Transformation happens in the data platform.