Google BigQuery: Serverless Data Warehouse
BigQuery architecture, partitioning, clustering, ML and cost control.
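Architecture
BigQuery separates storage (Colossus, Google's distributed file system) from compute (Dremel, the query engine), so the two scale independently. There are two pricing models: on-demand, billed per TB of data scanned ($5/TB at the time of writing), or flat-rate, where you reserve a fixed number of slots (units of compute capacity). Google revises pricing and plan names from time to time, so check the current pricing page before budgeting.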
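To make the on-demand model concrete, here is a minimal sketch of the arithmetic. It assumes the $5/TB rate quoted above and treats 1 TB as 2**40 bytes; the function name and the example query sizes are hypothetical, chosen only for illustration.

```python
# Rough on-demand cost estimate from bytes scanned.
# Assumptions: the $5/TB rate quoted in the article, and 1 TB = 2**40 bytes.
PRICE_PER_TB_USD = 5.0
BYTES_PER_TB = 2 ** 40


def on_demand_cost_usd(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the estimated on-demand charge for a query scanning `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Hypothetical query sizes, for illustration only.
large_query = 3 * BYTES_PER_TB   # a query that reads 3 TB
small_query = 50 * 2 ** 30       # a query that reads 50 GB

print(f"3 TB query:  ${on_demand_cost_usd(large_query):.2f}")
print(f"50 GB query: ${on_demand_cost_usd(small_query):.2f}")
```

Because the charge is proportional to bytes scanned, anything that reduces the data a query reads reduces the bill by the same factor.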
Partitioning and Clustering
CREATE TABLE dataset.events
PARTITION BY DATE(event_timestamp)
CLUSTER BY user_id, event_type
AS SELECT * FROM dataset.raw_events;
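A query that filters on the partition column reads only the matching daily partitions, and clustering lets BigQuery skip blocks within each partition when the query filters on user_id or event_type. Less data scanned means a lower on-demand bill.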
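The effect of partition pruning can be illustrated with a toy model. This is only a sketch: the partition sizes and dates are made up, the helper name is hypothetical, and it does not model clustering or how BigQuery plans queries internally.

```python
from datetime import date, timedelta

# Hypothetical table: 30 daily partitions of 10 GB each (sizes are made up).
GB = 2 ** 30
partitions = {date(2024, 1, 1) + timedelta(days=i): 10 * GB for i in range(30)}


def bytes_scanned(partitions, start=None, end=None):
    """Sum the sizes of partitions whose date falls in [start, end].

    With no date filter every partition is read, which mimics a full table scan.
    """
    return sum(
        size
        for day, size in partitions.items()
        if (start is None or day >= start) and (end is None or day <= end)
    )


full = bytes_scanned(partitions)
one_week = bytes_scanned(partitions, date(2024, 1, 8), date(2024, 1, 14))

print(full // GB, "GB scanned without a partition filter")
print(one_week // GB, "GB scanned with a 7-day filter")
```

In this toy table the 7-day filter touches 7 of the 30 partitions, so the scanned volume (and the on-demand charge) drops in the same proportion.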
BigQuery ML
CREATE OR REPLACE MODEL dataset.churn_model
OPTIONS(model_type='LOGISTIC_REG', input_label_cols=['churned'])
AS SELECT days_since_last_login, total_purchases, churned
FROM dataset.user_features;
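As a rough intuition for what a LOGISTIC_REG model computes, the sketch below scores a user by passing a weighted sum of the two features through the sigmoid function. The weights and intercept are hypothetical values picked for illustration; a trained model learns its own, and this sketch ignores any feature preprocessing BigQuery ML applies.

```python
import math

# Hypothetical coefficients for illustration only; not output from a real model.
WEIGHTS = {"days_since_last_login": 0.08, "total_purchases": -0.15}
INTERCEPT = -1.0


def churn_probability(features: dict) -> float:
    """Return sigmoid(intercept + sum(weight * feature)) as a churn probability."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


active = churn_probability({"days_since_last_login": 2, "total_purchases": 20})
lapsed = churn_probability({"days_since_last_login": 60, "total_purchases": 1})

print(f"active user: {active:.3f}")
print(f"lapsed user: {lapsed:.3f}")
```

With these made-up coefficients, a long gap since the last login pushes the probability up and a larger purchase count pushes it down, which is the kind of relationship the model is meant to learn from dataset.user_features.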
Summary
BigQuery offers a fast path to petabyte-scale analytics with no infrastructure to manage. Partitioning and clustering are the main levers for cost control, because they cut the amount of data each query scans.