
Feature Store: Key Infrastructure for ML in Production

18. 02. 2026 · 5 min read

Most ML projects fail not because of a bad model, but because of bad data in production. A feature store solves exactly this problem: it is the infrastructure that ensures the model in production receives the same, identically computed features it saw during training.

What Is a Feature Store

A feature store is a central repository and serving layer for ML features. It serves as a bridge between data engineering and data science.

Core problems it solves:

  • Training-serving skew — the production model receives features computed differently than those used during training
  • Feature reuse — each team computes the same features from scratch and differently
  • Point-in-time correctness — during training you must use features from exactly that point in time, not future data
  • Online/offline consistency — batch features for training and real-time features for serving must be identical

Feature Store Architecture

A modern feature store has two main layers:

Offline Store (Batch)

Historical data for model training. Typically on top of a data lake (S3/ADLS + Parquet/Delta Lake).

Raw Data → Feature Pipeline (Spark/dbt) → Offline Store → Training Dataset

Key property: point-in-time joins. When training a model on data from January, features must correspond to values from January — not current ones.
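The point-in-time join can be sketched in plain Python: for each training label, take the most recent feature value at or before the label's timestamp, never a future one. The `(entity, timestamp, value)` record layout here is illustrative, not any particular store's schema.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(feature_history, label_events):
    """For each (entity, label_ts), return the latest feature value
    whose timestamp is <= label_ts -- never a future value."""
    by_entity = defaultdict(list)
    # Sort by entity, then timestamp, so each entity's rows are time-ordered.
    for entity, ts, value in sorted(feature_history, key=lambda r: (r[0], r[1])):
        by_entity[entity].append((ts, value))

    joined = []
    for entity, label_ts in label_events:
        rows = by_entity.get(entity, [])
        timestamps = [ts for ts, _ in rows]
        i = bisect_right(timestamps, label_ts)  # first ts strictly after label_ts
        value = rows[i - 1][1] if i > 0 else None  # no feature yet -> null
        joined.append((entity, label_ts, value))
    return joined
```

Note that a label with no earlier feature row gets `None` rather than a value leaked from the future; that is the whole point of the join.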

Online Store (Real-time)

Low-latency serving for production inference. Typically Redis, DynamoDB, or Cassandra.

Event Stream → Streaming Pipeline (Flink/Spark) → Online Store → Model Serving

Latency under 10 ms is standard. For real-time ML (fraud detection, recommendations, dynamic pricing), this is critical.
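The online store is conceptually just a key-value lookup. A common keying scheme is `<feature_view>:<entity_id>` with the feature vector serialized as the value; the sketch below uses an in-memory dict as a stand-in for Redis or DynamoDB, purely to show the access pattern.

```python
import json


class InMemoryOnlineStore:
    """Dict-based stand-in for Redis/DynamoDB.

    Key layout: '<feature_view>:<entity_id>' -> JSON-serialized feature dict.
    """

    def __init__(self):
        self._kv = {}

    def put(self, view, entity_id, features):
        # In Redis this would be SET with a TTL matching feature freshness.
        self._kv[f"{view}:{entity_id}"] = json.dumps(features)

    def get(self, view, entity_id):
        raw = self._kv.get(f"{view}:{entity_id}")
        return json.loads(raw) if raw is not None else None
```

In production the same single-key read is what keeps serving latency in the single-digit-millisecond range.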

Materialization

Materialization is the synchronization process between the offline and online stores. The feature store automatically:

  1. Computes features from raw data (batch and streaming)
  2. Stores them in the offline store with timestamps
  3. Materializes the latest values into the online store
  4. Versions schemas and transformations
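Step 3 above, reduced to its essence: pick the newest row per entity from the timestamped offline data, because only the latest value belongs in the online store. A minimal sketch, with the row layout chosen for illustration:

```python
def materialize_latest(offline_rows):
    """Reduce timestamped offline rows to the newest row per entity --
    the slice of data that gets pushed to the online store."""
    latest = {}
    for entity_id, ts, features in offline_rows:
        if entity_id not in latest or ts > latest[entity_id][0]:
            latest[entity_id] = (ts, features)
    return {entity: feats for entity, (ts, feats) in latest.items()}
```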

Main Tools in 2026

Open-source

Feast — the most widely used open-source feature store. Python-first, supports AWS, GCP, Azure, and on-prem. Registry in Git (feature definitions as code), offline store via BigQuery/Redshift/Spark, online store via Redis/DynamoDB.

# Example Feast FeatureView definition (feature definitions as code)
from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Float32, Int64

customer = Entity(name="customer_id", join_keys=["customer_id"])

customer_features = FeatureView(
    name="customer_features",
    entities=[customer],
    schema=[
        Field(name="total_orders_30d", dtype=Int64),
        Field(name="avg_order_value_30d", dtype=Float32),
        Field(name="days_since_last_order", dtype=Int64),
        Field(name="churn_risk_score", dtype=Float32),
    ],
    source=customer_data_source,  # batch/stream data source defined elsewhere
    online=True,
    ttl=timedelta(hours=24),
)

Hopsworks — a complete ML platform with an integrated feature store. Strong in real-time features (streaming transformations). Open-core model.

Managed

Tecton — enterprise-grade, built by the creators of Uber’s Michelangelo. Best for real-time features and streaming transformations. Expensive, but production-ready.

Databricks Feature Store — native integration with Unity Catalog and MLflow. Ideal if you are already in the Databricks ecosystem.

SageMaker Feature Store — AWS native. Simple, but limited to AWS.

Vertex AI Feature Store — GCP native. Good integration with BigQuery.

When You Need a Feature Store

YES — The Investment Pays Off

  • Multiple ML models in production (>3) sharing similar features
  • Real-time inference with latency requirements <100 ms
  • Multiple teams working with ML and duplicating feature engineering
  • Regulated environment requiring audit trail and reproducibility
  • Training-serving skew causing model degradation

NO — The Overhead Does Not Pay Off

  • You have 1-2 models with batch inference
  • Small team where communication is sufficient
  • Experiments and PoC phase
  • Features don’t change and are simple

Implementation Patterns

Feature Pipeline Patterns

Batch features — computed periodically (hourly/daily). Typically aggregations: average purchase over 30 days, number of logins per week.

Streaming features — computed in real time from the event stream. Sliding window: number of transactions in the last 5 minutes (fraud detection).
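A sliding-window feature like "transactions in the last 5 minutes" can be sketched with a deque that evicts expired events on read; real streaming engines (Flink, Spark Structured Streaming) manage this state for you, this just shows the semantics.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events within the last `window_seconds` (e.g. txns in 5 min)."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # timestamps, appended in arrival order

    def add(self, ts):
        self.events.append(ts)

    def count(self, now):
        # Evict everything that has fallen out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events)
```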

On-demand features — computed at request time. Customer distance from the nearest branch, current exchange rate.
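The branch-distance example above is a classic on-demand feature: it can only be computed at request time, from the customer's current location. A self-contained sketch using the haversine formula (function and data names are illustrative):

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km = mean Earth radius


def distance_to_nearest_branch_km(customer, branches):
    """On-demand feature: computed per request from the customer's position."""
    return min(haversine_km(customer[0], customer[1], b[0], b[1]) for b in branches)
```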

Feature Engineering Best Practices

  1. Version transformations — feature definition is code, belongs in Git
  2. Test features — unit tests on transformations, data quality checks
  3. Monitor drift — feature distributions change over time, monitor statistics
  4. Document business context — what a feature means, who owns it, where it is used
  5. Standardize naming: {entity}_{aggregation}_{window}_{metric} (e.g., customer_sum_30d_revenue)
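A naming convention is only useful if it is enforced; a small regex check in CI catches violations before a feature lands in the catalog. This sketch assumes the window suffix is a number plus a unit (m/h/d/w):

```python
import re

# {entity}_{aggregation}_{window}_{metric}, e.g. customer_sum_30d_revenue
FEATURE_NAME_RE = re.compile(
    r"^(?P<entity>[a-z]+)_(?P<agg>[a-z]+)_(?P<window>\d+[mhdw])_(?P<metric>[a-z_]+)$"
)


def is_valid_feature_name(name):
    """True if the name follows the {entity}_{agg}_{window}_{metric} convention."""
    return FEATURE_NAME_RE.match(name) is not None
```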

Feature Store in the Czech Context

For Czech companies with 5-50 ML models, we recommend:

Starter setup (up to 10 models):

  • Feast + Redis (online) + PostgreSQL/S3 (offline)
  • Feature definitions in a Git monorepo
  • CI/CD pipeline for feature materialization
  • Total costs: ~$200-500/month for infrastructure

Enterprise setup (10+ models):

  • Feast or Tecton + dedicated streaming (Kafka + Flink)
  • Delta Lake as offline store
  • Central feature catalog with ownership and documentation
  • Feature quality monitoring (Great Expectations / Soda)
  • Costs: $2,000-10,000/month

ROI Calculation

Typical return on investment:

  • Feature reuse: You save 2-4 weeks of data scientist work per project (feature engineering is 60-80% of time)
  • Training-serving consistency: You eliminate model degradation after deploy (typically 5-15% accuracy loss)
  • Time to production: From months to days for a new model (features already exist)
  • Compliance: Audit trail essentially for free — who used which features, and when

Monitoring and Observability

A feature store without monitoring is like a database without backups. Track:

  • Freshness — are features up to date? Materialization lag
  • Completeness — how many null values? Missing rate per feature
  • Distribution drift — has the distribution changed? PSI (Population Stability Index)
  • Latency — online serving p50/p95/p99
  • Usage — which features are used by whom, which are dead
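The PSI mentioned above compares a binned reference distribution against the current one; a common rule of thumb treats PSI above ~0.2 as significant drift. A minimal implementation over pre-binned fractions:

```python
from math import log


def psi(expected_pct, actual_pct, eps=1e-4):
    """Population Stability Index over pre-binned distributions.

    Both inputs are lists of bin fractions summing to 1; `eps` guards
    against log(0) for empty bins.
    """
    total = 0.0
    for e, a in zip(expected_pct, actual_pct):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * log(a / e)
    return total
```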

Conclusion

A feature store is not a luxury — it is a necessity for companies that want to run ML in production reliably. Start with Feast and Redis, set up a basic pipeline, and expand as needed.

The most important thing is to start with a feature catalog — a list of all features with documentation, owner, and source. Even without a full feature store, this will save you dozens of hours of duplicated work.


CORE SYSTEMS helps Czech companies build ML infrastructure from feature store to model serving. Contact us for a consultation.

Tags: mlops, feature-store, ml-infrastructure, real-time-ml, feast, tecton, data-engineering