Netflix Apache Druid

Netflix — In-House Ads Event Processing Pipeline

Netflix decided in January 2024 to bring its ad technology in-house, replacing the Microsoft-run platform with an event pipeline built on Kafka, Flink, and Apache Druid. A sessionization step collapses raw ad events into structured Ad Sessions.

Architecture diagram: Netflix — In-House Ads Event Processing Pipeline

Scale

Full-scale in-house ad infrastructure replacing Microsoft's platform

Before

Ad infrastructure managed by Microsoft: an external vendor dependency that limited iteration speed

After

Custom pipeline: Ads Event Publisher → Kafka → Flink transforms → Apache Druid (OLAP), plus an Ads Sessionizer Flink job that builds Ad Sessions from the event stream

Key Insight

The sessionization pattern — collapsing a stream of events into a single structured entity (the 'Ad Session') — is a critical architectural primitive that enables campaign/creative-level analytics.
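As a rough illustration of the pattern, the sketch below groups raw ad events by a session key and folds them into one record per session. The event fields (`session_id`, `campaign_id`, `creative_id`, `event_type`, `ts`) and the `AdSession` shape are hypothetical, chosen only for illustration. They are not Netflix's schema, and the production version runs as a streaming Flink job rather than a batch function like this one.

```python
from dataclasses import dataclass, field


@dataclass
class AdSession:
    """One structured record per ad session (hypothetical shape)."""
    session_id: str
    campaign_id: str
    creative_id: str
    first_ts: int
    last_ts: int
    event_counts: dict = field(default_factory=dict)


def sessionize(events):
    """Collapse raw events into one AdSession per session_id."""
    sessions = {}
    # Process in timestamp order so first_ts/last_ts are meaningful.
    for ev in sorted(events, key=lambda e: e["ts"]):
        session = sessions.get(ev["session_id"])
        if session is None:
            session = AdSession(
                session_id=ev["session_id"],
                campaign_id=ev["campaign_id"],
                creative_id=ev["creative_id"],
                first_ts=ev["ts"],
                last_ts=ev["ts"],
            )
            sessions[ev["session_id"]] = session
        session.last_ts = ev["ts"]
        # Count each event type (impression, click, ...) within the session.
        kind = ev["event_type"]
        session.event_counts[kind] = session.event_counts.get(kind, 0) + 1
    return list(sessions.values())
```

Each returned `AdSession` carries its campaign and creative identifiers, so later stages can group by those fields without touching the raw events again.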

In a Snowflake Conversation

The sessionization pattern — collapsing a stream of events into a single structured entity (the 'Ad Session') — is what allows downstream analytics to query at the campaign/creative level instead of raw event level.
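To make the difference in query granularity concrete, here is a minimal sketch of a campaign/creative rollup computed from session records instead of raw events. The record fields (`campaign_id`, `creative_id`, `impressions`, `clicks`) are assumptions for illustration, not a documented schema. In practice this aggregation would be a query against the OLAP store, not application code.

```python
from collections import defaultdict


def rollup_by_creative(sessions):
    """Aggregate session records into totals per (campaign_id, creative_id)."""
    totals = defaultdict(lambda: {"sessions": 0, "impressions": 0, "clicks": 0})
    for s in sessions:
        key = (s["campaign_id"], s["creative_id"])
        totals[key]["sessions"] += 1
        # Missing counters are treated as zero.
        totals[key]["impressions"] += s.get("impressions", 0)
        totals[key]["clicks"] += s.get("clicks", 0)
    return dict(totals)
```

Because each input row is already one session, the rollup is a single pass with no event ordering or deduplication logic.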

Tags: Apache Druid, Kafka, Flink, sessionization, in-house ads