LinkedIn — Batch to Real-Time Recommendation Migration

LinkedIn migrated 'People You May Know' from batch precomputation to real-time scoring through a four-phase model (Offline → Nearline → Online → Remote Scoring), cutting offline compute costs by 90%.

Architecture diagram: LinkedIn — Batch to Real-Time Recommendation Migration

Scale

'People You May Know' recommendations cover LinkedIn's full user base.

Before

'People You May Know' was precomputed for the entire user base regardless of login activity. This wasted compute on members who never saw the results and left recommendations stale: a single pipeline incident could mean days of delay.

After

A four-phase migration (Offline → Nearline → Online → Remote Scoring) delivered a 90% reduction in offline computing costs and session-level freshness.
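
The difference between the Before and After states can be illustrated with a toy sketch. This is not LinkedIn's implementation: the `CountingScorer` class, the friends-of-friends heuristic, the four-member graph, and both function names are hypothetical, chosen only to show why scoring every member up front does more work than scoring at request time. The call counts are toy numbers and are unrelated to the 90% figure above.

```python
from typing import Dict, List


class CountingScorer:
    """Hypothetical scorer that counts how many times it is invoked."""

    def __init__(self, graph: Dict[str, List[str]]) -> None:
        self.graph = graph
        self.calls = 0

    def __call__(self, member: str) -> List[str]:
        self.calls += 1
        direct = set(self.graph.get(member, []))
        # Assumed heuristic: suggest friends-of-friends who are not
        # already connected to the member.
        candidates = {
            fof
            for friend in direct
            for fof in self.graph.get(friend, [])
            if fof != member and fof not in direct
        }
        return sorted(candidates)


def batch_precompute(members: List[str], score: CountingScorer) -> Dict[str, List[str]]:
    """Batch style: score every member, whether or not they ever log in."""
    return {member: score(member) for member in members}


def recommend_on_request(member: str, score: CountingScorer) -> List[str]:
    """Request-time style: score only the member whose session asks."""
    return score(member)


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

    batch_scorer = CountingScorer(graph)
    batch_precompute(list(graph), batch_scorer)

    online_scorer = CountingScorer(graph)
    recommend_on_request("a", online_scorer)  # only member "a" logs in

    print("batch scoring calls:", batch_scorer.calls)
    print("request-time scoring calls:", online_scorer.calls)
```

In the batch path the work grows with the total number of members; in the request-time path it grows with the number of sessions, and each result reflects the graph as it stands when the session asks.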

Key Insight

The four-phase migration model is a reusable framework for de-risking batch-to-real-time transitions: each phase is independently valuable, so risk is retired incrementally rather than all at once.

In a Snowflake Conversation

When a customer asks 'how do we get from batch to real-time,' point to this four-phase model: the phased approach de-risks the transition.
