Case Study — AtlasIQ

Customer Insights Engine

Turning 200M Daily Events Into a Competitive Advantage in 10 Weeks

Real-time behavioral analytics and personalization for high-volume e-commerce

Industry

E-Commerce / Consumer

Timeline

10 weeks

Team

6 engineers

Tech

Kafka + ClickHouse + ML

The Challenge

A high-growth e-commerce platform was generating 200M+ behavioral events daily but had no way to act on them in real time. Customer segmentation ran overnight in batch, so by morning the segments were already stale. Personalization was limited to a single 'users who bought X also bought Y' rule written in 2019.

Our Approach

How We Solved It

01

Event Stream Architecture

Replaced batch processing with a Kafka-based event stream that ingests and processes clickstream, purchase, and search events within 500ms of the user action.
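The core shift from batch to streaming is that each event updates per-user state the moment it arrives, instead of being recomputed overnight. The sketch below illustrates that idea with an in-memory processor standing in for a Kafka consumer. The `Event` fields, `UserProfile`, and `StreamProcessor` names are hypothetical; only the 500ms budget comes from the case study.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical event shape; field names are illustrative, not the project's schema.
@dataclass(frozen=True)
class Event:
    user_id: str
    event_type: str  # e.g. "click", "purchase", "search"
    ts_ms: int       # time of the user action, epoch milliseconds

@dataclass
class UserProfile:
    counts: dict = field(default_factory=lambda: defaultdict(int))
    last_seen_ms: int = 0

LATENCY_BUDGET_MS = 500  # target from the case study: user action to processed event

class StreamProcessor:
    """Applies each event to per-user state as it arrives (no nightly batch)."""

    def __init__(self):
        self.profiles = defaultdict(UserProfile)
        self.budget_misses = 0

    def process(self, event: Event, now_ms: int) -> None:
        profile = self.profiles[event.user_id]
        profile.counts[event.event_type] += 1  # incremental update, not a recompute
        profile.last_seen_ms = max(profile.last_seen_ms, event.ts_ms)
        # Count events whose action-to-processing delay exceeded the budget.
        if now_ms - event.ts_ms > LATENCY_BUDGET_MS:
            self.budget_misses += 1

proc = StreamProcessor()
proc.process(Event("u1", "click", 1_000), now_ms=1_120)
proc.process(Event("u1", "purchase", 2_000), now_ms=2_900)
print(dict(proc.profiles["u1"].counts))  # {'click': 1, 'purchase': 1}
print(proc.budget_misses)                # 1
```

Tracking budget misses per event, rather than an average, is what lets a latency target like "under 500ms" be checked directly.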

02

Real-Time Segment Engine

Built a streaming segment engine on ClickHouse that evaluates 50+ behavioral rules against live event data, keeping segments fresh within 30 seconds of any user action.
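As a rough illustration of rule-based segmentation over live events, the sketch below re-evaluates a user's rules each time one of their events arrives, over a sliding lookback window. It runs in memory rather than in ClickHouse, and the two rules, the segment names, and the 30-minute window are all hypothetical examples, not the project's 50+ production rules.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Set, Tuple

WINDOW_MS = 30 * 60 * 1000  # hypothetical 30-minute lookback for rule inputs

@dataclass(frozen=True)
class Rule:
    segment: str
    predicate: Callable[[Dict[str, int]], bool]

# Illustrative rules only.
RULES: List[Rule] = [
    Rule("high_intent", lambda c: c.get("search", 0) >= 3 and c.get("click", 0) >= 5),
    Rule("recent_buyer", lambda c: c.get("purchase", 0) >= 1),
]

class SegmentEngine:
    def __init__(self, rules: List[Rule]):
        self.rules = rules
        self.events: Dict[str, Deque[Tuple[int, str]]] = {}
        self.segments: Dict[str, Set[str]] = {}

    def on_event(self, user_id: str, event_type: str, ts_ms: int) -> Set[str]:
        window = self.events.setdefault(user_id, deque())
        window.append((ts_ms, event_type))
        # Drop events that have fallen out of the lookback window.
        while window and window[0][0] < ts_ms - WINDOW_MS:
            window.popleft()
        counts: Dict[str, int] = {}
        for _, etype in window:
            counts[etype] = counts.get(etype, 0) + 1
        # Re-evaluate every rule for this user only, so membership updates immediately.
        self.segments[user_id] = {r.segment for r in self.rules if r.predicate(counts)}
        return self.segments[user_id]

engine = SegmentEngine(RULES)
for i in range(3):
    engine.on_event("u1", "search", i * 1_000)
for i in range(5):
    segs = engine.on_event("u1", "click", 10_000 + i * 1_000)
print(segs)  # {'high_intent'}
```

Evaluating only the user who just acted keeps the per-event cost small, which is what makes a seconds-level freshness target feasible at high event volume.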

03

ML-Powered Personalization

Trained collaborative filtering and content-based models on 90 days of behavioral data, serving personalized recommendations via a low-latency API with <12ms P99 response time.
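To show what the collaborative-filtering half of this involves, here is a minimal item-to-item sketch: items are similar when the same users interacted with them (cosine similarity over user sets), and a user is recommended unseen items similar to what they already touched. The content-based model is not shown, and the data and function names are illustrative assumptions, not the trained production models.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set

def item_similarities(user_items: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Cosine similarity between items based on which users interacted with them."""
    item_users: Dict[str, Set[str]] = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    sims: Dict[str, Dict[str, float]] = defaultdict(dict)
    items = list(item_users)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(item_users[a] & item_users[b])
            if overlap:
                s = overlap / math.sqrt(len(item_users[a]) * len(item_users[b]))
                sims[a][b] = s
                sims[b][a] = s
    return sims

def recommend(user: str, user_items: Dict[str, Set[str]],
              sims: Dict[str, Dict[str, float]], k: int = 3) -> List[str]:
    seen = user_items.get(user, set())
    scores: Dict[str, float] = defaultdict(float)
    for item in seen:
        for other, s in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += s
    # Highest score first; item id breaks ties deterministically.
    return sorted(scores, key=lambda it: (-scores[it], it))[:k]

user_items = {
    "u1": {"shoes", "socks"},
    "u2": {"shoes", "socks", "laces"},
    "u3": {"shoes", "laces"},
    "u4": {"hat"},
}
sims = item_similarities(user_items)
print(recommend("u1", user_items, sims))  # ['laces']
```

This pairwise loop is quadratic in the number of items, which is one reason heavy similarity work like this is usually done offline rather than per request.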

04

Experimentation Infrastructure

Built an A/B testing framework with automatic traffic splitting, statistical significance tracking, and guardrail metrics that automatically pause experiments that hurt conversion.
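The three mechanisms named here can each be sketched in a few lines: deterministic hash-based assignment so a user always sees the same arm, a two-proportion z-test on conversion, and a guardrail that pauses when treatment is significantly worse. The function names, the SHA-256 bucketing, the -2.58 threshold (about a 0.5% one-sided false-alarm rate per check), and the 1,000-user minimum are assumptions for illustration.

```python
import hashlib
import math

def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministic split: the same user always lands in the same arm."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if bucket < treatment_share else "control"

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z statistic for the difference in conversion rate (b minus a)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se if se > 0 else 0.0

def should_pause(conv_control: int, n_control: int, conv_treat: int, n_treat: int,
                 z_threshold: float = -2.58, min_samples: int = 1000) -> bool:
    """Guardrail: pause if treatment conversion is significantly worse than control."""
    if min(n_control, n_treat) < min_samples:
        return False  # too little data to judge
    return two_proportion_z(conv_control, n_control, conv_treat, n_treat) < z_threshold

print(should_pause(500, 10_000, 400, 10_000))  # True  (5.0% vs 4.0%)
print(should_pause(500, 10_000, 490, 10_000))  # False (5.0% vs 4.9%)
```

Checking a guardrail repeatedly inflates false alarms; a production framework would typically use a sequential-testing correction rather than a fixed threshold on every look.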

Engineering Process

How We Built It

ClickHouse for Analytical Workloads

Chose ClickHouse over PostgreSQL for the segment engine after benchmarking: with columnar storage and compression, ClickHouse ran aggregation queries across the full event log 400x faster.

Two-Phase Recommendation Serving

Splitting serving into offline candidate generation (large model) and online ranking (lightweight model) kept P99 serving latency under 12ms without sacrificing recommendation quality.
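A minimal sketch of the two-phase pattern, assuming candidates and their offline scores were already produced by the large model: the online step only re-scores a short list with a few cheap features. The features, the hand-set weights, and the sample data are hypothetical; a real ranker would be a trained model.

```python
from dataclasses import dataclass
from typing import Dict, List

# Phase 1 output (computed offline by the large model): candidate items per user.
# Illustrative data only.
CANDIDATES: Dict[str, List[str]] = {"u1": ["boots", "scarf", "gloves", "hat"]}

@dataclass(frozen=True)
class ItemFeatures:
    offline_score: float  # relevance from the candidate-generation model
    in_stock: bool
    price_drop: bool

# Hypothetical lightweight ranker: a linear model over a few cheap, fresh features.
WEIGHTS = {"offline_score": 1.0, "in_stock": 0.5, "price_drop": 0.2}

def rank(user_id: str, features: Dict[str, ItemFeatures], k: int = 2) -> List[str]:
    """Phase 2 (online): re-score the precomputed candidates and keep the top k."""
    def score(item: str) -> float:
        f = features[item]
        return (WEIGHTS["offline_score"] * f.offline_score
                + WEIGHTS["in_stock"] * f.in_stock
                + WEIGHTS["price_drop"] * f.price_drop)
    candidates = [c for c in CANDIDATES.get(user_id, []) if c in features]
    return sorted(candidates, key=score, reverse=True)[:k]

features = {
    "boots": ItemFeatures(0.9, False, False),
    "scarf": ItemFeatures(0.6, True, False),
    "gloves": ItemFeatures(0.5, True, True),
    "hat": ItemFeatures(0.95, False, True),
}
print(rank("u1", features))  # ['gloves', 'hat']
```

Because the online step touches only a handful of candidates, its cost stays small and predictable, which is what makes a tight latency bound achievable.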

Shadow Mode Rollout

The new personalization engine ran in shadow mode for 2 weeks alongside the legacy system before cutover, validating recommendation quality without exposing customers to its output.
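Shadow mode can be sketched as a router that always serves the legacy result, calls the new system on the side, and records how the two compare. The `ShadowRouter` name and the list-overlap metric are hypothetical stand-ins; real validation would likely compare richer quality metrics.

```python
import logging
from typing import Callable, List

Recommender = Callable[[str], List[str]]

class ShadowRouter:
    """Serves the legacy recommender; runs the new one on the side for comparison."""

    def __init__(self, legacy: Recommender, shadow: Recommender):
        self.legacy = legacy
        self.shadow = shadow
        self.comparisons: List[float] = []  # per-request overlap between the two lists
        self.shadow_errors = 0

    def recommend(self, user_id: str) -> List[str]:
        served = self.legacy(user_id)
        try:
            candidate = self.shadow(user_id)
            overlap = len(set(served) & set(candidate)) / max(len(served), 1)
            self.comparisons.append(overlap)
        except Exception:
            # A shadow failure is recorded but never changes what the customer sees.
            self.shadow_errors += 1
            logging.exception("shadow recommender failed for %s", user_id)
        return served

router = ShadowRouter(lambda u: ["a", "b"], lambda u: ["b", "c"])
print(router.recommend("u1"))  # ['a', 'b']
print(router.comparisons)      # [0.5]
```

In production the shadow call would usually run asynchronously so it adds no latency to the served request; it is inline here only to keep the sketch short.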

Architecture Decisions

Key Technical Choices

ClickHouse Over BigQuery for Real-Time Segments

BigQuery's query latency was 2-8 seconds — too slow for real-time segment evaluation. ClickHouse's columnar OLAP design gave us sub-100ms aggregation queries.

Redis for Serving Cache

Pre-computed the top-20 recommendations per user into Redis at model refresh time, taking the large candidate-generation model off the request path so that only lightweight ranking runs online.
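A minimal sketch of the refresh-time write and request-time read, using an in-memory class as a stand-in for Redis (a real deployment would call a Redis client's GET and SET). The `recs:{user_id}` key format, JSON encoding, and popularity fallback on a cache miss are assumptions, not details from the project.

```python
import json
from typing import Dict, List, Optional

TOP_N = 20  # from the case study: top-20 recommendations per user

class RecCache:
    """In-memory stand-in for Redis."""

    def __init__(self):
        self._store: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._store.get(key)

def refresh(cache: RecCache, scores: Dict[str, Dict[str, float]]) -> None:
    """Run at model refresh time: write each user's top-N items as one JSON value."""
    for user_id, item_scores in scores.items():
        top = sorted(item_scores, key=lambda it: (-item_scores[it], it))[:TOP_N]
        cache.set(f"recs:{user_id}", json.dumps(top))

def serve(cache: RecCache, user_id: str, fallback: List[str]) -> List[str]:
    """Request path: a single key lookup; no candidate-generation model runs here."""
    raw = cache.get(f"recs:{user_id}")
    return json.loads(raw) if raw is not None else fallback

cache = RecCache()
refresh(cache, {"u1": {"a": 0.1, "b": 0.9, "c": 0.5}})
print(serve(cache, "u1", ["popular"]))  # ['b', 'c', 'a']
print(serve(cache, "u2", ["popular"]))  # ['popular']
```

Storing one serialized list per user keeps the read path to a single round trip, at the cost of recommendations being only as fresh as the last refresh.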

Separate Streams by Event Type

Split Kafka events into separate topics for purchase, browse, and search events, so downstream consumers subscribe only to the streams they need instead of filtering a combined one.
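The routing decision behind topic-per-event-type can be sketched as a pure function: choose the topic from the event type, then choose a partition by hashing the user id so each user's events of a given type stay ordered within one partition. The topic names are hypothetical, and `crc32` is used only for illustration; Kafka's Java client hashes record keys with murmur2 by default.

```python
import zlib
from typing import Dict, Tuple

# Hypothetical topic names: one topic per event type.
TOPICS: Dict[str, str] = {
    "purchase": "events.purchase",
    "browse": "events.browse",
    "search": "events.search",
}

def route(event_type: str, user_id: str, num_partitions: int) -> Tuple[str, int]:
    """Return (topic, partition) for an event.

    Hashing the user id sends all of a user's events of one type to the same
    partition, preserving their order for consumers of that topic.
    """
    if event_type not in TOPICS:
        raise ValueError(f"unknown event type: {event_type}")
    partition = zlib.crc32(user_id.encode()) % num_partitions
    return TOPICS[event_type], partition

topic, partition = route("purchase", "u1", num_partitions=12)
print(topic)  # events.purchase
```

A consumer that only needs purchases subscribes to one topic and never deserializes browse or search traffic, which is the filtering overhead this design avoids.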

Results

What We Delivered

200M+
Events Processed Daily
12ms
P99 Recommendation Latency
8%
Revenue Lift From Personalization
NPS Improvement

Solution Blueprint

How It All Fits Together

Stream Processing Layer
  • Kafka event ingestion
  • Flink stream processing
  • 30-second segment refresh
Analytics Layer
  • ClickHouse OLAP engine
  • ML recommendation models
  • A/B testing framework
Serving Layer
  • Redis recommendation cache
  • REST personalization API
  • Real-time analytics dashboard

Lessons Learned

What We Improved

01

Measure Business Metrics, Not Technical Ones

We initially optimized for segment refresh speed. The real metric was revenue per session. Refocusing on business outcomes changed our model selection and feature priorities.

02

Shadow Mode Saved the Launch

Shadow testing caught a critical edge case where the new model performed poorly for new users with fewer than 5 events — we fixed it before any customer saw a degraded experience.
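The case study does not say how the sparse-history issue was fixed. One common mitigation is a threshold-based fallback that routes users with too little history to a non-personalized list; the sketch below shows that approach under that assumption. The function name and the popularity fallback are hypothetical; only the 5-event threshold comes from the text.

```python
from typing import Callable, List, Sequence

MIN_EVENTS = 5  # threshold from the case study's edge case

def recommend_with_cold_start(
    event_count: int,
    personalized: Callable[[], List[str]],
    popular: Sequence[str],
    k: int = 5,
) -> List[str]:
    """Route users with sparse history to a popularity list instead of the model."""
    if event_count < MIN_EVENTS:
        return list(popular[:k])
    # Called lazily, so the model is never invoked for cold-start users.
    return personalized()[:k]

print(recommend_with_cold_start(3, lambda: ["x", "y"], ["p1", "p2"]))  # ['p1', 'p2']
print(recommend_with_cold_start(5, lambda: ["x", "y"], ["p1", "p2"]))  # ['x', 'y']
```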

03

ClickHouse Migration Is Irreversible

Once you go real-time OLAP, your team expects sub-second analytics everywhere. Set expectations upfront that not every query can be sub-100ms.

Work With Halkwinds

Build Something Exceptional

Partner with the team that built AtlasIQ.
