Big Data Moscow 2018

 

Andrei Bashchenko

Russian Post, Russia

BIO

Head of the Big Data Center of Excellence (CoE) at Russian Post.
Responsible for:
– development of the Enterprise Data Platform and the product ecosystem around it;
– division evolution and strategy, key architectural and technical decisions, product backlogs;
– management of development teams, team growth, and competency development.

More than 15 years in software development, from software engineer to project manager and development division head.
Focuses on data-centric, data-driven products, from centralized DWH and MDM solutions to an Enterprise Data Platform on the Hadoop stack.

TOPIC

Case Study: Big Data Platform for Russian Post on Hadoop stack

I want to share the success story of building an Enterprise Data Platform on the Hadoop stack for Russian Post.
Russian Post has 350,000+ employees, the largest logistics infrastructure, and a retail network of 42,000+ shops, with every item uniquely tracked. The digital transformation of a business of this scale is an exciting challenge. In this presentation I’ll demonstrate how business tasks and requirements shaped the solution architecture and influenced the technology choices.

In 3 years the platform has grown from scratch to a cluster of 7,200 vCPUs with the following architecture: Hadoop (Hortonworks), YARN, Spark, Hive, Hue, Tez, Oozie, Flink, Kafka, Spark Streaming, Cassandra, Vertica, Yandex ClickHouse, Pentaho, Docker.

The platform provides the following capabilities:
– multi-stage calculation of analytical data marts over tens of billions of records per day;
– stream processing of events from source systems, real-time data integration, and data streaming to consumers;
– high-throughput access by key to stream-processed data;
– stable and convenient data access for hundreds of thousands of users.

Date: October 11, 2018