Big Data Moscow 2018

 

Bartosz Łoś

RTB House, Poland

BIO

Bartek graduated from two faculties at the University of Warsaw: Computer Science and Mathematics. His main interest is building systems for distributed processing of large data sets. He has worked in this area for more than 8 years, previously as a C++ Developer at Gemius and now as a Tech Lead at RTB House. His experience spans very different technologies: first custom-built internal solutions, then solutions built from recent open source components. He is also the author of the solution presented in this talk. He is the main person responsible for reorganizing the flow of all data produced and stored for RTB purposes, and for maintaining the whole processing infrastructure. His experience covers all levels of data processing architecture, from technical details to high-level system design, with a focus on efficiency, scalability and reliability.

TOPIC

Real-Time Data Processing at RTB House – Architecture & Lessons Learned

In this talk we would like to share our experience of building and scaling the real-time data processing infrastructure at RTB House, a company ranked #46 overall (and #8 in technology) in the Financial Times 1000.

Our platform, which buys and serves advertisements in the Real-Time Bidding model, processes 1.5M bid requests and generates 80K events every second across 4 data centers, which amounts to 20 TB of data every day. For machine learning, system monitoring and financial settlements, we need to filter, synchronize, store, aggregate and join these events. As a result, processed events and aggregated statistics are available in various data sources such as Hadoop, Google BigQuery, Postgres and Elasticsearch.

We have designed and implemented a solution that reduced the delay before this data becomes available from 1 day to a few seconds. This was made possible by a new approach and the technologies we chose. Providing immutable streams of events was essential to make the solution a good fit for our multi-DC architecture. In contrast to the previous solution, the current real-time data flow is completely independent of the bidding system, which now produces only lightweight events. Thanks to this separation, the core system is much more stable, and data processing is of higher quality and easier to maintain. Additionally, event processing can be paused, or events can even be reprocessed, if needed.
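
To illustrate why immutable event streams make pausing and reprocessing straightforward, here is a minimal in-memory sketch in Python. It is not the RTB House implementation: the names Event, EventLog and Consumer, the event fields, and the offset-based replay are all assumptions made for illustration. The idea is that when events are only ever appended and never changed, a consumer can stop at any time or move its read position back and process the same events again.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Event:
    """Immutable event record; the field names are hypothetical."""
    event_id: str
    kind: str
    payload: str


@dataclass
class EventLog:
    """Append-only log: events are added at the end and never modified."""
    _events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the appended event

    def read_from(self, offset: int) -> Tuple[Event, ...]:
        return tuple(self._events[offset:])


@dataclass
class Consumer:
    """Hypothetical consumer that tracks its own read offset in the log."""
    log: EventLog
    handler: Callable[[Event], None]
    offset: int = 0
    paused: bool = False

    def poll(self) -> int:
        """Process all events past the current offset; return how many."""
        if self.paused:
            return 0
        batch = self.log.read_from(self.offset)
        for event in batch:
            self.handler(event)
        self.offset += len(batch)
        return len(batch)

    def rewind(self, offset: int = 0) -> None:
        """Move the offset back so that later polls reprocess old events."""
        self.offset = offset


log = EventLog()
log.append(Event("e1", "impression", "ad=1"))
log.append(Event("e2", "click", "ad=1"))

seen: List[str] = []
consumer = Consumer(log, handler=lambda e: seen.append(e.event_id))

consumer.paused = True
paused_count = consumer.poll()    # nothing is processed while paused
consumer.paused = False
first_count = consumer.poll()     # processes e1 and e2
consumer.rewind(0)
replayed_count = consumer.poll()  # processes e1 and e2 again
```

Because the producer only appends and never waits for the consumer, pausing or replaying the consumer does not affect the producer, which is in the spirit of the separation between the bidding system and the data flow described above.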

 

Date: October 11, 2018   |   Time: 00:00