DATA MINER, Lithuania
Learn how Apache Hadoop addresses the limitations of traditional computing, helps businesses overcome real challenges, and powers new types of big data analytics. This workshop introduces the Apache Hadoop ecosystem and outlines how to prepare the data center and manage Hadoop in production.
Many components work together in the Apache Hadoop stack. By understanding how each of them functions, you gain more insight into Hadoop's role in your own IT environment. We will go beyond the motivation for Apache Hadoop and dissect the Hadoop Distributed File System (HDFS), MapReduce, and the general topology of a Hadoop cluster.
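The MapReduce model the workshop dissects can be illustrated without a cluster. Below is a pure-Python sketch of the map, shuffle, and reduce phases for word counting; it mimics the concept only, not Hadoop's actual Java API:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input line.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle/sort: group values by key, as Hadoop does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts emitted for one word.
    return key, sum(values)

lines = ["hello hadoop", "hello hdfs"]
pairs = [kv for line in lines for kv in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'hello': 2, 'hadoop': 1, 'hdfs': 1}
```

On a real cluster the map and reduce phases run in parallel across nodes, and HDFS stores the input and output; the data flow, however, is exactly this.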
Things Solver, Serbia
This workshop is dedicated to machine learning techniques for anomaly detection. The session is organized in three phases, each more advanced and demanding than the previous one.
Phase 1 gives a theoretical introduction to anomaly detection and covers basic techniques such as the z-score and the smoothed z-score.
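The Phase 1 techniques can be sketched in a few lines. A plain z-score flags points far from the global mean, while a smoothed z-score compares each point to a rolling window and damps the influence of detected anomalies on that window. The thresholds and the toy series below are illustrative choices, not values from the workshop:

```python
import numpy as np

def zscore_anomalies(series, threshold=3.0):
    # Flag points more than `threshold` standard deviations from the mean.
    z = (series - series.mean()) / series.std()
    return np.abs(z) > threshold

def smoothed_zscore_anomalies(series, lag=5, threshold=3.0, influence=0.3):
    # Rolling z-score: compare each point to the mean/std of the previous
    # `lag` filtered points; anomalies enter the window with reduced weight.
    filtered = np.array(series[:lag], dtype=float)
    signals = [False] * lag
    for x in series[lag:]:
        window = filtered[-lag:]
        is_anomaly = abs(x - window.mean()) > threshold * window.std()
        signals.append(is_anomaly)
        # Damp anomalous points so they do not drag the rolling statistics.
        smoothed = influence * x + (1 - influence) * filtered[-1] if is_anomaly else x
        filtered = np.append(filtered, smoothed)
    return signals

data = np.array([1.0, 1.1, 0.9, 1.0, 1.05, 1.0, 0.95, 10.0, 1.0, 1.1])
# A lower threshold for the plain z-score: a single outlier inflates the
# global std on such a short series.
print(np.where(zscore_anomalies(data, threshold=2.5))[0])           # [7]
print([i for i, s in enumerate(smoothed_zscore_anomalies(data)) if s])  # [7]
```

Note how the outlier at index 7 inflates the global standard deviation, which is exactly why the smoothed, windowed variant is more robust on streaming data.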
Phase 2 moves on to more advanced machine learning algorithms that can handle multivariate datasets, such as Isolation Forest and Elliptic Envelope.
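Both Phase 2 algorithms are available in scikit-learn with the same inlier/outlier interface. A minimal sketch on synthetic 2-D data (the dataset and the `contamination` value are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(42)
# 200 normal 2-D points around the origin, plus three obvious outliers.
normal = rng.normal(0, 1, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -8.0]])
X = np.vstack([normal, outliers])

# Both estimators predict +1 for inliers and -1 for anomalies;
# `contamination` is the expected share of anomalies in the data.
iso = IsolationForest(contamination=0.02, random_state=0).fit(X)
env = EllipticEnvelope(contamination=0.02, random_state=0).fit(X)

print((iso.predict(X)[-3:] == -1).all())  # the injected outliers are flagged
print((env.predict(X)[-3:] == -1).all())
```

Isolation Forest isolates anomalies with random splits and makes no distributional assumption, while Elliptic Envelope fits a robust Gaussian and flags points with a large Mahalanobis distance; comparing both on the same data is a useful exercise.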
Phase 3 uses an autoencoder neural network to detect anomalies in large multivariate datasets.
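The autoencoder idea is to train a network to reconstruct its input through a narrow bottleneck: normal points reconstruct well, anomalies do not. Production autoencoders are usually built with a deep learning framework; the sketch below abuses scikit-learn's `MLPRegressor` as a tiny linear autoencoder purely to show the reconstruction-error principle, and all data and thresholds are made up for illustration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# "Normal" data: 4 correlated features driven by one latent factor.
base = rng.normal(0, 1, size=(500, 1))
X_train = np.hstack([base, base * 2, base + 0.1, base - 0.1])
X_train += rng.normal(0, 0.05, size=X_train.shape)

# Autoencoder = model trained to reproduce its own input through a
# bottleneck (here 2 hidden units for 4 features).
ae = MLPRegressor(hidden_layer_sizes=(2,), activation="identity",
                  max_iter=2000, random_state=0)
ae.fit(X_train, X_train)

def reconstruction_error(model, X):
    # Per-sample mean squared reconstruction error.
    return np.mean((model.predict(X) - X) ** 2, axis=1)

# Threshold: the 99th percentile of errors on normal training data.
threshold = np.percentile(reconstruction_error(ae, X_train), 99)

X_test = np.array([[1.0, 2.0, 1.1, 0.9],     # follows the learned structure
                   [1.0, -2.0, 5.0, -4.0]])  # violates it -> anomaly
errors = reconstruction_error(ae, X_test)
print(errors > threshold)
```

The second test point breaks the correlation structure the network has learned, so its reconstruction error lands far above the threshold, while the first point reconstructs almost perfectly.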
WSO2, Sri Lanka
There is an emerging need to build event-driven applications efficiently in the current microservices era, but traditional message processing systems are falling behind because they are inflexible in adapting to the cloud. In this session, I will present Siddhi, a 100% open source stream processing system that provides an efficient way of implementing event-driven cloud-native applications; it runs natively on Kubernetes and integrates with various systems such as NATS, Kafka, email, and MongoDB. I will also do a hands-on session to show how you can build streaming data integration, streaming analytics, and machine learning based adaptive intelligence applications within minutes.
Accounts Chamber of the Russian Federation (Счетная палата Российской Федерации), Russia
Many companies are now asking: “What is digital transformation?” “Do we need it?” “If we do, how should we approach it, and where do we start?”
A company’s culture of working with data is one of the most important success factors.
How do you change the culture and step onto the transformation path? You need to correctly identify the key steps, recruit a change team, find “agents of transformation”, identify “quick wins”, back everything with a motivation and training plan, and determine the sources of data, technologies, and products.
The last few years in the sphere of data warehouses (DWH) are best described by one phrase: the game has changed. In contrast to the mono-vendor solutions of the past, the modern data landscape is not represented by a single silver-bullet system, or even several systems from one vendor. A business that wants to gain a competitive advantage from its data is forced to use dozens, if not hundreds, of components and systems, each of which solves its own narrow task effectively.
At the same time, there is a growing tendency to abandon vendor lock-in: companies are increasingly choosing open source solutions. This lets them diversify contractor and vendor risk while opening the door to accumulating in-house expertise and eventually supporting the technologies independently.
Another trend is also becoming more noticeable: more and more companies are choosing clouds over their own infrastructure. A few years ago these were mostly private installations; now the advantage lies with public ones. Each cloud provider brings its own virtualization technologies, networking, and other specifics.
You will learn about models and key patterns for managing distributed applications on Kubernetes. We will describe data transformations using Apache Beam and then run the implemented pipeline in batch and streaming modes using Apache Spark and Apache Flink, respectively. After that, you will implement a pipeline for building and deploying applications on Kubernetes using GitLab CI/CD.
Lab sessions will be held in four stages:
- Preparing the workplace;
- Describing data transformations using Apache Beam;
- Implementing a CI/CD pipeline for managing a streaming application (execution engine – Apache Flink, tools – GitLab CI/CD, Helm, Kubernetes);
- Implementing a CI/CD pipeline for managing a batch application (execution engine – Apache Spark, tools – GitLab CI/CD, Helm, Kubernetes).
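The CI/CD stages above might look roughly like the following `.gitlab-ci.yml` sketch. The job names, Docker images, and chart path are illustrative placeholders, not the workshop's actual configuration; the `$CI_*` variables are GitLab's standard predefined CI variables:

```yaml
# Minimal sketch: build a container image for the Beam pipeline,
# then deploy it to Kubernetes with Helm.
stages:
  - build
  - deploy

build-image:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"

deploy-streaming:
  stage: deploy
  image: alpine/helm:3
  script:
    # Helm templates the Kubernetes manifests; only the image tag changes
    # between pipeline runs.
    - helm upgrade --install streaming-pipeline ./charts/pipeline
      --set image.tag="$CI_COMMIT_SHORT_SHA"
```

A batch variant would differ mainly in the deploy job: instead of a long-running deployment it would submit the job to the batch execution engine.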
All participants will be provided with handouts as well as a set of exercises to consolidate the acquired skills.
You will also have an opportunity to ask questions and discuss how to apply the acquired knowledge and the tools covered to your own practical cases.