Big Data Days 2019

8-10 October, Moscow

Confirmed Talks

Talks confirmed so far

Constant Bridon

OCTO Technology, France

Talk

On-Board Artificial Intelligence: Train, Deploy and Use Deep Learning on an Edge Device, a Raspberry Pi

Machine learning applications are new to the software development landscape and tend to be hard to build. As Google noted in an article, this is mainly because the application is much broader than the model itself. Surprisingly, though, machine learning applications follow a double Pareto law. On the one hand, 80% of the time spent building those applications deals with machine learning problems, whereas the remaining 20% of the time is spent on…

Deep Learning
Deployment
On Board
Production

Guglielmo Iozzia

MSD, Ireland

Talk

Distributed Deep Learning with Keras and TensorFlow on Apache Spark

DeepLearning4J is an open source distributed framework for deep learning on the JVM. It can import Python (Keras and TensorFlow) models so that they can be trained in a distributed fashion on Apache Spark. The talk walks through the reasons for doing distributed deep learning of Python models in a JVM-based environment and the details of productionizing this process.

Distributed Deep Learning
Apache Spark
DL4J
Java
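For the DeepLearning4J talk above, the core idea of data-parallel training can be illustrated with a minimal Python sketch of synchronous parameter averaging, one of the strategies DL4J offers for training on Spark. It is a toy under stated assumptions: a one-variable linear model stands in for a deep network, plain lists stand in for Spark partitions, and the function names (`local_sgd`, `parameter_averaging`) are hypothetical, not DL4J or Spark APIs.

```python
"""Toy synchronous parameter averaging (data-parallel SGD).

Assumptions: a linear model y = w*x + b stands in for a deep network and
plain Python lists stand in for Spark partitions. Hypothetical names only;
this is not the DL4J or Spark API.
"""
import random


def local_sgd(params, partition, lr=0.01, epochs=5):
    """Run a few epochs of SGD on one partition, starting from the shared params."""
    w, b = params
    for _ in range(epochs):
        for x, y in partition:
            err = (w * x + b) - y   # prediction error on this example
            w -= lr * 2 * err * x   # gradient of squared error w.r.t. w
            b -= lr * 2 * err       # gradient of squared error w.r.t. b
    return w, b


def parameter_averaging(partitions, rounds=20):
    """Each round, every worker trains locally, then the driver averages the results."""
    params = (0.0, 0.0)
    for _ in range(rounds):
        results = [local_sgd(params, p) for p in partitions]  # like a Spark "map"
        params = (
            sum(w for w, _ in results) / len(results),        # like a "reduce"
            sum(b for _, b in results) / len(results),
        )
    return params


if __name__ == "__main__":
    rng = random.Random(0)
    # Noise-free synthetic data from y = 3x + 1, split across 4 simulated workers.
    data = [(x, 3.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(400))]
    partitions = [data[i::4] for i in range(4)]
    w, b = parameter_averaging(partitions)
    print(f"w={w:.2f}, b={b:.2f}")  # expected to be close to w=3, b=1
```

Each round mirrors a map step (local training on every partition) followed by a reduce step on the driver (averaging). How often to average is a tuning knob: averaging more often keeps workers closer together but costs more network traffic.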

Yulia Stolin

Outbrain, Israel

Talk

Realtime Data Pipelines Using Spark Streaming

At Outbrain, we serve billions of personalised recommendations.
Our serving ML models were built on top of batch ELT flows.
However, near-real-time inputs are extremely important in our business.
In this session, I will present our journey from batch-based to real-time analytics.

Big Data
Spark Streaming
Kafka
Lambda Architecture
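The batch-to-streaming shift described in the Outbrain talk can be pictured with a small Python sketch of micro-batching, the processing model Spark Streaming uses: events are grouped into short fixed intervals and each batch incrementally updates running state. The `(timestamp, item_id)` click events, the 10-second interval and the per-item counts are assumptions for illustration, not Outbrain's pipeline.

```python
"""Toy micro-batch stream processing with running state.

Assumptions: events are (timestamp_seconds, item_id) clicks already read from
a source such as a Kafka topic; the interval and the per-item click count
are illustrative, not Outbrain's pipeline.
"""
from collections import Counter, defaultdict


def micro_batches(events, interval):
    """Group events into fixed-length intervals; intervals with no events are skipped."""
    batches = defaultdict(list)
    for ts, item in events:
        batches[int(ts // interval)].append(item)
    for batch_id in sorted(batches):  # emit batches in time order
        yield batch_id, batches[batch_id]


def running_counts(events, interval=10):
    """Update per-item click counts after every micro-batch (stateful streaming)."""
    state = Counter()
    snapshots = []
    for batch_id, items in micro_batches(events, interval):
        state.update(items)                        # fold this batch into the state
        snapshots.append((batch_id, dict(state)))  # what a serving layer could read
    return snapshots


events = [(1, "a"), (3, "b"), (12, "a"), (15, "a"), (27, "b")]
print(running_counts(events, interval=10))
# [(0, {'a': 1, 'b': 1}), (1, {'a': 3, 'b': 1}), (2, {'a': 3, 'b': 2})]
```

Unlike a batch ELT job that periodically recomputes everything, each snapshot here becomes available as soon as its interval closes.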

Diego Hueltes

RavenPack, Spain

Talk

Data Science for Lazy People, Automated Machine Learning

Data science is fun, right? Data cleaning, feature selection, feature preprocessing, feature construction, model selection, parameter optimization, model validation... oh wait, are you sure? What about automating 80% of the work while making even better choices than you would? Automated Machine Learning has arrived as your personal assistant in data science.

Machine Learning
Automated Machine Learning
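The automation promised in the AutoML talk boils down to searching over models and hyperparameters and scoring each candidate by cross-validation. The minimal Python sketch below assumes 1-D regression data, two hand-written model families (a constant baseline and k-nearest neighbours) and exhaustive grid search; all names are hypothetical, and real AutoML tools search far larger spaces with smarter strategies.

```python
"""A tiny AutoML loop: pick a model and hyperparameters by k-fold cross-validation.

Assumptions: 1-D regression data, two hypothetical model families and
exhaustive search over a small, hand-written search space.
"""
import statistics


def fit_mean(train, **_):
    """Baseline model: always predict the mean target of the training set."""
    mean_y = statistics.fmean(y for _, y in train)
    return lambda x: mean_y


def fit_knn(train, k):
    """k-nearest-neighbours regression: average the targets of the k closest points."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean(y for _, y in nearest)
    return predict


# Search space: (name, fitting function, hyperparameters).
SEARCH_SPACE = [("mean", fit_mean, {})] + [
    ("knn", fit_knn, {"k": k}) for k in (1, 3, 5, 9)
]


def cv_mse(data, fit, params, folds=5):
    """Mean squared error over k folds (fold i holds every folds-th point)."""
    errors = []
    for i in range(folds):
        test = data[i::folds]
        train = [p for j, p in enumerate(data) if j % folds != i]
        model = fit(train, **params)
        errors.extend((model(x) - y) ** 2 for x, y in test)
    return statistics.fmean(errors)


def auto_ml(data):
    """Score every candidate and return (cv_error, name, params) of the best one."""
    scored = [(cv_mse(data, fit, params), name, params)
              for name, fit, params in SEARCH_SPACE]
    return min(scored, key=lambda s: s[0])


data = [(x / 10, (x / 10) ** 2) for x in range(50)]  # y = x^2 on [0, 4.9]
print(auto_ml(data))  # the winner should be one of the "knn" settings
```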

Nenad Bozic

SmartCat, Serbia

Talk

What It Takes to Build a Production-Ready AI Solution

We are a data company that helps other companies build AI solutions. Our team blends data scientists and data engineers, which lets us examine from different angles how the next big AI module will be integrated into your platform. This is the main reason we can claim more than 10 AI solutions in production, developed over the last 3 years.

AI Solution
Production Ready
Human in the Loop
Success Criteria
Exploratory Analysis

Kelly Schlamb

IBM Canada Ltd., Canada

Talk

Data Science & AI: Infrastructure Matters

Artificial intelligence is increasingly seen as a competitive advantage, and every company is racing to be part of this revolution. However, on its own, AI faces a steep time-to-value curve. For example, do you have the right data? Do you have the right skills? Can everyone participate (typically, AI is in the hands of a privileged few, and companies struggle to democratize it for the many)? But there is one often overlooked…

Infrastructure
Data Science Power

Valentina Djordjevic

Things Solver, Serbia

Talk

Breaking Out the Lead Scoring Algorithm

A lead is any individual who may become a client because they have shown interest in the products or services a company offers.
Lead generation is the process of collecting leads in order to manage sales channels more efficiently, raise brand awareness and increase profits. Lead scoring involves assigning certain weights to each potential…

Lead Generation
Lead Scoring
Data Science
Optimization
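Since the lead scoring abstract describes assigning weights to each lead, here is a minimal rule-based sketch of that idea in Python. Every signal name, weight and the qualification threshold is hypothetical; the talk's actual algorithm is not described in the abstract, and a data-driven approach would learn such weights from historical conversions instead.

```python
"""Rule-based lead scoring: a weighted sum of the signals a lead shows.

All signal names, weights and the threshold are hypothetical.
Requires Python 3.9+ for built-in generic annotations.
"""
from dataclasses import dataclass, field

# Hypothetical weights (summing to 100): how strongly each signal suggests intent.
WEIGHTS = {
    "requested_demo": 50,
    "visited_pricing_page": 25,
    "downloaded_whitepaper": 15,
    "opened_last_email": 10,
}
QUALIFIED_THRESHOLD = 50  # hypothetical cut-off on the 0-100 scale


@dataclass
class Lead:
    name: str
    signals: set[str] = field(default_factory=set)


def score(lead: Lead) -> int:
    """Sum the weights of the known signals this lead has shown (0-100)."""
    return sum(w for signal, w in WEIGHTS.items() if signal in lead.signals)


def is_qualified(lead: Lead) -> bool:
    """A lead is handed to sales once its score reaches the threshold."""
    return score(lead) >= QUALIFIED_THRESHOLD


def rank_leads(leads: list[Lead]) -> list[tuple[int, str]]:
    """Order leads by score, highest first, so sales can contact them in that order."""
    return sorted(((score(lead), lead.name) for lead in leads), reverse=True)


ana = Lead("Ana", {"requested_demo", "visited_pricing_page"})
ben = Lead("Ben", {"opened_last_email"})
print(rank_leads([ben, ana]))  # [(75, 'Ana'), (10, 'Ben')]
```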

Olga Petrova

Sencha, Germany

Talk

Visual ML with TensorFlow.js

The advantages of performing machine learning (ML) in the browser with TensorFlow.js include, but are not limited to, keeping users' data private, requiring no installation and having access to device sensors. Another important point is the rich set of JavaScript tools for interactive visualizations and UIs. These let us look inside the model training process and reinforce ML in the browser.

Visualization
TensorFlow.js
Machine Learning

Sonya Liberman

Outbrain, Israel

Talk

From Spark to Elasticsearch and Back - Learning Large Scale Models for Content Recommendation

Serving tens of billions of personalized recommendations a day within a 30-millisecond latency budget is a challenge. In this talk, I'll share our algorithmic architecture, including its Spark-based offline layer and its Elasticsearch-based serving layer, which make it possible to run complex models under difficult scale constraints and shorten the cycle between research and production.

Machine Learning
Spark
Elasticsearch
Recommender System

Miel Hostens

Utrecht University, The Netherlands

Talk

Predicting the Moment of Calving in Dairy Cows Using Time Series Analysis in Apache Spark

Stillbirth, defined as calves that die during an unobserved birth, is often seen as an indicator of lowered animal welfare in dairy cows. Sensors have been proposed as a tool to support dairy farmers, but accurate calving prediction models are often lacking. In this session, a machine learning data pipeline built with the Spark ML framework will be described. Heavy lifting and feature preparation for sensor data from 1331 cows in 8 herds, covering the period from 21 days before calving until the day of calving, was performed using sliding windows and time series analysis. In total, a set of 100+ features was used in a random forest classification model, trained and tested using different cross-validation approaches.
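The sliding-window feature preparation mentioned above can be sketched in plain Python for a single sensor signal. The daily readings, the 3-day window and the three features (mean, standard deviation, day-over-day change) are assumptions for illustration; in Spark the same computation would typically be expressed with window functions over a `Window.partitionBy(...).orderBy(...).rowsBetween(...)` specification.

```python
"""Sliding-window feature extraction for per-cow sensor time series.

Assumptions: one reading per cow per day, already ordered by day; the
window size and the three features are illustrative choices.
"""
import statistics


def window_features(readings, size=3):
    """Return one feature dict per day that has a full trailing window behind it."""
    if size < 2:
        raise ValueError("size must be at least 2 to compute a day-over-day change")
    features = []
    for end in range(size - 1, len(readings)):
        window = readings[end - size + 1 : end + 1]  # trailing window ending today
        features.append({
            "day_index": end,
            "mean": statistics.fmean(window),
            "std": statistics.pstdev(window),
            "delta": readings[end] - readings[end - 1],  # change vs. previous day
        })
    return features


def features_per_cow(series_by_cow, size=3):
    """Apply the same windowing independently to every cow's series."""
    return {cow: window_features(values, size) for cow, values in series_by_cow.items()}


# Hypothetical daily readings for one cow; a sharp drop shows up in "delta".
for row in window_features([500, 510, 490, 400]):
    print(row)
```

Computing the windows per cow keeps one animal's history from leaking into another's features; the same concern applies when choosing cross-validation folds.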

Milos Milovanovic

Things Solver, Serbia

Talk

Who Needs Data Governance?

With the rapid development of advanced analytics and tight project deadlines imposed by business units, data management and data governance are often set aside. This leads to an unconsolidated, decentralized approach to analytics projects in which the organization lacks an overall view of its complete business processes. Although some benefits of data science projects are achieved even in these circumstances…

Big Data
Data Management
Data Governance
Advanced Analytics

Sriskandarajah Suhothayan

WSO2, Sri Lanka

Talk

A Head Start on Cloud-Native Event-Driven Applications

There is an emerging need to build event-driven applications efficiently in the current microservices era, but traditional message processing systems are falling behind because they do not adapt flexibly to the cloud. In this talk, I will present Siddhi, a 100% open source stream processing system that provides an efficient way of implementing event-driven cloud-native applications…

Stream Processing
Kubernetes
Cloud Native
Event Driven

David Pilato

Elastic, France

Talk

Managing Your Black Friday Logs

Monitoring an entire application is not a simple task, but with the right tools it is not a hard one either. However, events like Black Friday can push your application to its limits and even cause crashes. When the system is stressed, it generates many more logs, which may crash the monitoring system as well. In this talk, I will walk through best practices for using the Elastic Stack to centralize and monitor your logs. I will also share some tricks to help you cope with the huge increase in traffic typical of Black Friday.
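One common way to absorb a traffic spike is to send logs to Elasticsearch in batches through the `_bulk` API rather than one request per event; shippers such as Logstash and Beats batch events for you. The Python sketch below only builds the newline-delimited JSON request bodies; the index name, batch size and log fields are assumptions, and actually sending them (`POST /_bulk` with `Content-Type: application/x-ndjson`) is left out to keep it self-contained.

```python
"""Build Elasticsearch _bulk request bodies from log events, in batches.

Sketch only: the index name, batch size and log fields are assumptions,
and no HTTP request is made.
"""
import json
from itertools import islice


def bulk_bodies(events, index, batch_size=500):
    """Yield NDJSON strings, each holding up to `batch_size` index operations."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        lines = []
        for event in batch:
            lines.append(json.dumps({"index": {"_index": index}}))  # action line
            lines.append(json.dumps(event))  # document line; json.dumps escapes newlines
        # The bulk API expects the body to end with a newline.
        yield "\n".join(lines) + "\n"


logs = [{"level": "ERROR", "msg": "timeout"}, {"level": "INFO", "msg": "ok"}]
for body in bulk_bodies(logs, "logs", batch_size=2):
    print(body, end="")
```

Larger batches mean fewer requests during a spike but more memory and latency per request, so the batch size is worth tuning under realistic load.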
