Big Data Days 2022

Conference cancelled

Catalin Ciobanu

CTO & Co-Founder

Boostrs SAS, France

Biography

Catalin is co-founder and CTO of Boost.rs, an AI startup advancing the science of skills to connect people with jobs and learning. Before founding Boost.rs, Catalin led a data and analytics team in the travel industry, generating several award-winning products and research featured in outlets such as Harvard Business Review and The Economist. Before moving to industry, Catalin worked as an experimental particle physicist; his PhD thesis focused on applying AI to low signal-to-noise problems.

Workshop

Data Science Playbook: A Step-By-Step Guide for Your Journey From Data to Insight

Abstract

This workshop covers the do’s and don’ts of working with data, from formulating insightful questions to communicating the results to specialists or a non-technical audience. By the end of the workshop, participants will have an analysis blueprint, complete with Python notebooks, that they can repurpose and apply to their own projects at work.

Contents

  • Part 1: Introduction. What is data science? Three types of problems faced by every data scientist.
    • Learn to recognize the type of problem you are facing.
    • Take an object-oriented approach to data analysis: the O-V-V-D framework.
  • Part 2: Data gathering / cleaning / wrangling.
    • Prepare the data. Avoid the GIGO (garbage in, garbage out) myth.
    • Perform exploratory analysis. Learn guidelines for outlier treatment.
  • Part 3: Feature engineering.
    • Define variable types and variable combinations (linear and non-linear).
    • Perform correlation analysis.
  • Part 4: Multivariate analysis: regression and classification.
    • Perform linear regression. Quantify predictor importance. Learn to incorporate fixed effects.
    • Apply non-linear techniques. Compare the results obtained from different techniques.
  • Part 5: Prediction using multivariate models.
    • Quantify model accuracy.
    • Create a “money plot”. Perform final cross checks and verify the robustness of the results.
  • Part 6: Result visualization and communication.
    • Learn several guidelines for making great charts.
    • Make your results work for you through data marketing and thought leadership.
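The outline does not say which libraries or methods the workshop notebooks use. As a minimal sketch, assuming NumPy and a hypothetical synthetic dataset, the core steps of Parts 2 to 5 (outlier treatment, correlation analysis, linear regression, and accuracy on held-out data) might look like this. The 1.5 × IQR outlier rule, the 80/20 train/test split, and R² as the accuracy metric are illustrative choices, not the workshop's own method, and the helper function names are hypothetical.

```python
import numpy as np

def flag_outliers_iqr(x, k=1.5):
    """Hypothetical helper: mark values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def fit_linear_regression(X, y):
    """Hypothetical helper: ordinary least squares with an intercept column."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta[0], beta[1:]  # (intercept, slope coefficients)

def r_squared(y_true, y_pred):
    """Share of target variance explained by the predictions."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data (assumption): y depends linearly on two features plus noise.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
y[:5] += 20.0  # inject a few gross outliers into the target

# Part 2: flag outliers in the target and drop them.
mask = flag_outliers_iqr(y)
X_clean, y_clean = X[~mask], y[~mask]

# Part 3: Pearson correlation of each feature with the target.
corr = [np.corrcoef(X_clean[:, j], y_clean)[0, 1] for j in range(X_clean.shape[1])]

# Parts 4 and 5: fit on a training split, then score on the held-out split.
idx = rng.permutation(len(y_clean))
split = int(0.8 * len(idx))
train, test = idx[:split], idx[split:]
b0, b = fit_linear_regression(X_clean[train], y_clean[train])
y_hat = b0 + X_clean[test] @ b
score = r_squared(y_clean[test], y_hat)

print("correlations:", np.round(corr, 2))
print("intercept:", round(b0, 2), "coefficients:", np.round(b, 2))
print("held-out R^2:", round(score, 3))
```

Scoring on data the model never saw during fitting is what makes the accuracy figure in Part 5 meaningful; scoring on the training rows would overstate it.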

Target audience

The target audience includes data scientists, aspiring data scientists, and anyone interested in working with data in a business environment.

Prerequisites

    Software: an Anaconda environment with Python 3.7 or later.

    Technical knowledge:

    • Basic programming in Python or another language
    • Basic statistics: calculating averages and standard deviations, plotting basic histograms

    Professional experience: participants occupy or have occupied a data-related role, preferably in a business setting.
