Big Data Days 2021

Online Edition

September 28-30

Online

Lidor Gerstel

DevOps Cloud Architect

Israel, Centerity

Biography

DevOps Team Leader and experienced trainer with a demonstrated history of leading CI/CD and Big Data projects in the industry. Skilled in Kubernetes, Hadoop, AWS, Docker, and Jenkins. AWS Certified Solutions Architect.

Workshop

Spark and Hadoop

Abstract

The workshop covers the basic concepts of Hadoop, focusing on the Cloudera stack: using HBase and Impala to query data and using Spark to stream data. We will then launch a Cloudera QuickStart environment and work with datasets of top-rated movies, analyzing and querying the data with Hadoop while explaining and demonstrating MapReduce concepts and RDD partitioning in Spark.
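As a rough preview of the MapReduce idea mentioned above, here is a minimal pure-Python sketch that computes an average rating per movie in three phases: map, shuffle, and reduce. It does not use Hadoop or Spark. The sample data and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `average_ratings`) are assumptions made up for illustration and are not taken from the workshop material.

```python
from collections import defaultdict

# Hypothetical sample data: (movie title, rating) pairs, invented for this sketch.
RATINGS = [
    ("Movie A", 5.0),
    ("Movie B", 3.0),
    ("Movie A", 4.0),
    ("Movie C", 2.0),
    ("Movie B", 4.0),
]


def map_phase(records):
    """Emit one (key, value) pair per record: title -> (rating, count of 1)."""
    for title, rating in records:
        yield title, (rating, 1)


def shuffle_phase(pairs):
    """Group all values that share a key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse the values of each key into a single average rating."""
    result = {}
    for title, values in grouped.items():
        total = sum(rating for rating, _ in values)
        count = sum(n for _, n in values)
        result[title] = total / count
    return result


def average_ratings(records):
    """Run the three phases in order on an in-memory list of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(average_ratings(RATINGS))
```

In a real cluster the map and reduce steps run in parallel on different machines and the shuffle moves data over the network. Here everything runs in one process so that the data flow is easy to follow.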

Contents

  • Part 1: Introduction to Hadoop and MapReduce:
    • Hadoop Distributions
    • Hadoop vs. Traditional Data Storage
    • Working with HDFS
    • Basic commands
    • Architecture
  • Part 2: Hive and HBase:
    • HiveQL
    • Hive data types
    • HBase data model
    • HBase vs RDBMS
    • Client API and REST
  • Part 3: Apache Spark (PySpark):
    • Basics and RDD
    • Caching & Modules
    • Spark Streaming
    • Spark SQL
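
To give a feel for the partitioning topic in Part 3, the following pure-Python sketch places keyed records into a fixed number of partitions by hashing the key and taking the remainder. This is the general idea behind hash partitioning, not Spark's actual implementation or API. The helper names (`partition_for`, `partition_records`), the use of CRC32 as the hash, and the sample records are all assumptions for illustration.

```python
import zlib


def partition_for(key, num_partitions):
    """Pick a partition index from a stable hash of the key.

    CRC32 is used instead of the built-in hash() because string hashes
    in Python vary between interpreter runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(records, num_partitions):
    """Distribute (key, value) records into num_partitions lists."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


if __name__ == "__main__":
    # Hypothetical sample records: (movie title, rating).
    sample = [("Movie A", 5.0), ("Movie B", 3.0), ("Movie A", 4.0), ("Movie C", 2.0)]
    for index, part in enumerate(partition_records(sample, 3)):
        print(index, part)
```

The property that matters is that every record with the same key lands in the same partition, so per-key work such as the reduce step can run on one partition without consulting the others.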

Target audience

Entry-level Big Data practitioners, DBAs, and BI engineers; familiarity with open-source systems.

Course prerequisites

Installations:

  • Docker installed on Linux: sudo apt-get install docker.io
  • Download the Cloudera QuickStart image: docker pull cloudera/quickstart:latest
  • Start the Cloudera stack container:

docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 8888 -p 80 -p 7180 -d <Name of the Image> /usr/bin/docker-quickstart
