Big Data Days 2021

Online Edition

September 28-30


Lidor Gerstel

DevOps-Cloud Architect

Israel, Centerity


DevOps team leader and experienced trainer with a demonstrated history of leading CI/CD and Big Data projects in the industry. Skilled in Kubernetes, Hadoop, AWS, Docker, and Jenkins. Certified as an AWS Solutions Architect.


Spark and Hadoop


The workshop covers the basic concepts of Hadoop, mostly within the Cloudera stack: querying data with HBase and Impala, and streaming data with Spark. We will then launch a Cloudera QuickStart environment and use datasets of top-rated movies to analyze and query data with Hadoop, while explaining and demonstrating MapReduce concepts and RDD partitioning in Spark.
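To give a feel for the MapReduce idea mentioned above, here is a minimal sketch in plain Python (no Hadoop or Spark required). It is only an illustration: the movie titles and ratings are made-up placeholder data, not the workshop's dataset, and the function names `map_phase`, `shuffle_phase`, and `reduce_phase` are hypothetical helpers chosen for this example.

```python
from collections import defaultdict

# Hypothetical (movie_title, rating) records; not the workshop's actual dataset.
ratings = [
    ("Movie A", 5),
    ("Movie B", 3),
    ("Movie A", 4),
    ("Movie B", 4),
    ("Movie C", 2),
]

def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for title, rating in records:
        yield title, rating

def shuffle_phase(pairs):
    """Group all values that share a key (done by the framework between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's list of values into one result: the average rating."""
    return {key: sum(values) / len(values) for key, values in groups.items()}

average_ratings = reduce_phase(shuffle_phase(map_phase(ratings)))

# Sort movies from highest to lowest average rating.
top_rated = sorted(average_ratings.items(), key=lambda kv: kv[1], reverse=True)
print(top_rated)
```

In a real cluster the map and reduce steps run in parallel on different nodes and the shuffle moves data across the network; this single-process version only shows the data flow.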


  • Part 1: Introduction to Hadoop and MapReduce:
    • Hadoop Distributions
    • Hadoop vs. Traditional Data Storage
    • Working with HDFS
    • Basic commands
    • Architecture
  • Part 2: Hive and HBase:
    • HiveQL
    • Hive Data types
    • HBase data model
    • HBase vs RDBMS
    • Client API and REST
  • Part 3: Apache Spark (PySpark):
    • Basics and RDD
    • Caching & Modules
    • Spark Streaming
    • Spark SQL

Target Audience

Entry-level Big Data practitioners, DBAs, and BI engineers who are familiar with open-source systems

Course Prerequisites


  • Docker installed on Linux: sudo apt-get install
  • Download the Cloudera QuickStart image: docker pull cloudera/quickstart:latest
  • Start the Cloudera stack container:

docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 8888 -p 80 -p 7180 -d <Name of the Image> /usr/bin/docker-quickstart
