Total Hours: 40

Big Data - Hadoop Introduction

  • Understand what Big Data is
  • Analyze limitations and solutions of existing Data Analytics Architecture
  • Understand what Hadoop is and its features
  • Hadoop Ecosystem
  • Understand Hadoop 2.x core components
  • Perform Read and Write in Hadoop
  • Understand Rack Awareness concept
  • Analyze Hadoop 2.x Cluster Architecture – Federation
  • Analyze Hadoop 2.x Cluster Architecture – High Availability
  • Run Hadoop in different cluster modes
  • Implement basic Hadoop commands on Terminal
  • Prepare Hadoop 2.x configuration files and analyze the parameters in them
  • Implement Password-less SSH on Hadoop cluster
  • Analyze dump of a MapReduce program
  • Implement different data loading techniques

MapReduce and Advanced MR

  • Analyze different use cases where MapReduce is used
  • Difference between the traditional approach and the MapReduce approach
  • Learn about Hadoop 2.x MapReduce Architecture and components
  • Understand execution flow of a YARN MapReduce Application
  • Data types in Hadoop
  • Run a MapReduce Program
  • Input Formats in MapReduce
  • Reduce Side Join
  • Map Side Join/Replicated Join/Composite Join
  • In-Memory Map Side Join/Distributed Cache
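
As a rough illustration of the MapReduce approach covered in this module, the sketch below simulates the map, shuffle, and reduce phases of a word count in plain Python. It does not use the Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, which the framework
    does between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines of a file in HDFS.
    sample = ["big data hadoop", "hadoop map reduce", "big data"]
    print(word_count(sample))
```

On a real cluster the mapper and reducer would run as separate tasks on different nodes and the shuffle would be handled by the framework; here all three steps run in a single process purely to show the data flow.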

PIG & HIVE

PIG:

  • Need for PIG
  • Why use PIG when MapReduce already exists
  • Where not to use PIG
  • What is PIG
  • Use cases where PIG is used
  • PIG – Basic Program Structure
  • PIG Execution
  • PIG Latin Program
  • PIG – Data Model
  • PIG Latin Operators
  • PIG UDF

HIVE:

  • Understand what Hive is and its use cases
  • Understand Hive Architecture and Hive Components
  • Analyze limitations of Hive
  • Implement Primitive and Complex types in Hive
  • Understand Hive Data Model
  • Perform basic Hive operations
  • Execute Hive scripts and Hive UDFs

Advanced HIVE and HBASE

  • Implement Joins in Hive
  • Implement Dynamic Partitioning
  • Analyze Custom Map/Reduce Scripts
  • Create Hive UDF
  • Understand NoSQL Databases and HBASE
  • Analyze difference between HBASE and RDBMS
  • Understand HBASE Components and Storage Architecture
  • Analyze HBASE Read and Write
  • Perform HBASE Cluster Deployment
  • Understand HBASE Attributes
  • Understand Data Model and Physical Storage in HBASE
  • Execute basic commands on HBASE shell
  • Analyze Data Loading Techniques in HBASE
  • Implement HBASE API
  • Understand Zookeeper Data Model and its Services
  • Analyze relationship between HBASE and Zookeeper
  • Perform Advanced HBASE Actions

SQOOP, FLUME, APACHE OOZIE, HCATALOG and Hadoop Project

  • Implement Flume and Sqoop

OOZIE

  • Understand Oozie
  • Schedule a Job in Oozie
  • Implement Oozie Workflow
  • Implement Oozie Coordinator

HCATALOG

  • Understand HCatalog
  • How to use - Demo

Hardware and Software Requirements

Prerequisites to install Apache Hadoop 2.0 on VMware Player:
  • Minimum 4 GB RAM
  • Dual Core Processor or above
Follow the link below to install and configure a Hadoop 2.0 cluster on VMware Player in pseudo-distributed mode:
http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-apache-hadoop-cluster.html
Follow the link below to install and configure Apache PIG:
http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-apache.html
Follow the link below to install and configure Apache HIVE:
http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-apache_19.html
Follow the link below to install and configure HBASE:
http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-hbase-on.html
Follow the link below to install and configure SQOOP:
http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-sqoop.html