DBTECHNOSOLUTIONS

Total Hours: 40

Big Data - Hadoop Introduction

Understand what Big Data is
Analyze the limitations of existing data analytics architecture and their solutions
Understand what Hadoop is and its features
Hadoop Ecosystem
Understand Hadoop 2.x core components
Perform Read and Write in Hadoop
Understand Rack Awareness concept
Analyze Hadoop 2.x Cluster Architecture – Federation
Analyze Hadoop 2.x Cluster Architecture – High Availability
Run Hadoop in different cluster modes
Implement basic Hadoop commands on Terminal
Prepare Hadoop 2.x configuration files and analyze the parameters in them
Implement Password-less SSH on Hadoop cluster
Analyze the dump of a Map Reduce program
Implement different data loading techniques

Map Reduce and Advanced MR

Analyze different use cases where Map Reduce is used
Difference between the traditional approach and the Map Reduce approach
Learn about Hadoop 2.x Map Reduce Architecture and components
Understand the execution flow of a YARN Map Reduce Application
Data types in Hadoop
Run a Map Reduce Program
Input Formats in Map Reduce
Reduce Side Join
Map Side Join/Replicated Join/Composite Join
In-Memory Map Side Join/Distributed Cache
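
As a rough illustration of the Map Reduce execution flow covered in this module (map, then shuffle/sort, then reduce), the sketch below simulates a word count in plain Python on a single machine. It does not use Hadoop or any Hadoop API; the function names map_phase, shuffle_phase, reduce_phase and word_count are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group all values by key, sorted by key.

    In a real cluster the framework does this between map and reduce;
    here it is simulated in memory (an assumption of this sketch).
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data hadoop", "hadoop map reduce", "big data"]
    print(word_count(sample))
```

Keeping each phase as a separate function mirrors how a Map Reduce job separates mapper and reducer logic from the grouping step that sits between them.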

PIG & HIVE

PIG:

Need of PIG
Why should we use PIG when there is MR
Where not to use PIG
What is PIG
Use cases where PIG is used
PIG – Basic Program Structure
PIG Execution
PIG Latin Program
PIG – Data Model
PIG Latin Operators
PIG UDF

HIVE:

Understand what Hive is and its use cases
Understand Hive Architecture and Hive Components
Analyze limitations of Hive
Implement Primitive and Complex types in Hive
Understand Hive Data Model
Perform basic Hive operations
Execute Hive scripts and Hive UDFs

Advanced HIVE and HBASE

Implement Joins in Hive
Implement Dynamic Partitioning
Analyze Custom Map/Reduce Scripts
Create Hive UDF
Understand NoSQL Databases and HBASE
Analyze difference between HBASE and RDBMS
Understand HBASE Components and Storage Architecture
Analyze HBASE Read and Write
Perform HBASE Cluster Deployment
Understand HBASE Attributes
Understand Data Model and Physical Storage in HBASE
Execute basic commands on HBASE shell
Analyze Data Loading Techniques in HBASE
Implement HBASE API
Understand Zookeeper Data Model and its Services
Analyze relationship between HBASE and Zookeeper
Perform Advanced HBASE Actions
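
To make the HBASE data model topics above more concrete, here is a minimal in-memory sketch in Python. It models a cell as addressed by row key, column (family:qualifier) and timestamp, returns the newest version on read, and scans rows in sorted row-key order. The TinyTable class and its methods are hypothetical and are not the HBASE API; storage, regions and Zookeeper coordination are left out entirely.

```python
from collections import defaultdict


class TinyTable:
    """Toy model of an HBASE-style table (hypothetical, not the HBASE API)."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = set(families)
        # row key -> "family:qualifier" -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        """Store one versioned cell; the family part must already exist."""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][column][ts] = value

    def get(self, row, column):
        """Return the newest version of a cell, or None if it is absent."""
        versions = self.rows.get(row, {}).get(column, {})
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self):
        """Yield (row key, latest cells) in sorted row-key order."""
        for row in sorted(self.rows):
            cells = self.rows[row]
            yield row, {col: vers[max(vers)] for col, vers in cells.items()}


if __name__ == "__main__":
    table = TinyTable(["info"])
    table.put("row2", "info:name", "beta", ts=1)
    table.put("row1", "info:name", "alpha", ts=1)
    table.put("row1", "info:name", "alpha-v2", ts=2)
    print(table.get("row1", "info:name"))
    print([row for row, _ in table.scan()])
```

The sketch sorts Python strings, which matches byte-wise row-key ordering only for plain ASCII keys; that simplification is an assumption of the example.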

SQOOP, FLUME, APACHE OOZIE, HCATALOG and Hadoop Project

Implement Flume and Sqoop

OOZIE

Understand Oozie
Schedule a Job in Oozie
Implement Oozie Workflow
Implement Oozie Coordinator

HCATALOG

Understand HCatalog
How to use HCatalog - Demo

Hardware and Software Requirements

Prerequisites to install Apache Hadoop 2.0 on VMware Player

Minimum 4 GB RAM
Dual Core Processor or above

Follow the link below to install and configure a Hadoop 2.0 cluster on VMware Player in Pseudo Distributed Mode:

http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-apache-hadoop-cluster.html

Follow the link below to install and configure Apache PIG:

http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-apache.html

Follow the link below to install and configure Apache HIVE:

http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-apache_19.html

Follow the link below to install and configure HBASE:

http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-hbase-on.html

Follow the link below to install and configure SQOOP:

http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-sqoop.html