DBTECHNOSOLUTIONS

Big Data – Hadoop 2.x Java Developer

Total Hours: 30

On completing the course, students will be able to:

Describe Hadoop 2.x and the Hadoop Distributed File System (HDFS)
Describe the YARN framework
Develop and run a Java MapReduce application on YARN
Use combiners and in-map aggregation
Write a custom partitioner to avoid data skew on reducers
Perform a secondary sort
Recognize use cases for built-in input and output formats
Write a custom MapReduce input and output format
Optimize a MapReduce job
Configure MapReduce to optimize mappers and reducers
Develop a custom RawComparator class
Distribute files as LocalResources
Describe and perform join techniques in Hadoop
Perform unit tests using the MRUnit API
Describe the basic architecture of HBase
Write an HBase MapReduce application
List use cases for Pig and Hive
Write a simple Pig script to explore and transform big data
Write a Pig UDF (User-Defined Function) in Java
Write a Hive UDF in Java
Use Oozie to create a MapReduce workflow
Use Oozie to define and schedule workflows
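
As an illustration of the in-map aggregation objective above, the following is a minimal sketch in plain Java, assuming a word-count style job. It uses no Hadoop classes: the class name InMapAggregationSketch and its methods are hypothetical, and map() and cleanup() only mirror the roles of the same-named methods on a Hadoop Mapper. The idea shown is that the mapper accumulates counts in a per-task buffer and emits one pair per distinct word at the end, rather than one pair per occurrence.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of in-map aggregation; no Hadoop classes are used.
// The class and method names are hypothetical and chosen for illustration only.
public class InMapAggregationSketch {

    // Plays the role of the per-task buffer a mapper would hold as a field.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // Analogous to map(): called once per input line.
    // It updates the buffer instead of emitting a (word, 1) pair per occurrence.
    public void map(String line) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    // Analogous to cleanup(): called once after all input has been seen.
    // It returns one (word, total) pair per distinct word.
    public Map<String, Integer> cleanup() {
        return new LinkedHashMap<>(counts);
    }

    // Drives the sketch over a small in-memory "input split".
    public static Map<String, Integer> run(List<String> lines) {
        InMapAggregationSketch mapper = new InMapAggregationSketch();
        for (String line : lines) {
            mapper.map(line);
        }
        return mapper.cleanup();
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c")));
    }
}
```

In a real job the buffer would live in the mapper instance and be flushed to the output context, and its memory use would need to be bounded; a combiner reaches a similar reduction in shuffled data without the mapper holding state.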

Hardware and Software Requirements

Prerequisites for installing Apache Hadoop 2.0 on VMware Player:

Minimum 4 GB RAM
Dual-core processor or above

Follow the link below to install and configure a Hadoop 2.0 cluster on VMware Player in pseudo-distributed mode:

http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-apache-hadoop-cluster.html

Follow the link below to install and configure Apache Pig:

http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-apache.html

Follow the link below to install and configure Apache Hive:

http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-apache_19.html

Follow the link below to install and configure HBase:

http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-hbase-on.html

Follow the link below to install and configure Sqoop:

http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-sqoop.html