Big Data – Hadoop 2.x Java Developer

Total Hours: 30

At the completion of the course, students will be able to:

- Describe Hadoop 2.x and the Hadoop Distributed File System (HDFS)
- Describe the YARN framework
- Develop and run a Java MapReduce application on YARN
- Use combiners and in-map aggregation
- Write a custom partitioner to avoid data skew on reducers
- Perform a secondary sort
- Recognize use cases for built-in input and output formats
- Write a custom MapReduce input and output format
- Optimize a MapReduce job
- Configure MapReduce to optimize mappers and reducers
- Develop a custom RawComparator class
- Distribute files as LocalResources
- Describe and perform join techniques in Hadoop
- Perform unit tests using the MRUnit API
- Describe the basic architecture of HBase
- Write an HBase MapReduce application
- List use cases for Pig and Hive
- Write a simple Pig script to explore and transform big data
- Write a Pig UDF (User-Defined Function) in Java
- Write a Hive UDF in Java
- Use Oozie to create a MapReduce workflow
- Use Oozie to define and schedule workflows

Hardware and Software Requirements

Prerequisites for installing Apache Hadoop 2.0:

- VMware Player
- Minimum 4 GB RAM
- Dual-core processor or above

Use the links below to install and configure each component.

Hadoop 2.0 cluster on VMware Player in pseudo-distributed mode:
http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-apache-hadoop-cluster.html

Apache Pig:
http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-apache.html

Apache Hive:
http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-apache_19.html

HBase:
http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-hbase-on.html

Sqoop:
http://www.sqldatabaseconsulting.blogspot.in/2015/06/steps-to-install-and-configure-sqoop.html