07/06/2015
Hadoop & Data Science Course Contents
===========================
New Data Science batches have started for weekdays and weekends.
Module 1 Introduction to Big Data, Hadoop and Data Science
What is Big Data and Hadoop?
Challenges of Big Data
Traditional Approach vs Hadoop
Analytics Value Chain
Business Intelligence & Data Analysis
Data Mining & Machine Learning
Hadoop Architecture & Other Architectures
OLTP vs OLAP
Distributed Model
Block-Structured File System
Technologies supporting Big Data
Replication
Fault Tolerance
Why Hadoop?
Hadoop Ecosystem
Use cases of Hadoop
Fundamental Design Principles of Hadoop
Comparison of Hadoop vs RDBMS
Module 2 Understand Hadoop Cluster Architecture & Map Reduce
Hadoop Cluster & Architecture
5 Daemons
Typical Workflow
Writing Files to HDFS
Reading Files from HDFS
Rack Awareness & Cluster Balancing
Before Map Reduce
Map Reduce Overview
Word Count Problem
Word Count Flow & Solution
Map Reduce Flow
Log Processing and Map Reduce
What is Mapper?
What is Reducer?
What is Shuffling?
Module 3 Advanced Map Reduce Concepts
What is Combiner?
What is Partitioner?
What is Counter?
InputFormats/OutputFormats
Map Join using MR
Reduce Join using MR
MR Distributed Cache
Using sequence files & images with MR
Module 4 Hadoop 2.0 YARN
Hadoop 1.0 Challenges
NameNode (NN) Scalability
NameNode Single Point of Failure (SPOF) & High Availability (HA)
JobTracker Challenges
Hadoop 2.0 New Features
Hadoop 2.0 Cluster Architecture & Federation
Hadoop 2.0 HA
YARN & Hadoop Ecosystem
YARN MR Application Flow
Module 5 PIG
Introduction to Pig
What Is Pig?
Pig’s Features & Pig Use Cases
Interacting with Pig
Basic Data Analysis with Pig
Pig Latin Syntax
Loading Data
Simple Data Types
Field Definitions
Data Output
Viewing the Schema
Filtering and Sorting Data
Commonly Used Functions
Hands-On Exercise: Pig for ETL Processing
Processing Complex Data with Pig
Storage Formats
Complex/Nested Data Types
Grouping
Built-in Functions for Complex Data
Iterating Grouped Data
Hands-On Exercises
Multi-Dataset Operations with Pig
Techniques for Combining Data Sets
Joining Data Sets in Pig
Splitting Data Sets
Hands-On Exercise
Module 6 HIVE
Hive Fundamentals & Architecture
Loading and Querying Data in Hive
Hive Architecture and Installation
Comparison with Traditional Database
HiveQL: Data Types, Operators and Functions
Hive Tables: Managed Tables and External Tables
Partitions and Buckets
Storage Formats, Importing Data, Altering Tables, Dropping Tables
Querying Data, Sorting and Aggregating, Map Reduce Scripts
Joins & Subqueries, Views
When to Use Hive, Impala and Pig
Hands on Exercises
Integration and Data Manipulation with Hive
User-Defined Functions
Appending Data into existing Hive Table
Static partitioning vs dynamic partitioning
Module 7 HBASE
CAP Theorem
HBase Architecture and concepts
Introduction to HBase
Client APIs and Their Features
HBase Tables
The ZooKeeper Service
Data Model, Operations
Programming and Hands on Exercises
Module 8 SQOOP
Introduction to Sqoop
MySQL Client & Server
Connecting to a Relational Database Using Sqoop
Importing Data from MySQL Using Sqoop
Exporting Data to MySQL Using Sqoop
Incremental Append
Importing Data from MySQL to Hive Using Sqoop
Exporting Data from Hive to MySQL Using Sqoop
Importing Data from MySQL to HBase Using Sqoop
Using Queries with Sqoop
Module 9 Flume & Oozie
What is Flume?
Why Use Flume, Architecture, Configurations
Master, Collector, Agent
Twitter Data Analysis project
Module 10 Oozie
What Is Oozie? Architecture, Configurations
Oozie Job Submission
Oozie properties
Hands on exercises
Module 11 Project in Banking Domain
Hadoop Project in Banking Domain
Objective
Problem Definition
Solution
Discuss data sets and specifications of the project.
Module 12 Project in Telecom Domain
Hadoop Project in Telecom Domain
Objective
Problem Definition
Solution
Discuss data sets and specifications of the project.
Module 13 Project in Sentiment Analysis
Hadoop Project in Sentiment Analysis
Objective
Problem Definition
Solution
Discuss data sets and specifications of the project.
Module 14 Project in Sentiment Analysis & ETL Offloading
Hadoop Project in Sentiment Analysis & ETL Offloading
Objective
Problem Definition
Solution
Discuss data sets and specifications of the project.
Module 15 Project in Healthcare
Hadoop Project in Healthcare
Objective
Problem Definition
Solution
Discuss data sets and specifications of the project.