Hadoop Developer Course Content
Hadoop Admin Course Content
Introduction to Big Data and Hadoop
- What is Big Data
- Why Big Data is needed
- Big Data characteristics
- How to store and process Big Data
- What is Hadoop
- Why Hadoop
- Hadoop history
- Software and hardware requirements for Hadoop
- Real-world Hadoop use case
- Major Components of Hadoop
- Hadoop ecosystem projects
- Scope of Hadoop
- Hadoop distributions
- Hadoop installation modes
Hadoop Distributed File System (HDFS)
- Introduction to HDFS
- Why HDFS
- HDFS Commands
- Regular file system vs. Hadoop Distributed File System
- HDFS master/slave architecture
- Daemons in Hadoop
- HDFS concepts: blocks, NameNode, Secondary NameNode, and DataNode
- HDFS File reads
- HDFS File writes
- Fault Tolerance
- Details on network topologies: nodes, clusters, and racks
- Details about the heartbeat mechanism
- Details on the rack awareness file
- HDFS Federation
- High Availability of the NameNode
- Hadoop Archive (HAR) files
- DistCp usage
- Assignment and Interview Questions on the HDFS module
MapReduce
- Introduction to MapReduce framework
- MapReduce Architecture
- MapReduce execution phases
- Details on input splits, mappers, shuffle and sort, and reducers
- Eclipse plug-in installation
- My first MapReduce program
- In-depth knowledge of combiners
- Details on ToolRunner
- Partitioner
- Real-world use cases for writing MapReduce programs
Advanced MapReduce
- Counters
- Real-world use case on counters
- Secondary sorting
- Map-side joins
- Reduce-side joins
- Classic MapReduce and YARN
- Details on ResourceManager, ApplicationMaster, NodeManager, and containers
- Performance Tuning features
- Hadoop Streaming
- Hadoop Pipes
- File input/output formats in MapReduce
- Distributed cache
- Assignment and Interview Questions on MapReduce and advanced MapReduce
Hive
- Introduction to Hive
- Hive Architecture
- Difference between HQL and SQL
- Installation of Hive
- In-depth knowledge of managed tables and external tables
- Hive Data types
- Creating, altering, and dropping tables in Hive
- Hive multi-table inserts
- Partitions in Hive with a real-world example
- Bucketing in Hive with a real-world example
- Hive storage formats
- Joins in Hive
- Hive Indexes
- Hive Views
- Hive UDF
- Assignment and Interview Questions on Hive
Pig
- Introduction to Pig
- Details on the Pig data flow engine
- MapReduce Vs Hive Vs Pig
- When to use Pig
- Data types in Pig
- Modes of execution in Pig
- Pig programming
- Pig Execution models
- Operators in Pig
- Pig UDF
- Assignment and Interview Questions on Pig
HBase
- Introduction to HBase
- Basic configurations of HBase
- Fundamentals of HBase
- HBase Data Model
- HBase architecture
- SQL vs. NoSQL
- HDFS vs. HBase
- Client-side buffering or bulk uploads
- HBase Operations
- Assignment and Interview Questions on HBase
Sqoop
- Introduction to Sqoop
- Architectural differences between Sqoop and Sqoop 2
- Sqoop Import
- Sqoop Incremental Import
- Sqoop import-all-tables
- Sqoop Export
- Sqoop Jobs
- Real-world example of import/export between an RDBMS (MySQL) and Hadoop
Flume
- Introduction to Flume
- Architecture of Flume
- In-depth coverage of Flume agents
- Real-time data ingestion from Twitter into Hadoop using Flume
- Assignment and Interview Questions
Hadoop Admin Course Content
Introduction to Big Data and Hadoop
- What is Big Data
- Why Big Data is needed
- Big Data characteristics
- How to store and process Big Data
- What is Hadoop
- Why Hadoop
- Hadoop history
- Software and hardware requirements for Hadoop
- Real-world Hadoop use case
- Major Components of Hadoop
- Hadoop ecosystem projects
- Scope of Hadoop
- Hadoop distributions
Planning Your Hadoop Cluster
- Hadoop Installation Modes
- Hadoop Releases
- Virtual machine setup
- Installing the latest Cloudera QuickStart VM
- Hadoop installation: pseudo-distributed mode cluster setup
- Hadoop Cluster Architecture
- Hadoop cluster planning
- Sizing the cluster
- In-depth details on configuration files
Multi-node Cluster Setup and Maintenance
- Installing and configuring a multi-node cluster
- Adding and Removing Cluster Nodes
- Rebalancing the cluster
- NameNode metadata backup
- Decommissioning the nodes
- Cluster Upgrading
Hadoop Distributed File System (HDFS)
- Introduction to HDFS
- Why HDFS
- HDFS Commands
- Regular file system vs. Hadoop Distributed File System
- HDFS master/slave architecture
- Daemons in Hadoop
- HDFS concepts: blocks, NameNode, Secondary NameNode, and DataNode
- HDFS File reads
- HDFS File writes
- Fault Tolerance
- Details on network topologies: nodes, clusters, and racks
- Details about the heartbeat mechanism
- Details on the rack awareness file
- HDFS Federation
- High Availability of the NameNode
- Hadoop Archive (HAR) files
- DistCp usage
- HDFS Admin Commands
- Exercises
Overview of MapReduce 2.0
- Introduction to MapReduce framework
- MapReduce Architecture
- MapReduce execution phases
- Details on input splits, mappers, shuffle and sort, and reducers
- Classic MapReduce and YARN
- Details on ResourceManager, ApplicationMaster, NodeManager, and containers
Cluster Administration Using Cloudera Manager
- Cloudera Manager features
- Configuration management
- Resource management
- Reports in Cloudera Manager
- Alerts in Cloudera Manager
- Service management
Installing and Managing the Hadoop Ecosystem
- Understanding Hive
- Installing and configuring Hive
- Understanding Pig
- Installing and configuring Pig
- Understanding Sqoop
- Installing and configuring Sqoop
- Understanding Flume
- Installing and configuring Flume
Advanced Cluster Setups
- High Availability setup for Hadoop Clusters
- Setting up the Hadoop environment on Amazon EC2
Cluster Monitoring, Troubleshooting, and Optimizing
- NameNode and JobTracker web UIs
- Viewing and managing Hadoop’s log files
- Ganglia monitoring tool
- Nagios monitoring tool
- Common cluster issues and their resolutions
- Optimization Techniques for the cluster