Big Data & Hadoop

Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. And big data may be as important to business – and society – as the Internet has become.

Hadoop is 100% open source and pioneered a fundamentally new way of storing and processing data. Instead of relying on expensive, proprietary hardware and separate systems for storage and processing, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and it scales out as more servers are added. With Hadoop, no data is too big. In today's hyper-connected world, where more and more data is created every day, Hadoop's advantages mean that businesses and organizations can now find value in data that was until recently considered useless. Students will work on a real-life Big Data Analytics project and gain hands-on project experience.

Topics to be covered in Workshop

Day 1

Module 1. What is Big Data & Why Hadoop?

What is Big Data?
Traditional data management systems and their limitations
What is Hadoop?
Why is Hadoop used?
The Hadoop eco-system
Big data/Hadoop use cases

Module 2. HDFS (Hadoop Distributed File System) and installing Hadoop on single node

HDFS Architecture
HDFS internals and use cases
HDFS Daemons
Files and blocks
Namenode memory concerns
Secondary namenode
HDFS access options
Installing and configuring Hadoop
Hadoop daemons
Basic Hadoop commands
Hands-on exercise

Day 2

Module 3. Advanced HDFS concepts

HDFS workshop
HDFS API
How to use configuration class
Using HDFS in MapReduce and programmatically
HDFS permission and security
Additional HDFS tasks
HDFS web-interface
Hands-on exercise

Day 3

Module 4. Cloud computing overview and installing Hadoop on multiple nodes

Cloud computing overview
SaaS/PaaS/IaaS
Characteristics of cloud computing
Cluster configurations
Configuring Masters and Slaves

Module 5. Introduction to MapReduce

MapReduce basics
Functional programming concepts
List processing
Mapping and reducing lists
Putting them together in MapReduce
Word Count example application
Understanding the driver, mapper and reducer
Closer look at MapReduce data flow
Additional MapReduce functionality
Fault tolerance
Hands-on exercises
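
For orientation before the hands-on session, the Word Count idea from this module can be sketched in a few lines of plain Python. This is only an illustrative, single-machine sketch and does not use the Hadoop API; the function names (mapper, shuffle, reducer, word_count) are made up for this example. It shows the map step emitting (word, 1) pairs, a shuffle step grouping them by key, and the reduce step summing each group.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Driver: run map, shuffle and reduce in sequence on a list of lines."""
    mapped = [pair for line in lines for pair in mapper(line)]
    grouped = shuffle(mapped)
    return dict(reducer(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines of a file in HDFS.
    sample = ["big data and Hadoop", "Hadoop stores big data"]
    print(word_count(sample))
```

In a real Hadoop job the mapper and reducer run on different nodes and the framework performs the shuffle; here all three steps run in one process purely to show the data flow.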

Module 6. MapReduce workshop

Hands-on work on MapReduce

Module 7. Advanced MapReduce concepts

Understand combiners & partitioners
Understand input and output formats
Distributed cache
Understanding counters
Chaining, listing and killing jobs
Hands-on exercise
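
As a rough preview of combiners and partitioners, the following plain-Python sketch shows what each one does to word-count data. It is a simplified illustration, not Hadoop code: the function names combine and partition are hypothetical, and CRC32 is used here only as a stand-in for a deterministic hash (Hadoop's default partitioner derives the partition from the key's hash code modulo the number of reduce tasks).

```python
import zlib
from collections import Counter


def combine(pairs):
    """Combiner: pre-aggregate (word, count) pairs on the map side,
    so fewer pairs have to be sent across the network to the reducers."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def partition(key, num_reducers):
    """Partitioner: pick the reducer for a key with a deterministic hash,
    so every pair with the same key reaches the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


if __name__ == "__main__":
    # Hypothetical output of one mapper before the shuffle.
    map_output = [("big", 1), ("data", 1), ("big", 1)]
    for word, count in combine(map_output):
        print(word, count, "-> reducer", partition(word, 3))
```

The combiner is safe here because addition is associative and commutative; for operations without that property (an average, for example) the reducer logic cannot simply be reused as a combiner.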

Day 4

Module 8. Using Pig and Hive for data analysis

Pig program structure and execution process
Joins & filtering using Pig
Group & co-group
Schema merging and redefining functions
Pig functions
Understanding Hive
Using Hive command line interface
Data types and file formats
Basic DDL operations
Schema design
Hands-on examples

Module 9. Introduction to HBase, Zookeeper & Sqoop

HBase overview, architecture & installation
HBase admin: test
HBase data access
Overview of Zookeeper
Sqoop overview and installation
Importing and exporting data in Sqoop
Hands-on exercise

Day 5

Module 10. Introduction to Oozie, Flume and advanced Hadoop concepts

Overview of Oozie and Flume
Oozie features and challenges
How does Flume work
Connecting Flume with HDFS
YARN
HDFS Federation
Authentication and high availability in Hadoop

Module 11. Introduction to Data Science

Introduction: What is Data Science?, Getting started with R, Exploratory Data Analysis, Review of probability and probability distributions, Bayes Rule
Supervised Learning, Regression, polynomial regression, local regression, k-nearest neighbors
Unsupervised Learning, Kernel density estimation, k-means, Naive Bayes, Data and Data Scraping
Classification, ranking, logistic regression
Ethics, time series, advanced regression

Duration: The workshop runs for five consecutive days, with a 6-7 hour session each day.

Certification Policy:

Certificate of Participation for all the workshop participants.
At the end of this workshop, a small competition will be organized among the participating students, and the winners will be awarded a 'Certificate of Excellence'.
Certificate of Coordination for the coordinators of the campus workshops.

Eligibility: There are no prerequisites. Anyone interested can join this workshop.