Workshop on Data Warehouse | Data Mining

A collection of databases that work together is called a data warehouse. A warehouse makes it possible to integrate data from multiple databases. Data mining then analyzes that data to help individuals and organizations make better decisions.

Data Warehousing: In a typical setting, the database files reside on a server but can be accessed from many different computers in the organization. As the number and complexity of databases grow, we start referring to them together as a data warehouse, that is, a collection of databases that work together. A data warehouse makes it possible to integrate data from multiple databases, which can reveal insights that no single database provides on its own.
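To make the idea of integrating data from multiple databases concrete, here is a minimal sketch in Python using the standard sqlite3 module. It assumes two small in-memory databases standing in for separate departmental systems; the database, table, and column names (crm, orders, customers, region) are hypothetical and chosen only for illustration.

```python
import sqlite3

# Two separate in-memory databases stand in for two departmental systems.
# All database, table, and column names here are hypothetical.
conn = sqlite3.connect(":memory:")                  # "main" plays the sales database
conn.execute("ATTACH DATABASE ':memory:' AS crm")   # second, independent database

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, region TEXT)")

conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)",
    [(1, 120.0), (2, 80.0), (1, 40.0), (3, 200.0)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "North"), (2, "South"), (3, "North")],
)

# A question neither database can answer alone: total order amount per region.
revenue_by_region = conn.execute(
    """
    SELECT c.region, SUM(o.amount)
    FROM main.orders AS o
    JOIN crm.customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()

print(revenue_by_region)
```

The join across the two attached databases is the point of the sketch: order amounts live in one system and regions in the other, and only the combined view can relate them. A real warehouse would typically load and reconcile such data on a schedule rather than join live systems.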

Data Mining: Once all the data is stored and organized in databases, what's next? Many day-to-day operations are supported by databases. Queries written in SQL, a database query language, are used to answer basic questions about the data. But as the collection of data in a database grows, the amount of data can easily become overwhelming. How does an organization get the most out of the details in its data? That's where data mining comes in. Data mining is the process of analyzing data and summarizing it to produce useful information.

Topics to be covered in Workshop

Introduction to Data Mining

What is data mining?
Related technologies - Machine Learning, DBMS, OLAP, Statistics
Data Mining Goals
Stages of the Data Mining Process
Data Mining Techniques
Knowledge Representation Methods
Applications
Example: weather data

Data Warehouse and OLAP

Data Warehouse and DBMS
Multidimensional data model
OLAP operations
Example: loan data set

Data preprocessing

Data cleaning
Data transformation
Data reduction
Discretization and generating concept hierarchies
Installing Weka 3 Data Mining System
Experiments with Weka - filters, discretization

Data mining knowledge representation

Task relevant data
Background knowledge
Interestingness measures
Representing input data and output knowledge
Visualization techniques
Experiments with Weka - visualization

Attribute-oriented analysis

Attribute generalization
Attribute relevance
Class comparison
Statistical measures
Experiments with Weka - using filters and statistics

Data mining algorithms: Association rules

Motivation and terminology
Example: mining weather data
Basic idea: item sets
Generating item sets and rules efficiently
Correlation analysis
Experiments with Weka - mining association rules
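As a rough illustration of item sets and rule confidence, the following Python sketch counts item sets in a small weather-style table and checks one candidate rule. The workshop itself uses Weka; Python appears here only for brevity. The records are illustrative and are not taken from the workshop's data file, and the function names are made up for this sketch.

```python
from collections import Counter
from itertools import combinations

FIELDS = ("outlook", "humidity", "windy", "play")
# Illustrative weather-style records; not the workshop's actual data file.
RAW = [
    ("sunny", "high", "false", "no"),
    ("sunny", "high", "true", "no"),
    ("overcast", "high", "false", "yes"),
    ("rainy", "high", "false", "yes"),
    ("rainy", "normal", "false", "yes"),
    ("rainy", "normal", "true", "no"),
    ("overcast", "normal", "true", "yes"),
    ("sunny", "high", "false", "no"),
    ("sunny", "normal", "false", "yes"),
    ("rainy", "normal", "false", "yes"),
    ("sunny", "normal", "true", "yes"),
    ("overcast", "high", "true", "yes"),
    ("overcast", "normal", "false", "yes"),
    ("rainy", "high", "true", "no"),
]
ROWS = [dict(zip(FIELDS, r)) for r in RAW]


def frequent_item_sets(rows, size, min_support):
    """Count every item set of the given size and keep the frequent ones."""
    counts = Counter()
    for row in rows:
        for combo in combinations(sorted(row.items()), size):
            counts[frozenset(combo)] += 1
    return {items: n for items, n in counts.items() if n >= min_support}


def rule_confidence(rows, antecedent, consequent):
    """Return (rows matching both sides, rows matching the antecedent)."""
    covered = [r for r in rows if antecedent.items() <= r.items()]
    correct = [r for r in covered if consequent.items() <= r.items()]
    return len(correct), len(covered)


pairs = frequent_item_sets(ROWS, size=2, min_support=4)
hits, covered = rule_confidence(
    ROWS, {"humidity": "normal", "windy": "false"}, {"play": "yes"}
)
print(len(pairs), "frequent pairs;", f"rule holds in {hits} of {covered} rows")
```

This brute-force count enumerates every combination in every row, which is fine for a toy table. Generating item sets efficiently on larger data means building larger sets only from smaller sets that are already frequent, which this sketch does not attempt.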

Data mining algorithms: Classification

Basic learning/mining tasks
Inferring rudimentary rules: 1R algorithm
Decision trees
Covering rules
Experiments with Weka - decision trees, rules
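The 1R idea can be sketched in a few lines of Python: for each attribute, predict the majority class for each of its values, count the errors, and keep the attribute with the fewest. This is a simplified sketch for nominal attributes with no missing values; the weather-style rows are illustrative, and ties between attributes are broken by taking the first one, which is an assumption of this sketch.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ("outlook", "humidity", "windy")
# Illustrative rows: (outlook, humidity, windy, play); the last field is the class.
DATA = [
    ("sunny", "high", "false", "no"),
    ("sunny", "high", "true", "no"),
    ("overcast", "high", "false", "yes"),
    ("rainy", "high", "false", "yes"),
    ("rainy", "normal", "false", "yes"),
    ("rainy", "normal", "true", "no"),
    ("overcast", "normal", "true", "yes"),
    ("sunny", "high", "false", "no"),
    ("sunny", "normal", "false", "yes"),
    ("rainy", "normal", "false", "yes"),
    ("sunny", "normal", "true", "yes"),
    ("overcast", "high", "true", "yes"),
    ("overcast", "normal", "false", "yes"),
    ("rainy", "high", "true", "no"),
]


def one_r(data, attributes):
    """Return (attribute name, value -> class rule, training errors)."""
    best = None
    for col, name in enumerate(attributes):
        # Class counts for each value of this attribute.
        by_value = defaultdict(Counter)
        for row in data:
            by_value[row[col]][row[-1]] += 1
        # Each value predicts its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in by_value.items()}
        # Errors are the rows that do not belong to the majority class.
        errors = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
        if best is None or errors < best[2]:   # strict "<" keeps the first on ties
            best = (name, rule, errors)
    return best


best_attribute, best_rule, best_errors = one_r(DATA, ATTRIBUTES)
print(best_attribute, best_rule, f"{best_errors}/{len(DATA)} errors")
```

On these rows, outlook and humidity both make 4 errors out of 14, so the choice between them depends entirely on the tie-breaking convention.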

Data mining algorithms: Prediction

The prediction task
Statistical (Bayesian) classification
Bayesian networks
Instance-based methods (nearest neighbor)
Linear models
Experiments with Weka - Prediction

Evaluating what's been learned

Basic issues
Training and testing
Estimating classifier accuracy (holdout, cross-validation, leave-one-out)
Combining multiple models (bagging, boosting, stacking)
Minimum Description Length Principle (MDL)
Experiments with Weka - training and testing
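The difference between holdout, cross-validation, and leave-one-out is mostly a matter of how the data is split. The Python sketch below shows k-fold splitting, with leave-one-out as the special case where k equals the number of instances. As a stand-in for a real learner it uses a majority-class baseline, and the 9 "yes" / 5 "no" labels are illustrative; both are assumptions of this sketch.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(labels, k, seed=0):
    """Estimate the accuracy of a majority-class baseline with k-fold CV."""
    correct = 0
    for test in k_fold_indices(len(labels), k, seed):
        held_out = set(test)
        # "Training" here just means finding the most common class
        # among the instances that are not held out.
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        majority = Counter(train_labels).most_common(1)[0][0]
        correct += sum(labels[i] == majority for i in test)
    return correct / len(labels)


LABELS = ["yes"] * 9 + ["no"] * 5           # illustrative class labels

folds = k_fold_indices(len(LABELS), k=5)
ten_fold_style = cross_validate(LABELS, k=5)
leave_one_out = cross_validate(LABELS, k=len(LABELS))
print(f"5-fold: {ten_fold_style:.3f}  leave-one-out: {leave_one_out:.3f}")
```

Every instance is used for testing exactly once and never appears in the training set of its own fold, which is what makes the estimate more trustworthy than accuracy measured on the training data.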

Mining real data

Preprocessing data from a real medical domain (310 patients with Hepatitis C).
Applying various data mining techniques to create a comprehensive and accurate model of the data.

Clustering

Basic issues in clustering
First conceptual clustering system: Cluster/2
Partitioning methods: k-means, expectation maximization (EM)
Hierarchical methods: distance-based agglomerative and divisive clustering
Conceptual clustering: Cobweb
Experiments with Weka - k-means, EM, Cobweb
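For readers who want to see the partitioning idea before trying it in Weka, here is a minimal k-means sketch in plain Python. The points and the choice of initial centroids are made up for illustration, and the fixed starting centroids are an assumption that keeps the run deterministic; practical implementations usually pick them at random and restart several times.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda j: squared_distance(p, centroids[j]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, centroids, max_iterations=20):
    """Alternate assignment and centroid update until the centroids stop moving."""
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        updated = [
            # Mean of each coordinate; an empty cluster keeps its old centroid.
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Illustrative 2-D points forming two visibly separate groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(POINTS, centroids=[POINTS[0], POINTS[3]])
print(centroids)
print(clusters)
```

With k = 2 and these starting centroids, the procedure settles after one update, with each group of three points forming its own cluster.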

Advanced techniques, Data Mining software and applications

Text mining: extracting attributes (keywords), structural approaches (parsing, soft parsing).
Bayesian approach to classifying text
Web mining: classifying web pages, extracting knowledge from the web
Data Mining software and applications

Duration: The workshop runs for two consecutive days, with an eight-hour session each day, for a total of sixteen hours divided between theory and hands-on sessions.

Certification Policy:

Certificate of Participation for all the workshop participants.
At the end of the workshop, a small competition will be organized among the participating students, and the winners will be awarded a 'Certificate of Excellence'.
Certificate of Coordination for the coordinators of the campus workshops.

Eligibility: There are no prerequisites. Anyone interested can join this workshop.
