Data Science using Python
Data science is the process of examining data sets to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software. Data analysis involves inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.
Topics to be covered in Workshop
(20% Theory & 80% Hands-On Session)
Introduction to Machine Learning
- Understand Machine Learning and Python
- Knowledge of the Python language
- Community and ecosystem
- Understand the use of 'Python' in the industry
- Compare Python with other software in analytics
- Install Python and the packages useful for the course
- Perform basic operations in Python using the command line
- Learn to use Python IDEs and various GUIs
- Use Python's built-in help() feature
- Knowledge of the worldwide Python community and how it collaborates
Exploratory Data Analysis
- Understanding Exploratory Data Analysis (EDA)
- Implementation of EDA on various datasets
- Boxplots
- Understanding correlation in Python, for example with the pandas corr() method
- EDA functions in pandas like describe() and info()
- Multiple packages in Python for data analysis
- Fancier plots such as the Segment plot and HC plot in Python
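The EDA topics above can be illustrated with a small sketch. This is a minimal example using only the Python standard library, not the workshop's actual material. The data set, and the helper names five_number_summary and pearson_correlation, are hypothetical and chosen for illustration. It computes the five numbers that a boxplot draws and the Pearson correlation between two variables.

```python
import statistics

# Hypothetical sample data: hours studied and exam scores for six students.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [52.0, 55.0, 61.0, 64.0, 70.0, 73.0]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a boxplot draws."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


summary = five_number_summary(scores)
r = pearson_correlation(hours, scores)
print("five-number summary:", summary)
print("correlation:", round(r, 3))
```

In practice, libraries such as pandas provide the same summaries through describe() and corr() on a DataFrame.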
Introduction to Machine Learning
- Introduction to Machine Learning
- Exploratory data analysis and preprocessing
Walking with Python
- Understanding Python
- Introduction to Python notebook with elements of Python programming
Introduction to Python Programming
- The various kinds of data types in Python and their appropriate uses
- Built-in and library functions in Python like range(), and the pandas functions concat() and merge()
- Knowledge of the various subsetting methods
- Summarize data by using functions like type() and len(), and the pandas info() method and shape attribute
- Use of functions like head() and tail() for inspecting data
- Take part in a class activity to summarize data
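The ideas in this list (data types, subsetting, inspecting the first and last rows, and combining tables) can be sketched in plain Python. This is an illustrative example under stated assumptions: the records and all variable names are hypothetical, and a list of dictionaries stands in for a table so that the code runs without any third-party package.

```python
# Hypothetical records: each row is a dict, a plain-Python stand-in for a table.
rows = [
    {"name": "Asha", "age": 21, "score": 88.5},
    {"name": "Ben", "age": 22, "score": 74.0},
    {"name": "Chen", "age": 20, "score": 93.0},
    {"name": "Dina", "age": 24, "score": 67.5},
]

# Data types: record the type of each field in the first row.
field_types = {key: type(value).__name__ for key, value in rows[0].items()}

# Size: number of rows and number of columns.
n_rows, n_cols = len(rows), len(rows[0])

# Inspecting: first two and last two rows via slicing.
head, tail = rows[:2], rows[-2:]

# Subsetting: keep rows that meet a condition, then select one column.
high_scorers = [row for row in rows if row["score"] >= 80]
names = [row["name"] for row in high_scorers]

# Combining: append the rows of a second table (row-wise binding).
more_rows = [{"name": "Ravi", "age": 23, "score": 91.0}]
combined = rows + more_rows

# Sequence generation with the built-in range().
evens = list(range(0, 10, 2))

print(field_types, n_rows, n_cols)
print(names, len(combined), evens)
```

With pandas, the same steps would typically use a DataFrame with head(), tail(), boolean indexing, and concat().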
Data Manipulation in Python
- The various steps involved in Data Cleaning
- Functions used in Data Inspection
- Tackling the problems faced during Data Cleaning
- Uses of pattern-matching functions like re.search(), re.findall(), and re.sub()
- Coercing data to the appropriate types
- Uses of the apply() functions
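The cleaning steps listed above can be sketched with the standard re module. This is a minimal illustration, not a complete cleaning pipeline: the raw values, the e-mail pattern, and the helper to_float are hypothetical. It shows pattern matching, substitution, type coercion, and applying a function to every element.

```python
import re

# Hypothetical raw column containing numbers stored as messy text.
raw = ["  42 ", "N/A", "17.5", "abc", "1,200"]


def to_float(text):
    """Coerce text to float, returning None when it is not a number."""
    cleaned = re.sub(r"[,\s]", "", text)  # strip commas and whitespace
    if re.fullmatch(r"-?\d+(\.\d+)?", cleaned):
        return float(cleaned)
    return None


# Apply the coercion to every element, then drop the missing values.
values = list(map(to_float, raw))
clean = [v for v in values if v is not None]

# Pattern matching: flag which strings look like e-mail addresses.
# The pattern is deliberately simple and is an assumption of this sketch.
emails = ["a@example.com", "not-an-email", "b@example.org"]
is_email = [bool(re.search(r"^[^@\s]+@[^@\s]+\.[a-z]+$", e)) for e in emails]

print(values)
print(clean)
print(is_email)
```

On a pandas DataFrame, the same idea is usually expressed with apply() on a column and the string methods under the str accessor.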
Machine Learning
- Introduction to Supervised and Unsupervised Learning Algorithms
- Linear Regression with Multiple Variables
- Logistic Regression
- Decision Trees [CART]
- k-Fold Cross Validation
- Bagging and Bootstrapping
- Random Forest
- Gradient Boosting (XGBoost)
- Principal Component Analysis
- K-means clustering
- Hierarchical Clustering
- Market Basket Analysis
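Two of the topics above, linear regression and k-fold cross-validation, can be combined in a short sketch. This example uses only plain Python with a single input variable and a small hypothetical data set. The helpers k_fold_indices and fit_line are illustrative names, and the folds are contiguous and unshuffled, which is a simplifying assumption.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data lying close to the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9, 19.2, 21.0]

fold_errors = []
for train, test in k_fold_indices(len(xs), k=5):
    # Fit on the training folds only, then score on the held-out fold.
    slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
    squared = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test]
    fold_errors.append(sum(squared) / len(squared))

mean_mse = sum(fold_errors) / len(fold_errors)
print("mean squared error per fold:", fold_errors)
print("cross-validated MSE:", mean_mse)
```

Each observation is held out exactly once, so the averaged error estimates how the model performs on data it was not fitted to. Libraries such as scikit-learn provide shuffled and stratified variants of this splitting.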
Data Mining: Clustering Techniques
- Introduction to Data Mining
- Cluster Analysis (Hierarchical Clustering, K-Means Clustering)
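K-means clustering alternates between two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below is a minimal plain-Python version under stated assumptions: the points are hypothetical, the function name k_means is illustrative, and the centroids are seeded with the first k points instead of a random or k-means++ initialization.

```python
def k_means(points, k, iterations=10):
    """Cluster points into k groups; returns (centroids, clusters)."""
    centroids = list(points[:k])  # assumption: seed with the first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(
                    sum(coords) / len(cluster) for coords in zip(*cluster)
                )
    return centroids, clusters


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.0), (1.0, 0.5), (8.5, 9.5)]

centroids, clusters = k_means(points, k=2)
print("centroids:", centroids)
print("clusters:", clusters)
```

The result depends on the starting centroids, which is why practical implementations usually run several random initializations and keep the best one.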
Machine Learning Techniques
- Types of Learning
- Supervised Learning
- Unsupervised Learning
- Advice for Applying Machine Learning
- Machine Learning System Design
Duration: The workshop runs for two consecutive days, with an eight-hour session each day, for a total of sixteen hours divided between theory and hands-on sessions.
Certification Policy:
- Certificate of Participation for all workshop participants.
- At the end of the workshop, a small competition will be organized among the participating students, and the winners will be awarded a 'Certificate of Excellence'.
- Certificate of Coordination for the coordinators of the campus workshops.
Eligibility: There are no prerequisites. Anyone interested can join this workshop.