Data Science using Python
About the Course
Data Science (DS) is the process of examining data sets to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software. Data analysis involves inspecting, cleaning, transforming, and modelling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.
Topics to be Covered in Workshop
Basics of Data Science
- AI vs ML vs DL vs Data Science
- Data Science Scope, Applications
- Data Science Introduction
- Predictive vs Descriptive Data Analysis
- Data Science vs Data Analytics
- Regression & Classification Problems
- What makes a Data Science Expert?
- The art of making stories from Data
- Use Cases and Case Studies
(This part will take about 3 hours and refreshes Python basics, assuming participants have a basic idea of at least one programming language.)
Introduction to Python Programming
- What is Python?
- Installing Anaconda
- Understanding the Spyder Integrated Development Environment (IDE)
- Python basics and string manipulation
- Lists, tuples, dictionaries, variables
- Control Structures – if statement, for loop and while loop
- Single-line loops (list comprehensions)
- Writing user defined functions
- Object oriented programming
- Working with Classes & Inheritance
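As a flavour of the Python refresher, the minimal sketch below touches three of the topics listed above: a single-line loop (list comprehension), a user-defined function, and a class with inheritance. The names `squares_of_evens`, `Shape`, and `Square` are illustrative only and are not taken from the course material.

```python
def squares_of_evens(numbers):
    """Return the squares of the even values, using a single-line loop."""
    return [n * n for n in numbers if n % 2 == 0]


class Shape:
    """Base class: stores a name and leaves the area to subclasses."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError("subclasses must implement area()")

    def describe(self):
        return f"{self.name} with area {self.area()}"


class Square(Shape):
    """Inherits describe() from Shape and supplies its own area()."""

    def __init__(self, side):
        super().__init__("square")
        self.side = side

    def area(self):
        return self.side * self.side
```

Calling `Square(3).describe()` uses the inherited `describe()` method together with the subclass's own `area()`.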
Statistics for Data
- Measures of Central Tendency – Mean, Mode and Median
- Grouped and Ungrouped Data
- Measures of Spread – IQR, Variance and Standard Deviation
- Covariance
- Correlation
- Kurtosis, Skewness
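The measures listed above can be tried out with Python's standard `statistics` module. The sketch below is one possible illustration on made-up data; the helper names `iqr`, `covariance`, and `correlation` are hypothetical, and the sample (n - 1) definitions are an assumption about which convention is wanted.

```python
import statistics


def iqr(values):
    """Interquartile range Q3 - Q1, using the module's default quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both sample standard deviations."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


hours = [1, 2, 3, 4, 5]      # made-up data
scores = [2, 4, 6, 8, 10]    # exactly 2 * hours, so perfectly correlated

print("mean:", statistics.mean(hours), "median:", statistics.median(hours))
print("mode:", statistics.mode([1, 2, 2, 3]))
print("variance:", statistics.variance(hours), "stdev:", statistics.stdev(hours))
print("IQR:", iqr(hours))
print("covariance:", covariance(hours, scores))
print("correlation:", correlation(hours, scores))
```

Different quartile conventions give slightly different IQR values on small samples, so results may not match other tools exactly.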
Analyzing Categorical Data
- Proportion Test
- Chi-Square Test
- Fisher’s Exact Test
- Mantel-Haenszel Test
Analyzing Continuous Data
- One Sample T-Test
- Two Independent Samples Tests
- Paired T-test
- Wilcoxon Test
- ANOVA
- Kruskal-Wallis Test
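To make the first test in the list above concrete, here is a minimal sketch of the one-sample t statistic computed by hand with the standard library. The data are made up and the function name `one_sample_t` is hypothetical; turning the statistic into a p-value needs the t distribution, which a statistics package would normally supply.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error


weights = [12, 14, 16, 18, 20]          # made-up measurements
t_stat = one_sample_t(weights, mu0=13)  # test against a hypothesised mean of 13
degrees_of_freedom = len(weights) - 1

print("t =", t_stat, "with", degrees_of_freedom, "degrees of freedom")
```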
Probability Theory
- Events and their Probabilities
- Rules of Probability
- Conditional Probability and Independence
- Distribution of a Random Variable
- Bayes' Theorem
- Moment Generating Functions
- Central Limit Theorem
- Expectation & Variance
- Standard Distributions – Bernoulli, Binomial & Multinomial
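Two of the topics above, Bayes' theorem and the Binomial distribution, lend themselves to a small worked example. The sketch below uses exact fractions so the arithmetic can be checked by hand; the numbers and the function names `bayes_posterior` and `binomial_pmf` are illustrative assumptions, not course material.

```python
from fractions import Fraction
from math import comb


def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Made-up scenario: a condition affects 1 in 100 cases, the test detects it
# 9 times out of 10, and raises a false alarm 1 time out of 10.
posterior = bayes_posterior(
    prior=Fraction(1, 100),
    likelihood=Fraction(9, 10),
    false_positive_rate=Fraction(1, 10),
)

# Probability of exactly 2 heads in 4 fair coin tosses.
two_heads = binomial_pmf(2, 4, Fraction(1, 2))

print(posterior, two_heads)
```

Even with a fairly accurate test, the posterior stays small because the prior is small, which is the usual point of this kind of exercise.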
Data Structure & Data Manipulation in Python
- Intro to Numpy Arrays
- Creating ndarrays
- Indexing
- Data Processing using Arrays
- Mathematical computing basics
- Basic statistics
- File Input and Output
- Getting Started with Pandas
- Data Acquisition (Import & Export)
- Indexing
- Selection and Filtering
- Sorting & Summarizing
- Descriptive Statistics
- Combining and Merging Data Frames
- Removing Duplicates
- Discretization and Binning
- String Manipulation
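For the NumPy part of the list above, a short sketch of creating an ndarray, indexing it, and computing basic statistics might look like this. It assumes NumPy is installed (it ships with Anaconda) and uses made-up values; the Pandas topics are not covered here.

```python
import numpy as np

# Create a 2-D ndarray: 3 rows (observations) by 4 columns (features).
data = np.arange(12).reshape(3, 4)

first_row = data[0]            # indexing a single row
last_column = data[:, -1]      # slicing the last column
evens = data[data % 2 == 0]    # boolean-mask filtering

column_means = data.mean(axis=0)   # mean of each column
overall_std = data.std()           # population standard deviation (ddof=0)

print(first_row, last_column, evens, column_means, overall_std)
```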
Visualization in Python, Case Studies
- Introduction to Visualization
- Visualization Importance
- Visualization Rules
- Working with Python visualization libraries
- Matplotlib
- Creating Line Plots, Bar Charts, Pie Charts, Histograms, Scatter Plots
Working with Seaborn
- Data Visualization using Seaborn
- Basic Plots, color palettes
- Plotting categorical data
- Visualizing linear relationship
- Plotting on data-aware grids
- Heatmap, Histogram, Barplot, Factor plot (catplot in current Seaborn releases)
- Density Plot, Joint Distribution Plot
Linear Regression
- Regression Problem Analysis
- Mathematical modelling of Regression Model
- Gradient Descent Algorithm
- Programming Process Flow
- Use cases
- Regression Table
- Heteroscedasticity
- Model Specification
- L1 & L2 Regularization
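The gradient descent topic above can be illustrated without any libraries. The following is a minimal sketch, assuming a single feature, a mean-squared-error cost, and hand-picked values for the learning rate and number of epochs; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training point under the current w, b.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]   # made-up data lying exactly on y = 2x + 1
w, b = fit_line(xs, ys)
print(w, b)
```

On this toy data the fitted slope and intercept approach 2 and 1. A learning rate that is too large makes the updates diverge, so the value usually needs tuning per data set.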
Linear Regression – Case Study & Project
- Programming using Python
- Building simple Univariate Linear Regression Model
- Multivariate Regression Model
- Apply Data Transformations
- Identify Multicollinearity in Data
- Treatment of Data
- Identify Heteroscedasticity
- Modelling of Data
- Variable Significance Identification
- Model Significance Test
- Bifurcate Data into Training / Testing Data set
- Build Model on Training Data Set
- Predict using Testing Data Set
- Validate the Model Performance
- Project 1: Boston Housing Prices Prediction
- Project 2: Cancer Detection Predictive Analysis
- Best Fit Line and Linear Regression
Logistic Regression
- Variable and Model Significance
- Maximum Likelihood Concept
- Log Odds and Interpretation
- Regression Table
- Null vs Residual Deviance
- Problem Analysis
- Cost Function Formation
- Mathematical Modelling
- Use Cases
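The log odds and cost function topics above come down to a few short formulas. The sketch below writes them out in plain Python; the function names are illustrative, and the code is not hardened against very large inputs or probabilities of exactly 0 or 1.

```python
import math


def sigmoid(z):
    """Map log odds z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Inverse of sigmoid: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def log_loss(y_true, p_pred):
    """Average negative log-likelihood, the usual logistic regression cost."""
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, p_pred)
    )
    return -total / len(y_true)


print(sigmoid(0.0))                      # log odds of 0 means even odds
print(log_odds(0.8))                     # positive: the event is more likely than not
print(log_loss([1, 0], [0.9, 0.2]))      # small loss for confident, correct predictions
```

Minimising `log_loss` is equivalent to maximising the likelihood, which links the cost function back to the maximum likelihood concept in the list.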
Case Study & Project
- Model Parameter Significance Evaluation
- Drawing the ROC Curve
- Estimating the Classification Model Hit Ratio
- Isolating the Classifier for Optimum Results
- Project 3: Digit Recognition using Logistic Regression
Decision Trees with Case Study
- Forming a Decision Tree
- Components of Decision Tree
- Mathematics of Decision Tree
- Decision Tree Evaluation
- Practical Examples & Case Study
- Project 4: Intrusion Detection
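The mathematics of a decision tree is largely about measuring how mixed the labels in a node are. The sketch below shows two common impurity measures and the information gain of a split; the syllabus does not say which measure the course uses, so treat the choice and the function names as assumptions.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]   # made-up labels
print(gini(parent), entropy(parent))
print(information_gain(parent, ["yes", "yes"], ["no", "no"]))   # a perfect split
```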
Random Forests
- Random Forest Mathematics
- Examples & use cases using Random Forests
K-NN Algorithm – Applications & Case Studies
- Understanding the KNN
- Distance metrics
- Case Study on KNN
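The idea behind K-NN fits in a few lines of plain Python. This is a minimal sketch assuming Euclidean distance and simple majority voting; the data and the names `euclidean` and `knn_predict` are made up for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (7, 7)))
```

Swapping `euclidean` for another distance function is all it takes to experiment with different distance metrics.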
Support Vector Machine
- Concept and Working Principle
- Mathematical Modelling
- Optimization Function Formation
- The Kernel Method and Nonlinear Hyperplanes
- Use Cases
- Programming SVM using Python
- Project 5: Character Recognition using SVM
- Project 6: Regression Problem using SVM
- Project 7: Wisconsin Cancer Detection using SVM
Clustering
- Hierarchical Clustering
- K Means Clustering
- Use Cases for K Means Clustering
- Programming for K Means using Python
- Image Color Quantization using K Means Clustering Technique
- Cluster Size Optimization vs Definition Optimization
- Projects & Case Studies
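K Means alternates between two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sketch below shows that loop for 2-D points in plain Python. It assumes the starting centroids are supplied by the caller and runs a fixed number of iterations rather than testing for convergence; the function name `kmeans` is hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Basic K Means (Lloyd's algorithm) for 2-D points from given starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]   # made-up data
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

Results depend on the starting centroids, which is one reason library implementations typically run several random initialisations.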
Principal Component Analysis
- Dimensionality Reduction, Data Compression
- Curse of dimensionality
- Multicollinearity
- Factor Analysis
- Concept and Mathematical modelling
- Use Cases
- Programming using Python
Duration: The workshop runs for five consecutive days, with a 6-7 hour session each day.
Certification Policy:
- Certificate of Participation for all the workshop participants.
- At the end of this workshop, a small competition will be organized among the participating students, and the winners will be awarded a 'Certificate of Excellence'.
- Certificate of Coordination for the coordinators of the campus workshops.
Eligibility: There are no prerequisites. Anyone interested can join this workshop.