Python Tutorial for Statistical Data Analysis

This comprehensive Python tutorial series provides an introduction to statistical data analysis in healthcare. Through practical Jupyter notebooks, the course covers key data analysis techniques, from basic Python programming to advanced statistical models. The tutorial is structured across 10 sessions, each focusing on different aspects of statistical analysis with a hands-on approach using real healthcare datasets.

Course Materials and Resources

The full course and all materials can be accessed through the GitHub repository.

Sessions Overview

Session 1: Introduction to Python and Data Analysis

Session 1 Notebook
This session introduces basic Python programming concepts, focusing on data structures and their applications in analyzing healthcare datasets. It provides the foundation for working with data in Python.
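As a rough illustration of the kind of data structures this session refers to, the sketch below stores a few patient records in a list of dictionaries and queries them. The records, field names, and the cutoff of 140 are made up for the example and are not taken from the course notebooks.

```python
# Hypothetical patient records; field names and values are invented for illustration.
patients = [
    {"id": 1, "age": 54, "systolic_bp": 132},
    {"id": 2, "age": 67, "systolic_bp": 148},
    {"id": 3, "age": 45, "systolic_bp": 121},
]

# List comprehension: pull one field out of every record.
ages = [p["age"] for p in patients]

# Dictionary keyed by patient id for quick lookup.
by_id = {p["id"]: p for p in patients}

# Filter records with a condition (140 is only an example cutoff).
elevated = [p["id"] for p in patients if p["systolic_bp"] >= 140]

mean_age = sum(ages) / len(ages)
print(f"Mean age: {mean_age:.1f}, elevated readings: {elevated}")
```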

Session 2: Data Preprocessing and Cleaning

Session 2 Notebook
This session covers data preprocessing techniques including handling missing values, data normalization, and feature engineering to prepare datasets for further analysis.
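The three steps named above can be sketched with pandas as follows. This is a minimal example under assumptions: the column names, the values, the choice of median imputation, and min-max scaling are all illustrative and may differ from what the notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements with one missing weight.
df = pd.DataFrame({
    "weight_kg": [70.0, np.nan, 90.0, 60.0],
    "height_m": [1.75, 1.60, 1.80, 1.65],
})

# Handle missing values: fill the gap with the column median.
df["weight_kg"] = df["weight_kg"].fillna(df["weight_kg"].median())

# Normalize: min-max scaling maps the column onto the range [0, 1].
w = df["weight_kg"]
df["weight_scaled"] = (w - w.min()) / (w.max() - w.min())

# Feature engineering: derive body mass index from weight and height.
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

print(df.round(2))
```

Median imputation is only one option; dropping incomplete rows or using a model-based imputer may be more appropriate depending on why the values are missing.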

Session 3: Descriptive Statistics

Session 3 Notebook
This session explores measures of central tendency (mean, median, mode), measures of dispersion (variance, standard deviation), and introduces basic probability distributions in Python.
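A short sketch of these measures using a pandas Series. The data (lengths of hospital stay in days) is invented for the example. Note that pandas computes the sample variance and standard deviation (dividing by n - 1) by default.

```python
import pandas as pd

# Hypothetical lengths of stay, in days.
stay_days = pd.Series([2, 3, 3, 5, 7, 10])

mean = stay_days.mean()
median = stay_days.median()
mode = stay_days.mode().iloc[0]   # mode() returns a Series; take the first value
variance = stay_days.var()        # sample variance (ddof=1)
std_dev = stay_days.std()         # sample standard deviation (ddof=1)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, std={std_dev:.2f}")
```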

Session 4: Probability Distributions

Session 4 Notebook
In this session, students learn about probability distributions such as the normal, binomial, and Poisson distributions. It also introduces hypothesis testing and the statistical tests used to check assumptions about a sample and its population.
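The three distributions mentioned above can be queried with `scipy.stats`, as in the sketch below. SciPy is not named in the tools list; it is used here as an assumption, since Statsmodels depends on it. The parameter values are arbitrary examples.

```python
import math
from scipy import stats

# Normal: probability that a standard normal value falls below 1.96.
p_norm = stats.norm.cdf(1.96)

# Binomial: probability of exactly 2 successes in 10 trials with p = 0.2.
p_binom = stats.binom.pmf(2, n=10, p=0.2)

# Poisson: probability of zero events when the expected count is 3.
p_pois = stats.poisson.pmf(0, mu=3)

print(f"normal cdf(1.96) = {p_norm:.4f}")
print(f"binomial pmf     = {p_binom:.4f}")
print(f"poisson pmf      = {p_pois:.4f}")
```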

Session 5: One-Sample Hypothesis Testing

Session 5 Notebook
This session focuses on confidence intervals and hypothesis testing, including the z-test and t-test.
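A minimal sketch of a one-sample t-test and a 95% confidence interval for the mean, using `scipy.stats` (an assumption, as SciPy is not in the tools list). The sample values and the hypothesized mean `mu0` are invented for illustration.

```python
import math
import numpy as np
from scipy import stats

# Made-up sample: fasting glucose (mmol/L) for 8 hypothetical patients.
sample = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.5])
mu0 = 5.0  # hypothesized population mean (illustrative value)

n = sample.size
mean = sample.mean()
sem = sample.std(ddof=1) / math.sqrt(n)  # standard error of the mean

# One-sample t-test: the null hypothesis says the population mean equals mu0.
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)

# 95% confidence interval for the mean, based on the t distribution.
ci_low, ci_high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

A z-test follows the same pattern but assumes the population standard deviation is known and uses the normal distribution in place of the t distribution.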

Session 6: Two-Sample Hypothesis Tests

Session 6 Notebook
This session covers Type I and Type II errors, p-values, and how to test for homogeneity.
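One common reading of homogeneity in a two-sample setting is equality of variances between the groups. Under that assumption, the sketch below runs Levene's test first and then picks the pooled or Welch t-test accordingly. The data, the group names, and the use of `scipy.stats` are assumptions made for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical systolic blood pressure readings for two groups.
group_a = np.array([120, 125, 130, 128, 122, 127, 124, 126], dtype=float)
group_b = np.array([131, 135, 129, 138, 133, 136, 132, 134], dtype=float)

alpha = 0.05  # accepted Type I error rate (probability of a false positive)

# Levene's test: the null hypothesis says both groups have equal variance.
lev_stat, lev_p = stats.levene(group_a, group_b)
equal_var = bool(lev_p >= alpha)

# Two-sample t-test: pooled variance if homogeneous, Welch's test otherwise.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=equal_var)
reject_h0 = bool(p_value < alpha)

print(f"Levene p = {lev_p:.3f}, equal_var = {equal_var}")
print(f"t = {t_stat:.3f}, p = {p_value:.5f}, reject H0: {reject_h0}")
```

Failing to reject the null hypothesis when the group means truly differ would be a Type II error; its probability depends on sample size and effect size, not only on `alpha`.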

Session 7: Categorical Data Analysis

Session 7 Notebook
In this session, we explore the chi-square test and ANOVA (analysis of variance).
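Both tests named above are available in `scipy.stats` (used here as an assumption, since SciPy is not in the tools list). The contingency table and the three groups below are invented. For a 2x2 table, `chi2_contingency` applies Yates' continuity correction by default.

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table: rows = treatment / control, columns = improved / not improved.
table = np.array([[30, 10],
                  [20, 40]])

# Chi-square test of independence between row and column variables.
chi2, chi_p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {chi_p:.5f}, dof = {dof}")
print("expected counts under independence:")
print(expected)

# One-way ANOVA: do three hypothetical groups share the same mean?
g1 = [5.1, 4.8, 5.5, 5.0]
g2 = [5.9, 6.2, 5.7, 6.0]
g3 = [5.2, 5.4, 5.1, 5.3]
f_stat, anova_p = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.2f}, p = {anova_p:.5f}")
```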

Session 8: Regression Analysis

Session 8 Notebook
This session introduces the basic concepts of regression analysis and shows how to fit a regression line.
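A regression line for one predictor can be obtained from the least-squares formulas: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept follows from the means. The sketch below applies these formulas with NumPy to made-up data and compares the result with `np.polyfit`.

```python
import numpy as np

# Made-up data: x could be a dose, y a measured response.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_mean, y_mean = x.mean(), y.mean()

# Least-squares estimates for y = intercept + slope * x.
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Cross-check against NumPy's degree-1 polynomial fit.
poly_slope, poly_intercept = np.polyfit(x, y, 1)

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```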

Session 9: Logistic Regression

Session 9 Notebook
This session covers logistic regression and the sigmoid function, along with the Central Limit Theorem (CLT).
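Two of the ideas above lend themselves to a short sketch. The sigmoid function maps any real number to a value between 0 and 1, which is how logistic regression turns a linear score into a probability. The CLT can be illustrated by simulation: means of repeated samples from a skewed distribution cluster around the population mean with spread close to sigma divided by the square root of the sample size. The sample size, number of repetitions, and seed are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps real-valued scores to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))

# CLT simulation: draw many samples from an exponential distribution
# (mean 1, standard deviation 1) and look at the distribution of their means.
rng = np.random.default_rng(0)
sample_size = 50
n_samples = 5000
draws = rng.exponential(scale=1.0, size=(n_samples, sample_size))
sample_means = draws.mean(axis=1)

print(f"mean of sample means: {sample_means.mean():.3f}")
print(f"std of sample means:  {sample_means.std():.3f}")
print(f"theoretical std:      {1.0 / np.sqrt(sample_size):.3f}")
```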

Session 10: Classification

Session 10 Notebook
The final session is dedicated to classification algorithms such as logistic regression and decision trees. Students learn how to build models to classify data and evaluate their performance using metrics like accuracy, precision, and recall.
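The evaluation metrics named above can be computed with `sklearn.metrics`. In this sketch the true and predicted labels are hard-coded, hypothetical values, so the numbers can be checked by hand: there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels: 1 = condition present, 0 = condition absent.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / all = 6 / 8
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4

print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}")
```

In practice `y_pred` would come from a fitted model, for example the `predict` method of a scikit-learn classifier evaluated on held-out data.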

Learning Outcomes

By the end of this tutorial, students will be able to:

Perform comprehensive statistical analysis using Python
Create visualizations to represent healthcare data
Apply various statistical and machine learning methods to real-world problems
Develop and evaluate predictive models
Communicate results effectively and professionally

Prerequisites

Basic understanding of statistics
Familiarity with Python programming
Interest in healthcare data analysis

Tools and Libraries Used

Python 3.x
Jupyter Notebook
Pandas, NumPy
Matplotlib, Seaborn
Scikit-learn, Statsmodels