About Course
This two-part series is designed to equip clinicians, researchers, and other healthcare professionals with practical machine-learning skills tailored to medical data. By focusing on real-world examples drawn from electronic health records and cohort studies, participants will gain the tools they need to prepare data, develop predictive models, and evaluate their performance in a clinical context. At Bright Health Science, this course is taught by Ali M. Shabestari and Dr. Motahare Shabestari.
Part I: Python Programming & Data Preprocessing
Objectives:
- Introduce Python fundamentals, from variables and control flow to functions and modules.
- Master the NumPy and pandas libraries for cleaning, transforming, and organizing tabular health datasets.
- Learn best practices for handling outlier values, categorical encoding, normalization, and feature engineering in medical data.
- Apply techniques directly to sample datasets drawn from cohort studies and electronic health records.
Part II: Machine-Learning Models & Clinical Implementation
Objectives:
- Explore core supervised-learning algorithms: classifiers (e.g., logistic regression, decision trees) and regressors (e.g., linear regression, random forest).
- Understand model assumptions, strengths, and ideal use cases in health fields.
- Develop skills for training, hyperparameter tuning, and cross-validation on tabular medical datasets.
- Learn rigorous evaluation metrics to assess clinical applicability.
Capstone Project
In the final module, participants will apply their new skills to a real medical dataset. Guided through the end-to-end ML workflow, they will:
- Prepare and preprocess raw clinical data
- Select and train appropriate models
- Tune hyperparameters for optimal performance
- Evaluate and interpret results with an eye toward clinical deployment
Course Content
Session 01: Introduction to Python
- Prerequisites
Session 02: Variables, Input/Output, Data Types, Strings and Operators
Session 03: Control Flow, For Loop, While Loop
Session 04: Functions, Anonymous Functions, Exception Handling
Session 05: Reading & Writing in Pandas, Indexing, Selecting & Assigning, Summary Functions & Map
Session 06: Grouping & Aggregation, Merging and Combining, Data Types and Missing Values
Session 07: Dataset Introduction, Outlier Detection & Handling, Missing Data Imputation, Data Type Preprocessing
Session 08: Introduction to EDA, Primary Steps, Univariate Analysis, and Bivariate Analysis
Session 09: Introduction to NumPy & Arrays, Array Operations & Indexing, Statistical Analysis with NumPy
Session 10: Classification & Regression, Baseline Method, Prediction & Probability Threshold, Evaluation Metrics, and Extras
Session 11: Decision Tree, Bagging, Boosting, Overfitting & Model Selection
Session 12: Hyperparameters, Hyperparameter Tuning, and Retraining
Session 13: Importance of Interpretability, Global Feature Importance and SHAP
Session 14: Feature Engineering, Feature Creation, Feature Selection and Dimensionality Reduction
Session 15: Saving and Packaging the Model, Environment Reproducibility, Introduction to Model Deployment and Conclusion & Next Steps
Session 16: Projects