Machine Learning Feature Selection in Python

Overview

In this one-hour, project-based course, you will learn the basic principles of feature selection and feature extraction, and how they can be implemented in Python. Together, we will explore basic Python implementations of Pearson correlation filtering, Select-K-Best KNN-based filtering, backward sequential filtering, recursive feature elimination (RFE), estimating feature importance using bagged decision trees, lasso regularization, and reducing dimensionality using Principal Component Analysis (PCA). We will focus on the simplest implementation of each technique, usually using Scikit-Learn functions.
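To give a sense of what "simplest implementation" can look like, here is a minimal sketch of three of the techniques listed above (a Select-K-Best filter, RFE, and PCA) using Scikit-Learn. It is not taken from the course materials. The synthetic data, the ANOVA F-score scoring function, the logistic regression estimator, and the choice of 4 features and 3 components are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Assumption: synthetic stand-in data with 8 numeric features and a binary
# target, used here instead of the course's dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_redundant=2, random_state=0
)

# Filter method: keep the 4 features with the highest ANOVA F-scores.
# (The scoring function is an illustrative choice, not the course's.)
kbest = SelectKBest(score_func=f_classif, k=4).fit(X, y)
kbest_idx = np.flatnonzero(kbest.get_support())

# Wrapper method: recursive feature elimination around a logistic regression,
# dropping the weakest feature at each step until 4 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4)
rfe.fit(X, y)
rfe_idx = np.flatnonzero(rfe.support_)

# Feature extraction: project the data onto its first 3 principal components.
pca = PCA(n_components=3).fit(X)
X_pca = pca.transform(X)

print("SelectKBest kept columns:", kbest_idx)
print("RFE kept columns:", rfe_idx)
print("PCA output shape:", X_pca.shape)
print("Variance explained by 3 components:", pca.explained_variance_ratio_.sum())
```

Note the difference in kind: the first two methods return a subset of the original columns, while PCA returns new columns that are linear combinations of all of them.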

The course is taught on Ubuntu Linux, but everything can be done in any Python IDE on any operating system. We will use the IDLE development environment to demonstrate several feature selection techniques on the publicly available Pima Diabetes dataset.

I encourage learners to experiment with these techniques not only for feature selection, but for hyperparameter tuning as well.

Note: This course works best for learners who are based in the North America region. We’re currently working on providing the same experience in other regions.