This course is aimed at anyone interested in applying machine learning techniques to scientific problems. We'll cover the complete machine learning pipeline, from reading in, cleaning, and transforming data to running basic and advanced machine learning algorithms. We'll start with data preprocessing techniques, such as PCA and LDA. Then, we'll dive into the fundamental AI algorithms: SVMs and K-Means clustering. Along the way, we'll build our mathematical and programming toolbox to prepare ourselves to work with more complicated models. Finally, we'll explore advanced methods such as random forests and neural networks. Throughout the course, we'll be using medical and astronomical datasets. In the final project, we'll apply our skills to compare different machine learning models in Python.
Before the AI: Preparing and Preprocessing Data
In this module, we'll tackle the steps taken before we can use AI algorithms. We'll start with an introduction to the most prominent data preprocessing techniques, including filling in missing values and removing outliers. Then we'll dive into data transformations, including PCA and LDA, two methods widely used for dimensionality reduction. Finally, we'll learn how to code these techniques in Python to set up our data for use in the next module.
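As a rough illustration of the preprocessing steps described above, here is a minimal sketch using NumPy and scikit-learn. The synthetic data, the median imputation strategy, and the 3-standard-deviation outlier rule are all assumptions chosen for the example, not necessarily what the course uses.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a real dataset (assumption): 200 samples, 5 features, 2 classes.
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 5)) + y[:, None] * 2.0
X[rng.random(X.shape) < 0.05] = np.nan  # knock out roughly 5% of the entries

# 1. Fill missing values with each column's median.
X_filled = SimpleImputer(strategy="median").fit_transform(X)

# 2. Remove outliers: drop rows where any feature lies 3 or more
#    standard deviations from its column mean.
z = np.abs((X_filled - X_filled.mean(axis=0)) / X_filled.std(axis=0))
keep = (z < 3).all(axis=1)
X_clean, y_clean = X_filled[keep], y[keep]

# 3. Standardize, then reduce dimensionality.
X_scaled = StandardScaler().fit_transform(X_clean)
X_pca = PCA(n_components=2).fit_transform(X_scaled)  # unsupervised: ignores labels
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X_scaled, y_clean)  # uses labels

print(X_pca.shape, X_lda.shape)
```

Note that LDA can produce at most one fewer component than the number of classes, which is why the two-class example above keeps a single LDA component while PCA keeps two.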
Foundational AI Algorithms: K-Means and SVM
In this module, we'll dive into two of the most foundational machine learning algorithms: K-Means and support vector machines (SVMs). We'll start by comparing the two branches of ML: supervised and unsupervised learning. Then, we'll go into the specific similarities and differences between K-Nearest Neighbors for classification and K-Means clustering. Finally, we'll perform deep dives into K-Means and SVMs, learning the basic theory behind them and how to implement each in Python.
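To make the supervised versus unsupervised distinction concrete, the sketch below runs K-Means, K-Nearest Neighbors, and an SVM on the same toy data with scikit-learn. The blob centers, the hyperparameters, and the train/test split are illustrative assumptions rather than course material.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Three well-separated 2-D blobs (assumed toy data).
X, y = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=1.0,
    random_state=0,
)

# Unsupervised: K-Means groups the points without ever seeing y.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = kmeans.labels_

# Supervised: KNN and SVM learn from labeled training points
# and are scored on points they have not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
svm = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)

knn_accuracy = knn.score(X_test, y_test)
svm_accuracy = svm.score(X_test, y_test)
print("KNN accuracy:", knn_accuracy)
print("SVM accuracy:", svm_accuracy)
print("K-Means centers:", kmeans.cluster_centers_)
```

The "K" means something different in each case: in K-Means it is the number of clusters to find, while in K-Nearest Neighbors it is the number of labeled neighbors that vote on a prediction.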
Advanced AI: Neural Networks and Decision Trees
In this module, we'll explore some advanced AI techniques. We'll start with tree-based algorithms, popularized by random forests, which are used for both classification and regression. Then, we'll build our way to neural networks, starting by experimenting with different models. We'll spend some time in the TensorFlow Playground getting familiar with the different mechanics behind neural networks. Finally, we'll code our own neural networks to make predictions on unseen data.
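The following sketch shows what fitting a random forest and a small neural network to the same data might look like. It uses scikit-learn's `MLPClassifier` as a lightweight stand-in for a neural network; the course itself may use a different framework, and the two-moons dataset and layer sizes are assumptions made for the example.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two interleaving half-moons: a small nonlinear classification problem (assumed toy data).
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Random forest: an ensemble of decision trees trained on bootstrap samples.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Neural network: two hidden layers of 16 units each, with inputs standardized first.
net = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0),
)
net.fit(X_train, y_train)

# Accuracy on data neither model saw during training.
forest_accuracy = forest.score(X_test, y_test)
net_accuracy = net.score(X_test, y_test)
print("Random forest accuracy:", forest_accuracy)
print("Neural network accuracy:", net_accuracy)
```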
In this final module, we'll work through a course project to predict diabetes from health data. We'll implement several regressors and compare their errors on a held-out test set.
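A comparison of this kind could be sketched as follows. The example assumes scikit-learn's bundled diabetes dataset (which predicts a disease-progression score) and an arbitrary choice of three regressors; the actual project data and models may differ.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Assumed dataset: 442 patients, 10 baseline health features,
# target is a quantitative measure of disease progression.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate models (an illustrative selection).
models = {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "support vector regressor": SVR(kernel="rbf", C=100.0),
}

# Fit each model on the training set and record its test-set RMSE.
errors = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    errors[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

# Print models from lowest to highest test error.
for name, rmse in sorted(errors.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {rmse:.1f}")
```

Scoring every model on the same held-out split keeps the comparison fair, since none of the models has seen the test rows during training.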
Sabrina Moore, Rajvir Dua and Neelesh Tiruviluamala