Scalability is one of the biggest challenges in data science. Learn how to evaluate data, choose the right algorithms, and perform predictive modeling at scale.
Syllabus
Introduction
- Scaling machine learning initiatives
- Defining terms
- Data and supervised machine learning
- The nine big data bottlenecks
- The stages of predictive analytics data
- Why you might have too little data
- How much data do I need?
- Balancing
- Who truly has big data?
- Assessing data
- Selecting: Data that should be left out
- Seasonality and time alignment
- Data and the data scientist
- Aggregate and restructure
- Dummy coding
- Feature engineering
- Understanding the modeling process
- Slow algorithms: Brute force
- Slow algorithms: More calculations
- Slow algorithms: More models
- How to sample properly
- Modeling with missing data
- Scoring traditional ML models
- Scoring a black box model
- Scoring an ensemble
- Batch vs. real-time scoring
- Data prep and scoring
- Combining batch and real-time scoring
- What is model monitoring?
- How often should you rebuild?
- Next steps
Taught by
Keith McCormick