Overview
This course explores the privacy issues surrounding machine learning models, focusing on the leakage of training data. Learning outcomes include understanding how adversaries can extract personally identifiable information from models such as GPT-2, and the challenges in preventing such leakage. The course teaches differential privacy as a rigorous solution, albeit one that trades off utility. The teaching method is a lecture presenting the privacy problem and potential solutions. The intended audience includes researchers looking to address privacy concerns in machine learning models and practitioners seeking practical techniques to test for data memorization.
Syllabus
Do models leak training data?
Act I: Extracting Training Data
A New Attack: Training Data Extraction
1. Generate a lot of data
2. Predict membership
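The two-step attack can be illustrated with a toy sketch: a tiny character-bigram model stands in for a real language model such as GPT-2, and a likelihood score stands in for the attack's membership predictor. The corpus, candidate strings, and scoring function here are illustrative assumptions, not the talk's actual setup.

```python
import math
from collections import Counter

# Hypothetical "training data": imagine a secret that appears in the corpus.
TRAIN = "the secret number is 12345. " * 20 + "the weather is nice today. " * 20

def bigram_model(text):
    # "Train" a character-bigram model: P(next char | current char).
    counts = Counter(zip(text, text[1:]))
    totals = Counter(a for a, _ in counts)
    return {pair: c / totals[pair[0]] for pair, c in counts.items()}

MODEL = bigram_model(TRAIN)

def log_likelihood(sample):
    # Higher (less negative) means the model finds the text more familiar;
    # unseen bigrams get a small floor probability.
    return sum(math.log(MODEL.get(pair, 1e-9)) for pair in zip(sample, sample[1:]))

# Step 1: generate a lot of candidate data (two hand-picked candidates
# stand in here for thousands of model samples).
memorized = "the secret number is 12345."
fresh = "the secret number is 99999."

# Step 2: predict membership — the memorized string scores as more likely.
assert log_likelihood(memorized) > log_likelihood(fresh)
```

The real attack follows the same shape at scale: sample heavily from the model, then rank samples by how confidently the model assigns probability to them.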
Evaluation
Up to 5% of the output of language models is copied verbatim from the training dataset
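One simple way to test a claim like this is to measure what fraction of a generated text's word n-grams appear verbatim in the training corpus. The 8-word window below is an arbitrary illustrative choice, not the talk's exact metric.

```python
def ngrams(text, n=8):
    # Set of overlapping n-word windows in the text.
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_fraction(generated, training, n=8):
    # Fraction of the generated text's n-grams found verbatim in training.
    gen = ngrams(generated, n)
    train = ngrams(training, n)
    return len(gen & train) / len(gen) if gen else 0.0

# Hypothetical corpus and outputs for illustration.
training = "one two three four five six seven eight nine ten"
copied = "one two three four five six seven eight"       # appears verbatim
novel = "alpha beta gamma delta epsilon zeta eta theta"  # never seen

assert verbatim_fraction(copied, training) == 1.0
assert verbatim_fraction(novel, training) == 0.0
```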
Case study: GPT-2
Act II: Ad-hoc privacy isn't
Act III: Whatever can we do?
3. Use differential privacy
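Differential privacy's simplest building block is the Laplace mechanism; the sketch below applies it to a counting query. This is a minimal textbook illustration (model training would instead use something like DP-SGD), and the data and parameters are assumed for the example.

```python
import random

def laplace_noise(scale):
    # The difference of two i.i.d. exponentials is Laplace-distributed,
    # which avoids any log-of-zero edge case in inverse-CDF sampling.
    lam = 1.0 / scale
    return random.expovariate(lam) - random.expovariate(lam)

def private_count(records, predicate, epsilon):
    # A counting query has sensitivity 1 (adding or removing one record
    # changes it by at most 1), so Laplace noise with scale 1/epsilon
    # yields an epsilon-differentially-private answer.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical dataset: ages of survey respondents.
ages = [23, 35, 41, 29, 52, 61, 38]
noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0)
```

Smaller epsilon means more noise and stronger privacy, which is exactly the privacy-utility trade-off the overview mentions.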
Questions?
Taught by
USENIX Enigma Conference