Data Ingestion with Python

via LinkedIn Learning

Overview

Learn how to use Python tools and techniques to solve one of the main challenges data scientists face: getting good data to train their algorithms.

Syllabus

Introduction
  • Why is data ingestion important?
  • What you should know
  • Using the exercise files
1. Data Ingestion Overview
  • Overview of data scientists' work
  • Where does data come from?
  • Different types of data
  • The data pipeline (ETL)
  • Final destination (data lake)
2. Reading Files
  • Working in CSV
  • Working in XML
  • Working in Parquet, Avro, and ORC
  • Unstructured text
  • JSON
  • Challenge: CSV to JSON
  • Solution: CSV to JSON
3. Calling APIs
  • Working with JSON
  • Making HTTP calls
  • Processing event-based data
  • Challenge: Location from IP
  • Solution: Location from IP
4. Web Scraping
  • Try to find an API
  • Working with Beautiful Soup
  • Working with Scrapy
  • Working with Selenium
  • Other considerations
  • Challenge: GitHub API
  • Solution: GitHub API
5. Schema
  • What are schemas?
  • Working with ontologies
  • What should be in a schema?
  • Schema changes
  • Schema validations
6. Working with Databases
  • Types of databases
  • Hosted and cost of ops
  • Working with relational databases
  • Working with key-value databases
  • Working with document databases
  • Working with graph databases
  • Challenge: ETL
  • Solution: ETL
7. Troubleshooting Data
  • Data is never 100% okay
  • Causes of errors
  • Filling missing values
  • Finding outliers (manual)
  • Finding outliers (ML)
  • Challenge: Clean rides according to ride duration
  • Solution: Clean rides according to ride duration
8. Data KPIs and Process
  • Design your data
  • KPIs
  • What to monitor?
Conclusion
  • Next steps

Taught by

Miki Tebeka

Reviews

4.2 rating at LinkedIn Learning based on 336 ratings