Web Scraping with Python
Overview
This course covers the process, strategies, and best practices of web scraping. Learn how to use Scrapy, a Python framework, to practice key techniques.
Instructor Ryan Mitchell teaches the practice of web scraping using the Python programming language. Ryan helps you understand how a human browsing the web is different from a web scraper. She introduces the Chrome developer tools and how to use them to examine network calls. Ryan shows you how to install Scrapy with pip and how to write some "Hello, World" code to scrape a simple web page. She covers how to use the Scrapy LinkExtractor to find internal links on a web page, then demonstrates how to configure Scrapy and the ItemPipeline to write data to various file formats. Ryan walks you through best practices for organizing your projects, writing reusable parsers, and future-proofing your spiders. She explains how APIs work and how they can be used to retrieve data directly. Ryan explores headers and cookies, then goes into browser automation and how to integrate Selenium with Scrapy. In conclusion, she offers ideas to continue your studies in computer science and think creatively about automation.
Syllabus
Introduction
- How to learn to stop worrying and love the bot
- What you should know
- What is web scraping?
- How the internet works: A brief summary
- Hello world with Scrapy
- Challenge: Scraping all data on a page
- Solution: Scraping all data on a page
- Crawling a website
- Recording data
- Scrapy settings file
- Structuring your scrapers for extensibility/reusability
- Challenge: Scraping news sites
- Solution: Scraping news sites
- Submitting a form
- Finding and using hidden APIs
- Site maps and robots.txt
- Challenge: Using CNN's sitemap
- Solution: Using CNN's sitemap
- Logging in
- Browser automation with Selenium
- Interacting with a page
- Next steps
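
As a taste of the "Site maps and robots.txt" topic listed above, here is a minimal sketch, not taken from the course, of how a scraper might check a site's robots.txt rules before fetching a page. It uses only Python's standard-library urllib.robotparser rather than Scrapy. The robots.txt content, the example.com URLs, the "my-bot" user agent, and the build_parser helper are all invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether a (hypothetical) bot may fetch each URL under these rules.
print(parser.can_fetch("my-bot", "https://example.com/articles/1"))
print(parser.can_fetch("my-bot", "https://example.com/private/data"))

# List any sitemap URLs declared in the file (requires Python 3.8+).
print(parser.site_maps())
```

In a real scraper the rules would be downloaded from the target site instead of hard-coded. Scrapy can also honor robots.txt on its own through the ROBOTSTXT_OBEY setting.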
Taught by
Ryan Mitchell