Modern Web scraping With Python using Scrapy and Splash

via Skillshare

This course may be unavailable.

Overview

Web scraping has become one of the hottest topics nowadays. There are plenty of paid tools on the market, but they don't show you how anything is done, and as a consumer you will always be limited to their functionality.

In this course you won't be a consumer anymore. I'll teach you how to build your own scraping tool (a spider) using Scrapy.

You will learn:

  1. The fundamentals of web scraping

  2. How to build a complete spider

  3. How to understand the crawling behavior

  4. How to build a CrawlSpider

  5. The fundamentals of XPath

  6. How to locate content/nodes in the DOM using XPath

  7. How to store the data in JSON, CSV... and even in an external database (MongoDB)

  8. How to write your own custom pipeline

  9. The fundamentals of Splash

  10. How to scrape JavaScript websites using Scrapy Splash
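
To give a feel for items 6 and 7, here is a minimal sketch of locating nodes with XPath and storing the result as JSON. It is not taken from the course and does not use Scrapy: it relies only on Python's standard library (`xml.etree.ElementTree`, which supports a limited subset of XPath) so that it runs without any installation. The markup, the `quote`, `text` and `author` class names, and the `extract_quotes` helper are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <div class="quote">
      <span class="text">First sample quote</span>
      <small class="author">Author One</small>
    </div>
    <div class="quote">
      <span class="text">Second sample quote</span>
      <small class="author">Author Two</small>
    </div>
  </body>
</html>
""".strip()


def extract_quotes(markup):
    """Locate each quote node with an XPath predicate and return a list of dicts."""
    root = ET.fromstring(markup)
    quotes = []
    # ".//div[@class='quote']" selects every <div class="quote"> descendant.
    for node in root.findall(".//div[@class='quote']"):
        quotes.append({
            # Paths here are relative to the current quote node.
            "text": node.findtext("span[@class='text']"),
            "author": node.findtext("small[@class='author']"),
        })
    return quotes


quotes = extract_quotes(PAGE)

# Serialize the structured records as JSON (writing them to a file or a
# database would follow the same pattern).
as_json = json.dumps(quotes, indent=2)
print(as_json)
```

Real pages are rarely well-formed XML, which is one reason a dedicated scraping framework with its own selectors is typically used instead of the standard-library parser shown here.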

What makes this course different from the others, and why should you enroll?

  • First, this is the most up-to-date course. You will be using Python 3.6, Scrapy 1.5 and Splash 2.0.

  • You will have an in-depth, step-by-step guide on how to become a professional web scraper.

  • I'll show you how other courses scrape JavaScript websites using Selenium and why you shouldn't do it their way.

  • You will learn how to use Splash to scrape JavaScript websites, and I can assure you that you won't find any tutorials out there that teach how to really use Splash the way I'll be doing in this course.

So whether you are a data analyst who wants to add web scraping to their tool set, or someone who wants to learn how to extract data from unstructured HTML web pages and store it in a structured way for data analysis, you are welcome to join this course.

Syllabus

  • Introduction
  • Where to find all the code
  • Web Scraping In Theory
  • Spiders and Robots.txt
  • Scrapy Terminology
  • Setting up the Development Environment on Linux
  • Installing VS Code on Linux
  • Setting up the Development Environment on Windows PART 1
  • Setting up the Development Environment on Windows PART 2
  • Scrapy files explained
  • Hello World Scrapy
  • Quick Update for Windows 64-bit Users
  • XPath Terminology
  • XPath Syntax
  • XPath Axes
  • XPath Predicates
  • XPath Exercise
  • XPath Exercise Solution
  • Locating Quotes Authors and Tags
  • Scrapy XPath Selectors
  • Pagination
  • Feed Exporters
  • Items and ItemLoader
  • Input and Output processors
  • Final Touches
  • Deploying to the Cloud
  • MongoDB Terminology
  • Installing MongoDB on Linux
  • Installing MongoDB on Windows
  • Writing the MongoDB Pipeline
  • Data Visualization
  • Why Use Splash
  • Setting Up Splash On Linux
  • Writing Lua Scripts
  • Splash Request
  • Dealing with pagination
  • The Crawling Behaviour
  • The CrawlSpider simplified
  • Setting up the Rules
  • Challenge Solution (Building the Parse Method)
  • Techniques Used by Website Administrators to Prevent Web Scraping
  • Web Crawling and Scraping Best Practices
  • Custom Middleware (User Agent Rotator Middleware)

Taught by

Ahmed Rafik Djerah
