Data Science on Google Cloud Platform: Building Data Pipelines

via LinkedIn Learning


Learn how to design and build big data pipelines on Google Cloud Platform.

Cloud computing brings on-demand scalability and elasticity to data science applications. Expertise in the major platforms, such as Google Cloud Platform (GCP), is essential for IT professionals. This course, one of a series by veteran cloud engineering specialist and data scientist Kumaran Ponnambalam, shows how to use the latest technologies in GCP to build a big data pipeline that ingests, transports, and transforms data entirely in the cloud. Learn how to set up data processing jobs using Apache Beam and Cloud Dataflow. Discover how to use Cloud Pub/Sub for stream ingestion and real-time messaging. Finally, find out how to process the stream events in Cloud Dataflow. The course works through an end-to-end use case that shows how to apply its concepts and best practices in a practical data science workflow.
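
To give a feel for the kind of stream processing the description mentions, here is a minimal sketch of fixed-window counting in plain Python. It is not taken from the course and does not use the Apache Beam or Dataflow APIs. The `timestamp,key` event format, the function names, and the 60-second window size are all assumptions made for illustration.

```python
from collections import defaultdict


def parse_event(line):
    """Transform step: turn a raw 'timestamp,key' line into (timestamp, key).

    The 'timestamp,key' format is a hypothetical event layout.
    """
    ts, key = line.strip().split(",")
    return float(ts), key


def window_start(timestamp, size_seconds):
    """Return the start of the fixed window that contains this timestamp."""
    return int(timestamp // size_seconds) * size_seconds


def count_per_window(lines, size_seconds=60):
    """Group parsed events by (window start, key) and count each group."""
    counts = defaultdict(int)
    for line in lines:
        ts, key = parse_event(line)
        counts[(window_start(ts, size_seconds), key)] += 1
    return dict(counts)


# Hypothetical raw events, as they might arrive from a message stream.
raw_events = ["5,click", "42,view", "61,click", "75,click", "130,view"]
result = count_per_window(raw_events, size_seconds=60)
print(result)
```

With these sample events, the two clicks at 61 and 75 seconds fall into the same window starting at 60, so that group has a count of 2. A managed runner would apply the same idea to unbounded input, along with concerns such as late data that this sketch ignores.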


  • What goes into a data pipeline?
  • Data science modules covered
1. GCP Data Pipeline Products
  • GCP data pipeline options
  • Cloud Dataproc
  • Cloud Dataflow
  • Cloud Pub/Sub
2. Apache Beam
  • What is Apache Beam?
  • Beam pipelines
  • PCollections
  • Transforms
  • Pipeline I/O
  • Runners
3. Setting Up Dataflow
  • Setting up GCP for Dataflow
  • Setting up Python
  • Creating a simple pipeline
  • Executing in Dataflow
4. Data Processing with Beam and Dataflow
  • Reading text files
  • ParDo
  • GroupBy
  • Map
  • Combine
  • Writing data to text files
  • Other capabilities
5. Cloud Pub/Sub
  • What is Pub/Sub?
  • Topics and messages
  • Publishers
  • Subscribers
  • Create a topic
  • Create a subscription
  • Publish and receive
  • Python SDK
6. Streaming with Dataflow
  • Streaming with Dataflow
  • Windowing with Dataflow
  • Streaming and windowing example
  • Next steps

Taught by

Kumaran Ponnambalam
