Pluralsight

Work with RDDs, DataFrames, and Datasets in Apache Spark

via Pluralsight

Overview


To understand why RDDs are used as building blocks for processing large amounts of data in parallel, you first need to understand RDDs and their immutability. In this course, Work with RDDs, DataFrames, and Datasets in Apache Spark, you'll learn the difference between RDDs and DataFrames, when to use each to represent data, and how Apache Spark processes them under the hood. Understanding how these abstractions work will give you a better grasp of how big data processing is done on a platform such as Apache Spark, and of what that means for efficiency when transforming data into something meaningful. When you're finished with this course, you'll have a better understanding of how RDDs represent data in Apache Spark and when to use DataFrames instead for big data processing.

Taught by

Raphael Alampay
