

Performance Analysis of Apache Spark and Presto in Cloud Environments

Databricks via YouTube

Overview

This course presents a performance analysis of Apache Spark and Presto in cloud environments. It covers the performance and associated costs of the two systems. It also covers their usability (including monitoring), interoperability, and administration capabilities. The course shows how to run the TPC-DS benchmark to analyze SQL performance, presents quantitative results, and discusses the advantages and disadvantages of each system. It is intended for data scientists, engineers, and anyone interested in deploying data analytics at scale.

Syllabus

Intro
About BSC
TPC-DS Benchmark Work
Context and motivation
Systems Under Test (SUTs)
Hardware configuration
Software configuration: System Runtime 5.5
Benchmark execution time (base)
Cost-Based Optimizer (CBO) stats
Benchmark execution time (stats)
Speedup with table and column stats
Additional configuration for Presto
TPC-DS Power Test - Query 72
Dynamic data partitioning
Benchmark execution time (partitioning + stats)
Speedup with partitioning and stats
TPC Benchmark total execution time
TPC Benchmark DS metric
System costs
TPC Benchmark DS cost
TPC-DS price-performance
Usability and developer productivity
Conclusions

Taught by

Databricks
