R Programming in Data Science: High Volume Data

via LinkedIn Learning

Overview

Analyze high-volume data using R. Learn how to produce visualizations of large data sets, profile and optimize R code, implement parallel processing, and integrate R with SQL databases and Apache Spark.

Syllabus

Introduction
  • Wrangling high-volume data with R
  • Sample data set
1. Problems and Opportunities with High-Volume Data
  • Perspectives on high-volume data
  • Big data and available memory
  • Code: Finding available memory
  • Big data and CPU cycles
  • Code: How fast is your computer?
2. Visualizing High-Volume Data
  • High-volume data and visualizations
  • Code: Graphs for high-volume data
  • Code: rug() and jitter()
  • Code: Applying statistics to plots
  • Code: Subsampled graphs for high-volume data
  • Code: Trellising data across multiple charts
3. Working within the R Programming Language
  • R programming tools for high-volume data
  • Downsampling
  • Profile R code to find inefficiencies
  • Code: Profile R code to find inefficiencies
  • Avoid the copy-on-modify problem with R
  • Code: Avoid copy-on-modify with data.table
  • Optimization versus readability
4. Advanced High-Volume Techniques
  • Compile R functions
  • Parallel processing with R
  • Code: Parallel R functions
  • bigmemory, LaF, and ff packages
5. Use R with External Big Data Solutions
  • Store high-volume data in a database
  • Code: R with databases
  • Cloud computing with R
  • Sparklyr with R
  • Code: R with Sparklyr
Conclusion
  • Summary of high-volume data with R

Taught by

Mark Niemann-Ross