Data visualization is a critical part of any data science project. Once data have been imported and wrangled into place, visualizing your data can help you get a handle on what’s going on in the data set. Similarly, once you’ve completed your analysis and are ready to present your findings, data visualizations are a highly effective way to communicate your results to others. In this course we will cover what data visualization is and define some of the basic types of data visualizations.
In this course you will learn about the ggplot2 R package, a powerful set of tools for making stunning data graphics that has become the industry standard. You will learn about different types of plots, how to construct effect plots, and what makes for a successful or unsuccessful visualization.
In this specialization we assume familiarity with the R programming language. If you are not yet familiar with R, we suggest you first complete R Programming before returning to complete this course.
About This Course
Data visualization is a critical part of any data science project. Once data have been imported and wrangled into place, visualizing your data can help you get a handle on what’s going on in the dataset. Similarly, once you’ve completed your analysis and are ready to present your findings, data visualizations are a highly effective way to communicate your results to others.
There are many types of plots that are helpful. We’ll discuss a few basic ones below and will include links to a few galleries where you can get a sense of the many different types of plots out there.
Making Good Plots
The goal of data visualization in data analysis is to improve understanding of the data. As mentioned in the last lesson, this could mean improving our own understanding of the data or using visualization to improve someone else’s understanding of the data. We discussed some general characteristics and basic types of plots in the last lesson, but here we will step through a number of general tips for making good plots. When generating exploratory or explanatory plots, you’ll want to ensure information being displayed is being done so accurately and in a away that best reflects the reality within the dataset. Here, we provide a number of tips to keep in mind when generating plots.
Plot Generation Process
Having discussed some general guidelines, there are a number of questions you should ask yourself before making a plot. There are three main questions you should ask any time you create a visual display of your data. We will discuss these three questions below.
R was initially developed for statisticians, who often are interested in generating plots or figures to visualize their data. As such, a few basic plotting features were built in when R was first developed. These are all still available; however, over time, a new approach to graphing in R was developed. This new approach implemented what is known as the grammar of graphics, which allows you to develop elegant graphs flexibly in R. Making plots with this set of rules requires the R package ggplot2. This package is a core package in the tidyverse, so as along as the tidyverse has been loaded in, you’re ready to get started.
So far, we have walked through the steps of generating a number of different graphs (using different geoms) in ggplot2. We discussed the basics of mapping variables to your graph to customize its appearance or aesthetic (using size, shape, and color within aes()). Here, we’ll build on what we’ve previously learned to really get down to how to customize your plots so that they’re as clear as possible for communicating your results to others. The skills learned in this lesson will help take you from generating exploratory plots that help you better understand your data to explanatory plots – plots that help you communicate your results to others. We’ll cover how to customize the colors, labels, legends, and text used on your graph. Since we’re already familiar with it, we’ll continue to use the diamonds dataset that we’ve been using to learn about ggplot2.
While we have focused on figures here so far, tables can be incredibly informative at a glance too. If you are looking to display summary numbers, a table can also visually display information.
Beyond the many capabilities of ggplot2, there are a few additional packages that build on top of ggplot2’s capabilities. We’ll introduce a few packages here so that you can (1) directly annotate points on plots (ggrepel and directlabels); (2) combine multiple plots (cowplot + patchwork); and (3) generate animated plots (gganimate). These are referred to as ggplot2 extensions There are dozens of additional ggplot2 extensions available if you’d like to explore other plotting options beyond what is covered here!
At this point, we’ve done a lot of work with our case studies. We’ve introduced the case studies, read them into R, and have wrangled the data into a usable format. Now, we get to peek at the data using visualizations to better understand each dataset’s observations and variables! When working through the steps of the case studies, you can use either RStudio on your own computer or Coursera lab spaces provided for each case study.
Project: Visualizing Data in the Tidyverse
In this project, you will practice exploring data and creating data visualizations with the tidyverse using nutrition and sales data from fast food restaurants in 2018.
Carrie Wright, PhD, Shannon Ellis, PhD, Stephanie Hicks, PhD and Roger D. Peng, PhD