Ask the right questions, manipulate data sets, and create visualizations to communicate results.
This Specialization covers foundational data science tools and techniques, including getting, cleaning, and exploring data, programming in R, and conducting reproducible research. Learners who complete this specialization will be prepared to take the Data Science: Statistics and Machine Learning specialization, in which they build a data product using real-world data.
The five courses in this specialization are the same courses that make up the first half of the Data Science Specialization. This specialization is intended for learners who want to start and complete the foundational part of the curriculum before moving on to the more advanced topics in Data Science: Statistics and Machine Learning.
Course 1: The Data Scientist’s Toolbox - Offered by Johns Hopkins University. In this course you will get an introduction to the main tools and ideas in the data scientist's ...
Course 2: R Programming - Offered by Johns Hopkins University. In this course you will learn how to program in R and how to use R for effective data analysis. You ...
Course 3: Getting and Cleaning Data - Offered by Johns Hopkins University. Before you can work with data you have to get some. This course will cover the basic ways that data can ...
Course 4: Exploratory Data Analysis - Offered by Johns Hopkins University. This course covers the essential exploratory techniques for summarizing data. These techniques are ...
Course 5: Reproducible Research - Offered by Johns Hopkins University. This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible ...
In this course you will get an introduction to the main tools and ideas in the data scientist's toolbox. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program, such as version control, Markdown, Git, GitHub, R, and RStudio.
In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure the software necessary for a statistical programming environment and describe generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. Topics in statistical data analysis will provide working examples.
Before you can work with data you have to get some. This course covers the basic ways that data can be obtained: from the web, from APIs, from databases, and from colleagues in various formats. It also covers the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed up downstream data analysis tasks. The course also covers the components of a complete data set, including raw data, processing instructions, codebooks, and processed data. In short, it covers the basics needed for collecting, cleaning, and sharing data.
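As a rough illustration of what making data “tidy” can mean, the sketch below reshapes a small "wide" table (one column per treatment) into a "long" one with a single observation per row. It is written in Python with only the standard library rather than in R, and the table contents, the column names, and the `melt` helper are all hypothetical and are not taken from the course materials.

```python
# Hypothetical "wide" table: one row per patient, one column per treatment.
wide_rows = [
    {"patient": "A", "treatment_a": 12, "treatment_b": 15},
    {"patient": "B", "treatment_a": 9, "treatment_b": 11},
]


def melt(rows, id_key, value_keys, var_name="variable", value_name="value"):
    """Reshape wide rows into tidy (long) rows: one observation per row."""
    tidy = []
    for row in rows:
        for key in value_keys:
            tidy.append({
                id_key: row[id_key],     # identifier carried over unchanged
                var_name: key,           # former column name becomes a value
                value_name: row[key],    # the measurement itself
            })
    return tidy


tidy_rows = melt(
    wide_rows,
    id_key="patient",
    value_keys=["treatment_a", "treatment_b"],
    var_name="treatment",
    value_name="score",
)

for row in tidy_rows:
    print(row)
```

In the tidy form every variable (patient, treatment, score) has its own column, so grouping, filtering, and plotting can be written once without special-casing each treatment column.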
In this 1-hour project-based course, you will learn exploratory data analysis (EDA) techniques and create visualizations to analyze trends, patterns, and relationships in the data. By the end of this project, you will have applied EDA to a real-world dataset.
This class is for learners who want to use Python for data visualization and data analysis, and for learners who are currently taking a basic machine learning course, or have already finished one, and are looking for a practical data visualization and analysis project. The project also gives learners basic knowledge of exploratory analysis and improves their skills in creating maps, and it can be added to their portfolios in support of their career goals.
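A first exploratory pass often starts with a numeric summary before any plots are drawn. The sketch below is a minimal example under stated assumptions: the `values` list is made-up data, `summarize` is a hypothetical helper, and the 1.5 × IQR rule used to flag unusual points is one common convention, not necessarily the one taught in the project.

```python
import statistics

# Hypothetical measurements; a real project would load a dataset instead.
values = [2.1, 3.4, 3.9, 4.0, 4.2, 5.8, 12.5]


def summarize(xs):
    """Return a small dictionary of summary statistics for a list of numbers."""
    q1, median, q3 = statistics.quantiles(xs, n=4)  # quartile cut points
    return {
        "n": len(xs),
        "min": min(xs),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(xs),
        "mean": statistics.mean(xs),
        "stdev": statistics.stdev(xs),  # sample standard deviation
    }


summary = summarize(values)

# Flag points outside 1.5 * IQR of the quartiles (a common rule of thumb).
iqr = summary["q3"] - summary["q1"]
low_fence = summary["q1"] - 1.5 * iqr
high_fence = summary["q3"] + 1.5 * iqr
outliers = [x for x in values if x < low_fence or x > high_fence]

for name, value in summary.items():
    print(f"{name}: {value}")
print("possible outliers:", outliers)
```

A summary like this suggests which plots are worth drawing next, for example a histogram or box plot to inspect the flagged point.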
This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducible research is the idea that data analyses, and more generally scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility lets people focus on the actual content of a data analysis rather than on superficial details reported in a written summary. In addition, reproducibility makes an analysis more useful to others because the data and code that actually produced the analysis are available. This course focuses on literate statistical analysis tools, which let you publish a data analysis in a single document so that others can easily execute the same analysis and obtain the same results.
Brian Caffo, PhD, Jeff Leek, PhD and Roger D. Peng, PhD