Online Course

Getting and Cleaning Data

Johns Hopkins University via Coursera

  • Provider Coursera
  • Cost Free Online Course (Audit)
  • Session Finished
  • Language English
  • Certificate Paid Certificate Available
  • Effort 4-9 hours a week
  • Duration 4 weeks long

Overview

Before you can work with data, you have to get some. This course covers the basic ways to obtain data: from the web, from APIs, from databases, and from colleagues in various formats. It also covers the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speeds up downstream data analysis tasks. Finally, the course covers the components of a complete data set (raw data, processing instructions, codebooks, and processed data) and the basics needed for collecting, cleaning, and sharing data.
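
As one illustration of what “tidy” means here, the following is a minimal sketch in Python (the course itself works in R), assuming the common convention of one variable per column and one observation per row. The table, the column names, and the `to_tidy` helper are hypothetical and are not taken from the course materials.

```python
# Hypothetical wide-format table: one row per patient, one column per visit.
wide_rows = [
    {"patient": "A", "visit1": 5.1, "visit2": 4.8},
    {"patient": "B", "visit1": 6.0, "visit2": 5.7},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows so that each output row holds exactly one observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row instead
            tidy.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return tidy


tidy_rows = to_tidy(wide_rows, "patient", "visit", "score")
for tidy_row in tidy_rows:
    print(tidy_row)
```

In the tidy form, each row is a single (patient, visit, score) observation, which is the shape that filtering, grouping, and plotting tools generally expect.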

Syllabus

Week 1
-In this first week of the course, we look at finding data and reading different file types.

Week 2
-Welcome to Week 2 of Getting and Cleaning Data! The primary goal is to introduce you to the most common data storage systems and the appropriate tools to extract data from the web or from databases like MySQL.

Week 3
-Welcome to Week 3 of Getting and Cleaning Data! This week the lectures will focus on organizing, merging and managing the data you have collected using the techniques from the Week 1 and Week 2 lectures.

Week 4
-Welcome to Week 4 of Getting and Cleaning Data! This week we finish up with lectures on text and date manipulation in R. In this final week we will also focus on peer grading of Course Projects.


Taught by

Jeff Leek

Reviews for Coursera's Getting and Cleaning Data

Based on 56 reviews

  • 5 stars 23%
  • 4 stars 39%
  • 3 stars 13%
  • 2 stars 13%
  • 1 star 13%

Life S
Life completed this course.
Getting and Cleaning Data is the third course in the first wave of Johns Hopkins's data science specialization track on Coursera. It is recommended that you take this course after The Data Scientist's Toolbox and R Programming courses.

The title of the course pretty well sums up the content: the entire class is about loading data into R and cleaning it up so that it can be used for data analysis. You'll learn how to load various data formats into R, such as JSON, XML, CSV, and Excel files, and how to get data from other sources like MySQL and web APIs. The course also discusses subsetting data…
29 people found this review helpful
Anonymous
Anonymous is taking this course right now.
I'm a fresh beginner to R and my only experience with it is from the previous 2 courses in this specialization.

The lectures aren't so bad... they're a little bit boring and not engaging since they rarely are more than just a voiceover and slides. If that's important to you, don't take this class. However, I do think the instructors explain the lecture topics well and there is some value in their short walkthroughs.

Unfortunately... this only applies to the lecture topics... which are often only a small part of the quizzes and programming assignments. The previous cou…
20 people found this review helpful
Stephen B
Stephen completed this course.
Class information is very sparse. There's a huge gap between the (minimal) content provided in the lectures and the class project required for completion of the course. This is the worst constructed college course and worst MOOC I have ever encountered. I've completed 12 MOOCs, 2 bachelor's degrees, and several graduate courses at Stanford, so that is a distinction earned by Johns Hopkins U from among a very wide field. A complete overhaul of this course and series is desperately needed.
34 people found this review helpful
Anonymous
Anonymous is taking this course right now.
Dropping this course because there is such a disconnect between what is taught and what is expected to complete the project and quizzes. I found myself using external sources to learn all of the material necessary. Many of the questions are vague, leaving you spending hours trying to complete tasks only to realize that the objective is different and just not communicated effectively. There is no coherent order to how they deliver the material, teaching basic concepts in week 3 which should have been covered in week 1 or the prior course in R programming. So, I will just use others' tutorials to learn data science in R. Ridiculous that I wasted so much time on this!
24 people found this review helpful
Anonymous
Anonymous completed this course.
Extremely frustrating class. I spent tons of time wondering what it is that I am actually supposed to do...

I am considering dropping the specialization.
24 people found this review helpful
Anonymous
Anonymous completed this course.
Course is lacking any kind of logic or structure. It's simply methods/functions thrown one after another. Complete lack of perspective.
22 people found this review helpful
Anonymous
Anonymous is taking this course right now.
A rather poor and confusing course. The lectures are not so great. I'm rather disappointed with it. Normally these courses are rather good, but not this one.
14 people found this review helpful
Syed A
Syed completed this course, spending 3 hours a week on it and found the course difficulty to be medium.
I didn't learn much from the course lectures or materials; rather, I learned the most from Stack Overflow. Really a big disappointment.
10 people found this review helpful
Anonymous
Anonymous is taking this course right now.
This is the third course in the series, and it's taken me this long to realize that everything I learn comes from external sources and not the course itself. If you do this, you'll learn something. If you don't, you'll lose your mind and waste a ton of time in the process.

I started out by watching the videos, taking copious notes and then realizing that I didn't have the information I needed to complete the assignments. I was very stressed about it until my friend -- who uses R programming regularly for work -- shrugged and said, "That's how it works in the real world. You searc…
5 people found this review helpful
Brandt P
Brandt completed this course, spending 3 hours a week on it and found the course difficulty to be easy.
This is the third course in the Data Science specialization. The course is all about how to read data of different formats into R and how to create tidy datasets (one variable per column, one observation per row, one observational unit type per table). There are brief introductions to reading datasets from online resources such as XML files, website APIs, and MySQL, and the quizzes for weeks 1 and 2 require you to work with these tools. Week 3 introduces subsetting and reshaping data and tools like dplyr, and week 4 introduces working with text strings and regular expressions.

I f…
Ramesh N
Ramesh completed this course, spending 20 hours a week on it and found the course difficulty to be very hard.
This course just provides an outline of the subject. It's up to you to figure out how to get the assignment done. Google and Stack Overflow are your instructors. Really! To make things worse, the course assignment instructions are very ambiguous, and you spend tons of time trying to understand the problem rather than solving it. If that's the intent of this course, they have succeeded in it, but when you have a course deadline (and a full-time job, as many of you do), it's extremely frustrating.
7 people found this review helpful
Anonymous
Anonymous completed this course.
There is a complete disconnect between what is taught and what is expected in the project and tests. The course is pretty bad. I was considering doing the specialization in Data Science and this course is making me re-think this goal.

I understand that you need to be good at 'hacking' to be a good data scientist, but if that's the case, then what's the point of paying money to have to Google everything?
1 person found this review helpful
Hongmei L
Hongmei completed this course.
There is a significant gap between the video lecture and the assignments/quizzes.

Very horrible... I paid for the course certificate, and I can't retake it for free.
4 people found this review helpful
Andari R
Andari is taking this course right now, spending 4 hours a week on it and found the course difficulty to be very hard.
What is taught in the video materials is nothing compared to the quizzes and final project. At this point I'm still re-reading my final project assignment data, and although I can sense some things that need to be done to finish this project, it has taken me hours on Stack Overflow and other R blogs (just to make sure the command/formula I type is right). Very frustrating compared to other Coursera modules I finished. After this I may drop the Data Scientist specialisation altogether.
Scott O
Scott completed this course.
Getting and Cleaning Data promises to teach students how to extract data from common data storage formats (including databases, specifically SQL, XML, JSON, and HDF5), and from the web using APIs and web scraping. The syllabus also includes tips on using R to clean and recode data, and, in the last lecture, a long list of links to sources of data. It's also worth noting that the style of the video lectures is a bit different from those of other classes I've taken: there's never any video of the instructor, just the instructor's voice over the lecture notes.
3 people found this review helpful
Jason C
Jason completed this course, spending 4 hours a week on it and found the course difficulty to be hard.
This course teaches a lot of extremely important skills in data science. No matter what you end up doing, dealing with data quality is going to be a part of it. This is a challenging class, and rightly so, as the work is tedious, but oh-so-important! The lectures do get a bit bland, but are informative.
Daniel R
Daniel completed this course, spending 6 hours a week on it and found the course difficulty to be medium.
Ok, this course is really helpful!

Nothing in it is wasted; this course is a must for a data scientist!
Kuhnrl30 K
Kuhnrl30 completed this course.
1 person found this review helpful
Jevgeni M
Jevgeni completed this course.
1 person found this review helpful
Anonymous
Anonymous is taking this course right now.
0 people found this review helpful