
Johns Hopkins University

Getting and Cleaning Data

Johns Hopkins University via Coursera


Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data.

Taught by

Jeff Leek


3.4 rating, based on 58 Class Central reviews


  • Life is Study

    Life is Study completed this course.

    Getting and Cleaning Data is the third course in the first wave of Johns Hopkins's data science specialization track on Coursera. It is recommended that you take this course after the data scientist's toolkit and R programming courses. The title of…
  • Anonymous

    Anonymous is taking this course right now.

    I'm a fresh beginner to R and my only experience with it is from the previous 2 courses in this specialization. The lectures aren't so bad... they're a little bit boring and not engaging since they rarely are more than just a voiceover and slides.…
  • Stephen B

    Stephen B completed this course.

    Class information is very sparse. There's a huge gap between the (minimal) content provided in the lectures and the class project required for completion of the course. This is the worst constructed college course and worst MOOC I have ever encountered. I've completed 12 MOOCs, 2 bachelor's degrees, and several graduate courses at Stanford, so that is a distinction earned by Johns Hopkins U from among a very wide field. A complete overhaul of this course and series is desperately needed.
  • Anonymous

    Anonymous is taking this course right now.

    Dropping this course because there is such a disconnect between what is taught and what is expected to complete the project and quizzes. I found myself using external sources to learn all of the material necessary. Many of the questions are vague, leaving you spending hours trying to complete tasks only to realize that the objective is different and just not communicated effectively. There is no coherent order to how they deliver the material, teaching basic concepts in week 3 which should have been covered in week 1 or the prior course in R programming. So, I will just use others' tutorials to learn data science in R. Ridiculous that I wasted so much time on this!
  • Anonymous

    Anonymous completed this course.

    Extremely frustrating class. I spent tons of time wondering what it is that I am actually supposed to do...

    I am considering dropping the specialization.
  • Anonymous

    Anonymous completed this course.

    Course is lacking any kind of logic or structure. It's simply methods/functions thrown one after another. Complete lack of perspective.
  • Anonymous

    Anonymous is taking this course right now.

    A rather poor and confusing course. The lectures are not so great. I'm rather disappointed with it. Normally these courses are rather good, but not this one.
  • Syed Aslam completed this course, spending 3 hours a week on it and found the course difficulty to be medium.

    I didn't learn much from the course lectures or materials; rather, I learned most from Stack Overflow. Really a big disappointment.
  • Anonymous

    Anonymous is taking this course right now.

    This is the third course in the series, and it's taken me this long to realize that everything I learn comes from external sources and not the course itself. If you do this, you'll learn something. If you don't, you'll lose your mind and waste a ton…
  • Brandt Pence

    Brandt Pence completed this course, spending 3 hours a week on it and found the course difficulty to be easy.

    This is the third course in the Data Science specialization. The course is all about how to read data of different formats into R and how to create tidy datasets (one variable per column, one observation per row, one observational unit type per tabl…
  • Ramesh Natarajan

    Ramesh Natarajan completed this course, spending 20 hours a week on it and found the course difficulty to be very hard.

    This course just provides an outline of the subject. It's up to you to figure out how to get the assignment done. Google and StackOverflow are your instructors. Really! To make things worse, the course assignment instructions are very ambiguous, and you spend tons of time trying to understand the problem rather than solving it. If that's the intent of this course, they have succeeded in it, but when you have a course deadline (and a full-time job, as many of you do), it's extremely frustrating.
  • Mohd Azzani
    It's not free at all.
    Providing a demo doesn't mean free.
    I tried enrolling in the so-called free course and couldn't do it without providing a credit card.
    It provides a free demo, but the course itself is not free at all.
  • Andari Reksi

    Andari Reksi is taking this course right now, spending 4 hours a week on it and found the course difficulty to be very hard.

    What is taught in the video materials is nothing compared to the quizzes and final project. At this point I'm still re-reading my final project assignment data, and although I can sense some things that need to be done to finish this project, it has taken me hours on StackOverflow and other R blogs (just to make sure the command/formula I type is right). Very frustrating compared to other Coursera modules I finished. After this I may drop the Data Scientist specialisation altogether.
  • Anonymous

    Anonymous completed this course.

    There is a complete disconnect between what is taught and what is expected in the project and tests. The course is pretty bad. I was considering doing the specialization in Data Science and this course is making me re-think this goal.

    I understand that you need to be good at 'hacking' to be a good data scientist, but if that's the case then what's the point of paying money to have to Google everything?
  • Hongmei Li

    Hongmei Li completed this course.

    There is a significant gap between the video lecture and the assignments/quizzes.
    Very horrible... I paid for the course certification, and I can't retake it for free.
  • Michal
    The course is part of the very good 'data science with R' program (I don't know the current name because it changes) available on Coursera.

    The program is quite massive; it contains about 8 courses but is really thorough and well presented. It is designed with even complete beginners in mind, so you may start it without any prior knowledge.
  • Jason Michael Cherry completed this course, spending 4 hours a week on it and found the course difficulty to be hard.

    This course teaches a lot of extremely important skills in data science. No matter what you end up doing, dealing with data quality is going to be a part of it. This is a challenging class, and rightly so, as the work is tedious, but oh-so-important! The lectures do get a bit bland, but are informative.
  • Scott orr

    Scott orr completed this course.

    Getting and Cleaning Data promises to teach students how to extract data from common data storage formats (including databases, specifically SQL, XML, JSON, and HDF5), and from the web using APIs and web scraping. The syllabus also includes tips on using R to clean and recode data, and, in the last lecture, a long list of links to sources of data. It's also worth noting that the style of the video lectures is a bit different from those of other classes I've taken: there's never any video of the instructor, just the instructor's voice over the lecture notes.
  • Jevgeni Martjushev

    Jevgeni Martjushev completed this course.

  • Daniel Rosquete completed this course, spending 6 hours a week on it and found the course difficulty to be medium.

    OK, this course is really helpful!

    Nothing in it is wasted; this course is a must for a data scientist!
