Online Course

The Data Scientist’s Toolbox

Johns Hopkins University via Coursera


In this course you will get an introduction to the main tools and ideas in the data scientist's toolbox. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program, such as version control, Markdown, Git, GitHub, R, and RStudio.


Data Science Fundamentals
-In this module, we'll introduce and define data science and data itself. We'll also go over some of the resources that data scientists use to get help when they're stuck.

R and RStudio
-In this module, we'll help you get up and running with both R and RStudio. Along the way, you'll learn some basics about both and why data scientists use them.

Version Control and GitHub
-During this module, you'll learn about version control and why it's so important to data scientists. You'll also learn how to use Git and GitHub to manage version control in data science projects.

R Markdown, Scientific Thinking, and Big Data
-During this final module, you'll learn to use R Markdown and get an introduction to three concepts that are incredibly important to every successful data scientist: asking good questions, experimental design, and big data.

Taught by

Jeff Leek


3.3 rating, based on 164 reviews

  • Brandt Pence completed this course, spending 1 hour a week on it and found the course difficulty to be very easy.

    This is the first course in the Data Science specialization, offered by Johns Hopkins through Coursera. The official schedule lists the time commitment as 4 weeks of study with 1-4 hours/week of work. In reality, this course can be easily completed in...
  • Life is Study

    Life is Study completed this course and found the course difficulty to be very easy.

    The Data Scientist’s Toolbox is essentially just an overview of the data science specialization track. It introduces the very basics of R and RStudio, Git and GitHub, and a few other things that will be used in the data science specialization track. It is basically a bunch of introductory and supplementary material that shouldn't be a standalone course. You can complete all the lecture videos in the entire course in about 2 hours. It's almost embarrassing that Johns Hopkins has a paid verified certificate option for this course and that it is required to complete the data science specialization track.
  • Anonymous

    Anonymous completed this course.

    The whole "course" is doable in about 2 hours. It's laughably easy: the main point is to get you to install RStudio. Granted, if you've never used GitHub it'll be a bit bewildering, but I think it's wildly excessive to have a Signature Track certificate with a $49 fee to basically prove you're capable of signing up to a website.
  • Ricardo Vladimiro

    Ricardo Vladimiro completed this course, spending 1 hour a week on it and found the course difficulty to be very easy.

    This is a very simple introductory course in Coursera's Data Science Specialisation.

    It gives a brief overview of the eight other courses in the specialisation, an overview of what data science is (in the instructor's opinion), and an overall crash course on R, RStudio, and GitHub.

    The course is a blessing if you don't have any coding or computational statistics background, since it is an appropriate introduction. If you have the background and are taking the course for the specialisation's sake, it is annoying, but you can do it in (literally) 4 hours.
  • Anonymous

    Anonymous completed this course.

    This course is easy, and the whole project can be completed within a couple of hours. Even so, this is a valuable and helpful course for those new to GitHub, R, and some statistics jargon. I agree with many other reviewers that this course...
  • Anonymous

    Anonymous completed this course.

    Your reaction to this course will likely depend on your background. If you already know about integrated programming environments (RStudio), version control (Git and GitHub), and mark-up languages (Markdown and knitr), you will find it insultingly simple. If you don't know what I'm talking about, you badly need to do this course before attempting the rest of the specialisation. As a once-upon-a-time programmer who has not kept up, I found it fairly easy but also very useful.
  • Anonymous

    Anonymous completed this course.

    This class should be both:
    - optional.
    - free.
    Instead, it's required, and I wasted money paying for it.

    Others have covered this class's material in their reviews. My two cents: this content is insultingly simple if you have a decent amount of experience as a programmer. (I'm in the "danger zone" of their data scientist Venn diagram, I guess.) It is literally painful to sit through an "introduction to the command line" after many years working in Unix.

    It's inappropriate to place course overviews behind a paywall. I would not have paid the (now-reduced) $29 for this course if it weren't a gateway to the later ones.
  • Anonymous

    Anonymous completed this course.

    As of early 2016 there were ZERO code walkthroughs in the *INTRO* R programming course. I got a 98% on the R programming course (2nd in the sequence). I peeked in & didn't notice any change in the curricula. 1) I’m an experienced programmer & have...
  • Jasmine Mercier

    Jasmine Mercier completed this course, spending 1 hour a week on it and found the course difficulty to be very easy.

    This course provides a useful introduction to Git, GitHub, R, and RStudio, which are all very useful tools you'll need to complete the rest of the specialization as well as in real data science work. It also provides a philosophical overview of just what "data science" is all about, clarifying what kinds of work actual data scientists do.

    Unfortunately, all the material can be covered in a single day. The class should really have been a part of the R Programming course - paying $50 for the signature track (if you plan on doing the capstone project for the specialization) seems a bit too much.
  • Anonymous

    Anonymous is taking this course right now.

    Very difficult if you don't have any prior programming and computer science knowledge. This is not a beginner's course. It is frustrating and challenging, and it assumes students have enough prior knowledge to fill in the gaps during the lectures. As someone with no prior computer experience of any kind (other than basic Word), I was lost almost immediately.
  • Mohamed Sameh

    Mohamed Sameh completed this course and found the course difficulty to be very easy.

    The course is very easy and its material is very shallow. It should have been only a first-week introduction to another course in Coursera's Data Science specialization track.
  • Greg Chapman

    Greg Chapman completed this course, spending 3 hours a week on it and found the course difficulty to be very easy.

    First, I should note that the time spent per week is actually the total time for completion. This course is a very straightforward introduction to the data science track and the tools that are going to be used in the rest of the track. And that's it, there's really no meat to this course, but if you're not familiar with Git or command-line interfaces it could be really helpful. I suppose if you're paying for the specialization you just have to amortize the cost of this course over the other courses in the track.
  • Anonymous

    Anonymous completed this course.

    If you have never worked with Git, GitHub, R, or RStudio, this is a great (very short) introduction.

    I have seen a lot of people in later courses of the specialization complain that they did not know how to upload the peer projects onto GitHub, so I would say it can be a very useful class.
  • Adelyne Chan completed this course, spending 1 hour a week on it and found the course difficulty to be very easy.

    A very simple introduction to the tools used in data science, mainly to get students acquainted with using GitHub repositories and RStudio. I took this because it was recommended for the other courses in the specialisation, but if you are comfortable with using virtual tools it is probably possible to learn on the job with the subsequent courses. This course could feasibly be completed in a single sitting.

    Although I intend to take all the courses in the specialisation for its content (thus did not pay for a Verified Certificate), I am aware that a number of other people who are taking this for the specialisation did feel that it was not worth the money they paid given the very little content covered.
  • Matteo Ferrara completed this course, spending 1 hour a week on it and found the course difficulty to be very easy.

    By far the worst course I have ever taken. I understand they need to find a viable business model, but if I had not taken courses on Coursera before this one, I would not have bothered to open Coursera again. It is even worse because the following course, the introduction to R, has a very steep learning curve (and it is also a very bad course compared with the rest of Coursera's offering). They could have put everything together in one course, with one week for this and 7 for R, instead of 4 and 4.
  • Ryan Bowen completed this course, spending 2 hours a week on it and found the course difficulty to be very easy.

    This course essentially is just a walkthrough of the different programs that you will be utilizing throughout the rest of the Data Science Specialization (if you are continuing with the other courses). The fact that they allow you to pay for this course is a joke because you are taught next to nothing except how to install programs. If you plan on finishing the entire Specialization, then perhaps paying for it will benefit you in the long run. Otherwise, the class is easy, does not take much time, and sets you up to continue with the other courses.
  • Krishna Magar completed this course.

    Great course if you are hearing the words GitHub, R, and RStudio for the first time. It will give you the basics (practically) of these tools and of the rest of the courses in the Specialization. For people like programmers, it's just a mode of entry (only with the payment) to the rest of the Specialization courses. So, it's painful that this 2-hour course is expanded to a month plus a fee. But then it's a good bargain for the overall courses in the Specialization. Cheers!
  • Guest completed this course.

    Taking this course was very annoying due to its lack of structure (specifically, when the quizzes should be taken) and shallow content (mostly overviews). Also, if you have an older computer, installing the software needed to complete the class may be a challenge. I may try it again when I get a new computer.
  • Terrel Shumway completed this course, spending 2 hours a week on it and found the course difficulty to be very easy.

    The course was a good introduction. As an experienced programmer, I found the course quite easy. However, I do appreciate the focus on setting up the tools and environment. Too many people leave version control to chance. It is something people are just supposed to pick up as they go along.
  • Greg Kent

    Greg Kent completed this course, spending 2 hours a week on it and found the course difficulty to be easy.

    Overall, the class is pretty simple, and far easier than the other classes in this specialization. Really, the hardest part of this course was finding the complete instructions for the final project. The first part was found easily enough, but the other parts of the project are not obvious.
