The best online introduction to data science course is Kirill Eremenko’s “Data Science A-Z.” The course, which has a 4.5-star weighted average rating over 3,071 reviews, is among the highest rated and most reviewed courses of the ones considered. It is the clear winner in terms of breadth and depth of coverage of the data science process. The instructor’s natural teaching ability is frequently praised by reviewers.
Udacity’s Intro to Data Analysis covers the data science process cohesively using Python, though it lacks a bit in the modeling aspect. It has a 5-star rating over one review. It is a relatively new offering that is part of Udacity’s popular Data Analyst Nanodegree. The videos are well-produced and the instructor (Caroline Buckey) is clear and personable.
Data Science Fundamentals is a four-course series provided by Big Data University, which is an IBM initiative. The series covers the full data science process and introduces Python, R, and several other open-source tools. The courses have tremendous production value. Unfortunately, they have no review data on the major review sites that were used for this analysis.
I started creating my own data science master’s degree using online courses almost a year ago. I have taken many data science-related courses and audited portions of many more. I know the options out there, and what skills are needed for learners preparing for a data analyst or data scientist role.
For this guide, I spent 10+ hours trying to identify every online intro to data science course offered as of January 2017, extracting key bits of information from their syllabi and reviews, and compiling their ratings. For this task, I turned to none other than the open source Class Central community and its database of thousands of course ratings and reviews.
Since 2011, Class Central founder Dhawal Shah has kept a closer eye on online courses than arguably anyone else in the world. Dhawal personally helped me assemble this list of resources.
About the Data Science Career Guide
Class Central’s Data Science Career Guide is a six-piece series that recommends the best MOOCs for launching yourself into the data science industry. The first five pieces recommend the best courses for several data science core competencies (programming, statistics, the data science process, data visualization, and machine learning). The final piece is a summary of those courses and the best MOOCs for other key topics such as data wrangling, databases, and even software engineering.
Here are the parts of the series that have been published so far:
P.S. If you are looking for a complete list of Data Science MOOCs, you can find them on Class Central’s Data Science and Big Data subject page.
How We Picked Courses to Consider
Each course must fit three criteria:
It must teach the data science process. More on that soon.
It must be on-demand or offered every few months.
It must be an interactive online course, so no books or read-only tutorials. Though these are viable ways to learn, this guide focuses on courses.
We believe we covered every notable course that fits the above criteria. Since there are seemingly hundreds of courses on Udemy, we chose to consider the most-reviewed and highest-rated ones only. There’s always a chance that we missed something, though. So please let us know in the comments section if we left a good course out.
How We Tested
We compiled average rating and number of reviews from Class Central and other review sites to calculate a weighted average rating for each course. We read text reviews and used this feedback to supplement the numerical ratings.
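As a rough illustration of that calculation, here is a minimal sketch that assumes each site's average rating is weighted by its number of reviews. The guide doesn't spell out the exact formula, so the weighting scheme, the function name, and the numbers below are all assumptions for illustration only.

```python
def weighted_average_rating(sources):
    """Combine (average_rating, review_count) pairs from several review
    sites into one rating, weighting each site by its review count.

    Assumption: weighting by review count; the guide's exact method
    is not stated.
    """
    total_reviews = sum(count for _, count in sources)
    if total_reviews == 0:
        return None  # no review data available for this course
    weighted_sum = sum(rating * count for rating, count in sources)
    return weighted_sum / total_reviews


# Hypothetical per-site numbers, not taken from any real course
example_sources = [(4.6, 3000), (4.0, 50), (5.0, 10)]
print(round(weighted_average_rating(example_sources), 2))
```

Under this scheme, a site with thousands of reviews dominates the result, while a single 5-star review on another site barely moves it.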
We made subjective syllabus judgment calls based on two factors:
Coverage of the data science process. Does the course brush over or skip certain subjects? Does it cover certain subjects in too much detail? See the next section for what this process entails.
Usage of common data science tools. Is the course taught using popular programming languages like Python and/or R? These aren’t necessary, but they are helpful in most cases, so slight preference is given to these courses.
What is the Data Science Process?
What is data science? What does a data scientist do? These are the types of fundamental questions that an intro to data science course should answer. The following infographic from Harvard professors Joe Blitzstein and Hanspeter Pfister outlines a typical data science process, which will help us answer these questions.
Our goal with this introduction to data science course is to become familiar with the data science process. We don’t want in-depth coverage of specific aspects of the process, hence the “intro to” portion of the title. For each aspect, the ideal course explains key concepts within the framework of the process, introduces common tools, and provides a few examples (preferably hands-on).
We are only looking for an introduction. This guide therefore won’t include full specializations or programs like Johns Hopkins University’s Data Science Specialization on Coursera or Udacity’s Data Analyst Nanodegree. These compilations of courses defeat the purpose of this series: to find the best individual courses for each subject to comprise a data science education. The next guides in the series will cover each aspect of the data science process in detail.
Basic Coding, Stats, and Probability Experience Required
Several courses listed below require basic programming, statistics, and probability experience. This requirement is understandable given that the new content is reasonably advanced and that these subjects often have several courses dedicated to them.
This experience can be acquired through our recommendations in the first two articles (programming, statistics) in this Data Science Career Guide.
Kirill Eremenko’s “Data Science A-Z” on Udemy is the clear winner in terms of breadth and depth of coverage of the data science process of the 20+ courses that qualified. It has a 4.5-star weighted average rating over 3,071 reviews, which places it among the highest rated and most reviewed courses of the ones considered. It outlines the full process, makes it clear that it can be iterative, and provides real-life examples. Reviewers love the instructor’s delivery and the organization of the content.
Though it doesn’t check our “usage of common data science tools” box, the non-Python/R tool choices (gretl, Tableau, Excel) are used effectively in context. Eremenko mentions the following when explaining the gretl choice (gretl is a statistical software package), though it applies to all of the tools he uses (emphasis mine):
In gretl, we will be able to do the same modeling just like in R and Python but we won’t have to code. That’s the big deal here. Some of you may already know R very well, but some may not know it at all. My goal is to show you how to build a robust model and give you a framework that you can apply in any tool you choose. gretl will help us avoid getting bogged down in our coding.
Listed below are the details for each course, including their description, syllabus, and prominent reviews.
Instructor: Kirill Eremenko, SuperDataScience Team
Cost: Varies depending on Udemy discounts, which are frequent. Can be purchased for as little as $10.
Estimated timeline: 21 hours
This course will give you a full overview of the Data Science journey. Upon completing this course you will know:
How to clean and prepare your data for analysis
How to perform basic visualization of your data
How to model your data
How to curve-fit your data
And finally, how to present your findings and wow the audience
This course will give you so much practical exercises that real world will seem like a piece of cake when you graduate this class. This course has homework exercises that are so thought provoking and challenging that you will want to cry… But you won’t give up! You will crush it. In this course you will develop a good understanding of the following tools:
This course has pre-planned pathways. Using these pathways you can navigate the course and combine sections into YOUR OWN journey that will get you the skills that YOU need.
Or you can do the whole course and set yourself up for an incredible career in Data Science. The choice is yours. Join the class and start learning today!
12: Building a robust geodemographic segmentation model
13: Assessing your model
14: Drawing insights from your model
15: Model maintenance
16: Part 3: Data Preparation
17: Business Intelligence (BI) Tools
18: ETL Phase 1: Data Wrangling before the Load
19: ETL Phase 2: Step-by-step guide to uploading data using SSIS
20: Handling errors during ETL (Phases 1 & 2)
21: SQL Programming for Data Science
22: ETL Phase 3: Data Wrangling after the load
23: Handling errors during ETL (Phase 3)
24: Part 4: Communication
25: Working with people
26: Presenting for Data Scientists
27: Homework Solutions
28: Bonus Lectures
“Kirill is the best teacher I’ve found online. He uses real life examples and explains common problems so that you get a deeper understanding of the coursework. He also provides a lot of insight as to what it means to be a data scientist from working with insufficient data all the way to presenting your work to C-class management. I highly recommend this course for beginner students to intermediate data analysts!”
“This course has been absolutely amazing. Very valuable actually being *shown* the whole process of data science while working through it yourself.”
“Outstanding content delivered in a user-friendly way. Kirill has a natural ability to teach. Everything is explained to the exact level of detail you would need with no assumptions made of previous knowledge. Highly recommended.”
Udacity’s Intro to Data Analysis is a relatively new offering that is part of Udacity’s popular Data Analyst Nanodegree. It covers the data science process clearly and cohesively using Python, though it lacks a bit in the modeling aspect. It has a 5-star rating over one review.
The videos are well-produced and the instructor (Caroline Buckey) is clear and personable. Lots of programming quizzes enforce the concepts learned in the videos. Students will leave the course confident in their new and/or improved NumPy and Pandas skills (these are popular Python libraries). The final project, which is graded and reviewed in the Nanodegree but not in the free individual course, can be a nice add to a portfolio.
Listed below are the details for the specialization, including each course’s description and syllabus.
Data Science Fundamentals is a four-course series provided by IBM’s Big Data University. It includes courses titled Data Science 101, Data Science Methodology, Data Science Hands-on with Open Source Tools, and R 101. It covers the full data science process and introduces Python, R, and several other open-source tools. Unfortunately, it has no review data on the major review sites that we used for this analysis, so we can’t recommend it over the above two options yet.
The courses have tremendous production value. The 5-hour “R 101” course at the end isn’t necessary for the purpose of this guide.
Listed below are the details for the specialization, including each course’s description and syllabus.
Estimated timeline: 13–18 hours, depending on whether you take the “R 101” course at the end, which isn’t necessary for the purpose of this guide.
Dust off your lab-coat and stretch out your fingers and get ready for the journey of a lifetime that will have you see the everyday through a new lens. Looking at mundane events becomes interesting from the speed of your windshield wipers wiping off the rain to the rate of plant growth in ditches along highways under different conditions. As the study that leads into all things pertinent to humans in present, this path is a must for all who have even the slightest interest in this field.
This learning path currently consists of one course that introduces you to Data Science from a practitioner point of view, to courses that discuss topics such as data compilation, preparation and modeling throughout the life-cycle of data science from basic concepts and methodologies to advanced algorithms. It also discusses how to get some practical knowledge with open source tools, and introduces you to one of the most popular programming languages used by data scientists: R.
“Data Science: The Sexiest Job in the 21st Century”
Module 2: What do data science people do?
A day in the life of a data science person
R versus Python?
Data science tools and technology
Module 3: Data Science in Business
How should companies get started in data science?
Tips for recruiting data science people
“The Final Deliverable”
Module 4: Use Cases for Data Science
Applications for data science
“The Report Structure”
Module 5: Data Science People
Things data science people say
“What Makes Someone a Data Scientist?”
Course 2: Data Science Methodology
Module 1: From Problem to Approach
Business Understanding – Concepts & Case Study
Analytic Approach – Concepts & Case Study
Module 2: From Requirements to Collection
Data Requirements – Concepts & Case Study
Data Collection – Concepts & Case Study
Module 3: From Understanding to Preparation
Data Understanding – Concepts & Case Study
Data Preparation – Concepts & Case Study
Module 4: From Modeling to Evaluation
Modeling – Concepts & Case Study
Evaluation – Concepts & Case Study
Module 5: From Deployment to Feedback
Deployment – Concepts & Case Study
Feedback – Concepts & Case Study
Course 3: Data Science Hands-on with Open Source Tools
Module 1: Introducing Data Scientist Workbench
What is Data Scientist Workbench?
DSWB Account features
Creating a DSWB account
Managing data within My Data
Preparing data with OpenRefine
Module 2: Introducing Jupyter Notebooks
What are Jupyter notebooks?
Getting started with Jupyter
Data and Notebooks in Jupyter
Sharing your Jupyter Notebooks and data
Apache Spark in Jupyter Notebooks
Module 3: Introducing Zeppelin Notebooks
What are Zeppelin Notebooks?
Zeppelin for Scala
Getting started with Zeppelin
Managing your Interpreters in Zeppelin
Apache Spark in Zeppelin Notebooks
Module 4: Introducing RStudio IDE
What is RStudio IDE?
Uploading files, Installing Packages and loading libraries in RStudio IDE
Getting started with RStudio IDE
RStudio Environment and History
Apache Spark in RStudio IDE
Module 5: Introducing Seahorse
What is Seahorse?
A Glimpse of Seahorse’s Features
Getting started with Seahorse on DSWB
Creating and uploading Seahorse Workflows on DSWB
Exporting and Cloning the Seahorse Examples on DSWB
Course 4: R 101
Module 1: R basics
Math, Variables, and Strings
Vectors and Factors
Module 2: Data structures in R
Arrays & Matrices
Module 3: R programming fundamentals
Conditions and loops
Functions in R
Objects and Classes
Module 4: Working with data in R
Reading CSV and Excel Files
Reading text files
Writing and saving data objects to file in R
Module 5: Strings and Dates in R
String operations in R
Dates in R
Our #1 pick had a weighted average rating of 4.5 out of 5 stars over 3,071 reviews. Let’s look at the other alternatives, sorted by descending rating. Below you’ll find several R-focused courses, if you are set on an introduction in that language.
Python for Data Science and Machine Learning Bootcamp (Jose Portilla/Udemy): Full process coverage with a tool-heavy focus (Python). Less process-driven and more of a very detailed intro to Python. Amazing course, though not ideal for the scope of this guide. It, like Jose’s R course below, can double as both intros to Python/R and intros to data science. 21.5 hours of content. It has a 4.7-star weighted average rating over 1,644 reviews. Cost varies depending on Udemy discounts, which are frequent.
Data Science and Machine Learning Bootcamp with R (Jose Portilla/Udemy): Full process coverage with a tool-heavy focus (R). Less process-driven and more of a very detailed intro to R. Amazing course, though not ideal for the scope of this guide. It, like Jose’s Python course above, can double as both intros to Python/R and intros to data science. 18 hours of content. It has a 4.6-star weighted average rating over 847 reviews. Cost varies depending on Udemy discounts, which are frequent.
Data Science and Machine Learning with Python — Hands On! (Frank Kane/Udemy): Partial process coverage. Focuses on statistics and machine learning. Decent length (nine hours of content). Uses Python. It has a 4.5-star weighted average rating over 3,104 reviews. Cost varies depending on Udemy discounts, which are frequent.
Introduction to Data Science (Data Hawk Tech/Udemy): Full process coverage, though limited depth of coverage. Quite short (three hours of content). Briefly covers both R and Python. It has a 4.4-star weighted average rating over 62 reviews. Cost varies depending on Udemy discounts, which are frequent.
Applied Data Science: An Introduction (Syracuse University/Open Education by Blackboard): Full process coverage, though not evenly spread. Heavily focuses on basic statistics and R. Too applied and not enough process focus for the purpose of this guide. Online course experience feels disjointed. It has a 4.33-star weighted average rating over 6 reviews. Free.
Introduction To Data Science (Nina Zumel & John Mount/Udemy): Partial process coverage only, though good depth in the data preparation and modeling aspects. Okay length (six hours of content). Uses R. It has a 4.3-star weighted average rating over 101 reviews. Cost varies depending on Udemy discounts, which are frequent.
Applied Data Science with Python (V2 Maestros/Udemy): Full process coverage with good depth of coverage for each aspect of the process. Decent length (8.5 hours of content). Uses Python. It has a 4.3-star weighted average rating over 92 reviews. Cost varies depending on Udemy discounts, which are frequent.
Want to be a Data Scientist? (V2 Maestros/Udemy): Full process coverage, though limited depth of coverage. Quite short (3 hours of content). Limited tool coverage. It has a 4.3-star weighted average rating over 790 reviews. Cost varies depending on Udemy discounts, which are frequent.
Data to Insight: an Introduction to Data Analysis (University of Auckland/FutureLearn): Breadth of coverage unclear. Claims to focus on data exploration, discovery, and visualization. Not offered on demand. 24 hours of content (three hours per week over eight weeks). It has a 4-star weighted average rating over 2 reviews. Free with paid certificate available.
Data Science Orientation (Microsoft/edX): Partial process coverage (lacks modeling aspect). Uses Excel, which makes sense given it is a Microsoft-branded course. 12–24 hours of content (two-four hours per week over six weeks). It has a 3.95-star weighted average rating over 40 reviews. Free with Verified Certificate available for $25.
Data Science Essentials (Microsoft/edX): Full process coverage with good depth of coverage for each aspect. Covers R, Python, and Azure ML (a Microsoft machine learning platform). Several 1-star reviews citing tool choice (Azure ML) and the instructor’s poor delivery. 18–24 hours of content (three-four hours per week over six weeks). It has a 3.81-star weighted average rating over 67 reviews. Free with Verified Certificate available for $49.
Applied Data Science with R (V2 Maestros/Udemy): The R companion to V2 Maestros’ Python course above. Full process coverage with good depth of coverage for each aspect of the process. Decent length (11 hours of content). Uses R. It has a 3.8-star weighted average rating over 212 reviews. Cost varies depending on Udemy discounts, which are frequent.
Intro to Data Science (Udacity): Partial process coverage, though good depth for the topics covered. Lacks the exploration aspect, though Udacity has a great, full course on exploratory data analysis (EDA). Claims to be 48 hours in length (six hours per week over eight weeks), but is shorter in my experience. Some reviews think the set-up to the advanced content is lacking. Feels disorganized. Uses Python. It has a 3.61-star weighted average rating over 18 reviews. Free.
Introduction to Data Science in Python (University of Michigan/Coursera): Partial process coverage. No modeling or visualization, though courses #2 and #3 in the Applied Data Science with Python Specialization cover these aspects. Taking all three courses would be too in depth for the purpose of this guide. Uses Python. Four weeks in length. It has a 3.6-star weighted average rating over 15 reviews. Free and paid options available.
Data-driven Decision Making (PwC/Coursera): Partial coverage (lacks modeling) with a business focus. Introduces many tools, including R, Python, Excel, SAS, and Tableau. Four weeks in length. It has a 3.5-star weighted average rating over 2 reviews. Free and paid options available.
A Crash Course in Data Science (Johns Hopkins University/Coursera): An extremely brief overview of the full process. Too brief for the purpose of this series. Two hours in length. It has a 3.4-star weighted average rating over 19 reviews. Free and paid options available.
The Data Scientist’s Toolbox (Johns Hopkins University/Coursera): An extremely brief overview of the full process. More of a set-up course for Johns Hopkins University’s Data Science Specialization. Claims to have 4–16 hours of content (one-four hours per week over four weeks), though one reviewer noted it could be completed in two hours. It has a 3.22-star weighted average rating over 182 reviews. Free and paid options available.
Data Management and Visualization (Wesleyan University/Coursera): Partial process coverage (lacks modeling). Four weeks in length. Good production value. Uses Python and SAS. It has a 2.67-star weighted average rating over 6 reviews. Free and paid options available.
The following courses had no reviews as of January 2017.
CS109 Data Science (Harvard University): Full process coverage in great depth (probably too in depth for the purpose of this series). A full 12-week undergraduate course. Course navigation is difficult since the course is not designed for online consumption. Actual Harvard lectures are filmed. The above data science process infographic originates from this course. Uses Python. No review data. Free.
Introduction to Data Analytics for Business (University of Colorado Boulder/Coursera): Partial process coverage (lacks modeling and visualization aspects) with a focus on business. The data science process is disguised as the “Information-Action Value chain” in their lectures. Four weeks in length. Describes several tools, though only covers SQL in any depth. No review data. Free and paid options available.
Introduction to Data Science (Lynda): Full process coverage, though limited depth of coverage. Quite short (three hours of content). Introduces both R and Python. No review data. Cost depends on Lynda subscription.
About Class Central Career Guides
Class Central Career Guides are recommendations for the best online courses and MOOCs. They have one goal: to enable you to quickly figure out which courses can help you learn new skills and advance your career. Our editorial picks are thoroughly researched using reviews written by Class Central users, as well as data from other sources and our own subjective analysis.
These guides are updated frequently to always reflect the best in online education.
Drop us a note at firstname.lastname@example.org if you have any feedback or requests for particular career guides — it will help us prioritize. Also, reach out to us if you want to help us create more of these career guides. We are looking for contributors!
David Venturi created a personalized data science master’s curriculum for himself using MOOCs. He has a dual degree in Chemical Engineering and Economics, and especially enjoys math, stats, and coding. He’s a huge baseball and hockey fan, and writes about the latter with a focus on analytics.