The best online intro to programming course for people breaking into the data science field is the University of Toronto’s “Learn to Program” series on Coursera. “LTP1: The Fundamentals” and “LTP2: Crafting Quality Code” have a near-perfect weighted average rating of 4.71/5 stars over 284 reviews, and they have a great mix of content difficulty and scope for the beginner data scientist. This free, Python-based introduction to programming sets itself apart from the other 20+ courses considered.
If you already have some familiarity with programming and don’t mind a syllabus that has a notable skew towards games and interactive applications, I would also recommend Rice University’s “An Introduction to Interactive Programming in Python (Part 1 and Part 2)” on Coursera. With 6,000+ reviews and the highest weighted average rating of 4.93/5 stars, this popular course is noted for its engaging videos, challenging quizzes, and enjoyable mini projects. It is slightly more difficult and focuses less on the fundamentals and more on topics that aren’t applicable in data science than our #1 pick. These courses are also part of the 7-course Principles in Computing Specialization on Coursera.
CodeSkulptor: Browser-based Python programming environment used for Rice University’s MOOCs.
If you are set on R
If you are set on an introduction to programming course in R, we recommend DataCamp’s series of R courses: “Introduction to R,” “Intermediate R,” “Intermediate R – Practice,” and “Writing Functions in R.” Though the latter three come at a price point of $25/month, DataCamp is best in category for covering the programming fundamentals and R-specific topics, which is reflected in its average rating of 4.29/5 stars over 14 reviews.
Coming from a non-coding background, I started creating my own data science master’s degree using online courses almost a year ago. I scoured the introduction to programming landscape, have taken a few courses, and have audited portions of many others. I know the options and what content is needed for those targeting a data analyst or data scientist role. I also spent 20+ hours trying to find every single online introduction to programming course offered as of August 2016, extracting key bits of information from their syllabi and reviews, and compiling their ratings.
Dhawal Shah, the founder and continuing builder of Class Central, has kept the closest eye on online courses, arguably, of anyone in the world since 2011. The Class Central database has thousands of course ratings and reviews, and the homepages of most courses have hundreds to thousands more.
About Class Central Career Guides
Class Central Career Guides are recommendations for the best online courses and MOOCs. They have one goal: to enable you to quickly figure out which courses can help you learn new skills and advance your career. Our editorial picks are thoroughly researched using reviews written by Class Central users, as well as data from other sources and our own subjective analysis.
These guides are updated frequently to always reflect the best in online education.
Drop us a note at firstname.lastname@example.org if you have any feedback or requests for particular career guides — it will help us prioritize. Also, reach out to us if you want to help us create more of these career guides. We are looking for contributors!
About the Data Science Career Guide
Class Central’s Data Science Career Guide is a six-piece series that recommends the best MOOCs for launching yourself into the data science industry. The first five pieces recommend the best courses for several data science core competencies (programming, statistics, the data science process, data visualization, and machine learning). The final piece is a summary of those courses and the best MOOCs for other key topics such as data wrangling, databases, and even software engineering.
Here are the parts of the series that have been published so far:
The Best Intro to Programming Courses for Data Science (this one)
P.S. If you are looking for a complete list of Data Science MOOCs, you can find them on Class Central’s Data Science and Big Data subject page.
Intro to Programming vs. Intro to Computer Science
Programming is not computer science and vice versa. There is a difference of which beginners may not be acutely aware. Borrowing this answer from Programmers Stack Exchange:
“Computer science is the study of what computers [can] do; programming is the practice of making computers do things.”
The course we are looking for introduces programming and optionally touches on relevant aspects of computer science that would benefit a new programmer in terms of awareness. Many of the courses considered, you’ll notice, do indeed have a computer science portion. None of the courses, however, are strictly computer science courses, which is why something like Harvard’s CS50x on edX is excluded. Most entering the data field won’t need detailed computation theory and topics like computer networking. This guide is focused on the practical skills that the vast majority of data scientists use.
How We Picked Courses to Consider
Each course must fit four criteria:
It introduces programming and, optionally, computer science. See above.
The language of instruction is Python or R. These are by far the two most popular programming languages used in data science.
It must be an interactive online course, so no books or text-based tutorials. Regarding the latter, Codecademy’s video-less and text editor-based courses would qualify, but strict text tutorials like the ones from R tutorial would not. Though books are viable ways to learn programming, Python, and R, this guide focuses on courses.
It must be a decent length: at least ten hours in total for estimated completion.
We believe we covered every notable course that fits the above criteria. Since there are seemingly hundreds of courses on Udemy in Python and R, we chose to consider only the most reviewed and highest rated ones. There is a chance we missed something, however. Please let us know if you think that is the case.
How We Tested
We compiled average rating and number of reviews from Class Central and other review sites to calculate a weighted average rating for each course. If a series had multiple courses, like Rice University’s Part 1 and Part 2, the weighted average rating across all courses was calculated. We also read text reviews and used this feedback to supplement the numerical ratings.
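As a rough illustration of the calculation described above, here is a minimal Python sketch. It assumes each source's average rating is weighted by its number of reviews, which is the natural reading of "weighted average" but is not spelled out in this guide. The function name and the sample numbers are hypothetical.

```python
def weighted_average_rating(sources):
    """Combine per-site ratings into one figure, weighting each by its review count.

    sources is a list of (average_rating, review_count) pairs.
    """
    total_reviews = sum(count for _, count in sources)
    if total_reviews == 0:
        raise ValueError("at least one review is needed")
    return sum(rating * count for rating, count in sources) / total_reviews


# Made-up numbers for illustration only, not taken from any course in this guide.
example_sources = [(4.5, 100), (4.0, 300)]
print(weighted_average_rating(example_sources))  # 4.125
```

For a multi-course series, the same function can be applied to the combined list of (rating, review count) pairs from every course and every review site.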
A subjective syllabus judgment was made — see the “why you should trust us” section. We were looking for three main characteristics:
Coverage of the fundamentals of programming.
Coverage of more advanced, but useful, topics in programming. (E.g. several courses choose to not cover object-oriented programming. We believe this is a key topic, though not a deal-breaker, hence these courses only being docked marks and not excluded from consideration.)
How much of the syllabus is relevant to data science?
“Learn to Program: The Fundamentals” (LTP1) and “Learn to Program: Crafting Quality Code” (LTP2) from the University of Toronto (via Coursera) introduce the fundamental building blocks of programming using Python. We believe the series has the best combination of high ratings (second-highest weighted average rating of 4.71/5 stars across 284 reviews), coverage of the fundamentals of programming, coverage of more advanced programming topics, and scope applicability to data science. The reviews are consistently stellar. We believe every bit of the curriculum is useful for those continuing on to data analysis or data science, which cannot be said for our number two pick.
Jennifer Campbell and Paul Gries, two associate professors in the University of Toronto’s department of computer science (which is regarded as one of the best in the world), teach the series. The self-paced, self-contained Coursera courses match the material in their book, “Practical Programming: An Introduction to Computer Science Using Python 3.” LTP1 covers 40–50% of the book and LTP2 covers another 40%. The 10–20% not covered is not particularly useful for data science, which helped their case for being our pick.
The professors kindly and promptly sent me detailed course syllabi upon request, which were difficult to find online prior to the course’s official restart in September.
This course provides an introduction to computer programming intended for people with no programming experience. It covers the basics of programming in Python including elementary data types (numeric types, strings, lists, dictionaries, and files), control flow, functions, objects, methods, fields, and mutability.
Installing Python, IDLE, mathematical expressions, variables, assignment statement, calling and defining functions, syntax, and semantic errors.
Strings, input/output, function reuse, function design recipe, and docstrings.
Booleans, import, namespaces, and if statements.
For loops and fancy string manipulation.
While loops, lists, and mutability.
For loops over indices, parallel lists and strings, and files.
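To give a feel for a few of the syllabus items above (for loops over indices, parallel lists, and mutability), here is a small Python sketch. It is not taken from the course materials. The function names and sample data are hypothetical and chosen only for illustration.

```python
def pair_up(names, scores):
    """Return 'name: score' strings built from two parallel lists.

    Parallel lists keep related data at the same index in each list.
    """
    result = []
    for i in range(len(names)):  # loop over indices, not items
        result.append(names[i] + ": " + str(scores[i]))
    return result


def scale_in_place(values, factor):
    """Multiply every item of values by factor, modifying the list itself."""
    for i in range(len(values)):
        values[i] = values[i] * factor


scores = [1, 2, 3]
alias = scores                # both names refer to the same list object
scale_in_place(scores, 2)     # lists are mutable, so the change shows up via alias too
print(alias)                  # [2, 4, 6]
print(pair_up(["a", "b", "c"], scores))  # ['a: 2', 'b: 4', 'c: 6']
```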
You know the basics of programming in Python: elementary data types (numeric types, strings, lists, dictionaries, and files), control flow, functions, objects, methods, fields, and mutability. You need to be good at these in order to succeed in this course.
LTP: Crafting Quality Code covers the next steps: designing larger programs, testing your code so that you know it works, reading code in order to understand how efficient it is, and creating your own types.
Designing algorithms: how do you decide what to do in a function body? How do you figure out what functions to write in the first place?
Automated testing: doctest and unittest.
Analyzing code for speed — details of searching and sorting.
Creating new types: classes in Python.
Functions as arguments, default parameter values, and exceptions.
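For readers who have not met the two testing tools named in the syllabus, the sketch below shows what doctest and unittest look like in practice. Both are modules in Python's standard library. The function `count_vowels` and its test cases are hypothetical examples written for this guide, not course material.

```python
import doctest
import unittest


def count_vowels(text):
    """Return the number of vowels (a, e, i, o, u) in text.

    >>> count_vowels("data")
    2
    >>> count_vowels("rhythm")
    0
    """
    return sum(1 for ch in text.lower() if ch in "aeiou")


class TestCountVowels(unittest.TestCase):
    def test_mixed_case(self):
        self.assertEqual(count_vowels("PyThOn"), 1)

    def test_empty_string(self):
        self.assertEqual(count_vowels(""), 0)


def run_all_checks():
    """Run the docstring examples and the unittest cases; return True if unittest passes."""
    # doctest: execute the examples embedded in the docstring (prints only on failure).
    doctest.run_docstring_examples(count_vowels, {"count_vowels": count_vowels})
    # unittest: load and run the test case class without exiting the interpreter.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCountVowels)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("unittest passed:", run_all_checks())
```

The doctest examples double as documentation, while the unittest class keeps the checks separate from the function and scales better as a program grows.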
Associate professor Gries also provided the following commentary on the course structure:
“Each module has between about 45 minutes to a bit more than an hour of video. There are in-video quiz questions, which will bring the total time spent studying the videos to perhaps 2 hours.”
These videos are generally shorter than ten minutes each. He continued:
“In addition, we have one exercise (a dozen or two or so multiple choice and short-answer questions) per module, which should take an hour or two. There are three programming assignments in LTP1, each of which might take four to eight hours of work. There are two programming assignments in LTP2 of similar size.”
He emphasized that the estimate of 6–8 hours per week is a rough guess:
“Estimating time spent is incredibly student-dependent, so please take my estimates in that context. For example, someone who knows a bit of programming, perhaps in another programming language, might take half the time of someone completely new to programming. Sometimes someone will get stuck on a concept for a couple of hours, while they might breeze through on other concepts … That’s one of the reasons the self-paced format is so appealing to us.”
In total, the University of Toronto’s Learn to Program series runs an estimated 12 weeks at 6–8 hours per week, which is about standard for most online courses created by universities. If you prefer to binge-study your MOOCs, that’s 72–96 hours, which could feasibly be completed in two to three weeks, especially if you have a bit of programming experience.
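The arithmetic behind those figures is simple enough to check in a few lines of Python. This sketch uses only the numbers quoted above (12 weeks at 6 to 8 hours per week); the function names are hypothetical.

```python
WEEKS = 12
HOURS_PER_WEEK = (6, 8)  # low and high estimates


def total_hours(weeks, hours_per_week):
    """Return the (low, high) total hours for the whole series."""
    low, high = hours_per_week
    return weeks * low, weeks * high


def weekly_load(total, weeks):
    """Return the hours per week needed to finish total hours in the given weeks."""
    return total / weeks


low_total, high_total = total_hours(WEEKS, HOURS_PER_WEEK)
print(low_total, high_total)          # 72 96
print(weekly_load(low_total, 3))      # 24.0 hours/week: low estimate over three weeks
print(weekly_load(high_total, 2))     # 48.0 hours/week: high estimate over two weeks
```

In other words, finishing in two to three weeks means treating the series as somewhere between a part-time and a full-time job.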
“Jennifer and Paul are both world-class professors who hit a home run with their course. They have proven that a properly architected online class is a superior form of learning. Their video lectures were understandable, efficient, and relevant, and were not overdone or too long. The exercises and quizzes were challenging and effective. Through all of this they bridged the cyber world and physical world by making you feel as though you were sitting right in their office and they were giving you personal instruction in programming.” https://www.classcentral.com/mooc/385/coursera-learn-to-program-the-fundamentals#review-2818
If the Learn to Program series doesn’t catch your eye, our #2 pick is Rice University’s “An Introduction to Interactive Programming in Python” (Part 1 and Part 2) on Coursera. It has an insane weighted average rating of 4.93/5 stars across over 6,000 reviews (!). The courses and the instructors (John Greiner, Stephen Wong, Scott Rixner, and Joe Warren) are regarded as fun, quirky, organized, and exceptionally delivered (the latter for the courses, obviously). The materials are self-paced and free, and a paid certificate is available. The course must be purchased for $79 (USD) for access to graded materials.
The condensed course description and full syllabus are as follows:
“This two-part course is designed to help students with very little or no computing background learn the basics of building simple interactive applications … To make learning Python easy, we have developed a new browser-based programming environment that makes developing interactive applications in Python simple. These applications will involve windows whose contents are graphical and respond to buttons, the keyboard, and the mouse.
Recommended background: A knowledge of high school mathematics is required. While the class is designed for students with no prior programming experience, some beginning programmers have viewed the class as being fast-paced. For students interested in some light preparation prior to the start of class, we recommend a self-paced Python learning site such as codecademy.com.
Timeline: 5 weeks
Estimated time commitment: 7–10 hours per week
Week 0 — statements, expressions, variables
Understand the structure of this class, and explore Python as a calculator.
Week 1 — functions, logic, conditionals
Learn the basic constructs of Python programming, and create a program that plays a variant of Rock-Paper-Scissors.
Week 2 — event-driven programming, local/global variables
Learn the basics of event-driven programming, understand the difference between local and global variables, and create an interactive program that plays a simple guessing game.
Week 3 — canvas, drawing, timers
Create a canvas in Python, learn how to draw on the canvas, and create a digital stopwatch.
Week 4 — lists, keyboard input, the basics of modeling motion
Learn the basics of lists in Python, model moving objects in Python, and recreate the classic arcade game “Pong.”
Week 5 — mouse input, list methods, dictionaries
Read mouse input, learn about list methods and dictionaries, and draw images.
Week 6 — classes and object-oriented programming
Learn the basics of object-oriented programming in Python using classes, and work with tiled images.
Week 7 — basic game physics, sprites
Understand the math of acceleration and friction, work with sprites, and add sound to your game.
Week 8 — sets and animation
Learn about sets in Python, compute collisions between sprites, and animate sprites.
Why #1 Over #2
Despite its incredible popularity, Rice’s offering doesn’t meet our subjective testing criteria as well as the University of Toronto series, specifically criteria #1 and #3, as explained below.
1. Coverage of the fundamentals of programming.
Several reviews note that the fundamentals aren’t covered enough in the Rice course. The course description itself mentions that the course can be difficult for complete beginners.
3. How much of the syllabus is relevant to data science.
The syllabus has a notable skew towards designing games and interactive applications. Event-driven programming, drawing on canvases, motion modeling, game physics, and animation, all of which are core lessons in Rice’s course, don’t need to be in a data scientist’s toolbox.
Here are some reviews that voice our concerns, and justify Learn to Program being our #1 pick.
“I also took Introduction to Interactive Programming via @Coursera @ Rice. This Toronto class is a little bit better for absolute beginners and focuses a little bit more on fundamentals, like the title suggests. If you were going to take both classes, I would take this Fundamentals class [from Toronto] first as the Rice class can have a big workload for beginners who don’t have these fundamentals.” https://www.classcentral.com/mooc/385/coursera-learn-to-program-the-fundamentals#review-3321
The debate between choosing Python or R as your language of choice for data science is heated (exhibits A, B, and C). We won’t answer that question here.
We believe the best approach to learning programming for data science using online courses is to do it first through Python. Why? There is a lack of MOOC options that teach core programming principles and use R as the language of instruction. We found six such R courses that fit our testing criteria, compared to twenty-two Python-based courses. Most of the R courses didn’t receive great ratings and failed to meet most of our subjective testing criteria.
One option stood out as particularly worthy, however: DataCamp’s series of R courses, “Introduction to R,” “Intermediate R,” “Intermediate R – Practice,” and “Writing Functions in R.” DataCamp’s series is one of two R-based introductions to programming that sufficiently cover the fundamentals of programming and more advanced programming topics in one package. The courses have a very respectable weighted average rating of 4.29/5 stars over 14 reviews. The first course is free. The latter three come at a price point of $25/month, but could very feasibly be completed in one month.
Another option for R would be to take a Python-based introduction to programming course to cover the fundamentals of programming, and then pick up R syntax with an R basics course. This is what I did, but I did it with Udacity’s Data Analysis with R. It worked well for me.
Our #1 and #2 picks had a 4.71- and 4.93-star weighted average rating over 284 and 6,069 reviews, respectively. Let’s look at the other alternatives.
Python courses (descending weighted average ratings)
“Programming for Everybody (Getting Started with Python)” and “Python Data Structures” (University of Michigan/Coursera): another great option. It has a great teacher (Dr. Charles “Chuck” Severance), as well. This series came close to usurping our #1 pick because it matched it in rating and in most of the subjective criteria. This course is more gentle, however, with reviewers noting that it might not prepare you as well as other options. Dr. Chuck himself noted that this course is a bridge to more advanced programming courses: “I would suggest that after students complete my Python course, if they are interested in more programming, that they would take the Rice course.” We also felt that the reviews for our #1 pick were more enthusiastic. It has a 4.8-star weighted average rating over 4,800+ reviews.
Intro to Programming Nanodegree (Udacity): it has a notable focus on web development. It’s a great option for someone who doesn’t know what type of programming they want to do. It has a 4.4-star weighted average rating over 730 reviews.
R courses (descending weighted average ratings)
R Programming (Johns Hopkins University/Coursera): doesn’t sufficiently cover the basics of programming. Reviewers note that it is difficult, and not in a good way. It has a 4.04-star weighted average rating over 900+ reviews, despite a 2.5-star rating over 212 reviews on Class Central.
TryR (CodeSchool): it’s not long enough to fit testing criteria, and doesn’t sufficiently cover programming fundamentals. It has a 4-star weighted average rating over 260 reviews.
Programming with R for Data Science (Microsoft/edX): more of an introduction to the R language rather than programming. The course site states, “If you have some programming experience, and would like to learn more about R, then you’re at the right place.” It has a 3-star weighted average rating over 12 reviews.
Wrapping it Up
This is the first of a six-piece series that covers the best MOOCs for launching yourself into the data science field. The other pieces cover several other data science core competencies: statistics, data analysis, data visualization, and machine learning.
The final piece will be a summary of those courses, and the best MOOCs for other key topics such as data wrangling, databases, and even software engineering.
If you’re looking for a complete list of Data Science MOOCs, you can find them on Class Central’s Data Science and Big Data subject page.
If you have suggestions for courses I missed, let me know in the comments!
David Venturi created a personalized data science master’s curriculum for himself using MOOCs. He has a dual degree in Chemical Engineering and Economics, and especially enjoys math, stats, and coding. He’s a huge baseball and hockey fan, and writes about the latter with a focus on analytics.