Machine Learning is a first-class ticket to the most exciting careers in data analysis today. As data sources proliferate along with the computing power to process them, going straight to the data is one of the most straightforward ways to quickly gain insights and make predictions.
Machine learning brings together computer science and statistics to harness that predictive power. It’s a must-have skill for all aspiring data analysts and data scientists, or anyone else who wants to wrestle all that raw data into refined trends and predictions.
This is a class that will teach you the end-to-end process of investigating data through a machine learning lens. It will teach you how to extract and identify useful features that best represent your data, a few of the most important machine learning algorithms, and how to evaluate the performance of your machine learning algorithms.
This course is also a part of our Data Analyst Nanodegree.
Why Take This Course?
In this course, you’ll learn by doing! We’ll bring machine learning to life by showing you fascinating use cases and tackling interesting real-world problems like self-driving cars. For your final project you’ll mine the email inboxes and financial data of Enron to identify persons of interest in one of the greatest corporate fraud cases in American history.
When you finish this introductory course, you’ll be able to analyze data using machine learning techniques, and you’ll also be prepared to take our Data Analyst Nanodegree. We’ll get you started on your machine learning journey by teaching you how to use helpful tools, such as pre-written algorithms and libraries, to answer interesting questions.
You’ll learn how to start with a question and/or a dataset, and use machine learning to turn them into insights.
Lessons 1-4: Supervised Classification
Naive Bayes: We jump in headfirst, learning perhaps the world’s greatest algorithm for classifying text.
Support Vector Machines (SVMs): One of the top 10 algorithms in machine learning, and a must-try for many classification tasks. What makes it special? The ability to generate new features independently and on the fly.
Decision Trees: Extremely straightforward, often just as accurate as an SVM but (usually) way faster. The launch point for more sophisticated methods, like random forests and boosting.
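To make the Naive Bayes item above concrete, here is a minimal sketch of a text classifier written with the Python standard library only. It is not the course's own code (the course works with scikit-learn); the toy messages, the labels, and the function names train_naive_bayes and predict are made up for illustration, and add-one (Laplace) smoothing is an assumed choice.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and words per class (hypothetical helper)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-probability for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Start from the log prior of the class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs above zero.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data, not taken from the Enron corpus.
docs = [
    "meeting at noon about budget",
    "win free money now",
    "budget report attached",
    "free prize claim now",
]
labels = ["work", "spam", "work", "spam"]

model = train_naive_bayes(docs, labels)
prediction = predict(model, "claim your free money")
print(prediction)
```

The "naive" part is the assumption that words contribute independently, which is why the score is just a sum of per-word log-probabilities.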
Lesson 5: Datasets and Questions
Behind any great machine learning project is a great dataset that the algorithm can learn from. We were inspired by a treasure trove of email and financial data from the Enron corporation, which would normally be strictly confidential but became public when the company went bankrupt in a blizzard of fraud. Follow our lead as we wrestle this dataset into a machine-learning-ready format, in anticipation of trying to predict cases of fraud.
Lessons 6-7: Regressions and Outliers
Regressions are some of the most widely used machine learning algorithms, and rightly share prominence with classification. What’s a fast way to make mistakes in regression, though? Have troublesome outliers in your data. We’ll tackle how to identify and clean away those pesky data points.
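One common way to clean outliers, sketched here under our own assumptions rather than taken from the course materials, is to fit a line, drop the points with the largest residuals, and fit again. The helper names fit_line and drop_worst_residuals and the toy data are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def drop_worst_residuals(points, slope, intercept, n_drop=1):
    """Return the points minus the n_drop with the largest absolute residual."""
    by_error = sorted(
        points, key=lambda p: abs(p[1] - (slope * p[0] + intercept))
    )
    return by_error[: len(points) - n_drop]


# Toy data on the line y = 2x + 1, with one deliberately corrupted point.
points = [(x, 2 * x + 1) for x in range(10)]
points[9] = (9, 100.0)

slope_before, intercept_before = fit_line(points)
cleaned = drop_worst_residuals(points, slope_before, intercept_before, n_drop=1)
slope_after, intercept_after = fit_line(cleaned)

print("before cleaning:", slope_before, intercept_before)
print("after cleaning: ", slope_after, intercept_after)
```

A single bad point pulls the first fit far from the true slope of 2; after it is removed, the second fit recovers the underlying line.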
Lesson 8: Unsupervised Learning
K-Means Clustering: The flagship algorithm when you don’t have labeled data to work with, and a quick method for pattern-searching when approaching a dataset for the first time.
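The two alternating steps of k-means (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched in plain Python. This is an illustrative sketch, not the course's implementation: the points, the starting centroids, and the fixed iteration count are assumptions chosen to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, centroids, n_iterations=10):
    """Run a fixed number of assign/update rounds from the given centroids."""
    clusters = []
    for _ in range(n_iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            mean_point(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up unlabeled points forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

Real implementations usually pick starting centroids at random and stop when assignments no longer change; fixed starting points are used here only so the result is reproducible.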
Lessons 9-12: Features, Features, Features
Feature Creation: Taking your human intuition about the world and turning it into data that a computer can use.
Feature Selection: Einstein said it best: make everything as simple as possible, and no simpler. In this case, that means identifying the most important features of your data.
Principal Component Analysis: A more sophisticated take on feature selection, and one of the crown jewels of unsupervised learning.
Feature Scaling: Simple tricks for making sure your data and your algorithm play nicely together.
Learning from Text: More information is in text than any other format, and there are some effective but simple tools for extracting that information.
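As an illustration of the feature scaling item above, one of the simplest tricks is min-max rescaling, which maps a feature onto the range 0 to 1. The sketch below is an assumed example, not code from the course; the function name min_max_scale and the salary values are hypothetical.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly so min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up feature on a large scale next to one on a small scale.
salaries = [50_000, 75_000, 100_000]
emails_sent = [10, 20, 50]

scaled_salaries = min_max_scale(salaries)
scaled_emails = min_max_scale(emails_sent)
print(scaled_salaries)  # [0.0, 0.5, 1.0]
print(scaled_emails)    # [0.0, 0.25, 1.0]
```

After rescaling, both features span the same range, so a distance-based algorithm no longer lets the feature with the larger raw numbers dominate.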
Lessons 13-14: Validation and Evaluation
Training/testing data split: How do you know that what you’re doing is working? You don’t, unless you validate. The train-test split is simple to do, and the gold standard for understanding your results.
Cross-validation: Take the training/testing split and put it on steroids. Validate your machine learning results like a pro.
Precision, recall, and F1 score: After all this data-driven work, quantify your results with metrics tailored to what is most important to you.
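The three metrics named above follow directly from counts of true positives, false positives, and false negatives. Here is a small sketch with made-up labels (1 standing for a "person of interest"); the function name precision_recall_f1 is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Precision: of everything flagged positive, how much was right?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly positive, how much was found?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

On imbalanced data such as a fraud investigation, where positives are rare, these metrics say more than plain accuracy, because a classifier that never flags anyone can still be "accurate" most of the time.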
Lesson 15: Wrapping it all Up
We take a step back and review what we’ve learned, and how it all fits together.
Mini-project at the end of each lesson
Final project: searching for signs of corporate fraud in Enron data
Gregory J Hamel (Life Is Study) completed this course and found the course difficulty to be easy.
Udacity's Intro to Machine Learning is an introduction to data analysis using Python and the sklearn package. The course consists of 15 lessons covering a wide range of machine learning topics including classification algorithms (Naive Bayes, decision trees and SVMs), linear regression, clustering, selecting and transforming features and validation. As a self-paced course, you can take however long you wish on each lesson; some take less than an hour, while others can take several hours depending on how long you work on the mini projects. Intro to Machine Learning requires basic programming and math skills.
Each lesson consists of a series of video segments and quizzes introducing a new topic, followed by a mini-project that gives you a chance to work with Python code implementing the topics you learned using scikit-learn. The course instructors, Katie and Sebastian (the guy who runs Udacity), do a good job explaining the material and keeping the course engaging, but they keep things simple. The quizzes, at times, are almost patronizingly easy. The mini-projects are a bit harder and contribute more to learning, although they occasionally lack adequate guidance and feedback to help students arrive at the expected output. The final project, and many of the mini-projects leading up to it, involve detecting persons of interest in the Enron scandal using a data set of emails sent by Enron employees. Interesting real-world data sets are always a plus.
Intro to Machine Learning is an accessible first course in machine learning that prioritizes breadth, high-level understanding and practical tools over depth and theory. You won't be an expert in any of the topics covered in this course by the time you're done, but you will be exposed to several major topics in machine learning and have a basic understanding of how they work. If you are interested in taking a similar course with many interesting mini-projects that uses the R programming language, try MIT's Analytics Edge on edX. Coursera's Machine Learning with Andrew Ng is a logical next step to dig deeper into machine learning algorithm design and implementation, while Caltech's Learning from Data on edX is a great course if you are interested in machine learning theory. Just be aware that both of these courses (particularly the Caltech course) require a stronger math background.
I give this course 4 out of 5 stars: Very Good.
Anonymous completed this course.
I started this course after having taken Andrew Ng's Coursera course. My goal was to apply the algorithms in Python and to become familiar with scikit-learn. I have completed about 70% of Udacity's Intro to ML, and I have to say I am very disappointed with the quality of the course, especially the quality of the videos and the quizzes. The mathematics is broken down to high school level, which is good for intuitive understanding, but in my opinion the level is far too low to learn anything serious, especially compared with Andrew Ng's course. The same applies to the quizzes. Let me illustrate this with an example. Assume they want you to calculate a*b/(c*d+e*f). Then there would be a quiz to calculate a*b, another quiz to calculate c*d, another quiz to calculate e*f, another quiz to calculate c*d+e*f, and then finally the whole thing. One has to go through 6 videos and 5 quizzes to calculate a simple fraction. The programming assignments are similar in quality. I have to say I didn't finish the course and therefore I cannot comment on the final project, which may be more serious. In conclusion, I cannot recommend this course to anyone who has a serious interest in learning something about ML. Invest your time better!!
Anonymous is taking this course right now.
It's so cringe-worthy, I couldn't get past the first couple of sections. This is supposed to be a foundation for people wanting to pay to take the data science nanodegree. It's as if they're just not taking it seriously at all. Painful to watch. Having completed and enjoyed the data analyst nanodegree, this has put me off further study with Udacity.
This course is video-based. All lectures are delivered well. However, take this course only if you have good listening skills.
Anonymous completed this course.
The math is sloppy and confusing. It often seems like he can't quite decide what he's asking for the probability of. Even worse, the expressions will suddenly change between slides with no explanation of why. In an attempt to simplify the math, they just muddle it up.
I'm not sure who the intended audience is for this course. It's conceptually too slow for anyone with sufficient background to do the math. Yet the math is almost unrecognizable to anyone who already knows it.
Unfortunately, this is a lot like other Udacity courses that try too hard to be fun and fail to be sufficiently substantive.
On a positive note, the Python examples are good.
Anonymous completed this course.
This is a practical course, and the instructors are nice. If you like Python, you will love this course. The mathematics is not strong here, but this is an intro to machine learning, and they do the best they can to expose us not only to machine learning algorithms but also to the scikit-learn API, which keeps you hooked on the course. Once you get the idea of an algorithm, you can go deeper into its mathematical aspects. One issue I faced was with the quizzes: quite often they are a little opaque.
Sergej Novik completed this course and found the course difficulty to be easy.
The course will teach you the very basics of sklearn but not much machine learning. Some core concepts are explained in an easy way. The quizzes, however, are sometimes next to idiotic. It would be better to drop half of them altogether.
I gave it 4 because I knew neither Python nor sklearn, and it was useful for me. If you know Python, then go somewhere else.
Anonymous completed this course.
I hated how the quiz questions weren't clearly written out (some missing information was said instead of shown visually). This stops you from skimming through the quizzes if you are already familiar with the concepts.