Anyone who has gone through a traditional college program will tell you about that one class that changed everything. The one that, when someone asks you about your college days, is the first course that comes to mind. Whether it was a groundbreaking subject or an enlightening, innovative professor, you will look back fondly on it later in your professional career. That course, and that professor, came for me in Yaser S. Abu-Mostafa’s Learning From Data.
20 Foot View
Originally, I was interested in Andrew Ng’s flagship Machine Learning course via Coursera/Stanford, which had just ended. While waiting for the next iteration, I read about Caltech’s Learning From Data telecourse, which was already two weeks in. Since the Netflix recommendation engine was a hot topic on all of the tech blogs, I hastily signed up. I ordered the hardcover book one-day rush from Amazon, with visions of soon writing an algorithm to tell the masses what movies to watch. Truth be told, it was one of the most challenging courses I would ever take, and it shaped my future interests and learning.
“It was one of the most challenging courses I would ever take”
The original telecourse, which is still in use, was broadcast live from the lecture hall at Caltech in April and May of 2012. In the fall of 2013 it was added to the edX platform with the same material and lectures but a more liberal grading scheme. Many machine learning and big data courses are popping up across the MOOC providers, especially since the launch of Udacity’s Data Analyst Nanodegree. The glaring difference between Learning From Data and the rest is the detailed, intricate understanding it provides of the elements that make up machine learning models and algorithms. More importantly, absolutely nothing can or will replace the experience of sitting in on a graduate-level course such as this, which is something MOOCs have failed to replicate.
Yaser S. Abu-Mostafa
Yaser is a professor of Electrical Engineering and Computer Science at Caltech with expertise in machine learning and computational finance. He received his PhD in 1983, the same year he joined the Caltech faculty.
He founded NIPS, the premier international conference on machine learning, and has written many publications. Most recently, Yaser authored an article for Scientific American in 2012 titled “Machines That Think for Themselves,” discussing the history and current trajectory of self-aware systems, recommendation engines, and learning algorithms.
The quality of Yaser’s lectures, and the development of not only the course but the original platform, can best be summed up in a quote from an interview he gave KDnuggets in April of 2014:
“If you are thinking about doing an online course, do it very well or not at all. It will be a huge time sink, and it pays off only if it is truly distinguished.”
Yaser also has a sense of humor. I will leave it to you to take the course and discover it for yourself.
For the first iteration of the Caltech telecourse in 2012, I had a modest familiarity with Python and PHP and no math or statistics background beyond high school. I made it about halfway before flipping the table and giving up. When the course was offered again later in the year, my ambition had not wavered, but the results were nearly identical. I finally passed Learning From Data when it moved to the edX platform in 2013, after I had taken Udacity’s Introduction to Statistics and Programming a Robotic Car courses and become more proficient in Python. (Let’s leave out the part where Yaser eased the passing percentage by about thirty points; I’m sure that had nothing to do with it!)
To be successful you must have a good understanding of Statistics, Probability, Linear Algebra and some Calculus
To be successful you must have a good understanding of Statistics, Probability, Linear Algebra and some Calculus. The syllabus states that some programming language or platform will help with the homework. I would say that this is a vast understatement. Relative expertise in at least one object-oriented or functional language is essential. Think Python, Java, R, Matlab, Octave, Haskell, etc. Learning From Data is very heavy on theory. And I mean all of it. This was a huge stumbling block during my first run. I was expecting something like “BAM! This is how you implement Naive Bayes. Bang! This is how you do Logistic Regression.” I was ill-prepared to trudge through the miles of hypotheses, mathematical notation, and symbols to arrive at a proof for each machine learning concept that was introduced. The end result was magnificent, as I forced myself to memorize the notation provided so kindly in the glossary of the printed version of Learning From Data and to develop a bird’s-eye understanding of Calculus.
As LIONlab puts it:
“Yaser’s course is not about giving you pre-digested baby food, but about strengthening your teeth.”
Course Up Close
The beginning of the course jumps right into the Perceptron Learning Algorithm and then delves into supervised, unsupervised, and reinforcement learning. The first few sections are very heavy on probability distributions, with bin-sample problems you can generally solve without any programming. There are a few questions on the PLA that take just a little Python and help from Google. After that it becomes imperative to start building your own proofs of concept or to seek assistance in the forums for proven code samples. At no point does this course provide you with any guidance on solving the problems, and it remains completely language-agnostic. It truly expects you to know on your own which programming language you will use and how to attack each problem. This is a huge departure from Andrew Ng’s Machine Learning course, which walks you through the Octave necessary to solve some of the homework.
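To give a flavor of the kind of code those early PLA questions call for, here is a minimal sketch of the Perceptron Learning Algorithm in Python. The synthetic dataset (uniform points labeled by a random target line) is my own illustrative setup, not the course’s actual assignment data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linearly separable 2-D data: points in [-1, 1]^2 with a
# bias coordinate, labeled by a random target line.
X = np.c_[np.ones(100), rng.uniform(-1, 1, (100, 2))]
w_target = rng.uniform(-1, 1, 3)
y = np.sign(X @ w_target)

# PLA: pick any misclassified point and nudge the weights toward
# classifying it correctly; repeat until nothing is misclassified.
w = np.zeros(3)
for _ in range(100_000):
    mis = np.where(np.sign(X @ w) != y)[0]
    if mis.size == 0:
        break  # converged: every point classified correctly
    i = rng.choice(mis)
    w = w + y[i] * X[i]

print(int((np.sign(X @ w) != y).sum()))  # misclassified points remaining
```

The homework then typically asks things like how many updates convergence takes on average, which you answer by wrapping a loop like this in repeated trials.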
It does not become terribly clear until later on, but each week you are building a foundation and plugging ideas and proofs into this larger picture. At the end you should have several complete learning models at your disposal.
Each week you are building a foundation and plugging ideas and proofs into this larger picture
The first half of the lectures focuses on building a linear model: properly handling the probability distribution, error, and noise in the training examples that feed the learning algorithm, from which you inevitably arrive at your final hypothesis. The second half describes, at a very high level, a number of different learning models such as Neural Networks, Support Vector Machines, and Radial Basis Functions.
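The linear model half culminates in the satisfying fact that linear regression learns in a single step via the pseudo-inverse. A small sketch, using made-up noisy data of my own rather than anything from the course:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical noisy linear data: targets are a true weight vector
# applied to the inputs, plus a little Gaussian noise.
X = np.c_[np.ones(50), rng.uniform(-1, 1, (50, 2))]
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true + rng.normal(0, 0.05, 50)

# One-step learning: w = X^dagger y minimizes in-sample squared error.
w = np.linalg.pinv(X) @ y

print(np.round(w, 2))  # close to the true weights
```

The same weights then serve as a warm start for classification algorithms later in the course, which is one of the ways the weekly pieces plug into the larger picture.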
With the free license to the downloadable LIONoso data visualization tools…you have the ability to import data, analyze it and view or modify your output to gain a concrete understanding of what you are learning.
One giant difference between the telecourse and the edX version is the addition of a free license to the downloadable LIONoso data visualization tools provided by the Italian data science company LIONlab. With this partnership you have the ability to import data, analyze it, and view or modify your output to gain a concrete understanding of what you are learning. In the telecourse version, students built their own visualizations and posted them in the book forums for feedback. This removes a huge hurdle for a number of students, some of whom would likely have dropped the course otherwise.
While Learning From Data was on the Caltech telecourse platform it was far more challenging, and if my memory serves, required a passing grade of 70% or higher. There were weekly quizzes that typically consisted of 10 questions, plus a final exam. When the class moved to the edX platform they eased up on the requirements and allowed a 40% or higher on the homework to pass, dropping the lowest two scores. Unlike any MOOC I have taken before or since, you were allowed only one submission attempt. This increased the stress level to gigantic proportions. It left you sitting in front of the screen staring at your answers for hours, hoping they were correct. Even at 40% it was still VERY challenging.
Yaser personally responded to most forum questions at a phenomenal rate.
The syllabus states a time commitment of approximately ten to twenty hours a week. I can assure you that this was no overstatement. There were two sets of lectures per week that clocked in at around one hour each, plus Q&A sessions. After extensive note-taking, I was able to get through the material in about five hours. The homework, and the research needed to attempt it, took another three to five hours. During the original telecourse, Yaser personally responded to most forum questions at a phenomenal rate. Never have I seen a professor as engaged in a distance/remote course. Students would get detailed responses, typically within an hour of posting. I am fairly certain Yaser responded faster and more effectively than any professor I can recall from my local community college.
It is akin to a really great movie, every time you watch it you discover more things visible only at a more granular level.
Clear your schedule and take no other classes alongside it, if for no other reason than that it deserves that amount of attention. You must be a motivated learner who can create their own direction. The biggest challenge, other than the lack of hand-holding, is the pace of a true graduate-level machine learning class. It assumes you have the prerequisites and makes no effort to give the student any background the way a traditional MOOC would. Thanks entirely to Learning From Data, I have been able to truly understand and execute the material in my recently purchased books “Thoughtful Machine Learning” and “Building Machine Learning Systems with Python.” I have already decided to take this class again at some point, armed with the new knowledge and clarity obtained since the fall 2013 iteration. It is akin to a really great movie: every time you watch it you discover more things, visible only at a more granular level.