Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding-edge applications like self-driving cars and personalized medicine. In the age of ‘big data’, with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.
This statistics and data analysis course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Spark, a cluster computing system well-suited for large-scale machine learning tasks, and its packages spark.ml and spark.mllib. You will implement distributed algorithms for fundamental statistical models (linear regression, logistic regression, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.
Martijn Onderwater completed this course, spent 6 hours a week on it, and found the course difficulty to be medium.
I really enjoyed taking this course! Initially, I was a bit annoyed with the various registrations that need to be filled out prior to the course (at edX, at Piazza, at Databricks, and in one of the notebooks). But after this, everything was smooth sailing. The teachers explain the concepts well and they speak clearly. During the course, we touched on various subjects that I am interested in: machine learning, Spark, MapReduce, Python, and MLlib. The lab exercises were at the right level for me (I have a solid background in math and software development) and took me about six hours each. I only got stuck at points where I had not read the instructions properly, and the active forum helped me through that. All in all, I can recommend this course to others!
--and of course: many thanks to the people at Berkeley for providing the class--