This six-week Capstone course of the Data Mining Specialization will allow you to apply the data mining algorithms and techniques you learned in the previous courses of the Specialization, including Pattern Discovery, Clustering, Text Retrieval, Text Mining, and Visualization, to solve interesting real-world data mining challenges. Specifically, you will work on a restaurant review data set from Yelp and use all the knowledge and skills you’ve learned from the previous courses to mine this data set to discover interesting and useful knowledge. The design of the Capstone emphasizes: 1) simulating the workflow of a data miner in a real job setting; 2) integrating different mining techniques covered in multiple individual courses; 3) experimenting with different ways to solve a problem to deepen your understanding of techniques; and 4) allowing you to propose and explore your own ideas creatively.
The goal of the Capstone project is to analyze and mine a large Yelp review data set to discover useful knowledge to help people make decisions in dining. The project will include the following outputs:
Opinion visualization: explore and visualize the review content to understand what people have said in those reviews.
Cuisine map construction: mine the data set to understand the landscape of different types of cuisines and their similarities.
Discovery of popular dishes for a cuisine: mine the data set to discover the common/popular dishes of a particular cuisine.
Recommendation of restaurants to help people decide where to dine: mine the data set to rank restaurants for a specific dish and predict the hygiene condition of a restaurant.
From the perspective of users, a cuisine map can help them understand what cuisines are there and see the big picture of all kinds of cuisines and their relations. Once they decide what cuisine to try, they would be interested in knowing what the popular dishes of that cuisine are and decide what dishes to have. Finally, they will need to choose a restaurant. Thus, recommending restaurants based on a particular dish would be useful. Moreover, predicting the hygiene condition of a restaurant would also be helpful.
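As a rough illustration of the cuisine map idea, the sketch below compares cuisines by the cosine similarity of their review word counts. It is only one possible approach, not the method prescribed by the course; the function names and the toy reviews are made up for the example, and a real run on the Yelp data would need proper tokenization, stop-word removal, and probably TF-IDF weighting or topic models.

```python
import math
from collections import Counter


def term_counts(reviews):
    """Count whitespace-separated, lowercased tokens over all reviews of one cuisine."""
    counts = Counter()
    for text in reviews:
        counts.update(text.lower().split())
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cuisine_similarity_matrix(reviews_by_cuisine):
    """Return sorted cuisine names and the pairwise similarity matrix."""
    vectors = {c: term_counts(r) for c, r in reviews_by_cuisine.items()}
    names = sorted(vectors)
    matrix = [[cosine_similarity(vectors[a], vectors[b]) for b in names]
              for a in names]
    return names, matrix


# Hypothetical toy data standing in for reviews grouped by cuisine category.
toy_reviews = {
    "Italian": ["great pasta and pizza", "pizza with fresh basil"],
    "Pizza": ["best pizza in town", "thin crust pizza"],
    "Japanese": ["fresh sushi and ramen", "ramen broth was rich"],
}

names, matrix = cuisine_similarity_matrix(toy_reviews)
for name, row in zip(names, matrix):
    print(name, [round(x, 2) for x in row])
```

The resulting matrix can be drawn as a heatmap or fed into a clustering algorithm to group related cuisines.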
By working on these tasks, you will gain experience with a typical workflow in data mining that includes data preprocessing, data exploration, data analysis, improvement of analysis methods, and presentation of results. You will have an opportunity to combine multiple algorithms from different courses to complete a relatively complicated mining task and experiment with different ways to solve a problem to understand the best way to solve it. We will suggest specific approaches, but you are highly encouraged to explore your own ideas since open exploration is, by design, a goal of the Capstone.
You are required to submit a brief report for each task for peer grading. A final consolidated report is also required and will be peer-graded. The top 20 reports, as ranked by peer review, will then be judged by an Industry Expert Committee, which will select 10 of them as the Best Capstone Projects. The committee is composed of industry experts representing different companies.
The Capstone project will include several peer graded tasks and a final report.
Kristina Šekrst completed this course and found the course difficulty to be medium.
The course was a quick finish to an eight-month specialization, and it felt a bit rushed. Its major flaw is that every week consists of two tasks: evaluating 5 reports and writing a report yourself, which takes a lot of time, especially if you want to do meaningful peer reviews and understand the techniques and algorithms that your peers have used.
There were 6 tasks, based on Yelp datasets. The first was doing topic analysis; then clustering and cuisine maps - comparing similar cuisines, doing heatmaps; predicting and visualizing popular dishes and building a list - lots of text mining; building a small recommender system based on popular dishes and reviews; doing classification based on hygiene scores - whether a restaurant will pass the inspection.
The visualizations were a major part of the data mining reports, and d3 was the best tool for this, but we learned almost nothing about it in the last course, so it was up to you. Half of the tasks had a machine-graded part as well, where you had to submit a file with a dish list or a classification list, and to achieve a great score, you needed to combine lots of tools. I originally planned to do everything in R, but in the end I used R, C++, Python, Weka and voodoo magic.
The worst part of the course was its end: within 10 days we had to complete task 6 and the final report (often 40 pages long), and also evaluate our peers' task 6 submissions and five final reports of 10-40 pages each. This was extremely stressful.
I've learned a lot throughout the specialization because of forum interactions, and this was true for this course as well. Couldn't have done it without you guys. Instructors' involvement was non-existent, so I hope that these major issues will be fixed in the next iteration, and that the previous courses will have more practical assignments, rather than theoretical overviews. My major criticism is that the capstone is too short. Otherwise, it was a nice experience, but prepare yourself for lots of Googling and spending your nights on Stack Exchange. Three stars just because of the forums, which were self-contained.