Overview
The Data Mining specialization is intended for data science professionals and domain experts who want to learn the fundamental concepts and core techniques for discovering patterns in large-scale data sets. This specialization consists of three courses: (1) Data Mining Pipeline, which introduces the key steps of data understanding, data preprocessing, data warehousing, data modeling, and interpretation/evaluation; (2) Data Mining Methods, which covers core techniques for frequent pattern analysis, classification, clustering, and outlier detection; and (3) Data Mining Project, which offers guidance and hands-on experience in designing and implementing a real-world data mining project.
Data Mining can be taken for academic credit as part of CU Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU Boulder’s departments of Applied Mathematics, Computer Science, Information Science, and others. With performance-based admissions and no application process, the MS-DS is ideal for individuals with a broad range of undergraduate education and/or professional experience in computer science, information science, mathematics, and statistics. Learn more about the MS-DS program at https://www.coursera.org/degrees/master-of-science-data-science-boulder.
Specialization logo image courtesy of Diego Gonzaga, available on Unsplash: https://unsplash.com/photos/QG93DR4I0NE
Syllabus
Course 1: Data Mining Pipeline
- Offered by University of Colorado Boulder. This course introduces the key steps involved in the data mining pipeline, including data ... Enroll for free.
Course 2: Data Mining Methods
- Offered by University of Colorado Boulder. This course covers the core techniques used in data mining, including frequent pattern analysis, ... Enroll for free.
Course 3: Data Mining Project
- Offered by University of Colorado Boulder. Data Mining Project offers step-by-step guidance and hands-on experience of designing and ... Enroll for free.
Courses
-
Note: You should complete all the other courses in this Specialization before beginning this course.
This six-week Project course of the Data Mining Specialization lets you apply the data mining algorithms and techniques learned in the previous courses in the Specialization, including Pattern Discovery, Clustering, Text Retrieval, Text Mining, and Visualization, to solve interesting real-world data mining challenges. Specifically, you will work on a restaurant review data set from Yelp and use the knowledge and skills you’ve learned from the previous courses to mine this data set to discover interesting and useful knowledge.
The design of the Project emphasizes:
1. Simulating the workflow of a data miner in a real job setting.
2. Integrating different mining techniques covered in multiple individual courses.
3. Experimenting with different ways to solve a problem to deepen your understanding of techniques.
4. Allowing you to propose and explore your own ideas creatively.
The goal of the Project is to analyze and mine a large Yelp review data set to discover useful knowledge that helps people make dining decisions. The project will include the following outputs:
1. Opinion visualization: explore and visualize the review content to understand what people have said in those reviews.
2. Cuisine map construction: mine the data set to understand the landscape of different types of cuisines and their similarities.
3. Discovery of popular dishes for a cuisine: mine the data set to discover the common/popular dishes of a particular cuisine.
4. Recommendation of restaurants to help people decide where to dine: mine the data set to rank restaurants for a specific dish and predict the hygiene condition of a restaurant.
From the perspective of users, a cuisine map can help them understand what cuisines exist and see the big picture of all kinds of cuisines and their relations. Once they decide what cuisine to try, they will want to know what the popular dishes of that cuisine are and decide which dishes to have. Finally, they will need to choose a restaurant. Thus, recommending restaurants based on a particular dish would be useful. Moreover, predicting the hygiene condition of a restaurant would also be helpful.
By working on these tasks, you will gain experience with a typical workflow in data mining that includes data preprocessing, data exploration, data analysis, improvement of analysis methods, and presentation of results. You will have an opportunity to combine multiple algorithms from different courses to complete a relatively complicated mining task and to experiment with different ways of solving a problem to find the one that works best. We will suggest specific approaches, but you are highly encouraged to explore your own ideas since open exploration is, by design, a goal of the Project.
You are required to submit a brief report for each of the tasks for peer grading. A final consolidated report is also required, which will be peer-graded.
-
This course introduces the key steps involved in the data mining pipeline, including data understanding, data preprocessing, data warehousing, data modeling, interpretation and evaluation, and real-world applications.
This course can be taken for academic credit as part of CU Boulder’s MS in Data Science or MS in Computer Science degrees offered on the Coursera platform. These fully accredited graduate degrees offer targeted courses, short 8-week sessions, and pay-as-you-go tuition. Admission is based on performance in three preliminary courses, not academic history. CU degrees on Coursera are ideal for recent graduates or working professionals. Learn more:
MS in Data Science: https://www.coursera.org/degrees/master-of-science-data-science-boulder
MS in Computer Science: https://coursera.org/degrees/ms-computer-science-boulder
Course logo image courtesy of Francesco Ungaro, available on Unsplash: https://unsplash.com/photos/C89G61oKDDA
-
This course covers the core techniques used in data mining, including frequent pattern analysis, classification, clustering, and outlier analysis, as well as mining complex data and research frontiers in the data mining field.
This course can be taken for academic credit as part of CU Boulder’s MS in Data Science or MS in Computer Science degrees offered on the Coursera platform. These fully accredited graduate degrees offer targeted courses, short 8-week sessions, and pay-as-you-go tuition. Admission is based on performance in three preliminary courses, not academic history. CU degrees on Coursera are ideal for recent graduates or working professionals. Learn more:
MS in Data Science: https://www.coursera.org/degrees/master-of-science-data-science-boulder
MS in Computer Science: https://coursera.org/degrees/ms-computer-science-boulder
Course logo image courtesy of Lachlan Cormie, available on Unsplash: https://unsplash.com/photos/jbJp18srifE
Taught by
Qin (Christine) Lv