This course will cover the major techniques for mining and analyzing text data to discover interesting patterns, extract useful knowledge, and support decision making, with an emphasis on statistical approaches that can be applied generally to arbitrary text data in any natural language with little or no human effort.
Detailed analysis of text data requires understanding of natural language text, which is known to be a difficult task for computers. However, a number of statistical approaches have been shown to work well for the "shallow" but robust analysis of text data for pattern finding and knowledge discovery. You will learn the basic concepts, principles, and major algorithms in text mining and their potential applications.
-You will become familiar with the course, your classmates, and our learning environment. The orientation will also help you obtain the technical skills required for the course.
-During this module, you will learn about the overall course design; natural language processing techniques and text representation, which are the foundation of all kinds of text-mining applications; and word association mining, with a particular focus on one of the two basic forms of word association (i.e., paradigmatic relations).
-During this module, you will learn more about word association mining with a particular focus on mining the other basic form of word association (i.e., syntagmatic relations), and start learning topic analysis with a focus on techniques for mining one topic from text.
-During this module, you will learn topic analysis in depth, including mixture models and how they work, the Expectation-Maximization (EM) algorithm and how it can be used to estimate the parameters of a mixture model, the basic topic model Probabilistic Latent Semantic Analysis (PLSA), and how Latent Dirichlet Allocation (LDA) extends PLSA.
-During this module, you will learn text clustering, including the basic concepts, the main clustering techniques (both probabilistic and similarity-based approaches), and how to evaluate text clustering. You will also start learning text categorization, which is related to text clustering but uses predefined categories that can be viewed as predefined clusters.
-During this module, you will continue learning about various methods for text categorization, including several discriminative classifiers, and you will also learn sentiment analysis and opinion mining, including a detailed introduction to a particular technique for sentiment classification (i.e., ordinal regression).
-During this module, you will continue learning about sentiment analysis and opinion mining with a focus on Latent Aspect Rating Analysis (LARA), and you will learn about techniques for joint mining of text and non-text data, including contextual text mining techniques for analyzing topics in text in association with various kinds of context information such as time, location, authors, and sources of data. You will also see a summary of the entire course.
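The syllabus above lists techniques without showing any code, so the following is only an illustrative sketch, not course material: a minimal Python implementation of EM for one of the simplest models in this area, a single document generated by a fixed background word distribution mixed with one unknown topic. The function name `em_topic_with_background`, the fixed mixing weight `lam`, and the toy probabilities are assumptions chosen for illustration.

```python
from collections import Counter


def em_topic_with_background(doc_tokens, background, lam=0.5, iterations=50):
    """Estimate p(w | theta_d) for one document under the assumed mixture
        p(w) = lam * p(w | theta_B) + (1 - lam) * p(w | theta_d),
    where the background distribution theta_B and the weight lam (0 <= lam < 1)
    are fixed and only the topic theta_d is learned."""
    counts = Counter(doc_tokens)
    vocab = list(counts)
    # Start from a uniform topic distribution over the document's words.
    topic = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iterations):
        # E-step: posterior probability that an occurrence of w came from the topic.
        z = {}
        for w in vocab:
            from_topic = (1 - lam) * topic[w]
            from_background = lam * background.get(w, 0.0)
            z[w] = from_topic / (from_topic + from_background)
        # M-step: re-normalize the word counts attributed to the topic.
        weighted = {w: counts[w] * z[w] for w in vocab}
        total = sum(weighted.values())
        topic = {w: weighted[w] / total for w in vocab}
    return topic


if __name__ == "__main__":
    # Toy background model in which function words dominate (made-up numbers).
    background = {"the": 0.5, "a": 0.3, "text": 0.1, "mining": 0.05, "clustering": 0.05}
    doc = "the the the the a text text mining mining clustering".split()
    topic = em_topic_with_background(doc, background)
    for word, prob in sorted(topic.items(), key=lambda kv: -kv[1]):
        print(f"{word:12s} {prob:.3f}")
```

Because the background model already explains frequent function words, EM shifts the topic's probability mass toward content words such as "mining" and "text"; with these toy numbers, "a" is pushed toward zero even though it occurs in the document.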
Text Mining and Analytics is the fourth course in the Data Mining specialization offered by the University of Illinois at Urbana-Champaign through Coursera. Text Mining builds upon the second course in the specialization, Text Retrieval and Search Engines. Course topics include mining word relations, topic discovery, text clustering, text categorization and sentiment analysis. The course lists programming proficiency (especially in C++) and knowledge of probability and statistics as prerequisites. In keeping with the system established by other courses in the Data Mining specialization, grading is based entirely on four multiple-choice quizzes with 10 questions apiece. You only get one attempt at each quiz.
Text Mining and Analytics is information-packed. Each week has 2.5 to 4 hours of lecture content in video segments that generally range from 10 to 20 minutes. The video quality is satisfactory, but the explanations and content on the slides could be a bit clearer. Despite the long videos, there are no comprehension questions or exercises to interact with during or after lecture segments to reinforce learning. By the time you reach the quiz at the end of the unit, you may find yourself having to go back and review certain videos to answer the questions. There is an optional programming assignment.
Text Mining and Analytics covers many useful data mining topics, but it has too much lackluster video content for its own good. I can’t help but feel that a better course would have condensed the videos to cover the same topics in half the time, leaving room for more quizzes and exercises. This course could serve as useful reference material, but students watching straight through may find a lot of information going in one ear and out the other.
I give Text Mining and Analytics 2.5 out of 5 stars: Mediocre.
Martijn completed this course, spending 4 hours a week on it and found the course difficulty to be medium.
I started this course because I am interested in topic modeling. The first three weeks were the most useful for me and were sufficient to give me a good start. The rest of the course was more difficult to follow, mostly because the lecturer started giving highlights and skipping many details. This upped the pace, and I frequently had to pause the lectures to think about the content. For me, it would have been preferable to replace weeks 4-6 with a more in-depth treatment of an extension to topic mining. Also, the lecturer's pronunciation is often unclear, and I really needed the subtitles.
Having said that, I did learn a lot and am able to start trying topic modeling in practice.
Kristina completed this course and found the course difficulty to be medium.
I liked that I could find out about the newest algorithms and trends, but I'd like the ratio of theory to practice to be at least equal, since the course focuses too much on surveying everything there is. I've learned new concepts, but more R/Python examples would help.