Text Retrieval and Search Engines

University of Illinois at Urbana-Champaign via Coursera

Details

Provider: Coursera
Pricing: Free Online Course (Audit)
Languages: English
Certificate: Paid Certificate Available
Duration & workload: 1 day 6 hours 49 minutes
Sessions: On-Demand
Subtitles: Arabic, French, Portuguese, Italian, German, Russian, English, Spanish, Korean, Thai, Indonesian, Kazakh, Hindi, Swedish, Greek, Chinese, Ukrainian, Japanese, Polish, Dutch, Turkish, Hungarian, Bengali, Pashto, Urdu, Azerbaijani, Farsi

Overview

Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. Text data are unique in that they are usually generated directly by humans rather than by a computer system or sensors, and are thus especially valuable for discovering knowledge about people’s opinions and preferences, in addition to many other kinds of knowledge that we encode in text. This course will cover search engine technologies, which play an important role in any data mining application involving text data, for two reasons. First, while the raw data may be large for any particular problem, often only a relatively small subset of the data is relevant, and a search engine is an essential tool for quickly discovering that subset in a large text collection. Second, search engines help analysts interpret any patterns discovered in the data by allowing them to examine the relevant original text data to make sense of each discovered pattern. You will learn the basic concepts, principles, and major techniques in text retrieval, the underlying science of search engines.

Syllabus

Orientation

You will become familiar with the course, your classmates, and our learning environment. The orientation will also help you obtain the technical skills required for the course.

Week 1

During this week's lessons, you will learn about natural language processing techniques, which are the foundation for all kinds of text-processing applications, the concept of a retrieval model, and the basic idea of the vector space model.

Week 2

In this week's lessons, you will learn how the vector space model works in detail, the major heuristics used in designing a retrieval function for ranking documents with respect to a query, and how to implement an information retrieval system (i.e., a search engine), including how to build an inverted index and how to score documents quickly for a query.
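To make the inverted-index idea concrete, here is a minimal Python sketch. It assumes naive whitespace tokenization and plain TF-IDF weighting (raw term frequency times a smoothed log IDF). The retrieval functions discussed in the course add further heuristics such as TF transformation and document length normalization, so treat this as an illustration only; the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to its postings: {doc_id: term frequency in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        # Naive tokenization (an assumption): lowercase + whitespace split.
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def score(query, index, num_docs):
    """Rank documents by a plain TF-IDF dot product.

    Only the postings lists of the query terms are read, so documents that
    share no term with the query are never touched.
    """
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term)  # .get avoids inserting unseen terms
        if not postings:
            continue
        # Smoothed IDF: rarer terms get larger weights.
        idf = math.log((num_docs + 1) / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = ["the cat sat on the mat", "the dog sat", "cats and dogs"]
index = build_inverted_index(docs)
print(score("cat sat", index, len(docs)))  # doc 0 first: it matches both terms
```

Scoring term by term over postings lists is what keeps query evaluation fast: the cost depends on the postings of the query terms, not on the size of the whole collection.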

Week 3

In this week's lessons, you will learn how to evaluate an information retrieval system (a search engine). Topics include the basic measures for evaluating a set of retrieved results; the major measures for evaluating a ranked list, such as average precision (AP) and normalized discounted cumulative gain (nDCG); and practical issues in evaluation, such as statistical significance testing and pooling.
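The two ranked-list measures named above can be sketched as follows. This sketch assumes binary relevance for AP and graded gains for nDCG. It uses one common DCG variant, in which the first gain is undiscounted and later gains are divided by log2 of the rank; other variants exist. It also assumes the ideal ranking can be formed by sorting the same list of gains.

```python
import math


def average_precision(ranked, relevant):
    """AP for one query with binary relevance.

    Sums precision@k at each rank k holding a relevant document, then divides
    by the total number of relevant documents, so relevant documents that were
    never retrieved contribute zero.
    """
    hits, precision_sum = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant) if relevant else 0.0


def ndcg_at_k(gains, k):
    """nDCG@k for graded relevance gains listed in ranked order."""
    def dcg(gs):
        # Rank 1 is undiscounted; rank i >= 2 is divided by log2(i).
        return sum(g if i == 1 else g / math.log2(i)
                   for i, g in enumerate(gs[:k], start=1))

    ideal = dcg(sorted(gains, reverse=True))  # best possible ordering
    return dcg(gains) / ideal if ideal > 0 else 0.0


print(average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"}))  # (1/1 + 2/3) / 3
print(ndcg_at_k([1, 2, 3], k=3))  # below 1: the best document is ranked last
```

Dividing AP by the total number of relevant documents, rather than by the number retrieved, is what makes it sensitive to recall as well as precision.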

Week 4

In this week's lessons, you will learn about probabilistic retrieval models and statistical language models, in particular the details of the query likelihood retrieval function with two specific smoothing methods, and how the query likelihood retrieval function is connected with the retrieval heuristics used in the vector space model.
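The listing does not name the two smoothing methods. Assuming they are the widely used Jelinek-Mercer (linear interpolation) and Dirichlet prior methods, the query likelihood score log p(q|d) can be sketched as below. Documents are pre-tokenized lists of words, and the collection model is estimated from all documents concatenated. The parameter values are illustrative, not recommended settings.

```python
import math
from collections import Counter


def query_likelihood(query, doc, collection, method="dirichlet", lam=0.1, mu=2000.0):
    """Return log p(query | doc) under a smoothed unigram document model.

    query, doc, collection: lists of tokens; collection is all documents
    concatenated and serves as the background (reference) language model.
    """
    d_counts, c_counts = Counter(doc), Counter(collection)
    d_len, c_len = len(doc), len(collection)
    score = 0.0
    for w in query:
        p_c = c_counts[w] / c_len  # p(w | collection)
        if p_c == 0:
            continue  # simplifying assumption: ignore words unseen in the collection
        if method == "jm":
            # Jelinek-Mercer: fixed linear interpolation with the background model.
            p = (1 - lam) * d_counts[w] / d_len + lam * p_c
        elif method == "dirichlet":
            # Dirichlet prior: mu pseudo-counts distributed according to p(w | collection).
            p = (d_counts[w] + mu * p_c) / (d_len + mu)
        else:
            raise ValueError(f"unknown smoothing method: {method}")
        score += math.log(p)
    return score


docs = [["text", "retrieval", "search"], ["text", "mining", "clustering"]]
collection = [w for d in docs for w in d]
for d in docs:
    print(query_likelihood(["retrieval"], d, collection, method="jm"),
          query_likelihood(["retrieval"], d, collection, method="dirichlet", mu=10.0))
```

Smoothing gives a document that lacks a query word a small non-zero probability for it, so one missing word lowers the score instead of driving the likelihood to zero.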

Week 5

In this week's lessons, you will learn about feedback techniques in information retrieval, including the Rocchio feedback method for the vector space model and a mixture model for feedback with language models. You will also learn how web search engines work, including web crawling, web indexing, and how links between web pages can be leveraged to score web pages.
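Here is a minimal sketch of Rocchio feedback, with queries and documents represented as sparse term-weight dictionaries. The update moves the query vector toward the centroid of the relevant documents and away from the centroid of the non-relevant ones. The alpha, beta, and gamma values are illustrative assumptions. Dropping terms whose weight falls to zero or below is one common choice, not the only one.

```python
from collections import defaultdict


def rocchio(query_vec, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Return q' = alpha*q + beta*centroid(rel) - gamma*centroid(nonrel).

    Vectors are sparse {term: weight} dicts. Terms whose weight ends up
    non-positive are dropped (one common choice).
    """
    new_q = defaultdict(float)
    for term, w in query_vec.items():
        new_q[term] += alpha * w
    for docs, coef in ((rel_docs, beta), (nonrel_docs, -gamma)):
        for d in docs:
            for term, w in d.items():
                # Dividing by len(docs) accumulates the centroid term by term.
                new_q[term] += coef * w / len(docs)
    return {term: w for term, w in new_q.items() if w > 0}


query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "car": 2.0}, {"jaguar": 1.0, "engine": 2.0}]
nonrelevant = [{"jaguar": 1.0, "cat": 2.0}]
print(rocchio(query, relevant, nonrelevant))  # "cat" is pushed below zero and dropped
```

The expanded query picks up "car" and "engine" from the relevant documents. This term expansion is the main way feedback improves recall.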

Week 6

In this week's lessons, you will learn how machine learning can be used to combine multiple scoring factors to optimize the ranking of documents in web search (i.e., learning to rank), and you will learn techniques used in recommender systems (also called filtering systems), including content-based recommendation/filtering and collaborative filtering. You will also have a chance to review the entire course.
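To illustrate collaborative filtering, here is a sketch of a memory-based (user-based) predictor. A user's rating for an item is predicted as their mean rating plus a similarity-weighted average of other users' mean-centred ratings for that item. The similarity measure is cosine over mean-centred co-rated items, which makes it Pearson-like. Both the similarity measure and the nested-dict data layout are assumptions for this sketch.

```python
import math


def predict_rating(ratings, user, item):
    """User-based CF prediction. ratings: {user: {item: rating}}."""
    def mean(u):
        return sum(ratings[u].values()) / len(ratings[u])

    def similarity(u, v):
        # Cosine similarity of mean-centred ratings on co-rated items.
        common = ratings[u].keys() & ratings[v].keys()
        mu_u, mu_v = mean(u), mean(v)
        num = sum((ratings[u][i] - mu_u) * (ratings[v][i] - mu_v) for i in common)
        den_u = math.sqrt(sum((ratings[u][i] - mu_u) ** 2 for i in common))
        den_v = math.sqrt(sum((ratings[v][i] - mu_v) ** 2 for i in common))
        return num / (den_u * den_v) if den_u and den_v else 0.0

    # Only users who have rated the target item can contribute.
    neighbours = [v for v in ratings if v != user and item in ratings[v]]
    weights = {v: similarity(user, v) for v in neighbours}
    norm = sum(abs(w) for w in weights.values())
    if norm == 0:
        return mean(user)  # no usable neighbours: fall back to the user's mean
    adjustment = sum(w * (ratings[v][item] - mean(v)) for v, w in weights.items())
    return mean(user) + adjustment / norm


ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 3, "d": 5},
    "carol": {"a": 1, "b": 5, "c": 2, "d": 1},
}
print(predict_rating(ratings, "alice", "d"))
```

Carol's tastes run opposite to Alice's, so her negative weight turns her low rating of "d" into evidence in its favour. The prediction is not clipped to the rating scale; with this toy data it exceeds 5.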

Taught by

ChengXiang Zhai

Reviews

3.2 rating, based on 13 Class Central reviews

4.5 rating at Coursera based on 942 ratings

Gregory J Hamel (Life Is Study) @greg

Text Retrieval and Search Engines is the second course in Coursera's new data mining specialization offered by the University of Illinois at Urbana-Champaign. The course covers a variety of topics in text data mining and natural language processing, including text retrieval, query ranking and evaluation methods, and the basics of recommender systems. Grading is based entirely on 4 weekly quizzes of 10 multiple choice questions each. You only get 1 attempt on the quizzes.

The weekly content in Text Retrieval and Search Engines consists of around 10 video lectures that range from 5 to 20 minutes, followed by a short 10-question quiz. If that sounds like a lot of lecture per question, it is, and there are no in-lecture quizzes to reinforce concepts as you go along. The lectures themselves are definitely a step up from the first course in the specialization, Pattern Discovery in Data Mining. The professor isn't hard to understand this time around, and he explains concepts well enough to grasp them without having to re-watch videos. As with many of Coursera's other 4-week specializations, however, lectures sometimes turn into information dumps where the professor ends up reading off slides. The course does have a C++ programming assignment, which was nice to see.

Text Retrieval and Search Engines is a decent course that is worth a look if you are interested in text data mining and search engines. Although the lectures are lackluster, they contain some good information. If you're planning on getting a verified certificate, it is a good idea to try the practice quizzes before submitting the real one.

I give this course 2.75 out of 5 stars: Fair.
Marianne Cardwell

I've taken a number of courses on Coursera and have thoroughly enjoyed some of them, but it's clear that the quality varies. I was very disappointed in this course. Having applied to the University of Illinois' Master of Computer Science - Data Science, I thought it'd be a good idea to take some of their Coursera courses to get a sense of the quality of their education. I probably should have taken their classes first and then applied, saving me the trouble. If this is the type of instruction I can expect in the Masters program, I think I'll save myself the $19k in tuition.

The problems I have with this course are as follows:

- The quizzes for weeks 1 and 4 do not cover the material learned during those weeks. I've pointed it out on their forum and others have pointed it out in their reviews of the course on Coursera. This has not been fixed so they're not maintaining the course.
- I wanted to do the programming exercises to *really* learn something, not just go through the motions. I tried installing the required software, MeTA, on two different Windows computers (W7 & W10), and it wouldn't install on either one. I was not the only one with this problem, but I never found a solution. I then got a Linux VM and installed it successfully there, only to be unable to install the UofI code required for the first assignment. Again, I was not the only one with this problem. For both issues, I posted on the forum. None of the "moderators" or instructors ever responded.
- The instructor was difficult to understand at first. Once you've listened to him for a bit, it gets easier though, so it was only a problem for me during the first couple of videos.
- I thought the instructor took too much time explaining some of the obvious things and too little time explaining the more complex things. More examples would have been very helpful.

I would not recommend paying to take this course as the quizzes aren't particularly useful and you most likely won't be able to get the programming assignments to work. I think this course is emblematic of one of the issues I see on Coursera: too much of a reliance on "peers" to help you.
Anonymous

I was initially excited for this course, as it seemed a good dive into unstructured text data. But now I'd say: *skip this course*. I think the instructor is okay and presents the material well enough for you to get a decent grasp of it.

The reason I'd say skip this course is that the exercises are pretty bad. The class is only graded on quizzes, and the optional programming assignments use an obscure text mining/analysis tool called MeTA, which is time-consuming to set up unless you're experienced in navigating the mess that open source C++ libraries are. Once you've set it up, it basically just runs you through a set of contrived steps that don't require much programming or critical thinking.

To ACTUALLY learn document ranking and text retrieval, you really should have to get your hands dirty in constructing code that will do this, preferably on real world data or a very interesting test data set. And this course does not offer anything near this. I will complete this course, but only because I paid for it.

Thus I don't think most students will get more than a surface-level glance at text retrieval, search engine construction, and document retrieval from this course.

And thus I'd say skip this course and find a better one. And to the instructors I'd say: add new programming assignments that require students to implement their own systems. Step-by-step handholding is no way to learn. Also, use tools that are more universal in the data science world.
Anonymous

Great class with a nice mix of theoretical and practical lessons. There was a competition at the end of the course which pushed us to come up with new ideas.
Anonymous

Precise and clear explanation of the concepts. This course focuses completely on text retrieval concepts, with a strong introduction to what text retrieval is and the challenges it faces, and it gives insight into various models and improvements in this field. Therefore, this course is mostly for people who are specifically interested in information retrieval.
Lien Block

The course is not very organised and even though they share a lot of information, it's not really very useful for someone who wants to get his/her hands dirty and really learn NLP/Text retrieval.

(+ Instructor is sometimes very hard to understand)
Kristina Šekrst

I'd encourage more programming assignments dealing with NLP, a somewhat smaller focus on C++, and more R/Python support. It was a fun experience, and I hope that the theoretical approach will slowly turn into a combination of theory and practice.
Anonymous

It's not complete, but it's a good starting point for anyone who wants to learn more about information retrieval. Great course. I recommend it.