This course will teach you efficient and scalable data labeling for ML and various business processes. The key is the crowdsourcing approach: splitting complex challenges into small tasks and distributing them among a vast cloud of performers.
You will get acquainted with crowdsourcing as a methodology, mastering the steps and techniques that ensure quality and stable performance. You will put all of these techniques into practice straight away: throughout the course, you'll design your own crowdsourcing project.
Introduction to crowdsourcing
We will start the course by discussing what crowdsourcing is and how it applies to Machine Learning. Through examples of large-scale data labeling processes, you will see how diverse and powerful crowdsourcing is. We will also go through the steps necessary to prepare a crowdsourcing project. This basic understanding will be developed in the following weeks, alongside your own crowdsourcing project. This week you will choose the project most relevant to you and draft its pipeline. Last but not least, you will meet a team of Yandex's Crowd Solutions Architects. They will give a short introduction to their crowdsourcing projects and share their experience designing efficient task pipelines.
Instructions and interfaces
This week we will dive into designing crowdsourcing projects. After a task has been decomposed into smaller pieces, it is time to create interfaces and guidelines. We will go through tips on performer-friendly interface design and learn how to compose guidelines that help performers along the way.
Week 2 is an important step in developing your own crowdsourcing project. Based on the pipeline from last week, you will create your project on a real crowdsourcing platform. Stepping into the performers' shoes, you will try to label some data and write instructions for the task. We recommend investing a decent amount of time into this week's assignments: it will contribute a lot to your final task of collecting labeled data.
Quality control
It's time to talk about ensuring data quality. This week we will discuss how to select and train performers and learn how to configure quality checks depending on task specifics. Most crowdsourcing platforms offer a wide range of quality control mechanisms, but it is important to choose those that are most applicable to your task.
You will also develop training and quality control for your own crowdsourcing project. And our Crowd Solutions Architects will share their experience setting up complex quality control.
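As a minimal illustration of one widely used quality control mechanism, the sketch below checks each performer's accuracy on control ("golden") tasks with known answers and flags those who fall below a threshold. The function names, data layout, and the 0.7 threshold are illustrative assumptions, not the API of any specific platform.

```python
# Hypothetical sketch: quality control via control (golden) tasks.
# A performer whose accuracy on tasks with known correct labels
# drops below a threshold is excluded from the project.

def control_task_accuracy(answers, golden):
    """Per-performer accuracy on control tasks.

    answers: list of (performer_id, task_id, label) tuples
    golden:  dict mapping task_id -> known correct label
    """
    correct, total = {}, {}
    for performer, task, label in answers:
        if task not in golden:
            continue  # skip regular (non-control) tasks
        total[performer] = total.get(performer, 0) + 1
        if label == golden[task]:
            correct[performer] = correct.get(performer, 0) + 1
    return {p: correct.get(p, 0) / total[p] for p in total}

def performers_to_ban(answers, golden, min_accuracy=0.7):
    """Performers whose control-task accuracy is below the threshold."""
    accuracy = control_task_accuracy(answers, golden)
    return {p for p, a in accuracy.items() if a < min_accuracy}
```

For example, if Alice answers both control tasks correctly and Bob answers only one of two, a 0.7 threshold would exclude Bob. Real platforms typically combine such checks with response-time limits and majority-agreement rules.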
Smart techniques to enhance quality
This week is an introduction to the research field dealing with crowdsourcing challenges. It covers a variety of topics that share the same goal: getting higher quality while staying within budget limits.
The first aspect we will discuss is performers' motivation. Even though we treat crowdsourcing as an engineering task, its most important resource is people. It is necessary to think about their possible benefits and intentions for working on your tasks.
The second topic of discussion is enhancing quality by working with the collected answers. There are several answer aggregation algorithms that allow you to get more quality out of the same label set. Watch the videos and learn how they work!
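A minimal sketch of the simplest aggregation baseline, majority voting: each task's final label is the one chosen by the most performers. This is only an illustration with assumed data shapes; more sophisticated algorithms (such as Dawid-Skene style models that weight performers by estimated skill) are covered in the videos.

```python
from collections import Counter

def majority_vote(labels_by_task):
    """Aggregate overlapping labels per task by majority vote.

    labels_by_task: dict mapping task_id -> list of labels given
    by different performers for that task.
    Returns task_id -> most common label (ties broken by the
    order labels were first seen, per Counter.most_common).
    """
    return {task: Counter(labels).most_common(1)[0][0]
            for task, labels in labels_by_task.items()}
```

With an overlap of three performers per task, `majority_vote({"t1": ["cat", "cat", "dog"]})` yields `{"t1": "cat"}`: the single mislabel is outvoted.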
How projects are launched and maintained
Wow, we have made it to Week 5! Congratulations :)
This week we will talk about crowdsourcing projects from a long-term perspective. Most of them are not one-time launches: for most business processes, data needs to be collected and labeled constantly. We will share our experience turning the cloud of performers into a stable and loyal community and provide a list of metrics that help you understand what is going on in your projects.
The whole team of Crowd Solutions Architects will return to give a full retrospective of the projects they have been talking about in previous weeks.
Daria Baidakova, Rosmiyana Shekhovtsova, Vladimir Zubkov, Sergey Koshelev, Ivan Stelmakh, Ivan Semchuk, Damir Sibgatullin, Ekaterina Fedorenko and Ivan Karpeev