This course will teach you efficient and scalable data labeling for ML and various business processes. The key is the crowdsourcing approach: splitting complex challenges into small tasks and distributing them among a vast cloud of performers.
You will get acquainted with crowdsourcing as a methodology, mastering the steps and techniques that ensure quality and stable performance. You will put all of these techniques into practice straight away: throughout the course, you'll design your own crowdsourcing project.
Introduction to crowdsourcing
We will start the course by discussing what crowdsourcing is and how it applies to Machine Learning. Through examples of large-scale data labeling processes, you will see how diverse and powerful crowdsourcing is. We will also go through the steps necessary to prepare a crowdsourcing project. This basic understanding will be developed in the following weeks, alongside your own crowdsourcing project. This week you will choose the project most relevant to you and draft its pipeline. Last but not least, you will meet a team of Yandex's Crowd Solutions Architects. They will give a short introduction to their crowdsourcing projects and share their experience designing efficient task pipelines.
Instructions and interfaces
This week we will dive into designing crowdsourcing projects. After a task has been decomposed into smaller pieces, it is time to create interfaces and guidelines. We will go through tips on performer-friendly interface design and learn how to compose guidelines that help performers along the way.
Week 2 is an important step in developing your own crowdsourcing project. Based on the pipeline from last week, you will create your project on a real crowdsourcing platform. Stepping into the performers' shoes, you will try to label some data and write instructions for the task. We recommend investing a decent amount of time into this week's assignments: it will contribute a lot to your final task of collecting labeled data.
Quality control
It's time to talk about ensuring data quality. This week we will discuss how to select and train performers and learn how to configure quality checks depending on task specifics. Most crowdsourcing platforms offer a wide range of quality control mechanisms, but it is important to choose those that are most applicable to your task.
You will also develop training and quality control for your own crowdsourcing project. And our Crowd Solutions Architects will share their experience setting up complex quality control.
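As a minimal illustration of one widely used quality control mechanism, the sketch below checks each performer's accuracy on control ("golden") tasks with known answers and flags those who fall below a threshold. The function names, data layout, and the 0.7 threshold are illustrative assumptions, not the API of any specific platform.

```python
# Hypothetical sketch: quality control via control (golden) tasks.
# A performer whose accuracy on tasks with known correct labels
# drops below a threshold is excluded from the project.

def control_task_accuracy(answers, golden):
    """Per-performer accuracy on control tasks.

    answers: list of (performer_id, task_id, label) tuples
    golden:  dict mapping task_id -> known correct label
    """
    correct, total = {}, {}
    for performer, task, label in answers:
        if task not in golden:
            continue  # skip regular (non-control) tasks
        total[performer] = total.get(performer, 0) + 1
        if label == golden[task]:
            correct[performer] = correct.get(performer, 0) + 1
    return {p: correct.get(p, 0) / total[p] for p in total}

def performers_to_ban(answers, golden, min_accuracy=0.7):
    """Performers whose control-task accuracy is below the threshold."""
    accuracy = control_task_accuracy(answers, golden)
    return {p for p, a in accuracy.items() if a < min_accuracy}
```

For example, if Alice answers both control tasks correctly and Bob answers only one of two, a 0.7 threshold would exclude Bob. Real platforms typically combine such checks with response-time limits and majority-agreement rules.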
Smart techniques to enhance quality
This week is an introduction to the research field dealing with crowdsourcing challenges. It covers a variety of topics that share the same goal: getting higher quality while staying within budget limits.
The first aspect we will discuss is performers' motivation. Even though we treat crowdsourcing as an engineering task, its most important resource is people. It is necessary to think about their possible benefits and intentions for working on your tasks.
The second topic of discussion is enhancing quality by working with the collected answers. There are several answer aggregation algorithms that allow you to get more quality out of the same label set. Watch the videos and learn how they work!
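A minimal sketch of the simplest aggregation baseline, majority voting: each task's final label is the one chosen by the most performers. This is only an illustration with assumed data shapes; more sophisticated algorithms (such as Dawid-Skene style models that weight performers by estimated skill) are covered in the videos.

```python
from collections import Counter

def majority_vote(labels_by_task):
    """Aggregate overlapping labels per task by majority vote.

    labels_by_task: dict mapping task_id -> list of labels given
    by different performers for that task.
    Returns task_id -> most common label (ties broken by the
    order labels were first seen, per Counter.most_common).
    """
    return {task: Counter(labels).most_common(1)[0][0]
            for task, labels in labels_by_task.items()}
```

With an overlap of three performers per task, `majority_vote({"t1": ["cat", "cat", "dog"]})` yields `{"t1": "cat"}`: the single mislabel is outvoted.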
How projects are launched and maintained
Wow, we have made it to Week 5! Congratulations :)
This week we will talk about crowdsourcing projects from a long-term perspective. Most of them are not one-time launches: for most business processes, data needs to be collected and labeled constantly. We will share our experience turning the cloud of performers into a stable and loyal community and provide a list of metrics that help you understand what is going on in your projects.
The whole team of Crowd Solutions Architects will return to give a full retrospective of the projects they have been talking about in previous weeks.
Daria Baidakova, Rosmiyana Shekhovtsova, Vladimir Zubkov, Sergey Koshelev, Ivan Stelmakh, Ivan Semchuk, Damir Sibgatullin, Ekaterina Fedorenko and Ivan Karpeev