Mechanical Turk (Tool)



Copyright © 2016 edegan.com. All Rights Reserved.


Description

The purpose of this page is to introduce people to the use of Mechanical Turk in data processing. The document is structured as follows: 1. It describes Mechanical Turk and the ways in which it can be used. 2. It provides simple getting-started instructions that allow a new user to access the Mechanical Turk system and begin a new project. 3. It gives an example of a project with sample code.


What is Mechanical Turk

Mechanical Turk (Mturk) is a system that lets you outsource work to many different people efficiently. For the purposes of the McNair Center, we focus on using Mturk to acquire and clean data. It works well when the task consists of a small number of easily understood steps that need to be repeated many times. If your data task fits this description, it is worth thinking about turning it into an Mturk task.

In the example below, we find the Twitter handles for a set of companies in a spreadsheet. If you were to do this by hand as an RA, you would start with the spreadsheet and go through each row, searching on either Google or Twitter for each company. In Mturk, you would instead create a project. In that project, you would create a task template that provides a set of overall instructions as well as hooks that are filled in with specific information from one row of your spreadsheet. When a turker receives their assignment, or HIT, they see both the overall instructions and the specific information for that row. The Mturk system allows many people to work on your spreadsheet in parallel, so the work is completed much more quickly. If this is confusing, the concrete example below should help. For now, just bear in mind some essential vocabulary.


Mechanical Turk Vocabulary
Requester: a person or organization posting work on the system
HIT: Human Intelligence Task, a single task that a worker completes
Project: a collection of HITs
Turker: a worker on Mturk

Accessing the Mechanical Turk Platform

email: esi@rice.edu
pass: 9Million!
  • To create a new project, click on the Create link and follow the directions in the Creating a New Project Example section below
  • To modify an existing project, follow the directions in the Modify an Existing Project section below

Creating a New Project Example

In the steps below, we describe the creation of a Turk project that asks Turk workers to find the Twitter handles of companies. It takes as input a series of Google search queries in CSV form and asks the workers to enter each search string into Google and check whether any Twitter handles are returned on the first page of the search results.
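The input file described above can be built from a list of company names. The following is a minimal sketch, not a required format: the column names `company` and `query`, the sample company names, and the "name plus twitter" search string are all assumptions made for illustration.

```python
import csv
import io


def build_query_rows(companies):
    """Return one dict per company with a Google search string for its Twitter account."""
    rows = []
    for name in companies:
        cleaned = name.strip()
        if not cleaned:
            continue  # skip blank spreadsheet cells
        # Assumed query format: company name followed by the word "twitter".
        rows.append({"company": cleaned, "query": f"{cleaned} twitter"})
    return rows


def write_input_csv(rows, handle):
    """Write a header line plus one line per row; each data row becomes one HIT."""
    writer = csv.DictWriter(handle, fieldnames=["company", "query"])
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical company names, including messy whitespace and a blank cell.
companies = ["Acme Robotics", "  Globex Corporation ", ""]
rows = build_query_rows(companies)

buffer = io.StringIO()  # swap for open("input.csv", "w", newline="") to write a file
write_input_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The header names matter later: whatever columns you choose here are the names the hooks in the HIT template must use.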


Step 1, Project Info: Once you click on the Create link, you will be brought to an interface with a number of text entry boxes. Summarize your project in a way that is informative both for the team and for potential Turk workers choosing between projects. Figure 1 below shows the project info for the Twitter handle project.


Figure 1: Twitter Project Info

 


Step 2, Choosing Pay Level: Once you have named the project, you have to decide on the pay scale (Reward per assignment) and the number of people working on each HIT (Number of assignments per HIT). The higher the pay per HIT, the quicker your work will be completed by turkers, but you do not want to waste money. A good rule of thumb is to work through the task yourself for 30-60 minutes and then count how many rows you completed. Set the per-HIT pay so that it works out to roughly $6.00 - $10.00 in hourly wage; this gets things done efficiently on the system. If you decide to have more than 1 worker per HIT, it should be because you believe the data task requires a certain amount of human judgement and you want to accept only results that have been "verified" by multiple people. The last three parameters in this box determine how long each worker has to complete a HIT and how long the HIT stays in the system. You generally want "Time Allotted" to be 1 day. The expiration of the HIT doesn't matter that much. The last important choice on this screen is the "Auto-approve" option. The quicker the auto-approval, the more likely it is that turkers will take your task. For now, set it to 24 hours, but remember that you are responsible for regularly auditing results when you have a project up on the Turk system.
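The rule of thumb above is simple arithmetic: divide the target hourly wage by the number of rows you can complete in an hour. The sketch below works through it under assumed numbers (45 rows in 30 minutes, an $8.00 target wage, a 1,000-row spreadsheet, 2 assignments per HIT). The function names are ours, and the cost figure covers worker rewards only; it does not include any platform fees.

```python
def reward_per_hit(rows_completed, minutes_worked, target_hourly_wage):
    """Reward per assignment (in dollars) that matches a target hourly wage."""
    if rows_completed <= 0 or minutes_worked <= 0:
        raise ValueError("rows_completed and minutes_worked must be positive")
    rows_per_hour = rows_completed * 60 / minutes_worked
    return round(target_hourly_wage / rows_per_hour, 2)


def batch_reward_cost(num_rows, reward, assignments_per_hit=1):
    """Total worker rewards for a batch, excluding any platform fees."""
    return round(num_rows * assignments_per_hit * reward, 2)


# Assumed trial run: 45 rows finished in 30 minutes, aiming for $8.00 per hour.
reward = reward_per_hit(rows_completed=45, minutes_worked=30, target_hourly_wage=8.00)

# Assumed batch: 1,000 spreadsheet rows, each checked by 2 workers.
cost = batch_reward_cost(num_rows=1000, reward=reward, assignments_per_hit=2)

print(f"reward per assignment: ${reward:.2f}")
print(f"total rewards for the batch: ${cost:.2f}")
```

Note how doubling the number of assignments per HIT doubles the reward cost, which is why extra workers are only worth it for tasks that need human judgement.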


Figure 2: Cost Parameters in Mturk

 


Step 3, Design Layout: At this point, you have to design what the turker sees when they receive your assignment. While it is possible that one turker will complete multiple HITs, it is important to design the HIT so that it can be easily completed the first (and possibly only) time a worker sees it. In Figure 3, below, you can see the initial design layout of the default data acquisition project in the Turk system. It is an example HIT that asks turkers to find the website of a restaurant. Please note that this is not a great HIT in terms of the clarity of its instructions. We will provide guidelines on creating instructions below. For now, just notice a few features of the HIT. To the right of "Restaurant Name", there is a field called ${name}. This is a hook, or blank space, that will be populated with the actual name of a restaurant taken from a spreadsheet that you upload into the Turk system. Each HIT corresponds to one row of the spreadsheet. The same is true of the "Address" and "Phone Number" rows. The last key thing to notice is the "Website Address" field with a text entry box right below it. When a turker receives this HIT, they will paste the web address into this text box, and you will receive a new spreadsheet with whatever they (and all the other turkers) pasted, in the same row as the data you used to populate each HIT.
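To make the hook idea concrete, the sketch below fills a text version of the restaurant template from spreadsheet rows, one rendered HIT per row. It is a local illustration only, not the Turk system's own code. It relies on Python's `string.Template`, which happens to use the same `${name}` placeholder syntax. The hook names `${address}` and `${phone}` and the sample rows are assumptions; only `${name}` appears in the default template described above.

```python
import csv
import io
from string import Template

# Text stand-in for the HIT layout; ${address} and ${phone} are assumed hook names.
hit_template = Template(
    "Restaurant Name: ${name}\n"
    "Address: ${address}\n"
    "Phone Number: ${phone}\n"
    "Website Address: [text box]"
)

# Made-up spreadsheet; the header names must match the hook names exactly.
spreadsheet = io.StringIO(
    "name,address,phone\n"
    "Example Diner,123 Main St,555-0100\n"
    "Sample Cafe,456 Oak Ave,555-0101\n"
)

# One rendered HIT per spreadsheet row.
rendered_hits = [hit_template.substitute(row) for row in csv.DictReader(spreadsheet)]

print(rendered_hits[0])
```

If a hook has no matching column, `substitute` raises a `KeyError`, which mirrors why the column names and hook names have to agree.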


Now, how do you modify this HIT to reflect your actual data task? You can change the wording of the task directly in the editor screen. Make sure that all of the data element hooks (like ${name}) correspond to the actual names of the columns in the CSV file that you will upload to the Turk system. But what if you need the task to look substantially different from the one you are looking at? If you click on "Source", it will show you the actual HTML code of your HIT task, as displayed in Figure 4. The Turk system essentially allows you to display a full web page for your HIT task, with JavaScript, CSS, etc. As we develop our system at the McNair Center, you will have more existing tasks to choose from, but when you need to build your own, some useful HTML references are listed below. When you have finished editing your HIT template, click on the "Save" button and then move to "Preview". This last screen shows you exactly what the turkers will see (Figure 5). If it looks correct, click "Finish".
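Because every hook must match a CSV column name, it can help to check the two against each other before uploading. The following is a minimal sketch of such a check, run locally on the template source and the input file. The function name `check_hooks`, the sample template, and the sample CSV are all hypothetical.

```python
import csv
import io
import re

# Matches ${column_name} hooks and captures the name inside the braces.
HOOK_PATTERN = re.compile(r"\$\{([^}]+)\}")


def check_hooks(template_html, csv_handle):
    """Compare ${...} hooks in the template with the CSV header row."""
    hooks = set(HOOK_PATTERN.findall(template_html))
    header = set(next(csv.reader(csv_handle), []))
    return {
        "missing_columns": sorted(hooks - header),  # hooks with no CSV column
        "unused_columns": sorted(header - hooks),   # CSV columns no hook uses
    }


# Hypothetical template source and input file with a deliberate mismatch.
template_html = "<p>Company: ${company}</p><p>Search for: ${query}</p>"
input_csv = io.StringIO("company,search_query\nAcme Robotics,Acme Robotics twitter\n")

report = check_hooks(template_html, input_csv)
print(report)
```

Here the template asks for `query` while the file provides `search_query`, so the report lists one missing and one unused column.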

HTML references
REF 1
REF 2


Figure 3: Design Layout

 


Figure 4: Design Layout, Raw HTML

 


Figure 5: HIT Preview

 

Existing HIT Library

To do: create a list of existing HITs and what they do.

TDL with HITS

  • Data validation using javascript

Hash

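import os

import requests

# Read the API token from the environment rather than hard-coding it in the page.
# EVENTBRITE_TOKEN is an assumed variable name; set it before running.
token = os.environ["EVENTBRITE_TOKEN"]

response = requests.get(
    "https://www.eventbriteapi.com/v3/organizers/2300226659/events/",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,  # give up rather than hang on a stalled connection
)
response.raise_for_status()  # raise on 4xx/5xx instead of failing silently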