Amazon Mechanical Turk for Analyzing Demo Day Classifier's Results
|Has title||Amazon Mechanical Turk for Analyzing Demo Day Classifier's Results|
|Has owner||Minh Le|
|Has start date||July 2018|
|Has deadline date|
|Has keywords||Amazon Mechanical Turk|
|Has project status||Complete|
|Is dependent on||Accelerator Demo Day|
|Has sponsor||McNair Center|
|Has project output||Tool, How-to|
|Copyright © 2019 edegan.com. All Rights Reserved.|
The code is stored in:
E:\McNair\Projects\Accelerator Demo Day\Turk
How to Use
username: firstname.lastname@example.org password: amount
There's a file in the folder
that was manually created by combining the file
in the parent folder and the file
This file pairs each of the classifier's predictions with the actual URL of the corresponding website.
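The combined file was created manually, but the same join could be scripted. The following is a minimal sketch, assuming both inputs are CSV files that share a key column; the column names (`filename`, `prediction`, `url`) and the sample rows are hypothetical and are not taken from the project files.

```python
import csv
import io


def attach_urls(predictions_csv, urls_csv, key="filename"):
    """Join classifier predictions to source URLs on a shared key column.

    Both arguments are open text files (or file-like objects) in CSV format.
    The column names used here are assumptions for illustration.
    """
    url_by_key = {row[key]: row["url"] for row in csv.DictReader(urls_csv)}
    merged = []
    for row in csv.DictReader(predictions_csv):
        url = url_by_key.get(row[key])
        if url is None:
            continue  # skip pages whose URL is unknown
        merged.append({**row, "url": url})
    return merged


# Hypothetical sample data standing in for the two input files.
predictions = io.StringIO(
    "filename,prediction\npage_001.html,1\npage_002.html,0\n"
)
urls = io.StringIO(
    "filename,url\n"
    "page_001.html,https://example.com/demo-day\n"
    "page_002.html,https://example.org/news\n"
)
rows = attach_urls(predictions, urls)
```

Each entry in `rows` then carries both the prediction and the URL that can be pasted into the question box.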
Since MTurk makes it hard for us to display the downloaded HTML, it is much faster to copy the URL into the question box than to try to display the downloaded HTML.
The advantage is that some websites, such as techcrunch.com, behave abnormally when downloaded as HTML, so opening them in the browser works better because the UI is not broken. Moreover, if a website has a paywall or pop-up ads, the worker can click out of them. Since paywalls and pop-ups are usually scripts within the HTML, the classifier can't rule such pages out, because the body of the HTML may still contain the information we are looking for. Major paywalled websites and websites that require log-ins, such as LinkedIn, have been blacklisted in the crawler. More detail in the crawler section below.
However, there is a disadvantage: websites are ever-changing, so in the future a URL may stop working or point to different content. Downloaded HTML, on the other hand, stays the same because it does not require an internet connection to render, so its content is static.
Test account: email: email@example.com password: sameastheoneforemail2018
For this project, the fields asked of each worker were:
- Whether the page had a list of companies going through an accelerator
- The month and year of the demo day (or article)
- Accelerator name
- Companies going through accelerator
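When processing the downloaded results, the four fields above could be held in a small record type. This is only an illustrative sketch: the class name, field names, and the consistency check are assumptions, not the format MTurk exports or the structure used in the project code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HitAnswer:
    """One worker's answers for one page (hypothetical layout)."""

    url: str
    has_company_list: bool
    demo_day_month: Optional[int] = None  # 1-12, if the worker found one
    demo_day_year: Optional[int] = None
    accelerator_name: str = ""
    companies: List[str] = field(default_factory=list)

    def is_consistent(self) -> bool:
        """Flag answers that contradict themselves."""
        if not self.has_company_list:
            # A page without a company list should not list companies.
            return not self.companies
        if self.demo_day_month is None:
            return True
        return 1 <= self.demo_day_month <= 12


# Placeholder values, not real project data.
answer = HitAnswer(
    url="https://example.com/demo-day",
    has_company_list=True,
    demo_day_month=7,
    demo_day_year=2018,
    accelerator_name="Example Accelerator",
    companies=["Company A", "Company B"],
)
```

A check like `is_consistent` is one cheap way to spot answers worth reviewing by hand before they are compared against the classifier.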
We priced our task at $1.25 per HIT. Assuming workers take less than 10 minutes per HIT, this translates into more than $7.50 per hour.
We sent out the task in two batches. The first was 20 HITs, each to be completed by two workers, so as to test inter-judge reliability.
The second batch was the remaining 264 HITs, to be completed by one worker each.
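The page does not say how inter-judge reliability was measured on the first batch. One common option for a yes/no field, such as whether the page lists companies, is raw agreement together with Cohen's kappa, which corrects agreement for chance. The sketch below assumes each worker's answers are given as parallel lists of labels; the sample labels are made up for illustration.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which the two workers gave the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    observed = percent_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same label at random,
    # given each rater's own label frequencies.
    expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(a) | set(b)
    )
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up answers from two workers on eight pages (1 = yes, 0 = no).
worker_a = [1, 1, 0, 1, 0, 0, 1, 1]
worker_b = [1, 0, 0, 1, 0, 1, 1, 1]
agreement = percent_agreement(worker_a, worker_b)
kappa = cohens_kappa(worker_a, worker_b)
```

With only 20 doubly-labeled HITs, any such statistic is a rough check rather than a precise estimate.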
MTurk charged fees of $0.25 per HIT plus an additional $0.0625, meaning each completed HIT cost us $1.5625.
OUR FINAL PRICE: ((20*2)+264)*1.5625 = $475.00
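The pricing arithmetic can be re-checked with a few lines of Python. The figures are the ones quoted on this page; the variable names are illustrative, and `Decimal` is used only to avoid floating-point rounding in the dollar amounts.

```python
from decimal import Decimal

REWARD = Decimal("1.25")        # paid to the worker per HIT
FEE = Decimal("0.25")           # MTurk fee per HIT
EXTRA_FEE = Decimal("0.0625")   # additional MTurk charge per HIT

cost_per_assignment = REWARD + FEE + EXTRA_FEE

# Batch 1: 20 HITs with two workers each. Batch 2: 264 HITs with one worker each.
assignments = 20 * 2 + 264
total_cost = cost_per_assignment * assignments

# Lower bound on the hourly rate if every HIT takes at most 10 minutes.
MAX_MINUTES_PER_HIT = 10
min_hourly_rate = REWARD * 60 / MAX_MINUTES_PER_HIT

print(f"cost per assignment: ${cost_per_assignment}")
print(f"assignments: {assignments}")
print(f"total cost: ${total_cost:.2f}")
print(f"minimum hourly rate: ${min_hourly_rate:.2f}")
```

Changing the batch sizes or the reward at the top makes it easy to budget a future run before publishing it.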