Changes

2,415 bytes removed ,  13:47, 21 September 2020
no edit summary
{{Project|Has project output=Tool|Has sponsor=McNair ProjectsCenter
|Has title=Accelerator Demo Day
|Has owner=Minh Le,
==Amazon Mechanical Turk==
Login info: username: mcnair@rice.edu password: amount

Please refer to: [[Amazon Mechanical Turk for Analyzing Demo Day Classifier's Results]]
There's a file in the folder CrawledHTMLFull called FinalResultWithURL that was manually created by combining the file crawled_demoday_page_list.txt in the parent folder and the file predicted.txt. This file pairs each prediction with the actual URL of the corresponding website.
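The manual combination described above could also be scripted. The following is a minimal sketch, assuming that crawled_demoday_page_list.txt and predicted.txt each hold one entry per line and that line N of one file corresponds to line N of the other; the source does not confirm either file's layout, and the function names and the tab-separated output are hypothetical.

```python
import csv


def combine_predictions(url_lines, prediction_lines):
    """Pair each URL with its prediction.

    Assumption: both inputs are aligned line by line (not confirmed by the source).
    """
    urls = [line.strip() for line in url_lines if line.strip()]
    predictions = [line.strip() for line in prediction_lines if line.strip()]
    if len(urls) != len(predictions):
        raise ValueError(
            f"{len(urls)} URLs but {len(predictions)} predictions; files are not aligned"
        )
    return list(zip(urls, predictions))


def write_combined(url_path, prediction_path, out_path):
    """Read both files and write URL/prediction pairs as tab-separated rows."""
    with open(url_path, encoding="utf-8") as url_file, \
            open(prediction_path, encoding="utf-8") as prediction_file:
        rows = combine_predictions(url_file, prediction_file)
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        csv.writer(out_file, delimiter="\t").writerows(rows)
    return len(rows)
```

Raising an error on a length mismatch is a deliberate choice here: silently truncating to the shorter file would attach predictions to the wrong URLs.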
 
Since MTurk makes it hard for us to display the downloaded HTML, it is much faster to copy the URL into the question box than to try to display the downloaded HTML.
 
The advantage of this is that some websites, such as techcrunch.com, behave abnormally when downloaded as HTML, so opening them in the browser is actually more beneficial because the UI is not messed up. Moreover, if a website has a paywall or pop-up ads, the user can click out of them. Since paywalls and pop-ups are usually scripts within the HTML, the classifier can't rule these pages out, because the body of the HTML may still contain the useful information we are looking for. Major paywalled sites and websites that require log-ins, such as LinkedIn, have been blacklisted in the crawler. More detail is in the crawler section below.
 
However, there is a disadvantage to this: websites are ever changing, so in the future a URL may no longer be usable or may point to something else. Downloaded HTML, on the other hand, stays the same: it does not require an internet connection to render, so its content is static.
 
To create the MTurk for this project, follow the tutorial in [[Mechanical Turk (Tool)]]. For testing and development purposes, use https://requestersandbox.mturk.com/
 
Test account:
email: mcboatfaceboaty670@gmail.com
password: sameastheoneforemail2018
 
For this project, the fields asked of the user are:
 
*Whether the page had a list of companies going through an accelerator
*The month and year of the demo day (or article)
*Accelerator name
*Companies going through accelerator
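When processing the returned answers, the four fields above could be held in a small record type. This is only an illustrative sketch: the class name, field names, and types are hypothetical and are not taken from the actual MTurk layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DemoDayAnswer:
    """One worker's answer for one page (all names here are hypothetical)."""

    has_company_list: bool              # page lists companies going through an accelerator
    demo_day_month_year: str = ""       # month and year of the demo day (or article)
    accelerator_name: str = ""
    companies: List[str] = field(default_factory=list)


# Example with made-up values:
answer = DemoDayAnswer(True, "September 2018", "Example Accelerator", ["Acme"])
```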
 
Layout:
 
[[File:Demodayfinal.png]]
 
===Pricing===
 
Connor and Minh talked about how to price MTurk so that it is neither too generous nor too stingy for workers. Connor could complete four MTurk HITs in 12 minutes (3 min/HIT). He then asked friends who were unfamiliar with MTurk to complete a few surveys, and found they completed around 3 in 15-20 minutes (5-7 min/HIT). Given this, we think an upper limit of 10 min/HIT is appropriate. If this is the case, we should price each HIT at $1.50, which leads to an appropriate $9.00/hour rate for workers.
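The arithmetic behind the $9.00/hour figure can be checked with a short sketch. The function name is hypothetical; the price and the per-HIT times are the ones quoted above.

```python
def hourly_rate(price_per_hit, minutes_per_hit):
    """Effective hourly pay for a worker who takes minutes_per_hit per HIT."""
    return price_per_hit * 60 / minutes_per_hit


# Per-HIT times mentioned in the text: 3 (Connor), 5-7 (friends), 10 (upper limit).
for minutes in (3, 5, 7, 10):
    print(f"{minutes} min/HIT -> ${hourly_rate(1.50, minutes):.2f}/hour")
```

At the 10 min/HIT upper limit this gives $9.00/hour; a worker who is faster than the limit earns proportionally more per hour at the same $1.50 price.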
==Hand Collecting Data==
Once this process is finished, we will filter only the 1s in Column F, and [[Connor Rothschild]] and [[Maxine Tao]] will work to populate empty cells in The File to Rule Them All with that data.
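The filtering step could be done outside a spreadsheet program as well. The sketch below assumes the sheet has been exported to CSV with a header row and that Column F is the sixth column; the export format, the header row, and the function names are all assumptions, not part of the documented workflow.

```python
import csv

COLUMN_F = 5  # zero-based index of spreadsheet column F


def rows_marked_one(rows, column=COLUMN_F):
    """Keep only rows whose value in the given column is exactly "1"."""
    return [
        row for row in rows
        if len(row) > column and row[column].strip() == "1"
    ]


def filter_csv(in_path, out_path):
    """Copy the header plus every row with a 1 in Column F to out_path."""
    with open(in_path, newline="", encoding="utf-8") as in_file:
        rows = list(csv.reader(in_file))
    header, body = rows[0], rows[1:]  # assumption: first row is a header
    kept = rows_marked_one(body)
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(header)
        writer.writerows(kept)
    return len(kept)
```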
 
==Advanced User Guide: An in-depth look into the project and the various settings==
