=Background=
This page represents the work used for mechanical turks for Hubs. A list of potential Hubs with a set of characteristics was created. Many of these are not what will be defined as Hubs. We will be creating a scorecard to help subjectively define Hubs based on certain characteristics.
=Hubs Pages=
*The main page for the paper can be found at [[Hubs (Academic Paper)]].
*For the current work in progress on building the Hubs datasheet for the scorecard (as of Spring 2016), go to [[Hubs: Hubs Scorecard]].
*For a tracker of work in progress on building the dataset for the scorecard, go to [[Hubs: Hubs Data Building]].
*For a high-level overview of the variables for the scorecard, go to [[Hubs: Hubs Data]].
=List of Variables=
For a more in-depth description of the variables and procedure, please see [[Hubs: Hubs Scorecard]]. For more information on Mechanical Turks in general, please see [[Mechanical Turk (Tool)]]. This page will reflect the variables being collected, separated into three categories. Each variable will include a breakdown of the levels being collected, if the definition is not trivial, and an approximate approach.
The main goal of the mechanical turk is to automate the collection of variables for potential hubs as much as possible. The key steps for the project are:
#Creating a '''comprehensive''' list of potential hubs
#Determining the best variables for the scorecard
#Building '''"filters"''' for automating the collection
#'''Running''' and '''auditing''' of the automation
#Collecting the remaining manual data
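The auditing step can be sketched as a comparison between a filter's output and a hand-checked sample of the same potential hubs. This is only an illustration: the function name <code>audit_filter</code>, the use of <code>None</code> for "the filter gave no answer", and the sample values are assumptions, not part of the project code.

```python
from typing import Optional, Sequence


def audit_filter(predicted: Sequence[Optional[int]], truth: Sequence[int]) -> dict:
    """Compare filter output with hand-checked values for the same hubs.

    predicted[i] is 1/0 when the filter gave an answer and None when it
    gave none; truth[i] is the hand-checked 1/0 value for the same hub.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    # Keep only the hubs for which the filter produced an answer.
    answered = [(p, t) for p, t in zip(predicted, truth) if p is not None]
    correct = sum(1 for p, t in answered if p == t)
    return {
        # Share of hubs the filter answered at all.
        "coverage": len(answered) / len(truth) if truth else 0.0,
        # Share of the answered hubs where the filter matched the audit.
        "accuracy_when_answered": correct / len(answered) if answered else 0.0,
    }


# Hypothetical audit sample of five hubs; the filter answers for two of them.
result = audit_filter([1, None, None, 0, None], [1, 1, 0, 0, 1])
print(result)
```

A filter that answers rarely but is always right when it answers would show low coverage and an accuracy of 1.0; the hubs it skips would fall to manual collection.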
=Variables to be Used=
==Current Complete List==
'''As of Week of 7/11'''
#Onsite Venture Capital
#*Assets Under Management
#*Number
#Onsite Angel Investors
#Onsite Mentors
#Founding Date
#Site URL
#Office hours investors
#Office hours mentor/advisors
#Onsite temporary workshops
#Onsite mentors
#Networking Meetups
#Sponsors/Partners
#*University
#*Corporate
#Curriculum
#Onsite code school
#Alumni Network
#Nonprofit status
#Mission statement
#Specific Industry
#Price for a space
#Price for office
#Twitter activity
#Size (sqft)
#Size (# companies)
#Onsite accelerator
#Community membership??
#Franchise
#Multiple locations within city
==Grouping of Variables==
'''07/29''' Ariel: code for Hubs variable: <code>E:/McNair/Projects/Hubs/Hubs Variable-Ariel</code>

There are a few categories that the majority of the variables fall under:
'''Group 1: Low Hanging Fruit'''
Variables in this group are very easy to find and automate.
#Price for a space + office
#Twitter Activity
#Founding Date
#URL
#Mission Statement
#Nonprofit
#Sponsors/Partners
#Specific Industry
'''Group 2: The Difficult to Find'''
There are certain variables where the information is not readily available online or difficult to find.
#Size (can try to find press releases)
'''Group 3: In Between 1 and 2'''
Variables that aren't too easy or difficult to find and automate.
#Onsite accelerator
#Alumni mentor---vs. other mentors???
'''Group 4: The Hard to Differentiate'''
The key property of this group is that there are several similar variables, which would be difficult for a turk to differentiate. In order to fix this, we will need to create filters akin to the DSM5 scorecard. See the below section.
#Onsite VC v. Angel Investors
#Onsite OH Investors v. mentors
#Onsite temporary workshops v. networking events
#Curriculum v. code school
'''Group 5: The Need further Discussion Before Collection'''
Variables that need to be developed more prior to collection.
#Franchise and multiple locations within a city
#Community Membership
==Filters/Scorecard==
===General Approach===
The Scorecard will be broken down into three main parts: description, characteristics, and TBD points. The procedure for creating these will be as follows: determine the description, develop the characteristics after looking over examples, create possible mechanical turks that have complete accuracy even if they are not comprehensive (e.g. a task that always guarantees that there is an onsite mentor but covers only 40% of firms, and never misspecifies the existence of mentors), and audit the results.
===Example===
'''Curriculum'''
*'''Desc''': The potential hub provides training programs for the founders of startups that might have human capital deficits that will lead to them not being able to adequately implement their ideas.
*'''Characteristics''':
**Education that is for a founder (as opposed to code schools which can be for people who just want to join a startup)
***Code schools are for startup labor supply
**Active input into a current entrepreneurial endeavor
***e.g. "The program is designed to augment and support the real-life business experiences that the students are facing every day in their entrepreneurial endeavors"
**Not an ad hoc session, not a one time meeting but a full "course"; evidence of this could be:
**Has evidence of an integrated curriculum leading to a new competence
**Has evidence of set fixed start and end dates that last XXX long
**Is a session linked to others that regularly occurs
*'''TBD points'''
**Do we care about outsourcing?
*'''Potential Turk'''
'''Code School'''
*'''Desc''': training programs that teach coding, data processing, webpage building and other technical skills.
*'''Characteristics''':
**Target group are the developers or people who want to join the startups but not the founders themselves
**Scheduled classes, not a one time meeting (as opposed to workshops)
'''Temporary Workshops'''
*'''Desc''': a discussion/learning of a group of people on specific subjects
*'''Characteristic''':
**One time
**Have a topic/subject/goal
***e.g. learn to code workshop: JavaScript 101
=Additional Resources=
#[[Mechanical Turk (Tool)]]
#Veeral has created a google automating procedure for different lists
=Work in Progress=
'''As of Week of 7/25'''
==Goals for WIP==
#For GROUP 1, creation of mechanical turk steps:
#*'''EXAMPLE:'''
#*'''Twitter Activity'''
#**'''STATUS''': Complete/In Progress/Not Started
#**'''Previously Collected''': Yes/No
#**'''Published on Mechanical Turk''': Yes/No
#**'''Audited''': Yes/No
#**'''Updates''':
#**'''Code''':
#For GROUP 4:
##Scorecard Example
##Potential Mechanical Turk Steps (e.g. if specific organization is on website)
##Mechanical Turk Example (GROUP 1)
##Add Comments on:
###How much manual work remains/What is missing
###Any remaining difficulties
#For GROUPS 2 and 3:
##Brainstorm potential ways to find data
##Follow Steps in Group1
==Actual WIP==
===Group 1===
'''Variables Difficult to Obtain'''
#'''Founding Date''' ''(date_founded)''
#*''' ''Difficulty:'' ''' Finding date based on our strategies
#*''' ''New Approach:'' '''
#*#Whois.net Date
#*#Factiva/other press release searches
#'''Multiple locations within city + Franchise''' (as of now just addresses) ''(multi_address)''
#*''' ''Difficulty:'' ''' Company or establishment level will impact measurements
#*''' ''New Approach:'' ''' Will record all addresses at company level
#'''Onsite Venture Capital v. Angel Investors''' (e.g. # and Assets Under Management) ''(onsite_Vc_bin)/(onsite_vc_list)'' ''(onsite_angel_bin)/etc.''
#*''' ''Levels:'' ''' Binary, list of investors
#*''' ''Difficulty:'' ''' Hub website usually does not include investors
#*''' ''New Approach:'' '''
#*#Google key terms with address of Hub
#*#Start with partners and use google/crunchbase
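The Whois approach to the founding date can be sketched as pulling the registration date out of raw WHOIS text. This is a minimal sketch under stated assumptions: the field label <code>Creation Date:</code> is common for .com records but differs across registries, the function name and sample record are hypothetical, and a domain registration date is only a proxy for when the hub was founded.

```python
import re
from datetime import date
from typing import Optional

# Assumed label: many .com WHOIS records use "Creation Date:", but
# registries differ, so this pattern is illustrative only.
CREATION_RE = re.compile(
    r"^\s*Creation Date:\s*(\d{4})-(\d{2})-(\d{2})",
    re.MULTILINE | re.IGNORECASE,
)


def whois_creation_date(raw_whois: str) -> Optional[date]:
    """Return the domain registration date, or None if no such line exists."""
    match = CREATION_RE.search(raw_whois)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


# Hypothetical WHOIS response text for a made-up domain.
sample = """Domain Name: EXAMPLE-HUB.COM
Creation Date: 2012-03-14T18:22:05Z
Registrar: Example Registrar, Inc.
"""
print(whois_creation_date(sample))
```

Records without a matching line return <code>None</code>, which would leave the founding date to the press release searches.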
===Group 2===
'''Variables Comfortable, Not Complete''' (rough order of most difficult to least difficult)
#'''Onsite accelerator''' ''(onsite_accel_bin)/(onsite_accel_cnt)/(onsite_accel_list)''
#*''' ''Levels:'' ''' Binary, count, list
#*''' ''Difficulty:'' ''' Usually not a list, which requires more scrubbing, as many other variables just require us to find one page on a website.
#*''' ''Approach:'' '''
#*#Google searches and procedure to use on website yields decent results
#*#Similar procedure to onsite investors
#'''Size (# members)''' ''(num_members)''
#*''' ''Levels:'' ''' Count for companies (currently not planning to include list of companies given that some potential hubs have 200+ members)
#*''' ''Difficulty:'' ''' Some companies don’t list all members, only selective ones; others do not separate current members and alumni; some just write "we have served more than 120 startups..."
#*''' ''Approach:'' ''' For companies that have a list, we will count. For those with select members, we will count those they listed and try to see if there is a comment about how many they have. For those that just have a statement "with over," we will write the number and + (e.g. "120+").
#'''Office hours investors''' and '''Office hours mentor/advisors''' ''(OH_bin)/(OH_inv_bin)/(OH_inv_list)/etc.''
#*''' ''Levels:'' ''' Binary for OH, binary for two separate OH, list of names/descriptions of OH
#*''' ''Difficulty:'' ''' Some companies do not list who OH are with; not always obvious if investor, mentor, or advisor, and sometimes not clear if mentor is investor/future investor
#*''' ''Approach:'' ''' Google approach to get to OH pages and then lookup key words in description to separate out
#'''Onsite temporary workshops''' and '''Networking Meetups''' ''(onsite_temp_events_bin)/(onsite_temp_workshop_bin)/(onsite_temp_workshop_cnt)/etc.''
#*''' ''Levels:'' ''' Binary for do they exist, count for each
#*''' ''Difficulty:'' ''' Difficult for Turkers to differentiate between these two and also other potential events (e.g. symposiums)
#*''' ''Approach:'' ''' Uses key search terms (e.g. Java/etc.) to separate out workshops and key terms (e.g. lunch/happy hour) for networking meetings
#'''Onsite code school''' and '''Curriculum''' ''(onsite_long_term_courses)/(onsite_code_school_bin)''
#*''' ''Levels:'' ''' Binary for do they exist, binary for each
#*''' ''Difficulty:'' ''' Difficult for Turkers to differentiate between long-term coding programs for individuals and curriculum for startups
#*''' ''Approach:'' ''' Uses key search terms (e.g. specific code schools) to separate out known code schools and also to look into key terms (e.g. leadership) for curriculum
#'''Sponsors/Partners''' (University, Corporate) ''(sponsors_cnt)/(sponsors_list)/etc.''
#*''' ''Levels:'' ''' Count, list of sponsors/partners (if exist), separate columns for university and corporate
#*''' ''Difficulty:'' ''' Not all companies will list sponsors, partners, or either. Not always clear the difference among sponsors, partners, and investors.
#*''' ''Approach:'' ''' Use two different levels and use of google search, then if list exists, separate by "college"/"university" and rest
#'''Alumni Network''' ''(alumni_bin)/(alumni_list)''
#*''' ''Levels:'' ''' Binary, list
#*''' ''Difficulty:'' ''' Not all companies list alumni, only list "selected"
#*''' ''Approach:'' ''' Include all that have lists
#'''Size (sqft)''' ''(size_sqft)''
#*''' ''Levels:'' ''' Number in sqft
#*''' ''Difficulty:'' ''' Not all companies list square feet online
#*''' ''Approach:'' '''
#*#Google search with key words
#*#If results do not appear, use of press releases is possible
#'''Onsite Mentors''' ''(onsite_mentors_bin)/(onsite_mentors_cnt)/(onsite_mentors_list)''
#*''' ''Levels:'' ''' Count and list of mentors (if exist)
#*''' ''Difficulty:'' ''' Not all companies list mentors - bigger issue is onsite investors
#*''' ''Approach:'' ''' Use two different levels and use of google search
===Group 3===
'''Variables Easy to Obtain'''
#'''Twitter activity''' ''(twit_handle)/(twit_prev_mon_cnt_tweets)/(twit_cnt_followers)/(twit_cnt_retweets)''
#*''' ''Levels:'' ''' Twitter Handle, # Tweets in a Month, # Followers, # Retweets
#*''' ''Approach:'' ''' Easy to get twitter handle from Turk or Veeral's code that allows us to run a series of searches on google and then use Gunny's Twitter crawler to get other levels from handle
#'''Site URL''' ''(url)''
#*''' ''Levels:'' ''' URL
#*''' ''Approach:'' ''' Google using Veeral's code that allows us to search
#''' ''Whois Date'' ''' ''(date_whois)''
#*''' ''Levels:'' ''' Date
#*''' ''Approach:'' ''' Date active website was registered
#'''Address''' ''(address)''
#*''' ''Levels:'' ''' Will include all addresses
#*''' ''Approach:'' ''' Google key terms (e.g. Contact Us) and URL using Veeral's code
#'''Nonprofit status''' ''(nonprofit_binary)''
#*''' ''Levels:'' ''' Binary variable indicating if the potential Hub is a nonprofit organization
#*''' ''Approach:'' ''' http://www.guidestar.org/ is a site that we can use to search if a company is nonprofit or not
#'''Mission statement''' ''(missions_stmt)''
#*''' ''Levels:'' ''' Official mission statement or description of the company (if mission does not exist)
#*''' ''Approach:'' ''' If not explicitly stated mission statement, will include "About" or statements on main page
#'''Specific Industry''' ''(spec_industry)''
#*''' ''Levels:'' ''' Industry included in statement (no aggregation)
#*''' ''Approach:'' ''' Based on Mission Statement, not aggregated
#'''Price for a space/office''' ''(price_space)''
#*''' ''Levels:'' ''' Two prices, one for shared, other for private
#*''' ''Approach:'' ''' Uses google methodology with key terms and URL
===Curriculum and Code School===
'''Curriculum'''
*'''Desc''': The potential hub provides training programs for the founders of startups that might have human capital deficits that will lead to them not being able to adequately implement their ideas.
*'''Characteristics''':
**Education that is for a founder (as opposed to code schools which can be for people who just want to join a startup)
***Code schools are for startup labor supply
**Active input into a current entrepreneurial endeavor
***e.g. "The program is designed to augment and support the real-life business experiences that the students are facing every day in their entrepreneurial endeavors"
**Not an ad hoc session, not a one time meeting but a full "course"; evidence of this could be:
**Has evidence of an integrated curriculum leading to a new competence
**Has evidence of set fixed start and end dates that last XXX long
**Is a session linked to others that regularly occurs
*'''TBD points'''
**Do we care about outsourcing?
*"Potential Turks"
**Google "Fullbridge" site:URL
'''Code School'''
*'''Desc''': training programs that teach coding, data processing, webpage building and other technical skills.
*'''Characteristics''':
**Target group are the developers or people who want to join the startups but not the founders themselves
**Scheduled classes, not a one time meeting (as opposed to workshops)
=Completed Work=
=OLD1=
We will be creating a "Hubs scorecard" to determine how hub-like potential spaces are. In order to do so, we will evaluate the places based on certain variables. Previous variables for potential hubs were collected. Below, we list those as well as other variables we think might be helpful to build out the scorecard. Ideally we would have the following variables (not collected previously):
#Onsite VC/Angel/Investors (Count or binary)
##Comments:
##Mechanical Turk Comments:
#Onsite Mentors (binary) --- ''Are these the same as advisers?''
##Comments:
##Mechanical Turk Comments:
#"Office hours" with investors or mentors (binary)
##Comments: Previously collected included number of events, but did not separate them into categories (e.g. networking events, workshops, etc.). We view this separation as important, BUT very difficult to collect
##Mechanical Turk Comments:
#Onsite temporary workshops (binary or count) *** ''see mechanical turk''
##Comments:
##Mechanical Turk Comments:
#Networking Meetups (Binary or count) *** ''see mechanical turk''
##Comments:
##Mechanical Turk Comments:
#Sponsors and Partners (binary and list) --- ''are these the same?''
##Comments:
##Mechanical Turk Comments:
#Alumni Network (binary) --- ''not all potential hubs list this and the fact that some do might indicate its importance''
##Comments:
##Mechanical Turk Comments:
#Num of Companies --- ''to help determine size as getting physical sqfootage is difficult''
##Comments:
##Mechanical Turk Comments:
#Nonprofit (binary) --- ''helpful in determining goals of potential hubs''
##Comments:
##Mechanical Turk Comments:
#Mission Includes Key Buzzwords (e.g. "ecosystem", "community") --- ''help separate simple coworking spaces from hubs''
Example of Prior Variables Collected:
*Specific Industry -- ''defined as LinkedIN Self Identifier, no categories just plain text. We think what we really want is to see if they have a specialty (e.g. healthcare)''
*Num of Events --- ''relatively complete inputs, but from March 2016 (see above as well)''
*Price for Single Space --- ''defined as price for flexible desk, relatively complete inputs''
*Price for Office --- ''no inputs''
*Twitter Activity (Multinomial or Count) --- ''High=2/Moderate=1/No=0, no explanations on how to categorize the activity. Also no handles''
*Size (sqft) --- ''no records for majority of the companies''
*Num Conference Rooms --- ''no records for majority of the companies''
*Onsite accelerator (binary) --- ''relatively complete inputs''
*Onsite code school (binary) --- ''relatively complete inputs''
*Community Membership (binary) --- ''relatively complete inputs''
=OLD2=
*'''Twitter activity''': ''UPDATE (7/14): Updated turk to reflect our desired formats.'' ''UPDATE (7/12):'' '''AUDIT RESULTS''': We noticed ''UPDATE (7/11): uploaded and published on amazon's mechanical turk site. Given the time cost to either record number of tweets in a month or look up more than 10 tweets, we decided to record the date of the last 10th tweet. Using a sample of ~10 companies, we noticed minimal differences in data observations using 10, 20, and 30 tweets.''
#Copy the text in the Search Text into a search engine.
#Click on result from twitter.com with the company name. If the link does not appear on the first 3 pages, record DNE for both outputs
#Record the company's Twitter Handle into Twitter Handle
#Record the date (MM/DD/YY) of the 10th most recent tweet for Twitter Activity. If there are fewer than 10 tweets, record DNE.
*'''NUMBER OF EVENTS''': ''UPDATE: written, not published, on amazon's mechanical turk site''
'''Considerations'''
*Difficulties Encountered:
*Expected Time to Complete:
*Expectation of Results (accuracy of turk, comprehensiveness):
*Other Comments:
'''Procedure'''
#Copy the text in the Search Text into a search engine.
#Click on the result that is the website of the company. If there does not exist a listing on the first three pages, mark as DNE.
#Look for links related to events, such as 'Events' or 'Calendar' on the homepage.
#If not found on the homepage, check 'About' and check 'Community'
#Count the number of events in July 2016 and record it. If there is no information of events on the website, record DNE.
Note***: ''Events include meetups, workshops, info sessions etc. We do not want to count them separately since it is difficult to do so. Most companies put all the events on the same section and do not put event types in the titles of the events. We have to look into the details of the events to find out the type, and even if we do so, some event descriptions do not allow us to determine the type easily. Differentiating the types of the events demands more time and effort and therefore is not suitable to be a mechanical turk project.''
*'''Onsite Mentors''': ''UPDATE: written, not published, on amazon's mechanical turk site''
#Copy the text in the Search Text into a search engine.
#Click on the result that is the website of the company. If there does not exist a listing on the first three pages, mark as DNE.
#Look for links related to mentorship such as 'mentors', 'mentorship', or 'mentoring programs'
#If the key words can be identified, mark as 1
#If there is no explicit 'mentoring' section, look for links related to a description of the company, such as 'About,' 'Our Team,' 'Our Mission,' etc., look for a subsection or mention of mentor/mentorship/mentoring
#If these exist, mark as 1.
#If not, go to links related to membership 'benefits,' 'perks,' or related.
#Do same process as end of 4 and 5
#If there is no mention of mentorship in these sections, type the company, city, and 'mentoring' into a search engine. If a link to a reliable website (such as Desktime) appears and mentorship can be found in the description, mark as 1.
#If none of these steps result in a mark of 1, mark as 0
*'''Nonprofit''': ''UPDATE: written, not published, on amazon's mechanical turk site''
#Copy the text in the Search Text into a search engine.
#Click on the result that is the website of the company. If there does not exist a listing on the first three pages, mark as DNE.
#Go to links that describe the company, usually they are labelled: 'About', 'Our Story,' 'Mission'
#Look for the key word 'nonprofit' or 'non-profit'
#If 'nonprofit' is identified, mark as 1, otherwise 0.
*'''Number of Members''': ''UPDATE: written, not published, on amazon's mechanical turk site''
#Copy the text in the Search Text into a search engine.
#Click on the result that is the website of the company. If there does not exist a listing on the first three pages, mark as DNE.
#Look for the link 'Members' or 'Residents', usually they are under the links 'Community', 'Membership', 'Our Space' or 'The Space'.
#Count the number of members
#If the link or section of 'Members' is not found, go to 'Community' and 'Coworking' and look for the description on number of startups/founders/members in the community. Record the number.
#If number of members cannot be identified using above steps, record DNE.
*'''Sponsors and Partners''': ''UPDATE: written, not published, on amazon's mechanical turk site''
#Copy the text in the Search Text into a search engine
#Click on the result that is the website of the company. If there does not exist a listing on the first three pages, mark as DNE.
#Look for the link or mention of 'Sponsors' or 'Partners', which is often under the section of 'About', 'Community', or related sections
#If sponsors or partners can be found mark as 1 and list them, otherwise mark as 0.
[[Category: Internal]]
[[Internal Classification: Legacy| ]]
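The Twitter activity rule from the earlier Turk procedure (record the date of the 10th most recent tweet, or DNE when there are fewer than 10) can be sketched as follows. The function name and the sample dates are hypothetical; how the tweet dates are obtained (Turk or crawler) is outside this sketch.

```python
from datetime import date
from typing import Sequence


def twitter_activity(tweet_dates: Sequence[date], n: int = 10) -> str:
    """Return the date of the n-th most recent tweet as MM/DD/YY, or "DNE"."""
    if len(tweet_dates) < n:
        # Fewer than n tweets: the procedure records DNE.
        return "DNE"
    newest_first = sorted(tweet_dates, reverse=True)
    return newest_first[n - 1].strftime("%m/%d/%y")


# Hypothetical account that tweeted once a day from 7/1/2016 to 7/12/2016.
dates = [date(2016, 7, d) for d in range(1, 13)]
print(twitter_activity(dates))      # 10th most recent of 12 daily tweets
print(twitter_activity(dates[:4]))  # fewer than 10 tweets
```

An older date for the 10th tweet indicates a less active account, which is why a single date can stand in for a monthly tweet count.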