Changes

11,134 bytes removed ,  17:36, 2 September 2016
=Hubs Pages=
*The main page for the Hubs scorecard can be found at [[Hubs Scorecard (Academic Paper)]]
*For a tracker of work in progress on the dataset building, go to [[Hubs: Mechanical Turk]]
*For a tracker of work in progress on the scorecard, go to [[Hubs: Hubs Data Building]]
*For a high-level overview of the variables for the scorecard, go to [[Hubs: Hubs Data]]
=Background=
This page represents the work used for mechanical turks creating the hubs data for the paper: [[Hubs (Academic Paper)]]. As of Spring 2016, a list of potential Hubs with a set of characteristics was created. Many of these are not what will be defined as Hubs. We will be creating a scorecard to help subjectively define Hubs based on certain characteristics.
For more information on Mechanical Turks in general, see [[Mechanical Turk (Tool)]].
Our goal for the mechanical turk is to automate the collection of variables for potential hubs as much as possible. The key steps for the project are:
#Creating a '''comprehensive''' list of potential hubs (Complete)
##<code>E:\McNair\Projects\Hubs\Raw Program List</code> - Contains 600 entities - vast majority are firmly not hubs (file pedigree unknown)
##<code>E:\McNair\Projects\Hubs\Hubs Data</code> - Contains 125 entities - many are not hubs (overlap with above file unknown; this file's pedigree is from the old Hubs project)
#Determining the best variables for the scorecard (Complete)
#Building '''"filters"''' for automating the collection (Complete)
#'''Running''' and '''auditing''' of the automation (In Progress)
#*See section 4.2
#Collecting the remaining manual data (next step)
=Variables to be Used=
 
Old variable list (see Hubs Data.xls) contains 18+3 variables. Overlap with new variable list is ~50%
 
==Current Complete List==
'''As of Week of 7/11'''
Variables that aren't too easy or difficult to find and automate.
#Onsite accelerator
#Alumni mentor---vs. other mentors???
#Onsite temporary workshops v. networking events
#Curriculum v. code school
 
#Community Membership
==Filters/Scorecard==
===General Approach===
The Scorecard will be broken down into three main parts: description, characteristics, and TBD parts. The procedure for creating these will be as follows: determine the description, develop the characteristics after looking over examples, create possible mechanical turks that are completely accurate even if not comprehensive (e.g. a task that covers only 40% of firms but, wherever it reports an onsite mentor, never misspecifies the existence of mentors), and audit the results.

===Example===
'''Curriculum'''
*'''Desc''': The potential hub provides training programs for the founders of startups that might have human capital deficits that will lead to them not being able to adequately implement their ideas.
*'''Characteristics''':
**Education that is for a founder (as opposed to code schools, which can be for people who just want to join a startup)
***Code schools are for startup labor supply
**Active input into a current entrepreneurial endeavor
***e.g. "The program is designed to augment and support the real-life business experiences that the students are facing every day in their entrepreneurial endeavors"
**Not an ad hoc session, not a one time meeting but a full "course"; evidence of this could be
**Has evidence of an integrated curriculum leading to a new competence
**Has evidence of fixed start and end dates that last XXX long
**Is a session linked to others that regularly occurs
*'''TBD points'''
**Do we care about outsourcing?
*'''Potential Turk'''

'''Code School'''
*'''Desc''': Training programs that teach coding, data processing, webpage building and other technical skills.
*'''Characteristics''':
**Target group is developers or people who want to join startups, not the founders themselves
**Scheduled classes, not a one time meeting (as opposed to workshops)

'''Temporary Workshops'''
*'''Desc''': A discussion/learning session for a group of people on specific subjects
*'''Characteristic''':
**One time
**Have a topic/subject/goal
***e.g. learn to code workshop: Java script 101

=Additional Resources=
#[[Mechanical Turk (Tool)]]
#Veeral has created a google automating procedure for different lists

=Work in Progress=
==Goals for WIP==
#For GROUP 1, creation of mechanical turk steps:
#*'''EXAMPLE:'''
#*'''Twitter Activity'''
#**'''STATUS''': Complete/In Progress/Not Started
#**'''Previously Collected''': Yes/No
#**'''Published on Mechanical Turk''': Yes/No
#**'''Audited''': Yes/No
#**'''Updates''':
#**'''Code''':
#For GROUP 4:
##Scorecard Example
##Potential Mechanical Turk Steps (e.g. if specific organization is on website)
##Mechanical Turk Example (GROUP 1)
##Add Comments on:
###How much manual work remains/What is missing
###Any remaining difficulties
#For GROUPS 2 and 3:
##Brainstorm potential ways to find data
##Follow Steps in Group 1

==Steps Needed to Complete==
#Establish automation process for collecting data (create processes for Groups 1-3)
#*Status (7/21): G/Y: Founding date, size (members) issues
#*Begin Date: Started
#*Reach Goal: Complete by Friday 7/22
#Differentiate variables in Group 4
#*Status (7/27): G/Y: much progress has been made, but issues with onsite venture capitalists/angel investors
#*Begin Date: Started
#*Reach Goal: Complete by Wednesday 7/27
#Have a comprehensive list of potential hubs
#*Status (7/27): Complete
#Test processes and audit
#*Status (7/27): In Progress
#Fill in Remaining Data Manually
#*Status (7/27): NS
#*Begin Date: TBD
#*Reach Goal: TBD
==How to Code the Variables==
===Group 1===
#Twitter Activity
#*'''STATUS''': Complete
#*'''Previously Collected''': YES/NO - Recorded 2/1/0 to represent activity level, but not in the same way as we are recording it
#*'''Published on Mechanical Turk''': Yes
#*'''AUDITED''': Yes
#**'''Audit Results''': Compared against 30 companies that were done manually. For '''Twitter handle''', all 3 turkers agreed with our results 81% of the time, and at least 2 turkers agreed with our results 98% of the time (the exception was a company that had several Twitter handles based on location). Results were 52% and 89% respectively.
#*'''UPDATES''':
#**'''UPDATE (7/20)''': Gunny has created a tool to do this process
#**'''UPDATE (7/14)''': Updated turk to reflect our desired formats
#**'''UPDATE (7/12)''': Audited
#**'''UPDATE (7/11)''': Uploaded and published on Amazon's Mechanical Turk site. Given the time cost of either recording the number of tweets in a month or looking up more than 10 tweets, we decided to record the date of the 10th most recent tweet. Using a sample of ~10 companies, we noticed minimal differences in the data when using 10, 20, and 30 tweets.
#*'''CODE''' (7/14)
#*#Copy the text in the Search Text into a search engine.
#*#Click on the result from twitter.com with the company name. If the link does not appear on the first 3 pages, record DNE for both outputs.
#*#Record the company's Twitter Handle into Twitter Handle.
#*#Record the date (MM/DD/YY) of the 10th most recent tweet for Twitter Activity. If there are fewer than 10 tweets, record DNE.
#URL
#*'''STATUS''': In Progress
#*'''Previously Collected''': YES
#*'''Published on Mechanical Turk''': NO/No Need (Veeral)
#*'''AUDITED''': NO
#**'''Audit Results''': TBD
#*'''UPDATES''':
#**'''UPDATE (7/22)''': Veeral has code to do this procedure
#**'''UPDATE (7/18)''': Code written, expected time for each assignment is <15 seconds - pay rate, therefore, recommended $.04
#*'''CODE'''
#*#Copy the text in the Search Text into a search engine.
#*#Record the URL of the first result in the following format ___.__/ (e.g. if url is example.us/other, record example.us/)
#Address
#*'''STATUS''': In Progress
#*'''Previously Collected''': YES
#*'''Published on Mechanical Turk''': NO
#*'''AUDITED''': NO
#**'''Audit Results''': TBD
#*'''UPDATES''':
#**'''UPDATE (7/22)''': Code written. Difficulties occur with very large companies (e.g. Impact Hub). Will require Veeral's program, expected time for each assignment is 10-20 seconds - pay rate, therefore, recommended $.05
#*'''CODE'''
#*#Using Veeral's code, crossproduct allintext: (Group A) and site: (Group B), where '''Group A'''=Contact (high coverage), About Us, Find Us, Locations, Address, '''Group B'''= Company URLs.
#*#Click on first result. If addresses exist, record in ADDRESS, STATE, and ZIP.
#*#If not, go to company's URL. If addresses exist, record in ADDRESS, STATE, and ZIP.
#*#If address exists, but ZIP does not, plug in address into search engine and record ZIP.
#*#Otherwise, record DNE.
#Mission Statement
#*'''STATUS''': In Progress
#*'''Previously Collected''': YES
#*'''Published on Mechanical Turk''': NO
#*'''AUDITED''': NO
#**'''Audit Results''': TBD
#*'''UPDATES''':
#**'''UPDATE (7/18)''': Code written, expected time for each assignment is 20-30 seconds - pay rate, therefore, recommended $.08
#*'''CODE'''
#*#Copy the text in the Search Text 1 into a search engine (allintext: About/Mission site: from Company's URL).
#*#If no text exists, record "DNE"
#Specific Industry
#*'''STATUS''': In Progress
#*'''Previously Collected''': YES/NO, based on LinkedIn identifier
#*'''Published on Mechanical Turk''': NO
#*'''AUDITED''': NO
#**'''Audit Results''': TBD
#*'''UPDATES''':
#**'''UPDATE (7/21)''': Given that most companies include their specialty in their mission statement and that this variable is difficult to turk, we will manually check each mission statement and mark it accordingly.
#*'''CODE'''
#*#NONE
#Nonprofit
#*'''STATUS''': In Progress
#*'''Previously Collected''': NO
#*'''Published on Mechanical Turk''': NO
#*'''AUDITED''': NO
#**'''Audit Results''': TBD
#**'''REQUIRES ADDITIONAL STEPS''': YES (need to double check results)
#*'''UPDATES''':
#**'''UPDATE (7/19)''': Code written, code 2 of 2 is believed to be more accurate and efficient. Expected time to complete is 15 seconds - pay rate, therefore, recommended $.04
#*'''CODE 1 of 2'''
#*#Go to Company's URL.
#*#Go to links (sometimes will be sections of the URL page) that describe the company, usually they are labelled: 'About', 'Our Story,' 'Mission'.
#*#If none of these exist, record DNE for PAGES
#*#Look for the word 'profit'/'nonprofit'/'non-profit'/'not-for-profit' (with or without -)
#*#If any of the key words is identified, record as 1, otherwise 0 for EXISTS (1/0).
#*#If it is marked as 1, record all sentences that the word is found in under SENTENCES.
#*#If the links do exist, record the name of the link under PAGES
#*#Repeat steps 4, 5, and 6 on the pages that were linked.
#*'''CODE 2 of 2'''
#*#Copy the text from Search Text into the search bar at http://www.guidestar.org/.
#*#Record all Organization Names that appear
#*#If no results appear, record DNE
#Sponsors/Partners
#*'''STATUS''': In Progress
#*'''Previously Collected''': NO
#*'''Published on Mechanical Turk''': NO
#*'''AUDITED''': NO
#**'''Audit Results''': TBD
#*'''UPDATES''':
#**'''UPDATE (7/21)''': Code written, but may require additional manual work. Expected time to complete is 45 seconds due to a potential list of a lot of sponsors/partners - pay rate, therefore, recommended $.12.
#*'''CODE'''
#*#Record all Sponsors from Search Text 1 into SPONSORS. If there does not exist a list or the link was for only 1 sponsor, record DNE.
#*#If any Sponsors from Search Text 1 include a University or College (will be listed in name), record them into UNIVERSITY SPONSORS
#*#Record all Partners from Search Text 2 into PARTNERS. If there does not exist a list or the link was for only 1 partner, record DNE.
#*#If any Partners from Search Text 2 include a University or College (will be listed in name), record them into UNIVERSITY PARTNERS
#Price for a space + office
#*'''STATUS''': Not Started
#*'''Previously Collected''': YES
#*'''Published on Mechanical Turk''': NO
#*'''AUDITED''': NO
#**'''Audit Results''': TBD
#*'''UPDATES''':
#**'''UPDATE (7/22)''': Code 1 written, code 2 needs more work
#*'''CODE 1 of 2'''
#*#Go to company’s URL
#*#TBD
#Founding Date
#*'''STATUS''': To Be Discussed Further
#*'''Previously Collected''': YES, but only year
#*'''Published on Mechanical Turk''': NO
#*'''AUDITED''': NO
#**'''Audit Results''': TBD
#*'''UPDATES''':
#**'''UPDATE (7/21)''': Difficulties observed when figuring out how to Turk this; have a possible solution (whois.net)
#*'''CODE'''
#*#Copy the text in the Search Text into a search engine.
#*#TBD
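
The Address procedure in Group 1 describes crossing the Group A keywords with the Group B company URLs to form search strings. The following is a minimal sketch of that cross product, not Veeral's actual code, which is not shown on this page. The function name <code>build_queries</code>, the exact query format, and the example URLs are assumptions made for illustration.

```python
from itertools import product

# Group A keywords as listed in the Address procedure.
GROUP_A = ["Contact", "About Us", "Find Us", "Locations", "Address"]


def build_queries(keywords, company_urls):
    """Return one search string per (company URL, keyword) pair.

    The 'allintext:"..." site:...' layout is an assumed format.
    """
    return [
        f'allintext:"{keyword}" site:{url}'
        for url, keyword in product(company_urls, keywords)
    ]


# Hypothetical company URLs, used only for illustration.
queries = build_queries(GROUP_A, ["example.us", "example.org"])
for query in queries[:3]:
    print(query)
```

Each company URL yields five queries, one per Group A keyword, so the results can be walked in keyword order until an address is found.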
===Group 2===
#Size (SQFT)
#*'''BRAINSTORM''': (7/19) 1), 2), 3): search allintext: sqft/square foot/square feet site: company URL. 4) Search Company Name, city, square feet and then choose the first result. The process might be easier (and cheaper) if Veeral runs code first to eliminate the many searches that return 0 results.
#*'''STATUS''': In Progress
#*'''Previously Collected''': YES/NO, many missing
#*'''Published on Mechanical Turk''': NO
#*'''AUDITED''': NO
#**'''Audit Results''': TBD
#*'''UPDATES''':
#**'''UPDATE (7/19)''': Brainstorm and code updated
#*'''CODE'''
#*#Copy the text in the Search Text 1 into a search engine.
#Size (# Companies)
#*'''BRAINSTORM''': (7/22) Some companies don't list all members, only selected ones. Some companies do not separate current members from alumni and instead say something like: "we have served more than 120 startups..."
#*'''STATUS''': In Progress
#*'''Previously Collected''': NO
#*'''Published on Mechanical Turk''': NO
#*'''AUDITED''': NO
#**'''Audit Results''': TBD
#*'''UPDATES''':
#**'''UPDATE (7/22)''': Brainstorm and code updated (Capital Factory 227)
#*'''CODE 1 of 2'''
#*#Go to Company URL
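
For Size (SQFT), the brainstorm above searches for "sqft", "square foot", and "square feet". Once a result page's text is in hand, the figure itself could be pulled out with a pattern match. This is a rough sketch under that assumption; the pattern and the function name <code>extract_sqft</code> are illustrative and not part of the existing process.

```python
import re

# Matches figures such as "12,000 sqft", "5000 sq. ft." or "50,000 square feet".
SQFT_PATTERN = re.compile(
    r"(\d[\d,]*)\s*(?:sq\.?\s*ft\.?|square\s+(?:foot|feet))",
    re.IGNORECASE,
)


def extract_sqft(text):
    """Return every square-footage figure found in text as a list of ints."""
    return [
        int(match.group(1).replace(",", ""))
        for match in SQFT_PATTERN.finditer(text)
    ]


# Hypothetical page text, used only for illustration.
print(extract_sqft("Our campus offers 50,000 square feet of coworking space."))
```

A page with no match returns an empty list, which would correspond to recording DNE.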
===Group 3===
#Mentors
#*'''BRAINSTORM''': Current form of this variable seems to be too general.
#*'''STATUS''': In Progress
#*'''Previously Collected''': NO
#*'''Published on Mechanical Turk''': NO
#*'''AUDITED''': NO
#**'''Audit Results''': TBD
#*'''UPDATES''':
#**'''UPDATE (7/19)''': Two possible codes written. First one requires more manual work
#*'''CODE 1 of 2'''
#*#Go to Company URL
#*#Mark as 1 if reliable site is populated, 0 otherwise
#Onsite Accelerator
#*'''BRAINSTORM''': Need a count.
#*'''STATUS''': Not Started
#*'''Previously Collected''': YES/NO, only a binary variable
#*'''Published on Mechanical Turk''': NO
#*'''AUDITED''': NO
#**'''Audit Results''': TBD
#*'''UPDATES''':
#**'''UPDATE (7/21)''': Code written - 2nd part, while more manual, appears to have greater range. 2nd code would only require Veeral's code. 1st code expected completion time is 30 seconds.
#*'''CODE 1 of 2'''
===Group 4===
====Curriculum and Code School====
'''Curriculum'''
*'''Desc''': The potential hub provides training programs for the founders of startups that might have human capital deficits that will lead to them not being about to adequately implement their ideas.
====Onsite OH Investors v. Mentors====
Thoughts (Ariel, 07/20): The names listed on 'mentor' pages/sections must all be mentors, and the same applies for investors/OH investors, although few companies list their investors. So the only thing we are trying to differentiate here is whether the mentor is an investor, maybe by checking whether they are from a VC firm? But even if they are from VC firms, that doesn't mean they are going to invest in the startups of the Hubs they are mentoring at. Another way to think about it is differentiating between mentors and OH mentors. Mentors tend to give particular startups long term support and are available when needed, while OH mentors only give advice on the spot.

'''Mentors'''
*'''Desc''':
*'''Characteristics''':
**Focus on improving entrepreneurial community through ongoing, recurring support
**Help and guide the startups on: business plans and models, management, development, execution, technology innovation, marketing, sales
**Common fields/occupations: founder/CEO of another company, business development, serial entrepreneur, marketing, sales, management consulting, technology and innovation, research professor etc.
**Some companies offer mentor office hours
*'''TBD Points''':

'''Investors'''
*'''Desc''':
*'''Characteristics''':
**Focus on investing in early stage or growth stage startups
**Usually from VC firms
**Common fields/occupations: VC firm manager, VC firm partner, fund manager
*'''TBD Points''':
*'''Potential Turks''':
#Search allintext:"office hours" site:URL
##search for 'fund'. (Ctrl + F) If 'fund' appears in the description paragraph of office hours on any of the five pages, mark ''investor OH'' as 1. Otherwise mark as DNE and copy the description paragraph of office hours of all five pages.
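
The 'fund' check above can be sketched as a small function, assuming the office-hours description paragraphs from the five pages have already been copied into a list. The function name and return layout are hypothetical. Like Ctrl + F, the check is a plain substring match, so it will also hit words such as "funding".

```python
def investor_office_hours(descriptions):
    """Flag investor office hours from office-hours description paragraphs.

    Returns ("1", []) if 'fund' appears in any paragraph; otherwise returns
    ("DNE", descriptions) so the paragraphs are kept for manual review.
    """
    if any("fund" in paragraph.lower() for paragraph in descriptions):
        return "1", []
    return "DNE", list(descriptions)


# Hypothetical paragraphs, used only for illustration.
flag, kept = investor_office_hours([
    "Meet our mentors every Tuesday.",
    "Office hours with partners from a seed fund.",
])
print(flag)
```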
====Onsite temporary workshops v. networking events====
'''Temporary Workshops'''
*'''Desc''':
*'''Characteristics''':
**The purpose is learning and discussing
**Often have a specific topic: business issue (e.g. online marketing) or techniques learning (e.g. intro to Java script)
**In the forms of: workshop, class, panel, project, XX session, seminar, series, intro to XX
**Exception: tech meetup is usually a workshop (e.g. C++ programmer meetup, http://techranchaustin.com/events/)
*'''TBD Points''':
**Do we care about what particular workshops (e.g. coding, leadership, etc.)?
**Summits/major events
*'''Potential Turks''':
**See Turk for Both Below

'''Networking Events'''
*'''Desc''':
*'''Characteristics''':
**The purpose is to meet fellow entrepreneurs and experts and network with them
**Focus on experience sharing or communication as opposed to discussing a specific topic or technical subject
**In the forms of: meetup, networking, happy hour, info session?, luncheon, XX night, socials, talks??, community XX
*'''TBD Points''':
*'''Potential Turks''':
**See Turk for Both Below
*'''Turk for Both 1 of 2''':
*#Search the Search Text 1 (allintext: events site: URL) and choose the link to "Events", "Calendar", or related. Record the URL in SOURCE. If this does not exist, go to Step 7.
#For all events that are related to social activities and networking (e.g. "Social," "Meet Up," "Breakfast"/"Lunch"/"Happy Hour", "Movie Night"/"Bowling"), count the number of events into NETWORKING
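
The counting step above could be approximated by keyword matching on event titles. The sketch below uses the keywords named in the characteristics and in the step itself. The <code>WORKSHOP</code> and <code>UNCLASSIFIED</code> labels and the function names are assumptions, and the tech meetup exception is not handled, so titles containing "meetup" would still need a manual check.

```python
from collections import Counter

# Keywords taken from the workshop and networking characteristics on this page.
WORKSHOP_KEYWORDS = ("workshop", "class", "panel", "seminar", "series", "intro to")
NETWORKING_KEYWORDS = (
    "social", "meet up", "meetup", "networking", "breakfast",
    "lunch", "happy hour", "movie night", "bowling",
)


def classify_event(title):
    """Label one event title by simple substring matching (workshops checked first)."""
    lowered = title.lower()
    if any(keyword in lowered for keyword in WORKSHOP_KEYWORDS):
        return "WORKSHOP"
    if any(keyword in lowered for keyword in NETWORKING_KEYWORDS):
        return "NETWORKING"
    return "UNCLASSIFIED"


def count_events(titles):
    """Count how many titles fall under each label."""
    return Counter(classify_event(title) for title in titles)


# Hypothetical event titles, used only for illustration.
counts = count_events([
    "Intro to JavaScript",
    "Startup Happy Hour",
    "Online Marketing Workshop",
    "Board Meeting",
])
print(counts["NETWORKING"], counts["WORKSHOP"])
```

Substring matching is deliberately loose, so anything left as UNCLASSIFIED, and any borderline match, would go back for manual review.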
====Onsite VC v. Angel Investors====
*Notes: Few companies have a section for their onsite VCs or angel investors. Even the one company (Innovation Pavilion) that has Angel programs and VC programs does not run the programs itself, but cooperates with external angel investors or VCs. Some companies have mentors or board members who are from VCs, but that does not mean they will invest in the member startups of those companies.
#*Getting
==Generating the Hubs Data==
All files can be found in the E:/Mcnair/Projects/Hubs/Searching
Recommended to select the CSV and Excel worksheets because there are many JSON files
#**DNEs
#Post Results in [[Hubs: Hubs Data Building]]
 
==Companies Used for Auditing/etc.==
Capital Factory, Austin
1871, Chicago
Rocket Space, San Francisco
1776, Washington D.C.
Betamore, Baltimore
Packard Place, Charlotte
The Venture Center, Little Rock
GSV Labs, San Francisco
The Hive, Palo Alto
Innovation Pavilion, Denver
OSC Tech Lab, Akron
Speakeasy, Indianapolis
Riverside.io, Riverside
The Salt Mines, Columbus
InNEVation, Las Vegas
804 RVA
Impact Hub, Salt Lake
Awesome Inc, Louisville
Geekdom, San Antonio
Alloy26, Pittsburgh
ReSET, Hartford
Ansir Innovation Center, San Diego
Domistation, Tallahassee
Atlanta Tech Village, Atlanta
Spark Labs, New York
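
The Twitter Activity audit reports how often all 3 turkers, and how often at least 2 turkers, agreed with the manually collected values for these companies. A minimal sketch of that calculation follows. The function name, the data layout, and the sample handles are hypothetical and are not the audit data.

```python
def agreement_rates(manual, turker_answers):
    """Return (share where every turker matches, share where at least 2 match).

    manual maps company -> manually collected value;
    turker_answers maps company -> list of turker responses.
    """
    if not manual:
        raise ValueError("manual results must not be empty")
    all_match = 0
    at_least_two = 0
    for company, truth in manual.items():
        answers = turker_answers[company]
        matches = sum(answer == truth for answer in answers)
        all_match += matches == len(answers)
        at_least_two += matches >= 2
    total = len(manual)
    return all_match / total, at_least_two / total


# Hypothetical audit data, used only for illustration.
manual = {"A": "@a", "B": "@b", "C": "@c", "D": "@d"}
turkers = {
    "A": ["@a", "@a", "@a"],
    "B": ["@b", "@b", "@x"],
    "C": ["@c", "@x", "@y"],
    "D": ["@d", "@d", "@d"],
}
rates = agreement_rates(manual, turkers)
print(rates)
```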
=Completed Work=
See Section 3 of [[Hubs (Academic Paper)]]
[[Category: Internal]]
[[Internal Classification: Legacy| ]]
