Difference between revisions of "Accelerator Industry Classification (Data)"

From edegan.com
Line 3:
|Has owner=Veeral Shah
|Has start date=Summer 2016
|Keywords=Mechanical Turk
|Primary Billing=AccNBER01
}}

Revision as of 20:04, 28 February 2017


McNair Project
Accelerator Industry Classification (Data)
Project Information
Project Title Accelerator Industry Classification (Data)
Owner Veeral Shah
Start Date Summer 2016
Deadline
Primary Billing
Notes
Has project status
Copyright © 2016 edegan.com. All Rights Reserved.


This page is included in the project Accelerators (Data)


Accelerators classified by the industry in which their cohorts typically operate.

Introduction and steps

The purpose of this project is to assign a NAICS code to each startup in our sample of startups that have attended accelerators. This will allow us to determine the specialization (if any) of each startup accelerator in our sample. The first few steps of the process are meant to get you up to speed.


Step 1: Retrieve URLs for each of the companies

For the first task, we will get the URL of the company website for every company that entered an accelerator. To do so, we will use the “Portfolio Firms” tab of the Google Sheet called “Accelerator-Portfolio-NAICS”.

The link to the Google Sheet is here: https://docs.google.com/spreadsheets/d/1Bw3qX7xLs5lSv577pS9YUtURDofYdqOza2MuP4dbthw/edit?usp=sharing

Here are the steps; each step refers explicitly to this sheet:

1. Look for a row that does not have the URL column filled out

2. Check to make sure that no one else is working on that row by seeing if someone has placed their cursor on that row

3. Now put your cursor on that row by clicking on the row’s Name column (this will make sure no one duplicates your work)

4. Now copy the Accelerator and Name columns of that row

5. Paste them into the Google search box and search

6. Find a website that references both the accelerator name and the startup name

7. Click on that website’s link in Google

8. See if the webpage includes a URL for the startup

9. Copy the company’s URL to the URL column in the row of the startup in the “Accelerator-Portfolio-NAICS” sheet

10. Put your initials (first, middle and last) in the “URL-ENTERER” column of the spreadsheet

11. If you have time, go back to step 1 and repeat
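
Steps 4 and 5 above amount to turning the Accelerator and Name columns into a search query. A minimal sketch of that step follows. The function name `build_search_url` is hypothetical, the code assumes the usual `https://www.google.com/search?q=...` URL form, and it is not taken from gscript.py or glink.py.

```python
from urllib.parse import urlencode


def build_search_url(accelerator: str, name: str) -> str:
    """Return a Google search URL for one accelerator/startup pair.

    Hypothetical helper: it mirrors pasting both cells into the search box.
    """
    # Join the two cell values with a space, as if pasted side by side.
    query = f"{accelerator.strip()} {name.strip()}"
    # urlencode escapes the query; spaces become '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


# Illustrative values only; they are not rows from the sheet.
print(build_search_url("Y Combinator", "Dropbox"))
```

Opening the printed URL in a browser is equivalent to steps 4 and 5; steps 6 through 10 still need a person to judge which result is the startup's own site.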

Step 2: Measure your productivity at Step 1

1. Repeat Step 1 for 1.5 to 2 hours while keeping track of how many rows you have completed (I do it by noting the start and end rows throughout this period). For this period of work, really try to focus and see how well you can do. We are trying to establish the upper bound on productivity when the task is run this way.

2. In the Lab Notes section below, record your output for that period, both the total number of rows and the per-minute completion rate.

3. In addition to how fast you were able to complete rows of data, please note which steps took the most time on average. Essentially, I want you to do a rough version of a "time and motion" study (feel free to look that up) that will reveal where automation could yield productivity gains.
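
The per-minute completion rate can be computed directly from the start and end rows noted during the session. The sketch below is only an illustration: the function name `completion_rate` and the row numbers 2 and 131 are hypothetical, chosen so that the total matches the 130 rows in 90 minutes recorded in the lab notes.

```python
def completion_rate(start_row: int, end_row: int, minutes: float) -> float:
    """Rows completed per minute, counting both the start and end rows."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    rows_done = end_row - start_row + 1
    return rows_done / minutes


# Hypothetical row numbers: rows 2 through 131 are 130 rows.
rate = completion_rate(start_row=2, end_row=131, minutes=90)
print(f"{rate:.2f} rows per minute")  # prints "1.44 rows per minute"
```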

Step 3: Install a new piece of software into the server system

1. Find time with Ed to install the software provided at the links below onto the system

https://www.dropbox.com/s/egau7yuho3wkjlo/gscript.py?dl=0

https://www.dropbox.com/s/afqzcpsxvvl1k2n/glink.py?dl=0

1a. While you wait for Ed to help you with the installation, I want you to read through the code in these two programs. I want you to really puzzle through how this all works until you think you understand it. As you do this, keep good notes on the wiki (potentially including screenshots) under the lab notebook tab. Also, feel free to message me with questions as you go.

Step 4: Run the new software on a limited batch of the spreadsheet above

1. Call me before you do this, but this step should be self-explanatory

2. The program will output the same sheet you input, with search results appended. This should make it simpler to find the URL of each company's website

3. Create a set of steps that use the output of the program to find company URLs and note those steps in your lab notes
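
One possible starting point for such steps is to scan the appended search results and take the first one that is not a known directory or social-media page. The sketch below assumes the results for a row are available as a list of URL strings; the actual output layout of gscript.py and glink.py is not described here. `SKIP_DOMAINS` and `candidate_url` are hypothetical names, and the skip list is illustrative only.

```python
from urllib.parse import urlparse

# Hypothetical skip list: sites that describe startups rather than belong to them.
SKIP_DOMAINS = {"crunchbase.com", "linkedin.com", "twitter.com", "facebook.com"}


def candidate_url(result_urls, skip_domains=SKIP_DOMAINS):
    """Return the first result whose host is not in skip_domains, else None."""
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if not host:
            continue  # not an absolute URL, so it cannot be judged
        # Skip the listed domains and any of their subdomains.
        if any(host == d or host.endswith("." + d) for d in skip_domains):
            continue
        return url
    return None


# Illustrative input; these are not rows from the sheet.
results = [
    "https://www.crunchbase.com/organization/example",
    "https://www.example.com/",
]
print(candidate_url(results))
```

A candidate chosen this way still needs a manual check against the accelerator and startup names before it is entered in the URL column.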

Lab Notes

6/21/2016 - 11:00 AM - 12:30 PM

Formed productivity baseline - 130 rows completed in 90 minutes, slightly under 1.5 rows per minute at full productivity.

Women

International