Difference between revisions of "Accelerator Data"

From edegan.com

Latest revision as of 13:41, 21 September 2020


Project
Accelerator Data
Project Information
Has title Composite Accelerator Data
Has owner Matthew Ringheanu, Shrey Agarwal
Has start date Fall 2016
Has deadline date
Has keywords Accelerator, Data
Has project status Subsume
Is dependent on Accelerator Seed List (Data)
Subsumed by: U.S. Seed Accelerators
Has sponsor McNair Center
Has project output Data
Copyright © 2019 edegan.com. All Rights Reserved.

Relevant Files

Location for All Relevant Files

  • All relevant files are located in Bulk(E:)\McNair\Projects\Accelerators\All Relevant Files

List of All Relevant Files

Original Search

  • accelerator_data_noflag
    • Original Location: Bulk(E:)\McNair\Projects\Accelerators\Spring 2017\Code+Final_Data
    • Description: This text file contains the data on all accelerators that we found from our first round of research that were not flagged. It consolidates the data collected by all McNair Center interns, filtering out the organizations which are not accelerators.
    • Variables: Name, Score, Flag, CohortURL, Address, Duration, Vintage, Industry, Description, Equity, Nonprofit, Notes

Cohort Directory "Big Push"

  • Data
    • Original Location: Bulk(E:)\McNair\Projects\Accelerators\Spring 2017\
    • Description: This folder contains files for each of the accelerators that we searched through from the "List of Preliminary Accelerators". There are three files per accelerator: 1) the "accelerator name.txt" file, which contains each of the variables recorded by the McNair Center workers during our big push on the project in winter 2016; 2) the .html file for the cohort page, if the entry was indeed an accelerator and the worker could find its cohort page; and 3) an "accelerator name.cohort.txt" file, which contains a list of the cohort companies as well as all variables that were easily found alongside the cohort.
  • List of Python files
    • parse_accelerator_data
    • parse_cohort_data
    • process_locations
    • wayback_machine
    • Location: Bulk(E:)\McNair\Projects\Accelerators\Spring 2017\Code+Final_Data
    • Description: These files contain the code that Peter used to categorize the data from the "Data Copy" folder in Bulk(E:)\McNair\Projects\Accelerators\Spring 2017\, which is just a copy of our cleaned data file. Running this code, Peter produced a list of accelerators categorized by their flag and a compiled list of all the cohort companies together with the variables recorded by McNair workers.
    • Note: We manually altered the cohort data that came out of Peter's code to homogenize the formatting. The resulting cohort file is therefore unique and will not be reproduced by running the code again. In contrast, we fixed the formatting of the individual accelerator txt files directly, so running Peter's code again should produce a similar file.
  • Cleaned Cohort Data
    • Original Location: Bulk(E:)\McNair\Projects\Accelerators\Fall 2017
    • Description: This Excel file contains all data on all cohort companies for our entire list of current accelerators. All missing accelerators were updated by Veeral and we have used this as our final list of cohort companies for all accelerators.
    • Variables: Accelerator Name, Company Name, Description, Website, Industry, Location, Acquisition, Notes, Investors, Perks, Status, Funding Stage, Founder, Executive, Program, Cohort, Year
  • First_Incomplete_PercentVC_Table
    • Original Location: Bulk(Z:)\Accelerators
    • Description: This table gives the VC raise rate, as a percentage, for 198 accelerators. At this point we realized we were missing almost 100 accelerators, so we decided to expand our list and gather more data.
    • Variables: Accelerator Name, Number of Cohort Companies, Number of VC Backed Cohort Companies, Raise rate percentage
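The variables of First_Incomplete_PercentVC_Table suggest a simple per-accelerator calculation. The following is a minimal sketch, assuming the raise rate is the number of VC-backed cohort companies divided by the total number of cohort companies, times 100; the page does not state the formula. The function name `raise_rate_table` and the input row layout are hypothetical and are not taken from the project's code.

```python
from collections import defaultdict


def raise_rate_table(cohort_rows):
    """Build (accelerator, n_companies, n_vc_backed, raise_rate_pct) rows.

    cohort_rows is an iterable of (accelerator, company, is_vc_backed)
    tuples. This layout is an assumption made for illustration only.
    """
    totals = defaultdict(int)   # cohort companies per accelerator
    backed = defaultdict(int)   # VC-backed cohort companies per accelerator
    for accelerator, _company, is_vc_backed in cohort_rows:
        totals[accelerator] += 1
        if is_vc_backed:
            backed[accelerator] += 1
    # Every accelerator in totals has at least one company, so the
    # division below cannot hit zero.
    return [
        (acc, totals[acc], backed[acc], 100.0 * backed[acc] / totals[acc])
        for acc in sorted(totals)
    ]


# Made-up rows, not project data.
example_rows = [
    ("Accelerator A", "Company X", True),
    ("Accelerator A", "Company Y", False),
    ("Accelerator B", "Company Z", False),
]
example_table = raise_rate_table(example_rows)
```

An accelerator with no recorded cohort companies never appears in the output, which is one way a table like this can end up incomplete.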

Refining the List

  • New Crunchbase Accelerators
    • Original Location: Bulk(E:)\McNair\Projects\Accelerators\Spring 2017
    • Description: After matching SDC data against the cohort companies of the accelerators in the accelerator_data_noflag text file, we realized that many potential accelerators were missing. We then obtained an Excel file from Crunchbase containing all of its organizations and sorted it to identify potentially missing accelerators. This Excel file contains the accelerators that we were in fact missing.
    • Variables: Names of Missing Accelerators
    • Potential Crunchbase Variables
  • Accelerator_Data
    • Original Location: Bulk(E:)\McNair\Projects\Accelerators\Summer 2017\Veeral
    • Description: This text file contains cleaned data on all of our current accelerators. This file was compiled by Veeral over Summer 2017. Some of these accelerators are not based in the United States.
    • Variables: Accelerator, homepage_url, city, region, country_code, Creation date
  • ListofAccs
    • Original Location: Bulk(E:)\McNair\Projects\Accelerators\Fall 2017
    • Description: This text file contains a master list of all current accelerators we have been working with.
    • Variables: Accelerator name, Whois parser code
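Identifying missing accelerators amounts to comparing a candidate list against the list already on hand. The sketch below shows one way this could be done, assuming both lists are available as plain sequences of names; the original work was done by sorting an Excel file, so this is an illustration and not the method actually used. `normalize_name` and `find_missing` are hypothetical helper names.

```python
def normalize_name(name):
    """Lowercase and collapse whitespace so trivial spelling variants compare equal."""
    return " ".join(name.lower().split())


def find_missing(candidates, known):
    """Return candidate names not present in known, keeping first-seen order.

    Names are compared after normalization, and repeated candidates are
    reported only once.
    """
    known_keys = {normalize_name(n) for n in known}
    seen = set()
    missing = []
    for name in candidates:
        key = normalize_name(name)
        if key not in known_keys and key not in seen:
            seen.add(key)
            missing.append(name)
    return missing


# Made-up names for illustration.
example_missing = find_missing(
    ["Alpha Labs", "Beta Works ", "Gamma Hub", "gamma  hub"],
    ["alpha labs", "Beta Works"],
)
```

Exact matching after normalization will still miss variants such as a trailing "Inc." or an abbreviated name, so a manual review of the result would remain necessary.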

Additional Variables

  • Accelerator_Cohort_Companies
    • Original Location: Bulk(E:)\McNair\Projects\Accelerators\Fall 2017
    • Description: This text file contains a master list of all cohort companies of all accelerators.
    • Variables: Cohort Companies, Accelerator name
  • Current Matched Data
    • Original Location: Bulk(E:)\McNair\Projects\Accelerators\Fall 2017
    • Description: Sheet 1 contains the data from matching our SDC pull against the cohort companies list in Accelerator_Cohort_Companies. Sheet 2 removes the duplicates from that match. Sheet 3 lists the VCCompanies, the accelerator each went through, and the date of its first investment. Sheet 4 contains our cohort list matched with the Crunchbase organizations, but it contains too many duplicates to use.
    • Variables: VCCompanies, Accelerator, Earliest Round Date
  • founders_linkedin
    • Original Location: Bulk(E:)\McNair\Projects\Accelerators\Fall 2017
    • Description: This text file contains the founder data that Peter found for each accelerator by crawling LinkedIn.
    • Variables: Accelerator name, Founder name, LinkedIn URL
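The first three sheets of Current Matched Data (match, remove duplicates, earliest round date) can be approximated in a few lines. This is a rough sketch under stated assumptions: the SDC pull is taken to be a list of (company, round date) pairs, the cohort list a list of (company, accelerator) pairs, and companies are matched on a case- and whitespace-insensitive name. None of this is confirmed by the page, and `earliest_rounds` is a hypothetical name.

```python
from datetime import date


def normalize_name(name):
    """Lowercase and collapse whitespace before comparing company names."""
    return " ".join(name.lower().split())


def earliest_rounds(cohort, sdc_rounds):
    """Return sorted (company, accelerator, earliest_round_date) tuples.

    cohort: iterable of (company, accelerator) pairs.
    sdc_rounds: iterable of (company, round_date) pairs.
    If a company appears under several accelerators, the last one listed
    wins; that is a simplification made for this sketch.
    """
    accelerator_by_company = {normalize_name(c): a for c, a in cohort}
    earliest = {}
    for company, round_date in sdc_rounds:
        key = normalize_name(company)
        if key not in accelerator_by_company:
            continue  # SDC company that is not in any cohort
        # Keep one row per company: the round with the earliest date.
        if key not in earliest or round_date < earliest[key][2]:
            earliest[key] = (company, accelerator_by_company[key], round_date)
    return sorted(earliest.values())


# Made-up records for illustration.
example_cohort = [("Acme Inc", "Accel One"), ("Beta LLC", "Accel Two")]
example_sdc = [
    ("ACME INC", date(2015, 6, 1)),
    ("Acme Inc", date(2014, 3, 15)),
    ("Gamma Co", date(2013, 1, 1)),
]
example_matches = earliest_rounds(example_cohort, example_sdc)
```

Keeping a single row per normalized company name is what removes the duplicates here; cohort companies with no SDC round simply do not appear in the result.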