Determinants of Seed Accelerator Performance: The Horse, the Jockey, and the Racetrack

Academic Paper
Title	Determinants of Seed Accelerator Performance: The Horse, the Jockey, and the Racetrack
Author	Ed Egan, Yael Hochberg
Status	In development
	© edegan.com, 2016

Current Work

Note that TFTRTA-AcceleratorFinal.txt in E:\projects\accelerators was updated to include all creation dates and dead dates. This is not reflected below, except that the load SQL in the script was updated too.

Load the existing data

Dbase is accelerators

SQL code is in:

E:\projects\accelerators\LoadAcceleratorTables.sql

This script:

Loads files from The File to Rule Them All.xlsx

AcceleratorsFinal 165
cohortsfinal 12941
FoundersMain 187
FoundersExperience 823
FoundersEducation 353

Loads 5 timing info files

Timing1 'Formatted Timing Info.txt' 1167
Timing2 'merging_work.txt' 257
Timing3 'additional_timing_info2-fixed.txt' 1521
Timing4 'SmallBatchTimingInfo.txt' 169
Timing5 'TurkData2ndPush-FormattedTimingWHeaderClean.txt' 1538

See Seed Accelerator Data Assembly for more information on these files.

Determine 'conamecommon' and 'conamevariant' for all conames in the timing files and cohortsfinal. Also create an accelerator name lookup file between the timing files and TFTRTA ('AcceleratorFinalTimingUnionAcceleratorName.txt') and load it. Use both to build:

TimingUnionNamesProper 3592 (Coname, Accelerator pair)
CohortsFinalCommon 12941

Note that the following 'accelerators' were listed in the timing info but not in TFTRTA:

KarmaTech
Make In LA
Rockstart AI
Talent Tech Labs
Ventures Accelerator
Wake Forest Innovations
White House Demo Day
XRC Labs

The timing files were processed and their data was assembled. The stack starts with Attended (12896 obs, 7044 with year and 6493 with year and quarter) and sequentially adds timing information until the last table, Attended5 (15460 obs, 10446 with year and 9871 with year and quarter), is produced. With the exception of timing2, each timing file added new cohort cos. Timing1 and timing5 had evidence URLs (total of just 248 distinct).
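The sequential build described above can be sketched as follows. This is a minimal illustration in Python, not the SQL in LoadAcceleratorTables.sql. The function name stack_timing, the record layout, and the rule that a later source only fills in a missing year or quarter are all assumptions made for the example, and the rows are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Key: (company name, accelerator). Value: (year, quarter); either may be None.
Key = Tuple[str, str]
Timing = Tuple[Optional[int], Optional[int]]


def stack_timing(base: Dict[Key, Timing], source: Dict[Key, Timing]) -> Dict[Key, Timing]:
    """Return a new table: base plus any timing or new pairs found in source.

    Assumption: earlier sources win; a later source only fills gaps.
    """
    result = dict(base)
    for key, (year, quarter) in source.items():
        old_year, old_quarter = result.get(key, (None, None))
        if old_year is None:
            # No year yet (or a brand-new pair): take what the source offers.
            result[key] = (year, quarter)
        elif old_quarter is None and year == old_year:
            # Same year, missing quarter: fill in the quarter only.
            result[key] = (old_year, quarter)
    return result


# Hypothetical rows, for illustration only.
attended = {("acme", "Accel A"): (None, None), ("bolt", "Accel A"): (2014, None)}
timing1 = {("acme", "Accel A"): (2015, 2), ("bolt", "Accel A"): (2014, 3)}
timing2 = {("core", "Accel B"): (2016, None)}

table = attended
for src in (timing1, timing2):
    table = stack_timing(table, src)

# The three counts reported for each table in the stack.
with_year = sum(1 for y, _ in table.values() if y is not None)
with_quarter = sum(1 for y, q in table.values() if y is not None and q is not None)
print(len(table), with_year, with_quarter)
```

Each pass returns a new table, which mirrors the Attended, ..., Attended5 sequence: the observation count grows only when a source contributes new (coname, accelerator) pairs.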

New Pull

Made tables:

TheMissing, 129 accs missing total of 4979 cohort cos
ThePresent, 153 accs with total of 10446 cohort cos
ThePresentByYear, 601 acc years
TheReview, 475 acc years -> "TheReview.txt"
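The notes do not record how these tables were defined. One plausible reading, sketched below, is that cohort cos were split by whether they have a year and then counted per accelerator and per accelerator year. Since 129 + 153 exceeds the 165 accelerators loaded into AcceleratorsFinal, an accelerator can presumably appear in both TheMissing and ThePresent. The function name, the row layout, and the sample rows are hypothetical, and the selection rule for TheReview is not covered.

```python
from collections import defaultdict


def split_by_timing(rows):
    """rows: iterable of (accelerator, company, year or None).

    Returns three dicts of counts:
      missing  - cohort cos without a year, per accelerator
      present  - cohort cos with a year, per accelerator
      by_year  - cohort cos per (accelerator, year)
    """
    missing = defaultdict(int)
    present = defaultdict(int)
    by_year = defaultdict(int)
    for accelerator, _company, year in rows:
        if year is None:
            missing[accelerator] += 1
        else:
            present[accelerator] += 1
            by_year[(accelerator, year)] += 1
    return dict(missing), dict(present), dict(by_year)


# Hypothetical rows, for illustration only.
rows = [
    ("Accel A", "acme", 2014),
    ("Accel A", "bolt", None),
    ("Accel A", "core", 2015),
    ("Accel B", "dyno", None),
]
missing, present, by_year = split_by_timing(rows)
print(len(missing), sum(missing.values()))   # accs missing, cohort cos missing
print(len(present), sum(present.values()))   # accs present, cohort cos present
print(len(by_year))                          # acc years
```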

TheReview.txt was then processed into SearchTerms.txt in E:\projects\accelerators\Google:

Accelerator	SearchTerm	Year

After some experimentation, we decided to add the following keywords to every search: demo day graduation pitch competition cohort
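As an illustration of how a query might be assembled from a SearchTerms.txt row plus the fixed keywords, here is a small sketch. It is not taken from DemoDayCrawler.py: the function name build_queries, the decision to put the year into the query string, and the sample row are assumptions.

```python
import csv
import io

# Keywords appended to every search (from the notes above).
KEYWORDS = "demo day graduation pitch competition cohort"


def build_queries(search_terms_file):
    """Yield (accelerator, year, query) for each row of a tab-delimited file
    with the header: Accelerator, SearchTerm, Year.

    Assumption: the query is the search term, then the year, then the keywords.
    """
    reader = csv.DictReader(search_terms_file, delimiter="\t")
    for row in reader:
        query = f'{row["SearchTerm"]} {row["Year"]} {KEYWORDS}'
        yield row["Accelerator"], row["Year"], query


# Hypothetical file contents, for illustration only.
sample = io.StringIO(
    "Accelerator\tSearchTerm\tYear\n"
    "Example Accelerator\tExample Accelerator\t2015\n"
)
queries = list(build_queries(sample))
for accelerator, year, query in queries:
    print(accelerator, year, query, sep="\t")
```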

We fixed up and ran E:\projects\accelerators\Google\DemoDayCrawler.py. This script was based on E:\mcnair\Software\Accelerators\DemoDayCrawler.py, rather than the more recent E:\mcnair\Projects\Accelerator Demo Day\Test Run\STEP1_crawl.py.

The output is:

E:\projects\accelerators\Google\Results.txt 2515
E:\projects\accelerators\Google\Results folder containing html

Previously run Google search results are in:

5 results per accelerator -- E:\mcnair\Software\Accelerators\demoday_crawl_full.txt 2777
10 results per accelerator -- E:\mcnair\Projects\Accelerator Demo Day\Test Run\demoday_crawl_full_from_testrun.txt 4351
10 results per select accelerator year -- E:\mcnair\Projects\Accelerator Demo Day\Test Run\demoday_crawl_full.txt 1230

These were all copied to Z:\accelerators and cleaned up, and loaded along with the new Results.txt into accelerators. The SQL is in E:\projects\accelerators\LoadAcceleratorTables.sql

It looks like 2340/2514 of our pages are new...
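The notes do not say how "new" was determined; the actual comparison was presumably done in SQL after loading. A minimal sketch of one way to do it, assuming pages are compared by URL after light normalization, is below. The helper names and the example.com URLs are hypothetical.

```python
def normalize(url):
    """Assumed normalization: trim whitespace and any trailing slash."""
    return url.strip().rstrip("/")


def count_new(new_urls, previous_runs):
    """Return (URLs not seen in any previous run, distinct URLs in the new run)."""
    seen = set()
    for run in previous_runs:
        seen.update(normalize(u) for u in run)
    current = {normalize(u) for u in new_urls}
    return len(current - seen), len(current)


# Hypothetical URLs, for illustration only.
new_run = ["https://example.com/a", "https://example.com/b/", "https://example.com/c"]
previous = [["https://example.com/b"], ["https://example.com/d"]]
n_new, n_total = count_new(new_run, previous)
print(f"{n_new}/{n_total} pages are new")
```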

Other info

Found the following list of accelerators by accident: https://www.s-b-z.com/FORMING%20THE%20BUSINESS/db/accelerators.aspx

To do

Still to do:

Re-train the classifier
Run the classifier on the Google results
Post the results to Mech Turk
Process the Mech Turk results
Match cohort cos to portcos (regenerate GotVC and add timing)
Match cohort cos to crunchbase again

Previous Work

The main Accelerator Demo Day page was built by Minh Le and documented in Minh_Le_(Work_Log).

VC Code

The old VC code is in

E:\mcnair\Projects\MatchingEntrepsToVC\DataWorkMatchingEntrepsV2-2.sql

It uses vcdb2 and forks off of roundlinejoinerleanff, building the following sequence of tables:

roundlineaggfirmsseq -> roundlineaggseqwexit (using roundlineaggfunds)
RoundLineMasterSeqBase (from roundlineaggseqwexit and 10 LJ'd tables)
RoundLineMasterSeq (RoundLineMasterSeqBase with FirmnameRoundInduTotal, FirmnameRoundInduHist)
Build out by stage -- MatchMostNumerousSeed, MatchHighestRandomSeed, etc.
RoundLineByStageKeys -> MasterByStageBase -> MasterByStage -> MasterByStageKeys -> MasterByStageBlownout

There is untested seq table code at the end of

E:\projects\vcdb3\OriginalSQL\MatchingEntrepsV3.sql

It builds just roundlineaggfirmsseq.

Accelerator Demo Day

See the Accelerator Demo Day page for more information. We ran the code, posted several iterations to Turk, and completed at least one iteration by hand. From the Amazon Mechanical Turk for Analyzing Demo Day Classifier's Results page:

E:\mcnair\Projects\Accelerator Demo Day\Turk\batch_results_all_accs_excel.xlsx -- looks like it contains the results of a Turk run. 265 results, 160 usable.
Accelerator_Demo_Day#Hand_Collecting_Data provides a link to a Google Sheet. This sheet was downloaded to E:\projects\accelerators\Demo Day Timing Info.xlsx. It contains 136 observations. Files of this format were processed by a script (written by Grace?).

Accelerator Code

The last build was by Ed and Hira. Hira's notes are on the Seed Accelerator Data Assembly page.

Claims:

dbase is likely vcdb2
All data files are in Z:/accelerator
The SQL file that loads all data is: LoadAccData.sql. It is located in E:\McNair\Projects\Accelerators\Summer 2018
Source data is E:\McNair\Projects\Accelerators\Summer 2018\The File To Rule Them All.xlsx
timing_final - This table is based on the most up-to-date timing information, compiled in source file: Z:/accelerator/Formatted Timing Info.txt (by Grace)
additional_timing_info - source file: "merging_work.xlsx" located in: E:\Projects\McNair\Seed DB
8) additional_timing_info2 - source file: "formatted timing info2.txt" located in E:\Projects\McNair\Accelerators\Summer 2018. This was collected through MTurk.
9) timing_combined - This table combines all timing information we have and appends tables 4, 7 and 8.
10) cohortcompanies_wtiming - merges data in tables cohortcompany and timing_combined
See also, Grace's code E:/McNair/Projects/Accelerators/Summer 2018/format_timing.py. Last file it produced was TurkData2ndPush-FormattedTiming.txt

Hira's code

Load:

timing_final from formatted_timing_final.txt -- 1167
additional_timing_info from merging_work.txt -- 257
additional_timing_info2 from additional_timing_info2.txt -- 1523
timing_combined from all three above (additional_timing_info,timing_final,additional_timing_info2) -- 2817
cohortsfinal from cohorts_final.txt from File to Rule Them All.xlsx
founders from founders_main.txt from File to Rule Them All.xlsx
founders_experience from founders_experience.txt from File to Rule Them All.xlsx
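A note on the timing_combined count in the list above: the three source counts sum to 1167 + 257 + 1523 = 2947, which is 130 more than the 2817 rows reported. One possible explanation, which is an assumption and not confirmed by the notes, is that the tables were combined with de-duplication (as SQL UNION does, in contrast to UNION ALL). The sketch below shows that behaviour on hypothetical rows; the function name union_distinct is made up for the example.

```python
def union_distinct(*tables):
    """Combine row lists, dropping exact duplicates and keeping first-seen order."""
    seen = set()
    combined = []
    for table in tables:
        for row in table:
            if row not in seen:
                seen.add(row)
                combined.append(row)
    return combined


# Row counts reported in the notes.
counts = {
    "timing_final": 1167,
    "additional_timing_info": 257,
    "additional_timing_info2": 1523,
}
total = sum(counts.values())
gap = total - 2817  # rows unaccounted for if no de-duplication took place
print(total, gap)

# Hypothetical (coname, accelerator, year) rows, for illustration only.
a = [("acme", "Accel A", 2015), ("bolt", "Accel A", 2014)]
b = [("bolt", "Accel A", 2014), ("core", "Accel B", 2016)]
combined = union_distinct(a, b)
print(len(a) + len(b), len(combined))
```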

Last code written by Ed was likely:

E:\mcnair\Projects\Accelerators\Summer 2018\FindTiming.sql
/*
timing_final
	Demo manual fill out effort
	--1167
additional_timing_info
	SeedDB crawl
	--257
additional_timing_info2
	Main MTURK Crawl
	--1523
*/

Timing related tables:

Draws from timing_combined
Produces FindThese and FindTheseCos
\COPY TurkRun2 FROM 'TurkData2ndPush-FormattedTimingWHeader.txt' --1538
\COPY ManualAdd2 FROM 'SmallBatchTimingInfo.txt' --169

Timing Info Files

TurkData2ndPush-FormattedTimingWHeaderClean.txt <- TurkData2ndPush-FormattedTimingWHeader.txt
	company	pagedetails	accelerator	date	cohortname
	1539, cohortname is patchy but otherwise great

SmallBatchTimingInfo.txt
	conamestd	accelerator	date	month	year	cohort	quarter
	171, everything is patchy

merging_work.txt
	conamestd	accelerator	matched coname	url	cohort name	date	month	Year	Quarter
	259, very clean file

additional_timing_info2-fixed.txt
	companyname	accelerator	cohortname	date	month	year	season	type
	1524 (seems messy)

	Same as: Formatted Timing Info2 wHeaderCleaned.txt <- Formatted Timing Info2 wHeader.txt
		Coname	Accelerator	ResultDate	ResultType	CohortName
		1524, fairly clean

Formatted Timing Info.txt
	coname	acceleratorname	keyword	url	webpage	predicted	gooddata	page_details	full_date	month	year	cohort_name	notes	prog_duration_wks	actual_date	actual_month	actual_year	season
	1168, fairly clean

	Same as: formatted_timing_final.txt
		coname	acceleratorname	keyword	url	webpage	predicted	gooddata	page_details	full_date	month	year	cohort_name	notes	prog_duration_wks	actual_date	actual_month	actual_year	season
		1169
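The headers and line counts listed above can be re-checked with a short script. The following is a sketch under stated assumptions: the files are tab-delimited with a header row, and blank lines should not be counted. The function name summarize and the sample row are hypothetical; the expected header is the one listed for SmallBatchTimingInfo.txt.

```python
import csv
import io


def summarize(handle):
    """Return (header columns, number of non-empty data rows) for a tab-delimited file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = sum(1 for row in reader if any(cell.strip() for cell in row))
    return header, rows


# Header listed above for SmallBatchTimingInfo.txt.
EXPECTED = ["conamestd", "accelerator", "date", "month", "year", "cohort", "quarter"]

# Hypothetical file contents, for illustration only (one data row, one blank line).
sample = io.StringIO(
    "conamestd\taccelerator\tdate\tmonth\tyear\tcohort\tquarter\n"
    "acme\tExample Accelerator\t\t6\t2015\t\t2\n"
    "\n"
)
header, n_rows = summarize(sample)
print(header == EXPECTED, n_rows)
```

For a real file, pass a handle opened with open(path, newline="") and whatever encoding the file uses, since the encoding is not recorded in these notes.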

Files in Summer 2018 with provenance

SmallBatchTimingInfo.txt

Appears hand collected
170 lines, conamestd	accelerator	date	month	year	cohort	quarter

TurkData2ndPush-FormattedTimingWHeader.txt

Processed by format_timing.py
Comes from Final Turk Push.xlsx
1515 lines, company name normalized

Formatted Timing Info 2

No header but: coname accelerator date pagetype
1523 lines
Seems to have come from GraceData.txt and been processed by an earlier version of format_timing.py

Formatted Timing Info

Header: coname	acceleratorname	keyword	url	webpage	predicted	gooddata	page_details	full_date	month	year	cohort_name	notes	prog_duration_wks	actual_date	actual_month	actual_year	season
1168 lines, company name normalized
Seems to have come from Demo Day Timing Info - Good Data Only.txt

Demo Day Timing Info Companies

No header, but appears coname normalized
1143 lines
Might have come from Demo Day Timing Info - Good Data Only.txt
Made obsolete by Formatted Timing Info?

Note that the most recent file is NewBatchForTimingInfo.txt, which contains coname, accelerator pairs. It's not clear if it was ever run.
