Revision as of 18:14, 7 April 2019

Academic Paper
Title: Determinants of Seed Accelerator Performance: The Horse, the Jockey, and the Racetrack
Author: Ed Egan, Yael Hochberg
Status: In development
© edegan.com, 2016

Existing Data

VC Code

The old VC code is in

E:\mcnair\Projects\MatchingEntrepsToVC\DataWorkMatchingEntrepsV2-2.sql

It uses vcdb2 and forks off of roundlinejoinerleanff, building the following sequence of tables:

  • roundlineaggfirmsseq -> roundlineaggseqwexit (using roundlineaggfunds)
  • RoundLineMasterSeqBase (from roundlineaggseqwexit and 10 LJ'd tables)
  • RoundLineMasterSeq (RoundLineMasterSeqBase with FirmnameRoundInduTotal, FirmnameRoundInduHist)
  • Build out by stage -- MatchMostNumerousSeed, MatchHighestRandomSeed, etc.
  • RoundLineByStageKeys -> MasterByStageBase -> MasterByStage -> MasterByStageKeys -> MasterByStageBlownout
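The table sequence above can be written down as a dependency map, which makes the build order explicit before re-running any of the SQL. This is only a sketch: the table names come from the list above, but the edge from RoundLineMasterSeq to RoundLineByStageKeys is an assumption, the 10 LJ'd tables are not named in these notes and so are omitted, and `build_order` is a hypothetical helper rather than anything in DataWorkMatchingEntrepsV2-2.sql.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a table; its value is the set of tables it is built from.
# Names are taken from the notes; edges marked "assumed" are guesses.
DEPENDS_ON = {
    "roundlineaggfirmsseq": {"roundlinejoinerleanff"},
    "roundlineaggseqwexit": {"roundlineaggfirmsseq", "roundlineaggfunds"},
    # The notes also mention "10 LJ'd tables", which are not named and are omitted here.
    "RoundLineMasterSeqBase": {"roundlineaggseqwexit"},
    "RoundLineMasterSeq": {
        "RoundLineMasterSeqBase",
        "FirmnameRoundInduTotal",
        "FirmnameRoundInduHist",
    },
    "RoundLineByStageKeys": {"RoundLineMasterSeq"},  # assumed edge
    "MasterByStageBase": {"RoundLineByStageKeys"},
    "MasterByStage": {"MasterByStageBase"},
    "MasterByStageKeys": {"MasterByStage"},
    "MasterByStageBlownout": {"MasterByStageKeys"},
}


def build_order(depends_on):
    """Return the tables in an order where every table follows its inputs."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for table in build_order(DEPENDS_ON):
        print(table)
```

Any order the sorter returns places a table after everything it is built from, so it can serve as a checklist when porting the script to another database.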

There is untested seq table code at the end of

E:\projects\vcdb3\OriginalSQL\MatchingEntrepsV3.sql

That code builds only roundlineaggfirmsseq.

Accelerator Demo Day

See the Accelerator Demo Day page for more information. We ran the code and posted several iterations to Turk, and completed at least one iteration by hand. From the Amazon Mechanical Turk for Analyzing Demo Day Classifier's Results page:

  • E:\mcnair\Projects\Accelerator Demo Day\Turk\batch_results_all_accs_excel.xlsx: appears to contain the results of a Turk run. 265 results, of which 160 are usable.
  • Accelerator_Demo_Day#Hand_Collecting_Data provides a link to a Google Sheet. This sheet was downloaded to E:\projects\accelerators\Demo Day Timing Info.xlsx; it contains 136 observations. Files of this format may have been processed by a script written by Grace (unconfirmed).

Accelerator Code

The last build was by Ed and Hira. Hira's notes are on the Seed Accelerator Data Assembly page.

Claims:

  • The database is likely vcdb2.
  • All data files are in Z:/accelerator
  • The SQL file that loads all data is LoadAccData.sql, located in E:\McNair\Projects\Accelerators\Summer 2018
  • Source data is E:\McNair\Projects\Accelerators\Summer 2018\The File To Rule Them All.xlsx
  • timing_final: This table is based on the most up-to-date timing information, compiled in the source file Z:/accelerator/Formatted Timing Info.txt (by Grace).
  • additional_timing_info: source file "merging_work.xlxs", located in E:\Projects\McNair\Seed DB
  • 8) additional_timing_info2: source file "formatted timing info2.txt", located in E:\Projects\McNair\Accelerators\Summer 2018. This was collected through MTurk.
  • 9) timing_combined: This table combines all the timing information we have and appends tables 4, 7 and 8 (the table numbers appear to be carried over from Hira's notes).
  • 10) cohortcompanies_wtiming: merges the data in tables cohortcompany and timing_combined.
  • See also Grace's code, E:/McNair/Projects/Accelerators/Summer 2018/format_timing.py. The last file it produced was TurkData2ndPush-FormattedTiming.txt
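The append-then-merge logic described for timing_combined and cohortcompanies_wtiming might look roughly like the sketch below. It is not the code in LoadAccData.sql: the key columns (conamestd, accelerator), the date column, the choice to keep the first row seen per key, and both function names are assumptions made for illustration.

```python
# Assumed join key; the real tables may use different column names.
KEY = ("conamestd", "accelerator")


def row_key(row):
    """Build the (company, accelerator) key for one row."""
    return tuple(row[column] for column in KEY)


def combine_timing(*tables):
    """Append timing tables, keeping the first row seen for each key.

    Tables passed earlier take priority over tables passed later.
    """
    combined = {}
    for table in tables:
        for row in table:
            combined.setdefault(row_key(row), row)
    return list(combined.values())


def merge_cohort_with_timing(cohort_rows, timing_rows):
    """Left-join cohort rows to timing rows; unmatched rows get date None."""
    timing_by_key = {row_key(row): row for row in timing_rows}
    merged = []
    for row in cohort_rows:
        timing = timing_by_key.get(row_key(row), {})
        merged.append({**row, "date": timing.get("date")})
    return merged
```

Passing the tables in order of trust (for example hand-collected data first, Turk data last) lets the better source win when the same company and accelerator pair appears more than once. A plain append without de-duplication would instead keep every row.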

The last code written by Ed was likely:

E:\mcnair\Projects\Accelerators\Summer 2018\FindTiming.sql


Best files with provenance:

SmallBatchTimingInfo.txt

Appears hand collected
170 lines; columns: conamestd, accelerator, date, month, year, cohort, quarter

TurkData2ndPush-FormattedTimingWHeader.txt

Processed by format_timing.py
Comes from Final Turk Push.xlsx
1515 lines, company name normalized

Formatted Timing Info 2

No header, but the columns appear to be: coname, accelerator, date, pagetype
1523 lines
Seems to have come from GraceData.txt and been processed by an earlier version of format_timing.py

Formatted Timing Info

Header: coname, acceleratorname, keyword, url, webpage, predicted, gooddata, page_details, full_date, month, year, cohort_name, notes, prog_duration_wks, actual_date, actual_month, actual_year, season
1168 lines, company name normalized
Seems to have come from Demo Day Timing Info - Good Data Only.txt

Demo Day Timing Info Companies

No header, but the company names appear to be normalized
1143 lines
Might have come from Demo Day Timing Info - Good Data Only.txt
Made obsolete by Formatted Timing Info?

Note that the most recent file is NewBatchForTimingInfo.txt, which contains (coname, accelerator) pairs. It is not clear whether it was ever run.
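
The line counts and headers listed under "Best files with provenance" can be re-checked with a small script like the one below. It assumes the files are tab-delimited UTF-8 text, which these notes suggest but do not confirm; `summarize_tsv` is a hypothetical helper and is not part of format_timing.py.

```python
import csv
from pathlib import Path


def summarize_tsv(path, encoding="utf-8"):
    """Count the lines in a tab-delimited file and return its first row.

    QUOTE_NONE treats quote characters as ordinary text, so each physical
    line is counted as exactly one row.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = list(reader)
    first_row = rows[0] if rows else []
    return {"file": Path(path).name, "lines": len(rows), "first_row": first_row}


if __name__ == "__main__":
    import sys

    # Usage: python summarize_tsv.py SmallBatchTimingInfo.txt [more files...]
    for name in sys.argv[1:]:
        print(summarize_tsv(name))
```

Comparing the first row against the expected column names is a quick way to tell whether a file has a header, as with Formatted Timing Info, or starts directly with data, as with Formatted Timing Info 2.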