Changes

|Has paper status=In development
}}
==Existing Current Work==

Note that TFTRTA-AcceleratorFinal.txt in E:\projects\accelerators was updated to include all creation dates and dead dates. This is not reflected below, except that the script's load SQL was updated too.

===Load the existing data===

Dbase is '''accelerators'''

SQL code is in: E:\projects\accelerators\LoadAcceleratorTables.sql

This script:

Loads files from The File to Rule Them All.xlsx
*AcceleratorsFinal 165
*cohortsfinal 12941
*FoundersMain 187
*FoundersExperience 823
*FoundersEducation 353

Loads 5 timing info files
*Timing1 'Formatted Timing Info.txt' 1167
*Timing2 'merging_work.txt' 257
*Timing3 'additional_timing_info2-fixed.txt' 1521
*Timing4 'SmallBatchTimingInfo.txt' 169
*Timing5 'TurkData2ndPush-FormattedTimingWHeaderClean.txt' 1538

See [[Seed Accelerator DataAssembly]] for more information on these files.

Determine 'conamecommon' and 'conamevariant' for all conames in timing files and cohortsfinal. Also create an accelerator name lookup file between timing and TFTRTA ('AcceleratorFinalTimingUnionAcceleratorName.txt') and load it. Use both to build:
*TimingUnionNamesProper 3592 (Coname, Accelerator pair)
*CohortsFinalCommon 12941

Note that the following 'accelerators' were listed in the timing info but not in TFTRTA:
*KarmaTech
*Make In LA
*Rockstart AI
*Talent Tech Labs
*Ventures Accelerator
*Wake Forest Innovations
*White House Demo Day
*XRC Labs

The timing files were processed and their data was assembled. The stack starts with Attended (12896 obs, 7044 with year and 6493 with year and quarter) and sequentially adds timing information until the last table, Attended5 (15460 obs, 10446 with year and 9871 with year and quarter), is produced. With the exception of timing2, each timing file added new cohort cos. Timing1 and timing5 had evidence URLs (total of just 248 distinct).

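The sequential build from Attended to Attended5 is done in SQL in LoadAcceleratorTables.sql, which is not reproduced here. As a rough illustration only, the idea of layering timing files onto a base table can be sketched in Python. The matching key (conamecommon, accelerator), the rule "keep the first non-missing year and quarter", and all names and sample rows below are assumptions for the sketch, not the script's actual logic.

```python
def add_timing(attended, timing_rows):
    """Return a new table with one timing file layered on top of `attended`.

    Assumed rule (hypothetical): existing year/quarter values are kept,
    missing ones are filled from the timing file, and unseen
    (conamecommon, accelerator) pairs are appended as new cohort cos.
    """
    result = {key: dict(val) for key, val in attended.items()}
    for row in timing_rows:
        key = (row["conamecommon"], row["accelerator"])
        entry = result.setdefault(key, {"year": None, "quarter": None})
        for field in ("year", "quarter"):
            if entry[field] is None and row.get(field) is not None:
                entry[field] = row[field]
    return result


def coverage(table):
    """Return (obs, obs with year, obs with year and quarter)."""
    total = len(table)
    with_year = sum(1 for v in table.values() if v["year"] is not None)
    with_both = sum(
        1 for v in table.values()
        if v["year"] is not None and v["quarter"] is not None
    )
    return total, with_year, with_both


# Toy data, invented for the example.
attended = {
    ("acme", "Y Combinator"): {"year": None, "quarter": None},
    ("beta", "Techstars"): {"year": 2014, "quarter": None},
}
timing1 = [
    {"conamecommon": "acme", "accelerator": "Y Combinator", "year": 2015, "quarter": None},
    {"conamecommon": "gamma", "accelerator": "Techstars", "year": 2016, "quarter": 3},
]

attended1 = add_timing(attended, timing1)
print(coverage(attended), coverage(attended1))
```

Calling add_timing once per timing file, each time on the previous result, mirrors the Attended, Attended1, ..., Attended5 progression, and coverage gives the same three counts reported for each table above.
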
===New Pull===

Made tables:
*TheMissing, 129 accs missing total of 4979 cohort cos
*ThePresent, 153 accs with total of 10446 cohort cos
*ThePresentByYear, 601 acc years
*TheReview, 475 acc years -> "TheReview.txt"

TheReview.txt was then processed into SearchTerms.txt in E:\projects\accelerators\Google:
 Accelerator SearchTerm Year

After some experimentation, we decided to add the following keywords to every search:
 demo day graduation pitch competition cohort

We fixed up and ran E:\projects\accelerators\Google\DemoDayCrawler.py

This script was based on E:\mcnair\Software\Accelerators\DemoDayCrawler.py, rather than the more recent E:\mcnair\Projects\Accelerator Demo Day\Test Run\STEP1_crawl.py

The output is:
*E:\projects\accelerators\Google\Results.txt 2515
*E:\projects\accelerators\Google\Results folder containing html

Previously run Google search results are in:
*5 results per accelerator -- E:\mcnair\Software\Accelerators\demoday_crawl_full.txt 2777
*10 results per accelerator -- E:\mcnair\Projects\Accelerator Demo Day\Test Run\demoday_crawl_full_from_testrun.txt 4351
*10 results per select accelerator year -- E:\mcnair\Projects\Accelerator Demo Day\Test Run\demoday_crawl_full.txt 1230

These were all copied to Z:\accelerators and cleaned up, and loaded along with the new Results.txt into '''accelerators'''. The SQL is in E:\projects\accelerators\LoadAcceleratorTables.sql

It looks like 2340/2514 of our pages are new...

====Other info====

Found the following list of accelerators by accident: https://www.s-b-z.com/FORMING%20THE%20BUSINESS/db/accelerators.aspx

===To do===

Still to do:
#Re-train the classifier
#Run the classifier on the Google results
#Post the results to Mech Turk
#Process the Mech Turk results
#Match cohort cos to portcos (regenerate GotVC and add timing)
#Match cohort cos to crunchbase again

==Previous Work==

The main [[Accelerator Demo Day]] page was built by [[Minh Le]] and documented in [[Minh_Le_(Work_Log)]].

See also:
*[[Accelerator Seed List (Data)]]
*[[Accelerator Data]]
===VC Code===
*\COPY TurkRun2 FROM 'TurkData2ndPush-FormattedTimingWHeader.txt' --1538
*\COPY ManualAdd2 FROM 'SmallBatchTimingInfo.txt' --169
 
====Timing Info Files====
 
TurkData2ndPush-FormattedTimingWHeaderClean.txt <- TurkData2ndPush-FormattedTimingWHeader.txt
company pagedetails accelerator date cohortname
1539, cohortname is patchy but otherwise great
 
SmallBatchTimingInfo.txt
conamestd accelerator date month year cohort quarter
171, everything is patchy
 
merging_work.txt
conamestd accelerator matched coname url cohort name date month Year Quarter
259, very clean file
 
additional_timing_info2-fixed.txt
companyname accelerator cohortname date month year season type
1524 (seems messy)
Same as: Formatted Timing Info2 wHeaderCleaned.txt <- Formatted Timing Info2 wHeader.txt
Coname Accelerator ResultDate ResultType CohortName
1524, fairly clean
 
Formatted Timing Info.txt
coname acceleratorname keyword url webpage predicted gooddata page_details full_date month year cohort_name notes prog_duration_wks actual_date actual_month actual_year season
1168, fairly clean
Same as: formatted_timing_final.txt
coname acceleratorname keyword url webpage predicted gooddata page_details full_date month year cohort_name notes prog_duration_wks actual_date actual_month actual_year season
1169
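
The notes above describe files as "patchy" or "fairly clean" by eye. A minimal sketch of how that could be quantified is to count, per column, how many rows have a non-empty value. It assumes the timing files are tab-delimited with a header row, which the file names suggest but which is not confirmed here. The function name and the sample rows are invented for the example.

```python
import csv
import io


def profile_tsv(handle):
    """Return (row count, {column: number of non-empty values}).

    Assumes a tab-delimited file whose first line is a header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    names = reader.fieldnames or []
    filled = {name: 0 for name in names}
    count = 0
    for row in reader:
        count += 1
        for name in names:
            # Short rows yield None for missing columns; treat as empty.
            if (row[name] or "").strip():
                filled[name] += 1
    return count, filled


# Invented sample using the TurkData2ndPush column names listed above.
sample = (
    "company\tpagedetails\taccelerator\tdate\tcohortname\n"
    "Acme\tdemo day page\tTechstars\t2015-06-01\tSummer 2015\n"
    "Beta\tblog post\tTechstars\t2016-01-15\t\n"
)
rows, filled = profile_tsv(io.StringIO(sample))
print(rows, filled)
```

For a real file, an open file handle can be passed in place of the io.StringIO object. A column whose filled count is well below the row count would be a candidate for the "patchy" label.
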
====Files in Summer 2018 with provenance====
