{{Project|Has project output=Data|Has sponsor=McNair ProjectsCenter
|Has title=Merging Existing Data with Crunchbase
|Has owner=Connor Rothschild
The other columns can be added to the end of our sheet as supplemental data.
 
==SQL Scripts, Files, and Databases==
 
The contents of E:\McNair\Projects\Accelerators\Summer 2018\For Ed Merge July 17.xlsx were copied into CohortCosWcbuuid.txt (saved in the Accelerators folder, as well as in Z:/crunchbase2 and Z:/../vcdb2).
 
The script '''AddCBData.sql''' loads this data into '''crunchbase2'''. It then outputs the relevant Crunchbase data into '''CBCohortData.txt'''.
 
The script '''LoadAcceleratorDataV2.sql''' (see around line 305) loads both '''CohortCosWcbuuid.txt''' and '''CBCohortData.txt''' into the database '''vcdb2'''. It then produces a CohortCoExtended table, which is output to a file.
 
Note that '''CohortCoExtended.txt''' includes a variable GotVC, which takes the value 1 if the cohort company got VC and 0 otherwise:
 
  gotvc | count
 -------+-------
      0 | 11465
      1 |  1504
 (2 rows)
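
A flag like GotVC and the tabulation above can be sketched as follows. This is only a minimal illustration that uses Python's built-in sqlite3 module in place of the project's database. The table names cohort_cos and cb_vc_rounds, their columns, and the sample rows are all hypothetical and are not taken from '''LoadAcceleratorDataV2.sql'''.

```python
import sqlite3


def count_gotvc(conn):
    """Tabulate a 0/1 gotvc flag over a hypothetical cohort/VC schema."""
    query = """
        SELECT gotvc, COUNT(*) AS n
        FROM (
            -- gotvc is 1 when the company has at least one VC round, else 0
            SELECT c.uuid,
                   CASE WHEN EXISTS (
                       SELECT 1 FROM cb_vc_rounds r
                       WHERE r.company_uuid = c.uuid
                   ) THEN 1 ELSE 0 END AS gotvc
            FROM cohort_cos c
        )
        GROUP BY gotvc
        ORDER BY gotvc
    """
    return dict(conn.execute(query).fetchall())


def demo_gotvc_counts():
    # Hypothetical sample data: three cohort companies, one with VC rounds.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cohort_cos (uuid TEXT PRIMARY KEY, coname TEXT);
        CREATE TABLE cb_vc_rounds (company_uuid TEXT, announced_on TEXT);
        INSERT INTO cohort_cos VALUES
            ('a', 'Alpha'), ('b', 'Beta'), ('c', 'Gamma');
        INSERT INTO cb_vc_rounds VALUES
            ('a', '2016-01-01'), ('a', '2017-06-01');
    """)
    return count_gotvc(conn)
```

Using EXISTS rather than a plain join keeps one row per cohort company even when a company has several rounds, so the counts add up to the number of cohort companies.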
 
We now need to determine which cohort companies we have timing information for and which we don't, and then use demo days to fill in the information we are missing.
 
==Getting Timing info for Companies Who Got VC==
 
Line 136 of E:\McNair\Software\Database Scripts\Crunchbase2\CompanyMatchScript.sql contains the code to find the companies that received VC but did not have timing info. There are 809 such companies. This table was exported into '''needtiminginfo.txt'''.
 
A list of the distinct accelerators that we need timing data for was also created and given to [[Minh Le]]. There are 75 accelerators that still need timing data.
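
The two steps above (finding VC-backed companies without timing information, then listing the distinct accelerators they belong to) might look roughly like the sketch below. It again uses sqlite3 for illustration. The table cohort_co_extended, the columns coname, accelerator, gotvc and cohort_date, and the sample rows are hypothetical, and treating a missing cohort_date as "no timing info" is an assumption. The actual query at line 136 of CompanyMatchScript.sql may differ.

```python
import sqlite3

# Companies that received VC but have no timing information (assumed here
# to mean a NULL cohort_date; the real definition may differ).
NEED_TIMING_SQL = """
    SELECT coname, accelerator
    FROM cohort_co_extended
    WHERE gotvc = 1 AND cohort_date IS NULL
    ORDER BY coname
"""

# Distinct accelerators that have at least one such company.
ACCELERATORS_SQL = """
    SELECT DISTINCT accelerator
    FROM cohort_co_extended
    WHERE gotvc = 1 AND cohort_date IS NULL
    ORDER BY accelerator
"""


def demo_need_timing():
    # Hypothetical sample data for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cohort_co_extended (
            coname TEXT, accelerator TEXT, gotvc INTEGER, cohort_date TEXT
        );
        INSERT INTO cohort_co_extended VALUES
            ('Alpha',   'AccelOne',   1, NULL),
            ('Beta',    'AccelOne',   1, '2015-06-01'),
            ('Gamma',   'AccelTwo',   1, NULL),
            ('Delta',   'AccelOne',   1, NULL),
            ('Epsilon', 'AccelThree', 0, NULL);
    """)
    companies = conn.execute(NEED_TIMING_SQL).fetchall()
    accelerators = [row[0] for row in conn.execute(ACCELERATORS_SQL)]
    return companies, accelerators
```

Keeping the same WHERE clause in both queries ensures the accelerator list covers exactly the companies in the exported table.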
 
The training data is in progress: of the 2,600 pages, a little more than half (~1,500-1,600) are done.
