This page describes the specification for retrieving US VC data from SDC. <onlyinclude>[[Retrieving US VC Data From SDC]] requires SDC Platinum, Perl scripts, and SQL scripts, as well as some manual processing.
==Scripts and other info==
SDC and Perl Scripts are in:
SQL Scripts and finished data are in:
Z:\VentureCapitalData\SDCVCData
 
==Notes==
*The Firms data includes branch offices, so firm attributes must be extracted.
*Portfolio company descriptions (just the portco name, state, date of first investment, and the long description) have to be custom processed.
</onlyinclude>
Some files required minor post-processing to load into PostgreSQL. Issues included:
*Firm-level data didn't normalize correctly; the headers had to be adjusted
*Stray quotation marks in address line(s)
*Area code had a 1- in it
*Some line counts were off by one or two
*The "Firm Capital under Mgmt" column header for VCFirms contains {0mil}, which breaks the normalizer. Delete this part of the column title before running the normalizer, and make sure to put in the proper number of spaces.
*VCFirms line 7139: the text 'L"opera' has a stray quotation mark that prevents the copy into the psql table. Remove the stray quotation mark manually.
*VCFirms line 12461: the text '1-8' has a hyphen that causes an import error into the psql table. Remove the hyphen manually.
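Stray quotation marks like the one noted above can be located before attempting the load, rather than found one failed copy at a time. The following is a minimal Python sketch and is not one of the project's scripts. It assumes that a line with an odd number of double-quote characters is suspect; the function name <code>find_unbalanced_quote_lines</code> and the sample rows are hypothetical.

```python
def find_unbalanced_quote_lines(lines, quote='"'):
    """Return 1-based line numbers whose count of `quote` characters is odd.

    An odd count is only a heuristic for a stray quotation mark; each flagged
    line still needs to be inspected by hand.
    """
    return [
        number
        for number, line in enumerate(lines, start=1)
        if line.count(quote) % 2 == 1
    ]


# Hypothetical sample rows for illustration, not real SDC output.
sample = [
    'Firm A\t"123 Main St"\tBoston',
    'L"opera Ventures\t1 Rue Example\tParis',
    'Firm C\tNo quotes here\tAustin',
]

print(find_unbalanced_quote_lines(sample))  # prints [2]
```

To check a real export, pass the open file object in place of <code>sample</code>; the reported numbers can then be compared against the lines that fail to copy.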
==Build Specs==
