Changes

1,671 bytes added ,  20:11, 17 November 2020
}}
<onlyinclude>The [[VCDB20Q3]] project documents a build of my VCDB -- '''V'''enture '''C'''apital '''D'''ata'''B'''ase -- covering data through the end of 2020 Q3. Each VCDB includes investments, funds, startups, executives, exits, locations, and more, derived from [[VentureXpert]] data. This project updates [[vcdb4]], which covered (almost) to the end of Q3 2019, and replaces [[VCDB20H1]], which was a partial build. See also: [[SDC Normalizer]].</onlyinclude>
 
==Data design==
 
I followed the same data design as in [[VCDB20H1]]. Essentially, the specification pulls everything all the way to the present (the pull was done on 11/17/2020), even things that aren't needed, like incomplete M&As, withdrawn IPOs, and funds or investments that aren't venture capital. Restrictions are then placed on the data later. Crucially, the pulls no longer use the venture-related flag, so the data contains private equity and other deals, and it does contain secondaries and purchases. Note that the M&As are pulled separately for public and private acquirers, and in chunks by year to keep the request sizes manageable.
==Processing Steps==
USRoundOnOneLine1980.ssh
Update the paths and dates in the ssh files, then run them (see [[SDC Platinum]]).

===Database import===

Run the [[SDC Normalizer]] on each of the files. For most of them, that's straightforward. You can safely ignore the Access Violation error messages that occur at the end of some pulls. However, the following require attention:
*Fix the header row in USFirms1980.txt before normalizing (the Capital Under Management column name is too long).
*Remove double quotes from USFund1980-normal.txt, USFundExecs1980-normal.txt, and USPortCo1980-normal.txt.
*The private and public M&A files have to be combined after they've been normalized. Then replace \tnp\t and \tnm\t with \t\t.
*For RoundOnOneLine, remove the footer, run NormalizeFixedWidth.pl first and then RoundOnOneLine.pl, and then fix the header.
*The PortCo Long Description needs to be pre-processed from the command line (see [[VCDB20H1]] and [[Vcdb4#Long_Description]]).

Create the database as a researcher:
 createdb vcdb20q3

Move the files to //mother/bulk/vcdb20q3 and run the load script:
 E:\projects\vcdb20q3\Load.sql
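
The manual clean-up steps in the list above (removing double quotes, combining the normalized M&A files, and blanking the np/nm placeholders) could be scripted. The following is only a minimal Python sketch, not the procedure actually used for this build. The function names, the latin-1 encoding, and the assumption that each normalized file starts with a single header row are all hypothetical. It also differs slightly from a literal search-and-replace: it uses a lookaround pattern so that two adjacent placeholder fields (\tnp\tnm\t) are both blanked, which a single pass of replacing \tnp\t with \t\t would miss.

```python
import re

# A field that is exactly "np" or "nm" and sits between two tabs.
# Lookarounds leave the tabs in place, so adjacent placeholder fields
# are each matched. Fields at the very start or end of a line are not
# touched, mirroring the \tnp\t -> \t\t rule in the notes.
_PLACEHOLDER = re.compile(r"(?<=\t)(?:np|nm)(?=\t)")


def blank_placeholders(line):
    """Return the line with tab-delimited np/nm fields emptied."""
    return _PLACEHOLDER.sub("", line)


def strip_double_quotes(text):
    """Return the text with every double-quote character removed."""
    return text.replace('"', "")


def combine_normalized(paths, out_path, encoding="latin-1"):
    """Concatenate normalized files into out_path, blanking np/nm fields.

    Assumption (not from the source): every input file begins with one
    header row, so the header is kept from the first file only.
    The encoding default is likewise a guess.
    """
    with open(out_path, "w", encoding=encoding, newline="") as out:
        for file_index, path in enumerate(paths):
            with open(path, encoding=encoding, newline="") as src:
                for line_index, line in enumerate(src):
                    if file_index > 0 and line_index == 0:
                        continue  # skip the repeated header row
                    if not line.endswith(("\n", "\r")):
                        line += "\n"  # keep files from running together
                    out.write(blank_placeholders(line))
```

For example, combine_normalized would be called with the two normalized M&A files (private and public) as the first argument and the desired combined file name as the second; strip_double_quotes can be applied to the contents of the three -normal.txt files listed above before loading.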
