Difference between revisions of "VCDB24"

From edegan.com
 
Line 3: Line 3:
  == Processing Steps ==
- # Copy over the rpt, ssh, and pl files, and bulk edit the ssh files, now in E:\projects\vcdb24\SDC.
+ Get the source data:
- ## Change 12/31/2020 (and one 07/20/2020) to 12/31/2022 and vcdb20 to vcdb23
+ # Copy over the rpt, ssh, and pl files to E:\projects\vcdb24\SDC, and bulk edit the ssh files.
- # Run the ssh files against SDC Platinum. Note that SDC Platinum's service will be withdrawn on 31 December 2023.
+ ## Make final date 12/31/2023 and change vcdb23 to vcdb24
+ # Run the ssh files against SDC Platinum one last time on 31 December 2023.
  # Run the [[SDC Normalizer]] script (one of the pl files) on each output
  ## Fix the header row in USFirms1980.txt before normalizing (the Capital Under Management column name is too long)
Line 11: Line 12:
  ## The private and public M&A file sets have to be separately combined into 2 files after they've been normalized. Then replace \tnp\t and \tnm\t with \t\t in each.
  ## For RoundOnOneLine, remove the footer, run NormalizeFixedWidth.pl first, then RoundOnOneLine.pl, and then fix the header.
- ## PortCoLongDescription must be pre-processed from the command line and then post-processed in excel (see VCDB20H1 and Vcdb4#Long_Description). However, I didn't load it for this run.
+ ## PortCoLongDescription must be pre-processed from the command line and then post-processed in excel (see [[VCDB20H1]] and [[Vcdb4#Long_Description]]).
- # Create a new database on mother (createdb vcdb23) and setup a directory for the input files: E:\projects\vcdb23
+
- # Copy over and edit Load.sql. Run it section-by-section.
+ Create the postgres database:
+ # Create a new database on mother (createdb vcdb24) and set up a directory for the input files: bulk\vcdb24
+ # Copy over (to sql folder) and edit Load.sql. Run it section-by-section.

Latest revision as of 00:19, 30 December 2023

VCDB24 is the 2024 and final iteration of my VentureXpert-based Venture Capital DataBase. Thomson-Reuters discontinued access to VentureXpert through SDC Platinum on December 31st, 2023 (see also: SDC Normalizer), and this iteration contains data up to that date. Each VCDB includes investments, funds, startups, executives, exits, locations, and more. The previous build was VCDB23, but the best previous instructions are from VCDB20.

Processing Steps

Get the source data:

  1. Copy over the rpt, ssh, and pl files to E:\projects\vcdb24\SDC, and bulk edit the ssh files.
    1. Make the final date 12/31/2023 and change vcdb23 to vcdb24.
  2. Run the ssh files against SDC Platinum one last time on 31 December 2023.
  3. Run the SDC Normalizer script (one of the pl files) on each output.
    1. Fix the header row in USFirms1980.txt before normalizing (the Capital Under Management column name is too long).
    2. Remove double quotes from USFund1980-normal.txt, USFundExecs1980-normal.txt, USPortCo1980-normal.txt, and USFirmBranchOffices1980.txt.
    3. After normalizing, combine the private M&A file set and the public M&A file set separately, giving two files. Then replace \tnp\t and \tnm\t with \t\t in each.
    4. For RoundOnOneLine, remove the footer, run NormalizeFixedWidth.pl first, then RoundOnOneLine.pl, and then fix the header.
    5. PortCoLongDescription must be pre-processed from the command line and then post-processed in Excel (see VCDB20H1 and Vcdb4#Long_Description).
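
The quote removal and the M&A combine-and-replace steps above could be scripted rather than done by hand. What follows is only a minimal Python sketch, not the procedure actually used for this build. The function names, the latin-1 encoding, and the assumption that every normalized file starts with the same header row are all mine. The regular expression uses a lookahead so that adjacent placeholders (\tnp\tnm\t) are blanked in one pass, which a plain search-and-replace of \tnp\t would miss.

```python
import re
from pathlib import Path
from typing import Iterable

# Matches a tab followed by np or nm, but only when another tab follows.
# The lookahead leaves that trailing tab in place, so back-to-back
# placeholders (\tnp\tnm\t) are all blanked in a single pass.
PLACEHOLDER = re.compile(r"\t(?:np|nm)(?=\t)")


def blank_placeholders(line: str) -> str:
    """Turn \\tnp\\t and \\tnm\\t into \\t\\t (an empty field)."""
    return PLACEHOLDER.sub("\t", line)


def strip_double_quotes(line: str) -> str:
    """Remove every double-quote character from a line."""
    return line.replace('"', "")


def combine_normalized(paths: Iterable[Path], out_path: Path,
                       encoding: str = "latin-1") -> int:
    """Concatenate normalized files into out_path; return the data-row count.

    Assumption (not from the source): each input begins with the same
    header row, so only the first header seen is written.
    """
    rows = 0
    header_written = False
    with open(out_path, "w", encoding=encoding, newline="\n") as out:
        for path in paths:
            with open(path, encoding=encoding) as src:
                for line_no, line in enumerate(src):
                    line = line.rstrip("\n")
                    if line_no == 0:
                        if not header_written:
                            out.write(line + "\n")
                            header_written = True
                        continue
                    out.write(blank_placeholders(line) + "\n")
                    rows += 1
    return rows
```

To use it, call combine_normalized once with the list of normalized private M&A files and once with the public ones, giving each call its own output path. Fields at the very end of a line are left alone, because the step as written only targets values with a tab on both sides.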

Create the PostgreSQL database:

  1. Create a new database on mother (createdb vcdb24) and set up a directory for the input files: bulk\vcdb24
  2. Copy Load.sql over to the sql folder and edit it. Run it section by section.
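
The database and directory setup could also be wrapped in a small helper. This is a hedged sketch only: it assumes PostgreSQL's createdb client is on the PATH and that its connection defaults already point at the server (mother). The function names and the bulk_root parameter are hypothetical, and the database is only created when run=True is passed.

```python
import subprocess
from pathlib import Path
from typing import List


def createdb_command(db_name: str) -> List[str]:
    """Argument list for PostgreSQL's createdb client, e.g. createdb vcdb24."""
    return ["createdb", db_name]


def prepare_build(db_name: str, bulk_root: Path, run: bool = False) -> Path:
    """Create <bulk_root>/<db_name> and, if run is True, the database itself."""
    input_dir = Path(bulk_root) / db_name
    # exist_ok=True makes the call safe to repeat on an existing directory.
    input_dir.mkdir(parents=True, exist_ok=True)
    if run:
        # check=True raises CalledProcessError if createdb fails,
        # for example when the database already exists.
        subprocess.run(createdb_command(db_name), check=True)
    return input_dir
```

Calling prepare_build("vcdb24", Path("bulk")) creates bulk\vcdb24 without touching the server. Load.sql is still best run by hand, section by section, as described above.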