Changes

Jump to navigation Jump to search
3,028 bytes added ,  13:46, 16 June 2017
no edit summary
}}
==Project Objective==To create a master venture capital data set on which all other projects requiring venture capital data will be built.  Specifically, we will be building data to support the following:*Lists and counts by year as well as state, MSA, and city, zip code, etc.*Identify seed, early and later stage deals (first investment), dollars invested and rounds*Identify transactional VC dollars invested and rounds*Acquisitions and IPOs of VC backed firms*Identify when VC backed firms are "alive"*Geocode all portfolio companies*Geocode VC headquarters and branch offices*Compute distances between investors and their portfolio companies*Identify lead investors for portfolio companies*Identify top 100 VCs ==Retrieving data== Old notes are [[Old Venture Capital Data Work]] All pulls and processing scripts are in: E:\McNair\Projects\VC Database Each of the following has a .rpt, .ssh, and .txt file, and has the following constraints in its search:*Venture related deals*Round date in 1/1/1980 to 06/15/2017 The portfolio company (inc. round) based pulls also have:*Portfolio company nation = United States The datasets retrieved from SDC platinum (on June 13-15th 2017) are:*Portfolio companies (USVC1980-present.ssh) - attributes to be extracted from roundlevel information *Portfolio company descriptions - just the portco name and the long description. Custom processed.*Round-on-one-line (USVCRound1980-present) - processed using RoundOnOneLine.pl*Funds (USVCFund1980-present)*Firms (VCFirms1980-present-pull2) - includes branch office so attributes must be extracted*IPOs*M&A All files were processed with NormalizeFixedWidth.pl (after footers were removed) unless otherwise indicated. Some files required some minor post-processing to load into PostgreSQL. Issues included:*Firm level data didn't normalize correctly - had to adjust headers*Stray quotation in address line*Area code had a 1- in it*Some line counts were off by one or two Additional datafiles (in E:\McNair\Projects\VC Database):*GeocodedVCData.txt 43,724 records, tab-delimited with companynames but with "none" for some geocoords. ==Loading the data into SQL== The SQL script and load data are in: Z:\VentureCapitalData\SDCVCData The load script is: LoadData.sql ==Processing the base tables== The SQL scripts need to do the following:*Extract company infomation from roundbase then process it**Add variable for growth (1), transactional (2), or non-VC (3) (see below)**Create clean 5-digit ZIP field*Extract firm information from firmbase, then:**Add indicator variable for US**Create clean 5-digit ZIP field*Build out BranchOffice (BO) table*Build lookups for:**State Code**raisestatus ==Lookup Tables== ===Stage of Investment===  Growth VC (1) -------------- Seed Early Stage Later Stage  Tranactional VC (2) -------------- Acq. for Expansion Acquisition Bridge Loan Expansion Pending Acq Recap or Turnaround  Exclude (non-VC) (3) -------------- LBO MBO Open Market Purchase PIPE Secondary Buyout

Navigation menu