Difference between revisions of "Venture Capital (Data)"

From edegan.com
Jump to navigation Jump to search
Line 82: Line 82:
 
  CREATE TABLE companybase AS
 
  CREATE TABLE companybase AS
 
  SELECT DISTINCT  
 
  SELECT DISTINCT  
  coname,updateddate,foundingdate,datelastinv,datefirstinv,investedk,city,description,msa,msacode,nationcode,statecode,addr1,a
+
  coname,updateddate,foundingdate,datelastinv,datefirstinv,investedk,city,description,msa,msacode,nationcode,statecode,addr1,addr2,indclass,indsubgroup3,indminor,url,zip  
ddr2,indclass,indsubgroup3,indminor,url,zip  
 
 
  FROM roundbase
 
  FROM roundbase
 
  ORDER BY coname;
 
  ORDER BY coname;

Revision as of 18:18, 22 June 2017


McNair Project
Venture Capital (Data)
Project logo 02.png
Project Information
Project Title Venture Capital (Data)
Owner Adrian Smart, Jake Silberman
Start Date
Deadline
Keywords Data
Primary Billing
Notes
Has project status Active
Copyright © 2016 edegan.com. All Rights Reserved.


Project Objective

To create a master venture capital data set on which all other projects requiring venture capital data will be built.

Specifically, we will be building data to support the following:

  • Lists and counts by year as well as state, MSA, and city, zip code, etc.
  • Identify seed, early and later stage deals (first investment), dollars invested and rounds
  • Identify transactional VC dollars invested and rounds
  • Acquisitions and IPOs of VC backed firms
  • Identify when VC backed firms are "alive"
  • Geocode all portfolio companies
  • Geocode VC headquarters and branch offices
  • Compute distances between investors and their portfolio companies
  • Identify lead investors for portfolio companies
  • Identify top 100 VCs

Retrieving data

Old notes are Old Venture Capital Data Work

Retrieving US VC Data From SDC requires SDC platinum, perl scripts and SQL scripts, as well as some manual processing.

Scripts and other info

SDC and Perl Scripts are in:

E:\McNair\Projects\VC Database

This includes:

  • NormalizeFixedWidth.pl
  • RoundOnOneLine.pl - Needed for round on one line

SQL Scripts and finished data are in:

Z:\VentureCapitalData\SDCVCData

Notes:

  • Portfolio companies attributes to be extracted from roundlevel information
  • Round-on-one-line processed using RoundOnOneLine.pl and NormalizeFixedWidth.pl
  • Firms includes branch office so attributes must be extracted
  • Portfolio company descriptions - just the portco name, state, date of first inv, and the long description - has to be custom processed.

The next steps are detailed in VC Database Rebuild.


Additional datafiles (in E:\McNair\Projects\VC Database):

  • GeocodedVCData.txt 43,724 records, tab-delimited with companynames but with "none" for some geocoords.

Processing the base tables

The SQL scripts need to do the following:

  • Extract company infomation from roundbase then process it
    • Add variable for growth (1), transactional (2), or non-VC (3) (see below)
    • Create clean 5-digit ZIP field
  • Extract firm information from firmbase, then:
    • Add indicator variable for US
    • Create clean 5-digit ZIP field
  • Build out BranchOffice (BO) table
  • Build lookups for:
    • State Code
    • raisestatus

Lookup Tables

Stage of Investment

We are generating 6 binary indicators as follows:

Growth VC (4)
--------------
Seed (1)
Early Stage (2)
Later Stage (3)
Tranactional VC (5)
--------------
Acq. for Expansion
Acquisition
Bridge Loan
Expansion
Pending Acq
Recap or Turnaround
Exclude (non-VC) (6)
--------------
LBO
MBO
Open Market Purchase
PIPE
Secondary Buyout
VC Partnership
Other

Building companybase and round

Both are built from roundbase. The scripts are below.

CREATE TABLE companybase AS
SELECT DISTINCT 
coname,updateddate,foundingdate,datelastinv,datefirstinv,investedk,city,description,msa,msacode,nationcode,statecode,addr1,addr2,indclass,indsubgroup3,indminor,url,zip 
FROM roundbase
ORDER BY coname;
CREATE TABLE round AS
SELECT DISTINCT coname,statecode,datefirstinv,rounddate,stage1,stage3,rndamtdisck,rndamtestk,roundnum,numinvestors
FROM roundbase
ORDER BY coname;