Difference between revisions of "Venture Capital (Data)"
| Line 9: | Line 9: | ||
}} | }} | ||
| − | [[Old Venture Capital Data Work]] | + | ==Project Objective== |
| + | To create a master venture capital data set on which all other projects requiring venture capital data will be built. | ||
| + | |||
| + | Specifically, we will be building data to support the following: | ||
| + | *Lists and counts by year as well as state, MSA, and city, zip code, etc. | ||
| + | *Identify seed, early and later stage deals (first investment), dollars invested and rounds | ||
| + | *Identify transactional VC dollars invested and rounds | ||
| + | *Acquisitions and IPOs of VC backed firms | ||
| + | *Identify when VC backed firms are "alive" | ||
| + | *Geocode all portfolio companies | ||
| + | *Geocode VC headquarters and branch offices | ||
| + | *Compute distances between investors and their portfolio companies | ||
| + | *Identify lead investors for portfolio companies | ||
| + | *Identify top 100 VCs | ||
| + | |||
| + | ==Retrieving data== | ||
| + | |||
| + | Old notes are [[Old Venture Capital Data Work]] | ||
| + | |||
| + | All pulls and processing scripts are in: | ||
| + | E:\McNair\Projects\VC Database | ||
| + | |||
| + | Each of the following has a .rpt, .ssh, and .txt file, and has the following constraints in its search: | ||
| + | *Venture related deals | ||
| + | *Round date in 1/1/1980 to 06/15/2017 | ||
| + | |||
| + | The portfolio company (inc. round) based pulls also have: | ||
| + | *Portfolio company nation = United States | ||
| + | |||
| + | The datasets retrieved from SDC platinum (on June 13-15th 2017) are: | ||
| + | *Portfolio companies (USVC1980-present.ssh) - attributes to be extracted from roundlevel information | ||
| + | *Portfolio company descriptions - just the portco name and the long description. Custom processed. | ||
| + | *Round-on-one-line (USVCRound1980-present) - processed using RoundOnOneLine.pl | ||
| + | *Funds (USVCFund1980-present) | ||
| + | *Firms (VCFirms1980-present-pull2) - includes branch office so attributes must be extracted | ||
| + | *IPOs | ||
| + | *M&A | ||
| + | |||
| + | All files were processed with NormalizeFixedWidth.pl (after footers were removed) unless otherwise indicated. Some files required some minor post-processing to load into PostgreSQL. Issues included: | ||
| + | *Firm level data didn't normalize correctly - had to adjust headers | ||
| + | *Stray quotation in address line | ||
| + | *Area code had a 1- in it | ||
| + | *Some line counts were off by one or two | ||
| + | |||
| + | Additional datafiles (in E:\McNair\Projects\VC Database): | ||
| + | *GeocodedVCData.txt 43,724 records, tab-delimited with companynames but with "none" for some geocoords. | ||
| + | |||
| + | ==Loading the data into SQL== | ||
| + | |||
| + | The SQL script and load data are in: | ||
| + | Z:\VentureCapitalData\SDCVCData | ||
| + | |||
| + | The load script is: | ||
| + | LoadData.sql | ||
| + | |||
| + | ==Processing the base tables== | ||
| + | |||
| + | The SQL scripts need to do the following: | ||
| + | *Extract company infomation from roundbase then process it | ||
| + | **Add variable for growth (1), transactional (2), or non-VC (3) (see below) | ||
| + | **Create clean 5-digit ZIP field | ||
| + | *Extract firm information from firmbase, then: | ||
| + | **Add indicator variable for US | ||
| + | **Create clean 5-digit ZIP field | ||
| + | *Build out BranchOffice (BO) table | ||
| + | *Build lookups for: | ||
| + | **State Code | ||
| + | **raisestatus | ||
| + | |||
| + | ==Lookup Tables== | ||
| + | |||
| + | ===Stage of Investment=== | ||
| + | |||
| + | Growth VC (1) | ||
| + | -------------- | ||
| + | Seed | ||
| + | Early Stage | ||
| + | Later Stage | ||
| + | |||
| + | Tranactional VC (2) | ||
| + | -------------- | ||
| + | Acq. for Expansion | ||
| + | Acquisition | ||
| + | Bridge Loan | ||
| + | Expansion | ||
| + | Pending Acq | ||
| + | Recap or Turnaround | ||
| + | |||
| + | Exclude (non-VC) (3) | ||
| + | -------------- | ||
| + | LBO | ||
| + | MBO | ||
| + | Open Market Purchase | ||
| + | PIPE | ||
| + | Secondary Buyout | ||
Revision as of 12:46, 16 June 2017
| Venture Capital (Data) | |
|---|---|
| Project Information | |
| Project Title | Venture Capital (Data) |
| Owner | Adrian Smart, Jake Silberman |
| Start Date | |
| Deadline | |
| Keywords | Data |
| Primary Billing | |
| Notes | |
| Has project status | Active |
| Copyright © 2016 edegan.com. All Rights Reserved. | |
Contents
Project Objective
To create a master venture capital data set on which all other projects requiring venture capital data will be built.
Specifically, we will be building data to support the following:
- Lists and counts by year as well as state, MSA, and city, zip code, etc.
- Identify seed, early and later stage deals (first investment), dollars invested and rounds
- Identify transactional VC dollars invested and rounds
- Acquisitions and IPOs of VC backed firms
- Identify when VC backed firms are "alive"
- Geocode all portfolio companies
- Geocode VC headquarters and branch offices
- Compute distances between investors and their portfolio companies
- Identify lead investors for portfolio companies
- Identify top 100 VCs
Retrieving data
Old notes are Old Venture Capital Data Work
All pulls and processing scripts are in:
E:\McNair\Projects\VC Database
Each of the following has a .rpt, .ssh, and .txt file, and has the following constraints in its search:
- Venture related deals
- Round date in 1/1/1980 to 06/15/2017
The portfolio company (inc. round) based pulls also have:
- Portfolio company nation = United States
The datasets retrieved from SDC platinum (on June 13-15th 2017) are:
- Portfolio companies (USVC1980-present.ssh) - attributes to be extracted from roundlevel information
- Portfolio company descriptions - just the portco name and the long description. Custom processed.
- Round-on-one-line (USVCRound1980-present) - processed using RoundOnOneLine.pl
- Funds (USVCFund1980-present)
- Firms (VCFirms1980-present-pull2) - includes branch office so attributes must be extracted
- IPOs
- M&A
All files were processed with NormalizeFixedWidth.pl (after footers were removed) unless otherwise indicated. Some files required some minor post-processing to load into PostgreSQL. Issues included:
- Firm level data didn't normalize correctly - had to adjust headers
- Stray quotation in address line
- Area code had a 1- in it
- Some line counts were off by one or two
Additional datafiles (in E:\McNair\Projects\VC Database):
- GeocodedVCData.txt 43,724 records, tab-delimited with companynames but with "none" for some geocoords.
Loading the data into SQL
The SQL script and load data are in:
Z:\VentureCapitalData\SDCVCData
The load script is:
LoadData.sql
Processing the base tables
The SQL scripts need to do the following:
- Extract company infomation from roundbase then process it
- Add variable for growth (1), transactional (2), or non-VC (3) (see below)
- Create clean 5-digit ZIP field
- Extract firm information from firmbase, then:
- Add indicator variable for US
- Create clean 5-digit ZIP field
- Build out BranchOffice (BO) table
- Build lookups for:
- State Code
- raisestatus
Lookup Tables
Stage of Investment
Growth VC (1) -------------- Seed Early Stage Later Stage
Tranactional VC (2) -------------- Acq. for Expansion Acquisition Bridge Loan Expansion Pending Acq Recap or Turnaround
Exclude (non-VC) (3) -------------- LBO MBO Open Market Purchase PIPE Secondary Buyout