Venture Capital Databases

Project Information
Has title Venture Capital Databases
Has keywords Data
Has project status Complete
Has sponsor McNair Center
Has project output Data
Copyright © 2019 All Rights Reserved.

Resources on this Wiki

The main source of venture capital data that we will use is Thomson Reuters' SDC Platinum platform, which contains VentureXpert. Some old Data Dictionaries are available for those interested in variable lists. There is a Perl-script-based tool to reprocess the SDC data. The latest version of this script needs locating, updating and documenting (the one on the wiki is obsolete).

History of VC Surveys by Major Firms

VentureXpert, VentureOne, VentureSource...

Sandhill Econometrics provides estimated valuation data to VentureSource...

Databases for Venture Capital Investments

In alphabetical order:

  • Bright*Sun [1]
    • hasn't come to market yet
    • data collection technology using algorithms and models on loose data
    • recommendation based web platform
  • CB Insights [2]
    • proprietary machine learning software on VC deals and exits
    • 130,000 + sources
    • data on revenue and valuations
  • Crunchbase (by TechCrunch) [3]
    • open source/content
    • search functionality for organizations, people, events, and products
    • detailed company portfolios
    • data includes acquisitions, funding rounds, valuation and investor information
    • platform for upcoming tech events
  • Dow Jones [4]
    • Proprietary
    • CompensationPro and VentureWire: subscription-based data centers
  • Gust [5]
    • collaborative platform between investors and startups
    • discuss, track, review, and share deals
  • MassInvestor [6]
    • questionable coverage, local or regional databases available for purchase
    • covers complete range of private capital investors
  • Mattermark [7]
    • open but with premium content
    • data source utilizing machine learning, web crawlers, primary sources and natural language processing of news articles and websites
    • built-in API
  • PitchBook [8]
    • membership based
    • comprehensive data on full life-cycle of VC, PE, and M&A
    • gathers data sources through web crawlers, machine learning, natural language processing, and surveys
  • Preqin [9]
    • 80,000 deals, 1,700 funds (of which 480 are actively raising)
    • global data center
    • web-based research center on VC trends and benchmarks, with visualization tools
  • Thomson Reuters [10]
    • gold standard for academic research
    • SDC Platinum large comprehensive data platform
  • VC Experts [11]
    • open with premium tools (must subscribe to download)
    • access to private company data
    • web platform including filtering features and visualization tools

Aggregate data

The main source of aggregate data is PWC MoneyTree. PWC MoneyTree uses the Thomson Reuters data and also compiles the annual survey for the National Venture Capital Association (NVCA)[12].

Quandl provides summary data on angel investment, venture capital, and start-up valuations. For North America, the underlying data providers are:

Quandl also draws data from the Kauffman Foundation and Crunchbase. They have an API[14].
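
As a rough illustration of how the Quandl API might be used to pull one of these summary series, the sketch below builds a request URL and turns a JSON response into row dictionaries. It assumes the v3 REST layout (a database code and a dataset code in the path, an `api_key` query parameter, and a `dataset` object with `column_names` and `data`). The codes and key shown are hypothetical placeholders, and the service may have changed since this page was written.

```python
import json
from urllib.parse import urlencode

# Assumed base URL for the v3 REST layout; verify against current documentation.
BASE_URL = "https://www.quandl.com/api/v3/datasets"


def dataset_url(database_code, dataset_code, api_key, **params):
    """Build the request URL for one dataset (codes here are placeholders)."""
    query = urlencode({"api_key": api_key, **params})
    return f"{BASE_URL}/{database_code}/{dataset_code}.json?{query}"


def rows_from_payload(payload_text):
    """Convert a JSON response body into a list of {column: value} dicts.

    Assumes the body has the shape {"dataset": {"column_names": [...], "data": [[...], ...]}}.
    """
    dataset = json.loads(payload_text)["dataset"]
    columns = dataset["column_names"]
    return [dict(zip(columns, row)) for row in dataset["data"]]


if __name__ == "__main__":
    # Hypothetical codes; no network request is made in this sketch.
    print(dataset_url("DB", "SERIES", "YOUR_KEY", start_date="2014-01-01"))
```

The URL returned by `dataset_url` can be fetched with any HTTP client, and the response text passed to `rows_from_payload`.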

Cooley is a leading international law firm that represents both emerging growth companies and venture capital funds in venture capital financings. Cooley does third-party valuations for venture capital backed firms[15]. They produce a report on their valuations[16].

The Center for Venture Research is a multidisciplinary research unit of the Peter T. Paul College of Business and Economics at the University of New Hampshire. The Center, founded in 1984, studies early-stage equity financing for high-growth ventures[17]. It is run by Jeffrey Sohl, and is noteworthy for its publication of statistics on angel investment[18].

Other Sources of Data

Scott Stern and Jorge Guzman ("Where Is Silicon Valley?", 2014) have collected data on business registrations for California, New York, Texas, Massachusetts, Florida, and Washington. Data from California was collected via the Secretary of State. This data set was combined with data from SDC Platinum and the U.S. Patent Office.
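
Combining registration records with venture capital and patent records generally requires a shared key. The sketch below is one simple way this could be done, assuming records are matched on state plus a normalized firm name. It is not a description of the authors' actual matching procedure, and the field names (`name`, `state`) and suffix list are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes to ignore when comparing firm names.
_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "llc", "ltd", "co", "company"}


def normalize_name(name):
    """Lowercase, replace punctuation with spaces, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in _SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def _index(rows):
    """Group rows by (state, normalized name)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["state"], normalize_name(row["name"]))].append(row)
    return grouped


def combine(registrations, vc_deals, patents):
    """Attach counts of matching VC rounds and patents to each registration."""
    vc_index, patent_index = _index(vc_deals), _index(patents)
    combined = []
    for reg in registrations:
        key = (reg["state"], normalize_name(reg["name"]))
        combined.append({
            **reg,
            "vc_rounds": len(vc_index.get(key, [])),
            "patents": len(patent_index.get(key, [])),
        })
    return combined
```

Exact matching on normalized names will miss renamed firms and spelling variants, so a real pipeline would likely add fuzzy matching and manual review.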