[[VCDB24]] is the 2024 and final iteration of my [[VentureXpert]] based '''V'''enture '''C'''apital '''D'''ata'''B'''ase. Thomson-Reuters discontinued access to VentureXpert through [[SDC Platinum]] on December 31st, 2023 (see also: [[SDC Normalizer]]). This iteration contains data up until then. Each VCDB includes investments, funds, startups, executives, exits, locations, and more. The previous build was [[VCDB23]], but the best previous instructions are from [[VCDB20]] or the [[McNair Center]] build, which was called [[VentureXpert Data]].
== Processing Steps ==
===Geocoding===
Part of Load.sql requires that we update the Geocoding - adding new long and lat for PortCos and firm offices that we haven't seen before.
The last time this was run was vcdb20. Accordingly:
* In E:/tools/Geocode run the script(s): Geocode.py for portcos and GeocodeOneKey.py for everything else.
** Strip the header line out of the input file(s)
** python Geocode.py portcogrowthneedsgeo-NoHeader.txt* Get the latest Gazetteer file(s): https://www.census.gov/geographies/reference-files/time-series/geo/gazetteer-10files.html* Check the coverage of portcogeo and create firmbogeoplus ===Exits=== Another part of Load.sql does the matching to IPOs and MAs and their precedence. Note that:* Issuer and target names are matched against themselves using the [[Matcher.pl]] script in mode 2* PortCo stdnames are matched to issuerstd and targetstd (separately) in mode 0* The state and date matching requirements are in the load.sql.* There seems to have been duplicate issue records in the IPO data for vcdb2020 (and perhaps earlier). Some of the duplicates are often identical, except that the date is a day apart. * The IPO records also contain listings on junior and foreign exchanges, as well as some OTC - I left these in and flagged them. ===Industry=== The industry coding is in IndustryCodes.txt. Note that:* The code map had to be updated. The Excel file is in projects/vcdb24 but I didn't have the original counts to hand.* The codes are not unique (576 out of 585 are unique) at the industry subgroup 3 level.* The codes (code, code20, code100) should be joined using indclass, indminorgroup, indsubgroup* The codes are 1,2,4 but not 3dg hierarchical.* 1dg codes are IT, LS, and Other. * Note that code is the full 4dg industry identifier, where as code20 and code100 are name-based aggregates with at least 20 or 100 observations in them.{| class="wikitable" |-! 2dg Code! Minorcode! No. of PortCos|-| 11| Communications and Media| 3930|-| 12| Computer Hardware| 3058|-| 13| Computer Software and Services| 21157|-| 14| Internet Specific| 14440|-| 15| Semiconductors/Other Elect.| 3145|-| 21| Biotechnology| 4251|-| 22| Medical/Health| 7138|-| 31| Consumer Related| 7459|-| 32| Industrial/Energy| 7028|-| 33| Other Products| 14246|} I also tried some keyword industry coding from both short and long descriptions. The source code is at the top of BuildBaseTables.sql. The results are in sheets in the IndustryCodes.xlsx file. ===BuildBaseTables.sql=== Build the PortCoGrowthGeoId table that codes the city-state to a geoid. {| class="wikitable" |- style="vertical-align:bottom;"! origin! count! Method|-| style="vertical-align:bottom;" | 1| style="vertical-align:bottom;" | 45,111| style="color:#808080;" | Address is geocoded and in tiger place|-| style="vertical-align:middle; color:#808080;" | 2| style="vertical-align:bottom;" | 270| style="color:#808080;" | city, statecode matches to only 1 geoid, so use it|-| style="vertical-align:middle; color:#808080;" | 3| style="vertical-align:bottom;" | 1,374| style="color:#808080;" | city, statecode matches to multiple geoids, use the most popular|-| style="vertical-align:middle; color:#808080;" | 4| style="vertical-align:bottom;" | 964| style="color:#808080;" | 1:1 straight city<->place and statecode match with tiger|-| style="vertical-align:middle; color:#808080;" | 5| style="vertical-align:bottom;" | 509| style="color:#808080;" | Use zctaplaceinfo to lookup the best place choice for the zipcode|- style="vertical-align:bottom;"| style="vertical-align:middle; color:#808080;" | 6| 636| style="color:#808080;" | Unable to code|-| style="vertical-align:middle; color:#808080;" | 9| style="vertical-align:bottom;" | 24| style="color:#808080;" | Custom coded|} ===StartupCities=== The original Startup Cities code is in E:\projects\BayesianStartupCities\V1\startupcities.sql. The new version is in e:\projects\BayesianStartupCities\StartupCitiesV2.sql.