Changes

Jump to navigation Jump to search
4,125 bytes added ,  16:44, 5 July 2021
*Firm: Firmname
Note that when normalizing Fundexecs use fundname, fundyear as the foreign keys. Generally, do not copy down keys unless they are foreign and blank in the source file. Also, clean up the round table and create the SEL flags, as well as the PortCoSEL table. And make sure that the end of period date (plus 1) is updated in PortCoAliveDead. ===Add the geocoding=== Load in the old geocoding and create data files for new Google Maps API runs (see [[Geocode.py]] for directions). Note that I separate out Growth and non-growth PortCo addresses so that I can get the growth ones first for the PortCos, and that there are also firm and firm branch office addresses to process (US only). The limit for free is 2,500 calls/day but the cost per call is pretty low. Note that PortCoGeoID and other tables are built in the BuildBaseTables.sql script. Change to Load script:*Retrieve and load ZCTA Gazetteer from the [https://www.census.gov/geographies/reference-files/time-series/geo/gazetteer-files.html U.S. Census]*A small but meaningful proportion of US venture firms have zip codes but not addresses. I created the table firmbogeoplus to add longitude and latitudes for these firms in Load.sql (right after firmbogeo). A small number of US PortCos have zipcodes but not places (which are used in the Rankings and elsewhere). To address this, I loaded the 2010 ZCTA to place lookup in Load.sql (passing the result through TigerGeog) to get the placename. This is now incorporated into PortCoGeoid in BuildBaseTables.sql. 2010 Census data (2020 isn't available but the mappings are fairly static):*https://www.census.gov/geographies/reference-files/time-series/geo/relationship-files.2010.html*https://www2.census.gov/geo/pdfs/maps-data/data/rel/explanation_zcta_place_rel_10.pdf*https://www2.census.gov/geo/docs/maps-data/data/rel/zcta_place_rel_10.txt I also load up the 2010 ANSI codes for places (i.e., statefp, statecode, placename, placefp, etc.): * Source: https://www.census.gov/library/reference/code-lists/ansi.html* Data: national_places.txt -> national_places_processed.txt (regexes)* Pop the last word off the placename! Note that the postprocessing of these two tables is done at the end of Load.sql. The final table is '''zctaplaceinfo''' which takes zcta, statecode and provides back placename and geoid (of the place). ===Load the Additions=== These include:*CPI (see the spreadsheet CPIEstimate.xlsx, source data from https://data.bls.gov/pdq/SurveyOutputServlet)*Population, statepop2017, and ACS. See [[American Community Survey (ACS) Data]].*tigerplaces, which builds tigergeog, and placedisplay. See [[Jeemin Sim (Work Log)]], [[Urban Start-up Agglomeration and Venture Capital Investment]] and [[Tiger Geocoder]]*Industry, Firm type, Firm stage, Title, statecode, and other lookup tables. *PortCoSBIR and PortCoPatent -- These are now out of date and don't appear to have build notes. ===Join in the exit data=== Bring in and clean up the IPO and MA (private target) data, match it to the PortCos, and reconcile the conflicts. This creates two core tables, one key table, and two PortCo results tables:*IpoCleanNoDups*MACleanNoDups*ExitKeys (IPO and MA only, not PortCo)*PortCoExit*PortCoAliveDead '''PortCoAliveDead''' follows the academic convention of marking a startup as alive if it has begun receiving investment and hasn't exited and if its last investment occurred less than five years ago. ==Base Tables== The base tables are built using BuildBaseTables.sql. These provide common foundations for:*PortCo Geography*Other PortCo tables, including industry, id, cpi adjustments, geoids.**Exit, Alive/dead and patents & SBIR/STTR grant (note that these need updating) information is also included.*Firm variables (note that the full build is only done for firms that make growth investments).*Round Line Joiner (RLJoiner) tables.*PortCo People, including gender, dr., titles, serials.*Fund People: gender and dr.*The Master tables:**PortCoMaster**PortCoPeopleMaster**FirmGrowthMaster (the Firm master table, for growth investments)**RLMaster Finally, this code also builds: MatchMostNumerous and MatchHighestRandom, which are used in [[Estimating Unobserved Complementarities between Entrepreneurs and Venture Capitalists]]

Navigation menu