Changes

This project was initially undertaken in the summer of 2017 and resulted in a full report on 2016. See: [[US Startup City Ranking]].
The data was then updated for 2017 and 2018 (Q1 and Q2) in the first rebuild, which used [[Restoring vcdb3|vcdb3]] and resulted in a ranking spreadsheet and document, but not a full report. It was then updated again to include the first half of 2019, using [[Vcdb4]]. The third update includes everything up to the end of 2020 and uses [[VCDB20]].
==Third Rebuild==

The third rebuild uses [[VCDB20]] and covers up until the end of 2020. It combines geocoding with city information when geocoding is unavailable.

The code is in:
 E:\projects\vcdb20\Ranking.sql

The outputs, including the xlsx file that puts everything together, are in:
 E:\projects\ranking

The build implements the following decisions:
*'''Placenames''' are from geocoding where possible and '''city''' names where not.
*The '''ACS''' and '''Tigergeog''' tables are joined in using '''geoid''', which is determined from '''placenames''' (where possible).
*A startup received growth VC if one of its round stages was seed, early, or later.
*The ranking uses only growth VC (i.e., '''growthflag'''==1 on '''round''') and '''rounddate''' < '2021-01-01'.
*Places must have non-zero VC in at least one year between 1980 and 2020, inclusive.
*'''numalive''' calculates dead as exit==1 or date>=(datelastinv+5 years), and alive as date>=datefirstinv and dead==0.
*'''newdeal''' is a first investment (i.e., irrespective of stage and one per startup).
*Undisclosed amounts are treated as zeros. Amounts are in millions (unless otherwise stated).
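The '''numalive''' rule above can be sketched as follows. This is a minimal illustration in Python and not the code in Ranking.sql: the function names <code>add_years</code> and <code>is_alive</code>, the argument names, and the treatment of February 29 are all assumptions made for the example.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date forward by whole years (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reached for Feb 29 when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def is_alive(as_of: date, datefirstinv: date, datelastinv: date, exited: bool) -> bool:
    """Apply the rule as stated: dead if exited or 5+ years since last investment."""
    dead = exited or as_of >= add_years(datelastinv, 5)
    return as_of >= datefirstinv and not dead


# Hypothetical startup: first round in 2017, last round in 2018, no exit.
print(is_alive(date(2020, 12, 31), date(2017, 3, 1), date(2018, 6, 1), False))
```

Under this reading, a startup counts toward '''numalive''' from its first investment until it exits or until five years have passed since its last investment, whichever comes first.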
===Datasets===

*PlaceYearRankingFull.txt (covers all places and years 1980-2020)
*PlaceYearRanking200.txt (top 200 places for years 1980-2020)
*StateYearRanking.txt (50 states + DC and PR for years 1980-2020)

*PlaceYearRanking2020.txt (all places for 2020)
*PlaceYearRanking200-2020.txt (top 200 places for 2020)
*StateYearRanking2020.txt (50 states + DC and PR for 2020)

===Artifacts and Facts===

The main artifacts are:
*The Top 100 (and 200) for 2020
*A graph of total US growth VC investment 1980-2020

Other artifacts/facts:
*Fraction of data that is geocodable or has a valid placename
*Fraction of data with disclosed amounts
*Turnover of the Top 10, 20, 50, 100
*Correlations between the three component measures (top 200?)
*Cumulative Percentage of Growth VC, new deals, and alive by city (top 10, 20, 50, and 100) over time
*Average Growth rate of Growth VC, new deals, and alive by city (top 10, 20, 50, and 100) over time
*Focus on select cities: Rankings over time for Houston, St. Louis, Cincinnati, Boulder, Waltham, Palo Alto

==Second (Complete) Rebuild==

This build used [[Vcdb4]], see [[Vcdb4#Ranking]].
The data was completely rebuilt based on geocoded places (i.e., place names from Tiger) and was restricted to rounds of growth VC from the outset. In this way, it can be joined to the [[American Community Survey (ACS) Data]], and it uses real, rather than self-reported, cities (many companies claim to be located in San Francisco while actually being in Alameda, South San Francisco, etc.). Likewise, companies on the Emeryville side of the Emeryville-Berkeley border often claim to be in Berkeley.
A code section was added to E:\project\vcdb3\Ranking.sql to explore city growth rates.
The main problem with estimating the average of <math>\frac{dollars_t - dollars_{t-1} }{dollars_{t-1}}</math>, where dollars is growth VC dollars invested in a city-year, is a truncation issue. <math>dollars_{t-1}</math> is often zero, which leaves the ratio undefined, and the resulting value can never be below -100% but can be well above +100%, so large up years dominate the average.
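The asymmetry can be seen with a small numeric sketch. The Python below is illustrative only and is not taken from Ranking.sql: the function name <code>growth_rates</code> and both series are hypothetical, and marking undefined years as <code>None</code> is just one possible convention.

```python
from statistics import mean


def growth_rates(dollars):
    """Year-over-year growth rates; None where the previous year is zero."""
    rates = []
    for prev, curr in zip(dollars, dollars[1:]):
        rates.append(None if prev == 0 else (curr - prev) / prev)
    return rates


# Hypothetical city that doubles and then returns to its starting level.
round_trip = growth_rates([10.0, 20.0, 10.0])
print(round_trip)        # [1.0, -0.5]
print(mean(round_trip))  # 0.25, despite no net change in investment

# Hypothetical city with a zero year: the following rate is undefined.
print(growth_rates([5.0, 0.0, 5.0]))  # [-1.0, None]
```

A city that ends where it started still shows a positive average growth rate, because the gain (+100%) is measured against a smaller base than the loss (-50%).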
I experimented with various ways to limit this issue, including considering:
(47 rows)
Note that if we exclude 2000 (i.e., consider 2001 to 2017 inclusive), the overall average growth rate drops to 0.2812, and some cities, like Houston, have materially lower growth rates. It is worth noting that Houston had one big up year in 2013, when growth VC investment levels were 2.168 times the previous year's, but then suffered about a 50% mean reversion in the following years. Without 2013, Houston's average 2001 to 2017 growth rate is -0.0197.
city | state | avg
