{{Project|Has project output=|Has sponsor=McNair ProjectsCenter
|Has title=Ranking US Cities by Venture Capital
|Has owner=Ed Egan, Anne Dayton, Diana Carranza,
==Project Description==
This project was initially undertaken in the summer of 2017 and resulted in a full report on 2016. See: [[US Startup City Ranking]].
The data was then updated for 2017 and 2018 (Q1 and Q2) in the first rebuild using [[Restoring vcdb3|vcdb3]], which resulted in a ranking spreadsheet and document, but not a full report. It was then updated again to include the first half of 2019, using [[Vcdb4]]. The third update includes everything up to the end of 2020 and uses [[VCDB20]].
==Third Rebuild==
The third rebuild uses [[VCDB20]] and covers up until the end of 2020. It combines geocoding with city information when geocoding is unavailable.

The code is in: E:\projects\vcdb20\Ranking.sql

The outputs, including the xlsx file that puts everything together, are in: E:\projects\ranking

The build implements the following decisions:
*'''Placenames''' are from geocoding where possible and '''city''' names where not.
*The '''ACS''' and '''Tigergeog''' tables are joined in using '''geoid''', which is determined from '''placenames''' (where possible).
*A startup received growth VC if one of its round stages was seed, early, or later.
*The ranking uses only growth VC (i.e., '''growthflag'''==1 on '''round''') and '''rounddate''' < '2021-01-01'.
*Places must have non-zero VC in at least one year between 1980 and 2020, inclusive.
*'''numalive''' calculates dead as exit==1 or date>=(datelastinv+5 years), and alive as date>=datefirstinv and dead==0.
*'''newdeal''' is a first investment (i.e., irrespective of stage and one per startup).
*Undisclosed amounts are treated as zeros. Amounts are in millions (unless otherwise stated).
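The '''numalive''' rule above can be illustrated with a minimal Python sketch. This is not the code in Ranking.sql: the function names, the dictionary layout, and the choice of evaluation date are assumptions made for illustration, and it assumes the exit flag already reflects whether the startup had exited by the evaluation date.

```python
from datetime import date


def add_years(d, years):
    """Shift a date by whole years; Feb 29 falls back to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def is_dead(as_of, datelastinv, exited):
    """Dead: exited, or at least 5 years have passed since the last investment."""
    return exited or as_of >= add_years(datelastinv, 5)


def is_alive(as_of, datefirstinv, datelastinv, exited):
    """Alive: first investment has happened and the startup is not dead."""
    return as_of >= datefirstinv and not is_dead(as_of, datelastinv, exited)


def numalive(startups, as_of):
    """Count startups alive on the evaluation date (hypothetical record layout)."""
    return sum(
        is_alive(as_of, s["datefirstinv"], s["datelastinv"], s["exit"] == 1)
        for s in startups
    )


# Hypothetical records, for illustration only.
sample = [
    {"datefirstinv": date(2010, 3, 1), "datelastinv": date(2012, 6, 1), "exit": 0},
    {"datefirstinv": date(2005, 1, 15), "datelastinv": date(2008, 1, 15), "exit": 0},
    {"datefirstinv": date(2014, 5, 1), "datelastinv": date(2015, 2, 1), "exit": 1},
]
alive_at_end_2015 = numalive(sample, date(2015, 12, 31))
```

In the sample, only the first startup counts as alive at the end of 2015: the second had its last investment more than five years earlier, and the third has exited.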
===Datasets===
*PlaceYearRankingFull.txt (covers all places and years 1980-2020)
*PlaceYearRanking200.txt (top 200 places for years 1980-2020)
*StateYearRanking.txt (50 states + DC and PR for years 1980-2020)

*PlaceYearRanking2020.txt (all places for 2020)
*PlaceYearRanking200-2020.txt (top 200 places for 2020)
*StateYearRanking2020.txt (50 states + DC and PR for 2020)

===Artifacts and Facts===
The main artifacts are:
*The Top 100 (and 200) for 2020
*A graph of total US growth VC investment 1980-2020

Other artifacts/facts:
*Fraction of data that is geocodable or has a valid placename
*Fraction of data with disclosed amounts
*Turnover of the Top 10, 20, 50, 100
*Correlations between the three component measures (top 200?)
*Cumulative Percentage of Growth VC, new deals, and alive by city (top 10, 20, 50, and 100) over time
*Average Growth rate of Growth VC, new deals, and alive by city (top 10, 20, 50, and 100) over time
*Focus on select cities: Rankings over time for Houston, St. Louis, Cincinnati, Boulder, Waltham, Palo Alto

==Second (Complete) Rebuild==
This build used [[Vcdb4]], see [[Vcdb4#Ranking]].
The data was completely rebuilt based on geocoded places (i.e., place names from Tiger) and was restricted to rounds of growth VC from the outset. In this way, it can be joined to the [[American Community Survey (ACS) Data]], and it uses real, rather than self-reported, cities (lots of people claim to be located in San Francisco while actually being in Alameda or South San Francisco, etc.). Likewise, companies on the Emeryville side of the Emeryville-Berkeley border often claim to be in Berkeley.
A code section was added to E:\project\vcdb3\Ranking.sql to explore city growth rates.
The main problem with estimating the average of <math>\frac{dollars_t - dollars_{t-1}}{dollars_{t-1}}</math>, where dollars is growth VC dollars invested in a city-year, is a truncation issue. <math>dollars_{t-1}</math> is often zero, which leaves the ratio undefined, and the resulting value cannot fall below -100% while it can be far above +100%.
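A small Python sketch makes the truncation issue concrete. The function name and the dollar series are hypothetical and are not drawn from the ranking data. The sketch marks a year as undefined when the previous year's dollars are zero.

```python
def growth_rates(dollars):
    """Year-over-year growth rates; None where the previous year is zero."""
    rates = []
    for prev, cur in zip(dollars, dollars[1:]):
        # (dollars_t - dollars_{t-1}) / dollars_{t-1} is undefined for a zero base.
        rates.append(None if prev == 0 else (cur - prev) / prev)
    return rates


# Hypothetical growth VC dollars (in $m) for one city over five years.
series = [0.0, 5.0, 20.0, 0.0, 10.0]
rates = growth_rates(series)

# Keep only the defined rates before averaging.
defined = [r for r in rates if r is not None]
mean_rate = sum(defined) / len(defined)
```

Here the defined rates are +300% and -100%. The drop to zero is capped at -100%, while the rise from a small base is unbounded, so the average is pulled upward.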
I experimented with various ways to limit this issue, including considering:
*Only cities that have had more than $10m in a given year
*City-year pairs with more than $10m
*The top 100, 50, or 20 cities in 2017, for years from 1990 to 2017 or 2000 to 2017.
*Only cities that had $10m or more for every year from 2000 to 2017
The average growth rate for cities that had $10m or more in every year from 2000 to 2017 was 0.334 per year, implying that such "established" ecosystems roughly double their VC every three years. There are 47 such cities. The list of average growth rates by city for these cities is below. The average growth rate increases to 39.17% for the 61 cities that have $10m or more in all but at most one year, and to 51.43% for the 72 cities that have it in all but at most two years.
city | avg
(47 rows)
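The "established" filter described above can be sketched in Python as follows. The function names, the nested-dictionary layout, and the toy figures are assumptions for illustration. Whether Ranking.sql averages within each city first or pools all city-years is not stated here, so the per-city average below is only one possible reading.

```python
def established_cities(dollars_by_city, years, threshold=10.0, max_misses=0):
    """Cities with at least `threshold` ($m) in all but `max_misses` of `years`."""
    keep = []
    for city, by_year in dollars_by_city.items():
        misses = sum(1 for y in years if by_year.get(y, 0.0) < threshold)
        if misses <= max_misses:
            keep.append(city)
    return keep


def average_growth(by_year, years):
    """Mean year-over-year growth rate, skipping years with a zero base."""
    rates = []
    for prev_year, year in zip(years, years[1:]):
        prev = by_year.get(prev_year, 0.0)
        if prev > 0:
            rates.append((by_year.get(year, 0.0) - prev) / prev)
    return sum(rates) / len(rates) if rates else None


# Hypothetical data: growth VC dollars in $m by city and year.
toy = {
    "A": {2000: 10.0, 2001: 20.0, 2002: 30.0},
    "B": {2000: 12.0, 2001: 5.0, 2002: 15.0},
    "C": {2000: 0.0, 2001: 50.0, 2002: 60.0},
}
toy_years = [2000, 2001, 2002]

strict = established_cities(toy, toy_years)                 # every year
relaxed = established_cities(toy, toy_years, max_misses=1)  # all but one year
avg_a = average_growth(toy["A"], toy_years)
```

Relaxing `max_misses` admits cities with a thin year, which tend to show larger swings. That is consistent with the higher averages reported for the 61-city and 72-city groups.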
Note that if we exclude 2000 (i.e., consider 2001 to 2017 inclusive), the overall average growth rate drops to 0.2812, and some cities, like Houston, have materially lower growth rates. It is worth noting that Houston had one big up year in 2013, where growth VC investment levels were 2.168 times the previous year, but then suffered about a 50% mean reversion in the following years. Without 2013, Houston's average 2001 to 2017 growth rate is -0.0197349989630022.
city | state | avg
