Changes

Jump to navigation Jump to search
{{Project|Has project output=|Has sponsor=McNair ProjectsCenter
|Has title=Ranking US Cities by Venture Capital
|Has owner=Ed Egan, Anne Dayton, Diana Carranza,
|Has start date=Summer 2017
|Has project status=Active
|Does subsume=Top Cities for VC Backed Companies,
}}
 
==Project Description==
 
This project was initially undertaken in the summer of 2017 and resulted in a full report on 2016. See: [[US Startup City Ranking]].
 
The data was then updated for 2017 and 2018 (Q1 and Q2), which resulted in a ranking spreadsheet and document, but not a full report in the first rebuild using [[Restoring vcdb3|vcdb3]]. It was then updated again to include the first half of 2019, using [[Vcdb4]]. The third update includes everything up to the end of 2020 and uses [[VCDB20]].
 
==Third Rebuild==
 
The third rebuild uses [[VCDB20]] and covers up until the end of 2020. It combines geocoding with city information when geocoding is unavailable. The code is in:
E:\projects\vcdb20\Ranking.sql
 
The outputs, including the xlsx file that puts everything together, are in:
E:\projects\ranking
 
The build implements the following decisions:
*'''Placenames''' are from geocoding where possible and '''city''' names where not.
*The '''ACS''' and '''Tigergeog''' tables are joined in using '''geoid''', which is determined from '''placenames''' (where possible).
*A startup received growth VC if it one of its round stages was seed, early or later.
*The ranking uses only growth VC (i.e., '''growthflag'''==1 on '''round''') and '''rounddate''' < '2021-01-01'.
*Places must have non-zero VC in at least one year between 1980 and 2020, inclusive.
*'''numalive''' calculates dead as exit==1 or date>=(datelastinv+5 years), and alive as date>=datefirstinv and dead==0.
*'''newdeal''' is a first investment (i.e., irrespective of stage and one per startup)
*Undisclosed amounts are treated as zeros. Amounts are in millions (unless otherwise stated).
 
===Datasets===
 
*PlaceYearRankingFull.txt (covers all places and years 1980-2020)
*PlaceYearRanking200.txt (top 200 places for years 1980-2020)
*StateYearRanking.txt (50 states + DC and PR for years 1980-2020)
*PlaceYearRanking2020.txt (all places for 2020)
*PlaceYearRanking200-2020.txt (top 200 places for 2020)
*StateYearRanking2020.txt (50 states + DC and PR for 2020)
 
===Artifacts and Facts===
 
The main artifacts are:
*The Top 100 (and 200) for 2020
*A graph of total US growth VC investment 1980-2020
 
Other artifacts/facts:
*Fraction of data that is geocodable or has a valid placename
*Fraction of data with disclosed amounts
*Turnover of the Top 10, 20, 50, 100
*Correlations between the three component measures (top 200?)
*Cumulative Percentage of Growth VC, new deals, and alive by city (top 10, 20, 50, and 100) over time
*Average Growth rate of Growth VC, new deals, and alive by city (top 10, 20, 50, and 100) over time
*Focus on select cities: Rankings over time for Houston, St. Louis, Cincinnati, Boulder, Waltham, Palo Alto
 
==Second (Complete) Rebuild==
 
This build used [[Vcdb4]], see [[Vcdb4#Ranking]].
 
The data was completely rebuilt based on geocoded places (i.e., place names from Tiger) and was restricted to rounds of growth VC from the outset. In this way, it can be joined the [[American Community Survey (ACS) Data]], and it uses real, rather than self-reported cities (lots of people claim to be located in San Francisco while actually being in Alameda or South San Fran, etc.). Likewise, companies on the Emeryville side of the Emeryville-Berkeley border often claim to be in Berkeley.
 
==First code rebuild==
 
The constraint that a city had to have raised $100m in SEL VC over the last 34 years was too onerous for some of the very new ecosystems and was removed from the last build.
 
The build file is:
E:\project\vcdb3\Ranking.sql
 
This file replaces both RoundRanking.sql and RoundRankingState.sql, and fixes some minor issues with the latter having overwritten tables from the former.
 
The output files are:
*Ranking2016WMove.txt
*Ranking2017WMove.txt
*Ranking2018Q12WMove.txt
*Ranking2016WMoveState.txt
*Ranking2017WMoveState.txt
*Ranking2018Q12WMoveState.txt
*USSelamnt.txt
 
They are produced in Z:/vcdb3
 
===City Growth Rates===
 
A code section was added to E:\project\vcdb3\Ranking.sql to explore city growth rates.
 
The main problem with estimating the average of <math>\frac{dollars_t - dollars_{t-1} }{dollars_{t-1}}</math>, where dollars is growth VC dollars invested in a city-year, is a truncation issue. <math>dollars_{t-1}</math> is often zero, and it is impossible for the resulting value to be below -100% while it can be above +100%.
 
I experimented with various ways to limit this issue, including considering:
*Only cities that have had more than $10m in a given year
*City-year pairs with more than $10m
*The top 100, 50, or 20 cities in 2017, for years from 1990 to 2017 or 2000 to 2017.
*Only cities that had $10m or more for every year from 2000 to 2017
 
The growth rate for cities that had $10m or more for every year from 2000 to 2017 was 0.334 per year, implying that such "established" ecosystems double their VC every three years. There are 47 such cities. The list of average growth rates by city for these cities is below. The average growth rate increases to 39.17% for the 61 cities which have $10m or more for at least all bar one years, and to 51.43% for the 72 cities that have it for all bar two years.
 
city | avg
---------------------+--------------------
Malvern | 0.993156571636778
La Jolla | 0.610594418156654
Woburn | 0.603879780755441
Scottsdale | 0.575267305240787
Hayward | 0.541730108455613
Carlsbad | 0.523716703358772
Pleasanton | 0.509316345882909
Denver | 0.501050394443719
Burlingame | 0.500020365810236
Watertown | 0.472356466716033
Bedford | 0.463504586546262
Minneapolis | 0.458034053534397
Menlo Park | 0.456066638196533
North Waltham | 0.443317946821974
Palo Alto | 0.425111728982036
Emeryville | 0.421690271365118
Boston | 0.412678555600625
Seattle | 0.381774873244813
Los Gatos | 0.374521134293509
Lexington | 0.369969174443311
Los Angeles | 0.365024808593637
Atlanta | 0.351405016195985
South San Francisco | 0.321407553118583
Durham | 0.298104095936328
Morrisville | 0.297076835403358
Pittsburgh | 0.285423923204106
Boulder | 0.284022977435934
Rockville | 0.254953079027088
Campbell | 0.252975908859599
Waltham | 0.235784276400949
San Francisco | 0.231574232838588
Cambridge | 0.226097618461997
Burlington | 0.225559858389355
Chicago | 0.222048532754601
New York | 0.222000093754721
Irvine | 0.208556113551383
Houston | 0.206415333165726
Fremont | 0.204771682823134
San Mateo | 0.196399732252421
Cupertino | 0.18040546805283
Mountain View | 0.159432352723845
San Jose | 0.152962850233141
Redwood City | 0.0888134263293376
San Diego | 0.0711881806017128
Santa Clara | 0.0566618949392164
Sunnyvale | 0.0318692877947854
Austin | 0.0254452013443905
(47 rows)
 
Note that if we exclude 2000 (i.e. consider 2001 to 2017 inclusive), the overall average growth rate drops to 0.2812, and some cities, like Houston, have materially lower growth rates. It is worth noting that Houston had one big up year in 2013, where growth VC investment levels where 2.168 times the previous year, but then suffered a about a 50% mean reversion in the following years. Without 2013, Houston's average 2001 to 2017 growth rate is -0.0197349989630022.
 
city | state | avg
---------------------+-------+----------------------
La Jolla | CA | 0.680031123241241
Woburn | MA | 0.64163859928389
Malvern | PA | 0.548552704731425
Menlo Park | CA | 0.504383164761797
Bedford | MA | 0.504211517175831
Hayward | CA | 0.49942185834955
Carlsbad | CA | 0.488399093032109
North Waltham | MA | 0.479226357065314
Minneapolis | MN | 0.473356376707567
Scottsdale | AZ | 0.434025239485977
Watertown | MA | 0.409908014646241
Emeryville | CA | 0.403333541766609
Pleasanton | CA | 0.402025316290345
Burlingame | CA | 0.401579348905957
Seattle | WA | 0.398842474741046
Boston | MA | 0.378228024530469
Morrisville | NC | 0.354372777597948
Palo Alto | CA | 0.350685408163029
Atlanta | GA | 0.336685642867238
Denver | CO | 0.332644416223719
Los Angeles | CA | 0.304585103537747
Boulder | CO | 0.28880663996075
South San Francisco | CA | 0.248595987210001
Campbell | CA | 0.245514436831657
Rockville | MD | 0.24315823017573
Los Gatos | CA | 0.233216007276288
Pittsburgh | PA | 0.225499709366802
Burlington | MA | 0.223634772577002
San Francisco | CA | 0.219993519832261
Lexington | MA | 0.205000361475032
New York | NY | 0.203868079937506
Chicago | IL | 0.203682874006923
Cupertino | CA | 0.191836717363744
San Mateo | CA | 0.187723172713732
Mountain View | CA | 0.166926239441008
Irvine | CA | 0.16644409093795
Fremont | CA | 0.150247623895727
Cambridge | MA | 0.140990253943638
Houston | TX | 0.109011554442698
San Diego | CA | 0.067832637853121
Redwood City | CA | 0.0666242814798974
San Jose | CA | 0.0471224685794607
Durham | NC | 0.0331127017373814
Waltham | MA | 0.0258917169868607
Austin | TX | 0.00294040411546173
Santa Clara | CA | -0.00296623469124349
Sunnyvale | CA | -0.00667850429419462
(47 rows)
 
==2017 and 2018 Q1/2 updates==
 
The final files are in
E:\mcnair\Projects\VentureXpert Database
 
They include:
*US and TX SEL Amount - Done.xlsx
*Master Tables Sheet.docx
*Top 15 States (Based on All Time SELAmnt) SEL vs Year - Done.xlsx
*TX Cities Graphs and Data - Done.xlsx
*Top 100 Cities 2017 - Done.xlsx
*All States 2018 (Q1 and Q2) - Done.xlsx
*All States 2017 - Done.xlsx
*Top 100 Cities 2018 (Q1 And 2) - Done.xlsx
 
The database is '''vcdb3''' and the SQL scripts are in
E:\mcnair\Projects\VentureXpert Database\vcdb3\LoadingScripts
The database build is documented in [[VentureXpert Data#Ranking Tables and Graphs]]
 
==2016 Report==
 
===Project Location===
 
E:\McNair\Projects\Ecosystem\Ranking
 
The report is available from that directory as well https://www.bakerinstitute.org/research/top-100-us-startup-cities-2016/.
 
====Main SQL file location====
 
Ranking.sql
 
This file works on the database '''cities'''.
 
However, the cities database takes vc data from '''vcdb2'''. See [[VC Database Rebuild]] for information on the build.
 
The output from vcdb2 is '''roundleveloutput2''', which is exported as roundleveloutput2.txt
 
roundleveloutput2.txt is imported into cities as '''roundleveloutput2''', and then left joined with '''populationtablecorrected'''.
 
====Last Population SQL file location====
Z:\Hubs\2017\sql scripts
 
Z:\Hubs\2017\clean data\Population
 
One issue is that cities without populations in populationtablecorrected but in roundleveloutput2 get dropped later. For example, roundleveloutput2 contains Brooklyn, NY but populationtablecorrected doesn't (its New York count is therefore suspect too).
 
'''populationtablecorrected''' is loaded from files 'population1.txt' and 'population2.txt'. Dups are removed with a max.
 
Their construction notes are on [[Hubs#Population_Data]]

Navigation menu