Urban Start-up Agglomeration and Venture Capital Investment

Revision as of 20:36, 16 September 2017

Academic Paper
Title Urban Start-up Agglomeration
Author Ed Egan
RAs Peter Jalbert, Jake Silberman, Christy Warden
Status In development
© edegan.com, 2016


Summary

Agglomeration is generally thought to be one of the most important determinants of growth for urban entrepreneurship ecosystems. However, there is essentially no empirical evidence to support this. This paper takes advantage of geocoding and introduces a novel measure of agglomeration: the smallest total circle area that covers all startup offices, subject to having at least n startups in each circle. Using GIS data on cities, this paper controls for the density and socio-demographics of an area to identify the effect of agglomeration alone.

Description

Clusters of economic activity play a significant role in firms' performance and growth. An important driver of growth is knowledge spillover between firms, including the flow of information and ideas, which can be especially important for the growth of startups and small businesses. This project focuses on the effects of agglomeration on the performance and growth of startup firms. It introduces a novel measure of agglomeration that can be used to test the effects of clustering empirically. This measure is the smallest total circle area that covers all of the startups in the sample such that there are at least n firms in each circle. The project is based on an algorithm that produces an unbiased measure for use in the empirical analysis. The regression we are interested in takes the following form:

[Image: Regression equation.png]

The dependent variable is a measure of firm growth, such as investment one period ahead or growth in investment. The explanatory variables are the number of startup firms, m, the agglomeration measure, A, and a vector of other controls affecting firm growth at time t. Because the circle area (the agglomeration measure, A) is endogenous, an instrumental variable is needed to obtain consistent estimates of the effects of interest. The proposed instrument is the presence of a river or road between the points representing the geographical locations of venture capital-backed firms. The instrument is assumed to affect agglomeration without having a direct impact on growth, which makes it a good candidate for a valid instrument. The next tasks are to determine the additional control variables to include in the regression, the years to include in the analysis, and a method for finding an unbiased measure of agglomeration.
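One way to write a specification consistent with this description is sketched below. The symbols other than m, A and t are assumptions introduced here for illustration (i indexes cities, y is the growth measure, X the vector of controls, Z the river/road instrument), and the form in the original equation image may differ.

```latex
% Second stage: growth one period ahead on agglomeration, startup count and controls
y_{i,t+1} = \beta_0 + \beta_1 A_{i,t} + \beta_2 m_{i,t} + X_{i,t}'\gamma + \varepsilon_{i,t}

% First stage: agglomeration instrumented by the presence of a river or road
A_{i,t} = \pi_0 + \pi_1 Z_{i} + \pi_2 m_{i,t} + X_{i,t}'\delta + u_{i,t}
```

Under this sketch, the instrument is valid if Z is correlated with A (\pi_1 \neq 0) and uncorrelated with \varepsilon.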

Data

Making the circle input data

Ed's additional datawork is in

Z:\VentureCapitalData\SDCVCData\vcdb2\ProcessingCoLevelSimple.sql

The key table for circle processing is CoLevelBlowout, which is restricted to cities with more than 10 active VC-backed firms at some point in the data to make CoLevelForCircles.

We need to:

  1. Winsorize CoLevelBlowout
  2. Compute the circles!
  3. Make the Bay Area (over time) data
  4. Plot the Bay Area data (with colors per Bay Area city) for 1985 to present
  5. Combine the plots to make an animated gif
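As an illustration of the first step, a minimal winsorization sketch follows. The page does not say which columns of CoLevelBlowout are winsorized or at which percentiles, so the function name, the 1st/99th percentile defaults and the nearest-rank quantile rule are all assumptions.

```python
def winsorize(values, lower_pct=1, upper_pct=99):
    """Clamp values outside the given percentiles to the percentile values.

    Percentiles are integers in [0, 100]; the cutoffs are taken from the
    sorted sample with a simple nearest-rank rule (an assumption, other
    quantile definitions are equally plausible).
    """
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)
    # Integer arithmetic avoids floating-point surprises in the index.
    low = ordered[(lower_pct * (n - 1)) // 100]
    high = ordered[(upper_pct * (n - 1)) // 100]
    return [min(max(v, low), high) for v in values]


if __name__ == "__main__":
    # Hypothetical data: 100 ordinary values and one extreme outlier.
    sample = list(range(1, 101)) + [10000]
    clipped = winsorize(sample)
    print(min(clipped), max(clipped))
```

Winsorizing clamps extreme values rather than dropping them, so the number of observations per city and year is unchanged.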

Main Sources

The primary sources of data for this project are:

  • SDC VentureXpert - from VC Database Rebuild, the key table is
  • GIS City Data
  • Data on NSF, NIH, population, income, clinical trials, employment, schooling, R&D expenditures and revenue of firms can be found in Hubs.

VC data

Data on the number of new VC-backed firms in each city and year is in:

Z:\Hubs\2017\clean data
The name of the file is firm_nr.txt.

The database is cities. The SQL script is nr_firms.sql.

Raw data is in:

Z:\VentureCapitalData\SDCVCData\vcdb2
The file is colevelsimple.txt

To check for outliers, I compute the average coordinates for each city and take the difference between each firm's coordinates and its city's average. The script for the average city coordinates is in

Z:\Hubs\2017\sql scripts and the file name is newcolevel.sql.

The differences are taken in Excel. The file containing the differences is in

Z:\Hubs\2017 and the file name is new_colevel.txt.
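The outlier check described above (city-average coordinates, then each firm's difference from its city's average) can be sketched in a few lines. This is not the contents of newcolevel.sql or the Excel workbook; the tuple layout (city, latitude, longitude) and the function names are assumptions made for the example.

```python
from collections import defaultdict


def city_centroids(firms):
    """Return {city: (mean_lat, mean_lon)} for a list of (city, lat, lon)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for city, lat, lon in firms:
        acc = sums[city]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {city: (s[0] / s[2], s[1] / s[2]) for city, s in sums.items()}


def coordinate_differences(firms):
    """Return (city, lat - mean_lat, lon - mean_lon) for every firm."""
    centroids = city_centroids(firms)
    return [
        (city, lat - centroids[city][0], lon - centroids[city][1])
        for city, lat, lon in firms
    ]


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from colevelsimple.txt.
    firms = [("A", 1.0, 2.0), ("A", 3.0, 4.0), ("B", 10.0, 10.0)]
    for row in coordinate_differences(firms):
        print(row)
```

Firms with unusually large differences in either coordinate are the candidates for geocoding errors.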

Data on the circle area in each city and year is in:

Z:\Hubs\2017\clean data
The name of the file is circles.txt. It contains only 106 observations.

The database is cities. The SQL script is circles.sql.

The script for joining the two tables on the VC table is in:

Z:\Hubs\2017\sql scripts
The name of the file is new_firm_nr_circles.sql

We use the cities with more than 10 active VC-backed firms. Data on the cities and number of active firms is in:

E:\McNair\Projects\Hubs\Summer 2017
The file is CitiesWithGT10Active.txt

The script for joining the final data with this file is located in

Z:\Hubs\2017\sql scripts
The file name is final_joined_kerda.sql.

The final data is in

Z:\Hubs\2017\clean data
The file name is new_final_kerda.txt.

Accelerator data

Accelerator data is in

Z:\Hubs\2017\clean data
The file name is accelerators.txt
The table is accelerators

The accelerator data joined with the VC table is in the joined_accelerators table. The script is in

Z:\Hubs\2017\sql scripts
The file name is join_accelerators.sql

The do file is in

Z:\Hubs\2017\kerda
The name is agglomeartion_kerda.do

It includes the graphs, tables and the preliminary FE regressions with VC funding amount and growth rate. It also predicts the hazard rates and matches on the hazard rate to create synthetic control and treatment groups. What is left to do is to add 2 lagged and 3 forward observations for the cities that have a match, and to remove the overlapping observations for years that receive a treatment but at the same time serve as a control.
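The remaining step could look roughly like the sketch below. It is one interpretation, not the logic in the do file: it assumes each match is a (treated city, control city, treatment year) triple, builds a window from 2 years before to 3 years after the treatment year, and drops control city-years that also appear as treated. All names are hypothetical.

```python
def event_window(year, lags=2, leads=3):
    """Years from `lags` before to `leads` after the treatment year."""
    return list(range(year - lags, year + leads + 1))


def build_panel(matches, lags=2, leads=3):
    """Return (treated, control) lists of (city, year) observations.

    matches: list of (treated_city, control_city, treatment_year).
    Control observations that coincide with a treated city-year are removed.
    """
    treated = set()
    control = set()
    for treated_city, control_city, year in matches:
        for y in event_window(year, lags, leads):
            treated.add((treated_city, y))
            control.add((control_city, y))
    control -= treated  # drop city-years that are both treated and control
    return sorted(treated), sorted(control)


if __name__ == "__main__":
    # Hypothetical matches, for illustration only.
    matches = [("Austin", "Boulder", 2010), ("Boulder", "Denver", 2012)]
    treated, control = build_panel(matches)
    print(len(treated), len(control))
```

Here Boulder serves as a control for 2008 to 2013 and is treated for 2010 to 2015, so only its 2008 and 2009 observations remain in the control group.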

See also

Also:


Unbiased measure

The number of startups affects the total area of the circles according to some function. The task is to find an unbiased measure of the area, one that is not affected by the number of startups, given their size and distribution.
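To see why the raw area depends on the number of startups, consider the simplest case of a single circle covering every point. The sketch below computes the smallest enclosing circle by brute force over pairs and triples of points. It is not the project's algorithm, which has to choose several circles with at least n firms each. It also treats coordinates as planar, an assumption that real latitude/longitude data would only satisfy after projection.

```python
import math
from itertools import combinations


def _circle_from_two(p, q):
    """Circle with the segment pq as diameter: (cx, cy, r)."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, math.dist(p, q) / 2)


def _circle_from_three(p, q, r):
    """Circumcircle of three points, or None if they are collinear."""
    ax, ay = p
    bx, by = q
    cx, cy = r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), p))


def _covers(circle, points, eps=1e-9):
    x, y, r = circle
    return all(math.dist((x, y), p) <= r + eps for p in points)


def smallest_enclosing_circle(points):
    """Brute force: the optimum is defined by 2 or 3 of the points."""
    if not points:
        raise ValueError("need at least one point")
    if len(points) == 1:
        return (points[0][0], points[0][1], 0.0)
    candidates = [_circle_from_two(p, q) for p, q in combinations(points, 2)]
    for triple in combinations(points, 3):
        circle = _circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    return min((c for c in candidates if _covers(c, points)), key=lambda c: c[2])


def enclosing_area(points):
    return math.pi * smallest_enclosing_circle(points)[2] ** 2


if __name__ == "__main__":
    # Hypothetical planar coordinates.
    square = [(0, 0), (2, 0), (0, 2), (2, 2)]
    print(enclosing_area(square))
    print(enclosing_area(square + [(4, 0)]))
```

Adding a point can never shrink the enclosing circle, so the area is weakly increasing in the number of startups even when their underlying spatial distribution is unchanged. That mechanical dependence is what an unbiased measure would need to remove.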

For the unbiased calculation of a measure in a different context see: http://users.nber.org/~edegan/w/images/d/d0/Hall_(2005)_-_A_Note_On_The_Bias_In_Herfindahl_Type_Measures_Based_On_Count_Data.pdf
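For intuition, the kind of correction involved in the Herfindahl case can be derived directly. The sketch assumes N counts drawn multinomially across categories with true shares p_j. The notation is chosen here and may not match the note's. Whether an analogous closed form exists for the circle area is an open question for this project.

```latex
% Observed Herfindahl from counts n_j with \sum_j n_j = N, and the true value \eta
H = \sum_j \left(\frac{n_j}{N}\right)^2, \qquad \eta = \sum_j p_j^2

% Multinomial moments: E[n_j^2] = N p_j (1 - p_j) + N^2 p_j^2, so
E[H] = \eta + \frac{1 - \eta}{N}

% Solving for \eta gives an estimator that is unbiased under these assumptions
\hat{\eta} = \frac{N H - 1}{N - 1}
```

The bias term (1 - \eta)/N shrinks as N grows, which is the same qualitative problem as here: a measure computed from few observations is mechanically distorted by the count itself.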

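GIS Resources

  • https://www.census.gov/geo/maps-data/data/tiger-line.html
  • https://www.census.gov/geo/maps-data/data/tiger.html
  • http://postgis.net/features/
  • https://en.wikipedia.org/wiki/GIS_file_formats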