All of our data assembly, and much of our data processing and analysis, is done in a [https://www.postgresql.org/ PostgreSQL] database with the [https://postgis.net/ PostGIS] extension. See our [[Research Computing Infrastructure]] page for more information.
However, we rely on [https://www.python.org/ Python] scripts to retrieve addresses from Google Maps, to compute the [https://en.wikipedia.org/wiki/Hierarchical_clustering Hierarchical Cluster Analysis (HCA)] itself, and to estimate a cubic that determines the HCA-regression method's agglomeration count for an [https://en.wikipedia.org/wiki/Metropolitan_statistical_area MSA]. We also use two [https://www.stata.com/ Stata] scripts: one to compute the HCA-regressions, and another to estimate the paper's summary statistics and regression specifications. Finally, we use QGIS to construct the map images from queries to our database. These images use a [https://maps.google.com Google Maps] base layer.
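For readers unfamiliar with HCA, the following is a minimal sketch of agglomerative clustering on startup locations. It is not our production script. The single-linkage rule, the haversine distance, the function names <code>haversine_km</code> and <code>hca_layers</code>, and the sample coordinates are all assumptions made for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def hca_layers(points):
    """Return the clusters at every layer of a single-linkage HCA.

    layers[0] has one cluster per point. Each later layer merges the two
    closest clusters, ending with one cluster that holds every point.
    Single linkage is an assumption here, not necessarily the paper's choice.
    """
    clusters = [[i] for i in range(len(points))]
    layers = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best = None
        # Naive search for the closest pair of clusters (fine for a sketch).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(haversine_km(points[p], points[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        layers.append([list(c) for c in clusters])
    return layers


# Hypothetical (lat, lon) pairs: two points near Houston, two near Austin.
points = [(29.76, -95.37), (29.75, -95.36), (30.27, -97.74), (30.28, -97.75)]
layers = hca_layers(points)
```

Each entry of <code>layers</code> is one candidate set of agglomerations, running from every startup on its own down to a single cluster. Choosing which layer to report is a separate step that this sketch does not cover.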
== Data Processing Steps ==
The script [[:File:Agglomeration_CBSA.sql.pdf|Agglomeration_CBSA.sql]] provides the processing steps within the PostgreSQL database. We first load the startup data, add the longitudes and latitudes, and combine them with the [https://en.wikipedia.org/wiki/Core-based_statistical_area CBSA] boundaries. Startups in our data are keyed by a triple (coname, statecode, datefirstinv), as two different companies can have the same name in different states, or within the same state at two different times.
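The composite key can be illustrated with a small sketch. It uses Python's built-in SQLite module as a stand-in for PostgreSQL. The table name <code>startups</code>, the column types, and the sample rows are hypothetical and are not taken from Agglomeration_CBSA.sql.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE startups (
        coname       TEXT NOT NULL,
        statecode    TEXT NOT NULL,
        datefirstinv TEXT NOT NULL,
        PRIMARY KEY (coname, statecode, datefirstinv)
    )
""")

# Hypothetical rows: the same name in two states, and the same name in
# one state at two different times. All three are distinct startups.
rows = [
    ("Acme Labs", "TX", "2001-03-15"),
    ("Acme Labs", "CA", "2001-03-15"),
    ("Acme Labs", "TX", "2012-07-01"),
]
conn.executemany("INSERT INTO startups VALUES (?, ?, ?)", rows)


def insert_is_rejected(row):
    """Return True if inserting row violates the composite primary key."""
    try:
        conn.execute("INSERT INTO startups VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return True
    return False


startup_count = conn.execute("SELECT COUNT(*) FROM startups").fetchone()[0]
duplicate_rejected = insert_is_rejected(("Acme Labs", "TX", "2001-03-15"))
```

Keying on the name alone would collapse these three rows into one startup. The triple keeps them apart while still rejecting an exact repeat.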
[[File:AgglomerationProcess_v2.png|center|thumb|768px|Data Processing Steps]]
