Changes

#The elbow method is pretty questionable in its current form, so we are going to try using the elbow in the curvature (degree of concavity) instead.
#We might also try using elasticities...
#Rerun the distance calculations: avghulldisthm and avgdisthm are only computed for layers that we select with some method (like max R2). However, this table hadn't been updated for the elbow method (and possibly some other methods), so some distances would have been missing (and replaced with zeros in the Stata script).
#Create and run the new max R2 layer. In this variant, we'll use "the first layer a cluster number is reached as the representative layer for that cluster number"
I built two new curvature-based elbow methods, and so two new variables: curvaturelayer and curvaturelayerrestricted. They use the method described below and are identical except that curvaturelayerrestricted can't select layer 2. Neither can select the first or last layer, as both use central second differences.
 
For the example cities we have:
{| class="wikitable" style="vertical-align:bottom;"
|-
! place
! statecode
! year
! numstartups
! elbowlayer
! finallayer
! curvaturelayer
|-
| Menlo Park
| CA
| 2006
| 68
| 4
| 51
| 4
|-
| San Diego
| CA
| 2006
| 220
| 3
| 184
| 181
|-
| Campbell
| CA
| 2006
| 38
| 3
| 26
| 8
|-
| Charlotte
| NC
| 2006
| 30
| 3
| 30
| 28
|-
| Waltham
| MA
| 2006
| 106
| 3
| 58
| 55
|}
 
For these city-years, curvaturelayer is the same as curvaturelayerrestricted. As you can see, the selected layer is all over the place: it picks layer 4 of 51 for Menlo Park but layer 181 of 184 for San Diego. I really don't think we can say that this method 'works' for any real value of 'works'.
 
The raw data are on the sheet "Curvature Raw Data Examples" in ResultsV3-6.xlsx, and there are graphs for the selected cities on the sheet "Elbow Curvature Selected Cities".
 
====New MaxR2 Layer====
 
I noticed a copy-and-paste error in the do-file, so I re-ran the existing max R2 method too, just to be sure.
 
My process for the new method reuses the code for the old chosenhulllayer variable. Key variables are:
*firstlayer - the layer at which numclusters first reaches a given value
*regfirst - an indicator that selects the right set of layers on which to run the max R2 estimation
*chosenhullflayer - the variable that records the layer number selected using firstlayer and the max R2 method
*besthullflayer - the equivalent of besthulllayer, but using the first layers instead of the lowest-highest ones
*targetnumclustersf, besthullflayerisadded, maxr2flayerflag, etc.
*'''regmaxr2f''' and '''regbestf''' - the dataset constraints to use. Everything is pushed through the database and back to generate them.
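As a rough illustration of how firstlayer and regfirst relate, here is a minimal Python sketch (the actual work is done in a Stata do-file). The function names first_layers and regfirst_flags and the cluster counts are hypothetical; the sketch assumes layers are numbered from 1 and that each layer has a single numclusters value. The max R2 step itself is not reproduced here.

```python
from typing import Dict, List

def first_layers(numclusters: List[int]) -> Dict[int, int]:
    """Map each cluster count to the first (lowest) layer at which it occurs.

    numclusters[i] is the number of clusters at layer i + 1 (layers are 1-indexed).
    """
    first: Dict[int, int] = {}
    for layer, n in enumerate(numclusters, start=1):
        # setdefault keeps the earliest layer seen for each cluster count
        first.setdefault(n, layer)
    return first

def regfirst_flags(numclusters: List[int]) -> List[int]:
    """1 if the layer is the first layer reaching its cluster count, else 0."""
    first = first_layers(numclusters)
    return [int(first[n] == layer) for layer, n in enumerate(numclusters, start=1)]

# Hypothetical cluster counts for one city-year, layer 1 to layer 7
counts = [10, 9, 9, 7, 7, 7, 1]
print(first_layers(counts))    # {10: 1, 9: 2, 7: 4, 1: 7}
print(regfirst_flags(counts))  # [1, 1, 0, 1, 0, 0, 1]
```

Restricting the max R2 estimation to the flagged layers is what makes this variant use the first layer at which each cluster number is reached as its representative layer.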
 
The results for our sample cities are as follows:
{| class="wikitable" style="vertical-align:bottom;"
|-
! place
! statecode
! year
! finallayer
! chosenhulllayer
! style="font-weight:bold;" | chosenhullflayer
! elbowlayer
|-
| Campbell
| CA
| 2006
| 26
| 15
| style="font-weight:bold;" | 3
| style="font-weight:bold;" | 3
|-
| Charlotte
| NC
| 2006
| 30
| 14
| style="font-weight:bold;" | 3
| style="font-weight:bold;" | 3
|-
| Menlo Park
| CA
| 2006
| 51
| 33
| style="font-weight:bold;" | 21
| style="font-weight:bold;" | 4
|-
| San Diego
| CA
| 2006
| 184
| 141
| style="font-weight:bold;" | 12
| style="font-weight:bold;" | 3
|-
| Waltham
| MA
| 2006
| 58
| 31
| style="font-weight:bold;" | 3
| style="font-weight:bold;" | 3
|}
 
I built the max R2 graphs on the sheet '''New MaxR2''' in ResultsV3-6.xlsx.
====Jim's notes on the curvature====
We could instead use the forward first difference, which isn't available for the last observation (for which we can't compute a central second difference anyway) but is available for the first, and then increment the answer, much as Jim proposes decrementing it when using the backward difference. But since we can't use the first observation anyway (it has no central second difference), we gain nothing. So we'll use Jim's method verbatim, and declare the result null if it comes out as either the first or the last layer.
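To make the index bookkeeping concrete, here is a small Python sketch showing that the forward difference at one position is the same number as the backward difference at the next, so switching between them only shifts the layer index by one. The helper names forward_diff and backward_diff and the ratio values are hypothetical, and positions are 0-based list positions (position 0 is layer 1).

```python
from typing import List, Optional

def backward_diff(r: List[float], i: int) -> Optional[float]:
    """Backward first difference r[i] - r[i-1]; undefined at the first position."""
    if i < 1:
        return None
    return r[i] - r[i - 1]

def forward_diff(r: List[float], i: int) -> Optional[float]:
    """Forward first difference r[i+1] - r[i]; undefined at the last position."""
    if i + 1 >= len(r):
        return None
    return r[i + 1] - r[i]

# Hypothetical variance explained ratios (in percent); position 0 is layer 1
r = [0, 40, 76, 88, 97]

# The forward difference at i equals the backward difference at i + 1,
# so the choice only moves the answer by one layer.
for i in range(len(r) - 1):
    assert forward_diff(r, i) == backward_diff(r, i + 1)

print(forward_diff(r, 0), backward_diff(r, 0))  # 40 None
print(forward_diff(r, 4), backward_diff(r, 4))  # None 9
```

Either way one end of the layer range loses its first difference, and the central second difference already loses both ends, which is why neither choice recovers an extra layer.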
====Curvature====
{{Colored box|title=Specification|content=For layer <math>l</math>, I will compute the concavity curvature as -1 times the backward first difference in the variance explained ratio from layer <math>l+1</math>, divided by the central second difference in the variance explained ratio at layer <math>l</math>. The first and last layers are forbidden results.}}

The curvature results seem somewhat better than the elbow results but are still far from ideal. Here are some things I look for, or don't like, in a layer selection method:
*Interior solutions are good; collapsing to the bounds, especially the lower bound, is bad.
*Consistent solutions within cities are good: it's nice when adjacent years in the same city have more or less the same layer selected.
*Consistent solutions across cities are also good: when the method picks roughly similar layer indices (i.e., % unclustered) across cities, particularly conceptually similar cities, that's a plus.
*From other analysis, I know that the equilibrium of agglomeration forces occurs when agglomerations have fairly small average hull sizes, perhaps on the order of 10 hm<sup>2</sup>.
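A minimal Python sketch of the specification, reading it as <math>c_l = -\frac{r_{l+1} - r_l}{r_{l+1} - 2r_l + r_{l-1}}</math>, where <math>r_l</math> is the variance explained ratio at layer <math>l</math>. The specification doesn't say which curvature value determines the selected layer, so this sketch assumes the layer with the largest curvature is chosen; the function names curvature and curvature_layer and the ratio values are hypothetical. The restricted flag mimics curvaturelayerrestricted by also excluding layer 2.

```python
from typing import List, Optional

def curvature(r: List[float]) -> List[Optional[float]]:
    """Concavity curvature per layer, read from the specification as
    c_l = -(r[l+1] - r[l]) / (r[l+1] - 2*r[l] + r[l-1]).
    Position 0 is layer 1. The first and last layers have no central second
    difference, so they get None; a zero second difference also gives None.
    """
    out: List[Optional[float]] = [None] * len(r)
    for i in range(1, len(r) - 1):
        backward = r[i + 1] - r[i]               # backward difference from layer l+1
        second = r[i + 1] - 2 * r[i] + r[i - 1]  # central second difference at layer l
        if second != 0:
            out[i] = -backward / second
    return out

def curvature_layer(r: List[float], restricted: bool = False) -> Optional[int]:
    """Return the 1-indexed layer with the largest curvature (ASSUMED selection rule).
    restricted=True also excludes layer 2, as curvaturelayerrestricted does."""
    candidates = [
        (i + 1, c) for i, c in enumerate(curvature(r))
        if c is not None and not (restricted and i + 1 == 2)
    ]
    if not candidates:
        return None  # null result: no admissible layer
    # max() returns the first maximal entry, so ties go to the lower layer
    return max(candidates, key=lambda lc: lc[1])[0]

# Hypothetical variance explained ratios (in percent) for layers 1..5
r = [0, 40, 76, 88, 97]
print(curvature(r))                         # [None, 9.0, 0.5, 3.0, None]
print(curvature_layer(r))                   # 2
print(curvature_layer(r, restricted=True))  # 4
```

With these made-up values, forbidding layer 2 moves the pick from layer 2 to layer 4, a small-scale version of the instability seen in the example cities.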
===Version 3.5 build notes===
