Changes

Urban Start-up Agglomeration and Venture Capital Investment (view source)

Revision as of 17:08, 12 November 2020

The objective is to apply the [https://en.wikipedia.org/wiki/Elbow_method_(clustering) Elbow Method], which involves finding the [https://en.wikipedia.org/wiki/Knee_of_a_curve Knee of the curve] of either the F-statistic or variance explained.

I used distances calculated by ST_Distance and calculated the '''variance explained''' using the following equations, where <math>Y_{ij}</math> is the <math>j</math>-th observation in group <math>i</math>, <math>n_i</math> is the size of group <math>i</math>, <math>\bar{Y}_{i\cdot}</math> is the mean of group <math>i</math>, and <math>\bar{Y}</math> is the overall mean:

:<math>SS_{exp}=\sum_{i=1}^{K} n_i(\bar{Y}_{i\cdot} - \bar{Y})^2</math>

:<math>SS_{unexp}=\sum_{i=1}^{K}\sum_{j=1}^{n_{i}} \left( Y_{ij}-\bar{Y}_{i\cdot} \right)^2</math>

:<math>R^2 = \frac{SS_{exp}}{SS_{exp}+SS_{unexp}}</math>
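As a minimal sketch (not the code in the do file), the three equations above can be computed in Python, assuming the observations are supplied as one list per group, so that `groups[i][j]` plays the role of <math>Y_{ij}</math>. The function name `variance_explained` and the toy numbers are hypothetical.

```python
from statistics import fmean


def variance_explained(groups):
    """R^2 = SS_exp / (SS_exp + SS_unexp) for a list of non-empty groups.

    Hypothetical helper (illustrative sketch): groups[i][j] plays the role of Y_ij.
    """
    all_values = [y for g in groups for y in g]
    grand_mean = fmean(all_values)            # overall mean, Y-bar
    means = [fmean(g) for g in groups]        # group means, Y-bar_i.
    # Between-group sum of squares: sum_i n_i (Y-bar_i. - Y-bar)^2
    ss_exp = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: sum_i sum_j (Y_ij - Y-bar_i.)^2
    ss_unexp = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    total = ss_exp + ss_unexp
    if total == 0:
        return float("nan")  # all observations identical: R^2 is undefined
    return ss_exp / total
```

For example, `variance_explained([[1, 2, 3], [7, 8, 9]])` gives <math>SS_{exp} = 54</math> and <math>SS_{unexp} = 4</math>, so <math>R^2 = 54/58 \approx 0.931</math>. A single group always gives <math>R^2 = 0</math>, since its mean is the overall mean.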

I then calculated forward differences and added one to the resulting layer index, since using central differences truncates the data at the left end. (An inspection of the data revealed that the 'correct' answer is far more likely to be found at the left end of the data than at the right. Also, central first differences bridge the observation: the central difference at <math>x</math> uses <math>f(x+1)</math> and <math>f(x-1)</math> but not <math>f(x)</math> itself, which can lead to misidentification of monotonicity.) Specifically, I used:

:<math> f''(x) = f(x+2) - 2 f(x+1) + f(x)</math>
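A minimal Python sketch of the forward differences, assuming the variance explained for one city-year is stored in a list indexed by layer. The function names are hypothetical, and the central first difference is included only for comparison.

```python
def forward_diff(f):
    """Hypothetical helper: f'(x) = f(x+1) - f(x), for x = 0 .. len(f)-2."""
    return [f[x + 1] - f[x] for x in range(len(f) - 1)]


def forward_diff2(f):
    """Hypothetical helper: f''(x) = f(x+2) - 2 f(x+1) + f(x), for x = 0 .. len(f)-3."""
    return [f[x + 2] - 2 * f[x + 1] + f[x] for x in range(len(f) - 2)]


def central_diff(f):
    """Central first difference (f(x+1) - f(x-1)) / 2, for x = 1 .. len(f)-2.

    It is undefined at both ends and never uses f(x) itself.
    """
    return [(f[x + 1] - f[x - 1]) / 2 for x in range(1, len(f) - 1)]
```

For example, with `f = [0, 5, 0]` the only central difference is `0.0`, which hides the peak at the middle layer, whereas `forward_diff(f)` returns `[5, -5]` and shows the change in sign. `forward_diff2(f)` equals `forward_diff(forward_diff(f))`.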

I required that a city-year have more than two layers, as it takes at least three layers to form an elbow. I then used <math>f'(x)</math> to determine the layer index from which the variance explained was monotonic (i.e., there was no change in sign in <math>f'(x)</math> at higher layer indices). Non-monotonicity wasn't an issue when using the population variance explained. In an earlier version, when we used the sample variance explained, we had some non-monotonic sections of the curve resulting from integer division (<math>\frac{k-1}{n-k}</math>).
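The monotonicity check could be sketched as follows, assuming the same list-per-city-year layout. `monotonic_start` is a hypothetical name, and treating zero differences as compatible with either direction is an assumption rather than something stated above.

```python
def sign(v):
    """Return 1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def monotonic_start(f):
    """Hypothetical helper: smallest layer index s such that f is monotonic from s on.

    Uses forward first differences d[x] = f[x+1] - f[x]. Zero differences are
    treated as compatible with either direction (an assumption). Returns None
    when there are two or fewer layers, since an elbow needs at least three.
    """
    if len(f) <= 2:
        return None
    d = [f[x + 1] - f[x] for x in range(len(f) - 1)]
    direction = 0  # sign of the monotonic tail found so far (0 = not yet known)
    # Walk backwards from the last difference and stop at the first sign change.
    for x in range(len(d) - 1, -1, -1):
        s = sign(d[x])
        if s == 0:
            continue
        if direction == 0:
            direction = s
        elif s != direction:
            return x + 1  # d[x] breaks monotonicity, so the tail starts at layer x + 1
    return 0  # the whole curve is monotonic
```

For example, `monotonic_start([0, 5, 3, 4, 6, 7])` returns `2`: the first differences are `[5, -2, 1, 2, 1]`, and the last change in sign is between the second and third differences. The integer-division problem from the earlier version has a direct Python analogue: `(k - 1) // (n - k)` floors the ratio (e.g. `(3 - 1) // (10 - 3)` is `0`), whereas `(k - 1) / (n - k)` does not.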

I used <math>f''(x)</math> to find the layer index <math>i</math> at which <math>f''(i) = \min_x f''(x)</math> (for elbowlayer) or at which <math>f''(i) = \max_x f''(x)</math> (for elbowmaxlayer) for each city-year. I then marked <math>i+1</math> as the elbow (or elbowmax) layer for that city-year, as we are using forward differences, not central differences: the forward second difference at <math>i</math> is centered on layer <math>i+1</math>. Note that the biggest change in slope could be found using <math>\max_x |f''(x)|</math>, but this is essentially always <math>\min_x f''(x)</math>, i.e., the elbow layer, as the changes in slope are mostly negative. However, the changes in slope do often go positive, and the elbowmax layer captures the biggest positive change in slope.
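A minimal sketch of the elbow and elbowmax selection, assuming one list of variance explained values per city-year. `elbow_layers` is a hypothetical name, ties are resolved in favor of the lowest layer (an assumption), and the returned values are positions in the input list.

```python
def elbow_layers(varexp):
    """Hypothetical helper: return (elbowlayer, elbowmaxlayer) for one city-year.

    f''(x) = f(x+2) - 2 f(x+1) + f(x) is a forward second difference centered
    on layer x + 1, so the index i of its min / max is shifted by one.
    Ties go to the lowest index (an assumption). Returns None for < 3 layers.
    """
    if len(varexp) < 3:
        return None
    f2 = [varexp[x + 2] - 2 * varexp[x + 1] + varexp[x]
          for x in range(len(varexp) - 2)]
    i_min = min(range(len(f2)), key=f2.__getitem__)  # most negative change in slope
    i_max = max(range(len(f2)), key=f2.__getitem__)  # most positive change in slope
    return i_min + 1, i_max + 1
```

For example, `elbow_layers([0, 50, 70, 75, 90, 92])` computes <math>f''</math> as `[-30, -15, 10, -13]` and returns `(1, 3)`: the elbow is at the layer after the most negative second difference, and the elbowmax at the layer after the most positive one.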

'''I created a new build (version 3.3) of the dataset, do file and log file, which includes the population variance explained elbow method, as well as the elbowmax method. It's in the Dropbox.''' Note that the lens found by the population elbow method is slightly bigger than the lenses found using the sample elbow method from before, but the lens found using the elbowmax method is about the same size as the one found by the sample elbow method, if not slightly smaller. I'm not sure about the justification for the elbowmax method, though.

====Fixing the layer index====

Ed
