The objective is to apply the [https://en.wikipedia.org/wiki/Elbow_method_(clustering) Elbow Method], which involves finding the [https://en.wikipedia.org/wiki/Knee_of_a_curve Knee of the curve] of either the F-statistic or the variance explained.
 
I used distances calculated by ST_Distance and calculated the '''variance explained''' using the equation below. The between-group variance is undefined for the first layer: it has <math>k=1</math> and <math>\bar{Y}_{i\cdot} = \bar{Y}</math> (i.e., the layer is a single all-encompassing hull, so its centroid is the overall mean), and its between-group variance is then <math>n_i(0)^2/0</math>.
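A minimal sketch of this calculation for one layer, in Python. It rests on assumptions that are not stated above: points are planar ''(x, y)'' coordinates, <code>math.dist</code> stands in for ST_Distance, and variance explained is taken to be the ratio of the between-group variance, in its one-way ANOVA form
:<math>\frac{\sum_{i=1}^{k} n_i \left(\bar{Y}_{i\cdot} - \bar{Y}\right)^2}{k-1},</math>
to the total variance <math>\sum_{i}\sum_{j} (Y_{ij} - \bar{Y})^2 / (N-1)</math>, where <math>N</math> is the total number of points. The helper names are hypothetical.

<syntaxhighlight lang="python">
import math
from collections import defaultdict


def centroid(points):
    """Arithmetic mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def variance_explained(points, labels):
    """Hypothetical helper: between-group variance / total variance for one layer.

    points -- list of (x, y) tuples (assumed planar coordinates)
    labels -- hull/cluster id for each point, in the same order as points
    math.dist is used as a stand-in for ST_Distance.
    """
    n = len(points)
    overall = centroid(points)  # the overall mean, Y-bar
    groups = defaultdict(list)
    for p, g in zip(points, labels):
        groups[g].append(p)
    k = len(groups)
    if k < 2:
        # Single all-encompassing hull: n_i * 0^2 / 0, so the value is undefined.
        return math.nan
    # Between-group variance: sum of n_i * d(centroid_i, overall)^2, over k - 1.
    between = sum(
        len(members) * math.dist(centroid(members), overall) ** 2
        for members in groups.values()
    ) / (k - 1)
    # Total variance: sum of d(point, overall)^2, over N - 1.
    total = sum(math.dist(p, overall) ** 2 for p in points) / (n - 1)
    return between / total
</syntaxhighlight>

For example, the points <code>[(0, 0), (0, 1), (10, 0), (10, 1)]</code> with labels <code>[0, 0, 1, 1]</code> give 300/101 ≈ 2.97, while the same points as a single hull give <code>nan</code>. Because the two variances use different denominators (<math>k-1</math> and <math>N-1</math>), this ratio is not confined to [0, 1].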
 
I then calculated forward differences, rather than central differences, and added one to the answer, because central differences truncate the data on the left. (An inspection of the data revealed that the 'correct' answer is far more likely to be found at the left end of the data than at the right. Central first differences also bridge the observation, since the value at <math>x</math> itself is never used, which can lead to misidentifying monotonicity.) Specifically, I used:
 
:<math> f'(x) = f(x + 1) - f(x) </math>
:<math> f''(x) = f(x+2) - 2 f(x+1) + f(x)</math>
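The difference formulas above can be sketched in Python as follows, assuming the per-layer values are held in a plain list indexed by layer; the function names are hypothetical. The central difference is included only for contrast: it needs a neighbour on both sides, so it yields no value for the first layer, whereas the forward difference loses only the last one.

<syntaxhighlight lang="python">
def forward_diff(f):
    """First forward difference: f'(x) = f(x+1) - f(x); one value shorter than f."""
    return [f[x + 1] - f[x] for x in range(len(f) - 1)]


def forward_diff2(f):
    """Second forward difference: f''(x) = f(x+2) - 2 f(x+1) + f(x)."""
    return [f[x + 2] - 2 * f[x + 1] + f[x] for x in range(len(f) - 2)]


def central_diff(f):
    """Central first difference (f(x+1) - f(x-1)) / 2, defined for 1 <= x <= len(f) - 2.

    It never uses f(x) itself, i.e. it bridges the observation at x.
    """
    return [(f[x + 1] - f[x - 1]) / 2 for x in range(1, len(f) - 1)]


values = [0, 1, 4, 9, 16]
print(forward_diff(values))   # [1, 3, 5, 7]
print(forward_diff2(values))  # [2, 2, 2]
print(central_diff(values))   # [2.0, 4.0, 6.0]
</syntaxhighlight>

Here <code>forward_diff(values)[x]</code> describes the step from layer <math>x</math> to layer <math>x+1</math>, and applying <code>forward_diff</code> twice gives the same result as <code>forward_diff2</code>.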
 
I then used f'(x) to determine the layer index from which the variance explained was monotonic (i.e., there was no change in sign in f'(x) at any higher layer index), found the layer index <math>i</math> at which <math>varexp_i = \min(varexp)</math>, and marked <math>i+1</math> as the elbow layer.
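A literal reading of this rule, sketched in Python. Two assumptions are not stated above: the minimum is taken only over the monotonic tail, and any leading undefined (NaN) layers, such as the first layer, are skipped. Zero differences are treated as compatible with either direction. The name <code>elbow_layer</code> is hypothetical.

<syntaxhighlight lang="python">
import math


def elbow_layer(varexp):
    """Hypothetical helper: return the elbow layer index for per-layer values.

    varexp[x] is the variance explained at layer index x; leading NaNs are skipped.
    """
    start = 0
    while start < len(varexp) and math.isnan(varexp[start]):
        start += 1
    if start >= len(varexp):
        raise ValueError("no defined variance-explained values")

    # Forward differences f'(x) = f(x+1) - f(x), from the first defined layer.
    diffs = [varexp[x + 1] - varexp[x] for x in range(start, len(varexp) - 1)]

    # The monotonic tail begins at the layer where f'(x) last changes sign.
    tail_start, last_sign = start, 0
    for j, d in enumerate(diffs):
        s = (d > 0) - (d < 0)  # sign of d: -1, 0 or +1
        if s == 0:
            continue
        if last_sign != 0 and s != last_sign:
            tail_start = start + j
        last_sign = s

    # Layer index i with the minimum variance explained in the tail; mark i + 1.
    i = min(range(tail_start, len(varexp)), key=lambda x: varexp[x])
    return i + 1
</syntaxhighlight>

For example, <code>elbow_layer([math.nan, 0.5, 0.3, 0.6, 0.7, 0.8])</code> returns 3: f'(x) last changes sign at layer 2, so the tail is layers 2 to 5, its minimum is at <math>i=2</math>, and layer 3 is marked.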
====Fixing the layer index====