
This section explores whether we could implement the '''actual''' elbow method (see https://en.wikipedia.org/wiki/Elbow_method_(clustering) ). The answer is that we might be able to, at least for some sub-sample of our data, but that it likely doesn't give us what we want.
 
I used distances calculated with ST_Distance and computed the '''variance explained''' using the equation below. The between-group variance is undefined for the first layer: it has <math>k=1</math> and <math>\bar{Y}_{i\cdot} = \bar{Y}</math> (i.e., its single all-encompassing hull's centroid is the overall mean), so its between-group variance is <math>n_i(0)^2/(0)</math>.
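The <math>n_i</math>, <math>\bar{Y}_{i\cdot}</math>, and <math>k-1</math> pieces come from the one-way ANOVA F statistic: with <math>k</math> hulls, <math>n_i</math> points in hull <math>i</math>, hull centroid <math>\bar{Y}_{i\cdot}</math>, overall mean <math>\bar{Y}</math>, and <math>N</math> points in total, a standard way to write the between-group to within-group variance ratio is:
 
:<math> \frac{\sum_{i=1}^{k} n_i \left(\bar{Y}_{i\cdot} - \bar{Y}\right)^2 / (k-1)}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(Y_{ij} - \bar{Y}_{i\cdot}\right)^2 / (N-k)} </math>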
 
I then calculated forward differences and added one to the answer, since using central differences would left-truncate the data. (An inspection of the data showed that the 'correct' answer is far more likely to be found at the left end of the data than at the right. Also, the central first difference bridges the observation, which can lead to misidentifying monotonicity.) Specifically, I used:
 
:<math> f'(x) = f(x + 1) - f(x) </math>
:<math> f''(x) = f(x+2) - 2 f(x+1) + f(x)</math>
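 
A minimal NumPy sketch of these two forward differences (the input name <code>varexp</code> is only illustrative; it stands for the series of per-layer variance explained):
 
<syntaxhighlight lang="python">
import numpy as np

def forward_differences(varexp):
    """Forward first and second differences of a 1-D series.

    f'(x)  = f(x + 1) - f(x)              -> length n - 1
    f''(x) = f(x + 2) - 2 f(x + 1) + f(x) -> length n - 2
    """
    varexp = np.asarray(varexp, dtype=float)
    d1 = varexp[1:] - varexp[:-1]                      # f'(x)
    d2 = varexp[2:] - 2 * varexp[1:-1] + varexp[:-2]   # f''(x)
    return d1, d2
</syntaxhighlight>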
 
I then used <math>f'(x)</math> to determine the layer index from which the variance explained was monotonic (i.e., there was no further change in sign of <math>f'(x)</math> at higher layer indices), found the layer index <math>i</math> at which <math>\mathrm{varexp}_i = \min(\mathrm{varexp})</math>, and marked <math>i+1</math> as the elbow layer.
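 
A minimal sketch of that selection in Python, under two assumptions that are not spelled out above: the undefined first-layer value has already been dropped from <code>varexp</code>, and the minimum is taken over the monotonic tail rather than over all layers:
 
<syntaxhighlight lang="python">
import numpy as np

def elbow_layer(varexp):
    """Pick the elbow layer from a series of per-layer variance explained.

    1. Use the forward first difference f'(x) to find the first layer
       index from which the series is monotonic (no later sign changes).
    2. Take the layer index i of the minimum variance explained in that
       monotonic tail (assumption: the minimum is restricted to the tail).
    3. Mark i + 1 as the elbow layer.
    """
    varexp = np.asarray(varexp, dtype=float)
    d1 = varexp[1:] - varexp[:-1]            # forward first difference
    signs = np.sign(d1)

    # first layer index after the last sign change of f'(x);
    # if the sign never changes, the whole series is monotonic
    changes = np.nonzero(signs[:-1] != signs[1:])[0]
    start = int(changes[-1]) + 1 if changes.size else 0

    i = start + int(np.argmin(varexp[start:]))   # index of minimum varexp
    return i + 1                                 # elbow layer
</syntaxhighlight>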
=====Background=====
