====Implementing The Elbow Method====
This section explores whether we could implement the '''actual''' elbow method (see https://en.wikipedia.org/wiki/Elbow_method_(clustering)). The answer is that we might be able to, at least for some sub-sample of our data, but that it likely doesn't give us what we want.
=====Background=====
The elbow method plots the number of clusters (on x) against the percentage of variance explained (on y) and finds the elbow. The elbow is the point at which the "diminishing returns [in variance explained] are no longer worth the additional cost [of adding another cluster]". For the variance explained there are two main options:
where <math>\mu</math> is the average value. That is,
:<math>\mu = \frac{1}{n}\sum_{i=1}^n x_i .</math>
=====Practical Consequences=====
It is possible that any calculation of variance using the full sample of our data (layers x city-years) is computationally infeasible. It seems particularly unlikely that we are going to manage between-group variance. I had problems in the past calculating mean distances between all centroids for just hulls, let alone for all geometries! We could, however, do this for some meaningful sub-population.

There is also the question as to whether this approach is sensible in our context. In its native form, we'd be selecting the number of statistical clusters. We could readily use it to select the number of hulls (economic clusters) instead. But, in either case, we'd have to be within-city for this to make sense.

We could normalize the number of clusters, dividing it by the maximum, to deal with the 'cities are different' problem. That is, we could put %unclustered (later called %complete) on the x-axis and %variance explained on the y-axis and fit a curve to a plot of city-year-layers. We could then pick a %unclustered value and apply it across cities. The difference between this and the 'heuristic method' is that we'd be choosing based on diminishing marginal returns in variance explained as opposed to in percentage locations in hulls.
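As a rough illustration of what this would involve for a single city-year-layer, the sketch below computes the share of variance explained for one assignment of locations to clusters, and then looks for an elbow on a normalized curve. It rests on several assumptions that are not from our pipeline: variance explained is taken as between-group over total sum of squares on plain (x, y) coordinates, the elbow is located with a simple "farthest from the chord" rule (one common heuristic, not necessarily the one we would adopt), and the names <code>variance_explained</code> and <code>find_elbow</code> are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

Point = Tuple[float, float]


def variance_explained(points: Sequence[Point], labels: Sequence[Hashable]) -> float:
    """Between-group sum of squares divided by total sum of squares.

    Assumption: locations are plain (x, y) pairs and squared Euclidean
    distance is an acceptable stand-in for whatever distance we really use.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    total = sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in points)
    if total == 0:
        return 1.0  # all locations coincide, so nothing is left to explain

    # Group the locations by their cluster label.
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)

    # Each cluster contributes its size times the squared distance
    # from its centroid to the overall mean.
    between = 0.0
    for members in groups.values():
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        between += len(members) * ((cx - mean_x) ** 2 + (cy - mean_y) ** 2)
    return between / total


def find_elbow(xs: Sequence[float], ys: Sequence[float]) -> int:
    """Index of the point farthest from the chord joining the two end points.

    Assumption: this chord-distance rule is only one heuristic for an elbow.
    The distance is left unnormalized because only the argmax matters.
    """
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    best_index, best_distance = 0, -1.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        distance = abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0))
        if distance > best_distance:
            best_index, best_distance = i, distance
    return best_index


if __name__ == "__main__":
    # Toy city: two tight pairs of locations, far apart from each other.
    locations: List[Point] = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    labelings = [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 2], [0, 1, 2, 3]]
    max_clusters = len(locations)
    # x-axis: number of clusters divided by the maximum (the normalization above).
    xs = [len(set(labels)) / max_clusters for labels in labelings]
    ys = [variance_explained(locations, labels) for labels in labelings]
    elbow = find_elbow(xs, ys)
    print(xs[elbow], ys[elbow])
```

Note that the cost concern above still applies: this is linear in the number of locations for a single labeling, but it has to be repeated for every candidate number of clusters in every city-year-layer, which is where a sub-population would help.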
====The Elbow Method Justification====
