Changes

Jump to navigation Jump to search
* [https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.squareform.html scipy.spatial.distance.squareform]
The HCA.py script uses several functions from another python module, [[:File:schedule_py.pdf|schedule.py]], which encodes agglomeration schedules produced by the AgglomerativeClustering package. The standard encoding [[https://scikit-learn.org/stable/auto_examples/cluster/plot_agglomerative_dendrogram.html#sphx-glr-auto-examples-cluster-plot-agglomerative-dendrogram-py records the agglomeration schedule as complete paths]], indicating which clusters are merged together at each step. The layer-cluster encoding provided in schedule.py instead efficiently records the agglomeration schedule as a series of layers. It also relies on only a single read of the source data, so it is fast.
The code snippets provided in [[:File:hierarchy-InsertSnippets_py.pdf|hierarchy_InsertSnippets.py]] modify the standard library provided in the [https://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html scipy.cluster.hierarchy package]. This code allows users to pre-calculate distances between locations (latitude-longitude pairs) using highly-accurate [https://postgis.net/docs/ST_Distance.html PostGIS spatial functions] in PostgreSQL. Furthermore, the code caches the results so, provided the distances fit into (high-speed) memory, it also allows users to increase the maximum feasible scale by around an order of magnitude. The code in hierarchy-InsertSnippets.py contains two snippets. The first snippet should be inserted at line 188 in the standard library. Then line 732 of the standard library should be commented out (i.e., #y = distance.pdist(y, metric)), and the second snippet should be inserted at line 734. A full copy of the amended [[:File:hierarchy_py.pdf|hierarchy.py]] is also available.

Navigation menu