Difference between revisions of "Hierarchical Clustering"


Revision as of 19:11, 27 May 2019


McNair Project
Hierarchical Clustering
Project Information
Project Title Hierarchical Clustering
Owner Kyran Adams, Oliver Chang
Start Date 12/1/2017
Deadline
Keywords Cluster, Clustering, Circles, Pain in the ass, Agglomeration
Primary Billing
Notes
Project Status Active


Summary

The code is in:

E:\projects\hca

The Python 3 script is main.py.

The code uses AgglomerativeClustering from sklearn.cluster (scikit-learn), which runs on the CPU only and has no GPU support.
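As a rough illustration of the call involved, here is a minimal sketch of clustering a handful of points with AgglomerativeClustering. The sample coordinates, the helper name cluster_points, the use of raw (lat, lon) pairs with Euclidean distance, and the "ward" linkage (scikit-learn's default) are all assumptions for the example; the settings in main.py may differ.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical (lat, lon) points: two near Houston, two near Boston.
points = np.array([
    [29.76, -95.37],
    [29.72, -95.40],
    [42.36, -71.06],
    [42.37, -71.11],
])


def cluster_points(coords, k):
    """Return one cluster label per row of coords, using k clusters.

    Assumption: plain Euclidean distance on raw lat/lon values, which is
    only an approximation of real distance on the globe.
    """
    model = AgglomerativeClustering(n_clusters=k, linkage="ward")
    return model.fit_predict(coords)


labels = cluster_points(points, 2)
print(labels)
```

The label numbers themselves are arbitrary; what matters is which points share a label.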

The input is a tab-delimited text (tdt) file named CoLevelForCircles.txt with 7 columns:

city state year lat lon coname datefirstinv
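A file with these columns could be read along the following lines. This is only a sketch: it assumes the file has a header row with exactly these column names, and the function name read_colevel and the sample row are made up for illustration.

```python
import csv
import io

# Column names as listed above.
COLUMNS = ["city", "state", "year", "lat", "lon", "coname", "datefirstinv"]


def read_colevel(handle):
    """Parse tab-delimited rows into dicts with numeric year, lat and lon.

    Assumption: the first line of the file is a header row.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        row["year"] = int(row["year"])
        row["lat"] = float(row["lat"])
        row["lon"] = float(row["lon"])
        rows.append(row)
    return rows


# Hypothetical one-row sample standing in for CoLevelForCircles.txt.
sample = io.StringIO(
    "city\tstate\tyear\tlat\tlon\tconame\tdatefirstinv\n"
    "Houston\tTX\t2010\t29.76\t-95.37\tExampleCo\t2009-05-01\n"
)
rows = read_colevel(sample)
```

To read the real file, pass an open file handle instead of the sample, for example open("CoLevelForCircles.txt", newline="").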

The output is a tab-delimited text (tdt) file named Results.tsv with 8 columns:

(city, state, year) layer cluster ('lat','long','coname','datefirstinv')  

Documentation

There's useful reference material here: https://stackabuse.com/hierarchical-clustering-with-python-and-scikit-learn/

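Note that it should be possible to use TensorFlow's KMeansClustering (https://www.tensorflow.org/api_docs/python/tf/contrib/factorization/KMeansClustering) to achieve a similar result with GPU support. K-means is a different algorithm from agglomerative clustering, so the clusters will not necessarily match, and the tf.contrib module that contains this class was removed in TensorFlow 2.x.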

Old Code Notes

The old code takes a CoLevel master file, clusters the points once for each k (the number of clusters) in the range [1, num points / 5), and writes the results to a file named output.tsv.

output.tsv has the columns place, statecode, year, layer, cluster, lat, long, coname, datefirstinv. Layer is the value of k, and cluster is the ID of the cluster that the point belongs to.
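The layer and cluster columns can be pictured with the sketch below, which runs a clustering function once per k and emits one (layer, cluster, point) record per point. It is not the original code: the names layer_values, build_layers and round_robin are hypothetical, the upper bound is read as "k strictly less than num points / 5", and a trivial round-robin function stands in for the real clustering call so the example runs without scikit-learn.

```python
import math


def layer_values(num_points):
    """Return the integers k with 1 <= k < num_points / 5."""
    return list(range(1, math.ceil(num_points / 5)))


def build_layers(points, cluster_fn):
    """Yield (layer, cluster, point) for every k and every point.

    cluster_fn(points, k) must return one cluster id per point.
    """
    for k in layer_values(len(points)):
        labels = cluster_fn(points, k)
        for point, label in zip(points, labels):
            yield (k, int(label), point)


def round_robin(points, k):
    """Toy stand-in for real clustering: assign points to k ids in turn."""
    return [i % k for i in range(len(points))]


# Twelve hypothetical points give layers k = 1 and k = 2.
points = [(float(i), float(i)) for i in range(12)]
rows = list(build_layers(points, round_robin))
```

Each point appears once per layer, so the output has (number of layers) x (number of points) rows.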

The original version by Kyran and Oliver is in:

E:\McNair\Projects\FastCircles\src

You can run this program with:

python3 main.py