Difference between revisions of "Enclosing Circle Algorithm (Rework)"
Line 42: | Line 42: | ||
[[File:houstonk.png]] [[File:houstonp.png]] | [[File:houstonk.png]] [[File:houstonp.png]] | ||
== Step By Step Use== | == Step By Step Use== | ||
− | The | + | 1. The file "vc_circles.py" contains the main script, "master_circles", which takes in the following |
− | + | a. "textfile", which is the filepath of the text data to read in. It must have fields "lattitude',"longitude","city","datefirstinv" | |
− | + | b. "n", which is the number of points to include in each circle | |
− | + | c. "iterations", the number of iterations to try and find a local optimum. Large numbers will slow the run time. | |
− | + | d. "basefn", the base file path of where to store the circles and points for each city,year pair <br> | |
− | + | an example: "C:\\users\\Name\\Documents" <br> | |
− | + | the function will add on the city,year to this base file name and save it to this file directory <br> | |
− | + | e. "outputfn", the full file path of where to output the area of the circles for a given city,year <br> | |
− | + | 2. To plot the circles,points for a given year onto google maps, use the function "googleplotter.py" | |
− | + | a. This function takes in the filename used in 1d | |
− | |||
== Benefits == | == Benefits == |
Revision as of 16:32, 12 July 2017
Contents
K-Means Based Algorithm
Location
The script and associated modules are located in:
E:\McNair\Software\CodeBase\New Implement of Enclosing Circle (Constrained K Means, Smallest Circle)
Explanation
The algorithm relies on an object oriented implementation of a "cluster".
Each "cluster" has the following associated with it:
1. area of minimum circle enclosing points in the cluster 2. points associated with the cluster 3. minimum circle associated with the cluster (x-coord,y-coord,radius) 4. "parent" cluster 5. "children" cluster(s)
Inputs:
1. n, the minimum number of points to include in a circle 2. k, the amount of clusters to split the points into 3. valid_nodes, an empty list to store the possible clusters to explore 4. explored_nodes, an empty list to store the explored clusters 5. cur_node, initialized to an object with all values set to none 6. end_nodes, initialized to an empty list used to store the clusters which cannot be split further
Algorithm:
1. Calculate the area of the minimum circle enclosing the total set of points 2. Split initial points into "k" clusters of at least "n" points using Constrained-K-Means Algorithm 3. construct minimum circles of "k" clusters and compute their total area 4. if this area <= area of the area of the first circle 5. Add all "k" clusters to "valid_nodes" 6. for node in valid_nodes, add node to explored_nodes, change cur_node to node, and update respective areas, parents, children, circle, points 7. run this algorithm again, change k = 2 8. if the cur_node cannot be split into 2, or the cur_node cannot be split into 2 circles with smaller areas than its parent, add cur_node to end_nodes and resume algorithm with next valid_node not in explored_node as the cur_node 9. if all the valid_nodes are in explored_nodes, return end_nodes 10. repeat the above algorithm for all k from 2 to int(#number of total points/n) and take the end_nodes with the lowest total area
Runtime:
The runtime of the minimum enclosing circle is O(n), and the runtime of constrained k-means depends on k itself. Here, the value of k is proportional to the total number of points.
We would expect the algorithm to slow as the number of points increase. For reference, a set of 80 points with n =3, took about 30 minutes, with number of iterations at 100.
Further improvements could implement an early-stopping method to converge to a local optima.
Visualization
The K-Means based algorithm returns the optimal solution (left), and faster.
Step By Step Use
1. The file "vc_circles.py" contains the main script, "master_circles", which takes in the following
a. "textfile", which is the filepath of the text data to read in. It must have fields "lattitude',"longitude","city","datefirstinv" b. "n", which is the number of points to include in each circle c. "iterations", the number of iterations to try and find a local optimum. Large numbers will slow the run time. d. "basefn", the base file path of where to store the circles and points for each city,year pair
an example: "C:\\users\\Name\\Documents"
the function will add on the city,year to this base file name and save it to this file directory
e. "outputfn", the full file path of where to output the area of the circles for a given city,year
2. To plot the circles,points for a given year onto google maps, use the function "googleplotter.py"
a. This function takes in the filename used in 1d
Benefits
Is faster than both Brute Force and Logical Enclosing Circle.
Is more accurate than Logical Enclosing Circle.
Can plot to google maps.
Will return a tab delim text file of the following fields: city,year,total area
Good for a large number of points
Drawbacks
Is not the optimal solution as number of points increases, but an approximation
Logical Enclosing Circle
Location
The python script is located in :
E:\McNair\Projects\Accelerators\Enclosing_Circle\Enclosing_Circle\EnclosingCircleRemake2.py
Explanation
The rationale is to take the furthest point, draw a circle with its nearest n-1 points.
Take the furthest point from the initial point, draw a circle with its nearest n-1 points.
Repeat until all points are enclosed.
Benefits
Can plot to google maps
Is faster than brute force
Returns city, circles, year to a tab delimited text file
Drawbacks
Really slow for large number of points
Offers an approximate solution that is worse than the Clustering based version
Brute Force
Location
The Julia Script is located in:
E:\McNair\Software\CodeBase\Julia Code for enclosing circle
The Python Script is located in:
E:\McNair\Projects\Accelerators\Enclosing_Circle\enclosing_circle_brute_force.py
Explanation
1) Find all combinations of two points in the total data set.
2) For each of these combinations, draw a circle and figure out what other points are contained in a circle around those two points. If the circle contains at least n points, add it to a set of valid circles. There are Number of Points choose 2 possible circles.
3) Combine the valid circles in all possible ways and see if the resulting scheme contains all of the points. If it does, add it to a set of valid schemes. I believe that the number of ways to do this is the sum from i = 1 to i = number of valid circles Number of Circles choose i.
4) Iterate through the valid schemes and calculate the areas of the schemes.
5) Return the scheme with the minimum area.
Benefits
Always gives you the correct answer.
Drawbacks
For a small number of points (i.e., <7), this is OKAY to use.
However, it becomes extremely slow as the number of points increases.
The Julia/Python code does not have the capability to plot to google maps, but will plot to a normal figure.
(will work on adding this implementation, if the need arises)