Enclosing Circle Algorithm
Enclosing Circle Algorithm | |
---|---|
Project Information | |
Project Title | Enclosing Circle Algorithm |
Owner | Christy Warden |
Start Date | 201701 |
Deadline | 201704 |
Primary Billing | |
Notes | |
Has project status | Active |
Copyright © 2016 edegan.com. All Rights Reserved. |
To-Do
1) Cleanup of geocoding
a) Take from tab delimited not from whitespace
b) Load in to database
c) Use google maps API
d) Throw out center of world/center of city
e) Change script to take Company Year
f) 100 largest cities
Overview
This program takes a set of points and the minimum number of points that each circle must contain, and returns circles of the smallest total area that together enclose all of the data points. The function make_circle and all of its helper functions were taken from https://www.nayuki.io/res/smallest-enclosing-circle/smallestenclosingcircle.py.
Input: A sequence of pairs of floats or ints, e.g. [(0,5), (3.1,-2.7)].
Output: A triple of floats representing a circle.
Returns the smallest circle that encloses all the given points. Runs in expected O(n) time, randomized.
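To make the input and output formats concrete, here is a minimal brute-force sketch. It is not the randomized O(n) make_circle routine from nayuki.io; it is a slow stand-in that tries every circle defined by two or three of the points and keeps the smallest one that encloses everything. The function names below are hypothetical and exist only for this illustration.

```python
import itertools
import math

EPS = 1e-9  # tolerance for floating-point comparisons


def circle_from_two(a, b):
    """Circle that has the segment ab as its diameter."""
    cx, cy = (a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0
    return (cx, cy, math.hypot(a[0] - cx, a[1] - cy))


def circle_from_three(a, b, c):
    """Circumcircle of a, b, c, or None if the points are collinear."""
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < EPS:
        return None
    a2, b2, c2 = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    cx = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    cy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return (cx, cy, math.hypot(a[0] - cx, a[1] - cy))


def encloses(circle, points):
    """True if every point lies inside or on the circle."""
    cx, cy, r = circle
    return all(math.hypot(x - cx, y - cy) <= r + EPS for x, y in points)


def brute_force_enclosing_circle(points):
    """Return (center_x, center_y, radius) of the smallest enclosing circle."""
    points = [(float(x), float(y)) for x, y in points]
    if not points:
        return None
    if len(points) == 1:
        return (points[0][0], points[0][1], 0.0)
    # The smallest enclosing circle is always fixed by two or three points.
    candidates = [circle_from_two(a, b)
                  for a, b in itertools.combinations(points, 2)]
    for triple in itertools.combinations(points, 3):
        circle = circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    return min((c for c in candidates if encloses(c, points)),
               key=lambda c: c[2])


# Same input shape as described above: a sequence of (x, y) pairs.
print(brute_force_enclosing_circle([(0, 5), (3.1, -2.7)]))
```

The brute-force version checks every pair and triple against every point, so it is only practical for small inputs. It can still be useful for cross-checking the faster routine on a few test cases.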
Algorithm Description
Location
The original script is located in:
E:\McNair\Software\CodeBase\EnclosingCircle.py
Applications
VC Data
The Enclosing Circle Algorithm will be applied to VC data acquired through the SDC Platinum database. The script makes use of the Python GeoPy GeoCoder to get latitude and longitude coordinates to be used by the Enclosing Circle Algorithm.
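As a rough illustration of the geocoding step, the sketch below turns address strings into (latitude, longitude) pairs. It takes the geocoder as a parameter so it can run offline; the helper names and the sample coordinates are hypothetical. The page does not say which GeoPy geocoder the script uses, so the Nominatim lines in the closing comment are an assumption, not the project's actual setup.

```python
from collections import namedtuple

# Offline stand-in for a geocoder result; real GeoPy results also expose
# .latitude and .longitude attributes.
Location = namedtuple("Location", ["latitude", "longitude"])


def fake_geocode(address):
    """Hypothetical lookup table standing in for a real geocoding service."""
    known = {"Houston, TX": Location(29.76, -95.37)}
    return known.get(address)  # None when the address is not found


def geocode_addresses(addresses, geocode):
    """Map each address to (latitude, longitude), skipping failed lookups.

    `geocode` is any callable that takes an address string and returns an
    object with .latitude and .longitude, or None when nothing matches.
    """
    coordinates = {}
    for address in addresses:
        location = geocode(address)
        if location is None:
            continue  # no match: leave this address out
        coordinates[address] = (location.latitude, location.longitude)
    return coordinates


coords = geocode_addresses(["Houston, TX", "Nowhere, ZZ"], fake_geocode)
print(coords)

# With GeoPy installed, a real geocoder could be passed in instead, e.g.:
#   from geopy.geocoders import Nominatim
#   geolocator = Nominatim(user_agent="enclosing-circle-example")
#   coords = geocode_addresses(["Houston, TX"], geolocator.geocode)
```

Real geocoding services apply rate limits and usage terms, so a production run would likely need delays between requests and some handling for timeouts.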
Geopy Geocoder User Agreements can be found here.
The relevant files are located in:
E:\McNair\Projects\Accelerators\Enclosing_Circle
The results may eventually be plotted to a graph using python as well. Here is documentation for a python library called basemap.
CURRENT STATUS: Bug fixes needed in EnclosingCircle.py. The program fails with a KeyError on line 187 in cases where the length of the dataset is not a multiple of n. I made some temporary fixes to the EnclosingCircle.py file located in the above directory, but I am not certain whether they are a permanent fix.
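The code around line 187 is not reproduced on this page, so the following is only a sketch of one common way to handle a dataset whose size does not divide evenly by n. It is not the fix that was applied to EnclosingCircle.py. The function name split_into_groups is hypothetical, and merging the leftover points into the last group is an assumption about the desired behavior.

```python
def split_into_groups(points, n):
    """Split points into consecutive groups of at least n points each.

    When len(points) is not a multiple of n, the leftover points are merged
    into the final group instead of forming an undersized group.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    points = list(points)
    if len(points) < n:
        raise ValueError("need at least n points")
    groups = [points[i:i + n] for i in range(0, len(points), n)]
    if len(groups) > 1 and len(groups[-1]) < n:
        leftover = groups.pop()
        groups[-1].extend(leftover)
    return groups


# Seven points with n = 3: the seventh point joins the second group.
print(split_into_groups(range(7), 3))
```

This only shows the remainder handling. It makes no attempt to choose groupings that minimize total circle area.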
Speeding up with C
With the large amount of VC data we have, the enclosing circle algorithm would take an extremely long time to run (on the order of weeks or months). If we can compile the code to C, we can reduce the runtime dramatically. I've listed some possible options for running Python code as C.
Cython
Cython. Documentation here. Basic tutorial for Cython is given here.
Currently, the RDP is missing a compiler to run Cython successfully. The error that appears is "unable to find vcvarsall.bat".
Getting a C++ Compiler
These are proposed fixes for the Cython error shown above. A possible C++ compiler for Python can be downloaded directly from Microsoft here.
The C++ Compiler for Python has been downloaded and installed. Instructions for installation and uninstallation can be found here.
The installation file is located at:
E:\McNair\Software\Utilities\VCForPython27.msi
Running Python vs. Cython
A trial run on 82 coordinates resulted in the following run times:
EnclosingCircle in Python: 16.069 seconds
EnclosingCircle in Cython: 9.633 seconds
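For reference, the two timings above work out to a speedup of roughly 1.67x, or about 40% less run time. The short calculation below uses only the numbers reported for the 82-coordinate trial; the variable names are illustrative.

```python
python_seconds = 16.069  # reported time for the pure Python run
cython_seconds = 9.633   # reported time for the Cython run

speedup = python_seconds / cython_seconds
time_saved = 1.0 - cython_seconds / python_seconds

print("speedup: %.2fx" % speedup)
print("time saved: %.1f%%" % (100 * time_saved))
```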
The test files can be found in the following:
EnclosingCircle in python: E:\McNair\Projects\Accelerators\EnclosingCircle\EnclosingCircle.py
EnclosingCircle in cython: E:\McNair\Projects\Accelerators\EnclosingCircle\EnclosingCircleC_Test.py
Usage
The basic tutorial for Cython can be found here: http://docs.cython.org/en/latest/src/tutorial/cython_tutorial.html
Essentially, a setup.py file needs to be created with the following format:
try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension

from Cython.Build import cythonize

setup(
    ext_modules = cythonize("filename.pyx")
)
Then, after changing to the proper directory, execute the following from the command line:
python setup.py build_ext --inplace
This will compile your Python program to C and build it into an extension module, producing a filename.pyd file.
To use this new python code wrapped in C, simply import the pyd file as if it were a python file:
import filename
Treat this file as any other module. It will work just as if it were in Python, except it exhibits a faster run time.
NOTE: We fixed Enclosing Circle and drastically improved its runtime, so we decided to go with its Python implementation.
Results
A data set with city, state, company name, year, and geocoded coordinates can be found at:
E:\McNair\Projects\Accelerators\Code+Final_Data\ChristyCode\GeoCodedBusinesses.txt
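The sketch below shows one way such a file could be read by splitting on tabs rather than on whitespace, so that multi-word city and company names stay intact. The column order (city, state, company, year, latitude, longitude) is an assumption based on the description above, not a confirmed layout of GeoCodedBusinesses.txt. The sample row is made up and read_geocoded_rows is a hypothetical helper.

```python
import csv
import io


def read_geocoded_rows(handle):
    """Yield (city, state, company, year, lat, lon) from tab-delimited text.

    Lines with the wrong number of fields or non-numeric year/coordinates
    (for example a header row) are skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 6:
            continue
        city, state, company, year, lat, lon = (field.strip() for field in row)
        try:
            yield (city, state, company, int(year), float(lat), float(lon))
        except ValueError:
            continue


# Made-up sample data in the assumed layout; a real run would open the file.
sample = io.StringIO(
    "City\tState\tCompany\tYear\tLat\tLon\n"
    "San Francisco\tCA\tExample Co\t2010\t37.77\t-122.42\n"
)
rows = list(read_geocoded_rows(sample))
print(rows)
```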
NEXT STEP: We need to determine how many companies we want in each circle, and then we can begin running the enclosing circle algorithm on the city data.
The Top 50 cities with the maximum number of companies in any given year are: 'Santa Monica', 'Nashville', 'Santa Clara', 'Chicago', 'Philadelphia', 'Denver', 'Dallas', 'Burlington', 'San Francisco', 'San Mateo', 'Milpitas', 'Boulder', 'Bellevue', 'Herndon', 'Pittsburgh', 'Mountain View', 'San Diego', 'Fremont', 'Ann Arbor', 'Irvine', 'Brooklyn', 'Durham', 'Los Angeles', 'Atlanta', 'Alpharetta', 'Menlo Park', 'Rockville', 'San Jose', 'Lexington', 'Saint Louis', 'Sunnyvale', 'Palo Alto', 'Richardson', 'Redwood City', 'Austin', 'Waltham', 'Baltimore', 'Cupertino', 'Houston', 'Cambridge', 'Boston', 'Washington', 'Minneapolis', 'Pleasanton', 'New York', 'Cleveland', 'South San Francisco', 'Portland', 'Seattle'.
Data on the Top 50 VC Backed Companies can be found here.
These were determined by the decide_cities.py script located in:
E:\McNair\Projects\Accelerators\EnclosingCircle
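The decide_cities.py script itself is not shown here, so the following is only a sketch of one plausible reading of the ranking: count companies per (city, year), take each city's largest single-year count, and keep the top k cities. The function name top_cities and the input shape (one (city, year) pair per company) are assumptions made for illustration.

```python
from collections import Counter


def top_cities(records, k=50):
    """Rank cities by their largest single-year company count.

    `records` is an iterable of (city, year) pairs, one pair per company.
    Ties are broken alphabetically by city name.
    """
    per_city_year = Counter(records)
    peak = {}
    for (city, year), count in per_city_year.items():
        peak[city] = max(peak.get(city, 0), count)
    ranked = sorted(peak.items(), key=lambda item: (-item[1], item[0]))
    return [city for city, _ in ranked[:k]]


# Made-up records: city C peaks at 3 companies, A at 2, B at 1.
records = [("A", 2000), ("A", 2000), ("B", 2000), ("B", 2001),
           ("C", 2001), ("C", 2001), ("C", 2001)]
print(top_cities(records, k=2))
```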
The final data of cities at a given year and their minimized circles can be found at:
E:\McNair\Projects\Accelerators\EnclosingCircle\final_vc_circles.txt