Abhijit Brahme (Work Log)

From edegan.com
Jump to navigation Jump to search


Downloaded Atom IDE along with Juno, a Julia specific console.
Began looking at MATLAB code for the Entrepreneurship Matching Project and trying to understand it
Began translating enclosing circle algorithm from python to julia


Looked through MATLAB code for Entrepreneurshp Matching with James, Ed in-depth
Understood the general idea of the code; Going to call original author for more details.
Worked through the Julia implementation of Brute Force Enclosing Circle. (had to install "Combinatorics")
There are small errors with the number of valid circles; still need to implement plotting circles on map.


FIXED the errors involved with translating Python to Julia for Enclosing Circle (Brute Force)
Implemented the plots in Julia using PyPlot and Plots (need to Pkg.add("Plots") and Pkg.add("PyPlot"))
Run time in the Julia implementation is actually 2x as slow for the small test set ( 1.12 s for Julia, .55 for Python)
Need to look into vectorizing some of the code
Steps Forward:

    1) The code might not be optimized for Julia, so try doing that 
    2) The code for Julia for overhead might be great so it might be slower on a smaller set; try on a bigger set of test points
    3) Come up with a better algorithm

Code Location:

    E:\McNair\Software\CodeBase\Julia Code for enclosing circle


Ran the "bigexample" for Julia and Python Brute Force.... still running.
Only got through about 7 combinations in about 8 hours...not very promising.
Quit the program. It started taking up about 50% memory
Began working on enclosing circle algorithm. Found promising information regarding a constrained K-means algorithm
working on implementing the constrained K-means with the smallest enclosing circle to solve the Enclosing Circle Problem
Code Location for New Enclosing Circle:

   E:\McNair\Software\CodeBase\New Implement of Enclosing Circle (Constrained K Means, Smallest Circle)


Wrote up the code for the new algorithm. It runs blazing fast.
We are worried about the optimality of the solution. I don't take into account border points.
These are points that could be shared by circles. An idea is to add border points into both clusters, then run the smallest enclosing circle on those sets of points.
I'll include a full Algorithm write-up on Monday once I have all the kinks worked out.


Solved the issue of points not being shared by iterating through all possible splits (ie k = 1:floor(numpoints/n))
Algorithm runs much slower due to iterating through all K, but it is still faster than others
I tried a version where I don't keep "K" constant throughout the loop. This converged to a better solution
I ended up changing K to 2 after the first split. (first split points into 10 points then split those ten points into 2, etc)


Will run previous algorithm (Peter and Christy's) and compare to mine
If not the same, will update Enclosing Circle Algorithm page with my algorithm


Ran Christy and Peter's algorithm on a small data sample. It worked faster than mine, and achieved the correct answer.
Currently running their algorithm on a much larger data sample. It is still running, and exhibits slower run time than mine.
Will leave it running to see what it converges to.


Built a html scraper in Python to scrape Accelerator data from [1] The code and resulting tab-separated text file is located in:

    E:\McNair\Projects\Accelerators\Web Scraping for Accelerators


Attempted to try and plot the circles for Houston to Google Maps.
Succeeded in plotting the points themselves but not the circles
This is a bit strange considering I reused code from the previous project.


Succeeded in plotting the circles for Houston on Google Maps
Algorithm is faster than previous algorithm.
Made new page for the project


Created a script in python to concatenate 162 excel files The file is located in the following location:

"E:\McNair\Projects\SBIR\Aggregate SBIR"

It is titled "concat_excel.py"
The text file is in the folder, titled "SBIR"


fixed some bugs in the k-means based enclosing circle algorithm
updated Enclosing Circle Algorithm (Rework)
started running the algorithm on list of houston city coords


For a total of 41 coordinate sets, the algorithm completed the plots in about 32 hours.
Average run time at a little over an hour per coordinate set
Added functionality that would save the area of the total circles to a text file
Running this now and will be completed end of the day tomorrow.
Worked through the MATLAB code for the monte carlo
Stripped the code for MSM,Monte Data, Industry 4, and Genetic Algorithm
Will run tomorrow and adjust the data locations to try and debug


Identified some key errors in the stripped code for the Monte

1. The genetic algorithm is not returning a scalar value
2. The problem is the objective function is returning an empty vector, instead of a scalar

Made a plotter for Diana so she can visualize Houston Start-Up Data


Fixed the bug. It was because we didn't have a global variable in the master script.
James helped figure this out.


Figured out there was another bug in the code. After the optimization finished, there is a problem of mismatched dimensions.
The code itself still takes a very long time to run.


Fixed the mismatched dimensions problem, but the code is still running.
Went over a few methods to speed up the code:

1. Use gurobi for the solver
2. run the sampling on a bunch of clusters

Downloaded gurobi:

password: amount

Updated the plotter to include heatmaps.
The geocode might be erroneous.


Ran the MATLAB code for the monte carlo after fixing the bugs.
Installed the MATLAB interface for Gurobi
Worked on finishing some test cases for the enclosing circle.


The code ran and worked just fine.
Now I need to figure out how to replace the optimization in the code with a Gurobi optimization.
Finished the code for test cases on the enclosing circle.


Ran the enclosing circle and plotted run time vs. number of points
Received an exponential graph.
Finished deriving the areas for the test vc data using enclosing circle
Worked on modularizing the google plotting functions
Need to make the code a command line tool


Ran enclosing circle and saw that it used 2.6% of the CPU and ~200 MB of memory.
Talked to Oliver about parallelizing the code in C++ or Java.
Talked about using the highest K, with the lowest error to run the algorithm.
Made an interactive command line tool for plotting the google maps circles


Started running the enclosing circle for the PhD students.
Made an interactive command line tool for using the enclosing circle algorithm.
Started looking at how to attack the MATLAB monte carlo problem using Gurobi.


Looked at the optimization function for the Monte.
It is called readme.pdf
It seems that the output of msmf_corr_coeff.m contains the appropriate objective function.
It seems that nonlinearcons_msm.m outputs the appropriate constraints.
Need to look at how to implement this in Gurobi for Matlab.


Helped Diana learn how to implement the circle plotter for startup locations.
Read up more about the genetic algorithm function in matlab.


Created a separate script for gmm_2stage_estimation which is used to implement gurobi.
Was able to implement the lower bound,upper bound of the variables using gurobi.
Am finding it difficult to describe the objective function in terms of the variables themselves.
If I could do that, this would be a whole lot easier.
The objective function does not match the readme.pdf the author has provided.
Tried running the same program with the altered objective function.


Currently timing each of the functions involved with this script.
Matlab encountered an error which forced the closing of the script.
Will re-run


Ran the monte and timed the functions.


Since the code actually runs now, thought about ways to add more info to what we know about the code.
Still don't know how the function fed into the genetic algorithm actually relates to the documentation.
Some ideas for tomorrow:

  1. use the actual generations and values from the original script and time it


Did as suggested above.
Here are the results.

When comparing the two, it seems like changing the generation size just makes the function take longer to run, but even after one generation the fitness function does not improve its value.
I don't understand why you'd need more than one generation.


Fixed the monte with James' help and utilized integer linear programming inside the genetic algorithm.
Previously, the author was using a linear program and rounding the results (not advisable).
We figure there must be some error in the monte because the genetic algorithm does not update.


The value passed into the genetic algorithm is either 0 or .1642.
Here are the values of the key variables that determine the input of the genetic algorithm:

  1. mm: either .1111 or .2462
  2. M0: always .1111
  3. W: either 1 or .0240
  4. L: always 9

James seems to think the "trueval" is already set close enough to the optimal.
This seems correct.


I changed trueval to be a larger margin to see if the genetic algorithm changes.
And it didn't do anything. The function value always remains at 0 and the variables retain their values.
We changed the market size, etc and now the function seems to change its values.


Got the function to work and produce a different outcome.
Seems to be a problem with the strip_mkt_resample.
Got a lot of resulting "NaN" values, traced it to stderr.
We changed a value to an outer product in stderr and ran the code.


Here is the output when incorporating the outer product instead.


All the master files will run appropriately with the proper "task".
These files are located here:

  1. E:\McNair\Projects\MatchingEntrepsToVC\AdjustedCode

You literally open up the master.m file and press run.