Changes

Jump to navigation Jump to search
no edit summary
5) Return the scheme with the minimum area.
 
==Problems==
In my first implementation of EnclosingCircle (EnclosingCircle.py), there is a mistake in my memoization scheme where I had been setting a variable equal to
a "scheme[1].sort()", forgetting that the sort function operates in place and returns None. Thus, the memoization scheme was being used randomly and excessively such that not all possible circles were calculated and incorrect results were returned (hence the subsuming circle nonsense). Another mistake in the original algorithm was that when we were choosing the next point to add to a given circle based on how far it was from the first point I chose in the circle, rather than from the center of the circle. I corrected this by recalculating the circle every time I add a point and then sorting the rest of the points based on their distance from that center. Unfortunately, this code works very well and produces correct results where the original algorithm did not, but runs incredibly slowly. I do not think that it is feasible to use this algorithm on the volume of data that we need to analyze. I am currently running an example with 81 points (Houston 2012) split into groups of at least 4 and will report statistics on it soon.
 
==Ideas for Fixing==
One major holdup of the new algorithm (besides the fact that it calculates a ton more circles) is the recalculating of the center point every time I need to add a new point to the circle, which seems a bit absurd. I think it might be faster to adjust the center of the circle heuristically so between the first two, take the average x and y coordinate. For the next addition. Take a weighted average of the three points. Etc. I am not sure if this would work out mathematically but I can work it out and see. This might cut down on runtime, but we still need to calculate the distance from each individual point after that, which also takes a long time. This could possibly be resolved heuristically as well, since I don't really need to calculate all the distances since points that were really far from the circle before probably aren't going to be chosen anyway. I can look into implementing some of these algorithmic changes and testing them on Monday and Tuesday.
 
In the longer term, I would recommend trying to parallelize the algorithm, possibly with python threads or implementing it in java and using the constructs there. I do not know if I will have time to start this before the summer, but there are many processes that can be run in parallel for this algorithm so I think it is a good strategy.
 
Specifically, all calls to findgroups can be run in parallel so long as you wait to receive the results from all the threads before deciding on an answer.
For example imagine a system with 5 points split into groups of at least 2. There would be one call to findgroups on all 5 points, then calls to findgroups on a 3-2 split and on a 2-3 split. Those two calls could run in parallel, as long as the result from the call on all 5 points waits to receive the results from both children threads before deciding which is the optimal solution. You would have to be careful to keep threads from writing to the same memory (specifically for the removals in pointset), but I think (depending on how many cores our system operates on) this could incur impressive speed up.
 
==Runtime==
272

edits

Navigation menu