GAs with a starting point?
by Amit
Local search methods in optimization typically start with a single initial solution. The method then iteratively explores the neighborhood of the current solution and moves to a better one, if found. This better solution becomes the starting point of the next iteration. When no better solution is found, the iteration stops and we usually have the solution we are looking for.
A couple of things here deserve a brief explanation:
- Definition of a neighboring solution: this is problem dependent. If the problem is to find an optimal tour for the TSP, a neighboring solution could be a tour that differs in the order of a couple of cities. If the problem is to find a local minimum of a function over a continuous search space, then a slight perturbation of each component of the current solution can give a neighboring solution (as done in Hooke-Jeeves pattern search).
- The next, better solution may be found deterministically, by examining all the solutions in a neighborhood (as in Hooke-Jeeves pattern search and hill climbing), or stochastically (as in simulated annealing).
Genetic algorithms, on the other hand, do not have such a definite, specified starting point. They start instead with a random population of individuals, among which there may be some potentially optimal solutions, and this population finally gives us our desired solution after the GA iterations are over.
Recently, David on the pyevolve mailing list posted a query asking whether pyevolve allowed choosing a starting population (not an individual), since he had a fair idea of the solutions he was looking for but wanted to optimize them further, if possible. Pyevolve doesn't yet have an API for such a use case. I blatantly, foolishly replied that I thought it wasn't possible in a population-based method. Not possible? WTF WTF. I mixed up what pyevolve doesn't allow with what we actually cannot do.
On further thought, why not! Say we know that our 2-variable problem has (a, b) as a possible good solution and we want to know if further improvement is possible.
Now, assuming we want a population of size 100 for our current problem, we can simply choose our initial population to lie around the point (a, b). How? A very simple way to do this is to consider a circle centered at (a, b) with some arbitrary radius. We can then easily draw 25 (= 100/4) solutions from each quadrant of the circle. That way, we have a fairly uniform representation of the neighborhood around (a, b).
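Here is a minimal sketch of that sampling scheme in Python (the function name `seed_population` and the radius value are my own illustrative choices, not part of any library):

```python
import math
import random

def seed_population(center, radius, pop_size=100):
    """Sample pop_size points around `center`, split evenly
    across the four quadrants of a circle of the given radius."""
    cx, cy = center
    per_quadrant = pop_size // 4
    # Each quadrant spans 90 degrees; the offset selects the quadrant.
    quadrant_offsets = [0.0, 0.5 * math.pi, math.pi, 1.5 * math.pi]
    population = []
    for offset in quadrant_offsets:
        for _ in range(per_quadrant):
            angle = offset + random.uniform(0.0, 0.5 * math.pi)
            r = radius * math.sqrt(random.random())  # sqrt gives uniform area density
            population.append((cx + r * math.cos(angle),
                               cy + r * math.sin(angle)))
    return population

# 100 individuals spread around a known good point (a, b) = (0.5, 0.6)
initial_pop = seed_population((0.5, 0.6), radius=0.1, pop_size=100)
```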
For example:

[Figure: a circle centered at (0.5, 0.6), taken as the starting point of an example problem, with the 100 representative individuals chosen from its neighborhood.]
I tried the above approach on a couple of 2-variable functions, one of them being the Himmelblau function, often used as a test function in the classical optimization literature. With different starting points, I could get a real-coded GA to converge to the correct solution. These are just some initial results, and the utility of such an approach probably remains to be explored fully.
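For reference, this is the Himmelblau function, plus a quick sanity check of the seeding idea; the guess (3.2, 1.8) and the 0.5 box width below are my own illustrative choices, not values from the experiments above:

```python
import random

def himmelblau(x, y):
    """Himmelblau's function: four identical local minima, all with
    f = 0; one of them is at (3.0, 2.0)."""
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

# Seed 100 individuals near a rough guess of a minimum and see how
# good the best one already is before any GA generations run.
guess = (3.2, 1.8)
pop = [(guess[0] + random.uniform(-0.5, 0.5),
        guess[1] + random.uniform(-0.5, 0.5)) for _ in range(100)]
best = min(pop, key=lambda p: himmelblau(*p))
print(best, himmelblau(*best))  # best fitness will typically already be small
```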
The key point here is to start with an initial population around a possibly good solution, rather than a completely random one.
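As a rough idea of what a seeding hook could look like, here is a purely hypothetical sketch; none of these names come from pyevolve, and a real API would have to fit its genome classes:

```python
import random

def build_initial_population(pop_size, seeds, random_individual):
    """Hypothetical helper: start from user-supplied `seeds` and top up
    the rest of the population with randomly generated individuals."""
    population = list(seeds)[:pop_size]
    while len(population) < pop_size:
        population.append(random_individual())
    return population

# Usage: two known-good 2-variable solutions plus 98 random ones.
seeds = [(0.5, 0.6), (0.52, 0.61)]
population = build_initial_population(
    100, seeds,
    random_individual=lambda: (random.uniform(-1, 1), random.uniform(-1, 1)),
)
```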
Do you have good ideas for API changes to allow setting an initial population? I've just started looking at changing some code, but would like to have your ideas as input. Pyevolve is working quite well for me, and the only thing I'm missing is seeding initial answers. The only other change I would make is to override the number of CPUs to use for multiprocessing. Do you take pull requests?