PiCloud + Pyevolve: Evolutionary Algorithms in the Cloud
It was quite simple to get Pyevolve running up on PiCloud.
For an Evolutionary Algorithm, there are a number of ways it can be parallelized. One of the easiest things to do is to run parallel instances of the algorithm with different initial random seeds. (This is a common practice in the research community – since starting with different random seeds allows testing of the robustness of a new algorithm or the correct implementation of an existing one). The first exercise that I tried with Pyevolve + PiCloud is to run Pyevolve’s Genetic Algorithm implementation with 10 different initial seeds on PiCloud – all running in parallel. Here’s how, you can go about it:
Installing Pyevolve
- Clone the git repository from https://github.com/perone/Pyevolve
- Install Pyevolve: $ sudo python setup.py install
- Just to check if it has been installed properly, just run one of the examples in the examples/directory
Setting up Picloud
- To setup PiCloud, please follow the instructions at http://www.picloud.com/ starting with registering for a free account
- Once you have installed the PiCloud client, please verify the installation by typing ‘import cloud’ from your Python prompt and see that it doesn’t throw up any errors
Getting Ready to Launch
For the purpose of this experiment, I shall be using one of the examples that ship with Pyevolve: pyevolve_ex7_rastrigin.py after some modifications. The modified file is here. Specifically, we change the main() function to run_ga() taking two parameters: seed and runid. The seed parameter is used to provide a random initial seed (used when creating the GA engine using ga=GSimpleGA.GSimpleGA(genome,seed)). The CSV adapater is initialized to store the statistics in a CSV file, where the runid file is used for specifying the name of the file.
Okay, now I shall describe the main driver script which I used to run this GA in the cloud. Here it is:
#!/usr/bin/python # Simple demo to show how to run the Pyevolve # Evolutionary Algorithms framework on PiCloud # Pyevolve: http://sourceforge.net/projects/pyevolve/ # PiCloud: https://www.picloud.com/ # Amit Saha # http://echorand.me from pyevolve_rastrigin import * import cloud cloud.setkey(<API KEY>, 'SECRET KEY') # List of Random seeds and run-Ids # assuming 10 runs seed_list=[100*(i+1) for i in range(10)] runid_list=[i+1 for i in range(10)] # calls the method defined in pyevolve_rastrigin.py # which initiates the GA execution. # Execute the code on PiCloud jids = cloud.map(run_ga,seed_list,runid_list) # check if the jobs are complete, if yes # pull the stat files cloud.join(jids) for i in range(10): cloud.files.get('stats_' + str(i+1) + '.csv','stats_' + str(i+1)+'.csv')
One of the first things you need to do is set the API Key and Secret key that you will be using to access PiCloud. The API Key and the Secret key can be seen in the “API Keys” page of your PiCloud account. This is done by the line cloud.setkey(, 'SECRET KEY')
. Next, we use the cloud.map() function to call the run_ga
function for each pair of the values of the lists seed_list
and runids_list
. This is a efficient way of running the same function in the cloud, but with a different set of parameters. Once that is launched, you can see the state of your jobs from your PiCloud’s account’s Jobs page.
Next, what we do is we wait for all the jobs to finish using cloud.join(jids)
and then pull all the statistic CSV files from PiCloud’s storage to the local file syste using cloud.files.get('stats_' + str(i+1) + '.csv','stats_' + str(i+1)+'.csv')
(More information on using the cloud.files module is here). Once the files have been copied, you can see the results from the CSV files – each CSV file representing the result of one run of the GA. We have not yet talked about how the CSV files were created, however.
Creating files on PiCloud from Pyevolve
The source code of CSV adapater, where the CSV files are created is in pyevolve/DBAdapters.py. The open()
method of class DBFileCSV
is where the file is opened for writing and the close() method of the same class closes the file handle when the writing is finished. However, this method of creating files won’t work for PiCloud – rather the data won’t be retrievable for our client. We have to create the file the PiCloud way – which is to use the cloud.files.put()
function. All I did here was add the line cloud.files.put(self.filename)
in the close() method. And I reinstalled Pyevolve module and it all worked fine. You may find my modified DBAdapaters.py file here.
Conclusion
I hope to discuss with the Pyevolve folks what they think of my changes to the DBAdapter class and see if they suggest a better way. Like I mentioned in the beginning, this is a very naive way of harnessing the power of parallel computing in Evolutionary Algorithms. I hope to explore more in this direction with PiCloud. If you have any queries, please leave a comment.
Thank you for reading.