Programming and writing about it.

echo $RANDOM

Category: Research

2-cent tip: IEEE PDFeXpress compatible PDF file

This is a borrowed 2-cent tip from a source I now don’t recall. (Kindly comment if it was YOU). My apologies in advance for not remembering you.

To create IEEE PDF eXpress compatible PDF files from your LaTeX sources (on Linux):

  1. Create the .dvi file: $ latex paper.tex
  2. DVI to PS: $ dvips -Ppdf -G0 -tletter paper.dvi
  3. PS to PDF: $ ps2pdf -dCompatibilityLevel=1.4 -dPDFSETTINGS=/prepress paper.ps paper.pdf

And you should be good to go.
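
If you rebuild the PDF often, the three steps can be wrapped in a small script. The following is a minimal Python sketch, not part of the original tip: the function names build_commands and build_pdf are made up for illustration, and it assumes latex, dvips and ps2pdf are on your PATH.

```python
import subprocess


def build_commands(basename):
    """Return the three commands from the steps above, in order."""
    return [
        ["latex", basename + ".tex"],
        ["dvips", "-Ppdf", "-G0", "-tletter", basename + ".dvi"],
        ["ps2pdf", "-dCompatibilityLevel=1.4", "-dPDFSETTINGS=/prepress",
         basename + ".ps", basename + ".pdf"],
    ]


def build_pdf(basename):
    """Run each step in turn, stopping at the first one that fails."""
    for command in build_commands(basename):
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(command, check=True)
```

Calling build_pdf("paper") from the directory that holds paper.tex runs the same commands as the list above.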

PiCloud + Pyevolve: Evolutionary Algorithms in the Cloud

It was quite simple to get Pyevolve running up on PiCloud.

There are a number of ways to parallelize an Evolutionary Algorithm. One of the easiest is to run parallel instances of the algorithm with different initial random seeds. (This is common practice in the research community, since starting from different random seeds tests the robustness of a new algorithm or the correctness of an implementation of an existing one.) The first exercise I tried with Pyevolve + PiCloud was to run Pyevolve’s Genetic Algorithm implementation with 10 different initial seeds on PiCloud, all running in parallel. Here’s how you can go about it:

Installing Pyevolve

  1. Clone the git repository from https://github.com/perone/Pyevolve
  2. Install Pyevolve: $ sudo python setup.py install
  3. To check that it has been installed properly, run one of the examples in the examples/ directory

Setting up PiCloud

  1. To set up PiCloud, follow the instructions at http://www.picloud.com/, starting with registering for a free account
  2. Once you have installed the PiCloud client, verify the installation by typing ‘import cloud’ at your Python prompt and checking that it doesn’t throw any errors

Getting Ready to Launch

For this experiment, I shall use one of the examples that ship with Pyevolve, pyevolve_ex7_rastrigin.py, after some modifications. The modified file is here. Specifically, we change the main() function to run_ga(), taking two parameters: seed and runid. The seed parameter provides the initial random seed (used when creating the GA engine with ga=GSimpleGA.GSimpleGA(genome,seed)). The CSV adapter is initialized to store the statistics in a CSV file, and the runid parameter is used to name that file.
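
To make the shape of run_ga() concrete, here is a minimal, library-free sketch. It is not the modified Pyevolve example: a plain random search stands in for the GA, Python’s csv module stands in for Pyevolve’s CSV adapter, and the parameters dims, generations and out_dir are assumptions added for illustration. Only the seed and runid parameters and the stats_<runid>.csv naming come from this post.

```python
import csv
import math
import os
import random


def rastrigin(xs):
    """Rastrigin test function; its global minimum is 0 at the origin."""
    return 10 * len(xs) + sum(x * x - 10 * math.cos(2 * math.pi * x) for x in xs)


def run_ga(seed, runid, dims=5, generations=50, out_dir="."):
    """Stand-in for the modified example: seeded search, stats to a CSV file."""
    rng = random.Random(seed)  # the seed fixes the whole run
    best = float("inf")
    path = os.path.join(out_dir, "stats_" + str(runid) + ".csv")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["generation", "best"])
        for generation in range(generations):
            # Random search, NOT a GA: sample one point per "generation"
            candidate = [rng.uniform(-5.12, 5.12) for _ in range(dims)]
            best = min(best, rastrigin(candidate))
            writer.writerow([generation, best])
    return best
```

Two calls with the same seed give the same result, which is what makes a list of distinct seeds a meaningful set of independent runs.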

Okay, now I shall describe the main driver script which I used to run this GA in the cloud. Here it is:


#!/usr/bin/python
# Simple demo to show how to run the Pyevolve
# Evolutionary Algorithms framework on PiCloud
# Pyevolve: http://sourceforge.net/projects/pyevolve/
# PiCloud: https://www.picloud.com/

# Amit Saha
# http://echorand.me

from pyevolve_rastrigin import *
import cloud

cloud.setkey(<API KEY>, 'SECRET KEY')

# List of Random seeds and run-Ids
# assuming 10 runs
seed_list=[100*(i+1) for i in range(10)]
runid_list=[i+1 for i in range(10)]

# calls the method defined in pyevolve_rastrigin.py
# which initiates the GA execution.
# Execute the code on PiCloud
jids = cloud.map(run_ga,seed_list,runid_list)

# check if the jobs are complete, if yes
# pull the stat files
cloud.join(jids)
for i in range(10):
    cloud.files.get('stats_' + str(i+1) + '.csv','stats_' + str(i+1)+'.csv')

One of the first things you need to do is set the API Key and Secret Key that you will use to access PiCloud. Both can be found on the “API Keys” page of your PiCloud account. This is done by the line cloud.setkey(<API KEY>, 'SECRET KEY'). Next, we use the cloud.map() function to call run_ga() once for each pair of values from the lists seed_list and runid_list. This is an efficient way of running the same function in the cloud with different sets of parameters. Once the jobs are launched, you can see their state on the Jobs page of your PiCloud account.
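
The pairing is easiest to see with Python’s built-in map(), which behaves this way when given two lists. The sketch below runs locally without a PiCloud account; fake_run_ga is a made-up stand-in for run_ga(), and I am assuming that cloud.map() pairs its arguments positionally in the same way, which is how the driver script uses it.

```python
seed_list = [100 * (i + 1) for i in range(10)]
runid_list = [i + 1 for i in range(10)]


def fake_run_ga(seed, runid):
    """Stand-in for run_ga(): just report which arguments it received."""
    return (seed, runid)


# map() calls the function once per position: (100, 1), (200, 2), ...
calls = list(map(fake_run_ga, seed_list, runid_list))
print(calls[:3])  # [(100, 1), (200, 2), (300, 3)]
```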

Next, we wait for all the jobs to finish using cloud.join(jids) and then pull the statistics CSV files from PiCloud’s storage to the local file system using cloud.files.get('stats_' + str(i+1) + '.csv','stats_' + str(i+1)+'.csv'). (More information on using the cloud.files module is here.) Once the files have been copied, you can see the results in the CSV files, each of which represents the result of one run of the GA. We have not yet talked about how the CSV files were created, however.

Creating files on PiCloud from Pyevolve

The source code of the CSV adapter, where the CSV files are created, is in pyevolve/DBAdapters.py. The open() method of the class DBFileCSV opens the file for writing, and the close() method of the same class closes the file handle when writing is finished. However, a file created this way on PiCloud is not retrievable by our client. We have to create the file the PiCloud way, which is to use the cloud.files.put() function. All I did was add the line cloud.files.put(self.filename) to the close() method. I then reinstalled the Pyevolve module and it all worked fine. You may find my modified DBAdapters.py file here.
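
The change amounts to hooking an upload step into close(). The sketch below shows the same idea with a subclass instead of editing the library in place. CsvWriter is a made-up stand-in for Pyevolve’s DBFileCSV (it is not Pyevolve code), and upload is a hypothetical callback; on PiCloud it would be cloud.files.put, as described above.

```python
import csv


class CsvWriter:
    """Made-up stand-in for a CSV adapter with open() and close() methods."""

    def __init__(self, filename):
        self.filename = filename
        self.handle = None
        self.writer = None

    def open(self):
        self.handle = open(self.filename, "w", newline="")
        self.writer = csv.writer(self.handle)

    def insert(self, row):
        self.writer.writerow(row)

    def close(self):
        if self.handle is not None:
            self.handle.close()
            self.handle = None


class UploadingCsvWriter(CsvWriter):
    """Same writer, but hands the finished file to an upload callback."""

    def __init__(self, filename, upload):
        super().__init__(filename)
        self.upload = upload  # hypothetical hook, e.g. cloud.files.put

    def close(self):
        super().close()             # flush and close the file first
        self.upload(self.filename)  # then copy it to remote storage
```

Closing the file before uploading is a deliberate choice in this sketch: it makes sure all buffered rows are on disk before the copy is made.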

Conclusion

I hope to discuss with the Pyevolve folks what they think of my changes to the DBAdapter class and see if they suggest a better way. Like I mentioned in the beginning, this is a very naive way of harnessing the power of parallel computing in Evolutionary Algorithms. I hope to explore more in this direction with PiCloud. If you have any queries, please leave a comment.

Thank you for reading.

Fedora Scientific: Open Source Scientific Computing

Hello Fedora People, this happens to be my first aggregated post on Planet Fedora! Great to be here. Onto real stuff.

Okay, this post comes at a time when December is already upon us, Fedora 16 has been released for a month now and that also means that Fedora Scientific has seen the light of the day for a month now. I felt this might be a good time to describe the current state of the project and my plans for the next release(s).

Software in Fedora Scientific (Fedora 16)

The current list of software in Fedora Scientific is available here [1]. Briefly, the categories are:

Scientific Computing tools and environments: The numerical computing package GNU Octave, front-end wxMaxima, the Python scientific libraries SciPy, NumPy and Spyder (a Python environment for scientific computing) are some of the software included in this category. A development environment for R, the statistical computing environment, is also included, and so are the ROOT tools for analysing large amounts of data.

Generic libraries: Software in this category includes the GNU C/C++ and FORTRAN compilers, the OpenJDK Java development tools, and the IDEs NetBeans and Eclipse. Also included are autotools, flex, bison, ddd and valgrind.

Parallel and distributed programming tools/libraries: This category includes the popular parallel programming libraries OpenMPI and PVM, and the shared-memory programming library OpenMP. Also included is the Torque resource manager, which lets you set up a batch-processing system.

Editing, drawing and visualisation tools: So you have simulated your grand experiments, and need to visualise the data, plot graphs, and create publication-quality articles and figures. The tools included to help you with this are LaTeX compilers, the Texmaker and Kile editors, the plotting and visualisation tools Gnuplot, xfig, MayaVi, Dia and Ggobi, and the vector graphics tool Inkscape.

Version control, backup tools and document managers: Version control and back-up tools are included to help you manage your data and documents better: Subversion, Git and Mercurial are available, along with the back-up tool backintime. Also included is a bibliography manager, BibTool.

Besides these main categories, other miscellaneous utilities include hevea (the awesome LaTeX-to-HTML converter), GNU Screen and IPython.

As you can see, the list of software is quite extensive, thanks to the awesome Fedora developers who have packaged this gamut of software.

Future Plans

The current release marks the beginning of a project very close to my heart. I feel that such a spin shall definitely be useful for the current Linux community members and future enthusiasts who use Linux for their computing needs. In the next release(s), I intend to explore the following directions for the spin:

  • A GNOME based spin in addition to the current KDE spin
  • Custom wallpapers
  • Work with the websites team to update the Spin website to include high quality images of scientific software and more content
  • Collect feedback from the community and act on it  :-)

Talk to Us, Contribute

Come, talk to us on the Fedora SciTech SIG mailing list [2]. Thanks to all the members of SciTech SIG for their useful discussions and comments.

Acknowledgements

Thanks to the Fedora Artwork and Fedora Websites teams for help with the artwork for the spin, Bill Nottingham for the initial comments on the idea, and Christoph Wickert for seeing the spin through to release. To all the other people who contributed, even with a single word of encouragement online or offline: please accept my sincere thanks.

References

[1] https://fedoraproject.org/wiki/Scientific_Packages_List
[2] http://fedoraproject.org/wiki/SIGs/SciTech

Parts of this blog post have been reproduced from my article on the Fedora Scientific Spin published in the December 2011 issue of Linux For You.

Fedora Scientific: The Prologue

The Itch

When I wrote this [1] article a while back, the intention was to publicize the software tools that I was personally using at that point of time to help me in my research work: plotting graphs, analysing data, writing papers, running simulations, etc. Those tools soon became indispensable for my research, and hence I always installed them first after a fresh install of Linux. I longed for a Linux distro which would already have these tools installed and allow me to have a fully functional Linux workstation from the first boot.

The Scratching begins

I was getting wary of Ubuntu after their last release (April, 2011) and was looking for a new distribution to commit to. I thought I would give Fedora a shot on one of my computers (the last time I tried Fedora was during the Fedora Core days). Then I started looking around for ways to create custom Fedora spins, and came across the tutorial for Fedora [2]. That’s pretty much all I needed to get started working on a Linux for users in Science and Academia: Fedora Scientific.

Discussions on Mailing lists

The most fruitful technical part of the discussion happened on the Canberra Linux User’s Group mailing list [4]. Thanks to all the folks who suggested packages and, more importantly, opined that the spin would be useful to the target audience.

Fedora Spins SIG

The official word on whether the proposed spin would be useful to the Fedora community in particular and the Linux community overall came from the Fedora Spins SIG [5]. Thanks to them for their support and approval.

Where next

Fedora Scientific is officially on course for release with the Fedora 16 release in the next few days. The nightly builds are now available from [6].

Talk to Us, Contribute

Come, talk to us on the Fedora SciTech SIG mailing list [7]. Thanks to all the members of SciTech SIG for their useful discussions and comments.  This page explains the spin in more detail.

Current List of Packages

The current list of software made available in the Fedora Scientific Spin is at [8].

Acknowledgements

Thanks to the Fedora Artwork and Fedora Websites teams for help with the artwork for the spin, Bill Nottingham for the initial comments on the idea, and Christoph Wickert for seeing the spin through to release. To all the other people who contributed, even with a single word of encouragement online or offline: please accept my sincere thanks.

Links

[1] linuxgazette.net/173/saha.html
[2] http://fedoraproject.org/wiki/How_to_create_and_use_a_Live_CD
[3] https://fedoraproject.org/wiki/Scientific_Spin
[4] http://lists.samba.org/archive/linux/2011-July/030331.html
[5] http://fedoraproject.org/wiki/SIGs/Spins
[6] http://dl.fedoraproject.org/pub/alt/nightly-composes/
[7] http://fedoraproject.org/wiki/SIGs/SciTech
[8] https://fedoraproject.org/wiki/Scientific_Packages_List

In the next post, which I intend to do soon after the official release, I shall talk about the applications and programs installed in Fedora Scientific.

And last, but by no means the least- Snowy, you make this world a better place for me.

Explainer: Evolutionary Algorithms

Whenever you undertake an activity that seeks to minimise or maximise a well-defined quantity such as distance or the vague notion of the right amount of sleep, you are optimising.

Like I have mentioned elsewhere, I like to introduce complex (and not so complex) concepts in a popular science fashion – for consumption by even the high-school kid. Hence, when I was contacted by Bella from The Conversation, I was excited to write about Evolutionary Algorithms and optimization and what we as a group work on.

The article is now live here. Hope you enjoy the read. Many thanks to the team at The Conversation for the final touches on the article.

Snowy, your support in various ways is always appreciated.

Link: http://theconversation.edu.au/explainer-evolutionary-algorithms-3580

Book Review: Essentials of Metaheuristics

Getting the book: http://www.cs.gmu.edu/~sean/book/metaheuristics/

I picked up a copy of this book from the man himself, Sean Luke, at IEEE CEC 2011. I had been “aware” of this book for a while, so I thought it might be a good idea to pick up a print copy for light reading during my travels post-conference. Here is a brief review of the book:

Synopsis:

As the author states, the book is a compilation of undergraduate lecture notes on Metaheuristics. It focuses on the applications of Metaheuristics to optimization problems, including Multi-objective optimization, Combinatorial optimization and Policy optimization. Depending on your experience with Metaheuristics, this book will serve a different purpose for you:

  1. If you are quite well versed in them, this book will be nice light reading, with interesting bits and pieces throughout
  2. If you are starting with them, or want to start with Metaheuristics, this book gives a nice, well-rounded view of the state of the art

Review

The book starts with an overview of gradient-based optimization methods in Chapter 1, gradually moving to stochastic methods such as randomized hill-climbing, tabu search and simulated annealing in Chapter 2.

Chapter 3 introduces population methods — Evolution Strategies, Genetic Algorithms, Differential Evolution and Particle Swarm Optimization.

Over these first three chapters, the author introduces some fundamental concepts: the choice of representation of solutions, issues of exploration vs. exploitation, and local optima traps.

Chapters 4-10 each discuss one specific topic. For example, Chapter 4 is dedicated to representation of solutions: vectors, direct encoded graphs, program trees and rulesets. Chapter 5 discusses parallel methods for metaheuristics and Chapter 7 talks about Multi-objective optimization. Chapters 8 and 10 talk about combinatorial optimization and policy optimization respectively. So, if you are looking for anything specific, you can jump directly to the relevant chapter (assuming, of course, that you have the prerequisite knowledge). As you can see in the ToC, most of the chapters from 4-10 depend on Chapters 3 & 4.

The book finally concludes with some descriptions of test problems and statistical tests that researchers often use to test their algorithms. The very important issue of selecting a proper random number generator is discussed in this chapter.

Conclusion

This book along with Evolutionary Computation: A Unified Approach (You may be interested in my review) is great for getting a holistic view of the Meta-heuristic methods, especially if you are more experienced with only one of them.

Getting the book: http://www.cs.gmu.edu/~sean/book/metaheuristics/

IEEE CEC 2011: Post-conference Thoughts

I am currently sitting by the window of my 10th-floor hotel room, and New Orleans looks beautiful at this time of the night. The neon glow of the hotels and shops around and the lights of those huge wooden/steel bodies on the mighty Mississippi are quite a spectacle for my bespectacled eyes. The IEEE Congress on Evolutionary Computation 2011 concluded today. With three days of paper and poster presentations, plenary lectures, a cruise dinner on the steamboat Natchez and the sumptuous banquet last night, it was an awesome conference. Thank you Dr. Alice Smith, and congratulations on the wonderful conference, which must have given you a lot of sleepless nights.

Here are some rough notes/thoughts/rants on the conference:

Plenary lectures

Each of the three days of the conference began with a plenary lecture. Natalio Krasnogor delivered the lecture on the first day, talking about his work at the confluence of natural sciences and Evolutionary Algorithms. Holger Hoos delivered the lecture on the second day, where he had a lot of interesting things to say about his research, mostly on automating software development, having more degrees of freedom in software, and algorithm-selecting algorithms. Hod Lipson delivered the last of the plenary lectures and demonstrated his work on evolutionary robotics and his super work, Eureqa. There was a lot to take home from each of these lectures. Enlightening and inspiring.

Interesting ideas/papers presented

  • A lot of work is being done on Genetic Programming, mainly as a tool, in varied domains from edge detection to blog network modeling. Once the IEEE CEC proceedings are indexed by IEEE Xplore, it would be very interesting to go through these papers. Available here.
  • Multi-view classification
  • Representation plays a key role in EAs, and Daniel Ashlock’s tutorial on this topic was (or was supposed to be) quite enlightening, but I think I was busy doing something else. However, I intend to go through the slides he used and get an idea of the variety of representation schemes for different applications.

Rants

I usually tend to set high standards for myself and more often than not fail to achieve them, which of course doesn’t deter me from setting them in the first place. Seeing a lot of “well known” people in this field presenting trivial work at a premier conference was quite disheartening. One good thing it does is make me feel that maybe I should be a little gentler on myself.

I wanted to change the world, but they lied to us

I don’t know about you, but I almost always imagine myself on the cover page of the major world newspapers, or their nearest domain equivalent, whenever I do something cool/nice/interesting (according to myself, of course). I thought writing up a paper titled “How does the good old GA perform at Real World Optimization?” would irk a lot of people and elicit reactions from them. I guess nothing really matters. Sigh.

Well, anyway I am taking back a lot from this conference at the Big Easy. Good bye Poboy’s and Gumbo!

2-cent tip: Use them cores on MATLAB

Optimisation using Evolutionary Algorithms is a stochastic process. This makes it a fundamental requirement to “test” your algorithm using a number of different initial random seeds. Thus, several runs of an algorithm (anywhere from 10 to 30) are made to draw a proper inference about its performance. Essentially this looks like:


for i = 1:MAXRUNS
    % ... "your algorithm" goes here ...
end

This is essentially a sequential process and a time-consuming one. Even if you don’t have a cluster of nodes but have a multi-core CPU at your disposal, you can easily make these runs execute simultaneously on the multiple cores using a couple of simple MATLAB constructs: matlabpool and parfor.

First declare the number of worker labs using matlabpool open 3. For example, if you have a quad-core box, you might want to set it to 3. Then replace the for in the loop above with parfor. You will now see three MATLAB processes, with three of your runs going on simultaneously.
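
For readers without MATLAB, the same pattern can be sketched in Python with the standard library’s concurrent.futures module. This is an analogue of the parfor loop, not MATLAB code: one_run is a made-up stand-in for “your algorithm” (a seeded random search on the sphere function), and the default of 3 workers is only illustrative.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def one_run(seed):
    """Stand-in for "your algorithm": seeded random search on sum(x^2)."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(2000):
        point = [rng.uniform(-5.0, 5.0) for _ in range(3)]
        best = min(best, sum(x * x for x in point))
    return best


def run_all(seeds, workers=3):
    """Run one_run once per seed, spread over `workers` processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the seeds
        return list(pool.map(one_run, seeds))
```

Call run_all(seeds) from under an `if __name__ == "__main__":` guard, which process pools need on platforms that start workers by re-importing the script.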

2-cent tip: BibTex to bibitem format

I learnt this tip from:

I am putting this awesomely useful tip up here so that I can quickly refer to it when I need it, and so that Google-fu can find it and it may save some time for you, reader:

Steps:

Create a refs.bib file with all the BibTeX entries, which are easily available from Google Scholar or similar.

Create a “dummy” .tex file with the following entries:

\documentclass{article}
\begin{document}
\nocite{*}
\bibliography{refs}
\bibliographystyle{plain}
\end{document}

Now, do the following:

$ latex dummy
$ bibtex dummy
$ bibtex dummy
$ latex dummy

You will see a dummy.bbl file containing all your BibTeX entries in \bibitem format.
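
The dummy file can also be generated by a script. Here is a minimal Python sketch with a made-up function name (write_dummy_tex): it writes the same six lines shown above for a given .bib file and does not call latex or bibtex itself.

```python
import os

# The six lines of the dummy file; %s is filled with the .bib file stem
DUMMY_TEMPLATE = "\n".join([
    r"\documentclass{article}",
    r"\begin{document}",
    r"\nocite{*}",
    r"\bibliography{%s}",
    r"\bibliographystyle{plain}",
    r"\end{document}",
    "",
])


def write_dummy_tex(bib_stem="refs", directory=".", name="dummy"):
    """Write <name>.tex citing every entry of <bib_stem>.bib; return its path."""
    path = os.path.join(directory, name + ".tex")
    with open(path, "w") as handle:
        handle.write(DUMMY_TEMPLATE % bib_stem)
    return path
```

After write_dummy_tex("refs"), run the latex and bibtex commands listed above in the same directory.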
Peace.

SEAL 2010 Papers I liked reading

SEAL 2010 is happening at IIT Kanpur, India from the 1st to the 4th of December, 2010. As someone who was part of the initial arrangements and of the lab hosting it, I am sure it is going to be legendary, what with people like Narendra Karmarkar delivering keynote lectures. Needless to say, it would have been great to be present.

The papers are up on the Springer website and here are some papers I liked reading:

  • Improving Differential Evolution by Altering Steps in EC: This is a very approachable paper where the authors describe their experiments in modifying a standard DE algorithm by incorporating relevant ideas from another EA, G3-PCX. The bigger picture is to move towards a unified approach to Evolutionary Computing.
  • Bayesian Reliability Analysis under Incomplete Information Using Evolutionary Algorithms
  • Metamodels for Fast Multi-objective Optimization: Trading off Global Exploration and Local Exploitation
  • Generating Sequential Space-Filling Designs Using Genetic Algorithms and Monte Carlo Methods
  • And a paper which I would surely have liked, had I understood it fully: Beyond Convexity: New Perspectives in Computational Optimization

The papers are available online at http://www.springerlink.com/content/978-3-642-17297-7/#section=819416&page=1