Programming and writing about it.

echo $RANDOM

Tag: beaker

LCA 2015 talk: Beaker’s Hardware Inventory system

The video is up on YouTube: http://t.co/WorOwbv37w

Slides: https://amitksaha.fedorapeople.org/lca2015/slides.html

Since I could not make it to LCA, Nick Coghlan presented the talk on my behalf. Thanks Nick!


A docker based workflow for working on beaker

While working on beaker’s code base, I often want to run the tests for a patch or feature and carry on with other work while they run, including running other, unrelated tests. Currently this is not possible: every test run starts with a clean database, so simultaneous runs would step on each other’s toes.

I finally have an initial docker based prototype for making this possible.

Get started with Beaker on Fedora

Beaker 0.14 was released recently. If you are an existing user of Beaker, you may want to see the What’s new page here.

If, however, you do not know what Beaker is, the Architecture guide is a good start. If things look interesting, this release also adds documentation on setting up a Beaker “test bed” using two virtual machines (via libvirt).

Notes on writing systemd unit files for Beaker’s daemon processes

Recently, I had a chance to write systemd unit files for the daemon processes that run as part of Beaker: beakerd, which is the scheduling daemon running on the server, and the four daemons running on the lab controller (beaker-proxy, beaker-provision, beaker-watchdog and beaker-transfer).

This post may be of interest to you if you are using python-daemon to write programs which are capable of running as daemon processes and you want to write systemd unit files for them.

beakerd’s unit file

Here is the systemd unit file for beakerd, which I will use to illustrate the core points of this post. The other unit files are similar, and hence I will explain only where they differ from this one:

[Unit]
Description=Beaker scheduler
After=mysqld.service

[Service]
Type=forking
PIDFile=/var/run/beaker/beakerd.pid
ExecStart=/usr/bin/beakerd
User=apache
Group=apache

[Install]
WantedBy=multi-user.target

The [Unit] section describes the service (the Description option) and uses the After option to say that beakerd should start after mysqld.service. beakerd needs to talk to a MySQL server before it can start successfully, and that server can be local or remote. After only sets up an ordering: if there is a local MySQL server, systemd waits for it to start before starting beakerd. Requires is not suitable here, because it would make a local mysqld.service mandatory and rule out a beakerd configured to use a remote MySQL server.
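The distinction between ordering (After) and a hard requirement (Requires) can be read straight off the unit file. The following is a minimal sketch using Python’s configparser, which copes with simple unit files like this one but is not a full systemd parser (repeated keys, for example, would be rejected). The function name unit_dependencies is made up for illustration.

```python
import configparser

# The beakerd unit file from this post, inlined so the sketch is self-contained.
BEAKERD_UNIT = """\
[Unit]
Description=Beaker scheduler
After=mysqld.service

[Service]
Type=forking
PIDFile=/var/run/beaker/beakerd.pid
ExecStart=/usr/bin/beakerd
User=apache
Group=apache

[Install]
WantedBy=multi-user.target
"""


def unit_dependencies(unit_text):
    """Return the After and Requires lists from the [Unit] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # systemd option names are case-sensitive
    parser.read_string(unit_text)
    unit = parser["Unit"]
    return {
        "After": unit.get("After", fallback="").split(),
        "Requires": unit.get("Requires", fallback="").split(),
    }


deps = unit_dependencies(BEAKERD_UNIT)
print(deps)  # {'After': ['mysqld.service'], 'Requires': []}
```

The empty Requires list is the point: beakerd is ordered after a local MySQL server if one exists, but does not depend on one being present.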

In the [Service] section, Type is set to forking. This is because beakerd uses python-daemon, which forks (detaches itself) during daemonization. However, when creating the DaemonContext() object you must specify detach_process=True. If python-daemon detects that it is running under an init manager, it does not detach itself unless that keyword is explicitly set to True (you can see the code in daemon.py). So although leaving the keyword out works under SysV init, it does not work under systemd with Type=forking: the daemon never forks, systemd keeps waiting for the fork, and eventually kills the process.

The PIDFile option gives the path where beakerd writes its process ID (the same path is given to the DaemonContext object when it is created), and ExecStart gives the path of the binary to start.
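As an illustration of the detach_process point, here is a minimal sketch of how a python-daemon based program might set up its context. It is not beakerd’s actual code: the names FORKING_OPTIONS and run_daemonized are invented, and it assumes a python-daemon release that provides the daemon.pidfile module. The library is imported inside the function so that the rest of the module loads even where python-daemon is not installed.

```python
# Options that matter when the unit file says Type=forking.
# detach_process must be set explicitly: left unset, python-daemon
# decides for itself and does not fork when started by init.
FORKING_OPTIONS = {"detach_process": True}


def run_daemonized(main, pid_file):
    """Daemonize, write the PID to pid_file, then call main().

    pid_file should be the same path as PIDFile= in the unit file,
    e.g. /var/run/beaker/beakerd.pid for beakerd.
    """
    import daemon
    from daemon import pidfile

    # acquire_timeout=0 fails immediately if another instance holds the lock.
    lock = pidfile.TimeoutPIDLockFile(pid_file, acquire_timeout=0)
    with daemon.DaemonContext(pidfile=lock, **FORKING_OPTIONS):
        main()
```

With this in place, the parent process started by ExecStart exits after the fork and systemd picks up the daemon’s PID from the PID file.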

The beakerd process is to be run as the apache user and group, which is specified by the User and Group options.

In the [Install] section, the WantedBy option specifies when the beakerd process should be started (similar to the concept of “run levels” in SysV init). systemd defines several targets, and here we ask for beakerd to be started as part of the multi-user target.

That’s all about beakerd’s unit file.

beaker-provision’s unit file

beaker-provision and the other daemons running on the lab controller have similar unit files:

[Unit]
Description=Beaker provisioning daemon
After=httpd.service

[Service]
Type=forking
PIDFile=/var/run/beaker-lab-controller/beaker-provision.pid
ExecStart=/usr/bin/beaker-provision
User=root
Group=root

[Install]
WantedBy=multi-user.target

All four lab controller daemons need to communicate with the Beaker web application, which can be local or remote, so the After option sets up an ordering on httpd.service. This particular daemon runs as the root user and group, which is specified by the User and Group options.

Everything else is the same as in beakerd’s unit file, and the unit files of the other lab controller daemons follow the same pattern.

Shipping SysV init files and systemd unit files in the same package

The beaker packages now ship both SysV init files and systemd unit files, so that they use systemd when it is available and SysV init otherwise. This commit can give you some idea of how to go about it.

systemd resources

These links proved helpful to learn more about systemd, including how to package unit files for Fedora:

Conflicting libdb dependencies: rpm-libs + httpd 2.2

I ran into a segfaulting web application while attempting to get Beaker server working on Fedora 17. Beaker uses TurboGears for the web application, with mod_wsgi to talk to Apache (httpd-2.2). (I am a little new to the web application world, so ignore any misuse of terms.)

The particular use case that triggered the crash was uploading a task RPM to the Beaker server, which gave a hint about where to look for the faulting code. The task RPM was being saved to disk fine, so writing (permissions, etc.) was not the problem. That left reading the RPM file, which is what Beaker does to retrieve the files and task specifications.

To investigate whether reading the RPM file was the problem, I copied the relevant code from the Beaker sources into a standalone Python script. No problems there. Hence, something had to be wrong with the combination of the rpm libraries (rpm-libs) and the web application (httpd).
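The standalone script itself is not shown in this post. A minimal sketch of what such a script might look like, assuming the rpm Python bindings are installed, is below. The function name read_rpm_header is invented, and this is not Beaker’s own code; it only reads a package header the way the rpm bindings commonly allow.

```python
import os
import sys


def read_rpm_header(rpm_path):
    """Read and return the header of the RPM file at rpm_path."""
    import rpm  # imported here so the module loads without the bindings

    ts = rpm.TransactionSet()
    fd = os.open(rpm_path, os.O_RDONLY)
    try:
        # hdrFromFdno parses the package header from an open file descriptor.
        return ts.hdrFromFdno(fd)
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) == 2:
    hdr = read_rpm_header(sys.argv[1])
    print(hdr["name"], hdr["version"], hdr["release"])
```

Run from a shell against a task RPM, a script like this exercises rpm-libs without httpd in the picture, which is what makes it useful for narrowing down the crash.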

Debugging httpd with gdb

After futile attempts to bracket the crashing code with exception handling and logging, I decided to bite the bullet and go the gdb way. It turned out to be simpler than I thought: run httpd in single process mode, watch the error_log file for the process ID of the wsgi daemon, and then attach to that process with gdb in another terminal.

Start httpd in single process mode:

# gdb /usr/sbin/httpd
(gdb) run -X

Then attach gdb to the wsgi daemon process (make sure your wsgi configuration creates only one mod_wsgi daemon process with a single thread), where <pid> is the process ID found in error_log:

# gdb /usr/sbin/httpd
(gdb) attach <pid>
(gdb) cont

Once I had done this, I performed the action that triggered the crash and got a good stack trace, the first few lines of which were:

#0 0x0000000100000001 in ?? ()
#1 0x00007fffd7fe1097 in db_init (dbhome=0x7fffdf7a9e70 "/var/lib/rpm", rdb=0x7fffdd052970)
at backend/db3.c:151
#2 dbiOpen (rdb=rdb@entry=0x7fffdd052970, rpmtag=rpmtag@entry=0, dbip=dbip@entry=0x7fffe399ff38,
flags=flags@entry=0) at backend/db3.c:551
#3 0x00007fffd7fe8e53 in rpmdbOpenIndex (db=db@entry=0x7fffdd052970, rpmtag=rpmtag@entry=0, flags=0)
at rpmdb.c:149
#4 0x00007fffd7fe93ef in openDatabase (prefix=, dbpath=dbpath@entry=0x0, dbp=dbp@entry=
0x7fffdf797648, mode=mode@entry=0, perms=perms@entry=420, 
.
.
.

That pretty much confirmed that something about rpm-libs running inside httpd was crashing the application. After discussing it with Dan, it was clear that this was a case of conflicting shared libraries, and after a bit more looking around we found that the Berkeley DB database library (libdb) was the culprit. httpd had both libdb-4.8 and libdb-5.2 loaded in its process maps, which I also verified with ‘lsof’ (thanks to StackOverflow for the lsof tip):

# lsof /lib64/libdb-4.8.so

COMMAND   PID   USER  FD   TYPE DEVICE SIZE/OFF    NODE NAME
httpd    9722 apache mem    REG  253,1  1555128 1189531 /usr/lib64/libdb-4.8.so
httpd   10578 apache mem    REG  253,1  1555128 1189531 /usr/lib64/libdb-4.8.so
httpd   18828 apache mem    REG  253,1  1555128 1189531 /usr/lib64/libdb-4.8.so
httpd   18832 apache mem    REG  253,1  1555128 1189531 /usr/lib64/libdb-4.8.so
gdb     18863   root mem    REG  253,1  1555128 1189531 /usr/lib64/libdb-4.8.so

[root@asaha temp]# lsof /lib64/libdb-5.2.so 
COMMAND    PID   USER  FD   TYPE DEVICE SIZE/OFF    NODE NAME
gdb      18824   root mem    REG  253,1  1756808 1204315 /usr/lib64/libdb-5.2.so
httpd    18832 apache mem    REG  253,1  1756808 1204315 /usr/lib64/libdb-5.2.so
gdb      18863   root mem    REG  253,1  1756808 1204315 /usr/lib64/libdb-5.2.so
qemu-kvm 30866   qemu mem    REG  253,1  1756808 1204315 /usr/lib64/libdb-5.2.so

As you can see, both libdb versions are mapped into an httpd process’s address space (PID 18832 appears in both listings). Since httpd itself depends only on libdb-4.8, something else in the web application must be bringing in libdb-5.2. That something turned out to be rpm-libs:

# yum deplist rpm-libs |  grep libdb
  dependency: libdb-5.2.so
   provider: libdb.i686 5.2.36-5.fc17

So, that’s the problem and the reason for the crash.
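The lsof check above can also be done by reading /proc/<pid>/maps directly. The sketch below is one way to do it; the function names and the sample mapping lines (addresses in particular) are made up for illustration, with only the library paths taken from the lsof output above.

```python
import re

# Matches mapped Berkeley DB shared objects such as /usr/lib64/libdb-4.8.so
LIBDB_RE = re.compile(r"libdb-(\d+(?:\.\d+)*)\.so")


def libdb_versions(maps_text):
    """Return the distinct libdb versions named in /proc/<pid>/maps content."""
    return sorted(set(LIBDB_RE.findall(maps_text)))


def conflicting_libdb(pid):
    """Return the libdb versions mapped by pid if there is more than one."""
    with open("/proc/%d/maps" % pid) as maps:
        versions = libdb_versions(maps.read())
    return versions if len(versions) > 1 else []


# Hypothetical excerpt of a maps file for a process with both libraries loaded.
SAMPLE_MAPS = """\
7fffd7c00000-7fffd7d7b000 r-xp 00000000 fd:01 1189531 /usr/lib64/libdb-4.8.so
7fffd6a00000-7fffd6bad000 r-xp 00000000 fd:01 1204315 /usr/lib64/libdb-5.2.so
"""

print(libdb_versions(SAMPLE_MAPS))  # ['4.8', '5.2']
```

Calling conflicting_libdb() with the PID of the wsgi daemon process (as root, or as the user owning the process) would report the same conflict that lsof showed.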

Solution

At this point there is no solution for this on Fedora 17, other than trying to get an httpd-2.4 that is linked against libdb-5.2.so. However, Fedora 18, currently in development, ships with httpd-2.4 linked against libdb-5.3.so, the same version as the rpm-libs it ships with. And indeed, the crash did not occur there.

Reproducing

To reproduce this with a minimal application, I taught myself how to integrate Flask with mod_wsgi and wrote a simple Flask application, which you can check out here. Follow the steps in the Flask docs for help.

Resources

Thanks to Dan on #fedora-qa and Dan in real life who helped me with debugging the crash and thanks to Graham Dumpleton for the mod_wsgi help on #pocoo. Here are some of the docs I referred to:

Related bug reports