Introducing pwait

Posted by David Zaslavsky on November 29, 2014 2:52 PM

Today I’m announcing pwait, a little program I wrote that waits for another program to finish and then returns that program’s exit code. Download it from GitHub and read on…

Why pwait?

Because I was procrastinating one day and felt like doing some systems programming.

Seriously though. Sometimes you need to run one program after another one finishes. For example, you might be running a calculation and then you need to run the data analysis script afterwards, but only if the calculation finished successfully. The easy way to do this is

run_calculation && analyze_data

(sorry to readers who don’t know UNIX shell syntax, but then again the program I’m introducing is useless to you anyway).

Which is fine if you plan for this before you run the calculation in the first place. But sometimes you’ve already started the calculation, it’s been running for 3 hours, and you don’t want to stop it and lose all that progress. The easy way to handle that is to hit Ctrl+Z (or some equivalent; it depends on your terminal) to suspend the calculation, and then run

fg && analyze_data

which will resume it and run the analysis script afterwards.

Which is fine if the program is actually running in a terminal where you can get to it to suspend it, and doesn’t already have something else set to run after it. But what if it’s not?

Or what if doing this doesn’t give you enough hacker street cred?

This is where pwait comes in. You run it as

pwait <pid>

and it will wait for the process with ID <pid> to finish, intercept the exit code, and return it as pwait’s own exit code. You can use this to passively observe whether a program finishes successfully or not. Or, at least, semi-passively.

Blurry animation of pwait at work

How it works

pwait uses the ptrace system call to attach itself as a tracer to the process you want to wait for. A tracer process can do all sorts of things to its tracee, including stopping and starting it, examining its memory, changing values in its memory, filtering signals that are going to be sent to it, and so on. ptrace is mainly used by debuggers. But pwait ignores most of its tracer superpowers, only watching out for one thing: the notification the tracer receives when the process is about to exit. The exit code comes along with that notification. So pwait copies that exit code and exits with it as its own.
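
To make the tracing idea a bit more concrete, here is a minimal sketch in C. It is not taken from pwait’s source: the function names wait_for_exit_status and status_to_code are made up for illustration, and attaching with PTRACE_SEIZE plus the PTRACE_O_TRACEEXIT option is just one plausible way to get notified right before the tracee exits. Reporting death-by-signal as 128 plus the signal number is borrowed from shell convention and is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: turn a wait-style status into a shell-style exit code. */
int status_to_code(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status); /* same convention shells use */
    return -1;
}

/* Hypothetical helper: trace `pid` until it exits.
 * Returns its exit code, or -1 if tracing fails. */
int wait_for_exit_status(pid_t pid)
{
    assert(pid > 0);

    /* Attach without stopping the tracee, and ask the kernel to stop it
     * one last time just before it exits. */
    if (ptrace(PTRACE_SEIZE, pid, NULL, (void *)(long)PTRACE_O_TRACEEXIT) == -1)
        return -1;

    for (;;) {
        int status;
        if (waitpid(pid, &status, __WALL) == -1)
            return -1;

        /* The tracee is already gone without an exit stop
         * (this can happen with SIGKILL, for example). */
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return status_to_code(status);

        if (!WIFSTOPPED(status))
            continue;

        int event = (status >> 16) & 0xffff;
        if (event == PTRACE_EVENT_EXIT) {
            /* The pending exit status is available as the event message. */
            unsigned long msg = 0;
            if (ptrace(PTRACE_GETEVENTMSG, pid, NULL, &msg) == -1)
                return -1;
            ptrace(PTRACE_DETACH, pid, NULL, NULL); /* let it finish exiting */
            return status_to_code((int)msg);
        }

        if (event == PTRACE_EVENT_STOP) {
            /* Group stop (e.g. Ctrl+Z): leave the tracee stopped. */
            ptrace(PTRACE_LISTEN, pid, NULL, NULL);
        } else {
            /* Ordinary signal: pass it through untouched. */
            ptrace(PTRACE_CONT, pid, NULL, (void *)(long)WSTOPSIG(status));
        }
    }
}
```

A caller would do something like `return wait_for_exit_status(atoi(argv[1]));` from main. Note that the loop has to forward every signal it sees, which is why this approach is only semi-passive: the tracee really is being stopped and restarted along the way.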

Using ptrace has some drawbacks. For instance, you can’t have multiple tracers tracing the same process. This means you can’t wait for a program you’re debugging. (I can’t imagine this ever really being a problem, but you never know.) You also can’t wait for a single program with multiple instances of pwait. (I could imagine this being an inconvenience.)

To get around some of these issues, I added a netlink mode. netlink is a way for the Linux kernel to pass messages to and from normal (userspace) programs. Of course, there are many ways messages get passed back and forth between the kernel and userspace, but netlink is rather generic and you can get a broad spectrum of information out of it. Of particular interest to pwait is that netlink can be configured to emit a message every time a process exits. pwait can then register itself to listen for those messages. It gets notified about every process that exits on the whole system, but it’ll just discard all those notifications until it finds one that matches the process ID it’s looking for.
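
Here is a rough sketch in C of what listening for exit messages over netlink can look like. Again, this is an illustration and not pwait’s actual code: the function names (exit_event_matches, subscribe_to_proc_events, wait_via_netlink) are invented, and I’m assuming the kernel’s process events connector (the cn_proc interface on a NETLINK_CONNECTOR socket) is the source of the messages. On many systems subscribing needs elevated privileges, so expect the subscribe step to fail for an ordinary user.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/connector.h>
#include <linux/cn_proc.h>

/* Hypothetical helper: returns 1 and stores the wait-style status if `ev`
 * says that process `pid` exited, 0 otherwise. */
int exit_event_matches(const struct proc_event *ev, pid_t pid, int *status)
{
    if (ev->what != PROC_EVENT_EXIT)
        return 0;
    /* Simplification: only match the main thread of the process. */
    if (ev->event_data.exit.process_pid != pid ||
        ev->event_data.exit.process_tgid != pid)
        return 0;
    *status = (int)ev->event_data.exit.exit_code;
    return 1;
}

/* Hypothetical helper: open a socket subscribed to process events.
 * Returns the file descriptor, or -1 on error. */
int subscribe_to_proc_events(void)
{
    int fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
    if (fd == -1)
        return -1;

    struct sockaddr_nl addr;
    memset(&addr, 0, sizeof addr);
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = CN_IDX_PROC; /* join the process-events group */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        close(fd);
        return -1;
    }

    /* Ask the kernel to start sending process events. */
    enum proc_cn_mcast_op op = PROC_CN_MCAST_LISTEN;
    _Alignas(struct nlmsghdr) char buf[NLMSG_SPACE(sizeof(struct cn_msg) + sizeof op)];
    memset(buf, 0, sizeof buf);
    struct nlmsghdr *nl = (struct nlmsghdr *)buf;
    nl->nlmsg_len = NLMSG_LENGTH(sizeof(struct cn_msg) + sizeof op);
    nl->nlmsg_type = NLMSG_DONE;
    nl->nlmsg_pid = getpid();
    struct cn_msg *cn = NLMSG_DATA(nl);
    cn->id.idx = CN_IDX_PROC;
    cn->id.val = CN_VAL_PROC;
    cn->len = sizeof op;
    memcpy(cn->data, &op, sizeof op);
    if (send(fd, buf, nl->nlmsg_len, 0) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: read events until `pid` exits.
 * Returns its wait-style status, or -1 on error. */
int wait_via_netlink(int fd, pid_t pid)
{
    assert(pid > 0);
    _Alignas(struct nlmsghdr) char buf[8192];

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        int len = (int)n;
        for (struct nlmsghdr *nl = (struct nlmsghdr *)buf; NLMSG_OK(nl, len);
             nl = NLMSG_NEXT(nl, len)) {
            struct proc_event ev;
            /* Skip anything too short to hold a full process event. */
            if (nl->nlmsg_len < NLMSG_LENGTH(sizeof(struct cn_msg) + sizeof ev))
                continue;
            struct cn_msg *cn = NLMSG_DATA(nl);
            memcpy(&ev, cn->data, sizeof ev);
            int status;
            if (exit_event_matches(&ev, pid, &status))
                return status; /* every other event is simply discarded */
        }
    }
}
```

Unlike the tracing approach, nothing here touches the target process, so several listeners can wait on the same PID at once. One thing a real tool would need to handle is the race where the process exits before the subscription is in place; this sketch would wait forever in that case.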

netlink is definitely on the “new and fancy” end of Linux kernel tools; in fact, I could only find one website that demonstrates the functionality I needed for pwait.

How to get it

I’m not particularly confident in this program yet, so there’s no formal release. Just head to GitHub and click on “Download ZIP” in the lower right, or just clone the repository if you prefer. Bug reports and feedback are very welcome!