Handling UNIX/POSIX signals is hard. If you don’t know that they’re hard, then you aren’t paying attention. In Linux there’s a system call signalfd() that makes handling them much less error-prone and fits right into a poll()-based event loop. This article demonstrates why signals are hard, and how to use signalfd(2) with poll(2).[1]

The Problem

Why are signals hard? Suppose that a thread has mutex `m’ locked. When your process receives a signal, the thread is interrupted to handle it. In the signal handler, you try to lock `m’. Your program just deadlocked: the signal handler waits until `m’ is unlocked, but `m’ is never unlocked because the thread holding the lock was suspended to run the handler. In contrast, a multi-threaded program wouldn’t usually deadlock in this situation because both threads are still active (running).

Thus, signals programming is harder than multi-threaded programming.

If you read the man page for signal(7), it contains a list of functions that POSIX requires to be “async-signal-safe”: reentrant functions that don’t lock mutexes or semaphores, and are therefore safe to call from a signal handler. (Newer versions of the Linux man pages keep this list in signal-safety(7).) If a function is not on the list, calling it from a handler could lead to deadlocks and undefined behavior. Note that functions like fprintf(), snprintf(), syslog(), etc. are not on the list.

Also, since you can’t lock a mutex, manipulating any global variables in your program is hard. However, it’s common for programmers to ignore this and do dangerous things in signal handlers anyway (lock mutexes, use an unsafe function, write to variables without regard for atomicity). They mostly get away with it because signals are for exceptional (rare) situations, so the failure cases are seldom hit.

But after a developer learns how dangerous it is to do anything useful in a signal handler,[2] it’s not uncommon to see something like this:

volatile sig_atomic_t shutdown_program;

void signal_handler(int sig)
{
        if (sig == SIGTERM)
                shutdown_program = 1;
}

…and then the program’s main loop polls the variable regularly to catch the shutdown signal. (See glibc’s recommendation.)

Using signalfd(2) as a solution

The man page for signal(7) lists some synchronous solutions to this problem:

  1. sigwaitinfo(2), sigtimedwait(2), sigwait(3) – These suspend execution until a signal is received.  This is useful (for instance) if you have real work being done on threads, and you use your main() function exclusively to handle signals.
  2. signalfd(2) – Provides a file descriptor that will provide information about the signals when they occur.

The reason why I think signalfd(2) is cool is that you can make the file descriptor generate events for your main event loop. For example, if you use poll(2), ppoll(2), select(2), pselect(2), epoll(7), or the GLib GMainLoop… it’s easy to add file-descriptor-bound events to those loops.

Example using poll(2)

The code that follows is a simple example of how to use signalfd(2) with poll(2).  (Note: the man page for signalfd usually includes an example where you use read(2) to block until you get a signal.)  The procedure is as follows:

  1. Block the signals that you’re interested in. (Yes, block them.)
  2. Create the signalfd for the signals that you’re interested in.  (If you don’t block them, then signalfd won’t receive them.)
  3. When the file descriptor is ready to be read, read a `signalfd_siginfo’ structure from the file descriptor to get info about the signal.

Here’s an example:

/* Simple program using signalfd to watch for SIGINT and SIGHUP
 * Compile: gcc -o main main.c
 * Execute: ./main
 * Warning: You must use kill(1) to terminate the program.
 *     Ctrl+C won't terminate it.
 */
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <sys/signalfd.h>
#include <unistd.h>

#define NFDS 1

int main(int argc, char* argv[])
{
        int err;
        sigset_t sigset;
        int fd;

        /* Create a sigset of all the signals that we're interested in */
        err = sigemptyset(&sigset);
        assert(err == 0);
        err = sigaddset(&sigset, SIGINT);
        assert(err == 0);
        err = sigaddset(&sigset, SIGHUP);
        assert(err == 0);

        /* We must block the signals in order for signalfd to receive them */
        err = sigprocmask(SIG_BLOCK, &sigset, NULL);
        assert(err == 0);

        /* Create the signalfd */
        fd = signalfd(-1, &sigset, 0);
        assert(fd != -1);

        /* This is the main loop */
        struct pollfd pfd[NFDS];
        int ret;
        ssize_t bytes;

        pfd[0].fd = fd;
        pfd[0].events = POLLIN | POLLERR | POLLHUP;

        for (;;) {
                printf("Waiting.\n");
                ret = poll(pfd, NFDS, -1);
                printf("I'm awake!\n");

                /* Bail on errors (for simplicity) */
                assert(ret > 0);
                assert(pfd[0].revents & POLLIN);

                /* We have a valid signal, read the info from the fd */
                struct signalfd_siginfo info;
                bytes = read(fd, &info, sizeof(info));
                assert(bytes == sizeof(info));

                unsigned sig = info.ssi_signo;
                unsigned user = info.ssi_uid;

                if (sig == SIGINT)
                        printf("Got SIGINT from user %u\n", user);
                if (sig == SIGHUP)
                        printf("Got SIGHUP from user %u\n", user);
        }

        return 0;
}

Try it out by running it and hitting `Ctrl+C’ or sending it a signal using kill(1). Because all our handling is happening in the main thread’s context, we don’t have hidden gotchas with respect to handling the signal.


[1] – Whenever you see `foo(2)’, that’s a reference to a man page that you can view by typing `man 2 foo’. The number is like saying “I want to read about `foo’ from chapter 2.” This is important because some man pages appear in more than one chapter (e.g. signal(2), signal(7)).

[2] – More precisely: “…in a signal handler that returns.”


Can you find the bug in the following code?

#include <pthread.h>

unsigned u = 0;

void* thread_main(void *arg)
{
        u = 1;
        return 0;
}

int main(int argc, char* argv[])
{
        pthread_t thread;
        pthread_create(&thread, 0, thread_main, &u);

        while (!u);

        pthread_join(thread, 0);
        return 0;
}

This is a simplified version of some test code I was writing. I was kicking off N threads and did a busy wait until all N threads were up and ready to go. Because of some function inlines, the compiler was accessing these thread variables directly… and eventually optimized them away.

If you compile this code with -O0 (using GCC), it will work fine. If you compile it with -O1, -O2, or -O3, then GCC will convert `while (!u)’ into:

0x00000000004004ea <+26>: mov 0x200b50(%rip),%eax # 0x601040 <u>
0x00000000004004f0 <+32>: test %eax,%eax
0x00000000004004f2 <+34>: jne 0x400500 <main+48>
0x00000000004004f4 <+36>: jmp 0x4004f4 <main+36>

Which is the assembly language equivalent of:

        if (!u) {
                for (;;);
        }

That is: GCC ignored the fact that the variable is a global (and could change in another thread), hoisted the check out of the loop, and so reads `u’ only once. My first reaction was that this is a bug in GCC. However, after talking with some peers, the consensus is that GCC is doing the right thing (whether I agree or not).

I know of 3 ways to fix the code…

Keyword `volatile’

You can declare `u’ as volatile:

volatile unsigned u = 0;

By doing this, you tell the compiler not to optimize accesses to this variable at all. The C and C++ standards imply that this would be used for things like memory-mapped hardware (meaning it could change unexpectedly). Thus it is unclear from the standards whether it is needed for multi-threaded programs. With this code and GCC, it appears to be needed.

Memory barrier

You can place a memory barrier in the while() loop:

while (!u) __sync_synchronize(); /* GCC built-in function */

In a simplified sense, this will ensure that the cache is fully updated before continuing. It is both a processor instruction and a compiler directive: besides emitting a hardware memory fence, it tells the compiler not to keep variable values cached in registers across the barrier.

You don’t have to use __sync_synchronize() explicitly. Using a mutex, for instance, will cause a memory barrier to happen.
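Here is a sketch of what the mutex variant might look like, keeping the busy wait from the original test code. The names `flag_lock`, `read_flag` and `start_and_wait` are invented for this example. A condition variable would avoid the spinning entirely, but that is a different technique from the one being discussed.

```c
#include <pthread.h>

static pthread_mutex_t flag_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned u = 0;

static void *thread_main(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&flag_lock);
        u = 1;
        pthread_mutex_unlock(&flag_lock);
        return NULL;
}

/* Read `u' under the mutex. The lock/unlock calls act as barriers,
 * so the compiler must reload `u' on every call. */
static unsigned read_flag(void)
{
        unsigned value;

        pthread_mutex_lock(&flag_lock);
        value = u;
        pthread_mutex_unlock(&flag_lock);
        return value;
}

/* Start the thread and busy-wait until it sets `u'.
 * Returns the value seen, or 0 if the thread could not be created. */
unsigned start_and_wait(void)
{
        pthread_t thread;
        unsigned seen;

        if (pthread_create(&thread, NULL, thread_main, NULL) != 0)
                return 0;
        while ((seen = read_flag()) == 0)
                ;  /* spin */
        pthread_join(thread, NULL);
        return seen;
}
```

Build with `gcc -pthread'. This one does not depend on the optimization level, because every access to `u’ goes through the lock.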

Call an `extern’ function (caveat emptor!)

If you call an `extern’ function inside the loop, then GCC doesn’t optimize away the `u’ check.

while (!u) usleep(1);

Notice that I said doesn’t, and not won’t or shouldn’t. I don’t know of any standard or convention that causes GCC to not optimize here.  Therefore, you should not depend on this behavior. Use one of the other methods.  However, this explains why this method usually works… because it’s usually written with a sleep() (or some other action) in the loop.

Conclusion

Some will say that the code was broken from the start because it didn’t access the integer through a mutex and that accessing integers isn’t always atomic. That’s not right. For one, on every platform that anyone cares about, reading or writing a single integer is atomic. But even if it isn’t, it’s usually 1 single bit that matters (0x00000000 and 0x00000001).  If you read bytes out of order it doesn’t matter because you’re only writing a 0 or a 1. Even if I read the new MSB and the old MSB-1… it doesn’t matter because they’re both 0x00. And cache synchronization usually doesn’t matter here, either. If I read a slightly out-of-date value, then I just go through the loop a couple extra times. Not a problem.

For the common case of using an integer as a lock-free mechanism to shut down a thread, `volatile’ is absolutely necessary. (The memory barrier is equivalent, but usually not portable.) This surprised me because various sources had drilled into me that the keyword `volatile’ was not only The Wrong Thing but also unnecessary unless you had some kind of deep interactions (e.g. memory-mapped hardware).

With that said, if you’re doing anything more complicated than this… do it right. Use mutexes and atomic operations. Lock-free algorithms are hard to get right, so don’t go there unless you’ve proven that you really need it. In other words, don’t optimize away the mutexes unless you really need to (and unless you’re ready for some tricky debugging).
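As one possible reading of “atomic operations,” here is a sketch of the same test using C11’s <stdatomic.h>. This is not one of the three fixes discussed above, and it assumes a compiler and C library with C11 atomics support (reasonably recent GCC and glibc qualify). The name `start_and_spin` is made up for the example.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_uint u;  /* static storage, so it starts out as 0 */

static void *thread_main(void *arg)
{
        (void)arg;
        atomic_store(&u, 1);  /* sequentially consistent store */
        return NULL;
}

/* Start the thread and busy-wait until it sets `u'.
 * Returns the value seen, or 0 if the thread could not be created. */
unsigned start_and_spin(void)
{
        pthread_t thread;
        unsigned seen;

        if (pthread_create(&thread, NULL, thread_main, NULL) != 0)
                return 0;
        /* atomic_load() cannot be hoisted out of the loop. */
        while ((seen = atomic_load(&u)) == 0)
                ;  /* spin */
        pthread_join(thread, NULL);
        return seen;
}
```

Build with `gcc -pthread'. The loop has the same shape as the original `while (!u)’, but the atomic type tells the compiler that another thread may change the value.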