The Linux Signals Handling Model
Signals are used to notify a process or thread of a particular event. Many computer science researchers compare signals with hardware interrupts, which occur when a hardware subsystem, such as a disk I/O (input/output) interface, generates an interrupt to a processor when the I/O completes. This event in turn causes the processor to enter an interrupt handler, so subsequent processing can be done in the operating system based on the source and cause of the interrupt.
UNIX guru W. Richard Stevens aptly describes signals as software interrupts. When a signal is sent to a process or thread, a signal handler may be entered (depending on the current disposition of the signal), which is similar to the system entering an interrupt handler as the result of receiving an interrupt.
Operating system signals actually have quite a history of design changes in the signal code and various implementations of UNIX. This was due in part to some deficiencies in the early implementation of signals, as well as the parallel development work done on different versions of UNIX, primarily BSD UNIX and AT&T System V. James Cox, Berny Goodheart and W. Richard Stevens cover these details in their respective well-known books, so they don't need to be repeated here.
Implementation of correct and reliable signals has been in place for many years now, where an installed signal handler remains persistent and is not reset by the kernel. The POSIX standards provided a fairly well-defined set of interfaces for using signals in code, and today the Linux implementation of signals is fully POSIX-compliant. Note that reliable signals require the use of the newer sigaction interface, as opposed to the traditional signal call.
The occurrence of a signal may be synchronous or asynchronous to the process or thread, depending on the source of the signal and the underlying reason or cause. Synchronous signals occur as a direct result of the executing instruction stream, where an unrecoverable error (such as an illegal instruction or illegal address reference) requires an immediate termination of the process. Such signals are directed to the thread which caused the error with its execution stream. As an error of this type causes a trap into a kernel trap handler, synchronous signals are sometimes referred to as traps.
Asynchronous signals are external to (and in some cases, unrelated to) the current execution context. One obvious example would be the sending of a signal to a process from another process or thread via a kill(2), _lwp_kill(2) or sigsend(2) system call, or a thr_kill(3T), pthread_kill(3T) or sigqueue(3R) library invocation. Asynchronous signals are also aptly referred to as interrupts.
Every signal has a unique signal name, an abbreviation that begins with SIG (SIGINT for interrupt signal, for example) and a corresponding signal number. Additionally, for all possible signals, the system defines a default disposition or action to take when a signal occurs. There are four possible default dispositions:
Exit: forces the process to exit.
Core: forces the process to exit and create a core file.
Stop: stops the process.
Ignore: ignores the signal; no action taken.
A signal's disposition within a process's context defines what action the system will take on behalf of the process when a signal is delivered. All threads and LWPs (lightweight processes) within a process share the signal disposition, which is processwide and cannot be unique among threads within the same process.
Table 1 provides a complete list of signals, along with a description and default action. The data structures in the kernel to support signals in Linux are to be found in the task structure. Here are the most common elements of said structure pertaining to signals:
current-->sig are the signal handlers.
sigmask_lock is a per-thread spinlock which protects the signal queue and atomicity of other signal operations.
current-signal and current-blocked contain a bitmask (currently 64 bits long, but freely expandable) of pending and permanently blocked signals.
sigqueue and sigqueue_tail is a double-linked list of pending signals—Linux has RT signals which can be queued as well. “Traditional” signals are internally mapped to RT signals.
The disposition of a signal can be changed from its default, and a process can arrange to catch a signal and invoke a signal-handling routine of its own or ignore a signal that may not have a default disposition of Ignore. The only exceptions are SIGKILL and SIGSTOP; their default dispositions cannot be changed. The interfaces for defining and changing signal disposition are the signal and sigset libraries and the sigaction system call. Signals can also be blocked, which means the process has temporarily prevented delivery of a signal. Generation of a signal that has been blocked will result in the signal remaining as pending to the process until it is explicitly unblocked or the disposition is changed to Ignore. The sigprocmask system call will set or get a process's signal mask, the bit array inspected by the kernel to determine if a signal is blocked or not. thr_setsigmask and pthread_sigmask are the equivalent interfaces for setting and retrieving the signal mask at the user-threads level.
I mentioned earlier that a signal may originate from several different places for a variety of different reasons. The first three signals listed in Table 1—SIGHUP, SIGINT and SIGQUIT—are generated by a keyboard entry from the controlling terminal (SIGINT and SIGHUP) or if the control terminal becomes disconnected (SIGHUP—use of the nohup command makes processes “immune” from hangups by setting the disposition of SIGHUP to Ignore).
Other terminal I/O-related signals include SIGSTOP, SIGTTIN, SIGTTOU and SIGTSTP. For the signals originating from a keyboard command, the actual key sequence that generates the signals, usually CTRL-C, is defined within the parameters of the terminal session, typically via stty(1) which results in a SIGINT being sent to a process, and has a default disposition of Exit.
User tasks in Linux, created via explicit calls to either thr_create or pthread_create, all have their own signal masks. Linux threads call clone with CLONE_SIGHAND; this shares all signal handlers between threads via sharing the current->sig pointer. Delivered signals are unique to a thread.
In some operating systems, such as Solaris 7, signals generated as a result of a trap (SIGFPE, SIGILL, etc.) are sent to the thread that caused the trap. Asynchronous signals are delivered to the first thread found not blocking the signal. In Linux, it is almost exactly the same. Synchronous signals happening in the context of a given thread are delivered to that thread.
Asynchronous in-kernel signals (e.g., asynchronous network I/O) is delivered to the thread that generated the asynchronous I/O. Explicit user-generated signals get delivered to the right thread as well. However, if CLONE_PID is used, all places that use the PID to deliver a signal will behave in a “weird” way; the signal gets randomly delivered to the first thread in the pidhash. Linux threads don't use CLONE_PID, so there is no such problem if you are using the pthreads.h thread API.
When a signal is sent to a user task, for example, when a user-space program accesses an illegal page, the following happens:
page_fault (entry.S) in the low-level page-fault handler.
do_page_fault (fault.c) fetches i386-specific parameters of the fault and does basic validation of the memory range involved.
handle_mm_fault (memory.c) is generic MM (memory management) code (i386-independent), which gets called only if the memory range (VMA) exists. The MM reads the page table entry and uses the VMA to find out whether the memory access is legal or not.
The case we are interested in now is when the access was illegal (e.g., a write was attempted to a read-only mapping): handle_mm_fault returns 0 to do_page_fault in this case. As you can see from Listing 1, locking of the MM is very finely grained (and it better be this way); the mm->mmap_sem, per-MM semaphore, is used (which typically varies from process to process).
force_sig(SIGBUS,current) is used to “force” the SIGBUS signal on the faulting task. force_sig delivers the signal even if the process has attempted to ignore SIGBUS.
force_sig fills out the signal event structure and queues it into the process's signal queue (current->sigqueue and current->sigqueue_tail). The signal queue holds an indefinite number of queued signals. The semantics of “classic” signals are that follow-up signals are ignored—this is emulated in the signal code kernel/signal.c. “Generic” (or RT) signals can be queued arbitrarily; there are reasonable limits to the length of the signal queue.
The signal is queued, and current-signal is updated. Now comes the tricky part: the kernel returns to user space. Return to user space happens from do_page_fault=>page_fault (entry.S), then the low-level exit code in entry.S is executed in this order:
page_fault=>(called do_page_fault)=>error_code=> ret_from_exception=>(checks if return to user space)=> ret_with_reschedule=>(sees that current->signal is nonzero) =>calls do_signal
Next, do_signal unqueues the signal to be executed. In this case, it's SIGBUS.
Then handle_signal is called with the “unqueued” signal (which can potentially hold extra event information in case of real-time signals/messages).
Next called is setup_frame, where all user-space registers are saved and the kernel stack frame return address is modified to point to the handler of the installed signal handler. A small sequence of code jumper is put on the user stack (obviously, the code first makes sure the user stack is valid) which will return us to kernel space once the signal handler has finished. (See Listing 2.)
Careful: this area is one of the least-understood pieces of the Linux kernel, and for good reason; it is really tough code to read and follow.
The popl %eax ; movl $,%eax ; int $0x80 x86 assembly sequence calls sys_sigret, which later on will restore the kernel stack frame return address to point to the original (faulting) user address.
What is all this magic good for? Well, first the kernel has to guarantee that signal handlers get called properly and the original state is restored. The kernel also has to deal with binary compatibility issues. Linux guarantees that on the IA-32 (Intel x86) architecture, we can run any iBC86-compliant binary code. Speed is also an issue.
Finally, we return to entry.S again, but current-signal is already cleared, so we do not execute do_signal but jump to restore_all as shown in Listing 3. restore.all executes the “iret” that brings us into user space. Suddenly, we are magically executing the signal handler.
Did you get lost yet? No? Here is some more magic. Once the signal handler finishes (it does an assembly “ret” like all well-behaving functions), it will execute the small jumper function we have set up on the user stack. Again we return to the kernel, but now we execute the sys_sigreturn system call, which lives in arch/i386/kernel/signal.c as well. It essentially executes the following code section:
if (restore_sigcontext(regs, &frame->sc, &eax)) goto badframe; return eax;
The above code restores the exact user-register contents into the kernel stack frame (including the return address and flags register) and executes a normal ret_from_syscall, bringing us back to the original faulting code. Hopefully the SIGBUS handler has fixed the problem of why we were faulting.
Now, while reading the above description, you might think this is awfully complex and slow. It actually isn't; lmbench reveals that Linux has the fastest signal-handler installation and execution performance by far of any UNIX running:
moon:~/l> ./lat_sig install Signal handler installation: 1.688 microseconds moon:~/l> ./lat_sig catch Signal handler overhead: 3.186 microseconds
Best of all, it scales linearly on SMP:
moon:~/l> ./lat_sig catch & ./lat_sig catch & Signal handler overhead: 3.264 microseconds Signal handler overhead: 3.248 microseconds moon:~/l> ./lat_sig install & ./lat_sig install & Signal handler installation: 1.721 microseconds Signal handler installation: 1.689 microseconds
Signals can be sent from system calls, interrupts and bottom-half handlers (see sidebar) alike; there is no difference. In other words, the Linux signal queue is interrupt-safe, as strange and recursive as that sounds, so it's fairly flexible.
An interesting signal-delivery case, however, is on SMP. Imagine a thread is executing on one processor, and it gets an asynchronous event (e.g., synchronous socket I/O signal) from an IRQ handler (or another process) on another CPU. In that case, we send a cross-CPU message to the running process, so there is no latency in signal delivery. (The speed of cross-CPU delivery is about five microseconds on a Pentium II 350MHz.)
Once again, we notice how Linux is actually the technology leader in important kernel aspects such as scheduling, interrupt handling and signals handling. This also proves the conjecture that the Linux developer community is collectively more capable and more resourceful than any private corporation's R&D department could ever be.
Moshe Bar (email@example.com) is an Israeli system administrator and OS researcher, who started learning UNIX on a PDP-11 with AT&T UNIX Release 6 back in 1981. He holds an M.Sc. in computer science. Visit Moshe's web site at http://www.moelabs.com/.