The Linux Signals Handling Model
The disposition of a signal can be changed from its default, and a process can arrange to catch a signal and invoke a signal-handling routine of its own or ignore a signal that may not have a default disposition of Ignore. The only exceptions are SIGKILL and SIGSTOP; their default dispositions cannot be changed. The interfaces for defining and changing signal disposition are the signal and sigset libraries and the sigaction system call. Signals can also be blocked, which means the process has temporarily prevented delivery of a signal. Generation of a signal that has been blocked will result in the signal remaining as pending to the process until it is explicitly unblocked or the disposition is changed to Ignore. The sigprocmask system call will set or get a process's signal mask, the bit array inspected by the kernel to determine if a signal is blocked or not. thr_setsigmask and pthread_sigmask are the equivalent interfaces for setting and retrieving the signal mask at the user-threads level.
I mentioned earlier that a signal may originate from several different places for a variety of different reasons. The first three signals listed in Table 1—SIGHUP, SIGINT and SIGQUIT—are generated by a keyboard entry from the controlling terminal (SIGINT and SIGHUP) or if the control terminal becomes disconnected (SIGHUP—use of the nohup command makes processes “immune” from hangups by setting the disposition of SIGHUP to Ignore).
Other terminal I/O-related signals include SIGSTOP, SIGTTIN, SIGTTOU and SIGTSTP. For the signals originating from a keyboard command, the actual key sequence that generates the signals, usually CTRL-C, is defined within the parameters of the terminal session, typically via stty(1) which results in a SIGINT being sent to a process, and has a default disposition of Exit.
User tasks in Linux, created via explicit calls to either thr_create or pthread_create, all have their own signal masks. Linux threads call clone with CLONE_SIGHAND; this shares all signal handlers between threads via sharing the current->sig pointer. Delivered signals are unique to a thread.
In some operating systems, such as Solaris 7, signals generated as a result of a trap (SIGFPE, SIGILL, etc.) are sent to the thread that caused the trap. Asynchronous signals are delivered to the first thread found not blocking the signal. In Linux, it is almost exactly the same. Synchronous signals happening in the context of a given thread are delivered to that thread.
Asynchronous in-kernel signals (e.g., asynchronous network I/O) is delivered to the thread that generated the asynchronous I/O. Explicit user-generated signals get delivered to the right thread as well. However, if CLONE_PID is used, all places that use the PID to deliver a signal will behave in a “weird” way; the signal gets randomly delivered to the first thread in the pidhash. Linux threads don't use CLONE_PID, so there is no such problem if you are using the pthreads.h thread API.
When a signal is sent to a user task, for example, when a user-space program accesses an illegal page, the following happens:
page_fault (entry.S) in the low-level page-fault handler.
do_page_fault (fault.c) fetches i386-specific parameters of the fault and does basic validation of the memory range involved.
handle_mm_fault (memory.c) is generic MM (memory management) code (i386-independent), which gets called only if the memory range (VMA) exists. The MM reads the page table entry and uses the VMA to find out whether the memory access is legal or not.
The case we are interested in now is when the access was illegal (e.g., a write was attempted to a read-only mapping): handle_mm_fault returns 0 to do_page_fault in this case. As you can see from Listing 1, locking of the MM is very finely grained (and it better be this way); the mm->mmap_sem, per-MM semaphore, is used (which typically varies from process to process).
force_sig(SIGBUS,current) is used to “force” the SIGBUS signal on the faulting task. force_sig delivers the signal even if the process has attempted to ignore SIGBUS.
force_sig fills out the signal event structure and queues it into the process's signal queue (current->sigqueue and current->sigqueue_tail). The signal queue holds an indefinite number of queued signals. The semantics of “classic” signals are that follow-up signals are ignored—this is emulated in the signal code kernel/signal.c. “Generic” (or RT) signals can be queued arbitrarily; there are reasonable limits to the length of the signal queue.
The signal is queued, and current-signal is updated. Now comes the tricky part: the kernel returns to user space. Return to user space happens from do_page_fault=>page_fault (entry.S), then the low-level exit code in entry.S is executed in this order:
page_fault=>(called do_page_fault)=>error_code=> ret_from_exception=>(checks if return to user space)=> ret_with_reschedule=>(sees that current->signal is nonzero) =>calls do_signal
Next, do_signal unqueues the signal to be executed. In this case, it's SIGBUS.
Then handle_signal is called with the “unqueued” signal (which can potentially hold extra event information in case of real-time signals/messages).
Next called is setup_frame, where all user-space registers are saved and the kernel stack frame return address is modified to point to the handler of the installed signal handler. A small sequence of code jumper is put on the user stack (obviously, the code first makes sure the user stack is valid) which will return us to kernel space once the signal handler has finished. (See Listing 2.)
Careful: this area is one of the least-understood pieces of the Linux kernel, and for good reason; it is really tough code to read and follow.
The popl %eax ; movl $,%eax ; int $0x80 x86 assembly sequence calls sys_sigret, which later on will restore the kernel stack frame return address to point to the original (faulting) user address.
What is all this magic good for? Well, first the kernel has to guarantee that signal handlers get called properly and the original state is restored. The kernel also has to deal with binary compatibility issues. Linux guarantees that on the IA-32 (Intel x86) architecture, we can run any iBC86-compliant binary code. Speed is also an issue.
Finally, we return to entry.S again, but current-signal is already cleared, so we do not execute do_signal but jump to restore_all as shown in Listing 3. restore.all executes the “iret” that brings us into user space. Suddenly, we are magically executing the signal handler.
Did you get lost yet? No? Here is some more magic. Once the signal handler finishes (it does an assembly “ret” like all well-behaving functions), it will execute the small jumper function we have set up on the user stack. Again we return to the kernel, but now we execute the sys_sigreturn system call, which lives in arch/i386/kernel/signal.c as well. It essentially executes the following code section:
if (restore_sigcontext(regs, &frame->sc, &eax)) goto badframe; return eax;
The above code restores the exact user-register contents into the kernel stack frame (including the return address and flags register) and executes a normal ret_from_syscall, bringing us back to the original faulting code. Hopefully the SIGBUS handler has fixed the problem of why we were faulting.
Now, while reading the above description, you might think this is awfully complex and slow. It actually isn't; lmbench reveals that Linux has the fastest signal-handler installation and execution performance by far of any UNIX running:
moon:~/l> ./lat_sig install Signal handler installation: 1.688 microseconds moon:~/l> ./lat_sig catch Signal handler overhead: 3.186 microseconds
Best of all, it scales linearly on SMP:
moon:~/l> ./lat_sig catch & ./lat_sig catch & Signal handler overhead: 3.264 microseconds Signal handler overhead: 3.248 microseconds moon:~/l> ./lat_sig install & ./lat_sig install & Signal handler installation: 1.721 microseconds Signal handler installation: 1.689 microseconds
- Let's Go to Mars with Martian Lander
- My Childhood in a Cigar Box
- Applied Expert Systems, Inc.'s CleverView for TCP/IP on Linux
- Papa's Got a Brand New NAS
- Returning Values from Bash Functions
- Rogue Wave Software's TotalView for HPC and CodeDynamics
- Tech Tip: Really Simple HTTP Server with Python
- Panther MPC, Inc.'s Panther Alpha
- OpenSSL Hacks