Linux System Calls
In Java, exceptions are objects. In addition to throwing objects whose class is declared in java.lang, you can throw objects of your own design. To create your own class of throwable objects, you need to declare it as a subclass of some member of the Throwable family. In general, however, the throwable classes you define should extend class Exception--they should be “exceptions”. Usually, the class of the exception object indicates the type of abnormal condition encountered. For example, if a thrown exception object has class illegalArgumentException, that indicates someone passed an illegal argument to a method.
When you throw an exception, you instantiate and throw an object whose class, declared in java.lang, descends from Throwable, which has two direct subclasses: Exception and Error. Errors (members of the Error family) are usually thrown for more serious problems, such as OutOfMemoryError, that may not be easy to handle. Errors are usually thrown by the methods of the Java API or the Java Virtual Machine. In general, code you write should throw only exceptions, not errors.
The Java Virtual Machine uses the class of the exception object to decide which catch clause, if any, should be allowed to handle the exception. The catch clause can also get information on the abnormal condition by querying the exception object directly for information you embedded in it during instantiation (before throwing it). The Exception class allows you to specify a detailed message as a string that can be retrieved by invoking getMessage on the exception object.
Each IA32 interrupt or exception has a number, which is referred to in the IA32 literature as its vector. The NMI interrupt and the processor-detected exceptions have been assigned vectors in the range 0 through 31, inclusive. The vectors for maskable interrupts are determined by the hardware. External interrupt controllers put the vector on the bus during the interrupt-acknowledge cycle. Any vector in the range 32 through 255, inclusive, can be used for maskable interrupts or programmed exceptions.
The startup_32 code found in /usr/src/linux/boot/head.S starts everything off at boot time by calling setup_idt. This routine sets up an IDT (Interrupt Descriptor Table) with 256 entries, each four bytes long, total 1024 bytes, offsets 0-255. It should be noted that the IDT contains vectors to both interrupt handlers and exception handlers, so “IDT” is something of a misnomer, but that's the way it is.
No interrupt entry points are actually loaded by startup_32, as that is done only after paging has been enabled and the kernel has been relocated to 0xC000000. At times, mostly during boot, the kernel must be loaded into certain addresses, because the underlying BIOS architecture demands it. After control is passed to the kernel exclusively, the Linux kernel can put itself wherever it wants. Usually this is very high up in memory, but below the 2GB limit.
When start_kernel (found in /usr/src/linux/init/main.c) is called, it invokes trap_init (found in /usr/src/linux/kernel/traps.c). trap_init sets up the IDT via the macro set_trap_gate (found in /usr/include/asm/system.h) and initializes the interrupt descriptor table as shown in the “Offset Descriptionis” table.
At this point, the interrupt vector for the system calls is not set up. It is initialized by sched_init (found in /usr/src/linux/kernel/sched.c). To set interrupt 0x80 to be a vector to the _system_call entry point, call:
set_system_gate (0x80, &system_call)
The priority of simultaneously seen interrupts and exceptions is shown in the sidebar “Runtime Priority of Interrupts”.
The Linux system call interface is vectored through a stub in libc (often glibc) and is exclusively “register-parametered”, i.e., the stack is not used for parameter passing. Each call within the libc library is generally a syscallX macro, where X is the number of parameters used by the actual routine. Under Linux, the execution of a system call is invoked by a maskable interrupt or exception class transfer (e.g., “throwing” an exception object), caused by the instruction in 0x80. Vector 0x80 is used to transfer control to the kernel. This interrupt vector is initialized during system startup, along with other important vectors such as the system clock vector. On the assembly level (in user space), it looks like Listing 1. Nowadays, this code is contained in the glibc2.1 library. 0x80 is hardcoded into both Linux and glibc, to be the system call number which transfers control to the kernel. At bootup, the kernel has set up the IDT vector 0x80 to be a “call gate” (see arch/i386/kernel/traps.c:trap_init):
The vector layout is defined in include/asm-i386/hw_irq.h.
Not until the int $0x80 is executed does the call transfer to the kernel entry point _system_call. This entry point is the same for all system calls. It is responsible for saving all registers, checking to make sure a valid system call was invoked, then ultimately transferring control to the actual system call code via the offsets in the _sys_call_table. It is also responsible for calling _ret_from_sys_call when the system call has been completed, but before returning to user space.
Actual code for the system_call entry point can be found in /usr/src/linux/kernel/sys_call.S and the code for many of the system calls can be found in /usr/src/linux/kernel/sys.c. Code for the rest is distributed throughout the source files. Some system calls, like fork, have their own source file (e.g., kernel/fork.c).
The next instruction the CPU executes after the int $0x80 is the pushl %eax in entry.S:system_call. There, we first save all user-space registers, then we range-check %eax and call sys_call_table[%eax], which is the actual system call.
Since the system call interface is exclusively register-parametered, six parameters at most can be used with a single system call. %eax is the syscall number; %ebx, %ecx, %edx, %esi, %edi and %ebp are the six generic registers used as param0-5; and %esp cannot be used because it's overwritten by the kernel when it enters ring 0 (i.e., kernel mode).
In case more parameters are needed, some structure can be placed wherever you want within your address space and pointed to from a register (not the instruction pointer, nor the stack pointer; the kernel-space functions use the stack for parameters and local variables). This case is extremely rare, though; most system calls have either no parameters or only one.
Once the system call returns, we check one or more status flags in the process structure; the exact number will depend on the system call. creat might leave a dozen flags (existing, created, locked, etc.), whereas a sync might return only one.
If no work is pending, we restore user-space registers and return to user space via iret. The next instruction after the iret is the user-space popl %ebx instruction shown in Listing 1.
Fast/Flexible Linux OS Recovery
On Demand Now
In this live one-hour webinar, learn how to enhance your existing backup strategies for complete disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible full-system recovery solution for UNIX and Linux systems.
Join Linux Journal's Shawn Powers and David Huffman, President/CEO, Storix, Inc.
Free to Linux Journal readers.Register Now!
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- Profiles and RC Files
- Maru OS Brings Debian to Your Phone
- OpenSwitch Finds a New Home
- Understanding Ceph and Its Place in the Market
- Astronomy for KDE
- Git 2.9 Released
- Snappy Moves to New Platforms
- SoftMaker FreeOffice
- What's Our Next Fight?
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide