System Calls

Functions in the Linux kernel can be called by user programs. However, it takes a bit of preparation. In this column, Michael guides you through the process step by step, explaining why as well as what.

Invoking Your System Call

Now you can call your new function from user code, but how? You can't simply declare extern int sys_name(int arg); and link. Instead, you have to #include <unistd.h> and use the appropriate _syscallX() macro, where X is the number of arguments the system call takes. The _syscallX() macros are actually defined in include/asm/unistd.h, which gets included by <unistd.h> automatically.

If your system call is declared as

asmlinkage int sys_name(void);

the _syscall0() invocation is quite easy:

_syscall0(int, name)

(notice the leading underscore). This gets converted by the C preprocessor into

int name(void)
{
        long __res;
        __asm__ volatile ("int $0x80"
                : "=a" (__res)
                : "0" (__NR_name));
        if (__res >= 0)
                return (int) __res;
        errno = -__res;
        return -1;
}

on Linux/i86. Because it uses assembly, it will be different on other architectures. Fortunately, it doesn't really matter. The important point is that it creates a function called name that generates an interrupt (remember the “white lie” about interrupts? System calls are interrupts, too), and that interrupt invokes the system call. The function then returns the result if it is non-negative. If the result is negative (has the high-order bit set), it sets errno to the corresponding positive error number and returns -1.
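The return convention described above can be isolated in a few lines of portable C. This is only a sketch: demo_finish_syscall is a hypothetical helper name, not something the kernel or the C library provides, and it takes the raw result as an argument instead of trapping into the kernel.

```c
#include <errno.h>

/*
 * Hypothetical helper (an assumption for illustration): applies the same
 * return convention as the expanded stub to a raw kernel result.
 */
static int demo_finish_syscall(long raw)
{
        if (raw >= 0)
                return (int) raw;  /* success: pass the value through */
        errno = (int) -raw;        /* failure: the raw value is -errno */
        return -1;
}
```

Passing a raw value of -ENOENT, for instance, should leave ENOENT in errno and hand -1 back to the caller, while any non-negative value comes back unchanged.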

If your function has two arguments:

asmlinkage int sys_name(int num, struct foo *bar);

you would instead use this:

_syscall2(int, name, int, num, struct foo *, bar)

which would expand to:

int name(int num, struct foo * bar)
{
        long __res;
        __asm__ volatile ("int $0x80"
                : "=a" (__res)
                : "0" (__NR_name),
                  "b" ((long)(num)), "c" ((long)(bar)));
        if (__res >= 0)
                return (int) __res;
        errno = -__res;
        return -1;
}

Notice the unusual way of specifying the arguments to the macro, where the return type and the name of the function are followed by separate arguments for the type and name of each of the system call's arguments. Figuring out how to specify system calls with 1, 3, 4, or 5 arguments is left as an exercise for the reader.
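As a hint for that exercise, here is a sketch of the three-argument shape in portable C. Everything prefixed DEMO_ or demo_ is hypothetical: DEMO_SYSCALL3 follows the same argument layout as the macros shown above (return type, name, then a type and a name for each argument), but demo_trap is an ordinary function standing in for the int $0x80 trap, and the call numbers are made up.

```c
#include <errno.h>

/* Made-up call numbers; real ones would come from the kernel headers. */
#define DEMO_NR_add3    1
#define DEMO_NR_missing 99

/* User-space stand-in for the trap into the kernel (assumption). */
static long demo_trap(long nr, long a, long b, long c)
{
        if (nr == DEMO_NR_add3)
                return a + b + c;
        return -ENOSYS;  /* unknown call number */
}

/* Return type, name, then one type/name pair per argument. */
#define DEMO_SYSCALL3(type, name, type1, arg1, type2, arg2, type3, arg3) \
type name(type1 arg1, type2 arg2, type3 arg3)                            \
{                                                                        \
        long res = demo_trap(DEMO_NR_##name,                             \
                             (long)(arg1), (long)(arg2), (long)(arg3));  \
        if (res >= 0)                                                    \
                return (type) res;                                       \
        errno = (int) -res;                                              \
        return -1;                                                       \
}

/* Defines int add3(int x, int y, int z) and int missing(...). */
DEMO_SYSCALL3(int, add3, int, x, int, y, int, z)
DEMO_SYSCALL3(int, missing, int, x, int, y, int, z)
```

Calling add3(1, 2, 3) goes through the stand-in and returns the sum, while missing(1, 2, 3) takes the error path and returns -1 with errno set to ENOSYS. The one- , four- , and five-argument forms would differ only in how many type/name pairs follow the function name.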

For the curious: there is one other way that system calls may be called on Linux/i86. iBCS2-based programs call system calls with an lcall 7,0 instruction instead of an int $0x80 instruction. The lcall instruction takes slightly longer than the int instruction, which is why int $0x80 is the default system call mechanism on Linux, but both are supported. The lcall instruction isn't exactly an interrupt, although it acts much like one; technically it is a “call gate”. So my “white lie” isn't really a lie after all.

Michael K. Johnson is the Editor of Linux Journal, and pretends to be a Linux guru in his spare time. He can be reached via e-mail as johnsonm@ssc.com.
