Kernel Korner - Network Programming in the Kernel

Take a tour of the kernel's networking functionality by writing a network client that runs in kernel space.
Module Basics

Let's examine the code for the module first. In the code snippets in the article, we omit error-checking and other irrelevant details for clarity. The complete code is available from the LJ FTP site (see Resources):


#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

/* For socket etc */
#include <linux/net.h>
#include <net/sock.h>
#include <linux/tcp.h>
#include <linux/in.h>
#include <asm/uaccess.h>
#include <linux/file.h>
#include <linux/socket.h>
#include <linux/smp_lock.h>
#include <linux/slab.h>

...

int ftp_init(void)
{

    printk(KERN_INFO FTP_STRING
    "Starting ftp client module\n");
    sys_call_table[SYSCALL_NUM] = my_sys_call;
    return 0;
}

void ftp_exit(void)
{
    printk(KERN_INFO FTP_STRING
    "Cleaning up ftp client module, bye !\n");
    sys_call_table[SYSCALL_NUM] = sys_ni_syscall;
}

...

The program begins with the customary include directives. Notable among the header files are linux/kernel.h for KERN_ALERT and linux/slab.h, which contains definitions for kmalloc() and linux/smp_lock.h that define kernel-locking routines. System calls are handled in the kernel by functions with the same names in user space but are prefixed with sys_. For example, the sys_socket function in the kernel handles the task of the socket() system call. In this module, we are using system call number 223 for our new system call. This method is not foolproof and will not work on SMP machines. Upon unloading the module, we unregister our system call.

The System Call

The workhorse of the module is the new system call that performs an FTP read. The system call takes a structure as a parameter. The structure is self-explanatory and is given below:

struct params {
    /* Destination IP address */
    unsigned char destip[4];
    /* Source IP address */
    unsigned char srcip[4];
    /* Source file - file to be downloaded from
       the server */
    char src[64];
    /* Destination file - local file where the
       downloaded file is copied */
    char dst[64];
    char user[16]; /* Username */
    char pass[64]; /* Password */
};

The system call is given below. We explain the relevant details in next few paragraphs:


asmlinkage int my_sys_call
(struct params __user *pm)
{
    struct sockaddr_in saddr, daddr;
    struct socket *control= NULL;
    struct socket *data = NULL;
    struct socket *new_sock = NULL;

    int r = -1;
    char *response = kmalloc(SNDBUF, GFP_KERNEL);
    char *reply = kmalloc(RCVBUF, GFP_KERNEL);

    struct params pmk;

    if(unlikely(!access_ok(VERIFY_READ,
                 pm, sizeof(pm))))
        return -EFAULT;
    if(copy_from_user(&pmk, pm,
       sizeof(struct params)))
        return -EFAULT;
    if(current->uid != 0)
        return r;

    r = sock_create(PF_INET, SOCK_STREAM,
                    IPPROTO_TCP, &control);

    memset(&servaddr,0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(PORT);
    servaddr.sin_addr.s_addr =
        htonl(create_address(128, 196, 40, 225));

    r = control->ops->connect(control,
             (struct sockaddr *) &servaddr,
             sizeof(servaddr), O_RDWR);
    read_response(control, response);
    sprintf(temp, "USER %s\r\n", pmk.user);
    send_reply(control, temp);
    read_response(control, response);
    sprintf(temp, "PASS %s\r\n", pmk.pass);
    send_reply(control, temp);
    read_response(control, response);

We start out by declaring pointers to a few socket structures. kmalloc() is the kernel equivalent of malloc() and is used to allocate memory for our character array. The array's response and reply will contain the responses to and replies from the server.

The first step is to read the parameters from user mode to kernel mode. This is customarily done with access_ok and verify_read/verify_write calls. access_ok checks whether the user-space pointer is valid to be referenced. verify_read is used to read data from user mode. For reading simple variables like char and int, use __get_user.

Now that we have the user-specified parameters, the next step is to create a control socket and establish a connection with the FTP server. sock_create() does this for us—its arguments are similar to those we pass to the user-level socket() system call. The struct sockaddr_in variable servaddr is now filled in with all the necessary information—address family, destination port and IP address of the server. Each socket structure has a member that is a pointer to a structure of type struct proto_ops. This structure contains a list of function pointers to all the operations that can be performed on a socket. We use the connect() function of this structure to establish a connection to the server. Our functions read_response() and send_reply() transfer data between the client and server (these functions are explained later):


r = sock_create(PF_INET, SOCK_STREAM,
                IPPROTO_TCP, &data);
memset(&claddr,0, sizeof(claddr));
claddr.sin_family = AF_INET;
claddr.sin_port = htons(EPH_PORT);
clddr.sin_addr.s_addr= htonl(
                       create_address(srcip));
r = data->ops->bind(data,
         (struct sockaddr *)&claddr,
         sizeof (claddr));
r = data->ops->listen(data, 1);

Now, a data socket is created to transfer data between the client and server. We fill in another struct sockaddr_in variable claddr with information about the client—protocol family, local unprivileged port that our client would bind to and, of course, the IP address. Next, the socket is bound to the ephemeral port EPH_PORT. The function listen() lets the kernel know that this socket can accept incoming connections:


a = (char *)&claddr.sin_addr;
p = (char *)&claddr.sin_port;

send_reply(control, reply);
read_response(control, response);

strcpy(reply, "RETR ");
strcat(reply, src);
strcat(reply, "\r\n");

send_reply(control, reply);
read_response(control, response);

As explained previously, a PORT command is issued to the FTP server to let it know the port for data transfer. This command is sent over the control socket and not over the data socket:


new_sock = sock_alloc();
new_sock->type = data->type;
new_sock->ops = data->ops;

r = data->ops->accept(data, new_sock, 0);
new_sock->ops->getname(new_sock,
    (struct sockaddr *)address, &len, 2);

Now, the client is ready to accept data from the server. We create a new socket and assign it the same type and ops as our data socket. The accept() function pulls the first pending connection in the listen queue and creates a new socket with the same connection properties as data. The new socket thus created handles all data transfer between the client and server. The getname() function gets the address at the other end of the socket. The last three lines in the above segment of code are useful only for printing information about the server:


if((total_written = write_to_file(pmk.dst,
            new_sock, response)) < 0)
    goto err3;

The function write_to_file deals with opening a file in the kernel and writing data from the socket back into the file. Writing to sockets works like this:


void send_reply(struct socket *sock, char *str)
{
    send_sync_buf(sock, str, strlen(str),
                  MSG_DONTWAIT);
}


int send_sync_buf
(struct socket *sock, const char *buf,
 const size_t length, unsigned long flags)
{
    struct msghdr msg;
    struct iovec iov;
    int len, written = 0, left = length;
    mm_segment_t oldmm;

    msg.msg_name     = 0;
    msg.msg_namelen  = 0;
    msg.msg_iov      = &iov;
    msg.msg_iovlen   = 1;
    msg.msg_control  = NULL;
    msg.msg_controllen = 0;
    msg.msg_flags    = flags;

    oldmm = get_fs(); set_fs(KERNEL_DS);

repeat_send:
    msg.msg_iov->iov_len = left;
    msg.msg_iov->iov_base = (char *) buf +
                                written;

    len = sock_sendmsg(sock, &msg, left);
    ...
    return written ? written : len;
}

The send_reply() function calls send_sync_buf(), which does the real job of sending the message by calling sock_sendmsg(). The function sock_sendmsg() takes a pointer to struct socket, the message to be sent and the message length. The message is represented by the struture msghdr. One of the important members of this structure is iov (io vector). The iovector has two members, iov_base and iov_len:

struct iovec
{
    /* Should point to message buffer */
    void *iov_base;
    /* Message length */
    __kernel_size_t iov_len;
};

These members are filled with appropriate values, and sock_sendmsg() is called to send the message.

The macro set_fs is used to set the FS register to point to the kernel data segment. This allows sock_sendmsg() to find the data in the kernel data segment instead of the user-space data segment. The macro get_fs saves the old value of FS. After a call to sock_sendmsg(), the saved value of FS is restored.

Reading from the socket works similarly:


int read_response(struct socket *sock, char *str)
{
        ...
        len = sock_recvmsg(sock, &msg,
                max_size, 0);
        ...
        return len;
}

The read_response() function is similar to send_reply(). After filling the msghdr structure appropriately, it uses sock_recvmsg() to read data from a socket and returns the number of bytes read.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Fail to create dynamic system call

Kunsheng's picture

Hi everyone,

Has anyone tried this code themselves? I failed to insert that system call dynamically (but succeeded to compile and insmod the module).

I am wondering if anyone know what's going on there.

PS: I am using Linux 2.6.29 in Ubuntu.

Thanks in advance,

-Kunsheng

kernel raw sockets

Anonymous's picture

Hello,
I am new to linux kernel development. so if any mistakes you find, pls
frgive it and correct me.

I wanted to send raw packets through ethernet, from kernel level.
So i use PF_PACKET family. & SOCK_RAW.And i used sock_create()
function to create socket.
But I found that when i create socket with
sock_create(PF_PACKET,SOCK_RAW,.....) the program always fails in
bind. (when i do sock->ops->bind(.....))
why is it so ? but when I use PF_INET & SOCK_PACKET to create socket.
bind happens successfully.
Can any one help me to come out of this issue?? OR direct me to create
raw packets and send from kernel??

thanks in advance
-Anuroop

create_address doesn't exist

Derek's picture

The function create_address doesn't exist in my kernel header (2.6.17). Where should I get the definition?

Thanks,
Derek

It's true.Create_address is

Prafulla's picture

It's true.Create_address is not in Kernel Header.
Has anyone found how to get it?
Please reply

insmod 'ing the module

Nathan's picture

In your example you don't actually use insmod after building the module, does that mean its not necessary? If not then how does the userland program see the system call. If so then do you know why it insmod'ing it would freeze my system? Cause it does. I did fiddle with the code a bit, mostly stripped it down to just connect, send a message, and close.

Thanks
Nathan

Problem was redefining the

Nathan's picture

Problem was redefining the system call. Seems linux doesn't appreciate it none too much and freezes. I've read its not really done anymore anyway.

Nathan

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix