Manipulating the Networking Environment Using RTNETLINK

How to use RTNETLINK to develop applications that control networking.

NETLINK is a facility in the Linux operating system for user-space applications to communicate with the kernel. NETLINK is an extension of the standard socket implementation. Using NETLINK, an application can send/receive information to/from different kernel features, such as networking, to check current status and control them.

In this article, I describe how a programmer can use RTNETLINK, the capability of NETLINK for manipulating the networking environment. I discuss some areas where RTNETLINK is used, the relevant socket operations, the functionality and how RTNETLINK messages are formed, and finally, I provide a set of sample code that uses RTNETLINK. RTNETLINK is selected with the NETLINK_ROUTE protocol, which serves both the IP version 4 and the IP version 6 environments; the address family given in each request (AF_INET or AF_INET6) selects the IP version. The explanations given here are applicable to both IP versions 4 and 6.

Developers of network layer protocol handlers can use RTNETLINK to modify and monitor different components of networking, such as the routing table and network interfaces. There are many existing and upcoming protocol standards at the Internet Engineering Task Force (IETF) that can be implemented in user space. These implementations require manipulating the routing table and knowing what is being modified by other processes. Some of these protocol categories are as follows:

  • Dynamic routing protocols: protocols of this category, including the Routing Information Protocol (RIP), Open Shortest Path First (OSPF) and Exterior Gateway Protocol (EGP) actively manage the routing environment of a host while communicating with other equally capable hosts or routers in the network or Internet.

  • Mobility protocols: hosts that are mobile and connect to different networks at different times use protocols such as Mobile IP (MIP), Session Initiation Protocol (SIP) and Network Mobility (NEMO) to manage routing to maintain connectivity and continuity of communications.

  • Ad hoc networking protocols: hosts that are mobile and located in places where there is no networking infrastructure, such as routers and WLAN access points, require peer-to-peer communications with differently configured hosts. Mobile computers of rescue workers in an earthquake-struck area or other such emergencies can use ad hoc networking protocols. These protocols, such as the Ad hoc On-demand Distance Vector (AODV) and Optimized Link State Routing (OLSR), require managing the routing to find and communicate with other hosts using neighboring hosts as routers and gateways.

Implementing these protocols in user space helps keep complexity out of the kernel code. It also simplifies their development and testing, because many user-space development tools are available. Problems such as kernel crashes, which are likely with kernel-based code during testing or in the hands of end users, will not occur with a user-space protocol handler.

Socket Operations

The socket implementation of Linux allows two end points to communicate. The socket API provides a standard set of functions and data structures. With RTNETLINK, the two end points in communication are user space and kernel space. The following sequence of socket calls has to be made when manipulating the networking environment through RTNETLINK:

  1. Open socket.

  2. Bind socket to local address (using process ID).

  3. Send message to the other end point.

  4. Receive message from the other end point.

  5. Close socket.

The socket() function opens an unattached end point to communicate with the kernel. The function prototype of this call is as follows:


int socket(int domain, int type, int protocol);

domain selects the address family of the socket. For RTNETLINK, we use AF_NETLINK (PF_NETLINK). type selects the socket type, which can be raw (SOCK_RAW) or datagram (SOCK_DGRAM). NETLINK does not distinguish between the two, so either can be used with RTNETLINK. protocol selects the exact NETLINK capability that we use; in our case, it is NETLINK_ROUTE. If the socket is opened successfully, the function returns a non-negative integer called the socket descriptor. This descriptor is used in all subsequent RTNETLINK calls until the socket is closed. On failure, -1 is returned, and the system error variable errno, declared in errno.h, is set to the appropriate error code.

The following is an example of a call to open an RTNETLINK socket:


int fd;
...
fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

Once the socket is opened, it has to be bound to a local address. The user application can use a unique 32-bit ID to identify the local address. The function prototype of bind is as follows:


int bind(int fd, const struct sockaddr *my_addr,
                              socklen_t addrlen);

To bind, the caller must provide a local address using the sockaddr_nl structure. This structure in the linux/netlink.h #include file has the following format:


struct sockaddr_nl
{
  sa_family_t     nl_family; // AF_NETLINK
  unsigned short  nl_pad;    // zero
  __u32           nl_pid;    // process pid
  __u32           nl_groups; // multicast grps mask
};

The nl_pid must contain a unique ID, which can be created using the return value of the getpid() function. This function returns the process ID of the current user process that opened the RTNETLINK socket. But if our process consists of multiple threads, each opening its own RTNETLINK socket, every socket needs a different ID, so a modified process ID has to be used. Alternatively, if nl_pid is set to zero, the kernel assigns a unique ID when the socket is bound.

Once this structure is filled, the binding can be done. The bind function returns zero if the operation succeeded. On failure, it returns -1 and sets errno. The following is an example of calling bind:


struct sockaddr_nl la;
...
bzero(&la, sizeof(la));
la.nl_family = AF_NETLINK;
la.nl_pad = 0;
la.nl_pid = getpid();
la.nl_groups = 0;
rtn = bind(fd, (struct sockaddr*) &la, sizeof(la));
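For processes that open several RTNETLINK sockets, one option is to leave nl_pid at zero, in which case the kernel assigns a unique ID at bind time. The sketch below does exactly that and then reads the assigned ID back with getsockname(). The function name open_rtnl_autobind() is hypothetical; only the socket calls and the sockaddr_nl fields come from the text.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: open an RTNETLINK socket and bind it with
   nl_pid = 0 so the kernel picks a unique ID. The assigned ID is
   stored in *pid_out. Returns the descriptor, or -1 on failure. */
int open_rtnl_autobind(__u32 groups, __u32 *pid_out)
{
    struct sockaddr_nl la;
    socklen_t len = sizeof(la);
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -1;

    memset(&la, 0, sizeof(la));
    la.nl_family = AF_NETLINK;
    la.nl_pid = 0;            /* 0: let the kernel choose the ID */
    la.nl_groups = groups;    /* multicast groups, 0 for none */

    /* After bind(), getsockname() reports the ID actually assigned. */
    if (bind(fd, (struct sockaddr *)&la, sizeof(la)) < 0 ||
        getsockname(fd, (struct sockaddr *)&la, &len) < 0) {
        close(fd);
        return -1;
    }
    *pid_out = la.nl_pid;
    return fd;
}
```

Two sockets opened this way in the same process receive different IDs, so no manual scheme for deriving IDs from the process ID is needed.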

If the operation you require is multicast-based, you must set nl_groups to join the multicast groups associated with the required RTNETLINK operation. For example, if you want to be notified of changes made to the routing table by other processes, set nl_groups to RTMGRP_IPV4_ROUTE | RTMGRP_NOTIFY.
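The paragraph above can be sketched as follows, assuming IPv4 routes are of interest. The function names open_route_monitor() and route_event_name() are hypothetical. Each notification carries the message type RTM_NEWROUTE when a route is added and RTM_DELROUTE when one is deleted, so the type alone tells the two cases apart.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: open an RTNETLINK socket that joins the IPv4
   route multicast group, so the kernel sends a notification for each
   IPv4 route change. Returns the descriptor, or -1 on failure. */
int open_route_monitor(void)
{
    struct sockaddr_nl la;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -1;
    memset(&la, 0, sizeof(la));
    la.nl_family = AF_NETLINK;
    la.nl_groups = RTMGRP_IPV4_ROUTE | RTMGRP_NOTIFY;
    if (bind(fd, (struct sockaddr *)&la, sizeof(la)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Classify one notification by its NETLINK message type. */
const char *route_event_name(const struct nlmsghdr *nlh)
{
    switch (nlh->nlmsg_type) {
    case RTM_NEWROUTE:
        return "route added";
    case RTM_DELROUTE:
        return "route deleted";
    default:
        return "other";
    }
}
```

A monitoring loop would call recv() on the returned descriptor and pass each message found with NLMSG_OK() and NLMSG_NEXT() to route_event_name().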

Sending routing RTNETLINK messages to the kernel is done through the use of the standard sendmsg() function of the socket interface. The following is the prototype of this function:


ssize_t sendmsg(int fd, const struct msghdr *msg,
                                      int flags);

msg is a pointer to a msghdr structure. The following is the format of this structure:


struct msghdr
{
  void *msg_name;        //Address to send to
  socklen_t msg_namelen; //Length of address data

  struct iovec *msg_iov; //Vector of data to send
  size_t msg_iovlen;     //Number of iovec entries

  void *msg_control;     //Ancillary data
  size_t msg_controllen; //Ancillary data buf len

  int msg_flags;         //Flags on received msg
};

msg_name points to a variable of type struct sockaddr_nl that holds the destination address of the sendmsg() call. Because this message is directed to the kernel, all members of this sockaddr_nl are initialized to zero except nl_family, which is set to AF_NETLINK; an nl_pid of zero addresses the kernel. msg_namelen should contain the size of a struct sockaddr_nl.

msg_iov points to an array of struct iovec entries, each describing a buffer filled with the RTNETLINK message relevant to the request being made. The caller is allowed to place multiple RTNETLINK requests, if required. msg_iovlen holds the number of struct iovec entries in msg_iov. The rest of the members are initialized to zero.
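A minimal sketch of filling in these structures, assuming the request is a dump of all IPv4 routes (RTM_GETROUTE with NLM_F_DUMP). The names route_dump_req, init_route_dump() and send_rtnl_request() are hypothetical.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: NETLINK header followed by the
   routing-specific header, with no attributes. */
struct route_dump_req {
    struct nlmsghdr nl;
    struct rtmsg    rt;
};

/* Ask the kernel for every IPv4 route. */
void init_route_dump(struct route_dump_req *req, __u32 seq)
{
    memset(req, 0, sizeof(*req));
    req->nl.nlmsg_len   = NLMSG_LENGTH(sizeof(struct rtmsg));
    req->nl.nlmsg_type  = RTM_GETROUTE;
    req->nl.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->nl.nlmsg_seq   = seq;
    req->rt.rtm_family  = AF_INET;
}

/* Wrap one NETLINK message in a msghdr addressed to the kernel. */
ssize_t send_rtnl_request(int fd, struct nlmsghdr *req)
{
    struct sockaddr_nl kernel;
    struct iovec iov;
    struct msghdr msg;

    memset(&kernel, 0, sizeof(kernel));
    kernel.nl_family = AF_NETLINK;       /* nl_pid 0 is the kernel */

    iov.iov_base = req;
    iov.iov_len  = req->nlmsg_len;

    memset(&msg, 0, sizeof(msg));        /* control fields stay zero */
    msg.msg_name    = &kernel;
    msg.msg_namelen = sizeof(kernel);
    msg.msg_iov     = &iov;
    msg.msg_iovlen  = 1;

    return sendmsg(fd, &msg, 0);
}
```

On success, sendmsg() returns the number of bytes sent, which here equals nlmsg_len; the kernel's replies are then read with recv().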

To receive RTNETLINK messages, the recv() function is used. Here is the prototype of this function:


ssize_t recv(int fd, void *buf, size_t len,
                                      int flags);

The second and third arguments are a pointer to a buffer that receives the bytes read and the length of this buffer, respectively. For RTNETLINK, the buffer will contain a sequence of RTNETLINK messages that have to be read one after the other using a set of macros provided in the linux/netlink.h and linux/rtnetlink.h #include files. flags indicates how the receive should be performed; for RTNETLINK, it can simply be set to zero.
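A sketch of that reading loop, assuming a buffer filled by one recv() call. The function walk_reply() and its return convention are hypothetical; NLMSG_OK(), NLMSG_NEXT() and NLMSG_DATA() are the netlink.h macros. A dump ends with an NLMSG_DONE message, and failures are reported in an NLMSG_ERROR message carrying a struct nlmsgerr, where an error value of zero is an acknowledgment.

```c
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: walk the messages in one recv() buffer.
   Returns 1 if NLMSG_DONE was seen (a dump is complete), -1 if an
   NLMSG_ERROR reported a failure, and 0 if the caller should keep
   reading. *count is incremented for each data message found. */
int walk_reply(char *buf, int len, int *count)
{
    struct nlmsghdr *nlh;

    for (nlh = (struct nlmsghdr *)buf; NLMSG_OK(nlh, len);
         nlh = NLMSG_NEXT(nlh, len)) {
        if (nlh->nlmsg_type == NLMSG_DONE)
            return 1;
        if (nlh->nlmsg_type == NLMSG_ERROR) {
            struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(nlh);
            if (err->error != 0)
                return -1;      /* err->error holds a negative errno */
            continue;           /* error 0 is an acknowledgment */
        }
        /* A data message, e.g. RTM_NEWROUTE; its payload starts at
           NLMSG_DATA(nlh) and is parsed with the RTA_* macros. */
        (*count)++;
    }
    return 0;
}
```

Because NLMSG_NEXT() decrements len as it advances, the loop stops cleanly at the end of the data actually received, even when a message would run past it.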

Once the socket communications are complete, the socket has to be closed using the close() function. Here's the prototype of this function:


int close(int fd);
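Putting the five steps together, here is a minimal sketch that asks the kernel for its list of network interfaces (an RTM_GETLINK dump) and prints each interface's index and name from the IFLA_IFNAME attribute. The function name list_links() is hypothetical, the 8192-byte receive buffer is an assumption, and error handling is kept short.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical example: steps 1-5 in one function. Dumps the interface
   list (RTM_GETLINK) and prints "index: name" for each interface.
   Returns the number of interfaces seen, or -1 on error. */
int list_links(void)
{
    struct { struct nlmsghdr nl; struct ifinfomsg ifi; } req;
    struct nlmsghdr buf[8192 / sizeof(struct nlmsghdr)]; /* aligned buffer */
    struct sockaddr_nl la, kernel;
    struct iovec iov;
    struct msghdr msg;
    int fd, count = 0, done = 0;

    /* 1. Open the socket. */
    fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;

    /* 2. Bind to a local address; nl_pid 0 lets the kernel choose it. */
    memset(&la, 0, sizeof(la));
    la.nl_family = AF_NETLINK;
    if (bind(fd, (struct sockaddr *)&la, sizeof(la)) < 0)
        goto fail;

    /* 3. Send a dump request to the kernel (destination nl_pid 0). */
    memset(&req, 0, sizeof(req));
    req.nl.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req.nl.nlmsg_type = RTM_GETLINK;
    req.nl.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.ifi.ifi_family = AF_UNSPEC;

    memset(&kernel, 0, sizeof(kernel));
    kernel.nl_family = AF_NETLINK;
    iov.iov_base = &req;
    iov.iov_len = req.nl.nlmsg_len;
    memset(&msg, 0, sizeof(msg));
    msg.msg_name = &kernel;
    msg.msg_namelen = sizeof(kernel);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    if (sendmsg(fd, &msg, 0) < 0)
        goto fail;

    /* 4. Receive replies until NLMSG_DONE ends the dump. */
    while (!done) {
        struct nlmsghdr *nlh = buf;
        int len = (int)recv(fd, buf, sizeof(buf), 0);

        if (len <= 0)
            goto fail;
        for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
            struct ifinfomsg *ifi;
            struct rtattr *rta;
            int alen;

            if (nlh->nlmsg_type == NLMSG_DONE) {
                done = 1;
                break;
            }
            if (nlh->nlmsg_type == NLMSG_ERROR)
                goto fail;
            if (nlh->nlmsg_type != RTM_NEWLINK)
                continue;

            /* Attributes follow the ifinfomsg header. */
            ifi = NLMSG_DATA(nlh);
            rta = (struct rtattr *)((char *)ifi + NLMSG_ALIGN(sizeof(*ifi)));
            alen = (int)NLMSG_PAYLOAD(nlh, sizeof(*ifi));
            for (; RTA_OK(rta, alen); rta = RTA_NEXT(rta, alen))
                if (rta->rta_type == IFLA_IFNAME)
                    printf("%d: %s\n", ifi->ifi_index, (char *)RTA_DATA(rta));
            count++;
        }
    }

    /* 5. Close the socket. */
    close(fd);
    return count;

fail:
    close(fd);
    return -1;
}
```

Calling list_links() from main() on a typical Linux host prints one line per interface, usually starting with the loopback interface lo.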

______________________

Comments


wireless link

Pranab

Is there any way to monitor the wireless link going up and down using RTNETLINK? If yes, which parameters need to be changed?

Getting default gateway

donX

Is it possible to get the default gateway without having to create a route? I am trying to get the default gateway's IP without having to add to or delete from the routing table.

Please let me know, and thank you for this article; it was great!

double event found on netlink socket

Anonymous

Hi all,

I'm new to netlink socket programming. I am developing a simple application that informs me whenever any interface is brought up or down using ifup/ifdown/ifconfig, or when the cable is unplugged. The problem is that I get two packets for every ifup/ifdown or cable-unplug event.

I am not able to solve the problem and don't understand why this happens.

I'm using "nl_groups = RTNLGRP_LINK" only.

Thanks.

Route add does not work.

Nilesh

Using this tutorial, I created a function to add a route, and I believe I am populating all the necessary elements of the data structures. However, the route is added wrongly: for any kind of route, the function only adds a 0.0.0.0 route with mask 255.255.255.255 and gateway 0.0.0.0. It does point to the correct interface that I specify in RTA_OIF.

Here is the function.

unsigned int rtm_add_v4 (unsigned int prefix,
                         unsigned char len, u_char tbl_index,
                         unsigned int oif,
                         u_char proto,
                         u_char rt_type,
                         struct rtnexthop *rtnh)
{
    struct sockaddr_nl ra;
    struct msghdr msg;
    struct iovec iov;
    char buf[8192];
    int rtn;

    struct nlmsghdr *nlm;
    int nlml;
    struct rtmsg *rt;
    int rtl;
    struct rtattr *rta;
    rtsock_req_t rreq;

    assert(rtm_initialized);

    bzero(&rreq, sizeof(rreq));

    rtl = sizeof(struct rtmsg);

    rta = (struct rtattr *)rreq.buf;

    rta->rta_type = RTA_DST;
    rta->rta_len = sizeof(struct rtattr) + 4;

    printf("Copying prefix 0x%08x\n", prefix);

    bcopy(&prefix,(char *)rta+rta->rta_len,4);

    rtl += rta->rta_len;

    rta = (struct rtattr *)(((char *)rta) + rta->rta_len);

    rta->rta_type = RTA_OIF;
    rta->rta_len = sizeof(struct rtattr) + 4;

    printf("Copying OIF: %d\n", oif);
    bcopy(&oif, (char *)rta+sizeof(struct rtattr), 4);

    rtl += rta->rta_len;

    /* Setup the NETLINK Header */

    rreq.nl.nlmsg_len = NLMSG_LENGTH(rtl);
    rreq.nl.nlmsg_flags = NLM_F_REQUEST|NLM_F_CREATE;
    rreq.nl.nlmsg_type = RTM_NEWROUTE;

    /* Setup operation header */

    rreq.rt.rtm_family = AF_INET;
    rreq.rt.rtm_dst_len = len;
    rreq.rt.rtm_table = tbl_index;
    rreq.rt.rtm_protocol = proto;
    rreq.rt.rtm_scope = RT_SCOPE_UNIVERSE;
    rreq.rt.rtm_type = rt_type;

    bzero(&ra, sizeof(ra));
    ra.nl_family = AF_NETLINK;

    bzero(&msg, sizeof(msg));

    msg.msg_name = (void *)&ra;
    msg.msg_namelen = sizeof(ra);

    iov.iov_base = (void *)&rreq.nl;
    iov.iov_len = rreq.nl.nlmsg_len;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    rtn = sendmsg(rtsock,&msg, 0);

    if (rtn < 0)
    {
        printf("%s :", __FUNCTION__);
        perror("sendmsg");
        printf("\n");
        return 0;
    } else {
        return 1;
    }

    return 0;

}

Here is the routing table before route add

[root@iLinux-Nilesh route]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.2.2.0        0.0.0.0         255.255.255.0   U     0      0        0 eth1
172.19.57.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.168.60.0    0.0.0.0         255.255.255.0   U     0      0        0 eth3
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         172.19.57.1     0.0.0.0         UG    0      0        0 eth0

Here is the sample run of the test program that uses this function.

Enter the route to be added: 172.21.1.1
Addr: 0xac150101
Enter prefix len: 32
len: 32
Enter the oif: 2
Copying prefix 0xac150101
Copying OIF: 2
ROUTE ADDED SUCCESSFULLY
[root@iLinux-Nilesh route]#

Routing table after route add program.

[root@iLinux-Nilesh route]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         0.0.0.0         255.255.255.255 UH    0      0        0 eth0  <<<<<
10.2.2.0        0.0.0.0         255.255.255.0   U     0      0        0 eth1
172.19.57.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.168.60.0    0.0.0.0         255.255.255.0   U     0      0        0 eth3
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         172.19.57.1     0.0.0.0         UG    0      0        0 eth0
[root@iLinux-Nilesh route]#

Any idea what is wrong? Function in test program is invoked as below.

rc = rtm_add_v4(addr.s_addr,len,RT_TABLE_MAIN,oif,RTPROT_STATIC,RTN_UNICAST,NULL);

Never mind..found the

Nilesh

Never mind..found the problem.
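The commenter did not say what the problem was. One visible bug in the function above is that the RTA_DST value is copied to (char *)rta + rta->rta_len, which lies past the end of that attribute instead of directly after its 4-byte header, so the destination the kernel reads stays zero; this matches the 0.0.0.0/32 entry in the output. The sketch below appends an attribute at the correct offset using RTA_LENGTH() and RTA_DATA(). add_rtattr() is a hypothetical name, similar in spirit to the addattr_l() helper in iproute2.

```c
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: append one attribute to a NETLINK message whose
   buffer holds maxlen bytes in total. The payload is written at
   RTA_DATA(rta), i.e. right after the attribute header, and nlmsg_len
   is updated to cover the new attribute. Returns 0, or -1 if full. */
int add_rtattr(struct nlmsghdr *nlh, size_t maxlen, unsigned short type,
               const void *data, size_t datalen)
{
    size_t len = RTA_LENGTH(datalen);   /* header + payload */
    struct rtattr *rta;

    if (NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(len) > maxlen)
        return -1;

    /* The new attribute starts at the current (aligned) end of message. */
    rta = (struct rtattr *)((char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));
    rta->rta_type = type;
    rta->rta_len = len;
    memcpy(RTA_DATA(rta), data, datalen);

    nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(len);
    return 0;
}
```

With such a helper, the two attributes in the function above would become add_rtattr(&rreq.nl, sizeof(rreq), RTA_DST, &prefix, 4) and add_rtattr(&rreq.nl, sizeof(rreq), RTA_OIF, &oif, 4), called after nlmsg_len has first been set to NLMSG_LENGTH(sizeof(struct rtmsg)).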

Hello Asanga... I am

Anonymous

Hello Asanga,
I am a newbie to Linux. As far as I have googled, I found only your material as a sample. Based on my understanding of your illustration, I have made the module below to get the destination address and the gateway address when I give "route add -host 192.168.2.45 gw 202.34.2.1".
I get the gateway address as 192.168.2.45 from the module, but I expect the gateway to be 202.34.2.1.

Please help me in this regard.
Thanks in advance.

/* Read message from kernel */
recv(sock_fd,nlh,size, 0);
printf(" Received message payload: %s\n",
       NLMSG_DATA(nlh));

rtp = (struct rtmsg*)NLMSG_DATA(nlh);

rtap =(struct rtattr*) RTM_RTA(rtp);
rtl = RTM_PAYLOAD(nlh);
for(;RTA_OK(rtap,rtl);rtap = RTA_NEXT(rtap,rtl))
{

    switch(rtap->rta_type)
    {
    case RTA_GATEWAY:
        inet_ntop(AF_INET,RTA_DATA(rtap),gws,24);
        printf("\n gateway address is %s",gws);
        break;

    case RTA_DST:
        inet_ntop(AF_INET,RTA_DATA(rtap),dsts,24);
        printf("\n destination address is %s",dsts);
        break;

    case RTA_SRC:
        printf("\n received source address");
        break;

    default:
        break;
    }
}

Getting the IPV6 Address of a Device via rtnetlink

saltorfer

Hi, I currently have the problem that I want to get the IPv6 address of a device via RTNETLINK. Your article was already very helpful, but I still cannot find out which fields of struct ifaddrmsg I have to fill in when I pass it with a request, so that I get the IP that I am looking for. I set ifa_family to AF_INET6 and ifa_index to the device that I am looking at. Nevertheless, when parsing the "answer" buffer, so to speak, I get NLMSG_ERROR and nothing else. There is also the problem that the program never does more than one iteration in while(1), but it also never leaves it; well, that is a different problem, I guess. Still, I would like to know why you put the second break condition in there: is it not always true? You didn't even set nl_groups in your program?
Sorry for my bad English, but it has been a frustrating day full of debugging.

Greetings from Switzerland,
S.

Re. Getting the IPV6 Address of a Device via rtnetlink

Asanga

Hello,

Here is the request init part of a sample RTNETLINK program that shows the IPv6 address info of interfaces.

bzero(&local, sizeof(local));
local.nl_family = AF_NETLINK;
local.nl_pid = getpid();
if(bind(fd, (struct sockaddr*) &local, sizeof(local)) < 0) {
    printf("Error in sock bind\n");
    exit(1);
}

bzero(&peer, sizeof(peer));
peer.nl_family = AF_NETLINK;

bzero(&msg_info, sizeof(msg_info));
msg_info.msg_name = (void *) &peer;
msg_info.msg_namelen = sizeof(peer);

bzero(&netlink_req, sizeof(netlink_req));

netlink_req.nlmsg_info.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifaddrmsg));
netlink_req.nlmsg_info.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
netlink_req.nlmsg_info.nlmsg_type = RTM_GETADDR;
netlink_req.nlmsg_info.nlmsg_pid = getpid();

netlink_req.ifaddrmsg_info.ifa_family = AF_INET6;

iov_info.iov_base = (void *) &netlink_req.nlmsg_info;
iov_info.iov_len = netlink_req.nlmsg_info.nlmsg_len;
msg_info.msg_iov = &iov_info;
msg_info.msg_iovlen = 1;

rtn = sendmsg(fd, &msg_info, 0);

If you require the whole code, look in my home page.

Re. the second break: usually the end of a returned message is indicated by NLMSG_DONE. But when monitoring routing table changes, this does not work. Since the example code in this article is shared by both cases, that second break is also part of the loop.

Kind regards,
Asanga

Getting route updates

Nagendra

Hi..

This document was really helpful. Thanks a lot.

I have a question regarding receiving route updates from the kernel.
I have a process that waits for any routing table changes. It is able to get updates whenever a new route is added or deleted. It gets around 52 bytes of data.

When I add a new route entry I get 52 bytes, but it fails to enter the
"for(;NLMSG_OK(nlp, nll);nlp=NLMSG_NEXT(nlp, nll))" loop of read_reply() as given in this document. Moreover, when I try to print "nlp->nlmsg_type", it is always RTM_NEWROUTE, even though I deleted a route entry in my previous operation.

What I want is:
1) Whenever a new entry is added, the read_reply() function should print the new entry that was added.
2) Whenever an entry is deleted from the route table, it should print the entry that was deleted, and nlp->nlmsg_type should be RTM_DELROUTE, so that I know the netlink message I got is due to a delete operation.

Your help in this regard will be appreciated.

Thanks and regards,

Nagendra KS.

re: Getting route updates

Asanga

Hello,

I added the statement

printf("Type %d\n", nlp->nlmsg_type);

just after the statement

nlp = (struct nlmsghdr *) buf;

in the mon_routing_table.c file, and I see 25 (RTM_DELROUTE) for a route delete and 24 (RTM_NEWROUTE) for a route add.

Kind regards,
Asanga

Flush Cache

Mike CC

Is it possible to flush the route cache via RTNETLINK sockets?

re: Flush Cache

Asanga

Hello,

> Is it possible to flush the route cache via RTNETLINK sockets?

As far as I know, there isn't any RTNETLINK command to flush the routing cache. But after looking at the source code of the "ip" command suite, I found that it writes -1 to
/proc/sys/net/ipv4/route/flush to flush the routing cache.

Kind regards,
Asanga

How to specify a NIC?

CC

Hello,
this article helps me understand the way to implement a protocol.
But some questions are still confusing me.

If an application just wants to send messages through a specified NIC (e.g., the node has more than one NIC, such as LAN, WLAN, etc.), how can the application set this selectively?
Is it possible to set up more than one NIC for sending/receiving at the same time, or should it be done in different threads?

Is there a Windows version of NETLINK & RTNETLINK?

Thanks in advance

Re: How to specify a NIC?

Asanga

Hello,

> If an application just wants to send messages through a specified NIC (e.g., the node has more than one NIC, such as LAN, WLAN, etc.), how can the application set this selectively?

I assume that you are asking about sending IP packets over an interface. If that is the case, you must use INET type sockets to do this.

> Is it possible to set up more than one NIC for sending/receiving at the same time, or should it be done in different threads?

Which interface a packet takes is usually decided by the routing table, depending on the destination address of the packet. But I think INET sockets also have a facility to send packets from a given interface (through sendmsg()).

> Is there a Windows version of NETLINK & RTNETLINK?

As far as I know, no (at least up to XP).

Regards,
Asanga

Via gateway

Miguel

I'm having problems adding a route via a gateway; I tried to add it, but it simply won't work.

This is the code I'm adding after the iface attribute:

rtap= (struct rtattr *) (((char *)rtap) + sizeof(struct rtattr));
rtap->rta_type=RTA_GATEWAY;
rtap->rta_len = sizeof (struct rtattr) + 4;
inet_pton(AF_INET, gw, ((char *)rtap) + sizeof (struct rtattr));
rtl +=rtap->rta_len;

Thanks

Re: Via gateway

Asanga

Hello Miguel,

Here is the code to add the gateway to a route,

rtap = (struct rtattr *) (((char *)rtap)
+ rtap->rta_len);
rtap->rta_type = RTA_GATEWAY;
rtap->rta_len = sizeof(struct rtattr) + 4;
inet_pton(AF_INET, gw,
((char *)rtap) + sizeof(struct rtattr));
rtl += rtap->rta_len;

Your code piece is almost the same except for the rta_len addition. If there is no problem there, also check whether the route entry that you are trying to add programmatically can be added with the ip route add command. A frequent problem when adding gateways to routes is that the gateway must be reachable.

Kind regards,
Asanga
