The sysctl Interface

A look at the sysctl system call that gives you the ability to fine tune kernel parameters.
A Quick Look at Some sysctl Entries

Although low-level, the tunable parameters of the kernel are very interesting to tweak and can help optimize system performance for the different environments where Linux is used.

The following list is an overview of some relevant /kernel and /vm files in /proc/sys. (This information applies to all kernels from 2.0 through 2.1.35.)

  • kernel/panic - The integer value is the number of seconds the system will wait before automatic reboot in case of system panic. A value of 0 means “disabled”. Automatic reboot is an interesting feature to turn on for unattended systems. The command-line option panic=value can be used to set this parameter at boot time.

  • kernel/file-max - The maximum number of open files in the system. file-nr, on the other hand, is the per-process maximum and can't be modified, because it is constrained by the hardware page size. Similar entries exist for the inodes: a system-wide entry and an immutable per-process one. Servers with many processes and many open files might benefit by increasing the value of these two entries.

  • kernel/securelevel - This is a hook for security features in the system. The securelevel file is currently read-only even for root, so it can only be changed by program code (e.g., modules). Only the EXT2 file system uses securelevel—it refuses to change file flags (like immutable and append-only) if securelevel is greater than 0. This means that a kernel, precompiled with a non-zero securelevel and no support for modules, can be used to protect precious files from corruption in case of network intrusions. But stay tuned for new features of securelevel.

  • vm/freepages - Contains three numbers, all counts of free pages. The first number is the minimum free space in the system. Free pages are needed to fulfill atomic allocation requests, like incoming network packets. The second number is the level at which to start heavy swapping, and the third is the level to start light swapping. A network server with high bandwidth benefits from higher numbers in order to avoid dropping packets due to free memory shortage. By default, one percent of the memory is kept free.

  • vm/bdflush - The numbers in this file can fine-tune the behaviour of the buffer cache. They are documented in fs/buffer.c.

  • vm/kswapd - This file exists in all of the 2.0.x kernels, but has been removed in 2.1.33 as not useful. It can safely be ignored.

  • vm/swapctl - This big file encloses all the parameters used in fine-tuning the swapping algorithms. The fields are listed in include/linux/swapctl.h and are used in mm/swap.c.

The Programming Interface: Plugging New Features

Module writers can easily add their own tunable features to /proc/sys by using the programming interface to extend the control tree. The kernel exports to modules the following two functions:

struct ctl_table_header *
        register_sysctl_table(ctl_table * table,
        int insert_at_head);
void unregister_sysctl_table(
        struct ctl_table_header * table);

The former function is used to register a “table” of entries and returns a token, which is used by the latter function to detach (unregister) your table. The argument insert_at_head tells whether the new table must be inserted before or after the other ones, and you can easily ignore the issue and specify 0, which means “not at head”.

What is the ctl_table type? It is a structure made up of the following fields:

  • int ctl_name - This is a numeric ID, unique within each table.

  • const char *procname - If the entry must be visible through /proc, this is the corresponding name.

  • void *data - The pointer to data. For example, it will point to an integer value for integer items.

  • int maxlen - The size of the data pointed to by the previous field; for example, sizeof(int).

  • mode_t mode - The mode of the file. Directories should have the executable bit turned on (e.g., 0555 octal).

  • ctl_table *child - For directories, the child table. For leaf nodes, NULL.

  • proc_handler *proc_handler - The handler is in charge of performing any read/write spawned by /proc files. If the item has no procname, this field is not used.

  • ctl_handler *strategy - This handler reads/writes data when the system call is used.

  • struct proc_dir_entry *de - Used internally.

  • void *extra1, *extra2 - These fields have been introduced in version 1.3.69 and are used to specify extra information for specific handlers. The kernel has an handler for integer vectors, for example, that uses the extra fields to be notified about the allowable minimum and maximum allowed values for each number in the array.

Well, the previous list may have scared most readers. Therefore, I won't show the prototypes for the handling functions and will instead switch directly to some sample code. Writing code is much easier than understanding it, because you can start by copying lines from existing files. The resulting code will fall under the GPL—of course, I don't see that as a disadvantage.

Let's write a module with two integer parameters, called ontime and offtime. The module will busy-loop for a few timer ticks and sleep for a few more; the parameters control the duration of each state. Yes, this is silly, but it is the simplest hardware-independent example I could imagine.

The parameters will be put in /proc/sys/kernel/busy, a new directory. To this end, we need to register a tree like the one shown in Figure 1. The /kernel directory won't be created by register_sysctl_table, because it already exists. Also, it won't be deleted at unregister time, because it still has active child files; thus, by specifying the whole tree of directories you can add files to every directory within /proc/sys.

Listing 2 is the interesting part of busy.c, which does all the work related to sysctl. The trick here is leaving all the hard work to proc_dointvec and sysctl_intvec. These handlers are exported only by version 2.1.8 and later of the kernel, so you need to copy them into your module (or implement something similar) when compiling for older kernels.

I won't show the code related to busy looping here, because it is completely out of the scope of this article. Once you have downloaded the source from the FTP site1, it can be compiled on your own system. It works with both version 2.0 and 2.1 on the Intel, Alpha and SPARC platforms.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Call sysctl from kernel ?

Anonymous's picture

I miss a description how to make a sysctl-call from the kernel and not from userspace i.E. for the case you want to communicate with another kernel module where exported variables are unknown or you just want to use the existing interface instead of creating another one.

Webcast
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers

Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.

Learn More

Sponsored by AMD

White Paper
Red Hat White Paper: Using an Open Source Framework to Catch the Bad Guy

Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6

Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.

Learn more about catching the bad guy in this free white paper.

Learn More

Sponsored by DLT Solutions