The “Virtual File System” in Linux
The mechanisms to access file system data described above are detached from the physical layout of data and are designed to account for all the Unix semantics as far as file systems are concerned.
Unfortunately, not all file system types support all of the functions just described—in particular, not every type has the concept of “inode”, even though the kernel identifies every file by means of its unsigned long inode number. If the physical data accessed by a file system type has no physical inodes, the code implementing readdir and read_inode must invent an inode number for each file in the storage medium.
A typical technique to choose an inode number is using the offset of the control block for the file within the file system data area, assuming the files are identified by something that can be called a “control block”. The iso9660 type, for example, uses this technique to create an inode number for each file in the device.
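The idea can be sketched in a few lines of user-space C. The structure layout and the names fake_dirent and offset_to_ino are hypothetical and are not taken from the iso9660 sources; the only point is that a byte offset unique to each control block makes a serviceable inode number.

```c
#include <stddef.h>

/* Hypothetical on-disk "control block"; a real file system has its own layout. */
struct fake_dirent {
    char name[32];
    unsigned int size;
};

/*
 * Derive an inode number from the position of a control block:
 * the byte offset of the block from the start of the data area.
 * Two different blocks never share an offset, so the numbers are unique.
 */
unsigned long offset_to_ino(const void *data_area, const struct fake_dirent *blk)
{
    return (unsigned long)((const char *)blk - (const char *)data_area);
}
```

With an array of such blocks as the data area, the third block gets the number 2 * sizeof(struct fake_dirent), and no two blocks ever collide.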
The /proc file system, on the other hand, has no physical device from which to extract its data and, therefore, uses hardwired numbers for files that always exist, like /proc/interrupts, and dynamically allocated inode numbers for other files. The inode numbers are stored in the data structure associated with each dynamic file.
Another typical problem faced when implementing a file system type is dealing with limitations in the actual storage capabilities. For example, the code must decide how to react when the user tries to rename a file to a name longer than the maximum length allowed by the particular file system, or when she tries to modify the access time of a file within a file system that doesn't have the concept of access time.
In these cases, the standard is to return -EPERM, which means “Operation not permitted”. Most VFS functions, like all the system calls and a number of other kernel functions, return zero or a positive number on success and a negative number on error. Error codes returned by kernel functions are always taken from the integer values defined in <asm/errno.h>.
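The return convention is easy to model outside the kernel. What follows is a minimal sketch, not kernel code: fake_rename_check and FAKE_NAME_MAX are invented for the illustration, and the user-space <errno.h> header stands in for <asm/errno.h>.

```c
#include <errno.h>
#include <string.h>

#define FAKE_NAME_MAX 12  /* hypothetical limit of an imaginary file system */

/*
 * Sketch of the convention: 0 on success, a negative errno value on failure.
 * Following the text, a name the file system cannot store is refused
 * with -EPERM ("Operation not permitted").
 */
int fake_rename_check(const char *new_name)
{
    if (strlen(new_name) > FAKE_NAME_MAX)
        return -EPERM;
    return 0;
}
```

A caller only needs to test whether the result is negative to know that something went wrong, and can negate it to get the error code.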
I'd now like to show a little code to play with VFS, but it's quite hard to conceive of a small enough file system type to fit in the article. Writing a new file system type is surely an interesting task, but a complete implementation includes 39 “operation” functions.
Fortunately enough, the /proc file system as defined in the Linux kernel lets modules play with the VFS internals without the need to register a whole new file system type. Each file within /proc can define its own inode operations and file operations and is, therefore, able to exploit all the features of the VFS. The method of creating /proc files is easy enough to be introduced here, although not in too much detail. “Dynamic /proc files” are so named because their inode number is dynamically allocated at file creation (instead of being extracted from an inode table or generated by a block number).
In this section we build a module called burp, for “Beautiful and Understandable Resource for Playing”. Not all of the module will be shown because the innards of each dynamic file are not related to VFS.
The main structure used in building up the file tree of /proc is struct proc_dir_entry. One such structure is associated with each node within /proc, and it is used to keep track of the file tree. The default readdir and lookup inode operations for the file system access a tree of struct proc_dir_entry to return information to the user process.
The burp module, once equipped with the needed structures, will create three files: /proc/root is the block device associated with the current root partition, /proc/insmod is an interface to load/unload modules without the need to become root, and /proc/jiffies reads the current value of the jiffy counter (i.e., the number of clock ticks since system boot). These three files have no real value and are just meant to show how the inode and file operations are used. As you see, burp is really a “Boring Utility Relying on Proc”. To avoid making the utility too boring I won't give the details about module loading and unloading, since they have been described in previous Kernel Korner articles which are now accessible on the Web. The whole burp.c file is available as well from SSC's ftp site.
Creation and destruction of /proc files is performed by calling the following functions:
proc_register_dynamic(struct proc_dir_entry *where, struct proc_dir_entry *self);
proc_unregister(struct proc_dir_entry *where, int inode);
In both functions, where is the directory where the new file belongs, and we'll use &proc_root to use the root directory of the file system. The self structure, on the other hand, is declared inside burp.c for each of the three files. The definition of the structure is reported in Listing 5 for your reference; I'll show the three burp incarnations of the structure in a while, after discussing their role in the game.
The “synchronous” part of burp reduces therefore to three lines within init_module() and three within cleanup_module(). Everything else is dispatched by the VFS interface and is “event-driven” inasmuch as a process accessing a file can be considered an event (yes, this way to see things is unorthodox, and you should never use it with professional people).
The three lines in init_module() look like:
proc_register_dynamic(&proc_root, &burp_proc_root);
and the ones in cleanup_module() look like:
proc_unregister(&proc_root, burp_proc_root.low_ino);
The low_ino field is the inode number for the file being unregistered, and has been dynamically assigned at load time.
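To make the register/unregister pairing concrete, here is a user-space model of the pattern. It is only a sketch under stated assumptions: struct fake_entry is a reduced stand-in for struct proc_dir_entry, the fake_ functions are invented for the illustration, and the starting value of the inode counter is an assumption rather than something the kernel promises.

```c
#include <stddef.h>

/* Reduced, hypothetical stand-in for struct proc_dir_entry. */
struct fake_entry {
    unsigned short low_ino;     /* assigned at registration time */
    const char *name;
    struct fake_entry *next;    /* sibling in the same directory */
    struct fake_entry *subdir;  /* first child, for directories */
};

static unsigned short next_dynamic_ino = 4096; /* assumed first dynamic number */

/* Link self into the directory and give it a fresh inode number. */
int fake_register_dynamic(struct fake_entry *where, struct fake_entry *self)
{
    self->low_ino = next_dynamic_ino++;
    self->next = where->subdir;
    where->subdir = self;
    return 0;
}

/* Unlink the child whose inode number matches; -1 if none is found. */
int fake_unregister(struct fake_entry *where, int ino)
{
    struct fake_entry **p;

    for (p = &where->subdir; *p != NULL; p = &(*p)->next) {
        if ((*p)->low_ino == ino) {
            *p = (*p)->next;
            return 0;
        }
    }
    return -1;
}
```

The model shows why the unregister call is handed low_ino rather than the structure itself: the number assigned at registration time is what identifies the node within its directory.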
But how will these three files respond to user access? Let's look at each of them independently.
/proc/root is meant to be a block device. Its “mode” should, therefore, have the S_IFBLK bit set, its inode operations should be those of block devices and its device number should be the same as the root device currently mounted. Since the device number associated with the inode is not part of the proc_dir_entry structure, the fill_inode field must be used. The device number of the root device will be extracted from the table of mounted file systems.
/proc/insmod is a writable file. It needs its own file_operations to declare its “write” method. Therefore, it declares its own inode_operations structure, which points to its file operations. Whenever its write() implementation is called, the file asks kerneld to load or unload the module whose name has been written. The file is writable by anybody. This is not a big problem, as loading a module doesn't mean accessing its resources, and what is loadable is still controlled by root via /etc/modules.conf.
/proc/jiffies is much easier; the file is read-only. Kernel versions 2.0 and later offer a simplified interface for read-only files: the get_info function pointer, if set, will be asked to fill a page of data each time the file is read. Therefore, /proc/jiffies doesn't need its own file operations nor inode operations; it just uses get_info. The function uses sprintf() to convert the integer jiffies value to a string.
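A callback of this kind is short enough to sketch here. The version below runs in user space: fake_jiffies replaces the kernel's counter, and the parameter list is an assumption modeled on the get_info hook, so check <linux/proc_fs.h> in your kernel tree before relying on it.

```c
#include <stdio.h>
#include <sys/types.h>

/* Stand-in for the kernel's tick counter; the real one is updated by the timer. */
static unsigned long fake_jiffies = 123456;

/*
 * Modeled on a get_info-style callback: write the text into buf and
 * return the number of bytes produced. The extra parameters only suggest
 * the shape of the kernel hook (an assumption) and are unused here.
 */
int fake_jiffies_get_info(char *buf, char **start, off_t offset,
                          int length, int unused)
{
    (void)start;
    (void)offset;
    (void)length;
    (void)unused;
    return sprintf(buf, "%lu\n", fake_jiffies);
}
```

The caller hands in a buffer of one page and receives the length of the text to be passed back to the reading process, so the function itself never deals with file positions or user-space copies.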
The snapshot of a tty session in Listing 6 shows how the files appear and how two of them work. Listing 7, finally, shows the three structures used to declare the file entries in /proc. The structures have not been completely defined, because the C compiler fills with zeroes any partially defined structure without issuing any warning (feature, not bug).
The module has been compiled and run on a PC, an Alpha and a Sparc, all of them running Linux version 2.0.x.
The /proc implementation has other interesting features to offer, the most notable being the sysctl interface. The idea is interesting enough that it will need to be covered in a future Kernel Korner.