A Non-Technical Look inside the EXT2 File System
The easiest way to understand the EXT2 file system is to watch it in action.
To do that, we will need two things: a variable named DIR that holds a directory, and a path name to look up. Some path names have many components (e.g., /usr/X11/bin/Xrefresh) and others do not (e.g., /vmlinuz).
Assume that a process wants to open a file. Each process is associated with a current working directory. File names that do not start with “/” are resolved relative to this current working directory, so DIR begins as the current working directory. File names that start with “/” are resolved relative to the root directory (see chroot for the one exception), so DIR begins as the root directory.
Each directory name in the path to be resolved is looked up in DIR in turn. This lookup yields the inode number of the subdirectory we're interested in.
Next, the inode of the subdirectory is accessed. The permissions are checked, and if the process has access permission, this new directory becomes DIR. Each subdirectory in the path is treated in this fashion, until only the last component of the path remains.
When the last component of the path name is reached, the variable DIR contains the directory that holds the file name we've been searching for. Looking in DIR, we find the inode number of the file. Accessing this final inode tells us the location of the data. After one more permission check, the process can access the data.
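The component-by-component walk described above can be sketched in a few lines of Python. This is a toy model, not real EXT2 code: directories are dicts mapping names to inode numbers, INODE_TABLE stands in for the on-disk inode table, and the permission checks are omitted. (Inode 2 really is reserved for the root directory in EXT2.)

```python
ROOT_INODE = 2  # EXT2 reserves inode 2 for the root directory

# Hypothetical inode table: directories map names to inode numbers,
# regular files hold their data directly (a simplification).
INODE_TABLE = {
    2: {"usr": 11},          # /
    11: {"bin": 12},         # /usr
    12: {"ls": 13},          # /usr/bin
    13: "file data for ls",  # /usr/bin/ls (a regular file)
}

def resolve_path(path):
    """Walk the path one component at a time, as DIR does in the text."""
    assert path.startswith("/")
    dir_inode = ROOT_INODE                  # DIR begins at the root
    for name in path.strip("/").split("/"):
        directory = INODE_TABLE[dir_inode]  # access DIR's inode and data...
        dir_inode = directory[name]         # ...and look the name up in it
    return dir_inode                        # inode number of the final component

print(resolve_path("/usr/bin/ls"))  # -> 13
```

Each trip around the loop corresponds to one “look up the name, fetch the inode” step from the text.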
How many disk accesses were needed to access the data you wanted? A reasonable maximum is two per subdirectory (one to look up the name, the other to find the inode) and two more for the file itself. This effort is expended only at file open time. After a file has been opened, subsequent accesses can use the inode's data without looking it up again. Further, caching eliminates many of the accesses needed to look up a file (more on this later).
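The arithmetic above is easy to mechanize. The helper below (a name invented for this sketch) applies the text's rule of thumb, two accesses per subdirectory plus two for the file, to the two example paths from the beginning of the section.

```python
def max_disk_accesses(path):
    """Upper bound from the text: two disk accesses per subdirectory
    (one to look up the name, one to fetch the inode), plus two more
    for the final file itself."""
    components = path.strip("/").split("/")
    subdirectories = len(components) - 1   # the last component is the file
    return 2 * subdirectories + 2

print(max_disk_accesses("/usr/X11/bin/Xrefresh"))  # -> 8
print(max_disk_accesses("/vmlinuz"))               # -> 2
```

So a deeply nested path costs several times as many accesses to open as a file in the root directory, which is exactly why the caching described below matters.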
When a new file or directory is created, the EXT2 file system must decide where to store the data. If the disk is mostly empty, data can be stored almost anywhere. However, performance is maximized if the data are clustered with other related data to minimize seek times.
The EXT2 file system attempts to allocate each new directory in the group containing its parent directory, on the theory that accesses to parent and children directories will be closely related. The EXT2 file system also attempts to place files in the same group as their directory entries, because directory accesses often lead to file accesses. However, if the group is full, the new file or new directory is placed in some other non-full group.
The data blocks needed to store directories and files can be found by looking in the data allocation bitmap. Any needed space in the inode table can be found by looking in the inode allocation bitmap.
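Searching an allocation bitmap amounts to finding the first zero bit. A minimal sketch, assuming a bitmap modeled as a Python list of 0/1 values (real EXT2 bitmaps pack one bit per block or inode in the group):

```python
def find_free(bitmap):
    """Return the index of the first free (0) entry, or None if the
    group is full -- in which case EXT2 tries another, non-full group."""
    for index, bit in enumerate(bitmap):
        if bit == 0:
            return index
    return None

block_bitmap = [1, 1, 0, 1, 0, 0]   # 1 = allocated, 0 = free
print(find_free(block_bitmap))      # -> 2
print(find_free([1, 1, 1]))         # -> None (group full)
```

The same scan works for both the data allocation bitmap and the inode allocation bitmap; only the resource being tracked differs.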
Like most file systems, the EXT2 system relies very heavily on caching. A cache is a part of RAM dedicated to holding file system data. The cache holds directory information, inode information, and actual file contents. Whenever an application (like a text editor or a compiler) tries to look up a file name or requests file data, the EXT2 system first checks the cache. If the answer can be found in the cache, the request can be answered very quickly indeed, without using the disk.
The cache is filled with data from prior requests. If you request data that you have never requested before, the data must first be retrieved from disk. Most of the time, though, people ask for data they have used before. These repeat requests are answered quickly from the cache, saving the disk drive much effort while giving the user quick access.
Of course, each computer has a limited amount of RAM available. Most of that RAM is used for other things like running applications, leaving perhaps 10% to 30% of total RAM available for the cache. When the cache becomes full, the oldest unused data (least recently used data) is thrown out. Only recently used data remains in the cache.
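The least-recently-used policy described above can be sketched with Python's OrderedDict, which remembers insertion order. This is a toy model, not the kernel's cache; the capacity of 3 is purely for illustration.

```python
from collections import OrderedDict

class LRUCache:
    """Toy cache: when full, evict the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                    # cache miss: would go to disk
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # throw out the oldest unused entry

cache = LRUCache(3)
for name in ["a", "b", "c"]:
    cache.put(name, name.upper())
cache.get("a")         # touching "a" makes it recently used
cache.put("d", "D")    # cache full: "b", the least recently used, is evicted
print(cache.get("b"))  # -> None (evicted)
print(cache.get("a"))  # -> A   (still cached)
```

Note that merely reading an entry refreshes it, which is why frequently used directory and inode data tends to stay resident.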
Since larger caches can hold more data, they can also satisfy a larger number of requests. Figure 3 shows a typical curve of total cache size versus the percentage of all requests that can be satisfied from the cache. As you can see, using more RAM for caching increases the number of requests answered from the cache, and therefore increases the apparent speed of the file system.