A Non-Technical Look inside the EXT2 File System
The easiest way to understand the EXT2 file system is to watch it in action.
To explain the EXT2 file system in action, we will need two things: a variable named DIR that holds directories, and a path name to look up. Some path names have many components (e.g., /usr/X11/bin/Xrefresh) and others do not (e.g., /vmlinuz).
Assume that a process wants to open a file. Each process is associated with a current working directory. All file names that do not start with “/” are resolved relative to this current working directory and DIR begins with the current working directory. File names that start with “/” are resolved relative to the root directory (see chroot for the one exception), and DIR begins with the root directory.
Each directory name in the path to be resolved is looked up in DIR in turn. This lookup yields the inode number of the subdirectory we're interested in.
Next, the inode of the subdirectory is accessed. The permissions are checked, and if you have access permissions, this new directory becomes DIR. Each subdirectory in the path is treated in this fashion, until only the last component of the path remains.
When the last component of the pathname is reached, the variable DIR contains the directory which contains the file name we've been searching for. Looking in DIR we find the inode number of the file. Accessing this final inode tells us the location of the data. After checking permissions, you can access the data.
How many disk accesses were needed to access the data you wanted? A reasonable maximum is two per subdirectory (one to look up the name, the other to find the inode) and two more for the actual file name. This effort is expended only at file open time. After a file has been opened, subsequent accesses can use the inode's data without looking it up again. Further, caching eliminates many of the accesses needed to look up a file (more later).
When a new file or directory is created, the EXT2 file system must decide where to store the data. If the disk is mostly empty, data can be stored almost anywhere. However, performance is maximized if the data are clustered with other related data to minimize seek times.
The EXT2 file system attempts to allocate each new directory in the group containing its parent directory, on the theory that accesses to parent and children directories will be closely related. The EXT2 file system also attempts to place files in the same group as their directory entries, because directory accesses often lead to file accesses. However, if the group is full, the new file or new directory is placed in some other non-full group.
The data blocks needed to store directories and files can be found by looking in the data allocation bitmap. Any needed space in the inode table can be found by looking in the inode allocation bitmap.
Like most file systems, the EXT2 system relies very heavily on caching. A cache is a part of RAM dedicated to holding file system data. The cache holds directory information inode information and actual file contents. Whenever an application (like a text editor or a compiler) tries to look up a file name or requests file data, the EXT2 system first checks the cache. If the answer can be found in the cache, the request can be answered very quickly indeed without using the disk.
The cache is filled with data from prior requests. If you request data that you have never requested before, the data must first be retrieved from disk. Most of the time most people ask for data they have used before. These repeat requests are answered quickly from the cache, saving the disk drive much effort while providing the user quick access.
Of course, each computer has a limited amount of RAM available. Most of that RAM is used for other things like running applications, leaving perhaps 10% to 30% of total RAM available for the cache. When the cache becomes full, the oldest unused data (least recently used data) is thrown out. Only recently used data remains in the cache.
Since larger caches can hold more data, they can also satisfy a larger number of requests. The figure below shows a typical curve of the total cache size versus the percent of all requests that can be satisfied from the cache. As you can see in Figure 3, using more RAM for caching increases the number of requests answered from the cache, and therefore increases the apparent speed of the file system.
Fast/Flexible Linux OS Recovery
On Demand Now
In this live one-hour webinar, learn how to enhance your existing backup strategies for complete disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible full-system recovery solution for UNIX and Linux systems.
Join Linux Journal's Shawn Powers and David Huffman, President/CEO, Storix, Inc.
Free to Linux Journal readers.Register Now!
- The Qt Company's Qt Start-Up
- Devuan Beta Release
- May 2016 Issue of Linux Journal
- EnterpriseDB's EDB Postgres Advanced Server and EDB Postgres Enterprise Manager
- The US Government and Open-Source Software
- Open-Source Project Secretly Funded by CIA
- The Humble Hacker?
- The Death of RoboVM
- BitTorrent Inc.'s Sync
- New Container Image Standard Promises More Portable Apps
In modern computer systems, privacy and security are mandatory. However, connections from the outside over public networks automatically imply risks. One easily available solution to avoid eavesdroppers’ attempts is SSH. But, its wide adoption during the past 21 years has made it a target for attackers, so hardening your system properly is a must.
Additionally, in highly regulated markets, you must comply with specific operational requirements, proving that you conform to standards and even that you have included new mandatory authentication methods, such as two-factor authentication. In this ebook, I discuss SSH and how to configure and manage it to guarantee that your network is safe, your data is secure and that you comply with relevant regulations.Get the Guide