A Non-Technical Look inside the EXT2 File System
Everyone wants a fast computer; however, not everyone realizes that one of the most important factors in computer performance is the speed of the file system. Regardless of how fast your CPU is, if the file system is slow, the whole computer will seem slow. Many people who have very fast Pentium Pros with slow disk drives and even slower networked file systems rediscover this fact daily.
Linux has a very fast file system called the Second Extended File System (EXT2). The EXT2 file system was created by Rémy Card (email@example.com).
There are several objectives when deciding how to lay out data on a disk.
First and foremost, the data structure should be recoverable. If there is an error while writing data to the disk (like a user pulling the power cord), the entire file system should not be lost. Although losing the data currently being written is sometimes acceptable, losing all the data on the disk is not.
Second, the data structure must allow for an efficient implementation of all needed operations. The hardest operation to implement is normally the hard link. When using a hard link, there is more than one directory entry (i.e., file name) that points to the same file data. Accessing the data by any of the valid file names should produce the same data.
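This behavior is easy to observe from user space. The sketch below (file names are illustrative) creates a second directory entry for the same file with `os.link` and checks that both names reach the same inode and data:

```python
import os
import tempfile

# Work in a scratch directory; the file names here are illustrative.
tmp = tempfile.mkdtemp()
original = os.path.join(tmp, "original.txt")
alias = os.path.join(tmp, "alias.txt")

with open(original, "w") as f:
    f.write("same data, two names\n")

# Create a second directory entry pointing at the same file data.
os.link(original, alias)

# Both names resolve to the same inode, so reads through either
# name return the same data, and the link count reflects both entries.
assert os.stat(original).st_ino == os.stat(alias).st_ino
assert os.stat(original).st_nlink == 2
with open(alias) as f:
    assert f.read() == "same data, two names\n"
```

Because both directory entries refer to the same inode, a write through one name is immediately visible through the other.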
Another hard operation involves deleting an open file. If an application has a file open at the same time that a user deletes it, the application should still be able to access the file's data. The data should not be cleared from the disk until the last application closes the file. This behavior is quite unlike DOS/Windows, where deleting a file immediately cuts off access for any application still reading or writing it. Applications that rely on this Unix behavior are more common than one might think, and changing it would break many of them.
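The delete-while-open behavior can also be demonstrated directly (a minimal sketch; the file name is illustrative):

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "doomed.txt")

with open(path, "w") as out:
    out.write("still readable after delete\n")

f = open(path)   # open the file for reading ...
os.unlink(path)  # ... then delete its only directory entry

# The name is gone, but the open file descriptor keeps the data alive.
assert not os.path.exists(path)
data = f.read()
assert data == "still readable after delete\n"
f.close()        # only now may the kernel reclaim the blocks
```

Only when the last open descriptor is closed does the file system actually free the file's blocks.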
Third, a disk layout should minimize seek times by clustering data on the disk. A drive needs more time to read two pieces of data that are widely separated on the disk than the same sized pieces located close to each other. A good disk layout can minimize disk seek time (and maximize performance) by clustering related data close together. For example, parts of the same file should be close together on disk and, also, near the directory containing the file's name.
Finally, the disk layout should conserve disk space. Conserving space was more important in the past, when hard drives were small and expensive. These days it is less critical; even so, a file system should not be wasteful.
Partitions are the first level of disk layout. Each disk must have one or more partitions. The operating system pretends each partition is a separate logical disk, even though they may share the same physical disk. The most common use of partitioning is to place more than one file system on the same physical disk, each in its own partition. Each partition has its own device file in the /dev directory (e.g., /dev/hda1, /dev/hda2, etc.). Every EXT2 file system occupies one partition, and completely fills it.
The EXT2 file system is divided into groups, which are sections of a partition. The division into groups is made when the file system is formatted and cannot be changed without reformatting. Each group holds related data and is the unit of clustering in the EXT2 file system. Each group contains, in order: a superblock, a group descriptor, a block bitmap, an inode bitmap, an inode table and, finally, data blocks.
Some information about a file system belongs to the file system as a whole and not to any particular file or group. This information is stored in the superblock, and includes the total number of blocks within the file system, the time it was last checked for errors and so on.
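To make the superblock concrete, here is a hedged sketch of reading a few of its fields. The offsets follow the ext2 on-disk layout (superblock at byte 1024 of the partition; `s_magic` = 0xEF53 at offset 56 within it); rather than read a real /dev partition, the example packs a synthetic buffer with assumed sample values:

```python
import struct

# A few ext2 superblock fields, by offset within the superblock,
# which itself starts 1024 bytes into the partition:
#   +0   s_inodes_count  (u32)  total inodes
#   +4   s_blocks_count  (u32)  total blocks
#   +56  s_magic         (u16)  0xEF53 identifies ext2
#   +64  s_lastcheck     (u32)  time of last consistency check
EXT2_MAGIC = 0xEF53

def parse_superblock(partition_bytes):
    sb = partition_bytes[1024:2048]
    inodes, blocks = struct.unpack_from("<II", sb, 0)
    (magic,) = struct.unpack_from("<H", sb, 56)
    (lastcheck,) = struct.unpack_from("<I", sb, 64)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2 file system")
    return {"inodes": inodes, "blocks": blocks, "lastcheck": lastcheck}

# Build a synthetic superblock (sample counts are made up for the demo).
fake = bytearray(2048)
struct.pack_into("<II", fake, 1024 + 0, 5136, 20480)  # inode/block counts
struct.pack_into("<H", fake, 1024 + 56, EXT2_MAGIC)
struct.pack_into("<I", fake, 1024 + 64, 0)            # never checked

info = parse_superblock(bytes(fake))
assert info["blocks"] == 20480 and info["inodes"] == 5136
```

All multi-byte superblock fields are little-endian, which is why the `struct` format strings use the `<` prefix.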
The first superblock is the most important one, since it is the first one read when the file system is mounted. The information in the superblock is so important that the file system cannot be mounted without it. If a disk error occurred while updating the superblock, the entire file system would be ruined; therefore, a copy of the superblock is kept in each group. If the first superblock becomes corrupted, the redundant copies can be used to repair it with the e2fsck command.