Trees in the Reiser4 Filesystem, Part I

The basic structure of the new ReiserFS—graphs vs. trees, keys, nodes and blocks.

Types of Items

Reiser4 includes many different kinds of items designed to hold different types of information:

  • static_stat_data: holds the owner, permissions, last access time, creation time, last modification time, size and the number of links (names) to the file.

  • cmpnd_dir_item: holds directory entries and the keys of the files they link to.

  • extent pointers: explained above.

  • node pointers: explained above.

  • bodies: hold parts of files not large enough to be stored in unfleaves.
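
To make the first two item kinds concrete, here is a minimal Python sketch of the information they carry. It is only a model of the fields listed above, not Reiser4's on-disk layout: the class names, field names, integer types and the use of a plain integer as a key are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StaticStatData:
    """The fields the article lists for static_stat_data (types are assumed)."""
    owner: int
    permissions: int
    last_access_time: int
    creation_time: int
    last_modification_time: int
    size: int
    link_count: int  # number of links (names) to the file


@dataclass
class DirEntry:
    """One directory entry: a name plus the key of the file it names."""
    name: str
    key: int  # hypothetical stand-in; a real key is a structured value


@dataclass
class CompoundDirItem:
    """Models cmpnd_dir_item as a list of entries (no compression)."""
    entries: List[DirEntry] = field(default_factory=list)

    def lookup(self, name: str) -> Optional[int]:
        # Linear scan; returns the key linked to `name`, or None if absent.
        for entry in self.entries:
            if entry.name == name:
                return entry.key
        return None


stat = StaticStatData(
    owner=1000,
    permissions=0o644,
    last_access_time=0,
    creation_time=0,
    last_modification_time=0,
    size=11,
    link_count=1,
)
directory = CompoundDirItem([DirEntry("hello.txt", 42)])
found_key = directory.lookup("hello.txt")
missing_key = directory.lookup("missing")
```

The lookup returns only a key; finding the file's stat data or body would then be a separate search of the tree by that key.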

Units

A unit is the smallest piece of content that must be placed whole into one item; it is never split across multiple items. When traversing an item's contents, it is often convenient to do so in units:

  • For body items the units are bytes.

  • For directory items the units are directory entries. The directory entries contain a name and a key of the file named (in practice the name and key may be compressed).

  • For extent items the units are extents. Extent items contain only extents from the same file.

  • For static_stat_data the whole stat data item is one indivisible unit of fixed size.
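
The rule that a unit is never split across items can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Reiser4's balancing code: the function `pack_units`, the byte capacity per item and the 8-byte key size are all hypothetical.

```python
from typing import List, Sequence, TypeVar

U = TypeVar("U")

KEY_BYTES = 8  # assumed size of a stored key, for illustration only


def pack_units(units: Sequence[U], sizes: Sequence[int], capacity: int) -> List[List[U]]:
    """Group units, in order, into items holding at most `capacity` bytes.

    A unit that does not fit in the current item starts a new item;
    it is never divided between two items.
    """
    items: List[List[U]] = []
    current: List[U] = []
    used = 0
    for unit, size in zip(units, sizes):
        if size > capacity:
            raise ValueError("a single unit cannot exceed the item capacity")
        if current and used + size > capacity:
            items.append(current)
            current, used = [], 0
        current.append(unit)
        used += size
    if current:
        items.append(current)
    return items


# Directory item: each unit is a whole (name, key) entry.
entries = [("a.txt", 101), ("notes", 102), ("readme.md", 103)]
entry_sizes = [len(name.encode()) + KEY_BYTES for name, _ in entries]
dir_items = pack_units(entries, entry_sizes, capacity=30)

# Body item: each unit is a single byte, so a split can fall anywhere.
body = b"hello world"
body_items = pack_units(list(body), [1] * len(body), capacity=4)
```

With these sample values the first two directory entries share an item and the third starts a new one, because adding it would exceed the assumed capacity. The body bytes simply fill each item to capacity, since every byte is its own unit.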

Figure 5. What Node Formats Look Like

Conclusion

I have explained the basic structures of the Reiser4 tree, but the fun stuff is yet to come. I have not yet explained how other researchers structure their trees. Nor did you learn why object contents are stored at the bottom of the tree, why high fanout is important or what the different kinds of balancing are. No hint have I yet given as to why balanced trees are better and dancing trees are best. What I have most especially not done is explain how a subtle and controversial tree structure change, which you can see in the trees depicted in this article, doubled Reiser4 read speed compared to Reiser3. This will (space permitting) be in Part II in next month's issue of Linux Journal.

Resources

Hans Reiser (reiser@namesys.com) entered UC Berkeley in 1979 after completing the eighth grade and majored in “Systematizing”, an individual major based on the study of how theoretical models are developed. His senior thesis discussed how the philosophy of the hard sciences differs from that of computer science, with the development of a naming system as a case study. He is still implementing that naming system, of which Reiser4 is the storage layer. In 1993 he went to Russia and hired a team of programmers to develop ReiserFS. He worked full-time to pay their salaries while spending nights and weekends arguing over algorithms. In 1999 it began to work well enough that his mother stopped suggesting a salaried job at a nice big company.
