Exploiting 64-Bit Linux
Wide availability of 64-bit systems offers an opportunity to simplify programming. By leveraging large virtual addresses and memory mapped files, we combine programming and file persistence into a single activity. This simplifies programming when persistence and sharing of large complex data structures are required. It also improves the efficiency of data access and sharing, as multiple programs can access data directly (operate in place) from the single real page copy. This eliminates the need to copy the data through multiple layers of buffering. When all programs share data at the same virtual address, there are opportunities for the kernel to manage the memory map to avoid aliases and share MMU resources across applications.
As a proof of concept, I have a small user library I call SPHDE (Shared Persistent Heap/Data Environment). This library manages a portion of the application's address space to provide blocks of persistent and shared storage. The library also provides some locking primitives and some “utility objects” that provide finer-grained storage allocation and indexing.
As a demonstration, I wrote a gigapixel Mandelbrot (GTK2) GUI program. This program computes each point in the Mandelbrot set once and saves the resulting image as a quadtree in SPHDE storage. This quadtree representation can be shared and displayed by multiple instances of the program. Program instances can cooperate to fill in the next level of detail.
System design and programming models are (initially) driven by the need to manage scarce resources. In the early days of programming, both address spaces and physical memory were small. IO Subsystems evolved to manage the buffering of serial media, such as tape and punch cards. Even the introduction of random access disk and virtual memory did not change the basic paradigm much. The programming paradigms we use and teach today have far outlived the original scarcity that caused them. The separation of programming from persistent data is a prime example. It creates inefficiencies in both programming effort and runtime execution.
With the prevalence of 64-bit microprocessors, the original (addressing/memory) scarcity is gone. We can leverage virtual memory to access and share massive data structures directly using “operate in place” and “implicit persistence” patterns. This simplifies programming, because any data structure is implicitly persistent by where it is allocated. This improves operational efficiency, because it eliminates layers of buffer management and the OS has more opportunities to share a single physical copy. There are additional opportunities for improving address translation efficiency by eliminating address aliasing and exploiting hardware features, such as large pages.
There have been attempts to unify programming and persistence before. Persistence frameworks abound, but they simply add layers of buffering and data conversions to the top of the IO Subsystem stack. Operational efficiency is lost to obtain a modest reduction in programming effort.
Alternatively, Single-Level Stores have been a research topic for years. Although the technology works (simplified programming and data efficiency), the requirement for specialized hardware or unique OS environments has limited acceptance.
This picture changes again with widely available 64-bit microprocessors and standardized POSIX memory management APIs. The addition of a small runtime library allows adventurous programmers to use operate in place and implicit persistence across numerous OSes and 64-bit hardware platforms. This hybrid Single-Level Store approach allows for incremental exploration and adoption of a new programming paradigm.
Simple memory-mapped files is not enough. We need a design for how the address space and backing files are managed. The design should balance simplicity and efficiency in the management of the virtual (address space) and physical (real memory and disk space) resources.
Here are some principles I believe are important to include in the design. The runtime should be simple to use and compatible with UNIX programming and POSIX APIs. It should use resources (address space) that are not normally used in existing applications. It should appear to the program like a large shared persistent heap (SPH). Access to data in the SPH should use standard C pointer semantics. Setup to access to data in the SPH should be minimum. The design evolved to the concepts of regions, stores, segment and blocks, all with power of two sizes.
The region is simply the range of virtual addresses that the application wants the runtime to manage on its behalf. The region should be in a range of virtual addresses not normally used by the application. We want to ensure that the region does not interfere with the normal program usage of (local temporary) heap and dynamic libraries. On Linux, this means the area above the TASK_UNMAPPED_BASE and below the main stack. Leaving lots of room below the stack and above TASK_UNMAPPED_BASE still allows for the use of up to half the program's address space for SPH.
Next, we need to choose a segment size. The segment is the unit size for allocating backing files and memory mapping those files into the region. This should be large enough so that we don't do frequent kernel calls and small enough so that we don't waste file space.
A store is a directory that contains one or more files used to back SPH segments. A system can have multiple SPH stores by allocating the backing files in different directories. For now (also to keep things simple), a program can bind to only one region/store at a time.
Finally, a block is the unit of space allocation within an SPH region. It is normally between a page and a segment in size, but that is not a hard limit. The only hard limit is that the block fits with the remaining space at the time. If the block is allocated to a portion of the SPH region that does not currently have a backing file allocated, the SPH runtime will implicitly create the file(s) in the store directory associated with the region.
For overall simplicity, we use power of two sizes for blocks and segments. This allows a power of two buddy system to be used to track all aspects of SPH storage. The buddy system reduces fragmentation and simplifies recombining smaller blocks into larger blocks when they are freed.
We need some low-level utility functions to help manage the buddy lists. First, we need a simple heap manager to sub-allocate blocks for smaller control blocks and lists. The lists need to be ordered and will be searched frequently, so we use a simple binary tree. Algorithms for the heap manager and binary tree can be found in Donald E. Knuth's The Art of Computer Programming (see Resources).
Finally, we need a place to store and maintain this information. The anchor block is the first block in the region. The anchor block starts with a block header, which includes signature words (eye-catchers initialized to special values), type info, pointer to the start of the local free list (heap) and a set of pointers for the various binary trees that manage space in the SPH region. The remainder of the anchor block is initialized as free heap space. This internal heap is used to allocate additional nodes required for the binary trees and any other internal control blocks needed to manage the SPH region.
After an SPH region is initialized, we have mostly an empty region with one segment (backing file in the store directory) and one block (the anchor block) allocated from that segment. The various lists have been initialized to depict this state. For example, the region free and used lists contain entries for the unallocated (no backing files) and allocated (with backing files) portion of the region. An entry for the anchor block is added to the “used” list. The “uncommitted” list contains entries for the current unused portions of the first segment. And, the free list is empty. Now we are ready to allocate more blocks.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- New Products
- Linux Systems Administrator
- Senior Perl Developer
- Technical Support Rep
- UX Designer
- Web & UI Developer (JavaScript & j Query)
- Designing Electronics with Linux
- Dynamic DNS—an Object Lesson in Problem Solving
- Using Salt Stack and Vagrant for Drupal Development
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Reply to comment | Linux Journal
4 hours 48 min ago - Reply to comment | Linux Journal
5 hours 4 min ago - Favorite (and easily brute-forced) pw's
6 hours 55 min ago - Have you tried Boxen? It's a
12 hours 47 min ago - seo services in india
17 hours 18 min ago - For KDE install kio-mtp
17 hours 19 min ago - Evernote is much more...
19 hours 19 min ago - Reply to comment | Linux Journal
1 day 4 hours ago - Dynamic DNS
1 day 4 hours ago - Reply to comment | Linux Journal
1 day 5 hours ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Featured Jobs
| Linux Systems Administrator | Houston and Austin, Texas | Host Gator |
| Senior Perl Developer | Austin, Texas | Host Gator |
| Technical Support Rep | Houston and Austin, Texas | Host Gator |
| UX Designer | Austin, Texas | Host Gator |
| Web & UI Developer (JavaScript & j Query) | Austin, Texas | Host Gator |
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?




Comments
I'd also appreciate some more links to the source code
I don't think the article is understandable at all for people who is not familiar with memory mapping, etc.
Also timings for different calculations are given, but there is no reference values to compare the improvements gained from shifting to the 64-bit platform.
Where is the code?
Neither the resources nor the text provide URLs for the code for the SPHDE library nor the Mandelbrot demo program. When did Linux Journal become a place to report research without allowing readers the opportunity to participate (try out, test, play, and expand on the work described)?
Also, the SASOS URL doesn't work.