Running Linux with Broken Memory
We assume a memory checker like memtest86 finds errors in memory modules. As explained, this checker lists all erroneous addresses it finds. And, because a whole row or column may fail after a static discharge, this could lead to extremely long lists of errors, hundreds of errors not being an exception. Naturally, such a list would be tedious and error prone to enter into the kernel, and it may also give rise to problems caused by strictly limited resources at boot time. It would be ideal to have a very small representation of all the errors found.
Luckily, memory errors are usually laid out in regular patterns, such as starting at address 0<\#215>1234, every 0<\#215>0040 bytes, over 16 occurrences—or a bit more variation, of course.
The regularity is often easiest to see when doing binary. That is because the rows and columns are addressed by spreading address bits over these lines, and by interpreting part of the address to decide whether the used address lines fall into the correct region.
A single error can be described as an address, of which all bits are valuable information. We write this, for example, as 0<\#215>1234,0<\#215>ffff where the first number is the base address and the second number is a mask. The “1” bits in the mask indicate which of the corresponding base address bits are valuable. If, aside from this address, also the address plus 0<\#215>0040 is wrong, then we can simply alter the mask to represent this. The resulting address/mask pair now becomes 0<\#215>1234,0<\#215>ffbf. We can see that this is correct because the addresses covered are all addresses A for which (A & 0<\#215>ffbf)=0<\#215>1234, or concretely, 0<\#215>1234 and 0<\#215>1274. If there are 16 faulty addresses with this intermittent offset, then the whole thing becomes 0<\#215>1234,0<\#215>fc3f to capture faulty addresses: 0<\#215>1234, 0<\#215>1274, ..., 0<\#215>15f4.
This approach works well to capture faults over a row or column, as long as these cover a power of two for bits, which is normal. It also works to capture the specks of dust that disable a few neighboring bits on the chip. But what if a memory module contains more errors? Or, similarly, if multiple memory modules each have errors? To cover these cases, we usually assume a list of the aforementioned address/mask pairs. In practice, it turns out that five of these pairs suffice for most practical situations. Be aware that any set of errors can always be compacted in as little as one such pair, albeit with loss of good addresses, 0<\#215>0000,0<\#215>0000 being the ultimate example that captures all errors, but unfortunately, no good addresses. With five pairs, we usually have no problems that extreme.
memtest86, starting with version 2.3, is able to generate BadRAM patterns that can directly be entered on the command line of a BadRAM-patched Linux kernel. In the above example, memtest86 would report a sequence of evolving patterns, eventually leading to:
Having written this down, you may now reboot the system and enter this on the command line:
LILO: linux badram=0<\#215>1234,0<\#215>fb3fand your system should boot fine, ignoring the broken memory parts. Now, go ahead and add this line to your /etc/lilo.conf:
append="badram=0<\#215>1234,0<\#215>fb3f"and run LILO. You will not have to enter the address/mask pairs anymore on consecutive boots.
My old ZX Spectrum had a nifty way to expand memory with an additional 32K. The Sinclair memory expansion kit was comprised of 64K chips, of which either the high or the low half was known to work while the other half was faulty, which made the price lower than the “proper” expansion method with 32k chips. By selecting some toggle on the motherboard, the computer was instructed which half of the expansion memory chips it should use. Basically, the BadRAM patch does the same thing, with a bit more refinement.
Memory management units, which are a necessity for all Linux distributions to work well, redirect a page as accessed by a “user space” program to any physical page. What appears like a long stretch of memory allocated just for your user process is, in reality, scattered over the memory modules (and may even be swapped out). When a user-level program allocates memory, the kernel gives it out by allocating single pages of physical memory.
The only part of Linux that works in terms of physical memory addresses is the kernel. It is loaded in the beginning of the memory (obviously, you must not have errors in that region) and may allocate blocks of memory from the same pile of free memory that is also used to serve the user process.
This pile of memory is filled with the physical pages at boot time. Actually all that the BadRAM patch does is leave out those pages that fall in one of the address/mask pairs entered on the command line.
This means that all the work involved in BadRAM is done at boot time. Afterward, the only effect is that the pile of free memory is a tad smaller. Extensive benchmarks have shown that this has no measurable influence on runtime performance.
Fast/Flexible Linux OS Recovery
On Demand Now
In this live one-hour webinar, learn how to enhance your existing backup strategies for complete disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible full-system recovery solution for UNIX and Linux systems.
Join Linux Journal's Shawn Powers and David Huffman, President/CEO, Storix, Inc.
Free to Linux Journal readers.Register Now!
|Working with Command Arguments||May 28, 2016|
|Secure Desktops with Qubes: Installation||May 28, 2016|
|CentOS 6.8 Released||May 27, 2016|
|Secure Desktops with Qubes: Introduction||May 27, 2016|
|Chris Birchall's Re-Engineering Legacy Software (Manning Publications)||May 26, 2016|
|ServersCheck's Thermal Imaging Camera Sensor||May 25, 2016|
- Tips for Optimizing Linux Memory Usage
- Working with Command Arguments
- Secure Desktops with Qubes: Introduction
- Secure Desktops with Qubes: Installation
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- CentOS 6.8 Released
- The Italian Army Switches to LibreOffice
- Linux Mint 18
- Oracle vs. Google: Round 2
- The FBI and the Mozilla Foundation Lock Horns over Known Security Hole
Until recently, IBM’s Power Platform was looked upon as being the system that hosted IBM’s flavor of UNIX and proprietary operating system called IBM i. These servers often are found in medium-size businesses running ERP, CRM and financials for on-premise customers. By enabling the Power platform to run the Linux OS, IBM now has positioned Power to be the platform of choice for those already running Linux that are facing scalability issues, especially customers looking at analytics, big data or cloud computing.
￼Running Linux on IBM’s Power hardware offers some obvious benefits, including improved processing speed and memory bandwidth, inherent security, and simpler deployment and management. But if you look beyond the impressive architecture, you’ll also find an open ecosystem that has given rise to a strong, innovative community, as well as an inventory of system and network management applications that really help leverage the benefits offered by running Linux on Power.Get the Guide