Porting Linux to the DEC Alpha
With all of the infrastructure in place, I was now able to turn my attention to the task of porting the kernel itself. My experience with previous Unix ports has been that the greatest number of system dependencies are found in the virtual memory subsystem, the process scheduler, the system call interface, the device drivers, and the trap handlers. In this particular case, I wasn't worried about the device drivers because I was prepared to write a few trivial drivers to interface to the console devices anyway.
My own approach to software development and porting is to consider the data structures central to the operation of the program. Accordingly, I used the kernel include files as my starting point for understanding the code's structure and system dependencies. I combed through the include files, noting where I thought there would be system dependencies and where algorithms might need to be modified for the new environment. I frequently went back and forth between the include files, the C code, and my porting notes. Eventually a (relatively) coherent approach to the port emerged, and I began implementing it.
One change that I made everywhere—and later regretted—involved the cli() and sti() routines. On Intel, cli() and sti() disable and enable interrupts, respectively. The Digital Unix PALcode on Alpha, however, implements a seven-level prioritized interrupt scheme. At the time that I started the port, I was not certain whether it would be necessary to preserve the interrupt hierarchy.
I laboriously replaced every instance of cli() with a call to the ipl() routine, which set the current IPL (interrupt priority level) to the maximum and preserved the previous IPL. I replaced each call to sti() with a call to ipl() that restored the previously saved level. I did this because I could not be certain what the IPL would be when a particular piece of code ran, and it would have been a mistake to implement sti() as ipl(0) if the code had in fact been entered at a nonzero IPL. As it turned out, this was largely unnecessary.
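The save-and-restore pattern described above can be sketched in C. This is a hypothetical model, not the port's code: the IPL register is simulated with a plain variable, and the value of IPLMAX is an assumption chosen only for illustration. The point is that the "sti()" half restores the caller's level rather than forcing it to zero.

```c
/* Hypothetical sketch: a software model of the IPL and of an ipl()
 * routine that sets a new level and returns the previous one. On real
 * hardware this would go through PALcode; here it is a variable so the
 * pattern can be compiled and exercised. */
#define IPLMAX 7            /* hypothetical value for the highest IPL */

static int current_ipl = 0; /* stand-in for the processor's IPL */

/* Set the IPL to new_ipl and return the level that was in effect. */
static int ipl(int new_ipl)
{
    int old = current_ipl;
    current_ipl = new_ipl;
    return old;
}

/* Replacement for a cli()/sti() pair: raise to the maximum, then
 * restore whatever level the caller was running at, which may be
 * nonzero if we were entered from an interrupt handler. */
static int critical_section(int *counter)
{
    int saved = ipl(IPLMAX);   /* was: cli() */
    (*counter)++;              /* work that must not be interrupted */
    ipl(saved);                /* was: sti(), but not simply ipl(0) */
    return current_ipl;        /* level after the section */
}
```

If the section is entered at IPL 3, it exits at IPL 3; an ipl(0) in place of the restore would have silently lowered the level.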
Linux implements two-stage interrupt handling, in which interrupt service routines are divided into a “top half” and a “bottom half”. The top half is what runs at nonzero IPL when the interrupt is received. Generally, the top half does the minimum amount of work needed to acknowledge the interrupt and queues subsequent actions to be run by the bottom half. This means that the interrupt handlers themselves are largely self-contained, and the bulk of the kernel code runs at IPL 0 unless the IPL is explicitly raised. On Alpha, I could just as easily have implemented cli() as ipl(IPLMAX) and sti() as ipl(0) without ill effects. This is exactly what we did for the device driver work.
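The top-half/bottom-half split can be illustrated with a small bitmask-based sketch. All names here (bh_pending, top_half, run_bottom_halves, the BH_ slot numbers) are hypothetical and are not the kernel's own identifiers; the sketch only shows the shape of the technique: the top half records that work is pending, and the bottom half does it later at IPL 0.

```c
/* Hypothetical sketch of two-stage interrupt handling. */
#define BH_DISK   0u        /* hypothetical bottom-half slot numbers */
#define BH_SERIAL 1u

static unsigned bh_pending;     /* one bit per queued bottom half */
static int completed_requests;  /* work done at IPL 0 */

/* Top half: runs at nonzero IPL, does the minimum, and defers the rest. */
static void top_half(unsigned slot)
{
    /* ...acknowledge the device here... */
    bh_pending |= 1u << slot;   /* queue the slow work */
}

/* Bottom half: called later, with the IPL back at 0. */
static void run_bottom_halves(void)
{
    while (bh_pending) {
        unsigned slot = 0;
        while (!(bh_pending & (1u << slot)))   /* find lowest pending slot */
            slot++;
        bh_pending &= ~(1u << slot);           /* mark it handled */
        completed_requests++;                  /* ...the deferred processing... */
    }
}
```

In this sketch, two interrupts for the same slot that arrive before the bottom half runs are coalesced into a single pass of deferred work.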
The virtual memory subsystem was one of those places where I had to implement Alpha-specific versions of Intel-specific routines. In many ways, the Alpha memory-mapping scheme is similar to the Intel scheme: Intel uses a two-level page table to map a 32-bit virtual address space, while Alpha uses a three-level page table to map a 64-bit virtual address space. However, if one is mapping only 32 bits of virtual address, Alpha requires only a single first-level page table entry and a single second-level page table. Therefore, on a 32-bit system the Alpha scheme essentially collapses into a two-level scheme. The upshot of all this is that similar algorithms could be used to manipulate both Intel and Alpha page tables.
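The collapse to two levels follows from how Alpha splits a virtual address. With 8KB pages and 8-byte page table entries, each page-table page holds 1024 entries, so each level consumes 10 address bits above a 13-bit page offset. The sketch below (function and struct names are illustrative) decomposes an address into its three indices.

```c
#include <stdint.h>

/* Alpha: 8KB pages, 8-byte PTEs, so 1024 entries (10 index bits) per level. */
#define PAGE_SHIFT 13
#define LEVEL_BITS 10
#define LEVEL_MASK ((1u << LEVEL_BITS) - 1)

struct va_parts {
    unsigned l1, l2, l3;   /* indices into the three page-table levels */
    unsigned offset;       /* byte offset within the 8KB page */
};

static struct va_parts split_va(uint64_t va)
{
    struct va_parts p;
    p.offset = (unsigned)(va & ((1u << PAGE_SHIFT) - 1));
    p.l3 = (unsigned)((va >> PAGE_SHIFT) & LEVEL_MASK);
    p.l2 = (unsigned)((va >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK);
    p.l1 = (unsigned)((va >> (PAGE_SHIFT + 2 * LEVEL_BITS)) & LEVEL_MASK);
    return p;
}
```

The Level 1 index starts at bit 33, which no 32-bit address reaches, so it is always 0; and the Level 2 index of a 32-bit address never exceeds 511, so a single Level 2 table suffices. Each Level 2 entry covers 1024 pages of 8KB, or 8MB.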
The Alpha Level 1 page table is set up once at boot time and is never heard from again; the Level 2 page table corresponds to the Page Directory on Intel; the Level 3 page tables correspond to the actual Intel page tables. In fact, to save memory, I implemented only a single system-wide Level 2 page table. It turns out that with the addressing scheme that I outlined above, I could map the entire address space using only the first 256 Level 2 page table entries, 128 of which can map the entire user address space. Therefore, I maintained a single Level 2 page table, kept the kernel entries continually mapped, and copied in new user entries for each context switch. The contents of the user entries were kept in the pcb_struct (an Alpha-specific structure not present in the Intel version), which was attached to the task_struct.
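The single shared Level 2 table with per-task user entries might look like the sketch below. The field names, the 128-entry user region, and the assumption that user entries occupy the bottom of the table (indices 0 through 127) are hypothetical choices made for illustration; only pcb_struct comes from the text.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: names other than pcb_struct are invented. */
#define L2_ENTRIES      1024   /* one 8KB page of 8-byte entries */
#define USER_L2_ENTRIES 128    /* assumed: user entries are indices 0..127 */

typedef uint64_t pte_t;

struct pcb_struct {
    pte_t user_l2[USER_L2_ENTRIES];  /* this task's user L2 entries */
};

/* The single system-wide Level 2 table: user entries at the bottom,
 * kernel entries above them, never touched on a context switch. */
static pte_t level2[L2_ENTRIES];

static void switch_user_l2(struct pcb_struct *prev, struct pcb_struct *next)
{
    memcpy(prev->user_l2, level2, sizeof prev->user_l2);  /* save outgoing */
    memcpy(level2, next->user_l2, sizeof next->user_l2);  /* load incoming */
}
```

Because only the user slice is copied, the kernel mappings stay in place across every switch.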
Unfortunately, the Intel Linux memory-management code took advantage of some fortuitous features of the Intel paging model. For instance, to obtain the physical memory address of a virtual memory page on Intel, you can simply fetch the corresponding page table entry and mask out the low bits. Page table entries on Alpha are not so accommodating: they are 64 bits wide. Had 64-bit computation been available to me from the beginning, I could have done a mask and a shift. As it was, I had to treat a page table entry as a struct of two integers, extract the page frame number from one member, and shift it to obtain the physical address.
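The contrast can be made concrete in C. On Alpha the page frame number occupies the upper 32 bits of the PTE, and Alpha is little-endian, so a two-member struct sees the flag bits first and the PFN second. The struct and function names below are illustrative, not the port's own, and the 32-bit version assumes the resulting physical address fits in 32 bits.

```c
#include <stdint.h>

#define I386_PAGE_MASK   0xFFFFF000u  /* 4KB pages: low 12 bits hold flags */
#define ALPHA_PAGE_SHIFT 13           /* 8KB pages */

/* Intel: the PTE already holds the page-aligned physical address. */
static uint32_t i386_pte_to_phys(uint32_t pte)
{
    return pte & I386_PAGE_MASK;
}

/* Alpha PTE seen as two 32-bit halves, low half first (little-endian). */
struct alpha_pte {
    uint32_t flags;   /* bits 31..0: valid and protection bits */
    uint32_t pfn;     /* bits 63..32: page frame number */
};

/* 32-bit-only arithmetic: pull out the PFN member and shift it. */
static uint32_t alpha_pte_to_phys(struct alpha_pte pte)
{
    return pte.pfn << ALPHA_PAGE_SHIFT;  /* assumes result fits in 32 bits */
}

/* With 64-bit arithmetic: one shift and one mask. */
static uint64_t alpha_pte64_to_phys(uint64_t pte)
{
    return (pte >> (32 - ALPHA_PAGE_SHIFT))
           & ~(((uint64_t)1 << ALPHA_PAGE_SHIFT) - 1);
}
```

For example, a PTE whose upper half holds PFN 0x123 maps physical address 0x123 × 8192 = 0x246000, whichever version computes it.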
Because I ended up changing virtually every line of memory.c to accommodate the slightly different semantics of page-frame traversal and parsing, I produced two versions of every routine in memory.c: one for i386 and one for alpha.
Context switching was another area requiring significant change, and one of the more difficult to debug. Much of the context-switching and system call handling code had to be rewritten, as it was originally implemented in Intel assembly language. The Intel code saves some process state on the stack, but relies on the native task-switching mechanism of the Intel CPU to save and restore the rest of the process state to and from the Task State Segment (TSS). While the Digital Unix PALcode supports the concept of a “process context” structure, this structure contains relatively little of the actual process context. Instead, it contains the vital pointers (kernel stack pointer, user stack pointer, page table base register) that a process needs in order to save and restore its own context.
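A rough model of that pointer-only context block follows. It is a hypothetical sketch: it includes only the three pointers named in the text (a real PALcode-defined block has additional fields), and swap_context() is an invented stand-in for the PALcode operation, written out in C to show how little actually moves at switch time.

```c
#include <stdint.h>

/* Hypothetical process-context block: only the pointers named in the text. */
struct hw_context {
    uint64_t ksp;    /* kernel stack pointer */
    uint64_t usp;    /* user stack pointer */
    uint64_t ptbr;   /* page table base register */
};

/* Stand-in for the live processor state these pointers come from. */
struct cpu_state {
    uint64_t ksp, usp, ptbr;
};

/* Store the outgoing process's pointers and load the incoming ones.
 * Everything else (the registers) is already on each kernel stack. */
static void swap_context(struct cpu_state *cpu,
                         struct hw_context *prev, const struct hw_context *next)
{
    prev->ksp = cpu->ksp;  prev->usp = cpu->usp;  prev->ptbr = cpu->ptbr;
    cpu->ksp = next->ksp;  cpu->usp = next->usp;  cpu->ptbr = next->ptbr;
}
```

Once the kernel stack pointer is switched, the incoming process's own saved frame is what the trap-return path unwinds.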
Most of the process context for a Linux/Alpha process resides on the process's kernel stack. Six items (PS, PC, GP, A0, A1, and A2) are pushed onto the kernel stack by the PALcode upon entry into kernel mode (i.e. any time a trap or interrupt is taken). The remainder of the processor's register state is either pushed onto the kernel stack by the trap handler, or stored in the process's task_struct.
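The six values pushed on kernel entry can be pictured as a small struct at the top of the kernel stack. The field order below simply follows the list in the text and is an assumption; the actual layout is defined by the PALcode and should be checked against its documentation. Registers a0 through a2 are Alpha integer registers r16 through r18.

```c
#include <stdint.h>

/* Sketch of the six quadwords PALcode pushes on entry to kernel mode.
 * Field order follows the text's list and is an assumption. */
struct palcode_frame {
    uint64_t ps;   /* processor status */
    uint64_t pc;   /* program counter to resume at */
    uint64_t gp;   /* global pointer */
    uint64_t a0;   /* argument registers a0..a2 (r16..r18) */
    uint64_t a1;
    uint64_t a2;
};

/* Kernel stack consumed before the trap handler pushes anything itself. */
static unsigned long palcode_frame_bytes(void)
{
    return (unsigned long)sizeof(struct palcode_frame);
}
```

Six 64-bit quadwords come to 48 bytes; the trap handler's own saves are stacked beneath this frame.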
In my 32-bit port, I decided to play it safe by always pushing the entire register state onto the stack, including the floating-point registers. This is needlessly wasteful, of course, especially if the process in question has never used the floating-point registers. I had hoped eventually to optimize the register save/restore path, but our development group switched to version 1.2 before I got around to it.
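The cost of the play-it-safe policy is easy to estimate. Assuming each of Alpha's 32 integer and 32 floating-point registers is saved as a full 64-bit quadword, every trap writes 512 bytes to the kernel stack. The task_used_fp flag below models the optimization the text says was hoped for but never written; it is a hypothetical sketch, not the port's code.

```c
#include <stddef.h>
#include <stdint.h>

#define ALPHA_INT_REGS 32
#define ALPHA_FP_REGS  32

/* Bytes written to the kernel stack per trap.  With always_save_fp set
 * (the policy described in the text) both register banks are saved; the
 * hypothetical optimization skips the FP bank for tasks that never used it. */
static size_t save_bytes(int always_save_fp, int task_used_fp)
{
    size_t n = ALPHA_INT_REGS * sizeof(uint64_t);
    if (always_save_fp || task_used_fp)
        n += ALPHA_FP_REGS * sizeof(uint64_t);
    return n;
}
```

Under these assumptions, a task that never touches floating point would halve its per-trap save from 512 to 256 bytes.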
I also had to update the Level 2 page table area on every context switch. There were 128 Level 2 page table entries per process, of which at most two or three were typically used. For ease of implementation, I simply saved and restored all 128 entries on every context switch. Again, this was something I had hoped to be able to optimize but didn't get a chance to implement before cutting over to 1.2.
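Saving and restoring all 128 eight-byte entries moves 2KB per context switch, even though only two or three entries are typically in use. The optimization was never implemented; the sketch below shows one hypothetical way it could work, assuming each task tracks a count (used) of the leading entries it has populated and that this count is kept current whenever a new entry is filled in.

```c
#include <stdint.h>
#include <string.h>

#define USER_L2_ENTRIES 128
typedef uint64_t pte_t;

/* Hypothetical per-task copy of the user Level 2 entries. */
struct user_l2 {
    pte_t entry[USER_L2_ENTRIES];
    unsigned used;            /* entries [0, used) may be nonzero */
};

/* Save only what prev uses, then load enough of next's entries to
 * overwrite every slot prev may have populated. */
static unsigned switch_user_l2(pte_t *live, struct user_l2 *prev,
                               const struct user_l2 *next)
{
    unsigned n = prev->used > next->used ? prev->used : next->used;
    memcpy(prev->entry, live, prev->used * sizeof(pte_t));
    memcpy(live, next->entry, n * sizeof(pte_t));
    return n;                 /* entries loaded, versus 128 every time */
}
```

Copying max(prev->used, next->used) entries on the load side matters: loading only next's count could leave the outgoing task's mappings visible to the incoming one.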
Re-implementing the system call and trap handlers was not too difficult. For the system call handler, I had to figure out the Intel system call semantics for passing arguments in registers, and use the analogous Alpha registers to pass arguments. As for trap handling: while Alpha implements a different set of traps than Intel, it was relatively straightforward to figure out where to vector the various Alpha traps.
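On i386 Linux, the system call number arrives in eax and the arguments in ebx, ecx, edx, esi, and edi. The sketch below shows a table-driven dispatcher for the Alpha side, assuming for illustration that the number arrives in v0 and the arguments in a0 through a5, in line with the Alpha calling convention. The struct, the table, and the toy sys_add() call are all hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical register snapshot for a system-call trap. */
struct syscall_regs {
    int64_t v0;       /* in: call number; out: result */
    int64_t a[6];     /* a0..a5 */
};

typedef int64_t (*syscall_fn)(int64_t, int64_t, int64_t);

/* Toy system call used only to exercise the dispatcher. */
static int64_t sys_add(int64_t x, int64_t y, int64_t unused)
{
    (void)unused;
    return x + y;
}

static const syscall_fn syscall_table[] = { sys_add };
#define NR_SYSCALLS (sizeof syscall_table / sizeof syscall_table[0])

static void dispatch_syscall(struct syscall_regs *r)
{
    if (r->v0 < 0 || (uint64_t)r->v0 >= NR_SYSCALLS) {
        r->v0 = -ENOSYS;           /* unknown call number */
        return;
    }
    r->v0 = syscall_table[r->v0](r->a[0], r->a[1], r->a[2]);
}
```

Keeping the handler table-driven means only the register-to-argument mapping is architecture-specific.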
The only pieces of the file system that required extensive attention were the buffer cache and the exec() code. The buffer cache had to be reviewed to verify that it would work with a different hardware page size (8KB on Alpha as opposed to 4KB on Intel). The exec() code had to be made aware of the executable file format generated by gcc and the GNU binutils (in this case, it was a COFF variant).
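One reason a page-size change needs review: if buffers are carved out of whole pages, the number of buffers per page changes with the page size, and any code that assumed the Intel ratio breaks. The sketch assumes 1KB blocks purely for illustration.

```c
/* Hypothetical illustration of why page size matters to a buffer cache
 * that divides whole pages into fixed-size block buffers. */
#define ALPHA_PAGE_SIZE 8192u
#define I386_PAGE_SIZE  4096u

static unsigned buffers_per_page(unsigned page_size, unsigned block_size)
{
    return block_size ? page_size / block_size : 0;  /* 0 for bad input */
}
```

With 1KB blocks, an Intel page holds four buffers and an Alpha page holds eight.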
After several weeks of reviewing and modifying code, I was ready to try to compile it. Not surprisingly, getting a clean compile was itself an iterative process. I would encounter an error, decide whether it represented an error on my part or an attempt to compile code that I did not yet want to support, and take appropriate action.
After much effort, I finally had an executable file named “linux” full of Alpha code. The next step was to try to boot it.
Not surprisingly, I did not get very far the first time...or the second...or the third. So I put a printk statement early in the boot sequence so that I could show some early success to my management, and added many more printk calls to track the kernel's progress through the initialization sequence. Most of the problems I encountered over the succeeding weeks were due to my own failure to attend to all the ramifications of certain code changes. What was amazing was that the code I hadn't touched frequently worked perfectly the first time. For example, I would spend several days debugging init(), and then, when it came time to mount the root file system, it would just work.
Once I had mounted the root file system and completed all of the kernel initialization, the next step was to run a user-mode executable. Since I did not yet have a C runtime library, or any gcc support for anything but the kernel, I decided to hand-craft a program that, though extremely simple, would nonetheless show some outward sign of functioning. I wrote a variant of the ever-popular “hello, world” program in assembly language. Instead of using printf(), it invoked the write() system call directly through a hand-coded assembly sequence, passing it the address and length of the string. Attempting to run this program gave me ample opportunity to debug the exec() code in the file system and the virtual memory page-fault handler. Eventually, though, Linux/Alpha did indeed say “hello, world” to me.
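For readers who want the same program in C, the following sketch uses no stdio, only a single write() of a fixed buffer with an explicit length. Note the difference from the original: the assembly version issued the system call by hand because no C library existed yet, whereas this version relies on the C library's write() wrapper.

```c
#include <unistd.h>

static const char greeting[] = "hello, world\n";

/* One write() of the string; sizeof counts the trailing NUL, so drop it. */
static long say_hello(int fd)
{
    return (long)write(fd, greeting, sizeof greeting - 1);
}
```

Calling say_hello(1) writes the 13-byte greeting to standard output and returns the count written.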
At this point, I needed more executables, both to test Linux/Alpha and to transform it from a kernel into a useful system. Since I had not designed my 32-bit port to be binary-compatible with anything else (such as Digital Unix), I had to build every executable I was going to use from scratch. To compile anything other than specially hand-crafted programs, I was going to need a C runtime library. By now the project had grown larger than one person could handle. (Actually, it had passed that point long before, but now I could no longer deny it.)
Fortunately, help arrived in the person of Brian Nelson. Brian had been working for our group for some time already, supporting the VEST VAX-to-Alpha binary translator. At this point VEST's support requirements had diminished somewhat and Brian found himself with some time on his hands. Although he knew very little Unix at the time, his enthusiasm for the Linux project more than made up for any lack of specific knowledge. I tutored him in the arcana of gcc, make, and libraries, and set him off porting the GNU libc from the InfoMagic CD-ROM to Alpha. I handled some of the system-dependent portions while Brian handled the rest.
Porting libc turned out to be less than trivial, mainly because we could not get libc's symbol_alias macro to work properly for the life of us. This macro essentially creates a symbol in the object file's symbol table which is an exact synonym for another symbol, and stdio uses it heavily. We finally managed to build a “Frankenstein-style” libc by stitching together pieces from various sources. Most of it was GNU libc 4.1, but stdio came from BSD, and a few miscellaneous routines came from wherever I could scare them up. Nevertheless, we managed (after a fashion) to get clean builds of various GNU utilities using this library.
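To make the idea of a symbol alias concrete, the sketch below uses GCC's alias function attribute, which emits a second symbol naming the same definition. This is a modern, GCC-specific substitute shown only for illustration, not the libc macro the text describes; it requires an object format that supports aliases (ELF does), and the function names are hypothetical.

```c
/* The real definition. */
int impl_increment(int n)
{
    return n + 1;
}

/* A second symbol that is an exact synonym for impl_increment:
 * same code, same address, different name. */
int increment(int n) __attribute__((alias("impl_increment")));
```

Callers linking against increment end up executing impl_increment's code directly, with no wrapper in between.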
We started out porting some Slackware packages, but soon realized that a smaller distribution would get us to a usable system more quickly. I poked around and decided that MCC would be a better choice.
One problem we had compiling virtually any package had to do with configuration. The auto-configuration scripts in several packages did not understand the concept of cross-compilation. Since we were doing our development on a Digital Unix system, attempts to configure packages would either fail or produce a Digital Unix version when I wanted a Linux version. Finally I suggested to Brian that he log into an Intel Linux system, configure the packages there, hand-edit the makefiles to reference the cross-tool suite, and then compile the packages on the Digital Unix system using the Linux/Alpha cross tools. This rather baroque strategy actually worked, and he was finally able to get clean builds of some of the smaller utilities.
One of the first things I needed was a shell. Brian started off porting bash, but ran into trouble. I scoured the net and found a number of freeware shells. Brian and I then started porting like mad until we found one that would compile cleanly with the cross tools. We were finally able to compile the Plan 9 rc shell.
Brian then went off to continue porting other utilities while I tried to boot Linux and run the rc shell.
Often, code that works in the trivial case can fail in subtle ways when presented with a more complicated case—such was the situation with the shell. While the COFF image-loading code that I was using worked for loading a one-page “hello, world” executable, bugs showed themselves when I attempted to use it on a larger file. Once these problems were resolved, I had to debug the various system calls that rc was attempting to use.
When debugging a newly-ported utility that uses a newly-ported library and runs on a newly-ported system, one needs to keep an open mind as to where potential problems might be. While debugging rc, I ran into problems in all areas. In one case I was not propagating system call error status correctly from the kernel to the user; this caused an erroneous success condition to be returned to the program. In another case, I found that the kernel init() function was not correctly opening /dev/tty0, so that even if rc had been running correctly, it could not have read from or written to the console.
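The error-status bug is easier to see with the user-side half of a system call stub in view. This hypothetical sketch assumes the i386 Linux convention, in which the kernel returns a negative errno value on failure; if the kernel's return path loses that value, the check below never fires and the program sees a spurious success.

```c
#include <errno.h>

/* Convert a raw kernel return value into the C convention:
 * -1 with errno set on failure, the value itself on success. */
static long syscall_return(long raw)
{
    if (raw < 0) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

A raw result of -EBADF becomes -1 with errno set to EBADF; a nonnegative result passes through unchanged.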
Late one afternoon, I was working from home, using the ISP Alpha simulator and several nm listings to debug yet another rc problem. I had just fixed a virtual-memory bug and sent a mail message to my colleagues stating that I was well on my way and would probably have a shell prompt by the end of the week. Then I tried One More Fix, rebooted, watched the initialization messages scroll by, and saw the screen freeze. On closer inspection, I saw a prompt at the bottom of the screen! Pressing return had the desired effect. I had very few tools to work with, but I could simulate a crude ls by typing “echo *”. When I did, I was greeted with the names of the few files on the root file system.
Reaching the shell prompt is one of the major milestones of any operating system porting project. I informed my colleagues, and we knocked off and had a beer, proud of what we had accomplished. Next month, we'll cover debugging and further development.
Jim Paradis works as a Principal Software Engineer for Digital Equipment Corporation as a member of the Alpha Migration Tools group. Ever since a mainframe system administrator yelled at him in college, he's wanted to have a multiuser, multitasking operating system on his own desktop system. To this end, he has tried nearly every Unix variant ever produced for PCs, including PCNX, System V, Minix, BSD, and Linux. Needless to say, he likes Linux best. Jim currently lives in Worcester, Massachusetts with his wife, eleven cats, and a house forever under renovation.