Porting LinuxBIOS to the AMD SC520: A Follow-up Report

Getting the board Flashed led to some interesting detective work for the LANL team.

First Contact

The Flash is burned. The serial port works. Let's plug it in.

We also had to modify the SC520 startup code to mimic the setup of the PAR registers. With this set of changes made, we got our first serial output:


LinuxBIOS-1.1.8.0Fallback Tue Jun 14 13:36:22 MDT 2005 
starting... 

Copying LinuxBIOS to ram. 

Jumping to LinuxBIOS

Well, it's a start. For the record, this is version


LinuxBIOS@LinuxBIOS.org--devel/freebios--devel--2.0--patch-45. 

What's going on now? What does jumping to LinuxBIOS mean?

What this all means is that the ROMCC-based code is working, but the SDRAM is not. Because the SDRAM is not working, the GCC-compiled code doesn't work either. It's time to put in some printing. It's also time to scan the src/cpu/amd/sc520/raminit.c code carefully for errors. As of this version, this code is still pretty ugly, as it came from assembly code. A quick perusal does show a few errors, but prints are the best bet at this point, because it is hard to tell what is really going on at times.

Here is the output from this version:


LinuxBIOS-1.1.8.0Fallback Tue Jun 14 16:29:46 MDT 2005 
starting... 

HI THERE! 

sizemem 

NOP 

And then it resets. For reference, I have committed this version as patch-46. Look at the raminit code to see where this is blowing up.

At this point, we had to do a bit more digging. We noticed in the AMD assembly code that although a lot of byte registers are used to control various things, some of the assembly seems to use word writes. Even for a byte-wide register that has another register right after it, the code uses word writes.

We moved to word writes and things got much better. Once it is all working, for the sake of cleanliness, we're going to try to turn these back into byte writes. Word writes make no sense, unless there's a hardware problem.
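To make the distinction concrete, here is a minimal sketch in C of the two access styles. It is not the LinuxBIOS or AMD code: the register block is simulated with an ordinary union so the example runs anywhere, and the names regs, write_reg8(), write_reg16() and read_reg8() are hypothetical. On real hardware, the union would be replaced by a pointer to the memory-mapped registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REG_BLOCK_SIZE 16

/* Simulated block of byte-wide registers (an assumption for illustration).
 * The union lets the same storage be reached by 8-bit or 16-bit stores. */
static volatile union {
    uint8_t  bytes[REG_BLOCK_SIZE];
    uint16_t words[REG_BLOCK_SIZE / 2];
} regs;

/* Byte write: touches exactly one register. */
void write_reg8(size_t offset, uint8_t value)
{
    assert(offset < REG_BLOCK_SIZE);
    regs.bytes[offset] = value;
}

/* Word write: a single 16-bit store that covers the register at
 * `offset` and the register immediately after it. */
void write_reg16(size_t offset, uint16_t value)
{
    assert(offset % 2 == 0 && offset + 1 < REG_BLOCK_SIZE);
    regs.words[offset / 2] = value;
}

/* Read back one byte-wide register. */
uint8_t read_reg8(size_t offset)
{
    assert(offset < REG_BLOCK_SIZE);
    return regs.bytes[offset];
}
```

Note that a word write always updates the neighboring register too, which is why turning these back into byte writes later is not a purely cosmetic change: each one has to be checked against what the second byte was doing.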

It's Important to Specify the Correct CPU

We consistently had resets on a certain code sequence, almost as though we were compiling for the wrong processor. Well, as it happened, we were. Although we had set this line in the Config.lb for the mainboard:


arch i386 end

we had a mistake in one of the extra compilation rules. We were telling ROMCC that the CPU was a P3:


makerule ./auto.inc depends "$(MAINBOARD)/auto.c 
option_table.h ./romcc" action "./romcc -mcpu=p3 -O 
-I$(TOP)/src -I. $(CPPFLAGS) $(MAINBOARD)/auto.c -o $@" 
end 

What's the problem with doing this? In short, when we specify P3 as the CPU for ROMCC, ROMCC generates MMX instructions to use those extra registers. This usage causes trouble, as there are no MMX registers on a 486.

We modified the line as follows:


makerule ./auto.inc depends "$(MAINBOARD)/auto.c 
option_table.h ./romcc" action "./romcc -mcpu=i386 -O 
-I$(TOP)/src -I. $(CPPFLAGS) $(MAINBOARD)/auto.c -o $@" 
end 

Things suddenly got much, much better.

Finally, Getting into LinuxBIOS

But how do we tell? The code that copies the LinuxBIOS RAM part is assembly. It says jumping to LinuxBIOS, but all we see is POST EE. We're going to give you an overview of how you might debug for a new platform. What we're going to do is bypass most of the LinuxBIOS code that runs outside of the ROMCC code. Mostly what this code does is uncompress the GCC code and copy it to SDRAM. This code can be hard to follow, however, so we're going to skip it completely.

We're going to make the code that gets copied to RAM be uncompressed rather than compressed, which will take more space. So we need to use as much of the Flash as we can. We're going to need to make Flash map in at 0x2000000. In the auto.c code, we're going to copy that Flash to RAM. Finally, in the code, we're going to insert a few loops like this one:


1: jmp 1b

so that if the machine hangs, we know it got to the infinite loop.

Let's take it one part at a time. In our src/cpu/amd/sc520/raminit.c file, we add the following:


*par++ = 0x8a020200; 

/*PAR15: BOOTCS:code:nocache:write:Base 0x2000000, size 
    0x80000:*/ 

You can see that code in there even now. This maps in the Flash at the 32MB location. Next, we set up LinuxBIOS so that the GCC payload is uncompressed. How do we do this? First, we need to explain memory layout. A number of variables control Flash layout in LinuxBIOS, as shown in Figure 2. Notice that each set of variables can be changed for each payload. In our example, at this point, we are using only one payload, so we show the variables for that case.

Figure 2. ROM Sizing Controls
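As an aside on the PAR15 value written in raminit.c earlier, the following sketch unpacks 0x8a020200 into fields. The bit layout used here (target in bits 31-29, a cache-disable attribute in bit 27, a page-size flag in bit 25, and, for 64KB pages, a size field in bits 24-14 and a start page in bits 13-0) is our recollection of the SC520 register documentation and should be treated as an assumption to check against the manual. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of a PAR value, under the ASSUMED layout described above. */
struct par_fields {
    unsigned target;        /* bits 31-29: which chip select or bus    */
    unsigned cache_disable; /* bit 27: 1 means the window is uncached  */
    unsigned page_64k;      /* bit 25: 1 means 64KB page granularity   */
    unsigned size_field;    /* bits 24-14: raw size field, uninterpreted */
    uint32_t base;          /* start address in bytes                  */
};

/* Decode a PAR value that uses 64KB pages (assumed layout). */
struct par_fields decode_par_64k(uint32_t par)
{
    struct par_fields f;

    f.target        = (par >> 29) & 0x7;
    f.cache_disable = (par >> 27) & 0x1;
    f.page_64k      = (par >> 25) & 0x1;
    f.size_field    = (par >> 14) & 0x7ff;
    /* The start field counts 64KB pages, so shift by 16 to get bytes. */
    f.base          = (par & 0x3fff) << 16;
    return f;
}
```

Under that assumed layout, decode_par_64k(0x8a020200) gives a base of 0x2000000 with caching disabled, which agrees with the comment attached to the PAR15 line. The size field is returned raw, because how it maps to a byte count is not something this sketch assumes.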

In src/mainboard/digitallogic/msm586/Options.lb, we set CONFIG_COMPRESS to zero. We set the ROM_SIZE to 128K, and we set ROM_IMAGE_SIZE large enough to hold the uncompressed payload. If you look at various patch levels of LinuxBIOS in the repository, you can trace our progress in debugging; space does not allow us to show it all here. We've left the appropriate code in auto.c between ifdefs so you can see how it looks. A word of warning: care must be taken with volatile. ROMCC is a good compiler. If you're not careful with volatile, it gladly will optimize out copy-assignment loops.
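The volatile warning is easier to see with a small example. The sketch below is not the code from auto.c; it is a hypothetical copy_words_volatile() function that shows the general pattern. Because both pointers are volatile-qualified, every load and every store in the loop is an access the compiler has to perform, so it cannot decide the copy has no visible effect and drop it.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy nwords 32-bit words from src to dst.
 * Both pointers are volatile, so the compiler must perform each
 * read and each write; the loop cannot be optimized away. */
void copy_words_volatile(volatile uint32_t *dst,
                         const volatile uint32_t *src,
                         size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];
}
```

On the board, src would point into the Flash window and dst into SDRAM; here any two buffers will do for experimenting.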

Also, in crt0.s, we did some playing around. Here's a useful assembly sequence for telling you where you are and making sure you get to see it:


_start: movb $0x12, %al ; outb %al, $0x80; jmp _start

We do verify in the assembly that we're getting to hardwaremain(). So in hardwaremain(), we put in a call to post(), followed by a while(1), and we do see the system hang at that point.

The next step is to test a back-to-back post(). In other words, we call post() twice. Why is this important? It verifies that the stack is working too. To this point, all we've done is call functions; we haven't really found out about returns. A call can succeed even when memory is broken, but a return relies on a stack that works. If memory is not correctly set up, a return will fail. We have, in the past, had a sequence of function calls that worked fine until the first function exit, at which point the system failed. Memory really can be this tricky. Back-to-back calls to post(), though, verify that we have a working stack.
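The back-to-back test can be sketched as follows. This is a simulation, not the LinuxBIOS source: the real post() writes its argument to I/O port 0x80, whereas this stand-in only records the value so the example runs as an ordinary program. The variables last_post_code and post_count, the function stack_check() and the code values 0x55 and 0xaa are all hypothetical.

```c
#include <stdint.h>

/* Simulated POST port state (stand-in for I/O port 0x80). */
uint8_t  last_post_code;
unsigned post_count;

/* Stand-in for post(): record the code instead of doing port I/O. */
void post(uint8_t value)
{
    last_post_code = value;
    post_count++;
}

/* Back-to-back test: the second code can appear only if the first
 * call returned, which requires the return address to have survived
 * on the stack. */
void stack_check(void)
{
    post(0x55); /* seen if the call itself worked            */
    post(0xaa); /* seen only if the first call also returned */
}
```

On hardware, seeing only the first code on the POST display points at the return, and therefore at the stack and the memory behind it.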

The key idea here is that the careful placement of so-called "halt-and-catch-fire" instructions, with a little bit of output, can allow you to pinpoint how far you are getting in the code.

To make a long story short, we got caught by our own error in the config file. We forgot to tell LinuxBIOS what kind of console we have. This is fixed easily. In src/mainboard/digitallogic/msm586seg/Options.lb, we add:


uses CONFIG_CONSOLE_SERIAL8250

default CONFIG_CONSOLE_SERIAL8250=1

Now, do we have a console? Let's see.

______________________

Comments

MMCR can be moved...

AndrewD

Quote:
"Probably the single biggest problem is the location of the configuration registers, placed right in the middle of the top 2MB of memory"

You can map the MMCR to any 4k offset in the lower 1G by writing to the CBAR register. 0xFFFEF000 is just the default location, so I would not see that as a flaw.

I ended up using rolo instead of Linuxbios on a SC520 based product I designed a few years ago. U-boot is another very powerful option, but they are both more targeted to embedded products.

MMCR can be moved ... but not removed.

Stefan Reinauer

The problem in this case is that even if you "move" the MMCR, it is still sitting at its old position 0xFFFEF000, in addition to the new one.

Code sample

Craig Ringer

The code snippet is broken - presumably due to mangling by a CMS. Given the extreme difficulty I'm having in getting this godawful commenting system to not mangle these code snippets, that's my working theory. LJ staff, please fix your CMS so it doesn't strip spaces in <code></code> blocks! I had to use &nbsp; in a <code></code> block, which is (a) gross and (b) should probably not be interpreted as an entity, but rather as a literal.

In addition to the total loss of indenting, I found two other problems. First:


bios = mmap(0, size, PROT_READ, MAP_SHARED, fd_mem,
off_t) (0xffffffff - size + 1));

looks like it should be:


bios = mmap(0, size, PROT_READ, MAP_SHARED, fd_mem,
           (off_t)(0xffffffff - size + 1) );

and:


#include #
include

should evidently be:


#include
#include

With those changes it builds OK here. The dump doesn't look too interesting on this system - probably not a flash image, anyway - but it's an AMD64 box so it's quite likely the flash is mapped to somewhere different. Any idea what address it might be at?
