The Ultimate Linux Lunchbox

For those of you with carry-on, high-performance computing clusters, please ensure that they are securely stowed underneath the seat in front of you.
The TS7200

The TS7200, offered by Technologic Systems, is a StrongARM-based single-board computer. It is, to use a colloquialism, built like a brick outhouse. All the components are soldered on. There are no heatsinks: you can run this board in a closed box with no ventilation. It has a serial port and an Ethernet port built in, requiring no external dongles or modules for these connections. It runs on 5 VDC and requires only 0.375A, or roughly 2W, to operate. In short, this board meets all our requirements. Figure 6 is a picture of the board. Also shown in Figure 6 is a CompactFlash card plugged into the board, although we do not use one on our lunchbox nodes.

Figure 6. The TS7200, from Technologic Systems, is StrongARM-based, needs no heatsinks and draws only about two watts (courtesy Technologic Systems).

One item we had to postpone is putting LinuxBIOS on this board. The soldered-on Flash part makes LinuxBIOS development difficult, and we were more concerned with getting the cluster working first. The board does have a custom BIOS based on the eCos operating system, which, although not exactly fast, is not nearly as slow as a standard PC BIOS.

Building the Lunchbox

Several factors determine the shape of a minicluster: the box, the size and shape of the board, and the board spacing, or distance between boards. The spacing tends to dominate all other factors and is complicated by the fact that PC/104 was not designed with multiprocessors in mind. All I/O boards in PC/104 stack just fine, as long as there is only one CPU board; we are breaking the rules when we stack CPU boards, and it gets us into trouble every time. On all the miniclusters shown, there was at least one empty board space between the boards. In any case, the design process starts with the box, then the board shape and then the board spacing.

First, the box: it's the same box we used earlier. Also, we're going to use the same Parvus SnapStiks that we have been using for years to stack boards. We bought the professional set, part number PRV-0912-71. The SnapStik works well in the lunchbox format. One warning: just buy 1/4" threaded rod to tie the stack together. Do not use the supplied threaded plastic rod that comes with SnapStik kits. That plastic rod tends to, well, “snap” under load, and watching bits of your minicluster drop off is less than inspiring.

Second, the size and shape of the TS7200 nodes: there's a slight problem here. The boards are not quite PC/104: they're a little too large. One way to tell is that two of the holes in the TS7200 are not at the corners. In Figure 7, the holes are in the right place, but the board extends out past them, leaving the holes too far in from the edge. The board is a bit bigger to accommodate the connectors shown on the right. These connectors caused two problems, which we describe below.

Third, the stack: the tight spacing was going to make the stack more challenging than previous miniclusters. We would have to find a way to make the SnapStiks work with a nonstandard board form factor and the close spacing.

To solve the SnapStik problem, we spent some time seeing how the supports could fit the board. The best we could find was a configuration in which three SnapStiks fit on three of the holes in the board, as shown in Figure 7. Notice the threaded metal rod, available in any hardware store.

Figure 7. Stack Showing Three out of Four SnapStiks Connected

For the fourth hole, we set up a spacer as shown in Figure 8.

Figure 8. The Spacer in the Fourth Hole

The spacer is a simple nylon spacer from our local hardware store. The bolts and nuts allow us to create an exact spacing between the boards. We needed the exact spacing for the next problem we ran into.

The boards cannot be stacked at exactly a one-per-slot spacing. There is an Ethernet connector that needs just a bit more room than that—if the boards are stacked too closely, the Ethernet connector on the lower board shorts out the Ethernet connector pins on the higher board. The spacing could be adjusted easily with the nut-and-bolt assembly shown above, but how could we space the SnapStiks?

If you look at the Geode cluster shown in Figure 8, you can see some white nylon spacers between the green SnapStiks. That is one way to do it. But that spacing would have been too large to allow 16 nodes to fit into the lunchbox. We needed only about 1/32 of an inch in extra spacing.
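To see why a thin ring works where a thicker spacer does not, it helps to remember that the extra spacing is paid once per gap, and a 16-node stack has 15 gaps. The sketch below assumes the nodes sit one per PC/104 slot at the standard 0.6-inch stacking pitch, and it uses a 1/4-inch nylon spacer purely as a hypothetical comparison; neither figure comes from our measurements.

```python
from fractions import Fraction

# Assumption: nodes stack one per slot at the standard 0.6 in PC/104 pitch.
PC104_PITCH_IN = Fraction(6, 10)
NODES = 16


def stack_height(nodes: int, extra_per_gap: Fraction) -> Fraction:
    """Distance from the bottom board to the top board, in inches.

    A stack of `nodes` boards has nodes - 1 gaps; each gap is one PC/104
    pitch plus whatever extra spacer material is inserted.
    """
    gaps = nodes - 1
    return gaps * (PC104_PITCH_IN + extra_per_gap)


baseline = stack_height(NODES, Fraction(0))
wire_rings = stack_height(NODES, Fraction(1, 32))  # ~1/32 in extra, per the text
nylon = stack_height(NODES, Fraction(1, 4))        # hypothetical 1/4 in spacer

for label, height in [("no extra spacing", baseline),
                      ("1/32 in wire rings", wire_rings),
                      ("1/4 in nylon spacers (hypothetical)", nylon)]:
    added = height - baseline
    print(f"{label:38s} {float(height):6.3f} in  (+{float(added):.3f} in)")
```

Under these assumptions, the rings add under half an inch to the whole stack, while the hypothetical nylon spacers would add several inches. Exact fractions are used so the per-gap arithmetic carries no rounding error.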

Josiah England, who built this version of the lunchbox, had a good idea: small wire rings, which he says he learned how to build while making chainmail. The fabrication is shown in Figures 9–11. The wire rings add just enough clearance between the boards, while still allowing us to fit 16 boards in the lunchbox.

Figures 9–11. Medieval solution to a 21st-century hardware problem: wire spacing rings constructed chainmail-style (courtesy Josiah England).

With this fix, we now had a stack that was spaced correctly. The stack shown above was finished off with a Parvus OnPower-90 power supply and a Parvus fan board, which you can see at the top. This supply can provide 18A at 5V, more than enough for our needs, as well as the 12V needed for the switch.
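A quick power budget shows how much margin "more than enough" really is. This sketch uses only the figures quoted in this article (0.375A per TS7200 at 5V, 16 nodes, 18A available at 5V); it deliberately ignores the fan board and the 12V load of the Ethernet switch, so treat it as a lower bound on supply usage.

```python
NODES = 16
NODE_VOLTAGE_V = 5.0
NODE_CURRENT_A = 0.375      # per-board draw quoted for the TS7200
SUPPLY_5V_LIMIT_A = 18.0    # 5 V rating quoted for the OnPower-90


def five_volt_budget(nodes: int) -> tuple[float, float, float]:
    """Return (total current in A, total power in W, headroom in A) on the 5 V rail.

    Only the compute nodes are counted; fan and switch loads are ignored.
    """
    current = nodes * NODE_CURRENT_A
    power = current * NODE_VOLTAGE_V
    return current, power, SUPPLY_5V_LIMIT_A - current


current, power, headroom = five_volt_budget(NODES)
print(f"{NODES} nodes draw {current:.2f} A ({power:.1f} W) at 5 V; "
      f"{headroom:.2f} A of headroom remains")
```

With these numbers, the 16 nodes need 6A (30W) at 5V, leaving 12A on that rail for everything else.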

Our next step was the Ethernet switch. At first, we tried using several cheap eight-port switches in the lid, as shown in Figure 12. By the way, these miniclusters always include a bit of improvisation. The switches shown are bolted to a shelf from our departmental mailbox. The shelf is a nice, gray plastic and was ideal (once we trimmed it with a hacksaw) for our purposes. Notice the nice finger hole, which can be used for routing wires under the lid. We'd like to think we used the Erik Hendriks mailbox shelf, since Erik's bproc work was so important to our minicluster development. Erik is now at Google.

Figure 12. First try at switches: the gray panel is a mailbox shelf.

The cascaded switches worked very poorly. The nodes would not come up on the network reliably. It all looked great, with 48 LEDs, but it simply did not work: DHCP requests were dropped, and the nodes took forever to come up.

The second attempt was to get a Netgear 16-port switch, remove the switch from the case and put it into the lid. This required that we sacrifice another mailbox shelf, but we have plenty. This change worked fine. The nodes come up very quickly now, as packets are not getting lost.

You can see the final configuration in Figure 13. Notice the two switches: one switch controls power to the Ethernet switch and nodes, and the other controls power to the fan. We're not yet sure we need the fan, but we're being careful.

Figure 13. Final design: one of the switches on the gray metal panel, to the left of the Ethernet plugs, controls power to the nodes and the Ethernet switch, and the other one controls the fan.

Regarding Ethernet cables: always label them, and make it easy to tell which cable goes into which network switch port. Plug them into the switch in some order, left to right or right to left. Just make sure you can tell, at a glance, which LED on the switch goes with which board. You'll be glad you did.
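One way to keep the cabling orderly is to generate the label sheet instead of writing it by hand. The sketch below is a hypothetical helper, not something from our build: it numbers boards from the bottom of the stack, invents a `nodeNN` naming scheme for the cable labels, and maps each board to a switch port either left to right or right to left.

```python
NODES = 16


def cabling_plan(nodes: int, left_to_right: bool = True) -> list[tuple[int, str, int]]:
    """Map each board (numbered from the bottom of the stack) to a switch port.

    Returns (stack position, cable label, switch port) triples. The label
    format is a hypothetical naming scheme. Ports run 1..nodes, reversed
    when the switch is wired right to left.
    """
    plan = []
    for position in range(1, nodes + 1):
        port = position if left_to_right else nodes + 1 - position
        label = f"node{position:02d}"  # hypothetical label/hostname
        plan.append((position, label, port))
    return plan


for position, label, port in cabling_plan(NODES):
    print(f"stack slot {position:2d}  cable label {label}  -> switch port {port:2d}")
```

Printing the plan and keeping it with the lunchbox means a dark LED on the switch points straight to a specific board in the stack.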
