Raising the Bar: Improving the Ultimate Linux Box

Ramping up the numbers with better serial-ATA drives and a better graphics driver, plus a new graphics benchmark.

I've noted in my last two articles that the configurations I tested weren't exactly optimal. Because this is supposed to be the Ultimate Linux Box, I decided to see just how close to optimal I could get--and I'm pleased with the results.

The fine folks at Monarch Computer Systems sent me a set of four Western Digital Raptor 10kRPM serial-ATA drives, plus a set of Red Hat 8.0 CDs. The original system had three Seagate drives that spun at 7200 RPM and Red Hat 9--fewer, slower spindles for RAID 5 to work its magic and a version of XFree86 that isn't compatible with ATI's proprietary drivers. These two improvements should bump up the testbed's already nice performance to decidedly snappy.

We'll deal with the drives first. When unwrapping the Raptors, the first thing that caught my attention was the heat-sink-like design of the left side of the case, as you face the business end. I don't know if it's there to be functional or simply to look cool, but it certainly caught my eye. The second thing I noticed, as I considered installing four of these hotrods in what had been a three-drive system, was that they have not only the standard S-ATA power connector but also an auxiliary (legacy) Molex power connector, right where it should be. This inclusion makes things easy. I extracted the drive cage (two screws in the Lian Li case), removed the Seagate drives with their horizontal-mount adapter and laid them aside. There is room enough to mount only three drives horizontally in the lower cage, but five can be mounted vertically. With a little fiddling, I got data and power to all four drives; Monarch thoughtfully included a fourth data cable.

With the system powered up, I dropped into the 3ware BIOS and built a new RAID 5 array. The array build seemed to go awfully fast. I dropped the first Red Hat 8 CD in the drive as the build neared completion, and the computer booted from it automatically. I selected a nearly-everything custom install, then sat back to watch the fun. The install, however, didn't go any faster than usual; I suspect I maxed out the sustained read rate on the parallel IDE controller hosting the CD-ROM drive. Half an hour later, I saw a root prompt. Now for some fun.

Tiobench reveals some surprising numbers. While the Dell SCSI system I mentioned the last time we did drives still owns the 3Ware/Raptor combo in some areas, the marked performance improvement in adding a fourth spindle and cranking things up to 10kRPM enabled the ULB to make the SCSI box look bad in the multithreaded sequential read department. Some comparisons:

Sequential Read (MB/sec)

Threads   Dell Thread Rates   ULB Thread Rates
   1            95.18              81.60
   2            26.39              58.07
   4            23.20              54.37
   8            27.14              55.29

Things were similar in the random read department; I ended up with a 3.70 to 2.17 advantage at 8 threads, although at only 2 threads the advantage wasn't much. SCSI still owns the sequential write department with a steady 30-something MB/sec rate--until you get to 8 threads, where the ULB edges ahead at 21.87 to 18.76. That's a drastic improvement over the 10-13MB/sec rate achieved with the old configuration. Random writes on the ULB still don't come close to SCSI, but they improved from an average 0.46 to 0.63 MB/sec; SCSI hovered around 4.88. Not too many applications are heavily into random writes, however. In all other areas, as you scale up, serial-ATA becomes the faster technology. Now we're getting to something we can call Ultimate. And at a street price of $159 per drive (thanks, Froogle), perhaps you now can get good, fast and cheap in disk drives as well as you can in operating systems.
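If you want to see where the four-Raptor array pulls ahead, the scaling story falls straight out of the sequential-read numbers above. This little pipeline (the figures are the MB/sec rates from the table; awk does the division) prints the ULB-to-Dell ratio at each thread count:

```shell
# Sequential-read rates (threads, Dell MB/sec, ULB MB/sec) from the tiobench
# runs above, piped through awk to compute the ULB:Dell ratio per thread count.
printf '%s\n' \
  '1 95.18 81.60' \
  '2 26.39 58.07' \
  '4 23.20 54.37' \
  '8 27.14 55.29' |
awk '{ printf "%d threads: ULB/Dell = %.2fx\n", $1, $3 / $2 }'
```

At a single thread the SCSI box still wins (0.86x), but from two threads on, the SATA array holds a better-than-2x lead.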

In the graphics area I discovered a new benchmark. Chromium is an OpenGL-enabled scrolling space-shooter game that comes with Red Hat. (It's also available for Debian.) Chromium has a handy frames-per-second display much like Quake's, but Chromium's is a lot less trouble to use. Plus, it's licensed under the Artistic License. Chromium with the free radeon driver scored a painfully slow four frames per second. Let's see if we can improve that, shall we? I popped over to ATI's site, got into the drivers section, located the driver for XFree86 4.2 (which is what comes with Red Hat 8.0) and was greeted with a registration screen. I fed it what I considered appropriate data, and 5MB worth of RPM later, I was ready to rock. The usual rpm -Uvh was greeted with a conflict on the OpenGL library, but the README on the web site said to expect that, so I added --force and tried again. This time I had a successful install. The RPM's postinstall script generated a new fglrx kernel module on the fly--NVIDIA should take notes. I then ran the fglrxconfig utility, which looks a lot like xf86config, chose the appropriate options and restarted X. The config tool does offer options for Xinerama (going dual-head), but as I noted before, Xinerama and DRI are mutually exclusive. The only way I can see to drive multiple screens off the same card in accelerated mode is to run two separate X sessions, complete with mouse and keyboard. Cranking up Chromium again netted a right snappy 51fps--that's more like it! For comparison, my GeForce2 MX netted 20fps; all testing was done at 1024x768 in an X window with the eye candy set to high.
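The driver installation described above boils down to a few commands. This is a dry-run sketch--the RPM filename is hypothetical (use whatever ATI's download page hands you for XFree86 4.2), and each step is printed through a wrapper so you can review the sequence before swapping the echo for the real thing:

```shell
#!/bin/sh
# Dry-run sketch of the fglrx install sequence; the package name is
# illustrative. Replace the echo in run() with the real command when ready.
run() { echo "+ $*"; }

run rpm -Uvh --force fglrx-glc22-4.2.0-*.i386.rpm   # --force: the libGL conflict is expected
run fglrxconfig                                     # interactive, xf86config-style
run grep Driver /etc/X11/XF86Config-4               # confirm the fglrx driver is selected
```

The --force flag is only safe here because ATI's README documents the OpenGL library conflict; don't reach for it as a habit.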

The fglrx package also comes with a little application called fireglcontrol that allows you to configure dual-headedness and the X gamma of your monitor(s) from within X itself. On the other hand, redhat-config-xfree86 has no idea what you've done to your X configuration; it registers an unknown driver and an unknown monitor. That's okay if you're comfortable with fglrxconfig, but it may not go over so well if you're handing a system to someone who knows just enough to be dangerous. Then again, because this is supposed to be a high-end workstation, that possibility may not be much of an issue. It's still something a good admin or support tech should be aware of.

So, you're probably wondering, is he ever going to do the soundproofing article? Well, yes he is. In the same box with the Raptors and the Red Hat CDs was a new fan for the testbed. The back case fan had noise issues, and Monarch was happy to replace it. Unfortunately, due to both space and time constraints, I can't cover it this week. By next week, though, we should have a nice, fast, quiet testbed system. And, I'm told the Real Thing is in-house at Monarch and awaiting final configuration. I'm also told to Expect Great Things from it.

______________________

Comments


GL Benchmarking


Why not use glxgears to benchmark performance? I'm getting 3200-3600 FPS with a Quadro4 700 XGL using NVIDIA's driver running in TwinView at 2560x1024x24bpp.

Chris

Re: Raising the Bar: Improving the Ultimate Linux Box


I've done a fair amount of RAID benchmarking and have found that, where it is well supported (3ware does it well), RAID 10 is far superior to RAID 5. In fact, 3ware RAID 10 with 6 or 8 drives is especially speedy.

My tests with these SATA WD Raptors have been on RAID 0, and while traditionally that's too risky, the performance far surpasses 15K Ultra320 SCSI drives, and you can reduce the risks with backups.

Note the MTBF of these drives, and you sort of stop worrying about the fragility of RAID 0. Since this is an ultimate workstation, faster is definitely better.

Re: Raising the Bar: Improving the Ultimate Linux Box


Note the MTBF of these drives, and you sort of stop worrying about the fragility of RAID 0. Since this is an ultimate workstation, faster is definitely better.

That's one of the most idiotic statements I've ever seen. My guess is you haven't even reached high school yet.

I've had two IBM drives, with astronomical MTBF ratings, fail in the same week. One was backing up to the other, and when the first one failed, I shut everything down and figured out how to get the data onto a new backup drive. Accessing the unfailed IBM drive and reading all its sectors caused the second drive to fail--a problem specific to IBM GXP hard drives. See tech report for the details. I've also had Maxtors fail on me, also drives with fantastic MTBF ratings. The Maxtors came after the IBM drives, and I was ready: I had an Adaptec RAID 5 setup and managed to salvage the data after the first Maxtor failure, and an Adaptec RAID 1 setup during the second failure, where I managed to salvage the data again.

Others should note that there are reported incidents where failed drives were returned to manufacturers, repaired or given replacement parts, then allegedly resold as refurbished drives or used as warranty replacements--and data from the original customer was discovered and recovered from them. Some drives fail in ways where the fix is to replace the electronics on the outside of the drive, which is then used as a warranty replacement. In a RAID 5 setup, there is no such concern about returning a drive for warranty service and exposing your financial, business or personal data to others: you pull the drive after the hot spare rebuilds, then install a new drive as the new hot spare.

Running RAID 0 on any kind of workstation is sheer stupidity unless you use a different computer on the network--one with RAID 1 or RAID 5 plus a hot spare--to back up the drive, along with regular backups onto optical media or tape.

Let me guess...you also prefer to overclock and water-cool your Intel CPU, rather than buy a cheaper and faster AMD CPU, simply so you can brag about your overclocking achievement in your sig.

Hit the nail on the head, didn't I?

Snotty comments


Have you reached high school yet? If you had, perhaps you'd know how to discuss issues in a civilized manner. Disagree with the poster's statements if you must, but drop the sneering, sarcastic tone and try treating people with respect.

If you're unable to manage that, make the Linux community a better place by finding something else to spend your time on.

Re: Raising the Bar: Improving the Ultimate Linux Box


"This time I had a successful install. The RPM's postinstall script generated a new fglrx kernel module on the fly--NVIDIA should take notes."

Um, actually, the installer NVIDIA has used for its last two releases both downloads and compiles the kernel module on the fly. All the end user still needs to do manually is change "nv" to "nvidia" in the XF86Config-4 file (though you'd think they could automate this...).

Several questions


- Why Red Hat? It has i386 binaries. The latest Mandrake/SuSE releases have i586 binaries. And Gentoo would be even better (it will give you Athlon XP binaries for an Athlon XP processor).

- The http://www.ati.com/support/driver.html page is outdated.

1. The latest ATI drivers are at http://www.schneider-digital.de/html/download_ati.html, where XFree86 4.3.0 drivers are available. So the author could easily have used Red Hat 9 with these drivers.

2. The author could also have used the trick mentioned at http://users.actrix.co.nz/michael/radeon9200.html to get the DRI drivers (in Red Hat 9) to provide 3D acceleration with a Radeon 9200. That would be a true open-source solution.

Re: Several questions


Please, shut up when you don't know.

1) On anything that is not a true Pentium, binaries optimized for the Pentium will run slower than binaries optimized for the 386. The reason is the Pentium's weird pipelining and an instruction timetable that seems to differ wildly from all other processors (if A is the smart choice on a Pentium and B on a 386, then on an Athlon, PIII or Pentium 4 it is usually B).

2) Red Hat ships binaries that can be executed on a 386 but are optimized for the PII/PIII family. Mandrake ships binaries that are supposed to use Pentium instructions while being optimized for the PII/PIII; however, gcc is not smart enough for that. It can use 386 instructions when optimizing for the PII/PIII, or PII/PIII instructions when optimizing for the PII/PIII, but it appears to revert silently to 386 instructions if you tell it to use Pentium instructions while optimizing for the PII/PIII. When you compare binaries optimized for the PII/PIII, one with Pentium instructions and the other with 386 instructions, the difference is not 10%, not 1%, but zero, zilch, nada.

3) With ordinary programs (i.e., no parts in processor-specific assembler) and gcc 3.2, compiling with the full PII/PIII instruction set gains very little relative to restricting yourself to the 386 instruction set: 2 or 3% on average.

4) On software that has parts in processor-specific assembler (the kernel, glibc), Red Hat ships one set of special binaries for the PII/PIII and another for the Athlon. These not only have those assembly parts activated, but their C parts are compiled with the full PII/PIII or Athlon instruction sets. In contrast, Mandrake ships only one binary and thus has to restrict itself to the Pentium versions of the assembly parts--and those were written for a processor with completely different pipelining and timing, not to mention that the C parts will not be using the full instruction set.

Re: Several questions


1) On anything that is not a true Pentium, binaries optimized for the Pentium will run slower than binaries optimized for the 386. The reason is the Pentium's weird pipelining and an instruction timetable that seems to differ wildly from all other processors (if A is the smart choice on a Pentium and B on a 386, then on an Athlon, PIII or Pentium 4 it is usually B).

Hey, dumbo. You keep blabbering and showing your ignorance while not addressing the central issue: will an Athlon-optimized binary run faster on an Athlon CPU than a 386-optimized binary? Obviously YES. If there were no speed improvement (as you claim), then whole distros like Gentoo must be run by stupid blockheads. So shut up and clean your brain first.

2) Red Hat ships binaries that can be executed on a 386 but are optimized for the PII/PIII family. Mandrake ships binaries that are supposed to use Pentium instructions while being optimized for the PII/PIII; however, gcc is not smart enough for that. It can use 386 instructions when optimizing for the PII/PIII, or PII/PIII instructions when optimizing for the PII/PIII, but it appears to revert silently to 386 instructions if you tell it to use Pentium instructions while optimizing for the PII/PIII. When you compare binaries optimized for the PII/PIII, one with Pentium instructions and the other with 386 instructions, the difference is not 10%, not 1%, but zero, zilch, nada.

*****, have you ever compiled with options like -mcpu and -march? Are you telling me that -mcpu=i386 -march=i386 is as fast as -mcpu=athlon -march=athlon on an Athlon CPU? If so, try compiling MPlayer and doing some video benchmarking before displaying any more ignorance.

3) With ordinary programs (i.e., no parts in processor-specific assembler) and gcc 3.2, compiling with the full PII/PIII instruction set gains very little relative to restricting yourself to the 386 instruction set: 2 or 3% on average.

Don't quote random numbers without anything to support them. And if gcc fails to do something well, that doesn't mean other compilers, like icc (Intel's compiler), will do as bad a job.

4) On software that has parts in processor-specific assembler (the kernel, glibc), Red Hat ships one set of special binaries for the PII/PIII and another for the Athlon. These not only have those assembly parts activated, but their C parts are compiled with the full PII/PIII or Athlon instruction sets. In contrast, Mandrake ships only one binary and thus has to restrict itself to the Pentium versions of the assembly parts--and those were written for a processor with completely different pipelining and timing, not to mention that the C parts will not be using the full instruction set.

Hello, a couple of Athlon/i686-optimized kernel and glibc packages is nowhere near good enough. While Mandrake may not supply an Athlon/i686-optimized kernel, it does supply a whole distro optimized for i586. Also, Gentoo supplies an entire distro optimized for the Athlon (if that's your CPU architecture), so you can go lecture them on how dumb they are too. And perhaps you can also ask Red Hat why they chose to supply i686/Athlon kernels and glibc in the first place, given that it's supposed to be only "2 or 3%" better.

Re: Several questions


First of all, I should not have been rude to the original poster, but I have heard the claim about "binaries for Pentium" one too many times.

Now, about the points raised:

1) I know there are full distros like Gentoo where you have to spend your valuable time (and the valuable time of your CPU) rebuilding them for your processor. But IMHO they are mostly a scam aimed at people who a) haven't done benchmarks and b) failed the math exam in kindergarten, since you will not recover your investment until your CPU has been crunching non-kernel, non-glibc code for a whole year (i.e., time in the kernel, in glibc, idle, or waiting for I/O doesn't count). At least if you are running a PII/PIII. On an Athlon or Pentium 4 the recovery period will be several times shorter, at least as long as Red Hat and others target the PII/PIII for optimization purposes. I am still sceptical about whether it is worth the trouble, given that in the real world a computer will spend most of its time either idle or in kernel/glibc code.

2) Yes, I have. But I thought you would understand that I wasn't referring to "-mcpu=i386 -march=i386" vs. "-mcpu=i686 -march=i686" but to "-mcpu=i686 -march=i386" versus "-mcpu=i686 -march=i686". The use of special instructions (except for MMX, but that requires still another flag) brings little with gcc 3.2: the speed gap between "-mcpu=i386 -march=i386" and "-mcpu=i686 -march=i386" is five or more times greater than the gap between "-mcpu=i686 -march=i386" and "-mcpu=i686 -march=i686". Also, if you use "-march=i586" either alone or in conjunction with "-mcpu=i586", the compiler will effectively use "-march=i586 -mcpu=i586"; but if, as in Mandrake, you combine it with "-mcpu=X" where X is not i586, then the compiler silently reverts to "-march=i386 -mcpu=X".

For the MPlayer issue, you are just displaying your ignorance: MPlayer has significant parts in processor-specific assembler; in fact, it uses MMX/SSE instructions. That means that if Red Hat shipped MPlayer, it would be the kind of software they ship in separate packages per processor, and those are always compiled with -march set to the targeted processor.

3) It happens that, unlike you, I have benchmarked the effect of compiler flags.

4) Have you ever peered into the kernel code? Then you should know that there is plenty of assembler in it, and that some of this assembler either takes advantage of some processor quirk or is hand-optimized for it using non-386 instructions. That means that if you build a kernel for the Athlon while forcing -march=i386, you will get a kernel that is much faster on an Athlon than a 386 kernel, but it will still not run on a 386 (to begin with, because 486-and-above kernels expect the hardware to honor page write-protection bits even in ring 0) and probably not on anything that is not an Athlon/Duron.
At this point Red Hat (or anyone with common sense) will say: "since this thing will only run on an Athlon, we have no reason to restrict it to 386 instructions; let's get those two or three percent of additional speed we can get with -march=athlon".

And now let's do some math: assuming my application spends 90% of its time in the kernel or glibc (agreed, this is abnormally high) and only 10% in the non-fully-optimized parts, what will be the effect of fully optimizing those 10%? One percent? Less? What will be the effect once we also account for time spent waiting on disk I/O or network transfers? Oh, and what happens when your mission-critical application crashes due to either a bug revealed by -march=athlon or a bug in gcc itself? Remember that exotic compiler flags are far less well tested.

There is a way to get a significant speed bonus: rebuild everything with MMX/SSE/3DNow! turned on. But the traffic on the gcc list suggests gcc 3.2 still has significant problems with them.

There is also a way to make an application look fast because it loads quickly: build a crippled version. If optional libraries are not present at build time, or if you explicitly turn off a feature, autoconf will build a Makefile with those options turned off. The application will then load blindingly fast, since there will be far less loading, dynamic linking and initialization to do. You will impress your friends (or unwary customers), but you will get a GIMP unable to handle most formats or a Samba unable to use SSL.
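One way to probe the -march dispute above empirically: cmov is an i686-only instruction, so grepping gcc's assembly output shows whether the wider instruction set was actually used. This is a sketch assuming a gcc that can still target 32-bit x86 (modern gcc spells the thread's -mcpu as -mtune); isa_probe.c is a hypothetical test file.

```shell
# Probe which ISA gcc actually emits: the conditional in pick() often becomes
# a cmov at -O2 when the i686 instruction set is allowed, and must be a branch
# when restricted to i386 (which has no cmov). Exact codegen varies by version.
cat > isa_probe.c <<'EOF'
int pick(int a, int b, int c) { return c ? a : b; }
EOF
gcc -O2 -m32 -march=i386 -S -o - isa_probe.c | grep -c cmov   # prints 0: i386 has no cmov
gcc -O2 -m32 -march=i686 -S -o - isa_probe.c | grep -c cmov   # often nonzero
```

The same grep trick works on installed binaries via objdump -d, which is how you can check what a distro's packages were really compiled for.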

Re: Several questions


And I have to add that my original reply was about the Mandrake/SuSE "i586 RPMs" and their supposed superiority (a myth of which I am tired), not about Gentoo and its purpose-built software; in fact, I missed the part about Gentoo.

Re: Several questions


It would be bad enough if you were rude but knew what you were talking about; at least that would be forgivable. Rudeness while being ignorant about what you are supposedly talking about is inexcusable.

Re: Several questions


I agree: those who don't know what they are talking about should shut up. So you need to take some of your own advice.

50FPS on a Radeon for a 2D game??!?


50 fps (or 20 fps for a GeForce 2) for a 2D scroller on a Radeon (9800?) is completely PATHETIC. I can't believe Linux OpenGL performance is that lousy.

Re: 50FPS on a Radeon for a 2D game??!?


50fps does suck. My AMD 2400+ with a GeForce2 MX, Red Hat 9 and the NVIDIA drivers gives 50/51fps, so I reckon he's got something set wrong somewhere.

Re: 50FPS on a Radeon for a 2D game??!?


At 1280x960 in Chromium, with graphics settings on highest, I averaged 50 fps (with very little deviation from 50--tenths of a frame at most) on an Athlon XP 2400+ / GeForce4 Ti 4200, which leads me to believe Chromium has an FPS cap similar to "com_maxfps" in Quake 3. Besides, you're applying shooter FPS standards to a different type of game. In most shooters I usually don't tolerate anything lower than 60 or 70 FPS, but I find 50 FPS in Chromium completely acceptable, with no noticeable choppiness. Neverwinter Nights is the same way--it gets lower FPS than shooters do, but it still plays smoothly.

I play all other available, high-end 3D games (Quake 3, RTCW, UT, UT2003, Neverwinter Nights) at 1024x768 with all detail settings maxed out. Quake 3 averages ~181 fps with my settings...

OpenGL performance in Linux is fine. I would attribute poor performance to the game software (as Linux versions usually get less attention than Windows versions).

Re: 50FPS on a Radeon for a 2D game??!?


I am not sure whether there is a cap on the framerate in Chromium but I got the same results with a lowly Geforce 4 MX440 on an XP 2000... So, I really don't think this benchmark is significant...

Re: 50FPS on a Radeon for a 2D game??!?


There is a cap, so this benchmark is really stupid.

Re: 50FPS on a Radeon for a 2D game??!?


Quake 3 averages 181 fps on your monitor that's running at, what, 75Hz? So 2/3 of your frames are getting dropped every second. What's the point???

Re: 50FPS on a Radeon for a 2D game??!?


Smoothness, in a word. If your graphics card *can* drive the game at 181 fps and you only ask for 70 or 80 (on a 75Hz screen), then you may rest assured that your CPU has cycles free to correctly turn the incoming packet data into the image of your opponent's helmet being bisected by your railgun shot, instead of waiting on the graphics card to do its magic. :)
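The frame-dropping arithmetic in this exchange is easy to check: at 181 rendered frames per second on a 75Hz display, the fraction of frames that can never reach the screen is 1 - 75/181 (numbers taken from the comments above).

```shell
# Fraction of rendered frames a 75 Hz display cannot show when the game
# renders 181 fps: 1 - 75/181, i.e. a bit under the "two-thirds" claimed.
awk 'BEGIN { printf "dropped: %.0f%% of frames\n", (1 - 75/181) * 100 }'
```

So "2/3 dropped" overshoots slightly; roughly 59% of the rendered frames never make it to the screen, which doesn't change the point about headroom.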

Re: 50FPS on a Radeon for a 2D game??!?


http://mirror.ati.com/support/faq/linux.html?cboOS=LinuxXFree86&cboProducts=NOT+SURE&eula=&choice=agree&cmdNext=Next#3d

The above page will tell you that ATI's drivers do not have 3D support for the RADEON 9800, so this must have been a lower-performing card.

Re: Raising the Bar: Improving the Ultimate Linux Box


I get 50 FPS with the open-source Radeon DRI drivers. I get 5 FPS when I use software Mesa. This makes me think you did not have the DRI hardware drivers set up correctly. Make sure the radeon kernel module is loaded and DRI is enabled.
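The checks this commenter suggests look something like the following; a sketch assuming a Red Hat-era system with glxinfo installed (part of the XFree86/Mesa demo utilities):

```shell
# Confirm the radeon DRM kernel module is loaded and that X reports hardware
# direct rendering; "direct rendering: No" means you fell back to software Mesa.
lsmod | grep -q '^radeon' && echo "radeon kernel module loaded"
glxinfo 2>/dev/null | grep -i '^direct rendering'
```

If the second line prints "direct rendering: No", check /var/log/XFree86.0.log for DRI errors before blaming the card.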

Re: Raising the Bar: Improving the Ultimate Linux Box


The open-source drivers don't support the 9x00-series Radeons (well, except the 9100, which is really an 8500). I wouldn't be surprised if he has one of these, in which case the closed drivers are the only option.

Re: Raising the Bar: Improving the Ultimate Linux Box


http://dri.sourceforge.net/dri_status.phtml
" Radeons up to R9200 are supported "

Re: Raising the Bar: Improving the Ultimate Linux Box


Looks like your prayers for 250GB 7,200 RPM SATA drives have already been answered; see this article:

WD Announces SATA-Based Caviar SE Drive

Paul Wujek
