Linuxjournal.com Down: Incident Report

December 19th, 2001 by Dan Wilder

SSC head geek Dan Wilder explains the mysterious but temporary disappearance of the linuxjournal.com web site.

Linuxjournal.com has experienced a few growing pains. After Slashdot linked to one of its stories, this newly deployed PHPNuke site ran for some hours at reduced capacity and was down for a while. We have now made changes that, we hope, will allow the site to weather similar loads in the future: some hardware has been upgraded, some software optimizations have been installed, and further hardware upgrades are being considered.

At 3:00 yesterday afternoon, as I was waiting in line at the Post Office, a call on my cell phone alerted me to problems on our web site. Returning to the office, I found my colleagues poring over MySQL errors.

A quick look revealed that the server was running its maximum allowed number of Apache and MySQL processes, had too much memory paged out to swap, and had a backlog of up to 50 processes. In a word, it was thrashing.
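As a rough illustration of the swap figure mentioned above, here is a minimal Python sketch that computes how much swap is in use from the text of /proc/meminfo, where the kernel reports SwapTotal and SwapFree in kB. It is a sketch only, not a description of the tools we used that afternoon; the function name swap_used_kb is hypothetical, and it takes the file's text as an argument so it can be tried on any machine.

```python
def swap_used_kb(meminfo_text: str) -> int:
    """Return swap in use, in kB, given the contents of /proc/meminfo.

    /proc/meminfo reports SwapTotal and SwapFree in kB; swap in use
    is simply their difference.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        # Each line looks like "SwapTotal:      524280 kB"
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]
```

On a Linux box, passing it the result of reading /proc/meminfo gives the live figure; a result of 0 means nothing is paged out, while a large and growing number during a traffic spike is one symptom of the condition described here.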

Linux isn't known for thrashing. On the test bench I've tried many times to provoke it by pushing a system to its limits, without much luck. Nevertheless, I had to deal with the evidence at hand, and thrashing is what it looked like.

Hoping at least to stop the MySQL errors, we shut down the Apache and MySQL daemons, then reduced the number of Apache processes allowed to run concurrently. The plan was to keep the server out of its thrashing condition, even at the cost of failing to satisfy some incoming requests. Initially this looked good. By 5:00 that afternoon the system was serving pages at a rapid clip while keeping performance indicators, such as the process backlog, within acceptable levels.
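The reasoning behind lowering the Apache process ceiling can be put as simple arithmetic: if each Apache child needs a certain amount of resident memory, the number of children that fit in RAM without swapping is roughly the memory left after the kernel, MySQL and other daemons, divided by the per-child size. The Python sketch below shows that rule of thumb. Every figure in it is a hypothetical placeholder rather than a measurement from our server, and the function name is made up; the result is the kind of number one would feed to Apache's MaxClients directive.

```python
def max_apache_children(total_ram_mb: int,
                        reserved_mb: int,
                        per_child_mb: int) -> int:
    """Estimate how many Apache children fit in RAM without swapping.

    total_ram_mb: physical memory in the machine
    reserved_mb:  memory set aside for the kernel, MySQL and other daemons
    per_child_mb: typical resident size of one Apache child (with PHP loaded)
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    available = total_ram_mb - reserved_mb
    # Floor division rounds down so the estimate stays under budget;
    # never suggest fewer than one child.
    return max(1, available // per_child_mb)


# Hypothetical figures for illustration only.
before = max_apache_children(total_ram_mb=512, reserved_mb=192, per_child_mb=10)
after = max_apache_children(total_ram_mb=1024, reserved_mb=192, per_child_mb=10)
print(before, after)
```

With these made-up numbers the cap goes from 32 to 83 children: because the reserved portion stays fixed, doubling RAM more than doubles the room left for web server processes.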

The system was still showing significant swap activity, but we judged it stable and adjourned to the bar.

Unfortunately, something happened late in the evening that upset the web server. Even with the new, lower maximums for web and SQL server processes, the system did indeed thrash, serving up more MySQL errors than pages. This condition persisted through the low-demand hours of the night, and I learned of it from an early-morning phone call.

The next obvious thing to try was a RAM upgrade. A phone call to Rackspace.com, our co-location host, set that one in motion.

After we upgraded to the full rated amount of RAM, the server would no longer boot. The symptoms suggested hard drive failure. Some time was spent testing and substituting components, including the hard drive and the RAM, before we found a solution: installing less additional RAM. With that change the BIOS was able to recognize the hard drive, and the server booted successfully again. Rackspace technicians indicated this could point to a motherboard defect, and suggested hardware changes.

Up and running with twice its previous amount of RAM, the server now chugged happily along with 0 KB swapped. Load, measured in pages viewed per minute, was comparable to that observed yesterday.

Back at the ranch, the gang was busy going through PHPNuke. Before the site rollout we'd added code to implement partial page caching, which greatly reduced the number of SQL queries per page. Now we reviewed session management, which produced further improvements. Over the last couple of days we've cut the number of queries needed to deliver a page by a little less than half. On top of the earlier page caching, this leaves us with at least some hope that the site may remain stable under heavy load.
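The partial page caching mentioned here is not described in detail, so what follows is only a generic illustration of the idea, not our PHPNuke code: a rendered page fragment (a sidebar block, say) is stored with an expiry time, and the SQL-backed renderer runs only when the cached copy is missing or stale. The class FragmentCache and the function render_sidebar are hypothetical, and the "renderer" merely counts how often it would have hit the database.

```python
import time
from typing import Callable, Dict, Tuple


class FragmentCache:
    """Cache rendered page fragments for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, html)

    def get(self, key: str, render: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh copy: no database work
        html = render()      # missing or stale: run the SQL-backed renderer
        self._store[key] = (now + self.ttl, html)
        return html


queries = 0


def render_sidebar() -> str:
    """Stand-in for a block that would normally run SQL queries."""
    global queries
    queries += 1
    return "<ul><li>Latest stories</li></ul>"


cache = FragmentCache(ttl_seconds=60)
for _ in range(100):  # 100 page views well within one minute
    cache.get("sidebar", render_sidebar)
print(queries)
```

Here 100 page views cost one round of queries instead of 100. The fixed expiry trades a little freshness for far fewer queries, a reasonable bargain on a site where content is written rarely and read often.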

Some of our changes are LJ-specific. We are investigating the possibility of submitting the rest to the PHPNuke project.

Bottom line: for what we're doing with this site, we might well be using a heavier server. For at least the next few days, we won't be. We've upgraded memory and are now concentrating on further optimizing PHPNuke. MySQL parameters will also come in for close inspection.

I remain unsatisfied with the thrashing hypothesis. The site went into an error condition some time last night and stayed there through several hours of reduced load. That is not very Linux-like. It doesn't add up. We're still looking for a missing piece.

To be continued.

__________________________



Re: Linuxjournal.com Down: Incident Report

On January 4th, 2002 Anonymous says:

The problem is that PHPNuke is almost 99% SQL-based; there isn't a whole hell of a lot of static content, if any. So under heavy load it barfs and jacks the server load way up.

I run a site that gets around 180,000 to 300,000 hits a day, but I wrote my own news scripts, which serve half static content.

Re: Linuxjournal.com Down: Incident Report

On December 26th, 2001 Anonymous says:

I'm curious why you decided to go with PHP-Nuke rather than PostNuke. PHP-Nuke is not known for its stability or elegance of coding, and much of the work of reducing database calls and cleaning up the code has already been done in PostNuke.

Re: Linuxjournal.com Down: Incident Report

On December 26th, 2001 Anonymous says:

If you don't mind my asking, what were the old and new hardware configs, and how much traffic (roughly) did the server handle? I'm building some servers for a new web application similar to PHPNuke in perf needs, and I'm curious.

Specifically, I'm considering:

1x AMD Athlon, ~1.4 GHz

2U case (for extra cooling; I tried the Athlon in a 1U case and had lockup problems)

2x 100 GB 7200 RPM IDE hard drives (cheap and big; RAM is cheap for perf caching)

4 GB RAM

Any thoughts or comments from anyone on this config? Will it work well, or are dual PIIIs worth the extra cash? (I'm not really considering the P4 at all, and not really considering the Athlon MP. Should I be?)

Thanks!

Re: Linuxjournal.com Down: Incident Report

On December 23rd, 2001 Anonymous says:

Well, I'm sure you've already considered this, but since I don't want to be accused of not stating the obvious:

- What kernel (specifically - which VM) are you using?

Re: Linuxjournal.com Down: Incident Report

On December 25th, 2001 detroit_dan (not verified) says:

2.2.mumble.

We don't believe it is a VM problem. At this point we're looking in a couple of other directions, including the possibility that we're just asking too much of one poor little server, and that we need to either shed load or spread load.

We believe we know what locked up the other night, and why. We may share the details later.

--

Dan Wilder

Re: Linuxjournal.com Down: Incident Report

On December 20th, 2001 Anonymous says:

I administer two PHP-Nuke sites: sister2sisteronline.org, and a site on my employer's intr-A-net. The semi-work-related site had similar problems to yours: lots of MySQL errors, and eventually the MySQL daemon committed suicide. (I've since quit doing background image rendering jobs on the same machine.)

I'd be interested in that "partial page caching" you implemented, just out of curiosity. How about a feature article on PHP-Nuke and how it's used at Linuxjournal.com?

BTW, my PHP-Nuke server at work is an AST 486/66 MHz with 32 MB of RAM.

Re: Linuxjournal.com Down: Incident Report

On December 22nd, 2001 detroit_dan (not verified) says:

We're interested in sharing our PHP-Nuke optimizations. The developers have indicated interest in what we're doing, and our work may find its way into the distribution at some point.

The site is currently based on an older release of Nuke, and we need to port our changes forward to the current release (among other things) before they're ready for prime time.

Meantime, an article is a good idea. We'd also like to hear from other PHP-Nuke users who might be interested in sharing work. We're happy to discuss our ideas with anyone interested.

I'll include more information in the follow-up article, to be put up when we've finished resolving the immediate problems.

--

Dan Wilder

Re: Linuxjournal.com Down: Incident Report

On December 20th, 2001 Anonymous says:

PostgreSQL. MySQL has too many limitations (mainly that it does table locking rather than row-level locking like Postgres).

Re: Linuxjournal.com Down: Incident Report

On January 4th, 2002 Anonymous says:

MySQL does support row-level locking and transactions. Get your facts straight before spouting FUD.

Re: Linuxjournal.com Down: Incident Report

On December 20th, 2001 detroit_dan (not verified) says:

Row-level locking is less important in a write-few-read-many application such as ours, a trait shared with lots of other web apps.

That's one reason, I believe, why MySQL maintains its strong presence on the web, in the face of PostgreSQL's finer-grained locking, transaction support, subqueries, triggers, scripting languages, and other features not found at present in MySQL.

I'm aware that MySQL offers transaction support when you use BDB tables, and I believe that the upcoming (??) new version offers subqueries, an improvement I applaud. No doubt other features will be introduced with time.

While we're interested in PostgreSQL (indeed, I use it at home), this application is based on MySQL, and we don't mean to reconsider that decision unless we find reason to believe MySQL is our problem. We have not found this.

--

Dan Wilder

One Server?

On December 20th, 2001 RoadWarriorX (not verified) says:

Have you thought about scaling out instead of scaling up?

Re: One Server?

On December 20th, 2001 detroit_dan (not verified) says:

You mean spreading the load between multiple servers?

It's an option. Unfortunately, for our application it introduces additional single points of failure. We'd like to avoid this if we can.

--

Dan Wilder

Re: Linuxjournal.com Down: Incident Report

On December 20th, 2001 Anonymous says:

Zope?

Re: Linuxjournal.com Down: Incident Report

On December 20th, 2001 detroit_dan (not verified) says:

We had considered Zope, and had done some prototyping. Zope is a great product, and we admire it a lot. I would not hesitate to recommend it. It is written in Python, which some may prefer to PHP, the language used for PHP-Nuke. Zope's mission extends beyond that of PHP-Nuke.

Zope wasn't precisely the hammer we wanted for this particular nail. Though we're experiencing some growing pains with PHP-Nuke, it looks like it'll do what we need done.

While I'm putting in a word or two for other packages, let me say that PostNuke is another package well worth looking at, if you happen to be evaluating content management systems. Your own mileage will certainly vary.

--

Dan Wilder
