Why Application Servers Crash and How to Avoid It
To fix this problem, two possible solutions seem interesting at first glance: add more RAM in the application server machine or increase the speed of the login check in the security server. Adding RAM is only a false solution as, even if it fixes the problem temporarily, it will be used up entirely again later because the site has a growing popularity. The security server is not a solution (in this example) because it is in a black box that comes with the explicit condition that it not be messed with. Neither of those two solutions is a good one.
A technically effective solution would be to rewrite the application in a lower-level language, one that is much less memory-hungry. The memory footprint of each concurrent request or session could be reduced from 1MB to something like 100KB (using C or C++). But, this would take too much development time and thus is not a practical avenue.
There is a way to keep the same object-oriented technology if the application server product offers enough flexibility. The developers can add a pool of request processor resources (RPRs) in the application server application. An RPR is an abstract resource, a proxy resource like a ticket that allows the use of a memory-hungry data object or series of objects. An RPR does nothing by itself. What is useful is that the application server manages a pool of RPRs and has a FIFO queue through which all incoming requests must go before acquiring an RPR and the huge memory resource for which it stands. If all RPRs in the pool are in use, the request should wait in the queue, which follows a FIFO discipline (first in, first out). With this technique, the memory usage can be predicted and controlled.
This technique does not accelerate the security server nor the application server. What it does is prevent the application server from pushing the whole machine into an out-of-memory scenario. Basically, this approach simplifies the problem of response time: we are now back at a simpler situation (with a normal distribution of service time), and eventually, the users will get responses and the machine will not die (assuming that the HTTP server has plenty of RAM for many concurrent requests and that HTTP servers are much less memory-hungry, it should be okay).
The following list summarizes the suggestions explained previously:
Get a good database server and fine-tune the queries and the database at the physical level (your DBA person might be more important than the database product itself).
Make sure that the application server product you select provides fast and stable throughput.
Remember that if something is slower downstream, the server at the front end will accumulate a very large number of requests in memory (fast downstream processing is an asset).
If your application server uses a memory-hungry, object-oriented language, add a pool of Request Processor Resources in your application to control the use of resources right away when the requests enter the server.
Keep it simple. If you have a cascade of application servers before a request is finally processed by the database server, your application will never be fast (no matter how fast the machines are: a hop is a hop and an I/O remains an I/O).
Measure everything. Time stamp each phase of the processing of a request and write that to a log. You will be able to compute interesting stats on normal days, and on bad days you will be able to find where things went wrong and correct the situation after a much shorter analysis.
If a web page or a graphic element can be stored statically, without being impractical, do it.
Do not make a request to the security server for each static web page or element (find another way to guarantee the required level of security in the HTTP server environment).
Evaluate security server products yourself, and reject all products that are too slow for the performances you expect to need in the next five years.
Plan for success now: expect extremely large loads.
Consider that a simple and fast architecture can be less expensive and more robust than a fancy distributed computing solution relying on a leading-edge software/hardware solution. Test the products first, believe the hype later.
Data and Computer Communication, 6th Edition by William Stallings. Prentice Hall, 1999.
"A Practical Guide to Queuing Analysis" by William Stallings. BYTE, February 1991, pp. 309-316.
John T. Francois has been developing transactional systems since 1981 and gateways and servers using various protocols since 1985. He has been working with C and UNIX since 1987.
Free DevOps eBooks, Videos, and more!
Regardless of where you are in your DevOps process, Linux Journal can help!
We offer here the DEFINITIVE DevOps for Dummies, a mobile Application Development Primer, and advice & help from the expert sources like:
- Linux Journal
- High-Availability Storage with HA-LVM
- DNSMasq, the Pint-Sized Super Dæmon!
- March 2015 Issue of Linux Journal: System Administration
- Localhost DNS Cache
- Real-Time Rogue Wireless Access Point Detection with the Raspberry Pi
- Days Between Dates: the Counting
- The Usability of GNOME
- PostgreSQL, the NoSQL Database
- Linux for Astronomers
- You're the Boss with UBOS