An Event Mechanism for Linux
In the traditional programming model, software components explicitly synchronize with others. This is the common model when a lot of interaction is required. For example, the typical approach is to use select() or poll() to listen to file descriptors. A generic implementation of select scans the entire array of descriptors. This is not scalable because the time it takes to detect activity on a descriptor is proportional to the size of the array. This increases the application latency and leads to a decrease in the overall system performance.
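The linear cost described above can be illustrated with a minimal sketch. It assumes Unix socket pairs stand in for client connections, and the names `wait_for_ready` and `demo` are hypothetical. Python's `select.select` already returns the ready descriptors, so the explicit loop below only models the `FD_ISSET` scan that a C caller performs over the whole array after `select()` returns.

```python
import select
import socket


def wait_for_ready(socks, timeout=1.0):
    """Block in select(), then scan every watched socket to find the ready ones."""
    readable, _, _ = select.select(socks, [], [], timeout)
    ready_set = set(readable)
    scanned = 0
    ready = []
    for s in socks:  # linear scan, modeling an FD_ISSET loop in C
        scanned += 1
        if s in ready_set:
            ready.append(s)
    return ready, scanned


def demo(n_pairs=50):
    """Watch n_pairs sockets, make only the last one active, count the scan steps."""
    pairs = [socket.socketpair() for _ in range(n_pairs)]
    try:
        readers = [r for r, _ in pairs]
        pairs[-1][1].send(b"x")  # a single descriptor becomes readable
        ready, scanned = wait_for_ready(readers)
        return len(ready), scanned
    finally:
        for r, w in pairs:
            r.close()
            w.close()
```

Calling `demo(50)` finds one active socket but still visits all 50 entries, which is the proportional cost the paragraph refers to.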
Scanning an array of descriptors or waiting for data consumes processing time. A common idea in the design of efficient algorithms is to handle system events asynchronously. Examples of mechanisms that provide event notification to user-space applications are POSIX AIO, epoll and the BSD kqueue (see Resources).
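As a rough illustration of descriptor-based event notification, the sketch below uses Python's standard `selectors` module, whose `DefaultSelector` is backed by epoll on Linux and kqueue on BSD. The handler name `on_readable`, the socket pairs and the choice of index 7 are assumptions made for the example only.

```python
import selectors
import socket


def demo(n_pairs=50):
    """Register many sockets once, then receive only the ones with activity."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD
    pairs = [socket.socketpair() for _ in range(n_pairs)]
    received = []

    def on_readable(sock, index):
        # Hypothetical handler: read the data and remember which socket fired.
        received.append((index, sock.recv(16)))

    try:
        for i, (r, _) in enumerate(pairs):
            # The handler and its context travel with the registration.
            sel.register(r, selectors.EVENT_READ, data=(on_readable, i))
        pairs[7][1].send(b"ping")
        events = sel.select(timeout=1.0)  # returns only the ready descriptors
        for key, _mask in events:
            handler, index = key.data
            handler(key.fileobj, index)
        return len(events), received
    finally:
        sel.close()
        for r, w in pairs:
            r.close()
            w.close()
```

Here the application never walks the full set of descriptors: the selector hands back one event for the one active socket, together with the data stored at registration time.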
When describing the efficiency of such mechanisms, it is common to compute the average time from the moment an event is detected in the kernel to the moment it is actually handled by the application. This end-to-end measure is preferred because micro-benchmarks of a mechanism in isolation are not representative: a mechanism can be quite efficient locally but inefficient when combined with an architecture to which it is poorly adapted, such as a multithreaded one. As an example, many web servers use a pool of threads that is started when the application is launched. A typical architecture uses one dedicated thread to manage incoming connections and one thread per transaction. This design usually is efficient for a small number of incoming connections but becomes inefficient as the load increases.
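The server architecture mentioned above, with one dedicated thread for incoming connections and one thread per transaction, might look like the following sketch. It is a simplification under stated assumptions: threads are created per connection rather than taken from a pre-started pool, the loopback address is used, and `acceptor`, `handle_transaction` and `demo` are hypothetical names.

```python
import socket
import threading


def handle_transaction(conn):
    """Per-transaction thread: read one request and echo it back in upper case."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def acceptor(listener, n_expected, workers):
    """Dedicated thread: accept connections and hand each one to a new thread."""
    for _ in range(n_expected):
        conn, _addr = listener.accept()
        t = threading.Thread(target=handle_transaction, args=(conn,))
        t.start()
        workers.append(t)


def demo(n_clients=5):
    """Run n_clients sequential requests; return the replies and the thread count."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: any free port
    port = listener.getsockname()[1]
    workers = []
    acc = threading.Thread(target=acceptor, args=(listener, n_clients, workers))
    acc.start()
    replies = []
    for i in range(n_clients):
        with socket.create_connection(("127.0.0.1", port)) as c:
            c.sendall(b"req%d" % i)
            replies.append(c.recv(1024))
    acc.join()
    for t in workers:
        t.join()
    listener.close()
    return replies, len(workers)
```

The number of threads grows with the number of transactions, which is why this design is comfortable at low load and costly at high load.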
Multithreaded applications are needed when a high level of concurrency is required between objects competing for the CPU. Well-known examples are found in high-performance computing applications where the speed of execution of every thread is important, but the number of threads run is not high.
Threads provide a sequential and synchronous model of development, and they have become the standard way of implementing applications when a high level of concurrency is needed. But flaws in application design or in the handling of synchronization can easily create contention and degrade overall system performance. J. Ousterhout, in “Why Threads Are a Bad Idea”, argued that programming with threads is quite difficult and often leads to applications that are unable to execute properly under high loads.
In telecom applications, threads do not compete for the CPU in this way. Concurrency occurs instead when handling common objects, such as distributed data structures. For these applications, threads are needed to provide concurrent access to shared data.
Telecom applications handle thousands of transactions per second and hundreds of simultaneous connections on the same processor. In addition, system events, including database accesses, application faults, overload notifications, alarms, state changes of system components and so on, must be taken into account. Thousands of events can be generated in the same system during the execution of an application, so managing events with threads would be inefficient.
Traditional asynchronous mechanisms try to solve this scalability issue either by preventing applications from waiting unnecessarily or, like epoll on Linux, by improving the detection of active descriptors. Unfortunately, these solutions are limited to file descriptors, which represent only a fraction of the events of interest. Also, starting a huge number of threads to handle these events, as web servers do, would create a bottleneck and aggravate the situation.
The development of complex distributed software architectures demands a mechanism that can take advantage of system resources at runtime. A promising solution to this issue is the introduction of an event-based mechanism in Linux. Such a mechanism enables real cooperation between the operating system and applications: components register for events and are notified asynchronously later, through the execution of handlers.
If we compare signal handlers and event handlers, we find the latter more informative because they bring the data directly to the application. Basically, an asynchronous event mechanism can be used to implement generic user-level handlers triggered by system events or to implement periodic monitoring components, like timers. The first case is particularly interesting if an application doesn't know when an event will occur. When receiving events asynchronously, the application can take action without having to retrieve the necessary data, because they are supplied as parameters.
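The idea of registering handlers that receive the event data as parameters can be sketched at user level as follows. This is only an illustrative model, not the kernel interface of any particular mechanism: the class `EventRegistry`, its methods and the event names are all hypothetical.

```python
from collections import defaultdict


class EventRegistry:
    """Hypothetical user-level registry; it only models register/notify."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._pending = []

    def register(self, event_type, handler):
        """Associate a handler with an event type."""
        self._handlers[event_type].append(handler)

    def post(self, event_type, **payload):
        """Called by the event source: queue the event together with its data."""
        self._pending.append((event_type, payload))

    def dispatch(self):
        """Run handlers for all pending events; return how many handlers ran."""
        ran = 0
        pending, self._pending = self._pending, []
        for event_type, payload in pending:
            for handler in self._handlers.get(event_type, ()):
                handler(**payload)  # the data arrive as handler parameters
                ran += 1
        return ran


def demo():
    log = []
    reg = EventRegistry()
    reg.register("overload", lambda cpu, node: log.append((node, cpu)))
    reg.register("alarm", lambda code: log.append(("alarm", code)))
    reg.post("overload", cpu=97, node="n1")
    reg.post("alarm", code=12)
    reg.post("unwatched", x=1)  # nobody registered for this event: ignored
    return reg.dispatch(), log
```

Unlike a signal handler, which learns little more than the signal number, each handler here is called with everything it needs to act on the event.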
Some investigations have already been done regarding fast message-passing mechanisms, which are based on the same principles as asynchronous events. For example, active messages (see Resources) execute asynchronously on the stack of the receiver process. In pop-up threads, a thread of execution is created for every handler, and in single-threaded upcalls, a dedicated thread is created on each processor. AEM is an emerging mechanism that offers a native environment for the development of applications requiring real asynchrony. For example, we used AEM to implement a native asynchronous socket interface for TCP. In AEM, the choice is made at registration time whether a handler is executed in the current execution task or in a new thread of execution. Some other research projects have proposed similar solutions to improve web server capabilities under high load (see “A Scalable and Explicit Event Delivery Mechanism for UNIX”, Resources).
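The registration-time choice between running a handler in the current task and running it in a new thread can be modeled in user space as below. This sketch does not reproduce the AEM interface: `ModeRegistry`, the `IN_CALLER` and `NEW_THREAD` constants and the event names are assumptions introduced for illustration.

```python
import threading

IN_CALLER = "in_caller"    # hypothetical mode: run in the delivering task
NEW_THREAD = "new_thread"  # hypothetical mode: run in a fresh thread


class ModeRegistry:
    """Hypothetical registry where the execution mode is fixed at registration."""

    def __init__(self):
        self._handlers = {}

    def register(self, event_type, handler, mode=IN_CALLER):
        self._handlers[event_type] = (handler, mode)

    def deliver(self, event_type, *args):
        """Run the handler as registered; return the thread if one was started."""
        handler, mode = self._handlers[event_type]
        if mode == NEW_THREAD:
            t = threading.Thread(target=handler, args=args)
            t.start()
            return t
        handler(*args)
        return None


def demo():
    seen = {}

    def record(name):
        seen[name] = threading.get_ident()

    reg = ModeRegistry()
    reg.register("timer", record, mode=IN_CALLER)
    reg.register("socket_data", record, mode=NEW_THREAD)
    reg.deliver("timer", "timer")
    t = reg.deliver("socket_data", "socket_data")
    t.join()
    me = threading.get_ident()
    return seen["timer"] == me, seen["socket_data"] != me
```

Because the mode is part of the registration, the code that delivers the event does not need to know how each handler is run, and the application decides where threads are actually worth their cost.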
The main benefit of the event paradigm is the integration of event handling and thread management in the same mechanism. Concretely, it gives full control over resource consumption.
Performance is a primary goal of event-based mechanisms. Decoupling event management from the application permits increased locality by taking advantage of different memory allocation schemes or by influencing the scheduler's decisions. For example, soft real-time responsiveness can be obtained by adjusting process priorities according to pending events.
This emerging paradigm provides a simpler and more natural programming style compared to the complexity offered by multithreaded architectures. It proves its efficiency for the development of multilayer software architectures, where each layer provides a service to the upper layer. This type of architecture is quite common for distributed applications.
Figure 2 illustrates a typical distributed application based on an event-driven model. It is composed of many software components, and a process represents one layer of the application. In distributed applications, a lot of local and remote communications are engaged either at the same level or at a different level.
In many situations such applications have to provide services that must operate worldwide with high performance. It is essential that these applications take advantage of hardware resources and scale linearly with respect to the platform's capabilities.
The design of this software must ensure that no deadlock or race condition is possible between the components. The impact of such design flaws on system integrity can be catastrophic. With a multithreaded approach, these flaws are hard to detect and correct because of the high number of possible configurations. An event-based mechanism reduces the chance of introducing points of failure by controlling the number of threads started asynchronously. It also is easier to guarantee atomicity of handler executions, because the mechanism is kept in the kernel.
System resources are limited, and so is the number of processes that can be started. At registration time, the application chooses the type of handler to execute. This permits the production of applications that remain robust as the load increases. The main advantage for applications is the possibility of mixing sequential code and asynchronous code. It is then possible to design applications that exploit the capabilities of both strategies.
An event-based framework offers operators dynamic reconfiguration with minimum impact on the system uptime. Hardware hot swap and dynamic software upgrade must be possible without restarting the system. Distributed applications are built from a large number of interacting components, and upgrading such software is a critical operation.
Telecom platforms require 99.999% uptime for all services, which allows roughly five minutes of downtime per year. The services cannot be stopped during maintenance operations, as this would impact other service platforms and the subscriber requests connected to them. Software upgrades must be performed gradually. Event-based mechanisms introduce the potential for such a capability in distributed applications. As we can see in Figure 2, there are no direct dependencies between software layers if communication is performed asynchronously. It is then possible to replace some parts of the application without major disturbance.
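The point about replacing a layer while others keep sending to it can be sketched with a mailbox between layers. This is a simplified, single-process model under stated assumptions: the `Layer` class, its `upgrade` and `drain` methods, and the "v1"/"v2" handlers are hypothetical and stand in for real software components.

```python
import queue


class Layer:
    """Hypothetical application layer, reachable only through its mailbox."""

    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self._handler = handler

    def upgrade(self, new_handler):
        """Swap the implementation; messages already queued are kept."""
        self._handler = new_handler

    def drain(self):
        """Process everything currently queued and return the results."""
        results = []
        while True:
            try:
                msg = self.mailbox.get_nowait()
            except queue.Empty:
                return results
            results.append(self._handler(msg))


def demo():
    layer = Layer(lambda m: "v1:" + m)
    layer.mailbox.put("a")
    before = layer.drain()
    layer.mailbox.put("b")  # arrives while the upgrade is pending
    layer.upgrade(lambda m: "v2:" + m)
    layer.mailbox.put("c")
    after = layer.drain()
    return before, after
```

Senders only ever touch the mailbox, so they hold no reference to the implementation being replaced, and a message posted during the upgrade is simply handled by the new version.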