MOSIX: A Cluster Load-Balancing Solution for Linux

Ibrahim introduces the MOSIX software package and describes how it was installed on an experimental Linux cluster in the Ericsson Systems Research Lab in Montréal.

Software clustering technologies have been evolving for the past few years and are currently gaining a lot of momentum for several reasons. These reasons include the benefits of deploying commodity, off-the-shelf hardware (high-power PCs at low prices), using inexpensive high-speed networking such as fast Ethernet, as well as the resulting benefits of using Linux. Linux appears to be an excellent choice for its robust kernel, the flexibility it offers, the various networking features it supports and the early availability of its IP releases.

ARIES Project

MOSIX

The ARIES Project

With the growth of the popularity of clustering technologies, we decided to start a project that aims at finding and prototyping the necessary technology to prove the feasibility of a clustered Linux internet server that demonstrates telecom-grade characteristics. Thus, a star was born, and it was called ARIES (advanced research on internet e-servers).

ARIES started at the Ericsson Core Unit of Research in January 2000, and its objective was to use the Linux kernel as the base technology and to rely on open-source software to build the desirable system with characteristics that include guaranteed availability and response time, linear scalability, high performance and the capability of maintaining the system without any impacts on its availability.

Traffic distribution and load balancing were two main areas of investigation. The strategy followed was to survey the open-source world, check the available solutions in those areas, test them and determine to what extent they meet our requirements for the targeted near telecom-grade Linux internet server.

This article covers our experience with MOSIX, a software package developed at the Hebrew University of Jerusalem. We expose the MOSIX technology, the algorithms developed and how they operate and describe how we installed MOSIX on an experimental Linux cluster. We also discuss MOSIX's strengths and weaknesses in order to help others decide if MOSIX is right for them.

The MOSIX Technology

MOSIX is a software package for Linux that transforms independent Linux machines into a cluster that works like a single system and performs load balancing for a particular process across the nodes of the cluster. MOSIX was designed to enhance the Linux kernel with cluster computing capabilities and to provide means for efficient management of the cluster-wide resources. It consists of kernel-level, adaptive resource sharing algorithms that are geared for high performance, overhead free scalability and ease of use of a scalable computing cluster.

The core of MOSIX is its capability to make multiple server nodes work cooperatively as if part of a single system. MOSIX's algorithms are designed to respond to variations in the resource usage among the nodes by migrating processes from one node to another, pre-emptively and transparently, for load balancing and to prevent memory exhaustion at any node. By doing so, MOSIX improves the overall performance by dynamically distributing and redistributing the workload and the resources among the nodes in the cluster.

MOSIX Operation

MOSIX operates transparently to the applications and allows the execution of sequential and parallel applications without regard for where the processes are running or what other cluster users are doing.

Shortly after the creation of a new process, MOSIX attempts to assign it to the best available node at that time. MOSIX then continues to monitor the new process, as well as all the other processes, and will move it among the nodes to maximize the overall performance. This is done without changing the Linux interface, and users can continue to see (and control) their processes as if they run on their local node.

Users can monitor the process migration and memory usages on the nodes using QPS, a contributed program that is not part of MOSIX but supports MOSIX-specific fields. QPS is a visual process manager, an X11 version of “top” and “ps” that displays processes in a window and allows the user to sort and manipulate them. It supports special fields (see Figure 1) such as on which node a process was started, on which node it is currently running, the percentage of memory it is using and its current working directory.

Figure 1. QPS Supported Fields

Since MOSIX's algorithms are decentralized, each node is both a master for processes that were created locally and a server for processes that migrated from other nodes. This implies that nodes can be added or removed from the cluster at any time, with minimal disturbances to the running processes.

MOSIX improves the overall performance by better utilizing the network-wide resources and by making the system easier to use. MOSIX's system image model is based on the home-node model. In this model, all the user's processes seem to run at the user's login-node. Each new process is created at the same node(s) as its parent process. Processes that have migrated interact with the user's environment through the user's home-node, but where possible, they use local resources. As long as the load of the user's login-node remains below a threshold value, all the user's processes are confined to that node. However, when this load rises above a threshold value, some processes may be migrated (transparently) to other nodes.

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix