Parallel Computing Using Linux
Parallel computing involves the design of a computing system that uses more than one processor to solve a single problem. For example, if two arrays with ten elements each must be added, two processors can be used to compute the results. Before starting the computation, both processors agree to work on independent sub-problems: one processor adds the first five pairs of elements and the second processor adds the remaining five pairs. Each processor works on its sub-problem and, when its solution is available, communicates the result to the other processor. Theoretically, a two-processor computer should add the arrays twice as fast as a single-processor computer. In practice, communication and coordination add overhead, and the benefit of each additional processor diminishes as the number of processors grows.
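The array example above can be sketched in a few lines of Python. This is only an illustration of the decomposition: the function names `add_slice` and `parallel_add` are invented for this sketch, and threads stand in for processors (in CPython, threads do not speed up pure computation, so no timing claim is implied).

```python
from concurrent.futures import ThreadPoolExecutor


def add_slice(a, b, start, stop):
    """Add the elements of a and b over the half-open index range [start, stop)."""
    return [a[i] + b[i] for i in range(start, stop)]


def parallel_add(a, b, workers=2):
    """Split the index range into contiguous chunks and add each chunk in a worker."""
    n = len(a)
    bounds = [(w * n // workers, (w + 1) * n // workers) for w in range(workers)]

    def work(chunk):
        start, stop = chunk
        return add_slice(a, b, start, stop)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Gathering the partial results corresponds to the communication step.
        parts = list(pool.map(work, bounds))
    return [x for part in parts for x in part]


a = list(range(10))        # 0, 1, ..., 9
b = list(range(10, 20))    # 10, 11, ..., 19
result = parallel_add(a, b)
```

With two workers, the first handles indices 0 through 4 and the second handles indices 5 through 9, matching the split described in the text.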
Obtaining a Unix workstation for the cost of a PC has been one of the benefits of using Linux. This idea has been carried a step further by linking together a number of Linux PCs. Several research projects are underway to link PCs using high-performance networks. High-speed networking is a hot topic, and a number of projects are using Linux to develop a low-latency, high-bandwidth parallel machine (one URL is http://yara.ecn.purdue.edu/~pplinux).
Currently, there is not much high-level support for shared memory programming under SMP Linux. The basic Linux mechanisms for sharing memory across processors are available: the System V Inter-Process Communication (IPC) system calls and a thread library. However, it will be some time before a parallel C or C++ compiler is available for Linux. Parallel programming can still be done on an SMP Linux machine or on a cluster of Linux PCs using message passing.
The main advantage of parallel computing is that it obtains the solution to a problem faster. Scientific applications are already using parallel computation as a method for solving problems. Parallel computers are routinely used in computationally intensive applications such as climate modeling, finite element analysis and digital signal processing. New commercial applications which process large amounts of data in sophisticated ways are driving the development of faster computers. These applications include video conferencing, data mining and advanced graphics. The integration of parallel computation, multimedia technology and high performance networking has led to the development of video servers. A video server must be capable of rapidly encoding and decoding megabytes of data while simultaneously handling hundreds of requests. While commercial parallel applications are gaining popularity, scientific applications will remain important users of parallel computers. Both application types are merging as scientific and engineering applications use large amounts of data and commercial applications perform more sophisticated operations.
Parallel computing is a broad topic and this article will focus on how Linux can be used to implement a parallel application. We will look at two models of parallel programming: message passing and shared memory constructs.
Conceptually, the idea behind message passing is simple—multiple processors of a parallel computer run the same or different programs, each with private data. Data is exchanged between processors when needed. A message is transmitted by a sender processor to a receiver processor. One processor can be either a sender or a receiver processor at any time. The sender processor can either wait for an acknowledgement after sending or it can continue execution. The receiver processor checks a message buffer to retrieve a message. If no message is present, the processor can continue execution and try again later or wait until a message is received. Multiple sends and receives can occur simultaneously in a parallel computer. All processors within the parallel computer must be inter-connected by a network (Figure 1).
Figure 1. A Parallel Computer with Distributed Memory
All processors can exchange data with all other processors. The routing of messages is handled by the operating system. The message-passing model can be used on a network of workstations or within a tightly coupled group of processors with a distributed memory. The number of hops between processors can vary depending on the type of inter-connection network.
Message passing between processors is achieved by using a communication protocol. Depending on the communication protocol used, the send routine usually accepts a destination processor ID, a message type, the start address for the message buffer and the number of bytes to be transmitted. The receive routine can accept a message from any processor or only from a particular processor. Likewise, it can accept a message of any type or only of a particular type.
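The send and receive routines described above might look something like the following sketch. It is not the API of any real message-passing library: `Mailbox`, `Network`, `send` and `receive` are hypothetical names, threads stand in for processors, and a Python `bytes` object takes the place of the buffer start address and byte count.

```python
import threading
from collections import deque

ANY = None  # wildcard: match any source processor or any message type


class Mailbox:
    """Message buffer belonging to one processor (hypothetical, for illustration)."""

    def __init__(self):
        self._messages = deque()
        self._cond = threading.Condition()

    def deliver(self, source, msg_type, payload):
        with self._cond:
            self._messages.append((source, msg_type, payload))
            self._cond.notify_all()

    def _find(self, source, msg_type):
        # Index of the oldest buffered message matching both filters, else None.
        for i, (src, typ, _) in enumerate(self._messages):
            if source in (ANY, src) and msg_type in (ANY, typ):
                return i
        return None

    def receive(self, source=ANY, msg_type=ANY, block=True, timeout=None):
        """Return the oldest matching (source, type, payload), or None if none."""
        with self._cond:
            if block:
                # Wait until a matching message arrives (or the timeout expires).
                self._cond.wait_for(
                    lambda: self._find(source, msg_type) is not None, timeout)
            i = self._find(source, msg_type)
            if i is None:
                return None
            msg = self._messages[i]
            del self._messages[i]
            return msg


class Network:
    """Connects n processors, each with its own mailbox."""

    def __init__(self, n):
        self.mailboxes = [Mailbox() for _ in range(n)]

    def send(self, source, dest, msg_type, payload):
        # Copy the bytes so sender and receiver keep private data.
        self.mailboxes[dest].deliver(source, msg_type, bytes(payload))

    def receive(self, rank, source=ANY, msg_type=ANY, block=True, timeout=None):
        return self.mailboxes[rank].receive(source, msg_type, block, timeout)


net = Network(2)

# Non-blocking receive: nothing has been sent yet, so processor 0 gets None
# and is free to continue with other work.
empty = net.receive(0, block=False)

# Processor 1 (a thread here) sends; processor 0 waits for that specific message.
sender = threading.Thread(target=net.send, args=(1, 0, "result", b"\x01\x02"))
sender.start()
msg = net.receive(0, source=1, msg_type="result")
sender.join()
```

The `block` flag corresponds to the two receiver behaviors: try again later, or wait until a message is received.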
Most communication protocols maintain the order in which messages are sent between a pair of processors. For example, if processor 0 sends a message of type a followed by a message of type b to processor 1, then when processor 1 issues a receive from processor 0 for a generic message type, the message of type a will be received first. However, in a multi-processor system, if a processor issues a receive from any processor, there is no guarantee of the order in which messages from different sending processors are received. The order in which messages are transported depends on the router and the traffic on the network. The safest way to control the order in which messages are processed is to specify the source processor number and message type in each receive.
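A small simulation can illustrate both points. In this sketch, three sender threads stand in for processors 1 to 3, a `queue.Queue` plays the role of the receiver's message buffer, and the random sleep is an assumed stand-in for network delay. The receiver accepts messages from any source, then uses the (source, type) tag to process them in a fixed order regardless of arrival order.

```python
import queue
import random
import threading
import time

inbox = queue.Queue()  # receiver's buffer; arrival order depends on timing


def sender(rank):
    time.sleep(random.uniform(0, 0.01))  # stand-in for network delay
    inbox.put((rank, "a", f"first from {rank}"))
    inbox.put((rank, "b", f"second from {rank}"))


ranks = (1, 2, 3)
threads = [threading.Thread(target=sender, args=(r,)) for r in ranks]
for t in threads:
    t.start()

received = {}                           # (source, type) -> payload
per_source = {r: [] for r in ranks}     # arrival order of types, per sender
for _ in range(2 * len(ranks)):
    source, msg_type, payload = inbox.get()   # receive from any processor
    received[(source, msg_type)] = payload
    per_source[source].append(msg_type)

for t in threads:
    t.join()

# Selecting by source and type gives an order that does not depend on arrival.
ordered = [received[(r, t)] for r in ranks for t in ("a", "b")]
```

Within each sender, type a always arrives before type b, while the interleaving between different senders can change from run to run.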
Message passing has been used successfully to implement many parallel applications. But a disadvantage of message-passing is the added programming required. Adding message-passing code to a large program requires considerable time. A domain decomposition technique must be chosen. Data for the program must be divided such that there is minimal overlap between processors, the load across all processors is balanced and each processor can independently solve a sub-problem. For regular data structures, the domain decomposition is fairly straightforward, but for irregular grids, dividing the problem so that the load is balanced across all processors is not trivial.
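For a regular one-dimensional data structure, a balanced decomposition with no overlap can be computed directly. The following is a minimal sketch; `block_partition` is a name chosen for this example, and irregular grids would need a more elaborate partitioner than this.

```python
def block_partition(n_items, n_procs):
    """Return one (start, stop) index range per processor.

    The ranges are contiguous, do not overlap, and differ in size
    by at most one item.
    """
    base, extra = divmod(n_items, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` processors take one additional item each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


parts = block_partition(10, 3)   # [(0, 4), (4, 7), (7, 10)]
```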
Another disadvantage of message passing is the possibility of deadlock. It is very easy to hang a parallel computer by misplacing a call to the send or receive routines. So, while message passing is conceptually simple, it has not been adopted fully by the scientific or commercial communities.
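The classic way to hang a pair of processors is for both to issue a blocking receive before either one sends. The sketch below simulates this with two threads and two queues; the names `exchange` and `run_pair` are hypothetical, and the timeout exists only so that the demonstration terminates. With a true blocking receive, the first case would wait forever.

```python
import queue
import threading


def recv(inbox, timeout):
    """Blocking receive that gives up after `timeout` seconds and returns None."""
    try:
        return inbox.get(timeout=timeout)
    except queue.Empty:
        return None


def exchange(my_inbox, peer_inbox, value, receive_first, timeout=0.5):
    """Swap one value with a peer; return what was received, or None on timeout."""
    if receive_first:
        got = recv(my_inbox, timeout)   # waits for a message the peer may never send
        if got is None:
            return None                 # gave up; our own send is never reached
        peer_inbox.put(value)
        return got
    peer_inbox.put(value)               # send first, then wait for the reply
    return recv(my_inbox, timeout)


def run_pair(receive_first_0, receive_first_1):
    """Run two simulated processors and return what each one received."""
    inboxes = [queue.Queue(), queue.Queue()]
    results = [None, None]

    def proc(rank, receive_first):
        peer = 1 - rank
        results[rank] = exchange(
            inboxes[rank], inboxes[peer], f"data from {rank}", receive_first)

    threads = [
        threading.Thread(target=proc, args=(0, receive_first_0)),
        threading.Thread(target=proc, args=(1, receive_first_1)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


deadlocked = run_pair(True, True)    # both receive first: neither ever sends
completed = run_pair(False, True)    # one sends first: the exchange completes
```

Changing the order of the calls in just one of the two processors is enough to turn a hang into a working exchange, which is why a misplaced send or receive is so easy to overlook.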